RBSC coefficient

RBSC measures the correlation between a binary variable b ∈ {0, 1} and a ranking variable r ∈ [rmin, rmax].

Suppose that we have a data set X with several elements (e.g. students in a class).

Assume that each element is associated with a binary feature (e.g. the student commutes by bicycle or not) and a ranking feature (e.g. the student’s weight). Consider a hypothesis which asserts that if b = 1, then r takes larger values.

One can calculate the RBSC coefficient p, defined below, to quantify the validity of this hypothesis. Here S is the number of pieces of evidence supporting the hypothesis and C is the number contradicting it. Both are counted by comparing pairs of items in the data set.

$p = \frac{S - C}{S + C}$
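As a purely illustrative calculation with hypothetical counts (not taken from any data set), suppose S = 7 pairs support the hypothesis and C = 3 pairs contradict it. Then:

$p = \frac{7 - 3}{7 + 3} = 0.4$

A positive value means that supporting pairs outnumber contradicting ones.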

First, we consider two subsets: X0, the collection of items with b = 0, and X1, the collection of items with b = 1.

We then compare each pair of items x0 ∈ X0 and x1 ∈ X1 from these subsets and check if their ranking features agree with the hypothesis or not.

Suppose that the ranking features of x0 and x1 are denoted by r(x0) and r(x1), respectively. Since the hypothesis asserts that r is larger when b = 1, the amount of supporting evidence S is the number of (x0, x1) pairs that satisfy r(x1) > r(x0). Similarly, the amount of contradicting evidence C is the number of (x0, x1) pairs that satisfy r(x1) < r(x0). The range of the RBSC coefficient p is [−1, 1].

If all of the evidence is favorable (C = 0), p is exactly 1; if all of it is unfavorable (S = 0), p is −1.

In this way, we can measure the validity of the hypothesis by considering how close p is to 1.
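The pairwise counting described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `rbsc`, the sample data, and the choice to raise an error when no pair is comparable are all hypothetical. The sketch counts a pair as supporting when the item with b = 1 has the larger ranking value, in line with the stated hypothesis, and it ignores tied pairs.

```python
from itertools import product


def rbsc(b, r):
    """Return p = (S - C) / (S + C) over all (x0, x1) pairs.

    b: sequence of 0/1 labels; r: sequence of ranking values (same length).
    """
    if len(b) != len(r):
        raise ValueError("b and r must have the same length")

    # Split the ranking values into the two subsets X0 (b = 0) and X1 (b = 1).
    r0 = [ri for bi, ri in zip(b, r) if bi == 0]
    r1 = [ri for bi, ri in zip(b, r) if bi == 1]

    # S: pairs where the b = 1 item ranks higher; C: pairs where it ranks lower.
    # Tied pairs count toward neither (an assumption of this sketch).
    supporting = sum(1 for v0, v1 in product(r0, r1) if v1 > v0)
    contradicting = sum(1 for v0, v1 in product(r0, r1) if v1 < v0)

    if supporting + contradicting == 0:
        # Assumption: treat the coefficient as undefined when S + C = 0.
        raise ValueError("no comparable pairs: p is undefined")
    return (supporting - contradicting) / (supporting + contradicting)


# Hypothetical data: b = commutes by bicycle (1) or not (0), r = weight in kg.
b = [0, 0, 0, 1, 1]
r = [60, 72, 80, 65, 85]
p = rbsc(b, r)  # S = 4, C = 2, so p = 2/6
```

Comparing every pair takes time proportional to |X0| · |X1|, which is fine for small data sets such as a single class.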