Spearman's rank correlation coefficient

In statistics, Spearman's rank correlation coefficient, named for Charles Spearman and often denoted by the Greek letter ρ (rho), is a non-parametric measure of correlation: it assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables. Unlike the Pearson product-moment correlation coefficient, it does not require the assumption that the relationship between the variables is linear, nor does it require the variables to be measured on interval scales; it can be used for variables measured at the ordinal level.

In principle, ρ is simply a special case of the Pearson product-moment coefficient in which the data are converted to ranks before calculating the coefficient. In practice, however, a simpler procedure is normally used to calculate ρ. The raw scores are converted to ranks, and the differences D between the ranks of each observation on the two variables are calculated. ρ is then given by:


 * $$ \rho = 1- {\frac {6 \sum D^2}{N(N^2 - 1)}}$$

where:


 * D = the difference between the ranks of corresponding values of X and Y, and


 * N = the number of pairs of values.

The formula becomes more complicated in the presence of tied ranks, but unless the tie bands are large, the effect of ignoring them is small.
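As a rough illustration of the two routes described above, the following Python sketch computes ρ both from the rank differences D and as a Pearson coefficient on the ranks. The function names and the sample data are made up for this example; tied values are given the mean of the ranks they span, which is a common convention but an assumption here, and the rank-difference formula is exact only when there are no ties.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman_from_differences(x, y):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)); exact only without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

def pearson(a, b):
    """Ordinary Pearson product-moment coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)

def spearman_from_ranks(x, y):
    """Spearman's rho as the Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))

# Illustrative data (not from any real study); no ties, so both routes agree.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
rho_d = spearman_from_differences(x, y)   # sum(D^2) = 4, so rho = 0.8
rho_r = spearman_from_ranks(x, y)

# A monotonic but non-linear relationship: rho is 1, Pearson's r is below 1.
cubes = [v ** 3 for v in x]
rho_cubes = spearman_from_differences(x, cubes)
r_cubes = pearson(x, cubes)
print(rho_d, rho_r, rho_cubes, r_cubes)
```

The second example reflects the point that ρ measures monotonic rather than linear association.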

Determining significance
The modern approach to testing whether an observed value of ρ differs significantly from zero is a permutation test: calculate the probability, under the null hypothesis, of obtaining a value of ρ greater than or equal to the one observed. This approach is almost always superior to traditional methods, unless the data set is so large that computing power is not sufficient to generate the permutations (unlikely with modern hardware), or unless an algorithm for creating permutations that are logical under the null hypothesis is difficult to devise for the particular case (though such algorithms are usually straightforward).
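A permutation test of this kind might be sketched in Python as follows. This is a minimal sketch under stated assumptions: the inputs are already ranks with no ties, the null hypothesis is that every ordering of one variable's ranks is equally likely, and the function names are invented for the example. The first function enumerates every permutation, which is feasible only for small samples; the second estimates the same probability from random shuffles.

```python
import random
from itertools import permutations

def rho_no_ties(rank_x, rank_y):
    """Spearman's rho from two untied rank sequences of equal length."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))

def exact_permutation_p_value(rank_x, rank_y):
    """One-sided p-value: share of all orderings of rank_y whose rho is
    at least as large as the observed rho."""
    observed = rho_no_ties(rank_x, rank_y)
    count = total = 0
    for shuffled in permutations(rank_y):
        total += 1
        # Small tolerance guards against floating-point round-off.
        if rho_no_ties(rank_x, shuffled) >= observed - 1e-12:
            count += 1
    return count / total

def sampled_permutation_p_value(rank_x, rank_y, samples=10000, seed=0):
    """Monte Carlo estimate of the same p-value from random shuffles."""
    rng = random.Random(seed)
    observed = rho_no_ties(rank_x, rank_y)
    shuffled = list(rank_y)
    count = 0
    for _ in range(samples):
        rng.shuffle(shuffled)
        if rho_no_ties(rank_x, shuffled) >= observed - 1e-12:
            count += 1
    return count / samples

# Illustrative ranks (hypothetical data): observed rho is 0.8.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]
p_exact = exact_permutation_p_value(rank_x, rank_y)    # 8 of 120 orderings
p_sampled = sampled_permutation_p_value(rank_x, rank_y)
print(p_exact, p_sampled)
```

Exact enumeration visits N! orderings, so the sampled version is the practical choice beyond roughly ten observations; a two-sided test would compare absolute values of ρ instead.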

Although the permutation test is often trivial to perform for anyone with computing resources and programming experience, traditional methods for determining significance are still widely used. The most basic approach is to compare the observed ρ with published tables for various levels of significance. This is a simple solution if the significance only needs to be known within a certain range or less than a certain value, as long as tables are available that specify the desired ranges. A reference to such a table is given below. However, generating these lookup tables is computationally intensive, and complicated mathematical tricks have been used over the years to generate tables for larger and larger sample sizes, so it is not practical for most people to extend existing tables.

An alternative approach available for sufficiently large sample sizes is an approximation based on Student's t-distribution. For sample sizes above about 20, the variable
 * $$t = \frac{\rho}{\sqrt{(1-\rho^2)/(N-2)}}$$

approximately follows a Student's t-distribution with N − 2 degrees of freedom in the null case (zero correlation), where N is the number of pairs of values. In the non-null case (i.e. to test whether an observed ρ is significantly different from a theoretical value, or whether two observed ρs differ significantly) tests are much less powerful, though the t-distribution can again be used.
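The t approximation can be sketched in Python using only the standard library. The sketch assumes the usual N − 2 degrees of freedom for this statistic and obtains a two-sided p-value by integrating the t density numerically with Simpson's rule; the function names and the example values (ρ = 0.5 from 25 pairs) are hypothetical, and a statistics library would normally supply the distribution function instead.

```python
from math import exp, lgamma, log, pi, sqrt

def t_statistic(rho, n):
    """t = rho / sqrt((1 - rho^2) / (n - 2)); undefined when |rho| = 1."""
    return rho / sqrt((1 - rho ** 2) / (n - 2))

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + t * t / df))

def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|].
    steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * area)

# Hypothetical example: rho = 0.5 observed from n = 25 pairs.
rho, n = 0.5, 25
t = t_statistic(rho, n)           # sqrt(23 / 3), roughly 2.77
p = two_sided_p_value(t, n - 2)   # falls between 0.01 and 0.02
print(t, p)
```

For small samples the approximation is unreliable, which is why the text restricts it to sample sizes above about 20.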

A generalisation of the Spearman coefficient is useful in the situation where there are three or more conditions, a number of subjects are all observed in each of them, and we predict that the observations will have a particular order. For example, a number of subjects might each be given three trials at the same task, and we predict that performance will improve from trial to trial. A test of the significance of the trend between conditions in this situation was developed by E. B. Page and is usually referred to as Page's trend test for ordered alternatives.
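The statistic used in Page's test is not spelled out above; as commonly defined, each subject's observations are ranked across the conditions, the ranks are summed per condition, and the sums are weighted by the predicted position of each condition. The Python sketch below follows that definition, assuming no ties within a subject and that the columns are already listed in the predicted increasing order. The function names and the scores are invented for illustration, and judging significance would still require Page's tables or an approximation not shown here.

```python
def within_subject_ranks(row):
    """Rank one subject's scores across conditions (1 = smallest).
    Assumes no ties within the row."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    result = [0] * len(row)
    for position, j in enumerate(order, start=1):
        result[j] = position
    return result

def page_l(scores):
    """Page's L: sum over conditions of (predicted position * rank sum).
    scores[i][j] is subject i's score under condition j, with conditions
    listed in the predicted increasing order."""
    k = len(scores[0])
    rank_sums = [0] * k
    for row in scores:
        for j, r in enumerate(within_subject_ranks(row)):
            rank_sums[j] += r
    return sum((j + 1) * rank_sums[j] for j in range(k))

# Hypothetical scores: 4 subjects, 3 trials, improvement predicted over trials.
scores = [
    [10, 12, 15],
    [8, 9, 11],
    [14, 13, 16],
    [7, 10, 9],
]
l_stat = page_l(scores)                                   # rank sums 5, 8, 11
l_max = len(scores) * sum(j * j for j in range(1, 4))     # perfect agreement
print(l_stat, l_max)
```

A value of L close to its maximum indicates that most subjects followed the predicted order.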
