Correlation ratio

In statistics, the correlation ratio is a measure of the relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample.

Suppose each observation is $$y_{xi}$$, where $$x$$ indicates the category the observation is in and $$i$$ is the label of the particular observation. Let $$n_x$$ be the number of observations in category $$x$$ (not necessarily the same for different values of $$x$$), and let
 * $$\overline{y}_x=\frac{\sum_i y_{xi}}{n_x}$$ and $$\overline{y}=\frac{\sum_x n_x \overline{y}_x}{\sum_x n_x},$$

where $$\overline{y}_x$$ is the mean of category $$x$$ and $$\overline{y}$$ is the mean of the whole population. The correlation ratio η (eta) is then defined so as to satisfy


 * $$\eta^2 = \frac{\sum_x n_x (\overline{y}_x-\overline{y})^2}{\sum_{xi} (y_{xi}-\overline{y})^2}$$

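which can be written as

 * $$\eta^2 = \frac{{\sigma_{\overline{y}}}^2}{{\sigma_{y}}^2},$$

that is, the weighted variance of the category means divided by the variance of all observations.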
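
As a minimal sketch, the definition of $$\eta^2$$ can be computed directly from paired category labels and observations. The function name `correlation_ratio_squared`, its arguments, and the sample data are illustrative assumptions, not part of any standard library.

```python
from collections import defaultdict


def correlation_ratio_squared(categories, values):
    """Return eta squared for paired category labels and numeric observations.

    Hypothetical helper written for illustration only.
    """
    if len(categories) != len(values) or not values:
        raise ValueError("need two non-empty sequences of equal length")

    # Group the observations y_xi by their category x.
    groups = defaultdict(list)
    for x, y in zip(categories, values):
        groups[x].append(y)

    overall_mean = sum(values) / len(values)

    # Numerator: sum over categories of n_x * (category mean - overall mean)^2.
    between = sum(
        len(ys) * (sum(ys) / len(ys) - overall_mean) ** 2
        for ys in groups.values()
    )
    # Denominator: sum over all observations of (y_xi - overall mean)^2.
    total = sum((y - overall_mean) ** 2 for y in values)
    if total == 0:
        raise ValueError("eta squared is undefined when all observations are equal")
    return between / total


# Made-up data: two categories with three observations each.
labels = ["a", "a", "a", "b", "b", "b"]
scores = [1, 2, 3, 5, 6, 7]
eta_sq = correlation_ratio_squared(labels, scores)
print(eta_sq)         # eta squared
print(eta_sq ** 0.5)  # eta, the correlation ratio itself
```

For this made-up data the category means are 2 and 6 and the overall mean is 4, so the numerator is 24, the denominator is 28, and $$\eta^2 = 24/28 \approx 0.857$$.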

If the relationship between the values of $$x$$ and the values of $$\overline{y}_x$$ is linear (which is certainly true when there are only two possibilities for $$x$$), $$\eta^2$$ equals the square of the correlation coefficient between $$x$$ and $$y$$; otherwise the correlation ratio is larger in magnitude, though still no more than 1. It can therefore be used for judging non-linear relationships.
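
The comparison with the correlation coefficient can be checked numerically. The sketch below assumes the categories are coded as numbers so that a correlation coefficient can be computed at all. The helper names `eta_squared` and `pearson_r` and both data sets are made up for illustration.

```python
def eta_squared(xs, ys):
    """Eta squared of the observations ys grouped by the category codes xs."""
    overall_mean = sum(ys) / len(ys)
    between = 0.0
    for x in set(xs):
        group = [y for xi, y in zip(xs, ys) if xi == x]
        group_mean = sum(group) / len(group)
        between += len(group) * (group_mean - overall_mean) ** 2
    total = sum((y - overall_mean) ** 2 for y in ys)
    return between / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient between numeric xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Two categories: the category means necessarily lie on a line,
# so eta squared and r squared should agree.
xs_two = [0, 0, 0, 1, 1, 1]
ys_two = [1, 2, 3, 5, 6, 7]
eta_two = eta_squared(xs_two, ys_two)
r_two = pearson_r(xs_two, ys_two)
print(eta_two, r_two ** 2)

# Three categories whose means rise and then fall (non-linear):
# eta squared should exceed r squared.
xs_three = [0, 0, 1, 1, 2, 2]
ys_three = [1, 3, 9, 11, 1, 3]
eta_three = eta_squared(xs_three, ys_three)
r_three = pearson_r(xs_three, ys_three)
print(eta_three, r_three ** 2)
```

In the second data set the category means are 2, 10 and 2, so a straight line captures essentially none of the relationship, while the correlation ratio remains close to 1.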