Cross tabulation

A cross tabulation (often abbreviated as cross tab) displays the joint distribution of two or more variables. Cross tabulations are usually presented as a contingency table in a matrix format. Whereas a frequency distribution provides the distribution of one variable, a contingency table describes the distribution of two or more variables simultaneously. Each cell shows the number of respondents who gave a specific combination of responses; that is, each cell contains a single cross tabulation.
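
As a minimal sketch of how such a table can be built, the Python snippet below counts how often each combination of two responses occurs. The survey responses and the `cross_tabulate` helper are hypothetical and exist only for illustration.

```python
from collections import Counter

def cross_tabulate(pairs):
    """Count how often each (row_value, column_value) combination occurs."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    # Use zero for combinations that never occur so the table stays rectangular.
    return {r: {c: counts.get((r, c), 0) for c in col_labels} for r in row_labels}

# Hypothetical survey responses: (underpants, Wikipedia usage).
responses = [
    ("boxers", "heavy"), ("boxers", "light"), ("briefs", "heavy"),
    ("briefs", "none"), ("boxers", "heavy"), ("briefs", "light"),
]

table = cross_tabulate(responses)
for row_label, row in table.items():
    print(row_label, row)
```

Each inner dictionary is one row of the contingency table, and each value is the joint frequency for that cell.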

The following is a fictitious example of a 2 × 3 contingency table. The variable "Wikipedia usage" has three categories: heavy user, light user, and non-user. These categories are all-inclusive, so the columns sum to 100%. The other variable, "underpants", has two categories: boxers and briefs. These categories are not all-inclusive, so the rows need not sum to 100%. Each cell gives the percentage of subjects that share that combination of traits.
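
The conversion from counts to cell percentages can be sketched in Python as follows. The counts are invented for illustration, the `to_percentages` helper is hypothetical, and the sketch assumes each percentage is taken of all subjects surveyed.

```python
def to_percentages(counts, n_subjects):
    """Express each cell count as a percentage of all subjects surveyed."""
    return {
        row: {col: 100.0 * cell / n_subjects for col, cell in cols.items()}
        for row, cols in counts.items()
    }

# Invented counts: 200 subjects were surveyed, but only 160 of them fall
# into one of the two listed underpants categories.
counts = {
    "boxers": {"heavy": 30, "light": 40, "none": 20},
    "briefs": {"heavy": 20, "light": 30, "none": 20},
}

percentages = to_percentages(counts, n_subjects=200)
total = sum(sum(row.values()) for row in percentages.values())
print(percentages["boxers"]["heavy"])
print(total)
```

Because the underpants categories do not cover every subject in this sketch, the cell percentages add up to less than 100%.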



Cross tabs are frequently used because: 
 * 1) They are easy to understand. They appeal to people who do not understand more sophisticated measures.
 * 2) They can be used with any level of data: nominal, ordinal, interval, or ratio. Cross tabs treat all data as if it were nominal.
 * 3) A table can provide greater insight than single statistics.
 * 4) They solve the problem of empty or sparse cells.
 * 5) They are simple to conduct.

Statistics related to cross tabulations
The following list is not comprehensive.
 * Chi-squared - This tests the statistical significance of the cross tabulations. Chi-squared should not be calculated from percentages; the cross tabs must be converted back to absolute counts (numbers) before calculating it. Chi-squared is also problematic when any cell has an expected frequency of less than five.
 * Contingency coefficient - This tests the strength of association of the cross tabulations. It is a variant of the phi coefficient that adjusts for statistical significance. Values start at 0 (no association) and increase with stronger association, but the maximum attainable value depends on the table size and is always below 1.
 * Cramer’s V - This tests the strength of association of the cross tabulations. It is a variant of the phi coefficient that adjusts for the number of rows and columns. Values range from 0 (no association) to 1 (the theoretical maximum possible association).
 * Lambda Coefficient - This tests the strength of association of the cross tabulations when the variables are measured at the nominal level. Values range from 0 (no association) to 1 (the theoretical maximum possible association). Asymmetric lambda measures the percentage improvement in predicting the dependent variable. Symmetric lambda measures the percentage improvement when prediction is done in both directions.
 * Phi coefficient - If both variables are nominal and dichotomous, the phi coefficient measures the degree of association between the two binary variables. This measure is similar to the correlation coefficient in its interpretation. Two binary variables are considered positively associated if most of the data falls along the diagonal cells. In contrast, two binary variables are considered negatively associated if most of the data falls off the diagonal.
 * Kendall tau:
 * Tau b - This tests the strength of association of the cross tabulations when both variables are measured at the ordinal level. It makes adjustments for ties and is most suitable for square tables. Values range from -1 (perfect negative association) through 0 (no association) to +1 (perfect positive association).
 * Tau c - This tests the strength of association of the cross tabulations when both variables are measured at the ordinal level. It makes adjustments for ties and is most suitable for rectangular tables. Values range from -1 (perfect negative association) through 0 (no association) to +1 (perfect positive association).
 * Gamma - This tests the strength of association of the cross tabulations when both variables are measured at the ordinal level. It makes no adjustment for either table size or ties. Values range from -1 (perfect negative association) through 0 (no association) to +1 (perfect positive association).
 * Uncertainty coefficient, entropy coefficient or Theil's U
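
Several of the statistics above can be sketched in a few lines of Python. The snippet below is an illustrative sketch, not a reference implementation: it computes the Pearson chi-squared statistic, the contingency coefficient, Cramér's V, and gamma (in the Goodman and Kruskal form) from a table of absolute counts. The function names and the example counts are hypothetical, and the sketch assumes every row and column total is greater than zero.

```python
import math

def chi_squared(table):
    """Return (chi-squared statistic, sample size) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def contingency_coefficient(table):
    chi2, n = chi_squared(table)
    return math.sqrt(chi2 / (chi2 + n))

def cramers_v(table):
    chi2, n = chi_squared(table)
    k = min(len(table), len(table[0]))  # smaller of row and column counts
    return math.sqrt(chi2 / (n * (k - 1)))

def gamma(table):
    """Gamma for a table whose rows and columns are listed in ordinal order."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:    # both variables increase together
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # one increases while the other decreases
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical 2 x 2 table of absolute counts (not percentages).
counts = [[20, 10], [10, 20]]
chi2, n = chi_squared(counts)
print(chi2, contingency_coefficient(counts), cramers_v(counts), gamma(counts))

# Swapping the columns reverses the direction of the association.
reversed_counts = [[10, 20], [20, 10]]
print(gamma(reversed_counts))
```

In this sketch the chi-squared-based measures are identical for both tables, while gamma changes sign when the columns are swapped, which illustrates why the ordinal measures range from -1 to +1 rather than from 0 to 1.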