Introduction to Contingency Tables
Contingency tables, also called cross tables or two-way frequency tables, describe the relationship between several categorical (qualitative) variables. A bivariate relationship is defined by the joint distribution of the two associated random variables.
Contingency Tables
Let $X$ and $Y$ be two categorical response variables. Let variable $X$ have $I$ levels and variable $Y$ have $J$ levels, so that there are $I \times J$ possible combinations of classifications. The response $(X, Y)$ of a subject randomly chosen from some population has a probability distribution, which can be shown in a rectangular table having $I$ rows (for the categories of $X$) and $J$ columns (for the categories of $Y$).
The cells of this rectangular table represent the $IJ$ possible outcomes. Let $\pi_{ij}$ denote the probability that $(X, Y)$ falls in the cell in row $i$ and column $j$. When the cells contain frequency counts of outcomes, the table is called a contingency table or cross-classification table, and a table with $I$ rows and $J$ columns is referred to as an $I$ by $J$ ($I \times J$) table.
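As a minimal sketch of how such a table of counts can be assembled from raw observations, the Python snippet below cross-classifies paired responses into an $I \times J$ grid. The data, the level names, and the helper `contingency_table` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical data: each pair is (level of X, level of Y) for one subject.
observations = (
    [("drug", "improved")] * 3
    + [("drug", "unchanged")] * 1
    + [("drug", "worse")] * 1
    + [("placebo", "improved")] * 1
    + [("placebo", "unchanged")] * 2
    + [("placebo", "worse")] * 2
)

x_levels = ["drug", "placebo"]                 # I = 2 rows
y_levels = ["improved", "unchanged", "worse"]  # J = 3 columns


def contingency_table(pairs, row_levels, col_levels):
    """Count how many pairs fall in each (row, column) cell."""
    counts = Counter(pairs)  # missing combinations count as 0
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


table = contingency_table(observations, x_levels, y_levels)
for level, row in zip(x_levels, table):
    print(level, row)
```

Each inner list is one row of the table (one level of $X$), and each position within it is one level of $Y$.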
Joint and Marginal Distribution
The probability distribution $\{\pi_{ij}\}$ is the joint distribution of $X$ and $Y$. The marginal distributions are the row and column totals obtained by summing the joint probabilities. For the row variable ($X$) the marginal probability is denoted by $\pi_{i+}$, and for the column variable ($Y$) it is denoted by $\pi_{+j}$, where the subscript “+” denotes the sum over the index it replaces; that is, $\pi_{i+}=\sum_j \pi_{ij}$ and $\pi_{+j}=\sum_i \pi_{ij}$, satisfying
$\sum_{i} \pi_{i+} = \sum_{j} \pi_{+j} = \sum_i \sum_j \pi_{ij} = 1$
Note that the marginal distributions carry single-variable information only; they say nothing about the association between the variables.
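The summations above can be illustrated with a short Python sketch. It assumes a hypothetical $2 \times 3$ table of counts and uses the sample proportions $n_{ij}/n$ as stand-ins for the joint probabilities $\pi_{ij}$. Exact fractions are used so that the totals come out to exactly 1.

```python
from fractions import Fraction

# Hypothetical 2 x 3 table of counts n_ij
# (rows: levels of X, columns: levels of Y).
counts = [[3, 1, 1],
          [1, 2, 2]]
n = sum(sum(row) for row in counts)

# Sample proportions, used here in place of the joint probabilities pi_ij.
joint = [[Fraction(n_ij, n) for n_ij in row] for row in counts]

# pi_{i+}: sum each row over the column index j.
row_marginals = [sum(row) for row in joint]
# pi_{+j}: sum each column over the row index i.
col_marginals = [sum(col) for col in zip(*joint)]

print("row marginals:", [str(p) for p in row_marginals])
print("column marginals:", [str(p) for p in col_marginals])
print("grand total:", sum(row_marginals))
```

Both sets of marginals sum to the same grand total as the joint proportions, mirroring the identity stated above.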
In many contingency tables, one variable (say, $Y$) is a response and the other ($X$) is an explanatory variable. When $X$ is fixed rather than random, the notion of a joint distribution for $X$ and $Y$ is no longer meaningful. However, for a fixed level of $X$, the variable $Y$ has a probability distribution, and it is of interest to study how this conditional distribution of $Y$ changes as the level of $X$ changes.
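To make the idea of a distribution of $Y$ at each fixed level of $X$ concrete, the sketch below divides every cell count by its row total, giving the sample estimate $n_{ij}/n_{i+}$. The counts and the helper name `conditional_distributions` are hypothetical.

```python
from fractions import Fraction

# Hypothetical 2 x 3 table of counts
# (rows: levels of X, columns: levels of Y).
counts = [[3, 1, 1],
          [1, 2, 2]]


def conditional_distributions(table):
    """For each row i, return the distribution of Y given X = i: n_ij / n_{i+}.

    Assumes every row total is positive; an empty row would raise
    ZeroDivisionError.
    """
    result = []
    for row in table:
        row_total = sum(row)
        result.append([Fraction(n_ij, row_total) for n_ij in row])
    return result


conditionals = conditional_distributions(counts)
for i, dist in enumerate(conditionals):
    print(f"Y given X = level {i + 1}:", [str(p) for p in dist])
```

Each row of the result sums to 1. Comparing the rows shows how the distribution of $Y$ shifts from one level of $X$ to the next.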
Contingency Table Uses
- Identify relationships between categorical variables.
- Check whether one variable is independent of the other (i.e., whether the distribution across one variable’s categories is the same regardless of the other variable’s category).
- Calculate probabilities of specific combinations occurring.
- Serve as a stepping stone for further statistical analysis, such as chi-square tests, which assess whether the observed relationship between the variables is statistically significant.
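As a rough illustration of the chi-square step mentioned in the last item, the sketch below computes the Pearson chi-square statistic for a hypothetical $2 \times 3$ table. It takes the expected count under independence to be $n_{i+} n_{+j} / n$ and the degrees of freedom to be $(I-1)(J-1)$. The function name `pearson_chi_square` and the counts are assumptions made for this example. Turning the statistic into a p-value needs the chi-square distribution, which is left out here.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an I x J table of counts.

    Assumes all row and column totals are positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if X and Y were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts (rows: levels of X, columns: levels of Y).
counts = [[30, 10, 10],
          [10, 20, 20]]
statistic, dof = pearson_chi_square(counts)
print(f"chi-square = {statistic:.4f}, degrees of freedom = {dof}")
```

A large statistic relative to the chi-square distribution with the given degrees of freedom suggests the variables are associated. A table whose rows are exactly proportional to one another gives a statistic of zero.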