Chi-Square Distribution - Statistics for Data Science & Analytics

Contingency Tables

Published May 10, 2021 (updated Jun 25, 2024) by Muhammad Imdad Ullah

Introduction to Contingency Tables

Contingency tables, also called cross tables or two-way frequency tables, describe the relationship between two or more categorical (qualitative) variables. A bivariate relationship is defined by the joint distribution of the two associated random variables.

Contingency Tables

Let $X$ and $Y$ be two categorical response variables, where $X$ has $I$ levels and $Y$ has $J$ levels. There are $I\times J$ possible combinations of classifications for the two variables. The response $(X, Y)$ of a subject randomly chosen from some population has a probability distribution, which can be displayed in a rectangular table with $I$ rows (for the categories of $X$) and $J$ columns (for the categories of $Y$).

The cells of this rectangular table represent the $IJ$ possible outcomes. The cell probability $\pi_{ij}$ denotes the probability that $(X, Y)$ falls in the cell in row $i$ and column $j$. When the cells contain frequency counts of outcomes, the table is called a contingency or cross-classification table, and it is referred to as an $I$ by $J$ ($I \times J$) table.

Joint and Marginal Distribution

The probability distribution $\{\pi_{ij}\}$ is the joint distribution of $X$ and $Y$. The marginal distributions are the row and column totals obtained by summing the joint probabilities. For the row variable ($X$) the marginal probability is denoted by $\pi_{i+}$, and for the column variable ($Y$) it is denoted by $\pi_{+j}$, where the subscript “+” denotes the sum over the index it replaces; that is, $\pi_{i+}=\sum_j \pi_{ij}$ and $\pi_{+j}=\sum_i \pi_{ij}$, satisfying

$$\sum_{i} \pi_{i+} = \sum_{j} \pi_{+j} = \sum_i \sum_j \pi_{ij} = 1$$

Note that the marginal distributions carry single-variable information and say nothing about the association between the variables.
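As an illustration, the joint and marginal distributions can be estimated from a table of cell counts $n_{ij}$ by the sample proportions $n_{ij}/n$ (the count notation $n_{ij}$ is introduced here, not in the text above). The following is a minimal Python sketch under that assumption; the function name `joint_and_marginals` and the 2 × 3 counts are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def joint_and_marginals(counts):
    """Estimate {pi_ij}, {pi_i+} and {pi_+j} from an I x J table of cell counts."""
    n = sum(sum(row) for row in counts)                        # total sample size
    joint = [[Fraction(c, n) for c in row] for row in counts]  # n_ij / n (exact)
    row_marg = [sum(row) for row in joint]                     # pi_i+ = sum over j
    col_marg = [sum(col) for col in zip(*joint)]               # pi_+j = sum over i
    return joint, row_marg, col_marg


# Hypothetical 2 x 3 table (I = 2 levels of X, J = 3 levels of Y)
counts = [[10, 20, 30],
          [15, 15, 10]]
joint, row_marg, col_marg = joint_and_marginals(counts)
print([str(p) for p in row_marg])   # ['3/5', '2/5']
print([str(p) for p in col_marg])   # ['1/4', '7/20', '2/5']
# Both sets of marginals sum to 1, as do all the joint probabilities
assert sum(row_marg) == sum(col_marg) == sum(map(sum, joint)) == 1
```

`Fraction` is used only so the probabilities print exactly; ordinary floats would work the same way up to rounding.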

In many contingency tables, one variable (say, $Y$) is a response and the other ($X$) is an explanatory variable. When $X$ is fixed rather than random, the notion of a joint distribution for $X$ and $Y$ is no longer meaningful. However, for each fixed level of $X$, the variable $Y$ has a probability distribution, and it is germane to study how this distribution of $Y$ changes as the level of $X$ changes.
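When $X$ is fixed, a natural summary is the conditional distribution of $Y$ within each level of $X$, written here as $\pi_{j|i} = \pi_{ij}/\pi_{i+}$ and estimated by dividing each cell count by its row total. A minimal sketch, assuming a hypothetical helper `conditional_distributions` and made-up counts:

```python
from fractions import Fraction


def conditional_distributions(counts):
    """For each row i, return the estimated distribution of Y given X = i.

    Each cell count is divided by its own row total, so only the
    within-row proportions are used; no joint distribution is assumed.
    """
    result = []
    for row in counts:
        row_total = sum(row)
        result.append([Fraction(c, row_total) for c in row])
    return result


# Hypothetical 2 x 3 table of counts (rows = fixed levels of X)
counts = [[10, 20, 30],
          [15, 15, 10]]
dists = conditional_distributions(counts)
for i, dist in enumerate(dists, start=1):
    print(f"X = level {i}:", [str(p) for p in dist])
# X = level 1: ['1/6', '1/3', '1/2']
# X = level 2: ['3/8', '3/8', '1/4']
```

Comparing the rows shows how the distribution of $Y$ shifts as the level of $X$ changes; each row sums to 1 on its own.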

Contingency Table Uses

Identify relationships between categorical variables.
See if one variable is independent of the other (i.e., whether the distribution of one variable is the same regardless of the other variable’s category).
Calculate probabilities of specific combinations occurring.
Serve as a stepping stone for further statistical analysis, such as chi-square tests that determine whether the observed relationship between the variables is statistically significant.


Chi Square Goodness of Fit Test (2019)

Published Aug 8, 2019 (updated Aug 2, 2024) by Muhammad Imdad Ullah

This post is about the Chi Square Goodness of Fit Test.

One application of the $\chi^2$ distribution is the goodness-of-fit test. Using the $\chi^2$ distribution, we can test the hypothesis that a population has a specified theoretical distribution, which may be Normal, Binomial, Poisson, or any other distribution.

The Chi-Square Goodness of Fit Test checks whether there is a significant difference between an observed frequency distribution and a theoretical (expected) frequency distribution derived from some theoretical model, that is, how well the model fits the data we have observed. A goodness-of-fit test between observed and expected frequencies is based on

$$\chi^2 = \sum_{i=1}^k \frac{(OF_i - EF_i)^2}{EF_i}$$

where $OF_i$ and $EF_i$ are the observed and expected frequencies for the $i$th class, and $k$ is the number of possible outcomes (the number of different classes).
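The statistic can be computed directly from the formula. A minimal Python sketch follows; the function name `chi_square_gof` and the die-rolling counts are hypothetical and serve only as an illustration.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum over classes of (OF_i - EF_i)^2 / EF_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of classes")
    return sum((of - ef) ** 2 / ef for of, ef in zip(observed, expected))


# Hypothetical data: 60 rolls of a die checked against a fair-die model,
# so each of the k = 6 faces has expected frequency 60 / 6 = 10.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
chi2 = chi_square_gof(observed, expected)
# (4 + 4 + 1 + 1 + 0 + 0) / 10 = 1.0, up to floating-point rounding
print(chi2)
```

A small value such as this indicates that the observed counts lie close to the model's expected counts.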

Degrees of Freedom (Chi Square Goodness of Fit Test)

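The degrees of freedom for the test are $v = k - 1 - m$, where $k$ is the number of classes and $m$ is the number of parameters of the theoretical distribution estimated from the sample (so $v = k - 1$ when the theoretical distribution is completely specified). It is also important to note that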

The computed $\chi^2$ value will be small if the observed frequencies are close to the corresponding expected frequencies, indicating a good fit.
The computed $\chi^2$ value will be large if the observed and expected frequencies differ substantially, indicating a poor fit.
A good fit leads to acceptance (more precisely, non-rejection) of the null hypothesis that the sample distribution agrees with the hypothesized theoretical distribution.
A bad fit leads to the rejection of the null hypothesis.

Critical Region (Chi Square Goodness of Fit Test)

The critical region lies in the right tail of the $\chi^2$ distribution. For a specified level of significance $\alpha$ and $v$ degrees of freedom, the critical value $\chi^2_{\alpha}$ is read from the $\chi^2$ table.

Decision

If the computed $\chi^2$ value is greater than the critical value $\chi^2_{\alpha}$, the null hypothesis is rejected. Thus $\chi^2 > \chi^2_{\alpha}$ constitutes the critical region.
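When a printed table is not at hand, the critical value $\chi^2_{\alpha}$ (the value with area $\alpha$ to its right) can be computed numerically. The sketch below is an illustration under stated assumptions rather than a reference implementation: it evaluates the $\chi^2$ cumulative distribution function through the power series for the regularized lower incomplete gamma function and inverts it by bisection. The helper names `chi2_cdf`, `chi2_quantile`, and `right_tail_test` are hypothetical; in practice a statistics library would normally be used instead.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Evaluates the regularized lower incomplete gamma function P(df/2, x/2)
    by its power series; adequate for the moderate values found in tables.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    ap = a
    for _ in range(10_000):
        ap += 1.0
        term *= z / ap
        total += term
        if term < total * 1e-15:  # remaining terms are negligible
            break
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p, df):
    """Value c such that chi2_cdf(c, df) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def right_tail_test(chi2_stat, df, alpha=0.05):
    """Return (reject_H0, critical value chi2_alpha) for the right-tail test."""
    critical = chi2_quantile(1.0 - alpha, df)  # area alpha lies to its right
    return chi2_stat > critical, critical


# Hypothetical: statistic 1.0 from k = 6 classes of a fully specified model,
# so v = k - 1 = 5 degrees of freedom.
reject, critical = right_tail_test(1.0, df=5)
print(reject, round(critical, 3))

# Lower-tail value chi2_0.95 (95% of the area to its right) for a "too good" check
too_good = 1.0 < chi2_quantile(0.05, 5)
print(too_good)
```

The same quantile function also supplies the lower-tail value $\chi^2_{0.95}$ used in the "too good" check described under the requirements below; for this hypothetical statistic the agreement would indeed be flagged as suspiciously close.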

Some Requirements

The Chi Square Goodness of Fit Test should not be applied unless each expected frequency is at least 5. When several classes have smaller expected frequencies, they should be combined (merged) with adjacent classes. The total frequency should not be less than fifty.
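Merging can be automated. The sketch below assumes the classes are ordered (for example, counts 0, 1, 2, and so on of a Poisson variable), so that merging neighbouring classes makes sense; the helper name `pool_small_classes`, its left-to-right merging rule, and the frequencies are all hypothetical.

```python
def pool_small_classes(observed, expected, min_expected=5.0):
    """Merge adjacent classes until every pooled expected frequency >= min_expected.

    Assumed rule (one of several reasonable ones): scan left to right, closing a
    group once its expected total reaches min_expected; any small leftover at
    the end is merged into the last group.
    """
    pooled_obs, pooled_exp = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
            acc_o = acc_e = 0.0
    if acc_o > 0 or acc_e > 0:  # leftover tail with too small an expected total
        if pooled_exp:
            pooled_obs[-1] += acc_o
            pooled_exp[-1] += acc_e
        else:
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
    return pooled_obs, pooled_exp


# Hypothetical ordered classes (e.g. counts 0, 1, ..., 6 of some event)
observed = [2, 8, 15, 12, 9, 3, 1]
expected = [3.0, 9.0, 14.0, 13.0, 8.0, 2.5, 0.5]
if sum(observed) < 50:
    print("Total frequency below 50; the test is not recommended.")
pooled_o, pooled_e = pool_small_classes(observed, expected)
print(pooled_o, pooled_e)  # [10.0, 15.0, 12.0, 13.0] [12.0, 14.0, 13.0, 11.0]
```

Pooling reduces the number of classes $k$ (here from 7 to 4), so the degrees of freedom must be based on the pooled number of classes.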

Note that we must look with suspicion upon circumstances where $\chi^2$ is too close to zero, since it is rare for observed frequencies to agree this closely with expected frequencies. To examine such situations, we can check whether the computed value of $\chi^2$ is less than $\chi^2_{0.95}$ (the value with 95% of the area to its right); if it is, we conclude that the agreement is too good at the 0.05 level of significance.


Measure of Association: Contingency Table (2019)

Published Apr 18, 2019 (updated Aug 11, 2024) by Muhammad Imdad Ullah

A contingency table (also called a two-way frequency table, crosstab, or cross-tabulation) is used to find the relationship (association or dependence) between two or more variables measured on a nominal or ordinal scale.

Contingency Table: A Measure of Association

A contingency table with $R$ rows and $C$ columns is said to be of order $R \times C$. It should have at least 2 rows (categories of the row variable, not counting the row header) and at least 2 columns (categories of the column variable, not counting the column header).

A cross table is created by listing all the categories (groups or levels) of one variable as rows and the categories (groups or levels) of the second variable as columns, and then recording the joint (cell) frequency, or count, in each cell. The cell frequencies are totaled across both the rows and the columns; these totals are called marginal frequencies. The sum of the column totals (or of the row totals) is called the grand total and must equal $N$, the number of observations. The frequencies or counts in the cells are the observed frequencies.
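Building a cross table from raw data amounts to counting each (row category, column category) pair. A minimal Python sketch using `collections.Counter` follows; the helper `cross_tabulate` and the Urban/Rural × Yes/No responses are hypothetical.

```python
from collections import Counter


def cross_tabulate(pairs):
    """Build an R x C table of observed counts from (row, column) category pairs."""
    pairs = list(pairs)
    cell_counts = Counter(pairs)            # joint (cell) frequencies
    rows = sorted({r for r, _ in pairs})    # categories of the row variable
    cols = sorted({c for _, c in pairs})    # categories of the column variable
    table = [[cell_counts[(r, c)] for c in cols] for r in rows]
    row_totals = [sum(row) for row in table]        # marginal frequencies
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # The sum of row totals and the sum of column totals must both equal N
    assert grand_total == sum(col_totals) == len(pairs)
    return rows, cols, table, row_totals, col_totals, grand_total


# Hypothetical responses: (area, answer) for N = 7 subjects
responses = [("Urban", "Yes"), ("Urban", "No"), ("Rural", "Yes"), ("Rural", "Yes"),
             ("Urban", "Yes"), ("Rural", "No"), ("Urban", "Yes")]
rows, cols, table, row_totals, col_totals, n = cross_tabulate(responses)
print(cols)
for name, row, total in zip(rows, table, row_totals):
    print(name, row, total)
print("Total", col_totals, n)
```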

The next step in calculating the Chi-square statistic is to compute the expected frequency for each cell of the contingency table. The expected frequency of a cell is obtained by multiplying its row total by its column total and then dividing by the total number of observations (the grand total, $N$). It can be formulated as

$$\text{Expected Frequency} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$

The same procedure is used to compute the expected frequencies for all the cells of the contingency table.
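The expected-frequency rule can be applied to every cell at once. Below is a minimal sketch; the helper `expected_frequencies` and the 2 × 3 table of observed frequencies are hypothetical.

```python
def expected_frequencies(table):
    """Expected frequency of each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 3 table of observed frequencies
observed = [[10, 20, 30],
            [15, 15, 10]]
for row in expected_frequencies(observed):
    print(row)
# [15.0, 21.0, 24.0]
# [10.0, 14.0, 16.0]
```

The expected frequencies reproduce the original row totals (60 and 40) and column totals (25, 35, and 40), which is a quick check on the arithmetic.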

The next step is to compute the amount of deviation (error) for each cell: subtract the expected cell frequency from the observed cell frequency. Each cell's contribution to the Chi-square statistic is then obtained by squaring this difference and dividing it by the expected frequency of that cell.


Finally, the aggregate Chi-square statistic is computed by summing these contributions over all cells. The formula is

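$$\chi^2=\sum_{i=1}^{R}\sum_{j=1}^{C} \frac{\left(O_{ij}-E_{ij}\right)^2}{E_{ij}}$$

where $O_{ij}$ and $E_{ij}$ are the observed and expected frequencies in row $i$ and column $j$.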

To carry out the test, we need the degrees of freedom, the level of significance, and the corresponding critical value from the $\chi^2$ table. The degrees of freedom for a contingency table are computed as
$$df=(\text{number of rows} - 1)(\text{number of columns} - 1) = (R-1)(C-1).$$
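Putting the steps together, the following sketch computes the statistic and $df = (R-1)(C-1)$ for an $R \times C$ table and compares it with a tabulated critical value. The helper `chi_square_independence`, the counts, and the choice $\alpha = 0.05$ are assumptions for illustration; 5.991 is the standard $\chi^2$ table value for $\alpha = 0.05$ with 2 degrees of freedom.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected  # cell contribution
    df = (len(table) - 1) * (len(table[0]) - 1)           # (R - 1)(C - 1)
    return chi2, df


observed = [[10, 20, 30],
            [15, 15, 10]]   # hypothetical counts
chi2, df = chi_square_independence(observed)
CRITICAL_VALUE = 5.991      # chi-square table value for alpha = 0.05, df = 2
print(round(chi2, 3), df, chi2 > CRITICAL_VALUE)  # 8.036 2 True
```

Here the statistic is exactly $675/84 \approx 8.036$, which exceeds 5.991, so for these hypothetical counts the hypothesis of no association would be rejected at the 0.05 level.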

For further details about the contingency table as a measure of association, including an example of how to compute expected frequencies and the Chi-square statistic, see the video lecture.
