Testing of Hypothesis - Statistics for Data Science & Analytics

Chi Square Goodness of Fit Test (2019)

Aug 2, 2024 · Aug 8, 2019 by Muhammad Imdad Ullah

The post is about the Chi Square Goodness of Fit Test.

One application of the $\chi^2$ distribution is the test of goodness of fit. Using the $\chi^2$ distribution, it is possible to test the hypothesis that a population has a specified theoretical distribution. The theoretical distribution may be Normal, Binomial, Poisson, or any other distribution.

The Chi-Square Goodness of Fit Test enables us to check whether there is a significant difference between an observed frequency distribution and a theoretical (expected) frequency distribution based on some theoretical model, that is, how well the model fits the distribution of the data we have observed. A goodness of fit test between observed and expected frequencies is based upon the statistic

\[\chi^2 = \sum\limits_{i=1}^k \left[ \frac{(OF_i - EF_i)^2}{EF_i} \right] \]

where $OF_i$ represents the observed and $EF_i$ the expected frequency for the $i$th class, and $k$ is the number of possible outcomes or the number of different classes.
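The statistic above can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square_statistic` and the die-rolling counts are made up for the example and do not come from any real data set.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (OF_i - EF_i)^2 / EF_i over the k classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a die rolled 120 times, so a fair die
# would be expected to show each face 20 times.
observed = [20, 22, 17, 18, 19, 24]
expected = [20] * 6

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 4))
```

Here the squared differences are 0, 4, 9, 4, 1, and 16, so the statistic is 34/20 = 1.7, a small value that points toward a good fit.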

Degrees of Freedom (Chi Square Goodness of Fit Test)

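For a goodness of fit test, the degrees of freedom are $v = k - 1 - m$, where $m$ is the number of population parameters estimated from the sample data to obtain the expected frequencies. It is important to note that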

The computed $\chi^2$ value will be small if the observed frequencies are close to the corresponding expected frequencies, indicating a good fit.
The computed $\chi^2$ value will be large if the observed and expected frequencies differ greatly, indicating a poor fit.
A good fit means that the null hypothesis (that the sample distribution agrees with the hypothetical or theoretical distribution) is not rejected.
A bad fit leads to the rejection of the null hypothesis.

Critical Region (Chi Square Goodness of Fit Test)

The critical region under the $\chi^2$ curve will fall in the right tail of the distribution. We find the critical value of $\chi^2_{\alpha}$ from the table for a specified level of significance $\alpha$ and $v$ degrees of freedom.

Decision

If the computed $\chi^2$ value is greater than the critical value $\chi^2_{\alpha}$, the null hypothesis will be rejected. Thus $\chi^2> \chi^2_{\alpha}$ constitutes the critical region.
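The decision rule can be sketched as follows. The statistic (1.7 with 5 degrees of freedom) is a hypothetical value from rolling a die 120 times, and the critical value 11.070 is the usual table entry for $\alpha = 0.05$ and 5 degrees of freedom. The helper `chi2_upper_tail` is an illustrative, standard-library-only approximation of the right-tail area, based on a series for the incomplete gamma function. It is not taken from any statistics package, and it may lose accuracy far out in the tail.

```python
import math


def chi2_upper_tail(x, df):
    """Approximate P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the series expansion of the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower_tail = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return 1.0 - lower_tail


def decide(statistic, critical_value):
    """Right-tailed test: reject H0 only when the statistic exceeds the critical value."""
    return "reject H0" if statistic > critical_value else "do not reject H0"


statistic = 1.7          # hypothetical computed chi-square value
df = 5                   # k - 1 = 6 - 1, no parameters estimated
critical_value = 11.070  # table value for alpha = 0.05 and 5 degrees of freedom

decision = decide(statistic, critical_value)
p_value = chi2_upper_tail(statistic, df)
print(decision, round(p_value, 3))
```

Comparing the statistic with the table value and comparing the right-tail area with $\alpha$ lead to the same decision. The table lookup is simply the by-hand version of the same rule.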

Some Requirements

The Chi Square Goodness of Fit test should not be applied unless each of the expected frequencies is at least 5. When several classes have smaller expected frequencies, these classes should be combined (merged). The total number of frequencies should not be less than fifty.
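One possible way to merge classes is sketched below. The rule here pools neighbouring classes from left to right until the pooled expected frequency reaches 5, which is an assumption of this example; other groupings may be more sensible for a given problem. The function name and the frequencies are hypothetical.

```python
def merge_small_classes(observed, expected, minimum=5.0):
    """Pool adjacent classes until every pooled expected frequency is >= minimum."""
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= minimum:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp = 0.0, 0.0
    if acc_exp > 0:
        # A leftover tail that is still too small joins the last pooled class.
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


# Hypothetical frequencies; the first, fifth, and sixth classes are too small.
observed = [2, 8, 15, 20, 4, 1]
expected = [3, 9, 14, 18, 4, 2]

new_observed, new_expected = merge_small_classes(observed, expected)
print(new_observed, new_expected)
print("total frequency:", sum(new_expected))
```

After merging, $k$ is the number of pooled classes (four here instead of six), so the degrees of freedom must be computed from the merged classes.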

Note that we must look with suspicion upon circumstances where $\chi^2$ is too close to zero since it is rare that observed frequencies agree well with expected frequencies. To examine such situations, we can determine whether the computed value of $\chi^2$ is less than $\chi^2_{0.95}$ to decide that the agreement is too good at the 0.05 level of significance.
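A minimal sketch of this "too good to be true" check is given below. The value 1.145 is the usual table entry for $\chi^2_{0.95}$ with 5 degrees of freedom; the two statistics being checked are hypothetical.

```python
def fit_too_good(statistic, lower_critical_value):
    """Flag agreement that is suspiciously close: statistic below chi^2_{0.95}."""
    return statistic < lower_critical_value


lower_critical_value = 1.145  # table value of chi^2_{0.95} for 5 degrees of freedom

ordinary_fit = fit_too_good(1.7, lower_critical_value)    # hypothetical statistic
suspicious_fit = fit_too_good(0.6, lower_critical_value)  # hypothetical statistic
print(ordinary_fit, suspicious_fit)
```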

Cohen Effect Size and Statistical Significance

May 24, 2024 · Apr 25, 2019 by Muhammad Imdad Ullah

Statistical significance is important, but it is not the most important consideration in evaluating results, because it only indicates how likely results like those observed would be if chance alone were at work. When a result is statistically significant, it is therefore important to consider the effect size as well.

Effect size is a quantitative measure of some phenomenon. For example,

Correlation between two variables
The regression coefficients of a regression model, for example, the coefficients $\beta_1, \beta_2, \cdots$
The mean difference between two or more groups
The risk with which something happens

The effect size plays an important role in power analysis, sample size planning, and meta-analysis.

Since effect size indicates how strong (or important) our results are, the effect size should also be reported whenever you report the statistical significance of an inferential test.

For the difference in means, the pooled standard deviation (also called the combined standard deviation, obtained from the pooled variance) is used to standardize the difference when computing the effect size.

The Cohen Effect Size for the Difference in Means

The effect size ($d$) for the difference in means by Cohen is

$d=\frac{\text{mean of group 1} - \text{mean of group 2}}{SD_{pooled}}$

Cohen provided rough guidelines for interpreting the effect size.

If $d=0.2$, the effect size is considered small.

For $d=0.5$, the effect size is considered medium.

If $d=0.8$, the effect size is considered large.
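The computation of $d$ can be sketched in Python as follows. The two groups are hypothetical scores, and the pooled standard deviation is assumed to be the usual one built from the two sample variances weighted by their degrees of freedom. Treating Cohen's benchmarks as lower cut-offs in `label_effect` is a convenience of this sketch, not a rule stated by Cohen.

```python
import math
import statistics


def pooled_sd(group1, group2):
    """Pooled standard deviation from the two sample variances."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    return math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))


def cohen_d(group1, group2):
    """Difference in group means divided by the pooled standard deviation."""
    difference = statistics.mean(group1) - statistics.mean(group2)
    return difference / pooled_sd(group1, group2)


def label_effect(d):
    """Rough label based on Cohen's 0.2 / 0.5 / 0.8 benchmarks (used as cut-offs here)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below the small benchmark"


# Hypothetical scores for two independent groups.
group1 = [5, 6, 7, 8, 9]
group2 = [3, 4, 5, 6, 7]

d = cohen_d(group1, group2)
print(round(d, 4), label_effect(d))
```

In this example both groups have a sample variance of 2.5, so the pooled standard deviation is $\sqrt{2.5}$ and $d = 2/\sqrt{2.5} \approx 1.26$.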

Note that statistical significance is not the same as the effect size. Statistical significance indicates how compatible the result is with chance alone, while the effect size tells how large, and hence how practically important, the result is.

Also note that statistical significance is not the same as economic, human, or scientific significance.

For the effect size of the dependent sample $t$-test, see the post on effect size for the dependent sample $t$-test.

Performing Chi Square test from Crosstabs in SPSS

Apr 7, 2024 · Apr 20, 2019 by Muhammad Imdad Ullah

In this post, we will learn about “performing Chi Square Test” in SPSS Statistics Software. For this purpose, the Crosstabs procedure under Descriptive Statistics in the Analyze menu of SPSS is used to create contingency tables (also known as two-way frequency tables or cross-tabulations), which describe the association between two categorical variables.

In a crosstab, the categories of one variable determine the rows of the contingency table, and the categories of the other variable determine the columns. The contingency table dimensions can be reported as $R\times C$, where $R$ is the number of categories of the row variable, and $C$ is the number of categories of the column variable. Additionally, a “square” crosstab is one in which the row and column variables have the same number of categories. Tables of dimensions $2 \times 2$, $3\times 3$, $4\times 4$, etc., are all square crosstabs.
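To make the idea of an $R\times C$ table concrete, here is a small Python sketch that builds a crosstab from raw observations. It is not what SPSS does internally; the function `crosstab` and the eight made-up shoppers are purely illustrative.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(zip(row_values, col_values))
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


# Hypothetical raw data: one entry per shopper.
shopping = ["Weekly", "Monthly", "Weekly", "Rarely", "Monthly", "Weekly", "Rarely", "Weekly"]
purchase = ["Yes", "No", "Yes", "No", "Yes", "Yes", "No", "No"]

row_levels, col_levels, table = crosstab(shopping, purchase)
n_rows, n_cols = len(row_levels), len(col_levels)
is_square = n_rows == n_cols

print(col_levels)
for level, counts in zip(row_levels, table):
    print(level, counts)
print(f"dimensions: {n_rows} x {n_cols}, square: {is_square}")
```

With three row categories and two column categories, this table is $3\times 2$ and therefore not square.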

Performing Chi Square Test in SPSS

Let us start performing the Chi Square test on a cross-tabulation in SPSS. First, click Analyze from the main menu, then Descriptive Statistics, and then Crosstabs, as shown in the figure below.

Figure: Performing Chi Square Test Crosstabs in SPSS

As an example, we are using the “satisf.sav” data file that is already available in the SPSS installation folder. Suppose we are interested in finding the relationship between the “Shopping Frequency” and the “Made Purchase” variables. For this purpose, move one of the variables from the left pane to the right pane as row(s) and the other as column(s). Here, we are taking “Shopping Frequency” as the row(s) variable and “Made Purchase” as the column(s) variable. Pressing OK at this point will give the contingency table only.

The ROW(S) box is used to enter one or more variables to be used in the cross-table and Chi-Square statistics. Similarly, the COLUMN(S) box is used to enter one or more variables to be used in the cross-table and Chi-Square statistics. Note: at least one row variable and one column variable should be used.

The Layer box is used when you need to find the association between three or more variables. When a layer variable is specified, the crosstab between the row and the column variables will be created at each level of the layer variable. You can have multiple layers of variables by specifying the first layer variable and then clicking Next to specify the second layer variable. Alternatively, you can try out multiple variables as single layers, one at a time, by putting them all in the “Layer 1 of 1” box.

The STATISTICS button will lead to a dialog box that contains different inferential statistics for finding the association between categorical variables.

The CELLS button will lead to a dialog box that controls which output is displayed in each crosstab cell, such as observed frequency, expected frequency, percentages, residuals, etc., as shown below.
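To see what the expected frequencies and residuals in a cell mean, the following Python sketch computes them for a small hypothetical $2\times 2$ table. It assumes the usual definitions: expected count = (row total × column total) / grand total, and unstandardized residual = observed − expected. It is meant as an illustration only, not as a reproduction of SPSS output.

```python
def expected_counts(table):
    """Expected frequency of each cell if the row and column variables were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def residuals(table, expected):
    """Unstandardized residuals: observed minus expected, cell by cell."""
    return [[o - e for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, expected)]


def row_percentages(table):
    """Each cell as a percentage of its row total."""
    return [[100.0 * cell / sum(row) for cell in row] for row in table]


# Hypothetical observed counts (rows: two shopper groups, columns: No / Yes).
observed = [[20, 30],
            [30, 20]]

expected = expected_counts(observed)
resid = residuals(observed, expected)
row_pct = row_percentages(observed)

print(expected)
print(resid)
print(row_pct)
```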

To perform the Chi Square test on the selected variables, click on the “Statistics” button and choose (tick) the “Chi-Square” option at the top-left of the dialog box shown below. Note that the Chi-square check box must be ticked; otherwise, only a cross-table will be displayed.

Press the “Continue” button and then the OK button. We will get an output window containing the cross-tabulation and the Chi-Square statistics, as shown below.

The Chi-Square results indicate an association between the categories of the “Shopping Frequency” variable and the “Made Purchase” variable, since the p-value is smaller than, say, the 0.01 level of significance.
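For readers who want to see the arithmetic behind such a result, here is a rough Python sketch of the Pearson Chi-Square test of independence on a crosstab. The counts are hypothetical and are not taken from “satisf.sav”. The critical value 9.210 is the usual table entry for the 0.01 level with 2 degrees of freedom, and the degrees of freedom are taken as $(R-1)(C-1)$.

```python
def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts: rows are three shopping-frequency levels,
# columns are "No" and "Yes" for whether a purchase was made.
observed = [[30, 10],
            [25, 25],
            [15, 45]]

statistic, df = chi_square_independence(observed)
critical_value = 9.210  # table value for alpha = 0.01 and 2 degrees of freedom
association_found = statistic > critical_value

print(round(statistic, 3), df, association_found)
```

For these made-up counts, the statistic is about 24.44 with 2 degrees of freedom, well above 9.210, so the hypothesis of no association would be rejected at the 0.01 level.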
