Checking Normality of Error Term (2019)

Normality of Error Term

In multiple linear regression models, the sum of squared residuals (SSR) divided by $n-p$ (the degrees of freedom, where $n$ is the total number of observations and $p$ is the number of parameters in the model) is a good estimate of the error variance. In the multiple linear regression model, the residual vector is

\[\mathbf{e}=(I-H)\mathbf{y}\]

where $H$ is the hat matrix for the regression model.

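Because $(I-H)X=0$, the residual vector can also be written as $\mathbf{e}=(I-H)\boldsymbol{\varepsilon}$, so each component is $e_i=\varepsilon_i - \sum\limits_{j=1}^n h_{ij} \varepsilon_j$. Each residual is therefore a linear combination of all the errors, and in multiple linear regression models the normality of the residuals is not simply the normality of the error term.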

Note that:

\[Cov(\mathbf{e})=(I-H)\sigma^2 (I-H)' = (I-H)\sigma^2\]

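The second equality holds because $I-H$ is symmetric and idempotent. Taking the $i$th diagonal element, we can write $Var(e_i)=(1-h_{ii})\sigma^2$, where $h_{ii}$ is the $i$th diagonal element of $H$.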

If the sample size ($n$) is much larger than the number of parameters ($p$) in the model (i.e., $n \gg p$), then $h_{ii}$ will be small compared to 1, and $Var(e_i) \approx \sigma^2$. This follows because the $h_{ii}$ sum to $p$, so their average value is $p/n$.

In multiple regression models, a residual behaves like an error if the sample size is large. However, this is not true for a small sample size.

Therefore, it is unreliable to check the normality assumption of the error term using residuals from multiple linear regression models when the sample size is small.
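The effect of sample size on $h_{ii}$ can be illustrated numerically. The sketch below is not from the original post: it assumes the simplest special case, a straight-line model with an intercept ($p=2$), for which $h_{ii} = 1/n + (x_i-\bar{x})^2/S_{xx}$, and it uses simulated $x$ values. The function name `leverages` is hypothetical.

```python
import random

def leverages(x):
    """Diagonal elements h_ii of the hat matrix for the model y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]

random.seed(0)  # simulated predictor values, for illustration only
max_leverage = {}
for n in (5, 50, 500):
    x = [random.uniform(0, 10) for _ in range(n)]
    h = leverages(x)
    max_leverage[n] = max(h)
    # sum(h) equals p = 2, so the average leverage is 2 / n.
    # 1 - max(h) is the smallest factor multiplying sigma^2 in Var(e_i).
    print(n, round(sum(h), 6), round(max(h), 4), round(1 - max(h), 4))
```

As $n$ grows with $p$ fixed, the largest $h_{ii}$ shrinks toward zero, so every $Var(e_i)$ moves toward $\sigma^2$.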


Learn more about the hat matrix: Role of Hat matrix in Diagnostics of Regression Analysis.

Chi Square Goodness of Fit Test (2019)

The post is about the Chi Square Goodness of Fit Test.

One application of the $\chi^2$ distribution is the test of goodness of fit. Using the $\chi^2$ distribution, it is possible to test the hypothesis that a population has a specified theoretical distribution. The theoretical distribution may be Normal, Binomial, Poisson, or any other distribution.

The Chi-Square Goodness of Fit Test enables us to check whether there is a significant difference between an observed frequency distribution and a theoretical (expected) frequency distribution based on some theoretical model, that is, how well the model fits the distribution of the data we have observed. A goodness of fit test between observed and expected frequencies is based upon

\[\chi^2 = \sum\limits_{i=1}^k \left[ \frac{(OF_i - EF_i)^2}{EF_i} \right] \]

where $OF_i$ represents the observed frequency and $EF_i$ the expected frequency for the $i$th class, and $k$ is the number of possible outcomes or the number of different classes.
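As a minimal sketch of this formula, the snippet below computes the statistic for a made-up example: 120 rolls of a die tested against a fair-die model, so each expected frequency is 20. The data and the function name `chi_square_statistic` are illustrative assumptions, not part of the original post.

```python
def chi_square_statistic(observed, expected):
    """Sum of (OF_i - EF_i)^2 / EF_i over the k classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: counts of faces 1..6 in 120 rolls of a die.
observed = [22, 17, 20, 26, 22, 13]
expected = [20] * 6  # fair die: 120 / 6 per face

stat = chi_square_statistic(observed, expected)
print(round(stat, 2))  # 5.1
```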

Degrees of Freedom (Chi Square Goodness of Fit Test)

It is important to note that

  • The computed $\chi^2$ value will be small if the observed frequencies are close to the corresponding expected frequencies, indicating a good fit.
  • The computed $\chi^2$ value will be large if the observed and expected frequencies differ greatly, indicating a poor fit.
  • A good fit leads to the acceptance of the null hypothesis that the sample distribution agrees with the hypothetical or theoretical distribution.
  • A bad fit leads to the rejection of the null hypothesis.

Critical Region (Chi Square Goodness of Fit Test)

The critical region under the $\chi^2$ curve falls in the right tail of the distribution. We find the critical value $\chi^2_{\alpha}$ from the table for a specified level of significance $\alpha$ and $v$ degrees of freedom.

If the computed $\chi^2$ value is greater than the critical value $\chi^2_{\alpha}$, the null hypothesis will be rejected. Thus $\chi^2> \chi^2_{\alpha}$ constitutes the critical region.
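The decision rule can be sketched as follows. The critical values are the standard tabulated upper-tail points of the $\chi^2$ distribution for $\alpha = 0.05$. The computed values 5.1 and 12.5 are hypothetical, and taking $v = k - 1 = 5$ assumes six classes with no parameters estimated from the data.

```python
# Upper-tail critical values of chi-square at alpha = 0.05, from a standard table.
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def reject_null(chi_square, df):
    """True when the computed value falls in the right-tail critical region."""
    return chi_square > CRITICAL_005[df]

print(reject_null(5.1, df=5))   # False: 5.1 < 11.070, do not reject
print(reject_null(12.5, df=5))  # True: 12.5 > 11.070, reject
```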


Some Requirements

The Chi Square Goodness of Fit test should not be applied unless each of the expected frequencies is at least equal to 5. When several classes have smaller expected frequencies, these classes should be combined (merged). The total frequency should not be less than fifty.
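One possible way to merge classes is sketched below. It assumes the classes are ordered, so that combining neighbours is meaningful, and it uses made-up frequencies. The helper name `merge_small_classes` is hypothetical. After merging, $k$ is the number of merged classes.

```python
def merge_small_classes(observed, expected, minimum=5.0):
    """Combine adjacent classes until every expected frequency is >= minimum."""
    merged_obs, merged_exp = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            merged_obs.append(acc_o)
            merged_exp.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_e > 0:
        if merged_exp:
            # A leftover tail that is still too small joins the last class.
            merged_obs[-1] += acc_o
            merged_exp[-1] += acc_e
        else:
            merged_obs.append(acc_o)
            merged_exp.append(acc_e)
    return merged_obs, merged_exp

# Hypothetical frequencies for six ordered classes (total 90).
observed = [30, 28, 18, 9, 3, 2]
expected = [27.1, 32.5, 19.5, 7.8, 2.3, 0.8]

obs_m, exp_m = merge_small_classes(observed, expected)
print(obs_m)                          # [30, 28, 18, 14]
print([round(e, 1) for e in exp_m])   # [27.1, 32.5, 19.5, 10.9]
```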

Note that we must look with suspicion upon circumstances where $\chi^2$ is too close to zero, since it is rare that observed frequencies agree almost perfectly with expected frequencies. To examine such situations, we can check whether the computed value of $\chi^2$ is less than $\chi^2_{0.95}$ (the value with 95% of the area to its right); if it is, we conclude that the agreement is too good at the 0.05 level of significance.
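A minimal sketch of this lower-tail check, using the standard tabulated values of $\chi^2_{0.95}$. The computed values 0.4 and 5.1 and the choice of 5 degrees of freedom are hypothetical.

```python
# Tabulated chi-square values with 95% of the area to their right.
LOWER_CRITICAL_095 = {1: 0.00393, 2: 0.103, 3: 0.352, 4: 0.711, 5: 1.145}

def fit_is_suspiciously_good(chi_square, df):
    """True when the agreement is too good at the 0.05 level."""
    return chi_square < LOWER_CRITICAL_095[df]

print(fit_is_suspiciously_good(0.4, df=5))  # True: 0.4 < 1.145
print(fit_is_suspiciously_good(5.1, df=5))  # False
```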


MCQs Introductory Statistics 3

The post is about MCQs on Introductory Statistics. There are 25 multiple-choice questions covering topics related to measures of dispersion, measures of central tendency, and mean deviation. Let us start the MCQs introductory statistics quiz with answers.

Online MCQs about Basic Statistics with Answers

1. The lowest value of variance can be


2. Variance is always calculated from


3. The range of the values -5, -8, -10, 0, 6, 10 is


4. If the standard deviation of the values 2, 4, 6, and 8 is 2.58, then the standard deviation of the values 4, 6, 8, and 10 is


5. Mean Deviation, Variance, and Standard Deviation of the values 4, 4, 4, 4, 4, 4 is


6. The variance of 5 numbers is 10. If each number is divided by 2, then the variance of new numbers is


7. If all values are the same then the measure of dispersion will be


8. If $Y=-8X-5$ and SD of $X$ is 3, then SD of $Y$ is


9. $Var(2X+3)\,$ is


10. The sum of squares of deviation is least if measured from


11. The sum of squared deviations of a set of $n$ values from their mean is


12. If $X$ and $Y$ are independent then $SD(X-Y)$ is


13. The standard deviation is always _________ than the mean deviation


14. The measure of dispersion is changed by a change of


15. Which of these is a relative measure of dispersion


16. The mean deviation of the values, 18, 12, and 15 is


17. Standard deviation is calculated from the Harmonic Mean (HM)


18. For the symmetrical distribution, approximately 68% of the cases are included between


19. Variance remains unchanged by the change of


20. The percentage of values lies between $\overline{X}\pm 2 SD\,$ is


21. A measure of dispersion is always


22. Suppose for 40 observations, the variance is 50. If all the observations are increased by 20, the variance of these increased observations will be


23. If $a$ and $b$ are two constants, then $Var(a + bX)\,$ is


24. The measure of Dispersion can never be


25. The variance of a constant is
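Several of the questions above depend on how the variance and standard deviation respond to shifting and scaling. The sketch below checks those properties numerically with Python's `statistics` module, using the values 2, 4, 6, 8 from question 4. It is an illustration added here, not part of the quiz, and it uses the sample ($n-1$) formulas.

```python
import statistics

x = [2, 4, 6, 8]
sd_x = statistics.stdev(x)         # sample SD, about 2.58
var_x = statistics.variance(x)

shifted = [v + 2 for v in x]       # adding a constant: 4, 6, 8, 10
halved = [v / 2 for v in x]        # dividing by a constant
y = [-8 * v - 5 for v in x]        # linear transformation Y = -8X - 5

print(round(sd_x, 2))
print(statistics.stdev(shifted) == sd_x)           # a shift leaves the SD unchanged
print(statistics.variance(halved), var_x / 4)      # Var(X/2) = Var(X) / 4
print(statistics.stdev(y), 8 * sd_x)               # SD(a + bX) = |b| SD(X)
print(statistics.pvariance([4, 4, 4, 4, 4, 4]))    # identical values: variance 0
```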


MCQs Introductory Statistics with Answers
  • A measure of dispersion is always
  • Which of these is a relative measure of dispersion
  • The measure of spread/dispersion is changed by a change of
  • Mean Deviation, Variance, and Standard Deviation of the values 4, 4, 4, 4, 4, 4 is
  • The mean deviation of the values, 18, 12, and 15 is
  • The sum of squares of deviation is least if measured from
  • The sum of squared deviations of a set of $n$ values from their mean is
  • Variance is always calculated from
  • The lowest value of variance can be
  • The variance of a constant is
  • Variance remains unchanged by the change of
  • $Var(2X+3)\,$ is
  • If $a$ and $b$ are two constants, then $Var(a + bX)\,$ is
  • Suppose for 40 observations, the variance is 50. If all the observations are increased by 20, the variance of these increased observations will be
  • Standard deviation is calculated from the Harmonic Mean (HM)
  • The variance of 5 numbers is 10. If each number is divided by 2, then the variance of new numbers is
  • If $X$ and $Y$ are independent then $SD(X-Y)$ is
  • If $Y=-8X-5$ and SD of $X$ is 3, then SD of $Y$ is
  • The standard deviation is always _________ than the mean deviation
  • If the standard deviation of the values 2, 4, 6, and 8 is 2.58, then the standard deviation of the values 4, 6, 8, and 10 is
  • For the symmetrical distribution, approximately 68% of the cases are included between
  • The percentage of values lies between $\overline{X}\pm 2 SD\,$ is
  • The measure of Dispersion can never be
  • If all values are the same then the measure of dispersion will be
  • The range of the values -5, -8, -10, 0, 6, 10 is