Violation of OLS Assumptions - Statistics for Data Science & Analytics

The Breusch Pagan test (named after Trevor Breusch and Adrian Pagan) is used to check for the presence of heteroscedasticity in a linear regression model.

Assume our regression model is $Y_i = \beta_1 + \beta_2 X_{2i} + u_i$, i.e., we have a simple linear regression model, and $E(u_i^2)=\sigma_i^2$, where $\sigma_i^2=f(\alpha_1 + \alpha_2 Z_{2i})$.

That is, $\sigma_i^2$ is some function of the non-stochastic variables $Z$. The function $f()$ allows for both linear and non-linear forms of the model. The variable $Z$ may be the independent variable $X$ itself, or it may represent a group of independent variables other than $X$.

Steps to Perform the Breusch Pagan Test

Estimate the model by OLS and obtain the residuals $\hat{u}_1, \hat{u}_2, \cdots, \hat{u}_n$.
Estimate the variance of the residuals, i.e., $\hat{\sigma}^2=\frac{\sum \hat{u}_i^2}{(n-2)}$.
Run the regression $\frac{\hat{u}_i^2}{\hat{\sigma}^2}=\alpha_1+\alpha_2 Z_{2i} + v_i$, where $v_i$ is the error term of this auxiliary regression, and compute the explained sum of squares (ESS) from this regression.
Test the statistical significance of $\frac{ESS}{2}$ by the $\chi^2$-test with 1 df at the appropriate level of significance ($\alpha$).
Reject the hypothesis of homoscedasticity in favour of heteroscedasticity if $\frac{ESS}{2}$ exceeds the critical value $\chi^2_{(1)}$ at the chosen level of $\alpha$.
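
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names `ols_simple` and `breusch_pagan` and the simulated data are hypothetical, $Z$ is taken to be $X$ itself unless another variable is passed, and the $(n-2)$ divisor follows the variance formula given in the steps (the original paper divides by $n$, which matters little in large samples). With 1 df, the upper-tail probability of the $\chi^2$ distribution at $x$ equals $\mathrm{erfc}(\sqrt{x/2})$, so only the standard library is needed.

```python
import math
import random


def ols_simple(x, y):
    """Fit y = b1 + b2 * x by OLS; return (b1, b2, fitted values)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    fitted = [b1 + b2 * xi for xi in x]
    return b1, b2, fitted


def breusch_pagan(x, y, z=None):
    """Return (ESS / 2, p-value) for the one-regressor Breusch Pagan test."""
    z = x if z is None else z
    n = len(x)
    # Step 1: OLS residuals of the original model.
    _, _, fitted = ols_simple(x, y)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Step 2: residual variance with the (n - 2) divisor used in the text.
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    # Step 3: regress the scaled squared residuals on Z and take the ESS.
    p = [e ** 2 / sigma2 for e in resid]
    _, _, p_fitted = ols_simple(z, p)
    p_bar = sum(p) / n
    ess = sum((pf - p_bar) ** 2 for pf in p_fitted)
    # Step 4: compare ESS / 2 with chi-square(1); for 1 df the upper-tail
    # probability is erfc(sqrt(stat / 2)).
    stat = ess / 2
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Simulated data (an assumption for illustration): error spread grows with x.
rng = random.Random(0)
x = [1 + 9 * i / 199 for i in range(200)]
y = [2 + 3 * xi + rng.gauss(0, 0.5 * xi) for xi in x]
stat, p_value = breusch_pagan(x, y)
print(f"ESS/2 = {stat:.3f}, p-value = {p_value:.4f}")
```

Because the simulated error standard deviation is proportional to $X$, the statistic should fall far into the rejection region here. With more than one $Z$ variable the auxiliary regression becomes a multiple regression, which this sketch does not cover.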


Note that:

The Breusch Pagan test is valid only if $u_i$’s are normally distributed.
For $k$ independent variables, $\frac{ESS}{2}$ has a Chi-square ($\chi^2$) distribution with $k$ degrees of freedom.
If the $u_i$’s (error term) are not normally distributed, the White test is used.

If heteroscedasticity is detected, remedies may include using robust standard errors, transforming the data, or employing weighted least squares estimation to adjust for heteroscedasticity.
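
As an illustration of the first remedy, the sketch below compares the conventional standard error of the slope in a simple regression with a heteroscedasticity-robust one. It assumes White's HC0 form, $\widehat{Var}(\hat{\beta}_2)=\frac{\sum (X_i-\bar{X})^2 \hat{u}_i^2}{\left[\sum (X_i-\bar{X})^2\right]^2}$, which is only one of several robust variants; the function name `slope_standard_errors` and the simulated data are hypothetical.

```python
import math
import random


def slope_standard_errors(x, y):
    """Return (slope, conventional SE, HC0 robust SE) for y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d ** 2 for d in dx)
    b2 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b1 = y_bar - b2 * x_bar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    # Conventional formula: assumes one common error variance.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)
    # HC0 formula: each squared residual stands in for its own variance.
    hc0_var = sum(d ** 2 * e ** 2 for d, e in zip(dx, resid)) / sxx ** 2
    se_robust = math.sqrt(hc0_var)
    return b2, se_conventional, se_robust


# Simulated data (an assumption for illustration): error spread grows with x.
rng = random.Random(1)
x = [1 + 9 * i / 199 for i in range(200)]
y = [2 + 3 * xi + rng.gauss(0, 0.5 * xi) for xi in x]
slope, se_conv, se_rob = slope_standard_errors(x, y)
print(f"slope = {slope:.3f}, conventional SE = {se_conv:.4f}, HC0 SE = {se_rob:.4f}")
```

The slope estimate is the same in both cases; only the standard error used for the t-test changes.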

The Breusch Pagan test is a useful tool for detecting heteroscedasticity in regression models, and so it helps to ensure the validity of statistical inference and estimation.

A sample of Stata output related to the Breusch-Pagan Test for the detection of heteroscedasticity.

By examining the p-value of the chi-squared test statistic from the second (auxiliary) regression, one can decide whether to reject the null hypothesis of homoscedasticity. If the p-value is lower than the chosen level of significance (say, 0.05), there is evidence of heteroscedasticity.

The following important points need to be considered when using the Breusch Pagan test of heteroscedasticity.

The Breusch-Pagan test can be sensitive to the normality of the error terms. Therefore, it is advisable to check whether the residuals are normally distributed before running the Breusch-Pagan test.
There are other tests for heteroscedasticity, but the Breusch-Pagan test is a widely used and relatively straightforward option.

References:

Breusch, T.S.; Pagan, A.R. (1979). “A Simple Test for Heteroscedasticity and Random Coefficient Variation”. Econometrica (The Econometric Society) 47 (5): 1287–1294.



Heteroscedasticity Definition

An important assumption of OLS is that the disturbances $u_i$ appearing in the population regression function are homoscedastic (the error terms all have the same variance).

The variance of each disturbance term $u_i$, conditional on the chosen values of explanatory variables, is some constant number equal to $\sigma^2$. $E(u_{i}^{2})=\sigma^2$; where $i=1,2,\cdots, n$.
Homo means equal and scedasticity means spread.

Consider the general linear regression model
\[y_i=\beta_1+\beta_2 x_{2i}+ \beta_3 x_{3i} +\cdots + \beta_k x_{ki} + \varepsilon_i\]

If $E(\varepsilon_{i}^{2})=\sigma^2$ for all $i=1,2,\cdots, n$ then the assumption of constant variance of the error term or homoscedasticity is satisfied.

If $E(\varepsilon_{i}^{2})=\sigma_i^2$ differs across observations, so that $E(\varepsilon_{i}^{2})\ne\sigma^2$ for at least some $i$, then the assumption of homoscedasticity is violated and heteroscedasticity is said to be present. In the case of heteroscedasticity, the OLS estimators are unbiased but inefficient.
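
The difference between the two cases can be seen in a small simulation. The sketch below is illustrative only: the error distributions, the sample size, and the helper names `variance` and `spread_ratio` are assumptions. It draws one set of errors with constant variance and one whose standard deviation is proportional to $x$, then compares the error variance in the upper half of the $x$ values with that in the lower half.

```python
import random


def variance(values):
    """Sample variance with the (n - 1) divisor."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def spread_ratio(errors):
    """Variance of the errors in the upper half of x over the lower half."""
    half = len(errors) // 2
    return variance(errors[half:]) / variance(errors[:half])


rng = random.Random(42)
n = 1000
x = [1 + 9 * i / (n - 1) for i in range(n)]  # regressor values, sorted ascending

# Homoscedastic errors: the same standard deviation for every observation.
homo = [rng.gauss(0, 1.0) for _ in x]
# Heteroscedastic errors: the standard deviation grows with x.
hetero = [rng.gauss(0, 0.5 * xi) for xi in x]

print(f"homoscedastic ratio   = {spread_ratio(homo):.2f}")
print(f"heteroscedastic ratio = {spread_ratio(hetero):.2f}")
```

A ratio close to 1 is what constant variance implies; a ratio well above 1 indicates that the spread of the errors increases with $x$.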

Examples:

The range in family income between the poorest and richest families in town is the classical example of heteroscedasticity.
The range in annual sales between a corner drug store and a general store.


Reasons for Heteroscedasticity

There are several reasons why the variances of error term $u_i$ may be variable, some of which are:

Following the error-learning models, as people learn, their errors of behavior become smaller over time. In this case, $\sigma_{i}^{2}$ is expected to decrease. For example, as the hours put into typing practice increase, the number of typing errors made in a given period on a test, and the variability of that number, tend to fall.
As income grows, people have more discretionary income, and hence $\sigma_{i}^{2}$ is likely to increase with income.
As data-collecting techniques improve, $\sigma_{i}^{2}$ is likely to decrease.
Heteroscedasticity can also arise as a result of the presence of outliers. The inclusion or exclusion of such observations, especially when the sample size is small, can substantially alter the results of regression analysis.
Heteroscedasticity can also arise from violating the CLRM (classical linear regression model) assumption that the regression model is correctly specified, for example, when an important variable is omitted.
Skewness in the distribution of one or more regressors included in the model is another source of heteroscedasticity.
Incorrect data transformation and incorrect functional form (linear or log-linear model) are also sources of heteroscedasticity.

Consequences of Heteroscedasticity

The OLS estimators and regression predictions based on them remain unbiased and consistent.
The OLS estimators are no longer the BLUE (Best Linear Unbiased Estimators) because they are no longer efficient, so the regression predictions will be inefficient too.
Because the usual estimator of the covariance matrix of the estimated regression coefficients is biased and inconsistent, the usual tests of hypotheses (t-test, F-test) are no longer valid.

Note: Problems of heteroscedasticity are likely to be more common in cross-sectional than in time series data.

References:
Greene, W.H. (1993). Econometric Analysis, Prentice–Hall, ISBN 0-13-013297-7.
Verbeek, Marno (2004). A Guide to Modern Econometrics, 2. ed., Chichester: John Wiley & Sons.
Gujarati, D. N. & Porter, D. C. (2008). Basic Econometrics, 5. ed., McGraw Hill/Irwin.

FAQs about Heteroscedasticity

Define heteroscedasticity.
What are the major consequences that may occur if heteroscedasticity is present?
What is meant by the constant variance of the error term in linear regression models?
What are the possible reasons that make the error term variance a variable?
In what kind of data are problems of heteroscedasticity likely to exist?
