Heteroscedasticity Definition
An important assumption of OLS is that the disturbances $u_i$ appearing in the population regression function are homoscedastic, that is, all error terms have the same variance.
The variance of each disturbance term $u_i$, conditional on the chosen values of the explanatory variables, is a constant equal to $\sigma^2$. Because the disturbances have zero mean, this can be written as $E(u_{i}^{2})=\sigma^2$, where $i=1,2,\cdots, n$.
Homo means equal and scedasticity means spread.
Consider the general linear regression model
\[y_i=\beta_1+\beta_2 x_{2i}+ \beta_3 x_{3i} +\cdots + \beta_k x_{ki} + \varepsilon_i\]
If $E(\varepsilon_{i}^{2})=\sigma^2$ for all $i=1,2,\cdots, n$ then the assumption of constant variance of the error term or homoscedasticity is satisfied.
If instead the variance differs across observations, $E(\varepsilon_{i}^{2})=\sigma_{i}^{2}$, then the assumption of homoscedasticity is violated and heteroscedasticity is said to be present. In the case of heteroscedasticity, the OLS estimators remain unbiased but are inefficient.
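The contrast between the two cases can be sketched with a small simulation. The following is only an illustration under assumed settings: the regressor range, the constant $\sigma=2$, and the form $\sigma_i = 0.5\,x_i$ for the heteroscedastic case are hypothetical choices, not values taken from any data set.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

n = 2000
x = [random.uniform(1.0, 10.0) for _ in range(n)]

# Homoscedastic errors: the standard deviation is the same for every i.
sigma = 2.0
eps_homo = [random.gauss(0.0, sigma) for _ in x]

# Heteroscedastic errors: the standard deviation grows with x_i
# (assumed form sigma_i = 0.5 * x_i, chosen only for illustration).
eps_hetero = [random.gauss(0.0, 0.5 * xi) for xi in x]


def variance_by_half(x, eps):
    """Return the sample variance of eps for small-x and for large-x observations."""
    cutoff = statistics.median(x)
    low = [e for xi, e in zip(x, eps) if xi <= cutoff]
    high = [e for xi, e in zip(x, eps) if xi > cutoff]
    return statistics.variance(low), statistics.variance(high)


homo_low, homo_high = variance_by_half(x, eps_homo)
hetero_low, hetero_high = variance_by_half(x, eps_hetero)

print(f"homoscedastic:   low-x var = {homo_low:.2f}, high-x var = {homo_high:.2f}")
print(f"heteroscedastic: low-x var = {hetero_low:.2f}, high-x var = {hetero_high:.2f}")
```

In the homoscedastic case the two variances should be close to each other, while in the heteroscedastic case the variance for large $x$ should be clearly larger than for small $x$.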
Examples:
- The range in family income between the poorest and richest families in town is the classical example of heteroscedasticity.
- The range in annual sales between a corner drug store and a general store is another example.

Reasons for Heteroscedasticity
There are several reasons why the variance of the error term $u_i$ may differ across observations, some of which are:
- Following error-learning models, as people learn, their errors of behavior become smaller over time. In this case $\sigma_{i}^{2}$ is expected to decrease. For example, as the number of hours of typing practice increases, both the average number of typing errors made on a test in a given period and their variance tend to fall.
- As income grows, people have more discretionary income and therefore more choice about how to spend or save it, so $\sigma_{i}^{2}$ is likely to increase with income.
- As data-collecting techniques improve, $\sigma_{i}^{2}$ is likely to decrease.
- Heteroscedasticity can also arise as a result of the presence of outliers. The inclusion or exclusion of such observations, especially when the sample size is small, can substantially alter the results of regression analysis.
- Heteroscedasticity can also arise from violating the CLRM (classical linear regression model) assumption that the regression model is correctly specified, for example when an important variable is omitted.
- Skewness in the distribution of one or more regressors included in the model is another source of heteroscedasticity.
- Incorrect data transformation and incorrect functional form (for example, a linear model where a log-linear model is appropriate) are also sources of heteroscedasticity.
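The functional-form point in the list above can be illustrated with a hedged sketch. It assumes a hypothetical data-generating process in which $\ln y$ is linear in $x$ with constant-variance errors, then compares the residual spread from a regression in levels ($y$ on $x$) with that from the log-linear regression ($\ln y$ on $x$). The helper names `ols_fit` and `residual_variance_ratio` and all coefficient values are assumptions made for this example.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible


def ols_fit(x, y):
    """Simple-regression OLS: return (intercept, slope)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_variance_ratio(x, y):
    """Fit OLS, then return var(residuals | large x) / var(residuals | small x)."""
    b1, b2 = ols_fit(x, y)
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    cutoff = statistics.median(x)
    low = [r for xi, r in zip(x, resid) if xi <= cutoff]
    high = [r for xi, r in zip(x, resid) if xi > cutoff]
    return statistics.variance(high) / statistics.variance(low)


# Assumed process: ln(y) = 1.0 + 0.5 * x + error, with constant error variance.
n = 2000
x = [random.uniform(0.0, 4.0) for _ in range(n)]
log_y = [1.0 + 0.5 * xi + random.gauss(0.0, 0.3) for xi in x]
y = [math.exp(v) for v in log_y]

ratio_levels = residual_variance_ratio(x, y)    # misspecified: y regressed on x
ratio_logs = residual_variance_ratio(x, log_y)  # correctly specified: ln(y) on x

print(f"levels model, high-x / low-x residual variance: {ratio_levels:.2f}")
print(f"log model,    high-x / low-x residual variance: {ratio_logs:.2f}")
```

A ratio well above 1 for the levels model, next to a ratio near 1 for the log model, suggests that the apparent heteroscedasticity in this simulated example comes from the wrong functional form rather than from the errors themselves.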

Consequences of Heteroscedasticity
- The OLS estimators and regression predictions based on them remain unbiased and consistent.
- The OLS estimators are no longer BLUE (Best Linear Unbiased Estimators) because they are no longer efficient, so the regression predictions will be inefficient too.
- Because the usual OLS estimator of the covariance matrix of the estimated regression coefficients is biased and inconsistent, the standard tests of hypotheses (t-test, F-test) are no longer valid.
Note: Problems of heteroscedasticity are likely to be more common in cross-sectional than in time series data.
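The consequences listed above can be checked with a small Monte Carlo sketch. It assumes a simple regression with hypothetical true coefficients, regressor values fixed on a grid, and an error standard deviation of the assumed form $\sigma_i = 0.1\,x_i^2$. For each simulated sample it records the OLS slope and the conventional standard error $\sqrt{s^2/\sum (x_i-\bar{x})^2}$, which is derived under homoscedasticity.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

n = 100
beta1, beta2 = 2.0, 1.5                          # assumed true coefficients
x = [1.0 + 9.0 * i / (n - 1) for i in range(n)]  # fixed regressor values on [1, 10]
sigma = [0.1 * xi ** 2 for xi in x]              # assumed sd of u_i, rising with x_i

x_bar = statistics.fmean(x)
dev = [xi - x_bar for xi in x]
sxx = sum(d * d for d in dev)


def one_sample():
    """Draw one sample; return (OLS slope, conventional standard error of the slope)."""
    y = [beta1 + beta2 * xi + random.gauss(0.0, si) for xi, si in zip(x, sigma)]
    y_bar = statistics.fmean(y)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)     # usual estimate of sigma^2
    return slope, math.sqrt(s2 / sxx)


draws = [one_sample() for _ in range(2000)]
slopes = [d[0] for d in draws]

mean_slope = statistics.fmean(slopes)            # should be close to beta2
empirical_sd = statistics.stdev(slopes)          # actual sampling variability
mean_se_conventional = statistics.fmean(d[1] for d in draws)

print(f"true slope = {beta2}, mean OLS slope = {mean_slope:.3f}")
print(f"empirical sd of slope = {empirical_sd:.3f}")
print(f"mean conventional SE  = {mean_se_conventional:.3f}")
```

Under these assumptions the average slope should sit close to the true value, consistent with unbiasedness, while the conventional standard error should understate the actual sampling variability of the slope. That understatement is what makes the usual t and F statistics unreliable.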
FAQs about Heteroscedasticity
- Define heteroscedasticity.
- What are the major consequences of heteroscedasticity?
- What is meant by constant variance of the error term in linear regression models?
- What are the possible reasons for the variance of the error term to vary across observations?
- In what kind of data are problems of heteroscedasticity likely to exist?