Introduction

Heteroscedasticity in Regression

The term heteroscedasticity refers to a violation of the homoscedasticity assumption of the Linear Regression Model (LRM). Under heteroscedasticity, the errors have unequal variances for different levels of the regressors; the OLS estimators of the regression coefficients remain unbiased but are inefficient, and the usual standard errors are wrong. The disturbances in the Classical Linear Regression Model (CLRM) appearing in the population regression function should be homoscedastic; that is, they should all have the same variance.

Mathematical Proof of $E(\hat{\sigma}^2)\ne \sigma^2$ when Heteroscedasticity is Present in the Data

For the proof of $E(\hat{\sigma}^2)\ne \sigma^2$, consider the two-variable linear regression model in the presence of heteroscedasticity,

\begin{align}
Y_i=\beta_1 + \beta_2 X_i+ u_i, \quad\quad (eq1)
\end{align}

where $Var(u_i)=\sigma_i^2$ (the case of heteroscedasticity).

The usual estimator of the error variance is

\begin{align}
\hat{\sigma}^2 &= \frac{\sum \hat{u}_i^2 }{n-2}\\
&= \frac{\sum (Y_i - \hat{Y}_i)^2 }{n-2}\\
&=\frac{\sum(\beta_1 + \beta_2 X_i + u_i - \hat{\beta}_1 -\hat{\beta}_2 X_i )^2}{n-2}\\
&=\frac{\sum \left( -(\hat{\beta}_1-\beta_1) - (\hat{\beta}_2 - \beta_2)X_i + u_i \right)^2 }{n-2}\quad\quad (eq2)
\end{align}

Noting that the OLS residuals sum to zero,

\begin{align*}
\sum(Y_i-\hat{Y}_i)&=0\\
\sum\left(\beta_1 + \beta_2 X_i + u_i - \hat{\beta}_1 - \hat{\beta}_2X_i\right) &=0\\
\sum\left( -(\hat{\beta}_1 -\beta_1) - X_i(\hat{\beta}_2-\beta_2) + u_i \right) & =0\\
n(\hat{\beta}_1-\beta_1) &= -(\hat{\beta}_2-\beta_2)\sum X_i + \sum u_i\\
\text{Dividing both sides by } n:&\\
(\hat{\beta}_1 - \beta_1) &= -(\hat{\beta}_2-\beta_2)\overline{X}+\overline{u}
\end{align*}

Substituting this into (eq2) and taking expectations on both sides:

\begin{align}
E(\hat{\sigma}^2) &= \frac{1}{n-2} E\left[ \sum\left( -\left(-(\hat{\beta}_2 - \beta_2) \overline{X} + \overline{u}\,\right) - (\hat{\beta}_2-\beta_2)X_i + u_i \right)^2\right]\\
&=\frac{1}{n-2}E\left[\sum\left((\hat{\beta}_2-\beta_2)\overline{X} -\overline{u} - (\hat{\beta}_2-\beta_2)X_i + u_i \right)^2\right]\\
&=\frac{1}{n-2} E\left[ \sum\left( -(\hat{\beta}_2 - \beta_2)(X_i-\overline{X}) + (u_i-\overline{u})\right)^2\right]\\
&= \frac{1}{n-2}\left[-\sum x_i^2 \, Var(\hat{\beta}_2) + E\left(\sum(u_i-\overline{u})^2\right) \right]\\
&=\frac{1}{n-2} \left[ -\frac{\sum x_i^2 \sigma_i^2}{\sum x_i^2} + \frac{(n-1)\sum \sigma_i^2}{n} \right]
\end{align}

where $x_i = X_i-\overline{X}$. On squaring, the first term contributes $\sum x_i^2\,Var(\hat{\beta}_2)$ and the cross term contributes $-2\sum x_i^2\,Var(\hat{\beta}_2)$ in expectation (since $\hat{\beta}_2-\beta_2=\sum x_i u_i/\sum x_i^2$), which together yield the $-\sum x_i^2\,Var(\hat{\beta}_2)$ term.

If there is homoscedasticity, then $\sigma_i^2=\sigma^2$ for each $i$, and the expression above reduces to $E(\hat{\sigma}^2)=\frac{1}{n-2}\left[-\sigma^2+(n-1)\sigma^2\right]=\sigma^2$; that is, $\hat{\sigma}^2$ is unbiased.

In the presence of heteroscedasticity, however, the expected value of $\hat{\sigma}^2=\frac{\sum \hat{u}_i^2}{n-2}$ will not equal the true $\sigma^2$.
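The result can be checked numerically. Below is a minimal Monte Carlo sketch (assuming, purely for illustration, that $\sigma_i$ is proportional to $X_i$; all numbers are hypothetical): the simulated mean of $\hat{\sigma}^2$ agrees with the derived expression for $E(\hat{\sigma}^2)$, which is not any single "true" $\sigma^2$.

```python
# Minimal Monte Carlo sketch of E(sigma_hat^2) under heteroscedasticity.
# Assumed (illustrative) design: sigma_i proportional to X_i.
import numpy as np

rng = np.random.default_rng(42)
n, reps = 50, 20_000
beta1, beta2 = 2.0, 0.5

X = np.linspace(1, 10, n)
sigma_i = 0.5 * X                      # heteroscedastic: Var(u_i) = (0.5 X_i)^2
x = X - X.mean()                       # deviations from the mean

sig2_hats = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, sigma_i)       # unequal error variances
    Y = beta1 + beta2 * X + u
    b2 = np.sum(x * (Y - Y.mean())) / np.sum(x**2)   # OLS slope
    b1 = Y.mean() - b2 * X.mean()                    # OLS intercept
    resid = Y - b1 - b2 * X
    sig2_hats[r] = np.sum(resid**2) / (n - 2)        # sigma_hat^2

# The derived E(sigma_hat^2) from the expression above
expected = (-np.sum(x**2 * sigma_i**2) / np.sum(x**2)
            + (n - 1) / n * np.sum(sigma_i**2)) / (n - 2)

print(f"Monte Carlo mean of sigma_hat^2: {sig2_hats.mean():.4f}")
print(f"Derived E(sigma_hat^2):          {expected:.4f}")
```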



Consequences of Heteroscedasticity

The following are the consequences of heteroscedasticity when it is present in the data (a short simulation after the list illustrates the first and third points).

  • The OLS estimators and regression predictions based on them remain unbiased and consistent.
  • The OLS estimators are no longer the BLUE (Best Linear Unbiased Estimators) because they are no longer efficient, so the regression predictions will be inefficient too.
  • Because the usual estimator of the covariance matrix of the estimated regression coefficients is inconsistent under heteroscedasticity, the customary tests of hypotheses (t-test, F-test) are no longer valid.
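
To make the first and third points concrete, here is a minimal simulation sketch (with an assumed, purely illustrative design in which the error standard deviation grows with $X_i$): $\hat{\beta}_2$ averages to its true value, but the usual homoscedastic standard-error formula no longer matches the true sampling variability of $\hat{\beta}_2$, which is what invalidates the customary t-tests.

```python
# Sketch: OLS stays unbiased under heteroscedasticity, but the usual
# (homoscedastic) standard error is misleading. Illustrative design only.
import numpy as np

rng = np.random.default_rng(7)
n, reps = 60, 20_000
X = np.linspace(1, 10, n)
x = X - X.mean()
sigma_i = 0.3 * X                      # error SD grows with X_i (assumed)

b2_draws = np.empty(reps)
naive_se = np.empty(reps)
for r in range(reps):
    Y = 1.0 + 0.8 * X + rng.normal(0.0, sigma_i)
    b2 = np.sum(x * (Y - Y.mean())) / np.sum(x**2)
    resid = Y - (Y.mean() - b2 * X.mean()) - b2 * X
    s2 = np.sum(resid**2) / (n - 2)
    b2_draws[r] = b2
    naive_se[r] = np.sqrt(s2 / np.sum(x**2))   # homoscedastic SE formula

print(f"Mean of beta_2_hat (true value 0.8): {b2_draws.mean():.4f}")
print(f"True sampling SD of beta_2_hat:      {b2_draws.std():.4f}")
print(f"Average naive OLS SE:                {naive_se.mean():.4f}")
```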

OLS Estimation in the Presence of Heteroscedasticity

For OLS estimation in the presence of heteroscedasticity, consider the two-variable model

\begin{align*}
Y_i &= \beta_1 +\beta_2X_i + u_i\\
\hat{\beta}_2&=\frac{\sum x_i y_i}{\sum x_i^2}\\
Var(\hat{\beta}_2)&= \frac{\sum x_i^2\, \sigma_i^2}{(\sum x_i^2)^2}
\end{align*}


In the presence of heteroscedasticity, the variance of the OLS estimator is the expression above, while under the assumption of homoscedasticity it is $Var(\hat{\beta}_2)=\frac{\sigma^2}{\sum x_i^2}$. If $\sigma_i^2=\sigma^2$ for all $i$, the two expressions for $Var(\hat{\beta}_2)$ coincide.
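
As a numeric illustration (with an assumed $\sigma_i=0.4\,X_i$, chosen arbitrarily), the sketch below evaluates both variance expressions; the homoscedastic formula, fed the average of the $\sigma_i^2$ in place of $\sigma^2$, gives a noticeably different value.

```python
# Sketch comparing the correct heteroscedastic variance of beta_2_hat
# with what the homoscedastic formula would report. Assumed sigma_i = 0.4*X_i.
import numpy as np

X = np.linspace(1, 10, 40)
x = X - X.mean()
sigma_i2 = (0.4 * X)**2                        # unequal error variances

var_hetero = np.sum(x**2 * sigma_i2) / np.sum(x**2)**2
var_homo = np.mean(sigma_i2) / np.sum(x**2)    # avg variance used as sigma^2

print(f"Var(beta_2_hat), heteroscedastic formula: {var_hetero:.6f}")
print(f"Var(beta_2_hat), homoscedastic formula:   {var_homo:.6f}")
```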

Note that in the case of heteroscedasticity, the OLS estimators have the following properties:
  • $\hat{\beta}_2$ is BLUE if the assumptions of the classical model, including homoscedasticity, hold.
  • To establish the unbiasedness of $\hat{\beta}_2$, it is not necessary for the disturbances ($u_i$) to be homoscedastic.
  • The variance of $u_i$, homoscedasticity, or heteroscedasticity plays no part in the determination of the unbiasedness property.
  • $\hat{\beta}_2$ will be a consistent estimator despite heteroscedasticity.
  • As the sample size increases indefinitely, $\hat{\beta}_2$ converges to its true value.
  • $\hat{\beta}_2$ is asymptotically normally distributed.

For an AR(1) scheme in the disturbances, $u_t=\rho u_{t-1}+\varepsilon_t$, the two-variable model becomes $Y_t=\beta_1+\beta_2 X_t+u_t$.

The variance of $\hat{\beta}_2$ under the AR(1) scheme is

$$Var(\hat{\beta}_2)_{AR(1)} = \frac{\sigma^2}{\sum x_t^2}\left[ 1+ 2 \rho \frac{\sum x_t x_{t-1}}{\sum x_t^2} +2\rho^2 \frac{\sum x_t x_{t-2}}{\sum x_t^2} +\cdots + 2\rho^{n-1} \frac{x_1 x_n}{\sum x_t^2} \right]$$

If $\rho=0$ then $Var(\hat{\beta}_2)_{AR(1)} = Var(\hat{\beta}_2)_{OLS}$.
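
The formula can be evaluated directly. A short sketch (using a hypothetical mean-centred $x$ series) is given below; at $\rho=0$ the bracketed correction equals 1 and the AR(1) variance collapses to the usual OLS variance.

```python
# Sketch evaluating the AR(1) variance formula above on a hypothetical
# mean-centred x series; at rho = 0 it reduces to the OLS variance.
import numpy as np

def var_beta2_ar1(x, sigma2, rho):
    """Variance of beta_2_hat when disturbances follow AR(1) with coeff rho."""
    Sxx = np.sum(x**2)
    correction = 1.0
    for k in range(1, len(x)):
        correction += 2 * rho**k * np.sum(x[:-k] * x[k:]) / Sxx
    return sigma2 / Sxx * correction

rng = np.random.default_rng(1)
x = rng.normal(size=30)
x -= x.mean()                          # deviations from the mean

print(f"OLS variance (rho = 0):       {var_beta2_ar1(x, 1.0, 0.0):.6f}")
print(f"AR(1) variance (rho = 0.5):   {var_beta2_ar1(x, 1.0, 0.5):.6f}")
```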

Assume that the regressor $X$ also follows the AR(1) scheme with coefficient of autocorrelation $r$; then

\begin{align*}
Var(\hat{\beta}_2)_{AR(1)} &= \frac{\sigma^2}{\sum x_t^2}\left(\frac{1+r\rho}{1-r \rho} \right)\\
&=Var(\hat{\beta}_2)_{OLS}\left(\frac{1+r\rho}{1-r \rho} \right)
\end{align*}

That is, the usual OLS formula for the variance of $\hat{\beta}_2$ will underestimate $Var(\hat{\beta}_2)_{AR(1)}$ (when $r\rho>0$).
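
For a sense of the magnitude, a quick worked example with hypothetical values $r=\rho=0.6$:

$$\frac{1+r\rho}{1-r\rho}=\frac{1+0.36}{1-0.36}=\frac{1.36}{0.64}\approx 2.13,$$

so the usual OLS formula would report less than half of the true variance in this case.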

Note that although $\hat{\beta}_2$ is linear and unbiased, it is not efficient.

In general, in economics, negative autocorrelation is much less likely to occur than positive autocorrelation.

Higher-Order Autocorrelation

Autocorrelation can take many forms; for example,

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_p u_{t-p} + \varepsilon_t$$

This is called $p$th-order autocorrelation.

If we have quarterly data and omit seasonal effects, we might expect to find 4th-order autocorrelation; similarly, monthly data might exhibit 12th-order autocorrelation.
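
As an illustration, the sketch below (with an assumed coefficient $\rho_4=0.7$, chosen arbitrarily) generates disturbances following $u_t=\rho_4 u_{t-4}+\varepsilon_t$; the sample autocorrelation comes out large at lag 4 and near zero at lags 1 to 3, the pattern omitted seasonal effects in quarterly data would produce.

```python
# Sketch: disturbances with 4th-order autocorrelation, u_t = rho4*u_{t-4} + e_t,
# as might arise from omitted seasonal effects in quarterly data (rho4 assumed).
import numpy as np

rng = np.random.default_rng(3)
T, rho4 = 400, 0.7
u = np.zeros(T)
for t in range(4, T):
    u[t] = rho4 * u[t - 4] + rng.normal()

def acf(series, lag):
    """Sample autocorrelation at the given lag."""
    s = series - series.mean()
    return np.sum(s[:-lag] * s[lag:]) / np.sum(s**2)

for lag in (1, 2, 3, 4):
    print(f"lag {lag}: sample autocorrelation = {acf(u, lag):+.3f}")
# lag 4 is large; lags 1 to 3 are near zero
```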
