When heteroscedasticity is present in the data, estimates based on Ordinary Least Squares (OLS) are subject to the following consequences:

We cannot apply the formula of the variance of the coefficients to conduct tests of significance and construct confidence intervals.

If the error term ($u_i$) is heteroscedastic, then the OLS estimates do not have the minimum variance property in the class of unbiased estimators, i.e. they are inefficient in small samples. Furthermore, they are asymptotically inefficient.

The estimated coefficients remain unbiased statistically. That means the property of unbiasedness of OLS estimation is not violated by the presence of heteroscedasticity.

The forecasts based on the model with heteroscedasticity will be less efficient as OLS estimation yields higher values of the variance of the estimated coefficients.

All this means that the standard errors will typically be underestimated and the t-statistics and F-statistics will be inaccurate. Heteroscedasticity is caused by several factors, but the main cause is variables that take substantially different values across observations. For instance, a regression on GDP will suffer from heteroscedasticity if we include large countries such as the USA and small countries such as Cuba. In this case, it may be better to use GDP per person. Also, note that heteroscedasticity tends to affect cross-sectional data more than time series.
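The point about standard errors can be checked numerically. The sketch below assumes a fixed set of $X$ values and an invented variance pattern $Var(u_i)=0.25X_i^2$ (neither comes from the post), and compares the true sampling variance of the OLS slope with the value the conventional formula $s^2(X'X)^{-1}$ delivers on average. In this particular design the conventional formula understates the variance; other variance patterns could push it in the opposite direction.

```python
import numpy as np

# Illustrative design: fixed regressor values and assumed error variances.
x = np.linspace(1.0, 10.0, 50)
sigma2 = (0.5 * x) ** 2                  # assumption: Var(u_i) = 0.25 * x_i^2
X = np.column_stack([np.ones_like(x), x])
n, k = X.shape

# True covariance of the OLS estimator: (X'X)^-1 X' Omega X (X'X)^-1
xtx_inv = np.linalg.inv(X.T @ X)
true_cov_ols = xtx_inv @ (X.T * sigma2) @ X @ xtx_inv

# Expected value of the conventional estimate s^2 (X'X)^-1, using
# E[s^2] = sum(sigma_i^2 * (1 - h_ii)) / (n - k), where h_ii is the leverage.
leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
expected_s2 = np.sum(sigma2 * (1.0 - leverage)) / (n - k)
conventional_cov = expected_s2 * xtx_inv

true_var_slope = true_cov_ols[1, 1]
conventional_var_slope = conventional_cov[1, 1]
print("true variance of OLS slope:        ", true_var_slope)
print("conventional formula (on average): ", conventional_var_slope)
```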

The post is about Remedial Measures of Heteroscedasticity.

Heteroscedasticity is a condition in which the variance of the residual term, or error term, in a regression model is not constant but varies across observations.

Heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS estimators (they remain unbiased and consistent in the presence of heteroscedasticity), but the estimators are no longer efficient, not even asymptotically. The lack of efficiency makes the usual hypothesis testing procedure dubious (doubtful, unreliable). Therefore, there should be some remedial measures for heteroscedasticity.

Remedial Measures of Heteroscedasticity

For remedial measures of heteroscedasticity, there are two approaches: (i) when $\sigma_i^2$ is known, and (ii) when $\sigma_i^2$ is unknown.

If $V(u_i)=\sigma_i^2$ then heteroscedasticity is present. Given the values of $\sigma_i^2$, heteroscedasticity can be corrected by using weighted least squares (WLS) as a special case of Generalized Least Squares (GLS). Weighted least squares is the OLS method of estimation applied to the transformed model.

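When heteroscedasticity is detected by an appropriate statistical test, the solution is to transform the original model in such a way that the transformed disturbance term has a constant variance. The transformation of the model reduces to an adjustment of the original data. For the simple linear regression model $Y_i =\alpha + \beta X_i +u_i$ with known $\sigma_i$, dividing through by $\sigma_i$ gives a transformed error term $u_i^*=u_i/\sigma_i$ that has a constant variance, i.e. it is homoscedastic. Mathematically \[\frac{Y_i}{\sigma_i}=\alpha\frac{1}{\sigma_i}+\beta\frac{X_i}{\sigma_i}+\frac{u_i}{\sigma_i}, \qquad Var(u_i^*)=\frac{1}{\sigma_i^2}Var(u_i)=\frac{\sigma_i^2}{\sigma_i^2}=1\]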

This approach has limited use because the individual error variances are rarely known a priori. When sufficient sample information is available, reasonable guesses of the true error variances can be made and used in place of $\sigma_i^2$.
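The known-variance case can be sketched in a few lines of NumPy. The function name `wls_known_sigma`, the simulated data, and the choice $\sigma_i = 0.5X_i$ are all assumptions made for illustration; the only idea taken from the text is that WLS is OLS applied to the model after every term is divided by $\sigma_i$.

```python
import numpy as np

def wls_known_sigma(x, y, sigma):
    """Estimate alpha and beta in y = alpha + beta*x + u when sd(u_i) = sigma_i is known."""
    x, y, sigma = (np.asarray(a, dtype=float) for a in (x, y, sigma))
    # Divide every term of the model by sigma_i; the transformed error has variance 1.
    y_star = y / sigma
    design = np.column_stack([1.0 / sigma, x / sigma])  # columns multiply alpha and beta
    coef = np.linalg.lstsq(design, y_star, rcond=None)[0]
    return coef  # array([alpha_hat, beta_hat])

# Simulated data with hypothetical true values alpha = 2, beta = 3.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
sigma = 0.5 * x                          # assumed to be known for each observation
y = 2.0 + 3.0 * x + rng.normal(0.0, sigma)

alpha_hat, beta_hat = wls_known_sigma(x, y, sigma)
print(alpha_hat, beta_hat)
```

Note that the transformed design has no column of ones: the original intercept $\alpha$ becomes the coefficient on $1/\sigma_i$.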

Let us now discuss the second remedial measure, which applies when the error variances are unknown.

(ii) $\sigma_i^2$ is unknown

If $\sigma_i^2$ is not known a priori, then heteroscedasticity is corrected by hypothesizing a relationship between the error variance and one of the explanatory variables. There can be several versions of the hypothesized relationship. Suppose the hypothesized relationship is $Var(u_i)=\sigma^2 X_i^2$ (error variance is proportional to $X_i^2$). For this hypothesized relation, we divide the simple linear regression model $Y_i =\alpha + \beta X_i +u_i$ by $X_i$ to correct for heteroscedasticity. \begin{eqnarray*} \frac{Y_i}{X_i}&=&\frac{\alpha}{X_i}+\beta+\frac{u_i}{X_i}\\ \Rightarrow \quad Y_i^*&=&\beta +\alpha X_i^*+u_i^*\\ \mbox{where } Y_i^*&=&\frac{Y_i}{X_i},\; X_i^*=\frac{1}{X_i} \mbox{ and } u_i^*=\frac{u_i}{X_i} \end{eqnarray*}

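Now the OLS estimation of the above transformed model will yield efficient parameter estimates, as the $u_i^*$’s have constant variance, i.e. \[Var(u_i^*)=Var\left(\frac{u_i}{X_i}\right)=\frac{1}{X_i^2}Var(u_i)=\frac{1}{X_i^2}\sigma^2X_i^2=\sigma^2\] Note that in the transformed model the intercept is $\beta$ and the coefficient on $\frac{1}{X_i}$ is $\alpha$.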
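A minimal simulation of this transformation, assuming invented true values ($\alpha=2$, $\beta=3$, $\sigma=0.5$) and positive $X_i$ drawn uniformly from $[1, 10]$. It fits the transformed model by OLS and then compares the spread of the errors in the lower and upper halves of $X$, before and after dividing by $X_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
alpha, beta, sigma = 2.0, 3.0, 0.5       # illustrative true values
x = rng.uniform(1.0, 10.0, size=n)
u = rng.normal(0.0, sigma * x)           # Var(u_i) = sigma^2 * x_i^2
y = alpha + beta * x + u

# Transformed model: y/x = beta + alpha * (1/x) + u/x
design = np.column_stack([np.ones(n), 1.0 / x])
coef = np.linalg.lstsq(design, y / x, rcond=None)[0]
beta_hat, alpha_hat = coef               # intercept estimates beta, slope estimates alpha

# Spread of the errors for large x relative to small x, before and after transforming.
low, high = x < np.median(x), x >= np.median(x)
u_star = u / x
ratio_raw = u[high].std() / u[low].std()
ratio_transformed = u_star[high].std() / u_star[low].std()
print(alpha_hat, beta_hat, ratio_raw, ratio_transformed)
```

A ratio well above 1 for the raw errors and close to 1 for the transformed errors is what the derivation predicts. In real data the errors are unobserved, so the same comparison would have to be made on residuals.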

For remedial measures of heteroscedasticity, some other hypothesized relations are:

Error variance is proportional to $X_i$ (square root transformation), i.e. $E(u_i^2)=\sigma^2X_i$. The transformed model is \[\frac{Y_i}{\sqrt{X_i}}=\frac{\alpha}{\sqrt{X_i}}+\beta\sqrt{X_i}+\frac{u_i}{\sqrt{X_i}}\] The transformed model has no intercept term. Therefore we have to use the regression through the origin model to estimate $\alpha$ and $\beta$. To get back to the original model, multiply the transformed model by $\sqrt{X_i}$. This transformation requires $X_i>0$.
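The regression through the origin can be sketched as follows, again on simulated data with invented true values. The key detail is that the design matrix has two columns, $1/\sqrt{X_i}$ and $\sqrt{X_i}$, and no column of ones.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
alpha, beta, sigma = 2.0, 3.0, 0.5       # illustrative true values
x = rng.uniform(1.0, 10.0, size=n)       # strictly positive, as the square root requires
y = alpha + beta * x + rng.normal(0.0, sigma * np.sqrt(x))  # Var(u_i) = sigma^2 * x_i

root_x = np.sqrt(x)
# No column of ones: the transformed model is a regression through the origin.
design = np.column_stack([1.0 / root_x, root_x])
alpha_hat, beta_hat = np.linalg.lstsq(design, y / root_x, rcond=None)[0]
print(alpha_hat, beta_hat)
```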

Error variance is proportional to the square of the mean value of $Y$, i.e. $E(u_i^2)=\sigma^2[E(Y_i)]^2$. Here the variance of $u_i$ is proportional to the square of the expected value of $Y$, where $E(Y_i) = \alpha + \beta X_i$. The transformed model will be \[\frac{Y_i}{E(Y_i)}=\frac{\alpha}{E(Y_i)}+\beta\frac{X_i}{E(Y_i)}+\frac{u_i}{E(Y_i)}\] This transformation is not operational as it stands because $E(Y_i)$ depends upon $\alpha$ and $\beta$, which are unknown parameters. However, $\hat{Y}_i=\hat{\alpha}+\hat{\beta}X_i$ is an estimator of $E(Y_i)$, so we proceed in two steps:

We run the usual OLS regression, disregarding the heteroscedasticity problem, and obtain $\hat{Y}_i$.

We transform the model by using the estimated $\hat{Y}_i$, i.e. $\frac{Y_i}{\hat{Y}_i}=\alpha\frac{1}{\hat{Y}_i}+\beta\frac{X_i}{\hat{Y}_i}+\frac{u_i}{\hat{Y}_i}$, and run the regression on the transformed model.

This transformation gives satisfactory results only if the sample size is reasonably large.
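The two steps above can be sketched as a small function. The names `ols` and `two_step_fit` are hypothetical helpers, and the simulated data (true values $\alpha=2$, $\beta=3$, error standard deviation equal to 10% of $E(Y_i)$) are assumptions chosen so that the fitted values stay well away from zero.

```python
import numpy as np

def ols(design, response):
    """Least-squares coefficients for response = design @ coef + error."""
    return np.linalg.lstsq(design, response, rcond=None)[0]

def two_step_fit(x, y):
    """Two-step estimate of (alpha, beta) when Var(u_i) is proportional to E(Y_i)^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    # Step 1: plain OLS, ignoring heteroscedasticity, to obtain fitted values.
    y_hat = design @ ols(design, y)
    if np.any(np.isclose(y_hat, 0.0)):
        raise ValueError("fitted values too close to zero to divide by")
    # Step 2: divide every term by the fitted value and rerun OLS (no added intercept).
    return ols(design / y_hat[:, None], y / y_hat)

rng = np.random.default_rng(5)
n = 2000
x = rng.uniform(1.0, 10.0, size=n)
mean_y = 2.0 + 3.0 * x
y = mean_y + rng.normal(0.0, 0.1 * mean_y)   # sd(u_i) proportional to E(Y_i)

alpha_hat, beta_hat = two_step_fit(x, y)
print(alpha_hat, beta_hat)
```

The sample here is deliberately large, in line with the remark that the procedure is only satisfactory for reasonably large samples.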

Log transformation, such as $\ln Y_i = \alpha + \beta \ln X_i + u_i$. The log transformation compresses the scales in which the variables are measured. However, this transformation is not applicable if some of the $Y$ and $X$ values are zero or negative.
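A short sketch of the log-log fit, with an explicit guard for the zero or negative values mentioned above. The function name `fit_log_log` and the simulated data are illustrative assumptions; note that $\alpha$ and $\beta$ here are parameters of the log model, not of the original linear model.

```python
import numpy as np

def fit_log_log(x, y):
    """OLS estimates of (alpha, beta) in ln(y) = alpha + beta * ln(x) + u."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x <= 0) or np.any(y <= 0):
        raise ValueError("log transformation needs strictly positive X and Y")
    design = np.column_stack([np.ones_like(x), np.log(x)])
    alpha_hat, beta_hat = np.linalg.lstsq(design, np.log(y), rcond=None)[0]
    return alpha_hat, beta_hat

# Simulated data with hypothetical log-scale parameters alpha = 1.0, beta = 0.8.
rng = np.random.default_rng(4)
x = rng.uniform(1.0, 10.0, size=1000)
y = np.exp(1.0 + 0.8 * np.log(x) + rng.normal(0.0, 0.2, size=1000))

alpha_hat, beta_hat = fit_log_log(x, y)
print(alpha_hat, beta_hat)
```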

The term “homoscedasticity” refers to the assumption that the probability distribution of the random variable $u$ (error term) remains the same for all observations of $X$, and in particular that the variance of each $u$ is the same for all values of the explanatory variables, i.e. the variance of the errors is the same across all levels of the independent variables (homoscedasticity: the assumption of constant variance of a random variable). Symbolically it can be represented as \[Var(u_i)=E\{u_i-E(u_i)\}^2=E(u_i^2)=\sigma_u^2=\mbox{constant}\]

This assumption is known as the assumption of homoscedasticity or the assumption of constant variance of the error terms $u$. It means that the variation of each $u_i$ around its zero mean does not depend on the values of $X$ (the independent variable). In practice, the assumption may be violated because the error term expresses the influence on the dependent variable of:

Errors in measurement: The errors of measurement tend to be cumulative over time. It is also difficult to collect the data and check its consistency and reliability. So the variance of $u_i$ tends to increase with increasing values of $X$.

Omitted variables: Variables omitted from the function (regression model) tend to change in the same direction as $X$, causing an increase in the variance of the observations around the regression line.

Under homoscedasticity, the variance of each $u_i$ remains the same irrespective of small or large values of the explanatory variable, i.e. $\sigma_u^2$ is not a function of $X_i$: $\sigma_{u_i}^2 \ne f(X_i)$.

Consequences if Homoscedasticity is not met

If the assumption of homoscedastic disturbances (constant variance) is not fulfilled, the consequences are as follows:

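We cannot apply the usual formulas for the variance of the coefficients to conduct tests of significance and construct confidence intervals; the tests are inapplicable. These formulas are $Var(\hat{\beta}_0)=\sigma_u^2 \left\{\frac{\sum X_i^2}{n \sum x_i^2}\right\}$ and $Var(\hat{\beta}_1) = \sigma_u^2 \left\{\frac{1}{\sum x_i^2}\right\}$, where $x_i=X_i-\bar{X}$, and they are derived under the assumption of a constant $\sigma_u^2$.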

If $u$ (the error term) is heteroscedastic, the OLS (Ordinary Least Squares) estimates do not have the minimum variance property in the class of unbiased estimators, i.e. they are inefficient in small samples. Furthermore, they are inefficient in large samples (that is, asymptotically inefficient).

The coefficient estimates would still be statistically unbiased even if the $u$’s are heteroscedastic. The $\hat{\beta}$’s will have no statistical bias, i.e. $E(\hat{\beta}_i)=\beta_i$ (the coefficients’ expected values will be equal to the true parameter values).

The prediction would be inefficient because the variance of the prediction includes the variance of $u$ and of the parameter estimates, which are not minimal due to the incidence of heteroscedasticity. That is, the prediction of $Y$ for a given value of $X$, based on the estimates $\hat{\beta}$’s from the original data, would have a high variance.
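The unbiasedness and inefficiency claims can be illustrated with a small Monte Carlo experiment. Everything specific here is an assumption made for the sketch: the fixed $X$ grid, the true values $\beta_0=2$ and $\beta_1=3$, the variance pattern $sd(u_i)=0.5X_i$, and the use of weighted least squares with the true weights as the efficient benchmark.

```python
import numpy as np

rng = np.random.default_rng(3)
reps, n = 2000, 50
x = np.linspace(1.0, 10.0, n)
sigma = 0.5 * x                                   # assumed heteroscedastic error sd
X = np.column_stack([np.ones(n), x])

# One row per replication: Y has shape (reps, n).
U = rng.normal(0.0, sigma, size=(reps, n))
Y = 2.0 + 3.0 * x + U

# OLS on every replication at once (lstsq accepts a 2-D right-hand side).
ols_coefs = np.linalg.lstsq(X, Y.T, rcond=None)[0]            # shape (2, reps)

# WLS with the true weights: OLS after dividing each observation by sigma_i.
wls_coefs = np.linalg.lstsq(X / sigma[:, None], (Y / sigma).T, rcond=None)[0]

ols_slopes, wls_slopes = ols_coefs[1], wls_coefs[1]
print("mean OLS slope:", ols_slopes.mean())       # close to 3: no bias
print("sd of OLS slope:", ols_slopes.std())
print("sd of WLS slope:", wls_slopes.std())       # smaller: OLS is not minimum variance
```

The average OLS slope sits near the true value, while its spread across replications is visibly larger than that of the weighted estimator.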

Tests for Homoscedasticity

Some tests commonly used for testing the assumption of homoscedasticity are: