# Tagged: Homoscedasticity

## White test for Heteroskedasticity Detection

One important assumption of regression is that the variance of the error term is constant across observations. If the errors have a constant variance, they are called homoscedastic; otherwise, they are heteroscedastic. In the case of heteroscedastic errors (non-constant variance), the standard estimation methods become inefficient. Typically, the assumption of homoscedasticity is assessed by plotting the residuals.

## White’s test for Heteroskedasticity

Halbert White (1980) proposed a test that is very similar to the Breusch-Pagan test. White’s test for heteroskedasticity is general because it does not rely on the normality assumption, and it is also easy to implement. Because of its generality, White’s test may also pick up specification bias. Both White’s test and the Breusch-Pagan test are based on the residuals of the fitted model.

To test the assumption of homoscedasticity, one can use auxiliary regression analysis by regressing the squared residuals from the original model on the set of original regressors, the cross-products of the regressors, and the squared regressors.

The step-by-step procedure to perform the White test for heteroskedasticity is as follows:

Consider the following linear regression model (assume there are two independent variables):
$Y_i=\beta_0+\beta_1X_{1i}+\beta_2X_{2i}+e_i \tag{1}$

1. For the given data, estimate the regression model (1) and obtain the residuals $e_i$.
2. Run the following auxiliary regression of the squared residuals from the original regression on the original set of independent variables, the squared values of the independent variables, and the cross-product(s) of the independent variables:
$e_i^2=\alpha_0+\alpha_1X_{1i}+\alpha_2X_{2i}+\alpha_3X_{1i}^2+\alpha_4X_{2i}^2+\alpha_5X_{1i}X_{2i}+v_i \tag{2}$
Note that the regression of the squared residuals can take a linear or non-linear functional form. You can also use higher powers of the regressors, such as cubes. Also note that equation (2) includes a constant term whether or not the original regression model (1) has one.
3. Find the $R^2$ statistic from the auxiliary regression (2).
4. Test the statistical significance of $n \times R^2\sim\chi^2_{df}\tag{3}$ under the null hypothesis of homoscedasticity (no heteroscedasticity), where $n$ is the sample size and df is the number of regressors in equation (2), excluding the constant term (df = 5 here).
5. If the calculated chi-square value obtained in (3) is greater than the critical chi-square value at the chosen level of significance, reject the hypothesis of homoscedasticity in favor of heteroscedasticity.
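
The procedure above can be sketched in Python with NumPy. This is only an illustrative sketch under stated assumptions: the data are simulated (error standard deviation proportional to $X_1$), the helper names `ols_fit` and `white_test` are hypothetical, and the 5% critical value for 5 degrees of freedom is taken from a standard chi-square table rather than computed.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit; X must already contain a constant column.
    Returns the coefficients, the residuals, and R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / tss
    return beta, resid, r2

def white_test(x1, x2, y):
    """Return (n * R^2, df) for the two-regressor White test."""
    n = len(y)
    const = np.ones(n)
    # Original regression, equation (1): keep only the residuals.
    _, resid, _ = ols_fit(np.column_stack([const, x1, x2]), y)
    # Auxiliary regression, equation (2): squared residuals on the
    # regressors, their squares, and their cross-product.
    Z = np.column_stack([const, x1, x2, x1 ** 2, x2 ** 2, x1 * x2])
    _, _, r2_aux = ols_fit(Z, resid ** 2)
    # Statistic (3); df = regressors in (2), excluding the constant.
    return n * r2_aux, Z.shape[1] - 1

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + x1 * rng.standard_normal(n)

stat, df = white_test(x1, x2, y)
CHI2_CRIT_5PCT_DF5 = 11.07  # tabulated 5% critical value, chi-square with 5 df
print(f"n*R^2 = {stat:.2f}, df = {df}, reject H0: {stat > CHI2_CRIT_5PCT_DF5}")
```

Because the simulated error variance depends strongly on $X_1$, the statistic should fall well above the critical value, so the null hypothesis of homoscedasticity is rejected.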

For a model with several independent variables (regressors), introducing all the regressors, their squares or higher-order terms, and their cross-products consumes many degrees of freedom.

When the White test statistic is statistically significant, heteroscedasticity is not necessarily the cause; specification errors may be responsible instead. In other words, the White test can be a test of heteroscedasticity, of specification error, or of both. If no cross-product terms are introduced in the White test procedure, it is a test of pure heteroscedasticity.
If cross-product terms are introduced in the model, it is a test of both heteroscedasticity and specification bias.

### References

See the Numerical Example of White Test of Heteroscedasticity

## Homoscedasticity: Assumption of constant variance of a random variable

The assumption about the random variable μ (the error term) is that its probability distribution remains the same for all observations of X and, in particular, that the variance of each μ is the same for all values of the explanatory variables, i.e., the variance of the errors is the same across all levels of the independent variables. Symbolically, it can be represented as

$Var(\mu_i) = E\{\mu_i - E(\mu_i)\}^2 = E(\mu_i^2) = \sigma_\mu^2 = \mbox{Constant}$

This assumption is known as the assumption of homoscedasticity, or the assumption of constant variance of the error terms $\mu_i$. It means that the variation of each $\mu_i$ around its zero mean does not depend on the values of X (the independent variable). In practice the assumption may fail, because the error term expresses the influence on the dependent variable of

• Errors in measurement
The errors of measurement tend to be cumulative over time. It is also difficult to collect the data and check their consistency and reliability. So the variance of $\mu_i$ tends to increase as the values of X increase.
• Omitted variables
Variables omitted from the function (regression model) tend to change in the same direction as X, causing an increase in the variance of the observations around the regression line.

Under homoscedasticity, the variance of each $\mu_i$ remains the same irrespective of small or large values of the explanatory variable, i.e., $\sigma_\mu^2$ is not a function of $X_i$: $\sigma_{\mu_i}^2 \ne f(X_i)$.

## Consequences if Homoscedasticity is not Met

If the assumption of homoscedastic disturbances (constant variance) is not fulfilled, the consequences are as follows:

1. We cannot apply the usual formulas for the variances of the coefficients, $Var(\hat{\beta}_0)=\sigma_\mu^2 \{\frac{\sum X^2}{n \sum x^2}\}$ and $Var(\hat{\beta}_1) = \sigma_\mu^2 \{\frac{1}{\sum x^2}\}$, where $x = X - \bar{X}$, to conduct tests of significance and construct confidence intervals. These formulas are derived under constant variance, so the tests based on them are inapplicable.
2. If μ (the error term) is heteroscedastic, the OLS (Ordinary Least Squares) estimates do not have the minimum variance property in the class of unbiased estimators, i.e., they are inefficient in small samples. Furthermore, they are inefficient in large samples (that is, asymptotically inefficient).
3. The coefficient estimates are still statistically unbiased even if the $\mu_i$ are heteroscedastic. The $\hat{\beta}$’s have no statistical bias, i.e., $E(\hat{\beta}_i)=\beta_i$ (the coefficients’ expected values equal the true parameter values).
4. Prediction is inefficient, because the variance of the prediction includes the variance of μ and of the parameter estimates, which are not minimal under heteroscedasticity. That is, the prediction of Y for a given value of X, based on the estimates $\hat{\beta}$ from the original data, has a high variance.
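
A small Monte Carlo experiment can illustrate the first and third consequences. The sketch below is not taken from the source: the parameter values, the assumed error structure (standard deviation proportional to $X^2$), and all variable names are illustrative assumptions. It checks that the OLS slope stays centered on the true value while the conventional standard error $\sqrt{s^2/\sum x^2}$ misstates the actual sampling variability.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 100, 2000
beta0, beta1 = 1.0, 2.0              # assumed true parameters
X = np.linspace(1.0, 10.0, n)        # fixed regressor values
x = X - X.mean()                     # deviations from the mean
sxx = np.sum(x ** 2)
sigma_i = 0.1 * X ** 2               # assumed: error sd rises with X

# One simulated sample per row.
mu = rng.standard_normal((reps, n)) * sigma_i
Y = beta0 + beta1 * X + mu

# OLS slope and intercept for every sample at once.
b1 = (Y @ x) / sxx
b0 = Y.mean(axis=1) - b1 * X.mean()

# Conventional standard error of the slope: sqrt(s^2 / sum x^2).
resid = Y - b0[:, None] - b1[:, None] * X
s2 = np.sum(resid ** 2, axis=1) / (n - 2)
se_conventional = np.sqrt(s2 / sxx)

mean_b1 = b1.mean()                  # should be close to beta1
empirical_sd = b1.std(ddof=1)        # actual spread of the slope estimates
mean_se = se_conventional.mean()     # what the usual formula reports

print(f"mean slope estimate       : {mean_b1:.3f} (true {beta1})")
print(f"empirical sd of slope     : {empirical_sd:.3f}")
print(f"average conventional s.e. : {mean_se:.3f}")
```

Under this particular error structure the conventional formula understates the true spread of the slope estimates, so t-tests and confidence intervals built on it would be too optimistic. With other variance patterns the formula can err in the opposite direction.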

### Tests for Homoscedasticity

Some tests commonly used for testing the assumption of homoscedasticity are White’s test, the Breusch-Pagan test, and the Goldfeld-Quandt test.

Reference:
A. Koutsoyiannis (1972). “Theory of Econometrics”. 2nd Ed.

## Goldfeld Quandt Test: Comparison of the Variances of Error Terms

The Goldfeld-Quandt test is one of two tests proposed in a 1965 paper by Stephen Goldfeld and Richard Quandt. Both a parametric and a nonparametric test are described in the paper, but the term “Goldfeld–Quandt test” is usually associated only with the parametric test.
The Goldfeld-Quandt test is frequently used because it is easy to apply when one of the regressors (or another random variable) is considered the proportionality factor of heteroscedasticity. The test is applicable to large samples: the observations must be at least twice as many as the parameters to be estimated. The test assumes normally distributed and serially independent error terms $\mu_i$.

The Goldfeld–Quandt test compares the variances of the error terms across discrete subgroups, so the data are divided into h subgroups. Usually, the data set is divided into two parts or groups, and hence the test is sometimes called a two-group test.

## Goldfeld-Quandt Test Procedure:

The procedure for conducting the Goldfeld-Quandt test is as follows:

1. Order the observations according to the magnitude of X (the independent variable that is the proportionality factor).
2. Arbitrarily select a certain number (c) of central observations and omit them from the analysis (for n = 30, 8 central observations are omitted, i.e., roughly a quarter of the observations are removed). The remaining n – c observations are divided into two sub-groups of equal size, (n – c)/2 each. Because the data set is ordered by the magnitude of X, one sub-group contains the small values of X and the other contains the large values of X.
3. Fit a separate regression to each sub-group and obtain the sum of squared residuals from each of them.
Here $\sum e_1^2$ is the sum of squared residuals from the sub-sample of small values of X and $\sum e_2^2$ is the sum of squared residuals from the sub-sample of large values of X. Each has $(n – c)/2 – k$ df, where k is the total number of parameters.
4. Compute the ratio $F^* = \frac{RSS_2/df}{RSS_1/df}=\frac{\sum e_2^2/ ((n-c)/2-k)}{\sum e_1^2/((n-c)/2-k) }$

Under the null hypothesis of homoscedasticity (and normally distributed errors), F* follows the F distribution with $(n-c)/2-k$ degrees of freedom in both the numerator and the denominator. If the variances differ, F* will have a large value. The higher the observed value of the F*-ratio, the stronger the heteroscedasticity of the $\mu_i$.
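
The steps above can be sketched in Python with NumPy. The sketch assumes a simple regression with one regressor (so k = 2), simulated data whose error standard deviation is proportional to X, and hypothetical helper names (`rss_simple_ols`, `goldfeld_quandt`). The 5% critical value for F with (20, 20) degrees of freedom is taken from a standard F table.

```python
import numpy as np

def rss_simple_ols(x, y):
    """Residual sum of squares from regressing y on a constant and x."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ coef) ** 2))

def goldfeld_quandt(x, y, c, k=2):
    """Return (F*, df) after dropping c central observations."""
    order = np.argsort(x)              # order observations by X
    x, y = x[order], y[order]
    m = (len(x) - c) // 2              # size of each sub-group
    rss1 = rss_simple_ols(x[:m], y[:m])     # small values of X
    rss2 = rss_simple_ols(x[-m:], y[-m:])   # large values of X
    df = m - k                         # (n - c)/2 - k
    return (rss2 / df) / (rss1 / df), df

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(7)
n, c = 60, 16
x = rng.uniform(1.0, 10.0, n)
y = 3.0 + 0.5 * x + x * rng.standard_normal(n)  # error sd grows with x

f_star, df = goldfeld_quandt(x, y, c)
F_CRIT_5PCT_20_20 = 2.12  # tabulated 5% critical value of F(20, 20)
print(f"F* = {f_star:.2f} with ({df}, {df}) df; "
      f"reject H0: {f_star > F_CRIT_5PCT_20_20}")
```

Placing the large-X group in the numerator reflects the assumption that the error variance increases with X. If the variance were expected to fall with X, the ratio would be inverted.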

References

• Goldfeld, Stephen M.; Quandt, R. E. (June 1965). “Some Tests for Homoscedasticity”. Journal of the American Statistical Association 60 (310): 539–547
• Kennedy, Peter (2008). A Guide to Econometrics (6th ed.). Blackwell. p. 116

Numerical Example of the Goldfeld-Quandt Test.

# Heteroscedasticity

An important assumption of OLS is that the disturbances $\mu_i$ appearing in the population regression function are homoscedastic (the error terms have the same variance),
i.e., the variance of each disturbance term $\mu_i$, conditional on the chosen values of the explanatory variables, is some constant number equal to $\sigma^2$: $E(\mu_{i}^{2})=\sigma^2$, where $i=1,2,\cdots, n$.
Homo means equal and scedasticity means spread.

Consider the general linear regression model
$y_i=\beta_1+\beta_2 x_{2i}+ \beta_3 x_{3i} +\cdots + \beta_k x_{ki} + \varepsilon_i$

If $E(\varepsilon_{i}^{2})=\sigma^2$ for all $i=1,2,\cdots, n$ then the assumption of constant variance of the error term or homoscedasticity is satisfied.

If $E(\varepsilon_{i}^{2})\ne\sigma^2$ for some $i$, that is, the error variance differs across observations, then the assumption of homoscedasticity is violated and heteroscedasticity is said to be present. In the case of heteroscedasticity, the OLS estimators are unbiased but inefficient.
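
The contrast between the two cases can be made concrete with simulated errors. The following sketch is purely illustrative: the two error-generating rules (a constant standard deviation of 2, and a standard deviation of $0.5x_i$) and the helper name `variance_by_quartile` are assumptions, not part of the source. It compares the sample variance of the errors within each quartile of the regressor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
x = np.sort(rng.uniform(1.0, 10.0, n))   # sorted regressor values

# Homoscedastic errors: the same sd (2) for every observation.
eps_homo = 2.0 * rng.standard_normal(n)
# Heteroscedastic errors: sd proportional to x_i (assumed form).
eps_hetero = 0.5 * x * rng.standard_normal(n)

def variance_by_quartile(eps):
    """Sample variance of the errors within each quartile of x.
    Relies on x (and hence eps) being ordered by x."""
    return np.array([chunk.var(ddof=1) for chunk in np.array_split(eps, 4)])

var_homo = variance_by_quartile(eps_homo)
var_hetero = variance_by_quartile(eps_hetero)

print("homoscedastic  :", np.round(var_homo, 2))
print("heteroscedastic:", np.round(var_hetero, 2))
```

In the homoscedastic case the four variances should all be close to $\sigma^2 = 4$, while in the heteroscedastic case they should rise steadily from the lowest to the highest quartile of x.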

Examples:

1. The range in family income between the poorest and richest families in a town is the classical example of heteroscedasticity.
2. The range in annual sales between a corner drug store and a general store.

## Reasons for Heteroscedasticity

There are several reasons why the variances of the error terms $\mu_i$ may vary, some of which are:

1. Following the error-learning models, as people learn, their errors of behavior become smaller over time. In this case $\sigma_{i}^{2}$ is expected to decrease. For example, the number of typing errors made in a given time period on a test tends to fall as the hours put into typing practice increase.
2. As income grows, people have more discretionary income, and hence $\sigma_{i}^{2}$ is likely to increase with income.
3. As data collecting techniques improve, $\sigma_{i}^{2}$ is likely to decrease.
4. Heteroscedasticity can also arise as a result of the presence of outliers. The inclusion or exclusion of such observations, especially when the sample size is small, can substantially alter the results of regression analysis.
5. Heteroscedasticity can arise from violating the assumption of the CLRM (classical linear regression model) that the regression model is correctly specified.
6. Skewness in the distribution of one or more regressors included in the model is another source of heteroscedasticity.
7. Incorrect data transformation and incorrect functional form (linear versus log-linear model) are also sources of heteroscedasticity.

# Consequences of Heteroscedasticity

1. The OLS estimators and the regression predictions based on them remain unbiased and consistent.
2. The OLS estimators are no longer BLUE (Best Linear Unbiased Estimators) because they are no longer efficient, so the regression predictions will be inefficient too.
3. Because the usual estimator of the covariance matrix of the estimated regression coefficients is inconsistent, the tests of hypotheses (t-test, F-test) are no longer valid.

Note: Problems of heteroscedasticity are likely to be more common in cross-sectional than in time series data.

References
Greene, W.H. (1993). Econometric Analysis, Prentice–Hall, ISBN 0-13-013297-7.
Verbeek, Marno (2004). A Guide to Modern Econometrics, 2. ed., Chichester: John Wiley & Sons.
Gujarati, D. N. & Porter, D. C. (2008). Basic Econometrics, 5. ed., McGraw Hill/Irwin.