# Basic Statistics and Data Analysis

### Category: Test of Heteroscedasticity

Different available tests of heteroscedasticity, and the detection of heteroscedasticity using graphical techniques, are presented in this category.

## Heteroscedasticity Tests and Remedies

There is a set of heteroscedasticity tests and remedies that require an assumption about the structure of the heteroscedasticity, if it exists. That is, to use these tests you must choose a specific functional form for the relationship between the error variance and the variables that you believe determine the error variance. The major difference between these tests is the functional form that each test assumes.

Breusch-Pagan Test

The Breusch-Pagan test assumes the error variance is a linear function of one or more variables.

Harvey-Godfrey Test

The Harvey-Godfrey test assumes the error variance is an exponential function of one or more variables. The variables are usually assumed to be one or more of the explanatory variables in the regression equation.
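To make the contrast concrete, the two assumed functional forms can be written side by side for variables $Z_{1i},\ldots,Z_{mi}$. The $\alpha$'s and $Z$'s are notation introduced here for illustration. Taking logs of the Harvey-Godfrey form gives an equation that is linear in the $\alpha$'s. This is why that test is usually run as an auxiliary regression of $\ln e_i^2$ on the $Z$'s, where $e_i$ are the OLS residuals:

\begin{align*}
\text{Breusch-Pagan:}\quad \sigma_i^2 &= \alpha_0 + \alpha_1 Z_{1i} + \cdots + \alpha_m Z_{mi}\\
\text{Harvey-Godfrey:}\quad \sigma_i^2 &= \exp(\alpha_0 + \alpha_1 Z_{1i} + \cdots + \alpha_m Z_{mi})\\
\Longrightarrow\quad \ln \sigma_i^2 &= \alpha_0 + \alpha_1 Z_{1i} + \cdots + \alpha_m Z_{mi}
\end{align*}

In both forms the null hypothesis of homoscedasticity is $H_0:\alpha_1=\cdots=\alpha_m=0$, which leaves a constant error variance ($\alpha_0$, or $e^{\alpha_0}$).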

The White Test

The White test of heteroscedasticity is a general test for detecting the existence of heteroscedasticity in a data set. It has the following advantages:

1. It does not require you to specify a model of the structure of the heteroscedasticity, if it exists.
2. It does not depend on the assumption that the errors are normally distributed.
3. It specifically tests whether the presence of heteroscedasticity causes the OLS formulas for the variances and covariances of the estimates to be incorrect.

# Remedies for Heteroscedasticity

Suppose that you find evidence of heteroscedasticity. If you use the OLS estimator, you will get unbiased but inefficient estimates of the parameters of the model. Also, the estimates of the variances and covariances of the parameter estimates will be biased and inconsistent, and as a result hypothesis tests will not be valid. When there is evidence of heteroscedasticity, econometricians do one of two things:

• Use the OLS estimator to estimate the parameters of the model, and correct the estimates of the variances and covariances of the OLS estimates so that they are consistent.
• Use an estimator other than the OLS estimator to estimate the parameters of the model.

Many econometricians choose the first alternative. This is because the most serious consequence of using the OLS estimator when there is heteroscedasticity is that the estimates of the variances and covariances of the parameter estimates are biased and inconsistent. If this problem is corrected, the only shortcoming of using OLS is that you lose some precision relative to some other estimator that you could have used. However, to get more precise estimates with an alternative estimator, you must know the approximate structure of the heteroscedasticity. If you specify the wrong model of heteroscedasticity, this alternative estimator can yield estimates that are worse than the OLS estimates.
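The two alternatives above can be sketched in plain Python for a simple regression $y_i=\beta_0+\beta_1x_i+\varepsilon_i$.

- **First alternative:** keep the OLS slope and replace its conventional standard error with White's (1980) heteroscedasticity-consistent estimator, in its basic HC0 form.
- **Second alternative:** use weighted least squares (WLS), shown here as one example of "an estimator other than OLS".

The simulated data, the assumed error structure $\sigma_i \propto x_i$, and the helper names `ols_simple` and `slope_standard_errors` are illustrative assumptions, not part of the original text. The WLS weights are correct here only because that structure was assumed.

```python
import math
import random


def ols_simple(x, y, w=None):
    """(Weighted) least squares for y = b0 + b1*x; w=None gives ordinary least squares."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def slope_standard_errors(x, y):
    """OLS slope with its conventional SE and White's heteroscedasticity-consistent (HC0) SE."""
    n = len(x)
    b0, b1 = ols_simple(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)   # assumes one common error variance
    conventional = math.sqrt(s2 / sxx)
    # HC0: each squared residual stands in for its own sigma_i^2.
    hc0 = math.sqrt(sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid)) / sxx ** 2)
    return b1, conventional, hc0


random.seed(1)
x = [random.uniform(1, 10) for _ in range(200)]
# Assumed for illustration: the error standard deviation is proportional to x.
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]

# Alternative 1: keep the OLS estimate, correct its standard error.
b1, se_ols, se_white = slope_standard_errors(x, y)
print(f"OLS slope {b1:.3f}  conventional SE {se_ols:.4f}  White HC0 SE {se_white:.4f}")

# Alternative 2: weighted least squares, valid only if Var(e_i) is proportional to x_i^2.
_, b1_wls = ols_simple(x, y, w=[1.0 / xi ** 2 for xi in x])
print(f"WLS slope {b1_wls:.3f}")
```

If the assumed variance structure were wrong, the WLS weights would be wrong too. The HC0 correction needs no such assumption, which is the trade-off the paragraph above describes.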

# Heteroscedasticity

One of the assumptions of the classical linear regression model is that there is no heteroscedasticity (the error terms have constant variance). Under this and the other classical assumptions, the ordinary least squares (OLS) estimators are BLUE (best linear unbiased estimators): their variances are the lowest among all linear unbiased estimators (Gauss-Markov theorem). If the assumption of constant variance does not hold, the Gauss-Markov theorem does not apply. For heteroscedastic data, regression analysis still provides unbiased estimates of the relationship between the predictors and the outcome variable.

As discussed, heteroscedasticity occurs when the error term has non-constant variance. In this case, we can think of the disturbance for each observation as being drawn from a different distribution with a different variance. Stated equivalently, the variance of the observed value of the dependent variable around the regression line is non-constant. We can think of each observed value of the dependent variable as being drawn from a different conditional probability distribution with a different conditional variance. A general linear regression model with the assumption of heteroscedasticity can be expressed as follows:

\begin{align*}
y_i & = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i\\
Var(\varepsilon_i)&=E(\varepsilon_i^2)\\
&=\sigma_i^2, \quad i=1,2,\ldots, n
\end{align*}

Note that we have an $i$ subscript attached to sigma squared. This indicates that the disturbance for each of the $n$ units is drawn from a probability distribution that has a different variance.
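A quick way to see what this model means is to simulate from it. The sketch below assumes, purely for illustration, a single regressor with $\sigma_i = 0.5\,x_i$, so every observation gets its own error variance. The function name `simulate_heteroscedastic` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_heteroscedastic(n, beta0=1.0, beta1=2.0, scale=0.5, seed=0):
    """Draw y_i = beta0 + beta1*x_i + eps_i with eps_i ~ N(0, (scale*x_i)^2)."""
    rng = random.Random(seed)
    x = [rng.uniform(1, 10) for _ in range(n)]
    sigma = [scale * xi for xi in x]            # a different sigma_i for every observation
    eps = [rng.gauss(0.0, s) for s in sigma]    # eps_i drawn from its own distribution
    y = [beta0 + beta1 * xi + e for xi, e in zip(x, eps)]
    return x, y, eps


x, y, eps = simulate_heteroscedastic(1000)
low = [e for xi, e in zip(x, eps) if xi < 5.5]
high = [e for xi, e in zip(x, eps) if xi >= 5.5]
# The spread of the disturbances is larger where x is large.
print(f"variance of eps, x < 5.5:  {statistics.pvariance(low):.2f}")
print(f"variance of eps, x >= 5.5: {statistics.pvariance(high):.2f}")
```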

If the error term has non-constant variance, but all other assumptions of the classical linear regression model are satisfied, then the consequences of using the OLS estimator to obtain estimates of the population parameters are:

• The OLS estimator is still unbiased
• The OLS estimator is inefficient; that is, it is not BLUE
• The estimated variances and covariances of the OLS estimates are biased and inconsistent
• Hypothesis tests are not valid
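Two of the consequences listed above can be checked with a small Monte Carlo experiment: OLS is still unbiased, and the conventional variance formula is no longer reliable. This is a sketch under assumed settings:

- one regressor held fixed across replications;
- an error standard deviation equal to $x_i$;
- a true slope of 2;
- hypothetical helper names.

The experiment reports the mean of the slope estimates, their actual sampling standard deviation, and the average conventional standard error. How far the last two differ depends on the assumed pattern of heteroscedasticity.

```python
import math
import random
import statistics


def ols_slope_and_se(x, y):
    """OLS slope and its conventional (homoscedastic-formula) standard error."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, math.sqrt(rss / (n - 2) / sxx)


def monte_carlo(reps=500, n=100, beta1=2.0, seed=42):
    rng = random.Random(seed)
    x = [rng.uniform(1, 10) for _ in range(n)]   # regressor fixed across replications
    slopes, ses = [], []
    for _ in range(reps):
        # Assumed structure: the error standard deviation equals x_i.
        y = [1.0 + beta1 * xi + rng.gauss(0.0, xi) for xi in x]
        b1, se = ols_slope_and_se(x, y)
        slopes.append(b1)
        ses.append(se)
    return statistics.mean(slopes), statistics.stdev(slopes), statistics.mean(ses)


mean_b1, sd_b1, mean_se = monte_carlo()
print(f"mean OLS slope {mean_b1:.3f} (true value 2.0)")
print(f"actual sampling SD {sd_b1:.3f} vs average conventional SE {mean_se:.3f}")
```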

## Detection of Heteroscedasticity: Regression Residual Plot

The residual for the $i$th observation, $\hat{\varepsilon}_i$, is an estimate of the unknown and unobservable error for that observation, $\varepsilon_i$. Thus the squared residuals, $\hat{\varepsilon}_i^2$, can be used as estimates of the unknown and unobservable error variance, $\sigma_i^2=E(\varepsilon_i^2)$.

You can calculate the squared residuals and then plot them against an explanatory variable that you believe might be related to the error variance. If you believe that the error variance may be related to more than one of the explanatory variables, you can plot the squared residuals against each one of these variables. Alternatively, you could plot the squared residuals against the fitted values of the dependent variable obtained from the OLS estimates. Most statistical software packages have a command to produce these residual plots.

It must be emphasized that this is not a formal test for heteroscedasticity; it only suggests whether heteroscedasticity may exist.
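The quantities needed for such a plot are easy to compute. The sketch below fits a simple regression by OLS and returns three values for each observation: the explanatory variable, the fitted value and the squared residual. The function name `residual_plot_data`, the single-regressor setup and the small data set are assumptions made for illustration.

```python
def residual_plot_data(x, y):
    """Fit y = b0 + b1*x by OLS; return (x_i, fitted_i, squared_residual_i) for each observation."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rows = []
    for xi, yi in zip(x, y):
        fitted = b0 + b1 * xi
        rows.append((xi, fitted, (yi - fitted) ** 2))
    return rows


# Hypothetical data whose scatter around the line widens as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.8, 7.4, 8.1, 12.6, 11.2, 17.9, 14.0]
for xi, fitted, e2 in residual_plot_data(x, y):
    print(f"x={xi}  fitted={fitted:6.2f}  squared residual={e2:6.2f}")
```

To draw the residual plot described above, plot the third column against the first (or the second) with any scatter-plot routine, for example matplotlib's `pyplot.scatter`. A systematic pattern suggests, but does not prove, heteroscedasticity.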

Below are residual plots showing three typical patterns. The first plot shows a random pattern, which indicates a good fit for a linear model. The other two plots show non-random patterns (U-shaped and inverted U), suggesting that a non-linear model would fit better than a linear regression model.

[Figure: Heteroscedasticity regression residual plots 1, 2 and 3]

## White test for Heteroskedasticity detection

One of the important assumptions of regression is that the variance of the error term is constant across observations. If the errors have constant variance, they are called homoscedastic; otherwise they are heteroscedastic. In the case of heteroscedastic errors (non-constant variance), the standard estimation methods become inefficient. Typically, residuals are plotted to assess the assumption of homoscedasticity.

## White’s test for Heteroskedasticity

Halbert White (1980) proposed a test that is very similar to the Breusch-Pagan test. White's test for heteroskedasticity is general because it does not rely on the normality assumption, and it is also easy to implement. Because of its generality, White's test may also identify specification bias. Both White's test and the Breusch-Pagan test are based on the residuals of the fitted model.

To test the assumption of homoscedasticity, one can use an auxiliary regression: regress the squared residuals from the original model on three sets of terms:

- the original regressors;
- the cross-products of the regressors;
- the squared regressors.

The step-by-step procedure to perform the White test for heteroskedasticity is as follows:

Consider the following linear regression model (assume there are two independent variables):
$$Y_i=\beta_0+\beta_1X_{1i}+\beta_2X_{2i}+e_i \tag{1}$$

1. For the given data, estimate the regression model (1) and obtain the residuals $e_i$.
2. Run the auxiliary regression of the squared residuals on three sets of terms: the original independent variables, their squares, and their cross-product(s):
   $$e_i^2=\alpha_0+\alpha_1X_{1i}+\alpha_2X_{2i}+\alpha_3X_{1i}^2+\alpha_4X_{2i}^2+\alpha_5X_{1i}X_{2i}+v_i \tag{2}$$
   You can also use higher powers of the regressors, such as cubes. Also note that there will be a constant term in equation (2) even though the original regression model (1) may or may not have a constant term.
3. Find the $R^2$ statistic from the auxiliary regression in step 2.
4. Under the null hypothesis of homoscedasticity (no heteroscedasticity), asymptotically
   $$n \times R^2\sim\chi^2_{df} \tag{3}$$
   where $df$ is the number of regressors in equation (2), excluding the constant term ($df=5$ here). Test the statistical significance of this statistic.
5. If the calculated chi-square value obtained in (3) is greater than the critical chi-square value at the chosen level of significance, reject the hypothesis of homoscedasticity in favour of heteroscedasticity.

Note that the regression of residuals can take a linear or non-linear functional form.

In a model with several independent variables (regressors), introducing all the regressors, their squares or higher powers, and their cross-products consumes many degrees of freedom.

In cases where the White test statistic is statistically significant, heteroscedasticity may not necessarily be the cause; specification errors may be responsible instead. In other words, the White test can be a test of heteroscedasticity, of specification error, or of both:

- If no cross-product terms are introduced in the White test procedure, it is a test of pure heteroscedasticity.
- If cross-product terms are introduced in the model, it is a test of both heteroscedasticity and specification bias.
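The White test procedure for the two-regressor model (1) can be sketched in plain Python. To keep the example self-contained, it solves the normal equations with a small Gauss-Jordan routine instead of calling a statistics library. The helper names (`solve`, `ols`, `r_squared`, `white_test`), the simulated data and the error structure ($\sigma_i = X_{1i}$) are assumptions for illustration. The `cross_terms` flag drops the $X_{1i}X_{2i}$ term, which gives the pure-heteroscedasticity variant described in the White test discussion.

```python
import random


def solve(a, b):
    """Solve the square system a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients and fitted values; each row of X already contains a 1 for the constant."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return beta, fitted


def r_squared(y, fitted):
    ybar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - rss / tss


def white_test(x1, x2, y, cross_terms=True):
    """Return (n*R^2, df) from White's auxiliary regression for a two-regressor model."""
    # Step 1: estimate model (1) and obtain the residuals e_i.
    _, fitted = ols([[1.0, a, b] for a, b in zip(x1, x2)], y)
    e2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    # Step 2: regress e_i^2 on levels, squares and (optionally) the cross-product.
    rows = []
    for a, b in zip(x1, x2):
        row = [1.0, a, b, a * a, b * b]
        if cross_terms:
            row.append(a * b)
        rows.append(row)
    _, e2_fitted = ols(rows, e2)
    # Steps 3-4: n*R^2 with df = number of auxiliary regressors excluding the constant.
    return len(y) * r_squared(e2, e2_fitted), len(rows[0]) - 1


rng = random.Random(0)
x1 = [rng.uniform(1, 10) for _ in range(300)]
x2 = [rng.uniform(1, 10) for _ in range(300)]
# Hypothetical data: the error standard deviation grows with x1.
y = [1.0 + 0.5 * a - 0.3 * b + rng.gauss(0.0, a) for a, b in zip(x1, x2)]
stat, df = white_test(x1, x2, y)
print(f"n*R^2 = {stat:.2f} on {df} df")
```

With the cross-product included, $df = 5$, and the 5% critical value of $\chi^2_5$ is about 11.07. Without the cross-product, $df = 4$ and the critical value is about 9.49.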

### References

• H. White (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity”, Econometrica, Vol. 48, No. 4, pp. 817–838.
• https://en.wikipedia.org/wiki/White_test

## Breusch-Pagan Test for Heteroscedasticity

Breusch–Pagan test (named after Trevor Breusch and Adrian Pagan) is used to test for heteroscedasticity in a linear regression model.

Assume our regression model is $Y_i = \beta_1 + \beta_2 X_{2i} + \mu_i$, i.e., a simple linear regression model, and $E(\mu_i^2)=\sigma_i^2$, where $\sigma_i^2=f(\alpha_1 + \alpha_2 Z_{2i})$.

That is, $\sigma_i^2$ is some function of the non-stochastic variable(s) $Z$. The function $f(\cdot)$ allows for both linear and non-linear forms of the model. The variable $Z$ may be the independent variable $X$ itself, or it may represent a group of independent variables other than $X$.

Steps to Perform the Breusch-Pagan Test

1. Estimate the model by OLS and obtain the residuals $e_i=\hat{\mu}_i$, $i=1,2,\ldots,n$.
2. Estimate the variance of the residuals, $\tilde{\sigma}^2=\frac{\sum e_i^2}{n}$ (the maximum likelihood estimator of $\sigma^2$).
3. Run the regression $\frac{e_i^2}{\tilde{\sigma}^2}=\alpha_1+\alpha_2 Z_i + v_i$ and compute the explained sum of squares (ESS) from this regression.
4. Test the statistical significance of ESS/2 by a $\chi^2$-test with 1 df at an appropriate level of significance (α). Under the null hypothesis of homoscedasticity with normal errors, ESS/2 is asymptotically $\chi^2_{(1)}$.
5. Reject the hypothesis of homoscedasticity in favour of heteroscedasticity if $\frac{ESS}{2} > \chi^2_{(1)}$ at the chosen level of α.
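The five Breusch-Pagan steps can be sketched in plain Python for the simple regression model above with a single $Z$ variable. The helper names and simulated data are hypothetical. The example assumes $Z_i = X_i$ and an error standard deviation proportional to $X_i$, so the null hypothesis should be rejected.

```python
import random


def fit_line(u, v):
    """OLS intercept and slope for v = a + b*u."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    b = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v)) / sum((ui - ubar) ** 2 for ui in u)
    return vbar - b * ubar, b


def breusch_pagan(x, y, z):
    """Return ESS/2 for the Breusch-Pagan test of y = beta1 + beta2*x + mu with one Z variable."""
    n = len(x)
    b1, b2 = fit_line(x, y)                              # step 1: OLS residuals
    e2 = [(yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)]
    sigma2 = sum(e2) / n                                 # step 2: ML estimate of the variance
    p = [v / sigma2 for v in e2]                         # step 3: scaled squared residuals
    a1, a2 = fit_line(z, p)
    pbar = sum(p) / n                                    # equals 1 by construction
    ess = sum((a1 + a2 * zi - pbar) ** 2 for zi in z)    # explained sum of squares
    return ess / 2.0                                     # step 4: compare with chi-square(1)


rng = random.Random(3)
x = [rng.uniform(1, 10) for _ in range(200)]
y = [5.0 + 1.5 * xi + rng.gauss(0.0, xi) for xi in x]   # hypothetical heteroscedastic data
stat = breusch_pagan(x, y, z=x)
print(f"ESS/2 = {stat:.2f}; the 5% critical value of chi-square(1) is about 3.84")
```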

Note that:

• The Breusch-Pagan test is valid only if the $\mu_i$'s are normally distributed.
• For $k$ independent variables in $Z$, ESS/2 has a chi-square ($\chi^2$) distribution with $k$ degrees of freedom.
• If the $\mu_i$'s (error terms) are not normally distributed, the White test is used.

References:

• Breusch, T. S.; Pagan, A. R. (1979). “A Simple Test for Heteroscedasticity and Random Coefficient Variation”. Econometrica (The Econometric Society) 47 (5): 1287–1294.

# Goldfeld-Quandt Test of Heteroscedasticity

The Goldfeld-Quandt test is one of two tests proposed in a 1965 paper by Stephen Goldfeld and Richard Quandt. Both a parametric and nonparametric test are described in the paper, but the term “Goldfeld–Quandt test” is usually associated only with the parametric test.
The Goldfeld-Quandt test is frequently used because it is easy to apply when one of the regressors (or another random variable) is considered the proportionality factor of heteroscedasticity. The test is applicable to large samples: the observations must be at least twice as many as the parameters to be estimated. The test assumes normally distributed and serially independent error terms $\mu_i$.

The Goldfeld–Quandt test compares the variance of error terms across discrete subgroups, so the data are divided into h subgroups. Usually the data set is divided into two parts or groups, and hence the test is sometimes called a two-group test.

The procedure for conducting the Goldfeld-Quandt test is as follows:

1. Order the observations according to the magnitude of X (the independent variable which is the proportionality factor).
2. Arbitrarily select a certain number (c) of central observations to omit from the analysis. For example, for n = 30, 8 central observations may be omitted, i.e., a little over a quarter of the observations. The remaining n - c observations are divided into two sub-groups of equal size, (n - c)/2. Because the data set is arranged according to the magnitude of X, one sub-group contains the small values of X and the other contains the large values of X.
3. Fit separate regressions to each sub-group and obtain the residual sum of squares from each of them:
   $RSS_1=\sum e_1^2$ is the residual sum of squares from the sub-sample of low values of X, with $(n - c)/2 - K$ df, where $K$ is the total number of parameters.

   $RSS_2=\sum e_2^2$ is the residual sum of squares from the sub-sample of large values of X, with $(n - c)/2 - K$ df.

4. Compute the ratio
   $$F^* = \frac{RSS_2/df}{RSS_1/df}=\frac{\sum e_2^2/ ((n-c)/2-K)}{\sum e_1^2/((n-c)/2-K) }$$

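If the variances differ, F* will have a large value. Under the null hypothesis of homoscedasticity (with normal errors), F* follows an F distribution with $(n-c)/2-K$ degrees of freedom in both the numerator and the denominator. The higher the observed value of the F* ratio, the stronger the evidence of heteroscedasticity of the $\mu_i$'s.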
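The four Goldfeld-Quandt steps can be sketched for a model with a single regressor ($K = 2$ parameters). The function name and the simulated data are assumptions. The example uses $n = 30$ and $c = 8$ as in step 2, so each sub-group has 11 observations and $df = 11 - 2 = 9$. If $n - c$ is odd, this sketch drops one extra central observation so that the two sub-groups stay equal in size.

```python
import random


def goldfeld_quandt(x, y, c):
    """Return (F*, df) for y = b0 + b1*x, omitting c central observations."""
    pairs = sorted(zip(x, y))                 # step 1: order observations by X
    n = len(pairs)
    m = (n - c) // 2                          # size of each sub-group
    low, high = pairs[:m], pairs[n - m:]      # step 2: drop the central observations

    def rss(group):                           # step 3: OLS on one sub-group
        xs = [p[0] for p in group]
        ys = [p[1] for p in group]
        xbar, ybar = sum(xs) / m, sum(ys) / m
        b1 = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys)) / sum((a - xbar) ** 2 for a in xs)
        b0 = ybar - b1 * xbar
        return sum((b - b0 - b1 * a) ** 2 for a, b in zip(xs, ys))

    df = m - 2                                # K = 2 parameters (b0, b1)
    f_star = (rss(high) / df) / (rss(low) / df)   # step 4
    return f_star, df


rng = random.Random(7)
x = list(range(1, 31))                                   # n = 30
y = [2.0 + 0.8 * xi + rng.gauss(0.0, xi) for xi in x]    # hypothetical: sd grows with x
f_star, df = goldfeld_quandt(x, y, c=8)
print(f"F* = {f_star:.2f} with ({df}, {df}) df")
```

The 5% critical value of F with (9, 9) degrees of freedom is about 3.18, so a larger F* would lead to rejecting homoscedasticity at that level.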

References

• Goldfeld, Stephen M.; Quandt, R. E. (June 1965). “Some Tests for Homoscedasticity”. Journal of the American Statistical Association 60 (310): 539–547
• Kennedy, Peter (2008). A Guide to Econometrics (6th ed.). Blackwell. p. 116
• Cook, R. Dennis; Weisberg, S. (April 1983). “Diagnostics for Heteroscedasticity in Regression”. Biometrika 70 (1): 1–10.