Heteroscedasticity Tests and Remedies (2018)

This post covers tests for heteroscedasticity and the remedies available when it is found.

There is a set of heteroscedasticity tests and remedies that require an assumption about the structure of the heteroscedasticity if it exists. That is, to use these tests you must choose a specific functional form for the relationship between the error variance and the variables that you believe determine the error variance. The major difference between these tests is the functional form that each test assumes.

Heteroscedasticity Tests

Breusch-Pagan Test

The Breusch-Pagan test assumes the error variance is a linear function of one or more variables.
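As a rough illustration of the idea, the sketch below regresses the squared OLS residuals on the explanatory variable and forms the statistic $nR^2$. This is the studentized variant usually attributed to Koenker rather than the original Breusch-Pagan statistic, and the simulated data, the seed, and the function names are assumptions made only for this example.

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(x, y):
    """Coefficient of determination of the simple regression of y on x."""
    a, b = ols_simple(x, y)
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def breusch_pagan_lm(x, y):
    """LM = n * R^2 from regressing the squared OLS residuals on x."""
    a, b = ols_simple(x, y)
    sq_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, sq_resid)

# Simulated data (hypothetical): the error s.d. grows in proportion to x.
rng = random.Random(0)
x = [1 + 9 * i / 499 for i in range(500)]
y = [2 + 3 * xi + rng.gauss(0, xi) for xi in x]
lm = breusch_pagan_lm(x, y)
print(f"LM statistic: {lm:.2f} (5% chi-square critical value with 1 df: 3.84)")
```

With one variable in the variance equation, the statistic is compared with a chi-square distribution with 1 degree of freedom; a value above the critical value is evidence against homoscedasticity.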

Harvey-Godfrey Test

The Harvey-Godfrey test assumes the error variance is an exponential function of one or more variables. The variables are usually assumed to be one or more of the explanatory variables in the regression equation.
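To make the contrast between the two tests concrete, the assumed variance functions can be written side by side. The symbols $z_{1i},\cdots,z_{pi}$ (the variables thought to drive the error variance) and the coefficients $\gamma_j$ are notation introduced here only for illustration:

\begin{align*}
\text{Breusch-Pagan:}\quad \sigma_i^2&=\gamma_0+\gamma_1 z_{1i}+\cdots+\gamma_p z_{pi}\\
\text{Harvey-Godfrey:}\quad \sigma_i^2&=\exp(\gamma_0+\gamma_1 z_{1i}+\cdots+\gamma_p z_{pi})
\end{align*}

In both cases the null hypothesis of homoscedasticity is $\gamma_1=\cdots=\gamma_p=0$, so that the error variance reduces to a constant.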

The White Test

The White test is a general test for detecting heteroscedasticity in a data set. It has the following advantages:

  1. It does not require you to specify a model of the structure of the heteroscedasticity if it exists.
  2. It does not depend on the assumption that the errors are normally distributed.
  3. It specifically tests if the presence of heteroscedasticity causes the OLS formula for the variances and the covariances of the estimates to be incorrect.
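As an illustration, for a model with an intercept and two regressors $x_{2i}$ and $x_{3i}$, the White test is usually carried out through an auxiliary regression of the squared OLS residuals $\hat{u}_i^2$ on the regressors, their squares, and their cross product. The coefficient names $\gamma_j$ and the error term $v_i$ are notation assumed for this sketch:

\[\hat{u}_i^2=\gamma_0+\gamma_1 x_{2i}+\gamma_2 x_{3i}+\gamma_3 x_{2i}^2+\gamma_4 x_{3i}^2+\gamma_5 x_{2i}x_{3i}+v_i\]

Under the null hypothesis of homoscedasticity, $nR^2$ from this auxiliary regression is asymptotically chi-square distributed with 5 degrees of freedom (the number of slope terms in the auxiliary regression).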

Remedies for Heteroscedasticity

Suppose that you find evidence of heteroscedasticity. If you use the OLS estimator, you will get unbiased but inefficient estimates of the parameters of the model. Also, the estimates of the variances and covariances of the parameter estimates will be biased and inconsistent, and as a result, hypothesis tests will not be valid. When there is evidence of heteroscedasticity, econometricians do one of two things:

  • Use the OLS estimator to estimate the parameters of the model. Correct the estimates of the variances and covariances of the OLS estimates so that they are consistent.
  • Use an estimator other than the OLS estimator to estimate the parameters of the model.
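The first alternative can be sketched for a simple regression as follows. The code keeps the OLS slope and replaces the conventional standard error with White's heteroscedasticity-consistent (HC0) version, which is one common correction among several; the function name and the tiny hand-checkable data set are assumptions for illustration only.

```python
import math

def ols_slope_with_se(x, y):
    """Return (slope, conventional_se, hc0_se) for y = a + b*x fitted by OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]                 # deviations of x from its mean
    sxx = sum(d * d for d in xd)
    slope = sum(d * (yi - y_bar) for d, yi in zip(xd, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Conventional formula: assumes one common error variance s^2.
    s2 = sum(e * e for e in resid) / (n - 2)
    conventional_se = math.sqrt(s2 / sxx)
    # HC0 formula: each squared residual stands in for its own error variance.
    hc0_se = math.sqrt(sum(d * d * e * e for d, e in zip(xd, resid))) / sxx
    return slope, conventional_se, hc0_se

# Hypothetical toy data, small enough to verify by hand.
x = [-1.0, -1.0, 1.0, 1.0]
y = [0.0, -2.0, 2.0, 0.0]
slope, conventional_se, hc0_se = ols_slope_with_se(x, y)
print(f"slope={slope:.4f}, conventional SE={conventional_se:.4f}, HC0 SE={hc0_se:.4f}")
# prints: slope=1.0000, conventional SE=0.7071, HC0 SE=0.5000
```

The point estimate is the same under both approaches; only the estimated standard error changes.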

Many econometricians choose the first alternative. This is because the most serious consequence of using the OLS estimator when there is heteroscedasticity is that the estimates of the variances and covariances of the parameter estimates are biased and inconsistent. If this problem is corrected, then the only shortcoming of using OLS is that you lose some precision relative to some other estimator that you could have used.


However, to get more precise estimates with an alternative estimator, you must know the approximate structure of the heteroscedasticity. If you specify the wrong model of heteroscedasticity, then this alternative estimator can yield estimates that are worse than the OLS estimates.
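As an illustration of the second alternative, the sketch below fits a simple regression by weighted least squares. It assumes, purely for the example, that the error variance is proportional to $x_i^2$, so each observation receives the weight $1/x_i^2$; the data values and the function name are hypothetical. If that assumed structure is wrong, the warning above applies.

```python
def weighted_least_squares(x, y, weights):
    """Return (intercept, slope) minimising sum(w_i * (y_i - a - b*x_i)**2)."""
    w_sum = sum(weights)
    # Weighted means replace the ordinary means used by OLS.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Assumed (hypothetical) variance structure: Var(e_i) proportional to x_i^2,
# so observations with large x are down-weighted.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.1, 7.8, 11.3, 13.6, 17.5]
weights = [1.0 / xi ** 2 for xi in x]
intercept, slope = weighted_least_squares(x, y, weights)
print(f"WLS fit: y = {intercept:.3f} + {slope:.3f} x")
```

Setting all weights equal reproduces the ordinary OLS fit, which makes it easy to compare the two sets of estimates on the same data.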


Consequences of Heteroscedasticity (2013)

When heteroscedasticity is present in the data, estimates based on Ordinary Least Squares (OLS) are subject to the following consequences:

  1. We cannot apply the formula of the variance of the coefficients to conduct tests of significance and construct confidence intervals.
  2. If the error term ($\mu_i$) is heteroscedastic, then the OLS estimates do not have the minimum variance property in the class of unbiased estimators, i.e. they are inefficient in small samples. Furthermore, they are asymptotically inefficient. The large standard errors may lead to incorrect conclusions about the statistical significance of the regression coefficients.
  3. The estimated coefficients remain unbiased statistically. That means the property of unbiasedness of OLS estimation is not violated by the presence of heteroscedasticity.
  4. The forecasts based on the model with heteroscedasticity will be less efficient as OLS estimation yields higher values of the variance of the estimated coefficients.

All this means the standard errors will be underestimated and the t-statistics and F-statistics will be inaccurate. Heteroscedasticity is caused by several factors, but the main cause is variables whose values differ substantially across observations. For instance, GDP will suffer from heteroscedasticity if we include large countries such as the USA and small countries such as Cuba. In this case, it may be better to use GDP per person. Also, note that heteroscedasticity tends to affect cross-sectional data more than time series.

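Consider the simple linear regression model (SLRM) $Y_i=\alpha+\beta X_i+\epsilon_i$, which in deviations from the means can be written as $y_i=\beta x_i+\epsilon_i$.

The OLS estimators of $\beta$ and $\alpha$ are obtained as follows. For the slope,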

\begin{align*}
\hat{\beta}&=\frac{\sum x_i y_i}{\sum x_i^2}=\frac{\sum x_i (\beta x_i +\epsilon_i)}{\sum x_i^2}\\
&=\beta\frac{\sum x_i^2}{\sum x_i^2}+\frac{\sum x_i \epsilon_i}{\sum x_i^2}\\
&=\beta + \frac{\sum x_i \epsilon_i}{\sum x_i^2}
\end{align*}

Applying expectations on both sides we get:

\[E(\hat{\beta})=\beta+\frac{\sum E(x_i \epsilon_i)}{\sum x_i^2}=\beta, \qquad \text{since } E(\epsilon_i x_i)=0\]

Similarly

\begin{align*}\hat{\alpha}&=\overline{Y}-\hat{\beta}\overline{X}\\
&=\alpha+\beta\overline{X}+\overline{\epsilon}-\hat{\beta}\overline{X}\\
E(\hat{\alpha})&=\alpha+\beta\overline{X}+0-\overline{X}\beta=\alpha, \qquad \text{since } E(\overline{\epsilon})=0 \text{ and } E(\hat{\beta})=\beta
\end{align*}

Hence, the unbiasedness property of OLS estimation is not affected by heteroscedasticity.

For further details about the consequences of heteroscedasticity on OLS parameters, see https://itfeature.com/hetero/hetero-intro/heteroscedasticity-consequences/
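The unbiasedness result can also be checked informally by simulation. The sketch below repeatedly draws samples whose error standard deviation grows with $x_i$ and averages the OLS slope estimates; the parameter values, the variance pattern $0.5\,x_i$, the seed, and the function names are all assumptions chosen for illustration.

```python
import random

def ols_slope(x, y):
    """OLS slope: deviation cross-products over squared x deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

def mean_slope_estimate(alpha, beta, x, reps, seed=0):
    """Average the OLS slope over `reps` samples with error s.d. 0.5 * x_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # Heteroscedastic errors: the spread increases with x_i.
        y = [alpha + beta * xi + rng.gauss(0, 0.5 * xi) for xi in x]
        total += ols_slope(x, y)
    return total / reps

x = [1 + 9 * i / 49 for i in range(50)]   # fixed regressor values in [1, 10]
avg = mean_slope_estimate(alpha=2.0, beta=3.0, x=x, reps=2000)
print(f"true beta = 3.0, average estimate = {avg:.3f}")
```

The average of the estimates should sit close to the true slope even though the errors are heteroscedastic; what the simulation does not show is efficiency, which is where OLS loses out.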


Goldfeld Quandt Test: Comparison of Variances of Error Terms

The Goldfeld Quandt test is one of two tests proposed in a 1965 paper by Stephen Goldfeld and Richard Quandt. Both parametric and nonparametric tests are described in the paper, but the term “Goldfeld–Quandt test” is usually associated only with the parametric test.
The Goldfeld-Quandt test is frequently used because it is easy to apply when one of the regressors (or another random variable) is considered the proportionality factor of heteroscedasticity. The test is applicable for large samples: the observations must be at least twice as many as the parameters to be estimated. It assumes normally distributed and serially independent error terms $u_i$.

The Goldfeld Quandt test compares the variance of error terms across discrete subgroups. So data is divided into h subgroups. Usually, the data set is divided into two parts or groups, and hence the test is sometimes called a two-group test.


Before performing the Goldfeld-Quandt test, you may wish to read more about heteroscedasticity, its remedial measures, tests of heteroscedasticity, and generalized least squares methods.

Goldfeld Quandt Test Procedure:

The procedure for conducting the Goldfeld-Quandt test is:

  1. Order the observations according to the magnitude of $X$ (the independent variable that is the proportionality factor).
  2. Arbitrarily select a number ($c$) of central observations to omit from the analysis (for $n=30$, 8 central observations are omitted, i.e. roughly a quarter of the observations are removed). The remaining $n-c$ observations are divided into two sub-groups of equal size, i.e. $\frac{n-c}{2}$ each: one sub-group contains the small values of $X$ and the other contains the large values of $X$.
  3. Fit a separate regression to each of the sub-groups, and obtain the sum of squared residuals from each of them.
    So $\sum e_1^2$ is the sum of squared residuals from the sub-sample of low values of $X$, with $(n-c)/2-K$ df, where $K$ is the total number of parameters. Likewise, $\sum e_2^2$ is the sum of squared residuals from the sub-sample of large values of $X$, with the same $(n-c)/2-K$ df.
  4. Compute the ratio $F^* = \frac{RSS_2/df}{RSS_1/df}=\frac{\sum e_2^2/((n-c)/2-K)}{\sum e_1^2/((n-c)/2-K)}$

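If the variances differ, $F^*$ will have a large value. Under the null hypothesis of homoscedasticity, $F^*$ follows an $F$ distribution with $(n-c)/2-K$ degrees of freedom in both the numerator and the denominator, so the null is rejected when $F^*$ exceeds the critical value. The higher the observed value of the $F^*$ ratio, the stronger the heteroscedasticity of the $u_i$.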
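The procedure above can be sketched in a few lines for a simple regression with $K=2$ parameters. The simulated data (error standard deviation proportional to $X$), the seed, and the function names are assumptions for illustration; $n=30$ and $c=8$ follow the example in step 2.

```python
import random

def residual_sum_of_squares(x, y):
    """RSS from an OLS fit of y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt(x, y, c, k=2):
    """Return (F*, df) after dropping the c central observations ordered by x."""
    pairs = sorted(zip(x, y))                     # step 1: order by x
    m = (len(pairs) - c) // 2                     # step 2: size of each sub-group
    low, high = pairs[:m], pairs[-m:]
    rss1 = residual_sum_of_squares(*zip(*low))    # step 3: small-x sub-group
    rss2 = residual_sum_of_squares(*zip(*high))   # step 3: large-x sub-group
    df = m - k
    return (rss2 / df) / (rss1 / df), df          # step 4: ratio of variances

# Simulated data (hypothetical): the error s.d. is proportional to x.
rng = random.Random(1)
x = [float(i) for i in range(1, 31)]
y = [2 + 3 * xi + rng.gauss(0, xi) for xi in x]
f_star, df = goldfeld_quandt(x, y, c=8)
print(f"F* = {f_star:.2f} with ({df}, {df}) df")
```

Here each sub-group has 11 observations and 9 degrees of freedom, so $F^*$ would be compared with the tabulated critical value of the $F$ distribution with (9, 9) df, which is approximately 3.18 at the 5% level.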


References

  • Goldfeld, Stephen M.; Quandt, R. E. (June 1965). “Some Tests for Homoscedasticity”. Journal of the American Statistical Association 60 (310): 539–547
  • Kennedy, Peter (2008). A Guide to Econometrics (6th ed.). Blackwell. p. 116


Heteroscedasticity Definition, Reasons, Consequences (2012)

Heteroscedasticity Definition

An important assumption of OLS is that the disturbances $u_i$ appearing in the population regression function are homoscedastic (Error terms have the same variance).

The variance of each disturbance term $u_i$, conditional on the chosen values of the explanatory variables, is some constant number equal to $\sigma^2$, that is, $E(u_{i}^{2})=\sigma^2$, where $i=1,2,\cdots, n$.
Homo means equal and scedasticity means spread.

Consider the general linear regression model
\[y_i=\beta_1+\beta_2 x_{2i}+ \beta_3 x_{3i} +\cdots + \beta_k x_{ki} + \varepsilon_i\]

If $E(\varepsilon_{i}^{2})=\sigma^2$ for all $i=1,2,\cdots, n$ then the assumption of constant variance of the error term or homoscedasticity is satisfied.

If instead $E(\varepsilon_{i}^{2})=\sigma_{i}^{2}$ is not the same for all $i$, then the assumption of homoscedasticity is violated and heteroscedasticity is said to be present. In the case of heteroscedasticity, the OLS estimators are unbiased but inefficient.

Examples:

  1. The range in family income between the poorest and richest families in town is the classical example of heteroscedasticity.
  2. The range in annual sales between a corner drug store and a general store.

Reasons for Heteroscedasticity

There are several reasons why the variances of error term $u_i$ may be variable, some of which are:

  1. Following the error-learning models, as people learn, their errors of behavior become smaller over time. In this case $\sigma_{i}^{2}$ is expected to decrease. For example, consider the number of typing errors made in a given period on a test in relation to the hours put into typing practice: as practice increases, both the average number of errors and their variance tend to fall.
  2. As income grows, people have more discretionary income and more scope for choice about how to dispose of it, and hence $\sigma_{i}^{2}$ is likely to increase with income.
  3. As data-collecting techniques improve, $\sigma_{i}^{2}$ is likely to decrease.
  4. Heteroscedasticity can also arise as a result of the presence of outliers. The inclusion or exclusion of such observations, especially when the sample size is small, can substantially alter the results of regression analysis.
  5. Heteroscedasticity can arise from violating the CLRM (classical linear regression model) assumption that the regression model is correctly specified.
  6. Skewness in the distribution of one or more regressors included in the model is another source of heteroscedasticity.
  7. Incorrect data transformation and incorrect functional form (linear or log-linear model) are also sources of heteroscedasticity.

Consequences of Heteroscedasticity

  1. The OLS estimators and regression predictions based on them remain unbiased and consistent.
  2. The OLS estimators are no longer the BLUE (Best Linear Unbiased Estimators) because they are no longer efficient, so the regression predictions will be inefficient too.
  3. Because of the inconsistency of the covariance matrix of the estimated regression coefficients, the tests of hypotheses, (t-test, F-test) are no longer valid.

Note: Problems of heteroscedasticity are likely to be more common in cross-sectional than in time series data.


FAQs about Heteroscedasticity

  1. Define heteroscedasticity.
  2. What are the major consequences that may occur if heteroscedasticity is present?
  3. What is meant by the constant variance of the error term in linear regression models?
  4. What are the possible reasons that make the error term variance a variable?
  5. In what kind of data are problems of heteroscedasticity likely to exist?