Park Glejser Test: Numerical Example (2021)

To detect the presence of heteroscedasticity using the Park Glejser test, consider the following data.

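| Year | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 |
|---|---|---|---|---|---|---|---|
| $Y_t$ | 37 | 48 | 45 | 36 | 25 | 55 | 63 |
| $X_t$ | 4.5 | 6.5 | 3.5 | 3 | 2.5 | 8.5 | 7.5 |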

The step-by-step procedure for conducting the Park Glejser test:

Step 1: Obtain an estimate of the regression equation

$$\hat{Y}_i = 19.8822 + 4.7173X_i$$

Obtain the residuals from this estimated regression equation:

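| Year | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 |
|---|---|---|---|---|---|---|---|
| Residuals | -4.1103 | -2.5450 | 8.6071 | 1.9657 | -6.6756 | -4.9797 | 7.7377 |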
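Step 1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative and not part of the original post. It fits the simple regression of $Y$ on $X$ by least squares and recovers the residuals.

```python
import numpy as np

# Data from the example (1992-1998)
x = np.array([4.5, 6.5, 3.5, 3.0, 2.5, 8.5, 7.5])
y = np.array([37, 48, 45, 36, 25, 55, 63], dtype=float)

# Least-squares fit of y = intercept + slope * x
slope, intercept = np.polyfit(x, y, 1)

# Residuals and their absolute values (used in Step 2)
residuals = y - (intercept + slope * x)
abs_residuals = np.abs(residuals)

print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
print("residuals:", np.round(residuals, 4))
```

The array `abs_residuals` is the dependent variable for the Glejser regressions in the next step.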

Take the absolute values of these residuals and use them as the dependent variable in each of the functional forms suggested by Glejser.

Step 2: Regress the absolute values of $\hat{u}_i$ on the $X$ variable that is thought to be closely associated with $\sigma_i^2$. We will use the following functional forms.

| Sr. No. | Functional Form | Results |
|---|---|---|
| 1) | $\lvert\hat{u}_i\rvert = \beta_1 + \beta_2 X_i + v_i$ | $\lvert\hat{u}_i\rvert = 5.2666-0.00681X_i,\quad R^2=0.00004,\quad t_{cal} = -0.014$ |
| 2) | $\lvert\hat{u}_i\rvert = \beta_1 + \beta_2 \sqrt{X_i} + v_i$ | $\lvert\hat{u}_i\rvert = 5.445-0.0962\sqrt{X_i},\quad R^2=0.000389,\quad t_{cal} = -0.04414$ |
| 3) | $\lvert\hat{u}_i\rvert = \beta_1 + \beta_2 \frac{1}{X_i} + v_i$ | $\lvert\hat{u}_i\rvert = 4.9124+1.3571\frac{1}{X_i},\quad R^2=0.00332,\quad t_{cal} = 0.12914$ |
| 4) | $\lvert\hat{u}_i\rvert = \beta_1 + \beta_2 \frac{1}{\sqrt{X_i}} + v_i$ | $\lvert\hat{u}_i\rvert = 4.7375+1.0428\frac{1}{\sqrt{X_i}},\quad R^2=0.00209,\quad t_{cal} = 0.10252$ |

Since none of the residual regressions is statistically significant (every $|t_{cal}|$ is well below 1), we fail to reject the null hypothesis of homoscedasticity. In other words, there is no evidence of a relationship between the absolute value of the residuals ($|\hat{u}_i|$) and the explanatory variable $X$.
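The four Glejser regressions can be reproduced with a few lines of code. This is a rough sketch, assuming NumPy; the helper `simple_ols` and the constant `T_CRIT` are names introduced here for illustration. The critical value 2.571 is the two-sided 5% point of the $t$ distribution with $n-2=5$ degrees of freedom.

```python
import numpy as np

x = np.array([4.5, 6.5, 3.5, 3.0, 2.5, 8.5, 7.5])
y = np.array([37, 48, 45, 36, 25, 55, 63], dtype=float)

# Step 1: absolute residuals from the original regression of Y on X
slope, intercept = np.polyfit(x, y, 1)
abs_u = np.abs(y - (intercept + slope * x))

def simple_ols(z, w):
    """Regress w on a constant and z; return (intercept, slope, R^2, t-ratio of slope)."""
    n = len(z)
    zc = z - z.mean()
    b2 = (zc @ (w - w.mean())) / (zc @ zc)
    b1 = w.mean() - b2 * z.mean()
    resid = w - (b1 + b2 * z)
    rss = resid @ resid
    tss = ((w - w.mean()) ** 2).sum()
    se_b2 = np.sqrt(rss / (n - 2) / (zc @ zc))
    return b1, b2, 1 - rss / tss, b2 / se_b2

# Step 2: the four transformations of X used as regressors
forms = {
    "X": x,
    "sqrt(X)": np.sqrt(x),
    "1/X": 1 / x,
    "1/sqrt(X)": 1 / np.sqrt(x),
}

T_CRIT = 2.571  # two-sided 5% critical t with 5 df (assumed significance level)
results = {name: simple_ols(z, abs_u) for name, z in forms.items()}
for name, (b1, b2, r2, t) in results.items():
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name:>9}: b1={b1:.4f}, b2={b2:.5f}, R2={r2:.5f}, t={t:.5f} ({verdict})")
```

Every slope should come out insignificant, which agrees with the conclusion above.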


Goldfeld-Quandt Test Example (2020)

Data is taken from the Economic Survey of Pakistan 1991-1992. The data file link is at the end of the post “Goldfeld-Quandt Test Example for the Detection of Heteroscedasticity”.

Read about the Goldfeld-Quandt Test in detail by clicking the link “Goldfeld-Quandt Test: Comparison of Variances of Error Terms“.

Goldfeld-Quandt Test Example

For an illustration of the Goldfeld-Quandt Test Example, the data given in the file should be divided into two sub-samples after dropping (removing/deleting) the middle five observations.

Sub-sample 1 consists of data from 1959-60 to 1970-71.

Sub-sample 2 consists of data from 1976-77 to 1987-88.

Sub-sample 1 is highlighted in green and sub-sample 2 in blue, while the middle observations that have to be deleted are highlighted in red.


The Step-by-Step Procedure to Conduct the Goldfeld Quandt Test

Step 1: Order or Rank the observations according to the value of $X_i$. (Note that observations are already ranked.)

Step 2: Omit $c$ central observations. Here the middle five observations are dropped, which is roughly 1/6 of the sample.

Step 3: Fit OLS regression on both samples separately and obtain the Residual Sum of Squares (RSS) for each sub-sample.

The estimated regressions for the two sub-samples are:

Sub-sample 1: $\hat{C}_1 = 1010.096 + 0.849 \text{Income}$

Sub-sample 2: $\hat{C}_2 = -244.003 + 0.88067 \text{Income}$

Now compute the Residual Sum of Squares for both sub-samples.

The residual Sum of Squares for Sub-Sample 1 is $RSS_1=2532224$

The residual Sum of Squares for Sub-Sample 2 is $RSS_2=10339356$

The F-statistic is $\lambda=\frac{RSS_2/df_2}{RSS_1/df_1}=\frac{10339356/10}{2532224/10}=4.083$, where each sub-sample has $df = 12 - 2 = 10$ degrees of freedom (12 observations minus 2 estimated parameters).

The critical value of $F(10, 10)$ at the 5% level of significance is 2.98.

Since the computed F value is greater than the critical value, heteroscedasticity exists in this case; that is, the variance of the error term is not constant but depends on the independent variable, GNP.
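The arithmetic of the test statistic can be verified directly. The sketch below uses only the RSS values reported above; the sub-sample size of 12 is inferred from the year ranges (1959-60 to 1970-71 and 1976-77 to 1987-88), and the variable names are illustrative.

```python
# Residual sums of squares reported for the two sub-samples
rss_1 = 2532224.0   # low-income sub-sample
rss_2 = 10339356.0  # high-income sub-sample

n_sub = 12          # observations per sub-sample (inferred from the year ranges)
k = 2               # estimated parameters: intercept and slope
df = n_sub - k      # degrees of freedom in each sub-sample

# Goldfeld-Quandt statistic: ratio of the two residual variances
f_stat = (rss_2 / df) / (rss_1 / df)

F_CRIT = 2.98       # F(10, 10) critical value at the 5% level, as quoted in the text
reject_homoscedasticity = f_stat > F_CRIT

print(f"F = {f_stat:.3f}, df = ({df}, {df}), reject H0: {reject_homoscedasticity}")
```

Because both sub-samples have the same degrees of freedom, the statistic reduces to the plain ratio $RSS_2/RSS_1$.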

Your assignment is to perform the Goldfeld-Quandt Test Example using any statistical software and confirm the results.

Download the data file by clicking the link “GNP and consumption expenditure data“.


Heteroscedasticity Residual Plot (2020)

This post is about the heteroscedasticity residual plot.

Heteroscedasticity and Heteroscedasticity Residual Plot

One of the assumptions of the classical linear regression model is that there is no heteroscedasticity, that is, the error terms have constant variance. Under this and the other classical assumptions, the ordinary least squares (OLS) estimators are BLUE (best linear unbiased estimators): their variances are the lowest among all linear unbiased estimators (Gauss-Markov theorem).

If the assumption of constant variance does not hold, the Gauss-Markov theorem does not apply. For heteroscedastic data, regression analysis still provides an unbiased estimate of the relationship between the predictors and the outcome variable, but that estimate is no longer the most efficient.

As discussed, heteroscedasticity occurs when the error term has non-constant variance. In this case, we can think of the disturbance for each observation as being drawn from a different distribution with a different variance. Stated equivalently, the variance of the observed value of the dependent variable around the regression line is non-constant.

We can think of each observed value of the dependent variable as being drawn from a different conditional probability distribution with a different conditional variance. A general linear regression model with the assumption of heteroscedasticity can be expressed as follows

\begin{align*}
y_i & = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i\\
Var(\varepsilon_i)&=E(\varepsilon_i^2)\\
&=\sigma_i^2; \quad i=1,2,\cdots, n
\end{align*}

Note that we have an $i$ subscript attached to sigma squared. This indicates that the disturbance for each of the $n$ units is drawn from a probability distribution that has a different variance.

If the error term has non-constant variance, but all other assumptions of the classical linear regression model are satisfied, then the consequences of using the OLS estimator to obtain estimates of the population parameters are:

  • The OLS estimator is still unbiased
  • The OLS estimator is inefficient; that is, it is not BLUE
  • The estimated variances and covariances of the OLS estimates are biased and inconsistent
  • Hypothesis tests are not valid

Detection of Heteroscedasticity Residual Plot

The residual for the $i$th observation, $\hat{\varepsilon}_i$, is an estimate of the unknown and unobservable error for that observation, $\varepsilon_i$. Thus the squared residual, $\hat{\varepsilon}_i^2$, can be used as an estimate of the unknown and unobservable error variance, $\sigma_i^2=E(\varepsilon_i^2)$.

One can calculate the squared residuals and then plot them against an explanatory variable that you believe might be related to the error variance.  If you believe that the error variance may be related to more than one of the explanatory variables, you can plot the squared residuals against each one of these variables.  Alternatively, you could plot the squared residuals against the fitted value of the dependent variable obtained from the OLS estimates.  Most statistical programs (software) have a command to do these residual plots.  It must be emphasized that this is not a formal test for heteroscedasticity.  It would only suggest whether heteroscedasticity may exist.
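The quantities that go into such a plot are easy to compute. The following is a small sketch on simulated data, assuming NumPy; the data-generating process, in which the error standard deviation grows in proportion to $X$, is an assumption made purely for illustration. Instead of drawing the plot, it compares the average squared residual for small and large values of $X$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: the error standard deviation is proportional to x (an assumption)
x = np.linspace(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)

# OLS fit, fitted values, and squared residuals
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
sq_resid = (y - fitted) ** 2

# Numerical stand-in for the plot: average squared residual in each half of x
low_half = sq_resid[x <= np.median(x)].mean()
high_half = sq_resid[x > np.median(x)].mean()

print(f"mean squared residual, small x: {low_half:.3f}")
print(f"mean squared residual, large x: {high_half:.3f}")
```

Plotting `sq_resid` against `x` (or against `fitted`) with any plotting library gives the residual plot described above. A clearly larger average for large $X$ is the numerical counterpart of a fan-shaped plot, and like the plot it is only suggestive, not a formal test.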

Below there are residual plots showing the three typical patterns. The first plot shows a random pattern that indicates a good fit for a linear model. The other two plots show non-random patterns (U-shaped and inverted U), suggesting that a non-linear model would fit better than a linear regression model.

[Figures: Heteroscedasticity Residual Plot 1, Plot 2, and Plot 3]


Goldfeld Quandt Test: Comparison of Variances of Error Terms

The Goldfeld Quandt test is one of two tests proposed in a 1965 paper by Stephen Goldfeld and Richard Quandt. Both parametric and nonparametric tests are described in the paper, but the term “Goldfeld–Quandt test” is usually associated only with the parametric test.
The Goldfeld-Quandt test is frequently used because it is easy to apply when one of the regressors (or another random variable) is considered the proportionality factor of heteroscedasticity. The test is applicable to large samples: the observations must be at least twice as many as the parameters to be estimated. It assumes normally distributed and serially independent error terms $u_i$.

The Goldfeld Quandt test compares the variance of error terms across discrete subgroups. So data is divided into h subgroups. Usually, the data set is divided into two parts or groups, and hence the test is sometimes called a two-group test.


Before learning how to perform the Goldfeld-Quandt test, you may want to read more about heteroscedasticity, its remedial measures, tests of heteroscedasticity, and the generalized least squares method.

Goldfeld Quandt Test Procedure:

The procedure for conducting the Goldfeld-Quandt Test is:

  1. Order the observations according to the magnitude of $X$ (the independent variable which is the proportionality factor).
  2. Select arbitrarily a certain number ($c$) of central observations to omit from the analysis (for $n=30$, 8 central observations are omitted, i.e. roughly one-quarter of the observations). The remaining $n-c$ observations are divided into two sub-groups of equal size, $\frac{n-c}{2}$ each: one sub-group contains the small values of $X$ and the other the large values of $X$, since the data set is arranged according to the magnitude of $X$.
  3. Fit a separate regression to each of the sub-groups, and obtain the sum of squared residuals from each of them.
    Here $\sum e_1^2$ is the sum of squared residuals from the sub-sample of low values of $X$, and $\sum e_2^2$ is the sum of squared residuals from the sub-sample of large values of $X$. Each has $(n-c)/2-k$ df, where $k$ is the total number of parameters.
  4. Compute the ratio $F^* = \frac{RSS_2/df}{RSS_1/df}=\frac{\sum e_2^2/ ((n-c)/2-k)}{\sum e_1^2/((n-c)/2-k) }$

If variances differ, F* will have a large value. The higher the observed value of the F*-ratio the stronger the heteroscedasticity of the $u_i$.
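The four steps above can be sketched as a short function. This is an illustrative implementation, assuming NumPy and a single regressor; the function names `rss_ols` and `goldfeld_quandt` and the constructed data are hypothetical and do not come from the original paper.

```python
import numpy as np

def rss_ols(x, y):
    """Residual sum of squares from an OLS fit of y on a constant and x."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def goldfeld_quandt(x, y, c, k=2):
    """Return (F*, df) after dropping c central observations.

    Assumes n - c is even so that both sub-groups have (n - c) / 2 observations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")   # Step 1: order by the magnitude of X
    x, y = x[order], y[order]
    m = (len(x) - c) // 2                  # Step 2: size of each sub-group
    rss_1 = rss_ols(x[:m], y[:m])          # Step 3: low-X sub-group
    rss_2 = rss_ols(x[-m:], y[-m:])        # Step 3: high-X sub-group
    df = m - k
    return (rss_2 / df) / (rss_1 / df), df # Step 4: ratio of residual variances

# Constructed data: alternating-sign errors whose size grows with x
x = np.arange(1, 31)
signs = np.where(x % 2 == 0, 1.0, -1.0)
y_hetero = 2 + 3 * x + 0.5 * x * signs     # error spread proportional to x
y_homo = 2 + 3 * x + signs                 # constant error spread

f_hetero, df = goldfeld_quandt(x, y_hetero, c=8)
f_homo, _ = goldfeld_quandt(x, y_homo, c=8)
print(f"df = {df}, F* (growing spread) = {f_hetero:.2f}, F* (constant spread) = {f_homo:.2f}")
```

With $n=30$ and $c=8$, each sub-group has 11 observations and 9 degrees of freedom. The computed $F^*$ would then be compared with the critical $F$ value for those degrees of freedom at the chosen significance level.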


References

  • Goldfeld, Stephen M.; Quandt, Richard E. (June 1965). “Some Tests for Homoscedasticity”. Journal of the American Statistical Association 60 (310): 539–547
  • Kennedy, Peter (2008). A Guide to Econometrics (6th ed.). Blackwell. p. 116
