Statistics for Data Analyst - Statistics MCQs, Analysis, Software

White General Heteroscedasticity Test (Numerical) 2021

May 2, 2024 / Jan 20, 2021 by Muhammad Imdad Ullah

One important assumption of Regression is that the variance of the Error Term is constant across observations. If the error has a constant variance, then the errors are called homoscedastic, otherwise heteroscedastic. In the case of heteroscedastic errors (non-constant variance), the standard estimation methods become inefficient. Typically, to assess the assumption of homoscedasticity, residuals are plotted.

Read about Heteroscedasticity Consequences in detail.


We will use the following data to test for the presence of heteroscedasticity using the White General Heteroscedasticity test.

Income	Education	Job Experience
5	2	9
9.7	4	18
28.4	8	21
8.8	8	12
21	8	14
26.6	10	16
25.4	12	16
23.1	12	9
22.5	12	18
19.5	12	5
21.7	12	7
24.8	13	9
30.1	14	12
24.8	14	17
28.5	15	19
26	15	6
38.9	16	17
22.1	16	1
33.1	17	10
48.3	21	17

White General Heteroscedasticity Test

To perform the White General Heteroscedasticity test, the general procedure is

Step 1: Run the regression and obtain the residuals $\hat{u}_i$ of this regression equation.

The regression model is: $income_i = \beta_1+\beta_2\, educ_i + \beta_3\, jobexp_i + u_i$

The regression results are: $\widehat{Income}_i=-7.09686 + 1.93339\, educ_{i} + 0.649365\, jobexp_{i}$
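Step 1 can be reproduced with a few lines of code. The following is a minimal sketch using only the Python standard library; the helper names `solve` and `ols` are illustrative and not part of the original post, and the normal equations are solved directly, which is adequate for a small example like this one.

```python
# (income, educ, jobexp) rows copied from the data table above
DATA = [
    (5, 2, 9), (9.7, 4, 18), (28.4, 8, 21), (8.8, 8, 12), (21, 8, 14),
    (26.6, 10, 16), (25.4, 12, 16), (23.1, 12, 9), (22.5, 12, 18), (19.5, 12, 5),
    (21.7, 12, 7), (24.8, 13, 9), (30.1, 14, 12), (24.8, 14, 17), (28.5, 15, 19),
    (26, 15, 6), (38.9, 16, 17), (22.1, 16, 1), (33.1, 17, 10), (48.3, 21, 17),
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

y = [income for income, _, _ in DATA]
X = [[1.0, educ, jobexp] for _, educ, jobexp in DATA]  # leading 1 is the intercept

beta = ols(X, y)
residuals = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]

print("coefficients:", [round(b, 5) for b in beta])
```

The printed coefficients should agree with the regression results quoted above, and `residuals` holds the $\hat{u}_i$ needed for the auxiliary regression.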

Step 2: Run the following auxiliary regression

$$\hat{u}_i^2=\alpha_1+\alpha_2X_{2i}+\alpha_3 X_{3i}+\alpha_4 X_{2i}^2+\alpha_5X_{3i}^2+\alpha_6X_{2i}X_{3i}+v_i $$

that is, regress the squared residuals on a constant, all the explanatory variables, the squared explanatory variables, and their respective cross-product.

Here, in the auxiliary regression, the dependent variable is the squared residual $\hat{u}_i^2$ from Step 1, $X_2$ is educ, and $X_3$ is jobexp.

The results from the auxiliary regression are:

$$\hat{u}_i^2=42.6145 -0.10872\,X_{2i} - 5.8402\, X_{3i} -0.15273\, X_{2i}^2 + 0.200715\, X_{3i}^2 + 0.226517\,X_{2i}X_{3i}$$

Step 3: Formulate the null and alternative hypotheses

$H_0: \alpha_2=\alpha_3=\cdots=\alpha_6=0$

$H_1$: at least one of the $\alpha$s is different from zero

Step 4: Reject the null and conclude that there is significant evidence of heteroscedasticity when the statistic is bigger than the critical value.

The statistic with computed value is:

$$n \cdot R^2 = 20\times 0.4488 \approx 8.977$$

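The statistic asymptotically follows a $\chi^2_{df}$ distribution, where $df=k-1$ and $k$ is the number of parameters in the auxiliary regression (here $k=6$, so $df=5$). The critical value of $\chi^2_5$ at the 5% level of significance is 11.07.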

Since the calculated value (8.977) is smaller than the tabulated value (11.07), the null hypothesis is not rejected. Therefore, based on the White general heteroscedasticity test, there is no significant evidence of heteroscedasticity.
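The whole procedure (Steps 1 to 4) can be sketched in code as follows. This is an illustrative standard-library implementation, not the software used for the results above; the helper names (`solve`, `ols`, `fitted`, `r_squared`) are assumptions introduced here, and the critical value 11.07 is hard-coded from the text rather than computed.

```python
# (income, educ, jobexp) rows copied from the data table above
DATA = [
    (5, 2, 9), (9.7, 4, 18), (28.4, 8, 21), (8.8, 8, 12), (21, 8, 14),
    (26.6, 10, 16), (25.4, 12, 16), (23.1, 12, 9), (22.5, 12, 18), (19.5, 12, 5),
    (21.7, 12, 7), (24.8, 13, 9), (30.1, 14, 12), (24.8, 14, 17), (28.5, 15, 19),
    (26, 15, 6), (38.9, 16, 17), (22.1, 16, 1), (33.1, 17, 10), (48.3, 21, 17),
]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def fitted(row, coef):
    return sum(c * v for c, v in zip(coef, row))

def r_squared(rows, y, coef):
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fitted(r, coef)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

n = len(DATA)
y = [income for income, _, _ in DATA]

# Step 1: original regression and squared residuals
X = [[1.0, e, j] for _, e, j in DATA]
beta = ols(X, y)
u2 = [(yi - fitted(r, beta)) ** 2 for r, yi in zip(X, y)]

# Step 2: auxiliary regression on levels, squares, and the cross-product
Z = [[1.0, e, j, e * e, j * j, e * j] for _, e, j in DATA]
alpha = ols(Z, u2)
r2 = r_squared(Z, u2, alpha)

# Steps 3-4: n * R^2 compared with the chi-square(5) critical value from the text
lm = n * r2
CRITICAL_5PCT = 11.07
reject = lm > CRITICAL_5PCT

print(f"R^2 = {r2:.4f}, n*R^2 = {lm:.3f}, reject H0: {reject}")
```

In practice a statistics package would normally be used for this test; the sketch only makes the arithmetic behind $n \cdot R^2$ explicit.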


Park Glejser Test: Numerical Example (2021)

Apr 7, 2024 / Jan 15, 2021 by Muhammad Imdad Ullah

To detect the presence of heteroscedasticity using the Park Glejser test, consider the following data.

Year	1992	1993	1994	1995	1996	1997	1998
Yt	37	48	45	36	25	55	63
Xt	4.5	6.5	3.5	3	2.5	8.5	7.5

The step-by-step procedure for conducting the Park Glejser test:

Step 1: Obtain an estimate of the regression equation

$$\hat{Y}_i = 19.8822 + 4.7173X_i$$

Obtain the residuals from this estimated regression equation:

Residuals: -4.1103, -2.5450, 8.6071, 1.9657, -6.6756, -4.9797, 7.7377
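As a check on Step 1, the fitted line and its residuals can be recomputed from the data table. This is a minimal sketch using the closed-form formulas for simple linear regression; the variable names are illustrative.

```python
# Data from the table above (1992-1998)
x = [4.5, 6.5, 3.5, 3.0, 2.5, 8.5, 7.5]
y = [37, 48, 45, 36, 25, 55, 63]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for a single regressor
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Residuals: observed minus fitted values
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"Y_hat = {intercept:.4f} + {slope:.4f} X")
print([round(r, 4) for r in residuals])
```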


Take the absolute values of these residuals and use them as the dependent variable in the different functional forms suggested by Glejser.

Step 2: Regress the absolute values of $\hat{u}_i$ on the $X$ variable that is thought to be closely associated with $\sigma_i^2$. We will use the following functional forms.

Sr. No.	Functional Form	Results
1)	$|\hat{u}_i| = \beta_1 + \beta_2 X_i +v_i$	$|\hat{u}_i| = 5.2666-0.00681X_i,\quad R^2=0.00004,\quad t_{cal} = -0.014$
2)	$|\hat{u}_i| = \beta_1 + \beta_2 \sqrt{X_i} +v_i$	$|\hat{u}_i| = 5.445-0.0962\sqrt{X_i},\quad R^2=0.000389,\quad t_{cal} = -0.04414$
3)	$|\hat{u}_i| = \beta_1 + \beta_2 \frac{1}{X_i} +v_i$	$|\hat{u}_i| = 4.9124+1.3571\frac{1}{X_i},\quad R^2=0.00332,\quad t_{cal} = 0.12914$
4)	$|\hat{u}_i| = \beta_1 + \beta_2 \frac{1}{\sqrt{X_i}} +v_i$	$|\hat{u}_i| = 4.7375+1.0428\frac{1}{\sqrt{X_i}},\quad R^2=0.00209,\quad t_{cal} = 0.10252$

Since none of the residual regressions is significant, the null hypothesis of homoscedasticity is not rejected. Therefore, we can say that there is no significant relationship between the absolute value of the residuals ($|\hat{u}_i|$) and the explanatory variable $X$.
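The four Glejser regressions can be recomputed with a short script. The sketch below takes the absolute residuals as listed in Step 1 (rounded to four decimals, so the last digits may differ slightly from the table); the helper `simple_ols` and the critical value 2.571 (two-sided 5% point of the $t$ distribution with $n-2=5$ degrees of freedom) are additions made here for illustration.

```python
import math

x = [4.5, 6.5, 3.5, 3.0, 2.5, 8.5, 7.5]
# Absolute values of the residuals listed in Step 1
abs_resid = [4.1103, 2.5450, 8.6071, 1.9657, 6.6756, 4.9797, 7.7377]

def simple_ols(z, y):
    """Return (intercept, slope, R^2, t-statistic of the slope)."""
    n = len(z)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = szy / szz
    intercept = y_bar - slope * z_bar
    rss = syy - slope * szy                      # residual sum of squares
    se_slope = math.sqrt(rss / (n - 2) / szz)    # standard error of the slope
    return intercept, slope, 1 - rss / syy, slope / se_slope

# The four transformations of X suggested by Glejser
forms = {
    "X": lambda v: v,
    "sqrt(X)": math.sqrt,
    "1/X": lambda v: 1 / v,
    "1/sqrt(X)": lambda v: 1 / math.sqrt(v),
}

T_CRIT = 2.571  # assumed two-sided 5% critical value, t with 5 df

results = {name: simple_ols([f(v) for v in x], abs_resid) for name, f in forms.items()}
for name, (a, b, r2, t) in results.items():
    print(f"{name:10s} a={a:.4f} b={b:.5f} R2={r2:.5f} t={t:.4f} "
          f"significant={abs(t) > T_CRIT}")
```

Note that the sign of each $t$ value always matches the sign of the corresponding slope estimate.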


Durbin-Watson Test Statistic (2021)

May 17, 2024 / Jan 10, 2021 by Muhammad Imdad Ullah

Durbin and Watson have suggested a test to detect the presence of autocorrelation which applies to small samples. However, the test is appropriate only for the first-order autoregressive scheme ($u_t = \rho u_{t-1} + \varepsilon_t$).

Step by Step procedure for the Durbin-Watson Test

Step 1: Null and Alternative Hypothesis

The null hypothesis is $H_0:\rho=0$ (that is, $u$’s are not autocorrelated with a first-order scheme)

The alternative hypothesis is $H_1: \rho \ne 0$ (that is, $u$’s are autocorrelated with a first-order scheme)

Step 2: Level of Significance

Choose the appropriate level of significance, such as 5%, 1%, 10%, etc.

Step 3: Test Statistics

To test the null hypothesis, the Durbin-Watson Test statistic is

$$d = \frac{\sum\limits_{t=2}^n (u_t - u_{t-1})^2}{\sum\limits_{t=1}^n u_t^2}$$

The value of $d$ lies between 0 and 4, and $d=2$ corresponds to $\rho=0$. Testing $H_0:\rho=0$ is therefore equivalent to testing $H_0:d=2$.

\begin{align*}
d&= \frac{\sum\limits_{t=2}^n (u_t - u_{t-1})^2}{\sum\limits_{t=1}^n u_t^2}\\
&=\frac{ \sum\limits_{t=2}^n (u_t^2 + u_{t-1}^2 - 2u_t u_{t-1} ) }{\sum\limits_{t=1}^n u_t^2} \\
&=\frac{ \sum\limits_{t=2}^n u_t^2 + \sum\limits_{t=2}^n u_{t-1}^2 - 2 \sum\limits_{t=2}^n u_t u_{t-1} }{\sum\limits_{t=1}^n u_t^2}
\end{align*}

The Durbin-Watson test statistic is simply the ratio of the sum of squared differences of successive residuals to the residual sum of squares. The numerator contains $n-1$ terms because one observation is lost in taking the lagged differences.
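The definition of $d$ translates directly into code. The function below is a small sketch; the two residual series are made-up illustrations (not data from this post) chosen so that the result can be checked by hand.

```python
def durbin_watson(residuals):
    """Ratio of summed squared successive differences to the residual sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(u ** 2 for u in residuals)
    return numerator / denominator

# Hypothetical residuals that alternate in sign (negative autocorrelation):
# differences are -2, 2, -2, so d = 12 / 4 = 3
alternating = [1.0, -1.0, 1.0, -1.0]

# Hypothetical residuals that never change (extreme positive autocorrelation):
# all differences are 0, so d = 0
constant = [1.0, 1.0, 1.0, 1.0]

print(durbin_watson(alternating))
print(durbin_watson(constant))
```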

For large samples $\sum\limits_{t=2}^n u_t^2$, $\sum\limits_{t=2}^n u_{t-1}^2$ and $\sum\limits_{t=1}^n u_t^2$ are all approximately equal. Therefore,

\begin{align*}
d &\approx \frac{2 \sum u_{t-1}^2}{\sum u_{t-1}^2} - \frac{2 \sum_{t=2}^n u_tu_{t-1} }{ \sum u_{t-1}^2 }\\
& \approx 2 \left[ 1- \frac{\sum u_t u_{t-1} }{ \sum u_{t-1}^2 }\right]\\
\text{but }\,\,\, \hat{\rho} &= \frac{\sum u_t u_{t-1}}{\sum u_{t-1}^2}
\end{align*}

Therefore $d\approx 2(1-\hat{\rho})$
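The approximation $d\approx 2(1-\hat{\rho})$ can be checked numerically. The sketch below simulates a first-order autoregressive series with an assumed $\rho=0.6$ and $n=2000$ (both values are arbitrary choices for illustration, and the simulated series stands in for regression residuals), then compares $d$ with $2(1-\hat{\rho})$.

```python
import random

random.seed(0)      # fixed seed so the run is repeatable
rho, n = 0.6, 2000  # assumed values, chosen only for illustration

# Simulate u_t = rho * u_{t-1} + e_t with standard normal e_t
u = [random.gauss(0, 1)]
for _ in range(n - 1):
    u.append(rho * u[-1] + random.gauss(0, 1))

# Durbin-Watson statistic
d = sum((u[t] - u[t - 1]) ** 2 for t in range(1, n)) / sum(v ** 2 for v in u)

# Estimate of rho as defined in the derivation above
rho_hat = sum(u[t] * u[t - 1] for t in range(1, n)) / sum(
    u[t - 1] ** 2 for t in range(1, n)
)

print(f"d = {d:.4f}, 2(1 - rho_hat) = {2 * (1 - rho_hat):.4f}")
```

For a large sample the two printed numbers should be very close, because they differ only through the first and last terms of the sums.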

Since $-1 \le \hat{\rho} \le 1$, it follows that the values of $d$ lie between 0 and 4.

Firstly: If there is no autocorrelation, $\hat{\rho}=0$ and $d=2$. If the sample data give $d^*\approx 2$, we accept that there is no autocorrelation.

Secondly: If $\hat{\rho}=+1$, then $d=0$ and we have perfect positive autocorrelation. Therefore, if $0<d^*<2$, there is some degree of positive autocorrelation (which is stronger the closer $d^*$ is to 0).

Thirdly: If $\hat{\rho}=-1$, then $d=4$ and we have perfect negative autocorrelation. Therefore, if $2<d^*<4$, there is some degree of negative autocorrelation (which is stronger the higher the value of $d^*$).

The next step is to use the sample residual ($u_t$’s) and compute the empirical value of the Durbin-Watson statistic $d^*$.

Finally, the empirical $d^*$ must be compared with the theoretical values of $d$, that is, the values of $d$ which define the critical region of the test.

The problem with this test is that the exact distribution of $d$ is not known. However, Durbin and Watson have established upper ($d_u$) and lower ($d_l$) limits for the significance level of $d$ which are appropriate to the hypothesis of zero first-order autocorrelation against the alternative hypothesis of positive first-order autocorrelation. Durbin and Watson have tabulated these upper and lower values at 5% and 1% level of significance.

Critical Region of $d$ Durbin-Watson test

If $d^*<d_l$ we reject the null hypothesis of no autocorrelation and accept that there is positive autocorrelation of the first order.
If $d^* >( 4-d_l)$ we reject the null hypothesis of no autocorrelation and accept that there is negative autocorrelation of the first order.
If $d_u < d^* < (4-d_u)$ we do not reject the null hypothesis of no autocorrelation.
If $d_l < d^* < d_u$ or if $(4-d_u)<d^*<(4-d_l)$ the test is inconclusive.
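The decision rules above can be written as a small function. This is a sketch only: the function name `dw_decision` is introduced here, and the bounds $d_l=1.20$ and $d_u=1.41$ in the usage example are placeholder values for illustration; in a real application they must be read from the Durbin-Watson tables for the given sample size and number of regressors.

```python
def dw_decision(d_star, d_l, d_u):
    """Classify an empirical Durbin-Watson value using the tabulated bounds."""
    if d_star < d_l:
        return "positive autocorrelation"
    if d_star > 4 - d_l:
        return "negative autocorrelation"
    if d_u < d_star < 4 - d_u:
        return "no autocorrelation"
    # Remaining cases: d_l <= d* <= d_u or 4 - d_u <= d* <= 4 - d_l
    return "inconclusive"

# Placeholder bounds, for illustration only
D_L, D_U = 1.20, 1.41

for d_star in (0.90, 1.30, 2.00, 2.70, 3.10):
    print(d_star, "->", dw_decision(d_star, D_L, D_U))
```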

Assumptions underlying the $d$ Statistics

The regression model includes an intercept term. If the intercept is not present, as in the case of regression through the origin, it is essential to rerun the regression including the intercept term to obtain the RSS.
The explanatory variables, $X$ are non-stochastic or fixed in repeated sampling.
The disturbances $u_t$ are generated by the first-order autoregressive scheme $u_t=\rho u_{t-1} +\varepsilon_t$ (the test cannot be used to detect higher-order autoregressive schemes).
The error term $u_t$ is assumed to be normally distributed.
The regression model does not include lagged value(s) of the dependent variable as explanatory variables. The Durbin-Watson test is inappropriate for models of the type $$Y_t=\beta_1 + \beta_2X_{2t} + \beta_3 X_{3t} + \cdots+ \beta_k X_{kt} + \gamma Y_{t-1}+u_t$$ where $Y_{t-1}$ is the one-period lagged value of $Y$.
There are no missing observations in the data.

Limitations or Shortcomings of Durbin-Watson Test Statistics

Durbin-Watson test has several shortcomings:

The $d$ statistic is not an appropriate measure of autocorrelation if the explanatory variables include lagged values of the endogenous variable.
Durbin-Watson test is inconclusive if the computed value lies between $d_l$ and $d_u$.
It is inappropriate for testing higher-order serial correlation or for other forms of autocorrelation.

An Asymptotic or Large Sample Test

Under the null hypothesis that $\rho=0$ and assuming that the sample size $n$ is large, it can be shown that $\sqrt{n}\hat{\rho}$ follows the normal distribution with 0 mean and variance 1, i.e. asymptotically,

$$\sqrt{n}\,\, \hat{\rho} \sim N(0, 1)$$
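The large-sample test can be sketched as follows. The function name `large_sample_test`, the default critical value 1.96 (two-sided 5% point of the standard normal distribution), and the example residuals are assumptions made for illustration; $\hat{\rho}$ is computed as in the Durbin-Watson derivation above.

```python
import math

def large_sample_test(residuals, z_crit=1.96):
    """Return (rho_hat, z, reject) for the asymptotic test of H0: rho = 0."""
    n = len(residuals)
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, n))
    den = sum(residuals[t - 1] ** 2 for t in range(1, n))
    rho_hat = num / den
    z = math.sqrt(n) * rho_hat      # approximately N(0, 1) under H0 for large n
    return rho_hat, z, abs(z) > z_crit

# Hypothetical residuals that alternate in sign: rho_hat = -1, z = -sqrt(100) = -10
alternating = [1.0, -1.0] * 50
rho_hat, z, reject = large_sample_test(alternating)
print(f"rho_hat = {rho_hat:.2f}, z = {z:.2f}, reject H0: {reject}")
```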
