Muhammad Imdad Ullah - Statistics for Data Science & Analytics

Goldfeld-Quandt Test Example (2020)

Apr 8, 2024Nov 14, 2020 by Muhammad Imdad Ullah

Data is taken from the Economic Survey of Pakistan 1991-1992. The data file link is at the end of the post “Goldfeld-Quandt Test Example for the Detection of Heteroscedasticity”.

Read about the Goldfeld-Quandt Test in detail by clicking the link “Goldfeld-Quandt Test: Comparison of Variances of Error Terms“.

Goldfeld-Quandt Test Example

For an illustration of the Goldfeld-Quandt Test Example, the data given in the file should be divided into two sub-samples after dropping (removing/deleting) the middle five observations.

Sub-sample 1 consists of data from 1959-60 to 1970-71.

Sub-sample 2 consists of data from 1976-77 to 1987-1988.

The sub-sample 1 is highlighted in green colour, and sub-sample 2 is highlighted in blue color, while the middle observation that has to be deleted is highlighted in red.

The Step-by-Step Procedure to Conduct the Goldfeld Quandt Test

Step 1: Order or Rank the observations according to the value of $X_i$. (Note that observations are already ranked.)

Step 2: Omit $c$ central observations. We selected 1/6 observations to be removed from the middle of the observations.

Step 3: Fit OLS regression on both samples separately and obtain the Residual Sum of Squares (RSS) for each sub-sample.

The Estimated regression for the two sub-samples are:

Sub-sample 1: $\hat{C}_1 = 1010.096 + 0.849 \text{Income}$

Sub-sample 2: $\hat{C}_2 = -244.003 + 0.88067 \text{Income}$

Now compute the Residual Sum of Squares for both sub-samples.

The residual Sum of Squares for Sub-Sample 1 is $RSS_1=2532224$

The residual Sum of Squares for Sub-Sample 2 is $RSS_2=10339356$

The F-Statistic is $ \lambda=\frac{RSS_2/n_2}{RSS_1/n_1}=\frac{10339356}{2532224}=4.083$

The critical value of $F(n_1=10, n_2=10$ at a 5% level of significance is 2.98.

Since the computed F value is greater than the critical value, heteroscedasticity exists in this case, that is, the variance of the error term is not consistent, rather it depends on the independent variable, GNP.

Your assignment is to perform the Goldfeld-Quandt Test Example using any statistical software and confirm the results.

Download the data file by clicking the link “GNP and consumption expenditure data“.

Learn about White’s Test of Heteroscedasticity

Learn R Programming

Online Test Preparation MCQS with Answers

First Order Autocorrelation (2020)

Apr 29, 2024Nov 10, 2020 by Muhammad Imdad Ullah

To understand the First Order Autocorrelation, consider the multiple regression model as described below

$$Y_t=\beta_1+\beta_2 X_{2t}+\beta_3 X_{3t}+\cdots+\beta_k X_{kt}+u_t,$$

In the model above the current observation of the error term ($u_t$) is a function of the previous (lagged) observation of the error term ($u_{t-1}$). That is,

\begin{align*}
u_t = \rho u_{t-1} + \varepsilon_t, \tag*{eq 1}
\end{align*}

where $\rho$ is the parameter depicting the functional relationship among observations of the error term $u_t$ and $\varepsilon_t$ is a stochastic error term which is iid (identically independently distributed). It satisfies the standard OLS assumption:

\begin{align*}
E(\varepsilon) &=0\\
Var(\varepsilon) &=\sigma_t^2\\
Cov(\varepsilon_t, \varepsilon_{t+s} ) &=0
\end{align*}

Note if $\rho=1$, then all these assumptions are undefined.

The scheme (eq1) is known as a Markov first-order autoregressive scheme, usually denoted by AR(1). The eq1 is interpreted as the regression of $u_t$ on itself tagged on period. It is first-order because $u_t$ and its immediate past value are involved. Note the $Var(u_t)$ is still homoscedasticity under the AR(1) scheme.

The coefficient $\rho$ is called the first order autocorrelation coefficient (also called the coefficient of autocovariance) and takes values from -1 to 1 or ($|\rho|<1$). The size of $\rho$ determines the strength of autocorrelation (serial correlation). There are three different cases:

If $\rho$ is zero, then there is no autocorrelation because $u_t=\varepsilon_t$.
If $\rho$ approaches 1, the value of the previous observation of the error ($u_t-1$) becomes more important in determining the value of the current error term ($u_t$), and therefore, greater positive autocorrelation exists. The negative error term will lead to negative and positive will lead to a positive error term.
If $\rho$ approaches -1, there is a very high degree of negative autocorrelation. The signs of the error term tend to switch signs from negative to positive and vice versa in consecutive observations.

First Order Autocorrelation AR(1)

\begin{align*}
u_t &= \rho u_{t-1}+\varepsilon_t\\
E(u_t) &= \rho E(u_{t-1})+ E(\varepsilon_t)=0\\
Var(u_t)&=\rho^2 Var(u_{t-1}+var(\varepsilon_t)\\
\text{Because $u$’s and $\varepsilon$’s are uncorrelated}\\
Var(u_t)&=\sigma^2\\
Var(u_{t-1}) &=\sigma^2\\
Var(\varepsilon_t)&=\sigma_t^2\\
\Rightarrow Var(u_t) &=\rho^2 \sigma^2+\sigma_t^2\\
\Rightarrow \sigma^2-\rho^2\sigma^2 &=\sigma_t^2\\
\Rightarrow \sigma^2(1-\rho^2)&=\sigma_t^2\\
\Rightarrow Var(u_t)&=\sigma^2=\frac{\sigma_t^2}{1-\rho^2}
\end{align*}

For covariance, multiply equation (eq1) by $u_{t-1}$ and taking the expectations on both sides

\begin{align*}
u_t\cdot u_{t-1} &= \rho u_{t-1} \cdot u_{t-1} + \varepsilon_t \cdot u_{t-1}\\
E(u_t u_{t-1}) &= E[\rho u_{t-1}^2 + u_{t-1}\varepsilon_t ]\\
cov(u_t, u_{t-1}) &= E(u_t u_{t-1}) = E[\rho u_{t-1}^2 + u_{t-1}\varepsilon_t ]\\
&=\rho \frac{\sigma_t^2}{1-\rho^2}\tag*{$\because Var(u_t) = \frac{\sigma_t^2}{1-\rho^2}$}
\end{align*}

Similarly,
\begin{align*}
cov(u_t,u_{t-2}) &=\rho^2 \frac{\sigma_t^2}{(1-\rho^2)}\\
cov(u_t,u_{t-2}) &= \rho^2 \frac{\sigma_t^2}{(1-\rho^2)}\\
cov(u_t, u_{t+s}) &= \rho^p
\end{align*}

The strength and direction of the correlation (positive or negative) and its distance from zero determine the significance of the first-order autocorrelation. Values close to $+1$ or $-1$ indicate strong positive or negative autocorrelation, respectively. A value close to zero suggests little to no autocorrelation.

Software like R, Python, and MS Excel have built-in functions to calculate autocorrelation. The visualization of ACF is often a preferred method to assess autocorrelation across different lags, not just the first order autocorrelation.

In summary, first order autocorrelation refers to the correlation between a time series and lagged values of the same time series, specifically at a lag of one time period. It measures how much a variable in a time series is related to its immediate past value.

Learn R Programming

Computer MCQs Online Test

What are the Consequences of Autocorrelation (2020)

Jul 23, 2024Nov 5, 2020 by Muhammad Imdad Ullah

Autocorrelation, when ignored, can lead to several issues in analyzing data, particularly in statistical models. In this post, we will discuss some important consequences of the existence of autocorrelation in the data. The consequences of the OLS estimators in the presence of Autocorrelation can be summarized as follows:

Consequences Patterns of Autocorrelation and Non-Autocorrelation

Consequences of Autocorrelation on OLS Estimators If Exists

When the disturbance terms are serially correlated then the OLS estimators of the $\hat{\beta}$s are still unbiased and consistent but the optimist property (minimum variance property) is not satisfied. This makes it harder to determine if the estimated effect of a variable is truly significant.
The OLS estimators will be inefficient and therefore, no longer BLUE. Inefficient means there could be better ways to estimate the model parameters that would produce more precise results with lower variance.
The estimated variance of the regression coefficients will be biased and inconsistent and will be greater than the variances of estimate calculated by other methods, therefore, hypothesis testing is no longer valid. In most of the cases, $R^2$ will be overestimated (indicating a better fit than the one that truly exists). The t- and F-statistics will tend to be higher. One might reject a true null hypothesis (meaning a relationship does not exist) or fail to reject a false one (meaning a relationship appears to exist when it does not).
The variance of random term $u$ may be under-estimated if the $u$’s are autocorrelated. That is, the random variance $\hat{\sigma}^2=\frac{\sum \hat{u}_i^2}{n-2}$ is likely to be under-estimate the true $\sigma^2$.
Among the consequences of autocorrelation, another is, that if the disturbance terms are autocorrelated then the OLS estimates are not asymptotic. That is $\hat{\beta}$s are not asymptotically efficient.

Therefore, autocorrelation may lead to misleading results and unreliable statistical tests. If autocorrelation is suspected in the data being analyzed, then use different statistical techniques to address it and improve the validity of your analysis.