Remedial Measures of Heteroscedasticity (2018)

The post is about Remedial Measures of Heteroscedasticity.

Heteroscedasticity is a condition in which the variance of the residual term, or error term, in a regression model is not constant across observations.

Heteroscedasticity does not destroy the unbiasedness and consistency of the OLS estimators, but they are no longer efficient, not even asymptotically. The lack of efficiency makes the usual hypothesis testing procedure dubious (doubtful, invalid). Therefore, remedial measures for heteroscedasticity are needed.

Homoscedasticity

Remedial Measures of Heteroscedasticity

For remedial measures of heteroscedasticity, there are two approaches: (i) when $\sigma_i^2$ is known, and (ii) when $\sigma_i^2$ is unknown.

(i) $\sigma_i^2$ is known

Consider the simple linear regression model $Y_i=\alpha + \beta X_i + u_i$.

If $V(u_i)=\sigma_i^2$ then heteroscedasticity is present. Given the values of $\sigma_i^2$, heteroscedasticity can be corrected by using weighted least squares (WLS) as a special case of Generalized Least Squares (GLS). Weighted least squares is the OLS method of estimation applied to the transformed model.

When heteroscedasticity is detected by an appropriate statistical test, the solution is to transform the original model in such a way that the transformed disturbance term has a constant variance. The transformed model amounts to an adjustment (rescaling) of the original data: every term is divided by $\sigma_i$. The transformed error term $u_i^*=u_i/\sigma_i$ has a constant variance, i.e., it is homoscedastic. Mathematically

\begin{eqnarray*}
V(u_i^*)&=&V\left(\frac{u_i}{\sigma_i}\right)\\
&=&\frac{1}{\sigma_i^2}Var(u_i)\\
&=&\frac{1}{\sigma_i^2}\sigma_i^2=1
\end{eqnarray*}

This approach has limited use as the individual error variances are not always known a priori. In case of significant sample information, reasonable guesses of the true error variances can be made and be used for $\sigma_i^2$.
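The known-variance case can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `wls_known_sigma`, the simulated data, and the assumed error standard deviations $\sigma_i = 0.5X_i$ are all hypothetical. It divides every term of the model by $\sigma_i$ and applies OLS through the origin to the transformed variables.

```python
import random

def wls_known_sigma(x, y, sigma):
    """WLS for Y = alpha + beta*X + u when the error s.d. sigma_i is known."""
    # Transformed model: Y/sigma = alpha*(1/sigma) + beta*(X/sigma) + u/sigma
    z0 = [1.0 / s for s in sigma]                # transformed constant
    z1 = [xi / s for xi, s in zip(x, sigma)]     # transformed X
    ys = [yi / s for yi, s in zip(y, sigma)]     # transformed Y
    # OLS through the origin on (z0, z1): solve the 2x2 normal equations
    a = sum(v * v for v in z0)
    b = sum(u * v for u, v in zip(z0, z1))
    d = sum(v * v for v in z1)
    p = sum(u * v for u, v in zip(z0, ys))
    q = sum(u * v for u, v in zip(z1, ys))
    det = a * d - b * b
    alpha_hat = (d * p - b * q) / det
    beta_hat = (a * q - b * p) / det
    return alpha_hat, beta_hat

# Simulated data (an assumption for illustration): alpha = 2, beta = 3,
# and error standard deviation sigma_i = 0.5 * X_i
random.seed(1)
x = [float(i) for i in range(1, 51)]
sigma = [0.5 * xi for xi in x]
y = [2 + 3 * xi + random.gauss(0, s) for xi, s in zip(x, sigma)]

alpha_hat, beta_hat = wls_known_sigma(x, y, sigma)
print(alpha_hat, beta_hat)
```

When all $\sigma_i$ are equal, the weights cancel and the function returns the ordinary OLS estimates.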

Let us now discuss the second case.

(ii) $\sigma_i^2$ is unknown

If $\sigma_i^2$ is not known a priori, then heteroscedasticity is corrected by hypothesizing a relationship between the error variance and one of the explanatory variables. There can be several versions of the hypothesized relationship. Suppose the hypothesized relationship is $Var(u_i)=\sigma^2 X_i^2$ (the error variance is proportional to $X_i^2$). For this hypothesized relation, we divide the simple linear regression model $Y_i =\alpha + \beta X_i +u_i$ by $X_i$ to correct for heteroscedasticity:
\begin{eqnarray*}
\frac{Y_i}{X_i}&=&\frac{\alpha}{X_i}+\beta+\frac{u_i}{X_i}\\
\Rightarrow \quad Y_i^*&=&\beta +\alpha X_i^*+u_i^*\\
\mbox{where } Y_i^*&=&\frac{Y_i}{X_i}, \quad X_i^*=\frac{1}{X_i} \mbox{ and } u_i^*=\frac{u_i}{X_i}
\end{eqnarray*}

Now the OLS estimation of the above transformed model will yield the efficient parameter estimates as $u_i^*$’s have constant variance. i.e.

\begin{eqnarray*}
V(u_i^*)&=&V(\frac{u_i}{X_i})\\
&=&\frac{1}{X_i^2} V(u_i)\\
&=&\frac{1}{X_i^2}\sigma^2X_i^2\\
&=&\sigma^2=\mbox{ Constant}
\end{eqnarray*}
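The transformation for $Var(u_i)=\sigma^2X_i^2$ can be sketched as follows. This is an illustrative sketch only; the function names and the data values are made up, and every $X_i$ is assumed to be nonzero. Note that the roles of the coefficients swap in the transformed regression of $Y_i/X_i$ on $1/X_i$: the intercept estimates $\beta$ and the slope estimates $\alpha$.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def fit_when_variance_proportional_to_x2(x, y):
    """Estimate (alpha, beta) assuming Var(u_i) = sigma^2 * X_i^2 and X_i != 0."""
    y_star = [yi / xi for xi, yi in zip(x, y)]   # Y* = Y / X
    x_star = [1.0 / xi for xi in x]              # X* = 1 / X
    intercept, slope = ols_simple(x_star, y_star)
    # In the transformed model the intercept is beta and the slope is alpha
    return slope, intercept

# Made-up illustrative values, not real data
x = [1.0, 2.0, 4.0, 5.0, 8.0, 10.0]
y = [5.3, 7.6, 15.1, 16.2, 28.4, 29.5]
alpha_hat, beta_hat = fit_when_variance_proportional_to_x2(x, y)
print(alpha_hat, beta_hat)
```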


For remedial measures of heteroscedasticity, some other hypothesized relations are:

  • Error variance is proportional to $X_i$ (Square root transformation) i.e $E(u_i^2)=\sigma^2X_i$
    The transformed model is
    \[\frac{Y_i}{\sqrt{X_i}}=\frac{\alpha}{\sqrt{X_i}}+\beta\sqrt{X_i}+\frac{u_i}{\sqrt{X_i}}\]
    The transformed model has no intercept term. Therefore, we have to use the regression through the origin model to estimate $\alpha$ and $\beta$. To get back to the original model, multiply the estimated transformed model by $\sqrt{X_i}$.
  • Error Variance is proportional to the square of the mean value of $Y$. i.e. $E(u_i^2)=\sigma^2[E(Y_i)]^2$
    Here the variance of $u_i$ is proportional to the square of the expected value of $Y$, and $E(Y_i) = \alpha + \beta X_i$.
    The transformed model will be
    \[\frac{Y_i}{E(Y_i)}=\frac{\alpha}{E(Y_i)}+\beta\frac{X_i}{E(Y_i)}+\frac{u_i}{E(Y_i)}\]
    This transformation is not operational because $E(Y_i)$ depends upon $\alpha$ and $\beta$, which are unknown parameters. However, $\hat{Y_i}=\hat{\alpha}+\hat{\beta}X_i$ is an estimator of $E(Y_i)$, so we proceed in two steps:

    1. Run the usual OLS regression, disregarding the heteroscedasticity problem, and obtain $\hat{Y_i}$.
    2. Transform the model using the estimated $\hat{Y_i}$, i.e., $\frac{Y_i}{\hat{Y_i}}=\alpha\frac{1}{\hat{Y_i}}+\beta\frac{X_i}{\hat{Y_i}}+\frac{u_i}{\hat{Y_i}}$, and run the regression on the transformed model.

    This transformation gives satisfactory results only if the sample size is reasonably large.

  • Log transformation such as $ln\, Y_i = \alpha + \beta\, ln\, X_i + u_i$.
    Log transformation compresses the scales in which the variables are measured. However, this transformation is not applicable if some of the $Y$ and $X$ values are zero or negative.
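The two-step procedure for the case $E(u_i^2)=\sigma^2[E(Y_i)]^2$ can be sketched as below. This is a rough illustration under stated assumptions: the function names and the data are hypothetical, and all fitted values $\hat{Y_i}$ are assumed to be nonzero so that the division is defined.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def two_step_fitted_value_transform(x, y):
    """Two-step estimate of (alpha, beta) when Var(u_i) is proportional to [E(Y_i)]^2."""
    # Step 1: usual OLS, ignoring heteroscedasticity, to obtain fitted values
    a0, b0 = ols_simple(x, y)
    y_hat = [a0 + b0 * xi for xi in x]
    # Step 2: divide every term by Y_hat and run OLS through the origin
    z0 = [1.0 / f for f in y_hat]
    z1 = [xi / f for xi, f in zip(x, y_hat)]
    ys = [yi / f for yi, f in zip(y, y_hat)]
    s00 = sum(v * v for v in z0)
    s01 = sum(u * v for u, v in zip(z0, z1))
    s11 = sum(v * v for v in z1)
    t0 = sum(u * v for u, v in zip(z0, ys))
    t1 = sum(u * v for u, v in zip(z1, ys))
    det = s00 * s11 - s01 * s01
    alpha_hat = (s11 * t0 - s01 * t1) / det
    beta_hat = (s00 * t1 - s01 * t0) / det
    return alpha_hat, beta_hat

# Made-up illustrative values, not real data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [5.2, 7.7, 11.4, 13.1, 18.0, 19.2, 24.5, 25.1]
print(two_step_fitted_value_transform(x, y))
```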


Block Design, Incidence, and Concurrence Matrix (2018)

Block Design Properties

The necessary conditions that the parameters of a Balanced Incomplete Block Design (BIB design) must satisfy are

  • $bk = vr$, i.e., $r=\frac{bk}{v}$, so that each treatment has $r$ replications
  • no treatment appears more than once in any block
  • all unordered pairs of treatments appear exactly in $\lambda$ blocks (equi-concurrence)
    where $\lambda=\frac{r(k-1)}{v-1}=\frac{bk(k-1)}{v(v-1)}$ is often referred to as the concurrence parameter of a BIB design.

A design, say $d$, with parameters $(v, b, r, k, \lambda)$ can be represented as a $v \times b$ treatment-block incidence matrix (having $v$ rows and $b$ columns). Let us denote it by $N=(n_{ij})$, whose elements $n_{ij}$ signify the number of units in block $j$ allocated to treatment $i$. The rows of the incidence matrix are labeled with the varieties (treatments) of the design and the columns with the blocks.

We put 1 in the ($i$, $j$)th cell of the matrix if variety $i$ is contained in block $j$ and 0 otherwise. Each row of the incidence matrix has $r$ 1’s, each column has $k$ 1’s, and each pair of distinct rows has $\lambda$ columns in which both rows contain a 1, leading to a useful matrix identity.
The matrix $NN'$ has $v$ rows and $v$ columns and is referred to as the concurrence matrix of design $d$; its entries, the concurrence parameters, are denoted by $\lambda_{dij}$. For a BIBD, $n_{ij}$ is either one or zero, so $n_{ij}^2= n_{ij}$.

Theorem: If $N$ is the incidence matrix of a $(v, b, r, k, \lambda)$-design, then $NN'=(r-\lambda)I+\lambda J$, where $I$ is the $v\times v$ identity matrix and $J$ is the $v\times v$ matrix of all 1’s.

Example: For the block design {1,2,3}, {2,3,4}, {3,4,1}, {4,1,2}, construct the incidence matrix.

Figure: Incidence matrix and concurrence matrix of the block design


Denoting the elements of $NN'$ by $q_{ih}$, we see that $q_{ii}=\sum_j n_{ij}^2$ and $q_{ih}=\sum_j n_{ij} n_{hj}, (i \ne h)$. For any block design, $NN'$ is the treatment concurrence matrix, whose off-diagonal elements $q_{ih}$, $(i\ne h)$, equal the number of times treatments $i$ and $h$ occur together within a block. For a BIBD, the diagonal elements are $q_{ii}=r$ and the off-diagonal elements are all equal to a constant, $q_{ih}=\lambda$; i.e., the common replication is $r$, and the common pairwise treatment concurrence is $\lambda$.
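For the example design {1,2,3}, {2,3,4}, {3,4,1}, {4,1,2}, the incidence and concurrence matrices can be built directly from these definitions. The following is a small sketch (the variable names are chosen here for illustration) that also checks the theorem $NN'=(r-\lambda)I+\lambda J$. For this design $v=b=4$, $r=k=3$, and $\lambda=2$.

```python
blocks = [{1, 2, 3}, {2, 3, 4}, {3, 4, 1}, {4, 1, 2}]
treatments = [1, 2, 3, 4]
v, b = len(treatments), len(blocks)

# Incidence matrix N (v x b): n_ij = 1 if treatment i is in block j, else 0
N = [[1 if t in blk else 0 for blk in blocks] for t in treatments]

# Concurrence matrix NN' (v x v): q_ih = sum over blocks j of n_ij * n_hj
NNt = [[sum(N[i][j] * N[h][j] for j in range(b)) for h in range(v)]
       for i in range(v)]

# Read r and lambda off the concurrence matrix
r, lam = NNt[0][0], NNt[0][1]

# Right-hand side of the theorem: (r - lambda) * I + lambda * J
rhs = [[(r - lam) * (i == h) + lam for h in range(v)] for i in range(v)]

for row in N:
    print(row)        # rows: [1,0,1,1], [1,1,0,1], [1,1,1,0], [0,1,1,1]
for row in NNt:
    print(row)        # 3 on the diagonal, 2 elsewhere
print(NNt == rhs)     # True
```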

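$N$ is a matrix of $v$ rows and $b$ columns, so its rank satisfies $r(N)\le \min(b, v)$. Since $NN'$ is nonsingular when $r>\lambda$, the rank of $N$ equals $v$, and hence $v\le \min(b, v)$, i.e., $b\ge v$. If the design is symmetric, $b=v$ and $N$ is square; then $|NN'|=|N|^2$, so $|NN'|=(r-\lambda)^{v-1}r^2$ is a perfect square.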


Heteroscedasticity Tests and Remedies (2018)

This post is about heteroscedasticity tests and remedies for heteroscedasticity.

There is a set of heteroscedasticity tests and remedies that require an assumption about the structure of the heteroscedasticity if it exists. That is, to use these tests you must choose a specific functional form for the relationship between the error variance and the variables that you believe determine the error variance. The major difference between these tests is the functional form that each test assumes.

Heteroscedasticity Tests

Breusch-Pagan Test

The Breusch-Pagan test assumes the error variance is a linear function of one or more variables.
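A minimal sketch of this idea for a single explanatory variable is given below. It uses the studentized (Koenker) form of the statistic, $LM = nR^2$, where $R^2$ comes from regressing the squared OLS residuals on $X$; this choice of form, the function names, and the simulated data are assumptions made for illustration. Under the null hypothesis of homoscedasticity, the statistic is compared with a chi-square distribution with 1 degree of freedom (5% critical value about 3.84).

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(x, y):
    """Coefficient of determination of the OLS regression of y on x."""
    a, b = ols_simple(x, y)
    y_bar = sum(y) / len(y)
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

def breusch_pagan_lm(x, y):
    """n * R^2 from the auxiliary regression of squared residuals on x."""
    a, b = ols_simple(x, y)
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, e2)

# Simulated data (an assumption): error s.d. grows with X, so the
# errors are heteroscedastic by construction
random.seed(7)
n = 200
x = [random.uniform(1, 10) for _ in range(n)]
y = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in x]

lm = breusch_pagan_lm(x, y)
print(lm, "reject homoscedasticity at 5%" if lm > 3.84 else "do not reject")
```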

Harvey-Godfrey Test

The Harvey-Godfrey test assumes the error variance is an exponential function of one or more variables. The variables are usually assumed to be one or more of the explanatory variables in the regression equation.

The White Test

The White test of heteroscedasticity is a general test for detecting heteroscedasticity in a data set. It has the following advantages:

  1. It does not require you to specify a model of the structure of the heteroscedasticity if it exists.
  2. It does not depend on the assumption that the errors are normally distributed.
  3. It specifically tests if the presence of heteroscedasticity causes the OLS formula for the variances and the covariances of the estimates to be incorrect.

Remedies for Heteroscedasticity

Suppose that you find evidence of heteroscedasticity. If you use the OLS estimator, you will get unbiased but inefficient estimates of the parameters of the model. Also, the estimates of the variances and covariances of the parameter estimates will be biased and inconsistent, and as a result, hypothesis tests will not be valid. When there is evidence of heteroscedasticity, econometricians do one of two things:

  • Use the OLS estimator to estimate the parameters of the model. Correct the estimates of the variances and covariances of the OLS estimates so that they are consistent.
  • Use an estimator other than the OLS estimator to estimate the parameters of the model.

Many econometricians choose the first alternative. This is because the most serious consequence of using the OLS estimator when there is heteroscedasticity is that the estimates of the variances and covariances of the parameter estimates are biased and inconsistent. If this problem is corrected, then the only shortcoming of using OLS is that you lose some precision relative to some other estimator that you could have used.
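One common way to carry out the first alternative is White's heteroscedasticity-consistent (HC0) standard error. The sketch below is an illustration for the simple regression slope only; the function name and the data are made up. It keeps the OLS estimate of $\beta$ but replaces the conventional variance $s^2/\sum(X_i-\bar{X})^2$ with $\sum (X_i-\bar{X})^2 e_i^2 / [\sum(X_i-\bar{X})^2]^2$, where $e_i$ are the OLS residuals.

```python
import math

def ols_with_robust_se(x, y):
    """Return (beta_hat, conventional s.e., White HC0 s.e.) for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    beta_hat = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    e = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
    # Conventional OLS standard error: valid only under homoscedasticity
    s2 = sum(ei * ei for ei in e) / (n - 2)
    se_ols = math.sqrt(s2 / sxx)
    # White (HC0) standard error: each squared residual is weighted
    # by the squared deviation of its own X value
    se_hc0 = math.sqrt(sum(d * d * ei * ei for d, ei in zip(dx, e))) / sxx
    return beta_hat, se_ols, se_hc0

# Made-up illustrative values, not real data
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-2.0, -2.0, 0.0, 4.0, 10.0]
beta_hat, se_ols, se_hc0 = ols_with_robust_se(x, y)
print(beta_hat, se_ols, se_hc0)
```

The point estimate is the same under both approaches; only the standard error, and therefore the test statistics built on it, changes.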


However, to get more precise estimates with an alternative estimator, you must know the approximate structure of the heteroscedasticity. If you specify the wrong model of heteroscedasticity, then this alternative estimator can yield estimates that are worse than those of the OLS estimator.
