OLS Estimation in the Presence of Heteroscedasticity (2020)

The Ordinary Least Squares (OLS) method is widely used in regression analysis to estimate the parameters of a linear regression model. However, when heteroscedasticity exists (that is, when the variance of the error terms is not constant across observations), the homoscedasticity assumption of OLS is violated. The OLS coefficient estimates remain unbiased, but they are no longer efficient, and the usual standard errors are biased, which makes hypothesis tests and confidence intervals unreliable. For more information see Consequences of Heteroscedasticity.

For the OLS Estimation in the presence of heteroscedasticity, consider the two-variable model

\begin{align*}
Y_i &= \beta_1 +\beta_2X_i + u_i\\
\hat{\beta}_2&=\frac{\sum x_i y_i}{\sum x_i^2}\\
Var(\hat{\beta}_2)&= \frac{\sum x_i^2\, \sigma_i^2}{(\sum x_i^2)^2}
\end{align*}


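Here $x_i=X_i-\overline{X}$ and $y_i=Y_i-\overline{Y}$ denote deviations from the sample means. In the presence of heteroscedasticity, the variance of the OLS estimator is the last expression above, which involves the individual variances $\sigma_i^2$.

Under the assumption of homoscedasticity, the variance is $Var(\hat{\beta}_2)=\frac{\sigma^2}{\sum x_i^2}$. If $\sigma_i^2=\sigma^2$ for all $i$, the two expressions for $Var(\hat{\beta}_2)$ are the same.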
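The two variance formulas can be compared numerically. The following is a minimal sketch in plain Python; the function names, the values of $X$, and the choice $\sigma_i^2=X_i^2$ are illustrative assumptions and are not taken from the post.

```python
def var_beta2_hetero(x, sigma2):
    """sum(x_i^2 * sigma_i^2) / (sum(x_i^2))^2, with x_i in deviation form."""
    x_bar = sum(x) / len(x)
    d = [xi - x_bar for xi in x]                 # deviations from the mean
    sxx = sum(di ** 2 for di in d)
    return sum(di ** 2 * s2 for di, s2 in zip(d, sigma2)) / sxx ** 2


def var_beta2_homo(x, sigma2_const):
    """sigma^2 / sum(x_i^2), the usual homoscedastic formula."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sigma2_const / sxx


X = [1.0, 2.0, 3.0, 4.0, 5.0]                    # hypothetical regressor values
constant = [4.0] * len(X)                        # sigma_i^2 = 4 for every i
growing = [xi ** 2 for xi in X]                  # hypothetical: sigma_i^2 = X_i^2

print(var_beta2_homo(X, 4.0))                    # 0.4
print(var_beta2_hetero(X, constant))             # 0.4, same as the homoscedastic formula
print(var_beta2_hetero(X, growing))              # 1.24
```

With a constant $\sigma_i^2$ the general formula collapses to the homoscedastic one, as stated above.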


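Note the following properties of the OLS estimator in the case of heteroscedasticity:

  • $\hat{\beta}_2$ is BLUE only if the assumptions of the classical model, including homoscedasticity, hold; under heteroscedasticity it is still linear and unbiased, but it no longer has minimum variance.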
  • To establish the unbiasedness of $\hat{\beta}_2$, it is not necessary for the disturbances ($u_i$) to be homoscedastic.
  • The variance of $u_i$, homoscedasticity, or heteroscedasticity plays no part in the determination of the unbiasedness property.
  • $\hat{\beta}_2$ will be a consistent estimator despite heteroscedasticity.
  • As the sample size increases indefinitely, $\hat{\beta}_2$ (the estimated $\beta_2$) converges to its true value.
  • $\hat{\beta}_2$ is asymptotically normally distributed.
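
The unbiasedness property listed above can be illustrated with a small Monte Carlo experiment. This is only a sketch under assumed settings: the true values $\beta_1=1$, $\beta_2=2$, the fixed regressor on $[1, 10]$, and the error standard deviation $0.5X_i$ are all hypothetical choices made for the illustration.

```python
import random


def ols_slope(x, y):
    """OLS slope: sum(x_i * y_i) / sum(x_i^2) in deviation form."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_slopes(n_reps=2000, n=50, beta1=1.0, beta2=2.0, seed=0):
    rng = random.Random(seed)                     # fixed seed for reproducibility
    x = [1 + 9 * i / (n - 1) for i in range(n)]   # fixed regressor on [1, 10]
    slopes = []
    for _ in range(n_reps):
        # heteroscedastic errors: standard deviation proportional to X_i
        y = [beta1 + beta2 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


slopes = simulate_slopes()
mean_slope = sum(slopes) / len(slopes)
print(mean_slope)   # should be close to the true slope of 2.0
```

The average of the estimated slopes stays close to the true value even though the error variance changes with $X_i$; what heteroscedasticity affects is the spread of the estimates and the validity of the usual standard errors.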

Overall, addressing the existence of heteroscedasticity in regression analysis is crucial to ensure the validity and reliability of the estimated parameters and inference results. Various methods and techniques are available to account for heteroscedasticity and obtain accurate estimates in regression analysis. For more details see Tests of Heteroscedasticity.

The best approach to address heteroscedasticity depends on the specific situation and the characteristics of the data being studied. The general guidelines are:

  • For mild heteroscedasticity, robust standard errors might be sufficient.
  • If the form of heteroscedasticity is known and you are comfortable with the assumptions it requires, consider WLS or GLS.
  • Data transformation can be a simple solution, but weigh the benefits against the potential drawbacks of interpreting the transformed coefficients.
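
As an illustration of the first guideline, the sketch below computes both the conventional standard error of the slope and White's heteroscedasticity-consistent (HC0) standard error, $\sqrt{\sum x_i^2\hat{u}_i^2/(\sum x_i^2)^2}$, for the two-variable model. The post does not say which robust estimator it has in mind, so HC0 is an assumption here, and the data values are made up.

```python
import math


def ols_fit(x, y):
    """Return intercept, slope, deviations of x, sum of squared deviations, residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    d = [xi - x_bar for xi in x]
    sxx = sum(di ** 2 for di in d)
    b2 = sum(di * (yi - y_bar) for di, yi in zip(d, y)) / sxx
    b1 = y_bar - b2 * x_bar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    return b1, b2, d, sxx, resid


def slope_standard_errors(x, y):
    """Return (conventional SE, White HC0 robust SE) of the OLS slope."""
    n = len(x)
    _, _, d, sxx, e = ols_fit(x, y)
    s2 = sum(ei ** 2 for ei in e) / (n - 2)       # usual estimate of sigma^2
    se_usual = math.sqrt(s2 / sxx)
    # HC0: replace the unknown sigma_i^2 by the squared residuals
    se_robust = math.sqrt(sum(di ** 2 * ei ** 2 for di, ei in zip(d, e)) / sxx ** 2)
    return se_usual, se_robust


X = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical data
Y = [1.0, 3.0, 2.0, 6.0, 3.0]
se_usual, se_robust = slope_standard_errors(X, Y)
print(se_usual, se_robust)
```

The coefficient estimates are the same in both cases; only the standard errors used for inference differ.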

Remember that the OLS estimates remain unbiased under heteroscedasticity. However, addressing it can improve the efficiency of the estimates and the reliability of inference, leading to more robust and interpretable results.

https://itfeature.com statistics help


Nature of Heteroscedasticity (2020)

Let us start with the nature of heteroscedasticity.

The assumption of homoscedasticity (equal spread, equal variance) is

$$E(u_i^2)=E(u_i^2|X_{2i},X_{3i},\cdots, X_{ki})=\sigma^2,\quad i=1,2,\cdots, n$$

[Figure: Nature of Heteroscedasticity, homoscedastic case]

The above Figure shows that the conditional variance of $Y_i$ (which is equal to that of $u_i$), conditional upon the given $X_i$, remains the same regardless of the values taken by the variable $X$.

[Figure: Nature of Heteroscedasticity, heteroscedastic case]

The Figure shows that the conditional variance of $Y_i$ increases as $X$ increases. The variance of $Y_i$ is not the same across observations, so there is heteroscedasticity.

$$E(u_i^2)=E(u_i^2|X_{2i},X_{3i},\cdots, X_{ki})=\sigma_i^2$$
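
The difference between the two cases can be seen in simulated disturbances. The sketch below is illustrative only: the error standard deviations ($2$ in the homoscedastic case, $0.5X_i$ in the heteroscedastic case), the sample size, and the helper names are assumptions.

```python
import random


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def spread_ratio(u):
    """Variance of the errors for the larger X values divided by that for the smaller ones."""
    half = len(u) // 2
    return sample_variance(u[half:]) / sample_variance(u[:half])


rng = random.Random(1)                              # fixed seed for reproducibility
x = [1 + 9 * i / 199 for i in range(200)]           # X sorted in increasing order on [1, 10]

u_homo = [rng.gauss(0.0, 2.0) for _ in x]           # sigma_i = 2 for every i
u_hetero = [rng.gauss(0.0, 0.5 * xi) for xi in x]   # sigma_i = 0.5 * X_i (hypothetical)

ratio_homo = spread_ratio(u_homo)
ratio_hetero = spread_ratio(u_hetero)
print(ratio_homo, ratio_hetero)
```

Under homoscedasticity the ratio should be near one, while under this form of heteroscedasticity the errors attached to larger $X$ values are clearly more spread out.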


The nature of heteroscedasticity refers to the violation of the assumption of homoscedasticity in linear regression models. In the case of heteroscedasticity, the errors have unequal variances for different levels of the regressors. The OLS estimators of the regression coefficients remain unbiased, but they become inefficient and their usual standard errors become unreliable. There are several reasons why the variances of $u_i$ may be variable:

  • Following the error-learning models, as people learn, their errors of behavior become smaller over time, or the number of errors becomes more consistent. In such cases, $\sigma_i^2$ is expected to decrease.
  • As income grows, people have more discretionary income (income remaining after deduction of taxes) and hence more scope for choice about the disposition of their income, so $\sigma_i^2$ is likely to increase with income. Similarly, companies with larger profits are generally expected to show greater variability in their dividend policies than companies with lower profits.
  • As data collecting techniques improve, $\sigma_i^2$ is likely to decrease. For example, banks having sophisticated data processing equipment are likely to commit fewer errors in the monthly or quarterly statements of their customers than banks without such equipment.
  • Heteroscedasticity can also arise as a result of the presence of outliers. The inclusion or exclusion of such an observation, especially if the sample size is small, can substantially alter the results of regression analysis.
  • The omission of relevant variables also results in the problem of heteroscedasticity. When an important variable is left out of the model, its effect is absorbed by the error term, and the results of the model can no longer be interpreted reliably.
  • Heteroscedasticity may arise from the violation of the assumption of CLRM that the model is correctly specified.
  • Skewness in the distribution of one or more regressors is another source of heteroscedasticity. For example, the distribution of income is typically uneven (skewed).
  • Incorrect data transformation (ratio or first difference) and incorrect functional form (linear vs log-linear) are also sources of heteroscedasticity.
  • The problem of heteroscedasticity is likely to be more in cross-sectional data than in time series data.

Introduction to Heteroscedasticity (2020)

The post is a general discussion of, and an introduction to, heteroscedasticity.

Introduction to Heteroscedasticity and Homoscedasticity

The term heteroscedasticity refers to the violation of the assumption of homoscedasticity in linear regression models (LRM). In the case of heteroscedasticity, the errors have unequal variances for different levels of the regressors, which leaves the OLS estimators of the regression coefficients unbiased but inefficient and makes their usual standard errors unreliable. The disturbances $u_i$ in the Classical Linear Regression Model (CLRM) appearing in the population regression function should be homoscedastic; that is, they all have the same variance.

In short, hetero means different (or unequal), and the Greek word skedastic means spread (or scatter). Homoscedasticity means equal spread and heteroscedasticity means unequal spread.

Effect on the Var-Cov Matrix of the Error Terms:
The Var-Cov matrix of errors is

$$E(uu') = Var(u)=\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0\\ 0 & \sigma^2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0&0&\cdots &\sigma^2
\end{pmatrix}=\sigma^2 I_n,$$

where $I_n$ is an $n\times n$ identity matrix.

In the presence of heteroscedasticity, the diagonal elements of the Var-Cov matrix of the errors are no longer constant:

$$E(uu')= Var(u)=\begin{pmatrix}
\sigma_1^2 & 0 & 0 & \cdots & 0 \\0 & \sigma^2_2 & 0 & \cdots & 0 \\ 0 & 0 & \sigma^2_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 &\cdots & \sigma_n^2
\end{pmatrix}=\Omega$$

The Var-Cov matrix of the OLS estimators $\hat{\beta}$ is

\begin{align*}
Cov(\hat{\beta}) &= E\left[(\hat{\beta}-\beta)(\hat{\beta}-\beta)' \right]\\
&=E\left[[(X'X)^{-1}X'u][(X'X)^{-1}X'u]' \right]\\
&=E\left[(X'X)^{-1}X'uu'X(X'X)^{-1} \right]\\
&=(X'X)^{-1}X'E(uu')X(X'X)^{-1}\\
&=(X'X)^{-1}X'\Omega X (X'X)^{-1}
\end{align*}
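
As a quick check on the last line, two special cases can be worked out. The sketch assumes $\Omega=E(uu')$ and a non-stochastic $X$, and writes $\mathbf{x}_i'$ for the $i$-th row of $X$ (a notation introduced here for illustration, not used elsewhere in the post).

\begin{align*}
\Omega=\sigma^2 I_n:\quad Cov(\hat{\beta}) &= (X'X)^{-1}X'(\sigma^2 I_n)X(X'X)^{-1}=\sigma^2(X'X)^{-1}\\
\Omega=\mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2):\quad Cov(\hat{\beta}) &= (X'X)^{-1}\left(\sum_{i=1}^{n}\sigma_i^2\,\mathbf{x}_i\mathbf{x}_i'\right)(X'X)^{-1}
\end{align*}

The first case is the familiar homoscedastic result. In the second case the middle term no longer cancels, which is why the usual formula $\sigma^2(X'X)^{-1}$ is not valid under heteroscedasticity.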

The following are questions when we are concerned with heteroscedasticity:

That’s all about some basic introduction to heteroscedasticity.


Remedial Measures of Heteroscedasticity (2018)

The post is about Remedial Measures of Heteroscedasticity.

Heteroscedasticity is a condition in which the variance of the error term in a regression model is not constant across observations.

Heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS estimators, but they are no longer efficient, not even asymptotically. The lack of efficiency makes the usual hypothesis testing procedure dubious. Therefore, there should be some remedial measures for heteroscedasticity.


Remedial Measures of Heteroscedasticity

For remedial measures of heteroscedasticity, there are two approaches: (i) when $\sigma_i^2$ is known, and (ii) when $\sigma_i^2$ is unknown.

(i) $\sigma_i^2$ is known

Consider the simple linear regression model $Y_i=\alpha + \beta X_i + u_i$.

If $V(u_i)=\sigma_i^2$ then heteroscedasticity is present. Given the values of $\sigma_i^2$, heteroscedasticity can be corrected by using weighted least squares (WLS) as a special case of Generalized Least Squares (GLS). Weighted least squares is the OLS method of estimation applied to the transformed model.

When heteroscedasticity is detected by an appropriate statistical test, the appropriate solution is to transform the original model in such a way that the transformed disturbance term has a constant variance. The transformation amounts to an adjustment of the original data: dividing the model through by $\sigma_i$ gives $\frac{Y_i}{\sigma_i}=\alpha\frac{1}{\sigma_i}+\beta\frac{X_i}{\sigma_i}+\frac{u_i}{\sigma_i}$. The transformed error term $u_i^*=u_i/\sigma_i$ has a constant variance, i.e., it is homoscedastic. Mathematically

\begin{eqnarray*}
V(u_i^*)&=&V\left(\frac{u_i}{\sigma_i}\right)\\
&=&\frac{1}{\sigma_i^2}Var(u_i)\\
&=&\frac{1}{\sigma_i^2}\sigma_i^2=1
\end{eqnarray*}
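
The weighted least squares idea for known $\sigma_i$ can be sketched in plain Python: divide every term of $Y_i=\alpha+\beta X_i+u_i$ by $\sigma_i$ and run a regression through the origin on the two transformed regressors $1/\sigma_i$ and $X_i/\sigma_i$. The function name and the data are hypothetical, and the $\sigma_i$ values are simply assumed to be known.

```python
def wls_known_sigma(x, y, sigma):
    """WLS estimates of (alpha, beta) when the error standard deviations are known."""
    z1 = [1.0 / s for s in sigma]                  # transformed intercept column, 1 / sigma_i
    z2 = [xi / s for xi, s in zip(x, sigma)]       # X_i / sigma_i
    ys = [yi / s for yi, s in zip(y, sigma)]       # Y_i / sigma_i

    # normal equations of the regression through the origin on (z1, z2)
    a11 = sum(a * a for a in z1)
    a12 = sum(a * b for a, b in zip(z1, z2))
    a22 = sum(b * b for b in z2)
    c1 = sum(a * v for a, v in zip(z1, ys))
    c2 = sum(b * v for b, v in zip(z2, ys))

    det = a11 * a22 - a12 ** 2
    alpha = (c1 * a22 - a12 * c2) / det            # Cramer's rule for the 2x2 system
    beta = (a11 * c2 - a12 * c1) / det
    return alpha, beta


X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [1 + 2 * xi for xi in X]         # noise-free data, so the true values are recovered
sigma = [0.5 * xi for xi in X]       # assumed known standard deviations
alpha_hat, beta_hat = wls_known_sigma(X, Y, sigma)
print(round(alpha_hat, 6), round(beta_hat, 6))   # 1.0 2.0
```

Observations with a large $\sigma_i$ receive a small weight, so the noisier observations influence the fit less than they do in OLS.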

This approach has limited use as the individual error variances are not always known a priori. In case of significant sample information, reasonable guesses of the true error variances can be made and be used for $\sigma_i^2$.

Let us now discuss the second case, in which the error variances are unknown.

(ii) $\sigma_i^2$ is unknown

If $\sigma_i^2$ is not known a priori, then heteroscedasticity is corrected by hypothesizing a relationship between the error variance and one of the explanatory variables. There can be several versions of the hypothesized relationship. Suppose the hypothesized relationship is $Var(u_i)=\sigma^2 X_i^2$ (error variance is proportional to $X_i^2$). For this hypothesized relation, divide the simple linear regression model $Y_i =\alpha + \beta X_i +u_i$ by $X_i$ (assuming $X_i\neq 0$) to correct for heteroscedasticity:
\begin{eqnarray*}
\frac{Y_i}{X_i}&=&\frac{\alpha}{X_i}+\beta+\frac{u_i}{X_i}\\
\Rightarrow \quad Y_i^*&=&\beta +\alpha X_i^*+u_i^*\\
\mbox{where } Y_i^*&=&\frac{Y_i}{X_i},\quad X_i^*=\frac{1}{X_i} \mbox{ and } u_i^*=\frac{u_i}{X_i}
\end{eqnarray*}
Note that in the transformed model the intercept is $\beta$ and the slope coefficient is $\alpha$.

OLS estimation of the transformed model now yields efficient parameter estimates because the $u_i^*$ have constant variance, i.e.

\begin{eqnarray*}
V(u_i^*)&=&V\left(\frac{u_i}{X_i}\right)\\
&=&\frac{1}{X_i^2} V(u_i)\\
&=&\frac{1}{X_i^2}\sigma^2X_i^2\\
&=&\sigma^2=\mbox{ Constant}
\end{eqnarray*}
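
A minimal sketch of this transformation, assuming $Var(u_i)=\sigma^2X_i^2$ and $X_i\neq 0$: regress $Y_i/X_i$ on $1/X_i$ by ordinary OLS and then read the coefficients back in terms of the original model. The function name and data are hypothetical.

```python
def transformed_fit(x, y):
    """Estimate (alpha, beta) of Y = alpha + beta*X + u by OLS on Y/X = beta + alpha*(1/X) + u/X."""
    y_star = [yi / xi for xi, yi in zip(x, y)]     # Y_i / X_i
    x_star = [1.0 / xi for xi in x]                # 1 / X_i

    n = len(x)
    xs_bar = sum(x_star) / n
    ys_bar = sum(y_star) / n
    sxx = sum((a - xs_bar) ** 2 for a in x_star)
    sxy = sum((a - xs_bar) * (b - ys_bar) for a, b in zip(x_star, y_star))

    slope = sxy / sxx                              # estimates alpha of the original model
    intercept = ys_bar - slope * xs_bar            # estimates beta of the original model
    return slope, intercept


X = [1.0, 2.0, 4.0, 5.0, 10.0]
Y = [1 + 2 * xi for xi in X]          # noise-free data with alpha = 1, beta = 2
alpha_hat, beta_hat = transformed_fit(X, Y)
print(round(alpha_hat, 6), round(beta_hat, 6))   # 1.0 2.0
```

The roles of the coefficients are swapped in the transformed regression: its slope is the original intercept $\alpha$ and its intercept is the original slope $\beta$.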


For remedial measures of heteroscedasticity, some other hypothesized relations are:

  • Error variance is proportional to $X_i$ (square root transformation), i.e., $E(u_i^2)=\sigma^2X_i$, with $X_i>0$.
    The transformed model is
    \[\frac{Y_i}{\sqrt{X_i}}=\frac{\alpha}{\sqrt{X_i}}+\beta\sqrt{X_i}+\frac{u_i}{\sqrt{X_i}}\]
    The transformed model has no intercept term. Therefore, we have to use the regression through the origin model, regressing $Y_i/\sqrt{X_i}$ on $1/\sqrt{X_i}$ and $\sqrt{X_i}$, to estimate $\alpha$ and $\beta$. To get back to the original model, multiply the transformed model by $\sqrt{X_i}$.
  • Error variance is proportional to the square of the mean value of $Y$, i.e., $E(u_i^2)=\sigma^2[E(Y_i)]^2$
    Here the variance of $u_i$ is proportional to the square of the expected value of $Y$, and $E(Y_i) = \alpha + \beta X_i$.
    The transformed model will be
    \[\frac{Y_i}{E(Y_i)}=\frac{\alpha}{E(Y_i)}+\beta\frac{X_i}{E(Y_i)}+\frac{u_i}{E(Y_i)}\]
    This transformation cannot be applied directly because $E(Y_i)$ depends upon $\alpha$ and $\beta$, which are unknown parameters. However, $\hat{Y}_i=\hat{\alpha}+\hat{\beta}X_i$ is an estimator of $E(Y_i)$, so we proceed in two steps:

    1. Run the usual OLS regression, disregarding the heteroscedasticity problem, and obtain $\hat{Y}_i$.
    2. Transform the model by using the estimated $\hat{Y}_i$, i.e., $\frac{Y_i}{\hat{Y}_i}=\alpha\frac{1}{\hat{Y}_i}+\beta\frac{X_i}{\hat{Y}_i}+\frac{u_i}{\hat{Y}_i}$, and run the regression on the transformed model.

      This transformation gives satisfactory results only if the sample size is reasonably large.

  • Log transformation such as $\ln Y_i = \alpha + \beta \ln X_i + u_i$.
    Log transformation compresses the scales in which the variables are measured. However, this transformation is not applicable if some of the $Y$ and $X$ values are zero or negative.
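
The two-step procedure for the case $E(u_i^2)=\sigma^2[E(Y_i)]^2$ described in the list above can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the helper names are hypothetical, the fitted values are assumed to be nonzero, and the data are made up.

```python
def ols(x, y):
    """Ordinary OLS with intercept; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b * x_bar, b


def two_step_fit(x, y):
    # Step 1: usual OLS, ignoring heteroscedasticity, to obtain the fitted values
    a0, b0 = ols(x, y)
    y_hat = [a0 + b0 * xi for xi in x]
    if any(abs(f) < 1e-12 for f in y_hat):
        raise ValueError("fitted values must be nonzero to transform the model")

    # Step 2: divide the model by the fitted values and regress through the origin
    z1 = [1.0 / f for f in y_hat]
    z2 = [xi / f for xi, f in zip(x, y_hat)]
    ys = [yi / f for yi, f in zip(y, y_hat)]
    a11 = sum(a * a for a in z1)
    a12 = sum(a * b for a, b in zip(z1, z2))
    a22 = sum(b * b for b in z2)
    c1 = sum(a * v for a, v in zip(z1, ys))
    c2 = sum(b * v for b, v in zip(z2, ys))
    det = a11 * a22 - a12 ** 2
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [1 + 2 * xi for xi in X]          # noise-free data with alpha = 1, beta = 2
alpha_hat, beta_hat = two_step_fit(X, Y)
print(round(alpha_hat, 6), round(beta_hat, 6))   # 1.0 2.0
```

With real, noisy data the second step reweights the observations by the estimated mean of $Y_i$, which is why a reasonably large sample is needed for the fitted values to be reliable.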
