Heteroscedasticity in Regression (2020)

Heteroscedasticity in Regression: The term heteroscedasticity refers to the violation of the assumption of homoscedasticity in linear regression models (LRM). In the case of heteroscedasticity, the errors have unequal variances at different levels of the regressors, which makes the OLS estimators inefficient and the usual standard errors biased. The disturbances in the Classical Linear Regression Model (CLRM) appearing in the population regression function should be homoscedastic; that is, they should all have the same variance.

Mathematical Proof of $E(\hat{\sigma}^2)\ne \sigma^2$ when heteroscedasticity is present in the data.

For the proof of $E(\hat{\sigma}^2)\ne \sigma^2$, consider the two-variable linear regression model in the presence of heteroscedasticity,

\begin{align}
Y_i=\beta_1 + \beta_2 X_i+ u_i, \quad\quad (eq1)
\end{align}

where $Var(u_i)=\sigma_i^2$ (Case of heteroscedasticity)

The usual estimator of the error variance is

\begin{align}
\hat{\sigma}^2 &= \frac{\sum \hat{u}_i^2 }{n-2}\\
&= \frac{\sum (Y_i - \hat{Y}_i)^2 }{n-2}\\
&=\frac{\sum(\beta_1 + \beta_2 X_i + u_i - \hat{\beta}_1 -\hat{\beta}_2 X_i )^2}{n-2}\\
&=\frac{\sum \left( -(\hat{\beta}_1-\beta_1) - (\hat{\beta}_2 - \beta_2)X_i + u_i \right)^2 }{n-2}\quad\quad (eq2)
\end{align}

Noting that the OLS residuals sum to zero, $\sum \hat{u}_i = \sum (Y_i-\hat{Y}_i)=0$, we have

\begin{align*}
\sum\left(\beta_1 + \beta_2 X_i + u_i \, - \,\hat{\beta}_1 - \hat{\beta}_2X_i\right) &=0\\
-n(\hat{\beta}_1 -\beta_1) - (\hat{\beta}_2-\beta_2)\sum X_i + \sum u_i & =0\\
\text{Dividing both sides by } n:&\\
(\hat{\beta}_1 - \beta_1) &= -(\hat{\beta}_2-\beta_2)\overline{X}+\overline{u}
\end{align*}

Substituting this into (eq2) and taking expectations on both sides:

\begin{align}
E(\hat{\sigma}^2) &= \frac{1}{n-2} E\sum\left[ -\left(-(\hat{\beta}_2 - \beta_2) \overline{X} + \overline{u}\right) - (\hat{\beta}_2-\beta_2)X_i + u_i  \right]^2\\
&=\frac{1}{n-2}E\sum\left[(\hat{\beta}_2-\beta_2)\overline{X} -\overline{u} - (\hat{\beta}_2-\beta_2)X_i + u_i \right]^2\\
&=\frac{1}{n-2} E\sum\left[ -(\hat{\beta}_2 - \beta_2)(X_i-\overline{X}) + (u_i-\overline{u})\right]^2\\
&= \frac{1}{n-2}\left[-\sum x_i^2\, Var(\hat{\beta}_2) + E\sum(u_i-\overline{u})^2 \right]\\
&=\frac{1}{n-2} \left[ -\frac{\sum x_i^2 \sigma_i^2}{\sum x_i^2} + \frac{(n-1)\sum \sigma_i^2}{n} \right],
\end{align}

where $x_i = X_i-\overline{X}$. In the fourth line, squaring gives $\sum x_i^2\,Var(\hat{\beta}_2)$ plus a cross term whose expectation is $-2\sum x_i^2\,Var(\hat{\beta}_2)$ (since $\hat{\beta}_2-\beta_2=\sum x_iu_i/\sum x_i^2$); the two combine into the single negative term. The last line uses $Var(\hat{\beta}_2)=\sum x_i^2\sigma_i^2/(\sum x_i^2)^2$ under heteroscedasticity and $E\sum(u_i-\overline{u})^2=\frac{n-1}{n}\sum\sigma_i^2$.

If there is homoscedasticity, $\sigma_i^2=\sigma^2$ for each $i$, and the expression reduces to

$$E(\hat{\sigma}^2)=\frac{1}{n-2}\left[-\sigma^2 + \frac{(n-1)n\sigma^2}{n}\right]=\frac{(n-2)\sigma^2}{n-2}=\sigma^2.$$

In the presence of heteroscedasticity, however, the expected value of $\hat{\sigma}^2=\frac{\sum\hat{u}_i^2}{n-2}$ will not be equal to the true $\sigma^2$.
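
This result can be checked numerically. Below is a minimal Monte Carlo sketch in Python; the regressor values, the variance pattern $\sigma_i^2=0.5X_i^2$, the coefficient values, and the number of replications are illustrative assumptions, not part of the derivation above.

```python
import numpy as np

# Monte Carlo check of E(sigma_hat^2) under heteroscedasticity
rng = np.random.default_rng(42)
n, reps = 30, 100_000
X = np.linspace(1, 10, n)
sigma2_i = 0.5 * X**2                        # assumed: error variance grows with X

x = X - X.mean()                             # deviations from the mean
Sxx = np.sum(x**2)

sigma2_hat = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, np.sqrt(sigma2_i))   # heteroscedastic errors
    Y = 2.0 + 3.0 * X + u                    # hypothetical beta1 = 2, beta2 = 3
    b2 = np.sum(x * (Y - Y.mean())) / Sxx    # OLS slope
    b1 = Y.mean() - b2 * X.mean()            # OLS intercept
    resid = Y - (b1 + b2 * X)
    sigma2_hat[r] = np.sum(resid**2) / (n - 2)

# Theoretical E(sigma_hat^2) from the expression derived above
theory = (-(np.sum(x**2 * sigma2_i) / Sxx)
          + (n - 1) / n * np.sum(sigma2_i)) / (n - 2)
print("Simulated  :", sigma2_hat.mean())
print("Theoretical:", theory)
```

The two printed values should agree closely, while neither equals any single "true" error variance, since $\sigma_i^2$ varies with $X_i$.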


To address heteroscedasticity in regression analysis, several techniques can be used to stabilize the variance of the errors:

  1. Transformations: Transforming the variables (for example, taking logarithms or square roots) can sometimes stabilize the variance of the errors.
  2. Weighted Least Squares (WLS): WLS assigns a weight to each observation based on its error variance, giving more weight to observations with smaller variances. This mitigates the impact of heteroscedasticity on the estimation of the parameters.
  3. Robust Standard Errors: Heteroscedasticity-consistent standard errors, also known as robust standard errors, correct the standard errors and hypothesis tests in the presence of heteroscedasticity without requiring assumptions about its specific form.
  4. Generalized Least Squares (GLS): GLS estimates the regression coefficients under a broader range of assumptions about the variance-covariance structure of the errors, including heteroscedasticity. (A short sketch of remedies 2 and 3 follows this list.)
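
As an illustration, remedies 2 and 3 can be applied with Python's statsmodels. This is a minimal sketch: the simulated data, the HC3 variant, and the weights $1/X_i^2$ are illustrative assumptions, since in practice the variance structure must be modeled or estimated.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical simulated data: the error standard deviation grows with x,
# so the data are heteroscedastic by construction.
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 100)
y = 2 + 3 * x + rng.normal(scale=0.5 * x)
X = sm.add_constant(x)

# Remedy 3: OLS point estimates with heteroscedasticity-consistent (HC3)
# standard errors; no assumption about the form of heteroscedasticity.
ols_robust = sm.OLS(y, X).fit(cov_type="HC3")
print("OLS + robust SEs:", ols_robust.params, ols_robust.bse)

# Remedy 2: weighted least squares with weights proportional to 1/variance.
# Here Var(u_i) is proportional to x_i^2 by construction; in practice the
# weights must come from a model of the variance.
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
print("WLS:", wls.params, wls.bse)
```

Because WLS requires (an estimate of) the variance structure while robust standard errors do not, robust standard errors are often the simpler first choice.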

Overall, detecting and addressing heteroscedasticity is important for ensuring the validity and reliability of regression analysis results.

Read more on the Remedy of Heteroscedasticity

More on heteroscedasticity on Wikipedia

Heteroscedasticity Consequences

Heteroscedasticity refers to a situation in which the variability of the errors (residuals) in a regression model is not constant across all levels of the independent variable(s); that is, the assumption of homoscedasticity in linear regression models (LRM) is violated.

A brief summary of the consequences of heteroscedasticity is given below:

  • The OLS estimators and regression predictions based on them remain unbiased and consistent.
  • The OLS estimators are no longer the BLUE (Best Linear Unbiased Estimators) because they are no longer efficient, so the regression predictions will be inefficient too.
  • Because the usual estimator of the covariance matrix of the estimated regression coefficients is biased and inconsistent under heteroscedasticity, the tests of hypotheses (t-test, F-test) are no longer valid.

A more detailed discussion of the consequences of heteroscedasticity follows:

  1. Inefficient Estimates: When the homoscedasticity assumption is violated, the OLS estimates become inefficient; that is, the estimators are no longer the Best Linear Unbiased Estimators (BLUE) and can therefore have larger standard errors. Large standard errors may lead to incorrect conclusions about the statistical significance of the regression coefficients.
  2. Imprecise Estimates: Under heteroscedasticity, the ordinary least squares estimators (OLSE) are still unbiased, but they are no longer the most efficient estimators, as they may have larger variances than necessary. The estimated coefficients can therefore lie farther from the true population parameters than those of an efficient estimator.
  3. Incorrect Standard Errors: The conventional standard errors of the regression coefficients are biased in the presence of heteroscedasticity, which leads to inaccurate inference in hypothesis testing, including incorrect t-tests, F-tests, and p-values. Researchers may mistakenly conclude that a variable is not statistically significant when it is, or vice versa.
  4. Invalid Inference: Biased standard errors also lead to invalid inferences about the population parameters, because confidence intervals and hypothesis tests based on them are unreliable and may fail to achieve their nominal coverage.
  5. Model Misspecification: Heteroscedasticity may indicate a misspecification of the underlying model. If the assumption of constant variance is violated, it suggests that there may be unaccounted-for factors or omitted variables influencing the variability of the errors. It suggests that the model may not be capturing all the variability in the data adequately.
  6. Inflated Type I Errors: Heteroscedasticity can lead to inflated Type I errors (false positives) in hypothesis tests. Researchers might mistakenly reject null hypotheses that are true, leading to incorrect conclusions (see the simulation sketch after this list).
  7. Suboptimal Forecasting: Models affected by heteroscedasticity may provide suboptimal forecasts since the variability of the errors is not accurately captured. This can impact the model’s ability to make reliable predictions.
  8. Robustness Issues: Heteroscedasticity can make regression models less robust, meaning that their performance deteriorates when applied to different datasets or when the underlying assumptions are not met.
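
The inflation of Type I errors mentioned in point 6 can be demonstrated by simulation. The following is a minimal sketch; the sample size, the error-variance pattern, and the replication count are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

# Under H0 the slope is 0, so a correctly sized 5% t-test should reject ~5% of the time.
rng = np.random.default_rng(11)
n, reps = 50, 20_000
x = np.linspace(1, 10, n)
d = x - x.mean()
Sxx = np.sum(d**2)
t_crit = stats.t.ppf(0.975, df=n - 2)        # two-sided 5% critical value

rejections = 0
for _ in range(reps):
    y = 3.0 + rng.normal(scale=x)            # heteroscedastic errors, true slope = 0
    b2 = np.sum(d * y) / Sxx                 # OLS slope
    resid = y - y.mean() - b2 * d            # residuals from the fitted line
    se = np.sqrt(np.sum(resid**2) / (n - 2) / Sxx)   # conventional (non-robust) SE
    rejections += abs(b2 / se) > t_crit

print("Empirical rejection rate:", rejections / reps)
```

With error variance growing in $X$, the conventional t-test rejects a true null noticeably more often than the nominal 5% level.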

To detect heteroscedasticity, apply a formal test such as the Breusch-Pagan test or the White test, and consider corrective measures like weighted least squares regression or transforming the data.
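
Both tests are available in Python's statsmodels. The sketch below uses hypothetical simulated data; `het_breuschpagan` and `het_white` each return the LM statistic, its p-value, an F statistic, and the F p-value.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

# Hypothetical data with error variance increasing in x
rng = np.random.default_rng(7)
x = np.linspace(1, 10, 200)
y = 1 + 2 * x + rng.normal(scale=x)          # heteroscedastic by construction
X = sm.add_constant(x)

fit = sm.OLS(y, X).fit()

# Breusch-Pagan: regresses the squared residuals on the regressors
bp_lm, bp_pval, _, _ = het_breuschpagan(fit.resid, fit.model.exog)
print(f"Breusch-Pagan LM = {bp_lm:.2f}, p-value = {bp_pval:.4f}")

# White: also uses squares (and cross-products) of the regressors
w_lm, w_pval, _, _ = het_white(fit.resid, fit.model.exog)
print(f"White LM = {w_lm:.2f}, p-value = {w_pval:.4f}")
```

A small p-value leads to rejecting the null hypothesis of homoscedasticity.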

Learn about Remedial Measures of Heteroscedasticity

Nature of Heteroscedasticity (2020)

Let us start with the nature of heteroscedasticity.

The assumption of homoscedasticity (equal spread, equal variance) is

$$E(u_i^2)=E(u_i^2|X_{2i},X_{3i},\cdots, X_{ki})=\sigma^2,\quad i=1,2,\cdots, n$$

[Figure: Homoscedasticity. The spread of $Y_i$ around the regression line is the same for all values of $X$.]

The above Figure shows that the conditional variance of $Y_i$ (which is equal to that of $u_i$), conditional upon the given $X_i$, remains the same regardless of the values taken by the variable $X$.

[Figure: Heteroscedasticity. The spread of $Y_i$ around the regression line increases with $X$.]

The Figure shows that the conditional variance of $Y_i$ increases as $X$ increases. The variance of $Y_i$ is not constant, so there is heteroscedasticity.

$$E(u_i^2)=E(u_i^2|X_{2i},X_{3i},\cdots, X_{ki})=\sigma_i^2$$

The nature of heteroscedasticity concerns how the assumption of homoscedasticity in linear regression models comes to be violated. In the case of heteroscedasticity, the errors have unequal variances at different levels of the regressors, which makes the OLS estimators inefficient and the usual standard errors biased. There are several reasons why the variances of $u_i$ may not be constant:

  • Following the error-learning models: as people learn, their errors of behavior become smaller over time, or their behavior becomes more consistent. In such cases, $\sigma_i^2$ is expected to decrease.
  • As income grows, people have more discretionary income (income remaining after deduction of taxes) and hence more scope for choice in how they use or manage it. Similarly, companies with larger profits are generally expected to show greater variability in their dividend policies than companies with lower profits.
  • As data-collecting techniques improve, $\sigma_i^2$ is likely to decrease. For example, banks with sophisticated data-processing equipment are likely to commit fewer errors in the monthly or quarterly statements of their customers than banks without such equipment.
  • Heteroscedasticity can also arise from the presence of outliers. The inclusion or exclusion of such an observation, especially when the sample size is small, can substantially alter the results of a regression analysis.
  • The omission of relevant variables also results in heteroscedasticity: the effect of the omitted variable is absorbed into the error term, whose variance then varies systematically, making the model difficult to interpret.
  • Heteroscedasticity may arise from a violation of the CLRM assumption that the model is correctly specified.
  • Skewness in the distribution of one or more regressors is another source of heteroscedasticity; income, for example, is typically distributed unevenly.
  • Incorrect data transformation (ratio or first difference) and incorrect functional form (linear vs log-linear) are also sources of heteroscedasticity.
  • The problem of heteroscedasticity is likely to be more common in cross-sectional data than in time series data.

Introduction to Heteroscedasticity (2020)

This post gives a general discussion and introduction to heteroscedasticity.

Introduction: Heteroscedasticity and Homoscedasticity

The term heteroscedasticity refers to the violation of the assumption of homoscedasticity in linear regression models (LRM). In the case of heteroscedasticity, the errors have unequal variances at different levels of the regressors, which makes the OLS estimators inefficient and the usual standard errors biased. The disturbances $u_i$ in the Classical Linear Regression Model (CLRM) appearing in the population regression function should be homoscedastic; that is, they should all have the same variance.

In short, the Greek prefix hetero means different (or unequal), and the Greek word skedasis means spread (or scatter). Homoscedasticity means equal spread and heteroscedasticity means unequal spread.

Effect on the Var-Cov Matrix of the Error Terms:
Under homoscedasticity, the Var-Cov matrix of the errors is

$$E(uu') = \begin{pmatrix}
\sigma^2 & 0 & \cdots & 0\\ 0 & \sigma^2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots &\sigma^2
\end{pmatrix}=\sigma^2 I_n,$$

where $I_n$ is an $n\times n$ identity matrix.

In the presence of heteroscedasticity, the Var-Cov matrix of the errors is no longer $\sigma^2 I_n$: the diagonal elements are not constant.

$$E(uu')=\begin{pmatrix}
\sigma_1^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma^2_2 & 0 & \cdots & 0 \\ 0 & 0 & \sigma^2_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_n^2
\end{pmatrix}=\Omega$$

The Var-Cov matrix of the OLS estimators $\hat{\beta}$ is

\begin{align*}
Cov(\hat{\beta}) &= E\left[(\hat{\beta}-\beta)(\hat{\beta}-\beta)' \right]\\
&=E\left[[(X'X)^{-1}X'u][(X'X)^{-1}X'u]' \right]\\
&=E\left[(X'X)^{-1}X'uu'X(X'X)^{-1} \right]\\
&=(X'X)^{-1}X'E(uu')X(X'X)^{-1}\\
&=(X'X)^{-1}X'\Omega X (X'X)^{-1}
\end{align*}
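
The last line is the "sandwich" form. White's heteroscedasticity-consistent covariance estimator replaces the unknown $\Omega$ with a diagonal matrix of squared OLS residuals. A minimal numerical sketch in Python follows; the simulated data are hypothetical, and statsmodels' `HC0_se` attribute is printed only to confirm the hand-computed result.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data, heteroscedastic by construction
rng = np.random.default_rng(3)
x = np.linspace(1, 10, 100)
X = sm.add_constant(x)                        # n x k design matrix
y = 1 + 2 * x + rng.normal(scale=0.4 * x)

fit = sm.OLS(y, X).fit()
u = fit.resid

# White's HC0 estimator: substitute diag(u_i^2) for the unknown Omega
XtX_inv = np.linalg.inv(X.T @ X)
Omega_hat = np.diag(u**2)
sandwich = XtX_inv @ X.T @ Omega_hat @ X @ XtX_inv

print(np.sqrt(np.diag(sandwich)))             # robust (HC0) standard errors
print(fit.HC0_se)                             # statsmodels equivalent
```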

When we are concerned with heteroscedasticity, the following questions arise: What is its nature? What are its consequences? How can it be detected? What remedial measures are available?

That’s all about some basic introduction to heteroscedasticity.
