Heteroscedasticity Consistent Standard Errors

This post is about heteroscedasticity-consistent standard errors and variances.

The true error variances $\sigma_i^2$ are rarely known. However, consistent estimates of the variances and covariances of the OLS estimators can still be obtained when heteroscedasticity is present.

White’s Heteroscedasticity Consistent Standard Errors and Variances

White’s heteroscedasticity-corrected standard errors are also known as robust standard errors. They may be larger or smaller than the OLS standard errors, so the estimated $t$-values may be smaller or larger than those obtained by OLS.

Comparing the OLS output with White’s heteroscedasticity consistent standard errors (variances) may be useful to see whether heteroscedasticity is a serious problem in a particular set of data.
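The comparison described above can be sketched in a few lines of Python. This is a minimal illustration for the two-variable model only, assuming the simplest form of White's estimator (often labelled HC0), $\widehat{Var}(\hat{\beta}_2)=\frac{\sum x_i^2\hat{u}_i^2}{(\sum x_i^2)^2}$ with $x_i=X_i-\overline{X}$. The function name and the data are made up for illustration.

```python
import math

def ols_with_white_se(x, y):
    """Fit y = b1 + b2*x by OLS; return (b1, b2, ols_se_b2, white_se_b2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]                 # deviations x_i = X_i - X_bar
    sxx = sum(d * d for d in dx)
    b2 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b1 = y_bar - b2 * x_bar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    # Conventional OLS standard error: sqrt(sigma_hat^2 / sum(x_i^2))
    sigma2_hat = sum(e * e for e in resid) / (n - 2)
    ols_se = math.sqrt(sigma2_hat / sxx)
    # White (HC0) standard error: sqrt(sum(x_i^2 * u_hat_i^2) / (sum(x_i^2))^2)
    white_se = math.sqrt(sum(d * d * e * e for d, e in zip(dx, resid)) / sxx ** 2)
    return b1, b2, ols_se, white_se

# Hypothetical data whose spread grows with x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.5, 7.2, 11.0, 10.1, 15.8, 13.0]
b1, b2, ols_se, white_se = ols_with_white_se(x, y)
print(f"slope={b2:.4f}  OLS se={ols_se:.4f}  White se={white_se:.4f}")
```

A large gap between the two standard errors suggests that heteroscedasticity matters for inference in the data at hand.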

Plausible Assumptions about Heteroscedasticity Patterns

Assumption 1: The error variance is proportional to $X_i^2$

$$E(u_i^2)=\sigma^2 X_i^2$$
Graphical methods or the Park and Glejser approaches may suggest that the variance of $u_i$ is proportional to the square of $X$.

One may transform the original model as follows:

\begin{align}\label{assump1}
\frac{Y_i}{X_i} &=\frac{\beta_1}{X_i} + \beta_2 + \frac{u_i}{X_i} \nonumber \\
&=\beta_1 \frac{1}{X_i} + \beta_2 + v_i,\qquad \qquad (1)
\end{align}

where $v_i$ is the transformed disturbance term, equal to $\frac{u_i}{X_i}$. It can be verified that

\begin{align*}
E(v_i^2) &=E\left(\frac{u_i}{X_i}\right)^2\\
&=\frac{1}{X_i^2}E(u_i^2)=\sigma^2
\end{align*}

Hence, the variance of $v_i$ is now homoscedastic, and one may apply OLS to the transformed equation by regressing $\frac{Y_i}{X_i}$ on $\frac{1}{X_i}$.

Notice that in the transformed regression the intercept term $\beta_2$ is the slope coefficient in the original equation and the slope coefficient $\beta_1$ is the intercept term in the original model. Therefore, to get back to the original model multiply the estimated equation (1) by $X_i$.
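The transformation under Assumption 1 can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the data are noise-free values built from $\beta_1=3$ and $\beta_2=2$, so the swap of roles between intercept and slope is easy to see.

```python
def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

def wls_variance_prop_x_squared(x, y):
    """Estimate (beta1, beta2) of Y = beta1 + beta2*X + u when Var(u_i) = sigma^2 * X_i^2."""
    y_star = [yi / xi for xi, yi in zip(x, y)]    # Y_i / X_i
    x_star = [1.0 / xi for xi in x]               # 1 / X_i
    intercept, slope = simple_ols(x_star, y_star)
    # Roles swap: the transformed intercept is beta2, the transformed slope is beta1
    return slope, intercept

# Illustrative noise-free data with beta1 = 3 and beta2 = 2 (X_i must be non-zero)
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.0 + 2.0 * xi for xi in x]
beta1, beta2 = wls_variance_prop_x_squared(x, y)
print(round(beta1, 6), round(beta2, 6))
```

With real data the estimates will not be exact, but the mapping back to the original coefficients is the same.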

Assumption 2: The Error Variance is Proportional to $X_i$

The square root transformation: $E(u_i^2) = \sigma^2 X_i$

If it is believed that the variance of $u_i$ is proportional to $X_i$, then the original model can be transformed as

\begin{align*}
\frac{Y_i}{\sqrt{X_i}} &= \frac{\beta_1}{\sqrt{X_i}} + \beta_2 \sqrt{X_i} + \frac{u_i}{\sqrt{X_i}}\\
&=\beta_1 \frac{1}{\sqrt{X_i}} + \beta_2\sqrt{X_i}+v_i,\quad\quad (a)
\end{align*}

where $v_i=\frac{u_i}{\sqrt{X_i}}$ and $X_i>0$. It follows that

$E(v_i^2)=\frac{1}{X_i}E(u_i^2)=\sigma^2$ (a homoscedastic situation).

One may proceed to apply OLS on equation (a), regressing $\frac{Y_i}{\sqrt{X_i}}$ on $\frac{1}{\sqrt{X_i}}$ and $\sqrt{X_i}$.

Note that the transformed model (a) has no intercept term. Therefore, use the regression-through-the-origin model to estimate $\beta_1$ and $\beta_2$. To get back to the original model, simply multiply equation (a) by $\sqrt{X_i}$.

Consider the case of a zero intercept, that is, $Y_i=\beta_2X_i+u_i$. The transformed model and the resulting estimator are

\begin{align*}
\frac{Y_i}{\sqrt{X_i}} &= \beta_2 \sqrt{X_i} + \frac{u_i}{\sqrt{X_i}}\\
\hat{\beta}_2 &=\frac{\sum Y_i}{\sum X_i}=\frac{\overline{Y}}{\overline{X}}
\end{align*}

Here, the WLS estimator is simply the ratio of the means of the dependent and explanatory variables.
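The ratio-of-means result can be checked numerically. The sketch below, with made-up data and a hypothetical function name, runs the regression through the origin on the square-root-transformed variables and compares the slope with $\overline{Y}/\overline{X}$.

```python
import math

def wls_through_origin(x, y):
    """Regress Y/sqrt(X) on sqrt(X) without an intercept (requires X_i > 0)."""
    y_star = [yi / math.sqrt(xi) for xi, yi in zip(x, y)]
    x_star = [math.sqrt(xi) for xi in x]
    # Regression through the origin: beta_hat = sum(x* y*) / sum(x*^2)
    return sum(a * b for a, b in zip(x_star, y_star)) / sum(a * a for a in x_star)

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 3.7, 6.4, 7.9, 10.3]
beta2_hat = wls_through_origin(x, y)
ratio_of_means = (sum(y) / len(y)) / (sum(x) / len(x))
print(beta2_hat, ratio_of_means)   # the two values agree up to floating-point rounding
```

The agreement follows because $\sum \sqrt{X_i}\cdot\frac{Y_i}{\sqrt{X_i}}=\sum Y_i$ and $\sum(\sqrt{X_i})^2=\sum X_i$.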

Assumption 3: The Error Variance is proportional to the Square of the Mean value of $Y$

$$E(u_i^2)=\sigma^2[E(Y_i)]^2$$

The original model is $Y_i=\beta_1 + \beta_2 X_i + u_i$, so that $E(Y_i)=\beta_1 + \beta_2X_i$.

The transformed model is

\begin{align*}
\frac{Y_i}{E(Y_i)}&=\frac{\beta_1}{E(Y_i)} + \beta_2 \frac{X_i}{E(Y_i)} + \frac{u_i}{E(Y_i)}\\
&=\beta_1\left(\frac{1}{E(Y_i)}\right) + \beta_2 \frac{X_i}{E(Y_i)} + v_i, \quad \quad (b)
\end{align*}

where $v_i=\frac{u_i}{E(Y_i)}$, and $E(v_i^2)=\sigma^2$ (a situation of homoscedasticity).

Note that the transformed model (b) is not operational, as $E(Y_i)$ depends on $\beta_1$ and $\beta_2$, which are unknown. However, $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2X_i$ is an estimator of $E(Y_i)$. Therefore, we proceed in two steps.

Step 1: Run the usual OLS regression ignoring the presence of heteroscedasticity problem and obtain $\hat{Y}_i$.

Step 2: Use the estimate of $\hat{Y}_i$ to transform the model as

\begin{align*}
\frac{Y_i}{\hat{Y}_i}&=\frac{\beta_1}{\hat{Y}_i} + \beta_2 \frac{X_i}{\hat{Y}_i} + \frac{u_i}{\hat{Y}_i}\\
&=\beta_1\left(\frac{1}{\hat{Y}_i}\right) + \beta_2 \frac{X_i}{\hat{Y}_i} + v_i, \quad \quad (c)
\end{align*}

where $v_i=\frac{u_i}{\hat{Y}_i}$.

Although $\hat{Y}_i$ is not exactly $E(Y_i)$, it is a consistent estimator: as the sample size increases indefinitely, $\hat{Y}_i$ converges to the true $E(Y_i)$. Therefore, the transformed model (c) will perform well if the sample size is reasonably large.
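The two steps can be sketched as below. This is an illustration under stated assumptions rather than a complete routine: the helper names are hypothetical, all fitted values are assumed to be non-zero, and the transformed model (c) is estimated without an intercept by solving its two normal equations directly.

```python
def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

def ols_two_regressors_no_intercept(z1, z2, w):
    """Solve the 2x2 normal equations for w = b1*z1 + b2*z2 + error."""
    s11 = sum(a * a for a in z1)
    s22 = sum(b * b for b in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s1w = sum(a * c for a, c in zip(z1, w))
    s2w = sum(b * c for b, c in zip(z2, w))
    det = s11 * s22 - s12 * s12
    return (s22 * s1w - s12 * s2w) / det, (s11 * s2w - s12 * s1w) / det

def two_step_wls(x, y):
    # Step 1: ordinary OLS, ignoring heteroscedasticity, to obtain fitted values
    b1, b2 = simple_ols(x, y)
    y_hat = [b1 + b2 * xi for xi in x]
    # Step 2: divide every term by the fitted value and re-estimate without an intercept
    w = [yi / yh for yi, yh in zip(y, y_hat)]
    z1 = [1.0 / yh for yh in y_hat]
    z2 = [xi / yh for xi, yh in zip(x, y_hat)]
    return ols_two_regressors_no_intercept(z1, z2, w)

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [5.2, 6.7, 9.5, 10.4, 13.9, 14.1]
beta1, beta2 = two_step_wls(x, y)
print(beta1, beta2)
```

The coefficients returned by the second step are direct estimates of $\beta_1$ and $\beta_2$ in the original model.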

Assumption 4: Log Transformation

A log transformation

$$\ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i \tag*{log model-1}$$

usually reduces heteroscedasticity when compared to the regression

$$Y_i=\beta_1+\beta_2X_i + u_i $$

This is because the log transformation compresses the scales in which the variables are measured, reducing a tenfold difference between two values to roughly a twofold difference. For example, 80 is 10 times the number 8, but ln(80) = 4.3820 is only about twice as large as ln(8) = 2.0794.

After the log transformation, the slope coefficient $\beta_2$ measures the elasticity of $Y$ with respect to $X$ (that is, the percentage change in $Y$ for a one percent change in $X$).

If $Y$ is consumption and $X$ is income in the model (log model-1) then $\beta_2$ measures income elasticity, while in the original model (model without any transformation: OLS model), $\beta_2$ measures only the rate of change of mean consumption for a unit change in income.

Note that the log transformation is not applicable if some of the $Y$ and $X$ values are zero or negative.
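A short sketch of the log-log regression follows. The function names are hypothetical, and the income and consumption figures are constructed with an elasticity of 0.8 so that the estimate is easy to check. The guard reflects the restriction that all values must be strictly positive.

```python
import math

def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

def log_log_elasticity(x, y):
    """Regress ln(Y) on ln(X); the slope is the elasticity of Y with respect to X."""
    if min(x) <= 0 or min(y) <= 0:
        raise ValueError("log transformation needs strictly positive X and Y")
    ln_x = [math.log(v) for v in x]
    ln_y = [math.log(v) for v in y]
    return simple_ols(ln_x, ln_y)

# Scale compression: a tenfold gap in levels is roughly a twofold gap in logs
print(80 / 8, math.log(80) / math.log(8))

# Illustrative data constructed so that the elasticity is 0.8
income = [10, 20, 40, 80, 160]
consumption = [5.0 * v ** 0.8 for v in income]
b1, b2 = log_log_elasticity(income, consumption)
print(round(b2, 6))
```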

Note that in all of these assumptions about the nature of heteroscedasticity, we are essentially speculating about the nature of $\sigma_i^2$. The transformations also raise some further problems:

  • There may be a problem of spurious correlation. For example, in the model $$Y_i = \beta_1+\beta_2X_i + u_i,$$ the $Y$ and $X$ variables may not be correlated, but in the transformed model $$\frac{Y_i}{X_i}=\beta_1\left(\frac{1}{X_i}\right) + \beta_2,$$ the $\frac{Y_i}{X_i}$ and $\frac{1}{X_i}$ are often found to be correlated.
  • The $\sigma_i^2$ are not directly known, so we estimate them from one or more of the transformations. All testing procedures are then valid only in large samples. Therefore, be careful in interpreting the results based on the various transformations in small or finite samples.
  • For a model with more than one explanatory variable, one may not know in advance which of the $X$ variables should be chosen for transforming the data.

Heteroscedasticity in Regression (2020)

Heteroscedasticity in Regression: The term heteroscedasticity refers to the violation of the assumption of homoscedasticity in linear regression models (LRM). In the case of heteroscedasticity, the errors have unequal variances for different levels of the regressors. The OLS estimators of the regression coefficients remain unbiased but are no longer efficient, and the usual estimators of their variances are biased. The disturbances in the Classical Linear Regression Model (CLRM) appearing in the population regression function should be homoscedastic; that is, they all have the same variance.

Mathematical proof of $E(\hat{\sigma}^2)\ne \sigma^2$ in the presence of heteroscedasticity.

For the proof of $E(\hat{\sigma}^2)\ne \sigma^2$, consider the two-variable linear regression model in the presence of heteroscedasticity,

\begin{align}
Y_i=\beta_1 + \beta_2 X_i+ u_i, \quad\quad (eq1)
\end{align}

where $Var(u_i)=\sigma_i^2$ (the case of heteroscedasticity).

The usual estimator of the error variance is

\begin{align}
\hat{\sigma}^2 &= \frac{\sum \hat{u}_i^2 }{n-2}\\
&= \frac{\sum (Y_i - \hat{Y}_i)^2 }{n-2}\\
&=\frac{\sum (\beta_1 + \beta_2 X_i + u_i - \hat{\beta}_1 -\hat{\beta}_2 X_i )^2}{n-2}\\
&=\frac{\sum \left( -(\hat{\beta}_1-\beta_1) - (\hat{\beta}_2 - \beta_2)X_i + u_i \right)^2 }{n-2}\quad\quad (eq2)
\end{align}

Noting that the OLS residuals sum to zero,

\begin{align*}
\sum (Y_i-\hat{Y}_i)&=0\\
\sum \left(\beta_1 + \beta_2 X_i + u_i - \hat{\beta}_1 - \hat{\beta}_2X_i\right) &=0\\
\sum \left(-(\hat{\beta}_1 -\beta_1) - (\hat{\beta}_2-\beta_2)X_i + u_i\right) & =0\\
n(\hat{\beta}_1 -\beta_1) &= - (\hat{\beta}_2-\beta_2)\sum X_i + \sum u_i\\
\text{Dividing both sides by } n&\\
(\hat{\beta}_1 - \beta_1) &= -(\hat{\beta}_2-\beta_2)\overline{X}+\overline{u}
\end{align*}

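Substituting this in (eq2) and taking expectations on both sides, with $x_i=X_i-\overline{X}$:

\begin{align}
E(\hat{\sigma}^2) &= \frac{1}{n-2} E\sum\left[ -\left(-(\hat{\beta}_2 - \beta_2) \overline{X} + \overline{u} \right) - (\hat{\beta}_2-\beta_2)X_i + u_i  \right]^2\\
&=\frac{1}{n-2}E\sum\left[(\hat{\beta}_2-\beta_2)\overline{X} -\overline{u} - (\hat{\beta}_2-\beta_2)X_i+u_i \right]^2\\
&=\frac{1}{n-2} E\sum\left[ -(\hat{\beta}_2 - \beta_2)(X_i-\overline{X}) + (u_i-\overline{u})\right]^2\\
&= \frac{1}{n-2}\left[-\sum x_i^2\, Var(\hat{\beta}_2) + E\sum(u_i-\overline{u})^2 \right]\\
&=\frac{1}{n-2} \left[ -\frac{\sum x_i^2 \sigma_i^2}{\sum x_i^2} + \frac{(n-1)\sum \sigma_i^2}{n} \right]
\end{align}

The fourth line uses $\hat{\beta}_2-\beta_2=\frac{\sum x_iu_i}{\sum x_i^2}$, so that the cross-product term equals $-2(\hat{\beta}_2-\beta_2)^2\sum x_i^2$. The last line uses $Var(\hat{\beta}_2)=\frac{\sum x_i^2\sigma_i^2}{(\sum x_i^2)^2}$ and assumes that the disturbances are uncorrelated.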

If there is homoscedasticity, then $\sigma_i^2=\sigma^2$ for each $i$, and the expression reduces to $E(\hat{\sigma}^2)=\frac{1}{n-2}\left[-\sigma^2+(n-1)\sigma^2\right]=\sigma^2$.

In the presence of heteroscedasticity, the expected value of $\hat{\sigma}^2=\frac{\sum\hat{u}_i^2}{n-2}$ will in general not be equal to the true $\sigma^2$.
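The final expression of the derivation can be evaluated directly for any chosen set of $X_i$ and $\sigma_i^2$. The sketch below uses a hypothetical function name and illustrative values. Since there is no single $\sigma^2$ under heteroscedasticity, the heteroscedastic case is compared with the average of the $\sigma_i^2$, which is an assumption made here only to have a reference point.

```python
def expected_sigma2_hat(x, sigma2):
    """E(sigma_hat^2) = [ (n-1)/n * sum(sigma_i^2) - sum(x_i^2 sigma_i^2)/sum(x_i^2) ] / (n-2)."""
    n = len(x)
    x_bar = sum(x) / n
    dev2 = [(xi - x_bar) ** 2 for xi in x]        # x_i^2 with x_i = X_i - X_bar
    term_slope = sum(d * s for d, s in zip(dev2, sigma2)) / sum(dev2)
    term_errors = (n - 1) * sum(sigma2) / n
    return (term_errors - term_slope) / (n - 2)

x = [1.0, 2.0, 3.0, 4.0, 5.0]

# Homoscedastic case: every sigma_i^2 equals 4, and the expectation is 4 as well
homo = expected_sigma2_hat(x, [4.0] * len(x))

# Heteroscedastic case (illustrative): sigma_i^2 = X_i^2
sigma2 = [xi ** 2 for xi in x]
hetero = expected_sigma2_hat(x, sigma2)
average_variance = sum(sigma2) / len(sigma2)

print(homo, hetero, average_variance)
```

For these values the heteroscedastic expectation is about 10.53, while the average error variance is 11, so $\hat{\sigma}^2$ no longer targets a meaningful common variance.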


To address heteroscedasticity in regression analysis, several techniques can be used to stabilize the variance of the errors:

  1. Transformations: Transforming the variables (such as using logarithmic or square root transformations) can sometimes help stabilize the variance of the errors.
  2. Weighted Least Squares (WLS): WLS is a method that assigns different weights to observations based on their variances, thereby giving more weight to observations with smaller variances. This may also help to mitigate the impact of heteroscedasticity on the estimation of parameters.
  3. Robust Standard Errors: Heteroscedasticity-consistent standard errors, also known as robust standard errors, provide a way to correct standard errors and hypothesis tests in the presence of heteroscedasticity without requiring assumptions about the specific form of heteroscedasticity.
  4. Generalized Least Squares (GLS): The GLS method allows the estimation of regression coefficients under a broader range of assumptions about the variance-covariance structure of the errors, including heteroscedasticity.

Overall, detecting and addressing heteroscedasticity is important for ensuring the validity and reliability of regression analysis results.

Residuals plot for Detection of Autocorrelation (2020)

The existence and pattern of autocorrelation may be detected using a graphical representation of the residuals obtained from an ordinary least squares regression. One can draw the following residual plots for the detection of autocorrelation:

Detection of Autocorrelation from Residual Plots

  • A plot of the residuals against time.
  • A plot of $\hat{u}_t$ against $\hat{u}_{t-1}$.
  • A plot of the standardized residuals against time.

Since the population disturbances $u_t$ are not directly observable, we use their proxies, the residuals $\hat{u}_t$.

Detection of Positive and Negative Autocorrelation

  • A random pattern of residuals indicates the absence of autocorrelation.
  • A visual examination of a plot of $\hat{u}_t$ or $\hat{u}_t^2$ can provide useful information not only about the presence of autocorrelation but also about the presence of heteroscedasticity. Similarly, the examination of $\hat{u}_t$ and $\hat{u}_t^2$ provides useful information about model inadequacy or specification bias.
  • The standardized residuals are computed as $\frac{\hat{u}_t}{\hat{\sigma}}$, where $\hat{\sigma}$ is the standard error of the regression.

Figure: Residuals plot for autocorrelation

Note: The plot of residuals against time is called the sequence plot. For time-series data, the researcher can plot the residuals versus time (a time sequence plot) and may expect to observe a random pattern, indicating that the data are not autocorrelated. However, if the researcher observes a systematic (non-random) pattern, the data are autocorrelated. The patterns shown in the above figure can be used for the detection of autocorrelation.
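The quantities behind these plots are easy to compute before drawing anything. The sketch below is only an illustration: the function name and the residual series are made up, and the lag-1 coefficient $\frac{\sum \hat{u}_t\hat{u}_{t-1}}{\sum \hat{u}_t^2}$ is added here as a simple numerical summary of the $\hat{u}_t$ versus $\hat{u}_{t-1}$ plot.

```python
import math

def residual_diagnostics(resid, n_params=2):
    """Return standardized residuals, (u_hat_{t-1}, u_hat_t) pairs, and a lag-1 coefficient."""
    n = len(resid)
    # Standard error of the regression: sqrt(sum(u_hat^2) / (n - number of parameters))
    sigma_hat = math.sqrt(sum(e * e for e in resid) / (n - n_params))
    standardized = [e / sigma_hat for e in resid]
    # Points for the plot of u_hat_t against u_hat_{t-1}
    lag_pairs = list(zip(resid[:-1], resid[1:]))
    rho_hat = sum(a * b for a, b in lag_pairs) / sum(e * e for e in resid)
    return standardized, lag_pairs, rho_hat

# Hypothetical residuals in time order that drift slowly instead of scattering randomly
resid = [1.0, 0.8, 0.3, -0.4, -0.9, -1.1, -0.6, 0.0, 0.4, 0.5]
standardized, lag_pairs, rho_hat = residual_diagnostics(resid)
print(round(rho_hat, 4))
```

A clearly positive coefficient, as in this made-up series, corresponds to the slowly drifting pattern that signals positive autocorrelation in the sequence plot.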

RecordCast – Recording the Screen in One Click

A tutorial about RecordCast: Screen Recording Tool.

Have you ever made video tutorials? Have you ever recorded your computer screen? Indeed, many tutorials are made from a video of a computer screen. This enables Internet users to follow the steps needed to resolve a problem or use new software. If you want to instantly record the actions you take in a desktop window without sharing or recording information from your computer, the free online RecordCast Screen Capture tool is what you are looking for!

What is RecordCast?

It is a simple web-based tool to record or capture your screen without using third-party apps. All recordings are processed in the browser, and nothing is saved on the server. It is supported in modern browsers like Chrome, Edge, Firefox, and more.

RecordCast is very easy to use, with the essential ability to record everything that happens on your screen, with or without sound. After recording, you can edit the created video, adding text (it has several templates for entering text), images, audio, etc. You can also cut the video and keep or discard individual pieces.

How RecordCast Screen Recorder works

  • Open your browser and go to the service.
  • All you have to do now is click on the “start recording” button in the center.
  • You can choose the type of recording you want, including screen+webcam, screen only, or webcam only.
  • You can record the microphone, the system audio, or no audio at all while recording your screen.
  • After allowing or denying access to the recording devices, you can adjust the available settings and start the screen recording process.
  • You now have three options: select the entire screen, the application window, or the Chrome tab. If you select an application window, the service will show all open windows. If you select a Chrome tab, all open tabs will be displayed in the list.
  • After selecting an app or screen, tap on the record button.
  • After you’ve finished recording, you will have the option to download the recording or to delete the clip and start a new recording.
  • You can edit the recorded video in the built-in editor provided by RecordCast, but this requires creating a free account.

In conclusion

RecordCast is a great tool for YouTubers, bloggers, and presenters, as it gives you everything you need to make a cool show. Some cool features of RecordCast are free, and you can connect a microphone to comment on your video or a webcam so that you can be seen while you are recording.

The only minus we could find about the program is that it currently only allows you to record for 30 minutes, which can feel like a very short time. However, the program is a good choice if you are inexperienced in making screen recordings, as it is incredibly easy to use. In addition, the quality of the recording itself is also really good.

Of course, there are other free web-based screen capture programs, but I do not have enough hands-on experience with them to comment on them. Is RecordCast something you can use? Do you know other and better alternatives? I would love to hear what you think, so leave a comment and make us all smarter!
