Key Points of Heteroscedasticity (2021)

The following are some key points about heteroscedasticity (abbreviated below as "hetero"). They cover its definition, examples, properties, assumptions, and tests for its detection.

One important assumption of regression is that the variance of the error term is constant across observations. If the errors have a constant variance, they are called homoscedastic, otherwise heteroscedastic. In the case of heteroscedastic errors (non-constant variance), the standard estimation methods become inefficient. Typically, the assumption of homoscedasticity is assessed by plotting the residuals.

Heteroscedasticity

  • The disturbance term of OLS regression $u_i$ should be homoscedastic. By Homo, we mean equal, and scedastic means spread or scatter.
  • By hetero, we mean unequal.
  • Heteroscedasticity means that the conditional variance of $Y_i$ (i.e., $var(u_i)$), conditional upon the given $X_i$, does not remain the same regardless of the values taken by the variable $X$.
  • In case of heteroscedasticity $E(u_i^2)=\sigma_i^2=var(u_i)$, where $i=1,2,\cdots, n$.
  • In case of homoscedasticity $E(u_i^2)=\sigma^2=var(u_i)$, where $i=1,2,\cdots, n$.
  • Homoscedasticity means that the conditional variance of $Y_i$ (i.e., $var(u_i)$), conditional upon the given $X_i$, remains the same regardless of the values taken by the variable $X$.
  • The error terms are heteroscedastic, when the scatter of the errors is different, varying depending on the value of one or more of the explanatory variables.
  • Heteroscedasticity is a systematic change in the scatteredness of the residuals over the range of measured values.
  • The presence of heteroscedasticity may be due to (i) the presence of outliers in the data, (ii) incorrect functional form of the regression model, (iii) incorrect transformation of the data, and (iv) mixing observations with different measures of scale.
  • The presence of hetero does not destroy the unbiasedness and consistency of OLS estimators.
  • Hetero is more common in cross-section data than time-series data.
  • Hetero may affect the variance and standard errors of the OLS estimates.
  • The standard errors of OLS estimates are biased in the case of hetero.
  • Statistical inferences (confidence intervals and hypothesis testing) of estimated regression coefficients are no longer valid.
  • The OLS estimators are no longer BLUE as they are no longer efficient in the presence of hetero.
  • The regression predictions are inefficient in the case of hetero.
  • The usual OLS method assigns equal weights to each observation.
  • In GLS each observation is weighted inversely to its $\sigma_i$, so observations with a larger error variance receive less weight.
  • In GLS a weighted sum of squares is minimized with weight $w_i=\frac{1}{\sigma_i^2}$.
  • In GLS each squared residual is weighted by the inverse of $Var(u_i|X_i)$
  • GLS estimates are BLUE.
  • Heteroscedasticity can be detected by plotting the squared residuals $\hat{u}_i^2$ against $\hat{Y}_i$.
  • When plotting $\hat{u}_i^2$ against $\hat{Y}_i$, if no systematic pattern exists, then there is no evidence of hetero.
  • In the case of prior information about $\sigma_i^2$, one may use WLS.
  • If $\sigma_i^2$ is unknown, one may proceed with heteroscedastic corrected standard errors (that are also called robust standard errors).
  • Drawing inferences in the presence of hetero (or if hetero is suspected) may be very misleading.
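The points above on weighting and robust standard errors can be illustrated with a minimal Python sketch for a simple regression $Y_i=\beta_1+\beta_2X_i+u_i$. It assumes the error variances $\sigma_i^2$ are known for the weighted fit, and it uses White's HC0 form for the heteroscedasticity-corrected standard error of the slope. The function names `wls_fit` and `ols_slope_standard_errors` and the small data set are hypothetical, chosen only for illustration.

```python
import math

def wls_fit(x, y, sigma2):
    """Weighted least squares for y = a + b*x with weights w_i = 1/sigma_i^2.

    With all sigma2 equal, this reduces to ordinary least squares.
    """
    w = [1.0 / s for s in sigma2]
    sw = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    y_w = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_w) * (yi - y_w) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return y_w - b * x_w, b

def ols_slope_standard_errors(x, y):
    """Return (OLS slope, conventional SE, White HC0 robust SE)."""
    n = len(x)
    a, b = wls_fit(x, y, [1.0] * n)                   # equal weights = OLS
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Conventional SE assumes one common error variance.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    conventional = math.sqrt(s2 / sxx)
    # HC0 lets each observation carry its own squared residual.
    meat = sum(((xi - x_bar) ** 2) * (e ** 2) for xi, e in zip(x, resid))
    robust = math.sqrt(meat) / sxx
    return b, conventional, robust

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(wls_fit(x, y, [1, 1, 1, 1, 1]))      # equal variances: same as OLS
print(wls_fit(x, y, [1, 4, 9, 16, 25]))    # variance assumed to grow with x
print(ols_slope_standard_errors(x, y))
```

The weighted fit changes the coefficient estimates, while the robust standard error keeps the OLS estimates and only changes how their uncertainty is measured.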

The Breusch-Pagan Test (Numerical Example)

To perform the Breusch-Pagan test for the detection of heteroscedasticity, use the data from the following file Table_11.3.

Step 1:

The estimated regression is $\hat{Y}_i = 9.2903 + 0.6378X_i$

Step 2:

The residuals obtained from this regression are:

| $\hat{u}_i$ | $\hat{u}_i^2$ | $p_i$ |
|---|---|---|
| -5.31307 | 28.22873 | 0.358665 |
| -8.06876 | 65.10494 | 0.827201 |
| 6.49801 | 42.22407 | 0.536485 |
| 0.55339 | 0.30624 | 0.003891 |
| -6.82445 | 46.57318 | 0.591743 |
| 1.36447 | 1.86177 | 0.023655 |
| 5.79770 | 33.61333 | 0.427079 |
| -3.58015 | 12.81744 | 0.162854 |
| 0.98662 | 0.97342 | 0.012368 |
| 8.30908 | 69.04085 | 0.877209 |
| -2.25769 | 5.09715 | 0.064763 |
| -1.33584 | 1.78446 | 0.022673 |
| 8.04201 | 64.67391 | 0.821724 |
| 10.47524 | 109.73066 | 1.3942 |
| 6.23093 | 38.82451 | 0.493291 |
| -9.09153 | 82.65588 | 1.050197 |
| -12.79183 | 163.63099 | 2.079039 |
| -16.84722 | 283.82879 | 3.606231 |
| -17.35860 | 301.32104 | 3.828481 |
| 2.71955 | 7.39595 | 0.09397 |
| 2.39709 | 5.74604 | 0.073007 |
| 0.77494 | 0.60052 | 0.00763 |
| 9.45248 | 89.34930 | 1.135241 |
| 4.88571 | 23.87014 | 0.303286 |
| 4.53063 | 20.52658 | 0.260804 |
| -0.03614 | 0.00131 | 1.66E-05 |
| -0.30322 | 0.09194 | 0.001168 |
| 9.50786 | 90.39944 | 1.148584 |
| -18.98076 | 360.26909 | 4.577455 |
| 20.26355 | 410.61159 | 5.217089 |

The estimated $\tilde{\sigma}^2$ is $\frac{\sum \hat{u}_i^2}{n} = \frac{2361.15325}{30} = 78.7051$.

Compute a new variable $p_i = \frac{\hat{u}_i^2}{\tilde{\sigma}^2}$
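As a quick check of the arithmetic in Step 2, the short Python sketch below recomputes $\tilde{\sigma}^2$ and $p_i$ from the squared residuals reported in the table. The variable names are illustrative only, and the squared residuals are copied from the table rather than recomputed from the raw data.

```python
# Squared residuals (second column of the Step 2 table).
u_hat_sq = [
    28.22873, 65.10494, 42.22407, 0.30624, 46.57318, 1.86177,
    33.61333, 12.81744, 0.97342, 69.04085, 5.09715, 1.78446,
    64.67391, 109.73066, 38.82451, 82.65588, 163.63099, 283.82879,
    301.32104, 7.39595, 5.74604, 0.60052, 89.34930, 23.87014,
    20.52658, 0.00131, 0.09194, 90.39944, 360.26909, 410.61159,
]

n = len(u_hat_sq)                          # 30 observations
sigma2_tilde = sum(u_hat_sq) / n           # estimate of sigma^2 (divisor n)
p = [u2 / sigma2_tilde for u2 in u_hat_sq] # scaled squared residuals p_i

print(round(sigma2_tilde, 4))
print([round(pi, 6) for pi in p[:3]])
```

By construction the $p_i$ average to one, which is a convenient sanity check before running the auxiliary regression.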

Step 3:

Assuming $p_i$ is linearly related to $X_i(=Z_i)$, run the regression $p_i=\alpha_1+\alpha_2Z_i+v_i$.

The regression results are: $\hat{p}_i=-0.74261 + 0.010063X_i$

Step 4:

Obtain the Explained Sum of Squares (ESS) = 10.42802.

Step 5:

Compute: $\Theta = \frac{1}{2} ESS = \frac{10.42802}{2}= 5.2140$.

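Under the null hypothesis of homoscedasticity, the Breusch-Pagan statistic $\Theta$ asymptotically follows a Chi-Square distribution with ($k-1$) degrees of freedom, where $k$ is the number of parameters in the auxiliary regression of Step 3 (here $k-1=1$). The $\chi^2_{tab}$ value at a 5% level of significance with one degree of freedom is 3.8414. Since $\chi_{cal}^2 = 5.2140$ is greater than $\chi_{tab}^2$, the result is statistically significant: there is evidence of heteroscedasticity at a 5% level of significance.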
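The five steps can be sketched in Python for the single-regressor case used here. This is a minimal illustration, not a reference implementation: the function names `simple_ols` and `breusch_pagan` are hypothetical, the synthetic data are made up (the Table_11.3 data are not reproduced), and the p-value formula applies only to one degree of freedom, where the Chi-Square upper tail equals `erfc(sqrt(theta / 2))`.

```python
import math

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def breusch_pagan(x, y):
    """Breusch-Pagan statistic and p-value with Z = X (one regressor)."""
    n = len(x)
    a, b = simple_ols(x, y)                                      # Step 1
    resid_sq = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]  # Step 2
    sigma2 = sum(resid_sq) / n
    p = [r / sigma2 for r in resid_sq]
    a1, a2 = simple_ols(x, p)                                    # Step 3
    p_bar = sum(p) / n
    ess = sum((a1 + a2 * xi - p_bar) ** 2 for xi in x)           # Step 4
    theta = ess / 2                                              # Step 5
    p_value = math.erfc(math.sqrt(theta / 2))   # chi-square, 1 df only
    return theta, p_value

# Synthetic data: error spread grows with x in the first series only.
x = list(range(1, 41))
signs = [1 if i % 2 == 0 else -1 for i in range(40)]
y_hetero = [2 + 3 * xi + s * xi for xi, s in zip(x, signs)]
y_homo = [2 + 3 * xi + s for xi, s in zip(x, signs)]

print(breusch_pagan(x, y_hetero))
print(breusch_pagan(x, y_homo))

# Step 5 using the ESS reported in the example.
theta_example = 10.42802 / 2
print(theta_example > 3.8414, math.erfc(math.sqrt(theta_example / 2)))
```

A statistic above the 5% critical value of 3.8414 points to heteroscedasticity, in line with the conclusion of the numerical example.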

Sampling Basics and Objectives (2021)

In this article, we will discuss the Sampling Basics. It is often required to collect information about a population. The following two methods are used for collecting the required information.

  • Complete information
  • Sampling

Complete Information

This method collects the required information from every individual in the population. This method is used when it is difficult to draw some conclusion (inference) about the population based on sample information. This method is costly and time-consuming. This method of getting data is also called Complete Enumeration or Population Census.

Sampling Basics

What is Sampling?

Sampling is the most common and widely used method of collecting information. Instead of studying the whole population, only a small part of the population is selected and studied, and the result is generalized to the whole population. For example, a cotton dealer picks up a small quantity of cotton from different bales to judge the quality of the cotton.

Purpose or objective of sampling

Two basic purposes of sampling are

  1. To obtain the maximum information about the population without examining every unit of the population.
  2. To find the reliability of the estimates derived from the sample, which can be done by computing the standard error of the statistic.
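The second purpose can be illustrated with a small Python sketch that computes the estimated standard error of a sample mean, $s/\sqrt{n}$. The function name `standard_error_of_mean` and the sample values are hypothetical, and the formula assumes simple random sampling from a large population.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)   # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(n)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample values
print(statistics.mean(sample), standard_error_of_mean(sample))
```

A smaller standard error indicates that the sample mean is a more reliable estimate of the population mean.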

Advantages of sampling over Complete Enumeration

  1. It is much cheaper to collect the required information from a sample than by complete enumeration, as fewer units are studied in the sample than in the population.
  2. Data can be collected more quickly from a sample, which saves a great deal of time.
  3. Planning for sample surveys can be done more carefully and easily as compared to complete enumeration.
  4. Sampling is the only available method of collecting the required information when examining a unit of the population destroys it (destructive testing).
  5. Sampling is the only available method of collecting the required information when the population is infinite or very large.
  6. The most important advantage of sampling is that it provides a measure of the reliability of the estimates.
  7. Sampling is extensively used to obtain some of the census information.

This is all about Sampling Basics.
