Nature of Heteroscedasticity (2020)

Let us start with the nature of heteroscedasticity.

The assumption of homoscedasticity (equal spread, equal variance) is

$$E(u_i^2)=E(u_i^2|X_{2i},X_{3i},\cdots, X_{ki})=\sigma^2,\quad i=1,2,\cdots, n$$

The above Figure shows that the conditional variance of $Y_i$ (which is equal to that of $u_i$), conditional upon the given $X_i$, remains the same regardless of the values taken by the variable $X$.

For heteroscedastic data, in contrast, the conditional variance of $Y_i$ increases as $X$ increases. The variance of $Y_i$ is not the same for all observations, so there is heteroscedasticity:

$$E(u_i^2)=E(u_i^2|X_{2i},X_{3i},\cdots, X_{ki})=\sigma_i^2$$
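The contrast between the two assumptions can be illustrated with a short simulation (a sketch with made-up values, not from the original post): one set of errors has constant spread, while the other has a spread that grows with $X$.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(1, 10, n)

# Homoscedastic errors: Var(u_i) = sigma^2 = 1 for every observation
u_homo = rng.normal(0, 1.0, n)

# Heteroscedastic errors: the spread grows with X, Var(u_i) = (0.5 * x_i)^2
u_hetero = rng.normal(0, 0.5 * x)

# Compare the sample variance of the errors for small versus large X
small, large = x < 5, x >= 5
print(u_homo[small].var(), u_homo[large].var())      # roughly equal
print(u_hetero[small].var(), u_hetero[large].var())  # much larger for large X
```

Plotting `u_hetero` against `x` would reproduce the fan-shaped scatter that the figures above describe.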

The nature of heteroscedasticity refers to the violation of the assumption of homoscedasticity in linear regression models. In the case of heteroscedasticity, the errors have unequal variances for different levels of the regressors; the OLS estimators of the regression coefficients remain unbiased but are no longer efficient, and the usual standard errors are invalid. There are several reasons why the variances of $u_i$ may vary:

  • Following the error-learning models, as people learn, their errors of behavior become smaller over time. In such cases, $\sigma_i^2$ is expected to decrease.
  • As income grows, people have more discretionary income (income remaining after deduction of taxes) and hence more scope for choice in how they dispose of it. Similarly, companies with larger profits are generally expected to show greater variability in their dividend policies than companies with lower profits.
  • As data-collecting techniques improve, $\sigma_i^2$ is likely to decrease. For example, banks with sophisticated data-processing equipment are likely to commit fewer errors in the monthly or quarterly statements of their customers than banks without such equipment.
  • Heteroscedasticity can also arise as a result of the presence of outliers. The inclusion or exclusion of such an observation, especially if the sample size is small, can substantially alter the results of the regression analysis.
  • The omission of relevant variables can also result in heteroscedasticity: the effect of an omitted variable is absorbed into the error term, which may then vary systematically with the regressors.
  • Heteroscedasticity may arise from the violation of the assumption of the CLRM that the model is correctly specified.
  • Skewness in the distribution of one or more regressors is another source of heteroscedasticity. For example, the distribution of income is typically uneven, with the bulk of income concentrated among relatively few individuals.
  • Incorrect data transformation (e.g., ratio or first-difference transformations) and incorrect functional form (e.g., linear vs. log-linear) are also sources of heteroscedasticity.
  • The problem of heteroscedasticity is likely to be more common in cross-sectional data than in time-series data.

Introduction to Heteroscedasticity (2020)

This post is a general discussion of and introduction to heteroscedasticity.

Introduction to Heteroscedasticity and Homoscedasticity

The term heteroscedasticity refers to the violation of the assumption of homoscedasticity in linear regression models (LRM). In the case of heteroscedasticity, the errors have unequal variances for different levels of the regressors; the OLS estimators of the regression coefficients remain unbiased but are inefficient, and the usual standard errors are invalid. The disturbances $u_i$ in the Classical Linear Regression Model (CLRM) appearing in the population regression function should be homoscedastic; that is, they should all have the same variance.

In short, the Greek prefix hetero means different (or unequal), and skedasis means spread (or scatter). Homoscedasticity therefore means equal spread, and heteroscedasticity means unequal spread.

Effect on the Var-Cov Matrix of the Error Terms:
The Var-Cov matrix of errors is

$$E(uu') = Var(u) = \begin{pmatrix}
\sigma^2 & 0 & \cdots & 0\\
0 & \sigma^2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}=\sigma^2 I_n,$$

where $I_n$ is an $n\times n$ identity matrix.

In the presence of heteroscedasticity, the Var-Cov matrix of the errors is no longer a scalar multiple of the identity matrix; its diagonal elements differ across observations.

$$E(uu') = Var(u) = \begin{pmatrix}
\sigma_1^2 & 0 & 0 & \cdots & 0\\
0 & \sigma_2^2 & 0 & \cdots & 0\\
0 & 0 & \sigma_3^2 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & \sigma_n^2
\end{pmatrix} = \Omega$$

The Var-Cov matrix of the OLS estimators $\hat{\beta}$ is

\begin{align*}
Cov(\hat{\beta}) &= E\left[(\hat{\beta}-\beta)(\hat{\beta}-\beta)' \right]\\
&=E\left[[(X'X)^{-1}X'u][(X'X)^{-1}X'u]' \right]\\
&=E\left[(X'X)^{-1}X'uu'X(X'X)^{-1} \right]\\
&=(X'X)^{-1}X'E(uu')X(X'X)^{-1}\\
&=(X'X)^{-1}X'\Omega X (X'X)^{-1}
\end{align*}
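The sandwich form in the last line can be checked numerically. The following NumPy sketch (illustrative values, not from the original post) builds a diagonal $\Omega$ of unequal variances and computes $(X'X)^{-1}X'\Omega X(X'X)^{-1}$, comparing it with what the usual formula would report under an assumed common variance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# Unequal error variances sigma_i^2 on the diagonal of Omega
sigma2 = 1.0 + np.arange(n) / n
Omega = np.diag(sigma2)

XtX_inv = np.linalg.inv(X.T @ X)

# Var-Cov of beta-hat under heteroscedasticity (the sandwich form above):
# (X'X)^{-1} X' Omega X (X'X)^{-1}
cov_sandwich = XtX_inv @ X.T @ Omega @ X @ XtX_inv

# What the usual OLS formula reports if it assumes a common variance:
cov_naive = sigma2.mean() * XtX_inv

print(np.diag(cov_sandwich))
print(np.diag(cov_naive))
```

When $\Omega = \sigma^2 I_n$, the sandwich collapses back to the familiar $\sigma^2 (X'X)^{-1}$.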

When we are concerned with heteroscedasticity, the main questions are how to detect it and how to correct for it.

That's all for this basic introduction to heteroscedasticity.


Coding Time Variable (2020)

Coding Time Variable by Taking Origin at the Beginning

Suppose we have time-series data for the years 1990, 1991, 1992, 1993, and 1994.

We can take the origin of a time series at the beginning and assign $x = 0$ to the first period and $1, 2, 3, \ldots$ to the later periods. The codes for the years are:

| Year ($t$) | $x$ |
|---|---|
| 1990 | 0 |
| 1991 | 1 |
| 1992 | 2 |
| 1993 | 3 |
| 1994 | 4 |
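This coding can be sketched in a few lines of Python (the helper name is hypothetical):

```python
# Hypothetical helper: code years with the origin at the first period
def code_from_origin(years):
    first = years[0]
    return [t - first for t in years]

print(code_from_origin([1990, 1991, 1992, 1993, 1994]))  # [0, 1, 2, 3, 4]
```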

Coding Time Variable by Taking Middle Years as Zero

To simplify the trend calculations, the time variable $t$ (the year variable) is coded by taking deviations $t-\overline{t}$, where $\overline{t}$ is the midpoint computed as $\overline{t}=\frac{First\, Period + Last\, Period}{2}$. Taking $x=t-\overline{t}$, we get
$$\sum x = \sum x^3 = \sum x^5 = \cdots = 0$$

There are two cases when coding a Time Variable (when taking zero in the Middle):

  • When there is an odd number of years:
    For an odd number of years (as in the period 1990 to 1994), $\overline{t}$ is the middle year: $\overline{t} = (1990+1994)/2=1992$. The code for year $t$ is $x=t-\overline{t}$. For $t=1992$, we have $x=1992-1992=0$, so the coded value is zero at $\overline{t}$. After taking $x=0$ at the middle of an odd number of years, we assign $-1, -2, \ldots$ to the years before the middle year and $1, 2, \ldots$ to the years after it.

    | Year ($t$) | $x=t-\overline{t}$ |
    |---|---|
    | 1990 | $-2$ |
    | 1991 | $-1$ |
    | 1992 | $0$ |
    | 1993 | $1$ |
    | 1994 | $2$ |
  • When there is an even number of years:
    Suppose we have time-series data for the years 1990, 1991, 1992, 1993, 1994, and 1995. The midpoint is $\overline{t} = (1990+1995)/2 = 1992.5$, so $x=0$ falls halfway between the years 1992 and 1993. For $t=1992$, we have $x=t-\overline{t}=1992-1992.5=-0.5$. Thus, coding the middle of an even number of years as $x=0$, we assign $-0.5, -1.5, -2.5, \ldots$ to the years before the midpoint and $0.5, 1.5, 2.5, \ldots$ to the years after it, as shown below.

    | Year ($t$) | $x=t-\overline{t}$ | $x=\frac{t-\overline{t}}{1/2}$ |
    |---|---|---|
    | 1990 | $-2.5$ | $-5$ |
    | 1991 | $-1.5$ | $-3$ |
    | 1992 | $-0.5$ | $-1$ |
    | 1993 | $0.5$ | $1$ |
    | 1994 | $1.5$ | $3$ |
    | 1995 | $2.5$ | $5$ |

To avoid decimals in the coded year, we can take the unit of measurement as $\frac{1}{2}$ year. Therefore, after coding $x=0$ in the middle of an even number of years, we assign $-1, -3, -5, \ldots$ to the years before the midpoint and $1, 3, 5, \ldots$ to the years after it, as shown above.
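Both deviation-coding cases above (odd and even numbers of years), along with the vanishing odd-power sums, can be verified with a short Python sketch (the function name is hypothetical):

```python
def code_deviation(years):
    """Code years as deviations x = t - t_bar, where
    t_bar = (first period + last period) / 2.
    For an even number of years, measure in half-year units
    (i.e., divide by 1/2) so the codes are whole numbers."""
    t_bar = (years[0] + years[-1]) / 2
    x = [t - t_bar for t in years]
    if len(years) % 2 == 0:
        x = [int(2 * v) for v in x]   # -5, -3, -1, 1, 3, 5, ...
    else:
        x = [int(v) for v in x]       # -2, -1, 0, 1, 2, ...
    return x

odd = code_deviation([1990, 1991, 1992, 1993, 1994])
even = code_deviation([1990, 1991, 1992, 1993, 1994, 1995])
print(odd)   # [-2, -1, 0, 1, 2]
print(even)  # [-5, -3, -1, 1, 3, 5]

# Sums of odd powers vanish, which is what simplifies trend calculations
print(sum(odd), sum(v**3 for v in odd))    # 0 0
print(sum(even), sum(v**3 for v in even))  # 0 0
```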

Read more about Coding Time Variables in R
