The Breusch-Pagan Test (Numerical Example)

To perform the Breusch-Pagan test for detecting heteroscedasticity, use the data in the file Table_11.3.

Step 1:

The estimated regression is $\hat{Y}_i = 9.2903 + 0.6378X_i$

Step 2:

The residuals obtained from this regression are:

| $\hat{u}_i$ | $\hat{u}_i^2$ | $p_i$ |
|---|---|---|
| -5.31307 | 28.22873 | 0.358665 |
| -8.06876 | 65.10494 | 0.827201 |
| 6.49801 | 42.22407 | 0.536485 |
| 0.55339 | 0.30624 | 0.003891 |
| -6.82445 | 46.57318 | 0.591743 |
| 1.36447 | 1.86177 | 0.023655 |
| 5.79770 | 33.61333 | 0.427079 |
| -3.58015 | 12.81744 | 0.162854 |
| 0.98662 | 0.97342 | 0.012368 |
| 8.30908 | 69.04085 | 0.877209 |
| -2.25769 | 5.09715 | 0.064763 |
| -1.33584 | 1.78446 | 0.022673 |
| 8.04201 | 64.67391 | 0.821724 |
| 10.47524 | 109.73066 | 1.3942 |
| 6.23093 | 38.82451 | 0.493291 |
| -9.09153 | 82.65588 | 1.050197 |
| -12.79183 | 163.63099 | 2.079039 |
| -16.84722 | 283.82879 | 3.606231 |
| -17.35860 | 301.32104 | 3.828481 |
| 2.71955 | 7.39595 | 0.09397 |
| 2.39709 | 5.74604 | 0.073007 |
| 0.77494 | 0.60052 | 0.00763 |
| 9.45248 | 89.34930 | 1.135241 |
| 4.88571 | 23.87014 | 0.303286 |
| 4.53063 | 20.52658 | 0.260804 |
| -0.03614 | 0.00131 | 1.66E-05 |
| -0.30322 | 0.09194 | 0.001168 |
| 9.50786 | 90.39944 | 1.148584 |
| -18.98076 | 360.26909 | 4.577455 |
| 20.26355 | 410.61159 | 5.217089 |

The estimated error variance is $\tilde{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n} = \frac{2361.15325}{30} = 78.7051$.

Compute a new variable $p_i = \frac{\hat{u}_i^2}{\tilde{\sigma}^2}$ (the third column of the table above).
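The arithmetic of Step 2 can be checked with a few lines of Python. This is only a sketch: the residuals are copied from the table above (rounded to five decimals), and the variable names are illustrative.

```python
# Residuals from the Step 1 regression, as listed in the table (rounded).
residuals = [
    -5.31307, -8.06876, 6.49801, 0.55339, -6.82445, 1.36447,
    5.79770, -3.58015, 0.98662, 8.30908, -2.25769, -1.33584,
    8.04201, 10.47524, 6.23093, -9.09153, -12.79183, -16.84722,
    -17.35860, 2.71955, 2.39709, 0.77494, 9.45248, 4.88571,
    4.53063, -0.03614, -0.30322, 9.50786, -18.98076, 20.26355,
]

n = len(residuals)
squared = [u ** 2 for u in residuals]

# Variance estimate used by the test: divide by n, not by n - k.
sigma2_tilde = sum(squared) / n

# Scaled squared residuals p_i = u_i^2 / sigma2_tilde.
p = [u2 / sigma2_tilde for u2 in squared]

print(f"n = {n}, sigma2_tilde = {sigma2_tilde:.4f}")
print(f"first p_i = {p[0]:.6f}, mean of p_i = {sum(p) / n:.6f}")
```

Because $\tilde{\sigma}^2$ divides by $n$, the $p_i$ always average to one, which is a quick sanity check on the column.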

Step 3:

Assume that $p_i$ is linearly related to $X_i (= Z_{2i})$ and run the regression $p_i=\alpha_1+\alpha_2Z_{2i}+v_i$.

The regression results are: $\hat{p}_i=-0.74261 + 0.010063X_i$

Step 4:

Obtain the Explained Sum of Squares (ESS) = 10.42802.
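For reference, the explained sum of squares is the variation of the fitted values from the Step 3 regression around the mean of $p_i$. Since $\sum p_i = \sum \hat{u}_i^2 / \tilde{\sigma}^2 = n$, the mean $\bar{p}$ equals 1 by construction, so the definition simplifies as follows:

\begin{align}
ESS &= \sum_{i=1}^{n} (\hat{p}_i - \bar{p})^2\\
&= \sum_{i=1}^{n} (\hat{p}_i - 1)^2
\end{align}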

Step 5:

Compute: $\Theta = \frac{1}{2} ESS = \frac{10.42802}{2}= 5.2140$.

Under the null hypothesis of homoscedasticity, the Breusch-Pagan statistic $\Theta$ asymptotically follows a chi-square distribution. The $\chi^2_{tab}$ value at a 5% level of significance with $k-1 = 1$ degree of freedom (where $k = 2$ is the number of parameters in the Step 3 regression) is 3.8414. Since $\chi_{cal}^2 = \Theta = 5.2140$ is greater than $\chi_{tab}^2$, the result is statistically significant: there is evidence of heteroscedasticity at a 5% level of significance.
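Step 5 and the decision rule can be sketched in Python using only the standard library. The ESS and the tabulated critical value are taken from the text. The p-value line is an addition that relies on the fact that a chi-square variable with one degree of freedom is a squared standard normal, so its upper-tail probability is `erfc(sqrt(x / 2))`; it would not apply to other degrees of freedom.

```python
import math

ess = 10.42802          # explained sum of squares from Step 4
theta = ess / 2         # Breusch-Pagan statistic

chi2_tab = 3.8414       # 5% critical value with 1 degree of freedom (from the text)

# Upper-tail probability of a chi-square(1) variable at theta.
p_value = math.erfc(math.sqrt(theta / 2))

reject_homoscedasticity = theta > chi2_tab

print(f"theta = {theta:.4f}, p-value = {p_value:.4f}")
print("Evidence of heteroscedasticity" if reject_homoscedasticity
      else "No evidence of heteroscedasticity")
```

Comparing the statistic with the critical value and comparing the p-value with 0.05 lead to the same decision.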

Sampling Basics and Objectives (2021)

In this article, we will discuss the basics of sampling. It is often necessary to collect information about a population. Two methods are used for collecting the required information:

  • Complete information
  • Sampling

Complete Information

This method collects the required information from every individual in the population. It is used when it is difficult to draw a conclusion (inference) about the population from sample information. It is costly and time-consuming. This way of getting data is also called Complete Enumeration or Population Census.

Sampling Basics

What is Sampling?

Sampling is the most common and widely used method of collecting information. Instead of studying the whole population, only a small part of the population is selected and studied, and the result is applied to the whole population. For example, a cotton dealer picks a small quantity of cotton from different bales to judge the quality of the cotton.

Purpose or objective of sampling

Two basic purposes of sampling are

  1. To obtain the maximum information about the population without examining every unit of the population.
  2. To find the reliability of the estimates derived from the sample, which can be done by computing the standard error of the statistic.
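As an illustration of the second purpose, the standard error of the sample mean can be computed as $s/\sqrt{n}$. The sketch below assumes a simple random sample from a large population (no finite population correction); the data values are hypothetical and only serve to show the calculation.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(n)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
sample_mean = statistics.mean(sample)
se = standard_error_of_mean(sample)

print(f"mean = {sample_mean}, standard error = {se:.4f}")
```

A smaller standard error indicates that the sample mean is a more reliable estimate of the population mean.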

Advantages of sampling over Complete Enumeration

  1. Sampling is much cheaper than complete enumeration because fewer units are studied.
  2. Data can be collected from a sample more quickly, which saves a great deal of time.
  3. A sample survey can be planned more carefully and easily than a complete enumeration.
  4. Sampling is the only available method of collecting the required information when examining a unit destroys it (destructive testing).
  5. Sampling is the only available method of collecting the required information when the population is infinite or very large.
  6. The most important advantage of sampling is that the reliability of the estimates can be assessed.
  7. Sampling is extensively used to obtain some of the census information.

This is all about Sampling Basics.

https://itfeature.com

The Spearman Rank Correlation Test (Numerical Example)

Consider the following data to illustrate the detection of heteroscedasticity using the Spearman rank correlation test. The data file is available to download.

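| $Y$ | $X_2$ | $X_3$ |
|---|---|---|
| 11 | 20 | 8.1 |
| 16 | 18 | 8.4 |
| 11 | 22 | 8.5 |
| 14 | 21 | 8.5 |
| 13 | 27 | 8.8 |
| 17 | 26 | 9 |
| 14 | 25 | 8.9 |
| 15 | 27 | 9.4 |
| 12 | 30 | 9.5 |
| 18 | 28 | 9.5 |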

The estimated multiple linear regression model is:

$$\hat{Y}_i = -34.936 -0.75X_{2i} + 7.611X_{3i}$$
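The fit can be reproduced without external libraries by solving the normal equations $(X'X)b = X'Y$. The sketch below is one way to do it, not necessarily how the original estimates were obtained; the helper names `solve_linear_system` and `fit_ols` are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(y, *regressors):
    """Return OLS coefficients (intercept first) from the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*regressors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


y = [11, 16, 11, 14, 13, 17, 14, 15, 12, 18]
x2 = [20, 18, 22, 21, 27, 26, 25, 27, 30, 28]
x3 = [8.1, 8.4, 8.5, 8.5, 8.8, 9.0, 8.9, 9.4, 9.5, 9.5]

b1, b2, b3 = fit_ols(y, x2, x3)
fitted = [b1 + b2 * v2 + b3 * v3 for v2, v3 in zip(x2, x3)]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(f"intercept = {b1:.3f}, slope X2 = {b2:.3f}, slope X3 = {b3:.3f}")
```

With an intercept in the model, the residuals sum to zero, which gives a simple check on the computation.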

The data together with the residuals are:

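| $Y$ | $X_2$ | $X_3$ | Residuals |
|---|---|---|---|
| 11 | 20 | 8.1 | -0.63302 |
| 16 | 18 | 8.4 | 0.575564 |
| 11 | 22 | 8.5 | -2.16954 |
| 14 | 21 | 8.5 | 0.076455 |
| 13 | 27 | 8.8 | 1.317102 |
| 17 | 26 | 9 | 3.040825 |
| 14 | 25 | 8.9 | 0.047951 |
| 15 | 27 | 9.4 | -1.2497 |
| 12 | 30 | 9.5 | -2.74881 |
| 18 | 28 | 9.5 | 1.743171 |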
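We need to rank the absolute values of the residuals, $|u_i|$, and the values of $X_2$, the explanatory variable suspected of causing the heteroscedasticity. Tied values receive the average of the ranks they occupy, and $d$ is the difference between the two ranks.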

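| $Y$ | $X_2$ | $X_3$ | Residuals | Rank of $|u_i|$ | Rank of $X_2$ | $d$ | $d^2$ |
|---|---|---|---|---|---|---|---|
| 11 | 20 | 8.1 | -0.633 | 4 | 2 | 2 | 4 |
| 16 | 18 | 8.4 | 0.576 | 3 | 1 | 2 | 4 |
| 11 | 22 | 8.5 | -2.170 | 8 | 4 | 4 | 16 |
| 14 | 21 | 8.5 | 0.076 | 2 | 3 | -1 | 1 |
| 13 | 27 | 8.8 | 1.317 | 6 | 7.5 | -1.5 | 2.25 |
| 17 | 26 | 9 | 3.041 | 10 | 6 | 4 | 16 |
| 14 | 25 | 8.9 | 0.048 | 1 | 5 | -4 | 16 |
| 15 | 27 | 9.4 | -1.250 | 5 | 7.5 | -2.5 | 6.25 |
| 12 | 30 | 9.5 | -2.749 | 9 | 10 | -1 | 1 |
| 18 | 28 | 9.5 | 1.743 | 7 | 9 | -2 | 4 |
| Total | | | | | | 0 | 70.5 |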
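The ranking step, including the averaged rank of 7.5 for the two tied values of $X_2$, can be sketched as follows. The function name `average_ranks` is illustrative; the residuals and $X_2$ values are taken from the tables above.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks


residuals = [-0.63302, 0.575564, -2.16954, 0.076455, 1.317102,
             3.040825, 0.047951, -1.2497, -2.74881, 1.743171]
x2 = [20, 18, 22, 21, 27, 26, 25, 27, 30, 28]

rank_abs_u = average_ranks([abs(u) for u in residuals])
rank_x2 = average_ranks(x2)

d = [ru - rx for ru, rx in zip(rank_abs_u, rank_x2)]
sum_d2 = sum(di ** 2 for di in d)

print("ranks of |u|:", rank_abs_u)
print("ranks of X2: ", rank_x2)
print("sum of d^2:  ", sum_d2)
```

The differences $d$ always sum to zero, because both sets of ranks have the same total.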

Calculating the Spearman Rank correlation

\begin{align}
r_s&=1-\frac{6\sum d^2}{n(n^2-1)}\\
&=1-\frac{6\times 70.5}{10(100-1)}=0.5727
\end{align}

Let us test the statistical significance of $r_s$ using the t-test with $n-2$ degrees of freedom:

\begin{align}
t&=\frac{r_s \sqrt{n-2}}{\sqrt{1-r_s^2}}\\
&=\frac{0.5727\sqrt{8}}{\sqrt{1-(0.5727)^2}}=1.976
\end{align}

The value of $t$ from the table at a 5% level of significance at 8 degrees of freedom is 2.306.

Since $t_{cal} < t_{tab}$, there is no evidence of a systematic relationship between the explanatory variable $X_2$ and the absolute values of the residuals ($|u_i|$), and hence there is no evidence of heteroscedasticity.
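The coefficient, the test statistic, and the decision can be verified with a short Python sketch. The inputs ($n = 10$, $\sum d^2 = 70.5$, and the tabulated value 2.306) come from the example above; the variable names are illustrative.

```python
import math

n = 10          # number of observations
sum_d2 = 70.5   # sum of squared rank differences
t_tab = 2.306   # 5% two-sided critical value with n - 2 = 8 degrees of freedom

# Spearman rank correlation coefficient.
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# t statistic for testing the significance of r_s.
t_cal = r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s ** 2)

evidence_of_heteroscedasticity = abs(t_cal) > t_tab

print(f"r_s = {r_s:.4f}, t = {t_cal:.3f}")
print("Evidence of heteroscedasticity" if evidence_of_heteroscedasticity
      else "No evidence of heteroscedasticity")
```

The same lines can be reused for $X_3$ by replacing `sum_d2` with the value obtained from the ranks of $X_3$.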

Because the model has more than one regressor (the example comes from a multiple regression model), Spearman’s rank correlation test should be repeated for each of the explanatory variables.

As an assignment, compute the Spearman rank correlation between $|u_i|$ and $X_3$ for the data above, and test the statistical significance of the coefficient in the same manner to look for evidence of heteroscedasticity.
