Skewness and Measures of Skewness

If a curve is symmetrical, every deviation below the mean is matched by a corresponding deviation above the mean; this property is called symmetry. Here, we will discuss skewness and the common measures of skewness.

Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. A distribution has positive skewness when the tail on the right side is longer or fatter; in this case, the mean and median are greater than the mode. It has negative skewness when the tail on the left side is longer or fatter than the tail on the right side; the mean and median are then less than the mode.
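To see this relationship in practice, here is a minimal sketch (assuming only NumPy; the exponential distribution is used purely as an illustrative right-skewed example) that draws a right-skewed sample and confirms that the mean exceeds the median:

```python
import numpy as np

rng = np.random.default_rng(42)
# Exponential data have a long right tail, i.e., positive skewness.
sample = rng.exponential(scale=2.0, size=10_000)

# For a right-skewed sample, the mean is pulled toward the long tail,
# so it exceeds the median.
print(f"mean   = {sample.mean():.3f}")      # approximately 2.0
print(f"median = {np.median(sample):.3f}")  # approximately 2*ln(2) = 1.386
```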


Measures of Skewness

Karl Pearson's Measure of Relative Skewness

In a symmetrical distribution, the mean, median, and mode coincide. In skewed distributions, these values are pulled apart; the mean tends to be on the same side of the mode as the longer tail. Thus, a measure of the asymmetry is supplied by the difference $(\text{mean} - \text{mode})$. This can be made dimensionless by dividing by a measure of dispersion, such as the standard deviation (SD).

The Karl Pearson measure of relative skewness is
$$\text{SK} = \frac{\text{mean}-\text{mode}}{\text{SD}} =\frac{\overline{x}-\text{mode}}{s}$$
The value of skewness may be either positive or negative.

The empirical formula for skewness (called the second coefficient of skewness) is

$$\text{SK} = \frac{3(\text{mean}-\text{median})}{\text{SD}}=\frac{3(\overline{x}-\text{median})}{s}$$
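Both Pearson coefficients are straightforward to compute. The sketch below is a minimal illustration assuming NumPy; the mode is taken as the most frequent value, which is only meaningful for discrete or grouped data, so the median-based second coefficient is usually preferred for continuous data:

```python
import numpy as np

def pearson_skewness(x):
    """Karl Pearson's first (mode-based) and second (median-based) coefficients."""
    x = np.asarray(x, dtype=float)
    s = x.std(ddof=1)  # sample standard deviation
    # Most frequent value as the mode (sensible only for discrete/grouped data).
    values, counts = np.unique(x, return_counts=True)
    mode = values[np.argmax(counts)]
    sk1 = (x.mean() - mode) / s              # (mean - mode) / SD
    sk2 = 3 * (x.mean() - np.median(x)) / s  # 3(mean - median) / SD
    return sk1, sk2

# Example: a small right-skewed discrete sample (mean 3.8 > median 3 > mode 2).
sk1, sk2 = pearson_skewness([1, 2, 2, 2, 3, 3, 4, 5, 7, 9])
print(sk1, sk2)  # both positive, indicating positive skewness
```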

Bowley's Measure of Skewness

In a symmetrical distribution, the quartiles are equidistant from the median ($Q_2-Q_1 = Q_3-Q_2$). If the distribution is not symmetrical, the quartiles will not be equidistant from the median (unless the entire asymmetry is located in the extreme quarters of the data). Bowley's suggested measure of skewness is

$$\text{Quartile Coefficient of SK} = \frac{(Q_3-Q_2)-(Q_2-Q_1)}{Q_3-Q_1}=\frac{Q_3-2Q_2+Q_1}{Q_3-Q_1}$$

This measure is always zero when the quartiles are equidistant from the median and is positive when the upper quartile is farther from the median than the lower quartile. This measure of skewness varies between $+1$ and $-1$.
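A minimal sketch of Bowley's coefficient (assuming NumPy; `np.percentile` is used to obtain the quartiles):

```python
import numpy as np

def bowley_skewness(x):
    """Quartile coefficient of skewness: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return (q3 - 2 * q2 + q1) / (q3 - q1)

print(bowley_skewness([1, 2, 2, 2, 3, 3, 4, 5, 7, 9]))  # positive for a right skew
```

Because it uses only the quartiles, this measure is insensitive to the extreme tails of the data.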

Moment Coefficient of Skewness

For any symmetrical curve, the sum of the odd powers of deviations from the mean equals zero; that is, $m_3=m_5=m_7=\cdots=0$. This does not hold for asymmetrical distributions, so a measure of skewness can be devised based on $m_3$. That is,

\begin{align}
\text{Moment Coefficient of SK} &= a_3=\frac{m_3}{s^3}=\frac{m_3}{\sqrt{m_2^3}}\\
b_1 &= a_3^2 = \frac{m_3^2}{m_2^3}
\end{align}

For perfectly symmetrical curves, such as the normal curve, $a_3$ and $b_1$ are zero.
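A minimal sketch of the moment coefficient (assuming NumPy; the central moments $m_2$ and $m_3$ are computed directly from their definitions):

```python
import numpy as np

def moment_skewness(x):
    """a3 = m3 / m2^(3/2) and b1 = a3^2, from the central sample moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d**2)  # second central moment
    m3 = np.mean(d**3)  # third central moment
    a3 = m3 / m2**1.5
    return a3, a3**2    # (a3, b1)

a3, b1 = moment_skewness([1, 2, 2, 2, 3, 3, 4, 5, 7, 9])
print(a3, b1)  # a3 > 0 for this right-skewed sample; b1 = a3**2
```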


Real-Life Examples of Skewness

  1. Income Distribution: Income distribution in most countries is right-skewed. A large number of people earn relatively low incomes, while a smaller number earn significantly higher incomes, creating a long tail on the right side of the distribution.
  2. Insurance Claims: Insurance claim amounts are typically right-skewed. Most claims are for smaller amounts, but there are a few very large claims that create a long tail on the right.
  3. Age at Retirement: The age at which people retire is often right-skewed. Most people retire around a certain age, but some continue to work much later in life, creating a long tail on the right.
  4. Test Scores: In some educational settings, test scores can be left-skewed if the test is very easy, with most students scoring high and a few scoring much lower, creating a long tail on the left.
  5. Hospital Stay Duration: The length of hospital stays is often right-skewed. Most patients stay for a short period, but some patients with severe conditions stay much longer, creating a long tail on the right.
  6. House Prices: In many housing markets, the distribution of house prices is right-skewed. There are many houses priced within a certain range, but a few very expensive houses create a long tail on the right.
  7. Web Traffic: The number of visitors to different websites can be highly right-skewed. A few popular sites get a huge number of visitors, while the majority of sites get much less traffic.
  8. Customer Spending: In retail, customer spending can be right-skewed. Most customers spend a small amount, but a few spend a lot, creating a long tail on the right.
  9. Product Lifespans: The lifespan of certain products can be right-skewed. Most products last for a certain period, but a few last much longer, creating a long tail on the right.
  10. Natural Disasters: The severity of natural disasters, such as earthquakes or hurricanes, can be right-skewed. Most events are of low to moderate severity, but a few are extremely severe, creating a long tail on the right.

FAQs about Skewness

  1. What is skewness?
  2. If a curve is symmetrical, what is the behavior of deviations below and above the mean?
  3. What is Bowley’s Measure of Skewness?
  4. What is Karl Pearson’s Measure of Relative Skewness?
  5. What is the moment coefficient of skewness?
  6. What are positive and negative skewness?

Key Points of Heteroscedasticity (2021)

The following are some key points about heteroscedasticity, covering its definition, examples, properties, assumptions, and tests for its detection (abbreviated as hetero below).

One important assumption of regression is that the variance of the error term is constant across observations. If the errors have constant variance, they are called homoscedastic; otherwise, they are heteroscedastic. In the case of heteroscedastic errors (non-constant variance), the standard estimation methods become inefficient. Typically, residual plots are used to assess the assumption of homoscedasticity.

Heteroscedasticity

  • The disturbance term $u_i$ of an OLS regression should be homoscedastic. By homo we mean equal, and by scedastic we mean spread or scatter.
  • By hetero, we mean unequal.
  • Heteroscedasticity means that the conditional variance of $Y_i$ (i.e., $var(u_i)$), conditional upon the given $X_i$, does not remain the same regardless of the values taken by the variable $X$.
  • In the case of heteroscedasticity, $E(u_i^2)=\sigma_i^2=var(u_i)$, where $i=1,2,\cdots, n$.
  • In the case of homoscedasticity, $E(u_i^2)=\sigma^2=var(u_i)$, where $i=1,2,\cdots, n$.
  • Homoscedasticity means that the conditional variance of $Y_i$ (i.e., $var(u_i)$), conditional upon the given $X_i$, remains the same regardless of the values taken by the variable $X$.
  • The error terms are heteroscedastic when the scatter of the errors differs depending on the values of one or more of the explanatory variables.
  • Heteroscedasticity is a systematic change in the spread of the residuals over the range of measured values.
  • Heteroscedasticity may arise due to (i) the presence of outliers in the data, (ii) an incorrect functional form of the regression model, (iii) an incorrect transformation of the data, and (iv) mixing observations with different measures of scale.
  • The presence of hetero does not destroy the unbiasedness and consistency of OLS estimators.
  • Hetero is more common in cross-section data than time-series data.
  • Hetero may affect the variance and standard errors of the OLS estimates.
  • The standard errors of OLS estimates are biased in the case of hetero.
  • Statistical inferences (confidence intervals and hypothesis testing) of estimated regression coefficients are no longer valid.
  • The OLS estimators are no longer BLUE as they are no longer efficient in the presence of hetero.
  • The regression predictions are inefficient in the case of hetero.
  • The usual OLS method assigns equal weights to each observation.
  • In GLS the weight assigned to each observation is inversely proportional to $\sigma_i$.
  • In GLS a weighted sum of squares is minimized with weight $w_i=\frac{1}{\sigma_i^2}$.
  • In GLS, each squared residual is weighted by the inverse of $Var(u_i|X_i)$.
  • GLS estimates are BLUE.
  • Heteroscedasticity can be detected by plotting the estimated $\hat{u}_i^2$ against $\hat{Y}_i$.
  • When plotting $\hat{u}_i^2$ against $\hat{Y}_i$, if no systematic pattern exists, then there is no hetero (see the sketch after this list).
  • In the case of prior information about $\sigma_i^2$, one may use WLS.
  • If $\sigma_i^2$ is unknown, one may proceed with heteroscedastic corrected standard errors (that are also called robust standard errors).
  • Drawing inferences in the presence of hetero (or if hetero is suspected) may be very misleading.
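As a sketch of the last few points (assuming statsmodels and matplotlib; the data here are simulated purely for illustration), one can plot $\hat{u}_i^2$ against $\hat{Y}_i$ and compare the usual OLS standard errors with heteroscedasticity-robust ones:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 100)
y = 2 + 0.5 * x + rng.normal(0.0, 0.3 * x)  # error variance grows with x

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                   # usual (equal-weight) OLS
robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroscedasticity-robust SEs

print("OLS SEs:   ", ols.bse)
print("Robust SEs:", robust.bse)

# A fan-shaped pattern of squared residuals against fitted values
# suggests heteroscedasticity; no systematic pattern suggests none.
plt.scatter(ols.fittedvalues, ols.resid**2)
plt.xlabel(r"$\hat{Y}_i$")
plt.ylabel(r"$\hat{u}_i^2$")
plt.show()
```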

The Breusch-Pagan Test (Numerical Example)

To perform the Breusch-Pagan test for the detection of heteroscedasticity, use the data from the file Table_11.3.

Step 1:

The estimated regression is $\hat{Y}_i = 9.2903 + 0.6378X_i$

Step 2:

The residuals $\hat{u}_i$ obtained from this regression, together with their squares and the values $p_i$ (computed below), are:

| $\hat{u}_i$ | $\hat{u}_i^2$ | $p_i$ |
|------------:|--------------:|---------:|
| -5.31307 | 28.22873 | 0.358665 |
| -8.06876 | 65.10494 | 0.827201 |
| 6.49801 | 42.22407 | 0.536485 |
| 0.55339 | 0.30624 | 0.003891 |
| -6.82445 | 46.57318 | 0.591743 |
| 1.36447 | 1.86177 | 0.023655 |
| 5.79770 | 33.61333 | 0.427079 |
| -3.58015 | 12.81744 | 0.162854 |
| 0.98662 | 0.97342 | 0.012368 |
| 8.30908 | 69.04085 | 0.877209 |
| -2.25769 | 5.09715 | 0.064763 |
| -1.33584 | 1.78446 | 0.022673 |
| 8.04201 | 64.67391 | 0.821724 |
| 10.47524 | 109.73066 | 1.3942 |
| 6.23093 | 38.82451 | 0.493291 |
| -9.09153 | 82.65588 | 1.050197 |
| -12.79183 | 163.63099 | 2.079039 |
| -16.84722 | 283.82879 | 3.606231 |
| -17.35860 | 301.32104 | 3.828481 |
| 2.71955 | 7.39595 | 0.09397 |
| 2.39709 | 5.74604 | 0.073007 |
| 0.77494 | 0.60052 | 0.00763 |
| 9.45248 | 89.34930 | 1.135241 |
| 4.88571 | 23.87014 | 0.303286 |
| 4.53063 | 20.52658 | 0.260804 |
| -0.03614 | 0.00131 | 1.66E-05 |
| -0.30322 | 0.09194 | 0.001168 |
| 9.50786 | 90.39944 | 1.148584 |
| -18.98076 | 360.26909 | 4.577455 |
| 20.26355 | 410.61159 | 5.217089 |

The estimated $\tilde{\sigma}^2$ is $\frac{\sum \hat{u}_i^2}{n} = \frac{2361.15325}{30} = 78.7051$.

Compute a new variable $p_i = \frac{\hat{u}_i^2}{\tilde{\sigma}^2}$.

Step 3:

Assuming $p_i$ is linearly related to $X_i (= Z_i)$, run the regression $p_i=\alpha_1+\alpha_2 Z_{2i}+v_i$.

The regression results are: $\hat{p}_i=-0.74261 + 0.010063X_i$

Step 4:

Obtain the Explained Sum of Squares (ESS) = 10.42802.

Step 5:

Compute: $\Theta = \frac{1}{2} ESS = \frac{10.42802}{2}= 5.2140$.

The Breusch-Pagan statistic follows the Chi-Square distribution. The $\chi^2_{tab}$ value at the 5% level of significance with $k-1 = 1$ degree of freedom is 3.8414. Since $\chi_{cal}^2 = 5.2140$ is greater than $\chi_{tab}^2 = 3.8414$, the result is statistically significant: there is evidence of heteroscedasticity at the 5% level of significance.
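The five steps can be reproduced programmatically. The following minimal sketch assumes the $X$ and $Y$ columns of Table_11.3 have already been loaded into arrays `x` and `y`, and uses statsmodels for the two OLS fits:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def breusch_pagan(y, x):
    n = len(y)
    # Step 1: fit the primary regression and keep the residuals.
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    u2 = fit.resid ** 2
    # Step 2: sigma~^2 = sum(u_i^2)/n, then p_i = u_i^2 / sigma~^2.
    p = u2 / (u2.sum() / n)
    # Step 3: auxiliary regression of p_i on Z_i (= X_i here).
    aux = sm.OLS(p, sm.add_constant(x)).fit()
    # Step 4: explained sum of squares of the auxiliary regression.
    ess = aux.ess
    # Step 5: Theta = ESS / 2, referred to a chi-square with 1 df here.
    theta = ess / 2
    return theta, stats.chi2.sf(theta, df=1)  # statistic and p-value
```

statsmodels also ships a ready-made `statsmodels.stats.diagnostic.het_breuschpagan`, which computes the closely related $nR^2$ (Lagrange multiplier) form of the test, so its statistic need not match $\Theta$ exactly.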

See More about the Breusch-Pagan Test