Skewness and Measures of Skewness

If the curve is symmetrical, a deviation below the mean exactly equals the corresponding deviation above the mean. This is called symmetry. Here, we will discuss Skewness and Measures of Skewness.

Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. Positive skewness means that the tail on the right side of the distribution is longer or fatter than the tail on the left side; the mean and median are then usually greater than the mode. Negative skewness means that the tail on the left side of the distribution is longer or fatter than the tail on the right side.


Measures of Skewness

Karl Pearson Measures of Relative Skewness

In a symmetrical distribution, the mean, median, and mode coincide. In skewed distributions, these values are pulled apart; the mean tends to be on the same side of the mode as the longer tail. Thus, a measure of the asymmetry is supplied by the difference ($\text{mean} - \text{mode}$). This can be made dimensionless by dividing by a measure of dispersion (such as SD).

The Karl Pearson measure of relative skewness is
$$\text{SK} = \frac{\text{Mean}-\text{mode}}{SD} =\frac{\overline{x}-\text{mode}}{s}$$
The value of skewness may be either positive or negative.

The empirical formula for skewness (called the second coefficient of skewness) is

$$\text{SK} = \frac{3(\text{mean}-\text{median})}{SD}=\frac{3(\overline{x}-\text{median})}{s}$$
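Both Pearson coefficients can be computed directly from these definitions. The following is a minimal Python sketch, not a definitive implementation: the function name `pearson_skewness` and the sample values are hypothetical, $s$ is taken to be the sample standard deviation with an $n-1$ divisor, and the data are assumed to have a single clear mode.

```python
import statistics

def pearson_skewness(data):
    """Return Pearson's first (mode-based) and second (median-based) coefficients."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)   # assumption: the data have one clear mode
    s = statistics.stdev(data)     # assumption: sample SD with an n - 1 divisor
    first = (mean - mode) / s
    second = 3 * (mean - median) / s
    return first, second

# Hypothetical sample with a long right tail
sample = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]
sk1, sk2 = pearson_skewness(sample)
print(f"first coefficient  = {sk1:.3f}")
print(f"second coefficient = {sk2:.3f}")
```

For this sample the mean (5) lies above the median (4) and the mode (3), so both coefficients come out positive, which matches the longer right tail.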

Bowley Measures of Skewness

In a symmetrical distribution, the quartiles are equidistant from the median ($Q_2-Q_1 = Q_3-Q_2$). If the distribution is not symmetrical, the quartiles will not be equidistant from the median (unless the entire asymmetry is located in the extreme quarters of the data). The Bowley suggested measure of skewness is

$$\text{Quartile Coefficient of SK} = \frac{(Q_3-Q_2)-(Q_2-Q_1)}{Q_3-Q_1}=\frac{Q_3-2Q_2+Q_1}{Q_3-Q_1}$$

This measure is always zero when the quartiles are equidistant from the median and is positive when the upper quartile is farther from the median than the lower quartile. This measure of skewness varies between $+1$ and $-1$.
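The quartile coefficient is easy to compute once the three quartiles are known. The sketch below is one possible Python version under stated assumptions: the function name `bowley_skewness` is hypothetical, and the quartiles come from `statistics.quantiles` with the "inclusive" method. Other quartile conventions can give slightly different values for small samples.

```python
import statistics

def bowley_skewness(data):
    """Quartile coefficient of skewness: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    # assumption: "inclusive" quartiles; other conventions may differ slightly
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 - 2 * q2 + q1) / (q3 - q1)

# Hypothetical samples: one with a long right tail, one symmetrical
right_tailed = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]
symmetrical = [1, 2, 3, 4, 5]

print(bowley_skewness(right_tailed))  # positive: Q3 is farther from the median
print(bowley_skewness(symmetrical))   # zero: quartiles equidistant from the median
```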

Moment Coefficient of Skewness

In any symmetrical curve, the sum of odd powers of deviations from the mean will be equal to zero. That is, $m_3=m_5=m_7=\cdots=0$. However, it is not true for asymmetrical distributions. For this reason, a measure of skewness is devised based on $m_3$. That is

\begin{align}
\text{Moment Coefficient of SK} = a_3 &= \frac{m_3}{s^3}=\frac{m_3}{\sqrt{m_2^3}}\\
b_1 &= a_3^2=\frac{m_3^2}{m_2^3}
\end{align}

For perfectly symmetrical curves, such as the normal curve, $a_3$ and $b_1$ are zero.
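As a rough illustration, the moment coefficient can be computed from the second and third moments about the mean. This is a sketch only: the helper names `central_moment` and `moment_skewness` are hypothetical, and the moments use an $n$ divisor, so that $s^3=\sqrt{m_2^3}$ as in the formula above.

```python
def central_moment(data, r):
    """r-th moment about the mean, using an n divisor."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

def moment_skewness(data):
    """Return (a3, b1) where a3 = m3 / m2**1.5 and b1 = m3**2 / m2**3."""
    m2 = central_moment(data, 2)
    m3 = central_moment(data, 3)
    a3 = m3 / m2 ** 1.5
    b1 = m3 ** 2 / m2 ** 3
    return a3, b1

# Hypothetical samples: one with a long right tail, one symmetrical
print(moment_skewness([2, 3, 3, 3, 4, 4, 5, 6, 8, 12]))  # a3 > 0
print(moment_skewness([1, 2, 3, 4, 5]))                  # both values are zero
```

Note that $b_1$ is the square of $a_3$, so it measures the amount of skewness but loses the sign; $a_3$ is the one to use when the direction matters.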


Real-Life Examples of Skewness

  1. Income Distribution: Income distribution in most countries is right-skewed. A large number of people earn relatively low incomes, while a smaller number earn significantly higher incomes, creating a long tail on the right side of the distribution.
  2. Insurance Claims: Insurance claim amounts are typically right-skewed. Most claims are for smaller amounts, but there are a few very large claims that create a long tail on the right.
  3. Age at Retirement: The age at which people retire is often right-skewed. Most people retire around a certain age, but some continue to work much later in life, creating a long tail on the right.
  4. Test Scores: In some educational settings, test scores can be left-skewed if the test is very easy, with most students scoring high and a few scoring much lower, creating a long tail on the left.
  5. Hospital Stay Duration: The length of hospital stays is often right-skewed. Most patients stay for a short period, but some patients with severe conditions stay much longer, creating a long tail on the right.
  6. House Prices: In many housing markets, the distribution of house prices is right-skewed. There are many houses priced within a certain range, but a few very expensive houses create a long tail on the right.
  7. Web Traffic: The number of visitors to different websites can be highly right-skewed. A few popular sites get a huge number of visitors, while the majority of sites get much less traffic.
  8. Customer Spending: In retail, customer spending can be right-skewed. Most customers spend a small amount, but a few spend a lot, creating a long tail on the right.
  9. Product Lifespan: The lifespan of certain products can be right-skewed. Most products last for a certain period, but a few last much longer, creating a long tail on the right.
  10. Natural Disasters: The severity of natural disasters, such as earthquakes or hurricanes, can be right-skewed. Most events are of low to moderate severity, but a few are extremely severe, creating a long tail on the right.

FAQs about Skewness

  1. What is skewness?
  2. If a curve is symmetrical then what is the behavior of deviation below and above the mean?
  3. What is Bowley’s Measure of Skewness?
  4. What is Karl Pearson’s Measure of Relative Skewness?
  5. What is the moment coefficient of skewness?
  6. What are positive and negative skewness?


Skewness in Statistics: A Measure of Asymmetry (2017)

The article is about Skewness in Statistics, a measure of asymmetry. Skewed and skew are widely used terminologies that refer to something that is out of order or distorted on one side. Similarly, when referring to the shape of frequency distributions or probability distributions, the term skewness also refers to the asymmetry of that distribution. A distribution with an asymmetric tail extending out to the right is referred to as “positively skewed” or “skewed to the right”. In contrast, a distribution with an asymmetric tail extending out to the left is “negatively skewed” or “skewed to the left”.


Skewness in Statistics

Skewness ranges from minus infinity ($-\infty$) to positive infinity ($+\infty$). In simple words, skewness is a measure of asymmetry, that is, of a lack of symmetry.

Skewness by Karl Pearson

Karl Pearson (1857-1936) first suggested measuring skewness by standardizing the difference between the mean and the mode, that is, $\frac{\mu-\text{mode}}{\text{standard deviation}}$. Because population modes are not well estimated from sample modes, Stuart and Ord (1994) suggested estimating the difference between the mean and the mode as three times the difference between the mean and the median. The estimate of skewness is then $$\frac{3(M-\text{median})}{\text{standard deviation}},$$ where $M$ denotes the mean. Many statisticians use this measure without the factor of 3, that is, $$\frac{M-\text{median}}{\text{standard deviation}}.$$ This statistic ranges from $-1$ to $+1$. According to Hildebrand (1986), absolute values above 0.2 indicate great skewness.

Fisher’s Skewness

Skewness has also been defined in terms of the third moment about the mean, that is, $\gamma_1=\frac{\sum(X-\mu)^3}{n\sigma^3}$, which is simply the expected value of the cubed $Z$ scores. Skewness measured in this way is sometimes referred to as “Fisher’s skewness”. When the deviations from the mean are greater in one direction than in the other direction, this statistic will deviate from zero in the direction of the larger deviations.

From sample data, Fisher’s skewness is most often estimated by $$g_1=\frac{n\sum z^3}{(n-1)(n-2)}.$$ For large sample sizes ($n > 150$), $g_1$ may be distributed approximately normally, with a standard error of approximately $\sqrt{\frac{6}{n}}$. While one could use this sampling distribution to construct confidence intervals for or tests of hypotheses about $\gamma_1$, there is rarely any value in doing so.
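The sample estimate $g_1$ and its approximate standard error can be sketched in a few lines of Python. The function names `fisher_g1` and `approx_standard_error` are hypothetical, and the code assumes that the $z$ scores are formed with the sample standard deviation ($n-1$ divisor), which the text above does not state explicitly.

```python
import math
import statistics

def fisher_g1(data):
    """Sample estimate g1 = n * sum(z**3) / ((n - 1) * (n - 2))."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)   # assumption: z scores use the n - 1 divisor
    z_cubed_sum = sum(((x - mean) / s) ** 3 for x in data)
    return n * z_cubed_sum / ((n - 1) * (n - 2))

def approx_standard_error(n):
    """Large-sample approximation sqrt(6 / n) for the standard error of g1."""
    return math.sqrt(6 / n)

# Hypothetical sample; far too small for the normal approximation,
# used here only to show the arithmetic
sample = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]
print(f"g1 = {fisher_g1(sample):.3f}")
print(f"approximate SE for n = 150: {approx_standard_error(150):.3f}")
```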

Bowley’s Coefficient of Skewness

Arthur Lyon Bowley (1869-1957) also proposed a measure of asymmetry based on the median and the two quartiles. In a symmetrical distribution, the two quartiles are equidistant from the median, but in an asymmetrical distribution, this will not be the case. Bowley’s coefficient of skewness is $$\frac{Q_1+Q_3-2\,\text{median}}{Q_3-Q_1}.$$ Its value lies between $-1$ and $+1$.

The most commonly used measures of Asymmetry (those discussed here) may produce some surprising results, such as a negative value when the shape of the distribution appears skewed to the right.

Impact of Lack of Symmetry

Researchers from the behavioral and business sciences need to measure the lack of symmetry when it appears in their data. A great amount of asymmetry may motivate the researcher to investigate the existence of outliers. When deciding which measure of location to report and which inferential statistic to employ, one should take into consideration the estimated skewness of the population. Normal distributions have zero skewness. Of course, a distribution can be perfectly symmetric and still be far from the normal distribution. Transformations of the variables under study are commonly employed to reduce (positive) asymmetry. These transformations may include the square root, log, and reciprocal of a variable.
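The effect of such transformations can be checked numerically. The following Python sketch uses a hypothetical positively skewed sample and a hypothetical helper `moment_skewness` (third central moment divided by the 1.5 power of the second, both with an $n$ divisor). It assumes all values are strictly positive, since the log and reciprocal are undefined otherwise.

```python
import math

def moment_skewness(data):
    """Third central moment divided by the 1.5 power of the second."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical positively skewed sample; all values must be > 0
skewed = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]

transforms = {
    "raw": lambda x: x,
    "square root": math.sqrt,
    "log": math.log,
    "reciprocal": lambda x: 1 / x,
}

results = {
    name: moment_skewness([f(x) for x in skewed])
    for name, f in transforms.items()
}
for name, value in results.items():
    print(f"{name:12s} skewness = {value:.2f}")
```

For this sample, the square root reduces the skewness and the log reduces it further. The reciprocal reverses the order of the values, so its effect on a given data set should be checked rather than assumed.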

In summary, by understanding and recognizing how skewness affects the data, one can choose appropriate analysis methods, gain more insights from the data, and make better decisions based on the findings.

FAQs About Skewness

  1. What statistical measure is used to find the asymmetry in the data?
  2. Define the term Skewness.
  3. What is the difference between the concepts of symmetry and asymmetry?
  4. Describe negative and positive skewness.
  5. What is the difference between left-skewed and right-skewed data?
  6. What is a lack of symmetry?
  7. Discuss the measure proposed by Karl Pearson.
  8. Discuss Bowley’s Coefficient of Skewness.
  9. For which distributions is the skewness zero?
  10. What is the impact of transforming a variable?


Skewness Formula

The post outlines key skewness formulas providing essential tools for analyzing data distribution asymmetry. The skewness formulas help quantify the direction and degree of skewness, aiding in data analysis and decision-making.

What is Skewness?

Skewness is a statistical measure that describes the asymmetry of a probability distribution around its mean. It indicates whether the data is skewed to the left (negative skew), the right (positive skew), or symmetrically distributed (zero skew). In short, Skewness is the degree of asymmetry or departure from the symmetry of the distribution of a real-valued random variable. The post describes some important skewness formulas.

Positively Skewed

If the frequency curve of a distribution has a longer tail to the right of the central maximum than to the left, the distribution is said to be skewed to the right or to be positively skewed. In a positively skewed distribution, the mean is greater than the median and the median is greater than the mode, i.e., $$\text{Mean} > \text{Median} > \text{Mode}$$

Negatively Skewed

If the frequency curve has a longer tail to the left of the central maximum than to the right, the distribution is said to be skewed to the left or to be negatively skewed. In a negatively skewed distribution, the mode is greater than the median and the median is greater than the mean, i.e., $$\text{Mode} > \text{Median} > \text{Mean}$$

Zero Skewness

For zero skewness, the data is symmetrically distributed, as in a normal distribution.
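The three cases above can be checked on small data sets by comparing the mean, median, and mode. This is only a sketch: the function name `shape_from_averages` and the samples are hypothetical, and the ordering of the averages is a rule of thumb, so real data can fall into the "inconclusive" branch.

```python
import statistics

def shape_from_averages(data):
    """Classify a sample by the ordering of its mean, median, and mode."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)   # assumption: the data have one clear mode
    if mean > median > mode:
        return "positively skewed"
    if mode > median > mean:
        return "negatively skewed"
    if mean == median == mode:
        return "averages coincide"
    return "inconclusive"

# Hypothetical samples
right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 7, 11]
left_tailed = [11, 10, 10, 10, 9, 9, 8, 7, 5, 1]
balanced = [1, 2, 3, 3, 3, 4, 5]

for sample in (right_tailed, left_tailed, balanced):
    print(shape_from_averages(sample))
```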

Measure of Skewness Formulation

In a symmetrical distribution, the mean, median, and mode coincide. In a skewed distribution, these values are pulled apart.


Pearson’s Coefficient of Skewness Formula

Karl Pearson (1857-1936) introduced a coefficient to measure the degree of skewness of a distribution or curve, which is denoted by $S_k$ and defined by

\begin{eqnarray*}
S_k &=& \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}\\
S_k &=& \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}
\end{eqnarray*}
Usually, this coefficient varies between $-3$ (for negative skewness) and $+3$ (for positive skewness), and the sign indicates the direction of skewness.

Bowley’s Coefficient of Skewness Formula (Quartile Coefficient)

Arthur Lyon Bowley (1869-1957) proposed a measure of skewness based on the median and the two quartiles.

\[S_k=\frac{Q_1+Q_3-2\,\text{Median}}{Q_3 - Q_1}\]
Its values lie between $-1$ and $+1$.
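One way to see why the quartile coefficient stays within $-1$ and $+1$ is the short argument below. The shorthand $a$ and $b$ is introduced here only for illustration, and the argument assumes $Q_3 > Q_1$. Write $a = Q_3 - \text{Median} \ge 0$ and $b = \text{Median} - Q_1 \ge 0$. Then

\[S_k=\frac{Q_1+Q_3-2\,\text{Median}}{Q_3 - Q_1}=\frac{a-b}{a+b}, \qquad |a-b| \le a+b \;\Rightarrow\; -1 \le S_k \le +1.\]

The value $+1$ is reached only when $b=0$ (the median coincides with $Q_1$), and $-1$ only when $a=0$ (the median coincides with $Q_3$).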

Moment Coefficient of Skewness Formula

This measure of skewness is the third moment expressed in standard units (or the moment ratio) thus given by

\[S_k=\frac{\mu_3}{\sigma^3} \]
In practice, its values usually lie between $-2$ and $+2$, although the coefficient has no fixed bounds.

If $S_k$ is greater than zero, the distribution or curve is said to be positively skewed. If $S_k$ is less than zero the distribution or curve is said to be negatively skewed. If $S_k$ is zero the distribution or curve is said to be symmetrical.

The skewness of the distribution of a real-valued random variable can easily be seen by drawing a histogram or frequency curve.

When the skewness is very extreme, the distribution is called a J-shaped distribution.

Skewness: J-Shaped Distribution

Skewness helps identify deviations from normality, which is crucial for selecting appropriate statistical methods and interpreting data accurately. It is commonly used in finance, economics, and data analysis to understand the shape and behavior of datasets.

FAQs about Skewness

  1. What is the degree of asymmetry called?
  2. What is a departure from symmetry?
  3. If a distribution is negatively skewed then what is the relation between mean, median, and mode?
  4. If a distribution is positively skewed then what is the relation between mean, median, and mode?
  5. What is the relation between mean, median, and mode for a symmetrical distribution?
  6. What is the range of the moment coefficient of skewness?
