Skewness and Measures of Skewness

If the curve is symmetrical, a deviation below the mean exactly equals the corresponding deviation above the mean. This is called symmetry. Here, we will discuss Skewness and Measures of Skewness.

Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. A distribution is positively skewed when the tail on the right side is longer or fatter than the tail on the left; the mean and median are then typically greater than the mode. A distribution is negatively skewed when the tail on the left side is longer or fatter than the tail on the right; the mean and median are then typically less than the mode.

Measures of Skewness

Karl Pearson Measures of Relative Skewness

In a symmetrical distribution, the mean, median, and mode coincide. In skewed distributions, these values are pulled apart; the mean tends to be on the same side of the mode as the longer tail. Thus, a measure of the asymmetry is supplied by the difference ($\text{mean} - \text{mode}$). This can be made dimensionless by dividing by a measure of dispersion (such as the SD).

The Karl Pearson measure of relative skewness is
$$\text{SK} = \frac{\text{mean}-\text{mode}}{\text{SD}} =\frac{\overline{x}-\text{mode}}{s}$$
The value of skewness may be either positive or negative.

The empirical formula for skewness (called the second coefficient of skewness) is

$$\text{SK} = \frac{3(\text{mean}-\text{median})}{\text{SD}}=\frac{3(\overline{x}-\text{median})}{s}$$
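
As a rough illustration, both Pearson coefficients can be computed with Python's standard `statistics` module. This is a minimal sketch: the data values are hypothetical, and it assumes that $s$ is the sample standard deviation (divisor $n-1$), which the text does not specify.

```python
import statistics


def pearson_skewness(data):
    """Return Pearson's first and second coefficients of skewness."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)   # the first mode found if several values tie
    s = statistics.stdev(data)     # assumption: sample SD with divisor n - 1
    first = (mean - mode) / s
    second = 3 * (mean - median) / s
    return first, second


# Hypothetical data with a longer right tail.
sample = [2, 3, 3, 4, 5, 7, 11]
sk1, sk2 = pearson_skewness(sample)
print(sk1, sk2)
```

Both coefficients come out positive for this sample, because the mean (5) lies above both the median (4) and the mode (3).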

Bowley Measures of Skewness

In a symmetrical distribution, the quartiles are equidistant from the median ($Q_2-Q_1 = Q_3-Q_2$). If the distribution is not symmetrical, the quartiles will not be equidistant from the median (unless the entire asymmetry is located in the extreme quarters of the data). The Bowley suggested measure of skewness is

$$\text{Quartile Coefficient of SK} = \frac{(Q_3-Q_2)-(Q_2-Q_1)}{Q_3-Q_1}=\frac{Q_3-2Q_2+Q_1}{Q_3-Q_1}$$

This measure is always zero when the quartiles are equidistant from the median and is positive when the upper quartile is farther from the median than the lower quartile. This measure of skewness varies between $+1$ and $-1$.
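
The quartile coefficient, $(Q_3-2Q_2+Q_1)/(Q_3-Q_1)$, can be sketched in Python as follows. The two data sets are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, which is only one of several quartile conventions; other conventions can give slightly different values.

```python
import statistics


def bowley_skewness(data):
    """Quartile (Bowley) coefficient of skewness."""
    # statistics.quantiles with n=4 returns the three quartiles [Q1, Q2, Q3].
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return (q3 - 2 * q2 + q1) / (q3 - q1)


symmetric = [1, 2, 3, 4, 5, 6, 7]       # hypothetical, symmetric about 4
right_skewed = [1, 2, 3, 4, 5, 10, 20]  # hypothetical, long right tail

print(bowley_skewness(symmetric))
print(bowley_skewness(right_skewed))
```

For the symmetric sample the quartiles are equidistant from the median, so the coefficient is zero; for the second sample the upper quartile is farther from the median, so the coefficient is positive.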

Moment Coefficient of Skewness

In any symmetrical curve, the sum of odd powers of deviations from the mean will be equal to zero. That is, $m_3=m_5=m_7=\cdots=0$. However, it is not true for asymmetrical distributions. For this reason, a measure of skewness is devised based on $m_3$. That is

\begin{align}
\text{Moment Coefficient of SK} &= a_3=\frac{m_3}{s^3}=\frac{m_3}{\sqrt{m_2^3}}\\
b_1 &= a_3^2=\frac{m_3^2}{m_2^3}
\end{align}

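For perfectly symmetrical curves, such as the normal curve, $a_3$ and $b_1$ are zero. Because $b_1$ is a square, it cannot be negative; only $a_3$ shows the direction of the skewness.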
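
A minimal sketch of the moment coefficient in Python is given below. It assumes that $m_2$ and $m_3$ are the second and third moments about the mean computed with divisor $n$; the data values are hypothetical.

```python
def moment_skewness(data):
    """Return (a3, b1) computed from the moments about the mean."""
    n = len(data)
    mean = sum(data) / n
    # Assumption: moments about the mean use divisor n.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    a3 = m3 / m2 ** 1.5
    b1 = m3 ** 2 / m2 ** 3
    return a3, b1


print(moment_skewness([1, 2, 3, 4, 5]))   # hypothetical symmetric data
print(moment_skewness([1, 2, 3, 4, 10]))  # hypothetical right-skewed data
```

For the symmetric sample the cubed deviations cancel, so both values are zero; for the second sample $a_3$ is positive and $b_1$ equals $a_3^2$.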

Real-Life Examples of Skewness

  1. Income Distribution: Income distribution in most countries is right-skewed. A large number of people earn relatively low incomes, while a smaller number earn significantly higher incomes, creating a long tail on the right side of the distribution.
  2. Insurance Claims: Insurance claim amounts are typically right-skewed. Most claims are for smaller amounts, but there are a few very large claims that create a long tail on the right.
  3. Age at Retirement: The age at which people retire is often right-skewed. Most people retire around a certain age, but some continue to work much later in life, creating a long tail on the right.
  4. Test Scores: In some educational settings, test scores can be left-skewed if the test is very easy, with most students scoring high and a few scoring much lower, creating a long tail on the left.
  5. Hospital Stay Duration: The length of hospital stays is often right-skewed. Most patients stay for a short period, but some patients with severe conditions stay much longer, creating a long tail on the right.
  6. House Prices: In many housing markets, the distribution of house prices is right-skewed. There are many houses priced within a certain range, but a few very expensive houses create a long tail on the right.
  7. Web Traffic: The number of visitors to different websites can be highly right-skewed. A few popular sites get a huge number of visitors, while the majority of sites get much less traffic.
  8. Customer Spending: In retail, customer spending can be right-skewed. Most customers spend a small amount, but a few spend a lot, creating a long tail on the right.
  9. The lifespan of Products: The lifespan of certain products can be right-skewed. Most products last for a certain period, but a few last much longer, creating a long tail on the right.
  10. Natural Disasters: The severity of natural disasters, such as earthquakes or hurricanes, can be right-skewed. Most events are of low to moderate severity, but a few are extremely severe, creating a long tail on the right.

FAQs about Skewness

  1. What is skewness?
  2. If a curve is symmetrical then what is the behavior of deviation below and above the mean?
  3. What is Bowley’s Measure of Skewness?
  4. What is Karl Pearson’s Measure of Relative Skewness?
  5. What is the moment coefficient of skewness?
  6. What are positive and negative skewness?


Quantiles or Fractiles Uncovered (2020)

When the number of observations is sufficiently large, the principle by which a distribution is divided into two equal parts may be extended to divide the distribution into four, five, eight, ten, or a hundred equal parts. The median, quartile, decile, and percentile values are collectively called quantiles or fractiles. Let us start learning about Quantiles or Fractiles.

Quantiles or Fractiles

Quartiles

These are the values that divide a distribution into four equal parts. There are three quartiles denoted by $Q_1, Q_2$, and $Q_3$. If $x_1,x_2,\cdots,x_n$ are $n$ observations on a variable $X$, and $x_{(1)}, x_{(2)}, \cdots, x_{(n)}$ is their ordered array, then the $r$th quartile $Q_r$ is the value of $X$ such that $\frac{r}{4}$ of the observations are less than that value and $\frac{4-r}{4}$ of the observations are greater.

Thus $Q_1$ is the value of $X$ such that $\frac{1}{4}$ of the observations are less than it and $\frac{4-1}{4}$ of the observations are greater, and $Q_3$ is the value of $X$ such that $\frac{3}{4}$ of the observations are less than it and $\frac{4-3}{4}$ of the observations are greater.

Deciles

These are the values that divide a distribution into ten equal parts. There are 9 deciles $D_1, D_2, \cdots, D_9$.

Percentiles

These are the values that divide a distribution into a hundred equal parts. There are 99 percentiles denoted as $P_1,P_2,\cdots, P_{99}$.

The median, quartiles, deciles, percentiles, and other partition values are collectively called quantiles or fractiles. Every quantile can be expressed as a percentile. For example, $P_{50}$, $Q_2$, and $D_5$ are all equal to the median.

\begin{align*}
Q_2 &= D_5 = P_{50}\\
Q_1 &= P_{25} = D_{2.5}\\
Q_3 &= P_{75}=D_{7.5}
\end{align*}
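
These equivalences can be checked numerically. The sketch below uses `statistics.quantiles` from Python's standard library on hypothetical data (the integers 1 to 19); its default method places the cut points at positions based on $n+1$, the same convention used for ungrouped data on this page.

```python
import statistics

# Hypothetical data: the integers 1 to 19 (n = 19, so n + 1 = 20).
data = list(range(1, 20))

quartiles = statistics.quantiles(data, n=4)      # [Q1, Q2, Q3]
deciles = statistics.quantiles(data, n=10)       # [D1, ..., D9]
percentiles = statistics.quantiles(data, n=100)  # [P1, ..., P99]
median = statistics.median(data)

# Lists are 0-indexed, so D5 is deciles[4] and P50 is percentiles[49].
print(quartiles[1], deciles[4], percentiles[49], median)  # Q2, D5, P50, median
print(quartiles[0], percentiles[24])                      # Q1, P25
print(quartiles[2], percentiles[74])                      # Q3, P75
```

Each printed line should show the same value repeated, one line per equivalence.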
The $r$th quartile, $k$th decile, and $j$th percentile are located in the array by the following relations:

For ungrouped Data
\begin{align}
Q_r &=\frac{r(n+1)}{4}\text{th value in the distribution and } r=1,2,3\\
D_k &=\frac{k(n+1)}{10}\text{th value in the distribution and } k=1,2,\cdots, 9\\
P_j &=\frac{j(n+1)}{100}\text{th value in the distribution and } j=1,2,\cdots, 99
\end{align}

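For grouped Data
\begin{align}
Q_r&= l+\frac{h}{f}\left(\frac{rn}{4}-c\right)\\
D_k&= l+\frac{h}{f}\left(\frac{kn}{10}-c\right)\\
P_j&= l+\frac{h}{f}\left(\frac{jn}{100}-c\right)
\end{align}
where $l$ is the lower class boundary of the class containing the required quantile, $h$ is the width of that class, $f$ is its frequency, $c$ is the cumulative frequency of the preceding class, and $n$ is the total number of observations.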
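
The grouped-data formulas can be sketched in Python as a single function, reading $l$, $h$, $f$, and $c$ in the usual way (noted in the comments). The function name, its arguments, and the frequency table are hypothetical and serve only as an illustration.

```python
def grouped_quantile(boundaries, freqs, r, parts):
    """Return the r-th of `parts` quantiles from a grouped frequency table.

    boundaries: class boundaries in ascending order, one more than freqs
    freqs:      class frequencies
    parts:      4 for quartiles, 10 for deciles, 100 for percentiles
    """
    n = sum(freqs)
    target = r * n / parts      # rn/4, kn/10, or jn/100
    c = 0                       # cumulative frequency before the current class
    for i, f in enumerate(freqs):
        if c + f >= target:     # the quantile falls in this class
            l = boundaries[i]                      # lower class boundary
            h = boundaries[i + 1] - boundaries[i]  # class width
            return l + (h / f) * (target - c)
        c += f
    raise ValueError("r must lie between 1 and parts - 1")


# Hypothetical frequency table: classes 0-10, 10-20, 20-30, 30-40.
boundaries = [0, 10, 20, 30, 40]
freqs = [5, 15, 20, 10]

print(grouped_quantile(boundaries, freqs, 1, 4))   # Q1
print(grouped_quantile(boundaries, freqs, 2, 4))   # Q2 (median)
print(grouped_quantile(boundaries, freqs, 3, 4))   # Q3
```

With $n=50$, the median position is $25$, which falls in the class 20-30 ($f=20$, $c=20$), giving $20+\frac{10}{20}(25-20)=22.5$.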

Procedure for obtaining Percentile

A procedure for obtaining the $p$th percentile (and hence the quartiles and deciles) of a data set of size $n$ is as follows:

Step 1: Arrange the data in ascending order.
Step 2: Compute the index $i=\frac{p}{100} (n+1)$, which gives the position of the $p$th percentile in the sorted data.

  • If $i$ is an integer, the $p$th percentile is the $i$th data value.
  • If $i$ is not an integer, the $p$th percentile lies between the data values on either side of position $i$, that is, between the values at positions $\lfloor i \rfloor$ and $\lfloor i \rfloor+1$. Locate it by linear interpolation, or, as a rougher alternative, round $i$ up to the nearest integer and take the value at that position.

Percentile Example

Consider the following (sorted) data values: 380, 600, 690, 890, 1050, 1100, 1200, 1900, 890000.

For the $p=10$th percentile, $i=\frac{p}{100} (n+1) =\frac{10}{100} (9+1)= 1$. So the 10th percentile is the first sorted value or 380.

For the $p=75$th percentile, $i=\frac{p}{100} (n+1)= \frac{75}{100}(9+1) = 7.5$.

Because $i$ is not an integer, the percentile lies halfway between the 7th and 8th sorted values: 7th value + (8th value - 7th value) $\times 0.5$. That is, $1200 + (1900-1200)\times 0.5 = 1200+350 = 1550$.
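
The worked example can be reproduced with a short Python function. This is a sketch of the $(n+1)$ position rule as it is applied in the example (take the $i$th value when $i$ is whole, otherwise interpolate linearly); the function name and the clamping at the two ends are assumptions added for illustration.

```python
import math


def percentile(data, p):
    """p-th percentile located at position i = p(n + 1)/100 in the sorted data."""
    values = sorted(data)
    n = len(values)
    i = p * (n + 1) / 100
    k = math.floor(i)       # whole part of the position
    frac = i - k            # fractional part used for interpolation
    if k < 1:               # assumption: clamp positions before the first value
        return values[0]
    if k >= n:              # assumption: clamp positions at or past the last value
        return values[-1]
    # Positions are 1-based, so the k-th value is values[k - 1].
    return values[k - 1] + frac * (values[k] - values[k - 1])


data = [380, 600, 690, 890, 1050, 1100, 1200, 1900, 890000]
print(percentile(data, 10))
print(percentile(data, 75))
```

The two calls correspond to the 10th and 75th percentiles computed above (380 and 1550). Quartiles and deciles follow from the same function, for example `percentile(data, 25)` for $Q_1$.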


Frequently Asked Questions about Fractiles

  1. What is meant by quartiles, deciles, and percentiles?
  2. Describe the procedure of obtaining percentiles (quartiles, and deciles).
  3. What is the interquartile range?
  4. Why do we need to sort the data first when computing quartiles, deciles, and percentiles?

MCQs Statistics Online Test 10

This MCQs Statistics Online Test, with answers, covers variables and types of variables; measures of central tendency such as the mean, median, mode, and weighted mean; data, types of data, and sources of data; and measures of dispersion (variation) such as the standard deviation, variance, and range. Let us start the MCQs Statistics Online Test for the preparation of the PPSC Statistics Lecturer Post.

1. Cumulative frequency is
2. Data Classified by attributes are called:
3. Statistics are aggregates of
4. The correct relationship between AM, GM, and HM is
5. If each observation of a set is divided by 10, the standard deviation of the new observation is:
6. The sum of absolute deviations about the median is
7. Which measure of dispersion is the least affected by extreme values?
8. The sum of the square of the deviations about the mean is:
9. A set of values is said to be relatively uniform if it has:
10. The appropriate average for calculating the average percentage increase in population is
11. Statistics results are:
12. Which measure of dispersion ensures the highest degree of reliability?
13. The measures of dispersion are changed by the change of:
14. When mean, median, and mode are identical, the distribution is:
15. The Harmonic mean gives more weightage to:
16. The extreme values in negatively skewed distribution lie in the:
17. Which mean is most affected by extreme values?
18. If a constant value 5 is subtracted from each observation of a set, the variance is:
19. Commodities subject to considerable price variations could best be measured by:
20. Measurements usually provide:

Introductory statistics deals with measures of central tendency (the arithmetic mean or average, median, mode, weighted mean, geometric mean, and harmonic mean) and measures of dispersion (such as the range, standard deviation, and variance).

Introductory statistical methods include planning and designing the study, collecting and arranging the data, and summarizing the collected data numerically and graphically. Basic statistics are also used to perform different statistical analyses to draw meaningful inferences.

A basic inspection of the data using graphical displays and numerical summaries may reveal useful information hidden in the data. Graphical representations include the bar chart, pie chart, dot chart, box plot, etc.

Finance, communication, and manufacturing companies, charity organizations, government institutes, and businesses from small to large all have a strong interest in collecting data and measuring different sorts of statistical findings. This helps them learn from the past, notice trends, and plan for the future.
