Skewness and Measures of Skewness (2021)

If the curve is symmetrical, a deviation below the mean exactly equals the corresponding deviation above the mean. This is called symmetry. Here, we will discuss Skewness and Measures of Skewness.

Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. A distribution is positively skewed when the tail on the right side is longer or fatter than the tail on the left; the mean and median will then typically be greater than the mode. A distribution is negatively skewed when the tail on the left side is longer or fatter than the tail on the right side.

Measures of Skewness

Karl Pearson Measures of Relative Skewness
In a symmetrical distribution, the mean, median, and mode coincide. In skewed distributions, these values are pulled apart; the mean tends to be on the same side of the mode as the longer tail. Thus, a measure of the asymmetry is supplied by the difference (mean $-$ mode). This can be made dimensionless by dividing by a measure of dispersion (such as the standard deviation, SD). The Karl Pearson measure of relative skewness is
$$\text{SK} = \frac{\text{Mean}-\text{mode}}{SD} =\frac{\overline{x}-\text{mode}}{s}$$
The value of skewness may be either positive or negative.

The empirical formula for skewness (called the second coefficient of skewness) is

$$
\text{SK} = \frac{3(\text{mean}-\text{median})}{SD}=\frac{3(\overline{x}-\text{median})}{s}
$$
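The two Pearson coefficients can be sketched in Python as follows. This is only an illustration: the function name `pearson_skewness` and the sample data are hypothetical, and the sample standard deviation (divisor $n-1$) is assumed for $s$.

```python
import statistics

def pearson_skewness(data):
    """Return (first, second) Pearson coefficients of skewness."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)   # if several values tie, the first one met is used
    s = statistics.stdev(data)     # sample SD (n - 1 divisor): an assumption
    sk1 = (mean - mode) / s        # (mean - mode) / SD
    sk2 = 3 * (mean - median) / s  # 3 (mean - median) / SD
    return sk1, sk2

# Hypothetical data with a long right tail
data = [1, 2, 2, 3, 10]
sk1, sk2 = pearson_skewness(data)
print(round(sk1, 3), round(sk2, 3))
```

Both values come out positive for this right-tailed data set. The first coefficient depends on the mode, which is poorly defined for small or multimodal samples, so the second coefficient is often the more practical choice.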

Bowley Measures of Skewness

In a symmetrical distribution, the quartiles are equidistant from the median ($Q_2-Q_1 = Q_3-Q_2$). If the distribution is not symmetrical, the quartiles will not be equidistant from the median (unless the entire asymmetry is located in the extreme quarters of the data). The measure of skewness suggested by Bowley is

$$\text{Quartile Coefficient of SK} = \frac{(Q_3-Q_2)-(Q_2-Q_1)}{Q_3-Q_1}=\frac{Q_3-2Q_2+Q_1}{Q_3-Q_1}$$

This measure is always zero when the quartiles are equidistant from the median and is positive when the upper quartile is farther from the median than the lower quartile. This measure of skewness varies between $+1$ and $-1$.
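As a rough sketch, the quartile coefficient can be computed with Python's standard library. The function name `bowley_skewness` and the data sets are hypothetical; the quartiles are taken from `statistics.quantiles` with `method="exclusive"`, which places the quartiles at positions based on $n+1$.

```python
import statistics

def bowley_skewness(data):
    """Quartile (Bowley) coefficient of skewness: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return (q3 - 2 * q2 + q1) / (q3 - q1)

symmetric = [1, 2, 3, 4, 5, 6, 7]        # quartiles 2, 4, 6
right_skewed = [1, 2, 3, 4, 8, 12, 20]   # quartiles 2, 4, 12
tail_only = [1, 2, 3, 4, 5, 6, 20]       # asymmetry only in the extreme quarter

print(bowley_skewness(symmetric))     # 0.0
print(bowley_skewness(right_skewed))  # 0.6
print(bowley_skewness(tail_only))     # 0.0
```

The last data set illustrates the caveat noted earlier: when the asymmetry sits entirely beyond the quartiles, the measure is zero even though the data are clearly skewed.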

Moment Coefficient of Skewness

In any symmetrical curve, the sum of odd powers of deviations from the mean will be equal to zero. That is, $m_3=m_5=m_7=\cdots=0$. However, it is not true for asymmetrical distributions. For this reason, a measure of skewness is devised based on $m_3$. That is

\begin{align}
\text{Moment Coefficient of SK}&= a_3=\frac{m_3}{s^3}=\frac{m_3}{\sqrt{m_2^3}}\\
b_1&=a_3^2=\frac{m_3^2}{m_2^3}
\end{align}

For perfectly symmetrical curves (such as the normal curve), $a_3$ and $b_1$ are zero.
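A minimal Python sketch of the moment coefficient follows. It assumes the moments about the mean are computed with divisor $n$, so that $s^3=\sqrt{m_2^3}$; the function name `moment_skewness` and the data are hypothetical.

```python
def moment_skewness(data):
    """Return (a3, b1) based on the second and third moments about the mean."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second moment about the mean
    m3 = sum((x - mean) ** 3 for x in data) / n  # third moment about the mean
    a3 = m3 / m2 ** 1.5                          # m3 / sqrt(m2^3)
    b1 = m3 ** 2 / m2 ** 3                       # equals a3 squared
    return a3, b1

print(moment_skewness([1, 2, 3, 4, 5]))   # symmetric data: (0.0, 0.0)
a3, b1 = moment_skewness([1, 2, 2, 3, 10])
print(round(a3, 3), round(b1, 3))         # right-tailed data: a3 is positive
```

Because $b_1$ is a square it cannot be negative, so only $a_3$ carries the direction of the skewness.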

Quantiles or Fractiles

When the number of observations is sufficiently large, the principle by which a distribution is divided into two equal parts may be extended to divide the distribution into four, five, eight, ten, or hundred equal parts. The median, quartiles, deciles, and percentiles values are collectively called quantiles or fractiles. Let us start learning about Quantiles or Fractiles.

Quartiles

These are the values that divide a distribution into four equal parts. There are three quartiles denoted by $Q_1, Q_2$, and $Q_3$. If $x_1,x_2,\cdots,x_n$ are $n$ observations on a variable $X$, and $x_{(1)}, x_{(2)}, \cdots, x_{(n)}$ is their ordered array, then the $r$th quartile $Q_r$ is the value of $X$ such that $\frac{r}{4}$ of the observations are less than that value and $\frac{4-r}{4}$ of the observations are greater.

For example, $Q_1$ is the value of $X$ such that $\frac{1}{4}$ of the observations are less than it and $\frac{4-1}{4}$ of the observations are greater, while $Q_3$ is the value of $X$ such that $\frac{3}{4}$ of the observations are less than it and $\frac{4-3}{4}$ of the observations are greater.

Deciles

These are the values that divide a distribution into ten equal parts. There are 9 deciles $D_1, D_2, \cdots, D_9$.

Percentiles

These are the values that divide a distribution into a hundred equal parts. There are 99 percentiles denoted as $P_1,P_2,\cdots, P_{99}$.

The median, quartiles, deciles, percentiles, and other partition values are collectively called quantiles or fractiles. Every quantile can be expressed as a percentile. For example, $Q_2$, $D_5$, and $P_{50}$ are all equal to the median.

\begin{align*}
Q_2 &= D_5 = P_{50}\\
Q_1 &= P_{25} = D_{2.5}\\
Q_3 &= P_{75}=D_{7.5}
\end{align*}
The $r$th quartile, $k$th decile, and $j$th percentile are located in the array by the following relations.
For ungrouped data

\begin{align}
Q_r &=\frac{r(n+1)}{4}\text{th value in the distribution and } r=1,2,3\\
D_k &=\frac{k(n+1)}{10}\text{th value in the distribution and } k=1,2,\cdots, 9\\
P_j &=\frac{j(n+1)}{100}\text{th value in the distribution and } j=1,2,\cdots, 99
\end{align}
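For grouped data, where $l$ is the lower class boundary of the class containing the quantile, $h$ is the width of that class, $f$ is its frequency, $c$ is the cumulative frequency of the preceding class, and $n$ is the total number of observations: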
\begin{align}
Q_r&= l+\frac{h}{f}\left(\frac{rn}{4}-c\right)\\
D_k&= l+\frac{h}{f}\left(\frac{kn}{10}-c\right)\\
P_j&= l+\frac{h}{f}\left(\frac{jn}{100}-c\right)
\end{align}
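The grouped-data formulas share one pattern, $l+\frac{h}{f}\left(\frac{rn}{\text{parts}}-c\right)$, which can be sketched in Python as below. The function name `grouped_quantile`, the assumption of equal class widths, and the frequency table are all hypothetical and serve only as an illustration.

```python
def grouped_quantile(lower_bounds, width, freqs, r, parts):
    """r-th of `parts` quantiles from a frequency table with equal class widths.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(freqs)
    target = r * n / parts        # rn/4, kn/10, or jn/100
    c = 0                         # cumulative frequency before the current class
    for l, f in zip(lower_bounds, freqs):
        if f > 0 and c + f >= target:
            return l + (width / f) * (target - c)
        c += f
    raise ValueError("quantile position lies outside the table")

# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40
lower_bounds = [0, 10, 20, 30]
freqs = [5, 15, 20, 10]

print(grouped_quantile(lower_bounds, 10, freqs, 2, 4))              # median: 22.5
print(round(grouped_quantile(lower_bounds, 10, freqs, 1, 4), 2))    # Q1
print(round(grouped_quantile(lower_bounds, 10, freqs, 90, 100), 2)) # P90
```

For the median, $\frac{2n}{4}=25$ falls in the class 20-30, so $l=20$, $f=20$, $c=20$, and the result is $20+\frac{10}{20}(25-20)=22.5$.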

A procedure for obtaining a percentile (and hence the quartiles and deciles) of a data set of size $n$ is as follows:

Step 1: Arrange the data in ascending order.
Step 2: Compute the index $i=\frac{p}{100} (n+1)$, which gives the position of the $p$th percentile in the ordered data.

  • If $i$ is an integer, the $p$th percentile is the $i$th data value.
  • If $i$ is not an integer, interpolate between the data values at the two neighbouring positions: take the value at the integer part of $i$ and add the fractional part of $i$ times the gap to the next value. A rougher alternative is to round $i$ up to the nearest integer and take the value at that position.

Percentile Example:

Consider the following (sorted) data values: 380, 600, 690, 890, 1050, 1100, 1200, 1900, 890000.

For the $p=10$th percentile, $i=\frac{p}{100} (n+1) =\frac{10}{100} (9+1)= 1$. So the 10th percentile is the first sorted value or 380.

For the $p=75$th percentile, $i=\frac{p}{100} (n+1)= \frac{75}{100}(9+1) = 7.5$.

To get the actual value we need to compute 7th value + (8th value – 7th value) $\times 0.5$. That is, $1200 + (1900-1200)\times 0.5 = 1200+350 = 1550$.
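The calculation in this example can be reproduced with a short Python sketch. The function name `percentile` is hypothetical, and clamping the index to the range $1$ to $n$ is an added assumption for percentiles that would otherwise fall outside the data.

```python
import math

def percentile(sorted_data, p):
    """p-th percentile using the index i = p(n + 1)/100 with linear interpolation."""
    n = len(sorted_data)
    i = p * (n + 1) / 100
    i = min(max(i, 1), n)        # assumption: clamp positions outside 1..n
    lower = math.floor(i)        # integer part of the position
    frac = i - lower             # fractional part of the position
    if frac == 0:
        return sorted_data[lower - 1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + frac * (above - below)

data = [380, 600, 690, 890, 1050, 1100, 1200, 1900, 890000]
print(percentile(data, 10))  # 380
print(percentile(data, 75))  # 1550.0
```

Software packages use several different percentile conventions, so results from other tools may differ slightly from the $(n+1)$ rule used here.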

MCQs Statistics Online Test 10

This quiz contains MCQs Statistics Online Test with answers covering variable and type of variable, Measures of central tendency such as mean, median, mode, Weighted mean, data and type of data, sources of data, Measures of Dispersion/ Variation, Standard Deviation, Variance, Range, etc. Let us start the MCQs Statistics Online Test for the preparation of the PPSC Statistics Lecturer Post.

1. Which mean is most affected by extreme values?
2. The measures of dispersion are changed by the change of:
3. Data Classified by attributes are called:
4. The correct relationship between AM, GM, and HM is
5. The extreme values in negatively skewed distribution lie in the:
6. Measurements usually provide:
7. Which measure of dispersion ensures the highest degree of reliability?
8. If each observation of a set is divided by 10, the standard deviation of the new observation is:
9. The sum of absolute deviations about the median is
10. Which measure of dispersion is the least affected by extreme values?
11. Statistics are aggregates of
12. When mean, median, and mode are identical, the distribution is:
13. The sum of the square of the deviations about the mean is:
14. The appropriate average for calculating the average percentage increase in population is
15. Commodities subject to considerable price variations could best be measured by:
16. A set of values is said to be relatively uniform if it has:
17. Statistics results are:
18. Cumulative frequency is
19. If a constant value 5 is subtracted from each observation of a set, the variance is:
20. The Harmonic mean gives more weightage to:

Introductory statistics deals with measures of central tendency (the arithmetic mean or average, median, mode, weighted mean, geometric mean, and harmonic mean) and measures of dispersion (such as the range, standard deviation, and variance).

Introductory statistical methods include planning and designing the study, collecting and arranging data, and summarizing the collected data numerically and graphically. Basic statistics are also used to perform different statistical analyses to draw meaningful inferences.

A basic inspection of data using graphical displays and numerical summaries may reveal useful information hidden in the data. Graphical representations include the bar chart, pie chart, dot chart, box plot, etc.

Finance, communication, and manufacturing companies, charity organizations, government institutes, and businesses of all sizes have a strong interest in collecting data and measuring different sorts of statistical findings. This helps them learn from the past, notice trends, and plan for the future.

Characteristics of Statistics

The characteristics of statistics are:

  1. Statistics deals with the behavior of aggregates or large groups of data. It has nothing to do with what is happening to a particular individual or object of the aggregate.
  2. Statistics deals with aggregates of observations of the same kind rather than isolated figures.
  3. Statistics deals with variability that obscures underlying patterns. No two objects in this universe are exactly alike. If they were, there would be no statistical problem.
  4. Statistics deals with uncertainty, because every process of obtaining observations, whether controlled or uncontrolled, involves deficiencies or chance variation. That is why we have to talk in terms of probability.
  5. Statistics deals with characteristics or aspects of things that can be described numerically by counts or measurements.
  6. Statistics deals with aggregates that are subject to several random causes, e.g., the heights of persons are subject to several causes such as race, ancestry, age, diet, habits, climate, etc.
  7. Statistical laws are valid on average or in the long run. There is no guarantee that a certain law will hold in all cases. Statistical inference is therefore made in the face of uncertainty.
  8. Statistical results can be misleading or incorrect if sufficient care is not exercised in collecting, processing, and interpreting the data, or if the data are handled by someone not well versed in the subject matter of statistics.