Skewness in Statistics A Measure of Asymmetry (2017)

The article is about Skewness in Statistics, which is a measure of asymmetry. Skewed and skew are widely used terminologies that refer to something that is out of order or distorted on one side. Similarly, when referring to the shape of frequency distributions or probability distributions, the term skewness also refers to the asymmetry of that distribution. A distribution with an asymmetric tail extending out to the right is referred to as “positively skewed” or “skewed to the right”, while a distribution with an asymmetric tail extending out to the left is referred to as “negatively skewed” or “skewed to the left”.

Skewness in Statistics

It ranges from minus infinity ($-\infty$) to positive infinity ($+\infty$). In simple words, skewness (asymmetry) is a measure of symmetry, or in other words, skewness is a lack of symmetry.

Skewness by Karl Pearson

Skewness in Statistics A measure of Asymmetry

Karl Pearson (1857-1936) first suggested measuring skewness by standardizing the difference between the mean and the mode, such that, $\frac{\mu-mode}{\text{standard deviation}}$. Since population modes are not well estimated from sample modes, therefore Stuart and Ord, 1994 suggested that one can estimate the difference between the mean and the mode as being three times the difference between the mean and the median. Therefore, the estimate of skewness will be $$\frac{3(M-median)}{\text{standard deviation}}$$. Many of the statisticians use this measure but after eliminating the ‘3’, that is, $$\frac{M-Median}{\text{standard deviation}}$$. This statistic ranges from $-1$ to $+1$. According to Hildebrand, 1986, absolute values above 0.2 indicate great skewness.

Fisher’s Skewness

Skewness has also been defined concerning the third moment about the mean, that is $\gamma_1=\frac{\sum(X-\mu)^3}{n\sigma^3}$, which is simply the expected value of the distribution of cubed $Z$ scores, measured in this way is also sometimes referred to as “Fisher’s skewness”. When the deviations from the mean are greater in one direction than in the other direction, this statistic will deviate from zero in the direction of the larger deviations.

From sample data, Fisher’s skewness is most often estimated by: $$g_1=\frac{n\sum z^3}{(n-1)(n-2)}$$. For large sample sizes ($n > 150$), $g_1$ may be distributed approximately normally, with a standard error of approximately $\sqrt{\frac{6}{n}}$. While one could use this sampling distribution to construct confidence intervals for or tests of hypotheses about $\gamma_1$, there is rarely any value in doing so.

Bowleys’ Coefficient of Skewness

Arthur Lyon Bowley (1869-19570, has also proposed a measure of asymmetry based on the median and the two quartiles. In a symmetrical distribution, the two quartiles are equidistant from the median but in an asymmetrical distribution, this will not be the case. The Bowley’s coefficient of skewness is $$\frac{q_1+q_3-2\text{median}}{Q_3-Q_1}$$. Its value lies between 0 and $\pm1$.

The most commonly used measures of Asymmetry (those discussed here) may produce some surprising results, such as a negative value when the shape of the distribution appears skewed to the right.

Impact of Lack of Symmetry

Researchers from the behavioral and business sciences need to measure the lack of symmetry when it appears in their data. A great amount of asymmetry may motivate the researcher to investigate the existence of outliers. When making decisions about which measure of the location to report and which inferential statistic to employ, one should take into consideration the estimated skewness of the population. Normal distributions have zero skewness. Of course, a distribution can be perfectly symmetric but may be far away from the normal distribution. Transformations of variables under study are commonly employed to reduce (positive) asymmetry. These transformations may include square root, log, and reciprocal of a variable.

In summary, by understanding and recognizing how skewness affects the data, one can choose appropriate analysis methods, gain more insights from the data, and make better decisions based on the findings.

R Programming Language

Leave a Comment

Discover more from Statistics for Data Analyst

Subscribe now to keep reading and get access to the full archive.

Continue reading