
The Moments in Statistics

Measures of central tendency (location) and measures of dispersion (variation) are both useful for describing a data set, but neither says anything about the shape of the distribution. For that we need further measures, called moments, which are used to quantify two aspects of shape: skewness and kurtosis.

Moments about Mean

The moments about the mean are the averages of the deviations from the mean after raising them to integer powers. The rth population moment about the mean, denoted by $\mu_r$, is

\[\mu_r=\frac{\sum^{N}_{i=1}(y_i - \bar{y} )^r}{N}\]

where $\bar{y}$ is the mean of the N population values and r = 1, 2, …

The corresponding sample moment, denoted by $m_r$, is

\[m_r=\frac{\sum^{n}_{i=1}(y_i - \bar{y} )^r}{n}\]

Note that for r=1 the first moment is $m_1=\frac{\sum^{n}_{i=1}(y_i - \bar{y} )}{n}=\frac{\sum^{n}_{i=1}y_i - n\bar{y}}{n}=0$, so the first moment about the mean is always zero.

If r=2, the second moment is the variance (computed with divisor n), i.e. \[m_2=\frac{\sum^{n}_{i=1}(y_i - \bar{y} )^2}{n}\]

Similarly, the 3rd and 4th moments are

\[m_3=\frac{\sum^{n}_{i=1}(y_i - \bar{y} )^3}{n}\]

\[m_4=\frac{\sum^{n}_{i=1}(y_i - \bar{y} )^4}{n}\]
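
As a rough illustration, the sample moments about the mean can be computed directly from the definition. This is only a sketch: the function name `central_moment` and the data values are hypothetical, chosen so that the mean is a whole number.

```python
def central_moment(values, r):
    """Return the r-th sample moment about the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** r for y in values) / n


# Hypothetical sample; its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

for r in (1, 2, 3, 4):
    print(r, central_moment(data, r))
# The first moment is 0, the second (the variance with divisor n) is 4.0,
# the third is 5.25 and the fourth is 44.5.
```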

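For grouped data, the rth sample moment about the sample mean $\bar{y}$ is

\[m_r=\frac{\sum^{k}_{i=1}f_i(y_i - \bar{y} )^r}{\sum^{k}_{i=1}f_i}\]

where $y_i$ is the value (or class midpoint) of the ith class, $f_i$ is its frequency, k is the number of classes, and $\sum^{k}_{i=1}f_i=n$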
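
The grouped-data formula can be sketched in the same way. The frequency table below is hypothetical, and `grouped_central_moment` is an illustrative name rather than a library function; each value is weighted by its frequency, and the weighted mean plays the role of $\bar{y}$.

```python
def grouped_central_moment(values, freqs, r):
    """r-th moment about the mean for grouped data (values with frequencies)."""
    n = sum(freqs)  # total frequency
    mean = sum(f * y for y, f in zip(values, freqs)) / n
    return sum(f * (y - mean) ** r for y, f in zip(values, freqs)) / n


# Hypothetical frequency table: value (or class midpoint) and its frequency.
values = [2, 4, 5, 7, 9]
freqs = [1, 3, 2, 1, 1]

print(grouped_central_moment(values, freqs, 2))  # 4.0
print(grouped_central_moment(values, freqs, 3))  # 5.25
```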

Moments about Arbitrary Value

The rth sample moment about an arbitrary origin “a”, denoted by $m'_r$, is
\[m'_r = \frac{\sum^{n}_{i=1}(y_i - a)^r}{n} = \frac{\sum^{n}_{i=1}D^r_i}{n}\]
where $D_i=(y_i -a)$ and r = 1, 2, ….

Therefore,
\begin{eqnarray*}
m'_1&=&\frac{\sum^{n}_{i=1}(y_i - a)}{n}=\frac{\sum^{n}_{i=1}D_i}{n}\\
m'_2&=&\frac{\sum^{n}_{i=1}(y_i - a)^2}{n}=\frac{\sum^{n}_{i=1}D_i ^2}{n}\\
m'_3&=&\frac{\sum^{n}_{i=1}(y_i - a)^3}{n}=\frac{\sum^{n}_{i=1}D_i ^3}{n}\\
m'_4&=&\frac{\sum^{n}_{i=1}(y_i - a)^4}{n}=\frac{\sum^{n}_{i=1}D_i ^4}{n}
\end{eqnarray*}

The rth sample moment for grouped data about an arbitrary origin “a” is

\[m'_r=\frac{\sum^{k}_{i=1}f_i(y_i - a)^r}{\sum^{k}_{i=1}f_i} = \frac{\sum f_i D_i ^r}{\sum f_i}\]

where $f_i$ is the frequency of the ith of k classes and $D_i=(y_i -a)$.

The moments about the mean are usually called central moments, and the moments about an arbitrary origin “a” are called non-central moments or raw moments.
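
A minimal sketch of the raw-moment calculation follows. The data and the origin a = 4 are hypothetical, and `raw_moment` is an illustrative helper name, not an established API.

```python
def raw_moment(values, r, a=0.0):
    """r-th sample moment about an arbitrary origin a (divisor n)."""
    n = len(values)
    return sum((y - a) ** r for y in values) / n


# Hypothetical sample and origin.
data = [2, 4, 4, 4, 5, 5, 7, 9]
a = 4

print([raw_moment(data, r, a) for r in (1, 2, 3, 4)])
# [1.0, 5.0, 18.25, 90.5]
```

Choosing a convenient origin near the centre of the data keeps the deviations $D_i$ small, which is the usual motivation for this form in hand calculation.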

The moments about the mean can be obtained from the moments about an arbitrary value using the following relations:

\begin{eqnarray*}
m_1&=& m'_1 - (m'_1) = 0 \\
m_2 &=& m'_2 - (m'_1)^2\\
m_3 &=& m'_3 - 3m'_2m'_1 +2(m'_1)^3\\
m_4 &=& m'_4 -4 m'_3m'_1 +6m'_2(m'_1)^2 -3(m'_1)^4
\end{eqnarray*}
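
These relations can be checked numerically. The sketch below uses hypothetical data and helper names (`raw_moments`, `central_from_raw`); it computes the raw moments about an origin and converts them, and the result should not depend on which origin is chosen.

```python
def raw_moments(values, a):
    """First four sample moments about the origin a (divisor n)."""
    n = len(values)
    return [sum((y - a) ** r for y in values) / n for r in (1, 2, 3, 4)]


def central_from_raw(m1p, m2p, m3p, m4p):
    """Convert raw moments m'_1..m'_4 into central moments m_1..m_4."""
    m2 = m2p - m1p ** 2
    m3 = m3p - 3 * m2p * m1p + 2 * m1p ** 3
    m4 = m4p - 4 * m3p * m1p + 6 * m2p * m1p ** 2 - 3 * m1p ** 4
    return 0.0, m2, m3, m4


# Hypothetical sample; two different origins give the same central moments.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(central_from_raw(*raw_moments(data, a=4)))  # (0.0, 4.0, 5.25, 44.5)
print(central_from_raw(*raw_moments(data, a=0)))
```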

Moments about Zero

If the variable y assumes n values $y_1, y_2, \cdots, y_n$, then the rth moment about zero is obtained by taking a=0, so the moment about an arbitrary value becomes
\[m'_r = \frac{\sum^{n}_{i=1} y_i^r}{n}\]

where r = 1, 2, 3, ….

Therefore,
\begin{eqnarray*}
m'_1&=&\frac{\sum^{n}_{i=1} y_i}{n}\\
m'_2 &=&\frac{\sum^{n}_{i=1} y_i^2}{n}\\
m'_3 &=&\frac{\sum^{n}_{i=1} y_i^3}{n}\\
m'_4 &=&\frac{\sum^{n}_{i=1} y_i^4}{n}
\end{eqnarray*}

The third moment is used to define the skewness of a distribution:
\[{\rm Skewness} = \frac{\sum^{n}_{i=1} (y_i - \bar{y})^3}{ns^3}\]
where s is the standard deviation of the data.

If the distribution is symmetric, the skewness is zero. Skewness is positive if there is a long tail in the positive direction and negative if there is a long tail in the negative direction.
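
The sign behaviour can be seen in a small sketch. It assumes that s is computed with divisor n (so that $s^2 = m_2$); the formula above does not say which divisor is meant, so this is a labeled assumption. The data sets and the function name `skewness` are hypothetical.

```python
import math


def skewness(values):
    """Moment-based skewness: sum((y - mean)^3) / (n * s^3), with s^2 = m_2."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((y - mean) ** 2 for y in values) / n)  # divisor n (assumption)
    return sum((y - mean) ** 3 for y in values) / (n * s ** 3)


right_tailed = [2, 4, 4, 4, 5, 5, 7, 9]     # long tail towards large values
left_tailed = [-y for y in right_tailed]    # mirror image
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # 0.65625
print(skewness(left_tailed))   # -0.65625
print(skewness(symmetric))     # 0.0
```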

The fourth moment is used to define the kurtosis of a distribution:

\[{\rm Kurtosis} = \frac{\sum^{n}_{i=1} (y_i -\bar{y})^4}{ns^4}\]

where s is the standard deviation of the data.
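
A minimal sketch of this calculation is given below, again assuming s uses divisor n so that $s^4 = m_2^2$ (an assumption; other conventions use n - 1 and give slightly different values). The sample and the function name `kurtosis` are hypothetical.

```python
import math


def kurtosis(values):
    """Moment-based kurtosis: sum((y - mean)^4) / (n * s^4), with s^2 = m_2."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((y - mean) ** 2 for y in values) / n)  # divisor n (assumption)
    return sum((y - mean) ** 4 for y in values) / (n * s ** 4)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(kurtosis(data))  # 2.78125, i.e. m_4 / m_2^2 = 44.5 / 16
```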

Measure of Kurtosis

In statistics, a measure of kurtosis is a measure of the “tailedness” of the probability distribution of a real-valued random variable. The standard measure of kurtosis is based on a scaled version of the fourth moment of the data or population. Therefore, the measure of kurtosis is related to the tails of the distribution, not its peak.

The measure of kurtosis is sometimes characterized as a measure of peakedness, but this is a mistake. In traditional terminology, a distribution with a relatively high peak is called leptokurtic and a distribution that is flat-topped is called platykurtic, while the normal distribution, which is neither very peaked nor very flat-topped, is called mesokurtic. In some cases a histogram can be an effective graphical technique for showing both the skewness and the kurtosis of a data set.

[Figure: Kurtosis]

Data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak.

The moment ratio and the percentile coefficient of kurtosis are used to measure kurtosis.

Moment Coefficient of Kurtosis = $b_2 = \frac{m_4}{S^4} = \frac{m_4}{m^{2}_{2}}$, where $S^2 = m_2$

Percentile Coefficient of Kurtosis = $k=\frac{Q.D}{P_{90}-P_{10}}$
where Q.D = $\frac{1}{2}(Q_3 - Q_1)$ is the semi-interquartile range. For a normal distribution this has the value 0.263.
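
The value 0.263 for the normal distribution can be checked from the theoretical quantiles. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal; the ratio does not depend on mean or sd

q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)    # quartiles
p10, p90 = z.inv_cdf(0.10), z.inv_cdf(0.90)  # 10th and 90th percentiles

qd = (q3 - q1) / 2        # semi-interquartile range
k = qd / (p90 - p10)      # percentile coefficient of kurtosis
print(round(k, 3))        # 0.263
```

For sample data the same ratio would be formed from sample percentiles, whose values depend on the interpolation rule used.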

Dr. Wheeler defines kurtosis as:

The kurtosis parameter is a measure of the combined weight of the tails relative to the rest of the distribution.

So, kurtosis is all about the tails of the distribution – not the peakedness or flatness.

A normal random variable has a kurtosis of 3 irrespective of its mean or standard deviation. If a random variable’s kurtosis is greater than 3, it is said to be Leptokurtic. If its kurtosis is less than 3, it is said to be Platykurtic.
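
The classification rule can be sketched as follows. The function names, the tolerance parameter `tol` and the data sets are all hypothetical; for small samples the computed value is a noisy estimate, so the labels should be read loosely.

```python
def kurtosis(values):
    """Moment coefficient of kurtosis b_2 = m_4 / m_2^2 (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((y - mean) ** 2 for y in values) / n
    m4 = sum((y - mean) ** 4 for y in values) / n
    return m4 / m2 ** 2


def classify(values, tol=0.0):
    """Label a data set by comparing its kurtosis with 3 (the normal value)."""
    b2 = kurtosis(values)
    if b2 > 3 + tol:
        return "leptokurtic"
    if b2 < 3 - tol:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]                         # evenly spread, light tails
heavy = [-10, -1, 0, 0, 0, 0, 0, 1, 10]        # most mass at 0, two extremes

print(classify(flat))   # platykurtic
print(classify(heavy))  # leptokurtic
```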

A large value of kurtosis indicates a more serious outlier issue and hence may lead the researcher to choose alternative statistical methods.
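
To illustrate how a single extreme observation affects the measure, the sketch below compares a hypothetical sample with and without one added outlier. The data and the function name are assumptions made for the example.

```python
def kurtosis(values):
    """Moment coefficient of kurtosis b_2 = m_4 / m_2^2 (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((y - mean) ** 2 for y in values) / n
    m4 = sum((y - mean) ** 4 for y in values) / n
    return m4 / m2 ** 2


clean = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
with_outlier = clean + [30]        # same sample plus one extreme value

print(kurtosis(clean))
print(kurtosis(with_outlier))      # noticeably larger than for the clean sample
```

Because deviations are raised to the fourth power, the outlier dominates the numerator, which is why a large kurtosis is often read as a warning about outliers.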

Some Examples of Kurtosis

  • In finance, risk and insurance are examples where one needs to focus on the tails of the distribution rather than assume normality.
  • Kurtosis helps in determining whether resource use within an ecological guild is truly neutral or whether it differs among species.
  • The accuracy of the variance as an estimate of the population $\sigma^2$ depends heavily on kurtosis.
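
The last point can be made concrete with a standard result. It is stated here under assumptions not made in the text above: an independent random sample of size n from a population with finite fourth moment, and $S^2$ taken as the sample variance with divisor n - 1. Writing $\kappa = \mu_4/\sigma^4$ for the population kurtosis,

\[{\rm Var}(S^2) = \frac{\sigma^4}{n}\left(\kappa - \frac{n-3}{n-1}\right)\]

For a normal population, $\kappa = 3$ and this reduces to $\frac{2\sigma^4}{n-1}$. For the same n and $\sigma^2$, a population with heavier tails (larger $\kappa$) gives a more variable, and hence less accurate, variance estimate.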

For further reading see https://itfeature.com/statistics/measure-of-dispersion/moments