Measure of Dispersion - Statistics for Data Science & Analytics

Quartile Deviation (2025)

Dec 21, 2024 by Muhammad Imdad Ullah

Quartile deviation, denoted by QD, is an absolute measure of dispersion defined as half of the difference between the upper quartile ($Q_3$) and the lower quartile ($Q_1$).

The quartile deviation, also known as the semi-interquartile range (semi-IQR), focuses on the middle 50% of the data, that is, the observations lying between the first quartile ($Q_1$) and the third quartile ($Q_3$). Mathematically,

$$QD = \frac{Q_3-Q_1}{2}$$

Note that the interquartile range is simply the difference between the upper quartile ($Q_3$) and the lower quartile ($Q_1$), that is,

$$Interquartile\,\, Range = IQR = Q_3 - Q_1$$

The relative measure corresponding to the quartile deviation is the coefficient of quartile deviation, given by

$$Coefficient\,\,of\,\,QD = \frac{Q_3 - Q_1}{Q_3 + Q_1}\times 100$$
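As a quick check of the three formulas above, the Python sketch below computes QD, IQR and the coefficient of QD from a given pair of quartiles. The function name `quartile_measures` is a hypothetical helper for illustration; it assumes $Q_1$ and $Q_3$ have already been found and that $Q_3 + Q_1 \neq 0$.

```python
def quartile_measures(q1: float, q3: float) -> dict:
    """Return QD, IQR and the coefficient of QD (in percent) for given quartiles."""
    if q3 < q1:
        raise ValueError("Q3 must not be smaller than Q1")
    if q3 + q1 == 0:
        raise ValueError("coefficient of QD is undefined when Q3 + Q1 = 0")
    iqr = q3 - q1                          # interquartile range
    qd = iqr / 2                           # quartile deviation (semi-IQR)
    coefficient = iqr / (q3 + q1) * 100    # relative measure, in percent
    return {"QD": qd, "IQR": iqr, "coefficient_pct": coefficient}


# Quartiles from the ungrouped example in this article
print(quartile_measures(59, 108))
```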

When to Use QD

When dealing with skewed data or data with outliers.
When a quick and easy measure of dispersion is needed.

Interpretation of QD

Spread: A larger quartile deviation indicates greater variability in the middle portion of the data.
Outliers: QD is less sensitive to extreme values (outliers) compared to the standard deviation.
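To illustrate the outlier point, the sketch below compares QD with the population standard deviation before and after one value is replaced by an extreme one. It uses Python's standard `statistics` module; note that `statistics.quantiles` applies its own default ("exclusive") quartile method, which differs from the simple positional rule used in the worked examples of this article. The data values are made up for illustration.

```python
import statistics


def quartile_deviation(data):
    """QD = (Q3 - Q1) / 2, with quartiles from statistics.quantiles (default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (q3 - q1) / 2


# Hypothetical data, then the same data with its largest value turned into an outlier
data = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = [2, 4, 4, 4, 5, 5, 7, 90]

for label, values in [("original", data), ("with outlier", with_outlier)]:
    print(f"{label:>12}: QD = {quartile_deviation(values):.2f}, "
          f"SD = {statistics.pstdev(values):.2f}")
```

Here QD stays at 1.25 in both cases, while the standard deviation jumps from 2.00 to roughly 28.3, because the outlier lies outside the middle 50% of the data.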

Quartile Deviation for Ungrouped Data

22	22	25	25	30	30	30	31	31	33	36	39
40	40	42	42	48	48	50	51	52	55	57	59
81	86	89	89	90	91	91	91	92	93	93	93
93	94	94	94	95	96	96	96	97	97	98	98
99	99	99	100	100	100	101	101	102	102	102	102
102	103	103	104	104	104	105	106	106	106	107	108
108	108	109	109	109	110	111	112	112	113	113	113
113	114	115	116	116	117	117	117	118	118	119	121

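The above data is already sorted and there are a total of 96 observations. Using the simple positional rule below, the first and third quartiles of the data can be computed as follows (other quartile conventions, such as the $(n+1)/4$ rule with interpolation, can give different values):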

$Q_1 = \left(\frac{n}{4}\right)$th value $= \left(\frac{96}{4}\right)$th value $=$ 24th value. The 24th observation is 59, therefore, $Q_1=59$.

$Q_3 = \left(\frac{3n}{4}\right)$th value $= \left(\frac{3\times 96}{4}\right)$th value $=$ 72nd value. The 72nd observation is 108, therefore, $Q_3=108$.

The quartile deviation will be

$$QD=\frac{Q_3 - Q_1}{2} = \frac{108-59}{2} = 24.5$$

The interquartile range is $IQR = Q_3 - Q_1 = 108 - 59 = 49$.

The coefficient of quartile deviation will be

$$Coefficient\,\, of\,\, QD = \frac{Q_3 - Q_1}{Q_3 + Q_1}\times 100 = \frac{108-59}{108+59}\times 100 = 29.34\%$$
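The ungrouped computation can be reproduced in Python as a check. The helper `positional_quartile` is hypothetical; it follows the $(kn/4)$th-value rule used above and, as an added assumption not needed for $n = 96$, rounds a non-integer position up. Library functions that interpolate between observations (for example, NumPy's default `percentile` method) can return noticeably different quartiles for this data, because there is a large gap between the 24th and 25th observations.

```python
import math

# The 96 sorted observations from the ungrouped example
data = [
    22, 22, 25, 25, 30, 30, 30, 31, 31, 33, 36, 39,
    40, 40, 42, 42, 48, 48, 50, 51, 52, 55, 57, 59,
    81, 86, 89, 89, 90, 91, 91, 91, 92, 93, 93, 93,
    93, 94, 94, 94, 95, 96, 96, 96, 97, 97, 98, 98,
    99, 99, 99, 100, 100, 100, 101, 101, 102, 102, 102, 102,
    102, 103, 103, 104, 104, 104, 105, 106, 106, 106, 107, 108,
    108, 108, 109, 109, 109, 110, 111, 112, 112, 113, 113, 113,
    113, 114, 115, 116, 116, 117, 117, 117, 118, 118, 119, 121,
]


def positional_quartile(sorted_data, k):
    """Return the (k*n/4)-th value (1-based position), rounding the position up if needed."""
    n = len(sorted_data)
    position = math.ceil(k * n / 4)
    return sorted_data[position - 1]   # convert the 1-based position to a 0-based index


values = sorted(data)
q1 = positional_quartile(values, 1)
q3 = positional_quartile(values, 3)
qd = (q3 - q1) / 2
iqr = q3 - q1
coefficient = (q3 - q1) / (q3 + q1) * 100
print(f"n={len(values)}, Q1={q1}, Q3={q3}, QD={qd}, IQR={iqr}, coefficient={coefficient:.2f}%")
```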

Quartile Deviation for Grouped Data

Consider the following example for grouped data to compute the quartile deviation.

Classes	Frequencies	Class Boundaries	Cumulative Frequency (CF)
11-14.9	11	10.95-14.95	11
15-20.9	19	14.95-20.95	30
21-24.9	21	20.95-24.95	51
25-30.9	34	24.95-30.95	85
31-34.9	16	30.95-34.95	101
35-40.9	9	34.95-40.95	110
41-44.9	4	40.95-44.95	114
Total	114

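The first and third quartiles of the grouped data above are found by interpolating within the quartile class, where $l$ is the lower class boundary of the quartile class, $h$ its width, $f$ its frequency, and $C$ the cumulative frequency of the class preceding it. Since $n/4 = 28.5$, $Q_1$ lies in the class 14.95-20.95 ($h=6$, $f=19$, $C=11$); since $3n/4 = 85.5$, $Q_3$ lies in the class 30.95-34.95 ($h=4$, $f=16$, $C=85$).

\begin{align*}
Q_1 &= l + \frac{h}{f}\left(\frac{n}{4} - C\right)\\
&= 14.95 + \frac{6}{19}\left(\frac{114}{4} - 11\right)\\
&= 14.95 + \frac{6}{19}(28.5 - 11) = 14.95 + 5.526 \approx 20.476\\
Q_3 &= l + \frac{h}{f}\left(\frac{3n}{4} - C\right)\\
&= 30.95 + \frac{4}{16}\left(\frac{3\times 114}{4} - 85\right)\\
&= 30.95 + \frac{4}{16}(85.5 - 85) = 30.95 + 0.125 = 31.075
\end{align*}

The QD is

$$QD = \frac{Q_3 - Q_1}{2} = \frac{31.075 - 20.476}{2} = \frac{10.599}{2} \approx 5.30$$

The Interquartile Range will be

$$IQR = Q_3 - Q_1 = 31.075 - 20.476 = 10.599 \approx 10.60$$

The coefficient of quartile deviation is

$$Coefficient\,\,of\,\, QD = \frac{Q_3 - Q_1}{Q_3 + Q_1}\times 100 = \frac{31.075 - 20.476}{31.075+20.476}\times 100 \approx 20.56\%$$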
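The grouped-data interpolation formula can be sketched in Python as below. The helper `grouped_quartile` and the `(lower, upper, frequency)` tuple layout are hypothetical choices for illustration; the sketch assumes the classes are listed in increasing order with positive frequencies, and it takes $h$ as the width of the quartile class read from the Class Boundaries column of the table.

```python
# Grouped data from the table: (lower boundary, upper boundary, frequency)
classes = [
    (10.95, 14.95, 11),
    (14.95, 20.95, 19),
    (20.95, 24.95, 21),
    (24.95, 30.95, 34),
    (30.95, 34.95, 16),
    (34.95, 40.95, 9),
    (40.95, 44.95, 4),
]


def grouped_quartile(classes, k):
    """Interpolated k-th quartile (k = 1, 2, 3): l + (h / f) * (k*n/4 - C)."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    n = sum(f for _, _, f in classes)
    target = k * n / 4                     # position k*n/4
    cumulative = 0                         # C: cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= target:       # quartile class: first class whose CF reaches k*n/4
            h = upper - lower              # class width from the boundaries
            return lower + (h / f) * (target - cumulative)
        cumulative += f
    raise ValueError("no quartile class found")


q1 = grouped_quartile(classes, 1)
q3 = grouped_quartile(classes, 3)
qd = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1) * 100
print(f"Q1={q1:.3f}, Q3={q3:.3f}, QD={qd:.3f}, IQR={q3 - q1:.3f}, coefficient={coefficient:.2f}%")
```

Reading $h$ from each class's own boundaries matters here, because the classes in this table do not all have the same width.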

Advantages of QD

Less affected by outliers: This makes it suitable for skewed data.
Easy to calculate: Relatively simple compared to standard deviation.

Disadvantages of QD

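Ignores extreme values: Because it uses only the middle 50% of the data, it may not give a complete picture of the data's spread.
Less sensitive to changes in data: Values outside the quartiles can change without affecting QD, whereas the standard deviation reflects every observation.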

In summary, quartile deviation is a useful tool for understanding the spread of data, particularly when outliers are present. By focusing on the middle 50% of the data, it provides a robust measure of dispersion that is less sensitive to extreme values. However, it is important to keep its limitations in mind: it ignores the observations outside the quartiles and therefore does not respond to changes in the tails of the data.

Frequently Asked Questions about Quartile Deviation

What is quartile deviation?
What are the advantages of QD?
What are the disadvantages of QD?
What is IQR?
What is Semi-IQR?
How is QD interpreted?
How is QD computed for grouped and ungrouped data?
When should QD be used?


Measures of Dispersion: Variance (2021)

May 13, 2024 / Aug 25, 2021 by Muhammad Imdad Ullah

Variance is one of the most important measures of dispersion of a data set or of the distribution of a random variable. The term variance was introduced by R. A. Fisher in 1918. The variance of a set of observations (data set) is defined as the mean of the squared deviations of all the observations from their mean (for a sample, the sum of squared deviations is divided by $n-1$ instead of $n$). When it is computed for the entire population, it is called the population variance, usually denoted by $\sigma^2$; when it is computed from sample data, it is called the sample variance and denoted by $S^2$ to distinguish the two. Variance is also denoted by $Var(X)$ when we speak about the variance of a random variable. The population and sample variances are defined as

$\sigma^2=\frac{\sum (X_i - \mu)^2}{N}; \quad \text{for population data}$

$S^2=\frac{\sum (X_i - \overline{X})^2}{n-1}; \quad \text{for sample data}$
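The two formulas differ only in the divisor. The sketch below implements them directly and compares the results with Python's `statistics.pvariance` (population) and `statistics.variance` (sample). The function name `variance` and the data values are hypothetical, used only for illustration.

```python
import statistics


def variance(data, sample=False):
    """Sum of squared deviations from the mean, divided by N (population) or n - 1 (sample)."""
    n = len(data)
    if sample and n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(data) / n
    sum_sq = sum((x - mean) ** 2 for x in data)
    return sum_sq / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
print(variance(data), statistics.pvariance(data))              # population variance
print(variance(data, sample=True), statistics.variance(data))  # sample variance
```

Dividing by $n-1$ makes the sample variance slightly larger than the population formula applied to the same numbers.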

It should be noted that the variance is expressed in the square of the units of the observations (for example, cm$^2$ when the data are in cm), so its numerical value is not directly comparable with the observations themselves. Because of its convenient mathematical properties, the variance plays an extremely important role in statistical theory.

If the standard deviation is known, the variance follows directly, since the variance is the square of the standard deviation, i.e., Variance = (Standard Deviation)$^2$.

Variance can be used to compare the dispersion of two or more sets of observations measured in the same units. Variance can never be negative, since every term in it is a squared quantity, either positive or zero.
To calculate the variance and the standard deviation, follow these steps:

1. First, find the mean of the data.
2. Take the difference of each observation from the mean of the data set. The sum of these differences is zero (or near zero because of rounding).
3. Square the differences obtained in step 2; each squared difference is greater than or equal to zero.
4. Sum all the squared quantities obtained in step 3. We call this the sum of squares of differences.
5. Divide this sum of squares of differences by the total number of observations ($N$) to obtain the population variance ($\sigma^2$). For the sample variance ($S^2$), divide it by the number of observations minus one ($n-1$), i.e., the degrees of freedom.
6. Find the square root of the quantity obtained in step 5. The result is the standard deviation ($\sigma$ or $S$) of the data set.
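The six steps can be traced in Python as follows, printing the intermediate quantities so they can be compared with a hand calculation. The data values and the `sample` flag are hypothetical choices for illustration.

```python
import math

data = [12, 15, 11, 18, 14]   # hypothetical observations
sample = True                 # True: divide by n - 1 (sample); False: divide by N (population)

# Step 1: mean of the data
mean = sum(data) / len(data)
# Step 2: deviations from the mean (they sum to zero, up to rounding)
deviations = [x - mean for x in data]
# Step 3: squared deviations (each is >= 0)
squares = [d ** 2 for d in deviations]
# Step 4: sum of squares of differences
sum_sq = sum(squares)
# Step 5: divide by n - 1 (sample) or N (population) to get the variance
divisor = len(data) - 1 if sample else len(data)
var = sum_sq / divisor
# Step 6: the square root of the variance is the standard deviation
sd = math.sqrt(var)

print(f"mean = {mean}, sum of deviations = {sum(deviations)}")
print(f"sum of squares = {sum_sq}, variance = {var}, SD = {sd:.4f}")
```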

The major characteristics of the variance are:
a)   All of the observations are used in the calculation.
b)   Variance is strongly influenced by extreme observations, because deviations are squared before they are averaged.
c)   The variance is not in the same units as the observations; it is in the square of the units in which the observations are expressed.

Consider a scenario: Imagine two groups of students both score an average of 70% on an exam. However, in Group A, most scores are clustered around 70%, while in Group B, scores are spread out widely. The measure of spread (like standard deviation or variance) helps distinguish these scenarios, providing a more nuanced understanding of student performance.
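A small made-up version of this scenario: both lists of scores below average 70, but their standard deviations differ sharply. The scores are hypothetical and chosen only for illustration; the sketch uses the sample standard deviation from Python's `statistics` module.

```python
import statistics

group_a = [68, 69, 70, 70, 71, 72]   # hypothetical scores clustered near 70
group_b = [45, 55, 70, 75, 85, 90]   # hypothetical scores spread out widely

for name, scores in [("Group A", group_a), ("Group B", group_b)]:
    print(f"{name}: mean = {statistics.mean(scores)}, "
          f"SD = {statistics.stdev(scores):.2f}")
```

The means are identical, so only the measure of spread separates the two groups.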

By quantifying how far the data points are spread around the average value (mean), the standard deviation offers valuable insights in many practical settings. It supports data-driven decision making in quality control, investment analysis, scientific research, and other fields.

Standard Deviation: A Measure of Dispersion (2017)

Mar 27, 2024 / Dec 24, 2017 by Muhammad Imdad Ullah

The standard deviation is a widely used concept in statistics that tells how much variation (spread or dispersion) there is in a data set. It can be defined as the positive square root of the mean (average) of the squared deviations of the values from their mean.
Calculation of Standard Deviation

To calculate the standard deviation, follow these steps:

1. First, find the mean of the data.
2. Take the difference of each data point from the mean computed in step 1. Note that the sum of these differences equals zero (or is close to zero because of rounding).
3. Square each difference obtained in step 2; every squared difference is greater than or equal to zero.
4. Add up all the squared quantities obtained in step 3. We call this the sum of squares of differences.
5. Divide the sum of squares of differences (step 4) by the total number of observations ($N$) to obtain the population variance ($\sigma^2$). To compute the sample variance ($S^2$) instead, divide it by the number of observations minus one ($n-1$), i.e., the degrees of freedom, where $n$ is the number of observations in the data set.
6. Take the square root of the quantity obtained in step 5. The result is the standard deviation (SD) of the data set: $\sigma$ for a population, $S$ for a sample.

For a set of observations $X_1, X_2, \cdots$, the population SD ($\sigma$, based on $N$ observations with mean $\mu$) and the sample SD ($S$, based on $n$ observations with mean $\overline{X}$) are

\begin{aligned}
\sigma &=\sqrt{\frac{\sum_{i=1}^N (X_i-\mu)^2}{N}}; \quad \text{population SD}\\
S&=\sqrt{ \frac{\sum_{i=1}^n (X_i-\overline{X})^2}{n-1}}; \quad \text{sample SD}
\end{aligned}

The standard deviation can also be computed from the variance, since $\sigma = \sqrt{\sigma^2}$ and $S = \sqrt{S^2}$.
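Because the standard deviation is the square root of the variance, both routes give the same number, and the result is back in the original units of the data. The sketch below checks this with Python's `statistics` module (`pstdev`/`pvariance` for the population formulas, `stdev`/`variance` for the sample formulas) on hypothetical data.

```python
import math
import statistics

data = [12, 15, 11, 18, 14]   # hypothetical observations

sigma = statistics.pstdev(data)   # population SD: divides by N
s = statistics.stdev(data)        # sample SD: divides by n - 1

# Each SD equals the square root of the corresponding variance
assert math.isclose(sigma, math.sqrt(statistics.pvariance(data)))
assert math.isclose(s, math.sqrt(statistics.variance(data)))
print(f"sigma = {sigma:.4f}, S = {s:.4f}")
```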

For data that are approximately normally distributed (bell-shaped), about 68% of the values lie within one standard deviation of the mean ($\mu \pm \sigma$), about 95% lie within $\mu \pm 2\sigma$, and about 99.7% lie within $\mu \pm 3\sigma$. This is known as the empirical rule; for strongly skewed or otherwise non-normal data, these percentages need not hold.
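The empirical rule can be checked by simulation. The sketch below draws values from a normal distribution with mean 0 and standard deviation 1 (an assumption chosen for illustration) using Python's `random.gauss`, and counts the share of values within 1, 2 and 3 standard deviations of the mean.

```python
import random

random.seed(1)                      # fixed seed so the run is reproducible
mu, sigma, n = 0.0, 1.0, 100_000    # assumed normal parameters and sample size
values = [random.gauss(mu, sigma) for _ in range(n)]

shares = {}
for k in (1, 2, 3):
    # Fraction of simulated values within k standard deviations of the mean
    shares[k] = sum(abs(x - mu) <= k * sigma for x in values) / n
    print(f"within {k} SD: {shares[k]:.3f}")
```

With this many draws the shares come out close to 0.68, 0.95 and 0.997; data from a skewed distribution would not necessarily follow these proportions.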

Examples

A large SD indicates more spread in the data set, which can be interpreted as inconsistent behavior of the data collected: the data points tend to lie far from the mean. A small SD means the data points tend to be close to the mean, indicating consistent behavior of the data set.

The standard deviation and variance are used to measure the risk of an investment in finance. Suppose an investment has a mean return of 15% and a standard deviation of 2%. If returns are approximately normally distributed, the expected return is 15%, there is about a 68% chance that the return will fall between 13% and 17% ($15 \pm 2$), and about a 95% chance that it will fall between 11% and 19% ($15 \pm 2\times 2$).
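Under the stated normality assumption, these probabilities can be computed with Python's `statistics.NormalDist`. The model below (mean 15%, standard deviation 2%) is the assumed distribution of returns from the example, not real market data.

```python
from statistics import NormalDist

mean, sd = 15, 2                       # assumed mean return and SD, in percent
returns = NormalDist(mu=mean, sigma=sd)

for k in (1, 2):
    low, high = mean - k * sd, mean + k * sd
    prob = returns.cdf(high) - returns.cdf(low)   # P(low <= return <= high)
    print(f"P({low}% <= return <= {high}%) = {prob:.4f}")
```

The exact normal probabilities are about 0.6827 and 0.9545, which the example rounds to 68% and 95%.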
