Importance of Dispersion in Statistics

The importance of dispersion in statistics cannot be ignored. The term dispersion (also called spread or variability) describes how much the values in a data set vary. A measure of dispersion summarizes how far, on average, the data points lie from the average (or from another measure of center), and therefore tells us how consistent a data set is.

Dispersion measures how far the observations lie from their center point (such as the average). Data with less variation about the center point are said to be more consistent: the lower the variability, the more consistent the data.

Example of Measure of Dispersion

Suppose the scores of three batsmen in three cricket matches are:

| Player | Match 1 | Match 2 | Match 3 | Average Score |
|---|---|---|---|---|
| A | 70 | 80 | 90 | 80 |
| B | 75 | 80 | 85 | 80 |
| C | 65 | 80 | 95 | 80 |

The question is: which player is the most consistent in his performance?

In the above data set, the player whose scores deviate least from the average is the most consistent. So player B, who shows the least variation, is more consistent than the others.
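To make the comparison concrete, the short Python sketch below computes the range and the mean (absolute) deviation of each player's scores. It assumes player B's third score is 85, the value implied by his stated average of 80; the helper names `mean` and `mean_abs_deviation` are illustrative, not taken from any library.

```python
# Scores from the table above. Player B's third score is taken as 85,
# the value implied by his stated average of 80.
scores = {
    "A": [70, 80, 90],
    "B": [75, 80, 85],
    "C": [65, 80, 95],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def mean_abs_deviation(values):
    """Average absolute deviation of the values from their own mean."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)


for player, runs in scores.items():
    print(f"{player}: mean = {mean(runs):.2f}, "
          f"range = {max(runs) - min(runs)}, "
          f"mean deviation = {mean_abs_deviation(runs):.2f}")

# The most consistent player is the one with the smallest mean deviation.
most_consistent = min(scores, key=lambda p: mean_abs_deviation(scores[p]))
print("Most consistent player:", most_consistent)
```

For these scores the mean deviations are about 6.67 (A), 3.33 (B), and 10.00 (C), so B comes out as the most consistent player.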

There are two types of measures of dispersion:

Absolute Measures of Dispersion

An absolute measure of dispersion is expressed in the original units in which the data are collected. For example, if the data are in grams, the range, quartile deviation, average deviation, and standard deviation are also in grams (the variance is the exception: it is in squared units, such as grams²). The absolute measures of dispersion are:

  • Range
  • Quartile Deviation
  • Average Deviation
  • Standard Deviation
  • Variance

Relative Measures of Dispersion

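A relative measure of dispersion is unit-free: it is expressed as a coefficient, ratio, or percentage, which makes it possible to compare the dispersion of data sets measured in different units or with very different averages. The relative measures of dispersion are: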

  • Coefficient of range
  • Coefficient of Quartile Deviation
  • Coefficient of Average Deviation
  • Coefficient of Variation (CV)


Range and Coefficient of Range

The range is the difference between the maximum and minimum values of the data, that is, $R = x_{max} - x_{min}$.

The coefficient of range is $\frac{x_{max} - x_{min}}{x_{max} + x_{min}}$. Multiplying it by 100 expresses it as a percentage.

Consider the ungrouped data $x = 32, 36, 36, 37, 39, 41, 45, 46, 48$

The range will be $x_{max} - x_{min} = 48 - 32 = 16$.

The coefficient of range will be

\begin{align*}
Coef\,\, of\,\, Range &=\frac{x_{max} - x_{min} }{x_{max} + x_{min} } \\
&= \frac{48-32}{48+32} = \frac{16}{80} = 0.2\\
&\Rightarrow 0.2 \times 100 = 20\%
\end{align*}
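The same arithmetic can be checked with a few lines of Python. This is a minimal sketch using only built-in functions; the variable name `data_range` is chosen here to avoid shadowing Python's built-in `range`.

```python
data = [32, 36, 36, 37, 39, 41, 45, 46, 48]

x_max, x_min = max(data), min(data)
data_range = x_max - x_min                       # R = x_max - x_min
coef_range = (x_max - x_min) / (x_max + x_min)   # unit-free relative measure

print("Range:", data_range)                          # Range: 16
print(f"Coefficient of range: {coef_range:.2f}")     # Coefficient of range: 0.20
print(f"As a percentage: {coef_range * 100:.0f}%")   # As a percentage: 20%
```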

For the following grouped data, the range and coefficient of the range will be

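| Classes | Frequency | Class Boundaries |
|---|---|---|
| 65 – 84 | 9 | 64.5 – 84.5 |
| 85 – 104 | 10 | 84.5 – 104.5 |
| 105 – 124 | 17 | 104.5 – 124.5 |
| 125 – 144 | 10 | 124.5 – 144.5 |
| 145 – 164 | 5 | 144.5 – 164.5 |
| 165 – 184 | 4 | 164.5 – 184.5 |
| 185 – 204 | 5 | 184.5 – 204.5 |
| Total | 60 | |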

The upper class boundary of the highest class is taken as $x_{max}$ and the lower class boundary of the lowest class as $x_{min}$, so $x_{max} = 204.5$ and $x_{min} = 64.5$. Therefore,

$$Range = x_{max} - x_{min} = 204.5 - 64.5 = 140$$

The Coefficient of Range will be

\begin{align*}
Coef\,\, of\,\, Range &=\frac{x_{max} - x_{min} }{x_{max} + x_{min} } \\
&= \frac{204.5-64.5}{204.5+64.5} = \frac{140}{269} = 0.5204\\
&\Rightarrow 0.5204 \times 100 = 52.04\%
\end{align*}
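For grouped data, only the outermost class boundaries enter the calculation. The sketch below assumes the boundaries are stored as (lower, upper) pairs in a plain Python list, a representation chosen here only for illustration; the frequencies are kept just to confirm the total.

```python
# Class boundaries (lower, upper) and frequencies from the table above.
boundaries = [(64.5, 84.5), (84.5, 104.5), (104.5, 124.5), (124.5, 144.5),
              (144.5, 164.5), (164.5, 184.5), (184.5, 204.5)]
freqs = [9, 10, 17, 10, 5, 4, 5]

x_min = boundaries[0][0]    # lower boundary of the lowest class
x_max = boundaries[-1][1]   # upper boundary of the highest class

grouped_range = x_max - x_min
coef_range = (x_max - x_min) / (x_max + x_min)

print("Total frequency:", sum(freqs))             # Total frequency: 60
print("Range:", grouped_range)                    # Range: 140.0
print(f"Coefficient of range: {coef_range:.4f}")  # Coefficient of range: 0.5204
```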

Average Deviation and Coefficient of Average Deviation

The average deviation (also called the mean deviation) is an absolute measure of dispersion. It is the mean of the absolute deviations of the observations from a central value, which may be the mean, the median, or the mode. Taken about the mean, it is

$$Mean\,\, Deviation_{\overline{X}} = \frac{\sum\limits_{i=1}^n|x_i-\overline{x}|}{n}$$

For the ungrouped data used above, the deviations from the mean ($\overline{x}$), median ($\tilde{x}$), and mode ($\hat{x}$) are:

| $x$ | $x-\overline{x}$ | $\lvert x-\overline{x}\rvert$ | $x-\tilde{x}$ | $\lvert x-\tilde{x}\rvert$ | $x-\hat{x}$ | $\lvert x-\hat{x}\rvert$ |
|---|---|---|---|---|---|---|
| 32 | $32-40=-8$ | 8 | $32-39=-7$ | 7 | $32-36=-4$ | 4 |
| 36 | $36-40=-4$ | 4 | $36-39=-3$ | 3 | $36-36=0$ | 0 |
| 36 | $36-40=-4$ | 4 | $36-39=-3$ | 3 | $36-36=0$ | 0 |
| 37 | $37-40=-3$ | 3 | $37-39=-2$ | 2 | $37-36=1$ | 1 |
| 39 | $39-40=-1$ | 1 | $39-39=0$ | 0 | $39-36=3$ | 3 |
| 41 | $41-40=1$ | 1 | $41-39=2$ | 2 | $41-36=5$ | 5 |
| 45 | $45-40=5$ | 5 | $45-39=6$ | 6 | $45-36=9$ | 9 |
| 46 | $46-40=6$ | 6 | $46-39=7$ | 7 | $46-36=10$ | 10 |
| 48 | $48-40=8$ | 8 | $48-39=9$ | 9 | $48-36=12$ | 12 |
| Total | 0 | 40 | 9 | 39 | 36 | 44 |

Where
\begin{align*}
Mean &= \overline{x} = \frac{\sum\limits_{i=1}^n x_i}{n} = \frac{360}{9} = 40\\
Median &= \tilde{x} = 39\\
Mode &= \hat{x} = 36\\
MD_{\overline{x}} &= \frac{\sum\limits_{i=1}^n |x_i-\overline{x}|}{n} = \frac{40}{9} = 4.44\\
MD_{\tilde{x}} &= \frac{\sum\limits_{i=1}^n |x_i-\tilde{x}|}{n} = \frac{39}{9} = 4.33\\
MD_{\hat{x}} &= \frac{\sum\limits_{i=1}^n |x_i-\hat{x}|}{n} = \frac{44}{9} = 4.89
\end{align*}

The relative measure of average deviation is the coefficient of average deviation. It is obtained by dividing the average deviation by the central value about which it was computed:

Coefficient of Average Deviation from Mean (also called Mean Coefficient of Dispersion)

\begin{align*}\text{Mean Coefficient of Dispersion} = \frac{MD_{\overline{x}}}{\overline{x}}\times 100 = \frac{4.44}{40}\times 100 = 11.1\%\end{align*}

Coefficient of Average Deviation from Median (also called Median Coefficient of Dispersion)

\begin{align*}\text{Median Coefficient of Dispersion} = \frac{MD_{\tilde{x}}}{\tilde{x}}\times 100 = \frac{4.33}{39}\times 100 = 11.1\%\end{align*}

Coefficient of Average Deviation from Mode (also called Mode Coefficient of Dispersion)

\begin{align*}\text{Mode Coefficient of Dispersion} = \frac{MD_{\hat{x}}}{\hat{x}}\times 100 = \frac{4.89}{36}\times 100 = 13.58\%\end{align*}
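The following Python sketch recomputes the three average deviations and their coefficients directly from the raw data, which is a convenient way to check the totals in the table. It uses `mean`, `median`, and `mode` from Python's standard `statistics` module; the helper `mean_deviation` is illustrative.

```python
from statistics import mean, median, mode

data = [32, 36, 36, 37, 39, 41, 45, 46, 48]


def mean_deviation(values, center):
    """Average of the absolute deviations |x - center|."""
    return sum(abs(x - center) for x in values) / len(values)


centers = {"mean": mean(data), "median": median(data), "mode": mode(data)}

results = {}
for name, c in centers.items():
    md = mean_deviation(data, c)
    coef = md / c * 100          # coefficient of average deviation, in percent
    results[name] = (md, coef)
    print(f"About the {name} ({c}): MD = {md:.2f}, coefficient = {coef:.2f}%")
```

Because each sum is recomputed from the observations, the script is independent of any hand-entered totals.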

Average Deviation for Grouped Data

Average deviations can also be computed for grouped data. In the discrete case below, each value (or class mid-point) $x$ is weighted by its frequency $f$:

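| $x$ (Mid Point) | $f$ | $fx$ | $\lvert x-\overline{x}\rvert$ | $f\lvert x-\overline{x}\rvert$ | $\lvert x-\tilde{x}\rvert$ | $f\lvert x-\tilde{x}\rvert$ |
|---|---|---|---|---|---|---|
| 10 | 9 | 90 | $\lvert 10-34\rvert=24$ | 216 | 20 | 180 |
| 20 | 10 | 200 | $\lvert 20-34\rvert=14$ | 140 | 10 | 100 |
| 30 | 17 | 510 | $\lvert 30-34\rvert=4$ | 68 | 0 | 0 |
| 40 | 10 | 400 | $\lvert 40-34\rvert=6$ | 60 | 10 | 100 |
| 50 | 5 | 250 | $\lvert 50-34\rvert=16$ | 80 | 20 | 100 |
| 60 | 4 | 240 | $\lvert 60-34\rvert=26$ | 104 | 30 | 120 |
| 70 | 5 | 350 | $\lvert 70-34\rvert=36$ | 180 | 40 | 200 |
| Total | 60 | 2040 | | 848 | | 800 |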

\begin{align*}
\overline{x} &= \frac{\sum fx}{\sum f} = \frac{2040}{60} = 34\\
\tilde{x} &= 30\\
\hat{x} &= 30\\
MD_{\overline{x}} &= \frac{\sum f|x-\overline{x}|}{\sum f} = \frac{848}{60} = 14.13\\
MD_{\tilde{x}} &= \frac{\sum f|x-\tilde{x}|}{\sum f} = \frac{800}{60} = 13.33\\
MD_{\hat{x}} &= \frac{\sum f|x-\hat{x}|}{\sum f} = \frac{800}{60} = 13.33 \quad (\text{since } \hat{x} = \tilde{x} = 30)\\
\text{Mean Coefficient of Dispersion} &= \frac{MD_{\overline{x}}}{\overline{x}}\times 100 = \frac{14.133}{34}\times 100 = 41.57\%\\
\text{Median Coefficient of Dispersion} &= \frac{MD_{\tilde{x}}}{\tilde{x}}\times 100 = \frac{13.333}{30}\times100=44.44\%
\end{align*}
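For a discrete frequency distribution, each absolute deviation is weighted by its frequency and the total is divided by $\sum f$. Below is a minimal Python sketch of this, assuming the values and frequencies are held in two parallel lists. The median and mode are found by expanding the frequency table into individual observations and calling the standard `statistics` functions, which is practical only for small totals such as $n = 60$.

```python
from statistics import median, mode

# Values (or class mid-points) and their frequencies from the table above.
x = [10, 20, 30, 40, 50, 60, 70]
f = [9, 10, 17, 10, 5, 4, 5]

n = sum(f)                                          # total frequency
mean_x = sum(fi * xi for fi, xi in zip(f, x)) / n   # sum(fx) / sum(f)

# Expand each value by its frequency to get the median and mode.
expanded = [xi for xi, fi in zip(x, f) for _ in range(fi)]
median_x = median(expanded)
mode_x = mode(expanded)


def grouped_mean_deviation(center):
    """sum(f * |x - center|) / sum(f) for the table above."""
    return sum(fi * abs(xi - center) for fi, xi in zip(f, x)) / n


md_mean = grouped_mean_deviation(mean_x)      # 848 / 60
md_median = grouped_mean_deviation(median_x)  # 800 / 60
md_mode = grouped_mean_deviation(mode_x)      # 800 / 60 (mode equals median here)

print(f"MD about mean   = {md_mean:.2f}, coefficient = {md_mean / mean_x * 100:.2f}%")
print(f"MD about median = {md_median:.2f}, coefficient = {md_median / median_x * 100:.2f}%")
print(f"MD about mode   = {md_mode:.2f}, coefficient = {md_mode / mode_x * 100:.2f}%")
```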

Importance of Dispersion in Statistics

As the above discussion and numerical examples show, variability (dispersion) is crucial in statistics. Some reasons for the importance of dispersion in statistics are:

  • Understanding Data Spread: Variability gives insights into the spread or distribution of data, helping to understand how much individual data points differ from the average or some other measure.
  • Data Reliability: Lower variability in data can indicate higher reliability and consistency, which is key for making sound predictions and decisions.
  • Identifying Outliers: High variability can indicate the presence of outliers or anomalies in the data, which might require further investigation.
  • Comparing Datasets: Dispersion measures, such as variance and standard deviation, allow for the comparison of different datasets. Two datasets might have the same mean but different levels of dispersion, which can imply different data patterns or behaviors.
  • Risk Assessment: In fields like finance, assessing the variability of returns is crucial for understanding and managing risk. Higher variability often implies higher risk.
  • Statistical Inferences: Many statistical methods, such as hypothesis testing and confidence intervals, rely on the variability of data to make accurate inferences about populations from samples.
  • Balanced Decision Making: Understanding variability helps in making more informed decisions by providing a clearer picture of the data’s characteristics and potential fluctuations.

Overall, variability is essential for a comprehensive understanding of data, enabling analysts to draw meaningful conclusions and make informed decisions.


Quartile Deviation (2025)

The quartile deviation (QD), also known as the semi-interquartile range (semi-IQR), is an absolute measure of dispersion that focuses on the middle 50% of the data. It is defined as half the difference between the upper (third) quartile ($Q_3$) and the lower (first) quartile ($Q_1$). Mathematically,

$$QD = \frac{Q_3-Q_1}{2}$$

Note that the interquartile range (IQR) is the full difference between the upper quartile ($Q_3$) and the lower quartile ($Q_1$), so that $QD = IQR/2$:

$$Interquartile\,\, Range = IQR = Q_3 - Q_1$$

The Relative Measure of Quartile Deviation is the Coefficient of Quartile Deviation and is given as

$$Coefficient\,\,of\,\,QD = \frac{Q_3 – Q_1}{Q_3 + Q_1}\times 100$$


When to Use QD

  • When dealing with skewed data or data with outliers.
  • When a quick and easy measure of dispersion is needed.

Interpretation of QD

Spread: A larger quartile deviation indicates greater variability in the middle portion of the data.
Outliers: QD is less sensitive to extreme values (outliers) compared to the standard deviation.

Quartile Deviation for Ungrouped Data

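22, 22, 25, 25, 30, 30, 30, 31, 31, 33, 36, 39
40, 40, 42, 42, 48, 48, 50, 51, 52, 55, 57, 59
81, 86, 89, 89, 90, 91, 91, 91, 92, 93, 93, 93
93, 94, 94, 94, 95, 96, 96, 96, 97, 97, 98, 98
99, 99, 99, 100, 100, 100, 101, 101, 102, 102, 102, 102
102, 103, 103, 104, 104, 104, 105, 106, 106, 106, 107, 108
108, 108, 109, 109, 109, 110, 111, 112, 112, 113, 113, 113
113, 114, 115, 116, 116, 117, 117, 117, 118, 118, 119, 121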

The above data is already sorted and there are a total of 96 observations. The first and third quartiles of the data can be computed as follows:

$Q_1 = \left(\frac{n}{4}\right)th$ value $= \left(\frac{96}{4}\right)th$ value $= 24th$ value. The 24th observation is 59, therefore, $Q_1=59$.

$Q_3 = \left(\frac{3n}{4}\right)th$ value $= \left(\frac{3\times 96}{4}\right)th$ value $= 72nd$ value. The 72nd observation is 108, therefore, $Q_3=108$.

The quartile deviation will be

$$QD=\frac{Q_3 – Q_1}{2} = \frac{108-59}{2} = 24.5$$

The Interquartile Range $= IQR = Q_3 – Q_1 = 108 – 59 = 49$

The coefficient of quartile deviation will be

$$Coefficient\,\, of\,\, QD = \frac{Q_3 - Q_1}{Q_3 + Q_1}\times 100 = \frac{108-59}{108+59}\times 100 = 29.34\%$$
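The positional rule used above (taking the $\frac{n}{4}$th and $\frac{3n}{4}$th ordered values) can be sketched in Python as follows. This assumes, as in this example, that $n$ is divisible by 4; other quartile conventions, such as the interpolating methods used by Python's `statistics.quantiles`, can give different values of $Q_1$ and $Q_3$.

```python
# The 96 observations from the table above.
data = [
    22, 22, 25, 25, 30, 30, 30, 31, 31, 33, 36, 39,
    40, 40, 42, 42, 48, 48, 50, 51, 52, 55, 57, 59,
    81, 86, 89, 89, 90, 91, 91, 91, 92, 93, 93, 93,
    93, 94, 94, 94, 95, 96, 96, 96, 97, 97, 98, 98,
    99, 99, 99, 100, 100, 100, 101, 101, 102, 102, 102, 102,
    102, 103, 103, 104, 104, 104, 105, 106, 106, 106, 107, 108,
    108, 108, 109, 109, 109, 110, 111, 112, 112, 113, 113, 113,
    113, 114, 115, 116, 116, 117, 117, 117, 118, 118, 119, 121,
]

n = len(data)
data = sorted(data)

# Positional rule from the text: Q1 is the (n/4)th value, Q3 the (3n/4)th value.
# Python lists are 0-indexed, hence the "- 1". Assumes n is divisible by 4.
q1 = data[n // 4 - 1]
q3 = data[3 * n // 4 - 1]

iqr = q3 - q1
qd = iqr / 2
coef_qd = (q3 - q1) / (q3 + q1) * 100

print(f"n = {n}, Q1 = {q1}, Q3 = {q3}")
print(f"IQR = {iqr}, QD = {qd}, coefficient of QD = {coef_qd:.2f}%")
```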

Quartile Deviation for Grouped Data

Consider the following example for grouped data to compute the quartile deviation.

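| Classes | Frequencies | Class Boundaries | CF |
|---|---|---|---|
| 11-14.9 | 11 | 10.95-14.95 | 11 |
| 15-20.9 | 19 | 14.95-20.95 | 30 |
| 21-24.9 | 21 | 20.95-24.95 | 51 |
| 25-30.9 | 34 | 24.95-30.95 | 85 |
| 31-34.9 | 16 | 30.95-34.95 | 101 |
| 35-40.9 | 9 | 34.95-40.95 | 110 |
| 41-44.9 | 4 | 40.95-44.95 | 114 |
| Total | 114 | | |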

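The first and third quartiles for the above grouped data are found by interpolation within the quartile classes:

$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - C\right), \qquad Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - C\right)$$

where $l$ is the lower class boundary of the quartile class, $h$ its class width, $f$ its frequency, and $C$ the cumulative frequency of the class just before it. Since $\frac{n}{4} = 28.5$, $Q_1$ lies in the class from 14.95 to 20.95 ($h = 6$, $f = 19$, $C = 11$); since $\frac{3n}{4} = 85.5$, $Q_3$ lies in the class from 30.95 to 34.95 ($h = 4$, $f = 16$, $C = 85$). Therefore,

\begin{align*}
Q_1 &= 14.95 + \frac{6}{19}\left(\frac{114}{4} - 11\right)\\
&= 14.95 + \frac{6}{19}(28.5 - 11) = 20.48\\
Q_3 &= 30.95 + \frac{4}{16}\left(\frac{3\times 114}{4} - 85\right)\\
&= 30.95 + \frac{4}{16}(85.5 - 85) = 30.95 + 0.125 = 31.075 \approx 31.08
\end{align*}

The QD is

$$QD = \frac{Q_3 - Q_1}{2} = \frac{31.08 - 20.48}{2} = \frac{10.60}{2} = 5.30$$

The Interquartile Range will be

$$IQR = Q_3 - Q_1 = 31.08 - 20.48 = 10.60$$

The coefficient of quartile deviation is

$$Coefficient\,\,of\,\, QD = \frac{Q_3 - Q_1}{Q_3 + Q_1}\times 100 = \frac{31.08 - 20.48}{31.08+20.48}\times 100 = 20.56\%$$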
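The interpolation formula can be sketched in Python as below. The function `grouped_quantile` is a hypothetical helper written for this example: it locates the first class whose cumulative frequency reaches $p\,n$ and interpolates within it, using that class's own width $h$. Because the class widths in this table alternate between 6 and 4, the class containing $Q_3$ (30.95 to 34.95) has $h = 4$.

```python
# Class boundaries and frequencies from the grouped table above.
lower = [10.95, 14.95, 20.95, 24.95, 30.95, 34.95, 40.95]
upper = [14.95, 20.95, 24.95, 30.95, 34.95, 40.95, 44.95]
freq = [11, 19, 21, 34, 16, 9, 4]


def grouped_quantile(p):
    """Interpolated quantile l + (h / f) * (p*n - C) for the grouped table.

    p is the proportion (0.25 for Q1, 0.75 for Q3). l, h and f are the lower
    boundary, width and frequency of the first class whose cumulative
    frequency reaches p*n; C is the cumulative frequency before that class.
    """
    n = sum(freq)
    target = p * n
    cum = 0
    for l, u, fi in zip(lower, upper, freq):
        if cum + fi >= target:
            h = u - l                  # width of this particular class
            return l + h / fi * (target - cum)
        cum += fi
    raise ValueError("p must be at most 1")


q1 = grouped_quantile(0.25)
q3 = grouped_quantile(0.75)
qd = (q3 - q1) / 2
coef_qd = (q3 - q1) / (q3 + q1) * 100

print(f"Q1 = {q1:.2f}, Q3 = {q3:.3f}")
print(f"IQR = {q3 - q1:.2f}, QD = {qd:.2f}, coefficient of QD = {coef_qd:.2f}%")
```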

Advantages of QD

  • Less affected by outliers: this makes it suitable for skewed data.
  • Easy to calculate: relatively simple compared to the standard deviation.

Disadvantages of QD

  • Ignores extreme values: it may not give a complete picture of the data's spread.
  • Less sensitive to changes in the data: since it depends only on $Q_1$ and $Q_3$, it responds less to changes in individual values than the standard deviation does.

In summary, quartile deviation is a useful tool for understanding the spread of data, particularly when outliers are present. By focusing on the middle 50% of the data, it provides a robust measure of dispersion that is less sensitive to extreme values. However, it is important to keep its limitations in mind: it ignores the lowest and highest 25% of the data entirely, and it responds less to changes in individual values.

Frequently Asked Questions about Quartile Deviation

  1. What is quartile deviation?
  2. What are the advantages of QD?
  3. What are the disadvantages of QD?
  4. What is IQR?
  5. What is Semi-IQR?
  6. How is QD interpreted?
  7. How is QD computed for grouped and ungrouped data?
  8. When should QD be used?


Measures of Dispersion: Variance (2021)

Variance is one of the most important measures of dispersion of a distribution of a random variable. The term variance was introduced by R. A. Fisher in 1918. The variance of a set of observations (data set) is defined as the mean of the squares of deviations of all the observations from their mean. When it is computed for the entire population, the variance is called the population variance, usually denoted by $\sigma^2$, while for sample data, it is called sample variance and denoted by $S^2$ to distinguish between population variance and sample variance. Variance is also denoted by $Var(X)$ when we speak about the variance of a random variable. The symbolic definition of population and sample variance is

$\sigma^2=\frac{\sum (X_i - \mu)^2}{N}; \quad \text{for population data}$

$S^2=\frac{\sum (X_i - \overline{X})^2}{n-1}; \quad \text{for sample data}$

Note that the variance is expressed in the square of the units of the observations, so its numerical size is not directly comparable with the observations themselves. Because of its convenient mathematical properties, the variance plays an extremely important role in statistical theory.

If the standard deviation is known, the variance follows directly, since the variance is the square of the standard deviation: Variance = (Standard Deviation)$^2$.


Variance can be used to compare dispersion in two or more sets of observations. Variance can never be negative, since every term in it is a squared quantity, which is either positive or zero.
To calculate the standard deviation, follow these steps:

  1. Find the mean of the data.
  2. Subtract the mean from each observation. The sum of these differences should be zero (or close to zero because of rounding).
  3. Square the differences obtained in step 2. Each squared value is greater than or equal to zero.
  4. Sum all the squared quantities obtained in step 3. This is the sum of squares of differences.
  5. Divide this sum of squares by the total number of observations ($N$) to get the population variance, or by the number of observations minus one ($n-1$, the degrees of freedom) to get the sample variance.
  6. Take the square root of the quantity obtained in step 5. The result is the standard deviation ($\sigma$ for a population, $S$ for a sample) of the given data set.
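The six steps can be followed literally in Python. This sketch reuses the ungrouped data from the range and average deviation examples ($32, 36, \dots, 48$) purely for illustration, and cross-checks the results against `statistics.pvariance` and `statistics.variance` from the standard library.

```python
import math
import statistics

data = [32, 36, 36, 37, 39, 41, 45, 46, 48]
n = len(data)

mean = sum(data) / n                     # step 1: the mean
deviations = [x - mean for x in data]    # step 2: deviations (they sum to zero)
squared = [d ** 2 for d in deviations]   # step 3: squared deviations, all >= 0
ss = sum(squared)                        # step 4: sum of squares

pop_var = ss / n                         # step 5 (population): divide by N
sample_var = ss / (n - 1)                # step 5 (sample): divide by n - 1
pop_sd = math.sqrt(pop_var)              # step 6: square root
sample_sd = math.sqrt(sample_var)        # step 6: square root

print(f"Sum of deviations = {sum(deviations)}")
print(f"Population variance = {pop_var:.4f}, SD = {pop_sd:.4f}")
print(f"Sample variance     = {sample_var:.4f}, SD = {sample_sd:.4f}")

# Cross-check against the standard library.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_var, statistics.variance(data))
```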

The major characteristics of the variance are:
a) All of the observations are used in the calculation.
b) Variance is strongly influenced by extreme observations, because deviations are squared before they are averaged.
c) The variance is not in the same units as the observations; it is in the square of the units in which the observations are expressed.

Consider a scenario: imagine two groups of students who both score an average of 70% on an exam. However, in Group A most scores are clustered around 70%, while in Group B scores are spread out widely. A measure of spread (such as the standard deviation or variance) distinguishes these two situations and gives a more nuanced understanding of student performance.

By measuring how far the data points are spread out from the average value (mean), the standard deviation offers valuable insights in many practical situations. It supports data-driven decision making in quality control, investment analysis, scientific research, and other fields.

https://itfeature.com
