Dispersion is a central idea in statistics. The term dispersion (also called spread or variability) describes how much the values in a data set vary. A measure of dispersion summarizes, in a single number, how far the data points typically lie from the average or from another central value. It therefore indicates how consistent the data are.
Dispersion describes how far the observations lie from a central point (such as the average). Data with little variation about the central point are said to be more consistent: the smaller the variability, the more consistent the data.
Example of Measure of Dispersion
Suppose the scores of three batsmen in three cricket matches are as follows:
Player | Match 1 | Match 2 | Match 3 | Average Score |
---|---|---|---|---|
A | 70 | 80 | 90 | 80 |
B | 75 | 80 | 85 | 80 |
C | 65 | 80 | 95 | 80 |
The question is which player is the most consistent in his performance.
All three players have the same average, so the player whose scores deviate least from that average is the most consistent. Player B shows the least variation and is therefore more consistent than the others.
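The comparison can be checked numerically. The short Python sketch below is only an illustration: it computes each player's average absolute deviation from his own mean score. It assumes Player B's third score is 85, the value implied by the stated average of 80, and the names `scores` and `mean_abs_dev` are hypothetical.

```python
from statistics import mean

# Scores per player. Player B's third score is assumed to be 85,
# the value that makes his average equal to 80.
scores = {
    "A": [70, 80, 90],
    "B": [75, 80, 85],
    "C": [65, 80, 95],
}


def mean_abs_dev(values):
    """Average absolute deviation of the values from their own mean."""
    centre = mean(values)
    return sum(abs(v - centre) for v in values) / len(values)


deviations = {player: mean_abs_dev(vals) for player, vals in scores.items()}
most_consistent = min(deviations, key=deviations.get)

for player, dev in deviations.items():
    print(f"Player {player}: average deviation = {dev:.2f}")
print("Most consistent:", most_consistent)
```

Under that assumption the smallest average deviation belongs to Player B, which matches the conclusion above.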
There are two types of measures of dispersion:
Absolute Measure of Dispersion
In absolute measure of dispersion, the measure is expressed in the original units in which the data is collected. For example, if data is collected in grams, the measure of dispersion will also be expressed in grams. The absolute measure of dispersion has the following types:
- Range
- Quartile Deviation
- Average Deviation
- Standard Deviation
- Variance
Relative Measures of Dispersion
In the relative measures of dispersion, the measure is expressed in terms of coefficients, percentages, ratios, etc. It has the following types:
- Coefficient of range
- Coefficient of Quartile Deviation
- Coefficient of Average Deviation
- Coefficient of Variation (CV)
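The difference between the two families can be seen by changing the unit of measurement. The sketch below uses made-up weights and assumes the population standard deviation (`statistics.pstdev`) and the usual definition of the coefficient of variation as the standard deviation divided by the mean, times 100. The data and the helper name `coefficient_of_variation` are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical weights, first in grams and then the same values in kilograms.
grams = [480, 500, 495, 510, 515]
kilograms = [g / 1000 for g in grams]


def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean (unit-free)."""
    return pstdev(values) / mean(values) * 100


sd_g = pstdev(grams)        # absolute measure, expressed in grams
sd_kg = pstdev(kilograms)   # absolute measure, expressed in kilograms
cv_g = coefficient_of_variation(grams)
cv_kg = coefficient_of_variation(kilograms)

print(f"Standard deviation: {sd_g:.3f} g vs {sd_kg:.6f} kg")
print(f"Coefficient of variation: {cv_g:.3f}% vs {cv_kg:.3f}%")
```

The standard deviation changes with the unit, while the coefficient of variation stays the same, which is why relative measures are preferred when comparing data sets recorded in different units.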
Range and Coefficient of Range
The range is the difference between the maximum and minimum values of the data: $R=x_{max} - x_{min}$.
The Coefficient of Range is $\frac{x_{max} - x_{min} }{x_{max} + x_{min} }$. Multiplying it by 100 expresses it as a percentage.
Consider the ungrouped data $x = 32, 36, 36, 37, 39, 41, 45, 46, 48$
The range will be $x_{max} - x_{min} = 48 - 32 = 16$.
The coefficient of range will be
\begin{align*}
Coef\,\, of\,\, Range &=\frac{x_{max} - x_{min} }{x_{max} + x_{min} } \\
&= \frac{48-32}{48+32} = \frac{16}{80} = 0.2\\
&= 0.2 \times 100 = 20\%
\end{align*}
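The same calculation can be sketched in a few lines of Python. The function names `value_range` and `coefficient_of_range` are hypothetical helpers chosen for this illustration.

```python
data = [32, 36, 36, 37, 39, 41, 45, 46, 48]


def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """(max - min) / (max + min), a unit-free version of the range."""
    hi, lo = max(values), min(values)
    return (hi - lo) / (hi + lo)


r = value_range(data)
coef = coefficient_of_range(data)

print(r)                      # 16
print(f"{coef:.2f}")          # 0.20
print(f"{coef * 100:.0f}%")   # 20%
```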
For the following grouped data, the range and coefficient of the range will be
Classes | Freq | Class Boundaries |
---|---|---|
65 – 84 | 9 | 64.5 – 84.5 |
85 – 104 | 10 | 84.5 – 104.5 |
105 – 124 | 17 | 104.5 – 124.5 |
125 – 144 | 10 | 124.5 – 144.5 |
145 – 164 | 5 | 144.5 – 164.5 |
165 – 184 | 4 | 164.5 – 184.5 |
185 – 204 | 5 | 184.5 – 204.5 |
Total | 60 | |
The upper class boundary of the highest class is taken as $x_{max}$ and the lower class boundary of the lowest class as $x_{min}$. Therefore, $x_{max}=204.5$ and $x_{min} = 64.5$, so
$$Range = x_{max} - x_{min} = 204.5 - 64.5 = 140$$
The Coefficient of Range will be
\begin{align*}
Coef\,\, of\,\, Range &=\frac{x_{max} - x_{min} }{x_{max} + x_{min} } \\
&= \frac{204.5-64.5}{204.5+64.5} = \frac{140}{269} = 0.5204\\
&= 0.5204 \times 100 = 52.04\%
\end{align*}
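For grouped data the only change is where the extremes come from. A minimal sketch, assuming each class is stored as a hypothetical `(lower boundary, upper boundary, frequency)` tuple taken from the table:

```python
# Each class: (lower boundary, upper boundary, frequency), as in the table.
classes = [
    (64.5, 84.5, 9),
    (84.5, 104.5, 10),
    (104.5, 124.5, 17),
    (124.5, 144.5, 10),
    (144.5, 164.5, 5),
    (164.5, 184.5, 4),
    (184.5, 204.5, 5),
]

# The extremes are the outermost class boundaries, not observed values.
x_min = min(lower for lower, _, _ in classes)
x_max = max(upper for _, upper, _ in classes)

grouped_range = x_max - x_min
coef = grouped_range / (x_max + x_min)

print(grouped_range)           # 140.0
print(f"{coef:.4f}")           # 0.5204
print(f"{coef * 100:.2f}%")    # 52.04%
```

The frequencies play no part in the range itself. They are kept in the tuples only so the structure mirrors the table.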
Average Deviation and Coefficient of Average Deviation
The average deviation is an absolute measure of dispersion. It is the mean of the absolute deviations of the observations from a central value, which may be the mean, the median, or the mode. Taken about the mean, it is
$$Mean\,\, Deviation_{\overline{X}} = \frac{\sum\limits_{i=1}^n|x_i-\overline{x}|}{n}$$
$X$ | $x-\overline{x}$ | $\lvert x-\overline{x}\rvert$ | $x-\tilde{x}$ | $\lvert x-\tilde{x}\rvert$ | $x-\hat{x}$ | $\lvert x-\hat{x}\rvert$ |
---|---|---|---|---|---|---|
32 | $32-40 = -8$ | 8 | $32-39=-7$ | 7 | $32-36=-4$ | 4 |
36 | $36-40=-4$ | 4 | $36-39=-3$ | 3 | $36-36=0$ | 0 |
36 | $36-40=-4$ | 4 | $36-39=-3$ | 3 | $36-36=0$ | 0 |
37 | $37-40=-3$ | 3 | $37-39=-2$ | 2 | $37-36=1$ | 1 |
39 | $39-40=-1$ | 1 | $39-39=0$ | 0 | $39-36=3$ | 3 |
41 | $41-40=1$ | 1 | $41-39=2$ | 2 | $41-36=5$ | 5 |
45 | $45-40=5$ | 5 | $45-39=6$ | 6 | $45-36=9$ | 9 |
46 | $46-40=6$ | 6 | $46-39=7$ | 7 | $46-36=10$ | 10 |
48 | $48-40=8$ | 8 | $48-39=9$ | 9 | $48-36=12$ | 12 |
Total | 0 | 40 | 9 | 39 | 36 | 44 |
where
\begin{align*}
Mean &= \overline{x} = \frac{\sum\limits_{i=1}^n x_i}{n} = \frac{360}{9} = 40\\
Median &= \tilde{x} = 39\\
Mode &= \hat{x} = 36\\
MD_{\overline{x}} &= \frac{\sum\limits_{i=1}^n |x_i-\overline{x}|}{n} = \frac{40}{9} = 4.44\\
MD_{\tilde{x}} &= \frac{\sum\limits_{i=1}^n |x_i-\tilde{x}|}{n} = \frac{39}{9} = 4.33\\
MD_{\hat{x}} &= \frac{\sum\limits_{i=1}^n |x_i-\hat{x}|}{n} = \frac{44}{9} = 4.89
\end{align*}
The relative measure of average deviation is the coefficient of average deviation: the average deviation divided by the central value it was taken from, usually expressed as a percentage. It can be calculated as follows:
Coefficient of Average Deviation from Mean (also called Mean Coefficient of Dispersion)
\begin{align*}\text{Mean Coefficient of Dispersion} = \frac{MD_{\overline{x}}}{\overline{x}}\times 100 = \frac{4.44}{40}\times 100 = 11.1\%\end{align*}
Coefficient of Average Deviation from Median (also called Median Coefficient of Dispersion)
\begin{align*}\text{Median Coefficient of Dispersion} = \frac{MD_{\tilde{x}}}{\tilde{x}}\times 100 = \frac{4.33}{39}\times 100 = 11.1\%\end{align*}
Coefficient of Average Deviation from Mode (also called Mode Coefficient of Dispersion)
\begin{align*}\text{Mode Coefficient of Dispersion} = \frac{MD_{\hat{x}}}{\hat{x}}\times 100 = \frac{4.89}{36}\times 100 = 13.58\%\end{align*}
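The three average deviations and their coefficients can be recomputed with Python's standard `statistics` module. This is a sketch for checking the arithmetic, and `average_deviation` is a hypothetical helper name.

```python
from statistics import mean, median, mode

data = [32, 36, 36, 37, 39, 41, 45, 46, 48]


def average_deviation(values, centre):
    """Mean of the absolute deviations of the values from a given centre."""
    return sum(abs(v - centre) for v in values) / len(values)


centres = {"mean": mean(data), "median": median(data), "mode": mode(data)}

# Average deviation about each centre, and the matching coefficient in percent.
results = {name: average_deviation(data, c) for name, c in centres.items()}
coefficients = {name: results[name] / centres[name] * 100 for name in centres}

for name in centres:
    print(
        f"{name}: centre = {centres[name]}, "
        f"average deviation = {results[name]:.2f}, "
        f"coefficient = {coefficients[name]:.2f}%"
    )
```

Passing the centre in as an argument keeps one function usable for all three versions of the measure.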
Average Deviation for Grouped Data
One can also compute average deviations for grouped data (Discrete Case) as follows:
Midpoint $x$ | $f$ | $fx$ | $\lvert x-\overline{x}\rvert$ | $f\lvert x-\overline{x}\rvert$ | $\lvert x-\tilde{x}\rvert$ | $f\lvert x-\tilde{x}\rvert$ |
---|---|---|---|---|---|---|
10 | 9 | 90 | $\lvert 10-34\rvert=24$ | 216 | 20 | 180 |
20 | 10 | 200 | $\lvert 20-34\rvert=14$ | 140 | 10 | 100 |
30 | 17 | 510 | $\lvert 30-34\rvert=4$ | 68 | 0 | 0 |
40 | 10 | 400 | $\lvert 40-34\rvert=6$ | 60 | 10 | 100 |
50 | 5 | 250 | $\lvert 50-34\rvert=16$ | 80 | 20 | 100 |
60 | 4 | 240 | $\lvert 60-34\rvert=26$ | 104 | 30 | 120 |
70 | 5 | 350 | $\lvert 70-34\rvert=36$ | 180 | 40 | 200 |
Total | 60 | 2040 | | 848 | | 800 |
\begin{align*}
\overline{x} &= \frac{\sum fx}{\sum f} = \frac{2040}{60} = 34\\
\tilde{x} &= 30\\
\hat{x} &= 30\\
MD_{\overline{x}} &= \frac{\sum f|x-\overline{x}|}{\sum f} = \frac{848}{60} = 14.13\\
MD_{\tilde{x}} &= \frac{\sum f|x-\tilde{x}|}{\sum f} = \frac{800}{60} = 13.33\\
MD_{\hat{x}} &= \frac{\sum f|x-\hat{x}|}{\sum f} = \frac{800}{60} = 13.33\\
\text{Mean Coefficient of Dispersion} &= \frac{MD_{\overline{x}}}{\overline{x}}\times 100 = \frac{14.133}{34}\times 100 = 41.57\%\\
\text{Median Coefficient of Dispersion} &= \frac{MD_{\tilde{x}}}{\tilde{x}}\times 100 = \frac{13.333}{30}\times 100 = 44.44\%
\end{align*}
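The grouped calculation can be sketched the same way, weighting each absolute deviation by its frequency. The variable names are hypothetical. To keep the sketch short, the median is found by expanding the frequency table into individual observations, an approach that suits small discrete tables like this one.

```python
from statistics import median

midpoints = [10, 20, 30, 40, 50, 60, 70]
freqs = [9, 10, 17, 10, 5, 4, 5]

n = sum(freqs)
mean_x = sum(f * x for x, f in zip(midpoints, freqs)) / n

# Expand the frequency table into individual observations to find the median.
expanded = [x for x, f in zip(midpoints, freqs) for _ in range(f)]
median_x = median(expanded)


def grouped_average_deviation(centre):
    """Frequency-weighted mean of absolute deviations from the given centre."""
    return sum(f * abs(x - centre) for x, f in zip(midpoints, freqs)) / n


md_mean = grouped_average_deviation(mean_x)
md_median = grouped_average_deviation(median_x)

print(f"mean = {mean_x}, average deviation = {md_mean:.2f}, "
      f"coefficient = {md_mean / mean_x * 100:.2f}%")
print(f"median = {median_x}, average deviation = {md_median:.2f}, "
      f"coefficient = {md_median / median_x * 100:.2f}%")
```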
Importance of Dispersion in Statistics
The discussion and numerical examples above show that variability, or dispersion, is crucial in statistics. The following are some reasons for the importance of dispersion in statistics:
- Understanding Data Spread: Variability gives insights into the spread or distribution of data, helping to understand how much individual data points differ from the average or some other measure.
- Data Reliability: Lower variability in data can indicate higher reliability and consistency, which is key for making sound predictions and decisions.
- Identifying Outliers: High variability can indicate the presence of outliers or anomalies in the data, which might require further investigation.
- Comparing Datasets: Dispersion measures, such as variance and standard deviation, allow for the comparison of different datasets. Two datasets might have the same mean but different levels of dispersion, which can imply different data patterns or behaviors.
- Risk Assessment: In fields like finance, assessing the variability of returns is crucial for understanding and managing risk. Higher variability often implies higher risk.
- Statistical Inferences: Many statistical methods, such as hypothesis testing and confidence intervals, rely on the variability of data to make accurate inferences about populations from samples.
- Balanced Decision Making: Understanding variability helps in making more informed decisions by providing a clearer picture of the data’s characteristics and potential fluctuations.
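The point about comparing datasets can be illustrated with two small made-up samples that share a mean but differ in spread. The data are hypothetical, and the sketch assumes the population standard deviation (`statistics.pstdev`) as the measure of dispersion.

```python
from statistics import mean, pstdev

# Two hypothetical datasets with the same mean but very different spread.
steady = [48, 49, 50, 51, 52]
volatile = [10, 30, 50, 70, 90]

for name, values in (("steady", steady), ("volatile", volatile)):
    print(f"{name}: mean = {mean(values)}, "
          f"standard deviation = {pstdev(values):.2f}")
```

Both means equal 50, so the average alone cannot tell the two datasets apart. Only the dispersion measure does.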
Overall, variability is essential for a comprehensive understanding of data, enabling analysts to draw meaningful conclusions and make informed decisions.