# Tagged: Range

## Variance: A Measure of Dispersion

Variance is a measure of the dispersion of the distribution of a random variable. The term variance was introduced by R. A. Fisher in 1918. The variance of a set of observations (data set) is defined as the mean of the squared deviations of all the observations from their mean. When it is computed for the entire population, it is called the population variance, usually denoted by $\sigma^2$; for sample data, it is called the sample variance and denoted by $S^2$ to distinguish it from the population variance. Variance is also denoted by $Var(X)$ when we speak about the variance of a random variable $X$. The symbolic definitions of the population and sample variance are

$\sigma^2=\frac{\sum (X_i - \mu)^2}{N}; \quad \text{for population data}$

$S^2=\frac{\sum (X_i - \overline{X})^2}{n-1}; \quad \text{for sample data}$
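As a quick illustration of the two definitions, the following sketch uses Python's standard `statistics` module, where `pvariance` divides by $N$ and `variance` divides by $n-1$. The data values are made up for illustration and do not come from the text.

```python
import statistics

# Illustrative data (an assumption, not taken from the text).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: sum of squared deviations divided by N.
pop_var = statistics.pvariance(data)

# Sample variance: sum of squared deviations divided by n - 1.
samp_var = statistics.variance(data)

print(pop_var)
print(samp_var)
```

For these eight values the mean is 5 and the sum of squared deviations is 32, so the population variance is 32/8 = 4 and the sample variance is 32/7, roughly 4.57.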

It should be noted that the variance is expressed in the square of the units in which the observations are expressed, so its numerical value is often large compared to the observations themselves. Because of its nice mathematical properties, the variance assumes an extremely important role in statistical theory.

Variance can be computed from the standard deviation, as the variance is the square of the standard deviation, i.e. Variance = (Standard Deviation)$^2$.

Variance can be used to compare dispersion in two or more sets of observations. Variance can never be negative, since every term in the variance is a squared quantity, either positive or zero.

To calculate the standard deviation one has to follow these steps:

1. First find the mean of the data.
2. Take the difference of each observation from the mean of the given data set. The sum of these differences should be zero, or near zero if the numbers have been rounded.
3. Square the values obtained in step 2; each squared value is greater than or equal to zero, i.e. a non-negative quantity.
4. Sum all the squared quantities obtained in step 3. We call it the sum of squares of differences.
5. Divide this sum of squares of differences by the total number of observations if we have to calculate the population standard deviation ($\sigma$). For the sample standard deviation (S), divide the sum of squares of differences by the total number of observations minus one, i.e. the degrees of freedom.
6. Find the square root of the quantity obtained in step 5. The resultant quantity is the standard deviation of the given data set.
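The steps above can be sketched in a few lines of Python. The function name `standard_deviation`, the `sample` flag, and the data are illustrative assumptions rather than part of the original text.

```python
import math

def standard_deviation(values, sample=False):
    """Standard deviation computed by following the listed steps literally."""
    n = len(values)
    mean = sum(values) / n                       # find the mean
    deviations = [x - mean for x in values]      # differences from the mean (sum is about 0)
    squared = [d ** 2 for d in deviations]       # square each difference
    sum_of_squares = sum(squared)                # sum of squares of differences
    divisor = n - 1 if sample else n             # n for population, n - 1 for sample
    variance = sum_of_squares / divisor
    return math.sqrt(variance)                   # final step: take the square root

# Illustrative data (an assumption, not taken from the text).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data))               # population standard deviation
print(standard_deviation(data, sample=True))  # sample standard deviation
```

Keeping the intermediate lists makes it easy to check that the deviations sum to roughly zero before they are squared.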

The major characteristics of the variance are:
a)    All of the observations are used in the calculations
b)    Variance is less influenced by a single extreme observation than the range is, although squaring the deviations still gives extreme observations considerable weight
c)    The variance is not in the same units as the observations; it is in the square of the units in which the observations are expressed.

## Range: Measure of Dispersion

A measure of central tendency provides a typical value for the data set, but it does not tell the whole story about the data, i.e. the mean, median and mode are not enough to summarize the data, even though they tell us about its center. In other words, we can measure the center of the data by looking at averages (mean, median, mode), but these measures tell nothing about the spread of the data. So for more information about the data we need some other measure, such as a measure of dispersion or spread.

The spread of data can be measured by calculating the range of the data; the range tells us over how many units the data extend. Range (an absolute measure of dispersion) can be found by subtracting the smallest value (called the lower bound) in the data from the largest value (called the upper bound), i.e.

Range = Upper Bound – Lower Bound
OR
Range = Largest Value – Smallest Value

This absolute measure of dispersion has disadvantages: the range only describes the width of the data set (i.e. how spread out it is), measured in the same units as the data, but it does not give the real picture of how the data are distributed. If the data have outliers, using the range to describe their spread can be very misleading, as the range is sensitive to outliers. So we need to be careful in using the range, as it does not give the full picture of what’s going on between the highest and lowest values. It may give a misleading picture of the spread of the data because it is based only on the two extreme values. It is therefore an unsatisfactory measure of dispersion.
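The sensitivity of the range to outliers is easy to see numerically. The following is a minimal sketch with made-up data; the helper name `data_range` is an assumption for illustration.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

# Illustrative data (an assumption, not taken from the text).
typical = [12, 14, 15, 15, 16, 17, 18]
with_outlier = typical + [95]   # the same data with one extreme observation added

print(data_range(typical))       # 6
print(data_range(with_outlier))  # 83
```

A single extreme value changes the range from 6 to 83 even though the other seven observations are unchanged.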

However, the range is widely used in applications such as statistical process control (control charts of manufactured products), daily temperatures, stock prices etc., as it is very easy to calculate. It is an absolute measure of dispersion; its relative measure, known as the coefficient of dispersion, is defined by the relation

$Coefficient\,\, of\,\, Dispersion = \frac{x_m-x_0}{x_m+x_0}$

where $x_m$ is the largest and $x_0$ the smallest value. The coefficient of dispersion is a pure (dimensionless) number and is used for comparison purposes.
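A short sketch can show why a dimensionless measure is convenient for comparison. It assumes the usual definition of the coefficient of dispersion for the range, (largest − smallest) / (largest + smallest); the function name and the height data are illustrative assumptions.

```python
def coefficient_of_dispersion(values):
    """Relative range: (largest - smallest) / (largest + smallest)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

# The same hypothetical heights expressed in two different units.
heights_cm = [150, 160, 170, 180]
heights_m = [1.5, 1.6, 1.7, 1.8]

print(max(heights_cm) - min(heights_cm))      # range in centimetres
print(max(heights_m) - min(heights_m))        # range in metres
print(coefficient_of_dispersion(heights_cm))  # unit-free
print(coefficient_of_dispersion(heights_m))   # same value up to rounding
```

The range changes with the unit of measurement, while the coefficient stays at 30/330, about 0.09, in both cases.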

## Absolute Measure of Dispersion

An absolute measure of dispersion gives an idea about the amount of dispersion/spread in a set of observations. These quantities measure the dispersion in the same units as the original data. Absolute measures cannot be used to compare the variation of two or more series/data sets when these are expressed in different units or differ widely in magnitude. A measure of absolute dispersion does not, in itself, tell whether the variation is large or small.

## Range

The Range is the difference between the largest value and the smallest value in the data set. For ungrouped data, let $X_0$ be the smallest value and $X_n$ the largest value in a data set; then the range (R) is defined as
$R=X_n-X_0$.

For grouped data, the range can be calculated in three different ways:

1. R = Midpoint of the highest class – Midpoint of the lowest class
2. R = Upper class limit of the highest class – Lower class limit of the lowest class
3. R = Upper class boundary of the highest class – Lower class boundary of the lowest class
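The three definitions generally give different answers for the same frequency table. The sketch below uses a hypothetical table with class limits 10-19, 20-29 and 30-39, and assumes class boundaries lie halfway across the gap between consecutive classes; none of these values come from the text.

```python
# Hypothetical classes given as (lower limit, upper limit).
classes = [(10, 19), (20, 29), (30, 39)]

# Gap between the upper limit of one class and the lower limit of the next (here 1).
gap = classes[1][0] - classes[0][1]
lowest, highest = classes[0], classes[-1]

def midpoint(cls):
    """Midpoint of a class given its lower and upper limits."""
    return (cls[0] + cls[1]) / 2

range_midpoints = midpoint(highest) - midpoint(lowest)
range_limits = highest[1] - lowest[0]
# Boundaries extend each limit by half the gap: 9.5 and 39.5 for this table.
range_boundaries = (highest[1] + gap / 2) - (lowest[0] - gap / 2)

print(range_midpoints)   # 20.0
print(range_limits)      # 29
print(range_boundaries)  # 30.0
```

Because the results differ (20, 29 and 30 here), it is worth stating which definition was used when reporting a range for grouped data.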

## Quartile Deviation (Semi-Interquartile Range)

The difference between the third and first quartiles is called the interquartile range, and half of this range is called the semi-interquartile range (SIQR) or simply the quartile deviation (QD): $QD=\frac{Q_3-Q_1}{2}$

The Quartile Deviation is superior to the range as it is not affected by extremely large or small observations; however, it does not give any information about the position of observations lying outside the two quartiles. It is not amenable to mathematical treatment and is greatly affected by sampling variability. Although the Quartile Deviation is not widely used as a measure of dispersion, it is used in situations in which extreme observations are thought to be unrepresentative/misleading. Because the Quartile Deviation is not based on all the observations, it is not affected by extreme observations.

Note: The range “Median ± QD” contains approximately 50% of the data.
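The following sketch computes the quartile deviation with `statistics.quantiles` from the Python standard library (Python 3.8 or later). Quartile conventions differ between textbooks and software, so the default "exclusive" method used here is an assumption and other methods may give slightly different values. The data are made up for illustration.

```python
import statistics

def quartile_deviation(values):
    """Half the distance between the third and first quartiles."""
    # With n=4, statistics.quantiles returns the three cut points [Q1, Q2, Q3].
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2

# Illustrative data (an assumption, not taken from the text).
data = [1, 2, 3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 5, 6, 70]   # largest value replaced by an extreme one

print(quartile_deviation(data))          # 2.0
print(quartile_deviation(with_outlier))  # 2.0
print(max(with_outlier) - min(with_outlier))  # the range jumps to 69
```

The quartiles stay at 2 and 6 in both data sets, so the quartile deviation is unchanged while the range grows from 6 to 69.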

## Mean Deviation (Average Deviation)

The Mean Deviation is defined as the arithmetic mean of the absolute deviations measured either from the mean or from the median. All these deviations are counted as positive to avoid the difficulty arising from the property that the sum of deviations of observations from their mean is zero.
$MD=\frac{\sum|X-\overline{X}|}{n}\quad$ for ungrouped data for mean
$MD=\frac{\sum f|X-\overline{X}|}{\sum f}\quad$ for grouped data for mean
$MD=\frac{\sum|X-\tilde{X}|}{n}\quad$ for ungrouped data for median
$MD=\frac{\sum f|X-\tilde{X}|}{\sum f}\quad$ for grouped data for median
The Mean Deviation can be calculated about other measures of central tendency, but it is least when the deviations are taken from the median.

The Mean Deviation gives more information than the range or the Quartile Deviation as it is based on all the observed values. The Mean Deviation does not give undue weight to occasional large deviations, so it is suitable for situations where such deviations are likely to occur.
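A small numerical sketch can illustrate the ungrouped formulas and the property that the mean deviation is smallest about the median. The helper `mean_deviation` and the data are illustrative assumptions.

```python
import statistics

def mean_deviation(values, center):
    """Mean of the absolute deviations of the values from a chosen center."""
    return sum(abs(x - center) for x in values) / len(values)

# Illustrative data (an assumption, not taken from the text).
data = [2, 3, 5, 8, 22]

md_mean = mean_deviation(data, statistics.mean(data))      # about the mean (8)
md_median = mean_deviation(data, statistics.median(data))  # about the median (5)

print(md_mean)
print(md_median)
```

Here the absolute deviations from the mean sum to 28 and those from the median sum to 25, giving mean deviations of 5.6 and 5.0 respectively.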

## Variance and Standard Deviation

This absolute measure of dispersion is defined as the mean of the squares of deviations of all the observations from their mean. Traditionally, the population variance is denoted by $\sigma^2$ (sigma squared) and the sample variance by $S^2$ or $s^2$.
Symbolically
$\sigma^2=\frac{\sum(X_i-\mu)^2}{N}\quad$ Population Variance for ungrouped data
$S^2=\frac{\sum(X_i-\overline{X})^2}{n}\quad$ Sample Variance for ungrouped data
$\sigma^2=\frac{\sum f(X_i-\mu)^2}{\sum f}\quad$ Population Variance for grouped data
$S^2=\frac{\sum f (X_i-\overline{X})^2}{\sum f}\quad$ Sample Variance for grouped data
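For grouped data, the $X_i$ are usually taken to be the class midpoints and $f$ the class frequencies. The following sketch applies the grouped formula with divisor $\sum f$; the function name and the small frequency table are hypothetical.

```python
def grouped_variance(midpoints, freqs):
    """Variance for grouped data: sum of f*(X - mean)^2 divided by sum of f."""
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / total
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / total

# Hypothetical frequency table: class midpoints and their frequencies.
midpoints = [5, 15, 25]
freqs = [2, 6, 2]

print(grouped_variance(midpoints, freqs))  # 40.0
```

The weighted mean is 150/10 = 15, and the weighted sum of squared deviations is 2(100) + 6(0) + 2(100) = 400, so the variance is 400/10 = 40.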

The variance is denoted by Var(X) for a random variable X. The term variance was introduced by R. A. Fisher (1890-1962) in 1918. The variance is in squared units, so its numerical value is often large compared to the observations themselves.
Note that there are alternative formulas to compute the Variance or Standard Deviation.

The positive square root of the variance is called the Standard Deviation (SD); it expresses the deviation in the same units as the original observations themselves. It is a measure of the average spread about the mean and is symbolically defined as
$\sigma=\sqrt{\frac{\sum(X_i-\mu)^2}{N}}\quad$ Population Standard Deviation for ungrouped data
$S=\sqrt{\frac{\sum(X_i-\overline{X})^2}{n}}\quad$ Sample Standard Deviation for ungrouped data
$\sigma=\sqrt{\frac{\sum f(X_i-\mu)^2}{\sum f}}\quad$ Population Standard Deviation for grouped data
$S=\sqrt{\frac{\sum f (X_i-\overline{X})^2}{\sum f}}\quad$ Sample Standard Deviation for grouped data
The Standard Deviation is the most useful measure of dispersion; the name Standard Deviation is credited to Karl Pearson (1857-1936).
In some texts, the sample variance is defined as $S^2=\frac{\sum (X_i-\overline{X})^2}{n-1}$ on the basis of the argument that knowledge of any $n-1$ deviations determines the remaining deviation, as the sum of the $n$ deviations must be zero. In fact, this is an unbiased estimator of the population variance $\sigma^2$. The Standard Deviation has a definite mathematical meaning, utilizes all the observed values, and is amenable to mathematical treatment, but it is affected by extreme values.
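The unbiasedness of the $n-1$ divisor can be checked exactly on a tiny example by enumerating every possible sample. The sketch below assumes sampling with replacement from a hypothetical population of four values and uses exact fractions, so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population (an assumption, not taken from the text).
population = [1, 2, 3, 4]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))  # all 16 samples drawn with replacement

# Average of the two competing estimators over every possible sample.
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2)             # 5/4
print(avg_div_n)          # 5/8, too small on average
print(avg_div_n_minus_1)  # 5/4, matches the population variance
```

Dividing by $n$ underestimates $\sigma^2$ on average by the factor $(n-1)/n$, while dividing by $n-1$ recovers it exactly in this setting.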
