Variance is one of the most important measures of the dispersion of the distribution of a random variable. The term variance was introduced by R. A. Fisher in 1918. The variance of a set of observations (data set) is defined as the mean of the squared deviations of the observations from their mean. When it is computed for the entire population, it is called the population variance and is usually denoted by $\sigma^2$; when it is computed from sample data, it is called the sample variance and is denoted by $S^2$ to distinguish it from the population variance. Variance is also denoted by $Var(X)$ when we speak about the variance of a random variable. The symbolic definitions of the population and sample variance are:
$\sigma^2=\frac{\sum (X_i - \mu)^2}{N}; \quad \text{for population data}$
$S^2=\frac{\sum (X_i - \overline{X})^2}{n-1}; \quad \text{for sample data}$
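As a minimal sketch, the two definitions (divisor $N$ for population data, divisor $n-1$ for sample data) can be written directly in Python. The function names and the `observations` list are hypothetical, chosen only for illustration. The results are cross-checked against `statistics.pvariance` and `statistics.variance` from the standard library.

```python
import math
import statistics


def population_variance(data):
    """Mean of the squared deviations from the mean (divisor N)."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


# Hypothetical observations, chosen only for illustration.
observations = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(observations)
samp_var = sample_variance(observations)

# Cross-check the hand-written formulas against the standard library.
assert math.isclose(pop_var, statistics.pvariance(observations))
assert math.isclose(samp_var, statistics.variance(observations))

print(pop_var, samp_var)
```

For these eight values the mean is 5 and the sum of squared deviations is 32, so the population variance is 32/8 = 4 and the sample variance is 32/7, roughly 4.57.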
It should be noted that the variance is expressed in the square of the units in which the observations are measured, so its value can be large compared with the observations themselves. Because of its convenient mathematical properties, the variance plays an extremely important role in statistical theory.
Variance can be computed from the standard deviation, since the variance is the square of the standard deviation, i.e. Variance = (Standard Deviation)$^2$.
Variance can be used to compare dispersion in two or more sets of observations. Variance can never be negative, since every term in its sum is a squared quantity and is therefore either positive or zero.
To calculate the standard deviation, follow these steps:
1. Find the mean of the data.
2. Take the difference of each observation from the mean of the given data set. The sum of these differences should be zero; a small nonzero sum is due to the rounding of numbers.
3. Square the values obtained in step 2. Each squared value is greater than or equal to zero, i.e. it is a non-negative quantity.
4. Sum all the squared quantities obtained in step 3. We call this the sum of squares of differences.
5. Divide this sum of squares of differences by the total number of observations to obtain the population variance. For the sample variance, divide the sum of squares of differences by the total number of observations minus one, i.e. the degrees of freedom.
6. Find the square root of the quantity obtained in step 5. The result is the standard deviation of the given data set: the population standard deviation ($\sigma$) or the sample standard deviation ($S$), depending on the divisor used.
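The procedure above can be sketched in Python as follows. The function name `standard_deviation` and its `sample` flag are assumptions made for this illustration, not part of any library. Each line is commented with the part of the procedure it carries out.

```python
import math


def standard_deviation(data, sample=True):
    """Standard deviation computed one step at a time (illustrative helper)."""
    n = len(data)
    if n < 2 and sample:
        raise ValueError("sample standard deviation needs at least two observations")

    mean = sum(data) / n                     # find the mean of the data
    deviations = [x - mean for x in data]    # difference of each observation from the mean
    squared = [d ** 2 for d in deviations]   # square each difference (never negative)
    sum_of_squares = sum(squared)            # sum of squares of differences
    divisor = n - 1 if sample else n         # n - 1 for a sample, n for a population
    variance = sum_of_squares / divisor
    return math.sqrt(variance)               # square root gives the standard deviation


# Hypothetical observations, chosen only for illustration.
observations = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(observations, sample=False))
print(standard_deviation(observations, sample=True))
```

With these values the population standard deviation is exactly 2, while the sample standard deviation is slightly larger because the divisor is 7 instead of 8.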
The major characteristics of the variance are:
a) All of the observations are used in the calculation.
b) Variance is strongly influenced by extreme observations, because each deviation from the mean is squared.
c) The variance is not in the same units as the observations; it is in the square of the units in which the observations are expressed.
Consider a scenario: imagine two groups of students that both score an average of 70% on an exam. In Group A, most scores are clustered around 70%, while in Group B, scores are spread out widely. A measure of spread (such as the standard deviation or the variance) distinguishes these two situations and gives a more nuanced understanding of student performance.
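A small sketch makes the scenario concrete. The scores below are hypothetical and were constructed so that both groups average exactly 70; only the spread differs. The code uses `statistics.mean`, `statistics.variance`, and `statistics.stdev` from the Python standard library, where the last two use the sample (n - 1) divisor.

```python
import statistics

# Hypothetical exam scores (in percent); both groups are built to average 70.
groups = {
    "Group A": [68, 69, 70, 70, 71, 72],    # tightly clustered around 70
    "Group B": [40, 55, 70, 70, 85, 100],   # widely spread around 70
}

summary = {}
for name, scores in groups.items():
    mean = statistics.mean(scores)
    variance = statistics.variance(scores)  # sample variance (divisor n - 1)
    sd = statistics.stdev(scores)           # sample standard deviation
    summary[name] = (mean, variance, sd)
    print(f"{name}: mean={mean}, variance={variance:.1f}, sd={sd:.2f}")
```

The means are identical, but the sample variance of the second group (450) is far larger than that of the first (2), which is exactly the difference the average alone hides.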
By showing how spread out (scattered) the data points are around the average value (mean), the standard deviation offers valuable insights in many practical scenarios. It supports data-driven decision making in quality control, investment analysis, scientific research, and other fields.