Introduction to the Sum of Squared Deviations
In statistics, the sum of squared deviations (also known as the sum of squares) measures the total variability (spread or variation) within a data set. In other words, it measures how far the observations deviate from the mean (average) value of the data set.
Computation of Sum of Squared Deviations
A sum of squares is calculated by first computing the difference between each data point (observation) and the mean of the data set, i.e. $x=X-\overline{X}$. The computed $x$ is known as the deviation score of that observation. Squaring each deviation score and then adding the squared deviation scores gives the sum of squared deviations (SS), which is represented mathematically as
\[SS=\sum(x^2)=\sum(X-\overline{X})^2\]
Note that the small letter $x$ usually represents the deviation of each observation from the mean value, while the capital letter $X$ represents the variable of interest in statistics.
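The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `sum_of_squared_deviations` and the sample values in the usage line are assumptions introduced here for the example.

```python
def sum_of_squared_deviations(values):
    """Return SS = sum((X - mean)^2) for a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("at least one observation is required")
    mean = sum(values) / len(values)            # X-bar
    deviations = [x - mean for x in values]     # deviation scores x = X - X-bar
    return sum(d ** 2 for d in deviations)      # SS


# Hypothetical sample: mean is 4, deviations are -2, 0, 2
print(sum_of_squared_deviations([2, 4, 6]))  # 8.0
```

Keeping the deviation scores in their own list mirrors the notation used here, with the small letter $x$ for deviations and the capital letter $X$ for the raw values.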
The Sum of Squared Deviations Example
Consider the data set {5, 6, 7, 10, 12}. To compute its sum of squares, follow these steps:
- Calculate the average of the data by summing all the values and dividing this sum by the number of observations. Mathematically, $\overline{X}=\frac{\sum X_i}{n}=\frac{40}{5}=8$, where 40 is the sum $5+6+7+10+12$ and $n=5$ is the number of observations.
- Calculate the difference between each observation and the average computed in step 1. The differences are
$5 - 8 = -3$; $6 - 8 = -2$; $7 - 8 = -1$; $10 - 8 = 2$; and $12 - 8 = 4$.
Note that the sum of these differences should be zero: $(-3) + (-2) + (-1) + 2 + 4 = 0$.
- Square each of the differences obtained in step 2. The squared differences are 9, 4, 1, 4, and 16.
- Add the squared differences obtained in step 3: $9 + 4 + 1 + 4 + 16 = 34$, which is the sum of squares of the given data set.
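The four steps of this example can be traced in Python as a quick check of the arithmetic. The variable names are illustrative assumptions. The final line uses the computational identity $SS=\sum X^2-\frac{(\sum X)^2}{n}$, which is not one of the steps above and is included only as a cross-check.

```python
data = [5, 6, 7, 10, 12]

# Step 1: mean of the data (40 / 5)
mean = sum(data) / len(data)              # 8.0

# Step 2: deviation of each observation from the mean
deviations = [x - mean for x in data]     # [-3.0, -2.0, -1.0, 2.0, 4.0]

# Step 3: square each deviation
squared = [d ** 2 for d in deviations]    # [9.0, 4.0, 1.0, 4.0, 16.0]

# Step 4: add the squared deviations
ss = sum(squared)                         # 34.0

# Cross-check (not in the original steps): SS = sum(X^2) - (sum(X))^2 / n
shortcut = sum(x ** 2 for x in data) - sum(data) ** 2 / len(data)  # 354 - 320

print(mean, sum(deviations), ss, shortcut)  # 8.0 0.0 34.0 34.0
```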
Sums of Squares in Different Contexts
In statistics, the sum of squares occurs in different contexts, such as:
- Partitioning of Variance (Partition of Sums of Squares)
- Sum of Squared Deviations (Least Squares)
- Sum of Squared Differences (Mean Squared Error)
- Sum of Squared Errors (Residual Sum of Squares)
- Sum of Squares due to Lack of Fit (Lack of Fit Sum of Squares)
- Sum of Squares for Model Predictions (Explained Sum of Squares)
- Sum of Squares for Observations (Total Sum of Squares)
- Sum of Squared Deviations (Squared Deviations)
- Modeling involving the Sum of Squares (Analysis of Variance)
- Multivariate Generalization of the Sum of Squares (Multivariate Analysis of Variance)
As previously discussed, the sum of squares is a measure of the total variability of a set of scores around a specific value, usually the mean.
Summary
- For data sets of the same size, a higher sum of squares indicates that the data points lie farther from the mean on average, signifying greater spread or variability in the data. Conversely, a lower sum of squares suggests that the data points are clustered closer to the mean, indicating less variability.
- The sum of squares plays a crucial role in calculating other important statistics like variance and standard deviation. These concepts help us understand the distribution of data and make comparisons between different datasets.
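To make the link to variance and standard deviation concrete, the sketch below divides the sum of squares of the example data by $n-1$ (sample variance) and by $n$ (population variance), then takes a square root for the sample standard deviation. The variable names are assumptions for illustration, and the comparison against Python's standard `statistics` module is only a sanity check.

```python
import math
import statistics

data = [5, 6, 7, 10, 12]
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)        # sum of squares: 34.0

sample_variance = ss / (n - 1)                 # 34 / 4 = 8.5
population_variance = ss / n                   # 34 / 5 = 6.8
sample_sd = math.sqrt(sample_variance)         # roughly 2.92

# Sanity check against the standard library
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_sd, statistics.stdev(data))

print(sample_variance, population_variance)    # 8.5 6.8
```

Dividing by $n-1$ or $n$ is what makes variance comparable across data sets of different sizes, whereas the raw sum of squares grows as more observations are added.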