Sum of Squared Deviations from the Mean
In statistics, the sum of squared deviations is a measure of the total variability (spread, variation) within a data set. In other words, the sum of squares measures deviation or variation from the mean (average) value of the given data set. The sum of squares is calculated by first computing the difference between each data point (observation) and the mean of the data set, i.e. $x=X-\overline{X}$. The computed $x$ is known as the deviation score. Squaring each of these deviation scores and then adding them gives the sum of squared deviations (SS), which is represented mathematically as
\[SS=\sum(x^2)=\sum(X-\overline{X})^2\]
Note that the small letter $x$ usually represents the deviation of each observation from the mean value, while the capital letter $X$ represents the variable of interest.
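For readers who prefer code to notation, the definition above can also be written as a short sketch in plain Python (the function name `sum_of_squared_deviations` is illustrative, not taken from any library):

```python
def sum_of_squared_deviations(data):
    """Return SS = sum((X - X_bar)^2) for a list of numeric observations."""
    mean = sum(data) / len(data)            # X_bar, the arithmetic mean
    deviations = [x - mean for x in data]   # deviation scores x = X - X_bar
    return sum(d ** 2 for d in deviations)  # square each deviation and add them

# Example: sum_of_squared_deviations([5, 6, 7, 10, 12]) returns 34.0
```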
Sum of Squares Example
Consider the following data set: {5, 6, 7, 10, 12}. To compute the sum of squares of this data set, follow these steps:
- Calculate the average of the given data by summing all the values in the data set and then dividing this sum by the total number of observations in the data set. Mathematically, it is $\frac{\sum X_i}{n}=\frac{40}{5}=8$, where 40 is the sum of all the numbers ($5+6+7+10+12$) and there are 5 observations.
- Calculate the difference of each observation in the data set from the average computed in step 1. The differences are
5 – 8 = –3; 6 – 8 = –2; 7 – 8 = –1; 10 – 8 = 2 and 12 – 8 = 4
Note that the sum of these differences should be zero: (–3) + (–2) + (–1) + 2 + 4 = 0.
- Now square each of the differences obtained in step 2. The squares of these differences are
9, 4, 1, 4 and 16
- Now add the squared numbers obtained in step 3. The sum of these squared quantities is 9 + 4 + 1 + 4 + 16 = 34, which is the sum of squares of the given data set (checked in the code sketch below).
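The same four steps can be checked with a few lines of plain Python; this is a small sketch that mirrors the steps above, with purely illustrative variable names:

```python
data = [5, 6, 7, 10, 12]

mean = sum(data) / len(data)            # step 1: average = 40 / 5 = 8
deviations = [x - mean for x in data]   # step 2: [-3, -2, -1, 2, 4]
squared = [d ** 2 for d in deviations]  # step 3: [9, 4, 1, 4, 16]
ss = sum(squared)                       # step 4: 9 + 4 + 1 + 4 + 16 = 34
print(ss)                               # prints 34.0
```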
In statistics, the sum of squares occurs in different contexts, such as the following (a standard relationship among several of these is given after the list):
- Partitioning of Variance (Partition of Sums of Squares)
- Sum of Squared Deviations (Least Squares)
- Sum of Squared Differences (Mean Squared Error)
- Sum of Squared Error (Residual Sum of Squares)
- Sum of Squares due to Lack of Fit (Lack of Fit Sum of Squares)
- Sum of Squares for Model Predictions (Explained Sum of Squares)
- Sum of Squares for Observations (Total Sum of Squares)
- Sum of Squared Deviation (Squared Deviations)
- Modeling involving Sum of Squares (Analysis of Variance)
- Multivariate Generalization of Sum of Square (Multivariate Analysis of Variance)
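Several of these quantities are linked. For example, in least squares regression with an intercept (and in the analysis of variance), the total sum of squares splits into an explained and a residual component, which is what the partitioning of variance above refers to:
\[SS_{\text{Total}}=SS_{\text{Explained}}+SS_{\text{Residual}},\qquad \sum(Y-\overline{Y})^2=\sum(\widehat{Y}-\overline{Y})^2+\sum(Y-\widehat{Y})^2\]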
As previously discussed, the sum of squares is a measure of the total variability of a set of scores around a specific number.