Basic Statistics and Data Analysis

Lecture notes, MCQS of Statistics

Sum of Squares

Sum of Sqaures

In statistics, the sum of squares is a measure of the total variability (spread, variation) within a data set. In other words the sum of squares is a measure of deviation or variation from mean value of the given data set. A sum of squares calculated by first computing the differences between each data point (observation) and mean of the data set, i.e. $x=X-\overline{X}$. The computed $x$ is the deviation score for the given data set. Squaring each of this deviation score and then adding these squared deviation scores gave us the sum of squares (SS), which is represented mathematically as


Note that the small letter $x$ usually represents the deviation of each observation from mean value, while capital letter $X$ represents the variable of interest in statistics.

Sum of Squares Example

Consider the following data set {5, 6, 7, 10, 12}. To compute the sum of squares of this data set, follow these steps

  • Calculate the average of the given data by summing all the values in the data set and then divide this sum of numbers by the total number of observations in the date set. Mathematically, it is $\frac{\sum X_i}{n}=\frac{40}{5}=8$, where 40 is the sum of all numbers $5+6+7+10+12$ and there are 5 observations in number.
  • Calculate the difference of each observation in data set from the average computed in step 1, for given data. The difference are
    5 – 8 = –3; 6 – 8 = –2; 7 – 8 = –1; 10 – 8 =2 and 12 – 8 = 4
    Note that the sum of these differences should be zero. (–3 + –2 + –1 + 2 +4 = 0)
  • Now square the each of the differences obtained in step 2. The square of these differences are
    9, 4, 1, 4 and 16
  • Now add the squared number obtained in step 3. The sum of these squared quantities will be 9 + 4 + 1 + 4 + 16 = 34, which is the sum of the square of the given data set.

In statistics, sum of squares occurs in different contexts such as

  • Partitioning of Variance (Partition of Sums of Squares)
  • Sum of Squared Deviations (Least Squares)
  • Sum of Squared Differences (Mean Squared Error)
  • Sum of Squared Error (Residual Sum of Squares)
  • Sum of Squares due to Lack of Fit (Lack of Fit Sum of Squares)
  • Sum of Squares for Model Predictions (Explained Sum of Squares)
  • Sum of Squares for Observations (Total Sum of Squares)
  • Sum of Squared Deviation (Squared Deviations)
  • Modeling involving Sum of Squares (Analysis of Variance)
  • Multivariate Generalization of Sum of Square (Multivariate Analysis of Variance)

As previously discussed, Sum of Square is a measure of the Total Variability of a set of scores around a specific number.


The Author

Muhammad Imdadullah

Student and Instructor of Statistics and business mathematics. Currently Ph.D. Scholar (Statistics), Bahauddin Zakariya University Multan. Like Applied Statistics and Mathematics and Statistical Computing. Statistical and Mathematical software used are: SAS, STATA, GRETL, EVIEWS, R, SPSS, VBA in MS-Excel. Like to use type-setting LaTeX for composing Articles, thesis etc.

Leave a Reply

Copy Right © 2011 ITFEATURE.COM
%d bloggers like this: