The degrees of freedom (df) refer to the number of observations in a sample minus the number of (population) parameters being estimated from the sample data. This means that the degrees of freedom are a function of both the sample size and the number of parameters estimated from the data. In other words, it is the number of independent observations out of a total of $n$ observations.
Degrees of Freedom
In statistics, the df are the number of values in a study that are free to vary. A real-life example of degrees of freedom: suppose you must take ten different courses to graduate, and only ten different courses are offered. In nine semesters you can choose which class to take; in the tenth semester, only one class remains, so there is no choice left if you want to graduate. You therefore have nine degrees of freedom. This is the concept of degrees of freedom (df) in statistics.
Let a random sample of size $n$ be taken from a population with an unknown mean $\mu$, and let $\overline{X}$ denote the sample mean. The sum of the deviations of the observations from their mean is always equal to zero, i.e., $\sum_{i=1}^n (X_i-\overline{X})=0$. This places a constraint on the deviations $X_i-\overline{X}$ used when calculating the sample variance.
\[S^2 =\frac{\sum_{i=1}^n (X_i-\overline{X})^2 }{n-1}\]
This constraint (restriction) implies that $n-1$ deviations completely determine the $n$th deviation. The $n$ deviations (and hence the sum of their squares and the sample variance $S^2$) therefore have $n-1$ degrees of freedom.
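A minimal numeric sketch (with a made-up sample) makes the constraint concrete: the deviations sum to zero, and the sample variance divides by $n-1$ rather than $n$.

```python
import numpy as np

# Hypothetical sample of n = 4 observations.
x = np.array([4.0, 7.0, 9.0, 12.0])
deviations = x - x.mean()

# The deviations from the sample mean always sum to zero
# (up to floating-point error), so only n - 1 are free to vary.
print(deviations.sum())                 # ~0.0

# Sample variance divides by the degrees of freedom, n - 1.
s2_manual = (deviations ** 2).sum() / (len(x) - 1)
s2_numpy = x.var(ddof=1)                # ddof=1 selects the n - 1 denominator
print(s2_manual, s2_numpy)              # identical values
```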
A Common Way of Thinking about DF
A common way to think of df is the number of independent pieces of information available to estimate another piece of information. More concretely, the number of degrees of freedom is the number of independent observations in a sample of data that are available to estimate a parameter of the population from which that sample is drawn. For example, if we have two observations, then when calculating the mean we have two independent observations; however, when calculating the variance we have only one independent observation, since once the mean is known, one deviation determines the other: the two deviations are equal in magnitude but opposite in sign.
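A two-observation sketch (with hypothetical values) shows this directly: once the mean is fixed, knowing one deviation fixes the other.

```python
import numpy as np

# Two hypothetical observations.
x = np.array([3.0, 11.0])
deviations = x - x.mean()               # mean is 7.0

# Equal magnitude, opposite sign: knowing one deviation fixes the other,
# so only one independent piece of information remains for the variance.
print(deviations)                       # [-4.  4.]
```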
Calculating DF
Single sample: For $n$ observations, one parameter (the mean) must be estimated, which leaves $n-1$ degrees of freedom for estimating variability (dispersion).
Two samples: There are a total of $n_1+n_2$ observations ($n_1$ for group 1 and $n_2$ for group 2), and two means must be estimated, which leaves $n_1+n_2-2$ degrees of freedom for estimating variability.
Regression with $p$ predictors: There are $n$ observations, and $p+1$ parameters must be estimated (a regression coefficient for each predictor plus the intercept). This leaves $n-p-1$ degrees of freedom for error, which is the error degrees of freedom reported in the ANOVA table (see the sketch below).
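The three cases above reduce to simple arithmetic; a short sketch (with hypothetical sample sizes) collects them:

```python
def df_single_sample(n):
    """n observations minus one estimated parameter (the mean)."""
    return n - 1

def df_two_samples(n1, n2):
    """n1 + n2 observations minus two estimated group means."""
    return n1 + n2 - 2

def df_regression(n, p):
    """n observations minus p + 1 estimated parameters (p slopes + intercept)."""
    return n - p - 1

print(df_single_sample(30))             # 29
print(df_two_samples(15, 20))           # 33
print(df_regression(50, 3))             # 46
```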
DF in Statistical Distributions
Several commonly encountered statistical distributions (Student's t, chi-squared, F) have parameters that are commonly referred to as df. This terminology simply reflects that, in many applications where these distributions occur, the parameter corresponds to the degrees of freedom of an underlying random vector. If $X_i,\ i=1,2,\cdots,n$, are independent normal $(\mu, \sigma^2)$ random variables, then the statistic $\frac{\sum_{i=1}^n (X_i-\overline{X})^2}{\sigma^2}$ follows a chi-squared distribution with $n-1$ degrees of freedom. Here, the degrees of freedom arise from the residual sum of squares in the numerator, and in turn from the $n-1$ degrees of freedom of the underlying residual vector $X_i-\overline{X}$.
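A quick Monte Carlo check (with arbitrary values chosen here for $n$, $\mu$, and $\sigma$) illustrates this: a chi-squared variable with $n-1$ degrees of freedom has mean $n-1$ and variance $2(n-1)$, and the simulated statistic matches both.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustration parameters.
n, mu, sigma = 10, 5.0, 2.0
reps = 100_000

# Draw many samples of size n and form sum((X_i - Xbar)^2) / sigma^2 for each.
samples = rng.normal(mu, sigma, size=(reps, n))
stat = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / sigma**2

# Chi-squared(n - 1) has mean n - 1 and variance 2(n - 1).
print(stat.mean(), n - 1)               # ~9.0 vs 9
print(stat.var(), 2 * (n - 1))          # ~18.0 vs 18
```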
Degrees of freedom (df) represent the number of independent values in a statistical calculation that can vary without violating constraints. They play a crucial role in hypothesis testing, regression analysis, and probability distributions.