Coefficient of Determination

$R^2$, pronounced R-squared and called the coefficient of determination, is a useful statistic for assessing how well a regression model fits the data. $R^2$ measures the proportion of the total variation about the mean $\bar{Y}$ that is explained by the regression. $R$ is the correlation between $Y$ and $\hat{Y}$ and, in multiple regression, is called the multiple correlation coefficient. The coefficient of determination satisfies $0\le R^2\le 1$ and can attain its maximum value of 1 (100%) only when all the $X$ values in the data are different, i.e., when there are no repeat runs.


The Coefficient of Determination ($R^2$) quantifies how well a regression model explains the variance in the dependent variable. It ranges from 0 to 1. When repeat runs exist in the data, $R^2$ cannot attain 1 no matter how well the model fits, because no model can explain the variation in the data that is due to pure error. For a perfect fit, in which $\hat{Y}_i=Y_i$ for every observation, $R^2=1$. If $\hat{Y}_i=\bar{Y}$, that is, if $\beta_1=\beta_2=\cdots=\beta_{p-1}=0$ or if the model $Y=\beta_0 +\varepsilon$ alone has been fitted, then $R^2=0$. Therefore, $R^2$ is a measure of the usefulness of the terms other than $\beta_0$ in the model.
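
As a quick check of these two boundary cases, here is a minimal Python sketch (using made-up values, not data from this article) that computes $R^2$ as the regression sum of squares divided by the total sum of squares:

```python
import numpy as np

# Hypothetical observations (illustrative values only)
y = np.array([3.0, 5.0, 4.0, 8.0, 7.0])

def r_squared(y, y_hat):
    """R^2 = sum((y_hat - y_bar)^2) / sum((y - y_bar)^2)."""
    y_bar = y.mean()
    return np.sum((y_hat - y_bar) ** 2) / np.sum((y - y_bar) ** 2)

# Perfect fit: fitted values equal the observations, so R^2 = 1
print(r_squared(y, y.copy()))                       # 1.0

# Intercept-only model: every fitted value is the mean, so R^2 = 0
print(r_squared(y, np.full_like(y, y.mean())))      # 0.0
```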

Note that we must be sure that an improvement (increase) in the $R^2$ value due to adding a new term (variable) to the model has some real significance and is not simply because the number of parameters in the model is approaching the saturation point. If there is no pure error, $R^2$ can be made equal to unity by adding enough terms.
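
The following sketch illustrates this caution with hypothetical data: fitting polynomials of increasing degree (i.e., adding terms) drives $R^2$ toward 1 even though the extra terms have no real meaning, because the $x$ values contain no repeat runs and hence no pure error.

```python
import numpy as np

# Hypothetical data with no repeat x values, so there is no pure error
# and R^2 can be pushed to 1 just by adding polynomial terms.
rng = np.random.default_rng(1)
x = np.arange(1.0, 7.0)                       # 6 distinct x values
y = 2.0 + 0.5 * x + rng.normal(0, 1, x.size)  # linear trend plus noise

def r_squared(y, y_hat):
    return np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

for degree in range(1, 6):
    coefs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    y_hat = np.polyval(coefs, x)
    print(degree, round(r_squared(y, y_hat), 4))
# R^2 never decreases as terms are added and reaches 1 at degree n - 1 = 5,
# even though the extra terms have no real significance.
```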

Coefficient of Determination Formula

\begin{align*}
R^2 &= \frac{\text{SS due to regression given}\ b_0}{\text{Total SS corrected for the mean}\ \bar{Y}} \\
&= \frac{SS\,(b_1 \mid b_0)}{S_{YY}} \\
&= \frac{\sum(\hat{Y}_i-\bar{Y})^2}{\sum(Y_i-\bar{Y})^2} \\
&= \frac{S_{XY}^2}{S_{XX}\,S_{YY}}
\end{align*}

where the summations are over $i=1,2,\cdots, n$, and the last expression applies to simple linear regression with $S_{XX}=\sum(X_i-\bar{X})^2$, $S_{YY}=\sum(Y_i-\bar{Y})^2$, and $S_{XY}=\sum(X_i-\bar{X})(Y_i-\bar{Y})$.
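
As a numerical check, the following Python sketch (with made-up $(x, y)$ values) fits a simple linear regression and verifies that the expressions above, together with the square of the correlation between $Y$ and $\hat{Y}$, all give the same $R^2$:

```python
import numpy as np

# Hypothetical (x, y) data for a simple linear regression, used only to
# check that the expressions for R^2 agree.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.6, 12.3])

b1, b0 = np.polyfit(x, y, 1)          # least-squares slope and intercept
y_hat = b0 + b1 * x

S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))

r2_ss   = np.sum((y_hat - y.mean()) ** 2) / S_yy   # SS(regression) / S_YY
r2_sums = S_xy ** 2 / (S_xx * S_yy)                # S_XY^2 / (S_XX * S_YY)
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2         # square of corr(Y, Y_hat)

print(r2_ss, r2_sums, r2_corr)        # all three values match
```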


Interpreting R-Square

$R^2$ does not indicate whether:

  • The independent variables (explanatory variables) are a cause of the changes in the dependent variable;
  • Omitted-variable bias exists;
  • The correct regression was used;
  • The most appropriate set of explanatory variables has been selected;
  • There is collinearity (or multicollinearity) present in the data;
  • The model might be improved using transformed versions of the existing explanatory variables.

Use of Coefficient of Determination in Various Fields

  • Economics & Business: A company wants to predict sales based on advertising spending. A value of $R^2=0.85$ means 85% of the variation in sales can be explained by advertising spending. The remaining 15% may be due to other factors (e.g., seasonality, competition).
  • Healthcare: A study examines the relationship between exercise time and blood pressure. For example, an $R^2=0.60$ means 60% of blood pressure variation is explained by exercise time. The other 40% could be due to diet, genetics, or stress.
  • Education: A school analyzes the impact of study hours on exam scores. $R^2=0.50$ means 50% of exam score variation is explained by study time. The rest may depend on teaching quality, student aptitude, or test difficulty.
  • Real Estate: Predicting house prices based on square footage. $R^2=0.75$ means that 75% of price variation is explained by size. The remaining 25% could be due to location, age of the house, or amenities.
  • Agriculture: A farmer studies the effect of fertilizer amount on crop yield. A value of $R^2=0.40$ means that 40% of yield variation is explained by fertilizer use. The other 60% could be due to rainfall, soil quality, or pest control.
  • Sports Analytics: Predicting a basketball player’s points per game based on practice hours. The value of $R^2=0.30$ means that 30% of the scoring variation is explained by practice time. The rest may depend on opponent strength, player fatigue, or teamwork.

Important Points about Coefficient of Determination

  • A high $R^2$ (e.g., 0.8) suggests the model explains most variability.
  • A low $R^2$ (e.g., 0.2) means other factors are more influential.
  • $R^2$ does not imply causation—only correlation.

Learn more at https://itfeature.com