In this post, we discuss the coefficient of determination: its formula, its computation, and its use, and how it serves as a link between regression and correlation analysis.
Coefficient of Determination $R^2$ in Statistics
R-squared ($r^2$, the square of the correlation coefficient) shows the proportion (often quoted as a percentage) of the total variation of the dependent variable ($Y$) that can be explained by the independent (explanatory) variable ($X$). For this reason, $r^2$ (r-squared) is called the coefficient of determination.
The coefficient of determination (R-squared) is commonly used in fields such as social science, finance, and economics to evaluate the performance of regression models. It helps researchers understand how well their models capture the relationship between the variables being studied.
Coefficient of Determination Formula
Since
\[r=\frac{\sum x_i y_i}{\sqrt{\sum x_i^2} \sqrt{\sum y_i^2}},\]
squaring both sides gives
\begin{align*}
r^2&=\frac{(\sum x_iy_i)^2}{(\sum x_i^2)(\sum y_i^2)}=\frac{\sum \hat{y}^2}{\sum y^2}\\
&=\frac{\text{Explained Variation}}{\text{Total Variation}}
\end{align*}
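where $r$ shows the degree of covariability of $X$ and $Y$. Note that the formulas used here are in deviation form, that is, $x=X-\bar{X}$ and $y=Y-\bar{Y}$, where $\bar{X}$ and $\bar{Y}$ are the means of $X$ and $Y$; $\hat{y}$ is the fitted value in deviation form and $e=y-\hat{y}$ is the residual.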
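As a small worked illustration of the formula (the data are made up for this example), take $X=(1,2,3,4,5)$ and $Y=(2,4,5,4,5)$, so that $\bar{X}=3$ and $\bar{Y}=4$. In deviation form, $x=(-2,-1,0,1,2)$ and $y=(-2,0,1,0,1)$, which gives $\sum xy=6$, $\sum x^2=10$, and $\sum y^2=6$. The least-squares slope in deviation form is $b=\sum xy/\sum x^2=0.6$, so $\sum \hat{y}^2=b^2\sum x^2=3.6$.
\begin{align*}
r^2&=\frac{(\sum xy)^2}{(\sum x^2)(\sum y^2)}=\frac{6^2}{(10)(6)}=0.6\\
&=\frac{\sum \hat{y}^2}{\sum y^2}=\frac{3.6}{6}=0.6
\end{align*}
Both routes agree: in this toy data set, about 60% of the variation in $Y$ is explained by $X$.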
The role of $r^2$ as a link between regression and correlation analysis can be seen from the following cases.
- If all the observations lie on the regression line, there is no scatter around the line. In other words, the total variation of variable $Y$ is explained completely by the estimated regression line, so there is no unexplained variation. That is,
\[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}=0\]
Hence, $r^2=1$ and $r=\pm 1$.
- If the regression line explains only part of the variation in variable $Y$, then there will be some unexplained variation, that is,
\[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}>0\]
then $r^2$ will be smaller than 1.
- If the regression line does not explain any part of the variation of variable $Y$, that is,
\[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}=1\Rightarrow \sum e^2 = \sum y^2\]
then $r^2=0$, because $r^2=1-\frac{\text{Unexplained Variation}}{\text{Total Variation}}$.
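The three cases above can be checked numerically. The following is a minimal Python sketch, assuming a simple least-squares line fitted in deviation form. The helper name `r_squared` and the small data sets are hypothetical and chosen only for illustration.

```python
def r_squared(X, Y):
    """Return 1 - (unexplained variation / total variation) for a least-squares line."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Deviation form: x = X - mean(X), y = Y - mean(Y)
    x = [xi - x_bar for xi in X]
    y = [yi - y_bar for yi in Y]
    # Least-squares slope in deviation form
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    # Residuals e = y - y_hat, where y_hat = slope * x
    e = [b - slope * a for a, b in zip(x, y)]
    unexplained = sum(v * v for v in e)
    total = sum(b * b for b in y)
    return 1 - unexplained / total

# Illustrative (made-up) data for the three cases
perfect = r_squared([1, 2, 3, 4], [3, 5, 7, 9])        # all points lie on the line
partial = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # some scatter around the line
no_fit = r_squared([1, 2, 3, 4], [2, 4, 4, 2])         # the line explains nothing
print(perfect, partial, no_fit)
```

With these data, the three values should come out close to 1, 0.6, and 0, matching the three cases in turn.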
Key Points about Coefficient of Determination
- Overfitting: A model can achieve a high $R^2$ value by simply memorizing the training data, but the model might not perform well on unseen data.
- Number of Predictors: Adding more independent variables to a model tends to increase the $R^2$ value (in ordinary least squares it never decreases), but this does not necessarily mean the additional variables are statistically significant.
- Alternative Metrics: For a more nuanced assessment of model fit, use additional tools such as adjusted R-squared or residual analysis.
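To make the point about the number of predictors concrete, here is a minimal sketch of the usual adjusted R-squared formula, $\bar{R}^2=1-(1-R^2)\frac{n-1}{n-p-1}$, where $n$ is the number of observations and $p$ the number of predictors. The function name `adjusted_r_squared` and the $R^2$ values below are hypothetical and are not results from any real model.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical values: a third predictor raises R^2 only slightly (0.800 -> 0.805).
adj_two = adjusted_r_squared(0.800, n=20, p=2)
adj_three = adjusted_r_squared(0.805, n=20, p=3)
print(round(adj_two, 4), round(adj_three, 4))
```

In this made-up case, $R^2$ rises while the adjusted value falls, which suggests the extra predictor does not earn its place in the model.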
Keeping these limitations of R-squared in mind, data analysts can use the coefficient of determination as a valuable tool to assess how well their models capture real-world relationships between variables.
Note that there are two main ways to calculate the R-squared value:
- Squared Correlation Coefficient: R-squared is the square of the correlation coefficient ($r$) between the predicted values ($\hat{y}$) from the model and the actual values of the dependent variable ($y$).
- Analysis of Variance (ANOVA): R-squared can also be calculated using the ratio of the explained variance to the total variance (variance in the dependent variable).
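For a simple linear regression fitted by least squares, the two routes give the same number. The sketch below illustrates this in plain Python. The helper `pearson_r` and the data are hypothetical, and the ANOVA route is written as the ratio of the explained sum of squares to the total sum of squares.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    sab = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    saa = sum((u - a_bar) ** 2 for u in a)
    sbb = sum((v - b_bar) ** 2 for v in b)
    return sab / (sqrt(saa) * sqrt(sbb))

# Illustrative (made-up) data
X = [2, 4, 6, 8, 10]
Y = [1, 3, 2, 5, 4]

# Fit the least-squares line Y_hat = intercept + slope * X
x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sum((xi - x_bar) ** 2 for xi in X)
intercept = y_bar - slope * x_bar
Y_hat = [intercept + slope * xi for xi in X]

# Way 1: squared correlation between predicted and actual values
r2_corr = pearson_r(Y_hat, Y) ** 2

# Way 2: explained sum of squares divided by total sum of squares
explained = sum((yh - y_bar) ** 2 for yh in Y_hat)
total = sum((yi - y_bar) ** 2 for yi in Y)
r2_anova = explained / total

print(r2_corr, r2_anova)
```

Both values should be close to 0.64 for these data, up to floating-point rounding.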
FAQs about Coefficient of Determination
- For a simple linear regression model, what is the link between the coefficient of correlation and the coefficient of determination?
- How is the coefficient of determination interpreted?
- How can the coefficient of determination be obtained from the ANOVA table?
- How can overfitting be identified from the value of $R^2$?
- What are alternatives to $R^2$?
- What is the link between total variation, explained variation, and unexplained variation?