Coefficient of Determination
Coefficient of Determination as a Link between Regression and Correlation Analysis
The coefficient of determination, $r^2$ (r-squared; the square of the correlation coefficient $r$), gives the proportion of the total variation in the dependent variable ($Y$) that is explained by the independent (explanatory) variable ($X$). Multiplied by 100, it is read as a percentage. Because it measures how far $X$ determines the variation in $Y$, $r^2$ is called the coefficient of determination.
Since
\[r=\frac{\sum x_i y_i}{\sqrt{\sum x_i^2} \sqrt{\sum y_i^2}},\]
then
\begin{align*}
r^2&=\frac{(\sum x_iy_i)^2}{(\sum x_i^2)(\sum y_i^2)}=\frac{\sum \hat{y}^2}{\sum y^2}\\
&=\frac{\text{Explained Variation}}{\text{Total Variation}}
\end{align*}
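where $r$ shows the degree of covariability of $X$ and $Y$, and $\hat{y}$ is the fitted value of $y$ from the regression line. Note that the formulas used here are in deviation form, that is, $x=X-\overline{X}$ and $y=Y-\overline{Y}$, where $\overline{X}$ and $\overline{Y}$ are the means of $X$ and $Y$.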
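The second equality holds because, in deviation form, the fitted values are $\hat{y}_i=b\,x_i$ with slope $b=\sum x_iy_i/\sum x_i^2$, so $\sum \hat{y}_i^2=b^2\sum x_i^2=(\sum x_iy_i)^2/\sum x_i^2$. The following is a minimal numerical sketch of that equality, not a definitive implementation. The data values and variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data (an assumption, for illustration only).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Deviation form: subtract each variable's own mean.
x = X - X.mean()
y = Y - Y.mean()

# Correlation coefficient from the deviation-form formula.
r = np.sum(x * y) / (np.sqrt(np.sum(x**2)) * np.sqrt(np.sum(y**2)))

# Least-squares slope and fitted values, also in deviation form.
b = np.sum(x * y) / np.sum(x**2)
y_hat = b * x

r2_from_r = r**2                                      # square of the correlation
r2_from_variation = np.sum(y_hat**2) / np.sum(y**2)   # explained / total variation

print(r2_from_r, r2_from_variation)  # the two values agree up to rounding
```

For these data, $\sum x_iy_i=6$, $\sum x_i^2=10$ and $\sum y_i^2=6$, so both expressions give $36/60=0.6$: the regression line explains 60% of the variation in $Y$.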
The role of $r^2$ as a link between regression and correlation analysis can be seen from the following three cases.
- If all the observations lie on the regression line, then there is no scatter of points around it. In other words, the total variation of variable $Y$ is explained completely by the estimated regression line, and there is no unexplained variation. That is
\[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}=0\]
Hence, $r^2=1$ and $r=\pm 1$, with the sign of $r$ matching the sign of the slope of the regression line.
- If the regression line explains only part of the variation in variable $Y$, then there will be some unexplained variation, that is,
\[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}>0\]
then $r^2$ will be smaller than 1.
- If the regression line does not explain any part of the variation of variable $Y$, that is,
\[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}=1\Rightarrow \sum y^2 = \sum e^2\]
then $r^2=0$.
All three cases follow from the identity $r^2=1-\frac{\text{Unexplained Variation}}{\text{Total Variation}}$.
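The three cases above can be checked numerically. The sketch below computes $r^2$ as one minus the ratio of unexplained to total variation for three small data sets. The helper name `r_squared` and all data values are hypothetical, made up so that each case is reproduced exactly. It assumes a simple linear regression of $Y$ on a single $X$.

```python
import numpy as np

def r_squared(X, Y):
    """Return 1 - (unexplained variation / total variation) for a
    simple least-squares line of Y on X (hypothetical helper)."""
    x = X - X.mean()                   # deviations of X from its mean
    y = Y - Y.mean()                   # deviations of Y from its mean
    b = np.sum(x * y) / np.sum(x**2)   # least-squares slope
    e = y - b * x                      # residuals from the fitted line
    return 1.0 - np.sum(e**2) / np.sum(y**2)

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Case 1: every point lies on a line (here with negative slope, so r = -1).
r2_perfect = r_squared(X, 3.0 - 2.0 * X)

# Case 2: the line explains only part of the variation in Y.
r2_partial = r_squared(X, np.array([2.0, 4.0, 5.0, 4.0, 5.0]))

# Case 3: Y has no linear relationship with X (sum of x*y is zero).
r2_none = r_squared(X, np.array([1.0, 2.0, 0.0, 2.0, 1.0]))

print(r2_perfect, r2_partial, r2_none)  # approximately 1, 0.6 and 0
```

The first data set uses a negative slope on purpose: $r^2$ still equals 1 even though $r=-1$, which is why a perfect fit implies $r=\pm 1$ and not necessarily $r=1$.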