Coefficient of Determination

Coefficient of Determination as a Link between Regression and Correlation Analysis

The R squared ($r^2$; the square of the correlation coefficient) shows the percentage of the total variation of the dependent variable ($Y$) that can be explained by the independent (explanatory) variable ($X$). For this reason, $r^2$ (r-squared) is sometimes called the coefficient of determination.

Since

\[r=\frac{\sum x_i y_y}{\sqrt{\sum x_i^2} \sqrt{\sum y_i^2}},\]

then

\begin{align*}
r^2&=\frac{(\sum x_iy_i)^2}{(\sum x_i^2)(\sum y_i^2)}=\frac{\sum \hat{y}^2}{\sum y^2}\\
&=\frac{\text{Explained Variation}}{\text{Total Variation}}
\end{align*}

where $r$ shows the degree of covariability of $X$ and $Y$. Note that in the formula used here is in deviation form, that is, $x=X-\mu$ and $y=Y-\mu$.

The link of $r^2$ between regression and correlation analysis can be considered from these points.

  • If all the observations lie on the regression line then there will be no scattered of points. In other words, the total variation of variable $Y$ is explained completely by the estimated regression line, which shows that there would be no scatterness in the data points(or no unexplained variation). That is
    \[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}=0\]
    Hence, $r^2=r=1$.
  • If the regression line explains only part of the variation in variable $Y$ then there will be some explained variation, that is,
    \[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}>0\]
    then, $r^2$ will be smaller than 1.
  • If the regression line does not explain any part of the variation of variable $Y$, that is,
    \[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}=1\Rightarrow=\sum y^2 = \sum e^2\]
    then, $r^2=0$.

Because $r^2=1-\frac{\text{unexlained variation}}{\text{total variation}}$

Learn more about

Muhammad Imdad Ullah

Currently working as Assistant Professor of Statistics in Ghazi University, Dera Ghazi Khan. Completed my Ph.D. in Statistics from the Department of Statistics, Bahauddin Zakariya University, Multan, Pakistan. l like Applied Statistics, Mathematics, and Statistical Computing. Statistical and Mathematical software used is SAS, STATA, Python, GRETL, EVIEWS, R, SPSS, VBA in MS-Excel. Like to use type-setting LaTeX for composing Articles, thesis, etc.

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

x Logo: Shield Security
This Site Is Protected By
Shield Security