The coefficient of correlation (r) measures the strength and direction of a linear relationship between two variables. In this post, we discuss the coefficient of correlation and the coefficient of determination, and show why the correlation coefficient must lie between $-1$ and $+1$.
Correlation Coefficient Ranges
The correlation coefficient ranges from -1 to +1. A value of +1 indicates a perfect positive linear correlation (as one variable increases, the other increases proportionally), a value of -1 indicates a perfect negative linear correlation (as one variable increases, the other decreases proportionally), and a value of 0 indicates no linear correlation (the variables may still be related in a non-linear way).
Values of the coefficient of correlation between -1 and +1 indicate the degree of correlation. A common rule of thumb is:
- 0 to 0.3 (or -0.3 to 0): Weak correlation.
- 0.3 to 0.7 (or -0.7 to -0.3): Moderate correlation.
- 0.7 to 1 (or -1 to -0.7): Strong correlation.
The closer the value is to ±1, the stronger the linear relationship.
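The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names `pearson_r` and `describe_strength` and the data are made up for this example, and because the listed ranges share their end points, the sketch assumes that a value of exactly 0.3 counts as moderate and exactly 0.7 as strong.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either variable is constant
    return sxy / math.sqrt(sxx * syy)

def describe_strength(r):
    """Label |r| with the rule-of-thumb cut-offs 0.3 and 0.7 (boundary handling is an assumption)."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"

# Hypothetical data, invented for illustration only
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 71]

r = pearson_r(hours, score)
print(round(r, 3), describe_strength(r))
```

The sign of `r` gives the direction of the relationship, while `describe_strength` looks only at its absolute value, so -0.8 and +0.8 are both labelled strong.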
Coefficient of Determination
The ratio of the explained variation to the total variation is called the coefficient of determination. It is the square of the correlation coefficient, which lies between $-1$ and $+1$. Being a ratio of sums of squares, the coefficient of determination is non-negative and is therefore denoted by $r^2$; thus
\begin{align*}
r^2&=\frac{\text{Explained Variation}}{\text{Total Variation}}\\
&=\frac{\sum (\hat{Y}-\overline{Y})^2}{\sum (Y-\overline{Y})^2}
\end{align*}
It can be seen that if the total variation is all explained, the ratio $r^2$ (Coefficient of Determination) is one and if the total variation is all unexplained then the explained variation and the ratio $r^2$ are zero.
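As a rough numerical illustration of this ratio, the sketch below fits a least-squares line $\hat{Y}=a+bX$ and divides the explained variation by the total variation. The helper names `fit_line` and `coefficient_of_determination` and the data are assumptions made for this example, not part of the derivation.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b of the line Y-hat = a + b*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def coefficient_of_determination(x, y):
    """r^2 = explained variation / total variation for the fitted line."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    y_hat = [a + b * xi for xi in x]
    explained = sum((yh - mean_y) ** 2 for yh in y_hat)
    total = sum((yi - mean_y) ** 2 for yi in y)
    return explained / total

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_squared = coefficient_of_determination(x, y)
print(round(r_squared, 4))
```

For points that fall exactly on a straight line, the fitted values equal the observed values and the ratio comes out as one, matching the statement above.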
The square root of the coefficient of determination, taken with the sign of the slope of the fitted regression line, is the correlation coefficient, given by
\begin{align*}
r&=\pm\sqrt{ \frac{\text{Explained Variation}}{\text{Total Variation}} }\\
&=\pm \sqrt{\frac{\sum (\hat{Y}-\overline{Y})^2}{\sum (Y-\overline{Y})^2}}
\end{align*}
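Because the total variation splits into an explained part and an unexplained part, the explained variation can be written as
\[\sum (\hat{Y}-\overline{Y})^2=\sum(Y-\overline{Y})^2-\sum (Y-\hat{Y})^2\]
so that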
\begin{align*}
r&=\pm\sqrt{ \frac{\sum(Y-\overline{Y})^2-\sum (Y-\hat{Y})^2} {\sum(Y-\overline{Y})^2} }\\
&=\pm\sqrt{1-\frac{\sum (Y-\hat{Y})^2}{\sum(Y-\overline{Y})^2}}\\
&=\pm\sqrt{1-\frac{\text{Unexplained Variation}}{\text{Total Variation}}}=\pm\sqrt{1-\frac{s_{y.x}^2}{s_y^2}}
\end{align*}
where $s_{y.x}^2=\frac{1}{n} \sum (Y-\hat{Y})^2$ and $s_y^2=\frac{1}{n} \sum (Y-\overline{Y})^2$
\begin{align*}
\Rightarrow r^2&=1-\frac{s_{y.x}^2}{s_y^2}\\
\Rightarrow s_{y.x}^2&=s_y^2(1-r^2)
\end{align*}
Since variances are non-negative (and $s_y^2>0$ unless every value of $Y$ is the same),
\[\frac{s_{y.x}^2}{s_y^2}=1-r^2 \geq 0\]
Solving the inequality, we have
\begin{align*}
1-r^2 & \geq 0\\
\Rightarrow r^2 \leq 1\, \text{or}\, |r| &\leq 1\\
\Rightarrow & -1 \leq r\leq 1
\end{align*}
Therefore, the correlation coefficient lies between $-1$ and $+1$, inclusive.
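The identities used in this derivation can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `variation_check` and the data are hypothetical, the line is fitted by ordinary least squares, and the variances use the $\frac{1}{n}$ divisor as in the definitions of $s_{y.x}^2$ and $s_y^2$ above.

```python
import math

def variation_check(x, y):
    """Return the quantities from the derivation for a least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)  # total variation
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Least-squares line Y-hat = a + b*X (requires x to be non-constant)
    b = sxy / sxx
    a = mean_y - b * mean_x
    y_hat = [a + b * xi for xi in x]

    explained = sum((yh - mean_y) ** 2 for yh in y_hat)
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

    # r computed directly from the sums of squares and products
    r = sxy / math.sqrt(sxx * syy)
    return {
        "total": syy,
        "explained": explained,
        "unexplained": unexplained,
        "s2_yx": unexplained / n,  # residual variance, 1/n divisor
        "s2_y": syy / n,           # variance of Y, 1/n divisor
        "r": r,
    }

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
out = variation_check(x, y)

# Total variation = explained + unexplained
print(math.isclose(out["total"], out["explained"] + out["unexplained"]))
# s_{y.x}^2 = s_y^2 (1 - r^2)
print(math.isclose(out["s2_yx"], out["s2_y"] * (1 - out["r"] ** 2)))
# -1 <= r <= 1
print(-1 <= out["r"] <= 1)
```

Here `r` is computed directly from the data rather than from the ratio of variations, so the second check compares two independently computed sides of the identity.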
Alternative Proof: Correlation Coefficient Range
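Let $X^*=\frac{X-\mu_X}{\sigma_X}$ and $Y^*=\frac{Y-\mu_Y}{\sigma_Y}$ be the standardized variables. As covariance is bi-linear and $X^*, Y^*$ have zero mean and variance 1, we have $\rho(X,Y)=\rho(X^*,Y^*)$, because
\begin{align*}
\rho(X^*,Y^*)&=\frac{Cov(X^*,Y^*)}{\sqrt{Var(X^*)\,Var(Y^*)}}=Cov(X^*,Y^*)\\
&=Cov\left(\frac{X-\mu_X}{\sigma_X},\frac{Y-\mu_Y}{\sigma_Y}\right)\\
&=\frac{Cov(X,Y)}{\sigma_X \sigma_Y}=\rho(X,Y)
\end{align*}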
We also know that the variance of any random variable is non-negative, and that $Var(X)=0$ if and only if $X$ is a constant (almost surely). Therefore,
\[Var(X^* \pm Y^*)=Var(X^*)+Var(Y^*)\pm2Cov(X^*,Y^*)\geq 0\]
As $Var(X^*)=1$ and $Var(Y^*)=1$, the right-hand side equals $2\pm 2Cov(X^*,Y^*)$, which would be negative if $Cov(X^*,Y^*)$ were either greater than 1 or less than -1. Hence
\[1\geq \rho(X,Y)=\rho(X^*,Y^*)\geq -1\]
If $\rho(X,Y)=Cov(X^*,Y^*)=1$ then $Var(X^*-Y^*)=0$, making $X^* = Y^*$ almost surely. Similarly, if $\rho(X,Y)=Cov(X^*,Y^*)=-1$ then $Var(X^*+Y^*)=0$, making $X^* = -Y^*$ almost surely. In either case, $Y$ would be a linear function of $X$ almost surely.
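A sample analogue of this argument can be sketched in Python. The proof itself concerns random variables; the code below only illustrates it on a small hypothetical data set, using population-style ($\frac{1}{n}$) moments, and the helper names `mean`, `variance`, `covariance`, and `standardize` are assumptions made for this example.

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population-style variance (1/n divisor)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def covariance(u, v):
    """Population-style covariance (1/n divisor)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def standardize(values):
    """Subtract the mean and divide by the standard deviation."""
    m = mean(values)
    sd = math.sqrt(variance(values))
    return [(v - m) / sd for v in values]

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

xs, ys = standardize(x), standardize(y)

# For standardized variables the covariance is the correlation
rho = covariance(xs, ys)

var_sum = variance([a + b for a, b in zip(xs, ys)])
var_diff = variance([a - b for a, b in zip(xs, ys)])

# Var(X* + Y*) = 2 + 2*rho and Var(X* - Y*) = 2 - 2*rho, both non-negative
print(math.isclose(var_sum, 2 + 2 * rho), math.isclose(var_diff, 2 - 2 * rho))
print(var_sum >= 0 and var_diff >= 0, -1 <= rho <= 1)
```

Since both variances are non-negative, $2+2\rho\geq 0$ and $2-2\rho\geq 0$ together force $\rho$ into the interval from -1 to +1, which is the bound stated above.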
For a proof of the Cauchy-Schwarz inequality, please follow the link.
We can see that the correlation coefficient lies between $-1$ and $+1$.
Learn more about
- Pearson’s Correlation Coefficient use, Interpretation, and Properties
- Coefficient of Determination as Model Selection Criteria