Correlation Coefficient Range Unlock Easy Proof, 2012


The coefficient of correlation (r) measures the strength and direction of the linear relationship between two variables. In this post, we discuss the coefficient of correlation, the coefficient of determination, and why r always lies between -1 and +1.

Correlation Coefficient Ranges

The correlation coefficient ranges from -1 to +1. A value of +1 indicates a perfect positive linear correlation (all points lie on a straight line with positive slope, so as one variable increases, the other increases), a value of -1 indicates a perfect negative linear correlation (all points lie on a straight line with negative slope, so as one variable increases, the other decreases), and a value of 0 indicates no linear correlation. A value of 0 does not rule out a non-linear relationship between the variables.
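To make the three reference cases concrete, here is a minimal Python sketch that computes the sample correlation from its usual definition, $r=\frac{\sum (X-\overline{X})(Y-\overline{Y})}{\sqrt{\sum (X-\overline{X})^2\sum (Y-\overline{Y})^2}}$. The helper pearson_r and the toy data are illustrative, not taken from the post. The third case shows why r = 0 means no linear relationship only: Y = X^2 is completely determined by X, yet r is zero.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                   # variation of X
    syy = sum((b - my) ** 2 for b in y)                   # variation of Y
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y_pos = [2 * a + 1 for a in x]    # exact increasing straight line
y_neg = [-3 * a + 5 for a in x]   # exact decreasing straight line
y_sq = [a * a for a in x]         # exact but non-linear (U-shaped)

print(pearson_r(x, y_pos))  # 1.0
print(pearson_r(x, y_neg))  # -1.0
print(pearson_r(x, y_sq))   # 0.0
```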

Values of r between -1 and +1 indicate both the direction and the strength of the relationship: the sign of r gives the direction, and the strength depends on the absolute value of r:

Range of |r|	Interpretation (rule of thumb)
0.90 to 1.00	Very strong correlation
0.70 to 0.89	Strong correlation
0.40 to 0.69	Moderate correlation
0.10 to 0.39	Weak correlation
0.00 to 0.09	No or negligible correlation

The closer the value of the correlation coefficient is to ±1, the stronger the linear relationship.
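The rule-of-thumb bands in the table can be applied mechanically. In the sketch below, describe_r is a hypothetical helper; it treats each band as starting at its listed lower bound, an assumption needed because the two-decimal table leaves small gaps (for example, 0.895 falls between 0.89 and 0.90 and is assigned to the lower band).

```python
def describe_r(r):
    """Label r using the |r| bands from the table (lower bounds inclusive)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    a = abs(r)  # strength depends only on the absolute value
    if a >= 0.90:
        strength = "Very strong"
    elif a >= 0.70:
        strength = "Strong"
    elif a >= 0.40:
        strength = "Moderate"
    elif a >= 0.10:
        strength = "Weak"
    else:
        return "No or negligible correlation"
    direction = "positive" if r > 0 else "negative"  # sign gives direction
    return f"{strength} {direction} correlation"

print(describe_r(0.85))   # Strong positive correlation
print(describe_r(-0.70))  # Strong negative correlation
print(describe_r(0.05))   # No or negligible correlation
```

The bands are conventions; other sources draw the cut-offs differently, so the thresholds are easy to adjust in one place.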

Coefficient of Determination

The ratio of the explained variation to the total variation is called the coefficient of determination. This ratio is non-negative, since both variations are sums of squares, and it is denoted by $r^2$ because it equals the square of the correlation coefficient. Thus

\begin{align*}
r^2&=\frac{\text{Explained Variation}}{\text{Total Variation}}\\
&=\frac{\sum (\hat{Y}-\overline{Y})^2}{\sum (Y-\overline{Y})^2}
\end{align*}

It can be seen that if the total variation is all explained, the ratio $r^2$ (Coefficient of Determination) is one, and if the total variation is all unexplained, then the explained variation and the ratio $r^2$ are zero.
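A small Python sketch of this ratio, assuming a simple least-squares line $\hat{Y}=a+bX$ (the helpers fit_line and coefficient_of_determination are illustrative names). It reproduces the two extreme cases just described: points lying exactly on a straight line (all variation explained) and symmetric U-shaped points whose fitted line is flat (none explained).

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b of Y_hat = a + b*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    return my - b * mx, b

def coefficient_of_determination(x, y):
    """r^2 = explained variation / total variation for the fitted line."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    y_hat = [a + b * u for u in x]
    explained = sum((h - my) ** 2 for h in y_hat)  # sum (Y_hat - Y_bar)^2
    total = sum((v - my) ** 2 for v in y)          # sum (Y - Y_bar)^2
    return explained / total

# All variation explained: the points lie exactly on Y = 1 + 2X.
print(coefficient_of_determination([-2, -1, 0, 1, 2], [-3, -1, 1, 3, 5]))  # 1.0
# None explained: symmetric U-shaped points give a flat fitted line.
print(coefficient_of_determination([-2, -1, 1, 2], [4, 1, 1, 4]))          # 0.0
```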

The square root of the coefficient of determination is called the correlation coefficient; its sign is taken from the sign of the slope of the fitted regression line:

\begin{align*}
r&=\pm\sqrt{ \frac{\text{Explained Variation}}{\text{Total Variation}} }\\
&=\pm \sqrt{\frac{\sum (\hat{Y}-\overline{Y})^2}{\sum (Y-\overline{Y})^2}}
\end{align*}

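and, for the least-squares regression line (fitted with an intercept), the total variation splits into explained and unexplained parts, so that

\[\sum (\hat{Y}-\overline{Y})^2=\sum(Y-\overline{Y})^2-\sum (Y-\hat{Y})^2\]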

Therefore

\begin{align*}
r&=\pm\sqrt{ \frac{\sum(Y-\overline{Y})^2-\sum (Y-\hat{Y})^2} {\sum(Y-\overline{Y})^2} }\\
&=\pm\sqrt{1-\frac{\sum (Y-\hat{Y})^2}{\sum(Y-\overline{Y})^2}}\\
&=\pm\sqrt{1-\frac{\text{Unexplained Variation}}{\text{Total Variation}}}=\pm\sqrt{1-\frac{s_{y.x}^2}{s_y^2}}
\end{align*}

where $s_{y.x}^2=\frac{1}{n} \sum (Y-\hat{Y})^2$ and $s_y^2=\frac{1}{n} \sum (Y-\overline{Y})^2$

\begin{align*}
\Rightarrow r^2&=1-\frac{s_{y.x}^2}{s_y^2}\\
\Rightarrow s_{y.x}^2&=s_y^2(1-r^2)
\end{align*}
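The decomposition of the total variation and the identity $s_{y.x}^2=s_y^2(1-r^2)$ can be checked numerically. This sketch uses made-up data and the 1/n divisors defined above, and takes the sign of r from the slope b; regression_summary is a hypothetical helper name.

```python
from math import copysign, isclose, sqrt

def regression_summary(x, y):
    """Slope b and the three sums of squares for the least-squares line a + b*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    a = my - b * mx
    y_hat = [a + b * u for u in x]
    sst = sum((v - my) ** 2 for v in y)                # total variation
    ssr = sum((h - my) ** 2 for h in y_hat)            # explained variation
    sse = sum((v - h) ** 2 for v, h in zip(y, y_hat))  # unexplained variation
    return b, sst, ssr, sse

x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.5, 5.0, 8.5, 9.0, 12.5]  # made-up illustrative data
n = len(x)
b, sst, ssr, sse = regression_summary(x, y)

r = copysign(sqrt(ssr / sst), b)  # r = +/- sqrt(r^2), sign taken from the slope b
s_yx2 = sse / n                   # s_{y.x}^2 = (1/n) * sum (Y - Y_hat)^2
s_y2 = sst / n                    # s_y^2 = (1/n) * sum (Y - Y_bar)^2

print(isclose(sst, ssr + sse))             # True: total = explained + unexplained
print(isclose(s_yx2, s_y2 * (1 - r * r)))  # True: s_{y.x}^2 = s_y^2 (1 - r^2)
```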

Since variances are non-negative (and $s_y^2>0$ whenever the $Y$ values are not all equal, so the ratio is defined),

\[\frac{s_{y.x}^2}{s_y^2}=1-r^2 \geq 0\]

Solving this inequality, we have

\begin{align*}
1-r^2 &\geq 0\\
\Rightarrow r^2 &\leq 1, \quad\text{i.e.,}\quad |r| \leq 1\\
\Rightarrow -1 &\leq r \leq 1
\end{align*}

Therefore, the correlation coefficient always lies between $-1$ and $+1$ inclusive.
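A numerical sanity check (not a proof) of this result: the sketch below draws many random, noisy linear datasets with a fixed seed and confirms that every computed r falls inside [-1, 1]. The data generator and its parameter ranges are arbitrary choices made for illustration.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(2012)  # fixed seed so the run is reproducible
values = []
for _ in range(10_000):
    n = rng.randint(3, 30)                 # random sample size
    x = [rng.gauss(0, 1) for _ in range(n)]
    slope = rng.uniform(-3, 3)             # random direction and steepness
    noise = rng.uniform(0.1, 2)            # random noise level
    y = [slope * a + rng.gauss(0, noise) for a in x]
    values.append(pearson_r(x, y))

print(min(values) >= -1.0 and max(values) <= 1.0)  # True
```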

Alternative Proof: Correlation Coefficient Range

Let $X^*=\frac{X-\mu_X}{\sigma_X}$ and $Y^*=\frac{Y-\mu_Y}{\sigma_Y}$ be the standardized versions of $X$ and $Y$. We first show that $\rho(X,Y)=\rho(X^*,Y^*)$.

Since covariance is bilinear, is unaffected by adding constants, and $X^*, Y^*$ each have mean 0 and variance 1,

\begin{align*}
\rho(X^*,Y^*)&=Cov(X^*,Y^*)=Cov\{\frac{X-\mu_X}{\sigma_X},\frac{Y-\mu_Y}{\sigma_Y}\}\\
&=\frac{Cov(X-\mu_X,Y-\mu_Y)}{\sigma_X\sigma_Y}\\
&=\frac{Cov(X,Y)}{\sigma_X \sigma_Y}=\rho(X,Y)
\end{align*}
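The invariance $\rho(X,Y)=\rho(X^*,Y^*)$ also holds for sample data. This sketch standardizes made-up data with population (divisor n) standard deviations, checks that the z-scores have mean 0 and standard deviation 1, and checks that their covariance equals the correlation of the raw data; it also confirms that a positive linear rescaling of X leaves the correlation unchanged. The helper names standardize, cov and corr are illustrative.

```python
from math import isclose
from statistics import mean, pstdev

def standardize(v):
    """z-scores (v - mean) / sd, using the population SD (divisor n)."""
    m, s = mean(v), pstdev(v)
    return [(a - m) / s for a in v]

def cov(u, v):
    """Population covariance (divisor n)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def corr(u, v):
    """Correlation = Cov(U, V) / (sd_U * sd_V)."""
    return cov(u, v) / (pstdev(u) * pstdev(v))

x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [3.0, 2.5, 6.0, 8.0, 9.5]  # made-up illustrative data
xs, ys = standardize(x), standardize(y)

print(isclose(mean(xs), 0.0, abs_tol=1e-12), isclose(pstdev(xs), 1.0))  # True True
print(isclose(cov(xs, ys), corr(x, y)))          # True: Cov(X*, Y*) = rho(X, Y)
x_rescaled = [3 * a + 10 for a in x]             # positive linear change of X
print(isclose(corr(x_rescaled, y), corr(x, y)))  # True: rho is unchanged
```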

The variance of any random variable is non-negative, and $Var(X)=0$ if and only if $X$ is constant almost surely. For the standardized variables,

\[Var(X^* \pm Y^*)=Var(X^*)+Var(Y^*)\pm2Cov(X^*,Y^*)\]

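Because $Var(X^*)=Var(Y^*)=1$ and $Cov(X^*,Y^*)=\rho(X,Y)$, this gives $Var(X^*+Y^*)=2(1+\rho(X,Y))\geq 0$ and $Var(X^*-Y^*)=2(1-\rho(X,Y))\geq 0$. The first inequality requires $\rho(X,Y)\geq -1$ and the second requires $\rho(X,Y)\leq 1$. Hence
\[-1\leq \rho(X,Y)=\rho(X^*,Y^*)\leq 1.\]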
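The key step, $Var(X^*\pm Y^*)=2(1\pm\rho)$, can be seen numerically on the empirical distribution of a small made-up sample. This sketch uses population variances (divisor n) so that the standardized values have variance exactly 1 up to rounding; standardize is an illustrative helper.

```python
from math import isclose
from statistics import mean, pstdev, pvariance

def standardize(v):
    """z-scores using the population SD (divisor n)."""
    m, s = mean(v), pstdev(v)
    return [(a - m) / s for a in v]

x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [3.0, 2.5, 6.0, 8.0, 9.5]  # made-up illustrative data
xs, ys = standardize(x), standardize(y)

# Both z-score lists have mean 0, so Cov(X*, Y*) is the mean of the products.
r = mean([a * b for a, b in zip(xs, ys)])

var_sum = pvariance([a + b for a, b in zip(xs, ys)])   # Var(X* + Y*)
var_diff = pvariance([a - b for a, b in zip(xs, ys)])  # Var(X* - Y*)

print(isclose(var_sum, 2 * (1 + r)), isclose(var_diff, 2 * (1 - r)))  # True True
```

Since both variances are non-negative, 1 + r and 1 - r must both be non-negative, which is exactly the bound on r.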

If $\rho(X,Y)=Cov(X^*,Y^*)=1$, then $Var(X^*-Y^*)=2(1-1)=0$, so $X^*-Y^*$ equals its mean, 0, almost surely, making $X^*=Y^*$ almost surely. Similarly, if $\rho(X,Y)=Cov(X^*,Y^*)=-1$, then $Var(X^*+Y^*)=0$ and $X^*=-Y^*$ almost surely. In either case, $Y$ is a linear function of $X$ almost surely.
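For the equality case, here is a sketch in which Y is an exact decreasing linear function of X (the data are made up): the correlation of the standardized values is -1, and X* = -Y* point by point, up to floating-point rounding.

```python
from math import isclose
from statistics import mean, pstdev

def standardize(v):
    """z-scores using the population SD (divisor n)."""
    m, s = mean(v), pstdev(v)
    return [(a - m) / s for a in v]

x = [1.0, 3.0, 4.0, 8.0, 9.0]
y = [-2.0 * a + 7.0 for a in x]  # Y is an exact decreasing linear function of X
xs, ys = standardize(x), standardize(y)

r = mean([a * b for a, b in zip(xs, ys)])  # Cov(X*, Y*) = rho(X, Y)
print(isclose(r, -1.0))                                            # True
print(all(isclose(a, -b, abs_tol=1e-12) for a, b in zip(xs, ys)))  # True: X* = -Y*
```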

The bound $|\rho(X,Y)|\leq 1$ can also be obtained from the Cauchy-Schwarz inequality, $|Cov(X,Y)|\leq\sigma_X\sigma_Y$.

Either way, the correlation coefficient lies between $-1$ and $+1$.

Real-Life Example

Variable 1	Variable 2	Correlation (r)	Interpretation
Study hours	Exam scores	+0.85	Strong positive
Screen time	Sleep duration	-0.70	Strong negative
Age	Shoe size	~0.00	No linear correlation

FAQs about Correlation Coefficient

What is a coefficient of correlation?
What does a positive or negative correlation mean?
What is a strong or weak correlation?
Can correlation imply causation?
What are the types of correlation coefficients?
When should I use Pearson vs. Spearman correlation?
What are the assumptions of the Pearson correlation?
Can correlation be used for more than two variables?
How is correlation different from regression?
How is the correlation coefficient calculated?
What does a zero correlation mean?
Can correlation be misleading?
