Covariance and Correlation

Introduction to Covariance and Correlation

Covariance and correlation are important concepts in statistics. Covariance measures the degree to which two variables co-vary (i.e., change together). If larger values of one variable (say, $X_i$) correspond to larger values of the other variable (say, $X_j$), i.e., if the variables tend to show similar behavior, then the covariance between $X_i$ and $X_j$ is positive.

Similarly, if smaller values of one variable correspond to smaller values of the other variable, the covariance is also positive. In contrast, if larger values of one variable (say, $X_i$) mainly correspond to smaller values of the other variable (say, $X_j$), i.e., the two variables tend to show opposite behavior, then the covariance is negative.

In other words, a positive covariance means that the two variables tend to move in the same direction relative to their expected values (means): when one variable is above its mean, the other also tends to be above its mean.

Similarly, a negative covariance means that when one variable is above its expected value, the other tends to be below its expected value. A covariance of zero means that there is no linear relationship between the two variables.
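To make the sign rules concrete, here is a minimal sketch that computes the population covariance (the average of the products of deviations from the means) for three small, made-up datasets. The function name `covariance` and the data values are illustrative assumptions, not taken from any particular library.

```python
def covariance(xs, ys):
    """Population covariance: the average of (x - mean_x) * (y - mean_y)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data, chosen only to illustrate the three cases
x = [1, 2, 3, 4, 5]
together = [2, 4, 5, 4, 6]   # tends to be above its mean when x is above its mean
opposite = [9, 7, 6, 4, 2]   # tends to be below its mean when x is above its mean
no_trend = [5, 1, 3, 1, 5]   # no linear tendency up or down as x increases

print(covariance(x, together))   # positive
print(covariance(x, opposite))   # negative
print(covariance(x, no_trend))   # zero
```

Note that `no_trend` is clearly patterned (it is symmetric around the middle value), yet its covariance with `x` is zero, because covariance only picks up linear tendencies.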

Mathematical Representation of Covariance

Mathematically, the covariance between two random variables $X_i$ and $X_j$ is defined as
\[COV(X_i, X_j)=E[(X_i-\mu_i)(X_j-\mu_j)]\]
where
$\mu_i=E(X_i)$ is the average of the first variable
$\mu_j=E(X_j)$ is the average of the second variable

\begin{aligned}
COV(X_i, X_j)&=E[(X_i-\mu_i)(X_j-\mu_j)]\\
&=E[X_i X_j - X_i E(X_j)-X_j E(X_i)+E(X_i)E(X_j)]\\
&=E(X_i X_j)-E(X_i)E(X_j) - E(X_j)E(X_i)+E(X_i)E(X_j)\\
&=E(X_i X_j)-E(X_i)E(X_j)
\end{aligned}
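The shortcut $E(X_i X_j)-E(X_i)E(X_j)$ can be checked against the defining expectation on a small discrete example. The sketch below assumes a hypothetical joint distribution for $(X_i, X_j)$ and uses exact fractions so both sides can be compared without rounding error.

```python
from fractions import Fraction as F

# Hypothetical joint distribution of (X_i, X_j): {(x, y): probability}
joint = {
    (0, 0): F(1, 4),
    (0, 1): F(1, 4),
    (1, 1): F(1, 2),
}

def expect(g):
    """E[g(X_i, X_j)] under the joint distribution above."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mu_i = expect(lambda x, y: x)   # E(X_i) = 1/2
mu_j = expect(lambda x, y: y)   # E(X_j) = 3/4

# Left-hand side: the definition E[(X_i - mu_i)(X_j - mu_j)]
cov_definition = expect(lambda x, y: (x - mu_i) * (y - mu_j))
# Right-hand side: the shortcut E(X_i X_j) - E(X_i)E(X_j)
cov_shortcut = expect(lambda x, y: x * y) - mu_i * mu_j

print(cov_definition, cov_shortcut)  # 1/8 1/8
```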


Note that the covariance of a random variable with itself is its variance, i.e., $COV(X_i, X_i)=VAR(X_i)$. If $X_i$ and $X_j$ are independent, then $E(X_i X_j)=E(X_i)E(X_j)$ and therefore $COV(X_i, X_j)=E(X_i X_j)-E(X_i) E(X_j)=0$. The converse does not hold in general: a zero covariance rules out only a linear relationship, not every form of dependence.
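Both facts, and the fact that zero covariance does not imply independence, can be seen in a tiny example. The sketch assumes a random variable $X$ taking the values $-1, 0, 1$ with equal probability and sets $Y = X^2$, so $Y$ is completely determined by $X$. The helper names `mean`, `cov`, and `var` are hypothetical and defined here only for illustration.

```python
from fractions import Fraction

def mean(values):
    """Expected value of equally likely outcomes, kept exact with Fraction."""
    return Fraction(sum(values), len(values))

def cov(xs, ys):
    """Covariance via the shortcut E[XY] - E[X]E[Y] for equally likely paired outcomes."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

def var(xs):
    """Variance as E[(X - mu)^2]."""
    mu = mean(xs)
    return mean([(x - mu) ** 2 for x in xs])

xs = [-1, 0, 1]            # X takes -1, 0, 1 with equal probability
ys = [x * x for x in xs]   # Y = X^2 is completely determined by X

print(cov(xs, xs) == var(xs))  # True: COV(X, X) = VAR(X)
print(cov(xs, ys))             # 0: zero covariance, yet Y depends on X
```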


Covariance and Correlation

Correlation and covariance are related but not equivalent statistical measures.

Equation of Correlation (Normalized Covariance)

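The correlation between two variables (say, $X_i$ and $X_j$) is their normalized covariance, defined as
\begin{aligned}
\rho_{i,j}&=\frac{COV(X_i, X_j)}{\sigma_i \sigma_j}=\frac{E[(X_i-\mu_i)(X_j-\mu_j)]}{\sigma_i \sigma_j}
\end{aligned}
where $\sigma_i$ is the standard deviation of $X_i$ and $\sigma_j$ is the standard deviation of $X_j$.

For a sample of $n$ paired observations of two variables $X$ and $Y$, the (Pearson) sample correlation coefficient $r$ estimates $\rho$ and can be computed from raw sums as
\begin{aligned}
r&=\frac{n \sum XY - \sum X \sum Y}{\sqrt{(n \sum X^2 -(\sum X)^2)(n \sum Y^2 - (\sum Y)^2)}}
\end{aligned}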
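The "covariance divided by the product of standard deviations" definition and the raw-sums formula give the same number on the same data, because the factors of $n$ cancel. The sketch below checks this on a small hypothetical dataset; the function names are illustrative, and population (divide-by-$n$) moments are assumed in the first function.

```python
import math

def corr_definition(xs, ys):
    """Covariance divided by the product of standard deviations (population versions)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

def corr_computational(xs, ys):
    """Raw-sums formula: (n*sum(XY) - sum(X)*sum(Y)) / sqrt(...)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 6]
print(corr_definition(x, y), corr_computational(x, y))  # both roughly 0.853
```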

Note that correlation is dimensionless, i.e., it is a pure number free of measurement units, and its values lie between -1 and +1 inclusive. In contrast, covariance is expressed in the product of the units of the two variables.

When to Use Covariance and Correlation

Covariance and correlation are typically used as follows:

  • Covariance: Useful in portfolio theory (finance).
  • Correlation: Preferred in most cases (e.g., psychology, medicine, ML) due to standardized interpretation.

For example, the correlation between study hours and exam scores measures the strength and direction of their linear relationship (e.g., $r = 0.7$ would indicate a strong positive link between study hours and exam scores).

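Similarly, the covariance between stock returns helps in diversification: combining assets whose returns have low or negative covariance lowers the overall variance of a portfolio.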

The Sign of Covariance

The sign of the covariance indicates the direction of the relationship:

  • Positive Covariance: Variables move together (↑X → ↑Y).
  • Negative Covariance: Variables move inversely (↑X → ↓Y).

Limitation of Covariance

The value of covariance depends on the units of measurement (for example, the covariance of “hours vs. scores” differs from that of “minutes vs. scores” by a factor of 60). For a unitless measure with a standardized interpretation, use correlation.


Discover more from Statistics for Data Science & Analytics
