**Covariance and Correlation**

**Covariance** measures the degree to which two variables co-vary, i.e. vary (change) together. If greater values of one variable (say, $X_i$) correspond to greater values of the other variable (say, $X_j$), and smaller values of one correspond to smaller values of the other, i.e. the variables tend to show similar behaviour, then the **covariance** between the two variables ($X_i$, $X_j$) will be positive. In contrast, if greater values of one variable ($X_i$) mainly correspond to smaller values of the other variable ($X_j$), i.e. the variables tend to show opposite behaviour, then the **covariance** will be negative.

In other words, a positive **covariance** between two variables means that they vary together in the same direction relative to their expected values (averages): if one variable moves above its average value, the other variable tends to be above its average value as well. Similarly, if the **covariance** is negative, then when one variable is above its expected value, the other tends to be below its expected value. If the **covariance** is zero, there is no linear dependency between the two variables (a non-linear relationship may still exist). Mathematically, the **covariance** between two **random variables** $X_i$ and $X_j$ can be represented as

\[COV(X_i, X_j)=E[(X_i-\mu_i)(X_j-\mu_j)]\]

where

$\mu_i=E(X_i)$ is the average of the first variable

$\mu_j=E(X_j)$ is the average of the second variable

\begin{aligned}
COV(X_i, X_j)&=E[(X_i-\mu_i)(X_j-\mu_j)]\\
&=E[X_i X_j - X_i E(X_j) - X_j E(X_i) + E(X_i)E(X_j)]\\
&=E(X_i X_j) - E(X_i)E(X_j) - E(X_j)E(X_i) + E(X_i)E(X_j)\\
&=E(X_i X_j) - E(X_i)E(X_j)
\end{aligned}

**Note that** the **covariance** of a **random variable** with itself is the **variance** of that **random variable**, i.e. $COV(X_i, X_i)=VAR(X_i)$. If $X_i$ and $X_j$ are independent, then $E(X_i X_j)=E(X_i)E(X_j)$, and therefore $COV(X_i, X_j)=E(X_i X_j)-E(X_i) E(X_j)=0$.
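To make the two forms of the formula concrete, here is a minimal Python sketch that computes the covariance both from the definition $E[(X_i-\mu_i)(X_j-\mu_j)]$ and from the shortcut $E(X_i X_j)-E(X_i)E(X_j)$. The data (`hours`, `score`) and the function names are made up for illustration, and the expectations are taken as plain averages over the listed values (dividing by $n$, not $n-1$).

```python
def mean(values):
    """Arithmetic average, used here as the expected value E(.)."""
    return sum(values) / len(values)


def covariance_by_definition(x, y):
    """Average of (x - mean_x) * (y - mean_y), i.e. E[(X - mu_x)(Y - mu_y)]."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def covariance_by_shortcut(x, y):
    """Shortcut form E(XY) - E(X)E(Y)."""
    return mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)


# Hypothetical data: the two variables tend to rise together.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

cov_def = covariance_by_definition(hours, score)
cov_short = covariance_by_shortcut(hours, score)
print(cov_def, cov_short)  # both are about 1.2 (positive covariance)

# Reversing one variable makes the two move in opposite directions.
cov_opposite = covariance_by_definition(hours, score[::-1])
print(cov_opposite)  # about -1.2 (negative covariance)

# Covariance of a variable with itself equals its variance.
var_hours = covariance_by_definition(hours, hours)
print(var_hours)  # about 2.0
```

Both functions agree up to floating-point rounding, which mirrors the algebra in the derivation above.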

**Correlation** and **covariance** are related but not equivalent statistical measures. The **correlation** between two variables (say, $X_i$ and $X_j$) is their **normalized covariance**, defined as

\[\rho_{i,j}=\frac{E[(X_i-\mu_i)(X_j-\mu_j)]}{\sigma_i \sigma_j}\]

where $\sigma_i$ is the standard deviation of $X_i$ and $\sigma_j$ is the standard deviation of $X_j$. For a sample of $n$ paired observations $(X, Y)$ on the two variables, the corresponding sample correlation coefficient can be computed as

\[r=\frac{n \sum XY - \sum X \sum Y}{\sqrt{(n \sum X^2 -(\sum X)^2)(n \sum Y^2 - (\sum Y)^2)}}\]

**Note that correlation** is dimensionless, i.e. it is a number free of any unit of measurement, and its value lies between -1 and +1 inclusive. In contrast, **covariance** has a unit of measure: the product of the units of the two variables.

For further reading about **correlation**, see these posts:
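The unit dependence of covariance, and the unit independence of correlation, can be checked with a short Python sketch. The height and weight figures and all function names below are hypothetical, chosen only for illustration. The sketch computes the correlation once as a normalized covariance and once from the sums formula for the correlation coefficient, then repeats the calculation after converting heights from centimetres to metres.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Average product of deviations from the means (divides by n)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def correlation_from_sums(x, y):
    """Same quantity via n*sum(XY) - sum(X)*sum(Y) over the square-root term."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))


# Hypothetical measurements
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 67, 72, 81]
height_m = [h / 100 for h in height_cm]  # same heights, different unit

cov_cm = covariance(height_cm, weight_kg)  # unit: cm * kg
cov_m = covariance(height_m, weight_kg)    # unit: m * kg
r_cm = correlation(height_cm, weight_kg)   # no unit
r_m = correlation(height_m, weight_kg)     # no unit
r_sums = correlation_from_sums(height_cm, weight_kg)

print(cov_cm, cov_m)  # covariance changes by a factor of 100 with the unit
print(r_cm, r_m)      # correlation is the same in both units
```

Changing the unit rescales the covariance but leaves the correlation unchanged, which is why correlation is usually preferred when comparing the strength of linear relationships across differently scaled variables.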

- Correlation Coefficient lies between -1 and +1
- Pearson’s Correlation Coefficient
- How to find Correlation Coefficient in SPSS