Coefficient of Partial Correlation

It measures the linear relationship between two variables while all other variables are held constant, i.e., while controlling for (removing the influence of) all other variables. The purpose of partial correlation is to find the unique association between two variables after eliminating the variance they share with a third variable. The technique of partial correlation is commonly used in “causal” modeling with a small number of variables. The partial correlation coefficient is determined in terms of the simple correlation coefficients among the various variables involved in a multiple relationship. The assumptions for partial correlation are the usual assumptions of Pearson correlation:

1. Linearity of relationships
2. The same level of relationship throughout the range of the independent variable i.e. homoscedasticity
3. Interval or near-interval data, and
4. Data whose range is not truncated.

We typically conduct a correlation analysis on all variables first, to see whether there are significant relationships among them, including any “third variables” that may be significantly related to the variables under investigation.
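A minimal sketch of this first step in Python, using only the standard library. It builds the matrix of pairwise Pearson correlations and, for each distinct pair, the usual t statistic $t=r\sqrt{(n-2)/(1-r^2)}$ for testing zero correlation (with $n-2$ degrees of freedom, under the usual normality assumption). The helper names `pearson`, `correlation_matrix` and `t_statistic` and the eight observations are hypothetical, chosen only for illustration. These pairwise coefficients are also exactly the inputs needed by the formula for $r_{12.3}$.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


def correlation_matrix(columns):
    """k x k matrix of pairwise Pearson correlations between the given columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def t_statistic(r, n):
    """t statistic for H0: rho = 0 (n - 2 degrees of freedom); requires |r| < 1."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Hypothetical sample of n = 8 observations on X1, X2, X3.
x1 = [2, 1, 2, 5, 6, 5, 6, 9]
x2 = [2, 1, 4, 3, 4, 7, 6, 9]
x3 = [1, 2, 3, 4, 5, 6, 7, 8]
names = ["X1", "X2", "X3"]
columns = [x1, x2, x3]
n = len(x1)

matrix = correlation_matrix(columns)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{r:6.3f}" for r in row))

# For each distinct pair, compare t with a Student's t critical value
# (n - 2 degrees of freedom) to judge significance.
for i in range(len(columns)):
    for j in range(i + 1, len(columns)):
        r = matrix[i][j]
        print(f"{names[i]}-{names[j]}: r = {r:.3f}, t = {t_statistic(r, n):.2f}")
```

With this made-up sample, X1 and X2 correlate at 0.84, and each of them correlates at about 0.917 with X3, so X3 is a candidate “third variable” worth controlling for.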

This type of analysis helps to find spurious correlations (i.e., correlations that are explained by the effect of some other variables) as well as to reveal hidden correlations, i.e., correlations masked by the effect of other variables. The partial correlation coefficient $r_{xy.z}$ can also be defined as the correlation coefficient between the residuals $d_x$ and $d_y$ obtained by regressing $X$ on $Z$ and $Y$ on $Z$, respectively.
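To see how a spurious correlation shows up, here is a minimal Python sketch with constructed, hypothetical data: $z$ plays the role of the third variable, and $x$ and $y$ are each $z$ plus a disturbance, where the two disturbances are uncorrelated with $z$ and with each other. The helper names `pearson` and `partial_corr` are illustrative, not from any library; `partial_corr` applies the standard first-order partial correlation formula.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


def partial_corr(x, y, z):
    """First-order partial correlation r_xy.z computed from the simple correlations."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / (sqrt(1 - r_xz ** 2) * sqrt(1 - r_yz ** 2))


# Hypothetical data: z is a common "third variable". The disturbances added to
# z are uncorrelated with z and with each other, so the entire x-y association
# is produced by z.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 2, 5, 6, 5, 6, 9]  # z + [1, -1, -1, 1, 1, -1, -1, 1]
y = [2, 1, 4, 3, 4, 7, 6, 9]  # z + [1, -1, 1, -1, -1, 1, -1, 1]

print(f"simple correlation  r_xy   = {pearson(x, y):.3f}")
print(f"partial correlation r_xy.z = {partial_corr(x, y, z):.3f}")
```

The simple correlation is 0.84, yet the partial correlation is zero apart from floating-point rounding: once $z$ is held constant, nothing connects $x$ and $y$, which marks their simple correlation as spurious.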

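Suppose we have a sample of $n$ observations $(x_{11},x_{21},x_{31}),(x_{12},x_{22},x_{32}),\cdots,(x_{1n},x_{2n},x_{3n})$ from an unknown distribution of three random variables $X_1$, $X_2$ and $X_3$, and we want to find the coefficient of partial correlation between $X_1$ and $X_2$ keeping $X_3$ constant. This coefficient, denoted by $r_{12.3}$, is the correlation between the residuals $x_{1.3}$ and $x_{2.3}$, i.e., the parts of $X_1$ and $X_2$ that are not linearly explained by $X_3$. The coefficient $r_{12.3}$ is a partial correlation of the first order, because one variable is held constant, and it can be computed from the simple correlation coefficients as follows: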

$r_{12.3}=\frac{r_{12}-r_{13} r_{23}}{\sqrt{1-r_{13}^2 } \sqrt{1-r_{23}^2 } }$
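The formula translates directly into code. The following is a minimal Python sketch; the function name `partial_corr_first_order` and the input correlations are hypothetical, chosen only for illustration. The second call shows a masked (“hidden”) correlation: with $r_{12}=0$, $r_{13}=0.6$ and $r_{23}=-0.6$, the partial correlation is $0.36/0.64=0.5625$ once $X_3$ is held constant.

```python
from math import sqrt


def partial_corr_first_order(r12, r13, r23):
    """First-order partial correlation r_12.3 from the three simple correlations.

    Implements r_12.3 = (r12 - r13*r23) / (sqrt(1 - r13^2) * sqrt(1 - r23^2)).
    """
    for r in (r12, r13, r23):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlation coefficients must lie in [-1, 1]")
    if abs(r13) == 1.0 or abs(r23) == 1.0:
        # X3 perfectly predicts X1 or X2, so no residual variation is left.
        raise ValueError("r_12.3 is undefined when |r13| or |r23| equals 1")
    return (r12 - r13 * r23) / (sqrt(1 - r13 ** 2) * sqrt(1 - r23 ** 2))


# Hypothetical inputs.
print(round(partial_corr_first_order(0.5, 0.5, 0.5), 4))   # 0.3333
print(round(partial_corr_first_order(0.0, 0.6, -0.6), 4))  # 0.5625
```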

The coefficient of partial correlation between $X$ and $Y$ controlling for $Z$, denoted by $r_{xy.z}$, can also be defined as the coefficient of correlation between the residuals $d_{x,i}=x_i-\hat{x}_i$ and $d_{y,i}=y_i-\hat{y}_i$, where
\begin{align*}
\hat{x}_i&=\hat{\beta}_{0x}+\hat{\beta}_{1x}z_i\\
\hat{y}_i&=\hat{\beta}_{0y}+\hat{\beta}_{1y}z_i
\end{align*}
and $\hat{\beta}_{0x}$ and $\hat{\beta}_{1x}$ are the least-squares estimators obtained by regressing $x_i$ on $z_i$, while $\hat{\beta}_{0y}$ and $\hat{\beta}_{1y}$ are the least-squares estimators obtained by regressing $y_i$ on $z_i$. Since least-squares residuals from a regression with an intercept have mean zero, the partial correlation between $x$ and $y$ controlling for $z$ is $r_{xy.z}=\frac{\sum d_{x,i}\,d_{y,i}}{\sqrt{\sum d_{x,i}^2}\sqrt{\sum d_{y,i}^2}}$
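The residual definition of $r_{xy.z}$ can be checked numerically against the first-order formula for $r_{12.3}$. Below is a minimal Python sketch: `ols_residuals` fits a simple least-squares regression with an intercept and returns the residuals, and `partial_corr_residuals` correlates the two residual series. All function names and the data are hypothetical, for illustration only.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


def ols_residuals(v, z):
    """Residuals v_i - (b0 + b1 * z_i) from the least-squares fit of v on z."""
    n = len(z)
    mean_v, mean_z = sum(v) / n, sum(z) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    szv = sum((zi - mean_z) * (vi - mean_v) for zi, vi in zip(z, v))
    b1 = szv / szz             # slope estimate
    b0 = mean_v - b1 * mean_z  # intercept estimate
    return [vi - (b0 + b1 * zi) for vi, zi in zip(v, z)]


def partial_corr_residuals(x, y, z):
    """r_xy.z as the correlation between the residuals of x on z and of y on z."""
    dx = ols_residuals(x, z)
    dy = ols_residuals(y, z)
    # The residuals already have mean zero, so this equals sum(dx*dy) / sqrt(...).
    return pearson(dx, dy)


def partial_corr_formula(x, y, z):
    """r_xy.z from the simple correlations (first-order formula)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / (sqrt(1 - r_xz ** 2) * sqrt(1 - r_yz ** 2))


# Hypothetical data.
x = [3, 5, 4, 8, 7, 9, 12, 11]
y = [10, 9, 12, 11, 15, 14, 13, 17]
z = [1, 2, 3, 4, 5, 6, 7, 8]

print(partial_corr_residuals(x, y, z))
print(partial_corr_formula(x, y, z))
```

Both lines print the same value up to floating-point rounding, since the two routes are algebraically equivalent. The residual route generalizes more naturally when several variables are controlled at once, because each residual then comes from a multiple regression.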
