Partial Correlation Example

In this post, we will learn about partial correlation and work through a partial correlation example on a small data set. Both multiple correlation and partial correlation involve three or more variables (multivariable data). Partial correlation is defined as the degree of linear relationship between any two variables in a set of multivariable data, keeping the effect of all other variables constant.

Introduction to Partial Correlation Coefficient

Like Pearson’s Correlation, Partial correlation measures the strength and direction of the relationship between two variables while controlling for (or removing the influence/effect of) one or more additional variables. It helps isolate the direct association between the two variables of interest, independent of other factors.

Suppose you are interested in studying the correlation between exercise frequency and heart health while controlling for age. Partial correlation removes the linear effect of age to show the relationship between exercise and heart health that remains. Partial correlation is denoted as $r_{12.3}$, where 1 and 2 are the variables of interest, and 3 is the controlled variable.

Partial Correlation Formula

For three variables, say $X_1, X_2$, and $X_3$, the partial correlation between $X_1$ and $X_2$ measures their relationship after removing the influence of $X_3$. It is given by

$$r_{12 \cdot 3}= \frac{ r_{12} - r_{13} r_{23}} {\sqrt{(1-r_{13}^2)(1- r_{23}^2)} }$$

If we want to find the partial correlation between $X_1$ and $X_3$, controlling for $X_2$, then

$$r_{13\cdot 2}= \frac{ r_{13} - r_{12} r_{32}}{ \sqrt{(1- r_{12}^2)(1- r_{32}^2)}}$$

If we want to find the partial correlation between $X_2$ and $X_3$, controlling for $X_1$, then

$$r_{23\cdot 1}= \frac{r_{23} - r_{21} r_{31}}{\sqrt{(1- r_{21}^2)(1- r_{31}^2)}}$$
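The three formulas share one pattern, so a single helper can cover all of them. The following is a minimal sketch, not a library routine: the function name `partial_corr` and the three input correlations are made up for illustration.

```python
from math import sqrt


def partial_corr(r_ab: float, r_ac: float, r_bc: float) -> float:
    """First-order partial correlation of a and b, controlling for c."""
    denom = sqrt((1 - r_ac**2) * (1 - r_bc**2))
    if denom == 0:
        # Happens when a or b is perfectly correlated with c.
        raise ValueError("partial correlation is undefined")
    return (r_ab - r_ac * r_bc) / denom


# Hypothetical pairwise correlations among three variables.
r12, r13, r23 = 0.6, 0.5, 0.4

r12_3 = partial_corr(r12, r13, r23)  # X1 and X2, controlling for X3
r13_2 = partial_corr(r13, r12, r23)  # X1 and X3, controlling for X2
r23_1 = partial_corr(r23, r12, r13)  # X2 and X3, controlling for X1

print(round(r12_3, 4), round(r13_2, 4), round(r23_1, 4))
```

Because simple correlations are symmetric ($r_{32}=r_{23}$, and so on), only the order of the arguments changes from one formula to the next.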

Partial Correlation Graphical Representation

Partial correlation is a statistical measure of the relationship between two variables while controlling for (excluding or eliminating) the effects of one or more additional variables.

[Figure: Partial Correlation Example, a graphical representation for three variables $X$, $Y$, and $Z$]

Partial Correlation is used when researchers want to determine the strength and direction of the relationship between two variables without the influence of other variables. This is particularly useful in multivariate analysis where multiple variables may be interrelated. The partial correlation coefficient ranges from $-1$ to $+1$, with $-1$ indicating a perfect negative correlation, $+1$ indicating a perfect positive correlation, and 0 indicating no correlation.

Partial Correlation Example

For this partial correlation example, consider the following data, tabulated together with the products and squares needed for the computation.

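| | $X_1$ | $X_2$ | $X_3$ | $X_1X_2$ | $X_1X_3$ | $X_2X_3$ | $X_1^2$ | $X_2^2$ | $X_3^2$ |
|---|---|---|---|---|---|---|---|---|---|
| | 7 | 4 | 1 | 28 | 7 | 4 | 49 | 16 | 1 |
| | 12 | 7 | 2 | 84 | 24 | 14 | 144 | 49 | 4 |
| | 14 | 8 | 4 | 112 | 56 | 32 | 196 | 64 | 16 |
| | 17 | 9 | 5 | 153 | 85 | 45 | 289 | 81 | 25 |
| | 20 | 12 | 8 | 240 | 160 | 96 | 400 | 144 | 64 |
| Total | 70 | 40 | 20 | 617 | 332 | 191 | 1078 | 354 | 110 |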

First compute the simple correlations $r_{12}$, $r_{13}$, and $r_{23}$. Since simple correlation is symmetric, $r_{21}=r_{12}$, $r_{31}=r_{13}$, and $r_{32}=r_{23}$.

\begin{align}
r_{12} &= \frac{n\Sigma (x_1 x_2 ) - (\Sigma x_1)(\Sigma x_2 )} {\sqrt{\left[n\Sigma x_1 ^2 -(\Sigma x_1)^2\right] \left[n \Sigma x_2^2 - (\Sigma x_2 )^2\right]}}\\
&= \frac{5(617)-(70)(40)} {\sqrt{\left[5 (1078)-(70)^2\right]\left[5(354)-(40)^2\right]} } = 0.987\\
r_{13} &= \frac{n\Sigma(x_1 x_3 ) - (\Sigma x_1)(\Sigma x_3 )}{\sqrt{\left[n\Sigma x_1^2 - (\Sigma x_1 )^2\right]\left[n \Sigma x_3^2 - (\Sigma x_3 )^2\right]}}\\
&= \frac{5(332)-(70)(20)}{\sqrt{\left[5 (1078)-(70)^2\right]\left[5(110)-(20)^2\right]}}= 0.959\\
r_{23} &= \frac{n\Sigma(x_2 x_3 )-(\Sigma x_2 )(\Sigma x_3 )}{\sqrt{\left[n\Sigma x_2^2 -(\Sigma x_2 )^2\right]\left[n\Sigma x_3^2 -(\Sigma x_3 )^2\right]}}\\
& = \frac{5(191)-(40)(20)}{\sqrt{\left[5(354)-40^2\right]\left[5(110)-20^2\right]}}= 0.971\\
r_{12\cdot 3} &= \frac{r_{12} - r_{13} r_{23} } {\sqrt{(1 - r_{13}^2) (1 - r_{23}^2) }}\\
& = \frac{0.987-(0.959)(0.971)} {\sqrt{(1-(0.959)^2)(1-(0.971)^2)}}\\
&=\frac{0.05659}{0.0681} = 0.8305
\end{align}

The values 0.05659 and 0.0681 in the last line are computed from the unrounded correlations. Substituting the three-decimal values shown gives about 0.824 instead, so rounding should be left to the final step.
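The worked example can be checked with a short script. This is a sketch using only the Python standard library; the helper name `pearson` is an assumption introduced here, and it implements the same sums-based formula used in the computation above.

```python
from math import sqrt

# Data from the example table.
x1 = [7, 12, 14, 17, 20]
x2 = [4, 7, 8, 9, 12]
x3 = [1, 2, 4, 5, 8]


def pearson(a, b):
    """Pearson correlation from raw sums, as in the worked example."""
    n = len(a)
    sum_a, sum_b = sum(a), sum(b)
    sum_ab = sum(p * q for p, q in zip(a, b))
    sum_a2 = sum(p * p for p in a)
    sum_b2 = sum(q * q for q in b)
    num = n * sum_ab - sum_a * sum_b
    den = sqrt((n * sum_a2 - sum_a**2) * (n * sum_b2 - sum_b**2))
    return num / den


r12 = pearson(x1, x2)
r13 = pearson(x1, x3)
r23 = pearson(x2, x3)

# Partial correlation of X1 and X2, controlling for X3 (unrounded inputs).
r12_3 = (r12 - r13 * r23) / sqrt((1 - r13**2) * (1 - r23**2))

print(round(r12, 3), round(r13, 3), round(r23, 3), round(r12_3, 4))
# prints: 0.987 0.959 0.971 0.8305
```

Keeping the correlations unrounded until the final step reproduces the value 0.8305 obtained by hand.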

Real-Life Examples of Partial Correlation

The following are some real-life examples of partial correlation to illustrate its application in controlling for confounding variables.

  • Exercise and Health: You may want to analyze the correlation between exercise frequency and heart health while controlling for age. Age can affect both exercise habits and heart health, so partial correlation removes its linear influence to show the relationship between exercise and heart health that remains.
  • Advertising and Sales: Suppose you want to examine the relationship between advertising spending and sales revenue while controlling for seasonality (e.g., holiday sales). Seasonal factors can affect both advertising and sales, so partial correlation helps assess the association between advertising and sales net of seasonality.
  • Education and Income: You may want to study the relationship between education level and income while controlling for work experience. Work experience may be related to both education and income, so partial correlation helps isolate the relationship between education and income, independent of experience.
  • Student Performance: You want to analyze the relationship between hours spent studying and exam scores while controlling for prior academic performance. Prior academic performance may influence both study habits and exam results, so partial correlation shows the association between studying and exam scores that it does not account for.
  • Smoking and Lung Cancer: You are interested in studying the correlation between smoking and lung cancer risk while controlling for air pollution exposure. Air pollution can independently affect lung cancer risk, so partial correlation helps isolate the association with smoking alone.
  • Diet and Weight Loss: You want to study the correlation between calorie intake and weight loss while controlling for physical activity levels. Physical activity affects both calorie intake and weight loss, so partial correlation helps isolate the association between diet and weight loss.

Partial correlation is commonly used in statistical analysis, especially in fields like psychology, social sciences, and any area where multivariate relationships are analyzed. In short, partial correlation provides a clearer picture of the relationship between two variables by accounting for confounding influences.


Partial Correlation Coefficient (2012)

The partial correlation coefficient measures the relationship between any two variables when all other variables are kept constant, i.e., after controlling for or removing the influence of all other variables. Partial correlation aims to find the unique variance shared between two variables while eliminating the variance from the third variable. The partial correlation technique is commonly used in “causal” modeling with a small number of variables. The coefficient is determined in terms of the simple correlation coefficients among the variables involved in a multiple relationship.

Assumptions for computing the Partial Correlation Coefficient

The assumptions for partial correlation are the usual assumptions of the Pearson correlation:

  1. Linearity of relationships
  2. The same level of relationship throughout the range of the independent variable i.e. homoscedasticity
  3. Interval or near-interval data, and
  4. Data whose range is not truncated.

We typically conduct correlation analysis on all variables to see whether there are significant relationships among them, including any “third variables” that may have a significant relationship with the variables under investigation.

This type of analysis helps to find spurious correlations (i.e., correlations that are explained by the effect of some other variables) and to reveal hidden correlations, i.e., correlations masked by the impact of other variables. The partial correlation coefficient $r_{xy.z}$ can also be defined as the correlation coefficient between the residuals $dx$ and $dy$ obtained by regressing $x$ on $z$ and $y$ on $z$.
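A small simulation can illustrate a spurious correlation. The sketch below uses synthetic data, not real measurements: `x` and `y` are each generated as `z` plus independent noise, so they are related only through `z`. The sample size, seed, and helper names are assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation from deviations about the means."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]  # x depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]  # y depends on z only

r_xy = pearson(x, y)
r_xy_z = partial_corr(x, y, z)
print(f"simple r_xy = {r_xy:.3f}, partial r_xy.z = {r_xy_z:.3f}")
```

Under this setup the simple correlation should come out near 0.5, while the partial correlation should be close to zero, because `z` accounts for all of the association between `x` and `y`.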

Partial Correlation Formula

Suppose we have a sample of $n$ observations $(x_{11},x_{21},x_{31}), (x_{12},x_{22},x_{32}), \cdots, (x_{1n},x_{2n},x_{3n})$ from an unknown joint distribution of three random variables $X_1$, $X_2$, and $X_3$. We want to find the coefficient of partial correlation between $X_1$ and $X_2$ keeping $X_3$ constant. It is denoted by $r_{12.3}$ and equals the correlation between the residuals $x_{1.3}$ and $x_{2.3}$, obtained by regressing $X_1$ and $X_2$, respectively, on $X_3$. The coefficient $r_{12.3}$ is a partial correlation of the first order, because one variable is held constant.

\[r_{12.3}=\frac{r_{12}-r_{13} r_{23}}{\sqrt{1-r_{13}^2 } \sqrt{1-r_{23}^2 } }\]
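One way to see where this formula comes from is to assume that the three variables are standardized to zero mean and unit variance. This assumption is made here only to simplify the algebra. The least squares residuals are then $e_1 = x_1 - r_{13}x_3$ and $e_2 = x_2 - r_{23}x_3$, and

\begin{align*}
\operatorname{Cov}(e_1,e_2) &= r_{12} - r_{23}r_{13} - r_{13}r_{23} + r_{13}r_{23} = r_{12}-r_{13}r_{23}\\
\operatorname{Var}(e_1) &= 1 - 2r_{13}^2 + r_{13}^2 = 1-r_{13}^2\\
\operatorname{Var}(e_2) &= 1 - 2r_{23}^2 + r_{23}^2 = 1-r_{23}^2
\end{align*}

Dividing the covariance by the product of the two standard deviations gives the expression for $r_{12.3}$.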

Partial Correlation Coefficient

The coefficient of partial correlation between $X$ and $Y$, controlling for a third random variable $Z$, is denoted by $r_{xy.z}$. It can also be defined as the coefficient of correlation between the residuals $x_i-\hat{x}_i$ and $y_i-\hat{y}_i$, with
\begin{align*}
\hat{x}_i&=\hat{\beta}_{0x}+\hat{\beta}_{1x}z_i\\
\hat{y}_i&=\hat{\beta}_{0y}+\hat{\beta}_{1y}z_i
\end{align*}
where $\hat{\beta}_{0x}$ and $\hat{\beta}_{1x}$ are the least squares estimators obtained by regressing $x_i$ on $z_i$, and $\hat{\beta}_{0y}$ and $\hat{\beta}_{1y}$ are the least squares estimators obtained by regressing $y_i$ on $z_i$. The fitted values $\hat{x}_i$ and $\hat{y}_i$ themselves are both linear functions of $z_i$, so it is the residuals, not the fitted values, that carry the information. Because least squares residuals have mean zero, the partial correlation between $x$ and $y$, controlling for $z$, is
\[r_{xy.z}=\frac{\sum(x_i-\hat{x}_i)(y_i-\hat{y}_i)}{\sqrt{\sum(x_i-\hat{x}_i)^2}\sqrt{\sum(y_i-\hat{y}_i)^2}}\]
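The residual-based definition can be checked numerically against the formula in terms of simple correlations. The sketch below is a standard-library illustration, with helper names (`residuals`, `pearson`) introduced here as assumptions. It uses the five observations from the worked example earlier in the post, treating $X_1$ as $x$, $X_2$ as $y$, and $X_3$ as $z$.

```python
from math import sqrt

x = [7, 12, 14, 17, 20]  # X1
y = [4, 7, 8, 9, 12]     # X2
z = [1, 2, 4, 5, 8]      # X3 (the controlled variable)


def mean(v):
    return sum(v) / len(v)


def residuals(dep, indep):
    """Residuals from a simple least squares regression of dep on indep."""
    m_d, m_i = mean(dep), mean(indep)
    sxy = sum((i - m_i) * (d - m_d) for i, d in zip(indep, dep))
    sxx = sum((i - m_i) ** 2 for i in indep)
    slope = sxy / sxx
    intercept = m_d - slope * m_i
    return [d - (intercept + slope * i) for i, d in zip(indep, dep)]


def pearson(a, b):
    m_a, m_b = mean(a), mean(b)
    cov = sum((p - m_a) * (q - m_b) for p, q in zip(a, b))
    return cov / sqrt(sum((p - m_a) ** 2 for p in a)
                      * sum((q - m_b) ** 2 for q in b))


# Definition 1: correlate the residuals of x on z and y on z.
r_resid = pearson(residuals(x, z), residuals(y, z))

# Definition 2: formula in terms of the simple correlations.
r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
r_formula = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

print(round(r_resid, 4), round(r_formula, 4))
```

Both routes should agree up to floating-point error, which is why either can serve as the definition of the first-order partial correlation.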

Partial Correlation Analysis

The partial correlation coefficient is determined in terms of the simple correlation coefficients among the variables involved in a multiple relationship. It is a helpful statistical tool for understanding the underlying relationships between variables, especially when dealing with potentially confounding factors.

