Partial Correlation Coefficient (2012)

The Partial Correlation Coefficient measures the relationship between any two variables while all other variables are held constant, that is, while controlling for (removing the influence of) the remaining variables. Partial correlation aims to isolate the unique variance shared by two variables after eliminating the variance attributable to a third variable. The technique is commonly used in “causal” modeling involving a small number of variables. The coefficient is expressed in terms of the simple correlation coefficients among the various variables involved in the multiple relationship.

Assumptions for computing the Partial Correlation Coefficient

The assumptions for partial correlation are the usual assumptions of Pearson correlation:

  1. Linearity of relationships
  2. The same level of relationship throughout the range of the independent variable i.e. homoscedasticity
  3. Interval or near-interval data, and
  4. Data whose range is not truncated.

We typically conduct a correlation analysis on all variables to see whether there are significant relationships among them, including any “third variables” that may have a significant relationship to the variables under investigation.

This type of analysis helps to detect spurious correlations (i.e. correlations that are explained by the effect of some other variable) as well as to reveal hidden correlations, i.e. correlations masked by the effect of other variables. The partial correlation coefficient $r_{xy.z}$ can also be defined as the correlation coefficient between the residuals obtained by regressing $x$ on $z$ and $y$ on $z$, as described below.

Suppose we have a sample of $n$ observations $(x_{11},x_{21},x_{31}),(x_{12},x_{22},x_{32}),\cdots,(x_{1n},x_{2n},x_{3n})$ from an unknown distribution of three random variables. We want to find the coefficient of partial correlation between $X_1$ and $X_2$ keeping $X_3$ constant, denoted by $r_{12.3}$; it is the correlation between the residuals $x_{1.3}$ and $x_{2.3}$. The coefficient $r_{12.3}$ is a partial correlation of the first order.

\[r_{12.3}=\frac{r_{12}-r_{13} r_{23}}{\sqrt{1-r_{13}^2 } \sqrt{1-r_{23}^2 } }\]
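To make the formula concrete, here is a minimal sketch in Python (the data are made-up values used purely for illustration and are not from the article): the three pairwise Pearson correlations are computed first and then plugged into the first-order formula above.

```python
import numpy as np

# Hypothetical sample of (X1, X2, X3) observations -- illustration only
x1 = np.array([3.0, 5.0, 2.0, 8.0, 7.0, 9.0])
x2 = np.array([1.0, 4.0, 2.0, 6.0, 5.0, 8.0])
x3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

# Pairwise simple (Pearson) correlation coefficients
r12 = np.corrcoef(x1, x2)[0, 1]
r13 = np.corrcoef(x1, x3)[0, 1]
r23 = np.corrcoef(x2, x3)[0, 1]

# First-order partial correlation r_12.3 (X3 held constant)
r12_3 = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))
print(round(r12_3, 4))
```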


The coefficient of partial correlation between two random variables $X$ and $Y$ controlling for a third variable $Z$, denoted by $r_{xy.z}$, can also be defined as the coefficient of correlation between the residuals $x_i-\hat{x}_i$ and $y_i-\hat{y}_i$, with
\begin{align*}
\hat{x}_i&=\hat{\beta}_{0x}+\hat{\beta}_{1x}z_i\\
\hat{y}_i&=\hat{\beta}_{0y}+\hat{\beta}_{1y}z_i
\end{align*}
where $\hat{\beta}_{0x}$ and $\hat{\beta}_{1x}$ are the least squares estimates obtained by regressing $x_i$ on $z_i$, and $\hat{\beta}_{0y}$ and $\hat{\beta}_{1y}$ are the least squares estimates obtained by regressing $y_i$ on $z_i$. Therefore, by definition, the partial correlation between $x$ and $y$ controlling for $z$ is
\[r_{xy.z}=\frac{\sum(x_i-\hat{x}_i)(y_i-\hat{y}_i)}{\sqrt{\sum(x_i-\hat{x}_i)^2}\sqrt{\sum(y_i-\hat{y}_i)^2}}\]
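As a cross-check on the two definitions (again only a sketch, reusing the same hypothetical numbers as in the previous sketch), the coefficient can be computed by regressing each variable on the control variable and correlating the residuals; the value agrees with the one obtained from the first-order formula.

```python
import numpy as np

# Same hypothetical data as in the previous sketch (illustration only)
x = np.array([3.0, 5.0, 2.0, 8.0, 7.0, 9.0])
y = np.array([1.0, 4.0, 2.0, 6.0, 5.0, 8.0])
z = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

def residuals(v, z):
    """Residuals from a simple least-squares regression of v on z."""
    b1 = np.cov(v, z, ddof=1)[0, 1] / np.var(z, ddof=1)  # slope estimate
    b0 = v.mean() - b1 * z.mean()                         # intercept estimate
    return v - (b0 + b1 * z)

e_x = residuals(x, z)  # part of x not explained by z
e_y = residuals(y, z)  # part of y not explained by z

# r_xy.z is the Pearson correlation of the two residual series
r_xy_z = np.corrcoef(e_x, e_y)[0, 1]
print(round(r_xy_z, 4))  # matches r_12.3 from the previous sketch
```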

Partial Correlation Analysis

The coefficient of partial correlation is determined in terms of the simple correlation coefficients among the various variables involved in a multiple relationship. It is a very helpful statistical tool for understanding the true underlying relationships between variables, especially when dealing with potentially confounding factors.



What is the Measure of Kurtosis (2012)

Introduction to Kurtosis

In statistics, a measure of kurtosis is a measure of the “tailedness” of the probability distribution of a real-valued random variable. The standard measure of kurtosis is based on a scaled version of the fourth moment of the data or population. Therefore, the measure of kurtosis is related to the tails of the distribution, not its peak.

Measure of Kurtosis

Sometimes the measure of kurtosis is characterized as a measure of peakedness, but this characterization is mistaken. A distribution having a relatively high peak is called leptokurtic, a distribution that is flat-topped is called platykurtic, and the normal distribution, which is neither very peaked nor very flat-topped, is called mesokurtic. In some cases, a histogram can be an effective graphical technique for showing both the skewness and the kurtosis of a data set.


Data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak.

The moment ratio and the percentile coefficient of kurtosis are used to measure kurtosis.

Moment Coefficient of Kurtosis = $b_2 = \frac{m_4}{S^4} = \frac{m_4}{m_2^2}$

Percentile Coefficient of Kurtosis = $k=\frac{Q.D}{P_{90}-P_{10}}$
where $Q.D = \frac{1}{2}(Q_3 - Q_1)$ is the semi-interquartile range. For the normal distribution, this has a value of 0.263.
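A brief computational sketch (hypothetical data, not from the article) of both coefficients, using the central moments $m_r=\frac{1}{n}\sum(x-\bar{x})^r$ for the moment coefficient and sample percentiles for the percentile coefficient:

```python
import numpy as np

# Hypothetical data set (illustration only)
x = np.array([2.0, 4.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 12.0, 15.0])

# Central moments m_2 and m_4
m2 = np.mean((x - x.mean())**2)
m4 = np.mean((x - x.mean())**4)

# Moment coefficient of kurtosis: b2 = m4 / m2^2  (equals 3 for a normal distribution)
b2 = m4 / m2**2

# Percentile coefficient of kurtosis: k = Q.D / (P90 - P10),
# where Q.D = (Q3 - Q1)/2 is the semi-interquartile range (about 0.263 for a normal distribution)
q1, q3 = np.percentile(x, [25, 75])
p10, p90 = np.percentile(x, [10, 90])
k = ((q3 - q1) / 2) / (p90 - p10)

print(round(b2, 3), round(k, 3))
```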

Dr. Wheeler defines kurtosis as:

The kurtosis parameter is a measure of the combined weight of the tails relative to the rest of the distribution.

So, kurtosis is all about the tails of the distribution – not the peakedness or flatness.

A normal random variable has a kurtosis of 3 irrespective of its mean or standard deviation. If a random variable’s kurtosis is greater than 3, it is considered Leptokurtic. If its kurtosis is less than 3, it is considered Platykurtic.

A large value of kurtosis indicates a more serious outlier issue and hence may lead the researcher to choose alternative statistical methods.


Some Examples of Kurtosis

  • In finance and insurance, risk analysis is an example of where one needs to focus on the tails of the distribution rather than assume normality.
  • Kurtosis helps in determining whether the resource use within an ecological guild is truly neutral or whether it differs among species.
  • The accuracy of the variance as an estimate of the population $\sigma^2$ depends heavily on kurtosis.

For further reading see Moments in Statistics

FAQs about Kurtosis

  1. Define Kurtosis.
  2. What is the moment coefficient of Kurtosis?
  3. What is the definition of kurtosis by Dr. Wheeler?
  4. Give examples of kurtosis from real life.


Sampling Error Definition, Example, Formula

In statistics, sampling error (also called estimation error) is the amount of inaccuracy in estimating some value that is caused by using only a portion of a population (i.e. a sample) rather than the whole population. The difference between the statistic (a value computed from the sample, such as the sample mean) and the corresponding parameter (the value for the population, such as the population mean) is called the sampling error. If $\bar{x}$ is the sample statistic and $\mu$ is the corresponding population parameter, then the sampling error is defined as $\bar{x} - \mu$.

Exact calculation of the sampling error is generally not feasible because the true value of the population parameter is usually unknown; however, it can often be estimated by probabilistic modeling of the sample.
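As a small simulation sketch (the population below is made up purely for illustration, so $\mu$ is known), the sampling error of the sample mean is just $\bar{x}-\mu$, and repeating the draw shows that it changes from sample to sample:

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed so the illustration is reproducible

# Hypothetical population with a known mean (known here only because we constructed it)
population = rng.normal(loc=50, scale=10, size=100_000)
mu = population.mean()

# One random sample of size 30 and its sampling error x_bar - mu
sample = rng.choice(population, size=30, replace=False)
print(round(sample.mean() - mu, 3))

# Repeated draws give different sampling errors
errors = [rng.choice(population, size=30, replace=False).mean() - mu for _ in range(5)]
print([round(e, 3) for e in errors])
```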


Causes of Sampling Error

  • One cause of sampling error is a biased sampling procedure. Every researcher should select a sample that is free from any bias and representative of the entire population of interest.
  • Another cause of sampling error is chance. Randomization and probability sampling are carried out to minimize this error, but it is still possible that the randomly selected subjects/objects are not representative of the population.

Eliminating/Reducing the Sampling Error

Sampling error can be eliminated or reduced when the researcher uses a proper, unbiased probability sampling technique and the sample size is large enough.

  • Increasing the sample size
    The sampling error can be reduced by increasing the sample size. If the sample size $n$ is equal to the population size $N$, then the sampling error will be zero (see the sketch after this list).
  • Improving the sample design, e.g. by using stratification
    The population is divided into different groups (strata) containing similar units.
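The sample-size effect mentioned in the first bullet can be seen in a short simulation sketch (made-up finite population, illustration only): the average absolute sampling error of the sample mean shrinks as $n$ grows and is exactly zero when $n = N$.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical finite population of size N (illustration only)
N = 5_000
population = rng.gamma(shape=2.0, scale=10.0, size=N)
mu = population.mean()

# Average absolute sampling error of the sample mean for increasing sample sizes
for n in (10, 50, 200, 1_000, N):
    errors = [abs(rng.choice(population, size=n, replace=False).mean() - mu)
              for _ in range(200)]
    print(f"n = {n:>5}: mean |x_bar - mu| = {np.mean(errors):.4f}")
```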

The potential sources of error include both sampling and non-sampling errors.

Also Read: Sampling and Non-Sampling Errors

Read more about Sampling Error on Wikipedia


Truth about Bias in Statistics

Bias in Statistics is defined as the difference between the expected value of a statistic and the true value of the corresponding parameter. Therefore, bias is a measure of the systematic error of an estimator; it indicates how far the estimator is, on average, from the true value of the parameter. For example, if we average a large number of estimates from an unbiased estimator, we will obtain approximately the correct value.

Bias in Statistics: The Difference between Expected and True Value

In other words, bias is a systematic error in measurement or sampling, and it tells how far off, on average, the estimator is from the truth.

Gauss, C.F. (1821), during his work on the least-squares method, introduced the concept of an unbiased estimator.

The bias of an estimator of a parameter should not be confused with its degree of precision, as the degree of precision is a measure of the sampling error. Bias means favoring one group or outcome, intentionally or unintentionally, over other groups or outcomes available in the population under study. Unlike random error, bias is a serious problem because it cannot be reduced simply by increasing the sample size and averaging the outcomes.


There are several types of bias, which should not be considered mutually exclusive:

  • Selection Bias (arises due to systematic differences between the groups compared)
  • Exclusion Bias (arises due to the systematic exclusion of certain individuals from the study)
  • Analytical Bias (arises due to the way the results are evaluated)

Mathematically, bias can be defined as follows:

Let the statistic $T$ be used to estimate a parameter $\theta$. If $E(T) = \theta + \text{bias}(\theta)$, then $\text{bias}(\theta)$ is called the bias of the statistic $T$, where $E(T)$ represents the expected value of the statistic $T$.

Note that if $\text{bias}(\theta)=0$, then $E(T)=\theta$, so $T$ is an unbiased estimator of the true parameter $\theta$.
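To make the definition concrete, here is a hedged simulation sketch (not from the article): the variance estimator that divides by $n$ is biased, while the one that divides by $n-1$ is unbiased; averaging each statistic over many samples approximates $E(T)$, and the gap from the true $\sigma^2$ approximates the bias.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

sigma2 = 4.0           # true population variance (known here by construction)
n, reps = 10, 50_000   # small samples, many repetitions to approximate E(T)

biased, unbiased = [], []
for _ in range(reps):
    x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=n)
    biased.append(np.var(x, ddof=0))    # T = (1/n) * sum of squared deviations
    unbiased.append(np.var(x, ddof=1))  # T = (1/(n-1)) * sum of squared deviations

# bias(theta) is approximately E(T) - theta
print("bias of the 1/n estimator:    ", round(np.mean(biased) - sigma2, 3))   # close to -sigma2/n = -0.4
print("bias of the 1/(n-1) estimator:", round(np.mean(unbiased) - sigma2, 3)) # close to 0
```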

Types of Sample Selection Bias

Reference:
Gauss, C.F. (1821, 1823, 1826). Theoria Combinationis Observationum Erroribus Minimis Obnoxiae, Parts 1, 2 and Supplement. Werke 4, 1-108.

For further reading about Statistical Bias visit: Bias in Statistics.
