# High Correlation Does Not Indicate Cause and Effect

Correlation is a measure of the co-variability of two variables. It quantifies the strength of the relationship between two quantitative variables and also indicates the direction of that relationship. A positive correlation coefficient indicates a direct (supportive or positive) relationship between the variables, while a negative value indicates an inverse (opposite or indirect) relationship.

By definition, correlation is the interdependence between two quantitative variables, whereas causation (cause and effect) is when an observed event or action appears to have produced a second event or action. Correlation therefore does not necessarily imply any functional relationship between the variables concerned: correlation theory does not establish a causal link, since it measures only interdependence. Knowledge of the value of the correlation coefficient $r$ alone does not enable us to predict the value of $Y$ from $X$.

Sometimes there is a high correlation between unrelated variables, such as the number of births and the number of murders in a country. Such a correlation is called a spurious correlation.

For example, suppose there is a positive correlation between watching violent movies and violent behavior in adolescence. Both could be caused by a third (extraneous or confounding) variable, say, growing up in a violent environment, which leads adolescents both to watch violence-related movies and to behave violently.

Other Examples

• As the number of absences from class lectures increases, grades decrease.
• As the weather gets colder, air-conditioning costs decrease.
• As the speed of a train (car, bus, or any other vehicle) increases, the time taken to reach the destination decreases.
• As the age of a chicken increases, the number of eggs it produces decreases.
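Spurious correlation is easy to demonstrate by simulation. The sketch below is purely illustrative (the variable names and coefficients are assumptions, not real data): it generates two variables that do not influence each other but share a common confounder, then shows that the raw correlation is strong while the partial correlation, after controlling for the confounder, is near zero.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Hypothetical confounder, e.g. an "environment" score (assumed for illustration).
z = rng.normal(size=n)

# Two variables with no causal link, but both driven by z.
x = 2.0 * z + rng.normal(size=n)   # e.g. exposure to violent movies
y = 1.5 * z + rng.normal(size=n)   # e.g. violent-behavior score

r = np.corrcoef(x, y)[0, 1]
print(f"correlation(x, y) = {r:.2f}")          # strongly positive

# Controlling for z: correlate the residuals after regressing each on z.
x_res = x - np.polyval(np.polyfit(z, x, 1), z)
y_res = y - np.polyval(np.polyfit(z, y, 1), z)
r_partial = np.corrcoef(x_res, y_res)[0, 1]
print(f"partial correlation given z = {r_partial:.2f}")  # near zero
```

Once the confounder is held fixed, the apparent association between the two variables essentially disappears.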

## Sampling Error Definition, Example, Formula

In Statistics, sampling error (also called estimation error) is the amount of inaccuracy in estimating some value that arises because only a portion of the population (a sample), rather than the whole population, is observed. The difference between the statistic (the value computed from the sample, such as the sample mean) and the corresponding parameter (the value for the population, such as the population mean) is called the sampling error. If $\bar{x}$ is the sample statistic and $\mu$ is the corresponding population parameter, then the sampling error is defined as $\bar{x} - \mu$.

An exact calculation of the sampling error is generally not feasible, because the true population value is usually unknown; however, it can often be estimated by probabilistic modeling of the sample.
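The definition $\bar{x} - \mu$ can be sketched directly with a simulated population, where (unlike in practice) the population mean is known. The population values below are made up purely for illustration.

```python
import random
import statistics

random.seed(1)

# Hypothetical finite population (illustrative values, not real data).
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)          # population parameter

sample = random.sample(population, 100)   # a simple random sample
x_bar = statistics.mean(sample)           # sample statistic

sampling_error = x_bar - mu               # definition: x-bar minus mu
print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, sampling error = {sampling_error:.3f}")
```

Here the error is computable only because the whole (simulated) population is available; with real data, $\mu$ is unknown.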

Causes of Sampling Error

• One cause of sampling error is a biased sampling procedure. Every researcher should select a sample (or samples) that is free from bias and representative of the entire population of interest.
• Another cause of this error is chance. Randomization and probability sampling are used to minimize sampling error, but it is still possible that the randomly selected subjects or objects are not representative of the population.

Eliminating/Reducing the Sampling Error

Sampling error can be eliminated or reduced when the researcher uses a proper, unbiased probability sampling technique and the sample size is large enough.

• Increasing the sample size
Sampling error can be reduced by increasing the sample size. If the sample size $n$ equals the population size $N$, the sampling error is zero.
• Improving the sample design, e.g., by using stratification
The population is divided into different groups (strata), each containing similar units.
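The effect of sample size can be sketched as follows (a simulated population; the distribution and sizes are illustrative assumptions). The average absolute sampling error shrinks as $n$ grows, and when $n = N$ the "sample" is the whole population, so the error is exactly zero.

```python
import random
import statistics

random.seed(7)
N = 5_000
population = [random.gauss(100, 15) for _ in range(N)]
mu = statistics.mean(population)

mean_abs_error = {}
for n in (10, 100, 1000, N):
    # Average the absolute sampling error over 100 repeated samples of size n.
    errs = [abs(statistics.mean(random.sample(population, n)) - mu)
            for _ in range(100)]
    mean_abs_error[n] = statistics.mean(errs)
    print(f"n = {n:>4}: mean |x_bar - mu| = {mean_abs_error[n]:.3f}")
```

The printed errors decrease steadily with $n$, and the $n = N$ row is zero, matching the statement above.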


# Question: Differentiate Between Errors and Residuals in the Linear Model

In Statistics and Optimization, statistical errors and residuals are two closely related and easily confused measures of the deviation of an observation from its mean.

The term "error" is something of a misnomer: an error is not a mistake but the amount by which an observation differs from its expected value. The errors $e$ are unobservable random variables, assumed to have zero mean and uncorrelated elements, each with common variance $\sigma^2$.

A residual, on the other hand, is an observable estimate of the unobservable error. The residuals $\hat{e}$ are computed quantities with mean ${E(\hat{e})=0}$ and variance ${V(\hat{e})=\sigma^2 (I-H)}$, where $H$ is the hat (projection) matrix.

Like the errors, each of the residuals has zero mean, but each residual may have a different variance. Unlike the errors, the residuals are correlated. The residuals are linear combinations of the errors; if the errors are normally distributed, so are the residuals.

Note that in a model fitted with an intercept, the sum of the residuals is necessarily zero, so the residuals are necessarily not independent. The sum of the errors need not be zero; the errors are independent random variables if the individuals are chosen from the population independently.
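These points can be illustrated with a simulation in which the true model is known, so the errors are visible for comparison (in real data they never are; the coefficients and noise level below are assumptions chosen for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
sigma = 2.0

# Simulate a model with KNOWN truth, so the errors are observable here.
x = rng.uniform(0, 10, size=n)
errors = rng.normal(0, sigma, size=n)      # true errors, normally unobservable
y = 3.0 + 1.5 * x + errors

# Fit by ordinary least squares, with an intercept.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat               # observable estimates of the errors

print(f"sum of residuals = {residuals.sum():.2e}")  # zero up to rounding
print(f"sum of errors    = {errors.sum():.2f}")     # need not be zero
```

The residuals sum to zero (up to floating-point rounding) because the fit includes an intercept, while the sum of the true errors is a nonzero random quantity, exactly as described above.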

## Testing of Hypothesis or Hypothesis Testing

#### To whom is the researcher similar in hypothesis testing: the defense attorney or the prosecuting attorney? Why?

The researcher is similar to the prosecuting attorney, in the sense that the researcher brings the null hypothesis "to trial" when she believes there is strong evidence against it.

• Just as the prosecutor usually believes that the person on trial is not innocent, the researcher usually believes that the null hypothesis is not true.
• In the court system the jury must assume (by law) that the person is innocent until the evidence clearly calls this assumption into question; analogously, in hypothesis testing the researcher must assume (in order to use hypothesis testing) that the null hypothesis is true until the evidence calls this assumption into question.