MCQs Correlation and Regression Analysis 6

This post presents MCQs on correlation and regression analysis with answers. There are 20 multiple-choice questions covering correlation and regression analysis, the coefficient of determination, testing of correlation and regression coefficients, interpretation of regression coefficients, and the method of least squares. Let us start with the MCQs on correlation and regression analysis with answers.

Online Multiple-Choice Questions about Correlation and Regression Analysis with Answers

1. Which one of the following statements is true?

 
 
 
 

2. The estimated regression line relating the market value of a person’s stock portfolio to his annual income is $Y=5000+0.10X$. This means that each additional rupee of income will increase the stock portfolio by

 
 
 
 

3. If the coefficient of determination is 0.49, the correlation coefficient may be

 
 
 
 

4. The strength of the linear relationship between two numerical variables may be measured by the

 
 
 
 

5. What do we mean when a simple linear regression model is “statistically” useful?

 
 
 
 

6. The sample correlation coefficient between $X$ and $Y$ is 0.375. It has been found that the p-value is 0.256 when testing $H_0:\rho = 0$ against the two-sided alternative $H_1:\rho\ne 0$. To test $H_0:\rho=0$ against the one-sided alternative $H_1:\rho<0$ at a significance level of 0.193, the p-value is

 
 
 
 

7. Assuming a linear relationship between $X$ and $Y$, if the coefficient of correlation equals $-0.30$

 
 
 
 

8. The true correlation coefficient $\rho$ will be zero only if

 
 
 
 

9. Testing for the existence of correlation is equivalent to

 
 
 
 

10. Which of the following does the least squares method minimize?

 
 
 
 

11. The sample correlation coefficient between $X$ and $Y$ is 0.375. It has been found that the p-value is 0.256 when testing $H_0:\rho = 0$ against the two-sided alternative $H_1:\rho\ne 0$. To test $H_0:\rho =0$ against the one-sided alternative $H_1:\rho >0$ at a significance level of 0.193, the p-value is

 
 
 
 

12. If the correlation coefficient ($r=1.00$) then

 
 
 
 

13. The $Y$ intercept ($b_0$) represents the

 
 
 
 

14. If you wanted to find out if alcohol consumption (measured in fluid oz.) and grade point average on a 4-point scale are linearly related, you would perform a

 
 
 
 

15. Which one of the following situations is inconsistent?

 
 
 
 

16. The correlation coefficient

 
 
 
 

17. In a simple linear regression problem, $r$ and $\beta_1$

 
 
 
 

18. The slope ($b_1$) represents

 
 
 
 

19. The sample correlation coefficient between $X$ and $Y$ is 0.375. It has been found that the p-value is 0.256 when testing $H_0:\rho=0$ against the one-sided alternative $H_1:\rho>0$. To test $H_0:\rho=0$ against the two-sided alternative $H_1:\rho\ne 0$ at a significance level of 0.193, the p-value is

 
 
 
 

20. If the correlation coefficient $r=1.00$ then

 
 
 
 

Statistics Help https://itfeature.com MCQs Correlation and Regression

https://rfaqs.com, https://gmstat.com

Properties of Correlation Coefficient (2024)

The coefficient of correlation is a statistic used to measure the strength and direction of the linear relationship between two quantitative variables.

Properties of Correlation Coefficient

Understanding these properties helps us interpret the correlation coefficient accurately and avoid misinterpretation. The following are some important properties of the correlation coefficient.

  • The correlation coefficient ($r$) between $X$ and $Y$ is the same as the correlation between $Y$ and $X$. That is, the correlation is symmetric with respect to $X$ and $Y$, i.e., $r_{XY} = r_{YX}$.
  • The $r$ ranges from $-1$ to $+1$, i.e., $-1\le r \le +1$.
  • There is no unit of $r$. The correlation coefficient $r$ is independent of the unit of measurement.
  • It is not affected by a change of origin and scale, i.e., $r_{XY}=r_{uv}$, where $u=\frac{X-a}{h}$ and $v=\frac{Y-b}{K}$ with $h$ and $K$ of the same sign. If a constant is added to (or subtracted from) each value of a variable, it is called a change of origin, and if each value of a variable is multiplied (or divided) by a constant, it is called a change of scale.
  • The $r$ is the geometric mean of the two regression coefficients, i.e., $r=\pm\sqrt{b_{YX}\times b_{XY}}$, where $r$ takes the common sign of the two regression coefficients.
    In other words, if the two regression lines of $Y$ on $X$ and $X$ on $Y$ are written as $Y=a+bX$ and $X=c+dY$ respectively, then $bd=r^2$.
  • The signs of $r_{XY}$, $b_{YX}$, and $b_{XY}$ depend on the covariance, which is common to all three, as given below:
    $r=\frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}},\,\, b_{YX} = \frac{Cov(Y, X)}{Var(X)}, \,\, b_{XY}=\frac{Cov(Y, X)}{Var(Y)}$

Hence, $r_{XY}$, $b_{YX}$, and $b_{XY}$ have the same sign.

  • If $r=-1$ the correlation is perfectly negative, meaning as one variable increases the other decreases proportionally.
  • If $r=+1$ the correlation is perfectly positive, meaning as one variable increases the other increases proportionally.
  • If $r=0$ there is no correlation, i.e., there is no linear relationship between the variables. However, $r=0$ does not necessarily mean that the variables are independent, because a non-linear relationship may still exist.
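The symmetry, range, and geometric-mean properties listed above can be checked numerically. The following is a minimal sketch in plain Python; the data values and the helper names `mean`, `cov`, and `corr` are assumptions made only for illustration.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def cov(x, y):
    """Sample covariance with divisor n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def corr(x, y):
    """Pearson correlation as covariance divided by both standard deviations."""
    return cov(x, y) / sqrt(cov(x, x) * cov(y, y))

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = corr(x, y)
r_yx = corr(y, x)
b_yx = cov(x, y) / cov(x, x)  # regression coefficient of Y on X
b_xy = cov(x, y) / cov(y, y)  # regression coefficient of X on Y

print("r_XY =", r_xy, " r_YX =", r_yx)
print("b_YX * b_XY =", b_yx * b_xy, " r^2 =", r_xy ** 2)
```

Both regression coefficients and $r$ share the sign of the covariance, so taking the square root of their product with that sign recovers $r$.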

Theorem: Correlation: Independent of Origin and Scale. Show that the correlation coefficient is independent of origin and scale, i.e., $r_{XY}=r_{uv}$.

Proof: The formula for correlation coefficient is,

$$r_{XY}=\frac{\varSigma(X-\overline{X})(Y-\overline{Y}) }{\sqrt{[\varSigma(X-\overline{X})^2][\varSigma(Y-\overline{Y})^2]}}$$

\begin{align*}
\text{Let}\quad u&=\frac{X-a}{h}\\
\Rightarrow X&=a+hu \Rightarrow \overline{X}=a+h\overline{u} \\
\text{and}\quad v&=\frac{Y-b}{K}\\
\Rightarrow Y&=b+Kv \Rightarrow \overline{Y}=b+K\overline{v}\\
\text{Therefore}\\
r_{XY}&=\frac{\varSigma(X-\overline{X})(Y-\overline{Y}) }{\sqrt{[\varSigma(X-\overline{X})^2][\varSigma(Y-\overline{Y})^2]}}\\
&=\frac{\varSigma (a+hu-a-h\overline{u}) (b+Kv-b-K\overline{v})} {\sqrt{[\varSigma(a+hu-a-h\overline{u})^2][\varSigma(b+Kv-b-K\overline{v})^2]}}\\
&=\frac{\varSigma(hu-h\overline{u})(Kv-K\overline{v})}{\sqrt{[\varSigma(hu-h\overline{u})^2][\varSigma(Kv-K\overline{v})^2]}}\\
&=\frac{hK\varSigma(u-\overline{u})(v-\overline{v})}{\sqrt{h^2 K^2 [\varSigma(u-\overline{u})^2] [\varSigma(v-\overline{v})^2]}}\\
&=\frac{hK\varSigma(u-\overline{u})(v-\overline{v})}{hK\,\sqrt{[\varSigma(u-\overline{u})^2] [\varSigma(v-\overline{v})^2]}}\\
&=\frac{\varSigma(u-\overline{u})(v-\overline{v}) }{\sqrt{[\varSigma(u-\overline{u})^2][\varSigma(v-\overline{v})^2]}}=
r_{uv}
\end{align*}

The step $\sqrt{h^2K^2}=hK$ assumes that $h$ and $K$ have the same sign (for example, both positive). If their signs differ, the magnitude of $r$ is unchanged but its sign is reversed.
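The result of the proof can be illustrated numerically. The sketch below is only a check on made-up numbers: the data, the constants `a`, `h`, `b`, `k`, and the helper `pearson_r` are all assumptions for illustration. It also shows what happens when one scale constant is negative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data and constants, chosen only for illustration
x = [110, 120, 135, 150, 160]
y = [32, 30, 41, 44, 52]
a, h = 100, 10   # origin and (positive) scale for X
b, k = 30, 2     # origin and (positive) scale for Y

u = [(xi - a) / h for xi in x]
v = [(yi - b) / k for yi in y]
u_neg = [(xi - a) / (-h) for xi in x]   # same change of origin, negative scale

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
r_neg = pearson_r(u_neg, v)

print("r_XY =", r_xy)
print("r_uv =", r_uv)                      # agrees with r_XY up to rounding
print("r with negative scale =", r_neg)    # same magnitude, opposite sign
```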


Note that

  1. Non-causality: Correlation does not imply causation. If two variables are strongly correlated, it does not necessarily mean that changes in one variable cause changes in the other. This is because the correlation only measures the strength and direction of the linear relationship between two quantitative variables, not the underlying cause-and-effect relationship.
  2. Sensitive to Outliers: The correlation coefficient can be sensitive to outliers, as outliers can disproportionately influence the correlation calculation.
  3. Assumption of Linearity: The correlation coefficient measures the linear relationship between variables. It may not accurately capture non-linear relationships between variables.
  4. Scale Invariance: The correlation coefficient is independent of the scale of the data. That is, multiplying or dividing all the values of one or both variables by a positive constant will not affect the strength and direction of the correlation coefficient (a negative constant reverses only its sign). This makes it useful for comparing relationships between variables measured in different units.
  5. Strength vs. Causation: A high correlation does not necessarily imply causation. Just because two variables are strongly correlated does not mean that one causes the other. There might be a third unknown factor influencing both variables. Correlation analysis is a good starting point for exploring relationships, but further investigation is needed to establish causality.
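Two of these cautions, sensitivity to outliers and the assumption of linearity, are easy to see on small made-up data sets. The following sketch uses hypothetical values and a hypothetical helper `pearson_r`; it is an illustration, not a general diagnostic.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a perfectly negative linear relationship
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
r_clean = pearson_r(x, y)

# The same data with a single extreme point (20, 20) appended
r_outlier = pearson_r(x + [20], y + [20])

# Hypothetical data: an exact but non-linear (quadratic) relationship
xq = [-2, -1, 0, 1, 2]
yq = [value ** 2 for value in xq]
r_quadratic = pearson_r(xq, yq)

print("r without outlier:", r_clean)
print("r with one outlier:", r_outlier)
print("r for y = x^2 on symmetric x:", r_quadratic)
```

In this example a single extreme observation turns a perfect negative correlation into a strongly positive one, and an exact quadratic dependence produces a correlation of zero.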

The Spearman Rank Correlation Test (Numerical Example)

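Consider the following data to illustrate the detection of heteroscedasticity using the Spearman rank correlation test. The data file is available to download.

| $Y$ | $X_2$ | $X_3$ |
|-----|-------|-------|
| 11 | 20 | 8.1 |
| 16 | 18 | 8.4 |
| 11 | 22 | 8.5 |
| 14 | 21 | 8.5 |
| 13 | 27 | 8.8 |
| 17 | 26 | 9 |
| 14 | 25 | 8.9 |
| 15 | 27 | 9.4 |
| 12 | 30 | 9.5 |
| 18 | 28 | 9.5 |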

The estimated multiple linear regression model is:

$$\hat{Y}_i = -34.936 -0.75X_{2i} + 7.611X_{3i}$$

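The residuals, together with the data, are:

| $Y$ | $X_2$ | $X_3$ | Residuals |
|-----|-------|-------|-----------|
| 11 | 20 | 8.1 | -0.63302 |
| 16 | 18 | 8.4 | 0.575564 |
| 11 | 22 | 8.5 | -2.16954 |
| 14 | 21 | 8.5 | 0.076455 |
| 13 | 27 | 8.8 | 1.317102 |
| 17 | 26 | 9 | 3.040825 |
| 14 | 25 | 8.9 | 0.047951 |
| 15 | 27 | 9.4 | -1.2497 |
| 12 | 30 | 9.5 | -2.74881 |
| 18 | 28 | 9.5 | 1.743171 |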
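The fitted model and the residual column can be reproduced approximately by ordinary least squares. The sketch below solves the normal equations $(X'X)b = X'Y$ in plain Python; the helper `solve` and the variable names are assumptions for illustration, and in practice a statistics package would normally be used instead.

```python
def solve(a, b):
    """Solve the square system a * z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z

# Data from the table above
y = [11, 16, 11, 14, 13, 17, 14, 15, 12, 18]
x2 = [20, 18, 22, 21, 27, 26, 25, 27, 30, 28]
x3 = [8.1, 8.4, 8.5, 8.5, 8.8, 9.0, 8.9, 9.4, 9.5, 9.5]

rows = [[1.0, v2, v3] for v2, v3 in zip(x2, x3)]  # design matrix with intercept
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]

beta = solve(xtx, xty)  # [intercept, coefficient of X2, coefficient of X3]
fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print("coefficients:", [round(bj, 3) for bj in beta])
print("residuals:", [round(e, 6) for e in residuals])
```

The coefficients quoted in the text are rounded, so residuals computed from the rounded equation by hand will differ slightly from the tabulated ones.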

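We need to find the ranks of the absolute values of the residuals $u_i$ and of $X_2$, the explanatory variable suspected of causing heteroscedasticity.

| $Y$ | $X_2$ | $X_3$ | Residuals | Rank of $\lvert u_i\rvert$ | Rank of $X_2$ | $d$ | $d^2$ |
|-----|-------|-------|-----------|------|------|------|------|
| 11 | 20 | 8.1 | -0.633 | 4 | 2 | 2 | 4 |
| 16 | 18 | 8.4 | 0.576 | 3 | 1 | 2 | 4 |
| 11 | 22 | 8.5 | -2.170 | 8 | 4 | 4 | 16 |
| 14 | 21 | 8.5 | 0.076 | 2 | 3 | -1 | 1 |
| 13 | 27 | 8.8 | 1.317 | 6 | 7.5 | -1.5 | 2.25 |
| 17 | 26 | 9 | 3.041 | 10 | 6 | 4 | 16 |
| 14 | 25 | 8.9 | 0.048 | 1 | 5 | -4 | 16 |
| 15 | 27 | 9.4 | -1.250 | 5 | 7.5 | -2.5 | 6.25 |
| 12 | 30 | 9.5 | -2.749 | 9 | 10 | -1 | 1 |
| 18 | 28 | 9.5 | 1.743 | 7 | 9 | -2 | 4 |
| | | | | | Total | 0 | 70.5 |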

Calculating the Spearman rank correlation:

\begin{align}
r_s&=1-\frac{6\sum d^2}{n(n^2-1)}\\
&=1-\frac{6\times 70.5}{10(100-1)}=0.5727
\end{align}
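The ranking and the rank correlation can be reproduced with a short script. This is a minimal sketch in plain Python using the formula $r_s = 1 - \frac{6\sum d^2}{n(n^2-1)}$ and average ranks for ties (as for the two values of 27 in $X_2$); the helper name `average_ranks` is an assumption for illustration.

```python
def average_ranks(values):
    """Rank from smallest to largest, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

# Residuals and X2 from the tables above
residuals = [-0.63302, 0.575564, -2.16954, 0.076455, 1.317102,
             3.040825, 0.047951, -1.2497, -2.74881, 1.743171]
x2 = [20, 18, 22, 21, 27, 26, 25, 27, 30, 28]

rank_u = average_ranks([abs(u) for u in residuals])
rank_x2 = average_ranks(x2)
d = [ru - rx for ru, rx in zip(rank_u, rank_x2)]
sum_d2 = sum(di ** 2 for di in d)
n = len(d)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print("sum of d^2 =", sum_d2)
print("r_s =", round(r_s, 4))
```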

Let us test the statistical significance of $r_s$ using the t-test with $n-2$ degrees of freedom:

\begin{align}
t&=\frac{r_s \sqrt{n-2}}{\sqrt{1-r_s^2}}\\
&=\frac{0.5727\sqrt{8}}{\sqrt{1-(0.5727)^2}}\approx 1.976
\end{align}

The tabulated value of $t$ at the 5% level of significance (two-tailed) with $n-2=8$ degrees of freedom is 2.306.

Since $t_{cal} \ngtr t_{tab}$, there is no evidence of a systematic relationship between the explanatory variable $X_2$ and the absolute value of the residuals ($|u_i|$), and hence there is no evidence of heteroscedasticity.
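The test statistic and the decision can be checked with a few lines of Python. This sketch takes $r_s = 0.5727$, $n = 10$, and the tabulated value 2.306 from the text; the function name `spearman_t` is an assumption for illustration.

```python
from math import sqrt

def spearman_t(r_s, n):
    """t statistic for testing a rank correlation, with n - 2 degrees of freedom."""
    return r_s * sqrt(n - 2) / sqrt(1 - r_s ** 2)

r_s = 0.5727        # Spearman rank correlation computed above
n = 10              # number of observations
t_tab = 2.306       # tabulated value quoted in the text (5% level, 8 d.f.)

t_cal = spearman_t(r_s, n)
reject_h0 = abs(t_cal) > t_tab

print("t_cal =", round(t_cal, 3))
print("evidence of heteroscedasticity:", reject_h0)
```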

Since there is more than one regressor (the example is from a multiple regression model), Spearman’s rank correlation test should be repeated for each of the explanatory variables.


As an assignment, compute the Spearman rank correlation between $|u_i|$ and $X_3$ for the data above. Test the statistical significance of the coefficient in the same manner to look for evidence of heteroscedasticity.


Covariance and Correlation (2015)

Introduction to Covariance and Correlation

Covariance and correlation are very important terminologies in statistics. Covariance measures the degree to which two variables co-vary (i.e. vary/change together). If the greater values of one variable (say, $X_i$) correspond with the greater values of the other variable (say, $X_j$), i.e. if the variables tend to show similar behavior, then the covariance between two variables ($X_i$, $X_j$) will be positive.

Similarly, if the smaller values of one variable correspond with the smaller values of the other variable, then the covariance between the two variables will be positive. In contrast, if the greater values of one variable (say, $X_i$) mainly correspond to the smaller values of the other variable (say, $X_j$), i.e. the two variables tend to show opposite behavior, then the covariance will be negative.

In other words, positive covariance between two variables means they (both of the variables) vary/change together in the same direction relative to their expected values (averages). It means that if one variable moves above its average value, the other variable tends to be above its average value.

Similarly, if covariance is negative between the two variables, then one variable tends to be above its expected value, while the other variable tends to be below its expected value. If covariance is zero then it means that there is no linear dependency between the two variables.

Mathematical Representation of Covariance

Mathematically, the covariance between two random variables $X_i$ and $X_j$ can be represented as
\[COV(X_i, X_j)=E[(X_i-\mu_i)(X_j-\mu_j)]\]
where
$\mu_i=E(X_i)$ is the average of the first variable
$\mu_j=E(X_j)$ is the average of the second variable

\begin{aligned}
COV(X_i, X_j)&=E[(X_i-\mu_i)(X_j-\mu_j)]\\
&=E[X_i X_j - X_i E(X_j)-X_j E(X_i)+E(X_i)E(X_j)]\\
&=E(X_i X_j)-E(X_i)E(X_j) - E(X_j)E(X_i)+E(X_i)E(X_j)\\
&=E(X_i X_j)-E(X_i)E(X_j)
\end{aligned}


Note that the covariance of a random variable with itself is the variance of the random variable, i.e. $COV(X_i, X_i)=VAR(X_i)$. If $X_i$ and $X_j$ are independent, then $E(X_i X_j)=E(X_i)E(X_j)$ and $COV(X_i, X_j)=E(X_i X_j)-E(X_i) E(X_j)=0$.
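The identity $COV(X_i, X_j)=E(X_i X_j)-E(X_i)E(X_j)$ can be verified on a small data set by treating the observed values as an equally likely population, so that each expectation is a simple average. The data and the helper name `expectation` below are assumptions for illustration.

```python
def expectation(values):
    """Average of equally likely values, used here as E(.)."""
    return sum(values) / len(values)

# Hypothetical data, chosen only for illustration
x = [2, 4, 6, 8]
y = [1, 3, 2, 6]

mu_x, mu_y = expectation(x), expectation(y)

# Definition: E[(X - mu_x)(Y - mu_y)]
cov_definition = expectation([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])

# Shortcut: E(XY) - E(X)E(Y)
cov_shortcut = expectation([a * b for a, b in zip(x, y)]) - mu_x * mu_y

# Covariance of a variable with itself equals its variance
var_x = expectation([(a - mu_x) ** 2 for a in x])
cov_xx = expectation([a * a for a in x]) - mu_x ** 2

print(cov_definition, cov_shortcut)
print(var_x, cov_xx)
```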

Covariance and Correlation


Correlation and covariance are related measures but not equivalent statistical measures.

Equation of Correlation (Normalized Covariance)

The correlation between two variables (say, $X_i$ and $X_j$) is their normalized covariance, defined as
\begin{aligned}
\rho_{i,j}&=\frac{E[(X_i-\mu_i)(X_j-\mu_j)]}{\sigma_i \sigma_j}
\end{aligned}
where $\sigma_i$ is the standard deviation of $X_i$ and $\sigma_j$ is the standard deviation of $X_j$. For $n$ paired sample observations $(X, Y)$, the corresponding computational formula is
\begin{aligned}
r&=\frac{n \sum XY - \sum X \sum Y}{\sqrt{(n \sum X^2 -(\sum X)^2)(n \sum Y^2 - (\sum Y)^2)}}
\end{aligned}

Note that correlation is dimensionless, i.e. a number that is free of the measurement unit, and its values lie between -1 and +1 inclusive. In contrast, covariance has a unit of measure, namely the product of the units of the two variables.
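The two ways of computing the correlation, and the contrast between a unit-dependent covariance and a unit-free correlation, can be sketched as follows. The data, the conversion factor of 100, and the function names are assumptions chosen only for illustration.

```python
from math import sqrt

def covariance(x, y):
    """Covariance with divisor n (observations treated as equally likely)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def corr_normalized(x, y):
    """Correlation as covariance divided by the two standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

def corr_computational(x, y):
    """Correlation from raw sums: n*Sum(XY) - Sum(X)*Sum(Y) over the root term."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Hypothetical data, chosen only for illustration
x = [2, 4, 6, 8]
y = [1, 3, 2, 6]
x_rescaled = [100 * a for a in x]  # e.g. the same lengths in cm instead of m

r1 = corr_normalized(x, y)
r2 = corr_computational(x, y)

print("correlation (both formulas):", r1, r2)
print("covariance before and after rescaling:",
      covariance(x, y), covariance(x_rescaled, y))
print("correlation after rescaling:", corr_normalized(x_rescaled, y))
```

Rescaling $X$ multiplies the covariance by the same factor, while the correlation is unchanged.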
