Best Correlation and Regression MCQs 1

This post is about Correlation and Regression MCQs. There are 20 multiple-choice questions. The quiz covers the basics of correlation analysis and regression analysis, correlation and regression coefficients, the graphical representation of relationships between variables, simple and multiple linear regression models, and the assumptions related to correlation and regression models. Let us start with the Correlation and Regression MCQs Quiz.

Online MCQs about Correlation and Regression Analysis with Answers

1. The estimate of $\beta$ in the regression equation $Y=\alpha+\beta\,X + e$ by the method of least squares is:

 
 
 
 

2. An investigator reports that the arithmetic mean of two regression coefficients of a regression line is 0.7 and the correlation coefficient is 0.75. Are the results

 
 
 
 

3. The range of a partial correlation coefficient is:

 
 
 
 

4. The property that the average of the two regression coefficients is always greater than or equal to the correlation coefficient is called:

 
 
 
 

5. The lines of regression intersect at the point

 
 
 
 

6. If $\beta_{XY}$ and $\beta_{YX}$ are two regression coefficients, they have

 
 
 
 

7. If $\beta_{YX}>1$, then $\beta_{XY}$ is:

 
 
 
 

8. When two variables move in the same direction, the correlation between the variables is

 
 
 
 

9. If the regression line is $\hat{y}=5$, then the value of the regression coefficient of $y$ on $x$ is

 
 
 
 

10. If $X$ and $Y$ are two independent variates with variances $\sigma_X^2$ and $\sigma_Y^2$, respectively, the coefficient of correlation between $X$ and ($X-Y$) is equal to:

 
 
 
 

11. If $\rho=0$, the lines of regression are:

 
 
 
 

12. The geometric mean of the two regression coefficients $\beta_{YX}$ and $\beta_{XY}$ is equal to:

 
 
 
 

13. In multiple linear regression analysis, the square root of Mean Squared Error (MSE) is called the:

 
 
 
 

14. If all the actual and estimated values of $Y$ are the same on the regression line, the sum of squares of errors will be

 
 
 
 

15. If each value of $X$ is divided by 5 and each value of $Y$ by 10, then $\beta_{YX}$ for the coded values is:

 
 
 
 

16. Homogeneity of three or more population correlation coefficients can be tested by

 
 
 
 

17. If the two lines of regression are perpendicular to each other, the correlation coefficient $r$ is:

 
 
 
 

18. The regression coefficient is independent of

 
 
 
 

19. If $\rho$ is the correlation coefficient, the quantity $\sqrt{1-\rho^2}$ is termed as

 
 
 
 

20. If the correlation coefficient between the variables $X$ and $Y$ is $\rho$, the correlation coefficient between $X^2$ and $Y^2$ is

 
 
 
 


https://rfaqs.com

https://gmstat.com

Akaike Information Criteria: A Comprehensive Guide

The Akaike Information Criterion (AIC), also written Akaike Information Criteria, is a method used in statistics and machine learning to compare the relative quality of different models for a given dataset. It helps in selecting the best model from a set of candidates by penalizing models that are overly complex. In other words, the AIC is a tool for model selection.

  • A too-simple model leads to a large approximation error.
  • A too-complex model leads to a large estimation error.

AIC is a measure of the goodness of fit of a statistical model, developed by Hirotugu Akaike under the name “an information criterion” (AIC) and first published by him in 1974. It is grounded in the concept of information entropy and offers a trade-off between bias and variance in model construction, or between the accuracy and the complexity of the model.

The Formula of Akaike Information Criteria

Given a data set, several candidate models can be ranked according to their AIC values. From the AIC values, one may infer, for example, that the top two models are roughly tied and the rest are far worse.

$$AIC = 2k - 2\ln(L)$$

where $k$ is the number of parameters in the model, and $L$ is the maximized value of the likelihood function for the estimated model.
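The ranking described above can be sketched in a few lines of Python. This is only an illustration using the standard definition $AIC = 2k - 2\ln(L)$; the model names, parameter counts, and log-likelihood values are hypothetical and do not come from any real dataset.

```python
def aic(k: int, log_likelihood: float) -> float:
    """Return AIC = 2k - 2 ln(L); log_likelihood is the maximized ln(L)."""
    return 2 * k - 2 * log_likelihood


# Hypothetical candidates: name -> (number of parameters k, maximized ln(L))
candidates = {
    "model_A": (2, -120.5),
    "model_B": (3, -119.9),
    "model_C": (6, -119.0),
}

# AIC score for every candidate model
scores = {name: aic(k, ll) for name, (k, ll) in candidates.items()}

# Rank from lowest (preferred) to highest AIC
ranking = sorted(scores.items(), key=lambda item: item[1])
best = ranking[0][1]

for name, score in ranking:
    # delta is the difference from the best model; small deltas mean a near tie
    print(f"{name}: AIC = {score:.1f}, delta = {score - best:.1f}")
```

In this made-up example, the two simplest models differ by less than one AIC unit, while the six-parameter model is penalized for its extra parameters even though its likelihood is the highest.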

For a set of candidate models for the data, the preferred model is the one that has the minimum AIC value. AIC estimates the relative support for a model, which means that AIC scores by themselves are not very meaningful.

The key features of the Akaike Information Criterion are:

  • Balances fit and complexity: A model that perfectly fits the data might not be the best because it might be memorizing the data instead of capturing the underlying trend. AIC considers both how well a model fits the data (goodness of fit) and how complex it is (number of estimated parameters).
  • A lower score is better: Models having lower AIC scores are preferred as they achieve a good balance between fitting the data and avoiding overfitting.
  • Comparison tool: AIC scores are most meaningful when comparing models for the same dataset. The model with the lowest AIC score is considered the best relative to the other models being evaluated.

Summary

The AIC score is a single number that is used as a model selection criterion. One cannot interpret the AIC score in isolation. However, one can compare the AIC scores of different models fitted to the same data. The model with the lowest AIC is generally considered the best choice.

The AIC is most useful as a model selection criterion when there are multiple candidate models to choose from. It works well for larger datasets. However, for smaller datasets, the corrected AIC (AICc) should be preferred. AIC is not perfect, and there can be situations where it fails to choose the optimal model.
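A minimal sketch of the small-sample correction follows. It assumes the commonly used form $AICc = AIC + \frac{2k(k+1)}{n-k-1}$, where $n$ is the sample size; this formula is not given in the post itself, and the numbers below are hypothetical.

```python
def aic(k: int, log_likelihood: float) -> float:
    """Return AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def aicc(k: int, log_likelihood: float, n: int) -> float:
    """Return the corrected AIC, assuming AICc = AIC + 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(k, log_likelihood) + (2 * k * (k + 1)) / (n - k - 1)


# Same hypothetical model (k = 3, ln(L) = -20) evaluated at two sample sizes
small_sample = aicc(k=3, log_likelihood=-20.0, n=10)
large_sample = aicc(k=3, log_likelihood=-20.0, n=1000)

print(f"AIC           = {aic(3, -20.0):.3f}")
print(f"AICc (n=10)   = {small_sample:.3f}")
print(f"AICc (n=1000) = {large_sample:.3f}")
```

The correction term shrinks toward zero as $n$ grows, so AICc and AIC agree for large datasets and differ noticeably only when $n$ is small relative to $k$.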

There are many other model selection criteria. For more detail read the article: Model Selection Criteria


https://itfeature.com

Multiple Regression Analysis

Introduction to Multiple Regression Analysis

Francis Galton (a biometrician) examined the relationship between the heights of fathers and sons. He analyzed the similarities between the parent and child generations of 700 sweet peas. Galton found that the offspring of tall parents tended to be shorter than their parents, and the offspring of short parents tended to be taller than their parents. The height of the children ($Y$) depends upon the height of the parents ($X$). When there is more than one independent variable (IV), we need multiple regression analysis (MRA), also called multiple linear regression (MLR).

Multiple Linear Regression Model

The linear regression model (equation) for two independent variables (regressors) is

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

The general linear regression model (equation) for $k$ independent variables is

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

The $\beta$s are all regression coefficients (partial slopes) and the $\alpha$ is the intercept.

The sample linear regression model is

$$y_i = \hat{\alpha} + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \hat{\varepsilon}_i$$

Multiple Regression Coefficients Formula

To fit the MLR equation for two independent variables, one needs to compute the values of $\hat{\beta}_1$, $\hat{\beta}_2$, and $\hat{\alpha}$.

The formula for the 1st regression coefficient is

$$\hat{\beta}_1 = \frac{(S_{X_1 Y})(S_{X_2^2}) - (S_{X_2 Y})(S_{X_1 X_2})}{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1 X_2})^2}$$

The first term of the numerator is the sum of products of the 1st independent variable and the dependent variable, multiplied by the sum of squares of the 2nd independent variable.

The second term of the numerator is the sum of products of the 2nd independent variable and the dependent variable, multiplied by the sum of products of the two independent variables.

The first term of the denominator is the sum of squares of the 1st independent variable multiplied by the sum of squares of the 2nd independent variable.

The second term of the denominator is the square of the sum of products of the two independent variables.

The formula for the 2nd regression coefficient is

$$\hat{\beta}_2 = \frac{(S_{X_2 Y})(S_{X_1^2}) - (S_{X_1 Y})(S_{X_1 X_2})}{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1 X_2})^2}$$

In short, note that $S$ stands for a corrected sum of squares or sum of products, for example $S_{X_1 Y} = \sum X_1 Y - \frac{\sum X_1 \sum Y}{n}$.

Multiple Linear Regression Example

Consider the following data about two regressors ($X_1, X_2$) and one regressand variable ($Y$).

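| $Y$ | $X_1$ | $X_2$ | $X_1 Y$ | $X_2 Y$ | $X_1 X_2$ | $X_1^2$ | $X_2^2$ |
|---|---|---|---|---|---|---|---|
| 30 | 10 | 15 | 300 | 450 | 150 | 100 | 225 |
| 22 | 5 | 8 | 110 | 176 | 40 | 25 | 64 |
| 16 | 10 | 12 | 160 | 192 | 120 | 100 | 144 |
| 7 | 3 | 7 | 21 | 49 | 21 | 9 | 49 |
| 14 | 2 | 10 | 28 | 140 | 20 | 4 | 100 |
| 89 | 30 | 52 | 619 | 1007 | 351 | 238 | 582 |

The last row gives the column totals.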

\begin{align*}
S_{X_1 Y} &= \sum X_1 Y - \frac{\sum X_1 \sum Y}{n} = 619 - \frac{30\times 89}{5} = 85\\
S_{X_1 X_2} &= \sum X_1 X_2 - \frac{\sum X_1 \sum X_2}{n} = 351 - \frac{30 \times 52}{5} = 39\\
S_{X_1^2} &= \sum X_1^2 - \frac{(\sum X_1)^2}{n} = 238 - \frac{30^2}{5} = 58\\
S_{X_2^2} &= \sum X_2^2 - \frac{(\sum X_2)^2}{n} = 582 - \frac{52^2}{5} = 41.2\\
S_{X_2 Y} &= \sum X_2 Y - \frac{\sum X_2 \sum Y}{n} = 1007 - \frac{52 \times 89}{5} = 81.4
\end{align*}

\begin{align*}
\hat{\beta}_1 &= \frac{(S_{X_1 Y})(S_{X_2^2}) - (S_{X_2 Y})(S_{X_1 X_2})}{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1 X_2})^2} = \frac{(85)(41.2) - (81.4)(39)}{(58)(41.2) - (39)^2} = \frac{327.4}{868.6} \approx 0.3769\\
\hat{\beta}_2 &= \frac{(S_{X_2 Y})(S_{X_1^2}) - (S_{X_1 Y})(S_{X_1 X_2})}{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1 X_2})^2} = \frac{(81.4)(58) - (85)(39)}{(58)(41.2) - (39)^2} = \frac{1406.2}{868.6} \approx 1.6189\\
\hat{\alpha} &= \overline{Y} - \hat{\beta}_1 \overline{X}_1 - \hat{\beta}_2 \overline{X}_2 = 17.8 - (0.3769)(6) - (1.6189)(10.4) \approx -1.298\\
\hat{Y} &= -1.298 + 0.3769X_1 + 1.6189X_2
\end{align*}
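The hand computation can be checked with a short script. The sketch below recomputes the corrected sums of squares and products from the data in the table, applies the two-regressor formulas, and cross-checks the result against a generic least-squares solver (`numpy.linalg.lstsq`). The variable and function names are chosen here for illustration only.

```python
import numpy as np

# Data from the example table
y = np.array([30, 22, 16, 7, 14], dtype=float)
x1 = np.array([10, 5, 10, 3, 2], dtype=float)
x2 = np.array([15, 8, 12, 7, 10], dtype=float)
n = len(y)


def s(a, b):
    """Corrected sum of products: sum(a*b) - sum(a)*sum(b)/n."""
    return np.sum(a * b) - np.sum(a) * np.sum(b) / n


s_x1y, s_x2y, s_x1x2 = s(x1, y), s(x2, y), s(x1, x2)
s_x1x1, s_x2x2 = s(x1, x1), s(x2, x2)

# Two-regressor formulas for the partial slopes and the intercept
den = s_x1x1 * s_x2x2 - s_x1x2 ** 2
b1 = (s_x1y * s_x2x2 - s_x2y * s_x1x2) / den
b2 = (s_x2y * s_x1x1 - s_x1y * s_x1x2) / den
a = y.mean() - b1 * x1.mean() - b2 * x2.mean()

# Cross-check with a general least-squares fit on [1, X1, X2]
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print("formulas:", a, b1, b2)
print("lstsq:   ", coef)
```

Both routes solve the same normal equations, so the two printed lines should agree up to floating-point rounding.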

Important Key Points of Multiple Regression

  • Independent variables (predictors, regressors): These are the variables that one believes to influence the dependent variable. One can have two or more independent variables in a multiple-regression model.
  • Dependent variable (outcome, response): This is the variable one is trying to predict or explain using the independent variables.
  • Linear relationship: The core assumption is that the relationship between the independent variables and dependent variable is linear. This means the dependent variable changes at a constant rate for a unit change in the independent variable, holding all other variables constant.

The main goal of multiple regression analysis is to find a linear equation that best fits the data. The multiple regression analysis also allows one to:

  • Predict the value of the dependent variable based on the values of the independent variables.
  • Understand how changes in the independent variables affect the dependent variable while considering the influence of other independent variables.
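Both uses can be illustrated with the example data from this post. The sketch below fits the model with `numpy.linalg.lstsq`, predicts $Y$ for a new observation, and shows that raising $X_1$ by one unit while holding $X_2$ fixed changes the prediction by exactly $\hat{\beta}_1$. The new observation ($X_1 = 6$, $X_2 = 11$) is a hypothetical value chosen for illustration.

```python
import numpy as np

# Example data from this post
y = np.array([30, 22, 16, 7, 14], dtype=float)
x1 = np.array([10, 5, 10, 3, 2], dtype=float)
x2 = np.array([15, 8, 12, 7, 10], dtype=float)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(len(y)), x1, x2])
(alpha, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)


def predict(x1_new: float, x2_new: float) -> float:
    """Predicted Y from the fitted equation alpha + b1*X1 + b2*X2."""
    return alpha + b1 * x1_new + b2 * x2_new


# Prediction for a hypothetical new observation
base = predict(6.0, 11.0)

# Increase X1 by one unit while holding X2 constant
bumped = predict(7.0, 11.0)

print("prediction at (6, 11):", base)
print("change for +1 in X1:  ", bumped - base)
print("partial slope b1:     ", b1)
```

The equality between the change in the prediction and `b1` is what "holding the other variables constant" means for a partial regression coefficient.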

Interpreting the Multiple Regression Coefficient
