This post is about the Correlation and Regression Quiz. There are 20 multiple-choice questions. The quiz covers topics related to correlation and regression analysis: basic concepts, assumptions and their violations, model selection criteria, interpretation of correlation and regression coefficients, etc.
Online MCQs about Correlation and Regression Analysis with Answers
Online Correlation and Regression Quiz with Answers
The strength (degree) of the correlation between a set of independent variables $X$ and a dependent variable $Y$ is measured by
The percent of the total variation of the dependent variable $Y$ explained by the set of independent variables $X$ is measured by
A coefficient of correlation computed to be -0.95 means that
Suppose the coefficient of determination is computed to be 0.39 in a problem involving one independent variable and one dependent variable. This result means that
The relationship between the correlation coefficient and the coefficient of determination is that
Multicollinearity exists when
If “time” is used as the independent variable in a simple linear regression analysis, then which of the following assumptions could be violated
In multiple regression, when the global test of significance is rejected, we can conclude that
A residual is defined as
What test statistic is used for a global test of significance?
If the value of any regression coefficient is zero, then two variables are said to be
In the straight line graph of the linear equation $Y=a+bX$, the slope will be upward if
In the straight line graph of the linear equation $Y=a+bX$, the slope will be downward if
In the straight line graph of the linear equation $Y=a+bX$, the slope is horizontal if
For the regression $\hat{Y}=5$, the value of regression coefficient of $Y$ on $X$ will be
If $\beta_{yx} = -1.36$ and $\beta_{xy} = -0.34$ then $r_{xy} =$
If one regression coefficient is greater than one then the other will be
To determine the height of a person when his weight is given is
This post is about Correlation and Regression MCQs. There are 20 multiple-choice questions. The quiz covers topics related to the basics of correlation analysis and regression analysis, correlation and regression coefficients, graphical representation of relationships between variables, simple and multiple linear regression models, and the assumptions related to correlation and regression models. Let us start with the Correlation and Regression MCQs Quiz.
The estimate of $\beta$ in the regression equation $Y=\alpha+\beta\,X + e$ by the method of least squares is:
If $\beta_{XY}$ and $\beta_{YX}$ are two regression coefficients, they have
The property that the average of the two regression coefficients is always greater than or equal to the correlation coefficient is called:
If $\beta_{YX}>1$, then $\beta_{XY}$ is:
If the two lines of regression are perpendicular to each other, the correlation coefficient $r$ is:
The regression coefficient is independent of
If each value of $X$ is divided by 5 and each value of $Y$ by 10, then $\beta_{YX}$ computed from the coded values is:
The geometric mean of the two regression coefficients $\beta_{YX}$ and $\beta_{XY}$ is equal to:
If $X$ and $Y$ are two independent variates with variance $\sigma_X^2$ and $\sigma_Y^2$, respectively, the coefficient of correlation between $X$ and ($X-Y$) is equal to:
In multiple linear regression analysis, the square root of Mean Squared Error (MSE) is called the:
The range of a partial correlation coefficient is:
Homogeneity of three or more population correlation coefficients can be tested by
If $\rho$ is the correlation coefficient, the quantity $\sqrt{1-\rho^2}$ is termed as
If the correlation coefficient between the variables $X$ and $Y$ is $\rho$, the correlation coefficient between $X^2$ and $Y^2$ is
The lines of regression intersect at the point
If $\rho=0$, the lines of regression are:
An investigator reports that the arithmetic mean of two regression coefficients of a regression line is 0.7 and the correlation coefficient is 0.75. Are the results
If regression line $\hat{y}=5$ then value of regression coefficient of $y$ on $x$ is
When two variables move in the same direction then the correlation between the variables is
If all the actual and estimated values of $Y$ are the same on the regression line, the sum of squares of errors will be
The Akaike Information Criterion (AIC) is a method used in statistics and machine learning to compare the relative quality of different models for a given dataset. It helps select the best model from a set of candidates by penalizing models that are overly complex, which makes it a tool for model selection.
A too-simple model leads to a large approximation error.
A too-complex model leads to a large estimation error.
AIC is a measure of the goodness of fit of a statistical model. It was developed by Hirotugu Akaike under the name "an information criterion" (AIC), and its first formal publication was in 1974. It is grounded in the concept of information entropy and offers a trade-off between bias and variance in model construction, or equivalently between the accuracy and the complexity of the model.
The Formula of Akaike Information Criteria
Given a data set, several candidate models can be ranked according to their AIC values. From the AIC values one may infer, for example, that the top two models are roughly tied and the rest are far worse.
$$AIC = 2k - 2\ln(L)$$
where $k$ is the number of parameters in the model, and $L$ is the maximized value of the likelihood function for the estimated model.
For a set of candidate models for the data, the preferred model is the one that has the minimum AIC value. AIC estimates relative support for a model, which means that AIC scores by themselves are not very meaningful.
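As a minimal sketch of how such a comparison might look in practice, the Python snippet below computes AIC for two least-squares fits to the same data: an intercept-only model and a straight line. It uses the standard definition $AIC = 2k - 2\ln(L)$. Several things here are assumptions made for illustration: the data are invented, the errors are assumed to be normally distributed (so the maximized log-likelihood can be written in terms of the residual sum of squares), the error variance is counted as one of the $k$ parameters, and the helper names `aic` and `gaussian_log_likelihood` are hypothetical.

```python
import math


def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L), where log_likelihood is the maximized ln(L)."""
    return 2 * k - 2 * log_likelihood


def gaussian_log_likelihood(residuals):
    """Maximized log-likelihood of a least-squares fit, assuming normal errors."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


# Hypothetical data, invented purely for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Model 1: intercept only (k = 2: the mean and the error variance).
res_mean = [yi - y_bar for yi in y]
aic_mean = aic(gaussian_log_likelihood(res_mean), k=2)

# Model 2: simple linear regression (k = 3: intercept, slope, error variance).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
res_line = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
aic_line = aic(gaussian_log_likelihood(res_line), k=3)

print(f"AIC (intercept only): {aic_mean:.2f}")
print(f"AIC (straight line):  {aic_line:.2f}")
```

Because the invented data lie close to a straight line, the second model should receive the lower AIC despite its extra parameter. Only the difference between the two scores is informative, not either value on its own.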
Akaike Information Criteria focuses on:
Balances fit and complexity: A model that perfectly fits the data might not be the best because it might be memorizing the data instead of capturing the underlying trend. AIC considers both how well a model fits the data (goodness of fit) and how complex it is (number of estimated parameters).
A lower score is better: Models having lower AIC scores are preferred as they achieve a good balance between fitting the data and avoiding overfitting.
Comparison tool: AIC scores are most meaningful when comparing models for the same dataset. The model with the lowest AIC score is considered the best relative to the other models being evaluated.
Summary
The AIC score is a single number that is used as a model selection criterion. One cannot interpret the AIC score in isolation. However, one can compare the AIC scores of different models fitted to the same data. The model with the lowest AIC is generally considered the best choice.
AIC is most useful when there are multiple candidate models to choose from. It works well for larger datasets. For smaller datasets, however, the corrected AIC (AICc) should be preferred. AIC is not perfect, and there can be situations where it fails to choose the optimal model.
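The small-sample correction can be sketched as follows. The snippet assumes the commonly used form $AICc = AIC + \frac{2k(k+1)}{n-k-1}$, where $n$ is the sample size and $k$ the number of estimated parameters. The function name `aicc` and the AIC score of 100.0 are hypothetical and chosen only for illustration.

```python
def aicc(aic_value, n, k):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic_value + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical AIC score of 100.0 for a model with k = 3 parameters.
for n in (10, 30, 1000):
    print(n, round(aicc(100.0, n, 3), 3))
```

The correction term shrinks toward zero as $n$ grows, which is why AIC and AICc give nearly the same ranking for large datasets.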
There are many other model selection criteria. For more detail read the article: Model Selection Criteria
Francis Galton (a biometrician) examined the relationship between fathers' and sons' heights. He also analyzed the similarities between the parent and child generations of 700 sweet peas. Galton found that the offspring of tall parents tended to be shorter than their parents, and the offspring of short parents tended to be taller than their parents. The height of the children ($Y$) depends upon the height of the parents ($X$). When there is more than one independent variable (IV), we need multiple regression analysis (MRA), also called multiple linear regression (MLR).
Multiple Linear Regression Model
The linear regression model (equation) for two independent variables (regressors) is
$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$
To fit the MLR equation for two variables, one needs to compute the values of $\hat{\beta}_1, \hat{\beta}_2$, and $\alpha$. The formula for the 1st regression coefficient is
$$\hat{\beta}_1 = \frac{(S_{X_1Y})(S_{X_2^2}) - (S_{X_2Y})(S_{X_1X_2})}{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1X_2})^2}$$
The first term of the numerator is the sum of products of the 1st independent variable and the dependent variable, multiplied by the sum of squares of the 2nd independent variable.
The second term of the numerator is the sum of products of the 2nd independent variable and the dependent variable, multiplied by the sum of products of the two independent variables.
The first term of the denominator is the sum of squares of the 1st independent variable multiplied by the sum of squares of the 2nd independent variable.
The second term of the denominator is the square of the sum of products of the two independent variables.
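The formula for the 2nd regression coefficient is
$$\hat{\beta}_2 = \frac{(S_{X_2Y})(S_{X_1^2}) - (S_{X_1Y})(S_{X_1X_2})}{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1X_2})^2}$$
and the intercept follows from the sample means:
$$\hat{\alpha} = \overline{Y} - \hat{\beta}_1 \overline{X}_1 - \hat{\beta}_2 \overline{X}_2$$
In short, note that $S$ stands for a sum of squares (for example, $S_{X_1^2}$) or a sum of products (for example, $S_{X_1Y}$), each computed from deviations about the respective means.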
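The calculation can be sketched in a few lines of Python. The snippet below computes the sums of squares and products from deviations about the means and then applies the closed-form expressions for the two slopes and the intercept. The function names `corrected_sum` and `fit_two_predictors` are hypothetical, and the data are invented: the response is generated without noise from $Y = 1 + 2X_1 + 3X_2$ so that the recovered coefficients can be checked by eye.

```python
def corrected_sum(u, v):
    """Sum of products of deviations from the means (a sum of squares when u is v)."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (alpha, beta1, beta2) for Y = alpha + beta1*X1 + beta2*X2."""
    s11 = corrected_sum(x1, x1)  # sum of squares of X1
    s22 = corrected_sum(x2, x2)  # sum of squares of X2
    s12 = corrected_sum(x1, x2)  # sum of products of X1 and X2
    s1y = corrected_sum(x1, y)   # sum of products of X1 and Y
    s2y = corrected_sum(x2, y)   # sum of products of X2 and Y

    denom = s11 * s22 - s12 ** 2
    if denom == 0:
        raise ValueError("X1 and X2 are perfectly collinear")

    beta1 = (s1y * s22 - s2y * s12) / denom
    beta2 = (s2y * s11 - s1y * s12) / denom
    n = len(y)
    alpha = sum(y) / n - beta1 * sum(x1) / n - beta2 * sum(x2) / n
    return alpha, beta1, beta2


# Hypothetical, noise-free data built from Y = 1 + 2*X1 + 3*X2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

alpha, beta1, beta2 = fit_two_predictors(x1, x2, y)
print(alpha, beta1, beta2)
```

The zero-denominator check corresponds to perfect collinearity between the two regressors, in which case the two slopes cannot be estimated separately.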
Multiple Linear Regression Example
Consider the following data about two regressors ($X_1, X_2$) and one regressand variable ($Y$).
Independent variables (predictors, regressors): These are the variables that one believes to influence the dependent variable. One can have two or more independent variables in a multiple-regression model.
Dependent variable (outcome, response): This is the variable one is trying to predict or explain using the independent variables.
Linear relationship: The core assumption is that the relationship between the independent variables and dependent variable is linear. This means the dependent variable changes at a constant rate for a unit change in the independent variable, holding all other variables constant.
The main goal of multiple regression analysis is to find a linear equation that best fits the data. The multiple regression analysis also allows one to:
Predict the value of the dependent variable based on the values of the independent variables.
Understand how changes in the independent variables affect the dependent variable while considering the influence of other independent variables.
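Both uses can be illustrated with a short sketch. The coefficients below are hypothetical and not estimated from any real data, and the helper `predict` is an assumed name. The snippet predicts $Y$ for one set of regressor values, then raises $X_1$ by one unit while holding $X_2$ fixed to show that the prediction changes by exactly $\beta_1$.

```python
def predict(alpha, betas, xs):
    """Predicted value of Y from a fitted linear equation."""
    return alpha + sum(b * x for b, x in zip(betas, xs))


# Hypothetical fitted coefficients: intercept and slopes for X1 and X2.
alpha, betas = 1.0, [2.0, 3.0]

base = predict(alpha, betas, [4, 3])    # X1 = 4, X2 = 3
bumped = predict(alpha, betas, [5, 3])  # X1 raised by one unit, X2 held constant

print("prediction:", base)
print("change in Y per unit change in X1:", bumped - base)
```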