MCQs Correlation and Regression 10

The post is about the MCQs Correlation and Regression Quiz. There are 20 multiple-choice questions covering topics related to the basics of correlation and regression analysis, the best-fitting trend, the least-squares regression line, the interpretation of correlation and regression coefficients, and regression plots. Let us start with the MCQs Correlation and Regression Quiz now.

Online MCQs Correlation and Regression Analysis with Answers

1. What can you conclude about a Pearson’s r that is bigger than 1?


2. In a regression analysis, the variable that is used to explain the change in the outcome of an experiment, or some natural process, is called


3. The correlation coefficient is used to determine


4. Why do we use squared residuals when computing the regression line?


5. A professor uses the following formula to grade a statistics exam: $\hat{y} = 0.5 + 0.53x$. After obtaining the results the professor realizes that the grades are very low, so he might have been too strict. He decides to level up all results by one point. What will be the new grading equation?


6. Which of the following statement(s) about correlations is/are right?
I. When dealing with a positive Pearson’s r, the line goes up.
II. When the observations cluster around a straight line, we deal with a linear relation between the variables.
III. The steeper the line, the smaller the correlation.


7. For a mathematical model related to a straight line, if a value for the x variable is specified, then


8. The range of the multiple correlation coefficient is


9. A teacher asks his students to fill in a form about how many cigarettes they smoke every week and how much they weigh. After obtaining the data, he makes a scatterplot and analyses the data points. Pearson’s r is computed to assess the correlation and is found to be 0.80. From the correlation results, it is concluded that smoking more cigarettes causes higher body weight. What is wrong with this analysis?


10. What technique is used to help identify the nature of the relationship between two variables?


11. In regression analysis, the variable that is being predicted is


12. Suppose you have investigated how eating chocolate bars influences the grades of students. For this purpose, you keep track of their chocolate intake (in bars per week) and assess their exam results one day later. Which statement(s) about the regression line $\hat{y} = 0.66x + 1.99$ is/are true?


13. If there is a very strong correlation between two variables then the correlation coefficient must be


14. Regression modeling is a statistical framework for developing a mathematical equation that describes how


15. In the least squares regression, which of the following is not a required assumption about the error term $\varepsilon$?


16. What is the explained variance? And how can you measure it?


17. When a regression line passes through the origin then


18. In a regression analysis if $R^2=1$ then


19. Regression is a form of which of the following?


20. Suppose you have collected the following data about how much chocolate people eat and how happy these people are.
Amount of chocolate bars a week: 2, 4, 1.5, 2, 3.
Grades for happiness: 7, 3, 8, 8, 6.
(Note that the data follows paired observations)
The Pearson Correlation between these two variables will be

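For readers who want to check their answer to the last question, Pearson’s r for the paired data above can be computed with a few lines of plain Python (standard library only; this is a verification sketch, not part of the quiz):

```python
import math

# Paired observations from the question above
chocolate = [2, 4, 1.5, 2, 3]   # chocolate bars per week
happiness = [7, 3, 8, 8, 6]     # happiness grades

n = len(chocolate)
mean_x = sum(chocolate) / n
mean_y = sum(happiness) / n

# Pearson's r = S_xy / sqrt(S_xx * S_yy)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chocolate, happiness))
sxx = sum((x - mean_x) ** 2 for x in chocolate)
syy = sum((y - mean_y) ** 2 for y in happiness)

r = sxy / math.sqrt(sxx * syy)
print(round(r, 2))  # → -0.96, a strong negative correlation
```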



Linear Regression and Correlation Quiz 9

The post is about the MCQs Linear Regression and Correlation Quiz. There are 20 multiple-choice questions covering topics related to the basics of correlation and regression analysis, the best-fitting trend, the least-squares regression line, the interpretation of correlation and regression coefficients, and regression plots. Let us start with the MCQs about the Linear Regression and Correlation Quiz now.


Online Linear Regression and Correlation Quiz with Answers


  • A regression analysis is run between two continuous variables “amount of food eaten” vs “the amount of calories burnt”. The coefficient value is $-0.33$ for “the amount of food eaten” and an R-square value of 0.81. What is the correlation coefficient?
  • In the simple linear regression equation, the term $B_0$ represents the
  • In model development, one can develop more accurate models when one has which of the following?
  • How should one interpret an R-squared if it is 0.89?
  • When comparing linear regression models, when will the mean squared error (MSE) be smaller?
  • Which of the following is NOT true about a model?
  • Which of the following is NOT a method for evaluating a regression model?
  • Which of the following is NOT true about a model?
  • What type of model would you use if you wanted to find the relationship between a set of variables?
  • Pearson correlation is concerned with
  • Which of the following statements describes a positive correlation between two variables?
  • When using the Pearson method to evaluate the correlation between two variables, which set of numbers indicates a strong positive correlation?
  • What are the key reasons to develop a model for your data analysis?
  • There are four assumptions associated with a linear regression model. What is the definition of the assumption of homoscedasticity?
  • Which performance metric for regression is the mean of the square of the residuals (error)?
  • When comparing the MSE of different models, do you want the highest or lowest value of MSE?
  • Which is NOT true for comparing multiple linear regression (MLR) and simple linear regression (SLR)?
  • One can visualize the correlation between two variables by plotting them on a scatter plot and then doing which of the following?
  • When using the Pearson method to evaluate the correlation between two variables, how can one know that there is a strong certainty in the result?
  • The method of least squares finds the best-fit line that ————– the error between observed and estimated points on the line.


Method of Least Squares

Introduction to Method of Least Squares

The method of least squares is a statistical technique used to find the best-fitting curve or line for a set of data points. It does this by minimizing the sum of the squares of the offsets (residuals) of the points from the curve.

The method of least squares is used for

  • solution of equations, and
  • curve fitting

The principle of least squares consists of minimizing the sum of squared deviations (errors, or residuals).
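The principle can be illustrated with a short Python sketch (the data below are hypothetical): among all candidate lines, the least-squares line attains the smallest sum of squared residuals.

```python
# Hypothetical data roughly following y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

def sse(b0, b1):
    """Sum of squared residuals for the candidate line y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Closed-form least-squares estimates (derived later in the post)
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

# Perturbing either coefficient can only increase the SSE
assert sse(b0, b1) <= sse(b0 + 0.5, b1)
assert sse(b0, b1) <= sse(b0 - 0.5, b1)
assert sse(b0, b1) <= sse(b0, b1 + 0.5)
```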

Mathematical Functions/ Models

Many types of mathematical functions (models) can be used to model a response as a function of one or more independent variables. These models can be classified into two categories: deterministic and probabilistic. For example, suppose $Y$ and $X$ are related according to the relation

$$Y=\beta_o + \beta_1 X,$$

where $\beta_o$ and $\beta_1$ are unknown parameters, $Y$ is the response variable, and $X$ is an independent/auxiliary variable (regressor). The model above is called a deterministic model because it does not allow for any error in predicting $Y$ as a function of $X$.

Probabilistic and Deterministic Models

Suppose that we collect a sample of $n$ values of $Y$ corresponding to $n$ different settings of the independent variable $X$, and the graph of the data is as shown below.

Method of Least Squares

In the figure above it is clear that $E(Y)$ may increase as a function of $X$ but the deterministic model is far from an adequate description of reality.

Repeating the experiment when, say, $X=20$, we would find that $Y$ fluctuates about the line because of random error, which leads us to the probabilistic model (that is, the model is not a deterministic or exact representation of the relationship between the two variables). Further, if the model is used to predict $Y$ when $X=20$, the prediction would be subject to some unknown error. This, of course, leads us to the use of statistical methods: predicting $Y$ for a given value of $X$ is an inferential process, and we need to assess the error of prediction if the prediction is to be of value in real life. In contrast to the deterministic model, the probabilistic model is

$$Y=\beta_o + \beta_1 X + \varepsilon,$$

where $\varepsilon$ is a random variable having a specified distribution with zero mean. One may think of $Y$ as the sum of a deterministic component, $E(Y)=\beta_o + \beta_1 X$, and a random error $\varepsilon$.

The probabilistic model accounts for the random behaviour of $Y$ exhibited in the figure and provides a more accurate description of reality than the deterministic model.

The properties of the error of prediction of $Y$ can be derived for many probabilistic models. If a deterministic model can be used to predict with negligible error, for all practical purposes, we use it; if not, we seek a probabilistic model. It will not be an exact characterization of nature, but it enables us to assess the accuracy of our predictions.
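The contrast between the two kinds of models can be made concrete with a small simulation; the coefficient values and error distribution below are hypothetical choices for illustration:

```python
import random

random.seed(1)

beta0, beta1 = 2.0, 0.5   # hypothetical parameters

def deterministic(x):
    """Deterministic model: predicts E(Y) exactly, with no error term."""
    return beta0 + beta1 * x

def probabilistic(x, sigma=1.0):
    """Probabilistic model: deterministic component plus a zero-mean error."""
    return beta0 + beta1 * x + random.gauss(0.0, sigma)

# Repeating the experiment at X = 20: Y fluctuates about E(Y) = 12,
# and the sample mean of the replicates approaches the deterministic value.
samples = [probabilistic(20) for _ in range(1000)]
mean_y = sum(samples) / len(samples)
print(deterministic(20))   # 12.0
print(round(mean_y, 1))    # close to 12
```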

Estimation of Linear Model: Least Squares Method

For the estimation of the parameters of a linear model, we consider fitting a line.

$$E(Y) = \beta_o + \beta_1 X, \qquad \text{where } X \text{ is fixed}.$$

For a set of points ($x_i, y_i$), we consider the real situation

$$Y=\beta_o+\beta_1X+\varepsilon, \qquad \text{with } E(\varepsilon)=0,$$

where $\varepsilon$ possesses a specified probability distribution with zero mean, and $\beta_o$ and $\beta_1$ are unknown parameters.

Minimizing the Vertical Distances of Data Points

Now if $\hat{\beta}_o$ and $\hat{\beta}_1$ are the estimates of $\beta_o$ and $\beta_1$, respectively then $\hat{Y}=\hat{\beta}_o+\hat{\beta}_1X$ is an estimate of $E(Y)$.

Method of Least Squares

Suppose we have a set of $n$ data points $(x_i, y_i)$ and we want to minimize the sum of squares of the vertical distances of the data points from the fitted line $\hat{y}_i = \hat{\beta}_o + \hat{\beta}_1x_i$, $i=1,2,\cdots, n$, where $\hat{y}_i$ is the predicted value of $Y$ when $X=x_i$. The deviation of an observed value of $Y$ from the fitted line (sometimes called the error) is the vertical distance $y_i - \hat{y}_i$, and the sum of squared deviations to be minimized is

\begin{align*}
SSE &= \sum\limits_{i=1}^n (y_i-\hat{y}_i)^2\\
&= \sum\limits_{i=1}^n (y_i - \hat{\beta}_o - \hat{\beta}_1x_i)^2
\end{align*}

The quantity SSE is called the sum of squared errors. If SSE possesses a minimum, it occurs at the values of $\hat{\beta}_o$ and $\hat{\beta}_1$ that satisfy the equations $\frac{\partial SSE}{\partial \hat{\beta}_o}=0$ and $\frac{\partial SSE}{\partial \hat{\beta}_1}=0$.

Taking the partial derivatives of SSE with respect to $\hat{\beta}_o$ and $\hat{\beta}_1$ and setting them equal to zero, gives us

\begin{align*}
\frac{\partial SSE}{\partial \hat{\beta}_o} &= \frac{\partial}{\partial \hat{\beta}_o} \sum\limits_{i=1}^n (y_i - \hat{\beta}_o - \hat{\beta}_1 x_i)^2\\
&= -2 \sum\limits_{i=1}^n (y_i - \hat{\beta}_o - \hat{\beta}_1 x_i) = 0\\
\Rightarrow & \sum\limits_{i=1}^n y_i - n\hat{\beta}_o - \hat{\beta}_1 \sum\limits_{i=1}^n x_i = 0\\
\Rightarrow \overline{y} &= \hat{\beta}_o + \hat{\beta}_1\overline{x} \tag*{eq (1)}
\end{align*}

and

\begin{align*}
\frac{\partial SSE}{\partial \hat{\beta}_1} &= -2 \sum\limits_{i=1}^n (y_i - \hat{\beta}_o - \hat{\beta}_1 x_i)x_i = 0\\
\Rightarrow \sum\limits_{i=1}^n x_iy_i &= \hat{\beta}_o \sum\limits_{i=1}^n x_i + \hat{\beta}_1 \sum\limits_{i=1}^n x_i^2\tag*{eq (2)}
\end{align*}

The equations $\frac{\partial SSE}{\partial \hat{\beta}_o}=0$ and $\frac{\partial SSE}{\partial \hat{\beta}_1}=0$ are called the least squares equations for estimating the parameters of a straight line. On solving the least squares equations, we have from equation (1)

$$\hat{\beta}_o = \overline{Y} - \hat{\beta}_1 \overline{X}$$

Putting $\hat{\beta}_o$ in equation (2)

\begin{align*}
\sum\limits_{i=1}^n x_i y_i &= (\overline{Y} - \hat{\beta}_1\overline{X}) \sum\limits_{i=1}^n x_i + \hat{\beta}_1 \sum\limits_{i=1}^n x_i^2\\
&= n\overline{X}\,\overline{Y} - n \hat{\beta}_1 \overline{X}^2 + \hat{\beta}_1 \sum\limits_{i=1}^n x_i^2\\
&= n\overline{X}\,\overline{Y} + \hat{\beta}_1 \left(\sum\limits_{i=1}^n x_i^2 - n\overline{X}^2\right)\\
\Rightarrow \hat{\beta}_1 &= \frac{\sum\limits_{i=1}^n x_iy_i - n\overline{X}\,\overline{Y} }{\sum\limits_{i=1}^n x_i^2 - n\overline{X}^2} = \frac{\sum\limits_{i=1}^n (x_i-\overline{X})(y_i-\overline{Y})}{\sum\limits_{i=1}^n(x_i-\overline{X})^2}
\end{align*}
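The closed-form estimators derived above can be checked numerically against the least squares (normal) equations; a minimal sketch with hypothetical data:

```python
# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# beta1-hat = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
# beta0-hat = ybar - beta1-hat * xbar
b0 = ybar - b1 * xbar

# Both normal equations must hold at the minimum of SSE:
# eq (1): ybar = b0 + b1 * xbar
assert abs(ybar - (b0 + b1 * xbar)) < 1e-9
# eq (2): sum(x*y) = b0 * sum(x) + b1 * sum(x^2)
lhs = sum(x * y for x, y in zip(xs, ys))
rhs = b0 * sum(xs) + b1 * sum(x * x for x in xs)
assert abs(lhs - rhs) < 1e-9
```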

Applications of Least Squares Method

The method of least squares is a powerful statistical technique. It provides a systematic way to find the best-fitting curve or line for a set of data points. It enables us to model relationships between variables, make predictions, and gain insights from data. The method of least squares is widely used in various fields, such as:

  • Regression Analysis: To model the relationship between variables and make predictions.
  • Curve Fitting: To find the best-fitting curve for a set of data points.
  • Data Analysis: To analyze trends and patterns in data.
  • Machine Learning: As a foundation for many machine learning algorithms.

Frequently Asked Questions about Least Squares Method

  • What is the method of Least Squares?
  • Write down the applications of the Least Squares method.
  • How is the vertical distance of the data points from the regression line minimized?
  • What is the principle of the Method of Least Squares?
  • What is meant by probabilistic and deterministic models?
  • Give an example of deterministic and probabilistic models.
  • What is the mathematical model?
  • What is the statistical model?
  • What is curve fitting?
  • State and prove the Least Squares Method.
