Multiple Regression Model Introduction (2015)

Introduction to Multiple Regression Model

A multiple regression model (a regression with several variables) is a regression model with more than one predictor (independent or explanatory variable) used to explain a single response (dependent) variable. A simple regression model uses one predictor to explain a response, whereas a multiple (multivariable) regression model uses more than one. Both simple and multiple (multivariable) regression models can be further categorized as linear or non-linear regression models.

Note that linearity does not depend on the predictors or on how many predictors are added to a simple regression model; it refers to the parameters (the coefficients attached to the predictors). If the model is linear in its parameters, so that each coefficient represents a constant rate of change in the response, the model is called a linear model, whether it is a simple or a multiple (multivariable) regression model. The relationship between the response and the predictors is assumed to be linear, though in multiple linear regression this assumption cannot be confirmed with certainty.

However, as a rule, it is better to look at bivariate scatter diagrams of the variables of interest and check that the relationships show no curvature. A scatter matrix plot is a more useful way to visualize the relationships among all the variables of interest at once.

The multiple regression model also allows us to determine the overall fit of the model (the proportion of variance explained) and the relative contribution of each predictor to the total variance explained. For example, one may want to know how much of the variation in exam performance can be explained by revision time, test anxiety, lecture attendance, and gender “as a whole”, and also the “relative contribution” of each independent variable in explaining that variance.
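The overall fit mentioned above is usually reported as $R^2$, the proportion of variance explained. The following is a minimal sketch of that calculation; the function name `r_squared` and the exam-score numbers are hypothetical and only illustrate the arithmetic.

```python
def r_squared(observed, fitted):
    """Proportion of the variance in `observed` that is explained by `fitted`."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)             # total variation
    ss_resid = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # unexplained variation
    return 1.0 - ss_resid / ss_total


# Hypothetical exam scores and the values fitted by some regression model
exam_scores = [52.0, 61.0, 70.0, 78.0, 90.0]
fitted_scores = [50.0, 63.0, 71.0, 77.0, 90.0]

r2 = r_squared(exam_scores, fitted_scores)
print(f"Proportion of variance explained: {r2:.3f}")
```

A value close to 1 means the predictors, taken as a whole, account for most of the variation in the response.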

General Form of Multiple Regression Model

A multiple regression model has the form

\[y=\alpha+\beta_1 x_1+\beta_2 x_2+\cdots+\beta_k x_k+\varepsilon\]

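Here $y$ is a continuous response variable, $\alpha$ is the intercept, $\beta_1, \ldots, \beta_k$ are the regression coefficients, $\varepsilon$ is the error term, and the $x$’s are the predictors, which may be continuous, categorical, or discrete. The above model is referred to as a linear multiple (multivariable) regression model.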
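To make the general form concrete, here is a minimal sketch of estimating $\alpha$ and the $\beta$'s by ordinary least squares using only the Python standard library. The function name `fit_ols` and the data are made up for illustration, and the sketch assumes the predictors are not perfectly collinear; in practice a statistics package would be used instead.

```python
def fit_ols(rows, y):
    """Least-squares estimates [alpha, beta_1, ..., beta_k] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept alpha
    p = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))]
         for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    # (assumes X'X is non-singular, i.e. no perfect collinearity)
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Made-up, noise-free data generated from y = 1 + 2*x1 + 3*x2
predictors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
response = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in predictors]

coefficients = fit_ols(predictors, response)
print([round(c, 6) for c in coefficients])
```

Because the example data contain no error term, the estimates recover the generating coefficients up to floating-point rounding; with real data they would only approximate the underlying relationship.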

Example of Multiple Regression Model

For example, college GPA can be predicted using high school GPA, test scores, time given to study, and the rating of the high school as predictors. Other examples include:

  • How rainfall, temperature, and amount of fertilizer affect crop growth
  • Influence of various factors (such as cholesterol, blood pressure, or diabetes) on health outcomes
  • Blood pressure depends on variables such as gender, age, height, weight, exercise, diet, and medication.
  • The weight of a person is linearly related to their height and age.
  • Studying the effect of education, gender, and profession on income.
  • The price of a house depends on the size of the house, number of rooms, community, facilities available, etc.

Assumptions of the Multiple Regression Model

Multiple regression models rest on several assumptions that need to be fulfilled. For example, the residuals should be normally distributed. There should be no collinearity/multicollinearity among the regressors (independent variables). The error terms should have constant variance (homoscedasticity) and should not be correlated with one another (no autocorrelation).
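Two of these assumptions can be screened with simple statistics. The sketch below uses the Pearson correlation between two predictors as a rough collinearity check and the Durbin-Watson statistic as a check for first-order autocorrelation in the residuals. Neither diagnostic is named in the text above; they are common choices used here as an illustration, and the data are made up.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equally long lists of numbers."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)


# Made-up predictors for a house-price model: size and number of rooms
house_size = [50.0, 60.0, 70.0, 80.0, 90.0]
rooms = [2.0, 3.0, 3.0, 4.0, 4.0]
collinearity = pearson(house_size, rooms)
print(f"Correlation between predictors: {collinearity:.3f}")

# Made-up residuals that alternate in sign (negative autocorrelation)
dw = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(f"Durbin-Watson statistic: {dw:.2f}")
```

A correlation between predictors that is close to 1 or -1 hints at collinearity, and a Durbin-Watson value far from 2 hints at autocorrelated errors. Both are screening tools, not formal proof that an assumption holds or fails.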

Common Applications of Multiple Regression Models

  • Marketing: Predicting customer spending based on factors like income, gender, age, and advertising exposure.
  • Social Science: Analyzing the factors that influence voting behavior, such as gender, education level, income, and political party affiliation.
  • Finance: Estimating stock prices based on company earnings, economic indicators, and market trends.
  • Predicting house prices: One can use factors like square area, number of bedrooms, and location to predict the selling price of a house.
  • Identifying risk factors for diseases: Researchers can use multiple regression to see how lifestyle choices, genetics, and environmental factors contribute to the risk of developing a particular disease.


Logistic regression Introduction (2015)

Logistic regression traces back to work by Ronald Fisher and Frank Yates in the 1930s and was proposed in the 1970s as an alternative technique to overcome the limitations of ordinary least squares regression in handling dichotomous outcomes. It is a probabilistic statistical classification model: a non-linear regression model that can be converted into a linear model by a simple transformation. It is used to predict a binary (categorical) dependent variable from one or more predictor variables, with the model parameters estimated from the observed data. Here the response variable takes the value zero or one, i.e., it is a dichotomous variable.

Logistic Regression Model

A logistic regression model is written as

  \[\pi=\frac{1}{1+e^{-[\alpha +\sum_{i=1}^k \beta_i X_{ij}]}}\]

where $\pi$ is the probability of the outcome of interest, $\alpha$ is the intercept, and the $\beta_i$ are the slope coefficients of the $k$ predictors.
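As a small illustration of the formula above, the following sketch evaluates $\pi$ for one observation. The function name `logistic_probability`, the coefficient values, and the predictor values are all hypothetical.

```python
import math


def logistic_probability(alpha, betas, x):
    """Probability pi from the logistic model for one observation x."""
    linear_part = alpha + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-linear_part))


# Hypothetical coefficients: intercept, sleeping hours, extracurricular activity (0/1)
alpha = -4.0
betas = [0.5, 1.0]

# A student who sleeps 6 hours and takes part in extracurricular activities
student = [6.0, 1.0]
p_pass = logistic_probability(alpha, betas, student)
print(p_pass)
```

Here the linear part is $-4 + 0.5 \times 6 + 1 \times 1 = 0$, so the model gives a probability of one half; larger values of the linear part push the probability toward 1 and smaller values push it toward 0.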

In simple words, logistic regression is used to find the probability of occurrence of the outcome of interest. For example, to assess the significance of different predictors (gender, sleeping hours, participation in extracurricular activities, etc.) on a binary response (pass or fail in exams, coded as 0 and 1), we use logistic regression.

By using a transformation, this non-linear regression model can easily be converted into a linear model. Since $\pi$ is the probability of the event of interest, the ratio of the probability of success to the probability of failure, $y=\pi/(1-\pi)$, gives the odds, and taking the natural log of the odds makes the model linear in the parameters.

\[\ln(y)=\ln\left(\frac{\pi}{1-\pi}\right)=\alpha+\sum_{i=1}^k \beta_i X_{ij}\]

The natural log of the odds (the logit) thus converts the logistic regression model into a linear form.
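The transformation can be checked numerically: applying the log of the odds to a probability produced by the logistic function returns the original linear part. This is a minimal sketch; the helper names `logit` and `inverse_logit` are chosen here for illustration.

```python
import math


def inverse_logit(linear_part):
    """Logistic function: maps a value of the linear part to a probability."""
    return 1.0 / (1.0 + math.exp(-linear_part))


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Round trip: linear part -> probability -> log odds
checks = []
for linear_part in [-2.0, -0.5, 0.0, 1.5, 3.0]:
    p = inverse_logit(linear_part)
    recovered = logit(p)
    checks.append((linear_part, recovered))
    print(f"linear part {linear_part:5.2f} -> pi {p:.4f} -> log odds {recovered:5.2f}")
```

The printed log odds match the linear part (up to floating-point rounding), which is why the coefficients of a logistic regression are interpreted on the log-odds scale.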

Introduction Odds Ratio (2015)

Introduction to Odds Ratio

An odds ratio is a relative measure of effect, allowing the comparison of the intervention group of a study relative to the comparison or placebo group.

Medical students, students of clinical and psychological sciences, professionals allied to medicine who read the medical literature, and researchers from many other fields usually encounter the odds ratio (OR) throughout their careers.

When computing the OR, the odds in one arm are divided by the odds in the other:

  • The numerator is the odds in the intervention arm
  • The denominator is the odds in the control or placebo arm

Calculating Odds Ratios

The ratio of the probability of success to the probability of failure is known as the odds. If the probability of an event is $p_1$, then the odds are:
\[\text{odds}=\frac{p_1}{1-p_1}\]

If the outcome is the same in both groups, the ratio will be 1, implying that there is no difference between the two arms of the study. However, when the event is an undesirable outcome, $OR>1$ means the control group is better than the intervention group, while $OR<1$ means the intervention group is better than the control group.

The odds ratio is the ratio of two odds and can be used to quantify how strongly a factor is associated with the response in a given model. If the probabilities of occurrence of an event are $p_1$ (for the first group) and $p_2$ (for the second group), then the OR is:
\[OR=\frac{\frac{p_1}{1-p_1}}{\frac{p_2}{1-p_2}}\]
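The two formulas above translate directly into code. The following is a minimal sketch with hypothetical event probabilities for an intervention arm and a control arm; the function names `odds` and `odds_ratio` are chosen for illustration.

```python
def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)


def odds_ratio(p1, p2):
    """Odds in the first group divided by odds in the second group."""
    return odds(p1) / odds(p2)


# Hypothetical probabilities of the event in each arm of a study
p_intervention = 0.2
p_control = 0.5

or_value = odds_ratio(p_intervention, p_control)
print(f"odds (intervention): {odds(p_intervention):.3f}")
print(f"odds (control):      {odds(p_control):.3f}")
print(f"odds ratio:          {or_value:.3f}")
```

With these numbers the odds are one quarter in the intervention arm and one in the control arm, so the OR is below 1: the event is less likely under the intervention.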

If the predictors are binary, then the OR for the $i$th factor is defined as
\[OR_i=e^{\beta_i}\]

The regression coefficient $b_1$ from logistic regression is the estimated increase in the log odds of the dependent variable per unit increase in the value of the independent variable. In other words, the exponential of the regression coefficient, $e^{b_1}$, is the OR associated with a one-unit increase in the independent variable.
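This interpretation can be verified numerically: under a logistic model with one predictor, the ratio of the odds at $x+1$ to the odds at $x$ equals $e^{b_1}$ whatever the value of $x$. The coefficients below are hypothetical and serve only as a sketch.

```python
import math


def odds_at(intercept, b1, x):
    """Odds of the event at predictor value x under a one-predictor logistic model."""
    p = 1.0 / (1.0 + math.exp(-(intercept + b1 * x)))
    return p / (1.0 - p)


# Hypothetical fitted coefficients
intercept, b1 = -1.0, 0.7

# Odds ratio for a one-unit increase, evaluated at two different starting points
ratio_low = odds_at(intercept, b1, 1.0) / odds_at(intercept, b1, 0.0)
ratio_high = odds_at(intercept, b1, 3.0) / odds_at(intercept, b1, 2.0)

print(f"OR from x=0 to x=1: {ratio_low:.4f}")
print(f"OR from x=2 to x=3: {ratio_high:.4f}")
print(f"exp(b1):            {math.exp(b1):.4f}")
```

All three printed values agree up to rounding, which shows that the OR for a one-unit increase does not depend on the starting value of the predictor.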
