Binary Logistic Regression Minitab Tutorial

Binary Logistic Regression is used to perform logistic regression on a binary response (dependent) variable, that is, a variable that has only two possible values, such as the presence or absence of a particular disease. Such a variable is also known as a dichotomous variable.

Binary Logistic Regression

Binary Logistic Regression can classify observations into one of two categories. In some cases, these classifications produce fewer classification errors than discriminant analysis.

The default model contains the variables that you enter in Continuous Predictors and Categorical Predictors. You can also add interaction and/or polynomial terms by using the tools available in the model sub-dialog box.

Minitab stores the last model that you fit for each response variable. These stored models can be used to quickly generate predictions, contour plots, surface plots, overlaid contour plots, factorial plots, and optimized responses.

Minitab Tutorial for Binary Logistic Regression

To perform a Binary Logistic Regression Analysis in Minitab, follow the steps given below. It is assumed that you have already launched the Minitab software.

Step 1:  Choose Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model.


Step 2:  Do one of the following:

If your data is in raw or frequency form, follow these steps:

  1. Choose Response in binary response/frequency format from the drop-down list at the top of the dialog box.
  2. In the Response text box, enter the column that contains the response variable.
  3. In the Frequency text box, enter the optional column that contains the count or frequency variable.

If you have summarized data, then follow these steps:

  1. Choose Response in event/trial format from the drop-down list at the top of the dialog box.
  2. In Number of events, enter the column that contains the number of times the event occurred in your sample at each combination of the predictor values.
  3. In Number of trials, enter the column that contains the corresponding number of trials.
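The two layouts carry the same information. As a rough illustration outside Minitab, the Python sketch below collapses raw binary rows into event/trial counts for each predictor value. The function name `to_event_trial`, the dose labels, and the data are hypothetical and only serve the example.

```python
from collections import defaultdict

def to_event_trial(rows):
    """Collapse raw (predictor, response) rows into event/trial counts.

    rows: iterable of (predictor_value, response) pairs, response is 0 or 1.
    Returns a dict {predictor_value: (events, trials)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for x, y in rows:
        counts[x][0] += y   # events: rows where the response equals 1
        counts[x][1] += 1   # trials: every row with this predictor value
    return {x: tuple(c) for x, c in counts.items()}

# Hypothetical raw data: (dose level, 1 = event occurred, 0 = it did not)
raw = [("low", 0), ("low", 1), ("low", 0),
       ("high", 1), ("high", 1), ("high", 0)]

summary = to_event_trial(raw)
for level, (events, trials) in summary.items():
    print(level, events, trials)
```

Each key of `summary` corresponds to one row of a summarized worksheet, with the first count going to Number of events and the second to Number of trials.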

Step 4:  In Continuous predictors, enter the columns that contain continuous predictors. In Categorical predictors, enter the columns that contain categorical predictors. You can add interactions and other higher-order terms to the model.

Step 5:  If you like, use one or more of the dialog box options, then click OK.

Minitab Binary Logistic Regression Options

The following are options available in the main dialog box of Minitab Binary Logistic Regression:

  • Response in binary response/frequency format: Choose if the response data has been entered as a column that contains 2 distinct values, i.e., as a dichotomous variable.
  • Response: Enter the column that contains the response values.
  • Response event: Choose which event of interest the results of the analysis will describe.
  • Frequency (optional): If the data are in two columns i.e. one column that contains the response values and the other column that contains their frequencies then enter the column that contains the frequencies.
  • Response in event/trial format: Choose if the response data are in two columns: one column that contains the number of successes or events of interest and one column that contains the number of trials.
  • Event name: Enter a name for the event in the data.
  • Number of events: Enter the column that contains the number of events.
  • Number of trials: Enter the column that contains the number of trials.
  • Continuous predictors: Select the continuous variables that explain changes in the response. The predictor is also called the X variable.
  • Categorical predictors: Select the categorical classifications or group assignments, such as the type of raw material, that explain changes in the response. The predictor is also called the X variable.

Step 6: To store diagnostic measures and characteristics of the estimated equation, click the Storage… button.


Logistic regression Introduction (2015)

Logistic regression was introduced in the 1930s by Ronald Fisher and Frank Yates and was later proposed, in the 1970s, as an alternative technique to overcome the limitations of ordinary least squares regression in handling dichotomous outcomes. It is a probabilistic statistical classification model. Although it is a non-linear regression model, it can be converted into a linear model by a simple transformation. It is used to predict a binary categorical response (dependent) variable from one or more predictor variables, and fitting it means estimating the values of the model parameters from data. Here the response variable takes the value zero or one, i.e., it is a dichotomous variable.

Logistic Regression Model

A logistic regression model is written as

  \[\pi=\frac{1}{1+e^{-[\alpha +\sum_{i=1}^k \beta_i X_{ij}]}}\]

where $\pi$ is the probability of the event of interest for the $j$th observation, $\alpha$ is the intercept, and $\beta_i$ is the slope of the $i$th predictor $X_{ij}$.
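As a minimal sketch of this formula, the Python function below evaluates $\pi$ for one observation. The function name `logistic_probability` and the coefficient values are made up for illustration; they are not estimates from any data set.

```python
import math

def logistic_probability(alpha, betas, xs):
    """Return pi = 1 / (1 + exp(-(alpha + sum(beta_i * x_i))))."""
    # Linear predictor: intercept plus the weighted sum of the predictors
    eta = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical intercept, slopes, and predictor values for one observation
alpha = -1.0
betas = [0.5, 2.0]
xs = [2.0, 0.0]

p = logistic_probability(alpha, betas, xs)
print(p)
```

With these values the linear predictor is zero, so the estimated probability sits exactly halfway between the two outcomes. Larger values of the linear predictor push the probability toward 1 and smaller values push it toward 0.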


In simple words, logistic regression is used to find the probability of occurrence of the outcome of interest. For example, suppose we want to assess the significance of different predictors (gender, sleeping hours, participation in extracurricular activities, etc.) on a binary response (pass or fail in exams, coded as 0 and 1). Logistic regression is used for this kind of problem.

By using a transformation, this nonlinear regression model can easily be converted into a linear model. As $\pi$ is the probability of the event of interest, the ratio of the probability of success to the probability of failure, $\pi/(1-\pi)$, is the odds, and taking its natural log gives a model that is linear in the parameters:

\[\ln\left(\frac{\pi}{1-\pi}\right)=\alpha +\sum_{i=1}^k \beta_i X_{ij}\]

Thus the natural log of the odds (the logit) converts the logistic regression model into a linear form.
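The linearizing effect of the log odds can be checked numerically. The sketch below assumes a single predictor with made-up coefficients `alpha` and `beta`; it computes the probability at a few predictor values and then takes the natural log of the odds.

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical intercept and slope for a single predictor
alpha, beta = -2.0, 0.8

xs = [0.0, 1.0, 2.0, 3.0]
probs = [1.0 / (1.0 + math.exp(-(alpha + beta * x))) for x in xs]
log_odds = [logit(p) for p in probs]

for x, p, lo in zip(xs, probs, log_odds):
    print(x, round(p, 4), round(lo, 4))
```

The probabilities follow an S-shaped curve, while the log odds increase by the same amount (the slope) for every one-unit step in the predictor.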


Discovering Odds Ratio

An odds ratio is a relative measure of effect, allowing the comparison of the intervention group of a study relative to the comparison or placebo group. The odds ratio helps quantify the strength and direction of the relationship between two groups or conditions.

Introduction to Odds Ratio

The odds ratio (OR) is a measure of association used in statistics to compare the odds of an event occurring in one group to the odds of it occurring in another group. It is commonly used in case-control studies and logistic regression.

  • an OR of 1 indicates no difference between groups,
  • an OR greater than 1 suggests higher odds in the first group, and
  • an OR less than 1 suggests lower odds in the first group.

Medical students, students of clinical and psychological sciences, professionals allied to medicine who want to improve their understanding of the medical literature, and researchers from many other fields usually encounter the Odds Ratio (OR) throughout their careers.

When computing the OR:

  • the numerator is the odds in the intervention arm, and
  • the denominator is the odds in the control or placebo arm.

Calculating Odds Ratio

The ratio of the probability of success to the probability of failure is known as the odds. If the probability of an event is $p_1$, then the odds are:
\[\text{odds}=\frac{p_1}{1-p_1}\]
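As a small sketch, the odds of an event can be computed from its probability as follows. The function name `odds` and the probabilities used are illustrative assumptions.

```python
def odds(p):
    """Return the odds p / (1 - p) for a probability 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)

# Hypothetical probabilities of success
odds_three_quarters = odds(0.75)   # success three times as likely as failure
odds_half = odds(0.5)              # success and failure equally likely

print(odds_three_quarters, odds_half)
```

Note that odds are not bounded above by 1: a probability close to 1 gives very large odds.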

If the outcome is the same in both groups, the ratio will be 1, implying that there is no difference between the two arms of the study. However, when the event is an undesirable outcome, $OR>1$ means the control group is better than the intervention group, while $OR<1$ means the intervention group is better than the control group.

The Odds Ratio is the ratio of two odds and can be used to quantify how strongly a factor is associated with the response in a given model. If the probabilities of occurrence of an event are $p_1$ (for the first group) and $p_2$ (for the second group), then the OR is:
\[OR=\frac{\frac{p_1}{1-p_1}}{\frac{p_2}{1-p_2}}\]
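The following sketch computes the OR in two equivalent ways: from the two group probabilities, and from the counts of a 2×2 table. The function names and all numbers are hypothetical, chosen only so that both routes can be compared.

```python
def odds_ratio(p1, p2):
    """OR = odds in the first group divided by odds in the second group."""
    return (p1 / (1.0 - p1)) / (p2 / (1.0 - p2))

def odds_ratio_from_counts(a, b, c, d):
    """OR from a 2x2 table.

    a: events in group 1,  b: non-events in group 1
    c: events in group 2,  d: non-events in group 2
    """
    return (a * d) / (b * c)

# Hypothetical study: 20 of 40 in group 1 and 10 of 50 in group 2 had the event
or_from_probs = odds_ratio(20 / 40, 10 / 50)
or_from_counts = odds_ratio_from_counts(20, 20, 10, 40)

print(or_from_probs, or_from_counts)
```

Both calculations describe the same quantity, so they agree up to floating-point rounding. Here the odds of the event in the first group are about four times the odds in the second group.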

If the predictors are binary, then the OR for the $i$th factor is defined as
\[OR_i=e^{\beta_i}\]


Real-Life Examples of Odds Ratio

  1. Medical Research
    • Suppose we are interested in comparing the odds of developing a disease (e.g., lung cancer) in smokers versus non-smokers. If the OR is 2.5, the odds of developing lung cancer are 2.5 times as high for smokers as for non-smokers.
  2. Public Health
    • Suppose we are interested in assessing the effectiveness of a vaccine, for example, by comparing the odds of contracting a disease (e.g., COVID-19) in vaccinated versus unvaccinated individuals. An OR less than 1 would indicate that the vaccine reduces the odds of infection.
  3. Social Sciences
    • Suppose we are interested in studying the odds of students passing an exam based on attendance. For instance, if students who attend extra tutoring have an OR of 3.0 for passing, their odds of passing are 3 times as high as those of students who do not attend.
  4. Marketing
    • Suppose we need to analyze the odds of customers purchasing a product after seeing an advertisement versus not seeing it. An OR greater than 1 suggests that the ad increases the odds of purchase.
  5. Environmental Studies
    • Suppose we are evaluating the odds of developing asthma in people living in high-pollution areas compared to those in low-pollution areas. An OR greater than 1 would indicate higher odds of asthma in high-pollution areas.

The regression coefficient $b_1$ from logistic regression is the estimated increase in the log odds of the dependent variable per unit increase in the value of the independent variable. In other words, the exponential of the regression coefficient, $e^{b_1}$, is the OR associated with a one-unit increase in the independent variable.
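This relationship can be verified numerically. The sketch below assumes a one-predictor model with made-up coefficients `b0` and `b1` (not estimates from real data) and compares the odds at two predictor values that are one unit apart.

```python
import math

def probability(b0, b1, x):
    """Logistic model probability for a single predictor x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)

# Hypothetical intercept and slope
b0, b1 = -1.5, 0.7

# Odds at x = 3 divided by odds at x = 2 (a one-unit increase)
ratio_of_odds = odds(probability(b0, b1, 3.0)) / odds(probability(b0, b1, 2.0))

print(ratio_of_odds, math.exp(b1))
```

The two printed values agree up to rounding, and the same ratio is obtained for any other pair of predictor values one unit apart.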
