Logistic Regression: Introduction

Logistic regression has its roots in work by Ronald Fisher and Frank Yates in the 1930s, and it was proposed in the 1970s as an alternative to ordinary least squares regression, which is poorly suited to dichotomous outcomes. It is a probabilistic statistical classification model: a non-linear regression model that can be converted into a linear model by a simple transformation. It is used to predict a binary (categorical) dependent variable from one or more predictor variables, and fitting it means estimating the values of the model's parameters from empirical data. The response variable takes only the values zero and one, i.e. it is a dichotomous variable. With $k$ predictors, the logistic regression model is written as

  \[\pi=\frac{1}{1+e^{-[\alpha +\sum_{i=1}^k \beta_i X_{ij}]}}\]

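where $\pi$ is the probability that the response of the $j$-th observation equals one, $X_{ij}$ is the value of the $i$-th predictor for that observation, $\alpha$ is the intercept and $\beta_i$ is the slope (coefficient) of the $i$-th predictor.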
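To make the formula concrete, here is a minimal sketch that evaluates $\pi$ for a single observation. The function name `logistic_probability` and the coefficient values in the usage line are hypothetical, chosen only for illustration. The two branches compute the same quantity but avoid overflow in `math.exp` for large negative linear predictors.

```python
import math
from typing import Sequence


def logistic_probability(alpha: float, betas: Sequence[float], x: Sequence[float]) -> float:
    """Return pi = 1 / (1 + exp(-(alpha + sum_i beta_i * x_i))) for one observation."""
    if len(betas) != len(x):
        raise ValueError("betas and x must have the same length")
    # Linear predictor: alpha + beta_1*x_1 + ... + beta_k*x_k
    eta = alpha + sum(b * xi for b, xi in zip(betas, x))
    # Logistic function maps any real eta into [0, 1]; the branch keeps exp() from overflowing
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical coefficients for two predictors
print(logistic_probability(-1.0, [0.5, 0.25], [2.0, 0.0]))  # eta = 0, so pi = 0.5
```

A linear predictor of zero always gives $\pi = 0.5$; positive values push $\pi$ towards one and negative values towards zero.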

In simple words, logistic regression is used to find the probability that the outcome of interest occurs. For example, if we want to assess the significance of different predictors (gender, hours of sleep, participation in extracurricular activities, etc.) for a binary response (pass or fail in an exam, coded as 0 and 1), we use logistic regression.

This non-linear model can be converted into a linear one by a transformation. Since $\pi$ is the probability of the event of interest (a success), $1-\pi$ is the probability of failure, and the ratio $\pi/(1-\pi)$ of the two is called the odds of success.
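As a small illustration (the function name `odds` is our own), the odds map a probability in $[0, 1)$ onto $[0, \infty)$, with $\pi = 0.5$ corresponding to even odds of 1:

```python
def odds(pi: float) -> float:
    """Odds of success, pi / (1 - pi), for a probability 0 <= pi < 1."""
    if not 0.0 <= pi < 1.0:
        raise ValueError("pi must satisfy 0 <= pi < 1")
    return pi / (1.0 - pi)


# pi = 0.5 gives even odds of 1; as pi approaches 1 the odds grow without bound
for pi in (0.2, 0.5, 0.8, 0.99):
    print(f"pi={pi:.2f}  odds={odds(pi):.3f}")
```

The odds are still a non-linear function of the predictors; it is their logarithm that becomes linear.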


Taking the natural log of the odds, known as the logit, converts the logistic regression model into a linear form.
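A short derivation from the model equation shows why. Here $\eta_j = \alpha + \sum_{i=1}^k \beta_i X_{ij}$ is shorthand introduced only for brevity, and the odds are $\pi/(1-\pi)$:

\[
\begin{aligned}
\pi &= \frac{1}{1+e^{-\eta_j}}, \qquad 1-\pi = \frac{e^{-\eta_j}}{1+e^{-\eta_j}},\\
\frac{\pi}{1-\pi} &= \frac{1}{e^{-\eta_j}} = e^{\eta_j},\\
\ln\!\left(\frac{\pi}{1-\pi}\right) &= \eta_j = \alpha + \sum_{i=1}^k \beta_i X_{ij}.
\end{aligned}
\]

The logit on the left-hand side is therefore a linear function of the predictors, with intercept $\alpha$ and slopes $\beta_i$.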