Logistic regression was introduced in the 1930s by Ronald Fisher and Frank Yates and was first proposed in the 1970s as an alternative technique to overcome the limitations of ordinary least squares regression in handling dichotomous outcomes. It is a type of probabilistic statistical classification model, which is a non-linear regression model, and can be converted into a linear model by using a simple transformation. It is used to predict a binary response categorical dependent variable, based on one or more predictor variables. That is, it is used in estimating empirical values of the parameters in a model. Here response variable assumes a value of zero or one, i.e., a dichotomous variable.
Logistic Regression Model
The logistic regression model is written as
\[\pi=\frac{1}{1+e^{-[\alpha +\sum_{i=1}^k \beta_i X_{ij}]}}\]
where $\alpha$ and $\beta_i$ are the intercept and slope respectively.
So, in simple words, logistic regression is used to find the probability of the occurrence of the outcome of interest. For example, if we want to find the significance of the different predictors (gender, sleeping hours, taking part in extracurricular activities, etc.), on a binary response (pass or fail in exams coded as 0 and 1), for this kind of problem, we used logistic regression.
By using a transformation, this nonlinear regression model can be easily converted into a linear model. As $\pi$ is the probability of the events in which we are interested, if we take the ratio of the probability of success and failure, then the model becomes a linear model.
\[ln(y)=ln(\frac{\pi}{1-\pi})\]
The natural log of odds can convert the logistic regression model into a linear form.
Real-Life Examples of Logistic Regression
Some real-life examples are:
- Medical Diagnostics: It is used to predict whether a patient has a disease (for example, diabetes, cancer) based on symptoms, lab tests, and medical history.
- Spam Email Detection: Emails can be classified as spam or not spam using word frequency, sender details, etc.
- Marketing (Customer Churn Prediction): It is used to predict if a customer will stop using a service (for example, cancel a subscription) based on usage patterns and demographics.
- Credit Scoring (Loan Approval): Banks use logistic regression to decide whether to approve a loan based on income, credit score, employment status, etc.
- Ad Click Prediction: It can be used to predict whether a user will click on an online ad based on browsing history and demographics.
- Employee Attrition: Can be used to predict if an employee will leave a company based on job satisfaction, salary, and tenure.
- College Admissions: It is used to predict whether a student will be admitted to a university based on GPA, test scores, and extracurricular activities.
- Fraud Detection in Banks: It can be used to detect fraudulent credit card transactions based on transaction amount, location, and spending habits.
- Political Election Forecasting: Predicting if a candidate will win an election based on polling data, demographics, and campaign spending.
- Sports Analytics: Win prediction of a team based on their past performance, player stats, and opponent strength can be made using logistic regression.
Binary Logistic Regression in Minitab
References: