Introduction Odds Ratio (2015)


An odds ratio is a relative measure of effect, allowing the comparison of the intervention group of a study relative to the comparison or placebo group.

Medical students, students of clinical and psychological sciences, allied health professionals working to understand the medical literature, and researchers from many other fields routinely encounter the odds ratio (OR) during their careers.

The OR is computed as a ratio of two odds:

  • The numerator is the odds of the event in the intervention arm.
  • The denominator is the odds of the event in the control or placebo arm.

Calculating Odds Ratios

The odds of an event are the ratio of the probability of success to the probability of failure. If the probability of an event is $p_1$, then the odds are:
\[\text{odds}=\frac{p_1}{1-p_1}\]

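If the odds of the outcome are the same in both groups, the ratio will be 1, implying that there is no difference between the two arms of the study. When the outcome is an adverse event, an $OR>1$ means the odds of the event are higher in the intervention group, so the control group is better than the intervention group, while an $OR<1$ means the intervention group is better than the control group. When the outcome is a desirable one, this interpretation is reversed.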

The odds ratio is the ratio of two odds and can be used to quantify how strongly a factor is associated with the response in a given model. If the probabilities of occurrence of an event are $p_1$ (for the first group) and $p_2$ (for the second group), then the OR is:
\[OR=\frac{\frac{p_1}{1-p_1}}{\frac{p_2}{1-p_2}}\]
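
As a minimal sketch of the two formulas above, the following Python snippet computes the odds and the odds ratio from event probabilities. The function names `odds` and `odds_ratio` and the probabilities 0.2 and 0.4 are illustrative assumptions, not values from any study.

```python
def odds(p):
    """Odds of an event with probability p (0 <= p < 1)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_ratio(p1, p2):
    """Odds ratio of group 1 (probability p1) relative to group 2 (p2).

    p2 must be greater than 0, otherwise the denominator odds are zero.
    """
    return odds(p1) / odds(p2)


# Hypothetical event probabilities: 0.2 in the intervention arm, 0.4 in control.
or_value = odds_ratio(0.2, 0.4)
print(round(or_value, 3))  # 0.375
```

Here the odds are 0.25 in the intervention arm and about 0.667 in the control arm, so the OR is below 1: the event is less likely, in odds terms, under the intervention.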

In a logistic regression model with binary predictors, the OR for the $i$th factor is defined as
\[OR_i=e^{\beta_i}\]
where $\beta_i$ is the regression coefficient of the $i$th factor.


The regression coefficient $b_1$ from logistic regression is the estimated increase in the log odds of the dependent variable per unit increase in the value of the independent variable. In other words, the exponential of the regression coefficient, $e^{b_1}$, is the OR associated with a one-unit increase in the independent variable.
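
The link between the logistic regression coefficient and the OR can be sketched for the simplest case of a single binary predictor. In that case the fitted model reproduces the observed group proportions, so the coefficients have a closed form and no fitting routine is needed. The counts below are hypothetical and chosen only for illustration.

```python
import math


def logit(p):
    """Log odds of an event with probability p (0 < p < 1)."""
    return math.log(p / (1 - p))


# Hypothetical 2x2 counts (events, non-events) for each arm.
events_treated, nonevents_treated = 20, 80
events_control, nonevents_control = 40, 60

p_treated = events_treated / (events_treated + nonevents_treated)
p_control = events_control / (events_control + nonevents_control)

# With one binary predictor x (1 = treated, 0 = control), the fitted
# logistic model matches the observed proportions in each group.
b0 = logit(p_control)                     # log odds in the control arm
b1 = logit(p_treated) - logit(p_control)  # change in log odds when x = 1

or_from_model = math.exp(b1)
or_from_table = (events_treated * nonevents_control) / (
    nonevents_treated * events_control
)
print(round(or_from_model, 4), round(or_from_table, 4))
```

Both routes give the same value, which is the sense in which $e^{b_1}$ is the OR for a one-unit increase in the predictor.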


Application of Regression in Medical: A Quick Guide (2024)

Regression is a powerful statistical tool that is widely used in medical research to understand the relationship between variables. It helps identify risk factors, predict outcomes, and optimize treatment strategies.

As an example of regression analysis in medical sciences, Chan et al. (2006) used multiple linear regression to estimate standard liver weight for assessing the adequacy of graft size in live donor liver transplantation and of the remnant liver in major hepatectomy for cancer. Standard liver weight (SLW) in grams, body weight (BW) in kilograms, gender (male=1, female=0), and other anthropometric data of 159 Chinese liver donors who underwent donor right hepatectomy were analyzed. The formula (fitted model)

 \[SLW = 218 + 12.3 \times BW + 51 \times gender\]

 was developed with a coefficient of determination $R^2=0.48$.


These results mean that, among Chinese people, each 1-kg increase in BW is associated on average with an increase in SLW of about 12.3 g for a given gender, and men have, on average, a 51-g higher SLW than women of the same BW. Unfortunately, SEs and CIs for the estimated regression coefficients were not reported. Using Formula 6 in their article, the SLW for Chinese liver donors can be estimated if BW and gender are known. About 48% of the variance of SLW is explained by BW and gender.
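
The fitted model quoted above can be applied directly. The sketch below wraps it in a small Python function; the function name `estimate_slw` and the two example donors are hypothetical and are not data from the study.

```python
def estimate_slw(body_weight_kg, is_male):
    """Estimated standard liver weight in grams from the fitted model
    SLW = 218 + 12.3 * BW + 51 * gender (male = 1, female = 0)."""
    gender = 1 if is_male else 0
    return 218 + 12.3 * body_weight_kg + 51 * gender


# Hypothetical donors, not data from the study.
print(round(estimate_slw(70, is_male=True), 1))   # 1130.0
print(round(estimate_slw(60, is_male=False), 1))  # 956.0
```

Because no standard errors were reported and $R^2=0.48$, such point estimates carry considerable uncertainty for any individual donor.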

The regression analysis helps in:

  • Identifying risk factors: Determine which factors contribute to the development of a disease (For example, gender, age, smoking, and blood pressure for heart disease).
  • Predicting disease occurrence: Estimate the likelihood of a patient developing a disease based on specific risk factors. For example, logistic regression is used to predict the risk of diabetes based on factors like BMI, age, and family history.

The following types of regression models are widely used in medical sciences:

  • Linear regression: Used when the outcome variable is continuous (e.g., blood pressure, cholesterol levels).
  • Logistic regression: Used when the outcome variable is binary (e.g., disease present/absent, survival/death).
  • Cox proportional hazards regression: Used for survival analysis (time-to-event data).


Reference of Article

  • Chan SC, Liu CL, Lo CM, et al. (2006). Estimating liver weight of adults by body weight and gender. World J Gastroenterol 12, 2217–2222.


Regression Model Assumptions

Linear Regression Model Assumptions

The linear regression model (LRM) is based on certain statistical assumptions. Some of them concern the distribution of the random error term $u_i$, some concern the relationship between the error term $u_i$ and the explanatory (independent) variables, the $X$'s, and some concern the independent variables themselves. The linear regression model assumptions can be classified into two categories:

  1. Stochastic Assumptions
  2. Non-Stochastic Assumptions

These linear regression model assumptions (or assumptions of the ordinary least squares method, OLS) are critical for interpreting the regression coefficients.

Regression Model Assumptions
  • The error term ($u_i$) is a real-valued random variable, i.e. $u_i$ may assume any positive, negative, or zero value by chance. Each value has a certain probability; therefore, the error term is a random variable.
  • The mean value of $u_i$ is zero, i.e. $E(u_i)=0$. More precisely, the mean value of $u_i$ conditional on the given $X_i$ is zero. It means that for each value of $X_i$, $u$ may take various values, some of them greater than zero and some smaller than zero. Considering all possible values of $u$ for any particular value of $X$, the disturbance term $u_i$ has a mean of zero.
  • The variance of $u_i$ is constant, i.e. for the given value of $X$, the variance of $u_i$ is the same for all observations: $E(u_i^2)=\sigma^2$. The disturbance term ($u_i$) shows the same dispersion about its mean at all values of $X$ (homoscedasticity).
  • The variable $u_i$ has a normal distribution, i.e. $u_i\sim N(0,\sigma_{u}^2)$. The value of $u$ (for each $X_i$) has a bell-shaped symmetrical distribution.
  • The random terms of different observations ($u_i,u_j$) are independent, i.e. $E(u_i u_j)=0$ for $i\ne j$, so there is no autocorrelation between the disturbances. It means that the random term in one period does not depend on its values in any other period.
  • $u_i$ and $X_i$ have zero covariance between them, i.e. $E(u_i X_i)=0$ and therefore $Cov(u_i, X_i)=0$. The disturbance term $u$ and the explanatory variable $X$ are uncorrelated: the $u$'s and $X$'s do not tend to vary together. This assumption is automatically fulfilled if the $X$ variable is nonrandom (non-stochastic) and the mean of the random term is zero.
  • All the explanatory variables are measured without error. It means that the regressors are assumed to be error-free, while $y$ (the dependent variable) may or may not include measurement errors.
  • The number of observations $n$ must be greater than the number of parameters to be estimated, which in turn means it must be greater than the number of explanatory (independent) variables.
  • There should be variability in the $X$ values. That is, the $X$ values in a given sample must not all be the same. Statistically, $Var(X)$ must be a finite positive number.
  • The regression model must be correctly specified, meaning there is no specification bias or error in the model used in empirical analysis.
  • No perfect or near-perfect multicollinearity (collinearity) exists among two or more explanatory (independent) variables.
  • Values taken by the regressors $X$ are considered to be fixed in repeated sampling, i.e. $X$ is assumed to be non-stochastic. Regression analysis is conditional on the given values of the regressor(s) $X$.
  • The linear regression model is linear in the parameters, e.g. $y_i=\beta_1+\beta_2x_i +u_i$.
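
Some of the assumptions listed above have sample counterparts that can be inspected after fitting. The sketch below, written in plain Python with hypothetical data, fits a line by least squares and shows that the residuals sum to zero and are uncorrelated with $x$. Note that both properties hold by construction for OLS residuals whenever the model has an intercept, so they mirror $E(u_i)=0$ and $Cov(u_i, X_i)=0$ rather than test them. The helper `fit_ols` is an illustrative name, and it also refuses data in which $Var(X)=0$.

```python
def fit_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        # Violates the requirement that the X values must vary.
        raise ValueError("x values must not all be equal (Var(X) > 0)")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

b0, b1 = fit_ols(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Sample counterparts of E(u_i) = 0 and Cov(u_i, X_i) = 0.
print(abs(sum(residuals)) < 1e-9)
print(abs(sum(r * xi for r, xi in zip(residuals, x))) < 1e-9)
```

Assumptions such as constant variance, normality, and no autocorrelation are not guaranteed by the fitting procedure and need separate diagnostic checks, for example residual plots.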

Simple Linear Regression Model (SLRM)

A simple linear regression model (SLRM) is based on a single independent (explanatory) variable. It fits a straight line such that the sum of squared residuals of the regression model (the squared vertical distances between the fitted line and the points of the data set) is as small as possible. The simple linear regression model (usually known as a statistical or probabilistic model) is

\begin{align*}
y_i &= \alpha + \beta x_i +\varepsilon_i\\
\text{OR} \quad y_i&=b_0 + b_1 x_i + \varepsilon_i\\
\text{OR} \quad y_i&=\beta_0 + \beta_1 x_i + \varepsilon_i
\end{align*}
where $y$ is the dependent variable and $x$ is the independent variable. In the regression context, $y$ is the regressand and $x$ is the regressor. The epsilon ($\varepsilon$) is unobservable and denotes the random error or disturbance term of the regression model. The random error $\varepsilon$ is included in the regression model for the following reasons:

Importance of Error Term in Simple Linear Regression Model

  1. Random error ($\varepsilon$) captures the effect on the dependent variable of all variables that are not included in the model under study. These omitted variables may or may not be observable.
  2. Random error ($\varepsilon$) captures any specification error related to the assumed linear-functional form.
  3. Random error ($\varepsilon$) captures the effect of unpredictable random components present in the dependent variable.

We can say that $\varepsilon$ is the variation in the variable $y$ that is not explained by the independent variable $x$ included in the model.

In the above equation or model, $\beta_0$ and $\beta_1$ are the parameters of the model, and our main objective is to obtain estimates of their numerical values, i.e. $\hat{\beta}_0$ and $\hat{\beta}_1$. Here $\beta_0$ is the intercept (regression constant) and $\beta_1$ is the slope or regression coefficient of the model. The fitted line passes through ($\overline{x}, \overline{y}$), i.e. the center of mass of the data points, and the estimated slope is the correlation between the variables $x$ and $y$ multiplied by the ratio of their standard deviations, $\hat{\beta}_1 = r\, s_y/s_x$.
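
The two properties of the least-squares estimates mentioned above can be checked numerically. The following sketch uses plain Python and hypothetical data. It computes the slope and intercept from the usual sums of squares, then confirms that the slope equals the correlation times the ratio of the sample standard deviations and that the fitted line passes through ($\overline{x}, \overline{y}$).

```python
import math

# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares estimates of the slope and the intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Slope written as correlation times the ratio of standard deviations.
r = sxy / math.sqrt(sxx * syy)
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))
b1_from_r = r * s_y / s_x

print(math.isclose(b1, b1_from_r))           # True
print(math.isclose(b0 + b1 * x_bar, y_bar))  # True
```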

The subscript $i$ denotes the $i$th observation of the variable in the model. If the error term is dropped, the model becomes
\[y=\beta_0 + \beta_1 x\]
This is a mathematical (deterministic) model, as all the variation in $y$ is due solely to changes in $x$. There are no other factors affecting the dependent variable. If this is true, then all the pairs $(x, y)$ will fall on a straight line when plotted on a two-dimensional plane. For observed values, however, the points may or may not fall on a straight line. A two-dimensional diagram with the points plotted as pairs is called a scatter diagram.

Figure: Simple linear regression model, scatter diagram with the fitted regression line.

See Assumptions about Simple Linear Regression Model

FAQs about Simple Linear Regression Models

  1. What is a simple linear regression Model?
  2. What is a Probabilistic/ Statistical model?
  3. What is the equation of a simple linear regression model?
  4. Write about the importance of error terms in the regression model.
  5. What are the parameters in a simple linear regression model?
  6. What is the objective of estimating the parameters of a simple linear regression model?