Application of Regression in Medical: A Quick Guide (2024)

Regression is a powerful statistical tool widely used in medical research to understand the relationship between variables. It helps identify risk factors, predict outcomes, and optimize treatment strategies.

As an example of regression analysis in the medical sciences, Chan et al. (2006) used multiple linear regression to estimate standard liver weight, which is needed to assess the adequacy of graft size in live donor liver transplantation and of the remnant liver in major hepatectomy for cancer. Standard liver weight (SLW) in grams, body weight (BW) in kilograms, gender (male=1, female=0), and other anthropometric data of 159 Chinese liver donors who underwent donor right hepatectomy were analyzed. The formula (fitted model)

 \[SLW = 218 + 12.3 \times BW + 51 \times gender\]

 was developed with a coefficient of determination $R^2=0.48$.

These results mean that, among the Chinese donors studied, each 1-kg increase in BW is associated on average with an increase of about 12.3 g in SLW (for a given gender), and men have on average a 51-g higher SLW than women of the same body weight. Unfortunately, standard errors (SEs) and confidence intervals (CIs) for the estimated regression coefficients were not reported. Using Formula 6 in their article, the SLW of Chinese liver donors can be estimated if BW and gender are known. About 48% of the variance of SLW is explained by BW and gender ($R^2=0.48$).
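The fitted model above can be evaluated directly. The following is a minimal sketch; the function name `standard_liver_weight` and the example body weights are illustrative assumptions, not part of the original study.

```python
def standard_liver_weight(body_weight_kg, is_male):
    """Estimate standard liver weight (g) from the fitted model of Chan et al. (2006).

    SLW = 218 + 12.3 * BW + 51 * gender, with gender coded male=1, female=0.
    """
    gender = 1 if is_male else 0
    return 218 + 12.3 * body_weight_kg + 51 * gender


# Illustrative inputs (hypothetical donors, not data from the study).
slw_male = standard_liver_weight(70, is_male=True)
slw_female = standard_liver_weight(70, is_male=False)

print(f"Male, 70 kg:   {slw_male:.1f} g")    # 1130.0 g
print(f"Female, 70 kg: {slw_female:.1f} g")  # 1079.0 g
print(f"Difference:    {slw_male - slw_female:.1f} g")  # 51.0 g
```

At the same body weight the two estimates differ by exactly the gender coefficient (51 g), and each additional kilogram adds 12.3 g, which matches the interpretation of the coefficients given above.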

The regression analysis helps in:

  • Identifying risk factors: Determine which factors contribute to the development of a disease (for example, gender, age, smoking, and blood pressure for heart disease).
  • Predicting disease occurrence: Estimate the likelihood of a patient developing a disease based on specific risk factors. For example, logistic regression is used to predict the risk of diabetes based on factors like BMI, age, and family history.

The following types of regression models are widely used in medical sciences:

  • Linear regression: Used when the outcome variable is continuous (e.g., blood pressure, cholesterol levels).
  • Logistic regression: Used when the outcome variable is binary (e.g., disease present/absent, survival/death).
  • Cox proportional hazards regression: Used for survival analysis (time-to-event data).
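For the binary-outcome case, a logistic regression model turns a linear combination of risk factors into a probability between 0 and 1. The sketch below shows only that conversion step. The function name `logistic_probability` and every coefficient value are hypothetical, chosen for illustration rather than taken from any fitted model.

```python
import math


def logistic_probability(intercept, coefficients, values):
    """Convert a linear predictor into a probability with the logistic function."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients for illustration only (NOT from any fitted model):
# intercept, then effects of BMI, age in years, and family history (1 = yes).
INTERCEPT = -8.0
COEFFICIENTS = [0.12, 0.04, 0.9]

p_low = logistic_probability(INTERCEPT, COEFFICIENTS, [22, 30, 0])
p_high = logistic_probability(INTERCEPT, COEFFICIENTS, [34, 60, 1])

print(f"Lower-risk profile:  {p_low:.3f}")
print(f"Higher-risk profile: {p_high:.3f}")
```

In practice the coefficients would be estimated from patient data with a statistical package; only the final prediction step is sketched here.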

Reference of Article

  • Chan SC, Liu CL, Lo CM, et al. (2006). Estimating liver weight of adults by body weight and gender. World J Gastroenterol 12, 2217–2222.

Using Mathematica Built-in Functions (2014)

Introduction to Mathematica Built-in Functions

There are thousands of Mathematica built-in functions. Knowing a few dozen of the more important ones will let you do many useful calculations. Memorizing the names of most of the functions is not too hard, as almost all of the built-in functions in Mathematica follow a naming convention (i.e., the name of a function reflects what it does). For example, the Abs function is for the absolute value, the Cos function is for the cosine, and Sqrt is for the square root of a number.

More important than memorizing the function names is remembering the syntax needed to use the built-in functions. Remembering many of the built-in Mathematica functions will not only make it easier to follow programs but also enhance your programming skills.

Important and Widely Used Mathematica Built-in Functions

The following is a short list of commonly used Mathematica built-in functions.

  • Sqrt[ ]:   used to find the square root of a number
  • N[ ]:   used for numerical evaluation of any mathematical expression e.g. N[Sqrt[27]]
  • Log[  ]: used to find the natural logarithm (base e) of a number; use Log10[ ] or Log[10, x] for the base-10 logarithm
  • Sin[  ]: used to find the sine of an angle given in radians
  • Abs[  ]: used to find the absolute value of a number
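As a quick check of what some of these functions return, here are worked values for a few inputs (the inputs other than 27 are chosen only for illustration). Sqrt[27] simplifies to an exact result, and wrapping it in N[ ] gives the decimal approximation:

\[\sqrt{27} = 3\sqrt{3} \approx 5.19615, \qquad |-12| = 12, \qquad \sin\left(\frac{\pi}{2}\right) = 1\]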

Common Mathematica built-in functions include

  1. Trigonometric functions and their inverses
  2. Hyperbolic functions and their inverses
  3. Logarithmic and exponential functions

Every built-in function in Mathematica has two very important features:

  • All Mathematica built-in functions begin with capital letters; for example, we use Sqrt for the square root and ArcCos for the inverse cosine.
  • Square brackets are always used to surround the input or argument of a function.

To compute the absolute value of -12, type Abs[-12] at the command prompt rather than, for example, Abs(-12) or Abs{-12}; only Abs[-12] is a valid command for computing the absolute value of -12.

Note that:

In Mathematica, single square brackets [ ] enclose the arguments of a function, double square brackets [[ and ]] extract elements (parts) of a list, parentheses ( and ) group terms in algebraic expressions, and curly brackets { and } delimit lists. In short, the three sets of delimiters [ ], ( ), and { } are used for functions, algebraic expressions, and lists, respectively.

Time Series Analysis and Forecasting (2013)

Time Series Analysis

Time series analysis is the analysis of a series of data points over time, allowing one to answer questions such as: what is the causal effect on a variable $Y$ of a change in a variable $X$ over time? An important difference between time series and cross-section data is that the ordering of cases matters in time series.

A time series $\{Y_t\}$ or $\{y_1,y_2,\cdots,y_T\}$ is a discrete-time, continuous-state process, where the times $t=1,2,\cdots,T$ are discrete time points spaced at uniform intervals.

Usually, time is measured at more or less equally spaced intervals such as an hour, day, month, quarter, or year. More specifically, a time series is a set of data in which observations are arranged in chronological order (a set of repeated observations of the same variable).

Use of Time Series

Time series are used in different fields of science such as statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, and communications engineering among many other fields.

Definition: A sequence of random variables indexed by time is called a stochastic process (stochastic means random) or time series for mere mortals. A data set is one possible outcome (realization) of the stochastic process. If history had been different, we would observe a different outcome, thus we can think of time series as the outcome of a random variable.

Rather than dealing with individuals as units, the unit of interest is time: the value of $Y$ at time $t$ is $Y_t$. The unit of time can be anything from days to election years. The value of $Y$ in the previous period is called the first lag value: $Y_{t-1}$. The $j$th lag is denoted $Y_{t-j}$. Similarly, $Y_{t+1}$ is the value of $Y$ in the next period. So a simple bivariate regression equation for time series data looks like: \[Y_t = \beta_0 + \beta X_t + u_t\]
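The lag notation and the bivariate regression can be sketched in a few lines of plain Python. The helper names `lag` and `ols_bivariate` and the two short series are made up for illustration; they are not data or code from any particular study.

```python
def lag(series, j=1):
    """Return the j-th lag of a series; the first j entries are None (unobserved)."""
    if j <= 0:
        return list(series)
    return [None] * j + list(series[:-j])


def ols_bivariate(x, y):
    """Ordinary least squares estimates (intercept, slope) for y_t = b0 + b1 * x_t + u_t."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up series, ordered in time (t = 1, ..., 6).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

y_lag1 = lag(y, 1)  # Y_{t-1}: [None, 2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = ols_bivariate(x, y)

print("First lag of y:", y_lag1)
print(f"Estimated intercept: {intercept:.3f}, slope: {slope:.3f}")
```

The first entry of the lagged series is missing because there is no observation before $t=1$; any regression that uses $Y_{t-1}$ therefore loses the first observation.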

Continuous Time Series

A time series is said to be continuous when observations are made continuously in time. The term continuous is used for series of this type even when the measured variable can only take a discrete set of values.

Discrete Time Series

A time series is said to be discrete when observations are taken only at specific times, usually equally spaced. The term discrete is used for series of this type even when the measured variable is a continuous variable.

Most macroeconomic and financial data come in the form of time series. GNP and stock returns are examples of time series data.

We can write a series as $\{x_1,x_2,x_3,\cdots,x_T\}$ or $\{x_t\}$, where $t=1,2,3,\cdots,T$. $x_t$ is treated as a random variable.

Time series analysis refers to the branch of statistics in which observations are collected sequentially in time, usually but not necessarily at equally spaced time points. Notationally, the main difference between time series and other variables is the use of the time subscript $t$.

Time series analysis comprises methods for analyzing time series data to extract meaningful statistics and other characteristics of the data, while time series forecasting is the use of a model to predict future values based on previously observed values.
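As one very simple illustration of forecasting from previously observed values, the sketch below predicts the next value as the average of the most recent observations. The moving-average rule, the function name `moving_average_forecast`, and the data are assumptions made for this example; they are not a method prescribed by the text.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window


# Made-up observations in chronological order.
observed = [10.0, 12.0, 13.0, 12.0, 15.0, 16.0]

forecast = moving_average_forecast(observed, window=3)
print(f"One-step-ahead forecast: {forecast:.2f}")  # mean of 12, 15, 16 -> 14.33
```

A larger window smooths out short-term fluctuations more strongly, while a window of 1 simply repeats the last observed value.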

Given an observed time series, the first step in analyzing a time series is to plot the given series on a graph taking time intervals (t) along the X-axis (as independent variable) and the observed value ($Y_t$) on the Y-axis (as dependent variable). Such a graph will show various types of fluctuations and other points of interest.

Note

  • $Y_t$ is treated as a random variable. If $Y_t$ is generated by some model (for example, the regression model for time series $Y_t=x_t\beta +\varepsilon_t$ with $E(\varepsilon_t|x_t)=0$), then ordinary least squares (OLS) provides a consistent estimate of $\beta$.
  • The term time series is used interchangeably for the sample $\{x_t\}$ and for the probability model that generates it. A possible probability model for the joint distribution of a time series $\{x_t\}$ is $x_t=\varepsilon_t$, $\varepsilon_t\sim iid  N(0,\sigma_\varepsilon^2)$.
  • Time series are typically not iid (independent and identically distributed); e.g., if GNP today is unusually high, GNP tomorrow is also likely to be unusually high.
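The contrast between an iid series and a persistent one can be seen by simulation. The sketch below compares the iid model $x_t=\varepsilon_t$ with a first-order autoregressive series. The autoregressive model, its coefficient of 0.8, the seed, and the sample size are all assumptions chosen for illustration and are not taken from the text.

```python
import random


def lag1_autocorrelation(series):
    """Sample correlation between x_t and x_{t-1}."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((value - mean) ** 2 for value in series)
    return num / den


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 2000

# iid N(0, 1) noise: x_t = eps_t
white_noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Assumed persistent series: x_t = 0.8 * x_{t-1} + eps_t
persistent = [0.0]
for _ in range(n - 1):
    persistent.append(0.8 * persistent[-1] + rng.gauss(0.0, 1.0))

r_wn = lag1_autocorrelation(white_noise)
r_ar = lag1_autocorrelation(persistent)

print(f"Lag-1 autocorrelation, iid noise:         {r_wn:.3f}")
print(f"Lag-1 autocorrelation, persistent series: {r_ar:.3f}")
```

For the iid series the lag-1 autocorrelation should be close to zero, while for the persistent series it should be close to the assumed coefficient, reflecting that an unusually high value today tends to be followed by a high value tomorrow.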
