## Simple Linear Regression Model

Frequently, we measure two or more variables on each individual and try to describe the nature of the relationship between them, for example with a simple linear regression model or a correlation analysis. Using regression, we estimate the relationship of one variable with another by expressing one as a linear (or more complex) function of the other, and we can then predict values of one variable from the other. The variables involved in regression and correlation analysis are usually continuous. In this post we will learn about the simple linear regression model.

We are interested in establishing significant functional relationships between two (or more) variables. For example, the function $Y=f(X)=a+bX$ (read as “$Y$ is a function of $X$”) establishes a relationship for predicting values of the variable $Y$ from given values of the variable $X$. In statistics (biostatistics), this function is called a simple linear regression model or simply the regression equation.

The variable $Y$ is called the dependent (response) variable, and $X$ is called the independent (regressor or explanatory) variable.

In biology, many relationships hold only over a limited range of values of $X$. Negative values are meaningless for many variables, such as age, height, weight, and body temperature.

The method of linear regression is used to estimate the best-fitting straight line describing the relationship between variables. Linear regression gives the equation of the straight line that best describes how the outcome $Y$ increases or decreases as the explanatory variable $X$ increases or decreases. The equation of the regression line is
$$Y=\beta_0 + \beta_1 X,$$
where $\beta_0$ is the intercept (value of $Y$ when $X=0$) and $\beta_1$ is the slope of the line. Both $\beta_0$ and $\beta_1$ are the parameters (or regression coefficients) of the linear equation.

### Estimation of Regression Coefficients in Simple Linear Regression Model

The best-fitting line is derived using the method of *least squares*, that is, by finding the values of the parameters $\beta_0$ and $\beta_1$ that minimize the sum of the squared vertical distances of the points from the regression line,
$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2.$$

The best-fitting (dotted) line passes through the point $(\overline{X}, \overline{Y})$.

The regression line $Y=\beta_0+\beta_1X$ is fitted by the least-squares method: the regression coefficients $\beta_0$ and $\beta_1$ are both chosen to minimize the sum of squares of the vertical deviations of the points about the regression line. Each deviation is the difference between the observed value of $Y$ and the estimated value of $Y$ (the corresponding point on the regression line).
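The least-squares criterion can be checked numerically. The sketch below uses a small made-up data set (hypothetical values, not the table in this post), fits the line with the deviation form of the least-squares formulas, and confirms that nudging either coefficient increases the sum of squared vertical deviations. It also checks that the fitted line passes through the mean point $(\overline{X}, \overline{Y})$.

```python
# Hypothetical illustrative data (not the table from this post)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def sse(b0, b1, x, y):
    """Sum of squared vertical deviations of the points from y = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def least_squares(x, y):
    """Return (b0, b1) minimizing sse(), using the deviation form of the formulas."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares(x, y)
best = sse(b0, b1, x, y)

# Any small change in either coefficient gives a larger sum of squares
for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sse(b0 + d0, b1 + d1, x, y) > best

# The fitted line evaluated at x_bar returns y_bar
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
assert abs((b0 + b1 * x_bar) - y_bar) < 1e-9

print(f"b0={b0:.4f}, b1={b1:.4f}, SSE={best:.4f}")
```

Because the sum of squares is a quadratic function of $(\beta_0, \beta_1)$ with a single minimum, any perturbation away from the least-squares estimates can only increase it.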

The following table shows the *body weight* and *plasma volume* of eight healthy men.

The parameters $\beta_0$ and $\beta_1$ are estimated using the following formulas (for the simple linear regression model):

\begin{align}
\hat{\beta}_1 &= \frac{n\sum\limits_{i=1}^{n} x_iy_i -\sum\limits_{i=1}^{n} x_i \sum\limits_{i=1}^{n} y_i} {n \sum\limits_{i=1}^{n} x_i^2 - \left(\sum\limits_{i=1}^{n} x_i \right)^2}\\
\hat{\beta}_0 &= \overline{Y} - \hat{\beta}_1 \overline{X}
\end{align}
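The formulas above translate directly into code. The post's table of weights and plasma volumes is not reproduced here, so the eight pairs below are assumed values used only for illustration (body weight in kg and plasma volume in litres are also assumptions); with these values the sketch reproduces the fitted line reported for this example.

```python
# ASSUMED data (the post's table is not shown): body weight (kg) and
# plasma volume (litres) for eight healthy men.
weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]

def fit_slr(x, y):
    """Least-squares estimates (beta0_hat, beta1_hat) from the raw-sum formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    beta1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    beta0 = sum_y / n - beta1 * sum_x / n   # beta0 = Y_bar - beta1 * X_bar
    return beta0, beta1

beta0_hat, beta1_hat = fit_slr(weight, plasma)
print(f"Plasma volume = {beta0_hat:.4f} + {beta1_hat:.4f} x Weight")
```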

Regression coefficients are sometimes known as “beta-coefficients”. When the slope is zero ($\beta_1=0$), there is no linear relationship between the variables $X$ and $Y$. For the data above, the best-fitting straight line describing the relationship between plasma volume and body weight is
$$\text{Plasma volume} = 0.0857 + 0.0436\times \text{Weight}$$
Note that the calculated values of $\beta_0$ and $\beta_1$ are estimates of the population values and are therefore subject to sampling variation.


## Simple Linear Regression Model (SLRM)

A simple linear regression model (SLRM) is based on a single independent (explanatory) variable, and it fits a straight line such that the sum of squared residuals of the regression model (the squared vertical distances between the fitted line and the data points) is as small as possible. The simple linear regression model (also known as a statistical or probabilistic model) is

\begin{align*}
y_i &= \alpha + \beta x_i +\varepsilon_i\\
\text{OR} \quad y_i&=b_0 + b_1 x_i + \varepsilon_i\\
\text{OR} \quad y_i&=\beta_0 + \beta_1 x_i + \varepsilon_i
\end{align*}
where $y$ is the dependent variable and $x$ is the independent variable. In the regression context, $y$ is the regressand and $x$ is the regressor. The error term $\varepsilon$ (epsilon) is unobservable; it denotes the random error, or disturbance term, of the regression model. There are several reasons for including the random error $\varepsilon$ in the regression model:

### Importance of Error Term in Simple Linear Regression Model

1. Random error ($\varepsilon$) captures the effect on the dependent variable of all variables that are not included in the model under study, whether or not those omitted variables are observable.
2. Random error ($\varepsilon$) captures any specification error in the assumed linear functional form.
3. Random error ($\varepsilon$) captures the effect of unpredictable random components present in the dependent variable.

We can say that $\varepsilon$ is the variation in the variable $y$ that is not explained by the independent variable $x$ included in the model.
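To see what the error term does, the sketch below simulates data from $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with arbitrarily chosen parameters ($\beta_0 = 2$, $\beta_1 = 0.5$) and normally distributed errors with standard deviation 1. These values and the normal distribution are assumptions made only for illustration. Refitting the line on several independent samples gives estimates that scatter around the true values instead of reproducing them exactly, which is the sampling variation mentioned earlier.

```python
import random

TRUE_B0, TRUE_B1, SIGMA = 2.0, 0.5, 1.0   # assumed values for the simulation

def simulate(n, rng):
    """Draw n points from y = b0 + b1*x + eps, with eps ~ N(0, SIGMA^2)."""
    x = [rng.uniform(0, 20) for _ in range(n)]
    y = [TRUE_B0 + TRUE_B1 * xi + rng.gauss(0, SIGMA) for xi in x]
    return x, y

def fit(x, y):
    """Least-squares intercept and slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

rng = random.Random(42)   # fixed seed so the run is reproducible
estimates = [fit(*simulate(50, rng)) for _ in range(5)]
for b0_hat, b1_hat in estimates:
    print(f"b0_hat = {b0_hat:.3f}, b1_hat = {b1_hat:.3f}")
```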

In the above model, $\beta_0$ and $\beta_1$ are the parameters, and our main objective is to obtain estimates of their numerical values, i.e., $\hat{\beta}_0$ and $\hat{\beta}_1$. Here $\beta_0$ is the intercept (regression constant), and the fitted line passes through $(\overline{x}, \overline{y})$, the center of mass of the data points. $\beta_1$ is the slope, or regression coefficient, of the model; its estimate equals the correlation between $x$ and $y$ scaled by the ratio of their standard deviations, $\hat{\beta}_1 = r_{xy}\, s_y / s_x$.

The subscript $i$ denotes the $i$th value of a variable in the model. Without the error term, the model becomes
$$y=\beta_0 + \beta_1 x.$$
This is a deterministic (mathematical) model, because all the variation in $y$ is due solely to changes in $x$; no other factors affect the dependent variable. If this were true, all the $(x, y)$ pairs would fall exactly on a straight line when plotted on a two-dimensional plane. For observed data, however, the points generally do not lie exactly on a straight line. A two-dimensional plot of the observed $(x, y)$ pairs is called a scatter diagram.

### FAQs about Simple Linear Regression Models

1. What is a simple linear regression model?
2. What is a probabilistic/statistical model?
3. What is the equation of a simple linear regression model?
4. Write about the importance of error terms in the regression model.
5. What are the parameters in a simple linear regression model?
6. What is the objective of estimating the parameters of a simple linear regression model?


## Interpreting Regression Coefficients in Simple Regression

How are the regression coefficients interpreted in simple regression?

The simple regression model is
$$Y = a + bX,$$
where $a$ is the intercept and $b$ is the slope (regression coefficient). The regression coefficients are calculated as

$$b = \frac{n\Sigma XY - \Sigma X \Sigma Y}{n \Sigma X^2 - (\Sigma X)^2}$$

$$a = \bar{Y} - b \bar{X}$$

The basic, or unstandardized, regression coefficient $b$ is interpreted as the predicted change in $Y$ (the dependent variable, abbreviated DV) for a one-unit change in $X$ (the independent variable, abbreviated IV). It is expressed in units of the dependent variable per unit of the independent variable.
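Using the fitted plasma-volume equation reported earlier in this post ($\text{Plasma volume} = 0.0857 + 0.0436 \times \text{Weight}$), the sketch below shows that a one-unit increase in weight changes the predicted plasma volume by exactly the slope, whatever the starting weight. The helper `predict` is a hypothetical name introduced only for this illustration.

```python
B0, B1 = 0.0857, 0.0436   # coefficients of the fitted plasma-volume line

def predict(weight):
    """Predicted plasma volume for a given body weight (hypothetical helper)."""
    return B0 + B1 * weight

for w in (60, 70, 80):
    change = predict(w + 1) - predict(w)
    print(f"weight {w} -> {w + 1}: predicted change = {change:.4f}")
```

The constant difference is what makes the slope interpretable on its own: in a straight-line model the effect of a one-unit change in $X$ does not depend on where along the line you start.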

### Interpreting Regression Coefficients

Interpreting regression coefficients involves understanding the relationship between the IV(s) and the DV in a regression model.

• Magnitude: In a simple linear regression model, the coefficient (slope) gives the change in the DV associated with a one-unit change in the IV. For example, if the regression coefficient of the IV (regressor) is 0.5, then for every one-unit increase in that predictor the DV is expected to increase by 0.5 units (in multiple regression, holding the other predictors constant).
• Direction: The sign of the regression coefficient (+ or -) indicates the direction of the relationship between the IV and DV. A positive coefficient means that as the IV increases, the DV is expected to increase as well. A negative coefficient means that as the IV increases, the DV is expected to decrease.
• Statistical Significance: The statistical significance of a coefficient indicates whether the observed relationship between the IV and the DV could plausibly be due to chance alone. Generally, if the p-value of a regression coefficient is less than a chosen significance level (say 0.05), the coefficient is considered statistically significant.
• Interaction Effects: The relationship between an IV and the DV may depend on the value of another variable. In such cases, the interpretation of regression coefficients may involve the interaction effects, where the effect of one variable on the DV varies depending on the value of another variable.
• Context: Always interpret coefficients in the context of the specific problem being investigated. A coefficient may not make practical sense without considering the nature of the data and the underlying phenomenon being studied.
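The significance check for a slope is usually the two-sided $t$ test of $H_0\!: \beta_1 = 0$, with $t = \hat{\beta}_1 / \operatorname{SE}(\hat{\beta}_1)$ and $n-2$ degrees of freedom. The sketch below computes it for eight (body weight, plasma volume) pairs; since the post's table is not shown, these values are an assumption used only for illustration. To stay dependency-free, the p-value is obtained by integrating the Student $t$ density with Simpson's rule; in practice a statistics library would normally be used instead.

```python
import math

# ASSUMED data (the post's table is not shown): body weight and plasma volume
x = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]
y = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|): integrate the density over [0, |t|] with Simpson's rule."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps            # steps must be even for Simpson's rule
    area = t_pdf(a, df) + t_pdf(b, df)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * t_pdf(a + k * h, df)
    area *= h / 3
    return 1 - 2 * area            # the density is symmetric about 0

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
t_stat = b1 / se_b1                      # tests H0: beta1 = 0
p_value = two_sided_p(t_stat, n - 2)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```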

Therefore, regression coefficients should be interpreted carefully, taking into account the assumptions of the regression model and the limitations of the data. Moreover, the interpretation may differ depending on the type of regression model used (e.g., linear or logistic regression) and the specific research question being addressed.

• Note that there is another important form of the regression coefficient: the standardized regression coefficient, obtained when both variables are expressed in standard-deviation units. In simple regression, the standardized coefficient ranges from -1.00 to +1.00, just like a simple correlation coefficient.
• In simple regression, the standardized regression coefficient is the same thing as the (Pearson) correlation coefficient.