Method of Least Squares

Introduction to Method of Least Squares

The method of least squares is a statistical technique used to find the best-fitting curve or line for a set of data points. It does this by minimizing the sum of the squares of the offsets (residuals) of the points from the curve.

The method of least squares is used for

  • solution of equations, and
  • curve fitting

The principle of least squares consists of minimizing the sum of squares of deviations, errors, or residuals.

Mathematical Functions/Models

Many types of mathematical functions (or models) can be used to model the response, i.e. a function of one or more independent variables. These models can be classified into two categories: deterministic and probabilistic models. For example, suppose $Y$ and $X$ are related according to the relation

$$Y=\beta_0 + \beta_1 X,$$

where $\beta_0$ and $\beta_1$ are unknown parameters, $Y$ is the response variable, and $X$ is an independent/auxiliary variable (regressor). The model above is called a deterministic model because it does not allow for any error in predicting $Y$ as a function of $X$.

Probabilistic and Deterministic Models

Suppose that we collect a sample of $n$ values of $Y$ corresponding to $n$ different settings for the independent random variable $X$ and the graph of the data is as shown below.

[Figure: plot of the sample data points]

In the figure above it is clear that $E(Y)$ may increase as a function of $X$ but the deterministic model is far from an adequate description of reality.

If we repeated the experiment at, say, $X=20$, we would find that the observed values of $Y$ fluctuate about some mean in a random manner; this leads us to the probabilistic model (that is, a model that is not deterministic and is not an exact representation of the relationship between the two variables). Further, if the model is used to predict $Y$ when $X=20$, the prediction would be subject to some unknown error. This leads us to use statistical methods: predicting $Y$ for a given value of $X$ is an inferential process, and we need to assess the error of prediction if it is to be of value in real life. In contrast to the deterministic model, the probabilistic model is

$$Y=\beta_0 + \beta_1 X + \varepsilon,$$

where $\varepsilon$ is a random variable with a specified distribution and zero mean. One may think of the probabilistic model as the deterministic component $E(Y)=\beta_0+\beta_1X$ plus a random error $\varepsilon$.

The probabilistic model accounts for the random behaviour of $Y$ exhibited in the figure and provides a more accurate description of reality than the deterministic model.

The properties of the error of prediction of $Y$ can be derived for many probabilistic models. If a deterministic model can be used to predict with negligible error, for all practical purposes we use it; if not, we seek a probabilistic model, which will not be an exact characterization of nature but will enable us to assess the validity of our inferences.
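To make the distinction concrete, here is a minimal R sketch that simulates data from a probabilistic straight-line model; the parameter values ($\beta_0=2$, $\beta_1=0.5$) and the normal error distribution are illustrative assumptions, not values from the text.

```r
# Simulate observations from the probabilistic model Y = beta0 + beta1*X + error,
# where the error has mean zero (normal with sd = 1 here, purely for illustration).
set.seed(123)                          # for reproducibility
n     <- 30
beta0 <- 2                             # assumed intercept
beta1 <- 0.5                           # assumed slope
x     <- seq(1, 30, length.out = n)
eps   <- rnorm(n, mean = 0, sd = 1)    # random error with E(eps) = 0

y_det  <- beta0 + beta1 * x            # deterministic component E(Y)
y_prob <- y_det + eps                  # observed responses scatter about E(Y)

head(cbind(x, E_Y = y_det, y = y_prob))  # observed values fluctuate about E(Y)
```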

Estimation of Linear Model: Least Squares Method

For the estimation of the parameters of a linear model, we consider fitting the line

$$E(Y) = \beta_0 + \beta_1 X, \qquad \text{where } X \text{ is fixed}.$$

For a set of points $(x_i, y_i)$, we consider the real situation

$$Y=\beta_0+\beta_1X+\varepsilon, \qquad \text{with } E(\varepsilon)=0,$$

where $\varepsilon$ possesses a specific probability distribution with zero mean, and $\beta_0$ and $\beta_1$ are unknown parameters.

Minimizing the Vertical Distances of Data Points

Now if $\hat{\beta}_0$ and $\hat{\beta}_1$ are the estimates of $\beta_0$ and $\beta_1$, respectively, then $\hat{Y}=\hat{\beta}_0+\hat{\beta}_1X$ is an estimate of $E(Y)$.


Suppose we have a set of $n$ data points $(x_i, y_i)$ and we want to minimize the sum of squares of the vertical distances of the data points from the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1x_i; \,\,\, i=1,2,\cdots, n$, where $\hat{y}_i$ is the predicted value of the $i$th $Y$ when $X=x_i$. The deviation of the observed value of $Y$ from the fitted line (sometimes called an error) is $y_i - \hat{y}_i$, and the sum of squares of these vertical deviations to be minimized is

\begin{align*}
SSE &= \sum\limits_{i=1}^n (y_i-\hat{y}_i)^2\\
&= \sum\limits_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1x_i)^2
\end{align*}

The quantity SSE is called the sum of squares of errors. If SSE possesses a minimum, it will occur at the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ that satisfy the equations $\frac{\partial SSE}{\partial \hat{\beta}_0}=0$ and $\frac{\partial SSE}{\partial \hat{\beta}_1}=0$.
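As a quick numerical illustration, SSE can be written as a small R function and evaluated for any candidate pair of coefficients; the data vectors below are invented purely for illustration.

```r
# Sum of squared errors for a candidate line y-hat = b0 + b1*x
sse <- function(b0, b1, x, y) sum((y - b0 - b1 * x)^2)

# Toy data (illustrative only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

sse(0, 2.0, x, y)    # SSE for the candidate line y-hat = 0 + 2x
sse(1, 1.5, x, y)    # SSE for a different candidate line
# The least squares estimates are the pair (b0, b1) that makes sse() smallest.
```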

Taking the partial derivatives of SSE with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ and setting them equal to zero gives us

\begin{align*}
\frac{\partial SSE}{\partial \hat{\beta}_0} &= -2 \sum\limits_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0\\
\Rightarrow \sum\limits_{i=1}^n y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum\limits_{i=1}^n x_i &= 0\\
\Rightarrow \overline{y} &= \hat{\beta}_0 + \hat{\beta}_1\overline{x} \tag*{eq (1)}
\end{align*}

and

\begin{align*}
\frac{\partial SSE}{\partial \hat{\beta}_1} &= -2 \sum\limits_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)x_i = 0\\
\Rightarrow \sum\limits_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)x_i &= 0\\
\Rightarrow \sum\limits_{i=1}^n x_iy_i &= \hat{\beta}_0 \sum\limits_{i=1}^n x_i + \hat{\beta}_1 \sum\limits_{i=1}^n x_i^2\tag*{eq (2)}
\end{align*}

The equations $\frac{\partial SSE}{\partial \hat{\beta}_0}=0$ and $\frac{\partial SSE}{\partial \hat{\beta}_1}=0$ are called the least squares (normal) equations for estimating the parameters of a straight line. On solving the least squares equations, we have from equation (1)

$$\hat{\beta}_0 = \overline{y} - \hat{\beta}_1 \overline{x}.$$

Substituting $\hat{\beta}_0$ into equation (2),

\begin{align*}
\sum\limits_{i=1}^n x_i y_i &= (\overline{y} - \hat{\beta}_1\overline{x}) \sum\limits_{i=1}^n x_i + \hat{\beta}_1 \sum\limits_{i=1}^n x_i^2\\
&= n\overline{x}\,\overline{y} - n \hat{\beta}_1 \overline{x}^2 + \hat{\beta}_1 \sum\limits_{i=1}^n x_i^2\\
&= n\overline{x}\,\overline{y} + \hat{\beta}_1\left(\sum\limits_{i=1}^n x_i^2 - n\overline{x}^2\right)\\
\Rightarrow \hat{\beta}_1 &= \frac{\sum\limits_{i=1}^n x_iy_i - n\overline{x}\,\overline{y}}{\sum\limits_{i=1}^n x_i^2 - n\overline{x}^2} = \frac{\sum\limits_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})}{\sum\limits_{i=1}^n(x_i-\overline{x})^2}
\end{align*}
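The closed-form estimates derived above are easy to verify numerically. The following R sketch (with illustrative data; any numeric vectors x and y would do) computes $\hat{\beta}_1$ and $\hat{\beta}_0$ from the formulas and compares them with the fit produced by R's built-in lm() function.

```r
# Illustrative data (assumed values, not from the text)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

# Least squares estimates from the derived formulas
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)
c(beta0_hat = beta0_hat, beta1_hat = beta1_hat)

# The same estimates from R's built-in least squares fit
coef(lm(y ~ x))
```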

Applications of Least Squares Method

The method of least squares is a powerful statistical technique. It provides a systematic way to find the best-fitting curve or line for a set of data points. It enables us to model relationships between variables, make predictions, and gain insights from data. The method of least squares is widely used in various fields, such as:

  • Regression Analysis: To model the relationship between variables and make predictions.
  • Curve Fitting: To find the best-fitting curve for a set of data points (see the sketch after this list).
  • Data Analysis: To analyze trends and patterns in data.
  • Machine Learning: As a foundation for many machine learning algorithms.
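To illustrate the curve-fitting application listed above, the same least squares principle also fits curves other than straight lines. The sketch below fits a quadratic by least squares; the data are invented for illustration.

```r
# Fit a quadratic curve y = b0 + b1*x + b2*x^2 by least squares (illustrative data)
x <- 1:10
y <- c(2.2, 4.1, 7.8, 13.5, 20.9, 30.2, 41.5, 54.8, 70.1, 87.3)

quad_fit <- lm(y ~ x + I(x^2))   # least squares fit of a quadratic in x
coef(quad_fit)                   # estimated b0, b1, b2
fitted(quad_fit)                 # points on the fitted curve
```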

Frequently Asked Questions about Least Squares Method

  • What is the method of Least Squares?
  • Write down the applications of the Least Squares method.
  • How is the vertical distance of the data points from the regression line minimized?
  • What is the principle of the Method of Least Squares?
  • What is meant by probabilistic and deterministic models?
  • Give an example of deterministic and probabilistic models.
  • What is the mathematical model?
  • What is the statistical model?
  • What is curve fitting?
  • State and derive the Method of Least Squares.


Simple Linear Regression Model

Frequently, we measure two or more variables on each individual and try to express the nature of the relationship between these variables (for example, through a simple linear regression model or correlation analysis). Using the regression technique, we estimate the relationship of one variable with another by expressing one in terms of a linear (or more complex) function of the other. We can also predict the values of one variable in terms of the other. The variables involved in regression and correlation analysis are continuous. In this post, we will learn about the simple linear regression model.

We are interested in establishing significant functional relationships between two (or more) variables. For example, the function $Y=f(X)=a+bX$ (read as $Y$ is a function of $X$) establishes a relationship for predicting the values of the variable $Y$ for given values of the variable $X$. In statistics (biostatistics), this function is called a simple linear regression model, or simply the regression equation.

The variable $Y$ is called the dependent (response) variable, and $X$ is called the independent (regressor or explanatory) variable.

In biology, many relationships are appropriate over only a limited range of values of $X$; for instance, negative values are meaningless for variables such as age, height, weight, and body temperature.

The method of linear regression is used to estimate the best-fitting straight line describing the relationship between the variables. Linear regression gives the equation of the straight line that best describes how the outcome $Y$ increases/decreases with an increase/decrease in the explanatory variable $X$. The equation of the regression line is
$$Y=\beta_0 + \beta_1 X,$$
where $\beta_0$ is the intercept (value of $Y$ when $X=0$) and $\beta_1$ is the slope of the line. Both $\beta_0$ and $\beta_1$ are the parameters (or regression coefficients) of the linear equation.

Estimation of Regression Coefficients in Simple Linear Regression Model

The best-fitting line is derived using the method of \textit{least squares}: by finding the values of the parameters $\beta_0$ and $\beta_1$ that minimize the sum of the squared vertical distances of the points from the regression line.

The best-fit line passes through the point $(\overline{X}, \overline{Y})$.

The regression line $Y=\beta_0+\beta_1X$ is fitted by the least squares method. The regression coefficients $\beta_0$ and $\beta_1$ are both calculated to minimize the sum of squares of the vertical deviations of the points about the regression line. Each deviation equals the difference between the observed value of $Y$ and the estimated value of $Y$ (the corresponding point on the regression line).

The following table shows the \textit{body weight} and \textit{plasma volume} of eight healthy men.

Subject   Body Weight (kg)   Plasma Volume (liters)
1         58.0               2.75
2         70.0               2.86
3         74.0               3.37
4         63.5               2.76
5         62.0               2.62
6         70.5               3.49
7         71.0               3.05
8         66.0               3.12
[Figure: Scatter plot of plasma volume against body weight with the fitted regression line]

The parameters $\beta_0$ and $\beta_1$ are estimated using the following formulas (for the simple linear regression model):

\begin{align}
\beta_1 &= \frac{n\sum\limits_{i=1}^{n} x_iy_i -\sum\limits_{i=1}^{n} x_i \sum\limits_{i=1}^{n} y_i} {n \sum\limits_{i=1}^{n} x_i^2 - \left(\sum\limits_{i=1}^{n} x_i \right)^2}\\
\beta_0 &= \overline{Y} - \beta_1 \overline{X}
\end{align}

Regression coefficients are sometimes known as “beta-coefficients”. When the slope is zero ($\beta_1=0$), there is no linear relationship between the $X$ and $Y$ variables. For the data above, the best-fitting straight line describing the relationship of plasma volume with body weight is
$$\text{Plasma Volume} = 0.0857 + 0.0436\times \text{Weight}$$
Note that the calculated values of $\beta_0$ and $\beta_1$ are estimates of the population values and are therefore subject to sampling variation.
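The fitted equation can be reproduced in R using the eight observations from the table; a minimal sketch (the variable names weight and plasma are chosen here for illustration):

```r
# Body weight (kg) and plasma volume (liters) for the eight subjects in the table
weight <- c(58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0)
plasma <- c(2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12)

fit <- lm(plasma ~ weight)   # least squares fit of the simple linear regression model
coef(fit)                    # intercept and slope: approximately 0.0857 and 0.0436
```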


Simple Linear Regression Model (SLRM)

A simple linear regression model (SLRM) is based on a single independent (explanatory) variable and fits a straight line such that the sum of squared residuals of the regression model (the vertical distances between the fitted line and the points of the data set) is as small as possible. The simple linear regression model (usually known as a statistical or probabilistic model) is

\begin{align*}
y_i &= \alpha + \beta x_i +\varepsilon_i\\
\text{OR} \quad y_i&=b_0 + b_1 x_i + \varepsilon_i\\
\text{OR} \quad y_i&=\beta_0 + \beta_1 x_i + \varepsilon_i
\end{align*}
where $y$ is the dependent variable and $x$ is the independent variable. In the regression context, $y$ is the regressand and $x$ is the regressor. The epsilon ($\varepsilon$) is an unobservable term denoting the random error or disturbance of the regression model. The random error $\varepsilon$ is included in the regression model for several specific reasons:

Importance of Error Term in Simple Linear Regression Model

  1. Random error ($\varepsilon$) captures the effect on the dependent variable of all variables which are not included in the model under study, because the variable not included in the model may or may not be observable.
  2. Random error ($\varepsilon$) captures any specification error related to the assumed linear-functional form.
  3. Random error ($\varepsilon$) captures the effect of unpredictable random components present in the dependent variable.

We can say that $\varepsilon$ is the variation in the variable $y$ that is not explained by the independent variable $x$ included in the model.

In the above model, $\beta_0$ and $\beta_1$ are the parameters, and our main objective is to obtain estimates of their numerical values, i.e. $\hat{\beta}_0$ and $\hat{\beta}_1$. Here $\beta_0$ is the intercept (regression constant) and $\beta_1$ is the slope (regression coefficient) of the model; the fitted line passes through $(\overline{x}, \overline{y})$, the center of mass of the data points, and the slope equals the correlation between the variables $x$ and $y$ multiplied by the ratio of their standard deviations.
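The last statement (slope = correlation multiplied by the ratio of standard deviations) is easy to check numerically; a short R sketch using the plasma volume data from the earlier table:

```r
weight <- c(58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0)
plasma <- c(2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12)

slope_lm  <- coef(lm(plasma ~ weight))[2]                    # least squares slope
slope_cor <- cor(weight, plasma) * sd(plasma) / sd(weight)   # r * (s_y / s_x)

c(slope_lm = unname(slope_lm), slope_cor = slope_cor)        # the two values agree
```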

The subscript $i$ denotes the $i$th value of the variable in the model.
\[y=\beta_0 + \beta_1 x\]
This is a mathematical model, as all the variation in $y$ is due solely to changes in $x$; there are no other factors affecting the dependent variable. If this were true, then all the pairs $(x, y)$ would fall on a straight line when plotted on a two-dimensional plane. However, the plot may or may not be a straight line for observed values. A two-dimensional diagram with the points plotted in pairs is called a scatter diagram.
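A scatter diagram with the fitted regression line can be drawn in base R; a minimal sketch, again using the plasma volume data from the earlier table:

```r
weight <- c(58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0)
plasma <- c(2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12)

# Scatter diagram of the observed (x, y) pairs
plot(weight, plasma,
     xlab = "Body weight (kg)", ylab = "Plasma volume (liters)",
     main = "Scatter diagram with fitted regression line")

# Add the least squares regression line
abline(lm(plasma ~ weight), lty = 2)
```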

[Figure: Scatter diagram with the fitted regression line]

See Assumptions about Simple Linear Regression Model

FAQs about Simple Linear Regression Models

  1. What is a simple linear regression Model?
  2. What is a Probabilistic/ Statistical model?
  3. What is the equation of a simple linear regression model?
  4. Write about the importance of error terms in the regression model.
  5. What are the parameters in a simple linear regression model?
  6. What is the objective of estimating the parameters of a simple linear regression model?

Interpreting Regression Coefficients in Simple Regression

How are the regression coefficients interpreted in simple regression?

The simple regression model is

$$Y = a + bX + \varepsilon,$$

where $a$ is the intercept, $b$ is the slope (the regression coefficient), and $\varepsilon$ is the random error.

The formula for Regression Coefficients in Simple Regression Models is:

$$b = \frac{n\Sigma XY - \Sigma X \Sigma Y}{n \Sigma X^2 - (\Sigma X)^2}$$

$$a = \bar{Y} - b \bar{X}$$

The basic or unstandardized regression coefficient is interpreted as the predicted change in $Y$ (the dependent variable, abbreviated DV) for a one-unit change in $X$ (the independent variable, abbreviated IV). It is expressed in the units of the dependent variable per unit of the independent variable.

Interpreting Regression Coefficients

Interpreting regression coefficients involves understanding the relationship between the IV(s) and the DV in a regression model.

  • Magnitude: For simple linear regression models, the coefficient (slope) tells about the change in the DV associated with a one-unit change in the IV. For example, if the regression coefficient for IV (regressor) is 0.5, then it means that for every one-unit increase in that predictor, the DV is expected to increase by 0.5 units while keeping all else equal.
  • Direction: The sign of the regression coefficient (+ or -) indicates the direction of the relationship between the IV and DV. A positive coefficient means that as the IV increases, the DV is expected to increase as well. A negative coefficient means that as the IV increases, the DV is expected to decrease.
  • Statistical Significance: The statistical significance of the coefficient is important to consider. The significance of a regression coefficient tells whether the relationship between the IV and the DV is likely to be due to chance or if it’s statistically meaningful. Generally, if the p-value of a regression coefficient is less than a chosen significance level (say 0.05), then that coefficient will be considered to be statistically significant.
  • Interaction Effects: The relationship between an IV and the DV may depend on the value of another variable. In such cases, the interpretation of regression coefficients may involve the interaction effects, where the effect of one variable on the DV varies depending on the value of another variable.
  • Context: Always interpret coefficients in the context of the specific problem being investigated. It is quite possible that a coefficient might not make practical sense without considering the nature of the data and the underlying phenomenon being studied.

Therefore, the interpretation of regression coefficients should be done carefully. The assumptions of the regression model and the limitations of the data should be considered. Moreover, the interpretation may differ based on the type of regression model being used (e.g., linear regression, logistic regression) and the specific research question being addressed.
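In R, the statistical significance of the coefficients discussed above can be read from the model summary; a brief sketch with simulated data (the values are illustrative only):

```r
set.seed(1)
x <- 1:20
y <- 3 + 0.5 * x + rnorm(20)     # illustrative data with a genuine linear relationship

fit <- lm(y ~ x)
# Estimate, standard error, t value, and p-value (Pr(>|t|)) for each coefficient;
# a small p-value for the slope indicates a statistically significant relationship.
summary(fit)$coefficients
```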

  • Note that there is another form of the regression coefficient that is important: the standardized regression coefficient. The standardized coefficient varies from –1.00 to +1.00 just like a simple correlation coefficient;
  • If the regression coefficient is in standardized units, then in simple regression the regression coefficient is the same thing as the correlation coefficient.
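The claim in the last bullet can be checked directly: when both variables are standardized, the simple regression slope equals the correlation coefficient. A small R sketch with illustrative data:

```r
set.seed(2)
x <- rnorm(50)
y <- 1 + 0.8 * x + rnorm(50)     # illustrative data

# Standardize both variables (z-scores) and refit:
# the slope of this fit is the standardized regression coefficient.
z_fit <- lm(scale(y) ~ scale(x))

c(standardized_slope = unname(coef(z_fit)[2]),
  correlation        = cor(x, y))   # the two values coincide
```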

How to interpret the Regression Coefficients in Multiple Linear Regression Models

How to Perform Linear Regression Analysis in R Language