Method of Least Squares

Introduction to Method of Least Squares

The method of least squares is a statistical technique used to find the best-fitting curve or line for a set of data points. It does this by minimizing the sum of the squares of the offsets (residuals) of the points from the curve.

The method of least squares is used for

  • solution of equations, and
  • curve fitting

The principles of least squares consist of minimizing the sum of squares of deviations, errors, or residuals.

Mathematical Functions/ Models

Many types of mathematical functions (or models) can be used to model a response, that is, a variable expressed as a function of one or more independent variables. These models can be classified into two categories: deterministic and probabilistic. For example, suppose $Y$ and $X$ are related according to the relation

$$Y=\beta_o + \beta_1 X,$$

where $\beta_o$ and $\beta_1$ are unknown parameters, $Y$ is the response variable, and $X$ is an independent/auxiliary variable (regressor). The model above is called a deterministic model because it does not allow for any error in predicting $Y$ as a function of $X$.

Probabilistic and Deterministic Models

Suppose that we collect a sample of $n$ values of $Y$ corresponding to $n$ different settings of the independent variable $X$, and that the graph of the data is as shown below.

[Figure: scatter plot of the sample data, $Y$ against $X$]

In the figure above it is clear that $E(Y)$ may increase as a function of $X$ but the deterministic model is far from an adequate description of reality.

If we repeated the experiment at, say, $X=20$, we would find that $Y$ fluctuates randomly from one repetition to the next. This leads us to the probabilistic model, that is, a model that is not an exact representation of the relationship between the two variables. Further, if the model is used to predict $Y$ when $X=20$, the prediction will be subject to some unknown error. This naturally leads us to statistical methods: predicting $Y$ for a given value of $X$ is an inferential process, and if the prediction is to be of value in real life, we need to be able to assess the size of the prediction error. In contrast to the deterministic model, the probabilistic model is

$$Y=\beta_o + \beta_1 X + \varepsilon,$$

where $\varepsilon$ is a random variable having a specified distribution with zero mean, so that $E(Y)=\beta_o + \beta_1 X$. One may think of $Y$ as having a deterministic component, $E(Y)$, plus a random error $\varepsilon$.

The probabilistic model accounts for the random behaviour of $Y$ exhibited in the figure and provides a more accurate description of reality than the deterministic model.
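The contrast between the two models can be illustrated with a small simulation. This is only a sketch: the parameter values, the error standard deviation, and the choice of a normal distribution for $\varepsilon$ are assumptions made for illustration, since the text only requires a zero-mean error.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
BETA_0 = 5.0
BETA_1 = 2.0
SIGMA = 1.5  # assumed standard deviation of the random error


def deterministic_y(x):
    """Deterministic model: Y is an exact function of X."""
    return BETA_0 + BETA_1 * x


def probabilistic_y(x, rng):
    """Probabilistic model: deterministic part plus a zero-mean random error."""
    epsilon = rng.gauss(0.0, SIGMA)  # normal errors are an assumption
    return BETA_0 + BETA_1 * x + epsilon


rng = random.Random(42)  # fixed seed so the run is reproducible
repeats = [probabilistic_y(20, rng) for _ in range(1000)]
average = sum(repeats) / len(repeats)

print("deterministic Y at X=20:", deterministic_y(20))
print("first three probabilistic Y values:", [round(y, 2) for y in repeats[:3]])
print("average of 1000 repeats:", round(average, 2))
```

Repeating the experiment at $X=20$ gives a different $Y$ each time under the probabilistic model, while the average of many repetitions stays close to $E(Y)$.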

The properties of the error of prediction of $Y$ can be derived for many probabilistic models. If a deterministic model can be used to predict with negligible error, then for all practical purposes we use it. If not, we seek a probabilistic model, which will not be an exact characterization of nature but will enable us to assess the validity of our inferences.

Estimation of Linear Model: Least Squares Method

For the estimation of the parameters of a linear model, we consider fitting a line.

$$E(Y) = \beta_o + \beta_1 X, \qquad (\text{where } X \text{ is fixed}).$$

For a set of points ($x_i, y_i$), we consider the real situation

$$Y=\beta_o+\beta_1X+\varepsilon, \qquad \text{with } E(\varepsilon)=0,$$

where $\varepsilon$ possesses a specific probability distribution with zero mean, and $\beta_o$ and $\beta_1$ are unknown parameters.

Minimizing the Vertical Distances of Data Points

Now, if $\hat{\beta}_o$ and $\hat{\beta}_1$ are the estimates of $\beta_o$ and $\beta_1$, respectively, then $\hat{Y}=\hat{\beta}_o+\hat{\beta}_1X$ is an estimate of $E(Y)$.


Suppose we have a set of $n$ data points $(x_i, y_i)$ and we want to minimize the sum of squares of the vertical distances of the data points from the fitted line $\hat{y}_i = \hat{\beta}_o + \hat{\beta}_1x_i; \,\,\, i=1,2,\cdots, n$. Here $\hat{y}_i$ is the predicted value of the $i$th $Y$ when $X=x_i$. The deviation of an observed value of $Y$ from the fitted line (sometimes called the error) is the vertical distance $y_i - \hat{y}_i$, and the sum of squares of deviations to be minimized is

\begin{align*}
SSE &= \sum\limits_{i=1}^n (y_i-\hat{y}_i)^2\\
&= \sum\limits_{i=1}^n (y_i - \hat{\beta}_o - \hat{\beta}_1x_i)^2
\end{align*}

The quantity SSE is called the sum of squares of errors. If SSE possesses a minimum, it will occur at values of $\hat{\beta}_o$ and $\hat{\beta}_1$ that satisfy the equations $\frac{\partial SSE}{\partial \hat{\beta}_o}=0$ and $\frac{\partial SSE}{\partial \hat{\beta}_1}=0$.
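To make the quantity being minimized concrete, the following sketch evaluates SSE for a few candidate lines. The data and the candidate coefficients are made up for illustration. The first candidate happens to be the least squares solution for this particular data set, so it gives the smallest SSE of the four.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared vertical distances between observed and fitted values."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Candidate (intercept, slope) pairs, also hypothetical.
candidates = [(2.2, 0.6), (2.0, 0.6), (2.2, 0.8), (1.0, 1.0)]
for b0, b1 in candidates:
    print(f"b0={b0}, b1={b1}: SSE={sse(b0, b1, xs, ys):.2f}")
```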

Taking the partial derivatives of SSE with respect to $\hat{\beta}_o$ and $\hat{\beta}_1$ and setting them equal to zero, gives us

\begin{align*}
\frac{\partial SSE}{\partial \hat{\beta}_o} &= \frac{\partial}{\partial \hat{\beta}_o} \sum\limits_{i=1}^n (y_i - \hat{\beta}_o - \hat{\beta}_1 x_i)^2\\
&= -2 \sum\limits_{i=1}^n (y_i - \hat{\beta}_o - \hat{\beta}_1 x_i) =0\\
\Rightarrow \sum\limits_{i=1}^n y_i - n\hat{\beta}_o - \hat{\beta}_1 \sum\limits_{i=1}^n x_i &=0\\
\Rightarrow \overline{Y} &= \hat{\beta}_o + \hat{\beta}_1\overline{X} \tag*{eq (1)}
\end{align*}

and

\begin{align*}
\frac{\partial SSE}{\partial \hat{\beta}_1} &= -2 \sum\limits_{i=1}^n (y_i - \hat{\beta}_o - \hat{\beta}_1 x_i)x_i =0\\
\Rightarrow \sum\limits_{i=1}^n (y_i - \hat{\beta}_o - \hat{\beta}_1 x_i)x_i &=0\\
\Rightarrow \sum\limits_{i=1}^n x_iy_i &= \hat{\beta}_o \sum\limits_{i=1}^n x_i + \hat{\beta}_1 \sum\limits_{i=1}^n x_i^2\tag*{eq (2)}
\end{align*}

The equations $\frac{\partial SSE}{\partial \hat{\beta}_o}=0$ and $\frac{\partial SSE}{\partial \hat{\beta}_1}=0$ are called the least squares equations (or normal equations) for estimating the parameters of a straight line. Solving the least squares equations, we have from equation (1),

$$\hat{\beta}_o = \overline{Y} - \hat{\beta}_1 \overline{X}$$

Putting $\hat{\beta}_o$ in equation (2)

\begin{align*}
\sum\limits_{i=1}^n x_i y_i &= (\overline{Y} - \hat{\beta}_1\overline{X}) \sum\limits_{i=1}^n x_i + \hat{\beta}_1 \sum\limits_{i=1}^n x_i^2\\
&= n\overline{X}\,\overline{Y} - n \hat{\beta}_1 \overline{X}^2 + \hat{\beta}_1 \sum\limits_{i=1}^n x_i^2\\
&= n\overline{X}\,\overline{Y} + \hat{\beta}_1 \left(\sum\limits_{i=1}^n x_i^2 - n\overline{X}^2\right)\\
\Rightarrow \hat{\beta}_1 &= \frac{\sum\limits_{i=1}^n x_iy_i - n\overline{X}\,\overline{Y} }{\sum\limits_{i=1}^n x_i^2 - n\overline{X}^2} = \frac{\sum\limits_{i=1}^n (x_i-\overline{X})(y_i-\overline{Y})}{\sum\limits_{i=1}^n(x_i-\overline{X})^2}
\end{align*}
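The closed-form estimates above translate directly into a few lines of code. The following is a minimal sketch on made-up data. The function name `fit_line` and the data values are illustrative assumptions, not part of the derivation. It also checks numerically that the residuals satisfy both least squares equations.

```python
def fit_line(xs, ys):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = s_xy / s_xx          # slope: S_xy / S_xx
    b0 = y_bar - b1 * x_bar   # intercept from equation (1)
    return b0, b1


# Hypothetical data, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print("intercept:", round(b0, 4), "slope:", round(b1, 4))
# Both least squares equations should hold up to floating-point rounding.
print("sum of residuals:", round(sum(residuals), 10))
print("sum of x * residual:", round(sum(x * r for x, r in zip(xs, residuals)), 10))
```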

Applications of Least Squares Method

The method of least squares is a powerful statistical technique. It provides a systematic way to find the best-fitting curve or line for a set of data points. It enables us to model relationships between variables, make predictions, and gain insights from data. The method of least squares is widely used in various fields, such as:

  • Regression Analysis: To model the relationship between variables and make predictions.
  • Curve Fitting: To find the best-fitting curve for a set of data points.
  • Data Analysis: To analyze trends and patterns in data.
  • Machine Learning: As a foundation for many machine learning algorithms.

Frequently Asked Questions about Least Squares Method

  • What is the method of Least Squares?
  • Write down the applications of the Least Squares method.
  • How is the vertical distance of the data points from the regression line minimized?
  • What is the principle of the Method of Least Squares?
  • What is meant by probabilistic and deterministic models?
  • Give an example of deterministic and probabilistic models.
  • What is a mathematical model?
  • What is a statistical model?
  • What is curve fitting?
  • State and prove the Least Squares Method.


MCQs Basic Statistics Quiz 19

This statistics test is an MCQs Basic Statistics Quiz with Answers. There are 20 multiple-choice questions covering the basics of statistics, measures of central tendency, measures of dispersion, measures of position, and the distribution of data. Let us start with the MCQs Basic Statistics Quiz with Answers.

Online Multiple-Choice Questions about Basic Statistics with Answers

1. In general, which of the following statements is FALSE?

2. The most important measure of dispersion is

3. The dispersion expressed in the form of a ratio or coefficient and independent from units of measurement is called

4. Who used the term Statistics for the first time?

5. If $x=3$ then which of the following is the minimum

6. If all observations in the data are multiplied by 6, the mean is multiplied by

7. The difference between the largest and smallest value in the data is called

8. The first step in computing the median is

9. What would be the changes in the standard deviation if different values are increased by a constant?

10. Which of the following is a relative measure of dispersion

11. The median is larger than the arithmetic mean when

12. Fill in the missing words in the quote: “Statistical methods may be described as methods for drawing conclusions about ______ based on ______ computed from the ______”.

13. The mode of the values 3, 5, 8, 10, and 12 is

14. For a distribution, if the mean is 20 and the mode is 14, what is the value of the median?

15. Which of the properties of Average Deviation considers Mathematics assumption wrong?

16. Which of the following is an absolute measure of dispersion

17. The mode of the values 2, 6, 8, 6, 12, 15, 18, and 8 is

18. If any value in the data is negative, it is not possible to calculate

19. Half of the difference between the third and first quartiles is called

20. Two sets of distributions are as follows. For both of the sets, the Range is the same. Which of the demerits of Range is shown here in these sets of distributions?
Distribution 1: 30 14 18 25 12
Distribution 2: 30 7 19 27 12


Inferential Statistics Terminology

This post is about Inferential Statistics (or statistical inference) and some of its related terminologies. This is a field of statistics that allows us to understand and make predictions about the world around us.

Parameter and Statistic

Any measurable characteristic of a population is called a parameter. For example, the mean of a population is a parameter. OR

Numerical values that describe the characteristics of a whole population are called parameters, commonly presented in Greek Letters.

Any measurable characteristic of a sample is called a statistic. For example, the mean of a sample is a statistic. OR

Numerical measures describing the characteristics of a sample are called statistics, represented by Roman Letters.

Population and Sample

Population: The entire group of individuals, objects, or data points that one is interested in studying. A population under study can be finite or infinite. However, it is often too large or impractical to study directly.

Sample: A smaller, representative subset of the population. It is used to gain insights about the population without having to study every member. A sample should accurately reflect the characteristics of the population.  

Inference

Inference is the process of drawing conclusions about a population based on the information contained in a sample taken from that population.

Estimator

An estimator is a rule (method, formula) that tells how to calculate the value of an estimate based on the measurements contained in a sample. The sample mean is one possible estimator of the population mean $\mu$.

An estimator is considered good when its distribution is concentrated near the true value of the parameter.
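The idea of an estimator's distribution being concentrated near the parameter can be seen in a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with a hypothetical mean of 50 and standard deviation of 10, and the estimator is the sample mean.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility
MU, SIGMA = 50.0, 10.0  # hypothetical population mean and standard deviation


def sample_means(n, reps=2000):
    """Draw `reps` samples of size n and return the sample mean of each."""
    return [
        statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(reps)
    ]


for n in (5, 100):
    means = sample_means(n)
    print(f"n={n}: centre={statistics.fmean(means):.2f}, "
          f"spread={statistics.stdev(means):.2f}")
```

With the larger sample size, the simulated sample means cluster more tightly around the population mean, which is the sense in which the estimator improves.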

Estimate

An estimate is the numerical value obtained by applying an estimator to a particular sample. There are many ways to estimate a parameter. An estimate only approximates reality (it may be biased or crude), and decisions based on it are more accurate the closer the estimate is to the true value.

$X_1, X_2, \cdots, X_n$ is a sample and $\overline{X}$ is an estimator. $x_1, x_2, \cdots, x_n$ are the sample observations and $\overline{x}=\frac{\sum x_i}{n}$ is an estimate.
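In programming terms, the distinction is roughly that between a function and the value it returns for one input. The sketch below uses a made-up sample, and the names are illustrative only.

```python
def sample_mean(observations):
    """Estimator: a rule that maps any sample to a number."""
    return sum(observations) / len(observations)


# Hypothetical observed sample x_1, ..., x_n.
observed = [12, 15, 11, 14, 13]

# Estimate: the value the rule produces for this particular sample.
x_bar = sample_mean(observed)
print("estimate of the population mean:", x_bar)
```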

Estimation

Estimation is the process of finding an estimate or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable.

Statistical Inference (or Inferential Statistics)

Any process (art) of drawing inferences (conclusions) about a population based on the limited information contained in a sample taken from that population is called statistical inference (or inferential statistics). Studying the entire universe (population) is rarely feasible, so conclusions usually cannot be drawn from the population directly. To get some idea about the characteristics (parameters) of the population, we choose a part of it of reasonable size, selected by some appropriate method and generally referred to as a sample.

Statistical inference is a powerful set of tools used to draw conclusions about a population based on data collected from a sample of that population. It allows us to make informed decisions and predictions about the larger group even when we have not examined every single member.

Why Estimate?

  • Speed: Often, an estimate is faster to get than an exact calculation.
  • Simplicity: It can simplify complex problems.
  • Decision-Making: Estimates help one to make choices when one does not have all the details.
  • Checking: One can use estimates to check if a more precise answer is reasonable.

Why is Statistical Inference Important?

  • Decision-making: It helps us make informed decisions in various fields, such as medicine, business, and social sciences.
  • Research: It is crucial for conducting research and drawing meaningful conclusions from data.
  • Understanding the World: It allows us to understand and make predictions about the world around us.

Solved Probability Questions with Answers

This post is about some solved probability questions. These questions make use of (i) the Addition Law of Probabilities, and (ii) the Multiplication Law of Probabilities.

Solved Probability Questions

Question 1: Box A contains 5 Green and 7 Red balls. Box B contains 3 Green, 3 Red, and 6 Yellow balls. A box is selected at random, and a ball is drawn at random from it. What is the probability that the ball drawn is green?

Solution:

Box A

Total Balls: 5 + 7 = 12
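P(Green | Box A) = $\frac{5}{12}$

Box B

Total Balls: 3 + 3 + 6 = 12
P(Green | Box B) = $\frac{3}{12} = \frac{1}{4}$

Each box is selected with probability $\frac{1}{2}$. Applying the multiplication law within each box and the addition law across the two mutually exclusive boxes gives

$$P(\text{Green}) = \frac{1}{2}\cdot\frac{5}{12} + \frac{1}{2}\cdot\frac{3}{12} = \frac{8}{24} = \frac{1}{3}$$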
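As a quick check with exact fractions, the sketch below weights the chance of green in each box by the chance of selecting that box. It assumes each box is equally likely to be chosen, which is how "selected at random" is read here.

```python
from fractions import Fraction

p_box = Fraction(1, 2)             # assumed: each box is equally likely
p_green_given_a = Fraction(5, 12)  # Box A: 5 green out of 12 balls
p_green_given_b = Fraction(3, 12)  # Box B: 3 green out of 12 balls

# Multiply within each box, then add across the two boxes.
p_green = p_box * p_green_given_a + p_box * p_green_given_b
print("P(green) =", p_green)
```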

Question 2: A pair of fair dice is thrown. What is the probability of getting a total of 5 or 11?

Solution:

\begin{align*}
P(X = 11 \text{ or } X = 5) &= P(X=11) + P(X=5) - P(X=11 \text{ and } X=5)\\
P(X=11) &= \frac{2}{36}\\
P(X=5) &= \frac{4}{36}=\frac{1}{9}\\
P(X=11 \text{ and } X=5) &= 0
\end{align*}

Therefore,

\begin{align*}
P(X=11 \text{ or } X=5) &= P(X=11) + P(X=5) \\
&=\frac{2}{36} + \frac{1}{9} = \frac{1}{6}
\end{align*}

Note that $P(X=11 \text{ and } X=5) = 0$, because the sum of two dice cannot be 5 and 11 at the same time.
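The same answer can be reached by brute-force enumeration. The sketch below lists all 36 equally likely outcomes of a single throw of two dice and counts those with a total of 5 or 11.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs
favourable = [pair for pair in outcomes if sum(pair) in (5, 11)]

p_5_or_11 = Fraction(len(favourable), len(outcomes))
print(len(favourable), "of", len(outcomes), "outcomes ->", p_5_or_11)
```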

Question 3: A marble is drawn at random from a box containing 10 red, 30 white, 20 blue, and 15 orange marbles. What is the probability that it is (i) orange or red (ii) not red or blue (iii) not blue, (iv) white, (v) red, white, or blue.

Solution:

Total number of marbles = 10 + 30 + 20 + 15 = 75
Number of Orange marbles = 15
Number of Blue marbles = 20
Number of White marbles = 30
Number of Red marbles = 10

  1. P(a marble drawn is red or orange) = P(Red marble) + P(Orange marble)
    $$=\frac{10}{75} + \frac{15}{75} = \frac{25}{75} = \frac{1}{3}$$
  2. P(a marble drawn is not red, or is blue) = P(not Red) + P(Blue) - P(Blue and not Red)
    $$=\frac{65}{75} + \frac{20}{75} - \frac{20}{75} = \frac{65}{75} = \frac{13}{15}$$
  3. P(a marble drawn is not Blue) = $1 - P(Blue) = 1 - \frac{20}{75} = \frac{55}{75} \approx 0.733$
  4. P(a marble drawn is white) = $\frac{30}{75} = \frac{2}{5}$
  5. P(a marble drawn is Red, White, or Blue) = P(Red) + P(White) + P(Blue)
    $$=\frac{10}{75} + \frac{30}{75} + \frac{20}{75} = \frac{60}{75} = \frac{4}{5}$$
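The five answers can be cross-checked by counting marbles directly. In this sketch, part (ii) is read as "(not red) or blue", matching the solution above. The helper `prob` is an illustrative name, not a library function.

```python
from fractions import Fraction

counts = {"red": 10, "white": 30, "blue": 20, "orange": 15}
total = sum(counts.values())


def prob(event):
    """Probability that the drawn marble's colour satisfies `event`."""
    matching = sum(n for colour, n in counts.items() if event(colour))
    return Fraction(matching, total)


answers = {
    "orange or red": prob(lambda c: c in ("orange", "red")),
    "(not red) or blue": prob(lambda c: c != "red" or c == "blue"),
    "not blue": prob(lambda c: c != "blue"),
    "white": prob(lambda c: c == "white"),
    "red, white, or blue": prob(lambda c: c in ("red", "white", "blue")),
}
for label, value in answers.items():
    print(f"{label}: {value}")
```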

Question 4: If two dice are thrown, what are the various total numbers of dots that may turn up? What is the probability of each of them? What is the probability that the number of dots will total at least four?

Solution:

When two dice are thrown together, the minimum total number of dots is 2 (1, 1), and the maximum dots possible are 12 (6, 6). Therefore

  • Probability of 2 dots {(1,1)} = $\frac{1}{36}$
  • Probability of 3 dots {(2,1) (1,2)} = $\frac{2}{36} = \frac{1}{18}$
  • Probability of 4 dots {(2,2) (3,1) (1,3)} = $\frac{3}{36} = \frac{1}{12}$
  • Probability of 5 dots {(4,1) (1,4) (2,3) (3,2)} = $\frac{4}{36} = \frac{1}{9}$
  • Probability of 6 dots {(3,3) (4,2) (2,4) (5,1) (1,5)} = $\frac{5}{36}$
  • Probability of 7 dots {(4,3) (3,4) (5,2) (2,5) (6,1) (1,6)} = $\frac{6}{36} = \frac{1}{6}$
  • Probability of 8 dots {(6,2) (2,6) (5,3) (3,5) (4,4)} = $\frac{5}{36}$
  • Probability of 9 dots {(5,4) (4,5) (6,3) (3,6)} = $\frac{4}{36} = \frac{1}{9}$
  • Probability of 10 dots {(5,5) (6,4) (4,6)} = $\frac{3}{36} = \frac{1}{12}$
  • Probability of 11 dots {(5,6) (6,5)} = $\frac{2}{36} = \frac{1}{18}$
  • Probability of 12 dots {(6,6)} = $\frac{1}{36}$
  • Probability that the number of dots will total at least 4 = $1 - \frac{1}{36} - \frac{2}{36} = \frac{33}{36} = \frac{11}{12}$
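The whole distribution of totals can be tabulated by enumerating the 36 equally likely outcomes. This is a minimal sketch using exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 ordered pairs give each total.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
distribution = {t: Fraction(c, 36) for t, c in sorted(totals.items())}

for t, p in distribution.items():
    print(f"P(total = {t}) = {p}")

p_at_least_4 = sum(p for t, p in distribution.items() if t >= 4)
print("P(total >= 4) =", p_at_least_4)
```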

Question 5: One card is selected at random from a deck of 52 playing cards. What is the probability that the card is a club or a face card or both?

Solution:

\begin{align*}
P(\text{club or face or both}) &= P(\text{club}) + P(\text{face}) - P(\text{club and face})\\
&=\frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26}
\end{align*}
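The subtraction of the overlap can be verified by building the deck and counting. The sketch assumes a standard 52-card deck in which the face cards are the jack, queen, and king of each suit.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
face_ranks = {"J", "Q", "K"}  # assumed definition of a face card

deck = list(product(ranks, suits))
# Counting cards directly avoids double-counting the three club face cards.
favourable = [(r, s) for r, s in deck if s == "clubs" or r in face_ranks]

p_club_or_face = Fraction(len(favourable), len(deck))
print(len(favourable), "of", len(deck), "cards ->", p_club_or_face)
```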

Question 6: A class contains 10 men and 20 women, of which half of the men and half of the women have brown eyes. What is the probability that a person chosen at random is a man or has brown eyes?

Solution:

Let $A$ be the event that it is a man (10 out of 30)
Let $B$ be the event that the person has brown eyes (5 men and 10 women: 15 out of 30)

$P(A\cap B)$ is the probability that the person is a man AND has brown eyes: $P(A\cap B) = \frac{5}{30}$

\begin{align*}
P(A \cup B) &= P(A) + P(B) - P(A \cap B)\\
&= \frac{10}{30} + \frac{15}{30} - \frac{5}{30} = \frac{20}{30} = \frac{2}{3}
\end{align*}

Question 7: A drawer contains 50 bolts and 150 nuts. Half of the bolts and half of the nuts are rusted. If one item is chosen at random, what is the probability that it is rusted or is a bolt?

Solution:

Number of Bolts = 50
Number of Nuts = 150
Total number of Items = 50 + 150 = 200

Item chosen is rusted: $P(A) = \frac{100}{200} = \frac{1}{2}$
Item chosen is a bolt: $P(B) = \frac{50}{200} = \frac{1}{4}$
Item is rusted and a bolt: $P(A\cap B) = P(A) \cdot P(B) = \frac{1}{2}\cdot \frac{1}{4} = \frac{1}{8}$ (the events are independent here because the same fraction of bolts and nuts is rusted; equivalently, 25 rusted bolts out of 200 items)

\begin{align*}
P(A \cup B) &= P(A) + P(B) - P(A\cap B) \\
&= \frac{1}{2} + \frac{1}{4} - \frac{1}{8} = \frac{5}{8}
\end{align*}
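A direct count gives the same result without relying on independence. The sketch below represents each item in the drawer as a (kind, rusted) pair, using the counts stated in the question.

```python
from fractions import Fraction

# 50 bolts and 150 nuts, with half of each kind rusted.
items = (
    [("bolt", True)] * 25 + [("bolt", False)] * 25
    + [("nut", True)] * 75 + [("nut", False)] * 75
)

favourable = sum(1 for kind, rusted in items if rusted or kind == "bolt")
p_rusted_or_bolt = Fraction(favourable, len(items))
print(favourable, "of", len(items), "items ->", p_rusted_or_bolt)
```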
