Method of Least Squares: Linear Trend (2020)

The principle of least squares states that “the sum of squares of the deviations of the observed values from the corresponding expected values should be least”. Among all possible trend lines, the one for which this sum of squared deviations is smallest is called the least-squares fit.

Note that the usual probabilistic assumptions made in regression and correlation analysis are not met in the case of time series data.

Secular Trend — Linear Trend

A linear trend is useful for describing a time series in which the amount of change is constant per unit of time.

Let $(x_1, y_1), (x_2, y_2), \cdots, (x_n,y_n)$ be the $n$ pairs of observed sample values of a time series variable $y$, with $x$ representing the coded time value. We can plot these $n$ points on a graph.

Let us suppose that we want to fit a straight line expressed in slope-intercept form as:

\begin{align}
\hat{y} = a + bx, \quad \quad (eq1)
\end{align}

The line (eq1) is called the least-squares line if it minimizes $\sum(y-a-bx)^2$. The method of least squares yields the following normal equations:

\begin{align*}
\sum y &= na + b \sum x\\
\sum xy &= a \sum x + b \sum x^2
\end{align*}
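
As a brief sketch of where these equations come from: write the sum of squared deviations as a function of $a$ and $b$ (the symbol $Q$ is introduced here only for this derivation) and set both partial derivatives to zero.

\begin{align*}
Q(a,b) &= \sum (y-a-bx)^2\\
\frac{\partial Q}{\partial a} &= -2\sum (y-a-bx) = 0 \;\Rightarrow\; \sum y = na + b \sum x\\
\frac{\partial Q}{\partial b} &= -2\sum x(y-a-bx) = 0 \;\Rightarrow\; \sum xy = a \sum x + b \sum x^2
\end{align*}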

Solving the normal equations gives the values of $a$ and $b$ as:

\begin{align*}
b &= \frac{n \sum xy - (\sum x \sum y )}{n \sum x^2 -(\sum x)^2}\\
a & = \overline{y}-b\overline{x}
\end{align*}

However, if the time values are coded so that $\sum x=0$, the normal equations reduce to

\begin{align*}
\sum y &= na\\
\sum xy & = b\sum x^2
\end{align*}

The values of $a$ and $b$ then reduce to

\begin{align*}
a&=\frac{\sum y}{n}=\overline{y}\\
b&=\frac{\sum xy}{\sum x^2}
\end{align*}

The trend values $\hat{y}$ are computed from the least-squares line $\hat{y}=a+bx$ by substituting the values of $x$ corresponding to the different time periods.
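
The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `fit_linear_trend` and `trend_values` and the sample data are assumptions made for this example.

```python
def fit_linear_trend(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = sum_y / n - b * (sum_x / n)  # a = y_bar - b * x_bar
    return a, b


def trend_values(x, a, b):
    """Substitute each coded time value into y_hat = a + b*x."""
    return [a + b * xi for xi in x]


# Illustrative data lying exactly on y = 2 + 3x (not taken from the article).
x = [0, 1, 2, 3]
y = [2, 5, 8, 11]
a, b = fit_linear_trend(x, y)
print(a, b)
print(trend_values(x, a, b))
```

Because the sample points lie exactly on a line, the fitted intercept and slope recover that line, which makes the sketch easy to check by hand.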

Properties of the Method of Least Squares

  • The least-squares line always passes through the point ($\overline{x}, \overline{y}$) called the center of gravity of the data.
  • The sum of deviations $\sum(y-\hat{y})$ of the observed values $y$ from their corresponding expected values $\hat{y}$ is zero, that is, $\sum(y-\hat{y})=0$, hence $\sum y= \sum \hat{y}$
  • The sum of squares of the deviations $\sum (y-\hat{y})^2$ measures how well the trend line fits the data. A smaller $\sum (y-\hat{y})^2$ means a better fit.

Moving Averages and Least Squares Linear Trend: If a least-squares line is fitted to each group of $k$ consecutive observations, its trend value at the central time period of the group equals the group mean, that is, the $k$-period moving average.
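
The relation between moving averages and the least-squares line can be checked numerically. The sketch below assumes an odd $k$ and equally spaced time periods, fits a separate line to each window, and compares the fitted value at the central period with the window mean. The helper `fit_line` and the data are made up for illustration; the slope is computed in deviation form, which is algebraically equivalent to the formula for $b$ given earlier.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope, using deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


y = [5.0, 9.0, 4.0, 12.0, 8.0, 15.0, 11.0]  # made-up illustrative series
k = 3                                        # odd window length

central_trend = []
moving_avg = []
for start in range(len(y) - k + 1):
    xs = list(range(start, start + k))       # time indices of this window
    ys = y[start:start + k]
    a, b = fit_line(xs, ys)
    mid = xs[k // 2]                         # central time period
    central_trend.append(a + b * mid)
    moving_avg.append(sum(ys) / k)

print(central_trend)
print(moving_avg)
```

The two lists agree up to floating-point rounding because each fitted line passes through the center of gravity $(\overline{x}, \overline{y})$ of its own window.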

Question: Determine the trend line by the least-squares method from the following data. Plot the actual values and the linear trend on the same graph.

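| Year | 1945 | 1946 | 1947 | 1948 | 1949 | 1950 | 1951 | 1952 | 1953 |
|---|---|---|---|---|---|---|---|---|---|
| Price | 3 | 6 | 2 | 10 | 7 | 9 | 14 | 12 | 18 |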

Solution

Taking the middle year 1949 as the origin, so that $X = \text{Year} - 1949$ and $\Sigma X = 0$, the equation of the trend line is

\begin{align*}
\hat{Y} = a + b\, X
\end{align*}

Normal Equations are:

\begin{align*}
\Sigma Y & = n\, a + b \, \Sigma X \tag{i}\\
\Sigma XY& = a\, \Sigma X+ b\, \Sigma X^2 \tag{ii}
\end{align*}

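| Year | Value | $X$ | $XY$ | $X^2$ | $\hat{Y}$ |
|---|---|---|---|---|---|
| 1945 | 3 | -4 | -12 | 16 | 2.2 |
| 1946 | 6 | -3 | -18 | 9 | 3.9 |
| 1947 | 2 | -2 | -4 | 4 | 5.6 |
| 1948 | 10 | -1 | -10 | 1 | 7.3 |
| 1949 | 7 | 0 | 0 | 0 | 9.0 |
| 1950 | 9 | 1 | 9 | 1 | 10.7 |
| 1951 | 14 | 2 | 28 | 4 | 12.4 |
| 1952 | 12 | 3 | 36 | 9 | 14.1 |
| 1953 | 18 | 4 | 72 | 16 | 15.8 |
| Total | 81 | 0 | 101 | 60 | 81.0 |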

Putting the values in Normal equations:

\begin{align*}
81 &= 9a \tag*{1}\\
101&= 60b \tag*{2}
\end{align*}

From (1) $a=\frac{81}{9}=9$, and from (2) $b=\frac{101}{60}\approx 1.68$, which is rounded to $1.7$.

Fitted trend line is $\hat{Y}=9 + 1.7\,X$.
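
The worked example can be reproduced with a short script. This is only a sketch: the variable names are assumptions, and the year is coded as $X = \text{Year} - 1949$ so that $\Sigma X = 0$. The trend values in the table correspond to the rounded slope $1.7$, so the script computes them with the rounded slope as well.

```python
years = list(range(1945, 1954))
prices = [3, 6, 2, 10, 7, 9, 14, 12, 18]

x = [year - 1949 for year in years]      # coded time, sums to zero
n = len(prices)

sum_xy = sum(xi * yi for xi, yi in zip(x, prices))
sum_x2 = sum(xi * xi for xi in x)

a = sum(prices) / n                       # a = (sum of Y) / n
b = sum_xy / sum_x2                       # b = (sum of XY) / (sum of X^2)
b_rounded = round(b, 1)                   # slope as used in the table

trend = [round(a + b_rounded * xi, 1) for xi in x]

print(sum(x), sum_xy, sum_x2)
print(a, b, b_rounded)
print(trend)
```

Using the unrounded slope $101/60$ instead would shift some trend values slightly in the first decimal place.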

The Method of Least Squares: Linear Trend

The method of least squares is a valuable tool for analyzing trends in time series data. By understanding the strengths and limitations of the methods, you can effectively use them to gain insights, make predictions, and compare trends across different time series datasets.


The Method of Moving Averages (2020)

The method of moving averages is of two types:

  1. Simple Moving Averages
  2. Weighted Moving Averages

Simple Moving Averages

If the observed values of a variable $Y$ are $y_1, y_2,\cdots, y_n$ corresponding to the time periods $t_1, t_2,\cdots, t_n$, respectively, the $k$-period simple moving averages are defined as

\begin{align*}
a_1 &= \frac{1}{k} \sum_{i=1}^{k} y_i\\
a_2 &= \frac{1}{k} \sum_{i=2}^{k+1} y_i,\\
a_3 &= \frac{1}{k} \sum_{i=3}^{k+2} y_i \\
\vdots &= \quad \vdots\\
a_m &= \frac{1}{k} \sum_{i=m}^{n} y_i
\end{align*}

where $a_1, a_2, \cdots, a_m$ is the sequence of $k$-period simple moving averages and $m = n-k+1$ is the number of averages. That is, the $k$-period simple moving averages are calculated by averaging the first $k$ observations and then repeating the process, each time dropping the first observation of the group and including the next one. This continues until the last $k$ observations have been averaged. For example, the 3-period simple moving averages are given as:

\begin{align*}
a_1 &= \frac{1}{3} (y_1+y_2+y_3) = \frac{1}{3} \sum_{i=1}^{3} y_i\\
a_2 &= \frac{1}{3} (y_2+y_3+y_4) = \frac{1}{3} \sum_{i=2}^{4} y_i\\
a_3 &= \frac{1}{3} (y_3+y_4+y_5) = \frac{1}{3} \sum_{i=3}^{5} y_i\\
\vdots &= \quad \vdots\\
\text{and so on}
\end{align*}
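
The definition translates directly into code. The following is a minimal sketch; the function name `simple_moving_averages` and the sample series are assumptions for illustration.

```python
def simple_moving_averages(y, k):
    """Return the k-period simple moving averages a_1, ..., a_m (m = n - k + 1)."""
    if not 1 <= k <= len(y):
        raise ValueError("k must be between 1 and len(y)")
    return [sum(y[i:i + k]) / k for i in range(len(y) - k + 1)]


y = [2, 4, 6, 8, 10]                      # illustrative values
averages = simple_moving_averages(y, 3)
print(averages)  # [4.0, 6.0, 8.0]
```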

Each of these simple moving averages of the sequence $a_1, a_2, a_3,\cdots$ is placed against the middle of each successive group. The $k$-period moving successive totals $S_1, S_2, S_3, \cdots$ are obtained by the following relations

\begin{align*}
S_1 &= \sum_{i=1}^{k} y_i\\
S_2 &= S_1 + y_{k+1} - y_1\\
S_3 &= S_2 + y_{k+2} - y_2\\
\vdots &= \quad \vdots\\
\text{so on}
\end{align*}

The $k$-period simple moving averages are obtained by dividing these $k$-period moving successive totals ($S_1, S_2, S_3, \cdots$) by $k$, as given in the following relations

\begin{align*}
a_1 &= \frac{S_1}{k}\\
a_2 &= a_1 + \frac{y_{k+1} - y_1}{k}\\
a_3 &= a_2 + \frac{y_{k+2} -y_2}{k}\\
\vdots &= \quad \vdots\\
\text{so on}
\end{align*}
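
The recurrence for the moving totals avoids re-adding all $k$ values for every window. A possible sketch is shown below; the function name `moving_averages_running_total` and the data are hypothetical, and the result is compared with the direct definition.

```python
def moving_averages_running_total(y, k):
    """k-period moving averages via S_{j+1} = S_j + y_{j+k} - y_j."""
    if not 1 <= k <= len(y):
        raise ValueError("k must be between 1 and len(y)")
    total = sum(y[:k])                    # S_1
    averages = [total / k]
    for j in range(k, len(y)):
        total += y[j] - y[j - k]          # add the new value, drop the oldest
        averages.append(total / k)
    return averages


y = [3, 7, 5, 9, 11, 4]                   # illustrative integer series
k = 3
running = moving_averages_running_total(y, k)
direct = [sum(y[i:i + k]) / k for i in range(len(y) - k + 1)]
print(running)
print(running == direct)
```

With integer data the two approaches agree exactly; with floating-point data the running total can accumulate small rounding differences over a long series.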

  • When $k$ is odd, the sequence $a_1, a_2, a_3, \cdots$ will be placed against the middle of its time-period.
  • When $k$ is even, the sequence $a_1, a_2, a_3, \cdots$ of simple moving averages will be placed in the middle of two time periods. It is necessary to centralize these averages. For centralization, further 2-period moving averages of the former $k$-period moving averages are computed which are called $k$-period centered moving averages.
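
For an even $k$, the centering step can be sketched as follows. The function name `centered_moving_averages` is an assumption; the first centered value lines up with the observation at zero-based index $k/2$.

```python
def centered_moving_averages(y, k):
    """Centered moving averages for an even period k."""
    if k % 2 != 0:
        raise ValueError("centering is only needed when k is even")
    if not 2 <= k < len(y):
        raise ValueError("k must be at least 2 and smaller than len(y)")
    # Step 1: ordinary k-period moving averages (fall between two periods).
    ma = [sum(y[i:i + k]) / k for i in range(len(y) - k + 1)]
    # Step 2: 2-period moving averages of those averages (fall on a period).
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]


y = [1, 2, 3, 4, 5, 6]                    # illustrative values
centered = centered_moving_averages(y, 4)
print(centered)  # [3.0, 4.0]
```

Here the 4-period averages are 2.5, 3.5, and 4.5, so the centered values 3.0 and 4.0 are placed against the third and fourth time periods.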

Weighted Moving Averages

For observed values ($y_1, y_2, \cdots, y_n$) of a variable $Y$ corresponding to the time periods $t_1, t_2, \cdots, t_n$, respectively, the $k$-period weighted moving averages with weights $w_1, w_2, \cdots, w_k$ are defined as

\begin{align*}
a_1 &= \frac{1}{\sum w} \sum_{i=1}^{k} w_i\, y_i\\
a_2 &= \frac{1}{\sum w} \sum_{i=2}^{k+1} w_{i-1}\, y_i\\
a_3 &= \frac{1}{\sum w} \sum_{i=3}^{k+2} w_{i-2}\, y_i\\
\vdots &= \vdots\\
a_m &= \frac{1}{\sum w} \sum_{i=m}^{n} w_{i-m+1}\, y_i
\end{align*}

where $a_1, a_2, \cdots, a_m$ is a sequence of $k$-period weighted moving averages with weights $w_1, w_2, \cdots, w_k$, respectively. The $k$-period weighted moving averages are calculated by taking the weighted average of the first $k$ observed values with weights $w_1, w_2, \cdots, w_k$ and then repeating this process of averaging the $k$ observations by dropping each time the first observation and including the next one. This process is continued until the last $k$ observations have been averaged.
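
A weighted moving average can be sketched in the same way as the simple one. The function name `weighted_moving_averages` and the sample values are assumptions; the first weight is applied to the earliest observation in each window.

```python
def weighted_moving_averages(y, weights):
    """k-period weighted moving averages, where k = len(weights)."""
    k = len(weights)
    total_weight = sum(weights)
    if k == 0 or k > len(y):
        raise ValueError("need between 1 and len(y) weights")
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return [
        sum(w * v for w, v in zip(weights, y[i:i + k])) / total_weight
        for i in range(len(y) - k + 1)
    ]


y = [1, 2, 3, 4]                          # illustrative values
wma = weighted_moving_averages(y, [1, 2, 1])
print(wma)  # [2.0, 3.0]
```

With equal weights the result coincides with the simple moving averages.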

Merits (Method of Moving Averages)

  • The method of moving averages is simple and easy.
  • This method is appropriate for removing seasonal variations, cyclical fluctuations, and irregular variations.

Demerits (Method of Moving Averages)

  • Some values at the beginning and the end of the series are lost.
  • Moving averages are greatly affected by extreme values.
  • The method does not provide a mathematical formula for the trend.

Example: Calculate 3-year simple moving averages for the following time series. Also, plot actual data and moving averages on a graph. Also, find the 3-year weighted moving averages with weights 2, 2, and 1, respectively.

| Year | 1970 | 1971 | 1972 | 1973 | 1974 | 1975 | 1976 | 1977 |
|---|---|---|---|---|---|---|---|---|
| Production | 170.0 | 154.8 | 156.5 | 158.9 | 140.3 | 154.2 | 160.7 | 178.3 |

Solution:

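| Year | Production | 3-Year Simple MT | 3-Year Simple MA | 3-Year WMT | 3-Year WMA |
|---|---|---|---|---|---|
| 1970 | 170.0 | | | | |
| 1971 | 154.8 | 481.3 | 160.43 | 806.1 | 161.22 |
| 1972 | 156.5 | 470.2 | 156.73 | 781.5 | 156.30 |
| 1973 | 158.9 | 455.7 | 151.90 | 771.1 | 154.22 |
| 1974 | 140.3 | 453.4 | 151.13 | 752.6 | 150.52 |
| 1975 | 154.2 | 455.2 | 151.73 | 749.7 | 149.94 |
| 1976 | 160.7 | 493.2 | 164.40 | 808.1 | 161.62 |
| 1977 | 178.3 | | | | |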

*MT=moving total, MA=moving averages, WMT=weighted MT, WMA=Weighted MA
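
The columns of the solution table can be recomputed with a short script. This is a sketch with assumed variable names; the 1972 production is taken as 156.5, the value used in the solution table, and results are rounded as in the table.

```python
production = [170.0, 154.8, 156.5, 158.9, 140.3, 154.2, 160.7, 178.3]  # 1970..1977
weights = [2, 2, 1]
k = len(weights)

simple_totals, simple_averages = [], []
weighted_totals, weighted_averages = [], []

for i in range(len(production) - k + 1):
    window = production[i:i + k]
    total = sum(window)
    weighted_total = sum(w * v for w, v in zip(weights, window))
    simple_totals.append(round(total, 1))
    simple_averages.append(round(total / k, 2))
    weighted_totals.append(round(weighted_total, 1))
    weighted_averages.append(round(weighted_total / sum(weights), 2))

print(simple_totals)
print(simple_averages)
print(weighted_totals)
print(weighted_averages)
```

Each result is placed against the middle year of its window, so the first value belongs to 1971 and the last to 1976.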


Method of Semi Averages (2020)

The secular trends can also be measured by the method of semi averages. The steps are:

  • Divide the time series data into two equal portions. If the number of observations is odd, either omit the middle value or include it in each half.
  • Take the average of each part and place these average values against the midpoints of the two parts.
  • Plot the semi-averages in the graph of the original values.
  • Draw the required trend line through these two plotted points and extend it to cover the whole period.
  • It is simple to compute the slope and $y$-intercept of the line drawn from two points. The trend values can be found from the semi-average trend line or by an estimated straight line as explained:

Let $y'_1$ and $y'_2$ be the semi-averages placed against the times $x_1$ and $x_2$, and suppose the estimated straight line $y'=a+bx$ passes through the points ($x_1$, $y'_1$) and ($x_2$, $y'_2$). The constants $a$ and $b$ are then easily determined. The equation of the line passing through these two points can be written as:

\begin{align*}
y' - y'_1 &= \frac{y'_2-y'_1}{x_2-x_1}(x-x_1)\\
&= b(x-x_1)\\
\Rightarrow y' &= (y'_1 - bx_1) + bx\\
&= a+bx, \quad \text{ where $a=y'_1-bx_1$}
\end{align*}

For an even number of observations, the slope of the trend line can be found as:

\begin{align*}
b&=\frac{1}{n/2}\left(\frac{S_2}{n/2} - \frac{S_1}{n/2} \right)\\
&= \frac{1}{n/2} \left(\frac{S_2-S_1}{n/2}\right)\\
&= \frac{4(S_2-S_1)}{n^2},
\end{align*}

where $S_1$ is the sum of $y$-values for the first half of the period, $S_2$ is the sum of $y$-values of the second half of the period, and $n$ is the number of time units covered by the time series.
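
The procedure can be sketched in code as follows. The function name `semi_average_trend` is an assumption, time is coded as $x = 0, 1, \ldots, n-1$, and for an odd number of observations the middle value is omitted (one of the two options listed in the steps). For an even $n$ the slope is compared with $4(S_2-S_1)/n^2$.

```python
def semi_average_trend(y):
    """Return (a, b) of the semi-average trend line y' = a + b*x, x = 0..n-1."""
    n = len(y)
    half = n // 2
    if half < 1:
        raise ValueError("need at least two observations")
    first = y[:half]
    second = y[n - half:]                 # skips the middle value when n is odd
    y1 = sum(first) / half                # first semi-average
    y2 = sum(second) / half               # second semi-average
    x1 = (half - 1) / 2                   # midpoint of the first half
    x2 = (n - half) + (half - 1) / 2      # midpoint of the second half
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b


y = [10, 12, 14, 16, 18, 20]              # illustrative even-length series
a, b = semi_average_trend(y)
n = len(y)
s1, s2 = sum(y[:n // 2]), sum(y[n // 2:])
b_formula = 4 * (s2 - s1) / n ** 2        # slope formula for even n
print(a, b, b_formula)
```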

The merits and demerits of the method of semi-averages are as follows:

Merits of Method of Semi Averages

  • The method of semi-averages is simple, easy, and quick.
  • It smooths out seasonal variations.
  • It gives a better approximation to the trend because it is based on a mathematical model.

Demerits of Method of Semi Averages

  • It is a rough and objective method.
  • The arithmetic mean used in Semi Average is greatly affected by very large or by very small values.
  • The method of semi-averages is applicable when the trend is linear. This method is not appropriate if the trend is not linear.

Numerical Example 1: Method of Semi Averages

The following table shows the property damaged by road accidents in Punjab for the years 1973 to 1979.

| Year | 1973 | 1974 | 1975 | 1976 | 1977 | 1978 | 1979 |
|---|---|---|---|---|---|---|---|
| Property Damage | 201 | 238 | 392 | 507 | 484 | 649 | 742 |

  1. Obtain the semi-averages trend line
  2. Find out the trend values.

Solution

Let $x=t-1973$

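| Year | Property Damaged | Semi Total | Semi Average | Coded Year | Trend Values |
|---|---|---|---|---|---|
| 1973 | 201 | | | 0 | $y'=190+87(0)=190$ |
| 1974 | 238 | 831 | 277 | 1 | $y'=190+87(1)=277$ |
| 1975 | 392 | | | 2 | $y'=190+87(2)=364$ |
| 1976 | 507 | | | 3 | $y'=190+87(3)=451$ |
| 1977 | 484 | | | 4 | $y'=190+87(4)=538$ |
| 1978 | 649 | 1875 | 625 | 5 | $y'=190+87(5)=625$ |
| 1979 | 742 | | | 6 | $y'=190+87(6)=712$ |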

\begin{align*}
y'_1 &= 277, \quad x_1 = 1, \quad y'_2 = 625, \quad x_2=5\\
b&=\frac{y'_2-y'_1}{x_2-x_1}=\frac{625-277}{5-1}=87\\
a&=y'_1 - bx_1 = 277-87(1)=190
\end{align*}

The semi-average trend line is $y'=190+87x$ (with the origin at 1973).
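
The arithmetic of this example can be checked with a short script. This is a sketch under one stated assumption: the 1978 value is taken as 649, the figure that makes the second semi-total equal to 1875 as used in the solution. The middle year 1976 is omitted from both halves.

```python
# Property damage for 1973..1979; the 1978 value (649) is an assumption
# chosen so that 484 + 649 + 742 = 1875, the semi-total in the solution.
damage = [201, 238, 392, 507, 484, 649, 742]
x = list(range(len(damage)))              # x = year - 1973

first_half = damage[:3]                   # 1973 to 1975
second_half = damage[4:]                  # 1977 to 1979 (1976 omitted)

y1 = sum(first_half) / len(first_half)    # placed against x = 1 (1974)
y2 = sum(second_half) / len(second_half)  # placed against x = 5 (1978)
x1, x2 = 1, 5

b = (y2 - y1) / (x2 - x1)
a = y1 - b * x1
trend = [a + b * xi for xi in x]

print(sum(first_half), sum(second_half))
print(a, b)
print(trend)
```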

Numerical Example 2: Method of Semi Averages

The following table gives the number of books (in thousands) sold at a bookstore for the years 1973 to 1981.

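| Year | 1973 | 1974 | 1975 | 1976 | 1977 | 1978 | 1979 | 1980 | 1981 |
|---|---|---|---|---|---|---|---|---|---|
| No. of Books Sold | 42 | 38 | 35 | 25 | 32 | 24 | 20 | 19 | 17 |
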
  1. Find the equation of the semi-average trend line
  2. Compute the trend values
  3. Estimate the number of books sold for the year 1982.

Solution

Let $x=t-1973$

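| Year | No. of books (y) | Semi Total | Semi Average | Coded year | Trend Values |
|---|---|---|---|---|---|
| 1973 | 42 | | | 0 | $y'=39.5 - 3(0)=39.5$ |
| 1974 | 38 | 140 | 35 | 1 | $y'=39.5 - 3(1)=36.5$ |
| 1975 | 35 | | | 2 | $y'=39.5 - 3(2)=33.5$ |
| 1976 | 25 | | | 3 | $y'=39.5 - 3(3)=30.5$ |
| 1977 | 32 | | | 4 | $y'=39.5 - 3(4)=27.5$ |
| 1978 | 24 | | | 5 | $y'=39.5 - 3(5)=24.5$ |
| 1979 | 20 | 80 | 20 | 6 | $y'=39.5 - 3(6)=21.5$ |
| 1980 | 19 | | | 7 | $y'=39.5 - 3(7)=18.5$ |
| 1981 | 17 | | | 8 | $y'=39.5 - 3(8)=15.5$ |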

\begin{align*}
y'_1 &= 35, \quad x_1=1.5, \quad y'_2=20, \quad x_2=6.5\\
b &= \frac{y'_2 - y'_1}{x_2-x_1} = \frac{20-35}{6.5-1.5} =-3\\
a &= y'_1 - bx_1 = 35 - (-3)(1.5) = 39.5\\
y'&= 39.5 - 3x \quad (\text{with origin at 1973})
\end{align*}

For the year 1982, the estimated number of books sold is $y'=39.5-3(9)=12.5$ thousand.
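
The same steps can be verified in code. This is a sketch with assumed variable names; the middle year 1977 is omitted, so each half contains four years and the semi-averages fall at the coded times 1.5 and 6.5.

```python
books = [42, 38, 35, 25, 32, 24, 20, 19, 17]   # 1973..1981, in thousands
x = list(range(len(books)))                     # x = year - 1973

first_half = books[:4]                          # 1973 to 1976
second_half = books[5:]                         # 1978 to 1981 (1977 omitted)

y1 = sum(first_half) / len(first_half)          # placed against x = 1.5
y2 = sum(second_half) / len(second_half)        # placed against x = 6.5
x1, x2 = 1.5, 6.5

b = (y2 - y1) / (x2 - x1)
a = y1 - b * x1
trend = [a + b * xi for xi in x]
forecast_1982 = a + b * (1982 - 1973)

print(a, b)
print(trend)
print(forecast_1982)
```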


The Method of Free Hand Curve (2020)

The secular trend is measured by the method of the free hand curve in the following steps:

  • Take the time periods along the $x$-axis by taking appropriate scaling
  • Plot the points for observed values of the $Y$ variable as the dependent variable against the given time periods
  • Join these plotted points by line segments to get a historigram
  • Draw a free-hand smooth curve (or a straight line) through the historigram

In this method, we plot the given time series data on graph paper and then draw a free-hand trend line through the plotted graph, following its general direction. The trend values are then read from this free-hand trend line.

It is generally preferred to use a curve instead of a straight line to show the secular trend.

Merits (Free Hand Curve)

  • The free-hand curve method is simple, easy, and quick for measuring secular trends.
  • A well-fitted trend line (or curve) approximates the trend closely based on a mathematical model.

Demerits (Free Hand Curve)

  • It is a rough and crude method.
  • It is greatly affected by personal bias, as different persons may fit different trends to the same data. The estimates are therefore not reliable.

Question: The following time series shows the number of road accidents in Punjab from 1972 to 1978.

| Year | 1972 | 1973 | 1974 | 1975 | 1976 | 1977 | 1978 |
|---|---|---|---|---|---|---|---|
| No. of Accidents | 2493 | 2638 | 2699 | 3038 | 3745 | 4079 | 4688 |

  • Obtain the historigram showing the number of road accidents and a free-hand trend line by drawing a straight line
  • Find the trend values for this time series

Solution:

Method of Free Hand Curve
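| Year | Value | Total | Mean | Trend value |
|---|---|---|---|---|
| 1972 | 2493 | | | 2200 |
| 1973 | 2638 | | | 2550 |
| 1974 | 2699 | | | 2950 |
| 1975 | 3038 | 23380 | 3340 | 3340 |
| 1976 | 3745 | | | 3650 |
| 1977 | 4079 | | | 4050 |
| 1978 | 4688 | | | 4499 |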

The method of free hand curve is useful for:

  1. Exploratory Data Analysis (EDA): As a preliminary step free hand curve method helps us to understand the basic characteristics of the data and identify potential relationships between variables.
  2. Visual Communication: It also helps to present trends in the data in a clear and easily understandable way for non-statistical audiences.
  3. Limited Data: When you have a relatively small dataset, a free hand curve might be sufficient to get a basic idea of the central tendency.

By understanding the method of free hand curves and its limitations, one can use it as a valuable tool for initial data exploration and visualization alongside other statistical techniques for a more robust analysis.
