Analysing the Secular Trends - Statistics for Data Science & Analytics

NonLinear Trends and Method of Least Squares

Sep 7, 2024 | Oct 23, 2020 by Muhammad Imdad Ullah

Secular Trend — Nonlinear Trends

When a straight line does not accurately describe the long-term movement of a time series, one may detect some curvature and decide to fit a curve instead of a straight line.

The curves most commonly used to describe a nonlinear secular trend in a time series are:

Exponential curve, and
Second-degree parabola

Exponential (Nonlinear) Curve

The exponential curve describes the trend (nonlinear) in a time series that changes by a constant percentage rate. The equation of the curve is $\hat{y} = ab^x$

Taking logarithm, we get the linear form $log\, \hat{y}=log\, a + (log\,b)x$

The method of least squares gives the normal equations:

\begin{align*}
\sum log\, y & = n\, log\, a + log\, b \sum x\\
\sum x\, log\, y & = log\, a \sum x + log\, b \sum x^2
\end{align*}
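
As a brief sketch of where these equations come from, write $A = log\, a$ and $B = log\, b$ (shorthand introduced here only for this derivation) and minimize the sum of squared deviations on the logarithmic scale. Setting both partial derivatives to zero gives the two normal equations:

\begin{align*}
S &= \sum (log\, y - A - Bx)^2\\
\frac{\partial S}{\partial A} &= -2\sum (log\, y - A - Bx) = 0 \quad \Rightarrow \quad \sum log\, y = nA + B \sum x\\
\frac{\partial S}{\partial B} &= -2\sum x\,(log\, y - A - Bx) = 0 \quad \Rightarrow \quad \sum x\, log\, y = A \sum x + B \sum x^2
\end{align*}

Note that this minimizes the squared deviations of $log\, y$, not of $y$ itself, so the fitted curve is a least-squares fit on the logarithmic scale.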

However, if $\sum x=0$ the normal equations become

\begin{align*}
\sum log\,y & = n\, log a\\
\sum x log\, y &= log\, b \sum x^2
\end{align*}

The values of $log\, a$ and $log\, b$ are

\begin{align*}
log\, a &=\frac{\sum log\, y}{n}\\
log\, b&= \frac{\sum x log\, y}{\sum x^2}
\end{align*}

Taking the $antilog$ of $log\, a$ and $log\, b$, we get the values of $a$ and $b$.

Question: The population of a country for the years 1911 to 1971 in ten yearly intervals in millions is 5.38, 7.22, 9.64, 12.70, 17.80, 24.02, and 31.34. (i) Fit a curve of the type $\hat{y}=ab^x$ to this time series and find the trend values, (ii) Forecast the population for the year 1991.

Solution

(i) We have $\overline{t}=\frac{(1911+1971)}{2}=1941$. Let $x=\frac{t-\overline{t}}{10}=\frac{t-1941}{10}$, so that the coded year number $x$ is measured in units of 10 years.

Year $t$	Population $y$	Coded Year $x=\frac{t-1941}{10}$	$log\, y$	$x\, log\, y$	$x^2$	$\hat{y}=13.029(1.345)^x$
1911	5.38	-3	0.73078	-2.19234	9	5.355
1921	7.22	-2	0.85854	-1.71708	4	7.202
1931	9.64	-1	0.98408	-0.98408	1	9.687
1941	12.70	0	1.10380	0	0	13.029
1951	17.80	1	1.25042	1.25042	1	17.524
1961	24.02	2	1.38057	2.76114	4	23.570
1971	31.34	3	1.49610	4.48830	9	31.701
Total	108.10	0	7.80429	3.60636	28	

The least squares exponential curve is $\hat{y} = ab^x$

Taking logarithm, $log\, \hat{y} = log a + (log\, b)x$

Since $\sum x=0$, we have

\begin{align*}
log\, a &= \frac{\sum log\, y}{n} = \frac{7.80429}{7}=1.1149\\
log\, b &= \frac{\sum x log\, y}{\sum x^2} = \frac{3.60636}{28}=0.12880\\
a &= antilog(1.1149)=13.029\\
b &= antilog(0.1288)=1.345\\
\hat{y} &=13.029 (1.345)^x,\quad \text{with origin at 1941 and unit of $x$ as 10 years}
\end{align*}

(ii) For $t=1991$ we have $x=\frac{t-1941}{10}= \frac{1991-1941}{10}=5$. Putting $x=5$ in the least squares exponential curve, we have
$\hat{y} = 13.029 (1.345)^5 \approx 57.35$ million
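
The calculation above can be checked with a short Python sketch. The function name `fit_exponential_trend` is a hypothetical helper introduced here, not part of the original post, and it assumes the coded $x$ values sum to zero so that the simplified formulas for $log\, a$ and $log\, b$ apply.

```python
import math

# Population (millions) for 1911, 1921, ..., 1971, as given in the question.
years = [1911, 1921, 1931, 1941, 1951, 1961, 1971]
population = [5.38, 7.22, 9.64, 12.70, 17.80, 24.02, 31.34]


def fit_exponential_trend(x, y):
    """Fit y_hat = a * b**x by least squares on log10(y); needs sum(x) == 0."""
    if abs(sum(x)) > 1e-9:
        raise ValueError("coded x values must sum to zero")
    logs = [math.log10(v) for v in y]
    log_a = sum(logs) / len(y)                                   # sum(log y) / n
    log_b = sum(xi * li for xi, li in zip(x, logs)) / sum(xi * xi for xi in x)
    return 10 ** log_a, 10 ** log_b                              # antilogs


# Code the years so that x = (t - 1941) / 10 and sum(x) = 0.
x = [(t - 1941) / 10 for t in years]
a, b = fit_exponential_trend(x, population)

trend = [a * b ** xi for xi in x]
forecast_1991 = a * b ** ((1991 - 1941) / 10)

print(f"a = {a:.3f}, b = {b:.3f}")
print("trend values:", [round(v, 3) for v in trend])
print(f"forecast for 1991 = {forecast_1991:.2f} million")
```

Because the script keeps $a$ and $b$ unrounded, its forecast comes out slightly higher (about 57.4 million) than the hand calculation, which uses the rounded values 13.029 and 1.345.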


Second Degree Parabola (Nonlinear Trend)

It describes the (nonlinear) trend in a time series where the change in the amount of change is constant per unit of time. The quadratic (parabolic) trend can be described by the equation

\begin{align*}
\hat{y} = a + bx + cx^2
\end{align*}

The method of least squares gives the normal equations as

\begin{align*}
\sum y &= na + b\sum x + c \sum x^2\\
\sum xy &= a\sum x + b\sum x^2 + c \sum x^3\\
\sum x^2y &= a \sum x^2 + b\sum x^3 + c\sum x^4
\end{align*}

However, if $\sum x = 0 = \sum x^3$, then the normal equations reduce to

\begin{align*}
\sum y &= na + c\sum x^2\\
\sum xy &= b\sum x^2\\
\sum x^2 y &= a \sum x^2 + c \sum x^4\\
& \text{The values of $a$, $b$, and $c$ are then}\\
c &= \frac{n \sum x^2 y - (\sum x^2)(\sum y)}{n \sum x^4 -(\sum x^2)^2}\\
a&=\frac{\sum y - c\sum x^2}{n}\\
b&= \frac{\sum xy}{\sum x^2}
\end{align*}
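
The reduced formulas can be sketched in Python as follows. The helper name `fit_parabola_centered` and the small data set are illustrative assumptions, not from the original post. Eliminating $a$ between the first and third reduced normal equations gives the denominator $n\sum x^4 - (\sum x^2)^2$ for $c$, which is what the code uses.

```python
def fit_parabola_centered(x, y):
    """Fit y_hat = a + b*x + c*x**2 when sum(x) == 0 and sum(x**3) == 0."""
    n = len(x)
    if abs(sum(x)) > 1e-9 or abs(sum(xi ** 3 for xi in x)) > 1e-9:
        raise ValueError("reduced formulas need sum(x) = 0 and sum(x^3) = 0")
    sx2 = sum(xi ** 2 for xi in x)
    sx4 = sum(xi ** 4 for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))
    c = (n * sx2y - sx2 * sy) / (n * sx4 - sx2 ** 2)
    a = (sy - c * sx2) / n
    b = sxy / sx2
    return a, b, c


# Illustrative data generated from y = 1 + 2x + 3x^2, so the fit should recover
# these coefficients (up to floating-point error).
x = [-2, -1, 0, 1, 2]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]
a, b, c = fit_parabola_centered(x, y)
print(a, b, c)
```

The explicit check on $\sum x$ and $\sum x^3$ is a design choice: with uncoded time values the reduced formulas would silently give wrong coefficients, so the sketch refuses to run instead.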

Question: Given the following time series

Year	1931	1933	1935	1937	1939	1941	1943	1945
Price Index	96	87	91	102	108	139	307	289

(i) Fit a second-degree parabola taking the origin in 1938.
(ii) Find the trend values.
(iii) What would have been the equation of the parabola if the origin were in 1933?

Solution

(i)

Year $t$	Price index $y$	Coded Year $x=t-1938$	$x^2$	$x^4$	$xy$	$x^2y$	Trend values $\hat{y}=110.2+15.48x+2.01 x^2$
1931	96	-7	49	2401	-672	4704	100.33
1933	87	-5	25	625	-435	2175	83.05
1935	91	-3	9	81	-273	819	81.85
1937	102	-1	1	1	-102	102	96.73
1939	108	1	1	1	108	108	127.69
1941	139	3	9	81	417	1251	174.73
1943	307	5	25	625	1535	7675	237.85
1945	289	7	49	2401	2023	14161	317.05
Total	1219	0	168	6216	2601	30995

The sums computed in the table give

\begin{align*}
\hat{y} &= a + b x + c x^2\\
c &= \frac{n\sum x^2 y-(\sum x^2)(\sum y)}{n \sum x^4 -(\sum x^2)^2} =\frac{8(30995)-(168)(1219)}{8(6216)-(168)^2}=\frac{43168}{21504}\approx 2.01\\
a &= \frac{\sum y - c \sum x^2}{n}=\frac{1219-(2.01)(168)}{8}\approx 110.2\\
b &= \frac{\sum xy}{\sum x^2}=\frac{2601}{168} \approx 15.48\\
\hat{y} &= 110.2 + 15.48x + 2.01x^2,\quad \text{with origin at the year 1938}
\end{align*}

(ii) The trend values, obtained by substituting each value of $x$ into the fitted parabola, are given in the last column of the table.

(iii) To shift the origin to 1933, replace $x$ by $(x-5)$:

\begin{align*}
\hat{y} &= 110.2 + 15.48(x-5)+2.01(x-5)^2\\
&= 110.2 + 15.48(x-5)+2.01(x^2 -10x + 25)\\
&= 110.2 + 15.48x -77.4 + 2.01x^2 - 20.1x + 50.25\\
&= 83.05 -4.62x + 2.01x^2, \quad \text{with origin at the year 1933}
\end{align*}
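
The origin shift is just a substitution, so it can be written once for any shift $h$. The sketch below uses a hypothetical helper `shift_origin` (not from the original post) that expands $a + b(x-h) + c(x-h)^2$ and collects the coefficients; it then checks that both forms of the parabola give the same trend value for the same calendar year.

```python
def shift_origin(a, b, c, h):
    """Coefficients of a + b*(x - h) + c*(x - h)**2 rewritten as a' + b'*x + c'*x**2."""
    return a - b * h + c * h ** 2, b - 2 * c * h, c


# Fitted parabola with origin at 1938; moving the origin to 1933 means h = 5.
a_new, b_new, c_new = shift_origin(110.2, 15.48, 2.01, 5)
print(round(a_new, 2), round(b_new, 2), round(c_new, 2))

# Both forms should give the same trend value for the same calendar year.
year = 1941
old = 110.2 + 15.48 * (year - 1938) + 2.01 * (year - 1938) ** 2
new = a_new + b_new * (year - 1933) + c_new * (year - 1933) ** 2
print(round(old, 2), round(new, 2))
```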

Merits of Least Squares

The method of least squares gives the most satisfactory measurement of the secular trend in a time series when the distribution of the deviations is approximately normal.
The least-squares estimates are unbiased estimates of the parameters.
The method can be used when the trend is linear, exponential, or quadratic.

Demerits of Least Squares

The method of least squares gives too much weight to extremely large deviations from the trend.
The least-squares line is the best only for the period to which it refers.
The elimination or addition of a few or more periods may change its position.


Method of Least Squares: Linear Trend (2020)

May 24, 2024 | Oct 17, 2020 by Muhammad Imdad Ullah

The least-squares principle (method of least squares) says that “the sum of squares of the deviations of the observed values from the corresponding expected values should be least”. Among all possible trend lines, the one for which this sum of squared deviations is least is called the least-squares fit.

Note that the usual probabilistic assumptions made in regression and correlation analysis are not met in the case of time series data.

Secular Trend — Linear Trend

It is useful to describe the trend in a time series where the amount of change is constant per unit of time.

Let $(x_1, y_1), (x_2, y_2), \cdots, (x_n,y_n)$ be the $n$ pairs of observed sample values of a time series variable $y$, with $x$ representing the coded time value. We can plot these $n$ points on a graph.

Let us suppose that we want to fit a straight line expressed in slope-intercept form as:

\begin{align}
\hat{y} = a + bx, \quad \quad (eq1)
\end{align}

The line (eq1) is called the least-squares line if it makes $\sum(y-a-bx)^2$ a minimum. The method of least squares yields the following normal equations:

\begin{align*}
\sum y &= na + b \sum x\\
\sum xy &= a \sum x + b \sum x^2
\end{align*}

The normal equations give the values of $a$ and $b$ as:

\begin{align*}
b &= \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 -(\sum x)^2}\\
a & = \overline{y}-b\overline{x}
\end{align*}

However, if $\sum x=0$ the usual normal equations reduce to

\begin{align*}
\sum y &= na\\
\sum xy & = b\sum x^2
\end{align*}

The values of $a$ and $b$ also reduce to

\begin{align*}
a&=\frac{\sum y}{n}=\overline{y}\\
b&=\frac{\sum xy}{\sum x^2}
\end{align*}

The trend values $\hat{y}$ are computed from the least-squares line $\hat{y}=a+bx$ by substituting the values of $x$ corresponding to the different time periods.
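
A minimal Python sketch of these formulas follows. The helper name `fit_linear_trend` and the five-point series are hypothetical. It uses the general formulas, so it works with or without coded time values, and it illustrates that coding the time about its mean leaves the slope unchanged while the intercept becomes $\overline{y}$.

```python
def fit_linear_trend(x, y):
    """Least-squares line y_hat = a + b*x from the general normal equations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2 = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a = sy / n - b * sx / n            # a = y_bar - b * x_bar
    return a, b


# Hypothetical series observed at uncoded times 1, 2, ..., 5.
t = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b = fit_linear_trend(t, y)

# Coding the time as x = t - mean(t) makes sum(x) = 0: the slope is unchanged
# and the intercept becomes the mean of y.
t_bar = sum(t) / len(t)
x = [ti - t_bar for ti in t]
a_coded, b_coded = fit_linear_trend(x, y)
print(a, b, a_coded, b_coded)
```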

Properties of the Method of Least Squares

The least-squares line always passes through the point ($\overline{x}, \overline{y}$) called the center of gravity of the data.
The sum of deviations $\sum(y-\hat{y})$ of the observed values $y$ from their corresponding expected values $\hat{y}$ is zero, that is, $\sum(y-\hat{y})=0$, hence $\sum y= \sum \hat{y}$
The sum of squares of the deviations $\sum (y-\hat{y})^2$ measures how well the trend line fits the data. A smaller $\sum (y-\hat{y})^2$ means a better fit.
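
The first two properties follow directly from the first normal equation $\sum y = na + b\sum x$, as this short derivation shows:

\begin{align*}
\sum (y-\hat{y}) &= \sum y - na - b\sum x = 0\\
\overline{y} &= \frac{\sum y}{n} = a + b\,\frac{\sum x}{n} = a + b\,\overline{x}
\end{align*}

The second line says that $\hat{y}=\overline{y}$ when $x=\overline{x}$, that is, the line passes through ($\overline{x}, \overline{y}$).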

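Moving Averages and Least Squares Linear Trend: If a least-squares line is fitted separately to each successive group of $k$ equally spaced observations, its trend value at the central time point of the group equals the $k$-period moving average, because the line passes through ($\overline{x}, \overline{y}$).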

Question: Determine the trend line by the least-squares method from the following data. Plot the actual values and the linear trend on the same graph.

Year	1945	1946	1947	1948	1949	1950	1951	1952	1953
Price	3	6	2	10	7	9	14	12	18

Solution

The equation of the trend line is

\begin{align*}
\hat{Y} = a + b\, X
\end{align*}

Normal Equations are:

\begin{align*}
\Sigma Y & = n\, a + b \, \Sigma X \tag{i}\\
\Sigma XY& = a\, \Sigma X+ b\, \Sigma X^2 \tag{ii}
\end{align*}

Year	Value	$X$	$XY$	$X^2$	$\hat{Y}$
1945	3	-4	-12	16	2.2
1946	6	-3	-18	9	3.9
1947	2	-2	-4	4	5.6
1948	10	-1	-10	1	7.3
1949	7	0	0	0	9.0
1950	9	1	9	1	10.7
1951	14	2	28	4	12.4
1952	12	3	36	9	14.1
1953	18	4	72	16	15.8
Total	81	0	101	60	81.0

Putting the values in Normal equations:

\begin{align*}
81 &= 9a \tag*{1}\\
101&= 60b \tag*{2}
\end{align*}

From (1) $a=\frac{81}{9}=9$, and from (2) $b=\frac{101}{60}\approx 1.7$.

Fitted trend line is $\hat{Y}=9 + 1.7\,X$.
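
The worked example can be reproduced with a few lines of Python. This is a sketch of the same arithmetic, with the years coded about the middle year 1949 so that $\Sigma X = 0$. It keeps $b = 101/60 \approx 1.683$ unrounded, so its trend values differ slightly from the tabulated ones, which use $b = 1.7$.

```python
years = [1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953]
price = [3, 6, 2, 10, 7, 9, 14, 12, 18]

# Code the years about the middle year so that sum(X) = 0.
mid = years[len(years) // 2]                      # 1949
X = [t - mid for t in years]

a = sum(price) / len(price)                       # sum(Y) / n
b = sum(x * y for x, y in zip(X, price)) / sum(x * x for x in X)  # sum(XY) / sum(X^2)

trend = [a + b * x for x in X]
print(f"a = {a}, b = {b:.4f}")
print([round(v, 2) for v in trend])
print("sum of trend values:", round(sum(trend), 6))
```

The last line checks the property $\sum Y = \sum \hat{Y}$: the trend values should add up to the same total as the observed prices.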

The Method of Least Squares: Linear Trend

The method of least squares is a valuable tool for analyzing trends in time series data. By understanding the strengths and limitations of the methods, you can effectively use them to gain insights, make predictions, and compare trends across different time series datasets.


The Method of Moving Averages (2020)

Apr 8, 2024 | Oct 9, 2020 by Muhammad Imdad Ullah

The method of moving averages is of two types:

Simple Moving Averages
Weighted Moving Averages

Simple Moving Averages

If the observed values of a variable $Y$ are $y_1, y_2,\cdots, y_n$ corresponding to the time periods $t_1, t_2,\cdots, t_n$, respectively, the $k$-period simple moving averages are defined as

\begin{align*}
a_1 &= \frac{1}{k} \sum_{i=1}^{k} y_i\\
a_2 &= \frac{1}{k} \sum_{i=2}^{k+1} y_i,\\
a_3 &= \frac{1}{k} \sum_{i=3}^{k+2} y_i \\
\vdots &= \quad \vdots\\
a_m &= \frac{1}{k} \sum_{i=m}^{n} y_i
\end{align*}

where $a_1, a_2, \cdots, a_m$, with $m=n-k+1$, is the sequence of $k$-period simple moving averages. That is, the $k$-period simple moving averages are calculated by averaging the first $k$ observations and then repeating the process, each time dropping the first observation of the group and including the next one. This continues until the last $k$ observations have been averaged. For example, the 3-period simple moving averages are given as:

\begin{align*}
a_1 &= \frac{1}{3} (y_1+y_2+y_3) = \frac{1}{3} \sum_{i=1}^{3} y_i\\
a_2 &= \frac{1}{3} (y_2+y_3+y_4) = \frac{1}{3} \sum_{i=2}^{4} y_i\\
a_3 &= \frac{1}{3} (y_3+y_4+y_5) = \frac{1}{3} \sum_{i=3}^{5} y_i\\
\vdots &= \quad \vdots\\
\text{and so on}
\end{align*}

Each of these simple moving averages of the sequence $a_1, a_2, a_3,\cdots$ is placed against the middle of each successive group. The $k$-period moving successive totals $S_1, S_2, S_3, \cdots$ are obtained by the following relations

\begin{align*}
S_1 &= \sum_{i=1}^{k} y_i\\
S_2 &= S_1+ y_{k+1}-y_1\\
S_3 &= S_2 + y_{k+2} - y_2\\
\vdots &= \quad \vdots\\
\text{so on}
\end{align*}

The $k$-period simple moving averages are obtained by dividing these $k$-period moving successive totals ($S_1, S_2, S_3, \cdots$) by $k$, as given in the following relations

\begin{align*}
a_1 &= \frac{S_1}{k}\\
a_2 &= a_1 + \frac{y_{k+1}-y_1}{k}\\
a_3 &= a_2 + \frac{y_{k+2} -y_2}{k}\\
\vdots &= \quad \vdots\\
\text{so on}
\end{align*}
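
The running-total relations translate directly into code. The sketch below uses a hypothetical function name, `simple_moving_averages`, and an illustrative series. It keeps one moving total and updates it by adding the entering value and subtracting the leaving value, instead of re-summing every window.

```python
def simple_moving_averages(y, k):
    """k-period simple moving averages computed from running totals."""
    if not 1 <= k <= len(y):
        raise ValueError("k must be between 1 and len(y)")
    total = sum(y[:k])                 # S_1
    averages = [total / k]
    for j in range(k, len(y)):
        total += y[j] - y[j - k]       # next total = previous + entering - leaving
        averages.append(total / k)
    return averages


# Illustrative series; with n = 6 and k = 3 there are n - k + 1 = 4 averages.
y = [4, 7, 10, 13, 7, 4]
print(simple_moving_averages(y, 3))
```

For odd $k$, the $j$-th average belongs to the middle period of its window, so with $k=3$ the first value is placed against the second time period.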

When $k$ is odd, the sequence $a_1, a_2, a_3, \cdots$ will be placed against the middle of its time-period.
When $k$ is even, the sequence $a_1, a_2, a_3, \cdots$ of simple moving averages will be placed in the middle of two time periods. It is necessary to centralize these averages. For centralization, further 2-period moving averages of the former $k$-period moving averages are computed which are called $k$-period centered moving averages.
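
The centering step for even $k$ can be sketched as below. The function name `centered_moving_averages` and the six-value series are assumptions for illustration. The first pass computes the ordinary $k$-period averages, which fall between two periods, and the second pass averages adjacent pairs so that each result lines up with an actual period.

```python
def centered_moving_averages(y, k):
    """Centered moving averages for an even period k."""
    if k % 2 != 0:
        raise ValueError("centering is only needed when k is even")
    # Step 1: ordinary k-period moving averages (each falls between two periods).
    ma = [sum(y[i:i + k]) / k for i in range(len(y) - k + 1)]
    # Step 2: 2-period moving averages of those, aligned with actual periods.
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]


# Illustrative values; with k = 4 the first centered average belongs to period 3.
y = [10, 12, 14, 16, 18, 20]
print(centered_moving_averages(y, 4))
```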

Weighted Moving Averages

For observed values ($y_1, y_2, \cdots, y_n$) of a variable $Y$ corresponding to the time periods $t_1, t_2, \cdots, t_n$, respectively, the $k$-period weighted moving averages with weights $w_1, w_2, \cdots, w_k$ are defined as

\begin{align*}
a_1 &= \frac{1}{\sum w} \sum_{i=1}^{k} w_i\, y_i\\
a_2 &= \frac{1}{\sum w} \sum_{i=1}^{k} w_i\, y_{i+1}\\
a_3 &= \frac{1}{\sum w} \sum_{i=1}^{k} w_i\, y_{i+2}\\
\vdots &= \vdots\\
a_m &= \frac{1}{\sum w} \sum_{i=1}^{k} w_i\, y_{i+m-1}
\end{align*}

where $a_1, a_2, \cdots, a_m$ is a sequence of $k$-period weighted moving averages with weights $w_1, w_2, \cdots, w_k$, respectively. The $k$-period weighted moving averages are calculated by taking the weighted average of the first $k$ observed values with weights $w_1, w_2, \cdots, w_k$ and then repeating this process of averaging the $k$ observations by dropping each time the first observation and including the next one. This process is continued until the last $k$ observations have been averaged.
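
A minimal Python sketch of the weighted version is given below. The function name `weighted_moving_averages` and the sample series are hypothetical, and the sketch assumes the first weight applies to the oldest observation in each group of $k$, matching the order $w_1, w_2, \cdots, w_k$ in the definition.

```python
def weighted_moving_averages(y, weights):
    """Weighted moving averages; weights[0] applies to the oldest value in each window."""
    k = len(weights)
    if k == 0 or k > len(y):
        raise ValueError("need 1 <= len(weights) <= len(y)")
    total_weight = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, y[i:i + k])) / total_weight
        for i in range(len(y) - k + 1)
    ]


# Illustrative series with weights 1, 2, 1 (heaviest weight on the middle value).
y = [2, 4, 6, 10, 8]
print(weighted_moving_averages(y, [1, 2, 1]))
```

With equal weights the function reduces to the simple moving average, which gives a quick sanity check on any implementation.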

Merits (Method of Moving Averages)

The method of moving averages is simple and easy.
This method is appropriate for removing seasonal variations, cyclical fluctuations, and irregular variations.

Demerits (Method of Moving Averages)

Some values at the beginning and the end of the series are lost.
Moving averages are greatly affected by extreme values.
The method does not provide a mathematical formula for the trend.

Example: Calculate 3-year simple moving averages for the following time series. Also, plot actual data and moving averages on a graph. Also, find the 3-year weighted moving averages with weights 2, 2, and 1, respectively.

Year	1970	1971	1972	1973	1974	1975	1976	1977
Production	170.0	154.8	156.5	158.9	140.3	154.2	160.7	178.3

Solution:

Year	Production	3-Year Simple MT	3-Year Simple MA	3-Year WMT	3-Year WMA
1970	170.0
1971	154.8	481.3	160.43	806.1	161.22
1972	156.5	470.2	156.73	781.5	156.30
1973	158.9	455.7	151.90	771.1	154.22
1974	140.3	453.4	151.13	752.6	150.52
1975	154.2	455.2	151.73	749.7	149.94
1976	160.7	493.2	164.40	808.1	161.62
1977	178.3

*MT=moving total, MA=moving averages, WMT=weighted MT, WMA=Weighted MA
