Nonlinear Trends and the Method of Least Squares

When a straight line does not accurately describe the long-term movement of a time series, one may detect some curvature and decide to fit a curve instead of a straight line.

The curves most commonly used to describe a nonlinear secular trend in a time series are:

  1. Exponential curve, and
  2. Second-degree parabola

Exponential (Nonlinear) Curve

The exponential curve describes the trend (nonlinear) in a time series that changes by a constant percentage rate. The equation of the curve is $\hat{y} = ab^x$

Taking logarithms of both sides, we get the linear form $log\, \hat{y}=log\, a + (log\,b)x$

The method of least squares gives the normal equations:

\begin{align*}
\sum log\, y & = n\, log\, a + log\, b \sum x\\
\sum x\, log\, y & = log\, a \sum x + log\, b \sum x^2
\end{align*}

However, if $\sum x=0$, the normal equations become

\begin{align*}
\sum log\,y & = n\, log a\\
\sum x log\, y &= log\, b \sum x^2
\end{align*}

The values of $log\, a$ and $log\, b$ are

\begin{align*}
log\, a &=\frac{\sum log\, y}{n}\\
log\, b&= \frac{\sum x log\, y}{\sum x^2}
\end{align*}

Taking the antilogarithms of $log\, a$ and $log\, b$, we get the values of $a$ and $b$.
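The log-linear fit described above can be sketched in a few lines of Python. This is only an illustration: the function name `fit_exponential_trend` and the synthetic data are assumptions made for the example, not part of the original text. It solves the two normal equations for $log\, a$ and $log\, b$ in general, so it also covers the special case $\sum x = 0$.

```python
import math

def fit_exponential_trend(x, y):
    """Fit y_hat = a * b**x by least squares on log10(y) (hypothetical helper)."""
    n = len(x)
    log_y = [math.log10(v) for v in y]
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sl = sum(log_y)
    sxl = sum(xi * li for xi, li in zip(x, log_y))
    # Solve the two normal equations for log b and log a.
    log_b = (n * sxl - sx * sl) / (n * sxx - sx ** 2)
    log_a = (sl - log_b * sx) / n
    # Take antilogs to recover a and b.
    return 10 ** log_a, 10 ** log_b

# Synthetic check: data generated exactly from y = 2 * 1.5**x
x = [-2, -1, 0, 1, 2]
y = [2 * 1.5 ** v for v in x]
a, b = fit_exponential_trend(x, y)
print(round(a, 6), round(b, 6))
```

Because the synthetic data lie exactly on an exponential curve, the fit should recover $a$ and $b$ up to floating-point error.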

Question: The population of a country for the years 1911 to 1971 in ten yearly intervals in millions is 5.38, 7.22, 9.64, 12.70, 17.80, 24.02, and 31.34. (i) Fit a curve of the type $\hat{y}=ab^x$ to this time series and find the trend values, (ii) Forecast the population for the year 1991.

Solution

(i) We have $\overline{t}=\frac{1911+1971}{2}=1941$. Let $x=\frac{t-\overline{t}}{10}=\frac{t-1941}{10}$, so that the coded year number $x$ is measured in units of 10 years.

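| Year $t$ | Population $y$ | Coded Year $x=\frac{t-1941}{10}$ | $log\, y$ | $x\, log\, y$ | $x^2$ | $\hat{y}=13.029(1.345)^x$ |
|---|---|---|---|---|---|---|
| 1911 | 5.38 | -3 | 0.73078 | -2.19234 | 9 | 5.355 |
| 1921 | 7.22 | -2 | 0.85854 | -1.71708 | 4 | 7.202 |
| 1931 | 9.64 | -1 | 0.98408 | -0.98408 | 1 | 9.687 |
| 1941 | 12.70 | 0 | 1.10380 | 0 | 0 | 13.029 |
| 1951 | 17.80 | 1 | 1.25042 | 1.25042 | 1 | 17.524 |
| 1961 | 24.02 | 2 | 1.38057 | 2.76114 | 4 | 23.570 |
| 1971 | 31.34 | 3 | 1.49610 | 4.48830 | 9 | 31.701 |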

The least squares exponential curve is $\hat{y} = ab^x$

Taking logarithm, $log\, \hat{y} = log a + (log\, b)x$

Since $\sum x=0$, we have

\begin{align*}
log\, a &= \frac{\sum log\, y}{n} = \frac{7.80429}{7}=1.1149\\
log\, b &= \frac{\sum x log\, y}{\sum x^2} = \frac{3.60636}{28}=0.12880\\
a &= antilog(1.1149)=13.029\\
b &= antilog(0.1288)=1.345\\
\hat{y} &=13.029 (1.345)^x,\quad \text{with origin at 1941 and unit of $x$ as 10 years}
\end{align*}

(ii) For $t=1991$ we have $x=\frac{t-1941}{10}= \frac{1991-1941}{10}=5$. Putting $x=5$ in the least squares exponential curve, we have
$\hat{y} = 13.029 (1.345)^5 = 57.348$ million
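The worked example can be reproduced numerically. The following is a minimal sketch using only the population figures from the question; the variable names are chosen for illustration, and it relies on $\sum x = 0$ for the coded years.

```python
import math

years = [1911, 1921, 1931, 1941, 1951, 1961, 1971]
population = [5.38, 7.22, 9.64, 12.70, 17.80, 24.02, 31.34]  # millions

# Coded years -3..3 with origin at 1941 and a unit of 10 years, so sum(x) == 0.
x = [(t - 1941) // 10 for t in years]

log_y = [math.log10(v) for v in population]
log_a = sum(log_y) / len(log_y)
log_b = sum(xi * li for xi, li in zip(x, log_y)) / sum(xi * xi for xi in x)
a, b = 10 ** log_a, 10 ** log_b

trend = [a * b ** xi for xi in x]          # trend values for 1911..1971
forecast_1991 = a * b ** ((1991 - 1941) / 10)  # x = 5

print(round(a, 3), round(b, 3))
print([round(v, 3) for v in trend])
print(round(forecast_1991, 2))
```

The code does not round $a$ and $b$ before raising $b$ to the fifth power, so the forecast it prints differs slightly from the hand-computed value, which uses $b = 1.345$.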


Second Degree Parabola (Nonlinear Trend)

It describes the trend (nonlinear) in a time series where the change in the amount of change (the second difference) is constant per unit of time. The quadratic (parabolic) trend can be described by the equation

\begin{align*}
\hat{y} = a + bx + cx^2
\end{align*}

The method of least squares gives the normal equations as

\begin{align*}
\sum y &= na + b\sum x + c \sum x^2\\
\sum xy &= a\sum x + b\sum x^2 + c \sum x^3\\
\sum x^2y &= a \sum x^2 + b\sum x^3 + c\sum x^4
\end{align*}

However, if $\sum x = 0 = \sum x^3$, the normal equations reduce to

\begin{align*}
\sum y &= na + c\sum x^2\\
\sum xy &= b\sum x^2\\
\sum x^2 y &= a \sum x^2 + c \sum x^4
\end{align*}

The values of $a$, $b$, and $c$ can then be found as

\begin{align*}
c &= \frac{n \sum x^2 y - (\sum x^2)(\sum y)}{n \sum x^4 -(\sum x^2)^2}\\
a&=\frac{\sum y - c\sum x^2}{n}\\
b&= \frac{\sum xy}{\sum x^2}
\end{align*}
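As an illustration of the reduced normal equations, here is a minimal Python sketch. The function name `fit_centered_parabola` and the synthetic data are assumptions for the example. The denominator of $c$ is $n\sum x^4 - (\sum x^2)^2$, which is what results from eliminating $a$ between the first and third equations.

```python
def fit_centered_parabola(x, y):
    """Fit y_hat = a + b*x + c*x**2, assuming sum(x) == 0 and sum(x**3) == 0."""
    if sum(x) != 0 or sum(v ** 3 for v in x) != 0:
        raise ValueError("x must be coded so that sum(x) and sum(x**3) are zero")
    n = len(x)
    sx2 = sum(v ** 2 for v in x)
    sx4 = sum(v ** 4 for v in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))
    c = (n * sx2y - sx2 * sy) / (n * sx4 - sx2 ** 2)
    a = (sy - c * sx2) / n
    b = sxy / sx2
    return a, b, c

# Synthetic check: points lying exactly on y = 1 + 2x + 3x^2
x = [-2, -1, 0, 1, 2]
y = [1 + 2 * v + 3 * v ** 2 for v in x]
print(fit_centered_parabola(x, y))  # (1.0, 2.0, 3.0)
```

The guard at the top makes the centering assumption explicit; for uncoded $x$ values the full three-equation system would have to be solved instead.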

Question: Given the following time series

| Year | 1931 | 1933 | 1935 | 1937 | 1939 | 1941 | 1943 | 1945 |
|---|---|---|---|---|---|---|---|---|
| Price Index | 96 | 87 | 91 | 102 | 108 | 139 | 307 | 289 |

  1. Fit a second-degree parabola taking the origin in 1938.
  2. Find the trend values.
  3. What would have been the equation of the parabola if the origin were in 1933?

Solution

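(i) The computations are arranged in the following table.

| Year $t$ | Price index $y$ | Coded Year $x=t-1938$ | $x^2$ | $x^4$ | $xy$ | $x^2y$ | Trend values $\hat{y}=110.2+15.48x+2.01 x^2$ |
|---|---|---|---|---|---|---|---|
| 1931 | 96 | -7 | 49 | 2401 | -672 | 4704 | 100.33 |
| 1933 | 87 | -5 | 25 | 625 | -435 | 2175 | 83.05 |
| 1935 | 91 | -3 | 9 | 81 | -273 | 819 | 81.85 |
| 1937 | 102 | -1 | 1 | 1 | -102 | 102 | 96.73 |
| 1939 | 108 | 1 | 1 | 1 | 108 | 108 | 127.69 |
| 1941 | 139 | 3 | 9 | 81 | 417 | 1251 | 174.73 |
| 1943 | 307 | 5 | 25 | 625 | 1535 | 7675 | 237.85 |
| 1945 | 289 | 7 | 49 | 2401 | 2023 | 14161 | 317.05 |
| Total | 1219 | 0 | 168 | 6216 | 2601 | 30995 | |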

The least squares parabola is $\hat{y} = a + bx + cx^2$. Since $\sum x = 0 = \sum x^3$, we have

\begin{align*}
c &= \frac{n\sum x^2 y-(\sum x^2)(\sum y)}{n \sum x^4 -(\sum x^2)^2} =\frac{8(30995)-(168)(1219)}{8(6216)-(168)^2}=2.01\\
a &= \frac{\sum y - c \sum x^2}{n}=\frac{1219-(2.01)(168)}{8}=110.2\\
b &= \frac{\sum xy}{\sum x^2}=\frac{2601}{168} = 15.48\\
\hat{y} &= 110.2 + 15.48x + 2.01x^2,\quad \text{with origin at the year 1938}
\end{align*}

(ii) Substituting each value of $x$ into this equation gives the trend values, which are shown in the last column of the table above.
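The arithmetic of this example can be checked with a short Python sketch. It uses only the price index data from the question; the variable names are illustrative assumptions.

```python
years = [1931, 1933, 1935, 1937, 1939, 1941, 1943, 1945]
price_index = [96, 87, 91, 102, 108, 139, 307, 289]

x = [t - 1938 for t in years]  # coded years, origin at 1938
n = len(x)

# Column totals from the computation table.
sy = sum(price_index)
sx2 = sum(v ** 2 for v in x)
sx4 = sum(v ** 4 for v in x)
sxy = sum(xi * yi for xi, yi in zip(x, price_index))
sx2y = sum(xi ** 2 * yi for xi, yi in zip(x, price_index))

c = (n * sx2y - sx2 * sy) / (n * sx4 - sx2 ** 2)
a = (sy - c * sx2) / n
b = sxy / sx2

# Trend values from the unrounded coefficients.
trend = [a + b * xi + c * xi ** 2 for xi in x]

print(sy, sx2, sx4, sxy, sx2y)
print(round(a, 2), round(b, 2), round(c, 2))
print([round(v, 2) for v in trend])
```

Since the coefficients are not rounded to two decimals before computing the trend, the printed trend values can differ from the table in the last digits.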

(iii) To shift the origin to 1933, which lies 5 years before 1938, replace $x$ by $(x-5)$:

\begin{align*}
\hat{y} &= 110.2 + 15.48(x-5)+2.01(x-5)^2\\
&= 110.2 + 15.48(x-5)+2.01(x^2 -10x + 25)\\
&= 110.2 + 15.48x -77.4 + 2.01x^2 - 20.1x + 50.25\\
&= 83.05 -4.62x + 2.01x^2, \quad \text{with origin at the year 1933}
\end{align*}
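The expansion above follows a general pattern: replacing $x$ by $(x-h)$ in $a + bx + cx^2$ gives the constant $a - bh + ch^2$, the linear coefficient $b - 2ch$, and the same $c$. A small Python sketch of this, with the hypothetical helper name `shift_parabola_origin`, is:

```python
def shift_parabola_origin(a, b, c, h):
    """Expand a + b*(x - h) + c*(x - h)**2 in powers of x (hypothetical helper)."""
    new_a = a - b * h + c * h ** 2
    new_b = b - 2 * c * h
    return new_a, new_b, c

# Origin moved from 1938 to 1933, i.e. h = 5 years.
new_a, new_b, new_c = shift_parabola_origin(110.2, 15.48, 2.01, 5)
print(round(new_a, 2), round(new_b, 2), round(new_c, 2))  # 83.05 -4.62 2.01
```

The new constant equals the trend value for 1933 in the table, as it should, because $x = 0$ now corresponds to that year.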

Merits of Least Squares

  • The method of least squares gives the most satisfactory measurement of the secular trend in a time series when the distribution of the deviations is approximately normal.
  • The least-squares estimates are unbiased estimates of the parameters.
  • The method can be used when the trend is linear, exponential, or quadratic.

Demerits of Least Squares

  • The method of least squares gives too much weight to extremely large deviations from the trend.
  • The least-squares line is the best only for the period to which it refers.
  • The elimination or addition of a few periods may change its position.
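The sensitivity to extreme values can be seen in a small experiment. The sketch below is an illustration on made-up data, not an example from the original text: it fits a straight-line trend, using the standard centered formulas $a = \sum y / n$ and $b = \sum xy / \sum x^2$ for $\sum x = 0$, before and after one period is replaced by an extreme value.

```python
def fit_line(x, y):
    """Least-squares line y_hat = a + b*x for coded x with sum(x) == 0."""
    a = sum(y) / len(y)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return a, b

x = [-3, -2, -1, 0, 1, 2, 3]
clean = [10 + 2 * xi for xi in x]        # exactly linear, slope 2
spiked = clean[:-1] + [clean[-1] + 28]   # one extreme value in the last period

print(fit_line(x, clean))   # (10.0, 2.0)
print(fit_line(x, spiked))  # (14.0, 5.0)
```

A single extreme observation out of seven moves the fitted slope from 2 to 5, which is the kind of distortion the first demerit refers to.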
