Time series analysis deals with data observed over time-related units such as minutes, days, months, quarters, or years. Time series data are recorded at successive periods or intervals; that is, a time series is a set of observations on the values that a variable takes at different times.
Real-World Applications of Time Series Analysis
Finance: Predicting stock prices, and analyzing market trends.
Sales and Marketing: Forecasting demand, and planning promotions.
Supply Chain Management: Optimizing inventory levels, and predicting product needs.
Healthcare: Monitoring patient health trends, and predicting disease outbreaks.
Environmental Science: Forecasting weather patterns, and analyzing climate change.
We have to find a way of isolating and measuring the seasonal variations. There are two reasons for isolating and measuring the effect of seasonal variations.
To study the changes brought by seasons in the values of the given variable in a time series
To remove it from the time series to determine the value of the variable
When the values of a particular season are summed over several years, the irregular variations tend to cancel each other, because they arise from independent random disturbances. If we also eliminate the effect of trend and cyclical variations, only the seasonal variations remain, and these are expressed as a percentage of their average.
Seasonal Variations
A study of seasonal variation leads to more realistic planning of production and purchases etc.
Seasonal Index Method
When the effect of the trend has been eliminated, we can calculate a measure of seasonal variation known as the seasonal index. A seasonal index is simply the average of the monthly or quarterly values for different years, expressed as a percentage of the average of all the monthly or quarterly values.
The following methods are used to estimate seasonal variations.
Average percentage method (simple average method)
Link relative method
Ratio to the trend of short-time values
Ratio to the trend of long-time averages projected to short times
Ratio to moving average
The Simple Average Method
Assume the series is expressed as
$$Y=TSCI$$
Consider the long-time averages as trend values and eliminate the trend element by expressing a short-time observed value as a percentage of the corresponding long-time average. In the multiplicative model, we obtain
\begin{align*} \frac{\text{short time observed value} }{\text{long time average}}\times 100 &= \frac{TSCI}{T}\times 100\\ &=SCI\times 100 \end{align*}
This percentage of the long-time average represents the seasonal (S), the cyclical (C), and the irregular (I) component.
Once $SCI$ is obtained, we try to remove $CI$ as much as possible from $SCI$. This is done by arranging these percentages season-wise for all the long times (say years) and taking the modified arithmetic mean for each season by ignoring both the smallest and the largest percentages. These would be seasonal indices.
If the average of these indices is not 100, then the adjustment can be made, by expressing these seasonal indices as the percentage of their arithmetic mean. The adjustment factor would be
\begin{align*} \frac{100}{\text{Mean of Seasonal Indices}} \rightarrow \frac{400}{\text{sum of quarterly indices}} \,\, \text{ or } \frac{1200}{\text{sum of monthly indices}} \end{align*}
Example of Seasonal Variations
Question: The following data give the number of automobiles sold.
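| Year | Quarter 1 | Quarter 2 | Quarter 3 | Quarter 4 |
|------|-----------|-----------|-----------|-----------|
| 1981 | 250 | 278 | 315 | 288 |
| 1982 | 247 | 265 | 301 | 285 |
| 1983 | 261 | 285 | 353 | 373 |
| 1984 | 300 | 325 | 370 | 343 |
| 1985 | 281 | 317 | 381 | 374 |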
Calculate the seasonal indices by the average percentage method.
Solution:
First, we obtain the yearly (long-term) averages
| Year | 1981 | 1982 | 1983 | 1984 | 1985 |
|------|------|------|------|------|------|
| Year Total | 1131 | 1098 | 1272 | 1338 | 1353 |
| Yearly Average | 1131/4 = 282.75 | 274.50 | 318.00 | 334.50 | 338.25 |
Next, we divide each quarterly value by the corresponding yearly average and express the results as percentages. That is,
| Year | Quarter 1 | Quarter 2 | Quarter 3 | Quarter 4 | Total |
|------|-----------|-----------|-----------|-----------|-------|
| 1981 | $\frac{250}{282.75}\times 100=88.42$ | $\frac{278}{282.75}\times 100=98.32^*$ | $\frac{315}{282.75}\times 100=111.41$ | $\frac{288}{282.75}\times 100=101.86^*$ | |
| 1982 | $\frac{247}{274.50}\times 100=89.98^*$ | $\frac{265}{274.50}\times 100=96.54$ | $\frac{301}{274.50}\times 100=109.65^*$ | $\frac{285}{274.50}\times 100=103.83$ | |
| 1983 | $\frac{261}{318.00}\times 100=82.08^*$ | $\frac{285}{318.00}\times 100=89.62^*$ | $\frac{353}{318.00}\times 100=111.01$ | $\frac{373}{318.00}\times 100=117.30^*$ | |
| 1984 | $\frac{300}{334.50}\times 100=89.69$ | $\frac{325}{334.50}\times 100=97.16$ | $\frac{370}{334.50}\times 100=110.61$ | $\frac{343}{334.50}\times 100=102.54$ | |
| 1985 | $\frac{281}{338.25}\times 100=83.07$ | $\frac{317}{338.25}\times 100=93.72$ | $\frac{381}{338.25}\times 100=112.64^*$ | $\frac{374}{338.25}\times 100=110.57$ | |
| Total (modified) | 261.18 | 287.42 | 333.03 | 316.94 | |
| Mean (modified) | $\frac{261.18}{3}=87.06$ | $\frac{287.42}{3}=95.81$ | $\frac{333.03}{3}=111.01$ | $\frac{316.94}{3}=105.65$ | 399.52 |
An asterisk (*) marks the smallest and largest values in each quarter; these are not included in the modified total.
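The worked example can be checked with a short script. The following is a minimal sketch in plain Python; the function names (`percent_of_yearly_average`, `modified_mean`, `seasonal_indices`) are illustrative assumptions and do not come from any library. It also applies the adjustment factor (400 divided by the sum of the quarterly indices) described earlier, so that the final indices sum to 400.

```python
# Quarterly automobile sales from the worked example, one row per year.
sales = {
    1981: [250, 278, 315, 288],
    1982: [247, 265, 301, 285],
    1983: [261, 285, 353, 373],
    1984: [300, 325, 370, 343],
    1985: [281, 317, 381, 374],
}


def percent_of_yearly_average(rows):
    """Express each quarterly value as a percentage of its yearly average."""
    result = {}
    for year, values in rows.items():
        yearly_avg = sum(values) / len(values)
        result[year] = [100 * v / yearly_avg for v in values]
    return result


def modified_mean(values):
    """Arithmetic mean after dropping the smallest and the largest value."""
    trimmed = sorted(values)[1:-1]
    return sum(trimmed) / len(trimmed)


def seasonal_indices(rows):
    """Return (unadjusted, adjusted) seasonal indices, one per season."""
    percentages = percent_of_yearly_average(rows)
    n_seasons = len(next(iter(rows.values())))
    raw = [
        modified_mean([percentages[year][q] for year in percentages])
        for q in range(n_seasons)
    ]
    # Scale so that the indices average 100 (sum to 400 for quarters).
    factor = 100 * n_seasons / sum(raw)
    adjusted = [factor * r for r in raw]
    return raw, adjusted


raw, adjusted = seasonal_indices(sales)
print("Modified means:  ", [round(r, 2) for r in raw])
print("Adjusted indices:", [round(a, 2) for a in adjusted])
```

Because the script keeps unrounded percentages throughout, its results may differ from the hand calculation in the last decimal place.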
Statistical Software for Seasonal Variation
Several statistical software packages can automate these calculations for you. Popular options include:
Python libraries like Pandas and Statsmodels
R statistical computing environment
Excel with add-in tools like Data Analysis ToolPak
Detrending a time series is the process of eliminating the trend component from it, where a trend refers to a change in the mean over time (a sustained increase or decrease). When data are detrended, a component that is believed to distort the analysis of the remaining variation is removed.
Assuming the multiplicative model:
$$Detrended\, value = \frac{Y}{T} = \frac{TSCI}{T}=SCI $$
Assuming the additive model:
$$Detrended\, value = Y-T=T+S+C+I-T = S+C+I$$
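The two formulas can be sketched in a few lines of Python. This is only an illustration: the helper name `detrend` is hypothetical, and the trend values are assumed to be known already (here the 1981 yearly average of 282.75 from the automobile example is used as a stand-in for $T$).

```python
def detrend(observed, trend, model="multiplicative"):
    """Remove the trend component from a series.

    multiplicative: Y / T  (leaves S*C*I)
    additive:       Y - T  (leaves S+C+I)
    """
    if len(observed) != len(trend):
        raise ValueError("observed and trend must have the same length")
    if model == "multiplicative":
        return [y / t for y, t in zip(observed, trend)]
    if model == "additive":
        return [y - t for y, t in zip(observed, trend)]
    raise ValueError("model must be 'multiplicative' or 'additive'")


# Quarterly sales for 1981, with the yearly average standing in for the trend.
observed = [250, 278, 315, 288]
trend = [282.75] * 4

ratios = detrend(observed, trend, model="multiplicative")
deviations = detrend(observed, trend, model="additive")
print([round(r, 4) for r in ratios])
print(deviations)
```

Multiplying the ratios by 100 gives the percentages used in the simple average method, while the additive deviations are in the original units of the series.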
Detrending Time Series (Stationary Time Series)
Detrending a time series is the process of removing the trend from a non-stationary time series. A detrended time series is referred to as a stationary time series, whereas a time series with a trend is non-stationary. A stationary time series oscillates about a horizontal line. If a series does not have a trend, or the trend has been removed successfully, the series is said to be trend stationary.
Eliminating the trend component may be thought of as rotating the trend line to a horizontal position. The trend component can be eliminated from the observed time series by computing either the ratios to the trend if the multiplicative model is assumed or the deviations from the trend if the additive model is assumed.
Note that the best detrending method depends on the nature of your trend:
Use differencing for linear trends (constant increase/decrease).
Use model fitting for more complex trends (curves, changing slopes).
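As a small illustration of differencing, the sketch below uses a hypothetical helper `difference` and made-up series. First differences of a series with a constant increase are constant, so the trend disappears; a curved (quadratic) series needs a second round of differencing before it becomes flat.

```python
def difference(series, lag=1):
    """Return the lagged differences y[t] - y[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Made-up series with a constant increase of 3 per period.
linear = [10 + 3 * t for t in range(8)]
print(difference(linear))  # [3, 3, 3, 3, 3, 3, 3]

# Made-up series with curvature: one difference is not enough.
quadratic = [t * t for t in range(6)]
first = difference(quadratic)
second = difference(first)
print(first)   # [1, 3, 5, 7, 9]
print(second)  # [2, 2, 2, 2]
```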
Detrending is often a preparatory step for further analysis such as forecasting and identifying seasonal patterns. On the other hand, detrending might not be necessary if the trend is already incorporated into your analysis. Some methods, like deseasonalizing, can involve both detrending and removing seasonal effects.
When a straight line does not describe accurately the long-term movement of a time series, then one might detect some curvature and decide to fit a curve instead of a straight line.
The curves most commonly used to describe a nonlinear secular trend in a time series are:
Exponential curve, and
Second-degree parabola
Exponential (Nonlinear) Curve
The exponential curve describes the trend (nonlinear) in a time series that changes by a constant percentage rate. The equation of the curve is $\hat{y} = ab^x$
Taking logarithm, we get the linear form $log\, \hat{y}=log\, a + (log\,b)x$
The method of least squares gives the normal equations:
\begin{align*} \sum log\, y & = n\, log\, a + log\, b \sum x\\ \sum x\, log\, y & = log\, a \sum x + log\, b \sum x^2 \end{align*}
However, if $\sum x=0$, the normal equations become
\begin{align*} \sum log\,y & = n\, log\, a\\ \sum x\, log\, y &= log\, b \sum x^2 \end{align*}
The values of $log\, a$ and $log\, b$ are
\begin{align*} log\, a &=\frac{\sum log\, y}{n}\\ log\, b&= \frac{\sum x log\, y}{\sum x^2} \end{align*}
Taking the $antilog$ of $log\, a$ and $log\, b$, we get the values of $a$ and $b$.
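When the coded time values do not sum to zero, the two normal equations can still be solved directly for $log\, a$ and $log\, b$. The following is a minimal sketch under that assumption; the function name `fit_exponential_trend` and the sample series are made up for illustration, and base-10 logarithms are used to match the antilog step above.

```python
import math


def fit_exponential_trend(x, y):
    """Fit y = a * b**x by least squares on log10(y).

    Solves the two normal equations for log a and log b directly,
    so the x values do not have to sum to zero.
    """
    n = len(x)
    log_y = [math.log10(v) for v in y]
    sum_x = sum(x)
    sum_x2 = sum(v * v for v in x)
    sum_log_y = sum(log_y)
    sum_x_log_y = sum(u * v for u, v in zip(x, log_y))

    log_b = (n * sum_x_log_y - sum_x * sum_log_y) / (n * sum_x2 - sum_x ** 2)
    log_a = (sum_log_y - log_b * sum_x) / n
    return 10 ** log_a, 10 ** log_b


# Made-up series that grows by exactly 50% per period: y = 2 * 1.5**x.
x = [0, 1, 2, 3, 4]
y = [2 * 1.5 ** v for v in x]
a, b = fit_exponential_trend(x, y)
print(round(a, 6), round(b, 6))
```

With $\sum x = 0$ the two expressions collapse to the simpler formulas for $log\, a$ and $log\, b$ given above.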
Question: The population of a country for the years 1911 to 1971 at ten-year intervals, in millions, is 5.38, 7.22, 9.64, 12.70, 17.80, 24.02, and 31.34. (i) Fit a curve of the type $\hat{y}=ab^x$ to this time series and find the trend values, (ii) Forecast the population for the year 1991.
Solution
(i) We have $\overline{t}=\frac{1911+1971}{2}=1941$. Let $x=\frac{t-\overline{t}}{10}=\frac{t-1941}{10}$, so that the coded year number $x$ is measured in units of 10 years.
| Year $t$ | Population $y$ | Coded Year $x=\frac{t-1941}{10}$ | $log\, y$ | $x\, log\, y$ | $x^2$ | $\hat{y}=13.029(1.345)^x$ |
|------|------|------|------|------|------|------|
| 1911 | 5.38 | -3 | 0.73078 | -2.19234 | 9 | 5.355 |
| 1921 | 7.22 | -2 | 0.85854 | -1.71708 | 4 | 7.202 |
| 1931 | 9.64 | -1 | 0.98408 | -0.98408 | 1 | 9.687 |
| 1941 | 12.70 | 0 | 1.10380 | 0 | 0 | 13.029 |
| 1951 | 17.80 | 1 | 1.25042 | 1.25042 | 1 | 17.524 |
| 1961 | 24.02 | 2 | 1.38057 | 2.76114 | 4 | 23.570 |
| 1971 | 31.34 | 3 | 1.49610 | 4.48830 | 9 | 31.701 |
The least squares exponential curve is $\hat{y} = ab^x$
Taking logarithm, $log\, \hat{y} = log a + (log\, b)x$
since $\sum x=0$, therefore
\begin{align*} log\, a &= \frac{\sum log\, y}{n} = \frac{7.80429}{7}=1.1149\\ log\, b &= \frac{\sum x log\, y}{\sum x^2} = \frac{3.60636}{28}=0.12880\\ a &= antilog(1.1149)=13.029\\ b &= antilog(0.1288)=1.345\\ \hat{y} &=13.029 (1.345)^x,\quad \text{with origin at 1941 and unit of $x$ as 10 years} \end{align*}
(ii) For $t=1991$ we have $x=\frac{t-1941}{10}= \frac{1991-1941}{10}=5$. Putting $x=5$ in the least squares exponential curve, we have $\hat{y} = 13.029 (1.345)^5 \approx 57.35$ million.
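The fit and the forecast can be reproduced with a few lines of Python. This is a sketch rather than a definitive implementation: the variable names and the helper `trend` are illustrative, and base-10 logarithms are assumed, as in the hand calculation.

```python
import math

years = [1911, 1921, 1931, 1941, 1951, 1961, 1971]
population = [5.38, 7.22, 9.64, 12.70, 17.80, 24.02, 31.34]  # millions

# Coded time: origin at 1941, one unit = 10 years, so sum(x) == 0.
x = [(t - 1941) // 10 for t in years]
log_y = [math.log10(v) for v in population]

# Simplified normal equations for the case sum(x) == 0.
log_a = sum(log_y) / len(log_y)
log_b = sum(u * v for u, v in zip(x, log_y)) / sum(u * u for u in x)
a, b = 10 ** log_a, 10 ** log_b


def trend(t):
    """Trend value (millions) for calendar year t."""
    return a * b ** ((t - 1941) / 10)


trend_values = [trend(t) for t in years]
forecast_1991 = trend(1991)
print(round(a, 3), round(b, 3))
print([round(v, 3) for v in trend_values])
print(round(forecast_1991, 2))
```

Since the script does not round $a$ and $b$ before forecasting, its 1991 figure can differ slightly from the hand-computed value, which uses the rounded coefficients 13.029 and 1.345.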
Second Degree Parabola (Nonlinear Trend)
It describes the trend (nonlinear) in a time series where a change in the amount of change is constant per unit of time. The quadratic (parabolic) trend can be described by the equation
\begin{align*} \hat{y} = a + bx + cx^2 \end{align*}
The method of least squares gives the normal equations as
\begin{align*} \sum y &= na + b\sum x + c \sum x^2\\ \sum xy &= a\sum x + b\sum x^2 + c \sum x^3\\ \sum x^2y &= a \sum x^2 + b\sum x^3 + c\sum x^4 \end{align*}
However, if $\sum x = 0 = \sum x^3$, then the normal equations reduce to
\begin{align*} \sum y &= na + c\sum x^2\\ \sum xy &= b\sum x^2\\ \sum x^2 y &= a \sum x^2 + c \sum x^4 \end{align*}
The values of $a$, $b$, and $c$ can then be found as
\begin{align*} c &= \frac{n \sum x^2 y - (\sum x^2)(\sum y)}{n \sum x^4 -(\sum x^2)^2}\\ a&=\frac{\sum y - c\sum x^2}{n}\\ b&= \frac{\sum xy}{\sum x^2} \end{align*}
Question: Given the following time series
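| Year | 1931 | 1933 | 1935 | 1937 | 1939 | 1941 | 1943 | 1945 |
|------|------|------|------|------|------|------|------|------|
| Price Index | 96 | 87 | 91 | 102 | 108 | 139 | 307 | 289 |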
(i) Fit a second-degree parabola taking the origin in 1938.
(ii) Find the trend values.
(iii) What would have been the equation of the parabola if the origin were in 1933?
Solution
(i)
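| Year $t$ | Price index $y$ | Coded Year $x=t-1938$ | $x^2$ | $x^4$ | $xy$ | $x^2y$ | Trend values $\hat{y}=110.2+15.48x+2.01 x^2$ |
|------|------|------|------|------|------|------|------|
| 1931 | 96 | -7 | 49 | 2401 | -672 | 4704 | 100.33 |
| 1933 | 87 | -5 | 25 | 625 | -435 | 2175 | 83.05 |
| 1935 | 91 | -3 | 9 | 81 | -273 | 819 | 81.85 |
| 1937 | 102 | -1 | 1 | 1 | -102 | 102 | 96.73 |
| 1939 | 108 | 1 | 1 | 1 | 108 | 108 | 127.69 |
| 1941 | 139 | 3 | 9 | 81 | 417 | 1251 | 174.73 |
| 1943 | 307 | 5 | 25 | 625 | 1535 | 7675 | 237.85 |
| 1945 | 289 | 7 | 49 | 2401 | 2023 | 14161 | 317.05 |
| Total | 1219 | 0 | 168 | 6216 | 2601 | 30995 | |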
Using the column totals, the least-squares parabola is
\begin{align*} \hat{y} &= a + b x + c x^2\\ c &= \frac{n\sum x^2 y-(\sum x^2)(\sum y)}{n \sum x^4 -(\sum x^2)^2} =\frac{8(30995)-(168)(1219)}{8(6216)-(168)^2}=2.01\\ a &= \frac{\sum y - c \sum x^2}{n}=\frac{1219-(2.01)(168)}{8}=110.2\\ b &= \frac{\sum xy}{\sum x^2}=\frac{2601}{168} = 15.48\\ \hat{y} &= 110.2 + 15.48x + 2.01x^2,\quad \text{with origin at the year 1938} \end{align*}
(ii) The trend values for the different values of $x$ are given in the last column of the table above.
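(iii) To shift the origin to 1933, replace $x$ by $(x-5)$, since 1933 lies 5 years before 1938:
\begin{align*} \hat{y} &= 110.2 + 15.48(x-5) + 2.01(x-5)^2\\ &= 83.05 - 4.62x + 2.01x^2,\quad \text{with origin at the year 1933} \end{align*}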
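The parabola fit for the price index example can be checked with the sketch below. It is a minimal illustration, not a general-purpose routine: the function name `fit_centered_parabola` is hypothetical, and it assumes the coded years satisfy $\sum x = 0 = \sum x^3$, as they do with the origin at 1938.

```python
years = [1931, 1933, 1935, 1937, 1939, 1941, 1943, 1945]
prices = [96, 87, 91, 102, 108, 139, 307, 289]


def fit_centered_parabola(x, y):
    """Least-squares fit of y = a + b*x + c*x**2.

    Uses the reduced normal equations, so it requires
    sum(x) == 0 and sum(x**3) == 0.
    """
    if sum(x) != 0 or sum(v ** 3 for v in x) != 0:
        raise ValueError("x must be coded so that sum(x) and sum(x**3) are zero")
    n = len(x)
    sum_y = sum(y)
    sum_x2 = sum(v ** 2 for v in x)
    sum_x4 = sum(v ** 4 for v in x)
    sum_xy = sum(u * v for u, v in zip(x, y))
    sum_x2y = sum(u * u * v for u, v in zip(x, y))

    c = (n * sum_x2y - sum_x2 * sum_y) / (n * sum_x4 - sum_x2 ** 2)
    a = (sum_y - c * sum_x2) / n
    b = sum_xy / sum_x2
    return a, b, c


x = [t - 1938 for t in years]  # coded years: -7, -5, ..., 7
a, b, c = fit_centered_parabola(x, prices)
trend_values = [a + b * v + c * v * v for v in x]
print(round(a, 2), round(b, 2), round(c, 2))
print([round(v, 2) for v in trend_values])
```

The script keeps the unrounded coefficients, so its trend values can differ a little from the table, which uses the rounded equation $\hat{y}=110.2+15.48x+2.01x^2$.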
Merits of Least Squares
The method of least squares gives the most satisfactory measurement of the secular trend in a time series when the distribution of the deviations is approximately normal.
The least-squares estimates are unbiased estimates of the parameters.
The method can be used when the trend is linear, exponential, or quadratic.
Demerits of Least Squares
The method of least squares gives too much weight to extremely large deviations from the trend.
The least-squares line is the best only for the period to which it has reference.
The elimination or addition for a few or more periods may change its position.