Autocorrelation: An Introduction (2020)

The term autocorrelation may be defined as a “correlation between members of a series of observations ordered in time (as in time series data) or space (as in cross-sectional data)”. Autocorrelation is most likely to occur in time-series data. In the regression context, the classical linear regression model (CLRM) assumes that there are no covariances or correlations among the disturbances $u_i$. Symbolically,

$$Cov(u_i, u_j | x_i, x_j)=E(u_i u_j)=0, \quad i\ne j$$

In simple words, the disturbance term relating to any observation is not influenced by the disturbance term relating to any other observation. In other words, the error terms $u_i$ and $u_j$ are independently distributed (serially independent). If there are dependencies among disturbance terms, then there is a problem of autocorrelation. Symbolically,

$$ Cov(u_i,u_j|x_i, x_j) = E(u_i u_j) \ne 0,\quad i\ne j$$

Autocorrelation

Suppose we have disturbance terms from two different time series, say $u$ and $v$, such as $u_1, u_2, \cdots, u_{10}$ and $v_2, v_3, \cdots, v_{11}$. The correlation between these two different time series is called serial correlation (that is, the lag correlation between two series).

Suppose instead that we have a single time series $u$ ($u_1,u_2,\cdots, u_{10}$) and its lagged values $u_2, u_3,\cdots, u_{11}$. The correlation between these two is called autocorrelation (that is, the lag correlation of a given series with itself, lagged by a number of time units).
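
The idea of correlating a series with its own lagged values can be sketched in a few lines of Python. This is a minimal illustration using the usual sample autocorrelation estimator; the function name `autocorrelation` and both example series are assumptions made for the sketch, not part of the original text.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` with itself shifted by `lag` periods."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    # Total squared variation around the mean (denominator).
    denom = sum((v - mean) ** 2 for v in series)
    # Co-variation between each value and the value `lag` periods earlier.
    numer = sum((series[t] - mean) * (series[t - lag] - mean)
                for t in range(lag, n))
    return numer / denom


trending = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical series with a steady trend
alternating = [1, -1, 1, -1, 1, -1, 1, -1]   # hypothetical series that flips sign each period

r_trend = autocorrelation(trending, lag=1)
r_alt = autocorrelation(alternating, lag=1)
print(round(r_trend, 3), round(r_alt, 3))
```

The trending series gives a positive lag-1 value and the alternating series a negative one, which matches the intuition that neighbouring values move together in the first case and in opposite directions in the second.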

Using OLS to estimate a regression model yields BLUE estimates of the parameters only when all the assumptions of the CLRM are satisfied. When the results do not match prior expectations, one may plot the residuals after the regression and look for patterns.

Plausible Patterns of Autocorrelation

Some plausible patterns of autocorrelation and non-autocorrelation are:

Patterns of Autocorrelation

Figures (a) to (d) show a discernible (that is, perceptible) pattern among the $u$’s.
Figure (a) shows a cyclical pattern.
Figure (b) suggests an upward linear trend in the disturbances.
Figure (c) suggests a downward linear trend in the disturbances.
Figure (d) indicates that both linear and quadratic trend terms are present in the disturbances.
Figure (e) shows no systematic pattern, which supports the CLRM assumption of no autocorrelation.

The importance of autocorrelation can be described as follows:

  • Identifying Patterns: Autocorrelation measures the correlation between a variable and its lagged versions, essentially checking how similar past values are to present values. It therefore helps identify trends or seasonality within the data. For instance, positive autocorrelation in stock prices might suggest momentum, where recent gains could indicate a continued increase.
  • Validating Models: Many statistical models, especially those used for time series forecasting, assume that the error terms are independent. Checking for autocorrelation tests this assumption. If the errors are autocorrelated, the model can be misled and may produce inaccurate forecasts. Accounting for autocorrelation through appropriate techniques improves model accuracy.
  • Understanding Dynamic Systems: The presence of autocorrelation indicates that a system depends on its past states. This is valuable in fields such as finance or engineering, where system behavior is influenced by its history.

NonLinear Trends and Method of Least Squares

When a straight line does not describe accurately the long-term movement of a time series, then one might detect some curvature and decide to fit a curve instead of a straight line.

The curves most commonly used to describe a nonlinear secular trend in a time series are:

  1. Exponential curve, and
  2. Second-degree parabola

Exponential (Nonlinear) Curve

The exponential curve describes the trend (nonlinear) in a time series that changes by a constant percentage rate. The equation of the curve is $\hat{y} = ab^x$

Taking logarithm, we get the linear form $log\, \hat{y}=log\, a + (log\,b)x$

The method of least squares gives the normal equations:

\begin{align*}
\sum log\, y & = n\, log\, a + log\, b \sum x\\
\sum x\, log\, y & = log\, a \sum x + log\, b \sum x^2
\end{align*}

However, if $\sum x=0$, the normal equations become

\begin{align*}
\sum log\,y & = n\, log a\\
\sum x log\, y &= log\, b \sum x^2
\end{align*}

The values of $log\, a$ and $log\, b$ are

\begin{align*}
log\, a &=\frac{\sum log\, y}{n}\\
log\, b&= \frac{\sum x log\, y}{\sum x^2}
\end{align*}

Taking the $antilog$ of $log\, a$ and $log\, b$, we get the values of $a$ and $b$.

Question: The population of a country for the years 1911 to 1971 in ten yearly intervals in millions is 5.38, 7.22, 9.64, 12.70, 17.80, 24.02, and 31.34. (i) Fit a curve of the type $\hat{y}=ab^x$ to this time series and find the trend values, (ii) Forecast the population for the year 1991.

Solution

(i) We have $\overline{t}=\frac{(1911+1971)}{2}=1941$. Let $x=\frac{t-\overline{t}}{10}=\frac{t-1941}{10}$ so that the coded year number $x$ is measured in units of 10 years.

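| Year $t$ | Population $y$ | Coded Year $x=\frac{t-1941}{10}$ | $log\, y$ | $x\, log\, y$ | $x^2$ | $\hat{y}=13.029(1.345)^x$ |
|---|---|---|---|---|---|---|
| 1911 | 5.38 | -3 | 0.73078 | -2.19234 | 9 | 5.355 |
| 1921 | 7.22 | -2 | 0.85854 | -1.71708 | 4 | 7.202 |
| 1931 | 9.64 | -1 | 0.98408 | -0.98408 | 1 | 9.687 |
| 1941 | 12.70 | 0 | 1.10380 | 0 | 0 | 13.029 |
| 1951 | 17.80 | 1 | 1.25042 | 1.25042 | 1 | 17.524 |
| 1961 | 24.02 | 2 | 1.38057 | 2.76114 | 4 | 23.570 |
| 1971 | 31.34 | 3 | 1.49610 | 4.48830 | 9 | 31.701 |
| Total | | 0 | 7.80429 | 3.60636 | 28 | |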

The least squares exponential curve is $\hat{y} = ab^x$

Taking logarithm, $log\, \hat{y} = log a + (log\, b)x$

Since $\sum x=0$,

\begin{align*}
log\, a &= \frac{\sum log\, y}{n} = \frac{7.80429}{7}=1.1149\\
log\, b &= \frac{\sum x log\, y}{\sum x^2} = \frac{3.60636}{28}=0.12880\\
a &= antilog(1.1149)=13.029\\
b &= antilog(0.1288)=1.345\\
\hat{y} &=13.029 (1.345)^x,\quad \text{with origin at 1941 and unit of $x$ as 10 years}
\end{align*}

(ii) For $t=1991$ we have $x=\frac{t-1941}{10}= \frac{1991-1941}{10}=5$. Putting $x=5$ in the least squares exponential curve, we have
$\hat{y} = 13.029 (1.345)^5 = 57.348$ million
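
The calculation above can be checked with a short Python sketch. It assumes base-10 logarithms (as in the hand calculation) and coded years that sum to zero; the function name `fit_exponential_trend` is a label chosen for this illustration.

```python
import math


def fit_exponential_trend(x, y):
    """Fit y_hat = a * b**x by least squares on log10(y); assumes sum(x) == 0."""
    if abs(sum(x)) > 1e-12:
        raise ValueError("coded x values must sum to zero")
    n = len(x)
    log_y = [math.log10(v) for v in y]
    # Reduced normal equations: log a = sum(log y)/n, log b = sum(x log y)/sum(x^2).
    log_a = sum(log_y) / n
    log_b = sum(xi * ly for xi, ly in zip(x, log_y)) / sum(xi ** 2 for xi in x)
    return 10 ** log_a, 10 ** log_b


years = [1911, 1921, 1931, 1941, 1951, 1961, 1971]
population = [5.38, 7.22, 9.64, 12.70, 17.80, 24.02, 31.34]  # millions
x = [(t - 1941) / 10 for t in years]                          # coded years -3 .. 3

a, b = fit_exponential_trend(x, population)
trend = [a * b ** xi for xi in x]                 # trend values for 1911 .. 1971
forecast_1991 = a * b ** ((1991 - 1941) / 10)     # x = 5
print(round(a, 3), round(b, 3), round(forecast_1991, 2))
```

Because the sketch keeps $a$ and $b$ at full precision instead of rounding them to 13.029 and 1.345, the forecast it prints can differ slightly in the first decimal place from the hand-computed 57.348.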

Nonlinear Trends method of least squares

Second Degree Parabola (Nonlinear Trend)

It describes the (nonlinear) trend in a time series where the change in the amount of change is constant per unit of time. The quadratic (parabolic) trend can be described by the equation

\begin{align*}
\hat{y} = a + bx + cx^2
\end{align*}

The method of least squares gives the normal equations as

\begin{align*}
\sum y &= na + b\sum x + c \sum x^2\\
\sum xy &= a\sum x + b\sum x^2 + c \sum x^3\\
\sum x^2y &= a \sum x^2 + b\sum x^3 + c\sum x^4
\end{align*}

However, if $\sum x = 0 = \sum x^3$, then the normal equations reduce to

\begin{align*}
\sum y &= na + c\sum x^2\\
\sum xy &= b\sum x^2\\
\sum x^2 y &= a \sum x^2 + c \sum x^4\\
& \text{the values of $a$, $b$, and $c$ can be found as}\\
c &= \frac{n \sum x^2 y - (\sum x^2)(\sum y)}{n \sum x^4 -(\sum x^2)^2}\\
a&=\frac{\sum y - c\sum x^2}{n}\\
b&= \frac{\sum xy}{\sum x^2}
\end{align*}
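
The expression for $c$ comes from eliminating $a$ between the first and third reduced normal equations. A short derivation, using only the symbols already defined:

\begin{align*}
a &= \frac{\sum y - c\sum x^2}{n}\\
\sum x^2 y &= \frac{(\sum x^2)(\sum y - c\sum x^2)}{n} + c\sum x^4\\
n\sum x^2 y &= (\sum x^2)(\sum y) + c\left[n\sum x^4 - (\sum x^2)^2\right]\\
c &= \frac{n\sum x^2 y - (\sum x^2)(\sum y)}{n\sum x^4 - (\sum x^2)^2}
\end{align*}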

Question: Given the following time series

| Year | 1931 | 1933 | 1935 | 1937 | 1939 | 1941 | 1943 | 1945 |
|---|---|---|---|---|---|---|---|---|
| Price Index | 96 | 87 | 91 | 102 | 108 | 139 | 307 | 289 |

  1. Fit a second-degree parabola taking the origin in 1938.
  2. Find the trend values
  3. What would have been the equation of the parabola if the origin were in 1933?

Solution

(i)

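| Year $t$ | Price index $y$ | Coded Year $x=t-1938$ | $x^2$ | $x^4$ | $xy$ | $x^2y$ | Trend values $\hat{y}=110.2+15.48x+2.01 x^2$ |
|---|---|---|---|---|---|---|---|
| 1931 | 96 | -7 | 49 | 2401 | -672 | 4704 | 100.33 |
| 1933 | 87 | -5 | 25 | 625 | -435 | 2175 | 83.05 |
| 1935 | 91 | -3 | 9 | 81 | -273 | 819 | 81.85 |
| 1937 | 102 | -1 | 1 | 1 | -102 | 102 | 96.73 |
| 1939 | 108 | 1 | 1 | 1 | 108 | 108 | 127.69 |
| 1941 | 139 | 3 | 9 | 81 | 417 | 1251 | 174.73 |
| 1943 | 307 | 5 | 25 | 625 | 1535 | 7675 | 237.85 |
| 1945 | 289 | 7 | 49 | 2401 | 2023 | 14161 | 317.05 |
| Total | 1219 | 0 | 168 | 6216 | 2601 | 30995 | |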

The coefficients follow from the reduced normal equations, using the totals in the table:

\begin{align*}
\hat{y} &= a + b x + c x^2\\
c &= \frac{n\sum x^2 y-(\sum x^2)(\sum y)}{n \sum x^4 -(\sum x^2)^2} =\frac{8(30995)-(168)(1219)}{8(6216)-(168)^2}=2.01\\
a &= \frac{\sum y - c \sum x^2}{n}=\frac{1219-(2.01)(168)}{8}=110.2\\
b &= \frac{\sum xy}{\sum x^2}=\frac{2601}{168} = 15.48\\
\hat{y} &= 110.2 + 15.48x + 2.01x^2,\quad \text{with origin at the year 1938}
\end{align*}

(ii) Substituting each value of $x$ into this equation gives the trend values shown in the last column of the table.

(iii) The year 1933 lies 5 units before 1938, so to shift the origin to 1933, replace $x$ by $(x-5)$:

\begin{align*}
\hat{y} &= 110.2 + 15.48(x-5)+2.01(x-5)^2\\
&= 110.2 + 15.48(x-5)+2.01(x^2 -10x + 25)\\
&= 110.2 + 15.48x -77.4 + 2.01x^2 - 20.1x + 50.25\\
&= 83.05 -4.62x + 2.01x^2, \quad \text{with origin at the year 1933}
\end{align*}
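
As a check on the arithmetic, the parabola fit and the shift of origin can be sketched in Python. The function name `fit_parabolic_trend` is a label chosen for this illustration, and the sketch assumes the coded years satisfy $\sum x = 0 = \sum x^3$, so that the reduced normal equations apply.

```python
def fit_parabolic_trend(x, y):
    """Fit y_hat = a + b*x + c*x**2; assumes sum(x) == 0 and sum(x**3) == 0."""
    if sum(x) != 0 or sum(v ** 3 for v in x) != 0:
        raise ValueError("coded x values must be symmetric about zero")
    n = len(x)
    sx2 = sum(v ** 2 for v in x)
    sx4 = sum(v ** 4 for v in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))
    # Solutions of the reduced normal equations.
    c = (n * sx2y - sx2 * sy) / (n * sx4 - sx2 ** 2)
    a = (sy - c * sx2) / n
    b = sxy / sx2
    return a, b, c


years = [1931, 1933, 1935, 1937, 1939, 1941, 1943, 1945]
price_index = [96, 87, 91, 102, 108, 139, 307, 289]
x = [t - 1938 for t in years]            # coded years -7, -5, ..., 7

a, b, c = fit_parabolic_trend(x, price_index)
trend = [a + b * xi + c * xi ** 2 for xi in x]

# Shift the origin to 1933: substitute x = x_new - 5 and collect terms.
a_1933 = a - 5 * b + 25 * c
b_1933 = b - 10 * c
print(round(a, 1), round(b, 2), round(c, 2))
```

The shifted coefficients `a_1933` and `b_1933` come out close to, but not exactly equal to, 83.05 and -4.62, because the hand calculation rounds $a$, $b$, and $c$ before shifting the origin.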

Merits of Least Squares

  • The method of least squares gives the most satisfactory measurement of the secular trend in a time series when the distribution of the deviations is approximately normal.
  • The least-squares estimates are unbiased estimates of the parameters.
  • The method can be used when the trend is linear, exponential, or quadratic.

Demerits of Least Squares

  • The method of least squares gives too much weight to extremely large deviations from the trend.
  • The least-squares line is the best only for the period to which it refers.
  • Eliminating or adding even a few periods may change its position.


Method of Least Squares: Linear Trend (2020)

The least-squares principle (Method of Least Squares) says that “the sum of squares of the deviations of the observed values from the corresponding expected values should be least”. Among all possible trend lines, the least-squares fit is the one for which the sum of the squares of the deviations of the observed values from their corresponding expected values is the least.

Note that the usual probabilistic assumptions made in regression and correlation analysis are not met in the case of time series data.

Secular Trend — Linear Trend

It is useful to describe the trend in a time series where the amount of change is constant per unit of time.

Let $(x_1, y_1), (x_2, y_2), \cdots, (x_n,y_n)$ be the $n$ pairs of observed sample values of a time series variable $y$, with $x$ representing the coded time value. We can plot these $n$ points on a graph.

Let us suppose that we want to fit a straight line expressed in slope-intercept form as:

\begin{align}
\hat{y} = a + bx, \quad \quad (eq1)
\end{align}

The line (eq1) is called the least squares line if it makes $\sum(y-a-bx)^2$ a minimum. The method of least squares yields the following normal equations:

\begin{align*}
\sum y &= na + b \sum x\\
\sum xy &= a \sum x + b \sum x^2
\end{align*}
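
These normal equations come from setting the partial derivatives of the sum of squares to zero. Writing $S(a,b)=\sum(y-a-bx)^2$ (the symbol $S$ is introduced here only for this step):

\begin{align*}
\frac{\partial S}{\partial a} &= -2\sum (y-a-bx) = 0 \quad\Rightarrow\quad \sum y = na + b\sum x\\
\frac{\partial S}{\partial b} &= -2\sum x(y-a-bx) = 0 \quad\Rightarrow\quad \sum xy = a\sum x + b\sum x^2
\end{align*}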

The normal equations give the values of $a$ and $b$ as:

\begin{align*}
b &= \frac{n \sum xy - (\sum x \sum y )}{n \sum x^2 -(\sum x)^2}\\
a & = \overline{y}-b\overline{x}
\end{align*}

However, if $\sum x=0$, the usual normal equations reduce to

\begin{align*}
\sum y &= na\\
\sum xy & = b\sum x^2
\end{align*}

The values of $a$ and $b$ also reduce to

\begin{align*}
a&=\frac{\sum y}{n}=\overline{y}\\
b&=\frac{\sum xy}{\sum x^2}
\end{align*}

The trend values $\hat{y}$ are computed from the least-squares line $\hat{y}=a+bx$ by substituting the values of $x$ corresponding to the different time periods.

Properties of the Method of Least Squares

  • The least-squares line always passes through the point ($\overline{x}, \overline{y}$) called the center of gravity of the data.
  • The sum of deviations $\sum(y-\hat{y})$ of the observed values $y$ from their corresponding expected values $\hat{y}$ is zero, that is, $\sum(y-\hat{y})=0$, hence $\sum y= \sum \hat{y}$
  • The sum of squares of the deviations $\sum (y-\hat{y})^2$ measures how well the trend line fits the data. A smaller $\sum (y-\hat{y})^2$ means a better fit.

Moving Averages and Least Squares Linear Trend: If a least-squares line is fitted to each group of $k$ consecutive observations, the trend value at the central time period of the group equals the group mean, that is, the $k$-period moving average. This follows because the least-squares line passes through ($\overline{x}, \overline{y}$).
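
The link between moving averages and least-squares trend values can be verified numerically. The sketch below is illustrative only: the series `values`, the window length `k = 3`, and the helper name `fit_linear_trend` are assumptions chosen for the demonstration, and an odd `k` is used so that each window has a central period.

```python
def fit_linear_trend(x, y):
    """Least-squares line y_hat = a + b*x from the general normal equations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2 = sum(xi ** 2 for xi in x)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a = sy / n - b * sx / n          # a = y_bar - b * x_bar
    return a, b


values = [3, 6, 2, 10, 7, 9, 14, 12, 18]   # illustrative series
k = 3                                       # odd window length

moving_averages = []
central_trend_values = []
for start in range(len(values) - k + 1):
    window = values[start:start + k]
    periods = list(range(start, start + k))
    a, b = fit_linear_trend(periods, window)
    centre = periods[k // 2]                 # central time period of the window
    moving_averages.append(sum(window) / k)
    central_trend_values.append(a + b * centre)

for ma, tv in zip(moving_averages, central_trend_values):
    print(round(ma, 4), round(tv, 4))
```

Each printed pair should agree up to floating-point rounding, since the central period of an odd window is the window's $\overline{x}$ and the fitted line passes through ($\overline{x}, \overline{y}$).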

Question: Determine the trend line by the least-squares method from the following data. Plot the actual values and the linear trend on the same graph.

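| Year | 1945 | 1946 | 1947 | 1948 | 1949 | 1950 | 1951 | 1952 | 1953 |
|---|---|---|---|---|---|---|---|---|---|
| Price | 3 | 6 | 2 | 10 | 7 | 9 | 14 | 12 | 18 |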

Solution

The equation of the trend line is

\begin{align*}
\hat{Y} = a + b\, X
\end{align*}

Normal Equations are:

\begin{align*}
\Sigma Y & = n\, a + b \, \Sigma X \tag{i}\\
\Sigma XY& = a\, \Sigma X+ b\, \Sigma X^2 \tag{ii}
\end{align*}

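| Year | Value $Y$ | $X$ | $XY$ | $X^2$ | $\hat{Y}$ |
|---|---|---|---|---|---|
| 1945 | 3 | -4 | -12 | 16 | 2.2 |
| 1946 | 6 | -3 | -18 | 9 | 3.9 |
| 1947 | 2 | -2 | -4 | 4 | 5.6 |
| 1948 | 10 | -1 | -10 | 1 | 7.3 |
| 1949 | 7 | 0 | 0 | 0 | 9.0 |
| 1950 | 9 | 1 | 9 | 1 | 10.7 |
| 1951 | 14 | 2 | 28 | 4 | 12.4 |
| 1952 | 12 | 3 | 36 | 9 | 14.1 |
| 1953 | 18 | 4 | 72 | 16 | 15.8 |
| Total | 81 | 0 | 101 | 60 | 81.0 |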

Putting the values in Normal equations:

\begin{align*}
81 &= 9a \tag*{1}\\
101&= 60b \tag*{2}
\end{align*}

From (1) $a=\frac{81}{9}=9$, and from (2) $b=\frac{101}{60}\approx 1.7$.

Fitted trend line is $\hat{Y}=9 + 1.7\,X$.
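
The worked example can be reproduced with a few lines of Python. This sketch uses the shortcut formulas that apply when the coded years sum to zero; the variable names are chosen for this illustration.

```python
years = [1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953]
prices = [3, 6, 2, 10, 7, 9, 14, 12, 18]

# Code the years about the middle year (1949) so that sum(X) == 0.
middle = years[len(years) // 2]
X = [t - middle for t in years]

n = len(prices)
sum_xy = sum(x * y for x, y in zip(X, prices))   # 101
sum_x2 = sum(x * x for x in X)                   # 60

# Reduced normal equations: sum(Y) = n*a and sum(XY) = b*sum(X^2).
a = sum(prices) / n
b = sum_xy / sum_x2

trend = [a + b * x for x in X]
# The residuals of a least-squares line sum to zero (up to rounding).
residual_sum = sum(y - t for y, t in zip(prices, trend))
print(a, round(b, 4), round(residual_sum, 10))
```

The sketch keeps $b = 101/60$ at full precision, so its trend values differ slightly from the table, which uses the rounded slope 1.7.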

The Method of Least Squares: Linear Trend

The method of least squares is a valuable tool for analyzing trends in time series data. By understanding the strengths and limitations of the method, you can use it effectively to gain insights, make predictions, and compare trends across different time series datasets.
