Autocorrelation Reasons

This post is about the reasons for autocorrelation that may occur in time series data. To learn what autocorrelation is, see the post Introduction to Autocorrelation.

There are several reasons for autocorrelation. Some of the most important are:

i) Inertia

Inertia or sluggishness in economic time series is a major reason for autocorrelation. For example, GNP, production, price indices, employment, and unemployment exhibit business cycles. Starting at the bottom of a recession, when economic recovery begins, most of these series start moving upward. In this upswing, the value of a series at one point in time is greater than its previous values. These successive periods (observations) are therefore likely to be interdependent.

ii) Omitted Variables Specification Bias

The residuals (which are proxies for $u_i$) may suggest that some variables that were originally candidates but were not included in the model (for a variety of reasons) should be included. This is the case of excluded variable specification bias. Often the inclusion of such variables may remove the correlation pattern observed among the residuals. For example, the model

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + u_t,$$

is correct. However, running

$$Y_t=\beta_1 + \beta_2 X_{2t} + \beta_3X_{3t}+v_t,\quad \text{where $v_t=\beta_4X_{4t}+u_t$},$$

the error or disturbance term will reflect a systematic pattern, creating false autocorrelation due to the exclusion of the variable $X_{4t}$ from the model. The effect of $X_{4t}$ is captured by the disturbances $v_t$.
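
This effect can be demonstrated with simulated data. Below is a minimal sketch assuming NumPy; the coefficients, the cyclical form of the omitted variable, and the seed are illustrative, not from the text. The true model includes $X_{4t}$, the estimated model omits it, and the residuals inherit its systematic pattern.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
t = np.arange(n)
x2 = rng.normal(size=n)
x4 = np.sin(2 * np.pi * t / 50)      # omitted variable with a slow cycle (illustrative)
u = rng.normal(scale=0.1, size=n)    # well-behaved disturbance
y = 1.0 + 2.0 * x2 + 3.0 * x4 + u    # true model includes x4

# Estimate the misspecified model Y_t = b1 + b2*X_2t + v_t by OLS
X = np.column_stack([np.ones(n), x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
v = y - X @ beta                     # residuals absorb the effect of x4

# Lag-1 correlation of the residuals reveals the false autocorrelation
r1 = np.corrcoef(v[:-1], v[1:])[0, 1]
print(round(r1, 2))                  # strongly positive
```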

iii) Model Specification: Incorrect Functional Form

Autocorrelation can also occur due to misspecification of the model. Suppose that $Y_t$ is related to $X_{2t}$ through a quadratic relationship

$$Y_t=\beta_1 + \beta_2 X_{2t}^2+u_t,$$

but we wrongly estimate a straight-line relationship ($Y_t=\beta_1 + \beta_2X_{2t}+u_t$). In this case, the error term obtained from the straight-line specification will depend on $X_{2t}^2$. If $X_{2t}$ is increasing or decreasing over time, $u_t$ will also be increasing or decreasing over time. Therefore, an incorrect functional form is another important reason for autocorrelation.
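
This, too, is easy to simulate. A sketch assuming NumPy (the quadratic coefficients and noise level are illustrative): data generated from a quadratic relation are fitted with a straight line, and the residuals consequently move together over time.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)                                # regressor increasing over time
y = 2.0 + 0.5 * x**2 + rng.normal(scale=1.0, size=x.size)  # true quadratic relation

# Wrongly fit the straight line Y_t = b1 + b2*X_2t
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta                                       # residuals keep the x^2 curvature

# Successive residuals are highly correlated
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(round(r1, 2))                                        # strongly positive
```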

iv) Effect of Cobweb Phenomenon

The quantity supplied in period $t$ of many agricultural commodities depends on their price in period $t-1$. This is called the Cobweb phenomenon: the decision to plant a crop in period $t$ is influenced by the price of the commodity in that period, but the actual supply of the commodity becomes available in period $t+1$.

\begin{align*}
QS_{t+1} &= \alpha + \beta P_t + \varepsilon_{t+1}\\
\text{or }\quad QS_t &= \alpha + \beta P_{t-1} + \varepsilon_t
\end{align*}

This supply model indicates that if the price in period $t$ is higher, the farmer will decide to produce more in period $t+1$. Because of the increased supply in period $t+1$, $P_{t+1}$ will be lower than $P_t$. As a result of the lower price in period $t+1$, the farmer will produce less in period $t+2$ than in period $t+1$. Thus, disturbances in the case of the Cobweb phenomenon are not expected to be random; rather, they will exhibit a systematic pattern and thus cause a problem of autocorrelation.

v) Effect of Lagged Relationship

Many times in business and economic research the lagged values of the dependent variable are used as explanatory variables. For example, to study the effect of tastes and habits on consumption in a period $t$, consumption in period $t-1$ is used as an explanatory variable, since consumers do not change their consumption habits readily for psychological, technological, or institutional reasons. The consumption function will be

$$C_t = \alpha + \beta Y_t + \gamma C_{t-1} + \varepsilon_t,$$
where $C$ is consumption and $Y$ is income.

If the lagged terms ($C_{t-1}$) are not included in the above consumption function, the resulting error term will reflect a systematic pattern due to the impact of habits and tastes on current consumption and thereby autocorrelation will be present.

vi) Data Manipulation

Data manipulation is another important reason for autocorrelation. Raw data are often manipulated in empirical analysis. For example, in time-series regressions involving quarterly data, such data are usually derived from monthly data by simply adding three monthly observations and dividing the sum by 3. This averaging introduces smoothness into the data by dampening the fluctuations in the monthly data. This smoothness may itself lead to a systematic pattern in the disturbances, thereby introducing autocorrelation.
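
The smoothing effect of averaging is easy to see with a moving average, a closely related form of data manipulation: averaging three adjacent months makes neighbouring values share two of their three months, so even purely random data become autocorrelated. A sketch assuming NumPy (the series and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
monthly = rng.normal(size=600)       # serially independent "monthly" data

# 3-term moving average: adjacent averages share two of their three months
smoothed = np.convolve(monthly, np.ones(3) / 3, mode="valid")

def lag1_corr(s):
    """Correlation between a series and itself shifted by one period."""
    return np.corrcoef(s[:-1], s[1:])[0, 1]

print(round(lag1_corr(monthly), 2))   # near 0: raw data are independent
print(round(lag1_corr(smoothed), 2))  # near 2/3: averaging induced autocorrelation
```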

Interpolation or extrapolation of data is another source of data manipulation.

vii) Non-Stationarity

Both $Y$ and $X$ may be non-stationary, and therefore the error $u$ will also be non-stationary. In this case, the error term will exhibit autocorrelation.

This is all about autocorrelation reasons.

Read more about autocorrelation.

Autocorrelation: An Introduction (2020)

The term autocorrelation may be defined as a “correlation between members of a series of observations ordered in time (as in time series data) or space (as in cross-sectional data)”. Autocorrelation is most likely to occur in time-series data. In the regression context, the CLRM assumes that covariances and correlations do not exist in the disturbances $u_i$. Symbolically,

$$Cov(u_i, u_j | x_i, x_j)=E(u_i u_j)=0, \quad i\ne j$$

In simple words, the disturbance term relating to any observation is not influenced by the disturbance term relating to any other observation. In other words, the error terms $u_i$ and $u_j$ are independently distributed (serially independent). If there are dependencies among disturbance terms, then there is a problem of autocorrelation. Symbolically,

$$ Cov(u_i,u_j|x_i, x_j) = E(u_i u_j) \ne 0,\quad i\ne j$$

Suppose we have disturbance terms from two different time series, say $u$ and $v$, such as $u_1, u_2, \cdots, u_{10}$ and $v_1, v_2, \cdots, v_{11}$. The correlation between these two different time series is called serial correlation (that is, the lag correlation between two series).

Now suppose we have a single time series $u$ ($u_1, u_2, \cdots, u_{10}$) whose lagged values are $u_2, u_3, \cdots, u_{11}$. The correlation between the original and the lagged series is called autocorrelation (that is, the lag correlation of a given series with itself, lagged by a number of time units).
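
As a quick sketch (assuming NumPy; the disturbance series is hypothetical), lag-$k$ autocorrelation is simply the correlation of a series with itself shifted by $k$ time units:

```python
import numpy as np

def autocorr(u, lag=1):
    """Lag correlation of a series with itself, shifted by `lag` time units."""
    u = np.asarray(u, dtype=float)
    return np.corrcoef(u[:-lag], u[lag:])[0, 1]

# A hypothetical disturbance series that alternates above and below its mean
u = [1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.2, 0.8, 1.1, 0.9]
print(round(autocorr(u, lag=1), 2))   # strongly negative: values alternate
```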

The use of OLS to estimate a regression model results in BLUE estimates of the parameters only when all the assumptions of the CLRM are satisfied. After performing regression analysis one may plot the residuals to observe some patterns when results are not according to prior expectations.

Plausible Patterns of Autocorrelation

Some plausible patterns of autocorrelation and non-autocorrelation are:

Patterns of Autocorrelation

Figures (a) through (d) show a discernible pattern among the $u$'s:

  • Figure (a) shows a cyclical pattern.
  • Figure (b) suggests an upward linear trend in the disturbances.
  • Figure (c) suggests a downward linear trend in the disturbances.
  • Figure (d) indicates that both linear and quadratic trend terms are present in the disturbances.
  • Figure (e) shows no systematic pattern, supporting the CLRM assumption of no autocorrelation.

The importance of autocorrelation can be described as follows:

  • Identifying Patterns: Autocorrelation measures the correlation between a variable and its lagged versions, essentially checking how similar past values are to present values. It therefore helps identify trends or seasonality within the data. For instance, positive autocorrelation in stock prices might suggest momentum, where recent gains could indicate a continued increase.
  • Validating Models: Many statistical models, especially in time series forecasting, assume independence between error terms. Autocorrelation helps assess this assumption. If the data exhibit autocorrelation, ignoring it can mislead the model and lead to inaccurate forecasts. Accounting for autocorrelation through appropriate techniques improves model accuracy.
  • Understanding Dynamic Systems: The presence of autocorrelation indicates that a system depends on its past states. This is valuable in various fields, such as finance or engineering, where system behavior is influenced by its history.

Nonlinear Trends and the Method of Least Squares

When a straight line does not accurately describe the long-term movement of a time series, one might detect some curvature and decide to fit a curve instead of a straight line.

The most commonly used curves to describe a nonlinear secular trend in a time series are:

  1. Exponential curve, and
  2. Second-degree parabola

Exponential (Nonlinear) Curve

The exponential curve describes the trend (nonlinear) in a time series that changes by a constant percentage rate. The equation of the curve is $\hat{y} = ab^x$

Taking logarithm, we get the linear form $log\, \hat{y}=log\, a + (log\,b)x$

The method of least squares gives the normal equations:

\begin{align*}
\sum log\, y & = n\, log\, a + log\, b \sum x\\
\sum x\, log\, y & = log\, a \sum x + log\, b \sum x^2
\end{align*}

However, if $\sum x=0$ the normal equations become

\begin{align*}
\sum log\,y & = n\, log a\\
\sum x log\, y &= log\, b \sum x^2
\end{align*}

The values of $log\, a$ and $log\, b$ are

\begin{align*}
log\, a &=\frac{\sum log\, y}{n}\\
log\, b&= \frac{\sum x log\, y}{\sum x^2}
\end{align*}

Taking the $antilog$ of $log\, a$ and $log\, b$, we get the values of $a$ and $b$.
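
These formulas translate directly into code. A sketch assuming NumPy, with coded $x$ chosen so that $\sum x=0$ (the helper name is ours; base-10 logarithms are used to match the text):

```python
import numpy as np

def fit_exponential_trend(x, y):
    """Fit y-hat = a * b**x by least squares on log y, assuming sum(x) == 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    log_y = np.log10(y)
    log_a = log_y.sum() / len(y)               # log a = (sum log y) / n
    log_b = (x * log_y).sum() / (x**2).sum()   # log b = (sum x log y) / (sum x^2)
    return 10**log_a, 10**log_b                # take antilogs

# Population example from the question below: origin 1941, x in units of 10 years
x = np.array([-3, -2, -1, 0, 1, 2, 3])
y = np.array([5.38, 7.22, 9.64, 12.70, 17.80, 24.02, 31.34])
a, b = fit_exponential_trend(x, y)
print(round(a, 3), round(b, 3))   # approximately 13.029 and 1.345
```

Evaluating `a * b**5` then forecasts the population at $x=5$ (the year 1991), about 57 million, matching the hand calculation up to rounding.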

Question: The population of a country for the years 1911 to 1971, at ten-yearly intervals, in millions, is 5.38, 7.22, 9.64, 12.70, 17.80, 24.02, and 31.34. (i) Fit a curve of the type $\hat{y}=ab^x$ to this time series and find the trend values; (ii) forecast the population for the year 1991.

Solution

(i) We have $\overline{t}=\frac{1911+1971}{2}=1941$. Let $x=\frac{t-\overline{t}}{10}=\frac{t-1941}{10}$ so that the coded year number $x$ is measured in units of 10 years.

| Year $t$ | Population $y$ | Coded Year $x=\frac{t-1941}{10}$ | $log\, y$ | $x\, log\, y$ | $x^2$ | $\hat{y}=13.029(1.345)^x$ |
|---|---|---|---|---|---|---|
| 1911 | 5.38 | -3 | 0.73078 | -2.19234 | 9 | 5.355 |
| 1921 | 7.22 | -2 | 0.85854 | -1.71708 | 4 | 7.202 |
| 1931 | 9.64 | -1 | 0.98408 | -0.98408 | 1 | 9.687 |
| 1941 | 12.70 | 0 | 1.10380 | 0 | 0 | 13.029 |
| 1951 | 17.80 | 1 | 1.25042 | 1.25042 | 1 | 17.524 |
| 1961 | 24.02 | 2 | 1.38057 | 2.76114 | 4 | 23.570 |
| 1971 | 31.34 | 3 | 1.49610 | 4.48830 | 9 | 31.701 |
| Total |  | 0 | 7.80429 | 3.60636 | 28 |  |

The least squares exponential curve is $\hat{y} = ab^x$

Taking logarithm, $log\, \hat{y} = log a + (log\, b)x$

since $\sum x=0$, therefore

\begin{align*}
log\, a &= \frac{\sum log\, y}{n} = \frac{7.80429}{7}=1.1149\\
log\, b &= \frac{\sum x log\, y}{\sum x^2} = \frac{3.60636}{28}=0.12880\\
a &= antilog(1.1149)=13.029\\
b &= antilog(0.1288)=1.345\\
\hat{y} &=13.029 (1.345)^x,\quad \text{with origin at 1941 and unit of $x$ as 10 years}
\end{align*}

(ii) For $t=1991$ we have $x=\frac{t-1941}{10}= \frac{1991-1941}{10}=5$. Putting $x=5$ in the least squares exponential curve, we have
$\hat{y} = 13.029 (1.345)^5 = 57.348$ million

Second Degree Parabola (Nonlinear Trend)

It describes the trend (nonlinear) in a time series where a change in the amount of change is constant per unit of time. The quadratic (parabolic) trend can be described by the equation

\begin{align*}
\hat{y} = a + bx + cx^2
\end{align*}

The method of least squares gives the normal equations as

\begin{align*}
\sum y &= na + b\sum x + c \sum x^2\\
\sum xy &= a\sum x + b\sum x^2 + c \sum x^3\\
\sum x^2y &= a \sum x^2 + b\sum x^3 + c\sum x^4
\end{align*}

However, if $\sum x = \sum x^3 = 0$, the normal equations reduce to

\begin{align*}
\sum y &= na + c\sum x^2\\
\sum xy &= b\sum x^2\\
\sum x^2 y &= a \sum x^2 + c \sum x^4\\
& \text{The values of $a$, $b$, and $c$ can be found as}\\
c &= \frac{n \sum x^2 y - (\sum x^2)(\sum y)}{n \sum x^4 -(\sum x^2)^2}\\
a&=\frac{\sum y - c\sum x^2}{n}\\
b&= \frac{\sum xy}{\sum x^2}
\end{align*}
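
Under this symmetric coding, the three formulas can be sketched as follows (assuming NumPy; the function name is ours):

```python
import numpy as np

def fit_parabola_trend(x, y):
    """Fit y-hat = a + b*x + c*x**2, assuming sum(x) == 0 == sum(x**3)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    sx2, sx4 = (x**2).sum(), (x**4).sum()
    c = (n * (x**2 * y).sum() - sx2 * y.sum()) / (n * sx4 - sx2**2)
    a = (y.sum() - c * sx2) / n
    b = (x * y).sum() / sx2
    return a, b, c

# Price-index example from the question below: coded x = t - 1938
x = np.array([-7, -5, -3, -1, 1, 3, 5, 7])
y = np.array([96, 87, 91, 102, 108, 139, 307, 289])
a, b, c = fit_parabola_trend(x, y)
print(round(a, 1), round(b, 2), round(c, 2))   # approximately 110.2 15.48 2.01
```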

Question: Given the following time series

| Year | 1931 | 1933 | 1935 | 1937 | 1939 | 1941 | 1943 | 1945 |
|---|---|---|---|---|---|---|---|---|
| Price Index | 96 | 87 | 91 | 102 | 108 | 139 | 307 | 289 |
  1. Fit a second-degree parabola taking the origin in 1938.
  2. Find the trend values
  3. What would have been the equation of the parabola if the origin were in 1933?

Solution

(i)

| Year $t$ | Price Index $y$ | Coded Year $x=t-1938$ | $x^2$ | $x^4$ | $xy$ | $x^2y$ | Trend values $\hat{y}=110.2+15.48x+2.01x^2$ |
|---|---|---|---|---|---|---|---|
| 1931 | 96 | -7 | 49 | 2401 | -672 | 4704 | 100.33 |
| 1933 | 87 | -5 | 25 | 625 | -435 | 2175 | 83.05 |
| 1935 | 91 | -3 | 9 | 81 | -273 | 819 | 81.85 |
| 1937 | 102 | -1 | 1 | 1 | -102 | 102 | 96.73 |
| 1939 | 108 | 1 | 1 | 1 | 108 | 108 | 127.69 |
| 1941 | 139 | 3 | 9 | 81 | 417 | 1251 | 174.73 |
| 1943 | 307 | 5 | 25 | 625 | 1535 | 7675 | 237.85 |
| 1945 | 289 | 7 | 49 | 2401 | 2023 | 14161 | 317.05 |
| Total | 1219 | 0 | 168 | 6216 | 2601 | 30995 |  |

(ii) Different trend values are already computed in the above table.

\begin{align*}
\hat{y} &= a + b x + c x^2\\
c &= \frac{n\sum x^2 y-(\sum x^2)(\sum y)}{n \sum x^4 -(\sum x^2)^2} =\frac{8(30995)-(168)(1219)}{8(6216)-(168)^2}=2.01\\
a &= \frac{\sum y - c \sum x^2}{n}=\frac{1219-(2.01)(168)}{8}=110.2\\
b &= \frac{\sum xy}{\sum x^2}=\frac{2601}{168} = 15.48\\
\hat{y} &= 110.2 + 15.48x + 2.01x^2,\quad \text{with origin at the year 1938}
\end{align*}

For different values of $x$, the trend values are obtained in the table.

For shifting the origin to 1933, replace $x$ by $(x-5)$:

\begin{align*}
\hat{y} &= 110.2 + 15.48(x-5)+2.01(x-5)^2\\
&= 110.2 + 15.48(x-5)+2.01(x^2 -10x + 25)\\
&= 110.2 + 15.48x -77.4 + 2.01x^2 - 20.1x + 50.25\\
&= 83.05 -4.62x + 2.01x^2, \quad \text{with origin at the year 1933}
\end{align*}
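
The origin shift can be checked mechanically with polynomial composition, substituting $x-5$ for $x$. A sketch assuming NumPy's `numpy.polynomial` module:

```python
from numpy.polynomial import Polynomial

p_1938 = Polynomial([110.2, 15.48, 2.01])  # coefficients with origin at 1938
p_1933 = p_1938(Polynomial([-5.0, 1.0]))   # substitute x -> x - 5
print([round(c, 2) for c in p_1933.coef])  # [83.05, -4.62, 2.01]
```

The constant, linear, and quadratic coefficients agree with the hand derivation above.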

Merits of Least Squares

  • The method of least squares gives the most satisfactory measurement of the secular trend in a time series when the distribution of the deviations is approximately normal.
  • The least-squares estimates are unbiased estimates of the parameters.
  • The method can be used when the trend is linear, exponential, or quadratic.

Demerits of Least Squares

  • The method of least squares gives too much weight to extremely large deviations from the trend.
  • The least-squares line is the best only for the period to which it has reference.
  • The elimination or addition of a few periods may change its position.
