The Correlogram

A correlogram is a graph used to interpret a set of autocorrelation coefficients, in which $r_k$ is plotted against the lag $k$. A correlogram is often very helpful for visual inspection.
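As a rough illustration, the sketch below computes sample autocorrelation coefficients and prints a text-only correlogram. It assumes the usual estimator $r_k = \sum_{t=1}^{N-k}(x_t-\bar{x})(x_{t+k}-\bar{x}) / \sum_{t=1}^{N}(x_t-\bar{x})^2$; the function names and the short example series are made up for this sketch.

```python
def autocorrelation(x, max_lag):
    """Return the sample autocorrelation coefficients [r_0, ..., r_max_lag]."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # sum of squared deviations (lag 0)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def text_correlogram(r, width=20):
    """Draw one bar per lag; the bar length is proportional to |r_k|."""
    lines = []
    for k, rk in enumerate(r):
        bar = "#" * round(abs(rk) * width)
        lines.append(f"lag {k:>2}  {rk:+.3f}  {bar}")
    return "\n".join(lines)


# Hypothetical series that repeats every four observations.
series = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]
r = autocorrelation(series, 4)
print(text_correlogram(r))
```

Because the example series repeats every four observations, $r_2$ comes out negative and $r_4$ positive, which is the kind of pattern a correlogram makes visible at a glance.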

Some general guidelines for interpreting the correlogram are:

  • A Random Series: If a time series is completely random, then for large $N$, $r_k \cong 0$ for all non-zero values of $k$. For a random time series, $r_k$ is approximately distributed as $N\left(0, \frac{1}{N}\right)$, so 19 out of 20 of the values of $r_k$ can be expected to lie between $\pm \frac{2}{\sqrt{N}}$. Consequently, when plotting the first 20 values of $r_k$, one can expect to find one significant value on average even when the time series is random.
  • Short-term Correlation: Stationary series often exhibit short-term correlation, characterized by a fairly large value of $r_1$ followed by 2 or 3 more coefficients that, while significantly greater than zero, tend to get successively smaller. Values of $r_k$ for larger lags tend to be approximately zero. A time series that gives rise to such a correlogram is one for which an observation above the mean tends to be followed by one or more further observations above the mean, and similarly for observations below the mean. A model called an autoregressive model may be appropriate for a series of this type.
  • Alternating Series: If a time series tends to alternate, with successive observations on different sides of the overall mean, then the correlogram also tends to alternate. The value of $r_1$ will be negative; however, the value of $r_2$ will be positive, as observations at lag 2 will tend to be on the same side of the mean.
  • Non-Stationary Series: If a time series contains a trend, then the values of $r_k$ will not come down to zero except for very large values of the lag. This is because an observation on one side of the overall mean tends to be followed by a large number of further observations on the same side of the mean because of the trend. The sample autocorrelation function $\{ r_k \}$ should only be calculated for stationary time series, so any trend should be removed before calculating $\{ r_k\}$.
  • Seasonal Fluctuations: If a time series contains a seasonal fluctuation then the correlogram will also exhibit an oscillation at the same frequency. If $x_t$ follows a sinusoidal pattern then so does $r_k$. For example, if
    $x_t = a\cos(tw)$, where $a$ is a constant and $w$ is the frequency such that $0 < w < \pi$, then $r_k \cong \cos(kw)$ for large $N$.
    If the seasonal variation is removed from seasonal data then the correlogram may provide useful information.
  • Outliers: If a time series contains one or more outliers, the correlogram may be seriously affected. If there is one outlier in the time series and it is not adjusted, then the plot of $x_t$ vs $x_{t+k}$ will contain two extreme points, which will tend to depress the sample correlation coefficients towards zero. If there are two outliers, this effect is more noticeable.
  • General Remarks: Experience is required to interpret autocorrelation coefficients. We need to study the probability theory of stationary series and the classes of models that may be appropriate. We also need to know the sampling properties of $r_k$.
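The guidelines above can be checked on simulated data. The following is a minimal sketch, not a definitive recipe: the series length, the seed, the noise levels, the trend slope, and the frequency $w = \pi/6$ are all arbitrary choices made for illustration.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Sample autocorrelation coefficients r_0..r_max_lag of x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


random.seed(0)  # fixed seed so the sketch is reproducible
N = 400
bound = 2 / math.sqrt(N)  # approximate limits for a random series

# Random series: most of r_1..r_20 should fall inside +/- 2/sqrt(N).
noise = [random.gauss(0, 1) for _ in range(N)]
r_noise = autocorrelation(noise, 20)
outside = sum(1 for rk in r_noise[1:] if abs(rk) > bound)

# Alternating series: r_1 negative, r_2 positive.
alternating = [(-1) ** t + 0.1 * random.gauss(0, 1) for t in range(N)]
r_alt = autocorrelation(alternating, 2)

# Series with a trend: r_k stays large even at lag 20.
trend = [0.05 * t + random.gauss(0, 1) for t in range(N)]
r_trend = autocorrelation(trend, 20)

# Sinusoidal series x_t = a cos(t w): r_k is close to cos(k w).
a, w = 3.0, math.pi / 6
seasonal = [a * math.cos(t * w) for t in range(N)]
r_seas = autocorrelation(seasonal, 12)

print(f"random: {outside} of 20 coefficients outside +/-{bound:.2f}")
print(f"alternating: r_1 = {r_alt[1]:+.2f}, r_2 = {r_alt[2]:+.2f}")
print(f"trend: r_20 = {r_trend[20]:+.2f}")
print(f"seasonal: r_6 = {r_seas[6]:+.2f}, cos(6w) = {math.cos(6 * w):+.2f}")
```

Removing the trend (for example by differencing) before computing $r_k$ would bring the third correlogram much closer to that of a stationary series.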

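The term correlogram is also used for a plot of the correlation matrix of several variables. In that setting, there are two main types of correlograms depending on the type of correlation being analyzed: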

  • Pearson Correlation: This is the most common type and measures linear correlations between continuous variables.
  • Spearman Rank Correlation: This is a non-parametric measure suitable for ordinal or continuous data and assesses monotonic relationships (not necessarily linear).
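The difference between the two measures can be sketched in a few lines. The example below assumes the common definition of the Spearman coefficient as the Pearson correlation of the ranks (with tied values given their average rank); the helper names and the data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Monotonic but non-linear relationship: y = x^3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

Here the relationship is perfectly monotonic, so the Spearman coefficient reaches 1 while the Pearson coefficient stays below 1 because the relationship is not linear.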

In summary, a correlogram is a valuable tool for exploratory data analysis. It helps us:

  • Understand the relationships between multiple variables in the data.
  • Identify potential issues with multicollinearity before building statistical models.
  • Gain insights into the underlying structure of the data.

Objectives of Time Series Analysis (2014)

Time series analysis has many objectives. One of the major objectives is to identify the underlying structure of the time series, represented by a sequence of observations, by breaking it down into its components (Secular Trend, Seasonal Variation, Cyclical Trend, Irregular Variation).

Objectives of Time Series Analysis

The objectives of Time Series Analysis are classified as follows:

  1. Description
  2. Explanation
  3. Prediction
  4. Control

The description of the objectives of time series analysis is as follows:

Description of Time Series Analysis

The first step in the analysis is to plot the data and obtain simple descriptive measures of the main properties of the series (looking for trends, seasonal fluctuations, and so on). In the figure below, there is a regular seasonal pattern of price change, although this price pattern is not consistent. The graph enables us to look for “wild” observations or outliers (observations that do not appear to be consistent with the rest of the data). Graphing the time series also makes it possible to detect turning points, where an upward trend suddenly changes to a downward trend. If there is a turning point, different models may have to be fitted to the two parts of the series.

Explanation

When observations are taken on two or more variables, it may be possible to use the variation in one time series to explain the variation in another series. This may lead to a deeper understanding. A multiple regression model may be helpful in this case.

Prediction

Given an observed time series, one may want to predict the future values of the series. It is an important task in sales forecasting and in the analysis of economic and industrial time series. The terms prediction and forecasting are used interchangeably.

Control

When a time series is generated to measure the quality of a manufacturing process, the aim of the analysis may be to control the process. Control procedures are of several different kinds. In quality control, the observations are plotted on a control chart and the controller takes action as a result of studying the charts. Alternatively, a stochastic model is fitted to the series, future values of the series are predicted, and the input process variables are then adjusted to keep the process on target.
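A control chart check can be sketched as follows. This is only an illustration under an assumption the text does not state: the limits are set at three sample standard deviations around the mean of a reference sample taken while the process was on target. The function names and measurements are hypothetical.

```python
import statistics


def control_limits(reference, n_sigma=3.0):
    """Return (lower, centre, upper) limits estimated from a reference sample."""
    centre = statistics.mean(reference)
    spread = statistics.stdev(reference)  # sample standard deviation
    return centre - n_sigma * spread, centre, centre + n_sigma * spread


def out_of_control(observations, lower, upper):
    """Indices of observations that fall outside the control limits."""
    return [i for i, v in enumerate(observations) if v < lower or v > upper]


# Hypothetical measurements taken while the process was on target.
reference = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
lower, centre, upper = control_limits(reference)

# New observations to be checked against the chart.
new_observations = [10.0, 10.1, 11.5, 9.9]
flagged = out_of_control(new_observations, lower, upper)
print(f"limits: [{lower:.2f}, {upper:.2f}], flagged indices: {flagged}")
```

In practice, the controller would investigate the process whenever an index is flagged, rather than adjust it automatically.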

Figure: Seasonal effects in a time series
Image taken from: http://archive.stats.govt.nz

The figure shows that there is a regular seasonal pattern of price change although this price pattern is not consistent.



Time Series Analysis and Forecasting (2013)

Time Series Analysis

Time series analysis is the analysis of a series of data points over time, allowing one to answer questions such as: what is the causal effect on a variable $Y$ of a change in a variable $X$ over time? An important difference between time series and cross-section data is that the ordering of cases matters in time series.

A time series $\{Y_t\}$ or $\{y_1,y_2,\cdots,y_T\}$ is a discrete-time, continuous-state process, where the times $t=1,2,\cdots,T$ are discrete time points spaced at uniform intervals.

Usually, time is taken at more or less equally spaced intervals such as an hour, day, month, quarter, or year. More specifically, a time series is a set of data in which observations are arranged in chronological order (a set of repeated observations of the same variable).

Use of Time Series

Time series are used in different fields of science such as statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, and communications engineering among many other fields.

Definition: A sequence of random variables indexed by time is called a stochastic process (stochastic means random) or time series for mere mortals. A data set is one possible outcome (realization) of the stochastic process. If history had been different, we would observe a different outcome, thus we can think of time series as the outcome of a random variable.

Rather than dealing with individuals as units, the unit of interest is time: the value of $Y$ at time $t$ is $Y_t$. The unit of time can be anything from days to election years. The value of $Y$ in the previous period is called the first lag value: $Y_{t-1}$. The $j$th lag is denoted $Y_{t-j}$. Similarly, $Y_{t+1}$ is the value of $Y$ in the next period. So a simple bivariate regression equation for time series data looks like: \[Y_t = \beta_0 + \beta X_t + u_t\]
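The lag notation and the bivariate regression can be illustrated with a short sketch. The data below are made up and contain no error term ($u_t = 0$), so that the least squares fit recovers the coefficients exactly; the helper name `ols_simple` is hypothetical.

```python
def ols_simple(x, y):
    """Least squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Made-up series X_t and Y_t = 2 + 0.5 * X_t for t = 1, ..., 6.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.0 + 0.5 * x for x in X]

# First lag: pair each Y_t with Y_{t-1}; the first observation has no lag.
Y_current = Y[1:]   # Y_2, ..., Y_T
Y_lag1 = Y[:-1]     # Y_1, ..., Y_{T-1}

b0, b1 = ols_simple(X, Y)
print(f"estimated intercept = {b0:.2f}, estimated slope = {b1:.2f}")
print(list(zip(Y_current, Y_lag1)))
```

Note that building the lagged series costs one observation per lag, which matters for short series.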

Continuous Time Series

A time series is said to be continuous when observations are made continuously in time. The term continuous is used for series of this type even when the measured variable can only take a discrete set of values.

Discrete Time Series

A time series is said to be discrete when observations are taken at specific times, usually equally spaced. The term discrete is used for series of this type even when the measured variable is a continuous variable.

Most macroeconomic and financial data come in the form of time series. GNP and stock returns are examples of time series data.

We can write a series as $\{x_1,x_2,x_3,\cdots,x_T\}$ or $\{x_t\}$, where $t=1,2,3,\cdots,T$. $x_t$ is treated as a random variable.

Time series analysis refers to the branch of statistics where observations are collected sequentially in time, usually but not necessarily at equally spaced time points. The arcane difference between time series and other variables is the use of subscripts.

Time series analysis comprises methods for analyzing time series data to extract some useful (meaningful) statistics and other characteristics of the data, while Time series forecasting is the use of a model to predict future values based on previously observed values.

Given an observed time series, the first step in analyzing a time series is to plot the given series on a graph taking time intervals (t) along the X-axis (as independent variable) and the observed value ($Y_t$) on the Y-axis (as dependent variable). Such a graph will show various types of fluctuations and other points of interest.


Note

  • $Y_t$ is treated as a random variable. If $Y_t$ is generated by some model (for example, the regression model for time series $Y_t=x_t\beta +\varepsilon_t$ with $E(\varepsilon_t|x_t)=0$), then ordinary least squares (OLS) provides a consistent estimate of $\beta$.
  • The term time series is used interchangeably for the sample $\{x_t\}$ and for the probability model that generates it. A possible probability model for the joint distribution of a time series $\{x_t\}$ is $x_t=\varepsilon_t$, $\varepsilon_t\sim iid\, N(0,\sigma_\varepsilon^2)$.
  • Time series are typically not iid (independent and identically distributed); e.g., if GNP today is unusually high, GNP tomorrow is also likely to be unusually high.
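The contrast between an iid series and a persistent one can be sketched by simulation. The iid model $x_t=\varepsilon_t$ comes from the note above; the persistent series uses a first-order autoregression $x_t = \phi x_{t-1} + \varepsilon_t$ with $\phi = 0.8$, which is an assumption chosen here purely for illustration, as are the seed and the series length.

```python
import random


def lag1_autocorrelation(x):
    """Sample autocorrelation coefficient r_1 of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return sum(dev[t] * dev[t + 1] for t in range(n - 1)) / c0


random.seed(1)  # fixed seed so the sketch is reproducible
N = 1000
sigma = 1.0

# iid model: x_t = eps_t with eps_t ~ N(0, sigma^2).
white = [random.gauss(0, sigma) for _ in range(N)]

# Persistent series (assumed AR(1) with phi = 0.8): a high value today
# tends to be followed by a high value tomorrow.
phi = 0.8
persistent = [random.gauss(0, sigma)]
for t in range(1, N):
    persistent.append(phi * persistent[t - 1] + random.gauss(0, sigma))

r1_white = lag1_autocorrelation(white)
r1_persistent = lag1_autocorrelation(persistent)
print(f"r_1 for the iid series:        {r1_white:+.2f}")
print(f"r_1 for the persistent series: {r1_persistent:+.2f}")
```

The first coefficient should be close to zero and the second close to $\phi$, which is one simple way to see that a series is not iid.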
