# Basic Statistics and Data Analysis

## Random Walk Model

The random walk model is widely used in finance. Stock prices and exchange rates (asset prices) are commonly modeled as following a random walk. A random walk is a common and serious departure from random behavior (it is non-stationary), since today’s stock price is equal to yesterday’s stock price plus a random shock.

There are two types of random walks:

1. Random walk without drift (no constant or intercept)
2. Random walk with drift (with a constant term)

Definition

A time series is said to follow a random walk if its first differences (the differences from one observation to the next) are random.

Note that in a random walk model the time series itself is not random; however, its first differences (the changes from one period to the next) are random.

A random walk model for a time series $X_t$ can be written as

$X_t=X_{t-1}+e_t\, \, ,$

where $X_t$ is the value in time period $t$, $X_{t-1}$ is the value in time period $t-1$, and $e_t$ is a random shock (the value of the error term in time period $t$).

Since the random walk is defined in terms of first differences, it is easier to see the model as

$X_t-X_{t-1}=e_t\, \, ,$

where the original time series is changed to a first-difference time series, that is, the time series is transformed.

The transformed time series:

• Forecast the future trends to aid in decision making
• If a time series follows a random walk, the original series offers little or no insight
• It may be necessary to analyze the first-differenced time series
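The two forms of the model above can be illustrated with a small simulation. The following is a minimal Python sketch, not part of the original discussion: the function names, the normally distributed shocks, the seed, and the drift value 0.1 are all assumptions chosen for illustration. The random walk with drift is written here as $X_t=\delta+X_{t-1}+e_t$, where $\delta$ is the constant term.

```python
import random

def simulate_random_walk(n, drift=0.0, sigma=1.0, x0=0.0, seed=0):
    """Return (path, shocks) for X_t = drift + X_{t-1} + e_t.

    The shocks e_t are drawn as N(0, sigma^2); this distribution is an
    assumption made for the illustration only.
    """
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, sigma) for _ in range(n)]
    path = [x0]
    for e in shocks:
        # Today's value is yesterday's value plus the drift plus a random shock.
        path.append(drift + path[-1] + e)
    return path, shocks

def first_differences(series):
    """Return X_t - X_{t-1} for every pair of consecutive observations."""
    return [b - a for a, b in zip(series, series[1:])]

# Random walk without drift: the first differences are the shocks themselves.
path, shocks = simulate_random_walk(n=500)
diffs = first_differences(path)

# Random walk with drift: the first differences are the shocks plus the constant.
drift_path, drift_shocks = simulate_random_walk(n=500, drift=0.1)
drift_diffs = first_differences(drift_path)
```

Differencing the simulated path gives back the random shocks (shifted by the constant in the drift case), which is why the differenced series is the one that gets analyzed.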

Consider a real-world example: the daily US-dollar-to-Euro exchange rate from January 1, 1999, to December 5, 2014. A plot of the entire history looks quite interesting, with many peaks and valleys.

In a plot of the daily changes (first differences), the volatility (variance) has not been constant over time, but the day-to-day changes are almost completely random.

Random walk patterns are also widely found elsewhere in nature, for example, in the phenomenon of Brownian motion, which was first explained by Einstein.

# Stationary Stochastic Process

A stochastic process is said to be stationary if its mean and variance are constant over time and the covariance between two time periods depends only on the distance (gap or lag) between them and not on the actual time at which the covariance is computed. Such a stochastic process is also known as a weakly stationary, covariance stationary, second-order stationary, or wide-sense stationary process.

In other words, a sequence of random variables {$y_t$} is covariance stationary if there is no trend and if the covariance does not change over time.

## Strictly Stationary and Covariance Stationary

A time series is strictly stationary if all the moments of its probability distribution are invariant over time, not just the first two (mean and variance).

Let $y_t$ be a stochastic time series with

$E(y_t) = \mu$    $\Rightarrow$ Mean

$Var(y_t) = E[(y_t -\mu)^2]=\sigma^2$  $\Rightarrow$ Variance

$\gamma_k = E[(y_t-\mu)(y_{t+k}-\mu)] = Cov(y_t, y_{t+k})$  $\Rightarrow$ Covariance

$\gamma_k$ is the covariance (or autocovariance) at lag $k$.

If $k=0$, then $\gamma_0=Cov(y_t, y_t)=Var(y_t)=\sigma^2$.

If $k=1$, then $\gamma_1$ is the covariance between two adjacent values of $y$.

If $y_t$ is to be stationary, the mean, variance, and autocovariance of $y_{t+m}$ (a shift of the origin of $y$ by $m$) must be the same as those of $y_t$.

In other words, if a time series is stationary, its mean, variance, and autocovariance remain the same no matter at what point we measure them, i.e., they are time invariant.
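In practice, the autocovariance $\gamma_k$ is estimated from data. The following is a minimal Python sketch of one common sample estimator; the function name, the divisor $n$ (rather than $n-k$), and the toy series are assumptions made for illustration and are not taken from the text above.

```python
def sample_autocovariance(y, k):
    """Estimate gamma_k as (1/n) * sum over t of (y_t - mean)(y_{t+k} - mean).

    Dividing by n for every lag is one common convention; dividing by
    n - k is another. The choice here is an assumption.
    """
    n = len(y)
    mean = sum(y) / n
    return sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k)) / n

# A short made-up series that repeats every four observations.
y = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]

gamma0 = sample_autocovariance(y, 0)  # lag 0: the sample variance
gamma1 = sample_autocovariance(y, 1)  # lag 1: adjacent values
gamma2 = sample_autocovariance(y, 2)  # lag 2: values two periods apart
```

For this toy series the lag-0 value is the (divisor-$n$) sample variance, matching the statement that $\gamma_0$ equals the variance.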

## Non-Stationary Time Series

A time series having a time-varying mean, a time-varying variance, or both is called a non-stationary time series.
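The random walk $X_t=X_{t-1}+e_t$ is a standard example. The short derivation below is a sketch under assumptions that the text does not state explicitly: $X_0$ is a fixed starting value, and the shocks $e_t$ are uncorrelated with zero mean and constant variance $\sigma^2$. Substituting the model into itself repeatedly gives

$X_t=X_0+e_1+e_2+\cdots+e_t$

so that

$E(X_t)=X_0$

$Var(X_t)=Var(e_1)+Var(e_2)+\cdots+Var(e_t)=t\sigma^2$

The mean is constant, but the variance grows with $t$, so under these assumptions the random walk has a time-varying variance and is non-stationary.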

## Purely Random/ White Noise Process

A stochastic process that has zero mean and constant variance ($\sigma^2$) and is serially uncorrelated is called a purely random or white noise process.

If its values are also independent, such a process is called strictly white noise.

White noise is often denoted by $\mu_t$ and written as $\mu_t \sim N(0, \sigma^2)$, i.e., $\mu_t$ is independently and identically distributed as a normal distribution with zero mean and constant variance.
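The three defining properties can be checked on simulated data. The following Python sketch generates Gaussian white noise and computes its sample mean, sample variance, and lag-1 correlation; the seed, the sample size, and the value $\sigma=2$ are arbitrary assumptions for illustration.

```python
import random

rng = random.Random(42)
sigma = 2.0  # assumed standard deviation of the noise

# Independent draws from N(0, sigma^2): Gaussian white noise.
noise = [rng.gauss(0.0, sigma) for _ in range(10_000)]

n = len(noise)
mean = sum(noise) / n
variance = sum((u - mean) ** 2 for u in noise) / n

# Sample covariance between each value and the next one, then scaled
# by the variance to give the lag-1 correlation.
lag1_cov = sum((noise[t] - mean) * (noise[t + 1] - mean) for t in range(n - 1)) / n
lag1_corr = lag1_cov / variance
```

With a sample this large, the mean and the lag-1 correlation should both be close to zero and the variance close to $\sigma^2=4$, up to sampling error.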

Stationary time series are important because if a time series is non-stationary, we can study its behaviour only for the time period under consideration. Each set of time series data will therefore be for a particular episode. As a consequence, it is not possible to generalize it to other time periods. Therefore, for the purpose of forecasting, such (non-stationary) time series may be of little practical value. Our interest is in stationary time series.

# Autocorrelation Time Series Data

The autocorrelation (serial correlation) function is a diagnostic tool that helps to describe the evolution of a process through time. Inference based on the autocorrelation function is often called analysis in the time domain.

The autocorrelation of a random process measures the correlation (relationship) between observations at different distances apart. These coefficients (correlation or autocorrelation) often provide insight into the probability model that generated the data. One can say that autocorrelation is a mathematical tool for finding repeating patterns in a data series.

Autocorrelation is usually used for the following two purposes:

1. To help detect non-randomness in data (the first, i.e., lag 1, autocorrelation is used)
2. To help identify an appropriate time series model if the data are not random (autocorrelations are usually plotted for many lags)

For simple correlation, suppose there are $n$ pairs of observations on two variables $x$ and $y$; then the usual correlation coefficient (Pearson’s coefficient of correlation) is

$r=\frac{\sum(x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum (x_i-\overline{x})^2 \sum (y_i-\overline{y})^2 }}$

A similar idea can be applied to a time series to see whether successive observations are correlated. Given $n$ observations $x_1, x_2, \cdots, x_n$ on a discrete time series, we can form $n-1$ pairs of observations: $(x_1, x_2), (x_2, x_3), \cdots, (x_{n-1}, x_n)$. In each pair, the first observation is treated as one variable ($x_t$) and the second observation as a second variable ($x_{t+1}$). So the correlation coefficient between $x_t$ and $x_{t+1}$ is

$r_1=\frac{ \sum_{t=1}^{n-1} (x_t- \overline{x}_{(1)} ) (x_{t+1}-\overline{x}_{(2)}) } {\sqrt{ [\sum_{t=1}^{n-1} (x_t-\overline{x}_{(1)})^2] [ \sum_{t=1}^{n-1} (x_{t+1}-\overline{x}_{(2)})^2 ] } }$

where

$\overline{x}_{(1)}=\sum_{t=1}^{n-1} \frac{x_t}{n-1}$ is the mean of the first $n-1$ observations

$\overline{x}_{(2)}=\sum_{t=2}^{n} \frac{x_t}{n-1}$ is the mean of the last $n-1$ observations

Note that the observations are assumed to be equally spaced (equi-spaced).

The coefficient $r_1$ is called the autocorrelation or serial correlation coefficient. For large $n$, $r_1$ is approximately

$r_1=\frac{\frac{\sum_{t=1}^{n-1} (x_t-\overline{x})(x_{t+1}-\overline{x}) }{n-1}}{ \frac{\sum_{t=1}^n (x_t-\overline{x})^2}{n}}$

or, since the factor $n/(n-1)$ is close to 1 for large $n$,

$r_1=\frac{\sum_{t=1}^{n-1} (x_t-\overline{x})(x_{t+1}-\overline{x}) } { \sum_{t=1}^n (x_t-\overline{x})^2}$
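To see how close the large-$n$ approximation is to the pair-based definition of $r_1$, both can be computed on the same series. The following is a minimal Python sketch; the function names and the sine-wave test series are assumptions chosen only to give a smooth, strongly autocorrelated example.

```python
import math

def r1_pearson(x):
    """Lag-1 autocorrelation as the Pearson correlation of the pairs (x_t, x_{t+1})."""
    first, second = x[:-1], x[1:]
    m1 = sum(first) / len(first)    # mean of the first n-1 observations
    m2 = sum(second) / len(second)  # mean of the last n-1 observations
    num = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    den = math.sqrt(sum((a - m1) ** 2 for a in first)
                    * sum((b - m2) ** 2 for b in second))
    return num / den

def r1_simplified(x):
    """Lag-1 autocorrelation using the overall mean and the simplified large-n formula."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

# A made-up smooth series with 200 equally spaced observations.
x = [math.sin(0.3 * t) for t in range(200)]

exact = r1_pearson(x)
approx = r1_simplified(x)
```

For a series of this length the two values should agree closely, which is why the simpler formula is usually the one used in practice.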

For observations $k$ distance apart, i.e., at lag $k$,

$r_k=\frac{\sum_{t=1}^{n-k} (x_t-\overline{x})(x_{t+k}-\overline{x}) } { \sum_{t=1}^n (x_t-\overline{x})^2}$

An $r_k$ value outside $\pm \frac{2}{\sqrt{n} }$ denotes a significant difference from zero and signifies an autocorrelation.
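The lag-$k$ formula and the $\pm 2/\sqrt{n}$ rule can be combined into a simple check for non-randomness. The following Python sketch is illustrative only: the function names, the seed, the series length of 400, and the choice of 10 lags are assumptions. It compares white noise with a random walk built from the same shocks.

```python
import math
import random

def autocorrelation(x, k):
    """Sample autocorrelation r_k using the simplified large-n formula."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def significant_lags(x, max_lag):
    """Return the lags 1..max_lag whose |r_k| exceeds 2 / sqrt(n)."""
    bound = 2.0 / math.sqrt(len(x))
    return [k for k in range(1, max_lag + 1) if abs(autocorrelation(x, k)) > bound]

rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]  # white noise

# Random walk: running sum of the same shocks.
walk = []
level = 0.0
for e in noise:
    level += e
    walk.append(level)

noise_lags = significant_lags(noise, 10)
walk_lags = significant_lags(walk, 10)
```

For the random walk, every one of the first 10 lags is expected to fall outside the bounds, while for white noise only an occasional lag should do so by chance (roughly one lag in twenty).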

## Application of Autocorrelation

• Autocorrelation analysis is widely used in fluorescence correlation spectroscopy.
• Autocorrelation is used to measure optical spectra and to measure the very-short-duration light pulses produced by lasers.
• Autocorrelation is used to analyze dynamic light scattering data for the determination of the particle size distributions of nanometer-sized particles in a fluid. A laser shining into the mixture produces a speckle pattern. Autocorrelation of the signal can be analyzed in terms of the diffusion of the particles. From this, knowing the fluid viscosity, the sizes of the particles can be calculated.
• The small-angle X-ray scattering intensity of a nano-structured system is the Fourier transform of the spatial autocorrelation function of the electron density.
• In optics, normalized autocorrelations and cross-correlations give the degree of coherence of an electromagnetic field.
• In signal processing, autocorrelation can provide information about repeating events such as musical beats or pulsar frequencies, but it cannot tell the position in time of the beat. It can also be used to estimate the pitch of a musical tone.
• In music recording, autocorrelation is used as a pitch detection algorithm prior to vocal processing, as a distortion effect or to eliminate undesired mistakes and inaccuracies.
• In statistics, spatial autocorrelation between sample locations also helps one estimate mean value uncertainties when sampling a heterogeneous population.
• In astrophysics, auto-correlation is used to study and characterize the spatial distribution of galaxies in the Universe and in multi-wavelength observations of Low Mass X-ray Binaries.
• In analysis of Markov chain Monte Carlo data, autocorrelation must be taken into account for correct error determination.

## Components of Time Series Data

Traditional methods of time series analysis are concerned with decomposing a series into a trend, a seasonal variation, and other irregular fluctuations. Although this approach is not always the best, it is still useful (Kendall and Stuart, 1996).

The parts of which a time series is composed are called the components of time series data. There are four basic components of time series data, described below.

The different sources of variation are:

1. Seasonal effect (Seasonal Variation or Seasonal Fluctuations)
Many time series exhibit seasonal variation with an annual period, such as sales and temperature readings. This type of variation is easy to understand and can be easily measured or removed from the data to give de-seasonalized data. Seasonal fluctuations describe any regular variation (fluctuation) with a period of less than one year; for example, the cost of various types of fruits and vegetables, clothes, unemployment figures, average daily rainfall, the increase in the sale of tea in winter, and the increase in the sale of ice cream in summer all show seasonal variations. Changes that repeat themselves within a fixed period are also called seasonal variations, for example, traffic on roads in the morning and evening hours, sales at festivals like Eid, and the increase in the number of passengers at weekends. Seasonal variations are caused by climate, social customs, religious activities, etc.
2. Other Cyclic Changes (Cyclical Variation or Cyclic Fluctuations)
Some time series exhibit cyclical variation at a fixed period due to some other physical cause, such as daily variation in temperature. Cyclical variation is a non-seasonal component that varies in a recognizable cycle. Some time series exhibit oscillations that do not have a fixed period but are predictable to some extent. For example, economic data are affected by business cycles with a period varying between about 5 and 7 years. In weekly or monthly data, the cyclical component may describe any regular variation (fluctuation) in the time series data. Cyclical variations are periodic in nature and repeat themselves like the business cycle, which has four phases: (i) peak, (ii) recession, (iii) trough/depression, and (iv) expansion.
3. Trend (Secular Trend or Long Term Variation)
It is a longer-term change. Here we take into account the number of observations available and make a subjective assessment of what is long term. To understand the meaning of long term, consider, for example, climate variables, which sometimes exhibit cyclic variation over a very long time period such as 50 years. If one had just 20 years of data, this long-term oscillation would appear to be a trend, but if several hundred years of data were available, the long-term oscillations would be visible. These movements are systematic in nature: they are broad and steady, showing a slow rise or fall in the same direction. The trend may be linear or non-linear (curvilinear). Some examples of secular trend are an increase in prices, an increase in pollution, an increase in the need for wheat, an increase in the literacy rate, and a decrease in deaths due to advances in science. Taking averages over a certain period is a simple way of detecting a trend in seasonal data. A change in the averages with time is evidence of a trend in the given series, though there are more formal tests for detecting trend in time series.
4. Other Irregular Variation (Irregular Fluctuations)
When trend and cyclical variations are removed from a set of time series data, the residual that is left may or may not be random. Various techniques for analyzing series of this type examine whether the irregular variation can be explained in terms of probability models such as moving average or autoregressive models, i.e., we can see whether any cyclical variation is still left in the residuals. Variations that occur due to sudden causes are called residual variations (irregular variations, or accidental or erratic fluctuations) and are unpredictable, for example, a rise in the price of steel due to a strike in the factory, an accident due to brake failure, a flood, an earthquake, a war, etc.

Figure: Components of Time Series Data
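The decomposition idea can be sketched in a few lines of Python. The example below is illustrative only and rests on several assumptions that are not in the text: an additive model (series = trend + seasonal + irregular), monthly data with a period of 12, a centered moving average as the trend estimate, and a made-up simulated series. The cyclical component is not modeled separately here.

```python
import math
import random

PERIOD = 12  # assumed: monthly data with an annual seasonal pattern

def centered_moving_average(x, period):
    """Centered moving average for an even period; None where the window does not fit."""
    half = period // 2
    out = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half:t + half + 1]
        # The two end points get half weight so the window covers exactly one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        out[t] = total / period
    return out

def seasonal_effects(x, trend, period):
    """Average detrended value for each position in the season, shifted to sum to zero."""
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(x, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    raw = [sum(b) / len(b) for b in buckets]
    shift = sum(raw) / period
    return [r - shift for r in raw]

# Simulated series: linear trend + seasonal pattern + small irregular noise.
rng = random.Random(7)
true_season = [3.0 * math.sin(2 * math.pi * m / PERIOD) for m in range(PERIOD)]
series = [10.0 + 0.5 * t + true_season[t % PERIOD] + rng.gauss(0.0, 0.3)
          for t in range(120)]

trend = centered_moving_average(series, PERIOD)
season = seasonal_effects(series, trend, PERIOD)
# What is left after removing the estimated trend and seasonal effect.
irregular = [series[t] - trend[t] - season[t % PERIOD]
             for t in range(len(series)) if trend[t] is not None]
```

Averaging over one full period removes the seasonal pattern, so the moving average tracks the trend; averaging the detrended values by month then recovers the seasonal effect, and the remainder is the irregular component.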

## Objectives of Time Series Analysis

Time series analysis has many objectives, which may be classified as description, explanation, prediction, and control.

Each of these objectives of time series analysis is described below.

## Description

The first step in the analysis is to plot the data and obtain simple descriptive measures of the main properties of the series (such as looking for trends, seasonal fluctuations, and so on). In the figure above, there is a regular seasonal pattern of price change, although this price pattern is not consistent. A graph makes it possible to look for “wild” observations or outliers (observations that do not appear to be consistent with the rest of the data). Graphing the time series also makes it possible to detect turning points, where an upward trend suddenly changes to a downward trend. If there is a turning point, different models may have to be fitted to the two parts of the series.

## Explanation

When observations are taken on two or more variables, it may be possible to use the variation in one time series to explain the variation in another series. This may lead to a deeper understanding. A multiple regression model may be helpful in this case.

## Prediction

Given an observed time series, one may want to predict the future values of the series. This is an important task in sales forecasting and in the analysis of economic and industrial time series. The terms prediction and forecasting are used interchangeably.
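As one concrete illustration, suppose the series is assumed to follow the random walk model described earlier. With zero-mean shocks, the expected future value is then simply the last observed value, plus the accumulated drift if a constant term is present. The following Python sketch shows this; the function names and the price data are hypothetical, and the drift estimate (the average first difference) is one simple choice among several.

```python
def random_walk_forecast(history, horizon, drift=0.0):
    """Forecast h = 1..horizon steps ahead as last value + h * drift."""
    last = history[-1]
    return [last + drift * h for h in range(1, horizon + 1)]

def estimate_drift(history):
    """Average first difference, which equals (last - first) / (n - 1)."""
    return (history[-1] - history[0]) / (len(history) - 1)

# Hypothetical observed prices.
prices = [100.0, 101.5, 100.5, 102.0, 103.0]

# Without drift, every forecast equals the last observation.
flat = random_walk_forecast(prices, horizon=3)

# With drift, the forecast moves by the estimated constant each period.
drifted = random_walk_forecast(prices, horizon=3, drift=estimate_drift(prices))
```

Other models lead to other forecasting rules; this one applies only under the random walk assumption.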

## Control

When a time series is generated to measure the quality of a manufacturing process, the aim may be to control the process. Control procedures are of several different kinds. In quality control, the observations are plotted on a control chart and the controller takes action as a result of studying the charts. In another approach, a stochastic model is fitted to the series, future values of the series are predicted, and then the input process variables are adjusted so as to keep the process on target.
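The control chart idea can be sketched as follows. This is a simplified illustration, not a full quality-control procedure: the three-standard-deviation limits, the function names, and the measurement values are all assumptions made for the example.

```python
import statistics

def control_limits(baseline, width=3.0):
    """Return (lower, center, upper) limits from in-control baseline data.

    The limits are center +/- width * sample standard deviation; a width
    of 3 is a widely used convention, assumed here.
    """
    center = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    return center - width * spread, center, center + width * spread

def out_of_control(observations, lower, upper):
    """Return the positions of observations that fall outside the limits."""
    return [i for i, value in enumerate(observations)
            if value < lower or value > upper]

# Hypothetical measurements taken while the process was running on target.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
lower, center, upper = control_limits(baseline)

# New measurements to be checked against the chart.
new_points = [10.0, 10.3, 9.7, 11.2, 10.1]
flagged = out_of_control(new_points, lower, upper)
```

A flagged observation is a signal for the controller to investigate the process, not proof that something has gone wrong.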