# Stationary Stochastic Process

A stochastic process is said to be stationary if its mean and variance are constant over time and the covariance between two time periods depends only on the distance (gap or lag) between them, not on the actual time at which the covariance is computed. Such a stochastic process is also known as a weakly stationary, covariance stationary, second-order stationary, or wide-sense stationary process.

In other words, a sequence of random variables {$y_t$} is covariance stationary if there is no trend and the covariance does not change over time.

## Strictly Stationary and Covariance Stationary

A time series is strictly stationary if all the moments of its probability distribution are invariant over time, not just the first two (mean and variance).

Let $y_t$ be a stochastic time series with

\begin{align}
E(y_t) &= \mu && \text{(mean)}\\
V(y_t) &= E(y_t -\mu)^2=\sigma^2 && \text{(variance)}\\
\gamma_k &= E[(y_t-\mu)(y_{t+k}-\mu)] = Cov(y_t, y_{t+k}) && \text{(covariance)}
\end{align}

$\gamma_k$ is covariance or autocovariance at lag $k$.

If $k=0$ then $\gamma_0 = Cov(y_t, y_t)=Var(y_t)=\sigma^2$.

If $k=1$ then $\gamma_1$ is the covariance between two adjacent values of $y$.

If $y_t$ is to be stationary, the mean, variance and autocovariance of $y_{t+m}$ (that is, $y$ with its origin shifted by $m$) must be the same as those of $y_t$.

Put differently, if a time series is stationary, its mean, variance and autocovariance remain the same no matter at what point we measure them, i.e., they are time invariant.
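The autocovariance $\gamma_k$ can be estimated from data. The following is a minimal sketch, assuming NumPy is available; the function name `sample_autocovariance` and the simulated series are illustrative choices, not part of the original text. It uses the common convention of dividing by the series length $n$ rather than $n-k$.

```python
import numpy as np

def sample_autocovariance(y, k):
    """Sample autocovariance at lag k: average of (y_t - ybar)(y_{t+k} - ybar)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()
    # Divide by n (not n - k), a common convention for time series
    return np.sum((y[: n - k] - ybar) * (y[k:] - ybar)) / n

# Illustrative stationary series: independent draws with mean 5 and variance 4
rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=2000)

gamma_0 = sample_autocovariance(y, 0)  # lag 0: the sample variance
gamma_1 = sample_autocovariance(y, 1)  # lag 1: adjacent values
```

At lag 0 the function reproduces the sample variance, matching $\gamma_0=\sigma^2$. For this particular series the lag-1 value should be close to zero, because the draws are independent.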

## Non-Stationary Time Series

A time series having a time-varying mean or a time-varying variance or both is called a non-stationary time series.

## Purely Random/ White Noise Process

A stochastic process that has zero mean and constant variance ($\sigma^2$) and is serially uncorrelated is called a purely random or white noise process.

If it is also independent, such a process is called strictly white noise.

White noise is denoted by $\mu_t$ (not to be confused with the mean $\mu$ above), with $\mu_t \sim N(0, \sigma^2)$, i.e., $\mu_t$ is independently and identically distributed as a normal distribution with zero mean and constant variance.
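To make the contrast between a white noise process and a non-stationary series concrete, here is a small simulation sketch, assuming NumPy. The random walk (a running sum of the noise) is an illustrative example of a non-stationary series chosen for this sketch; it is not discussed in the text above.

```python
import numpy as np

rng = np.random.default_rng(42)
n_paths, n_steps, sigma = 2000, 200, 1.0

# Each row is one realization of Gaussian white noise
noise = rng.normal(0.0, sigma, size=(n_paths, n_steps))
# Random walk: running sum of the noise along time
walk = noise.cumsum(axis=1)

# Variance across realizations at an early and a late time point
noise_var_early = noise[:, 49].var()
noise_var_late = noise[:, 199].var()
walk_var_early = walk[:, 49].var()
walk_var_late = walk[:, 199].var()
```

The white noise variance stays near $\sigma^2$ at both time points, while the variance of the random walk grows with time, so the walk has a time-varying variance.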

Stationary time series are important because if a time series is non-stationary, we can study its behaviour only for the time period under consideration. Each set of time series data will therefore be for a particular episode. As a consequence, it is not possible to generalize it to other time periods. Therefore, for the purpose of forecasting, such (non-stationary) time series may be of little practical value. Our interest is in stationary time series.

## Binomial Experiment

A statistical experiment that consists of successive independent trials, each with two possible outcomes (such as success and failure, true and false, yes and no, or right and wrong) and the same probability of success, and that is repeated a fixed number of times (say $n$ times), is called a Binomial Experiment. Each trial of a Binomial Experiment is known as a Bernoulli trial (a trial being a single performance of an experiment). There are four properties of a Binomial Experiment.

1. Each trial of a Binomial Experiment can be classified as a success or a failure.
2. The probability of success is the same for each trial of the experiment.
3. Successive trials are independent, that is, the outcome of one trial does not affect the outcome of any other.
4. The experiment is repeated a fixed number of times.

## Binomial Probability Distribution

Let $X$ be a discrete random variable that denotes the number of successes in a Binomial Experiment (we call this a binomial random variable). The random variable assumes isolated values $X=0,1,2,\cdots,n$. The probability distribution of a binomial random variable is termed the binomial probability distribution. It is a discrete probability distribution.

## Binomial Probability Mass Function

The probability function of the binomial distribution is also called the binomial probability mass function and can be denoted by $b(x, n, p)$, that is, a binomial distribution of the random variable $X$ with $n$ (the given number of trials) and $p$ (the probability of success) as parameters. If $p$ is the probability of success (so that $q=1-p$ is the probability of failure and $p+q=1$), then the probability of exactly $x$ successes can be found from the following formula,

\begin{align}
b(x, n, p) &= P(X=x)\\
&=\binom{n}{x} p^x q^{n-x}, \quad x=0,1,2, \cdots, n
\end{align}

where $p$ is probability of success of a single trial, $q$ is probability of failure and $n$ is number of independent trials.

For given $n$ and $p$, the formula gives the probability of each possible value $x$ of the binomial random variable $X$. Note that it does not cover $X<0$ or $X>n$. The binomial distribution is suitable when $n$ is small and is applied when sampling is done with replacement.
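The formula translates directly into code. The following is a minimal sketch using only the Python standard library; the function name `binomial_pmf` is an illustrative choice, and returning zero outside $0 \le x \le n$ is an assumption made here for convenience.

```python
from math import comb

def binomial_pmf(x, n, p):
    """b(x, n, p) = C(n, x) * p**x * q**(n - x) for x = 0, 1, ..., n."""
    if not 0 <= x <= n:
        return 0.0  # assumption: values outside the range get probability zero
    q = 1.0 - p
    return comb(n, x) * p**x * q ** (n - x)

# Probability of exactly 2 successes in 5 trials with p = 0.5:
# C(5, 2) * 0.5**2 * 0.5**3 = 10 / 32 = 0.3125
prob = binomial_pmf(2, 5, 0.5)
```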

$b(x, n, p) = \binom{n}{x} p^x q^{n-x}, \quad x=0,1,2,\cdots,n,$

is called the Binomial distribution because its successive terms are exactly the same as those of the binomial expansion of

\begin{align}
(q+p)^n=\binom{n}{0} p^0 q^{n-0}+\binom{n}{1} p^1 q^{n-1}+\cdots+\binom{n}{n-1} p^{n-1} q^{n-(n-1)}+\binom{n}{n} p^n q^{n-n}
\end{align}

$\binom{n}{0}, \binom{n}{1}, \binom{n}{2},\cdots, \binom{n}{n-1}, \binom{n}{n}$ are called Binomial coefficients.

Note that it is necessary to state the range of the random variable; otherwise the formula is only a mathematical equation, not a probability distribution.
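The link to the binomial expansion can be checked numerically. The sketch below, using only the standard library, lists the binomial coefficients for an illustrative choice of $n=4$ and $p=0.3$ (values assumed here for the example) and confirms that the terms over the full range $x=0,1,\cdots,n$ add up to $(q+p)^n=1$.

```python
from math import comb

n, p = 4, 0.3  # illustrative values
q = 1 - p

# Binomial coefficients C(n, 0), ..., C(n, n)
coefficients = [comb(n, x) for x in range(n + 1)]

# Successive terms of the expansion of (q + p)**n
terms = [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]
total = sum(terms)  # equals 1 up to floating-point rounding
```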

## Writing Excel Formulas

Writing Excel formulas is a little different from the way it is done in mathematics class. All Excel formulas start with an equal sign (=); that is, the equal sign always goes in the cell where you want the answer from the formula to appear. The equal sign informs Excel that this is a formula, not just a name or number. An Excel formula looks like

= 3 + 2

rather than

3+2 =

## Cell References in Formula

The example formula above has one drawback. If you want to change the numbers being calculated (3 and 2), you need to edit or rewrite the formula. A better way is to write the formula so that you can change the numbers without changing or rewriting the formula itself. To do this, cell references are used, which tell Excel that the data are located in a cell. A cell's location in the spreadsheet is referred to as its cell reference.

To find a cell reference, simply click the cell whose reference you need and read the text, such as F2, in the NAME BOX (shown in figure below).

F2 represents the cell in column F (horizontal position) and row 2 (vertical position). This means a cell reference can also be found by reading the column heading (at the top) and the row number (at the left) of the cell. A cell reference is therefore a combination of the column letter and row number, such as A1, B2, Z5, and A106. For the previous formula example, instead of writing = 3 + 2 in a cell (say C1), use cell references as follows:

In cell A1 write 3, and in cell A2 write 2. In cell C1 write the formula

= A1 + A2

Note that there is no gap between A & 1 and A & 2, they are simply A1 and A2. See the diagram for clear understanding.

## Updating Excel Formula

When cell references are used in an Excel formula, the result of the formula is updated automatically whenever the data value in a referenced cell changes. For example, if you want to change the data in cell A1 to 8 instead of 3, you only need to change the content of A1. The result of the formula in cell C1 will be updated automatically after the data value in A1 or A2 is changed.

Note that the formula will not change because the cells references are being used instead of data values or numbers.
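The idea of a formula that refers to cells rather than to fixed numbers can be mimicked outside Excel. The following Python sketch is only an analogy, not how Excel works internally: the dictionary `cells` and the function `c1_formula` are hypothetical names standing in for the worksheet and for the formula = A1 + A2 in cell C1.

```python
# Hypothetical mini-model of a worksheet: cell values live in a dict,
# and a formula is a function that looks values up by cell reference.
cells = {"A1": 3, "A2": 2}

def c1_formula(sheet):
    """Plays the role of the formula '= A1 + A2' stored in cell C1."""
    return sheet["A1"] + sheet["A2"]

before = c1_formula(cells)  # uses A1 = 3 and A2 = 2

cells["A1"] = 8             # edit the data, not the formula
after = c1_formula(cells)   # same formula, new result
```

The formula itself never changes; only the data it points to does, which is the behaviour the cell references provide in the worksheet.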

## The Correlogram

A correlogram is a graph used to interpret a set of autocorrelation coefficients, in which $r_k$ is plotted against the lag $k$. A correlogram is often very helpful for visual inspection. Some general advice for interpreting the correlogram follows:

• A Random Series: If a time series is completely random, then for large $N$, $r_k \cong 0$ for all non-zero values of $k$. For a random time series, $r_k$ is approximately $N\left(0, \frac{1}{N}\right)$, so 19 out of 20 of the values of $r_k$ can be expected to lie between $\pm \frac{2}{\sqrt{N}}$. However, when plotting the first 20 values of $r_k$, one can expect to find one significant value on average even when the time series really is random.
• Short-term Correlation: Stationary series often exhibit short-term correlation, characterized by a fairly large value of $r_1$ followed by 2 or 3 more coefficients that are significantly greater than zero but tend to get successively smaller. Values of $r_k$ for larger lags tend to be approximately zero. A time series which gives rise to such a correlogram is one for which an observation above the mean tends to be followed by one or more further observations above the mean, and similarly for observations below the mean. A model called an autoregressive model may be appropriate for series of this type.
• Alternating Series: If a time series has a tendency to alternate, with successive observations on different sides of the overall mean, then the correlogram also tends to alternate. The value of $r_1$ will be negative, whereas the value of $r_2$ will be positive, as observations at lag 2 will tend to be on the same side of the mean.
• Non-Stationary Series: If a time series contains a trend, then the values of $r_k$ will not come down to zero except for very large values of the lag. This is because an observation on one side of the mean tends to be followed by a large number of further observations on the same side of the mean because of the trend. The sample autocorrelation function $\{ r_k \}$ should only be calculated for stationary time series, so any trend should be removed before calculating $\{ r_k\}$.
• Seasonal Fluctuations: If a time series contains a seasonal fluctuation, then the correlogram will also exhibit an oscillation at the same frequency. If $x_t$ follows a sinusoidal pattern then so does $r_k$.
$x_t=a \cos t w,$ where $a$ is a constant and $w$ is the frequency such that $0 < w < \pi$. Therefore $r_k \cong \cos k w$ for large $N$.
If the seasonal variation is removed from seasonal data then the correlogram may provide useful information.
• Outliers: If a time series contains one or more outliers, the correlogram may be seriously affected. If there is one outlier in the time series and it is not adjusted, then the plot of $x_t$ vs $x_{t+k}$ will contain two extreme points, which will tend to depress the sample correlation coefficients towards zero. If there are two outliers, this effect is more noticeable.
• General Remarks: Experience is required to interpret autocorrelation coefficients. We need to study the probability theory of stationary series and the classes of models too. We also need to know the sampling properties of $r_k$.
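The coefficients behind a correlogram can be computed in a few lines. The following is a sketch under stated assumptions: it uses NumPy, takes $r_k = c_k/c_0$ with the sample autocovariance $c_k$ divided by $N$, and the function name `sample_acf` and both simulated series are illustrative. It covers the random-series and alternating-series cases from the list above.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Return r_0, ..., r_max_lag, where r_k = c_k / c_0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    c0 = np.sum(d * d) / n
    return np.array([np.sum(d[: n - k] * d[k:]) / n / c0
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(1)

# Random series: r_k should mostly fall inside +/- 2 / sqrt(N)
N = 400
random_series = rng.normal(size=N)
r = sample_acf(random_series, 20)
bound = 2 / np.sqrt(N)
n_outside = int(np.sum(np.abs(r[1:]) > bound))  # about 1 in 20 on average

# Alternating series: successive values on opposite sides of the mean
alternating = np.array([1.0, -1.0] * 100) + 0.1 * rng.normal(size=200)
r_alt = sample_acf(alternating, 2)  # r_1 negative, r_2 positive
```

Plotting `r` against the lag, with horizontal lines at `±bound`, gives the correlogram for the random series.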

## Principal Component Regression (PCR)

Principal components are a new set of uncorrelated variables obtained by transforming the original data set. This transformation ranks the new variables according to their importance (that is, according to the size of their variance) and eliminates those of least importance. After the transformation, a least squares regression on this reduced set of principal components is performed.

Principal Component Regression (PCR) is not scale invariant, so one should scale and center the data first. Consider a $p$-dimensional random vector $x=(x_1, x_2, \ldots, x_p)^t$ with covariance matrix $\Sigma$, and assume that $\Sigma$ is positive definite. Let $V=(v_1,v_2, \cdots, v_p)$ be a $(p \times p)$-matrix with orthonormal column vectors, that is, $v_i^t\, v_i=1$ for $i=1,2, \cdots, p$, and $V^t =V^{-1}$. The linear transformation is

\begin{aligned}
z&=V^t x\\
z_i&=v_i^t x
\end{aligned}

The variance of the random variable $z_i$ (with $x$ centered) is
\begin{aligned}
Var(z_i)&=E[v_i^t\, x\, x^t\, v_i]\\
&=v_i^t \Sigma v_i
\end{aligned}

Maximizing the variance $Var(z_i)$ under the condition $v_i^t v_i=1$ with a Lagrange multiplier $a_i$ gives
$\phi_i=v_i^t \Sigma v_i -a_i(v_i^t v_i-1)$

Setting the partial derivative to zero, we get
$\frac{\partial \phi_i}{\partial v_i} = 2 \Sigma v_i - 2a_i v_i=0$

which is
$(\Sigma - a_i I)v_i=0$

In matrix form
$\Sigma V= VA$
or
$\Sigma = VAV^t$

where $A=diag(a_1, a_2, \cdots, a_p)$. This is known as the eigenvalue problem: the $v_i$ are the eigenvectors of $\Sigma$ and the $a_i$ the corresponding eigenvalues, ordered such that $a_1 \ge a_2 \ge \cdots \ge a_p$. Since $\Sigma$ is positive definite, all eigenvalues are real and positive.

$z_i$ is named the $i$th principal component of $x$ and we have
$Cov(z)=V^t Cov(x) V=V^t \Sigma V=A$

The variance of the $i$th principal component equals the eigenvalue $a_i$, and the variances are ranked in descending order. This means that the last principal component has the smallest variance. Each principal component is orthogonal to (and in fact uncorrelated with) all the other principal components, since $A$ is a diagonal matrix.
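These properties can be verified numerically on a sample covariance matrix. The sketch below assumes NumPy and uses simulated data; the `mixing` matrix is a hypothetical device for producing correlated columns. Note that `numpy.linalg.eigh` returns eigenvalues in ascending order, so they are reordered to match $a_1 \ge a_2 \ge \cdots \ge a_p$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 observations of p = 3 correlated variables
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.3, 0.2]])
X = rng.normal(size=(500, 3)) @ mixing.T
Xc = X - X.mean(axis=0)               # center the data

S = np.cov(Xc, rowvar=False)          # sample covariance matrix
eigvals, V = np.linalg.eigh(S)        # eigenvalues come back in ascending order
order = np.argsort(eigvals)[::-1]     # reorder to descending
a, V = eigvals[order], V[:, order]

Z = Xc @ V                            # principal component scores
cov_Z = np.cov(Z, rowvar=False)       # should equal diag(a_1, ..., a_p)
```

The off-diagonal entries of `cov_Z` vanish up to rounding, and its diagonal reproduces the eigenvalues in descending order.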

In the following, we use the first $q$ principal components for regression, where $1\le q \le p$. The regression model for observed data $X$ and $y$ can then be expressed as

\begin{aligned}
y&=X\beta+\varepsilon\\
&=XVV^t\beta+\varepsilon\\
&= Z\theta+\varepsilon
\end{aligned}

with the $n\times q$ matrix of the empirical principal components $Z=XV$ and the new regression coefficients $\theta=V^t \beta$. The solution of the least squares estimation is

\begin{aligned}
\hat{\theta}_k=(z_k^t z_k)^{-1}z_k^ty
\end{aligned}

and $\hat{\theta}=(\hat{\theta}_1, \cdots, \hat{\theta}_q)^t$

Since the $z_k$ are orthogonal, the regression is a sum of univariate regressions, that is
$\hat{y}_{PCR}=\sum_{k=1}^q \hat{\theta}_k z_k$

Since $z_k$ are linear combinations of the original $x_j$, the solution in terms of coefficients of the $x_j$ can be expressed as
$\hat{\beta}_{PCR} (q)=\sum_{k=1}^q \hat{\theta}_k v_k=V \hat{\theta}$

Note that if $q=p$, we would get back the usual least squares estimates for the full model. For $q<p$, we get a “reduced” regression.
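The whole procedure can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a reference implementation: the function name `pcr_coefficients` and the simulated data are hypothetical, and the data are only centered (not scaled) to keep the comparison with ordinary least squares simple.

```python
import numpy as np

def pcr_coefficients(X, y, q):
    """PCR coefficients on centered data using the first q principal components."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    eigvals, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
    V = V[:, np.argsort(eigvals)[::-1][:q]]  # first q eigenvectors
    Z = Xc @ V                               # n x q matrix of components
    # Columns of Z are orthogonal, so each theta_k is a univariate regression
    theta = (Z.T @ yc) / np.sum(Z * Z, axis=0)
    return V @ theta                         # beta_PCR(q) = V theta

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.5 * rng.normal(size=200)  # two correlated columns
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

beta_full = pcr_coefficients(X, y, q=4)     # q = p
beta_reduced = pcr_coefficients(X, y, q=2)  # "reduced" regression

# Ordinary least squares on the same centered data, for comparison
beta_ols, *_ = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)
```

With $q=p$ the result agrees with the least squares fit on the centered data, while `beta_reduced` is still expressed in terms of the four original variables.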