# Basic Statistics and Data Analysis

### Category: Stochastic Processes

Stochastic Processes, Markov Process, Random Walk Models, Poisson Process, Brownian Process

## Markov Chain

A Markov chain, named after Andrey Markov is a mathematical system that experience transitions from one state to another, between a finite or countable number of possible states. Markov chain is a random process usually characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of memorylessness is called the Markov property. Markov chains have many applications as statistical models of real world processes.

If the random variables $X_{n-1}$ and $X_n$ take the values $X_{n-1}=i$ and $X_n=j$, then the system has made a transition $S_i \rightarrow S_j$, that is, a transition from state $S_i$ to state $S_j$ at the $n$th trial. Note that $i$ can equal $j$, so that transitions within the same state may be possible. We need to assign probabilities to the transitions $S_i \rightarrow S_j$. Generally in chain, the probability that $X_n=j$ will depend on the whole sequence of random variables starting with the initial value $X_0$. The Markov chain has the characteristic property that the probability that $X_n=j$ depends only on the immediate previous state of the system. This means that we need no further information at each step other than for each $i$ and $j$,  $P\{X_n=j|X_{n-1}=i\}$
which means the probability that $X_n=j$ given that $X_{n-1}=i$: this probability is independent of the values of $X_{n-2},X_{n-3},\cdots, X_0$.

Let we have a set of states $S=\{s_1,s_2,\cdots,s_n\}$. The process starts in one of these states and moves successively from one state to another state. Each move is called a step. If the chain is currently in state $s_i$ then it moves to state $s_j$ at the next step with probability denoted by $p_{ij}$ (transition probability) and this probability does not depend upon which states the chain was in before the current state. The probabilities $p_{ij}$ are called transition probabilities ($s_i \xrightarrow[]{p_{ij}} s_j$ ). The process can remain in the state it is in, and this occurs in probability $p_{ii}$.

An initial probability distribution, define on $S$ specifies the starting state. Usually this is done by specifying a particular state as the starting state.

Formally a Markov chain is a sequence of random variables $X_1,X_2,\cdots,$ with the Markov property that, given the present state, the future and past state are independent. Thus
$P(X_n=x|X_1=x_1,X_2=x_2\cdots X_{n-1}=x_{n-1})$
$\quad=P(X_n=x|X_{n-1}=x_{n-1})$
Or
$P(X_n=j|X_{n-1}=i)$

## Example: Markov Chain

A Markov chain $X$ on $S=\{0,1\}$ is determined by the initial distribution given by $p_0=P(X_0=0), \; p_1=P(X_0=1)$ and the one-step transition probability given by $p_{00}=P(x_{n+1}=0|X_n=0)$, $p_{10}=P(x_{n+1}=0|X_n=1)$, $p_{01}=1-p_{00}$ and $p_{11}=1-p_{10}$, so one-step transition probability in matrix form is $P=\begin{pmatrix}p_{00}&p_{10}\\p_{01}&p_{11}\end{pmatrix}$

## References:

• https://en.wikipedia.org/wiki/Markov_chain
• http://people.virginia.edu/~rlc9s/sys6005/SYS_6005_Intro_to_MC.pdf
• http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter11.pdf

# Symmetric Random Walk Probability

As we know that a related probability is the one for the event in which the first visit to position x occurs at the nth step given that the walk starts at the origin, i.e. the first passage through x. For case x=0 we will find the probability of the first return to the origin. Let A be the event that $X_n=0$, let $B_k$ be the event that the first visit to the origin occurs at the kth step. By the law of total probability
$P(A)=\sum_{k=1}^n P(A|B_k)P(B_k) \tag{1}\label{eqn2.341}$
Let $f_k=P(B_k)$. The conditional probability $P(A|B_k)$ is the probability that the walk returns to the origin after n–k steps i.e. $P(A|B_k)=p_{n-k}$. Note that $p_n$ is the probability that the walk is at the origin at step n and $p_n=0$ if n is odd. Hence (\ref{eqn2.341}) can be written as
$p_n=\sum_{k=1}^n p_{n-k}f_k \tag{2}\label{eqn2.342}$
Multiplying both sides of (\ref{eqn2.342}) by $s^n$ and summing for $n\geq1$, we have
$\sum_{n=1}^\infty p_ns^n =\sum_{n=1}^\infty \sum_{k=1}^n p_{n-k}f_k s^n \tag{3}\label{eqn2.343}$
as
$H(s)=\sum_{n=0}^\infty p_ns^n=1+\sum_{n=1}^\infty p_ns^n$
$H(s)-1=\sum_{n=1}^\infty p_ns^n\qquad$
Therefore from (\ref{eqn2.343})
\begin{align*}
H(s)-1&=\sum_{n=1}^\infty p_n s^n\\
&=\sum_{n=1}^\infty \sum_{k=1}^n p_{n-k}f_ks^n\tag{4}\label{eqn2.344}
\end{align*}
Since $p_0=1$ and $f_0=0$, we can replace (\ref{eqn2.344}) by
\begin{align*}
H(s)-1&=\sum_{n=1}^\infty p_ns^n\\
&=\sum_{n=0}^\infty \sum_{k=0}^n p_{n-k}f_ks^n = \sum_{n=0}^\infty p_ns^n \sum_{k=0}^\infty f_k s^k \tag{5}\label{eqn2.345}
\end{align*}
the product of two power series, also known as Convolution of the distribution.

Now if $Q(s)=\sum_{n=0}^\infty f_ns^n$ is the probability generating function of the first return distribution, then
$H(s)-1=H(s)Q(s)$.

From equation (8) of lecture “Probability Distribution after n steps” we have
\begin{align*}
Q(s)&=\frac{H(s)-1}{H(s)}=1-\frac{1}{H(s)}\\
&=1-(1-s^2)^{\frac{1}{2}} \tag{6}\label{eqn2.346}
\end{align*}
The probability that the walk will, at some step, return to the origin is $\sum_{n=1}^\infty f_n=Q(1)=1$;
i.e. the return is certain. In this walk the origin or any point by translation is persistent. However, the mean number of steps until this return occurs is
\begin{align*}
\sum_{n=1}^\infty nf_n &=\lim_{s \rightarrow 1-} Q'(s)\\
&=\lim_{s \rightarrow 1-} \frac{s}{(1-s^2)^{\frac{1}{2}}}=\infty
\end{align*}
In this walk which starts at the origin is certain to return there in the future, but on average, it will take an infinite number of steps.

Example:  Find the probability that a symmetric random walk starting from the origin returns there for the first time after 6 steps.

solution:  We require the coefficient of $s^6$ in the power series expansion of the pgf Q(s), which is
\begin{align*}
Q(s)&=1-(1-s^2)^{\frac{1}{2}}\\
&=1-\left[\frac{1^{\frac{1}{2}}}{0!}-\frac{\frac{1}{2}{(-s^2)}^1}{1!}-\frac{\frac{1}{2}(\frac{1}{2}-1){(-s^2)}^2}{2!}-\frac{\frac{1}{2}(\frac{1}{2}-1)(\frac{1}{2}-2){(-s^2)}^3}{3!}\right]\\
&=1-1+\frac{1}{2}s^2+\frac{1}{8}s^4+\frac{1}{16}s^6+0s^8
\end{align*}
Hence the probability of a first return at step 6 is $\frac{1}{16}$.

# Random Walk Probability of Returning to Origin after n Steps

Assume that the walk starts at x=0 with steps to the right or left occurring with probabilities p and q=1-p. We can write the position $X_n$ after n steps as
$X_n=R_n-L_n \tag{1}$
where $R_n$ is the number of right or positive steps (+1) and $L_n$ is the number of left or negative steps (-1).

Therefore the Total steps can be calculated as:  $n=R_n+L_n \tag{2}$
Hence
\begin{align*}
L_n&=n-R_n\\
\Rightarrow X_n&=R_n-n+R_n\\
R_n&=\frac{1}{2}(n+X_n) \tag{3}
\end{align*}
The equation (3) will be an integer only when n and $X_n$ are both even or both odd (eg. To go from x=0 to x=7 we must take an odd number of steps).

Now, let $v_{n,x}$ be the probability that the walk is at state x after n steps assuming that x is a positive integer. Then
\begin{align*}
v_{n,x}&=P(X_n=x)\\
&=P(R_n=\frac{1}{2}(n+x))
\end{align*}
$R_n$ is a binomial random variable with index $n$ having probability p, since the walker either moves to the right or not at every step, and the steps are independent, then
\begin{align*}
v_{n,x}&=\binom{n}{\frac{1}{2}(n+x)}p^{\frac{1}{2}(n+x)}q^{n-\frac{1}{2}(n+x)}\\
&=\binom{n}{\frac{1}{2}(n+x)}p^{\frac{1}{2}(n+x)}q^{\frac{1}{2}(n-x)} \tag{4}
\end{align*}
where (n,x) are both even or both odd and $-n \leq x \leq n$. Note that a similar argument can be constructed if x is a negative integer.

Example: For total number of steps is 2, the net displacement must be one of the three possibilities: (1) two steps to the left, (2) back to the start, (3) or two steps to the right. These correspond to values of x = -2, 0,+2. Clearly it is impossible to get more than two units away from the origin if you take only two steps and it is equally impossible to end up exactly one unit from the origin if you take two steps.

For symmetric case ($p=\tfrac{1}{2}$), starting from the origin, there are $2^n$ different paths of length n since there is a choice of right or left move at each step. Since the number of steps in the right direction must be $\tfrac{1}{2}(n+x)$ and the total number of paths must be the number of ways in which $\frac{1}{2}(n+x)$ can be chosen from n: that is
$N_{n,x}=\binom{n}{\tfrac{1}{2}(n+x)}$
provided that $\tfrac{1}{2}(n+x)$ is an integer.

By counting rule, the probability that the walk ends at x after n steps is given by the ratio of this number and the total number of paths (since all paths are equally likely). Hence
$v_{n,x}=\frac{N_{n,x}}{2^n}=\binom{n}{\tfrac{1}{2}(n+x)}\frac{1}{2^n}$
The probability $v_{n,x}$ is the probability that the walk ends at state x after n steps: the walk could have overshot x before returning there.

## Related Probability/ First Passage through x

A related probability is the probability that the first visit to position x occurs at the nth step. The following is descriptive derivation of the associated probability generating function of the symmetric random walk in which the walk starts at the origin, and we consider the probability that it returns to the origin.

From equation (4), the probability that a walk is at the origin at step n is
\begin{align*}
v_{n,x}&=\binom{n}{\frac{1}{2}(n+x)}p^{\frac{1}{2}(n+x)}q^{n-\frac{1}{2}(n+x)}\\
&=\binom{n}{\tfrac{1}{2}(n+0)} \left(\frac{1}{2}\right)^{\tfrac{1}{2}n} \left(\frac{1}{2}\right)^{\tfrac{1}{2}n}\\
&=\binom{n}{\tfrac{1}{2}n}\frac{1}{2^n}=p_n \,\,\,\text{(say)}, \quad (n=2,4,6,\cdots) \tag{5}
\end{align*}
Here $p_n$ is the probability that after n steps the position of walker is at origin. We also assume that $p_n=0$ if n is odd. From equation (5) we can construct a generating function.
\begin{align*}
H(s)&=\sum_{n=0}^\infty p_n s^n\\
&=\sum_{n=0}^\infty p_{2n}s^{2n}=\sum_{n=0}^\infty \frac{1}{2^{2n}}\binom{2n}{n}s^{2n} \tag{6}
\end{align*}
Note that $p_0=1$, and H(s) is not a probability generating function since $H(1)\neq1$.

The binomial coefficient can be re-arranged as follows:
\begin{align*}
\binom{2n}{n}&=\frac{(2n)!}{n!n!}=\frac{2n(2n-1)(2n-2)\cdots3.2.1}{n!n!}\\
&=\frac{2^nn!(2n-1)(2n-3)\cdots3.2.1}{n!n!}\\
&=\frac{2^{2n}}{n!}\frac{1}{2}\frac{3}{2}\cdots(n-\tfrac{1}{2})\\
&=(-1)^n \binom{-\tfrac{1}{2}}{n}2^{2n} \tag{7}
\end{align*}
Using equation (6) in (7)
$H(s)=\sum_{n=0}^\infty \frac{1}{2^{2n}}(-1)^n \binom{-\tfrac{1}{2}}{n}s^{2n}2^{2n}=(1-s^2)^{-\tfrac{1}{2}} \tag{8}$
by binomial theorem, provided $|s|<1$. Note that this expansion guarantees that $p_n=0$ if n is odd.

Note that the equation (8) does not sum to one. This is called defective distribution which still gives the probability that the walk is at the origin at step n.

We can estimate the behaviour of $p_n$ for large n by using Stirling’s Formula (asymptotic estimate for n! for large n), $n!\approx\sqrt{2\pi} n^{n+\tfrac{1}{2}}e^{-n}$

From equation (5)
\begin{align*}
p_{2n}&=\frac{1}{2^{2n}}\binom{2n}{n}=\frac{1}{2^{2n}}\frac{(2n)!}{n!n!}\\
&\approx\frac{1}{2^{2n}}\frac{\sqrt{2\pi}(2n)^{2n+\tfrac{1}{2}}e^{-2n}}{[\sqrt{2\pi}(n^{n+\tfrac{1}{2}}e^{-n})]^2}\\
&=\frac{1}{\sqrt{\pi n}}; \qquad \text{for large $n$}
\end{align*}
Hence $np_n\rightarrow \infty$ confirming that the series $\sum\limits_{n=0}^\infty p_n$ must diverge.

Random Walk Probability of Returning to Origin after n Steps some EXAMPLES

Example: Consider a random walk starts from x_0=0 find the probability that after 5 steps the position is 3. i.e. $X_5=3$, p=0.6.

Solution: Here number of steps are n=5 and position is x=3. Therefore positive and negative steps are

$R_n= \frac{1}{2}(n+x)=\frac{1}{2}(5+3)=4$ and $X_n=R_n-L_n \Rightarrow 3=4+L_n=1$
The probability that the event $X_5=3$ will occur in a random walk with $p=0.6$ is
$P(X_5=3)=\binom{5}{\frac{1}{2}(5+3)}(0.6)^{\tfrac{1}{2}(3+5)}(0.4)^{\tfrac{1}{2}(5-3)}=0.2592$

## Simple Random Walk: Unrestricted Random Walk

A simple random walk on a line or in one dimension occurs with probability $p$ when walker step forward (+1) and/or has probability q=1-p if walker steps back (-1). For ith step, the modified Bernoulli random variable Wi (takes the value +1 or -1 instead of {0,1}) is observed and the position of the walk at the nth step can be found by
\begin{align}
X_n&=X_0+W_1+W_2+\cdots+W_n\nonumber\\
&=X_0+\sum_{i=1}^nW_i\nonumber\\
&=X_{n-1}+W_n
\end{align}
In the gambler’s ruin problems X0=k, but here we assume (without loss of generality) that walks starts from the origin so that Xo=0.

Several derived results for random walks are restricted by boundaries. We consider here random walks without boundaries called unrestricted random walks. We are interested in

1. The position of the walk after a number of steps and
2. The probability of a return to the origin, the start of the walker.

From equation (1) the position of the walker at step n simply depends on the position at (n-1)th step, because the simple random walk possesses the Markov property (the current state of the walk depends on its immediate previous state, not on the history of the walks up to the present state)

Furthermore Xn=Xn-1 ±1

and the transition probabilities from one position to another is
P(Xn=j|Xn-1=j-1)=p, and P(Xn=j|Xn-1=j+1)=q are independent of the number of plays in the game or steps in the represented by n.

Mean and Variance of Xn can be calculate as:
\begin{align*}
E(X_n)&=E\left(X_0+\sum_{i=1}^n W_i\right)\\
&=E\left(\sum_{i=1}^n W_i\right)=nW_n\\
V(X_n)&=V\left(\sum_{i=1}^n W_i\right)=nV(W)
\end{align*}
Since Wi are independent and identically distributed (iid) random variables and where W is the common or typical Bernoulli random variable in the sequence {Wi}. Thus
\begin{align*}
E(W)&=1.p+(-1)q=p-q\\
V(W)&=E(W^2)-[E(W)]^2\\
&=1^2p+(-1)^2q-(p-q)^2\\
&=p+q-(p^2+q^2-2pq)\\
&=1-p^2-q^2+2pq\\
&=1-p^2-(1-p)^2+2pq\\
&=1-p^2-(1+p^2-2p)+2pq\\
&=1-p^2-1-p^2+2p+2pq\\
&=-2p^2+2p+2pq\\
&=2p(1-p)+2pq=4pq
\end{align*}
So the probability distribution of the position of the random walk at stage n has mean E(Xn)=n(p-q) and variance V(Xn)=4npq.

For symmetric random walk (where p=½) after n steps, the expected position is the origin, and it yields the maximum value of V(Xn)=4npq=4np(1-p).

If p>½ then a drift is expected away from the origin in a positive direction and if p<½ it would be expected that the drift would be in the negative direction.

Since V(Xn) is proportional to $n$, it grows with increasing n, and we would be increasingly uncertain about the position of the walker as n increases.
i.e.
\begin{align*}
\frac{\partial V(X_n)}{\partial p}&=\frac{\partial}{\partial p} {4npq}\\
&=\frac{\partial}{\partial p} \{4np-4np^2 \}=4n-8np \quad \Rightarrow p=\frac{1}{2}
\end{align*}
Just knowing the mean and standard deviation of a random variable does not enable us to identify its probability distribution. But for large n, we can apply the CLT.
$Z_n=\frac{X_n-n(p-q)}{\sqrt{4npq}}\thickapprox N(0,1)$
Applying continuity correction, approximate probabilities may be obtained for the position of the walk.

Example : Consider unrestricted random walk with n=100, p=0.6 then
\begin{align*}
E(X_n)&=E(X_{100})=nE(W)=n(p-q)\\
&=100(0.6-0.4)=20\\
V(X_n)&=4npq=4\times 100\times 0.6 \times 0.4=96
\end{align*}
The position of walk at the 100th step between 15 and 25 pace/step from the origin is
$P(15\leq X_{100}\leq30)\thickapprox P(14.5<X_{100}<25.5)$
$-\frac{5.5}{\sqrt{96}}<Z_{100}=\frac{X_{100}-20}{\sqrt{96}}<\frac{5.5}{96}$
hence
$P(-0.5613<Z_{100}<0.5613)=\phi(0.5613)-\phi(-0.5613)=0.43$
where Φ(Z) is the standard normal distribution function.

## Random Walks Model: A Mathematical Formalization of Path

A random walk (first introduced by Karl Pearson in 1905) is a mathematical formalization of a path consisting series of random steps. Some examples may include, (i) the path traced by molecule as it travels in a liquid or gas, (ii) the search path of a foraging animal, (iii) the price of fluctuating stock, and (iv) the financial status of a gambler. All these random steps in example can be modeled as random walks, although they may not be truly random in reality.

Suppose there are a+1 positions are marked out on a straight line and numbered 0,1,2,…,a. A person starts at k where 0<k<a. The walk proceeds in such a way that, at each step there is probability p that the walker goes forward one step to k+1 and a probability q=1-p that the walker goes back one step to k-1. The walk continues until either 0 or a is reached and then ends. In a random walk, the position of a walker after having moved n times is known as the state of the walk after n steps or after covering n stages. Thus the walk described above starts at stage k at step 0 and moves to either stage k-1 or stage k+1 after 1 step and so on.

If the walk is bounded, then the ends of the walk are known as barriers and they may have various properties. In this case the barriers are said to be absorbing implying that the walk must end once a barrier is reached since there is no escape.

A useful diagrammatic way of representing random walk is by a transition or process diagram. In a transition diagram, the possible states of the walker can be represented by points on a line. If a transition between two points can occur in one step then those points are joined by a curve or edge as shown with an arrow indicating the direction of the walk and a weighting denoting the probability of the step occurring. Transition diagram are also known as direct graph.

For small Markov processes the simplest way to represent the process is often in terms of its state transition diagram. In state transition diagram each state (outcome) represents the process as a node in a graph. The arcs in the graph represent possible transitions between states of the process. The arcs are labeled by the transition rates between the states.

Example:  Suppose a meteorologist notices that the weather on a given day seems to depend on the weather conditions of the previous day. He/ She observes that if it is raining one day, then the next day is sunny 60% of the time and rainy 40% of the time; on the other hand if it is sunny, the next day is sunny with probability 30% and rainy with probability 70%. Note that there are two outcomes (i) sunny (ii) rainy in this Markov process. The transition probability between sunny and rainy is 70%, between sunny and sunny is 30%, between rainy and sunny is 60% and between rainy and rainy is 40%. The simple weather forecasting Markov Process in transition diagram is

Random Walks

Random walks are widely being used in many fields such as Ecology, Economics, Psychology, Computer Science, Physics, Chemistry, and Biology etc. Random walks explain the observed behaviour of processes in all these fields, serving as a fundamental model for the recorded stochastic activity.