Role of Hat Matrix in Regression Analysis

This post is about the importance and role of the Hat Matrix in Regression Analysis.

The hat matrix is an $n\times n$ symmetric and idempotent matrix that transforms the vector of observed responses $Y$ into the vector of fitted responses $\hat{Y}$. Its special properties play an important role in the diagnostics of regression analysis.

The model is $Y=X\beta+\varepsilon$, with least squares solution $b=(X'X)^{-1}X'Y$, provided that $X'X$ is non-singular. The fitted values are $\hat{Y}=Xb=X(X'X)^{-1} X'Y=HY$, where $H=X(X'X)^{-1}X'$ is the hat matrix.
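
To make this concrete, here is a minimal sketch in Python/NumPy (the design matrix and response below are simulated for illustration, not data from the post) of forming $H=X(X'X)^{-1}X'$ and the fitted values $\hat{Y}=HY$:

```python
import numpy as np

# Illustrative data (not from the post): n = 6 observations, one predictor plus an intercept
rng = np.random.default_rng(0)
x = rng.normal(size=6)
X = np.column_stack([np.ones(6), x])           # design matrix with intercept column
Y = 2 + 3 * x + rng.normal(scale=0.5, size=6)  # response vector

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

b = np.linalg.inv(X.T @ X) @ X.T @ Y   # least squares coefficients b = (X'X)^{-1} X'Y
Y_hat = H @ Y                          # fitted values; identical to X @ b
print(np.allclose(Y_hat, X @ b))       # True
```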

Like the fitted values $\hat{Y}$, the residuals can be expressed as linear combinations of the response values $Y_i$:

\begin{align*}
e&=Y-\hat{Y}\\
&=Y-HY\\&=(I-H)Y
\end{align*}

The role of hat matrix in Regression Analysis and Regression Diagnostics is:

  • The hat matrix involves only the observations on the predictor variables $X$, as $H=X(X'X)^{-1}X'$. It plays an important role in diagnostics for regression analysis.
  • The hat matrix plays an important role in determining the magnitude of a studentized deleted residual and identifying outlying Y observations.
  • The hat matrix is also helpful in directly identifying outlying $X$ observations.
  • In particular, the diagonal elements of the hat matrix indicate, in a multi-variable setting, whether or not a case is outlying with respect to its $X$ values.
  • The elements of the hat matrix always have values between 0 and 1, and their sum is $p$, i.e. $0 \le h_{ii}\le 1$  and  $\sum _{i=1}^{n}h_{ii} =p $,
    where $p$ is the number of regression parameters including the intercept term (see the sketch after this list).
  • $h_{ii}$ is a measure of the distance between the $X$ values for the ith case and the means of the $X$ values for all $n$ cases.
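
A minimal sketch, again with simulated data rather than data from the post, that computes the leverages $h_{ii}$ and checks the bounds $0 \le h_{ii} \le 1$ and the sum $\sum_{i} h_{ii}=p$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3                                # p = number of regression parameters (with intercept)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                              # leverages h_ii

print(h.min() >= 0 and h.max() <= 1)        # each h_ii lies in [0, 1] -> True
print(np.isclose(h.sum(), p))               # sum of the leverages equals p -> True
```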

Mathematical Properties of Hat Matrix

  • $HX=X$
  • $(I-H)X=0$
  • $HH=H^2=H$ and, more generally, $H^p=H$ for any positive integer $p$
  • $H(I-H)=0$
  • $Cov(e,\hat{Y})=Cov\left\{(I-H)Y, HY\right\}=\sigma ^{2} (I-H)H=0$
  • $I-H$ is also symmetric and idempotent.
  • $H1=1$ when the model contains an intercept term, i.e. every row of $H$ adds up to 1. Also $1'=1'H'=1'H$ and $1'H1=n$. (These properties are verified numerically in the sketch after this list.)
  • The elements of $H$ are denoted by $h_{ij}$ i.e.
    \[H=\begin{pmatrix}{h_{11} } & {h_{12} } & {\cdots } & {h_{1n} } \\ {h_{21} } & {h_{22} } & {\cdots } & {h_{2n} } \\ {\vdots } & {\vdots } & {\ddots } & {\vdots } \\ {h_{n1} } & {h_{n2} } & {\cdots } & {h_{nn} }\end{pmatrix}\]
    A large value of $h_{ii}$ indicates that the $i$th case is distant from the center of all $n$ cases. In this context, the diagonal element $h_{ii}$ is called the leverage of the $i$th case. Since $h_{ii}$ is a function of the $X$ values only, it measures the role of the $X$ values in determining how much $Y_i$ affects the fitted value $\hat{Y}_{i}$.
    The larger $h_{ii}$ is, the smaller the variance of the residual $e_i$; for $h_{ii}=1$, $\sigma^2(e_i)=0$.
  • Variance, Covariance of $e$
    \begin{align*}
    e-E(e)&=(I-H)(Y-X\beta )=(I-H)\varepsilon \\
    E(\varepsilon \varepsilon ')&=V(\varepsilon )=I\sigma ^{2} \,\,\text{and} \,\, E(\varepsilon )=0\\
    (I-H)'&=(I-H')=(I-H)\\
    V(e) & =  E\left\{\left[e-E(e)\right]\left[e-E(e)\right]'\right\} \\
    & = (I-H)E(\varepsilon \varepsilon ')(I-H)' \\
    & = (I-H)I\sigma ^{2} (I-H)' \\
    & =(I-H)(I-H)\sigma ^{2} =(I-H)\sigma ^{2}
    \end{align*}
    $V(e_i)$ is given by $(1-h_{ii})\sigma^2$, the $i$th diagonal element of the matrix $(I-H)\sigma^2$, and $Cov(e_i, e_j)$ is given by its $(i,j)$th element, $-h_{ij}\sigma^2$.
    \begin{align*}
    \rho _{ij} &=\frac{Cov(e_{i} ,e_{j} )}{\sqrt{V(e_{i} )V(e_{j} )} } \\
    &=\frac{-h_{ij} }{\sqrt{(1-h_{ii} )(1-h_{jj} )} }\\
    SS(b) & = SS(\text{all parameters})=b'X'Y \\
    & = \hat{Y}'Y=Y'H'Y=Y'HY=Y'H^{2} Y=\hat{Y}'\hat{Y}
    \end{align*}
    The average of $V(\hat{Y}_{i})$ over all data points is
    \begin{align*}
    \sum _{i=1}^{n}\frac{V(\hat{Y}_{i} )}{n} &=\frac{trace(H\sigma ^{2} )}{n}=\frac{p\sigma ^{2} }{n} \\
    \hat{Y}_{i} &=h_{ii} Y_{i} +\sum _{j\ne i}h_{ij} Y_{j}
    \end{align*}
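
The properties above can be checked numerically; the following is a minimal sketch with a simulated design matrix (the data and dimensions are illustrative assumptions, not from the post):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 15, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(n)
ones = np.ones(n)

print(np.allclose(H @ X, X))            # HX = X
print(np.allclose((I - H) @ X, 0))      # (I - H)X = 0
print(np.allclose(H @ H, H))            # idempotence: H^2 = H
print(np.allclose(H @ (I - H), 0))      # H(I - H) = 0
print(np.allclose(H @ ones, ones))      # H1 = 1 (model with an intercept)
print(np.isclose(ones @ H @ ones, n))   # 1'H1 = n
print(np.isclose(np.trace(H), p))       # trace(H) = p
```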

Role of Hat Matrix in Regression Diagnostics

Internally Studentized Residuals

$V(e_i)=(1-h_{ii})\sigma^2$, where $\sigma^2$ is estimated by $s^2$,

i.e. $s^{2} =\frac{e'e}{n-p} =\frac{\Sigma e_{i}^{2} }{n-p} $  (the residual mean square, RMS).

We can studentize the residuals as $s_{i} =\frac{e_{i} }{s\sqrt{(1-h_{ii} )} } $.

These studentized residuals are said to be internally studentized because $s$ has within it $e_i$ itself.
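
A minimal sketch (with simulated data chosen only for illustration) of computing $s^2$ and the internally studentized residuals $s_i$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 2
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Y = 1 + 2 * x + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ Y             # residuals e = (I - H)Y
h = np.diag(H)                      # leverages h_ii

s2 = (e @ e) / (n - p)              # residual mean square s^2 = e'e / (n - p)
s_int = e / np.sqrt(s2 * (1 - h))   # internally studentized residuals s_i
print(s_int[:5])
```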

Extra Sum of Squares attributable to $e_i$

\begin{align*}
e&=(I-H)Y\\
e_{i} &=-h_{i1} Y_{1} -h_{i2} Y_{2} -\cdots +(1-h_{ii} )Y_{i} -\cdots -h_{in} Y_{n} =c'Y\\
c'&=(-h_{i1} ,-h_{i2} ,\cdots ,(1-h_{ii} ),\cdots ,-h_{in} )\\
c'c&=\sum _{j=1}^{n}h_{ij}^{2}  +(1-2h_{ii} )=h_{ii}+(1-2h_{ii})=(1-h_{ii} )\\
SS(e_{i})&=\frac{e_{i}^{2} }{(1-h_{ii} )}\\
s_{(i)}^{2}&=\frac{(n-p)s^{2} -\frac{e_{i}^{2}}{1-h_{ii} }}{n-p-1}
\end{align*}
provides an estimate of $\sigma^2$ after deletion of the contribution of $e_i$.

Externally Studentized Residuals

$t_{i} =\frac{e_{i} }{s_{(i)}\sqrt{(1-h_{ii} )} }$ are the externally studentized residuals. Here, if $e_i$ is large, it is emphasized even more by the fact that $s_{(i)}$ has excluded it. The $t_i$ follow a $t_{n-p-1}$ distribution under the usual normality of errors assumption.
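
Combining the deletion estimate $s_{(i)}^2$ from the previous section with the leverages gives the externally studentized residuals; a minimal sketch with simulated data (not from the post):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 25, 2
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Y = 1 + 2 * x + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ Y
h = np.diag(H)
s2 = (e @ e) / (n - p)

# Deletion estimate of sigma^2: s_(i)^2 = [(n - p) s^2 - e_i^2 / (1 - h_ii)] / (n - p - 1)
s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)

# Externally studentized residuals t_i = e_i / (s_(i) * sqrt(1 - h_ii))
t = e / np.sqrt(s2_del * (1 - h))
print(t[:5])
```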


Read more about the Role of the Hat Matrix in Regression Analysis https://en.wikipedia.org/wiki/Hat_matrix

Read about Regression Diagnostics: https://rfaqs.com

Simple Random Walk (Unrestricted Random Walk)

A simple random walk (or unrestricted random walk) on a line, or in one dimension, occurs when the walker steps forward ($+1$) with probability $p$ or steps back ($-1$) with probability $q=1-p$. For the $i$th step, a modified Bernoulli random variable $W_i$ (taking the value $+1$ or $-1$ instead of $\{0,1\}$) is observed, and the position of the walk at the $n$th step is
\begin{align}
X_n&=X_0+W_1+W_2+\cdots+W_n\nonumber\\
&=X_0+\sum_{i=1}^nW_i\nonumber\\
&=X_{n-1}+W_n
\end{align}
In the gambler's ruin problems $X_0=k$, but here we assume (without loss of generality) that walks start from the origin, so that $X_0=0$.
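
A minimal simulation sketch of equation (1); the step probability and the number of steps are illustrative choices, not values from the post:

```python
import numpy as np

rng = np.random.default_rng(5)
p, n_steps = 0.6, 10
W = rng.choice([1, -1], size=n_steps, p=[p, 1 - p])  # modified Bernoulli steps W_i
X = np.concatenate(([0], np.cumsum(W)))              # X_0 = 0, X_n = X_0 + W_1 + ... + W_n
print(X)                                             # positions of the walk at steps 0..n
```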


Many results for random walks are derived subject to boundary restrictions. Here we consider random walks without boundaries, called unrestricted random walks. We are interested in

  1. The position of the walk after a number of steps and
  2. The probability of a return to the origin, the starting point of the walker.

From equation (1), the position of the walker at step $n$ depends only on the position at the $(n-1)$th step, because the simple random walk possesses the Markov property (the current state of the walk depends on its immediate previous state, not on the history of the walk up to the present state).

Furthermore, $X_n=X_{n-1}\pm 1$, and the transition probabilities from one position to another, $P(X_n=j | X_{n-1}=j-1)=p$ and $P(X_n=j|X_{n-1}=j+1)=q$, are independent of the number of plays in the game, i.e. of the step number $n$.

The mean and variance of $X_n$ can be calculated as follows:
\begin{align*}
E(X_n)&=E\left(X_0+\sum_{i=1}^n W_i\right)\\
&=E\left(\sum_{i=1}^n W_i\right)=nE(W)\\
V(X_n)&=V\left(\sum_{i=1}^n W_i\right)=nV(W)
\end{align*}
Here the $W_i$ are independent and identically distributed (iid) random variables, and $W$ denotes the common or typical modified Bernoulli random variable in the sequence $\{W_i\}$. Thus
\begin{align*}
E(W)&=1\cdot p+(-1)q=p-q\\
V(W)&=E(W^2)-[E(W)]^2\\
&=1^2p+(-1)^2q-(p-q)^2\\
&=p+q-(p^2+q^2-2pq)\\
&=1-p^2-q^2+2pq\\
&=1-p^2-(1-p)^2+2pq\\
&=1-p^2-(1+p^2-2p)+2pq\\
&=1-p^2-1-p^2+2p+2pq\\
&=-2p^2+2p+2pq\\
&=2p(1-p)+2pq=4pq
\end{align*}
So the probability distribution of the position of the random walk at stage $n$ has mean $E(X_n)=n(p-q)$ and variance $V(X_n)=4npq$.
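
These formulas can be checked by simulating many independent walks; a minimal sketch (the values of $n$, $p$, and the number of replications are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
p, n, reps = 0.7, 50, 100_000
W = rng.choice([1, -1], size=(reps, n), p=[p, 1 - p])
X_n = W.sum(axis=1)                      # final position of each of the simulated walks

print(X_n.mean(), n * (p - (1 - p)))     # sample mean vs. n(p - q)
print(X_n.var(), 4 * n * p * (1 - p))    # sample variance vs. 4npq
```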

For the symmetric random walk (where $p=\frac{1}{2}$), the expected position after $n$ steps is the origin, and this value of $p$ yields the maximum of the variance $V(X_n)=4npq=4np(1-p)$.

If $p>\frac{1}{2}$, drift is expected away from the origin in the positive direction, and if $p<\frac{1}{2}$, the drift would be expected to be in the negative direction.

Since $V(X_n)$ is proportional to $n$, it grows with increasing $n$, and we become increasingly uncertain about the position of the walker as $n$ increases.
To see that $p=\frac{1}{2}$ maximizes the variance, note that
\begin{align*}
\frac{\partial V(X_n)}{\partial p}&=\frac{\partial}{\partial p} (4npq)\\
&=\frac{\partial}{\partial p} \{4np-4np^2 \}=4n-8np=0 \quad \Rightarrow \quad p=\frac{1}{2}
\end{align*}
Just knowing the mean and standard deviation of a random variable does not enable us to identify its probability distribution. But for large $n$, we can apply the central limit theorem (CLT).
\[Z_n=\frac{X_n-n(p-q)}{\sqrt{4npq}}\thickapprox N(0,1)\]
Applying a continuity correction, approximate probabilities may be obtained for the position of the walk.

Example: Consider an unrestricted random walk with $n=100$ and $p=0.6$. Then
\begin{align*}
E(X_n)&=E(X_{100})=nE(W)=n(p-q)\\
&=100(0.6-0.4)=20\\
V(X_n)&=4npq=4\times 100\times 0.6 \times 0.4=96
\end{align*}
The probability that the position of the walk at the 100th step is between 15 and 25 paces/steps from the origin is
\[P(15\leq X_{100}\leq 25)\thickapprox P(14.5<X_{100}<25.5),\]
that is,
\[-\frac{5.5}{\sqrt{96}}<Z_{100}=\frac{X_{100}-20}{\sqrt{96}}<\frac{5.5}{\sqrt{96}},\]
hence
\[P(-0.5613<Z_{100}<0.5613)=\Phi(0.5613)-\Phi(-0.5613)\approx 0.43\]
where $\Phi(z)$ is the standard normal distribution function.
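
The calculation in this example can be reproduced directly, writing the standard normal distribution function $\Phi$ in terms of the error function; a minimal sketch:

```python
import math

n, p = 100, 0.6
q = 1 - p
mean, var = n * (p - q), 4 * n * p * q   # 20 and 96

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Continuity-corrected P(15 <= X_100 <= 25) ~ P(14.5 < X_100 < 25.5)
z_lo = (14.5 - mean) / math.sqrt(var)
z_hi = (25.5 - mean) / math.sqrt(var)
print(Phi(z_hi) - Phi(z_lo))             # approximately 0.43
```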


Read more about Simple Random Walk: Random Walks Model

FAQs about Simple Random Walk

  1. What is meant by a simple random walk?
  2. How can the mean and variance of a simple random walk be computed?
  3. Give an example of a simple random walk.


Random Walks Model: A Mathematical Formalization of Path

A random walk (first introduced by Karl Pearson in 1905) is a mathematical formalization of a path consisting of a series of random steps.

Random Walks Example

The following are some examples related to random walks:

  1. The path traced by a molecule as it travels in a liquid or gas,
  2. The search path of a foraging animal,
  3. The price of a fluctuating stock, and
  4. The financial status of a gambler.

All these random steps in the examples can be modeled as random walks, although they may not be truly random in reality.

Suppose there are $a+1$ positions marked out on a straight line and numbered $0, 1, 2, \ldots, a$. A person starts at $k$, where $0<k<a$. The walk proceeds in such a way that, at each step, there is a probability $p$ that the walker goes forward one step to $k+1$ and a probability $q=1-p$ that the walker goes back one step to $k-1$. The walk continues until either $0$ or $a$ is reached and then ends.

In a random walk, the position of a walker after having moved $n$ times is known as the state of the walk after $n$ steps or after covering $n$ stages. Thus, the walk described above starts in state $k$ at step $0$ and moves to either state $k-1$ or state $k+1$ after one step, and so on.

If the walk is bounded, then the ends of the walk are known as barriers, and they may have various properties. In this case, the barriers are said to be absorbing, implying that the walk must end once a barrier is reached, since there is no escape.
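
A minimal sketch simulating such a bounded walk, starting at $k$ and stopping when an absorbing barrier at $0$ or $a$ is reached (the parameter values are illustrative assumptions):

```python
import random

def walk_until_absorbed(k, a, p, seed=None):
    """Run one random walk from position k until the absorbing barrier 0 or a is reached."""
    rng = random.Random(seed)
    position, steps = k, 0
    while 0 < position < a:
        position += 1 if rng.random() < p else -1  # forward with probability p, back with q
        steps += 1
    return position, steps                         # barrier reached and number of steps taken

print(walk_until_absorbed(k=3, a=10, p=0.5, seed=1))
```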

A useful diagrammatic way of representing a random walk is by a transition or process diagram. In a transition diagram, the possible states of the walker can be represented by points on a line. If a transition between two points can occur in one step, those points are joined by a curve or edge, with an arrow indicating the direction of the walk and a weighting denoting the probability of the step occurring. A transition diagram is also known as a directed graph.

For small Markov processes, the simplest way to represent the process is often in terms of its state transition diagram. In a state transition diagram, each state (outcome) of the process is represented as a node in a graph. The arcs in the graph represent possible transitions between states of the process and are labeled by the transition probabilities (or rates) between the states.

Example: Suppose a meteorologist notices that the weather on a given day seems to depend on the weather conditions of the previous day. He/she observes that if it is raining one day, then the next day is sunny 60% of the time and rainy 40% of the time; on the other hand, if it is sunny, the next day is sunny with probability 30% and rainy with probability 70%. Note that there are two outcomes, (i) sunny and (ii) rainy, in this Markov process.

The transition probability from sunny to rainy is 70%, from sunny to sunny is 30%, from rainy to sunny is 60%, and from rainy to rainy is 40%. These probabilities make up the transition diagram of this simple weather-forecasting Markov process.
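
The transition probabilities of this two-state weather process can also be collected into a transition matrix; a minimal sketch (the two-step matrix $P^2$ is an added illustration, not part of the original example):

```python
import numpy as np

# States: 0 = sunny, 1 = rainy; row i gives P(next state | current state i)
P = np.array([
    [0.3, 0.7],   # sunny -> sunny 30%, sunny -> rainy 70%
    [0.6, 0.4],   # rainy -> sunny 60%, rainy -> rainy 40%
])

print(P.sum(axis=1))                  # each row sums to 1
print(np.linalg.matrix_power(P, 2))   # two-step (two-day-ahead) transition probabilities P^2
```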


Random walk models are widely used in many fields, such as ecology, economics, psychology, computer science, physics, chemistry, and biology. Random walks explain the observed behavior of processes in all these fields, serving as a fundamental model for the recorded stochastic activity.

Overall, the random walk model is a versatile tool within stochastic processes. It provides a framework for studying systems influenced by randomness and helps understand the evolution of such systems over time.


Learning Statistics by Using R Programming Language

Visit: Quiz website