The hat matrix is an *n × n* symmetric and idempotent matrix with many special properties that play an important role in regression diagnostics. It transforms the vector of observed responses *Y* into the vector of fitted responses $\hat{Y}$.

The model $Y=X\beta +\varepsilon$ has the least-squares solution $b=(X'X)^{-1} X'Y$, provided that $(X'X)$ is non-singular. The fitted values are ${\hat{Y}=Xb=X(X'X)^{-1} X'Y=HY}$.
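As a quick numerical check (a minimal NumPy sketch; the small four-point data set below is invented for illustration), the hat matrix can be formed directly and used to reproduce the fitted values:

```python
import numpy as np

# Hypothetical data: intercept plus one predictor (values made up for demonstration).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
Y = np.array([2.1, 3.9, 6.2, 7.8])

# H = X (X'X)^{-1} X', assuming X'X is non-singular.
H = X @ np.linalg.inv(X.T @ X) @ X.T

b = np.linalg.solve(X.T @ X, X.T @ Y)   # b = (X'X)^{-1} X'Y
print(np.allclose(H @ Y, X @ b))        # HY reproduces the fitted values Xb -> True
```

Note that *H* depends on *X* alone, so it can be computed before any responses are observed.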

Like the fitted values ($\hat{Y}$), the residuals can be expressed as linear combinations of the response values $Y_{i}$:

\begin{align*}
e&=Y-\hat{Y}\\
&=Y-HY\\
&=(I-H)Y
\end{align*}

- The hat matrix involves only the observations on the predictor variables *X*, since $H=X(X'X)^{-1} X'$. It plays an important role in diagnostics for regression analysis.
- The hat matrix plays an important role in determining the magnitude of a studentized deleted residual, and therefore in identifying outlying *Y* observations.
- The hat matrix is also helpful in directly identifying outlying *X* observations.
- In particular, the diagonal elements of the hat matrix indicate, in a multi-variable setting, whether or not a case is outlying with respect to its *X* values.
- The diagonal elements of the hat matrix always have values between 0 and 1, and their sum is *p*, i.e. $0\le h_{ii} \le 1$ and $\sum _{i=1}^{n}h_{ii} =p $,

where *p* is the number of regression parameters, including the intercept term. $h_{ii}$ is a measure of the distance between the *X* values for the *i*th case and the means of the *X* values for all *n* cases.
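These bounds can be illustrated numerically (a sketch with a randomly generated design matrix, invented for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3   # n cases, p parameters (intercept + two predictors)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)   # leverages h_ii

print(bool(h.min() >= 0 and h.max() <= 1))   # 0 <= h_ii <= 1 -> True
print(np.isclose(h.sum(), p))                # sum of leverages equals p -> True
```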

**Mathematical Properties**

- $HX=X$ and $(I-H)X=0$
- $HH=H^{2} =H=H^{p}$
- $H(I-H)=0$
- $Cov(e,\hat{Y})=Cov\left\{(I-H)Y,HY\right\}=\sigma ^{2} (I-H)H=0$
- $I-H$ is also symmetric and idempotent.
- $H1=1$ when the model contains an intercept term, i.e., every row of *H* adds up to 1. Similarly $1'H=1'$ and $1'H1=n$.
- The elements of *H* are denoted by $h_{ij}$, i.e.

\[H=\begin{pmatrix}{h_{11} } & {h_{12} } & {\cdots } & {h_{1n} } \\ {h_{21} } & {h_{22} } & {\cdots } & {h_{2n} } \\ {\vdots } & {\vdots } & {\ddots } & {\vdots } \\ {h_{n1} } & {h_{n2} } & {\cdots } & {h_{nn} }\end{pmatrix}\]

A large value of $h_{ii}$ indicates that the *i*th case is distant from the center of all *n* cases. The diagonal element $h_{ii}$ in this context is called the leverage of the *i*th case. $h_{ii}$ is a function of the *X* values only, so $h_{ii}$ measures the role of the *X* values in determining how important $Y_{i}$ is in affecting the fitted value $\hat{Y}_{i}$.
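The distance interpretation is easiest to see in simple regression, where the leverage has the closed form $h_{ii}=1/n+(x_{i}-\bar{x})^{2}/S_{xx}$ (a standard result, not derived above). The sketch below, with an invented predictor whose last value sits far from the rest, checks this form against the hat-matrix diagonal:

```python
import numpy as np

# One predictor whose last value is far from the center (values invented).
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
n = x.size
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

# Closed form for simple regression: h_ii = 1/n + (x_i - xbar)^2 / Sxx,
# so leverage grows with the squared distance of x_i from the mean of x.
Sxx = ((x - x.mean()) ** 2).sum()
h_formula = 1.0 / n + (x - x.mean()) ** 2 / Sxx

print(np.allclose(h, h_formula))   # True
print(h.argmax() == n - 1)         # the distant point has the largest leverage -> True
```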

The larger the $h_{ii}$, the smaller the variance of the residual $e_{i}$; for $h_{ii} =1$, $\sigma ^{2} (e_{i} )=0$.

**Variance, Covariance of** *e*

\begin{align*}
e-E(e)&=(I-H)(Y-X\beta )=(I-H)\varepsilon \\
E(\varepsilon \varepsilon ')&=V(\varepsilon )=I\sigma ^{2} \,\,\text{and} \,\, E(\varepsilon )=0\\
(I-H)'&=(I-H')=(I-H)\\
V(e) & = E\left[e-E(e)\right]\left[e-E(e)\right]' \\
& = (I-H)E(\varepsilon \varepsilon ')(I-H)' \\
& = (I-H)I\sigma ^{2} (I-H)' \\
& =(I-H)(I-H)\sigma ^{2} =(I-H)\sigma ^{2}
\end{align*}

$V(e_{i} )$ is given by the *i*th diagonal element $(1-h_{ii} )$, and $Cov(e_{i} ,e_{j} )$ by the *(i, j)*th element $-h_{ij}$, of the matrix $(I-H)\sigma ^{2}$. The correlation between two residuals is therefore

\begin{align*}

\rho _{ij} &=\frac{Cov(e_{i} ,e_{j} )}{\sqrt{V(e_{i} )V(e_{j} )} } \\

&=\frac{-h_{ij} }{\sqrt{(1-h_{ii} )(1-h_{jj} )} }
\end{align*}

The regression sum of squares due to all parameters can also be written in terms of *H*:

\begin{align*}
SS(b) & = SS(\text{all parameters})=b'X'Y \\

& = \hat{Y}'Y=Y'H'Y=Y'HY=Y'H^{2} Y=\hat{Y}'\hat{Y}

\end{align*}

The average of $V(\hat{Y}_{i} )$ over all data points is

\begin{align*}

\sum _{i=1}^{n}\frac{V(\hat{Y}_{i} )}{n} &=\frac{trace(H\sigma ^{2} )}{n}=\frac{p\sigma ^{2} }{n} \\

\hat{Y}_{i} &=h_{ii} Y_{i} +\sum _{j\ne i}h_{ij} Y_{j}

\end{align*}
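Both facts, $trace(H)=p$ and the weighted-sum form of $\hat{Y}_{i}$, can be verified numerically (a sketch with randomly generated data, invented for demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 15, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
Y_hat = H @ Y

# trace(H) = p, so the average Var(Yhat_i) = trace(H sigma^2)/n = p*sigma^2/n.
print(np.isclose(np.trace(H), p))   # True

# Yhat_i = h_ii*Y_i + sum_{j != i} h_ij*Y_j: each fitted value mixes all responses.
i = 0
decomp = H[i, i] * Y[i] + sum(H[i, j] * Y[j] for j in range(n) if j != i)
print(np.isclose(Y_hat[i], decomp))   # True
```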

**Some Regression Diagnostics Using the Hat Matrix**

**Internally Studentized Residuals**

$V(e_{i} )=(1-h_{ii} )\sigma ^{2}$, where $\sigma ^{2}$ is estimated by $s^{2}$, i.e. $s^{2} =\frac{e'e}{n-p} =\frac{\Sigma e_{i}^{2} }{n-p} $ (the residual mean square).

We can studentize the residuals as $s_{i} =\frac{e_{i} }{s\sqrt{(1-h_{ii} )} } $.

These studentized residuals are said to be internally studentized because *s* has $e_{i}$ itself within it.
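A minimal sketch of the computation (synthetic data and coefficients, invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 12, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)   # synthetic responses

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ Y             # residuals e = (I - H)Y
h = np.diag(H)
s2 = (e @ e) / (n - p)              # s^2 = e'e/(n - p), the residual mean square

s_int = e / np.sqrt(s2 * (1 - h))   # internally studentized residuals s_i
```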

**Extra Sum of Squares attributable to $e_i$**

\begin{align*}

e&=(I-H)Y\\

e_{i} &=-h_{i1} Y_{1} -h_{i2} Y_{2} -\cdots +(1-h_{ii} )Y_{i} -\cdots -h_{in} Y_{n} =c'Y\\

c'&=(-h_{i1} ,-h_{i2} ,\cdots ,(1-h_{ii} ),\cdots ,-h_{in} )\\

c'c&=\sum _{j=1}^{n}h_{ij}^{2} +(1-2h_{ii} )=h_{ii} +(1-2h_{ii} )=(1-h_{ii} )\\

SS(e_{i})&=\frac{e_{i}^{2} }{(1-h_{ii} )}\\

S_{(i)}^{2}&=\frac{(n-p)s^{2} -\frac{e_{i}^{2}}{(1-h_{ii} )}}{n-p-1}

\end{align*}

$S_{(i)}^{2}$ provides an estimate of $\sigma ^{2}$ after deletion of the contribution of $e_{i}$.
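The deletion formula can be checked against an actual refit with case *i* removed (a sketch with synthetic data, invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 10, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = Y - H @ Y
h = np.diag(H)
s2 = (e @ e) / (n - p)

i = 0   # case to delete
s2_del = ((n - p) * s2 - e[i] ** 2 / (1 - h[i])) / (n - p - 1)

# Cross-check: delete case i, refit, and compute the residual mean square directly.
Xd, Yd = np.delete(X, i, axis=0), np.delete(Y, i)
bd = np.linalg.solve(Xd.T @ Xd, Xd.T @ Yd)
ed = Yd - Xd @ bd
print(np.isclose(s2_del, (ed @ ed) / ((n - 1) - p)))   # True
```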

**Externally Studentized Residuals**

$t_{i} =\frac{e_{i} }{s_{(i)} \sqrt{(1-h_{ii} )} }$ are the externally studentized residuals. Here, if $e_{i}$ is large, it is thrown into emphasis even more by the fact that $s_{(i)}$ has excluded it. The $t_{i}$ follow a $t_{n-p-1}$ distribution under the usual normality-of-errors assumptions.
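A sketch computing $t_{i}$ for synthetic data; as a consistency check it uses the standard identity $t_{i}=s_{i}\sqrt{(n-p-1)/(n-p-s_{i}^{2})}$ relating internally and externally studentized residuals (a known result, not derived above):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 12, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = Y - H @ Y
h = np.diag(H)
s2 = (e @ e) / (n - p)

# Deleted variance estimates S_(i)^2 for all cases at once.
s2_del = ((n - p) * s2 - e ** 2 / (1 - h)) / (n - p - 1)

t = e / np.sqrt(s2_del * (1 - h))   # externally studentized residuals
s_int = e / np.sqrt(s2 * (1 - h))   # internally studentized, for comparison

# Identity t_i = s_i * sqrt((n-p-1)/(n-p-s_i^2)) should hold exactly.
print(np.allclose(t, s_int * np.sqrt((n - p - 1) / (n - p - s_int ** 2))))   # True
```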
