Role of Hat Matrix in Regression Analysis
The hat matrix is an $n\times n$ symmetric and idempotent matrix with many special properties that plays an important role in regression diagnostics: it transforms the vector of observed responses $Y$ into the vector of fitted responses $\hat{Y}$.
For the model $Y=X\beta+\varepsilon$, the least squares solution is $b=(X'X)^{-1}X'Y$, provided that $X'X$ is non-singular. The fitted values are ${\hat{Y}=Xb=X(X'X)^{-1} X'Y=HY}$.
Like the fitted values $\hat{Y}$, the residuals can be expressed as linear combinations of the response variable $Y_i$.
\begin{align*}
e&=Y-\hat{Y}\\
&=Y-HY\\
&=(I-H)Y
\end{align*}
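As a quick numerical illustration, the sketch below builds $H$ from a small simulated design matrix and checks that $HY$ reproduces the fitted values $Xb$ and that $(I-H)Y$ reproduces the residuals. This is a minimal sketch assuming NumPy; the data and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 2                       # n cases, k predictors (illustrative sizes)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # design matrix with intercept
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(scale=0.3, size=n)

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)

Y_hat = H @ Y                      # fitted values: Y_hat = H Y
e = (np.eye(n) - H) @ Y            # residuals:     e = (I - H) Y

# Same fitted values from the least squares solution b = (X'X)^{-1} X'Y
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(Y_hat, X @ b))   # True
print(np.allclose(e, Y - Y_hat))   # True
```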
- The hat matrix involves only the observations on the predictor variables $X$, since $H=X(X'X)^{-1}X'$. It plays an important role in diagnostics for regression analysis.
- The hat matrix plays an important role in determining the magnitude of a studentized deleted residual and therefore in identifying outlying Y observations.
- The hat matrix is also helpful in directly identifying outlying X observations.
- In particular, the diagonal elements of the hat matrix are indicators, in a multi-variable setting, of whether or not a case is outlying with respect to its X values.
- The diagonal elements of the hat matrix always have values between 0 and 1 and their sum is p, i.e., $0 \le h_{ii}\le 1$ and $\sum _{i=1}^{n}h_{ii} =p$, where p is the number of regression parameters including the intercept term (see the sketch after this list).
- $h_{ii}$ is a measure of the distance between the X values for the ith case and the means of the X values for all n cases.
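A minimal sketch of these leverage properties, assuming simulated data and NumPy (the variable names are my own, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3                                        # p = number of parameters incl. intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)                                      # leverages h_ii

print(h.min() >= 0 and h.max() <= 1)                # 0 <= h_ii <= 1
print(np.isclose(h.sum(), p))                       # sum of the h_ii equals p
# Cases whose X values lie far from the centroid of the X's get the largest h_ii
print(np.argsort(h)[-3:])                           # indices of the three highest-leverage cases
```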
Mathematical Properties of Hat Matrix
- $HX=X$
- $(I-H)X=0$
- $HH=H^{2}=H=H^{p}$
- $H(I-H)=0$
- $Cov(\hat{Y},e)=Cov\left\{HY,(I-H)Y\right\}=\sigma ^{2} H(I-H)=0$
- $I-H$ is also symmetric and idempotent.
- $H\mathbf{1}=\mathbf{1}$ when the model contains an intercept term, i.e., every row of H adds up to 1. Likewise $\mathbf{1}'H=\mathbf{1}'$ and $\mathbf{1}'H\mathbf{1}=n$.
- The elements of H are denoted by $h_{ij}$, i.e.
\[H=\begin{pmatrix}{h_{11} } & {h_{12} } & {\cdots } & {h_{1n} } \\ {h_{21} } & {h_{22} } & {\cdots } & {h_{2n} } \\ {\vdots } & {\vdots } & {\ddots } & {\vdots } \\ {h_{n1} } & {h_{n2} } & {\cdots } & {h_{nn} }\end{pmatrix}\]
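These properties are easy to verify numerically. The following sketch, assuming a small simulated design matrix with an intercept and NumPy, checks each identity:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 15
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # model with an intercept
H = X @ np.linalg.solve(X.T @ X, X.T)
I = np.eye(n)
one = np.ones(n)

print(np.allclose(H @ X, X))                  # HX = X
print(np.allclose((I - H) @ X, 0))            # (I-H)X = 0
print(np.allclose(H @ H, H))                  # H^2 = H (idempotent)
print(np.allclose(H @ (I - H), 0))            # H(I-H) = 0
print(np.allclose((I - H) @ (I - H), I - H))  # I-H is idempotent as well
print(np.allclose(H @ one, one))              # H1 = 1: every row of H sums to 1
print(np.isclose(one @ H @ one, n))           # 1'H1 = n
```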
A large value of $h_{ii}$ indicates that the ith case is distant from the center of all n cases. The diagonal element $h_{ii}$ in this context is called the leverage of the ith case. Since $h_{ii}$ is a function of the X values only, it measures the role of the X values in determining how important $Y_i$ is in affecting the fitted value $\hat{Y}_{i}$.
The larger the $h_{ii}$, the smaller the variance of the residual $e_i$; for $h_{ii}=1$, $\sigma^{2}(e_i)=0$.
Variance and Covariance of e
\begin{align*}
e-E(e)&=(I-H)(Y-X\beta )=(I-H)\varepsilon \\
E(\varepsilon \varepsilon ')&=V(\varepsilon )=I\sigma ^{2} \,\,\text{and} \,\, E(\varepsilon )=0\\
(I-H)'&=(I-H')=(I-H)\\
V(e) & = E\left[e-E(e)\right]\left[e-E(e)\right]' \\
& = (I-H)E(\varepsilon \varepsilon ')(I-H)' \\
& = (I-H)I\sigma ^{2} (I-H)' \\
& =(I-H)(I-H)\sigma ^{2} =(I-H)\sigma ^{2}
\end{align*}
$V(e_i)$ is given by the ith diagonal element $(1-h_{ii})\sigma ^{2}$ and $Cov(e_i ,e_j )$ by the (i, j)th off-diagonal element $-h_{ij}\sigma ^{2}$ of the matrix $(I-H)\sigma ^{2}$. The correlation between two residuals, and the regression sum of squares, then follow as
\begin{align*}
\rho _{ij} &=\frac{Cov(e_{i} ,e_{j} )}{\sqrt{V(e_{i} )V(e_{j} )} } \\
&=\frac{-h_{ij} }{\sqrt{(1-h_{ii} )(1-h_{jj} )} }\\
SS(b) & = SS(\text{all parameters})=b'X'Y \\
& = \hat{Y}'Y=Y'H'Y=Y'HY=Y'H^{2} Y=\hat{Y}'\hat{Y}
\end{align*}
The average of $V(\hat{Y}_{i} )$ over all data points is
\begin{align*}
\sum _{i=1}^{n}\frac{V(\hat{Y}_{i} )}{n} &=\frac{trace(H\sigma ^{2} )}{n}=\frac{p\sigma ^{2} }{n} \\
\hat{Y}_{i} &=h_{ii} Y_{i} +\sum _{j\ne i}h_{ij} Y_{j}
\end{align*}
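The sketch below, assuming an arbitrary value of $\sigma^2$ and a simulated design matrix (names are illustrative), evaluates $V(e)=(I-H)\sigma^2$ and $V(\hat{Y})=H\sigma^2$ and checks the diagonal-element, correlation, and trace results stated above:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 3
sigma2 = 1.5                                         # assumed error variance
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

V_e = (np.eye(n) - H) * sigma2                       # Var-Cov matrix of the residuals
print(np.allclose(np.diag(V_e), (1 - h) * sigma2))   # V(e_i) = (1 - h_ii) * sigma^2

i, j = 0, 1                                          # correlation between two residuals
rho_ij = -H[i, j] / np.sqrt((1 - h[i]) * (1 - h[j]))
print(np.isclose(rho_ij, V_e[i, j] / np.sqrt(V_e[i, i] * V_e[j, j])))

# Average variance of the fitted values: trace(H * sigma^2)/n = p * sigma^2 / n
V_yhat = H * sigma2
print(np.isclose(np.trace(V_yhat) / n, p * sigma2 / n))
```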
Role of Hat Matrix in Regression Diagnostics
Internally Studentized Residuals
$V(e_i)=(1-h_{ii})\sigma^2$, where $\sigma^2$ is estimated by $s^2$,
i.e. $s^{2} =\frac{e'e}{n-p} =\frac{\sum e_{i}^{2} }{n-p} $ (the residual mean square, RMS),
so we can studentize the residuals as $s_{i} =\frac{e_{i} }{s\sqrt{(1-h_{ii} )} } $
These studentized residuals are said to be internally studentized because s has within it ei itself.
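A minimal sketch of this computation, assuming simulated data and NumPy (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
e = (np.eye(n) - H) @ Y                 # residuals
s2 = (e @ e) / (n - p)                  # residual mean square s^2 = e'e/(n-p)
s_i = e / np.sqrt(s2 * (1 - h))         # internally studentized residuals
print(s_i[:5])
```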
Extra Sum of Squares attributable to $e_i$
\begin{align*}
e&=(I-H)Y\\
e_{i} &=-h_{i1} Y_{1} -h_{i2} Y_{2} -\cdots +(1-h_{ii} )Y_{i} -\cdots -h_{in} Y_{n} =c'Y\\
c'&=(-h_{i1} ,-h_{i2} ,\cdots ,(1-h_{ii} ),\cdots ,-h_{in} )\\
c'c&=\sum _{j=1}^{n}h_{ij}^{2} +(1-2h_{ii} )=h_{ii} +(1-2h_{ii} )=1-h_{ii} \quad \text{(since } \textstyle\sum _{j}h_{ij}^{2} =h_{ii}\text{)}\\
SS(e_{i})&=\frac{e_{i}^{2} }{1-h_{ii} }\\
s_{(i)}^{2}&=\frac{(n-p)s^{2} -\frac{e_{i}^{2} }{1-h_{ii} } }{n-p-1}
\end{align*}
Here $s_{(i)}^{2}$ provides an estimate of $\sigma ^{2}$ after deletion of the contribution of $e_i$.
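As a check, the deletion formula for $s_{(i)}^{2}$ can be compared against the residual mean square from an actual refit with the ith case removed. The sketch below assumes simulated data and NumPy (variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
e = (np.eye(n) - H) @ Y
s2 = (e @ e) / (n - p)

i = 0
s2_i = ((n - p) * s2 - e[i] ** 2 / (1 - h[i])) / (n - p - 1)   # deletion formula

# Refit without case i and compute its residual mean square directly
Xd, Yd = np.delete(X, i, axis=0), np.delete(Y, i)
bd = np.linalg.solve(Xd.T @ Xd, Xd.T @ Yd)
ed = Yd - Xd @ bd
print(np.isclose(s2_i, (ed @ ed) / (n - 1 - p)))               # True
```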
Externally Studentized Residuals
$t_{i} =\frac{e_{i} }{s_{(i)} \sqrt{(1-h_{ii} )} }$ are the externally studentized residuals. Here, if $e_i$ is large, it is thrown into emphasis even more by the fact that $s_{(i)}$ has excluded it. Under the usual normality of errors assumption, $t_i$ follows a $t_{n-p-1}$ distribution.
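A sketch of the full computation of $t_i$ from the quantities above, assuming simulated data and NumPy (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
e = (np.eye(n) - H) @ Y
s2 = (e @ e) / (n - p)
s2_del = ((n - p) * s2 - e ** 2 / (1 - h)) / (n - p - 1)   # deleted variance estimates s_(i)^2
t = e / np.sqrt(s2_del * (1 - h))                          # externally studentized residuals

# Under normal errors each t_i follows a t distribution with n-p-1 degrees of freedom,
# so a |t_i| beyond the upper quantile of that distribution flags a potentially outlying Y.
print(t[:5])
```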
Read more about the Role of Hat Matrix in Regression Analysis: https://en.wikipedia.org/wiki/Hat_matrix
Read about Regression Diagnostics