Role of Hat Matrix in Regression Analysis

The hat matrix is an $n\times n$ symmetric and idempotent matrix with many special properties that plays an important role in the diagnostics of regression analysis by transforming the vector of observed responses $Y$ into the vector of fitted responses $\hat{Y}$.

The model $Y=X\beta+\varepsilon$ has the least squares solution $b=(X'X)^{-1}X'Y$, provided that $X'X$ is non-singular. The fitted values are ${\hat{Y}=Xb=X(X'X)^{-1} X'Y=HY}$.

Like the fitted values ($\hat{Y}$), the residuals can be expressed as linear combinations of the response variable $Y_i$.

\begin{align*}
e&=Y-\hat{Y}\\
&=Y-HY\\&=(I-H)Y
\end{align*}
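
Below is a minimal numerical sketch of these formulas, using NumPy with a small hypothetical design matrix X (intercept plus one predictor) and response vector Y; the data are made up purely for illustration.

```python
import numpy as np

# Hypothetical small data set: n = 5 observations, intercept plus one predictor
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])                 # design matrix with intercept column
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])    # made-up responses

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Fitted values and residuals
Y_hat = H @ Y                              # Y_hat = H Y
e = (np.eye(len(Y)) - H) @ Y               # e = (I - H) Y

print(np.round(Y_hat, 3))
print(np.round(e, 3))
```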

  • The hat matrix involves only the observations on the predictor variables X, since $H=X(X'X)^{-1}X'$. It plays an important role in diagnostics for regression analysis.
  • The hat matrix plays an important role in determining the magnitude of a studentized deleted residual and therefore in identifying outlying Y observations.
  • The hat matrix is also helpful in directly identifying outlying X observations.
  • In particular, the diagonal elements of the hat matrix indicate, in a multi-variable setting, whether or not a case is outlying with respect to its X values.
  • The diagonal elements of the hat matrix always lie between 0 and 1, and their sum is p, i.e. $0 \le h_{ii}\le 1$  and  $\sum _{i=1}^{n}h_{ii} =p $,
    where p is the number of regression parameters including the intercept term (see the sketch after this list).
  • $h_{ii}$ is a measure of the distance between the X values for the $i$th case and the means of the X values for all $n$ cases.
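
The leverage properties above can be illustrated numerically. The following sketch (NumPy, with a hypothetical design matrix in which one case has an extreme X value) extracts the leverages $h_{ii}$, checks that they lie in [0, 1] and sum to p, and shows that the extreme case receives the largest leverage:

```python
import numpy as np

# Hypothetical design matrix: intercept plus one predictor (n = 5, p = 2);
# the last case is deliberately far from the other X values
X = np.array([[1.0, x] for x in (1.0, 2.0, 3.0, 4.0, 10.0)])
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                            # leverages h_ii

print(np.round(h, 3))                     # each value lies in [0, 1]
print(np.isclose(h.sum(), X.shape[1]))    # sum of leverages equals p (here p = 2)
print(np.argmax(h))                       # the case with the extreme X value has the largest leverage
```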

Mathematical Properties of Hat Matrix

  • $HX=X$
  • $(I-H)X=0$
  • $HH=H^{2}=H$, and more generally $H^{k}=H$ for any positive integer $k$ (idempotency)
  • $H(I-H)=0$
  • $Cov(e,\hat{Y})=Cov\left\{(I-H)Y,HY\right\}=\sigma ^{2} (I-H)H=0$
  • $I-H$ is also symmetric and idempotent.
  • $H\mathbf{1}=\mathbf{1}$ when the model contains an intercept term, i.e. every row of $H$ adds up to 1. Also $\mathbf{1}'=\mathbf{1}'H'=\mathbf{1}'H$  and  $\mathbf{1}'H\mathbf{1}=n$. (These properties are verified numerically in the sketch after this list.)
  • The elements of H are denoted by $h_{ij}$, i.e.
    \[H=\begin{pmatrix}{h_{11} } & {h_{12} } & {\cdots } & {h_{1n} } \\ {h_{21} } & {h_{22} } & {\cdots } & {h_{2n} } \\ {\vdots } & {\vdots } & {\ddots } & {\vdots } \\ {h_{n1} } & {h_{n2} } & {\cdots } & {h_{nn} }\end{pmatrix}\]
    A large value of $h_{ii}$ indicates that the $i$th case is distant from the center of all $n$ cases. In this context the diagonal element $h_{ii}$ is called the leverage of the $i$th case. $h_{ii}$ is a function of the X values only, so $h_{ii}$ measures the role of the X values in determining how important $Y_i$ is in affecting the fitted value $\hat{Y}_{i}$.
    The larger $h_{ii}$ is, the smaller the variance of the residual $e_i$; for $h_{ii}=1$, $\sigma^2(e_i)=0$.
  • Variance and covariance of $e$
    \begin{align*}
    e-E(e)&=(I-H)Y-(I-H)X\beta =(I-H)(Y-X\beta )=(I-H)\varepsilon \\
    E(\varepsilon \varepsilon ')&=V(\varepsilon )=I\sigma ^{2} \,\,\text{and} \,\, E(\varepsilon )=0\\
    (I-H)'&=(I-H')=(I-H)\\
    V(e) & =  E\left[e-E(e)\right]\left[e-E(e)\right]^{'} \\
    & = (I-H)E(\varepsilon \varepsilon ')(I-H)' \\
    & = (I-H)I\sigma ^{2} (I-H)' \\
    & =(I-H)(I-H)\sigma ^{2} =(I-H)\sigma ^{2}
    \end{align*}
    $V(e_i)$ is given by $(1-h_{ii})\sigma^2$, the $i$th diagonal element of $(I-H)\sigma^2$, and $Cov(e_i ,e_j)$ is given by $-h_{ij}\sigma^2$, the $(i,j)$th element of $(I-H)\sigma^2$.
    \begin{align*}
    \rho _{ij} &=\frac{Cov(e_{i} ,e_{j} )}{\sqrt{V(e_{i} )V(e_{j} )} } \\
    &=\frac{-h_{ij} }{\sqrt{(1-h_{ii} )(1-h_{jj} )} }\\
    SS(b) & = SS(\text{all parameters})=b'X'Y \\
    & = \hat{Y}'Y=Y'H'Y=Y'HY=Y'H^{2} Y=\hat{Y}'\hat{Y}
    \end{align*}
    The average of $V(\hat{Y}_{i} )$ over all data points is
    \begin{align*}
    \sum _{i=1}^{n}\frac{V(\hat{Y}_{i} )}{n} &=\frac{trace(H\sigma ^{2} )}{n}=\frac{p\sigma ^{2} }{n} \\
    \hat{Y}_{i} &=h_{ii} Y_{i} +\sum _{j\ne i}h_{ij} Y_{j}
    \end{align*}
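
These algebraic identities are easy to check numerically. The sketch below, using a small randomly generated (hypothetical) design matrix with an intercept column, verifies the main properties listed above:

```python
import numpy as np

# Hypothetical design matrix with an intercept column (n = 6, p = 3)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])
H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(6)
ones = np.ones(6)

print(np.allclose(H @ X, X))                # HX = X
print(np.allclose((I - H) @ X, 0))          # (I - H)X = 0
print(np.allclose(H @ H, H))                # H^2 = H (idempotent)
print(np.allclose(H, H.T))                  # H is symmetric
print(np.allclose(H @ ones, ones))          # H1 = 1 (rows sum to 1 with an intercept)
print(np.isclose(np.trace(H), X.shape[1]))  # trace(H) = sum of leverages = p
```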

Role of Hat Matrix in Regression Diagnostics

Internally Studentized Residuals

$V(e_i)=(1-h_{ii})\sigma^2$, where $\sigma^2$ is estimated by $s^2$,

i.e. $s^{2} =\frac{e'e}{n-p} =\frac{\Sigma e_{i}^{2} }{n-p} $  (the residual mean square, RMS).

We can studentize the residuals as $s_{i} =\frac{e_{i} }{s\sqrt{(1-h_{ii} )} } $

These studentized residuals are said to be internally studentized because $s$ has $e_i$ itself within it.
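
As a minimal sketch, assuming a small hypothetical data set with one deliberately unusual response, the internally studentized residuals can be computed as follows:

```python
import numpy as np

# Hypothetical data: n = 6 cases, intercept plus one predictor (p = 2)
X = np.column_stack([np.ones(6), np.arange(1.0, 7.0)])
Y = np.array([1.2, 2.1, 2.8, 4.2, 4.9, 9.0])   # last response deliberately unusual
n, p = X.shape

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ Y                        # residuals e = (I - H) Y
h = np.diag(H)                                 # leverages h_ii
s2 = e @ e / (n - p)                           # residual mean square s^2 = e'e / (n - p)

s_int = e / np.sqrt(s2 * (1.0 - h))            # internally studentized residuals
print(np.round(s_int, 3))
```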

Extra Sum of Squares attributable to $e_i$

\begin{align*}
e&=(I-H)Y\\
e_{i} &=-h_{i1} Y_{1} -h_{i2} Y_{2} -\cdots +(1-h_{ii} )Y_{i} -\cdots -h_{in} Y_{n} =c'Y\\
c'&=(-h_{i1} ,-h_{i2} ,\cdots ,(1-h_{ii} ),\cdots ,-h_{in} )\\
c'c&=\sum _{j=1}^{n}h_{ij}^{2}  +(1-2h_{ii} )=(1-h_{ii} )\\
SS(e_{i})&=\frac{e_{i}^{2} }{(1-h_{ii} )}\\
S_{(i)}^{2}&=\frac{(n-p)s^{2} -\frac{e_{i}^{2}}{(1-h_{ii} )}}{n-p-1}
\end{align*}
This provides an estimate of $\sigma^2$ after deleting the contribution of $e_i$.
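
Continuing the same hypothetical data, here is a short sketch of the deleted variance estimate $S_{(i)}^{2}$:

```python
import numpy as np

# Hypothetical data, as in the previous sketch
X = np.column_stack([np.ones(6), np.arange(1.0, 7.0)])
Y = np.array([1.2, 2.1, 2.8, 4.2, 4.9, 9.0])
n, p = X.shape

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ Y
h = np.diag(H)
s2 = e @ e / (n - p)

# Deleted estimate: S_(i)^2 = [(n - p) s^2 - e_i^2 / (1 - h_ii)] / (n - p - 1)
s2_del = ((n - p) * s2 - e**2 / (1.0 - h)) / (n - p - 1)
print(np.round(s2_del, 3))
```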

Externally Studentized Residuals

$t_{i} =\frac{e_{i} }{s_{(i)}\sqrt{(1-h_{ii} )} }$ are the externally studentized residuals. Here, if $e_i$ is large, it is thrown into emphasis even more by the fact that $s_{(i)}$ has excluded it. Under the usual normality-of-errors assumption, $t_i$ follows a $t_{n-p-1}$ distribution.
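
A minimal sketch (NumPy and SciPy, same hypothetical data) computes $t_i$ and flags cases whose absolute value exceeds the two-sided 5% critical value of the $t_{n-p-1}$ distribution:

```python
import numpy as np
from scipy import stats

# Hypothetical data, continuing the earlier sketches
X = np.column_stack([np.ones(6), np.arange(1.0, 7.0)])
Y = np.array([1.2, 2.1, 2.8, 4.2, 4.9, 9.0])
n, p = X.shape

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ Y
h = np.diag(H)
s2 = e @ e / (n - p)
s2_del = ((n - p) * s2 - e**2 / (1.0 - h)) / (n - p - 1)   # deleted variance estimates

# Externally studentized residuals t_i = e_i / (s_(i) * sqrt(1 - h_ii))
t = e / np.sqrt(s2_del * (1.0 - h))
cutoff = stats.t.ppf(0.975, df=n - p - 1)   # two-sided 5% critical value of t_{n-p-1}
print(np.round(t, 3))
print(np.abs(t) > cutoff)                   # flags potential outlying Y observations
```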

Read more about the Role of Hat Matrix in Regression Analysis: https://en.wikipedia.org/wiki/Hat_matrix

Read about Regression Diagnostics