Role of Hat Matrix in Regression Analysis

The hat matrix is an $n\times n$ symmetric and idempotent matrix with many special properties that plays an important role in the diagnostics of regression analysis by transforming the vector of observed responses $Y$ into the vector of fitted responses $\hat{Y}$.

For the model $Y=X\beta+\varepsilon$, the least squares solution is $b=(X'X)^{-1}X'Y$, provided that $X'X$ is non-singular. The fitted values are $\hat{Y}=Xb=X(X'X)^{-1} X'Y=HY$.

Like the fitted values $\hat{Y}$, the residuals can be expressed as linear combinations of the response variable $Y_i$:

\begin{align*}
e&=Y-\hat{Y}\\
&=Y-HY\\
&=(I-H)Y
\end{align*}
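A minimal Python/NumPy sketch of these relations, on hypothetical simulated data (the variable names are ours, not from the text), builds $H$ and confirms that $HY$ reproduces the fitted values and $(I-H)Y$ the residuals:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 20, 3                                  # n cases, p parameters (with intercept)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([2.0, 1.5, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)              # (X'X)^{-1}; X'X assumed non-singular
b = XtX_inv @ X.T @ Y                         # b = (X'X)^{-1} X'Y
H = X @ XtX_inv @ X.T                         # hat matrix H = X (X'X)^{-1} X'

print(np.allclose(H @ Y, X @ b))              # True: Y-hat = HY = Xb
print(np.allclose(Y - H @ Y, (np.eye(n) - H) @ Y))  # True: e = (I - H)Y
```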

  • The hat matrix involves only the observations on the predictor variables $X$, since $H=X(X'X)^{-1}X'$. It plays an important role in diagnostics for regression analysis.
  • The hat matrix plays an important role in determining the magnitude of a studentized deleted residual and therefore in identifying outlying Y observations.
  • The hat matrix is also helpful in directly identifying outlying X observations.
  • In particular, in a multi-variable setting, the diagonal elements of the hat matrix indicate whether or not a case is outlying with respect to its $X$ values.
  • The diagonal elements of the hat matrix always have values between 0 and 1, and their sum is p, i.e. $0 \le h_{ii}\le 1$  and  $\sum _{i=1}^{n}h_{ii} =p$,
    where p is the number of regression parameters including the intercept term.
  • $h_{ii}$ is a measure of the distance between the $X$ values for the ith case and the means of the $X$ values for all n cases (see the numerical check after this list).
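These leverage properties are easy to check numerically. A small sketch, again on made-up data, with one case deliberately pushed far from the mean of $X$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
X[0, 1] = 8.0                                 # make case 0 outlying in X
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                                # leverages h_ii

print(h.min() >= 0.0, h.max() <= 1.0)         # True True: 0 <= h_ii <= 1
print(np.isclose(h.sum(), p))                 # True: sum of h_ii = trace(H) = p
print(h.argmax())                             # 0: largest leverage is the outlying-X case
```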

Mathematical Properties of Hat Matrix

  • $HX=X$
  • $(I-H)X=0$
  • $HH=H^{2} =H=H^{p}$ (idempotency: any power of $H$ is $H$)
  • $H(I-H)=0$
  • $Cov(e,\hat{Y})=Cov\left\{(I-H)Y,HY\right\}=\sigma ^{2} (I-H)H=0$
  • $I-H$ is also symmetric and idempotent.
  • $H\mathbf{1}=\mathbf{1}$ when the model includes an intercept term, i.e. every row of $H$ adds up to 1; likewise $\mathbf{1}'H=\mathbf{1}'$ and $\mathbf{1}'H\mathbf{1}=n$. (These identities are verified numerically in the sketch after this list.)
  • The elements of $H$ are denoted by $h_{ij}$, i.e.
    \[H=\begin{pmatrix}{h_{11} } & {h_{12} } & {\cdots } & {h_{1n} } \\ {h_{21} } & {h_{22} } & {\cdots } & {h_{2n} } \\ {\vdots } & {\vdots } & {\ddots } & {\vdots } \\ {h_{n1} } & {h_{n2} } & {\cdots } & {h_{nn} }\end{pmatrix}\]
    A large value of $h_{ii}$ indicates that the ith case is distant from the center of all n cases. The diagonal element $h_{ii}$ in this context is called the leverage of the ith case. $h_{ii}$ is a function of the $X$ values only, so $h_{ii}$ measures the role of the $X$ values in determining how important $Y_i$ is in affecting the fitted value $\hat{Y}_{i}$.
    The larger $h_{ii}$ is, the smaller the variance of the residual $e_i$; for $h_{ii}=1$, $\sigma^2(e_i)=0$.
  • Variance and covariance of $e$:
    \begin{align*}
    e-E(e)&=(I-H)(Y-X\beta )=(I-H)\varepsilon \\
    E(\varepsilon \varepsilon ')&=V(\varepsilon )=I\sigma ^{2} \,\,\text{and} \,\, E(\varepsilon )=0\\
    (I-H)'&=(I-H')=(I-H)\\
    V(e) & =  E\left[e-E(e)\right]\left[e-E(e)\right]^{'} \\
    & = (I-H)E(\varepsilon \varepsilon ')(I-H)' \\
    & = (I-H)I\sigma ^{2} (I-H)' \\
    & =(I-H)(I-H)\sigma ^{2} =(I-H)\sigma ^{2}
    \end{align*}
    $V(e_i)$ is given by $(1-h_{ii})\sigma^2$, the ith diagonal element of $(I-H)\sigma^2$, and $Cov(e_i, e_j)$ is given by $-h_{ij}\sigma^2$, its (i, j)th element, so the correlation between two residuals is
    \begin{align*}
    \rho _{ij} &=\frac{Cov(e_{i} ,e_{j} )}{\sqrt{V(e_{i} )V(e_{j} )} } =\frac{-h_{ij} }{\sqrt{(1-h_{ii} )(1-h_{jj} )} }
    \end{align*}
    The regression sum of squares can also be written in terms of $H$:
    \begin{align*}
    SS(b) & = SS({\rm all\; parameters})=b'X'Y \\
    & = \hat{Y}'Y=Y'H'Y=Y'HY=Y'H^{2} Y=\hat{Y}'\hat{Y}
    \end{align*}
    The average of $V(\hat{Y}_{i})$ over all data points is
    \begin{align*}
    \sum _{i=1}^{n}\frac{V(\hat{Y}_{i} )}{n} &=\frac{trace(H\sigma ^{2} )}{n}=\frac{p\sigma ^{2} }{n}
    \end{align*}
    and each fitted value is a linear combination of all the responses:
    \begin{align*}
    \hat{Y}_{i} &=h_{ii} Y_{i} +\sum _{j\ne i}h_{ij} Y_{j}
    \end{align*}
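All of the listed identities can be verified numerically. A sketch under the same simulated-data assumptions as before, with $\sigma^2$ taken as 1 so that $V(e)=(I-H)$ can be checked in closed form:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 15, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T
I, one = np.eye(n), np.ones(n)

print(np.allclose(H @ X, X))                  # HX = X
print(np.allclose((I - H) @ X, 0))            # (I - H)X = 0
print(np.allclose(H @ H, H))                  # H^2 = H (idempotent)
print(np.allclose(H @ (I - H), 0))            # H(I - H) = 0
print(np.allclose(H @ one, one))              # H1 = 1 (intercept in the model)
print(np.isclose(one @ H @ one, n))           # 1'H1 = n
print(np.isclose(np.trace(H), p))             # trace(H) = p, so avg V(Y-hat_i) = p*sigma^2/n

V_e = I - H                                   # V(e) up to the factor sigma^2
i, j = 0, 1
rho_ij = -H[i, j] / np.sqrt((1 - H[i, i]) * (1 - H[j, j]))
print(np.isclose(rho_ij, V_e[i, j] / np.sqrt(V_e[i, i] * V_e[j, j])))  # True
```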

Role of Hat Matrix in Regression Diagnostics

Internally Studentized Residuals

$V(e_i)=(1-h_{ii})\sigma^2$, where $\sigma^2$ is estimated by $s^2$,

i.e. $s^{2} =\frac{e'e}{n-p} =\frac{\Sigma e_{i}^{2} }{n-p}$ (the residual mean square, RMS).

We can studentize the residuals as $s_{i} =\frac{e_{i} }{s\sqrt{(1-h_{ii} )} }$.

These studentized residuals are said to be internally studentized because $s$ has $e_i$ itself within it.
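A short sketch computing the internally studentized residuals from the quantities defined above (simulated data; the name s_int is ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
e = (np.eye(n) - H) @ Y
s2 = e @ e / (n - p)                          # s^2 = e'e / (n - p), residual mean square
s_int = e / np.sqrt(s2 * (1 - h))             # s_i = e_i / (s * sqrt(1 - h_ii))
print(np.round(s_int, 2))
```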

Extra Sum of Squares attributable to $e_i$

\begin{align*}
e&=(I-H)Y\\
e_{i} &=-h_{i1} Y_{1} -h_{i2} Y_{2} -\cdots +(1-h_{ii} )Y_{i} -\cdots -h_{in} Y_{n} =c'Y\\
c'&=(-h_{i1} ,-h_{i2} ,\cdots ,(1-h_{ii} ),\cdots ,-h_{in} )\\
c'c&=\sum _{j=1}^{n}h_{ij}^{2}  +(1-2h_{ii} )=h_{ii} +(1-2h_{ii} )=(1-h_{ii} )\\
SS(e_{i})&=\frac{e_{i}^{2} }{(1-h_{ii} )}\\
s_{(i)}^{2}&=\frac{(n-p)s^{2} -\frac{e_{i}^{2}}{(1-h_{ii} )}}{n-p-1}
\end{align*}
where $\sum _{j=1}^{n}h_{ij}^{2} =h_{ii}$ by idempotency. The quantity $s_{(i)}^{2}$ provides an estimate of $\sigma^2$ after deletion of the contribution of $e_i$.
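The deletion formula can be checked against an actual refit with case $i$ removed. A sketch, assuming the same kind of simulated data (the deleted case index i = 4 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
e = (np.eye(n) - H) @ Y
s2 = e @ e / (n - p)

i = 4                                         # arbitrary case to delete
s2_del = ((n - p) * s2 - e[i]**2 / (1 - h[i])) / (n - p - 1)

# Refit with case i actually removed and compare the two estimates of sigma^2
Xd, Yd = np.delete(X, i, axis=0), np.delete(Y, i)
bd = np.linalg.solve(Xd.T @ Xd, Xd.T @ Yd)
ed = Yd - Xd @ bd
print(np.isclose(s2_del, ed @ ed / (n - p - 1)))    # True: same df, n-1 cases, p params
```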

Externally Studentized Residuals

$t_{i} =\frac{e_{i} }{s_{(i)} \sqrt{(1-h_{ii} )} }$ are the externally studentized residuals. Here, if $e_i$ is large, it is thrown into emphasis even more by the fact that $s_{(i)}$ has excluded it. The $t_i$ follow a $t_{n-p-1}$ distribution under the usual normality-of-errors assumption.
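A final sketch computes $t_i$ and checks the known identity $t_{i} =s_{i} \sqrt{(n-p-1)/(n-p-s_{i}^{2})}$ linking the externally and internally studentized residuals (simulated data as before):

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
e = (np.eye(n) - H) @ Y
s2 = e @ e / (n - p)
s_int = e / np.sqrt(s2 * (1 - h))             # internally studentized residuals

s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)   # s_(i)^2 for every i at once
t = e / np.sqrt(s2_del * (1 - h))             # t_i = e_i / (s_(i) * sqrt(1 - h_ii))
print(np.allclose(t, s_int * np.sqrt((n - p - 1) / (n - p - s_int**2))))  # True
```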

Read more about Role of Hat Matrix in Regression Analysis https://en.wikipedia.org/wiki/Hat_matrix

Read about Regression Diagnostics

