
Role of the Hat Matrix in Regression Diagnostics

The hat matrix is an n × n symmetric and idempotent matrix with many special properties that play an important role in regression diagnostics. It transforms the vector of observed responses Y into the vector of fitted responses $\hat{Y}$.

The model is Y=Xβ+ε, with least squares solution $b=(X'X)^{-1} X'Y$ provided that X'X is non-singular. The fitted values are ${\hat{Y}=Xb=X(X'X)^{-1} X'Y=HY}$, where $H=X(X'X)^{-1} X'$ is called the hat matrix.

Like the fitted values ($\hat{Y}$), the residuals can be expressed as linear combinations of the response observations Yi:

\begin{align*}
e&=Y-\hat{Y}\\
&=Y-HY\\
&=(I-H)Y
\end{align*}

  • The hat matrix involves only the observations on the predictor variable X, since $H=X(X'X)^{-1} X'$. It plays an important role in diagnostics for regression analysis.
  • The hat matrix plays an important role in determining the magnitude of a studentized deleted residual and therefore in identifying outlying Y observations.
  • The hat matrix is also helpful in directly identifying outlying X observations.
  • In particular, the diagonal elements of the hat matrix indicate, in a multi-variable setting, whether or not a case is outlying with respect to its X values.
  • The diagonal elements of the hat matrix always have values between 0 and 1, and their sum equals p, i.e.
    0 ≤ hii ≤ 1  and  $\sum _{i=1}^{n}h_{ii} =p$
    where p is the number of regression parameters, including the intercept term.
  • hii is a measure of the distance between the X values for the ith case and the means of the X values for all n cases, as the sketch below illustrates.
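The following minimal Python sketch computes the hat matrix, fitted values, residuals, and leverages for a small simple regression. The simulated data and all variable names here are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# Illustrative simulated data (an assumption for this sketch):
# n cases, p = 2 parameters (intercept + one slope).
rng = np.random.default_rng(0)
n, p = 10, 2
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])    # design matrix with intercept column
H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix H = X (X'X)^{-1} X'

y_hat = H @ y                           # fitted values: Y-hat = HY
e = (np.eye(n) - H) @ y                 # residuals:     e = (I - H)Y
h = np.diag(H)                          # leverages h_ii

print(np.allclose(e, y - y_hat))        # True: e = Y - Y-hat
print(h.min() >= 0 and h.max() <= 1)    # True: 0 <= h_ii <= 1
print(np.isclose(h.sum(), p))           # True: sum of the h_ii equals p
# In simple regression, the case whose x lies farthest from the mean of x
# has the largest leverage:
print(np.argmax(h) == np.argmax(np.abs(x - x.mean())))   # True
```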

Mathematical Properties

  • HX=X
  • (I-H)X=0
  • $HH=H^{2} =H=H^{p}$ (idempotency: every power of H equals H)
  • H(I-H)=0
  • $Cov(e,\hat{Y})=Cov\left\{(I-H)Y,HY\right\}=\sigma ^{2} (I-H)H=0$
  • I-H is also symmetric and idempotent.
  • H1=1 when the model contains an intercept term, i.e. every row of H adds up to 1. Also 1'H=1' and 1'H1=n.
  • The elements of H are denoted by hij, i.e.
    \[H=\begin{pmatrix}{h_{11} } & {h_{12} } & {\cdots } & {h_{1n} } \\ {h_{21} } & {h_{22} } & {\cdots } & {h_{2n} } \\ {\vdots } & {\vdots } & {\ddots } & {\vdots } \\ {h_{n1} } & {h_{n2} } & {\cdots } & {h_{nn} }\end{pmatrix}\]
    A large value of hii indicates that the ith case is distant from the center of all n cases. The diagonal element hii in this context is called the leverage of the ith case. Since hii is a function of the X values only, it measures the role of the X values in determining how much influence Yi has on the fitted value $\hat{Y}_{i} $.
    The larger the hii, the smaller the variance of the residual ei; for hii =1, σ2(ei)=0.
  • Variance and covariance of e
    \begin{align*}
    e-E(e)&=(I-H)(Y-X\beta )=(I-H)\varepsilon \\
    E(\varepsilon \varepsilon ')&=V(\varepsilon )=I\sigma ^{2} \,\,\text{and} \,\, E(\varepsilon )=0\\
    (I-H)'&=(I-H)\\
    V(e) & = E\left\{\left[e-E(e)\right]\left[e-E(e)\right]'\right\} \\
    & = (I-H)E(\varepsilon \varepsilon ')(I-H)' \\
    & = (I-H)I\sigma ^{2} (I-H)' \\
    & = (I-H)(I-H)\sigma ^{2} =(I-H)\sigma ^{2}
    \end{align*}
    V(ei) is given by the ith diagonal element (1-hii)σ2 of the matrix (I-H)σ2, and Cov(ei, ej) is given by its (i, j)th element −hijσ2. The correlation between two residuals is therefore
    \begin{align*}
    \rho _{ij} &=\frac{Cov(e_{i} ,e_{j} )}{\sqrt{V(e_{i} )V(e_{j} )} } =\frac{-h_{ij} }{\sqrt{(1-h_{ii} )(1-h_{jj} )} }
    \end{align*}
  • The regression sum of squares can also be written in terms of H:
    \begin{align*}
    SS(b) & = SS({\rm all\; parameters})=b'X'Y \\
    & = \hat{Y}'Y=Y'H'Y=Y'HY=Y'H^{2} Y=\hat{Y}'\hat{Y}
    \end{align*}
  • The average of $V(\hat{Y}_{i} )$ over all n data points is
    \begin{align*}
    \sum _{i=1}^{n}\frac{V(\hat{Y}_{i} )}{n} &=\frac{trace(H\sigma ^{2} )}{n} =\frac{p\sigma ^{2} }{n}
    \end{align*}
  • Each fitted value is a linear combination of all the responses, with the leverage as the weight on the case's own response:
    \begin{align*}
    \hat{Y}_{i} &=h_{ii} Y_{i} +\sum _{j\ne i}h_{ij} Y_{j}
    \end{align*}
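As a quick check, all of these properties can be verified numerically. The sketch below uses an assumed random design matrix; it is an illustration, not part of the original notes.

```python
import numpy as np

# Assumed random design (illustrative): n cases, p parameters incl. intercept.
rng = np.random.default_rng(1)
n, p = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.inv(X.T @ X) @ X.T
I, one = np.eye(n), np.ones(n)

assert np.allclose(H, H.T)             # H is symmetric
assert np.allclose(H @ H, H)           # H^2 = H (idempotent)
assert np.allclose(H @ X, X)           # HX = X
assert np.allclose((I - H) @ X, 0)     # (I - H)X = 0
assert np.allclose(H @ (I - H), 0)     # H(I - H) = 0
assert np.allclose(H @ one, one)       # H1 = 1 (model has an intercept)
assert np.isclose(one @ H @ one, n)    # 1'H1 = n
assert np.isclose(np.trace(H), p)      # trace(H) = sum of the h_ii = p
```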

Some Regression Diagnostics Using the Hat Matrix

Internally Studentized Residuals

$V(e_{i} )=(1-h_{ii} )\sigma ^{2}$, where σ2 is estimated by s2,

i.e. $s^{2} =\frac{e'e}{n-p} =\frac{\Sigma e_{i}^{2} }{n-p} $  (the residual mean square, RMS).

We can studentize the residuals as $s_{i} =\frac{e_{i} }{s\sqrt{(1-h_{ii} )} } $.

These studentized residuals are said to be internally studentized because the estimate s itself contains ei.
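A short sketch of the internally studentized residuals follows; the simulated data are an assumption for illustration only.

```python
import numpy as np

# Assumed illustrative data: simple linear regression, n = 15 cases, p = 2.
rng = np.random.default_rng(2)
n, p = 15, 2
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

e = (np.eye(n) - H) @ y               # residuals e = (I - H)Y
s2 = e @ e / (n - p)                  # residual mean square s^2 = e'e / (n - p)
s_int = e / np.sqrt(s2 * (1 - h))     # internally studentized residuals s_i
print(np.round(s_int, 3))
```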

Extra Sum of Squares attributable to $e_i$

\begin{align*}
e&=(I-H)Y\\
e_{i} &=-h_{i1} Y_{1} -h_{i2} Y_{2} -\cdots +(1-h_{ii} )Y_{i} -\cdots -h_{in} Y_{n} =c'Y\\
c'&=(-h_{i1} ,-h_{i2} ,\cdots ,(1-h_{ii} ),\cdots ,-h_{in} )\\
c'c&=\sum _{j=1}^{n}h_{ij}^{2}  +(1-2h_{ii} )=h_{ii} +(1-2h_{ii} )=(1-h_{ii} )\\
SS(e_{i})&=\frac{e_{i}^{2} }{(1-h_{ii} )}\\
s_{(i)}^{2}&=\frac{(n-p)s^{2} -\frac{e_{i}^{2} }{(1-h_{ii} )} }{n-p-1}
\end{align*}
Here $\sum _{j=1}^{n}h_{ij}^{2} =h_{ii}$ because H is symmetric and idempotent. The quantity $s_{(i)}^{2}$
provides an estimate of σ2 after deletion of the contribution of ei.

Externally Studentized Residuals

$t_{i} =\frac{e_{i} }{s_{(i)} \sqrt{(1-h_{ii} )} }$ are the externally studentized residuals. Here, if ei is large, it is thrown into emphasis even more by the fact that s(i) has excluded it. Under the usual normality assumption on the errors, ti follows a $t_{n-p-1}$ distribution.
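Continuing in the same spirit, here is a self-contained sketch (assumed data again) of the deletion estimate $s_{(i)}^{2}$ and the externally studentized residuals:

```python
import numpy as np

# Assumed illustrative data, as in the previous sketch.
rng = np.random.default_rng(3)
n, p = 15, 2
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
e = (np.eye(n) - H) @ y
s2 = e @ e / (n - p)                  # s^2 = e'e / (n - p)

# Deletion estimate: s_(i)^2 = ((n-p)s^2 - e_i^2/(1-h_ii)) / (n-p-1)
s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
t = e / np.sqrt(s2_del * (1 - h))     # externally studentized residuals t_i
print(np.round(t, 3))                 # each t_i ~ t with n-p-1 df under normality
```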


