Regression Diagnostics


Checking Normality of Error Term

Normality of Error Term

In multiple linear regression models, the sum of squared residuals (SSR) divided by $n-p$ (the degrees of freedom, where $n$ is the total number of observations and $p$ is the number of parameters in the model) is a good estimate of the error variance. In the multiple linear regression model, the residual vector is

\[\mathbf{e}=(I-H)y\]

where $H$ is the hat matrix for the regression model.

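Because $(I-H)X=0$, the residual vector can also be written as $\mathbf{e}=(I-H)\varepsilon$, so each component is $e_i=\varepsilon_i - \sum\limits_{j=1}^n h_{ij} \varepsilon_j$. Every residual is therefore a linear combination of all $n$ errors, and in multiple linear regression models the normality of the residuals is not simply the normality of the error term.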

Note that:

\[Cov(\mathbf{e})=(I-H)\sigma^2 (I-H)' = (I-H)\sigma^2\]

We can write $Var(e_i)=(1-h_{ii})\sigma^2$.

If the sample size ($n$) is much larger than the number of parameters ($p$) in the model (i.e. $n \gg p$), the $h_{ii}$ will typically be small compared to 1, and $Var(e_i) \approx \sigma^2$. The reason is that the $h_{ii}$ sum to $p$, so their average is only $p/n$.

In multiple regression models, a residual behaves like an error if the sample size is large. However, this is not true for a small sample size.

It is unreliable to check the normality of error term assumption using residuals from multiple linear regression models when the sample size is small.
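The effect of sample size on $Var(e_i)=(1-h_{ii})\sigma^2$ can be illustrated with a small simulation. The following is only a sketch: the helper names `hat_matrix` and `leverage_summary`, the simulated normal predictors, and the choices $n=10$, $n=1000$, and $p=4$ are assumptions made for illustration, not part of the original discussion.

```python
import numpy as np

def hat_matrix(X):
    """Return H = X (X'X)^{-1} X' for a full-column-rank design matrix X."""
    # Solve (X'X) B = X' rather than forming the inverse explicitly.
    return X @ np.linalg.solve(X.T @ X, X.T)

def leverage_summary(n, p, seed=0):
    """Mean and maximum leverage for a simulated n x p design (intercept included)."""
    rng = np.random.default_rng(seed)
    # Hypothetical design: an intercept column plus p - 1 standard normal predictors.
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    h = np.diag(hat_matrix(X))
    return h.mean(), h.max()

for n in (10, 1000):
    mean_h, max_h = leverage_summary(n, p=4)
    # Var(e_i) / sigma^2 = 1 - h_ii, so small leverages mean residuals behave like errors.
    print(f"n={n}: mean leverage={mean_h:.4f}, max leverage={max_h:.4f}")
```

The mean leverage equals $p/n$ in both cases, so with $n=10$ the residual variances fall well below $\sigma^2$ on average, while with $n=1000$ they are very close to it.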


Learn more about Hat matrix: Role of Hat matrix in Diagnostics of Regression Analysis.


Role of Hat Matrix in Regression Analysis

The post is about the importance and role of the Hat Matrix in Regression Analysis.

The hat matrix is an $n\times n$ symmetric and idempotent matrix with many special properties. It plays an important role in the diagnostics of regression analysis because it transforms the vector of observed responses $Y$ into the vector of fitted responses $\hat{Y}$.

Consider the model $Y=X\beta+\varepsilon$, whose least squares solution is $b=(X'X)^{-1}X'Y$, provided that $X'X$ is non-singular. The fitted values are ${\hat{Y}=Xb=X(X'X)^{-1} X'Y=HY}$.

Like the fitted values ($\hat{Y}$), the residuals can be expressed as linear combinations of the responses $Y_i$:

\[e=Y-\hat{Y}=Y-HY=(I-H)Y\]
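Since the fitted values are $HY$ and the residuals are $(I-H)Y$, both can be computed directly from the hat matrix and compared with an ordinary least squares fit. The sketch below uses NumPy; the simulated data, the coefficient values, and the seed are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# Hypothetical design: intercept plus two simulated predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

# Least squares solution b = (X'X)^{-1} X'Y, via a linear solve.
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.solve(X.T @ X, X.T)

Y_hat = H @ Y                 # fitted values as a linear combination of Y
e = (np.eye(n) - H) @ Y       # residuals as a linear combination of Y

print(np.allclose(Y_hat, X @ b))   # fitted values agree with X b
print(np.allclose(e, Y - X @ b))   # residuals agree with Y - X b
```

Forming the full $n \times n$ matrix is convenient for small examples like this one; for large $n$ one would normally compute only the diagonal elements that are needed.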


The role of the hat matrix in regression analysis and regression diagnostics is:

  • The hat matrix involves only the observations on the predictor variables $X$, since $H=X(X'X)^{-1}X'$. It plays an important role in diagnostics for regression analysis.
  • The hat matrix plays an important role in determining the magnitude of a studentized deleted residual and therefore in identifying outlying Y observations.
  • The hat matrix is also helpful in directly identifying outlying $X$ observations.
  • In particular, the diagonal elements of the hat matrix indicate, in a multi-variable setting, whether or not a case is outlying with respect to its $X$ values.
  • The diagonal elements of the hat matrix always lie between 0 and 1, and their sum is $p$, i.e. $0 \le h_{ii}\le 1$  and  $\sum _{i=1}^{n}h_{ii} =p $,
    where $p$ is the number of regression parameters, including the intercept term.
  • $h_{ii}$ is a measure of the distance between the $X$ values for the ith case and the means of the $X$ values for all $n$ cases.
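The points above about leverage can be seen in a small simple-regression example. This is a sketch under stated assumptions: the six $x$ values are made up, with the last one deliberately placed far from the rest, and the closed form $h_{ii}=1/n+(x_i-\bar{x})^2/S_{xx}$ used for comparison applies to simple linear regression with an intercept.

```python
import numpy as np

# Hypothetical predictor values; the last case is far from the others.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
n = x.size
X = np.column_stack([np.ones(n), x])   # intercept + one predictor, so p = 2

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)                         # leverages h_ii

# Closed form for simple regression: distance of x_i from the mean of x.
h_formula = 1.0 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

print(np.round(h, 3))        # leverages, all between 0 and 1
print(h.sum())               # equals p = 2 up to rounding error
print(int(np.argmax(h)))     # index of the case most outlying in x
```

The case with the largest $h_{ii}$ is the one whose $x$ value lies farthest from $\bar{x}$, which is what makes the diagonal of $H$ useful for flagging outlying $X$ observations.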

Mathematical Properties of Hat Matrix

  • $HX=X$
  • $(I-H)X=0$
  • $HH=H^2=H$, and hence $H^k=H$ for every positive integer $k$
  • $H(I-H)=0$
  • $Cov(e,\hat{Y})=Cov\left\{(I-H)Y,HY\right\}=\sigma ^{2} (I-H)H=0$
  • $I-H$ is also symmetric and idempotent.
  • $H\mathbf{1}=\mathbf{1}$ when the model includes an intercept term, i.e. every row of $H$ adds up to 1. Consequently, $\mathbf{1}'=\mathbf{1}'H'=\mathbf{1}'H$ and $\mathbf{1}'H\mathbf{1}=n$.
  • The elements of $H$ are denoted by $h_{ij}$, i.e.
    \[H=\begin{pmatrix}{h_{11} } & {h_{12} } & {\cdots } & {h_{1n} } \\ {h_{21} } & {h_{22} } & {\cdots } & {h_{2n} } \\ {\vdots } & {\vdots } & {\ddots } & {\vdots } \\ {h_{n1} } & {h_{n2} } & {\cdots } & {h_{nn} }\end{pmatrix}\]
    A large value of $h_{ii}$ indicates that the ith case is distant from the center of all $n$ cases. The diagonal element $h_{ii}$ in this context is called the leverage of the ith case. $h_{ii}$ is a function of only the $X$ values, so it measures the role of the $X$ values in determining how important $Y_i$ is in affecting the fitted value $\hat{Y}_{i}$.
    The larger the $h_{ii}$, the smaller the variance of the residual $e_i$; for $h_{ii}=1$, $\sigma^2(e_i)=0$.
  • Variance, Covariance of $e$
    \[\begin{aligned}
    e-E(e)&=(I-H)(Y-X\beta )=(I-H)\varepsilon \\
    E(\varepsilon \varepsilon ')&=V(\varepsilon )=I\sigma ^{2} \,\,\text{and} \,\, E(\varepsilon )=0\\
    V(e) & =  E\left\{\left[e-E(e)\right]\left[e-E(e)\right]'\right\} \\
    & = (I-H)E(\varepsilon \varepsilon ')(I-H)' \\
    & = (I-H)I\sigma ^{2} (I-H)' \\
    & =(I-H)(I-H)\sigma ^{2} =(I-H)\sigma ^{2}
    \end{aligned}\]
    $V(e_i)$ is given by the ith diagonal element, $(1-h_{ii})\sigma^2$, and $Cov(e_i, e_j)$ is given by the $(i, j)$th element, $-h_{ij}\sigma^2$, of the matrix $(I-H)\sigma^2$. The correlation between $e_i$ and $e_j$ is therefore
    \[\begin{aligned}
    \rho _{ij} &=\frac{Cov(e_{i} ,e_{j} )}{\sqrt{V(e_{i} )V(e_{j} )} } \\
    &=\frac{-h_{ij} }{\sqrt{(1-h_{ii} )(1-h_{jj} )} }
    \end{aligned}\]
    The sum of squares due to all parameters is
    \[\begin{aligned}
    SS(b) & = SS(\text{all parameters})=b'X'Y \\
    & = \hat{Y}'Y=Y'H'Y=Y'HY=Y'H^{2} Y=\hat{Y}'\hat{Y}
    \end{aligned}\]
    The average of $V(\hat{Y}_{i} )$ over all data points is
    \[\sum _{i=1}^{n}\frac{V(\hat{Y}_{i} )}{n} =\frac{trace(H\sigma ^{2} )}{n}=\frac{p\sigma ^{2} }{n}\]
    and each fitted value is a weighted combination of the responses,
    \[\hat{Y}_{i} =h_{ii} Y_{i} +\sum _{j\ne i}h_{ij} Y_{j}\]
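The algebraic properties listed above can be checked numerically on any full-rank design matrix that contains an intercept column. The following sketch assumes a small simulated design (the sizes, seed, and variable names are hypothetical) and also builds the residual correlation matrix implied by $Cov(e)=(I-H)\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 7, 3
# Hypothetical design: intercept plus p - 1 simulated predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.solve(X.T @ X, X.T)
I = np.eye(n)
M = I - H   # the matrix that maps Y to the residual vector

checks = {
    "H symmetric": np.allclose(H, H.T),
    "HH = H": np.allclose(H @ H, H),
    "HX = X": np.allclose(H @ X, X),
    "(I-H)X = 0": np.allclose(M @ X, 0.0),
    "H(I-H) = 0": np.allclose(H @ M, 0.0),
    "rows of H sum to 1": np.allclose(H @ np.ones(n), 1.0),
    "trace(H) = p": np.isclose(np.trace(H), p),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")

# Residual correlations: sigma^2 cancels, leaving
# rho_ij = -h_ij / sqrt((1 - h_ii)(1 - h_jj)) for i != j.
d = np.sqrt(np.diag(M))
rho = M / np.outer(d, d)
```

Note that $\sigma^2$ does not need to be known to obtain `rho`, since it cancels in the ratio; the correlation between residuals depends only on the $X$ values.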

Role of Hat Matrix in Regression Diagnostic

Internally Studentized Residuals

$V(e_i)=(1-h_{ii})\sigma^2$, where $\sigma^2$ is estimated by $s^2$,

i.e. $s^{2} =\frac{e'e}{n-p} =\frac{\sum e_{i}^{2} }{n-p} $, the residual mean square (RMS).

We can studentize the residual as $s_{i} =\frac{e_{i} }{s\sqrt{1-h_{ii} } } $.

These studentized residuals are said to be internally studentized because $s$ contains $e_i$ itself.
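A minimal sketch of this computation in NumPy is given below. The function name `internally_studentized` and the small data set are hypothetical; the code assumes a full-rank design matrix whose columns include the intercept.

```python
import numpy as np

def internally_studentized(X, Y):
    """Internally studentized residuals e_i / (s * sqrt(1 - h_ii))."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)              # leverages h_ii
    e = Y - H @ Y               # ordinary residuals
    s2 = e @ e / (n - p)        # residual mean square, estimate of sigma^2
    return e / np.sqrt(s2 * (1.0 - h))

# Hypothetical data: a roughly linear relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.2, 6.8, 8.1])
X = np.column_stack([np.ones_like(x), x])

r = internally_studentized(X, Y)   # the s_i of the text
print(np.round(r, 3))
```

Because $e_i$ contributes to $s$, a single large residual inflates its own denominator; in fact $s_i^2$ can never exceed $n-p$, which limits how extreme an internally studentized residual can look.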

Extra Sum of Squares attributable to $e_i$

\[\begin{aligned}
e_{i} &=-h_{i1} Y_{1} -h_{i2} Y_{2} -\cdots +(1-h_{ii} )Y_{i} -\cdots -h_{in} Y_{n} =c'Y\\
c'&=(-h_{i1} ,-h_{i2} ,\cdots ,(1-h_{ii} ),\cdots ,-h_{in} )\\
c'c&=\sum _{j=1}^{n}h_{ij}^{2}  +(1-2h_{ii} )=(1-h_{ii} )\\
SS(e_{i})&=\frac{e_{i}^{2} }{(1-h_{ii} )}\\
S_{(i)}^{2}&=\frac{(n-p)s^{2} -\frac{e_{i}^{2}}{(1-h_{ii} )}}{n-p-1}
\end{aligned}\]
Here $c'c=(1-h_{ii})$ because $H$ is symmetric and idempotent, so $\sum _{j=1}^{n}h_{ij}^{2}=h_{ii}$. The quantity $S_{(i)}^{2}$ provides an estimate of $\sigma^2$ after deletion of the contribution of $e_i$.
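The deletion formula for $S_{(i)}^{2}$ avoids refitting the model $n$ times. As a check, the sketch below compares it with the residual mean square obtained by actually dropping one case and refitting. The simulated data, the seed, the helper name `residual_mean_square`, and the choice of case `i = 3` are assumptions made for illustration.

```python
import numpy as np

def residual_mean_square(X, Y):
    """Residual mean square e'e / (n - p) from a least squares fit."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    e = Y - X @ b
    return e @ e / (n - p)

rng = np.random.default_rng(3)
n, p = 9, 3
# Hypothetical design and response.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
e = Y - H @ Y
s2 = e @ e / (n - p)

# S_(i)^2 for every case at once, using only quantities from the full fit.
s2_deleted_formula = ((n - p) * s2 - e ** 2 / (1.0 - h)) / (n - p - 1)

# Direct check for one case: drop it and refit.
i = 3
keep = np.arange(n) != i
s2_deleted_refit = residual_mean_square(X[keep], Y[keep])
print(np.isclose(s2_deleted_formula[i], s2_deleted_refit))
```

The agreement reflects the fact that removing case $i$ reduces the residual sum of squares by exactly $e_i^2/(1-h_{ii})$ and the degrees of freedom by one.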

Externally Studentized Residuals

$t_{i} =\frac{e_{i} }{s_{(i)}\sqrt{1-h_{ii} } }$ are externally studentized residuals, where $s_{(i)}=\sqrt{S_{(i)}^{2}}$. Here, if $e_i$ is large, it is emphasized even more by the fact that $s_{(i)}$ has excluded it. The $t_i$ follow a $t_{n-p-1}$ distribution under the usual normality of errors assumption.
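To see how external studentization emphasizes an outlying response, the sketch below computes both kinds of studentized residuals for a made-up data set in which one response has been shifted upward. The function name `studentized_residuals`, the data, and the size of the shift are hypothetical.

```python
import numpy as np

def studentized_residuals(X, Y):
    """Return (internally, externally) studentized residuals."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    e = Y - H @ Y
    s2 = e @ e / (n - p)                                    # uses all cases
    s2_del = ((n - p) * s2 - e ** 2 / (1.0 - h)) / (n - p - 1)  # case i excluded
    r = e / np.sqrt(s2 * (1.0 - h))       # internal: s contains e_i
    t = e / np.sqrt(s2_del * (1.0 - h))   # external: s_(i) excludes e_i
    return r, t

# Hypothetical data: a straight line with small noise and one shifted response.
x = np.arange(1.0, 11.0)
noise = np.array([0.1, -0.2, 0.15, -0.1, 0.05, 3.0, -0.15, 0.1, -0.05, 0.1])
Y = 2.0 + 0.5 * x + noise
X = np.column_stack([np.ones_like(x), x])

r, t = studentized_residuals(X, Y)
worst = int(np.argmax(np.abs(t)))
print(worst, round(float(r[worst]), 2), round(float(t[worst]), 2))
```

The two versions are linked by $t_i = s_i\sqrt{(n-p-1)/(n-p-s_i^2)}$, where $s_i$ is the internally studentized residual, so $|t_i|$ exceeds $|s_i|$ whenever $s_i^2>1$. Each $t_i$ can then be compared with a quantile of the $t_{n-p-1}$ distribution.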


Read more about the Role of the Hat Matrix in Regression Analysis

Read about Regression Diagnostics
