# Category: Multivariate Statistics

## Eigenvalues and Eigenvectors

Eigenvalues and eigenvectors of matrices are needed for methods such as Principal Component Analysis (PCA) and Principal Component Regression (PCR), and for assessing the impact of collinearity.

For a real, symmetric matrix $A_{n\times n}$ there exists a set of $n$ scalars $\lambda_i$, and $n$ non-zero vectors $Z_i\,\,(i=1,2,\cdots,n)$ such that

\begin{align*}
AZ_i &=\lambda_i\,Z_i\\
AZ_i - \lambda_i\, Z_i &=0\\
\Rightarrow (A-\lambda_i \,I)Z_i &=0
\end{align*}

The $\lambda_i$ are the $n$ eigenvalues (characteristic roots or latent roots) of the matrix $A$, and the $Z_i$ are the corresponding (column) eigenvectors (characteristic vectors or latent vectors).

There is a non-zero solution to $(A-\lambda_i\,I)Z_i=0$ only if the matrix $(A-\lambda_i\,I)$ is less than full rank, that is, only if the determinant of $(A-\lambda_i\,I)$ is zero. The $\lambda_i$ are obtained by solving the general determinantal equation $|A-\lambda\,I|=0$.

The determinant of $(A-\lambda\,I)$ is an $n$th degree polynomial in $\lambda$. Solving this equation gives the $n$ values of $\lambda$, which are not necessarily distinct. Each value $\lambda_i$ is then used in the equation $(A-\lambda_i\,I)Z_i=0$ to find the companion eigenvector $Z_i$.

When the eigenvalues are distinct, the vector solution to $(A-\lambda_i\,I)Z_i=0$ is unique except for an arbitrary scale factor and sign. By convention, each eigenvector is defined to be the solution vector scaled to have unit length; that is, $Z_i'Z_i=1$. Furthermore, the eigenvectors are mutually orthogonal ($Z_i'Z_j=0$ when $i\ne j$).

When the eigenvalues are not distinct, there is an additional degree of arbitrariness in defining the subsets of vectors corresponding to each subset of non-distinct eigenvalues.

Example: Let the matrix $A=\begin{bmatrix}10&3\\3 & 8\end{bmatrix}$.

The eigenvalues of $A$ can be found by $|A-\lambda\,I|=0$. Therefore,

\begin{align*}
|A-\lambda\, I|&=\Big|\begin{matrix}10-\lambda & 3\\ 3& 8-\lambda\end{matrix}\Big|\\
\Rightarrow (10-\lambda)(8-\lambda)-9 &= \lambda^2 -18\lambda+71 =0
\end{align*}

By the quadratic formula, $\lambda_1 = 12.16228$ and $\lambda_2=5.83772$, arbitrarily ordered from largest to smallest. Thus the matrix of eigenvalues of $A$ is

$$L=\begin{bmatrix}12.16228 & 0 \\ 0 & 5.83772\end{bmatrix}$$
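As a quick numerical check of the roots above, the characteristic equation of a $2\times 2$ symmetric matrix can be solved directly with the quadratic formula. The following is a minimal Python sketch; the helper name `eigenvalues_sym_2x2` is hypothetical and introduced only for illustration.

```python
import math

def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first.

    Solves lambda^2 - (a + d)*lambda + (a*d - b^2) = 0 with the quadratic formula.
    """
    trace = a + d
    det = a * d - b * b
    # The discriminant equals (a - d)^2 + 4*b^2, so it is never negative.
    disc = math.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0

lam1, lam2 = eigenvalues_sym_2x2(10.0, 3.0, 8.0)
print(round(lam1, 5), round(lam2, 5))  # 12.16228 5.83772
```

For larger matrices the polynomial route is impractical, and a numerical eigenvalue routine from a linear algebra library would normally be used instead.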

The eigenvector corresponding to $\lambda_1=12.16228$ is obtained by solving

$(A-\lambda_1\,I)Z_1=0$ for the elements of $Z_1$:

\begin{align*}
(A-12.16228I)\begin{bmatrix}Z_{11}\\Z_{21}\end{bmatrix} &=0\\
\left(\begin{bmatrix}10&3\\3&8\end{bmatrix}-\begin{bmatrix}12.16228&0\\0&12.16228\end{bmatrix}\right)\begin{bmatrix}Z_{11}\\Z_{21}\end{bmatrix}&=0\\
\begin{bmatrix}-2.16228 & 3\\ 3 & -4.16228\end{bmatrix}\begin{bmatrix}Z_{11}\\Z_{21}\end{bmatrix}&=0
\end{align*}

Arbitrarily setting $Z_{11}=1$ and solving for $Z_{21}$ using the first equation gives $Z_{21}=0.720759$. Thus the vector $Z_1'=\begin{bmatrix}1 & 0.720759\end{bmatrix}$ satisfies the first equation.

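$Length(Z_1)=\sqrt{Z_1'Z_1}=\sqrt{1.5194935}=1.232677$. Dividing each element of $Z_1$ by this length gives the normalized eigenvector, for which $Z_1'Z_1=0.999997\approx 1$; the small discrepancy is due to rounding.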

\begin{align*}
Z_1' &=\begin{bmatrix} 0.81124&0.58471\end{bmatrix}\\
Z_2' &=\begin{bmatrix}-0.58471&0.81124\end{bmatrix}
\end{align*}

The elements of $Z_2$ are found in the same manner. Thus the matrix of eigenvectors for $A$ is

$$Z=\begin{bmatrix}0.81124 &-0.58471\\0.58471&0.81124\end{bmatrix}$$
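The hand calculation can be reproduced in a few lines of Python. This is only a sketch for the $2\times 2$ symmetric case with a non-zero off-diagonal element; the function name `unit_eigenvector_2x2` is an assumption made for illustration. It sets the first element to 1, solves the first equation for the second element, and rescales to unit length.

```python
import math

def unit_eigenvector_2x2(a, b, lam):
    """Unit-length eigenvector of [[a, b], [b, d]] for eigenvalue lam (b != 0).

    Uses the first row of (A - lam*I) Z = 0: (a - lam)*z1 + b*z2 = 0.
    """
    z1 = 1.0                 # arbitrary scale, as in the text
    z2 = -(a - lam) / b      # solve the first equation for z2
    length = math.sqrt(z1 * z1 + z2 * z2)
    return (z1 / length, z2 / length)

a, b, d = 10.0, 3.0, 8.0
root = math.sqrt((a + d) ** 2 - 4.0 * (a * d - b * b))
lam1, lam2 = (a + d + root) / 2.0, (a + d - root) / 2.0

Z1 = unit_eigenvector_2x2(a, b, lam1)
Z2 = unit_eigenvector_2x2(a, b, lam2)

print([round(z, 5) for z in Z1])  # [0.81124, 0.58471]
print([round(z, 5) for z in Z2])  # [0.58471, -0.81124]
print(abs(Z1[0] * Z2[0] + Z1[1] * Z2[1]) < 1e-12)  # True: Z1 and Z2 are orthogonal
```

The second vector comes out with the opposite sign to the one shown in the text. Both are valid, because an eigenvector is determined only up to sign.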

Note that the matrix $A$ is of rank two because both eigenvalues are non-zero. $A$ can therefore be decomposed into the sum of two orthogonal matrices, each of rank one:

\begin{align*}
A &=A_1+A_2\\
A_1 &=\lambda_1Z_1Z_1' = 12.16228 \begin{bmatrix}0.81124\\0.58471\end{bmatrix}\begin{bmatrix}0.81124 & 0.58471\end{bmatrix}\\
&= \begin{bmatrix}8.0042 & 5.7691\\ 5.7691&4.1581\end{bmatrix}\\
A_2 &= \lambda_2Z_2Z_2' = \begin{bmatrix}1.9958 & -2.7691\\-2.7691&3.8419\end{bmatrix}
\end{align*}

Thus the sum of the eigenvalues, $\lambda_1+\lambda_2=18$, equals $trace(A)$. In general, the sum of the eigenvalues of any square symmetric matrix is equal to the trace of the matrix. The trace of each rank-1 component matrix is equal to its eigenvalue: $trace(A_1)=\lambda_1$ and $trace(A_2)=\lambda_2$.
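The decomposition and the trace identities can be verified numerically. The sketch below rebuilds the two rank-one components for the example matrix in plain Python; the helper `rank_one_component` is a hypothetical name used only here, and it assumes a $2\times 2$ symmetric matrix with a non-zero off-diagonal element.

```python
import math

A = [[10.0, 3.0], [3.0, 8.0]]

# Eigenvalues from the characteristic equation lambda^2 - 18*lambda + 71 = 0.
root = math.sqrt(18.0 ** 2 - 4.0 * 71.0)
lams = [(18.0 + root) / 2.0, (18.0 - root) / 2.0]

def rank_one_component(lam):
    """Return lam * Z Z' for the unit eigenvector Z belonging to lam."""
    z = [1.0, (lam - A[0][0]) / A[0][1]]   # first row of (A - lam*I) Z = 0
    norm = math.sqrt(z[0] ** 2 + z[1] ** 2)
    z = [v / norm for v in z]
    return [[lam * z[i] * z[j] for j in range(2)] for i in range(2)]

A1 = rank_one_component(lams[0])
A2 = rank_one_component(lams[1])
total = [[A1[i][j] + A2[i][j] for j in range(2)] for i in range(2)]

print([[round(v, 4) for v in row] for row in A1])     # [[8.0042, 5.7691], [5.7691, 4.1581]]
print([[round(v, 4) for v in row] for row in total])  # [[10.0, 3.0], [3.0, 8.0]]
print(round(A1[0][0] + A1[1][1], 5))                  # 12.16228, i.e. trace(A1) = lambda_1
```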

## Cholesky Transformation

Given the covariances between variables, one can write an invertible linear transformation that "uncorrelates" the variables. Conversely, one can transform a set of uncorrelated variables into variables with given covariances. This transformation is called the Cholesky transformation; it is represented by a matrix that is the "square root" of the covariance matrix.
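Both directions of the transformation can be illustrated on a tiny data set. The sketch below is an assumption-laden example rather than a general implementation: the covariance values are hypothetical, the lower-triangular factor is written in closed form for the $2\times 2$ case only, and the four data points are chosen so that they have exactly zero mean, unit variance, and zero covariance.

```python
import math

# Assumed target covariance matrix (hypothetical values chosen for illustration).
sigma = [[10.0, 3.0], [3.0, 8.0]]

# Lower-triangular factor of a 2x2 matrix in closed form, so that sigma = L L'.
l11 = math.sqrt(sigma[0][0])
l21 = sigma[1][0] / l11
l22 = math.sqrt(sigma[1][1] - l21 ** 2)

# Four points with zero mean, unit variance, and zero covariance.
z = [(1.0, 1.0), (-1.0, 1.0), (1.0, -1.0), (-1.0, -1.0)]

def cov(points):
    """Population covariance matrix of zero-mean 2-D points."""
    n = len(points)
    return [[sum(p[i] * p[j] for p in points) / n for j in range(2)]
            for i in range(2)]

# Forward transform x = L z introduces the target covariances.
x = [(l11 * z1, l21 * z1 + l22 * z2) for z1, z2 in z]

# Inverse transform z = L^{-1} x (forward substitution) removes them again.
z_back = [(x1 / l11, (x2 - l21 * x1 / l11) / l22) for x1, x2 in x]

print([[round(v, 6) for v in row] for row in cov(x)])  # [[10.0, 3.0], [3.0, 8.0]]
print(cov(z_back))  # approximately the 2x2 identity matrix
```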

## The Square Root Matrix

Given a (positive definite) covariance matrix $\Sigma$, it can be factored uniquely into a product $\Sigma=U'U$, where $U$ is an upper triangular matrix with positive diagonal entries. The matrix $U$ is the Cholesky (or square root) matrix. If one prefers to work with a lower triangular matrix $L$, then one can define $$L=U' \quad\Rightarrow\quad \Sigma = LL'.$$
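One standard way to compute the factor is to fill in the lower triangle row by row (often called the Cholesky-Banachiewicz scheme). The following pure-Python sketch assumes a symmetric positive definite input given as a list of lists; the function name `cholesky_lower` and the example matrix are illustrative choices, not taken from the text.

```python
import math

def cholesky_lower(sigma):
    """Lower-triangular L with positive diagonal such that sigma = L L'.

    Assumes sigma is symmetric positive definite (list of lists of floats).
    """
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Inner product of the already-computed parts of rows i and j.
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = sigma[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

sigma = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
L = cholesky_lower(sigma)
U = [[L[j][i] for j in range(3)] for i in range(3)]  # U = L' is upper triangular

print(L)  # [[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 2.0]]
```

In practice a library routine would normally be preferred over a hand-written loop; the sketch is meant only to make the definition concrete.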

This is the form of the Cholesky decomposition given by Golub and Van Loan (1996), who provide a proof of the Cholesky decomposition and describe various ways to compute it.