Eigenvalues and Eigenvectors

Eigenvalues and eigenvectors of matrices are needed for methods such as Principal Component Analysis (PCA) and Principal Component Regression (PCR), and for assessing the impact of collinearity.

For a real, symmetric matrix $A_{n\times n}$ there exists a set of $n$ scalars $\lambda_i$, and $n$ non-zero vectors $Z_i\,\,(i=1,2,\cdots,n)$ such that

\begin{align*}
AZ_i &=\lambda_i\,Z_i\\
AZ_i - \lambda_i\, Z_i &=0\\
\Rightarrow (A-\lambda_i \,I)Z_i &=0
\end{align*}

The $\lambda_i$ are the $n$ eigenvalues (characteristic roots or latent roots) of the matrix $A$, and the $Z_i$ are the corresponding (column) eigenvectors (characteristic vectors or latent vectors).

There is a non-zero solution to $(A-\lambda_i\,I)Z_i=0$ only if the matrix $(A-\lambda_i\,I)$ is less than full rank, that is, only if the determinant of $(A-\lambda_i\,I)$ is zero. The $\lambda_i$ are obtained by solving the general determinantal equation $|A-\lambda\,I|=0$.

The determinant of $(A-\lambda\,I)$ is an $n$th degree polynomial in $\lambda$. Solving this equation gives the $n$ values of $\lambda$, which are not necessarily distinct. Each value of $\lambda$ is used in equation $(A-\lambda_i\,I)Z_i=0$ to find the companion eigenvectors $Z_i$.

When the eigenvalues are distinct, the vector solution to $(A-\lambda_i\,I)Z_i=0$ is unique except for an arbitrary scale factor and sign. By convention, each eigenvector is defined to be the solution vector scaled to have unit length; that is, $Z_i'Z_i=1$. Furthermore, the eigenvectors are mutually orthogonal ($Z_i'Z_j=0$ when $i\ne j$).

When the eigenvalues are not distinct, there is an additional degree of arbitrariness in defining the subsets of vectors corresponding to each subset of non-distinct eigenvalues.
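The properties above can be checked numerically. The following is a minimal sketch using NumPy's symmetric eigensolver `numpy.linalg.eigh`; the 3×3 matrix is an illustrative assumption chosen for this sketch and does not come from the text.

```python
import numpy as np

# Illustrative real symmetric matrix (values are an assumption for this sketch).
A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])

# eigh is meant for symmetric matrices: eigenvalues come back in ascending
# order, and the eigenvectors are the columns of Z.
lam, Z = np.linalg.eigh(A)

# Each column satisfies A Z_i = lambda_i Z_i (lam broadcasts across columns).
residual = A @ Z - Z * lam

# Unit length plus mutual orthogonality is the same as Z'Z = I.
gram = Z.T @ Z

# The determinant |A - lambda_i I| vanishes at every eigenvalue.
dets = [np.linalg.det(A - l * np.eye(3)) for l in lam]

print(np.allclose(residual, 0), np.allclose(gram, np.eye(3)), np.allclose(dets, 0))
```

The sign of each returned column is arbitrary, which matches the remark that an eigenvector is determined only up to scale and sign.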

Example: Let the matrix $A=\begin{bmatrix}10&3\\3 & 8\end{bmatrix}$.

The eigenvalues of $A$ can be found by solving $|A-\lambda\,I|=0$. Therefore,

\begin{align*}
|A-\lambda\, I|&=\Big|\begin{matrix}10-\lambda & 3\\ 3& 8-\lambda\end{matrix}\Big|\\
\Rightarrow (10-\lambda)(8-\lambda)-9 &= \lambda^2 -18\lambda+71 =0
\end{align*}

By the quadratic formula, $\lambda_1 = 12.16228$ and $\lambda_2=5.83772$, arbitrarily ordered from largest to smallest. Thus the matrix of eigenvalues of $A$ is

$$L=\begin{bmatrix}12.16228 & 0 \\ 0 & 5.83772\end{bmatrix}$$
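As a quick numerical check of this step, the sketch below applies the quadratic formula to the characteristic polynomial of the example matrix and compares the roots with `numpy.linalg.eigvalsh`. It relies on the fact that for a 2×2 matrix the characteristic polynomial is $\lambda^2 - trace(A)\,\lambda + |A|$; the variable names are illustrative.

```python
import numpy as np

A = np.array([[10.0, 3.0],
              [3.0, 8.0]])

# For a 2x2 matrix: |A - lambda I| = lambda^2 - trace(A)*lambda + det(A).
tr = np.trace(A)            # 18
det = np.linalg.det(A)      # 71, up to floating-point error
disc = tr**2 - 4.0 * det    # discriminant of the quadratic

lam1 = (tr + np.sqrt(disc)) / 2.0   # larger root
lam2 = (tr - np.sqrt(disc)) / 2.0   # smaller root

# Diagonal matrix of eigenvalues, ordered from largest to smallest.
L = np.diag([lam1, lam2])

# Cross-check with NumPy's symmetric eigenvalue routine (ascending order).
eigvals = np.linalg.eigvalsh(A)[::-1]
print(np.round([lam1, lam2], 5), np.round(eigvals, 5))
```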

The eigenvector corresponding to $\lambda_1=12.16228$ is obtained by solving

$(A-\lambda_1\,I)Z_1=0$ for the elements of $Z_1$:

\begin{align*}
(A-12.16228I)\begin{bmatrix}Z_{11}\\Z_{21}\end{bmatrix} &=0\\
\left(\begin{bmatrix}10&3\\3&8\end{bmatrix}-\begin{bmatrix}12.162278&0\\0&12.162278\end{bmatrix}\right)\begin{bmatrix}Z_{11}\\Z_{21}\end{bmatrix}&=0\\
\begin{bmatrix}-2.162278 & 3\\ 3 & -4.162278\end{bmatrix}\begin{bmatrix}Z_{11}\\Z_{21}\end{bmatrix}&=0
\end{align*}

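Arbitrarily setting $Z_{11}=1$ and solving for $Z_{21}$ using the first equation gives $Z_{21}=0.720759$. Thus the vector $Z_1'=\begin{bmatrix}1 & 0.720759\end{bmatrix}$ satisfies the first equation.

The length of this vector is $\sqrt{Z_1'Z_1}=\sqrt{1.5194935}=1.232677$. Dividing each element by this length gives the unit-length first eigenvector (for which $Z_1'Z_1\approx 1$, up to rounding), shown here together with the second: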

\begin{align*}
Z_1' &=\begin{bmatrix} 0.81124&0.58471\end{bmatrix}\\
Z_2' &=\begin{bmatrix}-0.58471&0.81124\end{bmatrix}
\end{align*}

The elements of $Z_2$ are found in the same manner. Thus the matrix of eigenvectors for $A$ is

$$Z=\begin{bmatrix}0.81124 &-0.58471\\0.58471&0.81124\end{bmatrix}$$
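The hand calculation of the first eigenvector can be reproduced as follows. This is a sketch of the same steps (fix $Z_{11}=1$, solve the first equation for $Z_{21}$, then scale to unit length), cross-checked against `numpy.linalg.eigh`; the variable names are illustrative.

```python
import numpy as np

A = np.array([[10.0, 3.0],
              [3.0, 8.0]])
lam1 = 9.0 + np.sqrt(10.0)   # larger root of lambda^2 - 18*lambda + 71 = 0

# First row of (A - lam1*I) Z_1 = 0 with Z_11 fixed at 1:
#   (10 - lam1) * 1 + 3 * Z_21 = 0
z21 = -(A[0, 0] - lam1) / A[0, 1]
z1 = np.array([1.0, z21])
z1 = z1 / np.linalg.norm(z1)        # scale to unit length

# Cross-check: eigh returns eigenvalues in ascending order, so the last
# column of vecs belongs to the largest eigenvalue.
vals, vecs = np.linalg.eigh(A)
v1 = vecs[:, -1]
v1 = v1 * np.sign(v1[0])            # remove the arbitrary sign before comparing

print(np.round(z1, 5), np.round(v1, 5))
```

The sign adjustment is needed only for the comparison, since an eigenvector and its negative are equally valid solutions.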

Note that matrix $A$ is of rank two because both eigenvalues are non-zero. $A$ can be decomposed into two orthogonal matrices, each of rank one:

\begin{align*}
A &=A_1+A_2\\
A_1 &=\lambda_1Z_1Z_1' = 12.16228 \begin{bmatrix}0.81124\\0.58471\end{bmatrix}\begin{bmatrix}0.81124 & 0.58471\end{bmatrix}\\
&= \begin{bmatrix}8.0042 & 5.7691\\ 5.7691&4.1581\end{bmatrix}\\
A_2 &= \lambda_2Z_2Z_2' = \begin{bmatrix}1.9958 & -2.7691\\-2.7691&3.8419\end{bmatrix}
\end{align*}

The sum of the eigenvalues, $\lambda_1+\lambda_2=18$, equals $trace(A)$. In general, the sum of the eigenvalues of any square symmetric matrix is equal to the trace of the matrix. The trace of each rank-1 component matrix is equal to its eigenvalue: $trace(A_1)=\lambda_1$ and $trace(A_2)=\lambda_2$.
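The decomposition and the trace identities can be verified with a short sketch. It builds the rank-one components from the output of `numpy.linalg.eigh`; the names `A1` and `A2` mirror $A_1$ and $A_2$ above, and "orthogonal" is checked here in the sense that the product $A_1A_2$ is the zero matrix.

```python
import numpy as np

A = np.array([[10.0, 3.0],
              [3.0, 8.0]])

vals, vecs = np.linalg.eigh(A)       # ascending eigenvalues, eigenvectors in columns
lam2, lam1 = vals                    # lam1 is the larger eigenvalue
z2, z1 = vecs[:, 0], vecs[:, 1]

# Rank-one components lambda_i * Z_i Z_i'. The outer product is unaffected by
# the arbitrary sign of each eigenvector.
A1 = lam1 * np.outer(z1, z1)
A2 = lam2 * np.outer(z2, z2)

print(np.round(A1, 4))
print(np.round(A2, 4))
print(np.allclose(A1 + A2, A))                   # the components add back to A
print(np.isclose(np.trace(A), lam1 + lam2))      # trace(A) = sum of eigenvalues
print(np.isclose(np.trace(A1), lam1), np.isclose(np.trace(A2), lam2))
```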

Canonical Correlation Analysis

Bivariate correlation analysis measures the strength of the relationship between two variables. One may instead need to find the strength of the relationship between two sets of variables, and in this case canonical correlation is an appropriate technique. Canonical correlation is appropriate in the same situations where multiple regression would be, but where there are multiple inter-correlated outcome variables. Canonical correlation analysis determines a set of canonical variates: orthogonal linear combinations of the variables within each set that best explain the variability both within and between sets. For example,

• In medicine, individuals' lifestyles and eating habits may affect their health, as measured by a number of health-related variables such as hypertension, weight, anxiety, and tension level.
• In business, the marketing manager of a consumer goods firm may be interested in finding the relationship between the types of products purchased and consumers' lifestyles and personalities.

In the two examples above, one set of variables is the predictor (independent) set while the other is the criterion (dependent) set. The objective of canonical correlation analysis is to determine whether the predictor set of variables affects the criterion set of variables.

Note that it is not necessary to designate the two sets of variables as the dependent and independent sets. In this case the objective of canonical correlation is to ascertain the relationship between the two sets of variables.

The objective of canonical correlation is similar to that of conducting a principal components analysis on each set of variables. In principal components analysis, the first new axis results in a new variable that accounts for the maximum variance in the data, while in canonical correlation analysis a new axis is identified for each set of variables such that the correlation between the two resulting new variables is maximum.

Canonical correlation analysis can also be considered a data reduction technique, as it is possible that only a few canonical variables are needed to adequately represent the association between the two sets of variables. Therefore, an additional objective of canonical correlation is to determine the minimum number of canonical correlations needed to adequately represent the association between the two sets of variables.
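To make the idea of maximally correlated linear combinations concrete, here is a minimal sketch that computes canonical correlations with NumPy. The text does not prescribe a computational method; this sketch assumes one common approach (a QR decomposition of each centred data matrix followed by a singular value decomposition) and assumes both matrices have full column rank. The function name `canonical_correlations` is hypothetical, and the data are synthetic, generated only for illustration.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (n x p) and Y (n x q).

    Sketch only: assumes both centred matrices have full column rank.
    """
    Xc = X - X.mean(axis=0)          # centre each variable
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)         # orthonormal basis for the column space of Xc
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx'Qy are the canonical correlations, largest first.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)      # guard against rounding slightly above 1

# Synthetic data: one shared latent variable links the two sets.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
X = np.column_stack([latent + rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([latent + rng.normal(size=n),
                     rng.normal(size=n)])

rho = canonical_correlations(X, Y)
print(rho)   # min(p, q) = 2 canonical correlations, in descending order
```

With data built this way, only the first canonical correlation should be large, which illustrates the data reduction point: a single pair of canonical variables may be enough to represent the association between the two sets.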