Category: Multivariate Statistics

Eigenvalues and Eigenvectors

Eigenvalues and eigenvectors of matrices are needed for methods such as Principal Component Analysis (PCA), Principal Component Regression (PCR), and the assessment of the impact of collinearity.

For a real, symmetric matrix $A_{n\times n}$ there exists a set of $n$ scalars $\lambda_i$, and $n$ non-zero vectors $Z_i\,\,(i=1,2,\cdots,n)$ such that

\begin{align*}
AZ_i &=\lambda_i\,Z_i\\
AZ_i - \lambda_i\, Z_i &=0\\
\Rightarrow (A-\lambda_i \,I)Z_i &=0
\end{align*}

The $\lambda_i$ are the $n$ eigenvalues (characteristic roots or latent roots) of the matrix $A$, and the $Z_i$ are the corresponding (column) eigenvectors (characteristic vectors or latent vectors).

There is a non-zero solution to $(A-\lambda_i\,I)Z_i=0$ only if the matrix $(A-\lambda_i\,I)$ is less than full rank, that is, only if the determinant of $(A-\lambda_i\,I)$ is zero. The $\lambda_i$ are obtained by solving the general determinantal equation $|A-\lambda\,I|=0$.

The determinant of $(A-\lambda\,I)$ is an $n$th degree polynomial in $\lambda$. Solving this equation gives the $n$ values of $\lambda$, which are not necessarily distinct. Each value of $\lambda$ is then used in the equation $(A-\lambda_i\,I)Z_i=0$ to find the corresponding eigenvector $Z_i$.

When the eigenvalues are distinct, the vector solution to $(A-\lambda_i\,I)Z_i=0$ is unique except for an arbitrary scale factor and sign. By convention, each eigenvector is defined to be the solution vector scaled to have unit length; that is, $Z_i'Z_i=1$. Furthermore, the eigenvectors are mutually orthogonal ($Z_i'Z_j=0$ when $i\ne j$).

When the eigenvalues are not distinct, there is an additional degree of arbitrariness in defining the subsets of vectors corresponding to each subset of non-distinct eigenvalues.

Example: Let the matrix $A=\begin{bmatrix}10&3\\3 & 8\end{bmatrix}$.

The eigenvalues of $A$ can be found by $|A-\lambda\,I|=0$. Therefore,

\begin{align*}
|A-\lambda\, I|&=\Big|\begin{matrix}10-\lambda & 3\\ 3& 8-\lambda\end{matrix}\Big|\\
\Rightarrow (10-\lambda)(8-\lambda)-9 &= \lambda^2 -18\lambda+71 =0
\end{align*}

By the quadratic formula, $\lambda_1 = 12.16228$ and $\lambda_2=5.83772$, arbitrarily ordered from largest to smallest. Thus the matrix of eigenvalues of $A$ is

$$L=\begin{bmatrix}12.16228 & 0 \\ 0 & 5.83772\end{bmatrix}$$

The eigenvector corresponding to $\lambda_1=12.16228$ is obtained by solving $(A-\lambda_1\,I)Z_1=0$ for the elements of $Z_1$:

\begin{align*}
(A-12.16228\,I)\begin{bmatrix}Z_{11}\\Z_{21}\end{bmatrix} &=0\\
\left(\begin{bmatrix}10&3\\3&8\end{bmatrix}-\begin{bmatrix}12.16228&0\\0&12.16228\end{bmatrix}\right)\begin{bmatrix}Z_{11}\\Z_{21}\end{bmatrix}&=0\\
\begin{bmatrix}-2.16228 & 3\\ 3 & -4.16228\end{bmatrix}\begin{bmatrix}Z_{11}\\Z_{21}\end{bmatrix}&=0
\end{align*}

Arbitrarily setting $Z_{11}=1$ and solving for $Z_{21}$ using the first equation gives $Z_{21}=0.720759$. Thus the vector $Z_1'=\begin{bmatrix}1 & 0.720759\end{bmatrix}$ satisfies the first equation.

The length of $Z_1$ is $\sqrt{Z_1'Z_1}=\sqrt{1.519494}=1.232677$. Dividing each element of $Z_1$ by its length gives the normalized eigenvector, for which $Z_1'Z_1=1$ (0.999997 due to rounding):

\begin{align*}
Z_1 &=\begin{bmatrix} 0.81124&0.58471\end{bmatrix}\\
Z_2 &=\begin{bmatrix}-0.58471&0.81124\end{bmatrix}
\end{align*}

The elements of $Z_2$ are found in the same manner. Thus the matrix of eigenvectors for $A$ is

$$Z=\begin{bmatrix}0.81124 &-0.58471\\0.58471&0.81124\end{bmatrix}$$

Note that the matrix $A$ is of rank two because both eigenvalues are non-zero. The matrix $A$ can be decomposed into the sum of two rank-one matrices formed from its eigenvalues and eigenvectors:

\begin{align*}
A &=A_1+A_2\\
A_1 &=\lambda_1Z_1Z_1' = 12.16228 \begin{bmatrix}0.81124\\0.58471\end{bmatrix}\begin{bmatrix}0.81124 & 0.58471\end{bmatrix}\\
&= \begin{bmatrix}8.0042 & 5.7691\\ 5.7691&4.1581\end{bmatrix}\\
A_2 &= \lambda_2Z_2Z_2' = \begin{bmatrix}1.9958 & -2.7691\\-2.7691&3.8419\end{bmatrix}
\end{align*}

The sum of the eigenvalues, $\lambda_1+\lambda_2=18$, equals $trace(A)$. In general, the sum of the eigenvalues of any square symmetric matrix equals the trace of the matrix. Moreover, the trace of each rank-one component matrix equals its eigenvalue: $trace(A_1)=\lambda_1$ and $trace(A_2)=\lambda_2$.
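
As a check, the same eigenvalues and eigenvectors can be reproduced in R with the built-in eigen() function (a minimal sketch; the signs of the returned eigenvectors may differ, since eigenvectors are determined only up to sign):

# Eigen decomposition of the example matrix A
A <- matrix(c(10, 3, 3, 8), 2, 2)
e <- eigen(A)
e$values                  # 12.16228  5.83772
e$vectors                 # columns are the unit-length eigenvectors
# Verify the rank-one decomposition A = A1 + A2 and the trace property
A1 <- e$values[1] * e$vectors[, 1] %*% t(e$vectors[, 1])
A2 <- e$values[2] * e$vectors[, 2] %*% t(e$vectors[, 2])
A1 + A2                   # reproduces A
sum(e$values)             # equals trace(A) = 18 (up to rounding)
sum(diag(A))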

Cholesky Transformation

Given the covariances between variables, one can write an invertible linear transformation that “uncorrelates” the variables. Conversely, one can transform a set of uncorrelated variables into variables with given covariances. This transformation is called the Cholesky transformation; it is represented by a matrix that is the “square root” of the covariance matrix.

The Square Root Matrix

Given a covariance matrix $\Sigma$, it can be factored uniquely into a product $\Sigma=U'U$, where $U$ is an upper triangular matrix with positive diagonal entries. The matrix $U$ is the Cholesky (or square root) matrix. If one prefers to work with the lower triangular matrix ($L$), then one can define $$L=U' \Rightarrow \quad \Sigma = LL'.$$
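
In R, for example, the chol() function returns this upper triangular factor, so the factorization can be checked directly (a minimal sketch; the $2\times 2$ covariance matrix below is an arbitrary example):

# Factor a covariance matrix as Sigma = U'U
Sigma <- matrix(c(4, 2, 2, 3), 2, 2)   # example covariance matrix (positive definite)
U <- chol(Sigma)                        # upper triangular Cholesky factor
t(U) %*% U                              # reproduces Sigma
L <- t(U)                               # lower triangular form, Sigma = L L'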

This is the form of the Cholesky decomposition given by Golub and Van Loan (1996), who provided a proof of the Cholesky decomposition and various ways to compute it.

The Cholesky matrix transforms uncorrelated variables into variables whose variances and covariances are given by $\Sigma$. In particular, if one generates standard normal variates, the Cholesky transformation maps them to variables from the multivariate normal distribution with covariance matrix $\Sigma$ centered at the origin ($MVN(0, \Sigma)$).

Generally, pseudo-random numbers are used to generate variables sampled from a population with a given degree of correlation. This property can be used for a set of uncorrelated variables: a given correlation matrix can be imposed by post-multiplying the data matrix $X$ by the upper triangular Cholesky decomposition of the correlation matrix $R$. That is:

  • Create two variables using pseudo-random numbers; call them $X$ and $Y$.
  • Create the desired correlation between the variables using $Y=X*r + Y*\sqrt{1-r^2},$
    where $r$ is the desired correlation value. The variables $X$ and $Y$ will then have approximately the desired relationship between them; over a large number of repetitions, the distribution of the sample correlation will be centered on $r$ (see the sketch below).
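
A minimal R sketch of this two-variable construction (the sample size and the value $r=0.6$ are arbitrary choices for illustration):

# Impose a correlation of r between two independent pseudo-random variables
r <- 0.6
X <- rnorm(1000)
Y <- rnorm(1000)
Ynew <- X * r + Y * sqrt(1 - r^2)   # the transformed second variable
cor(X, Ynew)                        # sample correlation, centered on r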

The Cholesky Transformation: The Simple Case

Suppose you want to generate multivariate normal data that are uncorrelated but have non-unit variance. The covariance matrix is the diagonal matrix of variances: $\Sigma = diag(\sigma_1^2,\sigma_2^2,\cdots, \sigma_p^2)$. The square root $\sqrt{\Sigma}$ is the diagonal matrix $D$ of standard deviations: $\Sigma = D'D$, where $D=diag(\sigma_1,\sigma_2,\cdots, \sigma_p)$.

Geometrically, the $D$ matrix scales each coordinate direction independently of the other directions. For example, suppose the $X$-axis is scaled by a factor of 3, whereas the $Y$-axis is unchanged (scale factor of 1). The transformation $D$ is then $diag(3,1)$, which corresponds to a covariance matrix of $diag(9,1)$.

Think of the circles in Figure ‘a’ as probability contours for the multivariate distribution $MVN(0, I)$, and the ellipses in Figure ‘b’ as the corresponding probability contours for the distribution $MVN(0, D'D)$.
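
A minimal R sketch of this diagonal case, using $D=diag(3,1)$ as above:

# Diagonal Cholesky factor D = diag(3, 1) gives covariance diag(9, 1)
D <- diag(c(3, 1))
z <- matrix(rnorm(2000), 1000, 2)   # uncorrelated standard normal variates
x <- z %*% D                        # scale each coordinate direction
cov(x)                              # approximately diag(9, 1)

For a general (non-diagonal) correlation matrix, the same idea applies with the full Cholesky factor, as in the following example.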

# Define the correlation matrix
C <- matrix(c(1.0, 0.6, 0.3,
              0.6, 1.0, 0.5,
              0.3, 0.5, 1.0), 3, 3)
# Find its Cholesky decomposition
U <- chol(C)
# Generate correlated random numbers from uncorrelated
# numbers by multiplying them with the Cholesky matrix
x <- matrix(rnorm(3000), 1000, 3)
xcorr <- x %*% U
cor(xcorr)

Reference: Cholesky Transformation to correlate and Uncorrelate variables

Cronbach’s Alpha Reliability Analysis of Measurement Scales

Reliability analysis is used to study the properties of measurement scales (such as Likert scale questionnaires) and the items (questions) that make them up. Reliability analysis computes a number of commonly used measures of scale reliability and also provides information about the relationships between individual items in the scale. Intraclass correlation coefficients can be used to compute interrater reliability estimates.

Suppose you want to know whether your questionnaire measures customer satisfaction in a useful way. For this purpose, you can use reliability analysis to determine the extent to which the items (questions) in your questionnaire are correlated with each other. An overall index of the reliability, or internal consistency, of the scale as a whole can be obtained. You can also identify problematic items that should be removed (deleted) from the scale.

As an example, open the data file “satisf.sav”, already available in the SPSS sample files. To check the reliability of the Likert scale items, follow the steps given below:

Step 1: On the menu bar of SPSS, click the Analyze > Scale > Reliability Analysis… option
Reliability SPSS menu

Step 2: Select two or more variables that you want to test and move them from the left pane to the right pane of the Reliability Analysis dialogue box. Note that multiple variables (items) can be selected by holding down the CTRL key and clicking the variables you want. Clicking the arrow button between the left and right panes will move the variables to the Items pane (right pane).
Reliability Analysis Dialog box
Step 3: Click on the “Statistics” button to select additional statistics, such as descriptives (for item, scale, and scale if item deleted), summaries (for means, variances, covariances, and correlations), inter-item statistics (correlations and covariances), and an ANOVA table (none, F-test, Friedman chi-square, or Cochran chi-square).

Reliability Statistics

Click on the “Continue” button to save the current statistics options for the analysis. Then click the OK button in the Reliability Analysis dialogue box to run the analysis on the selected items. The output will be shown in the SPSS output window.

Reliability Analysis Output

The Cronbach’s Alpha reliability ($\alpha$) is about 0.827, which is good enough. Note that deleting the item “organization satisfaction” would increase the reliability of the remaining items to 0.860.

A rule of thumb for interpreting alpha for dichotomous items (questions with two possible answers only) or Likert scale items (questions with 3, 5, 7, or 9 etc. response options) is:

  • If Cronbach’s Alpha is $\ge 0.9$, the internal consistency of scale is Excellent.
  • If Cronbach’s Alpha is $0.90 > \alpha \ge 0.8$, the internal consistency of scale is Good.
  • If Cronbach’s Alpha is $0.80 > \alpha \ge 0.7$, the internal consistency of scale is Acceptable.
  • If Cronbach’s Alpha is $0.70 > \alpha \ge 0.6$, the internal consistency of scale is Questionable.
  • If Cronbach’s Alpha is $0.60 > \alpha \ge 0.5$, the internal consistency of scale is Poor.
  • If Cronbach’s Alpha is $0.50 > \alpha $, the internal consistency of scale is Unacceptable.

However, the rules of thumb listed above should be used with caution, since Cronbach’s Alpha reliability is sensitive to the number of items in a scale. A larger number of questions can result in a larger Alpha reliability, while a smaller number of items may result in a smaller $\alpha$.
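
For comparison, Cronbach’s Alpha can also be computed directly from an item-response matrix using its definition $\alpha = \frac{k}{k-1}\left(1-\frac{\sum_{i} s_i^2}{s_T^2}\right)$, where $k$ is the number of items, $s_i^2$ is the variance of item $i$, and $s_T^2$ is the variance of the total score. A minimal R sketch with simulated responses (not the satisf.sav data):

# Cronbach's Alpha from a matrix of item responses (rows = respondents)
set.seed(1)
items <- matrix(sample(1:5, 200 * 4, replace = TRUE), ncol = 4)  # simulated Likert items
cronbach_alpha <- function(X) {
  k <- ncol(X)
  (k / (k - 1)) * (1 - sum(apply(X, 2, var)) / var(rowSums(X)))
}
cronbach_alpha(items)   # alpha for these simulated (uncorrelated) items will be low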

Principal Component Regression (PCR)

In Principal Component Regression, the original data set is transformed into a new set of uncorrelated variables called principal components. This kind of transformation ranks the new variables according to their importance (that is, variables are ranked according to the size of their variance), and those of least importance are eliminated. After the transformation, a least squares regression on this reduced set of principal components is performed.

Principal Component Regression (PCR) is not scale invariant; therefore, one should scale and center the data first. Consider a $p$-dimensional random vector $x=(x_1, x_2, \ldots, x_p)^t$ with covariance matrix $\Sigma$, and assume that $\Sigma$ is positive definite. Let $V=(v_1,v_2, \cdots, v_p)$ be a $(p \times p)$-matrix with orthonormal column vectors, that is $v_i^t\, v_i=1$ for $i=1,2, \cdots, p$, and $V^t =V^{-1}$. Consider the linear transformation

\begin{aligned}
z&=V^t x\\
z_i&=v_i^t x
\end{aligned}

The variance of the random variable $z_i$ is
\begin{aligned}
Var(z_i)&=E[v_i^t\, x\, x^t\, v_i]\\
&=v_i^t \Sigma v_i
\end{aligned}

Maximizing the variance $Var(z_i)$ under the condition $v_i^t v_i=1$ with a Lagrange multiplier gives
\[\phi_i=v_i^t \Sigma v_i -a_i(v_i^t v_i-1)\]

Setting the partial derivative to zero, we get
\[\frac{\partial \phi_i}{\partial v_i} = 2 \Sigma v_i - 2a_i v_i=0\]

which is
\[(\Sigma - a_i I)v_i=0\]

In matrix form
\[\Sigma V= VA\]
or
\[\Sigma = VAV^t\]

where $A=diag(a_1, a_2, \cdots, a_p)$. This is known as the eigenvalue problem: the $v_i$ are the eigenvectors of $\Sigma$ and the $a_i$ are the corresponding eigenvalues, ordered such that $a_1 \ge a_2 \ge \cdots \ge a_p$. Since $\Sigma$ is positive definite, all eigenvalues are real and positive.

The transformed variable $z_i$ is called the $i$th principal component of $x$, and we have
\[Cov(z)=V^t Cov(x) V=V^t \Sigma V=A\]

The variance of the $i$th principal component equals the eigenvalue $a_i$, and the variances are ranked in descending order; this means that the last principal component has the smallest variance. The principal components are orthogonal to all the other principal components (they are even uncorrelated), since $A$ is a diagonal matrix.
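
A minimal R sketch illustrating these properties of the empirical principal components $Z=XV$ (the mixing matrix and sample size below are arbitrary choices for illustration):

# Principal components via the eigen decomposition of a covariance matrix
set.seed(1)
M <- matrix(c(1, 0.5, 0.2, 0, 1, 0.4, 0, 0, 1), 3, 3)          # mixing matrix for correlated data
X <- scale(matrix(rnorm(300), ncol = 3) %*% M, scale = FALSE)  # centered data matrix
eig <- eigen(cov(X))
V <- eig$vectors             # columns v_1, ..., v_p
Z <- X %*% V                 # empirical principal components
round(cov(Z), 6)             # diagonal matrix; entries equal eig$values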

In the following, we will use $q$ ($1\le q \le p$) principal components for the regression. The regression model for the observed data $X$ and $y$ can then be expressed as

\begin{aligned}
y&=X\beta+\varepsilon\\
&=XVV^t\beta+\varepsilon\\
&= Z\theta+\varepsilon
\end{aligned}

with the $n\times q$ matrix of empirical principal components $Z=XV$ (where $V$ is restricted to its first $q$ columns) and the new regression coefficients $\theta=V^t \beta$. The solution of the least squares estimation is

\begin{aligned}
\hat{\theta}_k=(z_k^t z_k)^{-1}z_k^ty
\end{aligned}

and $\hat{\theta}=(\hat{\theta}_1, \cdots, \hat{\theta}_q)^t$.

Since the $z_k$ are orthogonal, the regression is a sum of univariate regressions, that is
\[\hat{y}_{PCR}=\sum_{k=1}^q \hat{\theta}_k z_k\]

Since $z_k$ are linear combinations of the original $x_j$, the solution in terms of coefficients of the $x_j$ can be expressed as
\[\hat{\beta}_{PCR} (q)=\sum_{k=1}^q \hat{\theta}_k v_k=V \hat{\theta}\]

Note that if $q=p$, we would get back the usual least squares estimates for the full model. For $q<p$, we get a “reduced” regression.
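
A minimal R sketch of PCR along these lines, using simulated data and $q=2$ retained components (the coefficient values and sample size are hypothetical):

# Principal component regression with q retained components
set.seed(1)
X <- scale(matrix(rnorm(100 * 5), ncol = 5))        # centered and scaled predictors
y <- drop(X %*% c(2, -1, 0.5, 0, 0)) + rnorm(100)   # simulated response
V <- eigen(cov(X))$vectors
q <- 2
Z <- X %*% V[, 1:q]                                 # first q empirical principal components
theta_hat <- coef(lm(y ~ Z))[-1]                    # theta estimates (drop the intercept)
beta_pcr <- V[, 1:q] %*% theta_hat                  # back-transform to coefficients of x
beta_pcr                                            # "reduced" regression coefficients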
