The transformation of original data set into a new set of uncorrelated variables is called principal components. This kind of transformation ranks the new variables according to their importance (that is, variable are ranked according to the size of their variance and eliminates those of least importance). After transformation, a least square regression on this reduced set of principal components is performed.

Principal Component Regression (PCR) is not scale invariant, therefore, one should scale and center data first. Therefore, given a p-dimensional random vector $x=(x_1, x_2, …, x_p)^t$ with covariance matrix $\sum$ and assume that $\sum$ is positive definite. Let $V=(v_1,v_2, \cdots, v_p)$ be a $(p \times p)$-matrix with orthogonal column vectors that is $v_i^t\, v_i=1$, where $i=1,2, \cdots, p$ and $V^t =V^{-1}$. The linear transformation

\begin{aligned}

z&=V^t x\\

z_i&=v_i^t x

\end{aligned}

The variance of the random variable $z_i$ is

\begin{aligned}

Var(Z_i)&=E[v_i^t\, x\, x^t\,\, v_i]\\

&=v_i^t \sum v_i

\end{aligned}

Maximizing the variance $Var(Z_i)$ under the conditions $v_i^t v_i=1$ with Lagrange gives

\[\phi_i=v_i^t \sum v_i -a_i(v_i^t v_i-1)\]

Setting the partial derivation to zero, we get

\[\frac{\partial \phi_i}{\partial v_i} = 2 \sum v_i – 2a_i v_i=0\]

which is

\[(\sum – a_i I)v_i=0\]

In matrix form

\[\sum V= VA\]

of

\[\sum = VAV^t\]

where $A=diag(a_1, a_2, \cdots, a_p)$. This is know as the eigvenvalue problem, $v_i$ are the eigenvectors of $\sum$ and $a_i$ the corresponding eigenvalues such that $a_1 \ge a_2 \cdots \ge a_p$. Since $\sum$ is positive definite, all eigenvalues are real and non-negative numbers.

$z_i$ is named the ith principal component of $x$ and we have

\[Cov(z)=V^t Cov(x) V=V^t \sum V=A\]

The variance of the *i*th principal component matches the eigenvalue $a_i$, while the variances are ranked in descending order. This means that, the last principal component will have the smallest variance. The principal components are orthogonal to all the other principal components (they are even uncorrelated) since $A$ is a diagonal matrix.

In following, for regression, we will use $q$, that is,($1\le q \le p$) principal components. The regression model for observed data $X$ and $y$ can then be expressed as

\begin{aligned}

y&=X\beta+\varepsilon\\

&=XVV^t\beta+varepsilon\\

&= Z\theta+\varepsilon

\end{aligned}

with the $n\times q$ matrix of the empirical principal components $Z=XV$ and the new regression coefficients $\theta=V^t \beta$. The solution of the least squares estimation is

\begin{aligned}

\hat{\theta}_k=(z_k^t z_k)^{-1}z_k^ty

\end{aligned}

and $\hat{\theta}=(\theta_1, \cdots, \theta_q)^t$

Since the $z_k$ are orthogonal, the regression is a sum of univariate regressions, that is

\[\hat{y}_{PCR}=\sum_{k=1}^q \hat{\theta}_k z_k\]

Since $z_k$ are linear combinations of the original $x_j$, the solution in terms of coefficients of the $x_j$ can be expressed as

\[\hat{\beta}_{PCR} (q)=\sum_{k=1}^q \hat{\theta}_k v_k=V \hat{\theta}\]

Note that if $q=p$, we would get back the usual least squares estimates for the full model. For $q<p$, we get a “reduced” regression.