Statistics for Data Science & Analytics - Statistics MCQs, Software & Data Analysis

Learn Cholesky Transformation (2020)

Sep 11, 2024 | Feb 23, 2020 by Muhammad Imdad Ullah

Given the covariances between variables, one can write an invertible linear transformation that “uncorrelates” the variables. Conversely, one can transform a set of uncorrelated variables into variables with given covariances. This transformation is called the Cholesky transformation; it is represented by a matrix that is the “square root” of the covariance matrix.

The Square Root Matrix

Given a (positive definite) covariance matrix $\Sigma$, it can be factored uniquely into a product $\Sigma=U'U$, where $U$ is an upper triangular matrix with positive diagonal entries. The matrix $U$ is the Cholesky (or square root) matrix. If one prefers to work with a lower triangular matrix ($L$), then one can define $$L=U' \Rightarrow \quad \Sigma = LL'.$$

This is the form of the Cholesky decomposition given by Golub and Van Loan (1996), who provide a proof of the Cholesky decomposition and various ways to compute it.
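As a minimal sketch of this factorization, the following R snippet uses base R's `chol()`, which returns the upper triangular factor. The 2 x 2 matrix `Sigma` is a made-up example and does not come from the text.

```r
# Hypothetical covariance matrix (illustrative values only)
Sigma <- matrix(c(4, 2,
                  2, 3), nrow = 2, byrow = TRUE)

U <- chol(Sigma)   # upper triangular factor with positive diagonal
L <- t(U)          # lower triangular factor

# Both products should reproduce Sigma up to rounding error
ok_upper <- isTRUE(all.equal(t(U) %*% U, Sigma))
ok_lower <- isTRUE(all.equal(L %*% t(L), Sigma))
print(U)
print(c(ok_upper, ok_lower))
```

Because `chol()` returns $U$ rather than $L$, data stored with one observation per row are post-multiplied by $U$, while column vectors are pre-multiplied by $L$.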

The Cholesky matrix transforms uncorrelated variables into variables whose variances and covariances are given by $\Sigma$. If one generates standard normal variates, the Cholesky transformation maps them into variables from the multivariate normal distribution with covariance matrix $\Sigma$ and centered at the origin ($MVN(0, \Sigma)$).

Generally, pseudo-random numbers are used to generate variables sampled from a population with a given degree of correlation. The same property extends to a set of variables: for uncorrelated variables with unit variance, a given correlation matrix $R$ can be imposed by post-multiplying the data matrix $X$ by the upper triangular Cholesky factor of $R$ (variables that are already correlated must first be uncorrelated). For two variables, the procedure is:

Create two independent standard normal variables using a pseudo-random number generator; call them $X$ and $Y$.
Impose the desired correlation by replacing $Y$ with $Y=rX + \sqrt{1-r^2}\,Y,$
where $r$ is the desired correlation value. In the population, $X$ and the new $Y$ have correlation exactly $r$. In any single sample the correlation differs slightly from $r$, but over a large number of repetitions the distribution of the sample correlation is centered near $r$.
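The two-variable recipe can be sketched in R as follows. The seed, the sample size `n`, and the target correlation `r` are arbitrary choices made for illustration; the second half checks that the formula agrees with post-multiplying by the Cholesky factor of the 2 x 2 correlation matrix.

```r
set.seed(123)                 # arbitrary seed, for reproducibility only
n <- 10000                    # hypothetical sample size
r <- 0.6                      # hypothetical target correlation

x <- rnorm(n)                 # first standard normal variable
e <- rnorm(n)                 # second, independent standard normal variable
y <- r * x + sqrt(1 - r^2) * e

sample_r <- cor(x, y)         # close to r, but not exactly r
print(sample_r)

# The same result via the Cholesky factor of the correlation matrix
R <- matrix(c(1, r,
              r, 1), nrow = 2, byrow = TRUE)
U <- chol(R)                  # rows: (1, r) and (0, sqrt(1 - r^2))
y_chol <- (cbind(x, e) %*% U)[, 2]
same_y <- isTRUE(all.equal(y_chol, y))
print(same_y)
```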

The Cholesky Transformation: The Simple Case

Suppose you want to generate multivariate normal data that are uncorrelated but have non-unit variance. The covariance matrix is the diagonal matrix of variances: $\Sigma = \text{diag}(\sigma_1^2,\sigma_2^2,\cdots, \sigma_p^2)$. The square root of $\Sigma$ is the diagonal matrix $D$ that consists of the standard deviations: $\Sigma = D'D$, where $D=\text{diag}(\sigma_1,\sigma_2,\cdots, \sigma_p)$.

Geometrically, the $D$ matrix scales each coordinate direction independently of the other directions. For example, suppose the $X$-axis is scaled by a factor of 3, whereas the $Y$-axis is unchanged (scale factor of 1). The transformation $D$ is then $\text{diag}(3,1)$, which corresponds to a covariance matrix of $\text{diag}(9,1)$.

Think of the circles in Figure ‘a’ as probability contours for the multivariate distribution $MVN(0, I)$, and of the ellipses in Figure ‘b’ as the corresponding probability contours for the distribution $MVN(0, \Sigma)$ with $\Sigma = D'D$.
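A small R sketch of this simple case, using the $\text{diag}(3,1)$ example from the text. The seed and the sample size are arbitrary assumptions; the point is only that scaling uncorrelated standard normal draws by $D$ gives standard deviations close to 3 and 1, and that the Cholesky factor of a diagonal covariance matrix is $D$ itself.

```r
set.seed(42)                       # arbitrary seed
n <- 10000                         # hypothetical sample size
D <- diag(c(3, 1))                 # standard deviations from the example
Sigma <- t(D) %*% D                # covariance matrix diag(9, 1)

z <- matrix(rnorm(2 * n), nrow = n, ncol = 2)   # draws from MVN(0, I)
x <- z %*% D                                    # each column scaled separately

sample_sd <- apply(x, 2, sd)       # should be close to c(3, 1)
same_factor <- isTRUE(all.equal(chol(Sigma), D))
print(sample_sd)
print(same_factor)
```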

# Define the correlation matrix
C <- matrix(c(1.0, 0.6, 0.3,
              0.6, 1.0, 0.5,
              0.3, 0.5, 1.0), nrow = 3, ncol = 3)

# Find its Cholesky decomposition (upper triangular, so t(U) %*% U equals C)
U <- chol(C)

# Generate correlated random numbers from uncorrelated
# numbers by post-multiplying them by the Cholesky matrix
x <- matrix(rnorm(3000), nrow = 1000, ncol = 3)
xcorr <- x %*% U

# The sample correlations should be close to C
cor(xcorr)

Reference: Cholesky Transformation to Correlate and Uncorrelate Variables

What is research? Why do we conduct it?

Feb 25, 2024 | Jan 19, 2020 by Muhammad Imdad Ullah

Two important questions about discovering new knowledge are: what is research, and why do we do it? What research is, and why it is conducted, is explained below.

What is Research

Research is an inquiry. It is a process of discovering new knowledge that involves multiple elements, such as theory development and testing, empirical inquiry, and sharing the generated knowledge with others, such as experts and colleagues. A short description of these elements is:

A theory is a set of ideas and perceptions that helps people understand complex concepts and the relationships among them. To develop and/or test a theory, researchers conduct empirical inquiries, collect and analyze relevant data, and discuss the findings from the empirical results. Once theories have been through the research process, it is necessary to share the results of the studies with others: researchers present papers at conferences and publish reports in journals and other publications.

There are two ways to use the results of a study:

The results may contribute to researchers’ general understanding of the topic they have studied, for example, how the economy works, why price inflation happens, or which factors increase a candidate’s chances of winning an election. The generalizations that researchers draw from their studies on these issues can be shared with other researchers and the general public to advance society’s understanding of the topic.
The results of a study may contribute to solving particular problems in a nation, state, or community. For example, a study on the healthcare needs of the elderly in a community may discover that their primary need is finding transportation when they want to visit their doctors. The leaders of the community (such as the mayor and city council) may use this information from the healthcare study to allocate some money for the transportation needs of the elderly in the next year’s budget.

Therefore, research is a tool that creates building blocks of knowledge, which in turn contribute to the development of science.

Why conduct research?

To understand a phenomenon, situation, or behavior under study.
To test existing theories and to develop new theories based on existing ones.
To answer different questions of “how”, “what”, “which”, “when” and “why” about a phenomenon, behavior, or situation.
To contribute to forming new knowledge and expanding the existing knowledge base.

High-Quality Research

Nowadays one can gather information about almost anything from the Internet: just do a Google search. But is every Google search good research? Not quite! Remember that although you will find some information, it may or may not be valid or of high quality. A lot of the information available on the Internet is good and useful, but some is not, and there is misinformation too. The information you find may be someone’s pure opinion, contain some fabrication, or be based on unsystematic research or unauthentic sources. In short, the information may not be valid (objective, true).

Therefore, a high-quality research project:

is based on the scholarly work that has been already done by others in the field,
can be replicated/ reproduced,
is generalizable to other settings,
is based on some logical rationale and tied to existing theory,
is doable and can be done practically, i.e., when deciding the scope of the research, the researcher should consider the availability of time and resources,
generates some new questions,
is incremental,
is an apolitical (politically neutral) activity that should be undertaken for the betterment of society.

Two Types/ Purposes

Typically, research has one of two types/purposes: basic research or applied research.

To find out truths about human behaviors, societies, the economy, etc., or to understand them better. This type is called basic research.
To answer practical questions and support making informed decisions. This type is called applied research.

Note that most public administration and public policy research projects are of the second kind.

Model Selection Criteria

Mar 25, 2025 | Dec 24, 2019 by Muhammad Imdad Ullah

All models are wrong, but some are useful. Model selection criteria are rules used to select a (statistical) model among competing models based on given data.

Several model selection criteria are used to choose among a set of candidate models and/or to compare models for forecasting purposes.

All model selection criteria aim at minimizing the residual sum of squares (or, equivalently, increasing the coefficient of determination). The adjusted $R^2$, Akaike’s Information Criterion, the Bayesian Information Criterion, Schwarz’s Information Criterion, and Mallows’s $C_p$ additionally impose a penalty for including an increasingly large number of regressors. Therefore, there is a trade-off between the goodness of fit of the model and its complexity, where complexity refers to the number of parameters in the model.

Model Selection Criteria: Coefficient of Determination ($R^2$)

$$R^2=\frac{\text{Explained Sum of Squares}}{\text{Total Sum of Squares}}=1-\frac{\text{Residual Sum of Squares}}{\text{Total Sum of Squares}}$$

Adding more variables to the model may increase $R^2$, but it may also increase the variance of the forecast error.
There are some problems with $R^2$:

It measures in-sample goodness of fit (how close the estimated $Y$ values are to the actual values) in the given sample. There is no guarantee that a model with a high $R^2$ will forecast out-of-sample observations well.
In comparing two or more $R^2$’s, the dependent variable must be the same.
$R^2$ cannot fall when more variables are added to the model.
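The last point can be checked with a short R sketch. It assumes the built-in `mtcars` data set and adds a purely random column (called `noise` here, a hypothetical name) as an irrelevant regressor; $R^2$ is computed by hand from the residual and total sums of squares and compared with the value reported by `summary()`.

```r
d <- mtcars                                  # built-in example data
set.seed(1)                                  # arbitrary seed
d$noise <- rnorm(nrow(d))                    # hypothetical irrelevant regressor

fit_small <- lm(mpg ~ wt, data = d)
fit_big   <- lm(mpg ~ wt + noise, data = d)

rss <- sum(resid(fit_small)^2)               # residual sum of squares
tss <- sum((d$mpg - mean(d$mpg))^2)          # total sum of squares
r2_manual <- 1 - rss / tss

r2_small <- summary(fit_small)$r.squared
r2_big   <- summary(fit_big)$r.squared
print(c(manual = r2_manual, small = r2_small, big = r2_big))
```

Even though `noise` is unrelated to `mpg` by construction, `r2_big` is never smaller than `r2_small`.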

Model Selection Criteria: Adjusted Coefficient of Determination

$$\overline{R}^2=1-\frac{RSS/(n-k)}{TSS/(n-1)}$$

Here $n$ is the number of observations and $k$ is the number of estimated parameters, including the intercept. $\overline{R}^2 \le R^2$, which shows that the adjusted $R^2$ penalizes for adding more regressors (explanatory variables). Unlike $R^2$, the adjusted $R^2$ will increase only if the absolute $t$-value of the added variable is greater than 1. For comparative purposes, $\overline{R}^2$ is a better measure than $R^2$. The regressand (dependent variable) must be the same for the comparison of models to be valid.
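A minimal R sketch of the adjusted $R^2$, again assuming the built-in `mtcars` data and a hypothetical random regressor `noise`. The helper `adj_r2()` is written for this illustration; it applies the formula directly and is compared with `summary()`'s `adj.r.squared`. The last lines check the rule about the absolute $t$-value of the added variable.

```r
d <- mtcars
set.seed(1)                                  # arbitrary seed
d$noise <- rnorm(nrow(d))                    # hypothetical added regressor

# Illustrative helper: adjusted R^2 from RSS, TSS, n and k
adj_r2 <- function(fit) {
  n   <- nobs(fit)
  k   <- length(coef(fit))                   # parameters incl. intercept
  y   <- model.response(model.frame(fit))
  rss <- sum(resid(fit)^2)
  tss <- sum((y - mean(y))^2)
  1 - (rss / (n - k)) / (tss / (n - 1))
}

fit_small <- lm(mpg ~ wt, data = d)
fit_big   <- lm(mpg ~ wt + noise, data = d)

adj_small <- adj_r2(fit_small)
adj_big   <- adj_r2(fit_big)
t_noise   <- summary(fit_big)$coefficients["noise", "t value"]

# Adjusted R^2 rises exactly when |t| of the added variable exceeds 1
print(c(adj_small = adj_small, adj_big = adj_big, abs_t = abs(t_noise)))
```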

Model Selection Criteria: Akaike’s Information Criterion (AIC)

$$AIC=e^{\frac{2k}{n}}\frac{\sum \hat{u}^2_i}{n}=e^{\frac{2k}{n}}\frac{RSS}{n}$$
where $k$ is the number of regressors (including the intercept) and $n$ is the number of observations. For mathematical convenience, AIC is usually written in log form as

$$\ln AIC = \left(\frac{2k}{n}\right) + \ln \left(\frac{RSS}{n}\right)$$
where $\ln AIC$ is the natural log of AIC and $\frac{2k}{n}$ is the penalty factor.

AIC imposes a harsher penalty than the adjusted coefficient of determination for adding more regressors. In comparing two or more models, the model with the lowest value of AIC is preferred. AIC is useful for assessing both the in-sample and out-of-sample forecasting performance of a regression model. AIC is also used to determine the lag length in an AR(p) model.
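The log form above can be computed directly from a fitted model. The sketch below assumes the built-in `mtcars` data; `ln_aic()` is a helper written for this illustration, not a base R function.

```r
# Illustrative helper: ln AIC = 2k/n + ln(RSS/n)
ln_aic <- function(fit) {
  n   <- nobs(fit)
  k   <- length(coef(fit))          # regressors incl. intercept
  rss <- sum(resid(fit)^2)
  2 * k / n + log(rss / n)
}

fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# The model with the lower value is preferred
print(c(fit1 = ln_aic(fit1), fit2 = ln_aic(fit2)))
```

R's built-in `AIC()` uses the likelihood-based definition, so its values are on a different scale. For least-squares fits to the same data it equals $n \ln AIC$ plus a constant that does not depend on the model, so both versions rank the models identically.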

Model Selection Criteria: Schwarz’s Information Criterion (SIC)

\begin{align*}
SIC &=n^{\frac{k}{n}}\frac{\sum \hat{u}_i^2}{n}=n^{\frac{k}{n}}\frac{RSS}{n}\\
\ln SIC &= \frac{k}{n} \ln n + \ln \left(\frac{RSS}{n}\right)
\end{align*}
where $\frac{k}{n}\ln\,n$ is the penalty factor. SIC imposes a harsher penalty than AIC whenever $\ln n > 2$, that is, for $n \ge 8$.

Like AIC, SIC is used to compare the in-sample or out-of-sample forecasting performance of models. The lower the value of SIC, the better the model.
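A short R sketch of the log form of SIC, assuming the built-in `mtcars` data. The helper `ln_sic()` is written for this illustration; the two penalty terms are computed side by side to show how they compare for this sample size.

```r
# Illustrative helper: ln SIC = (k/n) ln(n) + ln(RSS/n)
ln_sic <- function(fit) {
  n   <- nobs(fit)
  k   <- length(coef(fit))          # regressors incl. intercept
  rss <- sum(resid(fit)^2)
  k / n * log(n) + log(rss / n)
}

fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
print(c(fit1 = ln_sic(fit1), fit2 = ln_sic(fit2)))

# Penalty factors for fit2: SIC's is larger because ln(n) > 2 here
n <- nobs(fit2)
k <- length(coef(fit2))
penalty_aic <- 2 * k / n
penalty_sic <- k / n * log(n)
print(c(aic = penalty_aic, sic = penalty_sic))
```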

Model Selection Criteria: Mallows’s $C_p$ Criterion

For model selection, Mallows’s criterion is
$$C_p=\frac{RSS_p}{\hat{\sigma}^2}-(n-2p)$$
where $RSS_p$ is the residual sum of squares of the model with $p$ regressors and $\hat{\sigma}^2$ is an estimate of the error variance, usually taken from the full model.
\begin{align*}
E(RSS_p)&=(n-p)\sigma^2\\
E(C_p)&\approx \frac{(n-p)\sigma^2}{\sigma^2}-(n-2p)\approx p
\end{align*}
A model that has a low $C_p$ value, about equal to $p$, is preferable.
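As a rough sketch, $C_p$ can be computed in R as follows. It assumes the built-in `mtcars` data, treats `mpg ~ wt + hp + qsec` as the "full" model (an arbitrary choice for illustration), and estimates $\hat{\sigma}^2$ from that full model. `mallows_cp()` is a helper written for this example.

```r
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # assumed full model
sigma2_hat <- summary(full)$sigma^2               # error variance estimate

# Illustrative helper: C_p = RSS_p / sigma^2 - (n - 2p)
mallows_cp <- function(fit) {
  n <- nobs(fit)
  p <- length(coef(fit))                          # parameters incl. intercept
  sum(resid(fit)^2) / sigma2_hat - (n - 2 * p)
}

sub1 <- lm(mpg ~ wt, data = mtcars)
sub2 <- lm(mpg ~ wt + hp, data = mtcars)

# Compare each C_p with its own p (2, 3 and 4 respectively)
print(c(sub1 = mallows_cp(sub1), sub2 = mallows_cp(sub2), full = mallows_cp(full)))
```

By construction, the full model's $C_p$ equals its own $p$, so the criterion is only informative for the sub-models.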

Model Selection Criteria: Bayesian Information Criterion (BIC)

The Bayesian Information Criterion is based on the likelihood function and is closely related to the AIC. It is the likelihood-based form of Schwarz’s criterion. The penalty term in BIC is larger than in AIC.
$$BIC=\ln(n)k-2\ln(\hat{L})$$
where $\hat{L}$ is the maximized value of the likelihood function of the regression model, $k$ is the number of estimated parameters, and $n$ is the number of observations.
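The likelihood form can be reproduced in R from `logLik()`. The sketch assumes the built-in `mtcars` data; note that R counts the error variance as an estimated parameter, so the `df` attribute is one more than the number of regression coefficients.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

ll <- logLik(fit)                 # maximized log-likelihood
n  <- nobs(fit)
k  <- attr(ll, "df")              # coefficients plus the error variance

bic_manual <- log(n) * k - 2 * as.numeric(ll)
print(c(manual = bic_manual, builtin = BIC(fit)))
```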

Cross-Validation

Cross-validation is a technique in which the data are split into training and testing sets. The model is trained on the training data and then evaluated on the unseen testing data. This helps assess how well the model generalizes to unseen data and guards against overfitting.
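One common variant, $K$-fold cross-validation, repeats this split $K$ times so that every observation is used for testing once. The base R sketch below assumes the built-in `mtcars` data, `K = 5`, and mean squared prediction error as the score; all three are choices made for illustration, and `cv_mse()` is a hypothetical helper with the response `mpg` hard-coded.

```r
set.seed(1)                                   # arbitrary seed
K <- 5                                        # assumed number of folds
folds <- sample(rep(1:K, length.out = nrow(mtcars)))   # random fold labels

# Illustrative helper: K-fold mean squared prediction error for mpg
cv_mse <- function(formula) {
  errs <- sapply(1:K, function(i) {
    train <- mtcars[folds != i, ]             # fit on the other folds
    test  <- mtcars[folds == i, ]             # evaluate on the held-out fold
    fit   <- lm(formula, data = train)
    mean((test$mpg - predict(fit, newdata = test))^2)
  })
  mean(errs)
}

mse1 <- cv_mse(mpg ~ wt)
mse2 <- cv_mse(mpg ~ wt + hp)
print(c(mse1 = mse1, mse2 = mse2))            # lower is better
```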

Note that none of these criteria is necessarily superior to the others.

FAQs about Model Selection Criteria

What is meant by model selection criteria?
Describe the coefficient of determination.
Describe the cross-validation technique.
Describe Mallows’s $C_p$ criterion.
Write about the adjusted coefficient of determination.
What is Akaike’s Information Criterion (AIC)?
What is the Bayesian Information Criterion (BIC)?
What is Schwarz’s Information Criterion (SIC)?
