Model Selection Criteria (2019)

As George Box famously noted, all models are wrong, but some are useful. Model selection criteria are rules used to select a (statistical) model among competing models, based on given data.

Several model selection criteria are used to choose among a set of candidate models and/or to compare models for forecasting purposes.

All model selection criteria aim at minimizing the residual sum of squares (or, equivalently, increasing the coefficient of determination). Criteria such as the adjusted $R^2$, the Akaike Information Criterion (AIC), the Schwarz Information Criterion (SIC), the Bayesian Information Criterion (BIC), and Mallow’s $C_p$ impose a penalty for including an increasingly large number of regressors. There is therefore a trade-off between the goodness of fit of the model and its complexity, where complexity refers to the number of parameters in the model.


Model Selection Criteria: Coefficient of Determination ($R^2$)

$$R^2=\frac{\text{Explained Sum of Square}}{\text{Total Sum of Squares}}=1-\frac{\text{Residuals Sum of Squares}}{\text{Total Sum of Squares}}$$

Adding more variables to the model may increase $R^2$, but it may also increase the variance of the forecast error.
There are, however, some problems with $R^2$:

  • It measures in-sample goodness of fit (how close an estimated $Y$ value is to its actual value) in the given sample. There is no guarantee that a model with a high $R^2$ will forecast out-of-sample observations well.
  • In comparing two or more $R^2$’s, the dependent variable must be the same.
  • $R^2$ cannot fall when more variables are added to the model.

Model Selection Criteria: Adjusted Coefficient of Determination ($\overline{R}^2$)

$$\overline{R}^2=1-\frac{RSS/(n-k)}{TSS/(n-1)}$$

Since $\overline{R}^2 \le R^2$, the adjusted $R^2$ penalizes adding more regressors (explanatory variables). Unlike $R^2$, the adjusted $R^2$ will increase only if the absolute $t$-value of the added variable is greater than 1. For comparative purposes, therefore, $\overline{R}^2$ is a better measure than $R^2$. As before, the regressand (dependent variable) must be the same for the comparison of models to be valid.
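Both measures can be computed directly from the residual and total sums of squares. A minimal numpy sketch on synthetic data (all names and the data-generating process here are hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical dataset: y regressed on two regressors plus an intercept.
rng = np.random.default_rng(0)
n, k = 30, 3                      # n observations, k parameters (incl. intercept)
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates
resid = y - X @ beta
RSS = resid @ resid
TSS = ((y - y.mean()) ** 2).sum()

r2 = 1 - RSS / TSS                             # coefficient of determination
adj_r2 = 1 - (RSS / (n - k)) / (TSS / (n - 1)) # adjusted for degrees of freedom

print(round(r2, 3), round(adj_r2, 3))
```

The adjusted value is never larger than $R^2$ because the $(n-1)/(n-k)$ degrees-of-freedom correction inflates the residual term whenever $k > 1$.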

Model Selection Criteria: Akaike’s Information Criterion (AIC)

$$AIC=e^{\frac{2k}{n}}\frac{\sum \hat{u}^2_i}{n}=e^{\frac{2k}{n}}\frac{RSS}{n}$$
where $k$ is the number of regressors including the intercept. Taking the natural log gives

$$\ln AIC = \frac{2k}{n} + \ln \left(\frac{RSS}{n}\right)$$
where $\ln AIC$ is the natural log of AIC and $\frac{2k}{n}$ is the penalty factor.

AIC imposes a harsher penalty than the adjusted coefficient of determination for adding more regressors. In comparing two or more models, the model with the lowest AIC value is preferred. AIC is useful for assessing both the in-sample and out-of-sample forecasting performance of a regression model, and it is also used to determine the lag length $p$ of an AR($p$) model.

Model Selection Criteria: Schwarz’s Information Criterion (SIC)

\begin{align*}
SIC &=n^{\frac{k}{n}}\frac{\sum \hat{u}_i^2}{n}=n^{\frac{k}{n}}\frac{RSS}{n}\\
\ln SIC &= \frac{k}{n} \ln n + \ln \left(\frac{RSS}{n}\right)
\end{align*}
where $\frac{k}{n}\ln n$ is the penalty factor. SIC imposes a harsher penalty than AIC, since its penalty involves $\ln n$, which exceeds 2 for $n \ge 8$.

Like AIC, SIC is used to compare the in-sample or out-of-sample forecasting performance of a model. The lower the values of SIC, the better the model.
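The two log-form criteria above can be computed and compared directly from the RSS. A sketch in numpy, using synthetic data where the second regressor is irrelevant by construction (variable names are hypothetical):

```python
import numpy as np

def ln_aic(rss, n, k):
    """ln AIC = 2k/n + ln(RSS/n), as defined above."""
    return 2 * k / n + np.log(rss / n)

def ln_sic(rss, n, k):
    """ln SIC = (k/n) ln n + ln(RSS/n); the penalty grows with ln n."""
    return (k / n) * np.log(n) + np.log(rss / n)

rng = np.random.default_rng(1)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)        # x2 plays no role in generating y

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

X_small = np.column_stack([np.ones(n), x1])        # 2 parameters
X_big = np.column_stack([np.ones(n), x1, x2])      # 3 parameters

for name, X, k in [("small", X_small, 2), ("big", X_big, 3)]:
    print(name, ln_aic(rss(X, y), n, k), ln_sic(rss(X, y), n, k))
```

In both criteria the model with the lowest value is preferred; adding the irrelevant regressor always lowers RSS a little, and the penalty term decides whether that reduction is worth an extra parameter.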

Model Selection Criteria: Mallow’s $C_p$ Criterion

For model selection, Mallow’s criterion is
$$C_p=\frac{RSS_p}{\hat{\sigma}^2}-(n-2p)$$
where $RSS_p$ is the residual sum of squares from a model containing $p$ parameters, and $\hat{\sigma}^2$ is an estimate of the error variance, usually obtained from the full model.
\begin{align*}
E(RSS_p)&=(n-p)\sigma^2\\
E(C_p)&\approx \frac{(n-p)\sigma^2}{\sigma^2}-(n-2p)\approx p
\end{align*}
A model with a low $C_p$ value, approximately equal to $p$, is preferable.
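A small numpy sketch of this rule, estimating $\hat{\sigma}^2$ from the largest candidate model (a common convention) and scanning nested submodels; the data and names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
X_full = np.column_stack([np.ones(n)] + [rng.normal(size=n) for _ in range(4)])
beta_true = np.array([1.0, 2.0, -1.0, 0.0, 0.0])   # last two regressors irrelevant
y = X_full @ beta_true + rng.normal(size=n)

def rss(X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

# sigma^2 estimated from the largest candidate model
k_full = X_full.shape[1]
sigma2 = rss(X_full) / (n - k_full)

# p parameters = intercept plus the first p-1 regressors
for p in range(2, k_full + 1):
    cp = rss(X_full[:, :p]) / sigma2 - (n - 2 * p)
    print(p, round(cp, 2))
```

One looks for the model where $C_p$ is small and close to $p$; note that for the full model itself $C_p = k$ exactly, by construction of $\hat{\sigma}^2$.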

Model Selection Criteria: Bayesian Information Criteria (BIC)

The Bayesian Information Criterion is based on the likelihood function and is closely related to the AIC. The penalty term in BIC is larger than in AIC.
$$BIC=\ln(n)k-2\ln(\hat{L})$$
where $\hat{L}$ is the maximized value of the likelihood function of the regression model.
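For an OLS regression with Gaussian errors, the maximized log-likelihood has the standard closed form $\ln\hat{L} = -\frac{n}{2}\left(\ln 2\pi + \ln\frac{RSS}{n} + 1\right)$, so BIC can be computed from a fit summary. A sketch (the numbers below are hypothetical, not from a real fit):

```python
import numpy as np

def ols_loglik(rss, n):
    """Maximized Gaussian log-likelihood of an OLS fit (standard result)."""
    return -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)

def bic(rss, n, k):
    """BIC = ln(n) k - 2 ln L, as defined above."""
    return np.log(n) * k - 2 * ols_loglik(rss, n)

def aic(rss, n, k):
    """Likelihood-based AIC = 2k - 2 ln L, for comparison with BIC."""
    return 2 * k - 2 * ols_loglik(rss, n)

n, k, rss_val = 100, 4, 250.0   # hypothetical fit summary
print(round(bic(rss_val, n, k), 2), round(aic(rss_val, n, k), 2))
```

Each extra parameter costs $\ln n$ under BIC but only 2 under AIC, which is why BIC penalizes complexity more heavily whenever $n \ge 8$.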

Cross-Validation

Cross-validation is a technique where the data is split into training and testing sets. The model is trained on the training data and then evaluated on the unseen testing data. This helps assess how well the model generalizes to unseen data and avoids overfitting.
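The simplest version of this idea is a single holdout split; a numpy sketch on synthetic data (split sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)

# Holdout split: fit on 80% of the data, evaluate on the held-out 20%.
idx = rng.permutation(n)
train, test = idx[:80], idx[80:]

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

mse_train = np.mean((y[train] - X[train] @ beta) ** 2)
mse_test = np.mean((y[test] - X[test] @ beta) ** 2)
print(round(mse_train, 3), round(mse_test, 3))
```

The held-out MSE estimates how well the model generalizes; $k$-fold cross-validation repeats this with several different splits and averages the results, which reduces the dependence on any one split.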

Note that no single one of these criteria is necessarily superior to the others.


Google Search Tricks and Tips

Here are some of the most useful Google Search Tricks and Tips, from basic tips to newer features. Let us start with tricks that can be useful for search queries by mathematicians and statisticians.


Google Search Tricks and Tips 1: Double Quotes (for Exact Search)

The use of double quotes yields only the pages with the same words in the same order (containing a specific phrase) as what’s in the quotes.

Google Search Trick 2: Asterisk within Quotes (to Specify Unknown Words)

Searching a phrase in double quotes with an asterisk will search all variations of that phrase. For example “* matrix in regression analysis” will yield pages that have different words before and after ‘matrix in regression analysis’ such as “hat matrix in regression analysis”, “the inverse-partitioned-matrix method in linear regression analysis”, “a matrix form, in regression analysis”, and “second important matrix in regression analysis” etc.

Google Search Trick 3: Minus Sign to Exclude Words from Search

If you want to exclude (eliminate) certain words from your search, you can use the minus sign. For example, searching hat matrix in regression -outlier will return pages related to the hat matrix in regression that do not contain the word “outlier”.

Google Search Trick 4: Tilde symbol (~) to Search for Similar Words

The tilde symbol used in the phrase will search for a word and all its synonyms. For example, ~Cross Table will result in crosstable, crosstabulation, cross-table, and cross-table query, etc.

Google Search Tricks and Tips 5: OR Operator for Multiple Words Searching

The OR operator searches for pages that include either the word before or the word after it. For example, residuals OR error will return pages containing either “residuals” or “error”.

Google Search Trick 6: Numerical Range

A numerical range, written with two dots, returns pages containing numbers within that range. For example, “History of Statistics 2000..2019”.

Google Search Tricks and Tips 7: Finding the Meanings of Word or Phrase

The define keyword gives the meaning of a word or phrase. For example, define: statistics, define: residuals, and define: goodness of fit test.

Google Search Trick 8: Search a Particular Website

The site: function restricts the search to a particular website or domain. For example, Learning statistics site:edu will return only pages found on .edu websites. Other examples are Learning statistics site:com and Learning statistics site:itfeature.com, etc.


Google Search Trick 9: Search Webpages Linked to a Particular Website

The link: function searches for web pages that link to a particular website. For example, link: itfeature.com, link: stat.bzu.edu.pk

Google Search Trick 10: Math Answers

Google performs basic math operations, for example, 4*7, 30% of 55, 20^2, sqrt(4), exp(4), log(10), cos(90), etc.

Google Search Trick 11: Unit Conversion

Google also converts units of measure. For example, 5 cm in feet, 100 USD in PKR, 42 days in fortnights, 10 mph in the speed of light, 100 miles in leagues, and 100 km in miles, etc.

Google Search Trick 12: Compare using “vs”

A one-by-one comparison can be searched using the “vs” keyword. For example, statistics vs parameters, descriptive vs inferential statistics, AIC vs BIC, and statistics vs mathematics.



Multicollinearity Introduction Explained Easy (2019)

For a classical linear regression model with multiple regressors (explanatory variables), there should be no exact linear relationship between the explanatory variables. The term collinearity (or multicollinearity) is used when one or more linear relationships exist among the variables.

Multicollinearity Term

The term multicollinearity refers to the violation of the assumption of “no exact linear relationship between the regressors”.

Ragnar Frisch introduced this term; originally, it meant the existence of a “perfect” or “exact” linear relationship among some or all regressors of a regression model.

Consider a $k$-variable regression model involving explanatory variables $X_1, X_2, \cdots, X_k$. An exact linear relationship is said to exist if the following condition is satisfied.

\[\lambda_1 X_1 + \lambda_2  X_2 + \cdots + \lambda_k X_k=0,\]

where $\lambda_1, \lambda_2, \cdots, \lambda_k$ are constants that are not all zero simultaneously, and $X_1=1$ for all observations (the intercept term).

Nowadays, the multicollinearity term is used not only for the case of perfect multicollinearity but also for imperfect collinearity (the case where the $X$ variables are intercorrelated, but not perfectly). Therefore,

\[\lambda_1X_1 + \lambda_2X_2 + \cdots + \lambda_kX_k + \upsilon_i = 0,\]

where $\upsilon_i$ is a stochastic error term.


In the case of a perfect linear relationship among explanatory variables (the correlation coefficient will be one in this case), the parameters become indeterminate (it is impossible to obtain separate estimates of each parameter) and the method of least squares breaks down. However, if the regressors are not intercorrelated at all, the variables are called orthogonal and there is no problem concerning the estimation of coefficients.

Note that

  • Multicollinearity is not a condition that either exists or does not exist, but rather a phenomenon inherent in most relationships.
  • Multicollinearity refers to only a linear relationship among the $X$ variables. It does not rule out the non-linear relationships among them.
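The breakdown under perfect collinearity can be seen numerically: when one regressor is an exact multiple of another, $X'X$ is singular and the normal equations have no unique solution. A sketch in numpy (the factor of 3 and the variable names are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20
x1 = rng.normal(size=n)
x2 = 3 * x1                        # exact linear relationship: 3*x1 - x2 = 0

X = np.column_stack([np.ones(n), x1, x2])
XtX = X.T @ X

# X'X is rank-deficient: 3 columns but rank 2, so it cannot be inverted.
print(np.linalg.matrix_rank(XtX))
# An enormous condition number is the usual numerical flag for the problem.
print(np.linalg.cond(XtX))
```

With imperfect collinearity the rank is technically full, but the condition number is still very large, and coefficient estimates become unstable rather than strictly indeterminate.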

See the use of the mctest R package for diagnosing collinearity.