Multicollinearity Introduction Explained Easy (2019)

For a classical linear regression model with multiple regressors (explanatory variables), there should be no exact linear relationship between the explanatory variables. The collinearity or multicollinearity term is used if there is/are one or more linear relationship exists among the variables.

Multicollinearity Term

The term multicollinearity is considered as the violation of the assumption of “no exact linear relationship between the regressors.

Ragnar Frisch introduced this term, originally it means the existence of a “perfect” or “exact” linear relationship among some or all regressors of a regression model.

Consider a $k$-variable regression model involving explanatory variables $X_1, X_2, \cdots, X_k$. An exact linear relationship is said to exist if the following condition is satisfied.

\[\lambda_1 X_1 + \lambda_2  X_2 + \cdots + \lambda_k X_k=0,\]

where $\lambda_1, \lambda_2, \cdots, \lambda_k$ are constant and all of them are non-zero, simultaneously, and $X_1=1$ for all observations for intercept term.

Nowadays, the multicollinearity term is not only being used for the case of perfect multicollinearity but also in the case of not perfect collinearity (the case where the $X$ variables are intercorrelated but not perfectly). Therefore,

\[\lambda_1X_1 + \lambda_2X_2 + \cdots \lambda_kX_k + \upsilon_i,\]

where $\upsilon_i$ is a stochastic error term.


In the case of a perfect linear relationship (correlation coefficient will be one in this case) among explanatory variables, the parameters become indeterminate (it is impossible to obtain values for each parameter separately) and the method of least square breaks down. However, if regressors are not intercorrelated at all, the variables are called orthogonal and there is no problem concerning the estimation of coefficients.

Note that

  • Multicollinearity is not a condition that either exists or does not exist, but rather a phenomenon inherent in most relationships.
  • Multicollinearity refers to only a linear relationship among the $X$ variables. It does not rule out the non-linear relationships among them.

See use of mctest R package for diagnosing collinearity

Leave a Comment

Discover more from Statistics for Data Analyst

Subscribe now to keep reading and get access to the full archive.

Continue reading