Linear Regression Model Assumptions
The linear regression model (LRM) is based on certain statistical assumptions: some concern the distribution of the random error term $u_i$, some concern the relationship between the error term $u_i$ and the explanatory variables (independent variables, the $X$'s), and some concern the independent variables themselves. The linear regression model assumptions can be classified into two categories:
- Stochastic Assumptions
- Non-Stochastic Assumptions
These linear regression model assumptions (or assumptions about the ordinary least squares (OLS) method) are critical to the interpretation of the regression coefficients.
- The error term ($u_i$) is a random real number, i.e., $u_i$ may assume any positive, negative, or zero value by chance. Each value occurs with a certain probability; therefore, the error term is a random variable.
- The mean value of $u_i$ is zero, i.e., $E(u_i \mid X_i)=0$: the mean value of $u_i$, conditional on the given $X_i$, is zero. For each value of the variable $X_i$, $u$ may take various values, some greater than zero and some smaller than zero. Averaging over all possible values of $u$ for any particular value of $X$, the disturbance term $u_i$ has a zero mean.
- The variance of $u_i$ is constant (homoscedasticity), i.e., for each given value of $X$, the variance of $u_i$ is the same for all observations: $E(u_i^2)=\sigma^2$. The values of the disturbance term $u_i$ show the same dispersion about their mean at all values of $X$.
- The variable $u_i$ has a normal distribution, i.e., $u_i \sim N(0, \sigma_u^2)$. The values of $u$ (for each $X_i$) have a bell-shaped symmetrical distribution.
- The random terms of different observations ($u_i, u_j$) are independent, i.e., $E(u_i u_j)=0$ for $i \neq j$; that is, there is no autocorrelation between the disturbances. The value of the random term in one period does not depend on its value in any other period. (A residual-diagnostics sketch covering this and the three preceding assumptions appears after this list.)
- $u_i$ and $X_i$ have zero covariance, i.e., $u$ is independent of the explanatory variable: $E(u_i X_i)=0$, i.e., $Cov(u_i, X_i)=0$. The disturbance term $u$ and the explanatory variable $X$ are uncorrelated; the $u$'s and $X$'s do not tend to vary together, as their covariance is zero. This assumption is automatically fulfilled if the $X$ variable is non-random (non-stochastic) and the mean of the random term is zero.
- All the explanatory variables are measured without error; that is, the regressors are assumed to be error-free, while the dependent variable $y$ may or may not include measurement errors.
- The number of observations $n$ must be greater than the number of parameters to be estimated, or, put differently, the number of observations must be greater than the number of explanatory (independent) variables.
- There should be variability in the $X$ values; that is, the $X$ values in a given sample must not all be the same. Statistically, $Var(X)$ must be a finite positive number.
- The regression model must be correctly specified, meaning there is no specification bias or error in the model used in empirical analysis.
- No perfect or near-perfect multicollinearity (collinearity) exists among two or more explanatory (independent) variables. (A VIF-based check is sketched after this list.)
- The values taken by the regressor $X$ are considered fixed in repeated sampling, i.e., $X$ is assumed to be non-stochastic. Regression analysis is conditional on the given values of the regressor(s) $X$. (The repeated-sampling sketch after this list illustrates this setting.)
- The linear regression model is linear in the parameters, e.g., $y_i=\beta_1+\beta_2 x_i + u_i$.
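
Because the disturbances $u_i$ are unobservable, these assumptions are usually examined through the OLS residuals. The sketch below is a minimal Python illustration on simulated data; the model, sample size, and parameter values are hypothetical choices for the example, not taken from the text. It checks the zero-mean, constant-variance, normality, and no-autocorrelation assumptions:

```python
# A minimal sketch of checking the error-term assumptions on OLS residuals.
# All data are simulated for illustration; numpy and scipy are assumed available.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulate a model that satisfies the assumptions:
# y_i = beta1 + beta2 * x_i + u_i,  u_i ~ N(0, sigma^2), independent.
n = 200
x = np.linspace(1, 10, n)            # fixed, non-stochastic X with variability
u = rng.normal(loc=0.0, scale=2.0, size=n)
y = 3.0 + 1.5 * x + u

# Fit OLS via least squares and compute residuals.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# E(u_i) = 0: the residual mean is zero up to rounding when an intercept is included.
print("residual mean:", resid.mean())

# Homoscedasticity: compare residual variance in the low-X and high-X halves.
half = n // 2
print("var (low X): ", resid[:half].var(ddof=1))
print("var (high X):", resid[half:].var(ddof=1))

# Normality: Shapiro-Wilk test on the residuals.
stat, pval = stats.shapiro(resid)
print("Shapiro-Wilk p-value:", pval)

# No autocorrelation: Durbin-Watson statistic (values near 2 suggest
# independent disturbances).
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print("Durbin-Watson:", dw)
```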
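
The "fixed in repeated sampling" assumption can be made concrete by holding the $X$ values constant and redrawing only the disturbances. The small simulation below (again with hypothetical parameter values) shows that, under the assumptions above, the OLS estimates average out to the true parameters across repeated samples:

```python
# A hedged illustration of "fixed X in repeated sampling": the same x values
# are reused in every replication, and only the disturbances u are redrawn.
import numpy as np

rng = np.random.default_rng(7)
n, reps = 50, 5000
x = np.linspace(0, 5, n)                 # the same fixed X in every sample
X = np.column_stack([np.ones(n), x])
beta_true = np.array([3.0, 1.5])         # hypothetical beta1, beta2

estimates = np.empty((reps, 2))
for r in range(reps):
    u = rng.normal(0.0, 2.0, size=n)     # fresh disturbances each replication
    y = X @ beta_true + u
    estimates[r], *_ = np.linalg.lstsq(X, y, rcond=None)

# With E(u_i) = 0 and fixed X, OLS is unbiased: the average of the estimates
# across replications is close to the true parameters.
print("mean of estimates:", estimates.mean(axis=0))
print("true parameters:  ", beta_true)
```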
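
For the no-multicollinearity assumption, a common diagnostic is the variance inflation factor (VIF), obtained by regressing each explanatory variable on the others; values well above 10 are often read as a warning sign. Below is a minimal numpy-only sketch, where the two correlated regressors are deliberately simulated (hypothetical data, not from the text):

```python
# A minimal sketch of a variance-inflation-factor (VIF) check for
# multicollinearity among the regressors; the data are simulated.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # deliberately near-collinear with x1
x3 = rng.normal(size=n)                     # unrelated regressor

def vif(target, others):
    """VIF = 1 / (1 - R^2) from regressing `target` on the other regressors."""
    Z = np.column_stack([np.ones(len(target))] + others)
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    fitted = Z @ coef
    ss_res = np.sum((target - fitted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 / (1.0 - r2)

print("VIF x1:", vif(x1, [x2, x3]))   # large: x1 and x2 move together
print("VIF x2:", vif(x2, [x1, x3]))   # large, for the same reason
print("VIF x3:", vif(x3, [x1, x2]))   # near 1: no collinearity problem
```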