# Non-Central Chi-Square Distribution

The non-central chi-square distribution is a generalization of the (central) chi-square distribution.
If $y_{1} ,y_{2} ,\cdots ,y_{n} \sim N(0,1)$ independently, then $y_{i}^{2} \sim \chi _{(1)}^{2}$ and $\sum y_{i}^{2} \sim \chi _{(n)}^{2}$.

If the means ($\mu_i$) are non-zero, i.e. $y_{i} \sim N(\mu _{i} ,1)$ so that each $y_{i}$ may have a different mean, then
\begin{align*}
\Rightarrow  & \qquad y_i^2 \sim \chi^2_{(1,\frac{\mu_i^2}{2})} \\
\Rightarrow  & \qquad \sum y_i^2 \sim \chi^2_{(n,\frac{\sum \mu_i^2}{2})} =\chi _{(n,\lambda )}^{2}
\end{align*}

Note that if $\lambda =0$ we recover the central $\chi ^{2}$ distribution. If $\lambda \ne 0$ the distribution is non-central chi-square, because the underlying normal variables are not centered at zero (they are not standard normal).

The central chi-square distribution has pdf $f(x)=\frac{1}{2^{\frac{n}{2}} \Gamma\left(\frac{n}{2}\right)} x^{\frac{n}{2} -1} e^{-\frac{x}{2} }; \qquad 0<x<\infty$
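As a quick numerical sanity check, the sketch below (a minimal example; the choice $n=4$ and the use of `scipy` are assumptions, not part of the text) verifies that this density integrates to 1:

```python
import math
from scipy.integrate import quad

# Central chi-square pdf exactly as written above; n is the degrees of freedom.
def chi2_pdf(x, n):
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

# The density should integrate to 1 over (0, infinity); n = 4 is arbitrary.
area, _ = quad(chi2_pdf, 0, math.inf, args=(4,))
print(area)  # ~1.0
```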

## Theorem:

If $y_{1} ,y_{2} ,\cdots ,y_{n}$ are independent normal random variables with $E(y_{i} )=\mu _{i}$ and $V(y_{i} )=1$, then $w=\sum y_{i}^{2}$ is distributed as non-central chi-square with $n$ degrees of freedom and non-centrality parameter $\lambda$, where $\lambda =\frac{\sum \mu _{i}^{2} }{2}$, and has pdf

\begin{align*}
f(w)=e^{-\lambda } \sum _{i=0}^{\infty }\left[\frac{\lambda ^{i} w^{\frac{n+2i}{2} -1} e^{-\frac{w}{2} } }{i!\, 2^{\frac{n+2i}{2} } \Gamma\left(\frac{n+2i}{2}\right)} \right]\qquad 0\le w<\infty
\end{align*}

## Proof:

Consider the moment generating function of $w=\sum y_{i}^{2}$

\begin{align*}
M_{w} (t)=E(e^{wt} )=E(e^{t\sum y_{i}^{2}  } ); \qquad \text{ where } y_{i} \sim N(\mu _{i} ,1)
\end{align*}

By definition
\begin{align*}
M_{w} (t) &= \int \cdots \int e^{t\sum y_{i}^{2} }\, f(y_{1} ,\cdots ,y_{n})\, dy_{1}\, dy_{2} \cdots dy_{n} \\
&= K_{1} \int \cdots \int e^{-\frac{1}{2} (1-2t)\left[\sum y_{i}^{2} -\frac{2\sum y_{i} \mu _{i} }{1-2t} \right]}\, dy_{1}\, dy_{2} \cdots dy_{n}; \qquad K_{1} =\left(\frac{1}{\sqrt{2\pi}}\right)^{n} e^{-\frac{\sum \mu _{i}^{2}}{2}} \\
&\text{By completing the square,}\\
& =K_{1} \int \cdots \int e^{-\frac{1}{2} (1-2t)\sum \left[\left(y_{i} -\frac{\mu _{i} }{1-2t} \right)^{2} -\frac{\mu _{i}^{2} }{(1-2t)^{2} } \right]}\, dy_{1}\, dy_{2} \cdots dy_{n} \\
&= e^{-\frac{\sum \mu _{i}^{2} }{2} \left(1-\frac{1}{1-2t} \right)} \frac{1}{\left(\sqrt{1-2t} \right)^{n} } \int \cdots \int \left(\frac{1}{\sqrt{2\pi } } \right)^{n} \frac{1}{\left(\sqrt{\frac{1}{1-2t}} \right)^{n} }\, e^{-\frac{1}{2\cdot \frac{1}{1-2t} } \sum \left(y_{i} -\frac{\mu _{i} }{1-2t} \right)^{2} }\, dy_{1}\, dy_{2} \cdots dy_{n}
\end{align*}

where

$\int _{-\infty }^{\infty } \cdots \int _{-\infty }^{\infty }\left(\frac{1}{\sqrt{2\pi}} \right)^{n} \frac{1}{\left(\sqrt{\frac{1}{1-2t}} \right)^{n} }\, e^{-\frac{1}{2\cdot \frac{1}{1-2t} } \sum \left(y_{i} -\frac{\mu _{i} }{1-2t} \right)^{2} }\, dy_{1}\, dy_{2} \cdots dy_{n}$

is the integral of a complete normal density (each $y_i$ with mean $\frac{\mu_i}{1-2t}$ and variance $\frac{1}{1-2t}$), and therefore equals 1. Hence

\begin{align*}
M_{w}(t)&=e^{-\frac{\sum \mu_i^2}{2} \left(1-\frac{1}{1-2t}\right)} \left(\frac{1}{\sqrt{1-2t} } \right)^{n} \\
&=\left(\frac{1}{\sqrt{1-2t}}\right)^{n} e^{-\lambda \left(1-\frac{1}{1-2t} \right)} \\
&=e^{-\lambda } e^{\frac{\lambda}{1-2t}} \frac{1}{(1-2t)^{\frac{n}{2}}}\\
&\text{Expanding } e^{\frac{\lambda }{1-2t}} \text{ as a power series,}\\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!\,(1-2t)^{i} (1-2t)^{\frac{n}{2}} }\\
M_{w=\sum y_{i}^{2} } (t)&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!\,(1-2t)^{\frac{n+2i}{2} } }\tag{A}
\end{align*}
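Result (A) can be checked by simulation. The sketch below (a minimal example; the means in `mu` are arbitrary assumptions) compares a Monte Carlo estimate of $E(e^{tw})$ with the closed form $e^{-\lambda }e^{\frac{\lambda }{1-2t}}(1-2t)^{-n/2}$ for a few values of $t<\frac{1}{2}$:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -0.5, 2.0])          # arbitrary means mu_i
n, lam = len(mu), np.sum(mu**2) / 2      # df and lambda = sum(mu_i^2)/2
# 500k draws of w = sum of squared N(mu_i, 1) variables
w = (rng.normal(mu, 1.0, size=(500_000, n)) ** 2).sum(axis=1)

for t in (0.05, 0.10, 0.15):             # need t < 1/2 for the MGF to exist
    simulated = np.mean(np.exp(t * w))
    closed = np.exp(-lam) * np.exp(lam / (1 - 2 * t)) / (1 - 2 * t) ** (n / 2)
    print(f"t={t}: simulated={simulated:.4f}, closed form={closed:.4f}")
```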

Now the moment generating function (MGF) of the non-central density claimed in the theorem is
\begin{align*}
M_{\omega} (t) & = E(e^{\omega t} )\\
&=\int _{0}^{\infty }e^{\omega t}\, e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} \omega ^{\frac{n+2i}{2} -1} e^{-\frac{\omega }{2} } }{i!\,2^{\frac{n+2i}{2} } \Gamma\left(\frac{n+2i}{2}\right)} \,d\omega\\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!\,2^{\frac{n+2i}{2} } \Gamma\left(\frac{n+2i}{2}\right)}  \int _{0}^{\infty }e^{-\frac{\omega }{2} (1-2t)}  \omega ^{\frac{n+2i}{2} -1}\, d\omega
\end{align*}
Let
\begin{align*}
\frac{\omega }{2} (1-2t)&=P\\
\Rightarrow \omega & =\frac{2P}{1-2t} \\
\Rightarrow d\omega &=\frac{2\,dP}{1-2t}
\end{align*}

\begin{align*}
&=e^{-\lambda } \sum\limits_{i=0}^{\infty }\frac{\lambda ^{i} }{i!\,2^{\frac{n+2i}{2} } \Gamma\left(\frac{n+2i}{2}\right)}  \int _{0}^{\infty }e^{-P} \left(\frac{2P}{1-2t} \right)^{\frac{n+2i}{2} -1} \frac{2\,dP}{1-2t}  \\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i}\, 2^{\frac{n+2i}{2} } }{i!\,2^{\frac{n+2i}{2} } \Gamma\left(\frac{n+2i}{2}\right) (1-2t)^{\frac{n+2i}{2} } } \int _{0}^{\infty }e^{-P} P^{\frac{n+2i}{2} -1}  dP \\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!\,\Gamma\left(\frac{n+2i}{2}\right) (1-2t)^{\frac{n+2i}{2} } } \Gamma\left(\frac{n+2i}{2}\right)
\end{align*}

since $\int\limits _{0}^{\infty }e^{-P} P^{\frac{n+2i}{2} -1} dP=\Gamma\left(\frac{n+2i}{2}\right)$ (the gamma function).

\begin{align*}
M_{\omega } (t)=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!\,(1-2t)^{\frac{n+2i}{2} } } \tag{B}
\end{align*}

Comparing (A) and (B),
$M_{w=\sum y_{i}^{2} } (t)=M_{\omega } (t)$

By the uniqueness theorem of moment generating functions,

$f_{w} (w)=f_{\omega } (\omega )$
\begin{align*}
\Rightarrow \qquad f_{w} (w)&=f(\chi ^{2} )\\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} w^{\frac{n+2i}{2} -1} e^{-\frac{w}{2} } }{i!\,2^{\frac{n+2i}{2} } \Gamma\left(\frac{n+2i}{2}\right)};  \qquad 0\le w<\infty
\end{align*}
is the pdf of the non-central chi-square distribution with $n$ degrees of freedom and non-centrality parameter $\lambda =\frac{\sum \mu _{i}^{2} }{2}$. Like the central chi-square, the non-central chi-square distribution is additive: a sum of independent non-central chi-squares is again non-central chi-square, with both the degrees of freedom and the non-centrality parameters adding.
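To see the theorem in action, the following sketch (a minimal example; the means in `mu` are arbitrary assumptions, and the series is evaluated in log-space to avoid overflow in the gamma terms) compares the series pdf above with a histogram of simulated sums of squared normals:

```python
import math
import numpy as np

def ncx2_pdf(w, n, lam, terms=100):
    """Series pdf from the theorem, evaluated term-by-term in log-space."""
    total = 0.0
    for i in range(terms):
        k = (n + 2 * i) / 2
        log_term = (i * math.log(lam) - math.lgamma(i + 1)   # lambda^i / i!
                    + (k - 1) * math.log(w) - w / 2          # w^(k-1) e^(-w/2)
                    - k * math.log(2) - math.lgamma(k))      # 2^k Gamma(k)
        total += math.exp(log_term)
    return math.exp(-lam) * total

rng = np.random.default_rng(1)
mu = np.array([0.5, 1.0, 1.5])                 # arbitrary non-zero means
n, lam = len(mu), np.sum(mu**2) / 2
w = (rng.normal(mu, 1.0, size=(200_000, n)) ** 2).sum(axis=1)

hist, edges = np.histogram(w, bins=60, density=True)
mids = (edges[:-1] + edges[1:]) / 2
series = np.array([ncx2_pdf(m, n, lam) for m in mids])
print("max |histogram - series|:", np.abs(hist - series).max())  # sampling noise
print("simulated mean:", w.mean(), " theoretical n + 2*lambda:", n + 2 * lam)
```

The last line uses the fact that $E(w)=n+\sum \mu_i^2=n+2\lambda$.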

# Consequences of Heteroscedasticity

When heteroscedasticity is present in the data, estimates based on Ordinary Least Squares (OLS) are subject to the following consequences:

1. The usual formula for the variance of the coefficients no longer applies, so tests of significance and confidence intervals based on it are invalid.
2. If the error term ($\mu_i$) is heteroscedastic, the OLS estimates do not have the minimum variance property in the class of linear unbiased estimators, i.e. they are inefficient in small samples. Furthermore, they are asymptotically inefficient as well.
3. The estimated coefficients remain unbiased: the unbiasedness property of OLS estimation is not violated by the presence of heteroscedasticity.
4. Forecasts based on a model with heteroscedasticity will be less efficient, as OLS estimation yields higher values of the variance of the estimated coefficients.

All this means the standard errors will typically be underestimated, so the t-statistics and F-statistics will be inaccurate. Heteroscedasticity can be caused by a number of factors, but the main cause is variables whose values differ substantially across observations. For instance, a regression involving GDP will suffer from heteroscedasticity if we include both large countries such as the USA and small countries such as Cuba; in this case it may be better to use GDP per person. Also note that heteroscedasticity tends to affect cross-sectional data more than time series data.
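A small Monte Carlo sketch illustrates the point about misleading standard errors (the data-generating process below, with error standard deviation proportional to $x^2$, is an assumption chosen for illustration): in this setup the usual OLS formula for $se(\hat{\beta})$ understates the true sampling variability of $\hat{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 5000
alpha, beta = 1.0, 2.0
x = np.linspace(1, 10, n)
Sxx = np.sum((x - x.mean()) ** 2)

b_hats, reported_se = np.empty(reps), np.empty(reps)
for r in range(reps):
    e = rng.normal(0.0, 0.2 * x**2)                # heteroscedastic: sd grows with x
    y = alpha + beta * x + e
    b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    resid = y - (y.mean() - b * x.mean()) - b * x
    s2 = np.sum(resid**2) / (n - 2)                # usual OLS estimate of sigma^2
    b_hats[r], reported_se[r] = b, np.sqrt(s2 / Sxx)

print("true sd of beta_hat:", b_hats.std())        # actual sampling variability
print("avg OLS-reported se:", reported_se.mean())  # smaller here: misleading
```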

Consider the simple linear regression model (SLRM), with $x_i$ and $y_i$ measured in deviation (mean-corrected) form.

The OLS estimates of $\beta$ and $\alpha$ are

\begin{align*}
\hat{\beta}&=\frac{\sum x_i y_i}{\sum x_i^2}=\frac{\sum x_i (\beta x_i +\epsilon_i)}{\sum x_i^2}\\
&=\beta\frac{\sum x_i^2}{\sum x_i^2}+\frac{\sum x_i \epsilon_i}{\sum x_i^2}\\
&=\beta + \frac{\sum x_i \epsilon_i}{\sum x_i^2}
\end{align*}

Applying expectation on both sides (treating the $x_i$ as fixed), we get:

$E(\hat{\beta})=\beta+\frac{\sum E(x_i \epsilon_i)}{\sum x_i^2}=\beta, \qquad \text{since } E(\epsilon_i x_i)=0$

Similarly

\begin{align*}\hat{\alpha}&=\overline{y}-\hat{\beta}\overline{X}\\
&=\alpha+\beta\overline{X}+\overline{\epsilon}-\hat{\beta}\overline{X}\\
\Rightarrow E(\hat{\alpha})&=\alpha+\beta\overline{X}+0-\overline{X}\beta=\alpha
\end{align*}
since $E(\overline{\epsilon})=0$ and $E(\hat{\beta})=\beta$.

Hence, the unbiasedness property of OLS estimation is not affected by heteroscedasticity.
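The same point can be seen numerically. In this sketch (all numbers are arbitrary assumptions) the errors are strongly heteroscedastic, yet the averages of $\hat{\beta}$ and $\hat{\alpha}$ over many simulated samples stay at the true values:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 40, 20_000
alpha, beta = 3.0, 1.5
x = np.linspace(0, 5, n)
Sxx = np.sum((x - x.mean()) ** 2)

a_hat, b_hat = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = alpha + beta * x + rng.normal(0.0, 1.0 + x)   # error sd rises with x
    b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    a_hat[r], b_hat[r] = y.mean() - b * x.mean(), b

print("mean of beta_hat :", b_hat.mean())   # close to beta = 1.5
print("mean of alpha_hat:", a_hat.mean())   # close to alpha = 3.0
```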

# Assumptions about Linear Regression Models or Error Term

The linear regression model (LRM) is based on certain statistical assumptions: some relate to the distribution of the random error term $\mu_i$, some to the relationship between the error term $\mu_i$ and the explanatory variables (independent variables, $X$'s), and some to the independent variables themselves. We can divide the assumptions into two categories:

1. Stochastic assumptions
2. Non-stochastic assumptions

These assumptions about linear regression models (or the ordinary least squares method, OLS) are extremely critical to the interpretation of the regression coefficients; a small numerical sketch after the list below illustrates how a few of them can be checked from sample residuals.

• The error term ($\mu_i$) is a random real number, i.e. $\mu_i$ may assume any positive, negative or zero value by chance. Each value has a certain probability; therefore the error term is a random variable.
• The mean value of $\mu$ is zero, i.e. $E(\mu_i)=0$: the mean value of $\mu_i$, conditional on the given $X_i$, is zero. For each value of the variable $X_i$, $\mu$ may take various values, some greater than zero and some smaller than zero. Considering all possible values of $\mu$ for any particular value of $X$, the disturbance term $\mu_i$ has mean zero.
• The variance of $\mu_i$ is constant (homoscedasticity), i.e. for the given values of $X$, the variance of $\mu_i$ is the same for all observations: $E(\mu_i^2)=\sigma^2$. The disturbance term ($\mu_i$) shows the same dispersion about its mean at all values of $X$.
• The variable $\mu_i$ has a normal distribution, i.e. $\mu_i\sim N(0,\sigma_{\mu}^2)$. The values of $\mu$ (for each $X_i$) have a bell-shaped symmetrical distribution.
• The random terms of different observations ($\mu_i,\mu_j$) are independent, i.e. $E(\mu_i \mu_j)=0$ for $i\ne j$: there is no autocorrelation between the disturbances. The random term assumed in one period does not depend on its values in any other period.
• $\mu_i$ and $X_i$ have zero covariance, i.e. $\mu$ is independent of the explanatory variable: $E(\mu_i X_i)=0$, i.e. $Cov(\mu_i, X_i)=0$. The disturbance term $\mu$ and the explanatory variable $X$ are uncorrelated; the $\mu$'s and $X$'s do not tend to vary together, as their covariance is zero. This assumption is automatically fulfilled if the $X$ variable is non-random (non-stochastic) and the mean of the random term is zero.
• All the explanatory variables are measured without error. We assume the regressors are error-free, while $y$ (the dependent variable) may or may not include errors of measurement.
• The number of observations $n$ must be greater than the number of parameters to be estimated, or alternatively the number of observations must be greater than the number of explanatory (independent) variables.
• There should be variability in the $X$ values: the $X$ values in a given sample must not all be the same. Statistically, $Var(X)$ must be a finite positive number.
• The regression model must be correctly specified, meaning that there is no specification bias or error in the model used in empirical analysis.
• There is no perfect (or near-perfect) multicollinearity or collinearity among two or more explanatory (independent) variables.
• Values taken by the regressor $X$ are considered fixed in repeated sampling, i.e. $X$ is assumed to be non-stochastic. Regression analysis is conditional on the given values of the regressor(s) $X$.
• The linear regression model is linear in the parameters, e.g. $y_i=\beta_1+\beta_2x_i +\mu_i$.
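As a rough illustration, the sketch below (simulated data; all numbers are assumptions) fits a simple regression and computes a few crude residual diagnostics. Note that the residual mean is zero by construction when an intercept is included; the split-sample variances and the lag-1 correlation are informal checks of homoscedasticity and independence.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.8 * x + rng.normal(0, 1, 100)   # data satisfying the assumptions

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - b0 - b1 * x

print("mean residual (0 by construction):", resid.mean())
lo, hi = resid[x < np.median(x)], resid[x >= np.median(x)]
print("residual variance, low x / high x:", lo.var(ddof=1), hi.var(ddof=1))
print("lag-1 residual correlation       :", np.corrcoef(resid[:-1], resid[1:])[0, 1])
```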

# Simple Linear Regression Model (SLRM)

A simple linear regression is based on a single independent (explanatory) variable, and it fits a straight line such that the sum of squared residuals of the regression model (the vertical distances between the fitted line and the points of the data set) is as small as possible. This model (usually known as a statistical or probabilistic model) can be written as

\begin{align*}
y_i &= \alpha + \beta x_i +\varepsilon_i\\
\text{OR} \quad y_i&=b_0 + b_1 x_i + \varepsilon_i\\
\text{OR} \quad y_i&=\beta_0 + \beta_1 x_i + \varepsilon_i
\end{align*}
where $y$ is the dependent variable and $x$ is the independent variable. In the regression context, $y$ is called the regressand and $x$ is called the regressor. The epsilon ($\varepsilon$) is unobservable, denoting the random error (or disturbance term) of the regression model. The random error $\varepsilon$ has some specific importance for its inclusion in the regression model:

1. Random error ($\varepsilon$) captures the effect on the dependent variable of all variables not included in the model under study; such omitted variables may or may not be observable.
2. Random error ($\varepsilon$) captures any specification error related to the assumed linear functional form.
3. Random error ($\varepsilon$) captures the effect of the unpredictable random component present in the dependent variable.

We can say that $\varepsilon$ represents the variation in $y$ left unexplained by the independent variable $x$ included in the model.

In the above model, $\beta_0$ and $\beta_1$ are the parameters, and our main objective is to obtain estimates of their numerical values, i.e. $\hat{\beta}_0$ and $\hat{\beta}_1$. Here $\beta_0$ is the intercept (regression constant), and the fitted line passes through ($\overline{x}, \overline{y}$), the center of mass of the data points; $\beta_1$ is the slope (regression coefficient) of the model, and the slope equals the correlation between $x$ and $y$ multiplied by the ratio of the standard deviations of these variables, $\hat{\beta}_1 = r\,\frac{s_y}{s_x}$. The subscript $i$ denotes the $i$th value of a variable in the model.
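The two claims just made can be verified directly. In this sketch (the six data points are made up), the slope from the least-squares formula equals $r\,s_y/s_x$, and the fitted line passes through $(\overline{x}, \overline{y})$:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])   # made-up sample data

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

r = np.corrcoef(x, y)[0, 1]
print("slope b1        :", b1)
print("r * s_y / s_x   :", r * y.std(ddof=1) / x.std(ddof=1))        # same value
print("b0 + b1 * x_bar :", b0 + b1 * x.mean(), "  y_bar:", y.mean())  # equal
```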
Consider now the purely mathematical form of the model:
$y=\beta_0 + \beta_1 x$
This model is called a mathematical (deterministic) model, as all the variation in $y$ is due solely to changes in $x$ and there are no other factors affecting the dependent variable. If this were true, all the pairs $(x, y)$ would fall on a straight line when plotted on a two-dimensional plane. For observed values, however, the plot may or may not be a straight line. A two-dimensional diagram with points plotted in pairs is called a scatter diagram.

# Range: Measure of Dispersion

A measure of central tendency provides a typical value of the data set, but it does not tell the whole story about the data: the mean, median, and mode describe the center of the data, but these measures tell nothing about its spread. So for more information about the data we need some other measure, such as a measure of dispersion (or spread).

The spread of data can be measured by calculating the range of the data; the range tells us over how many values the data extend. The range (an absolute measure of dispersion) is found by subtracting the smallest value in the data (called the lower bound) from the largest value (called the upper bound), i.e.

Range = Upper Bound – Lower Bound
OR
Range = Largest Value – Smallest Value

This measure of dispersion has disadvantages: the range describes only the width of the data set, measured in the same units as the data, and gives no real picture of how the data are distributed within that width. If the data contain outliers, using the range to describe the spread can be very misleading, as the range is sensitive to outliers. We therefore need to be careful in using the range: it does not tell what is going on between the highest and lowest values, and because it is based only on the two extreme values it may give a misleading picture of the spread of the data. It is therefore an unsatisfactory measure of dispersion.

However, the range is widely used in applications such as statistical process control (control charts for manufactured products), daily temperatures, stock prices, etc., as it is very easy to calculate. It is an absolute measure of dispersion; its relative measure, known as the coefficient of dispersion, is defined by the relation

$\text{Coefficient of Dispersion} = \frac{x_m-x_0}{x_m+x_0}$

The coefficient of dispersion is a pure (dimensionless) number and is used for comparison purposes.
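A tiny example (with made-up numbers) showing both measures:

```python
# Range and coefficient of dispersion for a small made-up data set,
# where x_m is the largest value and x_0 the smallest.
data = [12, 7, 23, 15, 9, 30, 11]
x_m, x_0 = max(data), min(data)

print("Range                    :", x_m - x_0)                  # 30 - 7 = 23
print("Coefficient of Dispersion:", (x_m - x_0) / (x_m + x_0))  # 23/37 ≈ 0.62
```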