# Basic Statistics and Data Analysis

## Non Central Chi Squared Distribution

The Non Central Chi Squared Distribution is a generalization of the Chi Squared Distribution.
If $Y_{1} ,Y_{2} ,\cdots ,Y_{n} \sim N(0,1)$ i.e. $(Y_{i} \sim N(0,1)) \Rightarrow y_{i}^{2} \sim \psi _{i}^{2}$ and $\sum y_{i}^{2} \sim \psi _{(n)}^{2}$

If mean ($\mu$) is non-zero then $y_{i} \sim N(\mu _{i} ,1)$ i.e each $y_{i}$ has different mean
\begin{align*}
\Rightarrow  & \qquad y_i^2 \sim \psi_{1,\frac{\mu_i^2}{2}} \\
\Rightarrow  & \qquad \sum y_i^2 \sim \psi_{(n,\frac{\sum \mu_i^2}{2})} =\psi_{(n,\lambda )}^{2}
\end{align*}

Note that if $\lambda =0$ then we have central $\psi ^{2}$. If $\lambda \ne 0$ then it is non central chi squared distribution because it has no central mean (as distribution is not standard normal).

Central Chi-Square Distribution $f(x)=\frac{1}{2^{\frac{n}{2}} \left|\! {\overline{\frac{n}{2} }} \right. } \chi ^{\frac{n}{2} -1} e^{-\frac{x}{2} }; \qquad 0<x<\infty$

## Theorem:

If $Y_{1} ,Y_{2} ,\cdots ,Y_{n}$ are independent normal random variables with $E(y_{i} )=\mu _{i}$ and $V(y_{i} )=1$ then $w=\sum y_{i}^{2}$ is distributed as non central chi square with $n$ degree of freedom and non-central parameter $\lambda$, where $\lambda =\frac{\sum \mu _{i}^{2} }{2}$ and has pdf

\begin{align*}
f(w)=e^{-\lambda } \sum _{i=0}^{\infty }\left[\frac{\lambda ^{i} w^{\frac{n+2i}{2} -1} e^{-\frac{w}{2} } }{i!\, 2^{\frac{n+2i}{2} } \left|\! {\overline{\frac{n+2i}{2} }}  \right. } \right]\qquad 0\le w\le \infty
\end{align*}

## Proof:

Consider the moment generating function of $w=\sum y_{i}^{2}$

\begin{align*}
M_{w} (t)=E(e^{wt} )=E(e^{t\sum y_{i}^{2}  } ); \qquad \text{ where } y_{i} \sim N(\mu \_{i} ,1)
\end{align*}

By definition
\begin{align*}
M_{w} (t) &= \int \cdots \int e^{t\sum y_{i}^{2} } .f(y_{i} )dy_{i} \\
&= K_{1} \int \cdots \int e^{-\frac{1}{2} (1-2t)\left[\sum y_{i}^{2} -\frac{2\sum y_{i} \mu _{i} }{1-2t} \right]}   dy_{1} .dy_{2} \cdots dy_{n} \\
&\text{By completing square}\\
& =K_{1} \int \cdots \int e^{\frac{1}{2} (1-2t)\sum \left[\left[y_{i} -\frac{\mu _{i} }{1-2t} \right]^{2} -\frac{\mu _{i}^{2} }{(1-2t)^{2} } \right]}   dy_{1} .dy_{2} \cdots dy_{n} \\
&= e^{-\frac{\sum \mu_{i}^{2} }{2} \left(1-\frac{1}{1-2t} \right)} \int \cdots \int \left(\frac{1}{\sqrt{2\pi } } \right)^{n} \frac{\frac{1}{\left(\sqrt{1-2t} \right)^{n} } }{\frac{1}{\left(\sqrt{1-2t} \right)^{n} } }  \, e^{-\frac{1}{2.\frac{1}{1-2t} } .\sum \left(y_{i} -\frac{\mu _{i} }{1-2t} \right)^{2} }  dy_{1} .dy_{2} \cdots dy_{n}\\
&=e^{-\frac{\sum \mu _{i}^{2} }{2} \left(1-\frac{1}{1-2t} \right)} .\frac{1}{\left(\sqrt{1-2t} \right)^{n} } \int \cdots \int \left(\frac{1}{\sqrt{2\pi } } \right)^{n}  \frac{1}{\left(\sqrt{\frac{1} {1-2t}} \right)^n} e^{-\, \frac{1}{2.\frac{1}{1-2t} } .\sum \left(y_{i} -\frac{\mu_i}{1-2t}\right)^{2} } dy_{1} .dy_{2} \cdots dy_{n}\\
\end{align*}

where

$\int_{-\infty}^{\infty } \cdots \int _{-\infty }^{\infty }\left(\frac{1}{\sqrt{2\pi}} \right)^{n} \frac{1}{\left(\frac{1}{1-2t} \right)^{\frac{n}{2}}} e^{-{\frac{1}{2}.\frac{1}{1-2t} }} .\sum \left(y_{i} -\frac{\mu _{i} }{1-2t} \right)^{2} dy_{1} .dy_{2} \cdots dy_{n}$
is integral of complete density

\begin{align*}
M_{w}(t)&=e^{-\frac{\sum \mu_i^2}{2} \left(1-\frac{1}{1-2t}\right)} .\left(\frac{1}{\sqrt{1-2t} } \right)^{n} \\
&=\left(\frac{1}{\sqrt{1-2t}}\right)^{n} e^{-\lambda \left(1-\frac{1}{1-2t} \right)} \\
&=e^{-\lambda }.e^{\frac{\lambda}{1-2t}} \frac{1}{(1-2t)^{\frac{n}{2}}}\\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!(1-2t)^{i} (1-2t)^{n/2} }\\
M_{w=y_{i}^{2} } (t)&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!(1-2t)^{\frac{n+2i}{2} } }\tag{A}
\end{align*}

Now Moment Generating Function (MGF) for non-central distribution for a given density function is
\begin{align*}
M_{\omega} (t) & = E(e^{\omega t} )\\
&=\int _{0}^{\infty }e^{\omega \lambda } e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} \omega ^{\frac{n+2i}{2} -1} e^{-\frac{\omega }{2} } }{i!2^{\frac{n+2i}{2} } \left|\! {\overline{\frac{n+2i}{2} }}  \right. } d\omega\\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!2^{\frac{n+2i}{2} } \left|\! {\overline{\frac{n+2i}{2} }}  \right. }  \int _{0}^{\infty }e^{\frac{\omega }{2} (1-2t)}  \omega ^{\frac{n+2i}{2} -1} d\omega
\end{align*}
Let
\begin{align*}
\frac{\omega }{2} (1-2t)&=P\\
\Rightarrow \omega & =\frac{2P}{1-2t} \\
\Rightarrow d\omega &=\frac{2dp}{1-2t}
\end{align*}

\begin{align*}
&=e^{-\lambda } \sum\limits_{i=0}^{\infty }\frac{\lambda ^{i} }{i!2^{\frac{n+2i}{2} } \left|\! {\overline{\frac{n+2i}{2} }}  \right. }  \int _{0}^{\infty }e^{-P} \left(\frac{2P}{1-2t} \right)^{\frac{n+2i}{2} -1} \frac{2dP}{1-2t}  \\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} 2^{\frac{n+2i}{2} } }{i!2^{\frac{n+2i}{2} } \left|\! {\overline{\frac{n+2i}{2} }}  \right. (1-2t)^{\frac{n+2i}{2} -1} } \int _{0}^{\infty }e^{-P} P^{\frac{n+2i}{2} -1}  dP \\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!\left|\! {\overline{\frac{n+2i}{2} }}  \right. (1-2t)^{\frac{n+2i}{2} } } \left|\! {\overline{\frac{n+2i}{2} }}  \right.
\end{align*}

as $\int\limits _{0}^{\infty }e^{-P} P^{\frac{n+2i}{2} -1} dP=\left|\! {\overline{\frac{n+2i}{2} }} \right.$

$M_{\omega } (t)=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!(1-2t)^{\frac{n+2i}{2} } } \tag{B}$

Comparing ($A$) and ($B$)
$M_{w=\sum y_{i}^{2} } (t)=M_{\omega } (t)$

By Uniqueness theorem

$f_{w} (w)=f_{\omega } (\omega )$
\begin{align*}
\Rightarrow \qquad f_{w} (t)&=f(\psi ^{2} )\\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} w^{\frac{n+2i}{2} -1} e^{-\frac{w}{2} } }{i!2^{\frac{n+2i}{2} } \left|\! {\overline{\frac{n+2i}{2} }}  \right. };  \qquad o\le w\le \infty
\end{align*}
is the pdf of non central chi square with n df and $\lambda =\frac{\sum \mu _{i}^{2} }{2}$ is the non-centrality parameter. Non central chi squared distribution is also Additive as central chi square distribution.

## F Distribution: Ratios of two Independent Estimators

F-distribution is a continuous probability distribution (also known as Snedecor’s F distribution or the Fisher-Snedecor distribution) which is named in honor of R.A. Fisher and George W. Snedecor. This distribution arises frequently as the null distribution of a test statistic (hypothesis testing), used to develop confidence interval and in the analysis of variance for comparison of several population means.

If $s_1^2$ and $s_2^2$ are two unbiased estimates of the population variance σ2 obtained from independent samples of size n1 and n2 respectively from the same normal population, then the mathematically F-ratio is defined as
$F=\frac{s_1^2}{s_2^2}=\frac{\frac{(n_1-1)\frac{s_1^2}{\sigma^2}}{v_1}}{\frac{(n_2-1)\frac{s_2^2}{\sigma^2}}{v_2}}$
where v1=n1-1 and v2=n2-1. Since $\chi_1^2=(n_1-1)\frac{s_1^2}{\sigma^2}$ and $\chi_2^2=(n_2-1)\frac{s_2^2}{\sigma^2}$ are distributed independently as $\chi^2$ with $v_1$ and $v_2$ degree of freedom respectively, we have
$F=\frac{\frac{\chi_1^2}{v_1}}{\frac{\chi_2^2}{v_2}}$

So, F Distribution is the ratio of two independent Chi-square ($\chi^2$) statistics each divided by their respective degree of freedom.

## Properties

•  F distribution takes only non-negative values since the numerator and denominator of the F-ratio are squared quantities.
• The range of F values is from 0 to infinity.
• The shape of the F-curve depends on the parameters v1 and v2 (its nominator and denominator df). It is non-symmetrical and skewed to the right (positive skewed) distribution. It tends to become more and more symmetric when one or both of the parameter values (v1, v2) increases, as shown in the following figure.

F distribution curves

• It is asymptotic. As X values increases, the F-curve approaches the X-axis but never cross it or touch it (a similar behavior to the normal probability distribution).
• F have a unique mode at the value $\tilde{F}=\frac{v_2(v_2-2)}{v_1(v_2+2)},\quad (v_2>2)$ which is always less than unity.
• The mean of F is $\mu=\frac{v_2}{v_2-2},\quad (v_2>2)$
• The variance of F is $\sigma^2=\frac{2v_2^2(v_1+v_2-2)}{v_1(v_2-2)(v_2-4)},\quad (v_2>4)$

### Assumptions of F-distribution

Statistical procedure of comparing the variances of two population have assumptions

• The two population (from which the samples are drawn) follows Normal distribution
• The two samples are random samples drawn independently from their respective populations.

Statistical procedure of comparing three or more populations means have assumptions

• The population follow the Normal distribution
• The population have equal standard deviations σ
• The populations are independent from each other.

## Note

F-distribution is relatively insensitive to violations of the assumptions of normality of the parent population or the assumption of equal variances.

## Use of F Distribution table

For given (specified) level of significance α, $F_\alpha(v_1,v_2)$ symbol is used to represent the upper (right hand side) 100% point of an F distribution having v1 and v2 df.

The lower (left hand side) percentage point can be found by taking the reciprocal of F-value corresponding to upper (right hand side) percentage point, but number of df are interchanged i.e. $F_{1-\alpha}(v_1,v_2)=\frac{1}{F_\alpha(v_2,v_1)}$

The distribution for the variable F is given by
$Y=k.F^{(\frac{v_1}{2})-1}\left(1+\frac{F}{v_2}\right)^{-\frac{(v_1+v_2)}{2}}$

References:

• http://en.wikibooks.org/wiki/Statistics/Distributions/F
• http://en.wikipedia.org/wiki/F-distribution
• http://www.itl.nist.gov/div898/handbook/eda/section3/eda3665.htm