The F-distribution, also known as Snedecor's F distribution or the Fisher-Snedecor distribution, is a continuous probability distribution named in honor of R.A. Fisher and George W. Snedecor. It arises frequently as the null distribution of a test statistic in hypothesis testing, is used to construct confidence intervals, and appears in the analysis of variance when comparing several population means.
If $s_1^2$ and $s_2^2$ are two unbiased estimates of the population variance $\sigma^2$ obtained from independent samples of sizes $n_1$ and $n_2$ respectively from the same normal population, then the F-ratio is defined as
\[F=\frac{s_1^2}{s_2^2}=\frac{\frac{(n_1-1)\frac{s_1^2}{\sigma^2}}{v_1}}{\frac{(n_2-1)\frac{s_2^2}{\sigma^2}}{v_2}}\]
where $v_1=n_1-1$ and $v_2=n_2-1$. Since $\chi_1^2=(n_1-1)\frac{s_1^2}{\sigma^2}$ and $\chi_2^2=(n_2-1)\frac{s_2^2}{\sigma^2}$ are distributed independently as $\chi^2$ with $v_1$ and $v_2$ degrees of freedom respectively, we have
\[F=\frac{\frac{\chi_1^2}{v_1}}{\frac{\chi_2^2}{v_2}}\]
So, the F distribution is the distribution of the ratio of two independent chi-square ($\chi^2$) statistics, each divided by its respective degrees of freedom.
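The ratio-of-chi-squares construction can be checked by simulation. The following is a minimal sketch, not a definitive implementation: the helper name `sample_f`, the seed, the sample size and the degrees of freedom (5 and 20) are all illustrative assumptions. It relies on the fact that a chi-square variable with $v$ degrees of freedom is a gamma variable with shape $v/2$ and scale 2, and compares the sample mean with the theoretical mean $v_2/(v_2-2)$.

```python
import random

def sample_f(v1, v2, rng):
    """Draw one F(v1, v2) value as a ratio of scaled chi-square draws.

    sample_f is a hypothetical helper written for this illustration.
    """
    # A chi-square variable with v degrees of freedom is Gamma(shape=v/2, scale=2).
    chi2_num = rng.gammavariate(v1 / 2, 2)
    chi2_den = rng.gammavariate(v2 / 2, 2)
    # Divide each chi-square by its own degrees of freedom, then take the ratio.
    return (chi2_num / v1) / (chi2_den / v2)

rng = random.Random(0)   # fixed seed, chosen arbitrarily for reproducibility
v1, v2 = 5, 20           # illustrative degrees of freedom (assumption)
draws = [sample_f(v1, v2, rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
theoretical_mean = v2 / (v2 - 2)
print(sample_mean, theoretical_mean)
```

With a large number of draws the two printed values should be close, and every simulated value is positive.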
Properties
- The F distribution takes only non-negative values, since the numerator and denominator of the F-ratio are squared quantities.
- The range of F values is from 0 to infinity.
- The shape of the F-curve depends on the parameters $v_1$ and $v_2$ (its numerator and denominator df). It is a non-symmetrical distribution, skewed to the right (positively skewed). It tends to become more and more symmetric as one or both of the parameter values ($v_1$, $v_2$) increase, as shown in the following figure.
- It is asymptotic. As the value of F increases, the F-curve approaches the horizontal axis but never touches or crosses it (a behavior similar to that of the normal probability distribution).
- F has a unique mode at the value \[\tilde{F}=\frac{v_2(v_1-2)}{v_1(v_2+2)},\quad (v_1>2)\] which is always less than unity.
- The mean of F is $\mu=\frac{v_2}{v_2-2},\quad (v_2>2)$
- The variance of F is \[\sigma^2=\frac{2v_2^2(v_1+v_2-2)}{v_1(v_2-2)^2(v_2-4)},\quad (v_2>4)\]
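The summary measures in the list above are easy to evaluate numerically. The sketch below uses the standard closed forms: mode $\frac{v_2(v_1-2)}{v_1(v_2+2)}$ for $v_1>2$, mean $\frac{v_2}{v_2-2}$ for $v_2>2$, and variance $\frac{2v_2^2(v_1+v_2-2)}{v_1(v_2-2)^2(v_2-4)}$ for $v_2>4$. The function names and the example degrees of freedom are assumptions made for illustration.

```python
def f_mode(v1, v2):
    """Mode of F(v1, v2); defined by this formula only for v1 > 2."""
    if v1 <= 2:
        raise ValueError("the mode formula requires v1 > 2")
    return v2 * (v1 - 2) / (v1 * (v2 + 2))

def f_mean(v2):
    """Mean of F(v1, v2); it depends only on v2 and exists for v2 > 2."""
    if v2 <= 2:
        raise ValueError("the mean exists only for v2 > 2")
    return v2 / (v2 - 2)

def f_variance(v1, v2):
    """Variance of F(v1, v2); exists for v2 > 4."""
    if v2 <= 4:
        raise ValueError("the variance exists only for v2 > 4")
    return 2 * v2**2 * (v1 + v2 - 2) / (v1 * (v2 - 2) ** 2 * (v2 - 4))

# Illustrative degrees of freedom (assumption): v1 = 5, v2 = 20.
print(f_mode(5, 20), f_mean(20), f_variance(5, 20))
```

For these values the mode lies below 1 and the mean lies above 1, which is consistent with a right-skewed curve.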
Assumptions of F-distribution
The statistical procedure for comparing the variances of two populations assumes that:
- The two populations (from which the samples are drawn) follow the Normal distribution.
- The two samples are random samples drawn independently from their respective populations.
The statistical procedure for comparing the means of three or more populations assumes that:
- The populations follow the Normal distribution.
- The populations have equal standard deviations σ.
- The populations are independent of each other.
Note
The F-test used in the analysis of variance is relatively insensitive to moderate violations of the assumption of normality of the parent populations or of the assumption of equal variances.
Use of F Distribution table
For a given (specified) level of significance α, the symbol $F_\alpha(v_1,v_2)$ represents the upper (right-hand side) 100α% point of an F distribution with $v_1$ and $v_2$ df.
The lower (left-hand side) percentage point can be found by taking the reciprocal of the F-value corresponding to the upper (right-hand side) percentage point, with the df interchanged, i.e. \[F_{1-\alpha}(v_1,v_2)=\frac{1}{F_\alpha(v_2,v_1)}\]
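A short justification of this reciprocal relation may help. It is sketched here under the convention that $F_\alpha(v_1,v_2)$ denotes the point with probability α to its right. If $F$ follows the F distribution with $(v_1,v_2)$ df, then $1/F$ is the same ratio with numerator and denominator swapped, so $1/F$ follows the F distribution with $(v_2,v_1)$ df. Therefore
\[1-\alpha=P\left(F\ge F_{1-\alpha}(v_1,v_2)\right)=P\left(\frac{1}{F}\le\frac{1}{F_{1-\alpha}(v_1,v_2)}\right)\]
so that
\[P\left(\frac{1}{F}\ge\frac{1}{F_{1-\alpha}(v_1,v_2)}\right)=\alpha\quad\Rightarrow\quad\frac{1}{F_{1-\alpha}(v_1,v_2)}=F_\alpha(v_2,v_1)\]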
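The probability density of the variable F is given by
\[Y=k\,F^{\left(\frac{v_1}{2}\right)-1}\left(1+\frac{v_1F}{v_2}\right)^{-\frac{(v_1+v_2)}{2}}\]
where $k$ is a constant depending on $v_1$ and $v_2$, chosen so that the total area under the curve is unity.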
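The density can be evaluated directly once the constant is written out. The sketch below assumes the standard form $k\,F^{v_1/2-1}\left(1+\frac{v_1F}{v_2}\right)^{-(v_1+v_2)/2}$ with normalising constant $k=\frac{\Gamma\left(\frac{v_1+v_2}{2}\right)}{\Gamma\left(\frac{v_1}{2}\right)\Gamma\left(\frac{v_2}{2}\right)}\left(\frac{v_1}{v_2}\right)^{v_1/2}$. The function names, the integration range and the step count are illustrative assumptions. It then checks numerically that the area under the curve is approximately 1.

```python
import math

def f_constant(v1, v2):
    """Normalising constant k of the F(v1, v2) density (standard form)."""
    # Work on the log scale so large degrees of freedom do not overflow.
    log_k = (math.lgamma((v1 + v2) / 2)
             - math.lgamma(v1 / 2)
             - math.lgamma(v2 / 2)
             + (v1 / 2) * math.log(v1 / v2))
    return math.exp(log_k)

def f_density(x, v1, v2):
    """Density of F(v1, v2) at x."""
    if x <= 0:
        # Treat x <= 0 as outside the support (a simplification at x = 0).
        return 0.0
    k = f_constant(v1, v2)
    return k * x ** (v1 / 2 - 1) * (1 + v1 * x / v2) ** (-(v1 + v2) / 2)

def area_under_curve(v1, v2, upper=50.0, steps=20_000):
    """Midpoint-rule approximation of the area on (0, upper)."""
    h = upper / steps
    return h * sum(f_density((i + 0.5) * h, v1, v2) for i in range(steps))

# Illustrative degrees of freedom (assumption): v1 = 5, v2 = 20.
print(area_under_curve(5, 20))
```

The integral is truncated at an arbitrary upper limit, so the result is only approximately 1; the neglected right tail is very small for these degrees of freedom.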