F Distribution: Ratios of two Independent Estimators

The F-distribution (also known as Snedecor’s F distribution or the Fisher-Snedecor distribution) is a continuous probability distribution named in honor of R.A. Fisher and George W. Snedecor. It arises frequently as the null distribution of a test statistic in hypothesis testing, is used to construct confidence intervals, and underlies the analysis of variance for comparing several population means.

If $s_1^2$ and $s_2^2$ are two unbiased estimates of the population variance $\sigma^2$, obtained from independent samples of sizes $n_1$ and $n_2$ respectively from the same normal population, then the F-ratio is defined as
\[F=\frac{s_1^2}{s_2^2}=\frac{\frac{(n_1-1)\frac{s_1^2}{\sigma^2}}{v_1}}{\frac{(n_2-1)\frac{s_2^2}{\sigma^2}}{v_2}}\]
where $v_1=n_1-1$ and $v_2=n_2-1$. Since $\chi_1^2=(n_1-1)\frac{s_1^2}{\sigma^2}$ and $\chi_2^2=(n_2-1)\frac{s_2^2}{\sigma^2}$ are distributed independently as $\chi^2$ with $v_1$ and $v_2$ degrees of freedom respectively, we have
\[F=\frac{\frac{\chi_1^2}{v_1}}{\frac{\chi_2^2}{v_2}}\]

So the F distribution is the distribution of the ratio of two independent chi-square ($\chi^2$) variables, each divided by its respective degrees of freedom.
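The ratio construction can be checked by simulation. The sketch below is illustrative only: it assumes Python's standard `random` module, uses the fact that a chi-square variable with $v$ degrees of freedom is a gamma variable with shape $v/2$ and scale 2, and picks $v_1=5$, $v_2=10$ arbitrarily. The helper name `simulate_f_ratios` is hypothetical.

```python
import random


def simulate_f_ratios(v1, v2, n_draws, seed=0):
    """Draw F(v1, v2) variates as a ratio of scaled chi-square variates."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_draws):
        # A chi-square variate with v df is a Gamma(shape=v/2, scale=2) variate.
        chi2_num = rng.gammavariate(v1 / 2, 2)
        chi2_den = rng.gammavariate(v2 / 2, 2)
        # Divide each chi-square by its own degrees of freedom, then take the ratio.
        ratios.append((chi2_num / v1) / (chi2_den / v2))
    return ratios


ratios = simulate_f_ratios(v1=5, v2=10, n_draws=100_000)
sample_mean = sum(ratios) / len(ratios)
theoretical_mean = 10 / (10 - 2)  # v2 / (v2 - 2)
print(f"simulated mean: {sample_mean:.3f}, theoretical mean: {theoretical_mean:.3f}")
```

With a large number of draws, the simulated mean should settle close to $v_2/(v_2-2)$, and every simulated ratio is positive.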

Properties

  • F takes only non-negative values, since the numerator and denominator of the F-ratio are squared quantities.
  • The range of F values is from 0 to infinity.
  • The shape of the F-curve depends on the parameters $v_1$ and $v_2$ (its numerator and denominator df). The distribution is non-symmetrical and skewed to the right (positively skewed). It tends to become more and more symmetric as one or both of the parameter values ($v_1$, $v_2$) increase, as shown in the following figure.
  • The curve is asymptotic to the horizontal axis: as F increases, the F-curve approaches the axis but never touches or crosses it (behavior similar to the tails of the normal probability distribution).
  • F has a unique mode at the value \[\tilde{F}=\frac{v_2(v_1-2)}{v_1(v_2+2)},\quad (v_1>2)\] which is always less than unity.
  • The mean of F is $\mu=\frac{v_2}{v_2-2},\quad (v_2>2)$
  • The variance of F is \[\sigma^2=\frac{2v_2^2(v_1+v_2-2)}{v_1(v_2-2)^2(v_2-4)},\quad (v_2>4)\]
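As a quick numerical illustration of the mode, mean and variance, the sketch below evaluates the standard closed forms $\frac{v_2(v_1-2)}{v_1(v_2+2)}$, $\frac{v_2}{v_2-2}$ and $\frac{2v_2^2(v_1+v_2-2)}{v_1(v_2-2)^2(v_2-4)}$. The function names and the choice $v_1=5$, $v_2=10$ are assumptions made only for this example.

```python
def f_mode(v1, v2):
    """Mode of F(v1, v2); defined for v1 > 2."""
    if v1 <= 2:
        raise ValueError("the mode formula requires v1 > 2")
    return (v1 - 2) / v1 * v2 / (v2 + 2)


def f_mean(v1, v2):
    """Mean of F(v1, v2); defined for v2 > 2 (it does not depend on v1)."""
    if v2 <= 2:
        raise ValueError("the mean requires v2 > 2")
    return v2 / (v2 - 2)


def f_variance(v1, v2):
    """Variance of F(v1, v2); defined for v2 > 4."""
    if v2 <= 4:
        raise ValueError("the variance requires v2 > 4")
    return 2 * v2**2 * (v1 + v2 - 2) / (v1 * (v2 - 2) ** 2 * (v2 - 4))


v1, v2 = 5, 10
print(f_mode(v1, v2), f_mean(v1, v2), f_variance(v1, v2))
```

For these degrees of freedom the mode (0.5) lies below the mean (1.25), which is what a right-skewed curve implies.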

Assumptions of F-distribution

The statistical procedure for comparing the variances of two populations makes the following assumptions:

  • The two populations (from which the samples are drawn) follow the normal distribution.
  • The two samples are random samples drawn independently from their respective populations.
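A minimal sketch of the variance-ratio computation follows, assuming Python's standard `statistics` module. The two samples are made-up numbers and `variance_ratio` is a hypothetical helper; `statistics.variance` divides by $n-1$, so it gives the unbiased estimates $s_1^2$ and $s_2^2$.

```python
import statistics


def variance_ratio(sample_1, sample_2):
    """Return F = s1^2 / s2^2 with its numerator and denominator df."""
    s1_sq = statistics.variance(sample_1)  # unbiased: divides by n1 - 1
    s2_sq = statistics.variance(sample_2)  # unbiased: divides by n2 - 1
    return s1_sq / s2_sq, len(sample_1) - 1, len(sample_2) - 1


# Hypothetical data, used only to show the mechanics.
sample_1 = [2.0, 4.0, 6.0, 8.0]
sample_2 = [1.0, 2.0, 3.0, 4.0, 5.0]

f_ratio, df_num, df_den = variance_ratio(sample_1, sample_2)
print(f"F = {f_ratio:.4f} with ({df_num}, {df_den}) df")
```

The resulting ratio would then be compared with the percentage point of the F distribution having the same pair of df.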

The statistical procedure for comparing the means of three or more populations makes the following assumptions:

  • The populations follow the normal distribution.
  • The populations have equal standard deviations σ.
  • The populations are independent of each other.
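For the comparison of several means, the F statistic is the ratio of the between-group mean square to the within-group mean square. The sketch below is one plain-Python way to compute it. The three groups are invented data and `one_way_anova_f` is a hypothetical name, not a library function.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic with its between and within df."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group size times squared deviation of its mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: three groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) df")
```

Here the between-group sum of squares is 26 on 2 df and the within-group sum of squares is 6 on 6 df, so the statistic is 13/1 = 13.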

Note

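The F-test used in the analysis of variance is relatively insensitive to moderate violations of the assumption of normality of the parent populations and, particularly when the sample sizes are equal, of the assumption of equal variances. The F-test for comparing two variances, in contrast, is sensitive to departures from normality.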

Use of F-Distribution Table

For a given (specified) level of significance α, the symbol $F_\alpha(v_1,v_2)$ is used to represent the upper (right-hand) 100α% point of an F distribution having $v_1$ and $v_2$ df, that is, the value exceeded with probability α.

The lower (left-hand) percentage point can be found by taking the reciprocal of the F-value for the corresponding upper (right-hand) percentage point with the degrees of freedom interchanged, i.e. \[F_{1-\alpha}(v_1,v_2)=\frac{1}{F_\alpha(v_2,v_1)}\]
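As a worked illustration, take α = 0.05 and the commonly tabulated value $F_{0.05}(4,10)\approx 3.48$. The lower 5% point of the F distribution with 10 and 4 df is then \[F_{0.95}(10,4)=\frac{1}{F_{0.05}(4,10)}\approx\frac{1}{3.48}\approx 0.287\]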

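The probability density of the variable F is given by
\[Y=k\,F^{(\frac{v_1}{2})-1}\left(1+\frac{v_1F}{v_2}\right)^{-\frac{(v_1+v_2)}{2}},\quad F>0\]
where the constant $k=\frac{(v_1/v_2)^{v_1/2}}{B(v_1/2,\,v_2/2)}$ depends only on $v_1$ and $v_2$ and makes the total area under the curve equal to one.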
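Percentage points can also be approximated numerically from the density instead of being read from a table. The sketch below is a rough illustration under several assumptions. It uses the standard density with the factor $v_1F/v_2$ inside the bracket and $k$ computed from log-gamma functions. It integrates with Simpson's rule, finds the upper point by bisection, and assumes $v_1\ge 2$ so that the density is finite at zero. All function names are hypothetical.

```python
import math


def f_pdf(x, v1, v2):
    """Density of F(v1, v2) at x; assumes v1 >= 2 so the value at 0 is finite."""
    if x < 0:
        return 0.0
    log_k = (
        math.lgamma((v1 + v2) / 2)
        - math.lgamma(v1 / 2)
        - math.lgamma(v2 / 2)
        + (v1 / 2) * math.log(v1 / v2)
    )
    return math.exp(log_k) * x ** (v1 / 2 - 1) * (1 + v1 * x / v2) ** (-(v1 + v2) / 2)


def f_cdf(x, v1, v2, n_steps=1000):
    """P(F <= x) by composite Simpson's rule on [0, x]; n_steps must be even."""
    if x <= 0:
        return 0.0
    h = x / n_steps
    total = f_pdf(0.0, v1, v2) + f_pdf(x, v1, v2)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, v1, v2)
    return total * h / 3


def f_upper_point(alpha, v1, v2):
    """Value exceeded with probability alpha, found by bisection on the tail area."""
    lo, hi = 0.0, 1.0
    while 1 - f_cdf(hi, v1, v2) > alpha:  # widen until the tail area drops below alpha
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if 1 - f_cdf(mid, v1, v2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


upper = f_upper_point(0.05, 4, 10)   # upper 5% point, F(4, 10)
lower = f_upper_point(0.95, 10, 4)   # lower 5% point, F(10, 4)
print(f"upper: {upper:.3f}, lower: {lower:.4f}, product: {upper * lower:.4f}")
```

The computed upper point can be compared with the tabulated value of about 3.48. The product of the two points should be close to one, in line with the reciprocal relation between lower and upper percentage points.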
