Basic Statistics and Data Analysis

F Distribution: Ratio of Two Independent Estimates

The F-distribution (also known as Snedecor's F distribution or the Fisher-Snedecor distribution) is a continuous probability distribution named in honor of R.A. Fisher and George W. Snedecor. This distribution arises frequently as the null distribution of a test statistic in hypothesis testing, is used to construct confidence intervals, and appears in the analysis of variance (ANOVA) for the comparison of several population means.

If $s_1^2$ and $s_2^2$ are two unbiased estimates of the population variance $\sigma^2$ obtained from independent samples of sizes $n_1$ and $n_2$ respectively from the same normal population, then mathematically the F-ratio is defined as
$F=\frac{s_1^2}{s_2^2}=\frac{\left[(n_1-1)\frac{s_1^2}{\sigma^2}\right]/v_1}{\left[(n_2-1)\frac{s_2^2}{\sigma^2}\right]/v_2}$
where $v_1=n_1-1$ and $v_2=n_2-1$. Since $\chi_1^2=(n_1-1)\frac{s_1^2}{\sigma^2}$ and $\chi_2^2=(n_2-1)\frac{s_2^2}{\sigma^2}$ are distributed independently as $\chi^2$ with $v_1$ and $v_2$ degrees of freedom respectively, we have
$F=\frac{\frac{\chi_1^2}{v_1}}{\frac{\chi_2^2}{v_2}}$

So, the F distribution is the ratio of two independent chi-square ($\chi^2$) statistics, each divided by its respective degrees of freedom.
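This definition can be checked by simulation; a minimal sketch using NumPy and SciPy (the degrees of freedom, sample size, and seed below are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
v1, v2, n = 5, 10, 200_000  # arbitrary degrees of freedom and sample size

# Draw independent chi-square variates and divide each by its
# degrees of freedom before taking the ratio.
chi1 = rng.chisquare(v1, size=n)
chi2 = rng.chisquare(v2, size=n)
f_sim = (chi1 / v1) / (chi2 / v2)

# The simulated mean should be close to the theoretical F mean v2/(v2-2).
print(round(f_sim.mean(), 3))
```

A Kolmogorov-Smirnov test of `f_sim` against `stats.f(v1, v2)` shows no detectable difference between the simulated ratio and the F distribution.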

Properties

• The F distribution takes only non-negative values, since the numerator and denominator of the F-ratio are squared quantities.
• The range of F values is from 0 to infinity.
• The shape of the F-curve depends on the parameters v1 and v2 (its numerator and denominator degrees of freedom). It is a non-symmetrical distribution, skewed to the right (positively skewed). It tends to become more and more symmetric as one or both of the parameter values (v1, v2) increase, as shown in the following figure.

F distribution curves

• It is asymptotic. As F values increase, the F-curve approaches the horizontal axis but never crosses or touches it (behavior similar to that of the normal probability distribution).
• F has a unique mode at the value $\tilde{F}=\frac{v_2(v_1-2)}{v_1(v_2+2)},\quad (v_1>2)$ which is always less than unity.
• The mean of F is $\mu=\frac{v_2}{v_2-2},\quad (v_2>2)$
• The variance of F is $\sigma^2=\frac{2v_2^2(v_1+v_2-2)}{v_1(v_2-2)^2(v_2-4)},\quad (v_2>4)$
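The mode, mean, and variance of the F distribution can be cross-checked against SciPy; a small sketch with arbitrarily chosen degrees of freedom:

```python
from scipy import stats

v1, v2 = 8, 20  # arbitrary degrees of freedom with v1 > 2 and v2 > 4
dist = stats.f(v1, v2)

mode_formula = v2 * (v1 - 2) / (v1 * (v2 + 2))  # unique mode, less than 1
mean_formula = v2 / (v2 - 2)
var_formula = 2 * v2**2 * (v1 + v2 - 2) / (v1 * (v2 - 2)**2 * (v2 - 4))

mean_scipy, var_scipy = dist.stats(moments="mv")
print(round(mean_formula, 4), round(float(mean_scipy), 4))
print(round(var_formula, 4), round(float(var_scipy), 4))

# The density should peak at the mode.
eps = 1e-4
print(dist.pdf(mode_formula) >= dist.pdf(mode_formula + eps))
```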

Assumptions of F-distribution

The statistical procedure for comparing the variances of two populations has the following assumptions:

• The two populations (from which the samples are drawn) follow the normal distribution.
• The two samples are random samples drawn independently from their respective populations.

The statistical procedure for comparing three or more population means has the following assumptions:

• The populations follow the normal distribution.
• The populations have equal standard deviations $\sigma$.
• The populations are independent of each other.
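As an illustration of this second use case, a sketch of a one-way ANOVA on hypothetical data generated to satisfy the assumptions above (the group means, common standard deviation, seed, and sample sizes are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Three independent random samples from normal populations with a
# common standard deviation (hypothetical parameters).
g1 = rng.normal(loc=50, scale=5, size=30)
g2 = rng.normal(loc=50, scale=5, size=30)
g3 = rng.normal(loc=55, scale=5, size=30)

# One-way ANOVA: under H0 the statistic follows an F distribution
# with v1 = k - 1 = 2 and v2 = N - k = 87 degrees of freedom.
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(round(f_stat, 2), round(p_value, 4))
```

Because one group mean was deliberately shifted, the test should detect a difference among the three population means.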

Note

The F-distribution is relatively insensitive to violations of the assumption of normality of the parent population or the assumption of equal variances.

Use of F Distribution table

For a given (specified) level of significance $\alpha$, the symbol $F_\alpha(v_1,v_2)$ is used to represent the upper (right-tail) $100\alpha\%$ point of an F distribution having $v_1$ and $v_2$ degrees of freedom.

The lower (left-tail) percentage point can be found by taking the reciprocal of the F-value corresponding to the upper (right-tail) percentage point, with the numbers of degrees of freedom interchanged, i.e. $F_{1-\alpha}(v_1,v_2)=\frac{1}{F_\alpha(v_2,v_1)}$
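This reciprocal relation is easy to verify with SciPy's percent-point function (the level of significance and degrees of freedom below are arbitrary):

```python
from scipy import stats

alpha, v1, v2 = 0.05, 6, 12  # arbitrary choices

# Lower percentage point computed directly ...
lower_direct = stats.f.ppf(alpha, v1, v2)

# ... and via the reciprocal of the upper point with the
# degrees of freedom interchanged: 1 / F_alpha(v2, v1).
lower_reciprocal = 1 / stats.f.ppf(1 - alpha, v2, v1)

print(round(lower_direct, 4), round(lower_reciprocal, 4))  # the two agree
```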

The probability density of the variable F is given by
$Y=k\,F^{\frac{v_1}{2}-1}\left(1+\frac{v_1 F}{v_2}\right)^{-\frac{v_1+v_2}{2}}$
where $k=\frac{(v_1/v_2)^{v_1/2}}{B(v_1/2,\,v_2/2)}$ is a normalizing constant and $B$ denotes the beta function.
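Taking the normalizing constant as $k=(v_1/v_2)^{v_1/2}/B(v_1/2, v_2/2)$, the density can be checked pointwise against SciPy's implementation; a small sketch with arbitrary degrees of freedom:

```python
import numpy as np
from scipy import stats
from scipy.special import beta

v1, v2 = 4, 9  # arbitrary degrees of freedom
k = (v1 / v2) ** (v1 / 2) / beta(v1 / 2, v2 / 2)

# Evaluate the density formula on a grid of F values and compare
# with scipy's F density.
F = np.linspace(0.1, 5.0, 50)
y_formula = k * F ** (v1 / 2 - 1) * (1 + v1 * F / v2) ** (-(v1 + v2) / 2)
y_scipy = stats.f.pdf(F, v1, v2)

print(bool(np.allclose(y_formula, y_scipy)))
```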


Binomial Probability Distributions

Bernoulli Trials

Many experiments consist of repeated independent trials in which each trial has only two possible outcomes, such as head or tail, right or wrong, alive or dead, defective or non-defective, etc. If the probability of each outcome remains the same (constant) throughout the trials, then such trials are called Bernoulli trials.

Binomial Probability Distribution
The Binomial Probability Distribution is a discrete probability distribution describing the results of an experiment known as a Bernoulli process. An experiment having n Bernoulli trials is called a Binomial probability experiment, possessing the following four conditions/assumptions:

1. The experiment consists of n repeated trials.
2. Each trial results in an outcome that may be classified as a success or a failure.
3. The probability of success denoted by p remains constant from trial to trial.
4. The repeated trials are independent.

A Binomial trial can result in a success with probability p and a failure with probability q = 1−p. If X denotes the number of successes in n independent trials (so that there are n−x failures when X = x), then the probability distribution of the Binomial random variable X is:

\begin{align*}
P(X=x)&=\binom{n}{x} \, p^x \, q^{n-x} \\
&=\frac{n!}{x!(n-x)!}\, p^x \, q^{n-x}
\end{align*}
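A minimal worked instance of this formula, using a fair coin as hypothetical data (n = 5 tosses, p = 0.5, x = 3 heads):

```python
from math import comb
from scipy import stats

n, p, x = 5, 0.5, 3  # hypothetical: 3 heads in 5 tosses of a fair coin
q = 1 - p

# P(X = x) = C(n, x) * p^x * q^(n - x)
pmf_formula = comb(n, x) * p**x * q ** (n - x)
pmf_scipy = stats.binom.pmf(x, n, p)

print(pmf_formula)  # 0.3125
```

Here $\binom{5}{3}=10$, so the probability is $10 \times 0.5^5 = 10/32 = 0.3125$, which agrees with `scipy.stats.binom.pmf`.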

The Binomial probability distribution is one of the most widely used distributions in situations with two outcomes. It was discovered by the Swiss mathematician Jakob Bernoulli (1654-1705), whose main work, "Ars Conjectandi" (the art of conjecturing), was published posthumously in Basel in 1713.

Mean of Binomial Distribution:   Mean = μ = np

Variance of Binomial Distribution:  Variance = npq

Standard Deviation of Binomial Distribution:  Standard Deviation = $\sqrt{npq}$
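These three quantities can be verified against SciPy for any n and p (the parameter values below are arbitrary):

```python
import math
from scipy import stats

n, p = 20, 0.3  # arbitrary parameters
q = 1 - p

mean_formula = n * p               # mu = np
var_formula = n * p * q            # npq
sd_formula = math.sqrt(n * p * q)  # sqrt(npq)

mean_scipy, var_scipy = stats.binom.stats(n, p, moments="mv")
print(mean_formula, var_formula, round(sd_formula, 4))
```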

Moment Coefficient of Skewness:

\begin{align*}
\gamma_1 = \sqrt{\beta_1} &= \frac{q-p}{\sqrt{npq}}  \\
&= \frac{1-2p}{\sqrt{npq}}
\end{align*}

Moment Coefficient of Kurtosis:  $\beta_2 = 3+\frac{1-6pq}{npq}$
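Both moment coefficients can be cross-checked with SciPy, noting that SciPy reports skewness and *excess* kurtosis (kurtosis minus 3); a sketch with arbitrary parameters:

```python
import math
from scipy import stats

n, p = 20, 0.3  # arbitrary parameters
q = 1 - p

skew_formula = (q - p) / math.sqrt(n * p * q)     # (1 - 2p)/sqrt(npq)
kurt_formula = 3 + (1 - 6 * p * q) / (n * p * q)  # beta_2

# scipy returns (skewness, excess kurtosis); add 3 to the latter
# to compare with beta_2.
skew_scipy, excess_scipy = stats.binom.stats(n, p, moments="sk")
print(round(skew_formula, 4), round(kurt_formula, 4))
```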