# Basic Statistics and Data Analysis

## Non Central Chi Squared Distribution

The Non-Central Chi-Squared Distribution is a generalization of the Chi-Squared Distribution.
If $Y_{1} ,Y_{2} ,\cdots ,Y_{n} \sim N(0,1)$, i.e. each $y_{i} \sim N(0,1)$, then $y_{i}^{2} \sim \chi_{(1)}^{2}$ and $\sum y_{i}^{2} \sim \chi_{(n)}^{2}$.

If the means ($\mu_i$) are non-zero, then $y_{i} \sim N(\mu _{i} ,1)$, i.e. each $y_{i}$ has a different mean, and
\begin{align*}
\Rightarrow  & \qquad y_i^2 \sim \chi^2_{(1,\frac{\mu_i^2}{2})} \\
\Rightarrow  & \qquad \sum y_i^2 \sim \chi^2_{(n,\frac{\sum \mu_i^2}{2})} =\chi_{(n,\lambda )}^{2}
\end{align*}

Note that if $\lambda =0$ then we have the central $\chi ^{2}$ distribution. If $\lambda \ne 0$ then it is the non-central chi-squared distribution, because the underlying normals are not centered at zero (the distribution is not standard normal).

Central Chi-Square Distribution: $f(x)=\frac{1}{2^{\frac{n}{2}} \Gamma\left(\frac{n}{2}\right)} x^{\frac{n}{2} -1} e^{-\frac{x}{2} }; \qquad 0<x<\infty$

## Theorem:

If $Y_{1} ,Y_{2} ,\cdots ,Y_{n}$ are independent normal random variables with $E(y_{i} )=\mu _{i}$ and $V(y_{i} )=1$, then $w=\sum y_{i}^{2}$ is distributed as non-central chi-square with $n$ degrees of freedom and non-centrality parameter $\lambda$, where $\lambda =\frac{\sum \mu _{i}^{2} }{2}$, and has pdf

\begin{align*}
f(w)=e^{-\lambda } \sum _{i=0}^{\infty }\left[\frac{\lambda ^{i} w^{\frac{n+2i}{2} -1} e^{-\frac{w}{2} } }{i!\, 2^{\frac{n+2i}{2} } \Gamma\left(\frac{n+2i}{2}\right) } \right]\qquad 0\le w< \infty
\end{align*}
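Before the formal proof, the theorem can be sanity-checked by simulation: a non-central chi-square with $n$ df and non-centrality $\lambda=\frac{\sum\mu_i^2}{2}$ has mean $n+2\lambda$ and variance $2(n+4\lambda)$. A minimal sketch in Python (standard library only; the chosen means, sample size and tolerances are arbitrary):

```python
import random

random.seed(42)

mus = [0.5, 1.0, -0.5, 2.0]            # means of the normal variables (arbitrary)
n = len(mus)                           # degrees of freedom
lam = sum(m * m for m in mus) / 2      # non-centrality parameter lambda

N = 200_000
samples = []
for _ in range(N):
    # w = sum of squared N(mu_i, 1) variables
    w = sum(random.gauss(mu, 1.0) ** 2 for mu in mus)
    samples.append(w)

mean = sum(samples) / N
var = sum((w - mean) ** 2 for w in samples) / (N - 1)

print(mean, n + 2 * lam)        # sample mean vs theoretical n + 2*lambda
print(var, 2 * (n + 4 * lam))   # sample variance vs theoretical 2*(n + 4*lambda)
```

With the seed fixed, the sample mean and variance land close to the theoretical values $n+2\lambda$ and $2(n+4\lambda)$.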

## Proof:

Consider the moment generating function of $w=\sum y_{i}^{2}$

\begin{align*}
M_{w} (t)=E(e^{wt} )=E(e^{t\sum y_{i}^{2}  } ); \qquad \text{ where } y_{i} \sim N(\mu_{i} ,1)
\end{align*}

By definition
\begin{align*}
M_{w} (t) &= \int \cdots \int e^{t\sum y_{i}^{2} } \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi } } e^{-\frac{1}{2} (y_{i} -\mu _{i} )^{2} } \, dy_{1} \, dy_{2} \cdots dy_{n} \\
&= K_{1} \int \cdots \int e^{-\frac{1}{2} (1-2t)\left[\sum y_{i}^{2} -\frac{2\sum y_{i} \mu _{i} }{1-2t} \right]}   dy_{1} \, dy_{2} \cdots dy_{n}; \qquad K_{1} =\left(\frac{1}{\sqrt{2\pi } } \right)^{n} e^{-\frac{\sum \mu _{i}^{2} }{2} } \\
&\text{By completing the square}\\
& =K_{1} \int \cdots \int e^{-\frac{1}{2} (1-2t)\sum \left[\left(y_{i} -\frac{\mu _{i} }{1-2t} \right)^{2} -\frac{\mu _{i}^{2} }{(1-2t)^{2} } \right]}   dy_{1} \, dy_{2} \cdots dy_{n} \\
&= e^{-\frac{\sum \mu_{i}^{2} }{2} \left(1-\frac{1}{1-2t} \right)} \frac{1}{\left(\sqrt{1-2t} \right)^{n} } \int \cdots \int \left(\frac{1}{\sqrt{2\pi } } \right)^{n} \left(\sqrt{1-2t} \right)^{n} \, e^{-\frac{1}{2\cdot \frac{1}{1-2t} } \sum \left(y_{i} -\frac{\mu _{i} }{1-2t} \right)^{2} }  dy_{1} \, dy_{2} \cdots dy_{n}
\end{align*}

where

$\int_{-\infty}^{\infty } \cdots \int _{-\infty }^{\infty }\left(\frac{1}{\sqrt{2\pi}} \right)^{n} \left(\sqrt{1-2t}\right)^{n} e^{-\frac{1}{2\cdot \frac{1}{1-2t} } \sum \left(y_{i} -\frac{\mu _{i} }{1-2t} \right)^{2}} dy_{1} \, dy_{2} \cdots dy_{n} = 1$

is the integral of a complete normal density, each $y_{i}$ having mean $\frac{\mu_{i}}{1-2t}$ and variance $\frac{1}{1-2t}$, so

\begin{align*}
M_{w}(t)&=e^{-\frac{\sum \mu_i^2}{2} \left(1-\frac{1}{1-2t}\right)} .\left(\frac{1}{\sqrt{1-2t} } \right)^{n} \\
&=\left(\frac{1}{\sqrt{1-2t}}\right)^{n} e^{-\lambda \left(1-\frac{1}{1-2t} \right)} \\
&=e^{-\lambda }.e^{\frac{\lambda}{1-2t}} \frac{1}{(1-2t)^{\frac{n}{2}}}\\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!(1-2t)^{i} (1-2t)^{n/2} }\\
M_{w=\sum y_{i}^{2} } (t)&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!(1-2t)^{\frac{n+2i}{2} } }\tag{A}
\end{align*}
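The last step simply expands $e^{\frac{\lambda}{1-2t}}$ as a power series, so the closed form and the series in (A) are identical. This is easy to confirm numerically (the values of $n$, $\lambda$ and $t$ below are arbitrary, with $t<1/2$):

```python
import math

n, lam, t = 4, 1.5, 0.2

# closed form: e^{-lambda} * e^{lambda/(1-2t)} / (1-2t)^{n/2}
closed = math.exp(-lam) * math.exp(lam / (1 - 2 * t)) / (1 - 2 * t) ** (n / 2)

# series form (A): e^{-lambda} * sum_i lambda^i / (i! * (1-2t)^{(n+2i)/2})
series = math.exp(-lam) * sum(
    lam ** i / (math.factorial(i) * (1 - 2 * t) ** ((n + 2 * i) / 2))
    for i in range(100)   # 100 terms is far more than needed for convergence
)

print(closed, series)
```

The two values agree to floating-point precision, as the algebra says they must.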

Now the moment generating function (MGF) computed from the claimed non-central density is
\begin{align*}
M_{\omega} (t) & = E(e^{\omega t} )\\
&=\int _{0}^{\infty }e^{\omega t} e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} \omega ^{\frac{n+2i}{2} -1} e^{-\frac{\omega }{2} } }{i!\,2^{\frac{n+2i}{2} } \Gamma\left(\frac{n+2i}{2}\right) } d\omega\\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!\,2^{\frac{n+2i}{2} } \Gamma\left(\frac{n+2i}{2}\right) }  \int _{0}^{\infty }e^{-\frac{\omega }{2} (1-2t)}  \omega ^{\frac{n+2i}{2} -1} d\omega
\end{align*}
Let
\begin{align*}
\frac{\omega }{2} (1-2t)&=P\\
\Rightarrow \omega & =\frac{2P}{1-2t} \\
\Rightarrow d\omega &=\frac{2\,dP}{1-2t}
\end{align*}

\begin{align*}
&=e^{-\lambda } \sum\limits_{i=0}^{\infty }\frac{\lambda ^{i} }{i!\,2^{\frac{n+2i}{2} } \Gamma\left(\frac{n+2i}{2}\right) }  \int _{0}^{\infty }e^{-P} \left(\frac{2P}{1-2t} \right)^{\frac{n+2i}{2} -1} \frac{2\,dP}{1-2t}  \\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} 2^{\frac{n+2i}{2} } }{i!\,2^{\frac{n+2i}{2} } \Gamma\left(\frac{n+2i}{2}\right) (1-2t)^{\frac{n+2i}{2} } } \int _{0}^{\infty }e^{-P} P^{\frac{n+2i}{2} -1}  dP \\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!\,\Gamma\left(\frac{n+2i}{2}\right) (1-2t)^{\frac{n+2i}{2} } } \Gamma\left(\frac{n+2i}{2}\right)
\end{align*}

as $\int\limits _{0}^{\infty }e^{-P} P^{\frac{n+2i}{2} -1} dP=\Gamma\left(\frac{n+2i}{2}\right)$

\begin{align*}
M_{\omega } (t)=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} }{i!(1-2t)^{\frac{n+2i}{2} } } \tag{B}
\end{align*}

Comparing ($A$) and ($B$)
$M_{w=\sum y_{i}^{2} } (t)=M_{\omega } (t)$

By the uniqueness theorem of moment generating functions

$f_{w} (w)=f_{\omega } (\omega )$
\begin{align*}
\Rightarrow \qquad f_{w} (w)&=f(\chi ^{2} )\\
&=e^{-\lambda } \sum _{i=0}^{\infty }\frac{\lambda ^{i} w^{\frac{n+2i}{2} -1} e^{-\frac{w}{2} } }{i!\,2^{\frac{n+2i}{2} } \Gamma\left(\frac{n+2i}{2}\right) };  \qquad 0\le w< \infty
\end{align*}
is the pdf of the non-central chi-square with $n$ df, where $\lambda =\frac{\sum \mu _{i}^{2} }{2}$ is the non-centrality parameter. The non-central chi-squared distribution is also additive, like the central chi-square distribution.
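Since the pdf is a Poisson-weighted mixture of central chi-square densities, it must integrate to one. A rough numerical check of the series pdf (the series and the integration range are truncated, which introduces only tiny errors; $n$ and $\lambda$ are arbitrary):

```python
import math

def ncx2_pdf(w, n, lam, terms=60):
    """Series pdf of the non-central chi-square from the theorem."""
    total = 0.0
    for i in range(terms):
        k = (n + 2 * i) / 2
        total += (lam ** i / math.factorial(i)) * \
                 w ** (k - 1) * math.exp(-w / 2) / (2 ** k * math.gamma(k))
    return math.exp(-lam) * total

n, lam = 3, 2.0

# trapezoidal rule on [~0, 80]; the tail beyond 80 is negligible here
a, b, steps = 1e-9, 80.0, 4000
h = (b - a) / steps
integral = sum(ncx2_pdf(a + k * h, n, lam) * (h if 0 < k < steps else h / 2)
               for k in range(steps + 1))
print(integral)   # close to 1
```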

## F Distribution: Ratios of two Independent Estimators

The F-distribution is a continuous probability distribution (also known as Snedecor’s F distribution or the Fisher-Snedecor distribution), named in honor of R.A. Fisher and George W. Snedecor. This distribution arises frequently as the null distribution of a test statistic in hypothesis testing, in the construction of confidence intervals, and in the analysis of variance for the comparison of several population means.

If $s_1^2$ and $s_2^2$ are two unbiased estimates of the population variance $\sigma^2$ obtained from independent samples of size $n_1$ and $n_2$ respectively from the same normal population, then mathematically the F-ratio is defined as
$F=\frac{s_1^2}{s_2^2}=\frac{(n_1-1)\frac{s_1^2}{\sigma^2}/v_1}{(n_2-1)\frac{s_2^2}{\sigma^2}/v_2}$
where $v_1=n_1-1$ and $v_2=n_2-1$. Since $\chi_1^2=(n_1-1)\frac{s_1^2}{\sigma^2}$ and $\chi_2^2=(n_2-1)\frac{s_2^2}{\sigma^2}$ are distributed independently as $\chi^2$ with $v_1$ and $v_2$ degrees of freedom respectively, we have
$F=\frac{\chi_1^2/v_1}{\chi_2^2/v_2}$

So, the F Distribution is the ratio of two independent Chi-square ($\chi^2$) statistics, each divided by its respective degrees of freedom.

## Properties

• F distribution takes only non-negative values since the numerator and denominator of the F-ratio are squared quantities.
• The range of F values is from 0 to infinity.
• The shape of the F-curve depends on the parameters v1 and v2 (its numerator and denominator df). It is a non-symmetrical distribution, skewed to the right (positively skewed). It tends to become more and more symmetric when one or both of the parameter values (v1, v2) increase, as shown in the following figure.

*Figure: F distribution curves*

• It is asymptotic. As the F values increase, the F-curve approaches the X-axis but never crosses or touches it (a behavior similar to the normal probability distribution).
• F has a unique mode at the value $\tilde{F}=\frac{v_2(v_1-2)}{v_1(v_2+2)},\quad (v_1>2)$ which is always less than unity.
• The mean of F is $\mu=\frac{v_2}{v_2-2},\quad (v_2>2)$
• The variance of F is $\sigma^2=\frac{2v_2^2(v_1+v_2-2)}{v_1(v_2-2)(v_2-4)},\quad (v_2>4)$
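The mean formula above can be checked by building F directly from its definition as a ratio of independent chi-squares, each divided by its df (a simulation sketch; the df, sample size and tolerance are arbitrary choices):

```python
import random

random.seed(1)

def chi2(v):
    """Chi-square variate with v df, as a sum of squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(v))

v1, v2 = 5, 20
N = 50_000

# F = (chi2_v1 / v1) / (chi2_v2 / v2)
fs = [(chi2(v1) / v1) / (chi2(v2) / v2) for _ in range(N)]

mean = sum(fs) / N
print(mean, v2 / (v2 - 2))   # sample mean vs theoretical mean v2/(v2-2)
```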

### Assumptions of F-distribution

The statistical procedure for comparing the variances of two populations has the assumptions:

• The two populations (from which the samples are drawn) follow the Normal distribution.
• The two samples are random samples drawn independently from their respective populations.

The statistical procedure for comparing three or more population means has the assumptions:

• The populations follow the Normal distribution.
• The populations have equal standard deviations σ.
• The populations are independent of each other.

## Note

F-distribution is relatively insensitive to violations of the assumptions of normality of the parent population or the assumption of equal variances.

## Use of F Distribution table

For a given (specified) level of significance α, the symbol $F_\alpha(v_1,v_2)$ is used to represent the upper (right-hand side) $100\alpha\%$ point of an F distribution having v1 and v2 df.

The lower (left-hand side) percentage point can be found by taking the reciprocal of the F-value corresponding to the upper (right-hand side) percentage point, with the numbers of df interchanged, i.e. $F_{1-\alpha}(v_1,v_2)=\frac{1}{F_\alpha(v_2,v_1)}$
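The reciprocal relation holds because $1/F$ with $(v_1,v_2)$ df is itself F-distributed with $(v_2,v_1)$ df. A simulation sketch comparing empirical quantiles (df, sample size, percentile and tolerance are arbitrary):

```python
import random

random.seed(7)

def chi2(v):
    """Chi-square variate with v df, as a sum of squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(v))

def f_sample(v1, v2, size):
    """Sorted sample of F(v1, v2) variates built from chi-square ratios."""
    return sorted((chi2(v1) / v1) / (chi2(v2) / v2) for _ in range(size))

v1, v2, N = 5, 10, 40_000
f_a = f_sample(v1, v2, N)   # F(v1, v2) samples
f_b = f_sample(v2, v1, N)   # independent F(v2, v1) samples

upper = f_a[int(0.95 * N)]  # empirical upper 5% point of F(v1, v2)
lower = f_b[int(0.05 * N)]  # empirical lower 5% point of F(v2, v1)
print(upper, 1 / lower)     # the two should agree approximately
```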

The density of the variable F is given by
$Y=k\,F^{\frac{v_1}{2}-1}\left(1+\frac{v_1 F}{v_2}\right)^{-\frac{v_1+v_2}{2}}$
where $k$ is the normalizing constant.

References:

• http://en.wikibooks.org/wiki/Statistics/Distributions/F
• http://en.wikipedia.org/wiki/F-distribution
• http://www.itl.nist.gov/div898/handbook/eda/section3/eda3665.htm

# Introduction to Probability Theory

Uncertainty is everywhere, i.e. nothing in this world is perfect or 100% certain except the Almighty Allah, the Creator of the Universe. For example, if someone bought 10 lottery tickets out of 500, and each of the 500 tickets is as likely as any other to be drawn for the first prize, then that person has 10 chances out of 500 tickets, or a 2% chance, to win the first prize.

Similarly, a decision maker seldom has complete information to make a decision.
So probability is a measure of the likelihood that something will happen; however, probability cannot predict the number of times that something will occur in the future, so it is important that all the known risks involved be scientifically evaluated. The decisions that affect our daily life are based upon likelihood (probability or chance) but not on absolute certainty. The use of probability theory allows a decision maker with only limited information to analyze the risks and minimize the inherent gamble, for example in marketing a new product or accepting an incoming shipment possibly containing defective parts.

Probability can be considered as the quantification of uncertainty or likelihood. Probabilities are usually expressed as fractions such as {1/6, 1/2, 8/9} or as decimals such as {0.167, 0.5, 0.889}, and can also be presented as percentages such as {16.7%, 50%, 88.9%}.

## Types of Probability

Suppose we want to compute the chances (Note that we are not predicting here, just measuring the chances) that something will occur in the future. For this purpose we have three types of probability

### 1) Classical Approach or Prior Approach

In the classical probability approach two assumptions are used: the outcomes are equally likely, and they are mutually exclusive and collectively exhaustive.

Classical probability is defined as “The number of outcomes favorable to the occurrence of an event divided by the total number of all possible outcomes”.
OR
If an experiment results in “n” equally likely, mutually exclusive and collectively exhaustive outcomes, and “m” of them are favorable to the occurrence of an event A, then the probability of event A is the ratio m/n (P.S. Laplace, 1749-1827).

Symbolically we can write $P(A) = \frac{m}{n} = \frac{number\,\, of\,\, favorable\,\, outcomes}{Total\,\, number\,\, of\,\, outcomes}$
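A classical probability can be computed by direct enumeration of equally likely outcomes. As a small illustration (a standard two-dice example, not from the text above), the probability that two fair dice sum to 7:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 equally likely pairs
favorable = [o for o in outcomes if sum(o) == 7]  # the m favorable outcomes

p = Fraction(len(favorable), len(outcomes))       # P(A) = m / n
print(p)   # 1/6
```

Using `Fraction` keeps the answer exact, matching how classical probabilities are usually quoted.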

Some shortcoming of classical approach

• This approach to probability is useful only when one deals with card games, dice games or coin tosses, i.e. where events are equally likely; it is not suitable for serious problems such as management decisions.
• This approach assumes a world that does not exist, as the assumptions described above are imposed.
• This approach assumes symmetry in the world, but there may be disorder in a system.

### 2) Relative Frequency or Empirical Probability or A Posterior Approach

The proportion of times that an event occurs in the long run when conditions are stable. Relative frequency becomes stable as the number of trials becomes large under the uniform conditions.
To calculate the relative frequency an experiment is repeated a large number of times say “n” under uniform/stable conditions. So if an event A occurs m times, then the probability of the occurrence of the event A is defined by
$P(A)=\lim_{n\to\infty}\frac{m}{n}$

If we say that the probability that a newborn child will be a boy is 1/2, it means that over a large number of children born, about 50% of all will be boys.
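The stabilizing of the relative frequency m/n can be seen in a simple coin-flip simulation (the seed and number of flips are arbitrary):

```python
import random

random.seed(123)

n = 100_000
m = sum(1 for _ in range(n) if random.random() < 0.5)   # count "heads"

rel_freq = m / n
print(rel_freq)   # approaches 0.5 as n grows
```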

Some Criticisms

• It is difficult to ensure that the experiment is repeated under the stable/uniform conditions.
• Experiment can be repeated only a finite number of times in real world, not an infinite number of times.

### 3) Subjective Approach

This is the probability based on the beliefs of the person making the probability assessment.
Subjective probability assessments are often found when events occur only once or at most a very few times.
This approach is applicable in business, marketing and economics for quick decisions made without performing any mathematical calculations.
The disadvantage of subjective probability is that two or more persons facing the same evidence/problem may arrive at different probabilities, i.e. for the same problem there may be different decisions.

Real Life Example of Subjective Probability:

• A firm must decide whether or not to market a new type of product. The decision will be based on prior information that the product will have high market acceptance.
• The Sales Manager considers that there is 40% chances of obtaining the order for which the firm has just quoted. This value (40% chances) cannot be tested by repeated trials.
• Estimating the probability that you will be married before the age of 30 years.
• Estimating the likelihood (probability, chances) that Pakistan budget deficit will be reduced by half in the next 5 years.

Note that since subjective probability is not based on a repeatable experiment, the relative frequency approach to probability is not applicable, nor can equally likely probabilities be assigned.

# Probability Related Terms

Sets: A set is a well defined collection of distinct objects. The objects making up a set are called its elements. A set is usually denoted by capital letters i.e. A, B, C, while its elements are denoted by small letters i.e. a, b, c etc.

Null Set: A set that contains no element is called null set or simply the empty set. It is denoted by { } or Φ.

Subset: If every element of a set A is also an element of a set B, then A is said to be a subset of B and it is denoted by A ⊆ B.

Proper Subset: If A is a subset of B, and B contains at least one element which is not an element of A, then A is said to be a proper subset of B and is denoted by; A $\subset$ B.

Finite and Infinite Sets: A set is finite, if it contains a specific number of elements, i.e. while counting the members of the sets, the counting process comes to an end otherwise the set is an infinite set.

Universal Set: A set consisting of all the elements of the sets under consideration is called the universal set. It is denoted by U.

Disjoint Set: Two sets A and B are said to be disjoint sets if they have no elements in common, i.e. if A ∩ B = Φ, then A and B are said to be disjoint sets.

Overlapping Sets: Two sets A and B are said to be overlapping sets, if they have at least one element in common, i.e. if A ∩ B ≠Φ and none of them is the subset of the other set then A and B are overlapping sets.

Union of Sets: Union of two sets A and B is a set that contains the elements either belonging to A or to B or to both. It is denoted by A U B and read as A union B.

Intersection of Sets: Intersection of two sets A and B is a set that contains the elements belonging to both A and B. It is denoted by A ∩ B and read as A intersection B.

Difference of Sets: The difference of a set A and a set B is the set that contains the elements of the set A which are not contained in B. The difference of sets A and B is denoted by A−B.

Complement of a Set: Complement of a set A denoted by $\bar{A}$ or $A^c$ and is defined as $\bar{A}$=U−A.
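The set operations above map directly onto Python's built-in `set` type, which makes a convenient way to experiment with them (the example sets are arbitrary):

```python
U = {1, 2, 3, 4, 5, 6}        # universal set
A = {1, 2, 3}
B = {3, 4, 5}

print(A | B)                  # union A U B
print(A & B)                  # intersection A ∩ B
print(A - B)                  # difference A - B
print(U - A)                  # complement of A (relative to U)
print({1, 2} <= A)            # subset test
print({1, 2} < A)             # proper subset test
print(A.isdisjoint({6}))      # disjoint test (A ∩ {6} = Φ)
```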

Experiment: Any activity where we observe something or measure something, or an activity that results in or produces an event, is called an experiment.

Random Experiment: An experiment which, if repeated under identical conditions, may not give the same outcome; i.e. the outcome of a random experiment is uncertain, so that a given outcome is just one sample of many possible outcomes. For a random experiment we know all the possible outcomes in advance. A random experiment has the following properties:

1. The experiment can be repeated any number of times.
2. A random trial consists of at least two outcomes.

Sample Space: The set of all possible outcomes in a random experiment is called the sample space. In the coin toss experiment, the sample space is S={Head, Tail}; in the card-drawing experiment the sample space has 52 members. Similarly, the sample space for a die is S={1,2,3,4,5,6}.

Event: An event is simply a subset of the sample space. In a sample space there can be two or more events consisting of sample points. For a coin, the number of possible events is $2^n = 4$, i.e. i) A1={H}, ii) A2={T}, iii) A3={H,T} and iv) A4={ } are the possible events for the coin toss experiment.

Simple Event: If an event consists of one sample point, then it is called simple event. For example, when two coins are tossed, the event {TT} is a simple event.

Compound Event: If an event consists of more than one sample point, it is called a compound event. For example, when two dice are rolled, the event B that the sum of the two faces is 4, i.e. B={(1,3), (2,2), (3,1)}, is a compound event.

Independent Events: Two events A and B are said to be independent, if the occurrence of one does not affect the occurrence of the other. For example, in tossing two coins, the occurrence of a head on one coin does not affect in any way the occurrence of a head or tail on the other coin.

Dependent Events: Two events A and B are said to be dependent, if the occurrence of one event affects the occurrence of the other event.

Mutually Exclusive Events: Two events A and B are said to be mutually exclusive if they cannot occur at the same time, i.e. A ∩ B = Φ. For example, when a coin is tossed, we get either a head or a tail, but not both. That is why they have no common point; so these two events (head and tail) are mutually exclusive. Similarly, when a die is thrown, the possible outcomes 1, 2, 3, 4, 5, 6 are mutually exclusive.

Equally Likely or Non-Mutually Exclusive Events: Two events A and B are said to be equally likely events when one event is as likely to occur as the other, OR if the experiment is repeated a large number of times, all the events have the chance of occurring an equal number of times. For non-mutually exclusive events, mathematically, A ∩ B ≠ Φ. For example, when a coin is tossed, head is as likely to occur as tail or vice versa.

Exhaustive Events: When a sample space S is partitioned into some mutually exclusive events such that their union is the sample space itself, the events are called exhaustive events. OR
Events are said to be collectively exhaustive when the union of mutually exclusive events is the entire sample space S.
Let a die be rolled; the sample space is S={1,2,3,4,5,6}.
Let A={1,2}, B={3,4,5} and C={6}

A, B and C are mutually exclusive events and their union (AUBUC=S) is the sample space, so the events A, B and C are exhaustive.
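The die example can be checked mechanically: mutually exclusive means pairwise disjoint, and exhaustive means the union recovers S. A small sketch:

```python
S = {1, 2, 3, 4, 5, 6}
A, B, C = {1, 2}, {3, 4, 5}, {6}

# mutually exclusive: every pair of events is disjoint
pairwise_disjoint = A.isdisjoint(B) and A.isdisjoint(C) and B.isdisjoint(C)

# exhaustive: the union of the events is the whole sample space
exhaustive = (A | B | C) == S

print(pairwise_disjoint, exhaustive)   # True True
```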

# Binomial Probability Distributions

Bernoulli Trials

Many experiments consist of repeated independent trials where each trial has only two possible outcomes, such as head or tail, right or wrong, alive or dead, defective or non-defective etc. If the probability of each outcome remains the same (constant) throughout the trials, then such trials are called Bernoulli Trials.

Binomial Probability Distribution
The Binomial Probability Distribution is a discrete probability distribution describing the results of an experiment known as a Bernoulli Process. An experiment consisting of n Bernoulli trials is called a Binomial probability experiment and possesses the following four conditions/assumptions:

1. The experiment consists of n repeated trials.
2. Each trial results in an outcome that may be classified as a success or a failure.
3. The probability of success denoted by p remains constant from trial to trial.
4. The repeated trials are independent.

A Binomial trial can result in a success with probability p and a failure with probability q = 1−p. If there are x successes (and hence n−x failures) in n independent trials, then the probability distribution of the Binomial random variable X, the number of successes, is:

\begin{align*}
P(X=x)&=\binom{n}{x} \, p^x \, q^{n-x} \\
&=\frac{n!}{x!(n-x)!}\, p^x \, q^{n-x}
\end{align*}
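The pmf can be computed directly with `math.comb`; summing over all x confirms that the probabilities total 1 and reproduces the mean np and variance npq given below (the values of n and p are arbitrary):

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1-p)^(n-x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3
q = 1 - p

total = sum(binom_pmf(x, n, p) for x in range(n + 1))           # should be 1
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))        # should be n*p
var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))  # n*p*q

print(total, mean, var)
```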

The Binomial probability distribution is one of the most widely used distributions in situations with two outcomes. It was discovered by the Swiss mathematician Jakob Bernoulli (1654-1705), whose main work “Ars Conjectandi” (The Art of Conjecturing) was published posthumously in Basel in 1713.

Mean of Binomial Distribution:   Mean = μ = np

Variance of Binomial Distribution:  Variance= npq

Standard Deviation of Binomial Distribution:  Standard Deviation = $\sqrt{npq}$

Moment Coefficient of Skewness:

\begin{align*}
\sqrt{\beta_1} &= \frac{q-p}{\sqrt{npq}}  \\
&= \frac{1-2p}{\sqrt{npq}}
\end{align*}

Moment Coefficient of Kurtosis:  $\beta_2 = 3+\frac{1-6pq}{npq}$
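These shape coefficients can be verified by computing the third and fourth central moments of the pmf directly and comparing the skewness $\frac{q-p}{\sqrt{npq}}$ and the kurtosis $3+\frac{1-6pq}{npq}$ against them (a numerical check; n and p are arbitrary):

```python
import math

n, p = 12, 0.25
q = 1 - p

pmf = [math.comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(pmf))
mu2 = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))   # variance
mu3 = sum((x - mean) ** 3 * px for x, px in enumerate(pmf))   # 3rd central moment
mu4 = sum((x - mean) ** 4 * px for x, px in enumerate(pmf))   # 4th central moment

skew = mu3 / mu2 ** 1.5   # moment coefficient of skewness
kurt = mu4 / mu2 ** 2     # moment coefficient of kurtosis

print(skew, (q - p) / math.sqrt(n * p * q))
print(kurt, 3 + (1 - 6 * p * q) / (n * p * q))
```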