Finding the probability distribution after n steps of a random walk starting at x=0


Probability Distribution after n steps

Assume that the walk starts at x=0 with steps to the right or left occurring with probabilities p and q=1-p. We can write the position $X_n$ after n steps as
\[X_n=R_n-L_n \tag{1}\]
where $R_n$ is the number of right or positive steps (+1) and $L_n$ is the number of left or negative steps (-1).

Therefore the total number of steps is \[n=R_n+L_n \tag{2}\]
Hence
\begin{align*}
L_n&=n-R_n\\
\Rightarrow X_n&=R_n-n+R_n\\
R_n&=\frac{1}{2}(n+X_n) \tag{3}
\end{align*}
The right-hand side of equation (3) is an integer only when n and $X_n$ are both even or both odd (e.g. to go from x=0 to x=7 we must take an odd number of steps).

Now, let $v_{n,x}$ be the probability that the walk is at state x after n steps assuming that x is a positive integer. Then
\begin{align*}
v_{n,x}&=P(X_n=x)\\
&=P(R_n=\frac{1}{2}(n+x))
\end{align*}
Since the walker either moves to the right or not at every step, and the steps are independent, $R_n$ is a binomial random variable with index $n$ and success probability p. Therefore
\begin{align*}
v_{n,x}&=\binom{n}{\frac{1}{2}(n+x)}p^{\frac{1}{2}(n+x)}q^{n-\frac{1}{2}(n+x)}\\
&=\binom{n}{\frac{1}{2}(n+x)}p^{\frac{1}{2}(n+x)}q^{\frac{1}{2}(n-x)} \tag{4}
\end{align*}
where (n,x) are both even or both odd and $-n \leq x \leq n$. Note that a similar argument can be constructed if x is a negative integer.
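Equation (4) can be evaluated directly. The following is a minimal Python sketch; the function name `walk_prob` is only illustrative and is not part of the original derivation.

```python
from math import comb

def walk_prob(n, x, p):
    """P(X_n = x) for a simple random walk started at 0, following equation (4)."""
    # x cannot be reached if it is too far away or has the wrong parity.
    if abs(x) > n or (n + x) % 2 != 0:
        return 0.0
    r = (n + x) // 2  # number of right steps, equation (3)
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Full distribution after n = 4 steps with p = 0.6
dist = {x: walk_prob(4, x, 0.6) for x in range(-4, 5)}
total = sum(dist.values())
for x, prob in dist.items():
    print(x, round(prob, 4))
```

Positions with the wrong parity get probability zero, and the probabilities over all reachable positions add up to one.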

Example: If the total number of steps is 2, the net displacement must be one of three possibilities: (1) two steps to the left, (2) back to the start, or (3) two steps to the right. These correspond to the values x = -2, 0, +2. Clearly it is impossible to get more than two units away from the origin if you take only two steps, and it is equally impossible to end up exactly one unit from the origin if you take two steps.

For the symmetric case ($p=\tfrac{1}{2}$), starting from the origin, there are $2^n$ different paths of length n, since there is a choice of a right or left move at each step. Since the number of steps in the right direction must be $\tfrac{1}{2}(n+x)$, the number of paths ending at x must be the number of ways in which $\frac{1}{2}(n+x)$ steps can be chosen from n, that is
\[N_{n,x}=\binom{n}{\tfrac{1}{2}(n+x)}\]
provided that $\tfrac{1}{2}(n+x)$ is an integer.

By counting rule, the probability that the walk ends at x after n steps is given by the ratio of this number and the total number of paths (since all paths are equally likely). Hence
\[v_{n,x}=\frac{N_{n,x}}{2^n}=\binom{n}{\tfrac{1}{2}(n+x)}\frac{1}{2^n}\]
The probability $v_{n,x}$ is the probability that the walk ends at state x after n steps: the walk could have overshot x before returning there.
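The counting argument can be checked by brute force for small n. The sketch below (an illustration, with the hypothetical helper `endpoint_counts`) enumerates all $2^n$ paths and compares the number ending at each x with the binomial coefficient $N_{n,x}$.

```python
from collections import Counter
from itertools import product
from math import comb

def endpoint_counts(n):
    """Count, by enumeration, how many of the 2**n paths end at each position."""
    return Counter(sum(path) for path in product((-1, 1), repeat=n))

n = 6
counts = endpoint_counts(n)
for x in sorted(counts):
    # The enumerated count must equal the binomial coefficient N_{n,x}.
    assert counts[x] == comb(n, (n + x) // 2)
    print(x, counts[x], counts[x] / 2**n)
```

Enumeration is only practical for small n because the number of paths doubles with every step.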

Related Probability/ First Passage through x

A related probability is the probability that the first visit to position x occurs at the nth step. The following is a descriptive derivation of the associated generating function for the symmetric random walk, in which the walk starts at the origin and we consider the probability that it returns to the origin.

From equation (4), the probability that a walk is at the origin at step n is
\begin{align*}
v_{n,0}&=\binom{n}{\frac{1}{2}(n+0)}p^{\frac{1}{2}(n+0)}q^{n-\frac{1}{2}(n+0)}\\
&=\binom{n}{\tfrac{1}{2}n} \left(\frac{1}{2}\right)^{\tfrac{1}{2}n} \left(\frac{1}{2}\right)^{\tfrac{1}{2}n}\\
&=\binom{n}{\tfrac{1}{2}n}\frac{1}{2^n}=p_n \,\,\,\text{(say)}, \quad (n=2,4,6,\cdots) \tag{5}
\end{align*}
Here $p_n$ is the probability that after n steps the position of walker is at origin. We also assume that $p_n=0$ if n is odd. From equation (5) we can construct a generating function.
\begin{align*}
H(s)&=\sum_{n=0}^\infty p_n s^n\\
&=\sum_{n=0}^\infty p_{2n}s^{2n}=\sum_{n=0}^\infty \frac{1}{2^{2n}}\binom{2n}{n}s^{2n} \tag{6}
\end{align*}
Note that $p_0=1$, and H(s) is not a probability generating function since $H(1)\neq1$.

The binomial coefficient can be re-arranged as follows:
\begin{align*}
\binom{2n}{n}&=\frac{(2n)!}{n!\,n!}=\frac{2n(2n-1)(2n-2)\cdots 3\cdot 2\cdot 1}{n!\,n!}\\
&=\frac{2^n n!\,(2n-1)(2n-3)\cdots 3\cdot 1}{n!\,n!}\\
&=\frac{2^{2n}}{n!}\frac{1}{2}\frac{3}{2}\cdots(n-\tfrac{1}{2})\\
&=(-1)^n \binom{-\tfrac{1}{2}}{n}2^{2n} \tag{7}
\end{align*}
Substituting equation (7) into equation (6) gives
\[H(s)=\sum_{n=0}^\infty \frac{1}{2^{2n}}(-1)^n \binom{-\tfrac{1}{2}}{n}s^{2n}2^{2n}=(1-s^2)^{-\tfrac{1}{2}} \tag{8}\]
by binomial theorem, provided $|s|<1$. Note that this expansion guarantees that $p_n=0$ if n is odd.
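Both the identity (7) and the closed form (8) can be checked numerically. The sketch below uses exact rational arithmetic for the generalised binomial coefficient; the helper names `gen_binom` and `H_partial` are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb, factorial

def gen_binom(a, n):
    """Generalised binomial coefficient a(a-1)...(a-n+1)/n! for rational a."""
    num = Fraction(1)
    for k in range(n):
        num *= a - k
    return num / factorial(n)

def H_partial(s, terms):
    """Partial sum of equation (6) using the first `terms` even-index terms."""
    return sum(comb(2 * n, n) / 4**n * s**(2 * n) for n in range(terms))

# Identity (7): binom(2n, n) = (-1)^n * binom(-1/2, n) * 2^(2n)
identity_ok = all(
    comb(2 * n, n) == (-1)**n * gen_binom(Fraction(-1, 2), n) * 4**n
    for n in range(10)
)

# Closed form (8), valid for |s| < 1
s = 0.5
approx = H_partial(s, 60)
exact = (1 - s**2) ** -0.5
print(identity_ok, approx, exact)
```

The partial sum converges quickly for s well inside the unit interval and more slowly as |s| approaches 1.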

Note that the probabilities $p_n$ generated by equation (8) do not sum to one. This is called a defective distribution; each $p_n$ still gives the probability that the walk is at the origin at step n.

We can estimate the behaviour of $p_n$ for large n by using Stirling’s Formula (asymptotic estimate for n! for large n), $n!\approx\sqrt{2\pi} n^{n+\tfrac{1}{2}}e^{-n}$

From equation (5)
\begin{align*}
p_{2n}&=\frac{1}{2^{2n}}\binom{2n}{n}=\frac{1}{2^{2n}}\frac{(2n)!}{n!n!}\\
&\approx\frac{1}{2^{2n}}\frac{\sqrt{2\pi}(2n)^{2n+\tfrac{1}{2}}e^{-2n}}{[\sqrt{2\pi}(n^{n+\tfrac{1}{2}}e^{-n})]^2}\\
&=\frac{1}{\sqrt{\pi n}}; \qquad \text{for large $n$}
\end{align*}
Hence $np_{2n}\approx\sqrt{n/\pi}\rightarrow \infty$, confirming that the series $\sum\limits_{n=0}^\infty p_n$ must diverge.
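The quality of the Stirling estimate can be seen by comparing the exact value of $p_{2n}$ with $1/\sqrt{\pi n}$. This is a small illustrative sketch; the function name `p_return` is hypothetical.

```python
from math import comb, pi, sqrt

def p_return(two_n):
    """Exact probability of being at the origin after an even number of steps."""
    n = two_n // 2
    return comb(2 * n, n) / 4**n

for n in (5, 50, 500):
    exact = p_return(2 * n)
    approx = 1 / sqrt(pi * n)
    # The ratio exact/approx moves towards 1 as n grows.
    print(n, exact, approx, exact / approx)
```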

Example: Consider a random walk that starts from $x_0=0$ with p=0.6. Find the probability that the position after 5 steps is 3, i.e. $P(X_5=3)$.

Solution: Here the number of steps is n=5 and the position is x=3. Therefore the numbers of positive and negative steps are

$R_n= \frac{1}{2}(n+x)=\frac{1}{2}(5+3)=4$ and $X_n=R_n-L_n \Rightarrow 3=4-L_n \Rightarrow L_n=1$
The probability that the event $X_5=3$ will occur in a random walk with $p=0.6$ is
\[P(X_5=3)=\binom{5}{\frac{1}{2}(5+3)}(0.6)^{\tfrac{1}{2}(3+5)}(0.4)^{\tfrac{1}{2}(5-3)}=0.2592\]
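The value 0.2592 can also be approximated by simulation. The following Monte Carlo sketch is only an illustration: the function names, the number of trials and the seed are arbitrary choices, not part of the original solution.

```python
import random

def simulate_walk(n, p, rng):
    """Return the final position of one n-step walk starting at 0."""
    return sum(1 if rng.random() < p else -1 for _ in range(n))

def estimate_prob(n, x, p, trials=100_000, seed=1):
    """Estimate P(X_n = x) as the fraction of simulated walks ending at x."""
    rng = random.Random(seed)
    hits = sum(simulate_walk(n, p, rng) == x for _ in range(trials))
    return hits / trials

estimate = estimate_prob(5, 3, 0.6)
print(estimate)  # should be close to the exact value 0.2592
```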


The R language provides a rich help system for users to understand and learn the R language


Getting Help in R Language

The R language has a very useful and advanced help system, which helps the R user understand the language and learn how programming should be done in R.

To get help in R, you can click the Help button on the toolbar of the RGui (R Graphical User Interface) window. If you have internet access on your PC, you can type CRAN into Google and search for the help you need at CRAN.

Use of ? for Help

On the other hand, if you know the name of the function you want help with, type a question mark (?) followed by the name of the function at the R command prompt. For example, to get help about the “lm” function, type ?lm and then press the Enter key.
help(lm) and ?lm give the same result in R.

help.start()

To get general help in R, write the following command at the R command prompt
help.start()

Use of help.search()

Sometimes it is difficult to remember the precise name of a function, but you know the subject on which you need help, for example data input. Use the help.search function (without a question mark) with your query in double quotes, like this:
help.search("data input")
Press the Enter key and you will see the names of the R functions associated with the query. After that you can use ? with any of the listed function names (as in ?lm) to get more detailed help.

Use of find()

To get help, find and apropos are also useful functions. The find function tells you which package something is in. For example,

find("cor") returns "package:stats", showing that cor is in the stats package.

Use of apropos()

The apropos function returns a character vector giving the names of all objects in the search list that match your (potentially partial) enquiry, i.e. this command lists all functions containing your string. For example,
apropos("lm")
will give the list of all functions containing the string lm.

Use of example()

example(lm) will provide an example of your required function such as lm

Online Help

There is a huge amount of information about R on the web. On CRAN you will find a variety of help files and manuals. There are also answers to FAQs (Frequently Asked Questions) and R News (which contains interesting articles, book reviews and news of forthcoming releases). The search facility of the site allows you to investigate the contents of the R documents, functions, and searchable mail archives.

Help Manuals and Archived Mailing lists {RSiteSearch()}

You can search for a function or string in the help manuals and archived mailing lists by using
RSiteSearch("read.csv")

Get vignettes

Vignette is R jargon for a form of documentation written in the spirit of sharing knowledge and assisting new users in learning the purpose and use of a package. To get some help, try ?vignette. Vignettes are optional supplemental documentation, which is why not all packages come with vignettes.
vignette()             will show the available vignettes
vignette("foo")    will show a specific vignette

Now that you have learned how to use the help system in R, you can continue with the other R tutorials. It is possible that you will not understand something discussed in the coming R tutorials. If this happens, use the built-in help system before going to the internet. In most cases, the help system of R will give you enough information about the function you have searched for.

Some Sources of R Help/ Manuals/ Documentation

http://cran.r-project.org/manuals.html

http://manuals.bioinformatics.ucr.edu/home/programming-in-r

http://rwiki.sciviews.org/doku.php

http://cran.r-project.org/bin/windows/base/rw-FAQ.html


Introduction to R Language


R Language

What is R (Language)
R is an open-source (GPL) programming language for statistical computing and graphics, modeled after the S and S-PLUS languages. The S language was developed at AT&T Bell Laboratories in the 1970s. Robert Gentleman and Ross Ihaka of the Statistics Department of the University of Auckland started R as a research project in the early 1990s, and it was released under the GPL in 1995.

The R language is currently maintained by the R Core Development Team (an international team of volunteer developers). The R Project website is the main site for information about R. From this site, information about obtaining the software, accompanying packages and many other sources of documentation (help files) can be obtained.

R provides a wide variety of statistical and graphical techniques such as linear and non-linear modeling, classical statistical tests, time series analysis, classification, multivariate analysis etc., as it is an integrated suite of software having facilities for data manipulation, calculation and graphics display. It includes

  • Effective data handling and storage facilities
  • A suite of operators for calculations on arrays, particularly matrices
  • A large, coherent, integrated collection of intermediate tools for data analysis
  • Graphical facilities for data analysis
  • Conditionals, loops, user-defined recursive functions, and input and output facilities.

Obtaining R Software
R can be obtained (downloaded) from the R Project site as ready-to-run (binary) files for several operating systems such as Windows, Mac OS X, Linux, Solaris, etc. The source code for R is also available for download and can be compiled for other platforms. R simplifies many statistical computations: it is a very powerful statistical language with many statistical routines (programming code) developed by people from all over the world, which are freely available from the R project website (www.r-project.org) as “Packages”. The basic installation of R contains a powerful set of tools and includes some basic packages required for data handling and data analysis.

Many users of R think of R as a statistical system, but it is an environment within which statistical techniques are implemented. R can also be extended via packages.

Installing R
For the Windows operating system, a binary version is available from http://cran.r-project.org/bin/windows/base/. At the time of writing, the latest version is R-3.0.0 (installer R-3.0.0-win.exe), released on 03-April-2013 by Duncan Murdoch.
After downloading the binary file, double-click it. An almost automatic installation of the R system will start, although a customized installation option is also available. Follow the instructions during the installation procedure. Once the installation process is complete, you will have an R icon on your computer desktop.

The R Console
When R starts, you will see the R Console window, where you type commands to get the required results. Note that commands are typed at the R Console command prompt. You can also edit previously typed commands by using the left, right, up and down arrow keys and the Home, End, Backspace, Insert and Delete keys. The command history can be retrieved by using the up and down arrow keys to scroll through recent commands. It is also possible to type commands in a file and then execute the file using the source function in the R Console.

Books
The following books can be useful for learning the R and S languages.

  • “Psychologie statistique avec R” by Yvonnick Noel. Pratique R. Springer, 2013.
  • “Instant R: An introduction to R for statistical Analysis” by Sarah Stowell. Jotunheim Publishing, 2012.
  • “Financial Risk Modeling and Portfolio Optimization with R” by Bernhard Pfaff. Wiley, Chichester, UK, 2012.
  • “An R companion to Applied Regression” by John Fox and Sanford Weisberg. Sage Publications, Thousand Oaks, CA, USA, 2nd Edition, 2011.
  • “R Graphs Cookbook” by Hrishi Mittal, Packt Publishing, 2011
  • “R in Action” by Rob Kabacoff. Manning, 2010.
  • “The statistical analysis with R Beginners Guide” by John M. Quick. Packt Publishing, 2010.
  • “Introducing Monte Carlo Methods with R” by Christian Robert and George Casella. Use R. Springer, 2010.
  • “R for SAS and SPSS users” by Robert A. Muenchen. Springer Series in Statistics and Computing. Springer, 2009.

Web Sources
Following are some useful web source for learning R


Graph is a visual display of data in the form of continuous curves


Graphic Presentation of Data

A chart or graph says more than twenty pages of prose; this is especially true when you are presenting and explaining data. A graph is a visual display of data in the form of continuous curves or discontinuous lines on graph paper. Many graphs simply summarize data that have been collected to support a particular theory, helping the audience to understand the data quickly in a visual way, to make a comparison, to show a relationship, or to highlight a trend.

Usually it is suggested that graphical representations of the data should be carefully looked at before proceeding for the formal statistical analysis, because trend in the data can often be depicted by the use of charts and graphs.

A chart/ graph is a graphical representation of data, in which the data is usually represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart. A chart/ graph can represent tabular numeric data, functions or some kinds of qualitative structures.

Common Uses of Graphs

Presenting the data in graph is a pictorial way of representing relationships between various quantities, parameters, variables. A graph basically summarizes how one quantity changes if another quantity that is related to it also changes.

  1. Graphs are useful for checking assumptions made about the data i.e. the probability distribution assumed.
  2. The graphs provide a useful subjective impression as to what the results of the formal analysis should be.
  3. Graphs often suggest the form of a statistical analysis to be carried out, particularly, the graph of model fitted to the data.
  4. Graphs give the reader a visual representation of the data or of the results of a statistical analysis, which is usually easier to understand and more attractive.
  5. Some graphs are useful for checking the variability in the observations, and outliers can be easily detected.

Some Important Points for Drawing Graphs

  • Clearly label the axes with the names of the variables and the units of measurement.
  • Keep the units along each axis uniform, regardless of the scales chosen for the axes.
  • Keep the diagram simple. Avoid any unnecessary details.
  • A clear and concise title should be chosen to make the graph meaningful.
  • If the data on different graphs are to be compared, always use identical scales.
  • In the scatter plot, do not join up the dots. This makes it likely that you will see apparent patterns in any random scatter of points.
  • Use either grid rulings or tick marks on the axis to mark the graph divisions.
  • Use color, shading, or pattern to differentiate the different sections of the graphs such as lines, pieces of the pie, bars etc.
  • In general start each axis from zero; if the graph is too large, indicate a break in the grid.


Percentiles are a measure of the relative standing of an observation within a data set


Percentiles

Percentiles are a measure of the relative standing of an observation within a data set. Percentiles divide a set of observations into 100 equal parts, and percentile scores are frequently used to report results from national standardized tests such as NAT, GAT etc.

The pth percentile is the value Y(p) in the order statistics such that p percent of the values are less than Y(p) and (100-p) percent of the values are greater than Y(p). The 5th percentile is denoted by P5, the 10th by P10 and the 95th by P95.

Percentiles for the ungrouped data

To calculate percentiles for the ungrouped data, adopt the following procedure

  1. Order the observations.
  2. For the mth percentile, determine the product $\frac{m \times n}{100}$. If $\frac{m \times n}{100}$ is not an integer, round it up and take the corresponding ordered value. If $\frac{m \times n}{100}$ is an integer, say k, then calculate the mean of the kth and (k+1)th ordered observations.

Example: For the following height data collected from students find the 10th and 95th percentiles. 91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101, 96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112.

Solution: The ordered observations of the data are 87, 87, 88, 89, 89, 90, 91, 91, 92, 95, 96, 96, 97, 98, 98, 98, 99, 99, 100, 100, 101, 101, 102, 103, 105, 105, 106, 107, 107, 112.

For the 10th percentile,
\[\frac{10 \times 30}{100}=3\]

Since 3 is an integer, P10 is the mean of the 3rd and 4th observations in the sorted data, i.e. $P_{10}=\frac{88+89}{2}=88.5$. This means that about 10 percent of the observations in the data set are less than 88.5.

For the 95th percentile,
\[\frac{95 \times 30}{100}=28.5\]

Since 28.5 is not an integer, it is rounded up to 29, and the 29th observation is our 95th percentile, i.e. P95=107.
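The two-step rule for ungrouped data can be written as a short function. This is a sketch under the assumption that m is a whole number strictly between 0 and 100; the name `percentile_ungrouped` is illustrative.

```python
def percentile_ungrouped(data, m):
    """m-th percentile using the rule above (integer m with 0 < m < 100)."""
    if not 0 < m < 100:
        raise ValueError("m must be strictly between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    k, remainder = divmod(m * n, 100)
    if remainder != 0:
        # m*n/100 is not an integer: round up to position k+1 (index k).
        return ordered[k]
    # m*n/100 equals the integer k: average the k-th and (k+1)-th values.
    return (ordered[k - 1] + ordered[k]) / 2

heights = [91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101,
           96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112]
for m in (10, 50, 95):
    print(m, percentile_ungrouped(heights, m))
```

Other percentile conventions exist (for example interpolation between neighbouring values), so statistical packages may return slightly different numbers for the same data.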

Percentiles for the Grouped data

The mth percentile for grouped data is

\[P_m=l+\frac{h}{f}\left(\frac{m \times n}{100}-c\right)\]

As with the median, $\frac{m \times n}{100}$ is used to locate the class containing the mth percentile. Here

l     is the lower class boundary of the class containing Pm
h   is the width of the class containing Pm
f    is the frequency of the class containing Pm
n   is the total number of observations (the sum of the frequencies)
c    is the cumulative frequency of the class immediately preceding the class containing Pm

Note that 50th percentile is the median by definition as half of the values in the data are smaller than the median and half of the values are larger than the median. Similarly 25th and 75th percentiles are the lower (Q1) and upper quartiles (Q3) respectively. The quartiles, deciles and percentiles are also called quantiles or fractiles.

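Example: For the following grouped data, compute P10, P25, P50, and P95.

Class boundaries | Frequency (f) | Cumulative frequency
85.5–90.5 | 6 | 6
90.5–95.5 | 4 | 10
95.5–100.5 | 10 | 20
100.5–105.5 | 6 | 26
105.5–110.5 | 3 | 29
110.5–115.5 | 1 | 30

Solution: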

  1. Locate the 10th percentile (the lower decile, i.e. D1) by $\frac{10 \times n}{100}=\frac{10 \times 30}{100}=3$rd observation.
    So the P10 group is 85.5–90.5, which contains the 3rd observation.
    \begin{align*}
    P_{10}&=l+\frac{h}{f}\left(\frac{10 n}{100}-c\right)\\
    &=85.5+\frac{5}{6}(3-0)\\
    &=85.5+2.5=88
    \end{align*}
  2. Locate the 25th percentile (the lower quartile, i.e. Q1) by $\frac{25 \times n}{100}=\frac{25 \times 30}{100}=7.5$th observation.
    So the P25 group is 90.5–95.5, which contains the 7.5th observation.
    \begin{align*}
    P_{25}&=l+\frac{h}{f}\left(\frac{25 n}{100}-c\right)\\
    &=90.5+\frac{5}{4}(7.5-6)\\
    &=90.5+1.875=92.375
    \end{align*}
  3. Locate the 50th percentile (the median, i.e. the 2nd quartile or 5th decile) by $\frac{50 \times n}{100}=\frac{50 \times 30}{100}=15$th observation.
    So the P50 group is 95.5–100.5, which contains the 15th observation.
    \begin{align*}
    P_{50}&=l+\frac{h}{f}\left(\frac{50 n}{100}-c\right)\\
    &=95.5+\frac{5}{10}(15-10)\\
    &=95.5+2.5=98
    \end{align*}
  4. Locate the 95th percentile by $\frac{95 \times n}{100}=\frac{95 \times 30}{100}=28.5$th observation.
    So the P95 group is 105.5–110.5, which contains the 28.5th observation.
    \begin{align*}
    P_{95}&=l+\frac{h}{f}\left(\frac{95 n}{100}-c\right)\\
    &=105.5+\frac{5}{3}(28.5-26)\\
    &=105.5+4.1667=109.6667
    \end{align*}

The percentiles and quartiles may be read directly from the graphs of cumulative frequency function.
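The grouped-data formula can be sketched in code as follows. The class boundaries and frequencies are taken from the worked solution above; the frequencies of the two classes that the solution does not use (100.5–105.5 and 110.5–115.5) are inferred from the cumulative frequencies and n = 30, and the function name is hypothetical.

```python
def percentile_grouped(boundaries, freqs, m):
    """m-th percentile P_m = l + (h/f) * (m*n/100 - c) for grouped data."""
    n = sum(freqs)
    target = m * n / 100
    c = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and c + f >= target:
            # This class contains the m-th percentile.
            return lower + (upper - lower) / f * (target - c)
        c += f
    raise ValueError("m must be between 0 and 100")

boundaries = [(85.5, 90.5), (90.5, 95.5), (95.5, 100.5),
              (100.5, 105.5), (105.5, 110.5), (110.5, 115.5)]
freqs = [6, 4, 10, 6, 3, 1]  # 4th and 6th values are inferred, see above

for m in (10, 25, 50, 95):
    print(m, round(percentile_grouped(boundaries, freqs, m), 4))
```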


Stem and Leaf plot helps to visualize the features of distribution of observed data


Stem and Leaf plots

Before performing any statistical calculation (even the simplest one), data should be tabulated or plotted especially if they are of quantitative nature and are few in number (few observations) to visualize the shape of the distribution.

A stem and leaf plot is a way of summarizing a set of data measured on an interval scale in condensed form. Stem and leaf plots are often used in exploratory data analysis, and help to illustrate the different features of the distribution of the observed data. A basic stem-and-leaf plot contains two columns separated by a vertical line. The left side of the vertical line contains the stems, while the right side contains the leaves. It is customary to sort the values within each stem from smallest to largest. In this technique for presenting a set of data, each numerical value is divided into two parts

  1. Leading Digit(s)
  2. Trailing Digit

Stem values are the leading digit(s) and leaves are trailing digit. The stems are located along the vertical axis, and the leaf values are stacked against each other along the horizontal axis.

A stem and leaf plot is similar to a frequency distribution but carries more information. It provides information about the symmetry, concentration, gaps and outliers of the observed data set. Organizing the data into a frequency distribution has the disadvantages that

  1. The exact identity of each value is lost (the individuality of the observations vanishes)
  2. We do not know how the values within each class are distributed.

The advantage of the stem and leaf plot (display) over a frequency distribution is that we do not lose the identity (individuality) of each observation. Similarly, a stem and leaf plot is similar to a histogram but usually provides more information for a relatively small data set.

More than one data set can be compared by using multiple stem and leaf plots. Using a back-to-back stem and leaf plot we can compare the same characteristic in two different groups.

The origin of the stem and leaf plot is associated with Tukey, J.W (1977).

Constructing a stem-and-leaf display

Suppose we have the following data set: 56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59, 69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73 and want to draw a stem and leaf plot of the data.

First of all, it is better to sort the data. The sorted data are 56, 59, 64, 65, 69, 70, 71, 73, 76, 77, 78, 80, 82, 82, 85, 86, 91, 91, 92, 92, 95, 98, 99.

Now the first digit is the stem and the second is the leaf, i.e. the stems run from 5 to 9 as the data range from 56 to 99.

Draw a vertical line separating the stems from the leaves. Put the stem values on the left side of the vertical line (bar) and the leaf values on the right side. Note that each number is assigned to the plot by pairing the units digit, or leaf, with the correct stem. The score 56 is plotted by placing the units digit 6 to the right of stem 5.

The stem and leaf plot of the above data would look like.

The decimal point is 1 digit(s) to the right of the |
Stem | Leaf
5      | 6 9
6      | 4 5 9
7      | 0 1 3 6 7 8
8      | 0 2 2 5 6
9      | 1 1 2 2 5 8 9
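The construction steps above can be sketched in a few lines of Python. This is a minimal illustration for two-digit whole numbers only; the function name `stem_and_leaf` is an assumption.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) with units digits as leaves."""
    plot = defaultdict(list)
    for v in sorted(values):  # sorting keeps stems and leaves in ascending order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59,
        69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73]

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```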

A stem and leaf plot looks like a histogram when it is rotated anti-clockwise.

By adding columns of frequency and cumulative frequency in stem and leaf plot we can find median of the data.

Reference

Tukey, J. W. (1977). Exploratory Data Analysis.


Continuous Probability Distribution, Ratio of Two Independent Chi-Square Statistics


Introduction

F-distribution is a continuous probability distribution (also known as Snedecor’s F distribution or the Fisher-Snedecor distribution) which is named in honor of R.A. Fisher and George W. Snedecor. This distribution arises frequently as the null distribution of a test statistic (hypothesis testing), used to develop confidence interval and in the analysis of variance for comparison of several population means.

If $s_1^2$ and $s_2^2$ are two unbiased estimates of the population variance $\sigma^2$, obtained from independent samples of sizes $n_1$ and $n_2$ respectively from the same normal population, then the F-ratio is defined mathematically as
\[F=\frac{s_1^2}{s_2^2}=\frac{\frac{(n_1-1)\frac{s_1^2}{\sigma^2}}{v_1}}{\frac{(n_2-1)\frac{s_2^2}{\sigma^2}}{v_2}}\]
where $v_1=n_1-1$ and $v_2=n_2-1$. Since $\chi_1^2=(n_1-1)\frac{s_1^2}{\sigma^2}$ and $\chi_2^2=(n_2-1)\frac{s_2^2}{\sigma^2}$ are distributed independently as $\chi^2$ with $v_1$ and $v_2$ degrees of freedom respectively, we have
\[F=\frac{\frac{\chi_1^2}{v_1}}{\frac{\chi_2^2}{v_2}}\]

So, F Distribution is the ratio of two independent Chi-square ($\chi^2$) statistics each divided by their respective degree of freedom.

Properties

  •  F distribution takes only non-negative values since the numerator and denominator of the F-ratio are squared quantities.
  • The range of F values is from 0 to infinity.
  • The shape of the F-curve depends on the parameters v1 and v2 (its numerator and denominator df). It is a non-symmetrical distribution, skewed to the right (positively skewed). It tends to become more and more symmetric as one or both of the parameter values (v1, v2) increase, as shown in the following figure.

Figure: F distribution

  • It is asymptotic. As the value of F increases, the F-curve approaches the horizontal axis but never touches or crosses it (similar to the behavior of the normal probability distribution).
  • F has a unique mode at the value \[\tilde{F}=\frac{v_2(v_1-2)}{v_1(v_2+2)},\quad (v_1>2)\] which is always less than unity.
  • The mean of F is $\mu=\frac{v_2}{v_2-2},\quad (v_2>2)$
  • The variance of F is \[\sigma^2=\frac{2v_2^2(v_1+v_2-2)}{v_1(v_2-2)^2(v_2-4)},\quad (v_2>4)\]
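The definition of F as a ratio of two independent chi-square variables, each divided by its degrees of freedom, can be illustrated by simulation. The sketch below uses only the Python standard library and relies on the fact that a chi-square variable with v degrees of freedom is a gamma variable with shape v/2 and scale 2; the degrees of freedom, sample size and seed are arbitrary choices.

```python
import random

def sample_f(v1, v2, rng):
    """One draw of F = (chi2_1 / v1) / (chi2_2 / v2) from independent chi-squares."""
    # A chi-square with v degrees of freedom is Gamma(shape=v/2, scale=2).
    chi1 = rng.gammavariate(v1 / 2, 2)
    chi2 = rng.gammavariate(v2 / 2, 2)
    return (chi1 / v1) / (chi2 / v2)

rng = random.Random(42)
v1, v2 = 5, 20
draws = [sample_f(v1, v2, rng) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)
theoretical_mean = v2 / (v2 - 2)  # mean of F, valid for v2 > 2
print(sample_mean, theoretical_mean)
```

All simulated values are non-negative, and the sample mean should lie close to $v_2/(v_2-2)$.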

Assumptions of F-distribution

The statistical procedure for comparing the variances of two populations assumes that

  • The two populations (from which the samples are drawn) follow the Normal distribution
  • The two samples are random samples drawn independently from their respective populations.

The statistical procedure for comparing three or more population means assumes that

  • The populations follow the Normal distribution
  • The populations have equal standard deviations σ
  • The populations are independent of each other.

Note

F-distribution is relatively insensitive to violations of the assumptions of normality of the parent population or the assumption of equal variances.

Use of F-table

For a given (specified) level of significance α, the symbol $F_\alpha(v_1,v_2)$ is used to represent the upper (right-hand side) 100α% point of an F distribution having v1 and v2 df.

The lower (left hand side) percentage point can be found by taking the reciprocal of F-value corresponding to upper (right hand side) percentage point, but number of df are interchanged i.e. \[F_{1-\alpha}(v_1,v_2)=\frac{1}{F_\alpha(v_2,v_1)}\]

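The probability density function of the variable F is given by
\[Y=k\,F^{\frac{v_1}{2}-1}\left(1+\frac{v_1}{v_2}F\right)^{-\frac{v_1+v_2}{2}}, \quad F>0\]
where k is a constant depending on v1 and v2 that makes the total area under the curve equal to one.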

References:

  • http://en.wikibooks.org/wiki/Statistics/Distributions/F
  • http://en.wikipedia.org/wiki/F-distribution
  • http://www.itl.nist.gov/div898/handbook/eda/section3/eda3665.htm