Statistics for Data Science & Analytics - Statistics MCQs, Software & Data Analysis

Percentiles: Relative Standing

Jul 6, 2025 / Mar 10, 2013 by Muhammad Imdad Ullah


Percentiles are a measure of the relative standing of an observation within a dataset. Percentiles divide an ordered set of observations into 100 equal parts, and percentile scores are frequently used to report results from national standardized tests such as the NAT, GAT, and GRE.

The $p$th percentile is the value $Y_{(p)}$ in the ordered data (the order statistics) such that $p$ percent of the values are less than $Y_{(p)}$ and $(100-p)$ percent of the values are greater than $Y_{(p)}$. The 5th percentile is denoted by $P_5$, the 10th by $P_{10}$, and the 95th by $P_{95}$.

Percentiles for the Ungrouped Data

To calculate percentiles (a measure of the relative standing of an observation) for the ungrouped data, adopt the following procedure:

Order the observations from smallest to largest.
For the $m$th percentile, compute the product $\frac{m \times n}{100}$, where $n$ is the number of observations. If $\frac{m \times n}{100}$ is not an integer, round it up and take the ordered value at that position. If $\frac{m \times n}{100}$ is an integer, say $k$, take the mean of the $k$th and $(k+1)$th ordered observations.

Ungrouped Data Example

For the following height data collected from students, find the 10th and 95th percentiles. 91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101, 96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112.

Solution: The ordered observations of the data are 87, 87, 88, 89, 89, 90, 91, 91, 92, 95, 96, 96, 97, 98, 98, 98, 99, 99, 100, 100, 101, 101, 102, 103, 105, 105, 106, 107, 107, 112.

\[\frac{10 \times 30}{100}=3\]

Since this product is an integer ($k=3$), the 10th percentile is the mean of the 3rd and 4th ordered observations, i.e., $P_{10}=\frac{88+89}{2}=88.5$, which means that 10 percent of the observations in the data set are less than 88.5.

\[\frac{95 \times 30}{100}=28.5\]

Since 28.5 is not an integer, it is rounded up to 29. The 29th ordered observation is the 95th percentile, i.e., $P_{95}=107$.
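The two-step procedure for ungrouped data can be sketched in Python as follows. This is only an illustration of the rule as stated in the procedure (round up when $\frac{m \times n}{100}$ is not an integer, average two neighbours when it is); the function name `percentile_ungrouped` is made up for this sketch, and statistical software often uses other interpolation rules that give slightly different values.

```python
def percentile_ungrouped(values, m):
    """Return the m-th percentile (0 < m < 100, m an integer) of ungrouped data.

    Follows the rule described in the text: compute m*n/100; if it is not an
    integer, round up and take that ordered value; if it is an integer k,
    average the k-th and (k+1)-th ordered values.
    """
    if not 0 < m < 100:
        raise ValueError("m must lie strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic avoids floating-point trouble when testing for an integer
    k, remainder = divmod(m * n, 100)
    if remainder != 0:
        # Not an integer: round up to position k + 1 (1-based), i.e. index k
        return ordered[k]
    # Integer k: mean of the k-th and (k+1)-th ordered values (1-based)
    return (ordered[k - 1] + ordered[k]) / 2


heights = [91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101,
           96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112]

p10 = percentile_ungrouped(heights, 10)   # position 3 is an integer: mean of 88 and 89
p95 = percentile_ungrouped(heights, 95)   # position 28.5 rounds up to the 29th value
print(p10, p95)
```

Applied literally, the averaging rule gives 88.5 for the 10th percentile and 107 for the 95th percentile of the height data.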

Percentiles for the Frequency Distribution Table (Grouped data)

The $m$th percentile (a measure of the relative standing of an observation) for the Frequency Distribution Table (grouped data) is

\[P_m=l+\frac{h}{f}\left(\frac{m \times n}{100}-c\right)\]

As with the median, $\frac{m \times n}{100}$ is used to locate the class containing the $m$th percentile. In this formula,

$l$ is the lower class boundary of the class containing $P_m$
$h$ is the width of the class containing $P_m$
$f$ is the frequency of the class containing $P_m$
$n$ is the total number of observations (the sum of the frequencies)
$c$ is the cumulative frequency of the class immediately preceding the class containing $P_m$

Note that the 50th percentile is the median by definition, as half of the values in the data are smaller than the median and half of the values are larger than the median. Similarly, the 25th and 75th percentiles are the lower ($Q_1$) and upper quartiles ($Q_3$), respectively. The quartiles, deciles, and percentiles are also called quantiles or fractiles.


Grouped Data Example

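For the following grouped data, compute $P_{10}$, $P_{25}$, $P_{50}$, and $P_{95}$. The frequency distribution implied by the solution (the height data of the ungrouped example arranged in classes of width 5) is

Class Boundaries | Frequency | Cumulative Frequency
85.5–90.5 | 6 | 6
90.5–95.5 | 4 | 10
95.5–100.5 | 10 | 20
100.5–105.5 | 6 | 26
105.5–110.5 | 3 | 29
110.5–115.5 | 1 | 30

Solution: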

Locate the 10th percentile (the lower decile, i.e., $D_1$) by $\frac{10 \times n}{100}=\frac{10 \times 30}{100}=3$rd observation.
So, the $P_{10}$ group is 85.5–90.5, containing the 3rd observation.
\begin{align*}
P_{10}&=l+\frac{h}{f}\left(\frac{10 n}{100}-c\right)\\
&=85.5+\frac{5}{6}(3-0)\\
&=85.5+2.5=88
\end{align*}
Locate the 25th percentile (the lower quartile, i.e., $Q_1$) by $\frac{25 \times n}{100}=\frac{25 \times 30}{100}=7.5$th observation.
So, the $P_{25}$ group is 90.5–95.5, containing the 7.5th observation.
\begin{align*}
P_{25}&=l+\frac{h}{f}\left(\frac{25 n}{100}-c\right)\\
&=90.5+\frac{5}{4}(7.5-6)\\
&=90.5+1.875=92.375
\end{align*}
Locate the 50th percentile (the median, i.e., the 2nd quartile and the 5th decile) by $\frac{50 \times n}{100}=\frac{50 \times 30}{100}=15$th observation.
So, the $P_{50}$ group is 95.5–100.5, containing the 15th observation.
\begin{align*}
P_{50}&=l+\frac{h}{f}\left(\frac{50 n}{100}-c\right)\\
&=95.5+\frac{5}{10}(15-10)\\
&=95.5+2.5=98
\end{align*}
Locate the 95th percentile by $\frac{95 \times n}{100}=\frac{95 \times 30}{100}=28.5$th observation.
So, the $P_{95}$ group is 105.5–110.5, containing the 28.5th observation.
\begin{align*}
P_{95}&=l+\frac{h}{f}\left(\frac{95 n}{100}-c\right)\\
&=105.5+\frac{5}{3}(28.5-26)\\
&=105.5+4.1667=109.6667
\end{align*}
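The grouped-data formula can be checked with a short Python sketch. The class boundaries and frequencies below are an assumption: they are inferred from the values of $l$, $f$, and $c$ used in the worked solution, not copied from a printed table. The function name `percentile_grouped` is hypothetical.

```python
def percentile_grouped(classes, m):
    """m-th percentile from a frequency table.

    classes: list of (lower_boundary, upper_boundary, frequency), ascending.
    Implements P_m = l + (h / f) * (m*n/100 - c).
    """
    n = sum(freq for _, _, freq in classes)
    target = m * n / 100          # position of the m-th percentile
    c = 0                         # cumulative frequency before the current class
    for lower, upper, freq in classes:
        if freq > 0 and c + freq >= target:
            h = upper - lower     # class width
            return lower + (h / freq) * (target - c)
        c += freq
    raise ValueError("m must lie between 0 and 100")


# Assumed table, inferred from the worked solution (n = 30)
table = [(85.5, 90.5, 6), (90.5, 95.5, 4), (95.5, 100.5, 10),
         (100.5, 105.5, 6), (105.5, 110.5, 3), (110.5, 115.5, 1)]

results = {m: round(percentile_grouped(table, m), 4) for m in (10, 25, 50, 95)}
print(results)  # {10: 88.0, 25: 92.375, 50: 98.0, 95: 109.6667}
```

The loop plays the role of "locating the percentile group": it stops at the first class whose cumulative frequency reaches $\frac{m \times n}{100}$.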

The percentiles and quartiles may be read directly from the graphs of the cumulative frequency function.

Further Reading: https://en.wikipedia.org/wiki/Percentile


Stem and Leaf Plot: Exploratory Data Analysis

Jul 6, 2025 / Feb 18, 2013 by Muhammad Imdad Ullah


Before performing any statistical calculation (even the simplest one), the data should be tabulated or plotted to visualize the shape of the distribution, especially when they are quantitative and consist of only a few observations.

Stem and Leaf Plot

A stem and leaf plot summarizes a set of data measured on an interval scale in condensed form. Stem and leaf plots are often used in exploratory data analysis and help to illustrate the different features of the distribution of the observed data. A basic stem and leaf display contains two columns separated by a vertical line. The left side of the vertical line contains the stems, while the right side contains the leaves. It is customary to sort the values within each stem from smallest to largest. In this technique for presenting a set of data, each numerical value is divided into two parts:

Leading Digit(s)
Trailing Digit

Stem values are the leading digit(s), and leaves are the trailing digit. The stems are located along the vertical axis, and the leaf values are stacked against each other along the horizontal axis.

A stem and leaf display is similar to a frequency distribution but carries more information. It provides information about the symmetry, concentration, gaps (empty stems), and outliers of the observed data set. Organizing the data into a frequency distribution has two disadvantages:

The exact identity of each value is lost (the individuality of the observations vanishes).
It is not known how the values within each class are distributed.

The advantage of the stem and leaf plot (display) over a frequency distribution is that we do not lose the identity (individuality) of each observation. Likewise, a stem and leaf plot resembles a histogram but usually provides more information for a relatively small data set.

More than one data set can be compared by using multiple stem and leaf plots. Using a back-to-back stem and leaf plot, we can compare the same characteristics in different groups.

The origin of the stem and leaf plot is associated with Tukey, J. W. (1977).

Constructing a Stem and Leaf Plot

Consider the following data set: 56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59, 69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73. A stem and leaf plot of these data is constructed as follows.

First of all, it’s better to sort the data. The sorted data is 56, 59, 64, 65, 69, 70, 71, 73, 76, 77, 78, 80, 82, 82, 85, 86, 91, 91, 92, 92, 95, 98, 99.

Now the first digit is the stem and the second one is a leaf, i.e., stems are from 5 to 9 as data ranges from 56 to 99.

Draw a vertical line separating the stem from the leaf. Put stem values on the left side of the vertical line (bar) and leaf values on the right side of the vertical line. Note that each number is assigned to the graph (plot) by pairing the unit digit, or leaf, with the correct stem. The score 56 is plotted by placing the unit’s digit 6 to the right of the stem 5.

The stem and leaf plot of the above data would look like this.

The decimal point is 1 digit(s) to the right of the |
Stem | Leaf
5      | 6 9
6      | 4 5 9
7      | 0 1 3 6 7 8
8      | 0 2 2 5 6
9      | 1 1 2 2 5 8 9
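The construction steps can be sketched in Python. This is a minimal illustration for two-digit whole numbers only (tens digit as stem, units digit as leaf); the function name `stem_and_leaf` is made up for the sketch.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit integers by stem (tens digit) with sorted leaves (units digit)."""
    stems = defaultdict(list)
    for value in sorted(values):          # sorting first keeps the leaves in order
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    # Keep every stem between the smallest and largest, so gaps stay visible
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}


scores = [56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59,
          69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73]

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Printing the stems even when they have no leaves is a design choice: an empty row is how a stem and leaf plot shows a gap in the data.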

The stem and leaf plot looks like a histogram when it is rotated 90° anti-clockwise.

By adding columns of frequency and cumulative frequency in the stem and leaf plots, we can find the median of the data.
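As a sketch of how the frequency and cumulative frequency columns lead to the median, the snippet below counts the leaves on each stem and walks down the cumulative totals until the middle position is reached. It assumes two-digit whole numbers and an odd number of observations (as in the example, where $n=23$); the function name is hypothetical.

```python
from collections import Counter


def median_from_stems(values):
    """Median of an odd-sized set of two-digit integers via stem frequencies."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 0:
        raise ValueError("this sketch handles an odd number of observations only")
    freq = Counter(v // 10 for v in ordered)   # frequency column: leaves per stem
    position = (n + 1) // 2                    # position of the median
    cumulative = 0                             # cumulative frequency column
    for stem in sorted(freq):
        if cumulative + freq[stem] >= position:
            leaves = [v % 10 for v in ordered if v // 10 == stem]
            return stem * 10 + leaves[position - cumulative - 1]
        cumulative += freq[stem]


scores = [56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59,
          69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73]

print(median_from_stems(scores))  # 80
```

For the example data the stem frequencies are 2, 3, 6, 5, 7, so the cumulative frequencies are 2, 5, 11, 16, 23. The 12th value is therefore the first leaf on stem 8, i.e., 80.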


Reference

Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.
https://en.wikipedia.org/wiki/Stem-and-leaf_display

F Distribution: Ratios of two Independent Estimators (2013)

Aug 3, 2024 / Jan 26, 2013 by Muhammad Imdad Ullah


The F-distribution (also known as Snedecor’s F distribution or the Fisher-Snedecor distribution) is a continuous probability distribution named in honor of R. A. Fisher and George W. Snedecor. It arises frequently as the null distribution of a test statistic in hypothesis testing, is used to develop confidence intervals, and is used in the analysis of variance to compare several population means.

If $s_1^2$ and $s_2^2$ are two unbiased estimates of the population variance $\sigma^2$ obtained from independent samples of sizes $n_1$ and $n_2$, respectively, from the same normal population, then the F-ratio is defined mathematically as
\[F=\frac{s_1^2}{s_2^2}=\frac{\frac{(n_1-1)\frac{s_1^2}{\sigma^2}}{v_1}}{\frac{(n_2-1)\frac{s_2^2}{\sigma^2}}{v_2}}\]
where $v_1=n_1-1$ and $v_2=n_2-1$. Since $\chi_1^2=(n_1-1)\frac{s_1^2}{\sigma^2}$ and $\chi_2^2=(n_2-1)\frac{s_2^2}{\sigma^2}$ are distributed independently as $\chi^2$ with $v_1$ and $v_2$ degrees of freedom, respectively, we have
\[F=\frac{\frac{\chi_1^2}{v_1}}{\frac{\chi_2^2}{v_2}}\]

So, the F distribution is the distribution of the ratio of two independent Chi-square ($\chi^2$) statistics, each divided by its respective degrees of freedom.
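The ratio definition can be illustrated with a small simulation. The sketch below uses only Python's standard library and relies on the fact that a chi-square variable with $v$ degrees of freedom is a gamma variable with shape $v/2$ and scale 2. The degrees of freedom (5 and 20), the sample size, and the seed are arbitrary choices made for this illustration.

```python
import random


def simulate_f(v1, v2, size, seed=0):
    """Draw `size` values of F = (chi1/v1) / (chi2/v2) from independent chi-squares."""
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        chi1 = rng.gammavariate(v1 / 2, 2)   # chi-square with v1 degrees of freedom
        chi2 = rng.gammavariate(v2 / 2, 2)   # chi-square with v2 degrees of freedom
        draws.append((chi1 / v1) / (chi2 / v2))
    return draws


v1, v2 = 5, 20
sample = simulate_f(v1, v2, size=50_000, seed=1)
sample_mean = sum(sample) / len(sample)
theoretical_mean = v2 / (v2 - 2)             # mean of F, valid for v2 > 2

print(round(theoretical_mean, 4))            # 1.1111
print(min(sample) > 0)                       # True: F takes only positive values
```

With this many draws, `sample_mean` should land close to the theoretical mean $\frac{v_2}{v_2-2}$, although the exact simulated value depends on the seed.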

F Distribution Properties

The F distribution takes only non-negative values since the numerator and denominator of the F-ratio are squared quantities.

The range of F values is from 0 to infinity.
The shape of the F-curve depends on the parameters $v_1$ and $v_2$ (its numerator and denominator df). It is a non-symmetrical distribution, skewed to the right (positively skewed). It tends to become more and more symmetric when one or both of the parameter values ($v_1$, $v_2$) increase, as shown in the following figure.

It is asymptotic. As the F values increase, the F-curve approaches the horizontal axis but never touches or crosses it (similar behavior to the normal probability distribution).
F has a unique mode at the value \[\tilde{F}=\frac{v_2(v_1-2)}{v_1(v_2+2)},\quad (v_1>2)\] which is always less than unity.
The mean of F is $\mu=\frac{v_2}{v_2-2},\quad (v_2>2)$
The variance of F is \[\sigma^2=\frac{2v_2^2(v_1+v_2-2)}{v_1(v_2-2)^2(v_2-4)},\quad (v_2>4)\]
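The mode, mean, and variance can be evaluated with a few lines of Python. The sketch uses the standard closed forms for the F distribution: mode $\frac{v_2(v_1-2)}{v_1(v_2+2)}$ for $v_1>2$, mean $\frac{v_2}{v_2-2}$ for $v_2>2$, and variance $\frac{2v_2^2(v_1+v_2-2)}{v_1(v_2-2)^2(v_2-4)}$ for $v_2>4$. The function names and the example degrees of freedom are illustrative only.

```python
def f_mode(v1, v2):
    """Mode of F(v1, v2); defined here for v1 > 2."""
    if v1 <= 2:
        raise ValueError("this formula for the mode requires v1 > 2")
    return v2 * (v1 - 2) / (v1 * (v2 + 2))


def f_mean(v2):
    """Mean of F(v1, v2); exists only for v2 > 2 and does not depend on v1."""
    if v2 <= 2:
        raise ValueError("the mean requires v2 > 2")
    return v2 / (v2 - 2)


def f_variance(v1, v2):
    """Variance of F(v1, v2); exists only for v2 > 4."""
    if v2 <= 4:
        raise ValueError("the variance requires v2 > 4")
    return 2 * v2**2 * (v1 + v2 - 2) / (v1 * (v2 - 2) ** 2 * (v2 - 4))


v1, v2 = 5, 20
print(round(f_mode(v1, v2), 4))      # 0.5455
print(round(f_mean(v2), 4))          # 1.1111
print(round(f_variance(v1, v2), 4))  # 0.7099
```

For these degrees of freedom the mode is below 1 and the mean is above 1, which is consistent with a right-skewed curve.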

Assumptions of F Distribution

The statistical procedure for comparing the variances of two populations assumes that

The two populations (from which the samples are drawn) follow the normal distribution.
The two samples are random samples drawn independently from their respective populations.

The statistical procedure for comparing the means of three or more populations assumes that

The populations follow the normal distribution.
The populations have equal standard deviations $\sigma$.
The populations are independent of each other.

Note

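The F test used to compare several means is relatively insensitive to moderate violations of the normality assumption and, when the sample sizes are equal, of the equal-variance assumption. The F test for comparing two variances, however, is sensitive to departures from normality.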

Use of F Distribution Table

For a given (specified) level of significance $\alpha$, the symbol $F_\alpha(v_1,v_2)$ is used to represent the upper (right-hand side) $100\alpha\%$ point of an F distribution having $v_1$ and $v_2$ df.

The lower (left-hand side) percentage point can be found by taking the reciprocal of the F-value corresponding to the upper (right-hand side) percentage point with the degrees of freedom interchanged, i.e., \[F_{1-\alpha}(v_1,v_2)=\frac{1}{F_\alpha(v_2,v_1)}\]
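As a small worked illustration of the reciprocal rule, take the commonly tabulated upper 5% point $F_{0.05}(4,10)\approx 3.48$ (a value quoted from standard F tables, not from this post). The lower 5% point with the degrees of freedom interchanged is then
\[F_{0.95}(10,4)=\frac{1}{F_{0.05}(4,10)}\approx\frac{1}{3.48}\approx 0.287\]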

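The density function of the variable F is given by
\[Y=k\,F^{\frac{v_1}{2}-1}\left(1+\frac{v_1 F}{v_2}\right)^{-\frac{v_1+v_2}{2}},\quad F>0\]
where $k=\frac{(v_1/v_2)^{v_1/2}}{B\left(\frac{v_1}{2},\frac{v_2}{2}\right)}$ is the constant that makes the total area under the curve equal to one, and $B$ denotes the beta function.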
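The density can be evaluated numerically with the standard library alone. The sketch below assumes the standard form of the F density, $k\,F^{v_1/2-1}\left(1+\frac{v_1F}{v_2}\right)^{-(v_1+v_2)/2}$ with $k=\frac{(v_1/v_2)^{v_1/2}}{B(v_1/2,\,v_2/2)}$, and computes $k$ through log-gamma functions for numerical stability. The function name `f_pdf` and the example degrees of freedom are choices made for this illustration.

```python
import math


def f_pdf(x, v1, v2):
    """Density of the F distribution with v1 and v2 degrees of freedom at x > 0."""
    if x <= 0:
        return 0.0
    # log of k = (v1/v2)^(v1/2) / B(v1/2, v2/2), using B(a, b) = G(a)G(b)/G(a+b)
    log_k = (math.lgamma((v1 + v2) / 2) - math.lgamma(v1 / 2) - math.lgamma(v2 / 2)
             + (v1 / 2) * math.log(v1 / v2))
    log_density = (log_k + (v1 / 2 - 1) * math.log(x)
                   - ((v1 + v2) / 2) * math.log1p(v1 * x / v2))
    return math.exp(log_density)


v1, v2 = 5, 20

# Midpoint-rule check that the area under the curve is close to one
step = 0.001
area = sum(f_pdf((i + 0.5) * step, v1, v2) * step for i in range(200_000))
print(abs(area - 1) < 1e-3)   # True

# The curve should peak at the mode v2(v1-2) / (v1(v2+2))
mode = v2 * (v1 - 2) / (v1 * (v2 + 2))
print(f_pdf(mode, v1, v2) > f_pdf(mode + 0.05, v1, v2))   # True
```

The area check stops at $F=200$, which is adequate here because the upper tail for these degrees of freedom is negligible beyond that point.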

