Quartiles

Introduction to Quantiles and Quartiles

Quantiles are techniques used to divide data into equal parts. For example, quartiles divide the data into four equal parts; the word quartile comes from quarter, meaning a fourth part. Deciles divide the data into ten equal parts; the word comes from deca, meaning a tenth part. Percentiles divide the data into one hundred equal parts; the word comes from percent, meaning a hundredth part.

Therefore, quartiles, deciles, and percentiles divide the data into 4, 10, and 100 parts, respectively. Quartiles, deciles, and percentiles are collectively called quantiles.

Quartiles

Quartiles are the values that divide the data into four equal parts. Dividing a data set into four equal parts requires cutting it at three equidistant points, so there are three quartiles ($Q_1$, $Q_2$, and $Q_3$). Since the quartiles divide the data into four equal parts, their positions are based on dividing the number of observations by four.

Quartiles for Ungrouped Data

\begin{align*}
Q_1 &= \left(\frac{n+1}{4}\right)\text{th value is the } \frac{1}{4} \text{ part}\\
Q_2 &= \left(\frac{2(n+1)}{4}\right)\text{th value is the } \frac{2}{4} \text{ part}\\
Q_3 &= \left(\frac{3(n+1)}{4}\right)\text{th value is the } \frac{3}{4} \text{ part}
\end{align*}
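
As a quick added illustration of these positional formulas (not from the original data), consider a small data set of $n=11$ ordered observations:

\begin{align*}
Q_1 &= \left(\frac{11+1}{4}\right)\text{th} = 3\text{rd value}\\
Q_2 &= \left(\frac{2(11+1)}{4}\right)\text{th} = 6\text{th value}\\
Q_3 &= \left(\frac{3(11+1)}{4}\right)\text{th} = 9\text{th value}
\end{align*}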

The following ungrouped data set has 96 observations $(n=96)$:

22, 22, 25, 25, 30, 30, 30, 31, 31, 33, 36, 39
40, 40, 42, 42, 48, 48, 50, 51, 52, 55, 57, 59
81, 86, 89, 89, 90, 91, 91, 91, 92, 93, 93, 93
93, 94, 94, 94, 95, 96, 96, 96, 97, 97, 98, 98
99, 99, 99, 100, 100, 100, 101, 101, 102, 102, 102, 102
102, 103, 103, 104, 104, 104, 105, 106, 106, 106, 107, 108
108, 108, 109, 109, 109, 110, 111, 112, 112, 113, 113, 113
113, 114, 115, 116, 116, 117, 117, 117, 118, 118, 119, 121

The first, second, and third quartiles of the above data set are (since $n=96$ is a multiple of 4, the quartile positions are taken at the $\frac{kn}{4}$th values):

\begin{align*}
Q_1 &= \left(\frac{n}{4}\right)\text{th position} = \left(\frac{96}{4}\right)\text{th} = 24\text{th value} = 59\\
Q_2 &= \left(\frac{2\times 96}{4}\right)\text{th position} = 48\text{th value} = 98\\
Q_3 &= \left(\frac{3\times n}{4}\right)\text{th position} = \left(\frac{3\times 96}{4}\right)\text{th} = 72\text{nd value} = 108
\end{align*}

Note that the above data are already sorted. If the data are not sorted, we first need to arrange them in ascending order.
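
As a check, the following minimal Python sketch (an added illustration, not part of the original article) reproduces the positional rule used in this example:

# Quartiles of ungrouped data using the k*n/4 positional rule
data = [22, 22, 25, 25, 30, 30, 30, 31, 31, 33, 36, 39,
        40, 40, 42, 42, 48, 48, 50, 51, 52, 55, 57, 59,
        81, 86, 89, 89, 90, 91, 91, 91, 92, 93, 93, 93,
        93, 94, 94, 94, 95, 96, 96, 96, 97, 97, 98, 98,
        99, 99, 99, 100, 100, 100, 101, 101, 102, 102, 102, 102,
        102, 103, 103, 104, 104, 104, 105, 106, 106, 106, 107, 108,
        108, 108, 109, 109, 109, 110, 111, 112, 112, 113, 113, 113,
        113, 114, 115, 116, 116, 117, 117, 117, 118, 118, 119, 121]

data.sort()                               # quartiles require ordered data
n = len(data)                             # n = 96
for k in (1, 2, 3):
    position = k * n // 4                 # 24th, 48th, 72nd value
    print(f"Q{k} = {data[position - 1]}") # 1-based position -> 0-based index
# Expected output: Q1 = 59, Q2 = 98, Q3 = 108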

Quartiles for Grouped Data

For grouped data, one can also compute the quantiles, and hence the quartiles. Consider the following frequency table:

Classes    f     x       C.B.           CF
65-84      9     74.5    64.5-84.5       9
85-104     10    94.5    84.5-104.5     19
105-124    17    114.5   104.5-124.5    36
125-144    10    134.5   124.5-144.5    46
145-164    5     154.5   144.5-164.5    51
165-184    4     174.5   164.5-184.5    55
185-204    5     194.5   184.5-204.5    60
Total      60

Here $f$ denotes the class frequency, $x$ the class midpoint, C.B. the class boundaries, and CF the cumulative frequency.

From the above grouped data, we have 60 observations: $n = \sum\limits_{i} f_i = \Sigma f = 60$. The three quartiles will be

\begin{align*}
\frac{n}{4} &= \frac{60}{4} = 15\text{th value}\\
Q_1 &= l + \frac{h}{f}\left(\frac{n}{4} - CF\right) = 84.5 + \frac{20}{10}(15-9) = 96.5\\
\frac{2n}{4} &= \frac{2\times 60}{4} = 30\text{th value}\\
Q_2 &= l + \frac{h}{f}\left(\frac{2n}{4} - CF\right) = 104.5 + \frac{20}{17}(30-19) = 117.44\\
\frac{3n}{4} &= \frac{3\times 60}{4} = 45\text{th value}\\
Q_3 &= l + \frac{h}{f}\left(\frac{3n}{4} - CF\right) = 124.5 + \frac{20}{10}(45-36) = 142.5
\end{align*}

Here $l$ is the lower class boundary of the quartile class, $h$ the class width, $f$ the frequency of the quartile class, and $CF$ the cumulative frequency of the class preceding it.
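
The interpolation formula translates directly into code. The following minimal Python sketch (an added illustration with assumed variable names) rebuilds the frequency table above and computes all three quartiles:

# Quartiles for grouped data: Q_k = l + (h/f) * (k*n/4 - CF), where
# l = lower class boundary, h = class width, f = class frequency,
# CF = cumulative frequency of the preceding class.
boundaries = [(64.5, 84.5), (84.5, 104.5), (104.5, 124.5), (124.5, 144.5),
              (144.5, 164.5), (164.5, 184.5), (184.5, 204.5)]
freqs = [9, 10, 17, 10, 5, 4, 5]
n = sum(freqs)                                   # 60

def grouped_quartile(k):
    target = k * n / 4                           # 15, 30, 45
    cf = 0
    for (lower, upper), f in zip(boundaries, freqs):
        if cf + f >= target:                     # quartile class found
            h = upper - lower                    # class width (20 here)
            return lower + (h / f) * (target - cf)
        cf += f

for k in (1, 2, 3):
    print(f"Q{k} = {grouped_quartile(k):.2f}")
# Expected: Q1 = 96.50, Q2 = 117.44, Q3 = 142.50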


Measures of Dispersion: Variance (2021)

Variance is one of the most important measures of dispersion of the distribution of a random variable. The term variance was introduced by R. A. Fisher in 1918. The variance of a set of observations (a data set) is defined as the mean of the squares of the deviations of all the observations from their mean. When it is computed for the entire population, it is called the population variance, usually denoted by $\sigma^2$; for sample data, it is called the sample variance and denoted by $S^2$ to distinguish it from the population variance. The variance of a random variable $X$ is also denoted by $Var(X)$. The symbolic definitions of the population and sample variance are

$\sigma^2=\frac{\sum (X_i - \mu)^2}{N}; \quad \text{for population data}$

$S^2=\frac{\sum (X_i - \overline{X})^2}{n-1}; \quad \text{for sample data}$

It should be noted that the variance is expressed in the square of the units in which the observations are measured, and so the variance is typically a large number compared to the observations themselves. Because of its nice mathematical properties, the variance assumes an extremely important role in statistical theory.

The variance can also be computed from the standard deviation, since the variance is the square of the standard deviation, i.e., Variance = (Standard Deviation)$^2$.
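
As a small check (the data set below is an assumed illustration, not from the article), both definitions can be computed directly and compared with Python's built-in statistics module:

# Population vs sample variance, matching the two formulas above
import statistics

data = [5, 6, 7, 10, 12]           # small illustrative data set

mean = sum(data) / len(data)
pop_var = sum((x - mean) ** 2 for x in data) / len(data)        # divide by N
samp_var = sum((x - mean) ** 2 for x in data) / (len(data) - 1) # divide by n-1

print(pop_var, statistics.pvariance(data))   # 6.8  6.8
print(samp_var, statistics.variance(data))   # 8.5  8.5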


Variance can be used to compare dispersion in two or more sets of observations. Variance can never be negative, since every term in the variance is a squared quantity, which is either positive or zero.
To calculate the standard deviation, one has to follow these steps:

  1. First, find the mean of the data.
  2. Take the difference of each observation from the mean of the given data set. The sum of these differences should be zero, or near zero due to the rounding of numbers.
  3. Square the values obtained in step 2; each square is greater than or equal to zero, i.e., a non-negative quantity.
  4. Sum all the squared quantities obtained in step 3. We call it the sum of squares of differences.
  5. Divide this sum of squares of differences by the total number of observations if we have to calculate the population standard deviation ($\sigma$). For the sample standard deviation ($S$), divide the sum of squares of differences by the total number of observations minus one, i.e., the degrees of freedom.
  6. Find the square root of the quantity obtained in step 5. The resultant quantity will be the standard deviation for the given data set.

The major characteristics of the variance are:
a)    All of the observations are used in the calculations.
b)    The variance is not unduly influenced by extreme observations.
c)    The variance is not in the same units as the observations; it is expressed in the square of the units in which the observations are measured.

Consider a scenario: Imagine two groups of students both score an average of 70% on an exam. However, in Group A, most scores are clustered around 70%, while in Group B, scores are spread out widely. The measure of spread (like standard deviation or variance) helps distinguish these scenarios, providing a more nuanced understanding of student performance.

By understanding how spread out (scattered) the data points are from the average value (the mean), the standard deviation offers valuable insights in various practical scenarios. It allows for data-driven decision making in quality control, investment analysis, scientific research, and other fields.


Standard Deviation: A Measure of Dispersion (2017)

The standard deviation is a widely used concept in statistics, and it tells how much variation (a measure of spread or dispersion) there is in the data set. It can be defined as the positive square root of the mean (average) of the squared deviations of the values from their mean.
Calculation of Standard Deviation

To calculate the standard deviation, one has to follow these steps (a short sketch in code follows the list):

  1. First, find the mean of the data.
  2. Take the difference of each data point from the mean of the given data set (computed in step 1). Note that the sum of these differences must be equal to zero, or near zero due to the rounding of numbers.
  3. Now compute the square of each difference obtained in step 2; each square will be greater than or equal to zero, i.e., a non-negative quantity.
  4. Now add up all the squared quantities obtained in step 3. We call it the sum of squares of differences.
  5. Divide this sum of squares of differences (obtained in step 4) by the total number of observations if you have to calculate the population standard deviation ($\sigma$). If you want to compute the sample standard deviation ($S$), then divide the sum of squares of differences (obtained in step 4) by the total number of observations minus one ($n-1$), i.e., the degrees of freedom. Note that $n$ is the number of observations available in the data set.
  6. Find the square root (also known as under root) of the quantity obtained in step 5. The resultant quantity in this way is known as the standard deviation (SD) for the given data set.
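
Here is a minimal Python sketch that follows the six steps above literally; the small data set is an assumed illustration:

import math

data = [5, 6, 7, 10, 12]

mean = sum(data) / len(data)                 # step 1: mean = 8
diffs = [x - mean for x in data]             # step 2: these sum to 0
squared = [d ** 2 for d in diffs]            # step 3: non-negative values
ss = sum(squared)                            # step 4: sum of squares = 34
pop_msd = ss / len(data)                     # step 5 (population: divide by n)
samp_msd = ss / (len(data) - 1)              # step 5 (sample: divide by n - 1)
print(math.sqrt(pop_msd))                    # step 6: sigma = 2.607...
print(math.sqrt(samp_msd))                   # step 6: S = 2.915...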

The population SD ($\sigma$) and the sample SD ($S$) of a set of observations $X_1, X_2, \cdots, X_n$ are

\begin{aligned}
\sigma &=\sqrt{\frac{\sum_{i=1}^N (X_i-\mu)^2}{N}}; \quad \text{Population SD}\\
S &=\sqrt{\frac{\sum_{i=1}^n (X_i-\overline{X})^2}{n-1}}; \quad \text{Sample SD}
\end{aligned}

The standard deviation can be computed from variance too.

A practical interpretation of the standard deviation is that, for data that are approximately normally distributed, about 68% of the data values will lie within the range $\overline{X} \pm \sigma$, i.e., within one standard deviation of the mean, or simply within one $\sigma$. Similarly, about 95% of the data values will lie within the range $\overline{X} \pm 2\sigma$, and about 99.7% within $\overline{X} \pm 3\sigma$.
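
These coverage figures are the well-known empirical (68-95-99.7) rule for normally distributed data. As a small check, the following Python sketch (an added illustration) verifies them with the standard normal CDF via statistics.NormalDist (Python 3.8+):

# Coverage within k standard deviations of the mean for a normal distribution
from statistics import NormalDist

z = NormalDist()                       # standard normal: mean 0, sd 1
for k in (1, 2, 3):
    coverage = z.cdf(k) - z.cdf(-k)    # P(-k*sigma < X < k*sigma)
    print(f"within {k} sigma: {coverage:.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973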


Examples

A large value of the SD indicates more spread in the data set, which can be interpreted as inconsistent behaviour of the data collected; the data points tend to be far from the mean value. For a smaller standard deviation, data points tend to be close (very close) to the mean, indicating consistent behaviour of the data set.

The standard deviation and variance are used to measure the risk of a particular investment in finance. A mean of 15% and a standard deviation of 2% indicate that the investment is expected to earn a 15% return, and there is about a 68% chance that the return will be between 13% and 17%. Similarly, there is about a 95% chance that the return on the investment will be between 11% and 19%.



The Sum of Squared Deviations from the Mean (2015)

Introduction to the Sum of Squared Deviations

In statistics, the sum of squared deviations (also known as the sum of squares) is a measure of the total variability (spread or variation) within a data set. In other words, the sum of squares is a measure of deviation or variation from the mean (average) value of the given data set.

Computation of Sum of Squared Deviations

The sum of squares is calculated by first computing the difference between each data point (observation) and the mean of the data set, i.e. $x=X-\overline{X}$. The computed $x$ is known as the deviation score for the given data set. Squaring each of these deviation scores and then adding them gives us the sum of squared deviations (SS), which is represented mathematically as

\[SS=\sum(x^2)=\sum(X-\overline{X})^2\]

Note that the small letter $x$ usually represents the deviation of each observation from the mean value, while the capital letter $X$ represents the variable of interest in statistics.

The Sum of Squared Deviations Example

Consider the following data set {5, 6, 7, 10, 12}. To compute the sum of squares of this data set, follow these steps

  • Calculate the average of the given data by summing all the values in the data set and then dividing this sum by the total number of observations. Mathematically, it is $\frac{\sum X_i}{n}=\frac{40}{5}=8$, where 40 is the sum of all the numbers ($5+6+7+10+12$) and there are 5 observations.
  • Calculate the difference of each observation in the data set from the average computed in step 1. The differences are
    $5 - 8 = -3$; $6 - 8 = -2$; $7 - 8 = -1$; $10 - 8 = 2$; and $12 - 8 = 4$
    Note that the sum of these differences should be zero: $(-3) + (-2) + (-1) + 2 + 4 = 0$
  • Now square each of the differences obtained in step 2. The squares of these differences are
    9, 4, 1, 4, and 16
  • Now add the squared numbers obtained in step 3. The sum of these squared quantities is $9 + 4 + 1 + 4 + 16 = 34$, which is the sum of squares for the given data set (reproduced in the short sketch below).
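
A minimal Python sketch (an added illustration) reproducing these steps:

# Sum of squared deviations for the data set {5, 6, 7, 10, 12}
data = [5, 6, 7, 10, 12]

mean = sum(data) / len(data)                    # step 1: 40 / 5 = 8
deviations = [x - mean for x in data]           # step 2: -3, -2, -1, 2, 4
ss = sum(d ** 2 for d in deviations)            # steps 3-4: 9 + 4 + 1 + 4 + 16
print(ss)                                       # 34.0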

Sums of Squares in Different Contexts

In statistics, the sum of squares occurs in different contexts such as

  • Partitioning of Variance (Partition of Sums of Squares)
  • Sum of Squared Deviations (Least Squares)
  • Sum of Squared Differences (Mean Squared Error)
  • Sum of Squared Errors (Residual Sum of Squares)
  • Sum of Squares due to Lack of Fit (Lack of Fit Sum of Squares)
  • Sum of Squares for Model Predictions (Explained Sum of Squares)
  • Sum of Squares for Observations (Total Sum of Squares)
  • Sum of Squared Deviations (Squared Deviations)
  • Modeling involving the Sum of Squares (Analysis of Variance)
  • Multivariate Generalization of the Sum of Squares (Multivariate Analysis of Variance)

As previously discussed, the Sum of Squares is a measure of the Total Variability of a set of scores around a specific number.

Summary

  • A higher sum of squares indicates that your data points are further away from the mean on average, signifying greater spread or variability in the data. Conversely, a lower sum of squares suggests the data points are clustered closer to the mean, indicating less variability.
  • The sum of squares plays a crucial role in calculating other important statistics like variance and standard deviation. These concepts help us understand the distribution of data and make comparisons between different datasets.

