Deciles: Measure of Position

Deciles are the nine values of a variable that divide an ordered (sorted) data set into ten equal parts, so that each part represents $\frac{1}{10}$ of the sample or population. They are denoted by $D_1, D_2, \cdots, D_9$. The first decile ($D_1$) is the value that exceeds $\frac{1}{10}$ of the observations and is less than the remaining $\frac{9}{10}$. The ninth decile ($D_9$) is the value that exceeds $\frac{9}{10}$ of the observations and is less than the remaining $\frac{1}{10}$. Note that the fifth decile ($D_5$) is equal to the median. The deciles mark the values below which 10%, 20%, …, and 90% of the data fall.

Calculating Deciles for Ungrouped Data

To calculate a decile for ungrouped data, first order all observations by magnitude, then use the following formula for the $m$th decile.

\[D_m= m \times \left( \frac{(n+1)}{10} \right) \mbox{th value; } \qquad \mbox{where } m=1,2,\cdots,9\]

Example: Calculate the 2nd and 8th deciles of the following ordered data: 13, 13, 13, 20, 26, 27, 31, 34, 34, 34, 35, 35, 36, 37, 38, 41, 41, 41, 45, 47, 47, 47, 50, 51, 53, 54, 56, 62, 67, 82.
Solution:

\begin{eqnarray*}
D_m &=& m \times \left\{\frac{(n+1)}{10} \right\} \mbox{th value}\\
D_2 &=& 2 \times \frac{30+1}{10}=6.2 \mbox{th value}
\end{eqnarray*}

We must locate the sixth value in the ordered array and then move 0.2 of the distance between the sixth and seventh values, i.e., the value of the 2nd decile can be calculated as
\[6 \mbox{th observation} + \{7 \mbox{th observation} - 6 \mbox{th observation} \}\times 0.2\]
As the 6th observation is 27 and the 7th observation is 31, the second decile is $27+\{31-27\} \times 0.2 = 27.8$.

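Similarly, $D_8$ is the $8 \times \frac{30+1}{10} = 24.8$th value. As the 24th observation is 51 and the 25th observation is 53, $D_8 = 51+\{53-51\} \times 0.8 = 52.6$.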
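The position rule used in this example can be sketched in Python. This is a minimal illustration rather than a standard library routine: the function name `decile_ungrouped` is hypothetical, and clamping to the smallest or largest observation when the position falls outside the data is an assumption made here, since the text does not cover that case.

```python
def decile_ungrouped(values, m):
    """Return the m-th decile using the m * (n + 1) / 10 position rule."""
    if m not in range(1, 10):
        raise ValueError("m must be an integer from 1 to 9")
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("values must not be empty")
    position = m * (n + 1) / 10   # 1-based position, possibly fractional
    whole = int(position)         # observation just below the position
    fraction = position - whole   # share of the gap to the next observation
    # Assumption: clamp when the position falls outside the observed range.
    if whole < 1:
        return float(data[0])
    if whole >= n:
        return float(data[-1])
    lower, upper = data[whole - 1], data[whole]
    return lower + (upper - lower) * fraction


# The 30 ordered observations from the example above.
data = [13, 13, 13, 20, 26, 27, 31, 34, 34, 34, 35, 35, 36, 37, 38,
        41, 41, 41, 45, 47, 47, 47, 50, 51, 53, 54, 56, 62, 67, 82]
d2 = decile_ungrouped(data, 2)
d8 = decile_ungrouped(data, 8)
print(round(d2, 1), round(d8, 1))  # 27.8 52.6
```

Other software may use a different interpolation convention, so deciles computed elsewhere can differ slightly from these values.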

Calculating Decile for Grouped Data

The $m$th decile for grouped data (with classes in ascending order) can be calculated using the following formula.

\[D_m=l+\frac{h}{f}\left(\frac{m.n}{10}-c\right)\]

where

$l$ = the lower class boundary of the class containing the $m$th decile
$h$ = the width of the class containing the $m$th decile
$f$ = the frequency of the class containing the $m$th decile
$n$ = the total number of observations (the sum of the frequencies)
$c$ = the cumulative frequency of the class preceding the class containing the $m$th decile

Example: Computing Decile for Grouped Data

Calculate the first and seventh deciles of the following grouped data

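[Frequency distribution table not reproduced. The values used in the solution are: $n=30$, class width 5, a first class starting at 85.5 with frequency 6, and a fourth class 100.5 to 105.5 with frequency 6 and a preceding cumulative frequency of 20.]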

Solution: The decile class for $D_1$ is located from $\frac{m.n}{10} = \frac{1 \times 30}{10} = 3$, i.e., the 3rd observation. As the 3rd observation lies in the first class (first group), no class precedes it and $c=0$, so

\begin{eqnarray*}
D_m&=&l+\frac{h}{f}\left(\frac{m.n}{10}-c\right)\\
D_1&=&85.5+\frac{5}{6}\left(\frac{1\times30}{10}-0\right)\\
&=&88\\
\end{eqnarray*}

The decile class for $D_7$ is 100.5 to 105.5, as $\frac{7 \times 30}{10}=21$, i.e., the 21st observation, which lies in the fourth class (group).
\begin{eqnarray*}
D_m&=&l+\frac{h}{f}\left(\frac{m.n}{10}-c\right)\\
D_7&=&100.5+\frac{5}{6}\left(\frac{7\times30}{10}-20\right)\\
&=&101.333\\
\end{eqnarray*}
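The grouped-data formula can be checked with a short Python sketch. The function name `decile_grouped` is hypothetical, and its arguments are the symbols $m$, $l$, $h$, $f$, $n$, and $c$ defined above; finding the decile class is left to the caller, as in the worked solution.

```python
def decile_grouped(m, l, h, f, n, c):
    """m-th decile of grouped data: l + (h / f) * (m * n / 10 - c)."""
    if f <= 0:
        raise ValueError("the decile class must have a positive frequency")
    return l + (h / f) * (m * n / 10 - c)


# Values taken from the worked solution above.
d1 = decile_grouped(m=1, l=85.5, h=5, f=6, n=30, c=0)
d7 = decile_grouped(m=7, l=100.5, h=5, f=6, n=30, c=20)
print(round(d1, 3), round(d7, 3))
```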

Importance of Deciles in Statistics and Data Analysis

Deciles are a valuable statistical measure used to divide a dataset into ten equal parts, each representing 10% of the data distribution. They help in understanding the spread, variability, and central tendencies within a dataset. Below are key reasons why deciles are important:

  1. Data Distribution Analysis: Deciles provide a clear breakdown of how data is distributed across different segments. They help identify whether data is skewed, uniform, or concentrated in certain ranges.
  2. Comparison of Data Sets: By comparing deciles across different datasets, analysts can assess differences in distributions (e.g., income levels, test scores, or sales performance). Useful in benchmarking (e.g., comparing a company’s performance against industry deciles).
  3. Identifying Outliers and Extremes: The 1st decile ($D_1$) and 9th decile ($D_9$) help detect unusually low or high values. Useful in finance (e.g., risk assessment) and healthcare (e.g., identifying extreme patient results).
  4. Economic and Social Research: Used by governments and economists to analyze income/wealth inequality (e.g., the top 10% vs. the bottom 10%). Helps in policy-making (e.g., tax brackets, welfare programs).
  5. Business and Marketing Applications: Businesses categorize customers into deciles based on spending habits. Helps in targeted marketing (e.g., focusing on the top 10% of high-value customers).
  6. Educational and Performance Assessment: Used by schools and universities to rank student performance (e.g., standardized test scores). Helps identify students needing extra support or advanced programs.
  7. Investment and Portfolio Management: Investors analyze stock or fund performance using decile rankings. Helps in risk management by comparing high-risk vs. low-risk assets.
  8. Robust Alternative to Percentiles and Quartiles: While quartiles divide data into four parts, deciles provide finer granularity (10 parts). They are more detailed than quintiles (5 parts) but less complex than percentiles (100 parts).

Conclusion

Deciles are a simple yet powerful tool for understanding data distributions, making comparisons, and supporting decision-making in fields like economics, business, education, and finance. They offer a balanced approach between simplicity (like quartiles) and extreme detail (like percentiles), making them widely useful in statistical analysis.


Measure of Central Tendency

Introduction to Measure of Central Tendency

A measure of central tendency is a statistic that summarizes an entire set of quantitative or qualitative data in a single value (a representative value of the data set) that tends to lie somewhere in the center of the data. The tendency of observations to cluster in the central part of the data is called central tendency, and the summary values are called measures of central tendency. They are also known as measures of location or position, or simply as averages.

Note that

  • A measure of central tendency should lie within the range of the data set.
  • It should remain unchanged by rearranging the observations in a different order.

Criteria of Satisfactory Measures of Location or Averages

There are several types of averages available to measure the representative value of a set of data or distribution. So, an average should satisfy or possess all or most of the following conditions.

  • It should be well-defined, i.e., rigorously defined. There should be no confusion in its definition. For example, the arithmetic mean is well defined as the sum of the values divided by their total number.
  • It should be based on all the observations made.
  • It should be simple to understand and easy to interpret.
  • It should be quick and easy to calculate.
  • It should be amenable to mathematical treatment.
  • It should be relatively stable in repeated sampling experiments.
  • It should not be unduly influenced by abnormally large or small observations (i.e., extreme observations).

The mean, median, and mode are all valid measures of central tendency, but under different conditions some measures are more appropriate than others. The appropriate calculation depends on the type of data, i.e., the level of measurement on which the data is measured.

Measures of Central Tendencies

The following are the measures of central tendencies for univariate or multivariate data.

  • Arithmetic mean: The sum of all measurements divided by the number of observations in the data set.
  • Median: The middlemost value of the sorted data. The median separates the higher half from the lower half of the data set, i.e., it partitions the data set into two equal parts.
  • Mode: The most frequent or repeated value in the data set.
  • Geometric mean: The $n$th root of the product of the $n$ data values.
  • Harmonic mean: The reciprocal of the arithmetic mean of the reciprocals of the data values.
  • Weighted mean: An arithmetic mean that assigns weights to the individual data values.
  • Distance-weighted estimator: The measure uses weighting coefficients for $x_i$ that are computed as the inverse mean distance between $x_i$ and the other data points.
  • Truncated mean: The arithmetic mean of data values after a certain number or proportion of the highest and lowest data values have been discarded.
  • Midrange: The arithmetic mean of the maximum and minimum values of a data set.
  • Midhinge: The arithmetic mean of the first and third quartiles.
  • Trimean: The weighted arithmetic mean of the median and the first and third quartiles.
  • Winsorized mean: An arithmetic mean in which extreme values are replaced by values closer to the median.
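Several of the measures listed above can be computed with Python's standard `statistics` module, as in the following sketch. The function name `summarize_center` and the sample values are hypothetical. The quartiles come from `statistics.quantiles` with its default "exclusive" method, which is only one of several quartile conventions, and the trimean uses the common weights 1, 2, 1 (first quartile, median, third quartile), an assumption not stated in the list.

```python
import statistics


def summarize_center(values):
    """Compute a handful of the central tendency measures listed above."""
    data = sorted(values)
    # Quartile cut points; the default method is "exclusive" (Python 3.8+).
    q1, _, q3 = statistics.quantiles(data, n=4)
    median = statistics.median(data)
    return {
        "mean": statistics.mean(data),
        "median": median,
        "mode": statistics.mode(data),
        "geometric_mean": statistics.geometric_mean(data),
        "harmonic_mean": statistics.harmonic_mean(data),
        "midrange": (data[0] + data[-1]) / 2,
        "midhinge": (q1 + q3) / 2,
        "trimean": (q1 + 2 * median + q3) / 4,
    }


sample = [2, 4, 4, 8]  # hypothetical positive values
summary = summarize_center(sample)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The geometric and harmonic means require positive data, which is why the sample contains only positive values.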

Note that measures of central tendency are applied according to the level of measurement (type of variable).


The best measure to use depends on the characteristics of your data and the specific question you’re trying to answer.

In summary, measures of central tendency are fundamental tools in statistics, and the choice among them depends on the characteristics of the data being studied. They summarize the data and provide a foundation for further analysis, and they yield valuable insights for decision-making and prediction. Therefore, understanding the measures of central tendency is essential for analyzing and interpreting data effectively.

FAQs about Measure of Central Tendency

  1. Define the measure of central tendency.
  2. What conditions must a measure of central tendency satisfy?
  3. Name widely used measures of central tendency.
  4. What is the function of the measures of central tendency?
  5. Which statistical measures can be applied at each level of measurement?



Moments In Statistics (2012)

Introduction to Moments in Statistics

Measures of central tendency (location) and measures of dispersion (variation) are useful for describing a data set, but neither tells us anything about the shape of the distribution. For that we need other measures, called moments. Moments in statistics are used to describe the shape of a distribution through its skewness and kurtosis.

Moments are fundamental statistical tools for understanding the characteristics of any dataset. They provide quantitative measures that describe the data:

  • Central tendency: The “center” of the data. The mean (the first moment about zero) is the most common measure of central tendency, although other measures can also be used.
  • Spread: Indicates how scattered the data is around the central tendency. Common measures of spread include variance and standard deviation.
  • Shape: Describes the overall form of the data distribution. For instance, is it symmetrical? Does it have a long tail on one side? Higher-order moments like skewness and kurtosis help analyze the shape.

Moments about Mean

The moments about the mean are the means of the deviations from the mean after raising them to integer powers. The $r$th population moment about the mean, denoted by $\mu_r$, is

\[\mu_r=\frac{\sum\limits^{N}_{i=1}(y_i - \bar{y} )^r}{N}\]

where $r=1,2,\cdots$

The corresponding sample moment, denoted by $m_r$, is

\[m_r=\frac{\sum\limits^{n}_{i=1}(y_i - \bar{y} )^r}{n}\]

Note that if $r=1$, the first moment is $m_1=\frac{\sum\limits^{n}_{i=1}(y_i - \bar{y} )^1}{n}=0$, because the deviations from the mean sum to zero. So the first moment about the mean is always zero.

If $r=2$, then the second moment is the variance, i.e., \[m_2=\frac{\sum\limits^{n}_{i=1}(y_i - \bar{y} )^2}{n}\]

Similarly, the 3rd and 4th moments are

\[m_3=\frac{\sum\limits^{n}_{i=1}(y_i - \bar{y} )^3}{n}\]

\[m_4=\frac{\sum\limits^{n}_{i=1}(y_i - \bar{y} )^4}{n}\]
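As a quick numerical illustration, the sample moments about the mean can be computed directly from their definition. The function name `central_moment` and the sample values are hypothetical; the divisor is $n$, as in the formulas above.

```python
def central_moment(values, r):
    """r-th sample moment about the mean, with divisor n."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / n
    return sum((y - mean) ** r for y in values) / n


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
moments = [central_moment(sample, r) for r in (1, 2, 3, 4)]
print(moments)  # [0.0, 4.0, 5.25, 44.5]
```

The first entry is zero, as expected, and the second is the variance computed with divisor $n$.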

For grouped data with $k$ classes, class midpoints $y_i$, and class frequencies $f_i$, the $r$th sample moment about the sample mean $\bar{y}$ is

\[m_r=\frac{\sum\limits^{k}_{i=1}f_i(y_i - \bar{y} )^r}{\sum\limits^{k}_{i=1}f_i}\]

where $\sum\limits^{k}_{i=1}f_i=n$
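The grouped-data version weights each deviation by its class frequency. The sketch below assumes that each $y_i$ is a class midpoint and that the mean is the frequency-weighted mean of the midpoints; the function name and the small frequency table are hypothetical.

```python
def grouped_central_moment(midpoints, frequencies, r):
    """r-th moment about the mean for grouped data (frequency-weighted)."""
    if len(midpoints) != len(frequencies):
        raise ValueError("midpoints and frequencies must have equal length")
    n = sum(frequencies)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    pairs = list(zip(midpoints, frequencies))
    mean = sum(f * y for y, f in pairs) / n
    return sum(f * (y - mean) ** r for y, f in pairs) / n


midpoints = [1, 2, 3]     # hypothetical class midpoints
frequencies = [1, 2, 1]   # hypothetical class frequencies
g2 = grouped_central_moment(midpoints, frequencies, 2)
g3 = grouped_central_moment(midpoints, frequencies, 3)
g4 = grouped_central_moment(midpoints, frequencies, 4)
print(g2, g3, g4)  # 0.5 0.0 0.5
```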

Moments about Arbitrary Value

The $r$th sample moment about any arbitrary origin “a”, denoted by $m'_r$, is
\[m'_r = \frac{\sum\limits^{n}_{i=1}(y_i - a)^r}{n} = \frac{\sum\limits^{n}_{i=1}D^r_i}{n}\]
where $D_i=(y_i -a)$ and $r=1,2,\cdots$.

therefore
\begin{eqnarray*}
m'_1&=&\frac{\sum\limits^{n}_{i=1}(y_i - a)}{n}=\frac{\sum\limits^{n}_{i=1}D_i}{n}\\
m'_2&=&\frac{\sum\limits^{n}_{i=1}(y_i - a)^2}{n}=\frac{\sum\limits^{n}_{i=1}D_i ^2}{n}\\
m'_3&=&\frac{\sum\limits^{n}_{i=1}(y_i - a)^3}{n}=\frac{\sum\limits^{n}_{i=1}D_i ^3}{n}\\
m'_4&=&\frac{\sum\limits^{n}_{i=1}(y_i - a)^4}{n}=\frac{\sum\limits^{n}_{i=1}D_i ^4}{n}
\end{eqnarray*}

The $r$th sample moment for grouped data with $k$ classes about any arbitrary origin “a” is

$$m'_r=\frac{\sum\limits^{k}_{i=1}f_i(y_i - a)^r}{\sum\limits^{k}_{i=1}f_i} = \frac{\sum f_i D_i ^r}{\sum f_i}$$

The moments about the mean are usually called central moments and the moments about any arbitrary origin “a” are called non-central moments or raw moments.

The moments about the mean can be obtained from the moments about an arbitrary value using the following relations

\begin{eqnarray*}
m_1&=& m'_1 - (m'_1) = 0 \\
m_2 &=& m'_2 - (m'_1)^2\\
m_3 &=& m'_3 - 3m'_2m'_1 +2(m'_1)^3\\
m_4 &=& m'_4 -4 m'_3m'_1 +6m'_2(m'_1)^2 -3(m'_1)^4
\end{eqnarray*}
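These relations can be verified numerically. The sketch below computes the raw moments about an arbitrary origin and converts them to moments about the mean; the function names, the sample values, and the origin $a=4$ are all hypothetical choices for illustration.

```python
def raw_moment(values, r, a=0.0):
    """r-th sample moment about the arbitrary origin a, with divisor n."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    return sum((y - a) ** r for y in values) / n


def central_from_raw(values, a=0.0):
    """First four moments about the mean, derived from moments about a."""
    r1, r2, r3, r4 = (raw_moment(values, r, a) for r in (1, 2, 3, 4))
    m1 = r1 - r1
    m2 = r2 - r1 ** 2
    m3 = r3 - 3 * r2 * r1 + 2 * r1 ** 3
    m4 = r4 - 4 * r3 * r1 + 6 * r2 * r1 ** 2 - 3 * r1 ** 4
    return m1, m2, m3, m4


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
converted = central_from_raw(sample, a=4)
print(converted)  # (0.0, 4.0, 5.25, 44.5)
```

The result does not depend on the choice of origin, so any convenient value of $a$ gives the same moments about the mean, up to floating-point rounding.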

Moments about Zero

If a variable $y$ assumes $n$ values $y_1, y_2, \cdots, y_n$, then the $r$th moment about zero is obtained by taking $a=0$, so the moment about an arbitrary value becomes
\[m'_r = \frac{\sum y^r}{n}\]

where $r=1,2,3,\cdots$.

therefore
\begin{eqnarray*}
m'_1&=&\frac{\sum y^1}{n}\\
m'_2 &=&\frac{\sum y^2}{n}\\
m'_3 &=&\frac{\sum y^3}{n}\\
m'_4 &=&\frac{\sum y^4}{n}\\
\end{eqnarray*}

The third moment is used to define the skewness of a distribution

\[{\rm Skewness} = \frac{\sum\limits^{n}_{i=1} (y_i-\overline{y})^3} {ns^3}\]

where $s$ is the standard deviation. If the distribution is symmetric, then the skewness will be zero. Skewness will be positive if there is a long tail in the positive direction, and negative if there is a long tail in the negative direction.

The fourth moment is used to define the kurtosis of a distribution

\[{\rm Kurtosis} = \frac{\sum\limits^{n}_{i=1} (y_i -\overline{y})^4}{ns^4}\]
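The two shape measures can be sketched in Python as follows. The text does not say whether $s$ uses divisor $n$ or $n-1$; this sketch assumes divisor $n$, so that $s^2$ equals the second moment about the mean. The function name and the sample values are hypothetical.

```python
import math


def skewness_kurtosis(values):
    """Moment-based skewness and kurtosis, assuming s uses divisor n."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / n
    m2 = sum((y - mean) ** 2 for y in values) / n
    if m2 == 0:
        raise ValueError("all values are equal, so s is zero")
    s = math.sqrt(m2)  # assumption: standard deviation with divisor n
    skew = sum((y - mean) ** 3 for y in values) / (n * s ** 3)
    kurt = sum((y - mean) ** 4 for y in values) / (n * s ** 4)
    return skew, kurt


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with a longer right tail
skew, kurt = skewness_kurtosis(sample)
print(skew, kurt)  # 0.65625 2.78125
```

The positive skewness reflects the longer tail toward the larger values in this sample.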


In summary, moments are quantitative measures that describe the distribution of a dataset around its central tendency. Different types of moments provide specific information about the shape and characteristics of the data. By understanding and using moments, one can gain a deeper understanding of the data and make more informed decisions in statistical analysis.

FAQs about Moments in Statistics

  1. Define moments in Statistics.
  2. What is the use of moments?
  3. How are moments used to understand the characteristics of the data?
  4. What is meant by moments about the mean?
  5. What are moments about an arbitrary value?
  6. What is meant by moments about zero?
  7. Define the different types of moments.