The Deciles: Measure of Position Made Easy (2012)

The deciles are the nine values of a variable that divide an ordered (sorted, arranged) data set into ten equal parts, so that each part contains $\frac{1}{10}$ of the sample or population. They are denoted by $D_1, D_2, \cdots, D_9$. The first decile ($D_1$) is the value in the order statistics that exceeds $\frac{1}{10}$ of the observations and is less than the remaining $\frac{9}{10}$. The ninth decile ($D_9$) is the value in the order statistics that exceeds $\frac{9}{10}$ of the observations and is less than the remaining $\frac{1}{10}$. Note that the fifth decile is equal to the median. The deciles determine the values for 10%, 20%, …, and 90% of the data.

Calculating Deciles for Ungrouped Data

To calculate a decile for ungrouped data, first arrange all observations in order of magnitude, then use the following formula for the $m$th decile.

\[D_m = m \times \left( \frac{n+1}{10} \right) \mbox{th value}, \qquad \mbox{where } m=1, 2, \cdots, 9\]

Example: Calculate the 2nd and 8th deciles of the following ordered data: 13, 13, 13, 20, 26, 27, 31, 34, 34, 34, 35, 35, 36, 37, 38, 41, 41, 41, 45, 47, 47, 47, 50, 51, 53, 54, 56, 62, 67, 82.
Solution:

\begin{eqnarray*}
D_m &=& m \times \left\{\frac{n+1}{10}\right\} \mbox{th value}\\
D_2 &=& 2 \times \frac{30+1}{10} = 6.2\mbox{th value}
\end{eqnarray*}

We have to locate the sixth value in the ordered array and then move 0.2 of the distance between the sixth and seventh values. That is, the value of the 2nd decile can be calculated as
\[6\mbox{th observation} + \{7\mbox{th observation} - 6\mbox{th observation}\}\times 0.2\]
As the 6th observation is 27 and the 7th observation is 31,
the second decile is $27+\{31-27\} \times 0.2 = 27.8$

Similarly, $D_8$ lies at the $8 \times \frac{30+1}{10} = 24.8$th position. Since the 24th observation is 51 and the 25th is 53, $D_8 = 51 + \{53-51\} \times 0.8 = 52.6$.
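These positional calculations are easy to script. Below is a minimal Python sketch (an illustration, not part of the original post) of the $m \times \frac{n+1}{10}$ rule with linear interpolation; the function name decile is ours, and the results are rounded for display.

```python
# Illustrative sketch (not from the original post): m-th decile of
# ungrouped data via the positional rule pos = m*(n+1)/10.

def decile(data, m):
    """Return the m-th decile (m = 1..9) of a list of numbers."""
    x = sorted(data)                 # order the observations first
    pos = m * (len(x) + 1) / 10      # positional index, may be fractional
    k = int(pos)                     # index of the lower neighbouring value
    frac = pos - k                   # fraction of the gap to move upward
    if k >= len(x):                  # position falls past the last value
        return x[-1]
    return x[k - 1] + frac * (x[k] - x[k - 1])   # linear interpolation

data = [13, 13, 13, 20, 26, 27, 31, 34, 34, 34, 35, 35, 36, 37, 38,
        41, 41, 41, 45, 47, 47, 47, 50, 51, 53, 54, 56, 62, 67, 82]
print(round(decile(data, 2), 3))   # 27.8, matching the worked example
print(round(decile(data, 8), 3))   # 52.6
```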

Calculating Deciles for Grouped Data

For grouped data (arranged in ascending order), the $m$th decile can be calculated using the following formula.

\[D_m=l+\frac{h}{f}\left(\frac{mn}{10}-c\right)\]

where

$l$ = lower class boundary of the class containing the $m$th decile
$h$ = width of the class containing the $m$th decile
$f$ = frequency of the class containing the $m$th decile
$n$ = total number of observations (total frequency)
$c$ = cumulative frequency of the class preceding the class containing the $m$th decile

Example: Calculate the first and seventh deciles of the following grouped data

[Grouped frequency table not reproduced; from the solution below: $n = 30$, class width $h = 5$, the first class is 85.5–90.5 with $f = 6$, and the fourth class is 100.5–105.5 with $f = 6$ and preceding cumulative frequency $c = 20$.]

Solution: The decile class for $D_1$ is located by computing $\frac{mn}{10} = \frac{1 \times 30}{10} = 3$, i.e., the 3rd observation. As the 3rd observation lies in the first class (first group), so

\begin{eqnarray*}
D_m&=&l+\frac{h}{f}\left(\frac{mn}{10}-c\right)\\
D_1&=&85.5+\frac{5}{6}\left(\frac{1\times 30}{10}-0\right)\\
&=&88
\end{eqnarray*}

The decile class for $D_7$ is 100.5–105.5, as $\frac{7 \times 30}{10}=21$ and the 21st observation lies in the fourth class (group).
\begin{eqnarray*}
D_m&=&l+\frac{h}{f}\left(\frac{mn}{10}-c\right)\\
D_7&=&100.5+\frac{5}{6}\left(\frac{7\times 30}{10}-20\right)\\
&=&101.333
\end{eqnarray*}
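The same arithmetic can be scripted once the decile class is identified. The following Python sketch (an illustration, not part of the original post) simply evaluates the grouped-data formula; the arguments are the $l$, $h$, $f$, $c$, $n$, and $m$ values taken from the two calculations above.

```python
# Illustrative sketch (not from the original post): evaluate the
# grouped-data formula D_m = l + (h/f) * (m*n/10 - c), where the class
# containing the m-th decile has already been located.

def grouped_decile(l, h, f, c, n, m):
    """l: lower boundary, h: width, f: frequency of the decile class;
    c: preceding cumulative frequency; n: total frequency; m: decile."""
    return l + (h / f) * (m * n / 10 - c)

print(grouped_decile(l=85.5, h=5, f=6, c=0, n=30, m=1))    # D1 = 88.0
print(grouped_decile(l=100.5, h=5, f=6, c=20, n=30, m=7))  # D7 = 101.333...
```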


Measure of Central Tendency (2024): A Comprehensive Guide

Introduction to Measure of Central Tendency

A measure of central tendency is a statistic that summarizes an entire quantitative or qualitative data set in a single value (a representative value of the data set) that tends to lie somewhere in the center of the data. The tendency of the observations to cluster in the central part of the data is called central tendency, and the summary values are called measures of central tendency. They are also known as measures of location or position, or simply as averages.

Note that

  • The Measure of central tendency should be somewhere within the range of the data set.
  • It should remain unchanged by a rearrangement of the observations in a different order.

Criteria of Satisfactory Measures of Location or Averages

There are several types of averages available to measure the representative value of a data set or distribution, so an average should satisfy all, or most, of the following conditions.

  • It should be well defined, i.e., rigorously defined, so that there is no confusion in its definition. For example, “the sum of the values divided by their number” is a well-defined definition of the arithmetic mean.
  • It should be based on all the observations made.
  • It should be simple to understand and easy to interpret.
  • It should be quick and easy to calculate.
  • It should be amenable to mathematical treatment.
  • It should be relatively stable in repeated sampling experiments.
  • It should not be unduly influenced by abnormally large or small observations (i.e., extreme observations).

The mean, median, and mode are all valid measures of central tendency, but under different conditions, some measures become more appropriate to use than others. There are several different calculations for central tendency, and which one is appropriate depends on the type of data, i.e., the level of measurement on which the data are measured.

Measures of Central Tendency

The following are the measures of central tendency for univariate or multivariate data; several of them are computed in the Python sketch after the list.

  • Arithmetic mean: The sum of all measurements divided by the number of observations in the data set.
  • Median: The middlemost value of the sorted data. The median separates the higher half from the lower half of the data set, i.e., it partitions the data set into two equal parts.
  • Mode: The most frequent or repeated value in the data set.
  • Geometric mean: The $n$th root of the product of the $n$ data values.
  • Harmonic mean: The reciprocal of the arithmetic mean of the reciprocals of the data values.
  • Weighted mean: An arithmetic mean that incorporates weights for certain elements of the data.
  • Distance-weighted estimator: A measure that uses weighting coefficients for each $x_i$, computed as the inverse mean distance between $x_i$ and the other data points.
  • Truncated mean: The arithmetic mean of the data values after a certain number or proportion of the highest and lowest values have been discarded.
  • Midrange: The arithmetic mean of the maximum and minimum values of a data set.
  • Midhinge: The arithmetic mean of the first and third quartiles.
  • Trimean: The weighted arithmetic mean of the median and the two quartiles.
  • Winsorized mean: An arithmetic mean in which extreme values are replaced by values closer to the median.
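As a rough illustration (not from the original text), the following Python snippet computes several of the measures listed above with the standard library's statistics module; the sample data are made up, and geometric_mean and quantiles require Python 3.8 or later.

```python
# Illustrative sketch (not from the original post): common measures of
# central tendency via Python's statistics module.
import statistics as st

x = [2, 3, 3, 5, 7, 10, 14]              # made-up sample data

print(st.mean(x))                        # arithmetic mean
print(st.median(x))                      # median (middlemost value)
print(st.mode(x))                        # mode (most frequent value)
print(st.geometric_mean(x))              # nth root of the product
print(st.harmonic_mean(x))               # reciprocal of mean of reciprocals

print((min(x) + max(x)) / 2)             # midrange
q1, q2, q3 = st.quantiles(x, n=4)        # the three quartiles
print((q1 + q3) / 2)                     # midhinge (mean of Q1 and Q3)
print((q1 + 2 * q2 + q3) / 4)            # trimean
```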

Note that measures of central tendency are applied according to different levels of measurement (the type of variable).


The best measure to use depends on the characteristics of your data and the specific question you’re trying to answer.

In summary, measures of central tendency are fundamental tools in statistics whose use depends on the characteristics of the data being studied. These measures summarize the data and provide insight and a foundation for further analysis. They also help in deriving valuable insights for decision-making and prediction. Therefore, understanding measures of central tendency is essential to analyzing and interpreting data effectively.

FAQS about Measure of Central Tendency

  1. Define the measure of central tendency.
  2. What conditions should a measure of central tendency satisfy?
  3. Name the widely used measures of central tendency.
  4. What purpose do measures of central tendency serve?
  5. Which statistical measures can be applied at which level of measurement?



Probability Theory: An Introduction (2012)

This post is about probability theory. It will serve as an introduction to the theory of chances.

Probability Theory

Uncertainty is everywhere, i.e., nothing in this world is perfect or 100% certain except the Almighty Allah, the Creator of the Universe. For example, if someone buys 10 lottery tickets out of 500, and each of the 500 tickets is as likely as any other to be drawn for the first prize, then that person has 10 chances out of 500, or a 2% chance, of winning the first prize.

Similarly, a decision maker seldom has complete information to make a decision.
Probability is a measure of the likelihood that something will happen; however, probability cannot predict the number of times that something will occur in the future, so all the known risks involved must be scientifically evaluated. The decisions that affect our daily lives are based upon likelihood (probability or chance), not on absolute certainty. Probability theory allows a decision maker with only limited information to analyze the risks and minimize the inherent gamble, for example, in marketing a new product or in accepting an incoming shipment that possibly contains defective parts.


Probability can be considered the quantification of uncertainty or likelihood. Probabilities are usually expressed as fractions such as {1/6, 1/2, 8/9} or as decimals such as {0.167, 0.5, 0.889}, and can also be presented as percentages such as {16.7%, 50%, 88.9%}.

Types of Probability

Suppose we want to compute the chances (note that we are not predicting here, just measuring the chances) that something will occur in the future. For this purpose, there are three types of probability:

1) Classical Approach or A Priori Approach

In the classical probability approach, two assumptions are made:

  • Outcomes are mutually exclusive
  • Outcomes are equally likely

Classical probability is defined as “the number of outcomes favorable to the occurrence of an event divided by the total number of all possible outcomes”.
OR
If an experiment results in $n$ equally likely, mutually exclusive, and collectively exhaustive outcomes, and $m$ of them are favorable to the occurrence of an event $A$, then the probability of event $A$ is the ratio $\frac{m}{n}$ (P. S. Laplace, 1749–1827).

Symbolically, we can write $$P(A) = \frac{m}{n} = \frac{\mbox{number of favorable outcomes}}{\mbox{total number of outcomes}}$$
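As an illustration (not part of the original post), the classical definition lends itself to direct enumeration; the two-dice example below is our own.

```python
# Illustrative sketch (not from the original post): classical probability
# P(A) = m/n by enumerating equally likely outcomes.
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))
# Event A: the sum of the two dice equals 7 (six favorable outcomes)
favorable = [o for o in outcomes if sum(o) == 7]
print(len(favorable) / len(outcomes))   # 6/36 = 0.1666...
```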

Some shortcomings of the classical approach

  • This approach to probability is useful only when one deals with card games, dice games, or coin tosses, i.e., where events are equally likely; it is not suitable for serious problems such as management decisions.
  • This approach assumes a world that does not exist, since the assumptions described above are imposed.
  • This approach assumes symmetry in the world, but there may be disorder in a system.

2) Relative Frequency or Empirical Probability or A Posteriori Approach

This approach defines probability as the proportion of times that an event occurs in the long run when conditions are stable; the relative frequency becomes stable as the number of trials becomes large under uniform conditions.
To calculate the relative frequency, an experiment is repeated a large number of times, say $n$, under uniform/stable conditions. If an event $A$ occurs $m$ times, then the probability of the occurrence of event $A$ is defined by
$$P(A)=\lim_{n\to\infty}\frac{m}{n}$$

If we say that the probability of a newborn child being a boy is $\frac{1}{2}$, it means that over a large number of children born, 50% of them will be boys.
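A small simulation (again an illustration, not part of the original post) shows the relative frequency $\frac{m}{n}$ settling down as $n$ grows; the coin-toss event and the seed are assumptions for demonstration.

```python
# Illustrative sketch (not from the original post): relative frequency
# m/n of the event "a fair coin lands heads" as the trials accumulate.
import random

random.seed(1)                  # fixed seed so the run is reproducible
m = 0                           # number of times the event has occurred
for n in range(1, 100_001):
    m += random.random() < 0.5  # adds 1 for heads, 0 for tails
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(n, m / n)         # drifts toward the long-run value 0.5
```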

Some Criticisms

  • It is difficult to ensure that the experiment is repeated under stable/uniform conditions.
  • The experiment can be repeated only a finite number of times in the real world, not an infinite number of times.

3) Subjective Probability Approach

This is probability based on the beliefs of the person making the probability assessment.
Subjective probability assessments are often found when events occur only once or at most a very few times.
This approach is applicable in business, marketing, and economics for quick decisions made without performing any mathematical calculations.
The disadvantage of subjective probability is that two or more persons facing the same evidence/problem may arrive at different probabilities, i.e., for the same problem, there may be different decisions.

Real-Life Example of Subjective Probability

  • A firm must decide whether or not to market a new type of product. The decision will be based on prior information that the product will have high market acceptance.
  • The Sales Manager considers that there is a 40% chance of obtaining the order for which the firm has just quoted. This value (40% chance) cannot be tested by repeated trials.
  • Estimating the probability that you will be married before the age of 30 years.
  • Estimating the likelihood (probability, chances) that Pakistan’s budget deficit will be reduced by half in the next 5 years.

Note that since subjective probability does not involve a repeatable experiment, the relative frequency approach to probability is not applicable, nor can equally likely probabilities be assigned.
