Measure of Central Tendency (2024): A Comprehensive Guide

Introduction to Measure of Central Tendency

A measure of central tendency is a statistic that summarizes an entire quantitative or qualitative data set in a single value (a representative value of the data set) that tends to lie somewhere in the center of the data. The tendency of the observations to cluster in the central part of the data is called central tendency, and the summary values are called measures of central tendency. They are also known as measures of location or position, or simply as averages.

Note that

  • A measure of central tendency should lie somewhere within the range of the data set.
  • It should remain unchanged by a rearrangement of the observations in a different order.

Criteria of Satisfactory Measures of Location or Averages

There are several types of averages available to measure the representative value of a data set or distribution, so a good average should satisfy or possess all or most of the following conditions.

  • It should be well defined, i.e. rigorously defined, so that there is no confusion about its meaning. For example, "the sum of the values divided by their number" is a well-defined definition of the arithmetic mean.
  • It should be based on all the observations made.
  • Should be simple to understand and easy to interpret.
  • Can be calculated quickly and easily.
  • Should be amenable/manageable to mathematical treatment.
  • Should be relatively stable under repeated sampling experiments.
  • Should not be unduly influenced by abnormally large or small observations (i.e. extreme observations).

The mean, median, and mode are all valid measures of central tendency, but under different conditions some measures become more appropriate to use than others. The appropriate calculation of central tendency depends on the type of data, i.e. the level of measurement on which the data are measured.

Measures of Central Tendency

The following are the measures of central tendency for univariate or multivariate data.

  • The arithmetic mean: The sum of all measurements divided by the number of observations in the data set
  • Median: The middlemost value of the sorted data. The median separates the higher half from the lower half of the data set, i.e. it partitions the data set into two equal parts.
  • Mode: The most frequent or repeated value in the data set.
  • Geometric mean: The nth root of the product of the data values.
  • Harmonic mean: The reciprocal of the arithmetic mean of the reciprocals of the data values
  • Weighted mean: An arithmetic mean that incorporates weights assigned to certain elements of the data.
  • Distance-weighted estimator: The measure uses weighting coefficients for $x_i$ that are computed as the inverse mean distance between $x_i$ and the other data points.
  • Truncated mean: The arithmetic mean of data values after a certain number or proportion of the highest and lowest data values have been discarded.
  • Midrange: The arithmetic mean of the maximum and minimum values of a data set.
  • Midhinge: The arithmetic mean of the two quartiles.
  • Trimean: The weighted arithmetic mean of the median and two quartiles.
  • Winsorized mean: An arithmetic mean in which extreme values are replaced by values closer to the median.
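As a quick illustration of how several of these measures are computed, here is a minimal base-R sketch using a small assumed data vector x (the mode line is only a workaround, since base R has no built-in mode function, and the weights in the weighted mean are arbitrary):

```r
# Assumed illustrative data vector
x <- c(2, 3, 3, 5, 7, 8, 9, 12, 48)

mean(x)                                  # arithmetic mean
median(x)                                # middlemost value of the sorted data
as.numeric(names(which.max(table(x))))   # mode: the most frequent value
exp(mean(log(x)))                        # geometric mean (nth root of the product)
1 / mean(1 / x)                          # harmonic mean
weighted.mean(x, w = rev(seq_along(x)))  # weighted mean with arbitrary weights
mean(x, trim = 0.2)                      # truncated mean: 20% dropped from each end
(min(x) + max(x)) / 2                    # midrange
mean(quantile(x, c(0.25, 0.75)))         # midhinge: mean of the two quartiles
```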

Note that measures of central tendency are applied according to the level of measurement (the type of variable).


The best measure to use depends on the characteristics of your data and the specific question you’re trying to answer.

In summary, measures of central tendency are fundamental tools in statistics whose use depends on the characteristics of the data being studied. They summarize the data, provide a foundation for further analysis, and help in obtaining valuable insights for decision-making and prediction. Therefore, understanding the measures of central tendency is essential for effectively analyzing and interpreting data.

FAQs about Measure of Central Tendency

  1. Define the measure of central tendency.
  2. What conditions should a measure of central tendency satisfy?
  3. Name widely used measures of central tendency.
  4. What is the functionality of the measure of central tendencies?
  5. What statistical measures can be applied on which level of measurement?

Reference


1) Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-
2) https://en.wikipedia.org/wiki/Central_tendency
3) Dodge, Y. (2005). The Concise Encyclopedia of Statistics. Springer.


Probability Theory: An Introduction (2012)

This post is about probability theory. It will serve as an introduction to the theory of chances.

Probability Theory

Uncertainty is everywhere, i.e. nothing in this world is perfect or 100% certain except the Almighty Allah, the Creator of the Universe. For example, if someone buys 10 lottery tickets out of 500, and each of the 500 tickets is as likely as any other to be drawn for the first prize, then that person has 10 chances out of 500 tickets, or a 2% chance, of winning the first prize.

Similarly, a decision maker seldom has complete information to make a decision.
So, probability is a measure of the likelihood that something will happen; however, probability cannot predict the number of times something will occur in the future, so all the known risks involved must be scientifically evaluated. The decisions that affect our daily life are based on likelihood (probability or chance), not on absolute certainty. Probability theory allows a decision maker with only limited information to analyze the risks and minimize the inherent gamble, for example, in marketing a new product or accepting an incoming shipment that possibly contains defective parts.


Probability can be considered as the quantification of uncertainty or likelihood. Probabilities are usually expressed as fractions such as {1/6, 1/2, 8/9} or as decimals such as {0.167, 0.5, 0.889} and can also be presented as percentages such as {16.7%, 50%, 88.9%}.

Types of Probability

Suppose we want to compute the chances (note that we are not predicting here, just measuring the chances) that something will occur in the future. For this purpose, there are three types of probability.

1) Classical Approach or A Priori Approach

In a classical probability approach, two assumptions are used

  • Outcomes are mutually exclusive
  • Outcomes are equally likely

Classical probability is defined as "the number of outcomes favorable to the occurrence of an event divided by the total number of all possible outcomes".
OR
If an experiment results in $n$ equally likely, mutually exclusive, and collectively exhaustive outcomes, of which $m$ are favorable to the occurrence of an event $A$, then the probability of event $A$ is the ratio $\frac{m}{n}$ (P. S. Laplace, 1749-1827).

Symbolically we can write $$P(A) = \frac{m}{n} = \frac{number\,\, of\,\, favorable\,\, outcomes}{Total\,\, number\,\, of\,\, outcomes}$$
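As a quick illustration of the ratio $\frac{m}{n}$, consider the (hypothetical) event "an even number turns up" when a fair die is rolled; a short R sketch:

```r
S <- 1:6                 # sample space of a fair die
A <- c(2, 4, 6)          # event: an even number turns up
length(A) / length(S)    # P(A) = m/n = 3/6 = 0.5
```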

Some shortcomings of the classical approach

  • This approach to probability is useful only when one deals with card games, dice games, or coin tosses, i.e. situations in which the outcomes are equally likely; it is not suitable for serious problems such as decisions in management.
  • This approach assumes a world that does not exist, as some assumptions are imposed as described above.
  • This approach assumes symmetry about the world but there may be some disorder in a system.

2) Relative Frequency or Empirical Probability or A Posteriori Approach

The relative frequency of an event is the proportion of times the event occurs in the long run when conditions are stable. The relative frequency becomes stable as the number of trials becomes large under uniform conditions.
To calculate the relative frequency, an experiment is repeated a large number of times, say $n$, under uniform/stable conditions. If an event $A$ occurs $m$ times, then the probability of the occurrence of the event $A$ is defined by
$$P(A)=\lim_{n\to\infty}\frac{m}{n}$$

If we say that the probability that a newborn child will be a boy is $\frac{1}{2}$, it means that over a large number of births, about 50% of the children will be boys.
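A short simulation sketch of this idea in R: toss a fair coin a large (assumed) number of times and watch the proportion of heads settle near $\frac{1}{2}$.

```r
set.seed(1)                                     # for reproducibility
n <- 10000                                      # assumed number of tosses
tosses <- sample(c("H", "T"), n, replace = TRUE)
rel_freq <- cumsum(tosses == "H") / seq_len(n)  # running proportion of heads
rel_freq[c(10, 100, 1000, n)]                   # approaches 0.5 as n grows
```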

Some Criticisms

  • It is difficult to ensure that the experiment is repeated under stable/uniform conditions.
  • The experiment can be repeated only a finite number of times in the real world, not an infinite number of times.

3) Subjective Probability Approach

This is the probability based on the beliefs of the persons making the probability assessment.
Subjective probability assessments are often found when events occur only once or at most a very few times.
This approach is applicable in business, marketing, and economics for quick decisions without performing any mathematical calculations.
A disadvantage of subjective probability is that two or more persons facing the same evidence/problem may arrive at different probabilities, i.e. for the same problem there may be different decisions.

Real-Life Example of Subjective Probability

  • A firm must decide whether or not to market a new type of product. The decision will be based on prior information that the product will have high market acceptance.
  • The Sales Manager considers that there is a 40% chance of obtaining the order for which the firm has just quoted. This value (40% chance) cannot be tested by repeated trials.
  • Estimating the probability that you will be married before the age of 30 years.
  • Estimating the likelihood (probability, chances) that Pakistan’s budget deficit will be reduced by half in the next 5 years.

Note that because subjective probability does not involve a repeatable experiment, the relative frequency approach to probability is not applicable, nor can equally likely probabilities be assigned.



Probability Terminology


The following probability terminology is helpful in understanding the concepts and rules of probability and in solving different probability-related real-life problems.

Sets: A set is a well-defined collection of distinct objects. The objects making up a set are called its elements. A set is usually denoted by capital letters, i.e. $A, B, C$, while its elements are denoted by small letters, i.e. $a, b, c$, etc.

Null Set: A set that contains no element is called a null set or simply an empty set. It is denoted by { } or $\varnothing$.

Subset: If every element of a set $A$ is also an element of a set $B$, then $A$ is said to be a subset of $B$ and it is denoted by $A \subseteq B$.

Proper Subset: If $A$ is a subset of $B$, and $B$ contains at least one element that is not an element of $A$, then $A$ is said to be a proper subset of $B$ and is denoted by; $A \subset B$.

Finite and Infinite Sets: A set is finite if it contains a specific number of elements, i.e. while counting the members of the set, the counting process comes to an end; otherwise, the set is infinite.

Universal Set: A set consisting of all the elements of the sets under consideration is called the universal set. It is denoted by $U$.

Disjoint Set: Two sets $A$ and $B$ are said to be disjoint sets if they have no elements in common, i.e. if $A \cap B = \varnothing$, then $A$ and $B$ are said to be disjoint sets.

Overlapping Sets: Two sets $A$ and $B$ are said to be overlapping sets, if they have at least one element in common, i.e. if $A \cap B \ne \varnothing$ and none of them is the subset of the other set then $A$ and $B$ are overlapping sets.

Union of Sets: The union of two sets $A$ and $B$ is a set that contains the elements belonging either to $A$ or to $B$ or to both. It is denoted by $A \cup B$ and read as $A$ union $B$.

Intersection of Sets: The intersection of two sets $A$ and $B$ is a set that contains the elements belonging to both $A$ and $B$. It is denoted by $A \cap B$ and read as $A$ intersection $B$.

Difference of Sets: The difference between a set $A$ and a set $B$ is the set that contains the elements of the set $A$ that are not contained in $B$. The difference between sets $A$ and $B$ is denoted by $A - B$.

Complement of a Set: The complement of a set $A$, denoted by $\bar{A}$ or $A^c$, is defined as $\bar{A} = U - A$, i.e. the set of all elements of the universal set that do not belong to $A$.
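These set relations can be tried directly in R, which has built-in set functions; the sets $U$, $A$, and $B$ below are assumed purely for illustration:

```r
U <- 1:10                      # assumed universal set
A <- c(1, 2, 3, 4)
B <- c(3, 4, 5, 6)

union(A, B)                    # A union B
intersect(A, B)                # A intersection B
setdiff(A, B)                  # difference A - B
setdiff(U, A)                  # complement of A with respect to U
all(A %in% B)                  # TRUE only if A is a subset of B
length(intersect(A, B)) == 0   # TRUE only if A and B are disjoint
```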

Experiment: Any activity where we observe something or measure something. An activity that results in or produces an event is called an experiment.

Random Experiment: An experiment that, if repeated under identical conditions, may not give the same outcome; i.e. the outcome of a random experiment is uncertain, so a given outcome is just one sample of many possible outcomes. For a random experiment, we know all of the possible outcomes. A random experiment has the following properties;

  1. The experiment can be repeated any number of times.
  2. A random trial consists of at least two outcomes.

Sample Space: The set of all possible outcomes of a random experiment is called the sample space. In the coin-toss experiment, the sample space is $S=\{Head, Tail\}$; in the card-drawing experiment, the sample space has 52 members. Similarly, the sample space for a die is $S=\{1,2,3,4,5,6\}$.

Event: An event is simply a subset of the sample space. In a sample space, there can be two or more events consisting of sample points. For a coin, the number of all possible events is 4, found by $2^n$ where $n$ is the number of sample points; that is, i) $A_1 = \{H\}$, ii) $A_2=\{T\}$, iii) $A_3=\{H, T\}$, and iv) $A_4=\varnothing$ are the possible events for the coin-toss experiment.

Simple Event: If an event consists of one sample point, then it is called a simple event. For example, when two coins are tossed, the event {TT} is simple.

Compound Event: If an event consists of more than one sample point, it is called a compound event. For example, when two dice are rolled, the event $B$ that the sum of the two faces is 4, i.e. $B=\{(1,3), (2,2), (3,1)\}$, is a compound event.
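As a small illustrative sketch (assumed code, not part of the original text), the compound event above can be enumerated in R by listing the 36 outcomes of rolling two dice and keeping the pairs whose faces sum to 4:

```r
S <- expand.grid(die1 = 1:6, die2 = 1:6)   # all 36 outcomes of rolling two dice
B <- S[S$die1 + S$die2 == 4, ]             # compound event: the two faces sum to 4
B                                          # (1,3), (2,2), (3,1)
nrow(B) / nrow(S)                          # classical probability of B: 3/36
```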

Independent Events: Two events $A$ and $B$ are said to be independent if the occurrence of one does not affect the occurrence of the other. For example, in tossing two coins, the occurrence of a head on one coin does not affect in any way the occurrence of a head or tail on the other coin.

Dependent Events: Two events A and B are said to be dependent if the occurrence of one event affects the occurrence of the other event.

Mutually Exclusive Events: Two events $A$ and $B$ are said to be mutually exclusive if they cannot occur at the same time, i.e. $A\cap B=\varnothing$. For example, when a coin is tossed, we get either a head or a tail, but not both; the two events (head and tail) have no sample point in common, so they are mutually exclusive. Similarly, when a die is thrown, the possible outcomes 1, 2, 3, 4, 5, and 6 are mutually exclusive.


Equally Likely or Non-Mutually Exclusive Events: Two events $A$ and $B$ are said to be equally likely when one event is as likely to occur as the other, or, if the experiment is repeated a large number of times, all the events have the chance of occurring an equal number of times. Non-mutually exclusive events have at least one sample point in common, i.e. $A\cap B \ne\varnothing$. For example, when a coin is tossed, the head is as likely to occur as the tail and vice versa.

Exhaustive Events: When a sample space $S$ is partitioned into mutually exclusive events such that their union is the sample space itself, the events are called exhaustive events. OR
Events are said to be collectively exhaustive when the union of mutually exclusive events is the entire sample space $S$.
Let a die be rolled; the sample space is $S=\{1,2,3,4,5,6\}$.
Let $A=\{1, 2\}$, $B=\{3, 4, 5\}$, and $C=\{6\}$.

$A, B$, and $C$ are mutually exclusive events and their union $(A\cup B \cup C = S)$ is the sample space, so the events $A, B$, and $C$ are exhaustive.
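As a quick check, the die example can be verified in a few lines of R (a small sketch using the sets defined above):

```r
S <- 1:6
A <- c(1, 2); B <- c(3, 4, 5); C <- 6

setequal(union(union(A, B), C), S)    # TRUE: A, B, C are collectively exhaustive
length(intersect(A, B)) == 0 &&
  length(intersect(A, C)) == 0 &&
  length(intersect(B, C)) == 0        # TRUE: pairwise mutually exclusive
```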


Pareto Chart Easy Guide (2012)

A Pareto chart, named after Vilfredo Pareto (an Italian economist), is a bar chart in which the bars are ordered from largest to smallest, along with a line showing the cumulative percentage and count of the bars. The left vertical axis shows the frequency of occurrence (number of occurrences) or some other important unit of measure such as cost. The right vertical axis shows the cumulative percentage of the total number of occurrences or of the total of the particular unit of measure, such as total cost. For a Pareto chart, the cumulative function is concave because the bars (representing the reasons) are in decreasing order. A Pareto chart is also called a Pareto distribution diagram.

The Pareto chart is also known as the 80/20 rule chart. These charts offer several benefits for data analysis and problem-solving.

A Pareto chart can be used when the answer to the following questions is "yes":

  1. Can data be arranged into categories?
  2. Is the rank of each category important?

Pareto charts are often used to analyze defects in a manufacturing process or the most frequent reasons for customer complaints, to help determine the types of defects that are most prevalent (important) in a process. A company can then focus its improvement efforts on the particular areas where it can make the largest gain or avoid the largest loss by eliminating causes of defects, so it is easy to prioritize the problem areas using Pareto charts. The categories in the "tail" of the Pareto chart are called the insignificant factors.

Pareto Chart Example

[Pareto chart: consumer complaints against airlines, 2004]

The Pareto chart given above shows the reasons for consumer complaints against airlines in 2004. Each bar represents the number (frequency) of complaints of each type received. The largest group of complaints relates to flight problems (such as cancellations, delays, and other deviations from the schedule). The second-largest group concerns customer service (rude or unhelpful employees, inadequate meals or cabin service, treatment of delayed passengers, etc.). Flight problems account for 21% of the complaints, while flight problems and customer service together account for 40% of the complaints. The top three complaint categories account for 55% of the complaints. So, to reduce the number of complaints, airlines need to work on flight delays, customer service, and baggage problems.
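Below is a minimal base-R sketch of how such a chart can be drawn. The complaint categories and counts are made up for illustration (they are not the 2004 airline figures); packages such as qcc also offer a ready-made pareto.chart() function.

```r
# Hypothetical complaint counts (made-up data, not the 2004 airline figures)
counts <- c(Flight = 210, Service = 190, Baggage = 150, Ticketing = 90, Other = 60)
counts <- sort(counts, decreasing = TRUE)   # bars ordered from largest to smallest
cum_n  <- cumsum(counts)                    # cumulative counts
round(100 * cum_n / sum(counts))            # cumulative percentages

bp <- barplot(counts, ylim = c(0, sum(counts)),
              ylab = "Number of complaints", main = "Pareto Chart (sketch)")
lines(bp, cum_n, type = "b", pch = 19)      # cumulative count line
axis(4, at = seq(0, sum(counts), length.out = 5),
     labels = paste0(seq(0, 100, by = 25), "%"))  # right axis: cumulative %
```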

By incorporating Pareto charts into data analysis, one can gain valuable insights, prioritize effectively, and make data-driven decisions.


References:

  • Nancy R. Tague (2004). “Seven Basic Quality Tools”. The Quality Toolbox. Milwaukee, Wisconsin: American Society for Quality. p. 15. Retrieved 2010-02-05.
  • http://en.wikipedia.org/wiki/Pareto_chart
