Basic Statistics - Statistics for Data Science & Analytics

Measures of Central Tendency

Apr 4, 2025 | Apr 5, 2012 by Muhammad Imdad Ullah

The median is one of the three main measures of central tendency, alongside the mean and the mode. It is the middle value of an ordered dataset. It is a reliable and widely used summary statistic, especially in real-life scenarios where the data are skewed or contain outliers. Unlike the mean, the median is largely unaffected by extreme values, which makes it useful in many fields. For the formula of the median, read the post: formula of median and definition.

When the Median is Preferred over the Mean

Question: What is a measure of central tendency, and what are the common measures of central tendency? Also, when is the median preferred over the mean?

A measure of central tendency is the single numerical value considered most typical of the values of a quantitative variable.

The most common measures of central tendency are the mode (i.e., the most frequently occurring value), the median (i.e., the middle point or fiftieth percentile), and the mean (i.e., the arithmetic average).

The median is preferred over the mean when the data are highly skewed or contain outliers, because extreme values pull the mean toward the long tail while the median stays near the bulk of the observations.
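The three measures can be compared on a small skewed data set. The following is a minimal sketch using Python's standard `statistics` module; the salary figures are hypothetical and chosen only to show how a single extreme value moves the mean but not the median.

```python
from statistics import mean, median, mode

# Hypothetical salaries (in dollars); the last value is an extreme outlier.
salaries = [40_000, 45_000, 45_000, 50_000, 55_000, 60_000, 10_000_000]

print("mode:  ", mode(salaries))    # most frequent value: 45000
print("median:", median(salaries))  # middle value of the sorted data: 50000
print("mean:  ", mean(salaries))    # arithmetic average: roughly 1.47 million
```

Here the mean is far above what almost everyone in the list earns, while the median and mode stay close to the typical salaries.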

Importance of Measures of Central Tendency

Measures of central tendency condense a large amount of information into a single, digestible value that represents the center of the data. This makes them important for several reasons:

Summarizing data: Instead of listing every data point, one can use a central tendency measure to get a quick idea of what is typical in the data set.
Comparisons: By computing central tendency measures for different groups or datasets, one can easily compare them to see if there are any differences.
Decision making: Central tendency measures can support informed decisions. For instance, knowing the average income in an area can help set prices. Similarly, an organization analyzing customer purchases can use the average amount spent to tailor promotions or target specific customer groups.
Identifying trends: Tracking a measure of central tendency over time, often with the help of a visualization, can reveal trends such as a rise in average house prices.

However, it is important to understand each of these measures of central tendency (mean, median, mode), because each has its strengths and weaknesses. Choosing the right one depends on the kind of data and on what one wants to learn from it.

Real-Life Examples and Uses of Median

Income & Salaries: The median is used to represent the typical income of a population more accurately, because a few ultra-rich individuals can skew the mean income upward. The median gives a more realistic picture of what a typical person earns. Example: If most people earn around $40,000–$60,000, but a few CEOs earn $10 million or more, the median income might be $55,000 while the mean income could be $95,000, which is misleading.
Education (Test/Exam Scores): The median can be used to summarize exam results or performance data, since a few very low or very high scores can distort the mean. For example, if most students score between 70 and 90, but a few score 10 or 100, the median score gives a better sense of central performance.
Real Estate (Home Prices): Reporting the median home price is common in real estate because it avoids distortion from a few very expensive or very cheap homes. For example, a city may have a median home price of $350,000, even if some luxury homes cost $5 million.
Sports (Player Performance): Median statistics such as race times, goals scored, or batting averages avoid the skew caused by one exceptional or terrible performance. For example, a runner’s median race time over 10 races can better reflect their typical performance.
Healthcare (Medical Test Results): Reporting the median wait time in hospitals or the median survival time in clinical trials is useful because medical data often contain outliers or skewed distributions. For example, if most patients wait 30 minutes, but a few wait 5 hours, the median wait time might be 35 minutes, while the mean could be misleadingly high.
Customer Feedback (Review Ratings): The median star rating for a product or service is less sensitive to a few extremely negative or overly positive reviews. For example, if the ratings are 1, 5, 5, 5, and 1, the mean is 3.4 but the median is 5, better reflecting the typical rating.
Transportation (Travel Times): Travel-time estimates, such as those shown in navigation apps like Google Maps or Waze, can be based on median travel times to reflect typical conditions, ignoring rare traffic jams or unusually fast trips. For example, the median commute time may be 25 minutes, even if a few people experience 60-minute delays.
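The arithmetic in two of these examples can be checked directly. The sketch below uses the star ratings quoted above; the list of wait times is hypothetical, made up only to match the "most wait about 30 minutes, a few wait 5 hours" scenario.

```python
from statistics import mean, median

# Star ratings from the customer feedback example.
ratings = [1, 5, 5, 5, 1]
print(mean(ratings))    # 3.4
print(median(ratings))  # 5

# Hypothetical hospital wait times in minutes; two patients wait 5 hours.
wait_minutes = [25, 30, 30, 30, 35, 35, 40, 300, 300]
print(median(wait_minutes))          # 35
print(round(mean(wait_minutes), 1))  # about 91.7, pulled up by the two long waits
```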

Summary

Scenario/Use Case	Variable	Why the Median Should Be Used
Income reports	Salary	Avoids distortion by billionaires
House prices	Real estate values	Neutralizes luxury properties
ER performance	Patient wait times	Filters extreme delays
Test scores	Exam performance	Reduces skew from outliers
Travel times	Commute estimates	Reflects normal travel conditions
Product reviews	User ratings	Balances biased reviews


Constructing Frequency Tables 2012

Jul 6, 2025 | Mar 18, 2012 by Muhammad Imdad Ullah

A frequency table is a way of summarizing a set of data. It records how often each value (or set of values) of the variable occurs in the data. In this post, we will learn how to construct frequency tables for discrete and continuous data.

A grouping of qualitative data into mutually exclusive classes, showing the number of observations in each class, is called a frequency table. The number of values falling in a particular category/class is called the frequency of that category/class, denoted by $f$.

If the data of a continuous variable are arranged into classes with their frequencies, the result is known as a continuous frequency distribution. If the data of a discrete variable are arranged into classes with their frequencies, the result is known as a discrete (or discontinuous) frequency distribution.

Discrete Frequency Distribution Table Example

Car Type	Number of Cars
Local	50
Foreign	30
Total Cars	80

Constructing Frequency Tables

Frequency tables (distributions) may be constructed for both discrete and continuous variables. A discrete frequency distribution can be converted back to the original values, but this is not possible for a continuous frequency distribution, because only the class of each observation is retained.
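The round trip for discrete data can be illustrated with the car-type table above. This is a minimal sketch using `collections.Counter` from Python's standard library; the variable names are illustrative, and note that only the values are recovered, not the order in which the cars were originally recorded.

```python
from collections import Counter

# Frequency table from the car-type example.
frequency_table = {"Local": 50, "Foreign": 30}

# Expand the table back into one observation per car.
observations = list(Counter(frequency_table).elements())

# Tally the observations again to rebuild the same table.
rebuilt = Counter(observations)

print(len(observations))  # 80
print(dict(rebuilt))
```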

Step-by-Step Procedure of Constructing a Frequency Table

The following steps are followed when constructing a frequency table for continuous data.

1. Calculate the range of the data. The range is the difference between the highest and lowest values of the given data.
\[\text{Range} = \text{Highest Value} - \text{Lowest Value}\]
2. Decide the number of classes. The approximate number of classes $K$ may be determined by choosing the smallest $K$ such that $2^K > n$, or by the formula $K = 1+3.3\, \log_{10} (n)$, where $n$ is the number of observations.
Note that too many or too few classes might not reveal the basic shape of the data set.
3. Determine the class interval or width.
The classes, taken together, should cover at least the distance from the lowest value in the data up to the highest value. The width can be obtained from the formula \[I=\frac{\text{Highest Value} - \text{Lowest Value}}{\text{Number of Classes}}=\frac{H-L}{K}\]
where $I$ is the class interval, $H$ is the highest observed value, $L$ is the lowest observed value, and $K$ is the number of classes.
Generally, the class interval or width should be the same for all classes.
In practice, the interval size is usually rounded up to some convenient number, such as a multiple of 10 or 100. Unequal class intervals present problems in graphically portraying the distribution and in doing some of the computations. However, unequal class intervals may be necessary in certain situations, such as to avoid a large number of empty or almost empty classes.
4. Set the individual class limits.
Class limits are the endpoints of the class interval. State clear class limits so that each observation falls into one and only one category, i.e., avoid overlapping or unclear class limits. Because class intervals are usually rounded up to a convenient size, the classes cover a slightly larger range than necessary.
It is convenient to choose the endpoints of the class interval so that no observation falls on them. This can be done by expressing the endpoints to one more decimal place than the observations themselves, i.e., the limits are converted to class boundaries to achieve continuity in the data.
5. Tally the observations into the classes.
6. Count the number of items in each class.
The number of observations in each class is called the class frequency. Note that the class frequencies must add up to the total number of observations. After following these steps, the data are organized into a tabular form called a frequency distribution, which can be used to summarize the pattern in the observations, i.e., the concentration of the data.
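The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_frequency_table` and the data set are hypothetical, the number of classes comes from the $1 + 3.3 \log_{10}(n)$ rule rounded up, the width is rounded up to a whole number, and, instead of converting limits to class boundaries, each class is treated as including its lower limit and excluding its upper limit.

```python
import math


def build_frequency_table(data, num_classes=None):
    """Return (lower, upper, frequency) rows for equal-width classes."""
    n = len(data)
    low, high = min(data), max(data)

    # Range = highest value - lowest value.
    data_range = high - low

    # Number of classes, here from 1 + 3.3 * log10(n), rounded up.
    if num_classes is None:
        num_classes = math.ceil(1 + 3.3 * math.log10(n))

    # Class width (H - L) / K, rounded up to a whole number (assumption).
    width = max(1, math.ceil(data_range / num_classes))

    # Classes are [lower, upper); the last class also includes its upper
    # limit so that the highest value is never left out.
    frequencies = [0] * num_classes
    for x in data:
        index = min(int((x - low) // width), num_classes - 1)
        frequencies[index] += 1

    return [
        (low + i * width, low + (i + 1) * width, f)
        for i, f in enumerate(frequencies)
    ]


# Hypothetical data set of 20 observations.
data = [12, 15, 17, 21, 22, 24, 25, 28, 29, 31,
        33, 34, 36, 38, 41, 42, 45, 47, 52, 58]

table = build_frequency_table(data)
for lower, upper, f in table:
    print(f"{lower:>3} - {upper:<3} {f}")

# The class frequencies must add up to the number of observations.
assert sum(f for _, _, f in table) == len(data)
```

For these 20 values the rule gives 6 classes of width 8, starting at the lowest value 12. Passing `num_classes` explicitly lets one try a different number of classes and see how the shape of the distribution changes.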

Note: Arranging the data into a tabulation or frequency distribution results in a loss of detailed information, as the individuality of the observations vanishes. In a frequency distribution, we cannot pinpoint any exact value, and we cannot tell the actual lowest and highest values of the data. However, the lower limit of the first class and the upper limit of the last class convey essentially the same information. So, in constructing frequency tables, the advantages of condensing the data into a more understandable and organized form more than offset this disadvantage.

The Word Statistics Meaning and Use

Mar 14, 2025 | Feb 26, 2012 by Muhammad Imdad Ullah

The post is about “The Word Statistics Meaning and Use”.

The Word Statistics

The word statistics was first used by the German scholar Gottfried Achenwall in the middle of the 18th century to mean the science of statecraft concerning the collection and use of data by the state.

The word statistics comes from the Latin word “Status”, the Italian word “Statista”, the German word “Statistik”, or the French word “Statistique”, each meaning a political state. It originally meant information useful to the state, such as information about the size of the population (human, animal, products, etc.) and the armed forces.

According to the pioneer statistician Yule, the word statistics first occurred in the book “The Elements of Universal Erudition” by Baron (1770). In 1787, a wider definition was used by E.A.W. Zimmermann in “A Political Survey of the Present State of Europe”. The word appeared in the Encyclopaedia Britannica in 1797 and was used by Sir John Sinclair in Britain in a series of volumes published between 1791 and 1799 giving a statistical account of Scotland. In the 19th century, the word statistics acquired a wider meaning, covering numerical data on almost any subject and also the interpretation of data through appropriate analysis.

The Word Statistics Nowadays

Today, the word statistics is used with several different meanings.

Statistics refers to “numerical facts that are arranged systematically in the form of tables or charts, etc.” In this sense, it is always used as a plural, i.e., a set of numerical information. For instance, statistics on prices, road accidents, crimes, births, educational institutions, etc.
The word statistics is defined as a discipline that includes procedures and techniques used to collect, process, and analyze numerical data to make inferences and to reach an appropriate decision in a situation of uncertainty (uncertainty refers to incompleteness; it does not imply ignorance). In this sense, the word statistics is used in the singular. It denotes the science of basing decisions on numerical data.
The word statistics refers to numerical quantities calculated from sample observations; a single quantity calculated from sample observations, such as the mean, is called a statistic. Here, the word statistics is the plural of statistic.

“We compute statistics from statistics by statistics”

In this sentence, the first “statistics” is the plural of statistic (quantities computed from a sample), the second is the plural sense meaning data, and the third is the singular sense meaning methods.

In another way, the word Statistics has two meanings:

The science of data:
In this sense, statistics deals with collecting, analyzing, interpreting, and presenting numerical data. Statistics therefore helps us understand the world around us by making sense of large amounts of information. Statisticians use a variety of techniques to summarize data, identify patterns, and draw sound conclusions.
Pieces of data:
Statistics also refers to the actual numerical data itself, for example, averages, percentages, or other findings from a study. Real-life examples include (i) unemployment statistics and (ii) crime statistics.

Most Common Uses of Statistics

The following are the most common uses of Statistics in various fields of life.

Business and Economics

Market Research: Understanding consumer behaviour, satisfaction, preferences, and trends.
Operations Management: Optimizing processes, inventory control, and quality control.
Financial Analysis: Evaluating investments, risk management, and financial performance.

Healthcare

Clinical Trials: Comparing and evaluating the effectiveness and safety of new treatments.
Epidemiology: Studying the occurrence and distribution of diseases.
Public Health: Identifying health risks and developing prevention strategies.

Social Sciences

Sociology: Studying social phenomena, such as inequality, crime, and education.
Psychology: Understanding human behaviour, personality, and cognition.
Political Science: Analyzing political behaviour, public opinion, and election outcomes.

Government

Policy Development: Making informed decisions based on data and evidence.
Economic Planning: Forecasting economic growth and trends.
Public Administration: Improving efficiency and effectiveness of government services.

Education

Educational Research: Evaluating teaching methods, curriculum, and student outcomes.
Testing and Assessment: Developing and analyzing standardized tests.
Student Data Analysis: Identifying trends and addressing educational disparities.

Science and Technology

Research: Designing experiments, collecting data, and analyzing results.
Data Analysis: Discovering patterns, relationships, and insights in large datasets.
Machine Learning: Developing algorithms that can learn from data and make predictions.

Sports

Player Performance Analysis: Evaluating athlete performance and identifying areas for improvement.
Team Strategy: Developing game plans and making tactical decisions.
Sports Betting: Analyzing data to predict game outcomes.

To learn about the basics of statistics, follow the link: Basic Statistics
