Median Definition, Formula, and Example: Quick Guide (2014)

Median Definition

The median (a measure of central tendency) is the middle-most value in a data set when all of the values (observations) are arranged in either ascending or descending order of magnitude. The median divides the sorted data set into two halves: 50% of the observations lie below the median value and 50% lie above it. If there is an odd number of observations (data points) in a data set, the median is the single middle value after sorting the data set.

After understanding the median definition, let us consider a few examples to calculate the median for a data set.

Median Example – 1

Question: For the following data set: 5, 9, 8, 4, 3, 1, 0, 8, 5, 3, 5, 6, 3, calculate the median.

Answer: To find the median of the given data set, first sort the data (either in ascending or descending order), that is
0, 1, 3, 3, 3, 4, 5, 5, 5, 6, 8, 8, 9. The middle-most value of the above data after sorting is 5, which is the median of the given data set.

When the number of observations in a data set is even, the median is the average of the two middle-most values in the sorted data.

Median Example – 2

Question: Consider the following data set, 5, 9, 8, 4, 3, 1, 0, 8, 5, 3, 5, 6, 3, 2. Compute the median.

Answer: To find the median, first sort the data and then locate the two middle-most values, that is,
0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 8, 8, 9. The middle-most two values are 4 and 5. So the median will be the average of these two values, i.e. 4.5 in this case.

The median is less affected by extreme values in the data set, so the median is the preferred measure of central tendency when the data set is skewed or not symmetrical.

Median Formula for Odd Number of Observations

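For large data sets, it is difficult to locate the middle value of the sorted data by inspection. It is helpful to find the position of the median using a formula. For an odd number of observations $n$, the median is the value at the following position in the sorted data (here $n = 13$, as in Example 1):
$\begin{aligned}
\text{Median} &=\left(\frac{n+1}{2}\right)\text{th value}\\
&=\left(\frac{13+1}{2}\right)\text{th value}\\
&=\left(\frac{14}{2}\right)\text{th value}=7\text{th value}
\end{aligned}$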

The 7th value in the sorted data of Example 1, which is 5, is the median of the given data.

Median Formula for Even Number of Observations

The median formula for an even number of observations is
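$\begin{aligned}
\text{Median}&=\frac{1}{2}\left[\left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2}+1\right)\text{th value}\right]\\
&=\frac{1}{2}\left[\left(\frac{14}{2}\right)\text{th value} + \left(\frac{14}{2}+1\right)\text{th value}\right]\\
&=\frac{1}{2}\left(7\text{th value} + 8\text{th value}\right)\\
&=\frac{1}{2}(4 + 5)= 4.5
\end{aligned}$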
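
The two positional formulas can be checked with a short Python sketch. The function name `median_by_position` is illustrative and does not come from the original text; positions are counted from 1, as in the formulas, and the data sets are those of Example 1 and Example 2.

```python
def median_by_position(data):
    """Return the median using the positional formulas (1-based positions)."""
    values = sorted(data)  # arrange the observations in ascending order
    n = len(values)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th value
        position = (n + 1) // 2
        return values[position - 1]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th values
    lower = values[n // 2 - 1]
    upper = values[n // 2]
    return (lower + upper) / 2


odd_data = [5, 9, 8, 4, 3, 1, 0, 8, 5, 3, 5, 6, 3]  # n = 13 (Example 1)
even_data = odd_data + [2]                          # n = 14 (Example 2)

print(median_by_position(odd_data))   # 5
print(median_by_position(even_data))  # 4.5
```

Python's standard library offers `statistics.median`, which should give the same results for these two data sets.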


The computation of the median is a crucial step in exploratory data analysis (EDA). It helps identify potential outliers, assess skewness in the data distribution, and choose appropriate statistical methods for further analysis.

Applications of Median in Different Scenarios

1. Resisting Outliers: The median’s primary strength lies in its resistance to outliers. Unlike the mean (which can be swayed by extreme values), the median remains stable in the presence of a few very high or very low data points (extreme observations).

2. Analyzing Skewed Distributions: When dealing with data that is not symmetrical (has skewed distributions), the median provides a more accurate representation of the “center” of the data compared to the mean/average. The median reflects the value that divides the data into halves, whereas the mean gets pulled towards the tail of the skewed distribution.

3. Ease of Interpretation: The median is a simple concept – the middle (centermost) value when the data is arranged in order (either ascending or descending).
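
The resistance to outliers described in the list above can be illustrated with a small sketch. The income figures are hypothetical and chosen only to show the effect of a single extreme value on the mean and on the median.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands); values are for illustration only.
incomes = [30, 32, 35, 36, 40]
incomes_with_outlier = incomes + [500]  # add one extreme observation

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(incomes_with_outlier), median(incomes_with_outlier)

print(f"Without outlier: mean = {mean_before:.1f}, median = {median_before}")
print(f"With outlier:    mean = {mean_after:.1f}, median = {median_after}")
```

In this sketch the mean moves from about 34.6 to about 112.2, while the median only moves from 35 to 35.5.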

Note that the median cannot be found for nominal (unordered categorical) data, because such values cannot be arranged in order of magnitude.

FAQs about Median

  1. What is the median?
  2. What is the advantage of the median over other measures of central tendency?
  3. For what kind/type of data can the median be computed?
  4. What is the benefit of using the median?
  5. Write the formula for the median when the number of observations is even and when the number of observations is odd.
  6. How is the median interpreted?
  7. Into how many groups does the median divide the data/sample/population?
https://itfeature.com


Mode Measure of Central Tendency (2014)

The mode is the most frequent observation in the data set, i.e., the value (number) that appears most often. A data set may have more than one mode, or it may have no mode at all. The mode is usually calculated for categorical data (data measured on a nominal or ordinal scale), but this is not a requirement.

It can also be used for interval and ratio scales, provided that some values in the data set are repeated or the data can be grouped into classes. If no value is repeated (every data point is distinct), the mode of that data set does not exist or is not meaningful. A data set having more than one mode is called multimodal.
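
As a minimal sketch of these cases, Python's `statistics.multimode` (available from Python 3.8) returns every value that shares the highest frequency. Both small data sets below are hypothetical and used only for illustration.

```python
from statistics import multimode

bimodal = [1, 2, 2, 3, 4, 4, 5]  # 2 and 4 each occur twice
no_repeats = [3, 4, 7, 11, 15]   # every value occurs exactly once

print(multimode(bimodal))     # [2, 4]
print(multimode(no_repeats))  # [3, 4, 7, 11, 15]
```

When every value is returned, as in the second call, no value is more frequent than the others, so the mode is not meaningful for that data set.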

Example 1: Consider the following data set showing the weights (in kg) of children at the age of 10 years: 33, 30, 23, 23, 32, 21, 23, 30, 30, 22, 25, 33, 23, 23, 25. We can find the most repeated value by tabulating the given data in the form of a frequency distribution table, whose first column is the weight of the child and whose second column is the number of times that weight appears in the data, i.e., the frequency of each weight in the first column.

| Weight of 10-year-old child (kg) | Frequency |
|---|---|
| 21 | 1 |
| 22 | 1 |
| 23 | 5 |
| 25 | 2 |
| 30 | 3 |
| 32 | 1 |
| 33 | 2 |
| Total | 15 |

From the above frequency distribution table, we can easily find the most frequently occurring observation (data point), which is the mode of the data set. Here it is 23, meaning that 23 kg is the most common weight among these 10-year-old children. Note that it is not necessary to make a frequency distribution table to find the mode, but the table helps in finding the mode quickly and can also be used in further calculations such as the percentage and cumulative percentage of each weight group.
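
The tabulation in Example 1 can be sketched in Python with `collections.Counter`. The column layout and variable names are illustrative choices, not part of the original example.

```python
from collections import Counter

weights = [33, 30, 23, 23, 32, 21, 23, 30, 30, 22, 25, 33, 23, 23, 25]

frequency = Counter(weights)  # weight -> number of occurrences
n = len(weights)

print("Weight  Frequency  Percent  Cumulative %")
cumulative_count = 0
for weight in sorted(frequency):
    count = frequency[weight]
    cumulative_count += count
    percent = 100 * count / n
    cumulative_percent = 100 * cumulative_count / n
    print(f"{weight:>6}  {count:>9}  {percent:>7.1f}  {cumulative_percent:>12.1f}")

# The mode is the weight with the highest frequency.
mode_weight, mode_count = frequency.most_common(1)[0]
print("Mode:", mode_weight, "kg, occurring", mode_count, "times")
```

Sorting the keys keeps the rows in ascending order of weight, which makes the cumulative percentage column meaningful.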

Example 2: Suppose we have recorded the gender of each person in a group, where $M$ stands for male and $F$ stands for female. The sequence of genders noted is as follows: F, F, M, F, F, M, M, M, M, F, M, F, M, F, M, M, M, F, F, M. The frequency distribution table of gender is

| Gender | Frequency |
|---|---|
| Male | 11 |
| Female | 9 |
| Total | 20 |

The most repeated gender is male, showing that the majority of the people in this data set are male.

The mode can be found by simply sorting the data in ascending or descending order and then counting how often each value occurs. When the data contain only a small number of observations, the frequencies can also be counted without sorting, though it may be difficult to keep track of how many times each observation occurs. Note that the mode is not affected by extreme values (outliers or influential observations).

The mode is also a measure of central tendency, but it may not reflect the center of the data very well. For example, the mean of the data set in Example 1 is 26.4 kg, while the mode is 23 kg. Therefore, the mode should be used when data points are expected to repeat or to fall into categories. For such data, one should use it as a measure of central tendency instead of the mean or median. For example,

  • In the production process, a product can be classified as a defective or non-defective product.
  • Student grades can be classified as A, B, C, D, etc.
  • Gender of respondents
  • Blood Group

Example 3: Consider the following data: 3, 4, 7, 11, 15, 20, 23, 22, 26, 33, 25, 13. There is no mode for this data, as each value occurs only once. By grouping the data in some useful and meaningful form, we can find where the values occur most often. For example, the grouped frequency table is

| Group | Values | Frequency |
|---|---|---|
| 0 to 9 | 3, 4, 7 | 3 |
| 10 to 19 | 11, 13, 15 | 3 |
| 20 to 29 | 20, 22, 23, 25, 26 | 5 |
| 30 to 39 | 33 | 1 |
| Total | | 12 |

We cannot find the single most frequent value from this table, but we can say that “20 to 29” is the group in which most of the observations occur. This group (the modal class) contains the mode, which can be estimated by using the formula for the mode of grouped data.
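
The grouping in Example 3 can be reproduced with a short Python sketch. It assumes classes of width 10 starting at 0, as in the table, and only identifies the modal class; it does not apply the grouped-data formula for the mode.

```python
from collections import Counter

data = [3, 4, 7, 11, 15, 20, 23, 22, 26, 33, 25, 13]
width = 10  # class width assumed from the grouped table

# Map each value to the lower limit of its class: 0, 10, 20, 30, ...
class_counts = Counter((value // width) * width for value in data)

for lower in sorted(class_counts):
    print(f"{lower} to {lower + width - 1}: {class_counts[lower]}")

# The modal class is the class with the highest frequency.
modal_lower, modal_frequency = class_counts.most_common(1)[0]
modal_class = f"{modal_lower} to {modal_lower + width - 1}"
print("Modal class:", modal_class)  # 20 to 29
```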

Mode from Bar Graph

Bar Graph: Mode Measure of Central Tendency


Creating Frequency Distribution Table (2014)

Using descriptive statistics, we can organize the data to reveal its general pattern, see where the data values tend to concentrate, and expose extreme or unusual data values. Let us start learning about the Frequency Distribution Table and its construction.

A frequency distribution is a compact form of data in a table that displays the categories of observations according to their magnitudes and frequencies, such that similar or identical numerical values are grouped. The categories are also known as groups, class intervals, or simply classes. The classes must be mutually exclusive, so that each observation falls into exactly one class. The number of values falling in a particular category is called the frequency of that category and is denoted by $f$.

A Frequency Distribution Table shows us a summarized grouping of data divided into mutually exclusive classes and the number of occurrences in a class. Frequency distribution is a way of showing raw (ungrouped or unorganized) data into grouped or organized data to show results of sales, production, income, loan, death rates, height, weight, temperature, etc.

The relative frequency of a category is the proportion of the observed frequency to the total frequency, obtained by dividing the observed frequency by the total frequency and denoted by $r.f.$ The sum of the $r.f.$ column should be one, apart from rounding errors. Multiplying the relative frequency of each class by 100 gives the percentage occurrence of that class. A relative frequency captures the relationship between a class total and the total number of observations.
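
As a small sketch of this calculation, the following Python code computes the frequency, relative frequency, and percentage for each category. The grades are hypothetical and serve only as sample data.

```python
from collections import Counter

# Hypothetical grades, used only for illustration.
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]

frequency = Counter(grades)
total = sum(frequency.values())

# Relative frequency = observed frequency / total frequency
relative = {grade: count / total for grade, count in frequency.items()}

for grade in sorted(relative):
    rf = relative[grade]
    print(f"{grade}: f = {frequency[grade]}, r.f. = {rf:.2f}, percent = {100 * rf:.0f}%")

# The relative frequencies should add up to one, apart from rounding errors.
print("Sum of r.f. =", round(sum(relative.values()), 10))
```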

The Frequency Distribution Table may be made for continuous data, discrete data, and categorical data (for both qualitative and quantitative data). It can also be used to draw some graphs such as histograms, line charts, bar charts, pie charts, frequency polygons, Pareto Charts, Scatter diagrams, stem and leaf displays, etc.

Steps of Creating Frequency Distribution Table

  1. Decide on the number of classes. The number of classes is usually between 5 and 20. Too many or too few classes might not reveal the basic shape of the data set, and such a frequency distribution will also be difficult to interpret. An approximate number of classes may be determined by the formula:
    \[\text{Number of Classes} = C = 1 + 3.3 \log_{10} (n)\]
    \[\text{or} \quad C = \sqrt{n} \quad \text{(approximately)}\]where $n$ is the total number of observations in the data.
  2. Calculate the range of the data ($\text{Range} = \text{Max} - \text{Min}$) by finding the minimum and maximum data values. The range will be used to determine the class interval or class width.
  3. Decide on the width of the class, denoted by $h$ and obtained by
    \[h = \frac{\text{Range}}{\text{Number of Classes}}= \frac{R}{C} \]
    Generally, the class interval or class width is the same for all classes. The classes all taken together must cover at least the distance from the lowest value (minimum) in the data set up to the highest (maximum) value. Also note that equal class intervals are preferred in frequency distribution, while unequal class intervals may be necessary in certain situations to avoid a large number of empty, or almost empty classes.
  4. Decide the individual class limits and select a suitable starting point for the first class. The starting point is arbitrary; it may be less than or equal to the minimum value. Usually, the first class starts slightly below the minimum value in such a way that the midpoint (the average of the lower and upper class limits of the first class) is conveniently placed.
  5. Take each observation in turn and mark a vertical bar (|) against the class to which it belongs. A running tally is kept until the last observation. Tally marks are usually grouped in fives, with every fifth mark drawn across the previous four.
  6. Find the frequencies, relative frequency,  cumulative frequency, etc. as required.
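
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the data are integers, the first class starts at the minimum value, and the class width is taken as the next whole number above $R/C$ so that the classes are guaranteed to reach the maximum. The function name `frequency_table` and the sample weights are chosen only for this example.

```python
import math


def frequency_table(data):
    """Equal-width frequency table for non-empty integer data (sketch)."""
    n = len(data)
    # Step 1: number of classes, C = 1 + 3.3 * log10(n), rounded up
    num_classes = math.ceil(1 + 3.3 * math.log10(n))
    # Step 2: range = max - min
    low, high = min(data), max(data)
    data_range = high - low
    # Step 3: width h from R / C, taken as the next whole number above it
    width = data_range // num_classes + 1
    # Steps 4 and 5: start the first class at the minimum and tally
    counts = [0] * num_classes
    for value in data:
        counts[(value - low) // width] += 1
    # Step 6: frequency, relative frequency, cumulative frequency
    rows, cumulative = [], 0
    for i, f in enumerate(counts):
        lower = low + i * width
        cumulative += f
        rows.append((lower, lower + width - 1, f, f / n, cumulative))
    return rows


# Sample data (weights in kg), used only for illustration.
weights = [33, 30, 23, 23, 32, 21, 23, 30, 30, 22, 25, 33, 23, 23, 25]

rows = frequency_table(weights)
for lower, upper, f, rf, cf in rows:
    print(f"{lower}-{upper}: f = {f}, r.f. = {rf:.2f}, c.f. = {cf}")
```

For these 15 observations the sketch uses 5 classes of width 3. A different starting point or width would be equally valid, since both choices are somewhat arbitrary.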

A frequency distribution is said to be skewed when its mean and median are different. The kurtosis of a frequency distribution is the concentration of scores at the mean, or how peaked the distribution appears if depicted graphically, for example, in a histogram. If the distribution is more peaked than the normal distribution it is said to be leptokurtic; if less peaked it is said to be platykurtic.


Primary and Secondary Data (2014)

Data

Before learning about Primary and Secondary Data, let us first understand the term Data in Statistics.

Statistics studies facts and figures that can be measured numerically. A numerical measurement of a characteristic is known as an observation, and a collection of observations is termed data. Data are collected by individual research workers or by organizations through sample surveys or experiments, keeping in view the objectives of the study. The data collected may be (i) Primary Data or (ii) Secondary Data.

Primary and Secondary Data in Statistics

The difference between primary and secondary data in Statistics is that Primary data is collected firsthand by a researcher (organization, person, authority, agency or party, etc.) through experiments, surveys, questionnaires, focus groups, conducting interviews, and taking (required) measurements, while the secondary data is readily available (collected by someone else) and is available to the public through publications, journals, and newspapers.


Primary Data

Primary data means the raw data (data that has not been processed or tailored) that has just been collected from the source and has not gone through any kind of statistical treatment such as sorting and tabulation. The term primary data may sometimes be used to refer to first-hand information.

Sources of Primary Data

The sources of primary data are primary units such as basic experimental units, individuals, and households. In contrast, published data and data collected in the past are called secondary data. The following methods are commonly used to collect data from primary units; the choice of method depends on the nature of the primary unit.

  • Personal Investigation
    The researcher conducts the experiment or survey personally and collects the data from it. The collected data is generally accurate and reliable. This method of collecting primary data is feasible only in the case of small-scale laboratory or field experiments and pilot surveys; it is not practicable for large-scale experiments and surveys because it takes too much time.
  • Through Investigators
    Trained (experienced) investigators are employed to collect the required data. In the case of surveys, they contact the individuals and fill in the questionnaires after asking for the required information. A questionnaire is an inquiry form containing questions designed to obtain information from the respondents. This method of collecting data is employed by most organizations; it gives reasonably accurate information, but it is very costly and may be time-consuming too.
  • Through Questionnaire
    The required information (data) is obtained by sending a questionnaire (in printed or soft form) by mail to the selected individuals (respondents), who fill in the questionnaire and return it to the investigator. This method is relatively cheap compared to the “through investigators” method, but the non-response rate is very high, as most of the respondents don’t bother to fill in the questionnaire and send it back to the investigator.
  • Through Local Sources
    The local representatives or agents are asked to send requisite information and provide the information based on their own experience. This method is quick but it gives rough estimates only.
  • Through Telephone
    The information may be obtained by contacting the individuals by telephone. This method is quick and provides the required information accurately.
  • Through Internet
    With the introduction of information technology, people may be contacted through the Internet and individuals may be asked to provide pertinent information. Google Survey is widely used as an online method for data collection nowadays. There are many paid online survey services too.

It is important to go through the primary data and locate any inconsistent observations before it is given a statistical treatment.

Secondary Data

Secondary data is data that has already been collected by someone else; it may have been sorted and tabulated and may have undergone statistical treatment. It is processed (tailored) data rather than raw data.

Sources of Secondary Data

The secondary data may be available from the following sources:

  • Government Organizations
    Federal and Provincial Bureau of Statistics, Crop Reporting Service-Agriculture Department, Census and Registration Organization, etc.
  • Semi-Government Organizations
    Municipal committees, District Councils, commercial and financial institutions like banks, etc.
  • Teaching and Research Organizations
  • Research Journals and Newspapers
  • Internet
