The five number summary statistics are a set of descriptive statistics that summarize a data set under study. They consist of five numerical values that divide the sorted data set into four parts, each containing roughly a quarter of the observations. The five number summary statistics are also known as the quartile-based five number summary.
The five number summary statistics include the following values:
- Minimum Value: The smallest value in the data set.
- First Quartile ($Q_1$): The value that separates the lowest 25% of the data from the highest 75% of the data.
- Median ($Q_2$): The value that separates the lowest 50% from the highest 50% of the data.
- Third Quartile ($Q_3$): The value that separates the lowest 75% of the data from the highest 25% of the data.
- Maximum Value: The largest value in the data set.
Visualization of Five Number Summary Statistics
A box plot can visually represent the five number summary statistics. The box plot displays the dataset’s range (minimum and maximum), the median ($Q_2$), and the first and third quartiles ($Q_1$ and $Q_3$).
The five number summary statistics are a useful way to quickly summarize the central tendency, variability, and distribution of a data set.
Interquartile Range
The interquartile range (IQR) is a measure of variability that is based on the five number summary of a dataset. It is the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$) of a data set. In a box plot, the rectangle (the box) spans the interquartile range, that is, the middle 50% of the data between $Q_1$ and $Q_3$, with a line inside the box marking the median ($Q_2$).
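As a small worked illustration (the quartile values here are made up for the example and do not come from any data set in this article), suppose $Q_1 = 3$ and $Q_3 = 11$. Then:

$$\text{IQR} = Q_3 - Q_1 = 11 - 3 = 8$$

so the middle 50% of the observations span 8 units.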
What is a Box Plot
A box plot is a graphical representation of the five number summary statistics. It is also known as a box-and-whisker plot. It is used to see the distribution of the data and to detect outliers visually.
The relative positions of the quartiles and the median can provide clues about the shape of the distribution. For example, if the median is closer to $Q_1$, the distribution might be right-skewed. If the median is closer to $Q_3$, it might be left-skewed. If the median is roughly halfway between $Q_1$ and $Q_3$, the distribution might be roughly symmetric. The whiskers extend from the box toward the minimum and maximum values. In some variants of the plot, the whiskers stop short of the most extreme values, and those outliers are plotted as individual points beyond the whiskers.
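The rule of thumb above can be sketched in Python. This is only an illustrative heuristic: the function name `describe_skew` and the `tolerance` parameter (how far from the midpoint the median may sit and still count as "roughly halfway") are assumptions introduced for this example, not a standard test for skewness.

```python
def describe_skew(q1, median, q3, tolerance=0.1):
    """Give a rough hint about skewness from the median's position in the box.

    `tolerance` is a hypothetical cut-off chosen for illustration only.
    """
    iqr = q3 - q1
    if iqr == 0:
        # The box has no width, so the position of the median says nothing.
        return "undetermined"

    # 0.0 means the median sits on Q1, 1.0 means it sits on Q3.
    position = (median - q1) / iqr

    if position < 0.5 - tolerance:
        return "possibly right-skewed"   # median closer to Q1
    if position > 0.5 + tolerance:
        return "possibly left-skewed"    # median closer to Q3
    return "roughly symmetric"           # median near the middle of the box


print(describe_skew(2, 3, 10))   # possibly right-skewed
print(describe_skew(2, 9, 10))   # possibly left-skewed
print(describe_skew(2, 6, 10))   # roughly symmetric
```

The result is a hint, not a conclusion: the position of the median inside the box describes only the middle 50% of the data, so it is worth checking against the whiskers and a histogram as well.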
The five-number summary is a valuable tool for understanding the distribution of data and making comparisons between different datasets. It is often used in exploratory data analysis, quality control, and other statistical applications.
How to Compute the Five Number Summary Statistics
- First, arrange the data in ascending order.
- Find the minimum and maximum values in the data set.
- Find the median:
  - If the number of data points is odd, the median is the middle value in the sorted data.
  - If the number of data points is even, the median is the average of the two middle values of the sorted data.
- Find $Q_1$ and $Q_3$:
  - $Q_1$ is the median of the lower half of the data (excluding the overall median if the number of data points is odd).
  - $Q_3$ is the median of the upper half of the data (excluding the overall median if the number of data points is odd).
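The steps above can be sketched in plain Python. The function names `median_of_sorted` and `five_number_summary` are hypothetical helpers written for this illustration. The sketch follows the median-of-halves rule described in the steps; statistical packages often use other quartile conventions (for example, interpolation), so their results may differ slightly on the same data.

```python
def median_of_sorted(values):
    """Median of a list that is already sorted in ascending order."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]                       # odd count: the middle value
    return (values[mid - 1] + values[mid]) / 2   # even count: average of the two middle values


def five_number_summary(data):
    """Five number summary using the median-of-halves rule for Q1 and Q3."""
    if len(data) < 2:
        raise ValueError("need at least two data points")

    values = sorted(data)                        # step 1: ascending order
    n = len(values)
    mid = n // 2

    lower = values[:mid]                         # lower half
    # For an odd count the overall median is left out of both halves.
    upper = values[mid + 1:] if n % 2 == 1 else values[mid:]

    return {
        "min": values[0],
        "q1": median_of_sorted(lower),
        "median": median_of_sorted(values),
        "q3": median_of_sorted(upper),
        "max": values[-1],
    }


print(five_number_summary([7, 1, 13, 5, 9, 3, 11]))
# {'min': 1, 'q1': 3, 'median': 7, 'q3': 11, 'max': 13}
```

For the sample data, the sorted values are 1, 3, 5, 7, 9, 11, 13. The median is 7, the lower half (1, 3, 5) gives $Q_1 = 3$, and the upper half (9, 11, 13) gives $Q_3 = 11$.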