Stem and Leaf Plot: Visualize the Features of the Distribution
Before performing any statistical calculation, even the simplest one, the data should be tabulated or plotted to visualize the shape of the distribution. This is especially useful when the data are quantitative and there are only a few observations.
A stem and leaf plot summarizes a set of data measured on an interval scale in condensed form. Stem and leaf plots are often used in exploratory data analysis and help to illustrate the different features of the distribution of the observed data. A basic stem and leaf display contains two columns separated by a vertical line: the stems are on the left side of the line and the leaves are on the right side. It is customary to sort the leaves within each stem from smallest to largest. To present a set of data with this technique, each numerical value is divided into two parts:
- Leading Digit(s)
- Trailing Digit
The stem values are the leading digit(s) and the leaves are the trailing digits. The stems are listed along the vertical axis, and the leaves belonging to each stem are placed side by side along the horizontal axis.
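The split into stem and leaf can be sketched in a few lines of Python. This is only an illustration, assuming non-negative integer data with the units digit taken as the leaf; the function name split_stem_leaf and the value 107 are not from the article and are used only to show a stem with more than one leading digit.

```python
def split_stem_leaf(value):
    """Split a non-negative integer into (stem, leaf).

    Assumption: the leaf is the units digit and the stem is
    everything in front of it (the leading digit or digits).
    """
    stem, leaf = divmod(value, 10)
    return stem, leaf


print(split_stem_leaf(56))   # stem 5, leaf 6
print(split_stem_leaf(107))  # stem 10, leaf 7 (two leading digits)
```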
A stem and leaf display is similar to a frequency distribution but carries more information. It shows the symmetry, concentration, gaps (empty stems), and outliers of the observed data set. Organizing the data into a frequency distribution has two disadvantages:
- The exact identity of each value is lost (the individuality of each observation vanishes).
- We cannot tell how the values within each class are distributed.
The advantage of the stem and leaf plot (display) over a frequency distribution is that we do not lose the identity (individuality) of each observation. A stem and leaf plot is also similar to a histogram, but it usually provides more information for a relatively small data set.
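The difference can be made concrete with a small Python sketch. It uses the data set from the construction example in this article and assumes classes of width 10 (50-59, 60-69, and so on); the variable names are illustrative only. The class counts cannot be turned back into the original values, while the stems and leaves can.

```python
from collections import Counter

data = [56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59,
        69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73]

# Frequency distribution (assumed class width 10): only counts survive
class_counts = Counter(10 * (v // 10) for v in data)
for lower in sorted(class_counts):
    print(f"{lower}-{lower + 9}: {class_counts[lower]}")

# Stem and leaf grouping: each observation can still be read back
leaves = {}
for v in sorted(data):
    leaves.setdefault(v // 10, []).append(v % 10)

recovered = [10 * stem + leaf
             for stem in sorted(leaves)
             for leaf in leaves[stem]]
print(recovered == sorted(data))  # the full data set is recoverable
```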
More than one data set can be compared by using multiple stem and leaf plots. With a back-to-back stem and leaf plot, in which two sets of leaves share one column of stems, we can compare the same characteristic across two groups.
The origin of the stem and leaf plot is associated with Tukey, J. W. (1977).
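A back-to-back display can be sketched as follows. The two groups of scores are hypothetical, made up purely for illustration, and the function names and layout (left leaves reversed so the smallest leaf sits next to the stem) are one possible convention rather than a fixed standard.

```python
from collections import defaultdict

# Hypothetical scores for two groups (not from the article)
group_a = [52, 58, 61, 67, 67, 73, 75, 84]
group_b = [55, 63, 64, 71, 78, 79, 82, 90]


def leaves_by_stem(values):
    """Map each stem (tens) to its sorted list of leaves (units)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    return groups


def back_to_back(left, right):
    """Return the lines of a back-to-back display sharing one stem column."""
    l, r = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(l), min(r)), max(max(l), max(r)) + 1)
    # Width of the widest left-hand row, so that the stems line up
    width = max(2 * len(v) - 1 for v in l.values())
    lines = []
    for s in stems:
        # Reverse the left leaves so the smallest is next to the stem
        left_txt = " ".join(str(d) for d in reversed(l.get(s, [])))
        right_txt = " ".join(str(d) for d in r.get(s, []))
        lines.append(f"{left_txt:>{width}} | {s} | {right_txt}".rstrip())
    return lines


lines = back_to_back(group_a, group_b)
print("\n".join(lines))
```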
Constructing a stem and leaf display
Suppose we have the following data set: 56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59, 69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73, and we want to draw a stem and leaf plot of it.
First, it is better to sort the data. The sorted data are 56, 59, 64, 65, 69, 70, 71, 73, 76, 77, 78, 80, 82, 82, 85, 86, 91, 91, 92, 92, 95, 98, 99.
Since every value has two digits, the first (tens) digit is the stem and the second (units) digit is the leaf, i.e., the stems run from 5 to 9 because the data range from 56 to 99.
Draw a vertical line separating the stems from the leaves. Put the stem values on the left side of the vertical line (bar) and the leaf values on the right side. Each number is assigned to the plot by pairing its units digit, or leaf, with the correct stem. For example, the score 56 is plotted by placing the units digit 6 to the right of stem 5.
The stem and leaf plot of the above data looks like this:
The decimal point is 1 digit(s) to the right of the |
Stem | Leaf
5 | 6 9
6 | 4 5 9
7 | 0 1 3 6 7 8
8 | 0 2 2 5 6
9 | 1 1 2 2 5 8 9
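The construction steps above can be sketched in Python. This is a minimal illustration, assuming non-negative integers with the units digit as the leaf; the function names stem_and_leaf and render are hypothetical, and the printed layout omits the header lines shown in the display above.

```python
from collections import defaultdict

data = [56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59,
        69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73]


def stem_and_leaf(values):
    """Return {stem: sorted leaves} for non-negative integers."""
    groups = defaultdict(list)
    for v in sorted(values):          # sorting first keeps leaves ordered
        stem, leaf = divmod(v, 10)    # tens digit(s) and units digit
        groups[stem].append(leaf)
    return dict(groups)


def render(groups):
    """Format the plot, including stems that have no leaves."""
    lines = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(d) for d in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)


plot = stem_and_leaf(data)
print(render(plot))
```

Iterating over every stem between the smallest and the largest, rather than only the stems that occur, keeps empty stems visible, so that gaps in the data are not hidden.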
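Rotated 90 degrees anti-clockwise, a stem and leaf plot looks like a histogram.
By adding frequency and cumulative frequency columns to the stem and leaf plot, we can find the median of the data: the cumulative frequencies show which stem contains the middle observation, and the sorted leaves on that stem give its value.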
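As a rough sketch of this idea, the Python code below adds frequency (f) and cumulative frequency (cf) columns to the plot of the example data and uses them to locate the median. The helper value_at is a hypothetical name, and the median rule used (middle value for odd n, mean of the two middle values for even n) is the usual textbook definition.

```python
from itertools import accumulate

# Stem -> sorted leaves, copied from the display of the example data
plot = {
    5: [6, 9],
    6: [4, 5, 9],
    7: [0, 1, 3, 6, 7, 8],
    8: [0, 2, 2, 5, 6],
    9: [1, 1, 2, 2, 5, 8, 9],
}

stems = sorted(plot)
freq = [len(plot[s]) for s in stems]   # leaves per stem
cum = list(accumulate(freq))           # running total of leaves
n = cum[-1]                            # number of observations


def value_at(position):
    """Return the observation at a 1-based position in sorted order."""
    previous = 0
    for stem, c in zip(stems, cum):
        if position <= c:
            # Position inside this stem, counted from its first leaf
            return 10 * stem + plot[stem][position - previous - 1]
        previous = c
    raise IndexError(position)


if n % 2 == 1:
    median = value_at((n + 1) // 2)
else:
    median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2

for s, f, c in zip(stems, freq, cum):
    print(f"{s} | {' '.join(map(str, plot[s])):<14} f={f} cf={c}")
print("median =", median)
```

Here n = 23, so the median is the 12th value. The cumulative frequency first reaches 12 on stem 8 (11 values lie on stems 5 to 7), and the first leaf on that stem is 0, which gives a median of 80.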
References
- Tukey, J. W. (1977). Exploratory Data Analysis.
- https://en.wikipedia.org/wiki/Stem-and-leaf_display