Stem and Leaf Plot: Exploratory Data Analysis

Before performing any statistical calculation, even the simplest one, the data should be tabulated or plotted to visualize the shape of the distribution, especially when the data are quantitative and the observations are few.

A stem and leaf plot summarizes a set of data measured on an interval scale in a condensed form. Stem and leaf plots are often used in exploratory data analysis because they illustrate the main features of the distribution of the observed data. A basic stem and leaf display contains two columns separated by a vertical line: the stems are on the left of the line and the leaves are on the right. It is customary to sort the leaves within each stem from smallest to largest. To present a set of data with this technique, each numerical value is divided into two parts:

  1. Leading Digit(s)
  2. Trailing Digit

Stem values are the leading digit(s) and leaf values are the trailing digit. The stems are listed along the vertical axis, and the leaves belonging to each stem are placed side by side along the horizontal axis.
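The split into leading digit(s) and trailing digit can be sketched in a few lines of Python. This is only an illustration, assuming non-negative whole numbers; the function name split_value is hypothetical and not part of any library.

```python
def split_value(value):
    """Split a non-negative whole number into (stem, leaf).

    The stem is the leading digit(s) and the leaf is the trailing (units) digit.
    """
    stem, leaf = divmod(value, 10)
    return stem, leaf


print(split_value(56))   # (5, 6)
print(split_value(123))  # (12, 3)
```

Note that a three-digit value such as 123 gets a two-digit stem (12), which is why the stem is described as the leading digit(s).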

A stem and leaf display is similar to a frequency distribution but carries more information. It shows the symmetry, concentration, gaps, and outliers of the observed data set. Organizing the data into a frequency distribution has two disadvantages:

  1. The exact identity of each value is lost (the individuality of the observations vanishes).
  2. It is not known how the values within each class are distributed.

The advantage of the stem and leaf plot (display) over a frequency distribution is that the identity (individuality) of each observation is not lost. A stem and leaf plot also resembles a histogram, but it usually provides more information for a relatively small data set.
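The difference can be seen in a short Python sketch. The five values below are a hypothetical sample chosen for illustration, and the class width of 10 is an assumption. The frequency distribution keeps only the counts per class, whereas the original values can be rebuilt from the stems and leaves.

```python
from collections import Counter, defaultdict

values = [56, 59, 64, 65, 69]  # hypothetical sample, for illustration only

# Frequency distribution with classes 50-59, 60-69, ...: only counts survive.
freq = Counter(v // 10 * 10 for v in values)
print(dict(freq))  # {50: 2, 60: 3}

# Stem and leaf: every observation is kept as a (stem, leaf) pair.
leaves = defaultdict(list)
for v in sorted(values):
    leaves[v // 10].append(v % 10)
print(dict(leaves))  # {5: [6, 9], 6: [4, 5, 9]}

# The original observations can be recovered exactly from the plot.
recovered = [stem * 10 + leaf for stem, ls in leaves.items() for leaf in ls]
print(recovered == sorted(values))  # True
```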

More than one data set can be compared by using multiple stem and leaf plots. A back-to-back stem and leaf plot, in which two sets of leaves share a common column of stems, compares the same characteristic in two different groups.
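A back-to-back display might be built as in the sketch below. The two groups are hypothetical data, the function name back_to_back is made up for this example, and the layout (left leaves written so that the smallest sits nearest the stem) is one common convention rather than the only one.

```python
from collections import defaultdict


def back_to_back(left, right):
    """Return the lines of a back-to-back stem and leaf plot for two groups
    of two-digit whole numbers that share a common column of stems."""
    left_leaves = defaultdict(list)
    right_leaves = defaultdict(list)
    for v in sorted(left):
        left_leaves[v // 10].append(v % 10)
    for v in sorted(right):
        right_leaves[v // 10].append(v % 10)

    lowest = min(min(left), min(right)) // 10
    highest = max(max(left), max(right)) // 10

    rows = []
    for stem in range(lowest, highest + 1):
        # Reverse the left leaves so the smallest leaf is next to the stem.
        l = " ".join(str(d) for d in reversed(left_leaves[stem]))
        r = " ".join(str(d) for d in right_leaves[stem])
        rows.append((l, stem, r))

    width = max(len(l) for l, _, _ in rows)
    return [f"{l:>{width}} | {stem} | {r}".rstrip() for l, stem, r in rows]


group_a = [56, 59, 64, 71, 73]  # hypothetical scores, group A
group_b = [58, 62, 65, 69, 77]  # hypothetical scores, group B

for line in back_to_back(group_a, group_b):
    print(line)
```

Each row shows one stem in the middle, with the leaves of the first group on the left and those of the second group on the right.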

The origin of the stem and leaf plot is associated with Tukey, J. W. (1977).

Constructing a Stem and Leaf Plot

Consider the following data set: 56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59, 69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73. Suppose we want to draw a stem and leaf plot of these data.

First of all, it’s better to sort the data. The sorted data is 56, 59, 64, 65, 69, 70, 71, 73, 76, 77, 78, 80, 82, 82, 85, 86, 91, 91, 92, 92, 95, 98, 99.

Now the first digit of each value is the stem and the second digit is the leaf. The stems run from 5 to 9 because the data range from 56 to 99.

Draw a vertical line separating the stems from the leaves. Put the stem values on the left side of the vertical line (bar) and the leaf values on the right side. Each number is assigned to the graph (plot) by pairing its units digit, or leaf, with the correct stem. For example, the score 56 is plotted by placing the units digit 6 to the right of stem 5.

The stem and leaf plot of the above data looks like this:

The decimal point is 1 digit(s) to the right of the |
Stem | Leaf
5      | 6 9
6      | 4 5 9
7      | 0 1 3 6 7 8
8      | 0 2 2 5 6
9      | 1 1 2 2 5 8 9
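The construction steps above (sort, split each value into stem and leaf, and list the leaves against their stems) can be sketched in Python as follows. This is a minimal illustration that assumes two-digit whole numbers; the function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

# The data set used in this article.
data = [56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59,
        69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73]


def stem_and_leaf(values):
    """Return the lines of a stem and leaf plot for two-digit whole numbers."""
    leaves = defaultdict(list)
    for v in sorted(values):              # sorting first keeps the leaves ordered
        leaves[v // 10].append(v % 10)    # tens digit is the stem, units digit the leaf
    # Include every stem between the smallest and largest so gaps stay visible.
    stems = range(min(leaves), max(leaves) + 1)
    return [f"{stem} | {' '.join(str(d) for d in leaves[stem])}".rstrip()
            for stem in stems]


for line in stem_and_leaf(data):
    print(line)
# 5 | 6 9
# 6 | 4 5 9
# 7 | 0 1 3 6 7 8
# 8 | 0 2 2 5 6
# 9 | 1 1 2 2 5 8 9
```

Listing every stem in the range, even one with no leaves, is a design choice that keeps empty classes visible in the display.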

When rotated 90 degrees anti-clockwise, the stem and leaf plot looks like a histogram.

By adding columns of frequency and cumulative frequency to the stem and leaf plot, we can locate the median of the data.
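For the data above there are n = 23 observations, so the median is the (23 + 1) / 2 = 12th value. The cumulative frequencies of the stems are 2, 5, 11, 16, and 23, so the 12th value is the first leaf of stem 8, that is, 80. The same lookup can be sketched in Python; the helper names kth_value and median_from_plot are hypothetical.

```python
# Stem and leaf plot from this article: stem -> sorted leaves.
plot = {
    5: [6, 9],
    6: [4, 5, 9],
    7: [0, 1, 3, 6, 7, 8],
    8: [0, 2, 2, 5, 6],
    9: [1, 1, 2, 2, 5, 8, 9],
}


def kth_value(plot, k):
    """Return the k-th smallest value (1-based) using cumulative frequencies."""
    cumulative = 0
    for stem in sorted(plot):
        freq = len(plot[stem])            # frequency of this stem
        if k <= cumulative + freq:        # the k-th value lies in this stem
            return stem * 10 + plot[stem][k - cumulative - 1]
        cumulative += freq
    raise IndexError("k exceeds the number of observations")


def median_from_plot(plot):
    """Median read off the plot: the middle value, or the mean of the two middle values."""
    n = sum(len(leaves) for leaves in plot.values())
    if n % 2 == 1:
        return kth_value(plot, (n + 1) // 2)
    return (kth_value(plot, n // 2) + kth_value(plot, n // 2 + 1)) / 2


print(median_from_plot(plot))  # 80
```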

