Creating Frequency Distribution Table (2014)

Descriptive statistics help us organize data to see its general pattern, check where data values tend to concentrate, and expose extreme or unusual values. Let us start by learning about the frequency distribution table and how to construct it.

A frequency distribution is a compact, tabular summary of data that displays the categories of observations according to their magnitudes and frequencies, so that similar or identical numerical values are grouped together. The categories are also known as groups, class intervals, or simply classes. The classes must be mutually exclusive, and the table shows the number of observations in each class. The number of values falling in a particular category is called the frequency of that category and is denoted by $f$.
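For categorical or discrete data, building the frequency column amounts to counting how often each distinct value occurs. Below is a minimal Python sketch, assuming a small, made-up list of blood-group observations; it uses the standard library's `collections.Counter` to group identical values and count them.

```python
from collections import Counter

# Hypothetical categorical observations (e.g. blood groups of 12 people)
observations = ["A", "B", "O", "O", "A", "AB", "O", "B", "A", "O", "O", "B"]

# Counter groups identical values and counts how often each one occurs
freq = Counter(observations)

print("Class  f")
for category in sorted(freq):
    print(f"{category:<6} {freq[category]}")
print("Total ", sum(freq.values()))
```

The total of the frequency column always equals the number of observations, which is a quick check that no value was missed.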

A frequency distribution table summarizes data by grouping it into mutually exclusive classes and recording the number of occurrences in each class. It is a way of converting raw (ungrouped or unorganized) data into grouped or organized data, for example to present sales, production, income, loans, death rates, height, weight, or temperature.

The relative frequency of a category is the proportion of the total frequency that falls in that category. It is obtained by dividing the observed frequency by the total frequency and is denoted by $r.f.$, so that $r.f. = \frac{f}{\sum f}$. The sum of the r.f. column should be one, apart from rounding errors. Multiplying the relative frequency of each class by 100 gives the percentage of observations in that class. A relative frequency thus captures the relationship between a class total and the total number of observations.
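The relative-frequency and percentage columns can be computed directly from the frequency column. The sketch below assumes a hypothetical table of five classes; the class labels and frequencies are made up for illustration.

```python
# Hypothetical class frequencies f for five classes
frequencies = {"10-19": 4, "20-29": 7, "30-39": 9, "40-49": 6, "50-59": 4}

total = sum(frequencies.values())  # total frequency, i.e. the sum of f

# r.f. = f / total frequency; percentage = r.f. * 100
rel_freq = {c: f / total for c, f in frequencies.items()}
percent = {c: 100 * rf for c, rf in rel_freq.items()}

for c in frequencies:
    print(f"{c}  f={frequencies[c]:2d}  r.f.={rel_freq[c]:.3f}  %={percent[c]:.1f}")

# The r.f. column sums to 1, up to floating-point rounding
print(f"sum of r.f. = {sum(rel_freq.values()):.6f}")
```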

A frequency distribution table can be made for continuous, discrete, and categorical data (that is, for both quantitative and qualitative data). It can also be used to draw graphs such as histograms, line charts, bar charts, pie charts, frequency polygons, Pareto charts, scatter diagrams, and stem-and-leaf displays.

Steps for Creating a Frequency Distribution Table

1. Decide on the number of classes. The number of classes is usually between 5 and 20. Too many or too few classes may fail to reveal the basic shape of the data set, and such a frequency distribution will also be difficult to interpret. A suitable number of classes may be estimated by Sturges' formula:
$\text{Number of Classes} = C = 1 + 3.3 \log_{10}(n)$
or, alternatively, $C \approx \sqrt{n}$, where $n$ is the total number of observations in the data.
2. Calculate the range of the data, $\text{Range} = \text{Max} - \text{Min}$, by finding the minimum and maximum data values. The range will be used to determine the class interval or class width.
3. Decide on the width of the class, denoted by $h$ and obtained by
$h = \frac{\text{Range}}{\text{Number of Classes}}= \frac{R}{C}$
Generally, the class interval or class width is the same for all classes, and in practice $h$ is rounded up to a convenient value. Taken together, the classes must cover at least the distance from the lowest value (minimum) in the data set to the highest value (maximum). Equal class intervals are preferred in a frequency distribution, although unequal class intervals may be necessary in certain situations to avoid a large number of empty or almost empty classes.
4. Decide on the individual class limits and select a suitable starting point for the first class. The starting point is arbitrary; it may be less than or equal to the minimum value. Usually it is placed just below the minimum value so that the midpoint of the first class (the average of its lower and upper class limits) is conveniently placed.
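5. Take each observation in turn and mark a vertical bar (|) against the class to which it belongs. Keep this running tally until the last observation has been recorded. Tally marks are usually grouped in fives, with the fifth mark drawn across the preceding four, so that the counts are easy to read.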
6. Find the frequencies, relative frequencies, cumulative frequencies (running totals of the frequencies), etc., as required.
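Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the 30 observations are hypothetical, and both $C$ and $h$ are rounded up to whole numbers, which is one common convention rather than a fixed rule. With $n = 30$, the Sturges formula gives $C \approx 5.87$, so six classes are used.

```python
import math

# Hypothetical sample of n = 30 integer observations (e.g. test scores)
data = [23, 45, 31, 38, 52, 27, 41, 36, 48, 33,
        29, 55, 40, 35, 44, 30, 26, 50, 39, 42,
        34, 47, 37, 28, 43, 32, 49, 46, 24, 58]
n = len(data)

# Step 1: number of classes
c_sturges = 1 + 3.3 * math.log10(n)  # Sturges' formula, about 5.87 here
c_sqrt = math.sqrt(n)                # square-root rule, about 5.48 here
C = math.ceil(c_sturges)             # assumption: round up to a whole number

# Step 2: range
R = max(data) - min(data)

# Step 3: class width h = R / C, rounded up to a whole number (assumption)
h = math.ceil(R / C)

print(f"n={n}, C={C}, R={R}, h={h}")
# Sanity check: C classes of width h span at least the range
assert C * h >= R
```

Rounding $h$ up rather than using $R / C$ exactly leaves a little slack, so that the classes taken together still reach the maximum value.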
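Steps 4 to 6 can likewise be sketched for integer data. The sketch assumes the same kind of hypothetical 30 observations, a class width of $h = 6$, and a first lower class limit equal to the minimum value (one of the choices allowed in step 4). Each class runs from its lower limit to its lower limit $+ h - 1$. The helper `tally` is a hypothetical name; in plain text it prints five vertical bars per group instead of a crossed group of five.

```python
from itertools import accumulate

# Hypothetical sample of n = 30 integer observations (e.g. test scores)
data = [23, 45, 31, 38, 52, 27, 41, 36, 48, 33,
        29, 55, 40, 35, 44, 30, 26, 50, 39, 42,
        34, 47, 37, 28, 43, 32, 49, 46, 24, 58]


def tally(f):
    """Hypothetical helper: tally marks grouped in fives, e.g. 7 -> '||||| ||'."""
    return " ".join("|" * min(5, f - i) for i in range(0, f, 5))


h = 6              # class width, assumed to come from step 3
start = min(data)  # step 4: assumption, first lower limit = minimum value

# Step 4: enough classes of width h to reach the maximum value.
# For integer data, a class holds the values lower .. lower + h - 1.
k = (max(data) - start) // h + 1
lowers = [start + i * h for i in range(k)]
uppers = [lo + h - 1 for lo in lowers]

# Step 5: tally each observation into the class it belongs to
freq = [0] * k
for x in data:
    freq[(x - start) // h] += 1

# Step 6: relative and cumulative frequencies
n = len(data)
rel = [f / n for f in freq]
cum = list(accumulate(freq))

print(f"{'Class':<8}{'Mid':>6}  {'Tally':<10}{'f':>3}{'r.f.':>7}{'c.f.':>6}")
for lo, up, f, r, c in zip(lowers, uppers, freq, rel, cum):
    mid = (lo + up) / 2  # class midpoint
    print(f"{lo}-{up:<5}{mid:>6}  {tally(f):<10}{f:>3}{r:>7.3f}{c:>6}")
```

Starting at the minimum guarantees that no observation falls below the first class, and `k` is chosen so that the last class reaches the maximum; the last cumulative frequency therefore equals $n$.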

A frequency distribution is said to be skewed when it is not symmetric; in a skewed distribution the mean and median generally differ. The kurtosis of a frequency distribution describes the concentration of scores around the mean, or how peaked the distribution appears when depicted graphically, for example in a histogram. If the distribution is more peaked than the normal distribution, it is said to be leptokurtic; if it is less peaked, it is said to be platykurtic.
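One way to check skewness and kurtosis numerically is with moment-based coefficients. The sketch below is one illustrative choice among several definitions: it uses the population moment coefficient of skewness $m_3 / m_2^{3/2}$ and the excess kurtosis $m_4 / m_2^{2} - 3$, where $m_k$ is the $k$-th central moment. The excess kurtosis is 0 for a normal distribution, so a positive value suggests a leptokurtic shape and a negative value a platykurtic one. The function name `describe_shape` and the sample values are hypothetical.

```python
import statistics


def describe_shape(data):
    """Hypothetical helper: mean, median, moment skewness, excess kurtosis."""
    n = len(data)
    mean = statistics.fmean(data)
    median = statistics.median(data)
    # Central moments m_k = (1/n) * sum((x - mean) ** k)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("all values are equal; shape measures are undefined")
    skewness = m3 / m2 ** 1.5           # 0 for symmetric data
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    if excess_kurtosis > 0:
        shape = "leptokurtic"   # more peaked than normal
    elif excess_kurtosis < 0:
        shape = "platykurtic"   # less peaked than normal
    else:
        shape = "mesokurtic"    # same peakedness as normal
    return mean, median, skewness, excess_kurtosis, shape


# Made-up, right-skewed sample: one large value pulls the mean above the median
sample = [1, 1, 1, 2, 2, 3, 10]
mean, median, skew, kurt, shape = describe_shape(sample)
print(f"mean={mean:.3f} median={median} skewness={skew:.3f} "
      f"excess kurtosis={kurt:.3f} ({shape})")
```

A positive skewness here goes together with the mean lying above the median, which matches the rule of thumb relating skewness to the gap between mean and median.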