Introduction
A cumulative frequency distribution (cumulative frequency curve or ogive) and a cumulative frequency polygon are both built from cumulative frequencies. The cumulative frequency of a class interval, denoted by CF, is obtained by adding the frequencies of all the preceding classes to the frequency of that class. It indicates the total number of values less than or equal to the upper limit of that class. To compare two or more distributions, relative cumulative frequencies or percentage cumulative frequencies are computed.
A cumulative frequency distribution shows the running total of frequencies up to a certain point in a dataset. It tells you how many values lie below (or above) a particular value or class interval. There are two types:
- Less than cumulative frequency: Total frequency up to and including a class.
- Greater than cumulative frequency: Total frequency from a class and above.
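The two types can be illustrated with a short Python sketch. The class limits and frequencies below are hypothetical values chosen only for illustration, and the variable names are assumptions, not part of the original text.

```python
from itertools import accumulate

# Hypothetical grouped data: class upper limits and their frequencies.
upper_limits = [10, 20, 30, 40, 50]
frequencies = [4, 6, 10, 7, 3]

# Less-than CF: running total up to and including each class.
less_than_cf = list(accumulate(frequencies))

# Greater-than CF: running total from each class through the last class.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

print(less_than_cf)     # [4, 10, 20, 27, 30]
print(greater_than_cf)  # [30, 26, 20, 10, 3]
```

The last less-than value and the first greater-than value both equal the total number of observations, which is a quick check that the running totals are right.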
The relative cumulative frequency, denoted by CRF, is the proportion of observations accumulated up to a class. It is obtained by dividing the cumulative frequency by the total frequency (the total number of observations). The CRF of a class can also be obtained by adding the relative frequencies (rf) of the preceding classes to the relative frequency of that class. Multiplying the relative cumulative frequency by 100 gives the corresponding percentage cumulative frequency of a class.
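In symbols, these relationships can be written as follows. The class index $i$ and the symbol $n$ for the total frequency are notation assumed here for illustration.

$$
\text{CRF}_i = \frac{\text{CF}_i}{n} = \sum_{j \le i} rf_j, \qquad \text{Percentage CF}_i = 100 \times \text{CRF}_i
$$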
Method of Construction of Cumulative Frequencies
The method of construction of cumulative frequencies and cumulative relative frequencies is explained in the following table:
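As a rough sketch of the construction, the following Python snippet builds the columns step by step. The class intervals and frequencies are hypothetical and are not taken from the original table.

```python
from itertools import accumulate

# Hypothetical class intervals and frequencies (illustrative only).
classes = ["0-10", "10-20", "20-30", "30-40", "40-50"]
frequencies = [4, 6, 10, 7, 3]

total = sum(frequencies)                # total number of observations
cf = list(accumulate(frequencies))      # cumulative frequency (CF)
rf = [f / total for f in frequencies]   # relative frequency (rf)
crf = [c / total for c in cf]           # cumulative relative frequency (CRF)
pct_cf = [100 * r for r in crf]         # percentage cumulative frequency

# Print one row per class.
print(f"{'Class':<8}{'f':>4}{'CF':>5}{'rf':>8}{'CRF':>8}{'% CF':>8}")
for row in zip(classes, frequencies, cf, rf, crf, pct_cf):
    print("{:<8}{:>4}{:>5}{:>8.3f}{:>8.3f}{:>8.1f}".format(*row))
```

The CRF of the last class should always be 1 (100%), since every observation has been accumulated by then.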
Plot a Cumulative Frequency Distribution
To plot a cumulative frequency distribution, mark the upper class boundaries along the x-axis and the corresponding cumulative frequencies along the y-axis (for a greater-than distribution, the lower class boundaries are used instead). For additional information, you can label the vertical axis on the left in units and the vertical axis on the right in percentages. Joining the plotted points with straight line segments gives the cumulative frequency polygon, which can be used to estimate the median, quartiles, deciles, percentiles, and so on.
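Reading a median or quartile off a cumulative frequency polygon amounts to linear interpolation between the plotted points. The sketch below shows one way to do this in Python. The boundaries and frequencies are hypothetical, and `value_at` is an illustrative helper, not a library function.

```python
from itertools import accumulate

# Hypothetical class boundaries (one more boundary than classes) and frequencies.
boundaries = [0, 10, 20, 30, 40, 50]
frequencies = [4, 6, 10, 7, 3]

# Polygon points: (class boundary, less-than CF), starting from CF = 0
# at the lower boundary of the first class.
cf = [0] + list(accumulate(frequencies))
points = list(zip(boundaries, cf))

def value_at(p, points):
    """Read the value at proportion p (0 < p <= 1) off the polygon
    by linear interpolation between neighbouring plotted points."""
    target = p * points[-1][1]
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 < target <= c1:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("p must lie in (0, 1]")

q1 = value_at(0.25, points)      # lower quartile
median = value_at(0.50, points)  # 50th percentile
q3 = value_at(0.75, points)      # upper quartile
print(q1, median, q3)
```

With these numbers the median falls at 25, halfway through the 20 to 30 class, because the target cumulative frequency of 15 lies midway between the cumulative frequencies 10 and 20 at the two boundaries.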
Uses of Cumulative Frequency Distribution
- Understanding Data Distribution: It helps in visualizing how data accumulates and how it is distributed across intervals. For example, how many students scored less than 70 on a class test?
- Identifying Percentiles, Quartiles, and Medians: Cumulative frequency is essential for calculating:
  - Median (50th percentile)
  - Lower Quartile ($Q_1$: 25th percentile)
  - Upper Quartile ($Q_3$: 75th percentile)

  For example, in a class of 100 students, the value at cumulative frequency 50 gives the median score.
- Comparing Groups: It makes it easy to compare two datasets, such as the scores of two different classes or years. For example, Class A vs. Class B test scores over the same intervals.
- Visualizing with Ogive (Cumulative Frequency Graph): The ogive curve helps identify the median, quartiles, and general data trends visually. For example, plotting a cumulative graph of income groups shows how many people earn less than a certain amount.
- Simplifying Large Data Sets: For large datasets, cumulative frequency helps summarize and compress information into a more understandable form.
- Estimating Probabilities: It provides a rough idea of how likely a value or range of values is to occur. For example, what is the chance that a randomly selected customer is younger than 35?
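The probability estimate in the last point is simply a relative cumulative frequency. A minimal Python sketch, using hypothetical customer-age data and an illustrative helper `prob_less_than`:

```python
# Hypothetical customer ages grouped by class upper boundary.
upper_boundaries = [25, 35, 45, 55, 65]
less_than_cf = [12, 30, 45, 54, 60]  # less-than cumulative frequencies

total = less_than_cf[-1]  # total number of customers

def prob_less_than(boundary):
    """Empirical proportion of observations below a class boundary."""
    idx = upper_boundaries.index(boundary)
    return less_than_cf[idx] / total

p_under_35 = prob_less_than(35)
print(p_under_35)  # 0.5
```

This is only an empirical estimate based on the observed sample, and it works directly only at class boundaries; values inside a class would need interpolation.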
FAQs about Cumulative Frequency Distribution
- What is Cumulative Frequency?
- What is Ogive?
- What is a cumulative frequency polygon?
- What can be measured from a Distribution of Cumulative Frequency?
- What can be visualized from ogive?
- What are the two methods for the construction of cumulative frequency distribution?
- What are relative cumulative frequencies?
- How can one compare two data sets?