Pareto Chart Easy Guide (2012)

A Pareto chart, named after the Italian economist Vilfredo Pareto, is a bar chart in which the bars are ordered from largest to smallest, together with a line showing the cumulative total of the bars. The left vertical axis shows the frequency of occurrence (number of occurrences) or some other important unit of measure, such as cost. The right vertical axis shows the cumulative percentage of the total number of occurrences, or of the total of the chosen unit of measure, such as total cost. Because the bars (representing the reasons) are in decreasing order, the cumulative function is concave. A Pareto chart is also called a Pareto distribution diagram.

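The Pareto chart is also known as the 80/20 rule chart, after the Pareto principle that roughly 80% of the effects often come from about 20% of the causes. The chart makes it easy to see which few categories account for most of the total.

A Pareto chart can be used when the answer to both of the following questions is “yes”: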

  1. Can data be arranged into categories?
  2. Is the rank of each category important?

Pareto charts are often used to analyze defects in a manufacturing process, or the most frequent reasons for customer complaints, to determine which types of defects are most prevalent (important) in a process. A company can then focus its improvement efforts on the few areas where eliminating the causes of defects yields the largest gain or the smallest loss. In this way, Pareto charts make it easy to prioritize problem areas. The categories in the “tail” of the Pareto chart are called the insignificant factors.
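The construction described earlier (bars sorted in decreasing order plus a cumulative line) can be sketched in base R. This is only a minimal sketch: the defect names and counts are hypothetical and serve purely as an illustration.

```r
# Hypothetical defect counts, for illustration only
counts <- c(Scratch = 12, Dent = 45, Crack = 8, Misalignment = 30, Stain = 5)

# Order the categories from largest to smallest
sorted <- sort(counts, decreasing = TRUE)

# Cumulative counts and cumulative percentages
cum_count <- cumsum(sorted)
cum_pct <- cum_count / sum(sorted) * 100

# Bars on the left (frequency) axis; the axis runs up to the total count
op <- par(mar = c(7, 4, 2, 4))
mids <- barplot(sorted, ylim = c(0, sum(sorted)), ylab = "Frequency", las = 2)

# Cumulative line drawn at the bar midpoints
lines(mids, cum_count, type = "b", pch = 19)

# Right axis relabels the same scale as cumulative percentage
axis(side = 4, at = sum(sorted) * seq(0, 1, 0.25),
     labels = paste0(seq(0, 100, 25), "%"), las = 2)
par(op)

print(round(cum_pct, 1))
```

With these made-up counts, the first two categories already account for three quarters of all defects, which is the kind of concentration a Pareto chart is meant to reveal.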

Pareto Chart Example

Pareto Chart

The Pareto chart given above shows the reasons for consumer complaints against airlines in 2004. Each bar represents the number (frequency) of complaints received in one category. The most frequent complaints relate to flight problems (such as cancellations, delays, and other deviations from the schedule). The second most frequent category is customer service (rude or unhelpful employees, inadequate meals or cabin service, treatment of delayed passengers, etc.). Flight problems account for 21% of the complaints, while flight problems and customer service together account for 40%. The top three complaint categories account for 55% of the complaints. So, to reduce the number of complaints, airlines should work on flight delays, customer service, and baggage problems.

By incorporating Pareto charts into data analysis, one can gain valuable insights, prioritize effectively, and make data-driven decisions.




Cumulative Frequency Distribution and Polygon (2012)

Introduction to Cumulative Frequency Distribution

A cumulative frequency distribution (cumulative frequency curve or ogive) and a cumulative frequency polygon both require cumulative frequencies. The cumulative frequency, denoted by CF, of a class interval is obtained by adding the frequencies of all preceding classes to the frequency of that class. It indicates the total number of values less than or equal to the upper limit of that class. To compare two or more distributions, relative cumulative frequencies or percentage cumulative frequencies are computed.

The relative cumulative frequency, denoted by CRF, is the cumulative frequency expressed as a proportion: it is obtained by dividing the cumulative frequency by the total frequency (total number of observations). The CRF of a class can also be obtained by adding the relative frequencies (rf) of all preceding classes to the relative frequency of that class. Multiplying the relative cumulative frequency by 100 gives the corresponding percentage cumulative frequency of the class.

Method of Construction of Cumulative Frequencies

The method of construction of cumulative frequencies and cumulative relative frequencies is explained in the following table:

Cumulative Frequency Distribution
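As a rough illustration of this construction, the following R sketch computes CF, CRF, and percentage cumulative frequencies. The class boundaries and frequencies are hypothetical, and the column names are chosen only for this example.

```r
# Hypothetical grouped data: upper class boundaries and class frequencies
upper <- c(10, 20, 30, 40, 50)
freq  <- c(5, 8, 12, 10, 5)

# Relative frequency of each class
rf <- freq / sum(freq)

# Cumulative frequency: running total of the class frequencies
cf <- cumsum(freq)

# Cumulative relative frequency and percentage cumulative frequency
crf <- cf / sum(freq)
pcf <- crf * 100

cf_table <- data.frame(upper_boundary = upper, f = freq, rf = rf,
                       CF = cf, CRF = crf, PCF = pcf)
print(cf_table)
```

The last CF value equals the total number of observations, and the CRF column can equally be obtained as the running total of the rf column.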

Plot a Cumulative Frequency Distribution

To plot a CF distribution, mark the upper boundary of each class along the x-axis and the corresponding cumulative frequencies along the y-axis. For additional information, you can label the vertical axis on the left in units and the vertical axis on the right in percent. The plotted points are then joined by straight line segments. The cumulative frequency polygon can be used to find the median, quartiles, deciles, percentiles, etc.
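The plotting steps can be sketched in base R as follows. The grouped data are hypothetical, and reading the quartiles off the polygon is done here by linear interpolation with `approx()`, which is one reasonable way to mimic reading values from the graph.

```r
# Hypothetical class boundaries (first value is the lower boundary
# of the first class) and class frequencies
boundaries <- c(0, 10, 20, 30, 40, 50)
freq <- c(5, 8, 12, 10, 5)

# Cumulative frequencies, starting from 0 at the first lower boundary
cf <- c(0, cumsum(freq))

# Cumulative frequency polygon: points joined by straight line segments
plot(boundaries, cf, type = "o", pch = 19,
     xlab = "Upper class boundary", ylab = "Cumulative frequency")

# Right axis relabels the same scale in percent
axis(side = 4, at = max(cf) * seq(0, 1, 0.25),
     labels = paste0(seq(0, 100, 25), "%"))

# Read off the quartiles by interpolating along the polygon
quartiles <- approx(x = cf, y = boundaries,
                    xout = max(cf) * c(0.25, 0.50, 0.75))$y
print(quartiles)
```

The middle value of `quartiles` is the median, i.e. the x-value at which the polygon reaches half of the total frequency.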


Cumulative Frequency Distribution Ogive
Cumulative Frequency Polygon or Ogive
Cumulative Frequency distribution and Frequency polygon

Pie Chart | Visual Display of Categorical Data

A pie chart is a way of summarizing a set of categorical data. It is a circle that is divided into segments/sectors. Each segment represents a particular category. The area of each segment is proportional to the number of cases in that category. It is a useful way of displaying the data where the division of a whole into parts needs to be presented. It can also be used to compare such divisions at different times.

Pie Chart

A pie chart is constructed by dividing the total angle of a circle of 360 degrees into different components. The angle A for each sector is obtained by the relation:

$$A=\frac{\text{Component Part}}{\text{Total}}\times 360^\circ$$

Each sector is shaded with different colors or marks so that they look separate from each other.

Pie Chart Example

Make an appropriate chart for the data on the total production of urea fertilizer and its use on different crops. Suppose the total production of urea is about 200 thousand kg, and its consumption on wheat, sugarcane, maize, and lentils is 75, 80, 30, and 15 thousand kg, respectively.

Solution:

The appropriate diagram is a pie chart because we have to present a whole divided into 4 parts. To construct a pie chart, we calculate the proportionate arc of the circle for each part, i.e.

| Crops | Fertilizer (000 kg) | Proportionate arc of the circle (degrees) |
|---|---|---|
| Wheat | 75 | $\frac{75}{200}\times 360=135$ |
| Sugarcane | 80 | $\frac{80}{200}\times 360=144$ |
| Maize | 30 | $\frac{30}{200}\times 360=54$ |
| Lentils | 15 | $\frac{15}{200}\times 360=27$ |
| Total | 200 | 360 |

Now draw a circle of an appropriate radius, and mark the angles clockwise or anticlockwise with the help of a protractor or any other device. Make an angle of 135 degrees for wheat, 144 degrees for sugarcane, 54 degrees for maize, and 27 degrees for lentils, so that the circular region is divided into 4 sectors. Now shade each sector with a different color or mark so that the sectors look different from each other. The pie chart of the above data is

Pie Chart
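The same calculation and chart can be reproduced in R. The sketch below uses the urea data from the example; the label text and the choice of a clockwise layout are only illustrative.

```r
# Urea consumption by crop (in thousand kg), from the example above
urea <- c(Wheat = 75, Sugarcane = 80, Maize = 30, Lentils = 15)

# Angle of each sector: component part / total * 360
angles <- urea / sum(urea) * 360
print(angles)

# Draw the pie chart, labelling each sector with its crop and angle
pie(urea,
    labels = paste0(names(urea), " (", angles, " deg)"),
    clockwise = TRUE,
    main = "Urea consumption by crop")
```

Note that `pie()` works out the sector sizes from the raw values itself; the angles are computed separately here only to mirror the hand calculation.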


Favourite Subjects Pie Chart Example

Scatter Diagram: Graphical Representation (2012)

A scatterplot (also called a scatter graph or scatter diagram) is used to observe the strength and direction of the relationship between two quantitative variables. In statistics, quantitative variables are those measured on the interval or ratio scale.

Scatter Diagram

Usually, in a scatter diagram, the independent variable (also called the explanatory, regressor, or predictor variable) is placed on the X-axis (the horizontal axis), while the dependent variable (also called the outcome variable) is placed on the Y-axis (the vertical axis), to assess the strength and direction of the relationship between the variables. However, it is not necessary to put the explanatory variable on the X-axis and the outcome variable on the Y-axis, because the scatter diagram and Pearson’s correlation measure the mutual correlation (interdependence) between the variables, not dependence or cause and effect.

The diagram below describes some possible relationships between two quantitative variables ($X$ & $Y$). A short description is also given of each possible relationship.

Scatter diagram

A scatter diagram can be drawn between two quantitative variables. Both variables must have the same length (number of observations). Suppose we have two quantitative variables $X$ and $Y$ and want to observe the strength and direction of the relationship between them. This can be done easily in R.

# Paired observations on two quantitative variables (equal lengths)
x <- c(5, 7, 8, 7, 2, 2, 9, 4, 11, 12, 9, 6)
y <- c(99, 86, 87, 88, 111, 103, 87, 94, 78, 77, 85, 86)

# Scatter diagram with x on the horizontal axis and y on the vertical axis
plot(x, y)
Scatter Diagram

From the above discussion, it is clear that the main objective of a scatter diagram is to visualize the linear or some other type of relationship between two quantitative variables. The visualization may also help to depict the trends, strength, and direction of the relationship between variables.

Limitations of Scatter Diagrams

  • Limited to Two Variables: Scatter plots can only depict the relationship between two variables at a time. If there are more than two variables, one might need to use other visualization techniques.
  • Strength of Correlation: While scatter diagrams show the direction of a relationship, they give only a visual impression of its strength. To quantify the strength, calculate a correlation coefficient.
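To quantify what the scatter diagram only suggests visually, Pearson’s correlation coefficient can be computed with `cor()`. The sketch below reuses the `x` and `y` values from the R example above; the variable names `r` and `r_swapped` are just for this illustration.

```r
# Same paired observations as in the scatter diagram example
x <- c(5, 7, 8, 7, 2, 2, 9, 4, 11, 12, 9, 6)
y <- c(99, 86, 87, 88, 111, 103, 87, 94, 78, 77, 85, 86)

# Pearson's correlation coefficient (the default method of cor())
r <- cor(x, y)
print(r)

# Swapping the variables gives the same value: correlation is mutual,
# so the choice of which variable goes on which axis does not change it
r_swapped <- cor(y, x)
print(r_swapped)
```

A negative value of `r` matches the downward pattern of the points in the plot, and its magnitude indicates how closely the points follow a straight line.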

In conclusion, scatter diagrams are a powerful and versatile tool for exploring relationships between variables. By understanding how to create and interpret them, one can gain valuable insights from the data and inform decision-making processes across various disciplines.
