Constructing Frequency Tables (2012)

A frequency table is a way of summarizing a set of data. It records each value (or group of values) of the variable of interest together with how often it occurs. In this post, we will learn how to construct frequency tables for discrete and continuous data.

A grouping of qualitative data into mutually exclusive classes showing the number of observations in each class is called a frequency table. The number of values falling in a particular category/class is called the frequency of that category/class and is denoted by $f$.

If data on a continuous variable are arranged into classes with their frequencies, the result is called a continuous frequency distribution. If data on a discrete variable are arranged into classes with their frequencies, the result is called a discrete (or discontinuous) frequency distribution.

Discrete Frequency Distribution Table Example

Car Type    Number of Cars
Local       50
Foreign     30
Total       80
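A discrete frequency table like the one above is simply a count of how often each category occurs. The sketch below rebuilds it from raw observations with Python's `collections.Counter`; the list of 80 individual car records is hypothetical, constructed only so that the counts match the table.

```python
from collections import Counter

# Hypothetical raw observations that reproduce the table above:
# 50 local cars and 30 foreign cars, one entry per car.
observations = ["Local"] * 50 + ["Foreign"] * 30

# Counter maps each distinct category (class) to its frequency f.
freq = Counter(observations)

# Print the frequency table; the total must equal the number of observations.
print(f"{'Car Type':<10}{'Number of Cars':>15}")
for car_type, f in freq.items():
    print(f"{car_type:<10}{f:>15}")
print(f"{'Total':<10}{sum(freq.values()):>15}")
```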

Constructing Frequency Tables

Frequency tables (distributions) can be constructed for both discrete and continuous variables. A discrete frequency distribution can be converted back to the original values, but for a continuous variable grouped into classes this is not possible, because only the class into which each observation falls is recorded.
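To illustrate why a discrete frequency distribution can be converted back to the original values, here is a minimal sketch. The table (for example, number of children per household) is hypothetical; each value is simply repeated as many times as its frequency.

```python
from collections import Counter

# Hypothetical discrete frequency distribution: value -> frequency
# (e.g. number of children per household; illustrative numbers only).
table = {0: 3, 1: 5, 2: 4, 3: 2}

# Repeat each value f times to recover the original observations.
# The original order is lost, but every individual value comes back exactly.
values = [x for x, f in table.items() for _ in range(f)]
print(values)

# Round trip: counting the recovered values reproduces the table.
print(Counter(values) == Counter(table))
```

By contrast, once continuous data are grouped into a class such as 20-29, only the class count is stored, so the individual values cannot be recovered this way.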

The following steps are used when constructing frequency tables for continuous data.

  1. Calculate the range of the data. The range is the difference between the highest and lowest values of the given data.
    \[\text{Range} = \text{Highest Value} - \text{Lowest Value}\]
  2. Decide the number of classes. A common guideline is to choose the smallest number of classes $K$ such that $2^K$ is greater than the number of observations $n$, or to use Sturges' rule:
    $2^K > n$ (smallest such $K$)    OR    $K = 1 + 3.3\log_{10}(n)$
    Note that too many or too few classes may fail to reveal the basic shape of the data set.
  3. Determine the Class Interval or Width
    The classes taken together should cover at least the distance from the lowest value in the data up to the highest value, which gives the formula \[I=\frac{\text{Highest Value} - \text{Lowest Value}}{\text{Number of Classes}}=\frac{H-L}{K}\]
    where $I$ is the class interval, $H$ is the highest observed value, $L$ is the lowest observed value, and $K$ is the number of classes.
    Generally, the class interval or width should be the same for all classes.
    In practice, the interval size is usually rounded up to a convenient number, such as a multiple of 10 or 100. Unequal class intervals cause problems in portraying the distribution graphically and in some of the computations. However, unequal class intervals may be necessary in certain situations, for example, to avoid a large number of empty or almost empty classes.
  4. Set the Individual Class Limits
    Class limits are the endpoints of the class interval. State clear class limits so that each observation falls into one and only one class, i.e., avoid overlapping or unclear class limits. Because the class width is usually rounded up to a convenient size, the classes together often cover a slightly larger range than necessary.
    It is convenient to choose the endpoints of the class interval so that no observation falls on them. This can be done by expressing the endpoints to one more decimal place than the observations themselves, i.e., class limits are converted to class boundaries to achieve continuity in the data. For example, for whole-number data, the class limits 23 and 30 become the class boundaries 22.5 and 30.5.
  5. Tally the Observations into the Classes
  6. Count the Number of Items in Each Class
    The number of observations in each class is called the class frequency. Note that the class frequencies must add up to the total number of observations. After following these steps, the data are organized into a table called a frequency distribution, which summarizes the pattern in the observations, i.e., where the data are concentrated.
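The six steps above can be sketched in Python for whole-number data. This is an illustration under stated assumptions, not a standard routine: `frequency_table` is a hypothetical helper, the number of classes comes from the $2^K > n$ guideline, the width is rounded up to a multiple of a hypothetical `width_step`, and the 20 observations are made-up example data.

```python
import math


def frequency_table(data, width_step=1):
    """Group whole-number data into equal-width classes (steps 1-6 above).

    Sketch under assumptions: K is the smallest integer with 2**K > n, and the
    class width is rounded up to a multiple of `width_step`.
    """
    n = len(data)
    low, high = min(data), max(data)

    # Step 1: range = highest value - lowest value.
    data_range = high - low

    # Step 2: number of classes, smallest K with 2**K > n.
    k = 1
    while 2 ** k <= n:
        k += 1

    # Step 3: class interval I = (H - L) / K, rounded up to a convenient size
    # (at least one step wide, so identical values still get a class).
    width = max(width_step, math.ceil(data_range / k / width_step) * width_step)
    # If the classes still stop short of the highest value, add classes.
    while low + k * width <= high:
        k += 1

    # Step 4: non-overlapping integer class limits, starting at the lowest value.
    limits = [(low + j * width, low + (j + 1) * width - 1) for j in range(k)]

    # Steps 5-6: tally each observation into exactly one class and count.
    freqs = [0] * k
    for x in data:
        freqs[(x - low) // width] += 1

    assert sum(freqs) == n  # class frequencies must add up to n
    return limits, freqs


# Hypothetical data: 20 whole-number observations (e.g. ages in years).
data = [23, 37, 45, 29, 51, 62, 38, 44, 57, 33, 48, 41, 26, 55, 39, 47, 60, 35, 42, 50]
limits, freqs = frequency_table(data)

# Class boundaries lie half a unit outside the integer limits.
print(f"{'Limits':<10}{'Boundaries':<14}{'Tally':<8}f")
for (lower, upper), f in zip(limits, freqs):
    print(f"{lower}-{upper:<7}{lower - 0.5}-{upper + 0.5:<9}{'|' * f:<8}{f}")
print(f"{'Total':<32}{sum(freqs)}")
```

Here $n = 20$ gives $K = 5$, and $I = 39/5 = 7.8$ is rounded up to 8, so the classes are 23-30, 31-38, 39-46, 47-54, and 55-62. Passing `width_step=5` instead rounds the width up to 10, a "convenient" size at the cost of a wider overall range.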

Note: Arranging the data into a frequency distribution results in a loss of detailed information, as the individuality of the observations vanishes, i.e., in a frequency distribution we cannot pinpoint the exact values, and we cannot tell the actual lowest and highest values of the data. However, the limits of the lowest and highest classes convey essentially the same information. So in constructing frequency tables, the advantage of condensing the data into a more understandable and organized form more than offsets this disadvantage.


Pie Chart | Visual Display of Categorical Data

A pie chart is a way of summarizing a set of categorical data. It is a circle that is divided into segments/sectors. Each segment represents a particular category. The area of each segment is proportional to the number of cases in that category. It is a useful way of displaying the data where the division of a whole into parts needs to be presented. It can also be used to compare such divisions at different times.


A pie chart is constructed by dividing the total angle of a circle of 360 degrees into different components. The angle A for each sector is obtained by the relation:

$$A=\frac{\text{Component Part}}{\text{Total}}\times 360$$

Each sector is shaded with a different color or pattern so that the sectors look separate from each other.
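The angle formula $A = \frac{\text{Component Part}}{\text{Total}}\times 360$ can be computed for every sector at once. The sketch below uses a hypothetical helper `sector_angles` and made-up survey counts; multiplying by 360 before dividing keeps whole-number angles exact in floating point.

```python
def sector_angles(parts):
    """Angle in degrees of each pie sector: A = component part / total * 360."""
    total = sum(parts.values())
    # Multiply before dividing so whole-number answers stay exact in floating point.
    return {name: value * 360 / total for name, value in parts.items()}


# Hypothetical survey: 45 "Yes", 30 "No" and 25 "Undecided" answers.
angles = sector_angles({"Yes": 45, "No": 30, "Undecided": 25})
print(angles)
print(sum(angles.values()))  # the sectors fill the whole circle
```

The angles sum to 360 degrees, which is a quick check that no component was left out.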

Pie Chart Example

Make an appropriate chart for the following data on the total production of urea fertilizer and its use on different crops. Let the total production of urea be about 200 thousand kg, and let its consumption by wheat, sugarcane, maize, and lentils be 75, 80, 30, and 15 thousand kg, respectively.

Solution:

A pie chart is the appropriate diagram here because we have to present a whole divided into 4 parts. To construct the pie chart, we calculate the proportionate arc (angle) of the circle for each crop, i.e.

Crops       Fertilizer (000 kg)    Proportionate arc of the circle (degrees)
Wheat       75                     $\frac{75}{200}\times 360=135$
Sugarcane   80                     $\frac{80}{200}\times 360=144$
Maize       30                     $\frac{30}{200}\times 360=54$
Lentils     15                     $\frac{15}{200}\times 360=27$
Total       200                    360

Now draw a circle of an appropriate radius and mark the angles clockwise or anticlockwise with a protractor or any other device: 135 degrees for wheat, 144 degrees for sugarcane, 54 degrees for maize, and 27 degrees for lentils. The circular region is thus divided into 4 sectors. Now shade each sector with a different color or pattern so that the sectors look different from each other. The pie chart of the above data is shown below.

[Figure: Pie chart of urea fertilizer consumption by crop (wheat, sugarcane, maize, lentils)]
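When the sectors are marked one after another with a protractor, it helps to know where each sector starts and ends. The following sketch uses the urea figures from the example above and running totals (`itertools.accumulate`) of the sector angles, assuming the first sector starts from a reference radius at 0 degrees.

```python
from itertools import accumulate

# Urea consumption (thousand kg) from the example above.
urea = {"Wheat": 75, "Sugarcane": 80, "Maize": 30, "Lentils": 15}
total = sum(urea.values())

# Sector angle A = component / total * 360 (multiplying first keeps results exact).
angles = [value * 360 / total for value in urea.values()]

# Running totals: where each sector's edge falls when the angles are marked
# one after another from a starting radius at 0 degrees.
ends = list(accumulate(angles))
starts = [0.0] + ends[:-1]

for crop, angle, start, end in zip(urea, angles, starts, ends):
    print(f"{crop:<10}{angle:6.1f} deg  from {start:6.1f} to {end:6.1f}")
```

The last sector ends at exactly 360 degrees, confirming that the four sectors close the circle.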


[Figure: Favourite Subjects pie chart example]

The Word Statistics Meaning and Use

This post is about the meaning and use of the word statistics.

The word statistics was first used by the German scholar Gottfried Achenwall in the middle of the 18th century to mean the science of statecraft, concerned with the collection and use of data by the state.

The word statistics comes from the Latin word “Status”, the Italian word “Statista”, the German word “Statistik”, or the French word “Statistique”, meaning a political state. It originally meant information useful to the state, such as information about the size of the population (human, animal, products, etc.) and the armed forces.


According to the pioneering statistician Yule, the word statistics occurred earliest in the book “The Elements of Universal Erudition” by Baron (1770). In 1787, a wider definition was used by E.A.W. Zimmermann in “A Political Survey of the Present State of Europe”. The word appeared in the Encyclopaedia Britannica in 1797 and was used by Sir John Sinclair in Britain in a series of volumes, published between 1791 and 1799, giving a statistical account of Scotland. In the 19th century, the word statistics acquired a wider meaning, covering numerical data on almost any subject as well as the interpretation of data through appropriate analysis.

The Word Statistics Nowadays

Today, the word statistics is used with several different meanings.

  • Statistics refers to “numerical facts that are arranged systematically in the form of tables, charts, etc.” In this sense, the word is always plural, i.e., a set of numerical information, for instance, statistics on prices, road accidents, crimes, births, educational institutions, etc.
  • Statistics is a discipline that includes the procedures and techniques used to collect, process, and analyze numerical data to make inferences and to reach appropriate decisions in situations of uncertainty (uncertainty refers to incompleteness; it does not imply ignorance). In this sense, the word statistics is singular. It denotes the science of basing decisions on numerical data.
  • Statistics refers to numerical quantities calculated from sample observations; a single quantity calculated from sample observations, such as the mean, is called a statistic. In this sense, statistics is the plural of statistic.

“We compute statistics from statistics by statistics”

Here, the first “statistics” is the plural of statistic (quantities computed from samples), the second is plural in the sense of numerical data, and the third is singular in the sense of statistical methods.

Put another way, the word statistics has two meanings:

  • The science of data:
    In this sense, statistics deals with collecting, analyzing, interpreting, and presenting numerical data. It helps us understand the world around us by making sense of large amounts of information. Statisticians use a variety of techniques to summarize data, identify patterns, and draw sound conclusions.
  • Pieces of data:
    Statistics also refers to the numerical data themselves, for example, averages, percentages, or other findings from a study. Real-life examples include (i) unemployment statistics and (ii) crime statistics.

Most Common Uses of Statistics

The following are some of the most common uses of statistics in various fields of life.

Business and Economics

  • Market Research: Understanding consumer behaviour, satisfaction, preferences, and trends.
  • Operations Management: Optimizing processes, inventory control, and quality control.
  • Financial Analysis: Evaluating investments, risk management, and financial performance.

Healthcare

  • Clinical Trials: Comparing and evaluating the effectiveness and safety of new treatments.
  • Epidemiology: Studying the occurrence and distribution of diseases.
  • Public Health: Identifying health risks and developing prevention strategies.

Social Sciences

  • Sociology: Studying social phenomena, such as inequality, crime, and education.
  • Psychology: Understanding human behaviour, personality, and cognition.
  • Political Science: Analyzing political behaviour, public opinion, and election outcomes.

Government

  • Policy Development: Making informed decisions based on data and evidence.
  • Economic Planning: Forecasting economic growth and trends.
  • Public Administration: Improving efficiency and effectiveness of government services.

Education

  • Educational Research: Evaluating teaching methods, curriculum, and student outcomes.
  • Testing and Assessment: Developing and analyzing standardized tests.
  • Student Data Analysis: Identifying trends and addressing educational disparities.

Science and Technology

  • Research: Designing experiments, collecting data, and analyzing results.
  • Data Analysis: Discovering patterns, relationships, and insights in large datasets.
  • Machine Learning: Developing algorithms that can learn from data and make predictions.

Sports

  • Player Performance Analysis: Evaluating athlete performance and identifying areas for improvement.
  • Team Strategy: Developing game plans and making tactical decisions.
  • Sports Betting: Analyzing data to predict game outcomes.


P value and Significance Level

What Is the Difference Between the P value and the Significance Level?

In hypothesis testing, the goal is to see whether the probability value is less than or equal to the significance level (i.e., whether p ≤ alpha). The significance level is also called the size of the test or the size of the critical region. It is generally specified before any samples are drawn so that the results obtained do not influence its choice.


The differences between the p-value and the significance level are as follows:

  • The probability value (also called the p-value) is the probability of obtaining the result observed in your research study (or an even more extreme result), under the assumption that the null hypothesis is true.
  • In hypothesis testing, the researcher assumes that the null hypothesis is true and then sees how often the observed finding would occur if this assumption were true (i.e., the researcher determines the p-value).
  • The significance level (also called the alpha level) is the cutoff value the researcher selects and then uses to decide when to reject the null hypothesis.
  • Most researchers select the significance or alpha level of 0.05 to use in their research; hence, they reject the null hypothesis when the p-value is less than or equal to 0.05.
  • The key idea of hypothesis testing is that you reject the null hypothesis when the p-value is less than or equal to the chosen significance level (0.05 in most studies).
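The definition of the p-value and the decision rule p ≤ alpha can be made concrete with an exact binomial test. This is a sketch with hypothetical data (15 heads in 20 tosses of a coin, null hypothesis P(head) = 1/2); `binomial_p_value` is an illustrative helper, and "more extreme" is taken, as one common convention, to mean any outcome whose probability under the null is no larger than that of the observed outcome. For a null probability of 1/2, this equals doubling the one-tail probability.

```python
from fractions import Fraction
from math import comb


def binomial_p_value(successes, n, p0=Fraction(1, 2)):
    """Two-sided exact binomial p-value for H0: P(success) = p0.

    "As extreme or more extreme" is taken to mean any outcome whose probability
    under H0 is no larger than that of the observed outcome (one common convention).
    Fractions keep the arithmetic exact, so ties are compared correctly.
    """
    p0 = Fraction(p0)
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    return float(sum(prob for prob in pmf if prob <= observed))


alpha = 0.05  # significance level, chosen before the data are seen
p_value = binomial_p_value(15, 20)  # hypothetical: 15 heads in 20 coin tosses
decision = "reject H0" if p_value <= alpha else "do not reject H0"
print(f"p-value = {p_value:.4f}, alpha = {alpha}: {decision}")
```

Here the p-value is about 0.041, which is at most 0.05, so the null hypothesis of a fair coin would be rejected at the 5% significance level; at alpha = 0.01 it would not be.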
