# Basic Statistics and Data Analysis

## Frequency Distribution Table

Descriptive statistics let us organize data so that we can see its general pattern, check where the data values tend to concentrate, and expose extreme or unusual values.

A frequency distribution is a compact tabular form of data that displays the categories of observations according to their magnitudes and frequencies, so that similar or identical numerical values are grouped together. The categories are also known as groups, class intervals, or simply classes. The classes must be mutually exclusive, and the table shows the number of observations in each class. The number of values falling in a particular category is called the frequency of that category, denoted by f.

A frequency distribution therefore summarizes data by grouping it into mutually exclusive classes and recording the number of occurrences in each class. It turns raw (ungrouped or unorganized) data into grouped or organized data, and is used to present results such as sales, production, income, loans, death rates, height, weight, and temperature.

The relative frequency of a category, denoted by r.f., is the proportion of the total frequency that falls in that category; it is obtained by dividing the observed frequency of the category by the total frequency. The relative frequencies should sum to one, apart from rounding error. Multiplying each relative frequency by 100 gives the percentage of observations in that class. A relative frequency therefore captures the relationship between a class total and the total number of observations.
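
In symbols, writing $f_i$ for the frequency of class $i$ and $\sum f$ for the total frequency, the definitions above read:

$$
\text{r.f.}_i = \frac{f_i}{\sum f}, \qquad \sum_i \text{r.f.}_i = 1, \qquad \text{percentage}_i = 100 \times \text{r.f.}_i
$$

For instance (hypothetical numbers), a class holding 12 of 40 observations has $\text{r.f.} = 12/40 = 0.3$, i.e. 30% of the data.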

A frequency distribution can be made for continuous, discrete, and categorical data (that is, for both quantitative and qualitative data). It can also be used to draw graphs such as histograms, line charts, bar charts, pie charts, and frequency polygons.

## Steps to Make a Frequency Distribution

1. Decide on the number of classes. The number of classes is usually between 5 and 20. Too many or too few classes may fail to reveal the basic shape of the data set and make the frequency distribution difficult to interpret. The number of classes may be determined by the formula
$\text{Number of Classes} = C = 1 + 3.3\log_{10}(n)$
or, approximately, $C = \sqrt{n}$, where $n$ is the total number of observations in the data.
2. Calculate the range of the data (Range = Max – Min) by finding the minimum and maximum data values. The range is used to determine the class interval or class width.
3. Decide on the class width, denoted by h and obtained by
$h = \frac{\text{Range}}{\text{Number of Classes}}= \frac{R}{C}$
Generally the class interval or class width is the same for all classes. The classes taken together must cover at least the distance from the lowest value (minimum) in the data set up to the highest value (maximum). Equal class intervals are preferred in a frequency distribution, although unequal class intervals may be necessary in certain situations to avoid a large number of empty or almost empty classes.
4. Decide on the individual class limits and select a suitable starting point for the first class. The starting point is arbitrary, but it must be less than or equal to the minimum value. Usually it is chosen slightly below the minimum so that the midpoint of the first class (the average of its lower and upper class limits) falls at a convenient value.
5. Take each observation in turn and mark a vertical bar (|) against the class it belongs to, keeping a running tally until the last observation. Tally marks are usually grouped in fives: four vertical bars crossed by a fifth, diagonal bar indicate five.
6. Find the frequencies, relative frequencies, cumulative frequencies, etc., as required.
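
The six steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `frequency_table` and the data are hypothetical, classes are taken as half-open intervals [lower, upper), $C$ is rounded to the nearest integer, and $h$ is rounded up to a whole number. Because of that rounding, the code uses however many classes of width $h$ are needed to reach the maximum, which may differ slightly from $C$.

```python
import math
from collections import Counter


def frequency_table(data, start=None):
    """Group numeric data into equal-width classes following steps 1-6."""
    n = len(data)
    lo, hi = min(data), max(data)
    # Step 1: number of classes, C = 1 + 3.3 log10(n), rounded to an integer
    c = max(1, round(1 + 3.3 * math.log10(n)))
    # Step 2: range R = Max - Min
    r = hi - lo
    # Step 3: class width h = R / C, rounded up to a whole number (at least 1)
    h = max(1, math.ceil(r / c))
    # Step 4: starting point of the first class (defaults to the minimum)
    if start is None:
        start = lo
    if start > lo:
        raise ValueError("the first class must start at or below the minimum")
    # Use as many classes of width h as needed to reach the maximum
    k = int((hi - start) // h) + 1
    # Step 5: count each value into the class [start + i*h, start + (i+1)*h)
    tally = Counter(int((x - start) // h) for x in data)
    # Step 6: frequency, relative frequency and cumulative frequency per class
    rows, cum = [], 0
    for i in range(k):
        f = tally.get(i, 0)
        cum += f
        rows.append((start + i * h, start + (i + 1) * h, f, f / n, cum))
    return rows


# Hypothetical data, for illustration only
data = [12, 15, 21, 22, 25, 28, 30, 31, 33, 35,
        38, 40, 42, 45, 47, 50, 52, 55, 58, 61]
for lower, upper, f, rf, cf in frequency_table(data):
    print(f"{lower} to under {upper}: f={f}, r.f.={rf:.2f}, c.f.={cf}")
```

Passing a `start` value below the minimum (for example `start=10`) shifts every class so that the class midpoints fall on rounder numbers, as suggested in step 4.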

Frequency Distribution Table

A frequency distribution is said to be skewed when it is not symmetric about its center; in a skewed distribution the mean and median generally differ. The kurtosis of a frequency distribution describes the concentration of scores around the mean, that is, how peaked the distribution appears when drawn, for example, as a histogram. If the distribution is more peaked than the normal distribution it is said to be leptokurtic; if it is less peaked it is said to be platykurtic.
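
Skewness and kurtosis can be computed directly from a frequency distribution. The sketch below uses the moment coefficients $m_3/m_2^{3/2}$ (zero for a symmetric distribution) and $b_2 = m_4/m_2^2$ (equal to 3 for the normal distribution), treating every observation as sitting at its class midpoint. The function names, the tolerance `tol`, and the label "mesokurtic" for values close to 3 are illustrative assumptions, not part of the text above.

```python
def grouped_shape(midpoints, freqs):
    """Mean, moment coefficient of skewness and kurtosis (b2) of a frequency
    distribution, treating every observation as sitting at its class midpoint."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n

    def central_moment(r):
        # m_r = sum of f * (x - mean)^r, divided by n
        return sum(f * (x - mean) ** r for x, f in zip(midpoints, freqs)) / n

    m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)
    skewness = m3 / m2 ** 1.5  # 0 for a symmetric distribution
    kurtosis = m4 / m2 ** 2    # 3 for the normal distribution
    return mean, skewness, kurtosis


def kurtosis_type(b2, tol=0.1):
    """Label b2 relative to the normal value of 3 (tol is an arbitrary choice)."""
    if b2 > 3 + tol:
        return "leptokurtic"
    if b2 < 3 - tol:
        return "platykurtic"
    return "mesokurtic"


# Hypothetical frequency distribution with a long right tail
mean, skew, b2 = grouped_shape([1, 2, 3, 4], [5, 3, 1, 1])
print(f"mean={mean:.2f} skewness={skew:.3f} b2={b2:.3f} ({kurtosis_type(b2)})")
```

A positive skewness value indicates a longer right tail, which is the situation in which the mean is pulled above the median.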

## Constructing Frequency Tables

A frequency table is a way of summarizing a set of data. It records each value (or set of values) of the variable in the data together with the number of times it occurs.

A grouping of qualitative data into mutually exclusive classes, showing the number of observations in each class, is called a frequency table. The number of values falling in a particular category or class is called the frequency of that category or class, denoted by f.

If the data of a continuous variable are arranged into classes with their frequencies, the result is known as a continuous frequency distribution. If the data of a discrete variable are arranged into classes with their frequencies, the result is known as a discrete (or discontinuous) frequency distribution.

Example

| Car type | Number of cars |
|---|---|
| Local | 50 |
| Foreign | 30 |
| Total cars | 80 |
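
As a sketch of the relative-frequency definition applied to the car example, the snippet below divides each frequency by the total. The function name `relative_frequencies` is hypothetical; the frequencies are those of the car table.

```python
def relative_frequencies(freqs):
    """Relative frequency r.f. = f / total for each category."""
    total = sum(freqs.values())
    return {category: f / total for category, f in freqs.items()}


cars = {"Local": 50, "Foreign": 30}  # frequencies from the car table
for category, rf in relative_frequencies(cars).items():
    print(f"{category}: r.f. = {rf:.3f} ({rf * 100:.1f}%)")
```

For the car data this gives 50/80 = 0.625 (62.5%) for local cars and 30/80 = 0.375 (37.5%) for foreign cars, and the relative frequencies sum to one.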

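A frequency distribution may be constructed for both discrete and continuous variables. A discrete frequency distribution that lists each distinct value with its frequency can be converted back into the original values (though not their original order), but for a grouped continuous variable this is not possible.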
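
A minimal sketch of that conversion, assuming the discrete distribution is stored as a `{value: frequency}` mapping (the function name `expand` and the data are hypothetical). A grouped class such as "10 to under 20: 4" offers no comparable way back, since the four values could lie anywhere in the class.

```python
from collections import Counter


def expand(distribution):
    """Rebuild the observations from an ungrouped discrete frequency table
    {value: frequency}; values come back sorted, not in their original order."""
    values = []
    for value, f in sorted(distribution.items()):
        values.extend([value] * f)
    return values


# Hypothetical table: number of children per family -> number of families
children = {0: 2, 1: 3, 2: 1}
observations = expand(children)
print(observations)  # [0, 0, 1, 1, 1, 2]
# Counting the expanded list recovers the same frequency table
print(Counter(observations) == Counter(children))  # True
```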

The following steps are taken when constructing frequency tables for continuous data.

1. Calculate the range of the data. The range is the difference between the highest and lowest values of the given data.
Range = Highest Value – Lowest Value
2. Decide the Number of Classes. The number of classes may be determined by either of the following rules, where $n$ is the number of observations:
choose the smallest $C$ such that $2^C \ge n$, OR use $C = 1 + 3.3\log_{10}(n)$.
Note that too many or too few classes might not reveal the basic shape of the data set.
3. Determine the Class Interval or Width
The classes taken together should cover at least the distance from the lowest value in the data up to the highest value, which can be achieved with the formula $I = \frac{H - L}{C}$,
where $I$ is the class interval, $H$ is the highest observed value, $L$ is the lowest observed value, and $C$ is the number of classes.
Generally the class interval or width should be the same for all classes.
In practice, the interval size is usually rounded up to some convenient number, such as a multiple of 10 or 100. Unequal class intervals make it harder to portray the distribution graphically and complicate some of the computations, but they may be necessary in certain situations, for example to avoid a large number of empty or almost empty classes.
4. Set the Individual Class Limits
Class limits are the end points of a class interval. State clear class limits so that each observation can be placed in one and only one class; that is, avoid overlapping or unclear class limits. Because the class interval is usually rounded up to a convenient size, the classes together typically cover a larger range than necessary.
It is convenient to choose the end points of the class intervals so that no observation falls on them. This can be done by expressing the end points to one more decimal place than the observations themselves; that is, the class limits are converted to class boundaries to achieve continuity in the data. For example, for whole-number data the classes 10–19 and 20–29 have class boundaries 9.5–19.5 and 19.5–29.5.
5. Tally the Observations into the Classes
6. Count the Number of Items in each Class
The number of observations in each class is called the class frequency. Note that the class frequencies must sum to the total number of observations. After following these steps, the data are organized into tabular form, called a frequency distribution, which summarizes the pattern in the observations, i.e. where the data are concentrated.
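
Steps 2 to 4 for continuous data can be sketched as follows. This is a minimal sketch under assumptions: all function names are hypothetical, the summary figures ($n = 50$, $L = 12$, $H = 97$) are invented for illustration, the width is rounded up to a multiple of 5, the first class starts at 10, and `precision` is the measurement unit of the data (1 for whole numbers).

```python
import math


def two_to_the_k(n):
    """Smallest number of classes C with 2**C >= n (the 2^k rule)."""
    c = 1
    while 2 ** c < n:
        c += 1
    return c


def round_up_to(width, unit):
    """Round a raw class width up to a convenient multiple of `unit`."""
    return math.ceil(width / unit) * unit


def class_limits(start, width, count):
    """Stated limits for whole-number data, e.g. (10, 24), (25, 39), ..."""
    return [(start + i * width, start + (i + 1) * width - 1) for i in range(count)]


def class_boundaries(limits, precision=1):
    """Convert stated limits to boundaries by moving each end point half a
    measurement unit outward, so no observation can fall on a boundary."""
    half = precision / 2
    return [(lower - half, upper + half) for lower, upper in limits]


# Hypothetical summary: n = 50 observations ranging from L = 12 to H = 97
n, low, high = 50, 12, 97
c = two_to_the_k(n)                        # 2**6 = 64 >= 50, so C = 6
width = round_up_to((high - low) / c, 5)   # 85 / 6 = 14.17..., rounded up to 15
limits = class_limits(10, width, c)        # start just below the lowest value
print(limits)
print(class_boundaries(limits))
```

Rounding the width up from 14.17 to 15 makes the six classes span 10 to 99, a larger range than the data strictly need, which is the effect described in step 4.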

Frequency Distribution Table

Note: Arranging the data into a tabulation or frequency distribution loses detailed information, because the individuality of the observations vanishes: in a frequency distribution we cannot pinpoint exact values, and we cannot tell the actual lowest and highest values of the data. However, the lower limit of the first class and the upper limit of the last class convey essentially the same information. The advantages of condensing the data into a more understandable and organized form more than offset this disadvantage.