Range Measure of Dispersion (2013)

A measure of central tendency provides a typical value for a data set, but it does not tell the whole story. The mean, median, and mode summarize where the center of the data lies, yet they say nothing about how spread out the data are. For more information about the data we therefore need another kind of measure, such as the range, which is a measure of dispersion (spread).

Range Measure of Dispersion

The spread of data can be measured by calculating the range, which tells us over how many units the data extend. The range is an absolute measure of dispersion, found by subtracting the smallest value (called the lower bound) from the largest value (called the upper bound), i.e.

Range = Upper Bound – Lower Bound
OR
Range = Largest Value – Smallest Value

This absolute measure of dispersion has disadvantages: the range describes only the width of the data set, measured in the same units as the data, and does not show how the data are distributed within that width. If the data contain outliers, using the range to describe the spread can be very misleading, because the range is sensitive to outliers.
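The sensitivity to outliers is easy to see numerically. The following is a minimal sketch; the function name `data_range` and the temperature readings are made up for illustration and do not come from any real data set.

```python
def data_range(values):
    """Return the range: largest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical daily temperatures in degrees Celsius (illustrative only).
temperatures = [21, 23, 22, 24, 22, 23]

# The same readings with a single extreme value added.
with_outlier = temperatures + [45]

print(data_range(temperatures))  # 24 - 21 = 3
print(data_range(with_outlier))  # 45 - 21 = 24
```

One extreme reading increases the range from 3 to 24, although the other six values are unchanged.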

We need to be careful when using the range as a measure of dispersion, because it does not give the full picture of what is going on between the highest and lowest values. It may give a misleading picture of the spread because it is based only on the two extreme values. For this reason the range is often an unsatisfactory measure of dispersion.

However, the range is widely used in applications such as statistical process control (for example, control charts for manufactured products), daily temperatures, and stock prices, because it is very easy to calculate. The range is an absolute measure of dispersion; its relative measure, known as the coefficient of dispersion, is defined by the relation

\[Coefficient\,\, of\,\, Dispersion = \frac{x_m-x_0}{x_m+x_0}\]

where $x_m$ is the largest value and $x_0$ is the smallest value in the data.

The coefficient of dispersion is a pure (dimensionless) number and is used for comparison purposes.
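As a rough illustration of why a relative measure helps with comparison, the sketch below computes the coefficient as (largest − smallest) / (largest + smallest) for two made-up data sets with the same absolute range. The function name `coefficient_of_range` and both data sets are hypothetical, and the sketch assumes positive data so that the denominator is not zero.

```python
def coefficient_of_range(values):
    """Relative range: (largest - smallest) / (largest + smallest).

    Assumes the data are positive, so the denominator is non-zero.
    """
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

# Two hypothetical data sets in different units (illustrative only).
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

# Both have the same absolute range (40), but different relative spread.
print(round(coefficient_of_range(heights_cm), 4))  # 40 / 340
print(round(coefficient_of_range(weights_kg), 4))  # 40 / 140
```

Because the units cancel, the two results can be compared directly even though one data set is in centimetres and the other in kilograms.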

Introduction to Mathematica (2013)

Mathematica, created by Stephen Wolfram, is a product of Wolfram Research, Inc. It is available for different operating systems, such as SGI, Sun, NeXT, Mac, DOS, and Windows. This introduction to Mathematica will help you understand its use as a mathematical and programming language for numerical, symbolic, and graphical calculations.

Introduction to Mathematica

Mathematica can be used as:

  1. A calculator for arithmetic, symbolic, and algebraic calculations
  2. A language for developing transformation rules, so that general mathematical relationships can be expressed
  3. An interactive environment for the exploration of numerical, symbolic, and graphical calculations
  4. A tool for preparing input to other programs, or to process output from other programs

Getting Started with Mathematica

Starting Mathematica opens a fresh window, called a notebook, where we do all mathematical calculations and graphics. Initially the window’s title is “untitled-1”; it changes once the notebook is saved under a name of your choice. A Mathematica notebook can contain text, graphics, and Mathematica input and output.

Entering Expressions

Type 1+1 in the notebook and press SHIFT+ENTER (or the ENTER key on the numeric keypad); the ordinary ENTER key only starts a new line. You will get the answer on the next line of the work area. This is called evaluating or entering the expression. Note that Mathematica attaches the labels “In[1]:=” and “Out[1]=” (without quotation marks) to 1+1 and 2 respectively. You will also see a set of brackets on the right side of the input and output. The inner brackets enclose the input and the output, while the outer (larger) bracket groups the input and output together. Each bracket marks a cell. Each time you enter or change the input, the “In” and “Out” labels change as well.

Basic Arithmetic

Mathematica can perform the basic operations of addition (+), subtraction (-), multiplication (*), division (/), exponentiation (^), etc. For example, enter the following lines for basic arithmetic in Mathematica

2*3+4^2
5*6
2(3+4)
(2-3+1)(1+2/3)-5^(-1)
6!

Using Previous Results in Mathematica

Often we need the output of a previous calculation in the next computation. For this purpose, the % symbol refers to the output of the previous cell. For example,

2^5
% + 100

Here the result of 2^5 (that is, 32) is added to 100, giving 132.

%% refers to the second-to-last result.

Exact vs Approximation

Mathematica gives exact results by default and approximate results only when we ask for them. For example,

3^20/2^21 produces $\frac{3486784401}{2097152}$

We can force Mathematica to return an approximate (decimal) result by writing any number in the expression with a decimal point, such as

3.0^20/ 2^21

When a number in an expression contains a decimal point, Mathematica treats it as an approximation rather than an exact number.
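The exact fraction quoted above can be cross-checked outside Mathematica. The sketch below uses Python's standard `fractions` module purely as an analogy for the exact versus approximate distinction; it is not Mathematica code and says nothing about how Mathematica works internally.

```python
from fractions import Fraction

# Exact rational arithmetic, analogous to entering 3^20/2^21.
exact = Fraction(3**20, 2**21)

# Floating-point arithmetic, analogous to entering 3.0^20/2^21.
approx = 3.0**20 / 2**21

print(exact)   # 3486784401/2097152
print(approx)  # roughly 1662.63
```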

Sampling Frame and Sampling Unit: A Quick Reference

The post is about the concept of Sampling Frame and Sampling Unit.

Sampling Unit

The distinct and identifiable units into which a population is divided (finite in number) are called sampling units. OR

The individuals whose characteristics are to be measured in the analysis are called elementary or sampling units. OR

Before selecting the sample, the population must be divided into parts called sampling units or simply sample units.

Sampling Frame

The list of all the sampling units with proper identification (which represents the population to be covered) is called the sampling frame. The frame may consist of either a list of units or a map of the area (in case a sample of the area is being taken), such that every element in the population belongs to one and only one unit.

The frame should be accurate, free from omission and duplication (overlapping), adequate, and up to date. Its units must cover the whole of the population and should be well identified.

In improving the sampling design, supplementary information for the field covered by the sampling frame may also be valuable.

Sampling Frame and Sampling Unit: Examples

  1. List of households (and persons) enumerated in the population census.
  2. A map of areas of a country showing the boundaries of area units.
  3. In sampling an agricultural crop, the unit might be a field, a farm, or an area of land whose shape and dimensions are at our disposal.

An ideal sampling frame will have the following qualities/characteristics:

  • all sampling units have a logical and numerical identifier
  • all sampling units can be found i.e. contact information, map location, or other relevant information about sampling units is present
  • the frame is organized in a logical and systematic manner
  • the sampling frame has some additional information about the units that allows the use of more advanced sampling designs
  • every element of the population of interest is present in the frame
  • every element of the population is present only once in the frame
  • no elements from outside the population of interest are present in the frame
  • the data is up-to-date

Classification of Sampling Frame

Sampling frames can be classified according to the type of defect they suffer from:

A frame may be inaccurate: where some of the sampling units of the population are listed inaccurately or some units that do not exist are included in the list.

A frame may be inadequate: when it does not include all classes of the population that are to be taken in the survey.

A frame may be incomplete: when some of the sampling units of the population are either completely omitted or included more than once.

A frame may be out of date: when it has not been updated according to the demand of the occasion, although it was accurate, complete, and adequate at the time of construction.
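Some of these defects can be checked mechanically when a reference list of the population is available. The sketch below is illustrative only: the function `audit_frame`, the household identifiers, and the assumption that the true population list is known are all hypothetical. It detects omitted, duplicated, and non-existent units; whether a frame is inadequate or out of date cannot be judged from the lists alone.

```python
from collections import Counter

def audit_frame(frame, population):
    """Compare a sampling frame with a (hypothetically known) population list."""
    counts = Counter(frame)
    listed = set(counts)
    population = set(population)
    return {
        # Units of the population missing from the frame.
        "omitted": sorted(population - listed),
        # Units listed more than once.
        "duplicated": sorted(u for u, c in counts.items() if c > 1),
        # Listed units that do not exist in the population.
        "foreign": sorted(listed - population),
    }

# Hypothetical household identifiers (illustrative only).
population = ["H01", "H02", "H03", "H04", "H05"]
frame = ["H01", "H02", "H02", "H04", "H99"]

report = audit_frame(frame, population)
print(report)
```

For this made-up frame the report lists H03 and H05 as omitted, H02 as duplicated, and H99 as a unit that does not exist in the population.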

Imagine you are interested in studying the eating habits of people in your city. The entire population of the city would be too big to survey, so you decide to take a sample. The sampling-frame would be like a phone book of everyone in the city. The sampling unit would be each person listed in the phone book.
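Once a frame exists, drawing a sample amounts to selecting units from the list. The following is a minimal sketch of a simple random sample without replacement, assuming a hypothetical frame of 200 listed persons; the names, the frame size, and the sample size are made up for illustration.

```python
import random

# Hypothetical sampling frame: one entry per person (the sampling unit).
frame = [f"Person {i:03d}" for i in range(1, 201)]

# A seeded generator keeps the illustration reproducible.
rng = random.Random(42)

# Simple random sample of 20 units, drawn without replacement.
sample = rng.sample(frame, k=20)

print(len(sample))
print(sample[:5])
```

Because the selection is made from the frame, any person missing from the list has no chance of being selected, which is why the quality of the frame matters.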

Summary

Remember that the quality of the sampling-frame directly affects the representativeness of the sample. If the frame does not accurately reflect the population, the results may be biased.

In short, the quality of the sampling-frame directly affects the validity of the study. Ideally, the frame should be complete (including everyone in the target population) and accurate (with no duplicates or errors). In reality, perfect frames can be difficult to achieve, but researchers strive to get as close as possible.

FAQs about Sampling Frames and Sampling Units

  1. Define Sampling frame.
  2. Define Sampling unit.
  3. What qualities should a sampling frame have?
  4. What is the classification of the sampling frame?
  5. Give some examples of sampling frames and sampling units.

Point Estimation of Parameters

Introduction to Point Estimation of Parameters

The objective of point estimation of parameters is to obtain a single number from the sample which will represent the unknown value of the parameter.

In practice, we do not know the population parameters, such as the population mean and standard deviation. However, our goal is to estimate the mean and standard deviation of the population of interest from sample information, which saves time and cost. This can be done by using the sample mean and sample standard deviation as the best guess for the true population mean and standard deviation. Such a “best guess” is termed a “point estimate” because it summarizes the sample in a single number.

Point Estimate

A Point Estimate is a statistic (a statistical measure from the sample) that gives a plausible estimate (or possibly a best guess) for the value in question.

$\overline{x}$ is a point estimate for $\mu$ and s is a point estimate for $\sigma$.
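As a small illustration, the two point estimates can be computed with Python's standard `statistics` module. The sample values below are made up; `statistics.stdev` uses the $n-1$ divisor, which is the usual sample standard deviation $s$.

```python
import statistics

# Hypothetical sample of 8 measurements (illustrative only).
sample = [12.1, 14.3, 13.8, 15.2, 14.6, 13.0, 15.9, 13.5]

x_bar = statistics.mean(sample)  # point estimate of the population mean
s = statistics.stdev(sample)     # point estimate of the population sd (n - 1 divisor)

print(round(x_bar, 2))  # 14.05
print(round(s, 2))
```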

Or we can say that

A statistic used to estimate a parameter is called a point estimator or simply an estimator. The actual numerical value which we obtain for an estimator in a given problem is called an estimate.

Generally, the symbol $\theta$ (an unknown constant) is used to denote a population parameter, which may be a proportion, a mean, or some measure of variability. The available information is in the form of a random sample $X_1, X_2, \cdots, X_n$ of size $n$ drawn from the population. We wish to formulate a function of the sample observations $X_1, X_2, \cdots, X_n$; that is, we look for a statistic whose value computed from the sample data reflects the value of the population parameter as closely as possible. The estimator of $\theta$ is commonly denoted by $\hat{\theta}$. Different random samples usually give different values of the statistic $\hat{\theta}$, so $\hat{\theta}$ has its own sampling distribution.

Note that Unbiasedness, Efficiency, Consistency, and Sufficiency are the criteria (statistical properties of the estimator) to identify whether a statistic is a “good” estimator.

Application of Point Estimators: Confidence Intervals

We can build confidence intervals because we are interested not only in finding the point estimate for the mean but also in determining how accurate that point estimate is. Here the Central Limit Theorem plays a very important role. We assume that the sample standard deviation is close to the population standard deviation (which will almost always be true for large samples). The standard deviation of the sampling distribution of the estimator (here, the sample mean) is

\[\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}\]
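The relation between the spread of the sample mean and $\sigma/\sqrt{n}$ can be checked with a small simulation. This is only a sketch under assumed values: a normal population with mean 14 and standard deviation 2, samples of size 50, and 5000 repetitions are all arbitrary choices made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # seeded for reproducibility

sigma, n, repetitions = 2.0, 50, 5000

# Draw many samples of size n and record each sample mean.
sample_means = [
    statistics.fmean([rng.gauss(14.0, sigma) for _ in range(n)])
    for _ in range(repetitions)
]

observed = statistics.stdev(sample_means)  # spread of the simulated means
theoretical = sigma / n ** 0.5             # sigma / sqrt(n)

print(round(theoretical, 4))  # 0.2828
print(round(observed, 4))
```

The observed standard deviation of the simulated means should be close to the theoretical value, and the agreement improves as the number of repetitions grows.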

Our interest is to find an interval around $\overline{x}$ such that there is a large probability that the actual (true) mean falls inside the computed interval.  This interval is called a confidence interval and the large probability is called the confidence level.

Example of Point Estimation of Parameters

Question: Suppose that we check for clarity at 50 locations in a lake and discover that the average depth of clarity is 14 feet with a standard deviation of 2 feet. What can we conclude about the average clarity of the lake with a 95% confidence level?

Solution: The sample mean $\overline{x}$ (the average depth of clarity at the 50 locations) provides a point estimate for $\mu$, and the sample standard deviation $s$ provides a point estimate for $\sigma$. To answer how accurate $\overline{x}$ is as a point estimate, we can construct a 95% confidence interval for $\mu$ as follows.

Sketch the standard normal curve and use the standard normal table to find the z-score associated with a tail probability of 0.025 (there is 0.025 to the left and 0.025 to the right, i.e. the two-tailed case).

The Z-score for a 95% confidence level is about $\pm 1.96$.

\begin{align*}
Z&=\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\\
\pm 1.96&=\frac{14-\mu}{\frac{2}{\sqrt{50}}}\\
14-\mu&=\pm 0.5544
\end{align*}

Note that $Z\frac{\sigma}{\sqrt{n}}$ is called the margin of error.

The 95% confidence interval for the mean clarity will be (13.45, 14.55)

In other words, we are 95% confident that the mean clarity is between 13.45 and 14.55 feet.

In general, if $z_{\alpha/2}$ is the standard normal table value that leaves a probability of $\alpha/2$ in each tail, then a $100(1-\alpha)$% confidence interval for the mean is

\[\overline{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\]
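The general formula can be sketched in Python using `statistics.NormalDist` from the standard library to obtain the normal quantile. The function name `mean_confidence_interval` is hypothetical, and the sketch assumes, as in the lake example, that the sample is large enough for the sample standard deviation to stand in for $\sigma$.

```python
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Normal-based confidence interval for a mean: x_bar +/- z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    # z leaves alpha / 2 in each tail of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / n ** 0.5  # margin of error
    return x_bar - margin, x_bar + margin

# Lake example: mean 14 feet, standard deviation 2 feet, 50 locations.
low, high = mean_confidence_interval(14, 2, 50)
print(round(low, 2), round(high, 2))  # 13.45 14.55

# A higher confidence level gives a wider interval.
low99, high99 = mean_confidence_interval(14, 2, 50, confidence=0.99)
print(round(low99, 2), round(high99, 2))
```

The 95% result agrees with the interval (13.45, 14.55) obtained from the table value 1.96.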

See more at Wikipedia about Point Estimation of Parameters
