Standard Error 2: A Quick Guide

Introduction to Standard Errors (SE)

The standard error (SE) is a statistical measure of how precisely a sample statistic estimates the corresponding population parameter. The standard error of the mean measures the variation in the sampling distribution of the sample mean. It is usually denoted by $\sigma_{\overline{x}}$ and is calculated as

\[\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}\]

Drawing different samples from the same population of interest usually results in different values of the sample mean. The sample means therefore have a distribution of their own, with its own mean and variance. The standard error of the mean is the standard deviation of the means of all possible samples (of the same size) drawn from the same population.
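The idea that the SE is the standard deviation of many sample means can be checked with a small simulation. The sketch below is only an illustration: the population (normal with mean 50 and standard deviation 10), the sample size, and the number of repetitions are all assumed values, not taken from the text.

```python
import random
import statistics

random.seed(0)

# Hypothetical population (assumed for illustration only).
population = [random.gauss(50, 10) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation
n = 25                                 # sample size

# Draw many samples with replacement and record each sample mean.
sample_means = [
    statistics.fmean(random.choices(population, k=n))
    for _ in range(5_000)
]

empirical_se = statistics.pstdev(sample_means)  # SD of the sample means
theoretical_se = sigma / n ** 0.5               # sigma / sqrt(n)

print(f"empirical SE   = {empirical_se:.3f}")
print(f"theoretical SE = {theoretical_se:.3f}")
```

The two printed values should be close to each other, and they get closer as the number of repeated samples grows.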

Size of the Standard Error

The size of the standard error is affected by the standard deviation of the population and the number of observations in a sample, called the sample size. The larger the population’s standard deviation ($\sigma$), the larger the standard error will be, indicating more variability in the sample means. However, the larger the number of observations in a sample, the smaller the SE of the estimate, indicating less variability in the sample means. Less variability means that the sample mean is likely to be closer to the population mean, that is, the sample is more representative of the population of interest.
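As a small numerical illustration (the values $\sigma=10$, $n=25$, and $n=100$ are assumed here, not taken from any data), the SE of the mean for the two sample sizes would be

\[\frac{10}{\sqrt{25}}=2 \qquad \text{and} \qquad \frac{10}{\sqrt{100}}=1\]

so making the sample four times larger halves the standard error, because the SE shrinks with $\sqrt{n}$ rather than with $n$.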

Adjustments in Computing SE of Sample Means

If the sampled population is not very large, we need to make some adjustments in computing the SE of the sample means. For a finite population, in which the total number of objects (observations) is $N$ and the number of objects (observations) in a sample is $n$, then the adjustment will be $\sqrt{\frac{N-n}{N-1}}$. This adjustment is called the finite population correction factor. Then the adjusted standard error will be

\[\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}\]
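The adjusted formula can be sketched as a short function. The function name `se_mean_fpc` and the example values are hypothetical; the sketch simply applies the finite population correction factor $\sqrt{\frac{N-n}{N-1}}$ when a population size is supplied.

```python
import math

def se_mean_fpc(sigma, n, N=None):
    """SE of the sample mean; applies the finite population
    correction sqrt((N - n) / (N - 1)) when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        if N < 2 or not 1 <= n <= N:
            raise ValueError("need N >= 2 and 1 <= n <= N")
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Assumed example values: sigma = 10, n = 25.
print(se_mean_fpc(10, 25))          # no correction (very large population)
print(se_mean_fpc(10, 25, N=100))   # corrected for a population of 100
```

The corrected value is smaller than the uncorrected one, and it becomes zero when the sample covers the whole population ($n=N$).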

Uses of Standard Error

  1. It measures the spread of values of statistics about the expected value of that statistic. It helps us understand how well a sample represents the entire population.
  2. It is used to construct confidence intervals, which provide a range of values likely to contain the true population parameter.
  3. It is used in tests of null hypotheses about population parameter(s), such as t-tests and z-tests. It helps determine the significance of differences between sample means or between a sample mean and a population mean.
  4. It helps in determining the required sample size for a study to achieve the desired level of precision.
  5. By comparing standard errors of different samples or estimates, one can assess the relative variability and reliability of those estimates.
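
Two of the uses above, building a confidence interval and choosing a sample size, can be sketched in a few lines. This is a minimal sketch under assumptions: it uses the normal (z) approximation with a known $\sigma$, the usual sample-size rule $n=(z\sigma/E)^2$ for a margin of error $E$, and hypothetical function names and input values.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, se, level=0.95):
    """Normal-approximation interval: mean +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se

def required_sample_size(sigma, margin, level=0.95):
    """Smallest n for which z * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Assumed example values.
low, high = confidence_interval(mean=50.0, se=2.0)
n_needed = required_sample_size(sigma=10.0, margin=1.0)

print(f"95% CI: ({low:.2f}, {high:.2f})")
print(f"required n: {n_needed}")
```

When $\sigma$ is unknown and the sample is small, a t-based interval would normally replace the z-based one used in this sketch.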

The SE is computed from sample statistics. The following formulas give the SE for simple random samples, assuming that the size of the population ($N$) is at least 20 times larger than the sample size ($n$). Here $s$ denotes the sample standard deviation and $p$ the sample proportion.
\begin{align*}
Sample\, mean, \overline{x} & \Rightarrow SE_{\overline{x}} = \frac{s}{\sqrt{n}}\\
Sample\, proportion, p &\Rightarrow SE_{p} = \sqrt{\frac{p(1-p)}{n}}\\
Difference\, b/w \, means, \overline{x}_1 - \overline{x}_2 &\Rightarrow SE_{\overline{x}_1-\overline{x}_2}=\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\\
Difference\, b/w\, proportions, p_1-p_2 &\Rightarrow SE_{p_1-p_2}=\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}
\end{align*}
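
The four sample-based SE formulas for simple random samples (mean, proportion, difference between means, difference between proportions) can be written as small helper functions. The function names are hypothetical, `s` stands for a sample standard deviation and `p` for a sample proportion, and the example inputs are assumed values.

```python
import math

def se_mean(s, n):
    """SE of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_proportion(p, n):
    """SE of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def se_diff_means(s1, n1, s2, n2):
    """SE of the difference between two independent sample means."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def se_diff_proportions(p1, n1, p2, n2):
    """SE of the difference between two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Assumed example inputs.
print(se_mean(s=12, n=36))
print(se_proportion(p=0.5, n=100))
print(se_diff_means(s1=3, n1=9, s2=4, n2=16))
print(se_diff_proportions(p1=0.5, n1=100, p2=0.5, n2=100))
```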

Summary

The SE provides valuable insights about the reliability and precision of sample-based estimates. By understanding SE, a researcher can make more informed decisions and draw more accurate conclusions from the data under study. The SE plays the same role as the standard deviation of a statistic, except that it is computed from sample statistics, whereas the standard deviation is computed from population parameters.

FAQs about SE

  1. What is the SE, and how is it computed?
  2. What are the uses of SE?
  3. Which factors affect the size of the SE?
  4. When will the SE be large?
  5. When will the SE be small?
  6. What will be the standard error for proportion?


Sampling Theory, Introduction, and Reasons to Sample (2015)

Introduction to Sampling Theory

Often we are interested in drawing valid conclusions (inferences) about a large group of individuals or objects, called a population in statistics. Examining the entire population may be difficult or even impossible, so instead we examine only a small part of it, called a sample. Our objective is to draw valid inferences about certain facts about the population from the results found in the sample, a process known as statistical inference. The process of obtaining samples is called sampling, and the theory concerning sampling is called sampling theory.

Example

We may wish to draw conclusions about the percentage of defective bolts produced in a factory during a given 6-day week by examining 20 bolts each day, produced at various times during the day. In this case, all bolts produced during the week comprise the population, while the 120 bolts selected during the 6 days constitute a sample.
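
The bolt example can be mimicked with simulated data. Everything numeric in the sketch below other than the 6 days and the 20 bolts per day is an assumption made for illustration: the daily output of 5,000 bolts and the 4% defect rate are invented, and in a real study the true defect rate would be unknown.

```python
import random

random.seed(1)

DAYS = 6
BOLTS_PER_DAY = 20
DAILY_OUTPUT = 5_000      # assumed production per day
TRUE_DEFECT_RATE = 0.04   # assumed; unknown in practice

# Simulated population: one list of bolts per day, True = defective.
population = {
    day: [random.random() < TRUE_DEFECT_RATE for _ in range(DAILY_OUTPUT)]
    for day in range(1, DAYS + 1)
}

# Select 20 bolts from each day's production, without replacement.
sample = [
    bolt
    for day in population
    for bolt in random.sample(population[day], BOLTS_PER_DAY)
]

sample_size = len(sample)
percent_defective = 100 * sum(sample) / sample_size

print(f"sample size: {sample_size}")
print(f"estimated % defective: {percent_defective:.1f}")
```

The estimate comes from only 120 bolts, yet it is used to draw a conclusion about the full week's production.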

Sampling theory is widely used in business, medical, social, and psychological research to gather information about a population. The sampling process comprises several stages:

  • Defining the population of concern
  • Specifying the sampling frame (set of items or events possible to measure)
  • Specifying a sampling method for selecting the items or events from the sampling frame
  • Determining the appropriate sample size
  • Implementing the sampling plan
  • Sampling and data collecting
  • Data that can be selected

Reasons to Study a Sample

When studying the characteristics of a population, there are many reasons to study a sample (drawn from the population under study) instead of the entire population such as:

  1. Time: Contacting every individual in the whole population takes too much time.
  2. Cost: The cost of studying all the items (objects or individuals) in a population may be prohibitive.
  3. Physically Impossible: Some populations are infinite or effectively uncountable, so it is physically impossible to check every item, for example, populations of fish, birds, snakes, and mosquitoes. Similarly, it is difficult to study populations whose members are constantly moving, being born, or dying.
  4. Destructive Nature of Items: Some items are destroyed during testing. For example, a steel wire is stretched until it breaks, and the breaking point is recorded to determine its minimum tensile strength. Similarly, many electric and electronic components are destroyed during testing, so testing the entire population would leave nothing to use.
  5. Qualified and Expert Staff: Enumeration requires highly qualified and expert staff, who are not always available. National and international research organizations and agencies are sometimes hired for enumeration purposes, which is costly and needs more time (as a rehearsal of the activity is required), and it is not always easy to recruit highly qualified staff.
  6. Reliability: With a scientific sampling technique, the sampling error can be minimized, and the non-sampling error committed in a sample survey is also small because qualified investigators are used.

Summary

Every sampling system is used to obtain estimates of certain properties of the population under study. The sampling system should be judged by how good these estimates are. An individual estimate may, by chance, be very close to the true value (population parameter) or differ greatly from it, so a single estimate gives a poor measure of the merits of the system.

A sampling system is better judged by the frequency distribution of many estimates obtained by repeated sampling. A good system gives a frequency distribution with a small variance and a mean estimate equal to the true value.
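
Judging an estimation procedure by repeated sampling can be sketched with a simulation. The example below is an illustration under assumptions: the population is simulated, and the two procedures compared (the sample variance with divisor $n-1$ and with divisor $n$) are chosen here only because one of them has a mean estimate equal to the true value and the other does not.

```python
import random
import statistics

random.seed(2)

# Hypothetical population (assumed for illustration only).
population = [random.gauss(100, 15) for _ in range(20_000)]
true_variance = statistics.pvariance(population)

n, repeats = 10, 4_000
estimates_a, estimates_b = [], []
for _ in range(repeats):
    sample = random.sample(population, n)
    estimates_a.append(statistics.variance(sample))   # divisor n - 1
    estimates_b.append(statistics.pvariance(sample))  # divisor n

mean_a = statistics.fmean(estimates_a)
mean_b = statistics.fmean(estimates_b)

print(f"true variance          : {true_variance:.1f}")
print(f"mean estimate (n - 1)  : {mean_a:.1f}")
print(f"mean estimate (n)      : {mean_b:.1f}")
print(f"variance of estimates A: {statistics.pvariance(estimates_a):.1f}")
print(f"variance of estimates B: {statistics.pvariance(estimates_b):.1f}")
```

Any single estimate can be far from the true variance, but over many repetitions the first procedure averages close to the true value while the second is systematically too low.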


Sampling Frame and Sampling Unit: A Quick Reference

The post is about the concept of Sampling Frame and Sampling Unit.

Sampling Unit

A population is divided into a finite number of distinct and identifiable units, called sampling units. OR

The individuals whose characteristics are to be measured in the analysis are called elementary or sampling units. OR

Before selecting the sample, the population must be divided into parts called sampling units or simply sample units.

Sampling Frame

The list of all the sampling units with proper identification, which represents the population to be covered, is called the sampling frame. The frame may consist of either a list of units or a map of the area (in case a sample of the area is being taken), such that every element in the population belongs to one and only one unit.

The frame should be accurate, free from omission and duplication (overlapping), adequate, and up-to-date. Its units must cover the whole of the population and should be well identified.

In improving the sampling design, supplementary information for the field covered by the sampling frame may also be valuable.

Sampling Frame and Sampling Unit: Examples

  1. List of households (and persons) enumerated in the population census.
  2. A map of areas of a country showing the boundaries of area units.
  3. In sampling an agricultural crop, the unit might be a field, a farm, or an area of land whose shape and dimensions are at our disposal.

An ideal sampling frame will have the following qualities/characteristics:

  • all sampling units have a logical and numerical identifier
  • all sampling units can be found i.e. contact information, map location, or other relevant information about sampling units is present
  • the frame is organized in a logical and systematic manner
  • the sampling frame has some additional information about the units that allow the use of more advanced sampling frames
  • every element of the population of interest is present in the frame
  • every element of the population is present only once in the frame
  • no elements from outside the population of interest are present in the frame
  • the data is up-to-date

Classification of Sampling Frame

A sampling frame can be classified according to the type of defect it suffers from, as follows:

A frame may be inaccurate: where some of the sampling units of the population are listed inaccurately or some units that do not exist are included in the list.

A frame may be inadequate: when it does not include all classes of the population that are to be taken in the survey.

A frame may be incomplete: when some of the sampling units of the population are either completely omitted or included more than once.

A frame may be out of date: when it has not been updated according to the demand of the occasion, although it was accurate, complete, and adequate at the time of construction.
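
Some of these defects can be checked mechanically when a reference list of the population is available. The sketch below is hypothetical throughout: the function name `check_frame`, the household identifiers, and the existence of a trusted reference list are all assumptions, and in practice such a reference is often unavailable or only partial.

```python
def check_frame(frame_ids, population_ids):
    """Report omitted, duplicated, and foreign units in a frame."""
    frame_set = set(frame_ids)
    population_set = set(population_ids)

    # Units that appear more than once in the frame.
    seen, duplicates = set(), set()
    for unit in frame_ids:
        if unit in seen:
            duplicates.add(unit)
        seen.add(unit)

    return {
        "omitted": sorted(population_set - frame_set),
        "duplicated": sorted(duplicates),
        "not_in_population": sorted(frame_set - population_set),
    }

# Hypothetical household identifiers.
population_ids = ["H01", "H02", "H03", "H04", "H05"]
frame_ids = ["H01", "H02", "H02", "H04", "H99"]

report = check_frame(frame_ids, population_ids)
print(report)
```

An out-of-date frame cannot be detected this way without a current reference, which is one reason frames need regular updating.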

Imagine you are interested in studying the eating habits of people in your city. The entire population of the city would be too big to survey, so you decide to take a sample. The sampling-frame would be like a phone book of everyone in the city. The sampling unit would be each person listed in the phone book.

Summary

Remember that the quality of the sampling frame directly affects the representativeness of the sample and hence the validity of the study. If the frame does not accurately reflect the population, the results may be biased.

Ideally, the frame should be complete (including everyone in the target population) and accurate (with no duplicates or errors). In reality, perfect frames can be difficult to achieve, but researchers strive to get as close as possible.

FAQs about Sampling Frames and Sampling Units

  1. Define Sampling frame.
  2. Define Sampling unit.
  3. What qualities should a sampling frame have?
  4. What is the classification of the sampling frame?
  5. Give some examples of sampling frames and sampling units.


What is Standard Error of Sampling? (2012)

The standard error (SE) of a statistic is the standard deviation of the sampling distribution of that statistic. The standard error of sampling reflects how much sampling fluctuation a statistic will show. The inferential statistics involved in constructing confidence intervals and significance testing are based on standard errors. Increasing the sample size decreases the standard error.

In practical applications, the true value of the standard deviation of the error is unknown. As a result, the term standard error is often used to refer to an estimate of this unknown quantity.

The size of the SE is affected by two values.

  1. The standard deviation of the population affects the standard error. The larger the population’s standard deviation ($\sigma$), the larger the SE, i.e. $\frac {\sigma}{\sqrt{n}}$. If the population is homogeneous (which results in a small population standard deviation), the SE will also be small.
  2. The standard error is affected by the number of observations in a sample. A large sample will result in a small SE of the estimate (indicating less variability in the sample means).

Application of Standard Error of Sampling

The SEs are used in many statistical procedures, for example:

  • to measure the spread of the distribution of the sample means
  • to build confidence intervals for means, proportions, differences between means, etc., whether the population standard deviation is known or unknown
  • to determine the sample size
  • in control charts, to set control limits for means
  • in comparison tests such as the z-test, t-test, and Analysis of Variance
  • in relationship tests such as Correlation and Regression Analysis (standard error of regression), etc.

(1) Standard Error Formula for Means

The SE for the mean, that is, the standard deviation of the sampling distribution of the mean, measures the variation in the sampling distribution of the sample mean. It is denoted by $\sigma_{\bar{x}}$ and calculated as a function of the standard deviation of the population and the size of the sample, i.e.

$\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}$                      (used when the population is infinite)

If the population is finite, then ${\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}}}$. The correction factor $\sqrt{\frac{N-n}{N-1}}$ tends towards 1 as $N$ tends to infinity, which is why it is omitted for an infinite population.

When the population’s standard deviation ($\sigma$) is unknown, we estimate it by the sample standard deviation $S$. In this case, the SE formula is $\sigma_{\bar{x}}=\frac{S}{\sqrt{n}}$.
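
When only raw sample data are available, the estimated SE of the mean can be computed directly from them. The data values below are hypothetical and serve only to show the calculation; `statistics.stdev` uses the $n-1$ divisor, which is the usual choice for a sample standard deviation.

```python
import math
import statistics

# Hypothetical measurements (assumed for illustration only).
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

n = len(data)
s = statistics.stdev(data)        # sample standard deviation (n - 1 divisor)
estimated_se = s / math.sqrt(n)   # S / sqrt(n)

print(f"n = {n}, S = {s:.4f}, estimated SE = {estimated_se:.4f}")
```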

(2) Standard Error Formula for Proportion

The SE for a proportion can be calculated in the same manner as the standard error of the mean. It is denoted by $\sigma_p$.

In the case of a finite population, $\sigma_p=\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$.
In the case of an infinite population, $\sigma_p=\frac{\sigma}{\sqrt{n}}$.

In both cases $\sigma=\sqrt{p(1-p)}=\sqrt{pq}$, where $p$ is the probability that an element possesses the studied trait and $q=1-p$ is the probability that it does not.
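
As a worked illustration with assumed values $p=0.4$, $n=100$, and $N=1000$ (and taking the finite population correction factor as $\sqrt{\frac{N-n}{N-1}}$), the SE of the proportion without the correction is

\[\sigma_p=\sqrt{\frac{0.4 \times 0.6}{100}}=\sqrt{0.0024}\approx 0.0490\]

and with the correction it is

\[\sigma_p\approx 0.0490 \times \sqrt{\frac{1000-100}{1000-1}}\approx 0.0490 \times 0.9492 \approx 0.0465\]

The correction matters here because the sample is a tenth of the population; for a much larger $N$ the two values would be almost identical.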

(3) Standard Error Formula for Difference Between Means

The SE for the difference between two independent quantities is the square root of the sum of the squared standard errors of both quantities, i.e. $\sigma_{\bar{x}_1-\bar{x}_2}=\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$, where $\sigma_1^2$ and $\sigma_2^2$ are the respective variances of the two independent populations to be compared, and $n_1$ and $n_2$ are the respective sizes of the two samples drawn from their respective populations.

Unknown Population Variances

Suppose the variances of the two populations are unknown. In that case, we estimate them from the two samples, i.e. $\sigma_{\bar{x}_1-\bar{x}_2}=\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}$, where $S_1^2$ and $S_2^2$ are the respective variances of the two samples drawn from their respective populations.

Equal Variances Are Assumed

When the variances of the two populations are assumed to be equal, we can estimate their common value with a pooled variance $S_p^2$, calculated as a function of $S_1^2$ and $S_2^2$, i.e.

\[S_p^2=\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}\]
\[\sigma_{\bar{x}_1-\bar{x}_2}=S_p \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\]
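
The pooled calculation can be sketched as a short function. The function name `pooled_se` and the example inputs are hypothetical; the body first forms the pooled variance $S_p^2$ and then multiplies $S_p$ by $\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$.

```python
import math

def pooled_se(s1, n1, s2, n2):
    """SE of the difference between two means, assuming equal
    population variances (pooled variance estimate)."""
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance) * math.sqrt(1 / n1 + 1 / n2)

# Assumed example inputs: sample SDs 4 and 5, sample sizes 10 and 12.
print(f"{pooled_se(s1=4.0, n1=10, s2=5.0, n2=12):.4f}")
```

The pooled form is only appropriate when the equal-variance assumption is reasonable; otherwise the unpooled formula with $S_1^2$ and $S_2^2$ is the safer choice.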

(4) Standard Error for Difference between Proportions

The SE of the difference between two proportions is calculated in the same way as the SE of the difference between means is calculated i.e.
\begin{eqnarray*}
\sigma_{p_1-p_2}&=&\sqrt{\sigma_{p_1}^2+\sigma_{p_2}^2}\\
&=& \sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}
\end{eqnarray*}
where $p_1$ and $p_2$ are the proportions calculated from the two samples of sizes $n_1$ and $n_2$, each drawn from an infinite population.

FAQs about Standard Error

  1. Define the Standard Error of Mean.
  2. Standard Error is affected by which two values?
  3. Write the formula of the standard error of mean, proportion, and difference between means.
  4. What is the application of standard error of mean in Sampling?
  5. Discuss the importance of standard error.