What is Standard Error of Sampling? (2012)

The standard error (SE) of a statistic is the standard deviation of the sampling distribution of that statistic. It reflects how much the statistic fluctuates from sample to sample. The inferential statistics involved in constructing confidence intervals and significance testing are based on standard errors. Increasing the sample size decreases the standard error.

In practical applications, the true value of the standard deviation of the error is unknown. As a result, the term standard error is often used to refer to an estimate of this unknown quantity.

The size of the SE is affected by two values.

  1. The standard deviation of the population. The larger the population’s standard deviation ($\sigma$), the larger the SE, since $SE=\frac {\sigma}{\sqrt{n}}$. If the population is homogeneous (which results in a small population standard deviation), the SE will also be small.
  2. The number of observations in the sample ($n$). A large sample results in a small SE of the estimate, which indicates less variability in the sample means.
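The two effects listed above can be checked with a small simulation. The sketch below is only illustrative: it assumes a normal population with mean 0, and the function names, sample counts, and seed are choices made for this example. It draws many samples, takes the standard deviation of their means, and compares that with $\frac{\sigma}{\sqrt{n}}$.

```python
import random
import statistics


def simulated_se_of_mean(sigma, n, n_samples=2000, seed=0):
    """Estimate the SE of the mean as the SD of many simulated sample means.

    Assumes a normal population with mean 0 and standard deviation sigma.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)


def theoretical_se_of_mean(sigma, n):
    """SE of the mean for an infinite population: sigma / sqrt(n)."""
    return sigma / n ** 0.5


# Larger sigma -> larger SE; larger n -> smaller SE.
for sigma, n in [(10.0, 25), (10.0, 100), (20.0, 100)]:
    simulated = simulated_se_of_mean(sigma, n)
    theoretical = theoretical_se_of_mean(sigma, n)
    print(f"sigma={sigma}, n={n}: simulated={simulated:.3f}, theoretical={theoretical:.3f}")
```

The simulated values should land close to the theoretical ones, though not exactly on them, because the simulation itself is subject to sampling fluctuation.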

Application of Standard Error of Sampling

Standard errors are used in many statistical procedures, for example

  • to measure the variability of the sampling distribution of the sample means
  • to build confidence intervals for means, proportions, differences between means, etc., for cases when the population standard deviation is known or unknown
  • to determine the sample size
  • in control charts, to set the control limits for means
  • in comparison tests such as the z-test, t-test, and Analysis of Variance
  • in relationship tests such as Correlation and Regression Analysis (standard error of regression), etc.
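As one illustration of the applications listed above, a confidence interval for a mean can be sketched as the sample mean plus or minus a multiple of the SE. This is a minimal sketch that assumes the population standard deviation is known and that the normal (z) approximation applies; the function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Two-sided z interval for a mean, assuming sigma is known."""
    se = sigma / n ** 0.5                      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return sample_mean - z * se, sample_mean + z * se


# Hypothetical numbers: sample mean 50, sigma 10, n = 100, so SE = 1.
lower, upper = z_confidence_interval(50.0, 10.0, 100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

When $\sigma$ is unknown and the sample is small, the z multiplier would normally be replaced by a value from the t distribution.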

(1) Standard Error Formula for Means

The SE of the mean is the standard deviation of the sampling distribution of the sample mean. It measures the variation of the sample mean from sample to sample, is denoted by $\sigma_{\bar{x}}$, and is calculated as a function of the standard deviation of the population and the size of the sample, i.e.

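$\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}$                      (used when the population is infinite, or when sampling is done with replacement)

If the population is finite with size $N$ and sampling is done without replacement, then ${\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N-n}{N}}}$. The factor $\sqrt{\frac{N-n}{N}}$ (written as $\sqrt{\frac{N-n}{N-1}}$ in many texts) tends towards 1 as $N$ tends to infinity, which is why it is dropped for an infinite population.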

When the population’s standard deviation ($\sigma$) is unknown, we estimate it by the sample standard deviation $S$. In this case, the SE formula is $\sigma_{\bar{x}}=\frac{S}{\sqrt{n}}$.
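The formulas for the SE of the mean can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name `se_mean`, its `population_size` parameter, and the sample values are assumptions made for the example. The correction factor $\sqrt{\frac{N-n}{N}}$ is applied only when a finite population size is supplied.

```python
import math
import statistics


def se_mean(sd, n, population_size=None):
    """Standard error of the mean.

    sd is the population SD (sigma) or, when sigma is unknown, the sample SD (S).
    If population_size (N) is given, the factor sqrt((N - n) / N) used in the
    text is applied for sampling without replacement from a finite population.
    """
    se = sd / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / population_size)
    return se


# Known sigma, infinite population.
print(se_mean(10.0, 25))

# Same sample size drawn from a finite population of N = 100.
print(se_mean(10.0, 25, population_size=100))

# Unknown sigma: estimate it from a (hypothetical) sample.
sample = [42, 52, 48, 63]
s = statistics.stdev(sample)  # sample SD with divisor n - 1
print(se_mean(s, len(sample)))
```

As the population size grows, the finite-population result approaches the infinite-population one.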

(2) Standard Error Formula for Proportion

The SE of a proportion is calculated in the same manner as the standard error of the mean. It is denoted by $\sigma_p$ and, in the case of a finite population of size $N$, calculated as $\sigma_p=\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N}}$.

In the case of an infinite population, $\sigma_p=\frac{\sigma}{\sqrt{n}}$.
In both cases $\sigma=\sqrt{p(1-p)}=\sqrt{pq}$, where $p$ is the probability that an element possesses the studied trait and $q=1-p$ is the probability that it does not, so that $\sigma_p=\sqrt{\frac{pq}{n}}$ for an infinite population.
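A minimal Python sketch of the SE of a proportion follows. The function name `se_proportion` and the example values are hypothetical; the finite-population factor $\sqrt{\frac{N-n}{N}}$ is applied only when a population size is passed.

```python
import math


def se_proportion(p, n, population_size=None):
    """Standard error of a proportion: sqrt(p * q / n), with q = 1 - p.

    If population_size (N) is given, the result is multiplied by
    sqrt((N - n) / N) for a finite population.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p
    se = math.sqrt(p * q / n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / population_size)
    return se


# Hypothetical example: 50% of a sample of 100 possess the trait.
print(se_proportion(0.5, 100))                        # infinite population
print(se_proportion(0.5, 100, population_size=1000))  # finite population
```

For a fixed $n$, the SE is largest at $p=0.5$ and shrinks as $p$ moves towards 0 or 1.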

(3) Standard Error Formula for Difference Between Means

The SE of the difference between two independent quantities is the square root of the sum of the squared standard errors of both quantities, i.e. $\sigma_{\bar{x}_1-\bar{x}_2}=\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$, where $\sigma_1^2$ and $\sigma_2^2$ are the respective variances of the two independent populations to be compared, and $n_1$ and $n_2$ are the respective sizes of the two samples drawn from these populations.

Unknown Population Variances
Suppose the variances of the two populations are unknown. In that case, we estimate them from the two samples, i.e. $\sigma_{\bar{x}_1-\bar{x}_2}=\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}$, where $S_1^2$ and $S_2^2$ are the respective variances of the two samples drawn from their respective populations.

Equal Variances are assumed
When the variances of the two populations are assumed to be equal, we can estimate the common variance with a pooled variance $S_p^2$, calculated as a function of $S_1^2$ and $S_2^2$, i.e.

\[S_p^2=\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}\]
\[\sigma_{\bar{x}_1-\bar{x}_2}=S_p \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\]
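The separate-variance and pooled-variance versions of the SE of the difference between means can be compared with a short sketch. The function names and the numeric inputs are hypothetical and chosen only for illustration.

```python
import math


def se_diff_means(var1, n1, var2, n2):
    """SE of the difference between two means with separate variances."""
    return math.sqrt(var1 / n1 + var2 / n2)


def pooled_variance(var1, n1, var2, n2):
    """Pooled variance S_p^2, weighting each sample variance by n - 1."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def se_diff_means_pooled(var1, n1, var2, n2):
    """SE of the difference between two means under equal variances."""
    sp = math.sqrt(pooled_variance(var1, n1, var2, n2))
    return sp * math.sqrt(1 / n1 + 1 / n2)


# Hypothetical sample variances and sample sizes.
print(se_diff_means(16.0, 4, 9.0, 9))
print(pooled_variance(4.0, 10, 6.0, 10))
print(se_diff_means_pooled(4.0, 10, 6.0, 10))
```

When the two sample sizes are equal, the pooled and separate-variance formulas give the same SE; they differ when the sample sizes are unequal.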

(4) Standard Error for Difference between Proportions

The SE of the difference between two proportions is calculated in the same way as the SE of the difference between means, i.e.
\begin{eqnarray*}
\sigma_{p_1-p_2}&=&\sqrt{\sigma_{p_1}^2+\sigma_{p_2}^2}\\
&=& \sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}
\end{eqnarray*}
where $p_1$ and $p_2$ are the proportions calculated from the two samples of sizes $n_1$ and $n_2$, each drawn from an infinite population.
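
As a quick worked instance of this formula, take two hypothetical samples (the numbers are illustrative only) with $p_1=0.6$, $n_1=100$ and $p_2=0.5$, $n_2=100$:
\begin{eqnarray*}
\sigma_{p_1-p_2}&=&\sqrt{\frac{0.6(0.4)}{100}+\frac{0.5(0.5)}{100}}\\
&=&\sqrt{0.0024+0.0025}=\sqrt{0.0049}=0.07
\end{eqnarray*}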

FAQs about Standard Error

  1. Define the Standard Error of Mean.
  2. Standard Error is affected by which two values?
  3. Write the formula of the standard error of mean, proportion, and difference between means.
  4. What is the application of standard error of mean in Sampling?
  5. Discuss the importance of the standard error.

Multivariate Analysis (2012)

The term multivariate analysis covers all statistical methods in which more than two variables are analyzed simultaneously.

Many multivariate methods are based upon an underlying probability model known as the Multivariate Normal Distribution (MND). The objectives of scientific investigations to which multivariate methods most naturally lend themselves are described below.

Objectives of Multivariate Analysis

The following are some basic objectives of multivariate analysis.

  • Data reduction or structural simplification
    The phenomenon being studied is represented as simply as possible without sacrificing valuable information. It is hoped that this will make interpretation easier.
  • Sorting and Grouping
    Groups of similar objects or variables are created, based on measured characteristics. Alternatively, rules for classifying objects into well-defined groups may be required.
  • Investigation of the dependence among variables
    The nature of the relationships among variables is of interest. Are all the variables mutually independent, or are one or more variables dependent on the others?
  • Prediction
    Relationships between variables must be determined for predicting the values of one or more variables based on observation of the other variables.
  • Hypothesis Construction and testing
    Specific statistical hypotheses, formulated in terms of the parameters of the multivariate population, are tested. This may be done to validate assumptions or to reinforce prior convictions.

Applications: Multivariate analysis is used in various fields:

  • Social sciences (understanding factors influencing voting behavior)
  • Business (analyzing customer demographics and purchase patterns)
  • Finance (evaluating risk factors in investment portfolios)
  • Natural sciences (studying the relationships between different environmental variables)

Multivariate Data Sets

We are concerned with analyzing measurements made on several variables or characteristics. These measurements (data) must frequently be arranged and displayed in various ways (graphs, tabular form, etc.). The preliminary concepts underlying these first steps of data organization are described below.

Array

Multivariate data arise whenever an investigator, seeking to understand a social or physical phenomenon, selects a number $p$ of variables or characteristics to record. The values of these variables are all recorded for each distinct item, individual, or experimental unit.

The notation $x_{jk}$ indicates the particular value of the $k$th variable that is observed on the $j$th item or trial, i.e. $x_{jk}$ is the measurement of the $k$th variable on the $j$th item. So, $n$ measurements on $p$ variables can be displayed as

\[\begin{array}{ccccccc}
 & V_1 & V_2 & \dots & V_k & \dots & V_p \\
\text{Item } 1 & x_{11} & x_{12} & \dots & x_{1k} & \dots & x_{1p} \\
\text{Item } 2 & x_{21} & x_{22} & \dots & x_{2k} & \dots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\text{Item } j & x_{j1} & x_{j2} & \dots & x_{jk} & \dots & x_{jp} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\text{Item } n & x_{n1} & x_{n2} & \dots & x_{nk} & \dots & x_{np} \\
\end{array}\]

These data can be displayed as rectangular arrays $X$ of $n$ rows and $p$ columns

\[X=\begin{pmatrix}
x_{11} & x_{12} & \dots & x_{1k} & \dots & x_{1p} \\
x_{21} & x_{22} & \dots & x_{2k} & \dots & x_{2p} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{j1} & x_{j2} & \dots & x_{jk} & \dots & x_{jp} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{n1} & x_{n2} & \dots & x_{nk} & \dots & x_{np}
\end{pmatrix}\]

This $X$ array contains the data consisting of all of the observations on all of the variables.

Example: suppose we have data on four sales, giving the total amount of each sale (in dollars) and the number of books sold in it.

Variable 1 (Sales in Dollars)
\[\begin{array}{ccccc}
\text{Data Values:} & 42 & 52 & 48 & 63 \\
\text{Notation:} & x_{11} & x_{21} & x_{31} & x_{41}
\end{array}\]

Variable 2 (Number of Books sold)
\[\begin{array}{ccccc}
\text{Data Values:} & 4 & 2 & 8 & 3 \\
\text{Notation:} & x_{12} & x_{22} & x_{32} & x_{42}
\end{array}\]
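
The book-sales example can be stored as an $n \times p$ array in code. The sketch below uses a plain list of lists; the helper names `x` and `variable` are hypothetical and exist only to mirror the 1-based $x_{jk}$ notation of the text.

```python
# Data matrix for the book-sales example: one row per sale (item),
# one column per variable (column 1 = sales in dollars, column 2 = books sold).
X = [
    [42, 4],
    [52, 2],
    [48, 8],
    [63, 3],
]

n = len(X)     # number of items (rows)
p = len(X[0])  # number of variables (columns)


def x(j, k):
    """Return x_jk, the value of variable k on item j (1-based, as in the text)."""
    return X[j - 1][k - 1]


def variable(k):
    """Return all n observations on variable k as a list."""
    return [row[k - 1] for row in X]


print(n, p)
print(x(3, 2))      # books sold in the third sale
print(variable(1))  # all sales amounts
```

Keeping one row per item and one column per variable matches the layout of the array $X$ above, so the first index always refers to the item and the second to the variable.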

The information available in multivariate data sets can be assessed by calculating certain summary numbers, known as descriptive statistics. Examples are the arithmetic (sample) mean, which is a measure of location, and the average of the squared distances of all the numbers from the mean, which is a measure of spread or variation.
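
For the book-sales data, these two summary numbers can be computed per variable as sketched below. The function name `summarize` is hypothetical. The spread is computed exactly as described, as the average squared distance from the mean (divisor $n$); many texts divide by $n-1$ instead when estimating a population variance from a sample.

```python
import statistics

sales_dollars = [42, 52, 48, 63]
books_sold = [4, 2, 8, 3]


def summarize(values):
    """Return (mean, average squared distance from the mean) for one variable."""
    mean = statistics.fmean(values)
    avg_sq_dist = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, avg_sq_dist


for name, values in [("Sales in dollars", sales_dollars),
                     ("Number of books sold", books_sold)]:
    mean, spread = summarize(values)
    print(f"{name}: mean = {mean}, average squared distance = {spread}")
```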

Measure of Dispersion or Variability (2012)

Introduction to Measure of Dispersion

A measure of location (an average, or measure of central tendency) is not sufficient to describe the characteristics of a distribution, because two or more distributions may have exactly the same average even though they differ in other respects. A measure of central tendency only represents the typical value of the data set. To give a sensible description of the data, we also need a numerical quantity, called a measure of dispersion (variability or scatter), that describes the spread of the values in the data set. There are two types of measures of dispersion or variability:

  1. Absolute Measures
  2. Relative Measures

A measure of central tendency together with a measure of dispersion gives a more adequate description of the data than a measure of location alone. An average only describes the balancing point of the data set; it provides no information about the degree to which the data tend to spread or scatter about the average value. The measure of dispersion therefore indicates how representative the measure of central tendency is: the smaller the variability of a data set, the better the average represents it.
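
The point that equal averages can hide very different spreads is easy to see with two small data sets. Both sets below are hypothetical and were chosen so that their means coincide; the function name `describe` is also just for this sketch.

```python
import statistics

set_a = [48, 49, 50, 51, 52]  # hypothetical, tightly clustered around 50
set_b = [10, 30, 50, 70, 90]  # hypothetical, widely scattered around 50


def describe(values):
    """Return the mean, range, and sample standard deviation of a data set."""
    return {
        "mean": statistics.fmean(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
    }


print(describe(set_a))
print(describe(set_b))
```

Both sets have the same mean, but the range and standard deviation of the second set are far larger, so its mean is a much less representative summary.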

Absolute Measure of Dispersion

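Absolute measures are defined in such a way that they have the same units (meters, grams, etc.) as the original measurements. Because they depend on the units, absolute measures cannot be used to compare the variation of two or more data sets that are measured in different units or that have very different averages.
The most common absolute measures of variability are:

  • Range
  • Quartile Deviation
  • Mean Deviation
  • Standard Deviation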

Relative Measures of Dispersion

The relative measures have no units, as they are ratios, coefficients, or percentages. Relative measures are independent of the units of measurement and are useful for comparing data of different natures. The most common relative measures of variability are:

  • Coefficient of Variation
  • Coefficient of Mean Deviation
  • Coefficient of Quartile Deviation
  • Coefficient of Standard Deviation

Several other terms are used for dispersion, such as variability, spread, scatter, measure of uncertainty, and deviation.
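
The difference between an absolute and a relative measure can be illustrated by changing the unit of measurement. In the sketch below the heights are hypothetical, and the coefficient of variation is taken as the standard deviation divided by the mean, expressed as a percentage.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent: (SD / mean) * 100."""
    return statistics.stdev(values) / statistics.fmean(values) * 100.0


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical data
heights_m = [h / 100.0 for h in heights_cm]       # same data in meters

# The standard deviation (absolute measure) changes with the unit.
print(statistics.stdev(heights_cm), statistics.stdev(heights_m))

# The coefficient of variation (relative measure) does not.
print(coefficient_of_variation(heights_cm), coefficient_of_variation(heights_m))
```

Because the unit cancels in the ratio, the coefficient of variation can be compared across data sets measured on different scales, which the standard deviation alone cannot.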
