Unbiasedness of the Estimator (2013)

The unbiasedness of the estimator is probably the most important property that a good estimator should possess. In statistics, the bias (or bias function) of an estimator is the difference between this estimator’s expected value and the true value of the parameter being estimated. An estimator is said to be unbiased if its expected value equals the corresponding population parameter; otherwise, it is said to be biased. Let us discuss in detail the unbiasedness of the estimator.

Unbiasedness of the Estimator

Suppose that, from a realization of a random variable $X$ taking values in a probability space $(\chi, \mathfrak{F}, P_\theta)$ with $\theta \in \Theta$, a function $f:\Theta \rightarrow \Omega$ has to be estimated, mapping the parameter set $\Theta$ into a certain set $\Omega$, and that a statistic $T=T(X)$ is chosen as an estimator of $f(\theta)$. If $T$ is such that
\[E_\theta[T]=\int_\chi T(x)\, dP_\theta(x)=f(\theta)\]
holds for all $\theta \in \Theta$, then $T$ is called an unbiased estimator of $f(\theta)$. An unbiased estimator is frequently said to be free of systematic errors.

Unbiased Estimator

Suppose $\hat{\theta}$ is an estimator of a parameter $\theta$; then $\hat{\theta}$ is said to be an unbiased estimator if $E(\hat{\theta})=\theta$, that is, if its bias $E(\hat{\theta})-\theta$ is zero.

  • If $E(\hat{\theta})=\theta$ then $\hat{\theta}$ is an unbiased estimator of a parameter $\theta$.
  • If $E(\hat{\theta})<\theta$ then $\hat{\theta}$ is a negatively biased estimator of a parameter $\theta$.
  • If $E(\hat{\theta})>\theta$ then $\hat{\theta}$ is a positively biased estimator of a parameter $\theta$.

The bias of an estimator $\hat{\theta}$ is given by $$E(\hat{\theta})-\theta$$

  • $\overline{X}$ is an unbiased estimator of the mean of a population (whose mean exists).
  • $\overline{X}$ is an unbiased estimator of $\mu$ in a Normal distribution i.e. $N(\mu, \sigma^2)$.
  • $\overline{X}$ is an unbiased estimator of the parameter $p$ of the Bernoulli distribution.
  • $\overline{X}$ is an unbiased estimator of the parameter $\lambda$ of the Poisson distribution.

In each of these cases, the parameter $\mu, p$ or $\lambda$ is the mean of the respective population being sampled.
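
The reason is linearity of expectation. A short derivation, assuming only that $X_1, X_2, \cdots, X_n$ are identically distributed with a finite mean $\mu$ (independence is not needed for this step):
\[E(\overline{X})=E\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right)=\frac{1}{n}\sum_{i=1}^{n}E(X_i)=\frac{1}{n}\,n\mu=\mu\]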

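However, the sample variance computed with divisor $n$, $S^2=\frac{1}{n}\sum_{i=1}^{n}(X_i-\overline{X})^2$, is not an unbiased estimator of the population variance $\sigma^2$, although it is consistent. Dividing by $n-1$ instead of $n$ gives an unbiased estimator of $\sigma^2$.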
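
The effect of the divisor can be checked exactly on a toy case. The sketch below assumes a hypothetical population (the six faces of a fair die) and enumerates every equally likely sample of size 2 drawn with replacement; the names `population` and `var_with_divisor` are illustrative only.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical population: the faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
n = 2  # sample size

# True population variance (divisor N).
sigma2 = pvariance(population)


def var_with_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / divisor


# Every ordered sample of size n drawn with replacement is equally likely,
# so a plain average over all of them is the exact expected value.
samples = list(product(population, repeat=n))
e_div_n = mean(var_with_divisor(s, n) for s in samples)
e_div_n_minus_1 = mean(var_with_divisor(s, n - 1) for s in samples)

print("population variance      :", sigma2)
print("E[variance, divisor n]   :", e_div_n)
print("E[variance, divisor n-1] :", e_div_n_minus_1)
```

Under these assumptions the divisor $n-1$ version averages to $\sigma^2$, while the divisor $n$ version averages to $\frac{n-1}{n}\sigma^2$.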

It is possible to have more than one unbiased estimator for an unknown parameter. For example, the sample mean and the sample median are both unbiased estimators of the population mean $\mu$ if the population distribution is symmetrical (and its mean exists).


What is Standard Error of Sampling? (2012)

The standard error (SE) of a statistic is the standard deviation of the sampling distribution of that statistic. The standard error of sampling reflects how much sampling fluctuation a statistic will show. The inferential statistics involved in constructing confidence intervals and significance testing are based on standard errors. Increasing the sample size decreases the standard error.

In practical applications, the true value of the standard deviation of the error is unknown. As a result, the term standard error is often used to refer to an estimate of this unknown quantity.

The size of the SE is affected by two values.

  1. The standard deviation of the population affects the standard error. The larger the population’s standard deviation ($\sigma$), the larger the SE, i.e. $\frac {\sigma}{\sqrt{n}}$. If the population is homogeneous (which results in a small population standard deviation), the SE will also be small.
  2. The number of observations in the sample affects the standard error. A large sample results in a small SE of the estimate (indicating less variability in the sample means).
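
Both effects can be seen by evaluating $\frac{\sigma}{\sqrt{n}}$ for a few values. This is a minimal sketch; the function name `standard_error` and the numbers are hypothetical.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


baseline = standard_error(sigma=10, n=25)       # reference case
larger_n = standard_error(sigma=10, n=100)      # four times the sample size
larger_sigma = standard_error(sigma=20, n=25)   # twice the population SD

print(baseline, larger_n, larger_sigma)
```

Quadrupling the sample size halves the SE, while doubling $\sigma$ doubles it.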

Application of Standard Error of Sampling

Standard errors are used in many statistical procedures, for example

  • to measure the spread of the sampling distribution of the sample means
  • to build confidence intervals for means, proportions, differences between means, etc., whether the population standard deviation is known or unknown
  • to determine the sample size
  • in control charts, to set control limits for means
  • in comparison tests such as the z-test, t-test, and Analysis of Variance
  • in relationship tests such as Correlation and Regression Analysis (standard error of regression), etc.

(1) Standard Error Formula for Means

The SE for the mean, i.e. the standard deviation of the sampling distribution of the mean, measures the variation in the sampling distribution of the sample mean. It is denoted by $\sigma_{\bar{x}}$ and calculated as a function of the standard deviation of the population and the size of the sample, i.e.

$\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}$                      (used when the population is infinite, or when sampling is with replacement)

If the population is finite with size $N$ and sampling is without replacement, then ${\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N-n}{N}}}$. The correction factor $\sqrt{\frac{N-n}{N}}$ tends towards 1 as $N$ tends to infinity, which recovers the first formula. (The exact factor has $N-1$ in the denominator; the two are nearly identical for large $N$.)

When the population’s standard deviation ($\sigma$) is unknown, we estimate it by the sample standard deviation $S$. In this case the SE is estimated by $\frac{S}{\sqrt{n}}$.


(2) Standard Error Formula for Proportion

The SE for a proportion is calculated in the same manner as the standard error of the mean. It is denoted by $\sigma_p$.

In the case of an infinite population, $\sigma_p=\frac{\sigma}{\sqrt{n}}$; in the case of a finite population of size $N$, $\sigma_p=\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N}}$.
In both cases $\sigma=\sqrt{p(1-p)}=\sqrt{pq}$, where $p$ is the probability that an element possesses the studied trait and $q=1-p$ is the probability that it does not.
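
As a sketch of this calculation, the function below takes $\sigma=\sqrt{p(1-p)}$ and applies the factor $\sqrt{\frac{N-n}{N}}$ only when a finite population size is supplied. The function name, its parameters, and the numbers are hypothetical.

```python
import math


def se_proportion(p, n, N=None):
    """SE of a sample proportion; pass N to apply the finite-population factor."""
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        # Finite population of size N: multiply by sqrt((N - n) / N).
        se *= math.sqrt((N - n) / N)
    return se


se_infinite = se_proportion(p=0.5, n=100)        # infinite population
se_finite = se_proportion(p=0.5, n=100, N=400)   # finite population of 400

print(se_infinite, se_finite)
```

The correction factor is below 1, so the SE for a finite population is never larger than the SE for an infinite one.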

(3) Standard Error Formula for Difference Between Means

The SE for the difference between two independent quantities is the square root of the sum of the squared standard errors of both quantities, i.e. $\sigma_{\bar{x}_1-\bar{x}_2}=\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$, where $\sigma_1^2$ and $\sigma_2^2$ are the respective variances of the two independent populations to be compared, and $n_1$ and $n_2$ are the respective sizes of the two samples drawn from those populations.

Unknown Population Variances
Suppose the variances of the two populations are unknown. In that case, we estimate them from the two samples, i.e. $\sigma_{\bar{x}_1-\bar{x}_2}$ is estimated by $\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}$, where $S_1^2$ and $S_2^2$ are the respective variances of the two samples drawn from their respective populations.

Equal Variances are assumed
When the variances of the two populations are assumed to be equal, we can estimate their common value with a pooled variance $S_p^2$, calculated as a function of $S_1^2$ and $S_2^2$, i.e.

\[S_p^2=\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}\]
\[\sigma_{\bar{x}_1-\bar{x}_2}=S_p \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\]
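
The two estimated versions (separate sample variances and pooled variance) can be sketched as follows. The function name `se_diff_means` and the data are hypothetical; `statistics.variance` uses the divisor $n-1$.

```python
import math
from statistics import variance


def se_diff_means(x1, x2, pooled=False):
    """Estimated SE of the difference between two independent sample means."""
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = variance(x1), variance(x2)  # sample variances (divisor n - 1)
    if pooled:
        # Equal population variances assumed: pool the two sample variances.
        sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
        return math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return math.sqrt(s1_sq / n1 + s2_sq / n2)


# Hypothetical samples of equal size.
group_a = [5, 7, 8, 6, 9]
group_b = [4, 6, 5, 7, 3]

se_separate = se_diff_means(group_a, group_b)
se_pooled = se_diff_means(group_a, group_b, pooled=True)

print(se_separate, se_pooled)
```

When $n_1=n_2$ the two formulas give the same value; they differ only for unequal sample sizes.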

(4) Standard Error for Difference between Proportions

The SE of the difference between two proportions is calculated in the same way as the SE of the difference between means is calculated i.e.
\begin{eqnarray*}
\sigma_{p_1-p_2}&=&\sqrt{\sigma_{p_1}^2+\sigma_{p_2}^2}\\
&=& \sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}
\end{eqnarray*}
where $p_1$ and $p_2$ are the proportions calculated from the two samples of sizes $n_1$ and $n_2$, each drawn from an infinite population.
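
A minimal sketch of this formula, assuming infinite populations and independent samples; the function name and the example figures are hypothetical.

```python
import math


def se_diff_proportions(p1, n1, p2, n2):
    """SE of the difference between two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical example: 40% of 200 units versus 30% of 150 units.
se_diff = se_diff_proportions(0.40, 200, 0.30, 150)
print(se_diff)
```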

FAQs about Standard Error

  1. Define the Standard Error of Mean.
  2. Standard Error is affected by which two values?
  3. Write the formula of the standard error of mean, proportion, and difference between means.
  4. What is the application of standard error of mean in Sampling?
  5. Discuss the importance of the standard error.

Truth about Bias in Statistics

Bias in Statistics is defined as the difference between the expected value of a statistic and the true value of the corresponding parameter. Therefore, the bias is a measure of the systematic error of an estimator. The bias indicates how far the expected value of the estimator lies from the true value of the parameter. For example, if we average a large number of estimates obtained from an unbiased estimator, the average will be close to the true value.

Bias in Statistics: The Difference between Expected and True Value

In other words, bias is a systematic error in measurement or sampling, and it tells how far off, on average, the model is from the truth. It is distinct from sampling error, which is random.

Gauss, C.F. (1821), during his work on the least-squares method, introduced the concept of an unbiased estimator.

The bias of an estimator of a parameter should not be confused with its degree of precision, as the degree of precision is a measure of the sampling error. Bias is the favoring of one group or outcome, intentionally or unintentionally, over other groups or outcomes available in the population under study. Unlike random errors, which can be reduced by increasing the sample size and averaging the outcomes, bias is a serious problem because it does not shrink as the sample size grows.


There are several types of bias, which should not be considered mutually exclusive:

  • Selection Bias (arises due to systematic differences between the groups compared)
  • Exclusion Bias (arises due to the systematic exclusion of certain individuals from the study)
  • Analytical Bias (arises due to the way that the results are evaluated)

Mathematically Bias can be defined as

Let a statistic $T$ be used to estimate a parameter $\theta$. If $E(T) = \theta +$ bias$(\theta)$, then bias$(\theta)$ is called the bias of the statistic $T$, where $E(T)$ represents the expected value of the statistic $T$.

Note that if bias$(\theta)=0$, then $E(T)=\theta$. So, $T$ is an unbiased estimator of the true parameter $\theta$.
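
As a worked illustration (a standard result, stated here for independent observations with finite variance $\sigma^2$), take $T=\frac{1}{n}\sum_{i=1}^{n}(X_i-\overline{X})^2$ as an estimator of $\theta=\sigma^2$:
\[E(T)=\frac{n-1}{n}\sigma^2=\sigma^2-\frac{\sigma^2}{n}, \qquad \text{bias}(\theta)=E(T)-\theta=-\frac{\sigma^2}{n}\]
The bias is negative, so $T$ underestimates $\sigma^2$ on average, and the bias vanishes as $n$ grows.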


Reference:
Gauss, C.F. (1821, 1823, 1826). Theoria Combinationis Observationum Erroribus Minimis Obnoxiae, Parts 1, 2 and suppl. Werke 4, 1-108.


Estimation and Types of Estimation in Statistics

This post introduces estimation and the types of estimation in statistics.

The procedure of making a judgment or decision about a population parameter is referred to as statistical estimation or simply estimation. Statistical estimation procedures provide estimates of population parameters with a desired degree of confidence. The degree of confidence can be controlled, in part, by the size of the sample (a larger sample gives greater accuracy of the estimate) and by the type of estimate made. Population parameters are estimated from sample data because it is usually impracticable to examine the entire population to make an exact determination.

The types of estimation used in statistics to estimate a population parameter are divided into two groups: (i) Point Estimation and (ii) Interval Estimation.

Point Estimation

The objective of point estimation is to obtain a single number from the sample which will represent the unknown value of the population parameter. Population parameters (population mean, variance, etc) are estimated from the corresponding sample statistics (sample mean, variance, etc).
A statistic used to estimate a parameter is called a point estimator or simply an estimator; the actual numerical value obtained from an estimator is called an estimate.

A population parameter is denoted by $\theta$, which is an unknown constant. The available information is in the form of a random sample $x_1,x_2,\cdots,x_n$ of size $n$ drawn from the population. We formulate a function of the sample observations $x_1,x_2,\cdots,x_n$; this function is the estimator of $\theta$ and is denoted by $\hat{\theta}$. Different random samples provide different values of the statistic $\hat{\theta}$. Thus $\hat{\theta}$ is a random variable with its own sampling distribution.

Interval Estimation

A point estimator (such as the sample mean) calculated from the sample data provides a single number as an estimate of the population parameter. This number cannot be expected to be exactly equal to the population parameter, because the mean of a sample taken from a population may assume different values for different samples. Therefore, we estimate an interval (a range of values) within which the population parameter is expected to lie with a certain degree of confidence. This range of values is known as an interval estimate, or an estimate by a confidence interval, and is defined by two numbers between which the population parameter is expected to lie.

For example, $a<\mu<b$ is an interval estimate of the population mean $\mu$, indicating that the population mean is expected to be greater than $a$ but less than $b$. The purpose of an interval estimate is to provide information about how close the point estimate is to the true parameter.


Note that the information developed about the shape of a sampling distribution of the sample mean i.e. Sampling Distribution of $\bar{x}$ allows us to locate an interval that has some specified probability of containing the population mean $\mu$.

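Interval estimate formula when the population standard deviation $\sigma$ is known and either the population is normal or the sample is large ($n>30$): $$\bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

Interval estimate formula when $\sigma$ is unknown, the sample is small ($n<30$), and the population is (approximately) normal: $$\bar{x} \pm t_{(n-1, \alpha/2)}\,\, \frac{s}{\sqrt{n}}$$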
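
The $Z$-based interval can be sketched with the Python standard library alone. The example assumes a known $\sigma$ and a two-sided interval; the function name `z_interval` and the numbers are hypothetical. The $t$-based interval has the same shape, but its critical value has to come from a $t$ table or a statistics package.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided interval xbar +/- z * sigma / sqrt(n), with sigma known."""
    # Critical value with (1 - confidence) / 2 probability in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical data summary: sample mean 50, known sigma 10, n = 100.
lower, upper = z_interval(xbar=50.0, sigma=10.0, n=100)
print(round(lower, 2), round(upper, 2))
```

For these inputs the 95% interval is roughly 48.04 to 51.96.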

Which of the two types of estimation in Statistics, do you like the most, and why?

The Types of Estimation in Statistics are as follows:

  • Point estimation is nice because it provides an exact point estimate of the population value. It provides you with the single best guess of the value of the population parameter.
  • Interval estimation is nice because it allows you to make statements of confidence that an interval will include the true population value.
