Category: Estimator Properties

Sufficient statistics and Sufficient Estimators

An estimator $\hat{\theta}$ is sufficient if it makes so much use of the information in the sample that no other estimator could extract additional information about the population parameter being estimated from the same sample.

The sample mean $\overline{X}$ utilizes all the values included in the sample, so it is a sufficient estimator of the population mean $\mu$.

Sufficient estimators are often used to develop the estimator that has minimum variance among all unbiased estimators (MVUE).

If a sufficient estimator exists, no other estimator computed from the sample can provide additional information about the population parameter being estimated.

If there is a sufficient estimator, then there is no need to consider any of the non-sufficient estimators. A good estimator is a function of sufficient statistics.

Let $X_1,X_2,\cdots,X_n$ be a random sample from a probability distribution with unknown parameter $\theta$. A statistic (estimator) $U=g(X_1,X_2,\cdots,X_n)$ is sufficient for $\theta$ if the conditional distribution of the sample observations given $U$ does not depend upon the population parameter $\theta$.

Sufficient Statistic Example

The sample mean $\overline{X}$ is sufficient for the population mean $\mu$ of a normal distribution with known variance. Once the sample mean is known, no further information about the population mean $\mu$ can be obtained from the sample itself. The sample median, by contrast, is not sufficient for the mean: even if the median of the sample is known, the sample itself would still provide further information about the population mean $\mu$.

Mathematical Definition of Sufficiency

Suppose that $X_1,X_2,\cdots,X_n \sim p(x;\theta)$. $T$ is sufficient for $\theta$ if the conditional distribution of $X_1,X_2,\cdots, X_n|T$ does not depend upon $\theta$. Thus
\[p(x_1,x_2,\cdots,x_n|t;\theta)=p(x_1,x_2,\cdots,x_n|t)\]
This means that we can replace $X_1,X_2,\cdots,X_n$ with $T(X_1,X_2,\cdots,X_n)$ without losing information.
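The following is a minimal numerical sketch of my own (not from the original article) illustrating this definition for a Bernoulli($p$) sample, where $T=\sum X_i$ is sufficient for $p$: the conditional probability of any particular sample given $T=t$ works out to $1/\binom{n}{t}$, no matter which value of $p$ is plugged in.

```python
# Sketch: for Bernoulli(p) data, the conditional distribution of the sample
# given T = sum(X_i) is free of p, so T is sufficient for p.
from math import comb

def conditional_prob(sample, p):
    """P(X_1..X_n = sample | T = sum(sample)) under Bernoulli(p)."""
    n, t = len(sample), sum(sample)
    joint = (p ** t) * ((1 - p) ** (n - t))            # P(exact sample)
    p_t = comb(n, t) * (p ** t) * ((1 - p) ** (n - t)) # P(T = t), binomial
    return joint / p_t                                  # simplifies to 1 / C(n, t)

sample = (1, 0, 1, 1, 0)        # an arbitrary sample with t = 3
for p in (0.2, 0.5, 0.8):       # the answer should not change with p
    print(p, conditional_prob(sample, p))   # always 0.1 = 1 / C(5, 3)
```

Whatever value of $p$ is used, the conditional probability stays at $1/\binom{5}{3}=0.1$, which is exactly what sufficiency requires.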

For further reading visit: https://en.wikipedia.org/wiki/Sufficient_statistic

Consistent Estimator

A statistic is a consistent estimator of a population parameter if, as the sample size increases, it becomes almost certain that the value of the statistic comes closer and closer to the value of the population parameter. If an estimator (statistic) is consistent, it becomes more reliable with a large sample ($n \to \infty$). This means that the distribution of the estimates becomes more and more concentrated near the value of the population parameter being estimated, so that the probability of the estimator being arbitrarily close to $\theta$ converges to one (a sure event).

The estimator $\hat{\theta}_n$ is said to be a consistent estimator of $\theta$ if for any positive $\varepsilon$;
\[\lim_{n \rightarrow \infty} P\left[|\hat{\theta}_n-\theta| \le \varepsilon\right]=1\]
or
\[\lim_{n\rightarrow \infty} P\left[|\hat{\theta}_n-\theta|> \varepsilon\right]=0\]

Here $\hat{\theta}_n$ denotes the estimator of $\theta$ calculated from a sample of size $n$.
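As a small simulation sketch (my own illustration, with an assumed Exponential population), the proportion of samples whose mean lands farther than $\varepsilon$ from the true mean shrinks toward zero as $n$ grows, matching the limit definition above.

```python
# Sketch: empirical check of the consistency definition for the sample mean.
import numpy as np

rng = np.random.default_rng(1)
mu, epsilon, reps = 1.0, 0.1, 5_000          # Exponential(rate=1) has mean 1

for n in (10, 100, 1_000, 10_000):
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    prob_far = np.mean(np.abs(xbar - mu) > epsilon)
    print(f"n = {n:>6}: P(|xbar - mu| > {epsilon}) ~ {prob_far:.4f}")
```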

Consistency of an Estimator

The sample median is a consistent estimator of the population mean if the population distribution is symmetrical; otherwise, the sample median approaches the population median, not the population mean.

The sample estimate of the standard deviation is biased but consistent, since the distribution of $\hat{\sigma}^2$ becomes more and more concentrated at $\sigma^2$ as the sample size increases. Both points can be seen in a quick simulation (see the sketch below).
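The following is a rough simulation sketch of my own (the Normal(5, 2²) population is an assumption, not from the article): for a symmetric population the sample median settles near the population mean, while the divisor-$n$ sample standard deviation is biased downward for small $n$ but concentrates around $\sigma$ as $n$ grows.

```python
# Sketch: median -> population mean (symmetric case); sd is biased but consistent.
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, reps = 5.0, 2.0, 5_000

for n in (10, 100, 1_000):
    x = rng.normal(mu, sigma, size=(reps, n))
    med = np.median(x, axis=1)
    sd = x.std(axis=1, ddof=0)            # divisor-n estimator of sigma
    print(f"n = {n:>5}: E[median] ~ {med.mean():.3f}, "
          f"E[sd] ~ {sd.mean():.3f}, spread of sd ~ {sd.std():.3f}")
```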

A sample statistic can be an inconsistent estimator. A consistent statistic is unbiased in the limit (asymptotically unbiased), but an unbiased estimator may or may not be consistent.

Note that these two properties are not equivalent: (i) unbiasedness is a statement about the expected value of the sampling distribution of the estimator, while (ii) consistency is a statement about where the sampling distribution of the estimator is heading as the sample size increases.

A consistent estimator has negligible errors (variations) as the sample size increases indefinitely. More specifically, the probability that those errors exceed any given amount approaches zero as the sample size increases. In other words, the more data you collect, the closer a consistent estimator gets to the real population parameter you are trying to measure. The sample mean ($\overline{X}$) and sample variance ($S^2$) are two well-known consistent estimators.
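To illustrate the distinction drawn above, here is a hypothetical contrast of my own devising: the "first observation" estimator $\hat{\theta}=X_1$ is unbiased for $\mu$ but not consistent, because its sampling distribution never tightens as $n$ grows, whereas the sample mean is both unbiased and consistent.

```python
# Sketch: unbiased-but-inconsistent (X_1) vs. unbiased-and-consistent (xbar).
import numpy as np

rng = np.random.default_rng(42)
mu, reps = 10.0, 5_000

for n in (10, 100, 1_000):
    x = rng.normal(mu, 3.0, size=(reps, n))
    first_obs = x[:, 0]                 # E[X_1] = mu, but spread never shrinks
    xbar = x.mean(axis=1)               # E[xbar] = mu, spread shrinks with n
    print(f"n = {n:>5}: sd(X1) = {first_obs.std():.3f}, "
          f"sd(xbar) = {xbar.std():.3f}")
```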

Unbiasedness of Estimator

Unbiasedness is probably the most important property that a good estimator should possess. In statistics, the bias (or bias function) of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator is said to be unbiased if its expected value equals the corresponding population parameter; otherwise, it is said to be biased. Let us consider the unbiasedness of an estimator in more detail.

Unbiased Estimator

Suppose that in the realization of a random variable $X$ taking values in a probability space ($\chi, \mathfrak{F},P_\theta$), where $\theta \in \Theta$, a function $f:\Theta \rightarrow \Omega$ has to be estimated, mapping the parameter set $\Theta$ into a certain set $\Omega$, and that a statistic $T=T(X)$ is chosen as an estimator of $f(\theta)$. If $T$ is such that
\[E_\theta[T]=\int_\chi T(x) dP_\theta(x)=f(\theta)\]
holds for every $\theta \in \Theta$, then $T$ is called an unbiased estimator of $f(\theta)$. An unbiased estimator is frequently said to be free of systematic errors.

Let $\hat{\theta}$ be an estimator of a parameter $\theta$; then $\hat{\theta}$ is said to be an unbiased estimator if $E(\hat{\theta})=\theta$, that is, if its bias $E(\hat{\theta})-\theta$ equals zero.

  • If $E(\hat{\theta})=\theta$ then $\hat{\theta}$ is an unbiased estimator of a parameter $\theta$.
  • If $E(\hat{\theta})<\theta$ then $\hat{\theta}$ is a negatively biased estimator of a parameter $\theta$.
  • If $E(\hat{\theta})>\theta$ then $\hat{\theta}$ is a positively biased estimator of a parameter $\theta$.

The bias of an estimator of $\theta$ is given by $[E(\hat{\theta})-\theta]$.

$\overline{X}$ is an unbiased estimator of the mean of a population (whose mean exists). $\overline{X}$ is an unbiased estimator of $\mu$ in a Normal distribution i.e. $N(\mu, \sigma^2)$. $\overline{X}$ is an unbiased estimator of the parameter $p$ of the Bernoulli distribution. $\overline{X}$ is an unbiased estimator of the parameter $\lambda$ of the Poisson distribution. In each of these cases, the parameter $\mu, p$ or $\lambda$ is the mean of the respective population being sampled.
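A quick Monte Carlo check of my own (illustrative only; the particular parameter values are assumptions) shows the average of many simulated sample means sitting close to the population mean in the Normal, Bernoulli, and Poisson cases listed above ($\mu$, $p$, and $\lambda$ respectively).

```python
# Sketch: empirical E[xbar] matches the population mean in each case.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 20_000

cases = {
    "Normal(mu=4, sd=2)":   (rng.normal(4.0, 2.0, (reps, n)),  4.0),
    "Bernoulli(p=0.3)":     (rng.binomial(1, 0.3, (reps, n)),  0.3),
    "Poisson(lambda=2.5)":  (rng.poisson(2.5, (reps, n)),      2.5),
}
for name, (samples, true_value) in cases.items():
    xbar = samples.mean(axis=1)
    print(f"{name:22s} E[xbar] ~ {xbar.mean():.4f}  (true {true_value})")
```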

However, the sample variance $S^2=\frac{1}{n}\sum_{i=1}^{n}(X_i-\overline{X})^2$ (with divisor $n$) is not an unbiased estimator of the population variance $\sigma^2$, though it is consistent.
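Here is a minimal bias check of my own (assuming, as in the text, that $S^2$ uses divisor $n$; the $\sigma^2=9$ population is an assumption): the divisor-$n$ variance underestimates $\sigma^2$ on average, the divisor-$(n-1)$ version does not, yet both tighten around $\sigma^2$ as $n$ increases.

```python
# Sketch: divisor-n variance is biased but consistent; divisor-(n-1) is unbiased.
import numpy as np

rng = np.random.default_rng(3)
sigma2, reps = 9.0, 20_000

for n in (5, 30, 300):
    x = rng.normal(0.0, 3.0, size=(reps, n))
    s2_n   = x.var(axis=1, ddof=0)      # divisor n     (biased)
    s2_nm1 = x.var(axis=1, ddof=1)      # divisor n - 1 (unbiased)
    print(f"n = {n:>4}: E[S^2, n] ~ {s2_n.mean():.3f}, "
          f"E[S^2, n-1] ~ {s2_nm1.mean():.3f}")
```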

It is possible to have more than one unbiased estimator for an unknown parameter. For example, the sample mean and the sample median are both unbiased estimators of the population mean $\mu$ if the population distribution is symmetrical.
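The following illustrative sketch (my own, with an assumed Normal(50, 10²) population) shows that both the sample mean and the sample median average out to $\mu$, i.e. both are unbiased, but the mean has the smaller variance, which is the idea behind the minimum-variance unbiased estimator (MVUE) mentioned earlier.

```python
# Sketch: two unbiased estimators of mu; the sample mean has smaller variance.
import numpy as np

rng = np.random.default_rng(11)
mu, n, reps = 50.0, 25, 20_000

x = rng.normal(mu, 10.0, size=(reps, n))
xbar, med = x.mean(axis=1), np.median(x, axis=1)
print(f"E[mean]   ~ {xbar.mean():.3f},  Var[mean]   ~ {xbar.var():.3f}")
print(f"E[median] ~ {med.mean():.3f},  Var[median] ~ {med.var():.3f}")
```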
