# Basic Statistics and Data Analysis

## Sufficient statistics and Sufficient Estimators

An estimator $\hat{\theta}$ is sufficient if it makes so much use of the information in the sample that no other estimator could extract additional information from the sample about the population parameter being estimated.

The sample mean $\overline{X}$ uses all the values included in the sample, so it is a sufficient estimator of the population mean $\mu$ (for example, when sampling from a normal distribution with known variance).

Sufficient estimators are often used to develop the estimator that has minimum variance among all unbiased estimators (MVUE).

If a sufficient estimator exists, no other estimator computed from the sample can provide additional information about the population parameter being estimated.

If there is a sufficient estimator, then there is no need to consider any of the non-sufficient estimators. Good estimators are functions of sufficient statistics.

Let $X_1,X_2,\cdots,X_n$ be a random sample from a probability distribution with unknown parameter $\theta$. A statistic (estimator) $U=g(X_1,X_2,\cdots,X_n)$ is sufficient for $\theta$ if the conditional distribution of the sample $X_1,X_2,\cdots,X_n$, given the observed value of $U$, does not depend upon the population parameter $\theta$.

## Sufficient Statistic Example

The sample mean $\overline{X}$ is sufficient for the population mean $\mu$ of a normal distribution with known variance. Once the sample mean is known, no further information about the population mean $\mu$ can be obtained from the sample itself. The sample median, in contrast, is not sufficient for the mean: even if the median of the sample is known, knowing the sample itself would provide further information about the population mean $\mu$.

## Mathematical Definition of Sufficiency

Suppose that $X_1,X_2,\cdots,X_n \sim p(x;\theta)$. $T$ is sufficient for $\theta$ if the conditional distribution of $X_1,X_2,\cdots, X_n|T$ does not depend upon $\theta$. Thus
$p(x_1,x_2,\cdots,x_n|t;\theta)=p(x_1,x_2,\cdots,x_n|t)$
This means that we can replace $X_1,X_2,\cdots,X_n$ with $T(X_1,X_2,\cdots,X_n)$ without losing information.
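
The definition can be checked by direct enumeration in a small case. The sketch below assumes a Bernoulli sample of size 3 and takes $T=\sum X_i$ as the statistic. Both of these choices, the two values of $p$, and the helper name `conditional_given_sum` are illustrative assumptions rather than part of the definition above. It computes $p(x_1,x_2,x_3|t;p)$ exactly for two different values of $p$ and compares the results.

```python
from fractions import Fraction
from itertools import product

def conditional_given_sum(n, p):
    """Return {sample: P(sample | T = sum(sample))} for n Bernoulli(p) draws."""
    # Joint probability of every possible 0/1 sample of size n
    joint = {}
    for x in product((0, 1), repeat=n):
        t = sum(x)
        joint[x] = p**t * (1 - p)**(n - t)
    # Marginal probability of each value of the statistic T = sum of the sample
    p_t = {}
    for x, pr in joint.items():
        p_t[sum(x)] = p_t.get(sum(x), 0) + pr
    # Conditional probability of each sample given its own value of T
    return {x: pr / p_t[sum(x)] for x, pr in joint.items()}

cond_a = conditional_given_sum(3, Fraction(1, 5))    # p = 0.2
cond_b = conditional_given_sum(3, Fraction(7, 10))   # p = 0.7

print(cond_a == cond_b)     # True: the conditional law is the same for both p
print(cond_a[(1, 0, 0)])    # 1/3, i.e. one of the 3 samples with exactly one success
```

Because the conditional probabilities are identical for both values of $p$, the sum carries all the information about $p$ in this small example, which is what the definition requires.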

## Consistent Estimator

A statistic is a consistent estimator of a population parameter if, as the sample size increases, it becomes almost certain that the value of the statistic comes closer to the value of the population parameter. If an estimator is consistent, it becomes more reliable with larger samples. This means that the distribution of the estimates becomes more and more concentrated near the value of the population parameter being estimated, such that the probability of the estimator being arbitrarily close to $\theta$ converges to one (a sure event).

The estimator $\hat{\theta}_n$ is said to be a consistent estimator of $\theta$ if, for any positive $\varepsilon$,
$\lim_{n \rightarrow \infty} P[|\hat{\theta}_n-\theta| \le \varepsilon]=1$
or, equivalently,
$\lim_{n\rightarrow \infty} P[|\hat{\theta}_n-\theta|> \varepsilon]=0$

Here $\hat{\theta}_n$ denotes the estimator of $\theta$ calculated from a sample of size $n$.
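
The limit statement can be illustrated with a small simulation. The following is only a sketch under assumed settings: a normal population with $\mu=10$ and $\sigma=2$, a tolerance $\varepsilon=0.5$, and the sample mean as $\hat{\theta}_n$. The function name `prob_far_from_mean` and all numeric values are chosen for illustration. It estimates $P[|\hat{\theta}_n-\theta|>\varepsilon]$ for increasing $n$.

```python
import random
import statistics

def prob_far_from_mean(n, eps=0.5, mu=10.0, sigma=2.0, reps=2000, seed=1):
    """Estimate P(|sample mean - mu| > eps) by simulation for samples of size n."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    far = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(statistics.fmean(sample) - mu) > eps:
            far += 1
    return far / reps

# Estimated probability of missing mu by more than eps, for growing n
results = {n: prob_far_from_mean(n) for n in (5, 50, 500)}
for n, p_far in results.items():
    print(n, p_far)
```

Under these assumptions the estimated probability should shrink toward zero as $n$ grows, which is the behaviour the second form of the definition describes.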

The sample median is a consistent estimator of the population mean if the population distribution is symmetrical; otherwise, the sample median approaches the population median, not the population mean.
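
A quick way to see the asymmetric case is to draw a large sample from a skewed population. The sketch below assumes an exponential population with rate 1, whose mean is 1 and whose median is $\ln 2 \approx 0.693$. This distribution, the sample size, and the seed are illustrative choices, not part of the statement above.

```python
import math
import random
import statistics

rng = random.Random(7)                                   # fixed seed for reproducibility
sample = [rng.expovariate(1.0) for _ in range(100_000)]  # skewed (exponential) population

sample_mean = statistics.fmean(sample)
sample_median = statistics.median(sample)

print("sample mean  :", round(sample_mean, 3), "(population mean is 1.0)")
print("sample median:", round(sample_median, 3),
      "(population median is", round(math.log(2), 3), ")")
```

With a sample this large, the sample median should settle near the population median rather than the population mean, while the sample mean settles near the population mean.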

The sample estimate of the variance, $\hat{\sigma}^2$ (computed with divisor $n$), is biased but consistent, because the distribution of $\hat{\sigma}^2$ becomes more and more concentrated at $\sigma^2$ as the sample size increases.

A sample statistic can be an inconsistent estimator. A consistent estimator is unbiased in the limit, but an unbiased estimator may or may not be a consistent estimator.

Note that these two properties are not equivalent: (i) unbiasedness is a statement about the expected value of the sampling distribution of the estimator, while (ii) consistency is a statement about “where the sampling distribution of the estimator is going” as the sample size increases.
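
One standard illustration of an unbiased but inconsistent estimator is to use only the first observation $X_1$ as the estimator of $\mu$. The sketch below assumes a standard normal population and a tolerance of 0.5. The names `simulate` and `first_obs` are hypothetical helpers introduced only for this example.

```python
import random
import statistics

def simulate(estimator, n, mu=0.0, sigma=1.0, eps=0.5, reps=2000, seed=3):
    """Return (average estimate, share of estimates farther than eps from mu)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    far = sum(abs(e - mu) > eps for e in estimates) / reps
    return statistics.fmean(estimates), far

def first_obs(sample):
    """Hypothetical estimator of mu that ignores everything except X_1."""
    return sample[0]

# Same estimator, small and large sample sizes
summary = {n: simulate(first_obs, n) for n in (10, 200)}
for n, (avg, far) in summary.items():
    print(n, round(avg, 3), round(far, 3))
```

In this setup the average estimate should stay close to $\mu$ for every $n$ (unbiasedness), while the share of estimates far from $\mu$ should not shrink as $n$ grows (no consistency), since a larger sample does not change the distribution of $X_1$.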

## Unbiasedness of estimator

Unbiasedness is probably the most important property that a good estimator should possess. In statistics, the bias (or bias function) of an estimator is the difference between the estimator’s expected value and the true value of the parameter being estimated. An estimator is said to be unbiased if its expected value equals the corresponding population parameter; otherwise, it is said to be biased.

## Unbiased Estimator

Suppose that a random variable $X$ takes values in a probability space $(\chi, \mathfrak{F},P_\theta)$ with $\theta \in \Theta$, and that a function $f:\Theta \rightarrow \Omega$, mapping the parameter set $\Theta$ into a certain set $\Omega$, has to be estimated. Suppose further that a statistic $T=T(X)$ is chosen as an estimator of $f(\theta)$. If $T$ is such that
$E_\theta[T]=\int_\chi T(x) dP_\theta(x)=f(\theta)$
holds for all $\theta \in \Theta$, then $T$ is called an unbiased estimator of $f(\theta)$. An unbiased estimator is frequently said to be free of systematic errors.

Let $\hat{\theta}$ be an estimator of a parameter $\theta$. Then $\hat{\theta}$ is said to be an unbiased estimator if $E(\hat{\theta})=\theta$.

• If $E(\hat{\theta})=\theta$ then $\hat{\theta}$ is an unbiased estimator of a parameter $\theta$.
• If $E(\hat{\theta})<\theta$ then $\hat{\theta}$ is a negatively biased estimator of a parameter $\theta$.
• If $E(\hat{\theta})>\theta$ then $\hat{\theta}$ is a positively biased estimator of a parameter $\theta$.

The bias of an estimator $\hat{\theta}$ is given by $E(\hat{\theta})-\theta$.

$\overline{X}$ is an unbiased estimator of the mean of a population (whose mean exists). $\overline{X}$ is an unbiased estimator of $\mu$ in a Normal distribution i.e. $N(\mu, \sigma^2)$. $\overline{X}$ is an unbiased estimator of the parameter $p$ of the Bernoulli distribution. $\overline{X}$ is an unbiased estimator of the parameter $\lambda$ of the Poisson distribution. In each of these cases, the parameter $\mu, p$ or $\lambda$ is the mean of the respective population being sampled.

However, the sample variance $\hat{\sigma}^2$ computed with divisor $n$ is not an unbiased estimator of the population variance $\sigma^2$, although it is consistent.
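
The bias of the variance estimator with divisor $n$ can be computed exactly in a small case by listing every possible sample. The sketch below assumes the population is a single roll of a fair die and that samples of size 3 are drawn independently. The population, the sample size, and the helper `var` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3, 4, 5, 6]   # assumed population: one roll of a fair die
n = 3                         # assumed sample size

def var(xs, divisor):
    """Sum of squared deviations from the mean of xs, divided by `divisor`."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / divisor

pop_var = var(values, len(values))           # population variance sigma^2

samples = list(product(values, repeat=n))    # all 216 equally likely samples
exp_div_n = sum(var(s, n) for s in samples) / len(samples)        # E[divisor-n estimator]
exp_div_n1 = sum(var(s, n - 1) for s in samples) / len(samples)   # E[divisor-(n-1) estimator]

print(pop_var, exp_div_n, exp_div_n1)        # 35/12 35/18 35/12
print("bias with divisor n:", exp_div_n - pop_var)   # -35/36
```

Here the expected value with divisor $n$ equals $\frac{n-1}{n}\sigma^2$, so the estimator is negatively biased, while dividing by $n-1$ removes the bias. Since $\frac{n-1}{n}$ tends to 1, the bias vanishes as $n$ grows.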

It is possible to have more than one unbiased estimator for an unknown parameter. The sample mean and the sample median are both unbiased estimators of the population mean $\mu$ if the population distribution is symmetrical.
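
This claim can be verified exactly for a small symmetric population by enumerating all samples. The sketch below assumes a discrete population that puts equal probability on the values 0, 1, 5, 9, 10 (symmetric about 5) and independent samples of size 3. These values are illustrative assumptions only.

```python
from fractions import Fraction
from itertools import product
from statistics import median

values = [0, 1, 5, 9, 10]     # assumed population, symmetric about its mean 5
n = 3                         # odd sample size, so the median is a sample value

samples = list(product(values, repeat=n))    # all equally likely samples

# Exact expected values of the two estimators over all samples
exp_mean = sum(Fraction(sum(s), n) for s in samples) / len(samples)
exp_median = Fraction(sum(median(s) for s in samples), len(samples))

print(exp_mean, exp_median)   # 5 5
```

Both expected values equal the population mean, so both estimators are unbiased in this example, even though they generally give different values on any single sample.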