Unbiasedness

Unbiasedness is a statistical concept that describes the accuracy of an estimator. An estimator is said to be an unbiased estimator if its expected value (or average value over many samples) equals the corresponding population parameter, that is, $E(\hat{\theta}) = \theta$.

If the expected value of an estimator $\hat{\theta}$ is not equal to the corresponding parameter, then the estimator is biased. The bias of an estimator $\hat{\theta}$ is defined as

$$\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$

Note that $\overline{X}$ is an unbiased estimator of the mean of a population. Therefore,

  • $\overline{X}$ is an unbiased estimator of the parameter $\mu$ in the normal distribution.
  • $\overline{X}$ is an unbiased estimator of the parameter $p$ in the Bernoulli distribution.
  • $\overline{X}$ is an unbiased estimator of the parameter $\lambda$ in the Poisson distribution.

However, the expected value of the sample variance $S^2=\frac{\sum\limits_{i=1}^n (X_i - \overline{X})^2 }{n}$ is not equal to the population variance; in fact, $E(S^2) = \frac{n-1}{n}\sigma^2 \ne \sigma^2$.

Therefore, the sample variance (with divisor $n$) is not an unbiased estimator of the population variance $\sigma^2$; on average it underestimates $\sigma^2$.
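The bias of $S^2$ can be checked numerically. The sketch below is a minimal Monte Carlo illustration in Python (standard library only), assuming a normal population with $\mu = 0$, $\sigma = 2$ and samples of size $n = 5$; these values, the seed, and the function name `average_variance_estimates` are illustrative choices, not part of the original. Averaged over many samples, the divisor-$n$ estimates should land near $\frac{n-1}{n}\sigma^2 = 3.2$, while the divisor-$(n-1)$ estimates should land near $\sigma^2 = 4$.

```python
import random

def average_variance_estimates(n, sigma=2.0, reps=20000, seed=1):
    """Monte Carlo averages of the divisor-n and divisor-(n-1) variance estimators."""
    rng = random.Random(seed)
    sum_div_n = 0.0
    sum_div_n_minus_1 = 0.0
    for _ in range(reps):
        # assumed population: normal with mean 0 and standard deviation sigma
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        sum_div_n += ss / n                        # S^2 as defined above (divisor n)
        sum_div_n_minus_1 += ss / (n - 1)          # Bessel-corrected version
    return sum_div_n / reps, sum_div_n_minus_1 / reps

n, sigma = 5, 2.0
avg_div_n, avg_div_n_minus_1 = average_variance_estimates(n, sigma)
print("true sigma^2:", sigma ** 2)
print("average of divisor-n estimates:", round(avg_div_n, 3), "theory:", (n - 1) / n * sigma ** 2)
print("average of divisor-(n-1) estimates:", round(avg_div_n_minus_1, 3))
```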

Note that it is possible to have more than one unbiased estimator for an unknown parameter. For example, the sample mean and sample median are both unbiased estimators of the population mean $\mu$ if the population distribution is symmetrical.

Question: Show that the sample mean is an unbiased estimator of the population mean.

Solution:

Let $X_1, X_2, \cdots, X_n$ be a random sample of size $n$ from a population having mean $\mu$. The sample mean $\overline{X}$ is

$$\overline{X} = \frac{1}{n} \sum\limits_{i=1}^n X_i$$

We must show that $E(\overline{X})=\mu$. Taking the expectation of both sides,

\begin{align*}
E(\overline{X}) &= E\left[\frac{1}{n} \sum\limits_{i=1}^n X_i \right]\\
&= \frac{1}{n} E\left(\sum\limits_{i=1}^n X_i\right) = \frac{1}{n} E(X_1 + X_2 + \cdots + X_n)\\
&= \frac{1}{n} \left[E(X_1) + E(X_2) + \cdots + E(X_n) \right]
\end{align*}

In a random sample, each of the random variables $X_1, X_2, \cdots, X_n$ has the same distribution as the population, so $E(X_1)=E(X_2)=\cdots=E(X_n)=\mu$. (Independence is not needed for this step; linearity of expectation is enough.) So,

$$E(\overline{X}) = \frac{1}{n}(\mu+\mu+\cdots + \mu) = \frac{n\mu}{n} = \mu$$
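As a numerical sanity check of the proof, here is a minimal Monte Carlo sketch, assuming an exponential population with mean $\mu = 3$. The skewed population is a deliberate choice: the proof only uses $E(X_i)=\mu$, not symmetry. The sample size, number of repetitions, seed, and function name `mean_of_sample_means` are hypothetical; the average of the simulated $\overline{X}$ values should be close to 3.

```python
import random

def mean_of_sample_means(mu=3.0, n=10, reps=20000, seed=2):
    """Average X-bar over many samples from an assumed exponential population with mean mu."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # expovariate takes the rate lambda = 1/mu, so each X_i has mean mu
        sample = [rng.expovariate(1.0 / mu) for _ in range(n)]
        total += sum(sample) / n
    return total / reps

estimate = mean_of_sample_means()
print(round(estimate, 3))  # should be close to mu = 3.0
```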

Why Unbiasedness is Important

  • Accuracy: Unbiasedness is a measure of accuracy, not precision. Unbiased estimators provide accurate estimates on average, reducing the risk of systematic errors. However, an unbiased estimator can still have a large variance, meaning its individual estimates can be far from the true value.
  • Consistency: An unbiased estimator is not necessarily consistent. Consistency refers to the tendency of an estimator to converge to the true value as the sample size increases.
  • Foundation for Further Analysis: Unbiased estimators are often used as building blocks for more complex statistical procedures.

Unbiasedness Example

Imagine you’re trying to estimate the average height of students at your university. If you randomly sample 100 students and calculate their average height, this sample average is an estimator of the true average height of all students at that university. Any single sample average will usually differ somewhat from the true value, but if the average of these sample averages over many repeated random samples equals the true average height of the entire student population, then your estimator is unbiased.

Unbiasedness is the state of being free from bias, prejudice, or favoritism. It can also mean being able to judge fairly without being influenced by one’s own opinions. In statistics, it refers to (i) a sample that is not affected by extraneous factors or selectivity, and (ii) an estimator whose expected value is equal to the parameter being estimated.

Applications and Uses of Unbiasedness

  • Parameter Estimation:
    • Mean: The sample mean is an unbiased estimator of the population mean.
    • Variance: The sample variance, with a slight adjustment (Bessel’s correction: dividing by $n-1$ instead of $n$), is an unbiased estimator of the population variance.
    • Regression Coefficients: In linear regression, the ordinary least squares (OLS) estimators of the regression coefficients are unbiased under certain assumptions.
  • Hypothesis Testing:
    • Unbiased estimators are often used in hypothesis tests to make inferences about population parameters. For example, the t-test for comparing means relies on the assumption that the sample means are unbiased estimators of the population means.
  • Machine Learning: In some machine learning algorithms, unbiased estimators are preferred for model parameters to avoid systematic errors.
  • Survey Sampling: Unbiased sampling techniques, such as simple random sampling, are used to ensure that the sample is representative of the population and that the estimates obtained from the sample are unbiased.
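The regression claim can be illustrated by simulation. The sketch below assumes a simple linear model $y = \beta_0 + \beta_1 x + e$ with fixed design points $x = 1, \dots, 20$, errors $e \sim N(0, 1)$ with mean zero, and hypothetical true values $\beta_0 = 1$, $\beta_1 = 2.5$. These are one example of the "certain assumptions" under which OLS is unbiased, not the only ones; the helper names `ols_slope` and `average_ols_slope` are made up for illustration. Under these assumptions the OLS slope estimates should average out to about 2.5.

```python
import random

def ols_slope(x, y):
    """OLS slope b1 = S_xy / S_xx for a simple linear regression."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def average_ols_slope(beta0=1.0, beta1=2.5, sigma=1.0, reps=5000, seed=3):
    """Average the OLS slope over repeated samples from an assumed linear model."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 21)]  # fixed design points x = 1, ..., 20
    total = 0.0
    for _ in range(reps):
        # assumed model: y = beta0 + beta1 * x + e, with e ~ N(0, sigma^2)
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        total += ols_slope(x, y)
    return total / reps

avg_slope = average_ols_slope()
print(round(avg_slope, 4))  # should be close to the assumed beta1 = 2.5
```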


Properties of a Good Estimator

Introduction (Properties of a Good Estimator)

This post discusses the properties of a good estimator. In statistics, an estimator is a function of sample data used to estimate an unknown population parameter. A good estimator is both efficient and unbiased. An estimator is considered a good estimator if it satisfies the following properties:

  • Unbiasedness
  • Consistency
  • Efficiency
  • Sufficiency
  • Invariance

Let us discuss these properties of a good estimator one by one.

Unbiasedness

An estimator is said to be an unbiased estimator if its expected value (that is, the mean of its sampling distribution) is equal to the true value of the population parameter. Let $\hat{\theta}$ be an estimator of the population parameter $\theta$. If $E(\hat{\theta}) = \theta$, the estimator $\hat{\theta}$ is unbiased. If $E(\hat{\theta})\ne \theta$, then $\hat{\theta}$ is a biased estimator of $\theta$.

  • If $E(\hat{\theta}) > \theta$, then $\hat{\theta}$ will be positively biased.
  • If $E(\hat{\theta}) < \theta$, then $\hat{\theta}$ will be negatively biased.

Some examples of biased or unbiased estimators are:

  • $\overline{X}$ is an unbiased estimator of $\mu$, that is, $E(\overline{X}) = \mu$
  • The sample median $\widetilde{X}$ is also an unbiased estimator of $\mu$ when the population is normally distributed, that is, $E(\widetilde{X}) =\mu$
  • The sample variance $S^2$ (with divisor $n$) is a biased estimator of $\sigma^2$, that is, $E(S^2)\ne \sigma^2$
  • The sample proportion $\hat{p} = \frac{x}{n}$ is an unbiased estimator of $p$, that is, $E(\hat{p})=p$

Unbiasedness means that if the sampling process is repeated many times and the estimator is computed for each sample, the average of these estimates will be very close to the true population parameter.

An unbiased estimator does not systematically overestimate or underestimate the true parameter.
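The repeated-sampling interpretation above can be sketched directly. This minimal example assumes Bernoulli data with $p = 0.3$ and $n = 25$ for $\hat{p}$, and a normal population with $\mu = 10$, $\sigma = 2$, $n = 11$ for the sample median; all of these values, the seed, and the function name `average_estimates` are illustrative. Both averages should sit close to the true parameters.

```python
import random
import statistics

def average_estimates(reps=20000, seed=4):
    """Average p-hat (Bernoulli data) and the sample median (normal data) over many samples."""
    rng = random.Random(seed)
    p, n_bern = 0.3, 25                # assumed Bernoulli parameter and sample size
    mu, sigma, n_norm = 10.0, 2.0, 11  # assumed normal population and sample size
    sum_p_hat = 0.0
    sum_median = 0.0
    for _ in range(reps):
        x = sum(1 for _ in range(n_bern) if rng.random() < p)  # number of successes
        sum_p_hat += x / n_bern
        sample = [rng.gauss(mu, sigma) for _ in range(n_norm)]
        sum_median += statistics.median(sample)
    return sum_p_hat / reps, sum_median / reps

avg_p_hat, avg_median = average_estimates()
print(round(avg_p_hat, 4), round(avg_median, 3))  # close to p = 0.3 and mu = 10
```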

Consistency

An estimator is said to be a consistent estimator if the statistic used as the estimator approaches the true population parameter value as the sample size increases. Equivalently,
an estimator $\hat{\theta}$ is called a consistent estimator of $\theta$ if the probability that $\hat{\theta}$ lies arbitrarily close to $\theta$ approaches unity as the sample size increases.

Symbolically, $\hat{\theta}$ is a consistent estimator of the parameter $\theta$ if, for any arbitrarily small positive quantity $\varepsilon$,

\begin{align*}
\lim\limits_{n\rightarrow \infty} P\left[|\hat{\theta}-\theta|\le \varepsilon\right] &= 1\\
\lim\limits_{n\rightarrow \infty} P\left[|\hat{\theta}-\theta|> \varepsilon\right] &= 0
\end{align*}
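The limit in this definition can be illustrated by estimating $P[|\overline{X}-\mu| > \varepsilon]$ for increasing $n$. This sketch assumes a Uniform(0, 1) population ($\mu = 0.5$) and $\varepsilon = 0.05$; the function name `prob_outside`, the seed, and the sample sizes are hypothetical choices. The estimated probability should shrink toward 0 as $n$ grows, which is what consistency of $\overline{X}$ predicts.

```python
import random

def prob_outside(n, eps=0.05, reps=2000, seed=5):
    """Estimate P(|X-bar - mu| > eps) for samples of size n from Uniform(0, 1)."""
    rng = random.Random(seed)
    mu = 0.5  # mean of the assumed Uniform(0, 1) population
    misses = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            misses += 1
    return misses / reps

probs = {n: prob_outside(n) for n in (10, 100, 1000)}
for n, p in probs.items():
    print(f"n = {n:4d}: estimated P(|X-bar - mu| > 0.05) = {p:.3f}")
```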

A consistent estimator may or may not be unbiased. The sample mean $\overline{X}=\frac{\Sigma X_i}{n}$ and sample proportion $\hat{p} = \frac{x}{n}$ are unbiased estimators of $\mu$ and $p$, respectively and are also consistent.

It means that as one collects more and more data, the estimator becomes more and more accurate in approximating the true population value.

Efficiency

An unbiased estimator is said to be efficient if the variance of its sampling distribution is smaller than that of the sampling distribution of any other unbiased estimator of the same parameter. Suppose there are two unbiased estimators $T_1$ and $T_2$ of the same parameter $\theta$; then $T_1$ is said to be more efficient than $T_2$ if $Var(T_1) < Var(T_2)$. The relative efficiency of $T_1$ compared to $T_2$ is given by the ratio

$$\text{Eff}(T_1, T_2) = \frac{Var(T_2)}{Var(T_1)},$$

which exceeds 1 when $T_1$ is the more efficient estimator.
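As an illustration of relative efficiency, the sketch below compares the sample mean ($T_1$) and the sample median ($T_2$) as estimators of $\mu$ for a normal population, where both are unbiased. The standard normal population, $n = 51$, the seed, and the function name `relative_efficiency` are assumptions made for the example. For normal data the ratio $Var(T_2)/Var(T_1)$ is known to approach $\pi/2 \approx 1.57$ for large $n$, so the simulated value should be clearly above 1, indicating that the mean is the more efficient estimator here.

```python
import random
import statistics

def relative_efficiency(n=51, reps=5000, seed=6):
    """Simulated Eff(T1, T2) = Var(median) / Var(mean) for an assumed N(0, 1) population."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))     # T1: sample mean
        medians.append(statistics.median(sample))  # T2: sample median
    return statistics.variance(medians) / statistics.variance(means)

eff = relative_efficiency()
print(round(eff, 3))  # greater than 1: the mean is more efficient for normal data
```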

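Note that when the estimators being compared are biased, they are compared by the mean squared error, $MSE(\hat{\theta}) = E(\hat{\theta}-\theta)^2 = Var(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$, rather than by the variance alone.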
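To see why MSE matters, the following sketch compares the divisor-$n$ and divisor-$(n-1)$ estimators of $\sigma^2$ by simulated MSE. It assumes normal data with $\sigma = 1$ and $n = 10$ (hypothetical values; the function name is also illustrative). For normal samples the theoretical MSEs are $(2n-1)\sigma^4/n^2 = 0.19$ and $2\sigma^4/(n-1) \approx 0.222$, so the biased estimator has the smaller MSE here: a small bias can be outweighed by a reduction in variance.

```python
import random

def variance_estimator_mse(n=10, sigma=1.0, reps=40000, seed=7):
    """Simulated MSE of the divisor-n and divisor-(n-1) estimators of sigma^2 (normal data)."""
    rng = random.Random(seed)
    target = sigma ** 2
    se_div_n = 0.0
    se_div_n_minus_1 = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)
        se_div_n += (ss / n - target) ** 2                # squared error, biased estimator
        se_div_n_minus_1 += (ss / (n - 1) - target) ** 2  # squared error, unbiased estimator
    return se_div_n / reps, se_div_n_minus_1 / reps

mse_biased, mse_unbiased = variance_estimator_mse()
n, sigma = 10, 1.0
print(round(mse_biased, 4), "theory:", (2 * n - 1) * sigma ** 4 / n ** 2)
print(round(mse_unbiased, 4), "theory:", 2 * sigma ** 4 / (n - 1))
```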

A more efficient estimator has a smaller sampling error, meaning it is less likely to deviate significantly from the true population parameter.

An efficient estimator is less likely to produce extreme values, making it more reliable.

Sufficiency

An estimator is said to be sufficient if the statistic used as an estimator utilizes all the information contained in the sample. Any statistic that is not computed from all values in the sample is not a sufficient estimator. The sample mean $\overline{X}=\frac{\Sigma X}{n}$ and sample proportion $\hat{p} = \frac{x}{n}$ are sufficient estimators of the population mean $\mu$ and population proportion $p$, respectively but the median is not a sufficient estimator because it does not use all the information contained in the sample.

A sufficient estimator provides the maximum information about the population parameter that the sample contains.

A sufficient estimator captures all the useful information from the data without any loss.

Invariance (Property of MLE)

If the parameter is transformed by some function, the estimator of the transformed parameter is obtained by applying the same function to the estimator. This property is known as invariance.

\begin{align}
E(X-\mu)^2 &= \sigma^2 \\
\text{or } \sqrt{E(X-\mu)^2} &= \sigma\\
\text{or } [E(X-\mu)^2]^2 &= (\sigma^2)^2
\end{align}

The property states that if $\hat{\theta}$ is the MLE of $\theta$, then $\tau(\hat{\theta})$ is the MLE of $\tau(\theta)$ for any function $\tau$. Here tau ($\tau$) denotes a general function. For example, if $\hat{\theta}=\overline{X}$, then the MLE of $\theta^2$ is $\overline{X}^2$ and the MLE of $\sqrt{\theta}$ is $\sqrt{\overline{X}}$.
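A numerical sketch of the invariance property, assuming normal data: the closed-form MLE of $\sigma^2$ is $\hat{\sigma}^2 = \frac{1}{n}\sum (X_i-\overline{X})^2$, so invariance says the MLE of $\sigma = \sqrt{\sigma^2}$ is $\sqrt{\hat{\sigma}^2}$. The code maximises the log-likelihood directly over $\sigma$ on a grid (with $\mu$ fixed at $\overline{X}$, which is its MLE for every $\sigma$) and compares the result with $\sqrt{\hat{\sigma}^2}$. The simulated data, grid range, step size, and the helper name `loglik_sigma` are arbitrary choices for illustration.

```python
import math
import random

rng = random.Random(8)
data = [rng.gauss(5.0, 2.0) for _ in range(200)]  # assumed normal sample
n = len(data)

mu_hat = sum(data) / n                     # MLE of mu
ss = sum((x - mu_hat) ** 2 for x in data)
sigma2_hat = ss / n                        # closed-form MLE of sigma^2

def loglik_sigma(sigma):
    """Normal log-likelihood as a function of sigma, with mu fixed at mu_hat."""
    return -n * math.log(sigma) - ss / (2 * sigma ** 2) - 0.5 * n * math.log(2 * math.pi)

# Maximise the likelihood directly over sigma on a fine grid (0.5 to about 5.5)
grid = [0.5 + 0.0005 * k for k in range(10000)]
sigma_hat_grid = max(grid, key=loglik_sigma)

# Invariance: the grid maximiser should agree with sqrt of the MLE of sigma^2
print(round(math.sqrt(sigma2_hat), 4), round(sigma_hat_grid, 4))
```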

Properties of a Good Estimator

From the above diagrammatic representations, one can visualize the properties of a good estimator as described below.

  • Unbiasedness: The estimator should be centered around the true value.
  • Efficiency: The estimator should have a smaller spread (variance) around the true value.
  • Consistency: As the sample size increases, the estimator should become more accurate.
  • Sufficiency: The estimator should capture all relevant information from the sample.

In summary, regarding the properties of a good estimator, a good estimator is unbiased, efficient, consistent, and ideally sufficient. It should also be robust to outliers and have a low MSE.

Sufficient Estimators and Sufficient Statistics

Introduction to Sufficient Estimator and Sufficient Statistics

An estimator $\hat{\theta}$ is sufficient if it uses the information in the sample so completely that no other estimator could extract additional information from the sample about the population parameter being estimated.

The sample mean $\overline{X}$ utilizes all the values included in the sample so it is a sufficient estimator of the population mean $\mu$.

Sufficient estimators are often used to develop the estimator that has minimum variance among all unbiased estimators (MVUE).

If a sufficient estimator exists, no other estimator from the sample can provide additional information about the population being estimated.

If there is a sufficient estimator, then there is no need to consider any of the non-sufficient estimators. A good estimator is a function of sufficient statistics.

Let $X_1, X_2,\cdots, X_n$ be a random sample from a probability distribution with unknown parameter $\theta$. Then the statistic (estimator) $U=g(X_1, X_2,\cdots, X_n)$ is sufficient for $\theta$ if the conditional distribution of $X_1, X_2,\cdots, X_n$ given $U$ does not depend on the population parameter $\theta$.

Sufficient Statistics Example

The sample mean $\overline{X}$ is sufficient for the population mean $\mu$ of a normal distribution with known variance. Once the sample mean is known, no further information about the population mean $\mu$ can be obtained from the sample itself, while the median is not sufficient for the mean; even if the median of the sample is known, knowing the sample itself would provide further information about the population mean $\mu$.

Mathematical Definition of Sufficiency

Suppose that $X_1,X_2,\cdots,X_n \sim p(x;\theta)$. $T$ is sufficient for $\theta$ if the conditional distribution of $X_1,X_2,\cdots, X_n|T$ does not depend upon $\theta$. Thus
\[p(x_1,x_2,\cdots,x_n|t;\theta)=p(x_1,x_2,\cdots,x_n|t)\]
This means that we can replace $X_1,X_2,\cdots,X_n$ with $T(X_1,X_2,\cdots,X_n)$ without losing information.
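The definition can be checked exactly for a small case. For Bernoulli($p$) data with $T = \sum X_i$, the conditional probability $P(X_1=x_1,\cdots,X_n=x_n \mid T=t) = 1/\binom{n}{t}$, which does not involve $p$. The sketch below uses exact fractions, $n = 4$, two arbitrary values of $p$, and a hypothetical helper `conditional_given_total`; it enumerates every 0/1 sequence and confirms that the conditional distribution is the same for both values of $p$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_given_total(n, p):
    """P(X_1..X_n = x | T = sum(x)) for every 0/1 sequence x, for Bernoulli(p) data."""
    seqs = list(product((0, 1), repeat=n))
    # joint probability of each sequence: p^t (1 - p)^(n - t)
    joint = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in seqs}
    prob_t = {}  # marginal distribution of T
    for x, px in joint.items():
        prob_t[sum(x)] = prob_t.get(sum(x), 0) + px
    return {x: px / prob_t[sum(x)] for x, px in joint.items()}

n = 4
cond_a = conditional_given_total(n, Fraction(1, 5))
cond_b = conditional_given_total(n, Fraction(2, 3))
print(cond_a == cond_b)  # same conditional distribution for both p
print(all(v == Fraction(1, comb(n, sum(x))) for x, v in cond_a.items()))
```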

For further reading visit: https://en.wikipedia.org/wiki/Sufficient_statistic

Consistent Estimator: Easy Learning

A statistic is a consistent estimator of a population parameter if, “as the sample size increases, it becomes almost certain that the value of the statistic comes closer to the value of the population parameter”. If an estimator (statistic) is consistent, it becomes more reliable with a large sample ($n \to \infty$). This means that the distribution of the estimates becomes more and more concentrated near the value of the population parameter being estimated, so that the probability of the estimator being arbitrarily close to $\theta$ converges to one (a sure event).

Consistent Estimator

The estimator $\hat{\theta}_n$ is said to be a consistent estimator of $\theta$ if, for any positive $\varepsilon$,
\[\lim_{n \rightarrow \infty} P[|\hat{\theta}_n-\theta| \le \varepsilon]=1\]
or
\[\lim_{n\rightarrow \infty} P[|\hat{\theta}_n-\theta|> \varepsilon]=0\]

Here $\hat{\theta}_n$ denotes the estimator of $\theta$ calculated from a sample of size $n$.

  • The sample median is a consistent estimator of the population mean if the population distribution is symmetrical; otherwise, the sample median would approach the population median, not the population mean.
  • The sample variance $\hat{\sigma}^2$ (and hence the sample standard deviation) is biased but consistent, since the distribution of $\hat{\sigma}^2$ becomes more and more concentrated at $\sigma^2$ as the sample size increases.
  • A sample statistic can be an inconsistent estimator. A consistent statistic is typically unbiased in the limit (asymptotically unbiased), but an unbiased estimator may or may not be consistent.
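The biased-but-consistent behaviour of the variance estimator can be illustrated with a short simulation of the divisor-$n$ estimator $\hat{\sigma}^2$. This sketch assumes a standard normal population ($\sigma^2 = 1$); the sample sizes, seed, and function name `variance_estimates` are hypothetical. The average of $\hat{\sigma}^2$ should track $(n-1)/n$, which is below 1 (biased) but approaches 1, and the spread of the estimates should shrink as $n$ grows.

```python
import random

def variance_estimates(n, sigma=1.0, reps=1000, seed=8):
    """Mean and spread of the divisor-n variance estimator over repeated N(0, sigma^2) samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        estimates.append(sum((x - xbar) ** 2 for x in sample) / n)  # divisor n: biased
    mean_est = sum(estimates) / reps
    spread = (sum((e - mean_est) ** 2 for e in estimates) / reps) ** 0.5
    return mean_est, spread

summary = {n: variance_estimates(n) for n in (10, 100, 1000)}
for n, (m, s) in summary.items():
    print(f"n={n:5d}  mean={m:.4f}  (theory {1 - 1 / n:.4f})  sd={s:.4f}")
```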

Note that these two are not equivalent: (1) Unbiasedness is a statement about the expected value of the sampling distribution of the estimator, while (2) Consistency is a statement about “where the sampling distribution of the estimator is going” as the sample size increases.

For a consistent estimator, the errors become negligible as the sample size increases indefinitely. More specifically, the probability that the error exceeds any given amount approaches zero as the sample size increases. In other words, the more data you collect, the closer the estimator is likely to be to the real population parameter you’re trying to measure. The sample mean ($\overline{X}$) and sample variance ($S^2$) are two well-known consistent estimators.
