Unbiasedness is a statistical concept that describes the accuracy of an estimator. An estimator is said to be an unbiased estimator if its expected value (or average value over many samples) equals the corresponding population parameter, that is, $E(\hat{\theta}) = \theta$.
If the expected value of an estimator $\hat{\theta}$ is not equal to the corresponding parameter $\theta$, the estimator is biased. The bias of an estimator $\hat{\theta}$ of $\theta$ is defined as
$$\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$
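The definition of bias can be illustrated numerically. The sketch below approximates $E(\hat{\theta})$ by averaging an estimator over many simulated samples and then subtracts $\theta$. The helper name `estimate_bias`, the Normal population with $\mu = 5$ and $\sigma = 2$, and the sample size $n = 10$ are hypothetical choices for illustration. A Monte Carlo estimate only approximates the bias, so for an unbiased estimator a value close to (not exactly) zero is what to expect.

```python
import random
import statistics


def estimate_bias(estimator, draw_sample, theta, reps=20_000, seed=1):
    """Monte Carlo approximation of Bias = E(theta_hat) - theta.

    E(theta_hat) is approximated by averaging the estimator over `reps`
    independent samples produced by `draw_sample(rng)`.
    """
    rng = random.Random(seed)
    estimates = [estimator(draw_sample(rng)) for _ in range(reps)]
    return statistics.fmean(estimates) - theta


# Hypothetical setting: samples of size n = 10 from Normal(mu = 5, sigma = 2).
MU, SIGMA, N = 5.0, 2.0, 10


def normal_sample(rng):
    return [rng.gauss(MU, SIGMA) for _ in range(N)]


bias_of_mean = estimate_bias(statistics.fmean, normal_sample, theta=MU)
print(f"estimated bias of the sample mean: {bias_of_mean:+.4f}")
```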
Note that $\overline{X}$ is an unbiased estimator of the mean of a population. Therefore,
- $\overline{X}$ is an unbiased estimator of the parameter $\mu$ in the Normal distribution.
- $\overline{X}$ is an unbiased estimator of the parameter $p$ in the Bernoulli distribution.
- $\overline{X}$ is an unbiased estimator of the parameter $\lambda$ in the Poisson distribution.
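The Bernoulli and Poisson cases can be checked the same way with Python's standard library. This is a minimal sketch assuming $p = 0.3$, $\lambda = 4$, and $n = 10$ (all hypothetical values). Because the `random` module has no Poisson generator, the sketch uses Knuth's multiplication method, which is adequate for small $\lambda$. Averaging $\overline{X}$ over many samples should land close to $p$ and $\lambda$ respectively.

```python
import math
import random
import statistics

N, REPS = 10, 20_000
rng = random.Random(2)


def bernoulli_sample(p):
    # Each X_i is 1 with probability p and 0 otherwise.
    return [1 if rng.random() < p else 0 for _ in range(N)]


def poisson_draw(lam):
    # Knuth's multiplication method: count uniforms until their product
    # drops to exp(-lam) or below. Fine for small lam.
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_sample(lam):
    return [poisson_draw(lam) for _ in range(N)]


def average_of_sample_means(sampler, param):
    # Approximates E(X-bar) by averaging X-bar over REPS independent samples.
    means = [statistics.fmean(sampler(param)) for _ in range(REPS)]
    return statistics.fmean(means)


p, lam = 0.3, 4.0
mean_bern = average_of_sample_means(bernoulli_sample, p)
mean_pois = average_of_sample_means(poisson_sample, lam)
print(f"Bernoulli: average X-bar = {mean_bern:.4f}  (p = {p})")
print(f"Poisson:   average X-bar = {mean_pois:.4f}  (lambda = {lam})")
```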
However, the expected value of the sample variance $S^2=\frac{\sum\limits_{i=1}^n (X_i - \overline{X})^2 }{n}$ is not equal to the population variance; in fact, $E(S^2) = \frac{n-1}{n}\sigma^2 \neq \sigma^2$.
Therefore, this sample variance is a biased estimator of the population variance $\sigma^2$, with bias $-\sigma^2/n$. Dividing by $n-1$ instead of $n$ gives an unbiased estimator.
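For a small discrete population, $E(S^2)$ can be computed exactly rather than simulated. Sampling with replacement from a population whose values are equally likely gives i.i.d. $X_i$, so averaging the estimator over every possible ordered sample yields its exact expectation. The population $\{0, 1, 5\}$ and $n = 3$ below are hypothetical, and `Fraction` arithmetic avoids rounding. The result should match $\frac{n-1}{n}\sigma^2$ for the divisor-$n$ version and $\sigma^2$ for the divisor-$(n-1)$ version.

```python
from fractions import Fraction
from itertools import product


def exact_expectation(estimator, population, n):
    """E(estimator) over all n-tuples drawn with replacement from a
    population whose values are equally likely (i.i.d. sampling)."""
    samples = list(product(population, repeat=n))
    return sum(Fraction(estimator(s)) for s in samples) / len(samples)


def var_divisor_n(xs):
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / len(xs)


def var_divisor_n_minus_1(xs):
    # Bessel's correction: divide by n - 1 instead of n.
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


population = [0, 1, 5]  # hypothetical population of equally likely values
n = 3
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

e_n = exact_expectation(var_divisor_n, population, n)
e_n1 = exact_expectation(var_divisor_n_minus_1, population, n)
print(f"sigma^2 = {sigma2}, E(divisor n) = {e_n}, E(divisor n-1) = {e_n1}")
```

Here $\sigma^2 = 14/3$, so the divisor-$n$ estimator has expectation $\frac{2}{3} \cdot \frac{14}{3} = \frac{28}{9}$, while the corrected estimator has expectation exactly $14/3$.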
Note that it is possible to have more than one unbiased estimator for an unknown parameter. For example, the sample mean and sample median are both unbiased estimators of the population mean $\mu$ if the population distribution is symmetrical.
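The same exact-enumeration idea can illustrate the mean/median example. The sketch below uses two hypothetical populations of equally likely values: one symmetric about 5 and one skewed. For the symmetric population both estimators have expectation $\mu$; for the skewed one the median's expectation differs from $\mu$, which is why the symmetry condition matters.

```python
from fractions import Fraction
from itertools import product
from statistics import median


def exact_mean_and_median(population, n):
    """Exact E(sample mean) and E(sample median) for i.i.d. draws of size n
    from a population of equally likely values (enumerates all samples).
    n is odd here, so each median is one of the observed values."""
    samples = list(product(population, repeat=n))
    e_mean = sum(Fraction(sum(s), n) for s in samples) / len(samples)
    e_median = sum(Fraction(median(s)) for s in samples) / len(samples)
    return e_mean, e_median


symmetric = [1, 2, 3, 7, 8, 9]  # hypothetical, symmetric about 5
skewed = [0, 1, 5]              # hypothetical, mean 2, right-skewed

sym_mean, sym_median = exact_mean_and_median(symmetric, 3)
skew_mean, skew_median = exact_mean_and_median(skewed, 3)
print("symmetric:", sym_mean, sym_median)
print("skewed:   ", skew_mean, skew_median)
```

For the skewed population with $n = 3$, the expected median works out to $16/9$, below $\mu = 2$.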
Question: Show that the sample mean is an unbiased estimator of the population mean.
Solution:
Let $X_1, X_2, \cdots, X_n$ be a random sample of size $n$ from a population having mean $\mu$. The sample mean $\overline{X}$ is
$$\overline{X} = \frac{1}{n} \sum\limits_{i=1}^n X_i$$
We must show that $E(\overline{X})=\mu$. Taking the expectation of both sides,
\begin{align*}
E(\overline{X}) &= E\left[\frac{1}{n} \sum\limits_{i=1}^n X_i \right] = \frac{1}{n} E\left(\sum\limits_{i=1}^n X_i \right)\\
&= \frac{1}{n} E(X_1 + X_2 + \cdots + X_n)\\
&= \frac{1}{n} \left[E(X_1) + E(X_2) + \cdots + E(X_n) \right]
\end{align*}
In a random sample, each of the random variables $X_1, X_2, \cdots, X_n$ has the same distribution as the population, so $E(X_1)=E(X_2)=\cdots=E(X_n)=\mu$. (The independence of the $X_i$ is not needed for this step, because linearity of expectation holds for any random variables.) So,
$$E(\overline{X}) = \frac{1}{n}(\mu+\mu+\cdots + \mu) = \frac{1}{n}(n\mu) = \mu$$
Why Unbiasedness is Important
- Accuracy: Unbiasedness concerns accuracy, not precision. An unbiased estimator is correct on average, so it has no systematic error. However, an unbiased estimator can still have a large variance, meaning its individual estimates can be far from the true value.
- Consistency: An unbiased estimator is not necessarily consistent, and a consistent estimator is not necessarily unbiased. Consistency refers to the tendency of an estimator to converge to the true value as the sample size increases. For example, the sample variance with divisor $n$ is biased but consistent, because its bias $-\sigma^2/n$ vanishes as $n$ grows.
- Foundation for Further Analysis: Unbiased estimators are often used as building blocks for more complex statistical procedures.
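The accuracy/precision and consistency points can be seen side by side by comparing the sample mean with a deliberately naive estimator, the first observation $X_1$. Both are unbiased for $\mu$, since $E(X_1) = \mu$, but the spread of $X_1$ does not shrink as $n$ grows, so it is not consistent. The Normal(0, 1) population, the sample sizes, and the replication count are hypothetical choices for this sketch.

```python
import random
import statistics

MU, SIGMA, REPS = 0.0, 1.0, 2_000
rng = random.Random(3)


def first_observation(sample):
    # Unbiased for MU, but uses only X_1 and ignores the rest of the sample.
    return sample[0]


def summarize(estimator, n):
    """Mean and standard deviation of `estimator` over REPS samples of size n."""
    estimates = []
    for _ in range(REPS):
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.fmean(estimates), statistics.stdev(estimates)


results = {}
for n in (5, 50, 500):
    results[n] = {
        "X_1": summarize(first_observation, n),
        "X-bar": summarize(statistics.fmean, n),
    }
    for name, (avg, sd) in results[n].items():
        print(f"n = {n:3d}  {name:5s}  mean = {avg:+.3f}  sd = {sd:.3f}")
```

Both estimators average close to 0 at every $n$; the standard deviation of $X_1$ stays near $\sigma = 1$, while that of $\overline{X}$ shrinks roughly like $\sigma/\sqrt{n}$.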
Unbiasedness Example
Imagine you’re trying to estimate the average height of students in your university. If you randomly sample 100 students and calculate their average height, this average is an estimator of the true average height of all students in that university. Any single sample's average may be above or below the true value, but if, averaged over all possible random samples of 100 students, the sample average equals the true average height of the entire student population, then your estimator is unbiased.
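The height example can be simulated under stated assumptions. The population below is 5,000 synthetic heights drawn from a Normal distribution with mean 170 cm and standard deviation 10 cm; it is hypothetical, not real data. Drawing many simple random samples of 100 students (without replacement) shows that the individual sample averages scatter around the true mean while their overall average is very close to it.

```python
import random
import statistics

rng = random.Random(4)

# Hypothetical population: 5,000 synthetic heights in cm (NOT real data).
population = [rng.gauss(170, 10) for _ in range(5_000)]
true_mean = statistics.fmean(population)

# Draw many simple random samples of 100 students without replacement
# and record each sample's average height.
sample_means = [statistics.fmean(rng.sample(population, 100)) for _ in range(5_000)]

print(f"true mean          = {true_mean:.2f} cm")
print(f"average of X-bars  = {statistics.fmean(sample_means):.2f} cm")
print(f"smallest / largest = {min(sample_means):.2f} / {max(sample_means):.2f} cm")
```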
Unbiasedness is the state of being free from bias, prejudice, or favoritism. It can also mean being able to judge fairly without being influenced by one’s own opinions. In statistics, it refers to (i) a sample that is not affected by extraneous factors or selectivity, and (ii) an estimator whose expected value equals the parameter being estimated.
Applications and Uses of Unbiasedness
- Parameter Estimation:
- Mean: The sample mean is an unbiased estimator of the population mean.
- Variance: The sample variance with Bessel’s correction (dividing by $n-1$ instead of $n$) is an unbiased estimator of the population variance.
- Regression Coefficients: In linear regression, the ordinary least squares (OLS) estimators of the regression coefficients are unbiased under certain assumptions, notably a correctly specified linear model whose errors have zero mean conditional on the regressors.
- Hypothesis Testing:
- Unbiased estimators are often used in hypothesis tests to make inferences about population parameters. For example, the t-test for comparing means relies on the assumption that the sample means are unbiased estimators of the population means.
- Machine Learning: In some machine learning algorithms, unbiased estimators are preferred for model parameters to avoid systematic errors.
- Survey Sampling: Unbiased sampling techniques, such as simple random sampling, are used to ensure that the sample is representative of the population and that the estimates obtained from the sample are unbiased.
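The OLS claim can be illustrated with a small simulation. This sketch assumes a hypothetical model $y = 2 + 3x + \varepsilon$ with a fixed design and errors $\varepsilon \sim N(0, 1)$ independent of $x$; under these assumptions the average of the estimated slopes over many replications should be close to the true slope 3. The helper `ols_slope` is written out by hand for illustration and uses the usual formula $\hat{\beta}_1 = S_{xy}/S_{xx}$.

```python
import random
import statistics

rng = random.Random(5)
BETA0, BETA1, N, REPS = 2.0, 3.0, 30, 5_000
xs = [i / N for i in range(N)]  # fixed design with hypothetical x values


def ols_slope(x, y):
    # Least squares slope: S_xy / S_xx.
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


slopes = []
for _ in range(REPS):
    # Errors have mean zero and do not depend on x (the key assumption here).
    ys = [BETA0 + BETA1 * x + rng.gauss(0, 1) for x in xs]
    slopes.append(ols_slope(xs, ys))

print(f"true slope = {BETA1}, average OLS slope = {statistics.fmean(slopes):.3f}")
```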