Efficiency of an Estimator

Introduction to Efficiency of an Estimator

The efficiency of an estimator is a measure of how well it estimates a population parameter compared to other estimators. A parameter can have more than one unbiased estimator, so at least one additional criterion is needed for choosing among the unbiased estimators of the parameter. Usually, unbiased estimators are compared in terms of their variances; thus, the comparison of the variances of estimators is described as a comparison of the efficiency of estimators.

Use of Efficiency

The efficiency of an estimator is often used to evaluate an estimator through the following concepts:

  • Bias: An estimator is unbiased if its expected value equals the true parameter value ($E[\hat{\theta}]=\theta$). The efficiency of an estimator can be influenced by bias; thus, unbiased estimators are often preferred.
  • Variance: Efficiency is commonly assessed by the variance of the estimator. An estimator having a lower variance is considered more efficient. The Cramér-Rao lower bound provides a theoretically lower limit for the variance of unbiased estimators.
  • Mean Squared Error (MSE): Efficiency can also be measured using the MSE, which combines both variance and bias. The MSE is given by $MSE(\hat{\theta}) = Var(\hat{\theta}) + [Bias(\hat{\theta})]^2$. An estimator with a lower MSE is more efficient.
  • Relative Efficiency: The relative efficiency compares the efficiency of two estimators, often expressed as the ratio of their variances: Relative Efficiency $= \frac{Var(\hat{\theta}_2)}{Var(\hat{\theta}_1)}$, where $\hat{\theta}_1$ is the estimator being compared, and $\hat{\theta}_2$ is a competitor.

The efficiency of an estimator is stated in relative terms. If two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators of the same population parameter $\theta$ and the variance of $\hat{\theta}_1$ is less than the variance of $\hat{\theta}_2$ (that is, $Var(\hat{\theta}_1) < Var(\hat{\theta}_2)$), then $\hat{\theta}_1$ is relatively more efficient than $\hat{\theta}_2$. The ratio $E=\frac{Var(\hat{\theta}_2)}{Var(\hat{\theta}_1)}$ is a measure of the relative efficiency of $\hat{\theta}_1$ with respect to $\hat{\theta}_2$. If $E>1$, $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$.
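To make the idea concrete, the relative efficiency can be approximated by simulation. The sketch below is an illustrative addition (not part of the original discussion); it assumes NumPy, normally distributed data, and arbitrary values for $n$, $\mu$, and $\sigma$, and compares the sample mean and the sample median as estimators of the population mean.

```python
# Illustrative sketch: estimating relative efficiency by simulation (assumes NumPy).
import numpy as np

rng = np.random.default_rng(1)
n, reps, mu, sigma = 25, 20_000, 10.0, 2.0

samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)            # estimator theta_hat_1: sample mean
medians = np.median(samples, axis=1)    # estimator theta_hat_2: sample median

# E = Var(theta_hat_2) / Var(theta_hat_1); E > 1 means theta_hat_1 is more efficient
E = medians.var() / means.var()
print(f"Var(mean) = {means.var():.4f}, Var(median) = {medians.var():.4f}, E = {E:.2f}")
```

For normally distributed data this ratio comes out well above 1 (asymptotically about $\pi/2 \approx 1.57$), so the sample mean is the more efficient of the two estimators.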

If $\hat{\theta}$ is an unbiased estimator of $\theta$ and $Var(\hat{\theta})$ is the smallest among all unbiased estimators of $\theta$, then $\hat{\theta}$ is said to be a minimum variance unbiased estimator (MVUE) of $\theta$.

When estimators may be biased, it is preferable to make efficiency comparisons based on the MSE rather than the variance alone.

\begin{align*}
MSE(\hat{\theta}) & = E(\hat{\theta} - \theta)^2\\
&= E\left[\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta \right]^2\\
&= E\left[ \left(\hat{\theta} - E(\hat{\theta})\right)^2 + \left(E(\hat{\theta})-\theta\right)^2 + 2\left(\hat{\theta}-E(\hat{\theta})\right)\left(E(\hat{\theta}) -\theta\right)\right]\\
&= E\left[\hat{\theta} - E(\hat{\theta})\right]^2 + \left[E(\hat{\theta})-\theta\right]^2 \\
&= Var(\hat{\theta}) + (Bias)^2
\end{align*}

where the cross term vanishes because $E[\hat{\theta}-E(\hat{\theta})] = E(\hat{\theta}) - E(\hat{\theta})=0$.
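The identity can also be checked numerically. The following sketch is an illustrative addition assuming NumPy; it simulates a deliberately biased estimator of $\mu$ and compares its simulated MSE with the sum of its variance and squared bias.

```python
# Illustrative check of MSE = Var + Bias^2 by simulation (assumes NumPy).
import numpy as np

rng = np.random.default_rng(2)
n, reps, mu, sigma = 20, 50_000, 5.0, 3.0

samples = rng.normal(mu, sigma, size=(reps, n))
theta_hat = samples.mean(axis=1) + 0.5   # a deliberately biased estimator of mu

mse = np.mean((theta_hat - mu) ** 2)     # simulated mean squared error
var = theta_hat.var()                    # simulated variance of the estimator
bias = theta_hat.mean() - mu             # simulated bias
print(f"MSE = {mse:.4f}, Var + Bias^2 = {var + bias**2:.4f}")   # the two agree closely
```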

Question about the Efficiency of an Estimator

Question: Let $X_1, X_2, X_3$ be a random sample of size 3 from a population with mean $\mu$ and variance $\sigma^2$. Consider the following estimators of the mean $\mu$:

\begin{align*}
T_1 &= \frac{X_1+X_2+X_3}{3}\qquad Sample\,\, mean\\
T_2 &= \frac{X_1 + 2X_2 + X_3}{4} \qquad Weighted \,\, mean
\end{align*}

Which estimator should be preferred?

Solution

First, we check the unbiasedness of $T_1$ and $T_2$.

\begin{align*}
E(T_1) &= \frac{1}{3} E(X_1 + X_2 + X_3)=\mu\\
E(T_2) &= \frac{1}{4}E(X_1+2X_2 + X_3) = \mu
\end{align*}

Therefore, $T_1$ and $T_2$ are unbiased estimators of $\mu$.

For efficiency, let us check the variances of these estimators.

\begin{align*}
Var(T_1) &= Var\left(\frac{X_1 + X_2 + X_3}{3} \right)\\
&= \frac{1}{9} \left(Var(X_1) + Var(X_2) + Var(X_3)\right)\\
&= \frac{1}{9} (\sigma^2 + \sigma^2 + \sigma^2) = \frac{\sigma^2}{3}\\
Var(T_2) &= Var\left(\frac{X_1 + 2X_2 + X_3}{4}\right)\\
&= \frac{1}{16} \left(Var(X_1) + 4Var(X_2) + Var(X_3)\right)\\
&= \frac{1}{16}(\sigma^2 + 4\sigma^2 + \sigma^2) = \frac{3\sigma^2}{8}
\end{align*}

Since $\frac{1}{3} < \frac{3}{8}$, that is, $Var(T_1) < Var(T_2)$, $T_1$ is a better estimator of $\mu$ than $T_2$.
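A short simulation (an illustrative addition, assuming NumPy and arbitrary values of $\mu$ and $\sigma$) confirms the algebra: the empirical variances of $T_1$ and $T_2$ land near $\sigma^2/3$ and $3\sigma^2/8$.

```python
# Illustrative simulation of Var(T1) and Var(T2) for samples of size 3 (assumes NumPy).
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, reps = 10.0, 2.0, 100_000

x = rng.normal(mu, sigma, size=(reps, 3))
T1 = x.mean(axis=1)                          # (X1 + X2 + X3) / 3
T2 = (x[:, 0] + 2 * x[:, 1] + x[:, 2]) / 4   # (X1 + 2*X2 + X3) / 4

print(f"Var(T1) ~ {T1.var():.3f}, expected sigma^2/3  = {sigma**2 / 3:.3f}")
print(f"Var(T2) ~ {T2.var():.3f}, expected 3sigma^2/8 = {3 * sigma**2 / 8:.3f}")
```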

Reasons to Use Efficiency of an Estimator

  1. Optimal Use of Data: An efficient estimator makes the best possible use of the available data, providing more accurate estimates. This is particularly important in research, where the goal is often to make inferences or predictions based on sample data.
  2. Reducing Uncertainty: Efficiency reduces the variance of the estimators, leading to more precise estimates. This is essential in fields like medicine, economics, and engineering, where precise measurements can significantly impact decision-making and outcomes.
  3. Resource Allocation: In practical applications, using an efficient estimator can lead to savings in money, time, and resources. For example, if an estimator provides a more accurate estimate with less data, it can result in fewer resources needed for data collection.
  4. Comparative Evaluation: Comparisons between different estimators help researchers and practitioners choose the best method for their specific context. Understanding efficiency allows one to select estimators that yield reliable results.
  5. Statistical Power: Efficient estimators contribute to higher statistical power, which is the probability of correctly rejecting a false null hypothesis. This is particularly important in hypothesis testing and experimental design.
  6. Robustness: While efficiency relates mostly to variance and bias, efficient estimators are often more robust to violations of assumptions (e.g., normality) in some contexts, leading to more reliable conclusions.

In summary, the efficiency of an estimator is vital as it directly influences the accuracy, reliability, and practical utility of statistical analyses, ultimately affecting the quality of decision-making based on those analyses.


Importance of Dispersion in Statistics

The importance of dispersion in statistics cannot be ignored. The term dispersion (or spread, or variability) is used to express the variability in a data set. A measure of dispersion is important because it summarizes how much the data points differ from the average (or another central value), and it therefore tells us about the consistency of the data.

Dispersion quantifies how far the data points lie from a central point (such as the average). Data with minimum variation/variability with respect to its central point (average) is said to be more consistent: the less the variability in the data, the more consistent the data.

Example of Measure of Dispersion

Suppose the scores of three batsmen in three cricket matches are:

Player | Match 1 | Match 2 | Match 3 | Average Score
A | 70 | 80 | 90 | 80
B | 75 | 80 | 85 | 80
C | 65 | 80 | 95 | 80

The question is: which player is more consistent in his performance?

In the above data set, the player whose deviations from the average are smallest is the most consistent. Player B is therefore more consistent than the others, as his scores show the least variation.
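The visual impression can be verified with a few lines of code. The sketch below is an illustrative addition (assuming NumPy); it computes the standard deviation of each player's scores, and the smallest value identifies the most consistent player.

```python
# Illustrative consistency check: smaller standard deviation = more consistent (assumes NumPy).
import numpy as np

scores = {"A": [70, 80, 90], "B": [75, 80, 85], "C": [65, 80, 95]}
for player, s in scores.items():
    print(player, "mean =", np.mean(s), "standard deviation =", round(float(np.std(s)), 2))
# Player B has the smallest standard deviation, so B is the most consistent.
```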

There are two types of measures of dispersion:

Absolute Measure of Dispersion

In absolute measure of dispersion, the measure is expressed in the original units in which the data is collected. For example, if data is collected in grams, the measure of dispersion will also be expressed in grams. The absolute measure of dispersion has the following types:

  • Range
  • Quartile Deviation
  • Average Deviation
  • Standard Deviation
  • Variance

Relative Measures of Dispersion

In the relative measures of dispersion, the measure is expressed in terms of coefficients, percentages, ratios, etc. It has the following types:

  • Coefficient of range
  • Coefficient of Quartile Deviation
  • Coefficient of Average Deviation
  • Coefficient of Variation (CV)

See more about Measures of Dispersion

Range and Coefficient of Range

Range is defined as the difference between the maximum value and the minimum value of the data; statistically, it is $R=x_{max} - x_{min}$.

The Coefficient of Range is $\frac{x_{max} - x_{min}}{x_{max} + x_{min}}$. Multiplying it by 100 expresses it as a percentage.

Consider the ungrouped data $x = 32, 36, 36, 37, 39, 41, 45, 46, 48$

The range will be $x_{max} - x_{min} = 48 - 32 = 16$.

The Coefficient of Range will be $\frac{x_{max} - x_{min}}{x_{max} + x_{min}}$:

\begin{align*}
Coef\,\, of\,\, Range &=\frac{x_{max} - x_{min} }{x_{max} + x_{min} } \\
&= \frac{48-32}{48+32} = \frac{16}{80} = 0.2\\
&= 0.2 \times 100 = 20\%
\end{align*}
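The same computation can be sketched in code. The snippet below is an illustrative addition using plain Python (no extra packages) on the ungrouped data above.

```python
# Illustrative computation of the range and coefficient of range for the ungrouped data.
x = [32, 36, 36, 37, 39, 41, 45, 46, 48]

data_range = max(x) - min(x)                          # 48 - 32 = 16
coef_range = (max(x) - min(x)) / (max(x) + min(x))    # 16 / 80 = 0.2
print(f"Range = {data_range}, Coefficient of Range = {coef_range:.2f} ({coef_range * 100:.0f}%)")
```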

For the following grouped data, the range and coefficient of the range will be

Classes | Freq | Class Boundaries
65 – 84 | 9 | 64.5 – 84.5
85 – 104 | 10 | 84.5 – 104.5
105 – 124 | 17 | 104.5 – 124.5
125 – 144 | 10 | 124.5 – 144.5
145 – 164 | 5 | 144.5 – 164.5
165 – 184 | 4 | 164.5 – 184.5
185 – 204 | 5 | 184.5 – 204.5
Total | 60 |

The upper class boundary of the highest class will be $x_{max}$ and the lower class boundary of the lowest class will be $x_{min}$. Therefore, $x_{max}=204.5$ and $x_{min} = 64.5$, and

$$Range = x_{max} - x_{min} = 204.5 - 64.5 = 140$$

The Coefficient of Range will be

\begin{align*}
Coef\,\, of\,\, Range &=\frac{x_{max} - x_{min} }{x_{max} + x_{min} } \\
&= \frac{204.5-64.5}{204.5+64.5} = \frac{140}{269} = 0.5204\\
&= 0.5204 \times 100 = 52.04\%
\end{align*}
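For grouped data only the extreme class boundaries are needed. The snippet below is an illustrative addition in plain Python.

```python
# Illustrative range and coefficient of range from the extreme class boundaries.
x_min, x_max = 64.5, 204.5    # lowest and highest class boundaries

grouped_range = x_max - x_min                      # 204.5 - 64.5 = 140
coef_range = (x_max - x_min) / (x_max + x_min)     # 140 / 269 = 0.5204 approximately
print(f"Range = {grouped_range}, Coefficient of Range = {coef_range:.4f} ({coef_range * 100:.2f}%)")
```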

Average Deviation and Coefficient of Average Deviation

The average deviation is an absolute measure of dispersion. The mean (average) of the absolute deviations, taken from the mean, median, or mode, is called the average deviation. Statistically, it is

$$Mean\,\, Deviation_{\overline{X}} = \frac{\sum\limits_{i=1}^n|x_i-\overline{x}|}{n}$$

$X$ | $x-\overline{x}$ | $|x-\overline{x}|$ | $x-\tilde{x}$ | $|x-\tilde{x}|$ | $x-\hat{x}$ | $|x-\hat{x}|$
32 | $32-40 = -8$ | 8 | $32-39=-7$ | 7 | $32-36=-4$ | 4
36 | $36-40=-4$ | 4 | $36-39=-3$ | 3 | $36-36=0$ | 0
36 | $36-40=-4$ | 4 | $36-39=-3$ | 3 | $36-36=0$ | 0
37 | $37-40=-3$ | 3 | $37-39=-2$ | 2 | $37-36=1$ | 1
39 | $39-40=-1$ | 1 | $39-39=0$ | 0 | $39-36=3$ | 3
41 | $41-40=1$ | 1 | $41-39=2$ | 2 | $41-36=5$ | 5
45 | $45-40=5$ | 5 | $45-39=6$ | 6 | $45-36=9$ | 9
46 | $46-40=6$ | 6 | $46-39=7$ | 7 | $46-36=10$ | 10
48 | $48-40=8$ | 8 | $48-39=9$ | 9 | $48-36=12$ | 12
Total | 0 | 40 | | 39 | | 44

Where
\begin{align*}
Mean &= \overline{x} = \frac{\sum\limits_{i=1}^n x_i}{n} = \frac{360}{9} = 40\\
Mode &= 36\\
Median &= 39\\
MD_{\overline{x}} &= \frac{\sum\limits_{i=1}^n |x-\overline{x}|}{n} = \frac{40}{9} = 4.44\\
MD_{\tilde{x}} &= \frac{\sum\limits_{i=1}^n |x-\tilde{x}|}{n} = \frac{39}{9} = 4.33\\
MD_{\hat{x}} &= \frac{\sum\limits_{i=1}^n |x-\hat{x}|}{n} = \frac{44}{9} = 4.89
\end{align*}

The relative measure of average deviation is the coefficient of average deviation. It can be calculated as follows:

Coefficient of Average Deviation from Mean (also called Mean Coefficient of Dispersion)

\begin{align*}\text{Mean Coefficient of Dispersion} = \frac{MD_{\overline{x}}}{\overline{x}} = \frac{4.44}{40}\times 100 = 11.1\%\end{align*}

Coefficient of Average Deviation from Median (also called Median Coefficient of Dispersion)

\begin{align*}\text{Median Coefficient of Dispersion} = \frac{MD_{\tilde{x}}}{\tilde{x}} = \frac{4.33}{39}\times 100 = 11.1\%\end{align*}

Coefficient of Average Deviation from Mode (also called Mode Coefficient of Dispersion)

\begin{align*}\text{Mode Coefficient of Dispersion} = \frac{MD_{\hat{x}}}{\hat{x}} = \frac{4.89}{36}\times 100 = 13.58\%\end{align*}
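The three average deviations and their coefficients can be reproduced with a short script. The sketch below is an illustrative addition assuming NumPy and the standard-library `statistics` module.

```python
# Illustrative computation of average deviations from mean, median, and mode (assumes NumPy).
import numpy as np
from statistics import mode

x = np.array([32, 36, 36, 37, 39, 41, 45, 46, 48])
centers = {"mean": x.mean(), "median": np.median(x), "mode": mode(x)}

for name, c in centers.items():
    md = np.mean(np.abs(x - c))   # average (mean absolute) deviation from the chosen center
    print(f"MD from {name} ({c}) = {md:.2f}, coefficient = {md / c * 100:.2f}%")
```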

Average Deviation for Grouped Data

One can also compute average deviations for grouped data (Discrete Case) as follows:

$x$ (Mid Point) | $f$ | $fx$ | $|x-\overline{x}|$ | $f|x-\overline{x}|$ | $|x-\tilde{x}|$ | $f|x-\tilde{x}|$
10 | 9 | 90 | $|10-34|=24$ | 216 | 20 | 180
20 | 10 | 200 | $|20-34|=14$ | 140 | 10 | 100
30 | 17 | 510 | $|30-34|=4$ | 68 | 0 | 0
40 | 10 | 400 | $|40-34|=6$ | 60 | 10 | 100
50 | 5 | 250 | $|50-34|=16$ | 80 | 20 | 100
60 | 4 | 240 | $|60-34|=26$ | 104 | 30 | 120
70 | 5 | 350 | $|70-34|=36$ | 180 | 40 | 200
Total | 60 | 2040 | | 848 | | 800

\begin{align*}
\overline{x} &= \frac{\sum\limits_{i=1}^n f_ix_i}{n} = \frac{2040}{60} = 34\\
\tilde{x} &= 30\\
\hat{x} &= 30\\
MD_{\overline{x}} &= \frac{\sum\limits_{i=1}^n f|x-\overline{x}|}{n} = \frac{848}{60} = 14.13\\
MD_{\tilde{x}} &= \frac{\sum\limits_{i=1}^n f|x-\tilde{x}|}{n} = \frac{800}{60} = 13.33\\
MD_{\hat{x}} &= \frac{\sum\limits_{i=1}^n f|x-\hat{x}|}{n} = \frac{800}{60} = 13.33\\
\text{Mean Coefficient of Dispersion} &= \frac{MD_{\overline{x}}}{\overline{x}} = \frac{14.13}{34}\times 100 = 41.57\%\\
\text{Median Coefficient of Dispersion} &= \frac{MD_{\tilde{x}}}{\tilde{x}} = \frac{13.333}{30}\times100=44.44\%
\end{align*}
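For the grouped (discrete) data, the same quantities follow from the midpoints and frequencies. The sketch below is an illustrative addition assuming NumPy.

```python
# Illustrative grouped average deviations using midpoints and frequencies (assumes NumPy).
import numpy as np

x = np.array([10, 20, 30, 40, 50, 60, 70])   # class midpoints
f = np.array([9, 10, 17, 10, 5, 4, 5])       # frequencies

mean = np.sum(f * x) / f.sum()               # 2040 / 60 = 34
median = mode = 30                           # both equal 30 for this data

md_mean = np.sum(f * np.abs(x - mean)) / f.sum()      # 848 / 60 = 14.13
md_median = np.sum(f * np.abs(x - median)) / f.sum()  # 800 / 60 = 13.33
print(f"MD(mean) = {md_mean:.2f}, coefficient = {md_mean / mean * 100:.2f}%")
print(f"MD(median) = {md_median:.2f}, coefficient = {md_median / median * 100:.2f}%")
```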

Importance of Dispersion in Statistics

From the above discussion and numerical examples, it is clear that variability or dispersion is crucial in statistics. The following are some reasons for the importance of dispersion in statistics:

  • Understanding Data Spread: Variability gives insights into the spread or distribution of data, helping to understand how much individual data points differ from the average or some other measure.
  • Data Reliability: Lower variability in data can indicate higher reliability and consistency, which is key for making sound predictions and decisions.
  • Identifying Outliers: High variability can indicate the presence of outliers or anomalies in the data, which might require further investigation.
  • Comparing Datasets: Dispersion measures, such as variance and standard deviation, allow for the comparison of different datasets. Two datasets might have the same mean but different levels of dispersion, which can imply different data patterns or behaviors.
  • Risk Assessment: In fields like finance, assessing the variability of returns is crucial for understanding and managing risk. Higher variability often implies higher risk.
  • Statistical Inferences: Many statistical methods, such as hypothesis testing and confidence intervals, rely on the variability of data to make accurate inferences about populations from samples.
  • Balanced Decision Making: Understanding variability helps in making more informed decisions by providing a clearer picture of the data’s characteristics and potential fluctuations.

Overall, variability is essential for a comprehensive understanding of data, enabling analysts to draw meaningful conclusions and make informed decisions.


MCQs Basic Statistics Quiz 19

This Statistics Test is about the MCQs Basic Statistics Quiz with Answers. There are 20 multiple-choice questions covering the basics of statistics, measures of central tendency, measures of dispersion, measures of position, and the distribution of data. Let us start with the MCQs Basic Statistics Quiz with Answers.

Online Multiple-Choice Questions about Basic Statistics with Answers

1. Mode of the values 3, 5, 8, 10, and 12 is

 
 
 
 

2. If any value in the data is negative, it is not possible to calculate

 
 
 
 

3. The median is larger than the arithmetic mean when

 
 
 
 

4. The most important measure of dispersion is

 
 
 
 

5. The difference between the largest and smallest value in the data is called

 
 
 
 

6. Mode of the values 2, 6, 8, 6, 12, 15, 18, and 8 is

 
 
 
 

7. Who used the term Statistics for the first time?

 
 
 
 

8. Two sets of distributions are as follows. For both of the sets, the Range is the same. Which of the demerits of the Range is shown here by these sets of distributions?
Distribution 1: 30 14 18 25 12
Distribution 2: 30 7 19 27 12

 
 
 
 

9. What would be the changes in the standard deviation if different values are increased by a constant?

 
 
 
 

10. The first step in computing the median is

 
 
 
 

11. If each observation in the data is multiplied by 6, the mean is multiplied by

 
 
 
 

12. For a distribution, if the value of the mean is 20 and the mode is 14, then what is the value of the median?

 
 
 
 

13. Which of the following is an absolute measure of dispersion

 
 
 
 

14. Which of the properties of the Average Deviation is based on a mathematically wrong assumption?

 
 
 
 

15. Fill in the missing words to the quote: “Statistical methods may be described as methods for drawing conclusions about —————- based on ————– computed from the —————“.

 
 
 
 

16. In general, which of the following statements is FALSE?

 
 
 
 

17. If $x=3$ then which of the following is the minimum

 
 
 
 

18. The dispersion expressed in the form of a ratio or coefficient, independent of the units of measurement, is called

 
 
 
 

19. Which of the following is a relative measure of dispersion

 
 
 
 

20. The half of the difference between the third and first quartiles is called

 
 
 
 
