To learn about errors in statistics, we first need to understand the concepts related to true value, accuracy, and precision. Let us start with these basic concepts.
True Value
The true value is the value that would be obtained if no errors of any kind were made in obtaining the information or in computing the characteristics of the population under study.
The true value of a population characteristic can be obtained only if exact procedures are used to collect correct data, every element of the population is covered, and no mistake or even the slightest negligence occurs during data collection or analysis. Because these conditions are almost never fully met, the true value is usually regarded as an unknown constant.
Accuracy
Accuracy refers to the closeness of a sample result to the true value: the smaller the difference between them, the greater the accuracy. Accuracy can be increased by
- Elimination of technical errors
- Increasing the sample size
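Both levers can be seen in a minimal sketch, assuming a tiny hypothetical population whose mean (7) plays the role of the true value and a hypothetical technical error that adds 1 to every recorded value. Instead of random draws, it averages the absolute difference between the sample result and the true value over every possible sample, so the numbers are exact and reproducible; `mean_abs_difference` is an illustrative helper, not a standard function.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population; its mean (7) serves as the true value.
POPULATION = [2, 4, 6, 8, 10, 12]
TRUE_VALUE = Fraction(sum(POPULATION), len(POPULATION))


def mean_abs_difference(n, recording_error=0):
    """Average |sample mean - true value| over every sample of size n drawn
    without replacement. recording_error is a hypothetical constant added to
    each observed value to mimic a technical error."""
    gaps = []
    for sample in combinations(POPULATION, n):
        observed = [y + recording_error for y in sample]
        sample_mean = Fraction(sum(observed), n)
        gaps.append(abs(sample_mean - TRUE_VALUE))
    return sum(gaps) / len(gaps)


for n in (2, 4):
    # sample size, without technical error, with technical error
    print(n, mean_abs_difference(n), mean_abs_difference(n, recording_error=1))
```

The printed averages are 26/15 and 29/15 for n = 2, and 13/15 and 6/5 for n = 4: removing the recording error and enlarging the sample each bring the sample result closer to the true value on average.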
Precision
Precision refers to how closely we can reproduce, from a sample, the results that would be obtained if a complete count (census) were taken using the same method of measurement.
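Because precision is judged against a census taken with the same method, a constant measurement flaw does not affect it. The sketch below illustrates this under stated assumptions: a hypothetical six-unit population and a hypothetical recording error that adds 1 to every measurement, applied to the sample and the census alike. `mean_abs_gap_to_census` is an illustrative helper.

```python
from fractions import Fraction
from itertools import combinations

POPULATION = [2, 4, 6, 8, 10, 12]  # hypothetical population


def mean_abs_gap_to_census(n, recording_error=0):
    """Average |sample mean - census mean| over every sample of size n drawn
    without replacement, when sample and census use the same (possibly
    flawed) measurement method."""
    measured = [y + recording_error for y in POPULATION]
    census_value = Fraction(sum(measured), len(measured))
    gaps = [abs(Fraction(sum(s), n) - census_value)
            for s in combinations(measured, n)]
    return sum(gaps) / len(gaps)


# The constant error shifts every sample mean and the census mean equally.
print(mean_abs_gap_to_census(2), mean_abs_gap_to_census(2, recording_error=1))
```

Both calls return 26/15: the precision is the same with or without the recording error, even though the flawed method moves every result one unit away from the true value and so lowers accuracy.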
Errors in Statistics
The difference between an estimated value and the population's true value is called an error. A sample estimate is used to describe a characteristic of the population, but a sample is only part of the population and cannot represent it perfectly, no matter how carefully it is selected. In practice, an estimate rarely equals the true value, so the natural question is how close the sample estimate is likely to be to the population's true value. There are two kinds of errors: sampling and non-sampling errors.
- Sampling error (random error)
- Non-sampling errors (nonrandom errors)
Sampling Errors
A sampling error is the difference between the value of a statistic obtained from an observed random sample and the value of the corresponding population parameter being estimated. Sampling errors occur because of the natural variability between samples. If $T$ is a sample statistic used to estimate the population parameter $\theta$, the sampling error, denoted by $E$, is
$$E=T-\theta$$
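For a concrete picture of $E = T - \theta$, the sketch below takes $T$ to be the sample mean and $\theta$ the mean of a small hypothetical population, then computes the sampling error of every possible sample of size 2 drawn without replacement. The population values and the helper name `sampling_errors` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

POPULATION = [2, 4, 6, 8, 10, 12]                   # hypothetical population
THETA = Fraction(sum(POPULATION), len(POPULATION))  # parameter: population mean


def sampling_errors(n):
    """Return E = T - theta for every simple random sample of size n
    (without replacement), where T is the sample mean."""
    return [Fraction(sum(s), n) - THETA for s in combinations(POPULATION, n)]


errors = sampling_errors(2)
# smallest error, largest error, average error over all 15 samples
print(min(errors), max(errors), sum(errors) / len(errors))
```

It prints `-4 4 0`: individual samples miss the parameter by up to 4 units in either direction, yet for the sample mean under simple random sampling these errors average out to zero.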
The size of the sampling error reflects the precision of the estimate: the smaller the sampling error, the greater the precision. Sampling error can be reduced by:
- Increasing the sample size
- Improving the sampling design
- Using supplementary information
Usually, sampling error arises when a sample is selected from a larger population to make inferences about the whole population.
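The effect of sample size can be checked exactly on a small hypothetical population. This sketch averages $E^2$ over every sample of size $n$ drawn without replacement and compares the result with the standard expression for the variance of the sample mean under simple random sampling without replacement, $(1 - n/N)S^2/n$, where $S^2$ is the population variance with divisor $N-1$. The population and helper name are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations

POPULATION = [2, 4, 6, 8, 10, 12]  # hypothetical population
N = len(POPULATION)
THETA = Fraction(sum(POPULATION), N)
# Population variance with divisor N - 1.
S2 = sum((y - THETA) ** 2 for y in POPULATION) / (N - 1)


def mean_squared_sampling_error(n):
    """Average of E**2 = (T - theta)**2 over all samples of size n,
    with T the sample mean."""
    sq = [(Fraction(sum(s), n) - THETA) ** 2 for s in combinations(POPULATION, n)]
    return sum(sq) / len(sq)


for n in (2, 4):
    formula = (1 - Fraction(n, N)) * S2 / n  # variance of the mean, SRS without replacement
    print(n, mean_squared_sampling_error(n), formula)
```

Both columns agree (14/3 for n = 2, 7/6 for n = 4), and doubling the sample size here cuts the mean squared sampling error to a quarter, partly through the finite-population factor $1 - n/N$.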
Non-Sampling Errors
The errors that are caused by sampling the wrong population of interest and by response bias as well as those made by an investigator in collecting, analyzing, and reporting data are all classified as non-sampling errors (or non-random errors). These errors are present in a complete census as well as in a sampling survey.
Bias
Bias is the difference between the expected value of a statistic and the true value of the parameter being estimated. If $T$ is a sample statistic used to estimate the population parameter $\theta$, the bias is
$$\text{Bias} = E(T) - \theta$$
The bias is positive if $E(T)>\theta$, negative if $E(T)<\theta$, and zero if $E(T)=\theta$. Bias is the systematic component of error: it describes the long-run tendency of the sample statistic to differ from the parameter in a particular direction. Bias is cumulative: it does not cancel out as more units are observed, so, unlike sampling error, it is not reduced by increasing the sample size and may even grow with it. If proper methods of selecting units for the sample are not followed, the sample result will not be free from bias.
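As a sketch of how improper selection produces bias, assume a hypothetical population and a hypothetical flawed sampling frame from which the largest unit was left out, so it can never be selected. Here $E(T)$ is computed exactly as the average of the sample mean over all equally likely samples of size $n$ drawn without replacement from the frame; `bias_of_mean` and `FRAME` are illustrative names.

```python
from fractions import Fraction
from itertools import combinations

POPULATION = [2, 4, 6, 8, 10, 12]                   # hypothetical population
THETA = Fraction(sum(POPULATION), len(POPULATION))  # true mean = 7

# Hypothetical flawed frame: the unit with value 12 was never listed.
FRAME = [y for y in POPULATION if y != 12]


def bias_of_mean(n, frame):
    """E(T) - theta, where E(T) is the exact average of the sample mean over
    all equally likely samples of size n drawn without replacement from frame."""
    means = [Fraction(sum(s), n) for s in combinations(frame, n)]
    return sum(means) / len(means) - THETA


for n in (1, 2, 3, 4, 5):
    # sample size, bias with the complete frame, bias with the flawed frame
    print(n, bias_of_mean(n, POPULATION), bias_of_mean(n, FRAME))
```

With the complete frame the bias is 0 for every sample size; with the flawed frame it stays at -1 for every sample size, so a larger sample does nothing to remove it.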
Note that non-sampling errors can be difficult to identify and quantify, and their presence can significantly affect the accuracy of statistical results. By understanding and addressing these errors, researchers can improve the reliability and validity of their statistical findings.