Errors in Statistics: A Comprehensive Guide

To learn about errors in statistics, we first need to understand the concepts related to true value, accuracy, and precision. Let us start with these basic concepts.

True Value

The true value is the value that would be obtained if no errors of any kind were made in collecting the information or in computing the characteristics of the population under study.

The true value of the population can be obtained only if exact procedures are used to collect correct data, every element of the population is covered, and no mistake or negligence, however slight, occurs during data collection and analysis. It is usually regarded as an unknown constant.

Accuracy

Accuracy refers to the difference between the sample result and the true value. The smaller the difference, the greater the accuracy. Accuracy can be increased by:

  • Elimination of technical errors
  • Increasing the sample size

Precision

Precision refers to how closely we can reproduce, from a sample, the results that would be obtained if a complete count (census) was taken using the same method of measurement.

Errors in Statistics

The difference between an estimated value and the population’s true value is called an error. Because a sample is only a part of the population, a sample estimate cannot represent the population perfectly, no matter how carefully the sample is selected. An estimate is rarely equal to the true value, so the practical question is how close the sample estimate is likely to be to the population’s true value. There are two kinds of errors:

  • Sampling error (random error)
  • Non-sampling errors (nonrandom errors)

Sampling Errors

A sampling error is the difference between the value of a statistic obtained from an observed random sample and the value of the corresponding population parameter being estimated. Sampling errors occur due to the natural variability between samples. Let $T$ be the sample statistic used to estimate the population parameter $\theta$. The sampling error, denoted by $E$, is

$$E=T-\theta$$

The value of the sampling error reveals the precision of the estimate. The smaller the sampling error, the greater the precision of the estimate. The sampling error may be reduced in the following ways:

  • By increasing the sample size
  • By improving the sampling design
  • By using the supplementary information

Usually, sampling error arises when a sample is selected from a larger population to make inferences about the whole population.
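As a rough illustration of $E = T - \theta$, the sketch below draws repeated simple random samples from a made-up population (the integers 1 to 1000, an assumption chosen only for the example) and averages the absolute sampling error of the sample mean. The function name `mean_abs_sampling_error` is hypothetical.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, reps=200, seed=0):
    """Average |E| = |T - theta| over `reps` simple random samples of size n."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    theta = statistics.fmean(population)      # population parameter (mean)
    errors = []
    for _ in range(reps):
        sample = rng.sample(population, n)    # sampling without replacement
        t = statistics.fmean(sample)          # sample statistic T
        errors.append(abs(t - theta))         # |E| = |T - theta|
    return statistics.fmean(errors)

population = list(range(1, 1001))             # hypothetical population
small = mean_abs_sampling_error(population, n=25)
large = mean_abs_sampling_error(population, n=400)
census = mean_abs_sampling_error(population, n=len(population), reps=5)
print(small, large, census)
```

Under these assumptions the average error for the larger sample should come out smaller, and a complete count (sample size equal to the population size) leaves no sampling error at all.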

Non-Sampling Errors

The errors that are caused by sampling the wrong population of interest and by response bias, as well as those made by an investigator in collecting, analyzing, and reporting data, are all classified as non-sampling errors (or non-random errors). These errors are present in a complete census as well as in a sample survey.

Bias

Bias is the difference between the expected value of a statistic and the true value of the parameter being estimated. Let $T$ be the sample statistic used to estimate the population parameter $\theta$, then the amount of bias is

$$\text{Bias} = E(T) - \theta$$

The bias is positive if $E(T)>\theta$, bias is negative if $E(T) <\theta$, and bias is zero if $E(T)=\theta$. The bias is a systematic component of error that refers to the long-run tendency of the sample statistic to differ from the parameter in a particular direction. Bias is cumulative and increases with the increase in size of the sample. If proper methods of selection of units in a sample are not followed, the sample result will not be free from bias.
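To make $E(T) - \theta$ concrete, here is a minimal sketch that computes the bias exactly by enumerating every equally likely ordered sample of size 2 drawn with replacement from a tiny made-up population. The population values and all function names are assumptions for illustration. It compares two variance estimators: one dividing by $n$ and one dividing by $n-1$.

```python
from itertools import product
from statistics import fmean

def exact_bias(population, n, estimator, parameter):
    """Bias = E(T) - theta, with E(T) averaged over all equally likely
    ordered samples of size n drawn with replacement."""
    theta = parameter(population)
    expected_t = fmean(estimator(s) for s in product(population, repeat=n))
    return expected_t - theta

def pop_variance(xs):
    """Variance with divisor len(xs)."""
    m = fmean(xs)
    return fmean((x - m) ** 2 for x in xs)

def var_divide_by_n_minus_1(sample):
    """Variance with divisor len(sample) - 1."""
    m = fmean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

population = [2, 4, 6, 8]   # hypothetical tiny population, variance 5

bias_n = exact_bias(population, 2, pop_variance, pop_variance)
bias_n1 = exact_bias(population, 2, var_divide_by_n_minus_1, pop_variance)
bias_mean = exact_bias(population, 2, fmean, fmean)
print(bias_n, bias_n1, bias_mean)
```

In this setup the divide-by-$n$ estimator has negative bias ($E(T) < \theta$), while the divide-by-$(n-1)$ estimator and the sample mean have zero bias.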

Note that non-sampling errors can be difficult to identify and quantify, and their presence can significantly reduce the accuracy of statistical results. By understanding and addressing these errors, researchers can improve the reliability and validity of their statistical findings.

https://rfaqs.com, https://gmstat.com

Quartiles

Introduction to Quantiles and Quartiles

Quantiles are values that divide sorted data into equal parts. For example, quartiles divide the data into four equal parts; the word quartile comes from quarter, meaning a fourth part. Deciles divide the data into ten equal parts; the name comes from deci, meaning a tenth part. Percentiles divide the data into one hundred equal parts; the name comes from percent, meaning a hundredth part.

Therefore, quartiles, deciles, and percentiles divide the data into 4, 10, and 100 parts, respectively. Quartiles, deciles, and percentiles are collectively called quantiles.
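As a quick sketch, Python's standard library function `statistics.quantiles` returns the cut points for any number of equal parts. The data set below (the integers 1 to 199) is made up for illustration. Dividing into 4, 10, and 100 parts gives 3, 9, and 99 cut points, respectively.

```python
from statistics import quantiles

data = list(range(1, 200))            # hypothetical data: 1, 2, ..., 199

quartiles = quantiles(data, n=4)      # 3 cut points -> 4 equal parts
deciles = quantiles(data, n=10)       # 9 cut points -> 10 equal parts
percentiles = quantiles(data, n=100)  # 99 cut points -> 100 equal parts

print(quartiles)
print(len(deciles), len(percentiles))
# The median is at once the 2nd quartile, 5th decile, and 50th percentile.
print(quartiles[1], deciles[4], percentiles[49])
```

The default `method="exclusive"` places the cut points at positions based on $n+1$, so other conventions may give slightly different values on small data sets.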

Quartiles

Quartiles are the values that divide sorted data into four equal parts. The three quartiles ($Q_1$, $Q_2$, and $Q_3$) are the cut points between these parts, so the position of each quartile is found by dividing the number of observations into quarters.

Quartiles for Ungrouped Data

\begin{align*}
Q_1 &= \left(\frac{n+1}{4}\right)\text{th value is the } \frac{1}{4} \text{ part}\\
Q_2 &= \left(\frac{2(n+1)}{4}\right)\text{th value is the } \frac{2}{4} \text{ part}\\
Q_3 &= \left(\frac{3(n+1)}{4}\right)\text{th value is the } \frac{3}{4} \text{ part}
\end{align*}

The following ungrouped data set has 96 observations $(n=96)$:

 22   22   25   25   30   30   30   31   31   33   36   39
 40   40   42   42   48   48   50   51   52   55   57   59
 81   86   89   89   90   91   91   91   92   93   93   93
 93   94   94   94   95   96   96   96   97   97   98   98
 99   99   99  100  100  100  101  101  102  102  102  102
102  103  103  104  104  104  105  106  106  106  107  108
108  108  109  109  109  110  111  112  112  113  113  113
113  114  115  116  116  117  117  117  118  118  119  121

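The first, second, and third quartiles of the above data set are computed below. Note that this worked example takes the positions $\frac{n}{4}$, $\frac{2n}{4}$, and $\frac{3n}{4}$ rather than the $(n+1)$-based positions, which gives whole-number positions here:

\begin{align*}
Q_1 &= \left(\frac{n}{4}\right)\text{th value} = \left(\frac{96}{4}\right)\text{th value} = 24\text{th value} = 59\\
Q_2 &= \left(\frac{2n}{4}\right)\text{th value} = \left(\frac{2\times 96}{4}\right)\text{th value} = 48\text{th value} = 98\\
Q_3 &= \left(\frac{3n}{4}\right)\text{th value} = \left(\frac{3\times 96}{4}\right)\text{th value} = 72\text{nd value} = 108
\end{align*}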

Note that the above data is already sorted. If the data is not sorted, we first need to arrange/sort it in ascending order.
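The position rules can be checked with a short sketch on the 96 observations listed above. The helper `value_at_position` is a hypothetical name, and linear interpolation for fractional positions is an assumption, since the text does not say how fractional positions should be handled. The script evaluates both the $\frac{kn}{4}$ positions and the $\frac{k(n+1)}{4}$ positions.

```python
from statistics import quantiles

data = [
    22, 22, 25, 25, 30, 30, 30, 31, 31, 33, 36, 39,
    40, 40, 42, 42, 48, 48, 50, 51, 52, 55, 57, 59,
    81, 86, 89, 89, 90, 91, 91, 91, 92, 93, 93, 93,
    93, 94, 94, 94, 95, 96, 96, 96, 97, 97, 98, 98,
    99, 99, 99, 100, 100, 100, 101, 101, 102, 102, 102, 102,
    102, 103, 103, 104, 104, 104, 105, 106, 106, 106, 107, 108,
    108, 108, 109, 109, 109, 110, 111, 112, 112, 113, 113, 113,
    113, 114, 115, 116, 116, 117, 117, 117, 118, 118, 119, 121,
]
data.sort()  # already sorted here, but sorting is required in general

def value_at_position(sorted_data, pos):
    """Value at 1-based position `pos`; fractional positions are
    linearly interpolated between the two neighbouring values."""
    whole = int(pos)
    frac = pos - whole
    if whole < 1:
        return float(sorted_data[0])
    if whole >= len(sorted_data):
        return float(sorted_data[-1])
    below, above = sorted_data[whole - 1], sorted_data[whole]
    return below + frac * (above - below)

n = len(data)
simple = [value_at_position(data, k * n / 4) for k in (1, 2, 3)]
interpolated = [value_at_position(data, k * (n + 1) / 4) for k in (1, 2, 3)]

print(simple)                 # positions 24, 48, 72
print(interpolated)           # positions 24.25, 48.5, 72.75
print(quantiles(data, n=4))   # standard library, default "exclusive" method
```

With the $(n+1)$-based positions, $Q_1$ falls between the 24th value (59) and the 25th value (81), so the result depends noticeably on which convention is used for this data set.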

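Quartiles for Grouped Data

Quantiles, and hence quartiles, can also be computed for grouped data such as the following frequency distribution, where f is the class frequency, x is the class midpoint, C.B. is the class boundaries, and CF is the cumulative frequency.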

Classes    f     x        C.B.           CF
65-84      9     74.5     64.5-84.5      9
85-104     10    94.5     84.5-104.5     19
105-124    17    114.5    104.5-124.5    36
125-144    10    134.5    124.5-144.5    46
145-164    5     154.5    144.5-164.5    51
165-184    4     174.5    164.5-184.5    55
185-204    5     194.5    184.5-204.5    60
Total      60

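From the grouped data above, we have $n = \sum f_i = 60$ observations. In the formulas below, $l$ is the lower class boundary of the class containing the quartile, $h$ is the class width, $f$ is the frequency of that class, and $CF$ is the cumulative frequency of the preceding class. The three quartiles are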

\begin{align*}
\frac{n}{4} &= \left(\frac{60}{4}\right)\text{th} = 15\text{th value}\\
Q_1 &= l + \frac{h}{f}\left(\frac{n}{4} - CF\right) = 84.5 + \frac{20}{10}(15-9) = 96.5\\
\frac{2n}{4} &= \left(\frac{2\times 60}{4} \right)\text{th} = 30\text{th value}\\
Q_2 &= l + \frac{h}{f}\left(\frac{2n}{4} - CF\right) = 104.5 + \frac{20}{17}(30-19) = 117.44\\
\frac{3n}{4} &= \left(\frac{3\times 60}{4} \right)\text{th} = 45\text{th value}\\
Q_3 &= l + \frac{h}{f}\left(\frac{3n}{4} - CF\right) = 124.5 + \frac{20}{10}(45-36) = 142.5
\end{align*}
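The grouped-data calculation can be sketched in a few lines. The function name `grouped_quantile` and its parameters are hypothetical. It assumes equal class widths and takes the lower class boundaries and frequencies from the table above.

```python
def grouped_quantile(lower_bounds, freqs, width, k, parts=4):
    """k-th quantile out of `parts` for grouped data:
    l + (h / f) * (k * n / parts - CF), where CF is the cumulative
    frequency of the classes before the quantile class."""
    n = sum(freqs)
    target = k * n / parts            # e.g. n/4, 2n/4, 3n/4 for quartiles
    cum = 0                           # cumulative frequency so far (CF)
    for l, f in zip(lower_bounds, freqs):
        if cum + f >= target:         # first class whose CF reaches target
            return l + (width / f) * (target - cum)
        cum += f
    raise ValueError("k must satisfy 0 < k <= parts")

lower_bounds = [64.5, 84.5, 104.5, 124.5, 144.5, 164.5, 184.5]
freqs = [9, 10, 17, 10, 5, 4, 5]
h = 20                                # class width

q1, q2, q3 = (grouped_quantile(lower_bounds, freqs, h, k) for k in (1, 2, 3))
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

Setting `parts=10` or `parts=100` in the same function gives deciles or percentiles for the grouped data.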

Frequently Asked Questions about Quantiles

  1. Define quartiles, deciles, and percentiles.
  2. What are fractiles or quantiles?
  3. How are quantiles computed for grouped and ungrouped data?


MCQs Skewness Basic Statistics 9

This post is about MCQs Skewness Basic Statistics. There are 20 multiple-choice questions covering skewness, kurtosis, symmetrical distributions, and the empirical relationship between mean, median, and mode. Let us start with the MCQs Skewness Basic Statistics Quiz.

Online MCQs Skewness and Kurtosis

1. The second moment about the mean is equal to

2. Bowley’s coefficient of skewness lies between

3. For a moderately skewed distribution, which of the following holds

4. If in a distribution the left tail is longer than the right tail, then the distribution will be

5. The empirical relation between mean, median, and mode is

6. In a unimodal distribution, if the mode is less than the mean, then the distribution will be

7. If the mean is less than the mode, the distribution is

8. The shape of the symmetrical distribution is _______

9. In a symmetrical distribution, mean, median, and mode are:

10. Which of the following is negatively skewed?

11. A symmetrical distribution has a mean equal to 4. Its mode will be

12. The values of mean, median, and mode can be

13. The distribution in which mean = 60 and mode = 50, will be ____________

14. The distribution is symmetrical if the moment coefficient of skewness $\sqrt{b_1}$ is

15. For a positively skewed distribution

16. If mean, median, and mode are all equal, then the distribution will be

17. If the mean is less than the mode, the distribution will be

18. A curve whose tail is longer to the right is called

19. In a symmetrical distribution, the mean is _________ mode

20. If the third moment about the mean is zero, then the distribution is
