MCQs on Statistical Inference 9

This quiz contains 20 MCQs on Statistical Inference with answers, focusing on hypothesis testing and p-values. It covers the formulation of the null and alternative hypotheses, the level of significance, test statistics, the region of rejection, and the decision to reject or not reject the null hypothesis. Let us start with the MCQs on Statistical Inference.

Online MCQs on Statistical Inference with Answers

1. Suppose a research article indicates a $p = 0.001$ value in the results section ($\alpha = 0.05$).

You have absolutely proven your alternative hypothesis (that is, you have proven that there is a difference between the population means).

2. Suppose a research article indicates a value of $p = 0.001$ in the results section ($\alpha = 0.05$).

The p-value of a statistical test is the probability of the observed result or a more extreme result, assuming the null hypothesis is true.

3. Suppose a research article indicates a $p = 0.30$ value in the results section ($\alpha = 0.05$).

The alternative hypothesis has been shown to be false.

4. Suppose a research article indicates a value of $p = 0.001$ in the results section ($\alpha = 0.05$).

You have found the probability of the null hypothesis being true ($p = .001$).

5. Studies A and B are identical, except that all tests reported in Study A were pre-registered at a publicly available location (and the reported tests match the pre-registered tests), whereas none of the tests in Study B were pre-registered. Both studies contain analyses with covariates. Based on research on flexibility in data analysis, we can expect that, on average, Study A will have ________; the covariate analyses are ________.

6. Suppose a research article indicates a $p = 0.30$ value in the results section ($\alpha = 0.05$).

You have proven the null hypothesis (that is, you have proven that there is no difference between the population means).

7. A Type-I error is ________, and the Type-I error rate is determined by ________.

8. When the null hypothesis is true, the probability of finding a specific p-value is ________.
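Questions 7 and 8 can be explored with a small simulation. The Python sketch below is illustrative only: it assumes a two-sided one-sample z-test with known $\sigma = 1$, applied to normally distributed data for which $H_0: \mu = 0$ is true, and the function name `null_p_values`, the sample size, the number of tests, and the seed are arbitrary choices. Under these assumptions the p-values spread roughly evenly over $[0, 1]$, and about 5% of them fall below $\alpha = 0.05$, which is the Type I error rate.

```python
import random
from statistics import NormalDist

def null_p_values(n_tests=20_000, n=10, seed=1):
    """Two-sided z-test p-values for data simulated under H0: mu = 0.

    Assumptions (illustration only): observations are N(0, 1) and
    sigma = 1 is known, so z = xbar / (1 / sqrt(n)).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_tests):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * n ** 0.5
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values

p_values = null_p_values()

# Share of p-values in each tenth of [0, 1]; roughly 0.10 each if uniform
bins = [0] * 10
for p in p_values:
    bins[min(int(p * 10), 9)] += 1
shares = [count / len(p_values) for count in bins]
print([round(s, 3) for s in shares])

# Share of "significant" results at alpha = 0.05, i.e. the Type I error rate
type1_rate = sum(p < 0.05 for p in p_values) / len(p_values)
print(round(type1_rate, 3))
```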

9. It is important to have access to all (and not just statistically significant) research findings to be able to ________. A consequence of publication bias is that ________.

10. When $H_0$ is true, the probability that at least 1 out of $X$ completely independent findings is a Type 1 error is equal to ________, and this probability ________ when you look at your data and collect more data if a test is not significant.
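For question 10, when every null hypothesis is true, the probability of at least one Type 1 error among $X$ independent tests at level $\alpha$ is $1 - (1 - \alpha)^X$. The Python sketch below evaluates this formula and also simulates "optional stopping" (testing after every batch of observations and stopping as soon as $p < \alpha$). The batch size, the number of looks, the z-test with known $\sigma = 1$, and the helper names are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist

def familywise_error(alpha, k):
    """P(at least one Type I error) in k independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(k, round(familywise_error(0.05, k), 3))

def optional_stopping_rate(n_sims=10_000, batch=10, max_looks=5, alpha=0.05, seed=2):
    """Share of null studies that end up 'significant' when the analyst tests
    after each batch of observations and stops as soon as p < alpha.

    Hypothetical design: N(0, 1) data, known sigma = 1, two-sided z-test.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(max_looks):
            total += sum(rng.gauss(0.0, 1.0) for _ in range(batch))
            n += batch
            z = total / n ** 0.5  # equals xbar / (1 / sqrt(n))
            if 2 * (1 - std_normal.cdf(abs(z))) < alpha:
                hits += 1
                break
    return hits / n_sims

rate = optional_stopping_rate()
print(round(rate, 3))  # noticeably above the nominal 0.05
```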

11. Suppose a research article indicates a $p = 0.001$ value in the results section ($\alpha = 0.05$).

Obtaining a statistically significant result implies that the effect detected is important.

12. Person A is very skeptical about homeopathy. Person B believes strongly in homeopathy. They both read a study about homeopathy, which reports a positive effect and $p < 0.05$. Person A would be more likely than Person B to conclude that ________, and Person B would be more likely than Person A to think that ________.
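Question 12 can be framed with Bayes' theorem: the probability that the alternative hypothesis is true, given a significant result, depends on how plausible it was beforehand. The Python sketch below assumes a power of 0.80 and $\alpha = 0.05$; the prior probabilities (0.01 for the skeptic, 0.90 for the believer) and the function name are hypothetical choices for illustration, not values from any study.

```python
def prob_h1_given_significant(prior_h1, power=0.80, alpha=0.05):
    """Bayes' theorem: P(H1 true | p < alpha) for a given prior belief in H1."""
    true_positive = power * prior_h1         # H1 true and the test detects it
    false_positive = alpha * (1 - prior_h1)  # H0 true but p < alpha anyway
    return true_positive / (true_positive + false_positive)

skeptic = prob_h1_given_significant(prior_h1=0.01)   # hypothetical prior for Person A
believer = prob_h1_given_significant(prior_h1=0.90)  # hypothetical prior for Person B
print(round(skeptic, 3), round(believer, 3))  # 0.139 0.993
```

Under these assumptions, the same significant result leaves the skeptic at roughly a 14% belief in a real effect, so a false positive remains the more likely explanation, while the believer ends up above 99%.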

13. Suppose a research article indicates a $p = 0.30$ value in the results section ($\alpha = 0.05$).

The p-value gives the probability of obtaining a significant result whenever a given experiment is replicated.

14. After finding a single statistically significant p-value, we can conclude that ________, but it would be incorrect to conclude that ________.

15. Suppose a research article indicates a value of $p = 0.30$ in the results section ($\alpha = 0.05$).

Obtaining a statistically non-significant result implies that the effect detected is unimportant.

16. Suppose a research article indicates a $p = 0.001$ value in the results section ($\alpha = 0.05$).

The probability that the results of the given study are replicable is not equal to $1-p$.

17. Suppose a research article indicates a $p = 0.30$ value in the results section ($\alpha = 0.05$).

You have found the probability of the null hypothesis being true ($p = 0.30$).

18. You perform two studies to test a potentially life-saving drug. Both studies have 80% power. What is the chance of two Type 2 errors (false negatives) in a row?
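Question 18 is a product of two probabilities. Assuming the drug truly works (so each study's Type 2 error rate is $1 - 0.80 = 0.20$) and that the two studies are independent, a minimal Python check is:

```python
power = 0.80
beta = 1 - power             # Type II error rate of each study when the drug truly works
both_miss = beta * beta      # assumes the two studies are independent
at_least_one_hit = 1 - both_miss
print(round(both_miss, 2), round(at_least_one_hit, 2))  # 0.04 0.96
```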

19. Suppose that a research article indicates a value of $p = 0.001$ in the results section ($\alpha = 0.05$).

The null hypothesis has been shown to be false.

20. When the difference between means is 5 and the standard deviation is 4, Cohen’s d is ________, which is ________ according to the benchmarks proposed by Cohen.
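Cohen's $d$ divides the difference between means by the standard deviation, and Cohen's conventional benchmarks label $d = 0.2$ small, $0.5$ medium, and $0.8$ large. The Python sketch below applies them to question 20; the helper names and the rule of assigning a value to the highest benchmark it reaches are simplifying assumptions.

```python
def cohens_d(mean_difference, sd):
    """Standardized mean difference: raw difference divided by the standard deviation."""
    return mean_difference / sd

def cohen_label(d):
    """Label |d| with Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

d = cohens_d(5, 4)
print(d, cohen_label(d))  # 1.25 large
```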

Online MCQs and Test Website, gmstat.com

R Programming Language and Statistics

Partial Correlation Example

In this post, we will learn about partial correlation and work through a numerical example. Like multiple correlation, partial correlation involves three or more variables (multivariable data). Partial correlation is defined as the degree of linear relationship between any two variables in a set of multivariable data while the effect of all other variables is held constant.

Partial Correlation Formula

For three variables, say $X_1$, $X_2$, and $X_3$, the partial correlation between $X_1$ and $X_2$ measures the relationship between them after removing the influence of $X_3$. It is given by

$$r_{12 \cdot 3}= \frac{ r_{12} - r_{13} r_{23}} {\sqrt{(1-r_{13}^2)(1- r_{23}^2)} }$$

The partial correlation between $X_1$ and $X_3$, controlling for $X_2$, is

$$r_{13\cdot 2}= \frac{ r_{13} - r_{12} r_{32}}{ \sqrt{(1- r_{12}^2)(1- r_{32}^2)}}$$

The partial correlation between $X_2$ and $X_3$, controlling for $X_1$, is

$$r_{23\cdot 1}= \frac{r_{23} - r_{21} r_{31}}{\sqrt{(1- r_{21}^2)(1- r_{31}^2)}}$$
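The three formulas share one pattern, so a single helper covers all of them when the arguments are passed in the right order. This is a minimal Python sketch; the function name `partial_corr` and the pairwise correlations 0.6, 0.5, and 0.4 are hypothetical values chosen only to exercise the formulas.

```python
from math import sqrt

def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation r_{ab.c}: A with B, holding C constant."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical pairwise correlations, for illustration only
r12, r13, r23 = 0.6, 0.5, 0.4

r12_3 = partial_corr(r12, r13, r23)  # X1 and X2, controlling for X3
r13_2 = partial_corr(r13, r12, r23)  # X1 and X3, controlling for X2 (r32 = r23)
r23_1 = partial_corr(r23, r12, r13)  # X2 and X3, controlling for X1 (r21 = r12, r31 = r13)

print(round(r12_3, 4), round(r13_2, 4), round(r23_1, 4))
```

Because correlation is symmetric ($r_{21} = r_{12}$, and so on), only three pairwise correlations are needed. If the controlling variable is uncorrelated with both of the others, the partial correlation reduces to the simple correlation.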

Partial Correlation Graphical Representation

Partial correlation is a statistical measure of the relationship between two variables while controlling for (that is, removing the effects of) one or more additional variables. The figure below represents this for three variables, say $X$, $Y$, and $Z$.

[Figure: Partial Correlation Example]

Partial correlation is used when researchers want to determine the strength and direction of the relationship between two variables with the influence of other variables removed. This is particularly useful in multivariate analysis, where several variables may be interrelated. The partial correlation coefficient ranges from $-1$ to $+1$: $-1$ indicates a perfect negative correlation, $+1$ a perfect positive correlation, and 0 no linear relationship once the other variables are controlled for.

Partial Correlation Example

For the partial correlation example, consider the following data, along with some basic computations.

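| | $X_1$ | $X_2$ | $X_3$ | $X_1X_2$ | $X_1X_3$ | $X_2X_3$ | $X_1^2$ | $X_2^2$ | $X_3^2$ |
|---|---|---|---|---|---|---|---|---|---|
| | 7 | 4 | 1 | 28 | 7 | 4 | 49 | 16 | 1 |
| | 12 | 7 | 2 | 84 | 24 | 14 | 144 | 49 | 4 |
| | 14 | 8 | 4 | 112 | 56 | 32 | 196 | 64 | 16 |
| | 17 | 9 | 5 | 153 | 85 | 45 | 289 | 81 | 25 |
| | 20 | 12 | 8 | 240 | 160 | 96 | 400 | 144 | 64 |
| Total | 70 | 40 | 20 | 617 | 332 | 191 | 1078 | 354 | 110 |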

First compute the simple correlations $r_{12}$, $r_{13}$, and $r_{23}$. Since correlation is symmetric, $r_{21} = r_{12}$, $r_{31} = r_{13}$, and $r_{32} = r_{23}$.

\begin{align}
r_{12} &= \frac{n\Sigma (x_1 x_2 ) - (\Sigma x_1)(\Sigma x_2 )} {\sqrt{\left[n\Sigma x_1 ^2 -(\Sigma x_1)^2\right] \left[n \Sigma x_2^2 - (\Sigma x_2 )^2\right]}}\\
&= \frac{5(617)-(70)(40)} {\sqrt{\left[5 (1078)-(70)^2\right]\left[5(354)-(40)^2\right]} } = 0.987\\
r_{13} &= \frac{n\Sigma(x_1 x_3 ) - (\Sigma x_1)(\Sigma x_3 )}{\sqrt{\left[n\Sigma x_1^2 - (\Sigma x_1 )^2\right]\left[n \Sigma x_3^2 - (\Sigma x_3 )^2\right]}}\\
&= \frac{5(332)-(70)(20)}{\sqrt{\left[5 (1078)-(70)^2\right]\left[5(110)-(20)^2\right]}}= 0.959\\
r_{23} &= \frac{n\Sigma(x_2 x_3 )-(\Sigma x_2 )(\Sigma x_3 )}{\sqrt{\left[n\Sigma x_2^2 -(\Sigma x_2 )^2\right]\left[n\Sigma x_3^2 -(\Sigma x_3 )^2\right]}}\\
& = \frac{5(191)-(40)(20)}{\sqrt{\left[5(354)-40^2\right]\left[5(110)-20^2\right]}}= 0.971\\
r_{12\cdot 3} &= \frac{r_{12} - r_{13} r_{23} } {\sqrt{(1 - r_{13}^2) (1 - r_{23}^2) }}\\
& = \frac{0.987-(0.959)(0.971)} {\sqrt{(1-(0.959)^2)(1-(0.971)^2)}}\\
&\approx \frac{0.05659}{0.0681} = 0.8305
\end{align}

The values 0.05659 and 0.0681 in the last step come from the unrounded correlations ($r_{12} = 0.98747$, $r_{13} = 0.95902$, $r_{23} = 0.97065$). Substituting the three-decimal values shown gives about 0.824 instead, so it is best to carry full precision through a partial correlation calculation.
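The worked example can be reproduced in Python. This minimal sketch applies the same sums-based formula for Pearson's $r$ to the table's data and then the first-order partial correlation formula; the helper names `pearson_r` and `partial_r` are hypothetical. Working directly with the numerators (285, 260, 155) and bracketed terms (490, 170, 150) from the calculation above, $r_{12\cdot 3} = \frac{285(150) - 260(155)}{\sqrt{[490(150) - 260^2][170(150) - 155^2]}} = \frac{2450}{2950} = \frac{49}{59} \approx 0.8305$, which the code confirms. It also shows that plugging in the correlations rounded to three decimals gives about 0.8237.

```python
from math import sqrt

# Data from the example table
x1 = [7, 12, 14, 17, 20]
x2 = [4, 7, 8, 9, 12]
x3 = [1, 2, 4, 5, 8]

def pearson_r(x, y):
    """Pearson r via the computational (sums) formula used in the example."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def partial_r(r_ab, r_ac, r_bc):
    """r_{ab.c}: correlation of A and B with C held constant."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

r12, r13, r23 = pearson_r(x1, x2), pearson_r(x1, x3), pearson_r(x2, x3)
print(round(r12, 3), round(r13, 3), round(r23, 3))  # 0.987 0.959 0.971

r12_3 = partial_r(r12, r13, r23)                # full-precision correlations
r12_3_rounded = partial_r(0.987, 0.959, 0.971)  # three-decimal correlations
print(round(r12_3, 4), round(r12_3_rounded, 4))  # 0.8305 0.8237
```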

Partial correlation is commonly used in statistical analysis, especially in fields like psychology, social sciences, and any area where multivariate relationships are analyzed.
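The idea of "removing the influence of $X_3$" can also be checked directly: the partial correlation $r_{12\cdot 3}$ equals the ordinary correlation between the residuals of $X_1$ and $X_2$ after each is regressed linearly on $X_3$. The Python sketch below is an illustrative alternative route, not the method used in this post; the helper names are hypothetical, and for the example data it should reproduce 0.8305.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def residuals(y, x):
    """Residuals from the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def corr(u, v):
    """Ordinary Pearson correlation."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

x1 = [7, 12, 14, 17, 20]
x2 = [4, 7, 8, 9, 12]
x3 = [1, 2, 4, 5, 8]

e1 = residuals(x1, x3)  # part of X1 not linearly explained by X3
e2 = residuals(x2, x3)  # part of X2 not linearly explained by X3
print(round(corr(e1, e2), 4))  # 0.8305
```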

Importance of Dispersion in Statistics

The importance of dispersion in statistics cannot be ignored. The term dispersion (also called spread or variability) describes how much the values in a data set vary. A measure of dispersion summarizes how far the data points typically lie from the average (or another measure of center), so it tells us how consistent the data are.

Dispersion describes how far the values lie from a central point (such as the average). Data with less variability about the center are said to be more consistent: the smaller the variability, the more consistent the data.

Example of Measure of Dispersion

Suppose the scores of three batsmen in three cricket matches are as follows:

| Player | Match 1 | Match 2 | Match 3 | Average Score |
|---|---|---|---|---|
| A | 70 | 80 | 90 | 80 |
| B | 75 | 80 | 85 | 80 |
| C | 65 | 80 | 95 | 80 |

The question is: which player is the most consistent in his performance?

In this data set, the player whose scores deviate least from his average is the most consistent. All three players have the same average, but Player B's scores vary the least, so Player B is more consistent than the others.
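To back up this comparison with numbers, here is a minimal Python sketch that computes each player's mean absolute deviation and population standard deviation with the standard `statistics` module. It assumes Player B's third score is 85, the value implied by his stated average of 80; using `pstdev` (dividing by $n$) rather than the sample standard deviation is also an assumption made for illustration.

```python
from statistics import mean, pstdev

# Scores from the table; Player B's third score is taken as 85,
# the value implied by his stated average of 80 (an assumption).
scores = {
    "A": [70, 80, 90],
    "B": [75, 80, 85],
    "C": [65, 80, 95],
}

for player, s in scores.items():
    avg = mean(s)
    mad = mean([abs(x - avg) for x in s])  # mean absolute deviation from the average
    print(player, avg, round(mad, 2), round(pstdev(s), 2))

# The most consistent player has the smallest spread around his average
most_consistent = min(scores, key=lambda p: pstdev(scores[p]))
print(most_consistent)  # B
```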

There are two types of measures of dispersion:

Absolute Measure of Dispersion

An absolute measure of dispersion is expressed in the original units in which the data were collected (or their square, in the case of the variance). For example, if the data are collected in grams, the range, quartile deviation, average deviation, and standard deviation are also expressed in grams, while the variance is in grams squared. The absolute measures of dispersion are:

  • Range
  • Quartile Deviation
  • Average Deviation
  • Standard Deviation
  • Variance

Relative Measures of Dispersion

Relative measures of dispersion are expressed as coefficients, percentages, or ratios. Because they are unit-free, they can be used to compare the variability of data sets measured in different units. They include:

  • Coefficient of range
  • Coefficient of Quartile Deviation
  • Coefficient of Average Deviation
  • Coefficient of Variation (CV)

Range and Coefficient of Range

The range is the difference between the maximum and minimum values of the data: $R = x_{max} - x_{min}$.

The coefficient of range is $\frac{x_{max} - x_{min}}{x_{max} + x_{min}}$. Multiplying it by 100 expresses it as a percentage.

Consider the ungrouped data $x = 32, 36, 36, 37, 39, 41, 45, 46, 48$.

The range is $x_{max} - x_{min} = 48 - 32 = 16$.

The coefficient of range is

\begin{align*}
\text{Coefficient of Range} &= \frac{x_{max} - x_{min}}{x_{max} + x_{min}}\\
&= \frac{48-32}{48+32} = \frac{16}{80} = 0.2\\
&= 0.2 \times 100 = 20\%
\end{align*}

For the following grouped data, the range and coefficient of the range will be

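| Classes | Frequency | Class Boundaries |
|---|---|---|
| 65 – 84 | 9 | 64.5 – 84.5 |
| 85 – 104 | 10 | 84.5 – 104.5 |
| 105 – 124 | 17 | 104.5 – 124.5 |
| 125 – 144 | 10 | 124.5 – 144.5 |
| 145 – 164 | 5 | 144.5 – 164.5 |
| 165 – 184 | 4 | 164.5 – 184.5 |
| 185 – 204 | 5 | 184.5 – 204.5 |
| Total | 60 | |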

For grouped data, the upper class boundary of the highest class is taken as $x_{max}$, and the lower class boundary of the lowest class is taken as $x_{min}$. Therefore, $x_{max} = 204.5$ and $x_{min} = 64.5$, so

$$Range = x_{max} - x_{min} = 204.5 - 64.5 = 140$$

The coefficient of range is

\begin{align*}
\text{Coefficient of Range} &= \frac{x_{max} - x_{min}}{x_{max} + x_{min}}\\
&= \frac{204.5-64.5}{204.5+64.5} = \frac{140}{269} = 0.5204\\
&= 0.5204 \times 100 = 52.04\%
\end{align*}
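The range and coefficient of range calculations can be checked with a small Python sketch. The helper `range_and_coefficient` is a hypothetical name introduced here; for grouped data it is simply passed the extreme class boundaries, 204.5 and 64.5.

```python
def range_and_coefficient(x_max, x_min):
    """Return the range and the coefficient of range (as a proportion)."""
    r = x_max - x_min
    return r, r / (x_max + x_min)

# Ungrouped data
data = [32, 36, 36, 37, 39, 41, 45, 46, 48]
r, c = range_and_coefficient(max(data), min(data))
print(r, f"{100 * c:.2f}%")  # 16 20.00%

# Grouped data: extreme class boundaries
r_g, c_g = range_and_coefficient(204.5, 64.5)
print(r_g, f"{100 * c_g:.2f}%")  # 140.0 52.04%
```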

Average Deviation and Coefficient of Average Deviation

The average deviation (also called mean deviation) is an absolute measure of dispersion. It is the mean of the absolute deviations of the observations from the mean, median, or mode. For deviations from the mean, it is

$$Mean\,\, Deviation_{\overline{X}} = \frac{\sum\limits_{i=1}^n|x_i-\overline{x}|}{n}$$

| $X$ | $x-\overline{x}$ | $\lvert x-\overline{x}\rvert$ | $x-\tilde{x}$ | $\lvert x-\tilde{x}\rvert$ | $x-\hat{x}$ | $\lvert x-\hat{x}\rvert$ |
|---|---|---|---|---|---|---|
| 32 | $32-40=-8$ | 8 | $32-39=-7$ | 7 | $32-36=-4$ | 4 |
| 36 | $36-40=-4$ | 4 | $36-39=-3$ | 3 | $36-36=0$ | 0 |
| 36 | $36-40=-4$ | 4 | $36-39=-3$ | 3 | $36-36=0$ | 0 |
| 37 | $37-40=-3$ | 3 | $37-39=-2$ | 2 | $37-36=1$ | 1 |
| 39 | $39-40=-1$ | 1 | $39-39=0$ | 0 | $39-36=3$ | 3 |
| 41 | $41-40=1$ | 1 | $41-39=2$ | 2 | $41-36=5$ | 5 |
| 45 | $45-40=5$ | 5 | $45-39=6$ | 6 | $45-36=9$ | 9 |
| 46 | $46-40=6$ | 6 | $46-39=7$ | 7 | $46-36=10$ | 10 |
| 48 | $48-40=8$ | 8 | $48-39=9$ | 9 | $48-36=12$ | 12 |
| Total: 360 | 0 | 40 | 9 | 39 | 36 | 44 |

Where
\begin{align*}
Mean &= \overline{x} = \frac{\sum\limits_{i=1}^n x_i}{n} = \frac{360}{9} = 40\\
Mode &= 36\\
Median &= 39\\
MD_{\overline{x}} &= \frac{\sum\limits_{i=1}^n |x-\overline{x}|}{n} = \frac{40}{9} = 4.44\\
MD_{\tilde{x}} &= \frac{\sum\limits_{i=1}^n |x-\tilde{x}|}{n} = \frac{39}{9} = 4.33\\
MD_{\hat{x}} &= \frac{\sum\limits_{i=1}^n |x-\hat{x}|}{n} = \frac{44}{9} = 4.89
\end{align*}

As expected, the mean deviation is smallest about the median, because the median minimizes the sum of absolute deviations.

The relative measure of average deviation is the coefficient of average deviation. It can be calculated as follows:

Coefficient of Average Deviation from Mean (also called Mean Coefficient of Dispersion)

\begin{align*}\text{Mean Coefficient of Dispersion} = \frac{MD_{\overline{x}}}{\overline{x}}\times 100 = \frac{4.44}{40}\times 100 = 11.1\%\end{align*}

Coefficient of Average Deviation from Median (also called Median Coefficient of Dispersion)

\begin{align*}\text{Median Coefficient of Dispersion} = \frac{MD_{\tilde{x}}}{\tilde{x}}\times 100 = \frac{4.33}{39}\times 100 = 11.1\%\end{align*}

Coefficient of Average Deviation from Mode (also called Mode Coefficient of Dispersion)

\begin{align*}\text{Mode Coefficient of Dispersion} = \frac{MD_{\hat{x}}}{\hat{x}}\times 100 = \frac{4.89}{36}\times 100 = 13.6\%\end{align*}
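The following Python sketch recomputes the mean deviations about the mean, median, and mode, and the corresponding coefficients, using the standard `statistics` module; the helper `mean_deviation` is a hypothetical name. The absolute deviations about the mode sum to 44, so the mean deviation about the mode is $44/9 \approx 4.89$ and its coefficient is about 13.6%.

```python
from statistics import mean, median, mode

data = [32, 36, 36, 37, 39, 41, 45, 46, 48]

def mean_deviation(values, center):
    """Average of the absolute deviations of `values` about `center`."""
    return sum(abs(x - center) for x in values) / len(values)

for name, center in (("mean", mean(data)), ("median", median(data)), ("mode", mode(data))):
    md = mean_deviation(data, center)
    print(f"{name}: center={center}, MD={md:.2f}, coefficient={100 * md / center:.2f}%")
```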

Average Deviation for Grouped Data

One can also compute average deviations for grouped data (Discrete Case) as follows:

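| $x$ (mid point) | $f$ | $fx$ | $\lvert x-\overline{x}\rvert$ | $f\lvert x-\overline{x}\rvert$ | $\lvert x-\tilde{x}\rvert$ | $f\lvert x-\tilde{x}\rvert$ |
|---|---|---|---|---|---|---|
| 10 | 9 | 90 | $\lvert 10-34\rvert=24$ | 216 | 20 | 180 |
| 20 | 10 | 200 | $\lvert 20-34\rvert=14$ | 140 | 10 | 100 |
| 30 | 17 | 510 | $\lvert 30-34\rvert=4$ | 68 | 0 | 0 |
| 40 | 10 | 400 | $\lvert 40-34\rvert=6$ | 60 | 10 | 100 |
| 50 | 5 | 250 | $\lvert 50-34\rvert=16$ | 80 | 20 | 100 |
| 60 | 4 | 240 | $\lvert 60-34\rvert=26$ | 104 | 30 | 120 |
| 70 | 5 | 350 | $\lvert 70-34\rvert=36$ | 180 | 40 | 200 |
| Total | 60 | 2040 | | 848 | | 800 |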

\begin{align*}
n &= \sum f = 60\\
\overline{x} &= \frac{\sum fx}{n} = \frac{2040}{60} = 34\\
\tilde{x} &= 30\\
\hat{x} &= 30\\
MD_{\overline{x}} &= \frac{\sum f|x-\overline{x}|}{n} = \frac{848}{60} = 14.13\\
MD_{\tilde{x}} &= \frac{\sum f|x-\tilde{x}|}{n} = \frac{800}{60} = 13.33\\
MD_{\hat{x}} &= \frac{\sum f|x-\hat{x}|}{n} = \frac{800}{60} = 13.33 \quad (\text{here } \hat{x} = \tilde{x} = 30)\\
\text{Mean Coefficient of Dispersion} &= \frac{MD_{\overline{x}}}{\overline{x}}\times 100 = \frac{14.133}{34}\times 100 = 41.57\%\\
\text{Median Coefficient of Dispersion} &= \frac{MD_{\tilde{x}}}{\tilde{x}}\times 100 = \frac{13.333}{30}\times 100 = 44.44\%
\end{align*}
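For the discrete frequency table, a similar Python sketch weights each absolute deviation by its frequency. Here the median is obtained by expanding the table into its 60 individual values, which is an illustrative shortcut for a small discrete table rather than a general grouped-data method; `grouped_md` is a hypothetical helper name.

```python
from statistics import median

# Discrete frequency table from the example (mid points and frequencies)
midpoints = [10, 20, 30, 40, 50, 60, 70]
freqs = [9, 10, 17, 10, 5, 4, 5]

n = sum(freqs)
xbar = sum(f * x for x, f in zip(midpoints, freqs)) / n
# Illustrative shortcut: expand the table into its n values to take the median
xtilde = median([x for x, f in zip(midpoints, freqs) for _ in range(f)])
xhat = midpoints[freqs.index(max(freqs))]  # value with the highest frequency

def grouped_md(center):
    """Frequency-weighted mean of absolute deviations about `center`."""
    return sum(f * abs(x - center) for x, f in zip(midpoints, freqs)) / n

for name, center in (("mean", xbar), ("median", xtilde), ("mode", xhat)):
    md = grouped_md(center)
    print(f"{name}: center={center}, MD={md:.2f}, coefficient={100 * md / center:.2f}%")
```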

Importance of Dispersion in Statistics

As the discussion and numerical examples above show, dispersion (variability) is crucial in statistics. Some reasons for the importance of dispersion in statistics are:

  • Understanding Data Spread: Variability gives insights into the spread or distribution of data, helping to understand how much individual data points differ from the average or some other measure.
  • Data Reliability: Lower variability in data can indicate higher reliability and consistency, which is key for making sound predictions and decisions.
  • Identifying Outliers: High variability can indicate the presence of outliers or anomalies in the data, which might require further investigation.
  • Comparing Datasets: Dispersion measures, such as variance and standard deviation, allow for the comparison of different datasets. Two datasets might have the same mean but different levels of dispersion, which can imply different data patterns or behaviors.
  • Risk Assessment: In fields like finance, assessing the variability of returns is crucial for understanding and managing risk. Higher variability often implies higher risk.
  • Statistical Inferences: Many statistical methods, such as hypothesis testing and confidence intervals, rely on the variability of data to make accurate inferences about populations from samples.
  • Balanced Decision Making: Understanding variability helps in making more informed decisions by providing a clearer picture of the data’s characteristics and potential fluctuations.

Overall, variability is essential for a comprehensive understanding of data, enabling analysts to draw meaningful conclusions and make informed decisions.

Data Analytics MCQs 2

This post is about Data Analytics MCQs. It contains 20 multiple-choice questions to help you prepare for subjects in BS Data Analytics degree programs. Let us start with the Data Analytics MCQs with Answers.

Please go to Data Analytics MCQs 2 to view the test

Online Data Analytics MCQs with Answers

  • Which emerging technology has made it possible for every enterprise to have access to limitless storage and high-performance computing?
  • Which of the data roles is responsible for extracting, integrating, and organizing data into data repositories?
  • When you analyze historical data to predict future outcomes what type of Data Analytics are you performing?
  • A modern data ecosystem includes a network of continually evolving entities. It includes:
  • Data Analysts work within the data ecosystem to:
  • When we analyze data to understand why an event took place, which of the four types of data analytics are we performing?
  • The first step in the data analysis process is to gain an in-depth understanding of the problem and the desired outcome. What are you seeking answers to at this stage of the data analysis process?
  • From the provided list, select the three emerging technologies that are shaping today’s data ecosystem.
  • Which of these skills is essential to the role of a Data Analyst?
  • What, according to Sivaram Jaladi, goes a long way in lending credibility to your data analysis findings?
  • Why is proficiency in Statistics an important skill for a Data Analyst?
  • Which of these is one of the soft skills required to be a successful Data Analyst?
  • Which of the data analyst functional skills helps research and interpret data, theorize, and make forecasts?
  • In “A Day in the Life of a Data Analyst”, what according to Sivaram Jaladi forms a large part of a Data Analyst’s job?
  • In “A Day in the Life of a Data Analyst”, what are some of the data points that were useful in analyzing the use case?
  • What data type is typically found in databases and spreadsheets?
  • Which of these data sources is an example of semi-structured data?
  • Which one of the provided file formats is commonly used by APIs and Web Services to return data?
  • What is one example of the relational databases discussed in the video?