Sampling and Non-Sampling Errors

Before differentiating between sampling and non-sampling errors, let us first define the term error.

The difference between an estimated value and the population’s true value is called an error. A sample estimate is used to describe a characteristic of a population, but a sample, being only a part of the population, cannot provide a perfect representation of it, no matter how carefully the sample is selected. In general, an estimate is rarely exactly equal to the true value, so we may ask how close the sample estimate will be to the population’s true value.

Two Kinds of Errors: Sampling and Non-Sampling Errors

There are two kinds of errors, namely (I) sampling errors and (II) non-sampling errors:

  1. Sampling Errors (random errors)
  2. Non-Sampling Errors (non-random errors)

  1. Sampling Errors
    A sampling error is the difference between the value of a statistic obtained from an observed random sample and the value of the corresponding population parameter being estimated. Let $T$ be the sample statistic used to estimate the population parameter $\theta$; the sampling error, denoted by $E$, is $E = T - \theta$. The value of the sampling error reveals the precision of the estimate: the smaller the sampling error, the greater the precision of the estimate. The sampling error can be reduced (a small simulation after the following list illustrates point i):

    i)   By increasing the sample size
    ii)  By improving the sampling design
    iii) By using supplementary information
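
    As a quick illustration of point i), here is a minimal R sketch (the population, its mean, and the sample sizes are all assumed for illustration) showing how the sampling error $E = T - \theta$ of the sample mean tends to shrink as the sample size grows:

```r
# Sampling error E = T - theta for the sample mean (illustrative simulation)
set.seed(123)
theta <- 50                                  # assumed true population mean
population <- rnorm(1e5, mean = theta, sd = 10)

for (n in c(10, 100, 1000)) {
  T_stat <- mean(sample(population, n))      # sample statistic T
  E <- T_stat - theta                        # sampling error
  cat("n =", n, "  estimate =", round(T_stat, 3), "  error =", round(E, 3), "\n")
}
```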

  2. Non-Sampling Errors
    The errors that are caused by sampling the wrong population of interest and by response bias, as well as those made by an investigator in collecting, analyzing, and reporting the data, are all classified as non-sampling errors or non-random errors. These errors are present in a complete census as well as in a sample survey.


Common Log and Natural Log

Difference between Common Log and Natural Log

In this post, we will learn about the difference between Common Log and Natural Log.

The logarithm of a number is the exponent to which a fixed base must be raised to produce that number. For example, the logarithm of 1000 to base 10 is 3, since $1000 = 10^3$. Logarithms were introduced by John Napier in the early 17th century to simplify calculations and were widely adopted by scientists, engineers, and others to perform computations more easily using logarithm tables. The logarithm to base $b=10$ is called the common logarithm and has many applications in science and engineering, while the natural logarithm has the constant $e \approx 2.718281828$ as its base and is written as $\ln(x)$ or $\log_e(x)$.

The common log is used in many logarithmic scales in science, such as the pH scale (for measuring acidity and alkalinity) in chemistry and the Richter scale (for measuring the magnitude of earthquakes). It is so common that if no base is written, as in $\log\, x$, the common log is usually meant.


The natural logarithm is widely used in pure mathematics, especially calculus. The natural logarithm of a number $x$ is the power to which $e$ has to be raised to equal $x$. For example, $\ln(7.389\ldots) = 2$ because $e^2 = 7.389\ldots$. The natural log of $e$ itself is $\ln(e) = 1$ because $e^1 = e$, while the natural logarithm of $1$ is $\ln(1) = 0$, since $e^0 = 1$.
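
These relationships are easy to verify in R, where `log10()` gives the common log and `log()` the natural log:

```r
# Common log vs. natural log in R
log10(1000)   # 3, since 1000 = 10^3
log(exp(2))   # 2, the natural log of e^2 = 7.389056
log(exp(1))   # 1, since ln(e) = 1
log(1)        # 0, since e^0 = 1
```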

The question is: the reason for choosing base 10 is obvious, but why $e = 2.718\ldots$?

The answer goes back 300 years or more, to Euler (from whose name $e$ comes). Up to a constant factor, $e^x$ is the only function whose derivative (and consequently whose integral) is itself: $(e^x)' = e^x$; no other function has this characteristic. The number $e$ can be obtained by several numerical and analytical methods, most often infinite summations, and it also plays an important role in complex analysis.
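
One such infinite summation is the series $e = \sum_{n=0}^{\infty} \frac{1}{n!}$, and a few terms already give a good approximation; a quick R check:

```r
# Approximating e by the series sum of 1/n! for n = 0, 1, ..., 15
sum(1 / factorial(0:15))   # 2.718282
exp(1)                     # R's built-in value of e, for comparison
```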

Suppose you have a hundred rupees and the interest rate is 10% per period. After one period you will have Rs. 110; in the next period, another 10% of Rs. 110 raises your amount to Rs. 121, and so on. What happens when the interest is computed continuously (all the time)? You might think you would soon have an infinite amount of money, but actually you have your initial deposit times $e$ raised to the power of the interest rate times the amount of time:

$$P=P_0 e^{kt}$$

where $k$ is the growth rate or interest rate, $t$ is the time period, $P$ is the value at time $t$, and $P_0$ is the value at time $t=0$.

The intuitive explanation is this: $e^x$ is the amount of continuous growth after a certain amount of time, while the natural log is the amount of time needed to reach a certain level of continuous growth.
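
To make this concrete, here is a small R sketch (the deposit, rate, and time are assumed for illustration): discrete compounding approaches the continuous limit $P = P_0 e^{kt}$, and $\ln$ recovers the time needed for a given growth factor:

```r
# Continuous growth P = P0 * e^(k*t) as the limit of discrete compounding
P0 <- 100     # assumed initial deposit, Rs.
k  <- 0.10    # assumed interest rate per period
t  <- 1       # one time period

for (n in c(1, 12, 365, 1e6)) {              # compound n times per period
  cat("n =", n, " amount =", P0 * (1 + k/n)^(n * t), "\n")
}
P0 * exp(k * t)   # continuous limit, approx. Rs. 110.5171

# The natural log gives the time needed for a given level of growth:
log(2) / k        # approx. 6.93 periods to double the deposit
```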


Cumulative Frequency Distribution and Polygon (2012)

Introduction to Cumulative Frequency Distribution

A cumulative frequency distribution (cumulative frequency curve or ogive) and a cumulative frequency polygon require cumulative frequencies. The cumulative frequency, denoted by CF, is obtained for a class interval by adding the frequencies of all the preceding classes, including that class. It indicates the total number of values less than or equal to the upper limit of that class. For comparing two or more distributions, relative cumulative frequencies or percentage cumulative frequencies are computed.

The relative cumulative frequencies, denoted by CRF, are the cumulative frequencies expressed as proportions; they are obtained by dividing the cumulative frequency by the total frequency (the total number of observations). The CRF of a class can also be obtained by adding the relative frequencies (rf) of the preceding classes, including that class. Multiplying a relative cumulative frequency by 100 gives the corresponding percentage cumulative frequency of the class.

Method of Construction of Cumulative Frequencies

The method of construction of cumulative frequencies and cumulative relative frequencies is explained in the following table:

[Table: construction of cumulative frequencies and cumulative relative frequencies]
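
Since the table itself is not reproduced here, a minimal R sketch with assumed class frequencies shows the same construction:

```r
# Constructing CF and CRF from assumed class frequencies
classes <- c("10-19", "20-29", "30-39", "40-49", "50-59")  # assumed classes
f   <- c(3, 7, 12, 6, 2)      # assumed class frequencies
cf  <- cumsum(f)              # cumulative frequency (CF)
crf <- cf / sum(f)            # cumulative relative frequency (CRF)
data.frame(classes, f, cf, crf, percent = 100 * crf)
```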

Plot a Cumulative Frequency Distribution

To plot a cumulative frequency distribution, scale the class boundaries along the x-axis and the corresponding cumulative frequencies along the y-axis. For additional information, you can label the vertical axis on the left in units and the vertical axis on the right in percent. The cumulative frequencies are plotted against the upper class boundaries, and the plotted points are joined by straight line segments. A cumulative frequency polygon can be used to estimate the median, quartiles, deciles, percentiles, etc.
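
A minimal plotting sketch in base R, reusing the assumed frequencies from above (the class boundaries are likewise assumed):

```r
# Ogive: cumulative frequencies against upper class boundaries
f     <- c(3, 7, 12, 6, 2)                 # assumed class frequencies
cf    <- cumsum(f)
upper <- c(19.5, 29.5, 39.5, 49.5, 59.5)   # assumed upper class boundaries

plot(upper, cf, type = "b", pch = 16,
     xlab = "Upper class boundary", ylab = "Cumulative frequency",
     main = "Cumulative Frequency Polygon (Ogive)")
```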


[Figures: cumulative frequency distribution (ogive) and cumulative frequency polygon]

Inverse Regression Analysis or Calibration (2012)

In most regression problems, we have to determine the value of $Y$ corresponding to a given value of $X$. The inverse of this problem, determining the value of $X$ that corresponds to an observed value of $Y$, is called inverse regression analysis or calibration.

Inverse Regression Analysis

For inverse regression analysis, let the known values be represented by the matrix $X$ and their corresponding values by the vector $Y$, which together form a simple linear regression model. Suppose there is an unknown value of $X$, say $X_0$, which cannot be measured, and we observe the corresponding value of $Y$, say $Y_0$. Then $X_0$ can be estimated and a confidence interval for $X_0$ can be obtained.

In regression analysis, we want to investigate the relationship between variables. Regression has applications in many fields: engineering, economics, the physical and chemical sciences, management, the biological sciences, and the social sciences. We only consider the simple linear regression model, a model with one regressor $X$ that has a linear relationship with a response $Y$. It is not always easy to measure the regressor $X$ or the response $Y$.

Let us consider a typical example of this problem. If $X$ is the concentration of glucose in certain substances, then a spectrophotometric method is used to measure the absorbance. This absorbance depends on the concentration $X$. The response $Y$ is easy to measure with the spectrophotometric method, but the concentration itself is not. If we have $n$ known concentrations, then the corresponding absorbances can be measured.

If there is a linear relation between $Y$ and $X$, then a simple linear regression model can be fitted to these data. Suppose we have an unknown concentration that is difficult to measure, but we can measure the absorbance at this concentration. Is it possible to estimate this concentration from the measured absorbance? This is called the calibration problem or inverse regression analysis.

Suppose we have a linear model $Y=\beta_0+\beta_1X+e$ and an observed value of the response $Y$, but we do not have the corresponding value of $X$. How can we estimate this value of $X$? The two most important methods to estimate $X$ are the classical method and the inverse method.

The classical method of inverse regression analysis is based on the simple linear regression model

$Y=\beta_0+\beta_1X+\varepsilon,$   where $\varepsilon \sim N(0, \, \sigma^2)$

where the parameters $\beta_0$ and $\beta_1$ are estimated by least squares as $\hat{\beta}_0$ and $\hat{\beta}_1$. At least two of the $n$ values of $X$ have to be distinct; otherwise, we cannot fit a reliable regression line. For a given value of $X$, say $X_0$ (unknown), a $Y$ value, say $Y_0$ (or a random sample of $k$ values of $Y$), is observed at the $X_0$ value. For inverse regression analysis, the problem is to estimate $X_0$. The classical method uses the $Y_0$ value (or the mean of the $k$ observed values) to estimate $X_0$ as $\hat{X}_0=\frac{Y_0-\hat{\beta}_0}{\hat{\beta}_1}$.
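
A minimal R sketch of the classical estimator, using simulated calibration data (the intercept, slope, noise level, and the observed $Y_0$ are all assumed for illustration):

```r
# Classical calibration: estimate X0 from an observed Y0
set.seed(42)
x <- seq(1, 10, length.out = 20)          # known X values (e.g., concentrations)
y <- 2 + 0.5 * x + rnorm(20, sd = 0.2)    # observed responses (e.g., absorbances)

fit <- lm(y ~ x)                          # least-squares estimates of beta0, beta1
b0  <- coef(fit)[1]
b1  <- coef(fit)[2]

y0     <- 5.2                             # observed response at the unknown X0
x0_hat <- (y0 - b0) / b1                  # classical estimator of X0
x0_hat
```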

[Figure: scatter plot with fitted regression line, illustrating inverse regression analysis]

The inverse estimator is based on the simple linear regression of $X$ on $Y$. In this case, we have to fit the model

\[X=a_0+a_1Y+e, \qquad e \sim N(0, \sigma^2)\]

to obtain the estimates $\hat{a}_0$ and $\hat{a}_1$. The inverse estimator of $X_0$ is then

\[\hat{X}_0=\hat{a}_0+\hat{a}_1 Y_0\]
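
The corresponding sketch for the inverse estimator, with the same assumed simulated data as above (repeated here so the snippet runs on its own):

```r
# Inverse estimator: regress X on Y, then plug in the observed Y0
set.seed(42)
x <- seq(1, 10, length.out = 20)
y <- 2 + 0.5 * x + rnorm(20, sd = 0.2)

fit_inv <- lm(x ~ y)                      # simple linear regression of X on Y
a0 <- coef(fit_inv)[1]
a1 <- coef(fit_inv)[2]

y0     <- 5.2                             # observed response at the unknown X0
x0_inv <- a0 + a1 * y0                    # inverse estimator of X0
x0_inv
```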

Important Considerations when performing Inverse Regression

  • Inverse regression can be statistically challenging, especially when the errors are mainly in the independent variables (which become the dependent variables in the inverse model).
  • It is not a perfect replacement for traditional regression, and the assumptions underlying the analysis may differ.
  • In some cases, reverse regression, which treats both variables as having errors, might be a more suitable approach.

In summary, inverse regression is a statistical technique that flips the roles of the independent and dependent variables in a regression model.
