Basics of Probability

In this post, I will discuss the basics of probability theory. I will start with the concepts of set and event.

Set

In statistical theory, a set is a well-defined collection of distinct objects, called its elements. In probability, the elements of interest are the possible outcomes of an experiment. For example, whenever a coin is tossed or a die is rolled, some outcome will occur. The distinct outcomes make up a set: when a coin is tossed, the outcome is either Head or Tail, which can be denoted by the set

$$A=\{Head, \, Tail\}$$

Similarly, for a fair die, the distinct outcomes can be represented as the set $B$, that is,

$$B = \{1, 2, 3, 4, 5, 6\}$$

When two fair dice are rolled, there are 36 possible outcomes, which can be represented as a set, say $C$.

\begin{align*}
C = \Big\{ &(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), \\
&(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), \\
&(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), \\
&(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), \\
&(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), \\
&(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)\Big\}
\end{align*}
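The 36 ordered pairs for two dice can also be generated rather than typed out. The following is a minimal Python sketch, assuming each die is represented by the faces 1 to 6; the variable names are illustrative and not part of the original post.

```python
from itertools import product

# Faces of a single fair die (the set B in the text).
faces = range(1, 7)

# Cartesian product of the faces with themselves:
# every ordered pair (first die, second die).
two_dice = list(product(faces, repeat=2))

print(len(two_dice))   # 36
print(two_dice[:6])    # the six pairs whose first die shows 1
```

Using ordered pairs keeps $(1, 2)$ and $(2, 1)$ as separate outcomes, which is why the count is 36 rather than 21.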

Basics of Probability

Probability is the chance of occurrence of an event described in a set (or sample space). For example, what is the chance of rain today? What is the chance that Pakistan will win the T20 World Cup? Probability quantifies chance, and it deals with the occurrence of events in the future. The estimates are presented numerically. For example, (i) there is a 75% chance of rain today, (ii) the insurance industry requires precise knowledge about the risk of loss to calculate premiums, and (iii) the chances of winning the lottery are 1 in 2.3 million.

Random Experiment

To understand probability, it is important to understand the concept of a random experiment. A random experiment is a planned process or activity that can give different results, known as outcomes. For example, as discussed earlier, when a coin is tossed, there are two possible outcomes, Head or Tail. An experiment or planned process that has only one possible outcome cannot be regarded as a random experiment; a random experiment has at least two possible outcomes. A random experiment has the following properties:

  1. It can be repeated any number of times practically or theoretically.
  2. Each experiment has a minimum of two possible outcomes.
  3. All the outcomes are known in advance but each outcome is unpredictable.

So, we can say that probability is a measure of the degree of uncertainty, that is, a quantification of uncertainty.
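The properties listed above can be illustrated with a small simulation. This is only a sketch, assuming a fair coin and Python's standard `random` module; the function name `relative_frequency_of_heads` and the seed are hypothetical choices made for illustration.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses fair coin tosses and return the share that came up heads."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    heads = sum(rng.choice("HT") == "H" for _ in range(n_tosses))
    return heads / n_tosses

# Repeat the experiment with more and more tosses.
for n in (10, 100, 10_000):
    print(n, relative_frequency_of_heads(n))
```

Each single toss is unpredictable, but with many repetitions the relative frequency of heads tends to settle close to 0.5, which is the sense in which probability quantifies uncertainty.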

Sample Space

The collection of all possible outcomes of a random experiment is known as the sample space and is represented by $S$. For example,

$S=\{H, T\}$ is the sample space for tossing a single coin.

$S=\{HH, HT, TH, TT\}$ is the sample space for tossing two coins simultaneously.

Each outcome of a sample space is called a sample point.
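The coin sample spaces above can be enumerated for any number of coins. Below is a minimal Python sketch; the helper name `coin_sample_space` is an assumption introduced here for illustration.

```python
from itertools import product

def coin_sample_space(n_coins):
    """Return every sequence of H/T of length n_coins as a string, e.g. 'HT'."""
    return ["".join(outcome) for outcome in product("HT", repeat=n_coins)]

print(coin_sample_space(1))  # ['H', 'T']
print(coin_sample_space(2))  # ['HH', 'HT', 'TH', 'TT']
```

Each string in the returned list is one sample point, and the number of sample points doubles with every additional coin.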

Event

An event is a collection of one or more outcomes of a sample space in which one is interested; in other words, an event is a subset of the sample space. Events may be based on a single sample point or on more than one sample point. For example, let the event be an even number when a single die is thrown, that is, $A=\{2, 4, 6\}$, or the event may be $T$ (Tail) when tossing a coin, that is, $B=\{T\}$. When we throw two dice, the event may be the same number on the upper faces of both dice, $C=\{(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)\}$.

Similarly, the event that the sum of dots on the top faces of two dice is 4 is $D=\{(2, 2), (1, 3), (3, 1)\}$.
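Events such as $C$ and $D$ can be built as subsets of the two-dice sample space and their probabilities computed by counting. The sketch below assumes fair dice, so that all 36 sample points are equally likely and the classical rule (favourable points divided by total points) applies; the function name `probability` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two dice: all ordered pairs of faces.
sample_space = set(product(range(1, 7), repeat=2))

def probability(event):
    """Classical probability: favourable sample points / total sample points."""
    return Fraction(len(event & sample_space), len(sample_space))

# Event C: both dice show the same number.
C = {(a, b) for (a, b) in sample_space if a == b}
# Event D: the sum of the two faces is 4.
D = {(a, b) for (a, b) in sample_space if a + b == 4}

print(sorted(D))        # [(1, 3), (2, 2), (3, 1)]
print(probability(C))   # 1/6
print(probability(D))   # 1/12
```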

Types of Events

The probability of an event lies between 0 and 1 inclusive. If the probability of an event is 1, it is known as a sure event. If the probability of an event is 0, it is an impossible event. When two or more events cannot occur at the same time, they are called mutually exclusive events. For example, in the coin tossing example, either Head or Tail will occur; both cannot occur at the same time.

Events are equally likely when they have the same chance of occurrence. For example, when a fair coin is tossed, Head and Tail each have a 50% chance of occurring. Outcomes such as a student passing or failing an exam are equally likely only if each really has a 50% chance. Collectively exhaustive events are events whose union is equal to the sample space.
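Whether events are mutually exclusive or collectively exhaustive can be checked directly with set operations. The following Python sketch uses a single die as the example; the event names and helper functions are assumptions made for illustration.

```python
# Sample space and some events for a single die.
S = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
odd = {1, 3, 5}
at_least_five = {5, 6}

def mutually_exclusive(a, b):
    """Two events are mutually exclusive when they share no sample point."""
    return a.isdisjoint(b)

def collectively_exhaustive(events, sample_space):
    """Events are collectively exhaustive when their union is the sample space."""
    return set().union(*events) == sample_space

print(mutually_exclusive(even, odd))                      # True
print(mutually_exclusive(even, at_least_five))            # False, both contain 6
print(collectively_exhaustive([even, odd], S))            # True
print(collectively_exhaustive([even, at_least_five], S))  # False
```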

Random Variable

A random variable is a variable that takes its values randomly, according to the outcome of a random experiment. Random variables are usually represented by capital letters such as $X$, $Y$, and $Z$, and can be classified as discrete or continuous. A discrete random variable is based on a counting procedure, so its values are typically integers, while a continuous random variable is based on measurement and can take real values.

When we toss a coin, the outcome may be $H$ or $T$. Suppose you are interested in heads; then the random variable $X$ may denote the number of heads (for example, 0 heads or 1 head).

| Number of heads ($x$) | $P(X = x)$ |
|---|---|
| 0 heads | $\frac{1}{2}$ |
| 1 head | $\frac{1}{2}$ |
| Total | $1.0$ |

The sample space is $S=\{H, T\}$.

For two coins, the distribution of the number of heads is:

| Number of heads ($x$) | $P(X = x)$ |
|---|---|
| 0 heads | $\frac{1}{4}$ |
| 1 head | $\frac{2}{4}$ |
| 2 heads | $\frac{1}{4}$ |
| Total | $1.0$ |

The sample space is $S=\{HH, HT, TH, TT\}$.
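The probabilities for the number of heads can be reproduced by counting heads in each sample point. This is a minimal Python sketch, assuming fair coins so that every sample point is equally likely; the function name `heads_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_coins):
    """Map each possible number of heads x to P(X = x) for n_coins fair coins."""
    outcomes = list(product("HT", repeat=n_coins))
    # Count how many sample points give each number of heads.
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in heads_distribution(2).items():
    print(x, p)
# 0 1/4
# 1 1/2
# 2 1/4
```

`Fraction` reduces $\frac{2}{4}$ to $\frac{1}{2}$ automatically, and the probabilities sum to 1 as in the tables.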

Online Big Data MCQs 5

The post is about Online Big Data MCQs with Answers. There are 20 multiple-choice questions about the 5 Vs of Big Data, IaaS, PaaS, NameNode, HDFS, MapReduce, Hadoop, Apache Spark, and YARN. Let us start with the Online Big Data MCQs with Answers now.

Online Big Data MCQs with Answers

1. What is Apache Spark primarily used for in Big Data?
2. Which of the following is a distributed file storage system used in Big Data?
3. What are the two main components of a data computation framework that were described in the slides?
4. What is the primary characteristic of Big Data that refers to the scale of data?
5. Which of the following are problems to look out for when integrating your project with Hadoop?
6. What is the benefit of using pre-built Hadoop images?
7. What are some examples of open-source tools built for Hadoop, and what do they do?
8. What is the purpose of data preprocessing in Big Data analytics?
9. Which of the following is NOT one of the 5 Vs of Big Data?
10. What are the two key components of HDFS and what are they used for?
11. What is the order of the three steps of MapReduce?
12. What is the purpose of YARN?
13. What does the term “Velocity” in Big Data refer to?
14. What does IaaS provide?
15. What does PaaS provide?
16. What is the difference between low-level interfaces and high-level interfaces?
17. Which tool is used for real-time data streaming in Big Data?
18. What is the job of the NameNode?
19. What does SaaS provide?
20. Which of the following are Hadoop’s major goals?


Hypothesis Testing MCQs Test 12

The post is about Hypothesis Testing MCQs Test with Answers. The quiz contains 20 questions about hypothesis testing and p-values. It covers the formulation of the null and alternative hypotheses, the level of significance, test statistics, the region of rejection, the decision, effect size, and the acceptance and rejection of hypotheses. Let us start with the Hypothesis Testing MCQs Test now.

Online Hypothesis Testing MCQs Test with Answers

  • Which of the following are tests about population proportions and frequencies?
  • Which of the following would best be analyzed using a chi-square test of independence?
  • A man accused of committing a crime is taking a polygraph (lie detector) test. The polygraph is essentially testing the hypotheses $H_0$: The man is telling the truth vs. $H_a$: The man is not telling the truth. Suppose we use a 5% level of significance. Based on the man’s responses to the questions asked, the polygraph determines a P-value of 0.08. We conclude that:
  • If you were running a two-tail t-test with a sample size of $n=24$, what would the critical t-value be if $\alpha$ was chosen as 5%?
  • If a p-value for a hypothesis test of the mean was 0.0330 and the level of significance was 5%, what conclusion would you draw?
  • The power of a statistical test is the probability of rejecting the null hypothesis when it is ______. When you increase alpha, the power of the test will ______.
  • The value $(1 - \alpha)$ is called ______.
  • Which of the following is false?
  • Which of the following is false?
  • We want to estimate the average coffee intake of Coursera students, measured in cups of coffee. A survey of 1,000 students yields an average of 0.55 cups per day, with a standard deviation of 1 cup per day. Which of the following is not necessarily true?
  • One-sided alternative hypotheses are phrased in terms of:
  • A Type 2 error occurs when the null hypothesis is
  • You set up a two-sided hypothesis test for a population mean with a null hypothesis of $H_0:\mu=100$. You chose a significance level $\alpha=0.05$. The p-value calculated from the data is 0.12, and hence you failed to reject the null hypothesis. Suppose that after your analysis was completed and published, an expert informed you that the true value of  $\mu$ is 104. How would you describe the result of your analysis?
  • For given values of the sample mean and the sample standard deviation when $n = 25$, you conduct a hypothesis test and obtain a p-value of 0.0667, which leads to non-rejection of the null hypothesis. What will happen to the p-value if the sample size increases (and all else stays the same)?
  • A study compared five different methods for teaching descriptive statistics. The five methods were (i) traditional lecture and discussion, (ii) programmed textbook instruction, (iii) programmed text with lectures, (iv) computer instruction, and (v) computer instruction with lectures. 45 students were randomly assigned, 9 to each method. After completing the course, students took a 1-hour exam. We are interested in finding out if the average test scores are different for the different teaching methods. If the original significance level for the ANOVA was 0.05, what should be the adjusted significance level for the pairwise tests to compare all pairs of means to each other?
  • Which of the following is false regarding paired data?
  • A statement or assumption made about the value of a population parameter is
  • Which hypothesis is tested for possible rejection under the assumption that it is true?
  • The feed of a certain type of hormone increases the mean weight of chicks by 0.3 ounces. A sample of 25 eggs has a mean increase of 0.4 ounces with a standard deviation of 0.20 ounces. What is the value of the t-statistic?
  • Scientists claim that a diet will increase the mean weight of eggs by at least 0.3 ounces. A sample of 25 eggs has a mean increase of 0.4 ounces with a standard deviation of 0.20 ounces. What will be the null hypothesis for testing this claim about the diet?
