Probability Terminology


The following probability terms are helpful for understanding the concepts and rules of probability used to solve real-life probability problems.

Sets: A set is a well-defined collection of distinct objects. The objects making up a set are called its elements. A set is usually denoted by capital letters, e.g. $A, B, C$, while its elements are denoted by small letters, e.g. $a, b, c$.

Null Set: A set that contains no element is called a null set or simply an empty set. It is denoted by { } or $\varnothing$.

Subset: If every element of a set $A$ is also an element of a set $B$, then $A$ is said to be a subset of $B$, denoted by $A \subseteq B$.

Proper Subset: If $A$ is a subset of $B$ and $B$ contains at least one element that is not an element of $A$, then $A$ is said to be a proper subset of $B$, denoted by $A \subset B$.
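Python's built-in `set` type mirrors these two definitions: the comparison operators `<=` and `<` correspond to "subset" and "proper subset". A minimal sketch, where the sets `A` and `B` are illustrative values chosen here:

```python
# Illustrative sets (any finite sets would do).
A = {1, 2}
B = {1, 2, 3}

# A <= B: every element of A is also in B (A is a subset of B).
print(A <= B)         # True
# A < B: A is a subset of B and B has at least one extra element (proper subset).
print(A < B)          # True
# Every set is a subset of itself, but never a proper subset of itself.
print(B <= B, B < B)  # True False
```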

Finite and Infinite Sets: A set is finite if it contains a specific number of elements, i.e. the process of counting its members comes to an end; otherwise, the set is infinite.

Universal Set: A set consisting of all the elements of the sets under consideration is called the universal set. It is denoted by $U$.

Disjoint Sets: Two sets $A$ and $B$ are said to be disjoint if they have no elements in common, i.e. if $A \cap B = \varnothing$.

Overlapping Sets: Two sets $A$ and $B$ are said to be overlapping if they have at least one element in common, i.e. $A \cap B \ne \varnothing$, and neither of them is a subset of the other.
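The distinction between disjoint and overlapping sets can be checked mechanically. The function `relationship` below is a hypothetical helper, not something from the original text; the label `"nested"` is also introduced here for the case where two sets share elements but one contains the other, which the definition of overlapping sets excludes.

```python
def relationship(a: set, b: set) -> str:
    """Classify two sets as 'disjoint', 'overlapping', or 'nested' (hypothetical label)."""
    if not (a & b):         # a & b is the intersection; empty means no common element
        return "disjoint"
    if a <= b or b <= a:    # common elements, but one set is a subset of the other
        return "nested"
    return "overlapping"    # common elements and neither is a subset of the other

print(relationship({1, 2}, {3, 4}))     # disjoint
print(relationship({1, 2, 3}, {3, 4}))  # overlapping
print(relationship({1, 2}, {1, 2, 3}))  # nested
```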

Union of Sets: The union of two sets $A$ and $B$ is the set containing the elements that belong to $A$ or $B$ or both. It is denoted by $A \cup B$ and read as "$A$ union $B$".

Intersection of Sets: The intersection of two sets $A$ and $B$ is the set containing the elements that belong to both $A$ and $B$. It is denoted by $A \cap B$ and read as "$A$ intersection $B$".

Difference of Sets: The difference between a set $A$ and a set $B$ is the set containing the elements of $A$ that are not contained in $B$. It is denoted by $A - B$.

Complement of a Set: The complement of a set $A$, denoted by $\bar{A}$ or $A^c$, is the set of all elements of the universal set $U$ that are not in $A$, i.e. $\bar{A} = U - A$.
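The four set operations can be illustrated with Python's `set` operators (`|`, `&`, `-`). This is a minimal sketch that assumes a universal set $U = \{1, 2, \dots, 10\}$ and two example sets chosen only for illustration; the complement is computed as a difference from that assumed $U$.

```python
# Assumed universal set and two illustrative sets.
U = set(range(1, 11))    # {1, 2, ..., 10}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B            # elements in A or B or both
intersection = A & B     # elements in both A and B
difference = A - B       # elements of A that are not in B
complement_A = U - A     # elements of U that are not in A

print(sorted(union))         # [1, 2, 3, 4, 5, 6]
print(sorted(intersection))  # [3, 4]
print(sorted(difference))    # [1, 2]
print(sorted(complement_A))  # [5, 6, 7, 8, 9, 10]
```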

Experiment: An experiment is any activity in which we observe or measure something, i.e. an activity that produces an outcome or event.

Random Experiment: An experiment that, if repeated under identical conditions, may not give the same outcome is called a random experiment, i.e. the outcome of a random experiment is uncertain, so a given outcome is just one sample of many possible outcomes. For a random experiment, all possible outcomes are known in advance. A random experiment has the following properties:

  1. The experiment can be repeated any number of times.
  2. A random trial has at least two possible outcomes.

Sample Space: The set of all possible outcomes of a random experiment is called the sample space. In the coin-tossing experiment, the sample space is $S=\{Head, Tail\}$; in the card-drawing experiment, the sample space has 52 members. Similarly, the sample space for rolling a die is $S=\{1, 2, 3, 4, 5, 6\}$.
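The three sample spaces mentioned above can be written out explicitly. In this sketch the card deck is built as every (rank, suit) pair with `itertools.product`; the rank and suit labels are an assumed encoding, chosen only for illustration.

```python
from itertools import product

coin = {"Head", "Tail"}
die = {1, 2, 3, 4, 5, 6}

# A standard deck as every (rank, suit) pair; the labels are an assumed encoding.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))

print(len(coin), len(die), len(deck))  # 2 6 52
```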

Event: An event is simply a subset of the sample space. A sample space can contain two or more events made up of sample points. For a single coin toss, the sample space has $n = 2$ sample points, so the number of possible events is $2^n = 2^2 = 4$: i) $A_1=\{H\}$, ii) $A_2=\{T\}$, iii) $A_3=\{H, T\}$, and iv) $A_4=\varnothing$.
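Since an event is any subset of the sample space, listing all events means listing every subset (the power set), which has $2^n$ members for $n$ sample points. A minimal sketch using `itertools.combinations`; the helper name `all_events` is hypothetical:

```python
from itertools import chain, combinations

def all_events(sample_space):
    """Return every subset (event) of a finite sample space, including the empty set."""
    points = sorted(sample_space)
    # Subsets of size 0, 1, ..., n, chained into one list of sets.
    return [set(c) for c in chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1))]

coin_events = all_events({"H", "T"})
print(len(coin_events))                      # 4, i.e. 2**2
print(len(all_events({1, 2, 3, 4, 5, 6})))   # 64, i.e. 2**6
```

For the coin, the four sets produced are exactly $\varnothing$, $\{H\}$, $\{T\}$, and $\{H, T\}$.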

Simple Event: If an event consists of one sample point, then it is called a simple event. For example, when two coins are tossed, the event {TT} is simple.

Compound Event: If an event consists of more than one sample point, it is called a compound event. For example, when two dice are rolled, the event $B$ that the sum of the two faces is 4, i.e. $B=\{(1,3), (2,2), (3,1)\}$, is a compound event.
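The event $B$ can be found by enumerating the 36 ordered outcomes of two dice and keeping those whose faces sum to 4. A short sketch:

```python
from itertools import product

# Sample space for rolling two dice: 36 ordered pairs (first die, second die).
S = list(product(range(1, 7), repeat=2))

# Event B: the two faces sum to 4.
B = [outcome for outcome in S if sum(outcome) == 4]

print(B)           # [(1, 3), (2, 2), (3, 1)]
print(len(B) > 1)  # True: more than one sample point, so B is a compound event
```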

Independent Events: Two events $A$ and $B$ are said to be independent if the occurrence of one does not affect the occurrence of the other. For example, in tossing two coins, the occurrence of a head on one coin does not affect in any way the occurrence of a head or tail on the other coin.
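Formally, $A$ and $B$ are independent when $P(A \cap B) = P(A)\,P(B)$. The sketch below checks this rule for the two-coin example by counting equally likely outcomes; exact fractions avoid floating-point rounding. The helper `prob` is a hypothetical name introduced here.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two fair coins; the 4 outcomes are equally likely.
S = list(product("HT", repeat=2))

def prob(event):
    """Classical probability: number of favorable outcomes / number of outcomes."""
    return Fraction(len(event), len(S))

A = {o for o in S if o[0] == "H"}  # head on the first coin
B = {o for o in S if o[1] == "H"}  # head on the second coin

# Independence check: P(A and B) == P(A) * P(B)
print(prob(A & B), prob(A) * prob(B))    # 1/4 1/4
print(prob(A & B) == prob(A) * prob(B))  # True
```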

Dependent Events: Two events $A$ and $B$ are said to be dependent if the occurrence of one event affects the occurrence of the other.

Mutually Exclusive Events: Two events $A$ and $B$ are said to be mutually exclusive if they cannot occur at the same time, i.e. $A \cap B = \varnothing$. For example, when a coin is tossed, we get either a head or a tail, but not both. Because these two events (head and tail) have no sample point in common, they are mutually exclusive. Similarly, when a die is thrown, the possible outcomes 1, 2, 3, 4, 5, and 6 are mutually exclusive.
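Checking mutual exclusivity amounts to testing whether the intersection is empty. A minimal sketch with a hypothetical helper `mutually_exclusive`, applied to the single-face events of a die and to two overlapping die events chosen for contrast:

```python
# Each die face as an event: {1}, {2}, ..., {6}
faces = [{k} for k in range(1, 7)]

def mutually_exclusive(a: set, b: set) -> bool:
    """Two events are mutually exclusive when their intersection is empty."""
    return not (a & b)

# Every distinct pair of single-face events is mutually exclusive.
print(all(mutually_exclusive(faces[i], faces[j])
          for i in range(6) for j in range(i + 1, 6)))  # True
# "Even number" and "greater than 3" share 4 and 6, so they are not.
print(mutually_exclusive({2, 4, 6}, {4, 5, 6}))         # False
```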


Equally Likely Events: Two events $A$ and $B$ are said to be equally likely when one event is as likely to occur as the other, i.e. if the experiment is repeated a large number of times, each event occurs about an equal number of times. Mathematically, $P(A) = P(B)$. For example, when a coin is tossed, a head is as likely to occur as a tail. (Equally likely events are not the same as non-mutually exclusive events, which are events with at least one sample point in common, i.e. $A \cap B \ne \varnothing$.)

Exhaustive Events: When a sample space $S$ is partitioned into mutually exclusive events whose union is the sample space itself, these events are called exhaustive events. In other words, mutually exclusive events are said to be collectively exhaustive when their union is the entire sample space $S$.
Suppose a die is rolled; the sample space is $S=\{1,2,3,4,5,6\}$.
Let $A=\{1, 2\}$, $B=\{3, 4, 5\}$, and $C=\{6\}$.

The events $A$, $B$, and $C$ are mutually exclusive, and their union is the sample space ($A \cup B \cup C = S$), so $A$, $B$, and $C$ are exhaustive.
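Both conditions in this example (pairwise disjoint events, union equal to $S$) can be verified directly. The function name `mutually_exclusive_and_exhaustive` is a hypothetical helper introduced for this sketch:

```python
from itertools import combinations

def mutually_exclusive_and_exhaustive(events, sample_space):
    """True if the events are pairwise disjoint and their union is the whole sample space."""
    pairwise_disjoint = all(not (a & b) for a, b in combinations(events, 2))
    covers_space = set().union(*events) == sample_space
    return pairwise_disjoint and covers_space

S = {1, 2, 3, 4, 5, 6}
A, B, C = {1, 2}, {3, 4, 5}, {6}
print(mutually_exclusive_and_exhaustive([A, B, C], S))  # True
print(mutually_exclusive_and_exhaustive([A, B], S))     # False: 6 is not covered
```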


Pareto Chart Easy Guide (2012)

A Pareto chart, named after the Italian economist Vilfredo Pareto, is a bar chart in which the bars are ordered from largest to smallest, together with a line showing the cumulative percentage (or cumulative count) of the bars. The left vertical axis shows the frequency of occurrence (number of occurrences) or some other important unit of measure, such as cost. The right vertical axis shows the cumulative percentage of the total number of occurrences, or of the total of the chosen unit of measure, such as total cost. Because the bars (representing the reasons or categories) are in decreasing order, the cumulative function of a Pareto chart is concave. A Pareto chart is also called a Pareto distribution diagram.

The Pareto chart is also known as the 80/20 rule chart. These charts offer several benefits for data analysis and problem-solving.

A Pareto chart can be used when the answer to both of the following questions is “yes”:

  1. Can data be arranged into categories?
  2. Is the rank of each category important?

Pareto charts are often used to analyze defects in a manufacturing process or the most frequent reasons for customer complaints, to help determine which types of defects are most prevalent (important) in a process. A company can then focus its improvement efforts on the areas where eliminating the causes of defects yields the largest gain or the smallest loss. Pareto charts therefore make it easy to prioritize problem areas. The categories in the “tail” of the Pareto chart are called the insignificant factors.
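The computation behind a Pareto chart is simple: sort the categories by count in decreasing order, then accumulate their share of the total. The sketch below uses hypothetical defect counts (illustrative numbers only, not from the article) and a hypothetical helper `pareto_table`; the resulting rows give the bar heights (left axis) and the cumulative-percentage line (right axis), which a plotting library could then draw.

```python
def pareto_table(counts):
    """Return (category, count, cumulative %) rows, sorted from largest to smallest count."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in ordered:
        running += count                      # running total of the counts so far
        rows.append((category, count, 100 * running / total))
    return rows

# Hypothetical defect counts by category (illustrative numbers only).
counts = {"Scratches": 12, "Misalignment": 45, "Cracks": 8, "Wrong color": 25, "Other": 10}

for category, count, cum_pct in pareto_table(counts):
    print(f"{category:<13}{count:>4}{cum_pct:7.1f}%")
```

With these numbers, the first two categories already account for 70% of all defects, which is the kind of concentration a Pareto chart is meant to reveal.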

Pareto Chart Example

Figure: Pareto chart of consumer complaints against airlines (2004)

The Pareto chart given above shows the reasons for consumer complaints against airlines in 2004. Each bar represents the number (frequency) of complaints received in that category. The largest category is flight problems (cancellations, delays, and other deviations from the schedule). The second largest is customer service (rude or unhelpful employees, inadequate meals or cabin service, treatment of delayed passengers, etc.). Flight problems account for 21% of the complaints, while flight problems and customer service together account for 40%. The top three complaint categories account for 55% of the complaints. So, to reduce the number of complaints, airlines should focus on flight delays, customer service, and baggage problems.

By incorporating Pareto charts into data analysis, one can gain valuable insights, prioritize effectively, and make data-driven decisions.




Goldfeld Quandt Test: Comparison of Variances of Error Terms

The Goldfeld Quandt test is one of two tests proposed in a 1965 paper by Stephen Goldfeld and Richard Quandt. Both parametric and nonparametric tests are described in the paper, but the term “Goldfeld–Quandt test” is usually associated only with the parametric test.
The Goldfeld-Quandt test is frequently used because it is easy to apply when one of the regressors (or another random variable) is considered the proportionality factor of heteroscedasticity. The test is applicable to large samples: the number of observations must be at least twice the number of parameters to be estimated. The test assumes normally distributed and serially independent error terms $u_i$.

The Goldfeld-Quandt test compares the variances of the error terms across discrete subgroups, so the data are divided into $h$ subgroups. Usually, the data set is divided into two groups, which is why the test is sometimes called a two-group test.


Before learning how to perform the Goldfeld-Quandt test, you may want to read more about heteroscedasticity, the remedial measures of heteroscedasticity, tests of heteroscedasticity, and generalized least squares methods.

Goldfeld Quandt Test Procedure:

The procedure for conducting the Goldfeld-Quandt test is as follows:

  1. Order the observations according to the magnitude of $X$ (the independent variable that is the proportionality factor).
  2. Arbitrarily select a certain number $c$ of central observations and omit them from the analysis (for $n=30$, about 8 central observations are omitted, i.e. roughly one-quarter of the observations). Divide the remaining $n-c$ observations into two sub-groups of equal size $\frac{n-c}{2}$: one sub-group contains the small values of $X$ and the other contains the large values of $X$.
  3. Fit a separate regression to each sub-group and obtain the residual sum of squares from each.
    $RSS_1 = \sum e_1^2$ is the residual sum of squares from the sub-sample of low values of $X$, with $(n-c)/2-k$ df, where $k$ is the total number of parameters (including the intercept). $RSS_2 = \sum e_2^2$ is the residual sum of squares from the sub-sample of large values of $X$, also with $(n-c)/2-k$ df.
  4. Compute the ratio $F^* = \frac{RSS_2/df}{RSS_1/df}=\frac{\sum e_2^2/((n-c)/2-k)}{\sum e_1^2/((n-c)/2-k)}$.

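If the two error variances differ, $F^*$ will be large. Under the null hypothesis of homoscedasticity (and the normality assumption stated above), $F^*$ follows an $F$ distribution with $(n-c)/2-k$ degrees of freedom in both the numerator and the denominator, so homoscedasticity is rejected when $F^*$ exceeds the critical $F$ value at the chosen level of significance. The higher the observed value of the $F^*$ ratio, the stronger the heteroscedasticity of the $u_i$.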
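The four steps can be sketched in plain Python for a simple regression $Y = \beta_0 + \beta_1 X + u$, so $k = 2$. This is an illustrative sketch, not a reference implementation: the helper names `ols_rss` and `goldfeld_quandt` are hypothetical, the data are synthetic (the error spread is constructed to grow with $X$), and $c = 8$ follows the $n = 30$ guideline above.

```python
def ols_rss(x, y):
    """Fit y = b0 + b1*x by ordinary least squares and return the residual sum of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt(x, y, c):
    """Return (F*, df) for a simple regression (k = 2), omitting c central observations."""
    k = 2
    pairs = sorted(zip(x, y))              # step 1: order by the magnitude of X
    n = len(pairs)
    if (n - c) % 2:
        raise ValueError("n - c must be even so both sub-groups have equal size")
    m = (n - c) // 2                       # step 2: size of each sub-group
    df = m - k
    if df <= 0:
        raise ValueError("each sub-group needs more observations than parameters")
    low, high = pairs[:m], pairs[n - m:]   # the c central observations are dropped
    rss1 = ols_rss(*zip(*low))             # step 3: RSS from the low-X sub-group
    rss2 = ols_rss(*zip(*high))            #         RSS from the high-X sub-group
    return (rss2 / df) / (rss1 / df), df   # step 4: F* with (df, df) degrees of freedom

# Synthetic data (an assumption for illustration): the error size grows with X.
x = list(range(1, 31))
y = [3 + 2 * xi + (-1) ** xi * 0.5 * xi for xi in x]

F, df = goldfeld_quandt(x, y, c=8)   # n = 30, omit 8 central observations
print(F, df)                         # F* is far above 1 here; df = 9
```

The resulting $F^*$ would then be compared with the critical value of $F(df, df)$ from tables; if SciPy is available, `scipy.stats.f.sf(F, df, df)` gives the corresponding p-value.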


References

  • Goldfeld, Stephen M.; Quandt, R. E. (June 1965). “Some Tests for Homoscedasticity”. Journal of the American Statistical Association 60 (310): 539–547
  • Kennedy, Peter (2008). A Guide to Econometrics (6th ed.). Blackwell. p. 116
