Binomial Distribution (2016)

In this post, we will learn about Binomial Distribution and its basics.

A statistical experiment is called a Binomial Experiment if it consists of a fixed number (say $n$) of successive independent trials, each trial has two possible outcomes (such as success and failure, true and false, yes and no, or right and wrong), and the probability of success is the same for each trial. Each trial of a Binomial Experiment is known as a Bernoulli trial (a trial which is a single performance of an experiment), for example, a single toss of a coin.

Properties of the Binomial Experiment

  1. Each trial of the Binomial Experiment can be classified as a success or failure.
  2. The probability of success for each trial of the experiment is equal.
  3. Successive trials are independent, that is, the outcome of one trial does not affect the outcome of any other trial.
  4. The experiment is repeated a fixed number of times.
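The four properties above can be illustrated with a small simulation. This is only a minimal sketch: the function name `binomial_experiment`, the seed, and the values $n=10$ and $p=0.5$ are assumptions chosen for illustration, not part of the original post.

```python
import random

def binomial_experiment(n, p, rng):
    """Run n independent Bernoulli trials and return the number of successes.

    Each trial is a success with the same probability p (property 2),
    trials do not influence each other (property 3), and the number of
    trials n is fixed in advance (property 4).
    """
    return sum(1 for _ in range(n) if rng.random() < p)

# Hypothetical example: 10 tosses of a fair coin, counting heads as success.
rng = random.Random(42)  # fixed seed only so the run is repeatable
successes = binomial_experiment(10, 0.5, rng)
print("Number of successes in 10 trials:", successes)
```

The count returned is one observed value of the binomial random variable discussed in the next section; repeating the experiment many times would trace out its distribution.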

Binomial Distribution

Let $X$ be a discrete random variable that denotes the number of successes in a Binomial Experiment (we call this a binomial random variable). The random variable assumes the isolated values $X=0,1,2,\cdots,n$. The probability distribution of a binomial random variable is termed the binomial distribution. It is a discrete probability distribution.

Binomial Probability Mass Function

The probability function of the binomial distribution is also called the binomial probability mass function. It can be denoted by $b(x, n, p)$, that is, a binomial distribution of the random variable $X$ with $n$ (given number of trials) and $p$ (probability of success) as parameters. If $p$ is the probability of success (alternatively, $q=1-p$ is the probability of failure, such that $p+q=1$), then the probability of exactly $x$ successes can be found from the following formula,

\begin{align*}
b(x, n, p) &= P(X=x)\\
&=\binom{n}{x} p^x q^{n-x}, \quad x=0,1,2, \cdots, n
\end{align*}

where $p$ is the probability of success of a single trial, $q$ is the probability of failure and $n$ is the number of independent trials.

The formula gives the probability of each possible value of a binomial random variable $X$ for any combination of $n$ and $p$. Note that it does not give $P(X<0)$ and $P(X>n)$. The binomial distribution is suitable when $n$ is small, and it is applied when sampling is done with replacement.
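The formula can be evaluated directly. The following is a minimal sketch; the function name `binomial_pmf` and the example values ($x=2$, $n=5$, $p=0.5$) are assumptions chosen for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Return b(x, n, p) = C(n, x) * p^x * q^(n - x) for x = 0, 1, ..., n."""
    if not 0 <= x <= n:
        # Outside the range 0..n the formula does not apply;
        # the probability of such values is zero.
        return 0.0
    q = 1 - p  # probability of failure
    return comb(n, x) * p**x * q**(n - x)

# Probability of exactly 2 successes in 5 trials with p = 0.5:
# C(5, 2) * 0.5^2 * 0.5^3 = 10 / 32 = 0.3125
prob = binomial_pmf(2, 5, 0.5)
print(prob)
```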

The function

\[b(x, n, p) = \binom{n}{x} p^x q^{n-x}, \quad x=0,1,2,\cdots,n,\]

is called the Binomial distribution because its successive terms are the same as those of the binomial expansion of

\[(q+p)^n=\binom{n}{0} p^0 q^{n-0}+\binom{n}{1} p^1 q^{n-1}+\cdots+\binom{n}{n-1} p^{n-1} q^{n-(n-1)}+\binom{n}{n} p^n q^{n-n}\]

$\binom{n}{0}, \binom{n}{1}, \binom{n}{2},\cdots, \binom{n}{n-1}, \binom{n}{n}$ are called Binomial coefficients.
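Because $q+p=1$, the terms of the expansion must add up to $(q+p)^n=1$. The short check below is only a sketch under assumed values ($n=4$, $p=0.3$); it lists the binomial coefficients and confirms that the terms sum to one.

```python
from math import comb

n, p = 4, 0.3  # assumed example values
q = 1 - p

# Binomial coefficients C(n, 0), C(n, 1), ..., C(n, n)
coefficients = [comb(n, x) for x in range(n + 1)]

# Successive terms of the expansion of (q + p)^n
terms = [c * p**x * q**(n - x) for x, c in enumerate(coefficients)]
total = sum(terms)

print(coefficients)  # [1, 4, 6, 4, 1]
print(total)         # equal to 1 up to floating-point rounding
```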

Note that it is necessary to state the range of the random variable; otherwise, the expression is only a mathematical equation, not a probability distribution.


FAQs about Binomial Distribution

  1. What is a binomial random variable?
  2. What is a binomial experiment?
  3. What is the binomial formula?
  4. What is the binomial probability mass function?
  5. Discuss the properties of the Binomial experiment.
  6. What are the parameters of binomial distribution?

Writing Excel Formulas (2016)

Writing Excel formulas is a little different from the way it is done in mathematics class. All Excel formulas start with an equal sign (=); that is, the equal sign always goes in the cell where you want the answer from the formula to appear. The equal sign tells Excel that this is a formula, not just a name or number. Let us start with writing Excel formulas.

The Excel formula looks like

= 3 + 2

rather than

3+2 =

Writing Excel Formulas and Cell References in MS Excel

The example formula has one drawback. If you want to change the numbers being calculated (3 and 2), you need to edit or re-write the formula. A better way is to write formulas in such a way that you can change the numbers without changing or re-writing the formulas themselves. To do this, cell references are used, which tell Excel that the data/numbers are located in a cell. A cell’s location in the spreadsheet is referred to as its cell reference.

To find a cell reference, simply click the cell whose reference you need and read the text in the NAME BOX (shown in the figure below), such as F2.

Writing Excel formulas 1

F2 represents the cell in column F (horizontal position) and row 2 (vertical position). This means a cell reference can also be found by reading the column heading (at the top) and the row number (at the left) of the cell. Therefore, a cell reference is a combination of the column letter and row number, such as A1, B2, Z5, or A106. For the previous formula example, instead of writing = 3 + 2 in a cell (say C1), use cell references as follows:

In cell A1 write 3, and in cell A2 write 2. In cell C1 write the formula

= A1 + A2
Excel Formula 2

Note that there is no space between the column letter and the row number: the references are written simply as A1 and A2. See the diagram for a clear understanding.

Updating/ Writing Excel Formulas

When cell references are used in an Excel formula, the result of the formula is automatically updated whenever the data value in a referenced cell changes. For example, if you want to change the data in cell A1 to 8 instead of 3, you only need to change the content of A1. The result of the formula in cell C1 will automatically be updated after the data value in A1 or A2 is changed.

Note that the formula will not change because the cell references are being used instead of data values or numbers.
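For readers who know a little programming, the same idea can be mimicked outside Excel. The Python sketch below is only an analogy and not how Excel works internally: the dictionary `cells` and the function `formula_c1` are hypothetical stand-ins for the worksheet and for the formula = A1 + A2 stored in cell C1.

```python
# Hypothetical stand-in for a worksheet: cell reference -> value
cells = {"A1": 3, "A2": 2}

def formula_c1(cells):
    """Plays the role of the formula = A1 + A2 written in cell C1."""
    return cells["A1"] + cells["A2"]

before = formula_c1(cells)  # 3 + 2 = 5

# Change only the data in A1; the formula itself is not edited.
cells["A1"] = 8
after = formula_c1(cells)   # 8 + 2 = 10

print(before, after)
```

The formula refers to the cells rather than to the numbers, so changing the data changes the result while the formula stays the same.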


Read more about Creating Formulas in MS Excel and Operator Order of Precedence

The Correlogram

A correlogram is a graph used to interpret a set of autocorrelation coefficients, in which $r_k$ is plotted against the lag $k$. A correlogram is often very helpful for visual inspection.
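The values plotted in a correlogram can be computed as in the sketch below. The post does not state the formula for $r_k$, so the usual sample estimator is assumed here, $r_k = \sum_{t=1}^{N-k}(x_t-\bar{x})(x_{t+k}-\bar{x}) \big/ \sum_{t=1}^{N}(x_t-\bar{x})^2$. The function names and the example series are hypothetical.

```python
def autocorrelation(x, k):
    """Sample autocorrelation coefficient r_k at lag k (assumed usual estimator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return numer / denom

def correlogram(x, max_lag):
    """Return [r_0, r_1, ..., r_max_lag], the values plotted against the lag."""
    return [autocorrelation(x, k) for k in range(max_lag + 1)]

# Hypothetical short series, for illustration only
series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0]
r = correlogram(series, 3)
for lag, value in enumerate(r):
    print(lag, round(value, 3))
```

By construction $r_0 = 1$, so correlograms are usually read from lag 1 onward.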

Some general advice for interpreting the correlogram is:

  • A Random Series: If a time series is completely random, then for large $N$, $r_k \cong 0$ for all non-zero values of $k$. For a random time series, $r_k$ is approximately $N\left(0, \frac{1}{N}\right)$. If a time series is random, 19 out of 20 of the values of $r_k$ can be expected to lie between $\pm \frac{2}{\sqrt{N}}$. However, when the first 20 values of $r_k$ are plotted, one can expect to find one significant value on average even when the time series is random.
  • Short-term Correlation: Stationary series often exhibit short-term correlation, characterized by a fairly large value of $r_1$ followed by 2 or 3 more coefficients that are significantly greater than zero but tend to get successively smaller. Values of $r_k$ for larger lags tend to be approximately zero. A time series that gives rise to such a correlogram is one for which an observation above the mean tends to be followed by one or more further observations above the mean, and similarly for observations below the mean. A model called an autoregressive model may be appropriate for a series of this type.
  • Alternating Series: If a time series tends to alternate, with successive observations on different sides of the overall mean, then the correlogram also tends to alternate. The value of $r_1$ will be negative; however, the value of $r_2$ will be positive, as observations at lag 2 will tend to be on the same side of the mean.
  • Non-Stationary Series: If a time series contains a trend, then the value of $r_k$ will not come down to zero except for very large values of the lag. This is because an observation on one side of the overall mean tends to be followed by a large number of further observations on the same side of the mean because of the trend. The sample autocorrelation function $\{ r_k \}$ should only be calculated for stationary time series, so any trend should be removed before calculating $\{ r_k\}$.
  • Seasonal Fluctuations: If a time series contains a seasonal fluctuation then the correlogram will also exhibit an oscillation at the same frequency. If $x_t$ follows a sinusoidal pattern then so does $r_k$.
    $x_t=a \cos t w,$ where $a$ is a constant and $w$ is the frequency such that $0 < w < \pi$. Therefore $r_k \cong \cos k w$ for large $N$.
    If the seasonal variation is removed from seasonal data then the correlogram may provide useful information.
  • Outliers: If a time series contains one or more outliers, the correlogram may be seriously affected. If there is one outlier in the time series and it is not adjusted, then the plot of $x_t$ vs $x_{t+k}$ will contain two extreme points, which will tend to depress the sample correlation coefficients towards zero. If there are two outliers, this effect is more noticeable.
  • General Remarks: Experience is required to interpret autocorrelation coefficients. We need to study the probability theory of stationary series and the classes of models too. We also need to know the sampling properties of $r_k$.
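Some of the patterns described above can be reproduced numerically. The sketch below assumes the usual sample estimator of $r_k$ (the post does not state it) and uses made-up series: an alternating series, a pure linear trend, and seeded Gaussian noise. All names and values are hypothetical.

```python
import math
import random

def autocorrelation(x, k):
    """Sample autocorrelation coefficient r_k at lag k (assumed usual estimator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return numer / denom

# Alternating series: successive values on opposite sides of the mean
alternating = [(-1) ** t for t in range(50)]
alt_r1 = autocorrelation(alternating, 1)  # negative
alt_r2 = autocorrelation(alternating, 2)  # positive

# Series with a trend: r_k stays large instead of dropping towards zero
trend = [float(t) for t in range(50)]
trend_r1 = autocorrelation(trend, 1)

# Random series: most r_k should lie within +/- 2 / sqrt(N)
rng = random.Random(0)  # fixed seed only so the run is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]
bound = 2 / math.sqrt(len(noise))
inside = sum(1 for k in range(1, 21) if abs(autocorrelation(noise, k)) <= bound)

print("alternating:", round(alt_r1, 2), round(alt_r2, 2))
print("trend r_1:", round(trend_r1, 2))
print("random: lags 1-20 inside the bound:", inside, "of 20")
```

For the random series, roughly 19 of the 20 values would be expected inside the bound, although the exact count depends on the particular sample.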

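The term correlogram is also used for a plot of the correlation matrix of several variables. In that sense, there are two main types of correlograms, depending on the type of correlation being analyzed: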

  • Pearson Correlation: This is the most common type and measures linear correlations between continuous variables.
  • Spearman Rank Correlation: This is a non-parametric measure suitable for ordinal or continuous data and assesses monotonic relationships (not necessarily linear).

In summary, a correlogram is a valuable tool for exploratory data analysis. It helps us:

  • Understand the relationships between multiple variables in your data.
  • Identify potential issues with multicollinearity before building statistical models.
  • Gain insights into the underlying structure of your data.
