Sampling Distribution of Differences

Ever wondered how statisticians compare two groups, such as test scores, sales performance, or medical treatments? The key lies in the sampling distribution of differences between means, a fundamental concept for hypothesis testing, confidence intervals, and A/B testing. This post explains what it is, why it matters, and how to apply it, with a worked example for students, data scientists, and analysts.

Sampling Distribution of Differences Between Means

The sampling distribution of differences between means is the probability distribution of the difference between two sample means (e.g., $\text{Mean}_A - \text{Mean}_B$) that you would obtain by repeatedly drawing samples from two populations.

Suppose there are two populations of sizes $N_1$ and $N_2$, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$. We draw all possible samples of size $n_1$ from the first population and of size $n_2$ from the second population, either with or without replacement.

Let $\overline{x}_1$ denote the mean of a sample from the first population and $\overline{x}_2$ the mean of a sample from the second population. We then compute all possible differences between these sample means, denoted by
$$d = \overline{x}_1 - \overline{x}_2$$

Tabulating these differences with their frequencies gives the frequency distribution of the differences; the corresponding probability distribution is called the sampling distribution of differences between means.
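
The "repeated sampling" idea can be approximated by simulation. The following is a minimal sketch, not part of the original article: the two populations, the function name `simulate_differences`, and the number of repetitions are all illustrative assumptions. It draws many pairs of samples with replacement and records $d = \overline{x}_1 - \overline{x}_2$ each time.

```python
import random
from statistics import mean

def simulate_differences(pop1, pop2, n1, n2, reps=20_000, seed=0):
    """Draw `reps` pairs of samples (with replacement) and return
    the list of differences between the two sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        sample1 = rng.choices(pop1, k=n1)  # sampling with replacement
        sample2 = rng.choices(pop2, k=n2)
        diffs.append(mean(sample1) - mean(sample2))
    return diffs

# Hypothetical populations, chosen only for illustration
pop1 = [2, 4, 6, 8]
pop2 = [1, 3, 5]

diffs = simulate_differences(pop1, pop2, n1=2, n2=2)
print("simulated mean of d:", round(mean(diffs), 2))
print("mu1 - mu2          :", mean(pop1) - mean(pop2))
```

With enough repetitions, the average of the simulated differences should settle near $\mu_1 - \mu_2$. A simulation only approximates the sampling distribution; enumerating all possible samples gives it exactly.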

Notations for Sampling Distribution of Differences between Means

| Notation | Short Description |
|---|---|
| $\mu_1$ | Mean of the first population |
| $\mu_2$ | Mean of the second population |
| $\sigma_1^2$ | Variance of the first population |
| $\sigma_2^2$ | Variance of the second population |
| $\sigma_1$ | Standard deviation of the first population |
| $\sigma_2$ | Standard deviation of the second population |
| $\mu_{\overline{x}_1 - \overline{x}_2}$ | Mean of the sampling distribution of the difference between means |
| $\sigma^2_{\overline{x}_1 - \overline{x}_2}$ | Variance of the sampling distribution of the difference between means |
| $\sigma_{\overline{x}_1 - \overline{x}_2}$ | Standard deviation of the sampling distribution of the difference between means |

Some Formulas for Sampling with/without Replacement

| Sr. No. | Sampling with Replacement | Sampling without Replacement |
|---|---|---|
| 1. | $\mu_{\overline{x}_1 -\overline{x}_2} = \mu_1-\mu_2$ | $\mu_{\overline{x}_1 -\overline{x}_2} = \mu_1-\mu_2$ |
| 2. | $\sigma^2_{\overline{x}_1 -\overline{x}_2}=\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ | $\sigma^2_{\overline{x}_1 -\overline{x}_2}=\frac{\sigma_1^2}{n_1}\left(\frac{N_1-n_1}{N_1-1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2-n_2}{N_2-1}\right)$ |
| 3. | $\sigma_{\overline{x}_1 -\overline{x}_2}=\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ | $\sigma_{\overline{x}_1 -\overline{x}_2}=\sqrt{\frac{\sigma_1^2}{n_1}\left(\frac{N_1-n_1}{N_1-1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2-n_2}{N_2-1}\right)}$ |
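
These formulas can be wrapped in a small helper. The sketch below is an illustration rather than a standard library routine: the function name `diff_mean_and_variance` and its parameters are hypothetical. For sampling without replacement it applies the finite population correction $(N_i - n_i)/(N_i - 1)$ to each population's term, and it uses exact fractions to avoid rounding.

```python
from fractions import Fraction

def diff_mean_and_variance(mu1, var1, n1, mu2, var2, n2, N1=None, N2=None):
    """Mean and variance of the sampling distribution of x1_bar - x2_bar.

    Leave N1 and N2 as None for sampling with replacement. Pass both
    population sizes to apply the finite population correction used
    for sampling without replacement.
    """
    mean_diff = mu1 - mu2
    term1 = Fraction(var1) / n1
    term2 = Fraction(var2) / n2
    if N1 is not None and N2 is not None:
        term1 *= Fraction(N1 - n1, N1 - 1)  # correction for population 1
        term2 *= Fraction(N2 - n2, N2 - 1)  # correction for population 2
    return mean_diff, term1 + term2

# Populations {5, 7, 9} and {4, 6, 8}: means 7 and 6, variances 8/3 each
with_repl = diff_mean_and_variance(7, Fraction(8, 3), 2, 6, Fraction(8, 3), 2)
without_repl = diff_mean_and_variance(7, Fraction(8, 3), 2, 6, Fraction(8, 3), 2,
                                      N1=3, N2=3)
print("with replacement   :", with_repl)
print("without replacement:", without_repl)
```

The standard deviation in row 3 is simply the square root of the returned variance, e.g. `float(variance) ** 0.5`.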

Example

Let $\overline{x}_1$ represent the mean of a sample of size $n_1=2$ selected at random with replacement from a finite population consisting of the values 5, 7, and 9. Similarly, let $\overline{x}_2$ represent the mean of a sample of size $n_2=2$ selected at random with replacement from another finite population consisting of the values 4, 6, and 8. Form the sampling distribution of the random variable $\overline{x}_1 - \overline{x}_2$ and verify that

  • $\mu_{\overline{x}_1 - \overline{x}_2} = \mu_1 - \mu_2$
  • $\sigma^2_{\overline{x}_1 - \overline{x}_2} = \frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}$

Solution

| | Population I | Population II |
|---|---|---|
| Values | 5, 7, 9 | 4, 6, 8 |
| Population size | $N_1=3$ | $N_2=3$ |
| Sample size | $n_1=2$ | $n_2=2$ |
| Possible samples with replacement | $N_1^{n_1}=3^2=9$ | $N_2^{n_2}=3^2=9$ |

All Possible Samples

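The nine possible samples from Population I have means 5, 6, 6, 7, 7, 7, 8, 8, 9, and the nine possible samples from Population II have means 4, 5, 5, 6, 6, 6, 7, 7, 8. All $9 \times 9 = 81$ possible differences between the sample means of the two populations, $d=\overline{x}_1 - \overline{x}_2$, are listed below, with one row for each value of $\overline{x}_1$ and one column for each value of $\overline{x}_2$.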

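| $d=\overline{x}_1 - \overline{x}_2$ | 4 | 5 | 5 | 6 | 6 | 6 | 7 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 5 - 4 = 1 | 0 | 0 | -1 | -1 | -1 | -2 | -2 | -3 |
| 6 | 2 | 1 | 1 | 0 | 0 | 0 | -1 | -1 | -2 |
| 6 | 2 | 1 | 1 | 0 | 0 | 0 | -1 | -1 | -2 |
| 7 | 3 | 2 | 2 | 1 | 1 | 1 | 0 | 0 | -1 |
| 7 | 3 | 2 | 2 | 1 | 1 | 1 | 0 | 0 | -1 |
| 7 | 3 | 2 | 2 | 1 | 1 | 1 | 0 | 0 | -1 |
| 8 | 4 | 3 | 3 | 2 | 2 | 2 | 1 | 1 | 0 |
| 8 | 4 | 3 | 3 | 2 | 2 | 2 | 1 | 1 | 0 |
| 9 | 5 | 4 | 4 | 3 | 3 | 3 | 2 | 2 | 1 |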
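
The table of differences can be reproduced programmatically. This is a minimal sketch, assuming ordered samples drawn with replacement as in the example; the variable names are illustrative. `itertools.product` enumerates every ordered sample, and the nested list mirrors the row and column layout of the table.

```python
from fractions import Fraction
from itertools import product

pop1 = [5, 7, 9]
pop2 = [4, 6, 8]
n1 = n2 = 2

# product(pop, repeat=n) lists every ordered sample drawn with replacement
means1 = sorted(Fraction(sum(s), n1) for s in product(pop1, repeat=n1))
means2 = sorted(Fraction(sum(s), n2) for s in product(pop2, repeat=n2))

# One row per sample mean from Population I,
# one column per sample mean from Population II
table = [[m1 - m2 for m2 in means2] for m1 in means1]

print("x2 means:", *means2)
for m1, row in zip(means1, table):
    print(f"x1 = {m1}:", *row)
```

Using `Fraction` keeps every mean exact, which matters for populations whose sample means are not whole numbers.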

The Sampling Distribution of Differences Between Means

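| $d=\overline{x}_1 - \overline{x}_2$ | $f$ | $P(d)$ | $d\cdot P(d)$ | $d^2$ | $d^2 \cdot P(d)$ |
|---|---|---|---|---|---|
| -3 | 1 | 1/81 | $-3 \times 1/81 = -3/81$ | 9 | 9/81 |
| -2 | 4 | 4/81 | -8/81 | 4 | 16/81 |
| -1 | 10 | 10/81 | -10/81 | 1 | 10/81 |
| 0 | 16 | 16/81 | 0/81 | 0 | 0/81 |
| 1 | 19 | 19/81 | 19/81 | 1 | 19/81 |
| 2 | 16 | 16/81 | 32/81 | 4 | 64/81 |
| 3 | 10 | 10/81 | 30/81 | 9 | 90/81 |
| 4 | 4 | 4/81 | 16/81 | 16 | 64/81 |
| 5 | 1 | 1/81 | 5/81 | 25 | 25/81 |
| Total | 81 | 81/81 = 1 | 81/81 = 1 | | 297/81 = 3.67 |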

\begin{align*}
\mu_{\overline{x}_1 - \overline{x}_2} &= E(d) = \Sigma(d\cdot P(d)) = \frac{81}{81}=1\\
\sigma^2_{\overline{x}_1 - \overline{x}_2} &= E(d^2) - [E(d)]^2\\
&=\Sigma d^2 P(d) - \left[\Sigma (d\cdot P(d))\right]^2\\
&= 3.67 - 1^2 = 2.67
\end{align*}
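
The frequency table and the two moments can also be computed directly. The following sketch assumes sampling with replacement, as in the example; the variable names are illustrative. It counts each difference with `collections.Counter` and evaluates $E(d)$ and $E(d^2) - [E(d)]^2$ in exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

pop1, pop2 = [5, 7, 9], [4, 6, 8]
n1 = n2 = 2

# Means of all ordered samples drawn with replacement
means1 = [Fraction(sum(s), n1) for s in product(pop1, repeat=n1)]
means2 = [Fraction(sum(s), n2) for s in product(pop2, repeat=n2)]

# Frequency of each difference d = x1_bar - x2_bar over all 81 pairs
freq = Counter(m1 - m2 for m1 in means1 for m2 in means2)
total = sum(freq.values())
prob = {d: Fraction(f, total) for d, f in freq.items()}

mean_d = sum(d * p for d, p in prob.items())                # E(d)
var_d = sum(d**2 * p for d, p in prob.items()) - mean_d**2  # E(d^2) - [E(d)]^2

for d in sorted(freq):
    print(d, freq[d], prob[d])
print("mean =", mean_d, "variance =", var_d)
```

The exact variance is the fraction $8/3$, which rounds to 2.67.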

The population means and variances are $\mu_1 = 7$, $\mu_2 = 6$, and $\sigma_1^2 = \sigma_2^2 = \frac{8}{3} \approx 2.67$.

Verification

  • $\mu_{\overline{x}_1 - \overline{x}_2} = \mu_1 - \mu_2 = 7 - 6 = 1$, which matches $E(d) = 1$.
  • $\sigma_{\overline{x}_1 - \overline{x}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \frac{2.67}{2} + \frac{2.67}{2} = 2.67$, which matches the variance obtained from the sampling distribution.
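
Working in exact fractions shows that the two sides of the variance identity agree without any rounding. The population variances follow from the squared deviations about each population mean:

\begin{align*}
\sigma_1^2 &= \frac{(5-7)^2+(7-7)^2+(9-7)^2}{3} = \frac{8}{3}, \qquad
\sigma_2^2 = \frac{(4-6)^2+(6-6)^2+(8-6)^2}{3} = \frac{8}{3}\\
\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2} &= \frac{8/3}{2}+\frac{8/3}{2} = \frac{8}{3}\\
E(d^2)-[E(d)]^2 &= \frac{297}{81} - 1^2 = \frac{216}{81} = \frac{8}{3}
\end{align*}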

