Understand the sampling distribution of differences between means—what it is, why it matters, and how to apply it in hypothesis testing (with examples). Perfect for students, data scientists, and analysts! Ever wondered how statisticians compare two groups (e.g., test scores, sales performance, or medical treatments)? The key lies in the sampling distribution of differences between means—a fundamental concept for hypothesis testing, confidence intervals, and A/B testing.
Sampling Distribution of Differences Between Means
The sampling distribution of differences between means is the probability distribution of the difference between two sample means (e.g., $\text{Mean}_A - \text{Mean}_B$) that would result from repeatedly drawing samples from two populations.
Suppose there are two populations of sizes $N_1$ and $N_2$, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$. We draw all possible samples of size $n_1$ from the first population and of size $n_2$ from the second population, either with or without replacement.
Let $\overline{x}_1$ denote the mean of a sample from the first population and $\overline{x}_2$ the mean of a sample from the second population. We then compute all possible differences between these sample means, denoted by
$$d = \overline{x}_1 - \overline{x}_2$$
Tabulating how often each value of $d$ occurs gives the frequency distribution of the differences; the corresponding probability distribution of $d$ is the sampling distribution of differences between means.
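The idea of repeatedly sampling from two populations can be approximated by simulation. The sketch below is only an illustration: the populations, the sample sizes, the number of repetitions, and the function name `simulate_differences` are all assumptions made for this example, not part of the original text.

```python
import random
import statistics

def simulate_differences(pop1, pop2, n1, n2, reps=20_000, seed=0):
    """Draw `reps` pairs of samples (with replacement) and return the
    differences between their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        mean1 = statistics.fmean(rng.choices(pop1, k=n1))
        mean2 = statistics.fmean(rng.choices(pop2, k=n2))
        diffs.append(mean1 - mean2)
    return diffs

# Hypothetical populations, chosen only for illustration.
pop1 = [10, 12, 14, 16, 18]
pop2 = [9, 11, 13]

diffs = simulate_differences(pop1, pop2, n1=4, n2=3)
print("simulated mean of d:", round(statistics.fmean(diffs), 3))
print("mu_1 - mu_2:        ", statistics.fmean(pop1) - statistics.fmean(pop2))
```

With enough repetitions, the simulated mean of $d$ should settle close to $\mu_1 - \mu_2$, which is 3 for these made-up populations.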
Notations for Sampling Distribution of Differences between Means
Notation | Short Description |
---|---|
$\mu_1$ | Mean of the first population |
$\mu_2$ | Mean of the second population |
$\sigma_1^2$ | Variance of the first population |
$\sigma_2^2$ | Variance of the second population |
$\sigma_1$ | Standard deviation of the first population |
$\sigma_2$ | Standard deviation of the second population |
$\mu_{\overline{x}_1 - \overline{x}_2}$ | Mean of the sampling distribution of differences between means |
$\sigma^2_{\overline{x}_1 - \overline{x}_2}$ | Variance of the sampling distribution of differences between means |
$\sigma_{\overline{x}_1 - \overline{x}_2}$ | Standard deviation of the sampling distribution of differences between means |
Some Formulas for Sampling with/without Replacement
Sr. No. | Sampling with Replacement | Sampling without Replacement |
---|---|---|
1. | $\mu_{\overline{x}_1 -\overline{x}_2} = \mu_1-\mu_2$ | $\mu_{\overline{x}_1 -\overline{x}_2} = \mu_1-\mu_2$ |
2. | $\sigma^2_{\overline{x}_1 -\overline{x}_2}=\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ | $\sigma^2_{\overline{x}_1 -\overline{x}_2}=\frac{\sigma_1^2}{n_1}\left(\frac{N_1-n_1}{N_1-1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2-n_2}{N_2-1}\right)$ |
3. | $\sigma_{\overline{x}_1 -\overline{x}_2}=\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ | $\sigma_{\overline{x}_1 -\overline{x}_2}=\sqrt{\frac{\sigma_1^2}{n_1}\left(\frac{N_1-n_1}{N_1-1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2-n_2}{N_2-1}\right)}$ |
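The variance formulas in the table can be wrapped in a small helper. This is a minimal sketch, assuming independent samples; the function name `variance_of_difference` and the input values are illustrative choices, not part of the original text.

```python
import math

def variance_of_difference(var1, n1, var2, n2, N1=None, N2=None):
    """Variance of (xbar1 - xbar2) for independent samples.

    With N1 and N2 left as None, sampling is treated as with replacement.
    If both population sizes are given, each term is multiplied by the
    finite population correction (N - n) / (N - 1).
    """
    term1 = var1 / n1
    term2 = var2 / n2
    if N1 is not None and N2 is not None:
        term1 *= (N1 - n1) / (N1 - 1)
        term2 *= (N2 - n2) / (N2 - 1)
    return term1 + term2

# Illustrative inputs: both population variances equal 8/3, samples of size 2.
with_repl = variance_of_difference(8 / 3, 2, 8 / 3, 2)
without_repl = variance_of_difference(8 / 3, 2, 8 / 3, 2, N1=3, N2=3)

print("with replacement:   ", round(with_repl, 4), "sd:", round(math.sqrt(with_repl), 4))
print("without replacement:", round(without_repl, 4), "sd:", round(math.sqrt(without_repl), 4))
```

The standard deviation in row 3 of the table is simply the square root of the returned variance.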
Example
Let $\overline{x}_1$ represent the mean of a sample of size $n_1=2$ selected at random with replacement from a finite population consisting of the values 5, 7, and 9. Similarly, let $\overline{x}_2$ represent the mean of a sample of size $n_2=2$ selected at random with replacement from another finite population consisting of the values 4, 6, and 8. Form the sampling distribution of the random variable $\overline{x}_1 - \overline{x}_2$ and verify that
- $\mu_{\overline{x}_1 - \overline{x}_2} = \mu_1 - \mu_2$
- $\sigma^2_{\overline{x}_1 - \overline{x}_2} = \frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}$
Solution
Population I | Population II |
---|---|
5, 7, 9 $N_1=3$ $n_1=2$ | 4, 6, 8 $N_2=3$ $n_2=2$ |
Possible samples with Replacement are $N_1^{n_1}=3^2 =9$ | Possible samples with Replacement are $N_2^{n_2} = 3^2 = 9$ |
All Possible Samples
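One way to list every possible sample with replacement, together with its mean, is `itertools.product`. The following is a sketch using the two populations from the example; the variable names are illustrative.

```python
from itertools import product

pop1 = [5, 7, 9]
pop2 = [4, 6, 8]
n1 = n2 = 2

# Every ordered sample of size 2 drawn with replacement: 3^2 = 9 per population.
samples1 = list(product(pop1, repeat=n1))
samples2 = list(product(pop2, repeat=n2))

# Sample means, sorted for easier reading.
means1 = sorted(sum(s) / n1 for s in samples1)
means2 = sorted(sum(s) / n2 for s in samples2)

print(len(samples1), len(samples2))  # 9 9
print(means1)  # [5.0, 6.0, 6.0, 7.0, 7.0, 7.0, 8.0, 8.0, 9.0]
print(means2)  # [4.0, 5.0, 5.0, 6.0, 6.0, 6.0, 7.0, 7.0, 8.0]
```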
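All possible differences between the sample means of the two populations, $d=\overline{x}_1 - \overline{x}_2$, are shown in the following table. Rows correspond to values of $\overline{x}_1$ and columns to values of $\overline{x}_2$.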
$d=\overline{x}_1 - \overline{x}_2$ | 4 | 5 | 5 | 6 | 6 | 6 | 7 | 7 | 8 |
---|---|---|---|---|---|---|---|---|---|
5 | 5 - 4 = 1 | 0 | 0 | -1 | -1 | -1 | -2 | -2 | -3 |
6 | 2 | 1 | 1 | 0 | 0 | 0 | -1 | -1 | -2 |
6 | 2 | 1 | 1 | 0 | 0 | 0 | -1 | -1 | -2 |
7 | 3 | 2 | 2 | 1 | 1 | 1 | 0 | 0 | -1 |
7 | 3 | 2 | 2 | 1 | 1 | 1 | 0 | 0 | -1 |
7 | 3 | 2 | 2 | 1 | 1 | 1 | 0 | 0 | -1 |
8 | 4 | 3 | 3 | 2 | 2 | 2 | 1 | 1 | 0 |
8 | 4 | 3 | 3 | 2 | 2 | 2 | 1 | 1 | 0 |
9 | 5 | 4 | 4 | 3 | 3 | 3 | 2 | 2 | 1 |
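The 81 differences can also be tallied programmatically. This sketch pairs every value of $\overline{x}_1$ with every value of $\overline{x}_2$ and counts how often each difference occurs; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Sample means of all 9 samples from each population.
means1 = [5, 6, 6, 7, 7, 7, 8, 8, 9]
means2 = [4, 5, 5, 6, 6, 6, 7, 7, 8]

# Frequency of each difference d = xbar1 - xbar2 over all 9 * 9 = 81 pairs.
freq = Counter(m1 - m2 for m1, m2 in product(means1, means2))

for d in sorted(freq):
    print(f"d = {d:2d}  f = {freq[d]:2d}")
```

The counts form the $f$ column of the sampling distribution.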
The Sampling Distribution of Differences Between Means
$d=\overline{x}_1 - \overline{x}_2$ | $f$ | $P(d)$ | $d\cdot P(d)$ | $d^2$ | $d^2 \cdot P(d)$ |
---|---|---|---|---|---|
-3 | 1 | 1/81 | $-3 \times 1/81 = -3/81$ | 9 | 9/81 |
-2 | 4 | 4/81 | -8/81 | 4 | 16/81 |
-1 | 10 | 10/81 | -10/81 | 1 | 10/81 |
0 | 16 | 16/81 | 0/81 | 0 | 0/81 |
1 | 19 | 19/81 | 19/81 | 1 | 19/81 |
2 | 16 | 16/81 | 32/81 | 4 | 64/81 |
3 | 10 | 10/81 | 30/81 | 9 | 90/81 |
4 | 4 | 4/81 | 16/81 | 16 | 64/81 |
5 | 1 | 1/81 | 5/81 | 25 | 25/81 |
Total | 81 | 81/81 = 1 | 81/81 = 1 | | 297/81 = 3.67 |
\begin{align*}
\mu_{\overline{x}_1 - \overline{x}_2} &= E(d) = \Sigma(d\cdot P(d)) = \frac{81}{81}=1\\
\sigma^2_{\overline{x}_1 - \overline{x}_2} &= E(d^2) - [E(d)]^2\\
&=\Sigma d^2 P(d) - \left[\Sigma (d\cdot P(d))\right]^2\\
&= 3.67 - 1^2 = 2.67
\end{align*}
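The same arithmetic can be checked with exact fractions, which avoids rounding 297/81 to 3.67. This is a sketch that starts from the frequencies of $d$; the variable names are illustrative.

```python
from fractions import Fraction

# Frequency of each difference d, out of 81 equally likely pairs of samples.
freq = {-3: 1, -2: 4, -1: 10, 0: 16, 1: 19, 2: 16, 3: 10, 4: 4, 5: 1}
total = sum(freq.values())

prob = {d: Fraction(f, total) for d, f in freq.items()}

mean_d = sum(d * p for d, p in prob.items())       # E(d)
mean_d2 = sum(d**2 * p for d, p in prob.items())   # E(d^2)
var_d = mean_d2 - mean_d**2                        # E(d^2) - [E(d)]^2

print(mean_d, mean_d2, var_d)  # 1 11/3 8/3
```

Here 11/3 is 297/81 in lowest terms, and 8/3 is approximately 2.67.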
Verification
- $\mu_{\overline{x}_1 - \overline{x}_2} = \mu_1 - \mu_2 = 7 - 6 = 1$
- $\sigma_{\overline{x}_1 - \overline{x}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \frac{8/3}{2} + \frac{8/3}{2} = \frac{8}{3} \approx 2.67$, since each population has variance $\sigma^2 = \frac{(-2)^2 + 0^2 + 2^2}{3} = \frac{8}{3}$
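The population side of the verification can be sketched in the same way. The helper names `pop_mean` and `pop_variance` are illustrative, and the variance uses the population form (dividing by $N$, not $N-1$), as the example does.

```python
from fractions import Fraction

def pop_mean(values):
    """Population mean as an exact fraction."""
    return Fraction(sum(values), len(values))

def pop_variance(values):
    """Population variance (divides by N, not N - 1)."""
    mu = pop_mean(values)
    return sum((Fraction(v) - mu) ** 2 for v in values) / len(values)

pop1, pop2 = [5, 7, 9], [4, 6, 8]
n1 = n2 = 2

mean_diff = pop_mean(pop1) - pop_mean(pop2)
var_diff = pop_variance(pop1) / n1 + pop_variance(pop2) / n2

print(mean_diff, var_diff)  # 1 8/3
```

Both values agree with the mean and variance computed from the sampling distribution of $d$.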