In 1908, William Sealy Gosset published his work under the pseudonym “Student” to solve problems of inference based on samples drawn from a normally distributed population when the population standard deviation is unknown. He developed the Student t-test and the t-distribution, which can be used to compare two small sets of quantitative data collected independently of one another. In this case, the test is called the independent samples t-test or the unpaired samples t-test.

The Student t-test is one of the most commonly used statistical techniques for testing hypotheses about the difference between sample means. The test statistic can be computed from the means, standard deviations, and numbers of data points of the two samples, using the following formula:

\[t=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}\]

where $s_p^2$ is the pooled (combined) variance and can be computed as

\[s_p^2=\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\]
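The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: the function names `pooled_variance` and `pooled_t` and the summary values in the usage line are assumptions made up for this sketch, and the pooled variance weights each sample variance by its own degrees of freedom, $n_1-1$ and $n_2-1$.

```python
import math


def pooled_variance(s1, n1, s2, n2):
    """Pooled variance: each sample variance weighted by its degrees of freedom."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Independent samples t-statistic computed from summary statistics only."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    return (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


# Illustrative (made-up) summary statistics for two samples.
t_value = pooled_t(mean1=20, s1=4, n1=10, mean2=17, s2=5, n2=12)
print(f"t = {t_value:.4f} with {10 + 12 - 2} degrees of freedom")
```

Only the means, standard deviations, and sample sizes are needed, so the raw observations do not have to be available.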

Using this test statistic, we test the null hypothesis $H_0:\mu_1=\mu_2$, which states that the two samples come from populations with the same mean (the pooled test also assumes that the two populations have equal variances), at a given “level of significance” or “level of risk”.

For a two-sided alternative, if the absolute value of the t-statistic computed from the above formula is greater than the critical value (the value from the t-table with $n_1+n_2-2$ degrees of freedom at the given level of significance, say $\alpha=0.05$), the null hypothesis is rejected; otherwise, the null hypothesis is not rejected.
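The critical value is normally read from a t-table. As a rough cross-check, it can also be approximated numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule and finds the two-tailed critical value by bisection. The helper names (`t_pdf`, `t_cdf`, `t_critical`) are assumptions of this sketch, and a statistics library would be the more usual choice in practice.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha, df):
    """Two-tailed critical value: the point with upper-tail area alpha / 2."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


crit = t_critical(alpha=0.05, df=12)
print(f"two-tailed critical value: {crit:.3f}")
```

The null hypothesis is then rejected when the absolute value of the computed t-statistic is at least `crit`.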

Note that the t-distribution is a family of bell-shaped curves indexed by the degrees of freedom (the number of independent observations in the sample minus the number of estimated parameters). With few degrees of freedom the curve has heavier tails than the normal distribution; as the sample size increases, the t-distribution approaches the standard normal distribution.
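The convergence toward the normal curve can be checked numerically. The following sketch compares the t density with the standard normal density on a grid from -4 to 4 and reports the largest gap for a few degrees of freedom. The grid, the chosen degrees of freedom, and the helper names are assumptions made for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def max_gap(df, lo=-4.0, hi=4.0, points=161):
    """Largest absolute difference between the two densities on an even grid."""
    step = (hi - lo) / (points - 1)
    grid = (lo + i * step for i in range(points))
    return max(abs(t_pdf(x, df) - normal_pdf(x)) for x in grid)


gaps = {df: max_gap(df) for df in (1, 5, 30, 1000)}
for df, gap in gaps.items():
    print(f"df = {df:>4}: largest density gap = {gap:.5f}")
```

The gap shrinks as the degrees of freedom grow, which is why normal-table values are often used as an approximation for large samples.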

### Student t-test Example

The production manager wants to compare the number of defective products produced on the day shift with the number produced on the afternoon shift. A sample of the production from 6 day shifts and 8 afternoon shifts revealed the following numbers of defects. At the 0.05 significance level, is there a significant difference in the mean number of defects per shift?

| Shift | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Day shift | 5 | 8 | 7 | 6 | 9 | 7 | | |
| Afternoon shift | 8 | 10 | 7 | 11 | 9 | 12 | 14 | 9 |

Some required calculations for the Student t-test are:

The means of the samples:

$\overline{X}_1=7$, $\overline{X}_2=10$

Standard deviations of the samples and the pooled variance:

$s_1=1.4142$, $s_2=2.2678$ and $s_p^2=\frac{(6-1) (1.4142)^2+(8-1)(2.2678)^2}{6+8-2}=3.8333$
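These summary figures can be reproduced from the raw defect counts with Python's standard `statistics` module. This is a small sketch for checking the arithmetic; the variable names are chosen here for illustration.

```python
import statistics

# Defect counts from the example table.
day = [5, 8, 7, 6, 9, 7]
afternoon = [8, 10, 7, 11, 9, 12, 14, 9]

n1, n2 = len(day), len(afternoon)
mean1, mean2 = statistics.mean(day), statistics.mean(afternoon)

# statistics.stdev is the sample standard deviation (divisor n - 1).
s1, s2 = statistics.stdev(day), statistics.stdev(afternoon)

# Pooled variance: sample variances weighted by their degrees of freedom.
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)

print(f"means: {mean1}, {mean2}")
print(f"standard deviations: {s1:.4f}, {s2:.4f}")
print(f"pooled variance: {sp2:.4f}")
```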

Step 1: The null and alternative hypotheses are: $H_0:\mu_1=\mu_2$ vs $H_1:\mu_1 \ne \mu_2$

Step 2: Level of significance: $\alpha=0.05$

Step 3: Test statistic

$\begin{aligned}
t&=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}\\
&=\frac{7-10}{\sqrt{3.8333(\frac{1}{6}+\frac{1}{8})}}=-2.837
\end{aligned}$

Step 4: Critical value or rejection region: reject $H_0$ if the absolute value of the t-statistic calculated in Step 3 is greater than or equal to the tabulated value, i.e. $|t_{calculated}|\ge t_{tabulated}$. In this example, the tabulated value is 2.179 (critical values $\pm 2.179$) with 12 degrees of freedom at a significance level of 5% for a two-tailed test.

Step 5: Conclusion: Since $|t_{calculated}| = 2.837 > 2.179$, the null hypothesis is rejected at the 5% level of significance. The sample provides evidence that the mean number of defects is not the same on the two shifts.
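The five steps can be combined into one short Python function. This is a sketch under stated assumptions: the function name `independent_t_test` is hypothetical, the critical value 2.179 is taken from the t-table as in Step 4 rather than computed, and equal population variances are assumed so that the pooled formula applies.

```python
import math
import statistics


def independent_t_test(sample1, sample2, t_critical):
    """Pooled two-sample t-test; returns (t-statistic, degrees of freedom, reject H0?)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance from the two sample variances (divisor n - 1 each).
    sp2 = ((n1 - 1) * statistics.variance(sample1)
           + (n2 - 1) * statistics.variance(sample2)) / df
    t_stat = ((statistics.mean(sample1) - statistics.mean(sample2))
              / math.sqrt(sp2 * (1 / n1 + 1 / n2)))
    # Two-tailed rule: reject when |t| reaches the tabulated value.
    return t_stat, df, abs(t_stat) >= t_critical


day = [5, 8, 7, 6, 9, 7]
afternoon = [8, 10, 7, 11, 9, 12, 14, 9]

t_stat, df, reject = independent_t_test(day, afternoon, t_critical=2.179)
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject}")
```

If SciPy is available, `scipy.stats.ttest_ind(day, afternoon, equal_var=True)` should give the same t-statistic along with a p-value, which is a convenient cross-check.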
