## Student t test


William Sealy Gosset published his work in 1908 under the pseudonym “Student”. It addressed problems of inference based on sample(s) drawn from a normally distributed population when the population standard deviation is unknown. He developed the t-test and the t-distribution, which can be used to compare two small sets of quantitative data collected independently of one another. In that case the test is called the independent samples t-test, or the unpaired samples t-test.

Student’s t-test is one of the most commonly used statistical techniques for testing hypotheses about the difference between sample means. The test statistic can be computed from just the means, standard deviations, and number of data points in both samples, using the following formula:

\[t=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}\]

where $s_p^2$ is the pooled (combined) variance and can be computed as

\[s_p^2=\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\]
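As a rough illustration of how the two formulas fit together, the following minimal sketch computes the pooled variance and the t-statistic from summary statistics alone. The function name `pooled_t_statistic` and the input numbers are made up for illustration. Each sample variance is weighted by its own degrees of freedom, $n_i-1$.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, pooled_variance) for two independent samples.

    Assumes both populations share a common variance, which is the
    assumption behind the pooled-variance form of the test.
    """
    # Weight each sample variance by its degrees of freedom (n - 1).
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means.
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, pooled_var


# Illustrative (hypothetical) summary statistics, not taken from real data.
t_stat, pooled_var = pooled_t_statistic(10.0, 2.0, 5, 12.0, 2.0, 5)
print(f"pooled variance = {pooled_var:.4f}, t = {t_stat:.4f}")
```

With equal standard deviations in both samples, the pooled variance is simply that common variance, which makes a quick sanity check.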

Using this test statistic, we test the null hypothesis $H_0:\mu_1=\mu_2$, which states that both samples come from populations with the same mean, at the given level of significance (level of risk).

If the absolute value of the t-statistic computed from the above formula is greater than the critical value (the value from the t-table with $n_1+n_2-2$ degrees of freedom at the given level of significance, say $\alpha=0.05$), the null hypothesis is rejected; otherwise, we fail to reject the null hypothesis.
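The critical value is normally read from a t-table. As a hedged sketch of where such table entries come from, the code below approximates the two-sided critical value with the standard library only, by integrating the t density numerically (Simpson's rule) and then bisecting for the quantile. All function names are hypothetical, and the choice of 12 degrees of freedom and the two t-values passed to `reject_null` are purely illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) using Simpson's rule on [0, |x|] and symmetry."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical_two_sided(alpha, df):
    """Find c such that P(|T| > c) = alpha, by bisection on the CDF."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains c
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reject_null(t_stat, alpha, df):
    """Two-sided decision rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_critical_two_sided(alpha, df)


crit = t_critical_two_sided(0.05, 12)
print(f"critical value (alpha=0.05, df=12): {crit:.3f}")
print(reject_null(-2.5, 0.05, 12), reject_null(1.0, 0.05, 12))
```

For $\alpha=0.05$ and 12 degrees of freedom the result should land close to 2.179, the value printed in standard t-tables. In practice a statistics library or a printed table is the more usual route; the numerical integration here is only meant to make the rule concrete.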

Note that the t-distribution is a family of curves that depends on the degrees of freedom (the number of independent observations in the sample minus the number of parameters estimated). The t-distribution is bell-shaped with heavier tails than the normal distribution, and as the sample size increases it approaches the normal distribution.
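One way to see this convergence numerically is to measure the largest gap between the t density and the standard normal density for a few degrees of freedom. The sketch below does that on a grid; the helper names, the grid, and the chosen degrees of freedom are illustrative assumptions.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def max_gap(df, limit=5.0, step=0.01):
    """Largest absolute difference between the two densities on [0, limit].

    Both densities are symmetric about zero, so the positive half suffices.
    """
    n = int(round(limit / step))
    return max(abs(t_pdf(i * step, df) - normal_pdf(i * step))
               for i in range(n + 1))


gaps = {df: max_gap(df) for df in (1, 5, 30, 100)}
for df, gap in gaps.items():
    print(f"df = {df:>3}: max density gap = {gap:.5f}")
```

The gap shrinks steadily as the degrees of freedom grow, which is why normal-table values are often used as an approximation for large samples.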

**Example:** The production manager wants to compare the number of defective products produced on the day shift with the number produced on the afternoon shift. A sample of the production from 6 day shifts and 8 afternoon shifts revealed the following numbers of defects. At the 0.05 significance level, is there a significant difference in the mean number of defects per shift?

| Sampled shift | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Day shift | 5 | 8 | 7 | 6 | 9 | 7 | | |
| Afternoon shift | 8 | 10 | 7 | 11 | 9 | 12 | 14 | 9 |

Some required calculations are:

Mean of samples:

$\overline{X}_1=7$, $\overline{X}_2=10$,

Standard Deviation of samples

$s_1=1.4142$, $s_2=2.2678$ and $s_p^2=\frac{(6-1) (1.4142)^2+(8-1)(2.2678)^2}{6+8-2}=3.8333$
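These summary figures can be checked directly from the raw data. The following short sketch uses Python's standard `statistics` module, whose `stdev` and `variance` functions use the sample ($n-1$) denominator; the variable names are illustrative.

```python
import statistics

day = [5, 8, 7, 6, 9, 7]
afternoon = [8, 10, 7, 11, 9, 12, 14, 9]

n1, n2 = len(day), len(afternoon)
mean_day = statistics.mean(day)
mean_afternoon = statistics.mean(afternoon)
sd_day = statistics.stdev(day)              # sample standard deviation
sd_afternoon = statistics.stdev(afternoon)

# Pooled variance: each sample variance weighted by its degrees of freedom.
pooled_var = ((n1 - 1) * statistics.variance(day)
              + (n2 - 1) * statistics.variance(afternoon)) / (n1 + n2 - 2)

print(f"means: {mean_day:.4f}, {mean_afternoon:.4f}")
print(f"standard deviations: {sd_day:.4f}, {sd_afternoon:.4f}")
print(f"pooled variance: {pooled_var:.4f}")
```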

Step 1: Null and alternative hypotheses are: $H_0:\mu_1=\mu_2$ vs $H_1:\mu_1 \ne \mu_2$

Step 2: Level of significance: $\alpha=0.05$

Step 3: Test statistic

\[\begin{aligned}
t&=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}\\
&=\frac{7-10}{\sqrt{3.8333(\frac{1}{6}+\frac{1}{8})}}=-2.837
\end{aligned}\]

Step 4: Critical value or rejection region: reject $H_0$ if the absolute value of the t-statistic calculated in Step 3 is greater than or equal to the tabulated value, i.e. $|t_{calculated}|\ge t_{tabulated}$. In this example the tabulated value is 2.179 (critical values $\pm 2.179$) with 12 degrees of freedom at the 5% significance level.

Step 5: Conclusion: As the computed value $|-2.837| = 2.837 > 2.179$, we reject $H_0$ and conclude that the mean number of defects is not the same on the two shifts.
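Steps 3 to 5 can be reproduced from the raw data with a few lines of code. This is a minimal sketch under the equal-variance assumption of the pooled test; the function name `pooled_t_test` is hypothetical, and the critical value 2.179 is taken from the t-table as in Step 4 rather than computed.

```python
import math
import statistics


def pooled_t_test(sample1, sample2, t_critical):
    """Two-sided pooled-variance t-test for two independent samples.

    t_critical is the tabulated value for n1 + n2 - 2 degrees of freedom
    at the chosen significance level; it is supplied by the caller.
    Returns (t statistic, degrees of freedom, reject H0?).
    """
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(sample1)
                  + (n2 - 1) * statistics.variance(sample2)) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (statistics.mean(sample1) - statistics.mean(sample2)) / std_error
    return t_stat, df, abs(t_stat) >= t_critical


day = [5, 8, 7, 6, 9, 7]
afternoon = [8, 10, 7, 11, 9, 12, 14, 9]

t_stat, df, reject = pooled_t_test(day, afternoon, t_critical=2.179)
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject}")
```

The sign of t only reflects which sample is listed first; the two-sided decision depends on its absolute value.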

**See a Mathematica demonstration**

**Student t Distribution**

*t*-Distribution from the Wolfram Demonstrations Project by Chris Boucher