# Basic Statistics and Data Analysis

### Category: Testing of Hypothesis

Testing of Hypothesis, Hypothesis testing, Independent t test, Independent z test, Analysis of variance, ANOVA, Comparison tests

# Student t test

William Sealy Gosset published his work in 1908 under the pseudonym “Student” to solve problems of inference based on samples drawn from a normally distributed population when the population standard deviation is unknown. He developed the t-test and the t-distribution, which can be used to compare two small sets of quantitative data collected independently of one another; in this case the t-test is called the independent samples t-test (also called the unpaired samples t-test).

Student’s t-test is one of the most commonly used statistical techniques for testing hypotheses on the basis of the difference between sample means. The t-test can be computed just by knowing the means, standard deviations, and number of data points in both samples, using the following formula:

$t=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}$

where $s_p^2$ is the pooled (combined) variance and can be computed as

$s_p^2=\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$

Using this test statistic, we test the null hypothesis $H_0:\mu_1=\mu_2$, which states that both samples came from the same population, at a given level of significance (level of risk).

If the computed t-statistic from the above formula is greater in absolute value than the critical value (the value from the t-table with $n_1+n_2-2$ degrees of freedom at the given level of significance, say $\alpha=0.05$), the null hypothesis is rejected; otherwise, the null hypothesis is not rejected.

Note that the t-distribution is a family of curves indexed by the degrees of freedom (the number of independent observations in the sample minus the number of estimated parameters). As the sample size increases, the t-distribution approaches the bell-shaped normal distribution.

Example: A production manager wants to compare the number of defective products produced on the day shift with the number on the afternoon shift. Samples of the production from 6 day shifts and 8 afternoon shifts revealed the following numbers of defects. The production manager wants to check, at the 0.05 significance level, whether there is a significant difference in the mean number of defects per shift.

| Shift           | Defects                    |
|-----------------|----------------------------|
| Day shift       | 5, 8, 7, 6, 9, 7           |
| Afternoon shift | 8, 10, 7, 11, 9, 12, 14, 9 |

Some required calculations are:

Mean of samples:

$\overline{X}_1=7$, $\overline{X}_2=10$,

Standard Deviation of samples

$s_1=1.4142$, $s_2=2.2678$ and $s_p^2=\frac{(6-1) (1.4142)^2+(8-1)(2.2678)^2}{6+8-2}=3.8333$

Step 1: Null and alternative hypothesis are: $H_0:\mu_1=\mu_2$ vs $H_1:\mu_1 \ne \mu_2$

Step 2: Level of significance: $\alpha=0.05$

Step 3: Test Statistics

\begin{aligned} t&=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}\\ &=\frac{7-10}{\sqrt{3.8333(\frac{1}{6}+\frac{1}{8})}}=-2.837 \end{aligned}

Step 4: Critical value or rejection region: reject $H_0$ if the absolute value of the t-statistic calculated in Step 3 is greater than or equal to the table value, i.e. $|t_{calculated}|\ge |t_{tabulated}|$. In this example the tabulated t value is 2.179, with 12 degrees of freedom at the 5% significance level.

Step 5: Conclusion: Since $|-2.837| = 2.837 > 2.179$, we reject $H_0$ and conclude that the mean number of defects is not the same on the two shifts.
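The worked example above can be reproduced with a short Python sketch. The function name and data layout are mine; the formulas are the pooled-variance t-test and pooled variance $s_p^2$ defined earlier in this section:

```python
from math import sqrt
from statistics import mean, stdev

def pooled_t(sample1, sample2):
    """Independent (unpaired) two-sample t statistic with pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean(sample1), mean(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)  # sample SDs (n - 1 denominator)
    # Pooled (combined) variance
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sp2, n1 + n2 - 2  # t statistic, pooled variance, df

day = [5, 8, 7, 6, 9, 7]
afternoon = [8, 10, 7, 11, 9, 12, 14, 9]
t, sp2, df = pooled_t(day, afternoon)
print(round(t, 3), round(sp2, 4), df)  # -2.837 3.8333 12
```

The printed values match the hand computation in Steps 3 and 4: $t=-2.837$, $s_p^2=3.8333$, and 12 degrees of freedom.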

See also Mathematica demonstrations of the Student’s t distribution.

# Effect Size: Introduction

An effect size is a measure of the strength of a phenomenon, conveying the estimated magnitude of a relationship without making any claim about the true underlying relationship. Effect size measures play an important role in meta-analysis and statistical power analysis. Reporting effect sizes in theses and research reports is therefore good practice, especially when presenting empirical results or findings, because an effect size measures the practical importance of a significant finding. Put simply, an effect size is a way of quantifying the size of the difference between two groups.

Effect size is usually computed after rejecting the null hypothesis in a statistical hypothesis-testing procedure; if the null hypothesis is not rejected, the effect size has little meaning.

There are different formulas for different statistical tests to measure the effect size. In general, effect size can be computed in two ways.

1. As the standardized difference between two means
2. As the effect size correlation (the correlation between the independent variable classification and the individual scores on the dependent variable).

## Effect size for dependent sample t test

The effect size for the paired sample t test (dependent sample t test), known as Cohen’s d, ranges from $-\infty$ to $\infty$ and measures, in standard deviation units, the degree to which the mean of the difference scores departs from zero. If d equals 0, the mean of the difference scores is zero; the farther d is from 0, the larger the effect size.

Effect size for dependent sample t test can be computed by using

$d=\frac{\overline{D}-\mu_D}{SD_D}$

Note that both the mean difference ($\overline{D}$) and its standard deviation are reported in SPSS output under “Paired Differences”.

Suppose the effect size is d = 2.56, which means that the sample mean difference and the population mean difference are 2.56 standard deviations apart. The sign has no effect on the size of an effect, i.e. -2.56 and 2.56 are equivalent effect sizes.

The d statistic can also be computed from the obtained t value and the number of paired observations, following Ray and Shadish (1996):

$d=\frac{t}{\sqrt{N}}$

The value of d is usually categorized as small, medium, or large. With Cohen’s d:

• d=0.2 to 0.5 small effect
• d=0.5 to 0.8, medium effect
• d= 0.8 and higher, large effect.
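The Ray and Shadish formula and the conventional cutoffs above can be sketched directly in Python. The function names and the illustrative t and N values are mine:

```python
from math import sqrt

def cohens_d_from_t(t, n_pairs):
    """Cohen's d for a paired t test from t and the number of pairs: d = t / sqrt(N)."""
    return t / sqrt(n_pairs)

def label_d(d):
    """Rough small/medium/large label for |d| using Cohen's conventional cutoffs."""
    d = abs(d)  # sign does not affect the size of the effect
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"

d = cohens_d_from_t(3.2, 25)  # hypothetical t = 3.2 from N = 25 pairs
print(round(d, 2), label_d(d))  # 0.64 medium
```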

## Computing Effect Size from $R^2$

Another method of computing the effect size is with r-squared ($r^2$), i.e.

$r^2=\frac{t^2}{t^2+df}$

It can be categorized as a small, medium, or large effect as follows:

• $r^2=0.01$, small effect
• $r^2=0.09$, medium effect
• $r^2=0.25$, large effect.
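For illustration, $r^2$ can be computed from the t value of the day/afternoon shift example earlier in this document (t = -2.837 with df = 12); the function name is my own:

```python
def r_squared(t, df):
    """Effect size r^2 from a t statistic and its degrees of freedom."""
    return t**2 / (t**2 + df)

r2 = r_squared(-2.837, 12)
print(round(r2, 3))  # 0.401, a large effect by the cutoffs above
```

Note that because t is squared, the sign of t does not matter here either.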

A non-significant t-test result indicates that we fail to reject the hypothesis that the two conditions have equal means in the population. The larger the value of $r^2$, the larger the effect size; a large effect size with a non-significant result suggests that the study should be replicated with a larger sample size.

In short, the larger the effect size computed by either method, the larger the effect, meaning that the means are likely very different.

### References:

• Ray, J. W., & Shadish, W. R. (1996). How interchangeable are different estimators of effect size? Journal of Consulting and Clinical Psychology, 64, 1316–1325. (See also the correction: Journal of Consulting and Clinical Psychology, 66, 532, 1998.)
• Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17(2), 137–152. doi:10.1037/a0028086.

## Introduction

The objective of testing of statistical hypothesis is to determine if an assumption about some characteristic (parameter) of a population is supported by the information obtained from the sample.

The terms hypothesis testing and testing of hypothesis are used interchangeably. A statistical hypothesis (different from a simple hypothesis) is a statement about a characteristic of one or more populations, such as the population mean. This statement may or may not be true; its validity is checked on the basis of information obtained by sampling from the population.

Testing of hypothesis refers to the formal procedures used by statisticians to accept or reject statistical hypotheses. These procedures include:

## i) Formulation of Null and Alternative Hypothesis

### Null hypothesis

A hypothesis formulated for the sole purpose of rejecting or nullifying it is called the null hypothesis, usually denoted by H0. There is usually a “not” or a “no” term in the null hypothesis, meaning that there is “no change”.

For example: the null hypothesis may be that the mean age of M.Sc. students is 20 years. Statistically it can be written as H0:μ=20. Generally speaking, the null hypothesis is developed for the purpose of testing.

We should emphasize that if the null hypothesis is not rejected on the basis of the sample data, we cannot say that the null hypothesis is true. In other words, failing to reject the null hypothesis does not prove that H0 is true; it means that we have failed to disprove H0.

For a null hypothesis we usually state something like “there is no significant difference between A and B” or “the mean tensile strength of copper wire is not significantly different from some standard”.

### Alternative Hypothesis

Any hypothesis different from the null hypothesis is called an alternative hypothesis, denoted by H1. Equivalently, it is the statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false. The alternative hypothesis is also referred to as the research hypothesis.

It is important to remember that no matter how the problem is stated, the null hypothesis will always contain the equal sign, and the equal sign will never appear in the alternative hypothesis. This is because the null hypothesis is the statement being tested, and we need a specific value to include in our calculations. The alternative hypothesis for the example given above is H1:μ≠20.

### Simple and Composite Hypothesis

If a statistical hypothesis completely specifies the form of the distribution as well as the values of all parameters, it is called a simple hypothesis. For example, suppose the age distribution of first-year college students follows N(16, 25) and the null hypothesis is H0:μ=16; this null hypothesis is a simple hypothesis. If a statistical hypothesis does not completely specify the distribution, it is called a composite hypothesis, for example H1:μ<16 or H1:μ>16.

## ii) Level of Significance

The level of significance (significance level) is denoted by the Greek letter alpha (α). It is also called the level of risk (as there is the risk you take of rejecting the null hypothesis when it is really true). Level of significance is defined as the probability of making a type-I error. It is the maximum probability with which we would be willing to risk a type-I error. It is usually specified before any sample is drawn so that results obtained will not influence our choice.

In practice, the 10% (0.10), 5% (0.05), and 1% (0.01) levels of significance are used in testing a given hypothesis. A 5% level of significance means that there are about 5 chances out of 100 of rejecting a true null hypothesis, i.e. we are 95% confident that we have made the right decision. A hypothesis that has been rejected at the 0.05 level of significance means that we could be wrong with probability 0.05.

### Selection of Level of Significance

The selection of a level of significance depends on the field of study. Traditionally, 0.05 is selected for business and science related problems, 0.01 for quality assurance, and 0.10 for political polling and the social sciences.

### Type-I and Type-II Errors

Whenever we accept or reject a statistical hypothesis on the basis of sample data, there is always some chance of making an incorrect decision. Accepting a true null hypothesis or rejecting a false null hypothesis leads to a correct decision; accepting a false hypothesis or rejecting a true hypothesis leads to an incorrect decision. These two types of errors are called type-I error and type-II error.

• Type-I error: rejecting the null hypothesis (H0) when it is true.
• Type-II error: accepting the null hypothesis when H1 is true.

## iii) Test Statistics

Procedures that enable us to decide whether to accept or reject a hypothesis, or to determine whether observed samples differ significantly from expected results, are called tests of hypothesis, tests of significance, or rules of decision. We can also say that a test statistic is a value calculated from sample information, used to determine whether to reject the null hypothesis. The test statistic for the mean $\mu$ when $\sigma$ is known is $Z= \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$, where the Z-value is based on the sampling distribution of $\bar{X}$, which follows the normal distribution with mean $\mu_{\bar{X}}$ equal to $\mu$ and standard deviation $\sigma_{\bar{X}}$ equal to $\sigma/\sqrt{n}$. Thus we determine whether the difference between $\bar{X}$ and $\mu$ is statistically significant by finding the number of standard deviations $\bar{X}$ is from $\mu$ using the Z statistic. Other test statistics are also available, such as t, F, $\chi^2$, etc.
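The Z statistic above can be sketched in a few lines of Python; the numbers below are hypothetical, chosen only to illustrate the formula:

```python
from math import sqrt

def z_statistic(xbar, mu, sigma, n):
    """Z test statistic for a sample mean when the population sigma is known."""
    return (xbar - mu) / (sigma / sqrt(n))

# Hypothetical: sample mean 21 from n = 36 observations, H0: mu = 20, sigma = 3
z = z_statistic(21, 20, 3, 36)
print(z)  # 2.0: the sample mean lies 2 standard errors above mu
```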

## iv) Critical Region (Formulating Decision Rule)

It must be decided, before the sample is drawn, under what conditions (circumstances) the null hypothesis will be rejected. A dividing line must be drawn defining “probable” and “improbable” sample values given that the null hypothesis is true. In other words, a decision rule must be formulated specifying the conditions under which the null hypothesis should or should not be rejected. This dividing line marks off the region of rejection, containing those values so large or so small that their probability of occurrence under the null hypothesis is rather remote. The set of possible values of the sample statistic that leads to rejecting the null hypothesis is called the critical region.

### One tailed and two tailed tests of significance

If the rejection region is on the left or the right tail of the curve, the test is called one-tailed. This happens when the null hypothesis is tested against an alternative hypothesis of the “greater than” or “less than” type.

If the rejection region is on both the left and right tails of the curve, the test is called two-tailed. This happens when the null hypothesis is tested against an alternative hypothesis of the “not equal to” type.

## v) Making a Decision

In this step, the computed value of the test statistic is compared with the critical value. If the sample statistic falls within the rejection region, the null hypothesis is rejected; otherwise it is not rejected. Note that only one of two decisions is possible in hypothesis testing: accept or reject the null hypothesis. Instead of “accepting” the null hypothesis (H0), some researchers prefer to phrase the decision as “Do not reject H0”, “We fail to reject H0”, or “The sample results do not allow us to reject H0”.

# Difference between a probability value and the significance level

Basically, in hypothesis testing the goal is to see whether the probability value is less than or equal to the significance level (i.e., whether p ≤ α). The significance level is also called the size of the test or the size of the critical region. It is generally specified before any samples are drawn so that the results obtained will not influence our choice.

• The probability value (also called the p-value) is the probability of obtaining the observed result of your research study (or an even more extreme result) under the assumption that the null hypothesis is true.
• In hypothesis testing, the researcher assumes that the null hypothesis is true and then sees how often the observed finding would occur if this assumption were true (i.e., the researcher determines the p-value).
• The significance level (also called the alpha level) is the cutoff value the researcher selects and then uses to decide when to reject the null hypothesis.
• Most researchers select the significance or alpha level of .05 to use in their research; hence, they reject the null hypothesis when the p-value is less than or equal to .05.
• The key idea of hypothesis testing is that you reject the null hypothesis when the p-value is less than or equal to the significance level of .05.
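The decision rule in these bullets reduces to a one-line comparison; a minimal sketch (function name and example p-values are mine):

```python
def decide(p_value, alpha=0.05):
    """Reject H0 exactly when the p-value is at or below the significance level."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.03))  # reject H0
print(decide(0.20))  # fail to reject H0
```

Note that a p-value exactly equal to alpha still leads to rejection, since the rule is p ≤ α.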

# Testing of Hypothesis

The researcher is similar to a prosecuting attorney in the sense that the researcher brings the null hypothesis “to trial” when she believes there is probably strong evidence against the null.

• Just as the prosecutor usually believes that the person on trial is not innocent, the researcher usually believes that the null hypothesis is not true.
• In the court system the jury must assume (by law) that the person is innocent until the evidence clearly calls this assumption into question; analogously, in hypothesis testing the researcher must assume (in order to use hypothesis testing) that the null hypothesis is true until the evidence calls this assumption into question.