Student t-test Comparison Test (2015)

In 1908, William Sealy Gosset published his work under the pseudonym “Student” to solve problems of inference based on samples drawn from a normally distributed population when the population standard deviation is unknown. He developed the Student t-test and the t-distribution, which can be used to compare two small sets of quantitative data collected independently of one another; in this case, the test is called the independent samples t-test (also known as the unpaired samples t-test).

The Student t-test is the most commonly used statistical technique for testing hypotheses based on the difference between sample means. The Student t-test can be computed just by knowing the means, standard deviations, and number of data points in both samples, using the following formula:

\[t=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}\]

where $s_p^2$ is the pooled (combined) variance and can be computed as

\[s_p^2=\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\]

Using this test statistic, we test the null hypothesis $H_0:\mu_1=\mu_2$, which states that both samples came from the same population, at a given “level of significance” or “level of risk”.

If the absolute value of the computed t-statistic from the above formula is greater than the critical value (the value from the t-table with $n_1+n_2-2$ degrees of freedom at a given level of significance, say $\alpha=0.05$), the null hypothesis is rejected; otherwise, the null hypothesis is not rejected.

Note that the t-distribution is a family of curves indexed by the degrees of freedom (the number of independent observations in the sample minus the number of estimated parameters). As the sample size increases, the t-distribution approaches the bell-shaped normal distribution.
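
To see this convergence numerically, here is a minimal Python sketch (assuming scipy is installed; the chosen degrees of freedom are arbitrary) that prints two-sided 5% critical values of the t-distribution; they shrink toward the normal value of about 1.96 as the degrees of freedom grow.

```python
# Two-sided 5% critical values of the t-distribution for growing
# degrees of freedom, compared with the normal value of about 1.96.
from scipy import stats

for df in (2, 5, 10, 30, 100, 1000):
    print(f"df = {df:>4}: t critical = {stats.t.ppf(0.975, df):.3f}")

print(f"normal: z critical = {stats.norm.ppf(0.975):.3f}")
```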

Student t-test Example

A production manager wants to compare the number of defective products produced on the day shift with the number on the afternoon shift. Samples of the production from 6 day shifts and 8 afternoon shifts revealed the following numbers of defects. At the 0.05 significance level, is there a significant difference in the mean number of defects per shift?

Day shift:        5, 8, 7, 6, 9, 7
Afternoon shift:  8, 10, 7, 11, 9, 12, 14, 9

Some required calculations for the Student t-test are:

The means of the samples:

$\overline{X}_1=7$, $\overline{X}_2=10$

Standard deviations of the samples:

$s_1=1.4142$, $s_2=2.2678$, and the pooled variance is $s_p^2=\frac{(6-1)(1.4142)^2+(8-1)(2.2678)^2}{6+8-2}=3.8333$

Step 1: The null and alternative hypotheses are: $H_0:\mu_1=\mu_2$ vs $H_1:\mu_1 \ne \mu_2$

Step 2: Level of significance: $\alpha=0.05$

Step 3: Test Statistic

$\begin{aligned}
t&=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}\\
&=\frac{7-10}{\sqrt{3.8333(\frac{1}{6}+\frac{1}{8})}}=-2.837
\end{aligned}$

Step 4: Critical value or rejection region: reject $H_0$ if the absolute value of the t-statistic calculated in Step 3 is greater than or equal to the absolute table value, i.e. $|t_{calculated}| \ge |t_{tabulated}|$. In this example, the two-tailed critical value is $\pm 2.179$ with 12 degrees of freedom at a significance level of 5%.

Step 5: Conclusion: Since the computed value $|-2.837| = 2.837 > 2.179$, the null hypothesis is rejected: the mean number of defects is not the same on the two shifts.
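
As a check on the hand calculation, the following Python sketch (assuming numpy and scipy are available) reproduces the pooled variance and the t-statistic from the example, and also obtains the same statistic from scipy's pooled two-sample t-test.

```python
import numpy as np
from scipy import stats

day = np.array([5, 8, 7, 6, 9, 7])                  # day-shift defects
afternoon = np.array([8, 10, 7, 11, 9, 12, 14, 9])  # afternoon-shift defects

n1, n2 = len(day), len(afternoon)
s1, s2 = day.std(ddof=1), afternoon.std(ddof=1)

# Pooled variance, as in the formula above
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
t_manual = (day.mean() - afternoon.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
print(f"pooled variance = {sp2:.4f}, t = {t_manual:.3f}")  # 3.8333, -2.837

# scipy's equal-variance two-sample t-test gives the same statistic
t_stat, p_value = stats.ttest_ind(day, afternoon, equal_var=True)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")        # p < 0.05: reject H0
```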

Different Types of Comparison Tests

  • Independent Samples t-test: This compares the means of two independent groups. For example, you might use this to see if a new fertilizer increases plant growth compared to a control group.
  • Paired Samples t-test: This compares the means from the same group at different times or under various conditions. Imagine testing the same group’s performance on a task before and after training.
  • One-Sample t-test: This compares the mean of a single group to a hypothesized value. For instance, you could use this to see if students’ average exam scores significantly differ from 75%.

A summary of the key differences between the comparison tests:

              Independent Samples              Paired Samples                     One-Sample
Groups        Independent groups               Same group at different times      Single group
Hypothesis    Means are different              Means are different                Mean differs from a hypothesized value
Assumptions   Normally distributed data,       Normally distributed differences   Normally distributed data
              equal variances (testable)

Regardless of the type of t-test, all the above comparison tests assess the significance of a difference between means. These tests tell the researcher whether the observed difference is likely due to random chance or reflects a true underlying difference in the populations.
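
For illustration, here is a brief sketch of all three tests in Python, assuming scipy is available; the data sets below are invented for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(10, 2, 20)         # plant growth, control group
fertilized = rng.normal(12, 2, 20)      # plant growth, new fertilizer
before = rng.normal(50, 5, 15)          # task performance before training
after = before + rng.normal(3, 2, 15)   # same subjects after training
scores = rng.normal(78, 6, 25)          # exam scores

print(stats.ttest_ind(control, fertilized))  # independent samples t-test
print(stats.ttest_rel(before, after))        # paired samples t-test
print(stats.ttest_1samp(scores, 75))         # one-sample t-test vs 75
```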


P-value Definition, Interpretation, Introduction, Significance

In this post, we will discuss the P-value definition, interpretation, introduction, and some related examples.

P-value Definition

The P-value, also known as the observed level of significance, the exact level of significance, or the exact probability of committing a type-I error (the probability of rejecting $H_0$ when it is true), helps to determine the significance of the results of a hypothesis test. The P-value is the probability of obtaining the observed sample results, or a more extreme result, when the null hypothesis (a statement about the population) is true.

In technical terms, one can define the P-value as the lowest level of significance at which a null hypothesis can be rejected. If the P-value is very small, or less than the threshold value (the chosen level of significance), then the observed data are considered inconsistent with the assumption that the null hypothesis is true; thus, the null hypothesis is rejected and the alternative hypothesis is accepted. A P-value is always a number between 0 and 1.

Usual P-value Interpretation

  • A small P-value (<0.05) indicates strong evidence against the null hypothesis.
  • A large P-value (>0.05) indicates weak evidence against the null hypothesis.
  • A P-value very close to the cutoff (say, 0.05) is considered marginal.

Suppose the P-value of a certain test statistic is 0.002. This means that the probability of committing a type-I error (making a wrong decision) is about 0.2 percent, which is only about 2 in 1,000. For a given sample size, as $|t|$ (or any test statistic) increases, the P-value decreases, so one can reject the null hypothesis with increasing confidence.
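
The following small Python sketch (scipy assumed) illustrates this: for fixed degrees of freedom (12, as in the shift example above), the two-sided P-value shrinks as $|t|$ grows.

```python
from scipy import stats

df = 12
for t in (1.0, 2.0, 2.837, 4.0):
    p = 2 * stats.t.sf(abs(t), df)  # two-sided P-value from the t-distribution
    print(f"|t| = {t:.3f} -> p = {p:.4f}")
```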

P-value and Significance Level

If the significance level ($\alpha$, i.e. the type-I error rate) is fixed equal to the P-value of a test statistic, then there is no conflict between the two values. In other words, it may be better to give up fixing the significance level arbitrarily at some conventional level (5%, 10%, etc.) and simply report the P-value of the test statistic. For example, if the P-value of the test statistic is about 0.145, then one can reject the null hypothesis at this exact significance level, as long as there is nothing wrong with taking a 14.5% chance of being wrong when rejecting the null hypothesis.

The P-value addresses only one question: how likely are the observed data, assuming the null hypothesis is true? It does not measure support for the alternative hypothesis.

Most authors refer to a P-value<0.05 as statistically significant and a P-value<0.001 as highly statistically significant (less than one in a thousand chance of being wrong).

P-value Misinterpretation

The P-value is often interpreted incorrectly as the probability of making a mistake by rejecting a true null hypothesis (a Type-I error). The P-value cannot be this error rate because:

The P-value is calculated based on the assumption that the null hypothesis is true and that any difference in the sample is due to random chance. Consequently, a P-value cannot tell us the probability that the null hypothesis is true or false, because the null hypothesis is 100% true from the perspective of the calculation.


The Degrees of Freedom

The degrees of freedom (df), or number of degrees of freedom, refers to the number of observations in a sample minus the number of (population) parameters being estimated from the sample data. This means that the degrees of freedom are a function of both the sample size and the number of estimated parameters. In other words, it is the number of independent observations out of a total of $n$ observations.

Degrees of Freedom

In statistics, the degrees of freedom are the number of values in a study that are free to vary. A real-life example of degrees of freedom: if you have to take ten different courses to graduate, and only ten different courses are offered, then you have nine degrees of freedom. For nine semesters you will be able to choose which class to take; in the tenth semester, there will be only one class left to take, so there is no choice. This is the concept of degrees of freedom (df) in statistics.

Let a random sample of size $n$ be taken from a population with an unknown mean $\mu$. The sum of the deviations of the observations from their sample mean $\overline{X}$ is always equal to zero, i.e. $\sum_{i=1}^n (X_i-\overline{X})=0$. This places a constraint on the deviations $X_i-\overline{X}$ used when calculating the sample variance.

\[S^2 =\frac{\sum_{i=1}^n (X_i-\overline{X})^2 }{n-1}\]

This constraint (restriction) implies that $n-1$ deviations completely determine the $n$th deviation. The $n$ deviations (and also the sum of their squares and the sample variance $S^2$) therefore have $n-1$ degrees of freedom.

A common way to think of df is the number of independent pieces of information available to estimate another piece of information. More concretely, the number of degrees of freedom is the number of independent observations in a sample of data that are available to estimate a parameter of the population from which that sample is drawn. For example, if we have two observations, when calculating the mean we have two independent observations; however, when calculating the variance, we have only one independent observation, since the two observations are equally distant from the mean.
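
A small numpy sketch (data invented for illustration) makes the constraint visible: the deviations sum to zero, and dividing the sum of squared deviations by $n-1$ reproduces the sample variance.

```python
import numpy as np

x = np.array([5.0, 8.0, 7.0, 6.0, 9.0, 7.0])  # illustrative sample
dev = x - x.mean()

print(dev.sum())                      # 0: the constraint on the n deviations
print((dev**2).sum() / (len(x) - 1))  # sample variance using n - 1 df
print(x.var(ddof=1))                  # numpy's ddof=1 applies the same divisor
```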

Degrees of Freedom for Common Analyses

Single sample: For $n$ observations, one parameter (the mean) needs to be estimated, which leaves $n-1$ degrees of freedom for estimating variability (dispersion).

Two samples: There are a total of $n_1+n_2$ observations ($n_1$ for group 1 and $n_2$ for group 2), and two means need to be estimated, which leaves $n_1+n_2-2$ degrees of freedom for estimating variability.

Regression with $p$ predictors: There are $n$ observations, and $p+1$ parameters need to be estimated (a regression coefficient for each predictor plus the intercept). This leaves $n-p-1$ degrees of freedom for error, which accounts for the error degrees of freedom in the ANOVA table.

Several commonly encountered statistical distributions (Student’s t, chi-squared, F) have parameters that are referred to as degrees of freedom. This terminology reflects that, in many applications of these distributions, the parameter corresponds to the degrees of freedom of an underlying random vector. If $X_i, i=1,2,\cdots,n$, are independent normal $(\mu, \sigma^2)$ random variables, the statistic $\frac{\sum_{i=1}^n (X_i-\overline{X})^2}{\sigma^2}$ follows a chi-squared distribution with $n-1$ degrees of freedom. Here, the degrees of freedom arise from the residual sum of squares in the numerator, and in turn from the $n-1$ degrees of freedom of the underlying residual vector $X_i-\overline{X}$.
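
This claim can be checked by simulation; the sketch below (numpy assumed, parameters arbitrary) averages the statistic over many normal samples and compares the results with the mean $n-1$ and variance $2(n-1)$ of a chi-squared distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma = 10, 5.0, 2.0
samples = rng.normal(mu, sigma, size=(100_000, n))

# Sum of squared deviations from each sample's own mean, divided by sigma^2
xbar = samples.mean(axis=1, keepdims=True)
stat = ((samples - xbar) ** 2).sum(axis=1) / sigma**2

print(stat.mean())  # close to n - 1 = 9, the chi-squared mean
print(stat.var())   # close to 2(n - 1) = 18, the chi-squared variance
```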


Effect Size Definition, Formula, Interpretation (2014)

Effect Size Definition

The effect size definition: an effect size is a measure of the strength of a phenomenon, conveying the estimated magnitude of a relationship without making any statement about the true relationship. Effect size measures play an important role in meta-analysis and statistical power analysis. Reporting effect size in theses, reports, or research papers can be considered good practice, especially when presenting empirical results/findings, because it measures the practical importance of a significant finding. Simply put, effect size is a way of quantifying the size of the difference between two groups.

Effect size is usually computed after rejecting the null hypothesis in a statistical hypothesis testing procedure. If the null hypothesis is not rejected (i.e. accepted), the effect size has little meaning.

There are different formulas for different statistical tests to measure the effect size. In general, the effect size can be computed in two ways.

  1. As the standardized difference between two means
  2. As the effect size correlation (the correlation between the independent variable classification and the individual scores on the dependent variable).

The Effect Size for the Dependent Sample T-test

The effect size for the paired sample t-test (dependent sample t-test), known as Cohen’s d, ranges from $-\infty$ to $\infty$ and evaluates, in standard deviation units, the degree to which the mean of the difference scores departs from zero. If the value of d equals 0, the mean of the difference scores is equal to zero. The farther the d value is from 0, the larger the effect size.

Effect Size Formula for Dependent Sample T-test

The effect size for the dependent sample t-test can be computed by using

\[d=\frac{\overline{D}-\mu_D}{SD_D}\]

Note that both the mean difference ($\overline{D}$) and its standard deviation are reported in the SPSS output under “paired differences”.

Suppose the effect size is $d = 2.56$; this means that the sample mean difference and the population mean difference are 2.56 standard deviations apart. The sign does not affect the size of the effect, i.e. $-2.56$ and $2.56$ are equivalent effect sizes.

The $d$ statistic can also be computed from the obtained $t$ value and the number of paired observations $N$ (Ray and Shadish, 1996) as

\[d=\frac{t}{\sqrt{N}}\]

The value of $d$ is usually categorized as small, medium, and large. With Cohen’s $d$:

  • $d=0.2$ to 0.5: small effect
  • $d=0.5$ to 0.8: medium effect
  • $d=0.8$ and higher: large effect.
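
As a sketch of both computations, the hypothetical before/after scores below (invented data; numpy and scipy assumed) give the same Cohen’s $d$ whether computed directly from the difference scores or via $d=t/\sqrt{N}$.

```python
import numpy as np
from scipy import stats

before = np.array([12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0])
after = np.array([14.0, 17.0, 12.0, 17.0, 15.0, 18.0, 13.0, 18.0])

diff = after - before
d_direct = diff.mean() / diff.std(ddof=1)  # mean difference in SD units

t_stat, p_value = stats.ttest_rel(after, before)
d_from_t = t_stat / np.sqrt(len(diff))     # Ray and Shadish (1996) formula

print(f"d (direct) = {d_direct:.3f}")      # about 2.65: a large effect
print(f"d (from t) = {d_from_t:.3f}")      # identical value
```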

Calculating Effect Size from $R^2$

Another method of computing the effect size is with r-squared ($r^2$), i.e.

\[r^2=\frac{t^2}{t^2+df}\]

Effect size is categorized into small, medium, and large effects as

  • $r^2=0.01$, small effect
  • $r^2=0.09$, medium effect
  • $r^2=0.25$, large effect.
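
Continuing the paired example from the previous sketch (its $t \approx 7.48$ with $df = N - 1 = 7$; the values are illustrative), the $r^2$ effect size follows directly from the formula above.

```python
t, df = 7.483, 7                 # t value and df from the paired sketch above
r_squared = t**2 / (t**2 + df)
print(f"r^2 = {r_squared:.3f}")  # about 0.889, a large effect (> 0.25)
```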

Non-significant results of the t-test indicate that we failed to reject the hypothesis that the two conditions have equal means in the population. A larger value of $r^2$ indicates a larger effect (effect size), and a large effect size with a non-significant result suggests that the study should be replicated with a larger sample size.

So a larger value of effect size, computed from either method, indicates a very large effect, meaning that the means are likely very different.

Choosing the Right Effect Size Measure

The appropriate effect size measure depends on the type of analysis being conducted (for example, correlation, group comparison, etc.) and the measurement scale of the data (continuous, binary, nominal, ratio, interval, ordinal, etc.). It is always good practice to report both the effect size and statistical significance (P-value) to provide a more complete picture of the findings.

In conclusion, effect size is a crucial concept in interpreting statistical results. By understanding and reporting effect size, one can gain a deeper understanding of the practical significance of the research findings and contribute to a more comprehensive understanding of the field of study.

References:

  • Ray, J. W., & Shadish, W. R. (1996). How interchangeable are different estimators of effect size? Journal of Consulting and Clinical Psychology, 64, 1316-1325. (see also “Correction to Ray and Shadish (1996)”, Journal of Consulting and Clinical Psychology, 66, 532, 1998)
  • Kelley, Ken; Preacher, Kristopher J. (2012). “On Effect Size”. Psychological Methods 17 (2): 137–152. doi:10.1037/a0028086.
