Formal Hypothesis Test

A formal hypothesis test in statistics is a structured method used to determine whether there is enough evidence in a sample of data to infer that a certain condition holds for the entire population. It involves making an initial assumption (the null hypothesis) and then evaluating whether the observed data provides sufficient evidence to reject that assumption in favor of an alternative hypothesis.

Null and Alternative Hypotheses

In a formal hypothesis test, the null hypothesis is denoted by $H_o$ and the alternative hypothesis by $H_a$. The null and alternative hypotheses are assigned as follows:

Null Hypothesis

The null hypothesis is the hypothesis being tested. $H_o$ must

  • be the hypothesis we want to reject
  • contain the condition of equality (=, $\ge$, or $\le$)

Alternative Hypothesis

The alternative hypothesis is always the opposite of the null hypothesis, $H_o$. $H_a$ must

  • be the hypothesis we want to support
  • not contain the condition of equality (<, >, $\ne$)

A formal hypothesis test will always conclude with a decision to reject $H_o$ based on sample data or the decision that there is not strong enough evidence to reject $H_o$.


Components of a Formal Hypothesis Test

The following are key components of a formal hypothesis test.

  • Null Hypothesis ($H_o$)
    It is a statement of “No Effect” or “No Difference”. For example, $H_o: \mu = \mu_o$ (the population mean $\mu$ equals a specified value $\mu_o$).
  • Alternative Hypothesis ($H_1$, also written $H_a$)
    It is a statement that contradicts the null hypothesis. An alternative hypothesis can be one-tailed (for example, $H_1:\mu> \mu_o$, or $H_1:\mu<\mu_o$) or two-tailed (for instance, $H_1:\mu\ne\mu_o$).
  • Test Statistic (Test Formula)
    A numerical value calculated from the sample data using an appropriate statistic, such as a t-statistic, z-score, F-statistic, or $\chi^2$ statistic.
  • Significance Level ($\alpha$)
    The maximum acceptable probability of a Type I error is typically chosen at the outset of the hypothesis test and is referred to as the level of significance (or significance level) of the test. The level of significance is denoted by $\alpha$, and the most commonly used values are $\alpha = 0.10$, $0.05$, and $0.01$.
    Note that, for a given sample size and effect size, once $\alpha$ is chosen, the value of $\beta$ (the probability of making a Type II error) is also determined.
  • P-value
    The probability of observing the test statistic (or one more extreme) if $H_o$ is true. If $p\le\alpha$, reject $H_o$; otherwise, fail to reject it.
  • Decision Rule
    Reject $H_o$ if the test statistic falls in the critical region or if $p\le\alpha$.
  • Conclusion
    State whether there is sufficient evidence to reject $H_o$ in favour of $H_1$.

Hypothetical Example: One-Sample t-test

  • Null Hypothesis: The population mean $\mu=50$
  • Alternative Hypothesis: The population mean $\mu \ne 50$ (two-tailed test)
  • Test Statistic: $t=\frac{\overline{x}-\mu_o}{\frac{s}{\sqrt{n}}}$, where $\overline{x}$ is the sample mean, $\mu_o$ is the hypothesized mean (here 50), $s$ is the sample standard deviation, and $n$ is the sample size
  • Decision: If $|t| > t_{\alpha/2,\, n-1}$ or $p \le \alpha$, reject $H_o$
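A rough sketch of this hypothetical test in R is shown below; the data vector scores is made up for illustration, and t.test reports both the test statistic and the p-value.

# Hypothetical sample of 10 observations; in practice these are your measured values
scores <- c(48.2, 51.5, 49.3, 52.8, 47.6, 50.9, 46.4, 53.1, 49.8, 48.7)

# Two-tailed one-sample t-test of H0: mu = 50 versus H1: mu != 50
result <- t.test(scores, mu = 50, alternative = "two.sided")
result$statistic   # the t test statistic
result$p.value     # two-tailed p-value; reject H0 if p <= alpha (e.g., 0.05)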

Real Life Examples of Formal Hypothesis Tests

The following are a few real-life examples of formal hypothesis tests used in various fields.

  • Medical Testing (Drug Efficacy): Consider a pharmaceutical company that tests whether a new drug lowers blood pressure more effectively than a placebo. It is a real case used in clinical trials for hypertension medications. The hypotheses will be
    • $H_o$: The drug has no effect ($\mu_{drug} = \mu_{placebo}$).
    • $H_1$: The drug reduces blood pressure ($\mu_{drug} < \mu_{placebo}$). It is a one-tailed test.
    • Test Statistic: A two-sample t-test for comparing the means of the two groups.
  • Social Science (Opinion Polls): Consider a pollster who tests whether support for a political party candidate differs between men and women. The hypothesis may be
    • $H_o$: No gender difference in support ($p_{men} = p_{women}$).
    • $H_1$: Support differs by gender ($p_{men} \ne p_{women}$). It is a two-tailed test.
    • Test Statistic: A chi-square test of independence (categorical data).
  • Economics (Policy Impact): A government tests whether a tax incentive increased small business growth. The hypotheses will be
    • $H_o$: The policy had no effect ($\mu_{after} - \mu_{before} = 0$).
    • $H_1$: The policy increased small business growth ($\mu_{after} - \mu_{before} > 0$). It is a one-tailed test.
    • Test Statistic: Regression analysis with a dummy variable or a difference-in-differences test.
  • Business and Marketing (A/B Testing): An e-commerce company tests whether a redesigned website increases sales compared to the old version. The hypotheses will be:
    • $H_o$: The new design has no impact on sales ($p_{new} = p_{old}$).
    • $H_1$: The new design increases sales ($p_{new}>p_{old}$). It is a one-tailed test.
    • Test Statistic: For comparing conversion rates, a two-proportion z-test can be used (a minimal code sketch follows this list).
  • Manufacturing (Quality Control): Suppose a factory checks if the average weight of cereal boxes meets the advertised weight of 500g. The hypotheses are:
    • $H_o$: The mean weight is 500g ($\mu=500$)
    • $H_1$: The mean weight differs from 500g ($\mu\ne 500$). It is a two-tailed test.
    • Test Statistic: A one-sample t-test can be used for testing against a known standard.
  • Environmental Science (Pollution Levels): Researchers are interested in testing if a river’s pollution level exceeds the safe limit (e.g., lead concentration > 15ppm). The hypotheses may be:
    • $H_o$: Mean lead concentration $\le$ 15 ppm ($\mu\le 15$)
    • $H_1$: Mean lead concentration > 15 ppm ($\mu > 15$). It is a one-tailed test.
    • Test Statistic: A one-sample t-test (or a non-parametric Wilcoxon test, if the data are skewed) can be used.
  • Education (Test Score Improvement): A school may be interested in testing whether a new teaching method improves students’ math scores. The hypothesis may be
    • $H_o$: The new method has no effect ($\mu_{after} - \mu_{before} = 0$)
    • $H_1$: The new method improves scores ($\mu_{after} > \mu_{before}$). It is a one-tailed test.
    • Test Statistic: A paired sample t-test can be used.
  • Psychology (Behavioural Studies): A researcher may test whether sleep deprivation affects reaction time. The hypotheses are
    • $H_o$: Sleep deprivation has no effect ($\mu_{sleep\,deprived} = \mu_{normal\,sleep}$).
    • $H_1$: Sleep deprivation increases reaction time ($\mu_{sleep\,deprived} > \mu_{normal\,sleep}$). It is a one-tailed test.
    • Test Statistic: An independent two-sample t-test can be used for comparing the two groups.
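To make the A/B-testing case above concrete, here is a minimal sketch in R using made-up conversion counts (155 conversions out of 2450 visitors on the new design versus 120 out of 2400 on the old one); prop.test carries out the two-proportion comparison via a chi-square statistic, which is equivalent to the two-proportion z-test.

# Hypothetical conversion counts (successes) and visitor totals: new design first, old design second
conversions <- c(155, 120)
visitors <- c(2450, 2400)

# One-tailed test of H1: p_new > p_old; correct = FALSE omits the continuity correction
result <- prop.test(conversions, visitors, alternative = "greater", correct = FALSE)
result$p.value   # reject H0 if this is at most the chosen significance level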


Type I and Type II Error Examples

In this post, we will discuss Type I and Type II error examples from real-life situations. Whenever sample data are used to estimate a population parameter, there is always a probability of error due to drawing an unusual sample. Two main types of error occur in hypothesis tests, namely Type I and Type II errors.

Type I Error (False Positive)

A Type I error is rejecting the null hypothesis ($H_0$) when it is actually true. The probability of a Type I error is denoted by $\alpha$ (alpha). The most common values chosen for $\alpha$ are 0.10, 0.05, and 0.01. An example of a Type I error: a medical test indicates a person has a disease when they actually do not.

Type II Error (False Negative)

A Type II error is failing to reject the null hypothesis ($H_0$) when it is actually false. The probability of a Type II error is denoted by $\beta$ (beta). The power of the test, $1-\beta$, is the probability of correctly rejecting a false null hypothesis. An example of a Type II error: a medical test fails to detect a disease when the person actually has it.

Comparison Table

Error Type | What Happens                           | Reality                      | Risk Symbol
Type I     | Reject $H_0$ when it is true           | $H_0$ is true                | $\alpha$
Type II    | Fail to reject $H_0$ when it is false  | $H_1$ (alternative) is true  | $\beta$

                   | $H_0$ True       | $H_0$ False
$H_0$ Rejected     | Type I Error     | Correct Decision
$H_0$ Not Rejected | Correct Decision | Type II Error

Real-Life Examples of Type I and Type II Errors

  1. Medical Testing
    • Type I Error (False Positive): A healthy person is diagnosed with a disease. It may lead to unnecessary stress, further tests, or even treatment.
    • Type II Error (False Negative): A person with a serious disease is told they are healthy. It may delay treatment and worsen health outcomes.
      In this case, the more severe error is a Type II error, because missing a true disease can be life-threatening.
  2. Court Trial (Justice System)
    • Type I Error: An innocent person is found guilty. It leads to punishing someone who did nothing wrong.
    • Type II Error: A guilty person is found not guilty. It leads to the criminal going free.
      In this example, the more severe error is often Type I, because the justice system typically aims to avoid punishing innocent people.
  3. Fire Alarm System
    • Type I Error: The alarm goes off, but there’s no fire. The false alarm causes panic and interruption.
    • Type II Error: There is a fire, but the alarm does not go off. It can cause loss of life or property.
      The more severe error is the Type II error, due to the potentially deadly consequences.
  4. Spam Email Filter
    • Type I Error: A legitimate email is marked as spam. It means one will miss important messages.
    • Type II Error: A spam email is not caught and lands in your inbox. The spam email may be a minor annoyance or a potential phishing risk.
      The more severe error in this case is usually Type I, especially if it causes loss of critical communication (like job offers, invoices, etc.).
  5. Quality Control in Manufacturing
    • A factory tests whether its products meet safety standards. The null hypothesis ($H_0$) states that the product meets requirements, while the alternative ($H_1$) claims it is defective.
    • Type I Error (False Rejection): If a good product is mistakenly labeled defective, the company rejects a true null hypothesis ($H_0$), leading to unnecessary waste and financial loss.
    • Type II Error (False Acceptance): If a defective product passes inspection, the company fails to reject a false null hypothesis ($H_0$). This could result in unsafe products reaching customers, damaging the brand’s reputation.

Which Error is More Severe?

  • It depends on the context.
  • In healthcare or safety, Type II errors are often more dangerous.
  • In justice or decision-making, Type I errors can be more ethically concerning.

Designing a good hypothesis test involves balancing both types of errors based on what’s at stake.
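To illustrate this trade-off, here is a small simulation sketch in R (the true means, sample size, and number of repetitions are arbitrary assumptions): when the null hypothesis is true, the long-run rejection rate approximates $\alpha$ (the Type I error rate); when the null hypothesis is false, the long-run rate of failing to reject approximates $\beta$ (the Type II error rate).

set.seed(123)
alpha <- 0.05
n <- 30

# H0: mu = 0 is true, so every rejection is a Type I error
reject_when_true <- replicate(5000, t.test(rnorm(n, mean = 0), mu = 0)$p.value <= alpha)
mean(reject_when_true)    # close to alpha, i.e., about 0.05

# H0: mu = 0 is false (the true mean is 0.3), so every failure to reject is a Type II error
miss_when_false <- replicate(5000, t.test(rnorm(n, mean = 0.3), mu = 0)$p.value > alpha)
mean(miss_when_false)     # estimate of beta; the power of the test is 1 - beta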


Understanding P-value in Statistics

Understanding the p-value is important, as p-values are among the most widely used and misunderstood concepts in statistics. Whether you are a novice, a data analyst, or an experienced data scientist, understanding p-values is crucial for hypothesis testing, A/B testing, and scientific research. In this post, we will cover what a p-value is, how to interpret it correctly, its limitations, and best practices for using it.

What is a P-value?

A p-value (probability value) measures the strength of evidence against a null hypothesis in a statistical test. The formal definition is

The probability of observing a test statistic as extreme as, or more extreme than, the one calculated, assuming the null hypothesis is true.

Key Interpretation: A low p-value (typically ≤ 0.05) suggests the observed data is unlikely under the null hypothesis, leading to its rejection. For example, suppose you run an A/B test:

Null Hypothesis ($H_o$): No difference between versions A and B.

Observed p-value = 0.03 → There is a 3% chance of seeing a result at least this extreme if $H_o$ were true.

Conclusion: Reject $H_o$ at the 5% significance level.

The p-value of a test statistic is the probability of drawing a random sample whose standardized test statistic is at least as contrary to the claim of the null hypothesis as the one observed in the sample.

How to Interpret P-Values Correctly?

To interpret p-values correctly, we need to relate them to significance thresholds. For example,

  • $p \le 0.05$: Often considered “statistically significant” (but context matters!).
  • $p > 0.05$: Insufficient evidence to reject $H_o$ (but not proof that $H_o$ is true).

The following are some common Misinterpretations:

  • A p-value is the probability that the null hypothesis is true. → No! It is the probability of the data given $H_o$, not the other way around.
  • A smaller p-value means a stronger effect. → No! It only indicates stronger evidence against $H_o$, not the effect size.
  • $p > 0.05$ means ‘no effect.’ → No! It means no statistically significant evidence, not proof of absence.

Limitations and Criticisms of P-Values

The following are some limitations and criticisms of P-values:

  • P-hacking: Cherry-picking data to get $p\le 0.05$ inflates false positives.
  • Dependence on Sample Size: Large samples can produce tiny p-values for trivial effects.
  • Alternatives: Consider confidence intervals, Bayesian methods, or effect sizes.

Cherry-Picking Data: selectively choosing data points that support a desired outcome or hypothesis while ignoring data that contradicts it. For example, showing an upward sales trend over the first few months of a year, while omitting the data that showed sales declined for the rest of the year.


Computing P-value: A Numerical Example

A university claims that the average SAT score for its incoming students is 1080. A sample of 56 freshmen at the university is drawn, and the average SAT score is found to be $\overline{x} = 1044$ with a sample standard deviation of $s=94.7$ points. Find the p-value.

Suppose our hypothesis in this case is

$H_o: \mu = 1080$

$H_1: \mu \ne 1080$

The standardized test statistic is:

\begin{align*}
Z &= \frac{\overline{x} - \mu_o }{\frac{s}{\sqrt{n}}} \\
&= \frac{1044-1080}{\frac{94.7}{\sqrt{56}}} = -2.85
\end{align*}

From the alternative hypothesis, the test is two-tailed; therefore, the p-value is given by

\begin{align*}
P(z \le -2.85 \text{ or } z \ge 2.85) &= 2 \times P(z\le -2.85)\\
&=2\times 0.0022 = 0.0044
\end{align*}
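The same calculation can be checked with a short R sketch built from the summary statistics above; pnorm gives the standard normal lower-tail probability.

xbar <- 1044; mu0 <- 1080; s <- 94.7; n <- 56

z <- (xbar - mu0) / (s / sqrt(n))   # standardized test statistic, about -2.85
p_value <- 2 * pnorm(-abs(z))       # two-tailed p-value, about 0.0044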

Deciding to Reject the Null Hypothesis

A very small p-value leads us to reject the null hypothesis, while a large p-value does not. Since the p-value of a test is the probability, assuming $H_o$ is true, of drawing a sample at least as contrary to $H_o$ as the observed sample, it can be read as the smallest significance level at which the observed data would lead us to reject $H_o$; informally, it reflects the risk of a Type I error if we reject $H_o$ on the basis of this sample.

Recall that the maximum acceptable probability of making a Type-I Error is the significance level ($\alpha$), and it is usually determined at the outset of the hypothesis test. The rule that is used to decide whether to reject $H_o$ is:

  • Reject $H_o$ if $p \le \alpha$
  • Do not reject $H_o$ if $p > \alpha$

Practical Example: Calculating P-Values in Python & R

from scipy import stats

# Two-sample t-test in Python (group_A and group_B are numeric samples)
t_stat, p_value = stats.ttest_ind(group_A, group_B)
print(f"P-value: {p_value:.4f}")

# Two-sample t-test in R (group_A and group_B are numeric vectors)
result <- t.test(group_A, group_B)
print(paste("P-value:", result$p.value))

Best Practices for Using P-Values

  • Pre-specify significance levels (e.g., $\alpha=0.05$) before testing.
  • Report effect sizes and confidence intervals alongside p-values (see the sketch after this list).
  • Avoid dichotomizing results (“significant” vs “not significant”).
  • Consider Bayesian alternatives when appropriate.
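As an example of the second point, here is a minimal sketch in R of reporting a confidence interval and an effect size alongside the p-value; group_A and group_B are hypothetical samples, and Cohen's d is computed with the pooled standard deviation.

group_A <- c(5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7)   # hypothetical sample A
group_B <- c(4.6, 4.4, 5.0, 4.5, 4.8, 4.3, 4.7, 4.9)   # hypothetical sample B

result <- t.test(group_A, group_B)
result$p.value    # p-value
result$conf.int   # 95% confidence interval for the difference in means

# Cohen's d: standardized difference in means using the pooled standard deviation
pooled_sd <- sqrt(((length(group_A) - 1) * var(group_A) +
                   (length(group_B) - 1) * var(group_B)) /
                  (length(group_A) + length(group_B) - 2))
(mean(group_A) - mean(group_B)) / pooled_sd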

Conclusion

P-values are powerful but often misused. By understanding their definition, interpretation, and limitations, you can make better data-driven decisions.
