Sample Size Determination

Sample size determination is one of the most critical steps in designing any research study or experiment. Whether the researcher is conducting clinical trials, market research, or social science studies, the selection of an appropriate sample size ensures that the results are statistically valid while optimizing resources. This guide will walk you through the key concepts and methods for sample size determination.

In planning a study, sample size determination is an important issue: the sample must be large enough to meet certain conditions. For example, for a study dealing with blood cholesterol levels, these conditions are typically expressed in terms such as “How large a sample do I need to be able to reject the null hypothesis that two population means are equal if the difference between them is $d=10$ mg/dl?”

Why Sample Size Matters

  1. Statistical Power: Adequate sample sizes increase the ability to detect true effects
  2. Precision: Larger samples typically yield more precise estimates
  3. Resource Efficiency: Avoid wasting time/money on unnecessarily large samples
  4. Ethical Considerations: Especially important in clinical research to neither under- nor over-recruit participants

Special Considerations for Estimating Sample Size

  1. Small Populations: May require finite population corrections
  2. Stratified Sampling: Need to calculate for each stratum
  3. Cluster Sampling: Must account for design effect
  4. Longitudinal Studies: Consider repeated measures and attrition

Sample Size Determination Formula

In general, there exists a formula for computing the sample size for each specific test statistic (appropriate for testing a specified hypothesis). These formulae require the user to specify the $\alpha$-level and the desired power ($1-\beta$), as well as the difference to be detected and the variability of the measure.

Common Approaches to Sample Size Calculation

For Estimating Proportions (Prevalence Studies)

A common approach to calculating the sample size uses the formula:

$$n=\frac{Z^2 p (1-p)}{E^2}$$

where

  • Z = Z-value (1.96 for 95% confidence interval)
  • p = estimated proportion
  • E = margin of error

For a survey with an expected proportion of 50%, a 95% confidence level, and 5% margin of error, the sample size will be

$$n=\frac{1.96^2 \times 0.5 \times 0.5}{0.05^2} \approx 385$$

Note that it is not wise to calculate a single number for the sample size. It is better to calculate a range of values by varying the assumptions, so that one can get a sense of their impact on the resulting projected sample size. From this range, a suitable sample size may be chosen for the research work.
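As a quick sketch in R, the calculation can be wrapped in a small helper function (an illustrative name written for this sketch, not a standard one) and evaluated over a grid of assumed proportions and margins of error to see how the projected sample size changes:

```r
# Sample size for estimating a proportion: n = Z^2 * p * (1 - p) / E^2
# prop_sample_size() is an illustrative helper written for this sketch.
prop_sample_size <- function(p, E, conf = 0.95) {
  Z <- qnorm(1 - (1 - conf) / 2)      # 1.96 for a 95% confidence level
  ceiling(Z^2 * p * (1 - p) / E^2)
}

prop_sample_size(p = 0.5, E = 0.05)   # about 385, as in the worked example

# Vary the assumptions to see their impact on the projected sample size
grid <- expand.grid(p = c(0.3, 0.5, 0.7), E = c(0.03, 0.05, 0.10))
grid$n <- mapply(prop_sample_size, grid$p, grid$E)
grid
```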

Common Situations for Sample Size Determination

We consider the process of estimating sample size for three common circumstances:

  • One-Sample t-test and paired t-test
  • Two-Sample t-test
  • Comparison of $P_1$ vs $P_2$ with a Z-test

One-Sample t-test and Paired t-test

For testing the hypothesis:

$H_o:\mu=\mu_o\quad$ vs $\quad H_1:\mu \ne \mu_o$

For a two-tailed test, the formula of one-sample t-test is

$$n = \left[\frac{(Z_{1-\alpha/2} + Z_{1-\beta})\sigma}{d} \right]^2$$

Example: Suppose we are interested in determining the sample size for a study of blood cholesterol levels. The standard deviation of the population is, say, 25 mg/dl. Consider $\alpha = 0.05$, $\sigma = 25$, $d = 5.0$, and power $= 0.80$.

\begin{align*}
n & = \left[ \frac{(Z_{1-\alpha/2} + Z_{1-\beta})\sigma}{d} \right]^2\\
&= \left[\frac{(1.96 + 0.842)}{5}25\right]^2 = 196.28 \approx 197
\end{align*}
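The same calculation can be checked in R. The sketch below reproduces the normal-approximation formula and, for comparison, calls the built-in power.t.test(); the latter uses the exact noncentral t distribution, so its answer is slightly larger.

```r
# One-sample t-test sample size: normal approximation vs power.t.test()
alpha <- 0.05; power <- 0.80; sigma <- 25; d <- 5
n_approx <- ((qnorm(1 - alpha/2) + qnorm(power)) * sigma / d)^2
ceiling(n_approx)                     # about 197, matching the hand calculation

power.t.test(delta = d, sd = sigma, sig.level = alpha,
             power = power, type = "one.sample")
```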

Two Sample t-test

How large a sample would be needed for comparing two approaches to cholesterol lowering using $\alpha=0.05$, to detect a difference of $d=20$ mg/dl or more with power = $1-\beta=0.90$? For the following hypothesis

$H_o:\mu_1 =\mu_2\quad$ vs $\quad H_1:\mu_1 \ne \mu_2$. For a two-tailed t-test, the formula is

$$N=n_1+n_2 = \frac{4\sigma^2(Z_{1-\alpha/2} + Z_{1-\beta})^2}{d^2}, \qquad d = \mu_1 - \mu_2$$

For $\sigma = 30$mg/dl, $\beta=0.10, \alpha = 0.05$, $Z_{1-\alpha/2}=1.96$, Power = $1-\beta$, $Z_{1-\beta}=1.282$, d = 20 mg/dl.

\begin{align*}
N &= n_1 + n_2 = \frac{4(30)^2 (1.96 + 1.282)^2}{20^2}\\
&= \frac{4\times 900 \times (3.242)^2}{400} = 94.6
\end{align*}

The required sample size is about 95 in total, or roughly 48 per group (often rounded up to 50 per group in practice).
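A minimal R sketch of the two-sample calculation, using the same normal approximation and the built-in power.t.test() (which reports n per group):

```r
# Two-sample t-test sample size
# Normal-approximation formula from the text (total N = n1 + n2)
N <- 4 * 30^2 * (qnorm(0.975) + qnorm(0.90))^2 / 20^2
N                                     # about 94.6

# Built-in calculation based on the noncentral t; n is per group
power.t.test(delta = 20, sd = 30, sig.level = 0.05,
             power = 0.90, type = "two.sample")
```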

Two Sample Proportion Test

For testing the two-sample proportions hypothesis,

$H_o:P_1=P_2 \quad$ vs $\quad H_1:P_1\ne P_2$

The formula for the two-sample proportion test is

$$N=n_1+n_2 = \frac{4(Z_{1-\alpha/2} + Z_{1-\beta})^2\left[\left(\frac{P_1+P_2}{2}\right) \left(1-\frac{P_1+P_2}{2}\right) \right]}{d^2}, \qquad d = P_1-P_2$$

Consider $\beta=0.10$, $\alpha = 0.05$, $Z_{1-\alpha/2} = 1.96$, Power $= 1-\beta = 0.90$, $Z_{1-\beta} = 1.282$, $P_1 = 0.7$, $P_2=0.5$, and $d=P_1 - P_2 = 0.7-0.5 = 0.2$. The sample size will be

\begin{align*}
N &= n_1+n_2 = \frac{4(1.96+1.282)^2 [0.6(1-0.6)]}{0.2^2}\\
&= \frac{4(3.242^2)[0.6\times 0.4]}{0.2^2} = 252.25
\end{align*}

Consider using $N=260$, i.e., 130 in each group.
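In R, the built-in power.prop.test() gives a comparable per-group answer; the sketch below also reproduces the normal-approximation formula from the text.

```r
# Two-sample proportion test sample size
# Normal-approximation formula using the pooled proportion (0.7 + 0.5)/2 = 0.6
p_bar <- (0.7 + 0.5) / 2
N <- 4 * (qnorm(0.975) + qnorm(0.90))^2 * p_bar * (1 - p_bar) / (0.7 - 0.5)^2
N                                     # about 252 in total

# Built-in calculation; the n reported is per group
power.prop.test(p1 = 0.7, p2 = 0.5, sig.level = 0.05, power = 0.90)
```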

Summary

Proper sample size determination is both an art and a science that balances statistical requirements with practical constraints. While formulas provide a starting point, thoughtful consideration of your specific research context is essential. When in doubt, consult with a statistician to ensure your study is appropriately powered to answer your research questions.

Sample Size Determination FAQs

  • What is meant by sample size?
  • What is the importance of determining the sample size?
  • What are the important considerations in determining the sample size?
  • What are the common situations for sample size determination?
  • What is the formula of a one-sample t-test?
  • What is the formula of a two-sample test?
  • What is the formula of a two-sample proportion test?
  • What is the importance of sample size determination?


Formal Hypothesis Test

A formal hypothesis test in statistics is a structured method used to determine whether there is enough evidence in a sample of data to infer that a certain condition holds for the entire population. It involves making an initial assumption (the null hypothesis) and then evaluating whether the observed data provides sufficient evidence to reject that assumption in favor of an alternative hypothesis.

Null and Alternative Hypotheses

In a formal hypothesis test, the null hypothesis is denoted by $H_o$ and the alternative hypothesis by $H_a$ (also written $H_1$). The null and alternative hypotheses are assigned as follows:

Null Hypothesis

The null hypothesis is the hypothesis being tested. $H_o$ must

  • be the hypothesis we want to reject
  • contain the condition of equality (=, $\ge$, or $\le$)

Alternative Hypothesis

The alternative hypothesis is always the opposite of the null hypothesis, $H_o$. $H_a$ must

  • be the hypothesis we want to support
  • not contain the condition of equality (<, >, $\ne$)

A formal hypothesis test will always conclude with a decision to reject $H_o$ based on sample data or the decision that there is not strong enough evidence to reject $H_o$.


Components of a Formal Hypothesis Test

The following are key components of a formal hypothesis test.

  • Null Hypothesis ($H_o$)
    It is a statement of “No Effect” or “No Difference”. For example, $H_o:\mu=\mu_o$ (the population mean $\mu$ equals a specified value $\mu_o$).
  • Alternative Hypothesis ($H_1$)
    It is a statement that contradicts the null hypothesis. An alternative hypothesis can be one-tailed (for example, $H_1:\mu> \mu_o$, or $H_1:\mu<\mu_o$) or two-tailed (for instance, $H_1:\mu\ne\mu_o$).
  • Test Statistic (Test Formula)
    A numerical value calculated from the sample data using an appropriate t-statistic, z-score, F-statistic, or $\chi^2$ statistic.
  • Significance Level ($\alpha$)
    The maximum acceptable probability of rejecting a true null hypothesis (a Type-I error) is typically chosen at the outset of the hypothesis test and is referred to as the level of significance (or significance level) for the test. The level of significance is denoted by $\alpha$, and the most commonly used values are $\alpha = 0.10$, $0.05$, and $0.01$.
    Note that once $\alpha$ (the level of significance) is chosen, $\beta$, the probability of making a Type-II error, is also determined for a given sample size and effect size.
  • P-value
    The probability of observing a test statistic at least as extreme as the one computed, assuming $H_o$ is true. If $p\le\alpha$, reject $H_o$; otherwise, fail to reject it.
  • Decision Rule
    Reject $H_o$ if the test statistic falls in the critical region or if $p\le\alpha$
  • Conclusion
    State whether there is sufficient evidence to reject $H_o$ in favour of $H_1$.

Hypothetical Example: One-Sample t-test

  • Null Hypothesis: The population mean $\mu=50$
  • Alternative Hypothesis: The population mean $\mu \ne 50$ (two-tailed test)
  • Test Statistic: $t=\frac{\overline{x}-\mu_o}{s/\sqrt{n}}$, where $\overline{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the sample size
  • Decision: if $|t| > t_{\alpha/2, n-1}$ or $p<\alpha$, reject $H_o$ (an R sketch of this test follows below)
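A minimal R sketch of this hypothetical test is shown below; the simulated data, sample size, and seed are illustrative assumptions only.

```r
# Hypothetical one-sample t-test of H_o: mu = 50 vs H_1: mu != 50
set.seed(123)
x <- rnorm(30, mean = 53, sd = 8)     # made-up sample of n = 30 observations

t.test(x, mu = 50, alternative = "two.sided")
# Reject H_o at alpha = 0.05 if the reported p-value is below 0.05
```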

Real Life Examples of Formal Hypothesis Tests

The following are a few real-life examples of formal hypothesis tests used in various fields.

  • Medical Testing (Drug Efficacy): Consider a pharmaceutical company that tests whether a new drug lowers blood pressure more effectively than a placebo. It is a real case used in clinical trials for hypertension medications. The hypotheses will be
    • $H_o$: The drug has no effect ($\mu_{drug} = \mu_{placebo}$).
    • $H_1$: The drug reduces blood pressure ($\mu_{drug}<\mu_{placebo}$). It is a one-tailed test.
    • Test Statistic: A two-sample t-test will be used for comparing the means of the two groups.
  • Social Science (Opinion Polls): Consider a pollster who tests whether support for a political party candidate differs between men and women. The hypothesis may be
    • $H_o$: No gender difference in support ($p_{men} = p_{women}$).
    • $H_1$: Support differs by gender ($p_{men}\ne p_{women}$). It is a two-tailed test.
    • Test Statistic Used: Chi-Square test for independence (categorical data) will be used.
  • Economics (Policy Impact): A government tests whether a tax incentive increased small business growth. The hypotheses will be
    • $H_o$: The policy had no effect ($\mu_{after} - \mu_{before}=0$).
    • $H_1$: The policy increased growth ($\mu_{after} - \mu_{before}>0$). It is a one-tailed test.
    • Test Statistic: Regression analysis with a dummy variable or a difference-in-differences test.
  • Business and Marketing (A/B Testing): An e-commerce company tests whether a redesigned website increases sales compared to the old version. The hypotheses will be:
    • $H_o$: The new design has no impact on sales ($p_{new}=p_{old}$).
    • $H_1$: The new design increases sales ($p_{new}>p_{old}$). It is a one-tailed test.
    • Test Statistic: For comparing conversion rates, a two-proportion z-test can be used (an R sketch of this example appears after this list).
  • Manufacturing (Quality Control): Suppose a factory checks if the average weight of cereal boxes meets the advertised weight of 500g. The hypotheses are:
    • $H_o$: The mean weight is 500g ($\mu=500$)
    • $H_1$: The mean weight differs from 500g ($\mu\ne 500$). It is a two-tailed test.
    • Test Statistic: A one-sample t-test can be used for testing against a known standard.
  • Environmental Science (Pollution Levels): Researchers are interested in testing if a river’s pollution level exceeds the safe limit (e.g., lead concentration > 15ppm). The hypotheses may be:
    • $H_o$: Mean lead concentration $\le$ 15 ppm ($\mu\le 15$)
    • $H_1$: Mean lead concentration > 15 ppm ($\mu > 15$). It is a one-tailed test.
    • Test Statistic: One-sample t-test (or non-parametric Wilcoxon test, if data is skewed) can be used
  • Education (Test Score Improvement): A school may be interested in testing whether a new teaching method improves students’ math scores. The hypothesis may be
    • $H_o$: The new method has no effect ($\mu_{after} – \mu_{before}=0$)
    • $H_1$: The new method improves scores ($\mu_{after} > \mu_{before}$). It is a one-tailed test.
    • Test Statistic: A paired sample t-test can be used.
  • Psychology (Behavioural Studies): A researcher may test whether sleep deprivation affects reaction time. The hypotheses are
    • $H_o$: Sleep deprivation has no effect ($\mu_{sleep\,deprived} = \mu_{normal\,sleep}$).
    • $H_1$: Sleep deprivation increases reaction time ($\mu_{sleep\,deprived} > \mu_{normal\,sleep}$).
    • Test Statistic: An independent two-sample t-test can be used for comparing the two groups.
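As an illustration, the A/B-testing example above can be run in R with the built-in prop.test(); the conversion counts used here are made-up numbers, not real data.

```r
# A/B test: H_o: p_new = p_old vs H_1: p_new > p_old (one-tailed)
conversions <- c(new = 260, old = 210)   # hypothetical conversions per version
visitors    <- c(new = 2000, old = 2000) # hypothetical visitors per version

prop.test(conversions, visitors, alternative = "greater")
# A small p-value supports H_1: the new design has a higher conversion rate
```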


Understanding Ridge Regression

Discover the fundamentals of Ridge Regression, a powerful biased regression technique for handling multicollinearity and overfitting. Learn its canonical form, key differences from Lasso Regression (L1 vs L2 regularization), and why it’s essential for robust predictive modeling. Perfect for ML beginners and data scientists!

Introduction

In cases of near multicollinearity, the Ordinary Least Squares (OLS) estimator may perform poorly compared to non-linear or biased estimators. Under near multicollinearity, the variance of the estimated regression coefficients ($\hat{\beta}=(X'X)^{-1}X'Y$), given by $\sigma^2(X'X)^{-1}$, can be very large, while in terms of the Mean Squared Error (MSE) criterion, a biased estimator with less dispersion may be more efficient.


Understanding Ridge Regression

Ridge regression (RR) is a popular biased regression technique used to address multicollinearity and overfitting in linear regression models. Unlike ordinary least squares (OLS), RR introduces a regularization term (L2 penalty) to shrink coefficients, improving model stability and generalization.

Addition of the matrix $KI_p$ (where $K$ is a scalar) to $X'X$ yields a more stable matrix $(X'X+KI_p)$. The ridge estimator of $\beta$, $\hat{\beta}_R=(X'X+KI_p)^{-1}X'Y$, should have a smaller dispersion than the OLS estimator.

Why Use Ridge Regression

OLS regression can produce high variance when predictors are highly correlated (multicollinearity). Ridge regression helps by:

  • Reducing overfitting by penalizing large coefficients
  • Improving model stability in the presence of multicollinearity
  • Providing better predictions when data has many predictors
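As a minimal sketch, ridge regression can be fitted in R with MASS::lm.ridge() or with glmnet (setting alpha = 0 gives a pure L2 penalty); the data set (mtcars) and the lambda grid below are illustrative choices, not recommendations.

```r
library(MASS)      # provides lm.ridge()
library(glmnet)    # provides glmnet()/cv.glmnet(); install from CRAN if needed

# Ridge fit over a grid of penalty values
fit_ridge <- lm.ridge(mpg ~ wt + hp + disp, data = mtcars,
                      lambda = seq(0, 10, by = 0.1))
select(fit_ridge)                        # HKB, LW, and GCV-based choices of lambda

# Cross-validated ridge fit with glmnet (alpha = 0 means L2 only)
x <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg
fit_glmnet <- cv.glmnet(x, y, alpha = 0)
coef(fit_glmnet, s = "lambda.min")       # shrunken coefficients
```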

Canonical Form

Let $P$ denote the orthogonal matrix whose columns are the eigenvectors of $X'X$ and let $\Lambda$ be the diagonal matrix containing the corresponding eigenvalues. Consider the spectral decomposition:

\begin{align*}
X'X &= P\Lambda P'\\
\alpha &= P'\beta\\
X^* &= XP\\
C &= X^{*\prime}Y
\end{align*}

The model $Y=X\beta + \varepsilon$ can be written as

$$Y = X^*\alpha + \varepsilon$$

The OLS estimator of $\alpha$ is

\begin{align*}
\hat{\alpha} &= (X^{*\prime}X^*)^{-1}X^{*\prime} Y\\
&=(P'X'XP)^{-1}C = \Lambda^{-1}C
\end{align*}

In scalar notation, $$\hat{\alpha}_i=\frac{C_i}{\lambda_i},\quad i=1,2,\cdots,p\tag{A}$$

From $\hat{\beta}_R = (X'X+KI_p)^{-1}X'Y$, it follows that the principle of RR is to add a constant $K$ to the denominator of (A) to obtain:

$$\hat{\alpha}_i^R = \frac{C_i}{\lambda_i + K}$$

Groß criticized this approach on the grounds that it adds the same constant $K$ regardless of the size of the eigenvalues of $X'X$, whereas for the purpose of stabilization it would be more reasonable to add rather large values to small eigenvalues but small values to large eigenvalues. This leads to the general ridge (GR) estimator:

$$\hat{\alpha}_i^{GR} = \frac{C_i}{\lambda_i+K_i}$$
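The following R sketch (with simulated data and an arbitrary $K$) numerically checks the canonical form: shrinking each $C_i$ by $\lambda_i + K$ and transforming back via $\beta = P\alpha$ reproduces the ridge estimator $(X'X+KI_p)^{-1}X'Y$.

```r
# Canonical-form check of the ridge estimator (simulated data, arbitrary K)
set.seed(1)
X <- matrix(rnorm(100 * 3), ncol = 3)
y <- X %*% c(1, 2, -1) + rnorm(100)
K <- 5

eig    <- eigen(crossprod(X))            # X'X = P Lambda P'
P      <- eig$vectors
lambda <- eig$values
C      <- t(X %*% P) %*% y               # C = X*' y, with X* = XP

alpha_ridge <- C / (lambda + K)          # alpha_i = C_i / (lambda_i + K)
beta_ridge  <- P %*% alpha_ridge         # back-transform: beta = P alpha

# Direct ridge estimator; both columns should agree
cbind(beta_ridge, solve(crossprod(X) + K * diag(3), crossprod(X, y)))
```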

Ridge Regression vs Lasso Regression

Both are regularized regression techniques, but:

| Feature | Ridge (L2) | Lasso (L1) |
|---|---|---|
| Shrinkage | Shrinks coefficients evenly | Can shrink coefficients to zero |
| Use Case | Multicollinearity, many predictors | Feature selection, sparse models |

Ridge regression is a powerful biased regression method that improves prediction accuracy by adding L2 regularization. It’s especially useful when dealing with multicollinearity and high-dimensional data.
