Sample size determination is one of the most critical steps in designing any research study or experiment. Whether the researcher is conducting clinical trials, market research, or social science studies, the selection of an appropriate sample size ensures that the results are statistically valid while optimizing resources. This guide will walk you through the key concepts and methods for sample size determination.
When planning a study, the sample size must be chosen so that the study meets certain stated conditions. For example, in a study of blood cholesterol levels, these conditions are typically expressed as a question such as: “How large a sample do I need to be able to reject the null hypothesis that two population means are equal if the difference between them is $d=10$ mg/dl?”
Why Sample Size Matters
- Statistical Power: Adequate sample sizes increase the ability to detect true effects
- Precision: Larger samples typically yield more precise estimates
- Resource Efficiency: Avoid wasting time/money on unnecessarily large samples
- Ethical Considerations: Especially important in clinical research to neither under- nor over-recruit participants
Special Considerations for Estimating Sample Size
- Small Populations: May require finite population corrections
- Stratified Sampling: Need to calculate for each stratum
- Cluster Sampling: Must account for design effect
- Longitudinal Studies: Consider repeated measures and attrition
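The adjustments listed above can be sketched in a few lines of Python. This is a minimal illustration using commonly cited textbook adjustments, not formulas given in this guide: the finite population correction $n = n_0 / (1 + (n_0 - 1)/N)$, the design effect $1 + (m - 1)\rho$ for clusters of size $m$ with intracluster correlation $\rho$, and inflation by $1/(1 - \text{attrition rate})$. The function names and the example inputs are hypothetical.

```python
import math

def finite_population_correction(n0: float, population: int) -> int:
    """Shrink a sample size n0 (computed for a very large population)
    for a population of known, limited size."""
    return math.ceil(n0 / (1 + (n0 - 1) / population))

def apply_design_effect(n0: float, cluster_size: float, icc: float) -> int:
    """Inflate n0 by the design effect 1 + (m - 1) * ICC for cluster sampling."""
    deff = 1 + (cluster_size - 1) * icc
    return math.ceil(n0 * deff)

def inflate_for_attrition(n0: float, attrition_rate: float) -> int:
    """Recruit enough participants that about n0 remain after expected dropout."""
    return math.ceil(n0 / (1 - attrition_rate))

# Illustrative inputs only (assumed values, not from the guide)
print(finite_population_correction(385, population=1000))
print(apply_design_effect(385, cluster_size=20, icc=0.05))
print(inflate_for_attrition(197, attrition_rate=0.20))
```

Stratified designs are not covered by this sketch; there, a calculation of this kind would be repeated for each stratum.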
Sample Size Determination Formula
In general, there exists a formula for computing a sample size for the specific test statistic (appropriate to test a specified hypothesis). These formulae require that the user specify the $\alpha$-level and Power = ($1-\beta$) desired, as well as the difference to be detected and the variability of the measure.
Common Approaches to Sample Size Calculation
For Estimating Proportions (Prevalence Studies)
A common approach to calculating the sample size for estimating a proportion uses the formula:
$$n=\frac{Z^2 p (1-p)}{E^2}$$
where
- Z = Z-value (1.96 for 95% confidence interval)
- p = estimated proportion
- E = margin of error
For a survey with an expected proportion of 50%, a 95% confidence level, and 5% margin of error, the sample size will be
$$n=\frac{1.96^2 \times 0.5 \times 0.5}{0.05^2} \approx 385$$
Note that it is not wise to calculate a single number for the sample size. It is better to calculate a range of values by varying the assumptions so that one can get a sense of their impact on the resulting projected sample size. From this range of sample sizes, a suitable sample may be picked for the research work.
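As a rough sketch of this advice, the proportion formula $n = Z^2 p(1-p)/E^2$ can be evaluated over a small grid of assumptions. The function name `sample_size_proportion` and the grid values are illustrative choices, not part of the guide; the Z-value comes from Python's standard-library `statistics.NormalDist`, and results are rounded up to the next whole subject.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """n = Z^2 * p * (1 - p) / E^2, rounded up."""
    # Two-sided Z-value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Vary the assumed proportion and margin of error to see their impact
for p in (0.3, 0.4, 0.5):
    for margin in (0.03, 0.05, 0.07):
        n = sample_size_proportion(p, margin)
        print(f"p={p:.1f}  E={margin:.2f}  n={n}")
```

With $p = 0.5$ and $E = 0.05$ the function reproduces the 385 obtained above; the other rows show how quickly the requirement grows as the margin of error tightens.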
Common Situations for Sample Size Determination
We consider the process of estimating sample size for three common circumstances:
- One-Sample t-test and paired t-test
- Two-Sample t-test
- Comparison of $P_1$ vs $P_2$ with a Z-test
One Sample t-test and Paired test
For testing the hypothesis:
$H_o:\mu=\mu_o\quad$ vs $\quad H_1:\mu \ne \mu_o$
For a two-tailed test, the sample size formula for the one-sample t-test is
$$n = \left[\frac{(Z_{1-\alpha/2} + Z_{1-\beta})\sigma}{d} \right]^2$$
Example: Suppose we want to determine the sample size for a study of blood cholesterol levels, where the population standard deviation is taken to be, say, 25 mg/dl. Consider $\alpha = 0.05$, $\sigma = 25$, $d = 5.0$, and power $= 0.80$:
\begin{align*}
n & = \left[ \frac{(Z_{1-\alpha/2} + Z_{1-\beta})\sigma}{d} \right]^2\\
&= \left[\frac{(1.96 + 0.842)\times 25}{5}\right]^2 = 196.28 \approx 197
\end{align*}
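The calculation above can be checked with a short Python sketch. The function name `one_sample_size` is hypothetical; the Z-values are taken from the standard library's `statistics.NormalDist` rather than rounded table values, so the intermediate result differs slightly from 196.28 but rounds up to the same sample size.

```python
import math
from statistics import NormalDist

def one_sample_size(sigma: float, d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """n = [(Z_{1-alpha/2} + Z_{1-beta}) * sigma / d]^2, rounded up (two-tailed)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.842 for power = 0.80
    return math.ceil(((z_alpha + z_beta) * sigma / d) ** 2)

print(one_sample_size(sigma=25, d=5.0))  # 197
```

For a paired design, the same function could be applied with `sigma` taken as the standard deviation of the within-pair differences, which is an assumption about how the paired case is handled here.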
Two Sample t-test
How large a sample would be needed to compare two approaches to cholesterol lowering at $\alpha=0.05$, so as to detect a difference of $d=20$ mg/dl or more with power $1-\beta=0.90$? The hypotheses are
$H_o:\mu_1 =\mu_2\quad$ vs $\quad H_1:\mu_1 \ne \mu_2$
For a two-tailed t-test, the formula for the total sample size is
$$N=n_1+n_2 = \frac{4\sigma^2(Z_{1-\alpha/2} + Z_{1-\beta})^2 } {(d = \mu_1 - \mu_2)^2}$$
With $\sigma = 30$ mg/dl, $\alpha = 0.05$ (so $Z_{1-\alpha/2}=1.96$), $\beta=0.10$ (so power $= 1-\beta = 0.90$ and $Z_{1-\beta}=1.282$), and $d = 20$ mg/dl:
\begin{align*}
N &= n_1 + n_2 = \frac{4(30)^2 (1.96 + 1.282)^2}{20^2}\\
&= \frac{4\times 900 \times (3.242)^2}{400} = 94.6
\end{align*}
The total sample size is therefore about 95, that is, 48 per group after rounding up. In practice, about 50 in each group would be used.
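A minimal Python sketch of the two-sample calculation follows, assuming equal group sizes and a common standard deviation as in the formula above. The function name `two_sample_size` is illustrative, and Z-values come from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def two_sample_size(sigma: float, d: float, alpha: float = 0.05, power: float = 0.90):
    """Return (total N before rounding, per-group n rounded up) for a
    two-tailed comparison of two means with equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 1.282
    total = 4 * sigma**2 * (z_alpha + z_beta) ** 2 / d**2
    per_group = math.ceil(total / 2)
    return total, per_group

total, per_group = two_sample_size(sigma=30, d=20)
print(f"total N = {total:.1f}, per group = {per_group}")
```

Rounding is applied per group rather than to the total, so that both groups receive a whole number of subjects.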
Two Sample Proportion Test
For testing the two-sample proportions hypothesis,
$H_o:P_1=P_2 \quad$ vs $\quad H_1:P_1\ne P_2$
The formula for the two-sample proportion test is
$$N=n_1+n_2 = \frac{{4(Z_{1-\alpha/2} + Z_{1-\beta})^2}\left[\left(\frac{P_1+P_2}{2}\right) \left(1-\frac{P_1+P_2}{2}\right) \right] }{(d=P_1-P_2)^2}$$
Consider $\alpha = 0.05$ (so $Z_{1-\alpha/2} = 1.96$), $\beta=0.10$ (so power $= 1-\beta = 0.90$ and $Z_{1-\beta} = 1.282$), $P_1 = 0.7$, and $P_2=0.5$, so that $d=P_1 - P_2 = 0.7-0.5 = 0.2$ and the average proportion is $(P_1+P_2)/2 = 0.6$. The sample size will be
\begin{align*}
N &= n_1+n_2 = \frac{4(1.96+1.282)^2 [0.6(1-0.6)]}{0.2^2}\\
&= \frac{4(3.242^2)[0.6\times 0.4]}{0.2^2} = 252.25
\end{align*}
Consider using $N=260$, or 130 in each group.
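The two-proportion calculation can be sketched in the same way. The function name `two_proportion_size` is hypothetical, equal group sizes are assumed, and the variance term uses the average proportion $(P_1+P_2)/2$ exactly as in the formula above. Because `statistics.NormalDist` gives unrounded Z-values, the total comes out marginally below 252.25.

```python
import math
from statistics import NormalDist

def two_proportion_size(p1: float, p2: float, alpha: float = 0.05, power: float = 0.90):
    """Return (total N before rounding, per-group n rounded up) for a
    two-tailed Z-test comparing two proportions with equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 1.282
    p_bar = (p1 + p2) / 2                          # average proportion
    d = p1 - p2                                    # difference to detect
    total = 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / d**2
    return total, math.ceil(total / 2)

total, per_group = two_proportion_size(0.7, 0.5)
print(f"total N = {total:.1f}, per group = {per_group}")
```

The per-group minimum from this sketch is 127, so the 130 per group suggested above leaves a small margin.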
Summary
Proper sample size determination is both an art and a science that balances statistical requirements with practical constraints. While formulas provide a starting point, thoughtful consideration of your specific research context is essential. When in doubt, consult with a statistician to ensure your study is appropriately powered to answer your research questions.
Sample Size Determination FAQs
- What is meant by sample size?
- What is the importance of determining the sample size?
- What are the important considerations in determining the sample size?
- What are the common situations for sample size determination?
- What is the formula of a one-sample t-test?
- What is the formula of a two-sample test?
- What is the formula of a two-sample proportion test?
- What is the importance of sample size determination?