If someone told you "30 is enough for any study" and your reviewer is now asking how you justified your sample size, the short answer is: 30 is not a rule, it's a rumor. Sample size in research has to be justified, and the justification depends on what you're testing, how big an effect you expect, and how much statistical power you need to detect it. Pick too few participants and your study will likely miss a real effect. Pick too many and you waste resources — or worse, find tiny, meaningless effects and call them significant.
PaperDraft is a writing assistant, not a paper generator — the draft is your starting point, not your submission. You are responsible for editing, verifying sources, and following your school's academic integrity policy.
This guide walks through how to determine sample size, how to justify it in your Methods section, and the different rules that apply to quantitative, qualitative, and mixed-methods studies. For a broader view, see how to write a research paper.
Why Sample Size Matters
Sample size determines whether your statistical test has the power to detect an effect that actually exists in the population. Power is conventionally set at 0.80, meaning an 80% chance of detecting the effect if it's real.
Three things set the required sample size in a quantitative study:
- Effect size. How big is the difference or relationship you're testing? Small effects need bigger samples to detect.
- Significance level (alpha). Usually 0.05. Lower alpha = larger sample needed.
- Power. Usually 0.80. Higher power = larger sample needed.
A classic example: to detect a medium effect (Cohen's d = 0.5) between two groups with alpha = .05 and power = .80, you need roughly 64 participants per group. To detect a small effect (d = 0.2), you need about 394 per group. That sixfold jump is why effect size, more than anything else, drives the calculation.
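If you want to check these numbers without dedicated software, the exact power of a two-sample t-test can be computed from the noncentral t distribution. A minimal Python sketch using scipy (the function names are ours, not G*Power's; this reproduces the 64 and 394 figures above):

```python
import math
from scipy import stats

def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    nc = d * math.sqrt(n_per_group / 2)        # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)    # critical value
    # probability the observed t falls beyond the critical value
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

def required_n(d, alpha=0.05, power=0.80):
    """Smallest n per group that reaches the target power."""
    n = 2
    while power_two_sample_t(d, n, alpha) < power:
        n += 1
    return n

print(required_n(0.5))   # medium effect -> 64 per group
print(required_n(0.2))   # small effect  -> 394 per group
```

The linear search is deliberately simple; for real work, G*Power or `statsmodels.stats.power` does the same calculation with a solver.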
What goes wrong with too few
An underpowered study can miss a real effect (false negative, Type II error) — and worse, any significant result you do find is more likely to be an overestimate of the true effect. This is part of why small underpowered studies contribute to the replication crisis in several fields.
What goes wrong with too many
Oversized samples can detect effects so small they are not practically meaningful. A correlation of r = .05 can be statistically significant with n = 5,000, but it explains 0.25% of the variance. The effect is real; its importance is debatable.
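You can verify this arithmetic directly: the t statistic for a correlation is r * sqrt(n - 2) / sqrt(1 - r^2), and at n = 5,000 even r = .05 clears the significance bar. A quick sketch in Python:

```python
import math
from scipy import stats

r, n = 0.05, 5000
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
p_value = 2 * stats.t.sf(t_stat, df=n - 2)   # two-sided p-value
variance_explained = r**2                     # proportion of variance: 0.0025

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, r^2 = {variance_explained:.2%}")
```

The p-value comes out below .001, while r squared stays at 0.25% of the variance: statistically significant, practically negligible.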
How to Determine Sample Size for Quantitative Studies
The standard tool is a power analysis, typically conducted before data collection (a priori). Free software like G*Power handles the calculation once you specify the test, effect size, alpha, and power.
The inputs you need
- Statistical test. t-test, ANOVA, chi-square, regression — each has its own calculation.
- Expected effect size. From prior research, meta-analysis, or field conventions. Cohen's conventions (small/medium/large) are a fallback, not a first choice.
- Alpha. Conventionally 0.05.
- Power. Conventionally 0.80, sometimes 0.90.
- Design specifics. Number of groups, covariates, repeated measures.
Example calculation
Study: Between-subjects experiment, two conditions, t-test comparison, expected medium effect (d = 0.5), alpha = .05, power = .80.
Required n: 64 per group (128 total). Inflate by 10-20% to account for attrition — aim for 77 per group (154 total).
You write this up in Methods as: "A power analysis in G*Power (version 3.1) indicated a minimum sample of 128 participants (64 per group) to detect a medium effect (d = 0.5) at alpha = .05 with 80% power. We recruited 154 participants to account for attrition."
That paragraph defends the sample size. No paragraph = a reviewer question at minimum.
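The attrition inflation in the example above is simple arithmetic, but it is easy to fumble under deadline. A two-line sketch (the 20% buffer is this example's assumption, not a universal constant):

```python
import math

min_per_group = 64          # minimum n from the power analysis
inflation = 0.20            # planned 20% buffer for attrition (assumed here)
recruit_per_group = math.ceil(min_per_group * (1 + inflation))  # -> 77
recruit_total = 2 * recruit_per_group                           # -> 154

print(recruit_per_group, recruit_total)
```

Always round up, never down: `math.ceil` guarantees you stay at or above the minimum even after the buffer is spent.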
How to Determine Sample Size for Qualitative Studies
Qualitative sample size is not about statistical power. It's about saturation — the point at which new interviews, observations, or documents stop producing new themes.
Common rules of thumb
- Phenomenological studies: 5-25 participants.
- Grounded theory: 20-30 participants, continuing until theoretical saturation.
- Case studies: 1-10 cases, depending on depth.
- Ethnography: Extended engagement with a single setting; numeric sample size is less central.
You justify qualitative sample size by referencing saturation: "Data collection continued until thematic saturation was reached, defined as no new themes emerging in the final three interviews. This occurred at n = 18."
Qualitative sample size is defended conceptually, not calculated statistically.
Sample Size in Mixed-Methods, Surveys, and Other Designs
Not every design fits neatly into the two bins above.
Surveys
Large-sample surveys often use population-representativeness formulas rather than effect-size calculations. A survey targeting a population of N = 10,000 with a 5% margin of error and 95% confidence needs roughly n = 370. Sample-size calculators for surveys (Cochran's formula, online calculators) handle this.
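The n ≈ 370 figure comes from Cochran's formula plus a finite population correction. A sketch, assuming the conventional worst-case proportion p = 0.5:

```python
import math

z = 1.96        # z-score for 95% confidence
p = 0.5         # assumed proportion (0.5 is the most conservative choice)
e = 0.05        # 5% margin of error
N = 10_000      # population size

n0 = (z**2) * p * (1 - p) / e**2      # Cochran's formula (infinite population)
n = n0 / (1 + (n0 - 1) / N)           # finite population correction
sample = math.ceil(n)                  # -> 370

print(sample)
```

Note that the population size barely matters once N is large: without the correction, n0 is about 384, so "370 out of 10,000" and "384 out of a million" are nearly the same ask.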
Mixed-methods
Each arm justifies its own sample size. The quantitative arm uses power analysis; the qualitative arm uses saturation. State both separately in Methods.
Meta-analyses and systematic reviews
Sample size is the number of studies included, not participants. Justify with your inclusion criteria and search strategy.
Pilot studies
Pilot studies don't require a full power analysis. Justify them as "feasibility testing" with 10-30 participants, and be clear they are not intended to test hypotheses.
Sample size justified but the blank Methods section is still staring at you? PaperDraft gives you a structured first draft — thesis stub, IMRaD skeleton, Methods subsections in academic register — so you can spend your time on the analysis plan instead of formatting. It's a drafting assistant; the draft is not your submission. Try PaperDraft — free
How to Write the Sample Size Justification
Your Methods section should include a dedicated paragraph (or subsection) on sample size. A strong justification has four elements:
- The method. "A priori power analysis in G*Power," or "saturation monitored during data collection."
- The inputs or criteria. Effect size, alpha, power (quantitative) or saturation criteria (qualitative).
- The resulting sample size. "Minimum n = 128" or "n = 18, at which point saturation was reached."
- Adjustments or deviations. Attrition, non-response, dropout handling.
Example (quantitative):
"An a priori power analysis using G*Power 3.1 indicated that a minimum sample of 128 participants (64 per condition) was required to detect a medium effect (d = 0.5) at alpha = .05 with 80% power, based on effect sizes reported in similar interventions (Smith, 2022; Lee, 2023). We recruited 154 participants to account for anticipated attrition. Four participants withdrew before completion, yielding a final analytic sample of 150."
Example (qualitative):
"Participants were recruited until thematic saturation was achieved, operationalized as three consecutive interviews producing no new codes. Saturation was reached at 18 participants, consistent with recommended sample ranges for phenomenological studies (Creswell, 2013)."
These two paragraphs are the standard. Adapt the language, keep the structure.
Common Mistakes Students Make With Sample Size
A few errors show up across first drafts.
No justification at all. Reporting "we surveyed 85 students" with no explanation of the number is a red flag to reviewers. Always justify.
Assuming n = 30 is universal. The "30 is enough" rumor comes from the central limit theorem approximation, not from study power. It's almost never the right answer for a specific test.
Using post-hoc power analysis to defend a non-significant result. Post-hoc power is largely circular — a study that was underpowered remains underpowered. Justify a priori instead.
Confusing confidence interval width with power. A narrow CI requires a large sample; so does good power. But they're answering different questions.
Ignoring attrition. If you recruit exactly your minimum, any dropout puts you below target. Inflate by at least 10-15%.
Calculating for the wrong test. Power for a t-test is different from power for a regression with covariates. Match the analysis to the calculation.
How a Drafting Assistant Fits
A drafting tool can scaffold the sample size paragraph — the standard four-element structure, the register, and the Methods section around it. What it cannot do is run the power analysis for you, choose the right effect size from prior literature, or decide when qualitative saturation has been reached. PaperDraft handles the structure and the language. You handle the statistical judgment and the honest defense of your design.
FAQ
What's the smallest acceptable sample size for a statistical test?
There's no single minimum. It depends on the effect size you're trying to detect. For t-tests with medium effects at conventional alpha and power, ~64 per group. For small effects, much more.
Can I justify a small sample by saying it's a pilot study?
Yes, as long as you frame it as a pilot. Don't run inferential tests and claim significance — pilots are for feasibility and effect-size estimation for a later, larger study.
How do I know what effect size to use in my power analysis?
First choice: a prior meta-analysis or replication-quality study in your area. Second choice: a prior single study close to yours. Last resort: Cohen's conventions (small/medium/large), which are crude but better than nothing.
Does a large sample always mean a stronger study?
No. A large sample with a biased sampling frame (only volunteers, only one site) still produces biased results. Representative sampling matters as much as size.
What if my sample is fixed (archival data, existing dataset)?
Acknowledge it in Methods. Run a sensitivity analysis: given the fixed n, what is the minimum effect size you can detect with 80% power? Report that as your detection threshold.
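For a two-sample design, that sensitivity analysis can be run with the same noncentral t machinery, bisecting on effect size instead of n. A sketch (function names and bisection bounds are our assumptions):

```python
import math
from scipy import stats

def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    nc = d * math.sqrt(n_per_group / 2)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Bisect for the smallest Cohen's d detectable at the target power."""
    lo, hi = 0.01, 3.0          # search bounds (assumed wide enough)
    while hi - lo > 1e-4:
        mid = (lo + hi) / 2
        if power_two_sample_t(mid, n_per_group, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi

# e.g. a fixed archival dataset with 50 per group can only
# detect effects of roughly d = 0.57 or larger at 80% power
print(round(minimum_detectable_effect(50), 2))
```

Report the result as your detection threshold: "With the available n, the study had 80% power to detect effects of d >= 0.57 or larger."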
Once your sample size is justified, the Methods section holds up to a reviewer. For the next piece — reporting what the data show — see writing the results section.