Effect Size Calculator (Cohen's d + Hedges' g)
Effect size calculator. Cohen's d, Hedges' g (small-sample bias correction), Glass's Δ, and r. Magnitude interpretation included.
Effect Size
| Hedges' g (small-sample corrected) | — |
| Glass's Δ (control SD only) | — |
| r (correlation form) | — |
| Pooled SD | — |
| Cohen benchmarks | 0.2 small · 0.5 medium · 0.8 large |
How to use the effect size calculator
Enter group means + SDs + sample sizes
You need mean, SD, and n for both groups. Group 1 is typically the treatment/intervention; Group 2 is the control or comparison. Order matters only for sign: positive d means Group 1 > Group 2.
Read Cohen's d first
Cohen's d is the standardized mean difference: (M₁ − M₂) / pooled SD. It tells you how many SDs apart the two groups are. d = 0.5 means the treatment shifted the average person 0.5 SDs. Cohen's benchmarks: 0.2 small · 0.5 medium · 0.8 large — but these are rough guides, not universal truths.
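As a minimal sketch of the formula above, the snippet below computes the pooled SD and Cohen's d from summary statistics. The function names and the sample numbers are hypothetical, chosen only for illustration; this is not the calculator's own code.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled SD of two independent groups (assumes equal population variances)."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference: (M1 - M2) / pooled SD."""
    return (m1 - m2) / pooled_sd(s1, n1, s2, n2)

# Hypothetical data: treatment mean 105, control mean 100, both SD 15, n = 30 each.
d = cohens_d(105.0, 15.0, 30, 100.0, 15.0, 30)
print(round(d, 3))  # 0.333
```

A positive result means Group 1 scored higher than Group 2; swapping the groups flips only the sign.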
Use Hedges' g for small samples
For n₁ + n₂ < 50, prefer Hedges' g. It applies a small-sample bias correction: g = d · (1 − 3 / (4·N − 9)), where N = n₁ + n₂ is the combined sample size. For large samples g ≈ d.
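The correction can be sketched in a few lines. This assumes N in the formula is the combined sample size n₁ + n₂; the function name `hedges_g` and the example values are illustrative only.

```python
def hedges_g(d, n1, n2):
    """Apply the approximate small-sample correction g = d * (1 - 3 / (4N - 9))."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 observations")
    n_total = n1 + n2  # N in the formula (assumed to be the combined sample size)
    return d * (1 - 3 / (4 * n_total - 9))

# The same d shrinks noticeably with 10 per group, and barely at 500 per group.
print(round(hedges_g(0.5, 10, 10), 4))    # 0.4789
print(round(hedges_g(0.5, 500, 500), 4))  # 0.4996
```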
Choose Glass's Δ when SDs differ
If the treatment is suspected to inflate the SD (which violates the equal-variance assumption behind pooling), use Glass's Δ = (M₁ − M₂) / s₂, which uses only the control SD as the standardizer. Common in psychology, where interventions can increase variability.
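A short sketch of Glass's Δ, assuming Group 2 is the control group as in the input convention above. The function name and the numbers are hypothetical.

```python
def glass_delta(m1, m2, s2):
    """Mean difference standardized by the control group's SD only."""
    if s2 <= 0:
        raise ValueError("control SD must be positive")
    return (m1 - m2) / s2

# Hypothetical case: the treatment raised the mean from 50 to 60, and the
# treatment group's SD (not needed here) doubled from 10 to 20.
print(glass_delta(60.0, 50.0, 10.0))  # 1.0
```

Because the treatment group's SD never enters the calculation, an inflated treatment variance cannot dilute the estimate the way it would with a pooled SD.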
Report effect size alongside p-values
APA/Open Science guidance: always report effect size + 95% CI alongside p-values. A p < 0.05 finding with d = 0.04 means "real but trivial". A non-significant result with d = 0.6 might mean "underpowered, worth replicating".
Effect size — magnitude vs significance, and why both matter
P-values tell you whether an effect is likely real; effect sizes tell you whether it's big enough to care about. A drug that reduces blood pressure by 0.5 mmHg in a 50,000-person trial will produce p < 0.001, but the effect is clinically meaningless. Conversely, a 30-person pilot showing a 10-point IQ gain might fail to reach significance because the trial is underpowered, but the effect size (d ≈ 0.7) signals the intervention deserves a properly powered replication. Modern statistical guidance (APA, Open Science Framework, AMA) requires effect size + confidence interval reporting in every quantitative paper, alongside or instead of p-values.
The Cohen's d family
Cohen's d is the workhorse: standardized mean difference using pooled SD. Hedges' g corrects d for small-sample bias: d overestimates the population effect when n is small, and the correction factor (1 − 3 / (4N − 9)) shrinks the estimate toward zero. Glass's Δ uses only the control group's SD as the standardizer, appropriate when the treatment is suspected to inflate variability. For correlational designs, the equivalent is r (Pearson correlation) or r² (proportion of variance explained). All are interconvertible: r ≈ d / √(d² + 4) for equal-sized groups.
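The d-to-r conversion mentioned above can be sketched as follows. Both directions assume two groups of equal size; the inverse, d = 2r / √(1 − r²), is the same relation solved for d. Function names are illustrative.

```python
import math

def d_to_r(d):
    """Convert d to a correlation-form effect size (equal group sizes assumed)."""
    return d / math.sqrt(d**2 + 4)

def r_to_d(r):
    """Inverse conversion, valid for -1 < r < 1 (equal group sizes assumed)."""
    return 2 * r / math.sqrt(1 - r**2)

print(round(d_to_r(0.5), 3))  # 0.243
print(round(d_to_r(0.8), 3))  # 0.371
```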
"The primary product of a research inquiry is one or more measures of effect size, not p-values." — Cohen J. (1990) "Things I have learned (so far)", American Psychologist 45(12).
Cohen's benchmarks: useful, but contextual
Cohen's 1988 textbook proposed 0.2 (small), 0.5 (medium), 0.8 (large) as benchmarks. These have been widely adopted but were intended as rough heuristics, not laws. Field-specific norms matter: in medicine, d = 0.2 for a low-cost intervention can be of enormous practical value; in education, classroom interventions rarely exceed d = 0.4. Hattie's "Visible Learning" synthesis of 800+ meta-analyses found an average d ≈ 0.4 across educational interventions. Always interpret effect size against the field's base rate, not just against Cohen's ladder.
ASEAN research applications
Singapore Ministry of Education evaluates intervention programs (reading, math, ICT) using Cohen's d for between-cohort comparison. NUS, NTU, and University of Malaya psychology and medical departments report effect size in standard format. Public health research in Indonesia and the Philippines increasingly uses Hedges' g for small-trial pilot evaluation. Reporting d or g alongside p-values is now standard in journals like Asian Journal of Psychology, Singapore Medical Journal, and Asia Pacific Journal of Public Health.
10 Things to Know About Effect Sizes
Cohen's d = standardized mean difference; (M₁ − M₂) / pooled SD.
Cohen benchmarks: 0.2 small · 0.5 medium · 0.8 large.
Hedges' g = d with small-sample correction. Use for N < 50.
Glass's Δ uses control SD only. For when treatment inflates variance.
Effect size + 95% CI is required in modern APA-style reporting.
Hattie's "Visible Learning": average d ≈ 0.4 across a synthesis of 800+ meta-analyses of educational interventions.
r ≈ d / √(d² + 4) for equal-sized groups. Family of effect sizes is interconvertible.
p-value tells you "is it real?". Effect size tells you "is it big?".
For ANOVA: η² (eta-squared) or ω² (omega-squared) — proportion of variance.
Meta-analysis combines effect sizes (not p-values) across studies. d is the universal currency.
Frequently asked questions
For small samples (combined N < 50), report Hedges' g. For large samples, d and g converge — either is fine, but g is becoming the default in meta-analytic work because of its bias correction.
Weighted average of the two groups' variances: SD_pooled = √[((n₁−1)·s₁² + (n₂−1)·s₂²) / (n₁ + n₂ − 2)]. Assumes the two populations have equal variance.
When the treatment likely changes variance (not just the mean) — pooling SDs is misleading. Common in psychotherapy outcome research, where the active treatment can spread responses out as much as shift the average.
No. They're heuristics from psychology research circa 1988. Educational interventions rarely exceed d = 0.4; medical low-cost preventives can be hugely impactful at d = 0.2; some physiology effects routinely produce d = 1.5+. Interpret against the field's base rate.
Effect size estimates have uncertainty too. A study reporting d = 0.5 with 95% CI [0.1, 0.9] is much less precise than d = 0.5 with CI [0.45, 0.55]. CI width is the precision; the point estimate is the best guess.
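To illustrate how sample size drives CI width, here is a rough sketch using a common large-sample approximation for the standard error of d, SE ≈ √((n₁ + n₂)/(n₁·n₂) + d²/(2(n₁ + n₂))), with a normal critical value of 1.96. This is an assumption for illustration and may differ from the method the calculator uses; exact intervals are based on the noncentral t distribution.

```python
import math

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for Cohen's d (large-sample normal approximation)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

# Same point estimate, very different precision.
lo_small, hi_small = d_confidence_interval(0.5, 20, 20)
lo_large, hi_large = d_confidence_interval(0.5, 1000, 1000)
print(round(lo_small, 2), round(hi_small, 2))  # -0.13 1.13
print(round(lo_large, 2), round(hi_large, 2))  # 0.41 0.59
```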
d = 0.5 → ~33% non-overlap of the two distributions. d = 0.8 → ~47% non-overlap. d = 2.0 → ~81% non-overlap. Useful for communicating to non-statistical audiences ("the average treated person is better off than 69% of controls" at d = 0.5).
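The percentages above correspond to Cohen's U1 (non-overlap) and U3 (share of controls below the average treated person). A minimal sketch, assuming both groups are normally distributed with equal variance; the function names are illustrative.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def cohen_u1(d):
    """Proportion of non-overlap between the two distributions."""
    p = normal_cdf(abs(d) / 2)
    return (2 * p - 1) / p

def cohen_u3(d):
    """Proportion of the control group scoring below the treatment group's mean."""
    return normal_cdf(d)

print(round(cohen_u1(0.5), 2))  # 0.33
print(round(cohen_u3(0.5), 2))  # 0.69
```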
For binary outcomes use odds ratio (OR) or risk difference. For ordinal outcomes, consider Cliff's delta or rank-biserial correlation. d is a continuous-outcome measure.
Yes. d = t · √(1/n₁ + 1/n₂) for independent samples. Useful when reading older papers that report only t and df.
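A small sketch of this conversion for an independent-samples t-test. The function name and the example values are hypothetical. If a paper reports only t and df, the group sizes must be known or assumed; for equal groups, n₁ = n₂ = (df + 2) / 2.

```python
import math

def t_to_d(t, n1, n2):
    """Recover Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)

# Hypothetical report: t(58) = 2.5, assuming two equal groups of 30.
print(round(t_to_d(2.5, 30, 30), 3))  # 0.645
```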
No. All inputs stay in your browser.
Cohen J. (1988) "Statistical Power Analysis for the Behavioral Sciences", 2nd ed. Hedges & Olkin (1985) "Statistical Methods for Meta-Analysis". Cumming "Understanding the New Statistics". APA Publication Manual 7th ed. on effect-size reporting.