Compute a chi-square test for goodness-of-fit (observed vs expected) or a test of independence (contingency table). Outputs χ², df, p-value, and Cramér's V effect size.

RT-CNV-091 · Converters & Units

Chi-Square Test Calculator

Each row = a different category of variable A; each column = a category of variable B. Minimum 2×2.

How to use the Chi-Square Test Calculator

Pick test type

Independence (contingency table, the default mode): tests whether two categorical variables are related. Examples: gender × product preference, customer segment × purchase decision. Goodness-of-fit: tests whether observed counts in categories match expected proportions. Example: do dice rolls match the expected uniform distribution?

Enter data

Contingency table: one row per line, with columns separated by commas. A 2×2 table is the classic A/B-test conversion pattern. Goodness-of-fit: observed counts in one row, expected counts (or proportions) in another. The tool computes degrees of freedom automatically: (rows-1) × (cols-1) for independence, (categories-1) for goodness-of-fit.
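A minimal Python sketch of the input handling described above, assuming comma-separated rows and, for goodness-of-fit, observed counts in the first row. `parse_table` and `degrees_of_freedom` are hypothetical helper names, not the tool's own code (the tool itself runs in JavaScript).

```python
from __future__ import annotations


def parse_table(text: str) -> list[list[float]]:
    """Parse one row per line, with columns separated by commas."""
    rows = []
    for line in text.strip().splitlines():
        if line.strip():  # skip blank lines
            rows.append([float(cell) for cell in line.split(",")])
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("need at least one row, and every row must have the same number of columns")
    return rows


def degrees_of_freedom(table: list[list[float]], mode: str) -> int:
    """(rows - 1) * (cols - 1) for independence; (categories - 1) for goodness-of-fit."""
    if mode == "independence":
        n_rows, n_cols = len(table), len(table[0])
        if n_rows < 2 or n_cols < 2:
            raise ValueError("an independence test needs at least a 2x2 table")
        return (n_rows - 1) * (n_cols - 1)
    if mode == "goodness-of-fit":
        return len(table[0]) - 1  # row 0 holds the observed counts, one per category
    raise ValueError(f"unknown mode: {mode!r}")


ab_table = parse_table("250, 4750\n280, 4720")
print(degrees_of_freedom(ab_table, "independence"))  # 1
```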

Read χ² + p-value + Cramér's V

The χ² statistic measures the discrepancy between observed and expected counts. The p-value is the probability of seeing this much (or more) discrepancy if H₀ (independence, or a good fit) were true. Cramér's V is the effect size: conventionally 0.10 small, 0.30 medium, 0.50 large.
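The p-value step can be sketched in pure Python. This is not the tool's own implementation, and `chi2_sf` and `cramers_v` are hypothetical names. The upper tail of the chi-square distribution equals the regularised upper incomplete gamma function Q(df/2, χ²/2), evaluated here with the textbook power-series and continued-fraction expansions; in practice `scipy.stats.chi2.sf` computes the same quantity.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for X ~ chi-square(df).

    Equals the regularised upper incomplete gamma function Q(df/2, x/2),
    evaluated by a power series when x/2 < df/2 + 1 and by a continued
    fraction (modified Lentz method) otherwise.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower function P(a, z); return Q = 1 - P
        term = total = 1.0 / a
        k = 0
        while abs(term) > abs(total) * 1e-16:
            k += 1
            term *= z / (a + k)
            total += term
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction for Q(a, z)
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def cramers_v(chi2: float, n: float, rows: int, cols: int) -> float:
    """Cramer's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))


print(round(chi2_sf(3.841, 1), 3))  # 0.05
```

The series branch is used near the centre of the distribution and the continued fraction in the tail, the usual split because each converges quickly in its own region.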

Watch for expected-count warnings

The chi-square approximation needs an expected count of at least 5 in every cell to be reliable, and the tool warns when cells fall below this. If many cells have low expected counts: (a) combine adjacent categories, (b) increase the sample size, or (c) use Fisher's exact test (for 2×2 tables with small n).

Chi-square — the categorical-data equivalent of the t-test

While t-tests handle continuous data (means and standard deviations), chi-square tests handle categorical data (counts of things falling into categories). Karl Pearson developed the test in 1900, and it became one of the foundational tools of statistical inference. The math is intuitive: for each cell, compute the difference between the observed count and what you'd expect under your hypothesis, square those differences (so they don't cancel out), divide each by its expected count to standardise, and sum across all cells. Comparing that sum against a chi-square distribution with the appropriate degrees of freedom gives the p-value. This tool handles both common variants: goodness-of-fit (one categorical variable vs hypothesised proportions) and independence (testing whether two categorical variables are related).
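In symbols, the recipe above is Pearson's statistic. The notation is introduced here for illustration: O and E are observed and expected counts, R_i and C_j are row and column totals, N is the grand total, p_k are the hypothesised proportions, and K, r, c are the numbers of categories, rows and columns.

```latex
\chi^2 = \sum_{k=1}^{K} \frac{(O_k - E_k)^2}{E_k},
\qquad E_k = N\,p_k, \qquad df = K - 1
\quad \text{(goodness-of-fit)}

\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i\,C_j}{N}, \qquad df = (r - 1)(c - 1)
\quad \text{(independence)}
```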

Goodness-of-fit vs independence — when to use which

Goodness-of-fit: ONE categorical variable, compared to hypothesised proportions. Examples: "Do M&M color frequencies in this bag match the manufacturer's stated distribution?"; "Do dice roll outcomes (1-6) match the expected uniform 1/6 each?"; "Does the observed product-rating distribution match the long-term baseline?" Always reducible to a single row of observed counts plus the corresponding expected counts. Independence: TWO categorical variables, testing whether they're related. Examples: "Is product preference (Type A/B/C) independent of customer segment (Premium/Standard/Basic)?"; "Does conversion rate (converted/not) differ between A/B test variants?"; "Are vote preference and age bucket related?" Always reducible to a 2D contingency table where rows are the categories of one variable and columns are the categories of the other.
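A goodness-of-fit sketch for the dice example, using made-up counts from 60 hypothetical rolls. `goodness_of_fit` is an illustrative helper; it rescales the expected row so that either counts or proportions can be supplied, which is one reasonable design choice rather than necessarily what the tool does.

```python
def goodness_of_fit(observed, expected):
    """Pearson goodness-of-fit statistic and degrees of freedom.

    `expected` may be counts or proportions: it is rescaled to sum to the
    observed total, so [1, 1, 1, 1, 1, 1] means "uniform".
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    n, scale = sum(observed), sum(expected)
    exp_counts = [n * e / scale for e in expected]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, exp_counts))
    return chi2, len(observed) - 1


rolls = [8, 9, 12, 11, 6, 14]  # hypothetical counts of faces 1..6 (60 rolls)
chi2, df = goodness_of_fit(rolls, [1] * 6)
print(round(chi2, 2), df)  # 4.2 5
```

The 5% critical value of the chi-square distribution with df = 5 is about 11.07, so χ² = 4.2 gives no evidence that this hypothetical die is unfair.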

Karl Pearson introduced chi-square in 1900, eight years before Student's t-test (1908). Both remain standard choices today: the t-test for continuous data, chi-square for categorical data.

Expected counts + the expected ≥ 5 rule

The chi-square approximation relies on every cell having a sufficient expected count. The standard rule: all expected counts should be ≥ 5; some sources relax this to at least 80% of cells ≥ 5 and no cell below 1. Why this matters: the test uses the continuous chi-square distribution to approximate the behaviour of discrete counts. With low expected counts, the approximation breaks down and p-values become unreliable. What to do: (1) combine adjacent categories to boost cell counts ("under 30, 30-44, 45-59, 60+" → "under 45, 45+"); (2) use Fisher's exact test for 2×2 tables (exact p-values, no approximation); (3) use simulation-based methods (Monte Carlo chi-square) for larger sparse tables. This tool warns when cells have expected counts below 5, so you know whether to trust the result.
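A sketch of both checks and the category-merging fix, using hypothetical counts for the four age bands mentioned above. `expected_counts`, `low_cells`, `relaxed_rule_ok` and `merge_columns` are illustrative names, not the tool's API.

```python
def expected_counts(table):
    """E[i][j] = row_total[i] * col_total[j] / grand_total under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def low_cells(table, threshold=5.0):
    """(row, col, expected) for every cell that breaks the strict 'all >= 5' rule."""
    return [(i, j, round(e, 1))
            for i, row in enumerate(expected_counts(table))
            for j, e in enumerate(row) if e < threshold]


def relaxed_rule_ok(table):
    """Relaxed rule: at least 80% of expected counts >= 5 and none below 1."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e >= 5 for e in cells) / len(cells) >= 0.8 and min(cells) >= 1


def merge_columns(table, groups):
    """Sum adjacent columns; groups=[(0, 1), (2, 3)] turns four age bands into two."""
    return [[sum(row[j] for j in group) for group in groups] for row in table]


# Hypothetical counts: rows = two outcomes, columns = under 30, 30-44, 45-59, 60+
ages = [[6, 5, 4, 3],
        [20, 18, 15, 9]]
print(low_cells(ages))  # [(0, 2, 4.3), (0, 3, 2.7)]
merged = merge_columns(ages, [(0, 1), (2, 3)])  # under 45, 45+
print(merged, relaxed_rule_ok(ages), relaxed_rule_ok(merged))  # [[11, 7], [38, 24]] False True
```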

The ASEAN A/B testing landscape

Categorical data analysis is core to product experimentation at ASEAN tech firms. Grab runs thousands of A/B tests a month on its internal experimentation platform; binary conversion outcomes (booked a ride or didn't) are handled by chi-square or two-proportion z-test variants. Sea and its e-commerce platform Shopee operate at similar scale, and categorical outcomes (purchased, signed up, churned) dominate their metric mix. Lazada, Carousell and Carro run medium-scale experimentation programs. Common patterns on ASEAN platforms: (a) sample sizes large enough that the chi-square approximation is reliable (often n > 50,000 per variant); (b) effect sizes that are typically small (1-3% lift), so with samples this large statistical significance is often easy to reach and practical significance is the harder question; (c) multi-armed tests (A/B/C/D) are common and require a proper multiple-comparison adjustment to avoid false positives. Use the chi-square independence test for the omnibus "do any variants differ?" question, then follow up with pairwise chi-square or two-proportion tests using a Bonferroni correction.
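A sketch of the follow-up step, assuming the omnibus test has already flagged a difference: every pair of arms gets an uncorrected 2×2 chi-square test at α divided by the number of pairs (Bonferroni). The arm counts are hypothetical, and the p-value uses the fact that a 1-df chi-square variable is the square of a standard normal, so P(χ² ≥ x) = erfc(√(x/2)).

```python
import math
from itertools import combinations


def chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def pairwise_bonferroni(variants, alpha=0.05):
    """variants maps name -> (converted, not_converted); tests every pair at alpha / #pairs."""
    pairs = list(combinations(variants, 2))
    adjusted_alpha = alpha / len(pairs)
    results = []
    for x, y in pairs:
        (a, b), (c, d) = variants[x], variants[y]
        stat = chi2_2x2(a, b, c, d)
        p = math.erfc(math.sqrt(stat / 2))  # df = 1 tail probability
        results.append((x, y, stat, p, p < adjusted_alpha))
    return adjusted_alpha, results


# Hypothetical A/B/C/D test with 10,000 users per arm
arms = {"A": (500, 9500), "B": (560, 9440), "C": (620, 9380), "D": (505, 9495)}
adjusted, results = pairwise_bonferroni(arms)
print(f"per-comparison alpha = {adjusted:.4f}")
for x, y, stat, p, significant in results:
    print(f"{x} vs {y}: chi2 = {stat:.2f}, p = {p:.4f}, significant = {significant}")
```

With four arms there are six pairs, so each comparison is judged at 0.05 / 6 ≈ 0.0083 instead of 0.05.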

10 Things to Know About Chi-Square

01

Karl Pearson, 1900: chi-square predates Student's t-test (1908). A foundational statistical test for categorical data.

02

Two main types: goodness-of-fit (one variable vs expected) + independence (two variables, contingency table).

03

Expected counts ≥ 5 per cell are needed for a reliable chi-square. Otherwise use Fisher's exact test or combine categories.

04

Cramér's V = effect size for independence. 0.10 small, 0.30 medium, 0.50 large.

05

χ² is always non-negative; larger values indicate larger discrepancy from expected.

06

Degrees of freedom: (rows-1) × (cols-1) for independence; (categories-1) for goodness-of-fit.

07

For 2×2 tables with small n, Fisher's exact test is preferred (it gives exact p-values without approximation).

08

A/B testing conversion rates: 2×2 contingency table is the canonical use case. Variant × converted/not.

09

Survey research, genetic ratios, quality control, marketing segmentation: all commonly use chi-square for hypothesis testing.

10

ASEAN tech firms (Grab, Shopee, Sea) run thousands of experiments monthly on their experimentation platforms, many analysed with chi-square or equivalent proportion tests.

Frequently Asked Questions

  • Data type drives the choice. Chi-square: categorical data (counts in categories). Examples: conversion rate, color preference, segment counts. T-test: continuous data (numeric values). Examples: revenue per user, time on page, weight, age. Common mistake: running a t-test on rates (e.g., "5.2% vs 4.8% conversion rate") when a chi-square test or a z-test for proportions is the better fit. The underlying data is binary (converted yes/no), which makes chi-square more appropriate.

  • Three options. (1) Combine sparse categories: merge low-count cells with adjacent ones ("under 25, 25-34, 35-44, 45+" → "under 35, 35+"). (2) Use Fisher's exact test (for 2×2 tables): it gives exact p-values without the chi-square approximation, and most stats software (R, SPSS, Python's scipy) implements it. (3) Increase the sample size: collect more data so cells fill up. For exploratory work, chi-square with every expected count ≥ 1 is sometimes still informative; for publication-quality work, ≥ 5 is the standard threshold.

  • The effect size measure for chi-square independence tests. Formula: V = √(χ² / (n × min(rows-1, cols-1))). It ranges from 0 to 1: 0 = no association, 1 = perfect association. Conventional cutoffs: 0.10 small, 0.30 medium, 0.50 large. These follow Cohen's conventions and fit tables with two rows or two columns best; in larger tables, smaller values of V already indicate comparable effects. Why effect size matters: the p-value tells you whether an association is likely "real" (not chance); Cramér's V tells you how strong it is. With large samples, weak associations become statistically significant, and the effect size lets you spot when a result is "real but tiny."

  • Chi-square: works well for large samples and for 2×2 or larger tables; relies on the chi-square distribution approximation. Fisher's exact test: designed for small samples in 2×2 tables (extensions to larger tables exist but need more computation); gives EXACT p-values from the hypergeometric distribution. Decision rule: for 2×2 tables, prefer Fisher's exact when any expected count is below 5. For larger tables, chi-square is computationally easier and still reasonable when expected counts are moderate; Fisher's exact for larger tables needs more CPU, but most modern stats software handles it.

  • Build a 2×2 contingency table: rows = variant A vs variant B, columns = converted vs not converted. Example: Variant A: 250 converted / 4,750 not = 5,000 total. Variant B: 280 converted / 4,720 not = 5,000 total. Run the chi-square independence test: p < 0.05 means the conversion rate differs significantly between variants, and Cramér's V quantifies the effect size. For more than two variants, use an A/B/C/D table: chi-square gives the omnibus "are any different?" test; follow up with pairwise tests and a Bonferroni correction.

  • For very large chi-square statistics, p-values become so small that computing them as 1 minus the cumulative probability in floating point can lose all meaningful digits, collapsing the result to 0. Reporting "< 0.0001" is the standard convention whenever the value would round to zero at the displayed precision. For very small p-values the exact magnitude rarely matters: p = 0.00001 and p = 0.0000001 are both "extremely significant", and reporting one or the other doesn't change the interpretation.

  • An adjustment for 2×2 chi-square tests when expected counts are small (roughly 5-10). It subtracts 0.5 from |observed - expected| before squaring, which produces more conservative (larger) p-values. In modern practice Yates' correction is somewhat controversial: it is overly conservative in many cases, and Fisher's exact test is generally preferred for small-sample 2×2 problems. This tool doesn't apply Yates' correction by default. If your stats course or institution asks for a small-sample correction, Fisher's exact test usually gives the cleaner interpretation.

  • Yes. A significant chi-square tells you "the categories aren't independent" but not WHICH cells drive the difference. Post-hoc options: (1) Examine residuals: the Pearson residual (observed - expected) / √expected, or the adjusted standardised residual, which also accounts for the row and column totals. As a rough rule of thumb, cells with |residual| > 2 contributed substantially to significance. (2) Pairwise chi-square tests on subsets of the table, with a Bonferroni correction for multiple comparisons. (3) The Marascuilo procedure: a post-hoc method for comparing proportions with a multiple-comparison adjustment built in. Most stats software automates these.

  • No. All calculations run in your browser via JavaScript. Open DevTools → Network and confirm zero outbound requests. Data stays on your device. Safe for confidential research + business data.

  • Pair with: T-Test (RT-CNV-090) for continuous data; ANOVA (RT-CNV-092) for comparing 3+ group means; Sample Size Calculator (RT-CNV-093) for pre-test sizing; Linear Regression (RT-CNV-084) for relationship modelling; Standard Deviation (RT-CNV-081) for descriptive stats. External: R + Python (scipy.stats), SPSS, JASP (free academic), GraphPad Prism, JMP — all support full categorical-data analysis suites.
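For the Fisher's exact FAQ above, a pure-Python sketch of the two-sided test for 2×2 tables. It follows the common convention (used, for example, by R's `fisher.test`) of summing the probabilities of every table with the same margins that is no more likely than the observed one. `fisher_exact_2x2` is an illustrative name; in practice `scipy.stats.fisher_exact` does this. The example is Fisher's classic tea-tasting table.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # P(top-left cell = x) under the null of no association
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Small relative tolerance so ties are not lost to floating-point rounding
    p = sum(prob(x) for x in support if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
```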
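The A/B example from the FAQ above, worked through a small pure-Python sketch with no Yates' correction (matching the tool's stated default). `chi_square_independence` is an illustrative helper. For df = 1 the p-value can be taken from `math.erfc`, because a 1-df chi-square variable is the square of a standard normal.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom and n for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, n


# Rows = variants A and B; columns = converted, not converted
table = [[250, 4750],
         [280, 4720]]
chi2, df, n = chi_square_independence(table)
p = math.erfc(math.sqrt(chi2 / 2))  # exact tail probability only when df == 1
v = math.sqrt(chi2 / (n * min(len(table) - 1, len(table[0]) - 1)))  # Cramer's V
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.3f}, V = {v:.4f}")
# chi2 = 1.793, df = 1, p = 0.181, V = 0.0134
```

With these numbers (5.0% vs 5.6% conversion), p ≈ 0.18, so the difference is not significant at 0.05, and V ≈ 0.013 is a very small effect. If you cross-check with `scipy.stats.chi2_contingency`, pass `correction=False`, since SciPy applies Yates' correction to 2×2 tables by default.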
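A small illustration of the precision point in the FAQ above, for df = 1 where the chi-square tail equals erfc(√(x/2)). Subtracting the cumulative probability from 1 cancels every digit, while computing the tail directly keeps them; this is why libraries offer survival functions such as `scipy.stats.chi2.sf`. `format_p` is a hypothetical helper applying the "< 0.0001" convention.

```python
import math

x = 120.0  # a very large chi-square statistic, df = 1
naive = 1.0 - math.erf(math.sqrt(x / 2))  # 1 - CDF: every digit cancels
direct = math.erfc(math.sqrt(x / 2))  # upper tail computed directly


def format_p(p: float, floor: float = 1e-4) -> str:
    """Report p-values below `floor` as '< 0.0001' instead of a run of zeros."""
    return f"< {floor:g}" if p < floor else f"{p:.4f}"


print(naive, direct > 0, format_p(direct), format_p(0.0312))
# 0.0 True < 0.0001 0.0312
```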
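For the Yates' correction FAQ above, a sketch that computes the 2×2 statistic with and without the correction, using a hypothetical small table whose expected counts (about 8.3 to 10.3) fall in the range where the correction is usually discussed. `chi2_2x2` is an illustrative name. Clamping the corrected difference at zero is a common safeguard and an assumption here, not something the FAQ specifies.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for [[a, b], [c, d]], optionally with Yates' continuity correction."""
    table = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)  # shrink |O - E| by 0.5, but not below zero
            stat += diff ** 2 / expected
    return stat


# Hypothetical small table
print(round(chi2_2x2(12, 5, 6, 14), 3), round(chi2_2x2(12, 5, 6, 14, yates=True), 3))  # 6.06 4.544
```

The corrected statistic is smaller, so its p-value is larger: that is the conservatism the FAQ describes.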
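For the post-hoc FAQ above, a sketch of the residual check on a hypothetical 3×3 table (say, customer segment × preferred product type). It computes Pearson residuals (O − E)/√E and flags cells beyond the rough |r| > 2 rule; `pearson_residuals` and `flag_cells` are illustrative names. Adjusted standardised residuals, which additionally divide by √((1 − row share)(1 − column share)), are not shown.

```python
import math


def pearson_residuals(table):
    """(O - E) / sqrt(E) for each cell, with E = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            out_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(out_row)
    return residuals


def flag_cells(table, cutoff=2.0):
    """(row, col, residual) for cells whose residual magnitude exceeds the cutoff."""
    return [(i, j, round(r, 2))
            for i, row in enumerate(pearson_residuals(table))
            for j, r in enumerate(row) if abs(r) > cutoff]


# Hypothetical counts: rows = segments, columns = product types
prefs = [[30, 10, 10],
         [10, 30, 10],
         [10, 10, 30]]
print(flag_cells(prefs))  # [(0, 0, 3.27), (1, 1, 3.27), (2, 2, 3.27)]
```

The squared Pearson residuals sum to the χ² statistic (48 for this table), so the flagged cells show where most of the statistic comes from.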
