PDF Diff

Compare two PDFs and see added, removed, and unchanged lines side by side. 100% browser-based.

💡 How it compares: the tool extracts text from both PDFs line by line, then runs a Longest Common Subsequence (LCS) diff, the same problem GNU diff solves with the faster Myers algorithm. Lines only in the original show in red on the left. Lines only in the revised show in green on the right. Unchanged lines align across both sides.
Drop the original PDF on the left, the revised PDF on the right.
🔒 Both PDFs stay on your device. Text extraction and diff computation happen entirely in your browser — nothing is uploaded, even for sensitive legal or financial documents. Verify in DevTools → Network.

How to compare two PDFs

Drop both PDFs

Left dropzone for the original ("before"), right dropzone for the revised ("after"). Order matters — getting it backwards shows additions as deletions and vice versa.

Click Compare

Both PDFs are text-extracted, line-segmented, and fed through an LCS diff. Running time grows as O(M × N), where M and N are the two line counts; a typical 50-page contract takes under a second.

Read the side-by-side result

Red rows on the left are lines only in the original (removed in the revised). Green rows on the right are lines only in the revised (added). White rows are unchanged. By default, long runs of unchanged lines are collapsed with ⋯ N unchanged lines ⋯ markers — untick the collapse option to see everything.
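The collapsing behaviour described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `collapse_unchanged`, the `(tag, text)` row shape, and the `context` parameter (how many unchanged rows to keep around each change) are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical row shape: (tag, text), where tag is " " (unchanged),
# "-" (removed) or "+" (added).
Row = Tuple[str, str]

def collapse_unchanged(rows: List[Row], context: int = 2) -> List[Row]:
    """Replace runs of unchanged rows far from any change with one marker row."""
    # Mark every row within `context` rows of a change as visible.
    keep = [False] * len(rows)
    for idx, (tag, _) in enumerate(rows):
        if tag != " ":
            lo = max(0, idx - context)
            hi = min(len(rows), idx + context + 1)
            for k in range(lo, hi):
                keep[k] = True

    out: List[Row] = []
    hidden = 0  # length of the current run of hidden rows
    for row, visible in zip(rows, keep):
        if visible:
            if hidden:
                out.append(("~", f"⋯ {hidden} unchanged lines ⋯"))
                hidden = 0
            out.append(row)
        else:
            hidden += 1
    if hidden:
        out.append(("~", f"⋯ {hidden} unchanged lines ⋯"))
    return out
```

With ten rows, one removal at position 5, and `context=1`, the result is a marker for four hidden lines, three visible rows, then a marker for three hidden lines.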

Export the diff

Click Download .diff.txt to save a unified-diff-style text file (- removed line / + added line). Attach to an email, paste into a code review tool, or commit to Git.
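As a rough sketch of what such an export could look like, the function below serialises diff rows into the layout this page describes: a `---` header, a `+++` header, then one prefixed line per row. The function name `to_diff_text` and the `(tag, text)` row shape are hypothetical, and the tool's real output may differ in detail.

```python
from typing import Iterable, Tuple

def to_diff_text(rows: Iterable[Tuple[str, str]],
                 original_name: str,
                 revised_name: str) -> str:
    """Serialise (tag, text) rows; tag is " " unchanged, "-" removed, "+" added."""
    lines = [f"--- {original_name}", f"+++ {revised_name}"]
    for tag, text in rows:
        # Unchanged lines get two spaces; changes get their sign plus a space.
        prefix = "  " if tag == " " else f"{tag} "
        lines.append(prefix + text)
    return "\n".join(lines) + "\n"

print(to_diff_text(
    [(" ", "1. Term"), ("-", "12 months"), ("+", "24 months")],
    "contract_v1.pdf",
    "contract_v2.pdf",
), end="")
```

Note that strict unified diff, as produced by `diff -u`, prefixes unchanged lines with a single space and groups changes under `@@` hunk headers, so a file in the simpler layout above is meant for reading rather than for feeding to `patch`.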

PDF diff — the contract-review tool every junior lawyer secretly wants

Comparing two PDFs is one of the most common requests in legal, compliance, and policy work. A counterparty sends back a marked-up version of the contract you sent them; you need to know what actually changed. A new version of a standard operating procedure lands in your inbox; the email says "minor updates" but you've learned not to trust that. A research paper goes through revision; you want to know what the reviewers actually accepted. The desktop tool that does this, Adobe Acrobat Pro's "Compare Files" feature, costs around US$240 a year. Most web alternatives upload your confidential documents to their servers. This tool does it in your browser, with zero upload, on documents that should never leave your device.

The LCS diff algorithm in plain English

Longest Common Subsequence (LCS) is the problem at the core of line-based diffing. GNU diff, Git, and most code-review tools solve the same problem, usually with the faster Myers algorithm. The idea: given two sequences of lines, find the longest sequence of lines that appears (in order) in both. Everything in that common sequence is "unchanged." Lines in the original but not the common sequence are "removed." Lines in the revised but not the common sequence are "added." That's the whole algorithm. The implementation uses dynamic programming: a 2D table where cell (i, j) holds the LCS length of the first i lines of A and the first j lines of B. Fill the table, then backtrack from the bottom-right corner to extract the actual diff. This takes about 10 milliseconds for a 1,000-line contract.
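The table-fill and backtrack steps can be sketched in a few lines. This is a textbook illustration written in Python for readability, not the tool's browser code; the function name `lcs_diff` and the `(tag, line)` output shape are assumptions made for the example.

```python
from typing import List, Tuple

def lcs_diff(a: List[str], b: List[str]) -> List[Tuple[str, str]]:
    """Return (tag, line) pairs: " " unchanged, "-" only in a, "+" only in b."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of the first i lines of a and first j lines of b.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Backtrack from the bottom-right corner, collecting rows in reverse.
    out: List[Tuple[str, str]] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            out.append((" ", a[i - 1]))
            i -= 1
            j -= 1
        elif j > 0 and (i == 0 or table[i][j - 1] >= table[i - 1][j]):
            out.append(("+", b[j - 1]))
            j -= 1
        else:
            out.append(("-", a[i - 1]))
            i -= 1
    out.reverse()
    return out

for tag, line in lcs_diff(["Term: 12 months", "Fee: $100"],
                          ["Term: 24 months", "Fee: $100"]):
    print(tag, line)
```

Both the time and the memory of this version scale with the size of the table, (M + 1) × (N + 1) cells, which is where the O(M × N) figure comes from.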

It is the same family of algorithm that powers git diff and GNU diff, now running in your browser, on your contracts, with nothing uploaded.

The APAC document-review use case

Contract comparison is one of the highest-paid and lowest-fun activities in Singapore's legal and compliance industries: every M&A round, every employment agreement, every loan facility produces multiple drafts that someone has to diff. Malaysia's SC and Bank Negara filings, Indonesia's OJK regulatory submissions, Vietnam's decree updates that ripple through dozens of supplier agreements, the Philippines' BIR ruling clarifications, Thailand's SEC filings, and Hong Kong's SFC compliance documents all involve constant version comparison. A browser-side diff matters because the alternative is sending privileged drafts to a SaaS tool hosted overseas, which can be a compliance issue under many APAC privacy regimes and law firm internal policies.

Limitations to be honest about

This tool diffs the text extracted by pdf.js, which means it sees a PDF the way the text layer represents it. Layout changes, font changes, image changes, and signature additions don't appear in the diff at all (they're not text). Reordered paragraphs show as a removal in one place and an addition in another. Multi-column layouts can produce misaligned diffs because the text-extraction order doesn't always match human reading order. For pixel-perfect visual comparison (highlighting every change, including formatting), you need a rendering-based diff like Acrobat Pro's compare. For text-content comparison, which is what most users actually need, this tool is faster and more private.
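The reordering limitation is easy to reproduce. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in: it uses a different matching heuristic from textbook LCS, but it likewise has no notion of a "move". The helper name `changed_lines` and the sample clauses are made up for the example.

```python
import difflib
from typing import List, Tuple

def changed_lines(original: List[str], revised: List[str]) -> Tuple[List[str], List[str]]:
    """Return (removed, added) lines according to a line-level diff."""
    removed: List[str] = []
    added: List[str] = []
    matcher = difflib.SequenceMatcher(a=original, b=revised, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(original[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(revised[j1:j2])
    return removed, added

original = ["Intro", "Clause A", "Clause B", "Outro"]
revised = ["Intro", "Clause B", "Clause A", "Outro"]  # two clauses swapped

removed, added = changed_lines(original, revised)
print("removed:", removed)
print("added:  ", added)
```

The swapped clause is reported once as removed and once as added, even though no text was actually rewritten. A reviewer has to spot that the two rows carry the same wording.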

10 Things to Know About Document Diffing

01

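The dynamic-programming approach behind textbook LCS diffing traces back to Wagner and Fischer's 1974 paper on the string-to-string correction problem. The Myers algorithm (1986) is the variant used by GNU diff and Git: it also finds a minimal diff, and is faster in practice.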

02

"Diff" became a UNIX command in 1974 thanks to Doug McIlroy at Bell Labs. Its original output marked lines with < and >; the + and - prefixes familiar today arrived later with the context and unified formats.

03

Adobe Acrobat Pro's "Compare Files" feature does both text AND visual diffing, but costs about US$240 a year. Most users only need the text diff — which this tool handles for free.

04

Microsoft Word has a "Compare" feature for .docx files — but it doesn't work on PDFs. Converting PDF to DOCX first introduces formatting noise that confuses Word's comparison.

05

Legal review platforms like Litera, Workshare, and DraftWise all use LCS-style diff algorithms underneath the polished UI. The core technology hasn't changed in 30 years.

06

Git's default diff uses the Myers algorithm, which finds the same kind of minimal line diff as textbook LCS but runs in O(N × D) time, where D is the number of changed lines, so nearly identical files diff very quickly. We use the textbook LCS, which is easier to understand and read.

07

The phrase "redline version" comes from the pre-computer era when lawyers literally drew red lines through removed text and added new text in blue ink between the lines.

08

"Unified diff" format (the one Git shows) was invented in 1990 to compact "old" and "new" sides into one stream. Side-by-side is older — it's how lawyers and editors have read diffs since the 1900s.

09

For very large documents, the textbook LCS uses O(M × N) memory. Hirschberg's algorithm (1975) does the same job in O(min(M, N)) memory by using divide and conquer instead of storing the whole table, at the cost of roughly twice the computation.

10

Microsoft Word's "Track Changes" feature stores the diff as XML inside the .docx file. Open the .docx as a ZIP and inspect word/document.xml: you'll see w:ins and w:del elements marking every change.
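The memory point in item 09 can be made concrete. The sketch below computes only the LCS length while keeping two rows of the table at a time, which is the building block Hirschberg's algorithm applies recursively to recover the full diff. It is not Hirschberg's complete algorithm, and the function name `lcs_length` is an assumption made for the example.

```python
from typing import Sequence

def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """LCS length using O(min(len(a), len(b))) memory instead of a full table."""
    # Keep the shorter sequence along the row so each row is as small as possible.
    if len(b) > len(a):
        a, b = b, a
    prev = [0] * (len(b) + 1)  # row i - 1 of the full table
    for x in a:
        curr = [0]             # row i, built left to right
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

print(lcs_length(list("ABCBDAB"), list("BDCABA")))  # 4
```

Dropping the older rows saves memory but also throws away the information the backtracking step needs, which is why recovering the actual diff in linear space takes the extra divide-and-conquer work.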

FAQ

  • No, your PDFs are not uploaded. Both PDFs are read into browser memory, text-extracted via pdf.js, and diffed in your browser. Open DevTools → Network and watch: zero outbound traffic. This is the privacy-safe alternative for diffing confidential legal, financial, or HR documents.

  • No, scanned PDFs cannot be compared directly, because they have no extractable text. You need to run OCR first (Adobe Acrobat's "Recognize Text", or a desktop OCR tool) to get a text layer, then diff. Diffing images is a different problem that requires computer-vision techniques and is not yet in scope.

  • The "Collapse unchanged regions" option (on by default) hides runs of identical lines that aren't near any change. This keeps the diff focused on what actually differs. Untick it to see every line of both documents.

  • No, formatting changes are not detected. This is a text diff: it sees the words, not the formatting. If a paragraph changed from regular to bold but the text is the same, the diff shows "unchanged." For visual / formatting comparison, use Adobe Acrobat Pro's Compare Files feature.

  • There is no hard size limit. The soft limit is browser memory and the O(M × N) LCS table. Two 500-page contracts (≈30,000 lines each) would need ~900M table cells, which is on the order of gigabytes of memory; a well-equipped modern laptop can handle that, but it starts to feel slow. For practical use, anything under 100 pages per side compares in under 3 seconds.

  • The export uses a unified-diff style: a --- header with the original filename, a +++ header with the revised filename, then each line prefixed with two spaces (unchanged), - (removed), or + (added). Most code-review tools that display diff output can read it.

  • LCS doesn't model moves — it only models adds, removes, and unchanged. A moved paragraph appears as removed from its old position and added at the new one. Specialised "move-aware" diff tools exist (Adobe Acrobat Compare, Litera Compare) but are server-heavy.

  • pdf.js reconstructs lines from glyph positions, which doesn't always match human-perceived line endings. Multi-column layouts and PDF wrapping can confuse the line segmentation. The diff is still correct against the segmented text, but the segmentation itself can look unintuitive. If the output is too messy, export both PDFs to plain text with a command-line tool or library (pdftotext, pdfplumber) and diff those.

  • PDFs with restriction-only protection (no-copy, no-print) open normally, because pdf.js can decode them without a password. PDFs that require a password to open need that password removed first via Adobe Acrobat or macOS Preview.

  • Yes, the result is reproducible. The LCS algorithm is fully deterministic: the same two inputs always produce the same diff. There's no AI, no fuzziness, no probabilistic matching. The result is the same across runs, browsers, and machines.
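The memory figure in the size-limit answer above can be checked with a little arithmetic. The sketch below assumes a full (M + 1) × (N + 1) table and a fixed number of bytes per cell; the 4-byte default is an assumption (for example, a 32-bit integer per cell), not a statement about how this tool stores its table.

```python
def lcs_table_cells(lines_a: int, lines_b: int) -> int:
    """Cells in a full LCS table, including the extra zero row and column."""
    return (lines_a + 1) * (lines_b + 1)

def lcs_table_bytes(lines_a: int, lines_b: int, bytes_per_cell: int = 4) -> int:
    """Approximate table size in bytes; bytes_per_cell is an assumed storage width."""
    return lcs_table_cells(lines_a, lines_b) * bytes_per_cell

# Two 500-page contracts at roughly 60 lines per page.
cells = lcs_table_cells(30_000, 30_000)
print(f"{cells:,} cells")                                        # 900,060,001 cells
print(f"{lcs_table_bytes(30_000, 30_000) / 1e9:.1f} GB at 4 bytes per cell")  # 3.6 GB
```

Halving the line count on each side cuts the table to a quarter of the size, which is why documents under 100 pages per side stay comfortably fast.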
