PDF Image Extractor

Pull every embedded JPEG image out of a PDF at original quality. Browser-only — nothing uploaded.

💡 Original quality, no re-encode: for JPEG images (the format ~95% of PDFs use), this tool extracts the raw byte stream — which IS a complete JPEG file. The output is bit-identical to what the PDF embedder originally inserted, with zero re-compression loss.
🔒 PDFs stay on your device. Image extraction walks the PDF's indirect-object table entirely in browser memory using self-hosted pdf-lib. Nothing is uploaded — verify in DevTools → Network.

How to extract images from a PDF

Drop your PDF

The tool walks the PDF's indirect-object table looking for every XObject with subtype /Image — these are the embedded raster images placed on each page.

Browse the gallery

Every JPEG image found shows up as a thumbnail with dimensions and file size. The original embedder's compression quality is preserved — what you see is the JPEG exactly as it lives inside the PDF.

Download — one or all

Click the Download button on each card to grab one image, or click Download all images to trigger every file at once. Files are named {pdfname}-image-001.jpg, -002.jpg, etc.
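The naming pattern above can be sketched in a few lines of Python. This is only an illustration: the function name `output_names` is hypothetical, and the tool's own implementation is not shown on this page.

```python
from pathlib import Path


def output_names(pdf_filename: str, count: int, ext: str = "jpg") -> list[str]:
    """Build names like brochure-image-001.jpg for `count` extracted images."""
    stem = Path(pdf_filename).stem  # drop any directory part and the .pdf suffix
    # Three-digit, zero-padded counter starting at 1, matching the -001, -002 pattern.
    return [f"{stem}-image-{i:03d}.{ext}" for i in range(1, count + 1)]


print(output_names("brochure.pdf", 3))
# ['brochure-image-001.jpg', 'brochure-image-002.jpg', 'brochure-image-003.jpg']
```

Zero-padding keeps the files in extraction order when a file manager sorts them alphabetically.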

What this does NOT extract

Vector graphics (lines, charts, logos drawn as paths), embedded fonts, raw pixel images (uncompressed or FlateDecode-compressed), and CCITT fax-encoded bitmaps. For those, screenshot the rendered page instead — use our PDF to Image tool.

PDF image extraction — original-quality assets without screenshot-and-crop

Most users who want "the images out of this PDF" reach for screenshotting first. That works, but the result is a re-rendered raster — fuzzy when zoomed, locked to whatever DPI the screen was at, stripped of any colour profile. The actual images embedded in a PDF are usually JPEGs, sitting inside the PDF at full original quality. This tool walks the PDF's internal structure and pulls those JPEGs out without re-encoding them — what you download is bit-identical to what the PDF's creator embedded.

How PDFs store images

A PDF's body is a collection of indirect objects, each with a unique reference number. Images are stored as stream objects with a dictionary header marking them as /Subtype /Image. The dictionary carries the image's width, height, colour space, bits-per-component, and a /Filter entry that says how the stream bytes are encoded. By far the most common filter is DCTDecode (Discrete Cosine Transform — the foundation of JPEG). For DCTDecode streams, the raw bytes are a complete JPEG file: copy them out, drop a .jpg extension on, and you have the original image. No decoding, no re-encoding, no loss.
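To make the "copy the bytes out" idea concrete, here is a minimal Python sketch. It is not how this tool works internally (the tool uses pdf-lib and the object table); it is a naive scan of the raw file bytes with regular expressions, and it assumes an unencrypted PDF whose image dictionaries appear in plain text. The function name `extract_jpeg_streams` is made up for illustration.

```python
import re

# Object header, dictionary, then the "stream" keyword. The tempered dot stops
# a match from running past "endobj" into the next object.
STREAM_OBJ = re.compile(
    rb"(\d+)\s+(\d+)\s+obj\s*<<((?:(?!endobj).)*?)>>\s*stream\r?\n", re.DOTALL
)
IS_IMAGE = re.compile(rb"/Subtype\s*/Image\b")
# Accept DCTDecode only when it is the sole filter; a filter chain is not a raw JPEG.
ONLY_DCT = re.compile(rb"/Filter\s*(?:/DCTDecode\b|\[\s*/DCTDecode\s*\])")
# A direct integer length, not an indirect reference such as "/Length 9 0 R".
DIRECT_LENGTH = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")


def extract_jpeg_streams(pdf: bytes) -> dict[int, bytes]:
    """Return {object number: JPEG bytes} for every DCTDecode image stream."""
    found = {}
    for match in STREAM_OBJ.finditer(pdf):
        header = match.group(3)
        if not (IS_IMAGE.search(header) and ONLY_DCT.search(header)):
            continue
        start = match.end()  # first byte after the "stream" line
        length = DIRECT_LENGTH.search(header)
        if length:
            data = pdf[start : start + int(length.group(1))]
        else:
            # Fallback assumption: read up to "endstream" and drop the trailing EOL.
            end = pdf.find(b"endstream", start)
            if end == -1:
                continue
            data = pdf[start:end].rstrip(b"\r\n")
        found[int(match.group(1))] = data  # copied as-is, no decoding
    return found
```

Writing each value to a `.jpg` file is the whole extraction step. A production extractor would resolve objects through the cross-reference table rather than pattern-matching, which is more robust for encrypted or incrementally updated files.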

JPEG images in a PDF are stored as raw JPEG bytes. Pulling them out is just "copy these bytes to a .jpg file." No re-encoding, no quality loss — bit-identical to what the embedder originally placed.
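One way to sanity-check that copied bytes really are a JPEG is to look for the start-of-image marker and read the pixel dimensions from the first start-of-frame segment. The sketch below does that with the Python standard library. `jpeg_size` is an illustrative name, and the parser assumes a well-formed header in which every segment before the frame header carries a length field.

```python
import struct

# Start-of-frame markers carry the image size (all of 0xC0-0xCF except DHT, JPG, DAC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the first SOF segment; raise ValueError otherwise."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker, not a JPEG")
    pos = 2
    while pos + 9 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: compressed data follows, stop looking
            break
        if marker in SOF_MARKERS:
            # Layout: marker(2) length(2) precision(1) height(2) width(2)
            height, width = struct.unpack(">HH", data[pos + 5 : pos + 9])
            return width, height
        (seg_len,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        pos += 2 + seg_len  # the length field counts itself but not the marker
    raise ValueError("no SOF segment found")
```

Nothing here decodes pixel data: the function only walks the segment headers, so it stays fast even on large images.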

The APAC document-asset use case

Image extraction matters most for design and editorial workflows: a Singapore ad agency receives a printed campaign as PDF and wants the source images for repurposing; a Malaysia publisher needs to extract photographs from a magazine layout PDF for archival; an Indonesia marketing team rebuilds an old brochure from a legacy PDF when the original Photoshop files are lost; a Vietnam e-commerce team pulls product photography out of a supplier PDF catalogue; Filipino graphic designers extract image assets from client-supplied PDFs to start new versions; Thai e-magazine archivists pull illustrations out of decades-old PDF issues. In all these workflows, original-quality extraction (no re-encode) is the difference between usable and useless.

What gets extracted, and what doesn't

This tool extracts images compressed with the two most common filters: DCTDecode (JPEG, ~95% of PDF images) and JPXDecode (JPEG 2000, occasional in print workflows). It skips three other image types: FlateDecode-compressed raw pixel data (needs PNG reconstruction with colour-space awareness — out of scope for v1), CCITTFaxDecode-encoded fax bitmaps (needs TIFF wrapper construction), and uncompressed raw pixel data (rare; needs full PNG encoding). For those, the workaround is rendering the page at high DPI via a PDF-to-image tool and then cropping. Vector graphics (logos, line drawings, charts drawn as paths) aren't extractable as bitmap images — they need different tools (Adobe Illustrator, Inkscape) that understand PDF path data.
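The extract-or-skip decision described above amounts to a lookup on the stream's filter. A minimal Python sketch, assuming the filter names have already been read from the image dictionary; `extension_for` is a hypothetical helper, and treating any multi-filter chain as "skip" is an assumption of this example rather than documented behaviour of the tool.

```python
from typing import Optional

# Filters whose stream bytes are already a complete image file.
EXTENSION_BY_FILTER = {"DCTDecode": "jpg", "JPXDecode": "jp2"}


def extension_for(filters: list[str]) -> Optional[str]:
    """Return the output extension if the stream can be copied as-is, else None."""
    if len(filters) != 1:
        # Empty list means uncompressed pixels; several filters mean the JPEG
        # bytes are wrapped in another encoding. Neither is a plain byte copy.
        return None
    return EXTENSION_BY_FILTER.get(filters[0])  # None for Flate, CCITTFax, etc.


print(extension_for(["DCTDecode"]))    # jpg
print(extension_for(["FlateDecode"]))  # None
```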

10 Things to Know About PDF Images

01

A PDF stores images as indirect objects with /Subtype /Image. The same image can appear on multiple pages — the PDF references the single stored object from each page that uses it.

02

JPEGs inside PDFs are stored under the DCTDecode filter. The raw byte stream IS a valid JPEG file — extraction is literally just "copy these bytes."

03

JPEG 2000 (the JPXDecode filter) was meant to replace JPEG but never caught on outside of certain print and medical workflows. Most browsers don't natively render .jp2 files, so you'll likely need a desktop editor such as GIMP or Photoshop to open them.

04

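Print and archival standards restrict which image filters a file may use. PDF/A-1 (based on PDF 1.4) forbids LZW compression and predates JPXDecode, so JPEG 2000 images are only permitted from PDF/A-2 onward. Ordinary JPEG (DCTDecode) images are allowed in both PDF/X and PDF/A files.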

05

Adobe Acrobat's "Export PDF → Image" feature re-renders each page as a screenshot — losing original quality. To get the original-quality embedded JPEGs, use Acrobat's "Export → All Images" or a dedicated extractor like this one.

06

The same JPEG can be reused dozens of times on different pages — a logo in a brochure, a header image in a report. Extraction shows each unique image once, regardless of how many pages reference it.

07

"Inline images" are embedded directly in page content streams rather than as separate XObjects. They're rare and tiny (the PDF spec discourages large inline images) and this tool doesn't extract them in v1.

08

Image masks, the PDF objects that define which parts of an image are transparent, are stored as separate stream objects. The extractor copies only the colour image stream, so you get the image without its mask applied and any transparent regions come out opaque.

09

The pdfimages CLI tool from the Poppler project is the closest desktop equivalent of this tool. It handles more filter types (including FlateDecode and CCITTFax streams) and is widely used in forensic PDF analysis.

10

Copyright reminder: extracting images from a PDF doesn't grant you a licence to use them. Photos, illustrations, and logos in PDFs are usually copyrighted — extract for personal use, document analysis, or workflows where you already own the rights.
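Items 01 and 06 describe one stored image object being referenced from many pages. The Python sketch below shows that relationship with plain data structures. The input `page_refs` (page number mapped to the image object numbers its resources mention) is hypothetical sample data, not something this tool exposes.

```python
from collections import defaultdict


def index_images(page_refs: dict[int, list[int]]) -> dict[int, list[int]]:
    """Map each image object number to the sorted list of pages that reference it."""
    pages_by_image = defaultdict(set)
    for page, object_numbers in page_refs.items():
        for obj in object_numbers:
            pages_by_image[obj].add(page)  # a set ignores repeat uses on one page
    return {obj: sorted(pages) for obj, pages in sorted(pages_by_image.items())}


# Object 7 is a logo used on every page; 9 and 12 each appear once.
refs = {1: [7, 9], 2: [7], 3: [7, 12]}
print(index_images(refs))
# {7: [1, 2, 3], 9: [1], 12: [3]}
```

The keys of the result are the unique images an extractor would show. The values are the per-page information that requires visiting every page's resource dictionary.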

FAQ

  • No. The pdf-lib library walks the PDF's structure entirely in your browser. Images are copied from memory to download blobs without ever touching a server. Open DevTools → Network — there's zero outbound traffic.

  • Three possibilities: (1) the "images" are actually vector graphics (lines, paths, shapes drawn by PDF instructions, not raster images) — extract those with Adobe Illustrator instead. (2) The images use a filter v1 doesn't handle yet (FlateDecode, CCITTFax). (3) The PDF is a scanned document where each page is one huge image stored as a JPEG — in which case the tool will find one image per page.

  • No — for JPEG (DCTDecode) and JPEG 2000 (JPXDecode), the raw stream bytes from the PDF ARE the original image file. The download is bit-identical to what the PDF's creator embedded. This is fundamentally different from screenshot-the-page approaches, which re-render and lose quality.

  • Vector graphics aren't extractable as bitmaps — they're sequences of drawing instructions, not pixel arrays. To capture them: (a) screenshot the rendered page (loses resolution), (b) open the PDF in Adobe Illustrator which can edit vector elements directly, or (c) use Inkscape's "Import PDF" feature which preserves vector data.

  • PDF deduplicates images at storage. The same logo on 100 pages is stored once and referenced 100 times. The extractor shows each unique stored image once, which is usually what you want — extracting 100 copies of the same logo would be wasteful.

  • Not in v1. PDFs store images as global resources, not per-page assets — building a page→image map requires walking every page's resource dictionary which adds complexity for a feature most users don't need. If you need this, please request it via /request-a-tool.

  • Restriction-only encryption (no-copy, no-print) is bypassed via pdf-lib's ignoreEncryption flag — extraction works. Open-password encryption requires the password to be removed first via Adobe Acrobat or macOS Preview.

  • PDF Image Extractor (this tool): pulls the original embedded raster images out, preserving their exact original quality. PDF to Image: renders each whole page as a screenshot — useful for visual archiving or page previews, but loses original image quality.

  • Depends on the source. Extraction doesn't transfer copyright. If you own the PDF (your own design files, your company's PDFs), extracting your own assets is fine. Photos and illustrations in third-party PDFs are usually copyrighted — extract for analysis or personal reference, not for redistribution without permission.

  • For JPEG/JP2 extractions: yes — the bytes are copied unchanged, so any EXIF or embedded ICC data the embedder included survives. Most PDF embedders strip EXIF during PDF assembly (Adobe Distiller, InDesign do), so don't expect a treasure trove of camera data. ICC colour profiles are more commonly preserved, especially in print-workflow PDFs.
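To check whether an extracted JPEG actually carries EXIF or ICC data, you can walk its header segments and look for the standard identifiers: an APP1 segment starting with `Exif` and an APP2 segment starting with `ICC_PROFILE`. A minimal Python sketch, with `metadata_segments` as an illustrative name and the assumption that the file has a well-formed header:

```python
def metadata_segments(data: bytes) -> dict[str, bool]:
    """Report whether a JPEG contains an EXIF (APP1) or ICC profile (APP2) segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker, not a JPEG")
    found = {"exif": False, "icc": False}
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: no more header segments after this
            break
        seg_len = int.from_bytes(data[pos + 2 : pos + 4], "big")
        payload = data[pos + 4 : pos + 2 + seg_len]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            found["exif"] = True
        elif marker == 0xE2 and payload.startswith(b"ICC_PROFILE\x00"):
            found["icc"] = True
        pos += 2 + seg_len  # the length field counts itself but not the marker
    return found
```

A result of `{"exif": False, "icc": True}` would match the common case described above: camera data stripped during PDF assembly, colour profile kept.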
