🗜 How PDF Compression Actually Works

A plain-English look at what happens inside a PDF when you shrink it — and how to do it without ruining quality.

By The CrunchyPDF Team · Published June 2, 2026 · Updated July 20, 2026 · ~9 min read

"Compress a PDF" sounds like one simple action, but under the hood there are several very different techniques at play, and they don't all preserve your document equally. Understanding the difference is the key to shrinking a file as much as possible without turning your crisp text into a blurry mess.

First, what's actually inside a PDF?

A PDF is a container format, not a picture and not a document in the way a word processor file is. Inside it you'll typically find a mix of several kinds of objects, each stored and compressed differently:

Text, stored as character codes plus references to fonts — this is usually tiny, often just a few kilobytes for an entire report.
Vector graphics (lines, shapes, logos), described as mathematical paths. These are also small, and they stay sharp at any zoom level.
Raster images (photos, scans, screenshots) stored as grids of pixels — these are almost always the biggest part of a large PDF, often 90% or more of total file size.
Embedded fonts, metadata, annotations, form fields, bookmarks, and sometimes attachments or duplicate resources left over from edits.

When a PDF is surprisingly large, the culprit is nearly always images — high-resolution scans or photos that were dropped in at full size. That single fact drives almost every compression strategy in existence.

1. Read: file opened locally
2. Render: page drawn to canvas
3. Re-encode: canvas to compressed JPEG
4. Rebuild: pages reassembled
5. Download: nothing left your device

Lossless vs lossy: the fundamental split

There are two philosophies of compression, and a PDF tool may use either or both.

Lossless compression

Lossless methods make a file smaller while keeping every bit of the original data recoverable — nothing is thrown away. They work by spotting redundancy: repeated patterns, long runs of identical pixels, or duplicate objects that can be stored once and referenced many times. Removing an unused font, de-duplicating an image that appears on every page, or zipping the internal data streams are all lossless wins. The catch: lossless savings on an already-efficient file are modest — often single-digit to low double-digit percentages.
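To make the lossless idea concrete, here is a minimal Python sketch using the standard zlib module, which implements Deflate, the algorithm behind PDF's Flate stream filter. The sample stream is invented for illustration, and it is deliberately repetitive, so a real PDF stream will usually compress less dramatically.

```python
import zlib

# Invented stand-in for an uncompressed PDF content stream:
# the same drawing instructions repeated many times.
stream = b"0 0 m 100 0 l 100 100 l 0 100 l h S\n" * 500

packed = zlib.compress(stream, 9)      # level 9 = strongest Deflate setting
restored = zlib.decompress(packed)

# Lossless means the round trip returns exactly the original bytes.
assert restored == stream

print(f"{len(stream)} bytes -> {len(packed)} bytes")
```

The saving here comes entirely from redundancy. Nothing is discarded, which is why the decompressed bytes match the input exactly.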

Lossy compression

Lossy methods achieve dramatic size reductions by permanently discarding information your eye is unlikely to miss. The classic example is JPEG image compression, which throws away fine detail and subtle color variation. Push it gently and the image looks identical; push it hard and you get visible blocky "artifacts." Lossy compression is where the big savings live — but it's a one-way street. Once detail is gone, it's gone.
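The one-way nature of lossy compression can be shown with a toy example. The sketch below is not JPEG, which quantizes frequency coefficients rather than raw pixels; it only illustrates the shared idea of rounding values onto a coarser grid. The `quantize` and `max_error` helpers and the pixel values are made up for illustration.

```python
def quantize(values, step):
    """Snap each value to the nearest multiple of `step` (ties round up)."""
    return [(v + step // 2) // step * step for v in values]

def max_error(original, approx):
    """Largest absolute difference between matching values."""
    return max(abs(a - b) for a, b in zip(original, approx))

# Hypothetical row of 8-bit grayscale pixels with subtle variation.
row = [118, 121, 119, 124, 122, 120, 117, 123]

gentle = quantize(row, 2)    # small step: values barely move
harsh = quantize(row, 16)    # large step: subtle variation collapses

print(gentle, "max error:", max_error(row, gentle))
print(harsh, "max error:", max_error(row, harsh))
```

With the large step, eight distinct pixel values collapse to just two. That makes the data far easier to compress, but nothing in the output says which original value each pixel had, so the detail cannot be restored.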

The trade-off in one sentence

Lossless keeps everything but saves a little; lossy saves a lot but sacrifices some quality. Most real-world "make this PDF smaller" tasks call for careful lossy compression of the images.

The most common technique: re-rendering pages as images

A very effective and widely-used approach — and the one CrunchyPDF's Compress tool uses — works like this:

Each page is rendered (drawn) onto a canvas, exactly as it would appear on screen.
That canvas is then re-encoded as a JPEG at a quality level you choose.
The compressed page-images are reassembled into a new PDF.

Because JPEG is extremely good at squeezing photographic and scanned content, this can cut an image-heavy file by 40–80%. The quality slider directly controls how aggressive the JPEG step is: higher quality keeps more detail but produces a bigger file; lower quality shrinks the file further at the cost of more visible softening.

Important side effect: when every page becomes an image, the document's selectable text disappears. The words are still visible — they're just now part of a picture, so you can't highlight, copy, or search them, and screen readers can't read them. If keeping searchable text matters, this method isn't the right one for that file.

Other techniques you'll encounter

Downsampling images, for example reducing a 600 dpi scan to 150 dpi, which keeps it sharp on screen while cutting the pixel count (and with it the size) by a factor of 16.
Re-compressing existing images — converting a bulky PNG screenshot to a JPEG, for instance.
Font subsetting — embedding only the characters actually used instead of an entire typeface.
Object de-duplication and stream compression — the lossless housekeeping mentioned above.
Stripping metadata and unused objects — small but free savings.
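The downsampling saving in the list above is simple arithmetic: pixel count scales with the square of the resolution. Here is a quick sketch, assuming a US Letter page (8.5 × 11 inches); the `pixel_count` helper is hypothetical.

```python
def pixel_count(width_in, height_in, dpi):
    """Pixels needed to cover a page of the given size at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)

# Assumed page size: US Letter, 8.5 x 11 inches.
before = pixel_count(8.5, 11, 600)   # 5100 x 6600 pixels
after = pixel_count(8.5, 11, 150)    # 1275 x 1650 pixels

print(f"600 dpi: {before:,} pixels")
print(f"150 dpi: {after:,} pixels")
print(f"reduction factor: {before / after:.0f}x")
```

Dropping the resolution to a quarter cuts the pixel count to one sixteenth, before any JPEG compression is applied on top.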

A short history of PDF compression

PDF was introduced by Adobe in 1993 as a way to preserve a document's exact appearance across different computers and printers, hence the "Portable" in the name. Early PDFs already leaned on established image codecs of the era, particularly JPEG for photographs and CCITT Group 4 fax compression for scanned text, a holdover from fax machines that turned out to be remarkably good at compressing simple black-and-white page scans.

As scanning and digital photography became universal in the 2000s, image-heavy PDFs exploded in size, which pushed the format to add JBIG2 (a more efficient scheme for scanned black-and-white text) and later JPEG2000 support for higher-fidelity lossy and lossless image compression. In 2008, PDF became an open ISO standard (ISO 32000), which stabilized how compression filters are declared inside the file so that any compliant reader can decode them consistently. Today's browser-based tools, including CrunchyPDF, build on the same underlying image codecs that have underpinned PDF compression for more than three decades. The difference is that the processing now happens in JavaScript on your own device instead of in desktop software or on a server.

Common compression myths

"Compressing always destroys quality." Not necessarily — lossless techniques (de-duplication, stream compression, font subsetting) don't touch quality at all, and even lossy compression at a moderate setting is often visually indistinguishable from the original.
"Smaller is always better." Only if the resulting quality still serves your purpose. A contract you'll print and sign needs sharper text than a reference copy you'll only glance at on a phone.
"PDF compression works like ZIP." ZIP-style compression is lossless and general-purpose; it barely shrinks a PDF full of images because JPEG data is already tightly packed. PDF-specific compression instead targets the images themselves.
"You can compress a file as many times as you like with no penalty." Re-compressing an already-lossy JPEG-based PDF repeatedly compounds quality loss each time, the same way repeatedly saving a JPEG photo does. Always compress from the original, not from a previously compressed copy.

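The ZIP myth above is easy to check with Python's standard zlib module. In this sketch, the output of a first Deflate pass serves as a stand-in for JPEG data. That is an assumption: the two are produced very differently, but both are already densely packed, which is the property that matters here. The word list and sizes are invented.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
words = [b"page", b"image", b"font", b"stream", b"object", b"filter", b"text", b"scan"]
raw = b" ".join(rng.choice(words) for _ in range(20000))

once = zlib.compress(raw, 9)     # redundant input: shrinks a lot
twice = zlib.compress(once, 9)   # already-packed input: almost nothing left to find

print(f"raw:   {len(raw)} bytes")
print(f"once:  {len(once)} bytes")
print(f"twice: {len(twice)} bytes")
```

The second pass has no redundancy left to exploit, so it saves essentially nothing. A general-purpose compressor meets the same wall when it is run over a PDF whose bulk is JPEG data.
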
How to choose the right setting

Document type	Suggested approach	Why
Text-heavy report you'll need to search	Lossless tools, or keep quality high	Preserve selectable text and crisp glyphs
Scanned document for email	Moderate lossy (quality ~0.5–0.6)	Scans are images anyway; big savings, still legible
Photo-rich brochure	Lossy, quality ~0.6–0.7	Protect photo quality while trimming size
Archive copy you must keep pristine	Don't compress; keep the original	Loss from lossy compression is permanent

A practical workflow

Start at a middle quality setting, compress, and open the result. If it looks great, try a lower setting and compare — you might save even more. If text or fine lines look fuzzy, step back up. Because the whole process in CrunchyPDF runs in your browser, you can experiment freely: nothing is uploaded, and your original file is never touched.
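The step-down loop in that workflow can be sketched in a few lines of Python. Everything here is hypothetical: `compress` stands for whatever tool produces a compressed copy at a given quality, and `acceptable` stands for your own judgment when you open the result. Neither is a CrunchyPDF API.

```python
from typing import Callable, Optional, Sequence

def lowest_acceptable_quality(
    compress: Callable[[float], bytes],
    acceptable: Callable[[bytes], bool],
    qualities: Sequence[float] = (0.6, 0.5, 0.4, 0.3),
) -> Optional[float]:
    """Walk from higher to lower quality; return the last setting that passed."""
    best = None
    for q in qualities:
        result = compress(q)   # always compress the original, never a prior result
        if not acceptable(result):
            break              # too fuzzy: fall back to the previous setting
        best = q
    return best

# Stub stand-ins so the sketch runs: output size shrinks with quality,
# and anything under 500 bytes counts as "too fuzzy".
fake_sizes = {0.6: 600, 0.5: 500, 0.4: 400, 0.3: 300}
chosen = lowest_acceptable_quality(
    compress=lambda q: bytes(fake_sizes[q]),
    acceptable=lambda data: len(data) >= 500,
)
print(chosen)
```

In practice the "acceptable" check is you looking at the page, so treat the function as a description of the habit rather than something to automate blindly.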

Quick questions about compression

Can I compress a PDF without losing searchable text?

With CrunchyPDF's Compress tool specifically, no — it works by rendering pages to images, so searchable text is not preserved in the output. If you need to shrink a file while keeping text selectable, look for a lossless "optimize" pass instead, or keep the original as your searchable master and only compress the copy you're sending.

Is there a limit to how much a PDF can shrink?

Practically, yes. Once redundant data and easily-discarded detail are gone, further compression yields diminishing returns and increasingly visible artifacts. A file that's already efficient (mostly text, few images) may only shrink a little no matter the setting.

Does compressing a PDF remove viruses or malware?

No. Compression changes file size and image quality; it does not scan for or remove malicious content. Treat compression and security scanning as two separate concerns.

🗜 Ready to try it? Open the free Compress tool — your file never leaves your browser.
