Compression Algorithms Explained

Every time you visit a website, download a package, or store a database backup, compression is saving bandwidth, disk space, and time. But choosing the wrong algorithm can mean your API responses are 3x larger than needed, or your build pipeline takes 10 minutes instead of 1. This guide explains how compression actually works, benchmarks every major algorithm, and tells you exactly which one to use.

How Compression Works (The Fundamentals)

All compression algorithms exploit one idea: data has patterns, and patterns can be represented more efficiently. There are two fundamental approaches:

Two Types of Compression

🔄 Lossless Compression

✅ Original data perfectly recoverable
💾 Used for: text, code, databases, archives
📊 Typical ratio: 2x-10x smaller
🎯 Algorithms: gzip, zstd, brotli, lz4, xz

🎨 Lossy Compression

⚠ Some data permanently lost
💾 Used for: images, audio, video
📊 Typical ratio: 10x-100x smaller
🎯 Algorithms: JPEG, MP3, H.264, WebP

This guide focuses on lossless compression - the type used in web servers, databases, log files, and data pipelines.

The Core Techniques

Almost every compression algorithm uses a combination of these two techniques: dictionary matching (LZ77/LZ78) followed by entropy coding (such as Huffman).

How Lossless Compression Works

🔄 LZ77/LZ78 (replace repeats with references) → 📊 Huffman (short codes for common bytes) → 📦 Output (compressed data)
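To make the "short codes for common bytes" stage concrete, here is a minimal sketch that builds Huffman code lengths for a short string. The function name `huffman_code_lengths` is made up for illustration, and real encoders such as DEFLATE add constraints (for example a maximum code length) that this sketch ignores.

```python
import heapq
from collections import Counter

def huffman_code_lengths(data: bytes) -> dict:
    """Return {byte value: code length in bits} for a Huffman code over data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (total weight, unique tiebreak, {symbol: depth in subtree}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol inside moves one level deeper.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = b"the cat sat on the mat on the flat"
lengths = huffman_code_lengths(text)
encoded_bits = sum(lengths[b] for b in text)
print(f"Fixed 8-bit encoding: {len(text) * 8} bits")
print(f"Huffman encoding:     {encoded_bits} bits (plus the code table)")
```

Frequent symbols such as the space end up with the shortest codes, which is where the savings come from.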

# LZ77 in action (simplified):
# Original:  "the cat sat on the mat on the flat"
# Step 1:    "the cat sat on [ref:0,4]mat on [ref:0,4]flat"
#            Each repeated "the " becomes a back-reference [ref:position,length]
#            to the first occurrence (real encoders store a backward distance)
# Step 2:    Huffman coding assigns shorter bit sequences
#            to common characters (t, e, space)
# Result:    fewer bits than the original; savings grow with input size and repetition

# This is what gzip's DEFLATE does internally:
# LZ77 (find repeated patterns) + Huffman (encode efficiently)
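The back-reference step can be sketched as a toy encoder and decoder. This is a deliberately slow, greedy illustration rather than DEFLATE's actual matcher: the function names, the token format (a literal byte or a `(distance, length)` tuple), and the `window` and `min_match` defaults are all assumptions made for this example.

```python
def lz77_compress(data: bytes, window: int = 4096, min_match: int = 3) -> list:
    """Greedy toy LZ77: emit literal bytes or (distance, length) back-references."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the sliding window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlap), as in real LZ77.
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens

def lz77_decompress(tokens: list) -> bytes:
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte by byte so overlapping copies work
        else:
            out.append(tok)
    return bytes(out)

text = b"the cat sat on the mat on the flat"
tokens = lz77_compress(text)
restored = lz77_decompress(tokens)
print(f"{len(text)} input bytes -> {len(tokens)} tokens")
```

Here the second "the " is emitted as the token `(15, 4)`: copy 4 bytes starting 15 bytes back. Production encoders replace the brute-force search with hash chains or similar structures.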

The Algorithms: A Complete Guide

gzip / zlib (The Universal Standard)

Born: 1992. Algorithm: DEFLATE (LZ77 + Huffman). The most widely supported compression format in computing: virtually every web server, browser, and programming language supports gzip.

# Python gzip
import gzip

# Compress
data = b"Hello World! " * 10000
compressed = gzip.compress(data, compresslevel=9)
print(f"Original:   {len(data):,} bytes")
print(f"Compressed: {len(compressed):,} bytes")
print(f"Ratio:      {len(data)/len(compressed):.1f}x")
# Original:   130,000 bytes
# Compressed:  263 bytes
# Ratio:      494.3x (highly repetitive data)

# Decompress
original = gzip.decompress(compressed)
assert original == data

# Command line
# gzip file.txt          # Compresses to file.txt.gz
# gzip -d file.txt.gz    # Decompress
# gzip -9 file.txt       # Maximum compression
# gzip -1 file.txt       # Fastest compression
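For files too large to hold in memory, the same module can stream data in chunks. The sketch below uses only the standard library; the helper names `gzip_file` and `gunzip_file`, the 1 MiB chunk size, and the sample log contents are illustrative choices, not part of any API.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

def gzip_file(src, dst, level=6, chunk_size=1 << 20):
    """Compress src to dst in chunks so memory use stays flat."""
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout, chunk_size)

def gunzip_file(src, dst, chunk_size=1 << 20):
    """Decompress src to dst in chunks."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)

# Demo on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "app.log"
    original.write_bytes(b"GET /index.html 200\n" * 50_000)
    packed = Path(tmp) / "app.log.gz"
    restored = Path(tmp) / "app.restored.log"

    gzip_file(original, packed)
    gunzip_file(packed, restored)

    original_size = original.stat().st_size
    packed_size = packed.stat().st_size
    round_trip_ok = restored.read_bytes() == original.read_bytes()

print(f"{original_size:,} bytes -> {packed_size:,} bytes, round trip ok: {round_trip_ok}")
```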

Zstandard (zstd) - The Modern Champion

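Born: 2016 (Facebook). Algorithm: LZ77 variant + Finite State Entropy + Huffman. At comparable compression ratios it is several times faster than gzip, and at comparable speeds it compresses better. Its highest levels (such as -19) compress more slowly than gzip but decompress just as fast as the low levels. It's replacing gzip across the industry - used by the Linux kernel, Facebook, Cloudflare, and many databases.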

# pip install zstandard
import zstandard as zstd

# Compress (default level 3 - balanced speed/ratio)
compressor = zstd.ZstdCompressor(level=3)
compressed = compressor.compress(data)
print(f"zstd (lvl 3): {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")

# Compress (maximum - level 22)
compressor = zstd.ZstdCompressor(level=22)
compressed_max = compressor.compress(data)
print(f"zstd (lvl 22): {len(compressed_max):,} bytes, {len(data)/len(compressed_max):.1f}x")

# Decompress (always fast regardless of compression level!)
decompressor = zstd.ZstdDecompressor()
original = decompressor.decompress(compressed)

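# Dictionary compression - for small, similar data (like JSON API payloads)
# Train a dictionary on sample data, then compress new data using it
# Can improve the ratio 2-5x on small payloads (< 4KB)
# Training needs many representative samples; a handful raises ZstdError.
# The synthetic samples below are stand-ins - train on real payloads instead.
samples = [b'{"user_id":%d,"name":"user%d"}' % (i, i) for i in range(10000)]
dict_data = zstd.train_dictionary(16384, samples)
compressor = zstd.ZstdCompressor(dict_data=dict_data)
# Decompression must use the same dictionary
decompressor = zstd.ZstdDecompressor(dict_data=dict_data)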

# Command line
# zstd file.txt           # Compress to file.txt.zst
# zstd -d file.txt.zst    # Decompress
# zstd -19 file.txt       # High compression
# zstd -T0 file.txt       # Use all CPU cores (parallel!)
# zstd --train *.json -o dict  # Train dictionary
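Why a shared dictionary helps on small payloads can be shown with the standard library alone. Note the substitution: this sketch uses zlib's preset-dictionary feature (`zdict`) rather than zstd's trained dictionaries, and the hand-written `shared` dictionary and the sample payload are made up for illustration.

```python
import zlib

def compress_with_dict(payload: bytes, zdict: bytes) -> bytes:
    """Deflate payload with a preset dictionary both sides already hold."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(payload) + c.flush()

def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Inflate a blob produced by compress_with_dict using the same dictionary."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

# Hand-written "dictionary": byte strings expected to recur in every payload.
shared = b'{"user_id":,"name":"","email":"@example.com","active":true}'

payload = b'{"user_id":42,"name":"Carol","email":"carol@example.com","active":true}'
plain = zlib.compress(payload, 9)
with_dict = compress_with_dict(payload, shared)

print(f"payload:         {len(payload)} bytes")
print(f"zlib, no dict:   {len(plain)} bytes")
print(f"zlib, with dict: {len(with_dict)} bytes")
assert decompress_with_dict(with_dict, shared) == payload
```

A single small message contains too little repetition for LZ matching to find much on its own; the dictionary supplies the shared context up front. zstd's trainer automates choosing that content from samples.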

Brotli - The Web Optimization King

Born: 2015 (Google). Algorithm: LZ77 + Huffman + 2nd-order context modeling + static dictionary of common web strings. Designed specifically for web content. Built-in dictionary includes common HTML, CSS, JS, and JSON patterns - compresses web assets 15-25% better than gzip.

# pip install brotli
import brotli

# Compress (quality 0-11, default 11)
compressed = brotli.compress(data, quality=11)
print(f"Brotli (q11): {len(compressed):,} bytes")

# Fast compression
compressed_fast = brotli.compress(data, quality=1)
print(f"Brotli (q1):  {len(compressed_fast):,} bytes")

# Web server usage (nginx):
# brotli on;
# brotli_comp_level 6;
# brotli_types text/css application/javascript application/json;
# (text/html is always compressed; listing it again triggers a duplicate MIME type warning)

# All modern browsers support Brotli:
# Request:  Accept-Encoding: gzip, deflate, br
# Response: Content-Encoding: br

LZ4 - The Speed Demon

Born: 2011. Algorithm: LZ77 variant optimized for speed, with no entropy-coding stage. One of the fastest general-purpose compressors available: it compresses at 500+ MB/s and decompresses at 3+ GB/s. Used when speed matters more than ratio - real-time logging, in-memory caches, network protocols.

# pip install lz4
import lz4.frame

# Compress (blazing fast!)
compressed = lz4.frame.compress(data)
print(f"LZ4: {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")

# Decompress (even faster!)
original = lz4.frame.decompress(compressed)

# LZ4 HC (High Compression) - slower but better ratio
compressed_hc = lz4.frame.compress(data, compression_level=lz4.frame.COMPRESSIONLEVEL_MAX)
print(f"LZ4 HC: {len(compressed_hc):,} bytes")

# Command line
# lz4 file.txt            # Compress
# lz4 -d file.txt.lz4     # Decompress
# lz4 -9 file.txt         # High compression mode

Snappy - Google's Fast Compressor

Born: 2011 (Google). Similar goals to LZ4 - extremely fast compression/decompression. Used internally by Google, and in many data systems (Cassandra, MongoDB, Kafka, Parquet files).

# pip install python-snappy
import snappy

compressed = snappy.compress(data)
print(f"Snappy: {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")
original = snappy.decompress(compressed)

bzip2 - Maximum Compression (Legacy)

Born: 1996. Algorithm: Burrows-Wheeler Transform + Huffman. Better compression than gzip but much slower. Mostly replaced by zstd and xz.

import bz2

compressed = bz2.compress(data, compresslevel=9)
print(f"bzip2: {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")

# Command line: bzip2 file.txt / bunzip2 file.txt.bz2
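The Burrows-Wheeler Transform is what sets bzip2 apart from the LZ77 family, so a toy version may help. This is a naive sketch that sorts all rotations (quadratic memory, fine only for tiny inputs); the function names and the `$` end marker are choices made for this example, and bzip2 itself uses a far more efficient block-sorting implementation plus further stages.

```python
def bwt(text: str, end: str = "$") -> str:
    """Naive Burrows-Wheeler Transform: last column of the sorted rotations."""
    assert end not in text, "end marker must not occur in the input"
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(last_column: str, end: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith(end))
    return original[:-1]

transformed = bwt("banana")
print(transformed)               # annb$aa
print(inverse_bwt(transformed))  # banana
```

The transform does not shrink anything by itself. It groups identical characters together (the two n's and the a's in "banana"), which makes the later stages, including Huffman coding, much more effective.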

xz / LZMA - Maximum Compression

Born: 2001 (LZMA), 2009 (xz). Among the highest compression ratios of any widely deployed general-purpose compressor. Used for software distribution (.tar.xz), where small download size matters more than compression speed.

import lzma

compressed = lzma.compress(data, preset=9)
print(f"xz/LZMA: {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")

# Command line: xz file.txt / unxz file.txt.xz
# tar cJf archive.tar.xz directory/  # Create .tar.xz archive
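The `tar cJf` command has a standard-library equivalent in Python's `tarfile` module. A minimal sketch, where the helper names `make_tar_xz` and `list_tar_xz` and the sample directory contents are illustrative:

```python
import tarfile
import tempfile
from pathlib import Path

def make_tar_xz(src_dir: Path, archive: Path, preset: int = 6) -> None:
    """Create an xz-compressed tar archive of src_dir (like `tar cJf`)."""
    with tarfile.open(archive, "w:xz", preset=preset) as tar:
        tar.add(src_dir, arcname=src_dir.name)

def list_tar_xz(archive: Path) -> list:
    """Return the member names stored in an xz-compressed tar archive."""
    with tarfile.open(archive, "r:xz") as tar:
        return tar.getnames()

# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "directory"
    src.mkdir()
    (src / "notes.txt").write_text("hello " * 1000)

    archive = Path(tmp) / "archive.tar.xz"
    make_tar_xz(src, archive)

    names = list_tar_xz(archive)
    archive_size = archive.stat().st_size

print(names)
print(f"archive size: {archive_size:,} bytes")
```

The `preset` argument maps to the xz levels 0-9; higher presets trade compression time and memory for a smaller archive.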

Real Benchmarks

These benchmarks use a 10 MB JSON file (typical API response data) and a 10 MB log file (typical server logs). Measured on a modern CPU.

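Compression Ratio (Smaller = Better)

[Bar chart: compressed size of the 10 MB JSON file, lower is better. Bars in chart order: xz -9, zstd -19, brotli -11, bzip2 -9, gzip -9, zstd -3, lz4, snappy. Bar values were not preserved; see the Complete Benchmark Table for measured ratios.]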

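Compression Speed (Higher = Faster)

[Bar chart: compression speed in MB/s, higher is faster. Bars in chart order: lz4, snappy, zstd -1, zstd -3, gzip -6, brotli -11, bzip2 -9, xz -9. Bar values were not preserved; see the Complete Benchmark Table for measured speeds.]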

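Decompression Speed (Higher = Faster)

[Bar chart: decompression speed in MB/s, higher is faster. Bars in chart order: lz4, snappy, zstd, gzip, brotli, bzip2. Bar values were not preserved; see the Complete Benchmark Table for measured speeds.]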

Complete Benchmark Table

Algorithm Comparison (10 MB JSON, single-threaded)

Algorithm	Ratio	Compress	Decompress	Year	Best For
lz4	2.1x	780 MB/s	4500 MB/s	2011	Real-time, caches, databases
snappy	1.8x	530 MB/s	1800 MB/s	2011	Kafka, Parquet, Cassandra
zstd -3 ⭐	3.5x	200 MB/s	1400 MB/s	2016	General purpose (best default!)
gzip -6	3.2x	50 MB/s	500 MB/s	1992	Legacy compatibility
brotli -6	3.8x	40 MB/s	400 MB/s	2015	Static web assets (pre-compressed)
zstd -19	4.5x	15 MB/s	1400 MB/s	2016	Archives, backups (compress once, decompress many)
bzip2 -9	3.6x	10 MB/s	80 MB/s	1996	Legacy (use zstd instead)
xz -9	5.0x	5 MB/s	200 MB/s	2009	Software distribution (.tar.xz)
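Numbers like these depend heavily on the data and the hardware, so it is worth re-measuring on your own payloads. The harness below is a rough sketch limited to the codecs in Python's standard library (gzip, bzip2, xz); the `CODECS` table, the `benchmark` function, and the synthetic JSON-like sample are assumptions for illustration, and a single timed run per codec is only a coarse estimate.

```python
import bz2
import gzip
import lzma
import time

# name -> (compress function, decompress function)
CODECS = {
    "gzip -6": (lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
    "bzip2 -9": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    "xz -6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}

def benchmark(data: bytes) -> list:
    """Time one compress/decompress round trip per codec and report MB/s."""
    megabytes = len(data) / 1e6
    results = []
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        blob = compress(data)
        t1 = time.perf_counter()
        restored = decompress(blob)
        t2 = time.perf_counter()
        assert restored == data, f"{name} round trip failed"
        results.append({
            "name": name,
            "ratio": len(data) / len(blob),
            "compress_mb_s": megabytes / max(t1 - t0, 1e-9),
            "decompress_mb_s": megabytes / max(t2 - t1, 1e-9),
        })
    return results

# Synthetic JSON-like sample; replace with a real file read in binary mode.
sample = b"".join(
    b'{"user_id": %d, "status": "active", "score": %d}\n' % (i, i * 7 % 100)
    for i in range(20000)
)

results = benchmark(sample)
for r in results:
    print(f'{r["name"]:<10} {r["ratio"]:5.1f}x  '
          f'{r["compress_mb_s"]:8.1f} MB/s  {r["decompress_mb_s"]:8.1f} MB/s')
```

Third-party codecs such as zstd, lz4, brotli, and snappy can be added to `CODECS` the same way once their packages are installed.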

Web Compression: What Your Server Should Use

# nginx configuration for optimal web compression:

# Enable gzip (universal fallback)
gzip on;
gzip_vary on;
gzip_comp_level 6;
# text/html is always compressed; listing it again triggers a duplicate MIME type warning
gzip_types text/css application/javascript application/json
           text/xml application/xml image/svg+xml;

# Enable Brotli (roughly 15-25% better than gzip for web content)
# Requires ngx_brotli module
brotli on;
brotli_comp_level 6;
# text/html is always compressed, so it is not listed here
brotli_types text/css application/javascript application/json
             text/xml application/xml image/svg+xml;

# How it works:
# Browser sends:  Accept-Encoding: gzip, deflate, br
# Server checks:  Does client support br (Brotli)?
#   Yes -> Content-Encoding: br  (best compression)
#   No  -> Content-Encoding: gzip (fallback)

# Pre-compress static assets at build time (maximum compression)
# brotli -q 11 dist/main.js -o dist/main.js.br
# gzip -9 dist/main.js -c > dist/main.js.gz
# With gzip_static on; and brotli_static on; nginx serves the pre-compressed
# files directly (no compression CPU cost per request)
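The negotiation described in the "How it works" comments can be sketched in a few lines of Python. This is a simplified illustration, not nginx's implementation: the function names are hypothetical, and the server simply walks its own preference order and skips anything the client omitted or marked `q=0`, without comparing relative q-values.

```python
def parse_accept_encoding(header: str) -> dict:
    """Parse an Accept-Encoding header into {encoding: q-value}."""
    prefs = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, params = part.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs[name.strip().lower()] = q
    return prefs

def choose_encoding(header: str, server_order=("br", "gzip")) -> str:
    """Pick the first server-preferred encoding the client accepts."""
    prefs = parse_accept_encoding(header)
    for encoding in server_order:
        q = prefs.get(encoding, prefs.get("*", 0.0))
        if q > 0:
            return encoding
    return "identity"  # send the response uncompressed

print(choose_encoding("gzip, deflate, br"))  # br
print(choose_encoding("gzip, deflate"))      # gzip
print(choose_encoding(""))                   # identity
```

Whichever encoding is chosen goes into the `Content-Encoding` response header, and `Vary: Accept-Encoding` (what `gzip_vary on;` adds) tells caches to keep the variants apart.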

Database & Storage Compression

# PostgreSQL: enable compression on TOAST (large values)
# Automatic - values > 2KB are compressed with pglz (LZ-family)

# PostgreSQL 15+: zstd compression for WAL and base backups
# (client-side backup compression requires tar format)
pg_basebackup --format=tar --compress=zstd:3 -D /backups/latest

# Redis: no built-in compression, but compress at app level
import redis
import zstandard as zstd

r = redis.Redis()
compressor = zstd.ZstdCompressor(level=3)
decompressor = zstd.ZstdDecompressor()

# Store compressed
data = b'{"user": "Alice", "orders": [...]}'
r.set("user:1", compressor.compress(data))

# Retrieve and decompress
compressed = r.get("user:1")
original = decompressor.decompress(compressed)

# Kafka: compression per-topic
# Producer config: compression.type=zstd (or lz4, snappy, gzip)
# zstd gives best ratio, lz4 gives best throughput

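# Parquet files: columnar format + compression per column
import pyarrow as pa
import pyarrow.parquet as pq

# Small example table; substitute your own data
table = pa.table({"user_id": [1, 2], "name": ["Alice", "Bob"]})
pq.write_table(table, "data.parquet", compression="zstd")
# Snappy is the default; zstd often gives 20-30% better compression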
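When compressing at the application level, as in the Redis example, very small values often grow instead of shrinking, so a common pattern is to compress only above a size threshold and record the choice in a marker byte. The sketch below is one way to do that. It uses the standard library's zlib as a stand-in for zstd so it runs without extra packages, and the `pack`/`unpack` helpers, the one-byte markers, and the 256-byte threshold are all assumptions for illustration.

```python
import zlib

RAW = b"\x00"         # marker: value stored as-is
COMPRESSED = b"\x01"  # marker: value stored zlib-compressed

def pack(value: bytes, min_size: int = 256, level: int = 6) -> bytes:
    """Compress only when the value is large enough and actually shrinks."""
    if len(value) >= min_size:
        blob = zlib.compress(value, level)
        if len(blob) < len(value):
            return COMPRESSED + blob
    return RAW + value

def unpack(stored: bytes) -> bytes:
    """Reverse pack(), dispatching on the marker byte."""
    marker, body = stored[:1], stored[1:]
    if marker == COMPRESSED:
        return zlib.decompress(body)
    if marker == RAW:
        return body
    raise ValueError(f"unknown marker byte: {marker!r}")

small = b'{"user": "Alice"}'
large = b'{"user": "Alice", "orders": [' + b'{"sku": "A1", "qty": 2},' * 200 + b"]}"

stored_small = pack(small)
stored_large = pack(large)
print(f"small: {len(small)} -> {len(stored_small)} bytes (stored raw)")
print(f"large: {len(large)} -> {len(stored_large)} bytes (compressed)")
```

The packed bytes are what would be passed to `r.set(...)`, with `unpack` applied to the result of `r.get(...)`. The marker byte also leaves room to switch codecs later without rewriting existing keys.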

The Decision Guide

Which Compression Algorithm Should You Use?

Use Case	Best Algorithm	Why
Web server (dynamic)	gzip (or zstd where the client accepts it)	Fast compression per-request; gzip has universal browser support, zstd only in recent browsers
Web server (static assets)	Brotli -11 (pre-compressed)	Best ratio for web, compress once at build time
General purpose / default	zstd -3	Best speed/ratio tradeoff in 2026. Replace gzip with this.
Real-time / latency-critical	lz4	Fastest compression and decompression available
Kafka / message queues	lz4 or zstd	lz4 for throughput, zstd for ratio
Database backups	zstd -19	Best ratio, slow compress is fine (backup once), fast decompress for restores
Software distribution	xz -9 or zstd -19	Smallest download size, compress once
Parquet / columnar data	zstd (or snappy default)	zstd gives 20-30% better compression than snappy
Legacy / universal compat	gzip	Everything supports gzip. Use when nothing else is available.

The Bottom Line

If you remember nothing else:

Default choice in 2026: Use zstd. At comparable ratios it's several times faster than gzip (200 MB/s vs 50 MB/s in the benchmark table), and it decompresses faster at every level. It's supported by Linux, most databases, and major cloud providers.
Web assets: Use Brotli for static files (pre-compressed at build time) and gzip as a fallback for old browsers.
Need maximum speed: Use lz4. Nothing else comes close for latency-sensitive workloads.
Need smallest file: Use xz or zstd -19. Compression is slow but the result is tiny.
Stop using bzip2. zstd is better in every dimension - faster compression, faster decompression, and comparable ratio.

Compression is one of the highest-leverage optimizations in software engineering. Choosing the right algorithm for your workload can cut storage costs by 70%, reduce network transfer times by 80%, and speed up data pipelines by 10x. The benchmarks above give you the data - now pick the right tool for your specific use case.
