Every time you visit a website, download a package, or store a database backup, compression is saving bandwidth, disk space, and time. But choosing the wrong algorithm can mean your API responses are 3x larger than needed, or your build pipeline takes 10 minutes instead of 1. This guide explains how compression actually works, benchmarks every major algorithm, and tells you exactly which one to use.
How Compression Works (The Fundamentals)
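All compression algorithms exploit one idea: data has patterns, and patterns can be represented more efficiently. There are two fundamental approaches: lossless compression, which reconstructs the original data exactly, and lossy compression, which discards some information to save more space.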
This guide focuses on lossless compression — the type used in web servers, databases, log files, and data pipelines.
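To make the "patterns" idea concrete, here is a minimal sketch using Python's standard `zlib` module (picked only because it ships with Python; the sample inputs are made up). Repetitive input shrinks dramatically, while random bytes, which contain no patterns to exploit, do not shrink at all. In both cases the round trip returns the exact original, which is what lossless means.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Return the zlib-compressed size, after checking the round trip is lossless."""
    packed = zlib.compress(data, 6)
    assert zlib.decompress(packed) == data  # lossless: exact bytes come back
    return len(packed)


repetitive = b"GET /index.html 200\n" * 5000  # strong, repeating patterns
random_bytes = os.urandom(len(repetitive))    # no patterns to exploit

repetitive_size = compressed_size(repetitive)
random_size = compressed_size(random_bytes)

print(f"repetitive: {len(repetitive):,} -> {repetitive_size:,} bytes")
print(f"random:     {len(random_bytes):,} -> {random_size:,} bytes")
```

The practical takeaway: already-compressed or encrypted data looks random, so compressing it again mostly wastes CPU.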
The Core Techniques
Almost every compression algorithm uses a combination of these three techniques:
# LZ77 in action (simplified):
# Original: "the cat sat on the mat on the flat"
# Step 1: "the cat sat on [ref:0,4] mat on [ref:0,4] flat"
# Repeated "the " replaced with back-references
# Step 2: Huffman coding assigns shorter bit sequences
# to common characters (t, e, space)
# Result: ~40% smaller
# This is, in essence, what gzip does internally:
# LZ77 (find repeated patterns) + Huffman (encode efficiently)
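The back-reference step above can be sketched as a toy LZ77 in a few lines of Python. This is an illustration under simplifying assumptions, not what gzip ships: it searches the window by brute force, emits Python tuples instead of packed bits, and skips the Huffman stage entirely. The function names and the `window`, `min_match`, and `max_match` parameters are invented for this example.

```python
def lz77_compress(data: bytes, window: int = 4096, min_match: int = 3, max_match: int = 255):
    """Greedy toy LZ77: emit literal bytes and (distance, length) back-references."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the sliding window for the longest earlier match.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))  # "copy best_len bytes from best_dist back"
            i += best_len
        else:
            tokens.append(data[i])  # literal byte (an int)
            i += 1
    return tokens


def lz77_decompress(tokens) -> bytes:
    """Rebuild the original bytes from literals and back-references."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])  # byte-by-byte so overlapping copies work
        else:
            out.append(token)
    return bytes(out)


text = b"the cat sat on the mat on the flat"
tokens = lz77_compress(text)
print(tokens)
print(f"{len(text)} bytes -> {len(tokens)} tokens")
assert lz77_decompress(tokens) == text
```

Real implementations replace the brute-force search with hash chains or similar structures, which is where most of the speed differences between compressors come from.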
The Algorithms: A Complete Guide
gzip / zlib (The Universal Standard)
Born: 1992. Algorithm: DEFLATE (LZ77 + Huffman). The most widely supported compression format in computing. Virtually every web server, browser, and programming language supports gzip.
# Python gzip
import gzip
# Compress
data = b"Hello World! " * 10000
compressed = gzip.compress(data, compresslevel=9)
print(f"Original: {len(data):,} bytes")
print(f"Compressed: {len(compressed):,} bytes")
print(f"Ratio: {len(data)/len(compressed):.1f}x")
# Original: 130,000 bytes
# Compressed: 263 bytes
# Ratio: 494.3x (highly repetitive data)
# Decompress
original = gzip.decompress(compressed)
assert original == data
# Command line
# gzip file.txt # Compresses to file.txt.gz
# gzip -d file.txt.gz # Decompress
# gzip -9 file.txt # Maximum compression
# gzip -1 file.txt # Fastest compression
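Since this section's title pairs gzip with zlib, it may help to see how the two relate. Both wrap the same DEFLATE stream and differ only in the header and checksum around it. The sketch below uses only the standard library; the helper name `inflate_any` is invented for this example.

```python
import gzip
import zlib

payload = b"Hello World! " * 10000

# Same DEFLATE algorithm, three different containers.
zlib_bytes = zlib.compress(payload, 9)                # zlib wrapper: small header + Adler-32
gzip_bytes = gzip.compress(payload, compresslevel=9)  # gzip wrapper: header + CRC-32 + length
raw = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)  # negative wbits = no wrapper
raw_bytes = raw.compress(payload) + raw.flush()


def inflate_any(blob: bytes) -> bytes:
    """Decompress zlib- or gzip-wrapped data by letting zlib auto-detect the header."""
    return zlib.decompress(blob, wbits=zlib.MAX_WBITS | 32)


print(f"raw DEFLATE: {len(raw_bytes):,} bytes")
print(f"zlib:        {len(zlib_bytes):,} bytes")
print(f"gzip:        {len(gzip_bytes):,} bytes")
assert inflate_any(zlib_bytes) == inflate_any(gzip_bytes) == payload
```

This matters in practice when one system writes zlib-wrapped data and another expects gzip: the payload is compatible, but the wrapper is not unless the reader auto-detects it.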
Zstandard (zstd) — The Modern Champion
Born: 2016 (Facebook). Algorithm: LZ77 variant + Finite State Entropy + Huffman. Typically compresses several times faster than gzip at a comparable or better ratio, and decompresses faster too. It is steadily replacing gzip across the industry and is used by the Linux kernel, Facebook, Cloudflare, and many databases.
# pip install zstandard
import zstandard as zstd
# Compress (default level 3 — balanced speed/ratio)
compressor = zstd.ZstdCompressor(level=3)
compressed = compressor.compress(data)
print(f"zstd (lvl 3): {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")
# Compress (maximum — level 22)
compressor = zstd.ZstdCompressor(level=22)
compressed_max = compressor.compress(data)
print(f"zstd (lvl 22): {len(compressed_max):,} bytes, {len(data)/len(compressed_max):.1f}x")
# Decompress (always fast regardless of compression level!)
decompressor = zstd.ZstdDecompressor()
original = decompressor.decompress(compressed)
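# Dictionary compression: for small, similar data (like JSON API payloads)
# Train a dictionary on sample data, then compress new data using it
# Can achieve a 2-5x better ratio on small payloads (< 4KB)
# Training needs many representative samples; a handful is not enough
samples = [b'{"user_id":%d,"name":"user%d"}' % (i, i) for i in range(10000)]
dict_data = zstd.train_dictionary(16384, samples)
compressor = zstd.ZstdCompressor(dict_data=dict_data)
# The same dictionary is required to decompress
decompressor = zstd.ZstdDecompressor(dict_data=dict_data)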
# Command line
# zstd file.txt # Compress to file.txt.zst
# zstd -d file.txt.zst # Decompress
# zstd -19 file.txt # High compression
# zstd -T0 file.txt # Use all CPU cores (parallel!)
# zstd --train *.json -o dict # Train dictionary
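To see why a shared dictionary helps small payloads, here is a dependency-free sketch that swaps in `zlib`'s preset-dictionary support as a stand-in for zstd's trained dictionaries. The mechanics differ (zlib takes raw bytes with no training step, and zstd dictionaries are generally more effective), but the principle is the same: both sides hold common substrings in advance, so a tiny message can reference them instead of spelling them out. The sample records and helper names are hypothetical.

```python
import zlib

# Hypothetical sample records standing in for real API traffic.
samples = [
    b'{"user_id":1,"name":"Alice","email":"alice@example.com","active":true}',
    b'{"user_id":2,"name":"Bob","email":"bob@example.com","active":false}',
]
# zlib accepts any bytes as a preset dictionary; here we simply concatenate samples.
preset_dict = b"".join(samples)


def compress_with_dict(payload: bytes, zdict: bytes) -> bytes:
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(payload) + compressor.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    # The reader must hold the exact same dictionary as the writer.
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


new_payload = b'{"user_id":3,"name":"Carol","email":"carol@example.com","active":true}'
plain = zlib.compress(new_payload, 9)
with_dict = compress_with_dict(new_payload, preset_dict)

print(f"original:        {len(new_payload)} bytes")
print(f"no dictionary:   {len(plain)} bytes")
print(f"with dictionary: {len(with_dict)} bytes")
assert decompress_with_dict(with_dict, preset_dict) == new_payload
```

The operational cost is the same for zstd: the dictionary becomes part of your data format, so it has to be versioned and shipped to every reader.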
Brotli — The Web Optimization King
Born: 2015 (Google). Algorithm: LZ77 + Huffman + 2nd-order context modeling + static dictionary of common web strings. Designed specifically for web content. Built-in dictionary includes common HTML, CSS, JS, and JSON patterns — compresses web assets 15-25% better than gzip.
# pip install brotli
import brotli
# Compress (quality 0-11, default 11)
compressed = brotli.compress(data, quality=11)
print(f"Brotli (q11): {len(compressed):,} bytes")
# Fast compression
compressed_fast = brotli.compress(data, quality=1)
print(f"Brotli (q1): {len(compressed_fast):,} bytes")
# Web server usage (nginx):
# brotli on;
# brotli_comp_level 6;
# brotli_types text/css application/javascript application/json;  # text/html is always compressed
# All modern browsers support Brotli:
# Request: Accept-Encoding: gzip, deflate, br
# Response: Content-Encoding: br
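The `Accept-Encoding` / `Content-Encoding` exchange above can be sketched as a small negotiation function. This is a simplified illustration, not how nginx implements it: the function names and the `server_preference` parameter are invented here, and the parser handles only the basic `token;q=value` form.

```python
def parse_accept_encoding(header: str) -> dict[str, float]:
    """Parse an Accept-Encoding header into {encoding: q-value}."""
    accepted = {}
    for item in header.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        quality = 1.0  # no q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        accepted[token] = quality
    return accepted


def choose_encoding(header: str, server_preference=("br", "gzip")) -> str:
    """Pick the first server-preferred encoding the client accepts (q > 0)."""
    accepted = parse_accept_encoding(header)
    for encoding in server_preference:
        if accepted.get(encoding, accepted.get("*", 0.0)) > 0:
            return encoding
    return "identity"  # send the response uncompressed


print(choose_encoding("gzip, deflate, br"))  # br
print(choose_encoding("gzip, deflate"))      # gzip
print(choose_encoding(""))                   # identity
```

Note that the server's preference order decides the winner here, which mirrors the "Brotli if supported, otherwise gzip" policy described in this guide.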
LZ4 — The Speed Demon
Born: 2011. Algorithm: LZ77 variant optimized for speed. One of the fastest general-purpose compressors available: it compresses at 500+ MB/s and decompresses at 3+ GB/s. Used when speed matters more than ratio, such as real-time logging, in-memory caches, and network protocols.
# pip install lz4
import lz4.frame
# Compress (blazing fast!)
compressed = lz4.frame.compress(data)
print(f"LZ4: {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")
# Decompress (even faster!)
original = lz4.frame.decompress(compressed)
# LZ4 HC (High Compression) — slower but better ratio
compressed_hc = lz4.frame.compress(data, compression_level=lz4.frame.COMPRESSIONLEVEL_MAX)
print(f"LZ4 HC: {len(compressed_hc):,} bytes")
# Command line
# lz4 file.txt # Compress
# lz4 -d file.txt.lz4 # Decompress
# lz4 -9 file.txt # High compression mode
Snappy — Google's Fast Compressor
Born: 2011 (Google). Similar goals to LZ4: extremely fast compression and decompression. Used internally by Google and in many data systems (Cassandra, MongoDB, Kafka, Parquet files).
# pip install python-snappy
import snappy
compressed = snappy.compress(data)
print(f"Snappy: {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")
original = snappy.decompress(compressed)
bzip2 — Maximum Compression (Legacy)
Born: 1996. Algorithm: Burrows-Wheeler Transform + Huffman. Better compression than gzip but much slower. Mostly replaced by zstd and xz.
import bz2
compressed = bz2.compress(data, compresslevel=9)
print(f"bzip2: {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")
# Command line: bzip2 file.txt / bunzip2 file.txt.bz2
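The Burrows-Wheeler Transform is the least intuitive piece of bzip2, so a toy version may help. The sketch below is a naive implementation for illustration only: it sorts every rotation of the input, which is far too slow for real data, and it omits the move-to-front and run-length stages that bzip2 applies before Huffman coding. The `$` sentinel and the function names are choices made for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler Transform: last column of the sorted rotations."""
    assert sentinel not in text, "sentinel must not occur in the input"
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]  # drop the sentinel


transformed = bwt("banana")
print(transformed)  # annb$aa
assert inverse_bwt(transformed) == "banana"
```

The transform does not shrink anything by itself. It rearranges the input so that identical characters cluster together (note the `nn` and `aa` runs), which makes the later entropy-coding stages much more effective.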
xz / LZMA — Maximum Compression
Born: 2001 (LZMA), 2009 (xz). Among the highest compression ratios of the widely used general-purpose tools. Used for software distribution (.tar.xz), where small download size matters more than compression speed.
import lzma
compressed = lzma.compress(data, preset=9)
print(f"xz/LZMA: {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")
# Command line: xz file.txt / unxz file.txt.xz
# tar cJf archive.tar.xz directory/ # Create .tar.xz archive
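The `tar cJf` command above has a direct equivalent in Python's standard `tarfile` module, which can be handy in build scripts. A minimal sketch follows; the helper names and the demo directory layout are invented for this example.

```python
import tarfile
import tempfile
from pathlib import Path


def make_tar_xz(source_dir: str, archive_path: str, preset: int = 6) -> None:
    """Create a .tar.xz archive of source_dir (like `tar cJf`)."""
    with tarfile.open(archive_path, "w:xz", preset=preset) as archive:
        archive.add(source_dir, arcname=Path(source_dir).name)


def list_tar_xz(archive_path: str) -> list[str]:
    """Return the member names stored in a .tar.xz archive."""
    with tarfile.open(archive_path, "r:xz") as archive:
        return archive.getnames()


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    project.mkdir()
    (project / "README.txt").write_text("hello\n" * 1000)

    archive_file = Path(tmp) / "project.tar.xz"
    make_tar_xz(str(project), str(archive_file), preset=9)
    names = list_tar_xz(str(archive_file))
    archive_size = archive_file.stat().st_size

print(names)
print(f"archive size: {archive_size} bytes")
```

Higher presets mainly cost compression time and memory, which fits the compress-once, download-many pattern of software distribution.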
Real Benchmarks
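These benchmarks use a 10 MB JSON file (typical API response data) and a 10 MB log file (typical server logs), measured on a modern CPU. Treat the numbers as approximate: absolute speeds and ratios vary with hardware, input data, and library version.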
[Charts: compressed size (smaller is better), compression speed (higher is faster), decompression speed (higher is faster)]
Complete Benchmark Table
| Algorithm | Ratio | Compress | Decompress | Year | Best For |
|---|---|---|---|---|---|
| lz4 | 2.1x | 780 MB/s | 4500 MB/s | 2011 | Real-time, caches, databases |
| snappy | 1.8x | 530 MB/s | 1800 MB/s | 2011 | Kafka, Parquet, Cassandra |
| zstd -3 ⭐ | 3.5x | 200 MB/s | 1400 MB/s | 2016 | General purpose (best default!) |
| gzip -6 | 3.2x | 50 MB/s | 500 MB/s | 1992 | Legacy compatibility |
| brotli -6 | 3.8x | 40 MB/s | 400 MB/s | 2015 | Static web assets (pre-compressed) |
| zstd -19 | 4.5x | 15 MB/s | 1400 MB/s | 2016 | Archives, backups (compress once, decompress many) |
| bzip2 -9 | 3.6x | 10 MB/s | 80 MB/s | 1996 | Legacy (use zstd instead) |
| xz -9 | 5.0x | 5 MB/s | 200 MB/s | 2009 | Software distribution (.tar.xz) |
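Published numbers only go so far, because ratios and speeds depend heavily on your data and hardware. Below is a minimal benchmarking sketch you could adapt. It is not the harness used for the table above. It covers only the codecs in Python's standard library (`zlib`, `bz2`, `lzma`), and the synthetic sample data is made up. If you have `zstandard`, `lz4`, or `brotli` installed, adding them to the `CODECS` dictionary should be a one-line change each.

```python
import bz2
import lzma
import time
import zlib

# name -> (compress function, decompress function)
CODECS = {
    "zlib -6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "bz2 -9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma -6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def benchmark(data: bytes, codecs=CODECS) -> dict:
    """Measure ratio and single-run throughput (MB/s) for each codec."""
    size_mb = len(data) / 1e6
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        compress_seconds = max(time.perf_counter() - start, 1e-9)

        start = time.perf_counter()
        restored = decompress(packed)
        decompress_seconds = max(time.perf_counter() - start, 1e-9)

        assert restored == data  # never trust a benchmark that skips verification
        results[name] = {
            "ratio": len(data) / len(packed),
            "compress_mb_s": size_mb / compress_seconds,
            "decompress_mb_s": size_mb / decompress_seconds,
        }
    return results


# Synthetic JSON-lines sample; swap in a slice of your real data.
sample = b"".join(
    b'{"id":%d,"status":"ok","latency_ms":%d}\n' % (i, i % 97) for i in range(20000)
)
results = benchmark(sample)
for name, r in results.items():
    print(f"{name:8s} ratio {r['ratio']:5.1f}x  "
          f"compress {r['compress_mb_s']:7.1f} MB/s  "
          f"decompress {r['decompress_mb_s']:7.1f} MB/s")
```

For numbers you intend to act on, run each codec several times and take the median, and use input large enough that timer resolution and warm-up effects do not dominate.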
Web Compression: What Your Server Should Use
# nginx configuration for optimal web compression:
# Enable gzip (universal fallback)
gzip on;
gzip_vary on;
gzip_comp_level 6;
# text/html is always compressed; listing it again triggers a duplicate MIME type warning
gzip_types text/css application/javascript application/json
text/xml application/xml image/svg+xml;
# Enable Brotli (roughly 15-25% better than gzip for web content)
# Requires ngx_brotli module
brotli on;
brotli_comp_level 6;
brotli_types text/css application/javascript application/json
text/xml application/xml image/svg+xml;
# How it works:
# Browser sends: Accept-Encoding: gzip, deflate, br
# Server checks: Does client support br (Brotli)?
# Yes -> Content-Encoding: br (best compression)
# No -> Content-Encoding: gzip (fallback)
# Pre-compress static assets at build time (maximum compression)
# brotli -q 11 dist/main.js -o dist/main.js.br
# gzip -9 -c dist/main.js > dist/main.js.gz
# With "gzip_static on;" and "brotli_static on;", nginx serves the pre-compressed
# files directly (no compression CPU cost per request)
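The build-time pre-compression step can also be scripted. The sketch below walks an output directory and writes a `.gz` file next to each compressible asset, using only the standard library. The function name, the extension list, and the `min_size` threshold are assumptions for illustration. Producing `.br` files the same way would need the third-party `brotli` package.

```python
import gzip
import tempfile
from pathlib import Path

# Assumed set of text-based asset types worth compressing.
COMPRESSIBLE = {".html", ".css", ".js", ".json", ".svg", ".xml"}


def precompress_gzip(dist_dir: str, min_size: int = 1024) -> list[Path]:
    """Write a .gz sibling for each compressible asset; return the files written."""
    written = []
    for path in sorted(Path(dist_dir).rglob("*")):
        if not path.is_file() or path.suffix not in COMPRESSIBLE:
            continue
        raw = path.read_bytes()
        if len(raw) < min_size:
            continue  # tiny files gain little from compression
        # mtime=0 keeps the output byte-identical across builds.
        packed = gzip.compress(raw, compresslevel=9, mtime=0)
        if len(packed) < len(raw):
            target = path.with_name(path.name + ".gz")
            target.write_bytes(packed)
            written.append(target)
    return written


# Demo against a throwaway "dist" directory.
with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp) / "dist"
    dist.mkdir()
    (dist / "main.js").write_bytes(b"console.log('hello');\n" * 500)
    (dist / "tiny.css").write_bytes(b"a{}")

    written = precompress_gzip(str(dist))
    written_names = [p.name for p in written]
    restored_ok = gzip.decompress(written[0].read_bytes()) == (dist / "main.js").read_bytes()

print(written_names)  # ['main.js.gz']
```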
Database & Storage Compression
# PostgreSQL: TOAST compresses large values automatically
# Values over roughly 2KB are compressed with pglz (LZ-family)
# PostgreSQL 15+: zstd compression for WAL and base backups
# (client-side compression requires tar format)
pg_basebackup --format=tar --compress=zstd:3 -D /backups/latest
# Redis: no built-in compression, but compress at app level
import redis
import zstandard as zstd
r = redis.Redis()
compressor = zstd.ZstdCompressor(level=3)
decompressor = zstd.ZstdDecompressor()
# Store compressed
data = b'{"user": "Alice", "orders": [...]}'
r.set("user:1", compressor.compress(data))
# Retrieve and decompress
compressed = r.get("user:1")
original = decompressor.decompress(compressed)
# Kafka: compression per-topic
# Producer config: compression.type=zstd (or lz4, snappy, gzip)
# zstd gives best ratio, lz4 gives best throughput
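# Parquet files: columnar format + compression per column
import pyarrow as pa
import pyarrow.parquet as pq
table = pa.table({"user_id": [1, 2, 3], "name": ["Alice", "Bob", "Carol"]})  # example data
pq.write_table(table, "data.parquet", compression="zstd")
# Snappy is the default; zstd gives 20-30% better compression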
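One detail the application-level pattern above glosses over: compressing every value is wasteful, because very small values can come out larger than they went in. A common approach is to compress only above a size threshold and prefix each stored value with a marker byte so the reader knows what it got. The sketch below shows the idea with a plain dict standing in for Redis and `zlib` standing in for zstd, so it runs without extra dependencies. The marker values, helper names, and threshold are assumptions.

```python
import zlib

RAW, COMPRESSED = b"\x00", b"\x01"  # one-byte marker prefixed to every stored value


def encode_value(value: bytes, min_size: int = 256, level: int = 6) -> bytes:
    """Compress only when the value is large enough and compression actually helps."""
    if len(value) >= min_size:
        packed = zlib.compress(value, level)
        if len(packed) < len(value):
            return COMPRESSED + packed
    return RAW + value


def decode_value(stored: bytes) -> bytes:
    """Reverse encode_value based on the marker byte."""
    marker, body = stored[:1], stored[1:]
    if marker == COMPRESSED:
        return zlib.decompress(body)
    if marker == RAW:
        return body
    raise ValueError(f"unknown marker byte: {marker!r}")


cache = {}  # stand-in for a Redis client in this sketch

small = b'{"ok":true}'
large = b'{"user":"Alice","orders":[' + b'{"sku":"A-1","qty":2},' * 200 + b"]}"

cache["user:1:flags"] = encode_value(small)
cache["user:1:orders"] = encode_value(large)

print(f"small: {len(small)} -> {len(cache['user:1:flags'])} bytes (stored raw)")
print(f"large: {len(large)} -> {len(cache['user:1:orders'])} bytes (compressed)")
assert decode_value(cache["user:1:flags"]) == small
assert decode_value(cache["user:1:orders"]) == large
```

The marker byte also gives you a migration path: a future marker value can signal a different codec without rewriting existing keys.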
The Decision Guide
| Use Case | Best Algorithm | Why |
|---|---|---|
| Web server (dynamic) | zstd or gzip | Fast per-request compression; gzip works in every browser, zstd only where the client advertises it |
| Web server (static assets) | Brotli -11 (pre-compressed) | Best ratio for web, compress once at build time |
| General purpose / default | zstd -3 | Best speed/ratio tradeoff in 2026. Replace gzip with this. |
| Real-time / latency-critical | lz4 | Fastest compression and decompression available |
| Kafka / message queues | lz4 or zstd | lz4 for throughput, zstd for ratio |
| Database backups | zstd -19 | High ratio; slow compression is fine (back up once), and fast decompression speeds up restores |
| Software distribution | xz -9 or zstd -19 | Smallest download size, compress once |
| Parquet / columnar data | zstd (or snappy default) | zstd gives 20-30% better compression than snappy |
| Legacy / universal compat | gzip | Everything supports gzip. Use when nothing else is available. |
The Bottom Line
If you remember nothing else:
- Default choice in 2026: Use zstd. It is typically several times faster than gzip at a comparable or better ratio. It's supported by Linux, most databases, and major cloud providers.
- Web assets: Use Brotli for static files (pre-compressed at build time) and gzip as a fallback for old browsers.
- Need maximum speed: Use lz4. Nothing else comes close for latency-sensitive workloads.
- Need smallest file: Use xz or zstd -19. Compression is slow but the result is tiny.
- Stop using bzip2. zstd compresses and decompresses far faster at a comparable ratio.
Compression is one of the highest-leverage optimizations in software engineering. Choosing the right algorithm for your workload can cut storage costs by 70%, reduce network transfer times by 80%, and speed up data pipelines by 10x. The benchmarks above give you the data — now pick the right tool for your specific use case.