Every time you visit a website, download a package, or store a database backup, compression is saving bandwidth, disk space, and time. But choosing the wrong algorithm can mean your API responses are 3x larger than needed, or your build pipeline takes 10 minutes instead of 1. This guide explains how compression actually works, benchmarks every major algorithm, and tells you exactly which one to use.

How Compression Works (The Fundamentals)

All compression algorithms exploit one idea: data has patterns, and patterns can be represented more efficiently. There are two fundamental approaches:

Two Types of Compression

| | Lossless compression | Lossy compression |
| --- | --- | --- |
| Data recovery | Original data perfectly recoverable | Some data permanently lost |
| Used for | text, code, databases, archives | images, audio, video |
| Typical ratio | 2x-10x smaller | 10x-100x smaller |
| Algorithms | gzip, zstd, brotli, lz4, xz | JPEG, MP3, H.264, WebP |

This guide focuses on lossless compression — the type used in web servers, databases, log files, and data pipelines.

The Core Techniques

Almost every general-purpose lossless compressor combines two techniques:

How Lossless Compression Works
  • LZ77/LZ78: replace repeated sequences with references to earlier data.
  • Huffman coding: assign short codes to common bytes.
  • Output: the compressed data.
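To make the Huffman idea (short codes for common bytes) concrete, here is a minimal sketch that builds a Huffman code for a short string using only the standard library. The function name `huffman_codes` is illustrative, and real compressors such as DEFLATE use a more constrained, canonical form of these codes, so treat this as a teaching aid rather than what gzip literally runs.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict[int, str]:
    """Return a prefix-free bit string for each distinct byte value in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:  # degenerate input: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = b"the cat sat on the mat on the flat"
codes = huffman_codes(text)
encoded_bits = sum(len(codes[b]) for b in text)

for sym, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
    print(f"{chr(sym)!r}: {code}")
print(f"{len(text) * 8} bits as plain bytes, {encoded_bits} bits Huffman-coded")
```

The most frequent symbol in the sample (the space) ends up with one of the shortest codes, which is where the savings come from. The bit count ignores the cost of storing the code table itself, which a real format has to include.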
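# LZ77 in action (simplified):
# Original:  "the cat sat on the mat on the flat"
# Step 1:    "the cat sat on [ref] mat on [ref] flat"
#            Each repeated "the " is replaced with a back-reference to the
#            first one. Real LZ77 stores a reference as a (distance back,
#            length) pair and would also catch the longer repeat " on the ".
# Step 2:    Huffman coding assigns shorter bit sequences
#            to common characters (t, e, space)
# Result:    fewer bits than the original; the more repetitive the input,
#            the larger the savings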

# This is exactly what gzip does internally:
# LZ77 (find repeated patterns) + Huffman (encode efficiently)
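The back-reference step in the walkthrough above can be sketched as a toy encoder and decoder. This is a deliberately naive, greedy version written for readability: the names `lz77_encode` and `lz77_decode`, the window size, and the minimum match length are all assumptions for illustration, and production implementations use hash chains and a bit-level output format instead of a Python list.

```python
def lz77_encode(data: bytes, window: int = 4096, min_match: int = 3) -> list:
    """Greedy toy LZ77: emit literal byte values or (distance, length) pairs."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the sliding window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))  # back-reference
            i += best_len
        else:
            tokens.append(data[i])                # literal byte
            i += 1
    return tokens


def lz77_decode(tokens: list) -> bytes:
    """Rebuild the original bytes from literals and back-references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):   # byte-by-byte so overlapping copies work
                out.append(out[-dist])
        else:
            out.append(tok)
    return bytes(out)


text = b"the cat sat on the mat on the flat"
tokens = lz77_encode(text)
assert lz77_decode(tokens) == text
print(f"{len(text)} input bytes -> {len(tokens)} tokens")
print([t if isinstance(t, tuple) else chr(t) for t in tokens])
```

The second "the " becomes a reference that means "go back 15 bytes and copy 4". A token count is not a byte count, so the real size depends on how tokens are then entropy-coded.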

The Algorithms: A Complete Guide

gzip / zlib (The Universal Standard)

Born: 1992. Algorithm: DEFLATE (LZ77 + Huffman). The most widely supported compression format in computing: virtually every web server, browser, and programming language can read and write gzip.

# Python gzip
import gzip

# Compress
data = b"Hello World! " * 10000
compressed = gzip.compress(data, compresslevel=9)
print(f"Original:   {len(data):,} bytes")
print(f"Compressed: {len(compressed):,} bytes")
print(f"Ratio:      {len(data)/len(compressed):.1f}x")
# Original:   130,000 bytes
# Compressed:  263 bytes
# Ratio:      494.3x (highly repetitive data)

# Decompress
original = gzip.decompress(compressed)
assert original == data

# Command line
# gzip file.txt          # Compresses to file.txt.gz
# gzip -d file.txt.gz    # Decompress
# gzip -9 file.txt       # Maximum compression
# gzip -1 file.txt       # Fastest compression
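Since this section groups gzip and zlib together, a short sketch of how they relate may help. Both wrap the same DEFLATE stream in different framing: zlib adds a 2-byte header and a 4-byte checksum, while gzip adds at least a 10-byte header and an 8-byte trailer. The payload below is made-up sample data, and the sizes it prints will differ on your inputs.

```python
import gzip
import zlib

payload = b'{"user_id": 1, "name": "Alice", "active": true}\n' * 2000

# Raw DEFLATE: a negative wbits value means "no header or trailer".
raw = zlib.compressobj(level=6, wbits=-15)
deflate_only = raw.compress(payload) + raw.flush()

zlib_stream = zlib.compress(payload, 6)                         # DEFLATE + zlib framing
gzip_stream = gzip.compress(payload, compresslevel=6, mtime=0)  # DEFLATE + gzip framing

print(f"raw DEFLATE: {len(deflate_only):,} bytes")
print(f"zlib:        {len(zlib_stream):,} bytes")
print(f"gzip:        {len(gzip_stream):,} bytes")

# wbits=31 tells zlib to expect gzip framing, so zlib can read gzip output.
assert zlib.decompress(gzip_stream, wbits=31) == payload
assert zlib.decompress(deflate_only, wbits=-15) == payload

# The level trades CPU time for size, mirroring gzip -1 / -6 / -9.
for level in (1, 6, 9):
    size = len(gzip.compress(payload, compresslevel=level, mtime=0))
    print(f"gzip level {level}: {size:,} bytes")
```

Passing `mtime=0` keeps the output byte-for-byte reproducible, which is convenient for build artifacts and caches.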

Zstandard (zstd) — The Modern Champion

Born: 2016 (Facebook). Algorithm: LZ77 variant + Finite State Entropy + Huffman. At a comparable compression ratio it is typically much faster than gzip, and at a comparable speed it compresses better. It's replacing gzip across the industry: it is used by the Linux kernel, Facebook, Cloudflare, and many databases.

# pip install zstandard
import zstandard as zstd

# Compress (default level 3 — balanced speed/ratio)
compressor = zstd.ZstdCompressor(level=3)
compressed = compressor.compress(data)
print(f"zstd (lvl 3): {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")

# Compress (maximum — level 22)
compressor = zstd.ZstdCompressor(level=22)
compressed_max = compressor.compress(data)
print(f"zstd (lvl 22): {len(compressed_max):,} bytes, {len(data)/len(compressed_max):.1f}x")

# Decompress (always fast regardless of compression level!)
decompressor = zstd.ZstdDecompressor()
original = decompressor.decompress(compressed)

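# Dictionary compression: for small, similar data (like JSON API payloads)
# Train a dictionary on sample data, then compress new data using it
# Achieves 2-5x better ratio on small payloads (< 4KB)
# Training needs many representative samples; a couple of tiny ones is too
# few and train_dictionary() typically fails. The generated samples below
# only stand in for real payloads.
samples = [f'{{"user_id":{i},"name":"user{i}","email":"user{i}@example.com"}}'.encode()
           for i in range(5000)]
dict_data = zstd.train_dictionary(4096, samples)
compressor = zstd.ZstdCompressor(dict_data=dict_data)
# The same dictionary is required to decompress
decompressor = zstd.ZstdDecompressor(dict_data=dict_data)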

# Command line
# zstd file.txt           # Compress to file.txt.zst
# zstd -d file.txt.zst    # Decompress
# zstd -19 file.txt       # High compression
# zstd -T0 file.txt       # Use all CPU cores (parallel!)
# zstd --train *.json -o dict  # Train dictionary
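To see why a shared dictionary helps so much on small payloads, here is a sketch using only the standard library. Note the swap: it uses zlib's preset-dictionary feature (`zdict`), not zstd's trained dictionaries, because the principle is the same and it runs without extra packages. The dictionary contents, the sample message, and the helper names `deflate` and `inflate` are hypothetical.

```python
import zlib
from typing import Optional

# Hand-written dictionary of substrings we expect in every message.
# zstd would learn this from samples; here it is simply typed in.
shared_dict = b'{"user_id":,"name":"","email":"@example.com","active":true}'
message = b'{"user_id":42,"name":"Carol","email":"carol@example.com","active":true}'


def deflate(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress with zlib, optionally priming the window with a dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress; the same dictionary must be supplied if one was used."""
    decomp = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


without_dict = deflate(message)
with_dict = deflate(message, shared_dict)

print(f"original:        {len(message)} bytes")
print(f"no dictionary:   {len(without_dict)} bytes")
print(f"with dictionary: {len(with_dict)} bytes")
assert inflate(with_dict, shared_dict) == message
```

A single small message has almost no internal repetition, so a plain compressor has nothing to reference. The dictionary supplies that missing history, at the cost of having to ship and version it alongside the data.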

Brotli — The Web Optimization King

Born: 2015 (Google). Algorithm: LZ77 + Huffman + 2nd-order context modeling + static dictionary of common web strings. Designed specifically for web content. Built-in dictionary includes common HTML, CSS, JS, and JSON patterns — compresses web assets 15-25% better than gzip.

# pip install brotli
import brotli

# Compress (quality 0-11, default 11)
compressed = brotli.compress(data, quality=11)
print(f"Brotli (q11): {len(compressed):,} bytes")

# Fast compression
compressed_fast = brotli.compress(data, quality=1)
print(f"Brotli (q1):  {len(compressed_fast):,} bytes")

# Web server usage (nginx, requires the ngx_brotli module):
# brotli on;
# brotli_comp_level 6;
# brotli_types text/css application/javascript application/json;
# (text/html is always compressed and does not need to be listed)

# All modern browsers support Brotli:
# Request:  Accept-Encoding: gzip, deflate, br
# Response: Content-Encoding: br
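The request/response exchange above amounts to a small negotiation, sketched here as a pure function. It is a simplification under stated assumptions: the server tries its own preference order (`br` first, then `gzip`), honors `q=0` as "not acceptable", and falls back to `identity`. The function names are illustrative, and a real server also handles details this sketch skips.

```python
def parse_accept_encoding(header: str) -> dict[str, float]:
    """Map each encoding named in an Accept-Encoding header to its q-value."""
    prefs = {}
    for part in header.split(","):
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0  # no q parameter means full preference
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[token] = q
    return prefs


def choose_encoding(header: str, server_order=("br", "gzip")) -> str:
    """Pick the first server-preferred encoding the client accepts."""
    prefs = parse_accept_encoding(header)
    for encoding in server_order:
        q = prefs.get(encoding, prefs.get("*", 0.0))
        if q > 0:
            return encoding
    return "identity"  # send the response uncompressed


print(choose_encoding("gzip, deflate, br"))   # br
print(choose_encoding("gzip, deflate"))       # gzip
print(choose_encoding("br;q=0, gzip;q=0.8"))  # gzip
print(choose_encoding(""))                    # identity
```

Whatever the server picks goes into the `Content-Encoding` response header, and responses that vary by encoding should also send `Vary: Accept-Encoding` so caches keep the variants apart.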

LZ4 — The Speed Demon

Born: 2011. Algorithm: LZ77 variant optimized for speed. One of the fastest general-purpose compressors available: it compresses at 500+ MB/s and decompresses at 3+ GB/s per core. Used when speed matters more than ratio, as in real-time logging, in-memory caches, and network protocols.

# pip install lz4
import lz4.frame

# Compress (blazing fast!)
compressed = lz4.frame.compress(data)
print(f"LZ4: {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")

# Decompress (even faster!)
original = lz4.frame.decompress(compressed)

# LZ4 HC (High Compression) — slower but better ratio
compressed_hc = lz4.frame.compress(data, compression_level=lz4.frame.COMPRESSIONLEVEL_MAX)
print(f"LZ4 HC: {len(compressed_hc):,} bytes")

# Command line
# lz4 file.txt            # Compress
# lz4 -d file.txt.lz4     # Decompress
# lz4 -9 file.txt         # High compression mode

Snappy — Google's Fast Compressor

Born: 2011 (Google). Similar goals to LZ4: extremely fast compression and decompression at a modest ratio. Used internally by Google and supported by many data systems and formats (Cassandra, MongoDB, Kafka, Parquet).

# pip install python-snappy
import snappy

compressed = snappy.compress(data)
print(f"Snappy: {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")
original = snappy.decompress(compressed)

bzip2 — Maximum Compression (Legacy)

Born: 1996. Algorithm: Burrows-Wheeler Transform + Huffman. Better compression than gzip but much slower. Mostly replaced by zstd and xz.

import bz2

compressed = bz2.compress(data, compresslevel=9)
print(f"bzip2: {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")

# Command line: bzip2 file.txt / bunzip2 file.txt.bz2

xz / LZMA — Maximum Compression

Born: 2001 (LZMA), 2009 (xz). Among the highest compression ratios of the mainstream general-purpose tools. Used for software distribution (.tar.xz), where small download size matters more than compression speed.

import lzma

compressed = lzma.compress(data, preset=9)
print(f"xz/LZMA: {len(compressed):,} bytes, {len(data)/len(compressed):.1f}x")

# Command line: xz file.txt / unxz file.txt.xz
# tar cJf archive.tar.xz directory/  # Create .tar.xz archive
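The `tar cJf` command above has a direct standard-library equivalent, sketched below. The directory and file names are placeholders created in a temporary folder so the example is self-contained.

```python
import pathlib
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    src = root / "directory"
    src.mkdir()
    # Placeholder content standing in for real files to distribute.
    (src / "app.log").write_text("GET /index.html 200\n" * 5000)

    archive = root / "archive.tar.xz"
    # "w:xz" writes a tar stream through the LZMA compressor (like tar cJf).
    with tarfile.open(archive, "w:xz", preset=9) as tar:
        tar.add(src, arcname="directory")

    # "r:xz" reads it back; listing members does not extract anything.
    with tarfile.open(archive, "r:xz") as tar:
        names = tar.getnames()

    original_size = (src / "app.log").stat().st_size
    archive_size = archive.stat().st_size

print(names)
print(f"{original_size:,} bytes of input -> {archive_size:,} byte archive")
```

When extracting archives from untrusted sources, check member paths before calling `extractall`, since tar members can point outside the target directory.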

Real Benchmarks

These benchmarks use a 10 MB JSON file (typical API response data) and a 10 MB log file (typical server logs). Measured on a modern CPU.

Compression Ratio (Smaller = Better)

[Chart] Compression Ratio: 10 MB JSON File (lower = better compression). Algorithms in chart order: xz -9, zstd -19, brotli -11, bzip2 -9, gzip -9, zstd -3, lz4, snappy.

Compression Speed (Higher = Faster)

[Chart] Compression Speed: MB/s (higher = faster). Algorithms in chart order: lz4, snappy, zstd -1, zstd -3, gzip -6, brotli -11, bzip2 -9, xz -9.

Decompression Speed (Higher = Faster)

[Chart] Decompression Speed: MB/s (higher = faster). Algorithms in chart order: lz4, snappy, zstd, gzip, brotli, xz, bzip2.

Complete Benchmark Table

Algorithm Comparison (10 MB JSON, single-threaded)

| Algorithm | Ratio | Compress | Decompress | Year | Best For |
| --- | --- | --- | --- | --- | --- |
| lz4 | 2.1x | 780 MB/s | 4500 MB/s | 2011 | Real-time, caches, databases |
| snappy | 1.8x | 530 MB/s | 1800 MB/s | 2011 | Kafka, Parquet, Cassandra |
| zstd -3 ⭐ | 3.5x | 200 MB/s | 1400 MB/s | 2016 | General purpose (best default!) |
| gzip -6 | 3.2x | 50 MB/s | 500 MB/s | 1992 | Legacy compatibility |
| brotli -6 | 3.8x | 40 MB/s | 400 MB/s | 2015 | Static web assets (pre-compressed) |
| zstd -19 | 4.5x | 15 MB/s | 1400 MB/s | 2016 | Archives, backups (compress once, decompress many) |
| bzip2 -9 | 3.6x | 10 MB/s | 80 MB/s | 1996 | Legacy (use zstd instead) |
| xz -9 | 5.0x | 5 MB/s | 200 MB/s | 2009 | Software distribution (.tar.xz) |
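Published numbers depend heavily on the data and the hardware, so it is worth measuring on your own payloads. The harness below is a rough sketch that covers only the codecs in Python's standard library (zlib, bz2, lzma). The synthetic data generator, the function names, and the chosen levels are assumptions, and single-run wall-clock timings like these are noisy.

```python
import bz2
import lzma
import time
import zlib


def make_sample(n_records: int = 20000) -> bytes:
    """Synthetic JSON-lines data; replace with a file of your real payloads."""
    rows = [
        '{"user_id": %d, "name": "user%d", "status": "%s"}'
        % (i, i, "active" if i % 3 else "inactive")
        for i in range(n_records)
    ]
    return "\n".join(rows).encode()


CODECS = {
    "zlib -6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "bzip2 -9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "xz -6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def benchmark(data: bytes) -> list[dict]:
    """Time one compress/decompress round trip per codec."""
    results = []
    megabytes = len(data) / 1e6
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        blob = compress(data)
        t1 = time.perf_counter()
        restored = decompress(blob)
        t2 = time.perf_counter()
        assert restored == data  # lossless round trip
        results.append({
            "name": name,
            "ratio": len(data) / len(blob),
            "compress_mb_s": megabytes / max(t1 - t0, 1e-9),
            "decompress_mb_s": megabytes / max(t2 - t1, 1e-9),
        })
    return results


results = benchmark(make_sample())
for r in results:
    print(f"{r['name']:<10} {r['ratio']:5.1f}x  "
          f"{r['compress_mb_s']:8.1f} MB/s  {r['decompress_mb_s']:8.1f} MB/s")
```

For steadier numbers, repeat each measurement several times and keep the best run. Adding zstd, lz4, or brotli means one more `CODECS` entry each once those packages are installed.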

Web Compression: What Your Server Should Use

# nginx configuration for optimal web compression:

# Enable gzip (universal fallback)
gzip on;
gzip_vary on;
gzip_comp_level 6;
# text/html is always compressed; listing it again triggers a
# "duplicate MIME type" warning
gzip_types text/css application/javascript application/json
           text/xml application/xml image/svg+xml;

# Enable Brotli (15-25% better than gzip for web content)
# Requires ngx_brotli module
brotli on;
brotli_comp_level 6;
# text/html is always compressed and does not need to be listed
brotli_types text/css application/javascript application/json
             text/xml application/xml image/svg+xml;

# How it works:
# Browser sends:  Accept-Encoding: gzip, deflate, br
# Server checks:  Does client support br (Brotli)?
#   Yes -> Content-Encoding: br  (best compression)
#   No  -> Content-Encoding: gzip (fallback)

# Pre-compress static assets at build time (maximum compression)
# brotli -q 11 dist/main.js -o dist/main.js.br
# gzip -9 -c dist/main.js > dist/main.js.gz
# With "gzip_static on;" and "brotli_static on;" nginx serves the
# pre-compressed files directly (no compression CPU cost per request)
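The build-time step can also be scripted. The sketch below writes a `.gz` sibling next to each compressible file, using only the standard library. It covers gzip only, since Brotli needs a third-party package. The extension list, the `precompress` name, and the demo files are assumptions.

```python
import gzip
import pathlib
import tempfile

# Text formats worth compressing; images and fonts are usually compressed already.
COMPRESSIBLE = {".js", ".css", ".html", ".json", ".svg"}


def precompress(dist_dir: pathlib.Path) -> list[pathlib.Path]:
    """Write a maximum-compression .gz sibling for each compressible file."""
    written = []
    for path in sorted(dist_dir.rglob("*")):
        if path.is_file() and path.suffix in COMPRESSIBLE:
            target = path.with_name(path.name + ".gz")
            # mtime=0 keeps the output identical across builds.
            target.write_bytes(gzip.compress(path.read_bytes(), compresslevel=9, mtime=0))
            written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    dist = pathlib.Path(tmp) / "dist"
    dist.mkdir()
    (dist / "main.js").write_text("console.log('hello');\n" * 500)
    (dist / "logo.png").write_bytes(b"\x89PNG placeholder")  # skipped: not compressible

    written = [p.name for p in precompress(dist)]
    js_size = (dist / "main.js").stat().st_size
    gz_size = (dist / "main.js.gz").stat().st_size

print(written)
print(f"main.js: {js_size:,} bytes -> main.js.gz: {gz_size:,} bytes")
```

The original files are left in place on purpose, because the server still needs the uncompressed version for clients that send no `Accept-Encoding` header.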

Database & Storage Compression

# PostgreSQL: enable compression on TOAST (large values)
# Automatic — values > 2KB are compressed with pglz (LZ-family)

# PostgreSQL 15+: zstd compression for WAL and base backups
# (client-side compression requires the tar format, hence -Ft)
pg_basebackup -Ft --compress=zstd:3 -D /backups/latest

# Redis: no built-in compression, but compress at app level
import redis
import zstandard as zstd

r = redis.Redis()
compressor = zstd.ZstdCompressor(level=3)
decompressor = zstd.ZstdDecompressor()

# Store compressed
data = b'{"user": "Alice", "orders": [...]}'
r.set("user:1", compressor.compress(data))

# Retrieve and decompress
compressed = r.get("user:1")
original = decompressor.decompress(compressed)

# Kafka: compression per-topic
# Producer config: compression.type=zstd (or lz4, snappy, gzip)
# zstd gives best ratio, lz4 gives best throughput

# Parquet files: columnar format + compression per column
import pyarrow as pa
import pyarrow.parquet as pq
table = pa.table({"user_id": [1, 2], "name": ["Alice", "Bob"]})  # example data
pq.write_table(table, "data.parquet", compression="zstd")
# Snappy is the default; zstd gives 20-30% better compression
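One practical wrinkle with application-level compression, as in the Redis example above, is that compressing tiny values can make them larger. A common workaround is to compress only above a size threshold and tag each stored value with a one-byte marker. The sketch below uses zlib from the standard library as a stand-in for zstd. The marker bytes, the 256-byte threshold, and the `pack`/`unpack` names are all assumptions rather than any library's convention.

```python
import zlib

RAW, DEFLATED = b"\x00", b"\x01"  # hypothetical one-byte format markers
MIN_SIZE = 256                    # assumed threshold; tune for your data


def pack(value: bytes) -> bytes:
    """Compress values that are large enough to benefit; tag the result."""
    if len(value) >= MIN_SIZE:
        compressed = zlib.compress(value, 6)
        if len(compressed) < len(value):  # keep it only if it actually helped
            return DEFLATED + compressed
    return RAW + value


def unpack(blob: bytes) -> bytes:
    """Reverse pack() by dispatching on the marker byte."""
    marker, body = blob[:1], blob[1:]
    if marker == DEFLATED:
        return zlib.decompress(body)
    if marker == RAW:
        return body
    raise ValueError(f"unknown format marker: {marker!r}")


small = b'{"user": "Alice"}'
large = b'{"user": "Alice", "orders": [' + b'{"sku": "A-100", "qty": 1},' * 200 + b"]}"

for value in (small, large):
    stored = pack(value)
    assert unpack(stored) == value
    print(f"{len(value):>6,} bytes -> {len(stored):>6,} bytes stored")
```

With a Redis client, `pack` would wrap the value passed to `set` and `unpack` the value returned by `get`. The marker byte also leaves room to introduce another codec later without rewriting existing keys.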

The Decision Guide

Which Compression Algorithm Should You Use?
| Use Case | Best Algorithm | Why |
| --- | --- | --- |
| Web server (dynamic) | zstd or gzip | Fast compression per-request, universal browser support |
| Web server (static assets) | Brotli -11 (pre-compressed) | Best ratio for web, compress once at build time |
| General purpose / default | zstd -3 | Best speed/ratio tradeoff in 2026. Replace gzip with this. |
| Real-time / latency-critical | lz4 | Fastest compression and decompression available |
| Kafka / message queues | lz4 or zstd | lz4 for throughput, zstd for ratio |
| Database backups | zstd -19 | Best ratio, slow compress is fine (backup once), fast decompress for restores |
| Software distribution | xz -9 or zstd -19 | Smallest download size, compress once |
| Parquet / columnar data | zstd (or snappy default) | zstd gives 20-30% better compression than snappy |
| Legacy / universal compat | gzip | Everything supports gzip. Use when nothing else is available. |

The Bottom Line

If you remember nothing else:

  • Default choice in 2026: Use zstd. It is typically much faster than gzip at a comparable or better compression ratio. It's supported by Linux, most databases, and major cloud providers.
  • Web assets: Use Brotli for static files (pre-compressed at build time) and gzip as a fallback for old browsers.
  • Need maximum speed: Use lz4. Nothing else comes close for latency-sensitive workloads.
  • Need smallest file: Use xz or zstd -19. Compression is slow but the result is tiny.
  • Stop using bzip2 for new work. zstd compresses and decompresses far faster at a comparable ratio.

Compression is one of the highest-leverage optimizations in software engineering. Choosing the right algorithm for your workload can, in favorable cases, cut storage costs by up to 70%, reduce network transfer times by up to 80%, and speed up data pipelines by as much as 10x. The benchmarks above are a starting point; measure on your own data, then pick the tool that fits your use case.