Okay, let me start from the very beginning. Forget everything about Delta Lake, Iceberg, or Hudi for a moment. Let's start with a question that seems silly but is actually the foundation of this entire topic:
What happens when you save data to a file?
You write a CSV. You save it. Done. Now imagine a hundred people need to read that CSV at the same time. And while they're reading, someone else is updating it. And another person is deleting rows. And someone wants to see what the data looked like yesterday, not today.
That CSV is going to have a bad day.
This is the problem that table formats solve. They take dumb files sitting on storage and give them superpowers — like a database, but without an actual database server.
Let's Start Simple: What is a File? What is a Table?
When your data lives in cloud storage (like Amazon S3, Google Cloud Storage, or Azure Blob), it's stored as files. These files can be in different formats: CSV, JSON, Avro, ORC, or, most commonly for analytics, Parquet.
Think of it this way: a table format is a layer of intelligence that sits on top of files. The files are still Parquet files on S3 — but the table format adds a transaction log, schema tracking, and metadata that makes these files behave like a database table.
Why Can't We Just Use a Database?
Fair question. If you want transactions, schema, and updates — why not just use PostgreSQL or MySQL?
The answer is scale and cost. When you have 10 TB of data, a traditional database costs a fortune and is slow. Cloud object storage (S3) is 10-100x cheaper and can hold petabytes. But raw S3 has none of the nice features of a database. Table formats give you the best of both worlds:
| Feature | Database | Raw Data Lake | Lakehouse (Table Format) |
|---|---|---|---|
| Storage cost | Expensive | Cheapest (S3) | Cheapest (S3) |
| ACID transactions | Yes | No | Yes |
| Schema enforcement | Yes | No | Yes |
| Time travel | Limited | No | Yes (days/weeks) |
| Scale | TBs (practical limit) | Petabytes | Petabytes |
| UPDATE / DELETE | Yes | No (append only) | Yes |
How Table Formats Actually Work (Under the Hood)
Every table format works the same basic way. Instead of just dumping files, they maintain a metadata layer — usually a log file or a set of manifest files — that tracks which data files belong to the table, what the schema is, and what version of the data you're looking at.
# What a Delta Lake table looks like on S3:
s3://my-lake/sales/revenue/
  _delta_log/                          # Transaction log (the magic)
    00000000000000000000.json          # Version 0: initial table creation
    00000000000000000001.json          # Version 1: first data insert
    00000000000000000002.json          # Version 2: update some rows
    00000000000000000003.json          # Version 3: delete expired rows
  part-00000-abc123.parquet            # Data file (current)
  part-00001-def456.parquet            # Data file (current)
  part-00002-ghi789.parquet            # Data file (old, superseded by version 3)
# What an Iceberg table looks like on S3:
s3://my-lake/sales/revenue/
  metadata/
    v1.metadata.json                   # Table metadata (schema, partitioning)
    v2.metadata.json                   # Updated metadata after schema change
    snap-001-abc.avro                  # Snapshot manifest list
  data/
    year=2025/quarter=3/
      file-001.parquet                 # Data file
      file-002.parquet                 # Data file
# The key insight: the DATA FILES are the same (Parquet).
# The METADATA LAYER is what makes Delta vs Iceberg different.
Delta Lake — The Databricks Standard
Delta Lake was created by Databricks and is the default table format on their platform. It's one of the most mature table formats and the most widely used in the Spark ecosystem.
Key idea: Delta Lake uses a JSON-based transaction log (_delta_log/) to track every change to the table. Every write operation creates a new log entry. Reading the log tells you exactly which Parquet files make up the current version of the table.
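To demystify the "magic" a bit: each numbered file in _delta_log/ is plain newline-delimited JSON, and every line in it is one action (commitInfo, protocol, metaData, add, remove). Here's a minimal peek using nothing but Python, assuming a Delta table sitting at a local path; ./my_delta_table is just a stand-in for wherever your table lives:
# Peek inside the Delta transaction log (pure Python, no Spark required)
import glob
import json

for log_file in sorted(glob.glob("./my_delta_table/_delta_log/*.json")):
    print(log_file)
    with open(log_file) as f:
        for line in f:
            action = json.loads(line)
            # Each line is a single action: commitInfo, protocol, metaData, add, or remove
            print("   action:", list(action.keys())[0])
That folder of JSON files pointing at Parquet files is essentially the whole "database"; everything else is engines reading and writing it carefully.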
# PySpark: Create a Delta table
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.appName("DeltaExample") \
.config("spark.jars.packages", "io.delta:delta-spark_2.12:3.2.0") \
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
.config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
.getOrCreate()
# Create a DataFrame
data = [
(1, "Alice", 120000, "Engineering"),
(2, "Bob", 95000, "Marketing"),
(3, "Charlie", 140000, "Engineering"),
]
df = spark.createDataFrame(data, ["id", "name", "salary", "department"])
# Write as a Delta table
df.write.format("delta").mode("overwrite").save("s3://my-lake/employees")
# Read it back
employees = spark.read.format("delta").load("s3://my-lake/employees")
employees.show()
# UPDATE: Give everyone in Engineering a 10% raise
from delta.tables import DeltaTable
dt = DeltaTable.forPath(spark, "s3://my-lake/employees")
dt.update(
condition="department = 'Engineering'",
set={"salary": "salary * 1.10"}
)
# DELETE: Remove Bob
dt.delete("name = 'Bob'")
# TIME TRAVEL: What did the table look like 2 versions ago?
old_data = spark.read.format("delta") \
.option("versionAsOf", 0) \
.load("s3://my-lake/employees")
old_data.show()
# Shows the original data before updates and deletes!
# SCHEMA EVOLUTION: Add a new column
from pyspark.sql.functions import lit
new_data = spark.createDataFrame(
[(4, "Diana", 110000, "Sales", "2025-01-15")],
["id", "name", "salary", "department", "hire_date"]
)
new_data.write.format("delta") \
.mode("append") \
.option("mergeSchema", "true") \
.save("s3://my-lake/employees")
# The table now has a "hire_date" column — old rows have NULL
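Before time traveling, it helps to know which version number maps to which operation. The transaction log is queryable as a history table; here's a quick sketch reusing the dt handle from above:
# One row per table version: who did what, and when
# (columns include version, timestamp, operation, operationParameters)
dt.history().select("version", "timestamp", "operation").show(truncate=False)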
Apache Iceberg — The Open Standard
Iceberg was created by Netflix and donated to Apache. It's designed to be engine-agnostic — it works with Spark, Trino, Flink, Dremio, Athena, BigQuery, Snowflake, and many more. If you want maximum portability across engines and clouds, Iceberg is your best bet.
Key difference from Delta: Iceberg uses a tree of metadata files (snapshot → manifest list → manifest → data files) instead of a linear transaction log. This scales well to tables with millions of files, because query planning only walks the branches of the metadata tree it needs instead of replaying a log.
# PySpark: Create an Iceberg table
spark = SparkSession.builder \
.appName("IcebergExample") \
.config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0") \
.config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog") \
.config("spark.sql.catalog.my_catalog.type", "hadoop") \
.config("spark.sql.catalog.my_catalog.warehouse", "s3://my-lake/iceberg/") \
.getOrCreate()
# Create table using SQL
spark.sql("""
CREATE TABLE my_catalog.sales.revenue (
transaction_id BIGINT,
customer_id BIGINT,
amount DECIMAL(10,2),
currency STRING,
created_at TIMESTAMP
)
USING iceberg
PARTITIONED BY (days(created_at))
""")
# Notice: PARTITIONED BY days(created_at) — Iceberg supports
# partition TRANSFORMS (days, months, hours, bucket, truncate).
# No need to create extra partition columns!
# Insert data
spark.sql("""
INSERT INTO my_catalog.sales.revenue VALUES
(1, 101, 99.99, 'USD', TIMESTAMP '2025-07-15 10:30:00'),
(2, 102, 149.50, 'USD', TIMESTAMP '2025-07-15 11:45:00')
""")
# Time travel using snapshot IDs (every commit creates a snapshot with a long, generated ID)
spark.sql("""
SELECT * FROM my_catalog.sales.revenue
VERSION AS OF 1  -- replace with a real snapshot ID from the table's metadata
""")
# Or by timestamp:
spark.sql("""
SELECT * FROM my_catalog.sales.revenue
TIMESTAMP AS OF '2025-07-14 00:00:00'
""")
# Schema evolution (add a column — no rewrite needed!)
spark.sql("""
ALTER TABLE my_catalog.sales.revenue
ADD COLUMN payment_method STRING
""")
# Partition evolution (change partitioning without rewriting data!)
spark.sql("""
ALTER TABLE my_catalog.sales.revenue
REPLACE PARTITION FIELD days(created_at) WITH months(created_at)
""")
# Old data stays partitioned by day. New data partitioned by month.
# Both are queryable seamlessly. This is Iceberg's killer feature.
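And you can see that metadata tree for yourself: Iceberg exposes it as virtual metadata tables that live alongside the real one. A short sketch against the table created above (the column selections are illustrative; the metadata tables carry more fields):
# Inspect Iceberg's metadata via its built-in metadata tables
# .snapshots -> one row per snapshot (what VERSION AS OF points at)
# .files     -> the data files behind the current snapshot
spark.sql("""
  SELECT snapshot_id, committed_at, operation
  FROM my_catalog.sales.revenue.snapshots
""").show(truncate=False)

spark.sql("""
  SELECT file_path, record_count
  FROM my_catalog.sales.revenue.files
""").show(truncate=False)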
AWS S3 Tables — The New Kid (2024)
S3 Tables is Amazon's newest offering (announced re:Invent 2024). It builds Apache Iceberg support directly into S3 itself. Instead of managing Iceberg metadata files yourself, S3 handles it natively. You interact with "table buckets" instead of regular buckets.
# S3 Tables: Create a table bucket
aws s3tables create-table-bucket \
--name my-analytics-bucket \
--region us-east-1
# Create a namespace (like a database/schema)
aws s3tables create-namespace \
--table-bucket-arn arn:aws:s3tables:us-east-1:123456789:bucket/my-analytics-bucket \
--namespace sales
# Create a table (Iceberg, managed by S3)
aws s3tables create-table \
--table-bucket-arn arn:aws:s3tables:us-east-1:123456789:bucket/my-analytics-bucket \
--namespace sales \
--name revenue \
--format ICEBERG
# Benefits:
# 1. S3 automatically manages Iceberg metadata (no manual compaction)
# 2. S3 handles snapshot management and garbage collection
# 3. Up to 3x faster queries vs self-managed Iceberg on S3
# 4. Up to 10x more transactions/second vs regular S3
# 5. Works with Athena, EMR, Redshift, Glue — any Iceberg client
# Query via Athena:
# SELECT SUM(amount) FROM sales.revenue WHERE created_at > '2025-01-01';
# S3 Tables + Athena = serverless data warehouse
Apache Hudi — The Streaming-First Format
Apache Hudi (Hadoop Upserts Deletes and Incrementals) was created by Uber. Its superpower is incremental processing — efficiently processing only the rows that changed since the last read. This makes it great for near-real-time data pipelines.
# Hudi is great for streaming use cases like:
# - CDC (Change Data Capture) from databases
# - Real-time event processing
# - Incremental ETL pipelines
# PySpark: Write a Hudi table
# (df here is assumed to be an orders DataFrame with order_id and updated_at columns)
df.write.format("hudi") \
.option("hoodie.table.name", "orders") \
.option("hoodie.datasource.write.recordkey.field", "order_id") \
.option("hoodie.datasource.write.precombine.field", "updated_at") \
.option("hoodie.datasource.write.operation", "upsert") \
.mode("append") \
.save("s3://my-lake/orders")
# Read only the changes since last sync (incremental query):
spark.read.format("hudi") \
.option("hoodie.datasource.query.type", "incremental") \
.option("hoodie.datasource.read.begin.instanttime", "20250715100000") \
.load("s3://my-lake/orders") \
.show()
# Returns ONLY rows that changed after July 15, 10:00 AM
# Instead of re-reading the entire table — massive efficiency gain
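Hudi keeps a commit timeline too, so it can also do point-in-time reads. A small sketch, assuming a recent Hudi release where the as.of.instant read option is available:
# Time travel: read the table as of a specific commit instant on the timeline
spark.read.format("hudi") \
    .option("as.of.instant", "20250715100000") \
    .load("s3://my-lake/orders") \
    .show()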
The Big Comparison
| Feature | Delta Lake | Iceberg | Hudi | S3 Tables |
|---|---|---|---|---|
| Creator | Databricks | Netflix | Uber | AWS |
| Open source | Yes (Apache 2.0) | Yes (Apache 2.0) | Yes (Apache 2.0) | Managed (uses Iceberg) |
| ACID transactions | Yes | Yes | Yes | Yes |
| Time travel | Yes (by version or timestamp) | Yes (by snapshot + time) | Yes | Yes |
| Schema evolution | Yes | Yes (full evolution) | Yes | Yes |
| Partition evolution | No (must rewrite) | Yes (zero rewrite!) | No | Yes (Iceberg) |
| Streaming support | Good | Growing | Best (built for it) | Via engines |
| Engine support | Spark-centric | Widest (nearly every engine) | Spark, Flink, Presto | AWS services |
| Cloud support | Any | Any (most portable) | Any | AWS only |
| Best for | Databricks users | Multi-engine, open | Streaming / CDC | AWS-native, zero ops |
Which One Should You Pick?
The Industry Is Converging on Iceberg
I want to be honest about where the industry is heading. While all three formats are excellent, there's a clear trend:
- AWS chose Iceberg for S3 Tables (their newest product).
- Snowflake chose Iceberg for external tables and Polaris Catalog.
- Google BigQuery supports Iceberg tables natively.
- Databricks now supports reading/writing Iceberg tables alongside Delta Lake, and announced Delta-Iceberg interoperability.
- Confluent (Kafka) chose Iceberg for their Tableflow product.
- Dremio, Starburst, Cloudera — all Iceberg-first.
If you're starting fresh in 2026 and don't have an existing Delta Lake investment, Iceberg is the safest bet. It has the broadest engine support, the most open governance, and the strongest industry momentum.
That said — if you're on Databricks, use Delta Lake. It's excellent, deeply integrated, and Databricks is working on Iceberg compatibility. Don't fight your platform.
Getting Started: Your First Table in 5 Minutes
# The fastest way to try each format:
# ── Delta Lake (via PySpark) ──────────────────
pip install delta-spark pyspark
# Then: df.write.format("delta").save("./my_delta_table")
# ── Iceberg (via Spark + local catalog) ───────
pip install pyspark
# Start Spark with Iceberg runtime JAR and write Iceberg tables
# ── S3 Tables (via AWS CLI) ──────────────────
aws s3tables create-table-bucket --name my-bucket
# Then query via Athena — serverless, no Spark needed
# ── Hudi (via PySpark) ───────────────────────
pip install pyspark
# Start Spark with Hudi JAR and write Hudi tables
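If you want the Delta option above as one complete, copy-pasteable local script, here is a minimal sketch. It writes to a local folder instead of S3, and configure_spark_with_delta_pip is a helper shipped with the delta-spark package that wires the right JARs into the session:
# Minimal local Delta Lake quickstart (assumes: pip install delta-spark pyspark)
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("quickstart")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write and read back a tiny Delta table on local disk
df = spark.createDataFrame([(1, "hello"), (2, "lakehouse")], ["id", "msg"])
df.write.format("delta").mode("overwrite").save("./my_delta_table")
spark.read.format("delta").load("./my_delta_table").show()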
One Last Analogy
If your data files (Parquet) are books, then:
- A raw data lake is a pile of books on the floor. You can add more books, but finding anything requires digging through the entire pile.
- A table format is a bookshelf with a table of contents, an index, and a checkout log. You can find any book instantly, know who borrowed it, see what was on the shelf yesterday, and add new books without disrupting anyone who's currently reading.
- A metastore (Hive, Glue, Unity Catalog) is the library catalogue system that tells you which bookshelf to go to.
Together, they turn a chaotic storage bucket into something that feels like a proper database — but at data lake prices and data lake scale. That's the lakehouse revolution, and now you understand what's actually happening under the hood.