Okay, let me start from the very beginning. Forget everything about Delta Lake, Iceberg, or Hudi for a moment. Let's start with a question that seems silly but is actually the foundation of this entire topic:

What happens when you save data to a file?

You write a CSV. You save it. Done. Now imagine a hundred people need to read that CSV at the same time. And while they're reading, someone else is updating it. And another person is deleting rows. And someone wants to see what the data looked like yesterday, not today.

That CSV is going to have a bad day.

This is the problem that table formats solve. They take dumb files sitting on storage and give them superpowers — like a database, but without an actual database server.

Let's Start Simple: What is a File? What is a Table?

When your data lives in cloud storage (like Amazon S3, Google Cloud Storage, or Azure Blob), it's stored as files. These files can be in different formats:

Raw Files vs Table Format

📄 Raw Files (CSV, Parquet, JSON) — just files in a folder on S3:
  • No schema enforcement — columns can be anything
  • No transactions — partial writes can corrupt data
  • No time travel — once overwritten, old data is gone
  • No updates/deletes — only append or full rewrite

📊 Table Format (Delta, Iceberg, Hudi) — a "table" abstraction over files on S3:
  • Schema enforced — rejects bad data automatically
  • ACID transactions — writes are all-or-nothing
  • Time travel — query data as it was 3 days ago
  • Updates & deletes — just like a regular database

Think of it this way: a table format is a layer of intelligence that sits on top of files. The files are still Parquet files on S3 — but the table format adds a transaction log, schema tracking, and metadata that makes these files behave like a database table.

Why Can't We Just Use a Database?

Fair question. If you want transactions, schema, and updates — why not just use PostgreSQL or MySQL?

The answer is scale and cost. When you have 10 TB of data, a traditional database costs a fortune and is slow. Cloud object storage (S3) is 10-100x cheaper and can hold petabytes. But raw S3 has none of the nice features of a database. Table formats give you the best of both worlds:
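To make the cost gap concrete, here's a back-of-envelope comparison for 10 TB. The per-GB prices below are illustrative assumptions, not quotes — object storage is commonly around $0.02–0.03/GB-month, while provisioned database storage is several times that (and the larger 10–100x gap in practice comes from compute, licensing, and replicas, not storage alone):

```python
# Back-of-envelope monthly storage cost for 10 TB.
# Prices are illustrative assumptions; actual rates vary by region and tier.
TB = 1024  # GB per TB

s3_monthly = 10 * TB * 0.023   # object storage, ~$0.023/GB-month assumed
db_monthly = 10 * TB * 0.10    # provisioned DB storage, ~$0.10/GB-month assumed

print(f"S3:       ${s3_monthly:,.0f}/month")
print(f"Database: ${db_monthly:,.0f}/month")
print(f"Ratio:    {db_monthly / s3_monthly:.1f}x on storage alone")
```

Even before counting database compute, storage alone is severalfold cheaper — and S3 scales past the point where a single database server gives up.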

Database vs Data Lake vs Lakehouse (Table Formats)

Feature              Database            Raw Data Lake       Lakehouse (Table Format)
Storage cost         Expensive           Cheapest (S3)       Cheapest (S3)
ACID transactions    Yes                 No                  Yes
Schema enforcement   Yes                 No                  Yes
Time travel          Limited             No                  Yes (days/weeks)
Scale                TBs (practical)     Petabytes           Petabytes
UPDATE / DELETE      Yes                 No (append only)    Yes

How Table Formats Actually Work (Under the Hood)

Every table format works the same basic way. Instead of just dumping files, they maintain a metadata layer — usually a log file or a set of manifest files — that tracks which data files belong to the table, what the schema is, and what version of the data you're looking at.

How a Table Format Organises Data
  Transaction Log / Metadata   — JSON or Avro files tracking which data files are
                                 "current", the schema version, and commit history
  Manifest Files (pointers)    — lists of data file paths + partition info +
                                 file-level statistics (min/max values, row counts)
  Data Files (actual data)     — Parquet files containing the rows and columns;
                                 these are what query engines actually read
  Cloud Object Storage         — everything lives here as plain files (S3 / GCS /
                                 ADLS); no database server, just storage
# What a Delta Lake table looks like on S3:
s3://my-lake/sales/revenue/
  _delta_log/                    # Transaction log (the magic)
    00000000000000000000.json     # Version 0: initial table creation
    00000000000000000001.json     # Version 1: first data insert
    00000000000000000002.json     # Version 2: update some rows
    00000000000000000003.json     # Version 3: delete expired rows
  part-00000-abc123.parquet      # Data file (current)
  part-00001-def456.parquet      # Data file (current)
  part-00002-ghi789.parquet      # Data file (old, superseded by version 3)

# What an Iceberg table looks like on S3:
s3://my-lake/sales/revenue/
  metadata/
    v1.metadata.json             # Table metadata (schema, partitioning)
    v2.metadata.json             # Updated metadata after schema change
    snap-001-abc.avro            # Snapshot manifest list
  data/
    year=2025/quarter=3/
      file-001.parquet           # Data file
      file-002.parquet           # Data file

# The key insight: the DATA FILES are the same (Parquet).
# The METADATA LAYER is what makes Delta vs Iceberg different.
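Those file-level statistics in the manifests are what enable "file skipping": before opening a single Parquet file, the engine compares the query's predicate against each file's min/max values and discards files that can't possibly match. A minimal sketch of the idea, using hypothetical manifest entries:

```python
# Toy file-skipping: prune data files whose [min, max] range for a column
# can't satisfy the predicate. Table formats store these stats in the
# metadata layer, so pruned files are never opened at all.
files = [  # hypothetical manifest entries: (path, min_amount, max_amount)
    ("file-001.parquet", 1.00, 99.99),
    ("file-002.parquet", 100.00, 499.99),
    ("file-003.parquet", 500.00, 2400.00),
]

def files_to_scan(predicate_min):
    """For a predicate 'amount > predicate_min', keep files whose max exceeds it."""
    return [path for path, lo, hi in files if hi > predicate_min]

# Query: SELECT ... WHERE amount > 450
print(files_to_scan(450))  # → ['file-002.parquet', 'file-003.parquet']
```

Two of three files survive; on a real table with millions of files, this pruning is the difference between scanning gigabytes and scanning petabytes.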

Delta Lake — The Databricks Standard

Delta Lake was created by Databricks and is the default table format on their platform. It's the most mature table format and the most widely used in the Spark ecosystem.

Key idea: Delta Lake uses a JSON-based transaction log (_delta_log/) to track every change to the table. Every write operation creates a new log entry. Reading the log tells you exactly which Parquet files make up the current version of the table.
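The log-replay idea can be sketched in a few lines of plain Python. This is a deliberately simplified model — real Delta commits carry schema, statistics, and much more — but the core mechanism is just this: each commit records "add" and "remove" actions, and the current file set (or any past version, which is exactly how time travel works) falls out of replaying them:

```python
import json

# Toy replay of a Delta-style transaction log. Each commit is a JSON list
# of add/remove actions; the live file set is the adds minus the removes.
# (Hypothetical, heavily simplified commits for illustration.)
commits = [
    '[{"add": "part-00000.parquet"}, {"add": "part-00001.parquet"}]',    # v0
    '[{"add": "part-00002.parquet"}]',                                   # v1
    '[{"remove": "part-00001.parquet"}, {"add": "part-00003.parquet"}]', # v2
]

def current_files(commits, as_of_version=None):
    """Replay the log up to a version; None means latest. Time travel = stop early."""
    live = set()
    for version, commit in enumerate(commits):
        if as_of_version is not None and version > as_of_version:
            break
        for action in json.loads(commit):
            if "add" in action:
                live.add(action["add"])
            if "remove" in action:
                live.discard(action["remove"])
    return sorted(live)

print(current_files(commits))                   # latest version of the table
print(current_files(commits, as_of_version=0))  # time travel back to v0
```

Note that "remove" doesn't delete the Parquet file from S3 — it only drops it from the live set, which is why older versions remain readable until they're vacuumed.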

# PySpark: Create a Delta table
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("DeltaExample") \
    .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.2.0") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()

# Create a DataFrame
data = [
    (1, "Alice", 120000, "Engineering"),
    (2, "Bob", 95000, "Marketing"),
    (3, "Charlie", 140000, "Engineering"),
]
df = spark.createDataFrame(data, ["id", "name", "salary", "department"])

# Write as a Delta table
df.write.format("delta").mode("overwrite").save("s3://my-lake/employees")

# Read it back
employees = spark.read.format("delta").load("s3://my-lake/employees")
employees.show()

# UPDATE: Give everyone in Engineering a 10% raise
from delta.tables import DeltaTable

dt = DeltaTable.forPath(spark, "s3://my-lake/employees")
dt.update(
    condition="department = 'Engineering'",
    set={"salary": "salary * 1.10"}
)

# DELETE: Remove Bob
dt.delete("name = 'Bob'")

# TIME TRAVEL: What did the table look like 2 versions ago?
old_data = spark.read.format("delta") \
    .option("versionAsOf", 0) \
    .load("s3://my-lake/employees")
old_data.show()
# Shows the original data before updates and deletes!

# SCHEMA EVOLUTION: Add a new column
from pyspark.sql.functions import lit
new_data = spark.createDataFrame(
    [(4, "Diana", 110000, "Sales", "2025-01-15")],
    ["id", "name", "salary", "department", "hire_date"]
)
new_data.write.format("delta") \
    .mode("append") \
    .option("mergeSchema", "true") \
    .save("s3://my-lake/employees")
# The table now has a "hire_date" column — old rows have NULL

Apache Iceberg — The Open Standard

Iceberg was created by Netflix and donated to Apache. It's designed to be engine-agnostic — it works with Spark, Trino, Flink, Dremio, Athena, BigQuery, Snowflake, and many more. If you want maximum portability across engines and clouds, Iceberg is your best bet.

Key difference from Delta: Iceberg uses a tree of metadata files (snapshot → manifest list → manifest → data files) instead of a linear transaction log. This makes it faster for tables with millions of files because it doesn't need to read every log entry from the beginning.
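The tree shape is what makes query planning cheap: instead of replaying history, the engine walks from one snapshot down through manifests, pruning whole branches along the way. A toy in-memory model of that walk (real Iceberg stores these layers as Avro files, and the structure here is a hypothetical simplification):

```python
# Toy Iceberg-style metadata tree: a snapshot points to a manifest list,
# and each manifest covers a partition's data files. Planning a query is
# a tree walk that can skip entire manifests — no log replay needed.
snapshot = {
    "snapshot_id": 1,
    "manifest_list": [
        {"partition": "2025-07-14", "data_files": ["file-001.parquet"]},
        {"partition": "2025-07-15", "data_files": ["file-002.parquet",
                                                   "file-003.parquet"]},
    ],
}

def plan_scan(snapshot, partition=None):
    """Walk the tree, pruning whole manifests whose partition doesn't match."""
    files = []
    for manifest in snapshot["manifest_list"]:
        if partition is None or manifest["partition"] == partition:
            files.extend(manifest["data_files"])
    return files

print(plan_scan(snapshot, partition="2025-07-15"))  # never touches 07-14's manifest
```

With millions of files, skipping a manifest skips thousands of file entries at once — that's why Iceberg planning stays fast as tables grow.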

# PySpark: Create an Iceberg table
spark = SparkSession.builder \
    .appName("IcebergExample") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0") \
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.my_catalog.type", "hadoop") \
    .config("spark.sql.catalog.my_catalog.warehouse", "s3://my-lake/iceberg/") \
    .getOrCreate()

# Create table using SQL
spark.sql("""
    CREATE TABLE my_catalog.sales.revenue (
        transaction_id BIGINT,
        customer_id BIGINT,
        amount DECIMAL(10,2),
        currency STRING,
        created_at TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(created_at))
""")
# Notice: PARTITIONED BY days(created_at) — Iceberg supports
# partition TRANSFORMS (days, months, hours, bucket, truncate).
# No need to create extra partition columns!

# Insert data
spark.sql("""
    INSERT INTO my_catalog.sales.revenue VALUES
    (1, 101, 99.99, 'USD', TIMESTAMP '2025-07-15 10:30:00'),
    (2, 102, 149.50, 'USD', TIMESTAMP '2025-07-15 11:45:00')
""")

# Time travel using snapshots
spark.sql("""
    SELECT * FROM my_catalog.sales.revenue
    VERSION AS OF 1  -- snapshot ID (real IDs are long generated numbers)
""")

# Or by timestamp:
spark.sql("""
    SELECT * FROM my_catalog.sales.revenue
    TIMESTAMP AS OF '2025-07-14 00:00:00'
""")

# Schema evolution (add a column — no rewrite needed!)
spark.sql("""
    ALTER TABLE my_catalog.sales.revenue
    ADD COLUMN payment_method STRING
""")

# Partition evolution (change partitioning without rewriting data!)
spark.sql("""
    ALTER TABLE my_catalog.sales.revenue
    ADD PARTITION FIELD months(created_at)
""")
# Old data stays partitioned by day. New data partitioned by month.
# Both are queryable seamlessly. This is Iceberg's killer feature.
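What a partition transform actually computes is simple: a derived value from the column itself, so you filter on `created_at` and never manage partition columns by hand ("hidden partitioning"). A sketch of `days()` and `months()` — note the real Iceberg spec defines these precisely (days since epoch, months since epoch), and this is just an illustrative reimplementation:

```python
from datetime import datetime, timezone

# Sketch of Iceberg-style partition transforms: the partition value is
# computed from the column, so partitioning stays hidden from queries.
def days_transform(ts: datetime) -> int:
    """days(col): whole days elapsed since the Unix epoch."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (ts - epoch).days

def months_transform(ts: datetime) -> int:
    """months(col): whole months elapsed since the Unix epoch."""
    return (ts.year - 1970) * 12 + (ts.month - 1)

ts = datetime(2025, 7, 15, 10, 30, tzinfo=timezone.utc)
print(days_transform(ts))    # → 20284
print(months_transform(ts))  # → 666
```

Two rows written minutes apart on the same day land in the same `days()` partition; after the `ADD PARTITION FIELD months(created_at)` evolution above, new rows get the coarser `months()` value instead — and no old file moves.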

AWS S3 Tables — The New Kid (2024)

S3 Tables is Amazon's newest offering (announced at re:Invent 2024). It builds Apache Iceberg support directly into S3 itself. Instead of managing Iceberg metadata files yourself, S3 handles it natively. You interact with "table buckets" instead of regular buckets.

# S3 Tables: Create a table bucket
aws s3tables create-table-bucket \
  --name my-analytics-bucket \
  --region us-east-1

# Create a namespace (like a database/schema)
aws s3tables create-namespace \
  --table-bucket-arn arn:aws:s3tables:us-east-1:123456789:bucket/my-analytics-bucket \
  --namespace sales

# Create a table (Iceberg, managed by S3)
aws s3tables create-table \
  --table-bucket-arn arn:aws:s3tables:us-east-1:123456789:bucket/my-analytics-bucket \
  --namespace sales \
  --name revenue \
  --format ICEBERG

# Benefits:
# 1. S3 automatically manages Iceberg metadata (no manual compaction)
# 2. S3 handles snapshot management and garbage collection
# 3. Up to 3x faster queries vs self-managed Iceberg on S3
# 4. Up to 10x more transactions/second vs regular S3
# 5. Works with Athena, EMR, Redshift, Glue — any Iceberg client

# Query via Athena:
# SELECT SUM(amount) FROM sales.revenue WHERE created_at > '2025-01-01';
# S3 Tables + Athena = serverless data warehouse

Apache Hudi — The Streaming-First Format

Apache Hudi (Hadoop Upserts Deletes and Incrementals) was created by Uber. Its superpower is incremental processing — efficiently processing only the rows that changed since the last read. This makes it great for near-real-time data pipelines.
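The incremental-read idea boils down to a filter on commit time: every record carries the instant of its last write (Hudi stores this in its `_hoodie_commit_time` metadata column), and an incremental query returns only records committed after your checkpoint. A toy model with hypothetical rows:

```python
# Toy incremental read: filter rows by the commit time of their last write,
# instead of rescanning the whole table on every sync.
# (Hypothetical rows; Hudi tracks this in its _hoodie_commit_time column.)
rows = [
    {"order_id": 1, "_hoodie_commit_time": "20250714120000", "status": "shipped"},
    {"order_id": 2, "_hoodie_commit_time": "20250715103000", "status": "created"},
    {"order_id": 3, "_hoodie_commit_time": "20250715114500", "status": "created"},
]

def incremental_read(rows, begin_instant):
    """Return only rows committed strictly after the checkpoint instant."""
    return [r for r in rows if r["_hoodie_commit_time"] > begin_instant]

changed = incremental_read(rows, "20250715100000")
print([r["order_id"] for r in changed])  # → [2, 3]
```

Your pipeline just remembers the last instant it processed and passes it as the next checkpoint — the same pattern the `hoodie.datasource.read.begin.instanttime` option expresses below.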

# Hudi is great for streaming use cases like:
# - CDC (Change Data Capture) from databases
# - Real-time event processing
# - Incremental ETL pipelines

# PySpark: Write a Hudi table
df.write.format("hudi") \
    .option("hoodie.table.name", "orders") \
    .option("hoodie.datasource.write.recordkey.field", "order_id") \
    .option("hoodie.datasource.write.precombine.field", "updated_at") \
    .option("hoodie.datasource.write.operation", "upsert") \
    .mode("append") \
    .save("s3://my-lake/orders")

# Read only the changes since last sync (incremental query):
spark.read.format("hudi") \
    .option("hoodie.datasource.query.type", "incremental") \
    .option("hoodie.datasource.read.begin.instanttime", "20250715100000") \
    .load("s3://my-lake/orders") \
    .show()
# Returns ONLY rows that changed after July 15, 10:00 AM
# Instead of re-reading the entire table — massive efficiency gain

The Big Comparison

Table Format Comparison

Feature               Delta Lake          Iceberg                Hudi                  S3 Tables
Creator               Databricks          Netflix                Uber                  AWS
Open source           Yes (Apache 2.0)    Yes (Apache 2.0)       Yes (Apache 2.0)      Managed (uses Iceberg)
ACID transactions     Yes                 Yes                    Yes                   Yes
Time travel           Yes (by version)    Yes (snapshot + time)  Yes                   Yes
Schema evolution      Yes                 Yes (full evolution)   Yes                   Yes
Partition evolution   No (must rewrite)   Yes (zero rewrite!)    No                    Yes (Iceberg)
Streaming support     Good                Growing                Best (built for it)   Via engines
Engine support        Spark-centric       Widest (every engine)  Spark, Flink, Presto  AWS services
Cloud support         Any                 Any (most portable)    Any                   AWS only
Best for              Databricks users    Multi-engine, open     Streaming / CDC       AWS-native, zero ops

Which One Should You Pick?

Which Table Format Should You Use?

What's your situation?
  • Using Databricks?           → Delta Lake — native, best integration
  • Multi-engine / multi-cloud? → Apache Iceberg — most portable, open standard
  • Real-time CDC / streaming?  → Apache Hudi — built for incremental

The Industry Is Converging on Iceberg

I want to be honest about where the industry is heading. While all three formats are excellent, there's a clear trend:

  • AWS chose Iceberg for S3 Tables (their newest product).
  • Snowflake chose Iceberg for external tables and Polaris Catalog.
  • Google BigQuery supports Iceberg tables natively.
  • Databricks now supports reading/writing Iceberg tables alongside Delta Lake, and announced Delta-Iceberg interoperability.
  • Confluent (Kafka) chose Iceberg for their Tableflow product.
  • Dremio, Starburst, Cloudera — all Iceberg-first.

If you're starting fresh in 2026 and don't have an existing Delta Lake investment, Iceberg is the safest bet. It has the broadest engine support, the most open governance, and the strongest industry momentum.

That said — if you're on Databricks, use Delta Lake. It's excellent, deeply integrated, and Databricks is working on Iceberg compatibility. Don't fight your platform.

Getting Started: Your First Table in 5 Minutes

# The fastest way to try each format:

# ── Delta Lake (via PySpark) ──────────────────
pip install delta-spark pyspark
# Then: df.write.format("delta").save("./my_delta_table")

# ── Iceberg (via Spark + local catalog) ───────
pip install pyspark
# Start Spark with Iceberg runtime JAR and write Iceberg tables

# ── S3 Tables (via AWS CLI) ──────────────────
aws s3tables create-table-bucket --name my-bucket
# Then query via Athena — serverless, no Spark needed

# ── Hudi (via PySpark) ───────────────────────
pip install pyspark
# Start Spark with Hudi JAR and write Hudi tables

One Last Analogy

If your data files (Parquet) are books, then:

  • A raw data lake is a pile of books on the floor. You can add more books, but finding anything requires digging through the entire pile.
  • A table format is a bookshelf with a table of contents, an index, and a checkout log. You can find any book instantly, know who borrowed it, see what was on the shelf yesterday, and add new books without disrupting anyone who's currently reading.
  • A metastore (Hive, Glue, Unity Catalog) is the library catalogue system that tells you which bookshelf to go to.

Together, they turn a chaotic storage bucket into something that feels like a proper database — but at data lake prices and data lake scale. That's the lakehouse revolution, and now you understand what's actually happening under the hood.