How Liquid Clustering Actually Beats Partitioning + Z-Order

The decision tree for every 2TB+ Delta Lake table

Apr 01, 2026

Half your team wants Liquid Clustering. The other half wants to keep partitioning + Z-ORDER. The debate usually ends with whoever talks loudest, not whoever is right.

The actual answer has nothing to do with preference. It comes down to three things: how stable your query patterns are, how many distinct values your filter columns have, and whether downstream consumers depend on the physical file layout.

Here’s the full decision framework.

The Core Trade-Off

Data Engineer: “We have a 2TB table. Half the team wants Liquid Clustering, the other half wants to keep partitioning + Z-ORDER. How do I actually decide?”

Databricks: “It’s not preference. It’s query patterns.”

Data Engineer: “When does partitioning + Z-ORDER still win?”

Databricks: “When queries always filter on the same low-cardinality column. 50 region values where every query filters by region? With evenly sized partitions, partitioning eliminates about 98% of files (49 of 50 partitions) before Spark reads anything. Z-ORDER then co-locates related rows within each partition for a second filter like date. Hard to beat for stable access patterns.”
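The 98% figure is plain arithmetic. Here is a minimal sketch, assuming files are spread evenly across partitions (the dialogue does not say so, and skewed regions would change the number). The function name is illustrative.

```python
def pruned_fraction(num_partitions: int, partitions_matched: int = 1) -> float:
    """Fraction of files skipped by partition pruning.

    Assumes every partition holds the same number of files,
    which is an assumption for illustration, not a given.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if not 0 <= partitions_matched <= num_partitions:
        raise ValueError("partitions_matched must be between 0 and num_partitions")
    return 1 - partitions_matched / num_partitions


# 50 regions, and the query filters on exactly one of them.
print(f"{pruned_fraction(50):.0%}")
```

A filter that matches five regions instead of one would prune 90% under the same assumption, so the benefit shrinks as filters get less selective.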

Data Engineer: “When does Liquid Clustering win?”

Databricks: “Three signals:

First, your filter columns change. Analysts query by user_id Monday, product_id Friday. Partitioning by either penalizes the other. Liquid Clustering lets you change keys with ALTER TABLE. Future writes and OPTIMIZE use the new keys, no full rewrite.

Second, high cardinality. Partitioning by user_id with 5M distinct values creates 5M folders. Liquid Clustering organizes data without folder explosion.

Third, less maintenance. No manual Z-ORDER column tuning. Writes partially cluster on ingest, OPTIMIZE handles the rest.”
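To make the first signal concrete, here is a small sketch that builds the statements for switching clustering keys. The table name `analytics.events` is hypothetical, and the helper only produces SQL strings; in a notebook you would pass each one to `spark.sql(...)`. The four-key limit reflects Databricks' documented maximum for clustering keys.

```python
MAX_CLUSTER_KEYS = 4  # documented Databricks limit on clustering keys


def cluster_by_statement(table: str, keys: list[str]) -> str:
    """Build an ALTER TABLE ... CLUSTER BY statement for the given keys."""
    if not keys:
        raise ValueError("at least one clustering key is required")
    if len(keys) > MAX_CLUSTER_KEYS:
        raise ValueError(f"at most {MAX_CLUSTER_KEYS} clustering keys are allowed")
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(keys)})"


# Hypothetical table: analysts moved from user_id filters to product_id filters.
statements = [
    cluster_by_statement("analytics.events", ["product_id"]),
    "OPTIMIZE analytics.events",  # clusters data incrementally using the new keys
]
for stmt in statements:
    print(stmt)
```

Existing files keep their old layout until OPTIMIZE gets to them, which is why the key change itself is cheap.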

Data Engineer: “Can I combine them? Partition by date, then cluster within each partition?”

Databricks: “No. Mutually exclusive on the same table. You pick one.

If downstream consumers read by date path, keep partitioning. Otherwise, CLUSTER BY (date, user_id) gives comparable skipping with less overhead.”

Data Engineer: “So new tables default to Liquid Clustering. Existing partitioned tables with stable patterns might not be worth migrating?”

Databricks: “New tables: Liquid Clustering. Existing tables with stable filters and low cardinality: keep what works. The wrong choice isn’t which technology. It’s migrating a working table for no measurable gain.”
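The rules from this exchange can be written down as a short function. This is one reading of the dialogue, not the article's full decision tree, and every name in it is illustrative. In particular, the `LOW_CARDINALITY_LIMIT` cutoff is a hypothetical placeholder; the dialogue only contrasts 50 values with 5M.

```python
from dataclasses import dataclass

LOW_CARDINALITY_LIMIT = 1_000  # hypothetical cutoff, tune for your own tables


@dataclass
class TableProfile:
    is_new_table: bool
    filters_are_stable: bool       # do queries keep filtering on the same columns?
    filter_cardinality: int        # distinct values in the main filter column
    consumers_read_by_path: bool   # do downstream jobs depend on partition folders?


def recommend_layout(p: TableProfile) -> str:
    if p.consumers_read_by_path:
        return "partitioning + Z-ORDER"   # physical layout is part of the contract
    if p.is_new_table:
        return "liquid clustering"        # default for new tables
    if p.filters_are_stable and p.filter_cardinality <= LOW_CARDINALITY_LIMIT:
        return "partitioning + Z-ORDER"   # keep what works, skip the migration
    return "liquid clustering"


# Existing table, stable region filter with 50 values, nobody reads by path.
print(recommend_layout(TableProfile(False, True, 50, False)))
```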

What “Mutually Exclusive” Actually Means in Practice

This is where teams get tripped up. Let’s go deeper.

Data Engineer: “When you say mutually exclusive, what happens if I try to add clustering to a partitioned table?”

Databricks: “You can’t. If a table has partition columns defined, you cannot add CLUSTER BY. You’d need to create a new table with Liquid Clustering and migrate the data. That means a full rewrite of every row.”

Data Engineer: “So for a 2TB table, that’s a meaningful migration cost.”

Databricks: “Exactly. You’re writing 2TB of data to a new table, which means compute cost, downtime or dual-write complexity, and downstream schema changes if anything references the old table. For a table that’s already performing well with partitioning + Z-ORDER, the migration cost often exceeds the benefit.”
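For a sense of what the rewrite looks like, here is a rough sketch that generates the core statements for a copy-based migration using CREATE TABLE ... CLUSTER BY ... AS SELECT. The table names are hypothetical, and the sketch deliberately leaves out everything that makes real migrations hard: validation, cutover, dual writes, and repointing downstream jobs.

```python
def migration_statements(source: str, target: str, keys: list[str]) -> list[str]:
    """Return SQL strings that copy a partitioned table into a clustered one.

    Only builds strings; run them with spark.sql(...) or a SQL editor.
    """
    if not keys:
        raise ValueError("at least one clustering key is required")
    key_list = ", ".join(keys)
    return [
        # Full rewrite: every row of the source is read and written once.
        f"CREATE TABLE {target} CLUSTER BY ({key_list}) AS SELECT * FROM {source}",
        # Optional follow-up pass to finish clustering the new table.
        f"OPTIMIZE {target}",
    ]


for stmt in migration_statements("sales.events", "sales.events_clustered", ["date", "user_id"]):
    print(stmt)
```

The first statement is where the 2TB cost lands, since it reads and writes the entire table.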

Data Engineer: “What about Z-ORDER by itself, without partitioning? Is that also mutually exclusive with Liquid Clustering?”

Databricks: “Yes. You can’t run OPTIMIZE ZORDER BY on a table that has CLUSTER BY defined. They’re different implementations of the same goal: organizing data within files for efficient skipping. Both rely on space-filling curves (Liquid Clustering uses Hilbert curves), but Liquid Clustering adds a tree-based algorithm on top that optimizes for balanced file sizes and handles cardinality and data skew automatically. The practical difference: Z-ORDER requires you to manually pick columns and rerun OPTIMIZE. Liquid Clustering handles layout decisions incrementally and lets you change keys without a full rewrite.”
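If "space-filling curve" sounds abstract, the toy sketch below shows the idea using the classic Z-order (Morton) curve. It maps two column values to one sortable number so that rows close in both dimensions end up close in the sort order. This is a textbook illustration only, not the Hilbert curve and not Databricks' actual implementation.

```python
def morton2(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code


# Sort a 4x4 grid of (x, y) pairs by their Z-order key.
points = [(x, y) for y in range(4) for x in range(4)]
ordered = sorted(points, key=lambda p: morton2(*p))
print(ordered[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

The first four points form a 2x2 block, so a file holding them covers a narrow range in both columns. That narrowness is what makes min/max skipping effective on more than one filter column.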

Data Engineer: “What about existing partitioned tables where the query patterns HAVE shifted?”

Databricks: “That’s where migration is justified. If your team added 3 new dashboards this quarter and each one filters on different columns than the original partition key, you’re already paying the penalty. Every query that doesn’t filter on the partition column gets no partition pruning: it has to consider files in every partition and can skip only what file-level statistics allow. Liquid Clustering would let you cluster on the columns people actually query.”
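A toy model of file skipping shows why the mismatch hurts. The file statistics below are made up for illustration. The model assumes the engine can skip a file only when the filter value falls outside that file's min/max range for the column.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    region: str        # partition value (or "mixed" when not partitioned)
    user_id_min: int
    user_id_max: int


def files_to_read(files: list[FileStats], user_id: int) -> list[FileStats]:
    """Keep only files whose min/max range could contain the user_id."""
    return [f for f in files if f.user_id_min <= user_id <= f.user_id_max]


# Partitioned by region: each region's file spans almost the whole user_id range.
partitioned = [FileStats(r, 1, 5_000_000) for r in ("emea", "amer", "apac")]

# Clustered on user_id: each file covers a narrow, non-overlapping range.
clustered = [
    FileStats("mixed", 1, 1_666_666),
    FileStats("mixed", 1_666_667, 3_333_333),
    FileStats("mixed", 3_333_334, 5_000_000),
]

print(len(files_to_read(partitioned, 42)), len(files_to_read(clustered, 42)))  # 3 1
```

Both layouts hold the same rows in the same number of files, but only the clustered one lets a user_id filter rule files out.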

The Decision Tree

Here’s the framework that resolves the debate in under 2 minutes.
