Three Medallion Layers Is Cargo Cult on a Small Team
Every Databricks team treats Medallion as the default architecture. The teams shipping fastest treat it as one option among several, and they earn each extra layer instead of inheriting it.
That distinction is the difference between a platform team running 36 jobs to land one dashboard and a senior engineer running 14 and going home on time.
The reference architecture wasn’t wrong. It was written for a Fortune 500 with hundreds of sources, dozens of consumer teams, and a dedicated platform org maintaining the contract between layers. Most teams reading the blog post have 4 engineers, 15 source tables, and one analytics team. Three layers on that footprint isn’t best practice. It’s cargo cult.
What three layers actually cost on a small footprint
Every layer is not a free abstraction. It charges rent in four currencies, and most teams only notice the first one.
Storage: triple the bytes when most Silver tables are near-copies of Bronze and most Gold tables are denormalized cuts that could be views.
Orchestration: 12 source tables become 36 jobs plus the dependency graph that wires them together. Every job is a new failure point, a new alert, a new on-call page.
Surface area for breakage: schema evolution now has to be propagated through three layers, not one. A column rename in Bronze becomes a three-PR migration.
Cognitive tax on the team: maintenance jobs, expectations, docs, access grants, lineage tooling, and code review of trivial “move one column from Silver to Gold” PRs. This is where senior engineers’ time actually goes.
The tax is invisible per-table. It compounds at the team level. A team of four with 15 sources and three layers is doing the platform work of a team of fifteen, just badly and on the side.
The two patterns that masquerade as “Silver”
Two specific anti-patterns absorb most of the waste. If you can name them in your own pipelines, you’ve already found the layer to collapse.
The pass-through Silver. Read Bronze, cast a few types, maybe drop a malformed row, write to Silver. No business logic, no joins, no enrichment. It adds latency, doubles storage, doubles orchestration, and produces nothing a Bronze read with a CAST in the SELECT couldn’t produce. It’s not a layer. It’s a rename with extra steps.
The single-consumer Gold. One curated table, one dashboard, one analyst. The “curation” is three filters and a SUM. That table doesn’t need its own write, its own schedule, its own retention policy, or its own grants. It needs to be a view on top of Silver (or Bronze). Materializing it is a habit, not a decision.
Both patterns exist because someone treated the three-layer diagram as a checklist instead of a hypothesis to test per table.
The five questions that earn the third layer
Before you let a table claim its own layer, force it to answer these. If it can’t, collapse it.
Does more than one downstream consumer read this table? One dashboard owned by one analyst is not multiple consumers. A view is enough.
Do those consumers have different SLAs? A team that needs hourly freshness and a team that needs daily is the textbook case for a curated layer. A team that needs daily and a team that needs “whenever” is not.
Is there real transformation between this layer and the previous one? Joins, business logic, conformed dimensions, aggregations that aren’t trivial. Casts and renames don’t count.
Does the schema differ in a way the consumer depends on? A stable analytical contract that downstream code is written against. If downstream just SELECTs the same columns Bronze has, you don’t have a contract. You have a copy.
Would collapsing this layer force a change in more than two consumer queries? If the answer is no, the layer is decorative.
A table that scores yes on at least three of these earns its layer. One or two yeses means it’s a view. Zero yeses means delete the job.
The questions above handle the consumer side. Source-side correctness is a separate gate: if the table exists to do CDC current-state via MERGE, dedup against late-arriving keys, or SCD type 2 history, it earns its layer regardless of how it scores. Collapsing those into Bronze pushes the correctness work into every consumer query, which is the opposite of what you want.
The decision is per-domain, not per-platform
The senior shift isn’t “two layers good, three layers bad.” It’s refusing to make the layer count a platform-wide decision in the first place.
Finance pipelines often need three real layers: regulated raw data with retention rules, conformed dimensions with business logic and slowly-changing semantics, then curated marts that serve multiple finance teams with different SLAs. Three layers there is earning its keep.
Marketing pipelines often need two: ingest, then a small set of denormalized tables for dashboards and ad-hoc analysis. A “Silver” in marketing is usually a pass-through pretending to be governance.
Streaming pipelines often need two layers plus per-query materializations: a raw landing zone, an enriched stream, and a small set of materialized aggregates per query pattern. Forcing them into Bronze/Silver/Gold either inflates the architecture or distorts the names until they mean nothing.
The default mental model should be: start with two layers per domain. Add a third only when a specific table answers yes to the questions above. Treat the third layer as a feature you ship for one table at a time, not a foundation you pour for the whole platform.
The cheapest pipeline is the one you didn’t build
Every layer you add is permanent. Every layer you collapse releases storage, jobs, alerts, docs, grants, and review time back to the team. Honest caveat: collapsing a live layer with downstream consumers is a migration, not a delete (deprecation window, compat view, parallel run). Worth doing, but plan it. On a small footprint, the steady-state win is the difference between shipping the dashboard the business asked for last sprint and explaining at standup why it’s still in flight.
The next time someone on your team adds a Silver table, ask them to write down, in one sentence, what it does that Bronze doesn’t. If they can’t, you already know the answer.
What’s the most over-engineered Medallion layer you’ve seen survive a code review?
Premium Further Reading
If this piece reframed Medallion as a per-domain decision instead of a platform default, these are the deep-dives a reader would naturally pick up next. Old posts are auto-archived for premium subscribers only.
This article leans on two moves: killing layers that fail the 5-question test, and refusing to copy Fortune 500 patterns onto a 4-engineer team. The picks below extend both ideas across modeling, code review, debugging, and tech-debt judgment.
Why 3NF Is Killing Your Databricks Dashboards: Stop normalizing your analytical tables. The textbook was written for a different database.
Data Modeling in Databricks: The Complete Guide: From stakeholder interviews to production deployment - the 20% of knowledge that delivers 80% of results
Code Reviews at Scale: What Fortune 500 Companies Actually Do (And Why Your Quick LGTM Is Fine): The 80/20 framework that focuses your attention on schema changes, resource sizing, and failure paths - while safely speeding up the rest
The Databricks Debugging Maturity Ladder: Junior to Principal: How engineers progress through trial-and-error, analysis paralysis, and over-engineering before earning the wisdom to find the one bottleneck that matters
The Databricks Technical Debt Navigation Guide: When to Ship and When to Perfect: The pragmatic judgment framework that senior engineers use to balance velocity and quality - with 8 real scenarios, decision templates, and a 30-day practice plan



