The Paid Subscriber Reading List
17 deep-dives available to Pro subscribers: interviews at every salary band, Spark internals, and the career moves between $150k and $250k.
If you’re new here, welcome. If you’ve been around a while, this is a curated map back to the posts that have done the most for paid subscribers. Worth bookmarking either way.
The 17 posts are grouped into the three things subscribers come here for most: interview prep at every salary band, Spark and Delta internals deep enough to survive onsite questioning, and the career patterns that separate $150k offers from $250k ones.
Pick the section that matches where you’re stuck right now.
If you’re prepping for a Databricks interview
Six posts that map the question types at every salary band, plus the meta-framework for what interviewers are actually scoring.
Spark Performance Interview Questions: What Interviewers Actually Evaluate - Most candidates describe what they did. This post breaks down what interviewers are scoring when they ask “how would you tune this slow job,” so you stop reciting features and start sounding senior. Same answer, different framing, different offer.
10 Interview Questions for Junior Databricks DE Roles ($80-100k) - The questions hiring managers ask at the entry tier, with the answers that pass and the ones that get cut. Calibrates the bar so you stop over-preparing for senior questions you won’t get, and under-preparing for the basics they actually test.
10 Popular Interview Questions for Mid-Level Databricks DE Roles ($120-150k) - The middle tier is the trap. Same questions as junior, but the bar is “can you make a decision under trade-offs” instead of “can you name the feature.” Covers the questions that separate offers from polite rejections at this band.
10 Interview Questions for Senior Databricks DE Roles ($180-220k) - The questions sound deceptively easy: “Tell me about a time you optimized a pipeline.” The post shows what senior-tier answers actually contain (constraints, trade-offs, what you’d do differently next time) and how candidates who give the same answer at a junior level lose the offer.
10 Delta Lake Questions for $150k+ DE Roles - The Delta Lake Q&A you need before any senior loop touching ingestion, streaming, or governance: ACID semantics, time travel, schema evolution, and MERGE behavior. Answers that hold up when the interviewer pushes past the surface.
Partition Pruning vs Data Skipping: A Databricks Interview Deep Dive - The one topic candidates confidently get wrong on every onsite. Covers the distinction interviewers expect you to know (and most don’t), and how to answer it in a way that signals production experience instead of textbook recall.
If you want to go deep on Spark and Databricks internals
Six posts that read like reference material. The ones subscribers come back to weeks later when something in production stops making sense.
How Liquid Clustering Actually Works in Databricks - The most-read post on this Substack. Covers what’s happening under the clustering keys, why Z-Order is now legacy, when partitioning still wins, and the case where Liquid Clustering is the wrong choice. The post that taught most readers the difference between knowing the feature and actually using it.
How Liquid Clustering Actually Beats Partitioning + Z-Order - The benchmark you’d run yourself if you had time. Side-by-side comparison on real workloads showing where Liquid wins, where it ties, and where the legacy approach is still faster. Settles the “should I migrate” question with numbers.
How One SQL Line Becomes 4,000 Parallel Tasks - Query, to logical plan, to physical plan, to tasks, end to end. Shows what Spark actually does between you typing SELECT and tasks running on the executors. It’s the mental model interview panels probe for, and the one production work depends on.
Understanding Spark Shuffle: The Complete Architecture Guide - The reference for why your job is slow. Covers what physically happens during a shuffle, why most performance problems are shuffle problems, and the patterns that avoid them in production. If you only read one technical post, read this one.
Inside the Delta Lake Transaction Log: From Write to Time Travel - How Delta achieves ACID, schema evolution, and time travel using only JSON and Parquet. The deep-dive that explains why MERGE INTO behaves the way it does and what’s actually sitting in the _delta_log/ folder.
The Databricks Compute Selection Guide: Jobs, All-Purpose, SQL Warehouses, Serverless - A decision tree for picking the right compute type. Covers when each saves money, when each costs you more, and how to defend the choice in a code review or architecture interview.
If you’re trying to advance to the next level
Five posts on the patterns that separate the salary bands. These are the ones senior engineers say they wish they’d read three years earlier.
The Databricks Debugging Maturity Ladder: Junior to Principal - The same broken pipeline, debugged five different ways. Shows how each level approaches a production failure. The ladder is what gets you promoted, not more YAML.
The Databricks Hiring Manager’s Guide: Why Certifications Don’t Matter - What hiring managers actually weigh after the resume screen. Includes what gets you to the next round, what doesn’t, and why leaning on certifications keeps getting candidates filtered out before the technical loop even starts.
The $150k vs $250k Databricks Data Engineer: Why Technical Skills Aren’t Enough - The gap between bands isn’t more SQL. Covers the specific behaviors and decision patterns that move you from mid-tier to senior bands, with examples from real promotion cases.
Why Two Correct Answers Get Different Salary Bands - Same technical answer, different framing, different offer. Breaks down how senior framing signals production credibility, and why “correct” isn’t enough at the $200k+ tier.
What 3 Years of Databricks Experience Actually Means - What three years of “Databricks experience” looks like to a hiring manager versus what most engineers think it means. Calibrates whether you’re actually on the senior track or have just stayed at the same job for three years.
How to use this list
If you’re prepping for an onsite in the next month, start at the top of the Interview Prep section and work down. If you’re trying to get promoted at your current job, start with the Debugging Maturity Ladder and the $150k vs $250k post. If something in production is slow and you don’t know why, start with Spark Shuffle.
Most of the posts above are Pro-only. Pro is $9/month and includes 1-2 new posts per week, the full archive, and the Senior Databricks Data Engineer Interview Cheat Sheet - the prep kit I built for $180k+ onsite loops. New subscribers tell me they read the archive in the first weekend, and the math works out fast.
The rest of the brand
The Substack is one piece. Here’s everything else I’ve built for Databricks data engineers, in case the format you want isn’t long-form posts.
Free tools - start here if you’re not ready to subscribe
DataDojo - Duolingo-style daily practice. 633 exercises across 7 zones, certification prep for both DE Associate and Professional exams, streaks, leaderboards. Free forever, runs in the browser or as a PWA on your phone.
Databricks 100 - the 100 must-know concepts, structured as a self-score checklist. Find your gaps before an interview or a promo cycle.
Awesome Databricks - 170+ curated tools, courses, creators, labs, and communities. The directory I wish existed when I started.
Databricks Code Practice - 104 LeetCode-style reps + 5 end-to-end pipeline labs (DLT streaming, fintech monitoring, PySpark cert prep). Everything runs on Databricks Free Edition.
Other paid products - if a one-time purchase fits better than a subscription
Interview Cheat Sheets by level - Junior ($95K-$135K) and Mid ($135K-$180K) are $9 each at launch ($19 regular). The Senior cheat sheet ($175K-$210K+) is included free with your Pro subscription. After subscribing, you’ll get an email with the login link to access it.
The full Cheat Sheet Bundle - all three levels, 114 questions, 15 decision frameworks, 45 red flags, 3 day-of checklists. $24 at launch ($39 regular). Best value if you’re between bands or coaching someone.
Listen and follow
The Databricks Data Engineer podcast - weekly episodes on Spotify, Apple, and YouTube. Same topics, different format, easier on the commute.
LinkedIn (@jrlasak) - 3 posts a week, Mon/Wed/Fri. Free Substack content lives here too.
Full follow page - every channel in one place.
Hit reply on any post and tell me what you’re stuck on. I read every one.
- Jakub

