The Databricks Data Engineer
Execute Like Seniors
Why 3NF Is Killing Your Databricks Dashboards
Stop normalizing your analytical tables. The textbook was written for a different database.
Apr 13
•
Jakub Lasak
How Liquid Clustering Actually Beats Partitioning + Z-Order
The decision tree for every 2TB+ Delta Lake table
Apr 1
•
Jakub Lasak
6 Data Quality Checks I Build Into Every Databricks Pipeline
The silent failures that broke executive dashboards for 5 days
Mar 18
•
Jakub Lasak
How Delta Lake Achieves ACID Using Only JSON Files
The _delta_log commit protocol in 10 minutes
Mar 11
•
Jakub Lasak
How One SQL Line Becomes 4,000 Parallel Tasks
The complete guide to Spark’s execution engine - from SQL to parallel tasks, and how to find bottlenecks when things get slow.
Feb 27
•
Jakub Lasak
Code Reviews at Scale: What Fortune 500 Companies Actually Do (And Why Your Quick LGTM Is Fine)
The 80/20 framework that focuses your attention on schema changes, resource sizing, and failure paths - while safely speeding up the rest
Feb 11
•
Jakub Lasak
How Liquid Clustering Actually Works in Databricks
Why automatic data organization replaces weekly OPTIMIZE jobs, eliminates partition explosions, and lets you change clustering keys without rewriting…
Jan 28
•
Jakub Lasak
The Databricks Compute Selection Guide: Jobs, All-Purpose, SQL Warehouses, and Serverless
Why scheduled jobs on All-Purpose clusters are bleeding thousands of dollars a month - and how to fix it in 15 minutes
Jan 14
•
Jakub Lasak
The Databricks Debugging Maturity Ladder: Junior to Principal
How engineers progress through trial-and-error, analysis paralysis, and over-engineering before earning the wisdom to find the one bottleneck that…
Jan 7
•
Jakub Lasak
The Spark Cluster Parallelism Guide: Why 32 Nodes Can Be Slower Than 4
The task-to-core ratio that separates $600/month waste from systematic diagnosis - with the complete decision framework for repartition vs. scale
Dec 31, 2025
•
Jakub Lasak
Understanding Spark Shuffle: The Complete Architecture Guide
Why your 10-minute job became 2 hours, how data skew causes 100GB → 500GB inflation, and the three-phase mechanism behind every GROUP BY
Dec 23, 2025
•
Jakub Lasak
Inside the Delta Lake Transaction Log: From Write to Time Travel
How 2KB JSON files control your 10TB table, and why OPTIMIZE temporarily doubles your storage before making queries faster
Dec 18, 2025
•
Jakub Lasak