The Databricks Data Engineer
Execute Like Seniors
Your Slowest Spark Job Is Your Cheapest Lesson
Scaling the cluster throws away the answer it just gave you
Jun 4 • Jakub Lasak
Why a Bigger Cluster Won't Fix Your Spark OOM
An OOM is one bucket overflowing, not total RAM
May 27 • Jakub Lasak
Three Medallion Layers Is Cargo Cult on a Small Team
Every Databricks team treats Medallion as the default architecture.
May 23 • Jakub Lasak
6 Steps to Cut 30-50% of Databricks Cluster Waste
The audit takes an afternoon, the savings last quarters
May 12 • Jakub Lasak
Why 3NF Is Killing Your Databricks Dashboards
Stop normalizing your analytical tables. The textbook was written for a different database.
Apr 13 • Jakub Lasak
How Liquid Clustering Actually Beats Partitioning + Z-Order
The decision tree for every 2TB+ Delta Lake table
Apr 1 • Jakub Lasak
6 Data Quality Checks I Build Into Every Databricks Pipeline
The silent failures that broke executive dashboards for 5 days
Mar 18 • Jakub Lasak
How Delta Lake Achieves ACID Using Only JSON Files
The _delta_log commit protocol in 10 minutes
Mar 11 • Jakub Lasak
How One SQL Line Becomes 4,000 Parallel Tasks
The complete guide to Spark’s execution engine - from SQL to parallel tasks, and how to find bottlenecks when things get slow.
Feb 27 • Jakub Lasak
Code Reviews at Scale: What Fortune 500 Companies Actually Do (And Why Your Quick LGTM Is Fine)
The 80/20 framework that focuses your attention on schema changes, resource sizing, and failure paths - while safely speeding up the rest
Feb 11 • Jakub Lasak
How Liquid Clustering Actually Works in Databricks
Why automatic data organization replaces weekly OPTIMIZE jobs, eliminates partition explosions, and lets you change clustering keys without rewriting…
Jan 28 • Jakub Lasak
The Databricks Compute Selection Guide: Jobs, All-Purpose, SQL Warehouses, and Serverless
Why scheduled jobs on All-Purpose clusters are bleeding thousands of dollars a month - and how to fix it in 15 minutes
Jan 14 • Jakub Lasak