The Databricks Data Engineer

How Spark Shuffle Actually Works

A simple educational dialogue

Jakub Lasak
Dec 02, 2025

Data Engineer:

“My job is slow because of Shuffle. I know it moves data, but how?”

Spark:

“It’s a physical relocation process. Let me explain.

It’s not just a network transfer.

It’s a 3-phase process that hits every resource you have: CPU, Disk, Memory, and Network.

Think of it like reorganizing 1TB of data across 100 machines.”
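
To put rough numbers on that picture, here is a back-of-envelope sketch in Python. It assumes (this is not stated in the dialogue) that the 1TB is spread evenly across the 100 machines and that every row is equally likely to be routed to any machine. Real jobs are rarely this uniform, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-envelope estimate of network traffic in a full shuffle.
# Assumptions (illustrative, not from the post): data is spread evenly,
# and each row is equally likely to be routed to any of the machines.

TOTAL_BYTES = 1 * 1024**4   # 1 TB of input, as in the analogy
MACHINES = 100

per_machine = TOTAL_BYTES / MACHINES

# A row avoids the network only if its destination is the machine it already sits on.
remote_fraction = (MACHINES - 1) / MACHINES
sent_per_machine = per_machine * remote_fraction

print(f"Data held per machine:  {per_machine / 1024**3:.2f} GiB")
print(f"Fraction sent remotely: {remote_fraction:.0%}")
print(f"Sent over the network:  {sent_per_machine / 1024**3:.2f} GiB per machine")
```

Under these assumptions almost nothing stays where it started, which is why the move touches disk and network on every machine at once.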

Data Engineer:

“Walk me through it. What happens first?”

Spark:

“Phase 1: Write.

Your executors group data by partition key and write it to local disk.
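
The write phase described here can be imitated in plain Python. This is a toy sketch, not Spark's implementation. The names `target_partition` and `shuffle_write` are hypothetical, `zlib.crc32` stands in for the key hash, and writing one text file per partition is a simplification of Spark's real on-disk layout.

```python
import os
import tempfile
import zlib
from collections import defaultdict


def target_partition(key: str, num_partitions: int) -> int:
    # Stand-in for a hash partitioner: stable hash of the key, modulo partition count.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle_write(records, num_partitions, out_dir):
    """Group (key, value) records by target partition, then write one local file each."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[target_partition(key, num_partitions)].append((key, value))

    paths = {}
    for pid, rows in sorted(buckets.items()):
        path = os.path.join(out_dir, f"part-{pid:05d}.txt")
        with open(path, "w", encoding="utf-8") as f:
            for key, value in rows:
                f.write(f"{key}\t{value}\n")
        paths[pid] = path
    return paths


# Hypothetical sample data: every row with the same key must land in the same file.
records = [("us", 1), ("de", 2), ("us", 3), ("pl", 4), ("de", 5)]
out_dir = tempfile.mkdtemp(prefix="shuffle_write_")
written = shuffle_write(records, num_partitions=3, out_dir=out_dir)

for pid, path in written.items():
    with open(path, encoding="utf-8") as f:
        print(pid, f.read().split())
```

The property to notice is that rows sharing a key always end up in the same partition file, so a later stage can fetch everything for that key from one known place on each machine.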
