3.2 · All-or-Nothing: Gang Scheduling for Distributed Jobs

Series stub — full post TBD. This page exists so the series shape is reviewable.

Planned focus: A multi-GPU job needs all of its workers placed at the same time; a partial placement can deadlock while holding GPUs that no other job can use.
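
To make the all-or-nothing idea concrete until the full post lands, here is a minimal sketch in Python. It is not taken from any real scheduler: the function name `gang_place`, the `free_gpus` map of node name to free GPU count, and the largest-node-first ordering are all assumptions made for illustration. The point it shows is that a plan is computed for every worker first, and GPUs are reserved only if the whole job fits.

```python
from typing import Dict, Optional


def gang_place(
    free_gpus: Dict[str, int], workers: int, gpus_per_worker: int
) -> Optional[Dict[str, int]]:
    """Hypothetical helper: place all workers or none.

    Returns {node: workers_placed} and deducts GPUs from free_gpus on
    success. Returns None and leaves free_gpus untouched on failure.
    """
    plan: Dict[str, int] = {}
    remaining = workers

    # Assumption: try nodes with the most free GPUs first.
    for node, free in sorted(free_gpus.items(), key=lambda kv: -kv[1]):
        if remaining == 0:
            break
        fit = min(remaining, free // gpus_per_worker)
        if fit > 0:
            plan[node] = fit
            remaining -= fit

    if remaining > 0:
        # Partial fit: reserve nothing, so no GPUs are held idle.
        return None

    # Whole job fits: commit every reservation together.
    for node, count in plan.items():
        free_gpus[node] -= count * gpus_per_worker
    return plan


if __name__ == "__main__":
    cluster = {"node-a": 4, "node-b": 2}
    print(gang_place(cluster, workers=4, gpus_per_worker=2))  # does not fit
    print(gang_place(cluster, workers=3, gpus_per_worker=2))  # fits
    print(cluster)
```

In this sketch a job that does not fit simply waits without holding anything, which is the property that avoids two half-placed jobs blocking each other. A real scheduler would also need queueing, timeouts, and topology awareness, none of which are modeled here.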


Part of “Inside AI Infrastructure: The Compute Layer.” Opinions are my own; public, documented concepts only.