I'm a software engineer with 4+ years of experience building AI infrastructure at AWS, specifically the capacity layer that decides how GPU resources get reserved, allocated, and delivered to ML training and inference workloads at scale.
I build the systems behind products like Capacity Blocks for ML (reserved GPU scheduling) and UltraServers (multi-instance GPU supercomputers connected via high-bandwidth accelerator interconnects for trillion-parameter model training). My work involves distributed workflow orchestration, capacity reservation lifecycle management, and the scheduling primitives that bridge capacity planning with GPU utilization.
Before AWS, I interned at Apple on strategic data infrastructure and did ML research at the Boston University Department of Medicine. I studied CS & Math at Boston University and Entertainment Technology at Carnegie Mellon University.
This blog is where I write about GPU scheduling, capacity planning for AI workloads, distributed systems patterns, and the infrastructure that makes large-scale ML possible. Bilingual (中文/English), depending on the topic.