Drafts and notes on ML systems, LLM inference, and the work I find worth
writing about. Updated occasionally, not on a schedule.
01
2026 · 05 · 28
LLM Inference
Distributed
NCCL gives you collectives. NVSHMEM gives you memory. A look at when each
one wins, why kernel-launch overhead matters now, why MoE inference has
quietly pushed NVSHMEM out of the HPC corner, and how DeepEP puts the
whole picture together in production.
Read essay →
02
Draft · 2026
ML Systems
— in progress
What it takes to dynamically choose vLLM deployment configurations from
real-time workload signals — and why the obvious heuristics don't survive
first contact with production traffic.
Coming soon →
03
Draft · 2026
Compilers
— in progress
Lessons from writing custom operators on AWS Trainium, outside the CUDA
ecosystem: living without cuBLAS, what TVM gets you, and where
hand-written kernels still win.
Coming soon →