Notes

Recent writing

Drafts and notes on ML systems, LLM inference, and the work I find worth writing about. Updated occasionally — not on a schedule.

2026 · 05 · 28 · LLM Inference · Distributed

NCCL vs NVSHMEM: Two Answers to the Same Question

NCCL gives you collectives. NVSHMEM gives you memory. A look at when each one wins, why kernel-launch overhead matters now, why MoE inference has quietly pushed NVSHMEM out of the HPC corner, and how DeepEP puts the whole picture together in production.

Read essay →

Draft · 2026 ML Systems — in progress

Auto-tuning vLLM in Production: A Field Report

What it takes to dynamically choose vLLM deployment configurations from real-time workload signals — and why the obvious heuristics don't survive first contact with production traffic.

Coming soon →

Draft · 2026 Compilers — in progress

Notes from a Non-CUDA Accelerator

Lessons from writing custom operators outside the CUDA ecosystem on AWS Trainium — on living without cuBLAS, on what TVM gets you, and on where hand-written kernels still win.

Coming soon →