AI Infrastructure

How to Set Up Multi-GPU Training with PyTorch

Go from single-GPU to multi-GPU training with PyTorch DDP in under 50 lines of code

Replace hand-rolled attention kernels with FlexAttention and get up to 2x faster LLM decoding on long contexts.