Course Description

Driven by trillion-parameter LLM breakthroughs such as GPT-5, Claude, and DeepSeek V4-Pro, modern AI demands unprecedented engineering scale. This course goes beyond using existing tools: it explores the hardware architectures, software stacks, and system techniques required to design, implement, and improve next-generation LLM systems. You will look under the hood of modern LLM systems (such as vLLM, SGLang, and FlashAttention) to master the internals of AI infrastructure. You will build distributed training systems, write custom GPU/TPU acceleration kernels (CUDA/Triton), architect high-throughput serving engines, and optimize systems for fine-tuning, quantization, and reinforcement learning. Combining rigorous systems engineering with cutting-edge research, the course also features guest insights from industry pioneers at NVIDIA and Google. Don’t just deploy the future of AI; learn to build the next-generation systems that power it.

Large Language Model Systems

Instructor

Lei Li

Teaching Assistants

Yuheng Ding

Rishiraj Nagarajan

Jie Ren

Aditya Tummala

Sreeram Vennam

Tianqi Xu