Large Language Model Systems

CMU 11868, Fall 2026

Course Description

Driven by trillion-parameter LLM breakthroughs like GPT-5, Claude, and DeepSeek V4-Pro, modern AI demands engineering at unprecedented scale. This course goes beyond using existing tools: it explores the hardware architectures, software stacks, and system techniques required to design, implement, and improve next-generation LLM systems. You will look under the hood of modern LLM systems (such as vLLM, SGLang, and FlashAttention) to master the internals of AI infrastructure. You will build distributed training systems, write custom GPU/TPU acceleration kernels (CUDA/Triton), architect high-throughput serving engines, and optimize systems for fine-tuning, quantization, and reinforcement learning. Combining rigorous systems engineering with cutting-edge research, the course also features guest insights from industry pioneers at NVIDIA and Google. Don’t just deploy the future of AI; learn to build the next-generation systems that power it.

Instructor

Teaching Assistants