
Syllabus

| Date | Topic | Reading/Content | Homework |
|---|---|---|---|
| 1/13 | Introduction to LLMs | | |
| | GPU Programming Basics 1 | Chapters 2 and 4 of Programming Massively Parallel Processors, 4th Ed. | |
| 1/22 | GPU Programming Basics 2 | Chapter 3 of Programming Massively Parallel Processors, 4th Ed. | |
| 1/27 | Learning Algorithm and Auto Differentiation | Auto Diff survey, Differentiable Programming | |
| | Deep Learning Frameworks Design | TensorFlow | |
| 2/3 | Transformer | Attention Is All You Need | |
| | Pre-trained LLMs | LLaMA, GPT-3, Annotated Transformer | HW1 due |
| 2/10 | Tokenization | BPE, SentencePiece, VOLT | |
| | LLM Decoding | Beam search | |
| 2/17 | GPU Acceleration | Chapters 5 and 6 of Programming Massively Parallel Processors, 4th Ed. | |
| | Accelerating Transformer on GPU Part 1 | LightSeq | |
| 2/24 | Accelerating Transformer on GPU Part 2 | LightSeq2 | HW2 due |
| | Distributed Model Training | | Project proposal due |
| 3/3 | Spring break | | |
| 3/10 | Distributed Model Training II | DDP | |
| | Distributed Model Training III | GPipe, Megatron-LM | |
| 3/17 | Model Quantization | | HW3 due |
| | Model Quantization II | GPTQ | |
| 3/24 | Efficient Fine-Tuning for Large Models | CIAT, LoRA, QLoRA | |
| | Large Models with Mixture-of-Experts | GShard, Switch Transformer, DeepSpeed-MoE, DeepSeekMoE | |
| 3/31 | Optimizing Attention for Modern Hardware (Tri Dao) | FlashAttention | |
| | Communication-Efficient Distributed Training | ZeRO (DeepSpeed) | HW4 due |
| 4/7 | LLM Serving with PagedAttention (Woosuk Kwon) | vLLM | |
| | Better KV Cache for LLM Serving (Yuhan Liu) | | Mid-term report due |
| 4/14 | DistServe: Disaggregated Prefill-Decoding (Hao Zhang) | DistServe | HW5 due |
| | LLM Serving with SGL (Ying Sheng) | | |
| 4/21 | Scalable LLM RL Training | | |
| 4/23 | Final project presentation | | |
| 4/28 | | | Final report due |
| | App Stack and Model Serving | Triton, LightLLM | |
| | GPU Just-in-Time Compilation | JAX | |
| | Speculative Decoding | Speculative Decoding | |
| | Retrieval-Augmented Language Models | RAG | |
| | Nearest Vector Search for Embeddings | HNSW | |
| | Multimodal LLMs | Flamingo | |
| | DeepSeek-V3 and R1 | | |
| | Efficient Streaming Language Models with Attention Sinks | Attention Sink | |
| | Advanced Large Model Serving | Orca | |
| | Dynamo | | |