Syllabus

| Date | Topic | Reading/Content | Homework |
| --- | --- | --- | --- |
| 1/17 | Introduction to LLMs [slides] | | HW1 out |
| 1/22 | GPU Programming Basics [slides] | Chap. 2-3 of Programming Massively Parallel Processors, 3rd Ed. | |
| | Learning Algorithms and Auto Differentiation [slides] | Auto Diff survey | |
| 1/29 | Deep Learning Framework Design Principles [slides] | TensorFlow | |
| | Transformer [slides] | Attention Is All You Need | |
| 2/5 | Pre-trained LLMs [slides] | LLaMA, GPT-3, Annotated Transformer | HW1 due / HW2 out |
| | Tokenization and Decoding [slides] | BPE, SentencePiece, Beam search | |
| 2/12 | GPU Acceleration [slides] | Chap. 4-5 of Programming Massively Parallel Processors, 3rd Ed. | |
| | Accelerating Transformer on GPU, Part 1 [slides] | LightSeq | |
| 2/19 | Accelerating Transformer on GPU, Part 2 [slides] | LightSeq2 | |
| | Distributed Model Training [slides] | DDP | HW2 due / HW3 out |
| 2/26 | Distributed Model Training II [slides] | GPipe, Megatron-LM | |
| | App Stack and Model Serving [slides] | Triton, LightLLM | Project proposal due |
| 3/4 | Spring break | | |
| 3/11 | Model Quantization and Compression [slides] | GPTQ | |
| | Efficient Fine-tuning for Large Models [slides] | LoRA, QLoRA | |
| 3/18 | Communication-Efficient Distributed Training [slides] | ZeRO (DeepSpeed) | HW3 due / HW4 out |
| | Advanced Large Model Serving [slides] | Orca | |
| 3/25 | PagedAttention [slides] | vLLM | |
| | GPU Just-in-Time Compilation [slides] | JAX | |
| 4/1 | Large Models with Mixture-of-Experts [slides] | DeepSpeed-MoE | HW4 due |
| | Memory Optimization for LLMs [slides] | FlashAttention | |
| 4/8 | Long and Longer Context [slides] | RMT | Mid-term report due |
| | Efficient Streaming Language Models with Attention Sinks [slides] | Attention Sink | |
| 4/15 | Speculative Decoding [slides] | Speculative Decoding | |
| | Retrieval-Augmented Language Models [slides] | RAG | |
| 4/22 | Nearest Vector Search for Embeddings [slides] | HNSW | |
| | Multimodal LLMs [slides] | Flamingo | |
| 4/29 | Final project presentation | | |
| 4/30 | Final report due | | |