
Syllabus

| Dates | Topic | Reading/Content | Homework |
|-------|-------|-----------------|----------|
| 1/13 | Introduction to LLMs [slides] | | |
| | GPU Programming Basics 1 [slides] | Chap. 2 & 4 of Programming Massively Parallel Processors, 4th Ed. | |
| 1/22 | GPU Programming Basics 2 [slides] | Chap. 3 of Programming Massively Parallel Processors, 4th Ed. | |
| 1/27 | Learning Algorithms and Auto Differentiation [slides] | Auto Diff survey, Differentiable Programming | |
| | Deep Learning Framework Design [slides] | TensorFlow | |
| 2/3 | Transformer [slides] | Attention Is All You Need | |
| | Pre-trained LLMs [slides] | LLaMA, GPT-3, Annotated Transformer | HW1 due |
| 2/10 | Tokenization [slides] | BPE, SentencePiece, VOLT | |
| | LLM Decoding [slides] | Beam search | |
| 2/17 | GPU Acceleration [slides] | Chap. 5 & 6 of Programming Massively Parallel Processors, 4th Ed. | |
| | Accelerating Transformers on GPU, Part 1 [slides] | LightSeq | |
| 2/24 | Accelerating Transformers on GPU, Part 2 [slides] | LightSeq2 | HW2 due |
| | Distributed Model Training [slides] | | Project proposal due |
| 3/3 | Spring break | | |
| 3/10 | Distributed Model Training II [slides] | DDP | |
| | Distributed Model Training III [slides] | GPipe, Megatron-LM | |
| 3/17 | Model Quantization and Compression | GPTQ | HW3 due |
| | Efficient Fine-tuning for Large Models | LoRA, QLoRA | |
| 3/24 | Communication-Efficient Distributed Training | ZeRO (DeepSpeed) | |
| | Advanced Large Model Serving | Orca | |
| 3/31 | PagedAttention | vLLM | HW4 due |
| | GPU Just-in-Time Compilation | JAX | |
| 4/7 | Large Models with Mixture-of-Experts | DeepSpeed-MoE | Mid-term report due |
| | Memory Optimization for LLMs | FlashAttention | |
| 4/14 | Long and Longer Context | RMT | |
| | Efficient Streaming Language Models with Attention Sinks | Attention Sink | |
| 4/21 | Speculative Decoding | Speculative Decoding | |
| | Retrieval-Augmented Language Models | RAG | |
| 4/25 | Final project presentation | | |
| 4/26 | Final report due | | |
| | App Stack and Model Serving [slides] | Triton, LightLLM | |
| | Nearest Vector Search for Embeddings | HNSW | |
| | Multimodal LLMs | Flamingo | |
| | DeepSeek V3 and R1 | | |
| | RL Training for LLMs | | |