
Syllabus

| Dates | Topic | Reading/Content | Homework |
|-------|-------|-----------------|----------|
| 1/13 | Introduction to LLMs [slides] | | HW1 out |
| | GPU Programming Basics [slides] | Chap. 2, 3 of Programming Massively Parallel Processors, 3rd Ed. | |
| 1/22 | Learning Algorithms and Auto Differentiation [slides] | Auto Diff survey | |
| 1/27 | Deep Learning Framework Design Principles [slides] | TensorFlow | |
| | Transformer [slides] | Attention Is All You Need | |
| 2/3 | Pre-trained LLMs [slides] | LLaMA, GPT-3, Annotated Transformer | HW1 due / HW2 out |
| | Tokenization and Decoding [slides] | BPE, SentencePiece, Beam search | |
| 2/10 | GPU Acceleration [slides] | Chap. 4, 5 of Programming Massively Parallel Processors, 3rd Ed. | |
| | Accelerating Transformers on GPU, Part 1 [slides] | LightSeq | |
| 2/17 | Accelerating Transformers on GPU, Part 2 [slides] | LightSeq2 | |
| | Distributed Model Training [slides] | DDP | HW2 due / HW3 out |
| 2/24 | Distributed Model Training II [slides] | GPipe, Megatron-LM | |
| | App Stack and Model Serving [slides] | Triton, LightLLM | Project proposal due |
| 3/3 | Spring break | | |
| 3/10 | Model Quantization and Compression | GPTQ | |
| | Efficient Fine-tuning for Large Models | LoRA, QLoRA | |
| 3/17 | Communication-Efficient Distributed Training | ZeRO (DeepSpeed) | HW3 due / HW4 out |
| | Advanced Large Model Serving | Orca | |
| 3/24 | PagedAttention | vLLM | |
| | GPU Just-in-Time Compilation | JAX | |
| 3/31 | Large Models with Mixture-of-Experts | DeepSpeed-MoE | HW4 due |
| | Memory Optimization for LLMs | FlashAttention | |
| 4/7 | Long and Longer Context | RMT | Mid-term report due |
| | Efficient Streaming Language Models with Attention Sinks | Attention Sink | |
| 4/14 | Speculative Decoding | Speculative Decoding | |
| | Retrieval-Augmented Language Models | RAG | |
| 4/21 | Nearest Vector Search for Embeddings | HNSW | |
| | Multimodal LLMs | Flamingo | |
| 4/28 | Final project presentation | | |
| 4/29 | Final report due | | |