Syllabus

| Dates | Topic | Reading/Content | Homework |
| --- | --- | --- | --- |
| 1/13 | Introduction to LLMs [slides] | | |
| | GPU Programming Basics 1 [slides] | Chap. 2, 4 of Programming Massively Parallel Processors, 4th Ed. | |
| 1/22 | GPU Programming Basics 2 [slides] | Chap. 3 of Programming Massively Parallel Processors, 4th Ed. | |
| 1/27 | Learning Algorithms and Auto Differentiation [slides] | Auto Diff survey | |
| | Deep Learning Framework Design [slides] | TensorFlow | |
| 2/3 | Transformer [slides] | Attention Is All You Need | HW1 due |
| | Pre-trained LLMs [slides] | LLaMA, GPT-3, Annotated Transformer | |
| 2/10 | Tokenization and Decoding [slides] | BPE, SentencePiece, beam search | |
| | GPU Acceleration [slides] | Chap. 5, 6 of Programming Massively Parallel Processors, 4th Ed. | |
| | Accelerating Transformers on GPU, Part 1 [slides] | LightSeq | |
| 2/17 | Accelerating Transformers on GPU, Part 2 [slides] | LightSeq2 | |
| | Distributed Model Training [slides] | DDP | HW2 due |
| 2/24 | Distributed Model Training II [slides] | GPipe, Megatron-LM | |
| | App Stack and Model Serving [slides] | Triton, LightLLM | Project proposal due |
| 3/3 | Spring break | | |
| 3/10 | Model Quantization and Compression | GPTQ | |
| | Efficient Fine-tuning for Large Models | LoRA, QLoRA | |
| 3/17 | Communication-Efficient Distributed Training | ZeRO (DeepSpeed) | HW3 due |
| | Advanced Large Model Serving | Orca | |
| 3/24 | PagedAttention | vLLM | |
| | GPU Just-in-Time Compilation | JAX | |
| 3/31 | Large Models with Mixture-of-Experts | DeepSpeed-MoE | HW4 due |
| | Memory Optimization for LLMs | FlashAttention | |
| 4/7 | Long and Longer Context | RMT | Mid-term report due |
| | Efficient Streaming Language Models with Attention Sinks | Attention Sink | |
| 4/14 | Speculative Decoding | Speculative Decoding | |
| | Retrieval-Augmented Language Models | RAG | |
| 4/21 | Nearest-Neighbor Vector Search for Embeddings | HNSW | |
| | Multimodal LLMs | Flamingo | |
| 4/28 | Final project presentation | | |
| 4/29 | | | Final report due |