Zeyu Shen
Office Hours: Fri 9-10am
Sherred Hall 3rd Floor
Kincaid MacDonald
Office Hours: Wed 3-4pm
Friend Center 010
Raj H. Ghugare
Office Hours: Mon 3-4pm
COS Building 003
Chongyi Zheng
Office Hours: Thu 3-4pm
COS Building 302
This course provides an introductory overview of reinforcement learning (RL), a machine learning paradigm where agents learn to make decisions by interacting with their environment. We will cover fundamental concepts such as Markov Decision Processes, value functions, and policy optimization. Students will learn important RL algorithms including Q-learning, policy gradient methods, and actor-critic approaches. We will also address key challenges in RL such as exploration, generalization, and sample efficiency. Applications of RL to real-world problems—including robotics, healthcare, and molecular science—will be highlighted throughout the course. Assignments will involve implementing RL algorithms and conducting mathematical analyses. Students will complete an open-ended final group project.
Students should have a solid foundation in machine learning and mathematics, including familiarity with probability, statistics, and linear algebra. Prior completion of courses such as COS 324 (Introduction to Machine Learning) or equivalent is recommended. Programming experience in Python is required.
We will post submission instructions on Canvas.
The final project is the largest component of the course (50%). You will work in groups of 3–5 to complete a research project on a topic in reinforcement learning, aiming for academic workshop-level quality.
The schedule is tentative and subject to change. Check the course website for the most up-to-date information.
| WEEK | TOPIC | DESCRIPTION & READINGS |
|---|---|---|
| 1 | Course Introduction & Foundations | Lecture 1 (Jan 30): Course intro, what is RL, the Markov Decision Process (MDP), value iteration, and policy iteration. [Slides] Optional Textbook Coverage: |
| 2 | Value-based RL | Lecture 2 (Feb 6): Q-learning, value-based methods, and value function learning. [Slides (pre-lecture)] Pick any two: |
| 3 | Value-based RL (cont'd) | Lecture 3 (Feb 13): Continuation of value-based methods and DDPG. [Slides] Pick any two: |
| 4 | Policy Gradient and Actor-Critic Methods | Lecture 4 (Feb 20): REINFORCE, policy gradients, and TRPO. [Slides] Pick any two: |
| 5 | Actor-Critic Methods | Lecture 5 (Feb 27): Bias-variance trade-offs, actor-critic methods, baselines as control variates, and GRPO. [Notes] Pick any two: |
This course is closed for enrollment.
Students should have completed COS 324 (Introduction to Machine Learning) or an equivalent course. Familiarity with probability, statistics, linear algebra, and Python programming is required.
Yes! This course is open to both undergraduate and graduate students. Graduate students may be expected to complete a more advanced final project.
All assignments will be in Python using standard ML libraries (NumPy, PyTorch). Familiarity with these tools is helpful but not required—we will provide tutorials.
Generally no. Because enrollment is large, we require 3–5 students per group and will make exceptions only in unusual circumstances.
Formal auditing is not possible, but if there's room you can sit in on lectures.