COS 435 / ECE 433: Reinforcement Learning

Princeton Polaris Lab
PRINCETON UNIVERSITY, SPRING 2026
Location: Friend Center 101
Time: Friday 1:30pm-4:20pm
Instructor

Prof. Peter Henderson

Assistant Professor in CS/SPIA

Office Hours: By appointment

Teaching Assistants

Zeyu Shen

Office Hours: Fri 9-10am

Sherrerd Hall, 3rd Floor

Kincaid MacDonald

Office Hours: Wed 3-4pm

Friend Center 010

Raj H. Ghugare

Office Hours: Mon 3-4pm

COS Building 003

Chongyi Zheng

Office Hours: Thu 3-4pm

COS Building 302

Course Description

This course provides an introductory overview of reinforcement learning (RL), a machine learning paradigm where agents learn to make decisions by interacting with their environment. We will cover fundamental concepts such as Markov Decision Processes, value functions, and policy optimization. Students will learn important RL algorithms including Q-learning, policy gradient methods, and actor-critic approaches. We will also address key challenges in RL such as exploration, generalization, and sample efficiency. Applications of RL to real-world problems—including robotics, healthcare, and molecular science—will be highlighted throughout the course. Assignments will involve implementing RL algorithms and conducting mathematical analyses. Students will complete an open-ended final group project.
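
The interaction loop at the heart of RL can be made concrete in a few lines. Below is a minimal sketch of an agent acting in an environment, written against the Gymnasium API (an assumption for illustration; the course's actual tooling may differ), with a random policy standing in for a learned one:

    # Toy agent-environment loop (Gymnasium API assumed; random policy for illustration).
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=0)
    episode_return, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # RL replaces this with a learned policy
        obs, reward, terminated, truncated, info = env.step(action)
        episode_return += reward
        done = terminated or truncated
    env.close()
    print(f"episode return: {episode_return}")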

Prerequisites

Students should have a solid foundation in machine learning and mathematics, including familiarity with probability, statistics, and linear algebra. Prior completion of courses such as COS 324 (Introduction to Machine Learning) or equivalent is recommended. Programming experience in Python is required.

Course Expectations & Grading

Components

  • Participation (15%): Starting in week 3: in-class polling questions (via Google Form), breakout discussions on assigned papers, and reading reflections on assigned papers submitted with a marked-up PDF of the paper.
  • Problem Sets (15%): Three assignments of short theory problems, due every other week starting in week 3.
  • Programming Assignments (20%): Three assignments of small programming tasks, starting in week 3.
  • Final Project (50%): The biggest component! An open-ended research project on a topic in RL; aim for academic workshop-level quality.

Policies

  • Late Submissions: Late assignments will incur a penalty of 10% per day, up to a maximum of three days. After three days, assignments will not be accepted unless prior arrangements are made.
  • Academic Integrity: Students are expected to adhere to Princeton University's academic integrity policies. Using LLMs to solve assignments is NOT permitted; you must understand and be able to explain all code you submit. Limited LLM use is acceptable for building a basic understanding of concepts and ideas, and for help with code on more complicated projects, but for writing it should be used only minimally, as a post-draft check. In all cases, you are responsible for the content you submit.
  • Collaboration: You may discuss problem sets with classmates, but must write up solutions independently. List collaborators on your submission.

Resources

Lecture Notes

Lecture notes are posted on the course website.

Textbook

  • Required: None; see the lecture notes above.

Optional Textbooks

  • Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
  • Reinforcement Learning: Bit by Bit by Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, and Zheng Wen
  • Bandit Algorithms by Tor Lattimore and Csaba Szepesvári (if you're interested in bandits)
  • Algorithms for Reinforcement Learning by Csaba Szepesvári
  • Mathematical Foundations of Reinforcement Learning by Shiyu Zhao

Supplementary Materials

  • Selected research papers for advanced topics
  • OpenAI Spinning Up in Deep RL [Link]

Course Schedule

Schedule is tentative and subject to change. Check the course website for the most up-to-date information.

Week 1: Course Introduction & Foundations
Lecture 1 (Jan 30): Course intro, what is RL, the Markov Decision Process (MDP), value iteration, and policy iteration.
[Slides]
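
To preview the Week 1 material, here is a minimal value-iteration sketch on a small randomly generated MDP (illustrative only; the lecture's notation and conventions may differ):

    # Value iteration on a random finite MDP (a toy sketch, not the lecture's code).
    import numpy as np

    n_states, n_actions, gamma = 4, 2, 0.9
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over s'
    R = rng.random((n_states, n_actions))                             # reward R[s, a]

    V = np.zeros(n_states)
    for _ in range(1000):
        Q = R + gamma * P @ V      # Bellman backup: Q[s, a] = R[s, a] + gamma * E[V(s')]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-8:
            break
        V = V_new
    greedy_policy = Q.argmax(axis=1)  # greedy policy with respect to the converged values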
Week 2: Value-based RL
Lecture 2 (Feb 6): Q-learning, value-based methods, and value function learning.
[Slides (pre-lecture)]
Optional readings: pick any two.
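
As a preview of Week 2, the core of tabular Q-learning fits in a few lines (a toy sketch; the deep variants covered in lecture replace the table with a neural network):

    # Tabular Q-learning update and epsilon-greedy action selection (toy sketch).
    import numpy as np

    n_states, n_actions = 16, 4
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.1, 0.99, 0.1
    rng = np.random.default_rng(0)

    def act(s):
        # epsilon-greedy: explore with probability eps, otherwise exploit
        if rng.random() < eps:
            return int(rng.integers(n_actions))
        return int(Q[s].argmax())

    def update(s, a, r, s_next, terminated):
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        target = r if terminated else r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])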
Week 3: Value-based RL (cont'd)
Lecture 3 (Feb 13): Continuation of value-based methods and DDPG.
Optional readings: pick any two.

Week 4: Policy Gradient and Actor-Critic Methods
Lecture 4 (Feb 20): REINFORCE, policy gradients, and TRPO.
Optional readings: pick any two.
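
For a taste of Week 4, here is a bare-bones REINFORCE loss for a single episode in PyTorch (a sketch under simplifying assumptions: no baseline and no batching):

    # REINFORCE loss for one episode (toy sketch; no baseline, no batching).
    import torch

    def reinforce_loss(log_probs, rewards, gamma=0.99):
        # log_probs: list of log pi(a_t | s_t) tensors saved while acting
        # rewards:   list of scalar rewards r_t from the same episode
        returns, G = [], 0.0
        for r in reversed(rewards):       # G_t = r_t + gamma * G_{t+1}
            G = r + gamma * G
            returns.append(G)
        returns.reverse()
        returns = torch.tensor(returns)
        # ascend E[G_t * log pi(a_t | s_t)] by minimizing its negative
        return -(torch.stack(log_probs) * returns).sum()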
Week 5: Actor-Critic Methods
Lecture 5 (Feb 27): Bias-variance trade-offs, actor-critic methods, baselines as control variates, and GRPO.
Optional readings: pick any two.
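
And as a glimpse of Week 5's variance-reduction idea, subtracting a learned value baseline from the return turns it into an advantage (a sketch; lecture conventions may differ):

    # Actor and critic losses with a baseline as control variate (toy sketch).
    import torch
    import torch.nn.functional as F

    def actor_loss(log_probs, returns, values):
        # advantage A_t = G_t - V(s_t); detach V so this loss only trains the actor
        advantages = returns - values.detach()
        return -(log_probs * advantages).sum()

    def critic_loss(values, returns):
        # regress V(s_t) toward the observed returns
        return F.mse_loss(values, returns)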

Frequently Asked Questions

How do I enroll in this course?

This course is closed for enrollment.

What are the prerequisites?

Students should have completed COS 324 (Introduction to Machine Learning) or an equivalent course. Familiarity with probability, statistics, linear algebra, and Python programming is required.

Is this course suitable for graduate students?

Yes! This course is open to both undergraduate and graduate students. Graduate students may be expected to complete a more advanced final project.

What programming language will we use?

All assignments will be in Python using standard ML libraries (NumPy, PyTorch). Familiarity with these tools is helpful but not required—we will provide tutorials.

Can the final project be individual?

Generally no. Due to the size of the enrollment, we will require 3-5 students per group, except in exceptional circumstances.

Can I audit the course?

Formal auditing is not possible, but if there's room you can sit in on lectures.