A prerequisite to making AI systems safe and reliable is getting them to do what we, as humans, want. The focus of this project is to enable the safe deployment of learning-enabled systems that learn objectives from human feedback and then robustly optimize their behavior under these learned objectives. What humans want is often highly ambiguous and uncertain, so AI systems must be robust to this uncertainty; however, most prior work on reward learning does not easily facilitate uncertainty assessment. The project's novelties are to develop the first scalable learning methods that are robust to uncertainty, enable self-assessment, and provide basic test cases for assessing AI alignment with human values. The project's impacts are fundamentally new capabilities that will allow AI systems to safely learn models of human intent and enable humans to know with high confidence whether an AI system will behave correctly with respect to that intent. The broader impacts of making progress on safe and robust human-AI alignment include better domestic robots, recommendation systems, self-driving cars, delivery quadrotors, and large language models (LLMs). The project broadens participation in computing through undergraduate research opportunities and K-12 summer AI camps.

The key observation in this project is that AI systems will always face uncertainty when seeking to identify human intent and values. Thus, there is a need for methods that explicitly reason about uncertainty and can provide probabilistic guarantees of robustness under this uncertainty. The project is pursuing the following three specific objectives that will enable safe and robust reward learning: (1) Probabilistic performance bounds when learning policies from human input: the project is developing approaches that allow humans to know with high confidence whether a policy learned under a reward function inferred from human feedback achieves a desired performance threshold. (2) Unit tests for reward and policy alignment: the project is developing tests that verify with high confidence whether a learned reward function and the resulting behavior are correct. (3) Robustness to reward misidentification and misgeneralization: the project is developing techniques that penalize misaligned behavior during policy optimization so that the resulting behavior of the AI system does not lead to unintended consequences. The investigators are applying these techniques both to reward learning, to prevent reward hacking, and to reinforcement learning with a known reward function, to overcome the problem of goal misgeneralization.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
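
To make objective (1) concrete, here is a minimal illustrative sketch, not the project's actual method: it assumes a set of reward functions sampled from a posterior over rewards given human feedback (e.g., from a Bayesian reward-learning procedure) and rollouts of the learned policy, and it estimates a high-confidence lower bound on the policy's performance by taking a low quantile of the policy's expected return across the posterior samples. All names and interfaces below are hypothetical.

```python
import numpy as np

def high_confidence_return_bound(reward_samples, trajectories, delta=0.05):
    """Estimate a (1 - delta)-confidence lower bound on a policy's expected return.

    reward_samples: reward functions drawn from a posterior over rewards given
                    human feedback; each maps a (state, action) pair to a scalar.
    trajectories:   rollouts of the learned policy; each is a list of
                    (state, action) pairs.
    delta:          allowed failure probability (e.g., 0.05 for a 95% bound).
    """
    expected_returns = []
    for reward_fn in reward_samples:
        # Expected return of the policy under this posterior sample of the reward.
        returns = [sum(reward_fn(s, a) for (s, a) in traj) for traj in trajectories]
        expected_returns.append(np.mean(returns))
    # The delta-quantile of the returns over the reward posterior serves as a
    # high-confidence lower bound on the policy's true performance.
    return np.quantile(expected_returns, delta)

# Hypothetical usage:
# bound = high_confidence_return_bound(reward_samples, trajectories, delta=0.05)
# if bound >= desired_performance_threshold:
#     print("Policy meets the threshold with high confidence.")
```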
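
Similarly, a minimal sketch of the idea behind objective (3), under the assumption that misalignment risk is approximated by disagreement across an ensemble of learned reward models (the interface and names are hypothetical, not the project's specific technique):

```python
import numpy as np

def penalized_reward(state, action, reward_models, penalty_weight=1.0):
    """Pessimistic reward signal for policy optimization.

    reward_models:  an ensemble of learned reward functions, each mapping a
                    (state, action) pair to a scalar reward.
    penalty_weight: how strongly disagreement among the models is penalized.

    Returns the mean predicted reward minus a penalty proportional to the
    models' disagreement, discouraging the policy from exploiting states and
    actions where the learned reward is uncertain (one guard against reward hacking).
    """
    predictions = np.array([r(state, action) for r in reward_models])
    return predictions.mean() - penalty_weight * predictions.std()
```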