Collaborative Research: Safe Reinforcement Learning Guaranteed by Bayesian Distributionally Robust Optimization and Online Change Point Detection

Information

  • NSF Award
  • 2419564
Owner
  • Award Id
    2419564
  • Award Effective Date
9/1/2024
  • Award Expiration Date
8/31/2027
  • Award Amount
$193,000.00
  • Award Instrument
    Standard Grant

Abstract

Safety is a crucial requirement for systems employing reinforcement learning in domains such as robotics, autonomous driving, and power systems. In this project we consider safety as the avoidance of known unsafe states and the prevention of unknown unsafe behaviors. To achieve this safety goal, we propose a suite of model-based reinforcement learning approaches that span training, deployment, improvement, and evaluation. The project consists of the following research thrusts: 1) training policies that are robust to distribution shift via distributionally robust approaches; 2) continual policy improvement via Bayesian risk-averse learning; 3) adapting policies to non-stationarity via online change detection; and 4) rigorous simulation via space-filling experiment design to gain an understanding of a given policy in various environment settings.

If successful, the proposed research will make significant contributions to the existing literature on safe reinforcement learning (RL) by developing new theories and methodologies. In particular, the proposed research has the following innovations: 1) formulation of safety measures as general objectives beyond the standard cumulative form, and development of solution approaches for this general formulation; 2) consideration of both intrinsic uncertainty and model uncertainty to ensure that the resulting policy performs well and satisfies a specified risk level in the real environment; 3) bridging the gap between Bayesian RL and safe RL so that models and policies can be continually improved while the deployed policy remains safe; 4) near-optimal policy learning algorithms that adapt to piecewise non-stationary environments; and 5) a rigorous simulation approach to policy evaluation that identifies unexpected unsafe behaviors before they occur.
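The online change detection of thrust 3 can be illustrated with a standard two-sided CUSUM statistic applied to a reward stream. This is a generic sketch of the named technique under illustrative assumptions (a known baseline mean and hand-picked `drift` and `threshold` parameters), not the project's actual algorithm.

```python
import random

def cusum_detect(rewards, baseline, drift=0.5, threshold=8.0):
    """Two-sided CUSUM: return the first step at which the cumulative
    deviation of observed rewards from the baseline mean exceeds the
    threshold, or None if no change point is flagged."""
    g_pos = g_neg = 0.0
    for t, r in enumerate(rewards):
        dev = r - baseline
        g_pos = max(0.0, g_pos + dev - drift)  # tracks upward mean shifts
        g_neg = max(0.0, g_neg - dev - drift)  # tracks downward mean shifts
        if g_pos > threshold or g_neg > threshold:
            return t
    return None

random.seed(0)
# Rewards are stationary for 50 steps, then the environment shifts
# (a piecewise non-stationary environment, as in the abstract).
rewards = ([random.gauss(0.0, 1.0) for _ in range(50)]
           + [random.gauss(3.0, 1.0) for _ in range(50)])
print(cusum_detect(rewards, baseline=0.0))
```

A deployed agent could run such a statistic online and, upon detection, trigger re-planning or fall back to a conservative policy while the model of the new environment is re-estimated.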
Because of the generality of the proposed approaches, the resulting techniques will have broad applicability in various domains that utilize reinforcement learning and require safety considerations. This research integrates well with the courses that the PIs have developed and teach. The PIs are committed to promoting diversity, equity, and inclusion within their research communities by actively engaging women and minorities in research and academic careers, conducting outreach to K-12 students, and fostering greater participation of researchers from underrepresented groups.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
Anthony Kuh (akuh@nsf.gov, 703-292-4714)
  • Min Amd Letter Date
7/22/2024
  • Max Amd Letter Date
7/22/2024
  • ARRA Amount

Institutions

  • Name
    New York University
  • City
    NEW YORK
  • State
    NY
  • Country
    United States
  • Address
    70 WASHINGTON SQ S
  • Postal Code
10012-1019
  • Phone Number
(212) 998-2121

Investigators

  • First Name
    Zhengyuan
  • Last Name
    Zhou
  • Email Address
    zzhou@stern.nyu.edu
  • Start Date
7/22/2024

Program Element

  • Text
    EPCN-Energy-Power-Ctrl-Netwrks
  • Code
    760700

Program Reference

  • Text
    LEARNING & INTELLIGENT SYSTEMS
  • Code
    8888