Collaborative Research: Safe Reinforcement Learning Guaranteed by Bayesian Distributionally Robust Optimization and Online Change Point Detection

Information

  • NSF Award
  • 2419564
Owner
  • Award Id
    2419564
  • Award Effective Date
9/1/2024
  • Award Expiration Date
8/31/2027
  • Award Amount
$193,000.00
  • Award Instrument
    Standard Grant

Abstract

Safety is a crucial requirement for systems employing reinforcement learning in domains such as robotics, autonomous driving, and power systems. In this project we consider safety as the avoidance of known unsafe states and the prevention of unknown unsafe behaviors. To achieve this safety goal, we propose a suite of model-based reinforcement learning approaches that span training, deployment, improvement, and evaluation. The project consists of the following research thrusts: 1) training policies that are robust to distribution shift via distributionally robust approaches; 2) continual policy improvement via Bayesian risk-averse learning; 3) adapting policies to non-stationarity via online change detection; and 4) rigorous simulation via space-filling experiment design to gain an understanding of a given policy in various environment settings.

If successful, the proposed research will make significant contributions to the existing literature on safe reinforcement learning (RL) by developing new theories and methodologies. In particular, the proposed research has the following innovations: 1) formulation of safety measures as general objectives beyond the standard cumulative form, and development of solution approaches for this general formulation; 2) consideration of both intrinsic uncertainty and model uncertainty to ensure that the resulting policy performs well and satisfies a specified risk level in the real environment; 3) bridging the gap between Bayesian RL and safe RL so that models and policies can be continually improved while the deployed policy remains safe; 4) near-optimal policy learning algorithms that adapt to piecewise non-stationary environments; and 5) a rigorous simulation approach to policy evaluation that identifies unexpected unsafe behaviors before they occur.
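The online change detection of thrust 3 can be illustrated with a standard two-sided CUSUM statistic applied to a reward stream. This is a generic sketch of the named technique under illustrative assumptions (a known baseline mean and hand-picked `drift` and `threshold` parameters), not the project's actual algorithm.

```python
import random

def cusum_detect(rewards, baseline, drift=0.5, threshold=8.0):
    """Two-sided CUSUM: return the first step at which the cumulative
    deviation of observed rewards from the baseline mean exceeds the
    threshold, or None if no change point is flagged."""
    g_pos = g_neg = 0.0
    for t, r in enumerate(rewards):
        dev = r - baseline
        g_pos = max(0.0, g_pos + dev - drift)  # tracks upward mean shifts
        g_neg = max(0.0, g_neg - dev - drift)  # tracks downward mean shifts
        if g_pos > threshold or g_neg > threshold:
            return t
    return None

random.seed(0)
# Rewards are stationary for 50 steps, then the environment shifts
# (a piecewise non-stationary environment, as in the abstract).
rewards = ([random.gauss(0.0, 1.0) for _ in range(50)]
           + [random.gauss(3.0, 1.0) for _ in range(50)])
print(cusum_detect(rewards, baseline=0.0))
```

A deployed agent could run such a statistic online and, upon detection, trigger re-planning or fall back to a conservative policy while the model of the new environment is re-estimated.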
Because of the generality of the proposed approaches, the resulting techniques will have broad applicability in various domains that utilize reinforcement learning and require safety considerations. This research integrates well with the courses that the PIs have developed and teach. The PIs are committed to promoting diversity, equity, and inclusion within their research communities by actively engaging women and minorities in research and academic careers, conducting outreach to K-12 students, and fostering greater participation of researchers from underrepresented groups.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
Anthony Kuh (akuh@nsf.gov, 703-292-4714)
  • Min Amd Letter Date
7/22/2024
  • Max Amd Letter Date
7/22/2024
  • ARRA Amount

Institutions

  • Name
    New York University
  • City
    NEW YORK
  • State
    NY
  • Country
    United States
  • Address
    70 WASHINGTON SQ S
  • Postal Code
10012-1019
  • Phone Number
(212) 998-2121

Investigators

  • First Name
    Zhengyuan
  • Last Name
    Zhou
  • Email Address
    zzhou@stern.nyu.edu
  • Start Date
7/22/2024

Program Element

  • Text
    EPCN-Energy-Power-Ctrl-Netwrks
  • Code
    760700

Program Reference

  • Text
    LEARNING & INTELLIGENT SYSTEMS
  • Code
    8888