Collaborative Research: SLES: No Bad Surprises: Aligning Agent and Human Norms via Specification Refinements

Information

  • Award Id
    2416459
  • Award Effective Date
9/1/2024
  • Award Expiration Date
8/31/2028
  • Award Amount
$750,000.00
  • Award Instrument
    Standard Grant

Abstract

Autonomous robots hold the potential to revolutionize society in areas such as healthcare, transportation, and manufacturing. These systems frequently employ learning-enabled components in their perception, planning, and control modules, necessitating complex design choices to ensure safe operation. However, design decisions that initially appear sound may lead to unexpected problems during testing or, even worse, post-deployment. For example, an autonomous vehicle once exhibited erratic swerving to localize itself for lane-keeping, a failure mode unforeseen by the system's designers and developers. Such surprises indicate that the agent's norms (what it considers permissible and obligatory) are inappropriate in certain situations. As learning-enabled systems become more complex, operate in open environments, and interact with humans and other robots, these challenges are likely to be exacerbated. This project focuses on safety failures of reinforcement learning (RL) agents stemming from two primary sources: the misalignment between design intent and the agent's perceived norms, and the gap between the knowledge the agent needs for safe operation and its actual perception capabilities. The goal is to equip researchers and practitioners with tools to design provably safe autonomous systems, encompassing all major stages of design, verification, and deployment.

The project develops a process to iteratively align an RL agent's norms with those of its designers and to formally verify the resulting behavior. Key activities include: (1) developing inverse reinforcement learning algorithms that learn a reward function from demonstrations, constrained by deontic logic; (2) systematically exploring the trained agent's norms to uncover unknowns by generating norms that would surprise the engineer; (3) querying the agent to explain its reward function when it produces undesired behavior; (4) defining a new class of obligations related to knowledge, along with a corresponding formal specification logic; (5) designing run-time monitors that predict action and knowledge safety violations during operation; (6) implementing online metareasoning, coupled with introspective perception modules, to restore safe behavior; and (7) iteratively improving system alignment by updating the agent's learning process using verification and run-time monitoring results. The project's outcomes are validated on an industrial simulator of a real-world bipedal robot, scaled-down autonomous race cars, and a campus-wide fleet of delivery robots.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
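
To make activity (1) concrete, the following is a minimal sketch of feature-matching, maximum-entropy-style inverse reinforcement learning in which a deontic rule enters as a hard mask on impermissible state-action pairs. The toy gridworld, the one-hot state features, and every name below are illustrative assumptions, not the project's algorithms or code.

    import numpy as np

    N_STATES, GAMMA = 16, 0.9  # assumed 4x4 gridworld with discounting

    def soft_value_iteration(w, P, forbidden, iters=100):
        """Softmax policy for the linear reward R(s) = w . phi(s); with
        one-hot state features, R = w. `forbidden` is a boolean (S, A)
        mask derived from the deontic rule; we assume every state keeps
        at least one permitted action."""
        V = np.zeros(N_STATES)
        for _ in range(iters):
            Q = w[:, None] + GAMMA * (P @ V)    # P: (S, A, S) transitions
            Q[forbidden] = -np.inf              # impermissible pairs: never chosen
            V = np.logaddexp.reduce(Q, axis=1)  # soft (max-ent) Bellman backup
        return np.exp(Q - V[:, None])           # policy; exactly 0 on forbidden pairs

    def occupancy(pi, P, start, horizon=50):
        """Discounted state-visitation counts; with one-hot features these
        are exactly the policy's feature expectations."""
        d = np.zeros(N_STATES); d[start] = 1.0
        mu = np.zeros(N_STATES)
        for t in range(horizon):
            mu += (GAMMA ** t) * d
            d = np.einsum('s,sa,sat->t', d, pi, P)  # one-step push-forward
        return mu

    def irl(expert_mu, P, forbidden, start, lr=0.1, steps=200):
        """Gradient ascent on reward weights until the norm-constrained
        policy matches the expert's feature expectations."""
        w = np.zeros(N_STATES)
        for _ in range(steps):
            pi = soft_value_iteration(w, P, forbidden)
            w += lr * (expert_mu - occupancy(pi, P, start))
        return w

Here `expert_mu` would be estimated by averaging discounted one-hot state counts over the demonstration trajectories, and the learned `w` is a reward under which the demonstrated, norm-compliant behavior is soft-optimal.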
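
Activity (5) can be illustrated just as briefly. The sketch below monitors a bounded-time action-safety invariant of the form "always(clearance >= D_MIN)", flagging both a current violation and a forecast one; the thresholds, the constant-closing-speed extrapolation, and all names are illustrative assumptions.

    from dataclasses import dataclass

    D_MIN = 0.5    # assumed minimum safe clearance (meters)
    HORIZON = 10   # assumed look-ahead window (control steps)
    DT = 0.1       # assumed control period (seconds)

    @dataclass
    class Verdict:
        violated: bool    # the invariant is false right now
        predicted: bool   # a violation is forecast within HORIZON steps

    def monitor(clearance: float, closing_speed: float) -> Verdict:
        """Check the invariant at the current step, then extrapolate the
        clearance forward assuming the closing speed stays constant."""
        violated = clearance < D_MIN
        predicted = any(
            clearance - closing_speed * DT * k < D_MIN
            for k in range(1, HORIZON + 1)
        )
        return Verdict(violated, predicted)

    # Example: 1.0 m of clearance shrinking at 0.8 m/s is safe now but
    # trips the predictive flag (it falls below 0.5 m within 1 second).
    print(monitor(clearance=1.0, closing_speed=0.8))
    # Verdict(violated=False, predicted=True)

Raising a predictive flag, rather than only detecting a violation after the fact, is what would give an online metareasoning layer like the one in activity (6) time to intervene and restore safe behavior.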

  • Program Officer
Jie Yang, jyang@nsf.gov, (703) 292-4768
  • Min Amd Letter Date
8/21/2024
  • Max Amd Letter Date
8/21/2024

Institutions

  • Name
    Oregon State University
  • City
    CORVALLIS
  • State
    OR
  • Country
    United States
  • Address
    1500 SW JEFFERSON AVE
  • Postal Code
97331-8655
  • Phone Number
(541) 737-4933

Investigators

  • First Name
    Houssam
  • Last Name
    Abbas
  • Email Address
    houssam.abbas@oregonstate.edu
  • Start Date
8/21/2024
  • First Name
    Sandhya
  • Last Name
    Saisubramanian
  • Email Address
    sandhya.sai@oregonstate.edu
  • Start Date
8/21/2024

Program Element

  • Text
    AI-Safety