Autonomous robots hold the potential to revolutionize society in areas such as healthcare, transportation, and manufacturing. These systems frequently employ learning-enabled components in their perception, planning, and control modules, necessitating complex design choices to ensure safe operation. However, design decisions that initially appear sound may lead to unexpected problems during testing or, even worse, after deployment. For example, an autonomous vehicle once exhibited erratic swerving to localize itself for lane-keeping, a failure mode unforeseen by the system's designers and developers. Such surprises indicate that the agent's norms, that is, what it considers permissible and obligatory, are inappropriate in certain situations. As learning-enabled systems become more complex, operate in open environments, and interact with humans and other robots, these challenges are likely to be exacerbated. This project focuses on safety failures of reinforcement learning (RL) agents, which stem from two primary sources: the misalignment between design intent and the agent's perceived norms, and the gap between the knowledge the agent requires for safe operation and its actual perception capabilities. The goal is to equip researchers and practitioners with tools to design provably safe autonomous systems, encompassing all major stages of design, verification, and deployment.

The project develops a process to iteratively align an RL agent's norms with those of its designers and to formally verify the resulting behavior. Key activities include: (1) developing inverse reinforcement learning algorithms to learn a reward function from demonstrations, constrained by deontic logic; (2) systematically exploring the trained agent's norms to uncover unknowns by generating norms that would surprise the engineer; (3) querying the agent to explain its reward function when it produces undesired behavior; (4) defining a new class of obligations related to knowledge and a corresponding formal specification logic; (5) designing run-time monitors to predict action and knowledge safety violations during operation (see the sketch below); (6) implementing online metareasoning, coupled with introspective perception modules, to restore safe behavior; and (7) iteratively improving system alignment by updating the agent's learning process using verification and run-time monitoring results. The project's outcomes are validated using an industrial simulator of a real-world bipedal robot, scaled-down autonomous race cars, and a campus-wide fleet of delivery robots.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
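To illustrate the run-time monitoring idea in activity (5), the minimal Python sketch below wraps an RL policy's proposed action in a predictive safety check; the one-step dynamics model, safety predicate, and fallback action used here are hypothetical placeholders for exposition, not components developed by the project.

    # Minimal, hypothetical sketch of a run-time safety monitor for an RL agent.
    # The dynamics model, safety predicate, and fallback action are illustrative
    # placeholders, not artifacts of the project described above.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class SafetyMonitor:
        predict_next_state: Callable[[float, int], float]  # assumed one-step dynamics model
        is_safe: Callable[[float], bool]                    # safety specification on states
        fallback_action: int                                # action assumed to be safe

        def filter(self, state: float, proposed_action: int) -> int:
            # Predict the outcome of the proposed action and check it against the
            # safety specification; on a predicted violation, substitute the fallback.
            predicted = self.predict_next_state(state, proposed_action)
            return proposed_action if self.is_safe(predicted) else self.fallback_action

    # Toy usage: a 1-D state that must remain inside the interval [-1.0, 1.0].
    monitor = SafetyMonitor(
        predict_next_state=lambda s, a: s + (0.1 if a == 1 else -0.1),
        is_safe=lambda s: abs(s) <= 1.0,
        fallback_action=0,
    )
    print(monitor.filter(0.95, proposed_action=1))  # predicted 1.05 violates the bound, so 0 is returned

This sketch only checks a single predicted step; the project's monitors are described as predicting both action and knowledge safety violations, which would require richer state and specification representations than the toy predicate shown here.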