The rapid advancement and widespread deployment of generative artificial intelligence foundation models, particularly large language models, have brought escalating risks alongside their benefits. Despite tremendous efforts devoted to safety alignment, these models have been shown to be fragile: vulnerable to "jailbreaking" attacks that bypass the safeguards built into deployed models, and prone to systematic deterioration of those safeguards after custom fine-tuning. The goal of this project is to advance understanding of the fundamental causes of these safety issues and to innovate more effective methodologies for ensuring safety, thereby enabling the trustworthy deployment of foundation models in diverse applications. The scientific advances resulting from this work will support the pressing national need for trustworthy artificial intelligence that benefits society at large. Furthermore, this project will reach a broader audience through the organization of workshops, innovation competitions, and the development of educational curricula.<br/><br/>This project will pursue the following tasks, weaving safety measures throughout a model's lifecycle: (1) This project will conduct in-depth analyses of model behavior and its contributing factors to identify the root causes of harmful outputs and develop targeted interventions. (2) This project will develop a comprehensive testing framework that subjects models to diverse simulated threats, assessing their resilience, identifying vulnerabilities in human-like interactions, and developing effective countermeasures to enhance robustness.
(3) This project will explore methods to integrate safety constraints into the model adaptation process and establish systems for continuous monitoring of model behavior in real-world applications, ensuring that the model remains aligned with desired behaviors, secure, and reliable as it is applied in new contexts.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.