SaTC: CORE: Small: A Framework for Safety Assurance in Foundation Models

Information

  • Award Id
    2424127
  • Award Effective Date
    10/1/2024
  • Award Expiration Date
    9/30/2027
  • Award Amount
    $309,407.00
  • Award Instrument
    Continuing Grant

The rapid advancement and widespread deployment of generative artificial intelligence foundation models, particularly large language models, have also led to an escalation of risks. Despite the tremendous efforts put into their safety alignment, these models have been shown to be fragile, vulnerable to "jailbreaking" attacks against safeguards built into deployed models, and prone to systematic deterioration after custom fine-tuning. The goal of this project is to advance our understanding of the fundamental causes of safety issues and innovate more effective methodologies for ensuring safety, thereby enabling the trustworthy deployment of foundation models in diverse applications. The scientific advances resulting from this work will support the pressing national need for trustworthy artificial intelligence that benefits society at large. Furthermore, this project will impact a broader audience through the organization of workshops, innovation competitions, and the development of educational curricula.

This project will pursue the following tasks, weaving safety measures throughout a model's lifecycle: (1) This project will conduct in-depth analyses of the model's behavior and contributing factors to identify the root causes of harmful outputs and develop targeted interventions. (2) This project will develop a comprehensive testing framework that subjects the model to diverse simulated threats, assessing its resilience, identifying vulnerabilities in human-like interactions, and developing effective countermeasures to enhance robustness. (3) This project will explore methods to integrate safety constraints into the model adaptation process and establish systems for continuous monitoring of the model's behavior in real-world applications to ensure the model remains aligned with desired behaviors, secure, and reliable as it is applied in new contexts.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Nan Zhang, nanzhang@nsf.gov, 7032920000
  • Min Amd Letter Date
    7/11/2024
  • Max Amd Letter Date
    7/11/2024
  • ARRA Amount

Institutions

  • Name
    Virginia Polytechnic Institute and State University
  • City
    BLACKSBURG
  • State
    VA
  • Country
    United States
  • Address
    300 TURNER ST NW
  • Postal Code
    24060-3359
  • Phone Number
    5402315281

Investigators

  • First Name
    Ruoxi
  • Last Name
    Jia
  • Email Address
    ruoxijia@vt.edu
  • Start Date
    7/11/2024 12:00:00 AM

Program Element

  • Text
    Secure & Trustworthy Cyberspace
  • Code
    806000

Program Reference

  • Text
    SaTC: Secure and Trustworthy Cyberspace
  • Text
    SMALL PROJECT
  • Code
    7923
  • Text
    WOMEN, MINORITY, DISABLED, NEC
  • Code
    9102