SaTC: CORE: Small: A Framework for Safety Assurance in Foundation Models

Information

NSF Award
2424127

Owner

Virginia Polytechnic Institute and State University

Award Id
2424127
Award Effective Date
10/1/2024
Award Expiration Date
9/30/2027
Award Amount
$309,407.00
Award Instrument
Continuing Grant

Information

The rapid advancement and widespread deployment of generative artificial intelligence foundation models, particularly large language models, have also led to an escalation of risks. Despite the tremendous efforts put into their safety alignment, these models have been shown to be fragile, vulnerable to "jailbreaking" attacks against safeguards built into deployed models, and prone to systematic deterioration after custom fine-tuning. The goal of this project is to advance our understanding of the fundamental causes of safety issues and innovate more effective methodologies for ensuring safety, thereby enabling the trustworthy deployment of foundation models in diverse applications. The scientific advances resulting from this work will support the pressing national need for trustworthy artificial intelligence that benefits society at large. Furthermore, this project will impact a broader audience through the organization of workshops, innovation competitions, and the development of educational curricula.

This project will pursue the following tasks, weaving safety measures throughout a model's lifecycle: (1) This project will conduct in-depth analyses of the model's behavior and contributing factors to identify the root causes of harmful outputs and develop targeted interventions. (2) This project will develop a comprehensive testing framework that subjects the model to diverse simulated threats, assessing its resilience, identifying vulnerabilities in human-like interactions, and developing effective countermeasures to enhance robustness. (3) This project will explore methods to integrate safety constraints into the model adaptation process and establish systems for continuous monitoring of the model's behavior in real-world applications to ensure the model remains aligned with desired behaviors, secure, and reliable as it is applied in new contexts.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Program Officer
Nan Zhang, nanzhang@nsf.gov, 7032920000
Min Amd Letter Date
7/11/2024
Max Amd Letter Date
7/11/2024
ARRA Amount

Institutions

Name
Virginia Polytechnic Institute and State University
City
BLACKSBURG
State
VA
Country
United States
Address
300 TURNER ST NW
Postal Code
24060-3359
Phone Number
5402315281

Investigators

First Name
Ruoxi
Last Name
Jia
Email Address
ruoxijia@vt.edu
Start Date
7/11/2024

Program Element

Text
Secure & Trustworthy Cyberspace
Code
806000

Program Reference

Text
SaTC: Secure and Trustworthy Cyberspace

Text
SMALL PROJECT
Code
7923

Text
WOMEN, MINORITY, DISABLED, NEC
Code
9102
