EAGER: Using Large Language Models to Model Threats to Sensitive Information

Information

NSF Award
2331492

Owner

Columbia University

Award Id
2331492
Award Effective Date
10/1/2023
Award Expiration Date
9/30/2024
Award Amount
$299,990.00
Award Instrument
Standard Grant

Information

The review process for releasing government records can be time-consuming and error-prone. Large Language Models could help reviewers determine whether information is already in the public domain. By developing a prototype system and measuring performance at different stages, this project aims to estimate the additional data and training required to achieve acceptable levels of accuracy. The iterative nature of the system and the involvement of domain experts allow for measuring and minimizing “hallucination.”

The project decouples the reasoning ability of Large Language Models from knowledge databases. It develops a semantic query engine optimized for accurate extraction of relevant information. The project also takes an active approach to fine-tuning: domain experts train a model that generates queries to retrieve records from the knowledge base, and they fine-tune the retrieval engines by assessing the passages extracted from these records before those passages are fed into the Large Language Model for analysis. The output includes text descriptions of what is found through record assembly, accompanied by the records themselves for further evaluation and fine-tuning. Recently released records will serve as test data, with experts categorizing the information as new or already known. Performance metrics are analyzed, considering the impact of data size and composition on accuracy.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Program Officer
Jeremy Epstein, jepstein@nsf.gov, 7032928338
Min Amd Letter Date
7/6/2023
Max Amd Letter Date
7/6/2023
ARRA Amount

Institutions

Name
Columbia University
City
NEW YORK
State
NY
Country
United States
Address
202 LOW LIBRARY 535 W 116 ST MC
Postal Code
10027
Phone Number
2128546851

Investigators

First Name
Matthew
Last Name
Connelly
Email Address
mjc96@columbia.edu
Start Date
7/6/2023

Program Element

Text
Secure & Trustworthy Cyberspace
Code
8060

Program Reference

Text
SaTC: Secure and Trustworthy Cyberspace

Text
EAGER
Code
7916
