Collaborative Research: III: Medium: Retrieval-Enhanced Machine Learning Through an Information Retrieval Lens

Information

NSF Award
2402874

Owner

CARNEGIE-MELLON UNIVERSITY

Award Id
2402874
Award Effective Date
10/1/2024
Award Expiration Date
9/30/2027
Award Amount
$336,954.00
Award Instrument
Continuing Grant

Information

Retrieval-Enhanced Machine Learning (REML) refers to a subset of machine learning models that make predictions using the results of one or more retrieval models run over collections of documents. REML has recently attracted considerable attention due to its wide range of applications, including knowledge grounding for question answering and improving generalization in large language models. However, REML has mainly been studied from a machine learning perspective, without a focus on its retrieval aspects. Preliminary explorations have demonstrated the importance of retrieval to downstream REML performance. This observation motivates the project, which offers an alternative view of REML by studying it from an information retrieval (IR) perspective. In this perspective, the retrieval component in REML is framed as a search engine capable of supporting multiple, independent predictive models, rather than a single predictive model, as in the majority of existing work.

This project consists of three major research thrusts. First, the project will develop novel architectures and optimization solutions that provide information access to multiple machine learning models conducting a wide variety of tasks. Second, the project will study training and inference efficiency in the context of REML by focusing on how downstream machine learning models use retrieval results and on the feedback they provide. Third, the project will study approaches for responsible REML by examining data control for content providers in REML, as well as fairness and robustness across multiple downstream models. Without loss of generality, the project will primarily focus on a number of real-world language tasks, such as open-domain question answering, fact verification, and open-domain dialogue systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Program Officer
Cornelia Caragea
ccaragea@nsf.gov
7032922706
Min Amd Letter Date
8/12/2024
Max Amd Letter Date
8/12/2024
ARRA Amount

Institutions

Name
Carnegie-Mellon University
City
PITTSBURGH
State
PA
Country
United States
Address
5000 FORBES AVE
Postal Code
152133815
Phone Number
4122688746

Investigators

First Name
Fernando
Last Name
Diaz
Email Address
diazf@cmu.edu
Start Date
8/12/2024 12:00:00 AM

Program Element

Text
Info Integration & Informatics
Code
736400

Program Reference

Text
INFO INTEGRATION & INFORMATICS
Code
7364

Text
MEDIUM PROJECT
Code
7924
