MRI: Acquisition of the LanguageLens for Large-Scale Language Modeling

Information

NSF Award
2214708

Owner

Brigham Young University

Award Id
2214708
Award Effective Date
8/1/2022
Award Expiration Date
7/31/2025
Award Amount
$1,014,815.00
Award Instrument
Standard Grant

Machine learning is revolutionizing many parts of society, but training the very best models requires tremendous computing resources that are often out of reach for academic groups. This project therefore acquires a special-purpose instrument, named the LanguageLens, that is designed to process vast amounts of natural language text. The LanguageLens will support research in natural language processing, deep learning, computational linguistics, crisis informatics, conversational AI, neural machine translation, and legal corpus linguistics, and will enable academic research to advance both the machine learning needed to train large models and societally relevant applications of those models.

The LanguageLens is a high-performance GPU cluster that balances compute, storage, and internode communication to support a variety of demanding NLP-based workloads. The LanguageLens will focus on research projects that have the potential for transformational, interdisciplinary impact across a wide variety of fields. A key area of focus for the instrument is the ability to train new large-scale language models and to examine their inner workings in real time. Language models will be trained with specific downstream applications in mind, on novel corpora as well as with novel neuro-symbolic architectures, to help derive insight from the resulting weights. The LanguageLens will prioritize support for research that addresses pressing societal problems. It will also provide authentic workforce training and educational experiences for students: as the resource gap between industry and academia grows, it is increasingly difficult to give students opportunities to pursue high-impact research that involves huge models and datasets. Finally, as many companies refuse to release the pretrained weights of their models, a central goal is to make trained weights freely available to everyone, subject to ethical considerations, to drive national impact for both industry and academia. Project resources such as code, publications, datasets, and pretrained models will be available through the LanguageLens website at https://ll.cs.byu.edu/.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Program Officer
Daniel Andresen, dandrese@nsf.gov, 7032922177
Min Amd Letter Date
8/16/2022
Max Amd Letter Date
9/7/2022
ARRA Amount

Institutions

Name
Brigham Young University
City
PROVO
State
UT
Country
United States
Address
A-153 ASB
Postal Code
846021128
Phone Number
8014223360

Investigators

First Name
Joshua
Last Name
Gubler
Email Address
jgub@byu.edu
Start Date
8/16/2022 12:00:00 AM

First Name
Ethan
Last Name
Busby
Email Address
ethan.busby@byu.edu
Start Date
8/16/2022 12:00:00 AM

First Name
David
Last Name
Wingate
Email Address
wingated@cs.byu.edu
Start Date
8/16/2022 12:00:00 AM

First Name
Nancy
Last Name
Fulda
Email Address
nfulda@cs.byu.edu
Start Date
8/16/2022 12:00:00 AM

First Name
Lisa
Last Name
Argyle
Email Address
lisa_argyle@byu.edu
Start Date
8/16/2022 12:00:00 AM

Program Element

Text
Major Research Instrumentation
Code
1189

Text
Special Projects - CNS
Code
1714

Program Reference

Text
MAJOR RESEARCH INSTRUMENTATION
Code
1189

Text
REU SUPP-Res Exp for Ugrd Supp
Code
9251
