MRI: Acquisition of the LanguageLens for Large-Scale Language Modeling

Information

  • Award Id
    2214708
  • Award Effective Date
    8/1/2022
  • Award Expiration Date
    7/31/2025
  • Award Amount
    $ 1,014,815.00
  • Award Instrument
    Standard Grant

Machine learning is revolutionizing many parts of society, but training the very best models requires tremendous computing resources that are often out of reach for academic groups. This project therefore acquires a special-purpose instrument, named the LanguageLens, that is designed to process vast amounts of natural language text. The LanguageLens will support research in natural language processing, deep learning, computational linguistics, crisis informatics, conversational AI, neural machine translation, and legal corpus linguistics. It will enable academic research to advance both the machine learning methods needed to train large models and the societally relevant applications of those models.

The LanguageLens is a high-performance GPU cluster that balances compute, storage, and internode communication to support a variety of demanding NLP-based workloads. The LanguageLens will focus on supporting research projects that have the potential for transformational, interdisciplinary impact across a wide variety of fields. A key area of focus for the instrument is the ability to train new large-scale language models and to examine their inner workings in real time. Language models will be trained with specific downstream applications in mind, on novel corpora, and with novel neuro-symbolic architectures, to help derive insight from the resulting weights. The LanguageLens will prioritize support for research that addresses pressing societal problems. It will also provide authentic workforce training and educational experiences for students: as the resource gap between industry and academia grows, it is increasingly difficult to give them opportunities to pursue high-impact research that involves huge models and datasets. Finally, because many companies refuse to release the pretrained weights of their models, a central goal is to make trained weights freely available to everyone, subject to ethical considerations, to drive national impact for both industry and academia. Project resources such as code, publications, datasets, and pretrained models will be available through the LanguageLens website at https://ll.cs.byu.edu/.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Daniel Andresen, dandrese@nsf.gov, 703-292-2177
  • Min Amd Letter Date
    8/16/2022
  • Max Amd Letter Date
    9/7/2022
  • ARRA Amount

Institutions

  • Name
    Brigham Young University
  • City
    PROVO
  • State
    UT
  • Country
    United States
  • Address
    A-153 ASB
  • Postal Code
    84602-1128
  • Phone Number
    801-422-3360

Investigators

  • First Name
    Joshua
  • Last Name
    Gubler
  • Email Address
    jgub@byu.edu
  • Start Date
    8/16/2022 12:00:00 AM
  • First Name
    Ethan
  • Last Name
    Busby
  • Email Address
    ethan.busby@byu.edu
  • Start Date
    8/16/2022 12:00:00 AM
  • First Name
    David
  • Last Name
    Wingate
  • Email Address
    wingated@cs.byu.edu
  • Start Date
    8/16/2022 12:00:00 AM
  • First Name
    Nancy
  • Last Name
    Fulda
  • Email Address
    nfulda@cs.byu.edu
  • Start Date
    8/16/2022 12:00:00 AM
  • First Name
    Lisa
  • Last Name
    Argyle
  • Email Address
    lisa_argyle@byu.edu
  • Start Date
    8/16/2022 12:00:00 AM

Program Element

  • Text
    Major Research Instrumentation
  • Code
    1189
  • Text
    Special Projects - CNS
  • Code
    1714

Program Reference

  • Text
    MAJOR RESEARCH INSTRUMENTATION
  • Code
    1189
  • Text
    REU SUPP-Res Exp for Ugrd Supp
  • Code
    9251