Collaborative Research: CCRI: New: Building a Broad Infrastructure for Uniform Meaning Representations

Information

  • Award Id
    2213804
  • Award Effective Date
    8/1/2022
  • Award Expiration Date
    7/31/2025
  • Award Amount
    $ 999,689.00
  • Award Instrument
    Standard Grant

When humans attempt to talk with a computer, our language needs to be translated into a meaning representation that can be processed and understood by the computer. Currently, such translation is done on a task-by-task and language-by-language basis. This fragmented approach introduces redundancy and repetition, and is thus inefficient. Uniform Meaning Representation (UMR) is designed as a machine-readable language into which all languages can be translated, from high-resource languages such as English and Chinese to low-resource languages like Arapaho. UMR can also be extended to multi-modal settings to represent the content of videos and images, allowing computers to better process and understand the content of these media forms. This project aims to build the necessary infrastructure for translating languages and other media into UMRs. This infrastructure includes tools used to facilitate the translation of human language to UMRs, metrics that can be used to evaluate the quality of UMRs, and an initial collection of UMRs for five languages that have very different linguistic properties: English, Chinese, Arabic, Arapaho, and Quechua, as well as video content that includes both language and gestures for two of those languages. The project also includes outreach efforts to engage fellow researchers in producing UMRs for additional languages and genres through tutorials, workshops, and summer schools, as well as online training materials. Once a sufficient number of UMRs have been created for a language, computer models and algorithms can be trained on these UMRs to automatically produce more UMRs for new data in that language. These models can then be used to advance the state of the art for a wide range of downstream human language technologies, ranging from human-robot interaction to dialogue systems, from information extraction to question answering, and from machine translation to text summarization.
The project will also produce UMRs for under-resourced languages and help bring modern language technologies to speakers of those languages, as well as people working on the documentation and/or revitalization of the languages.

This project brings together an interdisciplinary team of linguists and computer scientists to jointly build an infrastructure for Uniform Meaning Representation (UMR), a practical, formal, computationally tractable, and cross-linguistically valid document-level meaning representation of natural language that can impact a wide range of downstream applications that require “deep” natural language understanding (NLU). The UMR infrastructure will consist of UMR-annotated data sets for five languages, including multimodal data sets for two of those languages, English and Arapaho; a UMR annotation interface and relevant training materials; baseline UMR parsing models that fellow NLP researchers can use as a point of comparison when developing more advanced UMR parsing models; metrics for evaluating document-level meaning representations; and a platform for disseminating the UMR data sets, tools, and resources to users of the infrastructure. This project also includes a broad range of outreach efforts consisting of workshops, tutorials, summer schools, and a shared task at the end of the project, to involve fellow researchers in the NLP community in producing UMRs for additional languages and to promote the use of the UMR infrastructure in meaning representation parsing research and downstream applications. The UMR infrastructure promotes the development of general-purpose multilingual and multimodal applications in an effort to move away from both language-specific and task-specific models that require repetitive and often conflicting semantic annotation efforts.
The ultimate goal of the project is to build a community of NLP researchers that will contribute to the development of UMR-based data and tools, and adopt UMR in downstream applications to advance the state of the art in Natural Language Processing (NLP) in particular and Artificial Intelligence (AI) in general. The proposed infrastructure also promotes access to information technology in languages for traditionally underrepresented groups by providing the necessary tools and resources to develop AI technologies for these languages.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Tatiana Korelsky, tkorelsk@nsf.gov, 7032928729
  • Min Amd Letter Date
    7/12/2022
  • Max Amd Letter Date
    7/12/2022
  • ARRA Amount

Institutions

  • Name
    Brandeis University
  • City
    WALTHAM
  • State
    MA
  • Country
    United States
  • Address
    415 SOUTH ST
  • Postal Code
    02453-2728
  • Phone Number
    7817362121

Investigators

  • First Name
    Nianwen
  • Last Name
    Xue
  • Email Address
    xuen@cs.brandeis.edu
  • Start Date
    7/12/2022 12:00:00 AM
  • First Name
    James
  • Last Name
    Pustejovsky
  • Email Address
    pustejovsky@gmail.com
  • Start Date
    7/12/2022 12:00:00 AM

Program Element

  • Text
    CCRI-CISE Cmnty Rsrch Infrstrc
  • Code
    7359

Program Reference

  • Text
    COMPUTING RES INFRASTRUCTURE
  • Code
    7359