RI: Small: From acoustics to semantics: Embedding speech for a hierarchy of tasks

Information

  • NSF Award
  • 1816627
Owner
  • Award Id
    1816627
  • Award Effective Date
    8/15/2018
  • Award Expiration Date
    7/31/2021
  • Award Amount
    $ 449,984.00
  • Award Instrument
    Continuing grant

A growing array of spoken language interfaces is available, such as virtual assistants and telephone customer service interfaces. These technologies both (1) recognize the words spoken by a user and (2) extract actionable information, such as the topic of the user's query and the degree of match between the query and documents in a database. Such applications are typically treated as a pipeline of automatic speech transcription followed by text processing to extract the meaning. This project aims to develop technology that directly extracts meaning from speech, while using a variety of linguistic information along the way. This approach is intended to mitigate the effects of speech recognition errors, as well as to use all of the meaning-bearing information in speech, such as intonation. This work is expected to have long-term broad impact through technological advances, as well as immediate broad impact through the PI's involvement in local schools and mentoring of a diverse set of visiting students.

The technical goals of this work are (1) to do high-quality natural language processing directly on speech; (2) to seamlessly integrate domain knowledge into end-to-end speech models; (3) to improve the performance-vs.-resources tradeoff; and (4) to develop models for embedding arbitrary speech signals into meaning-bearing representations. The process of mapping from speech to meaning can be viewed as a hierarchy of tasks, from the most basic acoustic-phonetic tasks to the deepest semantic tasks. The experimental work will focus on two task hierarchies: a "retrieval" hierarchy including query-by-example search, keyword spotting, and semantic speech search; and a "recognition" hierarchy including phonetic recognition, word recognition, parsing, and topic identification. The main technical approaches to be developed include hierarchical multitask learning methods for incorporating domain knowledge and mitigating low-data settings, as well as new models for acoustic-semantic speech embedding.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Tatiana D. Korelsky
  • Min Amd Letter Date
    8/2/2018
  • Max Amd Letter Date
    9/13/2018
  • ARRA Amount

Institutions

  • Name
    Toyota Technological Institute at Chicago
  • City
    Chicago
  • State
    IL
  • Country
    United States
  • Address
    6045 S. Kenwood Avenue
  • Postal Code
    606372803
  • Phone Number
    7738340409

Investigators

  • First Name
    Karen
  • Last Name
    Livescu
  • Email Address
    klivescu@ttic.edu
  • Start Date
    8/2/2018 12:00:00 AM

Program Element

  • Text
    ROBUST INTELLIGENCE
  • Code
    7495

Program Reference

  • Text
    ROBUST INTELLIGENCE
  • Code
    7495
  • Text
    SMALL PROJECT
  • Code
    7923