EAGER: Discovery of Segmental Sub-Word Structure in Speech

Information

  • NSF Award
  • 1433485
Owner
  • Award Id
    1433485
  • Award Effective Date
    3/1/2014
  • Award Expiration Date
    2/28/2015
  • Award Amount
    $ 99,911.00
  • Award Instrument
    Standard Grant

This EArly Concept Grant for Exploratory Research (EAGER) investigates new machine learning techniques for discovering sub-word units in speech for use in automatic speech recognition (ASR). The representation of words in terms of sub-word units is rarely learned from data, despite significant disagreement among linguists as to the sub-word unit inventory. This project represents exploratory work toward a larger goal of making all aspects of ASR learnable, using scientific insights while being discriminatively trained.

In contrast with prior work, speech segments are clustered into units using discriminatively learned segmental similarities, rather than via dynamic time warping or hidden Markov models. Rather than pre-supposing phoneme-like units, multiple heterogeneous unit types are learned. The project also leverages multi-modal (video, articulatory, and so on) data to improve unit discovery by sharing information across modalities. In this exploratory work, the learned units are used in a discriminative model that rescores initial outputs from a standard phone-based recognizer, and the experiments focus on small-/medium-vocabulary recognition.

This project explores new ways of discovering the basic units of speech. Beyond improvements to speech recognition, this project has the potential for broad impact on other research areas involving sequences with segmental sub-structure (such as text, video, biological data, and financial data) or involving clustering. The results may also include new representations for the study of speech in linguistics and speech science. From a societal perspective, in the long term, making speech recognition more learnable will enable improved porting of the technology to under-served linguistic communities, which do not have the benefit of large data sets or other resources.

  • Program Officer
    Tatiana D. Korelsky
  • Min Amd Letter Date
    3/4/2014
  • Max Amd Letter Date
    3/4/2014
  • ARRA Amount

Institutions

  • Name
    Toyota Technological Institute at Chicago
  • City
    Chicago
  • State
    IL
  • Country
    United States
  • Address
    6045 S. Kenwood Avenue
  • Postal Code
    60637-2902
  • Phone Number
    (773) 834-0409

Investigators

  • First Name
    Karen
  • Last Name
    Livescu
  • Email Address
    klivescu@ttic.edu
  • Start Date
    3/4/2014 12:00:00 AM

Program Element

  • Text
    ROBUST INTELLIGENCE
  • Code
    7495

Program Reference

  • Text
    ROBUST INTELLIGENCE
  • Code
    7495
  • Text
    EAGER
  • Code
    7916