CAREER: Personalized Speech Enhancement: Test-Time Adaptation Using No or Few Private Data

Information

  • Award Id
    2512987
  • Award Effective Date
    10/1/2024
  • Award Expiration Date
    5/31/2026
  • Award Amount
    $ 244,179.00
  • Award Instrument
    Continuing Grant

Current general-purpose speech enhancement systems employ large models, trained on big datasets of audio signals, that are too bulky to run on small personal devices. A personalized model can be a resource-efficient solution because it focuses on a particular user and a specific test environment, for which a smaller model architecture can be good enough. However, training a personalized model requires clean voice data from the test-time user in advance, which is not always available because of the user's privacy concerns or problems with recording. This CAREER project develops machine-learning methods to achieve the personalization goal while requiring few or no data samples from the test-time users. Because the project achieves the personalization goal in a privacy-preserving and resource-efficient way, it is a step towards a more available and affordable use of artificial intelligence for all members of society.

The project circumvents the lack of personal data in personalized speech enhancement using no- and few-shot learning frameworks, with help from adversarial and self-supervised learning. First, it verifies that a personalized system with reduced computational complexity can still compete with a generic model in speech enhancement performance. To this end, the training algorithm divides the potentially large model into multiple sub-modules, each of which handles a particular sub-problem (e.g., a particular user's utterance). If the sub-problems are defined to be mutually exclusive, test-time inference can be made efficient by using only the most suitable sub-module. Since the sub-module selection is done on noisy speech, it achieves personalization with no additional training on the test user's data. Second, the project explores a no-shot learning approach, in which the fundamental challenge lies in optimizing a machine learning model with no available target. To this end, an already-trained general-purpose model is fine-tuned for an unseen test environment using adversarial optimization. Third, the project handles the case in which a small amount of the user's clean speech is available, which falls in the category of few-shot learning. The project overcomes data shortage via a self-supervised learning method that learns effective features from noisy speech data, which are more readily available than clean speech. That way, the model can be prepared for a subsequent fine-tuning step, which can be done with only a few clean user-specific speech utterances.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Tatiana Korelsky, tkorelsk@nsf.gov, 7032920000
  • Min Amd Letter Date
    12/16/2024
  • Max Amd Letter Date
    12/16/2024
  • ARRA Amount

Institutions

  • Name
    University of Illinois at Urbana-Champaign
  • City
    URBANA
  • State
    IL
  • Country
    United States
  • Address
    506 S WRIGHT ST
  • Postal Code
    618013620
  • Phone Number
    2173332187

Investigators

  • First Name
    Minje
  • Last Name
    Kim
  • Email Address
    minje@indiana.edu
  • Start Date
    12/16/2024 12:00:00 AM

Program Element

  • Text
    Robust Intelligence
  • Code
    749500

Program Reference

  • Text
    CAREER-Faculty Erly Career Dev
  • Code
    1045
  • Text
    ROBUST INTELLIGENCE
  • Code
    7495