Collaborative Research: FET: Small: De Novo Protein Scaffold Filling by Combinatorial Algorithms and Deep Learning Models

Information

  • Award Id
    2307572
  • Award Effective Date
    7/1/2023
  • Award Expiration Date
    6/30/2025
  • Award Amount
    $99,384.00
  • Award Instrument
    Standard Grant

Protein sequencing plays an important role in identifying protein functions, analyzing protein-protein interactions, and characterizing post-translational modifications. Despite recent progress in protein sequencing and assembly, many of the currently available assembled proteins are in draft form. Many gaps remain in the assembled protein sequences even when top-down and bottom-up sequencing methods are combined. In other words, the sequencing step for a specific protein more often than not ends with contigs separated by gaps (a structure called a scaffold). Hence, an important and natural combinatorial problem is to fill the missing amino acids into a scaffold to obtain a complete protein sequence. With the new framework produced by this project, de novo protein sequencing will greatly advance the research and clinical practice of identifying the function and structure of proteins. The project will provide researchers with powerful computational tools for obtaining the sequence information of antibodies, which is extremely valuable for the construction of antibody databases. This interdisciplinary research also provides various training projects to students at all levels, particularly to underrepresented African American students, and helps them pursue high-quality research from an open-minded and cross-disciplinary perspective. New advances will be integrated into undergraduate and graduate curricula. The results will be disseminated through journal publications, conferences, open-source software releases, tutorials, and seminar talks.

In this project, the investigators will study the mass spectrometry-based de novo protein scaffold filling problem in two related phases. First, the investigators will analyze top-down and bottom-up tandem mass spectrometry data to construct the protein scaffold without a proper reference. The methods include general global optimization, dynamic programming, and graph algorithms, which can also handle small protein mutations (where the mass of some amino acid changes). Second, the investigators will use deep learning methods, such as a combined convolutional neural network and long short-term memory model, a convolutional denoising autoencoder, and transformer models, to finish the last step of protein sequencing, starting from the scaffold obtained by the top-down and bottom-up tandem mass spectrometry analysis in the first phase. The project will result in a new framework of combined combinatorial and deep learning methods for protein scaffold filling, and corresponding open-source software.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
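
To make the combinatorial side of scaffold filling concrete, the following is a minimal illustrative sketch, not the investigators' method. It assumes a single gap whose total mass is known and uses nominal (integer) residue masses, then applies dynamic programming to count and list the amino acid strings that could fill the gap. The function names `count_fillings` and `enumerate_fillings` are hypothetical, and real spectra would require mass tolerances, modifications, and fragment-ion evidence that this toy example ignores.

```python
from functools import lru_cache

# Nominal (integer) residue masses in Da; a simplification of real monoisotopic masses.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def count_fillings(gap_mass):
    """Count residue strings whose nominal masses sum exactly to gap_mass."""
    if gap_mass < 0:
        return 0
    ways = [0] * (gap_mass + 1)
    ways[0] = 1  # the empty string fills a gap of mass zero
    for m in range(1, gap_mass + 1):
        # The last residue can be any amino acid that still fits into mass m.
        ways[m] = sum(ways[m - r] for r in RESIDUE_MASS.values() if r <= m)
    return ways[gap_mass]


def enumerate_fillings(gap_mass):
    """List (in sorted order) every residue string that fills the gap exactly."""
    @lru_cache(maxsize=None)
    def fill(m):
        if m == 0:
            return ("",)
        out = []
        for aa, r in RESIDUE_MASS.items():
            if r <= m:
                out.extend(aa + rest for rest in fill(m - r))
        return tuple(out)

    if gap_mass < 0:
        return []
    return sorted(fill(gap_mass))


if __name__ == "__main__":
    print(count_fillings(128))
    print(enumerate_fillings(128))
```

Even in this toy setting a gap of 128 Da admits several fillings (for example, a single Q or K, or the pair G and A in either order), which illustrates why mass alone is ambiguous and additional evidence is needed to choose among candidates.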

  • Program Officer
    Stephanie Gage, sgage@nsf.gov, 7032924748
  • Min Amd Letter Date
    6/28/2023
  • Max Amd Letter Date
    6/28/2023
  • ARRA Amount

Institutions

  • Name
    Montana State University
  • City
    BOZEMAN
  • State
    MT
  • Country
    United States
  • Address
    216 MONTANA HALL
  • Postal Code
    59717
  • Phone Number
    4069942381

Investigators

  • First Name
    Binhai
  • Last Name
    Zhu
  • Email Address
    bhz@montana.edu
  • Start Date
    6/28/2023

Program Element

  • Text
    FET-Fndtns of Emerging Tech

Program Reference

  • Text
    SMALL PROJECT
  • Code
    7923
  • Text
    COMPUTATIONAL BIOLOGY
  • Code
    7931