Compute-Cluster for Deep-Learning Models for Mass Spectrometry based Proteomics data

Information

  • ApplicationId
    10389445
  • Core Project Number
    R01GM134384
  • Full Project Number
    3R01GM134384-02S1
  • Serial Number
    134384
  • FOA Number
    PA-20-272
  • Sub Project Id
  • Project Start Date
    6/1/2020
  • Project End Date
    5/31/2023
  • Program Officer Name
    RAVICHANDRAN, VEERASAMY
  • Budget Start Date
    6/1/2021
  • Budget End Date
    5/31/2022
  • Fiscal Year
    2021
  • Support Year
    02
  • Suffix
    S1
  • Award Notice Date
    8/12/2021

Project Abstract/Summary

Mass spectrometry (MS) data is high-dimensional data that is used for large-scale systems biology proteomics. Current state-of-the-art mass spectrometers can generate thousands of spectra from a single organism and experiment. This high-dimensional data is processed using database searches and de novo algorithms, with varying degrees of success. The overarching objective of this study is to develop, test, integrate, and evaluate novel image-processing and deep-learning algorithms that will allow us to deduce and identify reliable peptide sequences in a definitive and quantitative fashion. Our long-term goal is to improve the identification of MS-based proteomics data using novel and scalable algorithms. The objective of this proposal is to investigate, design, and implement machine-learning and deep-learning algorithms for identifying peptides from MS data. Because deep learning is very good at discovering intricate structures in high-dimensional data, it will be an ideal solution for discovering dark proteomics data and for deducing peptides more accurately. We predict that integrating these methods with traditional numerical algorithms will lead to a multimodal, fusion-based approach for an optimized and accurate peptide deduction system for large-scale MS data. Further, we will design and implement data augmentation, memory-efficient indexing, and high-performance computing (HPC) methods to achieve these outcomes more efficiently and in less computational time. This new line of investigation is therefore significant, since it has the potential to advance the long-stalled effort to increase the accuracy, reliability, and reproducibility of MS data analysis and search tools. The proximate expected outcome of this work is a novel set of deep-learning and image-processing tools that will allow much better insight into MS-based proteomics data.
The results will have an important and immediate positive impact because the proposed research tasks will lay the groundwork for a new class of algorithms and will provide rapid, high-throughput, sensitive, reproducible, and reliable tools for MS-based proteomics.

IC Name
NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
  • Activity
    R01
  • Administering IC
    GM
  • Application Type
    3
  • Direct Cost Amount
    100000
  • Indirect Cost Amount
    0
  • Total Cost
    100000
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    859
  • Ed Inst. Type
    BIOMED ENGR/COL ENGR/ENGR STA
  • Funding ICs
    NIGMS:100000
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    BDMA
  • Study Section Name
    Biodata Management and Analysis Study Section
  • Organization Name
    FLORIDA INTERNATIONAL UNIVERSITY
  • Organization Department
    BIOSTATISTICS & OTHER MATH SCI
  • Organization DUNS
    071298814
  • Organization City
    MIAMI
  • Organization State
    FL
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    331992516
  • Organization District
    UNITED STATES