Excellence in Research: Developing a Knowledge Graph Driven Integrative Framework for Explainable Protein Function Prediction via Generative Deep Learning

Information

  • Award Id
    2302637
  • Award Effective Date
    8/1/2023
  • Award Expiration Date
    7/31/2026
  • Award Amount
    $533,752.00
  • Award Instrument
    Standard Grant

Excellence in Research: Developing a Knowledge Graph Driven Integrative Framework for Explainable Protein Function Prediction via Generative Deep Learning

Proteins are the building blocks of life and perform a multitude of functions, including, but not limited to, catalyzing reactions as enzymes, participating in the body's defense mechanisms as antibodies, forming structures, and transporting important chemicals. The interactions among proteins describe the molecular mechanisms of diseases and convey potentially important insights about disease prevention, diagnosis, and treatment. Functional characterization of proteins is therefore crucial to understanding life and disease and to developing novel treatments for life-threatening illnesses. Despite recent advancements, predicting protein function remains an open problem because of low performance, a lack of explainable outcomes, and irreproducible research dissemination. These shortcomings highlight the need for improved methodologies that leverage the recent proliferation of biomedical data about proteins.

The objective of this research is to design, implement, and evaluate a protein function prediction pipeline using a novel generative deep learning approach powered by a heterogeneous knowledge graph, addressing the challenges of multi-omics data integration, explainable function prediction, and reproducibility. The research will be carried out through three interrelated tasks: 1) investigation of a novel generative deep learning model on a knowledge graph; 2) integration of multi-omics features through a large language model; and 3) development of reproducible software. Successful completion of this project will lead to a robust, more accurate, reproducible, and explainable protein function prediction pipeline. The project will create new education and outreach opportunities that greatly strengthen training and research activities in computational biology, leveraging modern AI technologies, at Meharry Medical College, a leading HBCU. Meharry predominantly enrolls African American students. More than 90% of data science students at Meharry are African American, and the majority are women. This project will increase STEM education awareness, impact, and opportunity for women and minority students at Meharry to excel in AI/ML, quantitative genomics, and data science research. The reproducible open-source software will greatly facilitate the work of the broader scientific community to improve protein function prediction.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Anthony Kuh, akuh@nsf.gov, 7032924714
  • Min Amd Letter Date
    7/26/2023
  • Max Amd Letter Date
    7/26/2023
  • ARRA Amount

Institutions

  • Name
    Meharry Medical College
  • City
    NASHVILLE
  • State
    TN
  • Country
    United States
  • Address
    1005 DR DB TODD JR BLVD
  • Postal Code
    372083501
  • Phone Number
    6153276738

Investigators

  • First Name
    Aize
  • Last Name
    Cao
  • Email Address
    acao@mmc.edu
  • Start Date
    7/26/2023 12:00:00 AM
  • First Name
    Bishnu
  • Last Name
    Sarker
  • Email Address
    bsarker@mmc.edu
  • Start Date
    7/26/2023 12:00:00 AM

Program Element

  • Text
    HBCU-EiR - HBCU-Excellence in

Program Reference

  • Text
    HBCU-Strengthening Research Capacities
  • Text
    BIOMEDICAL ENGINEERING
  • Code
    5345
  • Text
    LEARNING & INTELLIGENT SYSTEMS
  • Code
    8888