Collaborative Research: III: Medium: Contextualized and Multimodal Foundation Models for Graph Data in Scientific Discovery

Information

  • NSF Award
    2403317
  • Award Id
    2403317
  • Award Effective Date
    8/15/2024
  • Award Expiration Date
    7/31/2028
  • Award Amount
    $631,953.00
  • Award Instrument
    Continuing Grant

Recent progress in deep learning has demonstrated the potential of foundation models built on massive datasets, particularly for scientific discovery. In the sciences, data ranging from particles to molecules, cells, and brain activity can be represented as nodes on a graph or as signals on a graph substrate. Successful AI foundation models for scientific discovery must therefore be able to handle such graph-structured data and integrate it with other data types such as text, images, and tabular data. The proposed model can serve as a general substrate to help scientists predict and understand a variety of graph-structured data, such as molecules, proteins, and connectomes.

Existing methods for building graph foundation models for scientific discovery are generally severely limited in four ways. They: 1) do not consider contexts in which the vertices of a graph can themselves be complex structures, such as molecular graphs; 2) do not incorporate multimodal information in the form of knowledge graphs and text; 3) support only limited forms of message passing, such as local averaging; and 4) are not versatile, yielding limited performance gains because of the diversity of downstream tasks and graph data distributions. This team of researchers will develop a general foundation model framework for graph-represented data in scientific domains that systematically addresses these key limitations. The framework incorporates novel approaches to multi-level graph neural networks, graph signal processing, multimodal graph learning, graph-specific fine-tuning, and in-context learning. By capturing human scientific knowledge and expressing the complexity of the natural world, our framework has the potential to dramatically transform machine learning models in scientific discovery and will allow us to tackle a wide range of complex scientific tasks, even with scarce supervision labels.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Hector Munoz-Avila; hmunoz@nsf.gov; 7032924481
  • Min Amd Letter Date
    8/12/2024
  • Max Amd Letter Date
    8/12/2024
  • ARRA Amount

Institutions

  • Name
    Yale University
  • City
    NEW HAVEN
  • State
    CT
  • Country
    United States
  • Address
    150 MUNSON ST
  • Postal Code
    06511-3572
  • Phone Number
    2037854689

Investigators

  • First Name
    Rex
  • Last Name
    Ying
  • Email Address
    rex.ying@yale.edu
  • Start Date
    8/12/2024 12:00:00 AM
  • First Name
    Smita
  • Last Name
    Krishnaswamy
  • Email Address
    smita.krishnaswamy@yale.edu
  • Start Date
    8/12/2024 12:00:00 AM

Program Element

  • Text
    Info Integration & Informatics
  • Code
    736400

Program Reference

  • Text
    INFO INTEGRATION & INFORMATICS
  • Code
    7364
  • Text
    MEDIUM PROJECT
  • Code
    7924