Resolving and understanding the genomic basis of heterogeneous complex traits and diseases

Information

  • Research Project
  • ApplicationId
    10406616
  • Core Project Number
    R35GM128765
  • Full Project Number
    3R35GM128765-04S1
  • Serial Number
    128765
  • FOA Number
    PA-20-272
  • Sub Project Id
  • Project Start Date
    8/15/2018
  • Project End Date
    7/31/2023
  • Program Officer Name
    KRASNEWICH, DONNA M
  • Budget Start Date
    8/1/2021
  • Budget End Date
    7/31/2022
  • Fiscal Year
    2021
  • Support Year
    04
  • Suffix
    S1
  • Award Notice Date
    9/17/2021

Parent grant: Resolving and understanding the genomic basis of heterogeneous complex traits and diseases.

Hundreds of genomics studies have exposed major gaps in our understanding of the mechanistic relationships between genomic variation, cellular processes, tissue function, and trait variation. The goal of the parent project is to develop a suite of computational frameworks that integrate massive collections of genomic and biomedical data to make the following three advances:

  • Direction 1: Discern and leverage mechanism-based subtypes of complex traits and diseases.
  • Direction 2: Characterize physiology and disease along the human lifespan and across the sexes.
  • Direction 3: Find analogous contexts in model organisms for studying human traits/diseases.

As demonstrated by us and others, genome-wide molecular networks are grand unifiers of molecular data and knowledge, and serve as powerful tools to contextually understand the roles genes play in cellular pathways, tissue physiology, phenotype/disease mechanisms, and drug action. Hence, a central aspect of our parent project is to develop multiple machine learning approaches that leverage molecular networks to generate accurate, testable hypotheses about the roles genes play in defining subtypes, age/sex differences, and cross-species analogs of a range of complex disorders.

As part of this work, we have developed GenePlexus (github.com/krishnanlab/GenePlexus), open-source software to run and benchmark our state-of-the-art approach for combining genome-scale networks with supervised machine learning (ML) to obtain accurate novel predictions about various gene attributes (e.g., pathway membership or disease association; Liu*, Mancuso*, et al., 2020 Bioinformatics). Similarly, our group has committed efforts to make all our other computational methods available to the broader biomedical research community in the form of software tools for open science. We have released such software with nearly all our papers. Other recent examples include:

  • PecanPy (github.com/krishnanlab/PecanPy) for parallelized, efficient, and accelerated node2vec.
  • Expresto (github.com/krishnanlab/Expresto) for imputing unmeasured genes in transcriptomes.
  • Txt2Onto (github.com/krishnanlab/Txt2Onto) for annotating 'omics samples based on free-text metadata.

Goal of the supplement project and current prototype. GenePlexus: A cloud platform for network-based machine learning.

The goal of the proposed supplement is to take our software development to the next level by building a new cloud-based GenePlexus platform to enable: i) biomedical/experimental researchers to seamlessly take advantage of network-based ML to generate interpretable genome-wide predictions, and ii) computational researchers to run network-based ML, retrieve results, and integrate them with existing data analysis workflows. The project team, which includes the PI, a postdoc trained in cloud computing, and two professional software engineers, has worked together over the past six months to develop a prototype of the GenePlexus platform (in the form of a web server hosted on Microsoft Azure; Fig. 1).

Figure 1: Screenshots of the current prototype of the GenePlexus web server that implements our original GenePlexus software (github.com/krishnanlab/GenePlexus). When a user uploads a gene set, GenePlexus creates a virtual machine that trains an ML model, leveraging pre-saved data. In addition to predicting genes in the network that are functionally similar to the user-supplied genes, GenePlexus enables the user to interpret the custom-built ML model in terms of its similarity to thousands of models that were trained to predict genes associated with biological processes (from Gene Ontology) and diseases (from DisGeNET). In addition to retrieving the prediction results in multiple convenient formats, users can visualize the top predicted genes as a network graph.
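
The idea of combining a gene network with supervised ML, as described in the abstract, can be illustrated with a minimal sketch. This is not the GenePlexus implementation. It assumes one simple setup: each gene is represented by its row of the network adjacency matrix, and a small L2-regularized logistic regression is trained on user-supplied positive genes versus a set of negative genes, then used to score every gene in the network. The toy network, the gene names (G1 to G8), and the function names are all hypothetical.

```python
import math

# Hypothetical toy network: two loosely connected gene modules.
EDGES = [
    ("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4"),
    ("G5", "G6"), ("G5", "G7"), ("G6", "G7"), ("G4", "G8"), ("G7", "G8"),
]

def adjacency_features(edges):
    """Return sorted gene names and, per gene, its adjacency-matrix row."""
    genes = sorted({g for edge in edges for g in edge})
    index = {g: i for i, g in enumerate(genes)}
    rows = {g: [0.0] * len(genes) for g in genes}
    for a, b in edges:
        rows[a][index[b]] = 1.0
        rows[b][index[a]] = 1.0
    return genes, rows

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability that a gene with feature vector x is a positive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent for L2-regularized logistic regression."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rank_genes(edges, positives, negatives):
    """Train on labeled genes, then score and rank every gene in the network."""
    genes, rows = adjacency_features(edges)
    X = [rows[g] for g in positives + negatives]
    y = [1.0] * len(positives) + [0.0] * len(negatives)
    w, b = train_logistic(X, y)
    scores = {g: predict(w, b, rows[g]) for g in genes}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Label two genes from one module as positives and two from the other as negatives.
ranking = rank_genes(EDGES, positives=["G1", "G2"], negatives=["G5", "G6"])
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy setting, unlabeled genes that share neighbors with the positive genes (such as G3) receive higher scores than genes in the other module (such as G7), which is the sense in which the network lets a model generalize from a small gene set to genome-wide predictions.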

IC Name
NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
  • Activity
    R35
  • Administering IC
    GM
  • Application Type
    3
  • Direct Cost Amount
    150000
  • Indirect Cost Amount
    84750
  • Total Cost
    234750
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    859
  • Ed Inst. Type
    SCHOOLS OF ARTS AND SCIENCES
  • Funding ICs
    NIGMS:234750
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    ZGM1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    MICHIGAN STATE UNIVERSITY
  • Organization Department
    BIOSTATISTICS & OTHER MATH SCI
  • Organization DUNS
    193247145
  • Organization City
    EAST LANSING
  • Organization State
    MI
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    488242600
  • Organization District
    UNITED STATES