Functional characterization of genetic and post-transcriptional variation using machine learning methods

Information

  • Research Project
  • ApplicationId
    9730612
  • Core Project Number
    R21LM012772
  • Full Project Number
    5R21LM012772-02
  • Serial Number
    012772
  • FOA Number
    PA-16-161
  • Sub Project Id
  • Project Start Date
    7/1/2018
  • Project End Date
    6/30/2020
  • Program Officer Name
    YE, JANE
  • Budget Start Date
    7/1/2019
  • Budget End Date
    6/30/2020
  • Fiscal Year
    2019
  • Support Year
    02
  • Suffix
  • Award Notice Date
    6/28/2019

The goal of this research proposal is to develop new in-silico approaches for accurate functional annotation of genetic and post-transcriptional variants. The rapid growth of Next-Generation Sequencing (NGS) and high-throughput -omics data has brought us one step closer to a mechanistic, molecular-level understanding of complex genetic diseases such as cancer, neurological disorders, and diabetes. In particular, these data have revealed that complex diseases commonly manifest changes at the genetic and post-transcriptional levels. Both types of changes often affect the structure and function of the corresponding genes and their products. Understanding the functional implications of genetic and post-transcriptional variation is an important task because it can provide critical insights into the molecular mechanisms underlying disease. Here, we propose to leverage novel machine learning paradigms to design computational methods for predicting the effect of genetic and alternative splicing variants on macromolecular interactions. Macromolecular interactions underlie many cellular functions in a healthy organism. Disease-induced changes in genes, such as single nucleotide variations (SNVs) and alternative splicing variations (ASVs), have recently been reported to rewire the protein-protein interaction (PPI) network. Unfortunately, the experimental high-throughput techniques that characterize the large-scale effects of SNVs or ASVs on PPIs are expensive, time-consuming, and far from comprehensive. Current in-silico methods either suffer from limited applicability or are less accurate than experimental methods. To overcome these challenges, we will use two recent machine learning paradigms: learning under privileged information (LUPI) and semi-supervised learning.
If successful, we expect the proposed methods to deliver a critical advance on the two main challenges facing current computational approaches: limited coverage and accuracy lower than that of experimental methods. The methods will be freely available to the community as stand-alone tools as well as web servers.

IC Name
NATIONAL LIBRARY OF MEDICINE
  • Activity
    R21
  • Administering IC
    LM
  • Application Type
    5
  • Direct Cost Amount
    135000
  • Indirect Cost Amount
    58732
  • Total Cost
    193732
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    879
  • Ed Inst. Type
    SCHOOLS OF ARTS AND SCIENCES
  • Funding ICs
    NLM:193732
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    BLR
  • Study Section Name
    Biomedical Library and Informatics Review Committee
  • Organization Name
    WORCESTER POLYTECHNIC INSTITUTE
  • Organization Department
    BIOSTATISTICS & OTHER MATH SCI
  • Organization DUNS
    041508581
  • Organization City
    WORCESTER
  • Organization State
    MA
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    016092247
  • Organization District
    UNITED STATES