An interactive tool for in-depth and reproducible analysis of RNA-seq data

Information

  • Research Project
  • 10252004
  • ApplicationId
    10252004
  • Core Project Number
    R01HG010805
  • Full Project Number
    5R01HG010805-02
  • Serial Number
    010805
  • FOA Number
    PAR-18-844
  • Sub Project Id
  • Project Start Date
    9/2/2020
  • Project End Date
    6/30/2024
  • Program Officer Name
    GILCHRIST, DANIEL A
  • Budget Start Date
    7/1/2021
  • Budget End Date
    6/30/2022
  • Fiscal Year
    2021
  • Support Year
    02
  • Suffix
  • Award Notice Date
    6/16/2021

PROJECT SUMMARY Bioinformatic analysis of large genomic datasets is a critical barrier for many biologists, especially those at smaller research institutions. Leveraging our team's bioinformatics experience, our goal is to develop an interactive web application that can be used to easily translate RNA sequencing data into biological insights. We hypothesize that an integrated tool for reproducible, in-depth analysis of expression data will democratize access to high-throughput technologies and help biologists pinpoint molecular pathways from large data. Our goal is to develop a carefully designed, user-friendly pipeline with rich data visualization capacity. As a proof of concept, the team developed a prototype called iDEP (integrated Differential Expression and Pathway analysis) for the analysis of summarized expression matrices. Its unique features include (1) comprehensive analytic functionality based on 63 R and Bioconductor packages, covering exploratory data analysis, clustering, differential gene expression, and pathway analysis; (2) a massive knowledgebase for automatic gene ID conversion, annotation, and pathway analysis for over 2000 archaeal, bacterial, and eukaryotic species; (3) reproducibility of some core steps by generating R and R Markdown notebooks; (4) application programming interfaces (APIs) for retrieval of protein-protein interaction networks and KEGG pathway diagrams; and (5) easy access to about 13000 processed public RNA-seq datasets in 9 species. Compared with existing tools, the key innovation is the emphasis on deep integration (tools, annotation, pathways, and public datasets), user-friendliness, and reproducibility. Even with limited features, iDEP is beginning to be adopted by researchers from diverse fields. In this proposal, the team plans to complete the development of iDEP.
The goal of Specific Aim 1 is to (a) rewrite iDEP in a modular, object-oriented fashion, (b) make an R package for generating fully reproducible R Markdown notebooks, and (c) add essential functionalities such as bias correction (batch effect, GC content, gene length, expression level), time-course analysis, supervised classification, and additional methods for existing functional modules. We will also enable gene ontology enrichment analysis for unannotated species using Blast2GO. Specific Aim 2 focuses on (a) substantially expanding the pathway database for frequently studied species and (b) collecting more uniformly processed RNA-seq and DNA microarray datasets to facilitate the re-analysis and meta-analysis of public expression data. In Specific Aim 3, the team will carry out hardware upgrades, rigorous testing, code review, documentation, and community integration. The development of iDEP can help make standard RNA-seq analysis accessible to a very broad community of researchers.

IC Name
NATIONAL HUMAN GENOME RESEARCH INSTITUTE
  • Activity
    R01
  • Administering IC
    HG
  • Application Type
    5
  • Direct Cost Amount
    225000
  • Indirect Cost Amount
    105611
  • Total Cost
    330611
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    172
  • Ed Inst. Type
    BIOMED ENGR/COL ENGR/ENGR STA
  • Funding ICs
    NHGRI:10611\NIGMS:320000\
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    BDMA
  • Study Section Name
    Biodata Management and Analysis Study Section
  • Organization Name
    SOUTH DAKOTA STATE UNIVERSITY
  • Organization Department
    BIOSTATISTICS & OTHER MATH SCI
  • Organization DUNS
    929929743
  • Organization City
    BROOKINGS
  • Organization State
    SD
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    570070001
  • Organization District
    UNITED STATES