A Data Coordinating Center for modENCODE

Information

  • Research Project
  • ApplicationId
    7921190
  • Core Project Number
    U41HG004269
  • Full Project Number
    3U41HG004269-04S1
  • Serial Number
    4269
  • FOA Number
    RFA-HG-06-007
  • Sub Project Id
  • Project Start Date
    5/4/2007
  • Project End Date
    3/31/2011
  • Program Officer Name
    GOOD, PETER J.
  • Budget Start Date
    4/1/2009
  • Budget End Date
    3/31/2010
  • Fiscal Year
    2009
  • Support Year
    4
  • Suffix
    S1
  • Award Notice Date
    9/29/2009

DESCRIPTION: The modENCODE project is a key sequel to the sequencing of the fly and worm genomes, and will have an enormous impact on our understanding of biological processes in all higher eukaryotes, including human. In order to manage the diverse, large-scale datasets that will be produced by modENCODE, we propose to create a data coordinating center (DCC) to track the data, integrate it with other information sources, and make it available to the research community in a timely and open fashion. This proposal brings together four groups with highly relevant backgrounds: The Micklem group, through its work on the InterMine system and FlyMine database, has extensive experience in integrating diverse types of data into high-performance data mining systems. The Stein and Lewis groups bring to the project an intimate familiarity with the C. elegans and D. melanogaster genomes, their reagents and research communities, and are well-positioned by their work with the WormBase and FlyBase databases to liaise with those MODs. The Kent group is responsible for the DCC for the Human ENCODE pilot project, and has extensive practical knowledge of developing and managing projects of this sort. We will assemble a team of three data managers stationed at CSHL and at Berkeley, who have a background in the bioinformatics of C. elegans and/or D. melanogaster. The managers will liaise with their contacts at the data provider sites to determine data file formats, milestones and quality control procedures for their datasets. They will also liaise with representatives from NCBI to coordinate modENCODE activities with the primary data repositories at GenBank and GEO. Data providers will upload their data sets to a staging server where they will be able to preview their data on an instance of the GBrowse genome browser. The data managers will QC the data before approving its transfer to the production database. 
Data will be integrated in the production database using InterMine, and from there released to the public on a monthly schedule. Researchers will be able to access the data via the GBrowse genome browser, bulk downloads, and complex queries and reports mediated by InterMine and the BioMart data warehousing system. All major software systems used by the proposed DCC will be based on open source tools from the Generic Model Organism Database (GMOD), human ENCODE, and other sources. Throughout the project, Lewis and Stein will work closely with FlyBase and/or WormBase to ensure that data collected by modENCODE becomes an integral part of the relevant model organism database. In addition, we will dedicate a significant part of a data manager's effort to transferring data from modENCODE into the MODs during the last year of the project.

  • IC Name
    NATIONAL HUMAN GENOME RESEARCH INSTITUTE
  • Activity
    U41
  • Administering IC
    HG
  • Application Type
    3
  • Direct Cost Amount
  • Indirect Cost Amount
  • Total Cost
    196500
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    172
  • Ed Inst. Type
  • Funding ICs
    NHGRI:196500
  • Funding Mechanism
    Research Centers
  • Study Section
    ZHG1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    ONTARIO INSTITUTE FOR CANCER RESEARCH
  • Organization Department
  • Organization DUNS
    205540219
  • Organization City
    TORONTO
  • Organization State
    ON
  • Organization Country
    CANADA
  • Organization Zip Code
    M5G 0A3
  • Organization District
    CANADA