Collaborative Research: ABI Development: A New Platform for highly-optimized, low-latency pipelines for genomic data analysis

Information

  • NSF Award
  • Award Id
    1356469
  • Award Effective Date
    9/15/2014
  • Award Expiration Date
    8/31/2017
  • Award Amount
    $ 345,581.00
  • Award Instrument
    Standard Grant


Next-generation sequencing has transformed genomics into a new paradigm of data-intensive computing, raising several salient challenges. First, the deluge of genomic data needs to undergo deep analysis to mine biological information, which requires a full pipeline that integrates many data processing and analysis tools. Second, deep analysis pipelines often take a long time to run, which entails a long cycle for algorithm and method development. This project aims to bring the latest big data technology and database technology to the genomics domain to revolutionize its data crunching power. This project is anticipated to produce significant scientific and educational benefits. By providing a highly-optimized parallel processing platform for genomic data analysis and making it accessible in private and public clouds, it will enable many new models and algorithms to be developed for genomics and help advance this field at unprecedented speed, as big data technology did for Internet companies. This project also integrates research and education through curriculum development, tutorials for K-12 teachers and community college faculty, and the engagement of women in research through college outreach and NSF-funded outreach programs.

The proposed research includes the development of (1) a deep pipeline for genomic data analysis by assembling state-of-the-art methods, (2) automatic parallelization of the workflow using big data technology, (3) a principled approach to optimizing the genomic pipeline, and (4) integration of streaming technology to reduce the latency of important results. The prototype system will be deployed in both private and public cloud environments, and fully evaluated using existing long-running pipelines at the New York Genome Center and in a variety of real use cases. In doing so, this project will provide new knowledge regarding how to adapt and advance big data technology, including new optimization, partitioning, and scheduling techniques, for the genomics domain. The results of the project are disseminated on the project website: http://gesall.cs.umass.edu.

  • Program Officer
    Jennifer Weller
  • Min Amd Letter Date
    9/15/2014
  • Max Amd Letter Date
    9/15/2014
  • ARRA Amount

Institutions

  • Name
    New York Genome Center
  • City
    New York
  • State
    NY
  • Country
    United States
  • Address
    101 Avenue of the Americas
  • Postal Code
    10013-1991
  • Phone Number
    (646) 977-7000

Investigators

  • First Name
    Toby
  • Last Name
    Bloom
  • Email Address
    tbloom@nygenome.org
  • Start Date
    9/15/2014

Program Element

  • Text
    ADVANCES IN BIO INFORMATICS
  • Code
    1165