ATD: Collaborative Research: Statistical Ensembles for the Identification of Bacterial Genomes

Information

  • NSF Award
    1120404
Owner
  • Award Id
    1120404
  • Award Effective Date
    8/15/2011
  • Award Expiration Date
    4/30/2014
  • Award Amount
    $ 708,360.00
  • Award Instrument
    Continuing grant

This research focuses on reducing the threat of bioterrorism by integrating tools from genomics and statistics in ways that have not previously been examined. The investigators develop novel statistical theory and computational tools for accurate pathogen detection based on next-generation sequencing data. Key research directions involve (i) classification by sequence enrichment; (ii) comparison of empirical clusterings and reference genomes; and (iii) shrinkage estimation and model selection in hierarchical log-linear models. In addition to an in-depth characterization of the theoretical properties of these new statistical inference techniques, the investigators perform a thorough assessment of their practical importance for the detection and identification of bacterial genomes. This assessment uses publicly available data from sources such as the Human Microbiome Project, the NCBI Short Read Archive, the European Bioinformatics Institute, and the Broad Institute. The new methodology applies broadly to high-dimensional settings in which choosing an appropriate class of candidate statistical models is difficult. The investigators study statistical ensembles, that is, combinations of techniques that have been shown to provide more reliable inferences than any single statistical approach. Unlike existing work, which combines models from a single class, this new framework concerns ensembles that cross class boundaries and optimally combine inferences from multiple models drawn from several model classes. These ensembles are expected to have distinct advantages over existing approaches, such as robustness to model misspecification and improved predictive performance.

The new statistical methodology developed in this proposal has the potential to substantially improve the response of federal and international agencies to a bioterrorism attack by rapidly identifying differences among microbial genomes and accurately classifying them as harmless or potentially pathogenic. The impact of these pathogen-detection algorithms on both information technology and civil infrastructure is maximized by implementing them in user-friendly, open-source computational tools and software that will be freely available to the public. The project also has a significant educational and mentorship component for students and postdoctoral fellows who are interested in enhancing our ability to respond rapidly and appropriately to (i) incidents of bioterrorism and (ii) microbial threats to public health.

  • Program Officer
    Leland M. Jameson
  • Min Amd Letter Date
    8/9/2011
  • Max Amd Letter Date
    7/5/2013
  • ARRA Amount

Institutions

  • Name
    University of Miami School of Medicine
  • City
    Coral Gables
  • State
    FL
  • Country
    United States
  • Address
    1320 S. Dixie Highway Suite 650
  • Postal Code
    33146-2926
  • Phone Number
    305-284-3924

Investigators

  • First Name
    Jennifer
  • Last Name
    Clarke
  • Email Address
    jclarke3@unl.edu
  • Start Date
    8/9/2011
  • First Name
    Bertrand
  • Last Name
    Clarke
  • Email Address
    bclarke3@unl.edu
  • Start Date
    8/9/2011

Program Element

  • Text
    COFFES
  • Code
    7552

Program Reference

  • Text
    ALGORITHMS IN THREAT DETECTION
  • Code
    6877