III: AF: Medium: Collaborative Research: Enabling Phylogenetic Inference for Modern Data Sets

Information

  • Award Id
    1564137
  • Award Effective Date
    7/1/2016
  • Award Expiration Date
    6/30/2020
  • Award Amount
    $ 412,627.00
  • Award Instrument
    Continuing grant

Many important subjects in biological and biomedical research require a robust means of phylogenetic tree inference: for models of viral transmission, for gene function inference, and for assessment of genetic diversity in the human microbiome, to name a few. These applications also depend on a rigorous means of assessing tree inference uncertainty; the Bayesian framework provides a principled means of assessing and integrating out this uncertainty. The currently available Bayesian algorithmic tools are not capable of performing inference on large modern data sets, which may also be continually changing as new sequencing results become available. In particular, state-of-the-art methods are almost exclusively based on random-walk Markov chain Monte Carlo (MCMC) using uniformly selected local moves, even though most of these local moves will substantially worsen even a mediocre tree. Convergence problems with this approach are well documented, and thus current methods are limited to around 1000 sequences, a number much smaller than the size of microbial and immune data sets relevant to modern biomedicine. In addition, all current methods require inference to be started from scratch each time the sequence data changes. The broader impacts of this work will extend in three directions: enabling novel applications of Bayesian phylogenetics, stimulating new areas of computer science research, and attracting new talent to the field.

Applications of phylogenetics, in particular Bayesian phylogenetics, are being significantly held back by computational limitations. High-throughput sequencing technologies can return millions of sequences for studies of the human microbiome, viruses, oceanic microbes, and antibody-making B cells, but these cannot be handled with current methods. The models also need to be more realistic, without assumptions of independent interactions. Understanding the shape of multidimensional phylogenetic likelihood surfaces in detail might help to improve the topology. The teams will also investigate when an optimal tree on a taxon set contains the optimal tree on a taxon subset. These investigations will help to expand the approach to phylogenetic inference. The resulting algorithmic insights will be incorporated into publicly available inference packages, with the goal of providing inference on an order of magnitude more taxa than is currently possible.

  • Program Officer
    Sylvia J. Spengler
  • Min Amd Letter Date
    4/8/2016
  • Max Amd Letter Date
    4/8/2016
  • ARRA Amount

Institutions

  • Name
    Fred Hutchinson Cancer Research Center
  • City
    Seattle
  • State
    WA
  • Country
    United States
  • Address
    1100 FAIRVIEW AVE N J6-300
  • Postal Code
    981094433
  • Phone Number
    2066674868

Investigators

  • First Name
    Frederick
  • Last Name
    Matsen
  • Email Address
    matsen@fhcrc.org
  • Start Date
    4/8/2016 12:00:00 AM

Program Element

  • Text
    INFO INTEGRATION & INFORMATICS
  • Code
    7364

Program Reference

  • Text
    INFO INTEGRATION & INFORMATICS
  • Code
    7364
  • Text
    MEDIUM PROJECT
  • Code
    7924