BioCmp: Reconstructing Metabolic and Transcriptional Networks using Bayesian State Space Models

Information

  • NSF Award
  • 0524331
Owner
  • Award Id
    0524331
  • Award Effective Date
    7/15/2005
  • Award Expiration Date
    6/30/2009
  • Award Amount
    $400,000.00
  • Award Instrument
    Continuing grant

Better understanding of the processes involved in bacterial physiology can potentially have a tremendous impact both on therapeutic approaches to infectious diseases and on metabolic engineering applications in biotechnology.

In this project, Drs. David L. Wild and Matthew J. Beal, of the Keck Graduate Institute in Claremont, California, and the State University of New York in Buffalo, New York, respectively, propose to build statistical models of time series data, with a view to leveraging sophisticated Bayesian methods to "reverse-engineer" an organism's complex genetic regulatory networks from raw measurements of gene expression and metabolite concentration.

Drs. Wild and Beal will apply their techniques to an ideal experimental system: the response of the bacterium E. coli to acid stress, which enables pathogenic E. coli to survive passage through the acidic environment of the stomach and gastrointestinal tract. They will collaborate with experimentalists Drs. Francesco Falciani and Mark Viant at the University of Birmingham, UK, who will provide data from both pathogenic and non-pathogenic strains of this bacterium. Predictions made by Wild and Beal's models can then be tested and explored back in the laboratory.

Recent advances in functional genomics technologies have given biologists unprecedented access to measurements of the inner workings of complex biological organisms. Using microarray expression profiling, it is now possible to measure the expression levels of tens of thousands of genes in a single biological experiment, conducted over several days as a time series. Contrast this with the situation only ten years ago, when it was unusual for a biologist to measure the expression of more than one or two carefully chosen genes. In addition to high-throughput gene expression methods, the new technology of "metabolomics" has made it possible to measure even more information: the concentrations of hundreds of metabolites, which are also crucial players in the cellular processes under study.

This overwhelming amount of data challenges traditional methods of analysis, especially once the element of time is considered, because one must then model how certain genes regulate the expression of other genes from one time point in the experiment to the next.

A key ingredient in Drs. Wild and Beal's models is the inclusion of "hidden factors" that help to explain the correlation structure of the observed measurements. These factors may correspond to quantities that were not measured during the experiment, and they often reduce the number of direct gene-to-gene dependencies, making the resulting networks much more interpretable for the biologist. A natural question arises: how many hidden factors should be used to account for the dependencies in the observed data? This is answered by Bayesian model selection, a well-founded principle used in machine learning and statistics to choose between models of differing complexity. The models also use a technique called Automatic Relevance Determination to simplify them further, so that only those genes and metabolites that actively participate in the process are retained in the final model.

Another advantage of the Bayesian framework is that existing information about known network connections and interactions, derived from the literature or commercial databases, can be included in the model. The output of the modeling procedure is a probabilistic assessment of which genetic regulatory networks are plausible. These probabilities can be used to design future biological experiments targeted at specific genes, either to corroborate the model's in silico predictions or simply to probe a relatively uncharted network.
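
The hidden-factor idea in the abstract can be illustrated with a generic linear-Gaussian state-space model, in which K hidden factors evolve as x_t = A x_{t-1} + w_t and G observed measurements (for example, gene expression levels) are generated as y_t = C x_t + v_t, with Gaussian noise covariances Q and R. The following is a minimal Python/NumPy sketch under those assumptions, not the project's actual model: the matrices, dimensions and the functions `simulate_lgssm` and `kalman_filter` are hypothetical and chosen only for illustration. It simulates toy data and runs a Kalman filter, which infers the hidden factors and returns the log-likelihood of the data under fixed parameters.

```python
import numpy as np


def simulate_lgssm(A, C, Q, R, x0, T, rng):
    """Simulate x_t = A x_{t-1} + w_t and y_t = C x_t + v_t (hypothetical toy model).

    x_t holds K hidden factors; y_t holds G observed measurements.
    """
    K, G = A.shape[0], C.shape[0]
    xs, ys = np.zeros((T, K)), np.zeros((T, G))
    x = x0
    for t in range(T):
        x = A @ x + rng.multivariate_normal(np.zeros(K), Q)
        xs[t] = x
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(G), R)
    return xs, ys


def kalman_filter(ys, A, C, Q, R, m0, P0):
    """Return filtered means, covariances and log p(y_1, ..., y_T) under the given parameters."""
    T, G = ys.shape
    K = A.shape[0]
    means, covs = np.zeros((T, K)), np.zeros((T, K, K))
    m, P = m0, P0
    loglik = 0.0
    for t in range(T):
        # Predict: propagate the belief about the hidden factors one time step.
        m_pred = A @ m
        P_pred = A @ P @ A.T + Q
        # Innovation: predicted covariance of y_t and the prediction error.
        S = C @ P_pred @ C.T + R
        resid = ys[t] - C @ m_pred
        # Add log N(y_t; C m_pred, S) to the running log-likelihood.
        _, logdet = np.linalg.slogdet(S)
        loglik -= 0.5 * (G * np.log(2 * np.pi) + logdet + resid @ np.linalg.solve(S, resid))
        # Update: the Kalman gain P_pred C^T S^{-1} blends prediction and measurement.
        gain = np.linalg.solve(S, C @ P_pred).T
        m = m_pred + gain @ resid
        P = P_pred - gain @ S @ gain.T
        P = 0.5 * (P + P.T)  # keep the covariance numerically symmetric
        means[t], covs[t] = m, P
    return means, covs, loglik


# Toy example: 2 hidden factors driving 6 observed "genes" over 50 time points.
rng = np.random.default_rng(0)
K, G, T = 2, 6, 50
A = np.array([[0.9, 0.1], [-0.1, 0.9]])
C = rng.normal(size=(G, K))
Q, R = 0.1 * np.eye(K), 0.05 * np.eye(G)
xs, ys = simulate_lgssm(A, C, Q, R, np.zeros(K), T, rng)

means, covs, ll = kalman_filter(ys, A, C, Q, R, np.zeros(K), np.eye(K))
# The same data scored under a mis-specified (much noisier) observation model.
_, _, ll_wrong = kalman_filter(ys, A, C, Q, 5.0 * np.eye(G), np.zeros(K), np.eye(K))
print(f"log-likelihood: true R = {ll:.1f}, inflated R = {ll_wrong:.1f}")
```

Because every observed gene loads on the same few factors through C, the factors alone induce correlations among the measurements. Comparing log-likelihoods at fixed parameters, as in the last lines, is only a stand-in: Bayesian model selection over the number of factors instead integrates over the unknown parameters (for example with variational approximations), which this sketch does not attempt.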
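
Automatic Relevance Determination can be sketched in a deliberately simplified setting: a Bayesian linear regression that predicts one gene's expression at time t from the levels of several candidate regulators at time t-1, with a separate Gaussian prior precision alpha_i on each regression weight. The sketch below uses MacKay-style evidence-maximization (fixed-point) updates for the alpha_i and the noise precision beta. It is a swapped-in illustration, not the model used in the project; the function name `ard_regression`, the synthetic data and the cap `alpha_max` are assumptions. Inputs that do not help explain the data are driven to very large alpha_i, which shrinks their weights towards zero and effectively prunes them.

```python
import numpy as np


def ard_regression(X, y, n_iter=200, alpha_max=1e6):
    """Bayesian linear regression with one ARD prior precision per input column.

    Uses MacKay-style fixed-point updates that maximize the marginal likelihood
    (evidence) over the prior precisions alpha and the noise precision beta.
    Returns the posterior mean weights and the learned alpha values.
    """
    N, D = X.shape
    alpha = np.ones(D)  # prior precision of each weight
    beta = 1.0  # noise precision
    for _ in range(n_iter):
        # Gaussian posterior over the weights given the current hyperparameters.
        Sigma = np.linalg.inv(beta * X.T @ X + np.diag(alpha))
        m = beta * Sigma @ X.T @ y
        # gamma_i in [0, 1]: how well-determined weight i is by the data.
        gamma = 1.0 - alpha * np.diag(Sigma)
        # Irrelevant inputs get ever larger precisions; cap them for stability.
        alpha = np.minimum(gamma / (m ** 2 + 1e-12), alpha_max)
        beta = (N - gamma.sum()) / np.sum((y - X @ m) ** 2)
    return m, alpha


# Synthetic stand-in data: only the first two of six "regulators" affect the target gene.
rng = np.random.default_rng(1)
N, D = 200, 6
w_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
X = rng.normal(size=(N, D))  # regulator levels at time t-1
y = X @ w_true + 0.1 * rng.normal(size=N)  # target gene level at time t
w_hat, alpha = ard_regression(X, y)
print(np.round(w_hat, 3))
print(alpha)
```

With these settings the weights on the four uninformative inputs should be shrunk close to zero, and their alpha values should end up orders of magnitude larger than those of the two informative inputs. With limited data, ARD can leave some spurious inputs with large but finite alpha, so in practice a threshold on alpha is used to decide what to prune.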

  • Program Officer
    Mitra Basu
  • Min Amd Letter Date
    7/7/2005
  • Max Amd Letter Date
    6/12/2008
  • ARRA Amount

Institutions

  • Name
    Keck Graduate Institute
  • City
    Claremont
  • State
    CA
  • Country
    United States
  • Address
    535 Watson Drive
  • Postal Code
    91711-4817
  • Phone Number
    909-607-9313

Investigators

  • First Name
    David
  • Last Name
    Wild
  • Email Address
    David_Wild@kgi.edu
  • Start Date
    7/7/2005
  • First Name
    Bharat
  • Last Name
    Jayaraman
  • Email Address
    bharat@cse.buffalo.edu
  • Start Date
    5/9/2007
  • First Name
    Matthew
  • Last Name
    Beal
  • Email Address
    mbeal@cse.buffalo.edu
  • Start Date
    7/7/2005
  • End Date
    5/9/2007

FOA Information

  • Name
    Computer Science
  • Code
    912