BioCmp: Reconstructing Metabolic and Transcriptional Networks using Bayesian State Space Models

Information

NSF Award
0524331

Owner

Keck Graduate Institute

Award Id
0524331
Award Effective Date
7/15/2005
Award Expiration Date
6/30/2009
Award Amount
$400,000.00
Award Instrument
Continuing grant

A better understanding of the processes involved in the physiology of bacteria could have a tremendous impact on both therapeutic approaches to infectious diseases and metabolic engineering applications in biotechnology. In this project, Drs. David L. Wild and Matthew J. Beal, of the Keck Graduate Institute in Claremont, California and the State University of New York in Buffalo, New York, respectively, propose to build statistical models of time series data, using sophisticated Bayesian methods to "reverse-engineer" an organism's complex genetic regulatory networks from raw measurements of gene expression and metabolite concentration. Drs. Wild and Beal will apply their techniques to an ideal experimental system: the response of the bacterium E. coli to acid stress, which enables pathogenic E. coli to survive passage through the acidic environment of the stomach and gastro-intestinal tract. They will collaborate with experimentalists Drs. Francesco Falciani and Mark Viant at the University of Birmingham, UK, who will provide data from both pathogenic and non-pathogenic strains of this bacterium. Predictions made by Wild and Beal's models can then be tested and explored back in the laboratory.
Recent advances in functional genomics technologies have given biologists unprecedented access to measurements of the inner workings of complex biological organisms. Using microarray expression profiling, it is now possible to measure the expression levels of tens of thousands of genes in a single biological experiment, conducted over several days in the form of a time series. Contrast this with the situation only ten years ago, when it was unusual for a biologist to measure the expression of more than one or two carefully chosen genes. Alongside high-throughput gene expression methods, the new technology of "metabolomics" has opened the door to measuring still more information, in the form of the concentrations of hundreds of metabolites that are also crucial players in the complex cellular processes under study.
This overwhelming amount of data challenges traditional methods of analysis, especially when the element of time is taken into account, because one must then consider how certain genes regulate the expression of other genes from one time point in the experiment to the next. A key ingredient in Drs. Wild and Beal's models is the inclusion of "hidden factors" that help to explain the correlation structure of the observed measurements. These factors may correspond to quantities that were not measured during the experiment, and they often reduce the number of direct gene-to-gene dependencies, leaving the resulting networks much more interpretable for the biologist. A natural question arises: how many hidden factors should be used to account for the dependencies in the observed data? This is answered by employing Bayesian model selection, a well-founded principle used in machine learning and statistics to choose between models of differing complexity. Their models also use a technique called Automatic Relevance Determination to simplify the models further, so that only those genes and metabolites that actually participate in the process are retained in the final model.
Another advantage of the Bayesian framework is that existing information about known network connections and interactions, derived from the literature or commercial databases, can be included in the model. The output of the modeling procedure is a probabilistic assessment of which genetic regulatory networks are plausible. These probabilities can be used to design future biological experiments targeted at specific genes, either to corroborate the model's in silico predictions or simply to probe a relatively uncharted network.
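The abstract does not give the model equations, so the following is only a minimal sketch of how "hidden factors" can sit alongside direct gene-to-gene dependencies in a state space model. It assumes a generic linear-Gaussian form, x_t = A x_{t-1} + w_t for the hidden factors and y_t = C x_t + D y_{t-1} + v_t for the observed expression levels. The matrices `A`, `C`, `D`, the noise level, and the function names are all hypothetical choices for illustration and are not taken from the award record.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def simulate(A, C, D, n_steps, noise_sd=0.1, seed=0):
    """Simulate x_t = A x_{t-1} + w_t and y_t = C x_t + D y_{t-1} + v_t.

    x_t holds the hidden (unmeasured) factors and y_t the observed
    expression levels. The model form and all parameters are
    illustrative assumptions, not the investigators' actual model.
    """
    rng = random.Random(seed)
    k, p = len(A), len(C)
    x = [rng.gauss(0.0, 1.0) for _ in range(k)]  # random initial hidden state
    y = [0.0] * p                                # observations start at zero
    series = []
    for _ in range(n_steps):
        # Hidden factors evolve on their own, with Gaussian process noise.
        x = [a + rng.gauss(0.0, noise_sd) for a in matvec(A, x)]
        # Observations combine hidden-factor input with direct
        # gene-to-gene influence from the previous time point.
        drive = [c + d for c, d in zip(matvec(C, x), matvec(D, y))]
        y = [m + rng.gauss(0.0, noise_sd) for m in drive]
        series.append(y)
    return series


# Hypothetical toy system: one hidden factor driving genes 0 and 1,
# plus a single direct gene-to-gene edge (gene 0 -> gene 2).
A = [[0.9]]
C = [[1.0], [0.5], [0.0]]
D = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.4, 0.0, 0.0]]

series = simulate(A, C, D, n_steps=20)
```

In this toy setup genes 0 and 1 are correlated only because they share a hidden factor, so `D` needs no edge between them. That is the sense in which hidden factors can reduce the number of direct dependencies. Choosing the number of hidden factors and pruning irrelevant entries, as the abstract describes, would require an inference procedure that this sketch does not attempt.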

Program Officer
Mitra Basu
Min Amd Letter Date
7/7/2005
Max Amd Letter Date
6/12/2008
ARRA Amount

Institutions

Name
Keck Graduate Institute
City
Claremont
State
CA
Country
United States
Address
535 Watson Drive
Postal Code
91711-4817
Phone Number
(909) 607-9313

Investigators

First Name
David
Last Name
Wild
Email Address
David_Wild@kgi.edu
Start Date
7/7/2005

First Name
Bharat
Last Name
Jayaraman
Email Address
bharat@cse.buffalo.edu
Start Date
5/9/2007

First Name
Matthew
Last Name
Beal
Email Address
mbeal@cse.buffalo.edu
Start Date
7/7/2005
End Date
5/9/2007

FOA Information

Name
Computer Science
Code
912
