A better understanding of the processes involved in bacterial physiology could have a tremendous impact on both therapeutic approaches to infectious diseases and metabolic engineering applications in biotechnology.

In this project, Drs. David L. Wild and Matthew J. Beal, of the Keck Graduate Institute in Claremont, California, and the State University of New York in Buffalo, New York, respectively, propose to build statistical models of time series data, using sophisticated Bayesian methods to "reverse-engineer" an organism's complex genetic regulatory networks from raw measurements of gene expression and metabolite concentration.

Drs. Wild and Beal will apply their techniques to an ideal experimental system: the response of the bacterium E. coli to acid stress, which enables pathogenic E. coli to survive passage through the acidic environment of the stomach and gastrointestinal tract. They will collaborate with experimentalists Drs. Francesco Falciani and Mark Viant at the University of Birmingham, UK, who will provide data from both pathogenic and non-pathogenic strains of this bacterium. Predictions made by Wild and Beal's models can then be tested and explored back in the laboratory.

Recent advances in functional genomics technologies have given biologists unprecedented access to measurements of the inner workings of complex biological organisms. Using microarray expression profiling, it is now possible to measure the expression levels of tens of thousands of genes in a single biological experiment, conducted over several days in the form of a time series. Contrast this with the situation only ten years ago, when it was unusual for a biologist to measure the expression of more than one or two carefully chosen genes.
As well as high-throughput gene expression methods, the new technology of "metabolomics" has opened the door to measuring even more information, in the form of the concentrations of hundreds of metabolites that are also crucial players in the complex cellular processes under study.

This overwhelming amount of data challenges traditional methods of analysis, especially when one considers the element of time, because one must then consider how certain genes regulate the expression of other genes from one time point in the experiment to the next.

A key ingredient in Drs. Wild and Beal's models is the inclusion of "hidden factors" that help to explain the correlation structure of the observed measurements. These factors may correspond to quantities that were not measured during the experiment, and they often reduce the number of direct gene-to-gene dependencies, leaving the resulting networks much more interpretable for the biologist. A natural question arises: how many hidden factors should be used to account for the dependencies in the observed data? This is answered by employing Bayesian model selection, a well-founded principle used in machine learning and statistics to choose between models of differing complexities. Their models also use a technique called Automatic Relevance Determination to simplify the models further, so that only those genes and metabolites that participate in the process are retained in the final model.

Another advantage of the Bayesian framework is that existing information about known network connections and interactions, derived from the literature or commercial databases, can be included in the model. The output of the modeling procedure is a probabilistic assessment of which genetic regulatory networks are plausible and which are not.
These probabilities can be used to design future biological experiments targeted at specific genes, with a view to corroborating the model's in silico predictions or simply probing a relatively uncharted network.
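
As a rough illustration of how prior knowledge and data can combine into edge probabilities that guide experiment design, the toy sketch below applies Bayes' rule to individual candidate regulatory links. It is not the investigators' model, which the abstract describes only at a high level. The gene names, prior probabilities, and Bayes factors are all hypothetical placeholders, and the rule of testing the most uncertain link first is one simple assumption among many possible design strategies.

```python
from typing import Dict, List, Tuple

# (regulator, target); every name used below is a hypothetical placeholder.
Edge = Tuple[str, str]


def edge_posterior(prior: float, bayes_factor: float) -> float:
    """Posterior probability that a regulatory link exists.

    prior:        probability of the link before seeing the data, e.g. raised
                  for links already reported in the literature (assumed value).
    bayes_factor: p(data | link) / p(data | no link), assumed to be supplied
                  by some upstream model that is not shown here.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if bayes_factor < 0.0:
        raise ValueError("bayes_factor must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor  # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)


def rank_for_follow_up(posteriors: Dict[Edge, float]) -> List[Edge]:
    """Order links from most to least uncertain (posterior closest to 0.5)."""
    return sorted(posteriors, key=lambda edge: abs(posteriors[edge] - 0.5))


# Hypothetical inputs: (prior probability, Bayes factor) for each candidate link.
candidates: Dict[Edge, Tuple[float, float]] = {
    ("regulator_1", "target_a"): (0.8, 5.0),  # literature-supported, data agree
    ("regulator_1", "target_b"): (0.1, 9.0),  # no prior support, data suggestive
    ("regulator_2", "target_a"): (0.1, 0.5),  # no prior support, data against
}

posteriors = {
    edge: edge_posterior(prior, bf) for edge, (prior, bf) in candidates.items()
}

for regulator, target in rank_for_follow_up(posteriors):
    p = posteriors[(regulator, target)]
    print(f"{regulator} -> {target}: posterior = {p:.3f}")
```

In this sketch, a link with weak prior support but suggestive data ends up near a posterior of 0.5 and is listed first, which mirrors the idea of probing a relatively uncharted part of the network, while links the analysis is already confident about fall to the bottom of the list.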