This project's goal is to develop innovative statistical approaches to multi-study genomic data analysis. Specific targets include generalization of meta-analysis tools used in medicine and the social sciences to the genomics context, metrics for evaluating the reproducibility of expression measurements across platforms in the absence of a gold standard, approaches for deriving and validating common expression scales across platforms, and a novel reformulation of the combination problem based on constructing "coexpression matrices" in which an element represents the coexpression of a subset of genes in a given study. The project includes software implementation, application to a set of representative genomic analyses, and development of a public-domain support website.

Genomics studies measure simultaneously the activity of a large portion of the thousands of genes in a biological system. These studies have given great impetus to the life sciences in the past decade and have changed the way in which biology, medicine, and biotechnology make progress. A large number and variety of genomics studies are accruing. Because of the cost and difficulty of acquiring biological samples, especially in medicine, the majority of genomic investigations are carried out using a limited number of samples and focus on highly specific problems. This scenario poses two important questions for the genomics community. First, given the wide variety of genomic technologies and protocols, there is concern about the reproducibility of genomic findings across technologies and laboratories. How can one systematically use the large body of available genomic information to assess reproducibility?
Second, given the large but fragmented and heterogeneous set of studies that are accruing, there is concern about the ability of the scientific community to efficiently integrate the resulting knowledge. How can one analyze genomics data across studies, across technologies, and across related biological systems? This project's overall goal is to address these two questions by developing data analysis tools for comparison and integration of genomic information across studies, across measurement technologies, and across biological systems. Today, multi-study genomic analyses are rare, despite the wide availability of genomic data in the public domain. The premise underlying this proposal is that this is due in large part to the lack of specific, systematic, and rigorous statistical approaches and associated software tools. This project aims to provide such tools and therefore, if the investigator's premise is correct, will promote a more extensive, more efficient, and more rigorous use of the vast resources made available by the massive investment in genomic studies.