PI: John Quackenbush (Dana-Farber Cancer Institute)<br/><br/>The primary goal of a genome project is the identification and functional characterization of the entire catalog of genes within a particular species. While genome sequencing projects have provided a wealth of data, finding and cataloging genes and their variants remains a significant challenge. For this reason, among others, the sequencing of Expressed Sequence Tags (ESTs) derived from gene transcripts remains an important tool for biological inquiry despite the growing number of species for which genome sequencing projects have been initiated. ESTs provide valuable data for gene identification in sequenced species, provide evidence for non-coding but biologically important genomic features, and, for many species, provide the only information available about gene content. The Gene Index databases (TGI; http://compbio.dfci.harvard.edu/tgi/plant.html) were developed to provide a high-quality, publicly available analysis of EST sequences and currently represent more than 34 different plant species. The TGI databases provide a consistent view of these data across species. This project will more than double the number of plant and plant parasite species represented in the TGI databases. The databases and the associated software tools will facilitate a wide range of plant functional genomics studies, will assist in the identification of genes that can be used, for example, in plant breeding and the study of pathogen resistance, and will contribute to the annotation of plant genomes that will be sequenced in the coming years. All methods developed through this proposal will be instantiated in freely available, open source software tools.
This will allow other researchers to faithfully reproduce the TGI in their home institutions and offer an alternative approach to the development and maintenance of gene indices.<br/><br/>Broader Impact<br/>Over the years, the TGI project has made a number of important contributions to the research community, including the creation of a heavily used and well-cited public collection of databases, widely used software tools for the analysis of EST data, and the training of a number of students and postdocs. Specifically, the databases created through this project will continue to be available without restriction through http://compbio.dfci.harvard.edu/tgi/plant.html. Web services access will allow other plant databases to link more effectively to the resources provided through the TGI. In the coming years, the databases are expected to see far more than the nearly 15 million web hits they received in 2006. This project will continue to support collaboration with a variety of plant genome research groups, to welcome their personnel as visitors in order to link resources across projects more effectively, and to offer workshops on the use of the TGI resources.