Proposal: 0326404<br/>PI: Lynette Hirschman<br/>Title: Evaluation Bioinformatics Technology<br/><br/>Abstract<br/><br/>The focus of this effort is to put in place an international "challenge evaluation" for text data mining, modeled after the successful CASP evaluations (Critical Assessment of Techniques for Protein Structure Prediction). The evaluation will focus initially on two tasks: identification of biologically significant entities, and extraction of relation information, such as interactions, or identification of Gene Ontology terms in the literature. Gold-standard training and test corpora will be derived from expert-annotated data in BIND, SWISS-PROT and other model organism databases. These data sets will be made available to a wide community through a website. Organizations in computational linguistics and information retrieval will also be involved. The overall strategy is to focus on problems of importance to working biologists, such as improved curation tools for biological databases and better access to textual information in both the literature and curated databases. This approach provides infrastructure and support to multiple research groups. It also supports the creation of training and test corpora for many machine learning and statistical pattern recognition experiments.