DESCRIPTION (provided by applicant): The overall goal of this proposal is to develop an information integration architecture and associated tools that support rapid integration of data and knowledge from distributed, heterogeneous data sources. The architecture is intended to play a significant role in extracting coherent knowledge bases for biomedical research and in improving the accuracy, completeness, and quality of the extracted knowledge. To achieve these goals, the proposed scalable architecture includes new generalized integration algorithms and tools for three purposes: generating mediators that capture the functional behavior of data sources, representing data sources semantically to support automated generation of integration agents, and optimizing queries over the integrated data. The architecture is designed to keep pace with evolving Internet-based standards for XML electronic data interchange, semantic web services, and web services discovery, thereby leveraging Internet technologies and standards to provide lasting, state-of-the-art solutions for information integration. In addition, the proposed architecture is inherently scalable in terms of the number of data sources that can be integrated, the number of users of the integrated system, and the range of biomedical problems that can be tackled. During Phase I of the project, prototypes of the proposed integration algorithms and tools will be developed as proofs of concept. These prototypes will form the foundation for evaluating and pilot testing the proposed integration mechanisms, using private and public data sources, in terms of scalability and integration capabilities.