Information Integration of Heterogeneous Data Sources

Information

  • Research Project
  • ApplicationId
    7614360
  • Core Project Number
    R44RR018667
  • Full Project Number
    5R44RR018667-04
  • Serial Number
    18667
  • FOA Number
    PA-06-011
  • Sub Project Id
  • Project Start Date
    7/1/2003
  • Project End Date
    3/31/2011
  • Program Officer Name
    BRAZHNIK, OLGA
  • Budget Start Date
    4/1/2009
  • Budget End Date
    3/31/2010
  • Fiscal Year
    2009
  • Support Year
    4
  • Suffix
  • Award Notice Date
    3/28/2009

DESCRIPTION (provided by applicant): The wealth of biological and biomedical data constantly being generated promises dramatic advancement in the life sciences. To realize this promise, this rapidly expanding pool of information needs to be efficiently integrated, that is, combined in such a way that it can be queried to extract relevant data, which can then be analyzed to answer meaningful research questions. The main objective of this proposal is to develop the GeneTegra System, an information integration solution that provides a common interaction environment to query data and knowledge from multiple sources. Two main obstacles must be overcome to integrate knowledge effectively from different data sources: syntactic heterogeneity, where data sources have different representation and access mechanisms; and semantic variability, where similar lexical terms may refer to multiple concepts or dissimilar terms may refer to the same concept. The GeneTegra System addresses these obstacles through the use of Semantic Web technologies: ontologies constructed using the Web Ontology Language (OWL) as a common data and knowledge representation for data sources of diverse formats, automated mechanisms for the generation and maintenance of these ontology representations, and a robust system architecture based on reusable, service-oriented mediators. The core of the proposed system consists of general algorithms, procedures, and mechanisms developed during Phase I of this project that enable the automatic generation of ontologies, the automated identification of semantic correspondences between ontology models, and the creation and execution of queries over these ontology-modeled, distributed, heterogeneous sources.
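The two obstacles named above can be illustrated with a minimal sketch. This is not the GeneTegra implementation: the source names, field names, sample records, and the hand-written `TERM_TO_CONCEPT` dictionary are all hypothetical, and the dictionary stands in for the OWL ontologies and automatically identified semantic correspondences that the proposal describes. Two small reader functions play the role of mediators that hide each source's format (syntactic heterogeneity), and the term mapping resolves differently named fields to one shared concept (semantic variability).

```python
import csv
import io

# Hypothetical source A: CSV text that calls the gene field "gene_symbol".
SOURCE_A_CSV = "gene_symbol,phenotype\nBRCA1,breast cancer\nCFTR,cystic fibrosis\n"

# Hypothetical source B: in-memory records that call the same field "locus".
SOURCE_B_RECORDS = [
    {"locus": "BRCA1", "disease": "ovarian cancer"},
    {"locus": "HBB", "disease": "sickle cell anemia"},
]

# Hand-written stand-in for semantic correspondences: dissimilar
# source-specific terms map to one shared concept name.
TERM_TO_CONCEPT = {
    "gene_symbol": "Gene",
    "locus": "Gene",
    "phenotype": "Phenotype",
    "disease": "Phenotype",
}


def read_source_a():
    """Mediator for source A: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(SOURCE_A_CSV)))


def read_source_b():
    """Mediator for source B: records are already dicts."""
    return SOURCE_B_RECORDS


def to_concepts(record):
    """Rename source-specific fields to shared concept names."""
    return {TERM_TO_CONCEPT[field]: value for field, value in record.items()}


def query(concept, value):
    """Return records from every source whose `concept` equals `value`."""
    results = []
    for name, reader in (("A", read_source_a), ("B", read_source_b)):
        for record in reader():
            unified = to_concepts(record)
            if unified.get(concept) == value:
                results.append({"source": name, **unified})
    return results


if __name__ == "__main__":
    for row in query("Gene", "BRCA1"):
        print(row)
```

A single call such as `query("Gene", "BRCA1")` returns matching records from both sources, even though neither source uses the field name "Gene". In the system described here that mapping would be derived from ontology models rather than written by hand.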
In Phase II, the GeneTegra System will be developed, implemented, and tested as a human-centered solution building on the core components developed during Phase I. It will incorporate a highly usable interface for query creation and execution; a mechanism for registration, sharing, and reuse of information using Web Services standards; a mechanism for determining quality of data and query reliability; and a security and privacy subsystem that allows the construction of collaborative communities while ensuring that users are properly authenticated and authorized to access information through the system. The GeneTegra System will be designed and evaluated specifically to address the integration of sources relevant to investigations of genotype-phenotype associations and to the identification of genes responsible for human diseases and conditions.

PUBLIC HEALTH RELEVANCE: The GeneTegra System is an information integration solution that provides a common interaction environment to query data and knowledge from multiple heterogeneous sources. It uses ontologies as the base formalism for semantic and syntactic modeling, and contains automated mechanisms for the generation of these ontologies and for the reuse and sharing of integration configurations. It is specifically designed to address the integrated querying of sources relevant to investigations of genotype-phenotype associations and to the identification of genes responsible for human diseases and conditions.

IC Name
NATIONAL CENTER FOR RESEARCH RESOURCES
  • Activity
    R44
  • Administering IC
    RR
  • Application Type
    5
  • Direct Cost Amount
  • Indirect Cost Amount
  • Total Cost
    505764
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    389
  • Ed Inst. Type
  • Funding ICs
    NCRR:505764
  • Funding Mechanism
    SBIR-STTR
  • Study Section
    BCHI
  • Study Section Name
    Biomedical Computing and Health Informatics Study Section
  • Organization Name
    INFOTECH SOFT, INC.
  • Organization Department
  • Organization DUNS
    035354070
  • Organization City
    MIAMI
  • Organization State
    FL
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    33131
  • Organization District
    UNITED STATES