The present invention relates to management and querying of enterprise data.
Information integration applications take data that is stored in two or more data sources and build from them one large database, possibly a virtual database, containing information from all of the sources, so that the data can be queried as a unit. Thus, for example, enterprise accounting data may be stored within a relational database, and enterprise inventory may be stored within XML documents. Information integration enables an enterprise to access its various data sources from within a single data store application.
Information integration is discussed in Chap. 20 of Garcia-Molina, H., Ullman, J. D. and Widom, J., “Database Systems: The Complete Book”, Prentice-Hall, New Jersey, 2002. As pointed out in Sec. 20.1 of this reference, there are three basic modes of information integration: (i) federated databases, (ii) data warehousing, and (iii) mediation.
In a federated database architecture, data sources are independent, but one source can call on others to supply information.
In a data warehousing architecture, data from several sources is extracted and combined into a global schema. The data is then stored at the warehouse, which appears to the user like an ordinary database. Once data is in the warehouse, queries are issued by a user exactly as they would be issued to any database. However, user updates to the warehouse are generally forbidden, sine they are not reflected in the underlying sources, and thus can make the warehouse inconsistent with the sources.
A data warehouse is updated periodically, by reconstructing it from current data in the data sources. Typically, a data warehouse is updated once a night, when the system can be shut down, so that queries are not issued while the warehouse is being constructed. Alternatively, the data warehouse may be incrementally updated based on changes that have been made to the data sources since the last time the warehouse was modified.
Conventional data warehouses are generally expensive and inflexible. In addition, such data warehouses generally do not provide real-time operation.
In a mediation architecture, a software component, referred to as a mediator, supports a virtual database, which a user may query as if it were physically constructed. The mediator stores no data of its own. Rather, it translates a query into one or more queries to its sources, synthesizes the answer to the query from the responses of the sources, and returns an answer to the user. A mediator supports a virtual view, or collection of views, that integrates several sources.
An example of a mediation system is the Enterprise Information Integrator (EII) of IBM Corporation, which generates a virtual warehouse. EII supports integrated querying across multiple data sources, including IBM DB2 relational databases, Microsoft SQL relational databases, and XML document databases.
All three of the approaches to information integration described above use transformers, referred to as wrappers or extractors, to transform data when it is extracted from a data source. Wrappers are used to pass ad-hoc queries to data sources, receive responses from the source, and pass information to an information integrator.
A drawback with conventional information integration is the lack of uniformity in semantics, and the lack of traceability back to individual data sources. Each database accessed by a warehouse generally has its own semantics, including inter alia names for tables and their fields, names for XML complex types and their elements, and data formats. It may happen that the same name is used in different contexts within different databases, or multiple names are used for the same construct, perhaps formatted differently for different names. Further complications that can arise with non-uniform semantics include inter alia, different inter-relationships between data constructs, different business rules relating the same data constructs, redundancies and inconsistencies.
It is thus desirable to be able to introduce a common semantic foundation for all of the data sources accessed within a data warehouse, and to provide a translation layer which enables a user to access data using queries expressed in common and meaningful semantics, and buffers the user from the individual semantics for the individual data sources.
The present invention concerns use of common semantics for querying data within diverse types of data sources. The present invention provides a generic translation layer that buffers a user from the particulars of individual semantics for each data source, and enables the user to express queries in terms of common and meaningful semantics.
There is thus provided in accordance with a preferred embodiment of the present invention a system for managing and querying a plurality of data sources, including an interface to a plurality of inter-related data sources of diverse types, each of the plurality of data sources having a schema that describes its data structure, a schema generator communicating with the interface, for generating a federated schema that describes the structures of the plurality of data sources and their inter-relationships, and for modifying the federated schema over time as the plurality of data sources undergo changes, and a query generator communicating with the schema generator, for generating a query for the federated schema.
There is further provided in accordance with a preferred embodiment of the present invention a method for managing and querying a plurality of data sources, including generating a federated schema that describes the structures of a plurality of inter-related data sources and their inter-relationships, and for modifying the federated schema over time as the plurality of data sources undergo changes, wherein each of the plurality of data sources has a schema that describes its data structure, and generating a query for the federated schema.
There is yet further provided in accordance with a preferred embodiment of the present invention a computer-readable storage medium storing program code for causing a computer to perform the steps of generating a federated schema that describes the structures of a plurality of inter-related data sources and their inter-relationships, and for modifying the federated schema over time as the plurality of data sources undergo changes, wherein each of the plurality of data sources has a schema that describes its data structure, and generating a query for the federated schema.
The present invention will be more fully understood and appreciated from the following detailed description, taken in conjunction with the drawings in which:
Enterprise data is typically distributed over multiple databases, referred to as data assets. Data assets can be of various types, including inter alia relational database tables, XML databases, entity-relationship (ER) databases and Cobol databases. Each data asset stores data according to a specific data structure format, referred to as a schema or as asset metadata. Asset metadata serves as descriptors, explaining how to interpret data stored within the asset. Without knowledge of asset metadata, data stored within an asset is generally unintelligible. Thus, for example, relational database tables store data according to a relational database schema, XML documents store data according to an XML schema, ER databases store data according to an ER logical model, and Cobol databases store data according to a Cobol Copybook. Each such schema is asset metadata, which explains how to interpret data stored within the asset.
Asset metadata is typically comprised of basic data structures, or constructs. For example, relational database schema are comprised of fields and tables, XML schema are comprised of simple types and complex types, ER logical models are comprised of entities and relationships, and Cobol Copybooks are comprised of elementary items and group items.
In a preferred embodiment of the present invention, semantics are provided to enterprise data through (i) a common ontology model, referred to also as an information model; and (ii) mappings of enterprise asset metadata into the ontology model. An ontology model is comprised of classes and properties, and is particularly useful for representing data in a semantically meaningful way. Mappings are associations of constructs of a first asset metadata with constructs of a second asset metadata in a consistent way. Mappings of asset metadata into the ontology model serve as dictionaries through which constructs of the asset metadata can be semantically understood.
Reference is now made to
Ontology model 140 preferably encapsulates substantially all of the constructs from schemas 110, 120 and 130, including inter alia database tables and their fields and their interrelationships through foreign keys, and XML complex types and their elements and the type inter-relationships, as well as business rules that relate table fields to one another and XML elements to one another. Ontology model 140 serves not only to model enterprise data assets, but also provides common semantics for referencing the data assets.
The individual mappings of schemas 110, 120 and 130 into ontology model 140 can be inverted, and used to map ontology model 140 into a single federated schema 150. Federated schema 150 effectively combines the three individual schemas 110, 120 and 130. Moreover, federated schema inherits the common semantics of ontology model 140.
Reference is now made to
It may be appreciated by those skilled in the art that ontology model 140 may be alternatively generated without the use of schema-to-ontology generator 210. Thus, ontology model 140 may alternatively be an industry standard “off-the-shelf” model, or a custom ontology model generated with the use of an ontology modeling tool.
After ontology model 140 is generated, an ontology-to-schema generator 220 creates federated schema 150. Federated schema may be a relational database schema, or an XML schema or another schema. Preferably, ontology-to-schema generator 220 creates relational database tables and fields, or XML complex types and elements, which correspond respectively to the classes of ontology model 140 and their properties.
Reference is now made to
Also shown in
By using federated schema 150 in accordance with the present invention, a generic translation layer can be generated, which buffers the user from the specific semantics of the individual databases 310, 320 and 330, and enables him to express database searches using the semantics of ontology model 140. Specifically, as shown in
Reference is now made to
Shown in
Also shown in
After data schemas 470 have been mapped to ontology model 460 by Unicorn Workbench 450, the ontology model and semantic mappings are stored on a server 490, which provides a web portal into the enterprise data. A user of the portal formulates queries and generates views using the common business language, which are automatically converted to appropriate information integrator 450 names and executed within information integrator 450, and the results are converted back into the common business language—all transparent to the user. Preferably, view definitions are formulated in terms of a standard query language, such as SQL or XQuery.
The user also uses the common business language to run business applications including inter alia visualization and analysis tools. It may thus be appreciated that ontology model 460 and semantic mappings 480 obviate the need for the user to learn the intricacies of the data naming conventions for the individual data sources 420 and the inter-dependencies among their data.
Reference is now made to
As shown in
Shown in
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made to the specific exemplary embodiments without departing from the broader spirit and scope of the invention as set forth in the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
This application is a continuation-in-part of assignee's pending application U.S. Ser. No. 10/637,339, filed on Aug. 8, 2003, entitled “Method and System for Mapping Enterprise Data Assets to a Semantic Information Model,” which is a continuation-in-part of assignee's pending application U.S. Ser. No. 10/340,068, filed on Jan. 9, 2003, entitled “Brokering Semantics between Web Services,” which is a continuation-in-part of assignee's pending application U.S. Ser. No. 10/302,370, filed on Nov. 22, 2002, entitled “Enterprise Information Unification,” which is a continuation-in-part of assignee's pending application U.S. Ser. No. 10/159,516, filed on May 31, 2002, entitled “Data Query and Location through a Central Ontology Model,” which is a continuation-in-part of assignee's pending application U.S. Ser. No. 10/104,785, filed on Mar. 22, 2002, entitled “Run-Time Architecture for Enterprise Integration with Transformation Generation,” which is a continuation-in-part of assignee's pending application U.S. Ser. No. 10/053,045, filed on Jan. 15, 2002, entitled “Method and System for Deriving a Transformation by Referring Schema to an Central Model,” which is a continuation-in-part of assignee's pending application U.S. Ser. No. 09/866,101, filed on May 25, 2001, entitled “Method and System for Collaborative Ontology Modeling.”
Number | Date | Country | |
---|---|---|---|
Parent | 10637339 | Aug 2003 | US |
Child | 11026358 | Dec 2004 | US |
Parent | 10340068 | Jan 2003 | US |
Child | 10637339 | Aug 2003 | US |
Parent | 10302370 | Nov 2002 | US |
Child | 10340068 | Jan 2003 | US |
Parent | 10159516 | May 2002 | US |
Child | 10302370 | Nov 2002 | US |
Parent | 10104785 | Mar 2002 | US |
Child | 10159516 | May 2002 | US |
Parent | 10053045 | Jan 2002 | US |
Child | 10104785 | Mar 2002 | US |
Parent | 09866101 | May 2001 | US |
Child | 10053045 | Jan 2002 | US |