1. Field of the Invention
This invention generally relates to semantic web technology, and more specifically, to methods and systems for efficiently storing semantic web statements in a relational database. Even more specifically, the invention relates to such methods and systems that are particularly well suited for use with the Resource Description Framework (RDF) language.
2. Background Art
RDF is a language used to represent information, particularly meta data, about resources available in the World Wide Web. For example, RDF may be used to represent copyright or licensing information about a document on the Web, or the author and title of a particular Web page. RDF can also be employed for representing data or meta data about items or matters that can be identified on the World Wide Web even though these items cannot be directly retrieved from the Web. Examples of these latter items may include data about a user's Web preferences, and information, such as the price and availability, of items for sale at on-line shopping facilities. Specifications for RDF are established by the World Wide Web Consortium.
RDF uses identifiers, referred to as Uniform Resource Identifiers, or URIs, and is based on a specific terminology. An RDF statement includes a subject, a predicate and an object. The subject identifies the thing, such as a person or Web page, that the statement is about. The predicate identifies the property or characteristic, such as title or owner, of the subject of the RDF statement, and the object identifies a value of that property or characteristic. For example, if the RDF statement is about pet owners, the subject might be “owner,” the predicate could be “name,” and the object could be “Joe.” This format, among other advantages, allows RDF to represent statements as a graph of nodes and arcs. In the graph, the subjects and objects may be represented by, for example, ovals, circles or squares, or some combination thereof, while the predicates of the RDF statements may be represented by arcs or arrows connecting the subject of each statement with the object of the statement.
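By way of illustration only, the subject, predicate and object structure described above, and its reading as a graph of nodes and arcs, might be sketched as follows. The Statement type, the to_graph helper and the example.org URIs are hypothetical names chosen for this sketch and are not part of the RDF specifications.

```python
from typing import Dict, List, NamedTuple, Tuple


class Statement(NamedTuple):
    """One RDF statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# Illustrative statements only; the example.org URIs are assumptions.
statements = [
    Statement("http://example.org/Person001", "http://example.org/name", "Joe"),
    Statement("http://example.org/Person001", "http://example.org/owns",
              "http://example.org/Pet001"),
    Statement("http://example.org/Pet001", "http://example.org/name", "Rex"),
]


def to_graph(stmts: List[Statement]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each subject node to its outgoing arcs as (predicate, object) pairs."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for s in stmts:
        graph.setdefault(s.subject, []).append((s.predicate, s.obj))
    return graph


graph = to_graph(statements)
```

In this sketch each dictionary key plays the role of a subject node, and each (predicate, object) pair plays the role of a labeled arc leading to an object node.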
An important feature of RDF is that it provides a common framework for expressing information. This allows this information to be exchanged among applications without losing any meaning of the information. Because of this common framework, application developers can utilize the availability of common tools and parsers to process RDF information.
RDF data access requests in conventional systems are defined by “Triple Patterns,” which limit the RDF statement(s) they are requesting by constraining any or all of the three parts of an RDF statement: the subject, predicate and object. For example, the triple pattern “(<Person001>, <name>, ?)” requests only RDF statements where the subject is “Person001,” and the predicate is “name” (the “?” for the object is used as a wildcard and means the object can be anything).
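The matching behavior of a triple pattern may be sketched as follows, under the assumption that an unconstrained part (the “?” wildcard) is modeled as None. The TriplePattern and Statement types and the sample data are hypothetical and serve only to illustrate the paragraph above.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TriplePattern(NamedTuple):
    """A triple pattern; None stands for the "?" wildcard (an assumption of this sketch)."""
    subject: Optional[str] = None
    predicate: Optional[str] = None
    obj: Optional[str] = None

    def matches(self, stmt: Statement) -> bool:
        # Each constrained part must equal the corresponding part of the statement.
        return all(want is None or want == have for want, have in zip(self, stmt))


statements = [
    Statement("Person001", "name", "Joe"),
    Statement("Person001", "age", "42"),
    Statement("Person002", "name", "Ann"),
]

# Equivalent to the pattern (<Person001>, <name>, ?) from the text.
pattern = TriplePattern(subject="Person001", predicate="name")
matching = [s for s in statements if pattern.matches(s)]
```

Here only the statement about the name of Person001 satisfies the pattern, while a pattern with no constrained parts would match every statement.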
A number of RDF storage systems are built on top of relational databases. In such systems, RDF statements are stored in relational database tables created specifically to hold RDF. Such systems cannot be used to store RDF in tables other than the ones specifically designed for these systems to store RDF. Additionally, such systems do not optimize storage for commonly occurring RDF structures.
An object of this invention is to efficiently store semantic web statements in a relational database.
Another object of the present invention is to store semantic web statements in relational tables designed specifically for such structures.
A further object of the invention is to extend read access of RDF data to non-RDF enabled systems or system components.
These and other objectives are attained with a method and system for storing semantic web statements in a relational database. The method comprises the steps of providing a repository for said semantic web statements, and providing a relational database including one or more specific tables. Each of these specific tables includes (i) one column holding a Uniform Resource identifier (URI) key, and (ii) one or more additional columns holding components of said semantic web statements. A specific table component registry is established to connect the specific tables to said repository, and this registry includes an entry for each of said specific tables for converting data in said tables to one of said semantic statements.
Preferably, each of the specific tables includes one or more rows, and each row of the specific tables represents a set of semantic web statements. Also, in a preferred embodiment, the semantic web statements include subjects and objects; and for each row of the specific tables, (i) one or more entries in the row combine to make the subject of one of said semantic web statements, and (ii) the remaining entries in the row are, or combine to be, the object of that one of said semantic web statements. Any suitable procedure may be used to access the semantic web statements. For example, access procedures are described in This system not only allows a system to express access control rules but also enforces them by storing access control data in specific tables, for example as described in copending application no. (Attorney Docket POU920050098US1) for “Method and System For Controlling Access To Semantic Web Statements,” the disclosure of which is hereby incorporated herein by reference.
Also disclosed herein is a hybrid system that uses the above described method to more efficiently store RDF in these specific tables where it can, and uses conventional RDF storage tables for statements that have no place in the specific tables. In addition, such a system may extend read access of the RDF data to non-RDF enabled systems or system components.
Further benefits and advantages of the invention will become apparent from a consideration of the following detailed description, given with reference to the accompanying drawings, which specify and show preferred embodiments of the invention.
This conventional RDF repository can be extended to implement the present invention.
Thus, as can be seen, for the present invention, additional tables 32 are added to the relational database 24. Each of these additional tables has at least one “URI key” (a column or set of columns that holds or can be converted to a URI), and any number of additional columns, each of which stores data of a primitive type such as “integer,” “date,” “time,” or “varchar” (text). It may be noted that many relational databases designed using conventional practices, such as databases in third normal form, will meet the requirements of a “Specific Table,” meaning that pre-existing databases and the data they hold can be exposed as RDF using the instant invention.
To connect these “specific tables” 32 to a conventional RDF repository 22, a “specific table component” registry 36 is created with an entry for each “specific table.” Each entry is able to convert the data in its “specific table” to RDF statements and to interact with the RDF repository to make these RDF statements available to data access requests.
Each “specific table component” keeps track of the “URI key” in the table that stores the subject for each row, a mapping of column names to predicate names, and a mapping of column names to RDF datatypes, so that the relational database datatypes can be converted to the correct RDF datatypes. Once a “specific table component” is created with these properties and mappings, it is able to convert data stored in a relational database into RDF statements. It may also be noted that, with the arrangement shown in
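As a minimal sketch of the conversion performed by a “specific table component,” the following example uses an in-memory SQLite table. The class name SpecificTableComponent, the person table, the example.org URIs, the uri_prefix rule for turning a key value into a URI, and the xsd datatype labels are all assumptions made for illustration; the text above does not prescribe any of them.

```python
import sqlite3
from typing import Dict, Iterator, Tuple

Triple = Tuple[str, str, Tuple[str, str]]  # (subject, predicate, (value, RDF datatype))


class SpecificTableComponent:
    """Hypothetical component holding the URI key and the two column mappings."""

    def __init__(self, table: str, uri_key: str, uri_prefix: str,
                 predicates: Dict[str, str], datatypes: Dict[str, str]) -> None:
        self.table = table
        self.uri_key = uri_key
        self.uri_prefix = uri_prefix    # assumed rule: subject URI = prefix + key value
        self.predicates = predicates    # column name -> predicate name
        self.datatypes = datatypes      # column name -> RDF datatype

    def statements(self, conn: sqlite3.Connection) -> Iterator[Triple]:
        columns = list(self.predicates)
        # Table and column names come from trusted configuration, not user input.
        sql = "SELECT {}, {} FROM {}".format(
            self.uri_key, ", ".join(columns), self.table)
        for row in conn.execute(sql):
            subject = self.uri_prefix + str(row[0])
            for column, value in zip(columns, row[1:]):
                if value is not None:  # a NULL cell yields no statement
                    yield (subject, self.predicates[column],
                           (str(value), self.datatypes[column]))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name VARCHAR(64), age INTEGER)")
conn.execute("INSERT INTO person VALUES (1, 'Joe', 42)")

# The registry holds one entry per specific table.
registry = {
    "person": SpecificTableComponent(
        table="person", uri_key="id", uri_prefix="http://example.org/person/",
        predicates={"name": "http://example.org/name", "age": "http://example.org/age"},
        datatypes={"name": "xsd:string", "age": "xsd:integer"},
    )
}
triples = list(registry["person"].statements(conn))
```

In this sketch a single row yields one statement per mapped, non-null column, which is one way of reading a row as a set of statements sharing the same subject.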
Additionally, with this invention, the conventional RDF repository's data access subsystem may be modified such that read requests for RDF statements destined for the “Statement” table 30 are intercepted by “specific table components” 34 registered with it and are redirected to the appropriate specific table where the data is actually stored. “Specific table components” 34 intercept access requests according to the logic flow shown in
More particularly, the routine shown in
If, at step 52, the Triple Match subject is not constrained, the routine proceeds to step 62, where it determines whether the Triple Match predicate is constrained. If not, the routine proceeds to step 56, where the data access request is sent to both the Specific table and the Statement table. If, at step 62, the Triple Match predicate is constrained, the routine moves on to step 64, where it determines whether the Specific table component recognizes the Triple Match predicate. If so, the routine proceeds to step 66, where the data access request is sent to the Specific table only; if not, the routine moves on to step 56, and the data access request is sent to both the Specific table and the Statement table.
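The branch of the routine described in this paragraph, in which the subject is not constrained, may be sketched as follows. The function name, the return labels and the use of None for an unconstrained predicate are assumptions of this sketch; the branch taken when the subject is constrained is not covered here.

```python
from typing import Optional, Set

SPECIFIC_ONLY = "specific table only"               # outcome of step 66
BOTH = "specific table and Statement table"         # outcome of step 56


def route_unconstrained_subject(predicate: Optional[str],
                                recognized: Set[str]) -> str:
    """Decide where to send a request whose subject is not constrained.

    `recognized` is the set of predicates that the specific table
    component maps to columns (a hypothetical representation).
    """
    if predicate is None:            # step 62: predicate not constrained
        return BOTH                  # step 56
    if predicate in recognized:      # step 64: component recognizes the predicate
        return SPECIFIC_ONLY         # step 66
    return BOTH                      # step 56


recognized_predicates = {"name", "age"}
route_any = route_unconstrained_subject(None, recognized_predicates)
route_known = route_unconstrained_subject("name", recognized_predicates)
route_unknown = route_unconstrained_subject("email", recognized_predicates)
```

Under these assumptions, only a request whose predicate is both constrained and recognized by the component is answered from the specific table alone; the other two cases consult both tables.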
As will be readily apparent to those skilled in the art, the present invention can be realized in hardware, software, or a combination of hardware and software. Any kind of computer/server system(s)—or other apparatus adapted for carrying out the methods described herein—is suited. A typical combination of hardware and software could be a general-purpose computer system with a computer program that, when loaded and executed, carries out the respective methods described herein. Alternatively, a specific use computer, containing specialized hardware for carrying out one or more of the functional tasks of the invention, could be utilized.
The present invention can also be embodied in a computer program product, which comprises all the respective features enabling the implementation of the methods described herein, and which—when loaded in a computer system—is able to carry out these methods. Computer program, software program, program, or software, in the present context mean any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: (a) conversion to another language, code or notation; and/or (b) reproduction in a different material form.
While it is apparent that the invention herein disclosed is well calculated to fulfill the objects stated above, it will be appreciated that numerous modifications and embodiments may be devised by those skilled in the art, and it is intended that the appended claims cover all such modifications and embodiments as fall within the true spirit and scope of the present invention.