System and Method for Scalable Distribution of Semantic Web Updates

Abstract
Disclosed are a method and system for scalable distribution of semantic web updates. A first embodiment of the invention leverages publish/subscribe technology to distribute those updates such that clients receive only the information they require. A second embodiment of the invention uses an access control feature to limit the statements clients are allowed to read. Optionally in this second embodiment, the same publish/subscribe messaging infrastructure may be used both to distribute updated semantic web data and also to distribute relevant changes to the access control information. The invention is particularly well suited for use with the Resource Description Framework (RDF) language.
Description

BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 shows a computer system embodying the present invention.



FIG. 2 shows example messages that may be used in the implementation of this invention.



FIG. 3 is a flow chart illustrating the startup and event processing by the clients of the system of FIG. 1.



FIG. 4 is a flow chart showing the startup and request processing by the server of FIG. 1.



FIG. 5 shows a modification of the system of FIG. 1.



FIG. 6 illustrates the update message flow and internal state of the update managers of the system of FIG. 5.



FIG. 7 is a flow chart depicting the operation of the update manager of the system of FIG. 5.



FIG. 8 shows a sample message used in the system of FIG. 5.



FIG. 9 illustrates an optional modification to the architecture of FIG. 7.



FIG. 10 shows the update message flow and internal state of the update managers of FIG. 9.



FIG. 11 is a flow chart illustrating the operation of the update managers of the system of FIG. 9.



FIG. 12 shows sample messages that may be used in the system of FIG. 9.





DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS


FIG. 1 illustrates a distributed computer system 100 that may be used in the practice of this invention. In particular, FIG. 1 shows a server computer 102, a plurality of client computers 104, and a publish/subscribe infrastructure 106. The devices of system 100 are connected together by any suitable network. Preferably, this network is the Internet, but it could also be an intranet, a local area network, a wide area network, or another network.


Any suitable server 102 may be used in system 100, and for example, the server may be an IBM RS/6000 server. Also, the clients 104 of the system may be, for instance, personal computers, laptop computers, servers, workstations, main frame computers, or other devices capable of communicating over the network. Likewise, the devices of system 100 may be connected to the network using a wide range of suitable connectors or links, such as wire, fiber optics or wireless communication links. Distributed system 100 may include additional servers, clients and other devices not shown in FIG. 1.


As mentioned above, in the depicted example, the devices of system 100 may be connected together via the Internet, which is a worldwide collection of networks and gateways that use the TCP/IP suite of protocols to communicate with one another. In the operation of system 100, server 102 provides data and applications to the clients. Among other functions, the server and the clients store semantic web statements such as RDF statements. For this reason, as depicted in FIG. 1, server 102 is referred to as an RDF store server, and clients 104 are referred to as RDF store clients.


Any suitable mechanism may be used to store the RDF statements. For example, relational databases that may be used to store RDF statements are disclosed in copending application no. (Attorney Docket POU920050098US1) for “Method And System For Controlling Access To Semantic Web Statements,” filed ______, and copending application no. (Attorney Docket POU920050099US1) for “Method And System For Efficiently Storing Semantic Web Statements In A Relational Database,” filed ______, the disclosures of which are hereby incorporated herein in their entireties by reference.


The present invention is directed generally to the distribution of such semantic statements. More specifically, the invention leverages existing JMS publish/subscribe technology (e.g., IBM WebSphere Event Broker), represented at 106, to distribute RDF updates such that clients receive only the information they require. Updates include adding, changing, or removing RDF statements.


In system 100, clients 104 listen for events using content-based message selection, a feature provided by the JMS standard. The server 102 publishes all events into the publish/subscribe cloud 106, allowing the brokers to discard events that do not match any client subscriptions. The example messages illustrated in FIG. 2 show the properties available for selection by a client. In particular, a message used to add a statement is shown at 202, a message used to change a statement is shown at 204, and a message used to remove a statement is shown at 206. Using the JMS message selector language (similar to SQL), a client may listen for changes affecting particular statements, or for statements matching a particular subject/predicate/object pattern. Thus, although a client may not be able to receive all events generated by the server 102, it can efficiently listen for all statements concerning a particular subject resource.
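The content-based selection described above can be illustrated with a minimal Python sketch. This is not part of the specification: the message fields loosely follow FIG. 2, the pattern format and the `matches` function are hypothetical stand-ins for a JMS selector such as `subject = 'ex:Alice'`.

```python
def matches(message, pattern):
    """Return True if every property named in the pattern equals the
    corresponding message property (None in the pattern acts as a wildcard),
    mimicking a content-based JMS message selector."""
    return all(value is None or message.get(key) == value
               for key, value in pattern.items())

# Example update messages, loosely modeled on FIG. 2 (202, 204, 206).
messages = [
    {"action": "add",    "subject": "ex:Alice", "predicate": "ex:knows", "object": "ex:Bob"},
    {"action": "change", "subject": "ex:Carol", "predicate": "ex:age",   "object": "42"},
    {"action": "remove", "subject": "ex:Alice", "predicate": "ex:email", "object": "a@x.org"},
]

# A client interested in all statements concerning the subject ex:Alice:
subscription = {"subject": "ex:Alice", "predicate": None, "object": None}
selected = [m for m in messages if matches(m, subscription)]
```

A broker evaluating such a predicate can discard the ex:Carol update before it ever reaches this client, which is the scalability point made above.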



FIG. 3 illustrates the client startup and event processing for keeping a subgraph up-to-date. In this routine, at step 302, the client 104 connects to the JMS broker cloud 106 and subscribes to events using content-based message selection. At step 304, the client starts to receive updates. At step 306, the client executes a server-side RDF query (e.g., RDQL (RDF Data Query Language) or SPARQL (SPARQL Protocol And RDF Query Language)), via web services, to populate the initial data.


At step 310, the routine checks to determine if any JMS messages are pending. If no messages are pending, this step is repeated until a message is pending; and, when a message is pending, the routine moves on to step 312. At this step, the client checks to determine if the pending message indicates that an RDF statement has been removed. If this is the case, the statement is removed from the local RDF stores at step 314, and the routine returns to step 310.


If, at step 312, the pending message does not indicate that an RDF statement is being removed, then the pending message indicates that a statement is being added or modified, and from step 312, the routine proceeds to step 316. At this step, a time stamp on the message is examined. If, as represented at step 320, the timestamp shows that the update is obsolete, then the routine returns to step 310. If the message is not obsolete, then, at step 322, the local store is updated with the new statement information, and then the routine returns to step 310.
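Steps 310 through 322 of FIG. 3 can be sketched as follows. This is an illustrative Python rendering, not the specification's implementation: the local store, its (subject, predicate) key scheme, and the timestamp comparison rule are assumptions.

```python
local_store = {}  # (subject, predicate) -> (object value, timestamp)

def process_message(msg):
    """Apply one pending update message to the local RDF store."""
    key = (msg["subject"], msg["predicate"])
    if msg["action"] == "remove":                 # steps 312 -> 314
        local_store.pop(key, None)
        return
    known = local_store.get(key)                  # steps 316 / 320:
    if known is not None and msg["timestamp"] <= known[1]:
        return                                    # obsolete update, discard
    # Step 322: update the local store with the new statement information.
    local_store[key] = (msg["object"], msg["timestamp"])

process_message({"action": "add", "subject": "ex:A", "predicate": "ex:p",
                 "object": "v1", "timestamp": 1})
process_message({"action": "change", "subject": "ex:A", "predicate": "ex:p",
                 "object": "v0", "timestamp": 0})  # obsolete, ignored
```

The timestamp check is what makes the asynchronous flow safe: a stale "change" that arrives after a newer update does not overwrite the newer value.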



FIG. 4 shows the server 102 startup and request processing. At step 402, the server starts up and connects to the publish/subscribe broker cloud 106 as a publisher; and at step 404, the server starts to listen for web service requests. At step 406, the server checks to determine if any requests are pending. If none are, the server continues to listen for requests until a request is pending. When this happens, the routine moves on to step 410, where the routine determines whether the request results in any change. If it does not, then, at step 412, the request is processed and data are returned, and then the routine returns to step 406.


However, if at step 410, the request does result in a change, the routine moves on to step 414, and the request is processed and the data are returned. Then, at step 416, an update message is published for each statement modified, and the routine then returns to step 406.
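The server-side branch of FIG. 4 (steps 410 through 416) can be sketched as below. This is an assumed Python illustration: the request shape, the dictionary store, and the `publish` stand-in for the broker cloud 106 are all hypothetical.

```python
published = []

def publish(message):
    """Stand-in for publishing into the publish/subscribe broker cloud."""
    published.append(message)

def handle_request(request, store):
    changed = []
    if request["type"] == "update":            # step 410: request causes a change
        store[request["key"]] = request["value"]
        changed = [request["key"]]
    result = store.get(request.get("key"))     # steps 412 / 414: return data
    for key in changed:                        # step 416: one message per
        publish({"action": "update",           # modified statement
                 "key": key, "value": store[key]})
    return result
```

Note that the data are returned to the requester in both branches; publication of update messages happens only when statements were actually modified.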


In the operation of system 100, clients communicate with an RDF store server through a combination of synchronous web service operations and asynchronous JMS updates. This operation may be significantly complicated by statement-level access control lists, which restrict which clients are allowed to read which statements. This level of access control means that simple publish/subscribe cannot be relied on alone, because some application-level authentication and message filtering is needed. In addition, RDF statements are added, modified, and removed within the scope of transactions. This also complicates the scenario, since any particular client may only be allowed to see some of the statements involved in a transaction.



FIG. 5 shows an architecture of a system 500 that accommodates this access control. Similar to system 100, system 500 includes a store server 502, one or more store clients 504, and a publish/subscribe infrastructure 506. System 500 further includes one or more update managers 510 and an ACL database 512. As indicated in the FIG., server 502, publish/subscribe infrastructure 506, update managers 510 and database 512 form a trusted server network 514.


In system 500, clients 504 may connect to one of many update managers (most likely using web services over point-to-point JMS, e.g. IBM WebSphere MQ). After authenticating, the client specifies a pattern for statement updates of interest. These patterns may match the subject, predicate or object of a statement, or other metadata the server chooses to include in the updates such as the date and time the statement is created.


All statement updates are published, one at a time, by the store server over an internal publish/subscribe broker cloud. The update managers subscribe to statement updates based on the patterns provided by their clients, and listen for all transaction completion messages. Each statement update is tagged with an ACL (Access Control List) identifier. Every time an update manager receives an update, it must ensure that the client is allowed to see the information before passing it on.


With reference to FIGS. 5 and 6, in this architecture, the RDF store server shares an ACL database directly with all of the update manager servers. It may be noted that, commonly, ACL data changes much less frequently than statements are updated. Without any additional optimization, each update manager would need to contact the ACL database every time it received a relevant statement update on behalf of a client.


If security policies allow some delay in enforcement after changes to the access control database, HTTP-style caching could be applied here. Update managers would be allowed to cache ACL information, depending on expiration times or get-if-modified operations supported by the ACL database and configured by the store server. It may be noted that, with caching, clients connecting to different update managers may see different views of the data (corresponding to different versions of the access control data).
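A minimal sketch of such expiration-based caching is given below. It is an assumption-laden illustration, not the specification's design: the `fetch` callable stands in for the ACL database, and the injectable clock exists only to make the time-to-live behavior visible.

```python
import time

class AclCache:
    """HTTP-style expiration cache for ACL entries held by an update manager."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable: acl_id -> ACL data (the database)
        self._ttl = ttl_seconds      # expiration time configured by the store server
        self._clock = clock
        self._entries = {}           # acl_id -> (acl data, expires_at)

    def lookup(self, acl_id):
        entry = self._entries.get(acl_id)
        now = self._clock()
        if entry is None or now >= entry[1]:          # miss or expired entry:
            acl = self._fetch(acl_id)                 # contact the ACL database
            self._entries[acl_id] = (acl, now + self._ttl)
            return acl
        return entry[0]                               # served from cache
```

Within the time-to-live window every lookup is answered locally, which is exactly the trade-off noted above: fewer database round trips in exchange for a bounded delay in enforcing ACL changes.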



FIG. 7 shows in more detail the operation of the update manager. At step 702, the update manager accepts a connection from a client; and at step 704, client authentication is performed. Then, at step 706, the routine determines whether there is a statement update pending from the store server. If there is not, the routine proceeds to step 710, where the routine determines whether there is a completed transaction pending from the store server. If no completed transaction is pending, the routine returns to step 706 and continues on from there. If a completed transaction is pending, the routine moves on to step 712, where the completed transaction is sent to the user via a point-to-point JMS connection. After this, the routine returns to step 706 and then proceeds from there.


If, at step 706, there is a statement update pending, then the routine, at step 714, looks up the update's ACL URI in the access control database, and then, at step 716, determines whether the user has permission to read this update. If the user does have permission, then the routine proceeds to steps 720 and 722. At step 720, the update is sent to the user via a point-to-point connection; and at step 722, the update's transaction ID is added to a list of pending transactions. From step 722, the routine proceeds to step 710. If, at step 716, it is determined that the user does not have permission to read this update, the routine skips to step 710 and proceeds from there.
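The decision made in steps 714 through 722 can be sketched as follows. This Python illustration assumes a hypothetical in-memory ACL database and a `deliver` callable standing in for the point-to-point connection; neither is from the specification.

```python
# Hypothetical ACL database: ACL identifier -> set of users permitted to read.
acl_database = {"acl-1": {"alice", "bob"}, "acl-2": {"bob"}}

def handle_statement_update(update, user, deliver, pending_transactions):
    """Filter one ACL-tagged statement update on behalf of one client."""
    readers = acl_database.get(update["acl"], set())   # step 714: look up ACL
    if user not in readers:                            # step 716: permission?
        return False                                   # suppressed, skip to 710
    deliver(update)                                    # step 720: send update
    pending_transactions.add(update["txn"])            # step 722: track its txn
    return True
```

Tracking the transaction ID at step 722 is what later allows the manager to forward only those transaction completion messages that are applicable to this client.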



FIG. 8 shows sample messages that may be used in the operation of system 500. In particular, a sample statement update message is shown at 802, and sample transaction complete messages are shown at 804 and 806.


Any suitable update managers may be used in the practice of this invention. For example, suitable update managers are described in copending application no. (Attorney Docket no. POU920050059US1) for “System And Method For Tracking And Storing Semantic Web Revision History,” filed ______, and copending application no. (Attorney Docket no. POU920050060US1) for “Method And System For Selective Tracking Of Semantic Web Data Using Distributed Update Events,” filed ______, the disclosures of which are hereby incorporated herein in their entireties by reference.



FIG. 9 illustrates another optional feature of the present invention. Specifically, with the system 900 shown in FIG. 9, the same publish/subscribe infrastructure 906 is used both to distribute the updated semantic web data and also to distribute relevant changes to the access control information. In this architecture, modifications to the access control data are preferably applied to the update managers 910 quickly, without sacrificing scalability, and publish/subscribe messaging within the trusted server network is used to distribute relevant ACL information (along with statement updates and transaction completion events) to each update manager.


With reference to FIGS. 9 and 10, when a user connects to an update manager 910, that server fetches the current list of relevant ACLs and group memberships, then subscribes to changes in that data via publish/subscribe infrastructure 906. By only subscribing to ACL update messages that contain references to the user in question or one of the user's groups, each update manager avoids receiving extraneous updates. It may be noted that group modifications (relatively infrequent) require an update manager to change its publish/subscribe subscriptions. Because of the asynchronous operation of the publish/subscribe infrastructure, the update manager may need to request a full snapshot of ACL information from the store server (similar to start-up).


This architecture allows update managers to filter incoming statement update messages immediately, without contacting any other node.



FIG. 11 shows the operation of the update manager 910 of system 900. As can be seen by comparing FIGS. 7 and 11, the operation of the managers 910 of system 900 is similar to the operation of the managers 510 of system 500, with some additional steps and some steps modified.


With particular reference to FIG. 11, at step 1102, the update manager accepts a connection from a client; and at step 1104, client authentication is performed. Then, at step 1106, the update manager contacts the RDF store server and requests a list of the ACL URIs that the user is allowed to read, as well as the user's group memberships. Then, at step 1110, the routine determines whether an ACL or group update from the store server is pending. If there is, then at step 1112, the list of ACL URIs that the user is allowed to read is updated (refreshed from the server), and the subscriptions are changed if the user's group memberships change. After step 1112 is completed, the routine proceeds to step 1114. The routine also goes to step 1114 from step 1110 if, at this latter step, there are no ACL or group updates pending from the store server.


At step 1114, the routine determines whether there is a statement update pending from the store server. If there is not, the routine proceeds to step 1116, where the routine determines whether there is an applicable transaction completion pending from the store server. If there is no such transaction completion pending, the routine returns to step 1110 and continues on from there. If there is such a pending transaction completion at step 1116, the routine moves on to step 1120, where the completed transaction is sent to the user via a point-to-point JMS connection, and this transaction is removed from the pending transaction list. After this, the routine returns to step 1110 and then proceeds from there.


If, at step 1114, there is a statement update pending, the routine moves to step 1122, where a determination is made as to whether the user is entitled to this update. If the user is not entitled, the routine proceeds to step 1116; however, if the user is entitled to this update, the routine goes to step 1124. At this step, the update is sent to the user via a point-to-point connection. Then, at step 1126, the update's transaction ID is added to the pending transaction list, and from step 1126, the routine then goes to step 1116 and proceeds from there.
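The locally cached ACL state that distinguishes FIG. 11 from FIG. 7 can be sketched as below. This Python illustration is an assumption: the class, its field names, and the boolean ACL-update message are hypothetical renderings of steps 1106, 1112, and 1122-1126.

```python
class UpdateManagerSession:
    """Per-client state held by an update manager in system 900."""

    def __init__(self, readable_acls):
        # Step 1106: initial ACL list fetched from the store server.
        self.readable_acls = set(readable_acls)
        self.pending_txns = set()

    def on_acl_update(self, acl_uri, user_may_read):
        # Step 1112: keep the local ACL view current from published ACL updates.
        if user_may_read:
            self.readable_acls.add(acl_uri)
        else:
            self.readable_acls.discard(acl_uri)

    def on_statement_update(self, update, deliver):
        # Step 1122: purely local permission check, no database round trip.
        if update["acl"] not in self.readable_acls:
            return False
        deliver(update)                          # step 1124: send to the user
        self.pending_txns.add(update["txn"])     # step 1126: track its txn
        return True
```

Because the permission check consults only in-memory state fed by the publish/subscribe infrastructure, the manager can filter each incoming statement update without contacting the ACL database or any other node.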



FIG. 12 shows sample messages that may be used in the operation of system 900. More specifically, a sample statement update message is shown at 1202, and sample transaction complete messages are shown at 1204 and 1206. Sample ACL and group update messages are illustrated at 1210, 1212 and 1214. Update managers will subscribe to messages that contain either the user name of a connection or one of that user's groups.


While it is apparent that the invention herein disclosed is well calculated to fulfill the objects stated above, it will be appreciated that numerous modifications and embodiments may be devised by those skilled in the art, and it is intended that the appended claims cover all such modifications and embodiments as fall within the true spirit and scope of the present invention.

Claims
  • 1. A method of publishing updated semantic web data to distributed clients, comprising the steps of: storing semantic web data in a server; updating said semantic web data; the server publishing a set of updates to said semantic web data; and each of a plurality of distributed clients registering for a respective one subset of said updates, and receiving only updates within said respective one subset.
  • 2. The method according to claim 1, wherein: the publishing step includes the step of the server publishing said updates to a given publish/subscribe infrastructure; and the registering step includes the step of each of said plurality of clients registering with said given publish/subscribe infrastructure to receive the respective one subset of said updates.
  • 3. The method according to claim 2, comprising the further step of said given publish/subscribe infrastructure distributing to each of said clients only updates for which the client has registered.
  • 4. The method according to claim 3, wherein: said updates are updates to statements; comprising the further step of providing access control lists that identify which ones of the clients are allowed to read which ones of the statements; and wherein the distributing step includes the step of the publish/subscribe infrastructure distributing statements only to clients permitted, as identified on the access control lists, to read the statement.
  • 5. The method according to claim 4, comprising the further step of: storing the access control lists in an ACL database; and wherein: the publishing step includes the step of publishing each of the statements with a property having an identifier identifying one of the access control lists; and the step of distributing statements only to clients permitted to read the statements includes the steps of i) providing an update manager, and ii) when each of the statements is published, the update manager finding in the ACL database the access control list identified by the identifier of the property with the published statement, and using said found access control list to determine which ones of the clients are entitled to read the statement.
  • 6. The method according to claim 4, comprising the further steps of: updating the access control lists; providing an update manager; and the server using the same publish/subscribe infrastructure used to distribute the updates to the semantic web data, to distribute to the update manager updates to the access control lists.
  • 7. A server/client system for publishing updated semantic web data to distributed clients, comprising: a server storing semantic web data, including updated semantic web data; the server including instructions for publishing a set of updates to said semantic web data; a plurality of distributed clients, each of said plurality of distributed clients including instructions for registering for a respective one subset of said updates; and an infrastructure for distributing to each of said clients only updates within said respective one subset of updates for which the client has registered.
  • 8. The system according to claim 7, wherein: said infrastructure is a publish/subscribe infrastructure; the server publishes said updates to said publish/subscribe infrastructure; and each of said plurality of clients registers with said publish/subscribe infrastructure to receive the respective one subset of said updates.
  • 9. The system according to claim 8, wherein said publish/subscribe infrastructure distributes to each of said clients only updates for which the client has registered.
  • 10. The system according to claim 9, wherein said updates are updates to statements, further comprising access control lists that identify which ones of the clients are allowed to read which ones of the statements; and wherein: the publish/subscribe infrastructure distributes statements only to clients permitted, as identified on the access control lists, to read the statement.
  • 11. The system according to claim 10, wherein the server publishes each of the statements with a property having an identifier identifying one of the access control lists, and the system further comprises: an ACL database storing the access control lists; and an update manager, wherein, when each of the statements is published, the update manager finds in the ACL database the access control list identified by the identifier of the property with the published statement, and uses said found access control list to determine which ones of the clients are entitled to read the statement.
  • 12. The system according to claim 10, further comprising: an update manager; and wherein: the access control lists are updated, and the server uses the same publish/subscribe infrastructure used to distribute the updates to the semantic web data, to distribute to the update manager the updates to the access control lists.
  • 13. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for publishing updated semantic web data to distributed clients, said method steps comprising: storing semantic web data in a server; updating said semantic web data; the server publishing a set of updates to said semantic web data; and each of a plurality of distributed clients registering for a respective one subset of said updates, and receiving only updates within said respective one subset.
  • 14. The program storage device according to claim 13, wherein: the publishing step includes the step of the server publishing said updates to a given publish/subscribe infrastructure; and the registering step includes the step of each of said plurality of clients registering with said given publish/subscribe infrastructure to receive the respective one subset of said updates.
  • 15. The program storage device according to claim 14, comprising the further step of said given publish/subscribe infrastructure distributing to each of said clients only updates for which the client has registered.
  • 16. The program storage device according to claim 15, wherein: said updates are updates to statements; wherein said method steps comprise the further step of providing access control lists that identify which ones of the clients are allowed to read which ones of the statements; and wherein the distributing step includes the step of the publish/subscribe infrastructure distributing statements only to clients permitted, as identified on the access control lists, to read the statement.
  • 17. The program storage device according to claim 16, for use with an update manager, and wherein said method steps comprise the further step of: storing the access control lists in an ACL database; and wherein: the publishing step includes the step of publishing each of the statements with a property having an identifier identifying one of the access control lists; and the step of distributing statements only to clients permitted to read the statements includes the step of, when each of the statements is published, the update manager finding in the ACL database the access control list identified by the identifier of the property with the published statement, and using said found access control list to determine which ones of the clients are entitled to read the statement.
  • 18. The program storage device according to claim 16, for use with an update manager, and wherein said method steps comprise the further steps of: updating the access control lists; and the server using the same publish/subscribe infrastructure used to distribute the updates to the semantic web data, to distribute to the update manager updates to the access control lists.