Change capturing component 23 is responsible for monitoring data manipulation performed on database 22, which is triggered from the application. For this, the application notifies replication engine 21 of the data change applied using predefined application programming interface (API) calls.
Change detection component 24 is responsible for determining changes applied to database 22 and for generating and delivering a minimal list of change records that can be applied to a target database, such as database 12, without violation of referential integrity.
Stored replication scheme 25 provides a description of replicated tables and data. It particularly contains information about tables and relationships between them. An exemplary replication scheme is embodied as follows:
The column Table indicates the tables to be considered. The column References defines the containment relationships of the table designated by column Table. Furthermore, each table defined in the replication scheme should comprise a primary key so that each record can be identified uniquely.
Operation of replication engine 21 (and 11, respectively) and its subcomponents is shown in
In step 200, a data snapshot is created and/or stored. This is typically performed when a replica 22 is newly created or synchronized (whichever occurred more recently). A snapshot defines a set of records that serves to recognize changes that have been applied since the last synchronization (or creation) point. Details of the data snapshot are discussed later referring to
While using the database 22 (or 12), a user manipulates data stored in the database in step 210. As pointed out above and shown in step 220, replication engine 21 is notified of the data change applied using predefined application programming interface (API) calls in order to keep the database snapshot up-to-date. This can be done by application software or middleware which uses the replication and synchronization mechanism of the present invention. For this, certain APIs are provided to access functionality of the mechanisms described by this invention.
Thus, if a record is updated on the server 1 or the client 2, the application calls the appropriate API, which updates the fields “last updated timestamp” and “last updated version number” of the corresponding snapshot data record. If a record is added or deleted on the server 1 or the client 2, nothing needs to be recorded, since those operations are recognized by checking for the existence or non-existence, respectively, of a record with a database query statement, as shown with regard to step 240 below.
When synchronization (or, a mere data comparison, respectively) is to be performed, which is determined in step 230, a minimal sequence of change records is generated in step 240. In this case, client 2 first scans its own snapshot data table for local changes, resulting in change records. Only these will be sent to the server after being sorted as described below, for instance as a parameter of a replication request or for a data comparison.
Similarly, as will be shown referring to
The result is a minimal list of change record objects, i.e. each changed data record is represented by exactly one change record, comprising the following content fields:
In an implementation, the table name can also be included in the change record, depending on whether the record id is unique across tables.
Next, in step 240 this list is sorted chronologically, that is, by the “last updated timestamp” in ascending order. This is because the local changes should be performed on the target database in the same order in which they were executed on the local database. A stable sorting method is preferred for this operation.
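The chronological sort can be sketched as follows. This is a minimal illustration only: the class `ChangeRecord`, its field names and the integer timestamps are assumptions made for the example and are not taken from the description. Python's built-in `sorted` is stable, so change records carrying the same timestamp keep their relative order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    # Hypothetical fields for illustration; the actual change record may differ.
    record_id: str
    operation: str     # "insert", "update" or "delete"
    last_updated: int  # "last updated timestamp", here a plain integer


def sort_chronologically(changes):
    # sorted() is stable: records with equal timestamps keep their
    # original relative order.
    return sorted(changes, key=lambda c: c.last_updated)


changes = [
    ChangeRecord("r3", "update", 30),
    ChangeRecord("r1", "insert", 10),
    ChangeRecord("r2", "insert", 10),
]
ordered = sort_chronologically(changes)
```

Here `r1` and `r2` share a timestamp and therefore stay in their original order ahead of `r3`.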
At this point, a sequence of change records of local changes sorted by the last updated timestamp has been obtained. This sequence is now rearranged to maintain referential dependencies. To assure referential integrity when the list is applied to a target database, for instance database 12 of server 1, in step 250 the topology of each record is considered and the change records are rearranged such that they can be applied in the order of the rearranged sequence without violating any defined relationship. This process of rearranging is referred to as topological sorting and is shown in more detail in
Finally, the sequence of change records can be sent to a second database system component, such as from client 2 to server 1 (or vice versa). In one application scenario, a server having received the list may examine each change record for an update conflict with its local changes, as will be described further below.
The structure of a data snapshot 4 is shown in
Referring to the overall system, each client stores its own snapshot, and the server stores the snapshots for all its clients.
Snapshot definition 41 defines which types of replica a client 2 has requested (or contains) and the timestamp of the last synchronization point. In a full embodiment, snapshot definition 41 can contain a snapshot definition id, a client id (such as IP address and client name), a replica type 42 and a last synchronization timestamp 43.
Snapshot data 51 contains a snapshot data record 51 for each record that was replicated to the client. Snapshot data record 51 comprises a data record identifier 52, such as record id and table name, and a version tag 53, such as a last updated timestamp and version number. Any modification of the corresponding record will also update the version tag appropriately. In a full embodiment, a snapshot data record consists of the following fields: snapshot definition id, record id (such as the primary key of the replicated record), table name, synchronization timestamp, synchronization version number, last updated timestamp, and last updated version number.
To ensure that any modification of a data record will also update the version tag of the corresponding snapshot data record, the application should make an explicit API call.
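A snapshot data record and the explicit API call might look like the following sketch. The field list follows the full embodiment described above, while the Python types, the function name `notify_update`, the dictionary keyed by table name and record id, and the choice to increment the version number by one are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SnapshotDataRecord:
    # Fields as listed for the full embodiment; the types are assumed.
    snapshot_definition_id: int
    record_id: str
    table_name: str
    sync_timestamp: int
    sync_version: int
    last_updated_timestamp: int
    last_updated_version: int


def notify_update(snapshot, table_name, record_id, now):
    """Hypothetical API call made by the application after updating a record."""
    entry = snapshot[(table_name, record_id)]
    entry.last_updated_timestamp = now
    entry.last_updated_version += 1  # assumption: version is a simple counter
    return entry


snapshot = {
    ("orders", "42"): SnapshotDataRecord(1, "42", "orders", 100, 1, 100, 1),
}
notify_update(snapshot, "orders", "42", now=250)
```

After the call, the version tag of the entry differs from its synchronization values, which is what later allows the record to be recognized as updated.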
For some applications of the principle of the present invention, each client maintains its own snapshot definition and data as described above.
To illustrate a typical scenario wherein the present invention is shown producing its advantageous effects at two distinct points, namely with performing a mere data comparison and with performing a data synchronization between a client and a server,
In step 410, client 2 submits a request to create a data snapshot. Server 1 stores snapshot information as described above in its database in order to track local changes and detect update conflicts. In step 420, server 1 then returns a list of change records to client 2. Client 2 applies them to its database and then has the same data content as the server database.
Referring to steps 430a and 430b, it is shown that users can manipulate the data on both the client and the server concurrently and independently.
In step 440, client 2 submits a data compare request to server 1, together with a sequence of change records. Server 1 determines its local changes, compares them against the client's changes and determines possible update conflicts. Server 1 returns the result to the client in step 450. Client 2 can show the update conflicts to the user and let the user decide which version is to be favored during an actual synchronization.
In step 460, client 2 determines its local changes and submits a synchronization request to server 1, together with the sequence of change records. Server 1 first determines its local changes and compares them against the client's change records in order to check whether any update conflicts exist in step 470. If no update conflicts are found, server 1 applies the changes received from the client to the server database, and then returns its local changes to client 2 in step 480. Client 2 then applies the server change records to its database. In step 490, client 2 sends an acknowledgement message to server 1.
This scenario may take place with a multiplicity of clients as well. When the server applies changes of the client, a clean rollback of synchronization is always possible if there is a failure. When the client applies server changes, based on the two-phase commit described above, the server is able to cleanly roll back the snapshot data of the client to a previous state such that the client may recover its local data after a failure during this phase.
In step 242, for each table defined by the replication scheme, the SQL select statement below delivers the records that have been updated on the client or server, as the case may be:
In step 243, for each table defined by the replication scheme, the SQL select statement below delivers the records that have been deleted on the client or server, as the case may be:
By calculating the union of these three result lists, the total list of all changed records is determined.
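The principle of combining the three result lists can be sketched with in-memory sets. This does not reproduce the SQL statements of the description; the function name, the data layout and the rule that a record counts as updated when its last updated version number differs from its synchronization version number are assumptions made for illustration.

```python
def detect_changes(table_ids, snapshot):
    """Return a dict mapping record id -> operation.

    table_ids: set of record ids currently present in the table.
    snapshot:  dict mapping record id -> (sync_version, last_updated_version).
    """
    # Present in the table but unknown to the snapshot: inserted.
    inserted = {rid for rid in table_ids if rid not in snapshot}
    # Known to the snapshot, still present, version tag changed: updated.
    updated = {
        rid
        for rid, (sync_v, last_v) in snapshot.items()
        if rid in table_ids and last_v != sync_v
    }
    # Known to the snapshot but no longer in the table: deleted.
    deleted = {rid for rid in snapshot if rid not in table_ids}

    # The three sets are disjoint, so each changed record yields
    # exactly one change record in the union.
    changes = {rid: "insert" for rid in inserted}
    changes.update({rid: "update" for rid in updated})
    changes.update({rid: "delete" for rid in deleted})
    return changes


table_ids = {"a", "b", "d"}
snapshot = {"a": (1, 1), "b": (1, 3), "c": (2, 2)}
result = detect_changes(table_ids, snapshot)
```

In this example, record `a` is unchanged, `b` was updated, `c` was deleted and `d` was inserted.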
The usage of “predecessor” in this case refers to the order of change records in the minimal list of change records after this list has been sorted chronologically: If the second change record appears in this list before the first change record, it is considered to be a predecessor.
The recursive behavior is as follows: If the second change record is placed before the first change record, the rearranging is done recursively for the second change record. After the recursion has ended, traversal continues with the change record now following the first change record in the list.
As an example, consider the chronologically sorted list of change records (first, third, second, fourth), where the first record references the second, and the second references the third. When the traversal reaches the first record, the second is placed before the first, which gives an intermediate result in which second would be applied before third, contradicting the reference from second to third. This intermediate result is corrected by the recursion on the second record, which moves the third record before the second. Then traversal continues on the fourth record.
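The recursive rearrangement can be sketched as below, using the example just given. The function name and the representation of references as a dictionary are assumptions; the sketch handles only the case in which a referenced record must be applied before the referring record, does not treat “delete” operations separately, and assumes that the references contain no cycles.

```python
def rearrange(records, references):
    """Reorder records so that every referenced record precedes its referrer.

    records:    chronologically sorted list of record ids.
    references: dict mapping a record id to the ids it references
                (assumed to be free of cycles).
    """
    order = list(records)

    def place_dependencies(current):
        for dep in references.get(current, ()):
            if dep not in order:
                continue  # no change record for the referenced record: skip
            if order.index(dep) < order.index(current):
                continue  # already a predecessor: nothing to do
            order.remove(dep)
            order.insert(order.index(current), dep)
            place_dependencies(dep)  # recurse on the record just moved

    for rec in list(records):
        place_dependencies(rec)
    return order


records = ["first", "third", "second", "fourth"]
references = {"first": ["second"], "second": ["third"]}
print(rearrange(records, references))
# prints ['third', 'second', 'first', 'fourth']
```

The intermediate state after moving `second` is (second, first, third, fourth); the recursion on `second` then moves `third` to the front, matching the behavior described above.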
In an illustrative example of the result of topological sorting, consider a sequence of change records that has already been sorted chronologically regarding the last update timestamp, showing three insertions of records:
Using the replication scheme in order to determine that “B is dependent on A”, a topological sort as described above results in the following order:
Insertion of record A must take place before insertion of record B, otherwise referential integrity would be violated.
In one embodiment, conflict detection is executed mainly on the server after a client request with its list of change records comes in. In this case, no conflict detection is required on the client, since the present invention guarantees that the list of change records returned from the server as a result of a synchronization request will be applicable to the client's database.
First, the server determines the list of its change records as described above. In the following, the terms “client change record” and “server change record” designate a change record originating from the client and the server, respectively. Next, the following steps are performed for each client change record; the result is a list of update conflict records, which is initially empty:
If the list of update conflicts is empty at this point, this means no conflict was detected and the client change records can be applied to the server database. The server change records will be returned to the client as a result of its synchronization request.
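The overall shape of this check can be sketched as follows. The sketch uses the simplest conceivable criterion, namely that a conflict exists when the same record appears in both the client and the server change lists; this criterion, the function name and the data layout are assumptions, and the actual steps may cover further cases.

```python
def find_update_conflicts(client_changes, server_changes):
    """Return a list of update conflict records.

    Both arguments map (table_name, record_id) -> operation.
    Assumption: a conflict exists when both sides changed the same record.
    """
    conflicts = []  # initially empty, as in the description
    for key, client_op in client_changes.items():
        if key in server_changes:
            conflicts.append((key, client_op, server_changes[key]))
    return conflicts


client_changes = {("orders", "1"): "update", ("orders", "2"): "insert"}
server_changes = {("orders", "1"): "update", ("items", "9"): "delete"}
conflicts = find_update_conflicts(client_changes, server_changes)

# An empty list would mean the client change records can be applied.
can_apply = not conflicts
```

In this example, record 1 of table `orders` was changed on both sides, so the list is not empty and the client changes are not applied directly.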
The automatic resolution of update conflicts provided by an embodiment of the present invention is that the server version will be preferred if and only if both records are equal in terms of their field values, i.e. only their version numbers may differ. However, a client can use the change record field update_hint to specify which version is to be treated as more accurate. Below is a procedure a client performs for a user-controlled conflict resolution:
To create a replica, in addition to the techniques described above with regard to steps 410 and 420, the following is to be considered. On the server, those data records which should be included in the content set of the replica should be determined. The content of the replica is determined by the filters which are defined by the user, and by the references as specified in the replication scheme. The records in the content set must both match the filters of the user, and satisfy all referential constraints. To construct the minimal content set which satisfies these two conditions, for each of the data records a change record is created with its operation being “insert”, then change records are ordered chronologically according to creation time stamp (if no creation timestamp is available, this step may be skipped), and then change records are rearranged to reflect referential dependencies (topological sorting). On the client, the list of received change records is applied in the same way as in the case of synchronizing, and a snapshot is stored to track local changes.
Creating a replica can thus be summarized in these steps: determine the content set, create a change record with operation=insert for each record in this set, and store the snapshot data. These change records are then sent to the client and processed by the same method as when synchronizing.
The invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In an embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read-only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
To avoid unnecessary repetitions, explanations given for one of the various embodiments are intended to refer to the other embodiments as well, where applicable. In and between all embodiments, identical reference signs refer to elements of the same kind. Moreover, reference signs in the claims shall not be construed as limiting the scope. The use of “comprising” in this application does not mean to exclude other elements or steps and the use of “a” or “an” does not exclude a plurality. A single unit or element may fulfill the functions of a plurality of means recited in the claims.
1 Server system
2 Client system
3 Data communication channel
4 Snapshot data structure
11 Replication engine
12 Database
13 Change capturing component
14 Change detection component
15 Stored replication scheme
21 Replication engine
22 Database
23 Change capturing component
24 Change detection component
25 Stored replication scheme
41 Snapshot definition record
42 Replication type indicator
43 Synchronization timestamp
51, 51a Snapshot data record
52, 52a Data record identifier
53, 53a Version tag
200 Store snapshot
210 Data manipulation
220 Update snapshot
230 Synchronization request occurred?
240 Generate sequence of change records
241 Select data records
242 Select data records
243 Select data records
250 Rearrange sequence of change records
251 Is operation “delete”?
252 Does database snapshot have an entry for the second data record?
253 Is change record a predecessor?
254 Place record before current record
255 Skip record
410 Request to create snapshot
420 Snapshot data/change records
430a, 430b Manipulate data on database
440 Compare request
450 Result data/update conflicts
460 Synchronization request
470 Apply changes
480 Result data/server changes
490 Acknowledge synchronization
Number | Date | Country | Kind |
---|---|---|---|
06113236.1 | Apr 2006 | EP | regional |