A wide variety of information may be represented in a hierarchical structure. One example format for representing information in a hierarchical structure is the extensible markup language (XML) format. An XML document may include an arrangement of nodes containing information and may specify parent-child relationships among the nodes.
It may be desirable under a variety of circumstances to apply incremental updates to hierarchically structured information. For example, a computing system may include a variety of processing devices that each generate incremental updates to an XML document.
A processing device that generates an incremental update to hierarchically structured information may be viewed as a trusted device in terms of access to the information. On the other hand, a processing device that applies an incremental update to hierarchically structured information may be viewed as an un-trusted device in terms of access to the information. For example, an XML document may be stored on a network-based information storage facility. As a consequence, it may be desirable to verify whether or not an incremental update handled by an un-trusted device was correctly applied to hierarchically structured information.
Techniques are disclosed for verifying whether an incremental update was correctly applied to a set of hierarchically structured information. The present techniques include determining an overall integrity code for the hierarchically structured information and attaching the overall integrity code to the hierarchically structured information. An incremental update according to the present techniques includes an integrity code that is combined into the overall integrity code attached to the hierarchically structured information when the incremental update is applied to the hierarchically structured information. The integrity code of the incremental update is generated such that when the overall integrity code is recomputed it will match the overall integrity code attached to the hierarchically structured information if the incremental update was correctly applied.
Other features and advantages of the present invention will be apparent from the detailed description that follows.
The present invention is described with respect to particular exemplary embodiments thereof and reference is accordingly made to the drawings in which:
The trusted update creators 10-14, the un-trusted update processor 16, and the trusted verifier 18 may be embodied as any combination of computer systems, server systems, storage systems, database systems, mobile computing devices including personal digital assistants (PDAs), cell phones, etc., as well as more specialized processing devices. The communication network 100 may be embodied as any combination of communication links including public communication links, wireless communication links, Internet communication links, etc.
The trusted update creators 10-14 are trusted with respect to the contents of the hierarchically structured information 30 whereas the un-trusted update processor 16 is not trusted with respect to the contents of the hierarchically structured information 30. For example, the trusted update creators 10-14 may be application programs that generate updates to the contents of the hierarchically structured information 30 and the un-trusted update processor 16 may be a document storage service that provides storage and access to the hierarchically structured information 30. The contents of the hierarchically structured information 30 are encrypted to prevent their exposure, for example to the un-trusted update processor 16.
The trusted update creators 10-14 generate incremental updates, e.g. an incremental update 40, that are to be applied to the hierarchically structured information 30. The un-trusted update processor 16 applies the incremental update 40 to the hierarchically structured information 30 without decrypting the contents of the hierarchically structured information 30.
The unencrypted contents of the hierarchically structured information 30 may be represented as a hierarchical tree structure having an arrangement of nodes and arcs. The hierarchically structured information 30 is encoded and encrypted in such a way as to prevent unauthorized viewing and modification of its content as well as unauthorized viewing of its hierarchical structure. The hierarchical structure of the hierarchically structured information 30 is hidden in order to prevent unauthorized persons from inferring the information contained in the encrypted hierarchically structured information 30 from its hierarchical structure. For example, unauthorized persons are prevented from inferring the information contained in the hierarchically structured information 30 by determining parent-child relationships among its nodes.
Each node A-D may be represented using a data structure that includes a tag, a set of zero or more attributes, and zero or more text strings. The data structure for each node A-D that is not the root node of the hierarchically structured information 30 refers to its parent node. The data structure for each node A-D may also refer to an ordered list of child nodes.
An example data structure for representing the unencrypted contents of the hierarchically structured information 30 is a data structure in XML format. An example of the unencrypted contents of the hierarchically structured information 30 in XML format is as follows.
The hierarchically structured information 30 is encrypted in two phases—an encoding phase followed by an encryption phase. In the encoding phase, the hierarchical structure of the unencrypted contents of the hierarchically structured information 30 is flattened into an unordered list of nodes. In the encryption phase, the unordered list of nodes is encrypted.
At step 120, an arbitrary identifier is assigned to each node A-D of the hierarchically structured information 30. The node identifiers remain persistently assigned to the nodes A-D. An example assignment of arbitrary identifiers for the nodes A-D is as follows.
The textual representation of the hierarchically structured information 30 after step 120 is as follows wherein the node identifiers 109, 558, 971, and 623 are shown below in square brackets.
At step 122, the hierarchical structure of the hierarchically structured information 30 is flattened into an unordered list representing the nodes, i.e. the nodes in the list have no particular order in relation to the hierarchical structure of the hierarchically structured information 30.
The following is an unordered list representation for the example hierarchically structured information 30 after step 122.
The first entry in the unordered list from step 122 specifies that the node 971 has a parent node 109 and a child node 623. The second entry in the unordered list from step 122 specifies that the node 623 has a parent node 971. The third entry in the unordered list from step 122 specifies that the node 109 is a root node and has child nodes 558 and 971, in that order. The fourth entry in the unordered list from step 122 specifies that the node 558 has a parent node 109.
The ordering of the node identifiers 971, 623, 109, and 558 in the unordered list from step 122 does not reflect the hierarchical structure of the example hierarchically structured information 30. This prevents unauthorized parties from grasping the structure of the hierarchically structured information 30 from the arrangement of the unencrypted node identifiers 971, 623, 109, and 558.
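Steps 120 and 122 may be illustrated with the following minimal Python sketch. The Node class, the flatten function, the dictionary layout of each entry, and the use of three-digit random identifiers are assumptions made for illustration only and are not taken from the description above. The node data are placeholder strings.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    data: str                                     # tag, attributes, and text
    children: list = field(default_factory=list)  # ordered child Node objects


def flatten(root, rng):
    """Assign arbitrary identifiers (step 120) and flatten the tree (step 122)."""
    entries, used = [], set()

    def new_id():
        # Three-digit identifiers as in the example; adequate only for small trees.
        while True:
            candidate = rng.randrange(100, 1000)
            if candidate not in used:
                used.add(candidate)
                return candidate

    def visit(node, node_id, parent_id):
        child_ids = [new_id() for _ in node.children]
        entries.append({"id": node_id, "parent": parent_id,
                        "children": child_ids, "data": node.data})
        for child, child_id in zip(node.children, child_ids):
            visit(child, child_id, node_id)

    visit(root, new_id(), None)
    rng.shuffle(entries)  # list order must not mirror the tree structure
    return entries


# Hypothetical four-node tree: A is the root, B and C are its children, D is a child of C.
tree = Node("A", [Node("B"), Node("C", [Node("D")])])
for entry in flatten(tree, random.Random()):
    print(entry)
```

Each entry carries its own parent and ordered child references, so the tree can be rebuilt from the list in any order, while the position of an entry in the list reveals nothing about its place in the hierarchy.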
At step 124, a unique number (U) is generated for each node of the unordered list from step 122. For example, the unique number for the node 971 is U971 and the unique number for the node 623 is U623. The unique number helps ensure that the data of each entry in the unordered list is unique. The unique number may be obtained from a random number generator or another function generator.
At step 126, the contents of each entry in the unordered list are encrypted. The node identifier for each entry in the unordered list is not encrypted. The unordered list representing the hierarchically structured information 30 after step 104 is as follows where the encryption function is E( ).
The choice of encryption algorithm, encryption key and the choice of symmetric or asymmetric key may be adapted to particular embodiments. The appropriate decryption key is used to decrypt the hierarchically structured information 30.
IC971=H(U971+Data971+Refs971)
The Data971 is the actual data associated with the node 971 and the Refs971 is the ordered list of child references for the node 971. If the data in a node is replaced then the unique number is also changed, which ensures that the integrity code is also different.
The one way hash function H generates a fixed width bit stream in response to a text x. The one way hash function H has the property that there is no computationally feasible method of discovering x given H(x). One example of a suitable hash function is the MD5 function.
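The per-node integrity code may be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: SHA-256 from the Python standard library is substituted for the MD5 function named above, the "+" in the formula is read as concatenation, and the length prefix on each field and the comma-separated encoding of the child references are choices made here so that field boundaries stay unambiguous. The byte strings are placeholders.

```python
import hashlib


def integrity_code(unique, data, refs):
    """IC = H(U + Data + Refs) for one node; unique and data are bytes, refs is a list of ids."""
    digest = hashlib.sha256()
    refs_bytes = ",".join(str(r) for r in refs).encode()
    for part in (unique, data, refs_bytes):
        # Length prefix keeps field boundaries unambiguous (an assumption of this sketch).
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


# Placeholder values standing in for U971, Data971, and Refs971.
ic_971 = integrity_code(b"U971", b"Data971", [623])
# A new unique number yields a different integrity code even for unchanged data.
ic_971_new = integrity_code(b"U'971", b"Data971", [623])
print(ic_971.hex())
print(ic_971 != ic_971_new)
```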
At step 162, the integrity codes for all of the nodes in the unordered list representing the hierarchically structured information 30 are combined together to yield the OIC for the hierarchically structured information 30. The integrity codes are combined using a function that is selected such that two integrity codes having the same value will cancel each other out in the combination. One example of a suitable combining function is the exclusive-OR (XOR) function which is used hereinafter to represent the combining function. The order in which the integrity codes of the nodes are combined at step 162 can be any order.
At step 164, the overall integrity code OIC is attached to the hierarchically structured information 30. For example, the overall integrity code OIC may be stored in the entry of the unordered list that corresponds to the root node.
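The combination at step 162 may be sketched as follows, assuming 32-byte integrity codes and using XOR as the combining function. The helper names xor_bytes and overall_integrity_code are hypothetical, and the per-node codes are stand-in hashes of placeholder labels rather than real node contents.

```python
import hashlib
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def overall_integrity_code(codes):
    """Combine per-node integrity codes into the OIC (step 162)."""
    return reduce(xor_bytes, codes, bytes(32))


# Stand-in integrity codes for the nodes 971, 623, 109, and 558.
codes = [hashlib.sha256(label).digest() for label in (b"971", b"623", b"109", b"558")]
oic = overall_integrity_code(codes)

# The order of combination does not matter.
print(oic == overall_integrity_code(list(reversed(codes))))
# Combining the same code a second time cancels it out.
print(xor_bytes(xor_bytes(oic, codes[0]), codes[0]) == oic)
```

The cancellation property is what later allows the integrity code of a deleted or replaced node to be removed from the OIC without knowledge of the remaining nodes.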
The incremental update 40 specifies nodes to be added to, deleted from, and/or replaced in the hierarchically structured information 30. The un-trusted update processor 16 adds nodes to and deletes nodes from the hierarchically structured information 30 in response to the incremental update 40 by identifying entries in the unordered list for the nodes to be added or deleted using the node identifiers. This avoids decrypting the portions of the entries containing node data and parent-child pointers, thereby preventing exposure of the contents and hierarchical structure of the hierarchically structured information 30 to the un-trusted update processor 16 as it applies the incremental update 40 to the hierarchically structured information 30.
The incremental update 40 may specify one or more ADD commands and/or one or more DELETE commands and/or one or more REPLACE commands. An ADD command is used to add a node to the hierarchically structured information 30 and a DELETE command is used to delete a node from the hierarchically structured information 30. A REPLACE command is used to replace a node in the hierarchically structured information 30.
An ADD command in one embodiment is as follows.
The “id” parameter of the ADD command is a node identifier for a new node to be added. The “E(txt)” parameter of the ADD command is the encrypted node data for a new entry in the unordered list representing the hierarchically structured information 30 for the new node. The “E(txt)” parameter includes a unique number for the new entry. The un-trusted update processor 16 performs an ADD id, E(txt) command by adding a new entry to the unordered list representing the hierarchically structured information 30 including the node identifier id and E(txt).
A DELETE command in one embodiment is as follows.
The “id” parameter of the DELETE command is a node identifier for a node to be deleted from the hierarchically structured information 30. The un-trusted update processor 16 performs a DELETE id command by deleting the entry from the unordered list representing the hierarchically structured information 30 that is specified by the id parameter in the DELETE command.
A REPLACE command in one embodiment is as follows.
The “id” parameter of the REPLACE command is a node identifier for a node to be replaced. The “E(txt)” parameter of the REPLACE command is the new encrypted node data. The “H(E(txt_old))” parameter of the REPLACE command is a hash of the old encrypted node data.
The un-trusted update processor 16 performs a REPLACE id, H(E(txt_old)), E(txt) command by computing a hash of the encrypted node data in the entry of the hierarchically structured information 30 identified by the “id” parameter and comparing that hash with H(E(txt_old)). If the two hashes match, the entry identified by the node identifier id is replaced with E(txt). If they do not match, the REPLACE id, H(E(txt_old)), E(txt) command is a conflicting attempt to modify a previously modified node, which is not allowed. In the case of a conflicting incremental update, the commands are appended to the hierarchically structured information 30 and may later be merged by a trusted node.
The un-trusted update processor 16 applies the incremental update 40 to the hierarchically structured information 30 by performing the specified ADD and DELETE and REPLACE commands without decrypting the individual entries of the hierarchically structured information 30. The un-trusted update processor 16 recognizes the unencrypted node identifiers in the entries of the unordered list and then deletes and adds the specified lines.
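The handling of ADD, DELETE, and REPLACE commands by an update processor may be sketched as follows. The representation is assumed for illustration: the unordered list is modeled as a dictionary from unencrypted node identifier to an opaque encrypted blob, commands are tuples, SHA-256 stands in for the hash H, and a conflicting REPLACE command is recorded in a separate list, which simplifies the appending of commands described above. No decryption takes place anywhere in the sketch.

```python
import hashlib


def apply_update(store, commands, conflicts):
    """Apply commands to store (node id -> encrypted blob) without decrypting anything."""
    for command in commands:
        op, node_id = command[0], command[1]
        if op == "ADD":
            store[node_id] = command[2]
        elif op == "DELETE":
            del store[node_id]
        elif op == "REPLACE":
            old_hash, new_blob = command[2], command[3]
            # Replace only if the stored ciphertext is the one the creator saw.
            if hashlib.sha256(store[node_id]).digest() == old_hash:
                store[node_id] = new_blob
            else:
                conflicts.append(command)  # left for a trusted node to merge
        else:
            raise ValueError(f"unknown command: {op}")


# Opaque byte strings stand in for the encrypted entries E(...).
store = {971: b"E-971", 623: b"E-623", 109: b"E-109", 558: b"E-558"}
conflicts = []
update = [
    ("ADD", 421, b"E-421"),
    ("REPLACE", 971, hashlib.sha256(b"E-971").digest(), b"E-971-new"),
]
apply_update(store, update, conflicts)
print(sorted(store), conflicts)
```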
The following is a first example of the incremental update 40 for the example hierarchically structured information 30.
ADD 421, E(U421, parent 971, <dollars>31.27</dollars>)
The ADD command adds a child node to node 971, and the REPLACE command provides an updated parent node 971 including a new unique number U′971 and child pointers for the updated parent node 971. Alternatively, the same unique number U971 may be used.
Before the first example incremental update 40 is applied, the example hierarchically structured information 30 is as follows.
After the first example incremental update 40 is applied the example hierarchically structured information 30 is as follows.
The following is a second example of the incremental update 40 for the example hierarchically structured information 30.
The DELETE command deletes the node 558, a child node to the root node 109, and the REPLACE command provides an updated root node 109 including a new unique number U′109 and updated child pointers.
Before the second example incremental update 40 is applied the example hierarchically structured information 30 is as follows.
After the second example incremental update 40 is applied the example hierarchically structured information 30 is as follows.
The incremental update 40 includes an incremental integrity code (IIC) to be applied to the overall integrity code OIC attached to the hierarchically structured information 30. The IIC for the incremental update 40 is obtained by determining an integrity code update (ICU) for each command specified in the incremental update 40 and then combining the ICUs together. For example, if the incremental update 40 includes an ADD command followed by a REPLACE command then the IIC for the incremental update 40 is as follows.
IIC=ICUA XOR ICUR
The ICUA is the ICU for the ADD command and the ICUR is the ICU for the REPLACE command in the incremental update 40.
The ICUA is determined as follows.
ICUA=ICA
The ICA is the integrity code for the node to be added with the ADD command. The integrity code ICA is computed using a one way hash of the corresponding unique number and the corresponding node data including child pointers.
The ICUR is determined as follows.
ICUR=ICR XOR ICR′
The ICR′ is the integrity code for the node to be replaced by the REPLACE command and the ICR is the integrity code for the replacement node in the REPLACE command. The integrity codes ICR and ICR′ are each determined by computing a one way hash of the corresponding unique number and the corresponding node data including child pointers.
In another example, if the incremental update 40 includes a DELETE command followed by a REPLACE command then the IIC for the incremental update 40 is as follows.
IIC=ICUD XOR ICUR
The ICUD is the ICU for the DELETE command and the ICUR is the ICU for the REPLACE command in the incremental update 40.
The ICUD is determined as follows.
ICUD=ICD
The ICD is the integrity code for the node to be deleted by the DELETE command. The integrity code ICD is determined by computing a one way hash of the corresponding unique number and the corresponding node data including child pointers.
The un-trusted update processor 16 applies the IIC to the hierarchically structured information 30 when it applies the incremental update 40 to the hierarchically structured information 30. The un-trusted update processor 16 applies the IIC to the hierarchically structured information 30 by obtaining the OIC attached to the hierarchically structured information 30 and obtaining the IIC from the incremental update 40 and computing OIC XOR IIC and then attaching the result back into the hierarchically structured information 30 as the updated OIC for the hierarchically structured information 30.
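The computation of the IIC by a trusted update creator and its application to the OIC may be sketched as follows for an ADD command followed by a REPLACE command. The sketch rests on assumptions: SHA-256 stands in for the hash H, the serialization of a node with repr is chosen only for brevity, the node data other than that of node 421 are placeholders, and the position of the new child 421 in the child list of node 971 is hypothetical.

```python
import hashlib
from functools import reduce


def ic(unique, data, refs):
    """Per-node integrity code: hash of unique number, data, and child references."""
    return hashlib.sha256(repr((unique, data, refs)).encode()).digest()


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


# Plaintext nodes as seen by a trusted update creator: id -> (U, data, child ids).
nodes = {
    109: ("U109", "A", [558, 971]),
    558: ("U558", "B", []),
    971: ("U971", "C", [623]),
    623: ("U623", "D", []),
}
oic = reduce(xor, (ic(*node) for node in nodes.values()))

# Update: ADD node 421 under 971, REPLACE 971 with updated child pointers.
added = ("U421", "<dollars>31.27</dollars>", [])
replacement = ("U'971", "C", [623, 421])
icu_a = ic(*added)                              # ICUA = ICA
icu_r = xor(ic(*replacement), ic(*nodes[971]))  # ICUR = ICR XOR ICR'
iic = xor(icu_a, icu_r)                         # IIC = ICUA XOR ICUR

# The un-trusted processor needs only the attached OIC and the IIC.
updated_oic = xor(oic, iic)

# Recomputing from the updated plaintext nodes gives the same value.
nodes[421], nodes[971] = added, replacement
recomputed = reduce(xor, (ic(*node) for node in nodes.values()))
print(updated_oic == recomputed)  # True
```

The old integrity code of node 971 appears once in the OIC and once in the IIC, so it cancels, leaving the codes of the unchanged nodes, the replacement node, and the added node.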
At step 200, a current overall integrity code OIC′ for the hierarchically structured information 30 is determined. The OIC′ may be determined using method steps analogous to the method steps 160-162 above but with the current contents of the hierarchically structured information 30.
At step 202, the OIC′ is compared to the OIC attached to the hierarchically structured information 30. If OIC′ equals OIC then incremental updates, e.g. the incremental update 40, were applied correctly to the hierarchically structured information 30. If OIC′ does not equal OIC then incremental updates, e.g. the incremental update 40, were not applied correctly to the hierarchically structured information 30.
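Steps 200 and 202 may be sketched as follows from the point of view of a trusted verifier that has already decrypted the entries. The verify function, the repr-based serialization, SHA-256 as the hash H, and the placeholder node data are assumptions of this sketch. The scenario follows the second example update (DELETE 558, REPLACE 109) and compares a correct application with one in which the DELETE command was skipped.

```python
import hashlib
from functools import reduce


def ic(unique, data, refs):
    return hashlib.sha256(repr((unique, data, refs)).encode()).digest()


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def verify(nodes, attached_oic):
    """Step 200: recompute OIC' from the current nodes. Step 202: compare with the OIC."""
    recomputed = reduce(xor, (ic(*node) for node in nodes.values()), bytes(32))
    return recomputed == attached_oic


before = {
    109: ("U109", "A", [558, 971]),
    558: ("U558", "B", []),
    971: ("U971", "C", [623]),
    623: ("U623", "D", []),
}
oic = reduce(xor, (ic(*node) for node in before.values()))

# IIC for DELETE 558 followed by REPLACE 109, then OIC XOR IIC as attached by the processor.
new_root = ("U'109", "A", [971])
iic = xor(ic(*before[558]), xor(ic(*new_root), ic(*before[109])))
attached = xor(oic, iic)

correct = {k: v for k, v in before.items() if k != 558}
correct[109] = new_root
skipped_delete = dict(before)
skipped_delete[109] = new_root  # REPLACE applied, DELETE omitted

print(verify(correct, attached))         # True
print(verify(skipped_delete, attached))  # False
```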
Add and subtract operations may be used to combine integrity codes rather than an XOR function as described above. An add operation may be performed in place of the XOR operation described above, except when removing the old value of a node. Removing an old value of a node occurs when computing the ICUs for the DELETE and REPLACE commands. A set of formulas for computing an integrity code update for the DELETE and REPLACE commands is as follows.
ICUD=0−ICD
ICUR=ICR−ICR′
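The add and subtract variant may be sketched as follows. The description does not state a width for the arithmetic, so reduction modulo 2^256 is an assumption made here to keep every code the same size as a SHA-256 output. The hash choice, the repr-based serialization, and the placeholder node data are likewise illustrative.

```python
import hashlib

MODULUS = 1 << 256  # assumed fixed width so that codes stay 256 bits wide


def ic(unique, data, refs):
    """Per-node integrity code as an integer."""
    digest = hashlib.sha256(repr((unique, data, refs)).encode()).digest()
    return int.from_bytes(digest, "big")


old_root = ("U109", "A", [558, 971])
new_root = ("U'109", "A", [971])
deleted = ("U558", "B", [])
others = [("U971", "C", [623]), ("U623", "D", [])]

# OIC before the update: sum of all per-node codes.
oic = sum(ic(*node) for node in [old_root, deleted] + others) % MODULUS

icu_d = (0 - ic(*deleted)) % MODULUS               # ICUD = 0 - ICD
icu_r = (ic(*new_root) - ic(*old_root)) % MODULUS  # ICUR = ICR - ICR'
updated_oic = (oic + icu_d + icu_r) % MODULUS

# Recomputing from the nodes that remain gives the same value.
recomputed = sum(ic(*node) for node in [new_root] + others) % MODULUS
print(updated_oic == recomputed)  # True
```

Unlike XOR, addition is not its own inverse, which is why removing an old value requires an explicit subtraction.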
Other suitable functions for combining integrity code updates may be employed in other embodiments.
The foregoing detailed description of the present invention is provided for the purposes of illustration and is not intended to be exhaustive or to limit the invention to the precise embodiment disclosed. Accordingly, the scope of the present invention is defined by the appended claims.