This invention relates in general to data security and in particular to verification of information stored in data structures.
Data structures are of course used to store all manner of data elements and, often, to relate them to each other. In some cases, the data structure is meant to encode and record indications of events in some process, such as steps in a manufacturing process, or a supply chain, or even steps in a document-processing or business process. One common concern is then verification: How does one know with enough assurance what entity has created the data entered into the data structure, and how does one know that it hasn't been altered? In some cases, verifiable indication of timing and sequencing is also important, which adds additional complexity.
One method gaining in popularity is to encode event data, for example, by computing its hash value, possibly along with some user identifier such as a Public Key Infrastructure (PKI) key, and then to store this hashed information in a structure such as a blockchain with distributed consensus and some proof-of-work arrangement to determine a “correct” state of the blockchain and which entity may update it. Many of the blockchains used for cryptocurrencies follow this model, for example, since they, usually by design philosophy, wish to avoid any central authority. Such arrangements suffer from the well-known “double-spending” problem, however, and are often otherwise unsuitable for entities such as governments, banks, insurance companies, manufacturing industries, enterprises, etc., that do not want or need to rely on distributed, unknown entities for consensus.
Several different time-stamping routines and services are available that are good at proving the time that data was signed, and that the data being verified is the same as the data that was presented at some point in the past. These systems typically suffer from one or more of at least the following weaknesses:
Because of such constraints, it follows that it is not always possible to use known timestamping services to prove that a particular sequence of events occurred in a particular, correct, or otherwise desirable order, because another sequence of events could also have received signatures, and simply be hidden from view. It also follows that it may not always be possible to define what the correct/acceptable order of events should be, because such a definition would have to exist as a unique, addressable specification for a process.
In general, as more and more services—both public and private—are performed digitally, the need for a mechanism to ensure trustworthiness of the underlying processes also grows.
Disclosed here is an arrangement for data structure verification (referred to generally as the “DSV system” or simply “DSV”) that addresses the issues mentioned above.
Among many others, DSV lends itself well to use cases such as:
These use cases, which are of course simply a few of the many possible ones, share common features: There is a need to provide a registry of users, and bind them digitally to an authorized identity (User Registry). A “registry” may be any data structure that can store digital representations of whatever items (which themselves may already be in digital form) are to be tracked. Examples are given below.
Further, in many cases Users (any entity that creates data that is to be verifiably included in the data structure) should hold a particular role or office or other authorization level at the time of their authorization (CEO of company, member of tax authority, current owner of mortgage). There is therefore often a requirement to maintain an organizational or hierarchical registry, and to be able to prove membership, change membership (joining or leaving a company, for example), revoke and add keys, etc., so that it is possible to construct a practical system that can accomplish the above using signatures, if those signatures are based on a private key of some kind. These features are not universal, however, and other use cases will have other characteristics, although the assumption is that some process is to be made verifiable.
Embodiments may be used to verifiably track any type of object—even abstract items such as steps in a chain of decisions—that can be identified, represented, or encoded in digital form. The “state” of the object may be defined in any chosen manner. In general, it will be a digital representation of at least one aspect of the object to be followed. For example, a document, possibly plus metadata, could be represented as a hash of all or part of its contents. The metadata could include such information as who its current owner/administrator is, time, codes indicating rules such as permissions, indications of decisions, etc., and/or any other information a system administrator wishes to include. In a manufacturing process, information such as unit or part IDs, digital codes assigned to the different manufacturing stations or processing steps, measurements of characteristics, shipping location, etc., could be represented digitally and form an object that may change over time. For abstract objects such as a chain of decisions, identifiers of the decision-makers, indications of times and of the respective decisions, notations, etc., could be encoded digitally in any known manner and entered into a registry and tracked.
As used here, a process is a series of actions or steps taken in order to achieve an end. Some simple examples of processes are: issuing a certificate/document; amending property ownership records or a list of voter registrations; and a series of manufacturing steps to create a product. There are of course countless other processes that comprise a series of actions or steps.
Processes may be defined as states and transitions, that is, changes of those states. For example, the state of a document might be “unauthorized” or “authorized”, and some user action may cause the state of the document to change from the one to the other. Transitions may be caused not only by intentional user action, but may also occur automatically or even naturally.
The state of something may not be the only thing a user needs to be able to trust. Consider, for example, a will, that is, a last testament. A registry might be set up to record the existence of a will, but the representative of a testator, or of a probate court, may also want to know when the state of that will was most recently changed (to be sure the testator was still competent at the time), such as by being amended or replaced by a new will; what any previous and superseded version was; and also that no other valid wills by the same testator exist, which requires some method for proof of nonexistence. It may also be necessary to be able to prove that the registry itself is performing correctly.
Several methods are known for digitally signing and/or timestamping data. In general, the system designer who wishes to implement embodiments of this invention may use any preferred such system, or systems (for example, separate systems for generating signatures and for timestamping). Nonetheless, by way of example, the Guardtime KSI® system is referred to herein and preferred because of its advantages, one of which is that it is able to generate digital signatures for data that also serve as irrefutable timestamps. Other signature solutions may also be used, however, although they should be able to perform the same functions. The Guardtime KSI® system will now be summarized for the sake of completeness.
Guardtime AS of Tallinn, Estonia, has created a data signature infrastructure developed and marketed under the name KSI® that also includes a concept of “blockchain” that does not presuppose unknown entities operating in a permissionless environment. This system is described in general in U.S. Pat. No. 8,719,576 (also Buldas, et al., “Document verification with distributed calendar infrastructure”). In summary, for each of a sequence of calendar periods (typically related one-to-one with physical time units, such as one second), the Guardtime infrastructure takes digital input records of any type as inputs. These are then cryptographically hashed together in an iterative, preferably (but not necessarily) binary hash tree, ultimately yielding an uppermost hash value (a “calendar value”) that encodes information in all the input records. To this point, the KSI system resembles a typical Merkle tree. This uppermost hash value is however then entered into a “calendar”, which is structured as a form of blockchain in the sense that it directly encodes or is otherwise cryptographically linked (for example, via a Merkle tree to a yet higher root value) to a function of at least one previous calendar value. The KSI system then may return a signature in the form of a vector, including, among other data, the values of sibling nodes in the hash tree that enable recomputation of the respective calendar value if a purported copy of the corresponding original input record is in fact identical to the original input record.
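By way of a concrete (and greatly simplified) sketch of the per-round aggregation just described, the following fragment hashes a round's input records into a binary hash tree and returns both the uppermost "calendar value" and, for each record, the sibling values needed to recompute it. All names and the signature format here are invented for illustration; this is not the actual KSI implementation.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_round(records):
    """Hash one round's input records into a binary hash tree.

    Returns (calendar_value, paths): the uppermost hash value, and for
    each record the list of (sibling_hash, sibling_is_left) pairs needed
    to recompute that value."""
    level = [h(r) for r in records]              # leaf = hash of input record
    paths = [[] for _ in records]                # one sibling path per record
    groups = [[i] for i in range(len(records))]  # which leaves each node covers
    while len(level) > 1:
        if len(level) % 2:                       # duplicate the last node if odd
            level.append(level[-1])
            groups.append([])
        nxt_level, nxt_groups = [], []
        for j in range(0, len(level), 2):
            left, right = level[j], level[j + 1]
            for i in groups[j]:                  # left-side leaves get a right sibling
                paths[i].append((right, False))
            for i in groups[j + 1]:              # right-side leaves get a left sibling
                paths[i].append((left, True))
            nxt_level.append(h(left + right))
            nxt_groups.append(groups[j] + groups[j + 1])
        level, groups = nxt_level, nxt_groups
    return level[0], paths
```

In the full infrastructure, the returned calendar value would then be entered into the calendar blockchain, and each record's sibling path would be returned to its submitter as part of the signature.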
As long as it is formatted according to specification, almost any set of data, including concatenation or other combination of multiple input parameters, may be submitted as the digital input records, which do not even have to comprise the same parameters. One advantage of the KSI system is that each calendar block, and thus each signature generated in the respective calendar time period, has an irrefutable relationship to the time the block was created. In other words, a KSI signature also acts as an irrefutable timestamp, since the signature itself encodes time to within the precision of the calendar period.
One other advantage of using a Guardtime infrastructure to timestamp data is that there is no need to store and maintain public/private (such as PKI) key pairs—the Guardtime system may be configured to be totally keyless except possibly for the purposes of identifying users or as temporary measures in implementations in which calendar values are combined in a Merkle tree structure for irrefutable publication in a physical or digital medium (which may even be a different blockchain). Another advantage is less apparent: Given the signature vector for a current, user-presented data record and knowledge of the hash function used in the hash tree, an entity may be able to verify (through hash computations as indicated by the signature vector) that a “candidate” record is correct even without having to access the signature/timestamping system at all.
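That offline verification step can be sketched as follows; the signature format assumed here (a list of (sibling_hash, sibling_is_left) pairs) is purely illustrative and not the actual KSI wire format.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify(candidate: bytes, sig_path, trusted_calendar_value: bytes) -> bool:
    """Recompute the hash chain from a candidate record toward the root.

    The candidate is accepted only if the chain ends at the known,
    trusted calendar value; no contact with the signing service is
    needed, only the sibling values from the signature vector."""
    node = h(candidate)
    for sibling, sibling_is_left in sig_path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == trusted_calendar_value
```

Any change to the candidate record, or to any sibling value, propagates up the chain and causes the final comparison to fail.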
Yet another advantage of the Guardtime infrastructure is that the digital input records that are submitted to the infrastructure for signature/timestamping do not need to be the “raw” data; rather, in most implementations, the raw data is optionally combined with other input information (for example, input server ID, user ID, location, etc.) and then hashed. Given the nature of cryptographic hash functions, what gets input into the KSI system, and thus ultimately into the calendar blockchain, cannot be reconstructed from the hash, or from what is entered into the calendar blockchain.
If used in this embodiment of the DSV system, the KSI (or other chosen) system is preferably augmented with additional capability, which provides the following additional properties:
As with most blockchain technologies, it is desirable to make sure the system continues to operate correctly even when a central party is misbehaving (perhaps due to malice, corrupt employees, incompetence or, for example, hacking). The aim of the particular design is generally to see to it that the worst the central system administrator or operator can do is to turn off various parts of the system—which will be obvious to system users—but the central operator at least cannot make the system “tell a lie” without eventually being found out (hopefully quickly, e.g. within minutes or seconds).
The main principle of DSV operation is that all state changes happen as a consequence of events (“transactions”). An event might be for example “User 1 sends $10 to User 2”, or “tax office rejects claim #321”; the system may guarantee that all users will eventually agree on the exact sequence of these events (even if the central operator cheats), and thus everyone can compute correctly all the state/outputs of the system (e.g. “User 1 now has 30 dollars”). The events may be digitally signed by their originators, thus ensuring that the central operator cannot forge events.
As a typical speed optimization, the latest state may be kept by the central operator as well, in a special structure called “state tree”. Options for implementing such a state tree are presented below.
Note that both events and the resulting state may be “sharded” so that users see only events and states that they are allowed to see, but nonetheless can verify the completeness and correctness of their own data. To this end, in one embodiment, the DSV system preferably includes a “gossip” mechanism for published information, that is, for information entered into some medium that is, in practice, immutable and irrefutable. See, for example, https://en.wikipedia.org/wiki/Gossip_protocol for a summary of “gossiping” in this context.
In short, the DSV system periodically, during a series of “aggregation rounds”, collects all new events (for example, all completed manufacturing steps in the last hour, currency transfers in the last second, etc.) and aggregates them all into its Event Log Merkle (hash) tree 330. The resulting state changes may then be represented in the State Tree 320, which may be configured as a sparse Merkle tree (SMT)—for example, every account could have its latest balance in there, potentially with history and various metadata. The root values of the Event Log hash tree and the SMT may then be aggregated (for example, during each predetermined aggregation period) into a history tree, called here a “Tree Head Log” 310 (such as the log 130 in the figures). The root of the Tree Head Log may be periodically signed by the central operator and that signature is potentially timestamped. A KSI signature may, as mentioned above, itself encode time. The resulting signed hash value may then optionally be “gossiped”, that is, distributed, to all or some other users to make sure they are all seeing the same view of the events and their results.
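The round structure just described can be illustrated with a simplified sketch. The class and method names here are invented for illustration; a real deployment would also sign and timestamp each published root, as described above.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

class TreeHeadLog:
    """Toy per-round aggregation: each round's Event Log root and State
    Tree root are combined into a tree head, which is then chained into
    a running history so the latest published root covers all rounds."""
    def __init__(self):
        self.heads = []               # one tree head per aggregation round
        self.log_root = h(b"empty")   # running root of the history

    def close_round(self, event_log_root: bytes, state_tree_root: bytes) -> bytes:
        tree_head = h(event_log_root, state_tree_root)
        self.heads.append(tree_head)
        # chain the new head onto everything published so far; in the real
        # system this root would now be signed, timestamped, and gossiped
        self.log_root = h(self.log_root, tree_head)
        return self.log_root
```

Because each published root is a hash of the previous root and the new tree head, any user holding the latest root implicitly holds a commitment to every earlier round.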
“Gossiping” may also be achieved via anchoring to other blockchains and other means; various optimizations also apply—e.g., the clients need to keep only the latest publication, plus its underlying data. Events from the past will generally also be needed for re-verification; some of that data may be selectively re-downloaded from the server as needed.
Note that, in the general case, if publication occurs every second, this would result in roughly 30 million publications per year; thus, a verifier would need to store at least 30 million hash paths per year (and these can be different hash paths for each user), even if there are no events for that user (because the verifier needs to double-check the claim that there are no events, for each and every publication). There are several ways to optimize this for special scenarios (see below), for example, zero-knowledge proofs, including the idea of additionally also gossiping the hashed transactions per every user, in various sizes of gossip circles.
In
The Event Log represents events for each ID under the given publication round. It forms a verifiable map, mapping from a (hash of) ID to a hash of the list of transactions for that ID. That list of transactions may be a tree itself, or it could be a simple list. In a typical blockchain use-case, the transactions may be signed by their authors; however, one could skip the signature if, for example, the transaction was authored by the central server in which the various data structures are implemented.
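A minimal sketch of such a verifiable-map leaf might look as follows. The chained-hash encoding of the transaction list is just one illustrative choice; as noted above, the list could equally be a small Merkle tree.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def event_log_leaf(process_id: str, transactions: list) -> tuple:
    """Map a (hashed) process ID to a hash of its transaction list for
    the round.  The list is folded as a simple hash chain here; it could
    equally be built as a small Merkle tree."""
    key = h(process_id.encode())      # the map key: hash of the ID
    value = h(b"empty-list")
    for tx in transactions:           # each tx may itself carry a signature
        value = h(value, tx)
    return key, value
```

Note that the value is order-sensitive: reordering transactions within a round produces a different leaf, so the map commits to the sequence of events, not just their set.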
For privacy, every user may be constrained to download only tree paths that are for their IDs and that they are allowed to see; and yet users may still see that they have a complete list of transactions for their IDs (a proof of “non-inclusion” as well as “inclusion”).
The State Tree's Merkle tree shows the latest state in a given round after applying all events of the round. For example, every user could have its latest account balances there. For privacy and efficiency, this tree may also be “sharded” by IDs, and various state keys and values (for a specific ID) may also be represented as their own tree whose root may be included in the state tree for the given ID.
The Tree Head Log, which may also be viewed as a “history tree”, stores preferably all roots of the other trees for all publication times. Compared to a typical blockchain design, this history tree gives much shorter proofs for data under previous publications when starting from a more recent publication as a trust anchor.
For increased privacy, the nodes of the above trees may be built such that every node, or some set of nodes, is provided with one or more fake sibling nodes, in order to hide whether or not there is a different branch in the tree. It may then be possible to hide the fact that, for example, a particular entity is a customer; otherwise, a company whose name shares a large enough prefix with that customer's would be able to see that there are no branches in the tree that could refer to that customer.
The DSV system addresses several challenges, which include:
Efficiently Verifying Correct Operation
It is desirable to be able to efficiently prove/verify that the log-backed map is internally consistent (that, starting from an empty map and applying all mutations listed in the log, one arrives at the current state of the map). This challenge may be met by the following:
Proof that there were No Changes in the Given Time Period
The term “process ID” (shown as ID) below is used to refer to (the name of) a private channel of communication, usually with restricted access. For example, in a bank, every account could have its own process ID; this way, revealing information about one account (one “process”) does not require revealing information about any other account.
Problem: It takes a lot of time for an auditor to perform a full scan of the entire history (essentially, checking all published hashes ever, and for each of them, all hash paths underneath) to ensure the DSV server didn't behave maliciously. This can be mitigated by:
Not every user may always gossip with all other users about such proofs. For example, users of a lower-level DSV may be the only ones gossiping about transactions in their own DSV instance. This option may, nonetheless, be suitable in cases where entities wish the patterns of their DSV instances to remain private from the rest of the world, or where there is heavy traffic in the hands of a small number of people, although both cases would typically be less secure due to the smaller number of nodes gossiping the data.
Prove No Split View
Again, to address this issue, there are alternative embodiments of a solution:
In addition to the messaging techniques used to propagate gossip, the structure of the gossip message should be specified. The design of this message should support the goals of the gossip function within DSV, namely, to allow users to efficiently audit the server as it operates, and to detect split-view attacks and other forms of incorrect operation.
Each Gossip may have two components, which are created at each publication interval:
The pub/sub level gossip message should contain:
The Supplemental Object preferably contains:
Given this technique, parties who are interested in auditing would listen to the desired gossip channel and receive the published messages from the server. On their local machine, these parties should maintain an array of all ProcessID's and the index at which they were most recently updated. In order to participate in auditing, when the new gossip comes, they would:
7) If Process Backlinks agree, update the cache for the current batch of processID's, so that they are paired with the current index.
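The cache-update and conflict-detection steps can be sketched as follows; this is a toy auditor cache, and the function and parameter names are invented for illustration.

```python
def audit_update(cache: dict, index: int, backlinks: dict, changed: list) -> list:
    """One auditing step for a newly received gossip message.

    `backlinks` maps each referenced ProcessID to the index at which the
    server claims it was last changed; `changed` lists the ProcessIDs
    updated in the current batch.  Any disagreement with the local cache
    is returned as evidence of a possible split view / flipflop."""
    conflicts = [p for p, last in backlinks.items()
                 if p in cache and cache[p] != last]
    if not conflicts:                 # backlinks agree: update the cache
        for p in changed:
            cache[p] = index
    return conflicts
```

When the returned list is non-empty, the auditor holds two signed, conflicting statements from the server and can proceed to assemble evidence as described below.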
This describes a type of split-view attack, called a “flipflop”, in which the server makes an unauthorized change to a Process State, then changes it back, and then tries to cover it up by representing that the flipflop did not occur, thereby concealing that the attack happened.
Assume that, between indexes 7 and 11, there was an attack. Under the above proposal, when users receive and download the Supplemental Object for index 11 and double-check their cache, following the procedure above, they will see that the cache indicates that Process A was most recently changed at index 5, not index 2 as indicated in the Supplemental Object.
They would then like to prepare evidence that the server has signed two conflicting statements, that is, the set of signed backlinks from index 7 and the object they just received at index 11.
Assume that the history of Supplemental Objects is not saved by this user, but is available on a content-addressable distributed file system. Object 11 is in their possession, but because they have not saved the object from index 7, they must retrieve it. This can be achieved using the Content Backlinks from object 11.
In one embodiment, where there was a single Content Backlink to the previous Supplemental Object, entities would need to walk the chain backwards from object 11, to 10, then 9, 8, 7.
This can be improved upon, to achieve O(log(n)) traversal of the Audit Object. A second, improved technique is to include not only the Content hash of the previous Supplemental Object, but also additional links to Supplemental Objects from older indices. In this way, each Supplemental Object contains an array of links to several older objects, with increasingly larger skips. For example: include Content Backlinks to the current index minus 1 (the previous object), the current index minus 10 (ten old), the current index minus 100 (100 old), the current index minus 1000, and so on. This provides O(log(n)) traversal; that is, in order to walk back 2222 steps, one would only need to follow 8 links, instead of 2222. Using this optimization, each traversal step requires a retrieval operation from the distributed content-addressable file store, which will typically be slower than following a pointer in memory.
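The skip arithmetic can be illustrated with a short sketch: greedily follow the largest backlink that does not overshoot the target index, using the decimal skip sizes from the example above.

```python
def backlink_steps(current: int, target: int, skips=(1000, 100, 10, 1)) -> list:
    """Walk backwards from one publication index to an older one, always
    following the largest Content Backlink that does not overshoot."""
    path = []
    while current > target:
        step = next(s for s in skips if current - s >= target)
        current -= step
        path.append(current)
    return path

# walking back 2222 indices takes 2+2+2+2 = 8 hops instead of 2222
```

With skip sizes growing by a constant factor, the number of hops needed to reach any older index is logarithmic in the distance.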
The problem with this approach is that the size of the Supplemental Objects increases greatly, and duplicates information. A further improvement may be made as follows:
Instead of including a large array of Content Backlinks in each Supplemental Object, most Supplemental Objects may contain only a single Content Backlink to their immediately prior objects. Then, at regular intervals, Sentinel Objects may be created, which contain a larger number of Content Backlinks. This can still be arranged to provide O(log(n)) traversal (albeit with a larger constant) but dramatically reduce the storage required for the Supplemental Objects. Additionally, since the position of these Sentinel Objects is known in advance, and their utility is high, there then exists an incentive for some users to replicate these Sentinel Objects, in order to assist the network in traversal requests.
The different “privacy circles”—called “process ID” in this document—may also be used as different “name spaces” for different services, customers, etc.
See
The hierarchies could be statically partitioned, for example, by geography, organization domain names, etc. On each level, or, for example, only on the bottom levels, actual process IDs with business data may be used. The topmost DSV may then contain the publications of different geographic continents; the next layer might contain continent-specific publications for industries (for example, health care, supply chain, etc.); and the layer under these might contain publications for organizations (for example, Company ABC, Bank XYZ, etc.), under which they would each store with their respective Process IDs. Thus, every company would have its own DSV instance on the bottom level.
The configuration may also be dynamic—as DSV supports smart contracts, there could be specialized smart contracts (with proper permissioning) to handle exactly where in the hierarchy one would find specific process IDs, and their positions could change over time, for example, to share loads across servers, etc.
As
The various hash trees do not have to be binary, including the tree of
A smart contract could be hard-coded into verifiers, or it could be upgradable “in flight” by a permissioning scheme, etc. The contract could be very simple: a degenerate case would be just a listing of processes that have to be in a specific place in a tree, with a default location by name for every other process. Contracts could also be more complex, such as a smart contract that dynamically determines the location of items in the tree based on a real-time bidding market. Since any updates to the functioning of such a smart contract need to be known to every verifier, care needs to be taken to ensure that the updates to the smart contract itself are verified and transmitted in an efficient manner.
All the above mechanisms of efficiency may still apply—for example, checkpointing could be used to ensure that the smart contract could only be updated once every hour/day/etc., and the updates could be of limited size and may even be limited by the number of operations they are allowed to execute, thereby reducing the need to download a large number of updates to the smart contract itself.
The previously illustrated embodiments of the DSV system are done using Merkle trees. An alternative uses skip lists (see https://www.cs.cmu.edu/~ckingsf/bioinfo-lectures/skiplists.pdf) as a replacement for at least the Mutation Log Merkle tree. This option is illustrated in
If the data needs to be kept secret from the entity hosting the DSV server, the Mutation Log entries and the State Tree can be encrypted by the customer organization. The encryption/decryption keys may then be held by the customer. One method of deriving keys is to hold the keys in the form of a Merkle tree with the root of the tree holding a key derived from the process ID (explained above). Further, child nodes may derive more keys based on the root key above. Any general-purpose key derivation function may be used. The key would need to be shared with the auditor, or, alternatively, another level of encryption can be added to encrypt using the auditor's keys.
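One possible key-derivation sketch follows, assuming a generic HMAC-based derivation function; the labels and the customer-held secret are illustrative assumptions, and any general-purpose key derivation function could be substituted, as the text notes.

```python
import hashlib
import hmac

def derive_key(parent_key: bytes, label: bytes) -> bytes:
    """Derive a child key from its parent.  Any general-purpose key
    derivation function may be used; HMAC-SHA256 is one common choice."""
    return hmac.new(parent_key, label, hashlib.sha256).digest()

def root_key(customer_secret: bytes, process_id: str) -> bytes:
    """Root of the key tree: derived from the process ID, keyed with a
    secret that never leaves the customer organization."""
    return derive_key(customer_secret, b"root|" + process_id.encode())
```

Child nodes of the key tree can then derive their own keys by calling derive_key with their position in the tree as the label, so that disclosing one subtree's key to an auditor reveals nothing about sibling subtrees.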
The DSV server digitally signs all Tree Head roots that it publishes. These signatures may be time-stamped, for example, by using KSI. This time stamping would ensure the following:
If the server's key were to ever leak, any future signatures with the same key could be automatically invalidated by the lack of a timestamp predating the leak. (The timestamp would also be included in gossip, as it is part of the data that is necessary to authenticate the server's signature.) Thus, the leaked key could not be used to falsely implicate the server for a split view.
That signature timestamp would also necessarily cover all the data in DSV (because the server's signature would naturally cover all that data).
For some data structures used in embodiments, a hash tree structure known as a “Sparse Merkle Tree” (SMT) is particularly advantageous. The structure and characteristics of an SMT will now be summarized, for completeness, followed by an explanation of how SMTs may be used in embodiments.
See
x8B = hash(x89 | xAB) = hash(hash(x8 | x9) | hash(xA | xB))

x0F = root = hash(x07 | x8F) = hash(hash(x03 | x47) | hash(x8B | xCF)) = . . .
and so on, where “|” indicates concatenation.
The path in the tree from a leaf to the root may be defined by a vector of “sibling” values. Thus, given value x6, for example, and the vector (x7, x45, x03, x8F), it is possible to recompute the sequence of hash functions that should, if all values are unchanged from the original, result in the root value x0F.
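This recomputation can be sketched as follows; the binary digits of the leaf's index determine, at each level, whether the current node is hashed as the left or the right input (the left/right convention is an illustrative assumption, since the text does not fix one).

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def root_from_leaf(index: int, leaf: bytes, siblings: list) -> bytes:
    """Fold a leaf value up a perfect binary tree using its sibling
    vector; at each level, the low bit of the index tells whether the
    current node is the left or the right input to the hash."""
    node = leaf
    for sibling in siblings:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node
```

For a 16-leaf tree this takes exactly four hash evaluations, one per level, regardless of how many leaves the tree holds.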
The simple Merkle tree illustrated in
In embodiments, the value that is assigned or computed (such as via hashing) for an object, such as a process, is the “key” which is used to determine which leaf of an SMT the current value associated with the object is to be assigned to. In the greatly simplified example of
In
Now assume that the leaf values represent all the 16 possible values of a 4-bit binary word, that is, 0000, . . . , 1111, and that one wishes to determine if the node in position 0001 is “used”, that is, contains a non-null value. Using the convention chosen for this example, the value 0001 corresponds to downward traversal left-left-left-right from the root, which leads from the root, to the node marked γ, to the node marked α (whose “sibling” node is marked β), and then to a node whose value is Ø2. At this point, however, there is no need to examine the tree further, since a node value of Øn indicates that there is no node below that node that has a non-null value. Thus, in this case, traversing the tree to the Ø2 node is sufficient to prove that no value has been entered into the data structure corresponding to leaf position 0001. This also means that it is not necessary to allocate actual memory for a value at position 0001 until it is necessary to store a non-null value in that node.
But now assume that one wishes to determine if any leaf has a non-null value in positions 1000 to 1111. Since the highest order bit for all of these is a “1”, the first step in the tree traversal is to the right, and the first node in that path has the known value Ø4, which indicates that no leaf value in any path below that node has a non-null value. There is no need to examine the tree further.
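The Ø-value shortcut can be sketched as follows: a toy sparse Merkle tree of height 4 (16 leaves, matching the example above) in which empty subtrees are represented by precomputed Ø values rather than materialized nodes. The null-leaf encoding and function names are illustrative assumptions.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

DEPTH = 4  # 16 leaves, as in the example

# EMPTY[n] is the Ø value for an entirely empty subtree of height n
EMPTY = [h(b"null")]
for _ in range(DEPTH):
    EMPTY.append(h(EMPTY[-1], EMPTY[-1]))

def smt_root(entries: dict, height: int = DEPTH, prefix: int = 0) -> bytes:
    """Root of a sparse Merkle tree holding `entries` (leaf index -> value).
    Empty subtrees are never materialized; their precomputed Ø value is
    used instead, which is what makes non-inclusion proofs cheap."""
    if not entries:
        return EMPTY[height]
    if height == 0:
        return entries[prefix]
    half = 1 << (height - 1)
    left = {k: v for k, v in entries.items() if k < prefix + half}
    right = {k: v for k, v in entries.items() if k >= prefix + half}
    return h(smt_root(left, height - 1, prefix),
             smt_root(right, height - 1, prefix + half))
```

A verifier who sees the precomputed Ø value for a whole subtree knows at once that no leaf beneath it is used, without descending any further, exactly as in the 1000-1111 example above.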
In the very simple example illustrated in
In general, embodiments include:
At least one Verifiable Data Structure (VDS), such as the Verifiable Map (for example, a sparse Merkle tree), which forms a trust anchor. A VDS may be a data structure whose operations can be carried out even by an untrusted provider, but the results of which a verifier can efficiently check as authentic. In one embodiment, the VDS may be implemented using any known key-value data structure. In one embodiment, the preferred key-value data structure is a sparse Merkle tree in which the key indicates the “leaf” position in the tree, with the associated data value forming the leaf value itself. As just a few examples, the key for a real estate registry could be the property ID, with owner information as the corresponding value; the key in a voter registry could be a voter registration number plus, for example, a personal identifier such as a national ID number, with the actual voter information as values; and in a VAT registry, invoice numbers could form keys, with the invoice values being the corresponding values.
Verifiable Log of Registry Changes (VLORC), which is a data structure that enables auditing and can indicate the most recent state of tracked objects. The VDS and VLORC may be implemented within the same central/administrative server.
Verifiable State Machine (VSM), which forms a registry for object state. The State Tree described above is an example of such a data structure. The VSM may be stored and processed in any server that is intended to keep the central state registry.
Proofs, which may be held by users, and which comprise digital receipts (such as signatures) of data that has been submitted for entry in the various data structures. For tree structures such as a SMT, the set of sibling values from a leaf to the root may form a proof. The root of the SMT may in turn be published in any irrefutable physical or digital medium such that any future root value presented as authentic can be checked against the published value. In general, there will be a new root for each aggregation round, that is, for each time period during which leaf values may be added or changed.
To better understand what the different structures accomplish, consider the use case of voter registration. In many jurisdictions, such as in most in the USA, a prospective voter must apply for entry into the voter roll, that is, registry, associated with a particular election district. Assume that a prospective voter wishes to submit an application for voter registration. The application (with its data), and the identity of the prospective voter, may be represented in digital form in any known manner and may be associated with some identifier, such as a hash of all or part of its contents (along with any chosen metadata), which may form a key. Let hash1 indicate the representation of the initial state of the application, for example, the hash value at the time of submission. hash1 may then be entered as a “leaf” value in the VDS, and thus be bound to the root hash value of that tree for the respective aggregation time.
At the same time, a representation of the state “Applied for” may be entered into the VSM. As part of the processing of the application, the application may be approved, which may be registered in the VSM as a change of the corresponding entry to “Registered”. This will also cause a change of the hash path from the new entry up to the root of the VSM. Either the user may then be given proofs of VDS and VSM entry (hash paths or other signatures), or these may be combined and signed as a unit. The VLORC may then, for example, register the time at which the application state changed. The proof in the VLORC may then also be returned to the user if desired.
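A minimal sketch of this flow, using in-memory dictionaries as stand-ins for the VDS and VSM entries (the names `submit_application` and `approve` are hypothetical, and a production system would of course use the hash-tree structures described above rather than plain tables):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

vds_leaves = {}   # key -> hash1, the representation of the application's initial state
vsm_states = {}   # key -> current state of the tracked object

def submit_application(voter_id: str, application: bytes) -> bytes:
    key = h(voter_id.encode())      # identifier derived from the applicant's identity
    hash1 = h(application)          # initial state at the time of submission
    vds_leaves[key] = hash1         # bound to the VDS root for this aggregation round
    vsm_states[key] = "Applied for"
    return key

def approve(key: bytes) -> None:
    # A state change; in the real VSM this alters the hash path up to the root.
    vsm_states[key] = "Registered"
```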
For all of the embodiments and use cases described above, certain issues of verifiability may arise, such as, without simply trusting the registry:
The VLORC addresses these questions. See
Assume a VDS instance that operates in rounds, that is, periods during which values are accumulated and a new root value of the SMT 1000 is computed. The length of each round may be determined by the system designer according to what types of data objects are to be tracked. For example, in a manufacturing process for large products, or changes of land ownership in a relatively small jurisdiction, changes may not happen rapidly, and a round could last several seconds or minutes or even longer. If all the accounts receivable of a large enterprise are to be tracked, however, or all financial transactions relating to many accounts, then more frequent rounds may be preferable. It is not necessary for rounds to be of the same length, although this will often be most convenient for the sake of bookkeeping. Also, if the VDS instance is to be synchronized with another infrastructure such as KSI, for example for the purpose of generating timestamped signatures, then it will generally be advantageous to arrange at least the time boundaries of VDS rounds to correspond to the time boundaries of KSI accumulation/calendar periods.
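The mapping from a timestamp to its round can be as simple as integer division, sketched here with an illustrative 60-second round length (equal-length rounds are assumed; the epoch could be chosen so that round boundaries align with KSI calendar periods):

```python
ROUND_LENGTH_S = 60   # illustrative; chosen by the system designer per use case

def round_number(timestamp_s: int, epoch_s: int = 0) -> int:
    """Return the accumulation round into which a given time falls."""
    return (timestamp_s - epoch_s) // ROUND_LENGTH_S
```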
Assume by way of example (
One leaf of the SMT 1000 is chosen to be a “Key change” or “Delta” (Δ) leaf 1010. The value of the Δ leaf is a function of indications of when the most recent previous change was made relating to each non-null leaf, including any leaf that is changed from null to non-null in the current round. Let Ki:n indicate that key i most recently changed (or was first registered, if not previously entered) in round n. Thus, since the state corresponding to keys K1 and K2 changed in Round1, the Δ leaf encodes K1:1 and K2:1.
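One way to derive the Δ leaf value is to fold the Ki:n entries into a single digest; the canonical sorted encoding below is an assumption for illustration only, since the text leaves the encoding open:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def delta_leaf_value(changed_keys: dict) -> bytes:
    """Fold K_i:n entries (key bytes -> round of most recent change) into
    one digest. Sorting makes the value independent of insertion order."""
    acc = b""
    for key, n in sorted(changed_keys.items()):
        acc = h(acc + key + n.to_bytes(8, "big"))
    return acc
```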
Note that initial entry of a key value forms a special case: the value n will be the same as the round in which the instance of the structure 1100 is found. In other words, since K1:1 and K2:1 are indicated in the structure 1100 and Δ leaf 1010 of the SMT 1000 for Round1, one can know that these are initial entries. Other indicators of initial entry of a key value may also be chosen, however, as long as they unambiguously indicate in which round the values are first registered in the SMT 1000. For example, in
The information Ki:n for all i and n may be contained in any chosen data structure 1100. Since Ki will typically not directly reveal what data object it corresponds to, this structure may be revealable, which will also aid in auditing. A simple table (such as a hash table) or array may then be used as the Changed keys data structure, arranged in any preferred order. Another option for the data structure 1100 is yet another sparse Merkle tree, whose root value is passed to the SMT 1000 to be the value of the Δ leaf. The value n may then be assigned as the value of the leaf at the position corresponding to the key value Ki. As still another option, the Changed keys data structure could be configured as a skip list, which, as mentioned above, allows for insertion and is relatively efficient to search.
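The simplest of these options, a plain hash table keyed per round, might look as follows (the layout is hypothetical; an SMT or skip list would replace the inner dictionaries without changing the lookup semantics):

```python
# Per-round Changed keys structures; each maps a key to the round K_i:n of
# its most recent change, with n equal to the current round for initial entries.
changed_keys_by_round = {
    1: {"K1": 1, "K2": 1},   # Round1: both keys first registered
}

def last_change(round_no: int, key: str):
    """Return when `key` most recently changed, as recorded in `round_no`,
    or None if that round's structure has no entry for the key."""
    return changed_keys_by_round.get(round_no, {}).get(key)
```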
Assume (
In the illustration, the SMT 1000 leaf K2 value has not changed since Round1, so this leaf value remains hash(5678J), with an indication of Round1.
Now assume (
For each round, the root value (root1, . . . , root9, . . . , root15, . . . ) of the SMT 1000 is preferably immutably registered, for example, by entering it directly into a blockchain, or by submitting it as an input value (leaf) of the KSI signature infrastructure, which would then have the advantage of tying each round's root value to a calendar time.
Whenever a root value of a hash tree is generated, such as the SMT 1000, a proof is preferably returned to the user, and/or otherwise maintained for another entity, such as an auditor. The proof may be the parameters of the leaf-to-root hash path. If the root of one tree (such as SMT 1000) is used as an input leaf to a higher-level tree, then the proof may be extended up to the higher level root and, ultimately, in the cases in which the KSI infrastructure is used to sign and timestamp values, all the way to a published value that will also be available to an auditor.
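Extending a proof across tree levels simply continues the same recomputation, with the lower root treated as an input leaf of the higher tree; the sketch below assumes each input leaf is hashed once on entry (a convention chosen for illustration, not something the text fixes):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def climb(node: bytes, siblings: list) -> bytes:
    """Recompute one tree's hash path from a node up to its root."""
    for sibling, sibling_is_left in siblings:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node

def verify_extended(leaf: bytes, lower_path: list, upper_path: list,
                    published_root: bytes) -> bool:
    lower_root = climb(h(leaf), lower_path)
    # The lower tree's root enters the higher-level tree as an ordinary leaf.
    return climb(h(lower_root), upper_path) == published_root
```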
Now again refer to
The SMT 1000 and Changed keys data structure 1100 for each round may be stored and made available by the central administrative server, or by any other entity. Especially if the SMT 1000 leaves do not contain “raw” client data, but rather only hashes, the SMT 1000 will not reveal any confidential client information. Note that new proofs are preferably generated for each value added to or changed in the leaves of the SMT 1000, but need not be regenerated for unchanged leaves—if the value of a leaf has not changed for some time, then the auditor may check the proof at the time of most recent change, which the auditor will be able to find by going “backwards” in time using the Changed keys data structure 1100. Clients preferably store all proofs for “their” respective state values (that is, SMT 1000 leaves) so that they may be presented to auditors; alternatively, or in addition, proofs may be submitted to any other controlling entity for storage, including the auditing entity itself.
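The backwards traversal can be sketched as follows, assuming each round's Changed keys entry for a key names the round of the previous change, with an entry pointing at its own round marking first registration (consistent with the initial-entry convention above):

```python
def change_history(key, current_round: int, changed_by_round: dict) -> list:
    """Follow K_i:n links backwards from the most recent change down to
    the round in which the key was first registered."""
    history = [current_round]
    n = current_round
    while True:
        prev = changed_by_round[n][key]   # round of the most recent previous change
        if prev == n:                     # self-reference: initial registration
            return history
        history.append(prev)
        n = prev
```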
In the cases in which hash values of data objects are registered, such as hash(FGHJK) instead of FGHJK directly, the entity being audited will reveal the “raw” values to the auditor. As long as the hash function used in the SMT structures is known (for example, consistently SHA-256 or the like), then the auditor will also be able to compute its hash value, without the raw data having to be revealed to any other entities. The Changed keys data structure 1100 may, however, for the sake of transparency, be revealed, since it need not contain any “raw” data that identifies any particular user, account, data object (such as a unit of digital currency or other negotiable instrument), etc.
Rather than having a single Changed keys data structure, it would also be possible for clients to maintain respective Changed keys data structures containing information only for their own keys Kj. The roots or other accumulating values of these structures may then be combined by the administrative server that maintains the SMT 1000 in any known manner, such as by aggregating them in yet a separate SMT or other hash tree, whose root value forms the Δ leaf value. The clients should then retain proofs from “their” entries (“leaves”) to roots, and up to the roots of at least one tree maintained by the administrative server, such as SMT 1000, to prevent any later alteration.
The central, administrative server should store the VLORC SMT 1000 for each round. Assume that a client being audited with respect to the data object whose key is K1 reports a current value of ABC12 to the auditor. The auditor may then contact the administrative server and download the most recent VLORC SMT 1000, compute hash(ABC12) and see that it matches the current value for the K1 leaf. The auditor will then also see the “linkage”, via the Changed keys data structure, back to Round15, to Round9, and to Round1, along with the respective values at those times (the auditor may, for example, request “raw” data from the client). Note that, since other metadata may be entered into a leaf value in addition to the hash(...) and Round:j data, the auditor will be able to confirm this as well from the proof generated when any change was registered. In short, by following the values in the Changed keys data structure 1100 iteratively “backwards” in time, an auditor may track the entire change history of a data object back to the round during which it was first registered in the SMT 1000.
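The first of the auditor's checks, recomputing the hash of the client's reported raw value against the registered leaf, reduces to a comparison like this (SHA-256 assumed, as elsewhere):

```python
import hashlib

def h(raw: str) -> bytes:
    return hashlib.sha256(raw.encode()).digest()

def matches_registered_leaf(reported_raw: str, leaf_value: bytes) -> bool:
    """The registry never needs to hold raw data: the auditor recomputes
    the hash of the value the client reveals and compares it to the leaf."""
    return h(reported_raw) == leaf_value
```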
The auditor may then also recompute the proofs associated with the current K1 and previous K1-associated values and confirm that this leads to the correct root values. This ensures that the SMT structure 1000 itself was not improperly altered.
In the embodiment illustrated in
This application claims priority of U.S. Provisional Patent Application No. 62/787,194, which was filed on 31 Dec. 2018.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US2019/069121 | 12/31/2019 | WO | 00 |
| Number | Date | Country |
|---|---|---|
| 62787194 | Dec 2018 | US |