This invention relates to an electronic system and method for generating a representation of non-deterministic time that can be used for, among other things, verifying that an event did not occur before a specific time.
In the age of digital imaging and high-quality image-editing software, the maxim “Seeing is believing” is becoming increasingly outdated. Nowadays, few cosmetic or clothing advertisements, billboards, magazine photos of celebrities, etc., are not digitally altered in some way. In fact, so many people have come to internalize a visually false portrayal of “reality” that at its 2011 meeting, the American Medical Association adopted a proposal calling on advertisers to develop guidelines to discourage the altering of photographs in a manner that could promote unrealistic expectations of appropriate body image. One attendee observed: “The appearance of advertisements with extremely altered models can create unrealistic expectations of appropriate body image. In one image, a model's waist was slimmed so severely, her head appeared to be wider than her waist.” The issue of altered images warping people's perceptions of some form of ideal has even reached the political level. In 2009, for example, various politicians in France proposed a law that would require a “warning” label on all advertising, press, political campaign, art photography and packaging images if an image has been retouched, in particular, digitally altered.
Alteration of image content is only one aspect of the concerns involving image manipulation: In many cases, the time an image is created may also be essential information. For example, the time an image was first created can be the determining factor in the relevance of photographic or video evidence in a criminal trial. At present, in many jurisdictions, whether an image or sound recording is admissible as trial evidence is typically left to the discretion of the judge, and whether the admitted evidence proves the occurrence or non-occurrence of some event at a given time is often a question that the court—either the judge or jury or both—decides at least in part subjectively. This means that the accuracy of the determination is a function of the relative sophistication of the forger versus the court.
Similarly, a casino manager or a punter watching an online horse race will want to know if the video he is watching is really “live”, or at least is showing events at the times they are supposed to have happened. With known technology, these viewers must mostly simply trust that there is no manipulation happening, hoping that later information confirms what they saw/heard.
As every tourist knows, a camera can be set to show the time/date a photo was taken. This time/date value, however, proves little or nothing, since it is so easy to change these settings in a camera. The time/date value could be derived from some external source, but this then simply moves the question of reliability from the local device to that external source and the transmission medium between the two.
A more sophisticated method would be to submit the contents of the image to a service that digitally time-stamps them, perhaps along with image metadata. One of the problems with traditional digital time-stamping, however, is that it proves only that data existed before a particular point in time—one can easily take a photo, edit it, then later digitally time-stamp the edited version, thereby forward-dating the image. In other words, typical time-stamping can establish the latest time an image could have been time-stamped, but this doesn't prove that it couldn't have been created and altered earlier. Traditional digital time-stamping works well if there is general acceptance of the time something happened. For example, if a major unexpected news event is generally known to have occurred at 11:37:46 UTC and a photo of the event is digitally time-stamped indicating 11:37:46 UTC, then there is an exceptionally low probability that the photographer will have had time to edit the photograph at all before obtaining the time stamp. Absent such an external confirmatory event, however, conventional digital time-stamping may provide a high level of assurance that an image hasn't been back-dated, but it typically cannot enable detection of forward-dating.
One other drawback of traditional digital time-stamping schemes is the nature of the schemes themselves. Many known digital time stamps rely on a public key infrastructure (PKI). In the context of time determination, one disadvantage of PKI-based time-stamping systems is that users of such systems must simply trust the accuracy of the system's time reading, even though there is no ability to independently verify it after the fact. One other disadvantage of such PKI-based signing schemes is that, by their very nature, they require the creation, administration and maintenance of the keys. Moreover, for reasons of security, digital certificates, and the PKI keys underlying them, are often allowed to expire. PKI keys also have an operational lifetime after which information that has been time-stamped with those keys needs to be re-time-stamped in order to ensure the time stamp is still valid. A compromise of those keys, or even the possibility of a compromise (by insiders intent on fraud or outsiders, such as hackers, intent on profiting from a key compromise) will cause the digital time stamp to be easily challenged. Digital time-stamping is thus not absolute proof but rather more an attestation by the authority that administers the keys.
The problems just described also apply to other types of data. For example, there may be a need to verify the time of a recorded audio event. Audio files are often even easier to edit and forward-date than video.
Part of the problem of temporal visual/audible data verification is that conventional time is deterministic and therefore predictable: If one knows the exact time right now, then one will also know the exact time n seconds (or other time unit) from now. This means that one has n seconds from actual occurrence/creation of an event to manipulate the corresponding data file and then have the altered data stamped with a desired future time.
The problem of having a trustworthy and verifiable indication of time also arises in contexts other than the audio-visual. For example, it may be important to clearly establish the time at which a computer system event has occurred. As with other devices, it is easy for a user to change the system time of a computer that uses an internal time base, and this change will affect the time indications for most or all things the user does thereafter.
It would therefore be good to have a way to establish the time of a perceptibly created or recorded event with less opportunity for undetected manipulation. More generally, it would be advantageous to have some representation of time that isn't deterministic and therefore predictable.
In broad terms, this invention provides a method and various system implementations that generate a representation of non-deterministic time. The invention is described primarily with reference to examples that relate to a type of clock that can be used to mark captured representations of events so as to establish the time of capture in a way that greatly reduces the opportunity to forward-date altered representations. According to one preferred aspect, non-deterministic time is established based on an output of a keyless, distributed hash tree infrastructure, which may also optionally be used to authenticate the contents of the representation of the event as well. Before explaining the notion and use of non-deterministic time, such a keyless, distributed hash tree infrastructure is therefore described first.
As
In the illustrated arrangement, a client is the system where digital records are prepared and entered into the verification/signature system. A digital record may be any set of binary data that one later wishes to verify has not changed since initial registration and signing using the infrastructure. Thus, the term “digital record” could be a digital representation of an image, an audio file (or combined audio-visual data such as from a video camera), a digitally created or converted document, etc. Generally, a “digital record” therefore may be anything that can be represented as a set of binary data, regardless of source, manner of creation or method of storage. In short, a client is any system where a representation of any type of information is input, created or otherwise presented (with or without human involvement) in digital form such that it can be processed and registered using the infrastructure according to the invention.
A gateway in the layer 3000 will typically be a computer system such as a server with which one or more of the clients communicates so as to receive requests for registration of each digital record that a client submits. In many implementations, a gateway will be a server controlled by an enterprise or some third-party provider, which may be a server known to and maybe even controlled by an organization to which the client user belongs, or a server accessed through a network such as the Internet. In short, a gateway may generally be any server located anywhere and configured to receive requests from clients for digital record registration. Gateway systems do not need to be of the same type; rather, one gateway might be a server within a company that employs many clients, whereas another gateway might be a server accessible online by arbitrary users.
An aggregator in the aggregation layer 4000 will similarly be a computer system such as a server intended to receive registration requests that have been consolidated by respective gateways. Depending upon the scale and design requirements of a given implementation, any aggregator could also be controlled by the owner of the core, or the owner of the same systems as the gateways and clients, or could be provided by an entirely different entity, and in some cases it would also be possible to consolidate the aggregator and gateways for a particular set of clients.
As an example, large corporations or government entities might prefer to implement and benefit from the advantages of the infrastructure using only their own dedicated systems. Nearer the other end of the spectrum of possibilities would be that the gateways and aggregators could all be configured using “cloud computing” such that a user at the client level has no idea where any particular gateway or aggregator is located or who controls the servers. One of the advantages of this infrastructure is that digital input records can still be verified with near total security even in situations where users and others do not know if they can trust the systems in the gateway or aggregation layers 3000, 4000; indeed, it is not even necessary to trust the administrator of the core 5000 in order to have essentially total reliability of verification.
The different terms “aggregator” in layer(s) 4000 and “gateway” in layer(s) 3000 are not intended to imply that the systems (such as servers) that comprise them are functionally significantly different—a gateway “aggregates” the requests of the clients it serves and as such could be viewed as a “local” or “lower level” aggregator in its own right. In many implementations, however, gateways may be under the control of entities more closely associated with the clients and aggregators may be more closely associated with the overall system administrator that maintains the core. This is not a hard and fast distinction, however.
In one implementation, each client system that wishes to use the verification infrastructure is loaded with a software package or internal system routines for convenient or even automatic communication and submission “upwards” of digital information. The software package may include some application program interface (API) 2014 that transforms submitted digital records into a proper form for processing. A digital record 2012 created, selected, or otherwise input in any way is then submitted by way of the API 2014 to a software module 2016 that uses the digital data from the record 2012 as at least one argument in a transformation function such as a hash function.
Cryptographic hash functions are very well known in many areas of computer science and are therefore not described in greater detail here. Just one of many possible examples of a common class of hash functions that are suitable for use in this infrastructure is the “secure hash algorithm” (SHA) family.
Additional hashing within the client may be desired to include additional information depending on the design protocol of the infrastructure. Just a few of the many possible arguments the system designer might optionally choose to include as arguments of the additional hash function 2016 are an identifier of the person or entity requesting registration, an identifier of the particular client system being used, a time indication, information relating to the geographic location of the client or other system, or any other information desired to be incorporated as part of the registration request. A software module 2020 is preferably included to transmit the output of the transformation 2016 to higher layers of the infrastructure as a request (REQ), along with any other parameters and data necessary to communicate with a gateway and initiate the registration request.
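Purely by way of illustration, the formation of a registration request in the client may be sketched as follows. The sketch assumes SHA-256 as the transformation function and uses a requester identifier and a client identifier as the optional additional arguments; the function name make_request, the field names and the length-prefixed encoding are hypothetical choices made for this example and are not prescribed by the infrastructure.

```python
import hashlib


def make_request(record: bytes, requester_id: str = "", client_id: str = "") -> bytes:
    """Hash the digital record, then hash that digest with optional metadata.

    The choice and encoding of the metadata fields are assumptions of this sketch.
    """
    record_digest = hashlib.sha256(record).digest()
    h = hashlib.sha256()
    h.update(record_digest)
    for field in (requester_id, client_id):
        encoded = field.encode("utf-8")
        # Length-prefix each field so that field boundaries are unambiguous.
        h.update(len(encoded).to_bytes(4, "big"))
        h.update(encoded)
    return h.digest()


req = make_request(b"example image bytes", requester_id="alice", client_id="camera-7")
print(req.hex())
```

The length prefix is included so that, for example, the identifier pair ("a", "b") cannot produce the same request value as the pair ("ab", "").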
It is assumed in this discussion that the transformation function 2016 is a hash function because this will be the most common and efficient design choice, and also because the properties of hash functions are so well understood; moreover, many different hash functions are used in the field of cryptology, security, etc., within commodity computers. One other advantageous property of hash functions is that they can reduce even large amounts of digital information to a size that is more easily processed, with a statistically insignificant chance of two different inputs leading to the same output. In other words, many well-known hash functions will be suitable for use throughout the infrastructure, and can be chosen using normal design considerations. Nonetheless, the function that transforms digital records into a form suitable for submission as a request need not be a hash function as long as its properties are known. For example, especially for small digital records, it may be more efficient simply to transmit the digital record data as is, in its entirety or some subset; in this case, the transformation function may simply be viewed as an identity function, which may then also append whatever other additional information is needed according to the core system administration to form a proper registration request.
The data structure of a binary hash tree is illustrated within the gateway 3010-2. The lowest level nodes of the gateway hash tree will correspond to the transformed dataset 2018 submitted as a request from a client, along with any other parameters or data used in any given implementation to form a request. As illustrated, the values represented by each pair of nodes in the data structure form inputs to a parent node, which then computes a combined output value, for example, as a hash of the two input values from its “children” nodes. Each thus combined output/hash value is then submitted as one of two inputs to a “grandparent” node, which in turn computes a combined output/hash value for these two inputs, and so on, until a single combined output/hash value is computed for the top node in the gateway.
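The pairwise combination just described may be sketched as follows, as one possible example only. The sketch assumes that SHA-256 is applied to the concatenation of the two child values and that the number of lowest-level nodes is a power of 2; the names node_hash and tree_root are hypothetical.

```python
import hashlib


def node_hash(left: bytes, right: bytes) -> bytes:
    """Combine two child values into the parent value (assumed: SHA-256 of the concatenation)."""
    return hashlib.sha256(left + right).digest()


def tree_root(leaves: list[bytes]) -> bytes:
    """Compute the uppermost value of a binary hash tree over the given leaves."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("this sketch expects a power-of-two number of leaves")
    level = list(leaves)
    while len(level) > 1:
        # Each adjacent pair of nodes forms the two inputs of one parent node.
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


requests = [hashlib.sha256(f"request {i}".encode()).digest() for i in range(4)]
print(tree_root(requests).hex())
```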
Aggregators such as the system 4010-1 similarly include computation modules that compute combined output values for each node of a hash tree data structure. As in the gateways, the value computed for each node in the aggregator's data structure uses its two “children” nodes as inputs. Each aggregator will therefore ultimately compute an uppermost combined output value—a “root hash value”—as the result of application of a hash function that includes information derived from the digital input record(s) of every client that submitted a request to a gateway in the data structure under that aggregator. Although it is of course possible, the aggregator layer 4000 does not necessarily need to be controlled by the same system administrator that is in charge of the core layer 5000. In other words, as long as they are implemented according to the required protocols and use the correct hash functions (or whatever other type of function is chosen in a given implementation), then the client, gateway, and aggregation layers may be configured to use any type of architecture that various users prefer.
In one embodiment, the core 5000 is maintained and controlled by the overall system administrator. Within the core, a hash tree data structure is computed using the root hash values of each aggregator as lowest level inputs. In effect, the hash computations and structure within the core form an aggregation of aggregation values. The core will therefore compute a single current uppermost core hash value at the respective tree node 5001 at each calendar time interval t0, t1, . . . , tn. This uppermost value is referred to here alternatively as the “calendar value” Ci or “current calendar value” for the time interval ti.
Note that the time origin and granularity are both design choices. For example, one might choose each time interval to be uniformly 1.0 seconds. On the other hand, if significant network delay is anticipated or measured, it may be preferable to set the calendar time interval to a greater value. Less frequent computation of calendar values might also be chosen to suit the administrative or other needs of a verification infrastructure implemented totally within a single enterprise or for any other reason.
Conversely, if there is some need for finer temporal granularity, then one could decrease the time interval such that calendar values are generated more frequently than once a second. System designers may choose an appropriate time granularity based on such factors as the anticipated processing load, network bandwidth and transmission rate, etc.
Note that the uppermost tree node 5001 represents the root node of the entire tree structure of nodes junior to it. As is explained later, this will change upon recomputation of a new uppermost core hash value at the end of the next period of accumulating requests and generating signature vectors (also referred to as “data signatures”) containing recomputation parameters.
In
In
To increase independence of the various layers—in particular, clients and later entities wishing to perform authentication through recomputation—it is advantageous for the entire calendar to be passed to the aggregators and even to the lower layers, even as far as to clients, every time a new calendar value is computed, that is, at the end of each calendar time interval. This then allows delegation and distribution of the computational workload without any compromise of the integrity of the system. If the respective calendar value is passed down along with each data signature vector, it would therefore be possible to authenticate a digital record up to the level of the calendar value without any need for the infrastructure at all; rather, any user with the ability to compute hash values in the proper order, given the signature vector and respective calendar value, could authenticate a digital record presented as being identical to the original.
Important to note is that each calendar value in the calendar 6000 uniquely corresponds to a time, that is, to one of the time interval values t0, t1, . . . , tn. Thus, as “real” time progresses, a new calendar value may be generated for each interval for all of the digital records that happen to be input for signature during that interval. Each calendar value therefore corresponds to time, inasmuch as the infrastructure cuts off new data record input and generates a calendar value at the end of each interval, but the length of the interval is a design choice; as such, calendar values will change at regular intervals (assuming the design choice of equal time intervals).
In most implementations of the authentication infrastructure shown in
Assume for simplicity and by way of example that the granularity of calendar time intervals is chosen to be 1.0 seconds. In other words, assume that a new calendar value is generated every second. If the time right now is 03:19:26, then one knows that the time 1, 10, or 10,000 seconds from now will be 03:19:27, 03:19:36 and 06:06:06, respectively. One cannot predict, however, what the infrastructure calendar value will be even one second from now, although one knows that the value will be generated at 03:19:27.
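The contrast between predictable clock time and an unpredictable calendar value may be illustrated with the following sketch. The function toy_calendar_value is only a hypothetical stand-in for the core computation: it hashes whatever requests arrive during an interval, which is enough to show that the result depends on inputs that are not known in advance.

```python
import hashlib
from datetime import datetime, timedelta


def clock_after(start: datetime, seconds: int) -> str:
    """Conventional time is deterministic: the future reading follows by arithmetic."""
    return (start + timedelta(seconds=seconds)).strftime("%H:%M:%S")


def toy_calendar_value(requests: list[bytes]) -> str:
    """Hypothetical stand-in for the core: a hash over all requests in the interval."""
    h = hashlib.sha256()
    for request in requests:
        h.update(hashlib.sha256(request).digest())
    return h.hexdigest()


now = datetime(2000, 1, 1, 3, 19, 26)  # arbitrary date; only the time of day matters
print([clock_after(now, n) for n in (1, 10, 10_000)])
# ['03:19:27', '03:19:36', '06:06:06']

# A change in any single request yields a different value for the interval.
print(toy_calendar_value([b"req-1", b"req-2"]) != toy_calendar_value([b"req-1", b"req-2 "]))
```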
When the core computes the current calendar value 5001 at the new calendar time interval, it may return to aggregator 4010-1 its sibling (X-marked) lowest core node value from aggregator 4010-k, and the aggregator 4010-1 can then return downwards the X-marked hash values to the gateway 3010-2, which in turn can return downwards to the client 2010-1 all of the above, plus the X-marked hash values computed within that gateway's hash tree structure, etc. The data signature vector 8000 for each client can then be compiled for each data signature request (such as for each input record 2012), either in the client itself or in any entity (such as the associated gateway) that has all “sibling” values for a given input record.
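As a minimal sketch of how the sibling values for one input record might be collected, the following example gathers, level by level, the value of the sibling node together with an indication of whether it is the left or right input of the parent. For simplicity the sketch treats the whole structure as a single tree with a power-of-two number of leaves, whereas in the distributed infrastructure each layer would contribute its own portion; the name signature_vector and the tuple layout are hypothetical.

```python
import hashlib


def node_hash(left: bytes, right: bytes) -> bytes:
    """Assumed combination function: SHA-256 of the concatenated child values."""
    return hashlib.sha256(left + right).digest()


def signature_vector(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Return [(sibling_value, sibling_is_left), ...] from the leaf level up to the root.

    Assumes a power-of-two number of leaves.
    """
    path = []
    level = list(leaves)
    while len(level) > 1:
        sibling = index ^ 1  # the other node of the same pair
        path.append((level[sibling], sibling < index))
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path


leaves = [hashlib.sha256(f"record {i}".encode()).digest() for i in range(4)]
for sibling_value, sibling_is_left in signature_vector(leaves, 2):
    print("left " if sibling_is_left else "right", sibling_value.hex()[:16])
```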
Note that this arrangement makes it possible not only to distribute the hash computation infrastructure over various layers (vertically) and also “horizontally” at each layer, but also to distribute the responsibility for communicating requests upward and partial or entire signature vectors downward, so that these tasks can be carried out simultaneously in many different locations. Of course, since a data signature is unique to the digital record that led to it, the procedure for returning a signature vector for each input digital record 2012 for client 2010-1 (note that a single client may input more than one digital record for verification in each time interval) is preferably duplicated for all digital input records received in the time interval over which values were accumulated for the computation of node value 5001.
The configuration of the distributed infrastructure shown in
In most cases, it is unlikely that the number of clients during a given computation interval will be exactly equal to a power of 2. Any known method may be used to adapt to the actual number of clients while still maintaining a binary hash tree structure throughout. As just one example of a solution to this, known dummy values may be used for all of the “missing” sibling node values. Alternatively, it is also possible to adjust the hash tree branches accordingly, in the manner of giving “byes” in single-elimination sports tournaments.
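Both adaptations mentioned above may be sketched as follows, by way of example only. The all-zero dummy value, the SHA-256 combination function and the names pad_to_power_of_two and root_with_byes are assumptions of this sketch; any known dummy value and any agreed rule for unpaired nodes could be used instead.

```python
import hashlib

DUMMY = bytes(32)  # assumed publicly known dummy value for "missing" siblings


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def pad_to_power_of_two(leaves: list[bytes]) -> list[bytes]:
    """First option: fill the lowest level with dummy values up to the next power of 2."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    size = 1
    while size < len(leaves):
        size *= 2
    return list(leaves) + [DUMMY] * (size - len(leaves))


def root_with_byes(leaves: list[bytes]) -> bytes:
    """Second option: an unpaired node advances unchanged to the next level (a "bye")."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node out gets a bye
        level = nxt
    return level[0]


five = [hashlib.sha256(bytes([i])).digest() for i in range(5)]
print(len(pad_to_power_of_two(five)), root_with_byes(five).hex()[:16])
```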
In one embodiment, the gateways 3000 may be more local to various clients whereas the aggregators are more regional. For example, it would be possible to locate aggregators in different parts of the world not only to distribute the workload, but also to increase throughput. Although it appears in
Assume now by way of example that some entity later wishes to verify that a digital record in question—a “candidate digital record”—is an identical copy of digital record 2012. Applying the same transformation function 2016 to the candidate digital record and recomputing upward using the corresponding data signature 8000, the entity should compute to the exact same calendar value that resulted from the original digital record's registration request. In some implementations, this level of verification is sufficient. As one possible example, if the calendar is distributed to enough independent aggregators, then if one malicious actor were to tamper with some calendar value, this could be detected if some procedure is implemented to compare with other copies of the same calendar.
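Recomputation of this kind may be sketched as follows. The sketch assumes that the data signature is available as a list of (sibling value, sibling-is-left) pairs, that SHA-256 serves both as the transformation function and as the node function, and that the two-record tree in the usage example stands in for the full infrastructure; the names recompute and verify are hypothetical.

```python
import hashlib


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def recompute(leaf: bytes, path: list[tuple[bytes, bool]]) -> bytes:
    """Hash upward from the leaf, combining with each sibling on the correct side."""
    value = leaf
    for sibling, sibling_is_left in path:
        value = node_hash(sibling, value) if sibling_is_left else node_hash(value, sibling)
    return value


def verify(candidate: bytes, path: list[tuple[bytes, bool]], calendar_value: bytes) -> bool:
    """A candidate record is accepted only if recomputation reaches the calendar value."""
    return recompute(hashlib.sha256(candidate).digest(), path) == calendar_value


# Toy registration of two records in one interval.
leaf_a = hashlib.sha256(b"record A").digest()
leaf_b = hashlib.sha256(b"record B").digest()
calendar_value = node_hash(leaf_a, leaf_b)
path_for_a = [(leaf_b, False)]  # sibling of A is on the right

print(verify(b"record A", path_for_a, calendar_value))           # True
print(verify(b"record A (edited)", path_for_a, calendar_value))  # False
```

Note that the verifier in this sketch needs only the candidate record, the signature vector and the calendar value; no access to the infrastructure itself is required.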
As another example, in some implementations, users may choose or be obligated to rely on the security of the administrator of the core. In particular, government entities might implement a system in which users must simply rely on the government administrators. In these cases, recomputation up to the corresponding calendar value may be considered sufficiently reliable authentication. In the context of this infrastructure, this can be viewed as “first-level” verification. One hypothetical example of where such a system might be implemented would be where a government agency requires each company, laboratory, etc. to submit a copy of its calendar to the government entity every time the company's system updates its calendar. The government would then be able to audit the company's records and verify the authenticity of any given digital record by recomputing up to the proper calendar value, which the government will have stored. In practice, this would amount to requiring the company to keep updated a “calendar audit trail” with the auditing entity (such as the government).
Even in other instances, as long as the highest level system administrator trusts its ability to securely store calendars, it could be satisfied that a candidate digital record is authentic if recomputation leads to the appropriate stored calendar value. In a sense, it would be the system administrator itself in such cases that is looking for proof of the authenticity of candidate digital records as opposed to clients or other third-party entities. Consequently, the system administrator could trust the security of the recomputation and calendar values to the same extent it trusts itself to maintain the calendar copies.
All but the last digital record requesting registration in a calendar time period will typically need to wait for all other requests in the calendar time interval to be processed before a calendar value will be available that will enable authenticating recomputation. If the calendar time interval is kept short enough, this delay may be acceptable. To increase the level of security during the delay, it would also be possible to implement an option, whenever a client submits an authentication registration request, to generate and return not only the data signature vector but also a key-based signed certificate, which may be issued by any higher layer system such as the current gateway, aggregator, or even core.
Because of the various data structures and procedures of the distributed infrastructure, the published composite calendar value may encode information obtained from every input digital record over the entire publication time interval, and if the current calendar value for the current calendar period is hashed together with the previous one, which is hashed with the one before it, and so on, as shown in
In
Although it may in many cases be desirable or even required for the published value to encode information from the entire calendar from the beginning of calendar time, other alternatives can also be implemented as long as suitable bookkeeping routines are included. For example, rather than include all calendar values in the Merkle tree, at each publication time all of the most recent calendar values could be included in the publication computation along with a random sampling of calendar values from previous intervals. This would be one way, for example, to ensure that the number of included calendar values is conveniently a power of 2.
Similarly, in some contexts, government authorities require proof of records extending back only for some given time such as three years. In such cases it might be advantageous always to include only calendar values generated during this required period such that only relevant digital records are encoded in the most recent publication value.
Another alternative would be for there to be only a single computation of the publication value, including all calendar values from the beginning of system time. This might be useful, for example, in projects with clear time or digital record limits. For example, in litigation or transactions, parties often submit digital records to a “data room” for easy exchange. Calendar values could then be generated periodically as in other cases (perhaps with a longer calendar time interval since digital records will generally not be submitted as frequently as in large-scale, universally accessible implementations of the infrastructure), but with only a single computation of a publication value when all parties agree to close the data room. The publication value would then be a form of “seal” on the body of submitted digital records, which could later be used for recomputation and verification of any digital record ever submitted into the data room.
It is not absolutely necessary for the publication value to be computed using the Merkle hash tree data structure illustrated in
It is not a requirement for systems in any given layer to apply the same hash functions. For example, the transformation functions used in different client systems could be different. As long as the functions at each place in the recomputation path are known to whoever later wants to authenticate a digital record through recomputation, the authentication process will work properly. Adding a hash function identifier as an input parameter to the preparation of the registration request would be one convenient way to enable future users to correctly authenticate a digital record through recomputation.
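One way of carrying a hash function identifier along with the request may be sketched as follows. The "name:digest" layout, the set of permitted algorithms and the names transform and matches are assumptions made only for this example; the sketch relies on the standard hashlib.new constructor to select the function by name.

```python
import hashlib

ALLOWED = {"sha256", "sha512", "sha3_256"}  # assumed set of permitted functions


def transform(record: bytes, algorithm: str = "sha256") -> bytes:
    """Prefix the digest with the identifier of the hash function that produced it."""
    if algorithm not in ALLOWED:
        raise ValueError(f"unsupported hash function: {algorithm}")
    digest = hashlib.new(algorithm, record).digest()
    return algorithm.encode("ascii") + b":" + digest


def matches(candidate: bytes, request: bytes) -> bool:
    """Re-apply whichever function the request names and compare the results."""
    algorithm, _, _ = request.partition(b":")
    return transform(candidate, algorithm.decode("ascii")) == request


request = transform(b"some digital record", "sha512")
print(matches(b"some digital record", request))
```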
Recall that the infrastructure shown in
See
Under the direction of a processor 201, the capture device 200 captures a framed portion 250 of the image of the event 300 and records it in digital and/or analog form internally as a corresponding image 260. In the case of digital capture, the image 260 will then typically be stored in a memory 202. Depending on the implementation, the capture device may also include a network interface device 203 and typically associated software so as to enable the capture device to access a network 400 and download data, as described further below.
Now assume that a display device 350 is also positioned within the image frame 250, that is, within the context of the event, and that it shows a representation 355 of the current NDT at the time when the image is captured. The displayed NDT representation 355 will then be part of the recorded image 260. In other words, in the illustrated example, the “photo” of the closed vault door will include a visual representation of NDT at the time the photo was taken. In the time-stamping example described above, one would be able to take the photo, then edit it, and then time-stamp the edited image—the time stamp at the later time would not prove that such forward-dating didn't occur. With the NDT representation in the image, however, the image itself includes visual information that shows the image could not have been created before the included displayed NDT value was available. Since NDT values cannot be predicted in advance, it will be exceptionally difficult to forward-date such an image.
In many cases, one will also want to rule out back-dating as well, at least as well as possible. One way to accomplish this would be for the capture device 200, using known techniques, to obtain a conventional digital time-stamp for the image 260, for example, by accessing a time-stamping server 600 over a network 400.
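The combined effect of the displayed NDT value and a conventional time stamp may be sketched as a time window: the NDT value bounds the capture time from below and the time stamp bounds it from above. The sketch assumes that a verifier holds a lookup table from calendar values to the times at which they were generated, expressed in seconds; the table and the name capture_window are hypothetical.

```python
from typing import Optional


def capture_window(
    displayed_ndt: str,
    calendar_times: dict[str, int],
    timestamp_time: int,
) -> Optional[tuple[int, int]]:
    """Return (earliest, latest) possible capture time, or None if inconsistent.

    calendar_times maps each known calendar value to its generation time in seconds.
    """
    earliest = calendar_times.get(displayed_ndt)
    if earliest is None:
        return None  # the displayed value never appeared in the calendar
    if timestamp_time < earliest:
        return None  # stamped before the displayed value could have existed
    return (earliest, timestamp_time)


calendar_times = {"9f2c": 1_000, "07ab": 1_001}  # toy, shortened calendar values
print(capture_window("07ab", calendar_times, 1_004))  # (1001, 1004)
```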
As
The calendar module 150 communicates in any known manner, such as over a dedicated connection or via the network 400, with the system 500 that generates calendar values Ck=C(tk). In one embodiment, the system 500 is the distributed hash tree infrastructure shown in
If time-stamping is also included as a feature in a chosen implementation of the invention, the time-stamp server 600 may, but need not, be implemented on a separate system from the data signature infrastructure; rather, it could be part of a single overall signing and stamping service. One advantage of this would be that it would be easy to ensure a common, synchronized time base for both components.
It is also possible to implement the NDT system 100 as part of the signature infrastructure itself, such as a processing component in the same server as the core, although it will typically not be involved in the data signature process as such. In such a case, at each time interval tk, or on demand, the NDT system 100 thus may obtain from the signature infrastructure 500 the corresponding calendar value, which in turn corresponds to the current NDT. It then passes this value to the display device 350, either automatically, as each new calendar value is generated, or on demand, depending on the particular implementation of the display 350.
In some implementations, the entire string representing the calendar value can be included in a digital photo or audio file, such as where the data-capture hardware and software automatically superimpose the string onto or into the corresponding digital file before initial storage, in such a way that the user can neither disable the superimposition nor add the string afterward by post-processing.
In most cases, however, a typical calendar value C(tk) will comprise a data string so long that it may be impractical to display it all to capture devices. For example, using the SHA-256 hash function to generate the calendar values tk−1, tk, and tk+1, and representing them in hexadecimal, would create 64-digit values. Including such a value in a public display device would in most cases be unwieldy, and many cameras would not even be able to resolve all the digits adequately except under excellent conditions, if then.
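The length mentioned above can be checked with a minimal sketch. The input bytes below are only a placeholder chosen for illustration; an actual calendar value would come from the hash tree infrastructure, not from hashing a fixed string.

```python
import hashlib

# Placeholder input (an assumption for illustration only); a real calendar
# value is produced by the signature infrastructure.
digest_hex = hashlib.sha256(b"example input record").hexdigest()

hex_digits = len(digest_hex)   # number of hexadecimal digits in the value
bits = hex_digits * 4          # each hexadecimal digit encodes 4 bits

print(hex_digits, bits)
```

Any SHA-256 output has this same fixed length regardless of the input, which is why the full value is awkward to show on a public display.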
Other representations of the calendar values could also be displayed, depending on the type of display device used. For example, a pattern of lights around the display device could indicate some subset of the binary digits of each calendar value as NDT values. In implementations where the display is, for example, itself a form of video, such as a television, computer monitor, etc., the NDT values could also be encoded directly in the video stream, or as part of the task bar, wallpaper or screen saver of a computer monitor, or as a portion of the display of some running application that itself is being monitored by video.
The dictionary could also be organized according to parts of speech, with different subsets of the calendar values used to index into different portions of the dictionary. For example, the first word used in the NDT could be an adjective, the second a noun, the third a verb, and so on, so that NDT values are represented as syntactically correct sentences.
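One possible way to index a dictionary organized by parts of speech is sketched below. The word lists, the function name `ndt_phrase`, and the choice of one byte per word position are all assumptions made for illustration; an actual dictionary would be far larger and could use any subset of the calendar value's bits.

```python
# Hypothetical, very small word lists, one per part of speech.
ADJECTIVES = ["quiet", "red", "tall", "brave"]
NOUNS = ["river", "falcon", "lantern", "garden"]
VERBS = ["crosses", "guards", "follows", "greets"]

def ndt_phrase(calendar_value: bytes) -> str:
    """Map successive bytes of a calendar value to adjective, noun, verb."""
    slots = [ADJECTIVES, NOUNS, VERBS]
    words = []
    for position, word_list in enumerate(slots):
        # Each word position uses a different byte of the calendar value
        # to index into a different portion of the dictionary.
        index = calendar_value[position] % len(word_list)
        words.append(word_list[index])
    return " ".join(words)

print(ndt_phrase(bytes([0, 1, 2])))
```

Because each position draws from its own word list, every output has the same grammatical pattern, whatever the underlying calendar value.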
It is not necessary for the verbal NDT presentation 355 to be in alphanumeric characters—with simple modification well within the skill of system programmers, representations or identifiers of ideographs such as Chinese characters could also be used in the NDT display 355, or stored in the dictionary. Other non-Latin-based alphabets (such as Cyrillic, Arabic, etc.) could also be displayed, as well as syllabaries such as Japanese kana.
This verbal embodiment could be used for audio display too: Assume that the capture device 200 is, or includes, sound recording. Instead of, or in addition to, visually displayed words, the display words could be presented as audio files so that they are “pronounced” and included as part of the audio recording. In such a case, it is preferable to keep the verbal display words short or few enough for it to be possible to pronounce all the included word(s) for the current NDT before a new NDT is generated. To avoid unnecessary disturbance with the intended audio recording, it would also be possible to compress the audio file so as to pronounce the display words very rapidly—the information would still be available for decoding and later verification.
Yet another alternative of the verbal presentation of NDT values could be an oral presentation. For example, an announcer, video conference participant, etc., at some point(s) (such as the beginning and/or end) of an audio and/or video recording, broadcast or other presentation, could speak the word(s) that represent the current NDT, which he could obtain in any preferred manner, such as from a dedicated display in the recording device or in the room/studio, from a website that displays NDT, from a dedicated NDT clock display device, etc.
The “chords” would not need to be “played” throughout the time interval on the audio recording of the capture device, but each could be presented as a sound “burst” at some time during the current time interval to reduce interference with the primary audio recording.
As
In the embodiments illustrated in
Now assume that the purported time of recording (visual, audio or both) of an event is later to be checked. For example, assume that a photograph is presented and purported to have been made at a specific physical time. If an NDT value is present in the photograph, the NDT system 100 can be queried, with a purported NDT value NDT* and the purported event time tp—either manually or automatically—so that a verification module 160 can compare NDT* with the NDT value that was actually generated at the purported event time. If NDT*≠NDT(tp), then the system may return an indication that the photograph is not time-verified. If NDT*=NDT(tp) the NDT system may return an indication of at least probable NDT time verification to within the level of certainty allowed by the probability of “NDT value duplication” (see below). The data signature and time stamp values may also be submitted for verification by the respective entities, with results returned in whatever manner has been implemented.
Given an NDT value, the system will be able to determine the corresponding real, physical time in different ways. One way would be to maintain a database of NDT values as a function of time, or vice versa. In systems whose time interval is, for example, 1.0 s, this would require storage of approximately 31.6 million NDT (or calendar) entries per year. Another way would be to query the system within the data signing infrastructure 500 that maintains the calendar 6000 with the purported time of occurrence of the event. The corresponding calendar value should then compute to the same NDT as recorded by the capture device, again, to within a degree of certainty that is a function of the probability of NDT value duplication.
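The comparison and the database lookup described above might be sketched as follows. The in-memory dictionary, the function names, and the returned strings are assumptions for illustration; they are not the interface of the verification module 160.

```python
from typing import Dict, List

def verify_ndt(purported_ndt: str, purported_time: int,
               ndt_by_time: Dict[int, str]) -> str:
    """Compare a purported NDT value with the one generated at the purported time."""
    actual = ndt_by_time.get(purported_time)
    if actual is None:
        return "unknown time"
    if purported_ndt != actual:
        return "not time-verified"
    # A match is only probable verification, since NDT values may repeat.
    return "probably time-verified"

def times_for_ndt(ndt: str, ndt_by_time: Dict[int, str]) -> List[int]:
    """Return every recorded time at which the given NDT value was generated."""
    return sorted(t for t, value in ndt_by_time.items() if value == ndt)

# Hypothetical records: time index (in intervals) -> NDT representation.
records = {0: "red falcon", 1: "tall river", 2: "red falcon"}
print(verify_ndt("tall river", 1, records))
print(times_for_ndt("red falcon", records))
```

The second function returns more than one time when a representation has repeated, which is the situation the following paragraphs call NDT value duplication.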
The term “NDT value duplication” as used here refers to the fact that, in most cases, although NDT may be a function of calendar time, calendar time may not necessarily be a unique inverse function of NDT. Consider the example above, relating to
One way to reduce the probability of NDT value duplication is to increase the set of possible NDT representations of calendar values. Continuing with the example shown in
In most practical cases, however, the probability of NDT value duplication can be chosen to be sufficiently low as to provide an acceptable level of assurance that forward-dating has not occurred. For example, even with only 65,536 possible NDT word pairs, it would still be very difficult for a user to predict which word pair will be the correct one at a given future time, especially since the pairs will occur essentially randomly and there is no guarantee that any given word pair will re-occur even within 18.2 hours (65,536 one-second intervals). Of course, if one were to include 2^12 = 4096 words in the dictionary instead, and three words are included in each NDT representation, giving 4096^3 = 2^36 possible representations, a duplicate of at least one NDT value (not necessarily of a given NDT value) would be certain only after about 2177 years, if a calendar value is generated every 1.0 seconds. A similar analysis will of course apply to other ways of representing NDT values.
Moreover, in case of a recording spanning several NDT intervals, the difficulty of guessing the sequence of NDT values increases exponentially compared to the difficulty of guessing just one NDT value. For example, with 2^16 = 65,536 possible NDT word pairs, the number of sequences spanning two NDT periods is 65,536^2 = 2^32 = 4,294,967,296, the number of sequences spanning three NDT periods is 65,536^3 = 2^48 = 281,474,976,710,656, and so on.
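The figures in the two preceding paragraphs can be reproduced with a short calculation. The sketch assumes one NDT value per 1.0 s interval and a year of 365.25 days; the variable names are chosen here for illustration.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # about 31.6 million seconds

# Two-word representations: 2^16 possible word pairs.
pairs = 2 ** 16
hours_until_forced_repeat = pairs / SECONDS_PER_HOUR

# Three words drawn from a 2^12-word dictionary: 2^36 possible triples.
triples = (2 ** 12) ** 3
years_until_forced_repeat = triples / SECONDS_PER_YEAR

# Number of possible sequences spanning 1, 2 and 3 NDT periods.
sequences = [pairs ** n for n in (1, 2, 3)]

print(round(hours_until_forced_repeat, 1))
print(int(years_until_forced_repeat))
print(sequences)
```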
Of course, a way to avoid the problem of NDT value duplication is simply to have a unique one-to-one mapping between calendar values and their representations, the most straightforward of which would be to display or use the calendar values themselves.
Although it would require additional modification of the internal code of the capture device, one alternative would be to require a data signature for each captured image before it is stored in any memory device 202 that is easily accessible by any user; for example, each captured image could be buffered by the processor 201 and submitted as a digital input record from the buffer, and then stored to memory only when the signature is received. The code that controls the operation of the capture device 200 may be modified by skilled programmers to perform these functions. All of these additional measures increase the believability (and therefore, for example, credibility as evidence) of an image presented as having been created at a given time.
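The buffer-then-sign ordering described above might look like the following sketch. The function `store_if_signed`, the `request_signature` callable, and the list standing in for the memory device 202 are hypothetical; the stand-in signer simply hashes the image and is not a real data signature.

```python
import hashlib
from typing import Callable, List, Optional, Tuple

def store_if_signed(image: bytes,
                    request_signature: Callable[[bytes], Optional[bytes]],
                    storage: List[Tuple[bytes, bytes]]) -> bool:
    """Hold the image in a buffer and store it only once a signature is received."""
    buffered = bytes(image)                   # transient buffer, not user-accessible storage
    signature = request_signature(buffered)   # submit the buffer as a digital input record
    if signature is None:
        return False                          # no signature received: nothing is stored
    storage.append((buffered, signature))
    return True

# Stand-in signer (an assumption for illustration only).
def fake_signer(record: bytes) -> Optional[bytes]:
    return hashlib.sha256(record).digest()

memory: List[Tuple[bytes, bytes]] = []
stored = store_if_signed(b"raw image bytes", fake_signer, memory)
rejected = store_if_signed(b"raw image bytes", lambda record: None, memory)
print(stored, rejected, len(memory))
```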
As mentioned, the data set (such as an image, audio recording, etc.) that incorporates an NDT representation may itself form an input to a signing infrastructure such as the one illustrated in
Note that different embodiments represent different levels of security, but also different degrees of implementation complexity:
The embodiments described above are primarily fully automated in the sense of requiring little or no human activity other than, in some implementations, operating a camera, audio recorder, or other capture device. These offer different levels of difficulty of defeat by sophisticated fakers. One embodiment that would be exceptionally difficult to defeat would involve at least one human as part of the “display” 350: Possibly along with other text or statements, assume the capture device is a video camera that films and audio-records a human who speaks the representation of NDT as it occurs—it would be practically impossible for all but the most sophisticated image editors to tamper with the video frames fast enough to maintain consistent NDT values and time stamps.
Calendar values of the distributed hash tree infrastructure shown in
Indeed, this is a concern even when the numbers are generated according to a method that is supposedly cryptographically secure, such as the Dual Elliptic Curve Deterministic Random Bit Generator (Dual_EC_DRBG). The Dual_EC_DRBG, previously promoted by the U.S. National Institute of Standards and Technology (NIST), was soon afterward shown (by, among others, Dan Shumow and Niels Ferguson) to display a vulnerability that could function as a back door. In fact, others have suggested (see New York Times, 10 Sep. 2013, “Government Announces Steps to Restore Confidence on Encryption Standards”) that the Dual_EC_DRBG may deliberately have been designed to include such a back door.
NDT values derived from hash-derived calendar values, in contrast, have the advantage of being tied to verifiable external events, that is, the input of a set of incoming documents, which can be proven to be non-faked by recomputation of any of the documents back up to a given calendar value, or even publication value as shown in
Consequently, one alternative use of the technique described for creating NDT values would be to use them as [pseudo]-random numbers, regardless of whether they also are used to verify the time of an event in an image or other file. The distributed hash tree infrastructure would therefore have the “side benefit” of also functioning as a form of unpredictable, non-deterministic number generator that could make calendar or NDT values or composite calendar values available (including as an NDT display, as long as it includes a desired number of digits). Even if calendar values are not displayed as such non-deterministic numbers, those desiring reliably non-deterministic numbers could simply submit a file as an input record to the distributed hash tree infrastructure and then use the resulting, returned calendar value instead of an otherwise generated [pseudo]-random number. Users of such a number generator would not need to worry, or at least not as much, about the integrity of the generation algorithm, or about reliance on a chaotic physical source, etc.
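As a minimal sketch of this use, a returned calendar value, assumed here to be available as a hexadecimal string, could be reduced to a number in a desired range. The function name and the stand-in value are assumptions for illustration; the stand-in is merely a hash of fixed bytes, not an actual calendar value.

```python
import hashlib

def number_from_calendar_value(calendar_hex: str, upper_bound: int) -> int:
    """Reduce a hexadecimal calendar value to an integer in [0, upper_bound)."""
    value = int(calendar_hex, 16)
    # Simple modular reduction; slightly non-uniform unless upper_bound
    # divides the size of the value space.
    return value % upper_bound

# Stand-in for a calendar value returned by the infrastructure.
stand_in = hashlib.sha256(b"stand-in calendar value").hexdigest()
print(number_from_calendar_value(stand_in, 1000))
```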
As system designers will understand, the various computational modules within NDT system 100 and the capture device 200 comprise computer-executable instructions that may be provided from any known computer-readable storage medium, including downloading the code over a network into memory or other storage units, on physical media such as CD-ROM or other disks, on optical or magnetic storage media, on flash or other RAM-based memory devices, etc. This code may then be loaded into storage and/or memory and executed by the respective processors to implement the various processes that enable the invention as described above.
In the embodiments described above, NDT values are computed as functions (including the identity function) of current calendar values Ck=C(tk). The distributed hash tree infrastructure offers alternatives to this choice. Refer again to
Note that, in this embodiment, each NDT value would encode information found not only in the corresponding current calendar value, but also information from all previous calendar values included in the uppermost hash structure value. This increases the protection against a flooding attack, since an attacker, wishing to forward-date, would need to control every other input to the infrastructure not only in a current calendar period (which is nearly impossible as is for a widely used system), but for all calendar periods up until the time he wishes to forward-date.
In most of the examples given above, NDT is presented visually and/or audibly. Other applications are possible, however. Indeed, NDT may be used substantially in any application or situation where some form of time notation is desired and that does not necessarily have to be standard clock time, especially where the evidentiary value of non-determinism (that is, non-predictability) can be advantageous. As just a few of the essentially countless examples, NDT could be associated with automated events such as computerized transactions or internal computer events, stock exchange trades (to help disprove insider trading, for example), the creation, modification, transmission or receipt of data files, state changes of machines or manufacturing processes, print-outs of invoices, receipts or delivery notices, etc.
The display device 350 need not be fixed. Rather any device capable of accessing the NDT system 100 could be designed to obtain and display NDT, thus forming an “NDT clock”. Smart phones and other mobile devices, computers of all types, watches (many of which are already internet-enabled), “augmented reality” display devices such as Google Glass, etc., could all be designed to make NDT available to a user, who could then, for example, manually record the current NDT for whatever desired purpose, or to computer hardware or software components instead of or in addition to physical clock time. NDT could also be represented as a kind of “non-deterministic time zone” as an option in such devices. If implemented as a clock that is viewed by a user, any desired NDT representation may be used, such as those illustrated as the representations 355 in the figures.
As just one example, the face of an “NDT watch” could be configured as shown in
In the previous discussion, the device 200 is described primarily, by way of example, as being some form of device that can capture audio or visual events, in which the various NDT values are presented either visually, on some form of display, or audibly.
The device 200 will in this case include a module 1220 that connects with the NDT system 100 either directly or via the network 400, so as to issue a request to obtain non-deterministic time values as indicated above, which, upon receipt from the NDT system 100, may then be passed to the system software or to user level applications 1230 as desired. The time module 1220 may be incorporated into the system software 1210, or may be a separate application at the user or system software level that is installed in the device 200.
The request for and the downloading of a non-deterministic time value may be triggered by sensing the occurrence of any kind of event 300, which may be some event totally internal to the system 200, or may be triggered by the user 1300, or some combination of the two. Any kind of event may be marked with an NDT value, which may, for example, be stored as part of the metadata associated with the event, as part of a system log, transmitted to an external system such as an administrative or auditing system, or in any other desired manner.
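Marking an event with an NDT value and storing it in a system log could be sketched as below. The `fetch_ndt` callable stands in for whatever request the module 1220 makes to the NDT system 100; it, the function `mark_event`, and the JSON log format are assumptions for illustration.

```python
import json
from typing import Callable, Dict, List

def mark_event(event_name: str,
               fetch_ndt: Callable[[], str],
               log: List[str]) -> Dict[str, str]:
    """Request the current NDT value and record it alongside the event."""
    entry = {"event": event_name, "ndt": fetch_ndt()}
    log.append(json.dumps(entry, sort_keys=True))   # one log line per event
    return entry

# Hypothetical stand-in for a request to the NDT system.
def fake_ndt_source() -> str:
    return "red falcon follows"

system_log: List[str] = []
entry = mark_event("user_login", fake_ndt_source, system_log)
print(system_log[0])
```

The same entry could instead be attached to file metadata or transmitted to an auditing system, as the paragraph above notes.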
Examples of the essentially countless triggering events could include the booting up of the system 200, the taking of a system snapshot by the operating system 1210, the downloading, creation, alteration or deletion of a file or other data unit, the updating of software, firmware, or hardware, any form of failure that still allows the system to get an NDT value, etc. There are, similarly, an essentially limitless number of user actions that could be used to trigger the system 200 to request and associate an NDT value with the user action as an event. For example, the time when the user 1300 logs into or out of the system could be an event. Other examples might include the time at which a user opens or saves a file, the beginning and/or end times when the user is online, etc. In cases where the system 200 is a communication device such as a telephone, the event 300 could be the initiation or ending of a telephone (or VOIP) call or other network access. The system 200 might also be a server that is involved in financial transactions, either of the conventional type or using digital currency, such that the time of initiation or completion of transactions could be marked with an NDT value.
Note that, in many of these examples, the system 200 may request and obtain the current NDT value without contributing to its creation by submitting any form of digital input record to the hash tree infrastructure that creates the calendar value underlying the NDT value. In other words, in such embodiments, the NDT system 100 may be used as an independent time base, essentially forming an external clock whose NDT output the system 200 may use as internal timestamps for events 300. This independence is not required, however—the system 200 could also request a signature for a data set submitted in conjunction with an event, such as metadata identifying the event and/or the data (such as a file) defining the event itself, depending on the type of event, whereby the submitted data set would be encoded within the NDT value. For example, an NDT value could be associated with each of, or selected ones of, a series of financial transactions, such as credit or debit card transactions, transactions involving digital currency, etc.
One other advantage of using NDT values as time indications is that, even though each NDT value can be unambiguously associated with the physical time corresponding to the calendar values used to form the NDT value, and can be reproducibly verified, the NDT values themselves will typically not be intelligible to most users and are thus harder to interpret and fake: A user can easily change a standard time indication, or even an “epoch” number (a count of time units from some origin time)
At the other end of the “spectrum” of independence, the “event” that triggers the system 200 to request and input an NDT value could simply be according to a schedule, such that the system requests and inputs NDT values, for example, as a record that it was operational during some period, or to mark intervals during which it performs tasks, etc.
In
This application is a Continuation-in-Part of U.S. patent application Ser. No. 14/834,732, filed 25 Aug. 2015, which in turn is a divisional application of U.S. patent application Ser. No. 14/094,252, which was filed 2 Dec. 2013 and which issued as U.S. Pat. No. 9,178,708 on 3 Nov. 2015.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 14094252 | Dec 2013 | US |
Child | 14834732 | | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 14834732 | Aug 2015 | US |
Child | 14986529 | | US |