Blockchain records regarding documents are generally isolated entities. Thus, for off-chain storage, when a set of documents is registered in a blockchain using only hash values (as opposed to in-chain storage, in which the documents themselves are placed into the blockchain), information regarding the relationships of the documents is typically not included. Therefore, any third-party verification regarding the documents at a later time, that involves a determination of whether the document owner considered the documents to be related in some manner at the time of registration, may require that representations by the documents' owner be trusted at the time of verification. Although this is a minor point, it is nevertheless at least a blemish on the idea that blockchains provide “trust in the absence of a trusted entity”, because at least one aspect of the document information (i.e., the existence of some relationship among different documents) cannot be verified in a truly independent manner.
This can become an issue when an arrangement involves multiple separate documents. Some (of many) example scenarios include: (1) real estate transactions; (2) sets of estate planning documents that include codicils for identifying specific bequests, powers of attorney, and others; (3) financial transactions involving multiple stages and/or accounts; and (4) patent cross-license deals with one document that addresses licensing terms for standard essential patents (SEPs), and a separate document that addresses licensing terms for non-SEPs. Patent cross-license deals may use separate documents because laws and typical licensing terms can differ widely between SEPs and non-SEPs, and companies may become involved in a lawsuit over one class of patents, while the other class is covered by an existing license. The use of multiple documents in real estate transactions and estate planning is well-known. It would therefore be beneficial to be able to identify that, at the time documents were registered in an off-chain storage blockchain (e.g., a blockchain that stores only document hash values, rather than the documents themselves), the documents were related as part of an identified set of documents.
The ability to easily and reliably establish that a document (a computer file) has existed as of a certain date, and further that it has not been altered by tampering since that date, has been an elusive target for certain types of documents. Document types for which an easy, reliable date proof has been a particularly elusive goal include 1) documents which have been kept in secrecy since their creation, as well as 2) documents which are retained in an uncontrolled or poorly-controlled environment, such as on a website that is susceptible to easy modification and alteration by computer hackers or even the website owner.
The ability to reliably date prove such documents could provide significant beneficial results. For example, in a patent dispute, if one party attempted to claim earlier development of an invention, by producing documents that had been previously held confidentially as trade secrets, the other side may bring accusations of backdating the documents. Using cryptographic methods as part of the proof that an electronic version of the document existed as of the claimed date, as well as to prove that no information had been added since that date, could reduce cost and uncertainties in comparison with the prevalent method of relying on human recollections and honesty in an adversarial legal proceeding. As used herein, the term document includes both humanly readable documents and other digital files, including data files, executable software programs, and files in encrypted, compressed, and/or fitting defined file formats. The term electronic document includes both word processing files, ASCII text files and other digital files, including data files, executable software programs, and files in encrypted, compressed, and/or fitting defined file formats.
Additionally, if a PTO examiner, performing a prior art search for a pending application, discovered a document on a website that allowed revisions to posted pages and used that document in a 35 U.S.C. § 102 or 103 rejection, the patent applicant could challenge the rejection as relying on an improper reference, because the document may have been revised to include the referenced passages after the application's priority date. The PTO currently has no response to such applicant arguments, unless an examiner is able to find a copy of the contested website document that had been archived in a reliable database prior to the claimable priority date. The PTO and other organizations facing similar document dating issues lack the resources to independently generate and maintain date-provable databases of all potentially valuable internet documents. Some internet document archiving services do exist, but due to storage requirements, these databases archive only a small percentage of available documents. Additionally, the selection of documents for retention is outside the control of most users who would later need to rely on the archive, and further, the purported dates of the archive entries can typically be questioned and contested by opponents in litigation.
A prime example of a failure by others, to solve the problem that it is currently cost-prohibitive to prove the dates of various revisions of documents held in poorly-controlled environments, is that the PTO has policies against using many potentially valuable website pages in 35 U.S.C. §§ 102 and 103 rejections.
This is a significant matter. Either the PTO is inexplicably excluding a large amount of easily-searched information from the examination process, thereby denying patent examiners access to a valuable resource that could simultaneously ease their burden and improve patent quality, or else the PTO's policies are effectively an admission that a large-scale solution for reliably establishing dates for website pages has not been found and is therefore not obvious.
A prime example of a failure by others, to solve the problem that it is currently difficult to prove the dates of documents held in secrecy, is the relatively low adoption rate of trusted timestamping solutions. Some attempts have been made in the prior art to address date proving documents that are held in secrecy. However, these have so far failed to meaningfully solve certain problems and achieve widespread adoption, because they have multiple security vulnerabilities, require multiple conditions that are uncertain to exist, and are subject to compromise at unpredictable times.
Many industry experts, and even cryptographic standards organizations, teach away from the concept that establishing a document date is possible without all interested parties finding a common entity to trust for time keeping. That is, the current paradigm requires that the document author or any other asserting party attempting to establish a document date, and the document challenger must both endorse a single entity's credibility, which cannot have been compromised or lost through unethical action by insiders, malicious activity, accident, or computational advances that render the trust mechanism obsolete.
One of the prior art solutions is to provide a copy of the document to a document archival services provider. At a later time, upon needing to establish the date of the document, the records of the document archival services provider are subpoenaed and used to establish the date that the document was placed in secure, archival storage. Unfortunately, this solution is expensive, due to storage and record-keeping requirements and so, as can be expected, relatively few organizations use such a service. It also has multiple security weaknesses, including potential corruption of the services provider employees; forgery of archival records unknown to the services provider; loss of the document by fire, flood or theft; and that the services provider is out of business at the time its services are needed to verify the document date.
Another prior art solution is to use a timestamp from a trusted timestamping authority (TTSA). The document author, who wishes to preserve a document in secrecy, can hash the document, send the hash value to the TTSA, who combines the submitted hash value with a timestamp, hashes the combination to produce a second hash value, digitally signs the second hash value with a private key, and returns the signed hash value along with the timestamp information to the document author. The document author then stores the signed second hash and timestamp information with the original document.
At a later time, upon needing to establish the date of the document as that indicated by the timestamp, a verification process is performed. The document is hashed again by a party trusted by both the document author and the party challenging the document's asserted date, and the hash value is combined with the timestamp. This combination is then hashed to produce yet another hash value for final verification. In parallel, the digitally signed hash value provided by the TTSA is decrypted with the TTSA's public key, and the result is compared with the final verification hash value. If there is a match, the TTSA's credibility is used as the basis for trusting the document date indicated by the timestamp.
However, this process requires some critical assumptions and carries significant risk. The TTSA must be trustworthy, the TTSA's private key must not have been secretly compromised, and the TTSA's public key must be available from a trusted source at the later date, when the document is challenged. If the TTSA is corrupt, or even if it is trustworthy, but the document challenger is skeptical, then this prior art scheme will not work to convince the challenger of the document's date. Even worse, if the TTSA's private key is ever stolen, all documents, for which the timestamps had been signed by the stolen key, lose their date provability unless some type of remedial action is taken. A mere single careless act by one employee of the TTSA, or only a single successful hacking attempt, is required to defeat this entire prior art trusted timestamping system. Further, similar to the reliance on the document archival services provider remaining in business, if the TTSA ever ceases operations, it may be difficult to prove the date of a document. This is because the TTSA is no longer around to confirm the validity of its public key. Anyone asserting that a document has been timestamped by a defunct TTSA can identify any key as the alleged public key, and the TTSA entity won't exist to refute the assertion, allowing the possibility of a forgery.
Thus, there exists a need to establish a system for reliable date proof and tamper indication of documents, which is not vulnerable to the security weaknesses and risks of the current trusted timestamping and archival processes, and is further easier to use, more reliable, and likely less expensive than using either a TTSA or a document archival services provider.
For a more complete understanding of the present invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
Systems and methods are disclosed which use a blockchain (a.k.a. block chain or edition chain) to enable the establishment of integrity and no-later-than date-of-existence for documents (e.g., generic computer files) even for documents held in secrecy and those stored in uncontrolled environments. Daisy chained records permit linking various blockchain records, to establish that relationships between the various documents (represented by the records) had been asserted as of the date of registration (in the blockchain) of the documents. Example uses that may advantageously employ a blockchain with daisy chained record references include real estate transactions, estate planning, contract negotiations, financial transactions involving multiple stages and/or accounts, and complex deals that aggregate multiple individual documents.
A permissioned blockchain with off-chain storage establishes integrity and no-later-than date-of-existence for documents, leveraging records in which hash values represent documents. After registration, if a document's integrity or date is questioned, the document is hashed again and the new hash value is compared with the record. A provable date-of-existence for the block containing the record establishes a no-later-than date-of-existence for the document. Using multiple hash values renders preimage attacks into multi-dimensional problems, increasing security against quantum computing. If there is no challenge to the document, the document may remain private (confidential) indefinitely. Even if disclosure is needed to prove the document's age and integrity, in some scenarios, disclosure can be limited to an agreed set of trustworthy parties, without becoming public. Compact records and off-chain storage in a secure document corral preserve document confidentiality and ease storage burdens for the distributed blockchain. Permissioning monetizes operations and enforces record content rules, avoiding problematic material (e.g., obscene material, material posing privacy problems, intellectual property rights violations, and digital files containing malicious logic) to ensure long-term viability. That is, the permissioning entity can bar blockchain entries that contain material other than hashes, timestamps, and other authorized data fields, in the correct location with proper content. Thus, obscene and illegal material can be kept out. Additionally, the permissioning entity can limit submissions to submitters who have paid the required fee and/or belong to the proper group (e.g., industry sector) that is serviced by the blockchain. 
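The hash-and-compare step described above can be illustrated with a minimal sketch. It assumes a record that holds two digests (SHA-512 and SHA-1, chosen here only for illustration); the function names `make_record` and `matches_record` are hypothetical and are not taken from the disclosure.

```python
import hashlib

def make_record(document: bytes) -> dict:
    """Build a compact record holding only hash values (off-chain storage).

    The choice of SHA-512 plus SHA-1 is an assumption for this sketch.
    """
    return {
        "sha512": hashlib.sha512(document).hexdigest(),
        "sha1": hashlib.sha1(document).hexdigest(),
    }

def matches_record(document: bytes, record: dict) -> bool:
    """Hash the document again and compare every digest with the record."""
    return make_record(document) == record

# At registration time, only the record would be placed in the blockchain.
registered = make_record(b"confidential design notes, revision 1")

# At verification time, the document is produced and hashed again.
print(matches_record(b"confidential design notes, revision 1", registered))
print(matches_record(b"confidential design notes, revision 2", registered))
```

Because both digests must match at once, an altered document would need to collide under both functions simultaneously, which is the sense in which multiple hash values raise the difficulty of a preimage attack.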
The priority parent application preceded Bitcoin; earlier terms for “block” and “block chain” are “edition” and “edition chain.” Daisy chaining records establishes that relationships existed among various documents as of the blockchain registration dates and can be used to identify when a set of documents, that had been registered in a blockchain with an indication of a relationship among the set, is missing one or more of the documents.
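One way to picture daisy chained records is sketched below. The record layout is purely an assumption made for illustration: each record carries a document hash plus a hypothetical `prev_index` field that points at the previously registered record of the same set. The disclosure does not specify this layout in the present passage, and the names `Record`, `register_set`, and `missing_from_set` are invented for the sketch.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Record:
    index: int                 # position of the record in the ledger
    doc_hash: str              # hash value representing the document
    prev_index: Optional[int]  # hypothetical daisy-chain reference

def register_set(ledger: List[Record], documents: List[bytes]) -> Optional[int]:
    """Register related documents, linking each record to the prior one."""
    prev = None
    for doc in documents:
        rec = Record(len(ledger), hashlib.sha256(doc).hexdigest(), prev)
        ledger.append(rec)
        prev = rec.index
    return prev  # index of the last record in the set

def missing_from_set(ledger: List[Record], last_index: Optional[int],
                     available_docs: List[bytes]) -> List[int]:
    """Walk the chain backward and list records with no matching document."""
    available = {hashlib.sha256(d).hexdigest() for d in available_docs}
    missing = []
    idx = last_index
    while idx is not None:
        rec = ledger[idx]
        if rec.doc_hash not in available:
            missing.append(rec.index)
        idx = rec.prev_index
    return sorted(missing)

ledger: List[Record] = []
docs = [b"deed", b"mortgage", b"title insurance policy"]
last = register_set(ledger, docs)
print(missing_from_set(ledger, last, [docs[0], docs[2]]))  # record of the withheld document
```

Under this assumed layout, a verifier who is handed only part of a registered set can tell from the records alone that a related document has been withheld.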
Additional benefits of the disclosure include a blockchain for which document protection persists beyond the cessation of operations by any business associated with producing the blockchain. No one involved with the disclosed blockchain can either falsify date proof (of any document that did not actually exist as of the provable date-of-existence) or deny date proof for any document with a corresponding record appearing within the blockchain. Thus, any employee of a permissioning entity being accused of corruption does not taint the proofs offered by the blockchain. Verification of a no-later-than date of existence for a document can be accomplished by anyone, without the need for special software to read the blockchain or locate records, contingent only on a copy of the document at issue being available for hashing. Thus, when combined with the off-chain storage, significantly reduced storage requirements, and the benefits of the permissioning entity precluding problematic material, a long-life blockchain is possible. Additional disclosure assists with keeping blockchain operations compliant with legal requirements when an enforceable court order requires deletion of certain material (e.g., a “right to be forgotten” as identified in the General Data Protection Regulation (GDPR)). Such compliance is challenging, if not impossible, for on-chain storage blockchains, such as those used by Bitcoin and Ethereum.
The daisy chain capability enhances other aspects of the disclosure, such as the use of a document corral, a document quarantine (for items not permitted to remain within a document corral), the use of parallel (different speed) blockchains, and a unique self-addressed blockchain registration (SABRe) capability that enables a document to identify the location of its record within a blockchain, and yet still produce a hash value (message digest) that is within the record it references. Daisy chaining enables identification of sets of documents within a document corral, without either bloating the blockchain or requiring an external data item to track. Daisy chaining also enables identification of the disposition of quarantined documents. Further, daisy chaining also enables identifying an earlier date-of-existence for “early” documents that leverage the advantageous SABRe capability.
Terms are often used incorrectly in the information assurance field, particularly with regard to tamper detection. For example, the term “tamper proof” is often used incorrectly. A tamper proof article is effectively impervious to tampering, which is often described as unauthorized alteration. Few articles qualify for such a designation. “Tamper resistant” is also often used incorrectly when a more appropriate term would be “tamper evident”. A tamper resistant article is one for which an act of tampering is difficult, although possible, to accomplish. A tamper evident article is one for which tampering is detectable, independent of whether the tampering itself is easy or difficult to accomplish.
A document associated with an integrity verification code (IVC), for example a hash value from the secure hash algorithm (SHA) family of functions, is better described as tamper evident, rather than tamper proof or tamper resistant. A document dating list (DDL), for example an embodiment of a public electronic document dating list (PEDDaL™), which comprises a listing of IVCs optionally associated with timestamps, provides a repository of information that is useable in ascertaining whether a particular document has been tampered with. A description of IVC generation is provided in
Embodiments of the invention solve problems that have been previously unsolved, for example, proving the date of a document and the lack of any alteration when a challenger of a document date does not trust the timestamping provider or refuses to acknowledge the validity of a timestamp. Embodiments of the invention thus provide a surprising result that contradicts the teachings of the prior art: The need for trusting a timestamping authority can be eliminated in many situations, even when a document is stored in secrecy under the exclusive control and possession of an untrustworthy party.
Embodiments of the invention solve another problem that has been previously unsolved: An asserted date of a document, and the lack of any alteration, can be established even when a document has been stored in an uncontrolled environment. Embodiments of the invention thus provide another surprising result: Website pages stored on a website controlled by any website operator can be reliably dated at a later time, and proven to have remained unaltered, even if the website operator is untrustworthy.
Using an embodiment of the invention, any entity, for example the PTO, a search engine operator, or a litigation party, can reliably assert and prove a date that a website document was available to the public, even without the expense of maintaining an independent archival copy of the document or using either a trusted document archival service or a trusted timestamping authority (TTSA).
Referring now to the figures,
Upon a need arising for the author to establish the timestamping date of document 103, prior art system 200 illustrated in
The intermediary separates the components of document record 112 into document 103, timestamp 106, and encrypted hash value 111. Document 103 is hashed by hash function 104, which is a copy of the same function originally used by the document author to generate document hash value 105. This produces second document hash value 205, which should be identical to the earlier-generated document hash value 105, used in generating timing hash value 108 and then encrypted hash value 111. Second document hash value 205 is combined with timestamp 106 and hashed using hash function 107, which is a copy of the same function originally used by TTSA 102 to generate timing hash value 108. This produces test hash value 208, which should be identical to earlier timing hash value 108, used in generating encrypted hash value 111. Encrypted hash value 111 is decrypted with public key decryption module 209 using the public key 210 of TTSA 102 to produce verification value 211. Public key decryption module 209 and public key 210 correspond to public key encryption module 109 and private key 110, respectively. If test hash value 208 matches verification value 211, then the intermediary has established at least two things: test hash value 208 matches timing hash value 108, and public key 210 corresponds to private key 110. Upon both of these conditions being true, the TTSA 102's credibility can be used to prove the validity of timestamp 106. If either condition is untrue, or there is another problem with prior art system 200, test hash value 208 will differ from verification value 211, and the date of timestamp 106 will be unverified.
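The verification flow just described can be sketched as follows. This is a toy illustration only: it uses textbook RSA built from two known Mersenne primes with no padding, which is not secure, and SHA-256 stands in for hash functions 104 and 107, whose actual identities are not stated. The function names are hypothetical; the comments map variables onto the reference numerals used above.

```python
import hashlib

# Toy textbook-RSA key pair (illustrative only, NOT secure).
P, Q = 2**127 - 1, 2**89 - 1           # two known Mersenne primes
N = P * Q                              # modulus shared by both keys
E = 65537                              # public exponent (public key 210)
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (private key 110); Python 3.8+

def digest_int(data: bytes) -> int:
    """Hash data and reduce it below the toy modulus so it can be signed."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def ttsa_sign(document_hash: bytes, timestamp: bytes) -> int:
    """TTSA side: combine hash and timestamp, hash again, sign the result."""
    timing_hash = digest_int(document_hash + timestamp)   # timing hash value 108
    return pow(timing_hash, D, N)                         # encrypted hash value 111

def verify(document: bytes, timestamp: bytes, encrypted_hash: int) -> bool:
    """Intermediary side: recompute the test hash and compare it."""
    second_document_hash = hashlib.sha256(document).digest()    # value 205
    test_hash = digest_int(second_document_hash + timestamp)    # value 208
    verification_value = pow(encrypted_hash, E, N)              # value 211
    return test_hash == verification_value

document = b"lab notebook entry"
timestamp = b"2001-02-03T04:05:06Z"
signature = ttsa_sign(hashlib.sha256(document).digest(), timestamp)
print(verify(document, timestamp, signature))
print(verify(document + b" (edited)", timestamp, signature))
```

The sketch also makes the dependency visible: anyone who learns `D` can produce a valid `encrypted_hash` for any document and any timestamp, which is why a single key compromise undermines every timestamp signed with that key.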
It is important to note that the usefulness of prior art systems 100 and 200 is degraded if any of the following occur: 1) TTSA 102 ceases business operations and cannot certify its public key; 2) TTSA 102 ceases business operations and its public key cannot be found; 3) an employee of TTSA 102 is discovered to be corrupt; 4) private key 110 is stolen by an intruder or computer hacker; 5) private key 110 is compromised through social engineering; 6) private key 110 is cracked through computing technology advances; 7) the timestamping equipment of TTSA 102, generating timestamp 106, is suspected of inaccuracies; or 8) a challenger refuses, for any reason, to acknowledge the credibility of TTSA 102.
It should be noted that, in many situations, the credibility of TTSA 102 may be regional, such as generally accepted in some regions while generally rejected in others. An example of this would occur if TTSA 102 operated in a first country and a document challenger came from a second country, which had a long history of political animosity and distrust toward the first country. In such a situation, prior art systems 100 and 200 would have little practical value, even if operated with flawless integrity and accuracy.
Prior art systems 100 and 200 cannot protect against accidental key compromises, TTSA employee corruption, or even arbitrary, baseless distrust of TTSA 102. As a result, prior art systems 100 and 200 have experienced limited rates of adoption.
Embodiments of system 300 enable the proof of asserted document dates and proof of the absence of tampering, even for documents held in secrecy and those stored in uncontrolled environments, without requiring a challenger to trust a timestamping authority or the records of a document archival service. TTSA 102 may be used to generate timestamps, operating in the capacity shown for a TSA 302, but even if TSA 302 loses credibility or ceases business operations, an asserted document date may still be established.
In system 300, a first record submitter 301 exchanges information with TSA 302, which provides a DDL service. Two editions of a DDL are illustrated in
First record submitter 301 obtains a first document 303 and processes it with an IVC generator 304 to produce an IVC 305, which represents at least a portion of first document 303. First record submitter 301 may or may not be the author of first document 303. In some embodiments, IVC 305 represents a collection of multiple documents. In some embodiments, first record submitter 301 obtains IVC generator 304 from TSA 302. In some embodiments, IVC generator 304 is not local to first record submitter 301, but is instead located on remote computing resources requiring that a copy of document 303 be sent for processing and generation of IVC 305. IVC 305 is communicated to TSA 302. In some embodiments, additional information accompanies IVC 305, such as an identification of IVC generator 304, IVC generation rules, software version, a timestamp generated by a DDL submitter, and user account information, so that TSA 302 can collect payment for providing DDL services. Upon receiving IVC 305, TSA 302 generates a timestamp 306 and combines it with IVC 305 to produce a document record 305a. Document records generated by TSA 302, such as document record 305a, may contain extra information, including an identification code for the submitter, unless the submission process is anonymous. Other possible information includes an indexing or a record count number, and other information that may enhance the utility of a DDL edition. A record may include information enabling trusted timestamping validation, for example a copy of a signed hash, such as encrypted hash value 111.
A second record submitter 307 obtains a second document 308 and processes it with an IVC generator 309 to produce an IVC 310, which represents at least a portion of second document 308. Second record submitter 307 may or may not be the author of second document 308. IVC generator 309 may be similar in function to IVC generator 304, although this is not a requirement. As with the generation of IVC 305, the IVC processing may be remote, and the resulting IVC may actually represent more than just a single document. IVC 310 is communicated to TSA 302, and may be accompanied by additional information. Upon receiving IVC 310, TSA 302 generates a timestamp 311 and combines it with IVC 310 to produce a document record 310a. Both record 305a and record 310a are added to first DDL edition 312, which is written to a media 313 and sent to both first record submitter 301 and to second record submitter 307. First DDL edition 312 may contain additional records, such as records from many other submitters, and may be closed for writing to media 313 on a regular schedule, such as hourly, daily, weekly, monthly or annually, or when reaching a certain size, such as large enough to fill media 313 to some threshold. In the illustrated embodiment, media 313 is a computer readable medium, shown as a compact disk (CD) or a digital versatile disk (DVD), although it can comprise magnetic storage, random access memory (RAM), either volatile or non-volatile, or another form of data storage. In some embodiments, media 313 is a permanent, read-only media after it has been written with first DDL edition 312. In some embodiments though, media 313 may be substituted with a humanly-readable media, which may also be suitable for an optical character recognition (OCR) process. In some embodiments, first DDL edition 312 is sent out electronically, such as in an email or an equivalent, to first and second record submitters 301 and 307, in addition to others.
With the arrangement illustrated in
On a large scale, many thousands, or even millions, of people are put into a position of being able to provide evidence of the existence and absence of tampering for millions of documents, or even more, without ever knowing their contents. In order to establish a date at a later time though, at least some of the people or entities involved will need to keep records indicating the date at which a copy of first DDL edition 312 was obtained. However, records suitable for proving past dates of certain events, such as having received an item in the mail, are often kept in the ordinary course of business by many entities. This existing activity can be leveraged at a later time, when an asserted date and integrity for first document 303 and/or second document 308 needs to be established.
When providing DDL service, TSA 302 may require that a submitter assign any copyrights in the components of a record to TSA 302, and may further copyright DDL editions. TSA 302 may distribute media 313 and/or other copies of DDL edition 312 free or for a fee. TSA 302 may engage the services of trusted document archival services providers for retaining copies of media 313, or even use one or more TTSAs to timestamp DDL editions in accordance with system 100, shown in
TSA 302 additionally processes first DDL edition 312 with an IVC generator 314 to produce an IVC 315, which represents at least a portion of first DDL edition 312. IVC generator 314 may be similar in function to IVC generator 304, although this is not a requirement. IVC 315 is combined with a timestamp 316 to produce a document record 315a. In the illustrated embodiment, at least a portion of record 315a is sent to a public record 317, for example by publishing a notice in the classified advertisement section of a newspaper listing all or a substantial part of IVC 315. Timestamp 316 may also be included in the submission to public record 317. Other public recording systems may be used in addition to or in place of a newspaper announcement. Some DDL editions, however, may be limited to distribution only among submitters or other defined classes of recipients.
A third record submitter 318 obtains a third document 319, and processes it with an IVC generator 320 to produce an IVC 321, which represents at least a portion of third document 319. Third record submitter 318 may or may not be the author of third document 319. IVC generator 320 may be similar in function to IVC generator 304, although this is not a requirement. As with the generation of IVC 305, the IVC processing may be remote, and the resulting IVC may actually represent more than just a single document. IVC 321 is communicated to TSA 302, and may be accompanied by additional information. Upon receiving IVC 321, TSA 302 generates a timestamp 322 and combines it with IVC 321 to produce a document record 321a. It should be understood that, although IVCs 305, 310, 315 and 321 are described in sequence, the only requirement for the order of generation is that IVCs 305 and 310 be generated prior to IVC 315, so that IVC 315 may represent them. It should also be understood that the reference to documents, such as for documents 103, 303, 308, and 319 is a generic term, and includes any type of computer file suitable for generating an IVC, including executable computer programs and data files.
Record 315a and record 321a are added to second DDL edition 323, which is written to media 324 and sent to third record submitter 318. As with distribution of first DDL edition 312, distribution of second DDL edition 323 may take many forms and include recipients other than IVC submitters. In some embodiments, one or more submitters may not receive a copy of a DDL edition containing their submitted IVC, but may instead rely on the widespread distribution of the DDL edition to find a copy at a later time, if needed.
By including IVC 315 in second DDL edition 323, second DDL edition 323 then provides evidence of the existence and integrity of first DDL edition 312 and therefore, all documents represented by first DDL edition 312. By iterating this process, each subsequent DDL edition builds upon prior submissions, becoming a cumulative record. A series of DDL editions can thus be chained, so that anyone possessing a copy of a particular DDL edition can then infer the existence and integrity of all DDL editions earlier in the chain, up through the initial DDL edition, which may be earlier than first DDL edition 312.
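The chaining of editions can be sketched in a few lines. The sketch assumes that an edition is serialized as a JSON list of hexadecimal IVCs and that SHA-512 serves as the IVC generator; both choices, along with the names `ivc`, `close_edition`, and `chain_is_intact`, are assumptions made for illustration rather than details of the disclosure.

```python
import hashlib
import json
from typing import List, Optional

def ivc(data: bytes) -> str:
    """Stand-in IVC generator (SHA-512 is an assumption for this sketch)."""
    return hashlib.sha512(data).hexdigest()

def close_edition(document_ivcs: List[str],
                  previous_edition: Optional[bytes] = None) -> bytes:
    """Serialize an edition, including the IVC of the prior edition if any."""
    records = list(document_ivcs)
    if previous_edition is not None:
        # Analogous to placing record 315a into second DDL edition 323.
        records.insert(0, ivc(previous_edition))
    return json.dumps(records).encode()

def chain_is_intact(editions: List[bytes]) -> bool:
    """Check that each edition contains the IVC of the edition before it."""
    for earlier, later in zip(editions, editions[1:]):
        if ivc(earlier) not in json.loads(later):
            return False
    return True

first = close_edition([ivc(b"document 303"), ivc(b"document 308")])
second = close_edition([ivc(b"document 319")], previous_edition=first)
print(chain_is_intact([first, second]))

# An altered copy of the first edition no longer matches the later edition.
forged = close_edition([ivc(b"document 303, altered"), ivc(b"document 308")])
print(chain_is_intact([forged, second]))
```

A holder of only the second edition can therefore vouch for the first edition and, through it, for every document the first edition represents.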
One possible example of a DDL record format is given by the following 1024-bit (1 Kb) sequence, although other record formats may be used:
Bits 1-512 (512): SHA-512 message digest;
Bits 513-672 (160): SHA-1 message digest;
Bits 673-696 (24): identification code for hash functions and software version;
Bits 697-760 (64): timestamp in clear text;
Bits 761-952 (192): encrypted timestamp record (signed TTSA record);
Bits 953-968 (16): identification code for timestamp source (TSA or TTSA);
Bits 969-984 (16): reserved;
Bits 985-1024 (40): record index.
Bits 1-696 of the record are generated by the IVC submitter, and TSA 302 provides the remainder, possibly obtaining the TTSA record from an outside TTSA such as TTSA 102. The timestamp may be a simple count of the number of seconds elapsed since a defined start time, or may be a different value. In order to include a signed TTSA record in a compact allocated space, it may require modified generation compared with prior art methods, if the TTSA record is otherwise too long. One example is that 64 bits of the timestamp, 64 bits from a portion of the SHA-512 message digest, and 64 bits from a portion of the SHA-1 message digest, for a total of 192 bits, are encrypted with the TTSA's private key. The record index may be cumulative, or may be reset from one DDL edition to the next. Any fields not used may be left blank.
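The example record layout can be exercised with a short packing sketch. The field widths are taken from the list above; the field names, the big-endian ordering with bit 1 as the most significant bit, and the zero-filling of unused fields are assumptions made for illustration.

```python
import hashlib

# (field name, width in bits), in the order listed in the example format.
FIELDS = [
    ("sha512", 512), ("sha1", 160), ("hash_id", 24), ("timestamp", 64),
    ("signed_ttsa", 192), ("source_id", 16), ("reserved", 16), ("index", 40),
]

def pack_record(values: dict) -> bytes:
    """Pack integer field values into a 128-byte (1024-bit) record."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)        # unused fields are left as zero
        if value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word.to_bytes(128, "big")

def unpack_record(record: bytes) -> dict:
    """Recover the integer field values from a packed record."""
    word = int.from_bytes(record, "big")
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values

document = b"example document"
record = pack_record({
    "sha512": int.from_bytes(hashlib.sha512(document).digest(), "big"),
    "sha1": int.from_bytes(hashlib.sha1(document).digest(), "big"),
    "hash_id": 1,                  # hypothetical code for the hash pair
    "timestamp": 1_700_000_000,    # seconds since a defined start time
    "index": 42,
})
print(len(record) * 8)                  # total record size in bits
print(unpack_record(record)["index"])
```

In this layout the submitter would supply the first three fields (bits 1-696) and the TSA would fill in the remainder.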
The use of multiple hash function versions helps preserve trust in the record in the event that one of the hash functions is cracked. Another option is to nest different hash functions, and append a prior-calculated hash value to a document when it is hashed at a later time, with the other algorithm. As an example, bits 1-672 could be {S2(file+S1(file))+S1(file+S2(file))}, where S1 is SHA-1 and S2 is SHA-2. Other IVC generators may be used, including ones with differently sized message digests than those used in the example.
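The nested example can be written out directly. The sketch below assumes that "+" denotes byte concatenation and that S2 is SHA-512 (a member of the SHA-2 family), which yields the 512 + 160 = 672 bits of the example; the function names are illustrative only.

```python
import hashlib


def s1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()      # 160-bit digest


def s2(data: bytes) -> bytes:
    return hashlib.sha512(data).digest()    # 512-bit digest (assumed SHA-2 variant)


def nested_ivc(file_bytes: bytes) -> bytes:
    """Return S2(file + S1(file)) + S1(file + S2(file)), 672 bits in total."""
    first = s2(file_bytes + s1(file_bytes))
    second = s1(file_bytes + s2(file_bytes))
    return first + second
```

With this construction, each half of the result depends on both hash functions, so finding a collision in one function alone does not reproduce the full 672-bit value.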
System 300 creates a multitude of disinterested, potential third-party witnesses having evidence that can later be used to establish that documents 303, 308 and 319 existed, and have not since been modified, as of the dates that the applicable one of DDL editions 312 and 323, or a later chained edition, was obtained. The business records of one of these disinterested parties can then be used by one of record submitters 301, 307 and 318 to prove the date that the DDL edition was received. This can be accomplished without unnecessarily disclosing the contents of the documents involved, preserving secrecy.
Upon the need arising for record submitter 301 to establish a date for document 303, one or more of systems 400, 500 or 600, illustrated in
If challenger 402 is the same entity as record submitter 307, then challenger 402 has possession of media 313 and, presumably, business records indicating when media 313 was received. In this situation, records maintained under the control of challenger 402 actually provide dispositive evidence regarding the claim being challenged, the asserted date and/or integrity of document 303. This situation may not be entirely improbable if, for example, record submitter 301 and challenger 402, a.k.a. record submitter 307, both operate in an industry that uses the services of TSA 302 for intellectual property (IP) protection or other record-keeping.
If, however, challenger 402 does not have possession of media 313, TI 401 requests that challenger 402 obtain a copy of media 313 from any source trusted by challenger 402 to maintain reliable records. That is, challenger 402 can select the source for a copy of media 313 from any entity possessing a copy, and is not limited to trusting the records of TSA 302, TI 401, or record submitter 301. However obtained, TI 401 is illustrated as possessing a copy of media 313, or at least a copy of IVC 305. In the illustrated embodiment, TI 401 identifies record 305a on media 313, possibly under instructions from record submitter 301, since record submitter 301 is likely to know either the value of IVC 305, or else a record index number or some other way to identify record 305a on media 313 and/or any other copy of first DDL edition 312.
Because media 313 represents IVCs for multiple documents from multiple submitters, there are many independent entities, in addition to record submitter 301, who have an interest in establishing the date on which media 313 was written and distributed. One of those parties might actually be challenger 402, which is a scenario that is not exploitable by prior art systems 100 and 200. By submitting IVC 305 to first DDL edition 312, record submitter 301 is able to do something not facilitated by prior art systems 100 and 200: leverage the predictable self-interests of other entities to assist in pursuing the interests of record submitter 301. Embodiments enable another fundamentally different operation over the prior art: an IVC used to establish an asserted date may be one that is stored outside the control of the entity asserting the date. It should be understood, however, that in some embodiments, a copy stored by record submitter 301 may be used, for example, if challenger 402 accepts the reliability of that copy. In contrast with prior art system 200, which relies on a hash value which is stored in record 112 under the control of the entity asserting a date for document 103,
TI 401 independently generates an IVC 405 from a copy of document 303, using a copy of IVC generator 304, which was originally used to produce IVC 305. Although record submitter 301 is illustrated as providing a copy of document 303, TI 401 may obtain the copy of document 303 from another source possessing one, possibly challenger 402 or an independent source. TI 401 may have already been in possession of a copy of IVC generator 304, or may have requested one from TSA 302. If record 305a contained an identification of IVC generator 304, and possibly a specific software version in the case that IVC generator 304 contained an implementation flaw, TI 401 would have the information to select IVC generator 304 from among a collection of possible IVC generators. For example, IVC generator 304 may be SHA-1; SHA-2, which comprises SHA-224, SHA-256, SHA-384 and SHA-512; MD5; another hash function; or any other function suitable to generate a value that can be later used for an integrity decision. TI 401 then compares the provided copy of IVC 305 with independently generated IVC 405 using comparison processor 406. Comparison processor 406 may be a computing device performing an equality check, or could be a simple human reading of two values on a video display or in printed form. In some embodiments, if the copy of IVC 305 from record 305a is only a partial section, that section is compared with the corresponding partial section of IVC 405. Responsive to a match, TI 401 issues validation certificate 407, and provides it to challenger 402. In some situations, for example during litigation, validation certificate 407 may be provided to a court.
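The regenerate-and-compare step, including the partial-section case, might look like the following sketch. It assumes the record identifies the generator by a short string and that a partial IVC is a contiguous byte range starting at a known offset; the `GENERATORS` registry and the function name are hypothetical.

```python
import hashlib
import hmac

# Hypothetical registry mapping a generator ID from the record to a hash function.
GENERATORS = {
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
}


def verify_document(document: bytes, recorded_ivc: bytes, generator_id: str,
                    offset: int = 0) -> bool:
    """Regenerate the IVC and compare it with the recorded value.

    If the record holds only a section of the IVC, `offset` says where that
    section starts, and only the corresponding section is compared.
    """
    regenerated = GENERATORS[generator_id](document).digest()
    section = regenerated[offset:offset + len(recorded_ivc)]
    # Reject an empty recorded value; compare_digest handles unequal lengths.
    return len(recorded_ivc) > 0 and hmac.compare_digest(section, recorded_ivc)
```

A `True` result corresponds to the match on which a validation certificate would be issued; it says nothing by itself about the date, which still depends on the timestamp source or on records showing when the media was distributed.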
Validation certificate 407 validates that IVC 405, independently generated by TI 401, matches IVC 305, which had been provided for the comparison. Although validation certificate 407 may mention the time and date indicated by timestamp 306, this time and date is generally not certified as accurate, unless timestamp 306 came from a TTSA, or another method of assuring accuracy is available. Trusting a timestamp from a TTSA may require that the timestamp, or an accompanying copy, be encrypted with the TTSA's private key. In some embodiments, establishing the asserted date of document 303 requires further effort, including examining records that indicate the date media 313 was written, or the date that a copy of first DDL edition 312 was available, if media 313 is not used. In such embodiments, validation certificate 407 is part of a collection of evidence which, when examined together, establishes the date of document 303, and its integrity, as of the date that reliable records indicate that IVC 305 had been distributed outside the control of record submitter 301.
In some situations, if an IVC was printed on a face of document 303, for example in accordance with the teachings of U.S. patent application Ser. No. 12/053,560, the printed IVC may be used for an initial comparison with IVC 305, and then verified against IVC 405, if necessary. In some situations, if document 303 had entered the public domain, or record submitter 301 felt no need to keep the contents of document 303 secret from document challenger 402, and document challenger 402 could be trusted to perform an independent verification properly, record submitter 301 can optionally simply ensure that document challenger 402 has an intact copy of document 303, so that document challenger 402 performs the role of TI 401. However, as illustrated in
If challenger 501 is the same entity as record submitter 318, then challenger 501 has possession of media 324 and, presumably, business records indicating when media 324 was received. In this situation, records maintained under the control of challenger 501 actually provide dispositive evidence regarding the claim being challenged, the asserted date and/or integrity of document 303. However obtained, TI 401 is illustrated as possessing copies of media 313, media 324, document 303, IVC generator 304, and IVC generator 314. TI 401 identifies record 305a in first DDL edition 312, which is on media 313, and record 315a in second DDL edition 323, which is on media 324.
TI 401 independently generates an IVC 505 from the copy of document 303, using the copy of IVC generator 304, which was originally used to produce IVC 305, and an IVC 515 from the copy of first DDL edition 312, using the copy of IVC generator 314, which was originally used to produce IVC 315. TI 401 compares the provided copy of IVC 305 with independently generated IVC 505 using comparison processor 506, and the provided copy of IVC 315 with independently generated IVC 515 using comparison processor 516. Comparison processors 506 and 516 may be similar to comparison processor 406. Upon a match from comparison processor 506, TI 401 issues validation certificate 507, and provides it to challenger 501. Upon a match from comparison processor 516, TI 401 issues validation certificate 517, and provides it to challenger 501. In some situations, one or more of validation certificates 507 and 517 may be provided to a different entity. Validation certificates 507 and 517 validate that an independently generated IVC matches an IVC which had been provided for comparison. Proof of an asserted date for document 303 can be found using either of timestamps 306 and 316, if issued by a TTSA, or using the business records of the sources of media 313 and/or media 324.
If challenger 501 does not possess a copy of media 324 containing second DDL edition 323, or does not trust a copy available from another entity, but instead possesses or trusts only a later DDL edition, the process described for system 500 can be iterated from the earliest DDL edition that challenger 501 does trust, going backwards through copies of the intermediate DDL editions until first DDL edition 312 is reached. If TSA 302, or another entity, retains archived copies of the various IVC generators used for the DDL records, TI 401 will be able to reproduce all intermediate stage IVCs. This task is eased if each DDL record indicates the specific IVC generator and software version used. In the worst case, challenger 501 will need to admit that IVC 305 had been generated prior to the first DDL edition trusted by challenger 501, by at least the amount of time needed to compile each of the intermediate DDL editions.
TI 401 independently generates an IVC 605 from the copy of document 303, using a copy of IVC generator 304, which was originally used to produce IVC 305, and an IVC 615 from a copy of first DDL edition 312, using a copy of IVC generator 314, which was originally used to produce IVC 315. TI 401 compares the provided copy of IVC 305 with independently generated IVC 605 using comparison processor 606, and the provided copy of IVC 315 from public record 317 with independently generated IVC 615 using comparison processor 616. Comparison processors 606 and 616 may be similar to comparison processor 406. Upon a match from comparison processor 606, TI 401 issues validation certificate 607, and provides it to challenger 601. Upon a match from comparison processor 616, TI 401 issues validation certificate 617, and provides it to challenger 601. In some situations, one or more of validation certificates 607 and 617, which validate that an independently generated IVC matches an IVC which had been provided for comparison, may be provided to a different entity. Proof of an asserted date for document 303 can be found using either of timestamps 306 and 316, if issued by a TTSA, the business records of the source of media 313, and/or using public record 317.
Thus, systems 300, 400, 500 and 600 allow for establishing an asserted document date and integrity when using a timestamping authority that is not trusted by a challenger. Relaxing the provable date from timestamp date 703 to one of independent possession date 705, provable public disclosure date 706, and the date of a later DDL edition, along with leveraging the records of disinterested parties, enables embodiments of systems 300, 400, 500 and 600 to function without the security vulnerabilities and many of the other risks inherent in the prior art systems.
In many situations, the relaxed date will suffice. That is, in many situations, it is not required to prove the exact date that a document was timestamped, but rather it is enough to prove that a document exceeds some lesser age. For example, when using a DDL to date a document used in a PTO office action rejection of a pending application, it may not be necessary to prove that a specific document is 15 years old versus 14 years old, but rather that the document existed at any time prior to the application priority date, which may be considerably more recent. This relaxing of requirements enables the system to operate more robustly and with reduced need for trust.
Illustrated system 800 comprises an intranet 801, although other computer networks may be used. A user computer 802, coupled to intranet 801, is used to create document 803, which may be a digital version of one or more of documents 303, 308 and 319. Also coupled to intranet 801 are a network printer 804, an email inbox 805, a control node 806, and a server 807, acting as a gateway to internet 808 with security module 809 as the gatekeeper. Control node 806 is configured to intercept document 803 as it is sent from user computer 802 to printer 804, email inbox 805, control node 806 itself or an outside email address across internet 808. Printer 804 may be used to print one or more of documents 303, 308 and 319 and may further comprise a document scanning function for rendering images suitable for an OCR process.
Control node 806 comprises an IVC generator 810, a modification rule module 811, and a file parser 812. File parser 812 identifies the type of document 803, generates at least one original data sequence, selects a type-specific modification rule set from modification rule module 811, and calls IVC generator 810 to produce an IVC. In some embodiments, IVC generator 810 excludes elements from the IVC calculation that are not printably determinable from a printed copy of document 803. It should be understood, however, that alternative configurations of control node 806 can perform the same required functions. Control node 806 illustrates an embodiment of a system described in U.S. patent application Ser. No. 12/053,560, “DOCUMENT INTEGRITY VERIFICATION”.
Upon generation of the IVC, control node 806 communicates the IVC to an embodiment of a PEDDaL™ system running a DDL node 813. DDL node 813 hosts an IVC database 814, a timing module 815, and an account database 816. DDL node 813 is coupled to a media writer 819, capable of writing at least a portion of IVC database 814 to media 313 and/or media 324. IVC database 814 comprises DDL editions, for example first DDL edition 312, second DDL edition 323 and/or other editions. IVC database 814 enables the author of document 803 to prove the existence of document 803 as of the date that a DDL edition of IVC database 814 became public. In some cases, for example if DDL editions are released daily or more often, this may be the same date that document 803 is created. The process for creating a database record for document 803 is automated, and occurs when document 803 is sent to printer 804, email inbox 805, or any other destination monitored by control node 806, provided the. However, IVC database 814 does not betray the contents of document 803 to the public, because IVC generator 810 is a one-way function. It should be noted that, while the illustrated embodiment shows the use of IVCs generated in accordance with modification rules module 811, some embodiments of IVC database 814 can store prior art hash values.
Using database 814 is then easy for a user, due to the automated operation of the illustrated system. A registered user merely sends document 803 to a printer or email inbox, such as printer 804 and email inbox 805, which has been designated as a recipient node for triggering a database entry by an administrator of intranet 801, or places the document in a certain directory accessible by control node 806, and the record generation is automated. For example, a large company may set up a designated printer 804 in an engineering department, and instruct employees to print certain technical reports to printer 804 or use a certain facsimile machine for incoming and/or outgoing fax messages that are to be processed. For a fax, the fax bit stream is used to generate the IVC, but may need to be stored in an archive. As another example, a law firm may instruct its support staff to email copies of PDF documents filed with the US PTO to a designated email inbox 805, so that if a document date is later contested, an independent database can at least verify the document's existence as of a certain date. As another example, a company may instruct its employees to place important documents in a specially titled folder on their computer or else in a directory on a network node. In some embodiments, control node 806 can further determine that a received document is sent from a previously identified computer outside security module 809 of server 807, such as computer 817, when an authorized user is logged into intranet 801 from a remote location. However, control node 806 may further avoid processing print jobs or documents sent to printer 804, email inbox 805, or a designated folder by unauthorized parties, in order to avoid triggering undesired IVC generation and database entry costs.
In operation, an exemplary system may function as follows: Upon a user sending document 803 to a monitored destination, control node 806 sends a message with account identification (ID) to DDL node 813. DDL node 813 compares the retrieved time information from timing module 815, and using the account ID, identifies the responsible entity in account database 816. Other networks 818 can comprise another control node, which automatically interacts with DDL node 813, similarly as control node 806. Account database 816 enables identification of the responsible party to bill for database usage. DDL node 813 can operate on either a per-use or a capacity subscription basis, similar to the way a communication service permits a user to contract for a given number of messages on a monthly basis, and charges for extra messages above that number.
If DDL node 813 determines that a requested database entry is from an authorized database user account, it retrieves time information from timing module 815. DDL node 813 then sends the time information, and optionally, a security code to use when submitting a database entry. Control node 806 timestamps the generated IVC using the time information received from the database node or optionally, its own internal clock, and returns the IVC, along with an optional time stamp and response security code. DDL node 813 timestamps the incoming information, using information from timing module 815, and updates IVC database 814 with the received IVC and at least one timestamp. Submitter ID information may optionally be added to IVC database 814. DDL node 813 then sends an acknowledgement of the IVC addition, so that control node 806 does not need to resend the information after a time-out. DDL node 813 and control node 806 exchange fee information, and DDL node 813 updates account database 816 to increment the number of IVC submissions from the account holder associated with control node 806. At some point, the owner of control node 806 is billed for the database services. Upon some event, perhaps IVC database 814 reaching a certain size, or the lapse of a predetermined amount of time, a permanent computer readable medium, such as optical media, containing a copy of IVC database 814, is sent to at least some of the multiple contributors to IVC database 814. Additional copies may be sent to other data archival service providers and libraries. Older versions of IVC database 814 may remain available over internet 808 for searching purposes.
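The exchange between the control node and the DDL node can be sketched as a pair of calls. The class below is a toy, in-memory stand-in under several assumptions: the security code is simply echoed back, SHA-512 serves as the IVC generator, and fee exchange is reduced to a per-account counter. All class, method, and field names are hypothetical.

```python
import hashlib
import secrets
import time


class DDLNode:
    """Toy stand-in for a DDL node; not the described implementation."""

    def __init__(self, accounts):
        self.accounts = dict(accounts)   # account ID -> submission count
        self.ivc_database = []           # records in the open edition
        self.pending_codes = {}          # account ID -> outstanding security code

    def request_entry(self, account_id):
        """Authorize the account, then return time information and a security code."""
        if account_id not in self.accounts:
            return None
        code = secrets.token_hex(8)
        self.pending_codes[account_id] = code
        return {"time": int(time.time()), "security_code": code}

    def submit(self, account_id, ivc, submitter_time, security_code):
        """Timestamp and store the IVC; return an acknowledgement or None."""
        expected = self.pending_codes.pop(account_id, None)
        if expected is None or expected != security_code:
            return None
        self.ivc_database.append({
            "ivc": ivc,
            "submitter_time": submitter_time,
            "node_time": int(time.time()),
        })
        self.accounts[account_id] += 1   # counted for later billing
        return {"ack": True, "record_index": len(self.ivc_database) - 1}


def control_node_submit(node, account_id, document):
    """Control-node side: request time, generate the IVC, submit, await the ack."""
    grant = node.request_entry(account_id)
    if grant is None:
        return None
    ivc = hashlib.sha512(document).hexdigest()
    return node.submit(account_id, ivc, grant["time"], grant["security_code"])
```

A caller that receives `None` instead of an acknowledgement would be the one to retry after a time-out.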
At a later time, the author of document 803 may be accused of trade secret theft, and may wish to use document 803 to prove prior conception of an invention to the accuser. Consider, for the following example, the convenient case that both the author of document 803 and the accuser submitted IVCs to the same version of IVC database 814, and that the accuser kept accurate date records of the receipt of the media. The accuser then has possession of a copy of a portion of IVC database 814, which can be used to prove that document 803 existed, at the latest, as of the time that the accuser received the media. The author may provide a printed paper copy of document 803, or a copy in another format, to the accuser, along with an assertion of the date at which document 803 was allegedly created, and instructions on where to find the IVC in the accuser's own copy of the old IVC database. The accuser can then independently generate the IVC, even from a paper copy of document 803, and verify that it matches a record in IVC database 814. Upon this occurrence, the accuser must then admit to the existence of document 803 prior to the date that the accuser's own internal records indicate receipt of the media containing IVC database 814. Other options exist when the convenient case described above does not exist, such as a third party performing the verification, using a copy of the proper edition of IVC database 814 from a trusted archival source. This option allows the verification of the date of an important document, even without disclosing the contents outside trusted parties, and can thus provide an efficient, reliable alternative to many IP litigation procedures. Thus, a large organization can automatically, and cost-effectively, provide for date-proving documents generated by its employees.
An embodiment of an automated IVC generation system receives a file, generates an IVC, and communicates the IVC to a DDL. The system may further communicate account ID information to the DDL. The system may further communicate a security code to the DDL. The system may further communicate with the DDL node to obtain an IVC generation module, and communicate to the DDL indicia of the IVC generation module and options used. The system may further generate a second IVC with different IVC generation conditions, such as using different rules or a different algorithm. The system may further generate an IVC according to modification rules, and may further parse the file, based on the file type. The system may further resend information if an acknowledgment from the DDL node is not received within a time-out period. The system may further timestamp information prior to sending it to the DDL node. The system may further request a time reference from the DDL node prior to generating the timestamp. The system may further generate one record for submission to the DDL node, which represents a plurality of files. Receiving a file may comprise intercepting a file sent to a destination, such as a printer or email inbox. Receiving a file may comprise scanning an identified directory at a selected time. Scanning the identified directory may comprise scanning the identified directory to identify files added since a prior scan. Receiving a file may comprise intercepting a facsimile associated with a particular fax machine, either incoming or outgoing. Receiving a file may comprise intercepting a copy of a website page being moved to a web server.
In box 901, copies of IVC generation software and/or hardware, which will produce a compatible DDL record having a predetermined format, are provided to potential DDL submitters. In some situations, this may involve placing downloadable copies of software on a website, providing links to other websites having compatible software, or suggestions on how to obtain or develop an IVC generator. In box 902, an account management and/or login screen is provided and may support a one-time fee for one-time service transaction, a subscription account, or both. An account set-up and management system allows users to conduct transactions with a DDL service provider, including performing at least some of submitting IVC records, requesting copies of a DDL edition, submitting payment, and assigning any copyright interest in submitted DDL records. In some embodiments, at least some user accounts may be managed to enable anonymous submissions. In box 903, an account ID is received, which is verified against an account database in box 904, to check for a valid and open account, current on any billings.
Some IVC generators may provide a submitter-generated timestamp, which may or may not be included in the published DDL edition. A submitter-generated timestamp may have less value than one produced by a DDL service provider, since a submitter could intentionally attempt to submit a falsified timestamp. However, if an IVC generator does provide its own timestamp, it may request a timekeeping reference from the DDL service provider, to synchronize its own clock with an external, presumably trusted, system. Thus, in box 905, a time reference is sent to a potential submitter.
Additionally, for some subscription services, submitter-side computing resources may perform some initial handshaking and synchronization with DDL service computing resources prior to submitting an IVC or a batch of IVCs. Scenarios include a periodic archiving service, for example a weekly storage media backup for a computer, which additionally scans selected directories, identifies new files, generates IVCs for them, and then submits the IVCs to a DDL. Such a system could operate automatically on a subscription basis, in order to reduce the workload on information technology (IT) managers who administer the computer network.
In an example operation, submitter resources associated with a valid, open subscription account contact the DDL resources with identifying information, signal the start of an IVC submission process, and request synchronization. The DDL resources verify that the account ID corresponds to a valid account with permission to perform the requested operation, and then send both a time reference and, as indicated in box 906, a submission security code. If the user account lacks the permissions, a security code will not be sent. Then, if an IVC submission follows, using a communication protocol associated with a security code, but which is not accompanied by a valid code, the submission will be rejected. In some embodiments, the submitter-side computing resources process security code information to produce a response code, rather than merely repeating the received information back to the DDL service computing resources. The processing may include an encryption process.
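One possible way to derive a response code rather than echo the security code is a keyed hash. The sketch below substitutes HMAC-SHA-256 for the unspecified encryption process and assumes a secret key shared between the account holder and the DDL service; both the key and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def issue_security_code() -> bytes:
    """DDL side: a fresh random code sent along with the time reference."""
    return secrets.token_bytes(16)


def response_code(account_key: bytes, security_code: bytes) -> bytes:
    """Submitter side: derive a response instead of echoing the code back."""
    return hmac.new(account_key, security_code, hashlib.sha256).digest()


def accept_submission(account_key: bytes, security_code: bytes, response: bytes) -> bool:
    """DDL side: recompute the expected response and compare in constant time."""
    expected = response_code(account_key, security_code)
    return hmac.compare_digest(expected, response)
```

Because the code is random for each session, a response captured from one submission would not be accepted for another under this scheme.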
In box 907, an IVC is received from a first submitter. The IVC may comprise portions or the entireties of message digests from a plurality of hash functions, or just a single hash function. In box 908, IVC generation indicia are received, including identification of the IVC generator or generators used, software version, a submitter-asserted timestamp, and other information that may be relevant to enabling a later reproduction of the submitted IVC. Together with the processes of prior boxes, a submitter has, by this point, submitted at least a portion of the information necessary to generate a DDL record. In some embodiments, the submission may be in proper format for appending to an open DDL edition, with only the addition of information by the DDL service provider. In some embodiments, the DDL service provider will need to reformat submitted information, for example in box 911, which will be described in more detail later. A timestamp is obtained in box 909, either generated locally, or requested from an external source. In some embodiments, box 909 may involve obtaining a trusted timestamp in accordance with prior art system 100, illustrated in
A record compatible with an open DDL edition is appended in box 911 with the timestamp information, and may require reformatting if a submitter did not format the information in accordance with a desired record format. Although a DDL services provider may experience a lighter computational burden if submitters use standardized software, some submitters may use third party software, and/or software which creates records in an obsolete format. A DDL services provider will likely have an interest in ensuring that properly functional submitter software is available, and includes bug fixes and updates. The DDL record is appended to an open DDL edition in box 912. Some embodiments will include a count or index number in the DDL record, which can be added in one of boxes 911 and 912.
In order to prevent a submitter from unnecessarily repeating the submission process, an acknowledgement is sent in box 913. For a user-interactive submission session, this may be as simple as generating a window for an internet browser, such as a completion web page or a pop-up window. Automated submission systems may attempt to resubmit information after a time-out period or a failure message, so an acknowledgement will permit release of the computing resources. Some embodiments of an acknowledgment message will include an identification of the open DDL edition containing the submitted record, along with a record index number, or numbers, if there is a plurality. Providing this information to a submitter will enable the submitter to readily locate the IVCs at a later date, for example when attempting to prove an asserted date. The expected closure and/or publication dates and times for the DDL edition may also be provided in an acknowledgement message, or at a later time.
In box 914 the user account is updated, possibly with a count of the number of IVCs submitted, and/or a reference of the record index number and DDL edition, if such information will be desired later. Keeping such information could potentially work against anonymity efforts, although if a submitter loses its own copy of index and edition information, information retained by a DDL services provider may ease the burden of searching for the submitter's IVCs at a later time. The user is billed in box 915. The billing may be based on the number of submissions, or may reflect a subscription service permitting a certain number of submissions during a time interval, with an extra charge for a number above the allotted amount.
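The subscription billing rule reduces to a small calculation. The sketch below assumes a flat base fee plus a flat per-submission overage charge, expressed in integer cents; the parameter names and rates are hypothetical.

```python
def subscription_charge(submissions: int, allotted: int,
                        base_fee_cents: int, overage_fee_cents: int) -> int:
    """Charge for one billing interval, in cents (all rates are hypothetical)."""
    extra = max(0, submissions - allotted)   # submissions above the allotment
    return base_fee_cents + extra * overage_fee_cents
```

For example, with 100 allotted submissions, a base fee of 5000 cents, and a 10-cent overage charge, 120 submissions would be billed at 5000 + 20 × 10 = 5200 cents.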
In box 916, another submitter begins interfacing with the DDL system, and boxes 902-915 are repeated for each of the other submitters while the current DDL edition is open. It should be understood that multiple submitters may be in various stages of the submission process simultaneously, so that the processes thus described may be implemented in parallel. It should be further understood that some of the stages may be changed in order and/or blended, based on specific implementation needs, capabilities, and business operations of a DDL services provider.
The current DDL edition is closed to new entries in box 917, and an IVC is generated for it in box 918. A DDL record is generated, possibly including timestamp information, so that multiple DDL editions can be chained. In box 919, a copyright registration may be requested on the recently closed DDL edition. The DDL IVC, and possibly other portions of the record that may appear in a subsequent DDL edition, are publicized in box 920. This may include printing an announcement in a newspaper, placing the information on a website, or other attempts at publicity. The closed edition is publicized in box 921, for example by writing and mailing media, emailing copies, if not prohibitively large, and placing it on a publicly-available internet website. The internet website suitable for DDL searches may require a user login, and have some access requirements that limit the portion of the public able to access it. Also as part of box 921, an electronic message may be sent to submitters to inform them that the DDL edition has been publicized, and to provide them with information to enable identification of the edition containing their submitted records.
The next DDL edition is opened in box 922, although it should be understood that multiple DDL editions may be open contemporaneously to improve system response times, based, in part, on the rate at which submissions are received or expected. The now-open DDL edition is appended with the DDL IVC generated for the recently closed DDL edition in box 923. The DDL IVC may be the first record, although if the current DDL edition was opened and receiving records while the recently closed DDL edition was being processes, the DDL IVC might not be the first record. As indicated in box 924, portions of the previously-described process are iterated for multiple DDL editions, which are closed according to criteria that are selected by the DDL services provider, and may include the elapse of a predetermined amount of time, or the size of a DDL edition. Iterative chaining allows for a cumulative record of IVCs, continuously protecting all prior submissions indefinitely, and a DDL IVC may be written to multiple subsequent editions. In box 925, a search capability is provided, for example for internet browser dating modules, interactive searches, linked document archives, and search engines. The DDL services provider may charge a fee for searching.
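The chaining of boxes 917-924 can be illustrated with a minimal sketch. It assumes, purely for illustration, that an IVC is a SHA-256 digest and that an edition is a simple list of text records; the function names `ivc`, `close_edition`, and `append_previous_ivc` are hypothetical and are not part of the disclosed system.

```python
import hashlib


def ivc(data: bytes) -> str:
    """Illustrative IVC: a SHA-256 hex digest (an assumption, not a mandated function)."""
    return hashlib.sha256(data).hexdigest()


def close_edition(records: list[str]) -> str:
    """Boxes 917-918: close an edition and generate one IVC covering all of its records."""
    return ivc("\n".join(records).encode("utf-8"))


def append_previous_ivc(open_edition: list[str], previous_ivc: str) -> None:
    """Box 923: write the closed edition's IVC into the now-open edition.

    The IVC is appended wherever the edition currently ends, so it need not
    be the first record if the edition was already receiving submissions.
    """
    open_edition.append(previous_ivc)


# Edition 1 is closed; its IVC is carried into edition 2, which is then closed.
edition_1 = ["ivc-of-document-a", "ivc-of-document-b"]
edition_1_ivc = close_edition(edition_1)

edition_2 = ["ivc-of-document-c"]  # already receiving records
append_previous_ivc(edition_2, edition_1_ivc)
edition_2_ivc = close_edition(edition_2)
```

Because the IVC of edition 2 depends on the IVC of edition 1, a later alteration of any record in edition 1 would fail to reproduce the value recorded in edition 2.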
Many of the processes can be performed by a DDL control module, implemented in hardware, software embodied on a computer readable medium, or both. Examples include interacting with a submitter's computing resources, interacting with a timing module and/or a TTSA's computing resources, appending a DDL edition, writing to media, account management, and publishing information on a website. A hardware apparatus may comprise an application specific integrated circuit (ASIC) and/or a field programmable gate array (FPGA). A hardware apparatus may comprise one or more general purpose central processing units (CPUs), coupled to memory holding software programs capable of executing at least some of the processes. Some of the process may not be used for a one-time fee for one-time service business model, and some of the process may not be used for a subscription service business model. Operating a DDL service may comprise offering users a choice between a one-time fee for one-time service and a subscription service transaction, so that both business models are contemporaneously available, and utilized based on customer preferences.
In some embodiments, a DDL record submission is anonymous, such that even a DDL administrator is unable to identify the submitter. In some embodiments, a DDL record submission is associated with a specific user account or other identification information. In some embodiments, both anonymous and user-identifiable submissions are accepted. Both identifiable and anonymous submissions may be available with multiple transaction types, in order to more fully accommodate customer preferences. For anonymous records, the billing process may require additional steps to ensure anonymity, such as purging records after payment is received, and/or using an intermediary billing service, along with an account ID that lacks real names or other information that could specify the submitter's true identity. For some DDL customers, though, anonymity may not be necessary, and a simpler account management system may be preferable.
Anonymity may take various forms. For example, the submission process may be anonymous as previously described. Additionally, the publication process may be anonymous, even if the submission process is not. That is, even if a DDL administrator could link a record submission to a particular submitter identity, some embodiments of a published DDL edition will not include any of the identifying information. However, in some situations, the submitter may wish to associate an identity or a document title with a DDL record in a published database. Some embodiments of a DDL edition may make accommodations for this customer preference, either in the DDL itself, or in an appendix to the DDL edition, providing identifying information, whether submitter, document title or both.
If a published DDL record is anonymous, using a DDL system to protect IP operates with a unique paradigm: Users pay their own money in order to include information anonymously in a publicly distributed record.
An embodiment of a DDL services system receives at least one IVC from each of a plurality of submitters and appends a DDL edition. The system may associate a timestamp with one or more of the IVCs. The system may further communicate a security code to a submitter. The system may further provide an IVC generation module. The system may further generate and send an acknowledgment to a submitter. The system may further request a timestamp from an external system. The system may further publicize the DDL edition. The system may further generate an IVC representing the DDL edition. The system may further publicize the DDL IVC. The system may further include the DDL IVC in a second DDL edition. The system may further iterate for multiple DDL editions, thereby generating a plurality of chained DDL editions.
In box 1001, a user obtains an IVC generator. Possibilities include visiting the website of a DDL services provider and downloading software, either provided free or for a nominal cost. Other possibilities include developing an IVC generator independently, so that it produces a record compatible with an intended DDL submission. The IVC generator is set up in box 1002, for example by installing it on a user computer system, and may include configuring the IVC generator to send in a security code uniquely associated with the user's account. Some embodiments of an IVC generator may be set up to automate at least some of the processes described in boxes 1003-1013. At least one IVC, possibly a plurality of IVCs, is generated to represent a selected file, in box 1003. In some embodiments, this is a user-interactive process, such as a user identifying the file using a graphical user interface (GUI). However, in some embodiments, a file may be selected based on its directory location. In some embodiments, the IVC generator runs automatically at certain times. In box 1004, the remainder of a record for submitting to a DDL is generated, to the point of completion expected by the DDL services provider. This may include providing an account ID and a user-asserted timestamp, which may further include synchronizing with a time reference from the DDL services provider sent in accordance with box 905 of method 900.
In box 1005, the user logs into the DDL website, possibly using a previously established user account and, in some embodiments, sending a security code to assist with validating the user's identity. As part of the log-in process, the suitability of the IVC generator may be examined, and if it is out of date, the user may be prompted to download a new version and reset to box 1001. In box 1006, the user pays a fee to use the DDL services, provides permission to publish the user's records in a DDL edition, which may include an express assignment of any copyrights in the generated record, and selects whether to receive a copy of the DDL edition. The user may perform fewer or additional interactions with the DDL services provider, based on the business models available. During set-up of the IVC generator, the user may enter a credit card number, which can be billed upon submission of the IVC. Alternatively, or additionally, the user may enter the credit card number into a payment processing page of the DDL website, or else use another form of internet-based payment. The record generated by the user is submitted in box 1007, and is subject to modification by the DDL services provider.
A timeout clock is started in box 1008, and if an acknowledgement of a successful submission is not received in time, as indicated by decision box 1009, the record is resubmitted in box 1007. In box 1010, a timestamp is received, possibly as part of the submission acknowledgment, and may be the timestamp of the record reception and/or an expected timestamp for the DDL edition close-out and publication. In box 1011, a copy of data sent in accordance with box 913 of method 900 is saved. This may include information usable to rapidly locate the IVC in the DDL, including an identification of the DDL edition and/or a record index. When the current DDL edition is closed and published, if the DDL services provider sends an announcement to submitters regarding the closing and publication of the DDL edition, this information is received in box 1012, possibly by responding to an email and downloading the information from a website, although other methods of obtaining the information may be used. This information is stored in box 1013. Information stored during performance of the processes associated with boxes 1011 and 1013 may be stored in a central location and/or with the files for which IVCs were submitted. An embodiment of an IVC generation system receives a file, generates an IVC, communicates the IVC to a DDL, and stores information received from a DDL services provider.
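The timeout-and-resubmit loop of boxes 1007-1010 might be sketched as follows. The callables `send` and `poll_ack`, the acknowledgment dictionary, and the `max_attempts` limit are hypothetical placeholders introduced for illustration; the description above does not specify a transport, an acknowledgment format, or an attempt limit.

```python
import time
from typing import Callable, Optional


def submit_with_retry(
    send: Callable[[bytes], None],
    poll_ack: Callable[[], Optional[dict]],
    record: bytes,
    timeout_s: float = 5.0,
    poll_interval_s: float = 0.1,
    max_attempts: int = 3,
) -> dict:
    """Submit a record and resubmit it if no acknowledgment arrives in time."""
    for _ in range(max_attempts):
        send(record)                              # box 1007: submit the record
        deadline = time.monotonic() + timeout_s   # box 1008: start the timeout clock
        while time.monotonic() < deadline:        # box 1009: acknowledged in time?
            ack = poll_ack()
            if ack is not None:
                return ack                        # box 1010: ack may carry a timestamp
            time.sleep(poll_interval_s)
    # Assumption: give up after max_attempts rather than retrying forever.
    raise TimeoutError("no acknowledgment received from the DDL services provider")
```

The returned acknowledgment would then be saved, as described for box 1011, together with any edition identification or record index it contains.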
In box 1101, a user, for example an IT administrator, obtains an automated IVC generator, and sets up a network node or a plurality of nodes, accessible to authorized authors, in box 1102. Possibilities include designating a particular printer, email inbox, facsimile machine, incoming and/or outgoing, network directory, and/or other computing resources. Access may be limited to computers connected to a particular network node behind a security module and/or capable of logging into a network with certain account privileges. The IVC generator is set up in box 1103, for example by installing it on a particular node capable of intercepting network traffic going to the designated network nodes and/or identifying authorized submitters. In box 1104, the user sets up and/or updates a subscription account. Setting up the account may include setting up a payment system, selecting a rate plan that specifies a rate at which records are expected to be submitted along with overage charges, providing a blanket assignment of rights in the upcoming records, furnishing a mailing address for DDL media, requesting a security code, specifying anonymity options, and other actions suitable for maintaining an account suitable for DDL transactions.
In box 1105, a file is received. This may include receiving an attachment to an incoming email, scanning a directory, intercepting a bit stream sent to a printer, receiving an incoming facsimile bit stream, scanning a document in order to generate a PDF or outgoing facsimile with a designated network resource, and other actions in which the IVC generator obtains access to a file or bit stream under conditions specified for generating an IVC. A DDL record, at least the user-submitted version of a record, is generated and submitted to a DDL node, for example, DDL node 813, illustrated in
In box 1107, the next trigger event returns method 1100 to box 1105. The trigger event may be one of a plurality of events, based on the network resources associated with the IVC generator. An embodiment of an automated IVC generation system receives a file, generates an IVC, communicates the IVC to a DDL, stores information received from a DDL services provider, and repeats upon a recurrence of a trigger event. A trigger event may be receiving an email, receiving a facsimile, scanning a document, scanning a directory upon predefined conditions, scanning a directory for files not previously processed, and intercepting a document sent to a printer.
In box 1201, media is obtained, which contains the files to be processed. The selection of generating IVCs on the entire file contents or else using modification rules is made in decision box 1202. If modifications are to be implemented, the rules are applied in box 1203, and method 1200 proceeds to generate IVCs for each of the files in box 1204. In box 1205, the sequence of IVCs is placed in a text file, which could be a simple ASCII file, although other storage formats may be used. Boxes 1204 and 1205 may overlap in time, based on the memory resources available. In box 1206, the IVCs are sorted by value. This precludes a potential problem that might otherwise arise, by permitting generation of an IVC representing only file content, but which is blind to directory structure.
Since the text file will reflect the order in which files are selected for processing, and this is likely done by a control function ordering the files according to directory structure, the text file will depend on the directory structure. Although sets of IVCs will be the same for differing directory structure, the ordering of the individual file IVCs within the text file will depend on the structure. Thus, without a sorting process or some equivalent process that sheds the influence of the directory structure, an IVC generated to represent only the content of files on a media will additionally include the order in which the files were processed. This may be undesirable in some situations.
For many purposes, the directory structure of a set of files is not critical. In some cases it is important, but that importance will be addressed by boxes 1209-1211. Setting aside the importance of file structure in order to perform integrity verification of file content allows for the possibility that a file moved, entirely intact, from one directory to another. In such a situation, the information content, apart from location, is intact and unchanged. It should then be possible to identify that the content is intact. Sorting the file IVCs by value can enable reliable recreation of the same final output text stream at two different times, initial generation and later validation, even if the directory structure has changed between. In box 1207, duplicate IVCs are detected and deleted. In some situations, this process can enable an identification of space saving opportunities if the files are not on permanent media, since the duplication of files can be brought to a user's attention for possible deletion. If directory structure is important enough that there is no need for an IVC that is blind to directory structure, boxes 1206 and 1207 may be omitted.
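The effect of sorting and de-duplicating the per-file IVCs can be shown with a short sketch. It assumes SHA-256 as the IVC function and a newline-separated ASCII list, neither of which is required by method 1200; the names `ivc` and `content_ivc` are hypothetical.

```python
import hashlib


def ivc(data: bytes) -> str:
    """Illustrative IVC: a SHA-256 hex digest (an assumption)."""
    return hashlib.sha256(data).hexdigest()


def content_ivc(files: dict[str, bytes], sort: bool = True, dedupe: bool = True) -> str:
    """Generate a content IVC over a collection of files keyed by path."""
    ivcs = [ivc(content) for content in files.values()]  # box 1204: per-file IVCs
    if sort:
        ivcs.sort()                                      # box 1206: sort by value
    if dedupe:
        ivcs = list(dict.fromkeys(ivcs))                 # box 1207: drop duplicates
    text = "\n".join(ivcs)                               # box 1205: list as text
    return ivc(text.encode("ascii"))                     # box 1208: content IVC


original = {"a/x.txt": b"one", "b/y.txt": b"two"}
moved = {"c/y.txt": b"two", "a/x.txt": b"one"}  # same content, new layout and order

same_when_sorted = content_ivc(original) == content_ivc(moved)
same_when_unsorted = (
    content_ivc(original, sort=False) == content_ivc(moved, sort=False)
)
```

In this sketch `same_when_sorted` is true while `same_when_unsorted` is false, which mirrors the point that an unsorted list carries the processing order along with the content.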
The IVC representing the file content is generated in box 1208, possibly blind to directory structure as noted previously. An IVC representing directory structure is generated in boxes 1209-1211, to compensate for the potential loss of information in the content IVC. At a later date, the content IVC and a structure IVC can be verified separately, and if a file has been moved intact, from one directory to another, or else a file name has been changed while the content remained intact, the changes to directory structure can be noted without spoiling the verification of the content IVC. A list of file names, including paths carrying the directory structure, is created in box 1209. This list is either alphabetized, or else is modified in box 1210 to correspond with the sorting and deletion of the IVC list in boxes 1206 and 1207. The file containing the list is then processed to generate the structure IVC in box 1211.
Similar to separating identification of changes to content and changes to file structure, changes to file attributes can be examined separately by use of an IVC generated in boxes 1212-1214. This can become important in situations wherein the initial IVCs were generated while a collection of files was on magnetic media, and then later the files were written to optical media, resulting in a change of the file attributes to read only. Some embodiments of method 1200 thus enable identification that an attribute change has taken place. In many operating systems (OSs), file attributes may be handled as integers, with specific bits of the integers representing logical attribute flags. In box 1212, the attribute flags, whether in integer or other representation, are compiled into a text file, which is sorted and/or otherwise modified in box 1213 according to one or more of boxes 1206, 1207 and 1210, to maintain consistency with the other IVCs. That is, the position of a particular file's name and path information in the directory structure information file may correspond to the position of the IVC for that file in the compiled IVC text file. If a particular duplicate file was deleted from the text files used to generate the content IVC and the structure IVC, it may not be desirable to retain a representation of that file in the attribute IVC. The attribute IVC is generated from the text file in box 1214.
If a single IVC is desired to simultaneously represent two or more of the content IVC, the structure IVC, and the attribute IVC, these are put into a text file in box 1215, and a composite IVC is generated in box 1216. The user now has four IVCs from which to choose as representative of the collection of files thus processed. Any combination of the content IVC, structure IVC, attribute IVC, and composite IVC may be sent to a DDL, depending on the submitter's anticipated needs. It should be understood that method 1200 may be tailored to a user's needs, including omitting unnecessary processes.
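A minimal sketch of generating the structure, attribute, and composite IVCs alongside the content IVC follows. It assumes SHA-256 as the IVC function, integer attribute flags, and newline-separated lists ordered by per-file IVC so that the three lists stay aligned; duplicate deletion is omitted for brevity. The function names are hypothetical.

```python
import hashlib


def ivc(data: bytes) -> str:
    """Illustrative IVC: a SHA-256 hex digest (an assumption)."""
    return hashlib.sha256(data).hexdigest()


def list_ivc(lines: list[str]) -> str:
    """IVC of a text file holding one entry per line."""
    return ivc("\n".join(lines).encode("utf-8"))


def collection_ivcs(files: dict[str, tuple[bytes, int]]) -> dict[str, str]:
    """files maps path -> (content, attribute_flags)."""
    # Order every list by per-file IVC so positions correspond across lists.
    entries = sorted(
        (ivc(content), path, attrs) for path, (content, attrs) in files.items()
    )
    content = list_ivc([e[0] for e in entries])         # box 1208
    structure = list_ivc([e[1] for e in entries])       # boxes 1209-1211
    attribute = list_ivc([str(e[2]) for e in entries])  # boxes 1212-1214
    composite = list_ivc([content, structure, attribute])  # boxes 1215-1216
    return {
        "content": content,
        "structure": structure,
        "attribute": attribute,
        "composite": composite,
    }
```

For example, if the only difference between two runs is that a file's flags changed to read-only, the content and structure IVCs would match while the attribute and composite IVCs would differ.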
Generating and reporting IVCs in accordance with method 1200 has some advantages over the common practice of generating and reporting IVCs for each file individually. 1) The representation is compact, and so can be communicated easily. If IVCs were generated for each file individually, and stored securely in some location, and then IVCs were generated for the collection, the collection IVCs could be communicated first to any entity which desired to validate the collection. If the validation of the collection IVCs was successful, then the individual IVCs are not needed. Only if the collection IVCs failed the matching tests would the larger set of individual IVCs need to be provided. 2) The content IVC reduces the amount of information that is required to verify that no tampering has occurred. If a DVD is provided to a recipient who suspects that a DVD containing thousands, or tens of thousands, of files has been intercepted and substituted by a malicious third party, the recipient must obtain not only all the IVCs from the purported DVD creator, but also an extensive list of all the files on the DVD in order to identify any additions. If there has been any tampering, then such a list would be needed. However, if there has not been any tampering, a single content IVC will indicate that the DVD is intact, and that no files have been added, even without comparing a directory listing with a previously-generated list of files. 3) The use of the three separate IVCs enables identification of permissible changes to files, such as changing to read-only when being written to permanent media. 4) The use of the three separate IVCs enables separate identification of different types of changes to the file collection (content, directory structure, and attributes), while preserving indication of aspects which have not changed.
An embodiment of an IVC generation system receives a plurality of files having an associated directory structure, generates an IVC for each of the files, generates a list of the IVCs, and generates a content IVC representing the list of IVCs. The system may further sort the IVCs in the list of IVCs. The system may further delete duplicate IVCs from the list of IVCs. The system may further generate a file containing directory structure information and generate a structure IVC from the file with the directory structure information. The system may further alphabetize the file with the directory structure information. The system may further sort and modify the file with the directory structure information to correspond with sorting and modifying the list of IVCs. The system may further generate a file containing attribute information and generate an attribute IVC from the file with the attribute information. The system may further sort and modify the file with the attribute information to correspond with sorting and modifying the list of IVCs. The system may further sort and modify the file with the attribute information to correspond with sorting and modifying the file with the directory structure information. The system may further select two or more of the content IVC, the structure IVC and the attribute IVC and generate a composite IVC from the selected IVCs. The system may further communicate at least one of the content IVC, structure IVC, attribute IVC, and composite IVC to a DDL. The system may comprise a processor and/or software embodied on a computer readable medium.
In box 1301, an IVC generator is obtained, and a copy of a file to be archived is obtained in box 1302. The file may represent a single website page or other document, or a collection. The documents may be obtained by saving visited websites, copying files from an optical or magnetic computer readable medium coupled to a computer, or by another method. The selection of generating IVCs on the entire file contents or else using modification rules is made in decision box 1303. For website HTML pages, it may be desirable to modify copies to exclude certain types of hyperlinks, advertisements, graphics, and portions of the file that do not pertain to the substance later to be asserted. If modifications are to be implemented, the rules are applied in box 1304, and method 1300 proceeds to generate an IVC in box 1305. Based on the modified IVC generation rules followed, multiple IVCs may be generated in box 1305. In box 1306, the uniform resource locator (URL) or other location identification information is appended to the copy of the file, to prepare for assertion of where the document was found. A second IVC is created in box 1307, reflecting the file appended with the location information. Although appending a URL to a saved copy of a webpage does not prove that the copy necessarily represents content found at the URL, the record will have some enhanced value if the credibility and integrity of the archiving process can be established.
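The two IVCs of boxes 1305-1307 can be sketched briefly. The example assumes SHA-256 as the IVC function and assumes the location information is appended as a newline followed by the UTF-8 encoded URL; the description does not fix either choice, and `archive_ivcs` is a hypothetical name.

```python
import hashlib


def ivc(data: bytes) -> str:
    """Illustrative IVC: a SHA-256 hex digest (an assumption)."""
    return hashlib.sha256(data).hexdigest()


def archive_ivcs(page: bytes, url: str) -> tuple[str, str]:
    """Return (first IVC of the file alone, second IVC of the file plus location)."""
    first = ivc(page)                                # box 1305
    with_location = page + b"\n" + url.encode("utf-8")  # box 1306
    second = ivc(with_location)                      # box 1307
    return first, second
```

The first IVC supports later verification of the content regardless of where it was found, while the second ties the same content to an asserted location.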
One or more of the IVCs is submitted to a DDL in box 1308. A copy of the file is stored in a controlled archive in box 1310, and a database linking the IVC, URL, file name, and DDL timestamp or edition is appended in box 1311. An IVC for the database is generated and submitted to the DDL in box 1312. The value of submitting the IVCs to a DDL is that, when the documents need to be date proven, an asserted date may be established, even if the credibility of the archive maintainer is questioned. For example, one party in a dispute may assert that certain material had been posted to a website prior to a critical date, whereas the opposing party may claim it occurred later. If the party asserting the earlier date had implemented an embodiment of method 1300 on or before the critical date, the issue could be settled easily.
An embodiment of an IVC generation system receives a plurality of files from a plurality of visited websites or from a computer readable medium coupled to a computer, generates a first IVC for each of the files, appends location or name information to each of the files, generates a second IVC for each of the files, submits at least one of the IVCs to a DDL, stores copies of the files, and generates a database correlating the IVCs with the file names, location information, and/or DDL time information. The system may comprise a processor and/or software embodied on a computer readable medium.
Method 1400 allows for proving an asserted date for a file without retaining a copy, although it does involve the risk that the file will no longer exist at the needed time. In exchange for accepting this risk, the storage facilities of others may be leveraged at no cost to the entity generating the IVCs for the DDL and having an interest in asserting a date. Method 1400 has application when large volumes of files, or perhaps only a few files that are of significant size, are expected to be retained by others. Both of methods 1300 and 1400, along with others disclosed herein, may be done covertly, so that even the author of a file posted on a website is unaware that an IVC representing the file has been submitted to a DDL, unless the author independently generates an IVC and searches publicized DDL editions for a match.
In box 1501, a website is visited by the system building the search database to collect keywords, and in box 1502, an IVC is generated for a file found at the website. The website operator may have prepared the document for later date proofing in an attempt to render it tamper-evident, and thus may have previously generated an IVC for the file. The IVC and information facilitating reproduction may be within the file itself, or in an auxiliary file containing the IVC for that file and possibly others. In some embodiments, a visited website will have a filename associated with IVCs. If one is provided by the website, as determined in decision box 1503, method 1500 allows for validating the claimed IVC in box 1504. In some situations, the IVC claimed by the website operator may have been generated with a different IVC generator, and/or rules, than what is typically used by the search engine database builder. In some situations, this condition can be determined by examining the IVC generation identification information, if available. In some embodiments, boxes 1502 and 1503 may be swapped for efficiency, so that only a single IVC is generated, the one used to produce the claimed IVC. In some embodiments, the search engine database builder uses a preferred IVC generator and generates additional IVCs for validation purposes.
The website operator may be asserting a date for the document, and back this up with information pointing to a DDL record in a published DDL edition. If a date is asserted by the website, as determined in decision box 1505, method 1500 allows for searching a DDL edition for a match in box 1506, to verify the claimed date. If the website does not provide information suitable to sufficiently narrow a DDL search for a match with the IVC, archived results of prior searches, if available, can be used to determine a date. For example, an archive, such as a search engine cache, may have multiple stored versions of a website's contents. If a particular document appears in one version, but not in the version archived immediately prior in time, the DDL search could start with a set of DDL editions which were open during the period between the times the two archives were generated. The earliest DDL edition in which an IVC match is found can be reported as the document date. The claimed IVC and/or date, along with indicia of validity, and possibly an independently determined date, may be put into the search database, if the search engine operators deem such information relevant.
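The date determination described for box 1506 might look like the following sketch. It assumes each published DDL edition is available as a closing date plus a set of IVC strings, and that archive dates can bound the search window; these data shapes and the name `earliest_match` are illustrative assumptions.

```python
from datetime import date
from typing import Optional


def earliest_match(
    doc_ivc: str,
    editions: list[tuple[date, set[str]]],
    not_before: Optional[date] = None,
    not_after: Optional[date] = None,
) -> Optional[date]:
    """Return the date of the earliest edition containing doc_ivc, if any.

    not_before / not_after narrow the search, for example to the period
    between two archived versions of a website.
    """
    in_window = [
        (closed, ivcs)
        for closed, ivcs in editions
        if (not_before is None or closed >= not_before)
        and (not_after is None or closed <= not_after)
    ]
    for closed, ivcs in sorted(in_window, key=lambda e: e[0]):
        if doc_ivc in ivcs:
            return closed
    return None
```

A search that returns no date within the narrowed window could be repeated over a wider set of editions before concluding that no matching record exists.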
A document author who revises documents, but yet wishes to keep a record of revisions, for example revisions of changes to legislation in public law records, often puts a revision history in a footnote or in a revision section of the document. In order to work with an IVC system, the document author should include in the footer, along with the dates and descriptions of the revisions, IVCs for the documents as published on the identified dates. When a copy of a document is alleged to be a prior revision, the information necessary to verify the claim can then be found in the current document. Method 1500 facilitates tracking revision histories by identifying one in decision box 1507 and storing it in box 1508. As indicated by box 1509, boxes 1501-1508 are iterated in order to generate the searchable database, as represented in box 1510. The database entries may include an IVC generated for a document, dating information, claimed, verified, and/or independently determined, and information necessary to locate a DDL edition record for the document.
For typical search engines, the database has so many entries for common key words, that it is desirable to score the documents, as indicated in box 1511, to facilitate search result ranking. In the terminology used in the claims, the linked database can be the internet, linked documents include those pointed to, for example with a URL, and linking documents are those pointing to other documents, for example by containing a URL. A document may be simultaneously a linked document and a linking document. Processing includes activity necessary to generate search result lists that rank the documents according to the scores, upon a searcher providing a list of search terms.
A curious result of these methods is that they all allow for a possibility that appears invalid on its face. If two identical documents are available on the internet, but at different websites, their scores may be significantly different. One document may be ranked quite high, whereas an exact duplicate of that document may be ranked quite low. Thus, the fact that the content of a first document is effectively identical to the content of a second document is irrelevant when generating the scores used for ranking according to Page.
Using the methods and systems disclosed herein, including the incorporated U.S. patent application Ser. No. 12/053,560, “DOCUMENT INTEGRITY VERIFICATION”, a method of identifying duplicate documents can be used to adjust the scores of documents based on scores of their duplicates, for example by normalizing them to values closer together. Scores for documents linked to one of the duplicates may also be adjusted. Further, identification of document duplicates can assist with determining an earliest date, in the event that some of the duplicate copies are not dated or are associated with later dates.
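One way such an adjustment could work is sketched below. The blending rule, which moves each duplicate's score part of the way toward the mean score of its duplicate set, is an assumption chosen only to illustrate "normalizing them to values closer together"; it is not the scoring method of any particular search engine, and the names are hypothetical.

```python
from collections import defaultdict


def adjust_duplicate_scores(
    scores: dict[str, float],
    ivcs: dict[str, str],
    blend: float = 0.5,
) -> dict[str, float]:
    """Pull the scores of documents sharing an IVC toward their group mean.

    scores maps document location -> score; ivcs maps location -> IVC.
    blend=0 leaves scores unchanged; blend=1 sets every duplicate to the mean.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for location, code in ivcs.items():
        groups[code].append(location)

    adjusted = dict(scores)
    for locations in groups.values():
        if len(locations) < 2:
            continue  # not part of a duplicate set
        mean = sum(scores[loc] for loc in locations) / len(locations)
        for loc in locations:
            adjusted[loc] = scores[loc] + blend * (mean - scores[loc])
    return adjusted
```

Other rules, such as assigning every duplicate the highest score in its set, would fit the same structure by replacing the mean with a different group statistic.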
It is important to note that Page clearly teaches away from this novel improvement to document scoring. Specifically, Page states “Intuitively, a document should be important (regardless of its content) if it is highly cited by other documents.” (Column 2, line 60 of '628, emphasis added.) Thus, Page explicitly teaches that scoring should not take document content into regard.
Since determining duplication among a set of documents necessarily requires taking content into regard, Page unambiguously teaches away from identifying duplicates when scoring and ranking document importance. Also, since determining document integrity necessarily requires taking content into regard, Page unambiguously teaches away from independently determining a document age or date when scoring and ranking document importance.
It is also important to note that neither comparing document names for similarity, nor comparing sets of detected keywords, provides a reliable comparison for content duplication. Two documents or files having identical content may have different names, based on the filing and naming convention used by various entities in possession of them. Additionally, many documents with widely varying content may be assigned a common default name, such as “New Microsoft Word Document.doc”. Identifying a plurality of documents all having the same name, therefore, is not an identification of document duplicates. Further, some prior art search engines may identify similar keyword patterns in a plurality of documents, and upon identifying some of them as similar to documents that will appear in a search result list, at least some of the similar documents will be suppressed from appearing on the list. However, using a similarity in keyword detections is not a detection of duplicates, because such similarity detections currently allow for differences in keyword count, and even if identical keyword detections were required, the results would be exceedingly over-inclusive in an overwhelming majority of cases.
There is a difference between scoring a document and ranking the document in a search result list. A score and a rank are both search result list generation parameters, and either or both may be adjusted responsive to identifying duplication in a set of files. A score is a value or calculation associated with the document in a generated database correlating an identification of the document and/or its location, for example a URL, with a keyword useable for matching with search terms. A score is generated prior to a search by a searcher. A ranking is the ordering of list items, such as the document or a group of similar documents, in a search result list generated for a searcher in response to a search being conducted. In the absence of an adjustment to a ranking, a common default condition would be that ranking would be ordered according to scoring, typically with a higher score producing a higher rank that appears earlier in the list. Method 1500 pertains predominantly to scoring, whereas method 1600, illustrated in
In box 1512, duplicates are detected, thereby identifying at least one set of duplicates. Identification of duplicates can be computationally intensive, and therefore provides a plethora of opportunities for improvements in efficiency. An embodiment of a detection method is described, although it should be understood that many variations are possible that could operate more quickly, with a higher probability of detection, and/or with a lower rate of false alarms. To cut duplicate search time, comparing the IVCs may be done in stages, such that a first portion, possibly less than a full message digest, is compared. Responsive to a match, an additional portion is compared. For example, the first N bits of a message digest may be used in an equality comparison on a processor capable of handling an N-bit integer with a single arithmetic operation. If there is a difference in the first N bits, further bits need not be tested, although if there is a match, the next set of N bits may be treated as integers for a rapid equality test. This may be iterated until two document IVC excerpts are found to no longer match, or else enough of the IVCs have been compared to merit a more comprehensive document similarity test, such as a bit-by-bit comparison. In some embodiments, a CRC can be used as an initial IVC for duplicate detection, since CRCs can generally be calculated more rapidly than MD-5 and SHA hash functions. However, since CRCs allow for collisions, a low-collision IVC may be used to suppress false alarms. Similarity criteria comparisons can be used for false alarm rejection, intermingled with comparing additional IVC portions, including similarity criteria that cannot establish duplication, such as comparing file sizes and/or keyword count, because using such comparisons may be faster for rejecting false alarms than would be generating a longer IVC. Additional non-IVC similarity checks may be performed prior to, during, or after the IVC portion duplication checks.
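The staged comparison described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `ivc` and `staged_match`, the use of SHA-256 as the IVC generator, and the 64-bit chunk size are all assumptions introduced here. Python integers are not single machine words, so the sketch shows only the early-exit logic, not the single-operation speed advantage.

```python
import hashlib


def ivc(data: bytes) -> bytes:
    """Return a full IVC for a document (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).digest()


def staged_match(ivc_a: bytes, ivc_b: bytes, chunk_bytes: int = 8) -> bool:
    """Compare two IVCs one N-bit portion at a time (N = 8 * chunk_bytes).

    Comparison stops at the first portion that differs, so most
    non-duplicates are rejected after examining only the first portion.
    """
    if len(ivc_a) != len(ivc_b):
        return False
    for start in range(0, len(ivc_a), chunk_bytes):
        # Treat each portion as an integer for a rapid equality test.
        a = int.from_bytes(ivc_a[start:start + chunk_bytes], "big")
        b = int.from_bytes(ivc_b[start:start + chunk_bytes], "big")
        if a != b:
            return False  # mismatch: further portions need not be tested
    return True  # all portions matched; a fuller similarity test may follow
```

A match from `staged_match` would then merit the more comprehensive test mentioned above, such as a bit-by-bit comparison of the documents themselves.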
Using IVCs to test documents encountered by a webcrawler may generate such a large volume of IVCs that it will allow for studying collision rates for various IVC generators. However, for identifying duplicate documents on a large scale a cyclic redundancy check (CRC) algorithm provides faster IVC generation. Generally, the faster the calculation, the higher the probability of a false alarm.
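One way to combine a fast CRC with a low-collision IVC is sketched below, under stated assumptions: `zlib.crc32` stands in for the CRC, SHA-256 stands in for the low-collision IVC, and the function name `find_duplicate_sets` and the URL-to-content mapping are hypothetical. The CRC groups candidate duplicates cheaply, and the slower digest is computed only for candidates that share a CRC, which suppresses false alarms caused by CRC collisions.

```python
import hashlib
import zlib
from collections import defaultdict


def find_duplicate_sets(docs: dict[str, bytes]) -> list[set[str]]:
    """Return sets of URLs whose documents have identical content."""
    # Stage 1: bucket documents by a fast CRC (collisions are possible).
    by_crc = defaultdict(list)
    for url, content in docs.items():
        by_crc[zlib.crc32(content)].append(url)

    duplicate_sets = []
    for urls in by_crc.values():
        if len(urls) < 2:
            continue  # a unique CRC cannot belong to a duplicate set
        # Stage 2: confirm candidates with a low-collision IVC.
        by_digest = defaultdict(set)
        for url in urls:
            by_digest[hashlib.sha256(docs[url]).digest()].add(url)
        duplicate_sets.extend(s for s in by_digest.values() if len(s) > 1)
    return duplicate_sets
```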
Some embodiments may generate IVCs for only content deemed to have importance for determining duplication, and exclude other content which is unimportant and is therefore non-determinative of duplication. Two documents can then be identified as duplicates if the important content matches, even if the unimportant, excluded content differs. Examples of excluded content include advertising information, such as banners, content that may be generated specific to certain visitors, content generated based on visitor number, and content that is likely to be excluded from a search database. The use of modified IVC generation or non-modified IVC generation may be determined by file type. For example, modified IVC generation might not be used with PDFs and other files having file name extensions indicating some degree of stability. However, files having an html extension may be subject to modified IVC generation that excludes file content that is likely to change rapidly and be unimportant to a document searcher. Thus, two files may differ by factors deemed to be unimportant for duplication detection, and still be identified as duplicates for the purposes of search engine scoring and result list ranking.
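Modified IVC generation selected by file type might look like the following sketch. The exclusion patterns (an `<!--ad-->` comment pair and a "Visitor number" string), the function name `modified_ivc`, and the choice of SHA-256 are illustrative assumptions only; an actual embodiment would define its own rules for what content is non-determinative.

```python
import hashlib
import re

# Hypothetical patterns for content deemed non-determinative of duplication.
NON_DETERMINATIVE = [
    re.compile(r"<!--ad-->.*?<!--/ad-->", re.DOTALL),  # advertising banners
    re.compile(r"Visitor number: \d+"),                # per-visitor counters
]


def modified_ivc(text: str, filename: str) -> str:
    """Generate an IVC, excluding non-determinative content for HTML files.

    Files with other extensions (for example PDFs) are hashed unmodified.
    """
    if filename.lower().endswith((".html", ".htm")):
        for pattern in NON_DETERMINATIVE:
            text = pattern.sub("", text)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Under these assumptions, two HTML pages that differ only in banner text or visitor count yield the same IVC, while the same two texts stored under a stable extension do not.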
In box 1513, the duplication information is used to adjust the score of at least one of the linked documents. One theory applicable to adjusting scores is that a higher count of duplicates indicates wider recognition of importance. Another theory is that each copy of a single base document, possibly allowing for unimportant changes, should receive the same importance score, since the substantive content is the same. Neither theory is perfect, but both may be used as guidelines in adjusting a score. Adjusting the score of a document would result in bringing its score closer to the score of a duplicate. Possibilities include adjusting the score of one or more of the duplicates closer to a score for another document in the same set of duplicates. Possibilities also include calculating an average of the scores of all the duplicates found, and adjusting the score for at least one of the duplicates by moving it closer to the average. Some embodiments may assign the average as a common score to all duplicate document copies, whereas other embodiments may use the average as a factor and allow at least some of the duplicates to retain differing scores. If a particular document has a large number of detected duplicates, the distribution of the scores prior to adjustment based on the duplication detection may provide a metric for comparing the validity of a particular scoring algorithm. Thus, method 1500 has an added value of providing an opportunity to refine search engine document scoring methods.
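The averaging adjustment can be illustrated with a short sketch. The function name `adjust_toward_average` and the `weight` parameter are assumptions introduced here: a weight of 1.0 assigns the average as a common score to every duplicate, while a smaller weight moves each score only part of the way, so duplicates retain differing scores.

```python
def adjust_toward_average(scores: dict[str, float],
                          weight: float = 1.0) -> dict[str, float]:
    """Move each duplicate's score toward the average of the set.

    scores maps a document location (for example a URL) to its score.
    weight = 1.0 gives every duplicate the average; 0.0 changes nothing.
    """
    if not scores:
        return {}
    mean = sum(scores.values()) / len(scores)
    return {url: s + weight * (mean - s) for url, s in scores.items()}
```

For example, scores of 2.0 and 4.0 both become 3.0 with the default weight, and become 2.5 and 3.5 with a weight of 0.5.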
In box 1514, a DDL edition is used to provide information useable to adjust a document's importance score. Some theories for the relationship between a DDL and a document's importance include that a provably older document may be more important for certain keywords, and that a document for which an IVC can be found in a DDL is more important, based on the fact that it can be tested for integrity and has been deemed significant enough for registration with a DDL. Thus, detecting an IVC for a file in a DDL edition may provide a basis for raising the document's importance score over an otherwise similar document. Additionally, based on a combination of keywords found in a document, an older document may have its score raised. At least some of the theories for adjusting a document score also apply to adjusting the document's rank in a search result list. In box 1515, scores are adjusted for documents linked to those with adjusted scores.
If default rules are to be used, method 1600 proceeds to box 1606, in which a search result list is generated. The processes represented by boxes 1604 and 1606 may be similar, and may involve searching through a previously-compiled database for keywords that are similar to search terms and variations, such as corrected spellings and/or plurals, of search terms. In some embodiments, the database keywords are root words, rather than the exact versions of the words appearing in the corresponding document. In box 1607, if default rules are not to be used for handling duplicates, the searcher (the search engine user) is provided with an option selection for handling duplicates. Options may include one or more of grouping duplicates together in the result list, suppressing duplicates in order to provide a more diverse result list, prioritizing documents with a high number of duplicates, deprioritizing documents with a high number of duplicates, and ignoring duplicates. In box 1608, the searcher is provided with an option selection for handling document age. Options may include one or more of grouping common ages together in the list, providing a more diverse result list based on age, prioritizing documents with an older date, deprioritizing documents with an older date, and ignoring age. In box 1609, the searcher is provided with an option selection for handling the result of the search engine database generation method identifying a DDL record corresponding to a document. Options may include one or more of grouping common registered documents in the list, providing a more diverse result list, prioritizing registered documents, deprioritizing registered documents, and ignoring DDL records. The user-selected options are determined in box 1610.
In box 1611, the ranking of at least one list item, indicating a document, is adjusted in the search result list. A list item for a document identified in the search result list may comprise a hyperlink to the document; a preview description; a claimed date; a verified age; a date of a DDL edition having a registration record for the document; at least one portion of an IVC, claimed and/or independently generated; information to assist with independent verification, such as a link to an online DDL edition and IVC generation information; a count of duplicates; links to duplicates of the document; and an indication as to whether a document has been registered with a DDL. It should be understood that, in some embodiments, more or less information may be provided. In some embodiments, if the search engine database generation process did not independently validate claimed age and IVC information, the search result list may provide information to a searcher to facilitate a validation, such as a hyperlink to a DDL edition and/or a website hosting a DDL.
With embodiments of method 1600, a searcher may specify whether a document's age, number of duplicates, and/or registration with a DDL to enable date proving and integrity verification, render a document more important or less important. Additionally, grouping list items enables a searcher to see multiple options for sources of the same document. For example, if a searcher was looking for a specific document known to be available from multiple websites, once the searcher scrolls through the list to identify one copy of the document, the other copies are more readily available. However, if a certain document was widely copied and dispersed, but is of no interest to a searcher who selected a diverse list, the searcher does not need to scroll past a large number of effectively duplicated list items. The effectively duplicated list items differ mainly by URL rather than substantive content, and waste search time if a searcher is looking for a relatively obscure list item. One possible option for implementing a grouping adjustment is to place duplicates under a single list item, indicating multiple duplicates are available, and using the URL of the highest scored version of the duplicates, so that the search result list is hierarchical. Selecting the list item would then either select the featured copy or provide a list of the duplicates, based on provided links and/or user selection. The higher level of hierarchy, above a list of effective duplicates, would then provide a diverse list, likely more compact, since duplicates are pushed down to a lower level, rather than remaining on a single level. Thus, embodiments of method 1600 generate a search result list as a hierarchical list, wherein a first list level is diverse with respect to document duplicates, and a lower list level identifies document duplicates. Hierarchical groupings may also be provided in a search list based on age and/or DDL registration.
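A hierarchical result list of this kind can be sketched as below. The `ListItem` class, its fields, and the function `group_duplicates` are hypothetical names introduced for illustration; the sketch assumes duplicates are identified by equal IVC strings and that the highest-scored copy is the featured one.

```python
from dataclasses import dataclass, field


@dataclass
class ListItem:
    url: str
    score: float
    ivc: str  # IVC used to identify duplicates (assumed to be a hex string)
    duplicates: list["ListItem"] = field(default_factory=list)


def group_duplicates(items: list[ListItem]) -> list[ListItem]:
    """Build a two-level list: diverse top level, duplicates beneath."""
    groups: dict[str, list[ListItem]] = {}
    for item in items:
        groups.setdefault(item.ivc, []).append(item)

    top_level = []
    for members in groups.values():
        # Feature the highest-scored copy; push the rest to the lower level.
        ordered = sorted(members, key=lambda i: i.score, reverse=True)
        featured = ordered[0]
        top_level.append(
            ListItem(featured.url, featured.score, featured.ivc, ordered[1:])
        )
    # Rank the diverse top level by score, highest first.
    top_level.sort(key=lambda i: i.score, reverse=True)
    return top_level
```

Grouping by age or by DDL registration could follow the same pattern with a different grouping key.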
In decision box 1612, a decision is made as to whether a DDL link will be included in a list item. Providing a DDL link will enable a user to validate a claimed age and DDL registration independently which, in some situations, may reduce the computational search load on search engine equipment compiling the search engine database. If a link is to be included, it is added in box 1613, and the search list is presented to the searcher in box 1614.
A computer implemented method of scoring a plurality of documents may comprise: identifying a plurality of linked documents; identifying linking documents that link to the linked documents; determining a score for each of the linked documents based on scores of the linking documents that link to the linked document; processing the linked document according to the determined scores; identifying, within the plurality of linked documents, at least one set of duplicates; and for a first linked document in the set of duplicates, adjusting the score and/or a ranking of the document in a search result list. The method may further comprise generating a first IVC for each of the linked documents. The method may further comprise submitting at least one of the generated IVCs to a DDL, wherein generating an IVC may comprise generating a hash function message digest and/or calculating a CRC. Identifying a set of duplicates may comprise comparing at least a first portion of the first IVC for the first document with a corresponding portion of the first IVC for a second document. Identifying a set of duplicates may comprise comparing a second portion of the first IVC for the first document with a corresponding portion of the first IVC for the second document, responsive to identifying a match between the compared IVC portions. Identifying a set of duplicates may comprise generating a second IVC for each of the first document and the second document, responsive to identifying a match between the compared IVC portions; and comparing at least a portion of the second IVC for the first document with a corresponding portion of the second IVC for the second document. Identifying a set of duplicates may comprise comparing a size of the first document with a size of a second document.
Adjusting the document score may comprise changing the score to a value closer to a score of a duplicate of the first document. This may involve bringing one score closer to another, and/or averaging multiple scores and bringing a score for at least one of the duplicates closer to the average score. Adjusting a ranking of the document in a search result list may comprise moving a list item indicating the first document closer to a list item indicating a duplicate of the first document, thereby displacing another list item in the search result list. Adjusting a ranking of the document in a search result list may comprise moving a list item indicating the first document away from a list item indicating a duplicate of the first document, thereby displacing another list item in the search result list. The method may further comprise adjusting a score for at least one document not identified as having a duplicate, and linked to the first document. Identifying a set of duplicates may comprise identifying, within each of the linked documents, content that is determinative of duplication and content that is not determinative of duplication, wherein the set of duplicates comprises a second document having determinative content identical with the first document and non-determinative content differing from the first document. The method may further comprise determining a date for the first document. The method may further comprise adjusting a score and/or a rank based on the date. The method may further comprise adjusting a score and/or a rank based on the document displaying a claimed date and/or IVC. The method may further comprise adjusting a score and/or a rank based on an IVC representing the document appearing in a DDL. The method may further comprise searching a DDL edition for a match with the first IVC.
The method may further comprise receiving, from a searcher, an option selection indication for processing duplicate documents; and generating the search result list responsive to the received preference. The method may further comprise receiving, from a searcher, an option selection indication for processing documents based on age; and generating the search result list responsive to the received preference. The method may further comprise receiving, from a searcher, an option selection indication for processing documents based on representation in a DDL; and generating the search result list responsive to the received preference. The method may further comprise presenting, to a searcher, an option selection, wherein the option selection comprises a first option for grouping document duplicates in the search list and a second option for presenting a diverse search list. Many of the boxes illustrated in any methods associated with a particular one of
A computer program embodied on a computer executable medium and configured to be executed by a processor may comprise: code for identifying a plurality of linked documents; code for identifying linking documents that link to the linked documents; code for determining a score for each of the linked documents based on scores of the linking documents that link to the linked document; code for identifying, within the plurality of linked documents, at least one set of duplicates; and code for adjusting at least one search result list generation parameter responsive to identifying the set of duplicates. An apparatus for scoring a plurality of documents may comprise: a processor; a computer readable medium comprising: a database correlating locations of each of a plurality of linked documents with keywords, importance scores, and indicia of content duplication; and a search module configured to adjust the importance score of a document and/or a ranking of the document in a search result list. An embodiment of the apparatus is illustrated in further detail in
An embodiment of an internet browser and/or a browser plug-in is configured to identify a claimed date of a visited website file, identify a claimed IVC, identify IVC generating information, generate an IVC for the file, compare the claimed IVC with the generated IVC, search a DDL for a published IVC matching the generated IVC and/or claimed IVC, and/or report an indication of matching and/or mismatching results. Embodiments of internet browsers, browser plug-ins, and/or other software related to any of the disclosed methods, may comprise a computer program embodied on a computer readable medium and configured to be executable by a processor. Embodiments may also comprise hardware, including ASICs and FPGAs.
In box 1801, a website interface is provided for visitors, which is configured to accept an indication of a URL pointing to the file to be checked for integrity and/or date. In box 1802, a visitor is received, either at the direction of the user, or automatically, based on redirection from a referring website and/or browser automatic dating functionality. The URL for the file to be tested is received in box 1803. Optionally, the claimed IVC may be provided, in addition to or instead of the URL. In box 1804, the claimed IVC and generation information is received. Options for performing this process include receiving the information from the visitor's computing resources and independently visiting the URL or another node storing the information for the document at the identified URL. If generating information is not provided, the method, or any others disclosed herein, may perform a trial-and-error test using a set of likely IVC generation functions. In box 1805, the DDL edition containing a record for the document is identified, according to the claims of the website operator hosting the tested document. Alternatively, another database can be referenced that links the document, either by URL or name, to a DDL edition. If this information is not provided, the DDL search may take longer, but may still be possible in some circumstances.
A verification IVC is generated in box 1806, and is tested for a match with the claimed IVC, if one exists, in decision box 1807. If there is a mismatch, this is reported to the user's computing resources in box 1808. If there is a match, or else no claimed IVC was identified, the DDL is searched for a record having a match with the independently generated verification IVC in box 1809. A mismatch, as determined in decision box 1810, is reported in box 1811, whereas a match, indicating a validation, is reported in box 1812. It should be understood that variations exist, including that the file validation system receives the document itself from a visitor, in addition to or instead of the URL or other location information.
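The decision flow of boxes 1806 through 1812 can be sketched as follows. This is a simplified illustration under assumptions not in the description: the function name `validate_file` and its string return values are hypothetical, SHA-256 stands in for the IVC generation function, and the DDL is modeled as a set of hex-encoded IVCs.

```python
import hashlib
from typing import Optional


def validate_file(content: bytes,
                  claimed_ivc: Optional[str],
                  ddl_records: set[str]) -> str:
    """Validate a file against a claimed IVC (if any) and a DDL."""
    # Box 1806: generate an independent verification IVC.
    verification_ivc = hashlib.sha256(content).hexdigest()

    # Box 1807: test against the claimed IVC, if one exists.
    if claimed_ivc is not None and claimed_ivc != verification_ivc:
        return "mismatch with claimed IVC"   # box 1808

    # Boxes 1809-1810: search the DDL for a matching record.
    if verification_ivc not in ddl_records:
        return "no matching DDL record"      # box 1811
    return "validated"                       # box 1812
```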
An embodiment of an internet file validation system comprises an apparatus configured to receive an input identifying a file to be validated; to identify a claimed date of the file; to identify a claimed IVC representing the file; to identify IVC generation information; to generate an IVC for the file; to compare the claimed IVC with the generated IVC; to search a DDL for a published IVC matching the generated IVC and/or claimed IVC; and/or to report an indication of matching and/or mismatching results.
A copy of the DDL edition having a record corresponding to the file is received in box 1902. This DDL edition is the one in which the file had been registered. The value of the DDL is higher when copies are so numerous, so widespread, and under the control of so many different entities having diverging interests, that forgery of the DDL edition would be readily detectable using another copy. Since the DDL edition contains one-way IVCs that free submitters from the concern that content of their registered files might be disclosed, the DDL edition is used for ascertaining the IVC value, rather than reproducing a copy of the file. A DDL copy may be received from the entity asserting a date and integrity, another entity questioning date and integrity, and/or a neutral entity possessing a copy, but taking no position on date and integrity. In box 1903, date information for the DDL is received, for example the date at which the DDL edition was received by an entity other than the one publishing the DDL. The date information may come from the records of the entity providing a copy of the DDL edition and/or public records, for example public record 317, illustrated in
The record is identified in the DDL, in box 1904, and additional information, including IVC generation information and/or a timestamp is identified in box 1905. If the validation process proves to be successful, the timestamp may be reported and/or included in a validation certificate issued by the TI as part of box 1909. An independent IVC is generated in box 1906, and it is tested for a match with the IVC in the DDL record in decision box 1907. If there is a mismatch, this is reported in box 1908. A validation certificate, for example validation certificate 407, 507 or 607, is issued in box 1909. If the record contains a timestamp issued by a TTSA, this may be reported on the certificate. Additionally, if the DDL contained digitally signed information from a TTSA, which enables trusted timestamping validation, for example a copy of a signed hash, such as encrypted hash value 111, a system similar to system 200, illustrated in
In box 2001, a copy of a record accepted by the challenger, or by court order, if method 2000 is performed as part of a litigation procedure, is received by a TI. This record may be a public record, for example public record 317, or a record in a copy of a DDL edition with a trusted date. In box 2002, a copy of the DDL edition represented by the record is obtained. An independent IVC is generated for the DDL edition in box 2003, and it is tested for a match in decision box 2004. If there is a mismatch, this is reported in box 2005. A validation certificate, for example validation certificate 517 or 617, is issued in box 2006. If the current DDL edition is the final one requiring testing (that is, the DDL edition containing the record for the disputed document), as determined in decision box 2007, method 2000 performs an embodiment of method 1900 as part of the process represented by box 2008. As used herein, final edition should not be interpreted to mean last edition tested in time, since the order of testing can be rearranged. However, if decision box 2007 indicates that the validation chain is incomplete and another DDL edition requires testing, then in box 2009, the record for the next DDL edition to be tested is found in the DDL edition just validated. Method 2000 then returns to box 2002 to iterate the validation process for another DDL edition.
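The iteration through chained DDL editions can be illustrated with the following sketch. It rests on several simplifying assumptions: each edition is modeled as a list of hex-encoded IVC strings, an edition's own IVC is the SHA-256 digest of its records joined by newlines, and the names `edition_ivc` and `validate_chain` are hypothetical. Editions are supplied in testing order, starting with the edition represented by the accepted record.

```python
import hashlib


def edition_ivc(records: list[str]) -> str:
    """Generate an IVC for a DDL edition (assumed serialization)."""
    return hashlib.sha256("\n".join(records).encode("utf-8")).hexdigest()


def validate_chain(accepted_ivc: str,
                   editions: list[list[str]],
                   document_ivc: str) -> bool:
    """Validate each edition in turn, then check the document's record."""
    if not editions:
        return False
    expected = {accepted_ivc}  # box 2001: the accepted record
    for records in editions:
        # Boxes 2003-2004: generate an independent IVC and test for a match.
        if edition_ivc(records) not in expected:
            return False       # box 2005: mismatch reported
        # Box 2009: the record for the next edition to be tested must be
        # found in the edition just validated.
        expected = set(records)
    # Box 2008: the final edition must hold the disputed document's record.
    return document_ivc in expected
```

A tampered edition anywhere in the chain changes that edition's IVC, so the match against the record in the previously validated edition fails.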
A method of establishing a file date comprises receiving a copy of the file; generating an IVC for the file; receiving a copy of an IVC representing the file; establishing a date for the received IVC; comparing the generated IVC with the received IVC; and generating a report responsive to the generated IVC matching the received IVC. The method may further comprise decrypting an encrypted TTSA record. The method may further comprise reporting the date established for the received IVC as a date for the file. The method may further comprise iteratively establishing dates for chained DDL editions, wherein a first one of the chained DDL editions has an accepted date and a second one of the chained DDL editions comprises the received IVC.
In box 2101, the asserting entity provides a copy of the file, which is received by the challenger in box 2102. The challenger generates an IVC for the file in box 2103. In box 2104, the asserting entity provides copies of DDL editions that can be chained back to a record that is accepted by the challenger, and these copies are received in box 2105. In some embodiments, the challenger may already possess the file and/or DDL editions, or may obtain copies from another source. The challenger generates IVCs for the DDL editions in box 2106, if a chaining validation process is required to establish a date for the DDL edition having a record representing the file. The chaining validation process is performed in box 2107, and the validation of the file with the DDL edition is performed in box 2108.
Record 305a is illustrated as comprising a record index 2204, shown as 100, which indicates that record 305a was the 100th entry to first DDL edition 312, and indicia 2205 of the IVC generating functions and software version. Record 305a is further illustrated as comprising an encrypted timestamp record 2206, which will permit verification of timestamp 306 if the timestamping authority is trusted, and indicia 2207 that indicates both a TTSA identity and the specific TTSA key used for signing encrypted timestamp record 2206.
An apparatus for establishing a date of a document may comprise a computer readable medium containing a database edition, wherein the database edition comprises a first record and a second record. The database edition may further comprise a third record. The first record contains an IVC representing a first document or collection of documents received from a first database contributor or record submitter. The second record contains an IVC representing a second document or collection of documents received from a second database contributor or record submitter. The third record contains an IVC representing a prior database edition. The computer readable medium comprises one or more of an optical medium, such as a CD or DVD, a printed medium adapted to enable computer scanning and/or an optical character recognition (OCR) process, and volatile or non-volatile memory. The computer readable medium may further contain a timestamp for the database edition. A record in the database edition may further contain one or more of IVC generation method indicia, a timestamp, an encrypted timestamp record, an identification of a timestamp authority, and a record index.
Computing apparatus 2301 comprises a CPU 2302, although it should be understood that a plurality of CPUs may be used within computing apparatus 2301. Computing apparatus 2301 further comprises memory 2303, which is coupled to CPU 2302. Memory 2303 may comprise volatile RAM, non-volatile RAM, and other computer-readable media, such as optical and magnetic media. Memory 2303 comprises digital document 803, and an IVC generator 2304 which may contain the functionality of one or more of IVC generators 304, 309, 314, 320, and 810. IVC generator 2304 is illustrated as comprising data sequence modifier 2305 and modification rule module 811, to enable generation of IVCs reproducible from a printed document version. Memory 2303 also comprises file processor 2306, which may comprise file parser 812, a word processor suitable for creating a document, software capable of intercepting network traffic and extracting attached documents, or software capable of creating and/or processing other types of computer files. Memory 2303 also comprises security module 809.
IVC database 814 is illustrated as comprising first DDL edition 312, second DDL edition 323, and another database 2307. Database 2307 may be another DDL edition or a database linking IVCs and URLs, which facilitates finding duplicate documents at different internet sites. Memory 2303 also comprises timing module 815, account database 816, cryptographic module 2308 and cryptographic keys 2309. Some embodiments of cryptographic module 2308 comprise the functionality of public key encryption module 109 and/or public key decryption module 109. Some embodiments of cryptographic keys 2309 comprise private key 110 and/or public key 210. Search engine database 2310 comprises data suitable for providing a search engine service, whether internet-based, intranet-based, or on a stand-alone computing resource. Search engine database 2310 comprises at least one set of data necessary to enable duplicate detection for at least some of the referenced documents. In some embodiments, this will be a set of IVCs, whether entire hash function message digests, incomplete portions of message digests, CRCs, or any other data string capable of representing document content integrity. Memory 2303 also comprises an internet browser 2311 which comprises document dating capability using a DDL, for example through DDL interface plug-in 2312. Control module 2313 may comprise a module for hosting a DDL submission or searching site, search engine database generation functionality, search engine hosting functionality, automatic document archiving functionality, automatic document search and IVC generation capability, automated IVC submission functionality, and any other computing functions described herein. Computing apparatus 2301 further comprises a network interface module 2314 for interfacing with a computer network, for example a local area network (LAN) and/or the internet.
An apparatus for establishing a date of a document may comprise a computer program embodied on a computer readable medium, and configured to be executed by a processor, whether as compiled instructions or interpreted instructions. The program may comprise one or more modules containing computer code. An apparatus for establishing a date of a document may comprise a computing device comprising a processor and one or more executable modules, either fixed in circuitry, in a memory containing computer code, or in a combination. An apparatus for establishing a date of a document may be configured to generate an IVC for a digital file, request remote generation of an IVC for a digital file, receive submitted IVCs from a plurality of submitters, and/or provide access to a DDL to enable searching by a user. An apparatus for enhancing a search engine operation may comprise a search engine module configured to generate a search engine database and/or generate a search result list for a searcher.
Although various novel concepts are introduced separately, they are compatible with each other. Therefore it is specifically contemplated that combinations will be formed, such as by intermixing ideas and components introduced by any of the figures. That is, examples associated with
A primary difference between a permissioning entity and a trusted entity is that, whereas a trusted entity (e.g., a trusted timestamping entity, document escrow agent) must be trusted to represent critical facts truthfully and accurately, in order to establish a no-later-than date-of-existence and integrity for a challenged document, there is no need to trust a permissioning entity. For scenarios in which a trusted entity is needed, document challengers and arbiters must trust the trusted entity and, if the trusted entity's assertions are incorrect (i.e., the trusted entity is dishonest or even simply making an honest error) the trusted entity might falsify the proof, either improperly denying a correct no-later-than date-of-existence and integrity for a document, or improperly attesting to an incorrect no-later-than date-of-existence and integrity for a document. For scenarios in which a trusted entity is not needed, but a permissioning entity is needed, failures by the permissioning entity, whether due to dishonesty or simple mistake, result in significantly less serious consequences: a record is not entered into the blockchain in a timely manner, and/or records are entered into the blockchain that fail the criteria for inclusion.
If a permissioning entity makes repeated mistakes of not including records in a timely manner, the utility of the blockchain for protecting the documents already registered is not lessened. Document owners, who have already registered documents, are still safe. New documents can be submitted to a different blockchain with, hopefully, a better permissioning entity. In stark contrast, for trust arrangements requiring the use of a trusted entity, a single act of dishonesty by the trusted entity can threaten the protection of all documents. Document owners, who have already registered documents, may lose all their ability to establish no-later-than dates-of-existence and integrity for their registered documents. This is a tragic situation, and a serious risk presented by using trust mechanisms that rely on trusted entities.
Another difference between a permissioning entity and a trusted entity is that, if the trusted entity ceases operations, document owners who have already registered documents may, in this scenario also, lose all their ability to establish no-later-than dates-of-existence and integrity for their registered documents. In stark contrast, if a permissioning entity ceases operations, the consequence is limited to document owners not being able to register new documents into the blockchain whereas, for previously-registered documents, no-later-than dates-of-existence and integrity remain safely verifiable. Thus, there is an additional risk factor for systems that use trusted entities, to which systems that need only permissioning entities are not susceptible. The basic issue is that trust in a trusted entity is critical, because a trusted entity can affect proof regarding already-registered documents, whereas a permissioning entity cannot affect proof regarding already-registered documents, in the examples disclosed herein.
Description of blockchain 2400 will begin with an intermediary block 2402b, that is neither the initial block nor the final block in blockchain 2400. In some examples, the operations described herein, associated with blockchain 2400, are performed using one or more computing devices 4800 of
Multiple documents 2406f, 2406g, and 2406h are to be registered in blockchain 2400, specifically, block 2402b. Therefore, each of documents 2406f, 2406g, and 2406h is hashed (or some other integrity verification code operation is performed) by IVC generator 2408 to generate hash values 2410f, 2410g, and 2410h, respectively. These are then entered into records 2404f, 2404g, and 2404h, respectively, as is described in further detail with respect to
Block 2402b is then hashed by IVC generator 2408 to generate hash value 2410b, which is entered into record 2404b in a block 2402c. Block 2402c is subsequent to block 2402b, and record 2404b, which represents block 2402b, is used to chain block 2402b with block 2402c. Additionally, in order to establish a no-later-than date-of-existence for block 2402b, hash value 2410b is published in a public record 2412b, for example in another advertisement in a printed publication. In some examples, public record 2412a and public record 2412b are published the same day (e.g., separate classified ads in the same newspaper edition). In some examples, public record 2412a and public record 2412b are published on different days, with public record 2412b following public record 2412a.
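The chaining step can be illustrated with a minimal sketch. It assumes SHA-256 as the integrity verification code and a hypothetical block layout of newline-separated hash records; neither choice is prescribed by this description, and the names `ivc` and `build_block` are illustrative only. The chaining record that an intermediary block would carry for its own predecessor is omitted for brevity.

```python
import hashlib
from typing import List, Optional


def ivc(data: bytes) -> str:
    """Integrity verification code; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(data).hexdigest()


def build_block(prev_block: Optional[bytes], documents: List[bytes]) -> bytes:
    """Build a block as newline-separated IVC records (hypothetical layout).

    When a previous block is given, its IVC becomes the first record, which is
    what chains the two blocks together.
    """
    records = []
    if prev_block is not None:
        records.append(ivc(prev_block))
    records.extend(ivc(doc) for doc in documents)
    return "\n".join(records).encode("ascii")


block_b = build_block(None, [b"document f", b"document g", b"document h"])
block_c = build_block(block_b, [b"document k", b"document m", b"document n"])

# The same value is chained into the next block and published in a public record.
public_record_b = ivc(block_b)
```

Because the first record of `block_c` equals `public_record_b`, anyone holding both blocks and the publication can confirm that `block_b` existed before `block_c` was closed.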
The process repeats for documents 2406k, 2406m, and 2406n to be registered in blockchain 2400, specifically, block 2402c. Therefore, each of documents 2406k, 2406m, and 2406n is hashed by IVC generator 2408 to generate hash values 2410k, 2410m, and 2410n, respectively. These are then entered into records 2404k, 2404m, and 2404n, respectively. Block 2402c is then closed and published in one or more public locations, such as on a website 2440 and/or transmitted to a plurality of dispersed blockchain nodes. Also, in some examples, block 2402c is written to a fixed media 2442c, such as a DVD, and distributed (see
A date field 2508 indicates the date of publication of public record 2412, and therefore establishes the no-later-than date-of-existence for a PEDDaL® block 090310a as Mar. 19, 2009. Because the specific public record (classified ad 212 within the USA Today newspaper) was published to a large base of readers, who would have noticed if date field 2508 had been incorrect, the date in date field 2508 became a trustworthy date after publication and distribution.
Administrative data 2710p includes generator version information 2810p, a first timestamp in a first timestamp field 2812p, a second timestamp in a second timestamp field 2814p, other administrative data 2816p, a linked record locator field 2802p, and an index value in an index field 2804p. In some examples, second timestamp field 2814p contains an encrypted timestamp from a trusted timestamping entity (a.k.a. trusted timestamping authority, TTA), for example encrypted with the trusted timestamping entity's private key, as a form of a digital signature of the timestamp. The index is to assist locating records within specific blocks. Together, a block identification and a record index specify a blockchain address 2818, which provides the location of a record within blockchain 2400. In some examples, record 2404p has the following format in ASCII text:
Linked record locator field 2802p indicates linked record values that indicate the location of other records (or a portion of the contents of the other records) in blockchain 2400, and possibly also in different blockchains (i.e., blockchains other than blockchain 2400). As indicated, linked record locator field 2802p has a flag 2820q, an index 2804q, a flag 2820r, an index 2804r, a flag 2820k, a block identification 2822c, and an index 2804k. Flag 2820q indicates that the next bit field, containing index 2804q indicates an index within the same block. Similarly, flag 2820r also indicates that the next bit field, containing index 2804r indicates an index within the same block. Index 2804q is the index for record 2404q, and index 2804r is the index for record 2404r. As can be seen in
In some examples, the flags may be combined with the block identification, such as by having a format with two bit fields: one for the block identification and one for the index. If the index is within the same block (e.g., the case for flags 2820q and 2820r, described above), the bit field for the block identification is padded with zeros. If the index is not within the same block (e.g., the case for flag 2820k), the bit field for the block identification is populated with the block identification, which will be different than all zeros. Thus, in some examples, the flags are not dedicated bit fields, but are instead inferred from whether the block identification is padded with zeros or filled with non-zero values. In some examples, a flag indicating that the index is within the same block is shorter, such as a single character, for example the ASCII character for the number 0 (zero). In some examples, linked record locator field 2802p has the following format in ASCII text:
Characters 199-211: 13-character linked record locator #4 (used last);
Characters 212-224: 13-character linked record locator #3;
Characters 225-237: 13-character linked record locator #2; and
Characters 238-250: 13-character linked record locator #1 (used first).
In some examples, the block identifications have the following format in ASCII text: YYMMDDa=seven (7) characters. In some examples, the indices have the following format in ASCII text: six (6) digit (hex) integer identifying the counted position of the record within the block. For example, an index of 000002 with 256-byte records (on a 1 character=1 byte machine) indicates that the record starts at character 257 within the block. With this scheme, each linked record value is 13 characters (7+6=13), although different formats and lengths are possible.
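The 13-character scheme and the record-offset arithmetic can be sketched as follows. The function names `record_start` and `make_locator` are hypothetical, and writing the block identification in upper case is an assumption taken from how block identifications appear inside the example records.

```python
RECORD_LEN = 256  # characters per record, as in the example above


def record_start(index: int, record_len: int = RECORD_LEN) -> int:
    """1-based character position where the record at this 1-based index begins."""
    return (index - 1) * record_len + 1


def make_locator(index: int, block_id: str = "") -> str:
    """Return a 13-character linked record value: 7-character block ID + 6 hex digits.

    An empty block_id means "same block" and is written as seven zeros.
    """
    if block_id and len(block_id) != 7:
        raise ValueError("block identification must be 7 characters (YYMMDDa)")
    if not 0 <= index <= 0xFFFFFF:
        raise ValueError("index must fit in 6 hexadecimal digits")
    prefix = block_id.upper() if block_id else "0" * 7
    return prefix + format(index, "06X")


print(record_start(2))                 # 257
print(make_locator(0x999, "180825a"))  # 180825A000999
print(make_locator(0x2A))              # 000000000002A
```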
As an example, consider a 256-byte (256-character) record having the following set of characters in positions 199 through 256: “xxxxxx00 00000000 00018082 5A000999 180825A0 00998000 00123456 78000333”, where x indicates unknown. The index is 0x333, which indicates that these linked record values appear within record number 0x333 (819 in decimal) in the block. The linked record locator field has three linked records, two within prior blocks, and one within the same block. The linked records in the prior blocks are in block 180825a, at index 0x998; and in block 180825a, at index 0x999. The index values are in hexadecimal; the decimal values are 2456 and 2457, respectively. The example linked record that is also within the same block is not referenced by index value (just for this example), but is instead referenced by a portion of the contents of that linked record. In some examples, the first octet (i.e., the first 8 characters) of the SHA-1 message digest of the other record is used as a reference or pointer to a linked record. Specifically, that linked record has the first octet identified as “12345678”. In order to find that linked record in this scheme, the other records in the block are searched until a record is found that contains 12345678 in the position corresponding to the first 8 characters of the SHA-1 message digest. Since the octet is eight (8) characters in length, in order to preserve a 13-character scheme for a linked record locator field, the zero-padding is reduced to five (5) characters. This referencing by the first SHA-1 octet can be used when the index value of a linked record is subject to change. Index values can change if, for example, an earlier (within the block) record is removed because of problematic content, or because it is a duplicate of another record.
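The decoding of this example can be sketched as follows. The function name `parse_record_tail` and the rule used to tell a digest-prefix reference from an index (five zeros of padding rather than seven) are assumptions made for illustration; a digest prefix that itself begins with two zeros would be ambiguous under this rule.

```python
def parse_record_tail(tail: str):
    """Decode the last 58 characters (positions 199-256) of a 256-character record.

    Returns (record_index, links). Each link is (kind, block_id, value), listed
    in order of use (locator #1 first).
    """
    field = tail.replace(" ", "")[-58:]        # 4 x 13 locator characters + 6 index characters
    record_index = int(field[52:], 16)
    links = []
    for start in (39, 26, 13, 0):              # locators #1, #2, #3, #4
        locator = field[start:start + 13]
        if locator == "0" * 13:
            continue                           # unused slot
        if locator[:7] == "0" * 7:
            links.append(("same_block", None, int(locator[7:], 16)))
        elif locator[:5] == "0" * 5:
            # Five zeros of padding followed by an 8-character digest prefix.
            links.append(("digest_prefix", None, locator[5:]))
        else:
            links.append(("other_block", locator[:7], int(locator[7:], 16)))
    return record_index, links


index, links = parse_record_tail(
    "xxxxxx00 00000000 00018082 5A000999 180825A0 00998000 00123456 78000333"
)
print(index)   # 819
print(links)   # [('digest_prefix', None, '12345678'), ('other_block', '180825A', 2456), ('other_block', '180825A', 2457)]
```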
Using this information, linking map 3000 can be generated. As seen in linking map 3000, record 2404p links to records 2404q, 2404r, and 2404k, directly. Record 2404q links back to record 2404p, duplicates the link to record 2404r, and directly links to record 2404g. Record 2404r links to records 2404s and 2404t, directly. Record 2404k links to records 2404m and 2404h, directly. Thus, record 2404p is linked through a daisy chain to record 2404h. In total, nine (9) records are linked via a daisy chain, even though no single record links to more than three (3) records directly. The linking handles multiple records within a block, and also spans multiple blocks. With this scheme, an unlimited number of records can be linked across an arbitrary number of blocks, with the primary limitation being that a particular record can only link to contemporaneous and preceding records.
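The daisy chain can be rebuilt with an ordinary breadth-first traversal, sketched below. The dictionary representation and the function name `daisy_chain` are hypothetical, and the sketch assumes that the second set of links described above (back to record 2404p, to record 2404r, and to record 2404g) belongs to record 2404q, which is the reading under which exactly nine records are reached.

```python
from collections import deque

# Direct links read from each record's linked record locator field.
linking_map = {
    "2404p": ["2404q", "2404r", "2404k"],
    "2404q": ["2404p", "2404r", "2404g"],
    "2404r": ["2404s", "2404t"],
    "2404k": ["2404m", "2404h"],
}


def daisy_chain(start: str, links: dict) -> set:
    """Return every record reachable from start, directly or through other records."""
    seen = {start}
    queue = deque([start])
    while queue:
        record = queue.popleft()
        for linked in links.get(record, []):
            if linked not in seen:       # skip back-links and duplicate links
                seen.add(linked)
                queue.append(linked)
    return seen


chain = daisy_chain("2404p", linking_map)
print(len(chain))          # 9
print("2404h" in chain)    # True
```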
A real-world example exists for the PEDDaL® blockchain. Block 191205a contains two records, one ending in “0000000 00002A 0000000 0000A4 100109A 000004 0000000 00001F 0000A3” and the other ending in “0000000 00001F 0000000 0000A3 100109A 00000F 0000000 00002A 0000A4”. This means that the record at index 0xA3 (163 in decimal) is linked to records with index values 0x2A, 0xA4, and 0x1F within its same block 191205a, and also the record at index value 0x4 in block 100109a. Also, the record at index 0xA4 is linked to records with index values 0x1F, 0xA3, and 0x2A within its same block 191205a, and also the record at index value 0xF in block 100109a. The records at indices 0xA3 and 0xA4 are directly linked to each other. The record at index 0xA3 is not directly linked (first tier link) to the record at index value 0xF in block 100109a. However, the record at index 0xA3 is daisy chained (linked via a daisy chain) to the record at index value 0xF in block 100109a, through the record at index 0xA4. Similarly, the record at index 0xA4 is daisy chained to the record at index value 0x4 in block 100109a, through the record at index 0xA3.
Operation 3314 includes populating a linked record locator field and includes operations 3316 through 3320. Operation 3316 includes generating flags to specify whether a linked record is within the same block or a different block. Operation 3318 includes adding block identification for those linked records that are in a different block. Operation 3320 includes adding a linked record value, for example a record index or a portion of the content of the linked record (e.g., the first octet of the SHA-1 message digest). In some examples, adding a linked record value comprises adding a blockchain address for another record. Operation 3322 iterates operations 3316 through 3320 until all links are complete for the current record. Operation 3324 then iterates operation 3302 for all submitted records.
If, however, the document IVCs match, then operation 3520 reports success for that first match, and operation 3522 generates an IVC for the block. The public record is identified in operation 3524 and the public record is retrieved in operation 3526. Operation 3528 includes identifying the block IVC in the public record, and decision operation 3530 includes comparing the IVC generated in operation 3522 with the IVC identified in operation 3528. If they are different, then operation 3532 reports a failure. Otherwise, operation 3534 reports that the integrity of the contested document has been verified and uses the date of the public record (retrieved in operation 3526) as the no-later-than date-of-existence for the contested document.
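The two comparisons in this portion of the flow can be sketched as follows, assuming SHA-256 as the IVC, a block made of newline-separated hash records, and a public record represented as a small dictionary. These representations, the function name `verify_document`, and the date shown are illustrative assumptions only.

```python
import hashlib
from typing import Optional


def ivc(data: bytes) -> str:
    """Integrity verification code; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(data).hexdigest()


def verify_document(document: bytes, block: bytes, public_record: dict) -> Optional[str]:
    """Return the no-later-than date if both IVC comparisons succeed, else None."""
    records = block.decode("ascii").split("\n")
    if ivc(document) not in records:
        return None                      # document IVC does not match any record
    if ivc(block) != public_record["block_ivc"]:
        return None                      # block IVC differs from the published value
    return public_record["date"]         # date of the public record


block = "\n".join(ivc(d) for d in (b"contested document", b"another document")).encode("ascii")
public_record = {"block_ivc": ivc(block), "date": "2020-01-15"}  # illustrative values

print(verify_document(b"contested document", block, public_record))
```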
An access control 3602 controls read and write privileges for documents and other data within document corral 3600. A set of users 3604a and 3604b have both read and write privileges, as permitted by access control 3602. A read-only user 3606 has only read privileges, as enforced by access control 3602. A write-only user 3608 has only write privileges, as enforced by access control 3602. In some examples, write-only user 3608 enters documents into document corral 3600 that are obtained from other sources, rather than authored by write-only user 3608. As illustrated, user 3604b has a local copy 3610 of at least some of documents 2406f-2406t. It should be understood, however, that any of other users 3604a, 3606, and 3608 can also have local copies of at least some of documents 2406f-2406t. Access control 3602 restricts access to document corral 3600 to only users 3604a, 3604b, 3606, 3608, and permissioning entity 2401. In some examples, each of users 3604a, 3604b, 3606, 3608 is restricted to accessing certain directories and/or documents (or files) within document corral 3600. That is, in some examples, access control 3602 does not grant a particular user access to the entirety of document corral 3600.
A document monitor 3612 determines when documents within document corral 3600 (e.g., any of documents 2406f-2406t) are new or altered and triggers generation of a blockchain record (e.g., record 2404f) using record generator 2608. In some examples, permissioning entity 2401 uses record generator 2608 to generate records upon receiving an alert from document monitor 3612. In some examples, a user (e.g., user 3604b) uses record generator 2608 to generate records upon submitting (writing) documents to document corral 3600. Upon some trigger event, such as the number of document records awaiting entry into blockchain 2400 reaching a threshold, or a schedule, or some other trigger event, permissioning entity 2401 uses block generator 2708 to generate a new block that includes at least some of the records awaiting entry into blockchain 2400. Additionally, a linked record field is populated with linked record values, in accordance with linking instructions, if any are provided. In some examples, permissioning entity 2401 follows at least a portion of flowchart 3200 when adding a new block to blockchain 2400.
Copies of blockchain 2400 are then distributed among users 3604a, 3604b, 3606, and 3608, as well as possibly also stored within document corral 3600 and made available to any other interested member of the public. It is the widespread distribution of blockchain 2400, placing copies of blockchain 2400 out of the control of permissioning entity 2401, that renders blockchain 2400 readily tamper-evident. It is this tamper-evident property that provides the trust element because, with any tampering so trivially detectable, an absence of detecting tampering can be interpreted as an absence of tampering having occurred.
Users 3604a, 3604b, and 3606 can use blockchain 2400 to verify that any documents newly added to document corral 3600 have a corresponding record within a recent block in blockchain 2400. This can be accomplished easily, merely by hashing a local copy of the document, and searching within blockchain 2400 for any record that contains the hash. In some examples, permissioning entity 2401 alerts the user who submitted the document into document corral 3600 (and also other interested parties) of the block ID (e.g., a sequential number code assigned to a block) and record index, so that interested parties can go straight to the identified record and verify its accuracy without having to perform a search. If any recently-submitted documents do not have a corresponding record, interested parties can alert permissioning entity 2401, as well as other interested parties, about the gap, so that permissioning entity 2401 is on notice of a deficiency that requires remediation.
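A minimal sketch of the hash-and-search check is given below. It assumes SHA-256 as the hash and models a local copy of the blockchain as a mapping from block ID to a list of record strings; the function name `find_record` and the sample block IDs are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def find_record(document: bytes, blockchain: Dict[str, List[str]]) -> Optional[Tuple[str, int]]:
    """Return (block ID, 1-based record index) of the first record containing the
    document's hash, or None if no record contains it."""
    digest = hashlib.sha256(document).hexdigest()
    for block_id, records in blockchain.items():
        for index, record in enumerate(records, start=1):
            if digest in record:
                return block_id, index
    return None


local_copy = b"newly added document"
blockchain = {
    "0000001": ["v1 " + hashlib.sha256(b"older document").hexdigest()],
    "0000002": ["v1 " + hashlib.sha256(local_copy).hexdigest()],
}
print(find_record(local_copy, blockchain))         # ('0000002', 1)
print(find_record(b"unregistered", blockchain))    # None
```

A `None` result corresponds to the gap that interested parties would report to the permissioning entity.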
When users 3604a, 3604b, and 3606 retrieve documents from document corral 3600, they can use blockchain 2400 to verify that the documents have not changed since the time of the earliest corresponding record within blockchain 2400. Any documents for which no corresponding record exists within blockchain 2400 (e.g., no record contains the hash value (message digest) of the document) are treated as unverified. Additionally, in the event that any of users 3604a, 3604b, and 3606 retrieves a set of documents from document corral 3600, the set of documents can be checked for completeness by using linked record locator fields. (See
New records are generated for new and altered documents in operation 3708. That is, operation 3708 includes, based at least upon detecting an addition or alteration of a document within the document corral, generating a blockchain record for the document. In some examples, linking data for sets of documents is also generated. In such examples, operation 3708 includes generating a blockchain record with a linked record value. In some examples, the linked record value indicates a prior version of an altered document. In some examples, the linked record value indicates a second document that is related to a received document. In such examples, the document relationships would need to be identified, such as by being specified by a user, electronically extracted from a data structure, or perhaps determined from both documents having been attachments to a common message or having appeared in a common source location. In some examples, users of the document corral are notified when records corresponding to their submitted documents are generated, and at least a portion of the records (e.g., IVCs) are provided to the users.
Operation 3710 includes extending the blockchain by adding the blockchain record into a new block of the blockchain and adding one or more new blocks to the blockchain. In some examples, operation 3710 includes the activities described previously for operations 3216-3226 of flowchart 3200. A trigger event can be used for operation 3710, such as a threshold number of new records awaiting entry into the blockchain, or a schedule, or some other event. In some examples, users of the document corral are notified when records corresponding to their submitted documents are placed into the blockchain, and blockchain addresses for the records are provided to the users. Operation 3712 includes distributing copies of the blockchain outside the control of the permissioning entity (e.g., permissioning entity 2401 of
Users retrieve documents from the document corral, either individually or in sets, in operation 3722. Operation 3724 includes validating individual documents according to flowchart 3500, or some other similar process. In operation 3726, users ensure that the set of documents retrieved is complete. Users can traverse the linked record locator fields (if applicable) to rebuild a daisy chain of document relationships, as described for operations 3402-3420 of flowchart 3400. The set of documents is compared with the reported linking map results, in operation 3728. The completeness of the set is determined in decision operation 3730, and if any documents are missing, an alert is generated in operation 3732. The alert may be sent to the permissioning entity, the specific user, and even others, in an attempt to ensure that the operations of document corral 3600 are subjected to proper scrutiny.
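The comparison of the retrieved set against the linking map results can be sketched as a set difference over hash values. The sketch assumes SHA-256 and assumes that the daisy chain traversal has already produced the set of expected IVCs; the function name `missing_ivcs` is hypothetical.

```python
import hashlib
from typing import Iterable, Set


def missing_ivcs(retrieved_docs: Iterable[bytes], expected_ivcs: Iterable[str]) -> Set[str]:
    """Return the expected IVCs for which no retrieved document hashes to a match."""
    found = {hashlib.sha256(doc).hexdigest() for doc in retrieved_docs}
    return set(expected_ivcs) - found


# IVCs gathered from the linked records (illustrative three-document set).
full_set = [b"deed", b"title report", b"closing statement"]
expected = [hashlib.sha256(d).hexdigest() for d in full_set]

retrieved = [b"deed", b"closing statement"]      # one document was not retrieved
gaps = missing_ivcs(retrieved, expected)
if gaps:
    print(f"ALERT: {len(gaps)} linked document(s) missing from the retrieved set")
```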
A trigger event has identified document 2406t as problematic. For example, document 2406t may have material that comprises privacy violations, intellectual property rights violations, malicious logic, and/or obscenity. Triggers may include periodic scans, the addition of new documents into document corral 3600, or events such as user 3604a or another entity (e.g., permissioning entity 2401) being provided a notice from a law enforcement authority, a court, an attorney, or another source indicating that distribution of document 2406t will create a legal liability. Alternatively, a scanner 3820 monitors documents (e.g., document 2406t) within document corral 3600 for quarantine triggers, for example, by scanning the documents for problematic material. In some examples, quarantine triggers are selected from the list consisting of: privacy violations, intellectual property rights violations, malicious logic, and obscenity.
Scanner 3820 identifies that document 2406t is to be quarantined on its own, or by user 3604a flagging document 2406t to scanner 3820. Based at least upon determining that document 2406t is to be quarantined, scanner 3820, or another suitable component, moves document 2406t into document quarantine 3800, which provides quarantine storage capability. That is, scanner 3820 (or some other suitable component) removes document 2406t from document corral 3600 and places a copy within document quarantine 3800. Scanner 3820 then also forwards a copy of document 2406t to a cleaner 3822 to generate document 2406u as a replacement for document 2406t in document corral 3600. In some examples, cleaner 3822 generates document 2406u from document 2406t by removing material that triggered quarantine. In some examples, cleaner 3822 generates document 2406u as a summary of document 2406t.
Document 2406u is thus a cleaned version of document 2406t, which represents document 2406t, and is placed into document corral 3600. Document 2406u should therefore not trigger quarantine. Record 3810u is generated for document 2406u using record generator 2608 and block generator 2708, and added into blockchain 2400 (in block 2402d at index 3812u). Record 3810u has linking information in a linked record field 3814. In some examples, linked record field 3814 is the same format as linked record locator field 2802p of
In some examples, a cleaned reference document 2406v permits rapid cross referencing of documents 2406t and 2406u. For example, cleaned reference document 2406v may include document identifiers (e.g., document names) for both documents 2406t and 2406u, along with an annotation that document 2406t is the original document, which is now stored in document quarantine 3800, and document 2406u is the replacement in document corral 3600. In some examples, cleaner 3822 generated cleaned reference document 2406v. In some examples, cleaned reference document 2406v includes at least one item selected from the list consisting of: identification of document 2406t, identification of a quarantine location (e.g., document quarantine 3800) of document 2406t, a blockchain address of record 3810t, identification of document 2406u, and a blockchain address of record 3810u. In some examples, cleaned reference document 2406v is created or updated after record 3810u is placed into blockchain 2400, so that the address of record 3810u is known. In some examples, one cleaned reference document is generated for each pair of quarantined and cleaned documents. In some examples, a cleaned reference document contains identification of multiple pairs of quarantined and cleaned documents, and is appended with new pairs, as more documents go into document quarantine 3800.
With document 2406t having been removed from document corral 3600, proving the integrity and no-later-than date-of-existence for document 2406t requires additional work. In one example, such as when document 2406t had contained malware rather than illegal material, user 3604a may be willing to retrieve a copy of document 2406t from document quarantine 3800 via access control 3802. This may be the case, for example, if, since the time that document 2406t had been placed into document quarantine 3800, the anti-virus (or other malware protection) on the computer of user 3604a had improved sufficiently that document 2406t no longer presents a significant threat. For security, though, access control 3802 for document quarantine 3800 may be more stringent, such as with fewer authorized users and/or a stricter authentication scheme, than access control 3602 for document corral 3600.
In some scenarios, user 3604a cannot or prefers to not access document 2406t in document quarantine 3800. A trusted entity 3804, however, has access to document quarantine 3800 and can retrieve document 2406t for verifying that it matches record 3810t. That is, trusted entity 3804 establishes a no-later-than date of existence for document 2406t using blockchain 2400 by generating an IVC for document 2406t; comparing the generated IVC for document 2406t with a recorded IVC within record 3810t within blockchain 2400; and reporting a no-later-than date of existence for an earliest block (e.g., block 2402a) that contains the recorded IVC. In such scenarios, however, it may be required that a document challenger or arbiter accept the reporting of trusted entity 3804. Although this may be an imperfection in the concept of a blockchain providing self-evident proof, in this manner, even documents containing problematic material can have a version of a provable no-later-than date-of-existence.
In some examples, documents are submitted to scanner 3820 prior to being placed into document corral 3600. In the illustrated scenario, document 2406w is submitted to scanner 3820 and goes straight into document quarantine 3800 without first being placed into document corral 3600. In this scenario, a cleaned document 2406x, representing document 2406w but without the problematic material, is placed into document corral 3600.
However, document 2406t is subject to a court order or law enforcement requirement to destroy all copies. For example, document 2406t may be a privacy violation or obscene material. Document 2406t is removed from all copies of blockchain 3900a. The result is that hashing block 3902a now produces a hash value that no longer matches hash value 3912a. This breaks the chain because block 3902a can no longer be proven to have existed prior to the calculation of hash value 3912b. Unfortunately, document 2406t is not the only document negatively affected. Without being able to prove the location of the modified version of block 3902a (the version missing document 2406t) within blockchain 3900a, the value of having placed document 2406y within blockchain 3900a is also damaged. The removal of documents from an in-chain storage blockchain threatens to destroy the protection for all documents within the same and earlier blocks.
In scenario 39002, an in-chain storage blockchain 3900b is similarly configured and holds a copy of document 2406t in block 3902a. However, knowing the effect that removing document 2406t had on blockchain 3900a, the community that maintains blockchain 3900b does not remove document 2406t, despite the court order or law enforcement requirement. Anyone possessing a copy of blockchain 3900b (at least the portion that includes block 3902a) is committing a legal violation. The prospects indicated in scenarios 39001 and 39002 can thus threaten the long term viability of in-chain storage blockchains.
In contrast, for scenario 39003, when document 2406t is removed from document corral 3600, blockchain 2400 is unaffected and therefore unbroken. The record for document 2406t cannot be used to recreate the problematic content, and so does not require removal. Although the protection of document 2406t that had been provided by blockchain 2400 is now gone, blockchain 2400 is in legal compliance, and the no-later-than dates of existence for documents 2406y, 2406z and 2406zz can still be proven. Scenario 39004 involves moving document 2406t into document quarantine 3800, rather than merely deleting it. If document quarantine 3800 is handled properly, such as by storing documents outside the jurisdiction of the relevant court or law enforcement agency, or perhaps by operating document quarantine 3800 in a manner that is blessed by the relevant court or law enforcement agency, the proof for document 2406t may yet persist, even with legal compliance.
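The contrast between the in-chain and off-chain scenarios can be reproduced in a few lines. The sketch assumes SHA-256 and uses simplified, hypothetical block layouts (concatenated documents for in-chain storage, newline-separated hashes for off-chain storage); it is not the format of any particular blockchain.

```python
import hashlib


def h(data: bytes) -> str:
    """Hash helper; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(data).hexdigest()


doc_t, doc_y = b"problematic content", b"ordinary content"

# In-chain storage: the block holds the documents themselves.
in_chain_block = doc_t + b"|" + doc_y
published_in_chain = h(in_chain_block)          # value chained into the next block
in_chain_after_removal = doc_y                  # document t excised from the block
chain_broken = h(in_chain_after_removal) != published_in_chain

# Off-chain storage: the block holds only hash values.
off_chain_block = (h(doc_t) + "\n" + h(doc_y)).encode("ascii")
published_off_chain = h(off_chain_block)
# Removing document t from the corral does not touch the block at all.
off_chain_intact = h(off_chain_block) == published_off_chain
doc_y_still_provable = h(doc_y) in off_chain_block.decode("ascii")

print(chain_broken, off_chain_intact, doc_y_still_provable)   # True True True
```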
In some examples, however, the received first document is not placed into the document corral until after it has been checked for quarantine triggers. In such examples, operation 4010 follows operation 4004. Decision operation 4012 determines whether the first document is to be quarantined. If not, flowchart 4000 returns to operation 4006, in which the first document is placed into the document corral or permitted to remain there. Even though a trigger condition has not yet been identified, it is possible that a trigger condition may arise in the future.
If decision operation 4012 identifies that the first document is to be quarantined, operation 4014 includes, based at least upon determining that the first document is to be quarantined, moving the first document into the document quarantine. In some examples, this includes removing the first document from the document corral. A cleaned document is generated in operation 4016. For example, operation 4016 includes generating a second document as a replacement for the first document in the document corral, the second document not triggering quarantine. In some examples, generating the second document from the first document includes removing material that triggered quarantine. In some examples, the second document is a summary of the first document.
Operation 4018 includes generating a second blockchain record for the second document and adding the second blockchain record into the blockchain. In some examples, generating a second blockchain record for the second document includes generating a blockchain record with a linked record value. In some examples, the linked record value indicates a blockchain address of the first record. In some examples, the linked record value indicates the first document. In some examples, the linked record value indicates quarantine storage. Operation 4020 includes generating a cleaned reference document. In some examples, the cleaned reference document includes at least one item selected from the list consisting of: identification of the first document, identification of a quarantine location of the first document, a blockchain address of the first record, identification of the second document, and a blockchain address of the second record.
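Operations 4014 through 4018 can be sketched as follows. The corral and quarantine are modeled as dictionaries, the blockchain as a list of record dictionaries whose list positions stand in for blockchain addresses, and cleaning as removal of a known byte string. All of these, along with the function name `quarantine_and_replace`, are simplifying assumptions for illustration.

```python
import hashlib
from typing import Dict, List


def ivc(data: bytes) -> str:
    """Integrity verification code; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(data).hexdigest()


def quarantine_and_replace(name: str, cleaned_name: str, trigger: bytes,
                           corral: Dict[str, bytes], quarantine: Dict[str, bytes],
                           chain: List[dict]) -> int:
    """Move a document into quarantine, store a cleaned replacement in the corral,
    and append a record for the replacement that links to the original's record.

    Returns the (hypothetical) address of the new record.
    """
    original = corral.pop(name)                    # remove from the corral
    quarantine[name] = original                    # keep a copy in quarantine
    cleaned = original.replace(trigger, b"")       # remove the triggering material
    corral[cleaned_name] = cleaned
    first_address = next(i for i, rec in enumerate(chain) if rec["ivc"] == ivc(original))
    chain.append({"ivc": ivc(cleaned), "linked": first_address})
    return len(chain) - 1


corral = {"2406t": b"quarterly report <MALICIOUS> appendix"}
quarantine: Dict[str, bytes] = {}
chain = [{"ivc": ivc(corral["2406t"]), "linked": None}]   # first blockchain record
address = quarantine_and_replace("2406t", "2406u", b"<MALICIOUS>", corral, quarantine, chain)
print(address, chain[address]["linked"])   # 1 0
```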
At this point, the conditions are set for later proving integrity and no-later-than dates of existence for at least the first (quarantined) and second (cleaned) documents. The cleaned reference document may also be set up for date proof, although its value lies less in establishing its age than in permitting rapid identification and/or location of one of the first and second documents from the other. The date proof is similar to that described earlier for proving ages and integrity for documents and traversing a daisy chain. Operation 4022 includes retrieving the second document from the document corral and determining integrity or a no-later-than date of existence for the second document using the blockchain. The date proof of the second document may, however, be less important than the date proof of the first document, and so may be skipped in some examples.
Operation 4024 includes identifying, within a linked record locator field of the second blockchain record, a linked record value for the first document. In some examples, this is the first blockchain record, whereas in some examples, it is another locator or document identifier. Once the first document is located, operation 4026 includes retrieving the first document from the document quarantine. Operation 4028 includes locating the first blockchain record within the blockchain and determining a no-later-than date of existence for the first document using the blockchain and the first blockchain record. In some examples, a normal user retrieves the first document from the document quarantine and determines the date, hopefully without encountering problems related to the reason for the quarantine. In some examples, however, the trusted entity performs operations 4024-4028. In such examples, the assurance from the trusted entity is the key to establishing the date for the first document. This is because anyone can independently identify (with certainty) a no-later-than date for the first blockchain record. However, only the trusted entity can hash the first document, if the document quarantine access is so limited. Therefore, operation 4030 includes receiving, from the trusted entity, assurance that the first blockchain record matches the first document. This assurance completes the proof for date and integrity.
In some examples, hash values 4120 and 4122 include one or more portions of the SHA-1, SHA-224, SHA-256, SHA-384, and the SHA-512 message digests. The use of two different hash values significantly increases resistance to second preimage attacks. Together hash values 4120 and 4122 form an IVC for item 4110. In some examples, rapid record 4104a will appear as a short message service (SMS) message. A single SMS message has a character limit of around 160 characters, unless multiple messages are strung together. A single SMS is able to hold SHA-1 and SHA-384, and still have 24 characters remaining for index 4124 and other data. A 4-character hexadecimal index field can indicate up to 65,535, which is sufficient to issue a new record index number every minute for an entire week, prior to resetting. A 3-character index field is sufficient to issue a new record index number every minute for an entire day, and leaves more than 20 characters for other administrative data or codes, such as versioning numbers. In some examples, rapid record 4104a is also submitted to document corral 3600.
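The character budget described above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that digests are written in hexadecimal and that the single-message limit is exactly 160 characters; the function names are illustrative only.

```python
import hashlib

SMS_LIMIT = 160  # approximate single-message limit stated in the text


def hex_len(algorithm: str) -> int:
    """Characters needed to write a complete digest in hexadecimal."""
    return hashlib.new(algorithm).digest_size * 2


def remaining_chars(*algorithms: str) -> int:
    """Characters left in one SMS after the listed complete digests."""
    return SMS_LIMIT - sum(hex_len(a) for a in algorithms)


def index_capacity(hex_chars: int) -> int:
    """Number of distinct values a hexadecimal index field can hold."""
    return 16 ** hex_chars


MINUTES_PER_DAY = 24 * 60
MINUTES_PER_WEEK = 7 * MINUTES_PER_DAY

left = remaining_chars("sha1", "sha384")
print(left)                                    # 24
print(index_capacity(4) >= MINUTES_PER_WEEK)   # True (65536 values vs. 10080 minutes)
print(index_capacity(3) >= MINUTES_PER_DAY)    # True (4096 values vs. 1440 minutes)
print(left - 3)                                # 21 characters left after a 3-character index
```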
Rapid record 4104a is entered into a rapid block 4102a, which may also be submitted to document corral 3600. As illustrated, rapid block 4102a holds rapid record 4104a, subsequent rapid records 4104b and 4104c, and a rapid record 4104Z for a prior rapid block, thereby chaining rapid block 4102a and the prior rapid block. A network message generator 4118 generates a network message 4106a, and includes an IVC generator to generate hash value 4130 and hash value 4132 for inclusion within network message 4106a. In some examples, network message 4106a comprises an SMS message. In some examples, network message 4106a comprises a social media post, such as on Twitter or another social media network. Some examples use network messages that are derived from rapid blocks (as just described), some examples use network messages that are copies or near copies of rapid records, and some examples use both. In each case, network message 4106a indicates rapid record 4104a. Network message 4106a also includes an index 4134.
Network message 4106a is submitted to a public messaging network 4140 for broadcasting. Network message 4106a may also be submitted to document corral 3600, whether by messaging network 4140 or another entity that generated network message 4106a for submission to messaging network 4140. Messaging network 4140 timestamps network message 4106a and broadcasts network message 4106a over public network 4146, which may be a wireless or wired network. For example, public network 4146 may be a cellular network, a widely-distributed e-mail, or a website on the internet. As illustrated, messaging network 4140 stores network message 4106a and other network messages 4106b-4106d in its storage 4142, for at least a while. Timestamps 4144 holds timestamping information for network messages 4106a-4106d.
A monitoring node 4150 monitors public network 4146 with a monitoring component 4156. Monitoring node 4150 may be, for example, a third party that is unrelated to item 4110, has no knowledge of the contents of item 4110, and thus has no interest in falsifying data with regard to item 4110. Monitoring component 4156 is able to receive broadcasts from public network 4146. As illustrated, monitoring node 4150 stores received network message 4106a and other received network messages 4106b-4106d that had been broadcast by messaging network 4140, in its storage 4152. In some examples, monitoring node 4150 timestamps network messages 4106a-4106d as they are received, and stores the resulting timestamps in timestamps 4154. Timestamps 4154 may provide an independent time verification source for network messages 4106a-4106d that is outside the control of messaging network 4140. As shown, any of network messages 4106a-4106d, timestamps 4144, and timestamps 4154 may be submitted to document corral 3600 for inclusion in blockchain 2400.
Although messaging network 4140 may eventually delete network messages 4106a-4106d and timestamps 4144, and monitoring node 4150 may cease operations, thereby losing network messages 4106a-4106d and timestamps 4154, public records 2412a-2412d provide permanent, truly independent date proof for copies of network messages 4106a-4106d within document corral 3600. Although public records 2412a-2412d do not have the fine time resolution of timestamps 4144 and 4154, they are independently verifiable and permanent.
In some scenarios, as time lapses, the need for finer time resolution lessens. Consider, for example, cryptocurrency transactions. If a cryptocurrency holder is attempting to spend a particular cryptocurrency unit that was received only a matter of hours prior, blockchain 4200 may be able to establish that the cryptocurrency holder is the proper owner. However, the transaction in which the cryptocurrency holder received the particular cryptocurrency unit may not yet be established by blockchain 2400. In this scenario, the potential recipient, such as a retailer that accepts the cryptocurrency, does not trust blockchain 4200, because the retailer does not trust timestamps created by a messaging network operator. However, the potential recipient does trust blockchain 2400, because blockchain 2400 is independently verifiable. When sufficient time has passed that blockchain 2400 can verify the transaction (in which the cryptocurrency holder received the particular cryptocurrency unit), the cryptocurrency holder will be able to spend the cryptocurrency unit with potential recipients that only trust blockchain 2400 but not blockchain 4200.
In some examples, rapid parallel blockchain 4200 issues new blocks on the order of a minute, using SMS messages 4106a-4106f for timestamping. Although such timestamps (e.g., timestamps 4144) have a finer resolution than the intervals between public records 2412a, 2412b, and 2412c, the timestamps are under the control of messaging network 4140. This means that, to at least some extent, messaging network 4140 must be trusted to timestamp network messages accurately. For long term storage, when messaging network 4140 no longer has any interest in maintaining timestamp data and copies of network messages, the reliability of the timestamps may be determined by the reliability of the entity controlling the long term storage of the messages.
This is where the inclusion of blocks 4102a-4102f of rapid parallel blockchain 4200 within blockchain 2400 provides value (as does the inclusion of network messages 4106a-4106f within blockchain 2400). In the long term, it can be established that the initially-applied timestamps (by messaging network 4140) had not been altered, even if messaging network 4140 ceases operations and all of its records are lost. Blockchain 2400 may run at a rate in which new blocks are generated hourly, daily, at set intervals each day, or some other interval (which may vary). For example, blocks for blockchain 2400 may be generated at 9 am, noon, and 5 pm in selected time zones, such as one or more of Coordinated Universal Time (UTC), Eastern US, Pacific US, Japan Standard Time, and others. In some examples, blocks for blockchain 2400 may be generated at different time intervals on weekends and holidays. Although, in some examples, publication intervals for public records 2412a, 2412b, and 2412c (of
In operation, records 4104a-4104d arrive during a time window 4204a, and are included in block 4102a. Block 4102a becomes part of blockchain 4200. Network message 4106a is generated from block 4102a for broadcast, and is timestamped. Record 4104e is generated for block 4102a during a next time window 4204b. Additional records 4104f and 4104g arrive during time window 4204b. Records 4104e-4104g are included in block 4102b. Record 4104e chains blocks 4102a and 4102b, and block 4102b becomes part of blockchain 4200. Network message 4106b is generated from block 4102b for broadcast, and is timestamped. Record 4104h is generated for block 4102b during a next time window 4204c. Additional records 4104i and 4104J arrive during time window 4204c. Records 4104h-4104J are included in block 4102c. Record 4104h chains blocks 4102b and 4102c, and block 4102c becomes part of blockchain 4200. Network message 4106c is generated from block 4102c for broadcast, and is timestamped. Record 4104k is generated for block 4102c during a next time window 4204d. Additional records 4104L and 4104m arrive during time window 4204d.
Records 4104k-4104m are included in block 4102d. Record 4104k chains blocks 4102c and 4102d, and block 4102d becomes part of blockchain 4200. Network message 4106d is generated from block 4102d for broadcast, and is timestamped. Record 4104n is generated for block 4102d during a next time window 4204e. No additional records arrive during time window 4204e, so only record 4104n is included in block 4102e. Record 4104n chains blocks 4102d and 4102e, and block 4102e becomes part of blockchain 4200. Network message 4106e is generated from block 4102e for broadcast, and is timestamped. Record 4104o is generated for block 4102e during a next time window 4204f. Additional records 4104p, 4104q, and 4104r arrive during time window 4204f. Records 4104o-4104r are included in block 4102f. Record 4104o chains blocks 4102e and 4102f, and block 4102f becomes part of blockchain 4200. Network message 4106f is generated from block 4102f for broadcast, and is timestamped. Record 4104s is generated for block 4102f during a next time window, and this process repeats. Blocks 4102a-4102f and possibly also network messages 4106a-4106f are put into blockchain 2400. As illustrated, time windows 4204a-4204c are portions of time window 4202a, so blocks 4102a-4102c of blockchain 4200 become part of block 2402a of blockchain 2400. Time windows 4204d-4204f are portions of time window 4202b, so blocks 4102d-4102f of blockchain 4200 become part of block 2402b of blockchain 2400. In some examples, the ratio of the number of time windows for blocks of blockchain 4200 to the number of time windows for blocks of blockchain 2400 is significantly larger than illustrated, such as on the order of hundreds or even thousands.
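The windowed chaining just described can be sketched in a few lines. This is a minimal illustration under assumptions that are not part of the disclosure: rapid blocks are modeled as Python dictionaries, the chaining record is a SHA-256 digest of the prior block's JSON serialization, and the function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def block_ivc(block: Dict) -> str:
    """IVC for a rapid block (SHA-256 over a canonical JSON form, assumed)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def build_rapid_chain(windows: List[List[str]]) -> List[Dict]:
    """Build one rapid block per time window.

    Each inner list holds the records that arrived during that window.
    Every block after the first begins with a chaining record holding
    the IVC of the prior block.
    """
    blocks: List[Dict] = []
    for index, arrived in enumerate(windows):
        records = list(arrived)
        if blocks:
            records.insert(0, block_ivc(blocks[-1]))  # chaining record
        blocks.append({"index": index, "records": records})
    return blocks


def group_into_slow_blocks(rapid: List[Dict], per_slow_block: int) -> List[List[Dict]]:
    """Place consecutive rapid blocks into blocks of the slower blockchain."""
    return [rapid[i:i + per_slow_block] for i in range(0, len(rapid), per_slow_block)]
```

A window in which no records arrive still yields a block that holds only the chaining record, which mirrors the single-record block described above. With six windows and three rapid blocks per slower block, group_into_slow_blocks returns two groups, matching the illustrated layout.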
Evidence collection device 4306 sends evidence items 4110a and 4110b to a DEB operator 4310 over a network 4822. DEB operator 4310 has a local evidence store 4312 that holds evidence items 4110a and 4110b from evidence collection device 4306, and also evidence item 4110c from potentially another source. DEB operator 4310 has a rapid block generator 4314 that generates a rapid block for all evidence items collected within a prior time period, such as the prior two minutes. For example, a record may be generated for each of evidence items 4110a-4110c, and placed into a block 4102i. In some examples, DEB operator 4310 has a network message generator 4118 that generates network message 4106i (for example, an SMS) indicating block 4102i, for example using the processes described in relation to
Messaging network 4140 receives network messages 4106g-4106i for broadcast (e.g., over public network 4146), timestamps them, and stores their timestamps in timestamps 4144. Messaging network 4140 may receive network messages from any of evidence collection device 4306, DEB operator 4310, and even permissioning entity 2401. Document corral 3600 has copies of evidence items 4110a-4110c, network messages 4106g-4106i, and block 4102i. Document corral 3600 may receive various ones of these from any of evidence collection device 4306, DEB operator 4310, and messaging network 4140. When a subsequent block 4102J is chained to block 4102i by holding a record 4104u that includes an IVC for block 4102i, a portion of blockchain 4200 is formed. In some examples, DEB operator 4310 and/or permissioning entity 2401 may manage blockchain 4200. Blockchain 4200 provides time and integrity proof for at least evidence items 4110a and 4110b because IVCs (hash values) for evidence items 4110a and 4110b are contained within block 4102i. Blockchain 2400 also provides integrity proof for at least evidence items 4110a and 4110b because the contents of blockchain 4200 are within blockchain 2400. The date resolution for blockchain 2400 is coarser, on the order of days, rather than a minute or so.
Operation 4404 includes generating a first rapid record, the first rapid record comprising an IVC for the item. Thus, operation 4404 includes generating the IVC. In some examples, the IVC comprises a hash value comprising a complete message digest. In some examples, the IVC comprises a hash value comprising a partial message digest. In some examples, the IVC comprises a hash value comprising two message digests. In some examples, the IVC comprises a mixture of partial and complete message digests. In some examples, the hash value includes one or more portions of the SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512 message digests. In some examples, the first rapid record comprises an index value. At this point it is optional to add the first rapid record to a document corral for inclusion in a date-provable blockchain. Operation 4406 includes entering the first rapid record into the document corral. In some examples, operation 4406 includes submitting the evidence item to a document corral by the evidence collection device and/or the DEB operator.
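The IVC variants listed for operation 4404 (complete digests, partial digests, and mixtures of the two) can be illustrated as follows. This is a sketch only: the make_ivc function and its spec format are hypothetical, and taking the leading characters of a hexadecimal digest is just one assumed way of forming a partial message digest.

```python
import hashlib
from typing import List, Optional, Tuple


def make_ivc(data: bytes, spec: List[Tuple[str, Optional[int]]]) -> str:
    """Concatenate complete or partial hex digests of `data`.

    Each spec entry is (algorithm, keep), where keep is the number of
    leading hex characters to retain, or None for the complete digest.
    """
    parts = []
    for algorithm, keep in spec:
        digest = hashlib.new(algorithm, data).hexdigest()
        parts.append(digest if keep is None else digest[:keep])
    return "".join(parts)


item = b"example evidence item"

# Two complete message digests.
two_digests = make_ivc(item, [("sha1", None), ("sha384", None)])

# A mixture: a partial SHA-512 digest followed by a complete SHA-1 digest.
mixed = make_ivc(item, [("sha512", 32), ("sha1", None)])

print(len(two_digests))  # 136
print(len(mixed))        # 72
```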
Operation 4408 includes generating a first rapid block comprising the first rapid record and a second rapid record. In some examples, the first rapid block comprises an index value. In some examples, the first rapid block comprises an IVC (hash value, message digest) for a prior rapid block, thereby chaining the first rapid block and the prior rapid block. Operation 4410 includes generating an IVC for the first rapid block. At this point it is optional to add the first rapid block to the document corral, so operation 4406 includes entering the first rapid block into the document corral. Operation 4412 includes generating a network message indicating the first rapid record. In some examples, the network message indicating the first rapid record comprises at least a portion of the first rapid record. In some examples, the network message indicating the first rapid record comprises at least the IVC of the first rapid block. In some examples, the network message comprises an SMS message or a social media post. In some examples, the evidence collection device generates a network message indicating the evidence item. In some examples, the DEB operator generates the network message indicating the evidence item.
Operation 4414 includes submitting the network message indicating the first rapid record to a public messaging network for broadcasting. In some examples, the evidence collection device submits the network message indicating the evidence item to a public messaging network for broadcasting. In some examples, the DEB operator submits the network message indicating the evidence item to the public messaging network for broadcasting. Operation 4416 includes timestamping, by the public messaging network, the network message indicating the first rapid record. At this point it is optional to add a copy of the network message to the document corral, so operation 4406 includes entering a copy of the network message into the document corral. In some examples, operation 4406 also includes entering the timestamp of the network message into the document corral. Operation 4418 includes broadcasting, by the public messaging network, the network message indicating the first rapid record over a public medium. In some examples, broadcasting includes sending the network message over a wired network and/or a wireless network to paid subscribers.
Operation 4420 includes receiving the broadcast network message at a monitoring node. In some examples the monitoring node is also a DEB operator. Operation 4422 includes timestamping the received broadcast network message. At this point it is optional to add a copy of the received broadcast network message to the document corral, so operation 4406 includes entering the received broadcast network message into a document corral. In some examples, operation 4406 also includes entering the timestamp of the received broadcast network message into the document corral.
Operation 4424 includes generating a rapid blockchain comprising the prior rapid block, the first rapid block, and a subsequent rapid block. In some examples, the subsequent rapid block comprises an IVC (hash value, message digest) for the first rapid block, thereby chaining the subsequent rapid block and the first rapid block. In some examples, blocks of the rapid blockchain are generated at time intervals of two minutes or less. In some examples, blocks of the rapid blockchain are generated at time intervals of an hour or less. Although the rapid blockchain uses timestamps provided by the public messaging network, which may not be a trusted timestamping entity (TTE), the rapid blockchain does provide higher time resolution than the slower blockchain. The slower blockchain, in turn, provides a provable date, although with coarser time resolution. Operation 4426 includes generating a blockchain record indicating the first rapid record. In some examples, the blockchain record indicating the first rapid record comprises the first rapid record. In some examples, the blockchain record indicating the first rapid record comprises the first rapid block. In some examples, the blockchain record indicating the first rapid record comprises a timestamp for the first rapid block. In some examples, operation 4426 is part of a larger operation that includes generating blockchain records for the first blockchain from entries in the document corral.
The first blockchain record is added into the slower blockchain, using one or more of flowcharts 3200, 3300, 3700, and 4000. In some examples, a block of the first blockchain comprises multiple blocks of the rapid blockchain. In some examples, blocks of the first blockchain are generated at time intervals of an hour or less. In some examples, blocks of the first blockchain are generated at time intervals of a day or less. In some examples, blocks of the first blockchain are generated according to a schedule at a set of selected times in a set of selected time zones. In some examples, the schedule varies according to holiday. For later proving the date and integrity of the item received in operation 4402, operation 4428 includes retrieving a timestamp from the public messaging network, such as a timestamp generated in operation 4416 and/or operation 4422. Flowchart 3500 completes the proof, with the retrieved timestamp providing finer time resolution.
Upon receiving reserved blockchain address 4512, the user enters it (or a suitable indication) into document 4502a to make it into document 4502b. The user generates a blockchain record 4504 for document 4502b. Document 4502b is now able to indicate its own blockchain registration and, when hashed at a later time (e.g., during verification in order to resolve a dispute), will reproduce the hash value (IVC) within the record that it indicates internally. This capability is not currently achievable with any blockchain other than PEDDaL®.
User node 4508 generates a message 4506 including record 4504 and reserved blockchain address 4512 and transmits message 4506 to permissioning entity 2401. Permissioning entity 2401 receives message 4506 that associates record 4504 with reserved blockchain address 4512. Permissioning entity 2401 identifies reserved blockchain address 4512 within reservations 4524 and uses a record scheduler 4528 to schedule inclusion of record 4504 in blockchain 2400 according to reserved blockchain address 4512. If record 4504 is not received in time, but reserved blockchain address 4512 had included a reserved index value, permissioning entity 2401 may zero-pad the location within the scheduled block that corresponds to the reserved index (or just put in a different record at that location).
Record 4504 is placed into a record storage 4526 to await its scheduled block. If record 4504 is received early enough prior to the generation of the scheduled block, permissioning entity 2401 may also include record 4504 in an earlier block as an early record. A linking component 4532 generates a linked record locating field (e.g., record locator field 2802p) with reserved blockchain address 4512, to turn record 4504 into record 4504a. A block assembly component 4530 puts records into blocks for blockchain 2400, including record 4504a. Upon the generation period for the scheduled block, if an early record had appeared in an earlier block, linking component 4532 generates a linked record locating field with the blockchain address of that earlier record (record 4504a), to turn record 4504 into record 4504b. Block assembly component 4530 puts record 4504b (or record 4504, if there is no linking information) into blockchain 2400 as scheduled (possibly also at the scheduled index position).
IVC generator 2408 generates a hash value 4606 for document 4502b. A record generator (not shown) includes IVC generator 2408 and places hash value 4606 (or another IVC, as generated by IVC generator 2408) within scheduled record 4504b. As illustrated, early record 4504a has the same hash value 4606. This is because early record 4504a and scheduled record 4504b are both for the same document 4502b. As illustrated, early record 4504a has a linked record value in a linked record field 4620 that indicates a blockchain address (e.g., the number of block 2402d and the value of index 4608) of scheduled record 4504b. Also as illustrated, scheduled record 4504b has a linked record value in a linked record field 4610 that indicates a blockchain address (e.g., the number of block 2402b and the value of index 4628) of early record 4504a.
Anyone possessing a copy of document 4502b can locate scheduled record 4504b using the indication of reserved blockchain address 4512 in document 4502b. This permits determining integrity or a no-later-than date of existence for document 4502b using scheduled record 4504b. However, with linked records, finding scheduled record 4504b enables locating early record 4504a using the linked record value (within scheduled record 4504b) for early record 4504a. This permits determining integrity or a no-later-than date of existence for document 4502b using early record 4504a. In some scenarios, this earlier provable date may be valuable.
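The mutual linking of an early record and a scheduled record, and the lookup that follows the link, can be sketched as shown here. This is an illustrative model only: the blockchain is a dictionary keyed by (block ID, index) addresses, SHA-256 stands in for the IVC, and the names Record, register_pair, and follow_link_to_early_record are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (block ID, index within the block), assumed layout


@dataclass
class Record:
    ivc: str
    linked: Optional[Address] = None  # linked record value


def register_pair(chain: Dict[Address, Record], document: bytes,
                  early: Address, scheduled: Address) -> None:
    """Store two records for the same document, each linking to the other."""
    digest = hashlib.sha256(document).hexdigest()
    chain[early] = Record(digest, linked=scheduled)
    chain[scheduled] = Record(digest, linked=early)


def follow_link_to_early_record(chain: Dict[Address, Record], document: bytes,
                                scheduled: Address) -> Optional[Address]:
    """Start at the scheduled record and follow its link to the early record.

    Returns None if the document does not match the scheduled record.
    Returns the scheduled address itself if there is no matching linked record.
    """
    digest = hashlib.sha256(document).hexdigest()
    record = chain[scheduled]
    if record.ivc != digest:
        return None
    if record.linked is not None and chain[record.linked].ivc == digest:
        return record.linked
    return scheduled
```

A verifier who reads the reserved address from the document would call follow_link_to_early_record with that address; the returned address identifies the block whose publication date gives the earlier provable date.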
In some examples, the SABRe reference section 4604 is printed in a footer of a document, so that the blockchain registration is easily located by anyone who sees any copy of the document. Such examples thus include printing a blockchain address (blockchain registration address) of a blockchain record (for the document) on a copy of the document itself. This may be performed in combination with use of a daisy chained record, a document corral, a quarantine-enabled document corral, a network message for timestamping, a rapid parallel blockchain, a DEB, and/or other examples described herein.
A real-world example exists for the PEDDaL® blockchain. The text shown in document content section 4602 and SABRe reference section 4604 is in an ASCII text file (so there is no metadata or other extraneous word processing file data to throw off the hash values), with a single space between “experience.” and “The PEDDaL”, and a single carriage return between “mechanism.” and “This document”. After “at:” there is a single space, followed by “191205a0000A5” in lieu of the text window placeholder for reserved blockchain address 4512. There are no other spaces or carriage returns, and the text file has 319 bytes (characters). The text document predicts its own blockchain registration, because hashing the text file produces the SHA-512 and SHA-1 message digests found in the record at index value 0xA5 in block 191205a. By recreating the above-described text file carefully, this self-referencing blockchain registration can be independently verified.
Operation 4702 includes requesting a reserved blockchain address. Operation 4704 includes receiving the request to reserve a blockchain address. Operation 4706 includes determining a reserved blockchain address. Operation 4708 includes returning the reserved blockchain address. Operation 4710 includes receiving the reserved blockchain address. In some examples, the reserved blockchain address includes both a block ID and an index value.
Now that the document owner has the reserved blockchain address, operation 4712 includes entering an indication of the reserved blockchain address into a document. Operation 4714 includes generating a record for the document. In some examples, generating a record for the document comprises generating a record for a document containing an indication of the reserved blockchain address. Operation 4716 includes transmitting the record for the document, with an association of the reserved blockchain address, to the permissioning entity (or some other node that collects records). Operation 4718 includes the permissioning entity receiving a record associated with the reserved blockchain address. Operation 4720 includes scheduling inclusion of the received record in the blockchain according to the reserved blockchain address.
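The reservation exchange of operations 4702 through 4720 can be sketched as follows. This is a simplified model under stated assumptions: the PermissioningEntity class, its method names, the "block:index" address notation written into the document, and the use of SHA-256 as the IVC are all hypothetical and chosen only for illustration. The key point it shows is ordering: the address is reserved first, written into the document second, and only then is the document hashed.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Address = Tuple[str, int]  # (block ID, index value), assumed layout
ZERO_PAD = "0" * 64        # filler for a reserved slot that was never used


class PermissioningEntity:
    def __init__(self) -> None:
        self.reservations: Dict[Address, Optional[str]] = {}

    def reserve(self, block_id: str) -> Address:
        # Operations 4704-4708: determine and return a reserved address.
        index = sum(1 for (b, _) in self.reservations if b == block_id)
        address = (block_id, index)
        self.reservations[address] = None
        return address

    def submit(self, address: Address, ivc: str) -> None:
        # Operations 4718-4720: accept a record only for a known reservation.
        if address not in self.reservations:
            raise KeyError(f"no reservation for {address}")
        self.reservations[address] = ivc

    def build_block(self, block_id: str) -> List[str]:
        # Each record goes to its reserved index; unused slots are zero padded.
        indices = sorted(i for (b, i) in self.reservations if b == block_id)
        return [self.reservations[(block_id, i)] or ZERO_PAD for i in indices]


def register(entity: PermissioningEntity, template: str, block_id: str):
    address = entity.reserve(block_id)                                     # 4702, 4710
    document = template.format(address=f"{address[0]}:{address[1]:04X}")   # 4712
    ivc = hashlib.sha256(document.encode("ascii")).hexdigest()             # 4714
    entity.submit(address, ivc)                                            # 4716
    return document, address
```

Because the document text already contains its address when it is hashed, a later verifier can read the address from the document, look up that slot, and expect the stored IVC to match a fresh hash of the same text.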
If the record is received while another block is being generated, before the scheduled block, the permissioning entity may also include the record in the earlier block as an early record. The permissioning entity may also put a linked record within the early record for the scheduled record, since the schedule is already known via the reservations. Thus, optional operation 4722 includes including, within an early record, a linked record value indicating a blockchain address of the scheduled record, and operation 4724 includes additionally including the received record, as an early record, in the blockchain in an earlier block, prior to the schedule. Operation 4726 includes including, within the scheduled record, a linked record value indicating a blockchain address of the early record. Operation 4728 includes including the received record, as a scheduled record, in the blockchain according to the schedule. Operation 4730 includes distributing copies of the blockchain outside the control of a permissioning entity of the blockchain, such that the permissioning entity is unable to alter the blockchain without detection. In some examples, distributing copies of the blockchain outside the control of a permissioning entity of the blockchain comprises publishing the blockchain on a website.
At a later time, when the document requires date and/or integrity verification, operation 4732 includes locating the scheduled record within the blockchain using the indication of the reserved blockchain address in the document. If, somehow, the early record had already been located, it is also possible to identify, within a linked record locator field of the early record, a linked record value for the scheduled record. This then permits locating the scheduled record within the blockchain using the linked record value for the scheduled record. Operation 4734 includes determining integrity or a no-later-than date of existence for the document using the scheduled record in the blockchain. In some examples, determining integrity for a document comprises generating an IVC for the document and comparing the generated IVC for the document with a recorded IVC within a record within the blockchain. In some examples, determining a no-later-than date of existence for a document comprises hashing the document, comparing a resulting hash value with a recorded hash value within the blockchain, and determining a no-later-than date of existence for a block of the blockchain that contains the recorded hash value.
Since the address of the scheduled record is identified within the document, it may be easier to initially locate the scheduled record. However, if an early record had also been generated and linked, it is possible to locate the early record using the scheduled record. Thus, operation 4736 includes identifying, within a linked record locator field of the scheduled record, a linked record value for the early record. Operation 4738 includes locating the early record within the blockchain using the linked record value for the early record. Operation 4740 includes determining integrity or a no-later-than date of existence for the document using the early record in the blockchain.
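The integrity and date determination described for operation 4734 reduces to a hash comparison followed by a date lookup. The sketch below assumes a hypothetical block layout (a dictionary with a list of recorded IVCs and a "no_later_than" date) and SHA-256 as the IVC; neither is taken from the disclosure.

```python
import hashlib
from datetime import date
from typing import Dict, Optional


def no_later_than_date(document: bytes, chain: Dict[str, dict],
                       block_id: str, index: int) -> Optional[date]:
    """Return the block's no-later-than date if the document matches the record.

    Returns None when the generated IVC differs from the recorded IVC,
    meaning integrity could not be established against this record.
    """
    block = chain[block_id]
    recorded_ivc = block["records"][index]
    generated_ivc = hashlib.sha256(document).hexdigest()
    if generated_ivc != recorded_ivc:
        return None
    # The document existed no later than the date established for the block.
    return block["no_later_than"]
```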
Computing device 4800 includes a bus 4802 that directly or indirectly couples the following devices: memory 4804, one or more processors 4806, one or more presentation components 4808, input/output (I/O) ports 4810, I/O components 4812, a power supply 4814, and a network component 4816. Computing device 4800 should not be interpreted as having any dependency or requirement related to any single component or combination of components illustrated therein. While computing device 4800 is depicted as a seemingly single device, multiple computing devices 4800 may work together and share the depicted device resources. For instance, computer-storage memory 4804 may be distributed across multiple devices, processor(s) 4806 may be housed on different devices, and so on. Bus 4802 represents what may be one or more busses (such as an address bus, data bus, or a combination thereof). Although the various blocks of
Computer-storage memory 4804 may take the form of the non-transitory computer-storage media referenced below and operatively provide storage of computer-readable instructions, data structures, program modules and other data for computing device 4800. For example, memory 4804 may store an operating system and other program modules and program data. Memory 4804 may be used to store and access instructions configured to carry out the various operations disclosed herein and may include computer-storage media in the form of volatile and/or nonvolatile memory, removable or non-removable memory, data disks in virtual environments, or a combination thereof. Memory 4804 may include any quantity of memory associated with or accessible by the computing device 4800. Memory 4804 may be internal to the computing device 4800, external to the computing device 4800, or both. Examples of memory 4804 include, without limitation, random access memory (RAM); read only memory (ROM); electrically erasable programmable read only memory (EEPROM); flash memory or other memory technologies; CD-ROM, digital versatile disks (DVDs) or other optical or holographic media; magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices; memory wired into an analog computing device; or any other medium for encoding desired information and for access by computing device 4800. Additionally, or alternatively, memory 4804 may be distributed across multiple computing devices 4800, e.g., in a virtualized environment in which instruction processing is carried out on multiple computing devices 4800. For the purposes of this disclosure, “computer storage media,” “computer-storage memory,” “memory,” and “memory devices” are synonymous terms for memory 4804, and none of these terms include carrier waves or propagating signaling.
Processor(s) 4806 may include any quantity of processing units that read data from various entities, such as memory 4804 or I/O components 4812. Specifically, processor(s) 4806 are programmed to execute computer-executable instructions for implementing aspects of the disclosure. The instructions may be performed by one or more processors 4806 within computing device 4800, or by a processor external to computing device 4800. In some examples, processor(s) 4806 are programmed to execute instructions such as those illustrated in the flowcharts depicted in the accompanying drawings. Moreover, in some examples, processor(s) 4806 represent an implementation of analog techniques to perform the operations described herein. For example, the operations may be performed by an analog computing device 4800 and/or a digital computing device 4800. Presentation component(s) 4808 present data indications to a user or other device. Exemplary presentation components 4808 include a display device, speaker, printing component, vibrating component, etc. One skilled in the art will understand and appreciate that computer data may be presented in a number of ways, such as visually in a graphical user interface (GUI), audibly through speakers, wirelessly between computing devices 4800, across a wired connection, or in other ways. I/O ports 4810 allow computing device 4800 to be logically coupled to other devices including I/O components 4812, some of which may be built in. Example I/O components 4812 include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.
Computing device 4800 may operate in a networked environment via network component 4816 using logical connections to one or more remote computers. In some examples, network component 4816 includes a network interface card and/or computer-executable instructions (e.g., a driver) for operating the network interface card. Communication between computing device 4800 and other devices may occur using any protocol or mechanism over any wired or wireless connection. In some examples, network component 4816 is operable to communicate data over public, private, or hybrid (public and private) networks using a transfer protocol, between devices wirelessly using short range communication technologies (e.g., near-field communication (NFC), Bluetooth™ branded communications, or the like), or a combination thereof. For example, network component 4816 communicates over a communication link 4820, through a network 4822, with a cloud resource 4824. Various examples of communication link 4820 include a wireless connection, a wired connection, and/or a dedicated link, and in some examples, at least a portion is routed through the internet. In some examples, cloud resource 4824 performs at least some of the operations described herein for computing device 4800.
Although described in connection with an example computing device 4800, examples of the disclosure are capable of implementation with numerous other general-purpose or special-purpose computing system environments, configurations, or devices. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with aspects of the disclosure include, but are not limited to, smart phones, mobile tablets, mobile computing devices, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, gaming consoles, microprocessor-based systems, set top boxes, programmable consumer electronics, mobile telephones, mobile computing and/or communication devices in wearable or accessory form factors, network PCs, minicomputers, distributed computing environments that include any of the above systems or devices, and the like. Such systems or devices may accept input from the user in any way, including from input devices such as a keyboard or pointing device, via gesture input, proximity input (such as by hovering), and/or via voice input.
The order of execution or performance of the operations in examples of the disclosure illustrated and described herein is not essential, and the operations may be performed in different sequential manners in various examples. For example, it is contemplated that executing or performing a particular operation before, contemporaneously with, or after another operation is within the scope of aspects of the disclosure. When introducing elements of aspects of the disclosure or the examples thereof, the articles “a,” “an,” “the,” and “said” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements. The term “exemplary” is intended to mean “an example of.” The phrase “one or more of the following: A, B, and C” means “at least one of A and/or at least one of B and/or at least one of C.” Having described aspects of the disclosure in detail, it will be apparent that modifications and variations are possible without departing from the scope of aspects of the disclosure as defined in the appended claims. As various changes could be made in the above constructions, products, and methods without departing from the scope of aspects of the disclosure, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
This application claims the benefit of U.S. Provisional Patent Application No. 62/980,467, filed Feb. 24, 2020, entitled “Blockchain With Daisy Chained Records, Document Corral, Quarantine, Message Timestamping, And Self-Addressing”, the entirety of which is hereby incorporated by reference herein; and also claims the benefit of U.S. Provisional Patent Application No. 62/841,406, filed May 1, 2019, entitled “Blockchain With Daisy Chained Record References”, the entirety of which is hereby incorporated by reference herein. This application is also a continuation-in-part of co-pending U.S. patent application Ser. No. 16/399,084, filed Apr. 30, 2019, which is a continuation of U.S. patent application Ser. No. 15/086,042, filed Mar. 30, 2016, now U.S. Pat. No. 10,313,360, which is a continuation of U.S. patent application Ser. No. 14/720,874, filed May 25, 2015, now U.S. Pat. No. 9,330,261, which is a continuation of U.S. patent application Ser. No. 13/304,657, filed Nov. 27, 2011, now U.S. Pat. No. 9,053,142, which is a continuation of U.S. patent application Ser. No. 13/017,057, filed Jan. 31, 2011, now U.S. Pat. No. 8,135,714, which is a continuation of U.S. patent application Ser. No. 12/110,282, filed Apr. 25, 2008, now U.S. Pat. No. 7,904,450, and claims priority thereto.
Number | Date | Country
---|---|---
62980467 | Feb 2020 | US
62841406 | May 2019 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 15086042 | Mar 2016 | US
Child | 16399084 | | US
Parent | 14720874 | May 2015 | US
Child | 15086042 | | US
Parent | 13304657 | Nov 2011 | US
Child | 14720874 | | US
Parent | 13017057 | Jan 2011 | US
Child | 13304657 | | US
Parent | 12110282 | Apr 2008 | US
Child | 13017057 | | US

Relation | Number | Date | Country
---|---|---|---
Parent | 16399084 | Apr 2019 | US
Child | 16864078 | | US