The present invention generally relates to the authenticity and integrity of documents. In particular, the present invention relates to a computer-implemented method, and also to a system and computer programs, for tracking the lifecycle of certified documents, said documents having previously been certified with integrity and authenticity guarantees.
When a user or an entity deals with a document, whether in paper or digital form, they face a common problem: verifying the authenticity and integrity of the document. The authenticity of a document or certificate means that it was actually emitted by its emitting entity on the stated date. The integrity of a document or certificate, on the other hand, means that it has not been edited (text added, removed or altered) after its emission or since its current version.
Different practices are currently used to check whether a document is authentic and has not been edited since its emission (i.e., integrity), or to track the document history in terms of document access and editing/versioning. All of them present limitations.
Documents may come with an Administrative Reference Code, by which a service may provide the emission date of the document as well as its issuer (to check the document authenticity) or the whole document (authenticity and integrity). The latter case is prone to data leaks, while the former does not guarantee integrity. Furthermore, in both cases, human intervention is always needed to compare the emission date and/or the whole document with the original one, making the system prone to human error.
Some patent applications are known in the field, for instance:
US-A1-20140049802 describes a system based on the generation of an encoded image, ETCODE, using steganographic techniques, to be printed with the document using conventional printers. The decoding is performed by a portable digital camera device, which obtains the information hidden in the ETCODE; this information is then compared with the information about the digital version of the document held in a database. The described solution does not rely on a trusted third party, nor does it describe how the comparison between the presented copy and the stored copy of the document is performed or whether it is available for both digital and digitalized documents.
WO 2008108861 describes a method for processing electronic documents, such as electronic invoices, specifications, or contracts, to ensure authenticity, integrity, confidentiality, and non-repudiation of the document. A third party service provider is established as the agent for two interacting parties. The third party service provider receives an electronic document from a first party, the electronic document relating to a transaction between the parties, the transaction being, for example, a sale or a contract. The third party service provider provides an electronic signature and certification for the document and archives the document, providing it, along with the certification, to the second party or others. The described solution only works for digital documents and not for digitalized documents, for which no guarantee is offered. Furthermore, the solution relies on digital certificates and therefore suffers from their limitations.
KR 1020080014194 describes an electronic document repository system that includes an authentication module, a registration module, a reading module, an issuing module and a certificate module. The authentication module is connected to a user terminal through a network, secures the authenticity of electronic documents, and performs user authentication through a log-in process when a user accesses the electronic document repository system. The registration module checks an electronic document information package transmitted from the user, generates metadata, adds authentication information to the metadata and stores the metadata in a database. The reading module generates a reading information package and transmits it to the user when the user wants to read an electronic document. The issuing module generates an issuing information package and transmits it to the user when the user requests the issuing module to issue an electronic document. The certificate module issues a certificate for the electronic document or verifies an issued certificate. Contrary to the present invention, this solution only provides document issuing and retrieving functionalities, providing documents with embedded metadata for an authenticity check on the user side. It does not allow the authenticity and integrity of a carried document to be checked automatically: it only allows a manual comparison of a carried document with the digital copy obtained from the repository in order to verify the document authenticity and integrity.
US-A1-20090193259 describes a solution to store documents and check their authenticity. The solution relies on a hash of the document, fixed to the document itself with a digital signature. The solution only considers digital documents, and not digitalized ones. Moreover, this solution includes the digest of the hash in a visible fashion inside the document, and not in an imperceptible fashion as described in the present invention. Furthermore, the solution does not allow metadata to be included in the document, but only a hash of the document itself. Finally, no details are disclosed on how the document hash is computed, or on how the documents are stored in an unalterable way.
US-A1-20100122348 describes a solution to digitalize documents and store them in a repository to check their authenticity on the basis of a mark applied to the digital version. The mark is a combination of the issuer and stored marks. Contrary to the present invention, this solution only considers digitalized documents and not native digital ones. The described solution includes a visible mark in the document. As such, it only guarantees the document integrity through a manual comparison. Furthermore, the included mark does not allow metadata to be stored. Finally, no details are disclosed on how the document is stored in an unalterable way on the storing side.
US20180025181A1 describes a solution to guarantee the integrity of large quantities of data files by constantly monitoring a data storage through a computer system, which records any change in the data files using a DLT as a repository and including the hash of the data file for which a change has been detected. The described solution only monitors files in a specific storage base, not in any location. Furthermore, the described solution only detects and records changes to the content of the data files, not other kinds of operations, such as changes in the file metadata or access to the file by specific users. Finally, the changes to the files as recorded by the described solution do not include metadata on the user executing them or other similar information, which is of key importance for security applications of the system. Overall, the described solution is a reactive system, detecting changes to the files once they have occurred, rather than a proactive solution recording the changes directly from the system component executing them.
More solutions are therefore needed to track operations on certified files through their lifecycle in an immutable and confidential register.
Embodiments of the present invention provide, according to an aspect, a computer-implemented method for tracking the lifecycle of certified documents in an immutable register, while keeping the changes themselves confidential. The method first comprises receiving, by a second computer system, from a first computer system (issuer), at least one document (a digital document, e.g., a PDF) to be certified, the at least one document being preferably identified in the second computer system with metadata at least including an identifier of the first computer system and a timestamp. Then, the second computer system certifies the received document by applying to the received document a watermark (i.e. an alteration of the document that may include an identifying image or pattern, such as a character spacing or character deformation in the case of text, or pixel shifting in frequency or space in the case of images), providing a modified document. The modified document is sent by the second computer system to the first computer system to be stored or used.
Once the modified document is submitted to a transaction, a fourth computer system receives, from the second computer system, an identifier of the modified document and an identifier of an operation executed on the modified document during the transaction, and stores information about the transaction in a third computer system held within a distributed ledger (DLT). Then, the third computer system executes a computer program that stores pairings composed of said identifier of the modified document and a hash value of the transaction, and sends the hash value to the fourth computer system. Upon reception of the hash value, the fourth computer system stores an address of the computer program, an interface of the computer program and the identifier of the modified document in a record of a database. The record can then be used to track any future transaction executed on the modified document, i.e. through the transaction hash; knowing the document identifier, the fourth computer system can track the complete set of transactions executed on a document during its lifecycle.
The computer program (or document manager) executed by the third computer system is a computer program that is able to define and store functions, structs and pointers, in this particular case pointers to transactions referring to each operation performed on the modified document. This information is stored in a manager's storage. Each manager (each computer program) has its own storage and memory space and both are in the third computer system.
The computer program interface (or document manager interface) is a data encoding scheme used in the third computer system to work with the document manager. The computer program interface defines how data structures or computational routines are accessed.
As stated before, the computer program (or document manager) stores pairings composed of a unique identifier of the modified document (document ID) and a hash value of the transaction. This pairing can be referred to as a documentID-hash pairing. Since the storage of the document manager is located in the third computer system, each documentID-hash pairing is stored in the third computer system as well. There is a different document manager for each unique original document the invention processes, i.e. the document manager is specific to each document whose lifecycle has to be tracked.
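The tracking flow described above can be sketched as follows. This is a minimal in-memory illustration, not the claimed implementation: the class names `Ledger` and `Tracker`, the `manager:` address format, the `sequence` field and the use of SHA-256 over a JSON payload are all assumptions made for the example, and a real third computer system would be a distributed ledger rather than a Python dictionary.

```python
import hashlib
import json


class Ledger:
    """In-memory stand-in for the third computer system (hypothetical)."""

    def __init__(self):
        self.transactions = {}  # transaction hash -> transaction payload
        self.managers = {}      # manager address -> list of (document_id, tx_hash)

    def store_transaction(self, payload: dict) -> str:
        # The transaction hash is derived from the payload (assumed scheme).
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        tx_hash = hashlib.sha256(encoded).hexdigest()
        self.transactions[tx_hash] = payload
        return tx_hash

    def record_pairing(self, address: str, document_id: str, tx_hash: str) -> None:
        # The document manager keeps documentID-hash pairings in its own storage.
        self.managers.setdefault(address, []).append((document_id, tx_hash))


class Tracker:
    """In-memory stand-in for the fourth computer system (hypothetical)."""

    def __init__(self, ledger: Ledger):
        self.ledger = ledger
        self.database = {}  # document_id -> {"address": ..., "interface": ...}

    def track(self, document_id: str, operation_id: str) -> str:
        payload = {
            "document_id": document_id,
            "operation_id": operation_id,
            "sequence": len(self.ledger.transactions),  # keeps payloads distinct
        }
        tx_hash = self.ledger.store_transaction(payload)
        address = "manager:" + document_id  # one manager per document
        self.ledger.record_pairing(address, document_id, tx_hash)
        # Record the manager address and interface against the document ID.
        self.database.setdefault(
            document_id, {"address": address, "interface": ["record_pairing"]}
        )
        return tx_hash


ledger = Ledger()
tracker = Tracker(ledger)
first = tracker.track("doc-1", "CREATE")
second = tracker.track("doc-1", "EDIT")
print(ledger.managers["manager:doc-1"] == [("doc-1", first), ("doc-1", second)])
```

Keeping one manager entry per document mirrors the design choice stated in the text: every pairing for a given document is reachable through a single address held in the database record.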
According to the proposed method, the operation executed on the modified document may include its creation, its editing (e.g., signing, adding or removing text, etc.), metadata addition, change or removal, watermark insertion/extraction, biometric signing, document access, etc. The operation may also include processes performed with the document, such as sharing the document through different channels like email, sharing a link to the document stored in an online repository, etc.
Each operation corresponds to a specific identification code and is bound to specific metadata, including a unique ID of the user executing it (and possibly his/her affiliation), a timestamp, and any further data specific to the corresponding operation, e.g., policy settings for the metadata analysis/insertion/editing, the result of the document access attempt (success, partial access, failure, etc.), etc.
Different modules may be included in the fourth computer system to take care of the different kinds of actions to which the document may be subject, e.g., a module for the operations concerning metadata (access, editing, addition, removal), a module for the operations concerning the watermark (addition, extraction, removal), a module for the operations concerning the document biometrical signature (signature, extraction, removal), a module for the operations concerning the document access and Information Rights Management (IRM) (access attempt, visualization, editing), etc. This set of operations may be executed on a document in any location, including a local document, a document in a cloud system or in a shared memory and more, as the operations are executed by specific modules, which are connected to the described system and are needed to execute the operations themselves.
In an embodiment, the certifying step comprises computing, by the second computer system, a first cryptographic function (e.g., a hash function) of the received document and sending the computed first cryptographic function to the third computer system, the third computer system storing the first cryptographic function in at least one memory thereof. Then, the second computer system receives a first message digest corresponding to an identifier of having stored the first cryptographic function in the third computer system. Next, the second computer system computes a key using the received first message digest and metadata of the document, said computed key being encoded into the above-mentioned watermark. The second computer system also computes a second cryptographic function of the modified document and sends the computed second cryptographic function and the modified document to the third computer system for storage thereof. Finally, the second computer system receives a second message digest corresponding to an identifier of having stored the second cryptographic function in the third computer system, and stores it locally.
A DLT is to be understood as a consensus of replicated, shared, and synchronized digital data geographically spread across multiple sites, countries, or institutions. There is no central administrator or centralized data storage. As a consequence, the system is fault tolerant and universal (i.e., it can be adopted independently of the geographical location). A peer-to-peer network is required, as well as consensus algorithms to ensure that replication across nodes is undertaken. A blockchain is a possible implementation of the DLT.
According to an embodiment, the watermark is replicated at different points of the modified document, making it possible to check the authenticity of the document, or even of a portion of the document if it has been damaged (i.e., a broken document where a part is missing, a dirty or crumpled paper document, etc.). Preferably, the watermark is configured to be indistinguishable to the human eye, while remaining identifiable on digital inspection.
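As a rough illustration of why a blockchain-style ledger gives an immutable register, the sketch below chains records by hash so that altering any stored record invalidates the chain. It is a single-process toy under stated assumptions: the function names and the block layout are hypothetical, and the peer-to-peer replication and consensus that the definition above requires are deliberately left out.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(index: int, previous_hash: str, data: dict) -> str:
    payload = json.dumps(
        {"index": index, "previous_hash": previous_hash, "data": data},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, data: dict) -> None:
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append(
        {
            "index": index,
            "previous_hash": previous_hash,
            "data": data,
            "hash": block_hash(index, previous_hash, data),
        }
    )


def is_valid(chain: list) -> bool:
    # Recompute every hash and check that each block points at its predecessor.
    previous_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["previous_hash"] != previous_hash:
            return False
        if block["hash"] != block_hash(index, previous_hash, block["data"]):
            return False
        previous_hash = block["hash"]
    return True


chain = []
append_block(chain, {"operation": "CREATE"})
append_block(chain, {"operation": "EDIT"})
print(is_valid(chain))
```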
According to the proposed method, the modified document can be sent, by the first computer system, to a user upon the latter having been validly authenticated.
According to a first embodiment, the second computer system receives a digital document from the user, extracts the watermark from the received digital document and decodes the key from it, and recovers the second cryptographic function from the third computer system by providing to the latter the second message digest. Then, the second computer system extracts the metadata of the document from the key, computes a third cryptographic function of the digital document and compares the third cryptographic function with the second cryptographic function that it has recovered from the third computer system. Finally, the second computer system informs the user of the result of said comparison and also sends the metadata to the latter.
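The comparison step of this first embodiment can be sketched as below. The sketch assumes that the key has already been decoded from the watermark and that the second cryptographic function has already been recovered from the third computer system; the function name `verify_digital_document`, the JSON encoding of the key and the choice of SHA-256 are illustrative assumptions, not details given in the text.

```python
import hashlib
import hmac
import json


def verify_digital_document(received: bytes, key: bytes, stored_hash: bytes) -> dict:
    # Metadata is read back from the key (assumed here to be JSON-encoded).
    metadata = json.loads(key.decode("utf-8"))
    # Third cryptographic function: hash of the document as received.
    third_hash = hashlib.sha256(received).digest()
    # Constant-time comparison against the recovered second hash.
    intact = hmac.compare_digest(third_hash, stored_hash)
    return {"intact": intact, "metadata": metadata}


document = b"certified content"
key = json.dumps(
    {"issuer": "issuer-10", "timestamp": "2018-03-01T12:00:00Z"}
).encode("utf-8")
stored = hashlib.sha256(document).digest()

print(verify_digital_document(document, key, stored)["intact"])
print(verify_digital_document(document + b"!", key, stored)["intact"])
```

Any change to the received bytes changes the third hash, so the comparison fails and the user is told that integrity cannot be confirmed.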
The recovering of the second cryptographic function and the extraction of the metadata can be performed at the same time.
According to a second embodiment, the second computer system receives a digitalized document (e.g., a scan/picture of a digital document previously printed to paper or the conversion to a different digital format of a digital document) from the user, extracts the watermark from the received digitalized document and decodes the key from it. Then, the second computer system extracts from the key the metadata of the document, including the identifier of the first computer system and the timestamp, as well as the first message digest, and uses the first message digest to recover the first cryptographic function from the third computer system in order to check the document existence and registration. Finally, the second computer system sends a response to the user about the existence and registration of the document in the third computer system, together with the extracted metadata for a further authenticity check by the user.
The extraction of the metadata and the extraction of the first message digest can be performed at the same time.
According to a third embodiment, the second computer system authenticates identification information of the user and, once said authentication is confirmed, receives a digitalized document from the user. Then, the second computer system extracts the watermark from the received digitalized document and decodes the key from it, using the second message digest to recover the modified document from the third computer system. Next, the second computer system extracts from the key the metadata of the received digitalized document, including the identifier of the first computer system and the timestamp. Finally, the second computer system sends the extracted metadata to the user so that (s)he can verify the authenticity of the document, and also sends him/her the recovered modified document so that (s)he can check its integrity.
Other embodiments of the invention that are disclosed herein include a system and software programs to perform the method embodiment steps and operations summarized above and disclosed in detail below. More particularly, a computer program product is one embodiment that has a computer-readable medium including computer program instructions encoded thereon that, when executed on at least one processor in a computer system, cause the processor to perform the operations indicated herein as embodiments of the invention.
Present invention guarantees:
Furthermore, the provided guarantees are based on a distributed ledger infrastructure, being hence:
Finally, present invention is based on a trusted third party, guaranteeing hence:
The previous and other advantages and features will be more fully understood from the following detailed description of embodiments, with reference to the attached drawings, which must be considered in an illustrative and non-limiting manner, in which:
The present invention makes it possible to guarantee the integrity and authenticity of a document automatically and in real time, while also guaranteeing the proof of existence in time of the checked document, the non-repudiation from the document issuer, the confidentiality of the document, universal access to the solution in space and time, the neutrality of the solution with respect to the issuer and user, and the robustness of the solution to document damage. Furthermore, the present invention keeps an immutable and confidential register of all the operations executed on documents/files through their lifecycle, mapping them to the corresponding metadata. This register also maps the operations to the document, to a specific timestamp, to the user executing them and to any specific policies corresponding to the operation, thereby guaranteeing the security of all the documents monitored by the solution.
When a document is emitted by an authorized issuer (or first computer system as termed in the claims) 10, i.e., an entity (private or public) authorized to issue documents and store them using the proposed method, (
Once an original document D0 is received by the target system 20, the target system 20, in an embodiment, certifies it by applying a watermark to the document D0, providing a modified document Dw, which is further sent to the issuer 10.
In a more complex embodiment, the certification of the document D0 involves the calculation, by the target system 20, of a first cryptographic function, such as a hash function of the document, h0, which is stored in the DLT 30. Each time the first cryptographic function is stored in a DLT 30, a first digest is returned (record hash, hR0). The returned first digest is combined with the identifier of the issuer 10, the timestamp and possibly other metadata to create a key K, which is encoded into the watermark to be applied to the original document and is also used for future checks on the document authenticity. The document obtained after the application of the watermark, Dw (i.e., the modified document), is returned to the issuer 10 to be delivered to the final user 1. Furthermore, a second cryptographic function, such as a hash function of the modified document Dw, is computed and stored in the DLT 30, together with the modified document Dw itself, for future integrity checks.
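The certification sequence above (h0, record hash hR0, key K, watermarked document Dw, second hash) can be sketched as follows. Everything beyond that sequence is an assumption made for illustration: `LedgerStub` is an in-memory stand-in for the DLT 30, SHA-256 is an arbitrary choice of hash, the key is encoded as JSON, and `embed_watermark` simply appends a visible marker, whereas the watermark described in the text is imperceptible and replicated. Storing Dw itself in the ledger is omitted for brevity.

```python
import hashlib
import json


class LedgerStub:
    """In-memory stand-in for the DLT 30 (hypothetical)."""

    def __init__(self):
        self.records = {}

    def store(self, value: bytes) -> str:
        # The returned record hash plays the role of the digest (e.g. hR0).
        record_hash = hashlib.sha256(b"record:" + value).hexdigest()
        self.records[record_hash] = value
        return record_hash


def embed_watermark(document: bytes, key: bytes) -> bytes:
    # Placeholder only: a real watermark would be imperceptible and replicated.
    return document + b"\n%%WATERMARK:" + key.hex().encode("ascii")


def certify(d0: bytes, issuer_id: str, timestamp: str, ledger: LedgerStub):
    h0 = hashlib.sha256(d0).digest()   # first cryptographic function
    hr0 = ledger.store(h0)             # first digest (record hash)
    key = json.dumps(                  # key K: digest plus metadata
        {"record_hash": hr0, "issuer": issuer_id, "timestamp": timestamp},
        sort_keys=True,
    ).encode("utf-8")
    dw = embed_watermark(d0, key)      # modified document Dw
    hw = hashlib.sha256(dw).digest()   # second cryptographic function
    hrw = ledger.store(hw)             # second digest, kept for integrity checks
    return dw, key, hrw


ledger = LedgerStub()
dw, key, hrw = certify(b"original document", "issuer-10", "2018-03-01T12:00:00Z", ledger)
print(ledger.records[hrw] == hashlib.sha256(dw).digest())
```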
According to the proposed method, the watermark applied to the document consists of a special watermark representing a code (key K) and replicated at different points of the document itself, making it possible to check the authenticity of the document, or even of a portion of the document if it has been damaged. Furthermore, the watermark cannot be perceived on human inspection, which guarantees security against external observers as well as robustness to human errors.
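The text does not say how the replicated copies of the key are reconciled when some of them are damaged. One plausible approach, shown here purely as an assumption, is a byte-wise majority vote over the copies that could be extracted; the function name `recover_key` and the fixed key length are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def recover_key(copies: Sequence[Optional[bytes]], key_length: int) -> Optional[bytes]:
    # Discard copies that could not be extracted or have the wrong length.
    usable = [c for c in copies if c is not None and len(c) == key_length]
    if not usable:
        return None
    recovered = bytearray()
    for position in range(key_length):
        # Keep the most frequent byte value seen at this position.
        votes = Counter(copy[position] for copy in usable)
        recovered.append(votes.most_common(1)[0][0])
    return bytes(recovered)


# Three readable copies, two of them with one corrupted byte, one unreadable.
copies = [b"KEY123", b"KEX123", None, b"KEY1Z3"]
print(recover_key(copies, key_length=6))
```

With this scheme a single corrupted byte in one copy is outvoted by the other copies, which matches the stated goal of tolerating missing or dirty portions of a document.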
Any final user 1 may, according to a first embodiment, verify anytime the authenticity and integrity of a digital document in their possession—given that the original document has been registered using the described solution—by sending it to the described target system 20 (
In a similar way, any final user may, according to a second embodiment, verify anytime the authenticity of a paper or digitalized document (photo, scan, format conversion)—given that the original document has been registered using the described target system 20—by sending the digitalized document to the latter (
Any final user may also, according to a third embodiment, obtain the modified document and verify anytime the authenticity and integrity of a paper or digitalized document (photo, scan, format conversion) (
The described service is implemented in an organization independent of both the issuer 10 and the final user 1, guaranteeing neutrality with respect to them and constituting a trusted third party, accessible by any issuer (whether private or public) and by any user.
Furthermore, documents/files Dw may be monitored by the present invention, which allows any operation executed on them to be recorded. The documents Dw may be located on any medium, including local storage, cloud systems and shared storage.
The operations are executed on the documents Dw preferably through specific modules M1, M2 . . . Mn, each one implementing a specific type of operation, including document creation and editing, metadata analysis, insertion, editing and deleting, watermark insertion and extraction, biometric signing and signature extraction, document access for visualization, editing, etc. Each operation is preferably executed by a specific module M1, M2 . . . Mn, which, when called by the final user to execute a transaction, passes the data of the operation to a system backend 41 of a fourth computer system 40, which stores it in the DLT 30. The data stored for each operation includes a unique ID of the user executing it and possibly his/her affiliation, a unique ID of the document Dw on which the operation is executed, a timestamp of the operation, an ID of the specific operation, any settings and policies for the operation (e.g., policies for the metadata analysis of a document), and the outcome of the operation on the document, if any (e.g., success or failure).
The DLT 30 stores two different kinds of information: a) the different hashes of the transactions and their document identifiers in an array structure defined in the document manager, and b) the metadata associated with each operation performed on a document Dw, included in each transaction.
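The per-operation data listed above could be modelled as a simple record, as in the sketch below. The field names, the `OperationRecord` class and the JSON serialization are assumptions made for illustration; the text only specifies which pieces of information are stored, not their format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass(frozen=True)
class OperationRecord:
    user_id: str                        # unique ID of the user executing the operation
    document_id: str                    # unique ID of the document Dw
    timestamp: str                      # timestamp of the operation
    operation_id: str                   # ID of the specific operation
    affiliation: Optional[str] = None   # optional user affiliation
    settings: dict = field(default_factory=dict)  # optional settings and policies
    outcome: Optional[str] = None       # optional outcome, e.g. "success"

    def to_payload(self) -> str:
        # Deterministic serialization, suitable for hashing or storage.
        return json.dumps(asdict(self), sort_keys=True)


record = OperationRecord(
    user_id="user-42",
    document_id="doc-1",
    timestamp="2018-03-01T12:00:00Z",
    operation_id="WATERMARK_EXTRACT",
    settings={"policy": "strict"},
    outcome="success",
)
print(record.to_payload())
```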
The system backend 41 is a set of computer programs orchestrating the operations to be performed by each element of the solution. Particularly it includes the document manager and the script, as defined below.
The document manager is a computer program that defines and stores pointers to transactions referring to each operation performed on a document Dw. This information is stored in a manager's storage. Each document manager has its own storage and memory space and both are in the DLT 30. The document manager has two main functions:
According to the proposed method, each document Dw has a different document manager. The reason behind this design is to have a more robust system in which all of the information related to each document Dw is stored in the DLT 30 and can be retrieved through this single element. This is why it is very important to store the address of each document manager together with the identifier of the corresponding document Dw, so that both the information stored in the document manager and the information stored directly in the DLT 30 can be accessed in the future.
The script is a computer program in which different libraries are defined (in particular a library for working with the document managers), together with commands for performing different actions on the DLT 30 and the data. For example, the script is in charge of processing the input data (separating the identifier of the document Dw from the lifecycle data), getting data from and sending data to a database 44, sending the transactions to the DLT 30 (through the document manager) and establishing the connection with the nodes of the DLT 30 that the present invention works with.
The fourth computer system 40 also comprises an analysis and visualization module/unit 45, i.e. a set of tools for retrieving all the information on the lifecycle of a particular document Dw previously stored in the DLT 30 by the elements defined above. That means retrieving all the hashes (transactions) associated with a document Dw in its document manager and the details (metadata) associated with each document operation included in each transaction represented by its hash.
The database 44 is configured to store and associate the document manager interface, the document manager address and the document ID for each document Dw processed by the system. The document ID will be obtained from the input data of the first step, and the interface and the address will be obtained directly through the script using the library described previously.
According to an embodiment, and following the description of
If an authorized user wants to access information on a specific document Dw, the process is as follows. With the document ID, the system backend 41 retrieves from the database 44 the address and the interface of the document manager corresponding to that identifier. With the manager's address and interface, the system backend 41 retrieves the transaction hashes. With the transaction hashes, the user is able to get the information from the DLT 30.
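The three lookups in this retrieval process can be sketched with plain dictionaries standing in for the database 44, the document managers and the DLT 30. The function name `reconstruct_lifecycle`, the dictionary layouts and the sample values are hypothetical and only illustrate the order of the lookups.

```python
def reconstruct_lifecycle(document_id, database, managers, transactions):
    # Step 1: the database maps the document ID to its manager address.
    record = database[document_id]
    # Step 2: the manager holds the documentID-hash pairings for the document.
    pairings = managers[record["address"]]
    hashes = [tx_hash for doc, tx_hash in pairings if doc == document_id]
    # Step 3: each hash points at the operation metadata stored in the ledger.
    return [transactions[tx_hash] for tx_hash in hashes]


# Hypothetical sample data standing in for the database 44 and the DLT 30.
database = {"doc-1": {"address": "manager:doc-1", "interface": ["get_pairings"]}}
managers = {"manager:doc-1": [("doc-1", "hash-a"), ("doc-1", "hash-b")]}
transactions = {
    "hash-a": {"operation_id": "CREATE", "user_id": "user-1"},
    "hash-b": {"operation_id": "EDIT", "user_id": "user-2"},
}

for entry in reconstruct_lifecycle("doc-1", database, managers, transactions):
    print(entry["operation_id"], entry["user_id"])
```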
Specific authorized users may access the information through the analysis and visualization module 45, in order to reconstruct the document lifecycle or analyze specific operations on a given file, being able to guarantee (or analyze) its security, confidentiality and integrity.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. For example, the different computer systems provided by the present invention may be implemented in hardware or software or in a combination of hardware and software.
Additionally, the software programs included as part of the invention may be embodied in a computer program product that includes a computer usable medium. For example, such a computer usable medium can include a readable memory device, such as a hard drive device, a flash memory device, a CD-ROM, a DVD-ROM, or a computer diskette, having computer readable program code segments stored thereon. The computer readable medium can also include a communications link, either optical, wired, or wireless, having program code segments carried thereon as digital or analog signals.
The scope of the present invention is determined by the claims that follow.
Number | Date | Country | Kind |
---|---|---|---|
18382197.4 | Mar 2018 | EP | regional |