1. Field of the Invention
The present invention relates to an information processing apparatus, verification processing apparatus, and control methods thereof.
2. Description of the Related Art
In recent years, with the rapid development and spread of computers and computer networks, many kinds of information, such as text data, image data, and audio data, have been digitized. Digital data does not deteriorate with age and can be preserved indefinitely without loss of quality. In addition, digital data can be easily copied, edited, and modified.
Such copying, editing, and modification of digital data are very convenient for users, but they also make the protection of digital data a serious problem. In particular, when documents and image data are distributed via wide area networks such as the Internet, a third party may alter the data, since digital data can be readily changed.
In order for a recipient to detect whether incoming data has been altered, a processing technology called digital signature, which verifies additional data attached to the original data, has been proposed as a scheme to prevent alteration. Digital signature processing can prevent not only data alteration but also spoofing, repudiation, and the like on the Internet.
Digital signature, a Hash function, public key cryptosystem, and public key infrastructure (PKI) will be described in detail below.
[Digital Signature]
Let Ks (2106) be a private key, and Kp (2111) be a public key. A sender applies a Hash process 2102 to data M (2101) to calculate a digest value H(M) 2103 as fixed-length data. Next, the sender applies a signature process 2104 to the fixed-length data H(M) using the private key Ks (2106) to generate digital signature data S (2105). The sender sends this digital signature data S (2105) and data M (2101) to a recipient.
The recipient converts (decrypts) the received digital signature data S (2110) using the public key Kp (2111). The recipient also generates a fixed-length digest value H(M) 2109 by applying a Hash process 2108 to the received data M (2107). A verification process 2112 verifies whether the decrypted data matches the digest value H(M). If the two do not match, it is detected that the data has been altered.
Digital signatures use public key cryptosystems such as RSA and DSA (to be described in detail later). The security of these digital signatures rests on the fact that it is computationally difficult for anyone other than the holder of the private key to forge a signature or to derive the private key.
[Hash Function]
A Hash function will be described below. The Hash function is used together with digital signature processing to shorten the time required to generate a signature, by irreversibly compressing the data to be signed into a short value before signing. That is, the Hash function processes data M of arbitrary length and generates output data H(M) of constant length. The output H(M) is called the Hash data of plaintext data M.
In particular, a one-way Hash function is characterized in that, given data M, it is computationally difficult to find plaintext data M′ ≠ M that satisfies H(M′)=H(M). Standard one-way Hash function algorithms include MD2, MD5, SHA-1, and the like.
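The fixed-length property can be illustrated with Python's standard `hashlib` module. SHA-1 is one of the algorithms named above; SHA-256 is added here only for comparison and is an assumption, not part of the original text. The helper name `hash_data` is hypothetical.

```python
import hashlib

def hash_data(message: bytes, algorithm: str = "sha1") -> bytes:
    """Return the Hash data H(M) of plaintext data M of arbitrary length."""
    return hashlib.new(algorithm, message).digest()

short_m = b"M"
long_m = b"M" * 1_000_000

# The output length depends only on the algorithm, not on the length of M.
print(len(hash_data(short_m)), len(hash_data(long_m)))  # 20 20 (SHA-1: 160 bits)
print(len(hash_data(long_m, "sha256")))                 # 32 (SHA-256: 256 bits)
```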
[Public Key Cryptosystem]
A public key cryptosystem will be described below. A public key cryptosystem uses two different keys and is characterized in that data encrypted with one key can be decrypted only with the other key. One of the two keys, called the public key, is made open to the public. The other key, called the private key, is kept by its owner.
Known digital signatures based on the public key cryptosystem include the RSA signature, DSA signature, Schnorr signature, and the like. Here, the RSA signature described in R. L. Rivest, A. Shamir and L. Adleman: “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”, Communications of the ACM, v. 21, n. 2, pp. 120-126, February 1978, will be explained as an example. The DSA signature described in Federal Information Processing Standards (FIPS) 186-2, Digital Signature Standard (DSS), January 2000, will also be explained.
[RSA Signature]
Primes p and q are generated, and n=pq. λ(n) is set to the least common multiple of p−1 and q−1. An appropriate integer e relatively prime to λ(n) is selected, and the private key is d=1/e (mod λ(n)), i.e., the inverse of e modulo λ(n); e and n form the public key. Also, let H( ) be a Hash function.
[RSA Signature Generation] Signature generation sequence for document M
[RSA Signature Verification] Verification sequence of signature s for document M
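The RSA sequences can be sketched in Python in the notation above, following the standard hash-then-sign form: c := H(M), s := c^d mod n, and verification by checking s^e ≡ c (mod n). This is a sketch under stated assumptions: SHA-256 stands in for H( ), the two Mersenne primes are toy values chosen only so the example runs instantly, and the padding scheme used by real RSA signatures is omitted.

```python
import hashlib
from math import lcm  # Python 3.9+

# Toy key material (far too small for real use): two Mersenne primes.
p, q = 2**61 - 1, 2**31 - 1
n = p * q
lam = lcm(p - 1, q - 1)   # λ(n): least common multiple of p-1 and q-1
e = 65537                 # public exponent, relatively prime to λ(n)
d = pow(e, -1, lam)       # private key d = 1/e (mod λ(n)), Python 3.8+

def H(message: bytes) -> int:
    """Hash function H( ), reduced modulo n (SHA-256 is an assumption)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def rsa_sign(message: bytes) -> int:
    c = H(message)         # c := H(M)
    return pow(c, d, n)    # s := c^d mod n is the signature data

def rsa_verify(message: bytes, s: int) -> bool:
    # The signature is valid iff s^e mod n equals H(M).
    return pow(s, e, n) == H(message)

s = rsa_sign(b"document M")
print(rsa_verify(b"document M", s))   # True
print(rsa_verify(b"document M'", s))  # altered document: verification fails
```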
[DSA Signature]
[DSA Signature Generation] Signature generation sequence for document M
2) We have c:=H(M).
3) We have s := α^{−1}(c + xT) mod q, and (s, T) is set as the signature data.
[DSA Signature Verification] Verification sequence of signature (s, T) for document M
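The DSA parameter setup and the first generation step can be written out following the standard FIPS 186-2 sequences in this section's notation. Here T plays the role that FIPS 186-2 calls r and α is the per-signature random value; the names p, q, g, x, y for the domain parameters and keys are assumptions taken from that standard.

```latex
\begin{aligned}
&\text{Keys: } p \text{ prime},\; q \text{ prime with } q \mid p-1,\; g = h^{(p-1)/q} \bmod p \ne 1,\\
&\qquad x \in \{1,\dots,q-1\} \text{ (private key)},\; y = g^{x} \bmod p \text{ (public key)}\\[4pt]
&\text{Generation: } 1)\ \alpha \in_R \{1,\dots,q-1\},\; T := (g^{\alpha} \bmod p) \bmod q;\quad
 2)\ c := H(M);\quad 3)\ s := \alpha^{-1}(c + xT) \bmod q\\[4pt]
&\text{Verification: } 0<T<q,\; 0<s<q;\quad w := s^{-1} \bmod q,\; u_1 := cw \bmod q,\; u_2 := Tw \bmod q;\\
&\qquad \text{accept iff } (g^{u_1} y^{u_2} \bmod p) \bmod q = T
\end{aligned}
```

Verification succeeds for a genuine signature because w ≡ α(c + xT)^{−1} (mod q), so g^{u_1} y^{u_2} ≡ g^{w(c + xT)} ≡ g^{α} (mod p), given that g has order q.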
[Public Key Infrastructure]
In order to access resources on a server in client-server communication, user authentication is required. As one means of user authentication, public key certificates such as those defined by ITU-T Recommendation X.509 are widely used. A public key certificate is data that guarantees the binding between a public key and its user, and is digitally signed by a trusted third party called a Certification Authority (CA). User authentication using SSL (Secure Sockets Layer) in a browser is implemented by confirming that the user holds the private key corresponding to the public key included in the public key certificate presented by the user.
Since the public key certificate is signed by the CA, the public key of the user or server included in it can be trusted. For the same reason, if the private key used by the CA to generate signatures leaks or becomes vulnerable, all public key certificates issued by that CA become invalid. Since some CAs manage a huge number of public key certificates, various proposals have been made to reduce the management cost. The present invention described later has the effect of reducing the number of certificates to be issued and the number of accesses to a server serving as a public key repository.
In ITU-T Recommendation X.509 v3, described in ITU-T Recommendation X.509/ISO/IEC 9594-8, a public key certificate contains the following fields, among others.
A “subject” field 1506 stores the X.500 distinguished name of the holder of the private key corresponding to the public key included in this certificate. A “subjectPublicKeyInfo” field 1507 stores the public key being certified. An “issuerUniqueIdentifier” field 1508 and a “subjectUniqueIdentifier” field 1509 are optional fields added in v2, and store unique identifiers of the CA and the holder, respectively.
An “extensions” field 1510 is an optional field added in v3, and stores sets of three values: an extension type (extnId) 1511, a critical bit (critical) 1512, and an extension value (extnValue) 1513. The v3 “extensions” field can store not only the standard extension types specified by X.509 but also unique, new extension types. For this reason, how the v3 “extensions” field is interpreted depends on the application. The critical bit 1512 indicates whether the extension type is indispensable or may be ignored.
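How an application might act on the critical bit can be sketched in Python. The sketch models the v3 fields discussed above and applies the rule of the Internet X.509 profile (RFC 5280): a certificate carrying a critical extension that the application does not recognize is rejected, while unrecognized non-critical extensions are ignored. The class, attribute, and function names are hypothetical, no real DER data is parsed, and the OID `1.2.3.4.5` is a made-up placeholder (2.5.29.15 is the standard keyUsage OID).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Extension:
    extn_id: str        # extension type (extnId), an OID string
    critical: bool      # critical bit
    extn_value: bytes   # extension value (extnValue)

@dataclass
class CertificateV3:
    subject: str                                        # distinguished name of the key holder
    subject_public_key_info: bytes                      # certified public key
    issuer_unique_identifier: Optional[bytes] = None    # optional, added in v2
    subject_unique_identifier: Optional[bytes] = None   # optional, added in v2
    extensions: List[Extension] = field(default_factory=list)  # optional, added in v3

def extensions_acceptable(cert: CertificateV3, recognized: Set[str]) -> bool:
    """Reject on an unrecognized critical extension; ignore unrecognized non-critical ones."""
    return all(not ext.critical or ext.extn_id in recognized for ext in cert.extensions)

KEY_USAGE = "2.5.29.15"      # standard keyUsage extension OID
PRIVATE_EXT = "1.2.3.4.5"    # hypothetical application-specific OID

cert = CertificateV3("CN=Alice", b"<public key>", extensions=[
    Extension(KEY_USAGE, True, b""), Extension(PRIVATE_EXT, False, b"")])
print(extensions_acceptable(cert, {KEY_USAGE}))  # True
```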
The digital signature, Hash function, public key cryptosystem, and public key infrastructure have been described.
A scheme that divides text data to be signed into a plurality of pieces of text data and attaches a digital signature to each piece using the aforementioned digital signature processing technology has been proposed (see Japanese Patent Laid-Open No. 10-003257). With this scheme, when digitally signed text data is partially quoted, the quoted part can still be verified.
The proposed scheme handles only text data as data to be signed. However, with the diversification of digital data in recent years, compound contents including a plurality of types of contents may be digitally signed. If such compound contents are handled as a single block of binary data and digitally signed after, e.g., a compression process, then once a third party divides the contents into sub-contents and re-distributes them, the signature can no longer be verified for those sub-contents.
To avoid this problem, every sub-content to be signed, and not only text data, could be digitally signed individually, as in the proposed scheme. In this case, however, both signature generation and signature verification require a large computation cost for encryption or decryption, and the number of such operations increases with the number of sub-contents.
It is, therefore, an object of the present invention to allow signature verification not only for text data but also for compound contents of digital data stored in various formats, even when a sub-content forming part of such compound contents exists separately. It is also an object of the present invention to provide a signature processing technology that keeps the computation volume of the signature generation and signature verification processes constant, rather than proportional to the number of divided sub-contents.
According to the present invention which at least mitigates the aforementioned problems together or individually, there is provided an information processing apparatus comprising, a first generation unit adapted to generate data to be signed by dividing a digital document into regions, a second generation unit adapted to generate first digest values of the data to be signed and identifiers used to identify the data to be signed, a third generation unit adapted to generate signature information based on a plurality of the first digest values and the identifiers obtained from the digital document, and a fourth generation unit adapted to generate a first signed digital document based on the signature information and the data to be signed.
Also, there is provided a verification processing apparatus which verifies a digital document based on a signed digital document, the apparatus comprising, an extraction unit adapted to extract signature information from the signed digital document, a determination unit adapted to determine whether a first digest value and an identifier in the signature information have been altered or not, an obtaining unit adapted to obtain data to be signed from the signed digital document based on the identifier when the determination unit determines that the first digest value and the identifier have not been altered, a calculation unit adapted to calculate a second digest value of the data to be signed, a comparison unit adapted to compare the first digest value and the second digest value, and a verification result generation unit adapted to generate a verification result based on the comparison result.
Further, there is provided a method for controlling an information processing apparatus, comprising, a first generation step of generating data to be signed by dividing a digital document into regions, a second generation step of generating first digest values of the data to be signed and identifiers used to identify the data to be signed, a third generation step of generating signature information based on a plurality of the first digest values and the identifiers obtained from the digital document, and a fourth generation step of generating a first signed digital document based on the signature information and the data to be signed.
Further, there is provided a method for controlling a verification processing apparatus which verifies a digital document based on a first signed digital document, comprising, an extraction step of extracting the signature information from the first signed digital document, a determination step of determining whether the first digest value and the identifier in the signature information have been altered or not, an obtaining step of obtaining the data to be signed from the signed digital document based on the identifier when it is determined in the determination step that the first digest value and the identifier have not been altered, a calculation step of calculating a second digest value of the data to be signed, a comparison step of comparing the first digest value and the second digest value, and a verification result generation step of generating a verification result based on the comparison result.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Preferred embodiments of the present invention will now be described in detail in accordance with the accompanying drawings.
<First Embodiment>
A signature generation process and signature verification process according to this embodiment include a digital document generation process and a digital document operation process. More specifically, the digital document generation process divides image data generated by scanning a paper document into sub-contents, and generates compound contents (to be referred to as a digital document hereinafter) by digitally signing a group of sub-contents selected by the user. The digital document operation process extracts sub-contents from the digital document, verifies the signature information of the sub-contents that require verification, and then performs a contents consumption process such as browsing or printing, a contents reconstruction process, and the like.
[Digital Document Generation Process]
The process corresponding to this embodiment will be described below.
In the digital document generation process 401 corresponding to this embodiment, a paper document input process 404 inputs a paper document 403. Next, an intermediate digital document generation process 405 generates an intermediate digital document by analyzing the paper document 403. A signature information generation process 407 generates signature information based on the intermediate digital document and a private key 406. A signature information attachment process 408 associates the intermediate digital document with the signature information. A digital document archive process 409 generates a digital document 411 by integrating the intermediate digital document and signature information. The digital document 411 corresponds to the digital document 205 in
In the digital document operation process 402, a digital document reception process 412 receives the digital document 411. A digital document extraction process 413 extracts the intermediate digital document and signature information from the received digital document 411. A signature information verification process 415 performs verification based on the intermediate digital document, the signature information, and a public key 414. A document operation process 416 performs an operation such as modification, editing, printing, or the like of the extracted digital document.
Details of the functional blocks in
Referring to
In step S502, the digital data is divided into regions for respective attributes. The attributes in this case include text, photo, table, and picture.
The regional division process extracts sets of pixels in the digital data, such as groups of 8-connected black pixels forming contours and groups of 4-connected white pixels forming contours, and can extract regions with feature names such as text, picture or figure, table, frame, and line. Such a scheme is described in U.S. Pat. No. 5,680,478. Note that the regional division process is not limited to this specific method, and other methods may be applied.
In step S503, document information is generated for each region obtained in step S502. The document information includes an attribute, layout information such as position coordinates on a page, a character code string if the attribute of the region of interest is text, a document logical structure such as a paragraph or title, and so forth.
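The idea of grouping connected black pixels can be illustrated with a simplified Python sketch that labels 8-connected groups of black pixels in a binary image and returns their bounding rectangles. This is not the method of U.S. Pat. No. 5,680,478: white-pixel contour tracing and the classification into text, figure, table, and so on are omitted, and the image representation (a list of rows in which 1 denotes black) and the function name are assumptions.

```python
from collections import deque

def black_components(image: list[list[int]]) -> list[tuple[int, int, int, int]]:
    """Bounding boxes (top, left, bottom, right) of 8-connected black-pixel groups.
    The image is a list of rows in which 1 denotes a black pixel."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over the 8 neighbours of each black pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

page = [
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
print(black_components(page))  # [(0, 0, 1, 1), (1, 3, 2, 3)]
```

The diagonal pixels at (0, 0) and (1, 1) form one group because 8-connectivity counts diagonal neighbours; with 4-connectivity they would be two separate groups.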
In step S504, each region obtained in step S502 is converted into transfer information. The transfer information is required for rendering. More specifically, the transfer information includes a resolution-variable raster image, vector image, monochrome image, or color image, a file size of each transfer information, text as a character recognition result if the attribute of the divided region of interest is text, positions and font of individual characters, reliability of characters obtained by character recognition, and the like. Taking
In step S505, the regions divided in step S502, the document information generated in step S503, and the transfer information obtained in step S504 are associated with each other. Respective pieces of associated information are described in a tree structure. The transfer information and document information generated in the above steps will be referred to as components hereinafter.
In step S506, the components generated in the above steps are saved as an intermediate digital document. The saving format is not particularly limited as long as it can express the tree structure. In this embodiment, the intermediate digital document may be saved using XML as an example of a structured document.
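Steps S505 and S506 can be sketched with Python's standard `xml.etree.ElementTree` module, building one subtree per divided region that holds its document information and transfer information. The element and attribute names (`document`, `region`, `documentInfo`, `transferInfo`, and so on) and the input dictionary layout are hypothetical; the text does not specify the schema of the intermediate digital document.

```python
import xml.etree.ElementTree as ET

def build_intermediate_document(regions: list) -> ET.Element:
    """Associate each region with its document information and transfer
    information as one subtree of a single tree (hypothetical schema)."""
    root = ET.Element("document")
    for reg in regions:
        node = ET.SubElement(root, "region", id=reg["id"])
        info = ET.SubElement(node, "documentInfo", attribute=reg["attribute"])
        ET.SubElement(info, "layout", x=str(reg["x"]), y=str(reg["y"]),
                      width=str(reg["w"]), height=str(reg["h"]))
        if reg["attribute"] == "text":
            # Character code string obtained by character recognition.
            ET.SubElement(info, "characters").text = reg["text"]
        transfer = ET.SubElement(node, "transferInfo", format=reg["format"])
        transfer.text = reg["payload"]  # e.g. base64-encoded rendering data
    return root

regions = [
    {"id": "r1", "attribute": "text", "x": 40, "y": 30, "w": 500, "h": 60,
     "text": "Invoice", "format": "vector", "payload": "..."},
    {"id": "r2", "attribute": "photo", "x": 40, "y": 120, "w": 300, "h": 200,
     "format": "jpeg", "payload": "..."},
]
root = build_intermediate_document(regions)
print(ET.tostring(root, encoding="unicode"))
```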
The signature information generation process 407 in
In step S801, a digest value of data to be signed is generated for each data to be signed. Note that the data to be signed is the one which is included in the intermediate digital document, and can be considered as transfer information a (701), transfer information b (702), or document information (703) in
In step S802, an identifier of the data to be signed is generated for each data to be signed. The identifier need only uniquely identify the data to be signed. For example, in this embodiment, a URI specified by RFC2396 is applied as the identifier of the data to be signed. However, the present invention is not limited to this specification, and various other values may be applied as identifiers.
It is checked in step S803 if processes of steps S801 and S802 have been applied to all the data to be signed. If such processes have been applied to all the data to be signed (“YES” in step S803), the flow advances to step S804; otherwise, the flow returns to step S801.
In step S804, a signature value generation process is executed using the private key 406 for all the digest values generated for an identical digital document in step S801 and all the identifiers generated in step S802 to calculate a signature value. In order to generate the signature value, this embodiment applies the digital signature described in the paragraphs of “Description of the Related Art”. A detailed description of the practical arithmetic processing of the digital signature will be omitted. The data M (2101) in the signature generation process flow shown in
Subsequently, in step S805 signature information is configured using the aggregate data (all the digest values generated in step S801 and all the identifiers generated in step S802) and the signature value generated in step S804, thus ending the signature information generation process.
Note that the signature value generation process in step S804 may be executed for only some of the generated digest values and identifiers (i.e., a plurality of them) rather than for all of them. In this case, sub-contents of the original contents that are more likely to be re-used may be selected automatically or manually by the user, and a signature value may be calculated based on the digest values and identifiers associated with the selected sub-contents. Signature information is then configured in step S805 using the digest values and identifiers used to calculate the signature value, together with the calculated signature value. Even when the signature value is calculated from a plurality of (but not all) digest values and identifiers, the signature value generation process needs to be executed only once for the entire contents.
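Steps S801 to S805, including the optional selection of only some sub-contents, can be sketched in Python. This sketch rests on several assumptions: SHA-256 as the digest function, hypothetical URI references such as `doc.xml#transfer-a` supplied by the caller in place of step S802, a simple line-based encoding of the aggregate data, and toy-sized textbook RSA without padding as the signature algorithm. What it is meant to show is that only one signature operation is performed, however many data to be signed there are.

```python
import hashlib
from math import lcm

# Toy RSA key pair (illustration only; far too small for real use).
P, Q = 2**61 - 1, 2**31 - 1
N, E = P * Q, 65537
D = pow(E, -1, lcm(P - 1, Q - 1))

def h_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def encode_aggregate(aggregate) -> bytes:
    """Canonical bytes of the aggregate data: one 'identifier digest' line per entry."""
    return "\n".join(f"{ident} {dig}" for ident, dig in aggregate).encode("utf-8")

def generate_signature_information(data_to_be_signed: dict, selected=None) -> dict:
    """data_to_be_signed maps identifier -> bytes. If 'selected' is given, only
    those identifiers are covered (the variant of step S804)."""
    idents = sorted(data_to_be_signed if selected is None else selected)
    # S801-S803: one digest value per data to be signed, paired with its identifier.
    aggregate = [(i, hashlib.sha256(data_to_be_signed[i]).hexdigest()) for i in idents]
    # S804: a single signature value over all the digest values and identifiers.
    signature = pow(h_int(encode_aggregate(aggregate)), D, N)
    # S805: signature information = aggregate data + signature value.
    return {"aggregate": aggregate, "signature": signature}

parts = {
    "doc.xml#transfer-a": b"<raster data of region a>",
    "doc.xml#transfer-b": b"<vector data of region b>",
    "doc.xml#document-info": b"<layout and logical structure>",
}
info = generate_signature_information(parts)
print(len(info["aggregate"]))  # 3 digests, but only one signature value
```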
The structure of the digital document 411 corresponding to this embodiment will be described below with reference to
Subsequently, the signature data attachment process 408 will be described below with reference to
Each signature information is embedded with an identifier, which indicates transfer information or document information corresponding to the data to be signed, as described above. In
Note that the transfer information a (701) is considered as the data to be signed 1 (902), and the transfer information b (702) and the document information 703 are considered as the data to be signed 2 (903). Also, the signature information 1 (704) and signature information 2 (705) can be considered as the signature information 901.
The digital document archive process 409 will be described below with reference to
The digital document generation process in this embodiment has been explained. As described above, in the digital document generation process according to this embodiment, the original contents are separated into a plurality of sub-contents on the assumption that they will later be separated and re-distributed or re-used, and an identifier is given to each sub-content or to each group of sub-contents. As the identifier, the URI specified by RFC2396 may be applied, as explained in the description of step S802. However, the present invention is not limited to this; for example, relative position information of a sub-content in the original contents may be used. Alternatively, a value calculated with a one-way Hash function from metadata of the sub-content may be used as the identifier, such as number information uniquely assigned in the header field of the sub-content, or form information such as the contents holder and date included in the header field.
Furthermore, a digest value is calculated for each identifier by applying a one-way Hash function to the corresponding sub-content. A set of these identifiers and digest values (aggregate data) is attached to the compound contents. In this manner, even when some sub-contents are deleted from the original contents in the document operation process and contents reconstructed from the remaining sub-contents are distributed, the signature of the reconstructed contents can still be verified. Furthermore, even though a signature is not generated for each sub-content block (i.e., signatures are not generated in one-to-one correspondence with sub-contents), whether each sub-content block has been altered can be verified.
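The last alternative, deriving an identifier by hashing sub-content metadata, might look like the following Python sketch. The metadata field names and the function name are hypothetical, and the JSON encoding with sorted keys is an assumption made so that the same metadata always yields the same identifier.

```python
import hashlib
import json

def identifier_from_metadata(metadata: dict) -> str:
    """Identifier = one-way Hash (SHA-256) of the sub-content's header metadata."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

header = {"number": 3, "holder": "Example Corp.", "date": "2005-09-09"}
print(identifier_from_metadata(header))  # 64 hexadecimal characters
```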
The possibility of verification in the reconstructed contents will be described below in association with the digital document operation process.
[Digital Document Operation Process]
The digital document 411 received in the digital document reception process 412 in
In the signature information verification process 415, the input data: M (2107) in the signature verification process flow shown in
If it can be confirmed that the aggregate data has not been altered, it is verified if a digest value corresponding to an identifier included in the aggregate data matches that generated from data to be signed. The aforementioned process will be described below with reference to
Referring to
If verification has failed in step S1002 (“NG” in step S1002), the signature verification process ends, and “NG” is returned as a result. On the other hand, if verification has succeeded in step S1002 (“OK” in step S1002), processes in steps S1003 to S1008 are executed for respective identifiers 905 and 907 included in the aggregate data 909.
In step S1004, data to be signed 902 or 903 is extracted from the digital document 411 based on the identifier 905 or 907. It is checked in step S1005 if the data to be signed 902 or 903 can be obtained. If the data to be signed 902 or 903 can be obtained, the flow advances to step S1006. If the data to be signed 902 or 903 cannot be obtained, the flow jumps to step S1008. If the next identifier exists, the process in step S1004 is executed for the corresponding data to be signed. If data to be signed that cannot be obtained from the digital document 411 exists, a message indicating that a sub-content corresponding to the identifier of interest is not included as data to be verified may be displayed on the digital document operation apparatus 205. This display can be made by utilizing a display device of the computer 103 or printer 104 in the arrangement shown in
In step S1006, a digest value: H(M) of the data to be signed 902 or 903 is calculated based on the method shown in
Since the reconstructed digital document 411 is not the digital document 411 generated in the digital document generation process 401, its signature information may include a digest value of a content that is not archived in it.
Hence, verification of the reconstructed digital document 411 will be explained below with reference to
The digital document 411 to be distributed at this time is rewritten, as shown in
A case will be examined below wherein the processing is done based on the flowchart in
In this manner, a mechanism can be provided that skips the digest matching process for a sub-content that is included in the aggregate data 909 but is not archived in the digital document 411, while still guaranteeing whether each archived sub-content has been altered.
In the conventional signature generation process, a separate signature value must be provided for each of the data to be signed 1 and 2 (902 and 903). The calculation load therefore becomes heavier; in particular, the computation volume increases in proportion to the number of divided data to be signed.
By contrast, in this embodiment, the signature value needs to be calculated only once, irrespective of the number of divisions of the contents. Therefore, according to this embodiment, the signature generation and signature verification processes can be executed more efficiently than in the prior art. Even when data is reconstructed using only some sub-contents, whether those sub-contents have been altered can be reliably verified.
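The first embodiment's verification flow (steps S1001 to S1008), including the skip for sub-contents that are listed in the aggregate data but missing from a reconstructed document, can be sketched in Python. As in a generation sketch, SHA-256, toy-sized textbook RSA, the line-based aggregate encoding, and the representation of the document's sub-contents as a dictionary keyed by hypothetical identifiers are assumptions; the `sign` helper is included only so the example is self-contained.

```python
import hashlib
from math import lcm

# Toy RSA key pair (illustration only); sign() stands in for the generation side.
P, Q = 2**61 - 1, 2**31 - 1
N, E = P * Q, 65537
D = pow(E, -1, lcm(P - 1, Q - 1))

def h_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def encode(aggregate) -> bytes:
    return "\n".join(f"{i} {d}" for i, d in aggregate).encode("utf-8")

def sign(parts: dict) -> dict:
    aggregate = [(i, hashlib.sha256(parts[i]).hexdigest()) for i in sorted(parts)]
    return {"aggregate": aggregate, "signature": pow(h_int(encode(aggregate)), D, N)}

def verify(parts: dict, info: dict) -> str:
    # S1001-S1002: first confirm that the aggregate data itself is unaltered.
    if pow(info["signature"], E, N) != h_int(encode(info["aggregate"])):
        return "NG"
    for ident, expected in info["aggregate"]:             # S1003-S1008
        data = parts.get(ident)                           # S1004: look up by identifier
        if data is None:                                  # S1005: not archived -> skip
            continue
        if hashlib.sha256(data).hexdigest() != expected:  # S1006-S1007
            return "NG"
    return "OK"

original = {"doc#a": b"photo", "doc#b": b"table", "doc#c": b"text"}
info = sign(original)
reconstructed = {"doc#a": b"photo", "doc#c": b"text"}  # sub-content b removed
print(verify(original, info), verify(reconstructed, info))  # OK OK
```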
As described above, according to the present invention, signature verification is allowed not only for text data but also for compound contents of digital data stored in various formats even when a sub-content as a part of such compound contents exists separately. In addition, the signature generation and signature verification processes can be efficiently executed.
<Second Embodiment>
The verification process described in
When a sub-content whose digest value obtained as the calculation result in step S1006 does not match that included in the aggregate data 909 is found, it is determined that the signature verification process ends in step S1007 of
Hence, in this embodiment, even when the matching result in step S1007 is NG, the verification processes in steps S1003 to S1008 are continued for all remaining identifiers included in the aggregate data 909 instead of forcibly ending the process. As the verification result, a list of the sub-contents that have not been altered and those that have been altered is returned. In this manner, the user can be informed, via the computer 103, printer 104, or the like, of whether each sub-content has been altered. Even when some sub-contents have been altered, the user can therefore still accept the sub-contents that have not been altered, which provides a mechanism that allows unaltered sub-contents to be re-used.
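The second embodiment's behavior can be sketched by changing the verification loop so that it records a result for every identifier instead of stopping at the first mismatch. The same assumptions as for the first embodiment apply (SHA-256, toy textbook RSA, hypothetical identifiers); returning `None` for a failed signature check stands in for the "NG" result of step S1002, and the separate `not_included` list is an illustrative addition.

```python
import hashlib
from math import lcm

# Toy RSA key pair (illustration only).
P, Q = 2**61 - 1, 2**31 - 1
N, E = P * Q, 65537
D = pow(E, -1, lcm(P - 1, Q - 1))

def h_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def encode(aggregate) -> bytes:
    return "\n".join(f"{i} {d}" for i, d in aggregate).encode("utf-8")

def sign(parts: dict) -> dict:
    aggregate = [(i, hashlib.sha256(parts[i]).hexdigest()) for i in sorted(parts)]
    return {"aggregate": aggregate, "signature": pow(h_int(encode(aggregate)), D, N)}

def verify_all(parts: dict, info: dict):
    """Keep checking after a mismatch and report per sub-content.
    Returns None if the aggregate data fails signature verification."""
    if pow(info["signature"], E, N) != h_int(encode(info["aggregate"])):
        return None
    result = {"unaltered": [], "altered": [], "not_included": []}
    for ident, expected in info["aggregate"]:
        data = parts.get(ident)
        if data is None:
            result["not_included"].append(ident)
        elif hashlib.sha256(data).hexdigest() == expected:
            result["unaltered"].append(ident)
        else:
            result["altered"].append(ident)
    return result

info = sign({"doc#a": b"photo", "doc#b": b"table", "doc#c": b"text"})
received = {"doc#a": b"photo", "doc#b": b"TABLE (edited)"}
print(verify_all(received, info))
# {'unaltered': ['doc#a'], 'altered': ['doc#b'], 'not_included': ['doc#c']}
```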
<Third Embodiment>
This embodiment will explain a case wherein the user can select data to be signed. In the above embodiments, the signature process is executed in the signature information generation process 407, and details of that process have been described using
This embodiment is characterized in that a new process for selecting data to be signed is provided between the intermediate digital document generation process 405 and signature information generation process 407. This process will be referred to as a data to be signed selection process in this embodiment. The data to be signed selection process will be described below.
In the data to be signed selection process, image data scanned in the paper document input process 404 is displayed on the screen of the apparatus in the format shown in
When the display is made in the format shown in
By contrast, the user may often want to sign a region narrower than a region divided in the intermediate digital document generation process 405. For example, a region that is likely to be separated as a sub-content in the future may be narrower than the region divided in the intermediate digital document generation process 405.
Assuming such case, a divided region can be divided into finer regions, as shown in
When a narrower region is allowed to be designated, designation of a desired region (e.g., the region 1301 in
When regions divided in the intermediate digital document generation process 405 can be divided further and used as data to be signed in the signature information generation process 407, the user is given a higher degree of freedom in selecting data to be signed.
Note that the region 1301 may be used as one of the divided regions, and difference information between the regions 606 and 1301 may be used as a new divided region. In the former case, the data size of the digital document 411 increases but processing is easy. In the latter case, a new regional division process is required.
As described above, the user can select data to be signed, and can execute the signature information generation process. The user can designate not only rectangular regions divided in advance but also arbitrary regions as data to be signed.
<Embodiment Based on Other Cryptographic Algorithms>
In the above embodiments, the encryption process (secret conversion) based on a public key cryptosystem has been described. However, the present invention can be easily applied to an encryption process method based on a secret key cryptosystem and MAC (message authentication code) generation method, and the scope of the invention includes a case wherein the above embodiments are implemented by applying other cryptographic algorithms.
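For the secret key variant, the single public-key signature over the aggregate data could be replaced by one MAC computation. The Python sketch below uses HMAC-SHA256 from the standard `hmac` module; the choice of HMAC, the shared key, the function names, and the aggregate encoding are assumptions for illustration. As with the signature, only one MAC is computed regardless of the number of sub-contents.

```python
import hashlib
import hmac

def encode(aggregate) -> bytes:
    return "\n".join(f"{i} {d}" for i, d in aggregate).encode("utf-8")

def mac_generate(key: bytes, parts: dict) -> dict:
    # One digest per sub-content, then a single MAC over the aggregate data.
    aggregate = [(i, hashlib.sha256(parts[i]).hexdigest()) for i in sorted(parts)]
    tag = hmac.new(key, encode(aggregate), hashlib.sha256).hexdigest()
    return {"aggregate": aggregate, "mac": tag}

def mac_check(key: bytes, info: dict) -> bool:
    expected = hmac.new(key, encode(info["aggregate"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, info["mac"])  # constant-time comparison

key = b"shared secret key"  # hypothetical key shared by sender and verifier
info = mac_generate(key, {"doc#a": b"photo", "doc#b": b"table"})
print(mac_check(key, info))  # True
```

The per-sub-content digest matching then proceeds exactly as in the signature-based verification; only the check of the aggregate data changes.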
<Other Embodiments>
Note that the present invention can be applied to an apparatus comprising a single device or to a system constituted by a plurality of devices.
Furthermore, the invention can be implemented by supplying a software program, which implements the functions of the foregoing embodiments, directly or indirectly to a system or apparatus, reading the supplied program code with a computer of the system or apparatus, and then executing the program code. In this case, so long as the system or apparatus has the functions of the program, the mode of implementation need not rely upon a program.
Accordingly, since the functions of the present invention are implemented by computer, the program code installed in the computer also implements the present invention. In other words, the claims of the present invention also cover a computer program for the purpose of implementing the functions of the present invention.
In this case, so long as the system or apparatus has the functions of the program, the program may be executed in any form, such as an object code, a program executed by an interpreter, or script data supplied to an operating system.
Examples of storage media that can be used for supplying the program are a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a CD-RW, a magnetic tape, a non-volatile type memory card, a ROM, and a DVD (DVD-ROM, DVD-R or DVD-RW).
As for the method of supplying the program, a client computer can be connected to a website on the Internet using a browser of the client computer, and the computer program of the present invention or an automatically-installable compressed file of the program can be downloaded to a recording medium such as a hard disk. Further, the program of the present invention can be supplied by dividing the program code constituting the program into a plurality of files and downloading the files from different websites. In other words, a WWW (World Wide Web) server that downloads, to multiple users, the program files that implement the functions of the present invention by computer is also covered by the claims of the present invention.
It is also possible to encrypt and store the program of the present invention on a storage medium such as a CD-ROM, distribute the storage medium to users, allow users who meet certain requirements to download decryption key information from a website via the Internet, and allow these users to decrypt the encrypted program by using the key information, whereby the program is installed in the user computer.
Besides the cases where the aforementioned functions according to the embodiments are implemented by executing the read program by computer, an operating system or the like running on the computer may perform all or a part of the actual processing so that the functions of the foregoing embodiments can be implemented by this processing.
Furthermore, after the program read from the storage medium is written to a function expansion board inserted into the computer or to a memory provided in a function expansion unit connected to the computer, a CPU or the like mounted on the function expansion board or function expansion unit performs all or a part of the actual processing so that the functions of the foregoing embodiments can be implemented by this processing.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2005-263074, filed Sep. 9, 2005, and Japanese Patent Application No. 2006-232812, filed Aug. 29, 2006, which are hereby incorporated by reference herein in their entirety.
Number | Date | Country | Kind |
---|---|---|---|
2005-263074 | Sep 2005 | JP | national |
2006-232812 | Aug 2006 | JP | national |