In general, a bank customer can request a paper copy of a cancelled check. If digitized images of the cancelled checks were returned to the customer's bank, as described in the Background of the Invention, then the bank locates the digitized image of the requested check, and prints a copy onto paper for the customer, instead of retrieving the actual, physical, cancelled check.
The Inventors have observed that a question can arise as to whether the digitized image which the bank retrieves is an accurate copy of the digitized image initially created when the check underwent the clearing process. One resolution to this question can be achieved by adding a digital signature to the original digitized image. Some basic principles of digital signatures are explained below, to show how digital signatures can verify the authenticity of a copy of the original digitized image.
(1) image-data 30 of a check, indicated as bytes B, which begin with byte B(1) numbered 21;
(2) a header 35 which includes other information, such as the technical information discussed above, and represented by bytes X; and
(3) a pointer 38, containing bytes P, which may point to data which relates to another image of the check, within the same file.
All bytes B, X, and P can be treated as numbers, for purposes of the digital signature, even though the bytes may, in fact, represent other information, such as alphabetical characters.
To generate a digital signature, one first selects a subset of the numbers, or bytes, in the file. (One could use all numbers in the file, and the concept of a digital signature does not preclude usage of all the numbers. However, the Inventors point out that trade-offs are involved here. On the one hand, usage of all numbers in the file may require greater computation time. On the other hand, a computer program which develops a signature from all the numbers may be easier to generate. Further, even if usage of all numbers imposes certain difficulties, the difficulties may be justified by the fact that the file is extremely valuable, and the use of all numbers can provide greater protection.)
This selected subset is called the “digest” of the file. A formula determines how the digest is selected. As a simple example, the formula may specify that (1) the first byte, (2) every tenth byte thereafter, and (3) the final byte are used. This particular selection of bytes is indicated in
In some approaches, the subset is further processed, in order to produce a digest of a specific length, such as 128 bytes. One reason is that the algorithm, described below, requires input of that specific length.
The digest is then applied as input to a selected algorithm 40 in
To determine whether a copy of the original file is identical to the file itself, one repeats the process just described, but upon the copy, rather than the original file itself. That is, one extracts a digest from the copy, and applies the digest as input to the same algorithm.
If the same signature is obtained, then it is known, with an extremely high degree of probability, that the copy is an accurate rendition of the file. If the same signature is not obtained, it may safely be assumed that the copy is not accurate.
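The digest-and-compare procedure just described can be illustrated by the following minimal sketch. It is offered as an illustration only, under stated assumptions: the function names are hypothetical, and SHA-256 from the Python standard library is merely an assumed stand-in for the selected algorithm, which is not specified here. In practice, a keyed or public-key signature scheme may be used instead of a bare hash.

```python
import hashlib


def select_digest(data: bytes) -> bytes:
    """Select the first byte, every tenth byte thereafter, and the final byte."""
    if not data:
        return b""
    picked = data[0::10]               # bytes numbered 1, 11, 21, ...
    if (len(data) - 1) % 10 != 0:      # final byte was not already selected
        picked += data[-1:]
    return picked


def signature(data: bytes) -> str:
    """Apply the digest to the algorithm (SHA-256 is an assumption)."""
    return hashlib.sha256(select_digest(data)).hexdigest()


# Sample data standing in for the original file.
original = bytes(range(256)) * 4

# An accurate copy, and a copy in which one sampled byte (index 20) is altered.
copy_ok = bytes(original)
copy_bad = bytearray(original)
copy_bad[20] ^= 0xFF

same = signature(original) == signature(copy_ok)
different = signature(original) != signature(bytes(copy_bad))
```

In this sketch, `same` and `different` are both true: the accurate copy reproduces the signature, and the altered copy does not.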
The Inventors have discovered problems when this approach is applied to files containing multiple digitized images. The problems will be explained by reference to
The TIFF file also contains two headers. One header, the Image File Header, IFH, includes (1) a pointer, labeled OFFSET A, and (2) other technical data, not shown. The pointer OFFSET A points to another header, the IFD, Image File Directory, by specifying the offset of the latter header IFD from the beginning of the file, in number of bytes. The offset is indicated by distance 105. The header IFD contains the technical information (check dimensions, type of compression, etc.) discussed above.
The pointer OFFSET A is needed because, under the TIFF convention, the header IFD need not be located immediately subsequent to the previous header IFH.
Another pointer is present, POINTER A, and is located in the IFD header. This pointer serves two functions. One function is a result of the fact that the TIFF file may contain multiple image-data 100, as explained above. In such a case, each collection of image-data 100 is assigned its own IFD header. For example, in the check-system under discussion, a single TIFF file will contain four digitized images of a check. The TIFF convention requires one IFD header for each digitized image, for a total of four IFDs.
In such a case, shown in
However, in
These values of 0000 indicate the second function served by POINTER A. That second function is to indicate that no further headers IFD are present. Thus, POINTER A either (1) points to the next IFD or (2) indicates that no further IFDs are present.
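The two functions of POINTER A can be illustrated by a short sketch which walks the chain of IFDs. The sketch assumes the little-endian layout of the published TIFF 6.0 specification: an 8-byte IFH holding the offset of the first IFD (OFFSET A in the terminology used here), and, in each IFD, a 2-byte entry count, 12-byte entries, and a 4-byte offset of the next IFD (POINTER A). The helper names and the two-IFD sample are hypothetical, and the sample omits image-data for brevity.

```python
import struct


def make_ifd(width: int, next_ifd_offset: int) -> bytes:
    """Build an IFD with a single entry (tag 256, ImageWidth, type SHORT)."""
    entry = struct.pack("<HHIHH", 256, 3, 1, width, 0)   # 12 bytes
    return struct.pack("<H", 1) + entry + struct.pack("<I", next_ifd_offset)


def ifd_offsets(tiff: bytes) -> list:
    """Return the offsets of all IFDs, following POINTER A until it is zero."""
    byte_order, magic, offset = struct.unpack_from("<2sHI", tiff, 0)
    if byte_order != b"II" or magic != 42:
        raise ValueError("not a little-endian TIFF header")
    offsets = []
    while offset != 0:                  # zero means no further IFDs are present
        if offset in offsets:
            raise ValueError("IFD chain loops back on itself")
        offsets.append(offset)
        (count,) = struct.unpack_from("<H", tiff, offset)
        (offset,) = struct.unpack_from("<I", tiff, offset + 2 + 12 * count)
    return offsets


# IFH (8 bytes), then an IFD at offset 8 which points to a second IFD at 26.
header = struct.pack("<2sHI", b"II", 42, 8)
first_ifd = make_ifd(width=640, next_ifd_offset=8 + 18)
last_ifd = make_ifd(width=640, next_ifd_offset=0)
tiff = header + first_ifd + last_ifd

chain = ifd_offsets(tiff)
```

Here `chain` lists the offsets 8 and 26; the walk stops at the second IFD because its next-IFD field holds zero.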
Header IFD in
A digital signature can be taken of the file of
However, if the single file in
Each check is assigned image data: IMAGE DATA-1, IMAGE DATA-2, etc. Each check is also assigned an IFD, Image File Directory, for its block of image-data. The IFDs contain the technical information discussed above.
Pointers are present, labeled O1 (offset 1), O2 (offset 2), and so on. Offsets O3, O5, O7, and O9 correspond in function to POINTER A in
That is, in concept, the single header IFH in
Therefore, the value OM in original CHECK 2 has probably been changed to the value of O1 in the composite file which contains three other checks, as indicated by the dashed double-arrow pointing to those two offsets.
Similarly, offset ON in CHECK 2 will be different from corresponding offset O4.
Also, offset OP in CHECK 2 will be different from corresponding offset O5.
Therefore, assume that a formula is used to take a digest from CHECK 2, as stored within the composite file in
Thus, a problem arises in attempting to use digital signatures to validate a copy of a digitized check, when taken from a composite image file containing several checks.
One stratagem for mitigating or eliminating this problem is shown in
In the same left column, the terms OA, OB, and POINT, refer to OFFSET A, OFFSET B, and POINTER A in
In the central column of
That is, when one of the four digitized images of a check is initially created, one of the four triplets in the center of
The right column in
One exception lies in the POINTER A of
Thus, the single check which is placed in the last position within the composite file of
From another perspective, the central column of
In one form of the invention, sufficient data is associated with the data of
The invention specifically contemplates a file format which contains separable sub-files. For example, a TIFF file can be concatenated with another file, such as the recovery data 150 of
However, another processing program knows that data of interest to it lies beyond the I-EOF, and locates the data based on the I-EOF. For example, a digital signature recovery program would locate the table of
One form of the invention lies in the process encompassing the following steps.
1. Generating multiple digitized images for each bank check processed in a check-clearing process.
2. Packaging each digital image into an individual graphics file.
3. Deriving a digital signature for each graphics file.
4. Modifying parts of the graphics files, in order to package the graphics files into a single, composite file containing multiple digitized images.
5. Storing data indicating the modifications, so that the individual graphics files can be recovered from the composite file and produce the correct digital signatures.
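Steps 3 through 5 above can be sketched as follows. This is a simplified illustration, not the TIFF format itself: each individual graphics file is modeled as a hypothetical toy format consisting of one 4-byte pointer, measured from the beginning of the file, followed by the image payload. SHA-256 over the whole file is an assumed stand-in for the digital signature, and all function names are hypothetical.

```python
import hashlib
import struct


def make_file(payload: bytes) -> bytes:
    """Toy graphics file: a 4-byte pointer to the payload, then the payload."""
    return struct.pack("<I", 4) + payload


def sign(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()    # assumed stand-in algorithm


def pack(files):
    """Step 4: re-base each pointer. Step 5: record the original values."""
    composite = bytearray()
    recovery = []
    for f in files:
        base = len(composite)
        (ptr,) = struct.unpack_from("<I", f, 0)
        composite += struct.pack("<I", ptr + base) + f[4:]
        recovery.append({"start": base, "length": len(f),
                         "ptr_pos": 0, "original_value": ptr})
    return bytes(composite), recovery


def recover(composite: bytes, rec: dict) -> bytes:
    """Extract one sub-file and restore its original pointer value."""
    chunk = bytearray(composite[rec["start"]:rec["start"] + rec["length"]])
    struct.pack_into("<I", chunk, rec["ptr_pos"], rec["original_value"])
    return bytes(chunk)


files = [make_file(p) for p in
         (b"front, on arrival", b"back, on arrival",
          b"front, after processing", b"back, after processing")]
signatures = [sign(f) for f in files]          # step 3
composite, table = pack(files)                 # steps 4 and 5

second = table[1]
raw_slice = composite[second["start"]:second["start"] + second["length"]]
raw_matches = sign(raw_slice) == signatures[1]
recovered_match = [sign(recover(composite, r)) == s
                   for r, s in zip(table, signatures)]
```

In this sketch, `raw_matches` is false, because the pointer of the second sub-file was altered during packaging, while every entry of `recovered_match` is true once the stored modifications are reversed.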
In another form of the invention, additional data can be interleaved within the data blocks of
Block 200 represents a private header which is inserted between IFD-1 and IMAGE DATA-1. The private header 200 contains pointers, indicated by the dashed arrows 205, which point to the three other added blocks 210, 215, and 220. Three other blocks are shown, but a greater or lesser number may be used, depending on the needs of the designer.
Block 210 represents a document which is inserted. The document 210 may contain content which is conceptually associated with the TIFF image stored in IMAGE DATA-1. For example, the document 210 may take the form of a monthly checking account statement. The TIFF image in IMAGE DATA-1 may contain a cancelled bank check related to the same bank account. In this example, an overall goal is to consolidate all bank records relating to the specific account, or to a specific person, in a single file, which is represented in
As another example, the document 210 may be the original TIFF image of the cancelled check represented by the TIFF image contained in IMAGE DATA-1. As explained above, in general, the original TIFF image of block 210 will contain different pointers than will IMAGE DATA-1, because IMAGE DATA-1 is incorporated into a multi-image TIFF file, which required an alteration of the pointers. Consequently, the original TIFF image of block 210 will produce a different digital signature than will IMAGE DATA-1.
However, if the document 210 contains the original TIFF image, then that original TIFF image can be recovered, by simply reading block 210. The difference structure of
Therefore, as so far explained, this additional embodiment provides two features. One, an ordinary TIFF reader can be used to read the file of
The second feature is that the original, unaltered TIFF image can be available within block 210. The original TIFF image can be read directly by appropriate software, and its digital signature verified, if desired.
The only security issue lies in the trustworthiness of the party who (1) received the original TIFF image, (2) generated the multiple TIFF file of
The original TIFF image within block 210 can be located by pointers 205, which are contained in the private header 200. The private header 200 can contain a unique identifier, in the form of a unique character sequence, which allows the private header 200 to be located by software which scans the overall file, looking for the identifier. Once private header 200 is located, the pointers to blocks such as 210, 215, and 220 become available to retrieve those blocks.
In one embodiment, a private pointer O9 can be placed into, for example, header IFD-1, which points to the private header 200, and is used to locate the private header 200. This approach can eliminate the need for the unique identifier contained in the private header 200. It is possible to use both the private pointer O9 and the unique identifier.
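The two ways of locating the private header 200, namely scanning for the unique identifier or following a private pointer, can be sketched as follows. The layout of the private header is not specified here, so the layout below is purely an assumption: an 8-character identifier, a 2-byte count, and one (offset, length) pair per added block. The identifier value and the function names are hypothetical, and the sketch assumes the identifier does not occur by chance elsewhere in the file.

```python
import struct

MARKER = b"PRIVHDR1"    # hypothetical unique character sequence


def build_private_header(entries) -> bytes:
    """entries: list of (offset, length) pairs, offsets from start of file."""
    out = MARKER + struct.pack("<H", len(entries))
    for offset, length in entries:
        out += struct.pack("<II", offset, length)
    return out


def read_blocks(data: bytes, header_pos=None) -> list:
    """Locate the private header, then retrieve the blocks it points to."""
    if header_pos is None:                     # no private pointer supplied
        header_pos = data.find(MARKER)         # scan for the identifier
        if header_pos < 0:
            raise ValueError("private header not found")
    pos = header_pos + len(MARKER)
    (count,) = struct.unpack_from("<H", data, pos)
    pos += 2
    blocks = []
    for _ in range(count):
        offset, length = struct.unpack_from("<II", data, pos)
        pos += 8
        blocks.append(data[offset:offset + length])
    return blocks


# Sixteen zero bytes stand in for the material preceding the private header.
prefix = b"\x00" * 16
payloads = [b"original TIFF image",            # block 210
            b"signature of block 210",         # block 215
            b"signature of IMAGE DATA-1"]      # block 220
header_len = len(MARKER) + 2 + 8 * len(payloads)
cursor = len(prefix) + header_len
entries = []
for p in payloads:
    entries.append((cursor, len(p)))
    cursor += len(p)
data = prefix + build_private_header(entries) + b"".join(payloads)

found_by_scan = read_blocks(data)
found_by_pointer = read_blocks(data, header_pos=len(prefix))
```

Both calls return the same three blocks, which reflects the statement above that the private pointer and the unique identifier can be used separately or together.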
Blocks 215 and 220 contain additional data, and, as stated above, more or fewer blocks can be present. Block 215 contains a digital signature for the data within block 210, which, in the immediate example, is a digital signature for the original TIFF image of a bank check. Block 220 contains a digital signature for the data of IMAGE DATA-1.
This process is repeated, if desired, for the other IMAGE DATA blocks, to produce a file having the general structure shown in
It is pointed out that the pointers O2, O3, etc., in
It is, of course, possible in the situation of
Block 310 indicates that a private IFD and other data are interleaved into the TIFF file. The image to the right of block 310 indicates the overall file, after the interleaving. Pointer 313, which previously pointed to IMAGE DATA-1, is no longer correct at this time.
Block 320 indicates that pointers in the TIFF's IFD, as well as other pointers, are corrected. For example, the pointer O2 is corrected to accurately point to IMAGE DATA-1. As another example, it may be convenient to create pointers 205 at this time, because, until the length of IMAGE DATA-1 becomes known, the minimum required distance between block 200 and block 210 is not known, and that distance becomes available at this time.
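The pointer correction just described can be sketched under simplifying assumptions: pointers are 4-byte little-endian offsets measured from the beginning of the file, and the positions of all pointers are known to the program. The function name and the sample data are hypothetical. A pointer value of zero, which indicates that no further IFD is present, is left unchanged by the comparison used here, provided the insertion point is not the very beginning of the file.

```python
import struct


def insert_and_fix(data: bytes, pointer_positions, at: int, block: bytes):
    """Insert `block` at offset `at`, then correct every affected pointer."""
    buf = bytearray(data)
    for pos in pointer_positions:
        (target,) = struct.unpack_from("<I", buf, pos)
        if target >= at:                       # target is pushed back
            struct.pack_into("<I", buf, pos, target + len(block))
    buf[at:at] = block                         # interleave the new material
    # Pointers stored at or beyond the insertion point have moved as well.
    moved = [p + len(block) if p >= at else p for p in pointer_positions]
    return bytes(buf), moved


# A pointer at offset 0 points to image data which begins at offset 4.
image = b"IMAGE DATA-1"
data = struct.pack("<I", 4) + image

new_data, new_positions = insert_and_fix(data, [0], at=4,
                                         block=b"PRIVATE-HEADER")
(corrected,) = struct.unpack_from("<I", new_data, new_positions[0])
located = new_data[corrected:corrected + len(image)]
```

After the call, `corrected` holds 18, and `located` again equals the image data, so a reader which follows the pointer skips over the inserted block.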
Block 340 in
Block 360 indicates that another private header and associated data are interleaved within the file. Blocks 400, 405, 410, and 415 indicate this interleaved material.
Block 370 indicates that the pointers are adjusted.
This process continues until all desired additional data is inserted into the file, thereby producing, for example, the file shown in
This approach produces a file having the following important characteristics. One, it can be read and displayed by an ordinary TIFF reader, although the TIFF reader does not display the added material, indicated by the heavy blocks in
The added material, in general, can take the form of any type of digital data, including without limitation word processing documents; digitized images, including biometric images such as fingerprints and photographs; encrypted data; and data which is partly redundant to that in the TIFF file, such as the original TIFF document discussed above.
In the Additional Embodiment discussed above, several TIFF files were concatenated into a single file, together with additional material interleaved among the TIFF files. In this Second Additional Embodiment, non-TIFF files are accepted as input, and concatenated.
In
Such conversion is known in the art. As a simple example, many software applications, and some operating systems, contain routines which package documents into a format suitable for transmission to a facsimile machine. Conversion from the facsimile format into a TIFF standard is well known.
As another example, many optical scanners produce digitized images from paper documents. Software supplied with the scanners offers numerous formats in which to export the digitized images, and the TIFF format is commonly included.
After the conversion of block 405, the process beginning at block 300 in
Block 310 indicates that a private IFD and other data are added to the TIFF document. In one form of the invention, the other data, indicated by document 210, takes the form of the non-TIFF original document, which was received by block 400 in
That is, in the image adjacent block 310, the document 210 represents the original non-TIFF document, and IMAGE DATA-1 represents the TIFF document into which the non-TIFF document was converted. This arrangement is somewhat analogous to the situation discussed above, wherein document 210 represents an original TIFF document and IMAGE DATA-1 represents the same TIFF document, but with altered pointers.
As a specific example, the non-TIFF document 210 can take the form of a paper photograph which has been digitized by an optical scanner. The IMAGE DATA-1 can take the form of a TIFF file derived from the scanned photograph.
In
A digital signature 215 for the non-TIFF document 210 can be generated, as can a digital signature 220 for the newly created TIFF file, represented by IMAGE DATA-1.
The added material represented by block 210 in
Additional Considerations
1. The term “digest” is a term-of-art, and refers to the subset of data extracted from a file, which is used as input to an algorithm which produces a digital signature. The subset is not precluded from including all characters in the file.
2. The term “digital signature” is a term-of-art. Digital signatures are described in the text “Applied Cryptography,” by Bruce Schneier (John Wiley & Sons, New York, 1996, ISBN 0 471 12845 7). This text is hereby incorporated by reference.
This term-of-art will be emphasized by a counter-example.
“Digital signature,” as a generic term, could be used to describe a handwritten signature which has been digitized. That is, as a generic term, as opposed to a term-of-art, it could describe a bitmap of a handwritten signature.
But, as a term-of-art, it does not describe such a bitmap.
In one usage as a term-of-art, it describes a computed result, produced by an algorithm, to which a “digest” has been applied as input.
3. The term “file,” referring to “computer file,” is a term-of-art. One definition of such a “file” is a collection of data which is processed by a computer, or its operating system, as a unit.
For example, a computer contains a microprocessor. Assume that no operating system is installed in the computer. One can order the computer to print data on a printer, by issuing to the microprocessor, for each character of the file to be printed, the proper collection of “print” commands. The microprocessor then issues its own commands to the memory location, or port, to which the printer is connected.
However, if an operating system is installed, one can specify the data to be printed by means of a file name, as opposed to issuing individual instructions for each character in the file to be printed.
Similarly, the operating system allows the data to be stored, and retrieved, based on the file name.
Thus, one characteristic of a “file” is that it can be processed in certain ways, based on its name, rather than on the individual characters within it.
Consequently, a mere collection of data is not necessarily a “file.” It can become a “file” by giving it a name, and formatting it, both in a manner usable by an operating system.
As a specific example, while a collection of stock market reports in a newspaper may constitute “data,” the collection is not necessarily a “file,” or “data file.”
One reason is that the data is not usable by an operating system. Even if the data is encoded as ASCII bytes, it still has not become a “file.” The mere collection of bytes cannot be handled by an operating system, until properly formatted and named.
4. In the examples given herein, all pointers indicate positions of items, relative to the beginning of the file, as in
However, the principles of the invention can still be used if the pointers use different base points. For example, pointer A can indicate the distance from the beginning of a file to item A. Pointer B can indicate the distance from the end of item A to item B, and so on. Such pointers are sometimes called “relative” pointers.
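The relation between the two kinds of pointers can be sketched as follows, assuming that the length of each item is known. Under the description above, the first relative pointer is measured from the beginning of the file, and each later one from the end of the preceding item. The function names and the sample numbers are hypothetical.

```python
def to_absolute(relative, lengths):
    """Convert relative pointers to offsets from the beginning of the file."""
    absolute = []
    cursor = 0                       # end of the previous item (file start at first)
    for gap, length in zip(relative, lengths):
        start = cursor + gap
        absolute.append(start)
        cursor = start + length
    return absolute


def to_relative(absolute, lengths):
    """Convert offsets from the beginning of the file to relative pointers."""
    relative = []
    cursor = 0
    for start, length in zip(absolute, lengths):
        relative.append(start - cursor)
        cursor = start + length
    return relative


lengths = [18, 100, 18]              # lengths of items A, B, and C, in bytes
relative = [8, 0, 4]                 # gap before each item
absolute = to_absolute(relative, lengths)
round_trip = to_relative(absolute, lengths)
```

With these numbers, `absolute` is `[8, 26, 130]`, and `round_trip` reproduces the relative pointers, so either convention carries the same information.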
5. This Point 5 will offer definitions of some terms.
In the original TIFF files (or other type file), such as that of
Also, a specific location in the file can qualify as a parameter. For example, the Nth byte from the beginning can be a parameter.
The parameters are assigned values. That is, the “parameters” identify the bytes of interest in various ways, but the contents of those identified bytes are the “values” of the parameters.
To repeat: a group of bytes (a parameter) can be identified by a label. For example, the label may be “TAG_53” and the bytes identified are the two bytes immediately following the label, as in
TAG_53: byte(1), byte(2)
The numerical value of each group of bytes is the “value” of the parameter.
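The distinction between a parameter and its value can be sketched as follows. The sketch assumes that the label is stored in the file as literal characters, and that the two following bytes are combined most-significant byte first; both points are assumptions made only for illustration, and the function name and sample record are hypothetical.

```python
def read_parameter(data: bytes, label: bytes, width: int = 2) -> int:
    """Return the numerical value of the bytes immediately after `label`."""
    pos = data.find(label)
    if pos < 0:
        raise KeyError(label)
    start = pos + len(label)
    raw = data[start:start + width]
    if len(raw) != width:
        raise ValueError("file ends before the parameter's bytes")
    return int.from_bytes(raw, "big")      # byte order is an assumption


# The parameter is identified by TAG_53; its value is held in two bytes.
record = b"....TAG_53" + bytes([0x00, 0x20]) + b"...."
value = read_parameter(record, b"TAG_53")
```

Here the parameter is the pair of bytes identified by TAG_53, and `value` is the number 32 which those bytes contain.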
By analogy, in a bank check, the blank “date” field is a parameter, and the handwritten contents of the field represent the value of the parameter.
From another perspective, the parameter describes the meaning of the value. For example, the number 32 can be a value, which has little meaning in itself. However, if “32” is the value of a “date” parameter, then it can refer to February 1, the 32nd day of the year.
Under the invention, parameters with their associated values are stored in the TIFF files of the individual bank checks. For example, OFFSET 2, or O2, in
When the TIFF files are combined into the single composite file, the parameters are still present, but the values can change.
As a hypothetical example, in
However, in the composite file, at the top of
Therefore, in one form of the invention, an individual TIFF file contains one or more parameters, each having a value. The parameters are retained when the individual files are collected into the composite file, but the values of the parameters may change.
Since the values may change, if those changed values are included in a digest created based on the composite file, the digital signature will change.
6. TIFF files have a format which is compatible with a TIFF reader, which can read the TIFF files, and then display a graphical image of the image-data, as by printing the image, or displaying the image on a monitor.
It could be said that the format of the TIFF file is also compatible with an ordinary text editor, which can read the file and display the individual bytes, but which cannot display a graphical image of the image data. However, this latter meaning is not intended herein.
One definition of “compatible” can be derived by observing a common characteristic of all computer files, namely, that they all consist of bits, which are arranged as characters, such as bytes. However, the format of a TIFF file provides additional functionality beyond the mere presence of bytes, such as the ability to cooperate with a TIFF reader to produce a graphical image.
Similarly, an HTML document is formatted in a manner which allows an HTML reader to display the document in a way specified by the codes within the HTML document.
Similarly, a digitized music file is formatted in a manner which allows a music player to play a song. A similar comment applies to a movie file.
Thus, one definition of “compatible” is that a file is “compatible” with a program if (1) the two can cooperate to produce predetermined functionality, such as displaying an image or movie, or playing music, and (2) other files exist which cannot cooperate with the program to produce that functionality.
As a negative definition, the mere ability of a program to read data from a file does not make the file compatible with the program.
7. It is possible to characterize one form of the invention so that it superficially resembles a certain prior-art process. For example, it could be said that the invention begins with files which produce digital signatures. The files are combined into a single composite file, with modifications, so that the files no longer produce their digital signatures. The invention extracts the files from the composite file, and removes the modifications, so that the extracted files again produce the proper digital signatures.
It could be said that an ordinary compression process has these features. That is, the process of (1) combining files into a single file and (2) compressing the single file causes the individual files to fail to produce their digital signatures. Then, if the single file is de-compressed, and the individual files are recovered, they will now correctly produce their digital signatures.
However, one distinction between this process and one form of the invention is that the compressed file is not usable by a program with which the files are “compatible.” For example, a TIFF reader cannot read the compressed file.
Also, under the invention, when a TIFF file is placed into the composite file, some content of the TIFF file is modified. In general, that does not occur in the compression process. That is, the compression process is designed not to modify content. The compression process modifies the symbols representing content, but does not modify the content itself.
As a simple example, a compression algorithm may process data in units of 100 characters. If a given set of 100 characters begins with “W,” is followed by 98 zeroes, and then ends with another “W,” (i.e., W00000000 . . . 00000W) the compression algorithm may represent those 100 characters as
W-0(98)-W
which means 98 zeroes with “W” at both ends. The “content” (98 zeroes with “W” at both ends) has not been changed, but the symbols representing the content have been changed.
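The point that compression changes the symbols but not the content can be sketched with a simple run-length scheme. This is only an illustrative scheme chosen to match the example above, not any particular commercial compression algorithm, and the function names are hypothetical.

```python
from itertools import groupby


def compress(text: str) -> list:
    """Represent each run of identical characters as (character, count)."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def decompress(pairs: list) -> str:
    """Rebuild the original characters from the (character, count) pairs."""
    return "".join(char * count for char, count in pairs)


content = "W" + "0" * 98 + "W"       # 100 characters
symbols = compress(content)
restored = decompress(symbols)
```

In this sketch, `symbols` is `[("W", 1), ("0", 98), ("W", 1)]`, and `restored` equals `content` character for character.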
8. The discussion above has focused on TIFF files. However, the invention is applicable to computer files generally, which are collected into a single composite file.
9. Four sub-files can be extracted from the composite file of
If this were done, then the same digital signatures would be obtained from the sub-files after extraction as from the sub-files as present in the composite file.
However, these sub-files, after extraction, are not compatible with a TIFF reader, for reasons described herein.
10. It was stated above that four images were generated of a check: two images of the check as it appeared on arrival, and two images of the check after any alterations.
Another reason for generating multiple images lies in error correction techniques. One set of images can be generated in a black/white format, and another set generated in grayscale format. The two sets of images allow recovery of content which may have been lost in the digitizing process.
11. One specific embodiment contemplates insertion of individual TIFF files, containing images of bank checks as discussed herein, into a composite file. As a specific example, images of all bank checks drawn on a given account in one month, or other accounting period, are combined into the composite file.
Other documents, associated with the bank checks, are interleaved within the composite file. For example, the other documents may include the bank statement for the one-month period identified above.
In one form of the invention, the checks are readable by a TIFF viewer, but the other documents are not.
More generally, the invention contemplates combining TIFF documents into a multi-image TIFF file, for reading by a TIFF viewer, and the addition of other documents, which are not readable by the TIFF viewer, but which are located and manipulated using the private headers, such as header 200 in
In the example given above, it is preferable that only a single copy of the bank statement be inserted into the composite file. That is, a copy of the bank statement is not concatenated with each bank check, but only a single copy is interleaved within the composite file.
12. In one form of the invention, a difference structure of the type shown in
13.
14.
For most, if not all, non-TIFF files, these characteristics and others (e.g., size of pixels, size of image in pixels, type of encryption) will possess values within known ranges. Therefore, in many, if not all, expected cases, the conversion into TIFF format will simply involve identifying the relevant variables, and specifying them in the TIFF headers (e.g., IFH and IFD). The original bitmap can be used without alteration.
15. In one form of the invention, additional data is interleaved within a multi-page (or multi-view) TIFF file. The data contains information which is not related to functional aspects of the file. For example, computer files can contain data which, in essence, serves a formatting function, such as an end-of-file (EOF) marker. The EOF marker serves a formatting function, by indicating the end of the file.
As another example of functional aspects, the individual contents of the TIFF file need not be adjacent. For instance, in
Therefore, under the invention, the data which is interleaved within the file contains information usable by third parties. The pointers to that data are used to find the data.
16. In one form of the invention, data which is interleaved within a multi-page TIFF file includes content which is identical to some content of the TIFF file. For example, one page of the TIFF file may include a bitmap of one side of a bank check as it arrives for processing. However, as explained herein, the pointers in that page may be different from the pointers within the original TIFF file, generated from the bank check.
The original bitmap of the bank check may be interleaved within the TIFF file. That bitmap may be a TIFF file, and contain the original pointers. However, this original bitmap will not be displayed by a TIFF reader, at least for the reason that the IFDs in
Numerous substitutions and modifications can be undertaken without departing from the true spirit and scope of the invention. What is desired to be secured by Letters Patent is the invention as defined in the following claims.