A better understanding of the invention will be obtained by considering the detailed description below, with reference to the following drawings in which:
Referring to
General purpose computer 110 communicates with travel document reader 120 and external storage device 130. As will be appreciated by those in the art, data stored in external storage device 130 may alternatively be stored on the hard drive integral to general purpose computer 110. Travel document reader 120 is used to input features associated with a security document 140 (such as a passport, visa, identity card, etc.) into DCS 100 for analysis, to assist the operator with a determination as to whether security document 140 is authentic. In operation, the operator places security document 140 onto an image capture surface associated with travel document reader 120 and a portion or all of security document 140 is then exposed to various light sources. Travel document reader 120 is designed to recognize documents that are compliant with the relevant standards and specifications governing such documents. These specifications and standards may be set by the authorities which issue these documents as well as international organizations such as the ICAO (International Civil Aviation Organization). As part of the image capture process, the security document 140 may be exposed to various forms of light such as ultraviolet (UVA and UVB), infrared (IR), red/green/blue (RGB) and white light to determine if certain expected features are present. More specifically, light emitting diodes (LEDs) expose security document 140 to UV, IR and RGB light, while a fluorescent light source exposes security document 140 to white light. In all cases, the light reflected from the surface of security document 140 is captured by a charge coupled device (CCD) or complementary metal-oxide semiconductor (CMOS) sensor, either of which converts the light into electronic signals that can be digitally processed.
In the configuration shown in
General purpose computer 110 has stored thereon document comparison software, which processes the captured information and compares it to information contained in local security feature/image database 130, to determine if security document 140 is authentic. Alternatively, document comparison software could be stored on central server 150 and accessed by each of the plurality of general purpose computers 110 attached thereto. As will be appreciated by those in the art, travel document reader 120 typically includes firmware for accomplishing various reader specific tasks such as acknowledging receipt of security document 140 onto the scanning surface and capturing the various images discussed above. This firmware operates seamlessly with the document comparison software in the analysis of security document 140. More specifically, the firmware associated with travel document reader 120 sends and receives requests for information related to a specified document template, as will be discussed in more detail below.
Document comparison software is comprised of several modules as depicted in
Another module contained in the document comparison software is a template builder graphical user interface (GUI) 310 for assisting the user of DCS 100 with the management of knowledge base 300 and its associated templates. Template builder GUI 310 allows the creation, deletion and renewal of the data that represents a document template. This basic functionality of template builder GUI 310 can either be done in a step-by-step manner for specific entities within a document template or the user can have the tool create a generic layout of a document template with default values. Template builder GUI 310 also provides an interactive visual representation of the hierarchical data in knowledge base 300. This allows the user to easily scan various document templates contained within knowledge base 300 and quickly apply those changes that are required.
Referring to
Referring again to
When security document 140 is inserted into travel document reader 120, the reader automatically sends signature image(s) and/or signature feature(s) to the document inspection engine 320. Signature image(s) and/or signature feature(s) are used to determine a document type (e.g. passport) upon which further validation processing can be initiated. More specifically, using the retrieved signature image(s) and/or signature feature(s), document inspection engine 320 determines one or more matching templates. Each template defines the additional data to be retrieved using travel document reader 120 to validate security document 140.
Important enablers for matching templates are signature features. Generally speaking, document inspection engine 320 can locate, process and score features, but signature features also implement a “find matching templates” process. The “find matching templates” process calculates a unique signature for the security document 140 under analysis. This process preferably utilizes a scoring mechanism which ranks the matching templates. From the list of ranked matching templates, the highest scored template is chosen, and this template will be used in the validation of the security document 140 under analysis. Optionally, an operator can select the preferred template from the list.
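The ranking and selection of matching templates described above can be sketched as follows. This is only an illustration: the `TemplateMatch` record, its field names and the `operator_choice` parameter are hypothetical and are not part of the described system, and the sketch assumes that a higher score means a better match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemplateMatch:
    template_id: str  # hypothetical template identifier
    score: float      # hypothetical match score, higher is assumed better


def rank_templates(matches: List[TemplateMatch]) -> List[TemplateMatch]:
    """Return the candidate templates ordered from best to worst score."""
    return sorted(matches, key=lambda m: m.score, reverse=True)


def choose_template(matches: List[TemplateMatch],
                    operator_choice: Optional[str] = None) -> Optional[TemplateMatch]:
    """Pick the highest scored template unless the operator selected one."""
    ranked = rank_templates(matches)
    if not ranked:
        return None
    if operator_choice is not None:
        for match in ranked:
            if match.template_id == operator_choice:
                return match
    return ranked[0]
```

Calling `choose_template` without an operator choice returns the top ranked template; passing a template identifier models the optional operator selection.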
Once the security document 140 under analysis is identified, additional features associated with security document 140 are located, processed and scored by document inspection engine 320 to determine if security document 140 is authentic. Feature locating, processing and scoring are most commonly methods exported from image and data processing libraries or DLLs. For example, a machine readable zone (MRZ) feature uses an image utility for page segmentation, and a multi-font OCR engine is called to recognize the letters. MRZ scoring is based on advanced comparators and libraries that have been developed according to ICAO standards. Another example is a pattern recognition feature that locates sub-images and uses a normalized cross-correlation algorithm which generates a number used for scoring.
When all data is received for security document 140, document inspection engine 320 starts the scoring process. The hierarchical structure of knowledge base 300 is key to this process. Scoring security document 140 is a user-weighted summary of scoring all pages, all comparison groups and all properties attached to security document 140. Scoring pages is a user-weighted summary of scoring all data, images and scoring all properties attached to the page. Scoring data and images is a user-weighted summary of scoring all features and properties attached to the page. To score a feature it must first be located then processed before scoring is performed. Scoring a feature involves a user-weighted summary of all properties, property locations and feature location scores. As will be discussed in relation to
Referring to
Additionally, inspector GUI 340 includes a display bar 770. As shown in
Finally, inspector GUI includes a search bar 780. As shown in
As depicted in
Referring to
An image capture module 800 communicates with the scanner 120 to result in a digital image or a digital representation of a page of the security document 140. Once the digital image of the page (e.g. a digital image 805 of a passport page shown in
When the digital image of the specific area or feature has been localized, a mathematical transform is applied to the digital image of the localized feature or area using the mathematical transform module 820. The mathematical transform module 820 may also, depending on the property or identifying feature of the security document that is being examined, apply other types of image processing to the digital image.
After the digital image of the localized feature or area has been processed by the mathematical transform module 820, the resulting image from the processing is received by an analysis module 830. The analysis module 830 analyzes the resulting image from the mathematical transform module 820 and produces a result that can easily be compared with stored data derived from a reference security document. The result of the analysis module 830 can then be used by the comparison module 860 to determine how close or how far the feature being examined is from a similar feature on a reference security document. The data for the reference security document is retrieved by a data retrieval module 850 from the database. Once the relevant data for the relevant feature of the reference security document has been retrieved, this data is compared with the data from the analysis module 830 by the comparison module 860. The result of the comparison is then received by the score generation module 870 which determines a score based on the similarities or closeness between the sets of data compared by the comparison module 860. The score generated may be adjusted based on user selected preferences or user or system mandated weights on the data.
It should be noted that the term “reference document” is used to refer to documents against which subject documents are compared. As mentioned above, features are associated with documents such that reference documents will be associated with reference features. Features associated with subject documents are compared with reference features associated with reference documents. These reference documents may be authentic or authenticated documents, meaning documents which are known to be legitimate or have been authenticated as being legitimate and not forgeries. Similarly, reference documents may be inauthentic documents or documents known or proven to be fake, forgeries, or otherwise illegitimate. If the reference document being used is an authentic document, the features associated with a subject document are compared to the features associated with an authentic document to positively determine the presence of features expected to be on an authentic document. As an example, if a feature on the reference document (an authentic document in this example) corresponds very closely (if not exactly) to a similar feature on the subject document, then this is an indication of the possible authenticity of the subject document. On the other hand, if the reference document used is an inauthentic document or a known forgery, then a close correlation between features associated with the subject document and features associated with the reference document would indicate that the subject document is a possible forgery. The use of an inauthentic document can thereby positively determine the possibility, if not the probability, of a forgery. Similarly, using an inauthentic document as a reference document can negatively determine the possibility of the authenticity of a subject document. This is because if the features of the subject document do not closely correlate with the features of an inauthentic document, then this may indicate the authenticity of the subject document.
It should also be noted that the image capture module 800 may be derived from or be found in commercially available software libraries or dynamic link libraries (DLLs). Software and methods for communicating with and receiving digital images from different types of scanning apparatus are well known in the art of digital scanning and software.
As noted above, to localize a feature or area of the digital image from the image capture module, the feature/area localization module 810 is used. One method which may be implemented by this module 810 is based on having a reference digital image of an area or feature being searched for in the digital image from the image capture module 800. The method, in essence, reduces to searching the digital image for an area or feature that matches the smaller reference digital image. This is done by using normalized cross-correlation.
After normalized cross-correlation is applied to a reference digital image and a subject digital image, the resulting image indicates the regions in the subject digital image which most closely match the reference digital image. The formula for a correlation factor (or the quality of the match between the reference digital image or template and the subject digital image at coordinates c(u,v)) is given as:
Thus, the correlation factor equals 1 if there is, at point (u,v), an exact match between the reference digital image and the subject digital image. Another way of calculating the correlation factor is to calculate how different the reference digital image and the subject digital image are at point (u,v). This difference or the “distance” between the two images can be found by using the formula:
Since the first two terms in the summation are constants, then the “distance” decreases as the value for the last term increases. The correlation factor is therefore given by the formula c(u,v)=1−e(u,v). When e(u,v)=0, then there is a perfect match at coordinates (u,v). Once the results are plotted, regions where the correlation is highest (closest to 1) appear in the plot.
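The search for the best matching region can be sketched with NumPy as below. This is a minimal sketch under stated assumptions, not the implementation used by module 810: it uses the commonly published zero-mean normalized cross-correlation, it evaluates only windows that lie fully inside the subject image (so no mirror padding is needed), and it indexes each window by its top-left corner (u, v) rather than its centre. The function names are hypothetical.

```python
import numpy as np


def normalized_cross_correlation(subject, reference):
    """Correlation factor for every placement of reference inside subject.

    Entry [v, u] of the result is the correlation between the reference and
    the subject window whose top-left corner is at column u, row v.
    """
    subject = np.asarray(subject, dtype=float)
    reference = np.asarray(reference, dtype=float)
    rh, rw = reference.shape
    sh, sw = subject.shape

    # Remove the mean of the reference once; it does not depend on (u, v).
    ref_zero = reference - reference.mean()
    ref_norm = np.sqrt((ref_zero ** 2).sum())

    result = np.zeros((sh - rh + 1, sw - rw + 1))
    for v in range(result.shape[0]):
        for u in range(result.shape[1]):
            window = subject[v:v + rh, u:u + rw]
            win_zero = window - window.mean()
            denom = ref_norm * np.sqrt((win_zero ** 2).sum())
            if denom > 0:  # flat windows carry no pattern; leave them at 0
                result[v, u] = (win_zero * ref_zero).sum() / denom
    return result


def best_match(subject, reference):
    """Return ((u, v), correlation) for the best matching window."""
    c = normalized_cross_correlation(subject, reference)
    v, u = np.unravel_index(np.argmax(c), c.shape)
    return (int(u), int(v)), float(c[v, u])
```

When the reference is an exact crop of the subject, the correlation at the crop position is 1 up to floating-point rounding, which corresponds to the exact-match case described above.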
To apply the cross correlation to the subject digital image, the average value over a window as large as the reference digital image is subtracted from each pixel value of the subject digital image with the window being centered on the pixel being evaluated. This is very similar to applying an averaging filter to the subject digital image. However, to overcome the issue of average values at the edges of the subject digital image, the subject digital image is normalized by padding the edges with mirror values. To best illustrate the above process,
Cross correlation can also be used to validate not only the presence/absence of a pattern but also to take into account the edge integrity of the pattern in question. Referring to
While the above process localizes the desired matching regions or features, the computational complexity may be daunting as the subject image increases in size. To address this issue, the reference image and the subject image may both be compressed or reduced in size by the same factor. The normalized cross correlation process set out above can then be applied to these compressed images. Since the area of the reference image has shrunk and since the corresponding area of the subject image has also shrunk, the mathematical complexity of the calculations similarly shrinks. This is because the resolution and the number of pixels being used correspondingly decrease.
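The size reduction described above can be sketched as follows. The text does not name a reduction method, so the block averaging used here is an assumption, as are the function names. The second helper counts the multiply-accumulate operations of a brute-force search to show how the work shrinks when both images are reduced by the same factor.

```python
import numpy as np


def downsample(image, factor):
    """Shrink a 2-D image by averaging non-overlapping factor x factor blocks.

    Rows and columns that do not fill a complete block are trimmed.
    """
    image = np.asarray(image, dtype=float)
    h = (image.shape[0] // factor) * factor
    w = (image.shape[1] // factor) * factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def brute_force_cost(subject_shape, reference_shape):
    """Multiply-accumulate count for a brute-force correlation search."""
    sh, sw = subject_shape
    rh, rw = reference_shape
    positions = (sh - rh + 1) * (sw - rw + 1)
    return positions * rh * rw
```

For a 100 x 100 subject and a 10 x 10 reference, halving both dimensions of both images cuts the brute-force operation count from 828100 to 52900, at the price of a coarser localization that may need refining at full resolution.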
It should be noted that the correct reference image to be used in the above process may be determined by the type of security document being examined. Such reference images may therefore be stored in the database and retrieved by the data retrieval module 850 as required. Examples of features/areas which may have reference images stored in the database are microprinting samples, identifier symbols such as the maple leaf in the image in
Once the feature/area to be examined has been localized, a mathematical transform or some other type of numerical processing may be applied to the localized feature by the mathematical transform module 820. The transform or processing may take many forms such as applying a Fast Fourier Transform (FFT) to the image, determining/finding and tracking edges in the image, and other processes. Other types of processing such as shape recognition through contour matching, the use of a neural classifier, and wavelet decomposition may also be used.
In one embodiment, a Fast Fourier Transform (FFT) is applied to the localized image to result in an illustration of the power spectrum of the image. The power spectrum reveals the presence of specific frequencies and this frequency signature can be used to determine how similar one feature is to a similar feature in an authenticated security document. To illustrate this process,
Referring to
To continue with the example,
Referring to
It should be noted that the power spectrum of the reference image need not be stored in the database. Rather, the analyzed data from the reference power spectrum of the reference image is stored for comparison with the data gathered from the analysis of the power spectrum of the subject image. To analyze the results of the transform module 820, these results (in this case the power spectrum of the subject image) are received by the analysis module 830.
The analysis module 830 analyzes the results of the transform module 820 and produces a result that is mathematically comparable with the stored reference data. In the power spectrum example, the analysis module 830 determines which frequencies are present, which peaks are present in the power spectrum, and how many peaks there are in the spectrum. For this analysis, the subject power spectrum is filtered to remove frequencies outside a predetermined frequency range. Thus, frequencies outside the stored range of fmin and fmax are discarded. Then, a threshold is applied to the remaining frequencies—if a frequency value is below the stored threshold, then that frequency cannot be a peak. Once these conditions are applied, then the other peak conditions (the conditions which determine if a point on the power spectrum is a peak or not) are applied to the remaining points on the subject power spectrum. These peak conditions may be as follows with (x,y) being the coordinates for a point on the subject power spectrum:
Value(x, y)>Value(x−1, y)
Value(x, y)>Value(x+1, y)
Value(x, y)>Value(x, y−1)
Value(x, y)>Value(x, y+1)
Value(x, y)>Value(x−1, y−1)
Value(x, y)>Value(x+1, y+1)
Value(x, y)>Value(x−1, y+1)
Value(x, y)>Value(x+1, y−1)
Value(x, y)>Threshold
A minimum distance between peaks is also desired so that they may be differentiated. As such, an extra condition is applied to each potential peak:
With (x,y) being a point on the spectrum, (x1,y1) being another point on the spectrum, and THRESHOLD_RADIUS being the minimum desired distance between peaks, the above condition ensures that if two potential peaks are too close to one another, then the second potential peak cannot be considered a peak.
Once the above analysis is performed on the subject power spectrum, the number of peaks found is returned as the result of the analysis module 830. The reference power spectrum should also have undergone the same analysis, and the number of peaks for the reference power spectrum may be stored in the database as the reference data.
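The peak analysis described above can be sketched with NumPy as below. Several details are assumptions made only for this illustration: the frequency of a point is taken to be its distance in bins from the centre of the shifted spectrum, points on the border of the spectrum are skipped because they lack a full set of eight neighbours, and when two candidates are closer than THRESHOLD_RADIUS the stronger one is kept. The function and parameter names are hypothetical.

```python
import numpy as np


def power_spectrum(image):
    """Power spectrum of a 2-D image with the zero frequency at the centre."""
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fftshift(np.fft.fft2(image - image.mean()))
    return np.abs(spectrum) ** 2


def count_peaks(spectrum, threshold, f_min, f_max, threshold_radius):
    """Count spectrum peaks that satisfy the conditions listed above."""
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    candidates = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            value = spectrum[y, x]
            # Discard frequencies outside the stored range [f_min, f_max].
            radius = np.hypot(x - cx, y - cy)
            if radius < f_min or radius > f_max:
                continue
            # A value at or below the threshold cannot be a peak.
            if value <= threshold:
                continue
            # The point must be strictly greater than its eight neighbours.
            neighbours = spectrum[y - 1:y + 2, x - 1:x + 2].copy()
            neighbours[1, 1] = -np.inf
            if value > neighbours.max():
                candidates.append((value, x, y))

    # Enforce the minimum distance between peaks, strongest candidates first.
    candidates.sort(reverse=True)
    peaks = []
    for _, x, y in candidates:
        if all(np.hypot(x - px, y - py) >= threshold_radius for px, py in peaks):
            peaks.append((x, y))
    return len(peaks), peaks
```

For a real-valued image the power spectrum is symmetric about its centre, so a single periodic pattern normally contributes a pair of peaks; the reference data should be produced by the same procedure so that the two counts are comparable.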
After the number of peaks is found for the subject power spectrum, this result is received by the comparison module 860. The reference data from reference security documents, in this case the number of peaks for the reference power spectrum, is then retrieved by the data retrieval module 850 from the database 160 and is passed on to the comparison module 860. The comparison module 860 compares the reference data with the result from the analysis module 830 and the result is passed to the score generation module 870. The comparison module 860 quantifies how different the reference data is from the result received from the analysis module 830.
When the score generation module 870 receives the result of the comparison module, the score module 870 determines, based on predetermined criteria, a score to be given to the subject security document 140 relative to the feature being examined. As an example, if the reference data had 100 peaks while the subject spectrum only had 35 peaks, then the score module may give a score of 3.5 out of 10 based on the comparison module providing a difference of 65 between the reference data and the subject data. However, if it has been previously determined that a 50% correlation between two authentic documents is good, then the same 35 peaks may be given a score of 7 out of 10 (i.e. to double the raw score) to reflect the fact that a large correlation between the peak numbers is not expected. This score generation module 870 may also, depending on the configuration, take into account other user selected factors that affect the score but that may not be derived from the subject image or the type of security document (e.g. setting a higher threshold for documents from specific countries).
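The arithmetic of the example above can be sketched as follows. The function name, the `expected_correlation` parameter and the capping of the result at the maximum score are assumptions introduced for illustration; the text only gives the two worked cases (35 of 100 peaks scoring 3.5, or 7 when a 50% correlation is already considered good).

```python
def peak_count_score(reference_peaks, subject_peaks,
                     expected_correlation=1.0, max_score=10.0):
    """Score a subject peak count against a reference peak count.

    expected_correlation is the fraction of agreement that is already
    considered good between two authentic documents (1.0 means a full
    match is expected, 0.5 doubles the raw score).
    """
    if reference_peaks <= 0:
        raise ValueError("reference_peaks must be positive")
    if not 0 < expected_correlation <= 1:
        raise ValueError("expected_correlation must be in (0, 1]")

    # Comparison step: absolute difference between the two peak counts.
    difference = abs(reference_peaks - subject_peaks)
    raw_fraction = max(0.0, 1.0 - difference / reference_peaks)

    # Score generation step: rescale by the expected correlation and cap.
    adjusted = min(1.0, raw_fraction / expected_correlation)
    return max_score * adjusted
```

With 100 reference peaks and 35 subject peaks the difference is 65 and the sketch gives approximately 3.5 out of 10; with `expected_correlation=0.5` the same counts give approximately 7 out of 10.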
While the above examples use an FFT as the mathematical transform and a power spectrum signature as the representation of the characteristics of the feature being examined, other options are also possible. As an example, a color histogram of a specific region of the subject image may be generated by the mathematical transform module 820 while the analysis module 830 measures the various distributions of color within the resulting histogram. The distributions of color in the subject histogram would then be passed on to the comparison module 860 for comparison with the distributions of color from an authentic document. Clearly, the distributions of color from an authentic document would also have been generated or derived from a color histogram of a similar region in the authentic document. This method would be invariant to rotation in that regardless of the angle of the region being examined, the histogram would be the same.
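A color histogram comparison of this kind can be sketched with NumPy as below. The bin count, the per-channel histogram layout and the use of histogram intersection as the similarity measure are assumptions chosen for illustration; the text does not specify how the distributions are measured or compared.

```python
import numpy as np


def color_histogram(region, bins=8):
    """Normalized per-channel histogram of an H x W x 3 region (values 0-255)."""
    pixels = np.asarray(region).reshape(-1, 3)
    parts = []
    for channel in range(3):
        counts, _ = np.histogram(pixels[:, channel], bins=bins, range=(0, 256))
        parts.append(counts)
    histogram = np.concatenate(parts).astype(float)
    return histogram / histogram.sum()


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the two normalized histograms are equal."""
    return float(np.minimum(h1, h2).sum())
```

Because the histogram only counts pixel values and ignores their positions, rotating the same set of pixels (for example by 90 degrees with `np.rot90`) leaves the histogram unchanged, which illustrates the rotation invariance noted above.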
Similarly, a pattern or contour matching based histogram may also be used to compare the features of a reference document with a subject document. Once a specific feature of the security document has been localized, the contour of that feature (e.g. a maple leaf design, an eagle design, or a crest design) may be obtained by applying any number of edge detector operators by way of the mathematical transform module 820. With the contour now clearly defined, the analysis module 830 can then follow this contour and measure the number of turns of the contour line in all the eight possible directions. A histogram of the turns can then be generated and normalized by subtracting the average value of the turns from every point of the histogram. The resulting normalized histogram of the contour changes would therefore be scale independent. Histograms for a specifically shaped feature should therefore be the same regardless of the size (or scale) of the feature. Thus, a large maple leaf feature should have the same histogram as a smaller maple leaf feature as long as the two features have the same shape. Thus, the details regarding a normalized contour histogram of a feature with a specific shape or pattern from a reference security document can be stored in the database (e.g. the distribution of the directions of the contours or other distinguishing characteristics of the reference histogram). This reference histogram can then be compared to the normalized contour histogram of a similar feature in a subject security document as produced by the mathematical transform module 820. The subject histogram can then be analyzed by the analysis module 830 to produce its distinguishing characteristics. The distinguishing characteristics of the subject histogram and of the reference histogram can then be compared by the comparison module 860.
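One possible reading of the contour histogram is sketched below. It assumes the contour is already available as an ordered, closed list of 8-connected pixel coordinates, and it interprets the "eight possible directions" as the eight Freeman chain-code directions of each step along the contour. The sketch also divides the counts by the total number of steps before subtracting the average, an extra assumption made here so that two sizes of the same shape give identical values. All names are hypothetical.

```python
import numpy as np

# Eight step directions as (dx, dy), with y increasing downwards:
# E, NE, N, NW, W, SW, S, SE
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, -1),
              (-1, 0), (-1, 1), (0, 1), (1, 1)]


def contour_direction_histogram(points):
    """Normalized histogram of step directions along a closed contour.

    points must be an ordered list of (x, y) pixels in which consecutive
    points (including last to first) are 8-connected neighbours.
    """
    counts = np.zeros(8)
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        counts[DIRECTIONS.index((x1 - x0, y1 - y0))] += 1
    fractions = counts / counts.sum()      # assumption: remove contour length
    return fractions - fractions.mean()    # subtract the average value


def square_contour(side):
    """Closed contour of an axis-aligned square, used only as test data."""
    points = [(x, 0) for x in range(side)]               # top edge, moving east
    points += [(side, y) for y in range(side)]           # right edge, moving south
    points += [(x, side) for x in range(side, 0, -1)]    # bottom edge, moving west
    points += [(0, y) for y in range(side, 0, -1)]       # left edge, moving north
    return points
```

Under these assumptions a small square and a large square produce the same histogram, while a shape with diagonal edges would populate the diagonal direction bins and produce a different one.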
It should be noted that the above methods may also be used to extract and compare not only the clearly visible features of a security document (e.g. microprinting, color of specific area, identifying indicia such as the maple leaf design) but also non-visible and hidden features as well. As noted above, the scanner may be used to properly illuminate the subject document and reveal the presence (or absence) of security features embedded on the security document. The above-noted invention may be used to compare features that can be digitally scanned to provide a digital image. The scanner may be any suitable type of imaging device.
As noted above, inauthentic documents or documents which are known forgeries may also be used as reference documents. Known features of inauthentic documents may be used as the reference against which subject documents are judged or compared. One example of such a feature is a hidden pattern in an authentic document that appears if the authentic document is copied or otherwise improperly used. Referring to
The above options may all be used together to arrive at different scores for different features on the same security document. These different scores may then be used to arrive at an aggregate or a weighted overall score for the subject security document. As noted above, the aggregate or weighted overall score may then be provided to an end user as an aid to determine whether the subject security document is authentic or not. Referring to
The next step, step 910, is that of localizing and/or detecting the feature to be examined. This step is performed by the feature/area localization module 810 and determines where the feature to be examined is in the document by searching the document for a match with a reference image of the feature.
Step 920 is executed after the feature is localized/detected. Step 920 applies a mathematical transform to the image of the localized feature by way of the mathematical transform module 820. The transform may be the application of an FFT, the application of an edge detector operator, generating a histogram (color or contour) of the feature, or the application of any other mathematical or image processing method.
Step 930 analyzes the data/image/histogram generated by the transform module 820. The analysis extracts the useful data from the transform module's result and this analysis can take various forms. From the examples given above, the analysis may take the form of determining distances between elements in the histogram, determining the number, height, and/or presence of peaks in a power spectrum, and any other analysis that extracts the identifying characteristics of the result from the transform module 820. These identifying characteristics or metrics should be easily quantifiable and should be easy to compare mathematically with reference data stored in the database.
Step 940 provides the metrics from the analysis to the comparison module 860 to determine how quantifiably similar or different the feature of the subject document is from reference data. Also in this step may be the step of retrieving the reference data from the database.
Step 950 compares the metrics from the feature of the subject document with the reference data from the database. The comparison may be as simple as subtracting one number from another such that if there is an exact match, then the result should be zero. Results other than zero would indicate a less than perfect match. Alternatively, the comparison step 950 may determine a percentage that indicates how different the two data sets being compared are. From the above example of 35 peaks for the subject document and 100 peaks for the reference data, the comparison step could provide a result that notes that there is a 65% incompatibility or non-match between the two results.
Step 960 generates the final score indicative of a similarity or non-similarity between the subject feature and the reference data derived from the reference feature. As noted above, this step may take into account user or system mandated preferences that would affect the final score.
The final step 970 is that of presenting the final score to the end user as an aid to determining if the subject security document is authentic or not. It should be noted that this final step may include aggregating and/or weighting the scores of multiple different features tested/compared on the subject security document prior to providing a final score to the user.
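The aggregation and weighting of several feature scores can be sketched as follows. The text describes a "user-weighted summary" without giving a formula, so the weighted average used here is an assumption, and the function name, scores and weights are hypothetical.

```python
def weighted_score(scored_items):
    """Weighted average of (score, weight) pairs.

    The same function can be applied at each level of the hierarchy:
    features roll up into a page score, pages into a document score.
    """
    items = list(scored_items)
    total_weight = sum(weight for _, weight in items)
    if total_weight <= 0:
        raise ValueError("at least one item must have a positive weight")
    return sum(score * weight for score, weight in items) / total_weight


# Hypothetical feature scores (out of 10) and user-selected weights for one page.
page_one = weighted_score([(7.0, 2), (10.0, 1), (4.0, 1)])

# Hypothetical second page score, combined into an overall document score.
document = weighted_score([(page_one, 3), (9.0, 1)])
```

Giving a feature a weight of zero removes it from the aggregate, which is one simple way to model user preferences that disable a particular check.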
Embodiments of the method explained above can be implemented as a computer program product for use with a computer system. Such implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk) or transmittable to a computer system, via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or electrical communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques). The series of computer instructions embodies all or part of the functionality previously described herein. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server over the network (e.g., the Internet or World Wide Web). Of course, some embodiments of the invention may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments of the invention may be implemented as entirely hardware, or entirely software (e.g., a computer program product).
Although various exemplary embodiments of the invention have been disclosed, it should be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the true scope of the invention.
A person understanding this invention may now conceive of alternative structures and embodiments or variations of the above all of which are intended to fall within the scope of the invention as defined in the claims that follow.