Method and apparatus for comparing document features using pattern recognition

Information

  • Patent Application
  • 20080025555
  • Publication Number
    20080025555
  • Date Filed
    July 31, 2006
  • Date Published
    January 31, 2008
Abstract
Systems and methods for assisting in the determination of the authenticity of security documents based on known characteristics of similar reference security documents. The system and methods use digital processing to capture a digital image of the document being examined and they use a feature localization or detection technique to search for a specific feature in the document based on a stored image of a similar feature from a reference document. Once the feature on the subject document has been found, the digital image of the localized feature is transformed, by applying mathematical transforms or other image/mathematical operators, such that the result will have distinguishing characteristics that can be derived or analyzed. When the distinguishing characteristics have been analyzed, these are then compared to the stored distinguishing characteristics of similar features from reference documents. Based on the comparison, a score is then generated that is indicative of how similar or how different the distinguishing characteristics of the feature being examined are from the features from reference documents. The system may also be used such that multiple features from a single document are assessed and scored separately from one another with a final aggregate or weighted score being provided to the user for the whole document.
Description

BRIEF DESCRIPTION OF THE DRAWINGS

A better understanding of the invention will be obtained by considering the detailed description below, with reference to the following drawings in which:



FIG. 1 depicts a stand alone document comparison system;



FIG. 2 depicts a networked document comparison system;



FIG. 3 depicts the software components of the document comparison system;



FIG. 4A depicts the hierarchical organization of the elements of the knowledge base;



FIG. 4B depicts an example of a document template and a number of image features associated therewith;



FIG. 5 depicts a template builder graphical user interface (GUI);



FIG. 6A depicts an example signature feature used by the document inspection engine to identify the security document under consideration;



FIG. 6B depicts a series of example features used to validate an identified security document;



FIG. 7A depicts an inspector GUI;



FIG. 7B depicts the display bar of the FIG. 7A inspector GUI;



FIG. 7C depicts the search bar of the FIG. 7A inspector GUI;



FIG. 8 depicts a block diagram of the software modules used by the system of the invention;



FIG. 9 depicts an example of a reference digital image which must be searched for in the sample subject image of FIG. 10;



FIG. 10 depicts an example of a sample subject image in which the reference digital image of FIG. 9 must be searched for;



FIG. 11 depicts the sample subject image of FIG. 10 with its edges padded with mirror values;



FIG. 12 depicts the normalized version of the subject image of FIG. 10 as derived from the padded image of FIG. 11;



FIG. 13 depicts a plot of the normalized cross correlation coefficients derived from the reference digital image of FIG. 9 and the normalized sample subject image of FIG. 12;



FIG. 13A illustrates a sample reference image taken from an authentic document;



FIG. 13B illustrates a sample image taken from an inauthentic document;



FIG. 13C illustrates the resulting image after normalized cross-correlation is applied to the images in FIGS. 13A and 13B;



FIG. 14 depicts a sample reference image illustrating an area of a security document containing microprinting;



FIG. 15 depicts a power spectrum of the image of FIG. 14 after a Fast Fourier Transform is applied to the image;



FIG. 16 depicts a sample subject image of an area in an inauthentic document where microprinting has been attempted;



FIG. 17 depicts the power spectrum of the image of FIG. 16 after a Fast Fourier Transform is applied to the image;



FIG. 18 illustrates a sample reference image taken from an authentic document;



FIG. 19 illustrates a power spectrum of the image of FIG. 18;



FIG. 20 illustrates a sample image taken from an inauthentic document;



FIG. 21 illustrates a power spectrum of the image in FIG. 20;



FIG. 22 illustrates an original background containing a hidden pattern;



FIG. 23 illustrates the background of FIG. 22 after copying and which shows the hidden pattern; and



FIG. 24 depicts a block diagram illustrating the steps in the generalized approach to comparing and scoring a feature in a subject document relative to data from a known similar feature in an authentic document.





DESCRIPTION OF THE PREFERRED EMBODIMENT

Referring to FIG. 1, an overview of the document comparison system (DCS) (shown generally at 100) in which the present invention functions is provided. The DCS 100 is comprised of a general purpose computer 110 which may utilize, for example, a Windows XP™ operating system produced by Microsoft™ Corporation. The general purpose computer includes a monitor, an input device such as a keyboard and mouse, a hard drive and a processor, such as an Intel™ Pentium™ 4 processor, cooperating with the operating system to coordinate the operation of the aforementioned components. As those in the art will appreciate, general purpose computer 110 could be any commercially available, off-the-shelf computer including a laptop or similar device and all such devices are meant to be included within the scope of the present invention.


General purpose computer 110 communicates with travel document reader 120 and external storage device 130. As will be appreciated by those in the art, data stored in external storage device 130 may be alternately stored on the hard drive integral to general purpose computer 110. Travel document reader 120 is used to input features associated with a security document 140 (such as a passport, visa, identity card, etc.) into DCS 100 for analysis, to assist the operator with a determination as to whether security document 140 is authentic. In operation, the operator places security document 140 onto an image capture surface associated with travel document reader 120 and a portion or all of security document 140 is then exposed to various light sources. Travel document reader 120 is designed to recognize documents that are compliant with the relevant standards and specifications governing such documents. These specifications and standards may be set by the authorities which issue these documents as well as international organizations such as the ICAO (International Civil Aviation Organization). As part of the image capture process, the security document 140 may be exposed to various forms of light such as ultraviolet (UVA and UVB), infrared (IR), red/green/blue (RGB) and white light to determine if certain expected features are present. More specifically, light emitting diodes (LEDs) expose security document 140 to UV, IR and RGB light, while a fluorescent light source exposes security document 140 to white light. In all cases, the light reflected from the surface of security document 140 is captured by a charge coupled device (CCD) or complementary metal-oxide semiconductor (CMOS) sensor, either of which converts the light into electronic signals that can be digitally processed.


In the configuration shown in FIG. 1, document comparison system 100 operates in a stand alone mode at locations A, B and C such as at a customs or security officer's post located at, for example, an airport or other country point of entry. As shown in FIG. 2, an alternate configuration includes each of a plurality of general purpose computers 110 communicating with a central server 150 in a client-server relationship well known to those in the art. Central server 150 communicates with a central storage device 160.


General purpose computer 110 has stored thereon document comparison software, which processes the captured information and compares it to information contained in local security feature/image database 130, to determine if security document 140 is authentic. Alternately, document comparison software could be stored on central server 150 and accessed by each of the plurality of general purpose computers 110 attached thereto. As will be appreciated by those in the art, travel document reader 120 typically includes firmware for accomplishing various reader specific tasks such as acknowledging receipt of security document 140 onto the scanning surface and capturing the various images discussed above. This firmware operates seamlessly with the document comparison software in the analysis of security document 140. More specifically, the firmware associated with travel document reader 120 sends and receives requests for information related to a specified document template, as will be discussed in more detail below.


Document comparison software is comprised of several modules as depicted in FIG. 3. One such module is knowledge base 300. DCS 100 uses knowledge base 300 to perform its inspection tasks. Knowledge base 300 (the contents of which are stored in storage devices 130 or 160) contains known templates for a variety of security documents 140 that are identified by a document signature. Each template holds the instructions on what and how to locate, process, inspect, compare and score the various entities on the template. The document content is arranged in a hierarchical manner so as to facilitate cross document, cross page, cross image, and same document, same page, same image inspections. The elements of knowledge base 300 are further defined as follows:

    • (a) Document: A collection of page(s) or data groups to be inspected. An example might be the passport page and visa page. Properties and comparison groups can be attached to a document;
    • (b) Page: A logical grouping of images or binary representations of data. A page can have properties to be inspected, e.g. page size;
    • (c) Image: A binary data representation of an entity that has feature(s) to be inspected, e.g. captured with a different light source to expose certain features;
    • (d) Feature: A significant object within the image entity, e.g. MRZ (machine readable zone) feature, a Maple Leaf pattern. A feature contains the data required to locate, process and score parts of or the entire image. Properties can be attached to a feature.
      • (i) Signature features (to be discussed below) have an added functionality for selecting templates;
      • (ii) Self-learning features have the ability to locate and identify most or all of their properties. Such features can use processors and comparators to help with this process;
    • (e) Property: An element within an entity that can be inspected and scored, e.g. location, colour or text;
    • (f) Comparison Rule: A rule has an operator that is applied to two properties;
    • (g) Comparison Group: A collection of comparison rules that form more complex rules to perform extra checking on the security document 140. The comparison group has an optional activation and deactivation time. An example of a comparison group is to alert the operator that all male travelers aged 25 to 40 of country UTO are to be asked for a second piece of identification during the period of Apr. 1 to Apr. 2, 2005;
    • (h) Signature: A special property that is a unique identification of an entity (e.g. document, page, image or feature) within an entity group. The document type, country code and the document series id could form a document signature.


The hierarchical arrangement of the above-noted elements is depicted in FIG. 4A, with an example of a document template and a number of image features associated therewith depicted in FIG. 4B.
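By way of illustration only, the hierarchy of elements (a) to (h) may be sketched as nested data records. The following is a minimal sketch and not the actual schema of knowledge base 300; the class names, field names and the sample signature string "P-UTO-2005" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Property:
    name: str        # e.g. "location", "colour" or "text"
    expected: str    # reference value the inspected value is scored against


@dataclass
class Feature:
    name: str                                        # e.g. "MRZ", "Maple Leaf pattern"
    properties: List[Property] = field(default_factory=list)
    is_signature: bool = False                       # signature features help select templates


@dataclass
class Image:
    light_source: str                                # e.g. "white", "UV", "IR"
    features: List[Feature] = field(default_factory=list)


@dataclass
class Page:
    name: str
    images: List[Image] = field(default_factory=list)
    properties: List[Property] = field(default_factory=list)


@dataclass
class Document:
    signature: str                                   # hypothetical: type + country code + series id
    pages: List[Page] = field(default_factory=list)


def count_features(doc: Document) -> int:
    """Walk Document -> Page -> Image -> Feature and count the features."""
    return sum(len(img.features) for page in doc.pages for img in page.images)


# Hypothetical template with one page captured under two light sources.
template = Document(
    signature="P-UTO-2005",
    pages=[Page("data page", images=[
        Image("white", features=[Feature("MRZ", is_signature=True)]),
        Image("UV", features=[Feature("Maple Leaf pattern")]),
    ])],
)
print(count_features(template))
```

Nesting the records in this way is one means of supporting the cross document, cross page and cross image inspections described above, since every feature can be reached by walking down from its document.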

Another module contained in the document comparison software is a template builder graphical user interface (GUI) 310 for assisting the user of DCS 100 with the management of knowledge base 300 and its associated templates. Template builder GUI 310 allows the creation, deletion and renewal of the data that represents a document template. This basic functionality of template builder GUI 310 can either be done in a step-by-step manner for specific entities within a document template or the user can have the tool create a generic layout of a document template with default values. Template builder GUI 310 also provides an interactive visual representation of the hierarchical data in knowledge base 300. This allows the user to easily scan various document templates contained within knowledge base 300 and quickly apply those changes that are required.


Referring to FIG. 5, a template builder GUI 310 is depicted. Window 500 is the previously mentioned hierarchical representation of the existing templates in knowledge base 300. The commands for adding, removing and maintaining templates are instigated from this tree list. Visual display area 510 provides the user with a representation of the data with which the user is currently working. This could be graphical, binary, etc. Indicator lights 520 inform the user what data source the current data was obtained from during template creation. Finally, data entry fields 530 provide information for each of the different types of entities that make up a template. Template builder GUI 310 dynamically changes the set of fields for data entry depending on which entity is being manipulated. These entities include properties, features, images, reference pages, documents, rules, and portfolios previously discussed.


Referring again to FIG. 3, a further module of the document comparison software is a document inspection engine 320 that works in collaboration with knowledge base 300 to score a document or portfolio of documents based on inspection instructions. Document inspection engine 320 may alternately reside in document authentication server 150 and obtain images from one or more security documents 140 scanned at one or more networked travel document readers 120. As shown in FIG. 3, travel document reader 120 is just one example of the devices that reside in peripheral layer 330, with which document inspection engine communicates to obtain inspection data.


When security document 140 is inserted into travel document reader 120, the reader automatically sends signature image(s) and/or signature feature(s) to the document inspection engine 320. Signature image(s) and/or signature feature(s) are used to determine a document type (e.g. passport) upon which further validation processing can be initiated. More specifically, using the retrieved signature image(s) and/or signature feature(s), document inspection engine 320 determines one or more matching templates. Each template defines the additional data to be retrieved using travel document reader 120 to validate security document 140.


Important enablers for matching templates are signature features. Generally speaking, document inspection engine 320 can locate, process and score features, but signature features also implement a “find matching templates” process. The “find matching templates” process calculates a unique signature for the security document 140 under analysis. This process preferably utilizes a scoring mechanism which ranks the matching templates. From the list of ranked matching templates, the highest scored template is chosen, and this template will be used in the validation of the security document 140 under analysis. Optionally, an operator can select the preferred template from the list. FIG. 6A depicts an example of a signature feature that looks at the colour distribution of sub images to calculate a unique signature for an incoming image. This signature is used to search, score and rank matching templates.
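By way of illustration only, the “find matching templates” process may be sketched as follows. This is a minimal sketch under stated assumptions: the signature is taken to be the mean intensity of each sub image in a grid, the score is one minus the mean absolute difference between signatures, and the names grid_signature, rank_templates, "passport_A" and "visa_B" are hypothetical. The actual signature calculation of FIG. 6A is not specified at this level of detail.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # grayscale rows with values in 0..1 (an assumption)


def grid_signature(image: Image, rows: int = 2, cols: int = 2) -> List[float]:
    """Mean intensity of each sub image in a rows x cols grid."""
    h, w = len(image), len(image[0])
    sig = []
    for r in range(rows):
        for c in range(cols):
            ys = range(r * h // rows, (r + 1) * h // rows)
            xs = range(c * w // cols, (c + 1) * w // cols)
            cells = [image[y][x] for y in ys for x in xs]
            sig.append(sum(cells) / len(cells))
    return sig


def rank_templates(sig: List[float],
                   templates: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Score each stored template signature (1.0 = identical) and sort best first."""
    ranked = []
    for name, ref in templates.items():
        mean_abs_diff = sum(abs(a - b) for a, b in zip(sig, ref)) / len(ref)
        ranked.append((name, 1.0 - mean_abs_diff))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Incoming image: dark top-left quadrant, bright elsewhere.
incoming = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
stored = {
    "passport_A": [0.0, 1.0, 1.0, 1.0],   # hypothetical stored signatures
    "visa_B": [1.0, 1.0, 0.0, 0.0],
}
ranking = rank_templates(grid_signature(incoming), stored)
best_template = ranking[0][0]
print(ranking)
```

The first entry of the ranked list corresponds to the highest scored template; the full list is what would be offered to an operator who prefers to select the template manually.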


Once the security document 140 under analysis is identified, additional features associated with security document 140 are located, processed and scored by document inspection engine 320 to determine if security document 140 is authentic. Feature locating, processing and scoring are most commonly methods exported from image and data processing libraries or DLLs. For example, a machine readable zone (MRZ) feature uses an image utility for page segmentation, and a multi-font OCR engine is called to recognize the letters. MRZ scoring is based on advanced comparators and libraries that have been developed according to ICAO standards. Another example is a pattern recognition feature that locates sub images and uses a normalized cross correlation algorithm which generates a number used for scoring. FIG. 6B depicts example features which are located, processed and scored as part of the validation process for security document 140.


When all data is received for security document 140, document inspection engine 320 starts the scoring process. The hierarchical structure of knowledge base 300 is key to this process. Scoring security document 140 is a user-weighted summary of scoring all pages, all comparison groups and all properties attached to security document 140. Scoring pages is a user-weighted summary of scoring all data, images and scoring all properties attached to the page. Scoring data and images is a user-weighted summary of scoring all features and properties attached to the page. To score a feature, it must first be located and then processed before scoring is performed. Scoring a feature involves a user-weighted summary of all properties, property locations and feature location scores. As will be discussed in relation to FIGS. 7A and 7B, the results of the scoring are displayed in an inspector GUI (element 340 in FIG. 3).
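By way of illustration only, the roll-up of scores through the hierarchy may be sketched as follows. This is a minimal sketch which assumes that a “user-weighted summary” is a weighted average of scores in the range 0 to 1; the function name weighted_score and all scores and weights shown are hypothetical.

```python
from typing import List, Tuple


def weighted_score(items: List[Tuple[float, float]]) -> float:
    """Weighted average of (score, weight) pairs; scores are assumed to lie in 0..1."""
    total_weight = sum(w for _, w in items)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(s * w for s, w in items) / total_weight


# Feature level: a property score, a property-location score, a feature-location score.
mrz_feature = weighted_score([(0.9, 2.0), (1.0, 1.0), (0.8, 1.0)])
pattern_feature = weighted_score([(0.6, 1.0), (0.8, 1.0)])

# Image level: the features attached to the image.
image_score = weighted_score([(mrz_feature, 3.0), (pattern_feature, 1.0)])

# Page level: the image plus one page property (e.g. page size).
page_score = weighted_score([(image_score, 1.0), (1.0, 0.5)])

# Document level: all pages (a single page here).
document_score = weighted_score([(page_score, 1.0)])
print(round(document_score, 3))
```

Each level consumes only the scores produced by the level beneath it, which is why the hierarchical structure of the knowledge base matters to the scoring process.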


Referring to FIGS. 3 and 7A to 7C, the last major module of the document comparison software is inspector GUI 340. At the end of the inspection process, the inspection results are presented via the inspector GUI 340 to an operator such as a customs officer. As shown in FIG. 7A, inspector GUI 340 includes: a list of machine inspected features 710, properties and rules where the results are signified by colour and a numerical score; a list of important features 720 that the user needs to be aware of but cannot be processed and inspected electronically by DCS 100; an image display area where those items listed in 710 and 720 are boxed on the image; a set of buttons 740 indicating what color planes were obtained and inspected for the template; a text information pane 750 that displays relevant notes pertaining to the item selected from either 710 or 720; a visual information pane 760 that displays relevant images pertaining to the item selected from either 710 or 720.


Additionally, inspector GUI 340 includes a display bar 770. As shown in FIG. 7B, display bar 770 includes: a large bold single word 770A, which is easy to see and interpret quickly to indicate the status of the last operation performed; the name of the document template 770B that was used during the last document inspection process; a single sentence 770C highlighting any important information the user may need to know about the last operation that was performed; a numerical score 770D that relates a confidence level of all computations performed on the inspected document in relation to the chosen document template; a numerical value 770E indicating the threshold limit for passing or failing the inspection process; and a progress bar 770F (shown in pre-inspection mode) that is activated during the inspection process to indicate to the user that an operation is taking place.


Finally, inspector GUI 340 includes a search bar 780. As shown in FIG. 7C, search bar 780 includes: location code entry 780A to specify what country, province, county or any other similar geopolitical designation to which a document template belongs; document type code entry 780B to specify to what set of documents the template belongs, examples of which include visa, passport, financial card and identifying certificates; document name entry 780C to specify the exact name of the document template that the user may desire to use for an inspection; a “Browse” button 780D, which utilizes the information from the three above-mentioned entry fields to display template information in the main inspection window; a “Clear” button 780E, which clears all data retrieved from the knowledge base 300 from the screen; an “Execute” button 780F, which utilizes the information from the above-mentioned entry fields while instigating an inspection process for acquired images; an “Auto-Selection” button 780G, which turns ON or OFF the option of the user to select a template during the inspection process when a perfect template match cannot be acquired (in the ON state a list of templates is presented to the user for use, and in the OFF state the best match template is used for the inspection process); and a “Cancel” button 780H, which interrupts and stops an inspection process before it is complete.


As depicted in FIG. 3, an optional module of the document comparison software includes a guardian component 350 which assigns user access privileges to view and modify knowledge base 300 when either template builder GUI 310 or inspector GUI 340 is in use. A user with insufficient privileges is denied access to certain areas of knowledge base 300 in template builder mode or to certain results in inspection mode. For example, if the system administrator does not want the user to even be aware that a certain feature for a specified document exists and can be analyzed, then access to that feature in knowledge base 300 will be denied and the results of that feature analysis will remain hidden.


Referring to FIG. 8, a block diagram of software modules used by the document inspection engine 320 is illustrated.


An image capture module 800 communicates with the scanner 120 to result in a digital image or a digital representation of a page of the security document 140. Once the digital image of the page (e.g. a digital image 805 of a passport page shown in FIG. 7A) is captured, a specific area or feature of the image may be localized or found in the digital image by the feature/area localization module 810. The feature/area localization module 810 detects and localizes features or areas of the digital image based on a stored digital representation or image of the same feature or area from an authenticated security document.


When the digital image of the specific area or feature has been localized, a mathematical transform is applied to the digital image of the localized feature or area using the mathematical transform module 820. The mathematical transform module 820 may also, depending on the property or identifying feature of the security document that is being examined, apply other types of image processing to the digital image.


After the digital image of the localized feature or area has been processed by the mathematical transform module 820, the resulting image from the processing is received by an analysis module 830. The analysis module 830 analyzes the resulting image from the mathematical transform module 820 and produces a result that can easily be compared with stored data derived from a reference security document. The result of the analysis module 830 can then be used by the comparison module 860 to determine how close or how far the feature being examined is from a similar feature on a reference security document. The data for the reference security document is retrieved by a data retrieval module 850 from the database. Once the relevant data for the relevant feature of the reference security document has been retrieved, this data is compared with the data from the analysis module 830 by the comparison module 860. The result of the comparison is then received by the score generation module 870 which determines a score based on the similarities or closeness between the sets of data compared by the comparison module 860. The score generated may be adjusted based on user selected preferences or user or system mandated weights on the data.


It should be noted that the term “reference document” is used to refer to documents against which subject documents are compared. As mentioned above, features are associated with documents such that reference documents will be associated with reference features. Features associated with subject documents are compared with reference features associated with reference documents. These reference documents may be authentic or authenticated documents, meaning documents which are known to be legitimate or have been authenticated as being legitimate and not forgeries. Similarly, reference documents may be inauthentic documents or documents known or proven to be fake, forgeries, or otherwise illegitimate. If the reference document being used is an authentic document, the features associated with a subject document are compared to the features associated with an authentic document to positively determine the presence of features expected to be on an authentic document. As an example, if a feature on the reference document (an authentic document in this example) corresponds very closely (if not exactly) to a similar feature on the subject document, then this is an indication of a possible authenticity of the subject document. On the other hand, if the reference document used is an inauthentic document or a known forgery, then a close correlation between features associated with the subject document and features associated with the reference document would indicate that the subject document is a possible forgery. The use of an inauthentic document can thereby positively determine the possibility, if not a probability, of a forgery. Similarly, using an inauthentic document as a reference document can negatively determine the possibility of the authenticity of a subject document. This is because if the features of the subject document do not closely correlate with the features of an inauthentic document, then this may indicate the authenticity of the subject document.


It should also be noted that the image capture module 800 may be derived from or be found in commercially available software libraries or dynamic link libraries (DLLs). Software and methods for communicating with and receiving digital images from different types of scanning apparatus are well known in the art of digital scanning and software.


As noted above, to localize a feature or area of the digital image from the image capture module, the feature/area localization module 810 is used. One method which may be implemented by this module 810 is based on having a reference digital image of an area or feature being searched for in the digital image from the image capture module 800. The method, in essence, reduces to searching the digital image for an area or feature that matches the smaller reference digital image. This is done by using normalized cross-correlation.


After normalized cross-correlation is applied to a reference digital image and a subject digital image, the resulting image indicates the regions in the subject digital image which most closely match the reference digital image. The formula for a correlation factor (or the quality of the match between the reference digital image or template and the subject digital image at coordinates (u,v)) is given as:

$$c(u,v) = \frac{\sum_{x,y} \underline{f}(x,y)\,\underline{g}(x-u,\,y-v)}{\sqrt{\sum_{x,y} \underline{f}(x,y)^{2}\,\sum_{x,y} \underline{g}(x,y)^{2}}}$$

Thus, the correlation factor equals 1 if there is, at point (u,v), an exact match between the reference digital image and the subject digital image. Another way of calculating the correlation factor is to calculate how different the reference digital image and the subject digital image are at point (u,v). This difference or the “distance” between the two images can be found by using the formula:

$$e(u,v) = \sum_{x,y} \bigl(f(x,y) - g(x-u,\,y-v)\bigr)^{2} = \sum_{x,y} \bigl(f(x,y)^{2} + g(x-u,\,y-v)^{2} - 2\,f(x,y)\,g(x-u,\,y-v)\bigr)$$

Since the first two terms in the summation are constants, the “distance” decreases as the value of the last term increases. The correlation factor is therefore given by the formula c(u,v)=1−e(u,v). When e(u,v)=0, there is a perfect match at coordinates (u,v). Once the results are plotted, regions where the correlation is highest (closest to 1) appear in the plot.
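By way of illustration only, an exhaustive search using the correlation factor may be sketched as follows. This is a minimal sketch rather than the implementation of module 810. It assumes that the underlined terms denote mean-subtracted pixel values, and it indexes the subject image by the top-left corner (u, v) of the window under test rather than by shifting the second image; the names ncc_at and best_match are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def ncc_at(subject: Image, ref: Image, u: int, v: int) -> float:
    """Correlation factor between ref and the subject window with top-left corner (u, v)."""
    h, w = len(ref), len(ref[0])
    window = [subject[v + y][u + x] for y in range(h) for x in range(w)]
    template = [ref[y][x] for y in range(h) for x in range(w)]
    mean_w = sum(window) / len(window)
    mean_t = sum(template) / len(template)
    dw = [p - mean_w for p in window]       # mean-subtracted window
    dt = [p - mean_t for p in template]     # mean-subtracted reference
    denom = math.sqrt(sum(a * a for a in dw) * sum(b * b for b in dt))
    if denom == 0.0:                        # flat window or flat reference: undefined
        return 0.0
    return sum(a * b for a, b in zip(dw, dt)) / denom


def best_match(subject: Image, ref: Image) -> Tuple[int, int, float]:
    """Scan every valid offset and return (u, v, score) for the highest correlation."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, -1.0)
    for v in range(len(subject) - h + 1):
        for u in range(len(subject[0]) - w + 1):
            score = ncc_at(subject, ref, u, v)
            if score > best[2]:
                best = (u, v, score)
    return best


ref = [[1.0, 0.0],
       [0.0, 1.0]]
# An exact copy of ref is placed with its top-left corner at column 2, row 1.
subject = [[0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0],
           [0.0, 0.0, 0.0, 0.0]]
u, v, score = best_match(subject, ref)
print(u, v, score)
```

Plotting the value returned by ncc_at for every offset yields the kind of correlation map described above, in which the best matching regions stand out.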


To apply the cross correlation to the subject digital image, the average value over a window as large as the reference digital image is subtracted from each pixel value of the subject digital image with the window being centered on the pixel being evaluated. This is very similar to applying an averaging filter to the subject digital image. However, to overcome the issue of average values at the edges of the subject digital image, the subject digital image is normalized by padding the edges with mirror values. To best illustrate the above process, FIGS. 9-13 are provided.
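By way of illustration only, the mirror padding and windowed average subtraction may be sketched as follows. This is a minimal sketch under stated assumptions: the mirror is taken about the border pixel without repeating it, the window side is odd, and the padding width is smaller than the image dimensions. The names mirror_pad and subtract_local_mean are hypothetical.

```python
from typing import List

Image = List[List[float]]


def mirror_pad(image: Image, pad: int) -> Image:
    """Pad every edge with `pad` mirrored rows and columns (requires pad < height, width)."""
    def reflect(i: int, n: int) -> int:
        # Map an out-of-range index back into 0..n-1 by reflecting about the border.
        if i < 0:
            return -i
        if i >= n:
            return 2 * (n - 1) - i
        return i

    h, w = len(image), len(image[0])
    return [[image[reflect(y, h)][reflect(x, w)] for x in range(-pad, w + pad)]
            for y in range(-pad, h + pad)]


def subtract_local_mean(image: Image, win: int) -> Image:
    """Subtract from each pixel the mean of the win x win window centred on it."""
    pad = win // 2
    padded = mirror_pad(image, pad)
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # In padded coordinates the window centred on (x, y) starts at (x, y).
            block = [padded[y + dy][x + dx] for dy in range(win) for dx in range(win)]
            row.append(image[y][x] - sum(block) / (win * win))
        out.append(row)
    return out


sample = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
padded_sample = mirror_pad(sample, 1)
normalized_sample = subtract_local_mean(sample, 3)
print(padded_sample[0])
```

Without the padding, windows centred on border pixels would extend past the image and the average for those pixels would be computed over fewer values, which is the edge issue referred to above.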



FIG. 9 illustrates a sample reference digital image. FIG. 10 illustrates a sample subject image. Thus, the image in FIG. 9 must be found in the subject image of FIG. 10. To assist the reader, a boxed area in FIG. 10 shows where the reference image may be found. As such, there should be at least one area in FIG. 10 that matches the reference digital image. The issue of average values at the edges of the subject image was raised above and, to address this, the edges of the subject image are padded with mirror values, resulting in FIG. 11. As can be seen in FIG. 11, a mirror image of the edges of the subject image is added to every edge. This process normalizes the subject image to produce FIG. 12, which will be used to search for the reference image. Once normalized cross correlation is applied to FIGS. 9 and 12 and the cross correlation coefficients are calculated at every point, the image in FIG. 13 emerges. As can be seen from FIG. 13, two areas show the strongest potential matches to FIG. 9: the dark patches 890 correspond to the regions 901-902 in FIG. 10 where the closest matches to the reference image are found.


Cross correlation can also be used to validate not only the presence or absence of a pattern but also the edge integrity of the pattern in question. FIGS. 13A, 13B, and 13C illustrate one example in which normalized cross correlation is used to take into account edge integrity for authentication purposes. FIG. 13A illustrates a sample reference image from an authentic document while FIG. 13B illustrates an image from an inauthentic document. Normalized cross-correlation determines the level of correlation between the two images. After applying normalized cross-correlation between the two images, FIG. 13C illustrates the result. A correlation score of 0.81 between the two images is found. Such a score is considered low, as a score of at least 0.9 is to be expected from cross-correlating two images from genuine documents. As can be seen, the blurry edges of the image in FIG. 13B are in contrast to the sharp edges of the image in FIG. 13A.


While the above process localizes the desired matching regions or features, the computational complexity may be daunting as the subject image increases in size. To address this issue, the reference image and the subject image may both be compressed or reduced in size by the same factor. The normalized cross correlation process set out above can then be applied to these compressed images. Since the area of the reference image has shrunk and the corresponding area of the subject image has also shrunk, the computational complexity of the calculations similarly shrinks. This is because the resolution and the number of pixels being used correspondingly decrease.
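By way of illustration only, the size reduction and its effect on the amount of computation may be sketched as follows. This is a minimal sketch which assumes that the reduction is performed by averaging non-overlapping blocks and that the search is exhaustive; the names downsample and search_cost and the image sizes used are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]


def downsample(image: Image, factor: int) -> Image:
    """Shrink an image by averaging non-overlapping factor x factor blocks."""
    h, w = len(image) // factor, len(image[0]) // factor
    return [[sum(image[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / (factor * factor)
             for x in range(w)]
            for y in range(h)]


def search_cost(subject_hw: Tuple[int, int], ref_hw: Tuple[int, int]) -> int:
    """Multiply-adds for an exhaustive search: offsets tested x pixels per offset."""
    (H, W), (h, w) = subject_hw, ref_hw
    return (H - h + 1) * (W - w + 1) * h * w


small = downsample([[1.0, 3.0],
                    [5.0, 7.0]], 2)

# Hypothetical sizes: a 512 x 512 subject and a 32 x 32 reference, both halved.
full_cost = search_cost((512, 512), (32, 32))
reduced_cost = search_cost((256, 256), (16, 16))
print(full_cost / reduced_cost)
```

Because both the number of offsets and the number of pixels per offset shrink, reducing each dimension by a factor k lowers the work of an exhaustive search by roughly k to the fourth power.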


It should be noted that the correct reference image to be used in the above process may be determined by the type of security document being examined. Such reference images may therefore be stored in the database and retrieved by the data retrieval module 850 as required. Examples of features/areas which may have reference images stored in the database are microprinting samples, identifier symbols such as the maple leaf in the image in FIG. 7A, and other indicia which may or may not be visible to the naked eye. For non-visible features, the scanner 120 may be configured to illuminate such features with distinct types of radiation (e.g. white light, blue light, red light, green light, infrared light, ultraviolet A radiation, or ultraviolet B radiation) so that an image of such features may be digitally scanned.


Once the feature/area to be examined has been localized, a mathematical transform or some other type of numerical processing may be applied to the localized feature by the mathematical transform module 820. The transform or processing may take many forms such as applying a Fast Fourier Transform (FFT) to the image, determining/finding and tracking edges in the image, and other processes. Other types of processing such as shape recognition through contour matching, the use of a neural classifier, and wavelet decomposition may also be used.


In one embodiment, a Fast Fourier Transform (FFT) is applied to the localized image to result in an illustration of the power spectrum of the image. The power spectrum reveals the presence of specific frequencies and this frequency signature can be used to determine how similar one feature is to a similar feature in an authenticated security document. To illustrate this process, FIGS. 14-21 are provided.


Referring to FIG. 14, a reference image of an area with a repetitive printing pattern (such as microprinting) is illustrated. This reference image is derived from an authenticated security document and provides a reference by which subject images may be measured. Once an FFT is applied to the reference image, an image of its power spectrum or frequency spectrum emerges (see FIG. 15). As can be seen from FIG. 15, specific frequencies are present (see circles in FIG. 15). These peaks indicate the frequencies that are present in the power spectrum of an authentic document; other authentic documents which have the same microprinting pattern should have similar frequencies in their power spectra. Essentially, the sharpness of the microprinting affects the sharpness, height, and even the presence of the peaks in the spectrum. As such, the less sharp the microprinting, the fewer and the lower the peaks in the spectrum. Thus, the power spectrum of the subject image is to be compared to the power spectrum of the reference image.
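A power spectrum of the kind shown in FIG. 15 may be obtained, for example, as sketched below. The sketch assumes NumPy and a grayscale array; subtracting the mean and centring the zero frequency are presentation choices made here, and the name power_spectrum is hypothetical.

```python
import numpy as np

def power_spectrum(image):
    """Squared magnitude of the 2-D FFT, with zero frequency shifted to the centre."""
    image = np.asarray(image, dtype=float)
    # Removing the mean suppresses the large zero-frequency term.
    spectrum = np.fft.fftshift(np.fft.fft2(image - image.mean()))
    return np.abs(spectrum) ** 2
```

A strictly periodic pattern concentrates its energy in a few sharp peaks, whereas a blurred copy of the same pattern spreads or attenuates them.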


To continue with the example, FIG. 16 illustrates a subject image from a known inauthentic document. As can be clearly seen in FIG. 16, the microprinting in the subject image is blurred and is not as sharp as the microprinting in the reference image of FIG. 14. Once an FFT is applied to the subject digital image of FIG. 16, the power spectrum that results is shown in FIG. 17. Thus, the power spectrum of FIG. 17 is the result or output of the mathematical transform module 820.


Referring to FIGS. 18-21, another example is illustrated of how the power spectrum may be used to compare images taken from authentic and inauthentic documents. FIG. 18 illustrates a sample image taken from an authentic document. After applying a mathematical transform to the image, the power spectrum of FIG. 19 results. As can be seen from FIG. 19, the frequency that corresponds to the repeating line sequence in the background of FIG. 18 is located in the lower right quadrant of the power spectrum. FIG. 20 illustrates an image taken from an inauthentic document. After applying a mathematical transform to the image, the power spectrum of FIG. 21 results. As can be seen, the relevant frequency that should correspond to a repeating line sequence, and which should be found in the lower right quadrant, is missing from FIG. 21. Also, a frequency which is not present in the power spectrum of FIG. 19 is found in the upper right quadrant of FIG. 21. The presence of this unexpected frequency in the upper right quadrant and the absence of the expected frequency in the lower right quadrant are indicative of the absence of the repeating line sequence from the background of the image in FIG. 20.


It should be noted that the power spectrum of the reference image need not be stored in the database. Rather, the analyzed data from the reference power spectrum of the reference image is stored for comparison with the data gathered from the analysis of the power spectrum of the subject image. To analyze the results of the transform module 820, these results (in this case the power spectrum of the subject image) are received by the analysis module 830.


The analysis module 830 analyzes the results of the transform module 820 and produces a result that is mathematically comparable with the stored reference data. In the power spectrum example, the analysis module 830 determines which frequencies are present, which peaks are present in the power spectrum, and how many peaks there are in the spectrum. For this analysis, the subject power spectrum is filtered to remove frequencies outside a predetermined frequency range. Thus, frequencies outside the stored range of fmin and fmax are discarded. Then, a threshold is applied to the remaining frequencies—if a frequency value is below the stored threshold, then that frequency cannot be a peak. Once these conditions are applied, then the other peak conditions (the conditions which determine if a point on the power spectrum is a peak or not) are applied to the remaining points on the subject power spectrum. These peak conditions may be as follows with (x,y) being the coordinates for a point on the subject power spectrum:





Value(x, y) > Value(x−1, y)
Value(x, y) > Value(x+1, y)
Value(x, y) > Value(x, y−1)
Value(x, y) > Value(x, y+1)
Value(x, y) > Value(x−1, y−1)
Value(x, y) > Value(x+1, y+1)
Value(x, y) > Value(x−1, y+1)
Value(x, y) > Value(x+1, y−1)
Value(x, y) > Threshold
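The peak conditions listed above amount to requiring that a point exceed the threshold and each of its eight neighbours. A minimal sketch, assuming NumPy, a two-dimensional spectrum array, and that points on the border of the spectrum are skipped (an assumption, since the border case is not addressed above), is given below. The name local_peaks is hypothetical.

```python
import numpy as np

def local_peaks(spectrum, threshold):
    """Return (x, y) points that exceed the threshold and all eight neighbours."""
    s = np.asarray(spectrum, dtype=float)
    peaks = []
    for y in range(1, s.shape[0] - 1):
        for x in range(1, s.shape[1] - 1):
            value = s[y, x]
            if value <= threshold:
                continue                      # Value(x, y) > Threshold fails
            neighbours = s[y - 1:y + 2, x - 1:x + 2].copy()
            neighbours[1, 1] = -np.inf        # exclude the point itself
            if value > neighbours.max():      # strictly greater than all eight
                peaks.append((x, y))
    return peaks
```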


A minimum distance between peaks is also desired so that they may be differentiated. As such, an extra condition is applied to each potential peak:


IF Value(x,y) = peak
   RADIUS = (x−x1)^2 + (y−y1)^2
   IF (Value(x1, y1) = peak) AND (RADIUS < THRESHOLD_RADIUS)
      Value(x1,y1) is not peak
   END
END










With (x,y) being a point on the spectrum, (x1,y1) being another point on the spectrum, and THRESHOLD_RADIUS being the minimum desired distance between peaks, the above condition ensures that if two potential peaks are too close to one another, then the second potential peak cannot be considered a peak.
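The minimum-distance condition may be sketched as follows, taking a list of candidate peaks in scan order. The sketch compares squared distances against the square of THRESHOLD_RADIUS, which assumes that THRESHOLD_RADIUS is expressed as a plain distance; the pseudocode above compares the squared sum directly, so this is one interpretation only. The name enforce_min_distance is hypothetical.

```python
def enforce_min_distance(peaks, threshold_radius):
    """Drop any candidate peak lying too close to a peak that was kept earlier."""
    kept = []
    limit = threshold_radius ** 2
    for (x1, y1) in peaks:
        too_close = any((x - x1) ** 2 + (y - y1) ** 2 < limit
                        for (x, y) in kept)
        if not too_close:
            kept.append((x1, y1))
    return kept
```

The length of the returned list would then be the number of peaks reported for the spectrum.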


Once the above analysis is performed on the subject power spectrum, the number of peaks found is returned as the result of the analysis module 830. The reference power spectrum should have also undergone the same analysis and the number of peaks for the reference power spectrum may be stored in the database as the reference data.


After the number of peaks is found for the subject power spectrum, this result is received by the comparison module 860. The reference data from reference security documents, in this case the number of peaks for the reference power spectrum, is then retrieved by the data retrieval module 850 from the database 160 and is passed on to the comparison module 860. The comparison module 860 compares the reference data with the result from the analysis module 830 and the result is passed to the score generation module 870. The comparison module 860 quantifies how different the reference data is from the result received from the analysis module 830.


When the score generation module 870 receives the result of the comparison module, the score module 870 determines, based on predetermined criteria, a score to be given to the subject security document 140 relative to the feature being examined. As an example, if the reference data had 100 peaks while the subject spectrum only had 35 peaks, then the score module may give a score of 3.5 out of 10 based on the comparison module providing a difference of 65 between the reference data and the subject data. However, if it has been previously determined that a 50% correlation between two authentic documents is good, then the same 35 peaks may be given a score of 7 out of 10 (i.e. to double the raw score) to reflect the fact that a large correlation between the peak numbers is not expected. This score generation module 870 may also, depending on the configuration, take into account other user selected factors that affect the score but that may not be derived from the subject image or the type of security document (e.g. setting a higher threshold for documents from specific countries).
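The two scoring examples above may be reproduced with the following sketch. The parameter expected_match (the fraction of matching peaks considered good between two authentic documents) and the function name peak_score are hypothetical, and the linear scaling with a cap at the maximum score is an assumption consistent with the figures given.

```python
def peak_score(subject_peaks, reference_peaks, expected_match=1.0, max_score=10.0):
    """Score a subject peak count against a reference peak count."""
    if reference_peaks <= 0 or expected_match <= 0:
        raise ValueError("reference_peaks and expected_match must be positive")
    ratio = subject_peaks / reference_peaks
    # Scale by the match level expected of authentic documents, capped at max_score.
    return min(max_score, max_score * ratio / expected_match)
```

With 35 subject peaks and 100 reference peaks, this yields 3.5 out of 10 by default and 7 out of 10 when expected_match is set to 0.5.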


While the above examples use an FFT as the mathematical transform and a power spectrum signature as the representation of the characteristics of the feature being examined, other options are also possible. As an example, a color histogram of a specific region of the subject image may be generated by the mathematical transform module 820 while the analysis module 830 measures the various distributions of color within the resulting histogram. The distributions of color in the subject histogram would then be passed on to the comparison module 860 for comparison with the distributions of color from an authentic document. Clearly, the distributions of color from an authentic document would also have been generated or derived from a color histogram of a similar region in the authentic document. This method would be invariant to rotation in that regardless of the angle of the region being examined, the histogram would be the same.
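A color histogram of a region may be generated, for example, as sketched below. The sketch assumes NumPy, an array of shape (height, width, channels) with 8-bit values, and eight bins per channel; these choices and the name color_histogram are illustrative assumptions.

```python
import numpy as np

def color_histogram(region, bins=8):
    """Per-channel histogram of a region, normalized by the number of pixels."""
    region = np.asarray(region)
    pixels = region.reshape(-1, region.shape[-1])
    counts = [np.histogram(pixels[:, c], bins=bins, range=(0, 256))[0]
              for c in range(pixels.shape[1])]
    return np.concatenate(counts).astype(float) / pixels.shape[0]
```

Because the histogram ignores pixel positions, a region rotated by a multiple of 90 degrees yields exactly the same result; for other angles, resampling may introduce small differences.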


Similarly, a pattern or contour matching based histogram may also be used to compare the features of a reference document with a subject document. Once a specific feature of the security document has been localized, the contour of that feature (e.g. a maple leaf design, an eagle design, or a crest design) may be obtained by applying any number of edge detector operators by way of the mathematical transform module 820. With the contour now clearly defined, the analysis module 830 can then follow this contour and measure the number of turns of the contour line in all the eight possible directions. A histogram of the turns can then be generated and normalized by subtracting the average value of the turns from every point of the histogram. The resulting normalized histogram of the contour changes would therefore be scale independent. Histograms for a specifically shaped feature should therefore be the same regardless of the size (or scale) of the feature. Thus, a large maple leaf feature should have the same histogram as a smaller maple leaf feature as long as the two features have the same shape. Thus, the details regarding a normalized contour histogram of a feature with a specific shape or pattern from a reference security document can be stored in the database (e.g. the distribution of the directions of the contours or other distinguishing characteristics of the reference histogram). This reference histogram can then be compared to the normalized contour histogram of a similar feature in a subject security document as produced by the mathematical transform module 820. The subject histogram can then be analyzed by the analysis module 830 to produce its distinguishing characteristics. The distinguishing characteristics of the subject histogram and of the reference histogram can then be compared by the comparison module 860.
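One possible reading of the contour histogram is sketched below. It assumes NumPy, a closed contour supplied as an ordered list of (x, y) points in which consecutive points are 8-connected neighbours, and it interprets a "turn" as a change in step direction, counted under the direction turned into. This interpretation, the ordering of the eight directions, and the name turn_histogram are assumptions.

```python
import numpy as np

# Eight possible step directions, indexed 0..7 (ordering is an assumption).
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]

def turn_histogram(contour):
    """Histogram of direction changes along a closed 8-connected contour."""
    n = len(contour)
    codes = []
    for i in range(n):
        (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
        codes.append(DIRECTIONS.index((x1 - x0, y1 - y0)))
    hist = np.zeros(8)
    for i in range(n):
        if codes[i] != codes[i - 1]:   # codes[-1] wraps around the closed contour
            hist[codes[i]] += 1        # a turn into direction codes[i]
    # Normalize by subtracting the average value from every bin.
    return hist - hist.mean()
```

Under this reading, a square contour produces the same histogram whatever its side length, since it always contains exactly four turns.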


It should be noted that the above methods may also be used to extract and compare not only the clearly visible features of a security document (e.g. microprinting, color of specific area, identifying indicia such as the maple leaf design) but also non-visible and hidden features as well. As noted above, the scanner may be used to properly illuminate the subject document and reveal the presence (or absence) of security features embedded on the security document. The above-noted invention may be used to compare features that can be digitally scanned to provide a digital image. The scanner may be any suitable type of imaging device.


As noted above, inauthentic documents or documents which are known forgeries may also be used as reference documents. Known features of inauthentic documents may be used as the reference against which subject documents are judged or compared. One example of such a feature is a hidden pattern in an authentic document that appears if the authentic document is copied or otherwise improperly used. Referring to FIG. 22, an image of a background of an authentic document is illustrated. If this authentic document is copied in a conventional manner (e.g. by way of a photocopier), a hidden pattern, illustrated in FIG. 23, appears. The image of the hidden pattern (the word VOID in the example) may be used as the reference image which will be processed and against which the subject document is compared. As explained above, if the subject document's feature closely correlates with the feature of the inauthentic document (such as the image in FIG. 23), then this increases the possibility that the subject document is inauthentic. Thus, instead of using the invention to determine the presence of features expected in authentic documents, the invention may also be used to determine the presence of features expected in inauthentic documents.


The above options may all be used together to arrive at different scores for different features on the same security document. These different scores may then be used to arrive at an aggregate or a weighted overall score for the subject security document. As noted above, the aggregate or weighted overall score may then be provided to an end user as an aid to determine whether the subject security document is authentic or not. Referring to FIG. 24, a block diagram or flowchart of the generalized steps taken in the process explained above is illustrated. Beginning at step 900, the process starts with the generation of a digital image of the security document to be examined for features. This step is executed in conjunction with the scanner that actually scans and obtains the digital image of the document or page under examination.


In the next step, step 910, the feature to be examined is localized and/or detected. This step is performed by the feature/localization module 810, which determines where the feature to be examined is in the document by searching the document for a match with a reference image of the feature.


Step 920 is executed after the feature is localized/detected. Step 920 applies a mathematical transform to the image of the localized feature by way of the mathematical transform module 820. The transform may be the application of an FFT, the application of an edge detector operator, generating a histogram (color or contour) of the feature, or the application of any other mathematical or image processing method.


Step 930 analyzes the data/image/histogram generated by the transform module 820. The analysis extracts the useful data from the transform module's result and this analysis can take various forms. From the examples given above, the analysis may take the form of determining distances between elements in the histogram, determining the number, height, and/or presence of peaks in a power spectrum, and any other analysis that extracts the identifying characteristics of the result from the transform module 820. These identifying characteristics or metrics should be easily quantifiable and should be easy to compare mathematically with reference data stored in the database.


Step 940 provides the metrics from the analysis to the comparison module 860 to determine how quantifiably similar or different the feature of the subject document is from reference data. This step may also include retrieving the reference data from the database.


Step 950 compares the metrics from the feature of the subject document with the reference data from the database. The comparison may be as simple as subtracting one number from another such that if there is an exact match, then the result should be zero. Results other than zero would indicate a less than perfect match. Alternatively, the comparison step 950 may determine a percentage that indicates how different the two data sets being compared are. From the above example of 35 peaks for the subject document and 100 peaks for the reference data, the comparison step could provide a result that notes that there is a 65% incompatibility or non-match between the two results.
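Both forms of comparison mentioned for step 950 may be sketched as follows. The function name compare_counts is hypothetical, and expressing the percentage relative to the reference value is an assumption consistent with the 65% example.

```python
def compare_counts(subject_value, reference_value):
    """Return the raw difference and the percentage non-match versus the reference."""
    if reference_value == 0:
        raise ValueError("reference_value must be non-zero")
    difference = reference_value - subject_value   # zero indicates an exact match
    percent_mismatch = abs(difference) / abs(reference_value) * 100.0
    return difference, percent_mismatch
```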


Step 960 generates the final score indicative of a similarity or non-similarity between the subject feature and the reference data derived from the reference feature. As noted above, this step may take into account user or system mandated preferences that would affect the final score.


The final step 970 is that of presenting the final score to the end user as an aid to determining if the subject security document is authentic or not. It should be noted that this final step may include aggregating and/or weighting the scores of multiple different features tested/compared on the subject security document prior to providing a final score to the user.
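The aggregation of several feature scores into one overall score may be sketched as a weighted average. The weighting scheme, the feature names used in the example, and the function name aggregate_score are assumptions; other aggregation rules may equally be used.

```python
def aggregate_score(feature_scores, weights=None):
    """Weighted average of per-feature scores; equal weights when none are given."""
    if not feature_scores:
        raise ValueError("at least one feature score is required")
    if weights is None:
        weights = {name: 1.0 for name in feature_scores}
    total_weight = sum(weights[name] for name in feature_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(feature_scores[name] * weights[name]
                       for name in feature_scores)
    return weighted_sum / total_weight
```

For instance, scores of 3.5 for a microprinting feature and 8.0 for a color feature average to 5.75 with equal weights, and to 4.625 when the microprinting feature is weighted three times as heavily.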


Embodiments of the method explained above can be implemented as a computer program product for use with a computer system. Such implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk) or transmittable to a computer system, via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or electrical communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques). The series of computer instructions embodies all or part of the functionality previously described herein. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server over the network (e.g., the Internet or World Wide Web). Of course, some embodiments of the invention may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments of the invention may be implemented as entirely hardware, or entirely software (e.g., a computer program product).


Although various exemplary embodiments of the invention have been disclosed, it should be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the true scope of the invention.


A person understanding this invention may now conceive of alternative structures and embodiments or variations of the above all of which are intended to fall within the scope of the invention as defined in the claims that follow.

Claims
  • 1. A method of comparing a feature associated with a security document with a similar reference feature associated with a reference security document, the method comprising: gathering comparison data regarding said feature associated with said security document; retrieving reference data from a database, said reference data regarding said reference feature being gathered from said reference security document; and correlating said reference data from said database with said comparison data to result in a calculated score, said score being indicative of a level of similarity between said reference data and said comparison data.
  • 2. A method according to claim 1 further comprising the step of presenting said score to a user for use as an aid in determining an authenticity of said security document.
  • 3. A method according to claim 1 wherein said step of gathering comparison data comprises the steps of generating a digital image of said feature; and applying a mathematical transform to said digital image to result in a representation of characteristics of said feature, said comparison data being derived from said representation of characteristics.
  • 4. A method according to claim 3 wherein said representation of characteristics of said feature is a histogram.
  • 5. A method according to claim 3 wherein said representation of characteristics of said feature is a map of a spectrum of said digital image.
  • 6. A method according to claim 5 wherein said reference data is derived from a reference map of a spectrum of a digital image of said reference feature in said reference security document.
  • 7. A method according to claim 3 wherein said representation of characteristics of said feature is selected from: a power spectrum signature; a color histogram; a pattern matching based histogram; a contour matching based histogram.
  • 8. A method according to claim 3 wherein said feature is illuminated prior to said step of generating a digital image of said feature, said feature being illuminated by an illumination source such that said feature is exposed to at least one type of radiation.
  • 9. A method according to claim 8 wherein said at least one type of radiation is selected from: ultraviolet A (UV-A); ultraviolet B (UV-B); infrared light; red light; blue light; white light; green light.
  • 10. A method according to claim 3 wherein said feature is localized in said security document subsequent to said step of generating said digital image of said feature.
  • 11. A method according to claim 10 wherein said feature is localized using normalized cross-correlation.
  • 12. A system for comparing a feature associated with a security document with a similar reference feature associated with a reference security document, the system comprising: a database for storing reference data regarding said reference feature associated with said reference security document; data gathering means for gathering comparison data regarding said feature associated with said security document; data processing means for processing said comparison data and for comparing processed comparison data with said reference data from said database, said data processing means receiving comparison data from said data gathering means and receiving reference data from said database.
  • 13. A system according to claim 12 wherein said data gathering means generates a digital image of said feature.
  • 14. A system according to claim 13 wherein a mathematical transform is applied to said digital image to result in a representation of characteristics of said feature, said comparison data being derived from said representation of characteristics.
  • 15. A system according to claim 12 wherein said data gathering means comprises an imaging device.
  • 16. A system according to claim 12 wherein said data gathering means comprises an illumination source for illuminating said security document with at least one type of radiation.
  • 17. A system according to claim 16 wherein said at least one type of radiation is selected from: ultraviolet A (UV-A); ultraviolet B (UV-B); infrared light; red light; blue light; white light; green light.
  • 18. A system according to claim 15 wherein said representation of characteristics of said feature is selected from: a power spectrum signature; a color histogram; a pattern matching based histogram; a contour matching based histogram.
  • 19. Computer readable media having embodied thereon computer instructions for executing a method of comparing a feature associated with a security document with a similar reference feature associated with a reference security document, the method comprising: gathering comparison data regarding said feature associated with said security document; retrieving reference data from a database, said reference data regarding said reference feature being gathered from said reference security document; and correlating said reference data from said database with said comparison data to result in a calculated score, said score being indicative of a level of similarity between said reference data and said comparison data.
  • 20. Computer readable media according to claim 19 wherein said step of gathering comparison data comprises the steps of generating a digital image of said feature; and