The present invention relates generally to the detection and prevention of document fraud on the basis of an image of the document received from sources such as mobile phone cameras, scanners etc. More particularly, the invention relates to a counterfeit document detection system and method which can be used for any document, such as a check, healthcare record etc.
With rapid changes and advances in document creation, scanning and copying technology, the problems relating to fraudulent documents have increased dramatically. To mitigate the risks of fraudulent document creation, scanning and copying, technology has been created that evaluates checks, healthcare records etc. for counterfeiting based on security techniques built into these documents. For example, a VOID/COPY Pantograph configured in the background of a check can prevent fraudulent copying.
The counterfeit detection and prevention processes available today are not adequate to detect document fraud carried out by scanning and editing images using sophisticated, high resolution scanners and printers. Documents are mostly evaluated manually, which does not provide enough protection to users. Advances in document alteration technology using editors such as Photoshop, Corel Draw etc. oftentimes make revisions of documents nearly impossible to catch. This is especially the case where the alteration cannot easily be caught with the naked eye.
Document processing in financial institutions, banks, insurance companies etc. is oftentimes automated to handle large volumes. Manual document evaluation, as mentioned above, is a very difficult, costly, cumbersome, time consuming and inefficient way of detecting counterfeit documents.
In view of the foregoing, there is a need in the art for a system and method for accurately and automatically detecting counterfeit documents.
In a first aspect of the invention, a method is provided for identifying the Region of Interest (ROI) in which the Pantograph is located, from an input image received from any image source such as a scanner, mobile device or image editor. The method comprises the step of ascertaining a specified area
a) by determining the region having the largest density of minority pixels, which is then extracted from the initial input image, or
b) by shape, or
c) by size/scale.
The ROI obtained from the above step is then cropped from the original image, and the cropped image is converted to gray scale if it is not already provided in that form.
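By way of illustration only, option a) above, followed by the crop and gray scale conversion, might be sketched as follows. The function names, the fixed window size, the threshold of 128 and the gray scale weights are assumptions made for this sketch and are not taken from the disclosure; images are plain Python lists so that the example has no dependencies.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to gray levels 0-255 (BT.601 weights, assumed)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def densest_window(gray, win_h, win_w, threshold=128):
    """Return (top, left) of the win_h x win_w window holding the most minority pixels."""
    h, w = len(gray), len(gray[0])
    mask = [[1 if v < threshold else 0 for v in row] for row in gray]
    # The minority class is whichever of dark/light pixels occurs less often.
    if 2 * sum(map(sum, mask)) > h * w:
        mask = [[1 - m for m in row] for row in mask]
    # Summed-area table (extra zero row/column) gives each window count in O(1).
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        running = 0
        for x in range(w):
            running += mask[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + running
    best_count, best_pos = -1, (0, 0)
    for top in range(h - win_h + 1):
        for left in range(w - win_w + 1):
            bottom, right = top + win_h, left + win_w
            count = (sat[bottom][right] - sat[top][right]
                     - sat[bottom][left] + sat[top][left])
            if count > best_count:
                best_count, best_pos = count, (top, left)
    return best_pos


def crop(image, top, left, win_h, win_w):
    """Cut the chosen window out of the image."""
    return [row[left:left + win_w] for row in image[top:top + win_h]]
```

In this sketch the caller supplies the window size; selecting the ROI by shape or by size/scale, options b) and c), is not covered.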
In a second aspect of the invention is provided a computer program product comprising a computer useable medium having computer readable program code embodied therein for applying multi-channel filtering to the image obtained in the above step for texture analysis. The main issues involved in the multi-channel filtering approach to texture analysis are:
1) Functional characterization of the channels and the number of channels.
2) Extraction of appropriate texture features from the filtered images.
3) The relationship between channels (dependent vs. independent).
4) Integration of texture features from different channels to produce a segmentation.
Each (selected) filtered image is subjected to a bounded nonlinear transformation that behaves as a ‘blob detector’. The combination of multi-channel filtering and the nonlinear stages can be viewed as performing a multi-scale blob detection. Texture discrimination is associated with differences in the attributes of these blobs in different regions. A statistical approach is then used where the attributes of the blobs are captured by texture features defined by a measure of “energy” in a small window around each pixel in each response image. This process generates one ‘feature image’ corresponding to each filtered image (see
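By way of illustration only, one channel of such a pipeline might be sketched as follows: a band-pass filter, a bounded nonlinearity, and a local "energy" measure that yields one feature image. The disclosure does not name the filter or the nonlinearity; the even-symmetric Gabor kernel, the tanh function, the averaging window and all parameter values below are assumptions made for this sketch.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Even-symmetric Gabor kernel (size must be odd), shifted to zero mean."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    # Zero mean, so that flat (untextured) regions give no response.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def filter2d(image, kernel):
    """Correlate image with kernel, replicating border pixels."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, krow in enumerate(kernel):
                yy = min(max(y + ky - half, 0), h - 1)
                for kx, kv in enumerate(krow):
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += kv * image[yy][xx]
            out[y][x] = acc
    return out


def feature_image(image, kernel, alpha=0.25, window=5):
    """Filter, apply the bounded nonlinearity tanh(alpha * r), then average
    the absolute value in a small window around each pixel."""
    response = filter2d(image, kernel)
    squashed = [[abs(math.tanh(alpha * v)) for v in row] for row in response]
    box = [[1.0 / (window * window)] * window for _ in range(window)]
    return filter2d(squashed, box)
```

A filter bank would call `feature_image` once per kernel, for example for several values of `theta` and `wavelength`, producing one feature image per channel.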
In a third aspect of the invention is provided a computer program product comprising a computer useable medium having computer readable program code embodied therein for edge or object detection. Edge/object detection is the process of localizing pixel intensity transitions. The method uses a derivative approximation to find edges/objects. It therefore returns edges at those points where the gradient of the considered image is maximum. Derivative based approaches can be categorized into two groups, namely first and second order derivative methods. First order derivative based techniques depend on computing the gradient in several directions and combining the results. The value of the gradient magnitude and orientation is estimated using two differentiation masks, one vertical and one horizontal.
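By way of illustration only, a first order derivative estimate using two differentiation masks might be sketched as follows. The disclosure does not specify the masks; the 3x3 Sobel masks used here are an assumption, as are the function and variable names.

```python
import math

# Assumed masks (Sobel). SOBEL_X responds to intensity change along a row,
# SOBEL_Y to intensity change along a column.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(gray):
    """Return (magnitude, orientation) images; border pixels are left at 0."""
    h, w = len(gray), len(gray[0])
    magnitude = [[0.0] * w for _ in range(h)]
    orientation = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            magnitude[y][x] = math.hypot(gx, gy)      # combined gradient strength
            orientation[y][x] = math.atan2(gy, gx)    # gradient direction in radians
    return magnitude, orientation
```

Edges would then be taken at pixels where the magnitude is a local maximum or exceeds a chosen threshold.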
In a fourth aspect of the invention is provided a computer program product comprising a computer useable medium having computer readable program code embodied therein for partitioning a group of data points into a small number of clusters. Clustering proceeds as follows: first the number of clusters is decided; then a) the center of each cluster is initialized; b) each data point is assigned to its closest cluster; c) the position of each cluster is set to the mean of all data points belonging to that cluster; and d) steps b) and c) are repeated until convergence.
The algorithm stops when the assignments do not change from one iteration to the next.
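By way of illustration only, steps a) to d) and the stopping rule correspond to the familiar k-means procedure, which might be sketched as follows. The random initialization from the data points, the squared Euclidean distance and the iteration cap are assumptions made for this sketch; the disclosure does not specify them.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points (tuples of numbers) into k groups; return (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # a) initialize the cluster centers
    labels = None
    for _ in range(max_iter):
        # b) assign each data point to its closest cluster
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:               # stop: assignments did not change
            break
        labels = new_labels
        # c) move each center to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels
```

For an image, the data points could be per-pixel gray levels or feature vectors, with the number of clusters chosen in advance.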
At this stage the characters hidden under the Pantograph are visible on the image to the naked eye. If the document is a counterfeit created by scanning the original document and/or editing it with photo editing software such as Adobe Photoshop or Corel Draw, the image obtained will not display any characters. If characters are present, the document is determined to be either a genuine document or a counterfeit photocopy created using a sophisticated high-end photocopier/printer.
In a fifth aspect of the invention is provided a computer program product comprising a computer useable medium having computer readable program code embodied therein for converting the image obtained in the above step to a binary image by thresholding. Motion blur is then applied to the image. Thresholding is applied again to the output image obtained after motion blur. These steps are repeated to obtain the desired image output until no further iterations are possible. If the document is a genuine document, the image will display the characters hidden under the Pantograph, which can be viewed with the naked eye. If the document is a counterfeit document, no characters will be visible.
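By way of illustration only, the threshold, motion blur and re-threshold loop might be sketched as follows. The horizontal blur direction, the blur length, the threshold level, and the reading of "until no further iterations are possible" as "until the binary image stops changing" are all assumptions made for this sketch.

```python
def threshold(image, level=128):
    """Binarize: pixels at or above level become 255, all others 0."""
    return [[255 if v >= level else 0 for v in row] for row in image]


def motion_blur(image, length=5):
    """Horizontal motion blur: average `length` pixels along each row
    (length should be odd; border pixels are replicated)."""
    w = len(image[0])
    half = length // 2
    return [[sum(row[min(max(x + d, 0), w - 1)] for d in range(-half, half + 1)) / length
             for x in range(w)]
            for row in image]


def threshold_blur_until_stable(gray, level=128, length=5, max_rounds=20):
    """Repeat blur + threshold until the binary image no longer changes."""
    current = threshold(gray, level)
    for _ in range(max_rounds):
        blurred_then_binarized = threshold(motion_blur(current, length), level)
        if blurred_then_binarized == current:
            break
        current = blurred_then_binarized
    return current
```

Each round removes isolated specks and closes small gaps along the blur direction, so that the pixels of a hidden character tend to merge into larger connected regions.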
In a sixth aspect of the invention is provided a computer program product comprising a computer useable medium having computer readable program code embodied therein to determine whether the document is genuine or a counterfeit photocopy, and to make the characters machine readable using OCR. To achieve this, bounding boxes are drawn on the image obtained in the above step to detect the blobs. The bounding area of each bounding box is calculated to determine which bounding boxes are to be considered for confirming whether the document is genuine or counterfeit and for automated character reading using OCR. In the case of an image of a genuine document, large blobs are obtained, e.g. of size greater than 7000; these are considered, bounding boxes are drawn around them, and the gaps are filled to make the boxes ready for OCR reading. The letters/characters are detected and read using OCR and are returned to the image. In the case of a counterfeit photocopy document, blobs of large size are not obtained. When the bounding boxes are drawn and passed to the OCR engine, it will not return proper characters, confirming that the document is a counterfeit photocopy document.
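By way of illustration only, the blob detection and bounding-area filter might be sketched as follows. The use of 4-connected components, the interpretation of "size" as bounding-box area in pixels, and the function name are assumptions made for this sketch; the OCR step itself is not shown.

```python
from collections import deque


def large_blob_boxes(binary, min_area=7000):
    """Return bounding boxes (top, left, bottom, right) of connected blobs of
    nonzero pixels whose bounding-box area exceeds min_area."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill over one 4-connected blob.
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            top, left, bottom, right = y0, x0, y0, x0
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            area = (bottom - top + 1) * (right - left + 1)
            if area > min_area:
                boxes.append((top, left, bottom, right))
    return boxes
```

Under these assumptions, an empty result would suggest a counterfeit photocopy, while each returned box could be cropped, gap-filled and passed to an OCR engine.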
The foregoing and other features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention.
The preferred embodiments of this invention will be described in detail, with reference to the following figures, wherein like designations denote like elements, and wherein:
Although certain preferred embodiments of the present invention will be shown and described in detail, it should be understood that various changes and modifications may be made without departing from the scope of the appended claims. The scope of the present invention will in no way be limited to the number of constituting components, the materials thereof, the shapes thereof, the relative arrangement thereof, etc., which are disclosed simply to describe the preferred embodiment.
System preferably includes a memory, a central processing unit (CPU), input/output devices (I/O) and a bus. A database may also be provided for storage of data relative to processing tasks. Memory preferably includes a program product that, when executed by CPU, comprises various functional capabilities described in further detail below. Memory (and database) may comprise any known type of data storage system and/or transmission media, including magnetic media, optical media, random access memory (RAM), read only memory (ROM), a data object, etc. Moreover, memory (and database) may reside at a single physical location comprising one or more types of data storage, or be distributed across a plurality of physical systems. CPU may likewise comprise a single processing unit, or a plurality of processing units distributed across one or more locations. I/O may comprise any known type of input/output device including a network system, modem, keyboard, mouse, document scanner, check scanner, mobile phone, voice recognition system, CRT, printer, disc drives, etc. Additional components, such as cache memory, communication systems, system software, etc., may also be incorporated into system.
Document processing system may be implemented in a variety of forms. For example, document processing system may be a high speed, high volume document processing system such as found in institutional banks, or a system that processes single or multiple items using mobile phones etc. System, as recognized in the field, may include one or more networked computers, i.e., servers. In this setting, distributed servers may each contain only one application/system/module with the remainder of the applications/systems/modules resident on a centrally located server. In another embodiment, a number of servers may be present in a central location, each having different software applications resident therein. A server computer typically comprises an advanced midrange multiprocessor-based server, utilizing standard operating system software, which is designed to drive the operation of the particular hardware and which is compatible with other system components, and I/O controllers.
Alternatively, system may be implemented as a workstation such as a bank teller workstation. A workstation of this form may comprise, for example, an INTEL PENTIUM, Core i5 or AMD microprocessor, or a like processor, such as found in IBM, Lenovo or Dell PCs.
Memory of system preferably includes a program product that, when executed by CPU, provides various functional capabilities for system. As shown in
In the following discussion, it will be understood that the method steps discussed preferably are performed by a processor, such as CPU of system, executing instructions of program product stored in memory. It is understood that the various devices, modules, mechanisms and systems described herein may be realized in hardware, software, or a combination of hardware and software, and may be compartmentalized other than as shown. They may be implemented by any type of computer system or other apparatus adapted for carrying out the methods described herein. A typical combination of hardware and software could be a general-purpose computer system with a computer program that, when loaded and executed, controls the computer system such that it carries out the methods described herein. Alternatively, a specific use computer, containing specialized hardware for carrying out one or more of the functional tasks of the invention could be utilized. The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods and functions described herein, and which—when loaded in a computer system—is able to carry out these methods and functions. Computer program, software program, program, program product, or software, in the present context mean any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after the following: (a) conversion to another language, code or notation; and/or (b) reproduction in a different material form.
Turning to
As shown in an exemplary cropped ROI image in
It should be recognized that while the present invention will be described relative to a document having a pre-set texture/pattern, the invention is equally applicable to a document having a completely different background. In this situation, other methods for determining a specific Region of Interest are used.
In the exemplary check, the Region of Interest (ROI) includes a Pantograph area as shown in
Referring to
Imaging of document may be provided by an image scanner module of system or some other separate imaging system e.g. check scanner or mobile phone camera etc. Conversion of that image to a gray scale image is preferably conducted by a gray scale converter of system. A document identification is preferably gathered from each document by a document identifier. As known in the art, document may include an identification thereon so system may ascertain a variety of information about document. For instance, system can evaluate whether document is of a type for which evaluation is desired. In addition, if evaluation is desired, system can determine, inter alia, Region of Interest (ROI) on document and their respective predetermined pattern(s). For example, for check, the identification may indicate a rectangular box of Pantograph. Document information such as location of ROI present on document, etc., may be obtained by system from database, which may be subject to periodic updates. In one preferred embodiment, document processor periodically verifies predetermined patterns of ROI of documents used by document processor for use by system. Alternatively, if a system is used with a single type of document, document identification may be eliminated. In the case of check, an identification may be provided, for example, by some digits (
Again referring to
Now, the document is either a genuine document or a counterfeit document created by photocopying the original document with a high-end, high-resolution, sophisticated digital photocopier. To detect the counterfeit, the image obtained in the above step is converted to binary form using thresholding (
While this invention has been described in conjunction with the specific embodiments outlined above, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, the preferred embodiments of the invention as set forth above are intended to be illustrative, not limiting. Various changes may be made without departing from the spirit and scope of the invention as defined in the following claims.