The field of the instant invention is optical character recognition technology, frequently abbreviated as “OCR,” that is, technology used to convert images of typed, handwritten, or printed text into properly translated, machine-encoded text for use in electronic data processing environments.
Optical character recognition technology is used to scan images and to extract text and numbers from them. Although OCR technology can scan such images, extracting meaningful information and preserving the context of the scanned content is challenging because traditional OCR processes images and text using a fixed, line-by-line approach. In practical terms, while traditional OCR can often read images and alphanumeric text, it has difficulty interpreting the data it processes and assigning the correct context to that data. This failure to take context into account is a problem that the prior art in the OCR field does not solve, but that the instant invention does.
The instant invention, as further described herein, encompasses a novel method and set of algorithms for use with OCR technology, hereinafter referred to from time to time as “Smart OCR.” The method uses these algorithms to capture data from documents based on customized, dynamic virtual templates that maintain the correct context of the scanned data. Smart OCR reads and stores data by scanning for block headers defined in the template, ensuring that the extracted data keeps the same context it had in the scanned image. Virtual templates designed and managed exclusively by this system are a key part of Smart OCR. The system encompasses templates for, without limitation, state driver licenses, passports, earnings statements, and bank statements. With Smart OCR, data is not just read; it is also correctly interpreted based on the type of image from which it was captured. This correct interpretation is especially useful when, for example, a landlord verifies employment and wage data that a prospective tenant has submitted by uploading a recent pay stub, or when an applicant's identity is verified by analyzing an identification document (ID) the applicant has uploaded.
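The header-driven reading described above can be illustrated with a minimal Python sketch. It assumes an upstream OCR engine has already returned the document's text lines in reading order; `VirtualTemplate` and `extract_blocks` are hypothetical names, not part of any OCR library, and a real virtual template would carry far more than a list of header strings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VirtualTemplate:
    """Hypothetical virtual template: a document type and its block headers."""
    doc_type: str
    block_headers: List[str]


def extract_blocks(ocr_lines: List[str], template: VirtualTemplate) -> Dict[str, List[str]]:
    """Assign each OCR line to the most recent template block header above it.

    Lines that appear before any recognized header have no block context in
    this simplified model and are skipped.
    """
    # Case-insensitive lookup from the OCR spelling to the template's header name.
    headers = {h.upper(): h for h in template.block_headers}
    blocks: Dict[str, List[str]] = {h: [] for h in template.block_headers}
    current: Optional[str] = None
    for line in ocr_lines:
        text = line.strip()
        if text.upper() in headers:
            current = headers[text.upper()]  # a new block begins here
        elif current is not None and text:
            blocks[current].append(text)  # the value keeps its block context
    return blocks


paystub = VirtualTemplate("earnings_statement", ["Employee", "Employer", "Net Pay"])
lines = ["ACME PAYROLL", "EMPLOYEE", "Jane Doe", "12 Main St",
         "Employer", "Acme Corp", "NET PAY", "1,234.56"]
print(extract_blocks(lines, paystub))
# {'Employee': ['Jane Doe', '12 Main St'], 'Employer': ['Acme Corp'], 'Net Pay': ['1,234.56']}
```

Because "1,234.56" is filed under "Net Pay" rather than returned as a bare string, downstream logic knows what the number means, which is the context a purely line-by-line reading loses.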
A template is effectively a virtual blueprint for a standardized document type, which allows a document to be mapped. For example, one such template is for a generic earnings statement. That template contains document attributes in standardized locations, such as the block of information about the employee (name and address), the block of information about the employer, the block of information about beginning, ending, or current pay dates, a section on earnings (for the given pay period and year to date), deductions (statutory, taxable, and non-taxable withholdings), and net pay, among other things. When a user uploads a document matching this format, the matched template, based on its map of this document type and the keywords identified for it, indicates where to find each information attribute and instructs the system exactly how to process the information read via Smart OCR. In this way, these templates are an important aspect of Smart OCR.
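A template that records where each attribute sits and how to process it can be sketched as a list of attribute specifications. This is a minimal illustration under stated assumptions: the OCR engine supplies each word with a page-normalized center point, and the names `AttributeSpec`, `Word`, `map_attributes`, the region coordinates, and the parsing rules are all hypothetical rather than taken from the invention.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class AttributeSpec:
    """Hypothetical template entry: where an attribute sits and how to read it."""
    name: str
    region: Tuple[float, float, float, float]  # (x0, y0, x1, y1), fractions of page size
    parse: Callable[[str], Any]


@dataclass
class Word:
    """One OCR word with its center point normalized to the page (0 to 1)."""
    text: str
    x: float
    y: float


def parse_money(s: str) -> Decimal:
    # Strip currency symbols and thousands separators before exact decimal parsing.
    return Decimal(s.replace("$", "").replace(",", ""))


def parse_date(s: str) -> date:
    return datetime.strptime(s, "%m/%d/%Y").date()


def map_attributes(words: List[Word], specs: List[AttributeSpec]) -> Dict[str, Any]:
    """Collect the words inside each attribute's region and parse them."""
    result: Dict[str, Any] = {}
    for spec in specs:
        x0, y0, x1, y1 = spec.region
        inside = [w.text for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        if inside:
            result[spec.name] = spec.parse(" ".join(inside))
    return result


earnings_template = [
    AttributeSpec("pay_date", (0.6, 0.0, 1.0, 0.15), parse_date),
    AttributeSpec("net_pay", (0.6, 0.85, 1.0, 1.0), parse_money),
]
words = [Word("Jane", 0.1, 0.1), Word("06/01/2018", 0.8, 0.1), Word("$1,234.56", 0.8, 0.9)]
print(map_attributes(words, earnings_template))
# {'pay_date': datetime.date(2018, 6, 1), 'net_pay': Decimal('1234.56')}
```

The parser attached to each attribute plays the role of the template's processing instructions: the same string of digits becomes a date in one region and a currency amount in another. Fixed regions like these are simple but brittle if the scan is shifted or rescaled.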
The system of the present invention, utilizing Smart OCR, recognizes and automatically reads identity, income, and other consumer documents to help automate processes such as identity verification and income verification, which are performed manually in the prior art. Applying traditional OCR to complex documents, such as proof of identity or proof of income, simply cannot work: while OCR technology can read words and numbers, prior art technology cannot provide any context for the characters being read. For example, a traditional OCR scanner has no ability to understand where exactly a last name appears on an NJ driver's license as opposed to an NY driver's license or a passport, nor can it understand where to find pay-period gross and net earnings on any kind of standardized proof-of-income document. Smart OCR solves these problems by mapping each scanned document against a template image; once the template is matched using identifiers and header information, among other things, the characters read by Smart OCR yield clear, contextual information, which is then presented back to the user.
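Matching a scanned document to its template using identifiers and header information might, in its simplest form, be a keyword-coverage score. The sketch below assumes the full OCR text is available as one string; the keyword sets, the `min_score` threshold, and the function names are hypothetical, and a real system would also weigh layout cues rather than rely on text alone.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical identifier keywords per template; a production system would
# curate these (and layout cues) for every supported document type.
TEMPLATE_KEYWORDS: Dict[str, Set[str]] = {
    "nj_driver_license": {"new jersey", "motor vehicle commission", "driver license"},
    "ny_driver_license": {"new york state", "driver license", "dmv"},
    "earnings_statement": {"earnings statement", "gross pay", "net pay", "year to date"},
}


def normalize(text: str) -> str:
    """Lower-case and collapse runs of whitespace, which OCR output often contains."""
    return re.sub(r"\s+", " ", text.lower())


def match_template(ocr_text: str, min_score: float = 0.5) -> Optional[str]:
    """Return the template whose keywords best cover the text.

    The score is the fraction of a template's keywords found in the text;
    None is returned when no template reaches min_score.
    """
    text = normalize(ocr_text)
    best_name: Optional[str] = None
    best_score = 0.0
    for name, keywords in TEMPLATE_KEYWORDS.items():
        score = sum(k in text for k in keywords) / len(keywords)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_score else None


print(match_template("NEW JERSEY\nMotor  Vehicle Commission\nDRIVER LICENSE"))
# nj_driver_license
```

Scoring by the fraction of keywords found, rather than the raw count, keeps templates with many keywords from winning merely because they have more chances to match; both driver's license templates share "driver license", so the state-specific identifiers decide the match.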
The system of the instant invention implements a method of fraud detection using a combination of Smart OCR and document orientation and feature analysis. The system compares a presented document against known templates based on the format and design of the standard document, displayed logos (if applicable), the indentation and font structure of different sections of the document, numerical calculations, and validation of mandatory document attributes or, in one particular use, statutory withholdings (for proof-of-income documents).
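For an earnings statement, one plausible "numerical calculation" check is that net pay equals gross pay minus total deductions, combined with a check that mandatory attributes are present. The sketch below assumes the fields have already been extracted and parsed into `Decimal` values; the `MANDATORY` list, the one-cent `tolerance`, and the function name are hypothetical, and jurisdiction-specific statutory withholding rules are not modeled.

```python
from decimal import Decimal
from typing import Any, Dict, List

# Hypothetical list of attributes an earnings statement must carry.
MANDATORY = ("employee_name", "employer_name", "pay_date", "gross_pay", "net_pay")


def validate_earnings(fields: Dict[str, Any], deductions: List[Decimal],
                      tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = [f"missing {name}" for name in MANDATORY if fields.get(name) in (None, "")]
    gross, net = fields.get("gross_pay"), fields.get("net_pay")
    if isinstance(gross, Decimal) and isinstance(net, Decimal):
        # Net pay should equal gross pay minus all deductions, to within one cent.
        expected = gross - sum(deductions, Decimal("0"))
        if abs(expected - net) > tolerance:
            problems.append(f"net pay {net} != gross minus deductions {expected}")
    return problems


stub = {"employee_name": "Jane Doe", "employer_name": "Acme Corp", "pay_date": "06/01/2018",
        "gross_pay": Decimal("2000.00"), "net_pay": Decimal("1500.00")}
deds = [Decimal("300.00"), Decimal("200.00")]
print(validate_earnings(stub, deds))                                   # []
print(validate_earnings({**stub, "net_pay": Decimal("1600.00")}, deds))
# ['net pay 1600.00 != gross minus deductions 1500.00']
```

`Decimal` is used instead of `float` so that currency arithmetic is exact and a one-cent tolerance means exactly one cent.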
In the instant invention for use with OCR technology, referred to from time to time as “Smart OCR,” the system and method use algorithms to capture data from documents based on customized, dynamic virtual templates that maintain the correct context of the scanned data. Smart OCR reads and stores data by scanning for block headers defined in a template, ensuring that the extracted data keeps the same context it had in the scanned image. This system encompasses templates for, without limitation, state driver licenses, passports, earnings statements, and bank statements.
In prior art systems, data captured by OCR is based on position mapping: OCR captures data found at fixed positions within a document. With traditional OCR, if the uploaded document has moved, so that it appears skewed or at a different scale, OCR fails to capture the correct data. Document movement also refers to the fact that key document attributes can appear in slightly different locations on different documents that share the same underlying format, which causes a traditional OCR system to fail. The solution of this invention maps and tags document attributes so that even if a given attribute appears in a different location than it does on the reference document, the system can still process that attribute correctly and with the appropriate context.
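One way (not necessarily the inventors' way) to make attribute lookup survive document movement is to tag each attribute with an offset from a nearby header "anchor," expressed in multiples of the anchor's text height. The value is then found relative to wherever the anchor actually appears. This sketch handles translation and uniform scaling only; it assumes rotation (skew) has already been corrected by an upstream deskew step, and the names `Box`, `AnchoredAttribute`, `locate`, and `max_dist` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """One OCR text box: its text, top-left corner, and height in pixels."""
    text: str
    x: float
    y: float
    h: float


@dataclass
class AnchoredAttribute:
    """Hypothetical tag: an attribute located relative to a header anchor.

    dx and dy are measured in multiples of the anchor's text height, so they
    stay valid when the whole page is shifted or uniformly rescaled.
    """
    name: str
    anchor: str
    dx: float
    dy: float


def locate(boxes: List[Box], attr: AnchoredAttribute, max_dist: float = 1.0) -> Optional[str]:
    """Find the anchor, predict where the value should be, return the nearest box."""
    anchor = next((b for b in boxes if b.text.upper() == attr.anchor.upper()), None)
    if anchor is None:
        return None
    tx = anchor.x + attr.dx * anchor.h
    ty = anchor.y + attr.dy * anchor.h
    others = [b for b in boxes if b is not anchor]
    if not others:
        return None
    best = min(others, key=lambda b: math.hypot(b.x - tx, b.y - ty))
    # Reject candidates too far from the prediction (also in anchor-height units).
    if math.hypot(best.x - tx, best.y - ty) > max_dist * anchor.h:
        return None
    return best.text


net_pay = AnchoredAttribute("net_pay", anchor="NET PAY", dx=0.0, dy=1.5)
original = [Box("Jane Doe", 50, 100, 20), Box("NET PAY", 500, 800, 20), Box("1,234.56", 500, 830, 20)]
# The same page scaled by 1.5x and shifted (x' = 1.5x + 40, y' = 1.5y + 60).
moved = [Box("Jane Doe", 115, 210, 30), Box("NET PAY", 790, 1260, 30), Box("1,234.56", 790, 1305, 30)]
print(locate(original, net_pay), locate(moved, net_pay))  # 1,234.56 1,234.56
```

A fixed pixel position recorded from the original page, (500, 830), would miss the value on the moved page entirely, while the anchor-relative offset finds it on both.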
As shown in the flowchart of
As in step A of
The flowchart of
Template analysis under the system described hereinabove supports a high level of automatic fraud detection. By using Smart OCR and machine learning to facilitate template comparison, submitted documents are automatically and internally compared against standard authentic documents based on attributes of those authentic documents, which may include: the format and design of the standard authentic document; the logos displayed on the standard authentic documents, including aspects such as logo size, logo color, and relative positioning of logos; the indentation and font structure of different sections of the standard authentic document; and numerical validation of calculations and validation of mandatory document attributes or statutory withholdings, if applicable. Based on these attributes, the system generates a document authenticity score that enables the user to determine easily whether the document provided as evidence is authentic. Using the system and method described in this application, fraud detection becomes an automatic process and is therefore quick and simple.
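The authenticity score could be formed in many ways; the sketch below uses a weighted mean of per-check similarity scores, each in [0, 1], reported on a 0 to 100 scale. It also shows one possible logo check built from the size, color, and relative-position aspects listed above. The weights, the `Logo` representation, and all function names are hypothetical stand-ins; in a machine-learning setting the weights might instead be learned from labeled authentic and fraudulent documents.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Logo:
    """Measured logo properties (hypothetical representation)."""
    width: float                 # fraction of page width
    color: Tuple[int, int, int]  # dominant RGB color
    x: float                     # center, normalized to [0, 1]
    y: float


def logo_similarity(seen: Logo, ref: Logo) -> float:
    """Similarity in [0, 1] averaged over logo size, color, and relative position."""
    size = max(0.0, 1 - abs(seen.width - ref.width) / ref.width)
    color = 1 - math.dist(seen.color, ref.color) / math.dist((0, 0, 0), (255, 255, 255))
    position = max(0.0, 1 - math.dist((seen.x, seen.y), (ref.x, ref.y)) / math.sqrt(2))
    return (size + color + position) / 3


# Hypothetical weights; they could instead be tuned or learned from labeled documents.
WEIGHTS: Dict[str, float] = {"layout": 0.25, "logo": 0.2, "fonts": 0.2, "calculations": 0.35}


def authenticity_score(check_scores: Dict[str, float]) -> float:
    """Weighted mean of per-check scores in [0, 1], on a 0 to 100 scale.

    A check that was not run counts as 0, which pulls the score down.
    """
    total = sum(WEIGHTS.values())
    return 100 * sum(w * check_scores.get(k, 0.0) for k, w in WEIGHTS.items()) / total


ref_logo = Logo(width=0.2, color=(0, 51, 153), x=0.1, y=0.05)
seen_logo = Logo(width=0.2, color=(0, 51, 153), x=0.1, y=0.05)
scores = {"layout": 0.9, "logo": logo_similarity(seen_logo, ref_logo),
          "fonts": 0.8, "calculations": 1.0}
print(round(authenticity_score(scores), 1))  # 93.5
```

Treating a missing check as 0 rather than skipping it is a deliberately conservative choice: a document cannot score well simply because some attributes could not be evaluated.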
It should be appreciated that the description of any certain embodiment of the instant invention as set forth herein should not be construed as the sole manner of practicing said invention nor as a limitation on the invention as claimed hereby, coverage of which hereunder shall include the many variations explicitly or implicitly described in this specification.
This application claims priority from U.S. Provisional Application No. 62/684,299 filed on Jun. 13, 2018.
Application Number | Date | Country
---|---|---
62684299 | Jun 2018 | US