1. Field of the Invention
Various embodiments described herein relate generally to the field of image processing. More particularly, various embodiments are directed in one exemplary aspect to processing an image of a receipt captured by a mobile device, identifying text fields and extracting relevant content therefrom.
2. Related Art
Mobile phone adoption continues to escalate, including ever-growing smart phone adoption and tablet usage. Mobile imaging is a discipline in which a consumer takes a picture of a document, and the document is then processed to extract and extend the data contained within it for selected purposes. The convenience of this technique is powerful and is currently driving a desire for this technology throughout Financial Services and other industries.
One document that consumers often encounter is a paper receipt for a purchase of goods or services. In addition to simply confirming a purchase, receipts are valuable for numerous reasons: returns or exchanges of merchandise or services, tracking of expenses and budgets, classifying tax-deductible items, verification of purchase for warranties, etc. Consumers therefore have numerous reasons to keep receipts and to organize them in the event they are needed. However, keeping track of receipts and organizing them properly is a cumbersome task. The consumer must first remember where the receipt was placed when the purchase was made and then keep track of it until arriving home to sort through it further. In the process, the receipt may be lost, ripped, faded or otherwise damaged to the point that it can no longer be read.
Systems and methods of capturing data from mobile images of receipts are provided herein.
Other features and advantages should become apparent from the following description of the preferred embodiments, taken in conjunction with the accompanying drawings.
Various embodiments disclosed herein are described in detail with reference to the following figures. The drawings are provided for purposes of illustration only and merely depict typical or exemplary embodiments. These drawings are provided to facilitate the reader's understanding and shall not be considered limiting of the breadth, scope, or applicability of the embodiments. It should be noted that for clarity and ease of illustration these drawings are not necessarily made to scale.
The various embodiments mentioned above are described in further detail with reference to the aforementioned figures and the following detailed description of exemplary embodiments.
The embodiments described herein are related to systems and methods for capturing an image of a receipt on a mobile device such as a smartphone or tablet and then identifying and processing various information within the receipt. One of the most important tasks behind the mobile receipt capture technology described herein is understanding and utilizing category-specific rules in the form of known document sizes, relationships between different document fields, etc. For example, knowledge that many receipts have 3-inch widths helps to alter an image to restore the actual size of a receipt, which in turn improves printing functions and, most importantly, the accuracy of content extraction steps such as optical character recognition.
Embodiments described herein focus on capturing the following fields: date, address and tendered amount. These fields appear on a majority of receipts and are important for an application that is designed to process mobile receipts and extract the content therein. Other fields can be identified using similar methods.
While several important fields on receipts can be captured using dynamic capture technology, as set forth in the '036 Application discussed above, the method for capturing the tendered amount is specifically applicable to receipts. Further details are provided below with regard to capturing tendered amounts.
The systems and methods described herein combine category-specific image and data capture technology with a specific workflow that allows a user to store images of receipts, choose types of receipts, convert currencies, automatically create expense reports, etc. The expense reports can be sent to the user's email account in multiple forms.
A method of identifying a size of the receipt and correcting a size of the image to match the size of the receipt (steps 40 and 50) is described herein. In a first step, the original bitonal snippet 30 is created, e.g., in accordance with the embodiments described in the '036 Application or in U.S. Pat. No. 7,778,457, entitled "Systems and Methods for Mobile Image Capture and Processing of Checks," which is also incorporated herein by reference as if set forth in full, after which a preliminary rotation is performed to fix vertical text. Since a majority of receipts are "vertical" (that is, the height is greater than the width), this preliminary rotation usually results in rotated snippets with an incorrect width-to-height ratio. Thus, in certain embodiments, a more accurate detection and correction of the vertical text is performed using connected components algorithms.
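By way of non-limiting illustration, the following sketch shows one way a connected components check for vertical text could be implemented. The use of OpenCV, the noise thresholds and the aspect-ratio heuristic are assumptions made for illustration only; they are not the exact rules of the described embodiments.

```python
import cv2
import numpy as np

def text_appears_vertical(bitonal: np.ndarray) -> bool:
    """Guess whether text runs vertically by comparing the median width and
    height of connected components (upright characters are usually taller
    than they are wide)."""
    # Extract components from the inverted image (dark text on light paper).
    _, _, stats, _ = cv2.connectedComponentsWithStats(255 - bitonal,
                                                      connectivity=8)
    widths = stats[1:, cv2.CC_STAT_WIDTH]      # label 0 is the background
    heights = stats[1:, cv2.CC_STAT_HEIGHT]
    # Discard tiny specks (noise) and very large blobs (logos, separators).
    keep = (heights > 4) & (heights < bitonal.shape[0] // 4)
    if not np.any(keep):
        return False
    # If most components are wider than tall, the text is likely rotated.
    return float(np.median(widths[keep])) > 1.3 * float(np.median(heights[keep]))
```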
Detection of upside-down text (step 60) can then be performed. If such text is detected, the image is rotated by 180 degrees. An accurate detection and correction of upside-down text can be done using Image Enhancement techniques, described for example in QuickFX API Interface Functions, Mitek Systems, Inc., which is incorporated herein by reference as if set forth in full. Using connected components analysis, all connected components (CCs) are found on the image created above.
A histogram analysis can then be applied to detect the most frequent CC widths. In case there is more than one candidate, additional logic is used to determine whether the most frequent values correspond to the size of a lowercase or capital letter character.
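A minimal sketch of such a histogram analysis follows; the 2-pixel bin size and the minimum-width cutoff are illustrative assumptions.

```python
from collections import Counter

def dominant_cc_width(widths, bin_px=2):
    """Histogram connected-component widths and return the most frequent
    value. Bucketing into small bins absorbs measurement noise."""
    counts = Counter(w // bin_px for w in widths if w > 2)
    if not counts:
        return None
    # With more than one dominant bin, the additional lowercase-vs-capital
    # logic described above would be needed to pick the character size.
    best_bin, _ = counts.most_common(1)[0]
    return best_bin * bin_px + bin_px // 2      # center of the winning bin
```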
The character width found above can then be compared to an expected width for a standard 3-inch receipt. If the width is approximately equal to the expected width, the grayscale and bitonal images are recreated using the known document width of 3 inches; if it is not close, the process skips to the next step. In the next step, the previously determined character width is compared to an expected width for an 11″×8.5″ page receipt. If the width is approximately equal to the expected width, the grayscale and bitonal images are recreated using a known document width of 8.5″ and known height of 11″. Once the size of the receipt in the image is matched as closely as possible to the original size, the text and other characters are in better proportion for capture using optical character recognition and other content recognition steps.
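A minimal sketch of this size-matching step follows. The per-character pitch of roughly 0.1 inch (typical of fixed-pitch receipt fonts), the 25% tolerance and the 200 DPI target are illustrative assumptions, not values taken from the disclosure.

```python
import cv2

CHAR_PITCH_IN = 0.1    # assumed pitch of a typical fixed-pitch receipt font
TARGET_DPI = 200       # assumed resolution for the recreated images

def match_known_width(gray, char_w_px, tol=0.25):
    """Test the 3-inch and 8.5 x 11-inch hypotheses against the measured
    character width and resample the image when one of them fits."""
    img_w_px = gray.shape[1]
    for doc_w_in in (3.0, 8.5):               # receipt roll, then letter page
        expected = CHAR_PITCH_IN * img_w_px / doc_w_in
        if abs(char_w_px - expected) <= tol * expected:
            # Hypothesis accepted: resample so the image is doc_w_in wide
            # at TARGET_DPI, restoring the receipt's physical proportions.
            scale = (doc_w_in * TARGET_DPI) / img_w_px
            resized = cv2.resize(gray, None, fx=scale, fy=scale,
                                 interpolation=cv2.INTER_AREA)
            return resized, doc_w_in
    return gray, None                          # neither hypothesis matched
```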
Bitonal image enhancements can include auto-rotation, noise removal and de-skew. Auto-rotation corrects image orientation from upside-down to right side up. In rare cases, the image is corrected from a 90- or 270-degree rotation (in which the text appears vertical).
With respect to step 80, the date field on receipts largely has the following format: <MM>/<DD>/<YY>, as shown in the accompanying figures.
After the date field is found, the system can be configured to try to parse it into individual Month, Day and Year components. Each component can then be tested against possible ranges (no more than 31 days in a month, no more than 12 months, etc.), and/or an alpha month is replaced by its numeric value. Date results that do not pass such interpretation are suppressed.
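The following sketch illustrates one way such parsing and range checking could be implemented; the specific alpha-month format handled (e.g., "Mar 15, 2013") is an illustrative assumption.

```python
import re

DATE_RE = re.compile(r'(\d{1,2})[/-](\d{1,2})[/-](\d{2,4})')
MONTH_NAMES = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
               "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

def interpret_date(field_text: str):
    """Split a candidate date field into Month/Day/Year components and
    suppress results that fail the range checks described above."""
    text = field_text.strip()
    # Replace an alpha month (e.g. "Mar 15, 2013") with its numeric value.
    for name, num in MONTH_NAMES.items():
        if text.lower().startswith(name):
            m = re.search(r'(\d{1,2}),?\s+(\d{2,4})', text)
            return (num, int(m.group(1)), int(m.group(2))) if m else None
    m = DATE_RE.search(text)
    if not m:
        return None
    month, day, year = (int(g) for g in m.groups())
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None                  # out-of-range results are suppressed
    return month, day, year
```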
The system can then be configured to search for the date field using a Fuzzy Matching technique, such as those described in U.S. Pat. No. 8,379,914 (the '914 Patent), entitled "Systems and Methods for Mobile Image Capture and Remittance Processing," which is incorporated herein by reference in its entirety as if set forth in full. Each found location of data can be assigned a format-based confidence, which reflects how closely the data in the found location matches the expected format. For example, the format-based confidence for "07/28/08" is 1000 (of 1000 max); the confidence of "a7/28/08" is 875 because 1 of 8 non-punctuation characters ("a") is inconsistent with the format. However, the format-based confidence of "07/2B/08" is higher (900-950) because 'B' is close to one of the characters allowed by the format ('8').
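A minimal sketch of a format-based confidence follows. The look-alike table and the partial-credit weight are illustrative assumptions chosen to mirror the examples above; the exact per-character accounting in the disclosure may differ.

```python
# Digits that certain letters visually resemble; partial credit for these
# look-alikes is an assumption made to mirror the examples in the text.
LOOKALIKES = {"B": "8", "S": "5", "O": "0", "l": "1", "Z": "2"}

def date_format_confidence(candidate: str) -> int:
    """Score a candidate against <MM>/<DD>/<YY> on a 0-1000 scale."""
    chars = [c for c in candidate if c not in "/-"]   # non-punctuation only
    if not chars:
        return 0
    score = 0.0
    for c in chars:
        if c.isdigit():
            score += 1.0              # fully consistent with the format
        elif c in LOOKALIKES:
            score += 0.6              # close to an allowed character
        # characters far from any digit earn no credit
    return int(1000 * score / len(chars))
```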
The date with the highest format-based confidence can then be returned in step 90.
With respect to step 100, United States address fields on receipts have a regular <Address> format, as illustrated in the accompanying figures.
Usually, addresses are printed as left-, right- or center-justified text blocks isolated from the rest of the document text by significant white margins. Based on this information, the system can detect potential address locations on a document by building a text block structure. In one embodiment, this is done by applying text segmentation features available in most OCR systems, such as the FineReader Engine by ABBYY.
In most US addresses, the bottommost line contains City/State/ZIP information. The system can utilize this knowledge by filtering out the text blocks found above that do not have enough alphabetic characters (to represent the City and State), do not contain any valid state (which is usually abbreviated to two characters), and/or do not contain enough numbers at the end to represent a ZIP code.
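By way of illustration, a simple filter for such bottom lines might look as follows; the regular expression and the (truncated) state list are assumptions, not the disclosure's exact rules.

```python
import re

# Truncated for brevity; a real system would include all USPS abbreviations.
US_STATES = {"AL", "AK", "AZ", "CA", "CO", "FL", "IL", "NY", "TX", "WA"}
CITY_STATE_ZIP = re.compile(
    r"^([A-Za-z .'\-]{3,}),?\s+([A-Z]{2})\s+(\d{5})(-\d{4})?\s*$")

def is_city_state_zip(line: str) -> bool:
    """Keep only candidate bottom lines with enough alphas for City/State,
    a valid two-letter state, and trailing digits for the ZIP code."""
    m = CITY_STATE_ZIP.match(line.strip())
    return bool(m) and m.group(2) in US_STATES

# is_city_state_zip("San Diego, CA 92101") -> True
# is_city_state_zip("Thank you for shopping") -> False
```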
Once address candidates are selected using the processes described, the system can build the entire address block starting with City/State/ZIP at the bottom line and including 1-3 upper lines as potential Name and Street Address components. Since the exact format of the address is often not well-defined (it may have 1-4 lines, be with or without a recipient name, be with or without a PO Box, etc.), the system can be configured to make multiple address interpretation attempts to achieve a satisfactory interpretation of the entire text block.
In order to compare OCR results with the data included in the Postal database, the Fuzzy Matching mechanism described above can be used. For example, if OCR reads "San Diego" as "San Dicgo" ('c' and 'e' are often misrecognized), Fuzzy Matching will produce a matching confidence above 80% between the two, which is sufficient to achieve the correct interpretation of the OCR result.
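A minimal sketch follows, using a generic edit-similarity measure as a stand-in for the '914 Patent's Fuzzy Matching mechanism, which may differ in detail.

```python
from difflib import SequenceMatcher

def fuzzy_confidence(ocr_text: str, reference: str) -> int:
    """Return a similarity score on the 0-1000 scale."""
    ratio = SequenceMatcher(None, ocr_text.lower(), reference.lower()).ratio()
    return int(1000 * ratio)

# fuzzy_confidence("San Dicgo", "San Diego") -> roughly 889, comfortably
# above the 80% threshold mentioned above.
```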
After the interpretation of the address block is achieved, the individual components can be corrected to become identical to those included in the Postal database. Optionally, discrepancies between the address printed on the receipt and its closest match in the Postal database can be corrected by replacing invalid, obsolete or incomplete data.
The system can be configured to assign a confidence value on a scale from 0 to 1000 to each address it finds. Such confidences could be assigned overall for the entire address block or individually to each address component (Recipient Name, Street Number, Apartment Number, Street Name, POBOX Number, City, State and Zip). Larger values indicate that the system is quite sure that it found, read and interpreted the address correctly. The component-specific confidence reflects the number of corrections required in that component. For example, if 1 out of 8 non-space characters was corrected in the "CityName" address component (e.g., "San Dicgo" vs. "San Diego"), a confidence of 875 may be assigned (1000*7/8). The overall confidence is a weighted linear combination of the individual component-specific confidences, where the weights are established experimentally.
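A minimal sketch of such a weighted combination follows. The component weights below are hypothetical placeholders; the text states only that the weights are established experimentally.

```python
# Hypothetical component weights (sum to 1.0 so the result stays on the
# 0-1000 scale); real values would be established experimentally.
WEIGHTS = {"street": 0.30, "city": 0.25, "zip": 0.30, "state": 0.15}

def overall_address_confidence(component_conf: dict) -> int:
    """Weighted linear combination of component confidences (each 0-1000).
    Example from the text: a CityName component with 1 of 8 non-space
    characters corrected contributes 1000 * 7 / 8 = 875."""
    return int(sum(w * component_conf.get(name, 0)
                   for name, w in WEIGHTS.items()))

# overall_address_confidence({"street": 1000, "city": 875,
#                             "zip": 1000, "state": 1000}) -> 968
```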
With respect to step 120, detecting an amount on a receipt is complicated by the presence of multiple amounts on the receipt. For example, the receipt shown in FIGS. 1 and 2A/2B contains five different amount fields.
The Tendered Amount field has a set of keyword phrases that allow the system to find (though not uniquely) the field's location on about 90% of receipts. In the remaining 10%, the keyword cannot be found due to some combination of poor image quality, use of a small font, inverted text, etc.
Some frequent keyword phrases include, for example, "Balance Due."
Among these keywords, the ones associated with charging credit cards are identified, as illustrated in the accompanying figures.
The system can be configured to search for keywords in the OCR result using the Fuzzy Matching technique. For example, if the OCR result contains "Bajance Due", then the "Balance Due" keyword will be found with a confidence of 900 (out of 1000 max) because 9 out of 10 non-space characters are the same as in "Balance Due".
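One way to sketch such a keyword search is to fuzzy-match a sliding window of each OCR line against the keyword; the windowing scheme and the 850 acceptance threshold below are illustrative assumptions.

```python
from difflib import SequenceMatcher

def find_keyword(ocr_lines, keyword="Balance Due", threshold=850):
    """Locate the keyword by fuzzy-matching a sliding window of each OCR
    line (so trailing text such as the amount does not dilute the score)."""
    k = len(keyword)
    best = (None, 0)                            # (line index, confidence)
    for i, line in enumerate(ocr_lines):
        for start in range(max(1, len(line) - k + 1)):
            window = line[start:start + k]
            conf = int(1000 * SequenceMatcher(None, window.lower(),
                                              keyword.lower()).ratio())
            if conf > best[1]:
                best = (i, conf)
    return best if best[1] >= threshold else (None, best[1])

# On ["Subtotal 89.00", "Bajance Due 94.00"], the window "Bajance Due"
# scores about 909 against "Balance Due", so line 1 is returned.
```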
The Tendered Amount field has a so-called "DollarAmount" format, which is one of the pre-defined data formats explained in the '914 Patent. This data format can be used by the system instead of, or in combination with, the keyword-based search to further narrow down the set of candidates for the field.
The system can be configured to search for data below or to the right of each keyword found above, e.g., using the Fuzzy Matching technique of the '914 Patent. Each found location of data is assigned a format-based confidence, which reflects how closely the data in the found location matches the expected format (in this case, "DollarAmount"). For example, the format-based confidence for "$94.00" is 1000 (of 1000 max); the confidence of "$94.A0" is 800 because 1 of 5 non-punctuation characters ("A") is inconsistent with the format; however, the format-based confidence of "$9S.00" is higher (900-950) because 'S' is close to one of the characters allowed by the format ('5').
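A minimal sketch of a "DollarAmount" confidence, analogous to the date-format scoring above, follows; the look-alike table and the character counting are illustrative and may differ slightly from the worked numbers in the text.

```python
LOOKALIKES = {"S": "5", "B": "8", "O": "0", "l": "1"}

def dollar_amount_confidence(candidate: str) -> int:
    """Score a candidate against the "DollarAmount" format on a 0-1000
    scale, with partial credit for digit look-alikes such as 'S' ~ '5'."""
    chars = [c for c in candidate if c not in "$.,"]   # ignore punctuation
    if not chars:
        return 0
    score = sum(1.0 if c.isdigit()
                else 0.6 if c in LOOKALIKES
                else 0.0
                for c in chars)
    return int(1000 * score / len(chars))

# dollar_amount_confidence("$94.00") -> 1000
# dollar_amount_confidence("$9S.00") -> 900 ('S' earns partial credit)
```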
Using connected components analysis, all connected components (CCs) are found on the image. The system computes the average font size on the image by building a histogram of individual character heights over all CCs that are found. The system can then compute the average character thickness on the image by building a histogram of individual character thicknesses over all CCs found. For each data location found, the system can compute a combined score (CS) using a linear combination of weighted values, where the weights W1-W8 are established experimentally.
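By way of non-limiting illustration, the sketch below shows the shape of such a combined score. The factor names and weights are hypothetical placeholders; the disclosure's eight weighted values and their experimentally established weights are not reproduced here.

```python
# Hypothetical factors, each normalized to a 0-1000 scale; weights sum to 1.0.
WEIGHTS = {"format_conf": 0.40, "keyword_conf": 0.35, "font_consistency": 0.25}

def combined_score(factors: dict) -> float:
    """Linear combination of per-candidate factors, each on a 0-1000 scale."""
    return sum(w * factors.get(name, 0.0) for name, w in WEIGHTS.items())

# The candidate with the highest CS is output as the Tendered Amount.
candidates = [
    {"format_conf": 1000, "keyword_conf": 900, "font_consistency": 950},
    {"format_conf": 800, "keyword_conf": 950, "font_consistency": 700},
]
best = max(candidates, key=combined_score)    # -> the first candidate here
```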
The candidate with the highest computed CS can then be output. Once the data from all of the receipt fields is obtained, the content may be organized into a file or populated into specific software that tracks the specific fields for financial or other purposes. In one embodiment, a user may be provided with a user interface that lists the fields on a receipt and populates the extracted content from the receipt in a window next to each field.
It will be understood that the term system in the preceding paragraph, and throughout this description unless otherwise specified, refers to the software, hardware, and component devices required to carry out the methods described herein. This will often include a mobile device that includes an image capture system and software that can perform at least some of the steps described herein. In certain embodiments, the system may also include server-side hardware and software configured to perform certain steps described herein.
Power supply module 902 can be configured to supply power to the components of server 708.
While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not of limitation. The breadth and scope should not be limited by any of the above-described exemplary embodiments. Where this document refers to technologies that would be apparent or known to one of ordinary skill in the art, such technologies encompass those apparent or known to the skilled artisan now or at any time in the future. In addition, the described embodiments are not restricted to the illustrated example architectures or configurations, but the desired features can be implemented using a variety of alternative architectures and configurations. As will become apparent to one of ordinary skill in the art after reading this document, the illustrated embodiments and their various alternatives can be implemented without confinement to the illustrated example. One of ordinary skill in the art would also understand how alternative functional, logical or physical partitioning and configurations could be utilized to implement the desired features of the described embodiments.
Furthermore, although items, elements or components may be described or claimed in the singular, the plural is contemplated to be within the scope thereof unless limitation to the singular is explicitly stated. The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to” or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent.
Related Provisional Application Data: Application No. 61/801,963, filed March 2013 (US).