The present disclosure relates generally to processing and analysis of a captured image.
Applications exist that enable image capturing of a physical document. An example of this type of application is a receipt capture application that captures an image corresponding to a physical receipt, such as one received when a user makes a purchase. It is desirable for users to be able to capture and analyze physical receipts in order to track costs and expenses attributable to the user. Additionally, many captured receipts include areas where users have written in values, and it is also desirable to obtain information corresponding to those written values.
Many receipts have a handwriting area containing, for example, a tip amount and a total amount, but conventional OCR usually cannot accurately read the information contained in the handwriting area of the receipt. The handwriting area may include information such as values representing a tip amount and a total bill amount. Conventional OCR technology has difficulty reading information in the handwriting area that was manually entered by a person unless the text is written very clearly and appears as if it were printed text. While certain specialized handwriting recognition methods exist, those solutions expect the handwriting image to be pre-identified and to be provided in a certain condition. Thus, while general receipt capture applications exist, a drawback associated with these applications is that, while they may retrieve printed text, they cannot analyze the receipt properly when a relevant value is handwritten, such as a handwritten tip amount and/or total amount.
According to an aspect of the disclosure, an information processing method and apparatus are provided that perform operations including: identifying, from an image obtained via an image capture device, at least one character string that is relevant in identifying information to be extracted from the image; defining an area within the image that includes information as an information extraction area, the information including a plurality of information elements; selecting, using a feature within the defined area, a region within the defined area where the information to be extracted is expected to be present; removing the feature from the selected region and correcting one or more errors in the information caused by the removal of the feature; and extracting one or more alphanumeric characters from the corrected information, wherein the extracted one or more alphanumeric characters correspond to the elements of the information and are associated with a respective one of the at least one character strings.
These and other objects, features, and advantages of the present disclosure will become apparent upon reading the following detailed description of exemplary embodiments of the present disclosure, when taken in conjunction with the appended drawings and the provided claims.
Throughout the figures, the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components or portions of the illustrated embodiments. Moreover, while the subject disclosure will now be described in detail with reference to the figures, it is done so in connection with the illustrative exemplary embodiments. It is intended that changes and modifications can be made to the described exemplary embodiments without departing from the true scope and spirit of the subject disclosure as defined by the appended claims.
Exemplary embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be noted that the following exemplary embodiment is merely one example for implementing the present disclosure and can be appropriately modified or changed depending on individual constructions and various conditions of apparatuses to which the present disclosure is applied. Thus, the present disclosure is in no way limited to the following exemplary embodiment and, consistent with the figures and embodiments described below, the described embodiments can be applied or performed in situations other than the situations described below as examples. Further, where more than one embodiment is described, each embodiment can be combined with any other unless explicitly stated otherwise. This includes the ability to substitute various steps and functionality between embodiments as one skilled in the art would see fit.
There is a need to provide a system and method that improves the ability to properly identify non-computerized information within an image such that a value of that information may be extracted from the image with a high degree of reliability. The application according to the present disclosure resolves a problem related to the extraction of data by properly identifying an area of an image where non-computerized text is present and enhancing that area of the image such that the values of the non-computerized text can more easily be extracted. For example, the application according to the present disclosure can anticipate a location within an image where handwritten text is expected to be and perform image enhancement processing to ensure that the value of the handwritten text is capable of being extracted with a high degree of reliability. More specifically, the present application enables extraction of handwritten text regardless of the size, style, or other variations in handwriting that commonly exist between people. Based on this advantageous differentiation, the application improves the reliability and accuracy of any data extraction processing to be performed on the target object(s) in the captured image.
These applications and the advantages they provide are achieved based on the algorithm and figures discussed hereinbelow.
The following description of the functionality of image processing and analysis application according to the present disclosure will occur using the instructional steps illustrated in
At step S102, images of one or more objects are obtained. The images are obtained using an image capture device such as a camera of a mobile phone. Each image includes at least one object and one or more data items that can be identified and extracted for storage in a data store (e.g., a database). In another embodiment, the images may be obtained via a file transfer process whereby the computing device acquires one or more images from an external source. This may include, for example, a cloud storage apparatus whereby a user can selectively access and download one or more images on which the processing disclosed herein may be performed. In another embodiment, the images may be attached to an electronic mail message and extracted therefrom by the application in order to perform the processing described herein.
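By way of a non-limiting illustration of S102, the following sketch loads a previously captured image from local storage; the use of OpenCV and the file name are assumptions for this example rather than requirements of the disclosure.

```python
# Illustrative sketch of S102: obtaining an image of an object.
# OpenCV (cv2) and the file name are assumptions for this example.
import cv2

def obtain_image(source_path):
    """Load a captured image (e.g., a photographed receipt) from disk.

    In a deployed application the image could instead arrive from a
    device camera, a cloud storage apparatus, or an e-mail attachment.
    """
    image = cv2.imread(source_path)  # returns None when unreadable
    if image is None:
        raise FileNotFoundError("Could not read image at " + source_path)
    return image

receipt = obtain_image("receipt.jpg")  # hypothetical file name
```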
An example of a type of image obtained at S102 is illustrated in
At step S104, the obtained images are processed using an optical character recognition (OCR) module/process to retrieve character strings and location data associated with each retrieved character string. The results of the OCR will generally include all retrieved character strings and their location data within the image. The OCR processing performed may be able to recognize any type of alphanumeric character, including letters, numbers, and special characters, and can recognize characters of one or more languages. As long as the result contains all retrieved character strings and their location data, the OCR module/process can be replaced with any general OCR module/process, though the quality of the final output will vary depending on the quality of the OCR result.
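As a hedged sketch of the OCR step S104, the following example uses the open-source pytesseract wrapper around the Tesseract engine purely for illustration; the disclosure permits any OCR module/process that returns character strings together with their location data.

```python
# Minimal sketch of S104: retrieve character strings and their
# locations. pytesseract/Tesseract is an example engine, not a
# requirement of the disclosure.
import pytesseract
from pytesseract import Output

def ocr_with_locations(image):
    """Return (text, (left, top, width, height)) for each recognized string."""
    data = pytesseract.image_to_data(image, output_type=Output.DICT)
    fields = []
    for i, text in enumerate(data["text"]):
        if text.strip():  # skip empty detections
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            fields.append((text, box))
    return fields

# Each entry plays the role of a character string field (cf. fields 202).
fields = ocr_with_locations(receipt)  # `receipt` from the earlier sketch
```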
The results of the OCR processing in S104 are illustrated in
In step S106, a search is performed on all of the character string fields 202 generated in S104 to determine if the recognized characters in the respective fields 202 match one or more pre-defined relevancy conditions stored in a data store. The set of pre-defined relevancy conditions may be stored in tabular format or in a data store such as a database. The set of pre-defined relevancy conditions may include any or all of (a) one or more words or terms, (b) one or more particular characters, (c) a format of characters within a field, and/or (d) a relative location of fields to one or more other fields. In one embodiment, the pre-defined conditions include one or more words that elicit a user to manually input (e.g., handwrite) additional information in an area proximate to the one or more words on the object that was captured and obtained in S102. In the exemplary embodiment shown in
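One minimal way to sketch the relevancy search of S106 is shown below; the condition set ("Tip", "Total", and a currency-amount format) and the field structure are assumptions carried over from the earlier sketches.

```python
# Hedged sketch of S106: match OCR fields against pre-defined
# relevancy conditions. The particular words and amount format below
# are illustrative assumptions.
import re

RELEVANT_WORDS = {"tip", "total"}                 # condition type (a)
AMOUNT_FORMAT = re.compile(r"^\$?\d+\.\d{2}$")    # condition type (c)

def find_relevant_fields(fields):
    """Return OCR fields whose text satisfies a relevancy condition."""
    relevant = []
    for text, box in fields:
        token = text.strip().rstrip(":").lower()
        if token in RELEVANT_WORDS or AMOUNT_FORMAT.match(text.strip()):
            relevant.append((text, box))
    return relevant
```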
In the example illustrated in
Upon determining the presence of one or more relevant character string fields as discussed above, the algorithm uses location information associated with the determined one or more relevant character string fields to identify a candidate recognition region within the image. The candidate recognition region is a region in the image that, based on the relevant character string fields, is likely or expected to contain handwritten information subject to extraction therefrom. The algorithm identifies the candidate recognition region based on one or more region selection conditions. The region selection conditions may include a predetermined area in the image relative to one or more of the character string fields 202 that meet the pre-defined relevancy conditions. In one embodiment, the region selection condition causes selection of an area of the image that is adjacent to two character string fields that meet relevancy conditions. For example, the region selection condition is an area to the right of the third and fourth character string fields 202c and 202d, which set forth the relevancy conditions of “Tip” and “Total”. In another embodiment, the region selection condition causes selection of an area that is beneath a character string field meeting a relevancy condition when that character string field is adjacent to a further character string field that meets the same or a different relevancy condition.
As shown in
In one embodiment, a size of the candidate recognition region is also based on the location and position of the respective character string fields deemed to be relevant. For example, as shown herein, the algorithm knows the position of the upper bound of the third character string field 202c and the lower bound of the fourth character string field 202d and may use the distance between those boundaries as a height, in pixels, for the candidate recognition region 204. In another embodiment, the height in pixels may be automatically expanded by a predetermined number of pixels in order to define an area having a height that is larger than the known distance, to potentially capture more handwritten information. Further, the algorithm knows the location of the right boundary of both the third and fourth character string fields 202c and 202d and a rightward boundary of the second character string field 202b and may use the distance therebetween as a width, in pixels, of the candidate recognition region 204. In another embodiment, the width in pixels may be automatically expanded by a predetermined number of pixels in a right and/or left direction in order to define an area having a width that is larger than the known distance, to potentially capture more handwritten information.
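The geometric construction described above may be sketched as follows; the (left, top, width, height) box layout and the padding value are assumptions for illustration, not parameters prescribed by the disclosure.

```python
# Hedged sketch of defining the candidate recognition region 204 from
# the bounding boxes of the relevant fields (e.g., "Tip" 202c and
# "Total" 202d) and a reference field (e.g., 202b).
def candidate_region(tip_box, total_box, ref_box, pad=10):
    """Return (left, top, right, bottom) of the candidate region.

    Height spans the upper bound of the "Tip" field to the lower bound
    of the "Total" field; width runs rightward from those fields toward
    the right boundary of the reference field. Each bound is expanded
    by `pad` pixels, mirroring the optional expansion described above.
    """
    top = max(tip_box[1] - pad, 0)
    bottom = total_box[1] + total_box[3] + pad
    left = max(tip_box[0] + tip_box[2], total_box[0] + total_box[2])
    right = ref_box[0] + ref_box[2] + pad
    return (left, top, right, bottom)
```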
Once the candidate recognition area is defined, the algorithm, in step S108, analyzes pixel data within the candidate recognition region in order to recognize handwritten information contained therein. To achieve this, the algorithm analyzes the image to determine whether one or more image features are present therein, which can then be emphasized by further image processing to better locate the handwritten information, and, in step S110, using the emphasis applied to the one or more features, sets one or more sub-areas where handwritten information is expected and retrieves image data from within the one or more sub-areas as part of handwriting recognition. The processing performed in S108 and S110 will now be described with respect to
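One possible realization of the feature analysis in S108, assuming the feature of interest is a ruled horizontal line beneath the handwriting, is the following morphological sketch; OpenCV is used only as an example library, and the kernel width is an assumption to be tuned per image resolution.

```python
# Hedged sketch of S108: emphasize horizontal line features within the
# candidate recognition region so sub-areas can be set around them.
import cv2

def emphasize_horizontal_lines(region_gray):
    """Return a binary mask in which long horizontal strokes survive."""
    # Invert-threshold so ink becomes white on a black background.
    _, binary = cv2.threshold(region_gray, 0, 255,
                              cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    # A wide, short kernel keeps only features much wider than tall,
    # such as the ruled line on which a tip amount is written.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```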
While the handwritten information is expected to be within region 402 as shown in
As shown in
Complement processing to recover information is illustrated in
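As a hedged illustration of this removal-and-complement step, the sketch below erases the detected line feature and then repairs the handwriting strokes that the line intersected; the use of inpainting is an assumption standing in for the complement processing described in the disclosure.

```python
# Hedged sketch: remove the line feature, then complement (recover)
# the stroke pixels lost to the removal. cv2.inpaint is one possible
# stand-in for the disclosed complement processing.
import cv2

def remove_line_and_complement(region_bgr, line_mask):
    """Erase the line feature and repair the strokes it crossed."""
    # Dilate the mask slightly so the full thickness of the line,
    # including anti-aliased edges, is covered.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.dilate(line_mask, kernel, iterations=1)
    # Inpainting fills the masked pixels from surrounding image data,
    # re-joining strokes that removal of the line would otherwise break.
    return cv2.inpaint(region_bgr, mask, 3, cv2.INPAINT_TELEA)
```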
In step S114, it is determined whether the corrected handwritten information contained in the expanded area 402a contains characters that are not adequately separated from one another. Without separation, the handwriting recognition processing may also fail to properly recognize the information. In other words, S114 includes performance of a secondary correction processing to remove or correct one or more other defects present within the recognition region. Secondary correction processing will be described with respect to
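A minimal sketch of one way to carry out the separation check of S114 is a vertical projection profile, shown below; the blank-column criterion and minimum width are assumptions for illustration.

```python
# Hedged sketch of S114: split the corrected region into individual
# character sub-images using a vertical ink projection. Characters
# that touch (no blank column between them) remain in one segment,
# which is the defect the secondary correction processing addresses.
import cv2

def split_characters(region_gray, min_width=2):
    """Return a list of per-character sub-images, left to right."""
    _, binary = cv2.threshold(region_gray, 0, 255,
                              cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    ink_per_column = binary.sum(axis=0)  # column-wise ink histogram
    segments, start = [], None
    for x, ink in enumerate(ink_per_column):
        if ink > 0 and start is None:
            start = x                    # a character begins here
        elif ink == 0 and start is not None:
            if x - start >= min_width:
                segments.append(region_gray[:, start:x])
            start = None
    if start is not None:                # character touching right edge
        segments.append(region_gray[:, start:])
    return segments
```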
In
Based on the above processing, a number of individual elements are able to be recognized as shown in
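Each separated element can then be passed to a recognizer; the sketch below uses Tesseract's single-character page segmentation mode purely as an example, and the character whitelist is an assumption suited to monetary values.

```python
# Hedged sketch: recognize each separated element individually.
# "--psm 10" treats the input as a single character; the whitelist
# restricting output to digits, ".", and "$" is an assumption.
import pytesseract

def recognize_elements(character_images):
    """Return the recognized value for each per-character sub-image."""
    results = []
    for img in character_images:
        text = pytesseract.image_to_string(
            img, config="--psm 10 -c tessedit_char_whitelist=0123456789.$")
        results.append(text.strip())
    return results
```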
Upon determining the correct information to be extracted, alphanumeric values corresponding to each extracted character can be provided and stored in a report while being associated with a particular recognized character string such as “Total Amount”. This resolves a problem associated with object recognition in which the typewritten values on the object are not the correct values to be extracted; instead, the correct value to be extracted is handwritten onto the object. Further, the algorithm described herein takes into account and corrects for variation in handwriting techniques in order to accurately identify and extract the correct information from the image.
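A minimal sketch of assembling the extracted characters into a report entry follows; the record layout and sample values are hypothetical.

```python
# Hedged sketch: associate the extracted characters with the relevant
# character string and store them as one report record. The key names
# and example values are hypothetical.
def build_report_entry(label, recognized_elements):
    """Join recognized characters into a value tied to a label."""
    return {"label": label, "value": "".join(recognized_elements)}

entry = build_report_entry("Total Amount", ["2", "4", ".", "5", "0"])
# -> {"label": "Total Amount", "value": "24.50"}
```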
In some embodiments, the computing device 1200 performs one or more steps of one or more methods described or illustrated herein. In some embodiments, the computing device 1200 provides functionality described or illustrated herein. In some embodiments, software running on the computing device 1200 performs one or more steps of one or more methods described or illustrated herein or provides functionality described or illustrated herein. Some embodiments include one or more portions of the computing device 1200.
The computing device 1200 includes one or more processor(s) 1201, memory 1202, storage 1203, an input/output (I/O) interface 1204, a communication interface 1205, and a bus 1206. The computing device 1200 may take any suitable physical form. For example, and not by way of limitation, the computing device 1200 may be an embedded computer system, a system-on-chip (SOC), a single-board computer system (SBC) (such as, for example, a computer-on-module (COM) or system-on-module (SOM)), a desktop computer system, a laptop or notebook computer system, an interactive kiosk, a mainframe, a mesh of computer systems, a mobile telephone, a personal digital assistant (PDA), a computing device, a tablet computer system, or a combination of two or more of these.
The processor(s) 1201 include hardware for executing instructions, such as those making up a computer program. The processor(s) 1201 may retrieve the instructions from the memory 1202, the storage 1203, an internal register, or an internal cache. The processor(s) 1201 then decode and execute the instructions. Then, the processor(s) 1201 write one or more results to the memory 1202, the storage 1203, the internal register, or the internal cache. The processor(s) 1201 may provide the processing capability to execute the operating system, programs, user and application interfaces, and any other functions of the computing device 1200.
The processor(s) 1201 may include a central processing unit (CPU), one or more general-purpose microprocessor(s), application-specific microprocessor(s), and/or special purpose microprocessor(s), or some combination of such processing components. The processor(s) 1201 may include one or more graphics processors, video processors, audio processors and/or related chip sets.
In some embodiments, the memory 1202 includes main memory for storing instructions for the processor(s) 1201 to execute or data for the processor(s) 1201 to operate on. By way of example, the computing device 1200 may load instructions from the storage 1203 or another source to the memory 1202. During or after execution of the instructions, the processor(s) 1201 may write one or more results (which may be intermediate or final results) to the memory 1202. One or more memory buses (which may each include an address bus and a data bus) may couple the processor(s) 1201 to the memory 1202. One or more memory management units (MMUs) may reside between the processor(s) 1201 and the memory 1202 and facilitate accesses to the memory 1202 requested by the processor(s) 1201. The memory 1202 may include one or more memories. The memory 1202 may be random access memory (RAM).
The storage 1203 stores data and/or instructions. As an example and not by way of limitation, the storage 1203 may include a hard disk drive, a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, or a Universal Serial Bus (USB) drive or a combination of two or more of these. In some embodiments, the storage 1203 is a removable medium. In some embodiments, the storage 1203 is a fixed medium. In some embodiments, the storage 1203 is internal to the computing device 1200. In some embodiments, the storage 1203 is external to the computing device 1200. In some embodiments, the storage 1203 is non-volatile, solid-state memory. In some embodiments, the storage 1203 includes read-only memory (ROM). Where appropriate, this ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory or a combination of two or more of these. The storage 1203 may include one or more memory devices. One or more program modules stored in the storage 1203 may be configured to cause various operations and processes described herein to be executed. While storage is shown as a single element, it should be noted that multiple storage devices of the same or different types may be included in the computing device 1200.
The I/O interface 1204 includes hardware, software, or both providing one or more interfaces for communication between the computing device 1200 and one or more I/O devices. The computing device 1200 may include one or more of these I/O devices, where appropriate. One or more of these I/O devices may enable communication between a person and the computing device 1200. As an example and not by way of limitation, an I/O device may include a keyboard, keypad, microphone, monitor, mouse, speaker, still camera, stylus, tablet, touch screen, trackball, video camera, another suitable I/O device or a combination of two or more of these. An I/O device may include one or more sensors. In some embodiments, the I/O interface 1204 includes one or more device or software drivers enabling the processor(s) 1201 to drive one or more of these I/O devices. The I/O interface 1204 may include one or more I/O interfaces.
The communication interface 1205 includes hardware, software, or both providing one or more interfaces for communication (such as, for example, packet-based communication) between the computing device 1200 and one or more other computing devices or one or more networks. As an example and not by way of limitation, the communication interface 1205 may include a network interface card (NIC) or a network controller for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as a WI-FI network. This disclosure contemplates any suitable network and any suitable communication interface 1205 for it. As an example and not by way of limitation, the computing device 1200 may communicate with an ad hoc network, a personal area network (PAN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), or one or more portions of the Internet or a combination of two or more of these. One or more portions of one or more of these networks may be wired or wireless. As an example, the computing device 1200 may communicate with a wireless PAN (WPAN) (such as, for example, a Bluetooth WPAN or an ultra wideband (UWB) network), a WI-FI network, a WI-MAX network, a cellular telephone network (such as, for example, a Global System for Mobile Communications (GSM) network), or other suitable wireless network or a combination of two or more of these. Additionally, the communication interface may provide the functionality associated with short-distance communication protocols such as NFC and thus may include an NFC identifier tag and/or an NFC reader able to read an NFC identifier tag positioned within a predetermined distance of the computing device. The computing device 1200 may include any suitable communication interface 1205 for any of these networks, where appropriate. The communication interface 1205 may include one or more communication interfaces 1205.
The bus 1206 interconnects various components of the computing device 1200 thereby enabling the transmission of data and execution of various processes. The bus 1206 may include one or more types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
The above description serves to explain the disclosure; however, the invention should not be limited to the examples described above. For example, the order and/or timing of some of the various operations may vary from the examples given above without departing from the scope of the invention. Further by way of example, the type of network and/or computing devices may vary from the examples given above without departing from the scope of the invention. Other variations from the above-recited examples may also exist without departing from the scope of the disclosure.
The scope further includes a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform one or more embodiments described herein. Examples of a computer-readable medium include a hard disk, a floppy disk, a magneto-optical disk (MO), a compact-disk read-only memory (CD-ROM), a compact disk recordable (CD-R), a CD-Rewritable (CD-RW), a digital versatile disk ROM (DVD-ROM), a DVD-RAM, a DVD-RW, a DVD+RW, magnetic tape, a nonvolatile memory card, and a ROM. Computer-executable instructions can also be supplied to the computer-readable storage medium by being downloaded via a network.
While the present disclosure has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments.
This nonprovisional patent application claims the benefit of priority from U.S. Provisional Patent Application Ser. No. 62/852,756 filed on May 24, 2019, the entirety of which is incorporated herein by reference.