The present disclosure relates generally to image processing and analysis of captured images.
Applications exist that enable image capturing of a physical document. An example of this type of application is a receipt capture application that captures an image corresponding to a physical receipt, such as one received when a purchase has been made by a user. It is desirable for users to be able to capture and analyze physical receipts in order to track costs and expenses attributable to the user. A drawback associated with these receipt capture applications is that, often, the application expects only a single receipt to be present when the receipt image is captured. In these existing systems, when multiple receipts are to be captured and analyzed, they must be captured one by one and analyzed one by one. Such systems have difficulty differentiating between images that contain only a single receipt and images that contain multiple receipts.
In one embodiment, an information processing method and apparatus is provided for obtaining a captured image; detecting a character region from the captured image; performing association processing between expense type information specified from each of one or more receipts which are identified by using a detection result of the character region from the captured image and expense amount information specified from each of the one or more receipts in the captured image; and outputting an expense report obtained based on the association processing between the merchant information of each of one or more receipts and the one or more pieces of expense amount information of each of the one or more receipts.
In another embodiment, an information processing method and apparatus is provided for obtaining a captured image; detecting an object from the captured image; specifying a receipt region by using a detection result of the object; performing association processing between expense type information that is specified from the receipt region that is identified based on the detection result of the object and expense amount information that is specified from the receipt region; and outputting an expense report obtained based on the association processing between the expense type that is specified from the receipt region in the captured image and the expense amount information that is specified from the receipt region in the captured image.
These and other objects, features, and advantages of the present disclosure will become apparent upon reading the following detailed description of exemplary embodiments of the present disclosure, when taken in conjunction with the appended drawings, and provided claims.
Throughout the figures, the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components or portions of the illustrated embodiments. Moreover, while the subject disclosure will now be described in detail with reference to the figures, it is done so in connection with the illustrative exemplary embodiments. It is intended that changes and modifications can be made to the described exemplary embodiments without departing from the true scope and spirit of the subject disclosure as defined by the appended claims.
Exemplary embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be noted that the following exemplary embodiment is merely one example for implementing the present disclosure and can be appropriately modified or changed depending on individual constructions and various conditions of apparatuses to which the present disclosure is applied. Thus, the present disclosure is in no way limited to the following exemplary embodiment, and, in view of the Figures and embodiments described below, the described embodiments can be applied and performed in situations other than the situations described below as examples. Further, where more than one embodiment is described, each embodiment can be combined with one another unless explicitly stated otherwise. This includes the ability to substitute various steps and functionality between embodiments as one skilled in the art would see fit.
There is a need to provide a system and method that improves usability and productivity by being able to identify and distinguish, from a captured image, whether the image includes one or more target objects on which image processing can be performed. The application according to the present disclosure resolves the problem of identifying and distinguishing between two different objects of a same type in a same image when the background against which the image capture is performed is substantially similar to a color of the objects. The application further advantageously can distinguish when an image apparently contains two different target objects but in fact includes only a single target object. Based on this advantageous differentiation, the application improves the reliability and accuracy of any data extraction processing to be performed on the target object(s) in the captured image.
In an exemplary embodiment, the one or more target objects are receipts that represent a transaction between individuals. An application according to the present disclosure is able to capture multiple receipts in a single image capture operation and automatically process each of the multiple receipts captured in the single image capture operation on an individual basis. The application executing on a computing device enables the computing device to capture multiple receipts, identify each receipt within the single captured image, and process the receipts properly as different receipt data items. The application advantageously differentiates receipts from a surface on which they rest prior to image capture. This is particularly advantageous when the surface on which image capture is performed has a strong color similarity with the color of the paper of the receipts. For example, when an object is a receipt printed on white paper and the background color of the table on which the receipts are placed is also white or very close thereto, it is difficult to identify which data belongs to which receipt in the captured image. The application advantageously identifies each receipt as a distinct object and prevents further image processing operations, such as data extraction (e.g. optical character recognition (OCR) processing), from incorrectly attributing data from one receipt to another. Further, the application also properly identifies when a captured image that appears to include more than one receipt actually includes only a single receipt. The application and the advantages provided thereby can be achieved based on the algorithm and figures discussed hereinbelow.
The following description of the functionality of the image processing and analysis application according to the present disclosure will be provided with reference to the instructional steps illustrated in
At step S102, images of one or more objects are obtained. The images are obtained using an image capture device such as a camera of a mobile phone. In another embodiment, the images may be obtained via a file transfer process whereby the computing device acquires one or more images from an external source. This may include, for example, a cloud storage apparatus whereby a user can selectively access and download one or more images on which the processing disclosed herein may be performed. In another embodiment, the images may be attached to an electronic mail message and extracted therefrom by the application in order to perform the processing described herein.
The images include at least one object that is resting on a surface and includes one or more data items that can be identified and extracted for storage in a data store (e.g. database). An example of a type of image obtained at S102 is illustrated in
At step S104, the obtained images are processed using an optical character recognition (OCR) processing module/process to retrieve character strings and location data associated with each retrieved character string. The results of the OCR will, in general, include all retrieved character strings and their location data within the image. The OCR processing performed may be able to recognize any type of alphanumeric character, including letters, numbers, and special characters, and can recognize characters of one or more languages. As long as the result contains all retrieved character strings and their location data, the OCR module/process can be replaced with any general OCR module/process, although the quality of the result will vary depending on the OCR module/process used. The results of the OCR processing in S104 are illustrated in
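By way of a non-limiting illustration, step S104 might be realized as in the following minimal sketch, which assumes Python and the open-source pytesseract wrapper for the Tesseract OCR engine; the disclosure itself does not mandate any particular OCR module, and the function name is hypothetical.

```python
# Minimal sketch of step S104: retrieve character strings and their location
# data from the obtained image. Library choice (pytesseract) and function name
# are assumptions, not part of the disclosure.
import pytesseract
from PIL import Image

def extract_character_fields(image_path):
    """Return a list of (text, (left, top, width, height)) for each recognized word."""
    image = Image.open(image_path)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    fields = []
    for i, text in enumerate(data["text"]):
        if text.strip():  # ignore empty detections
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            fields.append((text, box))
    return fields
```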
After OCR processing at step S104 is complete, a lookup process is performed in order to obtain information about the plurality of character strings that were retrieved in order to determine information about the objects captured in the image. This is performed using a first database that includes Keyword information which aids the determination as to how many objects are present within the image. The Keyword database includes a plurality of entries that represent different types of character strings that may be recognized during the OCR process. Further, each entry in the Keyword database includes direction data associated therewith. The direction data is used by the algorithm, as described later, in order to expand the respective character string field to further define the boundary of the target object. The Keywords are object-specific and are used by the algorithm to set an outer boundary for one or more objects within the image as discussed below. In the example used herein, the objects sought to be recognized are receipts. As such, the Keyword database includes a plurality of entries including types of characters/fields that are commonly found on receipts and are indicated as "Key Types" within the database. The Key Type represents the type of field that a particular character string recognized by OCR represents. The pre-stored set of key type information includes entries such as Merchant Name, Address, Amount Name, Amount Value, Amount Option, etc. The contents of the Keyword database as described herein are for purposes of example only and are used to illustrate the principle of operation; the database preferably includes a plurality of different keyword types and associated direction data which help improve the boundary-defining processing discussed below.
An exemplary Keyword database and its contents are illustrated in Table 1 below.
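Purely for illustration, such a Keyword database could be represented as a simple list of entries, each carrying a Key Type, a Key Value, and its direction data. The entries below are hypothetical stand-ins consistent with the examples discussed in this description and are not the actual contents of Table 1.

```python
# Hypothetical stand-in for the Keyword database (Table 1). Each entry pairs a
# Key Type with a Key Value to match and the direction data used during the
# expansion processing described below. The "Merchant Name" direction is an
# assumption; the others follow the examples given in this description.
KEYWORD_DATABASE = [
    {"key_type": "Amount Name",   "key_value": "TOTAL",     "direction": "right"},
    {"key_type": "Amount Name",   "key_value": "SUB TOTAL", "direction": "right"},
    {"key_type": "Amount Option", "key_value": "GRATUITY",  "direction": "up"},
    {"key_type": "Amount Value",  "key_value": "$ *.*",     "direction": "left"},
    {"key_type": "Merchant Name", "key_value": None,        "direction": "down"},
]
```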
Steps S108-S110 represent the matching and expansion processing performed on the objects within the image. S108 makes use of the recognized characters in each character string field illustrated in
In
The matching operation of S108 may employ one or more pre-defined matching conditions, such as a full-match or partial-match condition. In the case of a partial-match condition being used, further pre-configured sets of matching conditions may be used. For example, a matching condition may indicate that a character string matches the Key Type of “Merchant Name” when it contains fewer than 10 alphabetical characters and fewer than 3 numerical characters. This is merely exemplary and any condition may be used to define a successful match.
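As one possible formulation of the partial-match condition given in the example above (the thresholds mirror the example and the function name is hypothetical):

```python
def partial_match_merchant_name(text):
    """Example partial-match condition for the Key Type "Merchant Name":
    the character string contains fewer than 10 alphabetical characters and
    fewer than 3 numerical characters (illustrative thresholds only)."""
    alpha_count = sum(ch.isalpha() for ch in text)
    digit_count = sum(ch.isdigit() for ch in text)
    return alpha_count < 10 and digit_count < 3
```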
With respect to
For character field 306, one or more characters are recognized as including the word “SUB TOTAL”. When compared to the keyword table, it is determined that character field 306 corresponds to an Amount Name and, based on the location of character field 306 and its expected location within the object, the direction data used for expansion towards a center of the object is the rightward direction. It should be noted that Table 1 contains multiple entries that include the characters “TOTAL”. This is an example of the robustness that is preferred for the Keyword database, which includes not only a plurality of object-specific Key Types but also a plurality of Key Values that can signify the same Key Type, allowing the algorithm to more accurately process a same type of object whose elements may be represented in different ways. In the case of a receipt, the relevant characters may be “TOTAL”, but this could appear in that way or, as in the case shown in
For character field 308, the recognized characters in the character string include a predetermined special character “$” and also follow a defined format of “$ *.*”, where each * represents at least one numerical value. This indicates that character field 308 corresponds to an Amount Value which, based on the location of the particular character string field within the object and an expected position of the Key Type within the object, has direction data directing that expansion towards a center of the object be in the leftward direction.
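One way to test for the “$ *.*” Amount Value format described above is a simple regular expression; the exact pattern is an assumption, since the description only requires the special character “$” followed by at least one numerical value.

```python
import re

# Illustrative pattern for the "$ *.*" Amount Value format, where each *
# stands for at least one numerical digit (the pattern itself is an assumption).
AMOUNT_VALUE_PATTERN = re.compile(r"\$\s*\d+\.\d+")

def is_amount_value(text):
    """Return True when the character string matches the Amount Value format."""
    return bool(AMOUNT_VALUE_PATTERN.search(text))
```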
For character field 310, the characters therein include the word “GRATUITY”, which indicates that the character field corresponds to the Key Type of Amount Option and that, based on the position of character field 310 and its expected position within the object, the direction data directs expansion towards a center of the object in an upward direction.
The reference to and discussion of the matching of characters in fields 302-310 is meant to illustrate operation only. During operation, each of the respective character fields in
If the result of the character match determination in S108 is negative, the result indicates that the characters in a particular character string field do not correspond to the predefined Key Types. In this instance, the expansion processing to be performed expands the boundary of the field in all directions (up, down, left, right) as indicated in S109. If the result of the character match determination in S108 is positive, indicating that the characters in the particular character field match a Key Type, the expansion processing is performed using the direction data associated with the Key Type.
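The branch between S109 and S110 can be summarized by the following sketch, which returns the direction(s) in which a given character field will be expanded. The substring test here is a simplified stand-in for whichever full- or partial-match conditions are actually configured, and the keyword entries follow the hypothetical structure shown earlier.

```python
def expansion_directions(field_text, keyword_database):
    """Sketch of the S108 branch: a field whose characters match a Key Value
    expands only in the direction data associated with its Key Type (S110);
    an unmatched field expands in all four directions (S109)."""
    for entry in keyword_database:
        if entry["key_value"] and entry["key_value"] in field_text:
            return [entry["direction"]]
    return ["up", "down", "left", "right"]
```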
Despite S109 and S110 being illustrated as separate steps, they are both part of the expansion processing performed in order to detect how many target objects are present within a particular image. Expansion processing will now be described with respect to
In order to perform expansion processing, a binary map of the obtained image is generated where a background of the image is a first color and pixel areas within each of the recognized character fields are a second, different color. This is illustrated in
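A minimal sketch of generating such a binary map with NumPy follows; the pixel values 0 and 1 stand in for the first and second colors, and the bounding boxes are those retrieved by the OCR processing of S104.

```python
import numpy as np

def build_binary_map(image_height, image_width, character_boxes):
    """Create a binary map in which the background is 0 (first color) and the
    pixel areas within each recognized character field are 1 (second color).
    character_boxes is a list of (left, top, width, height) tuples."""
    binary_map = np.zeros((image_height, image_width), dtype=np.uint8)
    for left, top, width, height in character_boxes:
        binary_map[top:top + height, left:left + width] = 1
    return binary_map
```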
Referring now to
Based on the number of groups contained in the updated image of
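Once the character fields have been expanded, the number of resulting groups can be obtained with any connected-component routine; the sketch below assumes scipy.ndimage.label, which is one possible choice and not part of the disclosure.

```python
from scipy import ndimage

def count_tentative_target_objects(expanded_binary_map):
    """Count connected groups of expanded character fields; each group is
    treated as a tentative target object. Returns the group count and a map
    labeling each pixel with its group number."""
    labeled_map, group_count = ndimage.label(expanded_binary_map)
    return group_count, labeled_map
```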
In step S114, for each tentative target object (1) and (2) in
When the Key Values determined in S108 are used to determine the Expense Type, the determined expense type may have an associated predefined object characteristic. In one embodiment, the predefined object characteristic defines an expected size of the target object based on the type of object. For example, if the Expense Type is determined to be Lodging, the predefined characteristic size may indicate a page size of “Letter Size” or “dimension=8″×11”. In another embodiment, the object characteristic may indicate a predetermined range of pixels within the image that are of a single color (e.g. white space). These object characteristics are described for purposes of example only and any detectable feature within an image may be associated with a specific object type in order to determine and set whether a tentative target object is an actual target object from which data can be extracted.
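As an illustration only, the association between an Expense Type and a predefined object characteristic, together with the determination of step S116, might look as follows; the table contents are hypothetical stand-ins for Table 2.

```python
# Hypothetical stand-in for Table 2: Expense Types mapped to a predefined
# object characteristic, here an expected page size in inches. "Meals" has no
# associated characteristic, mirroring the example in the description.
EXPECTED_OBJECT_SIZE = {
    "Lodging": (8, 11),  # page size per the example above
}

def has_object_characteristic(expense_type):
    """Step S116: return True when the Expense Type determined from the Key
    Values has a predefined object characteristic associated with it."""
    return expense_type in EXPECTED_OBJECT_SIZE
```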
In S116, it is determined whether the particular object type includes a particular object characteristic. For example, using Table 2, the Key Values determined in S108 indicate that the Expense Type for the tentative target objects (1) and (2) is “Meals” and that there are no object characteristics associated therewith. Thus, the determination in S116 is negative and the algorithm sets the first tentative target object (1) and the second tentative target object (2) as Target Object 1 and Target Object 2, which indicates that the obtained image includes two objects each having discrete information contained therein. Once the number of target objects in the obtained image is set, data corresponding to the Key Values are extracted and associated with the expense type. The extracted information may then be stored in a report such as an expense report. S120 further includes performing association processing between expense type information specified from each of one or more receipts which are identified by using a detection result of the character region from the captured image and expense amount information specified from each of the one or more receipts in the captured image, and outputting an expense report obtained based on the association processing between the merchant information of each of the one or more receipts and the one or more pieces of expense amount information of each of the one or more receipts.
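A sketch of the association and output of S120 is shown below; the field names and the dictionary-based report format are assumptions made only for illustration.

```python
def build_expense_report(target_objects):
    """Associate the expense type, merchant information, and expense amount
    extracted from each target object (receipt) and output one expense-report
    entry per receipt. Field names are illustrative assumptions."""
    report = []
    for obj in target_objects:
        report.append({
            "merchant": obj.get("merchant_name"),
            "expense_type": obj.get("expense_type"),
            "amount": obj.get("amount_value"),
        })
    return report
```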
In order to illustrate the result of a positive determination in step S116, a second different image, as shown in
With respect to
This processing is illustrated in
Exemplary operation described above is further summarized when looking back at
In some embodiments, the computing device 1000 performs one or more steps of one or more methods described or illustrated herein. In some embodiments, the computing device 1000 provides functionality described or illustrated herein. In some embodiments, software running on the computing device 1000 performs one or more steps of one or more methods described or illustrated herein or provides functionality described or illustrated herein. Some embodiments include one or more portions of the computing device 1000.
The computing device 1000 includes one or more processor(s) 1001, memory 1002, storage 1003, an input/output (I/O) interface 1004, a communication interface 1005, and a bus 1006. The computing device 1000 may take any suitable physical form. For example, and not by way of limitation, the computing device 1000 may be an embedded computer system, a system-on-chip (SOC), a single-board computer system (SBC) (such as, for example, a computer-on-module (COM) or system-on-module (SOM)), a desktop computer system, a laptop or notebook computer system, an interactive kiosk, a mainframe, a mesh of computer systems, a smartphone, a mobile telephone, PDA, a computing device, a tablet computer system, or a combination of two or more of these.
The processor(s) 1001 include hardware for executing instructions, such as those making up a computer program. The processor(s) 1001 may retrieve the instructions from the memory 1002, the storage 1003, an internal register, or an internal cache. The processor(s) 1001 then decode and execute the instructions. Then, the processor(s) 1001 write one or more results to the memory 1002, the storage 1003, the internal register, or the internal cache. The processor(s) 1001 may provide the processing capability to execute the operating system, programs, user and application interfaces, and any other functions of the computing device 1000.
The processor(s) 1001 may include a central processing unit (CPU), one or more general-purpose microprocessor(s), application-specific microprocessor(s), and/or special purpose microprocessor(s), or some combination of such processing components. The processor(s) 1001 may include one or more graphics processors, video processors, audio processors and/or related chip sets.
In some embodiments, the memory 1002 includes main memory for storing instructions for the processor(s) 1001 to execute or data for the processor(s) 1001 to operate on. By way of example, the computing device 1000 may load instructions from the storage 1003 or another source to the memory 1002. During or after execution of the instructions, the processor(s) 1001 may write one or more results (which may be intermediate or final results) to the memory 1002. One or more memory buses (which may each include an address bus and a data bus) may couple the processor(s) 1001 to the memory 1002. One or more memory management units (MMUs) may reside between the processor(s) 1001 and the memory 1002 and facilitate accesses to the memory 1002 requested by the processor(s) 1001. The memory 1002 may include one or more memories. The memory 1002 may be random access memory (RAM).
The storage 1003 stores data and/or instructions. As an example and not by way of limitation, the storage 1003 may include a hard disk drive, a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, or a Universal Serial Bus (USB) drive or a combination of two or more of these. In some embodiments, the storage 1003 is a removable medium. In some embodiments, the storage 1003 is a fixed medium. In some embodiments, the storage 1003 is internal to the computing device 1000. In some embodiments, the storage 1003 is external to the computing device 1000. In some embodiments, the storage 1003 is non-volatile, solid-state memory. In some embodiments, the storage 1003 includes read-only memory (ROM). Where appropriate, this ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory or a combination of two or more of these. The storage 1003 may include one or more memory devices. One or more program modules stored in the storage 1003 may be configured to cause various operations and processes described herein to be executed. While storage is shown as a single element, it should be noted that multiple storage devices of the same or different types may be included in the computing device 1000.
The I/O interface 1004 includes hardware, software, or both providing one or more interfaces for communication between the computing device 1000 and one or more I/O devices. The computing device 1000 may include one or more of these I/O devices, where appropriate. One or more of these I/O devices may enable communication between a person and the computing device 1000. As an example and not by way of limitation, an I/O device may include a keyboard, keypad, microphone, monitor, mouse, speaker, still camera, stylus, tablet, touch screen, trackball, video camera, another suitable I/O device or a combination of two or more of these. An I/O device may include one or more sensors. In some embodiments, the I/O interface 1004 includes one or more device or software drivers enabling the processor(s) 1001 to drive one or more of these I/O devices. The I/O interface 1004 may include one or more I/O interfaces.
The communication interface 1005 includes hardware, software, or both providing one or more interfaces for communication (such as, for example, packet-based communication) between the computing device 1000 and one or more other computing devices or one or more networks. As an example and not by way of limitation, the communication interface 1005 may include a network interface card (NIC) or a network controller for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as a WI-FI network. This disclosure contemplates any suitable network and any suitable communication interface 1005 for it. As an example and not by way of limitation, the computing device 1000 may communicate with an ad hoc network, a personal area network (PAN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), or one or more portions of the Internet or a combination of two or more of these. One or more portions of one or more of these networks may be wired or wireless. As an example, the computing device 1000 may communicate with a wireless PAN (WPAN) (such as, for example, a Bluetooth WPAN or an ultra wideband (UWB) network), a WI-FI network, a WI-MAX network, a cellular telephone network (such as, for example, a Global System for Mobile Communications (GSM) network), or other suitable wireless network or a combination of two or more of these. Additionally, the communication interface may provide the functionality associated with short distance communication protocols such as NFC and thus may include an NFC identifier tag and/or an NFC reader able to read an NFC identifier tag positioned within a predetermined distance of the computing device. The computing device 1000 may include any suitable communication interface 1005 for any of these networks, where appropriate. The communication interface 1005 may include one or more communication interfaces 1005.
The bus 1006 interconnects various components of the computing device 1000 thereby enabling the transmission of data and execution of various processes. The bus 1006 may include one or more types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
The above description serves to explain the disclosure, but the invention should not be limited to the examples described above. For example, the order and/or timing of some of the various operations may vary from the examples given above without departing from the scope of the invention. Further by way of example, the type of network and/or computing devices may vary from the examples given above without departing from the scope of the invention. Other variations from the above-recited examples may also exist without departing from the scope of the disclosure.
The scope further includes a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform one or more embodiments described herein. Examples of a computer-readable medium include a hard disk, a floppy disk, a magneto-optical disk (MO), a compact-disk read-only memory (CD-ROM), a compact disk recordable (CD-R), a CD-Rewritable (CD-RW), a digital versatile disk ROM (DVD-ROM), a DVD-RAM, a DVD-RW, a DVD+RW, magnetic tape, a nonvolatile memory card, and a ROM. Computer-executable instructions can also be supplied to the computer-readable storage medium by being downloaded via a network.
While the present disclosure has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments.
This nonprovisional patent application claims the benefit of priority from U.S. Provisional Patent Application Ser. No. 62/852,773 filed on May 24, 2019, the entirety of which is incorporated herein by reference.