CROSS-REFERENCE TO RELATED APPLICATION(S)
This application is based on and claims priority under 35 U.S.C. 119 to Korean Patent Application No. 10-2021-0146786, filed on Oct. 29, 2021, in the Korean Intellectual Property Office, the disclosure of which is herein incorporated by reference in its entirety.
The present disclosure relates to a method for displaying a document recognition result and a document recognition apparatus using same, which can effectively provide text extracted from a document image to a user.
Optical character recognition (OCR) technology is a technology for deriving a digitized result by recognizing portions corresponding to characters existing in an input document. Character recognition with respect to a document image makes it possible to search for a specific keyword, etc. included in the corresponding document image, so that necessary information can be easily extracted from a document in the form of an image. However, the conventional art provides only a function for finding a specific keyword included in a document image and thus has a problem in that no specific UI for utilizing a character-recognized document image is implemented.
The present disclosure is to provide a method for displaying a document recognition result and a document recognition apparatus using same, wherein output items to be extracted from a document image through document recognition are visually displayed in the document image to allow a user to intuitively recognize same.
Further, the present disclosure is to provide a method for displaying a document recognition result and a document recognition apparatus using same, wherein a user can quickly and easily browse and print required information by allowing the user to easily add, delete, or change output items to be extracted from a document image.
Also, the present disclosure is to provide a method for displaying a document recognition result and a document recognition apparatus using same, which can intensively provide an extraction result for output items necessary for a user's business processing rather than recognizing all the characters included in the document image.
A method performed by a processor in a computing apparatus for displaying a document recognition result according to an embodiment of the present disclosure may include: extracting text from an input document image and matching multiple key values and values included in the text to generate key-value pairs; on the basis of a configured output item, extracting a key-value pair corresponding to the output item from the document image; and adding a highlight object to an area corresponding to the extracted key-value pair in the document image to generate a first image and displaying the first image.
In the displaying the first image, a table indicating a key value and a value included in the key-value pair corresponding to the output item may be generated and additionally included in the displayed first image.
In the displaying the first image, when the input document image is added, a key value and a value corresponding to the key-value pair extracted from each document image may be accumulated in the table.
In the extracting a key-value pair, a key-value pair having a key value corresponding to the output item may be searched for from among the key-value pairs to extract the key-value pair.
In the extracting a key-value pair, a key value corresponding to the output item may be searched for by using a preconfigured concordance mapping DB, and a key-value pair corresponding to the output item may be extracted by using the searched key value.
The method for displaying a document recognition result according to an embodiment of the present disclosure may further include receiving an add or delete input with respect to the output item from a user and configuring the output item.
In the configuring the output item, an item display area including the list of the output items and a selection object for adding or deleting output items to or from the list may be additionally displayed.
The method for displaying a document recognition result according to an embodiment of the present disclosure may further include generating and providing the table in the form of at least one file type among JSON, XML, Excel, and PDF.
The method for displaying a document recognition result according to an embodiment of the present disclosure may include: displaying a thumbnail display area for displaying thumbnail images of respective input document images; and when one thumbnail image in the thumbnail display area is selected, displaying a document image corresponding to the selected thumbnail image within a selection image area.
A computer-readable storage medium according to an embodiment of the present disclosure may store instructions that, when executed by a processor, cause an apparatus including the processor to perform an operation for displaying a document recognition result, wherein the operation includes: extracting text from an input document image and matching multiple key values and values included in the text to generate key-value pairs; on the basis of a configured output item, extracting a key-value pair corresponding to the output item from the document image; and adding a highlight object to an area corresponding to the extracted key-value pair in the document image to generate a first image and displaying the first image.
In the displaying the first image, a table indicating a key value and a value included in the key-value pair corresponding to the output item may be generated and included in the displayed first image.
In the displaying the first image, when the input document image is added, a key value and a value corresponding to the key-value pair extracted from each document image may be accumulated in the table.
A document recognition apparatus according to an embodiment of the present disclosure may be an apparatus including a processor, and the processor may be configured to: extract text from an input document image and match multiple key values and values included in the text to generate key-value pairs; on the basis of a configured output item, extract a key-value pair corresponding to the output item from the document image; and add a highlight object to an area corresponding to the extracted key-value pair in the document image to generate a first image and display the first image.
In the displaying the first image, a table indicating a key value and a value included in the key-value pair corresponding to the output item may be generated and additionally included in the displayed first image.
In the displaying the first image, when the input document image is added, a key value and a value corresponding to the key-value pair extracted from each document image may be accumulated in the table.
In the extracting a key-value pair, a key-value pair having a key value corresponding to the output item may be searched for from among the key-value pairs to extract the key-value pair.
In the extracting a key-value pair, a key value corresponding to the output item may be searched for by using a preconfigured concordance mapping DB, and a key-value pair corresponding to the output item may be extracted by using the searched key value.
The document recognition apparatus according to an embodiment of the present disclosure may be configured to further perform receiving an add or delete input with respect to the output item from a user and configuring the output item.
In the configuring the output item, an item display area including the list of the output items and a selection object for adding or deleting output items to or from the list may be additionally displayed.
The document recognition apparatus according to an embodiment of the present disclosure may be configured to further perform generating and providing the table in the form of at least one file type among JSON, XML, Excel, and PDF.
Further, the above-described technical solutions to the problems are not all of the features of the present disclosure. Various features of the present disclosure and advantages and effects thereof will be more fully understood by reference to following specific exemplary embodiments.
By the method for displaying a document recognition result and the document recognition apparatus using same according to an embodiment of the present disclosure, an output item to be extracted from a document image is visually displayed in the document image and thus a user can intuitively recognize output items being extracted from the current document image.
By the method for displaying a document recognition result and the document recognition apparatus using same according to an embodiment of the present disclosure, a user can easily add, delete, or change output items to be extracted from a document image. That is, a user can easily configure desired output items and thus the user can easily browse and print only the necessary information from the document image.
By the method for displaying a document recognition result and the document recognition apparatus using same according to an embodiment of the present disclosure, rather than recognizing all the characters included in the document image, an extraction result for output items necessary for a user's business processing can be intensively provided. Therefore, it is possible to provide user experience (UX) that allows users to perform more efficient business processing.
It will be appreciated by a person skilled in the art that the effects of the method for displaying a document recognition result and the document recognition apparatus using same according to embodiments of the present disclosure, which may be achieved based on various embodiments, are not limited to the effects described above and other effects that are not described above will be clearly understood from the following detailed description.
Hereinafter, various embodiments of the present disclosure will be described in detail with reference to accompanying drawings. The objects, specific advantages and novel features of the present disclosure will become more apparent from the following detailed description and preferred embodiments taken in conjunction with the accompanying drawings.
Prior to the description, the terms or words used in the present specification and claims should be construed as having meanings and concepts consistent with the technical spirit of the present disclosure, as the inventor has appropriately defined the concepts in order to best explain the disclosure. They are for illustrative purposes only and should not be construed as limiting the present invention.
In assigning reference numerals to the components, the same or similar components are given the same reference numerals regardless of the drawing in which they appear, and redundant description thereof will be omitted. Herein, the suffixes “module” and “unit” for the elements used in the following description are given or used interchangeably only in consideration of ease of writing this disclosure, do not have meanings or roles distinguished from each other, and may refer to software or hardware elements.
In describing elements of the present disclosure, when an element is expressed in a singular form, it should be understood that the element also includes a plural form unless otherwise specified. As used herein, such terms as “first” and “second” may be used simply to distinguish a corresponding component from another, and do not limit the components in other aspects. When it is described that one element is connected to another element, it means that still another element may be connected between the two elements.
In the following description of the disclosure, a detailed description of the related prior art incorporated herein will be omitted when it is determined that the description may make the subject matter of embodiments disclosed in the disclosure unclear. The accompanying drawings are only for easy understanding of the embodiments disclosed in the present specification, and the technical ideas disclosed in the present specification are not limited by the accompanying drawings and it should be understood to include all modifications, equivalents and substitutes included in the spirit and scope of the present disclosure.
Referring to
Hereinafter, the document recognition apparatus according to an embodiment of the present disclosure will be described with reference to
The text extraction part 110 may extract text by performing character recognition with respect to an input document image I. That is, the text extraction part 110 may recognize characters included in the document image I by using a character recognition algorithm such as optical character recognition (OCR) and extract a recognition result as text. Any character recognition algorithm may be applied to the text extraction part 110 as long as the algorithm can extract text from the document image I.
The key-value pair generation part 120 may match multiple key values and values included in the text extracted by the text extraction part 110 and generate a key-value pair. Referring to
Here, the relationship between a key value and a value may be configured in consideration of the positions of the key value and the value, and depending on an embodiment, it is also possible to perform matching in consideration of the semantic similarity between the key value and the value. The key-value pair generation part 120 may match each extracted key value and value on the basis of a preconfigured rule, or may perform matching through machine learning based on a neural network model or the like.
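The position-based pairing described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the token format, the list of known keys, and the simple rule that a key and its value share a line with the key on the left are all assumptions made for the example.

```python
# Illustrative sketch: pair OCR tokens into key-value pairs using a simple
# position rule (a key and its value share a line, key on the left).
# All names and the rule itself are assumptions for illustration only.

KNOWN_KEYS = {"Business name", "Product name", "Volume", "Price", "Total"}

def generate_key_value_pairs(tokens):
    """tokens: list of (text, line_index, x_position) from OCR output."""
    pairs = []
    by_line = {}
    for text, line, x in tokens:
        by_line.setdefault(line, []).append((x, text))
    for line, items in by_line.items():
        items.sort()                      # left-to-right order within the line
        texts = [t for _, t in items]
        for i, text in enumerate(texts[:-1]):
            if text in KNOWN_KEYS:        # key recognized -> next token is its value
                pairs.append((text, texts[i + 1]))
    return pairs

tokens = [
    ("Business name", 0, 0), ("AA department store", 0, 120),
    ("Total", 5, 0), ("30,000", 5, 80),
]
print(generate_key_value_pairs(tokens))
# [('Business name', 'AA department store'), ('Total', '30,000')]
```

A rule-based matcher of this kind could be replaced by a neural matching model, as the passage above notes; only the interface (tokens in, key-value pairs out) would stay the same.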
When a key value is omitted from the document image I, it is possible to create a key value and match the key value with a value. “AA department store” in
The calculation part 130 may extract a key-value pair corresponding to output items from the document image I on the basis of a preconfigured output item. The output item corresponds to an item desired to be displayed to a user among texts included in the document image I. Depending on an embodiment, among the output items, basic items to be displayed to a user may have been configured.
For example, “Company Name” and “Total Amount” may be configured as basic items, and in this case, the calculation part 130 may search for a key-value pair having a key value corresponding to “Company Name”, which is an output item, from among the key-value pairs. Thereafter, when there is a key value corresponding to “Company Name” among the key-value pairs generated by the key-value pair generation part 120, the corresponding key-value pair may be extracted.
Each document image I may be non-standardized and thus terms used in the document images I may be different from each other. That is, for the same output item, respective document images I may use different terms. For example, for a term corresponding to “Total Amount” of the output items, each receipt may use different words such as “Total”, “Sales Total”, and “Receipt Amount”. Here, when the calculation part 130 uses only the text extracted from the document image I, searching for a key value corresponding to an output item may fail.
In order to prevent the above-described problem, the document recognition apparatus 100 may use a concordance mapping database (DB) 131. That is, key-values corresponding to output items may be searched by using a preconfigured concordance mapping DB 131, and a key-value pair corresponding to an output item may be extracted by using the searched key value. For example, all of “Total”, “Sales Total”, and “Receipt Amount” may be stored in the concordance mapping DB 131 as corresponding to the output item “Total Amount”, and thus, it is possible to extract key value pairs with key values of “Total”, “Sales Total”, and “Receipt Amount” in addition to “Total Amount” as key value pairs corresponding to the output item “Total Amount”.
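The concordance mapping lookup described above can be sketched as follows; the DB is modeled as a plain dict from each output item to its known synonym key values. The dict contents, names, and first-match behavior are illustrative assumptions, not the actual DB schema.

```python
# Illustrative sketch of the concordance mapping DB 131 as a dict from
# output item to its synonym key values. Contents are example assumptions.

CONCORDANCE_DB = {
    "Total Amount": ["Total Amount", "Total", "Sales Total", "Receipt Amount"],
    "Company Name": ["Company Name", "Business name"],
}

def extract_for_output_item(output_item, key_value_pairs):
    """Return the first key-value pair whose key matches any synonym."""
    synonyms = set(CONCORDANCE_DB.get(output_item, [output_item]))
    for key, value in key_value_pairs:
        if key in synonyms:
            return (key, value)
    return None

pairs = [("Business name", "AA department store"), ("Sales Total", "30,000")]
print(extract_for_output_item("Total Amount", pairs))
# ('Sales Total', '30,000')
```

With this lookup, a receipt that labels its total “Sales Total” still yields a match for the output item “Total Amount”, which is the failure case the concordance DB is introduced to prevent.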
According to feedback of a user, the calculation part 130 may add other items in addition to the basic items or delete or modify at least a portion of the basic items to update output items. That is, the calculation part 130 may receive an add or delete input with respect to output items from a user and modify and configure the output items according thereto. As shown in
Thereafter, the calculation part 130 may add a highlight object to an area corresponding to the extracted key-value pair in the document image I to generate a first image. That is, as shown in
The calculation part 130 may implement the first image including the highlight object H to be displayed to a user through the display part 140. When a user adds or deletes an output item through the selection object S or the like, the calculation part 130 may modify and display a location of the highlight object H in the first image according to the selected output item.
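A highlight object of the kind described above can be modeled as a rectangle overlay computed from the bounding boxes of an extracted key-value pair. This is a sketch under assumptions: the function name, the box format, and the margin value are hypothetical, and actual rendering on the display part is omitted.

```python
# Illustrative sketch: a highlight object as a rectangle covering the
# bounding boxes of a key and its value, with a small margin.
# The names, box format (x0, y0, x1, y1), and margin are assumptions.

def make_highlight(key_box, value_box, margin=4):
    """Return one highlight box covering both the key and the value."""
    x0 = min(key_box[0], value_box[0]) - margin
    y0 = min(key_box[1], value_box[1]) - margin
    x1 = max(key_box[2], value_box[2]) + margin
    y1 = max(key_box[3], value_box[3]) + margin
    return {"type": "bounding_box", "box": (x0, y0, x1, y1)}

h = make_highlight((10, 100, 60, 115), (70, 100, 140, 115))
print(h["box"])   # (6, 96, 144, 119)
```

When a user adds or deletes an output item, regenerating these boxes for the currently selected items is enough to move the highlights, matching the behavior described above.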
In addition, the calculation part 130 may generate a table indicating a key value and a value included in a key-value pair corresponding to the output item. That is, by providing an output item that a user wants to check through a separate table, it is possible to conveniently provide information required by the user among various information included in the document image I.
Specifically, referring to
Depending on an embodiment, the table generated by the calculation part 130 may be provided to a user in the form of a file, and various types of files such as JSON, XML, Excel, and PDF may be generated according to a user input.
Multiple document images I may be input to the document recognition apparatus 100, and the calculation part 130 may accumulate and display, in the table, a key value and a value corresponding to the key-value pair extracted from each added document image I.
That is, when “Company Name” and “Total Amount” are configured as a basic item for an output item, as shown in
Additionally, the output item may vary depending on the user's settings, etc. and in this case, the table may be accumulated and generated as shown in
Here, the calculation part 130 may search for a key value corresponding to each of “Company Name”, “Item”, “Quantity”, and “Amount” from the sixth receipt I6 and search for a key-value pair corresponding to the key value, and may extract a value from the searched key-value pair and configure the value as a value corresponding to the output item. The output item “Company Name” corresponds to the key value “Business name”, and the value “AA department store” corresponding to “Business name” may be extracted. In addition, the output item “Item” corresponds to the key value “Product name”, and “Red ginseng tablet” may be extracted and added to the table as a value corresponding to “Item”. As the output item “Quantity” corresponds to the key value “Volume” and the output item “Amount” corresponds to “Price”, “1” may be extracted as a value corresponding to “Quantity” and a value of “30,000” corresponding to “Amount” may be extracted so as to be added to the table.
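The accumulation of one table row per document image can be sketched as follows; the column list, helper names, and blank-fill behavior for items a receipt lacks are illustrative assumptions.

```python
# Illustrative sketch: accumulate one table row per document image, with
# a blank cell when an output item is not found in that image.
# Column names and the per-document dict format are assumptions.

OUTPUT_ITEMS = ["Company Name", "Item", "Quantity", "Amount"]

def accumulate_rows(per_document_extractions):
    """Each extraction maps output item -> value; missing items stay blank."""
    table = []
    for extraction in per_document_extractions:
        table.append([extraction.get(item, "") for item in OUTPUT_ITEMS])
    return table

rows = accumulate_rows([
    {"Company Name": "AA department store", "Item": "Red ginseng tablet",
     "Quantity": "1", "Amount": "30,000"},
    {"Company Name": "BB mart", "Amount": "12,500"},   # no Item/Quantity found
])
print(rows)
```

As further receipts are input, each new extraction appends one row, so the table grows cumulatively as the passage above describes.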
Thereafter, as shown in
The display part 140 may display the first image, the table, and the like received from the calculation part 130 to visually provide same to a user. Referring to
Depending on an embodiment, as shown in
According to an embodiment, it is possible that, when a user input with respect to the selection image area A1 is received, the document image I displayed in the selection image area A1 is changed. The document images I may appear sequentially in the order of the document images corresponding to the thumbnail images T appearing in the thumbnail display area A4. For example, it is possible that, when a user applies a click input in the selection image area A1, the next document image in order among the document images in the thumbnail display area A4 is displayed. Alternatively, it is possible that, when the user swipes from left to right within the selection image area A1, the next image among the document images in the thumbnail display area A4 is displayed, and when the user swipes from right to left, the previously displayed image is displayed again. Various modifications are possible, such as swiping up and down instead of left and right.
Additionally, selection of multiple output items included in the item display area A2 may be made or released by clicking the selection object S shown in
Referring to
Thereafter, when one of the thumbnail images in the thumbnail display area is selected, the document recognition apparatus may display a document image corresponding to the selected thumbnail image in the selection image area (S20). According to an embodiment, it is possible that, when a user input with respect to the selection image area is received, the document image displayed in the selection image area is changed. For example, it is possible that, when a user applies a click or swipe input within the selection image area, the next document image in order among the document images included in the thumbnail display area is displayed.
When a document image is selected, the document recognition apparatus may extract text from the input document image and match multiple key values and values included in the text to generate key-value pairs (S30). The document recognition apparatus may recognize and extract text in the document image by using a character recognition algorithm such as OCR. The relationship between the key values and the values included in the text may be matched through a preconfigured rule or neural network model-based machine learning. The key-value pairs may be matched in consideration of a position, semantic similarity, or the like between each key value and value. When a key value is omitted from the document, it is possible to create a key value and match the created key value with a value.
Thereafter, when an add or delete input with respect to output items is received from a user, the document recognition apparatus may configure an output item (S40). The output item corresponds to an item desired to be displayed to a user among texts included in the document image. Among the output items, basic items to be displayed to a user may have been configured. According to feedback of a user, it is possible to add other items in addition to the basic items or delete or modify at least a portion of the basic items to update output items. That is, the document recognition apparatus may receive an add or delete input with respect to output items from a user and modify and configure the output items according thereto. For example, the document recognition apparatus may generate an item display area including a list of output items and a selection object with respect to each output item and provide same to a user. As such, a user may easily identify output items from the item display area and select or deselect desired output items.
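Step S40 can be sketched as keeping the configured output items in a set and toggling membership on each add/delete input from the selection object. The function name and the set-based representation are illustrative assumptions.

```python
# Illustrative sketch of step S40: a user's add/delete input toggles an
# output item's membership in the configured set. Names are assumptions.

def toggle_output_item(configured, item):
    """Add the item if absent, delete it if present; return a new set."""
    updated = set(configured)
    if item in updated:
        updated.remove(item)
    else:
        updated.add(item)
    return updated

items = {"Company Name", "Total Amount"}          # basic items
items = toggle_output_item(items, "Item")         # user adds "Item"
items = toggle_output_item(items, "Total Amount") # user deletes "Total Amount"
print(sorted(items))   # ['Company Name', 'Item']
```

Returning a new set rather than mutating in place makes it straightforward to re-run the extraction and redraw the highlights whenever the configuration changes.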
Thereafter, the document recognition apparatus may extract, on the basis of a configured output item, a key-value pair corresponding to the output item from the document image (S50). The document recognition apparatus may search for a key-value pair having a key value corresponding to the output item among the key-value pairs and extract the key-value pair.
However, each document image may be non-standardized and thus terms used in the document images may be different from each other. That is, for the same output item, respective document images may use different terms. Here, when the document recognition apparatus uses only the text extracted from the document image, searching for a key value corresponding to an output item may fail.
In order to prevent the above-described problem, the document recognition apparatus may use a concordance mapping DB. That is, words of the document images that correspond to the same output item are stored in the concordance mapping DB, and thus, when a key value corresponding to the output item is searched for by using the preconfigured concordance mapping DB, it is possible to extract a key-value pair corresponding to the output item.
Thereafter, the document recognition apparatus may add a highlight object to an area corresponding to the extracted key-value pair in the document image to generate a first image and display the first image (S60). That is, the highlight object is added to visually display the selected output item in the document image to a user so as to generate the first image. The highlight object may correspond to adding a highlight, a bounding box, shading, or the like to an area corresponding to an output item. In addition, when a user adds or deletes an output item or the like, the document recognition apparatus may modify and display a location of the highlight object in the first image according to the modified output item.
The document recognition apparatus may generate a table indicating a key value and a value included in the key-value pair corresponding to the output item and display the first image further including the table (S60). That is, by providing an output item that a user wants to check through a separate table, it is possible to conveniently provide the information required by the user among the various pieces of information included in the document image.
Additionally, multiple documents may be input to the document recognition apparatus, and in this case, the document recognition apparatus may accumulate and display a key value and a value corresponding to the key-value pair extracted from the added document image in the table.
Thereafter, the document recognition apparatus may convert the generated table into a file form such as JSON, XML, Excel, or PDF and output the file (S70). That is, a user may request the document recognition apparatus to provide the information corresponding to the table in a file form, and in this case, the document recognition apparatus may convert the generated table into a file form and provide same to the user. The file form provided by the document recognition apparatus may be variously changed according to an embodiment.
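Step S70 can be sketched for the JSON and XML cases using only the standard library; Excel and PDF output would require third-party packages and are omitted. The function names and the XML element layout are illustrative assumptions.

```python
# Illustrative sketch of step S70 for JSON and XML using only the standard
# library. Function names and the XML layout are assumptions.
import json
import xml.etree.ElementTree as ET

def table_to_json(columns, rows):
    """Serialize the table as a JSON array of one object per row."""
    return json.dumps([dict(zip(columns, row)) for row in rows])

def table_to_xml(columns, rows):
    """Serialize the table as <table><record><field name=...>...</field>..."""
    root = ET.Element("table")
    for row in rows:
        rec = ET.SubElement(root, "record")
        for col, val in zip(columns, row):
            ET.SubElement(rec, "field", name=col).text = val
    return ET.tostring(root, encoding="unicode")

cols = ["Company Name", "Total Amount"]
rows = [["AA department store", "30,000"]]
print(table_to_json(cols, rows))
print(table_to_xml(cols, rows))
```

Either serializer consumes the same accumulated table, so switching the output format according to the user's request is only a matter of dispatching to a different function.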
The computing environment 10 disclosed herein includes the computing apparatus 12. In an embodiment, the computing apparatus 12 may be an apparatus for classifying a document (e.g., the document recognition apparatus 100).
The computing apparatus 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 may cause the computing apparatus 12 to be operated according to the above-described exemplary embodiment. For example, the processor 14 may execute at least one program stored in the computer-readable storage medium 16. The at least one program may include one or more computer-executable instructions, and the computer-executable instructions may be configured to cause, when executed by the processor 14, the computing apparatus to perform operations according to an exemplary embodiment.
The computer-readable storage medium 16 is configured to store a computer-executable instruction or program code, program data and/or other suitable form of information. A program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In an embodiment, the computer-readable storage medium 16 may include a memory (volatile memory, such as random-access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, another form of storage medium accessible by the computing apparatus 12 and capable of storing desired information, or suitable combinations thereof.
The communication bus 18 may mutually connect various components of the computing apparatus 12 including the processor 14 and the computer-readable storage medium 16.
The computing apparatus 12 may include one or more input/output interfaces 22 providing an interface for one or more input/output apparatuses 24, and one or more network communication interfaces 26. The input/output interface 22 and the network communication interface 26 are connected to the communication bus 18. The input/output apparatus 24 may be connected to other components of the computing apparatus 12 through the input/output interface 22. The exemplary input/output apparatus 24 may include an input apparatus, such as a pointing apparatus (a mouse, a trackpad, or the like), a keyboard, a touch input apparatus (a touchpad, a touchscreen, or the like), a voice or sound input apparatus, and/or various types of sensor apparatus and/or imaging apparatus, and an output apparatus such as a display apparatus, a printer, a speaker and/or a network card. The exemplary input/output apparatus 24 may be included in the computing apparatus 12 as a component constituting the computing apparatus 12, or may be connected to the computing apparatus 12 as a separate apparatus distinct from the computing apparatus 12.
The present disclosure described above may be implemented as a computer-readable code in a medium in which a program is recorded. The computer-readable medium may continuously store a computer-executable program, or may temporarily store a computer-executable program for execution or download. Furthermore, the medium may be various recording means or storage means in the form of a single piece of hardware or a combination of several pieces of hardware, may not be limited to a medium directly connected to any computer system, and may exist on a network while being dispersed. An example of the recording medium may be one configured to store program instructions, including magnetic media such as a hard disk, a floppy disk and a magnetic tape, optical media such as CD-ROM and a DVD, magneto-optical media such as a floptical disk, ROM, RAM, and flash memory. Furthermore, other examples of the recording medium may include an application store in which applications are distributed, a site in which other various pieces of software are supplied or distributed, and recording media and/or storage media managed in a server or the like. Accordingly, the detailed description should not be construed as being limitative from all aspects, but should be construed as being illustrative. The scope of the present disclosure should be determined by reasonable analysis of the attached claims, and all changes within the equivalent range of the present disclosure are included in the scope of the present disclosure.
The present disclosure is not limited by the above-described embodiments and the accompanying drawings. For those of ordinary skill in the art to which the present disclosure pertains, it will be apparent that the components according to the present disclosure can be substituted, modified, and changed without departing from the technical spirit of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
10-2021-0146786 | Oct 2021 | KR | national |