This application claims priority to Korean Patent Application No. 10-2021-0147372, filed on Oct. 29, 2021 in the Korean Intellectual Property Office, the entire contents of which are hereby incorporated by reference.
The present disclosure relates to a method and an apparatus for augmenting learning data and, more particularly, to a method and an apparatus for augmenting learning data for artificial intelligence (AI)-based document classification.
Optical character recognition (OCR) refers to a technology of obtaining an image of handwritten or machine-printed text with an image scanner and converting the text into machine-readable text. In other words, OCR is a program or software that converts an image of a typed document obtained by image scanning into computer-editable character codes. OCR originated as a field of research in artificial intelligence and machine vision.
Conventional OCR operated with a plurality of subdivided modules, such as a text line detection module and a character segmentation module, and required a human being to manually register features as criteria for distinguishing characters. In addition, conventional OCR worked only on high-quality images and had a relatively low recognition rate for handwriting or cursive writing. Unlike conventional OCR, which recognizes characters only in document images such as business cards or documents, current OCR is developing into a technology that can recognize characters even in pictures and videos. Today, with the technological development of computer vision, instead of a human being manually registering character features, a computer autonomously learns rules for recognizing text in an image from massive amounts of data by using a deep learning-based algorithm. As a result, the rate and accuracy of character recognition have improved, and various algorithms for correcting OCR recognition errors and addressing other shortcomings of OCR continue to be developed.
Among these OCR technologies, document classification is the phase of classifying the type (or kind) of a target document and is the first stage of document recognition. With the recent development of machine learning, deep learning algorithms, a subset of machine learning, have been applied to document classification. To improve the accuracy of deep learning-based document classification, general data augmentation methods used in object recognition or character recognition, such as adding inversion (flipping), rotation, or various types of noise produced by image processing techniques, have conventionally been applied instead of securing separate learning data.
Although conventional data augmentation methods are suitable for improving recognition accuracy for characters or words, noise produced by image processing techniques and added to an entire document differs from the way actual documents become contaminated. Therefore, to improve the accuracy of document classification, a data augmentation method based on the structure or characteristics of actual documents is required.
The present disclosure is intended to address the above-mentioned problems and other problems. An aspect of the present disclosure is to provide a data augmentation method and an apparatus therefor capable of effectively augmenting learning data for artificial intelligence-based document classification by classifying pieces of document data by quality on the basis of quality measurement information about the pieces of document data and by using a document data distribution by quality.
In another aspect, the present disclosure is to provide a data augmentation method and an apparatus therefor capable of effectively augmenting learning data for artificial intelligence-based document classification by synthesizing a background of documents belonging to a document quality group of low weight (proportion) and a foreground of documents belonging to a document quality group of high weight (proportion).
In still another aspect, the present disclosure is to provide a data augmentation method and an apparatus therefor capable of effectively augmenting learning data for artificial intelligence-based document classification by changing the style of documents belonging to a document quality group of high weight on the basis of a background feature of documents belonging to a document quality group of low weight.
In view of the foregoing, according to an aspect of the present disclosure, a data augmentation method performed by a processor in an apparatus may include: obtaining a plurality of document data; measuring quality information of the plurality of document data; classifying the plurality of document data by quality using the measured quality information, and detecting a distribution of the plurality of document data classified by quality; and augmenting document data corresponding to a specific quality group based on the detected document data distribution by quality.
According to another aspect of the present disclosure, there is provided a data augmentation apparatus including a processor, wherein the processor performs: obtaining a plurality of document data; measuring quality information of the plurality of document data; classifying the plurality of document data by quality using the measured quality information, and detecting a distribution of the plurality of document data classified by quality; and augmenting document data corresponding to a specific quality group based on the detected document data distribution by quality.
According to another aspect of the present disclosure, there is provided a computer-readable storage medium storing instructions that cause an apparatus including a processor to perform an operation for data augmentation based on a document quality when executed by the processor, the operation including: obtaining a plurality of document data; measuring quality information of the plurality of document data; classifying the plurality of document data by quality using the measured quality information, and detecting a distribution of the plurality of document data classified by quality; and augmenting document data corresponding to a specific quality group based on the detected document data distribution by quality.
The above and other aspects, features, and advantages of the present disclosure will be more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
Hereinafter, embodiments disclosed herein will be described in detail with reference to the accompanying drawings. Aspects, specific advantages, and novel features of the present disclosure will be more apparent from the following detailed description and exemplary embodiments in conjunction with the accompanying drawings.
The concepts of the terms or words used in the specification and claims are appropriately defined by the inventor to describe the disclosure in an optimal manner, and these terms and words should be interpreted as having meanings and concepts in accordance with the technical idea of the present disclosure, are only for describing embodiments, and should not be construed as limiting the present disclosure.
In assigning reference numerals to components, the same or similar components are assigned the same reference numerals regardless of the drawing in which they appear, and redundant descriptions thereof will be omitted. As used herein, the suffixes “module” and “unit” for components are given or used interchangeably only for ease of writing the specification, do not themselves have distinct meanings or functions, and may refer to a software or hardware component.
In describing components of the present disclosure, a component expressed in the singular should be understood to include the plural unless otherwise specified. The terms “first”, “second”, and the like are used to distinguish one component from another, and the components are not limited by these terms. It should be understood that when a component is referred to as being connected or coupled to another component, it may be directly connected or coupled to the other component, or another component may be interposed therebetween.
When it is determined that a detailed description of related known technology may obscure the gist of the embodiments disclosed herein, that detailed description will be omitted. In addition, it should be understood that the accompanying drawings are provided only for easy understanding of the embodiments disclosed herein, and the technical ideas disclosed herein are not limited by the accompanying drawings but include all modifications, equivalents, and substitutes within the spirit and technical scope of the present disclosure.
The present disclosure proposes a data augmentation method and an apparatus therefor capable of effectively augmenting learning data for artificial intelligence-based document classification by classifying pieces of document data by quality on the basis of quality measurement information about the pieces of document data and by using a document data distribution by quality. The present disclosure proposes a data augmentation method and an apparatus therefor capable of effectively augmenting learning data for artificial intelligence-based document classification by synthesizing a background of documents belonging to a document quality group of low weight and a foreground of documents belonging to a document quality group of high weight. The present disclosure proposes a data augmentation method and an apparatus therefor capable of effectively augmenting learning data for artificial intelligence-based document classification by changing the style of documents belonging to a document quality group of high weight on the basis of a background feature of documents belonging to a document quality group of low weight. Hereinafter, a data augmentation method described herein may be performed by a data augmentation apparatus, and the data augmentation apparatus may be installed in an OCR system or a document classification apparatus.
Hereinafter, various embodiments of the present disclosure will be described in detail with reference to the drawings.
Referring to
When the quantity of the obtained pieces of document data is insufficient, the document classification apparatus may augment pieces of learning data for generating and verifying a learning model on the basis of the obtained pieces of document data (S120). The document classification apparatus may store the augmented pieces of learning data in the storage.
The document classification apparatus may perform an operation of classifying the pieces of document data stored in the storage into a learning data set and a test data set (S130). Here, the learning data set is used to generate a learning model, and the test data set is used to verify the learning model.
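The split in S130 is not specified further in this description. A minimal sketch, assuming a simple seeded random partition, is shown below; the 0.2 test ratio, the seed, and the function name `split_learning_test` are hypothetical choices for illustration only.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_learning_test(
    documents: Sequence[T], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly partition documents into a learning set and a test set.

    The default ratio and seed are illustrative assumptions, not values
    taken from the disclosure.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]  # (learning set, test set)
```

For a classifier, a split stratified by document type may be preferable so that every type appears in both sets; the plain random split is kept here for brevity.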
The document classification apparatus may generate a document classification model for inferring the type (or kind) of a document by performing a predetermined deep learning algorithm on the basis of the learning data set (S140). Here, the deep learning algorithm may be a deep neural network (DNN), a convolutional neural network (CNN), or a recurrent neural network (RNN), but is not necessarily limited thereto.
The document classification apparatus may verify the performance of the document classification model on the basis of the test data set (S150). According to an embodiment of the present disclosure, operations S130 and S150 described above may be omitted.
The document classification apparatus may infer (predict) the type of a document to be classified by using the document classification model generated through the foregoing machine learning process. That is, when a document to be classified is input (S160), the document classification apparatus may input the document into the pre-trained document classification model to infer its type (S170).
Hereinafter, a method for augmenting learning data required to generate and verify a document classification model for inferring the type of a document will be described in detail.
Referring to
The data augmentation apparatus may measure the quality of the obtained pieces of document data (S220). Here, a document area to be subjected to quality measurement may be the entire area of a document or an area in which actual content is written excluding a blank space in the document.
The data augmentation apparatus may measure the quality of the pieces of document data by using a document image attribute, such as a color distribution of a document, a noise distribution of a document, and the degree of rotation of a document. In addition, the data augmentation apparatus may measure the quality of the pieces of document data by using a character image attribute, such as the ratio of a detectable character area in a document and a normal character detection rate, along with the document image attribute.
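This text names the attributes but gives no formulas for them. Purely as a hypothetical illustration, two of them could be approximated on a grayscale page stored as a list of rows: `ink_area_ratio` as a stand-in for the ratio of detectable character area, and `isolated_ink_ratio` as a crude noise proxy. The 128 ink threshold and both heuristics are assumptions, not the measures used in the disclosure.

```python
from typing import List

Image = List[List[int]]  # grayscale, 0 (black) .. 255 (white)


def ink_mask(img: Image, threshold: int = 128) -> List[List[bool]]:
    """Pixels darker than the (assumed) threshold are treated as ink."""
    return [[px < threshold for px in row] for row in img]


def ink_area_ratio(img: Image, threshold: int = 128) -> float:
    """Share of pixels that are ink: a rough proxy for character-area ratio."""
    mask = ink_mask(img, threshold)
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


def isolated_ink_ratio(img: Image, threshold: int = 128) -> float:
    """Share of ink pixels that have no 4-connected ink neighbour.

    Isolated dark dots are more likely speckle noise than character strokes,
    so a higher value suggests a noisier scan (a heuristic, not a standard).
    """
    mask = ink_mask(img, threshold)
    h = len(mask)
    w = len(mask[0]) if h else 0
    ink = isolated = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            ink += 1
            neighbours = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if not any(0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                       for ny, nx in neighbours):
                isolated += 1
    return isolated / ink if ink else 0.0
```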
For example, as shown in
The data augmentation apparatus may calculate a score for each of a plurality of criteria and may quantify the quality of document data numerically by applying a weight to each criterion. That is, the data augmentation apparatus may calculate a quality score for the document data by using Equation 1 below. Here, the weights for the respective criteria are set to sum to 1.
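Equation 1 itself is not reproduced in this text. Assuming it is the weighted sum the paragraph describes, $Q = \sum_i w_i s_i$ with $\sum_i w_i = 1$, where $s_i$ is the score for criterion $i$ and $w_i$ its weight, a minimal sketch follows. The criterion names and the assumption that each score lies in [0, 1] are hypothetical.

```python
import math
from typing import Mapping


def quality_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of per-criterion scores (an assumed form of Equation 1).

    Each score is assumed to lie in [0, 1], higher meaning better quality,
    and the weights must sum to 1 as the description requires.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    return sum(weights[c] * scores[c] for c in weights)
```

For example, scores of 0.9, 0.6, and 1.0 for hypothetical "color", "noise", and "rotation" criteria, weighted 0.3, 0.5, and 0.2, give a quality score of about 0.77.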
The data augmentation apparatus may classify the pieces of document data by quality on the basis of quality measurement information about the pieces of document data (S230). For example, as shown in
The data augmentation apparatus may detect the distribution of the pieces of document data grouped by quality (S240). For example, as shown in
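Steps S230 and S240 can be sketched as binning each quality score into a group and counting the documents per group. The three group names and the 0.8 and 0.5 boundaries below are hypothetical; the description leaves the number of groups and their boundaries open.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical group boundaries, checked from the highest lower bound down.
GROUP_BOUNDS: List[Tuple[str, float]] = [("high", 0.8), ("medium", 0.5), ("low", 0.0)]


def quality_group(score: float) -> str:
    """Return the first group whose lower bound the score reaches."""
    for name, lower in GROUP_BOUNDS:
        if score >= lower:
            return name
    return GROUP_BOUNDS[-1][0]  # scores below every bound fall into the last group


def group_distribution(scores: Dict[str, float]) -> Tuple[Dict[str, str], Counter]:
    """Assign each document id to a quality group and count documents per group."""
    groups = {doc_id: quality_group(s) for doc_id, s in scores.items()}
    return groups, Counter(groups.values())
```

A `Counter` is used for the distribution because it reports 0 for a group with no documents, which keeps the later comparison between groups simple.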
The data augmentation apparatus may augment document data corresponding to a document quality group of low weight, on the basis of the document data distribution information by quality, so that its distribution becomes similar to that of the document data corresponding to a document quality group of high weight. For example, as shown in
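One way to turn the detected distribution into augmentation targets is sketched below, under two assumptions not stated in the text: the group of "high weight" is the group with the most documents, and each other group is topped up to exactly that count rather than to a merely "similar" one.

```python
from typing import Dict, Mapping


def augmentation_targets(distribution: Mapping[str, int]) -> Dict[str, int]:
    """Number of synthetic documents each group needs to match the largest group.

    Matching the largest count exactly is an illustrative assumption.
    """
    if not distribution:
        return {}
    reference = max(distribution.values())  # size of the highest-weight group
    return {group: reference - count for group, count in distribution.items()}
```

With hypothetical counts of 120 high-quality, 30 medium-quality, and 10 low-quality documents, this asks for 90 and 110 new documents in the medium and low groups.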
The data augmentation apparatus may separate a foreground and a background of the pieces of document data (S250). Here, the foreground refers to the important objects in the field of view, and the background refers to the remaining, less important content. The foreground is regarded as having a definite form, whereas the background is regarded as lacking one. Further, the foreground appears to be in front of the background, and the background appears to be behind the foreground.
Document data subject to foreground and background separation may be all pieces of document data. Alternatively, document data subject to foreground separation and document data subject to background separation may be pieces of document data belonging to different quality groups. For example, the document data subject to the foreground separation may be document data belonging to a first quality group, and the document data subject to the background separation may be document data belonging to a second quality group.
A method for separating a foreground and a background of document data may employ various separation methods, such as a method using a template, a method considering influence between pixels, and a method using an artificial neural network (ANN), but is not necessarily limited thereto. Hereinafter, in this embodiment, a process of separating a foreground and a background of a document through a method using an artificial neural network will be described for illustration. For example, as shown in
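The embodiment uses an artificial neural network for S250, which is beyond a short example. As a swapped-in, much simpler technique, global intensity thresholding can illustrate what the separation produces: a foreground of ink on a white canvas, and a background in which the ink pixels are filled with the median paper intensity. The threshold value and the median fill are assumptions for illustration only.

```python
import statistics
from typing import List, Tuple

Image = List[List[int]]  # grayscale, 0 (black) .. 255 (white)
Mask = List[List[bool]]


def separate(img: Image, threshold: int = 128) -> Tuple[Mask, Image, Image]:
    """Split a page into foreground (ink) and background (paper).

    Global thresholding stands in for the ANN-based separation in the text.
    Foreground: ink pixels kept, everything else set to white.
    Background: ink pixels filled with the median paper intensity.
    """
    mask = [[px < threshold for px in row] for row in img]
    paper = [px for row in img for px in row if px >= threshold]
    fill = int(statistics.median(paper)) if paper else 255
    fg = [[px if m else 255 for px, m in zip(row, mrow)] for row, mrow in zip(img, mask)]
    bg = [[fill if m else px for px, m in zip(row, mrow)] for row, mrow in zip(img, mask)]
    return mask, fg, bg
```

A fixed threshold breaks down on stained or unevenly lit pages, which is one reason a learned separator is preferable in practice.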
The data augmentation apparatus may augment learning data by synthesizing foregrounds of pieces of document data belonging to the document quality group of high weight and backgrounds of pieces of document data belonging to the document quality group of low weight (S260). Here, a method for synthesizing a foreground and a background may employ various synthesis methods, such as a simple synthesis method using a matrix operation, a synthesis method using blending by image processing, and a synthesis method using an image feature, but is not necessarily limited thereto.
The data augmentation apparatus may randomly pair a plurality of foregrounds extracted from the pieces of document data of the document quality group of high weight with a plurality of backgrounds extracted from the pieces of document data of the document quality group of low weight, and may merge as many foreground-background pairs as the number of pieces of document data to be augmented, thereby generating new pieces of document data.
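A minimal sketch of this random pairing, using the "simple synthesis method using a matrix operation" option: a pixel-wise minimum (a darken blend) lays dark foreground strokes over a background. It assumes foregrounds are ink on a white (255) canvas and that all images share one size; the function names are hypothetical.

```python
import random
from typing import List, Sequence

Image = List[List[int]]  # grayscale, 0 (black) .. 255 (white)


def composite(fg: Image, bg: Image) -> Image:
    """Overlay dark foreground strokes on a background (pixel-wise minimum).

    Assumes the foreground is ink on a white canvas and both images share
    the same size; resizing and alignment are out of scope here.
    """
    if len(fg) != len(bg) or any(len(a) != len(b) for a, b in zip(fg, bg)):
        raise ValueError("foreground and background sizes differ")
    return [[min(f, b) for f, b in zip(frow, brow)] for frow, brow in zip(fg, bg)]


def augment(foregrounds: Sequence[Image], backgrounds: Sequence[Image],
            n_new: int, seed: int = 0) -> List[Image]:
    """Create n_new documents from randomly drawn (foreground, background) pairs."""
    rng = random.Random(seed)
    return [composite(rng.choice(foregrounds), rng.choice(backgrounds))
            for _ in range(n_new)]
```

Here `n_new` would come from the per-group augmentation target, and the foreground and background lists from the high-weight and low-weight quality groups respectively.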
For example, as shown in
Although this embodiment shows that the entire area of a background image and the entire area of a foreground image are merged, the present disclosure is not necessarily limited thereto, and it will be obvious to those skilled in the art that a partial area of a background image and the entire area of a foreground image may be merged.
As described above, the data augmentation method according to the first embodiment of the present disclosure may randomly synthesize a background image of documents belonging to the document quality group of low weight and a foreground image of documents belonging to the document quality group of high weight, thereby effectively augmenting learning data for document classification based on artificial intelligence.
Referring to
The data augmentation apparatus according to the present disclosure may obtain pieces of document data to be subjected to machine learning, and may store the obtained pieces of document data in a storage (S710).
The data augmentation apparatus may measure the quality of the obtained pieces of document data (S720). The data augmentation apparatus may classify the pieces of document data by quality on the basis of quality measurement information about the pieces of document data (S730). The data augmentation apparatus may detect the distribution of the pieces of document data classified by quality (S740).
The data augmentation apparatus may augment document data corresponding to a document quality group of low weight, on the basis of the document data distribution information by quality, so that its distribution becomes similar to that of the document data corresponding to a document quality group of high weight. To this end, in this embodiment, the document data may be augmented by using a style transfer method. The style transfer method is an optimization technique that uses a content image and a style reference image to generate a new image that retains the content of the content image but appears to be painted in the style of the style reference image.
The data augmentation apparatus may augment learning data by using a style transfer model (S750). Here, the style transfer model may be configured through an artificial neural network (ANN), such as a convolutional neural network (CNN) or a generative adversarial network (GAN), but is not necessarily limited thereto.
The style transfer model according to the present disclosure may extract a background feature of pieces of document data belonging to the document quality group of low weight, and may change the style of pieces of document data belonging to the document quality group of high weight on the basis of the extracted background feature.
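A trained CNN or GAN style transfer model is outside the scope of a short example. As a deliberately simplified, swapped-in stand-in, the sketch below transfers only first- and second-order statistics: paper pixels of a high-quality page are standardised and rescaled to the mean and standard deviation of a low-quality background (an AdaIN-like idea applied to raw intensities). Ink pixels are left unchanged. The threshold and function name are hypothetical, and unlike a learned model this changes tone and contrast only, not texture.

```python
import statistics
from typing import List

Image = List[List[int]]  # grayscale, 0 (black) .. 255 (white)


def transfer_background_style(content: Image, style_bg: Image,
                              threshold: int = 128) -> Image:
    """Re-tone the paper of `content` to match the statistics of `style_bg`.

    Paper pixels (>= threshold) are standardised and rescaled to the style
    background's mean and std; ink pixels are kept so the text stays legible.
    """
    style_px = [px for row in style_bg for px in row]
    mu_s, sd_s = statistics.fmean(style_px), statistics.pstdev(style_px)
    paper = [px for row in content for px in row if px >= threshold]
    if not paper:
        return [row[:] for row in content]  # nothing to restyle
    mu_c, sd_c = statistics.fmean(paper), statistics.pstdev(paper)

    def restyle(px: int) -> int:
        if px < threshold:
            return px  # ink: keep as-is
        z = (px - mu_c) / sd_c if sd_c > 0 else 0.0  # uniform paper maps to the mean
        return max(0, min(255, round(mu_s + z * sd_s)))

    return [[restyle(px) for px in row] for row in content]
```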
For example, as shown in
As described above, the data augmentation method according to the second embodiment of the present disclosure may change the style of documents belonging to the document quality group of high weight on the basis of a background feature of documents belonging to the document quality group of low weight, thereby effectively augmenting learning data for document classification based on artificial intelligence.
Referring to
The data augmentation apparatus according to the present disclosure may obtain pieces of document data to be subjected to machine learning, and may store the obtained pieces of document data in a storage (S910).
The data augmentation apparatus may measure the quality of the obtained pieces of document data (S920). The data augmentation apparatus may classify the pieces of document data by quality on the basis of quality measurement information about the pieces of document data (S930). The data augmentation apparatus may detect the distribution of the pieces of document data classified by quality (S940).
The data augmentation apparatus may augment document data corresponding to a document quality group of low weight, on the basis of the document data distribution information by quality, so that its distribution becomes similar to that of the document data corresponding to a document quality group of high weight. To this end, in this embodiment, the document data may be augmented by using both a foreground/background synthesis method and a style transfer method.
The data augmentation apparatus may separate a foreground and a background of the pieces of document data by using a predetermined separation method (S950).
The data augmentation apparatus may augment learning data by randomly synthesizing foregrounds of pieces of document data belonging to the document quality group of high weight and backgrounds of pieces of document data belonging to the document quality group of low weight (S960).
The data augmentation apparatus may augment the learning data by using a pre-trained style transfer model independently of the foregoing data augmentation method (S970). Here, the style transfer model may extract a background feature of the pieces of document data belonging to the document quality group of low weight, and may change the style of the pieces of document data belonging to the document quality group of high weight on the basis of the extracted background feature.
For example, as shown in
The pieces of document data 1020 of the high quality group and the pieces of document data 1040 of the medium/low quality group stored in the storage 1010 may be input to a style transfer unit 1070. The style transfer unit 1070 may extract a background feature of the pieces of document data 1040 belonging to the medium/low quality group by using the style transfer model, and may change the style of the document data 1020 belonging to the high quality group on the basis of the extracted background feature, thereby generating a plurality of pieces of document data 1080.
As described above, the data augmentation method according to the third embodiment of the present disclosure may use both a method of synthesizing a background image of documents belonging to the document quality group of low weight and a foreground image of documents belonging to the document quality group of high weight and a method of changing the style of the documents belonging to the document quality group of high weight on the basis of a background feature of the documents belonging to the document quality group of low weight, thereby effectively augmenting learning data for document classification based on artificial intelligence.
Referring to
The data acquisition unit 110 may obtain pieces of document data to be subjected to machine learning. The data acquisition unit 110 may store the obtained pieces of document data in a storage. Here, data to be subjected to learning includes any document data recognizable by optical character recognition (OCR).
The quality measurement unit 120 may measure the quality of the pieces of document data stored in a storage. Here, a document area to be subjected to quality measurement may be the entire area of a document or an area in which actual content is written excluding a blank space in the document.
The quality measurement unit 120 may measure the quality of the pieces of document data by using a document image attribute, such as a color distribution of a document, a noise distribution of the document, and the degree of rotation of the document and/or a character image attribute, such as the ratio of a detectable character area in a document and a normal character detection rate, along with the document image attribute.
The data classification unit 130 may classify the pieces of document data by quality on the basis of quality measurement information about the pieces of document data. That is, the data classification unit 130 may classify the pieces of document data into a predetermined number of quality groups according to quality scores for the pieces of document data.
The data classification unit 130 may detect the distribution of the pieces of document data grouped by quality. Here, the data classification unit 130 may count the pieces of document data belonging to each of the predetermined number of quality groups to detect the distribution of document data by quality group.
The data augmentation unit 140 may augment document data corresponding to a document quality group of low weight on the basis of document data distribution information by quality. That is, the data augmentation unit 140 may augment the document data corresponding to the document quality group of low weight on the basis of the quantity of pieces of document data belonging to a document quality group of high weight.
The data augmentation unit 140 may augment document data for document classification based on artificial intelligence by using at least one of a method of synthesizing a foreground and a background of a document and a method using a style transfer model. To this end, the data augmentation unit 140 may include a foreground/background separation unit 141, a foreground/background synthesis unit 142, and a style transfer unit 143.
The foreground/background separation unit 141 may separate a foreground and a background of the pieces of document data by using a predetermined separation method. Here, the separation method may employ a method using a template, a method considering influence between pixels, a method using an artificial neural network (ANN), and the like, but is not necessarily limited thereto.
The foreground/background synthesis unit (or mixer) 142 may augment learning data by synthesizing foregrounds extracted from pieces of document data of the document quality group of high weight and backgrounds extracted from pieces of document data of the document quality group of low weight. Here, a method for synthesizing a foreground and a background may employ a simple synthesis method using a matrix operation, a synthesis method using blending by image processing, a synthesis method using an image feature, and the like, but is not necessarily limited thereto.
The style transfer unit 143 may augment learning data by using a pre-trained style transfer model. Here, the style transfer model may extract a background feature of the pieces of document data belonging to the document quality group of low weight, and may change the style of the pieces of document data belonging to the document quality group of high weight on the basis of the extracted background feature.
As described above, the data augmentation apparatus according to an embodiment of the present disclosure may measure the quality of pieces of document data, and may group the pieces of document data by quality on the basis of information about the measured quality of the data, thereby augmenting document data corresponding to the document quality group of low weight on the basis of the quantity of the pieces of document data belonging to the document quality group of high weight.
Referring to
For example, the apparatus 200 to which the proposed methods of the present disclosure are applicable may include a network device such as a repeater, a hub, a bridge, a switch, a router, or a gateway, a computer device such as a desktop computer or a workstation, a mobile terminal such as a smartphone, a portable device such as a laptop computer, a home appliance such as a digital TV, and a transportation system such as a car. In another example, the apparatus 200 to which the present disclosure is applicable may be included as a part of an application-specific integrated circuit (ASIC) configured in the form of a system-on-chip (SoC).
A memory 220 may be operatively connected to a processor 210, may store a program and/or instructions for processing and control of the processor 210, and may store data and information used in the present disclosure, control information necessary to process data and information according to the present disclosure, and temporary data generated in a process of processing data and information. The memory 220 may be configured as a storage device, such as a read-only memory (ROM), a random access memory (RAM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a flash memory, a static RAM (SRAM), a hard disk drive (HDD), and a solid state drive (SSD).
The processor 210 may be operatively connected to the memory 220 and/or a network interface device 230, and may control the operation of each module in the apparatus 200. In particular, the processor 210 may perform various control functions for performing the proposed methods of the present disclosure. The processor 210 may also be referred to as a controller, a microcontroller, a microprocessor, a microcomputer, or the like. The proposed methods of the present disclosure may be implemented by hardware, firmware, software, or a combination thereof. When the present disclosure is implemented by using hardware, an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field programmable gate array (FPGA), and the like may be provided in the processor 210. When the proposed methods of the present disclosure are implemented by using firmware or software, the firmware or software may include instructions related to a module, a procedure, or a function that performs functions or operations necessary to implement the proposed methods of the present disclosure, and the instructions may be stored in the memory 220 or in a computer-readable recording medium (not shown) separate from the memory 220, and may be configured to cause the apparatus 200 to implement the proposed methods of the present disclosure when executed by the processor 210.
The apparatus 200 may include a network interface device 230. The network interface device 230 may be operatively connected to the processor 210, and the processor 210 may control the network interface device 230 to transmit or receive a wireless/wired signal carrying information and/or data, a signal, a message, and the like through a wireless/wired network. The network interface device 230 supports various communication standards, for example, IEEE 802 standards, 3GPP LTE(-A), and 3GPP 5G, and may transmit and receive control information and/or a data signal according to the corresponding communication standards. The network interface device 230 may be configured outside the apparatus 200 as needed.
A data augmentation method and an apparatus therefor according to the embodiments of the present disclosure may have the following effects.
According to at least one of the embodiments of the present disclosure, it is possible to measure the quality of pieces of document data and to group the pieces of document data by quality on the basis of information about the measured quality of the data, thereby augmenting document data corresponding to a document quality group of low weight on the basis of the quantity of pieces of document data belonging to a document quality group of high weight.
Further, according to at least one of the embodiments of the present disclosure, it is possible to randomly synthesize a background image of documents belonging to the document quality group of low weight and a foreground image of documents belonging to the document quality group of high weight, thereby effectively augmenting learning data for document classification based on artificial intelligence.
In addition, according to at least one of the embodiments of the present disclosure, it is possible to change the style of the documents belonging to the document quality group of high weight on the basis of a background feature of the documents belonging to the document quality group of low weight, thereby effectively augmenting learning data for document classification based on artificial intelligence.
The effects obtainable by the data augmentation method and the apparatus therefor according to the embodiments of the present disclosure are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description herein.
The present disclosure described above may be realized as computer-readable code in a medium on which a program is recorded. A computer-readable medium may persistently store a computer-executable program, or may temporarily store the program for execution or download. Further, the medium may be any of various recording or storage devices in which a single piece or a plurality of pieces of hardware are combined, and is not limited to a medium directly connected to a computer system but may be distributed over a network. Therefore, the above detailed description should not be construed as restrictive in any aspect and should be considered illustrative. The scope of the present disclosure should be determined on the basis of a reasonable interpretation of the appended claims, and all changes and modifications within the equivalent scope of the present disclosure are included in the scope of the disclosure.
The present disclosure is not limited to the embodiments described above and the appended drawings, but may be configured in different specific forms. It will be obvious to those skilled in the art to which the present disclosure pertains that a component according to the present disclosure described above can be substituted, modified, or changed within the spirit and scope of the present disclosure.
| Number | Date | Country | Kind |
|---|---|---|---|
| 10-2021-0147372 | Oct 2021 | KR | national |