Aspects of the present invention relate to image processing, and more particularly, to character recognition.
Printed end user data can contain damaged and/or degraded characters (for example, smeared, smudged, blurry, bleeding, or partially missing letters or numbers). Such characters can arise for a number of reasons, from scanning errors to moisture or other extraneous material on the documents, among other things. Any of these problems presents a challenge for optical character recognition (OCR) or image character recognition (ICR). It can be challenging to retrain or finetune a character recognition system to address such errors. For example, there may be a limited number of examples with which to do the retraining or finetuning. There also may be a concern that training data can be biased, resulting in decreased performance and/or drift after the retraining or finetuning. Larger training sets can be employed to avoid such issues, but they are costly in both time and money, not only to run, but also to generate and check. Augmenting existing data sets also can result in drift and/or bias.
In view of the foregoing, in one aspect, embodiments of the invention provide an approach to data generation and restoration involving character data. In an embodiment, issues in Japanese or, more generally, East Asian character data generation and restoration are addressed by creating data to train a machine learning (ML) system which can operate in conjunction with an optical/image character recognition (OICR) engine as part of an overall OICR system, wherein the OICR engine reads certain types of data, and the ML system reads data which may be more specialized. Having the ML system and the OICR engine operate together can avoid having to perform substantial retraining or finetuning of the OICR engine.
In an embodiment, the ML system comprises a convolutional neural network (CNN) which generates characters containing defects, and a recurrent neural network (RNN) which identifies and corrects characters with defects. In an embodiment, the overall CRNN system comprises a generative network model that learns and recognizes low level noisy patterns and differentiates them from higher level content. In an embodiment, the characters and fonts with which the CRNN system works may be specific to an end user, while the characters and fonts with which the OICR engine works may be more general. In an embodiment, the OICR engine may be referred to as a base model, and the CRNN system may be referred to as an end user model. In an embodiment, the OICR engine also may be an ML system. In an embodiment, the OICR engine may operate with multiple CRNN systems for different end users.
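By way of illustration only, the following is a minimal sketch, in PyTorch, of one way such a CRNN might be arranged; the layer sizes, the choice of a GRU for the recurrent stage, and the input geometry are assumptions made for the sketch rather than features taken from a particular embodiment.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch: a CNN stage extracts visual features (including low-level
    noise patterns) and an RNN stage models the character sequence."""
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        # CNN stage: learns low-level (noise/degradation) patterns.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        # RNN stage: learns higher-level font/content information.
        self.rnn = nn.GRU(128 * 8, hidden, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, 1, 32, width) grayscale text-line crops.
        feats = self.cnn(images)                     # (B, 128, 8, W/4)
        b, c, h, w = feats.shape
        seq = feats.permute(0, 3, 1, 2).reshape(b, w, c * h)
        out, _ = self.rnn(seq)                       # (B, W/4, 2*hidden)
        return self.classifier(out)                  # per-step class logits
```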
There are several areas of application for this generative network model in the context of optical and image character recognition. One is in data generation, to generate training sets for the CRNN system. For data generation, the generative network is applied to generate characters with a variety of defects, for example, missing characters or portions of characters, or blurry characters. Another is in image restoration, to clean data and reduce noise in the data. For image restoration, the generative network is applied to recover heavily damaged characters with particular fonts, for example, characters with merged or missing strokes, or characters that are blurred or smudged, and to remove background noise. In an embodiment, the fonts and background noise may be specific to an end user. Accordingly, the CRNN may be trained on data that is specific to an end user. By training the CRNN, rather than the OICR engine, with this data, it is possible to avoid skewing or biasing the OICR engine with end user data that may be extreme, or very different from other training data used for the OICR engine.
Another area of application of the generative network model is in generating an end user model to enable recognition of heavily damaged text. For the end user model, a trained discriminator also can be used as a sub-model, and can be applied to recognize different types of degradation in end user samples.
Aspects of the invention now will be described with reference to embodiments as illustrated in the accompanying drawings, in which:
Aspects of the present invention provide, in an optical/image character recognition (OICR) system comprising an OICR engine and a machine learning system, a method of training the machine learning system, the method comprising: receiving input text and/or image data; altering the input text and/or image data to produce degraded data; training the machine learning system using the degraded data; receiving the degraded data into the machine learning system; correcting the degraded data with the machine learning system to produce corrected data; in response to detecting that adjustment of the machine learning system is required after reading the corrected data, adjusting one or more weights of nodes in the machine learning system; and repeating the correcting and adjusting until it is determined that adjustment no longer is required; wherein the adjusting is carried out without requiring refinement or other alteration to the OICR engine.
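A minimal sketch of this training method follows, assuming PyTorch; `degrade_fn`, `loss_fn`, and the convergence test are illustrative placeholders, and, consistent with the method above, only the ML system's weights change while the OICR engine is left untouched.

```python
import torch

def train_ml_system(model, clean_batches, degrade_fn, loss_fn,
                    lr: float = 1e-4, tol: float = 1e-3):
    """Degrade clean input, let the model produce a correction, and
    adjust node weights until adjustment no longer is required."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev = float("inf")
    while True:
        total = 0.0
        for clean in clean_batches:
            degraded = degrade_fn(clean)       # alter input to produce degraded data
            corrected = model(degraded)        # correct the degraded data
            loss = loss_fn(corrected, clean)   # evaluate the corrected data
            opt.zero_grad()
            loss.backward()
            opt.step()                         # adjust weights of nodes
            total += loss.item()
        if abs(prev - total) < tol:            # adjustment no longer required
            break
        prev = total
    # Note: the OICR engine is never modified by this loop.
```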
In a further aspect, the machine learning system uses additional data besides the degraded data for training. In a still further aspect, the machine learning system is a convolutional recurrent neural network (CRNN), the CRNN comprising a convolutional neural network (CNN) and a recurrent neural network (RNN). In a yet still further aspect, the CNN produces the degraded data, and the RNN produces the corrected data. The CNN may be trained with generative adversarial network (GAN) loss, and the RNN may be trained with connectionist temporal classification (CTC) loss.
Other aspects of the present invention provide, in an optical/image character recognition (OICR) system comprising an OICR engine and a machine learning system, a method of restoring degraded data, the method comprising, in the machine learning system: receiving the degraded data; correcting the degraded data with the machine learning system to produce corrected data; in response to detecting that adjustment of the machine learning system is required after reading the corrected data, adjusting one or more weights of nodes in the machine learning system; and repeating the correcting and adjusting until it is determined that adjustment no longer is required. In a further aspect, the degraded data comprises characters with one or more of merged or missing strokes and background noise. In a still further aspect, the method includes, responsive to a determination that contents of the machine learning system warrant incorporation of one or more aspects of the machine learning system into the OICR system, making changes to the OICR system to incorporate the one or more aspects. In a yet still further aspect, the machine learning system is a convolutional recurrent neural network (CRNN), the CRNN comprising a convolutional neural network (CNN) and a recurrent neural network (RNN). In embodiments, the CNN produces the degraded data, and the RNN produces the corrected data.
Still other aspects of the present invention provide an optical/image character recognition (OICR) system comprising an OICR engine and a machine learning system, wherein the machine learning system is programmed to perform a method comprising: receiving input text and/or image data; altering the input text and/or image data to produce degraded data; training the machine learning system using the degraded data; receiving the degraded data into the machine learning system; correcting the degraded data with the machine learning system to produce corrected data; in response to detecting that adjustment of the machine learning system is required after reading the corrected data, adjusting one or more weights of nodes in the machine learning system; and repeating the correcting and adjusting until it is determined that adjustment no longer is required; wherein the adjusting is carried out without requiring refinement or other alteration to the OICR engine. In a further aspect, the machine learning system uses additional data besides the degraded data for training. In a still further aspect, the machine learning system is a convolutional recurrent neural network (CRNN), the CRNN comprising a convolutional neural network (CNN) and a recurrent neural network (RNN). In embodiments, the CNN produces the degraded data, and the RNN produces the corrected data. In yet still further aspects, the CNN is trained with generative adversarial network (GAN) loss, and the RNN is trained with connectionist temporal classification (CTC) loss.
Still other aspects of the present invention provide, in the machine learning system, a method of restoring degraded data, the method comprising: receiving degraded end user data; correcting the degraded end user data with the machine learning system to produce corrected end user data; in response to detecting that adjustment of the machine learning system is required after reading the corrected end user data, adjusting one or more weights of nodes in the machine learning system; repeating the correcting and adjusting until it is determined that adjustment no longer is required; and outputting the corrected end user data. According to additional aspects, the degraded end user data comprises characters with one or more of merged or missing strokes and background noise. According to yet additional aspects, responsive to a determination that contents of the machine learning system warrant incorporation of one or more aspects of the machine learning system into the OICR system, changes to the OICR system are made to incorporate the one or more aspects. According to yet still additional aspects, the machine learning system is a convolutional recurrent neural network (CRNN), the CRNN comprising a convolutional neural network (CNN) and a recurrent neural network (RNN). In embodiments, the CNN produces the degraded data, and the RNN produces the corrected data.
As noted earlier, according to an embodiment, the inventive method and system described herein have particular applicability to Eastern language character generation and restoration. In an embodiment, the inventive method and system perform character generation and restoration in Japanese. In an embodiment, a Convolutional Recurrent Neural Network (CRNN) learns and separates two levels of information: (1) noise or other degradation in printed text; and (2) high-level font or handwriting content.
In an embodiment, a Convolutional Neural Network (CNN) addresses noise in an earlier stage. At a later stage, a Recurrent Neural Network (RNN) learns high-level font information. In an embodiment, the overall Convolutional-Recurrent Neural Network (CRNN) is trained with a specific loss function, including a loss for an associated Generative Adversarial Network (GAN) for the CNN, and a Connectionist Temporal Classification (CTC) loss for the RNN. As ordinarily skilled artisans will appreciate, each loss function is associated with a particular portion of the network: the GAN loss with the CNN, and the CTC loss with the RNN.
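As a sketch of how the two losses might be combined, using PyTorch's connectionist temporal classification loss, and with the relative weighting `lam` an assumption made for the sketch:

```python
import torch
import torch.nn as nn

adv_loss = nn.BCEWithLogitsLoss()   # adversarial (GAN) term for the CNN stage
ctc_loss = nn.CTCLoss(blank=0)      # sequence (CTC) term for the RNN stage

def crnn_loss(disc_logits_fake, log_probs, targets,
              input_lengths, target_lengths, lam: float = 1.0):
    # Generator objective: make the discriminator score fakes as real.
    g_adv = adv_loss(disc_logits_fake, torch.ones_like(disc_logits_fake))
    # log_probs: (T, B, C) log-softmax outputs from the RNN head.
    seq = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    return g_adv + lam * seq
```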
There are several aspects of the CRNN according to different embodiments. In one aspect, the CRNN generates text and/or images that are degraded to provide degraded images resembling images in end user data. To generate the degraded images, a font may be selected. Degradations may have different effects depending on the font.
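For example, a degraded training sample might be produced as in the following sketch, which renders a string in a selected font and applies illustrative degradations; the font path, and the particular blur and missing-patch operations, are assumptions rather than limitations:

```python
import random
from PIL import Image, ImageDraw, ImageFont, ImageFilter

def degraded_sample(text: str, font_path: str, size: int = 32) -> Image.Image:
    """Render `text` in a chosen font, then degrade it."""
    font = ImageFont.truetype(font_path, size)
    img = Image.new("L", (size * max(len(text), 1), size + 8), color=255)
    ImageDraw.Draw(img).text((4, 4), text, font=font, fill=0)
    # Gaussian blur simulates smearing or bleeding ink.
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 2.0)))
    # A white rectangle simulates a partially missing stroke.
    x = random.randint(0, img.width - 9)
    ImageDraw.Draw(img).rectangle([x, 8, x + 8, 16], fill=255)
    return img
```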
In another aspect, the CRNN restores text and/or images from input degraded images. The generated degraded images in the first-described aspect may be used to train the CRNN to recognize what is deficient or missing in the input degraded images.
In an embodiment, the CRNN receives a noisy sample or an image template to train the network to recognize characters. The network outputs a new image that maintains the content of the input image.
Next, at combiner 125, the feature sets from CNN feature encoder 116 are combined with various types of data that will degrade the feature sets. In an embodiment, random noise block 120 provides the degradation features. RNN 130, which receives the results of combiner 125 as degraded text, preserves content in the generated images/degraded text.
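One plausible realization of combiner 125 is channel concatenation of the features with random noise, as sketched below; the text does not fix a particular combining operation, so concatenation is an assumption:

```python
import torch

def combine_with_noise(features: torch.Tensor, noise_dim: int = 32) -> torch.Tensor:
    """Append random noise channels to the CNN feature maps so that
    downstream layers generate degraded variants of the text."""
    b, _, h, w = features.shape
    noise = torch.randn(b, noise_dim, h, w, device=features.device)
    return torch.cat([features, noise], dim=1)   # degradation-bearing features
```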
Generative network 140 receives an output of RNN 130 and provides an output 150 comprising degraded characters or text.
In an embodiment, within generative network 140, deconvolutional decoder 146 preserves basic content of the characters. For example, in an embodiment the deconvolutional decoder 146 may respond to the input of the degraded characters or text by providing corresponding generally universal characters, which may have the basic shape, but which may lack particulars of appearance such as font. Upsampling projector 142/144 may add specific features, such as sharpness or roundness of font. As a result of the operations of generative network 140, the resulting generated/restored image will resemble the target image, or ground truth.
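A sketch of one way such a generative stage might be structured follows; the channel counts, depths, and upsampling factors are illustrative assumptions:

```python
import torch.nn as nn

class GenerativeDecoder(nn.Module):
    """A deconvolutional decoder recovers basic character shape; an
    upsampling projector then adds font-specific detail such as
    stroke sharpness or roundness."""
    def __init__(self, in_ch: int = 160):
        super().__init__()
        # Deconvolutional decoder: preserves basic content of the characters.
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(in_ch, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Upsampling projector: adds specific features of the target font.
        self.project = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Tanh(),   # restored/generated image
        )

    def forward(self, feats):
        return self.project(self.deconv(feats))
```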
In an embodiment, the network of
In an embodiment, degraded samples, such as might appear in the dataset of an end user, are generated. These degraded samples can be used, if necessary, to finetune the OICR engine to improve its performance. Such synthetic data generation is in contrast to current synthetic data generation, which requires identification of noise patterns and application of existing fonts. The process thus is simpler, and provides end-to-end learning for synthetic data generation and augmentation. In an embodiment, the process may even be automated and integrated into the OICR engine finetuning and retraining process itself to support specific end user data and requirements. Such integration may occur for certain end users whose specific requirements, which the CRNN system might meet, overlap significantly with the capabilities of the OICR engine itself.
In the CRNN, a CRNN encoder-decoder is employed to maintain content information throughout the training process. The GAN loss learns low-level noise patterns so as to be able to separate the noise from the content. As a result, there is better control over levels of degradation in the generated samples. This improved control helps prevent shifting of the training data distribution, which could damage the overall capability of the OICR engine through effects such as drift.
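For instance, the GAN loss might be driven by a small patch discriminator along the following lines (the sizes are illustrative assumptions); training it to distinguish generated degradations from real end user samples is what pushes the network to model low-level noise separately from content:

```python
import torch.nn as nn

# Patch discriminator: outputs a real/fake logit per image patch.
discriminator = nn.Sequential(
    nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 4, stride=1, padding=1),  # per-patch logits
)
```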
Similarly to
In an embodiment, depending on the location of the data in a particular field, or on the characters immediately around the damaged letter, the surrounding data may inform the CRNN of the correct identity of the damaged letter.
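One simple way this surrounding context can surface at inference time is in decoding: with bidirectional recurrent layers, the per-step distribution over characters already reflects the glyphs on both sides of a damaged one. A minimal greedy CTC decoder, one of several possible decoding strategies, is sketched below:

```python
import torch

def greedy_ctc_decode(log_probs: torch.Tensor, blank: int = 0) -> list:
    """Collapse repeats and drop blanks from the per-step argmax.
    log_probs: (T, C) per-timestep class log-probabilities."""
    best = log_probs.argmax(dim=1).tolist()
    out, prev = [], blank
    for k in best:
        if k != blank and k != prev:
            out.append(k)
        prev = k
    return out   # sequence of class indices
```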
At 1030, in an embodiment, degraded text (whether resulting from 1005-1025 or from actual degraded text to be corrected) is provided to an encoder (such as CNN feature encoder 170 in
At 1045, a check is made to determine whether the image has been restored. If not, flow returns to 1040 for further modification. Once the image has been restored, at 1050, the restored image is output.
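A sketch of the loop of 1030-1050 follows; the `quality_fn` check (for example, a confidence score from the recognizer) and the iteration cap are assumed placeholders:

```python
import torch

def restore(model, degraded, quality_fn, max_iters: int = 5):
    """Correct a degraded image, check whether it is restored, and
    repeat the modification otherwise."""
    image = degraded
    for _ in range(max_iters):
        with torch.no_grad():
            image = model(image)      # 1040: modify/correct the image
        if quality_fn(image):         # 1045: has the image been restored?
            break
    return image                      # 1050: output the restored image
```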
The various elements in the processing system may communicate with CRNN 1200, which will be described in more detail below with reference to
In an embodiment, as noted earlier, the CRNN 1200 and the OICR engine both could be deep learning systems. In an embodiment, the CRNN 1200 and the OICR engine could share one or more of the layers of nodes in
While the foregoing describes embodiments according to aspects of the invention, the invention is not to be considered as limited to those embodiments or aspects. Ordinarily skilled artisans will appreciate variants of the invention within the scope and spirit of the appended claims.
The present application is related to U.S. application Ser. No. 17/491,122, filed Sep. 30, 2021, entitled “Method and Apparatus for Customized Deep Learning-Based Text Correction”. The entire contents of the just-referenced application are incorporated by reference herein.