The present invention relates to image processing. More specifically, the present invention relates to automatically applying predetermined styles to images.
Optical character recognition (OCR) is currently a field of great interest. As is well known, OCR is a process in which text is digitally encoded from digital images containing that text. The text may be printed or typed and, in some cases, even handwritten. OCR techniques are used in digital data entry, text mining, and many other machine-reading applications.
Training machine-learning systems to perform OCR, however, requires significant amounts of data, generally in the form of text-containing images. Such data objects are often difficult to obtain: text-containing images captured in the real world may be subject to complex legal issues, may contain proprietary or personally identifying information, and/or may be quite expensive.
Thus, many OCR applications are currently trained on synthetic images, which are machine-generated rather than conventionally captured, and many systems exist for creating such synthetic images of text. Unfortunately, the synthetic images produced are frequently too ‘clean’: they lack the artefacts and imperfections, and thus the visual complexity, of real-world images. For example, real-world images of text frequently show discoloured, damaged, or wrinkled documents, whereas synthetically generated images tend to look perfect. As a result, OCR models and other applications that use these synthetic images as training data struggle to adapt to the messier (and more realistic) images found in the real world.
Thus, there is a need for systems and methods that can introduce the characteristics of real-world data, i.e., the imperfections, wrinkles, shades, etc. of real-world images into synthetically generated images. That is, there is a need for systems and methods that can introduce broad ‘style characteristics’ into an image. Preferably, these systems and methods are automatic and self-improving.
The present invention provides systems and methods for automatically applying style characteristics to images. The images may comprise text. Additionally, the images may be synthetically generated. A style template containing information about style characteristics is passed to an extraction module, which extracts that information and thus determines the style characteristics. The style characteristics are then passed to an application module, which also receives an input image. The application module applies the style characteristics to the image, thereby producing an output image in the intended style. The extraction module and the application module may comprise machine learning elements. The output image may be used in later processes, including, among others, in training processes for optical character recognition models.
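Purely as an illustrative, non-limiting sketch, the following Python code shows one possible shape for such a pipeline. All names here (StyleCharacteristics, ExtractionModule, ApplicationModule, extract, apply) are hypothetical and are used only to make the described data flow concrete; they do not define the claimed system.

```python
# Illustrative, non-limiting sketch only: all names are hypothetical
# and chosen to make the described data flow concrete.
from dataclasses import dataclass, field


@dataclass
class StyleCharacteristics:
    """Information extracted from a style template (hypothetical)."""
    colour_shift: tuple = (0, 0, 0)   # e.g. a yellowing offset per channel
    contrast: float = 1.0             # 1.0 means unchanged
    blur_radius: float = 0.0          # 0.0 means no blur
    distortions: list = field(default_factory=list)  # e.g. ["fold"]


class ExtractionModule:
    """Determines style characteristics from a style template."""
    def extract(self, style_template) -> StyleCharacteristics:
        raise NotImplementedError  # rules-based or learned, per embodiment


class ApplicationModule:
    """Applies style characteristics to an input image."""
    def apply(self, input_image, style: StyleCharacteristics):
        raise NotImplementedError  # returns the output image


def stylize(style_template, input_image):
    style = ExtractionModule().extract(style_template)
    return ApplicationModule().apply(input_image, style)
```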
In a first aspect, the present invention provides a method for automatically applying style characteristics to an image, the method comprising:
In a second aspect, the present invention provides a system for automatically applying style characteristics to an image, the system comprising:
In a third aspect, the present invention provides non-transitory computer-readable media having encoded thereon computer-readable and computer-executable instructions that, when executed, implement a method for automatically applying style characteristics to an image, the method comprising:
The present invention will now be described by reference to the following figures, in which identical reference numerals refer to identical elements and in which:
The present invention provides automatic systems and methods for applying style characteristics to an image. In one embodiment, style characteristics of real-world images may be applied to synthetically generated images, which can then be used in later processing. Such an embodiment would reduce the need for obtaining potentially costly and complicated real-world data.
Style characteristics may include a variety of characteristics related to the image 40. For instance, a style characteristic may be an image colour or a colour level (such as, ‘increased yellow colour in all pixels’). Another style characteristic may be a contrast level for the image 40 or a saturation level. Additionally, style characteristics may be related to a level of distortion (for instance, a level of blurriness or sharpness) intended for the image 40. Other distortions indicated by style characteristics may include ‘folding’, ‘creasing’, or ‘wrinkling’ the image. These style characteristics may be applied over the entire image 40, or over sections of that image.
Additionally, as would be understood, multiple style characteristics may be applied to a single image 40. That is, a single image may be ‘yellowed’, ‘blurred’, and ‘folded’ to better represent possible real-world images. Alternatively, an image may have only one style characteristic applied. For instance, a single colour change might represent poor lighting conditions in the real world. The predetermined style template 20 thus may contain information related to one or more style characteristics.
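As a minimal, concrete illustration (and not the claimed method itself), the following Python sketch applies three simple global style characteristics (yellowing, reduced contrast, and mild blur) using the Pillow library. The filenames and parameter values are illustrative only, and effects such as ‘folding’ or ‘wrinkling’ would require more elaborate, possibly learned, transformations.

```python
# Minimal sketch using the Pillow library: applies three simple global
# style characteristics (yellowing, reduced contrast, mild blur) to a
# synthetic image. Filenames and parameter values are illustrative.
from PIL import Image, ImageEnhance, ImageFilter


def apply_simple_style(image: Image.Image) -> Image.Image:
    # 'Yellowing': blend the image toward a pale yellow overlay.
    overlay = Image.new("RGB", image.size, (240, 220, 130))
    image = Image.blend(image.convert("RGB"), overlay, 0.15)

    # Reduced contrast, as in a poorly lit or faded document.
    image = ImageEnhance.Contrast(image).enhance(0.85)

    # Mild blur, mimicking imperfect focus in a real-world capture.
    return image.filter(ImageFilter.GaussianBlur(radius=1.2))


if __name__ == "__main__":
    styled = apply_simple_style(Image.open("synthetic_text.png"))
    styled.save("styled_text.png")
```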
The style template 20 may take many forms. In one embodiment, the style template 20 is a template image that has the desired ‘style’, such as a real-world image with desirable contrast levels or blurriness. In such an embodiment, the extraction module 30 would determine the style characteristics based on an analysis of that template image. In another embodiment, the style template 20 may simply be a list of style characteristics encoded in a usable and convenient format. In a further embodiment, the style template 20 may be the result of a machine learning or training process—that is, the extraction module 30 may use machine learning methods to generate the style template 20 based on other training data, which may include real-world images.
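As one illustration of the template-image embodiment, a simple rules-based extraction might derive rough colour and contrast characteristics from image statistics. The function below is a sketch only; a learned extraction module would replace these hand-picked statistics with trained parameters.

```python
# Sketch only: a simple rules-based extraction that derives rough
# colour and contrast characteristics from a template image using
# image statistics.
from PIL import Image, ImageStat


def extract_style_from_template(template_path: str) -> dict:
    template = Image.open(template_path).convert("RGB")
    rgb_stat = ImageStat.Stat(template)
    grey_stat = ImageStat.Stat(template.convert("L"))
    return {
        "mean_colour": tuple(rgb_stat.mean),  # average R, G, B levels
        "contrast": grey_stat.stddev[0],      # spread of grey levels
    }
```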
In some embodiments, the extraction module 30 and application module 50 may comprise rules-based elements. However, it may be difficult to prepare rules in advance for all possible images that may be received. Thus, in some embodiments, it may be preferable to use machine learning units in the system 10. Either or both of the extraction module 30 and the application module 50 may thus comprise machine learning elements, including neural network-based elements.
Further, in some embodiments, the functions of the extraction module 30 and the application module 50 may be performed by a single module. In an embodiment using machine learning, the module(s) may be trained using a predetermined style template 20. Alternatively, the module(s) may be untrained at start and merely given a set of template images from which to generate a style template 20.
A kind of neural network known as the adversarial network (and, in particular, the generative adversarial network or “GAN”) is well suited to such tasks. For greater detail on the mechanics of GANs, see Liu, Breuel, and Kautz, “Unsupervised Image-to-Image Translation Networks”, arXiv:1703.00848v6 [cs.CV], July 2018, and Zhu et al., “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”, arXiv:1703.10593v5 [cs.CV], August 2018.
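For background, and as formulated in the GAN literature rather than as a definition of the claimed system, a GAN trains a generator G against a discriminator D under the minimax objective

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right],$$

and the Zhu et al. reference above adds, for unpaired image-to-image translation, a cycle-consistency loss between the two mappings G and F:

$$\mathcal{L}_{\mathrm{cyc}}(G, F) = \mathbb{E}_{x}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y}\left[\lVert G(F(y)) - y \rVert_1\right].$$

In the present context, G may be thought of as mapping ‘clean’ synthetic images toward the target style and F as mapping styled images back, so that the text content of the input is preserved while the style is changed.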
Several tests were performed using a system based on GAN elements. Various input and output images from these tests will now be discussed. Referring to
In these tests, each style was applied by using a different model. However, various configurations and architectural modifications of the same underlying model could also be used to generate multiple different styles. For instance, a model could be configured with several style templates. In such a case, a single model could produce multiple output images 60 from a single input image 40. Alternatively, this single model could be configured to select a different known style for each input image 40. This selection could be random or directed by another process or a user.
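One non-limiting way to express this configuration is sketched below. The model.apply(...) interface and the chosen mapping are assumptions made purely for illustration.

```python
# Hypothetical sketch: a single model configured with several style
# templates, choosing a style for each input image either at random
# or as directed by a caller. The model.apply(...) interface is an
# assumption made for illustration.
import random


def stylize_with_styles(model, input_images, style_templates, chosen=None):
    outputs = []
    for i, image in enumerate(input_images):
        if chosen is not None:
            template = style_templates[chosen[i]]  # directed selection
        else:
            template = random.choice(style_templates)  # random selection
        outputs.append(model.apply(image, template))
    return outputs
```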
Referring now to
Additionally, as discussed above, the style template may take many forms and a single style template may be reused many times, for many different images. Thus, steps 120-130 may be repeated many times with many images, without repeating steps 100-110. Likewise, many different style templates may be used with a single input image.
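A short sketch of this reuse pattern, using the hypothetical interfaces from the earlier sketch, follows: extraction runs once per style template, while application repeats for every input image.

```python
# Sketch of the reuse pattern described above, with the hypothetical
# interfaces from the earlier sketch: extraction (steps 100-110) runs
# once per style template, while application (steps 120-130) repeats
# for every input image.
def stylize_dataset(style_template, input_images, extraction, application):
    style = extraction.extract(style_template)  # once per template
    return [application.apply(image, style) for image in input_images]
```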
As would be understood, the output data produced by the present invention may be used for training OCR processes. However, the present invention may also be used in many other applications. In particular, the present invention may be configured for any application that would benefit from using a larger data set. As examples, the present invention may be configured for an application for which data is difficult or costly to obtain, or for which synthetic data is too ‘clean’, or for which a specific style of data is desirable.
Additionally, it should be noted that the term ‘image’, as used herein, is not exclusive to 2D images. Various other forms of data may be used by the present invention. For instance, the present invention may receive 3D image data, video data, medical imaging data, video game data, or any other kind of one-dimensional or multi-dimensional data that would be suitable for the application of style characteristics.
It should be clear that the various aspects of the present invention may be implemented as software modules in an overall software system. As such, the present invention may take the form of computer-executable instructions that, when executed, implement various software modules with predefined functions.
Additionally, it should be clear that, unless otherwise specified, any references herein to ‘image’ or to ‘images’ refer to a digital image or to digital images, comprising pixels or picture cells. Likewise, any references to an ‘audio file’ or to ‘audio files’ refer to digital audio files, unless otherwise specified. ‘Video’, ‘video files’, ‘data objects’, ‘data files’ and all other such terms should be taken to mean digital files and/or data objects, unless otherwise specified.
The embodiments of the invention may be executed by a computer processor or similar device programmed in the manner of method steps, or may be executed by an electronic system which is provided with means for executing these steps. Similarly, an electronic memory means such as computer diskettes, CD-ROMs, Random Access Memory (RAM), Read Only Memory (ROM) or similar computer software storage media known in the art, may be programmed to execute such method steps. As well, electronic signals representing these method steps may also be transmitted via a communication network.
Embodiments of the invention may be implemented in any conventional computer programming language. For example, preferred embodiments may be implemented in a procedural programming language (e.g., “C” or “Go”) or an object-oriented language (e.g., “C++”, “Java”, “PHP”, “Python”, or “C#”). Alternative embodiments of the invention may be implemented as pre-programmed hardware elements, other related components, or as a combination of hardware and software components.
Embodiments can be implemented as a computer program product for use with a computer system. Such implementations may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk) or transmittable to a computer system, via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or electrical communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques). The series of computer instructions embodies all or part of the functionality previously described herein. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (e.g., shrink-wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server over a network (e.g., the Internet or World Wide Web). Of course, some embodiments of the invention may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments of the invention may be implemented as entirely hardware, or entirely software (e.g., a computer program product).
A person understanding this invention may now conceive of alternative structures and embodiments or variations of the above, all of which are intended to fall within the scope of the invention as defined in the claims that follow.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/CA2019/051543 | 10/31/2019 | WO | 00

Number | Date | Country
---|---|---
62754019 | Nov 2018 | US