Embodiments are generally related to the field of ALPR (Automated License Plate Recognition). Embodiments are additionally related to OCR (Optical Character Recognition) and image classification.
ALPR is an image-processing approach that often functions as the core module of “intelligent” transportation infrastructure applications. License plate recognition techniques, such as ALPR, can be employed to identify a vehicle by automatically reading a license plate utilizing image processing, computer vision, and character recognition technologies. A license plate recognition operation can be performed by locating a license plate in an image, segmenting the characters in the captured image of the plate, and performing an OCR (Optical Character Recognition) operation with respect to the characters identified.
The ALPR problem is often decomposed into a sequence of image processing operations: locating the sub-image containing the license plate (i.e., plate localization), extracting images of individual characters (i.e., segmentation), and performing optical character recognition (OCR) on these character images. In order for OCR to achieve high accuracy, it is necessary to obtain properly segmented characters.
The ability to extract license plate information from images and/or videos is fundamental to many transportation businesses. An ALPR solution can significantly improve the efficiency and throughput of a number of transportation-related business processes.
ALPR systems have been successfully rolled out in several U.S. states (e.g., CA, NY). Some ALPR modules involve training classifiers for character recognition; these classifiers are commonly employed after a license plate has been detected in a license plate image and the characters have been segmented from the localized plate region. A classifier can be trained for each character in a one-vs-all fashion using samples collected from the site, wherein the collected samples are manually labeled by an operator. Considering the high accuracy (e.g., 99%) required for the overall recognition system, the classifiers are typically trained using on the order of 1,000 manually labeled samples per character. The substantial time and effort required for manual annotation of training images can result in excessive operational cost and overhead.
In order to address this problem, some techniques have been proposed for training classifiers based on synthetically generated samples. Instead of collecting samples from the site, training images are synthetically generated using the font and layout of a license plate of the State of interest. Examples of such approaches are disclosed in: (1) H. Hoessler et al. “Classifier Training Based on Synthetically Generated Samples”, Proc. 5th International Conference on Computer Vision Systems, 2007; and (2) Bala, Raja, et al. “Image Simulation for Automatic License Plate Recognition.” IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, 2012, which are incorporated herein by reference.
While these methods eliminate much of the manual investment required for training, they usually result in deterioration in the classification accuracy.
When classifiers are trained by mixing synthetically generated images with real samples, the classification accuracy is recovered as shown in the prior art graph 30 depicted in
Up to this point, the performance of an OCR engine trained with only synthetic characters has been noticeably poor compared to one trained with real characters as shown in graph 20 of
In order to support this iterative legacy process, thousands of real characters are typically required. For perspective, a well-trained OCR engine for a particular state with 36 labels requires approximately 54,000 real characters (1,500 samples per label). For a mixed synthetic and real scenario, we need 100 samples per label, or 3,600 characters. With a targeted method, as outlined in greater detail herein, we can reduce this number to approximately 300 examples.
The number of real-world images that must be collected is typically much larger than the proportional number of character examples, because label appearance probability is not uniformly distributed. This amplifies the discrepancy between the results achieved by the disclosed approach and the baseline. A typical license plate has 7 characters, so, given a uniform distribution of appearance probability, we would need approximately 514 plates to obtain 3,600 characters. The actual distribution, however, is depicted in the example prior art graph 40 of
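The sample counts above are straightforward products over the 36 labels. A minimal sketch reproducing them follows; the helper names are hypothetical, and the targeted figure assumes, for illustration, three problematic labels receiving 100 real samples each (as in the CT case discussed later in this description).

```python
def real_sample_budget(num_labels: int, per_label: int) -> int:
    """Total real character samples when every label gets the same count."""
    return num_labels * per_label


def targeted_budget(num_problematic: int, per_problematic: int) -> int:
    """Total real samples when only the problematic labels receive real examples."""
    return num_problematic * per_problematic


NUM_LABELS = 36  # presumably 26 letters + 10 digits

full_real = real_sample_budget(NUM_LABELS, 1500)  # purely real training
mixed = real_sample_budget(NUM_LABELS, 100)       # uniform synthetic + real mix
targeted = targeted_budget(3, 100)                # assumed: 3 problematic labels

print(full_real, mixed, targeted)  # 54000 3600 300
print(mixed / targeted)            # 12.0
```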
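To make the effect of a non-uniform label distribution concrete, the sketch below estimates how many plates must be read before every label reaches its requested sample count, assuming each of the 7 characters on a plate independently takes label c with probability p_c. Under a uniform distribution over 36 labels, rounding 3,600/7 ≈ 514.3 up gives 515 plates. The skewed distribution used here (a single rare label "Q" with p = 0.005) is purely hypothetical and is not the distribution shown in graph 40.

```python
import math
from typing import Dict


def plates_needed(per_label: Dict[str, int], label_prob: Dict[str, float],
                  chars_per_plate: int = 7) -> int:
    """Expected number of plates needed so every label reaches its requested count.

    A label with per-character probability p appears about chars_per_plate * p
    times per plate, so the rarest requested label dominates the total.
    """
    return max(
        math.ceil(count / (chars_per_plate * label_prob[label]))
        for label, count in per_label.items()
    )


labels = [str(d) for d in range(10)] + [chr(c) for c in range(ord("A"), ord("Z") + 1)]
request = {label: 100 for label in labels}             # 100 samples per label
uniform = {label: 1 / len(labels) for label in labels}

# Hypothetical skew: "Q" is rare; other probabilities are left unchanged
# for simplicity (they no longer sum exactly to 1).
skewed = dict(uniform)
skewed["Q"] = 0.005

print(plates_needed(request, uniform))  # 515
print(plates_needed(request, skewed))   # 2858
```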
The following summary is provided to facilitate an understanding of some of the innovative features unique to the disclosed embodiments and is not intended to be a full description. A full appreciation of the various aspects of the embodiments disclosed herein can be gained by taking the entire specification, claims, drawings, and abstract as a whole.
It is, therefore, one aspect of the disclosed embodiments to provide for an improved ALPR method and system.
It is another aspect of the disclosed embodiments to provide for methods and systems for bootstrapping an OCR engine for license plate recognition.
The aforementioned aspects and other objectives and advantages can now be achieved as described herein. Methods and systems are disclosed for bootstrapping an OCR engine for license plate recognition. Such an approach can optimize both the performance and upfront investment/time-to-market for an ALPR system through a multi-step cycle that includes, for example, training the required OCR engines using purely synthetically generated characters; identifying the subset of classifiers which require augmentation with real examples, and how many real examples are required for each; and deploying the OCR engine to the field with constraints on automation based on this analysis and operating in a “bootstrapping” period wherein some characters can be automatically recognized while others are sent for human review. Additionally, such an approach involves collecting the previously determined number of real examples required for augmenting the subset of classifiers and retraining each subset of identified classifiers as the number of real examples required becomes available.
This approach reduces the total manual annotation required for training classifiers in an ALPR system by leveraging synthetically generated character images (i.e., synthetic images) where possible and augmenting them as needed with samples collected from the site (i.e., real samples). The proportion and number of real samples required for each character can be determined automatically.
The accompanying figures, in which like reference numerals refer to identical or functionally-similar elements throughout the separate views and which are incorporated in and form a part of the specification, further illustrate the present invention and, together with the detailed description of the invention, serve to explain the principles of the present invention.
The particular values and configurations discussed in these non-limiting examples can be varied and are cited merely to illustrate at least one embodiment and are not intended to limit the scope thereof.
Next, as shown at block 56, a step or logical operation can be implemented for identifying the problematic characters based on the attributes of the trained classifiers. This includes determining the subset of classifiers that require augmentation with real examples and how many real examples each requires, as well as determining the confidence thresholds and/or a “go/no go” decision for initial automation of each classifier.
The OCR engine can then be deployed “in the field” with the above constraints on automation based on this analysis and can operate in a “bootstrapping” period wherein some characters are automatically recognized while others are sent for human review. The previously determined number of real examples required for augmenting the subset of classifiers can then be collected. As indicated at block 58, each classifier in the identified subset can be retrained as the required number of real examples becomes available. In other words, the step or logical operation shown at block 58 involves training one-vs-all classifiers using both synthetic and real images.
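The collect-and-retrain loop described above can be sketched as simple bookkeeping: real character images confirmed by human reviewers are counted per label, and a classifier is flagged for retraining once its previously determined quota is met. The class and method names are hypothetical; a real system would also store the images themselves for the retraining step, which this sketch omits.

```python
from collections import Counter
from typing import Dict, Iterable, List


class RealSampleCollector:
    """Hypothetical helper that tracks human-labeled real samples per label."""

    def __init__(self, required: Dict[str, int]):
        # Only labels that need augmentation (quota > 0) are tracked.
        self.required = {c: n for c, n in required.items() if n > 0}
        self.collected: Counter = Counter()
        self.retrained: set = set()

    def add_reviewed(self, labels: Iterable[str]) -> None:
        """Record labels assigned by human reviewers to real character images."""
        for label in labels:
            if label in self.required:
                self.collected[label] += 1

    def ready_to_retrain(self) -> List[str]:
        """Labels whose quota is met and whose classifier is not yet retrained."""
        return sorted(c for c, n in self.required.items()
                      if self.collected[c] >= n and c not in self.retrained)

    def mark_retrained(self, label: str) -> None:
        self.retrained.add(label)


collector = RealSampleCollector({"A": 0, "D": 3, "O": 2})
collector.add_reviewed(["D", "O", "A", "O"])
print(collector.ready_to_retrain())  # ['O']
collector.add_reviewed(["D", "D"])
print(collector.ready_to_retrain())  # ['D', 'O']
collector.mark_retrained("O")
print(collector.ready_to_retrain())  # ['D']
```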
Implementation of the method 50 depicted in
The multi-step process/method 50 shown in
After synthetic images are generated for each character, the step shown at block 54 involves training a one-vs-all classifier for each character using the synthetic images. A variety of features and classifiers can be used. Example features include SMQT, LBP, SURF, SIFT, HOG, and Fisher vectors. Example classifiers include, but are not limited to, SVM, neural networks, SNOW, and deep belief networks. There is no restriction on the choice of features or classifier.
The step or logical operation depicted at block 56, identifying the problematic characters based on the attributes of the trained classifiers, involves determining: 1) the number of real samples required to further improve each character classifier; and 2) the confidence threshold for automated conclusions for each character classifier during an initial operational mode. In this operation, both the number and proportion of real samples required for augmenting each character classifier, as well as the confidence threshold for initial operation, are automatically determined from the attributes of the classifiers trained using synthetic images.
Before real samples are collected from the site, problematic characters or labels can be identified based on the attributes of the classifiers trained using the synthetic images, thereby minimizing the time and effort required for collecting and labeling real samples.
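Because the text places no restriction on features or classifier, the following sketch swaps in a plain perceptron (chosen for brevity; it is neither SVM nor SNOW) operating on already-extracted feature vectors, to show only the one-vs-all structure: each character's classifier treats that character's synthetic samples as positives and all other characters' samples as negatives. The per-classifier epoch count it returns is one example of the kind of training attribute discussed below for SNOW. All names and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float, int]  # (weights, bias, epochs used)


def train_perceptron(xs: List[Vector], ys: List[int], max_epochs: int = 1000) -> Model:
    """Binary perceptron with labels +1/-1; stops after the first error-free epoch."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in zip(xs, ys):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b, epoch
    return w, b, max_epochs


def train_one_vs_all(samples: Dict[str, List[Vector]]) -> Dict[str, Model]:
    """Train one binary classifier per character: that character vs. all others."""
    models = {}
    for char in samples:
        xs, ys = [], []
        for other, vectors in samples.items():
            for v in vectors:
                xs.append(v)
                ys.append(1 if other == char else -1)
        models[char] = train_perceptron(xs, ys)
    return models


def predict(models: Dict[str, Model], x: Vector) -> str:
    """Return the character whose classifier gives the highest score."""
    def score(model: Model) -> float:
        w, b, _ = model
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=lambda c: score(models[c]))


# Toy stand-ins for feature vectors (e.g., HOG) of synthetic character images.
synthetic = {
    "A": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "B": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
    "C": [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]],
}
models = train_one_vs_all(synthetic)
print(predict(models, [0.95, 0.05, 0.0]))  # A
```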
Even though the training error is 0 for all 36 characters, the number of support vectors is notably higher for the problematic characters, which leads to higher error rates when tested with real images as shown, for example, by the data depicted in graph 63 of
Although support vectors are specific to SVM classifiers, the problematic characters can also be identified by attributes of other classifiers. For example, the number of iterations in training can be used as a metric to identify these problematic characters when SNOW is used as the classifier.
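As a sketch of this identification step, the function below flags characters whose classifier attribute (support-vector count for an SVM, or training iterations for SNOW) is unusually large relative to the other characters. The text does not specify a cutoff; the mean-plus-one-standard-deviation rule and the support-vector counts used here are hypothetical.

```python
import statistics
from typing import Dict, List


def flag_problematic(attribute: Dict[str, float], k: float = 1.0) -> List[str]:
    """Return characters whose classifier attribute exceeds mean + k * std.

    `attribute` could hold support-vector counts (SVM) or training
    iterations (SNOW). The mean + k * std cutoff is an assumed rule.
    """
    values = list(attribute.values())
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sorted(c for c, v in attribute.items() if v > mu + k * sigma)


# Hypothetical support-vector counts from synthetically trained classifiers.
sv_counts = {"0": 40, "D": 95, "O": 110, "8": 42, "B": 45, "1": 38, "7": 41}
print(flag_problematic(sv_counts))  # ['D', 'O']
```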
Once the problematic symbols are identified, we collect more real samples for these problematic labels according to severity. If the number of support vectors is used as a metric, then the number of real images required for a character can be calculated as:
wherein μ and σ are the mean and standard deviation of the number of support vectors across all the characters. Note that this way of calculating the number of real images required is not restrictive; other methods that assign more samples to problematic characters can also be used.
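As an illustration of one allocation rule of the kind permitted here (not necessarily the formula referenced above), the sketch below uses μ and σ as defined: a character whose support-vector count lies more than k standard deviations above μ receives real samples in proportion to how far above μ it lies, and every other character receives none. The rule, the `per_sigma` scale factor, and the counts are assumptions made for illustration.

```python
import math
import statistics
from typing import Dict


def real_samples_required(sv_counts: Dict[str, int], per_sigma: int = 50,
                          k: float = 1.0) -> Dict[str, int]:
    """Hypothetical allocation rule (not the disclosed formula).

    A character whose support-vector count s lies more than k standard
    deviations above the mean mu gets ceil(per_sigma * (s - mu) / sigma) real
    samples; every other character gets none.
    """
    values = list(sv_counts.values())
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    required = {}
    for char, s in sv_counts.items():
        z = (s - mu) / sigma if sigma > 0 else 0.0
        required[char] = math.ceil(per_sigma * z) if z > k else 0
    return required


# Hypothetical support-vector counts; "D" and "O" lie well above the mean.
sv_counts = {"0": 40, "D": 95, "O": 110, "8": 42, "B": 45, "1": 38, "7": 41}
print(real_samples_required(sv_counts))
```

A larger `per_sigma` spends more of the labeling budget on the worst characters; `k` controls how many characters receive any real samples at all.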
Finally,
For the initial set of synthetically trained classifiers, most will perform quite well while only a few problematic symbols will have poor performance. Using the method outlined above, we can identify which classifiers are likely to perform well and which are likely to struggle. Leveraging this information, it is possible to determine which of the classifiers could be allowed to become fully operational without additional real training examples and which should not. This could be achieved with a simple gating function based on the number of support vectors or the number of additional training examples required:
Here, A determines whether or not a given classifier should be automated, r is the number of additional real training examples required for acceptable performance (as calculated using the method above), and T is a predetermined threshold. Extending this technique beyond a simple threshold, one could calculate a confidence threshold for automation for each classifier based on the predicted additional training examples required:
Here, synthetically trained classifiers that require larger numbers of support vectors are essentially restricted by requiring much larger confidence values for automation. This would limit their automation rate (requiring more human review), but would still allow some of the higher confidence (and thus more accurate) results to be generated automatically. Note that these are examples of how the confidence thresholds may be calculated and are not meant to be exhaustive or limiting.
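The gating function A and a per-classifier confidence threshold can be sketched as follows. Both forms are assumptions: the comparison r ≤ T is one reading of the gating rule, and the threshold mapping, which rises from a base value toward 1.0 as r grows, is a hypothetical example of the monotone relationship described rather than the disclosed equation.

```python
def automate(r: int, T: int) -> bool:
    """Gating function A: True (automate) if the classifier needs at most T more
    real examples. The <= comparison is an assumed reading of the rule."""
    return r <= T


def confidence_threshold(r: int, base: float = 0.8, r_half: float = 100.0) -> float:
    """Hypothetical per-classifier threshold: equals `base` when r == 0 and
    rises toward 1.0 as r grows (halfway between base and 1.0 at r == r_half)."""
    return base + (1.0 - base) * r / (r + r_half)


# Hypothetical numbers of additional real examples required per classifier.
required = {"A": 0, "D": 65, "O": 92}
for char, r in required.items():
    print(char, automate(r, T=50), round(confidence_threshold(r), 3))
```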
As discussed earlier, the OCR engine can be deployed to the field in a “bootstrapping” mode. In this mode, the OCR engine is allowed to operate under the constraints determined above. Thus, only some of the classifiers are allowed to generate fully automated conclusions, and each classifier may have its own predetermined confidence threshold. Note that during this bootstrapping mode, the automation rate will be somewhat lower than in the final, fully operational mode. However, it is typical for ALPR systems to be deployed in a staged fashion. In fact, some ALPR implementations may have a timeline built into their pricing to the customer that assumes certain levels of automation as system performance improves over time. The key point is that, without the disclosed embodiments, no known state-of-the-art system today is able to become operational in any appreciable manner using purely synthetically trained classifiers. By using the disclosed approach, it is possible to achieve reasonable performance (maintaining accuracy while also achieving fairly high levels of automation) without requiring any real training examples.
Regarding collecting the required number of real samples in the field: as the system operates in bootstrap mode, most of the classifiers will perform normally (e.g., fully automated) while the subset of problematic classifiers will be constrained. Segmented character images for which no high-confidence match is found are sent for human review. For those problematic characters that are either disabled (e.g., gated off) or for which a much higher confidence threshold has been imposed, most or all of the real examples of these symbols encountered in the field will be directed to human review.
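A minimal sketch of bootstrap-mode routing follows, assuming each classifier emits a confidence in [0, 1] and carries the gating decision and confidence threshold determined earlier. The top-scoring label is accepted automatically only when its classifier is enabled and clears its own threshold; otherwise the character image goes to human review. The `ClassifierPolicy` type, the `route` function, and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ClassifierPolicy:
    automated: bool   # gating decision A for this classifier
    threshold: float  # minimum confidence for an automated conclusion


def route(scores: Dict[str, float],
          policies: Dict[str, ClassifierPolicy]) -> Tuple[str, str]:
    """Return ("auto", label) if the best-scoring classifier is enabled and clears
    its own threshold; otherwise ("review", label) for human review."""
    label = max(scores, key=scores.get)
    policy = policies[label]
    if policy.automated and scores[label] >= policy.threshold:
        return "auto", label
    return "review", label


policies = {
    "A": ClassifierPolicy(automated=True, threshold=0.80),
    "D": ClassifierPolicy(automated=False, threshold=0.88),  # gated off
    "O": ClassifierPolicy(automated=True, threshold=0.95),   # raised threshold
}
print(route({"A": 0.91, "D": 0.05, "O": 0.04}, policies))  # ('auto', 'A')
print(route({"A": 0.02, "D": 0.90, "O": 0.08}, policies))  # ('review', 'D')
print(route({"A": 0.03, "D": 0.07, "O": 0.90}, policies))  # ('review', 'O')
```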
Regarding the step of retraining the one-vs-all classifiers for each character using both synthetic and real images depicted at block 58 in
Note that the performance of the disclosed embodiments has been tested to demonstrate the benefit and feasibility of this approach. In one experiment, CA plates were utilized and 2500 real samples were collected for each character. 2000 synthetic images per character were also generated. 1500 out of 2500 real samples per character were used to train the classifier and the remaining 1000 samples were employed for testing. The images were scaled to 40×20 before feature extraction.
Using the synthetic images, the problematic characters were first identified. HOG features were then extracted and a linear SVM classifier was trained in a one-vs-all fashion for each character. The number of support vectors is illustrated in graph 90 of
For the problematic characters, 100 real samples each were included, while no real samples were included for the other characters, and SNOW classifiers were trained using SMQT features on 2000 synthetic images per character together with these real images (600 real images in total). For comparison purposes, the classifiers were also trained in three other settings. In the first setting, SNOW classifiers were trained using 2000 synthetic images per character. In the second setting, the SNOW classifiers were trained using 1500 real images per character. In the third setting, the SNOW classifiers were trained using 100 real samples for every character along with 2000 synthetic images per character, amounting to 72,000 synthetic and 3,600 real images in total.
Graph 100 demonstrates the performance for the CA character set, where 6/36 characters are problematic. For most other states, the number of problematic characters is typically lower, such as 3/36 for CT; in that case, an advantage of the disclosed approach is that only 1/12 the number of real images (300 versus 3,600) is required to achieve performance similar to the naïve implementation.
One of the unique features of the disclosed embodiments involves utilizing an uneven number of real samples for each character when a mix of real and synthetic images is used for training the OCR engine. Another unique feature involves identifying the problematic characters based on the attributes of the trained classifiers and determining the number of real samples required for each character. An additional unique feature includes automatically identifying operational constraints (e.g., individual confidence thresholds, which characters can be processed automatically, etc.) for an initial bootstrapping mode of in-field operation.
Advantages of the disclosed approach include a reduction in time and effort required for training an OCR engine in the deployment of ALPR systems in new jurisdictions, along with rendering the training investment more predictable, and reducing the time required to get a system operational in the field (since in-field capability can be provided even while bootstrapping). All of these advantages help to reduce upfront costs for field deployment.
A business need is also solved with the disclosed approach because it saves significant manual review cost. Currently, the gathering of images for OCR training takes 2-3 months and all images are manually reviewed. Any reduction in this time leads directly to profit.
As can be appreciated by one skilled in the art, embodiments can be implemented in the context of a method, data processing system, or computer program product. Accordingly, embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects, all generally referred to herein as a “circuit” or “module.” Furthermore, embodiments may in some cases take the form of a computer program product on a computer-usable storage medium having computer-usable program code embodied in the medium. Any suitable computer readable medium may be utilized including hard disks, USB Flash Drives, DVDs, CD-ROMs, optical storage devices, magnetic storage devices, server storage, databases, etc.
Computer program code for carrying out operations of the present invention may be written in an object oriented programming language (e.g., Java, C++, etc.). The computer program code, however, for carrying out operations of particular embodiments may also be written in conventional procedural programming languages, such as the “C” programming language or in a visually oriented programming environment, such as, for example, Visual Basic.
The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer. In the latter scenario, the remote computer may be connected to a user's computer through a local area network (LAN), a wide area network (WAN), a wireless data network (e.g., Wi-Fi, WiMAX, 802.xx), or a cellular network, or the connection may be made to an external computer via most third-party supported networks (for example, through the Internet utilizing an Internet Service Provider).
The embodiments are described at least in part herein with reference to flowchart illustrations and/or block diagrams of methods, systems, and computer program products and data structures according to embodiments of the invention. It will be understood that each block of the illustrations, and combinations of blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function/act specified in the block or blocks.
The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the block or blocks.
As illustrated in
The following discussion is intended to provide a brief, general description of suitable computing environments in which the system and method may be implemented. Although not required, the disclosed embodiments will be described in the general context of computer-executable instructions, such as program modules, being executed by a single computer. In most instances, a “module” constitutes a software application.
Generally, program modules include, but are not limited to, routines, subroutines, software applications, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types and instructions. Moreover, those skilled in the art will appreciate that the disclosed method and system may be practiced with other computer system configurations, such as, for example, hand-held devices, multi-processor systems, data networks, microprocessor-based or programmable consumer electronics, networked PCs, minicomputers, mainframe computers, servers, and the like.
Note that the term module as utilized herein may refer to a collection of routines and data structures that performs a particular task or implements a particular abstract data type. Modules may be composed of two parts: an interface, which lists the constants, data types, variables, and routines that can be accessed by other modules or routines; and an implementation, which is typically private (accessible only to that module) and which includes source code that actually implements the routines in the module. The term module may also simply refer to an application, such as a computer program designed to assist in the performance of a specific task, such as word processing, accounting, inventory management, etc. The module 252 shown in
Based on the foregoing, it can be appreciated that a number of embodiments are disclosed. For example, in an embodiment, a method for optimizing an ALPR system can be implemented. Such a method can include steps or logical operations of, for example: generating synthetic images with respect to each character on a license plate image; training one or more classifiers utilizing the synthetic images; determining a number of samples of real images required for each character based on attributes of the classifier(s) trained utilizing the synthetic images; and retraining the classifier(s) utilizing the synthetic images and the real images as the real images become available. In some embodiments, the aforementioned classifier(s) can be implemented as, for example, a one-vs-all classifier and/or an OCR engine.
In another embodiment, the step or operation for determining a number of samples of real images can further comprise a step or logical operation of identifying a subset of the classifier(s) requiring augmentation with the real images and a number of real images required for each subset of the classifier(s).
In yet another embodiment, steps or logical operations can be provided for deploying the OCR engine with constraints based on the training and retraining of the classifier(s); and operating the OCR engine in a bootstrapping period wherein some characters are automatically recognized while other characters are transmitted for human review.
In another embodiment, steps or logical operations can be provided for collecting a previously determined number of the real images required for augmenting each subset of the classifier(s); and retraining each subset of identified classifiers among the classifier(s) as a number of real examples of the real images required becomes available.
In another embodiment, a system for optimizing an ALPR system can be implemented. Such a system can include at least one processor; and a computer-usable medium embodying computer program code, the computer-usable medium capable of communicating with the processor(s). The computer program code can include instructions executable by the processor and configured, for example, for: generating synthetic images with respect to each character on a license plate image; training at least one classifier utilizing the synthetic images; determining a number of samples of real images required for each character based on attributes of the classifier(s) trained utilizing the synthetic images; and retraining the classifier(s) utilizing the synthetic images and the real images as the real images become available.
In another embodiment, a system for optimizing an ALPR system can be implemented. Such a system can include, for example, at least one image capturing unit (e.g., a video camera) that captures a license plate image; at least one processor that communicates electronically with the image capturing unit(s); and a computer-usable medium embodying computer program code, the computer-usable medium capable of communicating with the processor(s). The computer program code can include instructions executable by the processor(s) and configured for: generating synthetic images with respect to each character on the license plate image; training at least one classifier utilizing the synthetic images; determining a number of samples of real images required for each character based on attributes of the classifier(s) trained utilizing the synthetic images; and retraining the classifier utilizing the synthetic images and the real images as the real images become available.
It will be appreciated that variations of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Also, that various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.