1. Field
An exemplary embodiment of this invention relates to the field of Automatic Number Plate Recognition (ANPR) systems. More specifically, an exemplary embodiment of the invention relates to a method and a system capable of extracting the location of a car's number plate from a series of 2-D images, using a device equipped with a camera of any kind.
2. Description of the Related Art
Many known devices are able to detect the location of a car's number plate and then recognize the plate number, producing as output alphanumeric text corresponding to the characters of the plate.
There are many approaches to car-plate detection and recognition. Most of these systems rely on a Personal Computer (PC) to carry out the required processing tasks. In such systems, a video digitizer samples the camera sensor, and a PC running the car-plate detection and recognition software then processes the data. However, these implementations are not easily portable, are bulky, require a special power supply and are difficult to install on site.
When ANPR systems are used to recognize the plates of cars moving on highways, recognition speed is another important characteristic. To catch fast-moving cars, the plate detector must analyze every frame of the video sequence very quickly. The detection speed depends on both the algorithm and the processor speed. Today's common processors, and even dedicated digital signal processor (DSP) devices, are not able to deliver the required performance.
An exemplary embodiment of the invention refers to a stand-alone computer-camera system capable of extracting car plates. This is achieved by using an on-board computer to analyze the video stream recorded by the camera sensor; the system can be used with any type of camera sensor. Its specific characteristics make it extremely fast and able to catch the plates of cars moving at high speed.
The algorithms incorporated in this system are specially implemented so that they can be ported to an embedded computer system, which usually has lower processing power and less memory than a general-purpose computer.
The exemplary embodiments of the invention will be described in detail, with reference to the following figures, wherein:
In the current description we refer to the detection of multiple car-plates from a video sequence and the extraction of the coordinates of each plate. In accordance with an exemplary embodiment of the present invention, location information of car-plates is extracted from an image frame sequence by using a system like the one shown in
The Car Plate Detection Device through which the system detects car-plates and extracts the information of the coordinates is shown in
This exemplary system functions as follows: first, two consecutive frames Ii and Ii+1 (12 in
Data from Binary Image Data memory are then fed to the Morphological Filtering unit (225 in
The next step is the classification of the blobs in order to identify the car plates. This procedure takes place in the Region Classification unit (228 in
The results of this extraction are then fed into the Plate Output unit (234 in
A final step of processing concerns the segmentation of the plate digits that exist in the detected plate. This procedure takes place within the Digit Extraction Unit (229 in
The results of this extraction are then fed into the Digit Output unit (230 in
In the following paragraphs, the above-mentioned units are explained in detail.
Moving Object Detection Unit (227 in
This unit detects the motion of pixels from consecutive video frames. The goal is to identify one or more moving cars against a steady background as viewed by the camera. The background corresponds to the view of the camera when no car is present and nothing else moves. However, this complete absence of motion rarely occurs under real-world conditions, so the background is instead represented by a background model. The background model is an image obtained with a statistical methodology, which absorbs minor differences that may occur due to slight variations in lighting conditions, electronic noise from the camera sensor, or minor motions inherent in the video scene (e.g., tree leaves moving in the wind).
Given the background model, any moving pixels can be identified in a video frame by subtracting the background model from this particular frame. Therefore referring to
As the motion in the current frame becomes more intense, more pixels differ from the background model.
The background model can be calculated using statistical techniques: each pixel in the background model is modeled using the corresponding pixels of a number of consecutive frames, as shown in
$$P_{BM}^{k} = 0.5\,P_{BM}^{k} + 0.5\,P_{k}^{i}, \quad i = 1 \ldots N \quad (1a)$$
In an exemplary embodiment, a weighted average measure is used, described by the following relation:
$$P_{BM}^{k} = \alpha\,P_{BM}^{k} + (1 - \alpha)\,P_{k}^{i}, \quad i = 1 \ldots N \quad (2a)$$
The only difference between equations (1a) and (2a) is the parameter α, which takes the value 0.5 in the case of the running average. Values of α smaller than 0.5 make the system more robust to background changes: the background model changes faster or, equivalently, the system has a shorter memory and forgets its history more quickly. The smaller α gets, the faster the background model changes.
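The update of eq. (2a) and the subtraction step can be sketched as follows, assuming grayscale frames stored as lists of rows. The motion threshold `th_motion`, its default value, the default `alpha`, and the order "compare with the current model, then update it" are assumptions made for illustration, since the corresponding part of the text is cut off; `alpha` plays the role of α.

```python
def update_background(bm, frame, alpha=0.5):
    """Eq. (2a): each background pixel becomes alpha*P_BM + (1 - alpha)*P.
    alpha = 0.5 reproduces the running average of eq. (1a)."""
    return [[alpha * b + (1.0 - alpha) * p for b, p in zip(b_row, f_row)]
            for b_row, f_row in zip(bm, frame)]


def moving_mask(bm, frame, th_motion):
    """Subtract the background model from the frame; pixels whose absolute
    difference exceeds th_motion (hypothetical parameter) are marked 1 (moving)."""
    return [[1 if abs(p - b) > th_motion else 0 for b, p in zip(b_row, f_row)]
            for b_row, f_row in zip(bm, frame)]


def detect_motion(frames, alpha=0.3, th_motion=25):
    """Yield one moving-pixel mask per frame, starting from an all-zero model."""
    height, width = len(frames[0]), len(frames[0][0])
    bm = [[0.0] * width for _ in range(height)]   # model initialized with zeros
    for frame in frames:
        yield moving_mask(bm, frame, th_motion)     # compare with the current model
        bm = update_background(bm, frame, alpha)    # then absorb the frame into it
```

A smaller `alpha` lets the model follow lighting changes more quickly, at the cost of absorbing slowly moving objects into the background sooner.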
More specifically, the procedure of detecting a car-plate is the following: As a first step, the background model BM is calculated. In the first iteration, the background model is initialized with a zero value for every pixel (52 in
The background model is then subtracted from the current frame (54 in
As a final step, the system outputs the coordinates of the moving object using the following procedure: First all the coordinates of the pixels characterized as <<moving>> are sorted (58 in
Image Binarization Unit (224 in
The Binarization unit (224 in
The binarization procedure compares each pixel in the image with a threshold value TH_bin and forms a new binary image in one-to-one correspondence with the initial image: pixels in the original image with a value greater than TH_bin correspond to pixels with value 255 in the binary image, and pixels with a value lower than TH_bin correspond to pixels with value 0.
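The global binarization rule can be written in a few lines. This is a minimal sketch on a list-of-rows image; the text does not say how a pixel exactly equal to TH_bin is treated, so mapping it to 0 is an assumption.

```python
def binarize_global(image, th_bin):
    """Map pixels greater than th_bin to 255 and all others to 0.
    Pixels exactly equal to th_bin go to 0 here (assumption)."""
    return [[255 if p > th_bin else 0 for p in row] for row in image]
```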
However, binarization using a global threshold is not an optimal solution. A major problem with global thresholding is that changes in illumination across the scene may make some parts brighter (in the light) and some parts darker (in shadow) in ways that have nothing to do with the objects in the image.
Such uneven illumination can be handled by determining thresholds locally. That is, instead of having a single global threshold, we allow the threshold itself to smoothly vary across the image.
Local Thresholding
In the current invention, a local thresholding method is used, which computes the threshold from local edge properties within a window.
Automatic Threshold Adaptation Unit (223 in
The selection of the threshold in the Image Binarization unit is a critical task, since it influences the content of the binary image and ultimately the precision of the detection system. The appropriate value of this threshold usually changes with the image content or with the lighting conditions. Therefore, using a constant (global or local) threshold, although possible, is not optimal. For this reason, an automatic threshold adaptation unit is included in the system described in the current invention. The system is able to adapt a global or local threshold according to the results of the detection process.
In an exemplary embodiment, a local thresholding approach is used which employs threshold adaptation using feedback from the system output and more specifically from the Digit Segmentation unit (229 in
More specifically the unit functions as follows: For every frame IK (71 in
An edge map is defined as an image containing image edges. An image edge is a point in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The points at which image brightness changes sharply are typically organized into a set of curved line segments termed edges.
Edge detection is the process of obtaining the edge map of an image. The detection process typically filters the image by convolving it with a standard matrix known as an “operator”. This filtering results in an image with increased intensity for pixels belonging to an edge and decreased intensity for pixels not belonging to an edge. Usually, as a final step, the binary edge map is obtained by binarizing the edge-map image with a threshold. The result is an image with white pixels at the edges and black pixels everywhere else.
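As an illustration of edge detection followed by binarization, the sketch below uses the Sobel operator with the common |Gx| + |Gy| magnitude approximation. The operator actually used in the embodiment is not named in this section, so Sobel and the threshold parameter `th_edge` are assumptions. Border pixels are left at 0.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def binary_edge_map(image, th_edge):
    """Return a 0/1 edge map: 1 where |Gx| + |Gy| exceeds th_edge.
    The kernel is applied by correlation; for the gradient magnitude this is
    equivalent to convolution, since flipping the kernel only flips the sign."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = image[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            edges[y][x] = 1 if abs(gx) + abs(gy) > th_edge else 0
    return edges
```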
In an exemplary embodiment, the binary edge map (76 in
The Threshold Trimming sub-system functions as follows: an arbitrary, predetermined initial threshold value THRES_1 = THRES_1INI is set equal to the largest integer smaller than 2^Nb/2, where Nb is the number of bits used to represent the pixel value (e.g., for an 8-bit representation this number equals 127). The plate detection and digit segmentation process is then run and, when it is finished, the number of detected digits is fed from the Digit Segmentation unit (229 in
Each threshold from Threshold Trimming sub-system is fed to the Thresholding I sub-system (75 in
As a next step, the input frame IK and the binary edge map EK are partitioned into NBX×NBY blocks of w×w pixels each. Then, for every frame IK, the following procedure is carried out iteratively for every block IKij (75 in
The block IKij is taken (75 in
The next step is the binarization of this semi-binary block DKij by applying a thresholding scheme (792 in
$$\mathrm{THRES}_2 = \sum_{x=1}^{w} \sum_{y=1}^{w} D^{ij}_{xy} \quad (1)$$
where $D^{ij}_{xy}$ is the pixel in the x-th column and the y-th row of the DKij block. The result is a binary version BKij (793 in
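The block-wise thresholding can be sketched as below. This is a sketch under two stated assumptions: the semi-binary image D is taken as already formed (the step that builds it is cut off in this section), and THRES_2 is taken as the block mean, i.e. the double sum of eq. (1) divided by the number of pixels in the block. Partial blocks at the right and bottom borders are handled by using their actual pixel count.

```python
def binarize_blocks(d, w):
    """Binarize the semi-binary image d block by block (blocks of w x w pixels).
    Assumption: THRES_2 = (sum of the block's pixels) / (pixels in the block)."""
    h, width = len(d), len(d[0])
    out = [[0] * width for _ in range(h)]
    for by in range(0, h, w):
        for bx in range(0, width, w):
            rows = range(by, min(by + w, h))
            cols = range(bx, min(bx + w, width))
            block_sum = sum(d[y][x] for y in rows for x in cols)
            thres_2 = block_sum / (len(rows) * len(cols))
            for y in rows:
                for x in cols:
                    out[y][x] = 255 if d[y][x] > thres_2 else 0
    return out
```

Because each block gets its own threshold, a dim plate region and a bright sky region can both be binarized sensibly in the same frame.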
Automatic Threshold Calculation Unit (232 in
As an alternative to automatic threshold adaptation, an Automatic Threshold Calculation unit can be used. In this case, a global threshold calculation algorithm is applied, which can lead to acceptable performance.
There are a few automatic global threshold calculation approaches that can be used in this system [2]:
Algorithm of Ridler and Calvard, which optimizes the process of changing a gray-level image to a bimodal image, while retaining the best possible illumination of the image.
Algorithm of Otsu, which is a classical algorithm in image binarization. This algorithm transforms a gray-level image to a binary image for classifying foreground and background with a global threshold. This algorithm can be applied iteratively to a gray-scale histogram of an image for generating threshold candidates.
Algorithm of Pun, which proposes an optimal criterion for image thresholding. Kapur et al. later corrected and improved Pun's algorithm by assuming two probability distributions, for objects and background, and maximizing the entropy of the image to obtain the optimal threshold.
Algorithm of Kittler and Illingworth, which proposes a minimum error thresholding algorithm that minimizes the probability of classification error by fitting an error expression. It assumes that the image can be characterized by a mixture of two Gaussian distributions of object and background pixels.
Algorithm of Fan et al., proposing a fast entropic technique to obtain a global threshold automatically by reducing complexity in computation.
Algorithm of Portes de Albuquerque et al., which proposes an entropic thresholding algorithm derived from the concept of non-extensive Tsallis entropy.
Algorithm of Xiao et al. proposing an entropic thresholding algorithm based on the gray-level spatial correlation (GLSC) histogram. This is a revision and extension of Kapur et al.'s algorithm.
In one exemplary embodiment, the algorithm of Kapur has been selected for implementation [3]. This algorithm assumes two probability distributions, for objects pobg (foreground) and background pbg, and maximizes the between-class entropy of the image to obtain the optimal threshold.
The between-class entropy of the thresholded image is defined as:
$p_i$ is the probability of pixel value i appearing in the current image, defined as the ratio of the number of occurrences of that value to the total number of pixels.
For bi-level thresholding, the optimal threshold is:
$$TH_{optimal} = \arg\max_{TH} \{ f_1(TH) \} \quad (7)$$
In other words the optimal threshold value is the value of TH for which the quantity f1 is maximized for each frame.
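Because the between-class entropy equations are not reproduced in this section, the sketch below uses the standard formulation of Kapur et al.: for each candidate TH, the histogram is split into a background part (values up to TH) and an object part (values above TH), each part is renormalized, and f1(TH) is the sum of the two entropies. Treat the exact form of f1 as an assumption; the arg max of eq. (7) is taken over all TH for which both parts are non-empty.

```python
import math


def kapur_threshold(image, levels=256):
    """Return the TH that maximizes f1(TH) = H_bg(TH) + H_obj(TH) (Kapur's entropy)."""
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    prob = [h / total for h in hist]  # p_i: occurrences of value i / number of pixels

    best_th, best_f = 0, -math.inf
    cum = 0.0
    for th in range(levels - 1):
        cum += prob[th]              # probability mass of values <= th
        p_bg, p_obj = cum, 1.0 - cum
        if p_bg <= 0.0 or p_obj <= 0.0:
            continue                 # one class would be empty
        h_bg = -sum(p / p_bg * math.log(p / p_bg) for p in prob[:th + 1] if p > 0)
        h_obj = -sum(p / p_obj * math.log(p / p_obj) for p in prob[th + 1:] if p > 0)
        if h_bg + h_obj > best_f:
            best_th, best_f = th, h_bg + h_obj
    return best_th
```

Pixels with values greater than the returned threshold would then be treated as one class and the rest as the other, as in the global binarization rule.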
Threshold Input Unit (231 in
This is an optional input unit, which can be used to enter a threshold value manually.
Morphological Filtering Unit (225 in
In the presence of electronic noise or physical obstacles (e.g., dust), the binarization process may produce binary noise, which manifests as white spots. These spots can significantly increase the processing time. This is because the Connected Component Analysis unit (232 in
To overcome this problem, the Morphological Filtering unit removes isolated pixels to produce a cleaner binary image.
The unit implements the following morphological operation: In each video frame (80 in
For each window, the number of black pixels Nb and the number of white pixels Nw are counted. If Nb > Nw, the central pixel of the 3×3 window is set to black (82 in
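The 3×3 rule can be sketched as a majority filter on an image whose pixels are 0 (black) or 255 (white). Two details are assumptions, since the text is cut off: when Nb is not greater than Nw the central pixel is set to white, and border pixels (which lack a full 3×3 window) are copied unchanged. The filter reads from the input and writes to a copy, so earlier decisions do not influence later windows.

```python
def majority_filter_3x3(binary):
    """Set each interior pixel to the majority colour of its 3x3 window."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]  # border pixels are kept as they are
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            n_b = sum(1 for j in (-1, 0, 1) for i in (-1, 0, 1)
                      if binary[y + j][x + i] == 0)
            n_w = 9 - n_b
            out[y][x] = 0 if n_b > n_w else 255
    return out
```

An isolated white spot inside a black area has Nb = 8 and Nw = 1, so it is removed.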
Connected Component Analysis Unit (226 in
This unit labels the regions of the binary image using a connected-components algorithm. The goal is to label each object within the binary image, which involves assigning a label to each pixel: pixels that are connected are given the same label. At the end of this procedure, each object corresponds to the set of pixels sharing the same label.
In an exemplary embodiment, a run-length based connected-component algorithm is used [4], which is similar to the two-pass connected-component algorithm [5] but operates on run lengths rather than pixels, resulting in a more efficient implementation in terms of memory and processing power.
The stages involved in this implementation are as follows:
1. Encoding pixels to runs (using run-length encoding);
2. Initial labeling and propagation of labels;
3. Resolving of conflicts; and
4. Translating run labels to connected components.
Encoding Pixels to Runs (Using Run-Length Encoding)
In accordance with an exemplary embodiment of the current invention, a run-length encoded representation is used for labeling. The run-length encoded format is much more compact than a binary image (each run carries a single label), so the sequential label-propagation stage that follows is much faster than in the conventional algorithm.
Run-length encoding works as follows: Consider the binary image frame (91 in
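Run-length encoding of a binary frame can be sketched as follows. Each run is stored as the triple (s, e, r) used later in the text: start column, end column and row. Treating any non-zero pixel as foreground and making the end column inclusive are conventions chosen for this sketch.

```python
def encode_runs(binary):
    """Run-length encode a binary image into runs (s, e, r): start column,
    inclusive end column and row of each horizontal run of foreground pixels."""
    runs = []
    for r, row in enumerate(binary):
        s = None
        for x, p in enumerate(row):
            if p and s is None:
                s = x                        # a run starts here
            elif not p and s is not None:
                runs.append((s, x - 1, r))   # the run ended at the previous pixel
                s = None
        if s is not None:
            runs.append((s, len(row) - 1, r))  # run touching the right border
    return runs
```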
Initial Labeling and Propagation of Labels
This stage involves initial labeling and propagation of labels (
After that, the 4-way or 8-way connectivity of each run is checked. In 4-way connectivity, the adjacent pixels in four directions (up, down, left, right) are checked: if they are foreground pixels, they are connected; otherwise, they are unconnected. Consider for example pixels P3 (98 in
In 8-way connectivity, the diagonal directions are also checked. Consider for example pixels P1 (95 in
For each run Ri with identity IDi, excluding runs on the last row of the image, the runs Rj one row below Ri are checked for a connection. In terms of run-length encoded lines, a 4-way connection between two runs Ri and Rj means that the following conditions hold:
$$s_i \le e_j \quad (8)$$
and
$$e_i \ge s_j \quad (9)$$
An 8-way connection between two runs Ri and Rj means that the following conditions hold:
$$s_i \le e_j + 1 \quad (10)$$
and
$$e_i + 1 \ge s_j \quad (11)$$
A connected run Rj in the row below Ri is assigned the identity IDi if and only if its own identity IDj is unassigned. If there is a conflict (e.g., if the overlapping run has already been assigned IDj), the equivalence of run i, EQi, is set to IDj.
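The connection conditions (8) to (11) translate directly into a test on two runs (s, e, r), the second lying one row below the first. The helper `connections`, which lists all connected pairs, is added only for illustration.

```python
def runs_connected(run_i, run_j, eight_way=False):
    """Eqs. (8)-(9) for 4-way and (10)-(11) for 8-way connectivity, where
    run_j must lie exactly one row below run_i."""
    s_i, e_i, r_i = run_i
    s_j, e_j, r_j = run_j
    if r_j != r_i + 1:
        return False
    if eight_way:
        return s_i <= e_j + 1 and e_i + 1 >= s_j
    return s_i <= e_j and e_i >= s_j


def connections(runs, eight_way=False):
    """List index pairs (i, j) where run j is one row below run i and touches it."""
    return [(i, j) for i, run_i in enumerate(runs) for j, run_j in enumerate(runs)
            if runs_connected(run_i, run_j, eight_way)]
```

With 8-way connectivity, runs that only touch diagonally (for example columns 0 to 2 above columns 3 to 5) are connected; with 4-way connectivity they are not.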
Resolving of Conflicts
For each run, the EQ and ID values should be equal. A difference between these two values indicates a conflict, which occasionally happens when objects of particular shapes are encountered; for example, this problem may occur when a ‘U’-shaped object is encountered. A third stage must therefore be included to resolve such conflicts. As shown in
The solution is a serial conflict-resolving algorithm, which scans all the runs sequentially in the way shown in
Translating Run Labels to Connected Components
At the end of this procedure, each run has a label, so it is straightforward to obtain the final components by simply gathering the runs with the same label.
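The three labeling stages can be combined into a compact sketch. Note that it resolves label equivalences with a union-find (disjoint-set) structure, a swapped-in technique rather than the serial conflict-resolving scan described above; the end result (one label per connected set of runs) is the same, including for ‘U’-shaped objects. Runs use the (s, e, r) format.

```python
def label_runs(runs, eight_way=True):
    """Group runs (s, e, r) into connected components.
    Equivalences are resolved with union-find (a substitute for the serial scan)."""
    parent = list(range(len(runs)))  # every run starts with its own ID

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smaller ID

    gap = 1 if eight_way else 0
    by_row = {}
    for idx, (_, _, r) in enumerate(runs):
        by_row.setdefault(r, []).append(idx)

    for idx, (s_i, e_i, r_i) in enumerate(runs):
        for jdx in by_row.get(r_i + 1, []):          # runs one row below
            s_j, e_j, _ = runs[jdx]
            if s_i <= e_j + gap and e_i + gap >= s_j:  # eqs. (8)-(11)
                union(idx, jdx)

    # Translate run labels to connected components: gather runs sharing a root.
    components = {}
    for idx, run in enumerate(runs):
        components.setdefault(find(idx), []).append(run)
    return list(components.values())
```

For a ‘U’ shape, the two arms first receive different IDs and are merged when the bottom run touches both, which is exactly the conflict the third stage exists to resolve.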
Region Classification Unit (228 in
The aim of this unit is to classify each region identified with the help of the CCA unit (227 in
The region classification procedure includes two steps: region feature extraction and region classification.
Region Feature Extraction
Region feature extraction includes the measurement of several characteristic features of each region (142 in
The width of the region: Width corresponds to the width of a rectangle surrounding the region under consideration (144 in
The area that the region occupies: This is the area occupied by a rectangle surrounding the region under consideration (144 in
The magnitude of the region: This is the count NNW of the non-white pixels of the connected region, measured in pixels.
The plenitude of a region: This measure indicates how full the region under consideration is. For example, a region containing gaps will have less plenitude than a region without gaps. The plenitude of a region is defined as the ratio of the magnitude to the area features defined above.
The aspect ratio of a rectangle surrounding the region under consideration (143 in
Number of Scan-lines intersection points: Several “virtual” lines of 1-pixel thickness are considered that intersect the region at different heights (144 in
Statistical normalized central moments (Hu moments): Statistical manipulation of the pixels and their coordinates within a region results in a set of region-specific features called statistical moments [6]. Central moments are given by the following expression:
$$\mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^{p}\,(y - \bar{y})^{q} \quad (12)$$
In Eq. (12), x and y are the coordinates of each pixel in the region, and $\bar{x}$ and $\bar{y}$ are the mean values of the x and y coordinates, respectively, over all non-white pixels within the region. The integers p and q determine the order of a statistical moment. Combinations of low-order statistical moments (up to order 2, e.g., μ02, μ11 and μ20) represent physical measures of the region such as the mean, the mass center, the skewness, the angle with the x-axis, etc. For example, the angle of a region with the horizontal x-axis is given by the following expression:
In an exemplary embodiment, these statistical moments are calculated in the encoded space, directly on the run-length encoded runs. As described above, each run is described by three numbers, si, ei and ri, which indicate its start and end in the x-direction and its row; together, the runs cover the non-white pixels within the region under consideration. With this type of description, eq. 12 cannot be applied directly, since the coordinates of each individual pixel in the region are not available. Therefore, eq. 12 must be modified accordingly. This modification of the central moments is given below for orders up to 3 (p+q ≤ 3).
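Since the modified equations are not reproduced here, the sketch below derives them independently: for a run (s, e, r), the inner sum of eq. (12) over x is expanded binomially, and each power sum of x from s to e is evaluated in closed form, so no pixel is decoded. The orientation helper uses the widely used formula θ = ½·atan2(2μ11, μ20 − μ02); whether it matches the source's angle expression exactly is an assumption.

```python
import math


def power_sum(k, s, e):
    """Sum of x**k for x = s..e (k <= 3), using closed-form sums."""
    def upto(n):  # sum_{x=0}^{n} x**k, with upto(-1) == 0
        if k == 0:
            return n + 1
        if k == 1:
            return n * (n + 1) // 2
        if k == 2:
            return n * (n + 1) * (2 * n + 1) // 6
        if k == 3:
            return (n * (n + 1) // 2) ** 2
        raise ValueError("orders up to 3 only")
    return upto(e) - upto(s - 1)


def central_moments(runs, max_order=3):
    """Central moments mu_pq (p + q <= max_order) computed from runs (s, e, r)."""
    m00 = sum(e - s + 1 for s, e, _ in runs)
    x_bar = sum(power_sum(1, s, e) for s, e, _ in runs) / m00
    y_bar = sum(r * (e - s + 1) for s, e, r in runs) / m00
    mu = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            total = 0.0
            for s, e, r in runs:
                # sum over the run of (x - x_bar)**p, expanded binomially
                sx = sum(math.comb(p, k) * (-x_bar) ** (p - k) * power_sum(k, s, e)
                         for k in range(p + 1))
                total += (r - y_bar) ** q * sx  # every pixel of the run has y = r
            mu[(p, q)] = total
    return mu


def orientation(mu):
    """Angle of the region with the x-axis (standard second-order formula)."""
    return 0.5 * math.atan2(2 * mu[(1, 1)], mu[(2, 0)] - mu[(0, 2)])
```

The cost per run is constant, independent of the run length, which is what makes working in the encoded space attractive on an embedded processor.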
A useful modification of these moments results when the central moments are normalized using the following relation:
By using these normalized central moments, a new set of statistical moments can be formed, known as the Hu moments Ii, given by the following relations:
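$$I_1 = \eta_{20} + \eta_{02} \quad (24)$$
$$I_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \quad (25)$$
$$I_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \quad (26)$$
$$I_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \quad (27)$$
$$I_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \quad (28)$$
$$I_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \quad (29)$$
$$I_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \quad (30)$$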
In a different implementation, the run-length encoded region under consideration is first decoded in order to obtain the initial binary image corresponding to this region. In this case, equation 12 is applied directly. The procedure followed to do this is analyzed below, in the description of the digit segmentation unit.
The feature vector FVHM={I1, I2, I3, I4, I5, I6, I7} resulting from this set of features contains up to 7 numbers corresponding to the 7 Hu moments I1 to I7 as described in Eqs. 24-30.
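The Hu feature vector can be sketched as follows. `central_moments_from_pixels` applies eq. (12) directly to decoded pixel coordinates (the alternative implementation mentioned above), and `hu_moments` evaluates eqs. (24) to (30). The normalization η_pq = μ_pq / μ00^(1 + (p+q)/2) is the standard one and is an assumption here, because the normalization relation is not reproduced in this section.

```python
def central_moments_from_pixels(pixels, max_order=3):
    """Eq. (12) applied directly to decoded pixel coordinates (x, y)."""
    n = len(pixels)
    x_bar = sum(x for x, _ in pixels) / n
    y_bar = sum(y for _, y in pixels) / n
    return {(p, q): sum((x - x_bar) ** p * (y - y_bar) ** q for x, y in pixels)
            for p in range(max_order + 1) for q in range(max_order + 1 - p)}


def hu_moments(mu):
    """Return [I1, ..., I7] from central moments mu[(p, q)], eqs. (24)-(30)."""
    m00 = mu[(0, 0)]
    # normalized central moments (standard normalization, assumed)
    n = {pq: v / m00 ** (1 + (pq[0] + pq[1]) / 2) for pq, v in mu.items()}
    n20, n02, n11 = n[(2, 0)], n[(0, 2)], n[(1, 1)]
    n30, n03, n21, n12 = n[(3, 0)], n[(0, 3)], n[(2, 1)], n[(1, 2)]
    a, b = n30 + n12, n21 + n03
    i1 = n20 + n02
    i2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    i3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    i4 = a ** 2 + b ** 2
    i5 = (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2) + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2)
    i6 = (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b
    i7 = (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2) - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2)
    return [i1, i2, i3, i4, i5, i6, i7]
```

Rotating a region leaves all seven values unchanged, which is why they are useful for plates seen at slightly different angles.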
Region Classification
The region classification aims to the classification of each region under consideration as a car-plate or not, using also input from the Classification Criteria Trimming unit (227 in
In implementing an exemplary embodiment, a pattern classification scheme is used for region classification. To this end, the system has been previously trained offline, using a database with regions corresponding to plates and with regions corresponding to non-plates. For each region, the features described in the previous section are evaluated and a total feature vector is formed. The feature vector is then projected in the feature space, defined as a multi-dimensional space with as many dimensions as the feature vector. In such a projection, the feature vectors corresponding to plate and non-plate regions are concentrated (clustered) in separate areas of the multi-dimensional feature space. Consider the example shown in
The next step is to define the centers of the individual clusters. In accordance with one exemplary embodiment, this is achieved via the calculation of the center of mass of each cluster. The center of mass has coordinates
where NS is the number of samples (regions) participating in each cluster. In the 3-dimensional example referred to above, the centers of the clusters are indicated as C1 (156 in
When a new region T is tested, its feature vector FVT is obtained. This corresponds to a point in the feature space. To determine which cluster this test point belongs to, the distance of the point from the center of each cluster is computed using a distance measure such as the L1 distance, the L2 distance, the Mahalanobis distance, etc.
In one exemplary embodiment, the L2 distance is used which is defined as follows: in Cartesian coordinates, if p=(p1, p2, . . . , pn) and q=(q1, q2, . . . , qn) are two points in Euclidean n-space, then the L2 or Euclidean distance from p to q, or from q to p is given by the following expression:
$$d(p,q) = d(q,p) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2} \quad (32)$$
In the 3-dimensional example of
Once the distances of the test point from the centers of the clusters have been computed, the cluster to which the point belongs is decided according to a proximity criterion: the point belongs to the nearest cluster according to the distance measure used. Once this decision has been made, the region under test has been classified as plate or non-plate.
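The training and decision steps above amount to a nearest-centroid classifier. In this minimal sketch, cluster centers are the means of the training feature vectors, distances follow eq. (32), and the labels "plate" and "non-plate" are illustrative names.

```python
import math


def centroid(samples):
    """Center of mass of a cluster: per-dimension mean over its N_S samples."""
    n_s = len(samples)
    return [sum(col) / n_s for col in zip(*samples)]


def l2(p, q):
    """Euclidean distance of eq. (32)."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def train(clusters):
    """clusters: dict label -> list of training feature vectors."""
    return {label: centroid(vectors) for label, vectors in clusters.items()}


def classify(fv, centers):
    """Assign fv to the label of the nearest cluster center."""
    return min(centers, key=lambda label: l2(fv, centers[label]))
```

Swapping `l2` for an L1 or Mahalanobis distance changes only the distance function; the proximity rule stays the same.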
While the above description utilizes a specific classifier, it is understood that an Artificial Neural Network classifier or any other type of classifier can be used.
An alternative to pattern classification is the feature filtering implementation. In this scheme, a region is classified as plate or non-plate according to empirical measures corresponding to physical properties of each region, or according to empirical observations.
To this end, the features of width, magnitude, aspect ratio, plenitude, scan lines and the angle with the x-axis (eq. 13) are used. The goal is to form a decision vector as follows:
Each of the above-mentioned features is checked against a target-value rule, i.e., a target value or a range of target values (TABLE 1), obtained from empirical observations or from governmental standards. These rules are input from the Classification Criteria Trimming unit (227 in
Conformance to the target value corresponds to a true indication, and non-conformance corresponds to a false indication. In this way, a binary decision vector DV is obtained as follows:
$$DV = \{D_{width}^{rule},\ D_{magnitude}^{rule},\ D_{aspect\ ratio}^{rule},\ D_{plenitude}^{rule},\ D_{scan\ lines}^{rule},\ D_{angle}^{rule}\}$$
A simple approach is to classify the region as a plate if and only if the decision vector contains only logical ones, meaning that all the feature values conform to their target values.
However, in the current implementation a decision fusion rule is formed, leading to optimal results. This fusion rule is the following:
FR = {[D_width^rule AND D_aspect
If FR is TRUE then the region is classified as a plate, while if FR is FALSE the region is classified as a non-plate.
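Forming the decision vector and applying the simple "all rules true" approach can be sketched as follows. The target ranges below are hypothetical placeholders, not the values of TABLE 1 (which is not reproduced here), and the decision fusion rule FR is not implemented because its full expression is cut off in this section.

```python
# Hypothetical placeholder ranges (lo, hi); NOT the values of TABLE 1.
TARGET_RULES = {
    "width": (60, 400),          # pixels
    "magnitude": (300, 40000),   # pixels
    "aspect_ratio": (2.0, 6.0),
    "plenitude": (0.3, 1.0),
    "scan_lines": (7, 40),       # intersection points
    "angle": (-0.3, 0.3),        # radians
}


def decision_vector(features, rules=TARGET_RULES):
    """True where a feature conforms to its target range, False otherwise."""
    return {name: lo <= features[name] <= hi for name, (lo, hi) in rules.items()}


def is_plate_all_rules(dv):
    """Simple approach: a plate if and only if every rule is satisfied."""
    return all(dv.values())
```

Keeping the ranges in a separate table mirrors the role of the Classification Criteria Trimming unit: adapting the system to another country's plates means changing data, not code.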
The target value rules can be changed when needed (e.g., when the system needs to be trimmed for a different country) through the Classification Criteria Trimming unit (227 in
Classification Criteria Trimming Unit (227 in
This unit is used to input target value rules to the region classification unit (228 in
Plate Output Unit (234 in
The aim of this unit is to output the coordinates of each region classified as a car plate. The unit outputs the plate if and only if the Automatic Threshold Adaptation unit (223 in
Digit Segmentation Unit (229 in
The aim of this unit is to segment the individual digits constituting a car plate so that they can be output from the system in binary form to an Optical Character Recognition (OCR) system.
The digits in a binary plate image appear as coherent regions (161 in
The CCA performed in this unit follows steps 2 and 3 of the CCA performed in the CCA unit, preceded by an extra step: the background-foreground inversion. In the first CCA, the digits of the plate appear as white holes (background), since the digits are usually black. Consequently, they are not run-length encoded, and no information about them can be extracted. Therefore, a background-foreground inversion must be carried out for the regions detected as plates, using a procedure which, for a region containing N runs, is shown in
Once the background-foreground inversion has been carried out, the new inverted runs must be decoded into binary image format (pixel coordinates and values). This process is straightforward and uses a structured image memory, which is loaded with pixel values at the coordinates indicated by the run-length code. The process followed in the current implementation for a region containing N runs is shown in
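Inversion in run space followed by decoding can be sketched as below. The inversion procedure itself is cut off in this section, so the rule used here is an assumption: inversion is taken inside the region's horizontal extent [x_min, x_max], so on each row the gaps between runs become the new runs. Decoding loads a zero-initialized image memory with 255 at every coordinate covered by a run.

```python
def invert_region_runs(runs):
    """Background-foreground inversion of one region given as runs (s, e, r).
    Assumption: gaps between runs within [x_min, x_max] become the new runs."""
    x_min = min(s for s, _, _ in runs)
    x_max = max(e for _, e, _ in runs)
    rows = {}
    for s, e, r in runs:
        rows.setdefault(r, []).append((s, e))
    inverted = []
    for r in range(min(rows), max(rows) + 1):
        x = x_min
        for s, e in sorted(rows.get(r, [])):
            if s > x:
                inverted.append((x, s - 1, r))  # gap before this run
            x = max(x, e + 1)
        if x <= x_max:
            inverted.append((x, x_max, r))      # gap up to the right edge
    return inverted


def decode_runs(runs, height, width):
    """Load a zeroed image memory with 255 at the pixels covered by each run."""
    image = [[0] * width for _ in range(height)]
    for s, e, r in runs:
        for x in range(s, e + 1):
            image[r][x] = 255
    return image
```

After inversion, the dark digit strokes become foreground runs, so the label propagation and conflict resolution stages can be reused unchanged to isolate each digit.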
Digit Output Unit (230 in
The aim of this unit is to output the digits to the system output when the Automatic Threshold Adaptation unit (223 in
The systems, methods and techniques described herein can be performed or implemented on any device that comprises at least one camera, including but not limited to standalone cameras, security cameras, smart cameras, industrial cameras, mobile phones, tablet computers, laptop computers, smart TV sets and car boxes, i.e., devices embedded or installed in an automobile that collect video and images. It will be understood and appreciated by persons skilled in the art that one or more processes, sub-processes or process steps described in embodiments of the present invention can be implemented in hardware and/or software.
While the above-described flowcharts and methods have been discussed in relation to a particular sequence of events, it should be appreciated that changes to this sequence can occur without materially affecting the operation of the invention. Additionally, the exemplary techniques illustrated herein are not limited to the specifically illustrated embodiments but can also be utilized and combined with the other exemplary embodiments, and each described feature is individually and separately claimable.
Additionally, the systems, methods and protocols of this invention can be implemented on a special purpose computer, a programmed microprocessor or microcontroller and peripheral integrated circuit element(s), an ASIC or other integrated circuit, a digital signal processor, a hard-wired electronic or logic circuit such as discrete element circuit, a programmable logic device such as PLD, PLA, FPGA, PAL, any comparable means, or the like. In general, any device capable of implementing (or configurable to implement) a state machine that is in turn capable of implementing (or configurable to implement) the methodology illustrated herein can be used to implement the various methods, protocols and techniques according to this invention.
Furthermore, the disclosed methods may be readily implemented in software using object or object-oriented software development environments that provide portable source code that can be used on a variety of computer or workstation platforms. Alternatively, the disclosed system may be implemented partially or fully in hardware using standard logic circuits or VLSI design. Whether software or hardware is used to implement the systems in accordance with this invention is dependent on the speed and/or efficiency requirements of the system, the particular function, and the particular software or hardware systems or microprocessor or microcomputer systems being utilized. The systems and methods illustrated herein can be readily implemented in hardware and/or software using any known or later developed systems or structures, devices and/or software by those of ordinary skill in the applicable art from the functional description provided herein and with a general basic knowledge of the video processing arts.
Moreover, the disclosed methods may be readily implemented in software that can be stored on a storage medium and executed on a programmed general-purpose computer with the cooperation of a controller and memory, a special purpose computer, a microprocessor, or the like. In these instances, the systems and methods of this invention can be implemented as a program embedded on a personal computer, such as an applet, JAVA™ or CGI script, as a resource residing on a server or computer workstation, as a routine embedded in a dedicated system or system component, or the like. The system can also be implemented by physically incorporating the system and/or method into a software and/or hardware system, such as the hardware and software systems of an electronic device.
It is therefore apparent that there has been provided, in accordance with the present invention, systems and methods for the detection of multiple number-plates of moving vehicles. While this invention has been described in conjunction with a number of embodiments, it is evident that many alternatives, modifications and variations would be or are apparent to those of ordinary skill in the applicable arts. Accordingly, it is intended to embrace all such alternatives, modifications, equivalents and variations that are within the spirit and scope of this invention.
(All of which are incorporated herein by reference in their entirety)