This application claims priority to European Patent Application No. 23 201 829.1 filed Oct. 5, 2023, the disclosure of which is incorporated herein by reference.
The present disclosed subject matter relates to a method, a system and a computing device for automatic license plate recognition (ALPR).
Recognising license plates of vehicles is an important task in Intelligent Transportation Systems (ITS) and vehicle tolling applications to identify vehicles for surveillance, routing and tolling. In common ALPR schemes, a camera device is mounted at a surveillance location, e.g., next to or above a road, at the entrance to or within a parking lot, garage, etc., to record images which each include the license plate number (LPN) of a vehicle. Each of the recorded images is then machine-read by a computing device to recognise the LPN included therein. To this end, the computing device usually crops each image to an outer boundary of the license plate, identifies a bounding frame for each character of the LPN within that boundary, feeds those bounding frames, one after the other, to an optical character recognition (OCR) algorithm which reads them, and merges the OCR-read characters, one after the other, into the resulting LPN of each image. Based on the recognised LPN, the computing device may initiate a charging process, an opening of a gate or barrier, a routing of the vehicle, etc.
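Purely for illustration, such a conventional character-by-character pipeline could be sketched in Python as follows; the helper functions crop_to_plate, find_character_frames and ocr_read_character are hypothetical placeholders for the plate detector, character segmenter and OCR algorithm mentioned above and are not part of the disclosure:

```python
def read_lpn_conventional(image):
    """Illustrative sketch of the conventional character-by-character ALPR pipeline.

    crop_to_plate, find_character_frames and ocr_read_character are hypothetical
    placeholders for the plate detector, character segmenter and OCR algorithm.
    """
    plate = crop_to_plate(image)                  # crop to the outer boundary of the license plate
    frames = find_character_frames(plate)         # one bounding frame per character of the LPN
    characters = [ocr_read_character(frame) for frame in frames]  # OCR-read each frame separately
    return "".join(characters)                    # merge the read characters into the resulting LPN
```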
While algorithms for bounding frame identification and for OCR reading have evolved over the years, the error probabilities of wrongly identified bounding frames and wrongly OCR-read characters are still high, in particular when images are recorded under bad lighting conditions, at oblique angles, for dirty license plates, concealed LPNs, etc. Whenever an LPN is recognised by the computing device with a high error probability (low confidence), surveillance staff needs to review the corresponding recorded image to guarantee a correct recognition of the LPN. Surveillance staff, however, is expensive and the manual review of images takes a long time, such that time-critical applications like barrier or gate opening and routing are impeded by the high error probabilities of current ALPR systems.
It is an object of the disclosed subject matter to provide a method, a system and a computing device for ALPR which make it possible to recognise LPNs with low error probabilities and, thus, more reliably as well as in a cost- and time-saving manner.
To this end, the disclosed subject matter provides in a first aspect a method for automatic license plate recognition, comprising:
The method of the disclosed subject matter is based on the surprising finding that images which each include a complete LPN, i.e. not just one character at a time but all the characters of the LPN, can be classified with low error rates by one single artificial neural network (NN) having one output node for each LPN, i.e. by means of a one-to-one mapping between output nodes and LPNs.
In the holistic approach of the disclosed subject matter, the LPNs are extracted from the images of the first set to determine the different LPNs included in the first set, to generate the NN with the appropriate number N1 of output nodes, and to train the NN with the images and the extracted LPNs. After training, the NN is able to reliably recognise the LPNs it has been trained on, i.e. when the sample image is fed into the NN, the output node for the LPN that is included in the image “fires” reliably, i.e. outputs a distinguished highest value. In this way, the correct LPN is recognised by the trained NN with a low error rate (a high confidence) and without the necessity to identify and OCR-read each of the characters of the license plate number separately. As the trained NN is less error prone, fewer or no surveillance staff needs to be employed.
Summing up, generating and training the NN with one output node per different LPN to be recognised achieves low error rates and allows for a more reliable, cost- and time-saving ALPR.
The first set may comprise the images recorded and stored within a certain time interval, e.g., of one hour or less, one day, one week, one year or more. However, due to storage space restrictions or data privacy requirements, a storage of the recorded images for a long time interval is often impossible, for instance prohibited by the General Data Protection Regulation (Regulation (EU) 2016/679, abbreviated GDPR). To overcome this issue, in an optional embodiment the method of the disclosed subject matter further comprises:
In this embodiment the NN is generated, extended and trained charge-wise, always with a current one of (at least) two charges while the previous charge/s are deleted, i.e. with a first charge comprising the images and LPNs of the first set recorded and stored within a first time interval, and—after deleting the first charge—a second charge comprising the images and LPNs of the second set recorded and stored within a second time interval, and optionally—after deleting the second charge—with a further (third, fourth, etc.) charge.
To adapt the NN for a reliable recognition of the “new” LPNs, which are included in the images of the current (e.g., the second) set and not in the images of the previous (e.g., the first) set, the different new LPNs are determined from the extracted LPNs of the current set and the NN is extended by one further output node for each different new LPN.
While the current training of the NN is performed on the current set of images—as the images of the previous set/s is/are deleted—applicant's research has shown that the extended NN is still capable of recognising the new as well as the “old” LPNs which are included in the images of the previous set/s and not in the images of the current set. Hence, no “catastrophic forgetting” problem has been observed. Therefore, the sample image may include either a “new” or an “old” LPN, both being reliably recognised by the extended NN.
Seen from another perspective, the extended NN with its one-to-one mapping between output nodes and LPNs is well suited for the charge-wise generating/extending and training such that the images of the previous set/s may be deleted to save storage space and achieve a high standard of data privacy.
In a favourable variant of this embodiment the images of the second set are fed into the artificial neural network and the different license plate numbers that are included in the second set and not in the first set are determined as the different license plate numbers included in those images of the second set for which all of the output nodes output a respective value below a predetermined threshold value. In this way, the NN itself is employed to find out whether an image of the current (e.g., second) set fed into the NN includes an LPN that is old (has one high value of the output nodes) or that is new (only has low values of the output nodes). This allows for a fast determination of the different new LPNs. Optionally, the old LPNs may thereby simultaneously be extracted by the NN.
Advantageously, the mapping between the output nodes and the corresponding license plate numbers is stored in a mapping table. Utilising a mapping table makes it possible to encode/decode each LPN by its corresponding output node, e.g., to increase data privacy when the mapping table may only be accessed by authorised users/personnel. Moreover, the mapping table may be accessed to quickly determine whether an LPN extracted from an image already has a corresponding output node or whether a new output node is to be added to the NN for that LPN.
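A non-limiting sketch of such a mapping table, here assumed to be implemented as a plain Python dictionary, could look as follows:

```python
# Non-limiting sketch: the mapping table as a plain Python dictionary
# (any persistent, access-controlled key-value store could be used instead).
mapping_table = {}  # LPN string -> index of the corresponding output node

def output_node_for(lpn):
    """Return the output node index for an LPN, adding a new node index if the LPN is novel."""
    if lpn not in mapping_table:
        mapping_table[lpn] = len(mapping_table)  # next free output node
    return mapping_table[lpn]

def lpn_for(node_index):
    """Decode an output node index back to its LPN (inverse lookup)."""
    return next(lpn for lpn, idx in mapping_table.items() if idx == node_index)
```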
In a favourable embodiment said step of extracting the license plate number comprises OCR reading the recorded images of the first and/or second sets of images character by character. In this embodiment the disclosed method utilises—often pre-existing—OCR reading capabilities to extract all or a part of the LPNs for the subsequent generating, extending and training of the NN.
The NN may be trained on and recognise LPNs in the images as recorded. However, to allow for a less complex NN, e.g., having fewer nodes and/or layers, and for further reducing the ALPR error rates, the recorded images may optionally be pre-processed, in particular by at least one of resizing, converting to grayscale, blur filtering, rotating, cropping to an outer boundary of the license plate, and image sharpening.
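Merely as an example, such a pre-processing could be sketched with the OpenCV library as follows; the chosen target size and filter parameters are assumptions for illustration only:

```python
import cv2

def preprocess(image, target_size=(128, 64)):
    """Illustrative pre-processing sketch using OpenCV; the target size and
    filter parameters are assumptions, not values taken from the disclosure."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # convert to grayscale
    blurred = cv2.GaussianBlur(gray, (3, 3), 0)      # blur filtering
    resized = cv2.resize(blurred, target_size)       # resize to a fixed input size
    # rotating, cropping to the plate boundary and sharpening could be added analogously
    return resized
```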
In a further beneficial embodiment, in said step/s of training, each image of the first and/or second set is fed into the artificial neural network P times, P being in a range from 2 to 50, in particular from 5 to 20, e.g., from 7 to 13. Thereby, the NN is sufficiently trained to recognise the LPN in the recorded images of the first and/or second sets. Moreover, with these numbers of repeated image feeding no catastrophic forgetting of old LPNs has been observed.
The method of the disclosed subject matter may be carried out with many types of NNs suitable for image recognition. In an optional embodiment the artificial neural network is a convolutional neural network (CNN). A CNN is particularly suited for ALPR and achieves particularly low error rates. Even with a low-complexity CNN, which is fast in training and evaluation, low error rates have been achieved.
In a second aspect the disclosed subject matter provides a system for automatic license plate recognition, comprising:
The system utilises the disclosed method in order to recognise LPNs. To this end the system may utilise any of the above-mentioned embodiments to achieve the above-mentioned advantages.
In a third aspect the disclosed subject matter provides for a computing device configured to
The computing device may as well utilise any of the above-mentioned embodiments to achieve the above-mentioned advantages.
The disclosed subject matter will now be described by means of exemplary embodiments thereof with reference to the enclosed drawings, in which:
The camera device 2 has one or more (here: six) cameras 4 mounted at one or more surveillance locations where license plate numbers (LPNs) Lj are to be recognised for tolling, surveillance, routing, gate opening, etc., here: at a road 5 which is traversed by vehicles 6-8, alternatively or additionally: at a parking lot, garage, gate, barrier, etc. By means of the cameras 4, the camera device 2 records images 91, 92, . . . , generally 9i, of LPNs Lj of license plates 10-14 carried by the vehicles 6-8. Each recorded image 9i, thus, includes one LPN Lj. Each LPN Lj, in turn, includes several (at least two, typically four or more) characters which may each be a number, a letter and/or a logogram such as a seal, escutcheon, state emblem, etc., a Chinese, Japanese or Korean character, or the like.
The computing device 3 may comprise one or more computers, servers, mobile devices, tablets, etc., which may be located in the vicinity of the camera device 2, e.g., at the road 5, and/or remote therefrom, e.g., at a back office. The computing device 3 recognises LPNs Lj in the recorded images 9i and, based thereon, initiates a tolling or charging process, opens a gate or barrier, routes the vehicles 6-8, etc.
Within a first time interval, e.g., one hour, day, week, month, year or more, the camera device 2 records and transmits a first set S1 of images 9i (
The computing device 3 extracts the LPN Lj in each image 9i of the first set S1. The extracting may be carried out in many ways and optionally employ already existing infrastructure. In the example of
The computing device 3 generates the NN 15 with one output node O1, O2, . . . , generally Oj, for each of the N1 different LPNs L1-LN1 in the first set S1. The NN 15 may be generated as any neural network suitable for image recognition such as a convolutional neural network (CNN), a capsule neural network (CapsNet), etc. and with many suitable structures, e.g., with a variety of types and numbers of layers 17-21 and nodes 22, a variety of node connections 23 and node activation functions. However, the NN 15 has one respective output node Oj for each of the N1 different LPNs Lj in the first set S1, i.e. N1 output nodes O1-ON1 in the output layer 21.
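By way of a non-limiting example, a low-complexity CNN with one output node per different LPN could be sketched in PyTorch as follows; the layer sizes and the assumed 64x128 grayscale input are illustrative assumptions, not requirements of the disclosure:

```python
import torch.nn as nn

class LpnClassifier(nn.Module):
    """Sketch of a low-complexity CNN with one output node per different LPN;
    the layer sizes and the 64x128 grayscale input are illustrative assumptions."""

    def __init__(self, num_lpns):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.output_layer = nn.Linear(32 * 16 * 32, num_lpns)  # one output node Oj per LPN Lj

    def forward(self, x):                        # x: batch of 1 x 64 x 128 grayscale images
        x = self.features(x)
        return self.output_layer(x.flatten(1))   # one value 27j per output node Oj

# e.g., model = LpnClassifier(num_lpns=N1) with N1 different LPNs in the first set S1
```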
The computing device 3 may generate the NN 15 at once, e.g., after extracting the LPNs Lj, by counting the N1 different extracted LPNs L1-LN1 and creating the NN 15 with N1 output nodes O1-ON1. Alternatively, the computing device 3 may generate the NN 15 successively, e.g., during extracting the LPNs Lj, by checking for each extracted LPN Lj whether it is novel (has no corresponding output node Oj yet) and, if so, adding one output node Oj to the NN 15 for that novel LPN Lj.
The computing device 3 optionally stores the mapping between each different LPN Lj and its corresponding output node Oj, e.g., the mapping indicated by arrows 24 between O1 and ‘123’, O2 and ‘456’, etc., in a mapping table 25. The computing device 3, thus, may access the mapping table 25 to determine the number N1 and/or to check whether an LPN Lj extracted from an image 9i is novel. Alternatively to the mapping table 25, the computing device 3 may indicate the mapping by labelling the respective output node Oj with the corresponding LPN Lj, e.g., by labelling the first node O‘123’, the second node O‘456’, etc. (not shown), to preserve the mapping.
The computing device 3 trains the NN 15 on the images 9i of the first set S1 and the extracted LPNs Lj of the first set S1. To this end, the computing device 3 feeds the images 9i of the first set S1 into the NN 15 (indicated by arrow 26), compares the values 27j output by the output nodes Oj with the correct values 27j/LPN Lj/output node Oj for the respective image 9i (indicated by arrow 28) and adapts the parameters, e.g., the weights, biases and/or structure, of the NN 15 based on the comparison. For instance, when the first image 91 is fed into the NN 15, the target value 271 of the first output node O1 shall be ‘1’ and the target values 27j,j≠1 of the other output nodes Oj,j≠1 shall be ‘0’, since the first LPN L1 ‘123’ extracted from the first image 91 corresponds to the first output node O1.
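For illustration only, such a training step could be sketched as follows, assuming the classifier of the earlier sketch, a dataset of pairs of image tensors and target output node indices, and a cross-entropy loss with the Adam optimiser (the optimiser, learning rate and batch size being assumptions):

```python
import torch
from torch.utils.data import DataLoader

def train(model, dataset, num_epochs, lr=1e-3):
    """Sketch of the training step: 'dataset' holds pairs of image tensors and the
    indices of their target output nodes; optimiser and learning rate are assumptions."""
    loss_fn = torch.nn.CrossEntropyLoss()                 # compares the output values 27j with the target node
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_epochs):                           # e.g., P epochs, P in a range from 2 to 50
        for images, target_nodes in DataLoader(dataset, batch_size=32, shuffle=True):
            optimizer.zero_grad()
            loss = loss_fn(model(images), target_nodes)   # compare and ...
            loss.backward()                               # ... backpropagate the error
            optimizer.step()                              # adapt weights and biases
```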
For adapting the parameters the computing device 3 may use any known neural network training algorithm such as backpropagation, difference target propagation, etc.
The training is carried out at least until for each image 9i of the first set S1 the output node Oj for the LPN Lj included in that image 9i outputs the highest value 27j of all output nodes Oj. For instance, after the training, when the second image 92 including the second LPN L2 ‘456’ is fed into the NN 15, the second output node O2 for the LPN L2 ‘456’ included in that image 92 should output the highest value 272 of all output nodes Oj. To this end, the computing device 3 may feed each image 9i of the first set S1 into the NN 15 P times, wherein the number P is in a range from 2 to 50, in particular from 5 to 20, e.g., from 7 to 13.
Said recording, extracting, generating and training steps may be carried out for the whole first set S1 at once or for successive subsets of the first set S1, e.g., batch-wise or image-wise. In the latter case, the NN 15 may be generated by repeatedly adding one output node Oj for each novel LPN Lj extracted from a newly recorded image 9i or batch of the first set S1 and be trained by repeatedly adding the newly recorded image 9i or batch of the first set S1 to a training set.
Once the NN 15 has been trained, the computing device 3 utilises the NN 15 for recognising an LPN Lj in at least one fresh (“sample”) image 9s it has not been trained on. To this end, the camera device 2 records the sample image 9s, e.g., of the vehicle 6 when it traverses the camera device 2 a second time, indicated by dashed lines in
As can be seen in
Alternatively to the shown ALPR pipeline 30 the NN 15 may be included in any other type of ALPR pipeline, e.g., be downstream of or parallel to the OCR unit 16, or may be used stand-alone, etc.
As indicated by the feedback stream 38 in
In one embodiment the computing device 3 carries out the further training and optional extending with the first and second sets S1, S2, which may be seen as an extension of the first set S1 by the images 9i of the second set S2.
In another embodiment the computing device 3 deletes the first set S1 of images 9i after the first time interval and carries out the further training and optional extending with the second set S2 only. With reference to
The computing device 3 extracts the LPN Lj in each image 9i of the second set S2, e.g., as described above with respect to the first set S1. Alternatively or in addition to the above-mentioned extractions, the computing device 3 may extract old LPNs Lj of the second set S2 (which were included in the first set S1) by feeding the images 9i of the second set S2 into the NN 15 (following the dashed arrow 39) and recognising the LPN Lj of each image 9i for which one output node Oj outputs a respective highest value 27j above a given threshold as the LPN Lj of that output node Oj (following the dashed arrow 40). Those images 9i for which no output node Oj outputs a respective highest value 27j above a given threshold may be input to the OCR unit 16 (following the dashed arrow 41) and, when OCR-reading fails, reviewed by surveillance staff.
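A non-limiting sketch of this NN-assisted extraction could look as follows; ocr_read_plate is a hypothetical placeholder for the OCR unit 16, lpn_for is the inverse lookup of the mapping table sketched above, and the threshold value is an assumption:

```python
import torch

def extract_lpn(model, image_tensor, threshold=0.9):
    """Sketch of the NN-assisted extraction for the second set S2: if one output node
    clearly fires, the (old) LPN is taken from the NN; otherwise the image is handed
    to OCR. ocr_read_plate is a hypothetical placeholder for the OCR unit 16 and the
    threshold value is an assumption."""
    with torch.no_grad():
        values = torch.softmax(model(image_tensor.unsqueeze(0)), dim=1)[0]  # values 27j
    best_value, best_node = values.max(dim=0)
    if best_value.item() >= threshold:
        return lpn_for(best_node.item()), False   # old LPN, already has an output node
    return ocr_read_plate(image_tensor), True     # presumably novel LPN, extracted by OCR
```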
When the second set S2 includes at least one new LPN Lj, i.e. when N2′ is greater than 0, the computing device 3 extends the NN 15 by one output node ON1+1, ON1+2, . . . , generally Oj, for each of the N2′ different LPNs LN1+1-LN1+N2′ that are included in the second set S2 and not in the first set S1, i.e. for each novel LPN Lj. Hence, the extended NN 15 has one respective output node Oj for each of the N1+N2′ different LPNs Lj in the first and second sets S1, S2, i.e. N1+N2′ output nodes O1-ON1+N2′ in the output layer 21.
Similar to the above-mentioned generation, the computing device 3 may extend the NN 15 at once for all the extracted LPNs LN1+1-LN1+N2′ of the second set S2 by counting the N2′ different LPNs Lj that were not included in the first set S1 and extending the NN 15 by N2′ output nodes ON1+1-ON1+N2′; or successively by incrementally adding one output node Oj to the NN 15 whenever a novel LPN Lj is extracted.
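Assuming the classifier of the earlier sketch with a final fully connected output layer, the extension could, for instance, be sketched as follows, with the trained weights and biases of the existing output nodes being retained:

```python
import torch
import torch.nn as nn

def extend_output_layer(model, num_new_lpns):
    """Sketch of extending the NN by N2' further output nodes while retaining the
    trained weights and biases of the existing output nodes (assumes the classifier
    of the earlier sketch with a final nn.Linear output layer)."""
    old = model.output_layer
    new = nn.Linear(old.in_features, old.out_features + num_new_lpns)
    with torch.no_grad():
        new.weight[:old.out_features] = old.weight  # keep the weights of the old output nodes
        new.bias[:old.out_features] = old.bias      # keep the biases of the old output nodes
    model.output_layer = new
    return model
```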
To determine the novel LPNs Lj in the images 9i of the second set S2 for extending, the computing device 3 may optionally utilise the NN 15 trained on the first set S1. Thereby, the computing device 3 feeds the images 9i of the second set S2 into the NN 15 and determines the different LPNs LN1+1-LN1+N2′ that are included in the second set S2 and not in the first set S1 (the novel LPNs) as the different LPNs LN1+1-LN1+N2′ included in those images 9i of the second set S2 for which all of the output nodes Oj output a respective value 27j below a predetermined threshold value, e.g., in the images 9i fed into the OCR unit 16 via arrow 41. Alternatively, the computing device 3 may access the mapping table 25 (if present) to determine whether an extracted LPN Lj of the second set S2 is not already included therein and, thus, is novel.
The computing device 3 may optionally add the mapping between each novel LPN Lj and its corresponding output node Oj to the mapping table 25 (if present).
The computing device 3 trains the extended NN 15 on the images 9i of the second set S2 and the extracted LPNs Lj of the second set S2 as detailed above with reference to
Once the NN 15 has been trained on the second set S2, the computing device 3 may utilise the NN 15 to recognise (again or for the first time) the LPN Lj in a sample image 9s recorded by the camera device 2. Therefor, the computing device 3 feeds the recorded sample image 9s into the extended NN 15 and recognises the LPN Lj included in the sample image 9s as detailed above. As shown in
The streams 34 and 37 (and optionally also stream 32) recorded within the third time interval may be used as a third set S3 of recorded images 9i to further train the NN 15 and to optionally extend the NN 15 further by adding further output nodes Oj thereto for each novel LPN Lj, as described above. The above-mentioned training and optional extending of the NN 15 may, thus, be continued for further sets S3, S4, . . . , generally Sk, wherein any old set Sk−1 of images 9i may optionally be deleted before or while a new set Sk of images 9i is recorded and stored within the respective time interval, LPN-extracted and used to train and optionally extend the NN 15.
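Merely for illustration, one such charge-wise iteration could be sketched as follows, reusing the hypothetical helpers of the earlier sketches; the control flow shown is an assumption, not a prescription of the disclosure:

```python
def process_charge(model, images, mapping_table, num_epochs):
    """Sketch of one charge-wise iteration for a new set Sk, reusing the hypothetical
    helpers of the earlier sketches; the control flow is an illustrative assumption."""
    labelled = []
    new_lpns = 0
    for image in images:
        lpn, _ = extract_lpn(model, image)            # NN-assisted and/or OCR extraction
        if lpn not in mapping_table:
            mapping_table[lpn] = len(mapping_table)   # reserve a new output node for a novel LPN
            new_lpns += 1
        labelled.append((image, mapping_table[lpn]))
    if new_lpns:
        model = extend_output_layer(model, new_lpns)  # add one output node per novel LPN
    train(model, labelled, num_epochs)                # train on the current set Sk only
    # afterwards the images of the previous set may be deleted from storage
    return model
```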
Optionally, before said extracting, training and/or recognising steps, the computing device 3 may pre-process the recorded images 9i. In particular the computing device 3 may, e.g., depending on the recording angles and conditions of the respective image 9i, convert any image 9i to grayscale, filter out a blur therein, rotate it, crop it to an outer boundary of the license plate 10-14 and/or sharpen it. All of these steps may, of course, be applied in any order.
With reference to
In a first step 43 the first set S1 of images 9i is recorded, e.g., by means of the camera device 2 within the first time interval. Each recorded image 9i includes one of N1 different LPNs L1-LN1 and each LPN Lj, in turn, comprises two or more characters.
In a second step 44 the LPN Lj in each image 9i of the first set S1 is extracted. Thereby, the recorded images 9i of the first set S1 may be OCR-read character by character by means of the OCR-unit 16 and/or may be read out by surveillance staff.
In a third step 45 the NN 15 is generated with one output node Oj for each of the N1 different LPNs L1-LN1 in the first set S1. The NN 15 may be generated as any neural network suitable for image recognition and with any suitable structure, be it at once for all the extracted LPNs L1-LN1, counting the N1 different LPNs and creating the NN 15 with N1 output nodes Oj, or successively by incrementally adding one output node Oj to the NN 15 whenever a novel LPN Lj is extracted, as detailed above. Optionally the mapping between each different LPN Lj and its corresponding output node Oj may be stored in the mapping table 25.
In a fourth step 46 the NN 15 is trained on the images 9i of the first set S1 and the extracted LPNs L1-LN1 of the first set S1. In this step, the images 9i of the first set S1 are fed into the NN 15, the values 27j output by the output nodes Oj are compared with the correct values 27j indicated by the extracted LPNs L1-LN1 of the images 9i and the parameters of the NN 15 are adapted based on the comparison according to any known neural network training algorithm such as backpropagation, difference target propagation, etc. The training is carried out at least until for each image 9i of the first set S1 the output node Oj for the LPN Lj included in that image 9i outputs the highest value 27j of all output nodes Oj.
Optionally, each image 9i of the first set S1 may be fed into the NN 15 P times in step 46, the number P being in a range from 2 to 50, in particular from 5 to 20, e.g., from 7 to 13. For example, the NN 15 may be trained for a number P of epochs, wherein each image 9i is fed into the NN 15 in each epoch.
The steps 43-46 of recording, extracting, generating and training may be carried out one after the other for the whole first set S1 at once or in an overlapping manner with successive parts of the first set S1 being recorded and used for said extracting, generating and training. For example, the NN 15 may be generated by repeatedly adding one output node Oj for each novel LPN Lj extracted from a recently recorded image 9i or batch of the first set S1 and be trained by repeatedly adding a recently recorded image 9i or batch of the first set S1 to a training set.
In a fifth step 47 a sample image 9s is recorded. The sample image 9s includes an LPN Lj that was also included in the images 9i the NN 15 had been trained on so far.
In a sixth step 48 the sample image 9s is fed into the NN 15 and in a seventh step 49 the LPN Lj of the sample image 9s is recognised as the LPN Lj of that output node Oj which outputs the highest value 27j. To retrieve the LPN Lj of the output node Oj that outputs the highest value 27j, the mapping table 25 (if present) may be accessed, the output node Oj be read (if labelled), etc. as detailed above.
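A non-limiting sketch of steps 48 and 49 could look as follows, with lpn_for denoting the inverse lookup of the mapping table 25 sketched above:

```python
import torch

def recognise(model, sample_image_tensor):
    """Sketch of steps 48 and 49: feed the sample image 9s into the NN and return the
    LPN of the output node with the highest value 27j; lpn_for is the inverse lookup
    of the mapping table 25 sketched above."""
    with torch.no_grad():
        values = model(sample_image_tensor.unsqueeze(0))[0]  # one value 27j per output node Oj
    best_node = int(values.argmax())                          # the output node that "fires"
    return lpn_for(best_node)
```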
Optionally, the NN 15 may be trained and extended by means of the second set S2 of images 9i either in addition to the first set S1 or after deleting, in an optional eighth step 50, the first set S1. To this end, the following optional steps 51-54 (and further optionally step 55) may be carried out within a second time interval after said step 46 of training.
In a ninth step 51 the second set S2 of images 9i is recorded, e.g., by means of the camera device 2 within a second time interval. Each recorded image 9i of the second set S2 includes one of N2 different LPNs, N2′ of which, namely the LPNs LN1+1-LN1+N2′, were not included in the first set S1.
In a tenth step 52 the LPN Lj of each image 9i of the second set S2 is extracted either automatically or manually as detailed above for step 44. In addition, the images 9i of the second set S2 may optionally be fed into the NN 15 to recognise the LPNs L1-LN1 that were already included in the first set S1.
In an eleventh step 53 the NN 15 is extended by one output node ON1+1, ON1+2, . . . , generally Oj, for each of the N2′ different LPNs LN1+1-LN1+N2′ that are included in the second set S2 but were not included in the first set S1, i.e. for each novel LPN Lj of the second set S2.
To determine the novel LPNs Lj of the second set S2 the images 9i of the second set S2 may optionally be fed into the NN 15 and the different LPNs LN1+1-LN1+N2′ that are included in the second set S2 and not in the first set S1 (the novel LPNs Lj of the second set S2) may be determined as the different LPNs LN1+1-LN1+N2′ included in those images 9i of the second set S2 for which all of the output nodes Oj output a respective value 27j below a predetermined threshold value. Alternatively, the mapping table 25 (if present) may be accessed to check whether an extracted LPN Lj of the second set S2 is not already included therein, and, thus, novel.
Optionally the mapping between each novel LPN Lj of the second set S2 and its corresponding output node Oj may be stored in the mapping table 25 (if present).
In a twelfth step 54 the extended NN 15 is trained on the images 9i of the second set S2 and the extracted LPNs Lj of the second set S2 at least until for each image 9i of the second set S2 the output node Oj for the LPN Lj included in that image 9i outputs the highest value 27j of all output nodes Oj. Thereby, each image 9i of the second set S2 may optionally be fed into the NN 15 P times, P being in a range from 2 to 50, in particular from 5 to 20, e.g., from 7 to 13.
In a thirteenth (optional) step 55 the images 9i of the second set S2 may be deleted after the second time interval.
As shown in
The steps 47-49 may be carried out after the first time interval with the NN 15 trained on the first set S1, after the second time interval with the extended NN 15 trained on the first and second sets S1, S2, and/or after any further time interval with the extended NN 15 trained on the first, second, third, etc. sets S1, S2, . . . , Sk.
The steps 44, 46, 52, 54, 59 of extracting, training and recognising may be carried out on the images 9i as recorded. Alternatively, before any of these steps, the recorded images 9i may be pre-processed. For example, the recorded images 9i may be resized, converted to grayscale, blur-filtered, rotated, cropped to an outer boundary of the license plate and/or sharpened. These pre-processing steps may be carried out in any order.
The steps 43-55 may be carried out in any order, some even simultaneously or in parallel, so far as one step does not depend on the result of another step. For example, the deletion of an old set Sk−1 in steps 50 and 55 may be carried out before, during or after carrying out any of the recording, extracting, extending and training steps 51-54 for a new set Sk.
The disclosed subject matter is not restricted to the specific embodiments described above but encompasses all variants, modifications and combinations thereof that fall within the scope of the appended claims.