This application claims priority under 35 U.S.C. 119(e)(1) to Indian Provisional Application No. 201641000153, filed Jan. 4, 2016.
The technical field of this invention is image processing.
Traffic sign recognition (TSR) is a technology that makes vehicles capable of recognizing traffic signs appearing in the vicinity of the driving path. TSR systems form an important part of the advanced driver assistance systems (ADAS) currently being deployed in cars, and TSR is a classic example of rigid object detection. TSR systems depend on forward facing image sensors. Current TSR systems aim to assist the driver in the driving process, but in the future TSR systems will play a crucial role in the functioning of autonomous cars.
Computers face many challenges in identifying traffic signs in images.
A real time Traffic Sign Recognition (TSR) system is described, comprising a preprocessing stage to identify image regions containing a traffic sign, a localization stage to accurately locate the sign within the image, a categorization stage to categorize the located sign into one of the sign categories, and a temporal smoothening stage to remove noise and false detections.
These and other aspects of this invention are illustrated in the drawings.
A four stage TSR algorithm is described below and shown pictorially in FIG. 1.
Stage 1: Preprocessing Stage 101
Identify the approximate image regions containing traffic signs, without missing any traffic sign in the input images.
Stage 2: Accurate Localization Stage 102
Stage 2a: Extract features from the input images in 103.
Stage 2b: Accurately localize the traffic sign region within the image using a classifier in 104.
Stage 3: Classification Stage 105
The windows localized by stage 2 are categorized into one of the traffic sign categories.
Stage 4: Temporal Smoothening 106
This stage removes the noisy detections and noisy classifications obtained from stage 3.
The preprocessing stage works on the input image and is aimed at reducing the complexity of the TSR system by reducing the amount of data processed by subsequent stages. It is implemented in two steps:
Extract color cues to find possible locations of traffic signs
A shape detector uses these color cues to identify image locations having traffic signs.
As shown in FIG. 2, the preprocessing proceeds through the following steps:
Contrast stretching is done in 204 by using histogram equalization on the Y plane. This improves the performance of the algorithm in many low contrast input images.
Red, blue, yellow and white binary masks are extracted by thresholding in the YUV color space (one mask per color) in 205.
Morphological opening (erosion followed by dilation) is applied in 206 for each of these binary masks.
The masks are combined in 207.
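A minimal sketch of steps 204 through 207 in Python with OpenCV follows. The YUV threshold ranges and the 5×5 structuring element are illustrative placeholders, as the patent does not specify the actual values.

```python
import cv2
import numpy as np

def extract_color_masks(bgr):
    """Sketch of preprocessing steps 204-207: contrast stretching on the
    Y plane, per-color thresholding in YUV, morphological opening, and
    combination of the four binary masks."""
    yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV)
    yuv[:, :, 0] = cv2.equalizeHist(yuv[:, :, 0])          # 204: contrast stretching

    # Illustrative (Y, U, V) ranges: placeholders, not values from the patent.
    ranges = {
        "red":    ((0, 100, 140), (255, 140, 255)),
        "blue":   ((0, 140, 0),   (255, 255, 110)),
        "yellow": ((100, 0, 130), (255, 110, 200)),
        "white":  ((180, 108, 108), (255, 148, 148)),
    }
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    combined = np.zeros(yuv.shape[:2], np.uint8)
    for lo, hi in ranges.values():
        mask = cv2.inRange(yuv, np.array(lo, np.uint8), np.array(hi, np.uint8))  # 205
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)                    # 206
        combined = cv2.bitwise_or(combined, mask)                                # 207
    return combined
```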
The binary masks are used by the extended radial symmetry transform (ERST) in 208. ERST detects circles, triangles, squares and octagons in the input images by voting over the gradients present in the masked regions.
In 301, a gradient map for the entire grey scale image is computed using the Sobel operator.
In 302, the binary masks obtained from color space thresholding act as (color) cues for this stage.
Gradients whose magnitude is below a threshold are zeroed out in 303 and are not considered in later stages.
The voting is performed in a 3D accumulator array (x, y, r) 304. One 3D accumulator array is maintained for each shape (circle, square, triangle and octagon).
Voting (the procedure of incrementing accumulator cells) is performed only for the gradient (edge) points at which the binary mask value is non-zero.
After voting finishes for the entire image in 305, the top 'N' peaks in each accumulator are used in 306 to determine the positions and radii of the detected circles/polygons.
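A minimal sketch of the circle case of ERST (steps 301 through 306) in Python with NumPy and OpenCV follows. The gradient threshold is an assumed value, only a single peak is extracted for brevity (the system keeps the top 'N' peaks per accumulator), and the polygon cases, which vote along line segments rather than at single points, are omitted.

```python
import cv2
import numpy as np

def erst_circles(gray, mask, radii, grad_thresh=40.0):
    """Sketch of ERST voting for circles: each gradient point that
    survives thresholding and lies inside the color mask casts votes
    at +/- r along its gradient direction, for each candidate radius r."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)                 # 301: Sobel gradient map
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    magnitude = np.hypot(gx, gy)
    h, w = gray.shape
    acc = np.zeros((len(radii), h, w), np.float32)         # 304: accumulator (r, y, x)

    # 302-303: keep only strong gradients inside the binary color mask.
    ys, xs = np.nonzero((magnitude > grad_thresh) & (mask > 0))
    ux = gx[ys, xs] / magnitude[ys, xs]
    uy = gy[ys, xs] / magnitude[ys, xs]

    for ri, r in enumerate(radii):
        for sign in (1, -1):                               # center may lie on either side
            cx = np.rint(xs + sign * r * ux).astype(int)
            cy = np.rint(ys + sign * r * uy).astype(int)
            ok = (cx >= 0) & (cx < w) & (cy >= 0) & (cy < h)
            np.add.at(acc[ri], (cy[ok], cx[ok]), 1.0)      # 304-305: voting

    ri, cy, cx = np.unravel_index(np.argmax(acc), acc.shape)  # 306: strongest peak
    return (cx, cy, radii[ri])
```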
Feature extraction Stage 2a is performed by:
Traffic sign localization Stage 2b is performed by:
An AdaBoost (Adaptive Boosting) classifier is used for this localization. Boosting is an approach to machine learning based on the idea of creating a highly accurate prediction rule by combining many relatively weak and inaccurate rules.
1024 decision trees of depth 2 act as weak classifiers for AdaBoost. A single weak classifier is depicted in the drawings.
Features computed from 32×32 pixel blocks of images (known as a model) are used as inputs to the classifier. The model is stepped by 4 pixels (both horizontally and vertically) over each image at each scale, as shown in the drawings.
Feature vectors obtained in this manner from training images are used to train the AdaBoost classifier. Training is done in 4 stages, with 32, 128, 256 and 1024 weak classifiers used in the successive stages. Bootstrapping is used in each stage to strengthen the hypothesis.
The feature vector of size 640 is fed to the AdaBoost classifier, which returns a real number that is binary-thresholded to decide whether a traffic sign is present. Note that the localization procedure is only a binary decision procedure (a traffic sign is present or not); actual classification (categorization into a specific class) is done in the next stage.
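As a sketch of how such a sliding-window localizer could be realized, the following Python code uses scikit-learn's AdaBoostClassifier with depth-2 decision trees as a stand-in for the patent's classifier. Here feature_fn (a placeholder for the Stage 2a extractor producing the 640-element vector), the single scale, and the zero score threshold are assumptions.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Depth-2 decision trees as weak classifiers, 1024 boosting rounds,
# mirroring the final training stage (requires scikit-learn >= 1.2).
clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=2),
                         n_estimators=1024)
# clf.fit(X_train, y_train) would be called with 640-element Stage 2a
# feature vectors labeled sign / no-sign before localization.

def localize(image, feature_fn, clf, step=4, win=32, thresh=0.0):
    """Step the 32x32 model by 4 pixels over one scale of the image and
    binary-threshold the classifier's real-valued score."""
    hits = []
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            f = feature_fn(image[y:y + win, x:x + win]).reshape(1, -1)
            score = clf.decision_function(f)[0]  # real-valued output
            if score > thresh:                   # binary decision: sign present
                hits.append((x, y, win, win))
    return hits
```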
Traffic sign classification Stage 3 is done by comparing the feature vector of each localized window against the stored mean feature vector of every sign class. Let

$x = (x_1, x_2, x_3, \ldots, x_N)^T$

be the feature vector of a window and

$\mu = (\mu_1, \mu_2, \mu_3, \ldots, \mu_N)^T$

the mean feature vector of a class. Minimization of the Mahalanobis distance between $x$ and the class means is mathematically equivalent to maximization of the linear function

$g_i(x) = w_i^T x + w_{i0}$

where $g_i(x)$ is the cost function for class $i$; the window is assigned to the class with the largest $g_i(x)$.
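Under the standard assumption behind this formulation, a covariance matrix Σ shared by all classes, the weights are $w_i = \Sigma^{-1}\mu_i$ and $w_{i0} = -\tfrac{1}{2}\mu_i^T\Sigma^{-1}\mu_i$, so maximizing $g_i(x)$ selects the class with the smallest Mahalanobis distance. A minimal NumPy sketch follows; the function names are illustrative.

```python
import numpy as np

def train_discriminants(class_means, shared_cov):
    """Precompute w_i and w_i0 for each class from the class mean vectors
    and a covariance matrix assumed to be shared across classes."""
    cov_inv = np.linalg.inv(shared_cov)
    ws = [cov_inv @ mu for mu in class_means]
    w0s = [-0.5 * mu @ cov_inv @ mu for mu in class_means]
    return ws, w0s

def classify(x, ws, w0s):
    """Evaluate g_i(x) = w_i.T x + w_i0 for every class and pick the
    maximum, i.e. the class with the smallest Mahalanobis distance."""
    return int(np.argmax([w @ x + w0 for w, w0 in zip(ws, w0s)]))
```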
Temporal smoothening Stage 4 is performed by:
Removing the noisy detections and noisy classifications obtained from the earlier stages. This stage is present only when the input is a sequence of images forming part of a single video.
The temporal smoothening engine is conceptually depicted in FIG. 7. Its inputs are:
The descriptors (position and dimensions) of the detection windows 701 obtained from stage 2.
The class IDs 702 associated with each of these detection windows, obtained from stage 3.
The temporal smoothening engine internally maintains a history of the detection windows. This history is empty at the start of the sequence of pictures and is updated after every picture. The decision logic block inside the engine examines the inputs and the history before finalizing the windows and their associated classes.
It uses the Jaccard coefficient to measure the degree of similarity between windows detected in the current picture and the windows stored in the history. The Jaccard coefficient J(A, B) between two windows A and B is defined as

$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$

where the numerator denotes the area of the intersection of the two windows and the denominator the area of their union.
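For axis-aligned windows this is the familiar intersection-over-union computation. A minimal sketch, assuming windows are given as (x, y, width, height) tuples:

```python
def jaccard(a, b):
    """Jaccard coefficient J(A, B) = area(A intersect B) / area(A union B)
    for two axis-aligned windows (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # intersection width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```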
The details of the temporal smoothening engine are shown in FIG. 8. Its inputs are the detection windows (det_win[]) 801 produced by the TSR algorithm, and the class id (id[]) 802 for each detection window. In 803, hist[] is the state memory that is built up as each new picture is processed. The Jaccard coefficient is computed in 804 for every pair of windows, with one window selected from hist[] and the second from det_win[]. In 805, det_idx is set to zero, and in 806 the entry hist[best_match_hist_idx] is found that gives the maximum Jaccard coefficient J_max when paired with det_win[det_idx]. If J_max > 0.5 in 807, det_win[det_idx] is stored into hist[best_match_hist_idx], and id[det_idx] is associated with the same entry of hist[] in 808. If J_max <= 0.5 in 807, det_win[det_idx] is added to hist[] as a new entry, and id[det_idx] is stored with the same entry of hist[] in 809. In 810 it is determined whether all entries of det_win[] have been processed. If not, det_idx is incremented in 811 and the flow returns to 806. If all entries have been processed, all hist[] entries that have not been updated are deleted in 812.
The output of the temporal smoothening engine in 813 and 814 is used as the final output of the TSR system.
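One picture's pass through the engine (803 through 814) could be sketched as below, reusing the jaccard() function above. Matching each detection against the history in index order, and the handling of ties, are assumptions the flow description leaves open.

```python
def temporal_smoothen(history, det_win, ids, j_thresh=0.5):
    """history: list of (window, class_id) pairs carried across pictures.
    det_win, ids: this picture's detection windows and class IDs.
    Returns the updated history, which is also the final TSR output."""
    updated = set()
    for det_idx, win in enumerate(det_win):            # 805, 810, 811
        best_idx, j_max = -1, 0.0
        for h_idx, (h_win, _) in enumerate(history):   # 804, 806
            j = jaccard(h_win, win)
            if j > j_max:
                best_idx, j_max = h_idx, j
        if j_max > j_thresh:                           # 807
            history[best_idx] = (win, ids[det_idx])    # 808: update matched entry
            updated.add(best_idx)
        else:
            history.append((win, ids[det_idx]))        # 809: add as new entry
            updated.add(len(history) - 1)
    # 812: delete history entries that were not updated for this picture.
    history[:] = [e for i, e in enumerate(history) if i in updated]
    return history                                     # 813-814: final output
```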