The present invention relates to the field of object detection, and particularly to an embedded deep learning multi-scale object detection model using a real-time distant region locating device and method thereof.
In recent years, thanks to the rapid development of deep learning techniques, the performance of computer vision has been approaching that of the human eye, and related technical topics have attracted considerable attention. Many computer vision understanding techniques have been applied to the advanced driver assistance system (ADAS), where object detection is used in various driving safety assistance systems, such as the Forward Collision Warning System (FCWS), Rear Collision Warning System (RCWS), Brake Assist System (BAS), Advanced Emergency Braking System (AEBS), Lane Departure Warning System (LDWS), and Blind Spot Detection System (BSDS). At present, these systems can be used effectively for driving assistance, which not only reduces the fatigue of long drives but also increases driving safety.
When a car is driven at high speed, detecting distant objects becomes very important. However, a distant object usually appears as a small object in the captured image and presents few features, so recognition of distant objects tends to perform poorly. As a result, many traffic accidents occur because the driver is unable to avoid in time an object that is suddenly found stationary in the lane, such as a stalled vehicle, a vehicle stopped on the road ahead after an accident, or another obstacle on the road ahead. Detecting small distant objects has therefore become an important and necessary technique for ensuring safe driving.
Conventional detection of small distant objects requires a high-resolution image as the input for object recognition, so that enough object features are available for subsequent analysis. However, extracting features directly from a high-resolution image takes a large amount of computation time. If the whole picture is processed directly, the features of small objects will be insufficient to provide enough information for detection. Most conventional object detection systems are rule-based. They must target specific object features and use dedicated feature extraction methods to define each kind of object, such as cars or people. In bad weather or against a complicated background, when an uncommon object appears on the road, or when object features are lost because the object is deformed or moving fast, detection accuracy and stability drop substantially. As an example of a conventional image detection technique, S. Teoh's research team uses image edge detection together with the left-right symmetry of a car to generate the probable positions of the target object. However, if the camera does not view the car from directly behind, the car no longer appears symmetrical, and detection performance is greatly affected. In addition, the detected object cannot be too small: it must present a certain amount of edge features, otherwise it cannot be identified as a car.
In other prior art, V. Růžička's research team used deep learning for object detection. First, a downsampled low-resolution image is used for preliminary object detection. Among the detected object frames, the smaller ones are selected as positions that need to be examined again, and high-resolution image patches are captured around those positions for a second object detection. This approach requires that the smaller objects be detected successfully by the first model, that is, that the first pass yields a confidence for each small object. If an object is too small, however, it cannot be detected successfully. For distant object detection in the driving field, this approach therefore cannot detect objects that are too far away.
In another prior art, M. Najibi's research team used deep learning for object detection and added a network that predicts a probability map of the probable positions of small objects. First, a downsampled low-resolution image is used for preliminary object detection and for predicting the probable positions of small objects. Then, near the regions of high probability, high-resolution patches are taken from the original picture with less downsampling, and the procedure is repeated until no small objects appear with high probability. This approach requires that objects be large, or that small objects appear gradually. If an object is too small and there are no slightly larger objects to guide the system to its region, small objects may be missed. Furthermore, the object detection model must run inference many times; when there are many widely dispersed objects, the running time becomes too long.
There is further prior art by TSENG, which used a deep learning object detection technique with a multi-layer convolutional neural network structure. Pooling layers are interleaved between the layers, object detection layers receive the outputs of the later layers of the network, and the results are combined into the final object detection output. The shallower outputs of this model focus on detecting smaller objects and the deeper outputs on larger objects, so objects of multiple scales can be detected. However, because the shallow layers extract fewer object features, it is usually harder to classify small objects reliably, and the resulting confidence is relatively poor.
In this kind of distant small object detection, the low-resolution image is first used to estimate the probable positions of small objects, and each position is then cropped from the high-resolution image and passed to the object detection model to obtain detailed object frames and classifications. With this approach, the objects cannot be too small; otherwise, when the low-resolution image is used at the first stage to predict the probable positions of small objects, there may be too few features to determine the object frame and classification, and detection may fail.
Therefore, the prior deep learning object detection techniques need to be improved to overcome their shortcomings. This is also an urgent topic for the relevant technical field.
In view of the above, the present invention proposes an embedded deep learning multi-scale object detection model using a real-time distant region locating device and method thereof, in order to overcome the shortcomings of the prior art.
One purpose of the present invention is to overcome the shortcomings of existing techniques by proposing an embedded multi-function deep learning network that has good computational efficiency and can be used to develop an optimal detection system for small distant objects without increasing the amount of computation.
Another purpose of the present invention is to provide an embedded deep learning multi-scale object detection model using a real-time distant region locating device and method thereof, which captures large objects through a downsampled image while preserving the features of small objects, in order to achieve high detection accuracy.
A further purpose of the present invention, based on the above-mentioned model, device, and method, is to provide a highly efficient deep learning object detection structure with both distant/nearby object detection accuracy and algorithm speed, which achieves better object detection accuracy with a lower amount of computation.
The structure of the present invention can be applied to the majority of well-known deep learning object detection models and can run on the automotive embedded NVIDIA Jetson Xavier platform.
In order to achieve the above-mentioned purpose and other purposes, the present invention provides an embedded deep learning multi-scale object detection model using real-time distant region locating method, which can be used for drive recording to detect the farthest objects in the picture, comprising:
After Step (c), the present invention further includes the following steps:
After Step (d), the present invention further includes Step (h), integrating the vanishing point detection result, and the object high-resolution detection results, and outputting the detection results.
In Step (d) of the present invention, the analyzed vanishing point detection result is the one with the maximum computed confidence.
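The maximum-confidence selection described above can be illustrated with a minimal sketch. It assumes, which the text does not state, that the vanishing point detection subnetwork outputs one confidence value per cell of a regular grid laid over the image; the function name `pick_vanishing_point` and the grid layout are hypothetical.

```python
from typing import List, Tuple


def pick_vanishing_point(
    conf_grid: List[List[float]], img_w: int, img_h: int
) -> Tuple[float, float, float]:
    """Return (x, y, confidence) for the grid cell with the highest confidence.

    Assumption (not from the source): conf_grid[r][c] is the confidence that
    the vanishing point lies in row r, column c of a grid covering the image.
    """
    rows, cols = len(conf_grid), len(conf_grid[0])
    best_r, best_c = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: conf_grid[rc[0]][rc[1]],
    )
    # Map the winning cell to the pixel coordinates of its centre.
    x = (best_c + 0.5) * img_w / cols
    y = (best_r + 0.5) * img_h / rows
    return x, y, conf_grid[best_r][best_c]
```

With a 2×4 grid over a 640×360 image, the cell in the second row and second column would map to the pixel position (240.0, 270.0).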
The vanishing point detection subnetwork of the present invention is a trained vanishing point detection multi-function deep learning network, including a 1×1 convolution operation, a flatten layer, and fully connected layers.
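To make the three named operations concrete, the following is a minimal pure-Python sketch of a head that applies a 1×1 convolution, flattens the result, and feeds it to one fully connected layer. The layer sizes, the absence of activation functions, and all function names are assumptions for illustration only; the source does not specify them.

```python
from typing import List

Feature = List[List[List[float]]]  # indexed as [channel][row][col]


def conv1x1(feat: Feature, weights: List[List[float]]) -> Feature:
    """1x1 convolution: mix channels independently at every pixel.

    weights[o][i] is the weight from input channel i to output channel o.
    """
    rows, cols = len(feat[0]), len(feat[0][0])
    return [
        [
            [sum(w * feat[i][r][c] for i, w in enumerate(w_out)) for c in range(cols)]
            for r in range(rows)
        ]
        for w_out in weights
    ]


def flatten(feat: Feature) -> List[float]:
    """Flatten a [channel][row][col] tensor into a single vector."""
    return [v for channel in feat for row in channel for v in row]


def fully_connected(
    x: List[float], weights: List[List[float]], bias: List[float]
) -> List[float]:
    """One dense layer: out[j] = sum_k weights[j][k] * x[k] + bias[j]."""
    return [sum(w * v for w, v in zip(w_row, x)) + b for w_row, b in zip(weights, bias)]


def vanishing_point_head(
    feat: Feature,
    conv_w: List[List[float]],
    fc_w: List[List[float]],
    fc_b: List[float],
) -> List[float]:
    """Hypothetical head: 1x1 convolution, then flatten, then one dense layer."""
    return fully_connected(flatten(conv1x1(feat, conv_w)), fc_w, fc_b)
```

The 1×1 convolution reduces the channel count cheaply before the flatten step, which keeps the fully connected layer small; whether the patented network uses it for this reason is not stated.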
Between Step (d) and Step (e), the present invention further comprises:
Step (h) of the present invention includes the following:
Step (i) and (d) of the present invention can be executed repeatedly to get the detection results in a plurality of layers.
In order to achieve the above-mentioned purpose and other purposes, the present invention provides an embedded deep learning multi-scale object detection model using real-time distant region locating device, which can be used in drive recording to obtain the farthest objects in the detection picture, comprising:
An image capture unit, used for capturing an image of the picture; a process unit, connected to the image capture unit and used for downsampling the image to obtain a downsampled image; and a storage unit, connected to the process unit, in which the process unit stores the object detection subnetwork, the multi-function deep learning network, and the vanishing point detection subnetwork. The process unit uses the multi-function deep learning network to extract features from the downsampled image, inputs these features to the vanishing point detection subnetwork, and analyzes the vanishing point detection result to determine the vanishing point. The process unit takes the region around the vanishing point as a trim frame and trims the high-resolution image within the trim frame. The process unit then uses the multi-function deep learning network to extract features from the trimmed high-resolution image and uses the object detection subnetwork to analyze the object high-resolution detection results, which include a plurality of object frames. Finally, the process unit integrates the vanishing point detection result and the object high-resolution detection results, and outputs the detection results.
In the present invention, the features of the downsampled image are also input to the object detection subnetwork, and the process unit analyzes the object downsampling detection results, which include a plurality of downsampling object frames. The process unit integrates the vanishing point detection result, the object downsampling detection results, and the object high-resolution detection results, and outputs the detection results.
Compared with the conventional technique, the present invention provides an embedded deep learning multi-scale object detection model using a real-time distant region locating method. At the first stage, it uses the low-resolution input to detect nearby objects and to predict the vanishing point region, and it takes the part of the high-resolution original picture adjacent to the vanishing point as the distant picture. It then performs a second object detection on the distant picture to detect distant objects. Finally, it combines the distant and nearby object detection results through a specific post-processing algorithm to obtain a complete object detection result.
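Taking the picture adjacent to the vanishing point amounts to computing a crop window in the high-resolution image. The sketch below is one possible way to do this, assuming a fixed-size window centred on the vanishing point and shifted to stay inside the image; the function name, the fixed window size, and the clamping behaviour are assumptions, not details from the source.

```python
from typing import Tuple


def crop_box_around(
    vp_x: float, vp_y: float, crop_w: int, crop_h: int, img_w: int, img_h: int
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a crop_w x crop_h window.

    The window is centred on the vanishing point (vp_x, vp_y) and shifted,
    if needed, so that it lies entirely inside the img_w x img_h image.
    """
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("crop must fit inside the image")
    left = int(round(vp_x - crop_w / 2))
    top = int(round(vp_y - crop_h / 2))
    # Clamp so the window never leaves the image.
    left = max(0, min(left, img_w - crop_w))
    top = max(0, min(top, img_h - crop_h))
    return left, top, left + crop_w, top + crop_h
```

For a hypothetical 1920×1080 original and a 640×360 window, a vanishing point at the image centre gives the box (640, 360, 1280, 720), while a vanishing point near a corner gives a window pushed back against that corner.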
Through the image capture device, the present invention captures image information in front of the vehicle at a fixed frequency. The input image may have an arbitrary resolution and should be a clear three-channel (RGB) color image; a higher resolution can be used to detect farther objects.
In the present invention, the captured image is downsampled to a low-resolution image and input to the multi-function deep learning network, which extracts an image feature matrix. This matrix is input to the object detection subnetwork and the vanishing point detection subnetwork to obtain the detection results for larger, that is closer, objects and the vanishing point of the picture. Based on this vanishing point, a small region of preset size around it is taken from the high-resolution original picture and sent, without downsampling, through the multi-function deep learning network and the object detection subnetwork again to detect smaller distant objects. Finally, a specific post-processing method combines the two detection results to obtain the final result.
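The flow described above can be sketched as a two-stage function. This is an illustrative outline only: the networks are passed in as plain callables, the image is a toy list of pixel rows, and the names `two_stage_detect`, `backbone`, `detect_head`, and `vp_head` are hypothetical stand-ins for the multi-function deep learning network, the object detection subnetwork, and the vanishing point detection subnetwork.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score
Image = List[List[int]]  # toy single-channel image: rows of pixel values


def downsample(img: Image, factor: int) -> Image:
    """Nearest-neighbour downsampling: keep every `factor`-th pixel."""
    return [row[::factor] for row in img[::factor]]


def two_stage_detect(
    img: Image,
    factor: int,
    crop_w: int,
    crop_h: int,
    backbone: Callable,
    detect_head: Callable,
    vp_head: Callable,
):
    """Return (nearby boxes, distant boxes, vanishing point) in full-image pixels."""
    img_h, img_w = len(img), len(img[0])

    # Stage 1: low-resolution pass for nearby objects and the vanishing point.
    feats = backbone(downsample(img, factor))
    near = [
        (x1 * factor, y1 * factor, x2 * factor, y2 * factor, s)
        for x1, y1, x2, y2, s in detect_head(feats)
    ]
    vx, vy = vp_head(feats)
    vx, vy = vx * factor, vy * factor  # back to full-resolution pixels

    # Stage 2: crop around the vanishing point, with no downsampling.
    left = max(0, min(int(vx - crop_w / 2), img_w - crop_w))
    top = max(0, min(int(vy - crop_h / 2), img_h - crop_h))
    patch = [row[left:left + crop_w] for row in img[top:top + crop_h]]
    far = [
        (x1 + left, y1 + top, x2 + left, y2 + top, s)
        for x1, y1, x2, y2, s in detect_head(backbone(patch))
    ]
    return near, far, (vx, vy)
```

Both sets of boxes are returned in the coordinate system of the original picture, so that a later post-processing step can combine them directly.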
Compared with the prior art, the present invention has the following advantages:
The present invention adopts a deep learning method to extract image features. Compared with the algorithms of conventional techniques, the present invention has higher accuracy and stability across various weather conditions, diverse backgrounds, and various object types.
The present invention uses the multi-function deep learning network to compute the two tasks of object detection and vanishing point detection in the same network. The two tasks share the trunk structure of the network, which greatly reduces the amount of computation. Owing to this advantage, the present invention can effectively detect distant objects without increasing the amount of computation too much, and can therefore effectively handle the large amount of computation required by the deep learning network.
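The saving from a shared trunk can be illustrated with a toy sketch in which the trunk is evaluated once and both task heads read the same features. Everything here is a hypothetical stand-in: `CountingBackbone` computes row means instead of real convolutional features, and the two heads are trivial functions.

```python
from typing import List, Tuple


class CountingBackbone:
    """Toy shared trunk that records how many times it is evaluated."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, image: List[List[float]]) -> List[float]:
        self.calls += 1
        # Stand-in features: the mean of each image row.
        return [sum(row) / len(row) for row in image]


def detect_head(features: List[float]) -> List[bool]:
    """Toy detection head: flag rows whose feature exceeds 0.5."""
    return [f > 0.5 for f in features]


def vp_head(features: List[float]) -> int:
    """Toy vanishing point head: index of the strongest feature."""
    return max(range(len(features)), key=features.__getitem__)


def multitask_forward(
    backbone: CountingBackbone, image: List[List[float]]
) -> Tuple[List[bool], int]:
    features = backbone(image)  # the trunk runs once per image
    return detect_head(features), vp_head(features)  # both heads reuse it
```

With two separate networks the trunk would run twice per image; here it runs once, and only the lightweight heads are duplicated.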
As for the sensor, the present invention needs only a single image capture device to detect both nearby and distant objects. By comparison, some conventional systems must adopt both a wide-angle lens and a telephoto lens to detect nearby objects and distant objects, respectively. The present invention can therefore further reduce cost.
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
The attached figures are used to describe embodiments of the present invention. In the figures, the same reference symbol represents the same element, and the size or thickness of an element may be scaled in order to show it more clearly.
Firstly, please refer to
Firstly, as shown in Step S100 of
As shown in Step S102 of
As shown in Step S104 of
As shown in Step S106 of
As shown in Step S108 of
Please refer to
Refer to Step S401 of
Refer again to Step S402 of
Refer again to Step S403 of
Refer to Step S404 of
Refer to Step S405 of
As shown in Step S110 of
As shown in Step S112 of
As shown in Step S114 of
Please refer to
As shown in Step S116 of
As shown in Step S114 of
It has to describe that the execution of Step S116 in
In another embodiment of the present invention, after Step S106 of
On the other hand of the present invention,
As shown in Step S422 of
As shown in Step S423 of
In the present invention, Step S116 of
As the process unit 620 shown in
As for the structure of vanishing point detection subnetwork 636 shown in
Inputting the characteristics of downsampling image is shown in
The process unit 620 of
The object detection subnetwork 632 shown in
The process unit 620 shown in
The process unit 620 shown in
The present invention relates to an object detection method that can be used with the forward view of a drive recorder to automatically locate the farthest place in the picture and focus on distant objects. It aims to solve the problem that the detection distance of advanced driver assistance systems is often not far enough. The present invention provides an efficient multi-scale object detection network (ConcentrateNet), which automatically searches for the vanishing point of the picture and pays close attention to the region around it. In the first inference of the model, it predicts the object detection results for the larger field of view together with the vanishing point position, which represents the farthest place in the picture. The model then performs inference once more on the region around the vanishing point to obtain the detection results for distant objects. Finally, non-maximum suppression is used to combine the two sets of results.
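The final merging step, commonly known as non-maximum suppression (NMS), can be sketched as follows. This is a generic greedy, class-agnostic NMS over the union of the two result sets, offered as an illustration; the IoU threshold of 0.5, the box format, and the function names are assumptions, and the post-processing actually used by the invention may differ.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(
    near: Sequence[Box], far: Sequence[Box], iou_thr: float = 0.5
) -> List[Box]:
    """Greedy NMS over both result sets, all in full-image coordinates.

    Boxes are visited from highest to lowest score; a box is kept only if
    it does not overlap an already kept box by more than iou_thr.
    """
    kept: List[Box] = []
    for box in sorted(list(near) + list(far), key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_thr for k in kept):
            kept.append(box)
    return kept
```

When the same object is found by both passes, only the higher-scoring box survives, which avoids double counting in the region where the two fields of view overlap.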
The structure of the present invention can be applied to most object detection models. Several advanced object detection models can be used, with the multi-scale object detection network structure added, to carry out tests. Compared with the original model, which relies on a higher-resolution input picture to obtain better results, the structure can achieve higher accuracy with less computation. For example, on the large-scale driving dataset BDD100K, YOLOv3 using 960×540 high-resolution input directly achieves an AP of 28.2%, whereas the multi-scale object detection network using 640×360 low-resolution input still achieves an AP of 30.7%. The recall of small objects even increases from an AR of 24.9% to 35.7%. In addition, the multi-scale object detection network of the present invention can be tested on the low-power embedded NVIDIA Jetson AGX Xavier platform.
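The figures quoted above can be put side by side with a few lines of arithmetic. The sketch only compares the pixel counts of the two input resolutions and the reported metric differences; it does not estimate total computation, because the size of the second-stage crop is not given in this passage.

```python
# Input resolutions quoted in the text.
hi_res_pixels = 960 * 540  # direct high-resolution YOLOv3 input
lo_res_pixels = 640 * 360  # low-resolution first-stage input

# Fraction of the high-resolution pixel count used by the first stage.
first_stage_ratio = lo_res_pixels / hi_res_pixels

# Differences in the reported metrics, in percentage points.
ap_gain = round(30.7 - 28.2, 1)  # average precision
ar_small_gain = round(35.7 - 24.9, 1)  # small-object recall

print(f"pixels: {lo_res_pixels} vs {hi_res_pixels} ({first_stage_ratio:.1%})")
print(f"AP gain: {ap_gain} points, small-object AR gain: {ar_small_gain} points")
```

The first-stage input therefore carries a little under half the pixels of the 960×540 input, while the reported AP and small-object AR are both higher.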
The present invention provides an embedded deep learning multi-scale object detection model using real-time distant region locating method, which can be used for drive recording to detect the farthest objects in the picture, comprising:
After Step (c), the present invention further includes the following steps:
(i) Inputting the characteristics of downsampling image to the object detection subnetwork, analyzing the object downsampling detection results, the object downsampling detection result includes a plurality of downsampling object frames;
Executing the original Step (h), in order to get (h)′, integrating the vanishing point detection result, the object downsampling detection results, and the object high-resolution detection results, and outputting the detection results.
After Step (d), the present invention further includes Step (h), integrating the vanishing point detection result, and the object high-resolution detection results, and outputting the detection results.
In Step (d) of the present invention, the analyzed vanishing point detection result is the one with the maximum computed confidence.
The vanishing point detection subnetwork of the present invention is a trained vanishing point detection multi-function deep learning network, including a 1×1 convolution operation, a flatten layer, and fully connected layers.
Between Step (d) and Step (e), the present invention further comprises:
Step (h) of the present invention includes:
Step (i) and (d) of the present invention can be executed repeatedly to get the detection results in a plurality of layers.
The object detection subnetwork of the present invention includes RetinaNet, YOLOv3, FCOS, FoveaBox, RepPoints, and other anchor-based and anchor-free models.
It is understood that various other modifications will be apparent to and can be readily made by those skilled in the art without departing from the scope and spirit of this invention. Accordingly, it is not intended that the scope of the claims appended hereto be limited to the description as set forth herein, but rather that the claims be construed as encompassing all the features of patentable novelty that reside in the present invention, including all features that would be treated as equivalents thereof by those skilled in the art to which this invention pertains.
Number | Date | Country | Kind |
---|---|---|---|
110149226 | Dec 2021 | TW | national |
Number | Date | Country | |
---|---|---|---|
20230206654 A1 | Jun 2023 | US |