This non-provisional application claims priority under 35 U.S.C. § 119(a) on Patent Application No(s). 110130954 filed in Taiwan (R.O.C.) on Aug. 20, 2021, the entire contents of which are hereby incorporated by reference.
This disclosure relates to a bounding box reconstruction method, and particularly to a 3D bounding box reconstruction method.
Due to the widespread use of vehicles, the issue of vehicle safety has received increasing attention, and accordingly, more surveillance cameras have been installed at traffic intersections. However, as the number of surveillance cameras increases, it becomes impractical to monitor each camera manually, so various automated methods such as vehicle detection, traffic flow calculation, vehicle tracking and license plate recognition have been developed in recent years to be implemented on these cameras.
Most of these automated methods involve computations based on the bounding box corresponding to the vehicle in the camera image. However, the existing bounding box reconstruction methods are limited by the following conditions: (1) the road environment must be composed of vertical lines and horizontal lines; (2) the vehicle movement only includes linear motion; (3) the direction of traffic flow is the same as the lane direction. Therefore, the existing bounding box reconstruction methods can only be applied to road environments with a fixed trajectory direction.
Accordingly, this disclosure provides a 3D bounding box reconstruction method, a 3D bounding box reconstruction system and a non-transitory computer readable medium that are applicable to complex road environments, such as intersections, roundabouts, etc.
According to an embodiment of this disclosure, a 3D bounding box reconstruction method includes obtaining masks corresponding to a target object in images, obtaining a trajectory direction of the target object according to the masks, generating a target contour according to one of the masks, transforming the target contour into a transformed contour using a transformation matrix, obtaining a first bounding box according to the transformed contour and the trajectory direction, transforming the first bounding box into a second bounding box that corresponds to the target contour using the transformation matrix, obtaining first reference points according to the target contour and the second bounding box, transforming the first reference points into second reference points using the transformation matrix, obtaining a third bounding box using the second reference points, transforming the third bounding box into a fourth bounding box using the transformation matrix, and obtaining a 3D bounding box using the second bounding box and the fourth bounding box.
According to an embodiment of this disclosure, a 3D bounding box reconstruction system includes an image input device, a storage device and a processing device coupled to the image input device and the storage device. The image input device is configured to receive images. The storage device stores a transformation matrix. The processing device is configured to perform a number of steps including: obtaining masks corresponding to a target object in images; obtaining a trajectory direction of the target object according to the masks; generating a target contour according to one of the masks; transforming the target contour into a transformed contour using the transformation matrix; obtaining a first bounding box according to the transformed contour and the trajectory direction, and transforming the first bounding box into a second bounding box using the transformation matrix, wherein the second bounding box corresponds to the target contour; obtaining first reference points according to the target contour and the second bounding box, and transforming the first reference points into second reference points using the transformation matrix; obtaining a third bounding box using the second reference points, and transforming the third bounding box into a fourth bounding box using the transformation matrix; and obtaining a 3D bounding box using the second bounding box and the fourth bounding box.
According to an embodiment of this disclosure, a non-transitory computer readable medium includes at least one computer executable procedure, wherein a number of steps are performed when said at least one computer executable procedure is executed by a processor, and the steps include: obtaining masks corresponding to a target object in images; obtaining a trajectory direction of the target object according to the masks; generating a target contour according to one of the masks; transforming the target contour into a transformed contour using a transformation matrix; obtaining a first bounding box according to the transformed contour and the trajectory direction, and transforming the first bounding box into a second bounding box using the transformation matrix, wherein the second bounding box corresponds to the target contour; obtaining first reference points according to the target contour and the second bounding box, and transforming the first reference points into second reference points using the transformation matrix; obtaining a third bounding box using the second reference points, and transforming the third bounding box into a fourth bounding box using the transformation matrix; and obtaining a 3D bounding box using the second bounding box and the fourth bounding box.
In view of the above description, the 3D bounding box reconstruction system, the 3D bounding box reconstruction method and the non-transitory computer readable medium in this disclosure may reconstruct a 3D bounding box of a target object for correcting the position of the center of the target object so as to obtain a more accurate target object trajectory direction. The system, method and non-transitory computer readable medium in this disclosure may be applied to vehicle monitoring, achieve 3D reconstruction of traffic flows in different directions, and overcome the limitations of the existing methods.
The present disclosure will become more fully understood from the detailed description given hereinbelow and the accompanying drawings which are given by way of illustration only and thus are not limitative of the present disclosure and wherein:
In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. It will be apparent, however, that one or more embodiments may be practiced without these specific details. In other instances, well-known structures and devices are schematically shown in order to simplify the drawings.
Please refer to
The image input device 11 may include, but not limited to, a wired or wireless image transmission port. The image input device 11 is configured to receive images. These images may be static images captured in advance from a real-time stream or video. Alternatively, the image input device 11 may receive a real-time stream or video including images. For example, the image input device 11 may receive road images taken by a monocular camera installed on the road.
The storage device 13 may include, but not limited to, flash memory, hard disk drive (HDD), solid-state drive (SSD), dynamic random access memory (DRAM) or static random access memory (SRAM). The storage device 13 may store a transformation matrix. The transformation matrix may be associated with perspective transformation, and used to project an image to another viewing plane. In other words, the transformation matrix may be used to transform an image from a first perspective to a second perspective, and to transform the image from the second perspective back to the first perspective. For example, the first perspective and the second perspective are side perspective and top perspective respectively. Besides the transformation matrix, the storage device 13 may also store an object detection model trained in advance. The object detection model may be a convolutional neural network (CNN) model, particularly an instance segmentation model, such as Deep Snake model, but this disclosure is not limited to this.
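As a non-limiting illustration, such a transformation matrix may be estimated from four pairs of corresponding points identified in the first perspective image and the second perspective image. The sketch below uses OpenCV's getPerspectiveTransform; the point coordinates are hypothetical placeholders and not values taken from this disclosure.

```python
import cv2
import numpy as np

# Hypothetical pixel positions of four road landmarks in the side (first)
# perspective image ...
side_pts = np.float32([[412, 585], [873, 602], [951, 321], [388, 310]])
# ... and the corresponding positions of the same landmarks in the top
# (second) perspective image.
top_pts = np.float32([[100, 700], [500, 700], [500, 100], [100, 100]])

# 3x3 matrix mapping the first perspective to the second perspective.
M = cv2.getPerspectiveTransform(side_pts, top_pts)
# Its inverse maps points from the second perspective back to the first.
M_inv = np.linalg.inv(M)
```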
The processing device 15 may include, but not limited to, a single processor or integration of microprocessors, such as central processing unit (CPU), graphic processing unit (GPU), etc. The processing device 15 is configured to use the data stored in the storage device 13 to detect the target object and reconstruct a 3D bounding box of the target object, and the steps performed by the processing device 15 are described later.
In some embodiments, before detecting the target object and reconstructing the 3D bounding box of the target object, the processing device 15 may obtain the transformation matrix according to a first image taken from the first perspective and a second image taken from the second perspective, and store the transformation matrix into the storage device 13. Please refer to
Please refer to
As shown in
In step S21, the processing device 15 obtains masks corresponding to the target object in images. More particularly, the processing device 15 may input the images into an object detection model and determine the target object in the images through the object detection model so as to obtain the masks corresponding to the target object. The object detection model may be a convolutional neural network (CNN) model that has been trained to detect the target object, particularly an instance segmentation model, such as Deep Snake model, but this disclosure is not limited to this.
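For illustration only, the sketch below obtains target-object masks with torchvision's pretrained Mask R-CNN as a stand-in instance segmentation model; the Deep Snake model mentioned above is not part of torchvision, and the score threshold and COCO class index used here are assumptions.

```python
import torch
import torchvision

# Stand-in instance segmentation model (not the Deep Snake model of this
# disclosure); requires a torchvision version that accepts weights="DEFAULT".
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def get_target_masks(image, score_thresh=0.5, target_label=3):
    """Return binary masks of detections classified as the target object.

    image: float tensor of shape (3, H, W) with values in [0, 1].
    target_label: COCO index 3 ('car'), assumed here as the target class.
    """
    with torch.no_grad():
        out = model([image])[0]
    keep = (out["scores"] > score_thresh) & (out["labels"] == target_label)
    # The model outputs soft masks of shape (N, 1, H, W); binarize at 0.5.
    return (out["masks"][keep, 0] > 0.5).cpu().numpy()
```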
In step S22, the processing device 15 obtains a trajectory direction of the target object according to the masks. In one implementation, the processing device 15 may process the masks using a single object tracking algorithm to obtain the trajectory direction of the target object. In another implementation, for the situation in which the masks correspond to multiple target objects (i.e., multiple target objects are contained in the same image), the processing device 15 may process the masks using a multiple object tracking algorithm to obtain the trajectory direction of each of the target objects. More particularly, the multiple object tracking algorithm may include: considering the centers of the masks to be the positions of the target objects; processing the obtained centers using a Kalman filter to obtain an initial tracking result; processing the characteristic matrix of each of the masks using the Hungarian algorithm to adjust the initial tracking result; and obtaining the trajectory direction of each of the target objects from the tracking result. With the Hungarian algorithm, the tracking result may be more robust, and the problem of occlusion between target objects may be alleviated.
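A minimal sketch of the association step described above is given below, assuming the Kalman-predicted track centers and the detected mask centers are already available as arrays; the 50-pixel gating distance is an arbitrary assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mask_center(mask):
    """Centroid of a binary mask, taken as the position of a target object."""
    ys, xs = np.nonzero(mask)
    return np.array([xs.mean(), ys.mean()])

def associate(predicted_centers, detected_centers, max_dist=50.0):
    """Match Kalman-predicted track centers to detected mask centers with the
    Hungarian algorithm and return (track_index, detection_index) pairs."""
    cost = np.linalg.norm(
        predicted_centers[:, None, :] - detected_centers[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
```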
In step S23, the processing device 15 generates a target contour according to one of the masks. More particularly, the processing device 15 may take the outer contour of one of the masks as the target contour. In steps S24-S28, the processing device 15 performs a series of transformations and processing on the target contour to obtain the 3D bounding box of the target object corresponding to the target contour. In particular, the processing device 15 may generate a target contour for each mask in each image and perform steps S24-S28 on the target contour to reconstruct the 3D bounding box of each target object on each image.
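The extraction of the target contour in step S23 may, for example, be sketched as follows with OpenCV; keeping only the largest external contour is an assumption made to guard against spurious blobs in the mask.

```python
import cv2
import numpy as np

def outer_contour(mask):
    """Take the outer contour of a binary mask as the target contour (step S23)."""
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)
    return largest.reshape(-1, 2).astype(np.float32)  # (N, 2) pixel points
```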
For understanding steps S24-S28 of the 3D bounding box reconstruction method in a better way, a vehicle is used as the target object in the following description, but the target object of the invention is not limited to this. Please refer to
Steps (a)-(h) may correspond to steps S24-S28 in
Step (a): transforming the target contour C1 into a transformed contour C2 using the transformation matrix. Step (a) may be regarded as mapping the target contour C1 from the first perspective image to the second perspective image to form the transformed contour C2. In particular, as shown in subfigure (a2), the size of the transformed contour C2 is obviously larger than that of an ordinary vehicle, and the shape of the transformed contour C2 is distorted. This is because the apparent size and shape of the target object in the image taken from the side perspective vary with its actual distance from the camera. Therefore, if trajectory tracking is performed using the vehicle contour alone, the obtained tracking result will contain errors.
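Step (a) may be sketched as a perspective transformation of the contour points, for example with OpenCV; the same helper, called with the inverse matrix, covers the inverse mappings used in the later steps.

```python
import cv2
import numpy as np

def transform_points(points_xy, M):
    """Map (N, 2) image points between the two perspectives using the 3x3
    transformation matrix M (pass the inverse matrix to map back)."""
    pts = points_xy.reshape(-1, 1, 2).astype(np.float32)
    return cv2.perspectiveTransform(pts, M).reshape(-1, 2)

# e.g. transformed_contour_C2 = transform_points(target_contour_C1, M)
```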
Step (b): obtaining a first bounding box B1 according to the transformed contour C2 and the trajectory direction D, and transforming the first bounding box B1 into a second bounding box B2 using the transformation matrix, wherein the second bounding box B2 corresponds to the target contour C1. More particularly, the implementation of obtaining the first bounding box B1 according to the transformed contour C2 and the trajectory direction D may include: obtaining two first line segments that are parallel to the trajectory direction D and tangent to the transformed contour C2; obtaining two second line segments that are perpendicular to the trajectory direction D and tangent to the transformed contour; and forming the first bounding box B1 using the two first line segments and the two second line segments. In other words, the first bounding box B1 is a quadrilateral formed by four tangent lines to the transformed contour C2, two of which are parallel to the trajectory direction D, and the other two are perpendicular to the trajectory direction D. The first bounding box B1 is transformed through the transformation matrix to be inversely mapped from the second perspective image to the first perspective image so as to form the second bounding box B2. Accordingly, the second bounding box B2 is also a quadrilateral.
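One way to realize step (b) is to rotate the transformed contour into a frame aligned with the trajectory direction, take the axis-aligned extremes (which correspond to the two tangent lines parallel to D and the two perpendicular to D), and rotate the resulting corners back; the sketch below assumes the contour and direction are given as NumPy arrays.

```python
import numpy as np

def trajectory_aligned_box(points, direction):
    """First bounding box B1: quadrilateral with two sides parallel and two
    sides perpendicular to the trajectory direction, tangent to the contour.

    points: (N, 2) transformed contour C2; direction: (dx, dy) trajectory D.
    """
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    n = np.array([-d[1], d[0]])        # unit normal to the trajectory
    R = np.stack([d, n])               # rows: trajectory axis, normal axis
    local = points @ R.T               # contour in the trajectory-aligned frame
    (xmin, ymin), (xmax, ymax) = local.min(axis=0), local.max(axis=0)
    corners = np.array([[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]])
    return corners @ R                 # four corners back in the original frame
```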
Step (c): obtaining a corner point P10. In an implementation, the corner point P10 is a vertex closest to the origin of the camera coordinate system in the quadrilateral second bounding box B2. In another implementation, the corner point P10 is a vertex of which the y-coordinate in the image coordinate system is the smallest in the second bounding box B2, wherein the bottom left corner, horizontal direction and the vertical direction of the image are defined as the origin, x-axis direction and y-axis direction of the image coordinate system respectively.
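The second implementation of step (c) may be sketched as below; the conversion from pixel rows to the bottom-left-origin image coordinate system defined in the text is made explicit, and the corner ordering of the second bounding box B2 is assumed to be arbitrary.

```python
import numpy as np

def corner_point(b2_corners, image_height):
    """Step (c): vertex of the second bounding box B2 with the smallest
    y-coordinate in an image coordinate system whose origin is the bottom-left
    corner (so pixel row r maps to y = image_height - r)."""
    b2 = np.asarray(b2_corners, dtype=float)
    y_bottom_left = image_height - b2[:, 1]
    return b2[np.argmin(y_bottom_left)]
```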
Step (d): intersecting the target contour C1 along the first edge E1 of the second bounding box B2 in the vertical direction of the images to form first intersection points, projecting the first intersection points to the first edge E1 to form first points P31 respectively, and obtaining a first extreme point P11 that is a point with the longest distance from the corner point P10 among the first points P31.
Step (e): intersecting the target contour C1 along the second edge E2 of the second bounding box B2 in the vertical direction of the images to form second intersection points, projecting the second intersection points to the second edge E2 to form second points P32 respectively, and obtaining a second extreme point P12 that is a point with the longest distance from the corner point P10 among the second points P32.
The first edge E1 in step (d) and the second edge E2 in step (e) are the two sides of the second bounding box B2 whose intersection is the corner point P10. The first extreme point P11 and the second extreme point P12 obtained in these two steps may respectively indicate the length position and the width position of the target object. In particular, the first reference points mentioned in step S26 may be the corner point P10, the first extreme point P11 and the second extreme point P12.
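One possible reading of steps (d) and (e) is sketched below: each contour point is dropped vertically onto the line through the edge under consideration, and the projected point farthest from the corner point P10 is kept as the extreme point. This is a simplified interpretation under stated assumptions, not necessarily the exact procedure of this disclosure.

```python
import numpy as np

def extreme_point_on_edge(contour, corner, edge_end):
    """Project contour points vertically onto the line through the edge from
    `corner` (P10) to `edge_end`, and return the projection farthest from P10.

    contour: (N, 2) target contour C1 in image coordinates.
    """
    p0 = np.asarray(corner, dtype=float)
    p1 = np.asarray(edge_end, dtype=float)
    dx = p1[0] - p0[0]
    if abs(dx) < 1e-9:                  # edge is vertical; projection degenerates
        return p1
    slope = (p1[1] - p0[1]) / dx
    xs = contour[:, 0]
    projected = np.stack([xs, p0[1] + slope * (xs - p0[0])], axis=1)
    return projected[np.argmax(np.linalg.norm(projected - p0, axis=1))]
```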
Step (f): transforming the corner point P10, the first extreme point P11 and the second extreme point P12 using the transformation matrix to form the transformed corner point P20, the transformed first extreme point P21 and the transformed second extreme point P22 respectively. Step (f) may be regarded as mapping the corner point P10, the first extreme point P11 and the second extreme point P12 from the first perspective image to the second perspective image to form the transformed corner point P20, the transformed first extreme point P21 and the transformed second extreme point P22. In particular, the second reference points mentioned in step S26 may be the transformed corner point P20, the transformed first extreme point P21 and the transformed second extreme point P22.
Step (g): obtaining a third bounding box B3 using the transformed corner point P20, the transformed first extreme point P21 and the transformed second extreme point P22, and transforming the third bounding box B3 into a fourth bounding box B4 using the transformation matrix. More particularly, the implementation of obtaining the third bounding box B3 using the transformed corner point P20, the transformed first extreme point P21 and the transformed second extreme point P22 may include: obtaining a first line segment that connects the transformed corner point P20 and the transformed first extreme point P21; obtaining a second line segment that connects the transformed corner point P20 and the transformed second extreme point P22; obtaining a third line segment that connects to the transformed first extreme point P21 and is parallel to the second line segment; obtaining a fourth line segment that connects to the transformed second extreme point P22 and is parallel to the first line segment; and forming the third bounding box B3 using the first to fourth line segments. Then, the third bounding box B3 is transformed through the transformation matrix to be inversely mapped from the second perspective image to the first perspective image so as to form the fourth bounding box B4.
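Since the four line segments of step (g) form a parallelogram spanned by the edges P20-P21 and P20-P22, the third bounding box B3 may be sketched as below; the corner ordering is an assumption, and the usage comment reuses the hypothetical helpers from the earlier sketches.

```python
import numpy as np

def third_bounding_box(p20, p21, p22):
    """Third bounding box B3 as the parallelogram spanned by P20->P21 and
    P20->P22; returns corners in the order P20, P21, opposite corner, P22."""
    p20, p21, p22 = (np.asarray(p, dtype=float) for p in (p20, p21, p22))
    opposite = p21 + (p22 - p20)        # intersection of the two parallel sides
    return np.stack([p20, p21, opposite, p22])

# e.g. fourth_box_B4 = transform_points(third_bounding_box(P20, P21, P22), M_inv)
```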
Step (h): obtaining a 3D bounding box using the second bounding box B2 and the fourth bounding box B4. More particularly, it can be seen from the above steps that the second bounding box B2 and the fourth bounding box B4 are each composed of a quadrilateral. In an implementation, step (h) may include: generating a quadrilateral composing a fifth bounding box B5, wherein a vertex P13 farthest from the origin of the camera coordinate system in the fifth bounding box B5 is identical to a vertex farthest from the origin of the camera coordinate system in the second bounding box B2, and the quadrilateral composing the fifth bounding box B5 is identical to the quadrilateral composing the fourth bounding box B4; and using the fourth bounding box B4 as the bottom of the 3D bounding box, and using the fifth bounding box B5 as the top of the 3D bounding box. In another implementation, the vertex P13 is a vertex of which the y-coordinate in the image coordinate system is the largest in the fifth bounding box B5 and its image coordinates are the same as those of a vertex of which the y-coordinate in the image coordinate system is the largest in the second bounding box B2, wherein the bottom left corner, horizontal direction and the vertical direction of the image are defined as the origin, x-axis direction and y-axis direction of the image coordinate system respectively.
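Under the second implementation described above (largest-y vertices coinciding, with the image origin at the bottom-left), the assembly of step (h) may be sketched as below; the corners of B2 and B4 are assumed to be given already in that coordinate system.

```python
import numpy as np

def assemble_3d_box(b2_corners, b4_corners):
    """Step (h): the fifth bounding box B5 is a translated copy of B4 whose
    largest-y vertex coincides with the largest-y vertex of B2; B4 is the
    bottom face of the 3D bounding box and B5 is the top face."""
    b2 = np.asarray(b2_corners, dtype=float)
    b4 = np.asarray(b4_corners, dtype=float)
    shift = b2[np.argmax(b2[:, 1])] - b4[np.argmax(b4[:, 1])]
    b5 = b4 + shift                     # fifth bounding box B5
    return b4, b5                       # (bottom face, top face)
```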
In some embodiments, besides obtaining the 3D bounding box from the first perspective, step (h) may further include obtaining a 3D bounding box from the second perspective. In an implementation, the 3D bounding box from the second perspective is obtained using the first bounding box B1 and the third bounding box B3, in the same way as the 3D bounding box from the first perspective is obtained, so the details are not repeated here. In another implementation, the fifth bounding box B5 may be further transformed into a sixth bounding box B6 serving as the top of the 3D bounding box from the second perspective, and the third bounding box B3 is used as the bottom of the 3D bounding box from the second perspective.
For ease of understanding,
In some embodiments, after obtaining the 3D bounding box of the target object, the 3D bounding box reconstruction system/method may further apply the 3D bounding box to trajectory tracking of the target object. More particularly, the geometric center of the bottom of the 3D bounding box may be regarded as the position of the target object. Please refer to
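As a small illustration of this use, assuming the bottom face returned by the sketch above, the corrected position of the target object is simply the mean of its four corners.

```python
import numpy as np

def object_position(bottom_face_corners):
    """Geometric center of the bottom of the 3D bounding box, regarded as the
    position of the target object for trajectory tracking."""
    return np.asarray(bottom_face_corners, dtype=float).mean(axis=0)
```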
In some embodiments, the 3D bounding box reconstruction method described in the above embodiments may be included in a non-transitory computer readable medium, such as an optical disc, a flash drive, a memory card, a hard disk of a cloud server, etc., in the form of at least one computer executable procedure. When said at least one computer executable procedure is executed by the processor of a computer, the 3D bounding box reconstruction method described in the above embodiments is implemented.
In view of the above description, the 3D bounding box reconstruction system, the 3D bounding box reconstruction method and the non-transitory computer readable medium in this disclosure may perform specific image transformation and processing steps on the contour corresponding to the target object to create a 3D bounding box of the target object, without the need to input the contour into a neural network model for computation, so they may have a lower computational complexity and a higher computational speed in comparison with purely using a neural network model to create a 3D bounding box. The 3D bounding box reconstruction system, the 3D bounding box reconstruction method and the non-transitory computer readable medium in this disclosure may reconstruct a 3D bounding box of a target object for correcting the position of the center of the target object so as to obtain a more accurate target object trajectory direction. In this way, the system, method and non-transitory computer readable medium in this disclosure may achieve 3D reconstruction of traffic flows in different directions, and may particularly be applied to vehicle monitoring in complex road environments such as intersections or roundabouts, rather than being limited to preset road environments with a fixed trajectory direction (such as highways). Also, since a more accurate center may be obtained, the system, method and non-transitory computer readable medium in this disclosure may have a good performance in the application of vehicle speed monitoring. Moreover, in comparison with a 2D bounding box, a 3D bounding box may show the actual scope of the target object, so the system, method and non-transitory computer readable medium in this disclosure may have a good performance in the application of judging traffic incidents or states.
Number | Date | Country | Kind
---|---|---|---
110130954 | Aug 2021 | TW | national