The present application relates generally to traffic infrastructure systems and, more particularly for example, to systems and methods for three-dimensional tracking of objects in a traffic scene.
Traffic control systems use sensors to detect vehicles and traffic to help mitigate congestion and improve safety. These sensors range in capability from those that simply detect vehicles in closed systems (e.g., provide a simple contact closure to a traffic controller) to those that are able to classify (e.g., distinguish between bikes, cars, trucks, etc.) and monitor the flows of vehicles and other objects (e.g., pedestrians, animals).
Within a traffic control system, a traffic signal controller may be used to manipulate the various phases of a traffic signal at an intersection and/or along a roadway to affect traffic signalization. These traffic control systems are typically positioned adjacent to the intersection/roadway they control (e.g., disposed upon a traffic signal pole). Traffic control systems generally comprise an enclosure constructed from metal or plastic to house electronic equipment such as a sensor (e.g., an infrared imaging camera or other device), communications components, and control components to provide instructions to traffic signals or other traffic control/monitoring devices.
The operation of the traffic signal may be adaptive, responsive, pre-timed, fully-actuated, or semi-actuated depending upon the hardware available at the intersection and the amount of automation desired by the operator (e.g., a municipality). For instance, cameras, loop detectors, or radar may be used to detect the presence, location and/or movement of one or more vehicles. For example, video tracking methods may be used to identify and track objects that are visible in a series of captured images. In response to a vehicle being detected, a traffic signal controller may alter the timing of the traffic signal cycle, for example, to shorten a red light to allow a waiting vehicle to traverse the intersection without waiting for a full phase to elapse or to extend a green phase if it determines an above-average volume of traffic is present and the queue needs additional time to clear.
One drawback of conventional systems is that the systems are limited to tracking objects that are visible in the captured sensor data. For example, a large truck in an intersection may block the view of one or more smaller vehicles from a camera used to monitor traffic. Motion detection algorithms, which track objects across a series of captured images, may not accurately track objects that are blocked from view of the camera. In view of the foregoing, there is a continued need for improved traffic control systems and methods that more accurately detect and monitor traffic.
Improved traffic infrastructure systems and methods are disclosed herein. In various embodiments, systems and methods for tracking objects through a traffic control system include a plurality of sensors configured to capture data associated with a traffic location, and a logic device configured to detect one or more objects in the captured data, determine an object location within the captured data, transform each object location to world coordinates associated with one of the plurality of sensors, and track each object location in the world coordinates using prediction and occlusion-based processes. The plurality of sensors may include a visual image sensor, a thermal image sensor, a radar sensor, and/or another sensor. An object localization process includes a trained deep learning process configured to receive captured data from one of the sensors, determine a bounding box surrounding the detected object, and output a classification of the detected object. The tracked objects are further transformed to three-dimensional objects in the world coordinates.
The scope of the present disclosure is defined by the claims, which are incorporated into this section by reference. A more complete understanding of embodiments of the invention will be afforded to those skilled in the art, as well as a realization of additional advantages thereof, by a consideration of the following detailed description of one or more embodiments. Reference will be made to the appended sheets of drawings that will first be described briefly.
Aspects of the disclosure and their advantages can be better understood with reference to the following drawings and the detailed description that follows. It should be appreciated that like reference numerals are used to identify like elements illustrated in one or more of the figures, where showings therein are for purposes of illustrating embodiments of the present disclosure and not for purposes of limiting the same. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure.
The present disclosure illustrates traffic infrastructure systems and methods with improved object detection and tracking. In various embodiments, a traffic infrastructure system includes an image capture component configured with an image sensor (e.g., a visual image sensor or a thermal image sensor) to capture video or images of a traffic scene and/or one or more other sensors. The system is configured with a trained embedded deep learning-based object detector for each sensor, allowing the traffic infrastructure system to acquire the locations of all the objects in the image. These objects may include different types of vehicles, pedestrians, cyclists and/or other objects. The deep learning object detector may provide a bounding box around each object, defined in image coordinates, and these image coordinates are transformed to Cartesian camera-centered world coordinates using each of the sensors' intrinsic parameters and the device's extrinsic parameters.
Other sensor data may be transformed in a similar manner. For example, the traffic infrastructure system may include a radar sensor configured to detect objects by transmitting radio waves and receiving reflections. The radar sensor can acquire the distance and angle from the object to the sensor which is defined in polar coordinates. These polar coordinates can also be transformed to Cartesian camera-centered world coordinates.
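By way of non-limiting illustration, the sketch below shows one way such a polar-to-Cartesian conversion may be performed, assuming the radar reports a range and azimuth angle and that a planar rotation and translation relating the radar frame to the camera-centered world frame are known from an offline calibration; the function and parameter names are illustrative only.

```python
import numpy as np

def radar_to_camera_world(range_m, azimuth_rad, R_radar_to_cam, t_radar_to_cam):
    """Convert a radar detection given as (range, azimuth) in the radar's polar
    frame into a point in the camera-centered Cartesian world frame.

    R_radar_to_cam (2x2 rotation) and t_radar_to_cam (2-vector translation) are
    assumed to come from an offline calibration of the two sensors."""
    # Polar -> Cartesian in the radar's own ground-plane frame
    p_radar = np.array([range_m * np.cos(azimuth_rad),
                        range_m * np.sin(azimuth_rad)])
    # Rigid transform into the camera-centered world frame
    return R_radar_to_cam @ p_radar + t_radar_to_cam
```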
In various embodiments, the traffic infrastructure system transforms the coordinates of sensed objects to the camera-centered world coordinate system, which allows the tracking system to be abstracted from whichever sensor is being used. Physically-based logic is then used in the tracking system and objects are modeled in a traffic scene based on real-life fundamentals. Various objects from the different types of sensors can be matched together based on distances in the camera-centered world coordinate system. The tracking system combines the various sensor acquired object coordinates to track the objects.
After a new object is acquired and has been tracked for a short distance, the tracking system may initiate a Kalman Filter (e.g., an unscented Kalman Filter) to start predicting and filtering out expected noise from each sensor. The Kalman Filter models the location, speed and heading of tracked objects. This also allows the traffic infrastructure system to keep predicting the trajectory of objects while the acquisition sensors have temporarily lost sight of the object. This can happen due to failures in the sensors, failure in the object localization algorithms or occlusions of objects, for example.
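As a simplified illustration of this prediction and filtering step, the following sketch uses a linear constant-velocity Kalman filter over a two-dimensional world-coordinate position; an unscented Kalman Filter as mentioned above would replace the linear predict/update equations with sigma-point propagation. The frame interval and noise values shown are placeholders.

```python
import numpy as np

class ConstantVelocityKF:
    """Tracks state [x, y, vx, vy] in camera-centered world coordinates.
    Speed and heading follow directly from (vx, vy)."""

    def __init__(self, x, y, dt=0.1, q=0.5, r=1.0):
        self.x = np.array([x, y, 0.0, 0.0], dtype=float)  # state estimate
        self.P = np.eye(4) * 10.0                          # state covariance
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)     # constant-velocity motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)     # only position is observed
        self.Q = np.eye(4) * q                             # process noise (placeholder)
        self.R = np.eye(2) * r                             # sensor noise (placeholder)

    def predict(self):
        # Called every frame, even when all sensors have lost the object,
        # so the trajectory keeps advancing through an occlusion.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        # Called when a sensor provides a new world-coordinate location z = (x, y).
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```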
Next, the traffic infrastructure system transforms the locations, which are two-dimensional points in the coordinate system, to fully 3D objects. The volume of the object and the ground plane of the object are estimated. This estimation is possible, for example, because the trajectory and heading of the object are known, along with the angle at which the object is seen from the device's standpoint. The tracking system provides the 3D objects in the world coordinate system to an application that uses object location information, such as vehicle presence detection at intersections, crossing pedestrian detection, counting and classification of vehicles, and other applications. The use of the 3D objects in the world coordinate system also greatly simplifies those applications because they do not have to include occlusion handling or noise reduction mechanisms themselves.
In various embodiments disclosed herein, tracking systems are described that are inherently capable of handling multiple sensor inputs, where data from any specific sensor is abstracted by transformation to world coordinates. These tracking systems are capable of predicting and handling occlusions to keep track of the location of objects even if all sensors have lost sight of the object. These tracking systems are also able to estimate the real object volume in the world (e.g., width, height, length).
Referring to
The thermal sensor 120 may include a thermal image capture device (e.g., a thermal camera) configured to capture thermal images of the scene. The captured thermal images are provided to an object localization algorithm 122, which may include a deep learning model trained to identify one or more objects within a captured thermal image. The object locations within the captured thermal images are transformed to world coordinates through a transformation algorithm 124.
The radar sensor 130 may include a transmitter configured to produce electromagnetic pulses and a receiver configured to receive reflections of the electromagnetic pulses off of objects in the location of the scene. The captured radar data is provided to an object localization algorithm 132, which may include a background learning algorithm that detects movement in the captured data and/or a deep learning model trained to identify one or more objects within the radar data. The object locations within the captured radar data are transformed to world coordinates through a transformation algorithm 134.
World coordinates of the objects detected by the various sensors 110, 120 and 130 are provided to a distance matching algorithm 140. The distance matching algorithm 140 matches objects detected by one or more sensors based on location and provides the synthesized object information to an object tracking system 152 that is configured to track detected objects using world coordinates. A Kalman Filter 150 (e.g., an unscented Kalman filter) is used to provide a prediction of location based on historic data and the previous three-dimensional location of the object. An occlusion prediction and handling algorithm 154 may also be used to track objects that are occluded from detection by one or more sensors. Finally, the tracked objects are transformed to three-dimensional object representations (e.g., a 3D bounding box having a length, width and height in the world coordinates) through a 3D object transformation process 160.
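One non-limiting way to realize distance matching such as that performed by the distance matching algorithm 140 is a greedy nearest-neighbor association in world coordinates with a gating threshold, as sketched below; unmatched detections may then be used to start new tracks. The gate value shown is a placeholder.

```python
import numpy as np

def match_detections_to_tracks(detections, tracks, gate_m=3.0):
    """detections, tracks: lists of (x, y) world-coordinate points.
    Returns (matches, unmatched_detections); matches are (det_idx, track_idx) pairs."""
    matches, matched_dets, used_tracks = [], set(), set()
    # Consider the closest pairs first so each track is claimed by its nearest detection.
    pairs = sorted(
        ((np.hypot(d[0] - t[0], d[1] - t[1]), di, ti)
         for di, d in enumerate(detections)
         for ti, t in enumerate(tracks)),
        key=lambda p: p[0])
    for dist, di, ti in pairs:
        if dist > gate_m:
            break  # all remaining pairs are farther than the gate
        if di in matched_dets or ti in used_tracks:
            continue
        matches.append((di, ti))
        matched_dets.add(di)
        used_tracks.add(ti)
    unmatched = [di for di in range(len(detections)) if di not in matched_dets]
    return matches, unmatched
```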
Referring to
Referring to
The camera intrinsic parameters 420 may include information describing the configuration of the camera, such as a focal length, sensor format and principal point. The extrinsic parameters may include camera height, tilt angle and pan angle. In various embodiments, the tracking system tracks a single point location for each object (e.g., the center bottom point of the bounding box). It is observed that this point is likely to be the back or front of an object that is located on the ground plane.
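For illustration, the sketch below projects the center bottom point of a bounding box onto the ground plane using intrinsic and extrinsic parameters of the kind described above, under a flat-road, zero-pan assumption (a pan angle would add a further rotation about the vertical axis); the function and parameter names are illustrative only.

```python
import numpy as np

def image_point_to_ground(u, v, fx, fy, cx, cy, cam_height, tilt_rad):
    """Project the bottom-center pixel (u, v) of a bounding box onto the ground
    plane (Z = 0) of a camera-centered world frame, assuming zero pan and a flat
    road surface. Returns (X, Y) in the same units as cam_height.
    fx, fy, cx, cy are camera intrinsics; tilt_rad is the downward tilt angle."""
    # Viewing ray in camera coordinates (x right, y down, z forward)
    d_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    # Rotate the ray into the world frame (X right, Y forward, Z up)
    t = tilt_rad
    R = np.array([[1.0,  0.0,        0.0       ],
                  [0.0, -np.sin(t),  np.cos(t) ],
                  [0.0, -np.cos(t), -np.sin(t) ]])
    d_world = R @ d_cam
    if d_world[2] >= 0:
        raise ValueError("ray does not intersect the ground plane")
    # Intersect the ray from the camera at (0, 0, cam_height) with Z = 0
    s = -cam_height / d_world[2]
    return s * d_world[0], s * d_world[1]
```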
Referring to
In some embodiments, the tracking system decides between multiple candidates for locations of objects from the multiple sensors (e.g., sensors 512, 514 and 516), along with the predicted locations 554 based on historic data (e.g., from a Kalman Filter) and the previous 3D object location 552 (e.g., ground plane of the object and volume). The system determines the best candidate for a new updated location of the object based on the available data. The best candidate is decided based on a combination of the real-world distances between the newly acquired locations and the predicted location, also taking into account the confidence values of the candidate locations. If a new candidate location does not fit the criteria, the tracking system will start tracking this candidate location as a new object 522. It is also taken into account that, based on the physical volume of the already tracked 3D objects, it should not be possible for objects to overlap in the real world.
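The sketch below illustrates one possible way to combine distance to the predicted location with detector confidence when selecting the best candidate; the specific weighting and gate value are placeholder heuristics rather than a required formula.

```python
import numpy as np

def select_best_candidate(predicted_xy, candidates, gate_m=3.0):
    """candidates: list of dicts with a world-coordinate 'xy' and a detector
    'confidence' in [0, 1]. Returns the best candidate, or None if no candidate
    falls within the gate, in which case the caller may track rejected
    candidates as new objects."""
    best, best_score = None, float("inf")
    for c in candidates:
        dist = float(np.hypot(c["xy"][0] - predicted_xy[0],
                              c["xy"][1] - predicted_xy[1]))
        if dist > gate_m:
            continue
        # Closer and more confident candidates score lower (better); placeholder weighting.
        score = dist * (1.0 - 0.5 * c["confidence"])
        if score < best_score:
            best, best_score = c, score
    return best
```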
Referring to
Embodiments of the occlusion and prediction handling will now be described with reference to
If an occlusion is likely, the particular object may have no new candidates from any of the sensors. The traffic monitoring system may handle the occlusion, for example, by using the predictions from the Kalman Filter. If the first object moves away, it is expected that the object that was occluded will become visible again, and the tracking system will expect new candidates from the sensors to keep this object ‘alive’.
In the second example (illustrated in image sequence (e) through (h)), the first object 750 and second object 760 are both driving in the same lane. In this case, the occlusion area 770 depends at least in part on the height of the first object 750. Taking this into account, it can be calculated how close the second object 760 needs to get behind the first object 750 to be occluded. In this case, the second object 760 might no longer be visible to any of the sensors, but the tracking system knows it is still there behind the first object 750 as long as no sensor detects it again.
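Under a flat-ground assumption, this same-lane case reduces to similar triangles: the following object is fully hidden while its top stays below the sight line that grazes the top of the first object. The sketch below captures that calculation; the geometry is simplified and all names are illustrative. For example, with a camera mounted 6 m high, a 3 m tall leading vehicle 20 m away fully hides a 1.5 m tall following vehicle while the latter is between roughly 20 m and 30 m from the camera.

```python
def is_occluded(cam_height, occluder_dist, occluder_height,
                target_dist, target_height):
    """Flat-ground, same-lane sketch: the leading vehicle at horizontal distance
    `occluder_dist` with height `occluder_height` casts an occlusion 'shadow'
    away from the camera. The following vehicle is fully hidden while its top
    stays below the sight line grazing the occluder's top."""
    if target_dist <= occluder_dist:
        return False  # the target is in front of (or beside) the occluder
    # Height of the grazing sight line at the target's distance (similar triangles)
    sight_line_height = cam_height - (cam_height - occluder_height) * (
        target_dist / occluder_dist)
    return target_height < sight_line_height
```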
Referring to
In one embodiment, the first step is to define the initial size of the object, and therefore the ground plane of the object, in the world, where the original tracked point is the center bottom point of the object bounding box (step 3 in the example). The initial size of the object is chosen based on the class type of the object originally determined by the CNNs on the image sensors or by the radar sensor. After that, the ground plane of the object is rotated based on the previous trajectory and heading of the object (step 4 in the example). By projecting this ground plane of the object back to the original image sensor, the rotated ground plane will correspond to a new projected center bottom point of the projected bounding box (step 5 in the example: new bounding box and dot). The translation between the original point and the newly projected point is calculated. This translation is then applied in the opposite direction to compensate for the angle of view as seen from the camera position, which then corresponds with the real ground plane of the object (step 6 in the example).
The width and height of the object are determined based on the class type determined by the CNNs on the image sensors and the radar sensor. However, the real length of the object can be estimated more accurately if input from the image sensors is available. The original bounding box determined by the CNN can be used to calculate the real length: by projecting the 3D object back to the image plane and comparing it with the original bounding box, the length of the 3D object can be extended or shortened accordingly.
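As a simplified sketch of the footprint construction, the example below orients a class-dependent footprint around the tracked point using the object's heading. The per-class dimensions are placeholders, and the additional translation that compensates for the camera's viewing angle (steps 5 and 6 above) and the reprojection-based length refinement are omitted for brevity.

```python
import numpy as np

# Nominal object dimensions (length, width, height) in metres per class;
# these values are placeholders chosen only to illustrate the idea.
CLASS_DIMENSIONS = {"car": (4.5, 1.8, 1.5), "truck": (12.0, 2.5, 3.8),
                    "bicycle": (1.8, 0.6, 1.6), "pedestrian": (0.5, 0.5, 1.7)}

def ground_plane_corners(center_xy, heading_rad, class_name):
    """Return the four ground-plane corners of an object's 3D box, given a
    tracked ground-plane point, heading (direction of travel), and class type."""
    length, width, _ = CLASS_DIMENSIONS[class_name]
    c, s = np.cos(heading_rad), np.sin(heading_rad)
    R = np.array([[c, -s], [s, c]])          # rotate the footprint to the heading
    half = np.array([[ length / 2,  width / 2],
                     [ length / 2, -width / 2],
                     [-length / 2, -width / 2],
                     [-length / 2,  width / 2]])
    return (R @ half.T).T + np.asarray(center_xy, dtype=float)
```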
An example image 900 from a tracking system working with a thermal image sensor is illustrated in
Referring to
The image capture components 1130 are configured to capture images of a field of view 1131 of a traffic location (e.g., scene 1134 depicting a monitored traffic region). The image capture components 1130 may include infrared imaging (e.g., thermal imaging), visible spectrum imaging, and/or other imaging components. In some embodiments, the image capture components 1130 include an image object detection subsystem 1138 configured to process captured images in real-time to identify desired objects such as vehicles, bicycles, pedestrians and/or other objects. In some embodiments, the image object detection subsystem 1138 can be configured through a web browser interface and/or software which is installed on a client device (e.g., remote client device 1174 with interface 1176 and/or another system communicably coupled to the image capture components 1130). The configuration may include defined detection zones 1136 within the scene 1134. When an object passes into a detection zone 1136, the image object detection subsystem 1138 detects and classifies the object. In a traffic monitoring system, the system may be configured to determine if an object is a pedestrian, bicycle or vehicle. If the object is a vehicle or other object of interest, further analysis may be performed on the object to determine a further classification of the object (e.g., vehicle type) based on shape, height, width, thermal properties and/or other detected characteristics.
In various embodiments, the image capture components 1130 include one or more image sensors 1132, which may include visible light, infrared, or other imaging sensors. The image object detection subsystem 1138 includes at least one object localization module 1138a and at least one coordinate transformation module 1138b. The object localization module 1138a is configured to detect an object and define a bounding box around the object. In some embodiments, the object localization module 1138a includes a trained neural network configured to output an identification of detected objects and associated bounding boxes, a classification for each detected object, and a confidence level for the classification. The coordinate transformation module 1138b transforms the image coordinates of each bounding box to real-world coordinates associated with the imaging device. In some embodiments, the image capture components include multiple cameras (e.g., a visible light camera and a thermal imaging camera) and corresponding object localization and coordinate transform modules.
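By way of illustration, the information passed from the object localization module 1138a to the coordinate transformation module 1138b may be represented by a simple record such as the following; the field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """Illustrative record produced by an object localization module and
    consumed by a coordinate transformation module (field names hypothetical)."""
    bbox: tuple            # (u_min, v_min, u_max, v_max) in image pixels
    class_name: str        # e.g. "car", "truck", "bicycle", "pedestrian"
    confidence: float      # classification confidence in [0, 1]
    world_xy: tuple = None # filled in after ground-plane projection
```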
In various embodiments, the radar components 1140 include one or more radar sensors 1142 for generating radar data associated with all or part of the scene 1134. The radar components 1140 may include a radar transmitter, radar receiver, antenna and other components of a radar system. The radar components 1140 further include a radar object detection subsystem 1148 configured to process the radar data for use by other components of the traffic control system. In various embodiments, the radar object detection subsystem 1148 includes at least one object localization module 1148a and at least one coordinate transformation module 1148b. The object localization module 1148a is configured to detect objects in the radar data and identify a location of the object with reference to the radar receiver. In some embodiments, the object localization module 1148a includes a trained neural network configured to output an identification of detected objects and associated location information, a classification for each detected object and/or object information (e.g., size of an object), and a confidence level for the classification. The coordinate transformation module 1148b transforms the radar data to real-world coordinates associated with the image capture device (or another sensor system).
In various embodiments, the local monitoring and control components 1110 further include other sensor components 1150, which may provide feedback from other types of traffic sensors (e.g., a roadway loop sensor) and/or object sensors, which may include wireless systems, sonar systems, LiDAR systems, and/or other sensors and sensor systems. The other sensor components 1150 include local sensors 1152 for sensing traffic-related phenomena and generating associated data, and an associated sensor object detection subsystem 1158. The sensor object detection subsystem 1158 includes an object localization module 1158a, which may include a neural network configured to detect objects in the sensor data and output location information (e.g., a bounding box around a detected object), and a coordinate transformation module 1158b to transform the sensor data locations to real-world coordinates associated with the image capture device (or other sensor system).
In some embodiments, the various sensor systems 1130, 1140 and 1150 are communicably coupled to the computing components 1120 and/or the traffic control system 1112 (such as an intersection controller). The computing components 1120 are configured to provide additional processing and facilitate communications between various components of the intelligent traffic system 1100. The computing components 1120 may include processing components 1122, communication components 1124 and a memory 1126, which may include program instructions for execution by the processing components 1122. For example, the computing components 1120 may be configured to process data received from the image capture components 1130, radar components 1140, and other sensing components 1150. The computing components 1120 may be configured to communicate with a cloud analytics platform 1160 or another networked server or system (e.g., remote local monitoring systems 1172) to transmit local data for further processing. The computing components 1120 may be further configured to receive processed traffic data associated with the scene 1134, traffic control system 1112, and/or other traffic control systems and local monitoring systems in the region. The computing components 1120 may be further configured to generate and/or receive traffic control signals for controlling the traffic control system 1112.
The computing components 1120 and other local monitoring and control components 1110 may be configured to combine local detection of pedestrians, cyclists, vehicles and other objects for input to the traffic control system 1112 with data collection that can be sent in real-time to a remote processing system (e.g., the cloud 1170) for analysis and integration into larger system operations.
In various embodiments, the memory 1126 stores program instructions to cause the processing components 1122 to perform the processes disclosed herein with reference to
Where applicable, various embodiments provided by the present disclosure can be implemented using hardware, software, or combinations of hardware and software. Also, where applicable, the various hardware components and/or software components set forth herein can be combined into composite components comprising software, hardware, and/or both without departing from the spirit of the present disclosure. Where applicable, the various hardware components and/or software components set forth herein can be separated into sub-components comprising software, hardware, or both without departing from the spirit of the present disclosure.
Software in accordance with the present disclosure, such as non-transitory instructions, program code, and/or data, can be stored on one or more non-transitory machine-readable mediums. It is also contemplated that software identified herein can be implemented using one or more general purpose or specific purpose computers and/or computer systems, networked and/or otherwise. Where applicable, the ordering of various steps described herein can be changed, combined into composite steps, and/or separated into sub-steps to provide features described herein. Embodiments described above illustrate but do not limit the invention. It should also be understood that numerous modifications and variations are possible in accordance with the principles of the invention. Accordingly, the scope of the invention is defined only by the following claims.
This application is a continuation of International Patent Application No. PCT/US2021/023324 filed Mar. 19, 2021 and entitled “MULTI-SENSOR OCCLUSION-AWARE TRACKING OF OBJECTS IN TRAFFIC MONITORING SYSTEMS AND METHODS,” which claims the benefit of and priority to U.S. Provisional Patent Application No. 62/994,709 filed Mar. 25, 2020 and entitled “MULTI-SENSOR OCCLUSION-AWARE TRACKING OF OBJECTS IN TRAFFIC MONITORING SYSTEMS AND METHODS,” all of which are incorporated herein by reference in their entirety.
Provisional Applications

Number | Date | Country
---|---|---
62994709 | Mar 2020 | US

Continuation Data

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/US2021/023324 | Mar 2021 | US
Child | 17948124 | | US