The present disclosure relates to techniques for extracting kinematic variables of ground vehicles and generating labelled data sets for machine learning algorithms from images taken by an unmanned aerial vehicle (UAV).
Unmanned aerial vehicles (UAVs), also referred to as drones, have attracted increasing attention from researchers in traffic monitoring and management due to their mobility and low cost. Unlike fixed-location sensors, such as cameras, radars, and loop detectors installed in the infrastructure, which can only collect data from a specific perspective or location and often yield inaccurate and/or aggregated data, UAVs with onboard sensors serve as mobile sensors in modern traffic networks and can be used to track individual vehicles accurately. UAVs can even be coordinated to collect large-scale traffic information to monitor traffic and study congestion. While UAVs can carry different types of sensors, such as video cameras, thermal cameras, infrared cameras, LIDAR, and radar, high-resolution video cameras are among the most popular sensors for traffic monitoring. UAV-based video collection is also very cost effective compared to collecting vehicle data via probe vehicles equipped with high-precision GPS, which is costly and only allows the collection of a few vehicle trajectories.
Current research has focused on using UAV-obtained videos for vehicle detection, classification, and tracking, in order to monitor and analyze traffic, study traffic flow, and calibrate traffic simulation, etc. The data obtained are typically vehicle types and counts, vehicle positions, vehicle speeds, and flow speeds. In these scenarios, vehicles are treated as points from the vehicle dynamics perspective, since their orientation is ignored.
In the coming era of connected and automated vehicles and intelligent transportation systems, knowing the dynamics of the ego vehicle and surrounding vehicles is crucial for vehicle controller design and for providing a safer environment for all road users. Instead of on-board sensors of limited range and fixed-location sensors with inaccurate and/or aggregated data, a UAV is used in this disclosure to extract kinematic variables for ground vehicles.
This section provides background information related to the present disclosure which is not necessarily prior art.
This section provides a general summary of the disclosure, and is not a comprehensive disclosure of its full scope or all of its features.
In one aspect, a method is presented for extracting kinematic data for vehicles using an unmanned aerial vehicle. The method includes: defining a region of interest on the ground; providing a background image of the region of interest; capturing, by a camera, a series of images over time of the region of interest from a perspective above the region of interest; from the series of images, detecting at least one vehicle moving in the region of interest; for each image in the series of images, fitting a bounding box to the at least one moving vehicle, where the bounding box surrounds the at least one moving vehicle and the size of the bounding box is the same across the series of images; and determining kinematic data for the at least one moving vehicle using the bounding boxes, where the kinematic data includes yaw angle for the at least one moving vehicle.
In another aspect, a method is presented for detecting a vehicle passing through a region of interest. The method includes: capturing, by a top view camera, a first set of images of a region of interest on the ground from a perspective above the region of interest; from the first set of images, creating a plurality of bounding boxes for each vehicle moving in the region of interest, and extracting kinematic data for each vehicle moving in the region of interest using the plurality of bounding boxes; capturing, by a side view camera, a second set of images of the region of interest from a perspective on a side of the region of interest; projecting the plurality of bounding boxes to the viewpoint of the side view camera; and training a machine learning algorithm to detect moving vehicles in images captured by the side view camera, where the machine learning algorithm is trained using the second set of images and a ground truth, and the kinematic data and the plurality of bounding boxes projected to the viewpoint of the side view camera serve as the ground truth.
Further areas of applicability will become apparent from the description provided herein. The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure.
Corresponding reference numerals indicate corresponding parts throughout the several views of the drawings.
Example embodiments will now be described more fully with reference to the accompanying drawings.
In one aspect, a method is presented for extracting kinematic data for vehicles as depicted in
A series of images (i.e., a video) are captured over time by a camera at 12, where each image is of the region of interest from a perspective above the region of interest. For example, the series of images may be captured using an unmanned aerial vehicle, where the unmanned aerial vehicle is equipped with the camera. Prior to object detection, image stabilization is performed as shown in
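By way of non-limiting illustration, such stabilization can be performed by matching image features between each frame and a reference frame and warping the frame with the estimated homography. The Python/OpenCV sketch below is only one possible realization; its feature counts, thresholds, and function names are illustrative assumptions rather than part of the disclosed method.

```python
import cv2
import numpy as np

def stabilize_frame(frame, reference, detector=None):
    """Warp `frame` onto `reference` using ORB feature matches and a homography.

    A minimal sketch; feature counts and thresholds are illustrative assumptions.
    """
    detector = detector or cv2.ORB_create(nfeatures=2000)
    gray_f = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)

    kp_f, des_f = detector.detectAndCompute(gray_f, None)
    kp_r, des_r = detector.detectAndCompute(gray_r, None)

    # Match descriptors and keep the strongest correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_f, des_r), key=lambda m: m.distance)[:200]

    src = np.float32([kp_f[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_r[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Robustly estimate the frame-to-reference homography and warp the frame.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = reference.shape[:2]
    return cv2.warpPerspective(frame, H, (w, h))
```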
From the series of images, at least one vehicle moving in the region of interest is detected at 13. Moving objects can be detected by comparing each frame in the series of images with the background image as shown in
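By way of non-limiting illustration, the frame-versus-background comparison with morphological cleanup can be sketched in Python/OpenCV as follows (the disclosure itself refers to Matlab morphological operations; the threshold and kernel size below are illustrative assumptions):

```python
import cv2

def detect_moving_regions(frame, background, diff_thresh=30, kernel_size=5):
    """Return contours of regions that differ from the background image.

    A minimal sketch of frame-vs-background differencing with morphological
    cleanup; the threshold and kernel size are assumptions, not disclosed values.
    """
    diff = cv2.absdiff(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(background, cv2.COLOR_BGR2GRAY))
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)

    # Morphological opening removes small noise; closing fills holes inside vehicles.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return contours
```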
For each image in the series of images, a bounding box is fit at 14 to each of the detected moving objects. With reference to
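By way of non-limiting illustration, one possible way to impose a bounding box of constant size across frames is to fit a rotated rectangle to each detected region and then keep only its center and orientation while substituting fixed, known vehicle dimensions. The sketch below, including its fixed length/width parameters and its orientation heuristic, is an illustrative assumption rather than the disclosed procedure.

```python
import cv2
import numpy as np

def fit_fixed_size_box(contour, length_px, width_px):
    """Fit a rotated rectangle to a contour, then impose fixed box dimensions.

    A minimal sketch: only the detected center and orientation are retained,
    so the box size stays constant across the series of images.
    """
    (cx, cy), (w, h), angle = cv2.minAreaRect(contour)

    # If the first side is the short one, rotate the reported angle by 90 degrees
    # so it refers to the blob's long axis (a simple heuristic).
    if w < h:
        angle += 90.0

    fixed_rect = ((cx, cy), (length_px, width_px), angle)
    corners = cv2.boxPoints(fixed_rect)            # 4 corners of the oriented box
    return np.int32(corners), (cx, cy), np.deg2rad(angle)
```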
Finally, kinematic data for the moving objects can be determined at 15 using the bounding boxes. The position and the yaw angle of a moving object can be obtained directly from the bounding box as described below. Taking a truck as an example, the bicycle model is applied to the truck as shown in
Considering the vehicle as a rigid body, the yaw rate is the same at any point of the body, but the velocity and heading angle differ from point to point. The position $(x_C, y_C)$ of point C and the yaw angle ψ can be obtained directly from the bounding box after converting the image coordinates to ground coordinates. With the distance between point T and center point C measured as $d_{CT} = 3.49$ m, and the distance between center point C and rear-axle center point R measured as $d_{CR} = 2.21$ m, the positions of point T and point R can be calculated as shown below.
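Assuming point T lies a distance $d_{CT}$ ahead of center point C along the vehicle's heading and rear-axle center point R lies $d_{CR}$ behind it, with the yaw angle ψ measured in the ground coordinate frame, these positions take the form:

$$x_T = x_C + d_{CT}\cos\psi, \qquad y_T = y_C + d_{CT}\sin\psi,$$
$$x_R = x_C - d_{CR}\cos\psi, \qquad y_R = y_C - d_{CR}\sin\psi.$$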
Knowing the position information at each time step, one can estimate the velocity $\mathbf{v}_C$ and its magnitude, the speed $v_C$, using the distance traveled divided by the time difference between consecutive frames:

$$\mathbf{v}_C = \begin{bmatrix} \Delta x_C/\Delta t \\ \Delta y_C/\Delta t \end{bmatrix}, \qquad v_C = \frac{\sqrt{\Delta x_C^2 + \Delta y_C^2}}{\Delta t},$$

where $\Delta x_C$ and $\Delta y_C$ are the changes of the x and y coordinates between consecutive frames and $\Delta t$ is the time difference between them.
That is, the speed of the center point of the bounding box is calculated by determining the distance between the center points of two consecutive bounding boxes and dividing that distance by the time difference between consecutive frames. The velocity and the speed of points T and R can be determined similarly. A direct division, however, can cause large errors: one pixel in the image corresponds to about 0.0275 m in reality, and the time difference is Δt = 1/60 s (with a frame rate of 60 fps), so even an error of one pixel can lead to a speed error of 0.0275 × 60 = 1.65 m/s. Thus, the direct speed calculation is smoothed in the plots.
The yaw rate $\dot{\psi}$ can be calculated in a similar way using the yaw angle of the vehicle. The smoothed curves of speed and yaw rate and their original data (position/yaw angle difference divided by time difference) are plotted in
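By way of non-limiting illustration, the finite-difference calculation and moving-average smoothing described above can be sketched as follows (the ⅓ s window mirrors the smoothing used in the comparison below; the function name and default values are illustrative assumptions):

```python
import numpy as np

def speeds_and_yaw_rate(x, y, psi, fps=60.0, window_s=1.0 / 3.0):
    """Estimate speed and yaw rate from per-frame position (m) and yaw angle (rad).

    A minimal sketch: frame-to-frame differences divided by the frame period,
    then smoothed with a moving average to suppress pixel-quantization noise.
    """
    dt = 1.0 / fps
    speed = np.hypot(np.diff(x), np.diff(y)) / dt     # m/s between consecutive frames
    # Unwrap the yaw angle so +/- pi crossings do not create spikes.
    yaw_rate = np.diff(np.unwrap(psi)) / dt           # rad/s between consecutive frames

    n = max(1, int(round(window_s * fps)))            # smoothing window in frames
    kernel = np.ones(n) / n
    return (np.convolve(speed, kernel, mode="same"),
            np.convolve(yaw_rate, kernel, mode="same"))
```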
As proof of concept, the detection results from the drone view video (60 Hz) are compared with the GPS data (10 Hz). The GPS antenna is installed at point T of the truck (see
The curve is smoothed using ⅓ s of data points and is the same as the green curve in
For the other run of the experiment, the GPS position in
Despite the drift of the GPS data, the drone video processing results in general fit well with the GPS data, while the video processing gives smoother position and yaw angle (or heading angle) data and provides richer information about the vehicle dynamics (heading angles and velocity components at different points, as well as yaw rate) compared to the GPS device, which only gives information for a single point.
In another aspect of this disclosure, a method is presented for detecting a vehicle passing through a region of interest as depicted in
From the first set of images, kinematic data for each vehicle moving in the region of interest is extracted at 72. The kinematic data for each vehicle can include one or more of position, yaw angle, velocity and yaw rate. The kinematic data is preferably extracted using bounding boxes in accordance with the techniques described above, where a bounding box surrounds a moving vehicle captured in an image.
Next, a second set of images of the region of interest is captured at 73 from a perspective on a side of the region of interest. In one example, the second set of images is captured using a second (side view) camera at a fixed location adjacent to the region of interest.
Image data from the first set of images is then projected at 74 to the viewpoint of the second (side view) camera. Rather than projecting all of the image data, only the bounding boxes are projected to the viewpoint of the side view camera. In an example embodiment, the bounding boxes are projected to the viewpoint of the side view camera using homography, as will be further described below.
To detect vehicles passing through the region of interest, a machine learning algorithm, such as the YOLO (You Only Look Once) object detection algorithm, can be trained (or re-trained) at 75 to detect moving vehicles using the second set of images and a ground truth. In this example, the ground truth is formed by the extracted kinematic data and the bounding boxes projected to the viewpoint of the side view camera. In other words, the extracted kinematic data and the bounding boxes projected to the viewpoint of the side view camera serve as a labeled data set for training a machine learning algorithm. Other types of machine learning algorithms, including Convolutional Neural Networks and Recurrent Neural Networks, are contemplated by this disclosure.
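By way of non-limiting illustration, the projected bounding boxes can be written out as labeled training data. The sketch below assumes an axis-aligned YOLO-style label format (class, normalized center, width, and height), which is one reasonable choice but not mandated by the disclosure; the helper name is illustrative.

```python
import numpy as np

def quad_to_yolo_label(quad_px, image_w, image_h, class_id=0):
    """Convert a projected quadrilateral (4x2 pixel corners in the side view)
    into a YOLO-format line: class x_center y_center width height (all normalized).

    A minimal sketch; the axis-aligned box enclosing the quadrilateral is used.
    """
    quad = np.asarray(quad_px, dtype=float)
    x_min, y_min = quad.min(axis=0)
    x_max, y_max = quad.max(axis=0)

    xc = (x_min + x_max) / 2.0 / image_w
    yc = (y_min + y_max) / 2.0 / image_h
    w = (x_max - x_min) / image_w
    h = (y_max - y_min) / image_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```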
Once trained, the machine learning algorithm can be used to detect vehicles passing through the region of interest. To do so, image data from the side view camera can be input to the machine learning algorithm. It is envisioned that the extracted kinematic variables can be used for the management of intelligent infrastructure and with vehicle-to-everything (V2X) communication. The extracted kinematic variables can also be used for the motion planning of connected automated vehicles (CAVs). Note that the algorithms use UAVs for training data collection, and once the algorithm is trained, only the roadside camera is required to extract the kinematic variables.
Homography can be used to project a bounding box from the drone-view image to the fixed-camera side view. Instead of a rectangle as in the drone view, the bounding box becomes a quadrilateral in the fixed-camera side view. This mechanism enables an easy way to generate bounding boxes for a side-view camera, and together with the ground truth (position, speed, yaw angle, yaw rate, etc.), they can be utilized to generate training data (labeled data sets) for machine learning algorithms.
With reference to
This projection does not account for the distortion of the camera lens, and it preserves straight lines. Since the scaling factor w does not affect the ratio, one can assume, for example, h33 ≈ 1, which leaves eight unknowns in matrix H; thus four point correspondences are needed to complete the projection.
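Written in the standard homogeneous form assumed here (with w the scaling factor and h33 the bottom-right entry of H), the projection reads:

$$w\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix},$$

where (x, y) are coordinates in the ground plane (drone view) and (x′, y′) are the corresponding coordinates in the fixed-camera side view.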
Considering a drone about 250 feet above the ground, one can assume that the bounding box from the drone view represents the bounding box on the ground plane well. To map it from the ground plane (drone view) to the fixed-camera plane (i.e., side view), more than four points can be selected to reduce the estimation error. The problem is thus converted to finding the matrix H for the best-fit projected plane in the fixed-camera side view, which can be solved by the singular value decomposition method. Rewriting the H matrix as the vector

$$\mathbf{h} = [\,h_{11}\;\; h_{12}\;\; h_{13}\;\; h_{21}\;\; h_{22}\;\; h_{23}\;\; h_{31}\;\; h_{32}\;\; h_{33}\,]^T$$
and assuming the scale factor w = 1, equation (4) becomes a homogeneous linear system of the form

$$A\mathbf{h} = \mathbf{0},$$

where $A$ is a matrix assembled from the $n$ point correspondences and n is the number of points. The vector $\mathbf{h}$ can be obtained as the eigenvector corresponding to the least eigenvalue of $A^TA$, which is a direct result of the singular value decomposition of matrix $A$.
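By way of non-limiting illustration, this estimation and the projection of drone-view bounding-box corners into the side view can be sketched with NumPy's singular value decomposition. The standard direct linear transform construction of A is used below, so it may differ in detail from equation (4), and the function names are illustrative.

```python
import numpy as np

def estimate_homography(src_pts, dst_pts):
    """Estimate H mapping ground-plane (drone-view) points to side-view points.

    Direct linear transform sketch: build A from the correspondences, then take
    h as the right singular vector of A with the smallest singular value
    (equivalently, the eigenvector of A^T A with the least eigenvalue).
    """
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(rows, dtype=float)

    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                      # normalize so h33 = 1

def project_box(H, corners):
    """Project 4 drone-view box corners; the result is a quadrilateral in the side view."""
    pts = np.hstack([np.asarray(corners, float), np.ones((len(corners), 1))])
    mapped = (H @ pts.T).T
    return mapped[:, :2] / mapped[:, 2:3]   # divide by the scale factor w
```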
As shown in
Using images from unmanned aerial vehicles (UAVs), an algorithm is developed to track vehicles at an intersection and to generate labeled data sets for machine learning algorithms. Different from previous studies focusing on vehicle detection, classification, and tracking in traffic surveillance, this disclosure uses videos recorded from UAVs to extract and analyze the kinematic variables of vehicles, including position, yaw angle, heading angle, speed, and yaw rate. By comparing each video frame with a background image using Matlab morphological operations, the areas of change are identified. A finer investigation of these areas is then performed to draw the bounding boxes around the vehicles. The position and yaw angle can be read directly from the bounding boxes, while the speed and yaw rate are calculated accordingly.
The extracted kinematic data can be used in the control and planning of smart intersections, especially in environments with connected automated vehicles (CAVs), since knowing the dynamics of other vehicles can also benefit the CAVs in their decision making and path planning. With the obtained data and bounding boxes (from the side-view fixed camera) as the "ground truth," this algorithm can be used to generate training data for machine learning algorithms that can perform more complex tasks, for example, online tracking.
The techniques described herein may be implemented by one or more computer programs executed by one or more processors. The computer programs include processor-executable instructions that are stored on a non-transitory tangible computer readable medium. The computer programs may also include stored data. Non-limiting examples of the non-transitory tangible computer readable medium are nonvolatile memory, magnetic storage, and optical storage.
Some portions of the above description present the techniques described herein in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. These operations, while described functionally or logically, are understood to be implemented by computer programs. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules or by functional names, without loss of generality.
Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Certain aspects of the described techniques include process steps and instructions described herein in the form of an algorithm. It should be noted that the described process steps and instructions could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems.
The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer. Such a computer program may be stored in a tangible computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The algorithms and operations presented herein are not inherently related to any particular computer or other apparatus. Various systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatuses to perform the required method steps. The required structure for a variety of these systems will be apparent to those of skill in the art, along with equivalent variations. In addition, the present disclosure is not described with reference to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present disclosure as described herein.
The foregoing description of the embodiments has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure. Individual elements or features of a particular embodiment are generally not limited to that particular embodiment, but, where applicable, are interchangeable and can be used in a selected embodiment, even if not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.
This application claims the benefit and priority of U.S. Provisional Application No. 63/466,755 filed on May 16, 2023. The entire disclosure of the above application is incorporated herein by reference.