Vehicle Kinematic Data Extraction Using Unmanned Aerial Vehicles

Information

  • Patent Application
  • Publication Number
    20240386580
  • Date Filed
    May 15, 2024
  • Date Published
    November 21, 2024
Abstract
An algorithm is presented to extract kinematic variables of ground vehicles and to generate labelled data sets for machine learning algorithms from unmanned aerial vehicle (UAV) footage. In contrast to existing studies on vehicle detection and tracking, the output kinematic variables include not only position and speed, but also yaw angle, yaw rate, etc. In an intelligent transportation system, these parameters can be used in the planning and control of smart intersections and connected automated vehicles (CAVs). Compared to a GPS device, the proposed techniques provide smoother data and richer dynamics of the vehicles. The algorithm also generates oriented bounding boxes from the drone view (top view) images, and these bounding boxes are converted to the perspective of a fixed camera (roadside view) using homography. The kinematic data and the bounding boxes can serve as the ground truth for machine learning algorithms.
Description
FIELD

The present disclosure relates to techniques for extracting kinematic variables of ground vehicles and generating labelled data sets for machine learning algorithms from images taken by an unmanned aerial vehicle (UAV).


BACKGROUND

Unmanned aerial vehicles (UAVs), also referred to as drones, have attracted increasing attention from researchers in traffic monitoring and management due to their mobility and low cost. Unlike fixed-location sensors, such as cameras, radars, and loop detectors installed in the infrastructure, which can only collect data from a specific perspective or location and often yield inaccurate and/or aggregated data, UAVs with onboard sensors serve as mobile sensors in modern traffic networks and can be used to track individual vehicles accurately. UAVs can even be coordinated to collect large-scale traffic information to monitor traffic and study congestion. While there are different types of sensors UAVs can carry, such as video cameras, thermal cameras, infrared cameras, LIDAR, and radar, high-resolution video cameras are among the most popular sensors for traffic monitoring. This approach is very cost effective compared to collecting vehicle data via probe vehicles equipped with high-precision GPS, which is costly and only allows the collection of a few vehicle trajectories.


Current research has focused on using UAV-obtained videos for vehicle detection, classification, and tracking, in order to monitor and analyze traffic, study traffic flow, calibrate traffic simulations, etc. The data obtained are typically vehicle types and counts, vehicle positions, vehicle speeds, and flow speeds. In these scenarios, vehicles are treated as points from the vehicle dynamics perspective, since the orientation is ignored.


In the coming era of connected and automated vehicles and intelligent transportation systems, knowing the dynamics of the ego vehicle and surrounding vehicles is crucial for vehicle controller design and for providing a safer environment for all road users. Instead of on-board sensors of limited range and fixed-location sensors with inaccurate and/or aggregated data, a UAV is used in this disclosure to extract kinematic variables for ground vehicles.


This section provides background information related to the present disclosure which is not necessarily prior art.


SUMMARY

This section provides a general summary of the disclosure, and is not a comprehensive disclosure of its full scope or all of its features.


In one aspect, a method is presented for extracting kinematic data for vehicles using an unmanned aerial vehicle. The method includes: defining a region of interest on the ground; providing a background image of the region of interest; capturing, by a camera, a series of images over time of the region of interest from a perspective above the region of interest; from the series of images, detecting at least one vehicle moving in the region of interest; for each image in the series of images, fitting a bounding box to the at least one moving vehicle, where the bounding box surrounds the at least one moving vehicle and the size of the bounding box is the same across the series of images; and determining kinematic data for the at least one moving vehicle using the bounding boxes, where the kinematic data includes yaw angle for the at least one moving vehicle.


In another aspect, a method is presented for detecting a vehicle passing through a region of interest. The method includes: capturing, by a top view camera, a first set of images of a region of interest on the ground from a perspective above the region of interest; from the first set of images, creating a plurality of bounding boxes for each vehicle moving in the region of interest, and extracting kinematic data for each vehicle moving in the region of interest using the plurality of bounding boxes; capturing, by a side view camera, a second set of images of the region of interest from a perspective on a side of the region of interest; projecting the plurality of bounding boxes to the viewpoint of the side view camera; and training a machine learning algorithm to detect moving vehicles in images captured by the side view camera, where the machine learning algorithm is trained using the second set of images and a ground truth, and the kinematic data and the plurality of bounding boxes projected to the viewpoint of the side view camera serve as the ground truth.


Further areas of applicability will become apparent from the description provided herein. The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.





DRAWINGS

The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure.



FIG. 1 is a flowchart depicting a method for extracting kinematic data for a vehicle from image data.



FIG. 2 is a diagram of an example intersection at a test facility.



FIG. 3A shows image stabilization techniques in accordance with this disclosure.



FIG. 3B shows an object detection technique in accordance with this disclosure.



FIG. 3C shows a vehicle detection technique in accordance with this disclosure.



FIG. 3D shows kinematic data extracted from images in accordance with this disclosure.



FIG. 4 is a diagram of a bicycle model applied to a truck.



FIGS. 5A-5C are graphs showing speed, velocity components, and heading angles and yaw angle of a truck over time, respectively.



FIGS. 6A-6F are graphs comparing extracted kinematic data with GPS data.



FIG. 7 is a flowchart depicting a method for detecting a vehicle passing through a region of interest.



FIG. 8 is a diagram of a rectangle on the ground plane projected as a quadrilateral on the image plane of a camera.



FIGS. 9A and 9B are illustrations showing points selected in drone view (FIG. 9A) in relation to side view (FIG. 9B), where the circles are hand selected points and crosses are calculated from the drone view using matrix H for validation purposes.



FIG. 10 shows the transfer of a bounding box from the drone view to the side view using homography.





Corresponding reference numerals indicate corresponding parts throughout the several views of the drawings.


DETAILED DESCRIPTION

Example embodiments will now be described more fully with reference to the accompanying drawings.


In one aspect, a method is presented for extracting kinematic data for vehicles as depicted in FIG. 1. As a starting point, a region of interest (ROI) is defined as indicated at 11. For demonstration purposes, an experiment was carried out at the Mcity Test Facility at the University of Michigan, Ann Arbor. As shown in FIG. 2, the intersection of State Street (north-south) and Main Street (east-west) was chosen for the experiment and designated the region of interest. The experiment is designed as follows. A 26-ft moving truck (Ford F-650) moves towards the intersection from the westbound approach of Main Street, then makes a left turn onto State Street, while a stationary personal vehicle stays at the eastbound approach of the intersection with its camera facing forward. The truck has an onboard GPS with a 10 Hz update rate installed on top of the cabin. Three other vehicles are parked at the intersection to imitate a real environment. A DJI Phantom 4 Pro drone equipped with a video camera capable of recording at 60 frames per second (fps) hovers above the intersection at a height of about 250 feet. The camera faces down at the intersection in order to record the movements of the vehicles. The camera has 20 megapixels of effective resolution and a 3-axis gimbal for stabilization. The movement of the truck is captured by both the drone camera (top view) and the stationary vehicle's camera (side view).


A series of images (i.e., a video) is captured over time by a camera at 12, where each image is of the region of interest from a perspective above the region of interest. For example, the series of images may be captured using an unmanned aerial vehicle, where the unmanned aerial vehicle is equipped with the camera. Prior to object detection, image stabilization is performed as shown in FIG. 3A. In particular, a background image of the intersection is selected from the series of images when the target vehicles are not present, and this image defines the region of interest (ROI) onto which each frame of the video is mapped. The use of the background image not only reduces the computation time by eliminating the unnecessary parts of the images, but also stabilizes the video obtained from the hovering drone.
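
The disclosure does not specify how each frame is mapped onto the background ROI. As one possible (assumed) realization, each frame can be registered to the background image with an ECC-based alignment in Python with OpenCV; the Euclidean motion model and termination criteria below are illustrative choices, not values from the experiment.

```python
import cv2
import numpy as np

def stabilize_frame(frame_gray, background_gray):
    """Align one grayscale video frame to the background ROI image.

    A minimal sketch using OpenCV's ECC image alignment; the disclosure only
    states that each frame is mapped to the background ROI, so the choice of
    a Euclidean (rotation + translation) motion model is an assumption.
    """
    warp = np.eye(2, 3, dtype=np.float32)          # initial guess: identity
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)
    # Estimate the warp that aligns `frame_gray` with `background_gray`.
    _, warp = cv2.findTransformECC(background_gray, frame_gray, warp,
                                   cv2.MOTION_EUCLIDEAN, criteria, None, 5)
    h, w = background_gray.shape
    # Warp the frame into the background's (ROI) coordinate frame.
    return cv2.warpAffine(frame_gray, warp, (w, h),
                          flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
```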


From the series of images, at least one vehicle moving in the region of interest is detected at 13. Moving objects can be detected by comparing each frame in the series of images with the background image as shown in FIG. 3B. The comparison results in a binary image. In an example embodiment, morphological operations are applied to eliminate small changes in the image (due to shadows, leaves moving in the wind, slight changes of camera perspective, etc.) and to merge neighboring patches which belong to the same object. The outcomes of this step are the detection boxes (marked as red squares in FIG. 3B) and the number of objects. Note that the detection boxes are square-shaped to accommodate the orientations of the vehicles, and they can be of different sizes according to the actual vehicle sizes. The goal of this step is to find the rough locations of the vehicles and to prepare for precise vehicle detection. On the one hand, the parked vehicles in the background are not detected. On the other hand, as long as a vehicle is not in the background image, even if it is parked, it can still be detected. The background image can be updated to accommodate larger changes during the day (or at different times of the year), but this is not needed in this research since the time of interest is less than a minute. These steps may be implemented using Matlab or other commercially available image processing tools.
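
A minimal sketch of this comparison and morphological-operation step is shown below in Python with OpenCV (the disclosure mentions Matlab or other tools; this is merely one alternative). The threshold, kernel size, and minimum blob area are illustrative assumptions, not values from the experiment.

```python
import cv2
import numpy as np

def detect_moving_objects(frame_gray, background_gray,
                          diff_thresh=30, min_area=500):
    """Return rough detection boxes (x, y, w, h) for objects not in the background.

    Sketch only: the threshold, kernel, and minimum blob area are illustrative.
    """
    # Absolute difference with the background image, then binarize.
    diff = cv2.absdiff(frame_gray, background_gray)
    _, binary = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)

    # Opening removes small changes (shadows, moving leaves, slight jitter);
    # closing merges neighboring patches that belong to the same vehicle.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel, iterations=3)

    # Connected components give the number of objects and their rough boxes.
    num, _, stats, _ = cv2.connectedComponentsWithStats(binary)
    boxes = []
    for i in range(1, num):                       # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            side = max(w, h)                      # square detection box, per FIG. 3B
            boxes.append((x, y, side, side))
    return boxes
```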


For each image in the series of images, a bounding box is fit at 14 to each of the detected moving objects. With reference to FIG. 3C, a detection box surrounds the detected moving object. More specifically, fitting the bounding box to a detected moving object includes: overlaying a pre-defined image of the object (e.g., a car or a truck) on the detection box; changing the orientation of the pre-defined image in relation to the detected moving object; for each orientation, determining a correlation metric between the pre-defined image and the periphery of the detected moving object; and drawing a bounding box around the detected moving object based on the pre-defined image having the correlation metric with the highest value. It is noted that the size of the bounding box is the same across the series of images.
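
A minimal sketch of this orientation search follows, assuming normalized cross-correlation (cv2.matchTemplate) as the correlation metric and a 1° angular step; both choices are illustrative, since the disclosure does not fix them. The returned angle then defines the orientation of the fixed-size bounding box drawn around the detected vehicle.

```python
import cv2
import numpy as np

def fit_orientation(detection_patch, template, angles_deg=np.arange(0.0, 360.0, 1.0)):
    """Find the yaw angle (deg) that best aligns a vehicle template with a detection.

    `detection_patch` is the image inside the square detection box and must be
    at least as large as `template`; the correlation metric and angle grid are
    assumptions for this sketch.
    """
    best_angle, best_score = 0.0, -np.inf
    th, tw = template.shape[:2]
    center = (tw / 2.0, th / 2.0)
    for angle in angles_deg:
        rot = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(template, rot, (tw, th))
        # Correlation between the rotated template and the detection patch.
        score = cv2.matchTemplate(detection_patch, rotated,
                                  cv2.TM_CCORR_NORMED).max()
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle, best_score
```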


Finally, kinematic data for the moving objects can be determined at 15 using the bounding boxes. The position and the yaw angle of a moving object can be obtained directly from the bounding box as described below. Taking a truck as an example, the bicycle model is applied to the truck as shown in FIG. 4, where point C is the geometric center of the bounding box, point T is on top of the cabin where the GPS antenna is installed, and point R is the center of the rear axle. vT, vC, and vR are the velocities at points T, C, and R, respectively, while βT, βC, and βR are the angles between the velocities and the vehicle longitudinal direction. ψ is the yaw angle of the truck, i.e., the orientation of the truck measured in relation to a fixed reference axis in the ground coordinate system. Note that the heading angle θ is given by the sum of the yaw angle ψ and the angle β. For example, the heading angle θC at point C is (ψ+βC). Assuming no side slip for the rear wheel, i.e., the velocity at point R is aligned with the vehicle and only has a longitudinal component, βR≈0.


Considering the vehicle as a rigid body, the yaw rate is the same at any point of the body, but the velocity and heading angle differ from point to point. The position rC of point C (with coordinates xC and yC in the ground frame) and the yaw angle ψ can be obtained directly from the bounding box after converting the image coordinates to ground coordinates. With the measured distance between point T and center point C being dCT=3.49 m, and between center point C and rear axle center point R being dCR=2.21 m, the positions of point T and point R can be calculated as:











\[
\mathbf{r}_T = \begin{bmatrix} x_T \\ y_T \end{bmatrix}
= \begin{bmatrix} x_C \\ y_C \end{bmatrix} + d_{CT} \begin{bmatrix} \cos\psi \\ \sin\psi \end{bmatrix},
\qquad
\mathbf{r}_R = \begin{bmatrix} x_R \\ y_R \end{bmatrix}
= \begin{bmatrix} x_C \\ y_C \end{bmatrix} - d_{CR} \begin{bmatrix} \cos\psi \\ \sin\psi \end{bmatrix}.
\tag{1}
\]







Knowing the position information at each time step, one can estimate the velocity vC and its magnitude (the speed) using the distance traveled divided by the time difference between consecutive frames:











\[
\mathbf{v}_C = \begin{bmatrix} \Delta x_C / \Delta t \\ \Delta y_C / \Delta t \end{bmatrix},
\qquad
v_C = \frac{\lVert \Delta \mathbf{r}_C \rVert}{\Delta t},
\tag{2}
\]







where ΔxC and ΔyC are the changes of the x and y coordinates between frames and







\[
\lVert \Delta \mathbf{r}_C \rVert = \sqrt{\Delta x_C^2 + \Delta y_C^2}.
\]





That is, the speed of the center point of the bounding box is calculated by determining the distance between the center points of two consecutive bounding boxes and dividing that distance by the time difference between consecutive frames. The velocity and the speed at points T and R can be determined similarly. A direct division, however, can cause large errors. One pixel in the image corresponds to about 0.0275 m in reality, and the time difference is Δt=1/60 s (with a frame rate of 60 fps). In this case, even an error of one pixel can lead to an error of 0.0275×60=1.65 m/s when estimating the speed. Thus, the direct speed calculation is smoothed in the plots. FIG. 5A shows the smoothed speed curve at each point of the vehicle using 21 data points, which corresponds to ⅓ s (⅙ s ahead and ⅙ s behind). Instead of the two-sided smoothing, one-sided smoothing, which only uses past information, can also be adopted to apply the algorithm online. At the beginning of the trip, for about 2 seconds while the vehicle is moving straight toward the intersection, the speeds at the three points (T, C, and R) are very close. However, when the vehicle starts to turn, the difference becomes obvious, and point T at the front of the vehicle has the largest speed. This is clearly shown in FIG. 5B, where the black curve is the longitudinal velocity and the green, blue, and red curves are the lateral velocities of points T, C, and R, respectively. The longitudinal velocity is aligned with the vehicle orientation, and all three points have the same longitudinal velocity since the vehicle is considered a rigid body and the bounding box has a fixed length. The lateral velocity is perpendicular to the longitudinal velocity. In the first two seconds, the lateral velocities are approximately zero at all three points; when the turn starts, the lateral velocities at points T and C increase to different magnitudes while the lateral velocity remains close to zero at point R. This shows that there is only a small side slip at the rear wheels, and validates the assumption of the bicycle model.
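
The finite-difference speed estimate of equation (2) and the 21-sample (⅓ s at 60 fps) centered smoothing described above can be sketched as follows with NumPy; the synthetic trajectory at the end is only an illustration of usage, not experimental data.

```python
import numpy as np

def finite_difference_velocity(xy, dt=1.0 / 60.0):
    """Velocity components and speed from consecutive bounding-box centers.

    `xy` is an (N, 2) array of ground-plane positions in meters; a sketch of
    equation (2). The first displacement is set to zero so the output length
    matches the input length.
    """
    d = np.diff(xy, axis=0, prepend=xy[:1])       # per-frame displacement (m)
    v = d / dt                                    # velocity components (m/s)
    speed = np.linalg.norm(v, axis=1)             # scalar speed (m/s)
    return v, speed

def centered_moving_average(signal, window=21):
    """Two-sided smoothing over `window` samples (21 samples = 1/3 s at 60 fps)."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

# Usage example: a synthetic straight-line trajectory sampled at 60 fps.
t = np.arange(0, 5, 1.0 / 60.0)
xy = np.column_stack([8.0 * t, 0.03 * np.random.randn(t.size)])  # ~8 m/s + pixel-level noise
_, speed = finite_difference_velocity(xy)
speed_smooth = centered_moving_average(speed)
```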


The yaw rate, ω=Δψ/Δt, can be calculated in a similar way using the yaw angle of the vehicle. The smoothed curves of speed and yaw rate and their original data (position/yaw angle difference divided by time difference) are plotted in FIG. 3. Here, the heading angle θ is given by the velocity direction, and varies at different points. FIG. 5C gives the heading angles calculated from the velocity directions at points T, C, and R. For comparison, the yaw angle obtained from the bounding box is also plotted as the black curve. Point T has the largest heading angle, and the heading angle at point R is very close to the yaw angle (βR≈0).
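
A corresponding sketch for the yaw rate is shown below. The angle-unwrapping step is an implementation detail assumed here (not discussed in the disclosure) so that a ±180° wrap between frames is not mistaken for a large rotation; smoothing can then be applied exactly as for the speed.

```python
import numpy as np

def yaw_rate_from_yaw(psi_rad, dt=1.0 / 60.0):
    """Yaw rate omega = d(psi)/dt from the per-frame yaw angle (radians).

    The angle is unwrapped first so a wrap-around (e.g., +pi to -pi) between
    consecutive frames is not misread as a large rotation.
    """
    psi = np.unwrap(np.asarray(psi_rad, dtype=float))
    omega = np.diff(psi, prepend=psi[:1]) / dt    # rad/s, same length as input
    return omega
```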


As a proof of concept, the detection results from the drone view video (60 Hz) are compared with the GPS data (10 Hz). The GPS antenna is installed at point T of the truck (see FIG. 4); thus the position, speed, and heading angle are all plotted at point T in FIGS. 6A-6F, where the solid green curves are obtained from the drone view video and the black circles are from the GPS device. FIGS. 6A, 6C, and 6E are from one run of the experiment, while FIGS. 6B, 6D, and 6F are from another run. In FIG. 6A, the position points from the video are raw data obtained from the bounding box; while the GPS and the video data show a good fit, the trajectory from the video is smoother. The speed information from the drone view video in FIG. 6C is the same as the green curve in FIG. 5A, which is smoothed with ⅓ s of data. The yaw angle can be estimated by the heading angle at the rear axle center (at point R), but cannot be generated directly from the GPS measurement (at point T), and thus it is hard to compare with the video processing result. The heading angle θT from the drone view video is calculated using:











\[
\theta_T = \arctan\frac{\Delta y_T}{\Delta x_T}.
\tag{3}
\]







The curve is smoothed with ⅓ s of data points, and it is the same as the green curve in FIG. 5C.


For the other run of the experiment, the GPS position in FIG. 6B drifts by more than 4 meters. FIG. 6D shows that the vehicle runs slower compared to FIG. 6C, with a maximum speed of about 9 m/s, and thus it takes longer to finish the turn. The GPS data agree with the video processing results in FIGS. 6D and 6F, which also show trends similar to FIGS. 6C and 6E.


Despite the drift of the GPS data, in general, the drone video processing results fit well with the GPS data, while the video processing gives smoother data in position and yaw angle (or heading angle) and provides richer information about the vehicle dynamics (heading angles and velocity components at different points, yaw rate) compared to the GPS device, which only gives information for a single point.


In another aspect of this disclosure, a method is presented for detecting a vehicle passing through a region of interest as depicted in FIG. 7. Similar to above, a first set of images of a region of interest on the ground is captured at 71 from a perspective above the region of interest. In one example, the first set of images is captured using an unmanned aerial vehicle, where the unmanned aerial vehicle is equipped with a (top view) camera. Other methods for capturing a set of images from a perspective above the region of interest are also contemplated by this disclosure.


From the first set of images, kinematic data for each vehicle moving in the region of interest is extracted at 72. The kinematic data for each vehicle can include one or more of position, yaw angle, velocity and yaw rate. The kinematic data is preferably extracted using bounding boxes in accordance with the techniques described above, where a bounding box surrounds a moving vehicle captured in an image.


Next, a second set of images of the region of interest is captured at 73 from a perspective on a side of the region of interest. In one example, the second set of images is captured using a second (side view) camera at a fixed location adjacent to the region of interest.


Image data from the first set of images is then projected at 74 to the viewpoint of the second (side view) camera. Rather than projecting all of the image data, only the bounding boxes are projected to the viewpoint of the side view camera. In an example embodiment, the bounding boxes are projected to the viewpoint of the side view camera using homography, as will be further described below.


To detect vehicles passing through the region of interest, a machine learning algorithm, such as the YOLO (You Only Look Once) object detection algorithm, can be trained (or re-trained) at 75 to detect moving vehicles using the second set of images and a ground truth. In this example, the ground truth is formed by the extracted kinematic data and the bounding boxes projected to the viewpoint of the side view camera. In other words, the extracted kinematic data and the projected bounding boxes serve as a labeled data set for training a machine learning algorithm. Other types of machine learning algorithms, including convolutional neural networks and recurrent neural networks, are contemplated by this disclosure.
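
As one possible (assumed) way to turn the projected boxes into a labeled data set, each projected quadrilateral can be written as a Darknet/YOLO-style label line. Standard YOLO expects axis-aligned boxes, so the enclosing rectangle of the quadrilateral is used in this sketch; that conversion is an assumption, not something prescribed by the disclosure.

```python
import numpy as np

def quad_to_yolo_label(quad_xy, img_w, img_h, class_id=0):
    """Write one YOLO-style label line from a projected bounding box.

    `quad_xy` is a 4x2 array of the quadrilateral corners in the side-view
    image; the axis-aligned enclosing rectangle is used as the label box.
    """
    quad = np.asarray(quad_xy, dtype=float)
    x_min, y_min = quad.min(axis=0)
    x_max, y_max = quad.max(axis=0)
    xc = (x_min + x_max) / 2.0 / img_w            # normalized box center
    yc = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w                   # normalized box size
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```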


Once trained, the machine learning algorithm can be used to detect vehicles passing through the region of interest. To do so, image data from the side view camera can be input to the machine learning algorithm. It is envisioned that the extracted kinematic variables can be used for the management of intelligent infrastructure and with vehicle-to-everything (V2X) communication. The extracted kinematic variables can also be used for the motion planning of connected automated vehicles (CAVs). Note that the algorithms use UAVs for training data collection, and once the algorithm is trained, only the roadside camera is required to extract the kinematic variables.


Homography can be used to project a bounding box from the drone view image to the fixed-camera side view. Instead of a rectangle as in the drone view, the bounding box becomes a quadrilateral in the fixed-camera side view. This mechanism enables an easy way to generate bounding boxes for a side-view camera, and together with the ground truth (position, speed, yaw angle, yaw rate, etc.), they can be utilized to generate training data (labeled data sets) for machine learning algorithms.


With reference to FIG. 8, a rectangle on the ground plane is deformed from the perspective of a camera looking down from the side. To convert the coordinates of a general point (x, y) on the ground plane to a point (x′, y′) on the image plane, the matrix H is used as follows:










\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
= \frac{1}{w}
\underbrace{\begin{bmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{bmatrix}}_{H}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.
\tag{4}
\]







This projection does not account for the distortion of the camera lens, and it preserves straight lines. Since the scaling factor w does not affect the ratio, assuming, for example, h33≈1 results in eight unknowns in the matrix H; thus, four point correspondences are needed to complete the projection.
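
Applying equation (4) to a single ground-plane point amounts to a matrix-vector product followed by division by the scale factor w; a minimal sketch:

```python
import numpy as np

def apply_homography(H, x, y):
    """Map a ground-plane point (x, y) to image coordinates (x', y') via eq. (4).

    The scale factor w is the third component of H @ [x, y, 1]; dividing by it
    performs the perspective normalization.
    """
    p = np.asarray(H, dtype=float) @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```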


Considering a drone about 250 feet above the ground, one can assume that the bounding box from the drone view represents the bounding box on the ground plane well. To map it from the ground plane (drone view) to the fixed-camera plane (i.e., side view), more than four points can be selected to reduce the estimation error. The problem is thus converted to finding the matrix H for the best-fit projected plane in the fixed-camera side view, which can be solved by the singular value decomposition method. Rewriting the H matrix as:









\[
\mathbf{h} = \left[ h_{11},\, h_{12},\, h_{13},\, h_{21},\, h_{22},\, h_{23},\, h_{31},\, h_{32},\, h_{33} \right]^{\top}.
\tag{5}
\]







Assuming the scale factor w=1, equation (4) becomes










\[
A\mathbf{h} = \mathbf{0},
\tag{6}
\]








where








\[
A = \begin{bmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1 x_1' & -y_1 x_1' & -x_1' \\
0 & 0 & 0 & x_1 & y_1 & 1 & -x_1 y_1' & -y_1 y_1' & -y_1' \\
x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2 x_2' & -y_2 x_2' & -x_2' \\
0 & 0 & 0 & x_2 & y_2 & 1 & -x_2 y_2' & -y_2 y_2' & -y_2' \\
\vdots & & & & & & & & \vdots \\
x_n & y_n & 1 & 0 & 0 & 0 & -x_n x_n' & -y_n x_n' & -x_n' \\
0 & 0 & 0 & x_n & y_n & 1 & -x_n y_n' & -y_n y_n' & -y_n'
\end{bmatrix}_{2n \times 9}
\tag{7}
\]







and n is the number of points, with (xi, yi) the ground-plane (drone view) points and (xi′, yi′) the corresponding fixed-camera image points. Vector h can be obtained as the eigenvector corresponding to the smallest eigenvalue of ATA, which is a direct result of the singular value decomposition of matrix A.
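
A compact sketch of this estimation step: build the 2n×9 matrix A of equation (7) from the point correspondences, take the right singular vector associated with the smallest singular value, and reshape it into H. The variable names are illustrative; OpenCV's cv2.findHomography provides equivalent (and robust) functionality.

```python
import numpy as np

def estimate_homography(src_xy, dst_xy):
    """Estimate H mapping ground-plane (drone view) points to side-view image points.

    `src_xy` and `dst_xy` are (n, 2) arrays of corresponding points (n >= 4).
    Builds the 2n x 9 matrix A of equation (7) and solves A h = 0 by SVD;
    h is the right singular vector with the smallest singular value.
    """
    src = np.asarray(src_xy, dtype=float)
    dst = np.asarray(dst_xy, dtype=float)
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp, -xp])
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp, -yp])
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)
    h = Vt[-1]                        # eigenvector of A^T A with the least eigenvalue
    H = h.reshape(3, 3)
    return H / H[2, 2]                # normalize so that h33 = 1
```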


As shown in FIGS. 9A and 9B, 13 points are selected (i.e., n=13) from the drone view and the fixed-camera view, marked as yellow circles. The red crosses in the fixed-camera view are the points mapped from the drone view using matrix H, and they are very close to the hand-selected points. FIG. 10 shows the bounding box of the truck from the drone view at different positions on the left, which is then mapped to the image from the fixed camera on the right. Note that the image distortion of the fixed camera is corrected prior to the projection.
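
To reproduce the mapping illustrated in FIG. 10, the four corners of the oriented drone-view bounding box can be generated from the box center, yaw angle, and box dimensions, and then projected with H as in equation (4); a sketch, where the length and width passed in are assumed to come from the fixed-size bounding box.

```python
import numpy as np

def oriented_box_corners(xc, yc, psi, length, width):
    """Corners of the drone-view bounding box from center, yaw angle, and size."""
    c, s = np.cos(psi), np.sin(psi)
    R = np.array([[c, -s], [s, c]])               # rotation by the yaw angle
    half = np.array([[ length / 2,  width / 2],
                     [ length / 2, -width / 2],
                     [-length / 2, -width / 2],
                     [-length / 2,  width / 2]])
    return half @ R.T + np.array([xc, yc])        # 4 x 2 ground-plane corners

def project_corners(H, corners_xy):
    """Map the box corners to the side view; the rectangle becomes a quadrilateral."""
    pts = np.hstack([corners_xy, np.ones((len(corners_xy), 1))])
    proj = pts @ np.asarray(H, dtype=float).T
    return proj[:, :2] / proj[:, 2:3]             # perspective division by w, per eq. (4)
```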


Using images from unmanned aerial vehicles (UAVs), an algorithm is developed to track vehicles at an intersection and to generate labelled data sets for machine learning algorithms. Unlike previous studies focusing on vehicle detection, classification, and tracking in traffic surveillance, this disclosure uses recorded videos from UAVs to extract and analyze the kinematic variables of vehicles. These variables include position, yaw angle, heading angle, speed, and yaw rate, etc. By comparing each video frame with a background image using morphological operations (e.g., in Matlab), the areas of change are identified. A finer investigation of these areas is then performed to draw the bounding boxes around the vehicles. The position and yaw angle can be read directly from the bounding boxes, while the speed and yaw rate are calculated accordingly.


The extracted kinematic data can be used in the control and planning of smart intersections, especially in an environment with connected automated vehicles (CAVs), since knowing the dynamics of other vehicles can benefit the CAVs in their decision making and path planning as well. With the obtained data and bounding boxes (from the side view fixed camera) serving as the "ground truth", this algorithm can be used to generate training data for machine learning algorithms which can perform more complex tasks, for example, online tracking.


The techniques described herein may be implemented by one or more computer programs executed by one or more processors. The computer programs include processor-executable instructions that are stored on a non-transitory tangible computer readable medium. The computer programs may also include stored data. Non-limiting examples of the non-transitory tangible computer readable medium are nonvolatile memory, magnetic storage, and optical storage.


Some portions of the above description present the techniques described herein in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. These operations, while described functionally or logically, are understood to be implemented by computer programs. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules or by functional names, without loss of generality.


Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.


Certain aspects of the described techniques include process steps and instructions described herein in the form of an algorithm. It should be noted that the described process steps and instructions could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems.


The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer. Such a computer program may be stored in a tangible computer readable storage medium, such as, but is not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.


The algorithms and operations presented herein are not inherently related to any particular computer or other apparatus. Various systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatuses to perform the required method steps. The required structure for a variety of these systems will be apparent to those of skill in the art, along with equivalent variations. In addition, the present disclosure is not described with reference to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present disclosure as described herein.


The foregoing description of the embodiments has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure. Individual elements or features of a particular embodiment are generally not limited to that particular embodiment, but, where applicable, are interchangeable and can be used in a selected embodiment, even if not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.

Claims
  • 1. A method for extracting kinematic data for vehicles using an unmanned aerial vehicle, comprising: defining a region of interest on the ground; providing a background image of the region of interest; capturing, by a camera, a series of images over time of the region of interest from a perspective above the region of interest; from the series of images, detecting, by a computer processor, at least one vehicle moving in the region of interest; for each image in the series of images, fitting, by the computer processor, a bounding box to the at least one moving vehicle, where the bounding box surrounds the at least one moving vehicle and the size of the bounding box is the same across the series of images; and determining, by the computer processor, kinematic data for the at least one moving vehicle using the bounding boxes, where the kinematic data includes yaw angle for the at least one moving vehicle.
  • 2. The method of claim 1 further comprises capturing a series of images using an unmanned aerial vehicle, where the unmanned aerial vehicle is equipped with the camera.
  • 3. The method of claim 1 wherein the kinematic data includes position for the at least one moving vehicle, velocity for the at least one moving vehicle, yaw angle for the at least one moving vehicle and yaw rate for the at least one moving vehicle.
  • 4. The method of claim 1 further comprises detecting at least one moving vehicle by comparing each image in the series of images with the background image.
  • 5. The method of claim 1 wherein fitting the bounding box to the at least one moving vehicle includes overlaying a pre-defined image of a vehicle on a given detected vehicle; changing orientation of the pre-defined image in relation to the given detected vehicle; for each orientation, determining a correlation metric between the pre-defined image and the periphery of the given detected vehicle; and drawing a bounding box around the given detected vehicle based on the pre-defined image having the correlation metric with highest value.
  • 6. The method of claim 1 wherein determining kinematic data for the at least one moving vehicle further comprises measuring the yaw angle for the at least one moving vehicle as an angle between a longitudinal axis of the bounding box surrounding the at least one moving vehicle and a reference axis of a coordinate system.
  • 7. The method of claim 1 wherein determining kinematic data for the at least one moving vehicle further comprises determining a center point of the bounding boxes, calculating a distance between the center point of at least two of the bounding boxes and determining velocity of the at least one moving vehicle from the distance.
  • 8. A method for detecting a vehicle passing through a region of interest, comprising: capturing, by a top view camera, a first set of images of a region of interest on the ground from a perspective above the region of interest; from the first set of images, creating a plurality of bounding boxes for each vehicle moving in the region of interest, and extracting kinematic data for each vehicle moving in the region of interest using the plurality of bounding boxes; capturing, by a side view camera, a second set of images of the region of interest from a perspective on a side of the region of interest; projecting the plurality of bounding boxes to a viewpoint of the side view camera; and training a machine learning algorithm to detect moving vehicles in images captured by the side view camera, where the machine learning algorithm is trained using the second set of images and a ground truth, and the kinematic data and the plurality of bounding boxes projected to the viewpoint of the side view camera serve as the ground truth.
  • 9. The method of claim 8 further comprises capturing the first set of images using an unmanned aerial vehicle, where the unmanned aerial vehicle is equipped with the top view camera.
  • 10. The method of claim 8 wherein the kinematic data includes position for the at least one moving vehicle, velocity for the at least one moving vehicle, yaw angle for the at least one moving vehicle and yaw rate for the at least one moving vehicle.
  • 11. The method of claim 8 wherein creating a plurality of bounding boxes further comprises overlaying a pre-defined image of a vehicle on a given detected vehicle; changing orientation of the pre-defined image in relation to the given detected vehicle; for each orientation, determining a correlation metric between the pre-defined image and the periphery of the given detected vehicle; and drawing a bounding box around the given detected vehicle based on the pre-defined image having the correlation metric with highest value.
  • 12. The method of claim 10 wherein extracting kinematic data for each moving vehicle further comprises measuring the yaw angle for a given moving vehicle as an angle between a longitudinal axis of the bounding box surrounding the given moving vehicle and a reference axis of a coordinate system.
  • 13. The method of claim 10 wherein extracting kinematic data for each moving vehicle includes determining a center point of a given bounding box in the plurality of bounding boxes, calculating a distance between the center point of at least two of the plurality of bounding boxes and determining velocity of the moving vehicle from the distance.
  • 14. The method of claim 8 further comprises projecting the plurality of bounding boxes to the viewpoint of the side view camera using homography.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit and priority of U.S. Provisional Application No. 63/466,755 filed on May 16, 2023. The entire disclosure of the above application is incorporated herein by reference.

Provisional Applications (1)
Number Date Country
63466755 May 2023 US