Light Detection and Ranging (LiDAR) is a method for determining ranges (variable distance) by targeting an object with a laser and then measuring the time for the reflected light to return to a receiver. LiDAR has been utilized for many different types of applications such as making digital 3-D representations of areas on the earth's surface and ocean bottom.
LiDAR sensors have also been used in the intelligent transportation field because of their powerful detection and localization capabilities. In a particular application, LiDAR sensors have been installed on autonomous vehicles (or self-driving vehicles) and used in conjunction with other sensors, such as digital video cameras and radar devices, to enable the autonomous vehicle to safely navigate along roads. It has recently been recognized that LiDAR sensors could potentially be deployed as part of the roadside infrastructure, for example, incorporated into a traffic light system at intersections as a detection and data generating apparatus. The detected traffic data can then be used by connected vehicles (CVs) and by other infrastructure systems to aid in preventing collisions and to protect non-motorized road users, to evaluate the performance of autonomous vehicles, and for the general purpose of collecting traffic data for analysis. For example, roadside LiDAR sensor data at a traffic light can be used to identify when and where vehicle speeding is occurring, and it can provide a time-space diagram which shows how vehicles slow down, stop, speed up and go through the intersection during a light cycle. In addition, roadside LiDAR sensor data can be utilized to identify “near-crashes,” where vehicles come close to hitting one another (or close to colliding with a pedestrian or bicyclist), and thus identify intersections or stretches of roads that are potentially quite dangerous.
Connected-Vehicle (CV) technology is an emerging technology that aims to reduce vehicle collisions and provide energy-efficient transportation for people. CV technology allows bi-directional communications between roadside infrastructure and the connected vehicles (road users) for sharing real-time traffic and/or road information, providing rapid responses to potential events, and/or providing operational enhancements. However, some currently deployed CV systems suffer from an information gap concerning information or data about unconnected vehicles, pedestrians, bicycles, wild animals and/or other hazards.
Roadside LiDAR sensor systems can potentially be utilized to close the information gap that typical CV systems suffer from. In particular, roadside LiDAR systems can be incorporated into the roadside infrastructure to generate data concerning the real-time status of unconnected road users within a detection range to thus provide complementary traffic and/or hazard information or data. For example, LiDAR sensor systems can be utilized to detect one or more vehicles that is/are running a red light and/or pedestrians who are crossing against a red light and share that information with any connected road users.
A common misconception is that the application of roadside LiDAR sensors is similar to the application of on-board vehicle LiDAR sensors, and that therefore the same processing procedures and algorithms utilized by on-board LiDAR systems could be applicable to roadside LiDAR systems (possibly with minor modifications). However, on-board LiDAR sensors mainly focus on the surroundings of the vehicle and the goal is to directly extract objects of interest from a constantly changing background. In contrast, roadside LiDAR sensors must detect and track all road users in a traffic scene against a static background. Thus, infrastructure-based, or roadside LiDAR sensing systems have the capability to provide behavior-level multimodal trajectory data of all traffic users, such as presence, location, speed, and direction data of all road users from raw roadside LiDAR sensor data. In addition, low cost sensors may be used to gather such real-time, all-traffic trajectories for extended distances, which can provide critical information for connected and autonomous vehicles so that an autonomous vehicle traveling into the area covered by a roadside LiDAR sensor system becomes aware of potential upcoming collision risks and the movement status of other road users while still at a distance away from the area or zone. Thus, the tasks of obtaining and processing trajectory data are different for a roadside LiDAR sensor system than for an on-board vehicle LiDAR sensor system.
Accordingly, for infrastructure-based or roadside LiDAR sensor systems, it is important to detect target objects in the environment quickly and efficiently because fast detection speeds provide the time needed for making a post-detected response, for example, by an autonomous vehicle to avoid a collision with other road users in the real-world. Detection accuracy is also a critical factor to ensure the reliability of a roadside LiDAR based system. Thus, roadside LiDAR sensor systems are required to exclude the static background points and finely partition those foreground points as different entities (clusters).
In addition to supporting connected and autonomous vehicles, the all-traffic trajectory data generated by a roadside LiDAR system may be valuable for traffic study and performance evaluation, advanced traffic operations, and the like. For example, analysis of lane-based vehicle volume data can achieve an accuracy above 95%, and if there is no infrastructure occlusion, the accuracy of road volume detection can generally be above 98% for roadside LiDAR systems. Other applications for collected trajectory data include providing conflict data resources for near-crash analysis, including collecting near-crash data (especially vehicle-to-pedestrian near-crash incidents) that occur during low-light level situations such as during rainstorms and/or during the night hours when it is dark. In this regard, roadside LiDAR sensors deployed at fixed locations (e.g., road intersections and along road medians) provide a good way to record trajectories of all road users over the long term, regardless of illumination conditions. Traffic engineers can then study the historical trajectory data provided by the roadside LiDAR system at multiple scales to define and extract near-crash events, identify traffic safety issues, and recommend countermeasures.
Regarding traffic performance measurements using trajectory data, one challenge is how to classify all of the different types of road users as accurately as possible, even if they are far away from the LiDAR traffic sensors. In general, large-sized road users (e.g., vehicles such as cars, buses and trucks) are relatively easy to identify as compared to smaller-sized road users (e.g., pedestrians, bicyclists and wheelchair users) because the direction of movement and the appearance of small-sized road users constantly changes. In addition, road users who are located near the LiDAR sensor(s) and/or other sensors and are not occluded by other objects are more likely to be classified into a category correctly because such collected data or information is more comprehensive and reliable.
In a classic vehicle on-board LiDAR-Vision sensing system, two sensors work together in a centralized or in a decentralized manner to generate data that can be used to achieve somewhat satisfactory object classification. Centralized processing means that a sensor fusion process occurs at the feature level, wherein features extracted from the LiDAR data and the video data are combined into a single vector for classification using a single classifier. Decentralized processing occurs when two classifiers are trained with LiDAR data and video data individually, and then the final classification results are combined through a set of fusion methods. For example, convolutional neural network (CNN) and image up-sampling methods may be used to generate LiDAR-Vision fusion data that achieves a classification accuracy of about 100% for pedestrians and cyclists, about 98.6% for cars, and about 88.6% for trucks.
Weaknesses inherent in the use of existing on-board LiDAR classifiers include a short effective detection range and no vision data supporting the classification. The inventors have recognized that there is a need for providing improved classification methods and systems for accurately classifying pedestrians, wheelchair users, vehicles and cyclists using roadside LiDAR sensor systems.
Features and advantages of some embodiments of the present disclosure, and the manner in which the same are accomplished, will become more readily apparent upon consideration of the following detailed description taken in conjunction with the accompanying drawings, which illustrate preferred and example embodiments and which are not necessarily drawn to scale, wherein:
Reference will now be made in detail to various novel embodiments, examples of which are illustrated in the accompanying drawings. The drawings and descriptions thereof are not intended to limit the invention to any particular embodiment(s). On the contrary, the descriptions provided herein are intended to cover alternatives, modifications, and equivalents thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the various embodiments, but some or all of the embodiments may be practiced without some or all of the specific details. In other instances, well-known process operations have not been described in detail in order not to unnecessarily obscure novel aspects. In addition, terminology used in the Detailed Description is intended to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with certain examples. The terms used in this specification generally have their ordinary meanings in the art, within the context of the disclosure, and in the specific context where each term is used.
As used herein, the term “module” refers broadly to software, hardware, or firmware (or any combination thereof) components. Modules are typically functional components that can generate useful data or other output using specified input(s). A module may or may not be self-contained. An application program (sometimes called an “application” or an “app” or “App”) may include one or more modules, or a module can include one or more application programs.
In general, and for the purposes of introducing concepts of embodiments of the present disclosure, disclosed herein are roadside LiDAR systems and methods for processing, analyzing, and using traffic data to create a human-in-the-loop system in which all road users, especially pedestrians and cyclists, will be actively protected. Some embodiments focus on computational efficiency because roadside LiDAR applications must quickly and accurately perform complete background filtering and monitor real-time traffic movements. Thus, in some embodiments methods are disclosed for fast background filtering of roadside LiDAR data that embed background filtering within the decoding process to exclude irrelevant information, thus avoiding unnecessary calculation of those datapoint coordinates (as well as avoiding performing further segmentation and/or clustering and classification processing). Specifically, an improved method is disclosed that translates the location of 3D LiDAR datapoints into a 2D structure based on channel, azimuth, and distance information.
In another aspect, disclosed is a fast clustering method that is based on generating a two-dimensional (2D) map. Datapoint cloud clustering is an important component for processing roadside LiDAR traffic data, and clustering processing has a profound effect on the accuracy and efficiency of object detection which impacts the effectiveness of roadside LiDAR applications such as LiDAR based connected-vehicle systems. In some disclosed embodiments, a fast clustering process is based on a spherical projection map wherein the process provides improved detection accuracy and lower computational complexity as compared to conventional methods. The fast clustering method also enables a wider detection range and provides improved robustness. Reasons for the improvement include the adoption of a 2D searching window and the use of a spherical map which greatly improves processing speed for performing a neighborhood query in the clustering process and provides less sensitivity to the varying point density. In some implementations, the Fast Spatial Clustering based on Two-dimensional Map (FSCTD) process is implemented using the Python coding language.
In yet another aspect associated with processing roadside LiDAR sensor data, disclosed is a high-accuracy, feature-based road user classification process. The high-accuracy process utilizes prior trajectory information to classify vehicles, pedestrians, cyclists and wheelchairs. By updating significant features based on prior information of the entire trajectory, more critical features were used as the input of classifiers to increase classification accuracy. In an implementation, the process includes updating critical features based on prior trajectory information, which greatly improves the accuracy of classification, especially for classes having a small number of observations.
Referring again to
Referring again to
The roadside LiDAR sensing systems described above with reference to
1. Background Filtering
As mentioned earlier, background filtering is a crucial element to consider when processing roadside LiDAR sensor data. This is because, according to data collected from testbeds, only about five percent (5%) to ten percent (10%) of the raw LiDAR sensor data are target points of interest. Thus, methods disclosed herein advantageously provide a fast and accurate background filtering process to remove datapoints that are not of interest, which at the same time increases the overall processing speed and minimizes storage requirements. Since the background objects detected by infrastructure-based LiDAR sensors are fixed, the goal of background filtering for roadside LiDAR sensing systems is to classify raw datapoints into background datapoints (to discard) and target datapoints (to save).
LiDAR sensors use a wide array of infra-red lasers paired with infra-red detectors to measure distances to objects, and these LiDAR sensors are typically housed within a compact, weather-resistant housing. The array of laser/detector pairs spins rapidly within its fixed housing to scan the surrounding environment (a 360-degree horizontal field of view) and to provide a rich set of 3D point data in real time. Factors that are used to select which LiDAR sensor to use for any particular application include the number of channels, the vertical field of view (FOV), and the vertical resolution of the laser beams. In general, sensors with more laser channels, a larger vertical FOV, and a higher vertical resolution are more productive in data collection.
During use, one data frame is generated after the LiDAR sensors complete a three-hundred and sixty-degree (360°) scan. The collected point cloud is stored in a packet capture (.pcap) file and the size of the packet is determined by the number of laser channels, the time of data collection and the number and complexity of the surrounding objects. There are two steps for decoding “pcap” files to get three-dimensional (3D) LiDAR points. The first step includes obtaining azimuth (α), elevation angle (ω), 3D distance (R) between the object and the LiDAR sensor, timestamp data and intensity information, wherein azimuth is defined as a horizontal angle measured clockwise from any fixed reference plane or easily established base direction line. The second step is to convert the spherical coordinates to Cartesian coordinates. In other words, the azimuth, elevation angle, and the 3D distance of each data point in the spherical coordinate system can be firstly obtained from decoding the original LiDAR data file, while information such as the 3D location of each point in the Cartesian coordinate systems needs further calculation for conversion.
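The second (coordinate-conversion) step can be illustrated with a short sketch in Python. The sign and axis conventions below are one common choice and are an assumption made for illustration rather than the exact convention of any particular sensor model.

```python
import numpy as np

def spherical_to_cartesian(r, omega_deg, alpha_deg):
    """Convert a decoded LiDAR return from spherical (R, omega, alpha) to Cartesian (X, Y, Z).

    r         : 3D distance from the sensor to the object surface (meters)
    omega_deg : elevation (vertical) angle of the laser channel, in degrees
    alpha_deg : azimuth angle of the firing, in degrees
    """
    omega = np.radians(omega_deg)
    alpha = np.radians(alpha_deg)
    x = r * np.cos(omega) * np.sin(alpha)
    y = r * np.cos(omega) * np.cos(alpha)
    z = r * np.sin(omega)
    return x, y, z

# Example: a return 20 m away from a +1 degree channel at azimuth 45 degrees
print(spherical_to_cartesian(20.0, 1.0, 45.0))
```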
LiDAR sensor data can be represented as a point cloud. A data point's (X, Y, Z) Cartesian coordinates and (R, ω, α) spherical coordinates represent the same location of a data point. The 3D Cartesian coordinates of LiDAR sensor data are usually described by a 3D matrix, in which each dimension records the coordinates of the data points along one direction.
The locations that comprise the 3D point cloud can be described by a two-dimensional (2D) matrix/table structure based on the inherent properties of the LiDAR sensors, in which each row of the table indicates each elevation angle/channel of the laser beams; each column of the table represents each azimuth interval of the laser beams during a 0° to 360° scan, and the content of the table is the range value (i.e., the 3D distance between the LiDAR sensor and the data point collected from an object surface). In this manner, the range values of the data points which were measured by the same laser beam at the same azimuth interval are recorded in the same cell of the 2D table.
Given a LiDAR sensor with N laser beams, a rotation frequency of f Hz (i.e., f revolutions per second), and a firing time of t seconds per firing cycle, the horizontal azimuth resolution (ΔAzimuth) of the laser beams can be calculated by:
ΔAzimuth = f (rotations/sec) × 360 (degrees/rotation) × t (seconds/firing cycle)
The size of the 2D table is N×M, where the number of columns M is equal to 360 degrees divided by ΔAzimuth.
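As a concrete illustration of the formula and the resulting table size, the sketch below assumes a hypothetical 16-channel sensor rotating at 10 Hz; the firing-cycle time used here is an assumed value chosen so that ΔAzimuth is approximately 0.2°, which yields the 16×1800 table size referenced later in this disclosure.

```python
import numpy as np

N_CHANNELS = 16         # number of laser beams (table rows)
F_ROTATION = 10.0       # rotation frequency f in Hz (revolutions per second)
T_FIRING = 55.556e-6    # assumed firing-cycle time t in seconds (illustrative value)

# Horizontal azimuth resolution in degrees per firing cycle
delta_azimuth = F_ROTATION * 360.0 * T_FIRING

# Number of azimuth intervals (table columns) in one 360-degree scan
n_columns = int(round(360.0 / delta_azimuth))

# Blank channel-azimuth table: each cell stores the range value (3D distance)
# measured by one laser channel within one azimuth interval.
range_table = np.zeros((N_CHANNELS, n_columns))

print(delta_azimuth, range_table.shape)   # ~0.2 degrees, (16, 1800)
```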
For roadside applications, there is no need to spend processing time and use memory space to calculate and save the Cartesian coordinates of the background objects because the large numbers of points will be deleted after the background filtering process. To address this issue, a decoding-based approach classifies the background points and the target (non-background) points based on the initially obtained channel, azimuth, and range information during the decoding process, and then only the target points are saved for future use.
For roadside applications, background objects can be classified into two categories. The first category includes static background objects such as buildings and ground surfaces, while the second category includes dynamic background objects such as waving bushes and trees. How to accurately capture the locations of background objects is crucial to the quality of background filtering.
With regard to the LiDAR sensor system's working mechanism, there is no guarantee that the laser beams will always shoot out at the exact same location during each rotation of the sensor arrays, and in some cases some laser beams may not return. Therefore, the number of data points collected in each 360° scan is usually not exactly the same every time. Considering the impact of random factors, continuous raw LiDAR data frames were aggregated to find more accurate locations of static background objects. Using the 2D data structure, the range values of all objects from each data frame were aggregated cell by cell to form a new 2D matrix of the same size.
To determine whether data are from static or dynamic objects, all data points in each table cell are segmented into different groups according to a one-dimensional clustering of the aggregated distance values. In this manner, the mean distance, minimal distance, and point count of each group can be obtained. Then, based on the mean distance, the groups in each table cell are sorted in ascending order. If the count value of the last group (i.e., the farthest group from the sensor) is greater than a point threshold, then this group of data points is considered to have been measured from static background objects. In other words, the static background is identified by the farthest cluster that meets the point requirement. For example, the identified static backgrounds in
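A minimal sketch of this per-cell static-background selection is shown below; the gap-based grouping stands in for the one-dimensional clustering step, and the numeric thresholds are illustrative assumptions rather than the values used in the disclosed experiments.

```python
import numpy as np

def static_background_range(cell_ranges, gap=0.3, point_threshold=50):
    """Return the mean range of the static background group in one (channel, azimuth) cell,
    or None if the farthest group does not meet the point-count requirement.

    cell_ranges     : aggregated range values collected in this cell over many frames
    gap             : assumed gap (meters) used to split sorted ranges into groups
    point_threshold : minimum number of points for the farthest group to be treated
                      as static background
    """
    if len(cell_ranges) == 0:
        return None
    r = np.sort(np.asarray(cell_ranges, dtype=float))
    # One-dimensional grouping: start a new group wherever consecutive ranges differ by more than gap
    split_idx = np.where(np.diff(r) > gap)[0] + 1
    groups = np.split(r, split_idx)
    farthest = groups[-1]          # groups are already sorted by distance; take the farthest
    if len(farthest) >= point_threshold:
        return float(farthest.mean())
    return None

print(static_background_range([24.9, 25.0, 25.1] * 20 + [8.2, 8.3], point_threshold=50))
```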
2. Identification of Dynamic Background in 3D Point Cloud Space
After excluding static background objects from the selected data frames of aggregation, the remaining data are composed of target objects, dynamic background objects, and some noise points. Because dynamic background objects often move slightly and in a random fashion, the variation of their measured distances is much larger than that of static background objects, which presents a challenge for filtering. A common method for filtering dynamic backgrounds is to use a simple frequency threshold, which usually leaves dynamic background points in the filtered data. Instead, the following three features can be used to distinguish dynamic background clusters from target object clusters:
Feature 1: the maximal height of the entire cluster.
In general, the heights of dynamic background objects and target objects are different. For example, a vehicle or a pedestrian is higher than bushes, but lower than trees.
Feature 2: the maximal length of the entire cluster along the X- or Y-axis.
Since target objects are moving, the obtained target object clusters are formed along the object's moving direction over time. Dynamic background objects, although they may be waving, do not change location.
Feature 3: the standard deviation of the maximal height of each sliced cluster.
The cluster is cut along the X- or Y-axis, based on Feature 2, to obtain sliced clusters. The maximal height within vehicle clusters will not change dramatically, while background clusters (such as trees) may have quite different heights.
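A minimal sketch of how these three features could be computed from a cluster's (X, Y, Z) point array follows; the slice width, the assumption that Z is measured from the ground plane, and the axis-selection rule are illustrative assumptions.

```python
import numpy as np

def dynamic_background_features(points, slice_width=0.5):
    """Compute Features 1-3 for a cluster given as an (n, 3) array of X, Y, Z coordinates."""
    pts = np.asarray(points, dtype=float)
    max_height = float(pts[:, 2].max())                    # Feature 1 (assumes Z measured from the ground)
    extent_x = float(np.ptp(pts[:, 0]))
    extent_y = float(np.ptp(pts[:, 1]))
    max_length = max(extent_x, extent_y)                   # Feature 2: maximal length along X or Y
    axis = 0 if extent_x >= extent_y else 1                # slice along the longer axis
    edges = np.arange(pts[:, axis].min(), pts[:, axis].max() + slice_width, slice_width)
    slice_heights = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (pts[:, axis] >= lo) & (pts[:, axis] < hi)
        if mask.any():
            slice_heights.append(pts[mask, 2].max())       # maximal height of each sliced cluster
    height_std = float(np.std(slice_heights)) if slice_heights else 0.0   # Feature 3
    return max_height, max_length, height_std
```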
Representation of Static and Dynamic Background in 2D Data Structure
After identifying the static and dynamic backgrounds, attention was directed to generating background tables using the channel-azimuth 2D LiDAR data structure. For a specific LiDAR sensor, a blank 2D background table can be created based on the sensor's configuration and the frequency of rotation. The blank table will first be updated to an initial background table by studying the static background information, and then to a final background table by learning dynamic background information. This final background table is a critical input for background filtering of raw LiDAR data packets.
Under the channel-azimuth 2D LiDAR data structure, the range value of each data point from the identified static and dynamic backgrounds is saved in the corresponding table cell based on its channel and azimuth information. Since all background data need to be filtered, the minimal range value is chosen as the distance threshold for each table cell. For example, the range thresholds of the two static backgrounds shown in
The complete procedure of static and dynamic background identification and background table generation is illustrated by the flowchart in
The goal of background filtering for roadside LiDAR data is to exclude the maximum amount of background objects while at the same time keeping the target objects as complete as possible.
Background Filtering in a 2D LiDAR Data Structure
Background filtering is executed based on a comparison of the distances between raw LiDAR datapoints and pre-saved background datapoints. According to the working principle of LiDAR sensors, whether the target object is measured by positive or negative laser beams, the 3D distance between the target object and the LiDAR sensor is always less than the 3D distance between the background object and the LiDAR sensor if these two objects are within the same azimuth interval and are measured by the same laser beam.
The same principle applies to the laser beams with positive angles: d3=OC (for background object 610) and d4=OD (for target object 608), wherein d4 is less than d3. Therefore, if the distance of a datapoint is less than the distance of the corresponding background datapoint, then this datapoint is considered a target datapoint and should be saved. Additionally, a large portion of the distance threshold values in the final background table are zeros because there are no background objects at those locations. Under such circumstances, if the measured distance of a raw datapoint is greater than zero, this datapoint is considered a target datapoint and saved.
After decoding the original data packet, assume a raw datapoint “A” is measured by a laser beam with a vertical angle ω at an azimuth angle α, and that its distance from the LiDAR sensor is D. Based on the channel and azimuth information, the distance threshold Db of the corresponding background can be found from the final background table. The criteria for determining whether point A is a target datapoint are:
(D<Db) OR (Db=0 AND D>0)
Otherwise, point A is a background datapoint and will be discarded.
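The criterion can be expressed compactly; the following sketch assumes the final background table has already been built and that a threshold of zero marks cells with no background object.

```python
def is_target_point(distance, background_distance):
    """Return True if a decoded raw datapoint should be kept as a target (foreground) point.

    distance            : measured 3D distance D of the raw datapoint
    background_distance : threshold Db from the final background table for the same
                          channel and azimuth interval (0 means no background object there)
    """
    if background_distance == 0:
        return distance > 0                 # no background at this location: any return is a target
    return distance < background_distance   # closer than the background surface: target point
```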
Given a LiDAR frame with K points, in which p% and (1−p%) of the points are target points and background points, respectively, the time for processing each individual data point is composed of: the time t1 to decode the raw data of the point, the time t2 to find the point's location index along each dimension of the data structure, the time t3 to compare the point against the corresponding background threshold, and the time t4 to transform the point's spherical coordinates into Cartesian coordinates.
Using the proposed 2D LiDAR data structure, the total calculation time can be calculated by (as shown in
T2D = K×t1 + K×(2t2) + K×t3 + K×p%×t4 (1)
Using the traditional 3D LiDAR data structure, the total calculation time is (also shown in
T3D = K×t1 + K×t4 + K×(3t2) + K×t3 (2)
Therefore, the quantitative difference between T3D and T2D is:
ΔT=T3D−T2D=K×t2+(1−p%)K×t4 (3)
Where the first term involves finding an additional location index along the third dimension of all data points, and the second term refers to the unnecessary coordinate transformation of background points.
In roadside LiDAR applications, the percentage of target points is usually substantially smaller than that of background points. For example, a 16-channel LiDAR sensor (10 Hz rotation frequency) collects nearly K=30,000 points/frame and only p=5% of them are target points, which means that in the coordinate transformation step alone, the proposed method would be 20 times faster than the traditional 3D structure-based methods (1,500 vs. 30,000 points, because K×p%=1,500 in Eq. (1) and K=30,000 in Eq. (2)).
In order to evaluate accuracy, one 16-laser and two 32-laser LiDAR sensors with 10 Hz rotation frequency manufactured by the Velodyne LiDAR company were installed at three intersections to collect pedestrian and vehicle data. The collected raw data was used to evaluate the background filtering method, and the size of the 2D background table for the 16-laser and for the 32-laser sensor is 16×1800 and 32×1800, respectively. It was shown that the background filtering process disclosed herein achieved an average background filtering rate of 99.74% and an average target retention rate of 98.57% for the 16-laser sensor data. In addition, the method achieved a relatively higher target retention rate (99.05%) while maintaining a considerable background filtering rate (99.77%) for the 32-laser sensor data.
An evaluation of the processing speed indicates that, on average the decoding-based method took only about 0.65 milliseconds to process a single data frame collected by the 16-channel LiDAR sensor, and only took 0.90 milliseconds for the 32-channel LiDAR sensor (10 Hz rotation frequency). With similar background filtering rate and target retention rate, the disclosed method is over 77 (16-channel sensor) and 154 times faster (32-channel sensor; 0.65 ms/frame vs. 50 ms/frame and 100 ms/frame) than other known methods.
In addition to speed and accuracy, data storage is another problem which needs to be considered in practice. A key component of decoding-based background filtering is the 2D background table, which is used throughout the entire process. Saving a 2D table/matrix requires the memory space of:
S1 = M×N×B
where S1 is the total storage space of a 2D table; M is the number of rows in the 2D table (channels); N is the number of columns in the 2D table; and B is the number of bytes in each cell of the 2D table. For example, saving a 2D background table for a 16-laser LiDAR sensor (10 Hz) using the 2D data structure in MATLAB requires 16×1800×8=230,400 bytes=225 KB of memory. Similarly, saving a 2D background table for a 32-laser LiDAR sensor (10 Hz) using the 2D data structure in MATLAB requires 32×1800×8=460,800 bytes=450 KB of memory. Even though the value of N is determined by the sensor's rotation frequency, the required memory has nothing to do with the size of the detection space. Storing a 3D matrix of datapoints would require a much larger memory space, on the order of 1,250,000 KB or 1,220 MB of memory.
The disclosed background filtering method advantageously integrates the background filtering within the process when the original LiDAR data packet is decoded and provides a significant efficiency improvement as compared to conventional methods. A critical component of the method is the creation of a new 2D background table that automatically learns the critical distance information of both static and dynamic backgrounds. Moreover, the present decoding-based method outperforms known methods for roadside LiDAR applications, and also outperforms methods commonly used for onboard LiDAR applications. In addition, the process provides a major breakthrough in processing speed which lays a solid foundation for real-time application of infrastructure-based LiDAR sensors in connected vehicle and infrastructure systems. The new method may also beneficially be integrated with clustering and tracking algorithms.
Object Clustering and Identification
In another aspect, disclosed is a fast clustering method based on a two-dimensional (2D) map process. Datapoint cloud clustering is an important component for processing roadside LiDAR traffic data. The clustering algorithm has a profound effect on the accuracy and efficiency of object detection and further impacts the effectiveness of roadside LiDAR applications such as LiDAR based connected-vehicle systems. In some disclosed embodiments, a fast clustering process is based on a spherical projection map that provides improved detection accuracy and lower computational complexity as compared to conventional methods. The fast clustering method also enables a wider detection range and improved robustness. Reasons for the improvement include the adoption of a 2D searching window and the use of a spherical map which greatly improves the time complexity of a neighborhood query in the clustering process and provides less sensitivity to varying point densities. In some implementations, the Fast Spatial Clustering based on Two-dimensional Map (FSCTD) process may be implemented using the Python coding language or other programming languages.
The first step of FSCTD is to convert 3D coordinates into a 2D map by using spherical projection. As explained earlier, a datapoint in 3D space can be described not only by its XYZ coordinates but also by azimuth α, zenith θ, and distance d, where α and θ refer to the horizontal and vertical angles respectively, and d refers to the distance from the LiDAR sensor to the point. Basically, the idea is to project the distance value of each point onto a 2D map according to the corresponding spherical information: azimuth and zenith. The greatest advantage of converting to a spherical map is that image-processing techniques can be applied to the 3D datapoint cloud.
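A minimal sketch of this spherical projection is shown below; the azimuth bin width, the set of vertical angles, and the nearest-angle row assignment are illustrative assumptions rather than the exact mapping used in any particular implementation.

```python
import numpy as np

def build_spherical_map(azimuth_deg, zenith_deg, distance,
                        zenith_angles, az_resolution=0.2):
    """Project points described by (azimuth, zenith, distance) onto a 2D spherical map.

    azimuth_deg, zenith_deg, distance : 1D arrays, one entry per point
    zenith_angles : sorted 1D array of the sensor's discrete vertical angles (degrees)
    az_resolution : assumed azimuth bin width in degrees
    """
    n_cols = int(round(360.0 / az_resolution))
    spherical_map = np.zeros((len(zenith_angles), n_cols))
    cols = (np.asarray(azimuth_deg) / az_resolution).astype(int) % n_cols
    rows = np.abs(np.asarray(zenith_deg)[:, None] - np.asarray(zenith_angles)[None, :]).argmin(axis=1)
    spherical_map[rows, cols] = distance    # each cell stores the point's range value
    return spherical_map, rows, cols

# Example with a hypothetical 16-channel sensor
vertical_angles = np.linspace(-15.0, 15.0, 16)
m, r, c = build_spherical_map([0.1, 90.4], [-1.0, 7.0], [12.5, 30.2], vertical_angles)
print(m.shape)   # (16, 1800)
```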
The Fast Spatial Clustering based on Two-Dimensional Map (FSCTD) Process
The FSCTD process is based on the known Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method. The DBSCAN process is a prevalent algorithm for performing clustering tasks, as it identifies arbitrarily shaped clusters according to their spatial density while also ruling out noise. In general, two parameters are involved in DBSCAN: “minPts” (the minimum number of sample points needed to establish a core point) and “eps” (the neighborhood searching radius). DBSCAN operates as follows: First, for each point, search the neighborhoods within an eps-radius, and mark those points whose number of neighbors within the eps-radius satisfies the minPts threshold as core points. Second, identify the connected components of the core points based on the direct and indirect interconnection of neighborhoods, ignoring the non-core points. Third, assign each non-core point to a nearby connected component if the non-core point is within the eps-radius; otherwise, label it as noise.
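For reference, the conventional DBSCAN baseline used for comparison later in this disclosure is available in the scikit-learn library; the parameter values below are placeholders for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# points: (n, 3) array of foreground XYZ coordinates remaining after background filtering
points = np.random.rand(1000, 3) * 50.0      # placeholder data for illustration

labels = DBSCAN(eps=1.2, min_samples=15).fit_predict(points)
# labels[i] == -1 marks noise; all other values are cluster (object) indices
print(len(set(labels)) - (1 if -1 in labels else 0), "clusters found")
```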
As suggested in the DBSCAN process, time complexity is composed of two parts: 1) the time for checking the core points, which consumes O(N); and 2) the time for searching the neighborhoods, which consumes O(N) per point in the worst case. If a data structure such as a KD-Tree is applied to improve the neighborhood searching, the time complexity of each query would be improved to O(log(N)). Therefore, the total time complexity ranges from O(N log(N)) to O(N²). For data associated with heavy traffic conditions that involve many moving objects, the large number of foreground points would dramatically increase the time needed for performing the clustering process, which results in a lag that is not optimal for a real-time response in roadside LiDAR applications.
The FSCTD process includes a modification of the neighborhood searching part of the DBSCAN process. Compared to DBSCAN, the FSCTD searches the neighborhoods on the spherical 2D map, which improves efficiency and accuracy.
Referring again to
In this criterion, the term ε is another parameter utilized by the FSCTD process to exclude datapoint readings whose distance deviates too far from that of the center point, which is necessary because even though two points are close to each other on the spherical map they may still be far apart in 3D space. Since the neighborhood query using the 2D window only considers the cells within the window, the time complexity is independent of the whole problem scale N; thus the time complexity for 2D-window searching is O(1) and the whole complexity for FSCTD is O(N).
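A minimal sketch of the 2D-window neighborhood query with the ε deviation test follows; the window dimensions and ε value are illustrative parameters, and the sketch is not the exact implementation tuned in the evaluation below.

```python
def window_neighbors(spherical_map, row, col, win_h=3, win_w=5, eps=0.5):
    """Return the (row, col) cells inside a 2D window centered on (row, col) whose stored
    distances deviate from the center cell's distance by at most eps meters."""
    n_rows, n_cols = spherical_map.shape
    center_d = spherical_map[row, col]
    half_h, half_w = win_h // 2, win_w // 2
    neighbors = []
    for dr in range(-half_h, half_h + 1):
        for dc in range(-half_w, half_w + 1):
            r = row + dr
            c = (col + dc) % n_cols              # azimuth wraps around at 360 degrees
            if 0 <= r < n_rows and spherical_map[r, c] > 0:
                if abs(spherical_map[r, c] - center_d) <= eps:
                    neighbors.append((r, c))
    return neighbors
```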
Another advantage of FSCTD is its robustness to the varying point density problem, which improves both the accuracy and the range of detection.
Use of the FSCTD process avoids this “sparseness” problem because: 1) for objects on the far side, even though the distance between two adjacent points is large, the two points are still near each other on the spherical map due to the rotational characteristic of a LiDAR sensor; and 2) the parameters in FSCTD can be tuned to detect objects on both the near and far sides correctly.
Specifically,
DDBSCAN = dfar² + dnear² − 2×dnear×dfar×cos(Δα) (1)

DFSCTD = dfar − dnear (2)
In this case, Δα is the delta azimuth angle between two horizontal beams, which is a constant equal to 0.2 degrees; hence cos(Δα) is approximately 1 and the term 2×dnear×dfar×cos(Δα) is approximately equal to 2×dnear×dfar. Thereby DDBSCAN can be rewritten as (dfar−dnear)². Assuming that dfar=γ×dnear, then DFSCTD and DDBSCAN can be expressed as:
DDBSCAN = (γ−1)²×dnear² (3)

DFSCTD = (γ−1)×dnear (4)
where γ is an amplification factor, greater than 1, which reflects the heading angle of the object surface. It can be inferred that DFSCTD is less sensitive than DDBSCAN, since DFSCTD increases linearly while DDBSCAN increases quadratically with distance. Thus, FSCTD can more easily identify an appropriate threshold to separate entities on both the far side and the near side because the FSCTD distance measurement only increases by a small extent. For example, the DDBSCAN measurement between adjacent points would increase 10,000 times from 1 m to 100 m, whereas the distance measurement for FSCTD only increases 100 times.
The output of FSCTD is a labeling map with the same size as the spherical map, in which each cell stores the point-wise instance label. The labeling map can be decoded into 3D point coordinates using the spherical information provided on the spherical map.
The FSCTD was tested on a dataset including 18,000 frames (half an hour) of post-filtering point clouds processed by the background filtering method, of which 2,000 frames were manually labeled with point-wise classes (target object or background object).
As mentioned in the methodology, three parameters are involved in the FSCTD process: ε (eps), the size of the 2D window, and minPts. For evaluation purposes, the best combination of different parameters is first tuned to obtain optimal performance. Use of a conventional DBSCAN (provided by the scikit-learn library) was also included for comparison. Since the focus is only on the detection of target objects (all road users), the total classes in the clustered results are composed of dynamic background (noise objects left after background filtering) and target points (foreground) as labeled in the “after” visualization 1604 of
Where Oc and Gc respectively refer to the output and ground truth point sets belonging to class c. The symbol |·| denotes the cardinality (number of points) of the point set.
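The point-level metrics implied by this notation are presumably the standard per-class definitions; a sketch under that assumption is shown below.

```python
def class_metrics(output_points, truth_points):
    """Point-level precision, recall, and IoU for one class c, where output_points and
    truth_points are the sets Oc and Gc of point indices assigned to class c by the
    algorithm and by the ground truth (standard definitions, assumed here)."""
    o, g = set(output_points), set(truth_points)
    inter = len(o & g)
    precision = inter / len(o) if o else 0.0
    recall = inter / len(g) if g else 0.0
    iou = inter / len(o | g) if (o or g) else 0.0
    return precision, recall, iou

print(class_metrics({1, 2, 3, 4}, {2, 3, 4, 5}))   # (0.75, 0.75, 0.6)
```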
The parameter searching range is presented in the following table, which was determined empirically.
The tuning starts from the parameter minPts.
With minPts set to 15, the parameter setting of eps was also tested, and the graphs shown in
Next, the size of the searching window (Window Width) was tested, which includes two parameters: height and width. The window size significantly affects the searching range and the computation load, and thus it was tested in association with time consumption. FIG. 19 and
Based on the systematic analysis, the optimal parameters for FSCTD are summarized in Table 2, and the same analysis was performed for DBSCAN and is also shown in Table 2, below.
An overall summary of metrics based on the best parameter settings is shown in Table 3, below. Besides, an instance-level error between the number of detected objects and the ground truth is introduced to adequately represent the performance. The instance-level error is calculated by counting the number of instances that have an instance-IoU over 50% between the output and the ground truth. It can be seen that the FSCTD process outperforms traditional DBSCAN on all metrics. The precision, recall, and IoU for target points reach 98.2%, 98.3%, and 96.5%, respectively. Based on the metrics for background points, over 99% of the dynamic background points could be ruled out by both algorithms. Of 1,511 ground-truth instances in total, 1,483 were correctly detected by FSCTD with over 50% IoU, while the traditional DBSCAN correctly handled only 1,414, with lower IoU. Most importantly, the FSCTD process is faster and more robust as compared to DBSCAN: FSCTD on average takes only 24.4 ms to process a frame, whereas, on the same dataset, DBSCAN takes 117.1 ms per frame. Also, the standard deviation of the time consumption for FSCTD is 2.36 but is 6.59 for DBSCAN, indicating that the FSCTD process is more robust.
The major reason for the accurate results can be attributed to solving the varying point density problem by using the TD-Map structure and 2D-window searching, as explained above.
For roadside LiDAR traffic data processing, robustness to variant computation loads is important since the number of foreground points can periodically surge due to the nature of traffic along a road or intersection. To address this concern, the impact of the number of foreground points on algorithm efficiency was tested, and
(It should be noted that the DBSCAN process can outperform the FSCTD process in a situation wherein the number of foreground points is less than 2000, because the FSCTD process consumes more of a computational load in the preprocessing steps, such as when establishing the TD-Map. As a consequence, the time for preprocessing exceeds the time for conducting the clustering process. Nevertheless, the FSCTD process still outperforms the DBSCAN process when it comes to both efficiency and robustness in over 90% of situations.)
The disclosed fast clustering process (FSCTD) efficiently and accurately clusters variant-density points by utilizing the spherical projected TD-Map and 2D-window searching to accelerate the clustering process and to advantageously alleviate the impact of the variant-density phenomenon. Test results showed that: 1) the FSCTD process achieves results within 30 ms/frame for the most common scenes (foreground points within 5000); and 2) over 96% of objects have over 50% IoU with instance-level ground truths, with over 98% in both precision and recall metrics. In addition, as compared to conventional DBSCAN use, the FSCTD process outperformed DBSCAN in all metrics, with a processing speed that was 4.8 times faster and a higher overall clustering accuracy of over 96%. In the time complexity analysis, the FSCTD process was not greatly impacted by the surge of foreground points, and thus in over 90% of scenes the processing time would be maintained within 40 ms. Additionally, a wider detection range of up to 200 meters from the LiDAR sensor was identified, which is 33% longer than that of the conventional DBSCAN process. Accordingly, the high performance results of the FSCTD process can be beneficially used in practice to improve the roadside LiDAR sensor detection process. Moreover, the background filtering method disclosed herein (which is based on spherical mapping) and a tracking algorithm can be added before and/or after use of the FSCTD process when processing roadside LiDAR sensor data. In addition, it may be possible to further accelerate the FSCTD process by utilizing additional coding optimizations.
New Object Classification Process
Another aspect associated with processing roadside LiDAR sensor data is a high-accuracy, feature-based road user classification process. The high-accuracy process disclosed herein utilizes prior trajectory information to classify vehicles, pedestrians, cyclists and wheelchairs. By updating significant features based on prior information of the entire trajectory, more critical features can be used as the input(s) of classifiers to increase classification accuracy. In an implementation, the process includes updating critical features based on prior trajectory information, which greatly improves the accuracy of classification, especially for classes having a small number of observations.
There is limited information available concerning classification of objects using roadside LiDAR, but a known process includes using an artificial neural network (ANN) classifier with three features (i.e., the number of points, the distance to LiDAR, and spatial distribution direction) extracted from a 16-laser LiDAR sensor to distinguish vehicles and pedestrians. Even though such LiDAR sensor systems have been shown to achieve 96% classification accuracy within a 30 meter (m) detection range, it is difficult for the trained classifier to further classify other types of road users such as cyclists and wheelchairs. Accordingly, weaknesses of existing roadside LiDAR classifiers include the short effective detection range and the limited types of road users that can be classified, and such short detection ranges and classification limitations fail to meet the detection and classification requirements necessary for multimodal traffic detection using roadside LiDAR sensors.
LiDAR Sensor
As mentioned above, LiDAR sensors use a wide array of infra-red lasers paired with infra-red detectors to measure distances to objects, and such sensors are typically securely mounted within a compact, weather-resistant housing. The array of laser/detector pairs spins rapidly within its fixed housing to scan the surrounding environment and provide a rich set of point data in real time. LiDAR sensor configuration considerations include the number of channels, the vertical field of view (FOV), and the vertical resolution of the laser beams. In general, LiDAR sensors with more laser channels, larger vertical FOV and smaller vertical resolution are more productive in collecting traffic data. Installation considerations such as height and inclination of the LiDAR sensors also affect detection performance.
A study was conducted using the Ultra Puck (VLP-32C) LiDAR sensors manufactured by the Velodyne company. The Ultra Puck is a 360-degree LiDAR sensor with 32 laser beams and has a vertical FOV of −15° to +25°, but the distribution of the laser beams' vertical resolution is non-linear. A detailed configuration of the Ultra Puck LiDAR sensor is listed in Table A1 below.
Feature Selection
For supervised classification tasks, feature selection is a critical step in training classifiers. Good features should be able to effectively distinguish different classes and should be easily obtained from datasets. LiDAR sensors are good at accurately capturing the surface shape of objects; therefore, seven features can be extracted (five of them related to dimensionality) from the point cloud of clusters for vehicle/pedestrian/cyclist/wheelchair classification.
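A sketch of cluster-level features of the kind described in this section (the seven features summarized later include distance, point count, direction, height, height variance, 2D length, and 2D area) is provided below; the exact definitions used in the disclosed experiments may differ, so the formulas here are illustrative assumptions.

```python
import numpy as np

def extract_cluster_features(points, sensor_xy=(0.0, 0.0)):
    """Compute illustrative cluster-level features from an (n, 3) array of X, Y, Z points."""
    pts = np.asarray(points, dtype=float)
    xy = pts[:, :2]
    centroid = xy.mean(axis=0)
    distance = float(np.hypot(centroid[0] - sensor_xy[0], centroid[1] - sensor_xy[1]))
    point_count = len(pts)
    # Direction: orientation of the principal axis of the XY footprint
    cov = np.cov((xy - centroid).T)
    _, eigvecs = np.linalg.eigh(cov)
    direction = float(np.degrees(np.arctan2(eigvecs[1, -1], eigvecs[0, -1])))
    height = float(np.ptp(pts[:, 2]))
    height_variance = float(np.var(pts[:, 2]))
    length = float(max(np.ptp(xy[:, 0]), np.ptp(xy[:, 1])))    # 2D length of the bounding box
    area = float(np.ptp(xy[:, 0]) * np.ptp(xy[:, 1]))          # 2D area of the bounding box
    return [distance, point_count, direction, height, height_variance, length, area]
```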
Significant Features
With regard to analyzing the importance of features, when training with different classifiers, even though the ranking of feature significance was different, Feature 6 (2D length) always ranked first among the seven proposed features. For example,
Feature 6 (2D length) is calculated based on the dimension of the generated bounding box that covers the projection of clusters on the XY plane, but it cannot stably reflect the true dimension of detected objects. This is because the number of data points collected by roadside LiDAR sensors is affected by factors such as occlusion issues, the sensors' field of view, the relative position and height of the LiDAR sensors and target objects, and similar considerations. In addition, the direction of movement of a road user relative to the scanning direction of the laser beams of the LiDAR sensor also affects the detection quality. In particular, more data points will be collected for an object moving perpendicular to the LiDAR sensor than for an object moving parallel to it, and more data points mean that a more comprehensive description can be obtained for the object. In other words, if LiDAR sensors can scan the side of an object, the detected length value is closer to the real size of the object. However, if LiDAR sensors can only scan the front or back of an object, the reliability of the detected length is low.
A detailed analysis of
In
Another characteristic of the Distance-Length distribution is that, for each individual road user, the detected length of the object will fluctuate as the object moves. This characteristic indicates that when an object is moving relative to the LiDAR sensor, the detected dimension (e.g., length) of the same object is also changing and there may be some outliers.
Four supervised classification methods were utilized: artificial neural network (ANN), random forest (RF), adaptive boosting (AdaBoost), and random undersampling boosting (RUSBoost) for vehicle/pedestrian/cyclist/wheelchair classification. A brief description of each is provided below.
Artificial Neural Network (ANN)
ANN is a multilayer feedforward neural network composed of an input layer, a hidden layer, an output layer, and neurons in each layer. Input data are fed into the input layer. The activity of each hidden layer is determined by the inputs and the weights that connect the input layer and the hidden layer. A similar process takes place between the hidden layer and the output layer, and the transmission from one neuron in one layer to another neuron in the next layer is independent. The output layer produces the estimated outcomes. The comparison information (error) between the target outputs and the estimated outputs is fed back to the input layer as guidance to adjust the weights in the next training round. Through each iteration, the neural network gradually learns the inner relationship between the input and the output by adjusting the weights for each neuron in each layer to produce the most accurate output. When the minimum error is reached, or the number of iterations exceeds a predefined limit, the training process is terminated with fixed weights.
Random Forest (RF)
Random forest is an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, each individual decision tree in the random forest outputs a class prediction and the class with the most votes becomes the output of the random forest. Random forest essentially enables a large number of weak or weakly-correlated classifiers to form a strong classifier, and generally outperforms decision trees, but its accuracy is lower than that of gradient boosted trees. In addition, the computational cost of running random forest on large datasets is low.
Adaptive Boosting (AdaBoost)
AdaBoost is an ensemble learning method which was initially created to increase the efficiency of binary classifiers. AdaBoost uses an iterative approach to learn from the mistakes of weak classifiers, increase weights for misclassified observations and reduce the weights for correctly classified observations, and turn weak classifiers into strong ones. In terms of AdaBoost for multiclass classification, instead of weighted classification error, weighted pseudo-loss is used as a measure of the classification accuracy from any learner in an ensemble. AdaBoost is fast, simple, and easy to implement and has the flexibility to be combined with other machine learning algorithms, but it is sensitive to noisy data and outliers.
Random Undersampling Boosting (RUSBoost)
Random undersampling boosting is especially effective at classifying imbalanced data, meaning some classes in the training data have much fewer members than others. The algorithm takes N, the number of observations in the class with the fewest observations in the training data, as the basic unit for sampling. Classes with more observations are undersampled by taking only N observations from every class. In other words, if there are K classes, for each weak learner in the ensemble, RUSBoost takes a subset of the data with N observations from each of the K classes. The boosting procedure follows the procedure in AdaBoost for multiclass classification for reweighting and constructing the ensemble.
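For illustration, the sketch below trains two of the four classifier types on the seven cluster features using the scikit-learn library; the library choice, hyperparameters, and placeholder data are assumptions, not the configuration used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# X: one row of seven cluster features per observation; y: class labels
# (0 = vehicle, 1 = pedestrian, 2 = cyclist, 3 = wheelchair). Placeholder data shown here.
X = np.random.rand(2000, 7)
y = np.random.randint(0, 4, size=2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for clf in (RandomForestClassifier(n_estimators=200, random_state=0),
            AdaBoostClassifier(n_estimators=200, random_state=0)):
    clf.fit(X_train, y_train)
    per_class_recall = recall_score(y_test, clf.predict(X_test), average=None)
    print(type(clf).__name__, per_class_recall)
```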
Road User Data Example
In order to collect road user data, two “Ultra Puck” LiDAR sensors manufactured by the Velodyne company, each having a 10 Hertz (Hz) frequency of rotation, were installed at two intersections (Site 1 and Site 2). At Site 1, roadside LiDAR sensor data of vehicles, pedestrians, and cyclists was collected by portable roadside LiDAR equipment for three days from 7:00 am to 8:00 pm each day. At Site 2, roadside LiDAR sensor data of vehicles, pedestrians, cyclists, and wheelchairs was collected for a two day period and then for a five day period several months later. A “senior center” and a Walmart were located on the opposite side of a selected intersection, and thus some wheelchair activity data was collected within a certain period of time at Site 2.
Due to the characteristics of traffic at the selected intersections of Site 1 and Site 2, the number of vehicles detected was much larger than the number of objects in the other three classes. In order to include more observations of cyclists and wheelchairs in the training and testing datasets, data was selected during certain time periods from Site 2 after watching a raw LiDAR sensor data stream. Since the trajectory information of road users will be used to increase the classification accuracy in an offline manner, the length of continuous trajectories should meet certain standards (e.g., greater than 20 data frames, 0.1 second/frame). To train a classifier, the whole dataset including four categories was divided into a training dataset (70%) and a testing dataset (30%), as shown in Table 5.1 below. It is obvious that, compared with vehicles and pedestrians, the numbers of cyclists and wheelchairs are much smaller, which makes classification with such imbalanced datasets challenging.
Classification Without Considering Prior Trajectory Information
In this section, the seven features (introduced above) extracted from the training dataset were used as inputs to the four selected classifiers (ANN, RF, AdaBoost, and RUSBoost), and then the testing dataset was used to evaluate the performance of the trained classifiers. To measure the quality of the classifiers, the recall of each class using different classifiers was calculated. Mathematically, recall is defined as follows:
Recall=TP/(TP+FN)

where TP is the number of true positives and FN is the number of false negatives.
According to Table 5.2(a) below, the vehicle's recall and pedestrian's recall reached 99% and 97% using ANN, RF, and AdaBoost classifiers, but the cyclist's recall was below 50%. Through dealing with imbalanced data, the RUSBoost classifier can improve the cyclist's recall to 66% but it also reduces the pedestrian's recall to 91%.
Classification Considering Prior Trajectory Information
Taking the modified significant features as input, the recalls of four types of road users using the same classification methods are listed in Table 5.2(b) below. Table 5.2(b) shows that the recall rates of cyclist and wheelchair were all above 99% when using AdaBoost and RUSBoost classifiers, which was a huge improvement compared with the previous results in Table 5.2(a). This indicated the effectiveness of updating significant features by considering prior trajectory information to increase classification accuracy.
Sensitivity Analysis
As a result of the misclassification example explained above in Section 5.2, it was determined that the length of trajectories must be greater than 20 frames, and Feature 6 was updated to the maximum length within the entire trajectory of each object. In this regard, the question of how the trajectory length affects the classification accuracy is discussed herein.
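A minimal sketch of this feature update is shown below; the record layout (a dictionary per detection with 'traj_id' and 'length' keys) is an assumption made for illustration.

```python
from collections import defaultdict

def update_length_with_trajectory(observations, min_frames=20):
    """Replace each observation's per-frame 2D length with the maximum length observed
    over that object's entire trajectory, keeping only trajectories of at least
    min_frames frames."""
    by_traj = defaultdict(list)
    for obs in observations:
        by_traj[obs['traj_id']].append(obs)
    updated = []
    for frames in by_traj.values():
        if len(frames) < min_frames:
            continue                            # trajectory too short to be used
        max_length = max(f['length'] for f in frames)
        for f in frames:
            new_f = dict(f)
            new_f['length'] = max_length        # Feature 6 updated with prior trajectory information
            updated.append(new_f)
    return updated
```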
Trajectories longer than 10 data frames and 30 data frames, respectively, were selected and the process of feature selection and classifier's training and testing were repeated. The recall rates of vehicle, pedestrian, cyclist, and wheelchair classification with and without considering historical trajectory are listed below in Table 5.3.
Comparing the recall rates obtained from trajectories of different lengths in Table 5.2(b) and Table 5.3, the conclusions can be summarized as follows:
Thus, disclosed herein is a feature-based classification method which has been combined with prior trajectory information to beneficially improve the classification of vehicles, pedestrians, cyclists, and wheelchairs using roadside LiDAR sensor data. In some embodiments, seven features including distance, point count, direction, height, height variance, 2D length and 2D area extracted from road user clusters were utilized along with four classification algorithms including ANN, RF, AdaBoost and RUSBoost. The updating of critical features based on prior information of the entire trajectory advantageously improved the classification accuracy. The accuracy of using four classifiers to distinguish four types of road users before and after integration with prior trajectory information has been illustrated herein, with example results showing that training AdaBoost and RUSBoost classifiers with prior trajectory information can achieve recall rates for vehicles of 100%, for pedestrians of 99.96%, for cyclists of 99.74%, and for wheelchairs of 99.43% using 32-laser roadside LiDAR sensor data. The trained classifiers can also be used at different sites, which is highly advantageous, and the high classification accuracy of the disclosed methods lays a solid foundation for processing roadside LiDAR sensor data, especially because an advantage of roadside LiDAR is that it can record a large number of historical trajectories of all road users at a fixed location.
The traffic data processing computer 4000 may constitute one or more processors, which may be special-purpose processor(s), that operate to execute processor-executable steps contained in non-transitory program instructions described herein, such that the traffic data processing computer 4000 provides desired functionality.
Communication device 4004 may be used to facilitate communication with, for example, electronic devices such as roadside LiDAR sensors, traffic lights, transmitters and/or remote server computers and the like devices. The communication device 4004 may, for example, have capabilities for engaging in data communication (such as traffic data communications) over the Internet, over different types of computer-to-computer data networks, and/or may have wireless communications capability. Any such data communication may be in digital form and/or in analog form.
Input device 4006 may comprise one or more of any type of peripheral device typically used to input data into a computer. For example, the input device 4006 may include a keyboard, a computer mouse and/or a touchpad or touchscreen. Output device 4008 may comprise, for example, a display screen (which may be a touchscreen) and/or a printer and the like.
Storage device 4010 may include any appropriate information storage device, storage component, and/or non-transitory computer-readable medium, including combinations of magnetic storage devices (e.g., magnetic tape and hard disk drives), optical storage devices such as CDs and/or DVDs, and/or semiconductor memory devices such as Random Access Memory (RAM) devices and Read Only Memory (ROM) devices, as well as flash memory devices. Any one or more of the listed storage devices may be referred to as a “memory”, “storage” or a “storage medium.”
The term “computer-readable medium” as used herein refers to any non-transitory storage medium that participates in providing data (for example, computer executable instructions or processor executable instructions) that may be read by a computer, a processor or a like device. Such a medium may take many forms, including but not limited to, non-volatile media and volatile media. Non-volatile media may include, for example, optical or magnetic disks and other persistent memory. Volatile media may include dynamic random access memory (DRAM), which typically constitutes the main memory. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, a solid state drive (SSD), any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
Various forms of computer readable media may be involved in providing sequences of computer processor-executable instructions to a processor. For example, sequences of instruction (i) may be delivered from RAM to a processor, (ii) may be wirelessly transmitted, and/or (iii) may be formatted according to numerous formats, standards or protocols, such as Transmission Control Protocol, Internet Protocol (TCP/IP), Wi-Fi, Bluetooth, TDMA, CDMA, and 3G.
Referring again to
The storage device 4010 may also include one or more traffic data database(s) 4018 which may store, for example, prior traffic trajectory data and the like, and which may also include computer executable instructions for controlling the traffic data processing computer 4000 to process sensor data and/or information to make the classification of roadside users possible in a manner that can be transmitted to connected vehicles and/or for further study of road use by users such as vehicles, cyclists, pedestrians and wheelchair users. The storage device 4010 may also include one or more other database(s) 4020 and/or have connectivity to other databases (not shown) which may be required for operating the traffic data processing computer 4000.
Application programs and/or computer readable instructions run by the traffic data processing computer 4000, as described above, may be combined in some embodiments, as convenient, into one, two or more application programs. Moreover, the storage device 4010 may store other programs or applications, such as one or more operating systems, device drivers, database management software, web hosting software, and the like.
As used herein, the term “computer” should be understood to encompass a single computer or two or more computers in communication with each other.
As used herein, the term “processor” should be understood to encompass a single processor or two or more processors in communication with each other.
As used herein, the term “memory” should be understood to encompass a single memory or storage device or two or more memories or storage devices.
As used herein, a “server” includes a computer device or system that responds to numerous requests for service from other devices.
The above descriptions and illustrations of processes herein should not be considered to imply a fixed order for performing the process steps. Rather, the process steps may be performed in any order that is practicable, including simultaneous performance of at least some steps and/or omission of steps.
Although the present disclosure has been described in connection with specific example embodiments, it should be understood that various changes, substitutions, and alterations apparent to those skilled in the art can be made to the disclosed embodiments without departing from the spirit and scope of the disclosure.
This patent application claims the benefit of U.S. Provisional Patent Application No. 63/263,518 filed on Nov. 4, 2021, the contents of which are hereby incorporated by reference for all purposes.