The subject matter disclosed herein relates to data processing, and in particular to identifying one or more objects in an environment of another object (e.g., a vehicle), and more particularly, to generating accurate bounding box estimation in multi-radar systems of vehicles for identification of objects.
Autonomous vehicles are becoming a more frequent occurrence on the road. Such vehicles are typically equipped with a variety of sensors, cameras, and other devices that allow the vehicle to determine its position, location, heading direction, and detect various objects around it. Typically, autonomous vehicles include LiDAR sensors which are light based sensors that measure reflection of a light signal to determine a geometry of a scene. However, such sensors as well as cameras are unable to provide proper sensing in bad weather or other adverse conditions. This may result in undesired consequences, including fatal accidents. Thus, it is important to equip autonomous vehicles with an ability to “see” the environment in all weather or other undesired conditions.
In some implementations, the current subject matter relates to a computer-implemented method for detecting presence of an object. The method may include receiving one or more signals reflected by one or more second objects, the signals being received by one or more radar sensors positioned on one or more first objects; generating, based on the one or more received signals, one or more representations, one or more portions of the generated representations corresponding to the one or more received signals; generating, using the one or more representations, one or more virtual enclosures encompassing the one or more second objects; and detecting, using the generated one or more virtual enclosures, a presence of the one or more second objects.
In some implementations, the current subject matter may include one or more of the following optional features. The radar sensors may be positioned on the vehicle a predetermined distance apart. The radar sensors may include two radar sensors.
In some implementations, the radar sensors may include a plurality of radar sensors. At least one radar sensor in the plurality of radar sensors may be configured to receive a signal transmitted by at least another radar sensor in the plurality of radar sensors. Further, at least a portion of the plurality of radar sensors may be time-synchronized.
In some implementations, one or more generated representations may include one or more point clouds. One or more portions of the generated representations may include one or more points in the point clouds. In some implementations, the method may also include filtering the generated point clouds to remove one or more points corresponding to one or more noise signals in the received signals, and generating, using the filtered point clouds, one or more virtual enclosures encompassing the second objects.
In some implementations, the generating of the point clouds may include generating one or more cross potential point clouds by combining one or more point clouds generated using signals received by each radar sensor. Generation of one or more cross potential point clouds may include clustering at least a portion of the point clouds using a number of points corresponding to at least a portion of the received signals being received from the same scattering region of a second object, generating one or more clustered point clouds, combining at least a portion of the clustered point clouds based on a determination that at least a portion of the clustered point clouds is associated with the second object and determined based on signals received from different radar sensors, and generating the cross potential point clouds.
In some implementations, the filtering may include removing one or more noise signals in the received signals received by each radar sensor. The filtering may include removing one or more noise signals in the received signals using one or more predetermined signal to noise ratio thresholds.
In some implementations, the generation of one or more object enclosures may include generating one or more anchor enclosures (e.g., anchor boxes) corresponding to each point in the point clouds. Generation of one or more anchor enclosures may include extracting, using the anchor enclosures, a plurality of features corresponding to the second objects, and determining, based on the extracting, a single feature representative of each anchor enclosure. Generation of one or more object enclosures may include predicting one or more object enclosures using the determined single feature of each anchor enclosure, associating a confidence value with each predicted object enclosure in the predicted object enclosures, and refining, based on the associated confidence value, one or more parameters of each predicted object enclosure to generate one or more virtual enclosures.
In some implementations, the object enclosure may include at least one of the following: a three-dimensional object enclosure, a two-dimensional object enclosure, and any combination thereof. The one or more virtual enclosures may include at least one of the following parameters: a length, a breadth, a height, one or more center coordinates, an orientation angle, and any combination thereof.
In some implementations, at least one of the first and second objects may include at least one of the following: a vehicle, an animate object, an inanimate object, a moving object, a motionless object, a human, a building, and any combination thereof.
In some implementations, the presence of an object may include at least one of the following: a location, an orientation, a direction, a position, a type, a size, an existence, and any combination thereof of the one or more second objects.
In some implementations, one or more second objects may be located in an environment of the one or more first objects. The presence of one or more second objects may be determined in the environment of the one or more first objects.
Implementations of the current subject matter can include, but are not limited to, systems and methods consistent with the present description, including one or more of the features described herein, as well as articles that comprise a tangibly embodied machine-readable medium operable to cause one or more machines (e.g., computers, etc.) to result in operations described herein. Similarly, computer systems are also described that may include one or more processors and one or more memories coupled to the one or more processors. A memory, which can include a computer-readable storage medium, may include, encode, store, or the like one or more programs that cause one or more processors to perform one or more of the operations described herein. Computer implemented methods consistent with one or more implementations of the current subject matter can be implemented by one or more data processors residing in a single computing system or multiple computing systems. Such multiple computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.
The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims. While certain features of the currently disclosed subject matter are described for illustrative purposes in relation to identification of objects within an environment, it should be readily understood that such features are not intended to be limiting. The claims that follow this disclosure are intended to define the scope of the protected subject matter.
The accompanying drawings, which are incorporated in and constitute a part of this specification, show certain aspects of the subject matter disclosed herein and, together with the description, help explain some of the principles associated with the disclosed implementations. In the drawings,
One or more implementations of the current subject matter relate to methods, systems, articles of manufacture, and the like that may, among other possible advantages, provide an ability to identify one or more objects in an environment of another object (e.g., a vehicle), and in particular, to generate accurate bounding box estimation in multi-radar systems of vehicles for identification of objects. While the following description will refer to identification of objects in an environment of a vehicle, it should be understood that the current subject matter is not limited to that and is applicable to any type of object that may be configured to perform determination/identification of other objects within its environment. The term(s) vehicle(s) is used for ease of discussion and illustration purposes only, but is not intended to limit the scope of the current subject matter and the claims.
Autonomous perception (such as in automotive systems) requires high-quality environment sensing in the form of 3D bounding boxes of dynamic objects. Primary sensors used in such automotive systems are typically light-based cameras and LiDARs. However, these sensors are known to fail in adverse weather conditions. Radars can potentially solve this problem as they are typically not affected by such adverse weather conditions. However, the wireless signals used in radars undergo predominantly specular reflections, which can degrade the quality of radar point clouds because of the resulting lack of resolution.
In some implementations, the current subject matter relates to a system (including any associated methods, articles of manufacture, etc.) that may be configured to combine data from one or more spatially separated radars with an optimal separation to resolve the above problems. The current subject matter may be configured to implement cross potential point clouds, which may use spatial diversity induced by multiple radars and resolve a problem of noise and/or sparsity in radar point clouds. Moreover, the current subject matter may be configured to include a deep learning architecture that may be configured for radar's sparse data distribution to enable accurate 3D bounding box estimation. The current subject matter's spatial techniques may be considered fundamental to radars' point cloud distributions and may be beneficial to various radar sensing applications.
As stated above, autonomous vehicles typically require a high-quality geometric perception of a scene in which they are navigating, even in adverse weather conditions. Most of the conventional computer vision algorithms and/or data-driven techniques rely on high resolution, multi-channel LiDARs to construct accurate 3D bounding boxes for dynamic objects. LiDAR is a light-based sensor which measures a reflection of a light signal to perceive a geometry of a scene, thereby creating one or more dense 3D point clouds. However, LiDAR cannot penetrate through adverse conditions (e.g., fog, dust, snow blizzard, etc.), thus causing the sensor to fail. In contrast, radars may provide a robust sensing solution: they transmit millimeter waves (mmWaves) and remain less affected by adverse weather conditions. The wavelength of mmWaves allows them to easily pass through such weather conditions, e.g., fog, dust, and/or other microscopic particles.
Although radars are reliable in all weather conditions, they need to provide LiDAR-like high-quality perception performance to enable adverse weather perception. However, a challenge with the radar is that it cannot generate dense and/or uniform point clouds, like LiDAR. One of the reasons for that is that an automotive radar emits mmWave signals, which, unlike light signals that scatter in every direction, specularly reflect off surfaces, thereby allowing only a fraction of incident waves to travel back to the radar receiver.
To address the above issues with current radar systems, in some implementations, the current subject matter relates to a system (as well as associated methods and/or articles of manufacture) that may enable one or more radars to overcome the challenges posed by specular reflections, sparsity, and noise in the radar point clouds, and to provide high-fidelity perception of the scene using 3D bounding boxes. The current subject matter may be configured to include one or more low-resolution radars that may be positioned in an optimal fashion to maximize spatial diversity and/or scene information that may be perceived by the radar(s). The current subject matter may further implement a multi-radar fusion process (together with spatial diversity) to resolve the problem of specular reflections, sparsity, and/or noise in radar point clouds. The current subject matter also may be configured to enable detection of multiple dynamic objects in the scene, with their accurate location, orientation, and/or 3D dimensions. Further, the current subject matter may enable such perception in inclement weather to allow radar(s) to function as a sensor for autonomous perception.
To overcome specular reflections, the current subject matter may be configured to use one or more (e.g., two) radars that may be positioned at spatially separated locations overlooking the same scene and illuminate an object in the scene from different viewpoints. For example, the radars may be positioned a predetermined distance away from each other. This, in turn, may increase a probability of receiving a reflection back from multiple points/surfaces of the object, which a conventional single radar may have missed (as shown in
In some implementations, to address the noise problem, the current subject matter may be configured to execute a multi-radar fusion process that reduces one or more noise points to enable accurate detection of multiple dynamic objects. Spatial diversity generated by multiple radars may be used to reduce the noise and/or enhance points corresponding to actual dynamic objects (e.g., to eliminate noise). In some cases, the point clouds collected by each radar of the multiple radars may be translated to a common frame of reference and then combined to densify the radar point cloud. However, this approach may add up the noise points and might not reduce noise and/or may miss out on important information/data encoded in the spatial locations of the radars.
It is noted that across multiple viewpoints (e.g., radars), noise points may appear independent of each other in space and points belonging to actual surface/object may appear at nearby location consistently in most of the views (e.g., radars). To best leverage this observation, the current subject matter may be configured to generate a space-time coherence based framework for combining of 3D point clouds from multiple radars. The output of the framework may include a representation of cross potential point clouds that may include information regarding confidence of each point coming from an actual object as a soft probability value, along with one or more (or all) properties of a point cloud.
Using the knowledge of confidence estimates for the points, the current subject matter may be configured to determine whether they belong to objects or noise. However, identification of all relevant points out of noise might not be sufficient for a multi-object 3D bounding box estimation. First, depending on the distance, orientation, and/or exposed surface of an object, only a limited set of points might be captured by the radar. Second, in a scene with multiple objects, precise 3D bounding box estimation may require segmenting out the points belonging to each object. This may result in an uncertainty of the exact orientation and/or location of the bounding box.
Some approaches to solve for uncertainty may include designing hand-crafted features by taking into account the shape and size of the vehicles and all possible orientations. However, such an approach is not trivial because crafting features that can incorporate all possible cases may be very challenging.
In some implementations, to address the above problems, the current subject matter may be configured to execute a data-driven deep learning-based process to perform precise 3D bounding box estimation that leverages the sparsity of cross potential point clouds. This process may be configured to combine point cloud segmentation and 3D bounding box location estimation in space, and perform a region of interest (RoI) based classification. However, picking RoIs uniformly throughout the 3D space is not computationally feasible. Thus, the current subject matter may be configured to define a unique set of anchor boxes that allow iteration over all possible configurations of bounding boxes over sparse point clouds. The set of anchor boxes may exhaustively cover all configurations while efficiently reducing the search space.
In some exemplary, experimental configurations, the current subject matter was able to achieve a median error of less than 37 cm in localizing a center of an object bounding box, and a median error of less than 25 cm in estimating dimensions of the bounding boxes. Moreover, the current subject matter was able to achieve an overall mean-average precision (mAP) score (corresponding to an area under the precision-recall (PR) curve, which is a measure of the number of actual boxes detected (recall) along with the accuracy of detections (precision)) of 0.67 with an IoU (i.e., a Jaccard index corresponding to a measure of overlap between predicted bounding box and ground truth box) threshold of 0.5 and 0.94 with an IoU threshold of 0.2 for estimating 3D bounding box, which is comparable to existing bounding box estimation techniques that use LiDARs. Further, using the current subject matter's approach, the mAP values increase to 0.67 compared to 0.45 for a single radar system. This means that the current subject matter provides a performance improvement of 48% with its multi-radar fusion compared to a single radar. Moreover, the current subject matter may be configured to make inference at a frame rate of 50 Hz which is greater than the real-time requirements.
In some implementations, the current subject matter may be configured to provide a framework for radar perception that may leverage spatial diversity induced by multiple radars and optimize their separation, to counter the challenge of specular reflections in mmWave radars. The cross potential point clouds may utilize space-time coherence on point clouds from multiple radars and reduce the noise in radar point clouds, thereby increasing signal quality. The current subject matter's deep learning framework may be configured to leverage the non-uniform distribution of radar point clouds and estimate precise 3D bounding boxes on cross potential point clouds, while addressing the challenges of specular reflections, radar clutter and noise, as well as sparsity.
With regard to specular reflections, for an incident electromagnetic wave on a surface, the size of its wavelength compared to the roughness of the object's surface determines the degree of scattering of the wave. mmWaves undergo a negligible scattering effect, resulting in a specular reflection (angle of incidence equals angle of departure) from the surfaces. Consequently, for a small aperture radar, a lot of reflected signal does not make its way back to the sensor, causing blindness of the objects. The blindness may be independent of the resolution capabilities of sensors.
Radar detections are commonly known to be polluted by signals from clutter, noise, and multi-path effects. Radar clutter is defined as the unwanted echoes from the ground or other objects like insects that can be confused with the objects under consideration. In a congested environment like cities, a signal emitted by a radar sensor could suffer multiple reflections before coming back to the sensor. The result is the formation of ghost objects, which are reflections of actual objects in some reflector formed because of multipath.
Outdoor scene point clouds are inherently sparse due to the empty volume between the objects, which are at a substantial distance from each other. Additionally, due to different interaction properties of mmWaves with different objects (non-uniform interactions), this effect is compounded in the case of mmWave radars. The result is a sparse and non-uniform point cloud.
As stated above, the current subject matter may resolve each of the above challenges to provide accurate bounding boxes by combining multiple radar fusion with a noise filtering algorithm to estimate the bounding boxes.
One or more of the elements 202-214 may include a processor, a memory, and/or any combination of hardware/software, and may be configured to generate one or more bounding enclosures or “boxes” (referred to herein as a “bounding box” or “object bounding box” for ease of discussion or illustration). In some cases, generation of the bounding boxes may be configured to rely on data, functions and/or features (and/or any combination thereof) of one or more components/elements 202-214. An object bounding enclosure or box may correspond to a virtualized/virtual enclosure of an object in an environment of another object, thereby allowing for one object to have a machine vision of another object. Such virtualized/virtual enclosure may allow objects to “see” one another and/or determine their locations, shapes, movement directions, and/or any other details. The enclosure may have any desired shape (e.g., square, rectangular, triangular, object shape, complex shape, etc.), form, size, and/or any other characteristics. It may also be positioned about the object in any desired fashion as well, e.g., it may be larger than the object (e.g., a vehicle), it may have one or more dimensions corresponding to one or more dimensions of the object, etc. The enclosure may have any number of dimensions (e.g., 2D, 3D, etc.). A computing component may refer to a software code that may be configured to perform a particular function, a piece and/or a set of data (e.g., data unique to a particular user and/or data available to a plurality of users) and/or configuration data used to create, modify, etc. one or more software functionalities associated with a particular workflow, sub-workflow, and/or a portion of a workflow. The system 200 may include one or more artificial intelligence and/or learning capabilities that may rely on and/or use various data, as will be discussed below.
The elements of the system 200 may be communicatively coupled using one or more communications networks. The communications networks can include at least one of the following: a wired network, a wireless network, a metropolitan area network (“MAN”), a local area network (“LAN”), a wide area network (“WAN”), a virtual local area network (“VLAN”), an internet, an extranet, an intranet, and/or any other type of network and/or any combination thereof.
The elements of the system 200 may include any combination of hardware and/or software. In some implementations, the elements may be disposed on one or more computing devices, such as, server(s), database(s), personal computer(s), laptop(s), cellular telephone(s), smartphone(s), tablet computer(s), and/or any other computing devices and/or any combination thereof. In some implementations, the elements may be disposed on a single computing device and/or can be part of a single communications network. Alternatively, the elements may be separately located from one another.
At 304, the system 200, using, for example, a radar point cloud generator 204, may be configured to generate one or more representations (e.g., referred to as point clouds hereinafter) that may be used for the purposes of estimating one or more object bounding boxes that may be configured to enclose one or more objects in an environment of the vehicle.
At 306, the system 200 may be configured to apply noise filtration to reduce noise and enhance points corresponding to actual dynamic objects. At 308 and 310, one or more confidence metrics for the generated point clouds and/or one or more filtered point clouds and/or cross-potential point clouds may be generated using components 208 and 210 of system 200, respectively.
The process 300 may be configured to conclude with generation of an estimate (at 312) of one or more bounding enclosures or boxes and subsequent generation (at 314) of one or more object bounding boxes.
Specular reflections of millimeter waves can cause direct blindness of object surfaces, which could lead to fatal accidents. To better scrutinize the effect of specularity, it is important to understand the distribution of a radar point cloud. A point cloud generated by a radar may depend on the following: a geometry of the scene and a resolution of the radar. An adverse effect of specular reflections is a geometric problem and cannot be resolved simply by increasing the resolution of the radar.
For a rectangular bounding box of a car, the current subject matter system that uses multiple radars, may be configured to capture more surface points and thus be less affected by specularity, to thereby better estimate the bounding box's orientation. To understand the optimal separation of such a multiple radar system, simulations may be used. In particular, to estimate the orientation angle of the box from point clouds, a multi-layer perceptron (MLP) regressor trained on the data generated from the simulations may be used. The MLP regressor may receive a simulated point cloud as input and output an orientation angle of a vehicle.
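By way of a non-limiting illustration, a minimal sketch of such an MLP regressor is shown below, assuming a scikit-learn implementation; the simulate_point_cloud helper, the fixed-size flattening of the point cloud, and all parameter values are hypothetical placeholders rather than the actual simulation setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def simulate_point_cloud(angle_deg, separation, max_points=64):
    """Hypothetical stand-in for the simulator: returns a (N x 3) point cloud for a
    car at a given orientation angle, as seen by radars `separation` meters apart."""
    rng = np.random.default_rng(int(angle_deg * 100 + separation))
    num_points = rng.integers(5, max_points)
    return rng.normal(scale=0.5, size=(num_points, 3))

def to_fixed_vector(points, max_points=64):
    """Flatten a variable-sized point cloud into a fixed-length feature vector."""
    feat = np.zeros((max_points, 3))
    feat[:min(len(points), max_points)] = points[:max_points]
    return feat.ravel()

# Build a training set over orientation angles for a given radar separation.
angles = np.random.uniform(0, 180, size=2000)
X = np.stack([to_fixed_vector(simulate_point_cloud(a, separation=1.5)) for a in angles])
y = angles

# MLP regressor: simulated point cloud in, orientation angle out.
reg = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=500)
reg.fit(X, y)
pred_angle = reg.predict(X[:1])   # predicted orientation for the first sample
```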
In some implementations, use of multiple radars 202 (as shown in
As stated above, multiple radars may be configured to work together to overcome challenges posed by specular reflections of mmWaves by providing rich spatial diversity. However, noise (e.g., due to clutter, multi-path, system noise, etc.) may present a challenge for object detection from radar point clouds. Noise points may misguide object detection and introduce false positives. A single radar has fundamental limitations in removing noise. The features corresponding to the Cartesian coordinates, e.g., (x, y, z) and velocity, may provide a rich context for object detection. However, they do not provide information necessary to segregate the noise.
In some implementations, to resolve these issues, the current subject matter may be configured to generate a representation of cross potential point clouds that may be formed by fusing multiple radar point clouds. It should be noted that points belonging to noise may be independent across multiple radars placed at different spatial locations, whereas points belonging to an actual object may be more likely to be present in the point clouds of more than one radar. In other words, points belonging to random noise may be specific to each radar. By leveraging this observation, the current subject matter system 200 may be configured to filter noise from radar point clouds and create low-noise cross potential point clouds, as discussed below.
Noise harms bounding box estimation as it generates false positives. In some implementations, the system 200 may be configured to use signal processing techniques to address noise and generation of false positives. In signal processing, multiple noisy data streams may be collected and averaged to reduce noise variance and improve an overall signal to noise ratio (SNR), whereby signals present in each data stream may add up coherently. At the same time, noise is random and will not add constructively. However, in point clouds, it is not possible to simply add point clouds from multiple radars to reduce noise because (1) a 3D point cloud is sparse and incoherent in space, i.e., multiple radars may capture different points in 3D space for the same target object, and (2) it is hard to build confidence for every point, whether it contributes to the object bounding box or corresponds to the noise point (e.g., as generated by clutter and/or multi-path).
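By way of a non-limiting example, the classical averaging argument referenced above may be illustrated with the following sketch (illustrative values only; this demonstrates the signal processing intuition, not the point cloud fusion itself):

```python
import numpy as np

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 4 * np.pi, 1000))        # common underlying signal
streams = [signal + rng.normal(scale=1.0, size=signal.size) for _ in range(4)]

# Coherent averaging: the signal adds constructively, the noise averages out.
averaged = np.mean(streams, axis=0)

def snr_db(x, clean):
    noise = x - clean
    return 10 * np.log10(np.var(clean) / np.var(noise))

print(f"single-stream SNR: {snr_db(streams[0], signal):.1f} dB")
print(f"averaged SNR:      {snr_db(averaged, signal):.1f} dB")  # ~6 dB better with 4 streams
```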
To apply space coherence in the point cloud domain, the system 200 may be configured to use geometric information of the point clouds. Thus, if a region of 3D space generates a response in multiple radars, it is likely to have been generated from an object and not noise. To capture this effect, the system 200 may measure coherence between point clouds originating from multiple radars across 3D space. Radar points from an object may be concentrated around some scattering regions on a vehicle. By identifying these scattering regions as clusters of points, the system 200 may define a confidence of a point being generated from an object by looking at the same scattering region in multiple radar point clouds (i.e., space coherence). The system 200 may then enforce space coherence by defining cross-potentials.
Since radar point clouds may be present in the form of clusters of points originating from a scattering region on the object, to cluster the point clouds, the system 200 may be configured to use a conventional DBSCAN clustering algorithm to find clusters in the point cloud. DBSCAN may define a neighborhood of points based on distance ϵ given as an input parameter. If a specific number of points (e.g., another input parameter) is present in that neighborhood, the point and its neighborhood may be identified as a cluster. For each cluster i, the centroid ci of its points may be used as the cluster's representative point. For the multi-radar case, cij may be used to represent the centroid of cluster i in the point cloud of jth-radar, which is represented by Γj. In this way, multiple clusters may be generated individually for each radar point cloud.
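By way of a non-limiting example, a minimal sketch of this clustering step, assuming scikit-learn's DBSCAN implementation, is shown below; the parameter values are illustrative assumptions rather than the actual configuration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_point_cloud(points_xyz, eps=0.75, min_samples=3):
    """Cluster a radar point cloud (N x 3 array) and return per-point labels and
    the centroid of each cluster.

    eps         -- neighborhood distance (DBSCAN input parameter)
    min_samples -- minimum number of points required to form a cluster
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_xyz)
    centroids = {}
    for label in set(labels):
        if label == -1:                    # DBSCAN marks outliers with label -1
            continue
        members = points_xyz[labels == label]
        centroids[label] = members.mean(axis=0)   # c_i: the cluster's representative point
    return labels, centroids

# Example usage: cluster each radar's point cloud separately (one call per radar j).
# labels_j, centroids_j = cluster_point_cloud(radar_j_points)
```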
Next, the system 200 may be configured to determine correspondence between clusters across multiple radars. To do so, the system 200 may define a cross-potential between two clusters from two different radars as a confidence metric indicating whether the two clusters belong to the same object. The cross-potential between two clusters may be inversely proportional to the distance between the two clusters' centroids. The cross-potential may be denoted P(cij|Γk) for the ith cluster in radar j (where cij denotes its centroid) with respect to the kth radar, for k≠j, k∈{1, 2, . . . , N}, in an N-radar system. The cross-potential may thus be expressed as a decreasing function of rijk, where rijk is the distance between the centroid cij and its nearest cluster centroid in the kth radar's point cloud Γk.
In some implementations, the above cross-potential may depend on dimensions of a particular vehicle, e.g., a typical passenger vehicle has a width of approximately 2 m. Thus, by way of a non-limiting example, the system 200 may select a function that generates a high potential to all points within the predetermined distance D (e.g., 2 m) neighborhood of a point, e.g., P>0.5 if rijk<D (e.g., 2 m). The choice of potential function may also preserve a point lying on a vehicle, which is present in only one radar, as long as it is in the predetermined distance (e.g., 2 m) neighborhood of another high potential point (e.g., on the farther corner along the width). With this, any extra points added due to the spatial diversity of the multiple radar system may be preserved. Using this potential function, the system 200 may quantify a space coherence of signals being received from multiple radars. The system 200 may then combine all points being received from multiple radar point clouds and add confidence information to them. Each point may be assigned the same potential as its respective cluster centroid. Further, the system 200 may filter all points below a certain predetermined potential-threshold to generate cross potential point clouds (CPPC).
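The following sketch illustrates one possible potential function and cross-potential computation consistent with the properties described above (decay with distance, P > 0.5 within the predetermined distance D of approximately 2 m); the exponential form and the averaging across radars are assumptions made for illustration, not necessarily the actual function used.

```python
import numpy as np

D = 2.0  # predetermined distance, roughly a vehicle width (meters)

def cross_potential(r_ijk, D=D):
    """Assumed potential function: decays with distance, exceeds 0.5 for r < D."""
    return np.exp(-np.log(2.0) * r_ijk / D)       # equals 0.5 exactly at r = D

def cluster_potentials(centroids_j, other_radar_centroids):
    """For each cluster centroid of radar j, compute its cross-potential against the
    nearest cluster centroid of every other radar, then average the results."""
    potentials = {}
    for i, c_ij in centroids_j.items():
        per_radar = []
        for centroids_k in other_radar_centroids:      # one centroid dict per radar k != j
            if not centroids_k:
                continue
            r_ijk = min(np.linalg.norm(c_ij - c_k) for c_k in centroids_k.values())
            per_radar.append(cross_potential(r_ijk))
        potentials[i] = float(np.mean(per_radar)) if per_radar else 0.0
    return potentials

# Each point inherits the potential of its cluster centroid; points below a
# potential threshold are filtered out to form the cross potential point cloud.
```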
Further, in addition to space coherence, time coherence may help separate noise from an actual signal and improve detection performance. To do so, the system 200 may be configured to determine that, along with the space coherence across radars, the points may also follow a time coherence across multiple frames from a radar. For a point originating from a rigid body, the linear motion may be the same as the vehicle's motion across multiple time frames. The system 200 may be configured to track movement of points in consecutive frames and use it to estimate the vehicle's heading direction. The system 200 may be configured to perform tracking on self-motion compensated frames to remove an effect of the source vehicle movement. In some implementations, Kalman filter-based corrections may be used to resolve sensor uncertainties and/or noise. The system 200 may track points with the highest cross-potentials for each object. Using the time coherence between points, the system 200 may obtain a prior estimate of vehicles' heading directions in a scene (e.g., a surrounding environment about the vehicle, e.g., vehicle 602).
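A simplified sketch of deriving such a heading prior from a point tracked across self-motion-compensated frames is shown below; the frame interval and the plain displacement averaging (in place of full Kalman filter-based corrections) are simplifying assumptions.

```python
import numpy as np

def heading_from_track(tracked_positions, frame_dt=0.05):
    """Estimate a heading direction from one point tracked over consecutive,
    self-motion-compensated frames (list of (x, y) positions).

    Returns the yaw angle (radians) of the average displacement vector.
    """
    positions = np.asarray(tracked_positions, dtype=float)
    displacements = np.diff(positions, axis=0)           # motion between frames
    mean_velocity = displacements.mean(axis=0) / frame_dt
    return float(np.arctan2(mean_velocity[1], mean_velocity[0]))

# Example: a high cross-potential point drifting mostly along +x across five frames.
track = [(10.0, 2.0), (10.4, 2.05), (10.8, 2.1), (11.2, 2.12), (11.6, 2.18)]
prior_heading = heading_from_track(track)   # prior estimate of the vehicle's yaw
```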
Using multiple radar fusion to generate cross potential point clouds, the radar point cloud may be modeled as a mapping of the scene geometry:
Γx,y,z=ℱ(scene geometry) (2)
The system 200 may then inverse this mapping and estimate the scene geometry in terms of object bounding boxes, as follows:
(pN,ψN)=ℬ(ΓCPPC) (3)
where N is the unknown number of objects present in the scene; ΓCPPC is the cross potential point cloud, where each point is denoted by its Cartesian coordinates, velocity, intensity, and CPPC confidence; pN represents the confidence of detection for the Nth object's bounding box in a scene; ψN: {cx, cy, cz, w, h, l, θ} denotes the tuple of bounding box parameters, which are the center coordinates, dimensions, and yaw angle (i.e., an angle with respect to the z-axis), respectively; and ℬ is the multiple 3D bounding box estimation system.
Estimating 3D bounding boxes for objects from a radar point cloud depends on proper segmentation of radar point cloud. In particular, a radar point cloud of a scene is sparsely distributed where any subset of points could belong to a single object. Also, the number of objects and their locations are not known a priori. Bounding box estimation requires proper segmentation of points belonging to each object in the scene. This is a complex mapping problem where the number of targets is not known. Moreover, radars can only see a part of an object which is exposed to the sensor. Thus, the point cloud of an object may not contain crucial information regarding all dimensions, orientation, and center-location of the bounding box. As a result, there is uncertainty in bounding box parameters.
To overcome the challenges mentioned above, in some implementations, the system 200 may be configured to include a deep learning architecture designed to handle the sparsity in radar point clouds and output accurate 3D bounding boxes.
The process 700 may be configured to generate one or more anchor boxes based on the radar response due to vehicle geometry and space-time coherence. Unlike LiDAR data, where many points originate from the ground and other static objects (e.g., buildings), radar data after CPPC noise suppression is sparse and, due to the strong electromagnetic (EM) reflective properties of metals, contains mostly points from dynamic, metallic objects such as vehicles. Specifically, the sparsity in radar data and the fact that all the points originate from the vehicles' surfaces allow defining point-based region proposals. These fixed-size anchor boxes may be used as initial estimates of 3D bounding boxes. The size of the anchor boxes may be determined using an average size of vehicles in the training dataset. Instead of being executed on the entire scene point cloud, the process 700 may be executed separately on each of these anchor boxes. For each anchor box, the task is reduced to generating a confidence number p indicating whether the points inside that anchor box belong to an object.
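By way of a non-limiting example, a minimal sketch of such point-based anchor box generation is shown below; the assumed average vehicle dimensions and the (cx, cy, cz, l, w, h, yaw) layout are illustrative.

```python
import numpy as np

# Assumed average vehicle dimensions from a training set (length, width, height in meters).
ANCHOR_L, ANCHOR_W, ANCHOR_H = 4.5, 1.8, 1.6

def point_anchors(points_xyz, yaw_prior=0.0):
    """One fixed-size anchor box per radar point, centered on that point.

    Each anchor is (cx, cy, cz, l, w, h, yaw); the yaw may come from the
    time-coherence heading prior described above.
    """
    anchors = np.zeros((len(points_xyz), 7))
    anchors[:, 0:3] = points_xyz                       # box center at the point
    anchors[:, 3:6] = (ANCHOR_L, ANCHOR_W, ANCHOR_H)   # fixed initial dimensions
    anchors[:, 6] = yaw_prior
    return anchors
```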
Confidence scores may be generated for all anchor boxes. A set of high confidence boxes may be selected. These anchor boxes may be passed through a refinement stage that may solve the uncertainty issue in bounding box parameters. In this stage, the anchor boxes may be refined to generate accurate 3D bounding boxes of the objects present in the scene (e.g., as represented by parameter ψ).
Referring to
At 706, segmentation using feature extraction and pooling may be performed. Here, the system 200 may be configured to perform classification and 3D bounding box parameter regression by learning meaningful feature representations from the point cloud data. The system 200 may be configured to extract these features before and after generating anchor boxes. For extraction of features before generation of anchor boxes, a point-net encoder based on a shared multi-layer perceptron (MLP) may be used to extract features from the entire point cloud. During extraction of features after generation of anchor boxes, the anchor boxes may be determined for each point. The system 200 may be configured to use a region of interest (RoI) feature pooling block to pool features from all points inside an anchor box. These features may be passed through another point-net layer and then max-pooled into a single representative feature for every anchor box defined per scene.
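A condensed PyTorch sketch of these two feature extraction stages is shown below; the layer sizes, the five-dimensional point features, and the axis-aligned point-in-box test are simplifying assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SharedMLP(nn.Module):
    """Point-net style encoder: the same MLP applied independently to every point."""
    def __init__(self, in_dim=5, out_dim=64):          # x, y, z, velocity, CPPC confidence
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                 nn.Linear(32, out_dim), nn.ReLU())

    def forward(self, points):                           # points: (N, in_dim)
        return self.mlp(points)                          # (N, out_dim) per-point features

def roi_pool(per_point_features, points_xyz, anchors):
    """Max-pool per-point features inside each anchor box into one feature per box."""
    pooled = []
    for box in anchors:                                   # box: (cx, cy, cz, l, w, h, yaw)
        center, dims = box[:3], box[3:6]
        inside = (torch.abs(points_xyz - center) <= dims / 2).all(dim=1)  # axis-aligned test
        feats = per_point_features[inside]
        pooled.append(feats.max(dim=0).values if len(feats)
                      else torch.zeros(per_point_features.shape[1]))
    return torch.stack(pooled)                            # (num_anchors, feat_dim)
```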
At 708, the system 200 may be configured to generate or predict an anchor box confidence score. The entire set of representative features of anchor boxes, obtained at 706, may be passed through a classification network that may include fully connected layers. The fully connected layers may learn a mapping from anchor boxes' representative features to the confidence value for each box.
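A possible form of such a classification network is sketched below; the layer sizes are assumptions.

```python
import torch.nn as nn

class AnchorConfidenceHead(nn.Module):
    """Fully connected layers mapping an anchor box's representative feature
    to a confidence value p in [0, 1]."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                nn.Linear(64, 32), nn.ReLU(),
                                nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, anchor_features):                 # (num_anchors, feat_dim)
        return self.fc(anchor_features).squeeze(-1)     # (num_anchors,) confidence scores
```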
Performing classification on RoI-based max-pooled features may ensure that the contextual information from all neighborhood points of the anchor point lying inside the anchor box is accounted for, thereby leading to better classification results. Here, the problem of segmentation may be solved by the system 200 by performing classification directly on the anchor boxes. The system 200 may learn to select, with high confidence, the corresponding anchor box that includes all points belonging to an object.
At 710, the anchor box refinement may be executed by refining the box parameters. As discussed above, at 708, the generated anchor boxes may correspond to rough estimates of the dimensions, center, and orientation of final 3D bounding boxes since fixed-size anchor boxes were used. The system 200 may be configured to perform further refinement of these parameters to generate accurate bounding boxes, which may estimate accurate dimensions and location of the boxes. After the classification step (at 708), confidence scores for all the anchor boxes may be determined and, since anchor boxes were generated for each point, there may be one or more overlapping high confidence boxes belonging to the same object. The system 200 may be configured to perform non-maximal suppression (NMS) sampling on this set using the confidence values. NMS sampling may remove boxes which have a high overlap with another high confidence box of the same object. The representative features from the remaining anchor boxes may be passed through three fully connected layers to output a tuple [h′, w′, l′, x′, y′, z′, θ′] corresponding to refinements of height, breadth, length, center coordinates, and orientation angle, respectively. These refinements may be added to the anchor box parameters to generate the final 3D bounding box prediction, at 712.
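The following sketch illustrates the NMS sampling and refinement stages; the axis-aligned bird-eye-view IoU and the layer sizes are simplifications assumed for illustration.

```python
import torch.nn as nn

def bev_iou(a, b):
    """Axis-aligned bird-eye-view IoU between boxes (cx, cy, w, l) -- a simplification."""
    ax1, ax2 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay1, ay2 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx1, bx2 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by1, by2 = b[1] - b[3] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes_bev, scores, iou_threshold=0.5):
    """Keep the highest-confidence boxes; drop boxes heavily overlapping a kept box."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(bev_iou(boxes_bev[i], boxes_bev[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep

class RefinementHead(nn.Module):
    """Three fully connected layers regressing [h', w', l', x', y', z', theta']."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                nn.Linear(64, 32), nn.ReLU(),
                                nn.Linear(32, 7))

    def forward(self, anchor_features):
        return self.fc(anchor_features)    # refinements added to the anchor box parameters
```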
The anchor box classification in the first stage of the process 700 is a binary classification problem that uses a cross-entropy loss, which may be represented by
Lcls=−Σi[yi log(pi)+(1−yi) log(1−pi)],
where yi∈{0, 1} is the ground truth and pi is the predicted confidence value. Refinement of the bounding boxes is a regression problem, and a Smooth-L1 loss may be used for this purpose. The loss may be represented as follows:
Lreg=Σ SmoothL1(r−r′), where SmoothL1(d)=0.5d^2 if |d|<1 and |d|−0.5 otherwise,
where r and r′ are the ground truth and regressed refinement values, respectively, for each parameter in [h′, w′, l′, x′, y′, z′, θ′].
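These two losses may be computed, for example, with standard PyTorch functions, as sketched below.

```python
import torch.nn.functional as F

def classification_loss(pred_confidence, gt_labels):
    """Binary cross-entropy over anchor box confidences (first stage)."""
    return F.binary_cross_entropy(pred_confidence, gt_labels.float())

def refinement_loss(pred_refinements, gt_refinements):
    """Smooth-L1 regression loss over [h', w', l', x', y', z', theta'] (second stage)."""
    return F.smooth_l1_loss(pred_refinements, gt_refinements)
```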
In some implementations, synchronization clocks 955 (a, b, c, d) (e.g., global positioning system (GPS) based synchronization clocks) may be communicatively coupled and/or integrated with each radar 951, respectively. Alternatively, or in addition to, one clock may be associated with more than one radar 951. The radars 951 may be synchronized using the synchronization clocks 955. To perform synchronization, one or more radars 951 may be selected as leader device and its clock may be synchronized across other radars 951. Using the synchronized clocks and known positions of the radars 951, the central node 953 may determine transmission origin of any received signals (e.g., a signal received by radar 951b at time t1 may be determined to be a reflected signal of a signal that has been transmitted by radar 951a at time t0). As can be understood, there may be other ways of synchronizing clocks of radars and/or determining origins of the received signals. Once the signals are received, the central node 953 (which can be incorporated into the system 200 shown in
The current subject matter system was tested using a dataset containing 54,000 radar frames. A train-test split of 9:1 was used. The data used for testing was obtained from separate data collection runs than the training data to ensure generalization. Performance of the current subject matter was compared against LiDAR in bad weather conditions. The IoU and mAP metrics were used to assess performance of the system. As stated above, IoU is a measure of the overlap between the predicted bounding box and the ground truth box. 3D IoU is defined as the volume of the intersection of the predicted and ground truth boxes divided by the volume of their union, and 2D IoU is defined analogously for the top view (also referred to as bird-eye-view (BEV)) rectangles of the 3D bounding boxes, as the area of intersection of the two rectangles divided by the area of their union.
Two equal-sized boxes with half overlap would have an IoU of 0.33. Hence, even an IoU of around 0.5 is generally regarded as a good overlap. mAP is the area under the precision-recall (PR) curve, which is a measure of the number of actual boxes detected (recall) along with the accuracy of detections (precision). Specifically, precision is obtained for incremental recall values to get PR curve:
Precision=TP/(TP+FP)
Recall=TP/(TP+FN)
mAP=Area(precision-recall curve)
An estimation is regarded as a true positive (TP) if it is above a particular IoU (Intersection Over Union) threshold. Note that a higher recall rate may be obtained by predicting a large number of boxes, but at the cost of sacrificing precision (e.g., more False Positives (FP)) and vice-versa. A higher mAP may mean better performance on both accuracy (precision) and exhaustiveness (recall) of estimation. An FP may also be obtained because of noise. An FP generated due to noise may have a very small (approximately 0) IoU with any ground truth box. Thus, in a lower IoU threshold regime, the mAP may be more sensitive to the amount of noise and may allow better comparison of noise suppression performance.
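A simplified sketch of computing an average precision value as the area under the precision-recall curve is shown below; it assumes each prediction has already been matched to its best-overlapping ground truth box, and the numbers are illustrative only.

```python
import numpy as np

def average_precision(pred_scores, pred_matched_iou, num_gt, iou_threshold=0.5):
    """Area under the precision-recall curve for one IoU threshold.

    pred_scores      -- confidence of each predicted box
    pred_matched_iou -- IoU of each prediction with its best-matching ground truth box
    num_gt           -- number of ground truth boxes
    """
    order = np.argsort(pred_scores)[::-1]                 # rank predictions by confidence
    tp = (np.asarray(pred_matched_iou)[order] >= iou_threshold).astype(float)
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-9)
    # Sum precision over incremental recall values (non-interpolated average precision).
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - recall_prev) * precision))

# Example (illustrative numbers only):
ap = average_precision(pred_scores=[0.9, 0.8, 0.6],
                       pred_matched_iou=[0.7, 0.1, 0.55],
                       num_gt=3)
```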
The current subject matter system achieved a median error of less than 37 cm in localizing the center of an object bounding box and a median error of less than 25 cm in estimating the dimensions of the bounding boxes. A 2D IoU (BEV IoU) was used as a threshold for the mAP metric. The current subject matter system achieved an mAP score of 0.67 for an IoU threshold of 0.5 and a score of 0.94 for a lower IoU threshold of 0.2, which is a 45% improvement over a single radar system.
An experimental current subject matter system (referred to as RP-MR-CPPC 801) was compared against the following systems:
RP-MR 802: system with multiple radar data without cross potential point clouds fusion. The point clouds from multiple radars are simply added in the global coordinate system.
RP-SR 803: system with single radar data.
Clust-CPPC 804: system with clustering based approach used on cross potential point clouds.
Clust 805: system with a clustering based bounding box estimation baseline. A predefined size bounding box is estimated for each cluster found using DBSCAN, coupled with angle estimation.
PointRCNN 806: system implementing well-known LiDAR based 3D bounding box estimation network PointRCNN.
In some implementations, the current subject matter can be configured to be implemented in a system 1000, as shown in
At 1102, the system 200 may be configured to receive one or more signals reflected by one or more second objects (e.g., other vehicles, telephone poles, buildings, etc.). The second objects, for example may be located in an environment (e.g., a scene) of a first object (e.g., autonomous vehicle). The signals may be received by one or more radar sensors positioned on one or more first objects. For example, radar sensors 202 may be positioned on one or more first object (e.g., at a predetermined distance apart, such as, at a width of the vehicle). An exemplary sensor positioning is illustrated in
At 1104, the radar point cloud generator 204 may be configured to generate, based on the received signals, one or more representations (e.g., point clouds). The representations may include a plurality of portions (e.g., points) corresponding to the received signals. An exemplary representation (e.g., a point cloud) is shown in
At 1106, the system 200 may be configured to generate one or more virtual enclosures encompassing one or more second objects. Exemplary virtual enclosures (e.g., bounding boxes) are illustrated in
In some implementations, the current subject matter may include one or more of the following optional features. The radar sensors may be positioned on the vehicle a predetermined distance apart. The radar sensors may include two radar sensors.
In some implementations, the radar sensors may include a plurality of radar sensors. At least one radar sensor in the plurality of radar sensors may be configured to receive a signal transmitted by at least another radar sensor in the plurality of radar sensors (e.g., as shown in
In some implementations, one or more generated representations may include one or more point clouds. One or more portions of the generated representations may include one or more points in the point clouds. In some implementations, the method 1100 may also include filtering the generated point clouds to remove one or more points corresponding to one or more noise signals in the received signals, and generating, using the filtered point clouds, one or more virtual enclosures encompassing the second objects.
In some implementations, the generating of the point clouds may include generating one or more cross potential point clouds by combining one or more point clouds generated using signals received by each radar sensor. Generation of one or more cross potential point clouds may include clustering at least a portion of the point clouds using a number of points corresponding to at least a portion of the received signals being received from the same scattering region of a second object, generating one or more clustered point clouds, combining at least a portion of the clustered point clouds based on a determination that at least a portion of the clustered point clouds is associated with the second object and determined based on signals received from different radar sensors, and generating the cross potential point clouds.
In some implementations, the filtering may include removing one or more noise signals in the received signals received by each radar sensor. The filtering may include removing one or more noise signals in the received signals using one or more predetermined signal to noise ratio thresholds.
In some implementations, the generation of one or more object enclosures may include generating one or more anchor enclosures (e.g., anchor boxes) corresponding to each point in the point clouds. Generation of one or more anchor enclosures may include extracting, using the anchor enclosures, a plurality of features corresponding to the second objects, and determining, based on the extracting, a single feature representative of each anchor enclosure. Generation of one or more object enclosures may include predicting one or more object enclosures using the determined single feature of each anchor enclosure, associating a confidence value with each predicted object enclosure in the predicted object enclosures, and refining, based on the associated confidence value, one or more parameters of each predicted object enclosure to generate one or more virtual enclosures.
In some implementations, the object enclosure may include at least one of the following: a three-dimensional object enclosure, a two-dimensional object enclosure, and any combination thereof. The one or more virtual enclosures may include at least one of the following parameters: a length, a breadth, a height, one or more center coordinates, an orientation angle, and any combination thereof.
In some implementations, at least one of the first and second objects may include at least one of the following: a vehicle, an animate object, an inanimate object, a moving object, a motionless object, a human, a building, and any combination thereof.
In some implementations, the presence of an object may include at least one of the following: a location, an orientation, a direction, a position, a type, a size, an existence, and any combination thereof of the one or more second objects.
In some implementations, one or more second objects may be located in an environment of the one or more first objects. The presence of one or more second objects may be determined in the environment of the one or more first objects.
One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively, or additionally, store such machine instructions in a transient manner, such as for example, as would a processor cache or other random access memory associated with one or more physical processor cores.
The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.
The present application claims priority to U.S. Provisional Patent Appl. No. 63/113,123 to Bansal et al., filed Nov. 12, 2020, and entitled “MIMO Synchronized Large Aperture Radar,” and incorporates its disclosure herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US21/59190 | 11/12/2021 | WO |

Number | Date | Country
---|---|---
63113123 | Nov 2020 | US