The present invention relates to V2X communication technology. In particular, the invention relates to a method for roadside assisted vehicle sensor deviation correction in simultaneous localization and mapping (SLAM) including, but not limited to, a method for position and/or deviation correction of a moving object in a SLAM environment, and/or a method for position and/or deviation correction of an autonomous vehicle (AV) in a SLAM environment.
Autonomous Vehicles (AVs) are vehicles configured with an aim to assist or replace human drivers by automating at least some of the driving tasks. In contrast to conventional vehicles which have typically been configured to use only real-time data retrieved from on-board vehicle modules such as visual sensors to determine or detect potential threats on the road, AVs utilize the Vehicle-to-Everything (V2X) communications protocol, which is a vehicular communication protocol configured to deliver information from a vehicle to any entity that may affect the vehicle, and vice versa. The V2X protocol assists by enabling the exchange of information and/or data between vehicle on-board units (OBUs) and, for example, roadside infrastructure for road safety, management, and/or threat determination purposes. A V2X system incorporates other more specific types of communications including, but not limited to, Vehicle-to-Infrastructure (V2I), Vehicle-to-Vehicle (V2V), Vehicle-to-Pedestrian (V2P), Vehicle-to-Device (V2D), and Vehicle-to-Grid (V2G).
Very often, AVs are equipped with sensors for collecting point clouds; a point cloud comprising a collection of individual data points in a three-dimensional space, with each data point being assigned a set of coordinates along the X, Y, and Z axes. Point cloud data is particularly useful to AVs for object detection, path planning, and vehicle control. Similarly, road-side units (RSUs) such as road-side sensors or other connectivity devices may also be capable of collecting point cloud data and subsequently transmitting the point cloud data to edge servers or the like through Cellular-Vehicle-to-Everything (C-V2X) channels.
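By way of illustration only, and not as part of the claimed subject matter, a point cloud of the kind described above may be represented as a simple N x 3 array, one row per data point; the example values below are purely hypothetical.

```python
import numpy as np

# A point cloud represented as an N x 3 array: one row per data point,
# with the columns giving that point's X, Y and Z coordinates.
point_cloud = np.array([
    [12.4,  3.1, 0.2],   # e.g., a return from the road surface
    [15.0, -1.7, 1.4],   # e.g., a return from a nearby vehicle
    [18.3,  0.5, 4.8],   # e.g., a return from roadside infrastructure
])

num_points = point_cloud.shape[0]
x, y, z = point_cloud[:, 0], point_cloud[:, 1], point_cloud[:, 2]
```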
In traditional SLAM architecture for autonomous driving, loop-closure is used to reduce accumulated error by checking whether a location is revisited by the AV. When a loop (i.e., a revisited position) is found, a location-equality condition is added to a back-end bundle adjustment (BA), which performs constrained global optimization over all history keyframes of the AV's point cloud data, in order to minimize accumulated error in the AV's current trajectory.
During the process of SLAM, accumulation of errors is unavoidable. Long-term error accumulation will lead to unreliable results, which means it is difficult to build globally consistent vehicle trajectories and maps. Thus, in autonomous driving, loop-closure detection (also called loop detection) is widely used to solve this problem. Using loop-closure detection, the previously optimized back-end trajectory is corrected and accumulated errors are reduced or eliminated. Loop-closure detection is very important for improving the accuracy and robustness of the whole SLAM system.
Traditional loop-closure detection relies heavily on choosing one or more key frames from point cloud data to calculate a similarity between frames for judging whether the frames relate to the same location. In addition, loop-closure detection requires that the AV arrive at the same position so that the loop-closure can be generated. However, in outdoor environments, it may take a long time for the AV to return to the same position, which means the frequency of loop-closure deviation correction is too low to correct accumulated errors in time. Thus, loop-closure detection is not efficient enough to meet this need. Furthermore, the comparison of key frames is not conducted consistently using an accurate world coordinate system, which may lead to wrong matching of frames such that deviations of the AV from the world coordinate system cannot be eliminated.
There is a need to develop a method to correct accumulated errors of AV pose estimation without needing loop closure detection.
An object of the invention is to mitigate or obviate to some degree one or more problems associated with known systems and methods for correcting accumulated errors in AV pose in a SLAM environment.
Another object of the invention is to provide an improved method for position and/or deviation correction of a moving object in a SLAM environment.
A further object of the invention is to provide an improved method for position and/or deviation correction of an autonomous vehicle (AV) in a SLAM environment.
A yet further object is to provide an improved method for roadside assisted vehicle sensor deviation correction in SLAM.
The above objects are met by the combination of features of the main claims; the sub-claims disclose further advantageous embodiments of the invention.
One skilled in the art will derive from the following description other objects of the invention. Therefore, the foregoing statements of object are not exhaustive and serve merely to illustrate some of the many objects of the present invention.
In a first main aspect, the invention provides a method for position and/or deviation correction of a moving object in a simultaneous localization and mapping (SLAM) environment. The method comprises receiving at the moving object point cloud data for an area associated with the moving object from a sensor located at a fixed location. A processor of the moving object performs the steps of: transforming the received point cloud data to a coordinate system of the moving object; matching the transformed point cloud data with point cloud data generated by a sensor at the moving object for the area associated with the moving object; and using the matched point cloud data to determine a corrected position and/or deviation of the moving object within a selected coordinate system.
Preferably, but not exclusively, the SLAM environment comprises a road system, the moving object comprises an AV navigating its way around the road system, the processor forms part of an on-board unit (OBU) of the AV, the sensor of the moving object comprises a Light Detection and Ranging (LiDAR) sensor of the AV, the area associated with the moving object comprises a field of view (FOV) of the LiDAR sensor of the AV, the sensor located at a fixed location comprises a road-side unit (RSU) LiDAR sensor, and the selected coordinate system comprises a world coordinate system.
In a second main aspect, the invention provides an OBU for an AV operating in a SLAM environment, the OBU comprising a memory for storing machine-readable instructions and a processor for executing said machine-readable instructions such that, on receiving point cloud data for a FOV of a LiDAR sensor of the AV from a LiDAR sensor of an RSU located at a fixed position within a selected coordinate system, the processor configures the OBU to implement the steps of: transforming the received point cloud data to a coordinate system of the AV; matching the transformed point cloud data with point cloud data generated by the LiDAR sensor of the AV; and using the matched point cloud data to determine a corrected position and/or deviation of the AV within the selected coordinate system. The selected coordinate system preferably comprises a world coordinate system.
In a third main aspect, the invention provides an RSU operating in a SLAM environment, the RSU comprising a memory for storing machine-readable instructions and a processor for executing said machine-readable instructions configuring the RSU to implement the steps of: broadcasting notification information indicating that the RSU can provide point cloud data to autonomous vehicles (AVs) in a cover region of the RSU; receiving from an AV data defining a FOV of the LiDAR sensor of the AV and, in response, selecting a subset of point cloud data generated by one or more RSU LiDAR sensors, said subset of the point cloud data comprising points located in the FOV of the LiDAR sensor of the AV; and sending the subset of point cloud data to the OBU of the AV.
The summary of the invention does not necessarily disclose all the features essential for defining the invention; the invention may reside in a sub-combination of the disclosed features.
The foregoing and further features of the present invention will be apparent from the following description of preferred embodiments which are provided by way of example only in connection with the accompanying figures, of which:
The following description is of preferred embodiments by way of example only and without limitation to the combination of features necessary for carrying the invention into effect.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments, but not other embodiments.
It should be understood that the elements shown in the figures may be implemented in various forms of hardware, software, or combinations thereof. These elements may be implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory, and input/output interfaces.
The present description illustrates the principles of the present invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope.
Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the diagrams presented herein represent conceptual views of systems or devices embodying the principles of the invention.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode, or the like, combined with appropriate circuitry for executing that software to perform the function. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
By way of further background, in a traditional SLAM environment or architecture for autonomous driving, it is known to use loop-closure to reduce accumulated errors if an AV revisits a location. When a loop (revisited position) is found, a location-equality condition is added to a back-end bundle adjustment (BA) which performs constrained global optimization over all history keyframes of the AV, in order to minimize accumulated error in the AV's trajectory.
Referring to
Scan context is a popular algorithm for loop-closure detection. A scan context descriptor is a global feature descriptor for a point cloud that can be used to detect loop closures. Scan context is a spatial descriptor with a matching algorithm, specifically targeting outdoor place recognition using a single three-dimensional (3D) scan by encoding the whole point cloud data of the 3D scan into a matrix. First, the point cloud of a single 3D scan is encoded into a scan context. Then, an Nr-dimensional vector is encoded from the scan context and is used for retrieving the nearest candidates as well as for constructing the KD (K-dimensional) tree, where Nr is the number of rings into which a frame of point cloud data is divided when obtaining its scan context. Finally, the retrieved candidates are compared to a query scan context. The candidate that satisfies an acceptance threshold and is closest to the query is considered the loop. If such a loop is detected, it is treated as the "true value" of a revisited location and is added to the constraints of SLAM pose estimation.
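By way of illustrative example only, the following sketch shows one simplified way in which a scan-context-style descriptor and its ring key may be computed, assuming a bird's-eye polar binning that stores the maximum point height per bin; the parameter names (num_rings, num_sectors, max_range) are merely illustrative and are not taken from the published algorithm's implementation.

```python
import numpy as np

def scan_context(points, num_rings=20, num_sectors=60, max_range=80.0):
    """Encode an (N, 3) point cloud into a (num_rings, num_sectors) matrix.

    Each bin of a bird's-eye polar grid stores the maximum point height
    falling in that bin; num_rings corresponds to Nr in the description.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)                                   # radial distance from the sensor
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)          # bearing in [0, 2*pi)

    ring = np.minimum((r / max_range * num_rings).astype(int), num_rings - 1)
    sector = np.minimum((theta / (2 * np.pi) * num_sectors).astype(int),
                        num_sectors - 1)

    desc = np.zeros((num_rings, num_sectors))
    inside = r < max_range
    for i, j, h in zip(ring[inside], sector[inside], z[inside]):
        desc[i, j] = max(desc[i, j], h)                  # keep the tallest point per bin
    return desc

def ring_key(desc):
    """Nr-dimensional vector (one occupancy value per ring) used for retrieval."""
    return (desc != 0).mean(axis=1)
```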
There are many disadvantages of using the traditional loop-closure detection method. The disadvantages include that loop-closure detection relies heavily on choosing a key frame from the point cloud data to calculate the similarity between frames to judge whether or not it relates to the same, i.e., revisited, location. Loop-closure detection also requires that the AV arrive at the same position so that the loop-closure can be generated, but, in outdoor environments, it may be a long time before the AV returns to the same position, which means that the frequency of loop-closure deviation correction may be too low to correct accumulated errors over time. Thus, loop-closure detection is often not efficient enough for AV purposes. Furthermore, the comparison of key frames is not conducted in an accurate world coordinate system, which may lead to a wrong match in loop-closure detection and, consequently, any deviation of the AV from the world coordinate system may not be eliminated.
Some of the known or traditional methods focus on higher-level errors between different sources, such as "missing objects". While that is useful for shared perception problems, the present invention focuses on eliminating sensor deviation errors. In addition, in the known methods, it is common to pre-process the data from the RSUs into simplified formats before broadcasting the data to road agents. Such data often lacks sufficient detail for sensor error reduction. Known methods also often focus on recording and reporting sensor errors, while the present invention aims to fix sensor deviations in real-time. The known systems do not disclose or suggest efficient real-time sharing of data between RSUs and AVs, such as requesting the specific data for the AV FOV and segmentation of point cloud data at the RSU based on the AV FOV. Known methods tend to place data matching/correction processes in the RSUs' processing units or in the cloud and to then share the results between road agents such as RSUs. The present invention focuses on the correction of AVs' sensors; thus, the process takes place separately in every AV, assisted by the RSUs, and is conducted in real-time.
The present invention does not use traditional loop-closure detection because of its many disadvantages. The present invention proposes a novel method for position and/or deviation correction of a moving object in a simultaneous localization and mapping (SLAM) environment. The novel method comprises receiving at the moving object point cloud data for an area associated with the moving object from a sensor located at a fixed location. A processor of the moving object performs the steps of: transforming the received point cloud data to a coordinate system of the moving object; matching the transformed point cloud data with point cloud data generated by a sensor at the moving object for the area associated with the moving object; and using the matched point cloud data to determine a corrected position and/or deviation of the moving object within a selected coordinate system.
In a preferred embodiment, the SLAM environment comprises a road system, the moving object comprises an AV navigating its way around the road system, the processor forms part of an on-board unit (OBU) of the AV, the sensor of the moving object comprises a Light Detection and Ranging (LiDAR) sensor of the AV, the area associated with the moving object comprises a field of view (FOV) of the LiDAR sensor of the AV, the sensor located at a fixed location comprises a road-side unit (RSU) LiDAR sensor, and the selected coordinate system comprises a world coordinate system. Preferably, the world coordinate system is derived from the fixed location of the RSU.
The preferred embodiment for a general roadside assisted method 10 for autonomous vehicle sensor deviation correction is illustrated by
Graph A of
As seen in
In a set of next steps 42 of the method 40, when an AV 26 enters a coverage region or area 22 of an RSU 14, the AV OBU 12 sends a request and data defining the AV's LiDAR sensor FOV to the RSU 14. In response, as explained above, the RSU 14 sends relevant point cloud data from the dynamic point cloud map to the AV OBU 12 based on the AV LiDAR sensor's FOV, i.e., coverage area 29.
More specifically, for a preferred embodiment of the method 40, the set of next steps 42 can be broken down into the following steps. When an AV 26 enters a coverage region or area 22 of an RSU 14, the RSU 14 broadcasts notification information to its coverage region 22 indicating that it can provide point cloud data to AVs 26 that are within said coverage region 22. The notification is preferably broadcast on the PC5 UU interface of the 5G cellular network. On receiving the notification, the AV OBU 12 of an AV 26 determines if it is necessary to conduct a deviation correction. If yes, the AV OBU 12 sends a request to said RSU 14 requesting the point cloud data, the request including data defining the FOV (coverage area 29) of the LiDAR sensor 13 of the AV 26. On receiving the request from the AV OBU 12, the RSU 14 matches the received FOV with the dynamic point cloud map by segmenting target point cloud data from the dynamic point cloud map. The RSU 14 does this by selecting points in the dynamic point cloud map which are located within the FOV of the AV LiDAR sensor 13. The RSU 14 then sends the subset of the dynamic point cloud data map to the AV OBU 12. If the AV OBU 12 does not receive any point cloud data from the RSU 14 then the AV OBU 12 takes no further action, i.e., ends the method 40.
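For illustration only, the RSU-side selection of the point cloud subset based on the AV's reported FOV might resemble the following sketch, assuming the FOV is communicated as a simple axis-aligned bounding box in the RSU/world coordinate system; the function and field names are hypothetical and not taken from any particular implementation.

```python
import numpy as np

def segment_for_av_fov(dynamic_map, fov_min, fov_max):
    """Return the subset of the RSU dynamic point cloud map that falls
    inside the AV LiDAR sensor's reported field of view.

    dynamic_map : (N, 3) array of points in the RSU/world coordinate system.
    fov_min, fov_max : length-3 arrays giving the lower and upper corners of
        an axis-aligned box approximating the AV LiDAR FOV (coverage area).
    """
    inside = np.all((dynamic_map >= fov_min) & (dynamic_map <= fov_max), axis=1)
    return dynamic_map[inside]

# Hypothetical usage on the RSU side after receiving the AV OBU's request:
# subset = segment_for_av_fov(dynamic_map, request["fov_min"], request["fov_max"])
# send_to_obu(av_id, subset)   # transmitted over the C-V2X channel
```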
The process of matching the received FOV at the RSU 14 with the dynamic point cloud map is more fully illustrated in
It is worth noting that the data defining the FOV of the AV's LiDAR sensor 13 sent by the AV OBU 12 to the RSU 14 is expressed in a coordinate system generated in the SLAM process, which includes deviations when compared to the world coordinate system. In contrast, as the position of the RSU 14 is fixed, the position of the dynamic point cloud data map can be considered as the ground truth of the world coordinate system. Thus, when the FOV of the AV's LiDAR sensor 13 is used as a basis for segmenting the dynamic point cloud data map, small errors will be introduced. However, these errors can be ignored because, later in the process, a confidence level is set for the compared point cloud data such that errors introduced here have little effect on the final result.
A further step of the method 40 is for the AV OBU 12 to transform the received subset of point cloud data from the RSU 14 to the coordinate system of the AV 26. However, to aid such a transformation, it is preferred to first obtain an initial estimation for a transformation matrix in a next step 43. The step 43 comprises determining an initial estimation of a transformation matrix Ti comprising a rotation matrix Ri and a translation vector ti which initially translates the subset of point cloud data to the coordinate system of the AV 26.
The method 40 involves two sets of point cloud data:
P = {p1, p2, p3, . . . , pn} and Q = {q1, q2, q3, . . . , qn},
where P is the point cloud data detected by the AV 26 and Q is the point cloud data collected at the roadside by the RSUs 14. It is necessary to determine the transformation matrix Ti comprising the rotation matrix Ri and the translation vector ti such that each point qj in Q is mapped onto its corresponding point pj in P, i.e., pj ≈ Ri×qj + ti.
To do so, it is helpful to determine the corresponding point sets first and construct an objective function by a least squares method for iterative optimization.
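A minimal sketch of the least-squares step is given below, assuming point correspondences have already been established; it uses the standard SVD-based (Kabsch/Umeyama-style) closed-form solution as one possible way to minimize the objective, rather than reproducing any particular implementation of the invention.

```python
import numpy as np

def least_squares_alignment(P, Q):
    """Find Ri and ti minimising sum_j ||p_j - (Ri @ q_j + ti)||^2, given
    corresponding points P (from the AV) and Q (from the RSU), both (n, 3)."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - p_mean, Q - q_mean

    # The cross-covariance matrix and its SVD give the optimal rotation in closed form.
    H = Qc.T @ Pc
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    Ri = Vt.T @ D @ U.T
    ti = p_mean - Ri @ q_mean
    return Ri, ti
```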
To improve the success rate of point cloud registration as described later, a good initial estimation is required. Some algorithms for point cloud registration rely heavily on the initial estimation. The point cloud registration may fail because of a wrong initial estimation. Thus, the present invention proposes the following method using V2I (Vehicle to Infrastructure) to improve the initial estimation. Referring to
Assume that transformation matrix Ti satisfies:
Ti is the unknown value required.
Combining these formulas with the foregoing formulas provides:
As Tv and Tr are both known, then Ti can be determined to provide a good initial estimation for the transformation matrix Ti.
This provides the transformation from the RSU 14 coordinate system to the AV OBU 12 coordinate system, as both Tv and Tr are values estimated by the AV 26 and the RSU 14 respectively. The transformation matrix Ti is not accurate, but it has been found to be sufficiently accurate to enable the initial estimation and transformation to be implemented. The process of determining an initial estimation of the transformation matrix Ti generally comprises determining respective poses V0, V1 of the AV 26 detected by the LiDAR sensor 13 of the AV 26 at at least two different time points t0, t1 and determining respective poses R0, R1 of the AV 26 detected by the LiDAR sensor 15 of the RSU 14 at said at least two different time points t0, t1. Then, said respective poses V0, V1, R0, R1 at the at least two different time points t0, t1 are combined to determine the initial estimation of the transformation matrix Ti.
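The precise formulas referred to above are not reproduced here; purely as an illustrative sketch, and assuming poses are represented as 4x4 homogeneous matrices with V0, V1 being the AV's own SLAM pose estimates, R0, R1 being the RSU's observations of the AV in the RSU/world frame, and Ti satisfying Vk ≈ Ti×Rk, one rough way of combining the poses is:

```python
import numpy as np

def estimate_initial_transform(V_poses, R_poses):
    """Rough initial estimate of Ti (RSU/world frame -> AV SLAM frame).

    V_poses : 4x4 poses of the AV estimated by the AV's own SLAM at t0, t1, ...
    R_poses : 4x4 poses of the same AV observed by the RSU LiDAR at the same
              time points, expressed in the RSU/world coordinate system.

    Assuming Vk is approximately Ti @ Rk, each time point yields one estimate
    Ti_k = Vk @ inv(Rk); element-wise averaging these estimates and then
    re-orthonormalising the rotation part gives a usable rough initial guess.
    """
    estimates = [V @ np.linalg.inv(R) for V, R in zip(V_poses, R_poses)]
    Ti = np.mean(estimates, axis=0)

    # Re-orthonormalise the rotation part so Ti remains a valid rigid transform.
    U, _, Vt = np.linalg.svd(Ti[:3, :3])
    Ti[:3, :3] = U @ Vt
    return Ti
```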
After obtaining an estimated pose for the AV 26 from the AV OBU 12 and from the RSU 14 using the transformation matrix Ti, it is possible to further refine the pose estimate. This can be achieved by adjusting the weight of each pose according to its tracking quality, or by dynamically adjusting the weights using a decision tree or a neural network.
Assuming the weight of the AV estimated pose is w1 and the weight of the RSU estimated pose is w2, the initial estimation of the pose can be obtained from a weighted combination of the two pose estimates:
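The exact weighting formula is not reproduced above; a plausible reading, sketched here for illustration only, is a normalised weighted blend of the two pose estimates, with translations averaged linearly and orientations combined by spherical interpolation.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def fuse_poses(T_av, T_rsu, w1, w2):
    """Blend two 4x4 pose estimates of the AV using weights w1 (AV) and w2 (RSU).

    Translations are combined as a normalised weighted average; rotations are
    combined by spherical interpolation between the two orientations.
    """
    alpha = w2 / (w1 + w2)                               # fraction of trust placed in the RSU pose

    t = (w1 * T_av[:3, 3] + w2 * T_rsu[:3, 3]) / (w1 + w2)

    key_rots = Rotation.from_matrix(np.stack([T_av[:3, :3], T_rsu[:3, :3]]))
    R = Slerp([0.0, 1.0], key_rots)(alpha).as_matrix()

    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T
```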
Returning to
In a next step 45 of the method, the OBU 12 of the AV 26 selects a frame (denoted as "F-frame") of the point cloud data generated by the LiDAR sensor of the AV 26 for its FOV and then compares the F-frame with the R-frame, i.e., the frame of point cloud data received from the RSU 14 and transformed to the coordinate system of the AV 26.
In a next set of steps 46 of the method 40, the OBU 12 of the AV 26 compares the F-frame with the R-frame by firstly encoding the point cloud data comprising the F-frame and the point cloud data comprising the R-frame into respective scan context descriptors using a Scan Context Algorithm; then determining or calculating a similarity score between the respective scan context descriptors for the F-frame and the R-frame; and then, if the similarity score satisfies an acceptance threshold, proceeding to the next step of the method 40, otherwise ending the method 40. Preferably, the selected F-frame comprises a current frame of point cloud data generated by the LiDAR sensor 13 of the AV 26. The method does not require use of the search function in the Scan Context Algorithm.
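Continuing the hedged descriptor sketch given earlier, one simple way to score similarity between the two descriptors is a column-wise cosine similarity (the published scan context method additionally searches over column shifts, which is omitted here); the threshold value and the scan_context helper are assumptions carried over from the earlier sketch.

```python
import numpy as np

def descriptor_similarity(desc_f, desc_r):
    """Mean column-wise cosine similarity between two scan context matrices."""
    num = (desc_f * desc_r).sum(axis=0)
    den = np.linalg.norm(desc_f, axis=0) * np.linalg.norm(desc_r, axis=0)
    valid = den > 0
    return float((num[valid] / den[valid]).mean()) if valid.any() else 0.0

# Hypothetical acceptance test inside the set of steps 46:
# desc_f = scan_context(f_frame_points)   # descriptor of the AV's F-frame
# desc_r = scan_context(r_frame_points)   # descriptor of the transformed R-frame
# if descriptor_similarity(desc_f, desc_r) < ACCEPTANCE_THRESHOLD:
#     return  # end the method 40; the frames do not match well enough
```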
In a next step 47 of the method 40, the OBU 12 of the AV 26 performs point cloud data registration between the F-frame and the R-frame to obtain their proper or correct transformation matrix T comprising a rotation matrix R and a translation vector t. The transformation matrix T may be used to translate the R-frame point cloud data to the F-frame point cloud data to provide a matched frame of point cloud data for the AV 26 comprising point cloud data generated by the LiDAR sensor 13 of the AV 26 and by one or more or at least some of the RSU LiDAR sensors 15. If the point cloud registration step 47 fails then the method 40 is ended. However, if the point cloud registration step 47 is successful then the method moves to step 48.
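For illustration of step 47 only, the sketch below uses the open-source Open3D library's point-to-point ICP as one possible registration back-end; this is not necessarily the registration used by the invention, and the distance threshold, fitness test, and initial transform Ti are assumptions carried over from the earlier steps.

```python
import numpy as np
import open3d as o3d

def register_frames(f_points, r_points, Ti, max_corr_dist=1.0):
    """Refine the transform from the R-frame (RSU data) to the F-frame (AV data).

    f_points, r_points : (N, 3) arrays of AV and RSU points respectively.
    Ti : 4x4 initial estimate of the transform (from the earlier V2I step).
    Returns (success, T), where T is the refined 4x4 transformation matrix.
    """
    source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(r_points))
    target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(f_points))

    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr_dist, Ti,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())

    # Treat low-overlap results as a failed registration (step 47 fails -> end method 40).
    success = result.fitness > 0.3
    return success, np.asarray(result.transformation)
```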
In step 48 of the method 40, the transformation matrix T is used by the OBU 12 of the AV 26 to determine a correct pose of the AV 26 in the world coordinate system. Assume that the pose of the RSU 14 is Pr and the pose of the OBU 12 of the AV 26 is Po, with both Pr and Po in the world coordinate system. Then, it can be seen that T×Pr=Po and, as Pr is known, the correct value of the pose Po of the AV OBU 12 in the world coordinate system can be obtained.
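For illustration only, and assuming poses are represented as 4x4 homogeneous matrices in the world coordinate system, the corrected pose follows directly from the relation above:

```python
import numpy as np

def corrected_av_pose(T, P_r):
    """Apply the registration result T to the known RSU pose P_r (both 4x4
    homogeneous matrices) to recover the corrected pose Po of the AV OBU in
    the world coordinate system, per the relation T x Pr = Po."""
    return T @ P_r
```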
After the step 48 of using the transformation matrix T to determine a correct pose of the AV 26 in the world coordinate system, the method 40 preferably includes a step 49 of adding the truth value Po into the constraints of SLAM. This truth constraint can be used to correct all previous positions estimated by the AV OBU 12 by correcting any deviations accumulated in the previous SLAM processes through a BA mechanism. The matched frame of point cloud data can be used as a key frame of point cloud data for subsequent SLAM processes. Preferably, some RSU generated or collected point cloud data is added to the key frame to provide an enhanced key frame. The enhanced key frame combines point cloud data from both the AV LiDAR sensor 13 and from some or all of the RSU LiDAR sensors 15 to provide an enhanced key frame with much denser data points than would be the case with known and traditional SLAM processes. The enhanced key frame enables the AV OBU 12 to perceive more road environment features, including covering or at least reducing blind spots of the AV LiDAR sensor 13. The enhanced key frame also assists with future tracking of the AV 26 and implementing the BA mechanism.
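A minimal sketch of how such an enhanced key frame could be assembled is given below, assuming the RSU points are brought into the AV's coordinate system with the registration result T before concatenation; the helper name is hypothetical.

```python
import numpy as np

def build_enhanced_keyframe(av_points, rsu_points, T):
    """Transform the RSU-collected points into the AV coordinate system using
    the registration result T (4x4) and concatenate them with the AV LiDAR
    frame to form a denser, enhanced key frame for subsequent SLAM steps."""
    ones = np.ones((rsu_points.shape[0], 1))
    rsu_homog = np.hstack([rsu_points, ones])       # (N, 4) homogeneous points
    rsu_in_av = (T @ rsu_homog.T).T[:, :3]          # apply T, drop the homogeneous 1s
    return np.vstack([av_points, rsu_in_av])
```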
The invention also provides an OBU for an AV operating in a SLAM environment, the OBU comprising a memory for storing machine-readable instructions and a processor for executing said machine-readable instructions to implement the methods of the invention.
The invention also provides an RSU operating in a SLAM environment, the RSU comprising a memory for storing machine-readable instructions and a processor for executing said machine-readable instructions configuring the RSU to implement the methods of the invention.
Whilst the preferred embodiments of the invention have been described with respect to an AV operating in a road network environment, it will be understood that the methods of the invention may be applied to moving objects in other environments.
Generally speaking, the size of one frame of point cloud will be around 500 KB to 2 MB and the V2X services may need a processing frequency of more than 10 Hz to process the point cloud data. The system of the present invention is advantageous in that it significantly reduces the amount of road-side point cloud data to around 100 KB to 500 KB for transmission with a cooperative arrangement of infrastructure of about 2 to 3 lamp poles as RSUs, and further reduces the size of blind area data to around 10 KB to 50 KB. With such a reduction in size of data transmission, the present invention can be utilized to support collaboration of multiple agents to cover a larger area in real-time.
The modules, units, and devices as described above may be implemented at least in part in software. Those skilled in the art will appreciate that the above-described modules, units, and devices may be implemented at least in part using general-purpose computer equipment or using bespoke equipment.
Here, aspects of the method and system described herein can be executed on any system or apparatus comprising the road monitoring system. Program aspects of the technology can be thought of as “products” or “articles of manufacture” typically in the form of executable code and/or associated data that is carried on or embodied in a type of machine readable medium. “Storage” type media include any or all of the memory of the mobile stations, computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives, and the like, which may provide storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunications networks. Such communications, for example, may enable loading of the software from one computer or processor into another computer or processor. Thus, another type of media that may bear the software elements includes optical, electrical, and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links, or the like, also may be considered as media bearing the software. As used herein, unless restricted to tangible non-transitory “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
While the invention has been illustrated and described in detail in the drawings and foregoing description, the same is to be considered as illustrative and not restrictive in character, it being understood that only exemplary embodiments have been shown and described and do not limit the scope of the invention in any manner. It can be appreciated that any of the features described herein may be used with any embodiment. The illustrative embodiments are not exclusive of each other or of other embodiments not recited herein. Accordingly, the invention also provides embodiments that comprise combinations of one or more of the illustrative embodiments described above. Modifications and variations of the invention as herein set forth can be made without departing from the spirit and scope thereof, and, therefore, only such limitations should be imposed as are indicated by the appended claims.
In the claims which follow and in the preceding description of the invention, except where the context requires otherwise due to express language or necessary implication, the word “comprise” or variations such as “comprises” or “comprising” is used in an inclusive sense, i.e., to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention.
It is to be understood that, if any prior art publication is referred to herein, such reference does not constitute an admission that the publication forms a part of the common general knowledge in the art.