The present disclosure relates to a K-nearest neighbors search accelerator for a large-scale point cloud.
As an important step in many lidar algorithms, K-nearest neighbors search is widely used in simultaneous localization and mapping algorithms to correct localization drift error. It is also used in user-side re-localization and other related algorithms, which match a single frame of lidar point cloud against an entire point cloud map. Although the K-nearest neighbors search algorithm has a relatively simple structure, it occupies approximately 80% of the time in the matching process, because a large-scale point cloud map requires a large number of query operations [1].
Given the real-time requirement of simultaneous localization and mapping algorithms in complex outdoor scenarios, as well as the strict battery constraints of autonomous vehicles, developing an energy-efficient algorithm for fast K-nearest neighbors search is a major challenge.
To improve the performance and reduce the power consumption of K-nearest neighbors search, researchers have explored several directions. Works such as [2] and [3] propose parallel, pipelined K-nearest neighbors search algorithms based on tree data structures. However, these achieve high search speed only on small-scale point cloud maps: constructing a KD-tree for a large-scale point cloud map takes too long to be acceptable for an autonomous vehicle. In [4], a double-segmentation-voxel-structure (DSVS)-based hardware accelerator for K-nearest neighbors search is proposed, which can quickly build its search structure over a large-scale point cloud map. This accelerator adaptively sets a voxel side length, segments the space containing the point cloud into voxels, and further segments each dense voxel into sub-voxels. When dealing with massive and unevenly distributed point clouds, a DSVS can narrow the search area to nearby voxels and sub-voxels. However, because of slow data transmission and redundant search areas, the DSVS is slow and inefficient on a large-scale point cloud map. In addition, most K-nearest neighbors implementations, such as [2] and [4], can only handle medium-sized point cloud maps of up to approximately 400,000 points, and are therefore suitable only for inter-frame matching or local map matching. Searching a large-scale point cloud map raises problems such as long construction time for the search structure, long point cloud transmission time, complex search operations, and large search areas.
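The double segmentation described above can be illustrated with a minimal Python sketch. It assumes a fixed voxel side length (the adaptive choice in [4] is not modeled), a hypothetical point-count threshold `dense_threshold` above which a voxel counts as dense, and a hypothetical split of each dense voxel into `m`×`m`×`m` sub-voxels; the function name `build_dsvs` is illustrative only.

```python
import math
from collections import defaultdict

def build_dsvs(points, side, dense_threshold, m):
    """Sketch of a two-level voxel structure (hypothetical parameters).

    Returns a dict mapping each occupied voxel key (ix, iy, iz) either to a
    list of its points (sparse voxel) or to a dict mapping sub-voxel keys
    (sx, sy, sz) to point lists (dense voxel split into m*m*m sub-voxels).
    """
    voxels = defaultdict(list)
    for p in points:
        voxels[tuple(math.floor(c / side) for c in p)].append(p)

    sub_side = side / m
    structure = {}
    for key, pts in voxels.items():
        if len(pts) <= dense_threshold:
            structure[key] = pts  # sparse voxel: keep as a flat list
            continue
        subs = defaultdict(list)
        for p in pts:
            # offset inside the voxel, clamped to guard against rounding at the edges
            sub_key = tuple(
                min(max(int((c - k * side) // sub_side), 0), m - 1)
                for c, k in zip(p, key)
            )
            subs[sub_key].append(p)
        structure[key] = dict(subs)
    return structure
```

Only dense voxels pay for the second level of segmentation, which is what lets such a structure adapt to uneven point density.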
The technical problems to be solved by the present disclosure are the redundant search areas and slow data transmission encountered when searching a large-scale point cloud map.
To solve the above technical problems, the present disclosure provides a fast and energy-efficient K-nearest neighbors search accelerator for a large-scale point cloud. An NSVS framework is constructed to perform the search based on a DSVS search structure, and a K-nearest neighbors search algorithm for a large-scale point cloud map is implemented on a field programmable gate array (FPGA). The fast and energy-efficient K-nearest neighbors search accelerator is configured for:
Preferably, the voxel and the sub-voxel are respectively indexed by using a hash value and a sub_hash value.
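The disclosure does not specify the hash functions, so the following sketch assumes one simple choice: the hash value is a linear index over a bounded voxel grid with dimensions `dims` and origin `origin`, and the sub_hash value is a linear index over an assumed `m`×`m`×`m` split of the voxel. Sorting points by the pair (hash, sub_hash) then places points of the same voxel and sub-voxel next to each other. All function names are hypothetical.

```python
import math

def voxel_hash(p, origin, side, dims):
    """Hypothetical hash value: linear index of the voxel holding p.

    Assumes p lies inside the grid of dims = (nx, ny, nz) voxels starting at origin.
    """
    nx, ny, _ = dims
    ix, iy, iz = (math.floor((c - o) / side) for c, o in zip(p, origin))
    return ix + nx * (iy + ny * iz)

def sub_hash(p, origin, side, m):
    """Hypothetical sub_hash value: linear index of p's sub-voxel inside its voxel."""
    s = []
    for c, o in zip(p, origin):
        t = (c - o) / side
        frac = t - math.floor(t)  # position inside the voxel, in [0, 1)
        s.append(min(int(frac * m), m - 1))
    sx, sy, sz = s
    return sx + m * (sy + m * sz)

def sort_by_voxel(points, origin, side, dims, m):
    """Order points so each voxel, and each sub-voxel within it, is contiguous."""
    return sorted(points, key=lambda p: (voxel_hash(p, origin, side, dims),
                                         sub_hash(p, origin, side, m)))
```

A linear index is collision-free on a bounded grid; a hashed table would be needed instead if the map extent were unbounded.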
Preferably, there are at most 17 search voxels or search sub-voxels in total.
Preferably, in the second function module, when the candidate neighboring points of the query point have already been stored in the data reuse buffer during the search for a previous query point, the candidate neighboring points are obtained directly from the data reuse buffer.
Preferably, the initiation interval of the first function module, the second function module and the third function module is optimized to one cycle, and the data channels between the first function module, the second function module and the third function module are implemented as streams to enable a task-level pipeline.
The present disclosure proposes an FPGA implementation of a DSVS-based, energy-efficient and fast K-nearest neighbors search algorithm for a large-scale point cloud map. Compared with the prior art, the present disclosure has the following innovative points:
Experimental results on the KITTI dataset show that the K-nearest neighbors search accelerator proposed in the present disclosure searches 9.1 times faster than a state-of-the-art FPGA implementation. The solution of the present disclosure also achieves the best energy efficiency: the proposed accelerator is 11.5 times and 13.5 times more energy-efficient than state-of-the-art FPGA and GPU implementations, respectively.
The present disclosure will be further described in detail below in connection with specific embodiments. It should be understood that these embodiments are only intended to describe the present disclosure, rather than to limit its scope. In addition, it should be understood that those skilled in the art may make various changes or modifications to the present disclosure after reading its content, and these equivalent forms also fall within the scope defined by the appended claims of the present disclosure.
To search for K-nearest neighbors quickly in a large-scale point cloud map, the NSVS framework proposed in the present disclosure includes two parts: constructing a DSVS search structure, and searching for K-nearest neighbors based on the DSVS.
In a first part, the DSVS search structure is constructed.
As shown in
In a second part, the K-nearest neighbors are searched for.
The embodiments of the present disclosure first locate query point q by calculating its hash value and sub_hash value. Next, the embodiments of the present disclosure reduce the search area to several voxels and sub-voxels by using the proposed NSVS technique, as shown by a dark area in
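Once the search area has been reduced, the remaining work is to keep the K closest candidates. The sketch below is a plain-Python illustration, not the hardware design: it processes candidate points one at a time, as they would arrive from the search area, and maintains a bounded max-heap of the current K best by squared Euclidean distance. The function name `knn_stream` is hypothetical.

```python
import heapq

def knn_stream(q, candidates, k):
    """Return the k candidates nearest to q, nearest first (squared Euclidean distance)."""
    if k <= 0:
        return []
    heap = []  # max-heap emulated with negated distances: (-d2, arrival index, point)
    for i, p in enumerate(candidates):
        d2 = sum((a - b) ** 2 for a, b in zip(p, q))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, i, p))
        elif d2 < -heap[0][0]:
            # closer than the current K-th nearest: replace it
            heapq.heapreplace(heap, (-d2, i, p))
    # sort by ascending distance, ties broken by arrival order
    return [p for _, _, p in sorted(heap, key=lambda t: (-t[0], t[1]))]
```

Comparing squared distances avoids a square root per candidate without changing the ranking.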
The embodiments of the present disclosure implement the NSVS framework on a heterogeneous system. This heterogeneous system has a powerful processing system (PS, i.e., a CPU) and programmable logic (PL), as shown in
A K-nearest neighbors search accelerator mainly has the following four parts:
The data reuse rate is defined as the ratio of the number of query points to the number of reference points. When the data reuse rate exceeds a threshold, the reference set is accessed sequentially through the data reuse buffer. When the data reuse rate is below the threshold, the reference set is accessed directly and randomly through a plurality of large-bit-width ports. For a small reference set of at most 400,000 points, the query points are densely distributed within the reference set, which yields a high data reuse rate. In this case, the embodiments of the present disclosure use a customized data reuse buffer. After the point cloud is sorted, points in the same voxel or sub-voxel are placed sequentially in physically contiguous memory. Therefore, cyclic partitioning is performed on the block RAM of the FPGA so that the points in the same voxel or sub-voxel can be accessed simultaneously. The data reuse buffer is continuously updated as the query point changes, so that the points around the current query point are always in the data reuse buffer. If a candidate neighbor of the query point has already been stored in the data reuse buffer during the search for a previous query point, the candidate neighbor can be obtained directly from the data reuse buffer. This reduces the total amount of data transmitted from the external memory.
However, when the data reuse rate is low, the query points may be sparsely distributed, so adjacent query points share few candidate neighbors. Therefore, for a super-large reference set of up to 6 million points, the embodiments of the present disclosure obtain candidate neighbors from the external memory in a random access mode. Because the candidate neighbors of a query point make up only a small portion of the reference set, the other reference points need not be transmitted from the external memory to the accelerator.
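Why cyclic partitioning helps can be checked with a small model. Under cyclic partitioning with a factor P, element i of an array goes to bank i mod P, so any P consecutive elements sit in P different banks. Because sorting stores the points of a voxel or sub-voxel consecutively, P of them can then be read in the same cycle. The sketch assumes one read per bank per cycle; the factor values and function names are illustrative, not taken from the disclosure.

```python
from collections import Counter

def bank_of(index, factor):
    """Cyclic partitioning: element `index` goes to bank index % factor at offset index // factor."""
    return index % factor, index // factor

def cycles_to_read(start, count, factor):
    """Cycles to read `count` consecutive elements from `start`, one read per bank per cycle."""
    per_bank = Counter(bank_of(i, factor)[0] for i in range(start, start + count))
    return max(per_bank.values(), default=0)
```

For example, a voxel of 8 points stored from address 5 needs `cycles_to_read(5, 8, 4) == 2` cycles with 4 banks, versus 8 cycles through a single unpartitioned memory.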
In this way, the solutions disclosed in the embodiments of the present disclosure automatically select the faster transmission mode based on the data reuse rate, which makes the present disclosure more robust across different datasets.
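The transmission-mode decision and the reuse buffer can be sketched in software as follows. The threshold value, the voxel-level granularity of the buffer, its capacity, and the least-recently-used eviction policy are all assumptions made for illustration; the disclosure only states that the buffer is updated as the query point moves so that points around the current query point stay resident. The names `select_mode` and `ReuseBuffer` are hypothetical.

```python
from collections import OrderedDict

def select_mode(n_query, n_ref, threshold):
    """Pick a transmission mode from the data reuse rate (query points / reference points)."""
    rate = n_query / n_ref
    # above the threshold: sequential access through the reuse buffer;
    # otherwise (equality treated as low here, an assumption): random access
    return "reuse_buffer" if rate > threshold else "random_access"

class ReuseBuffer:
    """Voxel-granular buffer in front of external memory (LRU eviction assumed)."""

    def __init__(self, fetch, capacity):
        self.fetch = fetch          # callable: voxel key -> points read from external memory
        self.capacity = capacity    # number of voxels the buffer can hold
        self.store = OrderedDict()
        self.external_reads = 0

    def get(self, key):
        if key in self.store:       # already loaded for an earlier query: reuse it
            self.store.move_to_end(key)
            return self.store[key]
        points = self.fetch(key)    # miss: transfer from external memory
        self.external_reads += 1
        self.store[key] = points
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # drop the least recently used voxel
        return points
```

With densely distributed queries, consecutive queries request mostly the same voxels, so `external_reads` grows much more slowly than the number of requests.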
The embodiments of the present disclosure use different strategies to reduce the search range depending on the position of the query point. When the query point is located in a sub-voxel, in other words, in a dense area with many reference points, the present disclosure significantly reduces the search area.
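The neighborhood selection can be sketched under stated assumptions. Following the rule this description gives for the NSVS, the search area is the 3×3×3 block of voxels around the query point, and in an adjacent voxel that has been split into sub-voxels only one sub-voxel is kept. Which sub-voxel counts as nearest is interpreted here as the one containing the point of that adjacent voxel closest to q (q clamped into the voxel); the query's own voxel is kept whole; and an `m`×`m`×`m` split is assumed. These choices, and the name `search_cells`, are hypothetical.

```python
import itertools
import math

def search_cells(q, side, m, subdivided):
    """Return (voxel_key, sub_key) pairs to search; sub_key None means the whole voxel.

    subdivided: set of voxel keys that were split into m*m*m sub-voxels.
    """
    home = tuple(math.floor(c / side) for c in q)
    cells = []
    for off in itertools.product((-1, 0, 1), repeat=3):
        key = tuple(h + o for h, o in zip(home, off))
        if off == (0, 0, 0) or key not in subdivided:
            cells.append((key, None))   # search the whole voxel
            continue
        sub = []
        for c, k in zip(q, key):
            lo = k * side
            clamped = min(max(c, lo), lo + side)  # closest coordinate inside this voxel
            sub.append(min(int((clamped - lo) / side * m), m - 1))
        cells.append((key, tuple(sub)))  # only the sub-voxel facing the query
    return cells
```

Because each subdivided neighbor contributes exactly one sub-voxel, this sketch always returns 27 cells, unlike a radius-based selection whose sub-voxel count varies from query to query.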
The candidate neighboring points may come from the data reuse buffer or directly from the external memory, depending on the data reuse rate.
The embodiments of the present disclosure optimize the initiation interval of the three hardware functions to one cycle and implement the data channels between the functions as streams, which enables a task-level pipeline.
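The dataflow between the three functions can be mimicked in software with chained Python generators, each of which hands one query at a time to the next stage, much as a stream channel does. This is only an analogy: generators run the stages in turn rather than concurrently, and the roles given to the three stages here (search-area selection, candidate fetching, K-nearest selection) are an assumption, as are the simplified single-level voxel map and all function names.

```python
import heapq
import math
from collections import defaultdict
from itertools import product

def voxel_key(p, side):
    return tuple(math.floor(c / side) for c in p)

def stage_area(queries, side):
    """Stage 1 (assumed role): emit each query with the keys of its 3x3x3 neighborhood."""
    for q in queries:
        home = voxel_key(q, side)
        yield q, [tuple(h + o for h, o in zip(home, off))
                  for off in product((-1, 0, 1), repeat=3)]

def stage_fetch(stream, voxels):
    """Stage 2 (assumed role): gather the candidate points of those voxels."""
    for q, keys in stream:
        yield q, [p for k in keys for p in voxels.get(k, [])]

def stage_select(stream, k):
    """Stage 3 (assumed role): keep the k candidates nearest to q."""
    for q, cands in stream:
        yield q, heapq.nsmallest(k, cands,
                                 key=lambda p: sum((a - b) ** 2 for a, b in zip(p, q)))

def run_pipeline(queries, points, side, k):
    voxels = defaultdict(list)
    for p in points:
        voxels[voxel_key(p, side)].append(p)
    # each stage pulls items from the previous one as they are produced
    return list(stage_select(stage_fetch(stage_area(queries, side), voxels), k))
```

No stage materializes the full intermediate result for all queries, which is the property that lets hardware stages overlap once each accepts a new item every cycle.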
For example, in the embodiments of the present disclosure, the search area is limited to a total of 3×3×3=27 voxels around the query point. In the DSVS data structure, the number of selected sub-voxels is difficult to control, because all sub-voxels within a search sphere of radius Rin are selected. This uncertainty in the number of sub-voxels makes hardware implementation more difficult. Therefore, in the NSVS, if an adjacent voxel is further divided into sub-voxels, the embodiments of the present disclosure select only the sub-voxel nearest to the voxel containing the query point. The reasons are as follows:
There are two scenarios when the closest sub-voxel is selected:
In order to further achieve a load balance for the hardware implementation, as shown in
The above solutions can be applied to the K-nearest neighbors search step in the simultaneous localization and mapping process of autonomous vehicles or robots. The point cloud may consist of lidar data. For different types of datasets, the solutions provided in the present disclosure can complete the K-nearest neighbors search task accurately and quickly. FPGA acceleration gives the entire algorithm better real-time performance and lower energy consumption.
Number | Date | Country | Kind |
---|---|---|---|
202410044699.5 | Jan 2024 | CN | national |
This application is the continuation application of International Application No. PCT/CN2024/118960, filed on Sep. 14, 2024, which is based upon and claims priority to Chinese Patent Application No. 202410044699.5, filed on Jan. 11, 2024, the entire contents of which are incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
11971258 | Liu | Apr 2024 | B2 |
20100106713 | Esuli et al. | Apr 2010 | A1 |
20220148281 | Sun | May 2022 | A1 |
20230195803 | Ahn | Jun 2023 | A1 |
Number | Date | Country |
---|---|---|
111860340 | Oct 2020 | CN |
112287185 | Jan 2021 | CN |
114240729 | Mar 2022 | CN |
117788591 | Mar 2024 | CN |
Entry |
---|
Ji Zhang, et al., Low-drift and Real-time Lidar Odometry and Mapping, Autonomous Robots, 2017, pp. 1-17. |
Yiming Li, et al., A Knn Accelerator Based on Approximate K-D Tree for ICP, 2022 International Conference on Image Processing and Media Computing (ICIPMC), 2022, pp. 124-128. |
Faquan Chen, et al., ParallelNN: A Parallel Octree-based Nearest Neighbor Search Accelerator for 3D Point Clouds, 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023, pp. 403-414. |
Hao Sun, et al., Efficient FPGA Implementation of K-Nearest-Neighbor Search Algorithm for 3D LIDAR Localization and Mapping in Smart Vehicles, IEEE Transactions on Circuits and Systems II: Express Briefs, 2020, pp. 1644-1648, vol. 67, No. 9. |
Number | Date | Country | |
---|---|---|---|
Parent | PCT/CN2024/118960 | Sep 2024 | WO |
Child | 18985065 | US |