The present application relates to autonomous driving and high-resolution map creation. More specifically, the present application relates to systems and methods for registering point clouds based on semantic information and multi-resolution segmentation in autonomous driving and/or creating high-resolution maps.
Point cloud registration is an important process in applications such as high-resolution map creation and autonomous driving. In a typical point cloud registration process, a source point cloud is registered to a target point cloud such that the two point clouds align or match each other. Current methods register point clouds in an iterative process, in which the pose (e.g., position and orientation) of the source point cloud is iteratively changed toward an optimal value (e.g., toward the pose of the target point cloud), starting from an initial estimation. However, existing methods are sensitive to the accuracy of the initial estimation. If the initial estimation is not sufficiently accurate, the iteration process may converge to a local optimum, failing to reach the globally optimal solution. In addition, some existing methods require division of the point cloud space into a grid of cells, and the performance of the registration relies on the size of the cells (also referred to as resolution). If the resolution is too low (cells are too big), the performance of the registration is poor. On the other hand, if the resolution is too high, the computational cost is also high, leading to low efficiency. The problem of balancing performance, efficiency, and the choice of resolution remains unaddressed. Moreover, existing methods perform the registration process over the entire set of points in the point clouds without distinguishing among different kinds of objects, often yielding unsatisfactory results.
Embodiments in the present disclosure address the aforementioned problems by providing systems and methods for registering point clouds based on semantic information and multi-resolution segmentation.
In one aspect, a system for registering point clouds is provided. The system may include a memory storing computer-executable instructions and at least one processor communicatively coupled to the memory. The computer-executable instructions, when executed by the processor, may cause the processor to perform operations. The operations may include parsing semantic information from a source point cloud and a target point cloud. The operations may also include segmenting points in the source point cloud into first and second groups based on the semantic information parsed from the source point cloud. The operations may also include segmenting points in the target point cloud into third and fourth groups based on the semantic information parsed from the target point cloud. The operations may further include determining an initial pose of the source point cloud by registering the first group of points in the source point cloud to the third group of points in the target point cloud according to a first resolution. In addition, the operations may include adjusting the initial pose of the source point cloud by registering the second group of points in the source point cloud to the fourth group of points in the target point cloud according to a second resolution. The second resolution may be different from the first resolution.
In another aspect, a method for registering point clouds is provided. The method may include parsing semantic information from a source point cloud and a target point cloud. The method may also include segmenting points in the source point cloud into first and second groups based on the semantic information parsed from the source point cloud. The method may also include segmenting points in the target point cloud into third and fourth groups based on the semantic information parsed from the target point cloud. The method may further include determining an initial pose of the source point cloud by registering the first group of points in the source point cloud to the third group of points in the target point cloud according to a first resolution. In addition, the method may include adjusting the initial pose of the source point cloud by registering the second group of points in the source point cloud to the fourth group of points in the target point cloud according to a second resolution. The second resolution may be different from the first resolution.
In a further aspect, a non-transitory computer-readable medium storing instructions is provided. The instructions, when executed by at least one processor, may cause the processor to perform a method for registering point clouds. The method may include parsing semantic information from a source point cloud and a target point cloud. The method may also include segmenting points in the source point cloud into first and second groups based on the semantic information parsed from the source point cloud. The method may also include segmenting points in the target point cloud into third and fourth groups based on the semantic information parsed from the target point cloud. The method may further include determining an initial pose of the source point cloud by registering the first group of points in the source point cloud to the third group of points in the target point cloud according to a first resolution. In addition, the method may include adjusting the initial pose of the source point cloud by registering the second group of points in the source point cloud to the fourth group of points in the target point cloud according to a second resolution. The second resolution may be different from the first resolution.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
Light detection and ranging (LiDAR) devices have been widely used in applications such as high-resolution map creation and vehicle self-localization in autonomous driving. For example, a LiDAR device may be equipped on a survey vehicle to collect three-dimensional (3D) data of roads as well as surrounding environment as the survey vehicle is travelling along a trajectory. The collected 3D data may be in the form of point clouds, e.g., a set of points indicating the spatial locations on object surfaces that reflect laser beams emitted by the LiDAR device. Because the range of a LiDAR device is finite, a point cloud resulting from a LiDAR scan may include points within a limited space surrounding the survey vehicle. As the survey vehicle travels along a road, the LiDAR device may perform multiple scans, generating multiple point clouds, which may be combined to create a larger point cloud. A combined point cloud over an extended area may serve as an important part of a high-resolution map. Combining multiple point clouds often involves matching or aligning one point cloud to another (e.g., adjacent) point cloud, a process often referred to as point cloud registration.
In autonomous driving applications, a self-driving vehicle may sense the road conditions and surrounding environment using a LiDAR device, which may generate 3D information in the form of a point cloud (e.g., a source point cloud). The point cloud may be compared with a reference point cloud (e.g., a target point cloud) in a high-resolution map to determine the pose (e.g., position and orientation) of the self-driving vehicle, thereby providing, for example, high-precision self-localization information to aid self-driving decision making. The comparison may include matching or aligning the point cloud obtained by the LiDAR device (e.g., the source point cloud) to the corresponding reference point cloud (e.g., the target point cloud) in the high-resolution map, which is also a point cloud registration process.
Current point cloud registration methods such as iterative closest point (ICP) and normal-distributions transform (NDT) use an iterative approach, which starts from an initial pose estimation and iteratively changes the pose toward an optimal solution. However, existing methods are sensitive to the initial pose estimation and susceptible to errors in that estimation. For example, when the initial pose estimation is not sufficiently accurate, the iteration process may become trapped at a locally optimal solution, failing to reach the globally optimal one. Moreover, in NDT, in order to compute the normal distributions, the point cloud space is divided into a grid of cells. The size of the cells, referred to as the spatial resolution or simply resolution, has a significant impact on the performance of the registration. If the resolution is too low (cells are too big), the precision of the registration is poor, leading to low performance. On the other hand, if the resolution is too high, the computational cost is also high, leading to low efficiency. The problem of balancing performance, efficiency, and the choice of resolution remains unaddressed. Further, existing methods perform the registration process over the entire set of points in the point clouds without distinguishing among different kinds of objects, often yielding unsatisfactory results.
Embodiments of the present disclosure provide improved systems and methods for registering point clouds, utilizing semantic information contained in the point clouds to adaptively choose the resolution of the spatial grid. For example, embodiments of the present disclosure can parse the semantic information from source and target point clouds by classifying points into categories using a trained classifier and associating semantic labels with points in the categories. Based on the semantic information, points in the point clouds can be segmented into different groups (e.g., based on common properties such as size, shape, curvature, etc.). Then, a lower resolution is used to generate a relatively coarse spatial grid to determine an initial pose of the source point cloud by registering one group of segmented points in the source point cloud to the corresponding group of points in the target point cloud. The initial pose can be refined and/or adjusted by applying a higher-resolution (e.g., relatively dense) spatial grid to register another group of segmented points in the source point cloud to the corresponding group of segmented points in the target point cloud. In this way, the initial pose can be obtained more efficiently than with conventional methods because a relatively coarse spatial grid with a relatively low resolution is used to register a subset of points (e.g., corresponding to objects having larger size or less detail) segmented based on semantic information. Highly precise and/or accurate registration can be obtained by refining/adjusting the initial pose using a relatively dense spatial grid with a relatively high resolution to register another subset of points (e.g., corresponding to objects having smaller size or finer details). Embodiments of the present disclosure can also improve the robustness of point cloud registration because the initial pose determination using a low-resolution spatial grid is less sensitive and susceptible to estimation errors and less likely to cause the iteration process to become trapped in local optima compared to conventional methods.
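For illustration, the overall flow described above can be outlined in code. The following Python sketch is illustrative only; parse_semantics, segment_by_labels, and register_ndt are hypothetical helper names (sketched further in the detailed description below), not functions defined by this disclosure:

```python
import numpy as np

def register_point_clouds(source: np.ndarray, target: np.ndarray,
                          r1: float, r2: float) -> np.ndarray:
    """Coarse-to-fine registration driven by semantic segmentation.

    r1: coarse cell size (big cells, low resolution) for the initial pose.
    r2: fine cell size (small cells, high resolution) for refinement.
    Returns a pose vector [tx, ty, tz, roll, pitch, yaw].
    """
    # Parse per-point semantic labels from both clouds (hypothetical helper).
    src_labels = parse_semantics(source)
    tgt_labels = parse_semantics(target)

    # Segment each cloud into "large-object" and "small-object" groups.
    large_src, small_src = segment_by_labels(source, src_labels)
    large_tgt, small_tgt = segment_by_labels(target, tgt_labels)

    # Coarse stage: register large objects on a coarse grid.
    initial_pose = register_ndt(large_src, large_tgt, cell_size=r1,
                                init=np.zeros(6))

    # Fine stage: refine with small/detailed objects on a fine grid,
    # starting from the coarse result.
    refined_pose = register_ndt(small_src, small_tgt, cell_size=r2,
                                init=initial_pose)
    return refined_pose
```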
Embodiments of the present disclosure may be implemented using hardware, software, firmware, or any combination thereof. Components of the embodiments can reside in a cloud computing environment, one or more servers, one or more terminal devices, or any combination thereof. In some cases, at least part of the point cloud registration system disclosed herein may be integrated with or equipped as an add-on device to a vehicle. For example,
As illustrated in
In some embodiments, LiDAR device 140 may be configured to capture data as vehicle 100 moves along a trajectory. For example, LiDAR device 140 may be configured to scan the surroundings and acquire point clouds. LiDAR measures the distance to a target object by illuminating the target object with pulsed laser beams and measuring the reflected pulses with a photodetector. Differences in laser return times, phases, or wavelengths can then be used to calculate distance information (also referred to as "range information") and construct digital 3D representations of the target object (e.g., a point cloud). The laser light used for a LiDAR scan may be ultraviolet, visible, or near infrared. As vehicle 100 moves along the trajectory, LiDAR device 140 may acquire a series of point clouds at multiple time points, which may be used to construct a high-definition map or facilitate autonomous driving.
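The time-of-flight relationship underlying this measurement is straightforward: with c the speed of light and Δt the round-trip delay between an emitted pulse and its detected reflection, the range is

$$d = \frac{c\,\Delta t}{2}, \qquad c \approx 3\times10^{8}\ \mathrm{m/s}.$$

For example, a reflection detected 400 ns after emission corresponds to a range of about 60 m.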
Memory 230 may be configured to store computer-executable instructions that, when executed by at least one processor (e.g., processor 210), can cause the at least one processor to perform various operations disclosed herein. Memory 230 may be any non-transitory type of mass storage, such as volatile or non-volatile, magnetic, semiconductor-based, tape-based, optical, removable, non-removable, or other type of storage device or tangible computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM.
Processor 210 may be configured to perform the operations in accordance with the computer-executable instructions stored on memory 230. Processor 210 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, microcontroller, or the like. Processor 210 may be configured as a separate processor module dedicated to performing one or more specific operations. Alternatively, processor 210 may be configured as a shared processor module for performing other operations unrelated to the one or more specific operations disclosed herein. As shown in
Communication interface 220 may be configured to communicate information between system 200 and other devices or systems. For example, communication interface 220 may include an integrated services digital network (ISDN) card, a cable modem, a satellite modem, or a modem to provide a data communication connection. As another example, communication interface 220 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN. As a further example, communication interface 220 may include a high-speed network adapter such as a fiber optic network adapter, a 10G Ethernet adapter, or the like. Wireless links can also be implemented by communication interface 220. In such an implementation, communication interface 220 can send and receive electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information via a network. The network can typically include a cellular communication network, a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), or the like.
In some embodiments, communication interface 220 may communicate with a database 250 to exchange information related to point cloud registration. Database 250 may include any appropriate type of database, such as a computer system installed with a database management software. Database 250 may store high-resolution map data, target point cloud data, source point cloud data generated by LiDAR device 140, pose information generated by system 200, training data sets, or any data related to point cloud registration.
In some embodiments, communication interface 220 may communicate with an output device, such as a display 260. Display 260 may include a display device such as a Liquid Crystal Display (LCD), a Light Emitting Diode Display (LED), a plasma display, or any other type of display, and provide a Graphical User Interface (GUI) presented on the display for user input and data depiction. For example, pose information, source and/or target point cloud image rendering, a high-resolution map, a navigation interface, or any other information related to point cloud registration may be displayed on display 260.
In some embodiments, communication interface 220 may communicate with a terminal device 270. Terminal device 270 may include any suitable device that can interact with a user and/or vehicle 100. For example, terminal device 270 may include LiDAR device 140, a desktop computer, a laptop computer, a smart phone, a tablet, a wearable device, a vehicle on-board computer, an autonomous driving computer, or any kind of device having computational capability sufficient to support collecting, processing, or storing point cloud data, pose information, autonomous driving information, or the like.
Regardless of which devices or systems are communicatively coupled to system 200 through communication interface 220, communication interface 220 may receive a source point cloud 280 and a target point cloud 282, and generate a pose 290 associated with source point cloud 280 such that applying pose 290 to source point cloud 280 would register source point cloud 280 to target point cloud 282. Pose 290 may include various types of information. For example, pose 290 may include linear spatial transformation or shifting, rotation along any suitable axis, panning, tilting, pitching, yawing, rolling, or any other suitable manner of spatial movement.
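Although the disclosure does not prescribe a data structure for pose 290, a common representation (shown here only as an assumed sketch, not a required implementation) packs the rotation and translation into a 4×4 homogeneous transform that can be applied directly to point coordinates:

```python
import numpy as np

def pose_to_matrix(yaw: float, pitch: float, roll: float,
                   t: np.ndarray) -> np.ndarray:
    """Build a 4x4 homogeneous transform from Z-Y-X Euler angles
    (radians) and a translation vector t = [tx, ty, tz]."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    R = np.array([
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ])
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = t
    return M

def apply_pose(points: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Apply the transform M to an (N, 3) array of points."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homo @ M.T)[:, :3]
```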
In some embodiments, processor 210 may be configured to receive source point cloud 280 from database 250. For example, database 250 may store multiple point clouds obtained by a survey vehicle (e.g., vehicle 100). The multiple point clouds may contain partially overlapping portions (e.g., adjacent point clouds obtained from consecutive scans using LiDAR device 140). These partially overlapping point clouds may be combined to create a high-resolution map. System 200 may be used to register one point cloud (used as a source point cloud) to another overlapping point cloud (used as a target point cloud) such that the two point clouds align with each other and the overlapping portions match each other. In this way, multiple point clouds can be combined and connected to form a larger point cloud covering an extended area, based on which a high-resolution map may be created. In some embodiments, combining multiple point clouds may be performed by system 200 onboard vehicle 100, such that multiple point clouds can be combined on the fly as new point cloud data are collected by LiDAR device 140. In this case, system 200 may receive source and target point cloud data from LiDAR device 140 instead of, or in addition to, database 250.
After receiving source point cloud 280 and target point cloud 282, processor 210 may, using one or more modules such as 212-218, register source point cloud 280 to target point cloud 282 to generate pose 290, which may be stored in memory 230 and/or sent to other devices/systems such as database 250, display 260, and terminal device 270. An exemplary workflow of registering source point cloud 280 to target point cloud 282 is illustrated in
In step 410, semantic information parser 212 may parse semantic information from source point cloud 280 and/or target point cloud 282. For example, point clouds 280/282 may include surface points on a variety of objects. Semantic information may be parsed from the types, kinds, and/or categories of the underlying objects on which the points lie. In some embodiments, semantic information may include category information of the objects, for example, cars, trucks, pedestrians, buildings, plants, road signs, street lights, traffic lights, etc. In some embodiments, semantic information may include the size, shape, and/or curvature of the objects, for example, straight lines, curved lines, planar surfaces, curvatures, large objects, small objects, etc. In some embodiments, semantic information may include movement information of the objects, for example, stationary objects, moving objects, etc. It is contemplated that semantic information is not limited to the above-mentioned examples. Rather, semantic information may include any cognizable information that may distinguish one kind of object from another.
In some embodiments, semantic information parser 212 may parse semantic information using a classifier. The classifier may be based on a learning model and may be trained with a training data set.
In step 510, the training processor may use a deep neural network such as PointNet to compute the semantic features. The semantic features may be in any computer-readable form, such as a collection of points, lines, surfaces, or shapes, with various levels of detail, as known in the field of neural networks. Based on the semantic features, the training processor may, in step 520, train the classifier to classify points into semantic categories, for example, by associating semantic features with semantic labels. In some embodiments, the classifier may be in the form of a neural network, such as a PointNet-based neural network with parameters trained on the training data set.
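As a concrete (and deliberately simplified) illustration of such a classifier, the sketch below trains a per-point shared multilayer perceptron in PyTorch. The layer sizes and label count are assumptions, and a full PointNet additionally max-pools a global feature that is concatenated to each point's local feature, which is omitted here for brevity:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 8  # assumed label set, e.g., building, plant, car, truck, ...

class PointwiseClassifier(nn.Module):
    """Minimal PointNet-style shared MLP: the same weights are applied
    to every point, producing per-point class logits."""
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, n_points, 3) -> logits: (batch, n_points, classes)
        return self.mlp(points)

def train_step(model, optimizer, points, labels):
    """One supervised update on a labeled batch (cf. step 520)."""
    optimizer.zero_grad()
    logits = model(points)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.shape[-1]), labels.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()
```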
After the classifier is trained, semantic information parser 212 may use the classifier to classify the points in the source and target point clouds.
It is contemplated that semantic labels and categories may or may not be the same. In some embodiments, associating semantic labels may be the same process as classifying points into categories. In such cases, sub-steps 412 and 414 may be combined into a single step. In some embodiments, associating semantic labels and classifying points into categories may be different processes. In such cases, sub-steps 412 and 414 may be separate steps.
In some embodiments, semantic information parser 212 may divide the source/target point clouds (280/282) into point blocks, each corresponding to a category or semantic label. In some embodiments, semantic information parser 212 may traverse each point in the source/target point clouds (280/282), and apply a category or semantic label to each point. In any case, after the points in the source and target point clouds (280 and 282) are categorized, classified, and/or associated with semantic labels, the semantic information parsing step 410 may finish.
Referring back to
In step 422, segmentation unit 214 may segment points associated with a first set of semantic labels into the first group in source point cloud 280. For example, the first set of semantic labels may correspond to objects having a first range of dimensions, such as buildings, plants, cars, and trucks. Segmentation unit 214 may segment points in source point cloud 280 associated with the first set of semantic labels into a "large-size" group in source point cloud 280 (e.g., denoted as C_large_src). Similarly, in step 426, segmentation unit 214 may segment points associated with the first set of semantic labels into the third group in target point cloud 282. For example, segmentation unit 214 may segment points in target point cloud 282 associated with buildings, plants, cars, trucks, or similar semantic labels into a "large-size" group in target point cloud 282 (e.g., denoted as C_large_tgt).
In step 424, segmentation unit 214 may segment points associated with a second set of semantic labels into the second group in source point cloud 280. For example, the second set of semantic labels may correspond to objects having a second range of dimensions smaller than the first range, such as pedestrians and artificial objects such as traffic lights, road signs, street marks, etc. Segmentation unit 214 may segment points in source point cloud 280 associated with the second set of semantic labels into a "small-size" group in source point cloud 280 (e.g., denoted as C_small_src). Similarly, in step 428, segmentation unit 214 may segment points associated with the second set of semantic labels into the fourth group in target point cloud 282. For example, segmentation unit 214 may segment points in target point cloud 282 associated with pedestrians, artificial objects such as traffic lights, road signs, street marks, or similar semantic labels into a "small-size" group in target point cloud 282 (e.g., denoted as C_small_tgt).
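Steps 422-428 amount to masking each cloud by label membership. A minimal sketch, assuming integer per-point label codes (the specific code values are illustrative assumptions):

```python
import numpy as np

# Illustrative label codes; the actual label set is implementation-specific.
LARGE_LABELS = [0, 1, 2, 3]   # e.g., building, plant, car, truck
SMALL_LABELS = [4, 5, 6, 7]   # e.g., pedestrian, traffic light, road sign, street mark

def segment_by_labels(points: np.ndarray, labels: np.ndarray):
    """Split an (N, 3) point array into 'large' and 'small' groups
    using per-point semantic labels of shape (N,)."""
    large = points[np.isin(labels, LARGE_LABELS)]
    small = points[np.isin(labels, SMALL_LABELS)]
    return large, small
```

Applied to source point cloud 280, this yields C_large_src and C_small_src; applied to target point cloud 282, it yields C_large_tgt and C_small_tgt.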
In step 430, registration unit 218 may determine an initial pose p_i of source point cloud 280 according to a first resolution provided by resolution selector 216. For example, registration unit 218 may determine the initial pose of source point cloud 280 by registering the first group of points in source point cloud 280 (e.g., C_large_src) to the third group of points in target point cloud 282 (e.g., C_large_tgt) according to a first resolution (e.g., r1).
In step 434, registration unit 218 may compute a local representation of points within a first target cell falling into the third group (e.g., group C_large_tgt). In some embodiments, the local representation may include at least one of a mean or a covariance of points in the first target cell. For example, for a target cell t (in the target point cloud 282, denoted by T) that falls within group C_large_tgt, registration unit 218 may compute the mean and covariance for the points within target cell t:

$$\vec{\mu} = \frac{1}{m}\sum_{k=1}^{m}\vec{y}_k, \qquad \Sigma = \frac{1}{m-1}\sum_{k=1}^{m}\left(\vec{y}_k-\vec{\mu}\right)\left(\vec{y}_k-\vec{\mu}\right)^{T} \tag{1}$$

where m is the number of points in t, y_k is the position (spatial) vector of the k-th point, μ is the mean, and Σ is the covariance. In some embodiments, registration unit 218 may compute local representations (e.g., means and covariances) for all target cells that fall within group C_large_tgt.
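Equation (1) can be evaluated per cell with a simple voxel hash. The sketch below, assuming numpy arrays and treating the selected resolution as the cubic cell size, is one straightforward realization:

```python
import numpy as np
from collections import defaultdict

def cell_statistics(points: np.ndarray, cell_size: float):
    """Bucket points into cubic cells of side cell_size and return
    {cell_index: (mean, covariance)} per equation (1)."""
    cells = defaultdict(list)
    for pt in points:
        idx = tuple(np.floor(pt / cell_size).astype(int))
        cells[idx].append(pt)
    stats = {}
    for idx, pts in cells.items():
        pts = np.asarray(pts)
        if len(pts) < 3:
            continue  # too few points for a stable covariance estimate
        mu = pts.mean(axis=0)                 # equation (1), mean
        sigma = np.cov(pts, rowvar=False)     # equation (1), sample covariance
        stats[idx] = (mu, sigma)
    return stats
```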
In step 436, based on the mean μ and covariance Σ, registration unit 218 may compute, for a point x in a source cell s (in the source point cloud 280, denoted by S) that falls within group C_large_src, the likelihood that point x also lies in target cell t:

$$p(\vec{x}) = D\exp\left(-\frac{(\vec{x}-\vec{\mu})^{T}\,\Sigma^{-1}\,(\vec{x}-\vec{\mu})}{2}\right) \tag{2}$$

where D is a normalization coefficient and p(x) is the probability function (e.g., indicating the likelihood) that point x also lies in target cell t. The larger the value of p(x), the more likely it is that point x lies in target cell t. In some embodiments, registration unit 218 may compute the probability functions for all points in source cell s.
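Equation (2) is an (unnormalized) multivariate Gaussian density. A sketch of its evaluation for a single point, with a small regularizer added so that near-degenerate cells (e.g., perfectly planar ones) remain invertible:

```python
import numpy as np

def point_likelihood(x: np.ndarray, mu: np.ndarray,
                     sigma: np.ndarray, d_coef: float = 1.0) -> float:
    """Evaluate equation (2): D * exp(-(x-mu)^T Sigma^{-1} (x-mu) / 2)."""
    diff = x - mu
    sigma_inv = np.linalg.inv(sigma + 1e-9 * np.eye(3))
    return d_coef * np.exp(-0.5 * diff @ sigma_inv @ diff)
```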
In step 438, registration unit 218 may register the first group of points (e.g., points in group C_large_src) in the source point cloud 280 to the third group of points (e.g., points in group C_large_tgt) in the target point cloud 282 by optimizing a collective likelihood that points within multiple source cells in the first group C_large_src also lie in corresponding target cells in the third group C_large_tgt. For example, the collective likelihood function can be represented as:

$$\Psi(\vec{p}) = \sum_{k=1}^{n} p\left(T(\vec{p},\vec{x}_k)\right) \tag{3}$$

wherein T(p, x_k) is a spatial transformation function that moves a point x_k in space by a pose p, and n is the total number of points in group C_large_src across its multiple source cells. The initial pose p_i can be computed by maximizing the collective likelihood function Ψ. Maximizing Ψ can be solved as an optimization problem using an iterative approach, in which each iteration moves the solution toward the optimum. After multiple iterations, or once a tolerance or threshold is reached, the initial pose p_i is obtained.
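Maximizing equation (3) is equivalent to minimizing −Ψ over the six pose parameters. The sketch below uses scipy's derivative-free Nelder-Mead solver purely for illustration (production NDT implementations typically use Newton-type solvers with analytic derivatives); cell_statistics, point_likelihood, pose_to_matrix, and apply_pose refer to the earlier sketches:

```python
import numpy as np
from scipy.optimize import minimize

def register_ndt(src_pts: np.ndarray, tgt_pts: np.ndarray,
                 cell_size: float, init: np.ndarray = None) -> np.ndarray:
    """Find the pose p maximizing Psi(p) = sum_k p(T(p, x_k)), equation (3).
    Pose layout: [tx, ty, tz, roll, pitch, yaw]."""
    stats = cell_statistics(tgt_pts, cell_size)
    init = np.zeros(6) if init is None else init

    def neg_score(p):
        M = pose_to_matrix(p[5], p[4], p[3], p[:3])
        moved = apply_pose(src_pts, M)  # T(p, x_k) for all source points
        score = 0.0
        for x in moved:
            idx = tuple(np.floor(x / cell_size).astype(int))
            if idx in stats:
                mu, sigma = stats[idx]
                score += point_likelihood(x, mu, sigma)
        return -score  # minimize the negative collective likelihood

    res = minimize(neg_score, init, method="Nelder-Mead",
                   options={"xatol": 1e-4, "fatol": 1e-6})
    return res.x
```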
Because initial pose p_i is obtained using the relatively low resolution r1 and by registering points segmented into the large-size groups (C_large_src and C_large_tgt) in source and target point clouds 280 and 282, respectively, the computational cost is relatively low, and the optimization iteration process is less susceptible to initial estimation errors and the local-optimum problem. Thus, efficiency and speed can be improved. After the initial pose p_i is obtained, it can be refined by finely adjusting it to achieve higher precision and accuracy using a second, higher resolution r2.
Referring back to
In step 444, registration unit 218 may compute a local representation of points within a second target cell falling into the fourth group (e.g., group C_small_tgt). In some embodiments, the local representation may include at least one of a mean or a covariance of points in the second target cell. For example, for a target cell t′ (in the target point cloud 282, denoted by T) that falls within group C_small_tgt, registration unit 218 may compute the mean and covariance for the points within target cell t′ according to equation (1) discussed above. In some embodiments, registration unit 218 may compute local representations (e.g., means and covariances) for all target cells that fall within group C_small_tgt.
In step 446, based on the mean μ and covariance Σ, registration unit 218 may compute, for a point x in a source cell s′ (in the source point cloud 280, denoted by S) that falls within group C_small_src, the likelihood that point x also lies in target cell t′ according to equation (2), as discussed above. In some embodiments, registration unit 218 may compute the probability functions for all points in source cell s′.
In step 448, registration unit 218 may register the second group of points (e.g., points in group C_small_src) in the source point cloud 280 to the fourth group of points (e.g., points in group C_small_tgt) in the target point cloud 282 by optimizing a collective likelihood that points within multiple source cells in the second group C_small_src also lie in corresponding target cells in the fourth group C_small_tgt. The collective likelihood function can be represented by equation (3) discussed above. The refined pose p_r can be computed by maximizing the collective likelihood function Ψ. Maximizing Ψ can be solved as an optimization problem using an iterative approach, in which each iteration moves the solution toward the optimum. After multiple iterations, or once a tolerance or threshold is reached, the refined pose p_r is obtained.
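Putting steps 430 and 440 together, the fine stage re-runs the same machinery with the small-object groups, the finer grid, and the coarse result as the starting point. A usage sketch built on the hypothetical register_ndt above, where r1 and r2 are the cell sizes of the coarse and fine grids (r2 < r1):

```python
# Coarse stage (step 430): large objects, coarse grid, zero initial guess.
p_i = register_ndt(c_large_src, c_large_tgt, cell_size=r1)

# Fine stage (step 440): small/detailed objects, fine grid, seeded with
# the coarse pose p_i so the iteration starts near the global optimum.
p_r = register_ndt(c_small_src, c_small_tgt, cell_size=r2, init=p_i)
```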
Another aspect of the disclosure is directed to a non-transitory computer-readable medium storing instructions which, when executed by at least one processor (e.g., processor 210), cause the at least one processor to perform the methods disclosed herein. The computer-readable medium may include volatile or non-volatile, magnetic, semiconductor-based, tape-based, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices. For example, the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed. The computer-readable medium may be a disc, a flash drive, or a solid-state drive having the computer instructions stored thereon.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed system and related methods. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed system and related methods.
It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.
This application is a bypass continuation of PCT Application No. PCT/CN2019/107538, filed Sep. 24, 2019, the content of which is hereby incorporated by reference in its entirety.