1. Statement of the Technical Field
The inventive arrangements concern registration of point cloud data, and more particularly registration of point cloud data for targets in the open and under significant occlusion.
2. Description of the Related Art
One problem that frequently arises with imaging systems is that targets may be partially obscured by other objects which prevent the sensor from properly illuminating and imaging the target. For example, in the case of an optical type imaging system, targets can be occluded by foliage or camouflage netting, thereby limiting the ability of a system to properly image the target. Still, it will be appreciated that objects that occlude a target are often somewhat porous. Foliage and camouflage netting are good examples of such porous occluders because they often include some openings through which light can pass.
It is known in the art that objects hidden behind porous occluders can be detected and recognized with the use of proper techniques. It will be appreciated that any instantaneous view of a target through an occluder will include only a fraction of the target's surface. This fractional area will be comprised of the fragments of the target which are visible through the porous areas of the occluder. The fragments of the target that are visible through such porous areas will vary depending on the particular location of the imaging sensor. However, by collecting data from several different sensor locations, an aggregation of data can be obtained. In many cases, the aggregation of the data can then be analyzed to reconstruct a recognizable image of the target. Usually this involves a registration process by which a sequence of image frames for a specific target taken from different sensor poses are corrected so that a single composite image can be constructed from the sequence.
In order to reconstruct an image of an occluded object, it is known to utilize a three-dimensional (3D) type sensing system. One example of a 3D type sensing system is a Light Detection And Ranging (LIDAR) system. LIDAR type 3D sensing systems generate image data by recording multiple range echoes from a single pulse of laser light to generate an image frame. Accordingly, each image frame of LIDAR data will be comprised of a collection of points in three dimensions (3D point cloud) which correspond to the multiple range echoes within the sensor aperture. These points are sometimes referred to as “voxels” which represent a value on a regular grid in three dimensional space. Voxels used in 3D imaging are analogous to pixels used in the context of 2D imaging devices. These frames can be processed to reconstruct an image of a target as described above. In this regard, it should be understood that each point in the 3D point cloud has an individual x, y and z value, representing the actual surface within the scene in 3D.
Aggregation of LIDAR 3D point cloud data for targets partially visible across multiple views or frames can be useful for target identification, scene interpretation, and change detection. However, it will be appreciated that a registration process is required for assembling the multiple views or frames into a composite image that combines all of the data. The registration process aligns 3D point clouds from multiple scenes (frames) so that the observable fragments of the target represented by the 3D point cloud are combined together into a useful image. One method for registration and visualization of occluded targets using LIDAR data is described in U.S. Patent Publication 20050243323. However, the approach described in that reference requires data frames to be in close time-proximity to each other and is therefore of limited usefulness where LIDAR is used to detect changes in targets occurring over a substantial period of time.
The invention concerns a method for registration of two or more frames of three dimensional (3D) point cloud data concerning a target of interest. The method begins by acquiring at least a first frame and a second frame, each containing 3D point cloud data collected for a selected object. Thereafter, a density image for each of the first frame and the second frame is obtained respectively by projecting the 3D point cloud data from each of the first frame and the second frame to a two dimensional (2D) plane. Using the density images obtained from the first frame and the second frame, one or more translation vectors is determined. The one or more translation vectors are then used to perform a coarse registration of the 3D point cloud data in one or more of the XY plane and the Z direction. According to one aspect of the invention, the method can include the step of selecting for registration a sub-volume of the 3D point cloud data from each frame which includes less than a total volume of the 3D point cloud data.
The density images for each of the first frame and the second frame include a pair of XY density images which are obtained by setting to zero a z coordinate value of each data point in a 3D point cloud contained in the first and second frame. The density images for each of the first frame and the second frame also include a pair of XZ density images which are obtained by setting to zero a y coordinate value of each data point in a 3D point cloud contained in the first and second frame.
Each of the foregoing density images is filtered to obtain a filtered density image. The filtering includes median filtering, edge enhancement filtering, or both types of filtering. The one or more translation vectors are determined by performing a cross-correlation of the filtered density image obtained from the first frame and the filtered density image obtained from the second frame. Once the cross-correlation is performed, the one or more translation vectors are determined based on the location of a peak in the cross-correlation output matrix.
The coarse registration of the 3D point cloud data from the first frame and the second frame is advantageously performed with respect to both the XY plane and in the Z axis direction using a plurality of the translation vectors. Thereafter the method continues by performing a fine registration process on the 3D point cloud data from the first frame and the second frame.
The fine registration process includes several steps. For example, the fine registration process begins by defining two or more sub-volumes within each of the first and second 3D frames. Thereafter, one or more qualifying ones of the sub-volumes are identified which include selected arrangements of 3D point cloud data. This step is performed by calculating a set of eigen values for each of the sub-volumes. Thereafter, a set of eigen metrics is calculated using the eigen values. The eigen metrics are selected so as to identify sub-volumes containing 3D point clouds that have a blob-like arrangement. This process is continued for both adjacent and non-adjacent scenes, such as frames 1, 2; 1, 3; 1, 4; 2, 3; 2, 4; 2, 5 and so on, where consecutively numbered frames are adjacent, and non-consecutively numbered frames are not adjacent.
The method also includes the step of identifying qualifying data points in the qualifying ones of the sub-volumes. The qualifying data points include two or more pairs of data points. Each pair of data points comprises a first data point in the first frame that most closely matches a position of a corresponding second data point in the second frame.
Once the qualifying data points are identified for all scene pairs (as described above), an optimization routine is simultaneously performed on the 3D point cloud data associated with all of the frames. The optimization routine is used to determine a global rotation, scale, and translation matrix applicable to all points and all frames in the data set. Consequently, a global transformation is achieved rather than a local frame to frame transformation.
It should be understood that there are many optimization routines available that can be used for a fine registration process. Some are local in the sense that they operate on adjacent frames only. In contrast, the present invention advantageously uses a global transform for fine registration of all frames at once. In this regard, the invention is unlike conventional approaches that perform a frame-to-frame registration for the fine registration process or an average across several frames. Although these conventional approaches are commonly used, they have been found to be inadequate for purposes of producing a satisfactory result.
The global transform that is used with the present invention advantageously collects all the correspondences for each pair of frames of interest. In this regard it should be understood that the term “pairs” as used herein does not refer merely to frames that are adjacent such as frame 1 and frame 2. Instead, pairs can include frames 1, 2; 1, 3; 1, 4; 2, 3; 2, 4; 2, 5 and so on. All of these pair correspondences are then used simultaneously in a global optimization routine in the fine registration step. Parameters that minimize the error between all frames simultaneously are output and used to transform the frames.
In order to understand the inventive arrangements for registration of a plurality of frames of three dimensional point cloud data, it is useful to first consider the nature of such data and the manner in which it is conventionally obtained.
For convenience in describing the present invention, the physical location 108 will be described as a geographic location on the surface of the earth. However, it will be appreciated by those skilled in the art that the inventive arrangements described herein can also be applied to registration of data from a sequence comprising a plurality of frames representing any object to be imaged in any imaging system. For example, such imaging systems can include robotic manufacturing processes, and space exploration systems.
Those skilled in the art will appreciate that a variety of different types of sensors, measuring devices and imaging systems exist which can be used to generate 3D point cloud data. The present invention can be utilized for registration of 3D point cloud data obtained from any of these various types of imaging systems.
One example of a 3D imaging system that generates one or more frames of 3D point cloud data is a conventional LIDAR imaging system. In general, such LIDAR systems use a high-energy laser, optical detector, and timing circuitry to determine the distance to a target. In a conventional LIDAR system one or more laser pulses is used to illuminate a scene. Each pulse triggers a timing circuit that operates in conjunction with the detector array. In general, the system measures the time for each pixel of a pulse of light to transit a round-trip path from the laser to the target and back to the detector array. The reflected light from a target is detected in the detector array and its round-trip travel time is measured to determine the distance to a point on the target. The calculated range or distance information is obtained for a multitude of points comprising the target, thereby creating a 3D point cloud. The 3D point cloud can be used to render the 3-D shape of an object.
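By way of a simple illustration of the time-of-flight principle described above, the round-trip travel time reduces to a range through range = (speed of light × round-trip time) / 2. The short Python sketch below applies that relationship; it is illustrative only and is not tied to any particular LIDAR hardware or data format.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def round_trip_time_to_range(round_trip_seconds):
    """Convert a measured round-trip travel time into a one-way range (m)."""
    return 0.5 * SPEED_OF_LIGHT * np.asarray(round_trip_seconds)

# A return detected 1.0 microsecond after the pulse corresponds to a point
# roughly 150 meters from the sensor.
print(round_trip_time_to_range(1.0e-6))
```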
It should be appreciated that in many instances, the occluding material 106 will be somewhat porous in nature. Consequently, the sensors 102-i, 102-j will be able to detect fragments of the target which are visible through the porous areas of the occluding material. The fragments of the target that are visible through such porous areas will vary depending on the particular location of the sensor 102-i, 102-j. However, by collecting data from several different sensor poses, an aggregation of data can be obtained. In many cases, the aggregation of the data can then be analyzed to reconstruct a recognizable image of the target.
From the foregoing, it will be understood that the 3D point cloud data 200-i, 200-j respectively contained in frames i, j will be based on different sensor-centered coordinate systems. Consequently, the 3D point cloud data in frames i and j generated by the sensors 102-i, 102-j, will be defined with respect to different coordinate systems. Those skilled in the art will appreciate that these different coordinate systems must be rotated and translated in space as needed before the 3D point cloud data from the two or more frames can be properly represented in a common coordinate system. In this regard, it should be understood that one goal of the registration process described herein is to utilize the 3D point cloud data from two or more frames to determine the relative rotation and translation of data points necessary for each frame in a sequence of frames.
It should also be noted that a sequence of frames of 3D point cloud data can only be registered if at least a portion of the 3D point cloud data in frame i and frame j is obtained based on common subject matter (i.e. the same physical or geographic area). Accordingly, at least a portion of frames i and j will generally include data from a common geographic area. For example, it is generally preferable for at least about ⅓ of each frame to contain data for a common geographic area, although the invention is not limited in this regard. Further, it should be understood that the data contained in frames i and j need not be obtained within a short period of time of each other. The registration process described herein can be used for 3D point cloud data contained in frames i and j that have been acquired weeks, months, or even years apart.
An overview of the process for registering a plurality of frames i, j of 3D point cloud data will now be described.
The process continues in step 400, which involves performing a coarse registration of the data contained in frames i and j with respect to the x, y plane. Thereafter, a similar coarse registration of the data in frames i and j is performed in step 500 with respect to the x, z plane. These coarse registration steps will be described in more detail below. However, it should be noted that the coarse registration process described herein advantageously involves selecting one particular frame to be designated as a reference frame to which all other frames will be aligned.
In step 600, a determination is made as to whether coarse registration has been completed for all n frames in a sequence of frames which are to be registered. If not, then the value of j is incremented in step 602 and the process returns to step 304 to acquire the point cloud data for the next frame j. Thereafter, steps 304, 400, 500, 600 and 602 are repeated until registration is completed for all n frames. At that point, the process will proceed to step 700.
In step 700, all coarsely adjusted frame pairs from the coarse registration process in steps 400, 500 and 600 are processed simultaneously to provide a more precise registration. Step 700 involves simultaneously calculating global values of RjTj for all n frames of 3D point cloud data, where Rj is the rotation vector necessary for aligning or registering all points in each frame j to frame i, and Tj is the translation vector for aligning or registering all points in frame j with frame i.
Step 800 is the final step in the registration process. In step 800, the values for Rj and Tj for each frame, as calculated in step 700, are used to translate the point cloud data from each frame to a common coordinate system. For example, the common coordinate system can be the coordinate system of frame i. At this point, the registration process is complete for all frames in the sequence of frames. For example, a sensor may collect 25 to 40 consecutive frames consisting of 3D measurements during a collection interval. All of these frames can be aligned with the process described herein.
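As a minimal sketch of step 800, and assuming each frame is held as an N×3 NumPy array with Rj expressed as a 3×3 rotation matrix and Tj as a 3-element translation vector (the function and variable names below are hypothetical), the transformation of every frame into the common coordinate system might look like the following.

```python
import numpy as np

def transform_frame(points_j, R_j, T_j):
    """Map the N x 3 point cloud of frame j into the reference coordinate
    system by applying p' = R_j @ p + T_j to every point p."""
    points_j = np.asarray(points_j, dtype=float)
    return points_j @ np.asarray(R_j).T + np.asarray(T_j)

def register_all_frames(frames, rotations, translations):
    """Apply each frame's rotation and translation (as computed in the fine
    registration step) so that all frames share a common coordinate system."""
    return [transform_frame(points, R, T)
            for points, R, T in zip(frames, rotations, translations)]
```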
Steps 400, 500, 700 and 800 are described in greater detail below.
In one embodiment of the invention, the sub-volume of the frame that is selected can include 3D point cloud data points corresponding to locations which are slightly above the surface of the ground level and extending to some predetermined altitude or height above ground level. For example, a sub-volume ranging from z=0.5 meters above ground-level, to z=6.5 meters above ground level, is usually sufficient to include most types of vehicles and other objects on the ground. Still, it should be understood that the invention is not limited in this regard. In other circumstances it can be desirable to choose a sub-volume that begins at a higher elevation relative to the ground so that the registration is performed based on only the taller objects in a scene, such as tree trunks. For objects obscured under tree canopy, it is desirable to select the sub-volume that extends from the ground to just below the lower tree limbs.
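A brief sketch of this sub-volume selection is shown below. It assumes the point cloud has already been referenced so that ground level corresponds to z = 0, and uses the 0.5 m to 6.5 m band mentioned above as default limits; both assumptions are for illustration only.

```python
import numpy as np

def select_sub_volume(points, z_min=0.5, z_max=6.5):
    """Keep only the points whose height above ground level (assumed to be
    z = 0) lies between z_min and z_max, e.g. 0.5 m to 6.5 m to capture
    most vehicles and other objects on the ground."""
    points = np.asarray(points)
    mask = (points[:, 2] >= z_min) & (points[:, 2] <= z_max)
    return points[mask]
```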
In step 404, the various data points that comprise the 3D point cloud 200-i, 200-j are projected to their respective x, y plane from their location in the point clouds. Stated another way, the x and y values of the data points in each frame remain the same, while the z value for each of the data points is set to zero. The result of step 404 is to convert each frame i, j comprised of the 3D point cloud data to a 2 dimensional frame in the x, y plane (XY frame).
In step 405, the projection of the 3D point cloud data to the XY plane for frames i and j is used to generate XY density images. According to one embodiment, the XY density images are created by using a window of size 5*voxelsize×5*voxelsize. A voxel is a cube of scene data. Here, the term “voxelsize” refers to the length of an edge of a single cubic voxel. For instance, a single voxel can have a size of (0.2 m)³ based upon the LIDAR sensor resolution. In that case, the voxelsize would be 0.2 m, and the filter window would have dimensions 1.0 m×1.0 m. This window is used to process the 2D XY projection of the volumetric data (Z=0). The product voxelsize*numvoxels (minus any filter edge effects) will equal the width of the density image. Here, the term “numvoxels” refers to the number of voxels that are aligned in a direction corresponding to a width dimension of the density image. Notably, the width of the density image is very close to the width of the 2D XY projection image after partial voxels and edge effects are removed.
The window described above is passed over the 2D projection, and the number of hits in the window (the density) is used as the value at that window location. The voxelsize is based on the processing resolution of the data and is expressed in meters. Note that the creation of density images as described herein gives less ‘weight’ to sparse voxels, i.e. those voxels with few ‘hits’. Such regions are not significant in the coarse registration process and may include sparse foliage (bushes, low-lying limbs); they are also not as stable over time as rocks and tree trunks.
Using the procedure described above, an XY density image for frame i is obtained, and an XY density image for frame j is obtained.
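One possible implementation of the density image generation described above is sketched below. For simplicity it approximates the sliding window with non-overlapping bins whose width equals 5*voxelsize, and counts the projected points (hits) falling in each bin; the function name and the use of a two-dimensional histogram are assumptions for illustration rather than a statement of the exact implementation.

```python
import numpy as np

def xy_density_image(points, voxelsize=0.2, window_factor=5):
    """Project a 3D point cloud onto the XY plane (equivalent to setting
    z = 0) and count the number of hits in each square window of width
    window_factor * voxelsize (e.g. 5 * 0.2 m = 1.0 m)."""
    points = np.asarray(points)
    bin_size = window_factor * voxelsize
    x, y = points[:, 0], points[:, 1]
    x_edges = np.arange(x.min(), x.max() + bin_size, bin_size)
    y_edges = np.arange(y.min(), y.max() + bin_size, bin_size)
    density, _, _ = np.histogram2d(x, y, bins=(x_edges, y_edges))
    return density  # 2D array of hit counts: the XY density image
```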
The purpose of the XY density images as described herein is to allow a subsequently applied filtering process to find edge content of the 2D shapes that will be registered. It should be noted that the approach described herein, involving filtering of a density image, is the preferred method for registration of certain types of objects appearing in an image. In particular, this process works well for objects which are out in the open (i.e. not occluded or only minimally occluded) since it is computationally simpler to apply than an eigen metric method. Based on the limited number of data points within each frame for objects that are heavily occluded, one skilled in the art might anticipate that this approach would not work with more heavily occluded objects. However, the registration technique has also been found to work unexpectedly well for objects under tree canopies. If the slice of data samples from a 3D image is carefully selected, enough shape content is available to perform the correlation and therefore complete the coarse registration of the ‘incomplete’ frames as described below. In this regard, the slice of data points is preferably selected in such instances so as to include only data points between ground level and just under the lower tree limbs.
In step 406, the process continues with one or more filtering steps to create a filtered XY density image i and a filtered XY density image j, respectively, from the XY density image for frame i and the XY density image for frame j. In this regard, step 406 includes (1) performing median filtering of the XY density images i, j, and (2) performing Sobel edge filtering of the median-filtered XY density images i, j. These filtering steps will now be described in greater detail.
The median filter step is performed primarily to reduce noise in the XY density images i, j. Median filters are well known in the art. Accordingly, the process will not be described here in detail. In general, however, median filtering involves selection of a filter mask that is a certain number of pixels in height and width. The exact size of the mask can vary according to the particular application. In the present case, a filter mask having a size of 5 pixels high×5 pixels wide has been found to provide suitable results. However, the invention is not limited in this regard. In practice, the mask is slid over the image and the center pixel contained within the mask is examined to determine whether it has values similar to those of its neighboring pixels. If not, this is often an indication that the particular pixel has been corrupted by noise. Accordingly, the median filter will replace the center pixel value with the median of the pixel values under the mask. The median is calculated by first sorting all the pixel values under the mask into numerical order and then replacing the pixel being considered with the middle pixel value.
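Assuming the density images are held in NumPy arrays, the median filtering step could be performed with SciPy's median filter and the 5×5 mask noted above, as in the following sketch.

```python
from scipy.ndimage import median_filter

def median_filter_density(density_image, mask_size=5):
    """Apply a mask_size x mask_size median filter to suppress isolated
    noisy pixels in a density image."""
    return median_filter(density_image, size=mask_size)
```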
The preparation of the filtered density images also involves edge filtering. Those skilled in the image processing field will readily appreciate that for the purpose of aligning two images, it can be helpful to identify the edges of objects contained in the image. For example, detecting the edges of objects forming an image will substantially reduce the total amount of data contained in the image. Edge detection preserves the important structural properties of an image but will remove information which is not generally useful for purposes of image alignment. Accordingly, it is advantageous to perform edge filtering on the XY density images after median filtering has been performed.
As used herein, the term “edge” generally refers to areas within a two-dimensional image where there exist strong intensity contrasts. In such areas, there is usually a rapid variation in intensity as between adjacent pixels. In this regard, it should be understood that there are many different ways to perform edge detection, and all such methods are intended to be included within the scope of the present invention. For the purpose of the present invention, edge filtering can include any technique now known, or which is discovered in the future, which can be used for detecting or emphasizing edges within an image.
According to a preferred embodiment, edge filtering in the present invention can be carried out using a conventional Sobel filter. In a Sobel filtering process, a Sobel operator is used to determine a 2-D spatial gradient measurement on an image. Conventional techniques for Sobel filter processing are well known. Accordingly, the Sobel filtering technique will not be described here in great detail. Typically, however, a first convolution mask, 3 pixels high and 3 pixels wide, is used for determining a gradient in the x-direction. A second convolution mask of the same size is used for determining a gradient in the y-direction. In this regard, it should be understood that each of the first and second convolution masks will be much smaller than the actual XY density image. The masks are each slid over the image, manipulating one 3×3 group of pixels at a time in accordance with the Sobel operator. The first convolution mask highlights the edges in a first direction while the second convolution mask highlights the edges in a second direction, transverse to the first direction. As used herein, the term “highlight” can refer to any image or data enhancement that allows edges of point clouds to be more clearly defined. The result of the process is an image in which edges are highlighted in directions aligned with both the x and y axes.
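A short sketch of the Sobel edge filtering described above is given below. It uses SciPy's Sobel operator along each image axis and combines the two directional gradients into a single gradient-magnitude image; combining the gradients by magnitude is one common choice and is an assumption here.

```python
import numpy as np
from scipy.ndimage import sobel

def sobel_edge_image(density_image):
    """Apply the Sobel operator along both image axes and combine the two
    gradient images into a single edge-highlighted image."""
    g0 = sobel(density_image, axis=0)  # gradient along the first image axis
    g1 = sobel(density_image, axis=1)  # gradient along the second image axis
    return np.hypot(g0, g1)            # gradient magnitude
```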
In step 408, an XY translation error is determined. The XY translation error is a shift or offset in the x, y plane which exists as between the image data represented in the filtered XY density image i and the image data represented by the filtered XY density image j. The XY translation error can be defined by a vector which identifies the direction and distance of the shift or offset as between the two filtered XY density images i, j. One method for determining the XY translation error is by performing a cross-correlation of the filtered density images i, j. It is well known in the art that the cross-correlation of two images is a standard approach which can be used for identifying similarities as between two images. If two images contain at least some common subject matter, the cross-correlation process will generally result in a peak in the correlation value at a location which corresponds to the actual XY translation error.
Notably, for frames that are taken consecutively, there is little rotation variation. In other words, there is little rotational variation in the point of view of the imaging device relative to the scene being imaged. In such circumstances, a correlation method that is not invariant to rotation as between the scenes contained in two images can be used. For example, a conventional normalized correlation method can be used for this purpose.
Still, it should be understood that a normalized correlation is generally only usable for rotational variations of two or three degrees. For frames taken at substantially different times (e.g. 6 months apart) and from different orientations, there can be significant rotation errors when such conventional normalized correlation processes are applied to the 2D projections (in the case of the preferred mode for objects in the open) or to the 3D volumes (in the case of the preferred mode for occluded objects under trees). This problem can be addressed by collecting supporting data to allow for adjustment of the orientation of the data. However, where such data is not available or simply not used, a correlation process which is invariant to rotation is preferred. Rotationally invariant correlation processes are known in the art.
In the present invention, the normalized cross-correlation is calculated for the filtered density images i, j. In this regard, it can be convenient to display the resulting output of the cross-correlation calculation as a surface plot. The peak of the cross-correlation surface plot occurs where the XY filtered density images for frame i and frame j are best correlated. Significantly, the correlation peak location will identify a shift in the x, y plane as between frames i and j. The actual XY translation error vector is easily determined from the peak location. Simply stated, it is the delta x and delta y between the two frames calculated from the center of the frames. The adjustments are applied while holding the reference frame constant. If there are only two frames, either can be considered the reference. For a sequence of frames (as is collected for objects located under a tree canopy, for instance), the center frame works best as the reference frame.
The correlation process described herein with respect to step 408 can include a Normalized Cross Correlation (NCC) process performed with respect to the filtered XY density images i and j. The use of NCC processes for registration of two dimensional images is well known in the art. Accordingly, the NCC process will not be described here in detail. In general, however, the cross-correlation of two images i and j is defined as the product:
pi ⊗ pj

where pi is the pixel index running over the domain of interest wi in the filtered XY density image i, and pj is similarly a running 2-dimensional index over the domain of interest wj in the filtered XY density image j. It is known in the art that the cross-correlation product denoted as ⊗ can be defined by various different functions, depending on the purpose of the cross-correlation. However, one example of a typical product definition would be as follows:

pi ⊗ pj = Σw Ii(pi)·Ij(pj) / √(Σw Ii(pi)² · Σw Ij(pj)²)

where Ii(pi) and Ij(pj) are the pixel values of the filtered XY density images i and j at the indices pi and pj, and each summation is taken over the common domain of interest w.
It will be appreciated by those skilled in the art that the foregoing product definition will provide an indication of how similar two regions of interest contained in two different images are. In this regard, the best correlation is obtained where the cross-correlation value is at a peak. Of course, the invention is not limited in this regard, and any other NCC process can be used provided that it produces a result which identifies a translation error as between the XY density image i and the XY density image j.
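The following sketch illustrates the general idea of locating the correlation peak and converting it into an XY translation error. It assumes the two filtered density images have identical dimensions and uses a zero-mean, FFT-based cross-correlation as a simplified stand-in for the normalized product defined above; the function name and the sign convention for the returned shift are assumptions for illustration.

```python
import numpy as np
from scipy.signal import fftconvolve

def xy_translation_error(image_i, image_j):
    """Estimate the shift (in pixels, along each image axis) of filtered
    density image j relative to filtered density image i by locating the
    peak of their cross-correlation."""
    a = image_i - image_i.mean()
    b = image_j - image_j.mean()
    # Cross-correlation computed as a convolution with image j reversed.
    corr = fftconvolve(a, b[::-1, ::-1], mode='full')
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # The zero-shift lag lies at index (rows - 1, cols - 1) of the output.
    zero_lag = np.array(a.shape) - 1
    return tuple(zero_lag - np.array(peak))
```

The returned offsets correspond to the delta x and delta y between the two frames measured from the zero-shift position, which can then be applied while holding the reference frame constant, as described above.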
In an alternative embodiment of the invention, a different approach can be used in step 408 in place of the NCC process to determine the XY translation error. In particular, the NCC can be replaced by a similarity metric which is rotationally invariant. As will be understood by those skilled in the art, any suitable similarity metric can be used for this purpose, provided that it is rotationally invariant, or is at least less sensitive to rotational variations as compared to the NCC process. A rotationally invariant similarity metric can be particularly advantageous in those situations where the pose of sensor 102-i was rotated with respect to sensor 102-j when the frames i and j were obtained.
Regardless of the particular technique used to determine the XY translation error in step 408, the result will be some translation error vector in the x, y plane which defines the XY translation error as between the filtered density image i and the filtered density image j. Once this translation error vector has been determined, the process can continue on to step 410. In step 410, the translation error vector is used to provide a coarse adjustment of the position of the data points in the frames i and j so that they are approximately aligned with each other, at least with respect to their position in the x, y plane. The process then continues to step 500, where, after coarse alignment in the x, y plane is complete, the frames i, j are passed on for coarse alignment in the z direction.
The coarse registration of the data in frames i and j with respect to the x, z plane in step 500 will now be described in greater detail. This portion of the process proceeds in a manner analogous to the coarse registration with respect to the x, y plane described above.
In step 504, the method continues by projecting the various data points that comprise the 3D point cloud 200-i, 200-j onto the x, z plane from their location in the point cloud. Stated another way, the x and z values of the data points remain the same, while the y value for each of the data points is set to zero. The result of step 504 is to convert each frame i, j comprising the 3D point cloud data to a 2 dimensional frame in the x, z plane (XZ frame).
In step 505, the projection of the 3D point cloud data to the XZ plane for frames i and j is used to generate XZ density images. The XZ density images are generated in a manner similar to the one described above with regard to the XY density images, except that in this instance the value of Y is set to zero. In this way, an XZ density image for frame i is obtained, and an XZ density image for frame j is obtained.
In step 506, the process continues by creating a filtered XZ density image i and a filtered XZ density image j. These filtered XZ density images are respectively created from the XZ density image for frame i and the XZ density image for frame j. Creation of the filtered XZ density images i, j in step 506 involves at least two steps. Briefly, step 506 includes (1) performing median filtering of the XZ density images i, j, and (2) performing Sobel edge filtering of the median-filtered XZ density images i, j. These intermediate steps were described above in detail with respect to the filtering of the XY density images in step 406.
In step 508, a coarse determination of the Z translation error is made. The Z translation error is a shift or offset in the z axis direction which exists as between the image data represented in the filtered XZ density image i and the image data represented by the filtered XZ density image j. The Z translation error can be defined by a vector which identifies the z direction shift or offset as between the two filtered XZ density images i, j. One method for determining the Z translation error is by performing an NCC operation on the filtered XZ density images i, j in a manner similar to that previously described with respect to step 408. Alternatively, instead of using the NCC technique for determining the Z translation error, other types of similarity metrics can also be used. In this regard, it will be appreciated that similarity metrics that are rotationally invariant can be advantageous, particularly in those situations where the pose of sensor 102-i was rotated with respect to sensor 102-j when the frames i and j were obtained.
Regardless of the particular technique used to determine the Z translation error in step 508, the result will be some vector which defines the Z translation error as a shift in the Z direction as between the filtered XZ density image i and the filtered XZ density image j. Once this translation error vector has been determined, the process can continue on to step 510. In step 510, the Z translation error vector is used to provide a coarse adjustment of the position of the data points in the frames i and j so that they are approximately aligned with each other with respect to their position in the x, z plane. Thereafter, the process continues on to step 600.
The process described above for frames i, j comprising the 3D point cloud data is repeated for a plurality of pairs of frames comprising a set of 3D point cloud frames (frame set). The process can begin with adjacent frames 1 and 2, where frame 1 is used as a reference frame (i=1) to which other frames are aligned. However, it can be advantageous to begin the coarse registration process using a frame in the middle of a frame set as the reference frame. For example, if there are 25 frames, frame 13 can be used as the reference frame (i=13). The coarse registration can be used to align frame 14 to frame 13, frame 15 can be aligned with the coarsely aligned frame 14, and so on. Similarly, in the other direction, frame 12 can be aligned to frame 13, frame 11 can be aligned to the coarsely aligned frame 12, and so on. Those skilled in the art will appreciate that this approach would require some minor modification of the process flow described above.
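A small sketch of this frame-pairing order is given below. It uses 0-based frame indices, takes the middle frame of the set as the reference, and returns (reference, moving) index pairs that work outward in both directions, so that each frame is aligned to its already-aligned neighbor; the function name and index convention are assumptions for illustration.

```python
def coarse_registration_order(num_frames):
    """Return (reference, moving) frame-index pairs, starting from the
    middle frame of the set and working outward in both directions."""
    middle = num_frames // 2
    pairs = []
    for j in range(middle + 1, num_frames):   # e.g. align 14 to 13, 15 to 14, ...
        pairs.append((j - 1, j))
    for j in range(middle - 1, -1, -1):       # e.g. align 12 to 13, 11 to 12, ...
        pairs.append((j + 1, j))
    return pairs

# For a five-frame set the pairs are [(2, 3), (3, 4), (2, 1), (1, 0)].
print(coarse_registration_order(5))
```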
Once the coarse registration of all n frames is complete, the process continues with the fine registration of step 700, which will now be described in greater detail. The fine registration process begins in step 710, in which two or more sub-volumes are defined within each frame of 3D point cloud data.
In step 720, the process continues by performing an eigen analysis to determine a set of eigen values λ1, λ2, and λ3 for each of the sub-volumes defined in step 710. It is well known in the art that an eigen analysis can be used to provide a summary of a data structure represented by a symmetrical matrix. In this case, the symmetrical matrix used to calculate each set of eigen values is derived from the point cloud data contained in each of the sub-volumes. Each of the point cloud data points in each sub-volume is defined by an x, y and z value. Consequently, an ellipsoid can be drawn around the data, and the ellipsoid can be defined by the three eigen values, namely λ1, λ2, and λ3. The first eigen value is always the largest and the third is always the smallest. At this point, the analysis is looking only for structure, for example an object such as a truck or a tree trunk. The eigen metrics can then be calculated from these eigen values as described below.
The methods and techniques for calculating eigen values are well known in the art. Accordingly, they will not be described here in detail. In general, however, the data in a sub-volume consists of a list of XYZ points. An eigenvalue decomposition is performed on the data, yielding the eigen values with λ1 being the largest. In the data collections described herein, the frames were collected sequentially, and therefore the orientation between adjacent frames was similar. Each eigen value λ1, λ2, and λ3 will have a value between 0 and 1.0.
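As an illustrative sketch of this eigen analysis, the code below assumes that the symmetric matrix in question is the 3×3 covariance matrix of the points in a sub-volume, and that the eigen values are normalized by the largest eigen value so that they fall between 0 and 1.0. Both choices are assumptions made for illustration, since the exact matrix construction and normalization are not spelled out here.

```python
import numpy as np

def sub_volume_eigenvalues(points):
    """Return normalized eigen values (lambda1 >= lambda2 >= lambda3) of the
    3 x 3 covariance matrix of an N x 3 sub-volume of XYZ points."""
    points = np.asarray(points, dtype=float)
    cov = np.cov(points, rowvar=False)      # symmetric 3 x 3 matrix
    eigvals = np.linalg.eigvalsh(cov)       # returned in ascending order
    eigvals = eigvals[::-1]                 # reorder so lambda1 is largest
    return eigvals / eigvals[0]             # normalize into the range [0, 1]
```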
As noted above, eigen values are particularly useful for characterizing or summarizing a data structure that is represented by a symmetrical matrix. In the present invention, the eigen values λ1, λ2, and λ3 are used for computation of a series of eigen metrics which provide a measure of the shape formed by the 3D point cloud within a sub-volume.
Three such metrics, M1, M2 and M3, can be computed from the eigen values λ1, λ2, and λ3.
When the values of M1, M2 and M3 are all approximately equal to 1.0, this is an indication that the sub-volume contains a blob-like point cloud as opposed to a planar or line shaped point cloud. For example, when the values of M1, M2, and M3 for a particular sub-volume are all greater than 0.7, it can be concluded that the sub-volume has a blob-like point cloud structure. Still, those skilled in the art will appreciate that the invention is not limited in this regard. Moreover, those skilled in the art will readily appreciate that the invention is not limited to the particular metrics shown. Instead, any other suitable metrics can be used, provided that they allow blob-like point clouds to be distinguished from point clouds that define straight lines, curved lines, and planes.
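Because the table of eigen metrics itself is not reproduced here, the sketch below uses simple ratios of the ordered eigen values as illustrative stand-ins for M1, M2 and M3. These ratios all approach 1.0 for blob-like point clouds and become small for line-like or planar point clouds, and the 0.7 threshold follows the example above; the specific metric definitions are assumptions, not a reproduction of the table.

```python
def eigen_metrics(l1, l2, l3):
    """Illustrative eigen metrics computed from ordered eigen values
    l1 >= l2 >= l3; all three approach 1.0 for a blob-like point cloud."""
    return l2 / l1, l3 / l1, l3 / l2

def is_blob_like(l1, l2, l3, threshold=0.7):
    """A sub-volume qualifies when all three metrics exceed the threshold."""
    return all(m > threshold for m in eigen_metrics(l1, l2, l3))
```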
In step 730, the results of the eigen analysis and the eigen metrics described above are used to identify qualifying sub-volumes, i.e. those sub-volumes which contain 3D point cloud data having a blob-like structure that is most useful for registration purposes.
Once the qualifying sub-volumes that are most useful for registration purposes have been selected, the process continues with step 740. More particularly, in step 740, the process continues by identifying, for each scene pair in the data set, corresponding pairs of data points that are contained within the qualifying sub-volumes. This step is accomplished by finding data points in a qualifying sub-volume of one frame (e.g. frame j) that most closely match the position or location of data points from the qualifying sub-volume of the other frame (e.g. frame i). The raw data points from the qualifying sub-volumes are used to find correspondence between frame pairs. Point correspondence between frame pairs can be found using a K-D tree search method. This method, which is known in the art, is sometimes referred to as a nearest neighbor search method.
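A minimal sketch of this nearest-neighbor correspondence search, assuming SciPy's K-D tree implementation and an arbitrary distance cutoff for rejecting poor matches (the cutoff value is an assumption and is not specified above), might look like the following.

```python
import numpy as np
from scipy.spatial import cKDTree

def find_correspondences(points_i, points_j, max_distance=1.0):
    """For each point in frame j's qualifying sub-volume, find the closest
    point in frame i's qualifying sub-volume; keep only pairs separated by
    less than max_distance."""
    points_i = np.asarray(points_i)
    points_j = np.asarray(points_j)
    tree = cKDTree(points_i)
    distances, indices = tree.query(points_j)
    keep = distances < max_distance
    return points_i[indices[keep]], points_j[keep]
```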
In step 750, an optimization routine is simultaneously performed on the 3D point cloud data associated with all of the frames. The optimization routine is used to determine a global rotation, scale, and translation matrix applicable to all points and all frames in the data set. Consequently, a global transformation is achieved rather than a local frame to frame transformation. More particularly, an optimization routine is used to find a rotation and translation RjTj for each frame j that simultaneously minimizes the error for all of the corresponding pairs of data points identified in step 740. The rotation and translation are then applied to all points in each frame j so that they can be aligned with all points contained in frame i. There are several optimization routines which are well known in the art that can be used for this purpose. For example, the optimization routine can involve a simultaneous perturbation stochastic approximation (SPSA). Other optimization methods which can be used include the Nelder-Mead Simplex method, the Least-Squares Fit method, and the Quasi-Newton method. Still, the SPSA method is preferred for performing the optimization described herein. Each of these optimization techniques is known in the art and therefore will not be discussed here in detail.
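The following is a deliberately simplified sketch of such a global optimization. It holds one frame fixed as the reference, parameterizes every other frame by three Euler angles and a three-element translation, and minimizes the summed squared distance over all correspondence pairs from all frame pairs at once. SciPy's Nelder-Mead optimizer is substituted for the preferred SPSA routine, which is not part of SciPy, and the correspondence data layout and all names are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def global_fine_registration(correspondences, num_frames, reference=0):
    """Jointly estimate a rotation and translation for every non-reference
    frame by minimizing the error over all correspondence pairs at once.

    correspondences: dict mapping a frame pair (fi, fj) to a pair of matched
    N x 3 arrays (points_fi, points_fj)."""

    def unpack(params, frame):
        # The reference frame is held fixed; every other frame gets
        # 6 parameters: 3 Euler angles followed by 3 translation values.
        if frame == reference:
            return np.eye(3), np.zeros(3)
        k = (frame if frame < reference else frame - 1) * 6
        angles, translation = params[k:k + 3], params[k + 3:k + 6]
        return Rotation.from_euler('xyz', angles).as_matrix(), translation

    def cost(params):
        error = 0.0
        for (fi, fj), (pts_i, pts_j) in correspondences.items():
            Ri, Ti = unpack(params, fi)
            Rj, Tj = unpack(params, fj)
            error += np.sum((pts_i @ Ri.T + Ti - (pts_j @ Rj.T + Tj)) ** 2)
        return error

    x0 = np.zeros(6 * (num_frames - 1))
    return minimize(cost, x0, method='Nelder-Mead').x
```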
A person skilled in the art will further appreciate that the present invention may be embodied as a data processing system or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. The present invention may also take the form of a computer program product on a computer-usable storage medium having computer-usable program code embodied in the medium. Any suitable computer-usable medium may be used, such as RAM, a disk drive, CD-ROM, hard disk, a magnetic storage device, and/or any other form of program bulk storage.
Computer program code for carrying out the present invention may be written in Java®, C++, or any other object oriented programming language. However, the computer programming code may also be written in conventional procedural programming languages, such as the “C” programming language. The computer programming code may also be written in a visually oriented programming language, such as Visual Basic.
All of the apparatus, methods and algorithms disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the invention has been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the apparatus, methods and sequence of steps of the method without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain components may be added to, combined with, or substituted for the components described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined.