The present invention relates generally to computer vision, and more particularly, to a method for automatically extracting 3D models of dense urban regions using aerial LIDAR data.
Building modeling has numerous applications in a wide variety of tasks such as urban planning, cartography, 3D visualization for virtual city tours and autonomous robot navigation. In many practical situations, it is desired to obtain 3D models for areas of hundreds and thousands of square kilometers within days and with minimal manual interaction. Traditionally, building modeling has employed data collected either from the ground or from the air. Ground-based data provides high details on facades of buildings, but lacks information on tops of buildings. Aerial data can yield very accurate building footprints and can facilitate the fusion of multiple images through ortho-rectification to cover large geographical areas, but aerial data generally lacks information on the sides of buildings necessary for producing a 3D model.
Most of the research on building modeling has employed 2D imagery, which is very inexpensive to acquire, but poses inherent difficulties for automatic 3D modeling due to known limitations of stereo algorithms for recovering 3D structure from multiple views, especially in untextured areas and at depth discontinuities. Because of these limitations, most image-based modeling systems in the prior art are either manual or semi-automatic and required a large amount of computer time to produce 3D models.
Light Detection and Ranging (LIDAR) has emerged in recent years as a viable and cost-effective alternative to using solely 2D imagery, because LIDAR can directly produce precise and accurate range information and can significantly alleviate the task of automatic building segmentation. Aerial LIDAR collections can acquire data over entire cities very rapidly, however due to constraints imposed by the operating conditions on the altitude and speed of an airborne platform, the resolution of the data is typically less than one point per square meter. LIDAR measurements from multiple views are typically co-registered together in the same global coordinate system by tracking the pose of the LIDAR sensor as it acquires data, using GPS and inertial measurement units (IMU). In several LIDAR based building segmentation algorithms, 3D points are classified in three classes: terrain, clutter and building. First, the 3D measurements are classified as ground and non-ground, and subsequently the non-ground points are divided into clutter and building regions.
In K. Kraus and N. Pfeifer, “Determination of terrain models in wooded areas with airborne laser scanner data,”ISPRS Journal of Photogrammetry and Remote Sensing, 53(4):193-203, 1998, an iterative method is presented for terrain modeling based on removing at each step 3D measurements with residuals to the current terrain surface larger than a threshold and re-estimating the terrain surface using the remaining data. Because the initialization of the terrain used all the data, the method may not converge for densely built regions. Morphological opening operators can be used to create a digital terrain model (DTM) which is subtracted from input data. DTM filters, inspired from image processing, may fail to produce good ground segmentation, especially for nonflat terrain. In V. Verma, R. Kumar, and S. Hsu, “3D building detection and modeling from aerial LIDAR datam,” in CVPR, 2006, the ground and the buildings are segmented in one step by computing local planar patches (surfels) using Total Least Squares (TLS) and connecting the consistent neighboring surfels into regions using bottom-up region growing. The largest region is selected as ground, while the rest of the regions are classified as individual buildings. In F. Rottensteiner, J. Trinder, S. Clode, and K. Kubik, “Automated delineation of roof planes from LiDAR data,” in Laser05, pages 221-226, 2005, a method is presented for the automatic delineation of roof planes using local plane estimation and grouping the local planes into larger regions starting from local seed locations. Over-segmented regions are merged together using co-planarity constraints. Points that do not belong to any surface are labeled as clutter. In J. Sun, H.-Y. Shum, and N.-N. Zheng, “Stereo matching using belief propagation,” PAMI, 19:787-800, 2003 a max-product belief propagation (BP) algorithm is used for stereo matching to enforce constraints that neighboring pixels with the same intensity values are assigned the same depth. In Y. Guo, H. Sawhney, R. Kumar, and S. Hsu, “Learning-based building online detection from multiple aerial images,” in ECCV, pages 545-552, 2001, a rectilinear approximation to outlines of buildings in an image based building detection system is employed. The orientation of each rectilinear fit is determined by finding the maximum in a histogram of local image gradients approximating tangent directions to the contour. This method assumes that errors of the contour points are relatively small.
Automating building segmentation is a critical component in any 3D modeling system by providing 3D regions (segments) with little or no manual interaction. Each of the aforementioned methods employ automatic building segmentation algorithms which fall short in efficiency or resolution. More particularly, the above-cited references assume that the ground (as opposed to buildings) covers the largest surface area, and are therefore innapropriate for use in modelling dense urban areas. Accordingly, what would be desirable, but has not yet been provided, is an efficient, accurate method for automatic building segmentation for automatically extracting 3D models of dense urban regions using aerial LIDAR data.
The above-described problems are addressed and a technical solution achieved in the art by providing a method for extracting a 3D terrain model for identifying at least buildings and terrain from LIDAR data, comprising the steps of generating a point cloud representing terrain and buildings mapped by LIDAR; classifying points in the point cloud, the point cloud having ground and non-ground points, the non-ground points representing buildings and clutter; segmenting the non-ground points into buildings and clutter; and calculating a fit between at least one building segment and at least one rectilinear structure, wherein the fit yields the rectilinear structure with the fewest number of vertices. The step of calculating further comprises the steps of (a) calculating a fit of a rectilinear structure to the at least one building segment, wherein each of the vertices has an angle that is a multiple of 90 degrees; (b) counting the number of vertices; (c) rotating the at least one building segment about an axis by a predetermined increment; and (d) repeating steps (a)-(c) until a rectilinear structure with the least number of vertices is found. The method can further comprise the step of rendering the orientations of building segments to be consistent with neighboring buildings and, before the step of calculating, extracting contours for each of the building segments. A contour can be extracted using a ball pivoting algorithm.
Segmenting the non-ground points into buildings and clutter can further comprise estimating local surfels of the non-ground points at an adaptive scale; grouping non-ground surfels into coarse regions using a bottom up region growing technique; estimating planes for each course region; refining the location of the planes by fitting points to the planes by employing an expectation maximization technique; and segmenting building regions from the planes so as to enforce neighborhood constraints between planes and ensure sharp boundaries between regions by employing a loopy belief propagation (BP) framework. Voxels that are not grouped into any of the planes are removed as clutter. Estimating local surfels can further comprise computing the local covariance at multiple scales at each voxel which contains 3D data. Grouping non-ground surfels into coarse regions can further comprises the steps of determining a histogram of local normal orientation and computing a measurement of local planarity by integrating normal direction cues over a large area; and starting region growing from voxels which have a higher degree of local planarity, wherein a voxel is joined into a proposed region if local planar regions are coherent. Estimating planes for each course region can further comprise enforcing a constraint that joined surfels are either planar or quadric surfaces, so that the planar surfaces are fit to the course regions. The expectation maximization technique includes iteratively fitting points to planes and declaring points not fitting on any planes as outliers until there is no more changes to the assigned points on planes or outliers.
Classifying points in the point cloud as ground and non-ground points can further comprise performing an initial filtering out of points in the point cloud wherein high points are removed, the remaining points being putative ground points, the high points being those points that are higher than a maximum predetermined threshold; estimating surfels for each of the putative ground points; grouping ground surfals into planar regions using bottom-up region growing; constructing a hole-free terrain; and classifying points as ground and non-ground, wherein non-ground points are those which are higher than a predetermined threshold above the terrain. Bottom-up region growing comprises finding those neighboring surfels whose normals are the same and whose origins are consistent such that it can be concluded that the neighboring surfels belong to the same plane, and repeating said step of finding wherein until sharp edges are reached.
The present invention will be more readily understood from the detailed description of an exemplary embodiment presented below considered in conjunction with the attached drawings and in which like reference numerals refer to similar elements and in which:
It is to be understood that the attached drawings are for purposes of illustrating the concepts of the invention and may not be to scale.
Referring now to
Referring now to
Referring now to
A specific rigorous implementation of steps 40-48 for classifying points of a LIDAR 3D point cloud as ground G and non-ground N is described hereinafter. Let xm, xM, ym, yM, zm and zM be the minimum and maximum bounds of the 3D points along the X, Y, Z axes. Denote by (i,j,k)=γ(p,vg) the voxelization transformation which maps a 3D point p into a voxel of a 3D grid, with vg being the length of the voxel side.
LIDAR sensors do not penetrate solid surfaces, hence ground points must have a minimum height in the vicinity of buildings. This constraint is employed to remove 3D measurements that cannot be part of the ground. An image AεE RNx×Ny is calculated containing at each pixel the minimum z value of all the points that project into it, where Nx=|(xM−xm)/vg|, Ny=|(yM−ym)/vg| and ┌x┐ is ceiling function representing the smallest integer larger than x. A 2D minimum filter is applied with radius |Lmax/vg| over A to produce an image B containing at a pixel Bi,j the ground minimum within a local neighborhood. The constant Lmax is chosen function of the largest distance between the projections onto the XY plane of any point paεN and its closest ground point pbεG (the Hausdorff distance). An upper bound on the height difference between pa and pb is Hmax=Lmax|tan(φmax| where φmax is the maximum slope of the ground and we have used the monotonicity of the tan( ) function.
Denote by Γ the set of voxels containing at least one point pu obeying puz≦Bi,j+Hmax+6σv, (i,j,·)=γ(pu,vg), where σv is the standard deviation of the noise, assumed known from the operating conditions. σv=0.15 m is employed. At each voxel (i,j,k)εΓ, a local planar model is fit at several scales ρ1< . . . <ρs using the Total Least Squares (TLS) algorithm. For each scale ρs the centroid is computed c(ρs) and the normalized scatter matrix C(ρs), of the points within radius ρs from the voxel center qi,j,k is computed. A plane (surfel) estimate π=(n, o) is represented by the normal n, P nP=1, and the origin o. The TLS plane estimate is obtained by computing the singular value decomposition (SVD) of the matrix C(ρs) and selecting n=θ3, the smallest singular vector of C(ρs) and o=c(ρs). The plane estimate is kept only if the singular values σ3s≦σ2s≦σ1s satisfy the planarity constraints σ3s/σ2s<τp, σ2s/σ1s>τKκ. ρS=3 m, τp=0.1 and τKκ=0.05 is used.
The surfel defined at the scale ρs
Let Γ1⊂Γ be the subset of putative ground voxels containing surfels, and let Γ2⊂Γ1 be the subset of Γ1 that contains voxels having at least one point ρu satisfying ρus≦Bi,j+h0, h0<Hmax+6σv. h0=4 m is used. For each voxel (i,j,k)εΓ1 a histogram Hi,j,k of normal coherence measures |ni′,j′k′T ni,j,k| is computed for all voxels (i′,j′,k′)εΓ1, P qi′,j′,k′−qi,j,kP≦R, with R=10 m. The entropy of the histogram Hi,j,k yields a planarity measure at a 3D location.
A set of seed voxels Γ3⊂Γ2 is selected by sorting the voxels in Γ2 in the increasing value of the entropy measures and utilize a greedy strategy to select voxels, followed by non maxima suppression to ensure spatial separation. Starting at each seed voxel from Γ3 a 26-connected-component region growing is performed by recursively adding new voxels from Γ1 that have consistent normals with their immediate neighbors. A voxel can be assigned to at most one ground region Γg
A digital terrain model (DTM) is computed, which is represented as an an image T containing at each pixel Ti,j the sampled ground height. The size of the pixel, w, is chosen small enough to ensure that constant ground height is a valid approximation. The DTM is initialized using height values sampled from Γg and apply kernel smoothing to smooth out the surfel estimates and to interpolate ground at missing locations.
Some entries in the matrix T may still be undefined, for example underneath buildings. To estimate the missing ground locations, the missing ground locations are grouped into holes based on eight-connected-component neighboring constraints. For each hole region the surrounding pixels are determined from T containing valid height estimates and use them to extrapolate the height within the hole.
A point pu is classified as nonground, puεN, if
*−0.19 cmpuz≧Ti,j+hb, (i,j)=γ([pux,puy]T, w), (1)
where hb is the minimum height of a building, specified by the user. In the present invention, hb=2 m.
Referring again to
A specific rigorous implementation of step 56 for parametric surface refinement with expectation maximization is described hereinafter. To initialize the building segmentation, surfels are estimated at voxel locations containing non-ground points puεN using a similar adaptive estimation described above in the rigorous implementation of steps 4048. Building segmentation requires a smaller voxel size vb<vg compared to ground classification. vb=1 m is used, since the data handled has typically a distance between samples of 1 m or more. Formally, let Φ denote the set of nonground voxels and Φ1⊂Φ denote the subset of voxels from Φ which contains surfels. A bottom-up grouping of neighboring voxels that contain consistent surfels into larger, nonparametric regions Ri is employed. Two voxels (i0,j0,k0),(i1,j1,k1)εΦ1 are considered consistent if
∥qi
|ni
with
and κ1, κ2 are two thresholds and δ is the maximum distance between the closest two 3D points that belong to the same surface. δ is chosen to be at least the sampling size. δ=2 m, κ1=cos(10°) and κ2=cos(85°).
The regions Ri with areas larger than a threshold are classified as buildings. The measurements which are not assigned to any valid region Ri are classified as clutter.
It is assumed that buildings have flat (planar) roofs. A second-order model (quadric) can be used in the same framework to model vaults. Let Ri, i=1, . . . , N be the coarse regions produced in the previous section. For clarity of presentation the region index i is dropped. In each region R planar models are fit robustly using MLESAC as described in P. Torr and A. Zisermann, “MLESAC: A new robust estimator with application to estimating image geometry,” CVIU, 78:138-156, 2000. The number of planes within R is unknown and has to be also estimated. Let πj, j=1, . . . , P denote the surfels from region R. A number B<P surfels are sampled without replacement from the set {πj,j=1, . . . , P} and a set Δ={π*b, b=1, . . . , B} is obtained. Algorithm 1 below is applied to segment out a number of planar regions Π={Π1, . . . , ΠF}. The set Π is refined using the expectation-maximization algorithm from algorithm 2 below.
1. Initialize the set Π={} and the current set of measurements U={pu|puεR}. Let F=1.
2. Calculate the MLESAC cost
and Dmax is the inlier distance. Dmax=3σv.
3. Select the hypothesis
yielding the smallest cost (4). Assign the points puεU to the plane ΠF=(nF, oF) if |nFT(pu−oF)|≦Dmax. Remove the points assigned to plane ΠF from U.
4. Refine the plane hypothesis ΠF using the measurements pu assigned to it. Insert ΠF to the set Π. Remove from the hypothesis list Δ all surfels which are consistent with ΠF. Two planes (surfels) π1=(n1, o1) and π2=(n2, o2) are considered consistent when |niT(o1−o2)|≦Dmax, i=1,2.
5. If the number of measurements in U, |U|, satisfies |U |≦3, or the number of hypotheses |Δ|=0, STOP. Otherwise, set F=F+1 and go to Step 2.
1. Initialize t=0 and Πf0=Πf, f=1, . . . , F.
2. Expectation: For all puεR compute the probability (weight) of assigning pu to Πft=(nft, oft)
if |(pu−oft)T nft|≦Dmax, and ξ(pu,Πft)=0, otherwise. As in Algorithm 1, Dmax=3σv.
3. Maximization: Reestimate the planes Πft+1, using a close-form variant of the HEIV algorithm as described in B. Matei and P. Meer, “Estimation of nonlinear errors-in-variables models for computer vision applications,” PAMI, 28:1537-1553, 1999 (hereinafter “Matei and Meer”), in which each constraint is weighted by ξ(pu,Πft). If changes occur, set t=t+1 and go to Step 2.
4. Assign puεR to the plane Πf
5. Repeat: merge any two planes Πf
where
is the rank five covariance matrix of the plane estimates Πf
A specific rigorous implementation of step 58 for segmenting building regions so as to enforce neighborhood constraints between planes and ensure sharp boundaries between regions by employing a loopy belief propagation (BP) framework is described hereinafter. All coarse regions are projected onto the ground plane XY and neighborhood relationships between them are extracted. For each region Ri, i=1, . . . , N all the regions Rj are determined that are within a distance D=5 m from Ri. The neighborhood relationships between regions form a graph. It is assumed that the neighborhood relationships are transitive, i.e. if Ri⇄Rj and Rj⇄Rk, then Ri⇄Rk. Cliques of regions forming disjoint sets are extracted. Clique extraction from graphs is an NP-complete problem so a sub-optimal approach is used based on greedy assignment.
Denote by B, one of the cliques results. Because each clique is independent of each other, the segmentation task can be distributed to be processed in parallel. Let xmB, xMB, ymB, yMB be the bounding box of the regions in B projected onto horizontal plane. B is assigned to all the points pu having the x and y components withing the bounding box of B. Let fp
assigned to a region from B and fp
A set Ψ={Ψ1, . . . , Ψj} is initialized and Ψm={puεB|∀pvεΨm, ∥≦δ,fp
Define VεRn
At Vq, we associate the belief vector λqεRJ containing the probability that Vq belongs to each of the J regions. The set of labels ζq={mp
At a cell Vq the data term vqεRJ is computed as
A Potts model is employed to model the interaction between 4-connected neighboring cells cells Vq and Vr
because it reduces the complexity of BP from quadratic to linear. Ppotts=0.01.
A max-product algorithm which maximizes the joint probability of label assignment over all cells is used as described in C. Bishop, “Pattern Recognition and Machine Learning,” Springer, 2006. The message from Vq to Vr is calculated as
ψ
q→r
t(m)=max(μq(m),a·Ppotts), (7)
with
is the neighborhood of q. The belief at node Vq is
where T is the last iteration of (7). The final label assigned to Vq is mq=arg maxm=1, . . . ,J bq(m).
Referring again to
More particularly, referring again to FIGS. 5 and 6A-6C, the 2D orientation of any polygon can be uniquely specified by an angle between 0° and 180°. In the case of the rectilinear polygon, due to the constraint that the angles between any two sides are a multiple of 90°, the orientation of the rectilinear shape can be specified uniquely using αzε[0°;90°]. The points from the outline are assumed ordered in a counter-clockwise order and can be quantized in an image having length of the pixel wc. wc is selected to be about 5 m based on the typical edge of a building. Based on the order of traversal of the contour points the pixel traversal order is determined. The upper-left occupied contour pixel is determined and then the contour is traversed, the pixels however at every step being moved only along the horizontal and vertical direction. To represent the direction of a side in the rectilinear polygon the chain code of
From
In general, neighboring buildings have a consistent orientation. Therefore, at step 64, neighborhoods are established and orientation consistancy for neighboring buildings is established. At step 66, a constraint is imposed that neighboring buildings of a local clique have a common orientation. More particularly, for each outline the modes of log-likelihood are extracted using mean-shift and only the modes which have a likelihood ratio higher than 0.8 compared to the best peak are retained. The neighboring outlines (regions) which have consistent orientation (i.e., they have peaks within 10° of each other) are determined. For all the neighboring outlines with consistent orientation the individual log-likelihoods are added and the angle corresponding to the maximum is determined. If α is the final angle estimate for a contour, the interval of the 2D projection of the contour along the direction corresponding to α and α+90° is computed and the angle leading to the largest interval is retained.
It is to be understood that the exemplary embodiments are merely illustrative of the invention and that many variations of the above-described embodiments may be devised by one skilled in the art without departing from the scope of the invention. It is therefore intended that all such variations be included within the scope of the following claims and their equivalents.