The present system and method are directed to three-dimensional (3D) image processing, and more particularly to a system that generates 3D models using a 3D mosaic method.
Three-dimensional (3D) modeling of physical objects and environments is used in many scientific and engineering tasks. Generally, a 3D model is an electronically generated image constructed from geometric primitives that, when considered together, describe the surface/volume of a 3D object or a 3D scene made of several objects. 3D imaging systems that can acquire full-frame 3D surface images of physical objects are currently available. However, most physical objects self-occlude and no single view 3D image suffices to describe the entire surface of a 3D object. Multiple 3D images of the same object or scene from various viewpoints have to be taken and integrated in order to obtain a complete 3D model of the 3D object or scene. This process is known as “mosaicing” because the various 3D images are combined together to form an image mosaic to generate the complete 3D model.
Currently known 3D modeling systems have several drawbacks. Existing systems require knowledge of the camera's position and orientation at which each 3D image was taken, making the system impossible to use with hand-held cameras or in other contexts where precise positional information for the camera is not available. Current systems cannot automatically generate a complete 3D model from 3D images without significant user intervention.
According to one exemplary embodiment, the present system and method are configured for modeling a 3D surface by obtaining a plurality of uncalibrated 3D images (i.e., 3D images that do not have camera position information), automatically aligning the uncalibrated 3D images into a similar coordinate system, and merging the 3D images into a single geometric model. The present system and method may also, according to one exemplary embodiment, overlay a 2D texture/color overlay on a completed 3D model to provide a more realistic representation of the object being modeled. Further, the present system and method, according to one exemplary embodiment, compress the 3D model to allow data corresponding to the 3D model to be loaded and stored more efficiently.
The accompanying drawings illustrate various embodiments of the present system and method and are a part of the specification. Together with the following description, the drawings demonstrate and explain the principles of the present system and method. The illustrated embodiments are examples of the present system and method and do not limit the scope thereof.
Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
The optical device (102) illustrated in
Often, 3D mosaics are difficult to piece together to form a 3D model because 3D mosaicing involves images captured in the (x,y,z) coordinate system rather than a simple (x,y) system. Often the images captured in the (x,y,z) coordinate system do not contain any positional data for aligning the images together. Conventional methods of 3D image integration rely on pre-calibrated camera positions to align multiple 3D images and require extensive manual routines to merge the aligned 3D images into a complete 3D model. More specifically, traditional systems include cameras that are calibrated to determine the physical position of the camera relative to a world coordinate system. Using the calibration parameters, the 3D images captured by the camera are registered into the world coordinate system through homogeneous transformations. While traditionally effective, this method requires extensive information about the camera's position for each 3D image, severely limiting how freely the camera can be moved.
The flowchart shown in
Image Selection
As illustrated in
Image Pre-Processing
Once a 3D image is selected, the selected image then undergoes an optional pre-processing step (step 204) to ensure that the 3D images to be integrated are of acceptable quality. This pre-processing step (step 204) may include any number of processing methods including, but in no way limited to, image filtration and elimination of “bad,” unreliable, or otherwise unwanted 3D data from the image. The pre-processing step (step 204) may also, according to one embodiment, include removal of noise caused by the camera to minimize or eliminate range errors in the 3D image calculation. Noise removal from the raw 3D camera images can be conducted via a spatial average or wavelet transformation process to “de-noise” the raw images acquired by the camera (102).
A number of noise filters consider only the spatial information of the 3D image (spatial averaging) or both the spatial and frequency information (wavelet decomposition). A spatial average filter is based on spatial operations performed on local neighborhoods of image pixels. The image is convolved with a spatial mask having a window. Assuming the noise has zero mean, the noise power is reduced by a factor equal to the number of pixels in the window. Although the spatial average filter is very efficient in reducing random noise in the image, it also introduces distortion that blurs the 3D image. The amount of distortion can be minimized by controlling the window size in the spatial mask.
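A minimal sketch of the spatial average filter described above, assuming a NumPy/SciPy environment; the window size and the uniform_filter helper are illustrative choices, not details fixed by the present method.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def spatial_average_denoise(range_image: np.ndarray, window: int = 3) -> np.ndarray:
    """Convolve the range image with a flat averaging mask of size window x window.

    For zero-mean random noise, averaging over window*window pixels reduces the
    noise power by roughly that factor, at the cost of some blurring; a smaller
    window limits the distortion introduced into the 3D image.
    """
    return uniform_filter(range_image.astype(float), size=window)
```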
Noise can also be removed, according to one exemplary embodiment, by wavelet decomposition of the original image, which considers both the spatial and frequency domain information of the 3D image. Unlike spatial average filters, which convolute the entire image with the same mask, the wavelet decomposition process provides a multiple resolution representation of an image in both the spatial and frequency domains. Because noise in the image is usually at a high frequency, removing the high frequency wavelets will effectively remove the noise.
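A hedged sketch of the wavelet-based de-noising described above, assuming the PyWavelets (pywt) package; the wavelet family, decomposition level, and threshold are illustrative values rather than parameters specified by the present method.

```python
import numpy as np
import pywt

def wavelet_denoise(range_image: np.ndarray, wavelet: str = "db2",
                    level: int = 2, threshold: float = 0.05) -> np.ndarray:
    """Suppress high-frequency wavelet coefficients, where most noise resides."""
    coeffs = pywt.wavedec2(range_image.astype(float), wavelet, level=level)
    denoised = [coeffs[0]]  # keep the coarse approximation untouched
    for detail_level in coeffs[1:]:
        # Soft-threshold the horizontal, vertical, and diagonal detail bands.
        denoised.append(tuple(pywt.threshold(d, threshold, mode="soft")
                              for d in detail_level))
    return pywt.waverec2(denoised, wavelet)
```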
Image Alignment or Registration
Regardless of which, if any, pre-processing operations are conducted on the selected 3D image, the 3D image then undergoes an image alignment step (step 206). Rather than rely upon camera position information or an external coordinate system, the present system and method rely solely upon the object's 3D surface characteristics, such as surface curvature, to join 3D images together. The 3D surface characteristics are independent of any coordinate system definition or illumination conditions, thereby allowing the present exemplary system and method to produce a 3D model without any information about the camera's position. Instead, according to one exemplary embodiment, the system locates corresponding points in overlapping areas of the images to be joined and performs a 4×4 homogeneous coordinate transformation to align one image with another in a global coordinate system.
The preferred alignment process will be described with reference to
Previous methods of aligning two 3D images required knowledge of the relative relationship between the coordinate systems of the two images; this position information is normally obtained via motion sensors. However, this type of position information is not available when the images are obtained from a hand-held 3D camera, making it impossible to calculate the relative spatial relationship between the two images using known imaging systems. Even in cases where position information is available, the information tends to be only an approximation of the relative camera positions, causing the images to be aligned inaccurately.
The present exemplary system provides more accurate image alignment, without the need for any camera position information, by aligning the 3D images based solely on information corresponding to the detected 3D surface characteristics. Because the alignment process in the present system and method does not need any camera position information, the present system and method can perform “free-form” alignment of the multiple 3D images to generate the 3D model, even if the images are from a hand-held camera. This free-form alignment eliminates the need for complex positional calibrations before each image is obtained, allowing free movement of both the object being modeled and the 3D imaging device to obtain the desired viewpoints of the object without sacrificing speed or accuracy in generating a 3D model.
An exemplary way in which the alignment step (step 206) is carried out imitates the way in which humans assemble a jigsaw puzzle in that the present system relies solely on local boundary features of each 3D image to integrate the images together, with no global frame of reference. Referring to
A local feature vector is produced for each fiducial point at step (302). The local feature vector corresponds to the local minimum and/or maximum curvature. The local feature vector for the fiducial point is defined as (k1, k2)t, where k1 and k2 are the minimum and maximum curvatures of the 3D surface at the fiducial point, respectively. The details of the computation of k1 and k2 are given below:
$z(x, y) = \beta_{20}x^2 + \beta_{11}xy + \beta_{02}y^2 + \beta_{10}x + \beta_{01}y + \beta_{00}$
Once a local feature vector is produced for each fiducial point, the method defines a 3×3 window for a fiducial point f0=(x0, y0, z0), which, according to one exemplary embodiment, contains all of its 8-connected neighbors {fw=(xw, yw, zw), w=1, . . . 8} (step 304), as shown in
or $Z = X\beta$ in vector form, where $\beta = [\beta_{20}\ \beta_{11}\ \beta_{02}\ \beta_{10}\ \beta_{01}\ \beta_{00}]^t$ is the unknown parameter vector to be estimated. Using the least mean square (LMS) estimation formulation, we can express β in terms of Z and X:
$\beta \approx \hat{\beta} = (X^t X)^{-1} X^t Z$
where $(X^t X)^{-1} X^t$ is the pseudo-inverse of X. The estimated parameter vector $\hat{\beta}$ is used for the calculation of the curvatures k1 and k2. Based on known definitions in differential geometry, k1 and k2 are computed from the intermediate variables E, F, G, e, f, g:
The minimum curvature at the point f0 is defined as:
and the maximum curvature is defined as:
In the preceding equations, k1 and k2 are two coordinate-independent parameters indicating the minimum and the maximum curvatures at f0, and they form the feature vector that represents local characteristics of the 3D surface for the image.
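A minimal sketch of this curvature computation, assuming a NumPy environment: the quadric patch is fit to the 3×3 neighborhood by least squares (realizing the LMS estimate of β above), and k1 and k2 are then obtained from the first and second fundamental form coefficients E, F, G, e, f, g. The neighborhood is assumed to be translated so that the fiducial point f0 lies at the origin.

```python
import numpy as np

def principal_curvatures(neighborhood: np.ndarray) -> tuple[float, float]:
    """neighborhood: (9, 3) array holding f0 and its 8-connected neighbors,
    expressed relative to f0.  Returns (k1, k2) = (min, max) curvature."""
    x, y, z = neighborhood[:, 0], neighborhood[:, 1], neighborhood[:, 2]
    # Design matrix for z = b20*x^2 + b11*x*y + b02*y^2 + b10*x + b01*y + b00.
    X = np.column_stack([x**2, x*y, y**2, x, y, np.ones_like(x)])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)   # beta_hat = (X^t X)^-1 X^t Z
    b20, b11, b02, b10, b01, _ = beta
    zx, zy = b10, b01                              # first derivatives at f0
    zxx, zxy, zyy = 2 * b20, b11, 2 * b02          # second derivatives at f0
    E, F, G = 1 + zx**2, zx * zy, 1 + zy**2        # first fundamental form
    denom = np.sqrt(1 + zx**2 + zy**2)
    e, f, g = zxx / denom, zxy / denom, zyy / denom  # second fundamental form
    H = (e * G - 2 * f * F + g * E) / (2 * (E * G - F**2))   # mean curvature
    K = (e * g - f**2) / (E * G - F**2)                      # Gaussian curvature
    root = np.sqrt(max(H**2 - K, 0.0))
    return H - root, H + root
```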
Once each of the two 3D images to be integrated has a set of defined local fiducial points, the present exemplary system derives a 4×4 homogeneous spatial transformation to align the fiducial points in the two 3D images into a common coordinate system (step 306). Preferably, this transformation is carried out via a least-square minimization method, which will be described in greater detail below with reference to
According to the present exemplary method, the corresponding fiducial point pairs on surface A and surface B illustrated in
where T is a translation vector, i.e., the displacement between the centroid of the points Ai and the centroid of the points Bi. R is found by constructing a cross-covariance matrix between centroid-adjusted pairs of points.
In other words, during the alignment step (step 206), the present exemplary method starts with a first fiducial point on surface A (which is in the first image) and searches for the corresponding fiducial point on surface B (which is in the second image). Once the first corresponding fiducial point on surface B is found, the present exemplary method uses the spatial relationship of the fiducial points to predict possible locations of other fiducial points on surface B and then compares local feature vectors of corresponding fiducial points on surfaces A and B. If no match for a particular fiducial point on surface A is found on surface B during a particular prediction, the prediction process is repeated until a match is found. The present exemplary system matches additional corresponding fiducial points on surfaces A and B until alignment is complete.
Note that not all measured points have the same amount of error. For 3D cameras that are based on the structured light principle, for example, the confidence of a measured point on a grid formed by the fiducial points depends on the surface angle with respect to the light source and the camera's line-of-sight. To take this into account, the present exemplary method can specify a weight factor, wi, to be a dot product of the grid's normal vector N at point P and the vector L that points from P to the light source. The minimization problem is expressed as a weighted least-squares expression:
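A hedged sketch of this closed-form coarse alignment, assuming a NumPy environment. R is recovered from the singular value decomposition of the (optionally weighted) cross-covariance matrix of the centroid-adjusted point pairs, and T from the centroid difference; the SVD route is one standard realization of the cross-covariance construction named above, not a step prescribed by the text.

```python
import numpy as np

def rigid_transform(A: np.ndarray, B: np.ndarray, w: np.ndarray | None = None):
    """A, B: (n, 3) corresponding fiducial points; w: optional (n,) weights such
    as the normal/light-source dot products described above.
    Returns R (3x3), T (3,) minimizing sum_i w_i * ||R @ B_i + T - A_i||^2."""
    w = np.ones(len(A)) if w is None else np.asarray(w, dtype=float)
    w = w / w.sum()
    ca = (w[:, None] * A).sum(axis=0)            # weighted centroid of A
    cb = (w[:, None] * B).sum(axis=0)            # weighted centroid of B
    H = (w[:, None] * (B - cb)).T @ (A - ca)     # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflection
    R = Vt.T @ D @ U.T
    T = ca - R @ cb
    return R, T
```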
To achieve “seamless” alignment, a “Fine Alignment” optimization procedure is designed to further reduce the alignment error. Unlike the coarse alignment process mentioned above where we derived a closed-form solution, the fine alignment process is an iterative optimization process.
According to one exemplary embodiment, the seamless or fine alignment optimization procedure is performed by an optimization algorithm, which will be described in detail below. As discussed in previous sections, we define the index function:
where R is a rotation matrix expressed as a function of three rotation angles (α, β, γ), t is a translation vector (x, y, z), and Ai and Bi are the n corresponding sample points on surfaces A and B, respectively.
Rather than using just the selected feature points, as was done for the coarse alignment, the present exemplary embodiment of the fine alignment procedure uses a large number of sample points Ai and Bi in the shared region and calculates the error index value for a given set of R and T parameters. Small perturbations to the parameter vector (α,β,γ,x,y,z) are generated in all possible first-order differences, which results in a set of new index values. If the minimal value of this set of indices is smaller than the initial index value of this iteration, the new parameter set is updated and a new round of optimization begins.
During operation of the fine alignment optimization procedure, two sets of 3D images, denoted as surface A and surface B, are input to the algorithm along with the initial coarse transformation matrix (R(k), t(k)) having initial parameter vector (α0, β0, γ0, x0, y0, z0). The algorithm outputs a transformation (R′, t′) that aligns A and B. At each iteration, for any given sample point Ai(k) on surface A, the present exemplary method searches for the closest corresponding point Bi(k) on surface B, such that the distance d=|Ai(k)−Bi(k)| is minimal over all neighborhood points of Bi(k).
The error index for perturbed parameter vector (αk±Δα, βk±Δβ, γk±Δγ, xk±Δx,yk±Δy,zk±Δz) can then be determined, where (Δα, Δβ, Δγ, Δx, Δy, Δz) are pre-set parameters. By comparing the index values of the perturbed parameters, an optimal direction can be determined. If the minimal value of this set of indices is smaller than the initial index value of this iteration k, the new parameter set is updated and a new round of optimization begins.
If, however, the minimal value of this set of indices is greater than the initial index value of this iteration k, the optimization process is terminated. The convergence of the proposed iterative fine alignment algorithm can be easily proven. Notice that I(k+1) < I(k) holds for k = 1, 2, . . . ; because the error index decreases monotonically and is bounded below by zero, the optimization process can never diverge.
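A minimal sketch of this iterative fine-alignment loop, assuming a NumPy/SciPy environment. The error index is taken as the sum of squared closest-point distances between the transformed sample points of surface A and surface B; the one-parameter-at-a-time ±Δ perturbation scheme and the Euler-angle convention are illustrative readings of the "first-order difference" perturbations described above.

```python
import itertools
import numpy as np
from scipy.spatial import cKDTree

def euler_to_matrix(a, b, g):
    """Rotation built from the three angles (alpha, beta, gamma) about x, y, z."""
    ca, sa = np.cos(a), np.sin(a)
    cb, sb = np.cos(b), np.sin(b)
    cg, sg = np.cos(g), np.sin(g)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def error_index(params, A, tree_B):
    """I(R, t): sum of squared closest-point distances from transformed A to B."""
    R, t = euler_to_matrix(*params[:3]), np.asarray(params[3:])
    dists, _ = tree_B.query(A @ R.T + t)
    return float(np.sum(dists ** 2))

def fine_align(A, B, params0, deltas, max_iter=100):
    """params0 = (alpha0, beta0, gamma0, x0, y0, z0) from the coarse alignment."""
    tree_B = cKDTree(B)
    params = np.asarray(params0, dtype=float)
    best = error_index(params, A, tree_B)
    for _ in range(max_iter):
        candidates = [params + sign * np.eye(6)[i] * deltas[i]
                      for i, sign in itertools.product(range(6), (+1, -1))]
        errors = [error_index(c, A, tree_B) for c in candidates]
        if min(errors) >= best:      # no perturbation improves the index: stop
            break
        best = min(errors)
        params = candidates[int(np.argmin(errors))]
    return params, best
```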
Returning to
According to one exemplary embodiment of the present system and method, a user is allowed to facilitate the registration and alignment (step 206) by manually selecting a set of feature points (a minimum of three points in each image) in the region shared by a plurality of 3D images. Using the curvature calculation algorithm discussed previously, the program obtains curvature values from one 3D image and searches for the corresponding point on another 3D image that has the same curvature values. The feature points on the second image are thus adjusted to the points whose calculated curvature values match those of the corresponding points from the first image. The curvature comparison process establishes the spatial correspondence among these feature points.
Any inaccuracy in establishing the correspondence of feature points leads to inaccurate estimation of the transformation parameters. Consequently, a verification mechanism may be employed, according to one exemplary embodiment, to check the validity of the corresponding feature points found by the curvature-matching algorithm. Only valid corresponding pairs may then be selected to calculate the transformation matrix.
According to one exemplary embodiment, the distance constraints imposed by rigid transformations may be used as the validation criteria. Given feature points A1 and A2 on the surface A and corresponding B1 and B2 on the surface B, the following constraint holds for all the rigid transformations:
$\|A_1 - A_2\| = \|B_1 - B_2\|$, or $\delta_{12}^A = \delta_{12}^B$
Otherwise, (A1, A2) and (B1, B2) cannot both be valid feature point pairs. If the difference between δ12A and δ12B is sufficiently large, 10% of the segment length, for example, we can reasonably assume that the feature point pair is invalid. In the case where multiple feature points are available, all possible pairs (Ai, Aj) and (Bi, Bj) may be examined, where i, j = 1, 2, . . . , N. The points are then ranked according to the number of incompatible pairs in which they participate and removed according to their ranking on the list.
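A hedged sketch of this distance-constraint validation, assuming a NumPy environment; the 10% tolerance follows the example above, while the cutoff used to decide which ranked points to discard is an illustrative policy.

```python
import numpy as np

def prune_invalid_correspondences(A: np.ndarray, B: np.ndarray, tol: float = 0.10):
    """A, B: (N, 3) candidate corresponding feature points on surfaces A and B.
    Returns indices of points kept after removing the worst offenders."""
    n = len(A)
    incompatible = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            dA = np.linalg.norm(A[i] - A[j])       # delta_ij on surface A
            dB = np.linalg.norm(B[i] - B[j])       # delta_ij on surface B
            if abs(dA - dB) > tol * max(dA, dB):   # rigid-distance constraint violated
                incompatible[i] += 1
                incompatible[j] += 1
    # Keep points compatible with at least half of the others (illustrative cutoff).
    return np.flatnonzero(incompatible <= n // 2)
```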
According to the above-mentioned method, the transformation matrix can be calculated using three feature point pairs. Given feature points A1, A2 and A3 on surface A and corresponding B1, B2 and B3 on surface B, a transformation matrix can be obtained by first aligning B1 with A1 (via a simple translation), then aligning B2 with A2 (via a simple rotation around A1), and finally aligning B3 with A3 (via a simple rotation around the A1A2 axis). Combining these three simple transformations produces an alignment matrix.
In the case where multiple feature points are available, all possible triples (Ai, Aj, Ak) and (Bi, Bj, Bk), where i, j, k = 1, 2, . . . , N, are examined. Subsequently, the transformation matrices are ranked according to an error index
Then the transformation matrix that produces the minimum error will be selected.
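A minimal sketch of this three-point alignment and error-based selection, assuming a NumPy environment. The rigid transform for each triple is built by aligning local orthonormal frames, which is one concrete way to compose the translation-plus-two-rotations sequence described above; the error index used for ranking is the sum of squared residuals over all correspondences.

```python
import itertools
import numpy as np

def frame(p1, p2, p3):
    """Orthonormal frame anchored at p1 with its x-axis along p1 -> p2."""
    x = (p2 - p1) / np.linalg.norm(p2 - p1)
    z = np.cross(x, p3 - p1)
    z /= np.linalg.norm(z)
    return np.column_stack([x, np.cross(z, x), z])

def three_point_transform(A3, B3):
    """A3, B3: (3, 3) arrays of three corresponding points; R @ Bi + T ~= Ai."""
    R = frame(*A3) @ frame(*B3).T
    return R, A3[0] - R @ B3[0]

def best_transform(A, B):
    """Examine every triple (Ai, Aj, Ak)/(Bi, Bj, Bk) and keep the transform
    producing the minimum summed alignment error over all feature points."""
    best_err, best_RT = np.inf, None
    for i, j, k in itertools.combinations(range(len(A)), 3):
        R, T = three_point_transform(A[[i, j, k]], B[[i, j, k]])
        err = float(np.sum(np.linalg.norm(B @ R.T + T - A, axis=1) ** 2))
        if err < best_err:
            best_err, best_RT = err, (R, T)
    return best_RT, best_err
```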
In addition to the above-mentioned registration techniques, a number of alternative 3D registration methods may be employed. According to one exemplary embodiment, an iterative closest point (ICP) algorithm may be performed for 3D registration. The idea of the ICP algorithm is, given two sets of 3D points representing two surfaces called P and X, to find the rigid transformation, defined by a rotation R and translation T, that minimizes the sum of squared Euclidean distances between the corresponding points of P and X. The sum of all squared distances gives rise to the following surface matching error:
By iteration, optimum R and T values are found to minimize the error e(R, T). In each step of the iteration process, the closest point xk on X to pk on P is obtained using an efficient search structure such as the k-d tree partitioning method.
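A hedged sketch of the ICP iteration described above, assuming a NumPy/SciPy environment: a k-d tree over X answers the closest-point queries, and the per-iteration rigid update uses the standard SVD solution. The convergence tolerance and iteration cap are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_fit(src: np.ndarray, dst: np.ndarray):
    """Rigid (R, T) minimizing sum_k ||R @ src_k + T - dst_k||^2 (SVD solution)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs

def icp(P: np.ndarray, X: np.ndarray, R: np.ndarray, T: np.ndarray,
        max_iter: int = 50, tol: float = 1e-6):
    """Refine the initial (R, T) so that R @ p_k + T matches surface X."""
    tree = cKDTree(X)                           # k-d tree over the target surface X
    prev_err = np.inf
    for _ in range(max_iter):
        dists, idx = tree.query(P @ R.T + T)    # closest point x_k on X for each p_k
        err = float(np.mean(dists ** 2))        # surface matching error e(R, T)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
        R, T = best_fit(P, X[idx])              # re-estimate transform from the pairs
    return R, T, err
```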
When the calibration information of the 3D camera is known, the pin-hole camera model allows the computationally intensive 3D searching process to become a 2D searching process on the image plane of the camera. This saves considerable time over traditional ICP algorithm processing, especially when aligning dozens of range images.
The above-mentioned ICP algorithm requires two surfaces that have been roughly brought together; otherwise the ICP algorithm will converge to some local minimum. According to one exemplary embodiment, roughly bringing the two surfaces together can be done by manually selecting corresponding feature points on the two surfaces.
However, in many applications such as the 3D ear camera, automatic registration is desired. According to one exemplary embodiment, feature tracking is performed through a video sequence to construct the correspondence between two 2D images. Subsequently, camera motion can be obtained by known Structure From Motion (SFM) methods. A good feature for tracking is a textured patch with high intensity variation in both x and y directions, such as a corner. Accordingly, the intensity function may be denoted by I(x, y) and the local intensity variation matrix as:
According to one exemplary embodiment, a patch defined by a 25×25 window is accepted as a candidate feature if, in the center of the window, both eigenvalues of Z, λ1 and λ2, exceed a predefined threshold λ: min(λ1, λ2)>λ.
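A minimal sketch of this candidate-feature test, assuming a NumPy environment; the gradient operator and the threshold value are illustrative, and the window half-size of 12 corresponds to the 25×25 window mentioned above.

```python
import numpy as np

def is_good_feature(I: np.ndarray, cx: int, cy: int,
                    half: int = 12, lam: float = 1e3) -> bool:
    """I: grayscale image; (cx, cy): candidate center of a (2*half+1)^2 window."""
    patch = I[cy - half:cy + half + 1, cx - half:cx + half + 1].astype(float)
    gy, gx = np.gradient(patch)                        # intensity derivatives Iy, Ix
    Z = np.array([[np.sum(gx * gx), np.sum(gx * gy)],  # local intensity variation matrix
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    lam1, lam2 = np.linalg.eigvalsh(Z)                 # eigenvalues, ascending order
    return min(lam1, lam2) > lam                       # accept if min(l1, l2) > lambda
```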
A KLT feature tracker is used for tracking good feature points through a video sequence. The KLT feature tracker is based on the early work of Lucas and Kanade as disclosed in Bruce D. Lucas and Takeo Kanade, An Iterative Image Registration Technique with an Application to Stereo Vision, International Joint Conference on Artificial Intelligence, pages 674-679, 1981, as well as the work of Shi and Tomasi in Jianbo Shi and Carlo Tomasi, Good Features to Track, IEEE Conference on Computer Vision and Pattern Recognition, pages 593-600, 1994, which references are incorporated herein by reference in their entirety. Briefly, good features are located by examining the minimum eigenvalue of each 2 by 2 gradient matrix, and features are tracked using a Newton-Raphson method of minimizing the difference between the two windows.
After having the corresponding feature points on multiple images, the 3D scene structure or camera motion can be recovered from the feature correspondence information. According to one exemplary embodiment, approaches for recovering camera motion or structure are taught in Hartley, R. I. [Richard I.], In Defense of the Eight-Point Algorithm, PAMI(19), No. 6, June 1997, pp. 580-593, and Z. Zhang, R. Deriche, O. Faugeras, Q.-T. Luong, "A Robust Technique for Matching Two Uncalibrated Images Through the Recovery of the Unknown Epipolar Geometry", Artificial Intelligence Journal, Vol. 78, pages 87-119, October 1995, which references are incorporated herein by reference in their entirety. However, with the above-mentioned methods, the results are either unstable, require estimation of ground truth, or yield only a unit vector for the translation T.
According to one exemplary embodiment, with help from the 3D surfaces corresponding to the 2D images, the 3D positions of well-tracked feature points can be used directly as the initial guess for 3D registration.
Alternatively, the 3D image registration process may be fully automatic. That is, with the ICP and automatic feature tracking techniques, the entire process of 3D image registration may be performed by: capturing one 3D surface through a 3D camera; while moving to the next position, capturing the video sequence and performing feature tracking; capturing another 3D surface at the new position; obtaining the initial guess for the 3D registration from the tracked feature points on the 2D video; and using the ICP method to refine the 3D registration.
While the above-mentioned method is somewhat automatic, computational efficiency is an important issue in the application of aligning range images. Various data structures are used to facilitate the search for the closest point. Traditionally, the k-d tree is the most popular data structure for fast closest-point search. It is a multidimensional search tree for points in k-dimensional space. Levels of the tree are split along successive dimensions at the points. The memory requirement for this structure grows linearly with the number of points and is independent of the number of used features.
However, when dealing with tens of range images with hundreds of thousands of 3D points, the k-d tree method becomes less effective, not only due to the performance of the k-d tree structure itself, but also due to the amount of memory used to store this structure for each range image.
Consequently, according to one exemplary embodiment, an exemplary registration method based on the pin-hole camera model is proposed to reduce the memory used and enhance performance. According to the present exemplary embodiment, the 2D closest point search is converted to 1D and has no extra memory requirement.
Previously existing methods (such as the k-d tree) perform registration without taking into consideration the nature of 3D images; thus, they cannot take advantage of the known sensor configuration to simplify the calculation. The present exemplary method improves on the speed of traditional image registration methods by incorporating knowledge the user already has about the imaging sensor into the algorithm.
According to the present exemplary method, 3D range images are created from a 3D sensor. Traditionally, a 3D sensor includes one CCD camera and a projector. The camera can be described by the widely used pinhole model as illustrated in
where s is an arbitrary scale and P is a 3×4 matrix, called the perspective projection matrix. Consequently, the one-to-one correspondence of a 3D point to a 2D point on the image plane can be obtained as mentioned above.
The matrix P can be decomposed as P=A[R, T], where A is a 3×3 matrix, mapping the normalized image coordinates to the retinal image coordinates, and (R, T) is the 3D motion (rotation and translation) from the world coordinate system to the camera coordinate system. The most general matrix A can be written as:
where f is the focal length of the camera, ku and kv are the horizontal and vertical scale factors, whose inverses characterize the size of a pixel in world coordinate units, and u0 and v0 are the coordinates of the principal point of the camera, the intersection between the optical axis and the image plane. These parameters, called the internal and external parameters of the camera, are known after camera calibration.
Given another 3D surface P, finding the closest point on surface X corresponding to p(x, y, z) on surface P can be performed as follows. By projecting p(x, y, z) onto the image plane of surface X, m(u, v), a 2D point on the image plane of X, can be calculated as noted above. Meanwhile the correspondence of m(u, v) to the 3D point x(x, y, z) is already available because x(x, y, z) was calculated from m(u, v) during triangulation. This 3D point x(x, y, z) is a good estimate of the closest point to p(x, y, z) on surface X. The reason is that the ICP method requires that surface X and surface P be roughly brought together (the initial guess). Due to this good initial estimate, it is acceptable to perform an exhaustive search near x(x, y, z) for better accuracy.
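A hedged sketch of this projection-based closest-point search, assuming a NumPy environment and that surface X is stored as a per-pixel map of triangulated 3D points; the helper names and the small pixel neighborhood used for the exhaustive refinement are illustrative.

```python
import numpy as np

def project(P_matrix: np.ndarray, p: np.ndarray) -> tuple[int, int]:
    """Apply s*[u, v, 1]^t = P @ [x, y, z, 1]^t and return the integer pixel (u, v)."""
    uvw = P_matrix @ np.append(p, 1.0)
    return int(round(uvw[0] / uvw[2])), int(round(uvw[1] / uvw[2]))

def closest_point_on_X(p, P_matrix, xyz_map, valid_mask, radius: int = 2):
    """xyz_map: (H, W, 3) per-pixel 3D points of surface X; valid_mask: (H, W) bool.
    Returns the stored 3D point nearest to p within a small pixel neighborhood."""
    u, v = project(P_matrix, p)
    h, w = valid_mask.shape
    best, best_d = None, np.inf
    for dv in range(-radius, radius + 1):          # exhaustive search near x(x, y, z)
        for du in range(-radius, radius + 1):
            uu, vv = u + du, v + dv
            if 0 <= uu < w and 0 <= vv < h and valid_mask[vv, uu]:
                d = np.linalg.norm(xyz_map[vv, uu] - p)
                if d < best_d:
                    best, best_d = xyz_map[vv, uu], d
    return best, best_d
```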
As illustrated in
Data Merging
Once the alignment step (step 206) is complete, the present exemplary method merges, or blends, the aligned 3D images to form a uniform 3D image data set (step 208). The object of the merging step (step 208) is to merge the two raw, aligned 3D images into a seamless, uniform 3D image that provides a single surface representation and that is ready for integration with a new 3D image. As noted above, the full topology of a 3D object is realized by merging new 3D images one by one to form the final 3D model. The merging step (step 208) smoothes the boundaries of the two 3D images together because the 3D images usually do not have the same spatial resolution or grid orientation, causing irregularities and reduced image quality in the 3D model. Noise and alignment errors also may contribute to surface irregularities in the model.
For the boundary determination step (600), the present exemplary system can use a method typically applied to 2D images as described in P. Burt and E. Adelson, “A multi-resolution spline with application to image mosaic”, ACM Trans. On Graphics, 2(4):217, 1983, the disclosure of which is incorporated by reference herein. As shown in
The quality of the 3D image data is also considered in determining the boundary (704). The present exemplary method generates a confidence factor corresponding to a given 3D image, which is based on the difference between the 3D surface's normal vector and the camera's line-of-sight. Generally speaking, 3D image data will be more reliable for areas where the camera's line-of-sight is aligned with or almost aligned with the surface's normal vector. For areas where the surface's normal vector is at an angle with respect to the camera's line of sight, the accuracy of the 3D image data deteriorates. The confidence factor, which is based on the angle between the surface's normal vector and the camera's line-of-sight, is used to reflect these potential inaccuracies.
More particularly, the boundary determining step (600) combines the 3D distance (denoted as “d”) and the confidence factor (denoted as “c”) to obtain a weighted sum that will be used as the criterion to locate the boundary line (704) between the two aligned 3D images (700, 702):
D=w1d+w2c
Determining a boundary line (704) based on this criterion results in a pair of 3D images that meet along a boundary with points of nearly equal confidences and distances.
After the boundary determining step, the process smoothes the boundary (704) using a fuzzy weighting function (step 602). As shown in
Re-Sampling
After the smoothing step (602), the exemplary merging method illustrated in
Consequently, the re-sampling method (step 209), as illustrated in
Alternatively, after each 3D image has been aligned (i.e., registered) into the same coordinate system, a single 3D surface model can be created from those range images. There are mainly two approaches to generating this single 3D iso-surface model, mesh integration and volumetric fusion, as disclosed in Turk, G., M. Levoy, Zippered polygon meshes from range images, Proc. of SIGGRAPH, pp. 311-318, ACM, 1994, and Curless, B., M. Levoy, A volumetric method for building complex models from range images, Proc. of SIGGRAPH, pp. 303-312, ACM, 1996, both of which are incorporated herein by reference in their entirety.
The mesh integration approach can only deal with simple cases, such as where only two range images are involved in the overlapping area. Otherwise the situation becomes too complicated to establish the relationships among the range images and to merge the overlapping area into an iso-surface.
In contrast, the volumetric fusion approach is a general solution that is suitable for various circumstances. For instance, for full coverage, dozens of range images are to be captured for an ear impression, and quite a few of the range images will overlap one another. The volumetric fusion approach is based on the idea of marching cubes, which creates a triangular mesh that approximates the iso-surface.
According to one exemplary embodiment, an algorithm for marching cubes includes: first, locating the surface within a cube of eight vertices; then assigning 0 to each vertex outside the surface and 1 to each vertex inside the surface; then generating triangles based on the surface-cube intersection pattern; and finally marching to the next cube.
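A hedged usage sketch of the marching-cubes extraction, assuming the scikit-image library; the synthetic signed-distance volume below (a sphere) is an illustrative stand-in for the fused distance field built from the aligned range images.

```python
import numpy as np
from skimage.measure import marching_cubes

# Illustrative stand-in for the fused volume: signed distance to a sphere of
# radius 20 voxels (negative inside the surface, positive outside).
grid = np.mgrid[0:64, 0:64, 0:64]
volume = np.sqrt(((grid - 32.0) ** 2).sum(axis=0)) - 20.0

# Extract the triangular mesh that approximates the iso-surface at level 0.
verts, faces, normals, values = marching_cubes(volume, level=0.0)
print(len(verts), "vertices,", len(faces), "triangles")
```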
Selecting Additional Images
Continuing with
After the 3D model is complete and it is determined that there are no further images available for merging (NO, step 210), it may be desirable, according to one exemplary embodiment, to compress the 3D model data (step 214) so that it can be loaded, transferred, and/or stored more quickly. As is known in the art and noted above, a 3D model is a collection of geometric primitives that describe the surface and volume of a 3D object. The size of a 3D model of a realistic object is usually quite large, ranging from several megabytes (MB) to several hundred MB files. The processing of such a huge 3D model is very slow, even on the state-of-the-art high-performance graphics hardware.
According to one exemplary embodiment, a polygon reduction method is used as a 3D image compression process in the present exemplary method (step 214). Polygon reduction generally entails reducing the number of geometric primitives in a 3D model while minimizing the difference between the reduced and the original models. A preferred polygon reduction method also preserves important surface features, such as surface edges and local topology, to maintain important surface characteristics in the reduced model.
More particularly, an exemplary compression step (step 214) used in the present exemplary method involves using a multi-resolution triangulation algorithm that inputs the 3D data file corresponding to the 3D model and changes the 3D polygons forming the model into 3D triangles. Next, a sequential optimization process iteratively removes vertices from the 3D triangles based on an error tolerance selected by the user. For example, in dental applications, the user may specify a tolerance of about 25 microns, whereas in manufacturing applications, a tolerance of about 0.01 mm would be acceptable. A 3D distance between the original and reduced 3D model, as shown in
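A hedged sketch of the polygon-reduction step, assuming the Open3D library; quadric-error decimation is used here as a stand-in for the multi-resolution triangulation and vertex-removal procedure described above, and the input mesh and target triangle count are illustrative.

```python
import open3d as o3d

# Illustrative stand-in for a dense reconstructed model: a finely tessellated sphere.
mesh = o3d.geometry.TriangleMesh.create_sphere(radius=1.0, resolution=100)
print(len(mesh.triangles), "triangles before reduction")

# Reduce the model to a target triangle budget (e.g., a 5K-triangle version).
reduced = mesh.simplify_quadric_decimation(target_number_of_triangles=5000)
print(len(reduced.triangles), "triangles after reduction")
```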
As can be seen in
The present exemplary method may continue by performing post-processing steps (steps 216, 218, 220, 222) to enhance and preserve the image quality of the 3D model. These post-processing steps can include, but are in no way limited to, miscellaneous 3D model editing functions (step 216), such as retouching the model, or overlaying the 3D model with a 2D texture/color overlay (step 218) to provide a more realistic 3D representation of an object. Additionally, the texture overlay technique may provide an effective way to reduce the number of polygons in a 3D geometric model while preserving a high level of visual fidelity of 3D objects. In addition to the 3D model editing functions (step 216) and the texture/color overlay (step 218), the present exemplary method may also provide a graphical 3D data visualization option (step 220) and the ability to save and/or output the 3D model (step 222). The 3D visualization tool allows users to assess the 3D Mosaic results and extract useful parameters from the completed 3D model. Additionally, the 3D model may be output or saved on any number of storage or output mediums.
According to one exemplary embodiment, the present system and method are presented through an interactive graphical user interface (GUI) to ensure ease of use and to streamline the process of 3D image acquisition, processing, alignment/merging, compression, and transmission. The GUI allows the user to have full control of the process while maintaining its intuitiveness and speed.
According to one exemplary embodiment, the GUI and its associated components and software contain software drivers for acquiring images using various CCD cameras, both analog and digital, while handling both monochrome and color image sensors. Using the GUI and its associated software, various properties of the captured images may be controlled including, but in no way limited to, resolution (number of pixels such as 240 by 320, 640 by 480, 1040 by 1000, etc.); color (binary, 8-bit monochrome, 9-bit, 15-bit, or 24-bit RGB color, etc.); acquisition speed (30 frames per second (fps), 15 fps, free-running, user specified, etc.); and file format (tiff, bmp, and many other popular 2D image formats, with conversion utilities among these file formats).
Additionally, according to one exemplary embodiment, the GUI and its associated software may be used to display and manipulate 3D models. According to one exemplary embodiment, the software is written in C++ using the OpenGL library under the WINDOWS platform. According to this exemplary embodiment, the GUI and its associated software are configured to: first, provide multiple viewing windows controlled by users to simultaneously view the 3D object from different perspectives; second, manipulate one or more 3D objects on the screen, such manipulation including, but not limited to, rotation around and translation along three spatial axes to provide full six-degrees-of-freedom manipulation capabilities, zoom in/out, automatic centering and scaling of the displayed 3D object to fit the screen size, and multiple-resolution display during manipulation to improve the speed of operation; and third, set material properties and display and color modes for optimized rendering results including, but in no way limited to, multiple rendering modes (surface, point cloud, mesh, smoothed surface, and transparency), short-cut keys for frequently used functions, and online documentation. Additionally, the pose of each 3D image can be changed in all degrees of freedom of translation/rotation with a three-key mouse or other similar input device.
According to another exemplary embodiment, the GUI interface and its associated software may be used to clean up received 3D image data. According to this exemplary embodiment, the received 3D images are interpolated on a square parametric grid. Once interpolated, bad 3D data can be identified based on a bad viewing angle of the optical and light devices, a lack of continuity in the received data relative to a threshold distance, and/or Za and Zb constraints.
Further, using iterative minimum-distance algorithms, the software associated with the present system and method is configured to determine, via a trial-and-error method, the transformation matrix that minimizes the registration error, defined as the sum of distances between corresponding points on a plurality of 3D surfaces. According to the present exemplary embodiment, in each iteration the software initiates several incremental transformation matrices and finds the best one that minimizes the registration error. Such an incremental matrix approaches the identity matrix if the iterative optimization process converges.
Applications
According to one exemplary embodiment, the above-mentioned system and method are used to form a 3D model of a dental prosthesis for CAD/CAM-based restoration. While traditional dental restorations rely upon physical impressions to obtain the precise shape of the complex dental surface, the present 3D dental imaging technique eliminates traditional dental impressions and provides an accurate 3D model of dental structures.
According to one exemplary embodiment, digitizing dental casts for building crowns and other dental applications includes taking five 3D images from five views (top, right, left, upper and lower sides). These images are pre-processed to eliminate “bad points” and imported into the above-mentioned alignment software, which conducts both the “coarse” and the “fine” alignment procedures. After obtaining the alignment transformations for all five images, boundary detection is performed and unwanted portions of 3D data from the original images are cut off. The transformation matrices are then used to align these processed images together.
Once the source image is transformed using the spatial transformation determined by the alignment process, in most cases only parts of the multiple images overlap. Therefore the error is calculated only in the overlapping regions. In general, the alignment error is primarily determined by two factors: the noise level in the original 3D images and the accuracy of the alignment algorithm.
According to one exemplary embodiment, the 3D dental model is sent to commercial dental prosthesis vendors to have an actual duplicated dental part made using a high-precision milling machine. The duplicated part, as well as the original tooth model, is then sent to a calibrated touch-probe 3D digitization machine to measure the surface profiles. The discrepancy between the original tooth model and the duplicated part is within an acceptable level (<25 microns) for dental restoration applications.
Additionally, the present system and method may be used in plastic surgery applications. According to one exemplary embodiment, the above-mentioned system and method may be implemented for use in plastic surgery planning, evaluation, training, and documentation.
The human body is a complex 3D object. The quantitative 3D measurement data enables plastic surgeons to perform high-fidelity pre-surgical prediction, post-surgical monitoring, and computer-aided procedure design. The 2D and 3D images captured by the 3D video camera allow the surgeon and the patient to discuss the surgical planning process through the use of actual 2D/3D images and computer-generated alterations. Direct preoperative visual communication helps to increase postoperative satisfaction by improving patient education with regard to realistic results. The 3D visual communication may also be invaluable in resident and fellow teaching programs between attending and resident surgeons.
In some plastic surgery applications, such as breast augmentation and facial surgeries, single-view 3D images provide sufficient quantitative information for the intended applications. However, for other clinical cases such as breast reduction, due to the extreme size of the breast, multiple 3D images from different viewing angles are needed to cover the entire region.
Applying the procedures of pre-processing and coarse/fine alignment with our prototype software, three 3D images can be merged into a complete breast model. These breast models may then be used for pre-operative evaluation, surgical planning, and patient communication. According to one exemplary embodiment, the differences in volume measurements between the actual breast size and the imaged breast size have been confirmed to be less than 3%, which is acceptable for clinical applications in breast reduction surgery.
Further, the present system and method may be used to enhance reverse engineering techniques. According to one exemplary embodiment, where high dimensional accuracy is required, 3D images may be taken and merged according to the above-mentioned methods.
However, there are often very few surface features to aid the alignment of multiple 3D images; the surfaces are all smooth and of similar shape. In such cases the object may be fixed onto a background that has a rich set of features, allowing the free-form alignment program to work properly. The inclusion of dents or surface variations greatly helps the alignment program in finding the corresponding points in the overlapping regions of the 3D images. Once the images of the desired object are aligned properly, the 3D images may be further processed to cut off the background regions and generate a set of cleaned images.
Alternatively, better correspondence can be found if the surface contains more discriminative characteristics. One possible solution to such a situation is to use additional information, such as surface color, to differentiate the surface features. Another solution is to use additional features outside the object to serve as alignment “bridge points”.
The integration module of the 3D Mosaic prototype software is then used to fuse the 3D images together. Additionally, the 3D model compression program may be used to obtain 3D models with 50K, 25K, 10K and 5K triangles.
It should be understood that various alternatives to the embodiments of the present exemplary system and method described herein may be employed in practicing the present exemplary system and method. It is intended that the following claims define the scope of the invention and that the system and method within the scope of these claims and their equivalents be covered thereby.
The present application claims priority under 35 U.S.C. § 119(e) from the following previously-filed Provisional Patent Application, U.S. Application No. 60/514,150, filed Oct. 23, 2003 by Geng, entitled “Method and Apparatus for Three-Dimensional Modeling Via an Image Mosaic System” which is incorporated herein by reference in its entirety.