The invention relates generally to 3-dimensional (3-D) imaging and more particularly relates to methods for incorporating textural information into a 3-D representation of the human face to form a 3-D facial model.
Orthodontic procedures and orthognathic surgery seek to correct dentofacial conditions including structural asymmetry, aesthetic shortcomings, and alignment and other functional problems that relate to the shape of the patient's face and jaws. One tool that can be of particular value for practitioners skilled in orthodontics and related fields is photorealistic modeling. Given a facial model displayed as an accurate volume rendition of the patient's head, showing the structure as well as the overall surface appearance or texture of the patient's face, the practitioner can more readily visualize and plan a treatment procedure whose results are both effective and aesthetically pleasing.
Generating a volume image that provides a suitable visualization of the human face for corrective procedures relating to teeth, jaws, and related dentition uses two different types of imaging. A volume image that shows the shape and dimensions of the head and jaws structure is obtained using computed tomography (CT), such as cone-beam computed tomography (CBCT), or other volume imaging method, including magnetic resonance imaging (MRI) or magnetic resonance tomography (MRT). The volume image, however, has no color or perceptible textural content and would not, by itself, be of much value for showing simulated results to a patient or other non-practitioner, for example. To provide useful visualization that incorporates the outer, textural surface of the human face, a camera is used to obtain reflectance or “white light” images. The color and texture information from the camera images is then correlated with volume image information in order to provide an accurate rendition usable by the orthodontics practitioner.
Solutions that have been proposed for addressing this problem include methods that provide at least some level of color and texture information that can be correlated with volume image data from CBCT or other scanned image sources. These conventional solutions include so-called range-scanning methods.
Reference is made to U.S. Patent Application Publication No. 2012/0300895 entitled “DENTAL IMAGING APPARATUS” by Koivisto et al. that combines texture information from reflectance images along with surface contour data from a laser scan.
Reference is made to U.S. Patent Application Publication No. 2013/0163718 entitled “DENTAL X-RAY DEVICE WITH IMAGING UNIT FOR SURFACE DETECTION AND METHOD FOR GENERATING A RADIOGRAPH OF A PATIENT” by Lindenberg et al. that describes using a masking edge for scanning to obtain contour and color texture information for combination with x-ray data.
The '0895 Koivisto et al. and '3718 Lindenberg et al. patent applications describe systems that can merge volume image data from CBCT or other scanned image sources with 3-D surface data that is obtained from 3-D range-scanning devices. The range-scanning devices can provide some amount of contour data as well as color texture information. However, the solutions that are described in these references can be relatively complex and costly. Requirements for additional hardware or other specialized equipment with this type of approach add cost and complexity and are not desirable for the practitioner.
A dental imaging system from Dolphin Imaging Software (Chatsworth, Calif.) provides features such as a 2-D facial wrap for forming a texture map on the facial surface of a 3-D image from a CBCT, CT or MRI scan.
Reference is made to a paper by Iwakiri, Yorioka, and Kaneko entitled “Fast Texture Mapping of Photographs on a 3D Facial Model” in Image and Vision Computing NZ, November 2003, pp. 390-395.
Both the Dolphin software and the Iwakiri et al. method map 2-D image content to 3-D CBCT volume image data. While such systems may have achieved certain degrees of success in particular applications, there is room for improvement. For example, the Dolphin software user, working with a mouse, touch screen, or other pointing device, must accurately align and re-position the 2-D content with respect to the 3-D content that appears on the display screen. Furthermore, imprecise registration of the 2-D texture data to the 3-D volume data compromises the appearance of the combined result.
Thus, there is a need for an apparatus and method for generating a volume image that accurately represents textural features.
An object of the present disclosure is to advance the art of volume imaging, particularly for orthodontic patients.
Another object of the present disclosure is to provide a system that does not require elaborate, specialized hardware for providing a 3-D model of a patient's head. Advantageously, methods disclosed herein can be executed using existing CBCT hardware, providing accurate mapping of facial texture information to volume 3-D data.
These objects are given only by way of illustrative example, and such objects may be exemplary of one or more embodiments of the invention. Other desirable objectives and advantages inherently achieved by the disclosed invention may occur or become apparent to those skilled in the art. The invention is defined by the appended claims.
According to one aspect of the invention, there is provided a method for forming a 3-D facial model, the method executed at least in part on a computer and comprising: obtaining a reconstructed computed tomography image volume of at least a portion of the head of a patient; extracting a soft tissue surface of the patient's face from the reconstructed computed tomography image volume and forming a dense point cloud corresponding to the extracted soft tissue surface; acquiring a plurality of reflection images of the face, wherein each reflection image in the plurality has a different corresponding camera angle with respect to the patient; calculating calibration data for the camera for each of the reflection images; forming a sparse point cloud corresponding to the reflection images according to a multi-view geometry; automatically registering the sparse point cloud to the dense point cloud; mapping texture data from the reflection images to the dense point cloud; and displaying the texture-mapped volume image.
The foregoing and other objects, features, and advantages of the invention will be apparent from the following more particular description of the embodiments of the invention, as illustrated in the accompanying drawings. The elements of the drawings are not necessarily to scale relative to each other.
The following is a detailed description of exemplary embodiments of the application, reference being made to the drawings in which the same reference numerals identify the same elements of structure in each of the several figures.
In the drawings and text that follow, like components are designated with like reference numerals, and similar descriptions concerning components and arrangement or interaction of components already described are omitted. Where they are used, the terms “first”, “second”, and so on, do not necessarily denote any ordinal or priority relation, but are simply used to more clearly distinguish one element from another.
In the context of the present disclosure, the term “volume image” is synonymous with the terms “3-dimensional image” or “3-D image”. Volume images can be obtained from cone-beam computed tomography (CBCT) or fan-beam CT, as well as from other volume imaging modalities such as magnetic resonance imaging (MRI).
For the image processing steps described herein, the terms “pixels” for picture image data elements, conventionally used with respect to 2-D imaging and image display, and “voxels” for volume image data elements, often used with respect to 3-D imaging, can be used interchangeably. It should be noted that the 3-D volume image is itself synthesized from image data obtained as pixels on a 2-D sensor array and displays as a 2-D image from some angle of view. Thus, 2-D image processing and image analysis techniques can be applied to the 3-D volume image data. In the description that follows, techniques described as operating upon pixels may alternately be described as operating upon the 3-D voxel data that is stored and represented in the form of 2-D pixel data for display. In the same way, techniques that operate upon voxel data can also be described as operating upon pixels.
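By way of illustration only, the following Python fragment (not part of the disclosure; the array shapes are placeholder values) shows this interchangeability in practice, applying a standard 2-D image processing operation directly to slices of a 3-D voxel array:

```python
import numpy as np
from scipy import ndimage

# A reconstructed volume is a 3-D array of voxels; each axial slice of it
# is an ordinary 2-D array of pixels. Random data stands in here for a
# real CBCT reconstruction.
volume = np.random.rand(256, 256, 256).astype(np.float32)

# Apply a standard 2-D smoothing filter slice by slice: a technique framed
# in terms of pixels operates directly on the voxel data.
smoothed = np.stack([ndimage.gaussian_filter(volume[z], sigma=1.5)
                     for z in range(volume.shape[0])])
```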
In the context of the present disclosure, the noun “projection” may be used to mean “projection image”, referring to the 2-D radiographic image that is captured and used to reconstruct the CBCT volume image, for example.
The term “set”, as used herein, refers to a non-empty set, as the concept of a collection of elements or members of a set is widely understood in elementary mathematics. The term “subset”, unless otherwise explicitly stated, is used herein to refer to a non-empty subset, that is, to a subset of the larger set having one or more members. For a set S, a subset may comprise the complete set S. A “proper subset” of set S, however, is strictly contained in set S and excludes at least one member of set S.
As used herein, the term “energizable” relates to a device or set of components that perform an indicated function upon receiving power and, optionally, upon receiving an enabling signal.
The term “reflectance image” refers to an image or to the corresponding image data that is captured by a camera using reflectance of light, typically visible light. Image texture includes information from the image content on the distribution of color, shadow, surface features, intensities, or other visible image features that relate to a surface, such as facial skin, for example.
Cone-beam computed tomography (CBCT) or cone-beam CT technology offers considerable promise as one type of tool for providing diagnostic quality 3-D volume images. Cone-beam X-ray scanners are used to produce 3-D images of medical and dental patients for the purposes of diagnosis, treatment planning, computer aided surgery, etc. Cone-beam CT systems capture volume data sets by using a high frame rate flat panel digital radiography (DR) detector and an x-ray source, typically both affixed to a gantry or other transport, that revolve about the subject to be imaged. The CT system directs, from various points along its orbit around the subject, a divergent cone beam of x-rays through the subject and to the detector. The CBCT system captures projection images throughout the source-detector orbit, for example, with one 2-D projection image at every degree increment of rotation. The projections are then reconstructed into a 3-D volume image using various techniques. Among the most common methods for reconstructing the 3-D volume image from 2-D projections are filtered back projection (FBP) and Feldkamp-Davis-Kress (FDK) approaches.
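By way of illustration only, the following Python sketch shows the filtered back projection principle in a simplified 2-D parallel-beam setting using scikit-image; an actual CBCT system uses cone-beam geometry and FDK-style reconstruction, so this is a conceptual analogue rather than the disclosed implementation:

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale

# Simulate projection capture: one projection per degree of rotation,
# analogous to the one-projection-per-degree CBCT capture described above.
image = rescale(shepp_logan_phantom(), scale=0.5)
angles = np.arange(0.0, 180.0, 1.0)
sinogram = radon(image, theta=angles)

# Filtered back projection recovers the slice from its projections;
# FDK generalizes this idea to the cone-beam case.
reconstruction = iradon(sinogram, theta=angles, filter_name="ramp")
print("mean reconstruction error:", np.abs(reconstruction - image).mean())
```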
Embodiments of the present disclosure use a multi-view imaging technique that obtains 3-D structural information from 2-D images of a subject, taken at different angles about the subject. Processing for multi-view imaging can employ the “structure-from-motion” (SFM) imaging technique, a range imaging method that is familiar to those skilled in the image processing arts. Multi-view imaging and some applicable structure-from-motion techniques are described, for example, in U.S. Patent Application Publication No. 2012/0242794 entitled “Producing 3D images from captured 2D video” by Park et al., incorporated herein in its entirety by reference.
The reflectance images and the corresponding camera calibration data are then used for relating the 2-D image content to the 3-D volume data. Identifying feature points 36 and 72 helps to provide the needed registration between 2-D and 3-D image data in a subsequent registration step S150. At the conclusion of this processing, the texture-mapped volume image can be displayed.
In more detail, a soft tissue surface of the patient's face is first extracted from the reconstructed computed tomography image volume, and a dense point cloud corresponding to this extracted surface is formed.
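The disclosure does not tie the surface extraction to a particular algorithm; as one common possibility, offered here only as an illustrative assumption, an isosurface method such as marching cubes can recover the skin surface from the volume. In the Python sketch below, the iso-level threshold is an assumed, scanner-dependent value:

```python
import numpy as np
from skimage.measure import marching_cubes

def dense_cloud_from_volume(volume, iso_level, spacing=(0.5, 0.5, 0.5)):
    """Extract the air/soft-tissue isosurface from a reconstructed CT
    volume and return its vertices as a dense point cloud.

    iso_level is the intensity separating air from soft tissue; the right
    value depends on the scanner's calibration. spacing is the voxel pitch
    in mm (0.5 mm isotropic, matching the example in the text).
    """
    verts, faces, normals, values = marching_cubes(
        volume, level=iso_level, spacing=spacing)
    return verts  # (N, 3) array of surface points in mm
```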
The reflectance images then provide a second point cloud for the face surface of the patient. In an exemplary sparse point cloud generation step S180, the reflectance images obtained in reflectance image capture step S132 are used to generate another point cloud, termed a sparse point cloud, with relatively fewer surface points defined when compared to the dense point cloud for the same surface. In the context of the present disclosure, for a given surface such as a face, a sparse point cloud for that surface has fewer point spatial locations than does a dense point cloud that was obtained from a volume image. Typically, though not necessarily, the dense point cloud has significantly more points than does the sparse point cloud. Both point clouds are spatially defined and constrained by the overall volume and shape associated with the facial surface of the patient. The actual point cloud density for the dense point cloud depends, at least in part, on the overall resolution of the 3-D volume image. Thus, for example, where the isotropic resolution for a volume image is 0.5 mm, the corresponding resolution of the dense point cloud is constrained so that points in the dense point cloud are no closer than 0.5 mm apart. In typical practice, the point cloud that is generated for the same subject from a succession of 2-D images using structure-from-motion or related multi-view geometry techniques is sparse by comparison with the point cloud generated using volume imaging.
To generate the sparse point cloud in step S180, the system applies multi-view geometry methods to the reflectance images 50 acquired in step S132.
Structure from motion (SFM) is a range imaging technique known to those skilled in the image processing arts, particularly with respect to computer vision and visual perception. SFM estimates three-dimensional structure from two-dimensional image sequences, which may be coupled with local motion signals. In biological vision theory, SFM relates to the phenomenon by which a human viewer can perceive and reconstruct depth and 3-D structure from the projected 2-D (retinal) motion field of a moving object or scene. According to an embodiment of the present invention, the sparse point cloud 70 can be recovered from a number of reflectance images 50 obtained in step S132.
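By way of illustration only, the following Python sketch shows a two-view instance of this multi-view geometry using OpenCV: feature points are matched across two calibrated reflectance images, relative camera pose is recovered from the essential matrix, and matched points are triangulated into 3-D. A complete SFM pipeline extends this incrementally over all views and refines the result with bundle adjustment; this sketch is not the specific processing of the disclosure.

```python
import cv2
import numpy as np

def two_view_sparse_points(img1, img2, K):
    """Triangulate a sparse 3-D point set from two 8-bit grayscale
    reflectance images taken from different camera angles. K is the 3x3
    intrinsic matrix obtained from camera calibration."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match feature points between the two views.
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Estimate relative camera motion from the essential matrix.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # Triangulate matched points into 3-D: the sparse point cloud.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (pts4d[:3] / pts4d[3]).T  # (N, 3) sparse surface points
```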
References to Structure-from-motion (SFM) image processing techniques include U.S. Patent Application Publication No. 2013/0265387 A1 entitled “Opt-Keyframe Reconstruction for Robust Video-Based Structure from Motion” by Hailin Jin.
References to 2-D to 3-D image alignment include U.S. Patent Application Publication No. 2008/0310757 entitled “System and Related Methods for Automatically Aligning 2D Images of a Scene to a 3D Model of the Scene” to Wolberg et al.
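For the automatic registration of the sparse point cloud to the dense point cloud, one standard possibility, offered here only as an illustrative assumption rather than as the disclosed method, is rigid iterative closest point (ICP) alignment, for example with the Open3D library. Because a structure-from-motion reconstruction is recovered only up to scale, a scaled variant (TransformationEstimationPointToPoint(with_scaling=True)) or an initialization from identified feature points, such as feature points 36 and 72, may be needed in practice:

```python
import numpy as np
import open3d as o3d

def register_sparse_to_dense(sparse_pts, dense_pts, max_dist=2.0):
    """Rigidly align the sparse (reflectance-derived) cloud to the dense
    (CT-derived) cloud with point-to-point ICP. max_dist is the
    correspondence cutoff in mm (an assumed tolerance)."""
    source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(sparse_pts))
    target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(dense_pts))
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # 4x4 rigid transform
```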
According to one embodiment of the present disclosure, texture mapping step S160 can proceed as follows:
Generation of a polygon model from a point cloud is known to those skilled in the imaging arts. One type of polygon model generation is described, for example, in U.S. Pat. No. 8,207,964 entitled “Methods and apparatus for generating three-dimensional image data models” to Meadow et al. More generally, polygons are generated by connecting nearest-neighbor points within the point cloud as vertices, forming contiguous polygons of three or more sides that, taken together, define the skin surface of the patient's face. Polygon model generation provides interconnection of vertices, as described in U.S. Pat. No. 6,975,750 to Han et al., entitled “System and method for face recognition using synthesized training images.” Mapping of the texture information to the polygon model from the reflectance images forms the texture-mapped volume image.
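As a simplified illustration of this mapping, the following Python sketch assigns colors to mesh vertices by projecting them into a single calibrated reflectance image under a pinhole camera model; a practical implementation would blend several views and account for occlusion, so this is a sketch of the principle rather than the disclosed processing:

```python
import numpy as np

def vertex_colors_from_view(vertices, image, K, R, t):
    """Color each mesh vertex by projecting it into one calibrated
    reflectance image. K is the 3x3 intrinsic matrix; R, t give the
    camera pose. Occlusion and multi-view blending are ignored here."""
    cam = R @ vertices.T + t.reshape(3, 1)       # world -> camera coords
    uv = K @ cam                                  # project with intrinsics
    u = (uv[0] / uv[2]).round().astype(int)       # pixel columns
    v = (uv[1] / uv[2]).round().astype(int)       # pixel rows
    h, w = image.shape[:2]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (cam[2] > 0)
    colors = np.zeros((len(vertices), 3), dtype=image.dtype)
    colors[inside] = image[v[inside], u[inside]]  # sample the texture
    return colors
```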
In displaying the texture-mapped volume image, an optional measure of transparency can be provided for the texture components, to allow improved visibility of internal structures, such as jaws, teeth, and other dentition elements.
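As one minimal illustration, assuming a VTK-based renderer (the disclosure does not name a rendering toolkit), such a transparency measure can be a single opacity setting on the actor that holds the textured surface:

```python
import vtk

# Hypothetical actor holding the texture-mapped face mesh; the mapper and
# data wiring are omitted for brevity.
skin_actor = vtk.vtkActor()
skin_actor.GetProperty().SetOpacity(0.4)  # 0.0 = transparent, 1.0 = opaque
```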
An embodiment of the present invention can be integrated into 3-D Visual Treatment Objective (VTO) software, used in orthognathic surgery, for example.
Consistent with one embodiment, the present invention utilizes a computer program with stored instructions that perform on image data accessed from an electronic memory. As can be appreciated by those skilled in the image processing arts, a computer program of an embodiment of the present invention can be utilized by a suitable, general-purpose computer system, such as a personal computer or workstation. However, many other types of computer systems can be used to execute the computer program of the present invention, including networked processors. The computer program for performing the method of the present invention may be stored in a computer-readable storage medium. This medium may comprise, for example: magnetic storage media, such as a magnetic disk (for example, a hard drive or removable device) or magnetic tape; optical storage media, such as an optical disc, optical tape, or machine-readable bar code; solid-state electronic storage devices, such as random access memory (RAM) or read-only memory (ROM); or any other physical device or medium employed to store a computer program. The computer program for performing the method of the present invention may also be stored on a computer-readable storage medium that is connected to the image processor by way of the internet or other communication medium. Those skilled in the art will readily recognize that the equivalent of such a computer program product may also be constructed in hardware.
It should be noted that the term “memory”, equivalent to “computer-accessible memory” in the context of the present disclosure, can refer to any type of temporary or more enduring data storage workspace used for storing and operating upon image data and accessible to a computer system. The memory could be non-volatile, using, for example, a long-term storage medium such as magnetic or optical storage. Alternately, the memory could be of a more volatile nature, using an electronic circuit, such as random-access memory (RAM) that is used as a temporary buffer or workspace by a microprocessor or other control logic processor device. Displaying an image requires memory storage. Display data, for example, is typically stored in a temporary storage buffer that is directly associated with a display device and is periodically refreshed as needed in order to provide displayed data. This temporary storage buffer can also be considered to be a memory, as the term is used in the present disclosure. Memory is also used as the data workspace for executing and storing intermediate and final results of calculations and other processing. Computer-accessible memory can be volatile, non-volatile, or a hybrid combination of volatile and non-volatile types.
It will be understood that the computer program product of the present invention may make use of various image manipulation algorithms and processes that are well known. It will be further understood that the computer program product embodiment of the present invention may embody algorithms and processes not specifically shown or described herein that are useful for implementation. Such algorithms and processes may include conventional utilities that are within the ordinary skill of the image processing arts. Additional aspects of such algorithms and systems, and hardware and/or software for producing and otherwise processing the images or co-operating with the computer program product of the present invention, are not specifically shown or described herein and may be selected from such algorithms, systems, hardware, components and elements known in the art.
In one exemplary embodiment, a method for forming a 3-D facial model can be executed at least in part on a computer and can include obtaining a reconstructed computed tomography image volume of at least a portion of the head of a patient; extracting a soft tissue surface of the patient's face from the reconstructed computed tomography image volume and forming a dense point cloud corresponding to the extracted soft tissue surface; acquiring a plurality of reflection images of the face, wherein each reflection image in the plurality has a different corresponding camera angle with respect to the patient; calculating calibration data for the camera for each of the reflection images; forming a sparse point cloud corresponding to the reflection images according to a multi-view geometry; automatically registering the sparse point cloud to the dense point cloud; mapping texture data from the reflection images to the dense point cloud; and displaying the texture-mapped volume image.
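Purely as an illustrative sketch of how these enumerated steps could fit together, the following Python driver names one hypothetical helper per step; none of these helper names comes from the disclosure, and two of them (dense_cloud_from_volume, register_sparse_to_dense) echo the earlier sketches:

```python
def build_textured_facial_model(volume, reflectance_images, calibrations):
    """Hypothetical end-to-end driver; every helper below is an assumed
    stand-in for one enumerated step, not an API from the disclosure."""
    # Extract the soft tissue surface and form the dense point cloud
    # (the iso-level is a scanner-dependent, assumed threshold).
    dense_cloud = dense_cloud_from_volume(volume, iso_level=-500)
    # Form the sparse point cloud from the reflectance images of step S132
    # via multi-view geometry (step S180).
    sparse_cloud = sparse_cloud_from_views(reflectance_images, calibrations)
    # Automatically register the sparse cloud to the dense cloud (step S150).
    transform = register_sparse_to_dense(sparse_cloud, dense_cloud)
    # Build the polygon model and map texture data onto it (step S160).
    mesh = polygon_model_from_cloud(dense_cloud)
    textured = map_texture(mesh, reflectance_images, calibrations, transform)
    # Display the texture-mapped volume image.
    display_model(textured)
    return textured
```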
While the invention has been illustrated with respect to one or more implementations, alterations and/or modifications can be made to the illustrated examples without departing from the spirit and scope of the appended claims. In addition, while a particular feature of the invention may have been disclosed with respect to only one of several implementations, such feature can be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular function. The term “at least one of” is used to mean that one or more of the listed items can be selected. The term “about” indicates that the value listed can be somewhat altered, as long as the alteration does not result in nonconformance of the process or structure to the illustrated embodiment. Finally, “exemplary” indicates that the description is used as an example, rather than implying that it is an ideal. Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The presently disclosed embodiments are therefore considered in all respects to be illustrative and not restrictive. The scope of the invention is indicated by the appended claims, and all changes that come within the meaning and range of equivalents thereof are intended to be embraced therein.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/CN2014/083989 | 8/8/2014 | WO | 00