System and Method for Aligning 3-D Imagery of a Patient's Oral Cavity in an Extended Reality (XR) Environment

BACKGROUND
Field

This invention relates generally to medical diagnostics, and more specifically to an automated system and method for the alignment of volumetric and surface scan images with an extended reality (XR) video feed.

Related Art

Modern image generation systems play an important role in disease detection and treatment planning. Few existing systems and methods were discussed as follows. One common method utilized is dental radiography, which provides dental radiographic images that enable the dental professional to identify many conditions that may otherwise go undetected and to see conditions that cannot be identified clinically. Another technology is cone beam computed tomography (CBCT) that allows to the view of structures in the oral-maxillofacial complex in three dimensions. Hence, cone beam computed tomography technology is most desired over dental radiography.

However, CBCT includes one or more limitations, such as time consumption and complexity for personnel to become fully acquainted with the imaging software and correctly using digital imaging and communications in medicine (DICOM) data. American Dental Association (ADA) also suggests that the CBCT image should be evaluated by a dentist with appropriate training and education in CBCT interpretation. Further, many dental professionals who incorporate this technology into their practices have not had the training required to interpret data on anatomic areas beyond the maxilla and the mandible. To address the foregoing issues, deep learning has been applied to various medical imaging problems to interpret the generated images, but its use remains limited within the field of dental radiography. Further, most applications only work with 2D X-ray images.

Another existing article entitled “Teeth and jaw 3D reconstruction in stomatology”, Proceedings of the International Conference on Medical Information Visualisation—BioMedical Visualisation, pp 23-28, 2007, researchers Krsek et al. describe a method dealing with problems of 3D tissue reconstruction in stomatology. In this process, 3D geometry models of teeth and jaw bones were created based on input (computed tomography) CT image data. The input discrete CT data were segmented by a nearly automatic procedure, with manual correction and verification. Creation of segmented tissue 3D geometry models was based on vectorization of input discrete data extended by smoothing and decimation. The actual segmentation operation was primarily based on selecting a threshold of Hounsfield Unit values. However, this method fails to be sufficiently robust for practical use.

Another existing patent number U.S. Pat. No. 8,849,016, entitled “Panoramic image generation from CBCT dental images” to Shoupu Chen et al. discloses a method for forming a panoramic image from a computed tomography image volume, acquires image data elements for one or more computed tomographic volume images of a subject, identifies a subset of the acquired computed tomographic images that contain one or more features of interest and defines, from the subset of the acquired computed tomographic images, a sub-volume having a curved shape that includes one or more of the contained features of interest. The curved shape is unfolded by defining a set of unfold lines wherein each unfold line extends at least between two curved surfaces of the curved shape sub-volume and re-aligning the image data elements within the curved shape sub-volume according to a re-alignment of the unfold lines. One or more views of the unfolded sub-volume are displayed.

Another existing patent application number US20080232539, entitled “Method for the reconstruction of a panoramic image of an object, and a computed tomography scanner implementing said method” to Alessandro Pasini et al. discloses a method for the reconstruction of a panoramic image of the dental arches of a patient, a computer program product, and a computed tomography scanner implementing said method. The method involves acquiring volumetric tomographic data of the object; extracting, from the volumetric tomographic data, tomographic data corresponding to at least three sections of the object identified by respective mutually parallel planes; determining, on each section extracted, a respective trajectory that a profile of the object follows in an area corresponding to said section; determining a first surface transverse to said planes such as to comprise the trajectories, and generating the panoramic image on the basis of a part of the volumetric tomographic data identified as a function of said surface. However, the above references also fail to address the afore discussed problems regarding the cone beam computed tomography technology and image generation system.

Therefore, there is a need for an automated parsing pipeline system and method for anatomical localization and condition classification. There is a need for training an AI/ML model for performing segmentation of any dental volumetric image for providing dental practitioners with an automated diagnostic tool. Additionally, while individual imaging techniques, such as CBCT, are powerful on their own, when combined, they can provide a more accurate 3D representation of a patient. In practice, volumetric CBCT images are already being merged with surface Intraoral Scans (IOS) to improve planning for computer-guided surgery. However, this superimposition must currently be done manually. One method, for example, involves manually identifying and specifying matching points in both the volumetric images and surface scants. The process of manual alignment is time-consuming. An automated system capable of aligning volumetric images and surface scans would benefit dental practitioners by reducing the time and effort required to align said images prior to use in surgical and clinical applications.

Extended Reality (XR) is an umbrella term encompassing immersive technologies such as Augmented Reality (AR), Virtual Reality (VR), and Mixed Reality (MR). XR technologies blend the physical and digital worlds, creating interactive environments where users can visualize and manipulate digital objects in real-time. In the context of dentistry, XR enables clinicians to engage with 3D models and patient data through headsets, glasses, or other wearable devices, offering an enhanced understanding of complex anatomical structures.

While XR applications exist in medicine and dentistry, there still appears to be a void in current dental practice regarding the seamless integration of Cone Beam Computed Tomography (CBCT) and intra-oral scan (IoS) overlays for XR use in a clinical context. Existing XR applications lack the ability to toggle effortlessly between CBCT and IoS overlays depending on the anatomical structure being examined. For hard tissue landmarks, such as assessing bone density or the position of dental implants, CBCT images provide unparalleled detail and accuracy. Conversely, for soft tissue evaluations, such as checking gum health or detecting lesions, IoS images offer superior visualization. Current XR solutions do not provide the flexibility needed to toggle between these imaging modalities seamlessly, which is essential for ensuring that the most relevant data is available to clinicians at all times. This limitation hinders the precision and efficacy of clinical decisions.

Moreover, the current methods for aligning CBCT and IoS images are labor-intensive and prone to error. Traditional techniques require manual identification and alignment of corresponding points in both sets of images, a process that is time-consuming and susceptible to inaccuracies. The lack of real-time overlay and interactive features in clinical XR applications means that clinicians must still rely on outdated, and non-interactive methods, which detract from their efficiency and increase the risk of human error. Additionally, current XR applications do not fully exploit interactive features that could enhance clinical practice. For instance, the integration of hand-gesture and voice-controlled annotation tools remains underdeveloped. These tools would allow clinicians to annotate images, adjust overlays, and manipulate 3D models in real-time using natural gestures and voice commands. This hands-free interactivity would enable clinicians to maintain focus on the patient, improving both the efficiency of the workflow and the quality of care provided. The absence of such advanced interactive features in existing XR clinical applications represents a significant gap in the current state of the art.

Examples of XR applications in dentistry include virtual treatment simulations, where AR glasses project treatment plans onto the patient's teeth, allowing for precise drilling and implant placement. VR headsets are used to provide immersive training environments for dental students, enabling them to practice procedures in a risk-free setting. XR is also employed for patient education, using AR to show patients visualizations of their dental conditions and proposed treatments, thus improving their understanding and engagement.

While XR applications have made strides in medical and dental fields, significant gaps remain in their ability to integrate and utilize CBCT and IoS images effectively. The current state of the art lacks seamless switching between imaging modalities, automated alignment processes, and advanced interactive tools, all of which are essential for enhancing diagnostic precision and treatment planning. Addressing these deficiencies is crucial for the advancement of XR technology in dental practice, offering substantial improvements in workflow efficiency, accuracy, and overall clinical outcomes.

SUMMARY

A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions. Embodiments disclosed include an automated parsing pipeline system and method for anatomical localization and condition classification.

In an embodiment, the system comprises an input event source, a memory unit in communication with the input event source, a processor in communication with the memory unit, a volumetric image processor in communication with the processor, a voxel parsing engine in communication with the volumetric image processor and a localizing layer in communication with the voxel parsing engine. In one embodiment, the memory unit is a non-transitory storage element storing encoded information. In one embodiment, at least one volumetric image data is received from the input event source by the volumetric image processor. In one embodiment, the input event source is a radio-image gathering source.

The processor is configured to parse the at least one received volumetric image data into at least a single image frame field of view by the volumetric image processor. The processor is further configured to localize anatomical structures residing in the at least single field of view by assigning each voxel a distinct anatomical structure by the voxel parsing engine. In one embodiment, the single image frame field of view is pre-processed for localization, which involves rescaling using linear interpolation. The pre-processing involves use of any one of a normalization schemes to account for variations in image value intensity depending on at least one of an input or output of volumetric image. In one embodiment, localization is achieved using a V-Net-based fully convolutional neural network.

The processor is further configured to select all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer. The bounding rectangle extends by at least 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth and surrounding context. In one embodiment, the automated parsing pipeline system further comprises a detection module. The processor is configured to detect or classify the conditions for each defined anatomical structure within the cropped image by a detection module or classification layer. In one embodiment, the classification is achieved using a DenseNet 3-D convolutional neural network.

In another embodiment, an automated parsing pipeline method for anatomical localization and condition classification is disclosed. At one step, at least one volumetric image data is received from an input event source by a volumetric image processor. At another step, the received volumetric image data is parsed into at least a single image frame field of view by the volumetric image processor. At another step, the single image frame field of view is pre-processed by controlling image intensity value by the volumetric image processor. At another step, the anatomical structure residing in the single pre-processed field of view is localized by assigning each voxel a distinct anatomical structure ID by the voxel parsing engine. At another step, all voxels belonging to the localized anatomical structure is assigned a distinct identifier and segmentation is based on a distribution approach. Optionally, a segmented polygonal mesh may be generated from the distribution-based segmentation. Further optionally, the polygonal mesh may be generated from a coarse-to-fine model segmentation of coarse input volumetric images. In other embodiments, may be converted selected by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer. In another embodiment, the method includes a step of, classifying the conditions for each defined anatomical structure within the cropped image by the classification layer.

In another embodiment, the system comprises an input event source, a memory unit in communication with the input event source, a processor in communication with the memory unit, an image processor in communication with the processor, a segmentation layer in communication with the image processor, a mesh layer in communication with the segmentation layer, and an alignment module in communication with both the segmentation layer and mesh layer. In one embodiment, the memory unit is a non-transitory storage element storing encoded information. In one embodiment, at least one volumetric image datum and at least one surface scan datum are received from the input event source by the image processor. In one embodiment, the input event source is at least one radio-image gathering source. In one embodiment, the volumetric image is a three-dimensional voxel array of a maxillofacial anatomy of a patient and the surface scan is a polygonal mesh corresponding to the maxillofacial anatomy of the same patient.

The processor is configured to segment both volumetric images and surface scan images into a set of distinct anatomical structures. In one embodiment, the volumetric image is segmented by assigning an anatomical structure identifier to each volumetric image voxel, and the surface scan image segmented by assigning an anatomical structure identifier to each vertex or face of the surface scan's mesh. The volumetric image and the surface scan image have at least one distinct anatomical structure in common.

The processor is further configured to convert both the volumetric image and the surface scan image into point clouds/point sets that can be aligned. In one embodiment, a polygonal mesh is extracted from the volumetric image. Both the original surface scan polygonal mesh and the extracted volumetric image mesh are converted to point clouds. In one embodiment, both the volumetric image and surface scan image are processed by applying a binary erosion on the voxels corresponding to an anatomical structure, producing an eroded mask. The eroded mask is subtracted from a non-eroded mask, revealing voxels on the boundary. A random subset of boundary voxels is selected as a point set by selecting a number of points similar to a number of points on a corresponding structure in a polygonal mesh. Once both the volumetric image and surface scan image are converted to point clouds/point sets, the volumetric image and surface scan image point cloud/point sets are aligned. In one embodiment, alignment is accomplished using point set registration. Alternatively, each of the volumetric and surface scan meshes may be converted into a format featuring coordinates of assigned structures, landmarks, etc. for alignment based on common coordinates/structures, landmarks, etc.

In one aspect of the invention, a method for aligning three-dimensional (3D) imagery of a patient's oral cavity in an extended reality (XR) system is disclosed. This method involves receiving a 3D image comprising at least one of a volumetric image, surface scan, or a photograph with depth information. The volumetric image, obtained from a CBCT scan, comprises a three-dimensional voxel array representing the anatomical structure, while the surface scan involves intra-oral scanning to produce a detailed polygonal mesh or point cloud reflecting the surface contours. Additionally, a real-time display or feed from XR goggles or a prerecorded video (display/feed) is received, providing real-time or prerecorded imagery of the patient's anatomical structure.

The registration and rendering process for aligning CBCT and IOS 3D imagery with an XR display/feed involves several steps and various alignment methods. Alignment can occur between the processed 3D images themselves (CBCT and IOS) and then with the XR display/feed, or directly between each 3D image and the display/feed. Alternatively, alignment can occur between a single processed 3D image, either CBCT or IOS, and the display/feed. Once aligned, the registration process involves obtaining an affine transformation to accurately align the 3D imagery with the display feed. An affine transformation is a linear mapping method that preserves points, straight lines, and planes. It includes rotation, translation, scaling, and shearing components, allowing for a comprehensive adjustment of the 3D images to match the display/feed. Additionally, the camera undergoes calibration to obtain intrinsic and extrinsic parameters. Intrinsic parameters include focal length and principal point, while extrinsic parameters define the camera's position and orientation in space. The method involves detecting a set of anatomically distinct points on both the 3D imagery and the display/feed, ensuring that corresponding points can be matched between the two, maintaining accuracy and consistency in the alignment process. Once the affine transformation and camera parameters are determined, the processed 3D imagery is rendered onto the display/feed. This rendering process involves mapping the 3D coordinates of the imagery to the 2D coordinates of the display/feed. In yet other embodiments, the mapping of the 3D coordinates of the imagery to the 2D coordinates of the display/feed is performed as part of the registration step, while the rendering step performs the step of rendering the mapped coordinates determined by the registration step (alignment between at least one 3D image with the display/feed, affine transformation, camera parameters, and mapping).

In one embodiment, the method of registering (aligning) the 3D imagery onto the display/feed and, or aligning the 3D imagery, comprises applying a fully convolutional U-Net-like architecture. This architecture is used to obtain a probability distribution over the location of every point of interest for every voxel; obtaining a probability distribution of a location of a landmark in form of heatmap; selecting a location of maximum probability as a detection of a landmark; and then filtering said detections by a probability threshold, where probability is taken as a probability of that landmark on that voxel, and threshold is selected by optimizing detection metrics (such as precision and recall) on validation set.

In another embodiment, the method includes detecting corresponding points by applying a weighted point-set alignment approach. This approach assigns different weights to soft tissue and hard tissue points, optimizing the alignment process based on the clinical context. Soft tissue points, such as facial landmarks, are given less weight due to their potential for movement and deformation, while hard tissue points, like teeth, are given more weight due to their stability and precision. This weighted alignment ensures that the most stable and reliable points are used for accurate alignment.

In one embodiment, the method further comprises converting the mesh from the volumetric image and the mesh or points from the surface scan to a point cloud. Once converted, the volumetric point cloud and the surface scan point cloud are aligned using point set registration techniques. This conversion and alignment process allows for the precise integration of volumetric and surface scan data, ensuring that both types of imagery are accurately overlaid onto the video feed. This alignment is crucial for providing clinicians with a comprehensive view of the patient's anatomy, combining the detailed internal structures captured by CBCT with the precise surface details from IoS.

The system also includes a user interface module that allows clinicians to manually select and mark anatomical landmarks on the volumetric image or surface scan. Additionally, this module enables clinicians to interact with the rendered video feed using hand gestures or voice commands to make annotations, adjust overlays, and manipulate 3D models in real-time. This hands-free interactivity allows clinicians to maintain focus on the patient, improving workflow efficiency and the quality of care provided.

In another embodiment, the real-time video processing module displays or streams video data from XR goggles to a processing unit that integrates the display/feed with the 3D objects.

The prerecorded video processing module accesses and processes stored video files captured during prior clinical sessions or procedures. This flexibility allows the system to be used both during live procedures and for retrospective analysis and planning.

Significant gaps in the current state of XR applications in dental practice are bridged by providing a method and system for seamless integration and alignment of CBCT and IoS images with an XR display/feed. The ability to toggle between different imaging modalities, automated alignment processes, and advanced interactive tools such as hand-gesture and voice-controlled annotations enhance diagnostic precision and treatment planning. By improving workflow efficiency and reducing the potential for human error, this invention offers substantial benefits to dental practitioners and patients alike, advancing the capabilities of XR technology in the field of dentistry.

Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings illustrate the design and utility of embodiments of the present invention, in which similar elements are referred to by common reference numerals. In order to better appreciate the advantages and objects of the embodiments of the present invention, reference should be made to the accompanying drawings that illustrate these embodiments. However, the drawings depict only some embodiments of the invention, and should not be taken as limiting its scope. With this caveat, embodiments of the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1A illustrates in a block diagram, an automated parsing pipeline system for anatomical localization and condition classification, according to an embodiment.

FIG. 1B illustrates in a block diagram, an automated parsing pipeline system for anatomical localization and condition classification, according to another embodiment.

FIG. 2A illustrates in a block diagram, an automated parsing pipeline system for anatomical localization and condition classification according to yet another embodiment.

FIG. 2B illustrates in a block diagram, a processor system according to an embodiment.

FIG. 3A. illustrates in a flow diagram, an automated parsing pipeline method for anatomical localization and condition classification, according to an embodiment.

FIG. 3B illustrates in a flow diagram, an automated parsing pipeline method for anatomical localization and condition classification, according to another embodiment.

FIG. 4 illustrates in a block diagram, the automated parsing pipeline architecture according to an embodiment.

FIG. 5 illustrates in a screenshot, an example of ground truth and predicted masks in an embodiment of the present invention.

FIG. 6A, 6B & 6C illustrates in a screenshot, the extraction of anatomical structure by the localization model of the system in an embodiment of the present invention.

FIG. 7 illustrates in a graph, receiver operating characteristic (ROC) curve of a predicted

tooth condition in an embodiment of the present invention.

FIG. 8 illustrates in a block diagram, the automated segmentation pipeline according to an embodiment.

FIG. 9 illustrates in a block diagram, the automated segmentation pipeline according to an embodiment.

FIG. 10 illustrates in a block diagram, the automated segmentation pipeline according to an embodiment.

FIG. 11 illustrates in a flow diagram, the automated segmentation pipeline according to an embodiment.

FIG. 12A illustrates in a block diagram, the automated alignment pipeline according to an aspect of the invention.

FIG. 12B illustrates in a block diagram, the automated alignment pipeline according to an aspect of the invention.

FIG. 13 illustrates in a graphical process flow diagram, the automated alignment pipeline in accordance with an aspect of the invention.

FIG. 14 illustrates in a method flow diagram, the automated alignment pipeline in accordance with an aspect of the invention.

FIG. 15 illustrates in a method flow diagram, the automated alignment pipeline in accordance with an aspect of the invention.

FIG. 16 illustrates in a process flow diagram, the automated alignment pipeline according to an aspect of the invention.

FIG. 17 illustrates a method flow diagram, the automated fusion pipeline according to an aspect of the invention.

FIG. 18 illustrates a schematic of the integration of extended reality (XR) in a clinical context for overlaying and toggling between 3D object information during treatment.

FIG. 19 illustrates a schematic of the integration of extended reality (XR) in a clinical context for overlaying and toggling between 3D object information during treatment.

FIG. 20A illustrates in a block diagram, the XR alignment pipeline according to an aspect of the invention.

FIG. 20B illustrates in a block diagram, the XR alignment pipeline according to an aspect of the invention.

FIG. 21 illustrates in a graphical process flow diagram, the XR alignment pipeline in accordance with an aspect of the invention.

DETAILED DESCRIPTION

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details.

Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments, but not other embodiments.

The present embodiments disclose for a system and method for an automated and AI-aided alignment of volumetric images and surface scan images for improved dental diagnostics. In addition to the various segmentation/localization techniques for assigning structures to each of the received volumetric and surface scan images-as described previously-the automated alignment pipeline additionally features an alignment layer for aligning the converted meshes/erosion points from each of the image types.

Specific embodiments of the invention will now be described in detail with reference to the accompanying FIGS. 1A-7. In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. In other instances, well-known features have not been described in detail to avoid obscuring the invention. Embodiments disclosed include an automated parsing pipeline system and method for anatomical localization and condition classification.

FIG. 1A illustrates a block diagram 100 of the system comprising an input event source 101, a memory unit 102 in communication with the input event source 101, a processor 103 in communication with the memory unit 102, a volumetric image processor 103a in communication with the processor 103, a voxel parsing engine 104 in communication with the volumetric image processor 103a and a localizing layer 105 in communication with the voxel parsing engine 104. In an embodiment, the memory unit 102 is a non-transitory storage element storing encoded information. The encoded instructions, when implemented by the processor 103, configure the automated pipeline system to localize an anatomical structure and classify the condition of the localized anatomical structure.

In one embodiment, an input data is provided via the input event source 101. In one

embodiment, the input data is a volumetric image data and the input event source 101 is a radio-image gathering source. In one embodiment, the input data is 2D image data. The volumetric image data comprises 3-D pixel array. The volumetric image processor 103a is configured to receive the volumetric image data from the radio-image gathering source. Initially, the volumetric image data is pre-processed, which involves conversion of 3-D pixel array into an array of Hounsfield Unit (HU) radio intensity measurements.

The processor 103 is further configured to parse at least one received volumetric image data 103b into at least a single image frame field of view by the volumetric image processor.

The processor 103 is further configured to localize anatomical structures residing in the single image frame field of view by assigning each voxel a distinct anatomical structure by the voxel parsing engine 104. In one embodiment, the single image frame field of view is pre-processed for localization, which involves rescaling using linear interpolation. The pre-processing involves use of any one of a normalization schemes to account for variations in image value intensity depending on at least one of an input or output of volumetric image. In one embodiment, localization is achieved using a V-Net-based fully convolutional neural network. In one embodiment, the V-Net is a 3D generalization of UNet.

The processor 103 is further configured to select all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer. The bounding rectangle extends by at least 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth and surrounding context.

FIG. 1B illustrates in a block diagram 110, an automated parsing pipeline system for anatomical localization and condition classification, according to another embodiment. The automated parsing pipeline system further comprises a detection module 106. The processor 103 is configured to detect or classify the conditions for each defined anatomical structure within the cropped image by a detection module or classification layer 106. In one embodiment, the classification is achieved using a DenseNet 3-D convolutional neural network.

In one embodiment, the localization layer 105 includes 33 class semantic segmentation in 3D. In one embodiment, the system is configured to classify each voxel as one of 32 teeth or background and resulting segmentation assigns each voxel to one of 33 classes. In another embodiment, the system is configured to classify each voxel as either tooth or other anatomical structure of interest. In case of localizing only teeth, the classification includes, but not limited to, 2 classes. Then individual instances of every class (teeth) could be split, e.g. by separately predicting a boundary between them. In some embodiments, the anatomical structure being localized, includes, but not limited to, teeth, upper and lower jaw bone, sinuses, lower jaw canal and joint.

In one embodiment, the system utilizes a fully-convolutional network. In another embodiment, the system works on downscaled images (typically from 0.1-0.2 mm voxel resolution to 1.0 mm resolution) and grayscale (1-channel) image (say, 1×100×100×100-dimensional tensor). In yet another embodiment, the system outputs a 33-channel image (say, 33×100×100×100-dimensional tensor) that is interpreted as a probability distribution for non-tooth vs. each of 32 possible (for adult human) teeth, for every pixel.

In an alternative embodiment, the system provides 2-class segmentation, which includes labeling or classification, if the localization comprises tooth or not. The system additionally outputs assignment of each tooth voxel to a separate “tooth instance”.

In one embodiment, the system comprises VNet predicting multiple “energy levels”,

which are later used to find boundaries. In another embodiment, a recurrent neural network could be used for step by step prediction of tooth, and keep track of the teeth that were outputted a step before. In yet another embodiment, Mask-RCNN generalized to 3D could be used by the system. In yet another embodiment, the system could take multiple crops from 3D image in original resolution, perform instance segmentation, and then join crops to form mask for all original image. In another embodiment, the system could apply either segmentation or object detection in 2D, to segment axial slices. This would allow to process images in original resolution (albeit in 2D instead of 3D) and then infer 3D shape from 2D segmentation.

In one embodiment, the system could be implemented utilizing descriptor learning in the multitask learning framework i.e., a single network learning to output predictions for multiple dental conditions. This could be achieved by balancing loss between tasks to make sure every class of every task has approximately the same impact on the learning. The loss is balanced by maintaining a running average gradient that network receives from every class*task and normalizing it. Alternatively, descriptor learning could be achieved by teaching networks on batches consisting of data about a single condition (task) and sample examples into these batches in such a way that all classes will have the same number of examples in each batch (which is generally not possible in multitask setup). Further, standard data augmentation could be applied to 3D tooth images to perform scale, crop, rotation, vertical flips. Then, combining all augmentations and final image resize to target dimensions in a single affine transform and apply all at once.

Advantageously, in some embodiment, to accumulate positive cases faster, weak models could be trained and run the model on all of unlabeled data. From resulting predictions, teeth models that give high scores on some rare pathology of interest are selected. Then, the teeth are sent to be labelled by humans or users and added to the dataset (both positive and negative human labels). This allows for a build up of more balanced dataset for rare pathologies.

In some embodiments, the system could use coarse segmentation mask from localizer as an input instead of tooth image. In some embodiments, the descriptor could be trained to output fine segmentation mask from some of the intermediate layers. In some embodiments, the descriptor could be trained to predict tooth number.

As an alternative to multitask learning approach, “one network per condition” could be employed, i.e. models for different conditions are completely separate models that share no parameters. Another alternative is to have a small shared base network and use separate subnetworks connected to this base network, responsible for specific conditions/diagnoses.

FIG. 2A illustrates in a block diagram 200, an automated parsing pipeline system for anatomical localization and condition classification according to yet another embodiment. In an embodiment, the system comprises an input system 204, an output system 202, a memory system or unit 206, a processor system 208, an input/output system 214 and an interface 212. Referring to FIG. 2B, the processor system 208 comprises a volumetric image processor 208a, a voxel parsing engine 208b in communication with the volumetric image processor 208a, a localization layer 208c in communication with the voxel parsing engine 208 and a detection module 208d in communication with the localization module 208c. The processor 208 is configured to receive at least one volumetric image via an input system 202. At least one received volumetric image comprises a 3-D pixel array. The 3-D pixel array is pre-processed to convert into an array of Hounsfield Unit (HU) radio intensity measurements. Then, the processor 208 is configured to parse the received volumetric image data into at least a single image frame field of view by the said volumetric image processor 208a.

The anatomical structures residing in the at least single field of view is localized by assigning each voxel a distinct anatomical structure by the voxel parsing engine 208b.

The processor 208 is configured to select all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer 208c. Then, the conditions for each defined anatomical structure within the cropped image is classified by a detection module or classification layer 208d.

FIG. 3A illustrates in a flow diagram 300, an automated parsing pipeline method for anatomical localization and condition classification, according to an embodiment. At step 301, an input image data is received. In one embodiment, the image data is a volumetric image data. At step 302, the received volumetric image is parsed into at least a single image frame field of view. The parsed volumetric image is pre-processed by controlling image intensity value.

At step 304, a tooth or anatomical structure inside the pre-processed and parsed volumetric image is localized and identified by tooth number. At step 306, the identified tooth and surrounding context within the localized volumetric image are extracted. At step 308, a visual report is reconstructed with localized and defined anatomical structure. In some embodiments, the visual reports include, but not limited to, an endodontic report (with focus on tooth's root/canal system and its treatment state), an implantation report (with focus on the area where the tooth is missing), and a dystopic tooth report for tooth extraction (with focus on the area of dystopic/impacted teeth).

FIG. 3B illustrates in flow diagram 310, an automated parsing pipeline method for anatomical localization and condition classification, according to another embodiment. At step 312, at least one volumetric image data is received from a radio-image gathering source by a volumetric image processor.

At step 314, the received volumetric image data is parsed into at least a single image frame field of view by the volumetric image processor. At least a single image frame field of view is pre-processed by controlling image intensity value by the volumetric image processor. At step 316, an anatomical structure residing in the at least single pre-processed field of view is localized by assigning each voxel a distinct anatomical structure ID by the voxel parsing engine. At step 318, all voxels belonging to the localized anatomical structure are selected by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer. At step 320, a visual report is reconstructed with defined and localized anatomical structure. At step 322, conditions for each defined anatomical structure is classified within the cropped image by the classification layer.

FIG. 4 illustrates in a block diagram 400, the automated parsing pipeline architecture according to an embodiment. According to an embodiment, the system is configured to receive input image data from a plurality of capturing devices, or input event sources 402. A processor 404 including an image processor, a voxel parsing engine and a localization layer. The image processor is configured to parse images into each image frame and preprocess the parsed image. The voxel parsing engine is configured to localize an anatomical structure residing in the at least single pre-processed field of view by assigning each voxel a distinct anatomical structure ID. The localization layer is configured to select all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure. The detection module 406 is configured to detect the condition of the defined anatomical structure. The detected condition could be sent to the cloud/remote server, for automation, to EMR and to proxy health provisioning 408. In another embodiment, detected condition could be sent to controllers 410. The controllers 410 includes reports and updates, dashboard alerts, export option or store option to save, search, print or email and sign-in/verification unit.

Referring to FIG. 5, an example screenshot 500 of tooth localization done by the present system, is illustrated. This figure shows examples of teeth segmentation at axial slices of 3D tensor.

Problem: Formulating the problem of tooth localization as a 33-class semantic segmentation. Therefore, each of the 32 teeth and the background are interpreted as separate classes.

Model: A V-Net-based fully convolutional network is used. V-Net is a 6-level deep, with widths of 32; 64; 128; 256; 512; and1024. The final layer has an output width of 33, interpreted as a softmax distribution over each voxel, assigning it to either the background or one of 32 teeth. Each block contains 3*3*3 convolutions with padding of 1 and stride of 1, followed by ReLU non-linear activations and a dropout with 0:1 rate. Instance normalization before each convolution is used. Batch normalization was not suitable in this case, as long as there is only one example in batch (GPU memory limits); therefore, batch statistics are not determined.

Different architecture modifications were tried during the research stage. For example, an architecture with 64; 64; 128; 128; 256; 256 units per layer leads to the vanishing gradient flow and, thus, no training. On the other hand, reducing architecture layers to the first three (three down and three up) gives a comparable result to the proposed model, though the final loss remains higher.

Loss function: Let R be the ground truth segmentation with voxel values ri (0 or 1 for each class), and P the predicted probabilistic map for each class with voxel values pi. As a loss function we use soft negative multi-class Jaccard similarity, that can be defined as:

$Jaccard Multi class Loss = 1 - \frac{1}{N} \sum_{i = 0}^{N} \frac{p_{i} r_{i} + ϵ}{p_{i} + r_{i} - p_{i} r_{i} + ?}$

$? indicates text missing or illegible when filed$

- where N is the number of classes, which in our case is 32, and ε is a loss function stability coefficient that helps to avoid a numerical issue of dividing by zero. Then the model is trained to convergence using an Adam optimizer with learning rate of 1e-4 and weight decay 1e-8. A batch size of 1 is used due to the large memory requirements of using volumetric data and models. The training is stopped after 200 epochs and the latest checkpoint is used (validation loss does not increase after reaching the convergence plateau).

Results: The localization model is able to achieve a loss value of 0:28 on a test set. The background class loss is 0:0027, which means the model is a capable 2-way “tooth/not a tooth” segmentor. The localization intersection over union (IoU) between the tooth's ground truth volumetric bounding box and the model-predicted bounding box is also defined. In the case where a tooth is missing from ground truth and the model predicted any positive voxels (i.e. the ground truth bounding box is not defined), localization IoU is set to 0. In the case where a tooth is missing from ground truth and the model did not predict any positive voxels for it, localization IoU is set to 1. For a human-interpretable metric, tooth localization accuracy which is a percent of teeth is used that have a localization IoU greater than 0:3 by definition. The relatively low threshold value of 0:3 was decided from the manual observation that even low localization IoU values are enough to approximately localize teeth for the downstream processing. The localization model achieved a value of 0:963 IoU metric on the test set, which, on average, equates to the incorrect localization of 1 of 32 teeth.

Referring to FIGS. 6A-6C, an example screenshot (600A, 600B, 600B) of tooth sub-volume extraction done by the present system, illustrated.

In order to focus the downstream classification model on describing a specific tooth of interest, the tooth and its surroundings is extracted from the original study as a rectangular volumetric region, centered on the tooth. In order to get the coordinates of the tooth, the upstream segmentation mask is used. The predicted volumetric binary mask of each tooth is preprocessed by applying erosion, dilation, and then selecting the largest connected component. A minimum bounding rectangle is found around the predicted volumetric mask. Then, the bounding box is extended by 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth context and to correct possibly weak localizer performance. Finally, a corresponding sub-volume is extracted from the original clipped image, rescale it to 643 and pass it on to the classifier. An example of a sub-volume bounding box is presented in FIGS. 6A-6C.

Referring to FIG. 7, a receiver operating characteristic (ROC) curve 700 of a predicted tooth condition is illustrated.

Model: The classification model has a DenseNet architecture. The only difference between the original and implementation of DenseNet by the present invention is a replacement of the 2D convolution layers with 3D ones. 4 dense blocks of 6 layers is used, with a growth rate of 48, and a compression factor of 0:5. After passing the 643 input through 4 dense blocks followed by down-sampling transitions, the resulting feature map is 548×2×2×2. This feature map is flattened and passed through a final linear layer that outputs 6 logits-each for a type of abnormality.

Loss function: Since tooth conditions are not mutually exclusive, binary cross entropy is used as a loss. To handle class imbalance, weight each condition loss inversely proportional to its frequency (positive rate) in the training set. Suppose that Fi is the frequency of condition i, pi is its predicted probability (sigmoid on output of network) and ti is ground truth. Then: Li=(1=Fi). ti.log pi+Fi. (1−ti).log (1−pi) is the loss function for condition i. The final example loss is taken as an average of the 6 condition losses.

Artificial
Filling

Impacted

crowns
canals
Filling
tooth
Implant
Missing

ROC
0.941
0.95
0.892
0.931
0.979
0.946

AUC

Condition
0.092
0.129
0.215
0.018
0.015
0.145

frequency

Results: The classification model achieved average area under the receiver operating characteristic curve (ROC AUC) of 0:94 across the 6 conditions. Per-condition scores are presented in above table. Receiver operating characteristic (ROC) curves 700 of the 6 predicted conditions are illustrated in FIG. 7.

The automated segmentation pipeline may segment/localize volumetric images by distinct anatomical structure/identifiers based on a distribution approach, versus the bounding box approach described in detail above. In accordance with an exemplary embodiment of the this alternative automated segmentation pipeline, as illustrated by FIG. 8, the memory unit 802 is a non-transitory storage element storing encoded information, when implemented by the processor 803, configure the automated pipeline system to localize/segment an anatomical structure, and optionally, classify the condition of the localized anatomical structure. In one embodiment, an input data (volumetric image) is provided via the input event source 801 (volumetric image gathering source-CBCT, etc.). In one embodiment, the input data is a volumetric image data and the input event source 801 is a radio-image gathering source. In one embodiment, the input data is 2D image data. In another embodiment, the volumetric image data comprises a 3-D pixel array. The volumetric image processor 803a is configured to receive the volumetric image data from the image gathering source-and optionally process or stage for processing the received image for at least one of parsing/segmentation/localization/classification.

The processor 803 is further configured to parse at least one received volumetric image data 803b into at least a single image frame field of view by the volumetric image processor and further configured to localize anatomical structures residing in the single image frame field of view by assigning each voxel a distinct anatomical structure by the voxel parsing engine 804. Optionally, in one embodiment, the single image frame field of view may be pre-processed for segmentation/localization, which involves rescaling using linear interpolation. The pre-processing involves use of any one of a normalization schemes to account for variations in image value intensity depending on at least one of an input or output of volumetric image. In one embodiment, localization/segmentation is achieved using a V-Net-based fully convolutional neural network. In one embodiment, the V-Net is a 3D generalization of UNet.

The processor 803 is further configured to select all voxels belonging to the localized anatomical structure. The processor 803 is configured to parse the received volumetric image data into at least a single image frame field of view by the said volumetric image processor 803ak. The anatomical structures residing in the at least single field of view is localized by assigning each voxel a distinct anatomical structure (identifier) by the voxel parsing engine 803b. The distribution-based approach is an alternative to the minimum bounding box approach detailed in earlier figure descriptions above: selecting all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer. Whether segmented based on distribution or bounding box, the conditions for each defined anatomical structure within the cropped/segmented/mesh-converted image may then be optionally classified by a detection module or classification layer 806.

In a preferred embodiment, the processor is configured for receiving a volumetric image comprising a jaw/tooth structure in terms of voxels; and defining each voxel a distinct anatomical identifier based on a probabilistic distribution for each of an anatomical structure. Apply a computer segmentation model to output probability distribution or discrete assignment of each voxel in the image to one or more classes (probabilistic of discrete segmentation).

In one embodiment, the voxel parsing engine 803b or a localization layer (not shown) may perform 33 class semantic segmentation in 3D for dental volumetric images. In one embodiment, the system is configured to classify each voxel as one of 32 teeth or background and the resulting segmentation assigns each voxel to one of 33 classes. In another embodiment, the system is configured to classify each voxel as either tooth or other anatomical structure of interest. In the case of localizing only teeth, the classification includes, but not limited to, 2 classes. Then individual instances of every class (teeth) could be split, e.g., by separately predicting a boundary between them. In some embodiments, the anatomical structure being localized, includes, but not limited to, teeth, upper and lower jaw bone, sinuses, lower jaw canal and joint.

For example, each tooth in a human may have a distinct number based on its anatomy, order (1-8), and quadrant (upper, lower, left, right). Additionally, any number of dental features (maxilla, mandible, mandibular canal, sinuses, airways, outer contour of soft tissue, etc.) constitute a distinct anatomical structure that can be unambiguously coded by a number.

In one embodiment, a model of a probability distribution over anatomical structures via semantic segmentation may be performed: using a standard fully-convolutional network, such as VNet or 3D UNet, to transform I×H×W×D tensor of input image with I color channels per voxel, to H×W×D×C tensor defining class probabilities per voxel, where C is the number of possible classes (anatomical structures). In the case where classes do not overlap, this could be converted to probabilities via applying a softmax activation along the C dimension. In case of a class overlap, a sigmoid activation function may be applied to each class in C independently.

Alternatively, an instance or panoptic segmentation may be applied to potentially identify several distinct instances of a single class. This works both for cases where there is no semantic ordering of classes (as in case 1, which can be alternatively modeled by semantic segmentation), and for cases where there is no natural semantic ordering of classes, such as in segmenting multiple caries lesions on a tooth.

Instance or Panoptic segmentation could be achieved, for example, by using a fully-convolutional network to obtain several outputs tensors:

- S: H×W×D×C semantic segmentation output
- C: H×W×D×1 centerness output, which defines probability that a voxel is a center of a distinct instance of a class, which is defined by S
- O: H×W×D×3 offset output, which for each voxel defined an offset to point to a centroid predicted by C
  
  S output gets converted to a probability distribution over classes for each voxel by applying a Softmax activation function. Argmax over S gives the discrete classes assignment.
  
  C output gets converted to a centroid instances by:
- Applying a sigmoid to get a probability of instance at this voxel
- Applying of some threshold to reject definite negatives (we used 0.1)
- Applying Non-Maximum-Suppression (NMS)-like procedure of keeping only voxels that have higher probability than their neighbours (each voxel have 3×3×3−1=26 neighbours)
- Centroids are assigned class and also filtered by a semantic classification from S.
- Remaining positive voxels are recorded by their 3D coordinate as instance centroids.
  
  O output assigns each voxel to a centroid by:
- Filtering only non-background voxels from S
- Obtaining predicted instance centroid for instance to which this voxel belongs, by taking a sum of a coordinate of the voxel with its predicted offset
- Selecting the centroid from C closest to the predicted location.

After these steps, we obtain an assignment of each voxel to object instance, and assignment of instances to classes. Again, while not shown, the automated segmentation pipeline system may further comprise a detection module. The detection module is configured to detect or classify the conditions for each defined anatomical structure within the cropped image by a detection module or classification layer. In one embodiment, the classification is achieved using a DenseNet 3-D convolutional neural network. In continuing reference to FIG. 8, a mesh layer or module 805 may be configured to convert probabilistic or discrete segmentation to a polygonal mesh for each class by applying a volume-to-mesh conversion algorithm (such as marching cubes, stainer triangulation, flying edges, etc.).

FIGS. 9 and 10 both illustrate an exemplary flow diagram detailing the automatic segmentation flow involving coarse input images into a coarse and fine model. The use of coarse and fine models allow defining large structures on coarse scale and then refining borders for allowing practitioners to detect small objects in fine scale. As FIGS. 9 and 10 illustrate, a volumetric image is uploaded (1.1) to a device, then it is preprocessed (1.2) so that it can be fed to the trained coarse model and to the fine model. To apply the coarse model, one should rescale data to the appropriate step, and do the same for the fine model. Preprocessed data (1.3) is then passed to the coarse model (1.4) and its prediction (1.5) combined with preprocessed raw data (1.3), which is then passed to the fine model (1.6). Predictions of the fine model with minor postprocessing are then rescaled to the input size resulting in the volumetric image with segmented objects on it (1.7). This prediction can be used by a specialist as is, but then optionally, the system may convert the segmentation to a polygonal mesh for each class by applying a volume-to-mesh conversion algorithm (such as marching cubes, stainer triangulation, flying edges, etc.).

The fine model runs in higher resolution than the coarse model, and typically cannot process the image as a whole. Hence, two techniques are proposed to split volumes in sub-images:

Patch-Based Approach

- a. Split the image into a set of overlapping or non-overlapping patches that cover the whole image.
- b. Combine each patch with the corresponding region of the coarse output (hint).
- c. Run the combined image patch with hint through the fine model, obtaining fine output. The fine output per patch is then combined to reconstruct the fine output for the whole image.
- d. In case of overlapping patches the output is averaged on regions of intersection.

Averaging could be done with or without weights, where weights are increasing towards the center of the patch and falling towards its boundary.

Region of Interest Approach (RoI)

- a. Based on the output of the coarse model (segmentation of objects of interest in coarse resolution), select regions corresponding to the objects of interest.
- b. Select input volume regions corresponding to this region.
- c. Select coarse output part corresponding to this region.
- d. Combine input volume RoI part and coarse output RoI part and run them together through the fine model to obtain a fine model output for the object of interest.
- e. Combine multiple fine per-object outputs into a single fine step output corresponding to the whole image.

FIG. 11 represents an illustrative method flow diagram, detailing the steps entailed in automatically segmenting dental volumetric images. At least one volumetric image data is received from an image gathering source and is parsed into at least a single image frame field of view by the volumetric image processor. The received image may optionally be pre-processed by controlling image intensity value by the volumetric image processor. At step 1102, combining a coarse model output with a coarse input image at fine resolution for a coarse output; passing the output through a fine model to generate the probability 1104. Optionally, the probability may then be applied through a mesh layer or module for generating a polygonal mesh with segmentation. Also, optionally (not shown), a visual report may be reconstructed with defined and localized anatomical structure. Also optionally, each defined anatomical structure may be classified in terms of condition/treatment plan by the classification layer/detection module.

Now in reference to FIG. 12A and 12B, which each illustrate in block diagram form, an exemplary system and method for the automated and AI-aided alignment of volumetric images and surface scan images for improved dental diagnostics. FIG. 12A/12B illustrates a block diagram of the system comprising an input event source; a memory unit in communication with the input event source; a processor 1203 in communication with the memory unit; an image processor 1203a in communication with the processor 1203; a localizing layer or segmenting layer 1204 in communication with the mesh module 1205 and alignment module 1206. In an embodiment, the memory unit is a non-transitory storage element storing encoded information. The encoded instructions when implemented by the processor 1203, configure the automated alignment system to segment and align a volumetric image with a surface scan image for improved visual details/diagnostics.

In one embodiment, an input data is provided via the input event source. In one embodiment, the input data is a volumetric image data and/or surface scan image and the input event source is any one of an image gathering source. In one embodiment, the input data is 2D image data. In another embodiment, the volumetric and/or surface scan image data comprises 3-D voxel array. In another embodiment, the volumetric image received from the input source may be a three-dimensional voxel array of a maxillofacial anatomy of a patient and the surface scan image received may be a polygonal mesh corresponding to the maxillofacial anatomy of the same patient. The image processor 1203a is configured to receive the image data from the image gathering source. In one embodiment, the image data is pre-processed, which involves conversion of 3-D pixel array into an array of Hounsfield Unit (HU) radio intensity measurements.

The processor 1203 is further configured to localize/segment anatomical structures residing in the single image frame field of view by assigning each voxel/pixel/face/vertex/vertices a distinct anatomical structure by the segmentation or localization layer 1204. In one embodiment, the single image frame field of view is pre-processed for localization, which involves rescaling using linear interpolation (not shown). The pre-processing 1203b involves use of any one of a normalization schemes to account for variations in image value intensity depending on at least one of an input or output of volumetric image.

In one embodiment, the localization layer 1204 may perform 33 class semantic segmentation in 3D for dental volumetric images. In one embodiment, the system is configured to classify each voxel as one of 32 teeth or background and the resulting segmentation assigns each voxel to one of 33 classes. In another embodiment, the system is configured to classify each voxel as either tooth or other anatomical structure of interest. In the case of localizing only teeth, the classification includes, but not limited to, 2 classes. Then individual instances of every class (teeth) could be split, e.g., by separately predicting a boundary between them. In some embodiments, the anatomical structure being localized, includes, but not limited to, teeth, upper and lower jaw bone, sinuses, lower jaw canal and joint. Segmentation/localization entails, according to a certain embodiment, selecting for all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region.

In continuing reference to FIG. 12A/12B, the segmentation layer 1204 segments the volumetric image and surface scan image into a set of distinct anatomical structures by assigning each voxel in the volumetric image an identifier by structure and assigning each vertex or face of the mesh from the surface scan image an identifier by structure. In one embodiment, only the distinct anatomical structures that are in common between the volumetric and the surface scan image are segmented and processed for downstream mesh alignment. In yet other embodiments, all assigned voxels that designate for a distinct structure are segmented for downstream processing, regardless of commonalities with the segmented surface scan image. In one embodiment, the surface scan assignment is determined by a margin defining the boundary between each crown and gingiva.

Once segmented, a polygonal mesh from the volumetric image featuring common structures with the polygonal mesh from the surface scan image is extracted/generated by the mesh layer 1205. The meshes from both the volumetric image and from the surface scan image are then converted to point clouds; and the converted meshes are then aligned via point clouds using a point set registration by the alignment module 1206. In one embodiment, the surface scan image mesh is extracted or generated from the surface scan image, while in other embodiments, the surface scan mesh is received de novo or directly from the input source for downstream processing. In yet other embodiments, as shown in FIG. 12B, a conversion module 1205a may optionally convert the mesh to a point cloud for downstream alignment by the alignment layer 1206.

Now in reference to FIG. 13, which illustrates a graphical flow of the alignment pipeline, the alignment method entails the steps of: A method for alignment of volumetric and surface scan images, said method comprising the steps of: receiving a volumetric image and surface scan image, wherein the volumetric image is a three-dimensional voxel array of a maxillofacial anatomy of a patient and the surface scan image is a polygonal mesh corresponding to the maxillofacial anatomy of the same patient 1301a, 1301b. Optionally, the received images may be additionally pre-processed and normalized to fit for downstream alignment 1302a, 1302b. The next step entails segmenting the volumetric image and surface scan image into a set of distinct anatomical structures by assigning each voxel in the volumetric image an identifier by structure and assigning each vertex or face of the mesh from the surface scan image an identifier by structure, wherein at least one of the distinct anatomical structures are in common between the volumetric and the surface scan image 1303a, 1303b. The volumetric image may be further segmented by assigning a subset of voxels to the dental crown 1303c; a polygonal mesh featuring common structures with the polygonal mesh from the surface scan image is then extracted from the volumetric image 1304a. A teeth mesh is extracted from the surface scan image 1304b. Both the meshes, from the volumetric image and from the surface scan image, are converted to point clouds and the converted meshes are aligned via point clouds using a point set Registration 1305.

In a preferred embodiment, the mesh extraction is performed by a Marching Cubes algorithm. Alternatively, the extraction of the polygonal mesh is of a polygonal mesh of an isosurface from a three-dimensional discrete scalar field. Other, less conventional extraction techniques may be used as well. Preferred alignment methods, such as Iterative Closest Point or Deformable Mesh Alignment may be performed. Essentially any means for aligning two partially overlapping meshes given initial guess for relative transform, so long as one mesh is derived from a CBCT (volumetric image), and the other from an IOS (surface scan image). Aligned CBCT and IOS is then used for orthodontic treatment and implant planning. CBCT provides knowledge about internal structures: bone, nerves, sinuses and tooth roots, while IOS provides very precise visible structures: gingiva and tooth crowns. Both scans are needed for high-quality digital dentistry.

The implementation essentially consists of the following steps:

- 1. Receive a CBCT (in DICOM format) and an IOS (in STL format) from the user.
- 2. Perform a CBCT image preprocessing: normalize a CBCT image intensity values by clipping the values lying outside the [−1000, 2000] interval and then subtract a mean intensity value and divide by a standard deviation.
- 3. Using a convolutional neural network, perform teeth segmentation on CBCT, assigning each voxel a distinct tooth ID or a background ID.
- 4. Segment the dental crowns of localized teeth by the following procedure. For each localized tooth assign a voxel to this tooth's dental crown if:
  - a. this voxel was assigned to this tooth during localization 1303c AND
  - b. the distance between this voxel and the tooth's highest (lowest) point is not greater than 6 mm for the lower (upper) jaw tooth 1303c.
- 5. Build a dental crown mesh using marching cubes algorithm 1304.
- 6. Perform an Intraoral scan preprocessing: center and rescale the mesh to fit a unit sphere;
- 7. Using a convolutional neural network, perform teeth segmentation on IOS, assigning each voxel to one of the teeth or a background.
- 8. Based on teeth segmentation, extract teeth mesh from IOS.
- 9. Perform an alignment of meshes from p.4 and p.6 using point-set registration algorithms (e.g. Iterative Closest Point).

FIGS. 14 and 15 each illustrate a method flow diagram in accordance with an aspect of the invention. As shown in FIG. 14, the method for alignment of CBCT (DICOM format) and IOS (STL format) images, comprises the steps of: A method for alignment of volumetric and surface scan images, said method comprising the steps of: receiving a volumetric image and surface scan image, wherein the volumetric image is a three-dimensional voxel array of a maxillofacial anatomy of a patient and the surface scan image is a polygonal mesh corresponding to the maxillofacial anatomy of the same patient 1402; segmenting the volumetric image and surface scan image into a set of distinct anatomical structures by assigning each voxel in the volumetric image an identifier by structure and assigning each vertex or face of the mesh from the surface scan image an identifier by structure, wherein at least one of the distinct anatomical structures are in common between the volumetric and the surface scan image 1404; extracting a polygonal mesh from the volumetric image featuring common structures with the polygonal mesh from the surface scan image 1406; converting both meshes from the volumetric image and from the surface scan to a point cloud 1408; and aligning the converted meshes via point clouds using a point set registration 1408.

As shown in FIG. 15, the method may obviate the need to build/generate/extract a mesh from the CBCT or volumetric image for purposes of alignment with the IOS mesh. The method entails the steps of: receiving a volumetric image and surface scan image, wherein the volumetric image is a three-dimensional voxel array of a maxillofacial anatomy of a patient and the surface scan image is a polygonal mesh corresponding to the maxillofacial anatomy of the same patient 1502; segmenting the volumetric image and surface scan image into a set of distinct anatomical structures by assigning each voxel in the volumetric image an identifier by structure and assigning each vertex or face of the mesh from the surface scan image an identifier by structure, wherein at least one of the distinct anatomical structures are in common between the volumetric and the surface scan image 1502; applying a binary erosion on the voxels corresponding to a structure (eroded mask) 1504; subtracting the eroded mask from a non-eroded mask revealing voxels on the boundary for selection 1504; selecting a subset of boundary voxels as a point set by selecting a random subset of points to keep a number of points similar to a number of points on a corresponding structure in a polygonal mesh 1506; and aligning a point set from the selected subset of boundary voxels from the received/segmented volumetric image and surface scan image using a point set registration 1508. In another embodiment, the selection of points on the surface of anatomical structures of the volumetric image is done by convolving a binary segmentation image with an edge-detection convolution kernel.

FIG. 16 illustrates a process flow diagram of an embodiment of the invention providing an alternative method of aligning the volumetric and surface scan images. As described previously, the received volumetric image is a three-dimensional voxel array of the maxillofacial anatomy of a patient and the surface scan image received is a polygonal mesh corresponding to the maxillofacial anatomy of the same patient. The image processor is configured to receive the image data from the image source 1603. Optionally, the image data is pre-processed and normalized to fit for downstream alignment. The next step is the localization of dental anatomical landmarks common to both images present inside the volumetric image and on the surface scan image 1604. Standard dental landmarks include:

Exemplary Dental Anatomical Landmarks

- Fauces—Passageway from oral cavity to pharynx.
- Frenum—Raised folds of tissue that extend from the alveolar and the buccal and labial mucosa.

Gingiva—Mucosal tissue surrounding portions of the maxillary and mandibular teeth and bone.

- Hard palate—Anterior portion of the palate which is formed by the processes of the maxilla.
- Incisive papilla—A tissue projection that covers the incisive foramen on the anterior of the hard palate, just behind the maxillary central incisors.
- Maxillary tuberosity—A bulge of bone posterior to the most posterior maxillary molar. Maxillary/Mandibular tori—Normal bony enlargements that can occur either on the maxilla or mandible.
- Mucosa—Mucous membrane lines the oral cavity. It can be highly keratinized (such as what covers the hard palate), or lightly keratinized (such as what covers the floor of the mouth and the alveolar processes) or thinly keratinized (such as what covers the cheeks and inner surfaces of the lips).
- Palatine rugae—Firm ridges of tissues on the hard palate.
- Parotid papilla—Slight fold of tissue that covers the opening to the parotid gland on the buccal mucosa adjacent to maxillary first molars.
- Pillars of Fauces—Two arches of muscle tissue that defines the fauces.
- Soft palate—Posterior portion of the palate. This is non-bony and is comprised of soft tissue.
- Sublingual folds—Small folds of tissue in the floor of the mouth that cover the openings to the smaller ducts of the sublingual salivary gland.
- Submandibular gland—Located near the inferior border of the mandible in the submandibular fossa.
- Tonsils—Lymphoid tissue located in the oral pharynx.
- Uvula—A non-bony, muscular projection that hangs from the midline at the posterior of the soft palate.
- Vestibule—Space between the maxillary or mandibular teeth, gingiva, cheeks and lips.
- Wharton's duct—Salivary duct opening on either side of the lingual frenum on the ventral surface of the tongue.

Following the localization of landmarks common to both the volumetric and surface scan images, the images are aligned by minimizing the distance between the corresponding landmarks present in both images 1605. Alignment may be performed alternatively between: a polygonal mesh of a volumetric image and a polygonal mesh of a surface scan image; a point set of a volumetric image and a point set of a surface scan image; a mesh of a volumetric image and a point set of a surface scan image; or a point set of a volumetric image and a mesh of a surface scan image.

Alternatively, volumetric images and surface scan images may be combined into a single image via a fusion of tooth meshes. FIG. 17 illustrates a method flow diagram of an aspect of this invention. The method entails receiving both a volumetric image mesh and a surface scan image mesh from the same patient in the same format and registered to the same coordinate system 1702. Next, the parts of the volumetric tooth crown mesh also present on the surface crown mesh are identified and segmented 1704. In one embodiment, this is accomplished by first segmenting and numerating the teeth on the surface scan using a convolutional neural network. Each tooth is then isolated into a separate mesh. In one embodiment, this is accomplished by the following procedure: for each pair of neighboring teeth, border vertices are identified by finding common vertices of two sub-meshes corresponding to the two teeth; a plane is fit on the border vertices using the singular value decomposition (SVD) to obtain a plane, referred to as a separating plane; for each tooth, the separating plain is moved toward the tooth center by a constant offset of 0.5 mm; the vertices where a separating plain and a tooth mesh interest are found; the tooth mesh is sliced with the separating plane; and the resulting hole in the tooth mesh is filled by triangulating the points of intersection. The teeth of the volumetric mesh are then segmented and numerated using a convolutional neural network.

Once both are segmented and numerated, the volumetric tooth mesh and the surface scan tooth mesh are matched by their numbers. For each numbered tooth, the faces of the volumetric tooth mesh also present in the surface scan tooth crown mesh are identified. In one embodiment, this is accomplished by, for each face of the surface scan mesh, identifying the nearest face of the volumetric tooth mesh. Next, each face in the volumetric tooth mesh found to match a face in the surface scan tooth crown mesh is removed from the volumetric tooth mesh 1708. Border vertices on the volumetric and surface scan meshes are identified by finding edges adjacent to a single triangle. The two meshes can then be fused by triangulating the border vertices 1710.

FIG. 18 illustrates a schematic of a dental examination scenario where a patient (1804) is seated in a dental chair (1802), being examined by a clinician (1806) equipped with XR goggles (1808). The clinician is utilizing the extended reality (XR) system to enhance the examination process. The XR goggles are connected to a processing unit that integrates real-time data from the XR goggles with pre-captured imaging modalities, including Cone Beam Computed Tomography (CBCT) and intra-oral scans (IoS).

The patient's mouth is open, and the clinician is focusing on the maxillofacial region. The XR goggles display a view and, or feed of the patient's anatomy directly onto the clinician's field of view. The system uses pre-captured volumetric images from a CBCT scanner and surface scans from an intra-oral scanner, providing a comprehensive view of the patient's dental anatomy. The clinician can toggle between CBCT and IoS overlays depending on the anatomical structure of interest. In one embodiment, the clinician may interact with the XR system using either hand gestures or voice commands for a hands-free and less obstructed patient examination.

In this scenario, the clinician is examining a suspected issue with the patient's molars and adjacent gum tissue. By utilizing the XR system, the clinician can switch from the CBCT overlay, which provides detailed information about the bone structure and the position of the teeth, to the IoS overlay, which offers a clear view of the soft tissue, including the gums and any potential lesions.

FIG. 19 is a detailed view of the clinician's perspective through the XR goggles during the examination. The display shows an augmented reality overlay combining both CBCT and IoS data with the real-time display and, or video feed. In an exemplary scenario, on the left side of the display, the clinician sees the CBCT scan 1902 of the patient's molars, highlighting the detailed bone structure, root positions, and any potential issues such as bone loss or cysts. The right side of the display features the IoS data 1904, providing a high-resolution view of the soft tissue, including gum recession, inflammation, and the presence of any lesions or abnormalities.

The advantage of toggling between imaging modalities is clearly illustrated in this view. The clinician can focus on the hard tissue using the CBCT overlay to assess the condition of the bone and the integrity of the dental implants. By simply using a hand gesture or a voice command, the clinician can switch to the IoS overlay to examine the soft tissue surrounding the implants, ensuring there is no infection or gum disease affecting the implant sites.

The XR system's alignment pipeline ensures that the overlays are precisely aligned with the real-time display and, or feed from the XR goggles. In one embodiment, this alignment is achieved using the fully convolutional U-Net-like architecture to obtain a probability distribution over the location of every point of interest, selecting the location of maximum probability as the detection of a landmark, and filtering detections by a probability threshold. In another embodiment, the weighted point-set alignment approach gives different weights to soft tissue and hard tissue points, optimizing the accuracy based on the clinical context.

During the examination, the clinician identifies an area of concern around the second molar. The CBCT overlay reveals a potential bone defect, suggesting early signs of bone loss. By toggling to the IoS overlay, the clinician can closely examine the gum tissue in the same area, identifying signs of inflammation and recession that may correlate with the bone loss seen in the CBCT scan. This seamless switch between imaging modalities allows for a more comprehensive assessment without the need for multiple, separate examinations.

The ability to toggle between CBCT and IoS overlays provides significant clinical advantages. Enhanced diagnostic accuracy is achieved as the clinician can corroborate findings from the CBCT scan with detailed soft tissue views from the IoS scan, leading to a more accurate diagnosis. Improved treatment planning is facilitated by having immediate access to both hard and soft tissue information, allowing the clinician to develop a more effective treatment plan. For example, if bone loss is detected alongside gum recession, a combined approach addressing both issues can be devised. Efficiency and workflow are enhanced as the real-time overlay and ability to toggle between modalities streamline the examination process, reducing the time needed for separate imaging sessions and increasing the overall efficiency of the clinical workflow. Patient communication is also improved as the clinician can show the patient the augmented reality view through a secondary display, helping the patient understand the diagnosis and proposed treatment plan, thereby improving patient engagement and compliance.

A person of ordinary skill in the art will appreciate that any number of overlay configurations may be possible, in addition to the split-screen configuration disclosed above. For instance, a clinician's view could be presented as a full-screen display of either the CBCT or IoS modality, with the option to effortlessly switch between the two. This switch could be achieved through various input methods, such as voice commands, hand gestures, touch-screen interfaces, or physical buttons (either hard or soft) integrated into the XR goggles or the processing unit. Other possible configurations include a picture-in-picture layout where one modality is displayed as a smaller window within the other, or a transparency slider that allows the clinician to adjust the opacity of the overlaid images, providing a blended view that emphasizes specific anatomical structures as needed. These flexible overlay options enhance the clinician's ability to access and interpret the relevant diagnostic information quickly and accurately, tailored to the clinical scenario.

FIGS. 20A and 20B illustrate an exemplary system block diagram in accordance with an aspect of the invention. Now in reference to FIGS. 20A and 20B, which each illustrate in block diagram form, an exemplary system and method for the automated and AI-aided alignment of volumetric images and surface scan images for XR-enabled diagnostics.

In one embodiment, input data is provided via the input event source. The input data may be volumetric image data and/or surface scan image data, and the input event source can be any image gathering source. In another embodiment, the volumetric and/or surface scan image data comprises a 3-D voxel array. The volumetric image received from the input source may be a three-dimensional voxel array of a maxillofacial anatomy of a patient, while the surface scan image received may be a polygonal mesh corresponding to the maxillofacial anatomy of the same patient. The image processor 2003a is configured to receive the image data from the image gathering source. In one embodiment, the image data is pre-processed, which involves conversion of a 3-D pixel array into an array of Hounsfield Unit (HU) radio intensity measurements.

The processor 2003a is further configured to localize/segment anatomical structures within the single image frame field of view by assigning each voxel/pixel/face/vertex a distinct anatomical structure identifier via the segmentation or localization layer 2004. In one embodiment, the single image frame field of view is pre-processed for localization, which involves rescaling using linear interpolation (not shown). The pre-processing may involve using any normalization scheme to account for variations in image value intensity, depending on the input or output of the volumetric image.

In one embodiment, the localization layer 2004 performs 33 class semantic segmentation in 3D for dental volumetric images. The system may be configured to classify each voxel as one of 32 teeth or background, assigning each voxel to one of 33 classes. Alternatively, the system may classify each voxel as either a tooth or another anatomical structure of interest. For instance, in localizing only teeth, the classification might include 2 classes, then individual instances of each tooth could be split by predicting boundaries between them. Anatomical structures being localized may include, but are not limited to, teeth, upper and lower jaw bones, sinuses, lower jaw canal, and joint. Segmentation/localization involves selecting all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region.

In one embodiment, a model of a probability distribution over anatomical structures via semantic segmentation may be performed using a standard fully-convolutional network, such as VNet or 3D UNet, to transform an I×H×W×D tensor of the input image with I color channels per voxel, to an H×W×D×C tensor defining class probabilities per voxel, where C is the number of possible classes (anatomical structures). This can be converted to probabilities by applying a softmax activation along the C dimension, or a sigmoid activation function may be applied to each class in C independently if classes overlap. Alternatively, instance or panoptic segmentation may identify several distinct instances of a single class, useful for cases with multiple caries lesions on a tooth, for example.

The segmentation layer (2004) segments the volumetric image and surface scan image into distinct anatomical structures by assigning each voxel in the volumetric image an identifier by structure and assigning each vertex or face of the mesh from the surface scan image an identifier by structure. In one embodiment, only the distinct anatomical structures common between the volumetric and surface scan images are segmented and processed for downstream mesh alignment. In other embodiments, all assigned voxels designating a distinct structure are segmented for downstream processing, regardless of commonalities with the segmented surface scan image. The surface scan assignment may be determined by a margin defining the boundary between each crown and gingiva.

Once segmented, a polygonal mesh from the volumetric image featuring common structures with the polygonal mesh from the surface scan image is extracted/generated by the mesh layer 2005. The meshes from both the volumetric image and the surface scan image are then converted to point clouds; the converted meshes are aligned via point clouds using point set registration by the alignment module 2006. In one embodiment, the surface scan image mesh is extracted or generated from the surface scan image, while in other embodiments, the surface scan mesh is received de novo or directly from the input source for downstream aligning/registering/rendering onto the XR display/feed.

In one embodiment, the aligning (2006) of the 3D imagery onto the video display/feed comprises applying a fully convolutional U-Net-like architecture to obtain a probability distribution over the location of every point of interest. The system selects a location of maximum probability as the detection of a landmark, and then filters said detections by a probability threshold.

In another embodiment, the aligning (2006) of the 3D imagery is achieved by applying a weighted point-set alignment approach. This approach gives different weights to soft tissue and hard tissue points, optimizing the alignment process based on the clinical context. Soft tissue points, such as facial landmarks, are given less weight due to their potential for movement and deformation, while hard tissue points, like teeth, are given more weight due to their stability and precision.

In another embodiment, the method further comprises converting the mesh from the volumetric image and the mesh or points from the surface scan to a point cloud 2005a. Once converted, the volumetric point cloud and the surface scan point cloud are aligned using point set registration techniques. This conversion and alignment process allows for the precise integration of volumetric and surface scan data, ensuring that both types of imagery are accurately overlaid onto the real-time video feed from the XR goggles. This alignment is crucial for providing clinicians with a comprehensive view of the patient's anatomy, combining the detailed internal structures captured by CBCT with the precise surface details from IoS.

Further downstream, as depicted in FIG. 20A, a registration layer (2007) maps the aligned 3D imagery onto the display/feed coordinate system. This involves correlating spatial data points from the 3D imagery with corresponding coordinates on the XR display. This mapping ensures that the 3D images are accurately represented within the context of the real-time video feed from the XR goggles. Following the registration, a rendering layer (2008) then renders the composite imagery onto the display/feed. This layer is responsible for visually integrating the aligned 3D images with the live feed, enabling the clinician to toggle seamlessly between different imaging modalities, such as CBCT and IoS, to suit the clinical context.

In another embodiment, as shown in FIG. 20B, the alignment module (2006) is configured not only to align the 3D imagery but also to manage the registration and rendering of the aligned 3D imagery onto the XR display/feed. This integrated approach ensures that the transition from alignment to registration and rendering is smooth and cohesive. In some embodiments, the alignment layer may simultaneously or successively align both the 3D imagery and the display/feed before proceeding to the registration/rendering stages. The alignment methods may include various techniques, such as those described earlier for aligning 3D imagery, ensuring precision and consistency. Furthermore, in some embodiments, the alignment layer might align the 3D imagery directly with the XR display/feed without the need for separate registration/rendering steps. This could involve using advanced algorithms to match spatial points between the 3D data and the live feed dynamically.

During the examination, the clinician may need to assess both hard and soft tissues. For instance, the CBCT overlay provides detailed information about the bone structure, root positions, and potential issues such as bone loss or cysts. The CBCT data, focusing on hard tissues, helps in visualizing the internal structure of the teeth and surrounding bone, which is crucial for diagnosing and planning treatments that involve these components. On the other hand, when examining soft tissues, the IoS overlay offers a high-resolution view of the gums, revealing conditions like recession, inflammation, or lesions. This soft tissue visualization is critical for detecting periodontal issues and planning appropriate interventions.

The ability to switch views instantly, as facilitated by the registering and rendering layers, allows for a comprehensive assessment in a single session. This capability significantly enhances diagnostic accuracy and treatment planning by providing the clinician with a full spectrum of relevant data without the need for multiple, separate imaging sessions.

Now in reference to FIG. 21, which illustrates a graphical flow of the alignment pipeline adapted for XR applications, the alignment method entails the following steps. The process begins with receiving a volumetric image and a surface scan image (2101a, 2101b). The volumetric image is a three-dimensional voxel array representing the maxillofacial anatomy of a patient, while the surface scan image is a polygonal mesh corresponding to the same anatomy. Optionally, the received images may undergo pre-processing and normalization (2102a, 2102b) to prepare them for downstream alignment, involving normalizing intensity values and ensuring consistency in image scale and orientation.

The next step involves segmenting the volumetric image and the surface scan image into distinct anatomical structures by assigning each voxel in the volumetric image an identifier by structure and assigning each vertex or face of the mesh from the surface scan image an identifier by structure, ensuring that at least one of the anatomical structures is common between both images (2103a, 2103b). For example, the volumetric image may be further segmented by assigning a subset of voxels to the dental crown (2103c).

Subsequently, a polygonal mesh featuring common structures with the polygonal mesh from the surface scan image is extracted from the volumetric image (2104a), while a teeth mesh is extracted from the surface scan image (2104b). Both meshes are then converted to point clouds and aligned via point clouds using a point set registration technique (2105). In a preferred embodiment, the mesh extraction is performed by a Marching Cubes algorithm. Alternatively, the extraction of the polygonal mesh can be achieved through less conventional techniques, such as extracting a polygonal mesh of an isosurface from a three-dimensional discrete scalar field.

Preferred alignment methods include Iterative Closest Point or Deformable Mesh Alignment, but essentially any method for aligning two partially overlapping meshes given an initial guess for relative transform may be used, as long as one mesh is derived from a CBCT (volumetric image) and the other from an IOS (surface scan image).

Further downstream, as depicted in FIG. 21, a registration layer (2108) maps the aligned 3D imagery onto the display/feed coordinate system. This involves correlating spatial data points from the 3D imagery with corresponding coordinates on the XR display. This mapping ensures that the 3D images are accurately represented within the context of the real-time video feed from the XR goggles. Finally, a rendering layer (2110) renders the composite imagery onto the display/feed. This layer is responsible for visually integrating the aligned 3D images with the live feed, enabling the clinician to toggle seamlessly between different imaging modalities, such as CBCT and IoS, to suit the clinical context.

In another embodiment, the alignment module (2106) is configured not only to align the 3D imagery but also to manage the registration and rendering of the aligned 3D imagery onto the XR display/feed. This integrated approach ensures that the transition from alignment to registration and rendering is smooth and cohesive. In some embodiments, the alignment layer may simultaneously or successively align both the 3D imagery and the display/feed before proceeding to the registration/rendering stages. The alignment methods may include various techniques, such as those described earlier for aligning 3D imagery, ensuring precision and consistency.

Furthermore, in some embodiments, the alignment layer might align the 3D imagery directly with the XR display/feed without the need for separate registration/rendering steps. This could involve using advanced algorithms to dynamically match spatial points between the 3D data and the live feed.

During the examination, the clinician may need to assess both hard and soft tissues. For instance, the CBCT overlay provides detailed information about the bone structure, root positions, and potential issues such as bone loss or cysts, crucial for diagnosing and planning treatments involving these components. Conversely, the IoS overlay offers a high-resolution view of the gums, revealing conditions like recession, inflammation, or lesions, essential for detecting periodontal issues and planning appropriate interventions.

The figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. It should also be noted that, in some alternative implementations, the functions noted/illustrated may occur out of the order noted. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.

Since various possible embodiments might be made of the above invention, and since various changes might be made in the embodiments above set forth, it is to be understood that all matter herein described or shown in the accompanying drawings is to be interpreted as illustrative and not to be considered in a limiting sense. Thus, it will be understood by those skilled in the art of creating independent multi-layered virtual workspace applications designed for use with independent multiple input systems that although the preferred and alternate embodiments have been shown and described in accordance with the Patent Statutes, the invention is not limited thereto or thereby.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Some portions of embodiments disclosed are implemented as a program product for use with an embedded processor. The program(s) of the program product defines functions of the embodiments (including the methods described herein) and can be contained on a variety of signal-bearing media. Illustrative signal-bearing media include, but are not limited to: (i) information permanently stored on non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive); (ii) alterable information stored on writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive, solid-state disk drive, etc.); and (iii) information conveyed to a computer by a communications medium, such as through a computer or telephone network, including wireless communications. The latter embodiment specifically includes information downloaded from the Internet and other networks. Such signal-bearing media, when carrying computer-readable instructions that direct the functions of the present invention, represent embodiments of the present invention.

In general, the routines executed to implement the embodiments of the invention may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions. The computer program of the present invention typically is comprised of a multitude of instructions that will be translated by the native computer into a machine-accessible format and hence executable instructions. Also, programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described may be identified based upon the application for which they are implemented in a specific embodiment of the invention.

However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

The present invention and some of its advantages have been described in detail for some embodiments. It should be understood that although the system and process is described with reference to automated segmentation pipeline systems and methods, the system and process may be used in other contexts as well. It should also be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. An embodiment of the invention may achieve multiple objectives, but not every embodiment falling within the scope of the attached claims will achieve every objective. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, and composition of matter, means, methods and steps described in the specification. A person having ordinary skill in the art will readily appreciate from the disclosure of the present invention that processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed are equivalent to, and fall within the scope of, what is claimed. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.

	Number	Date	Country
Parent	17854894	Jun 2022	US
Child	18768241		US
Parent	17564565	Dec 2021	US
Child	17854894		US
Parent	17215315	Mar 2021	US
Child	17564565		US
Parent	16783615	Feb 2020	US
Child	17215315		US
Parent	16175067	Oct 2018	US
Child	16783615		US

System and Method for Aligning 3-D Imagery of a Patient's Oral Cavity in an Extended Reality (XR) Environment

Information

Publication Number

Date Filed

Date Published

Inventors

Original Assignees

CPC

International Classifications

Abstract

Description

Claims

Continuation in Parts (5)