Machine learning is used in a variety of industries and fields to automate and improve processes and various tasks. In the dental field, including orthodontics, many processes and tasks are performed manually and may rely upon user feedback or interaction for their completion. Machine learning could be used in the dental field in order to automate, partially automate, or improve such processes and tasks.
Embodiments use machine learning applied to various dental processes and solutions. In particular, generative adversarial network (GAN) embodiments apply machine learning to smile design—finished smile, appliance rendering, scan cleanup, restoration appliance design, crown and bridges design, and virtual debonding. Vertex and edge classification embodiments apply machine learning to gum versus teeth detection, teeth type segmentation, and brackets and other orthodontic hardware. Regression embodiments apply machine learning to coordinate systems, diagnostics, case complexity, and prediction of treatment duration. Autoencoder and clustering embodiments apply machine learning to grouping of doctors (or technicians) and their preferences.
Geometric Deep Learning (GDL) or machine learning methods are used to process dental scans for several dental and orthodontic processes and tasks. The use of GDL can, for example, automate, partially automate, or improve these processes and tasks. The following are exemplary uses of GDL for dental and orthodontic applications in addition to the embodiments described in the Sections below.
Use of transfer learning: The methods could use transfer learning in situations having a dearth of good training data. The methods could use a model that is pretrained for a tooth type for which there is enough training data as a base model and fine-tune its weights, either completely or partially, to create a new model suitable for working with the data-deficient tooth type.
Use of other modalities: GDL could be used with multi-view two-dimensional (2D) projections, for example, multi-view convolutional neural networks (MVCNN).
Use of multiple modalities together: The methods could create pipelines that use machinery from all or some of the modalities to create a hybrid pipeline. This pipeline is capable of ingesting data with multiple modalities.
These embodiments include, for example, the following.
Restoration final smile design: Using generative adversarial networks to create a 3D mesh of the final smile based on the initial 3D mesh.
Restoration appliance design: Using generative adversarial networks (GANs) to create a restoration appliance based on the 3D mesh of the final smile design.
Crown and bridges design: Using GANs to provide the capability of displaying how appliances (braces, brackets, etc.) would look during the course of the treatment.
Virtual debonding: Using GANs to generate scanned arch meshes without appliances based on initial 3D scans of the arches that contain appliances (brackets, retainers, or other hardware) and, alternatively, using a machine learning segmentation module to identify brackets, retainers, or other hardware present in the scanned arches. GANs or 3D mesh processing may then be used to remove the appliances from the scan meshes.
These embodiments include methods for automated 3D mesh cleanup for dental scans. There are three main approaches: a 3D mesh processing approach; a deep learning approach; and a combination approach that employs some 3D mesh processing elements and some deep learning elements.
The methods receive raw (pre-cleanup) digital dental models generated by a variety of intra-oral and lab scanners with varying characteristics of their 3D meshes. The methods utilize standard, domain-independent 3D mesh repair techniques to guarantee certain mesh qualities that avoid mesh processing problems in subsequent operations. The methods also use custom, orthodontic/dental domain-specific algorithms such as model base removal and partial gum clipping/bridging, as shown in
As more data is acquired, machine learning methods and particularly deep learning methods start performing on par with, or exceeding, the performance of explicitly programmed methods. Deep learning methods have the significant advantage of removing the need for hand-crafted features as they can infer several useful features using a combination of several non-linear functions of higher dimensional latent or hidden features directly from the data through the process of training. While trying to solve the mesh cleanup problem, directly operating on the 3D mesh might be desirable using, for example, methods such as PointNet, PointCNN, MeshCNN, and FeaStNet.
Deep learning algorithms have two main development steps: 1) Model training and 2) Model deployment.
Model training makes use of multiple raw (pre-cleanup) and cleaned up digital 3D models for historic case data. Raw or partially cleaned up 3D models are input into a deep learning framework that has been architected to generate predicted improved cleaned up 3D models. Optionally, data augmentation may be applied to the input models to increase the amount of data that is input into the deep learning model. Some data augmentation techniques include mesh translation and rotation, uniform and non-uniform scaling, edge flipping, and adding random noise to mesh vertices. Next, the model is trained through a process that iteratively adjusts a set of weights to minimize the difference between predicted and actual cleaned up digital 3D models. The trained model is then evaluated by generating cleaned up meshes for a reserve set of cases that were not used during training and comparing these generated 3D meshes to cleaned up meshes for the actual case data.
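Although no particular implementation is prescribed, the following minimal Python sketch illustrates the vertex-level augmentations listed above; the function name and parameter ranges are assumptions, and edge flipping, which operates on mesh connectivity rather than vertex positions, is omitted.

```python
import numpy as np

def augment_vertices(vertices, rng=np.random.default_rng()):
    # vertices: (N, 3) array of mesh vertex positions.
    # Random rotation about the z-axis.
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    out = vertices @ rotation.T
    # Non-uniform (per-axis) scaling and translation.
    out = out * rng.uniform(0.9, 1.1, size=3)
    out = out + rng.uniform(-1.0, 1.0, size=3)
    # Small Gaussian noise on the vertices, preserving overall shape identity.
    return out + rng.normal(scale=0.05, size=out.shape)
```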
The model deployment stage makes use of the trained model that was developed during model training. The trained model takes as input a raw digital 3D model for a new never-before-seen case and generates a cleaned-up 3D model for this case.
Model development/training:
Model deployment:
As described in Section 2, it is possible to use deep learning to generate cleaned up meshes from input scans without any explicitly programmed 3D mesh processing steps. Some mesh cleanup operations (e.g., hole filling, removing triangle intersections, and island removal), however, are well-defined mesh operations and likely more effectively implemented using 3D mesh processing methods rather than deep learning. Instead, the methods can implement a combination approach that uses deep learning in place of some but not all mesh processing steps described in Section 1. For example, deep learning can be used to identify the gum line in a mesh, allowing for excess material below the gum line to be removed. Deep learning may also be used to identify pathologic features (see
These methods use GANs and GDL to automate the manual mesh cleanup process based on trends learned from the data. Defects in the meshes include topological holes, non-smooth surfaces, etc. In these methods, a machine learning approach is used to construct a mapping from meshes in the uncleaned state to meshes in their cleaned state. This mapping is learned through adversarial training and is embodied in the conditional distribution of cleaned up meshes, given the corresponding uncleaned source mesh. The model is trained using a dataset of point clouds acquired using intra-oral scans (referred to as data points) in the uncleaned state and the corresponding meshes after they go through a cleaning process, done by either semi-automated software programs or completely manually by trained humans.
This machine learning model can later be used as a preprocessing step for other geometric operations. For instance, in the case of digital orthodontics this model could be used to standardize the incoming point cloud in a coordinate system that is conducive for processing without requiring a human in the loop. This effectively and significantly reduces the processing time for each case as well as the need to train human workers to perform this task. Additionally, because the machine learning model is trained on data generated by a multitude of trained humans, it has the potential to achieve higher accuracy when compared to a single human on their own.
The following are stages in the workflow:
The preprocessed mesh/point cloud is passed through the machine learning model and a generated mesh/point cloud is obtained. The steps related to the use of the machine learning model are provided below.
The training process uses training data, which is a set of pairs of unclean and cleaned meshes/point clouds. This data is assumed to be available before the creation of the model.
The model has two major parts: one is the generator and the other is the discriminator. The generator takes in a mesh/point cloud and generates another mesh/point cloud. This generated mesh/point cloud has some desired geometric traits. The discriminator takes the generated mesh/point cloud and gives it a score. The discriminator is also given the corresponding ground truth cleaned mesh/point cloud and gives out another score. The adversarial loss encodes the dissimilarity between these two scores. The total loss function can include other components as well. Some components could be introduced to enforce rules-based problem-specific constraints.
The methods pass batches randomly selected from the training data set into the model and compute the loss function. The methods infer the gradients from the computed loss function and update the weights of the model. During the training of the model, the generator is updated to minimize the total loss function and the discriminator is updated to maximize it. This process is repeated either for a predefined number of iterations or until a certain objective criterion is met.
The methods assume that there is a validation set available at the beginning of training. This data set is similar to the training data set in that it is a set of paired unclean and cleaned meshes/point clouds.
After a set number of training iterations, the methods pass the validation set through the model and compute the loss function value. This value serves as a measure of how well the model generalizes on unseen data. The validation loss values can be used as a criterion for stopping the training process.
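By way of a non-limiting illustration, the following Python (PyTorch) sketch shows one possible form of this adversarial training loop; Generator, Discriminator, and paired_loader are assumed stand-ins not taken from the embodiments above, and the Chamfer term stands in for the optional problem-specific loss components.

```python
import torch

def chamfer(a, b):
    # Symmetric Chamfer distance between point sets a, b of shape (batch, n, 3);
    # stands in for the problem-specific components of the total loss.
    d = torch.cdist(a, b)
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

G, D = Generator(), Discriminator()   # assumed geometric networks
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = torch.nn.BCEWithLogitsLoss()

for unclean, clean in paired_loader:  # randomly selected training batches
    fake = G(unclean)                 # generated "cleaned" geometry

    # Discriminator step: maximize its ability to score real vs. generated.
    real_logits, fake_logits = D(clean), D(fake.detach())
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits, torch.zeros_like(fake_logits))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: minimize the adversarial loss plus a reconstruction term.
    gen_logits = D(fake)
    g_loss = bce(gen_logits, torch.ones_like(gen_logits)) + chamfer(fake, clean)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```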
These methods use GANs and GDL to predict the mesh after dental restoration has occurred, given a mesh representing the initial state, on the basis of trends learned from a training set.
In these methods, a machine learning approach is used to construct a map from meshes in the initial (unrestored) state to meshes in their restored state. This map is learned through adversarial training and is embodied in the conditional distribution of meshes of restored teeth, given the mesh corresponding to the initial state. The model is trained using a dataset of point clouds acquired using intra-oral scans (referred to as data points) in the unrestored initial state and the corresponding meshes after they go through a restoration process.
The trained machine learning model can later be used for smile prediction, which could enable orthodontists to show their patients the final state of the restored arch after the restoration process has been completed in software.
The following are stages in the workflow:
The preprocessed mesh/point cloud is passed through the machine learning model and a generated mesh/point cloud is obtained. The steps related to the use of the machine learning model are provided below.
The training process uses training data, which is a set of pairs of unrestored and restored meshes/point clouds. This data is assumed to be available before the creation of the model.
The model has two major parts: one is the generator and the other is the discriminator. The generator takes in a mesh/point cloud and generates another mesh/point cloud. This generated mesh/point cloud has some desired geometric traits. The discriminator takes the generated mesh/point cloud and gives it a score. The discriminator is also given the corresponding ground truth restored mesh/point cloud and gives out another score. The adversarial loss encodes the dissimilarity between these two scores.
The total loss function can include other components as well. Some components could be introduced to enforce rules-based problem-specific constraints.
The methods pass batches randomly selected from the training data set into the model and compute the loss function. The methods infer the gradients from the computed loss function and update the weights of the model. During the training of the model, the generator is updated to minimize the total loss function and the discriminator is updated to maximize it. This process is repeated either for a predefined number of iterations or until a certain objective criterion is met.
The methods assume that there is a validation set available at the beginning of training. This data set is similar to the training data set, in that it is a set of paired unrestored and restored meshes/point clouds.
After a set number of training iterations, the methods pass the validation set through the model and compute the loss function value. This value serves as a measure of how well the model generalizes on unseen data. The validation loss values can be used as a criterion for stopping the training process.
These methods determine the validation state of a component for use in creating a dental restoration appliance. These methods can facilitate automating the restoration appliance production pipeline. There are at least two embodiments: 1) an embodiment that uses a GraphCNN to apply a class label (i.e., pass or fail) to the 3D mesh component, and 2) an embodiment that uses a CNN to apply a class label (i.e., pass or fail) to a set of one or more 2D raster images that represents one or more views of the 3D mesh component.
Each embodiment uses a neural network (NN) to distinguish between two or more states of a representation of a component to be used in a dental restoration appliance, optionally for the purpose of determining if that component is acceptable for use in building the appliance.
These embodiments can perform quality assurance (QA) on a completed dental restoration appliance. In some production pipelines, a qualified person must inspect the completed appliance and render a pass/no-pass determination. These embodiments can automate the process of validating a restoration appliance and eliminate one of the largest remaining “hidden factories” of effort, reducing, for example, a one- to two-day pipeline process to as little as half an hour for many cases.
These embodiments can validate components of a dental restoration appliance and/or the completed dental restoration appliance. The advantage of using these embodiments for such a QA process is that the NN can assess the quality of generated components and placed components faster and more efficiently than is possible by manual inspection, allowing the QA process to scale far beyond a few experts. As a further advantage, the NN may produce a more accurate determination of the quality of the shape or placement of a component than would otherwise be possible by manual inspection, for example if the NN recognizes subtle abnormalities that a human would miss. As still a further advantage, the use of the NN and the examination of the results of that NN may help a human operator become trained to recognize a proper appliance component design. In this manner, knowledge may be transferred to new human experts.
In a further application, these embodiments support the creation of an extensive automated regression test framework for the code that generates and/or places components. The advantage of this further application is to make comprehensive regression testing possible. These embodiments enable a regression testing framework to automatically validate the outputs of dozens of processed cases and can do so as often as the developer chooses to run the tests.
These embodiments can be implemented in part using, for example, the open source toolkit MeshCNN to implement the Graph CNN (GCNN). MeshCNN has a sample program that inputs a mesh and assigns to it one class label from a long list of possible classes. Here, MeshCNN is adapted to distinguish between two or more states (e.g., pass/no-pass) of a component that is to be used in the creation of a dental restoration appliance (e.g., a mold parting surface).
This implementation is similar to the Embodiment 1 implementation, except that the GCNN is replaced with a CNN. The CNN is trained to classify 2D raster images. For a given component, the CNN would be trained to recognize each of a set of different views of the 3D geometry of the component (e.g., a parting surface) by itself, in conjunction with other features, represented in the final appliance design, or combinations thereof; alone, in relation to the input dental structure, or both. These 2D raster images are produced with, for example, the commercial CAD tool Geomagic Wrap or open-source software tools such as Blender.
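As a non-limiting alternative to the tools named above, multiple raster views of a mesh can also be produced with a short script; the following matplotlib sketch is illustrative only, and its function name and parameters are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection

def render_views(vertices, faces, out_prefix, n_views=6):
    # vertices: (N, 3) float array; faces: (M, 3) integer index array.
    triangles = vertices[faces]
    for i, azim in enumerate(np.linspace(0, 360, n_views, endpoint=False)):
        fig = plt.figure(figsize=(2, 2))
        ax = fig.add_subplot(projection="3d")
        ax.add_collection3d(Poly3DCollection(triangles, facecolor="lightgray",
                                             edgecolor="gray", linewidth=0.1))
        lo, hi = vertices.min(axis=0), vertices.max(axis=0)
        ax.auto_scale_xyz((lo[0], hi[0]), (lo[1], hi[1]), (lo[2], hi[2]))
        ax.view_init(elev=20, azim=azim)  # a different camera angle per view
        ax.set_axis_off()
        fig.savefig(f"{out_prefix}_view{i}.png", dpi=100)
        plt.close(fig)
```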
As a proof of concept, MeshCNN was used to train a NN to distinguish between examples of “passing” mold parting surfaces and “non-passing” parting surfaces. “Passing” and “non-passing” are subjective labels that can be determined by experts and may vary between different experts. This type of label stands in contrast to, for example, the label of “dog” for an ImageNet image of a dog. The label of “dog” is objective and does not involve any expert opinion.
The NN of these embodiments could be incorporated into a regression testing system for testing the quality of the code that automates the production of parts to be used in the production of a dental restoration appliance. Typically, regression tests are used to determine whether recent changes to code or to inputs have negatively affected the outputs of a system. In the present case, there is a need to be able to change a few lines of the automation code and rapidly determine whether those changes have had any adverse effects on the outputs of our suite of test cases. There may be dozens of test cases. The outputs of the dozens of test cases can be inspected manually, but at great cost in terms of the time required for a technician or other person to manually inspect the outputs for all test cases. The advantage of the present embodiment is to streamline this process. Even if 1 out of 36 test cases fails to produce acceptable results after the code change, the NN from this embodiment is designed to detect that error.
As a further application, the NN can be used outside of regression testing, and be applied as a QA step in production. At present, a qualified person must manually inspect the 3D data associated with the creation of the dental appliance. There are several stages of the fabrication process at which these data must be validated.
In one embodiment, a NN is used to validate the correctness of a “mold parting surface,” which is an important component of the restoration appliance. It is important that the parting surface be formed correctly. This new NN examines the parting surface on a tooth-by-tooth basis, observing the manner in which the parting surface bisects each tooth.
These embodiments operate on the outputs of the automation code. The automation code may embody some or all of the content of PCT Patent Application Number PCT/IB2020/054778, entitled “Automated Creation of Tooth Restoration Dental Appliances” and U.S. Provisional Patent Application No. 63/030,144, entitled “Neural Network-Based Generation and Placement of Tooth Restoration Dental Appliances.” Some of those outputs are generated components. A non-exhaustive list of generated components includes: mold parting surfaces, gingival trim surfaces, facial ribbons, incisal ridges, lingual shelves, stiffening ribs, “doors & windows” and diastema matrices. Others of those outputs are placed components (e.g., prefab library parts that must be translated and/or rotated to align in certain ways with respect to a patient's tooth geometry). A non-exhaustive list of placed components includes: incisal registration features, vents, rear snap clamps, door hinges, and door snaps. A technician must inspect the automation outputs to ensure that the generated components are properly formed and that the placed library components are properly positioned. A NN from the present embodiment could be used to determine whether components are properly formed or placed. The advantage is to save time for the technician and potentially to produce a higher quality dental restoration appliance through discovering errors in the shapes or placements of components that the technician may overlook. There are certain components that are of special importance, such as the mold parting surface. The mold parting surface forms the basis for much of the subsequent formation of the appliance. If there is an error in the mold parting surface, there is great value in discovering the error and discovering the error early in the appliance creation process.
A machine learning system has two stages of operation: 1) training and 2) validation/operational use. The NNs in the embodiment must be trained on examples of good geometry and examples of bad geometry. Our first proof-of-concept used mold parting surfaces for the 3D geometry.
The MeshCNN code was run (without modification) on this particular dataset of dental restoration appliance component parts and trained to distinguish between “passing” and “non-passing” parts. The training dataset contained 14 examples of “passing” parting surfaces and 14 examples of “non-passing” parting surfaces. Each one of the “non-passing” examples is a corrupted instance of one of the “passing” examples (i.e., where the code was changed to corrupt the generated parting surface). The test dataset contained 6 “non-passing” examples and 7 “passing” examples. The NN was trained for 20 epochs, achieving 100% accuracy on the held-out validation set. One epoch involves iterating through each example once. For the purposes of this proof-of-concept implementation, the parting surfaces were generated with a smaller number of triangles than the production parting surfaces, to save on the RAM required by the NN and enable the NN to run on an ordinary laptop.
The NN was then tested on a held-out validation dataset, i.e. data samples that were not involved in the training process, as is the custom in training machine learning models. 18 “passing” samples (i.e., good parting surfaces) were prepared, and 18 “non-passing” samples (i.e., bad parting surfaces) were prepared. The NN correctly classified 100% of these held-out validation data samples.
This embodiment is an extension of other embodiments described herein. This embodiment adds another item to the four items described above. This embodiment uses a NN to distinguish between two or more states of a representation of a component to be used in a dental restoration appliance for the purpose of determining if that component is acceptable for use in building the appliance; if the component is found not to be acceptable, then the NN may in some embodiments output an indication of how the component should be modified to correct its geometry.
The term “3D Mesh Component” is used to indicate a generated component from those described above, a placed component from those described above, or another 3D mesh that is intended for use with a rapid prototyping, 3D printing, or stereolithography system. The component may be either a positive or negative feature which is integrated into the final part by a Boolean operation. The embodiment helps provide contextual feedback to automated feature generation, wherein there may be one algorithm or ruleset to create a component and one NN classification to check the quality of that component. The relationship between the two comprises a recursive “guess and check” mechanism to ensure acceptable results (create/generate > classify > regenerate > classify > ... > final design).
This embodiment involves 3D Mesh Components in the context of digital dentistry and the automated production of dental appliances. Examples include: the restoration appliance, a clear tray aligner, bracket bonding tray, lingual bracket, restorative component (e.g., crown, denture), patient specific custom devices, and others. A dentist or provider could apply this embodiment to a digital design that the provider has made chairside in a dental clinic. Other embodiments are also possible, for example any application where automating a design could benefit from this embodiment, including the automated design of support structures for 3D printing and the automated design of fixtures for part fixturing. Additionally, a 3D printing laboratory could apply the embodiment to a prototype part, where the part is embodied as a 3D mesh. A manufacturing environment could apply this embodiment to custom 3D printed components where the NN input is derived from photos of the component, or screen captures of a mesh that is generated by scanning a physical part. This would allow the manufacturer to qualify output parts without the use of classical 3D analysis software, and may reduce or eliminate the effort required by a human expert to qualify the output parts. This embodiment could be applied by interaction of the user with the software, or this embodiment could be part of a background operation of a smart system that is providing input to the process without direct user intervention.
This embodiment is generally useful in the detection of problems with 3D meshes and the automated correction of those problems.
A 3D Mesh Component is created through the following: automatic generation as described herein; automatic placement as described herein; manual generation by an expert; manual placement by an expert; or by some other means, for example, the use of a CAD tool or another setting in a rapid prototyping lab.
That 3D Mesh Component is entered into a validation neural network (e.g., of the kind described herein). The validation neural network renders a result on the quality of the 3D Mesh Component: either pass or not-pass. If the result is a pass, then the 3D Mesh Component is sent along to be used for its intended purpose (e.g., to be incorporated into a dental appliance). If the result is a not-pass, then the validation neural network may in some embodiments output an indication of how to modify the 3D Mesh Component in order to bring the 3D Mesh Component into closer conformance with expectations.
In the embodiment described below, a mold parting surface is examined in the vicinity of each of the teeth in an arch. If the mold parting surface intersects a tooth in an incorrect way, then the embodiment outputs an indication that the mold parting surface should be moved either lingually or facially in order to cause the mold parting surface to more cleanly bisect that tooth's outer cusp or incisal edge. The mold parting surface is intended to divide the facial and lingual portions of each tooth, which means that the mold parting surface should run along the outer cusp tips of the teeth. If the mold parting surface cuts too far in the lingual direction or too far in the facial direction, then it does not adequately divide the facial and lingual portions of the tooth. Consequently, the mold parting surface requires adjustment in the vicinity of that tooth. The software that automatically generates the mold parting surface has parameters which are operable to bias the facial/lingual positioning of the mold parting surface near each tooth. This embodiment produces incremental changes to those parameter values, in the proper directions, to make the mold parting surface more cleanly bisect each tooth.
This embodiment can mandate changes to the mold parting surface in the vicinity of some teeth (i.e., where the mold parting surface did not correctly bisect the tooth), but not in the vicinity of others (i.e., where the mold parting surface correctly or more cleanly bisected the tooth).
In this embodiment, there are two validation neural networks, one which is termed the Lingual-bias NN, and one which is termed the Facial-bias NN. Both of these neural networks are trained on 2D raster images of views of 3D tooth geometries, where the 3D tooth geometries are visualized in connection with a mold parting surface (see descriptions above in this Section I.D.). The mold parting surface is an example of a 3D Mesh Component, as previously defined.
Options for creating the 2D raster images of teeth in relation to mold parting surfaces include the following:
Arbitrary views are considered for each of the above. In some embodiments, the use of a multi-view pipeline can enable the use of an arbitrary number of views with arbitrary camera positions and angles of rendered images.
The Lingual-bias NN is trained on two classes of image: 1) a class where the mold parting surface has been correctly formed and correctly bisects the teeth, and 2) a class where the mold parting surface has been incorrectly formed and does not correctly bisect one or more teeth. For this example, images were created which reflect several arbitrary views of each tooth in the arch. The views should show that tooth in relation to the parting surface, as the parting surface intersects the tooth (as per the list above). This case could use option 4 above, where the parting surface is intersected with the tooth and produces, for example, red and blue coloration or different shading on the tooth.
This embodiment trains the Lingual-bias NN to distinguish between the two classes of image (i.e., with passing parting surfaces and with non-passing parting surfaces). If the Lingual-bias NN renders a non-passing result on an input parting surface, then the methods know that the parting surface must have bisected the tooth in a manner that was too far lingual. The methods therefore output an indication that the parting surface came too far lingually for this tooth and should be moved slightly in the opposite direction when the mold parting surface is reworked by the automatic generation software (e.g., automatic generation software as described herein). The code to automatically generate the parting surface has a parameter for each tooth, which can bias the parting surface in either the lingual or facial direction. This parameter can be adjusted, so that the next iteration of the parting surface moves by a small increment in the facial direction for this tooth.
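By way of a non-limiting illustration, this generate-classify-adjust loop might be sketched as follows; generate_parting_surface, render_tooth_views, lingual_bias_nn, and the step size are hypothetical stand-ins, not the actual automation code.

```python
# Purely illustrative feedback loop: regenerate the parting surface until the
# Lingual-bias NN passes every tooth. All names below are hypothetical.
FACIAL_STEP = 0.2   # assumed per-iteration bias increment (mm)

def refine_parting_surface(tooth_ids, bias_params, max_iterations=5):
    for _ in range(max_iterations):
        surface = generate_parting_surface(bias_params)   # automation code
        failing = [t for t in tooth_ids
                   if not lingual_bias_nn.passes(render_tooth_views(surface, t))]
        if not failing:
            break                           # every tooth correctly bisected
        for t in failing:
            bias_params[t] += FACIAL_STEP   # too far lingual; nudge facially
    return surface, bias_params
```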
Other embodiments can in effect estimate the amount of surface movement in the facial direction using a regression network that works on images of the teeth. The regression network can be used to estimate the “transgression” in the lingual or facial region given an image of the teeth. Converting that transgression amount into a parameter adjustment is feasible. This change in the feedback loop will lower the number of iterations/revisions required by the methods.
The Facial-bias NN is trained with the same positive class images as the Lingual-bias NN, but the negative class images are generated using parting surfaces that come too far facially along the tooth. All the rest of the training details are substantially the same, except that when the Facial-bias NN renders a non-passing verdict, then the methods know that the mold parting surface came too far facially along the tooth, and the automatic generation software must be instructed to move the mold parting surface by an increment in the lingual direction.
In other embodiments, a neural network can be trained to isolate anomalies in the transgression in either the lingual or facial direction. Such a NN has the capability of highlighting the most salient parts of the mesh/image of the arch for its inference.
In some embodiments, a regression network may be used for estimating the amount of transgression on the facial side and adjusting the corresponding parameter accordingly.
Each tooth is analyzed separately. Several images of each tooth are put through the pipeline, and pass/no-pass verdicts are rendered for each image. In cases where several images of a tooth/parting surface combination are rendered through this pipeline, there are different options for determining the result. In some embodiments, if at least one of the several views renders a non-passing verdict, then the methods output an indication that the parting surface needs adjustment in the vicinity of that tooth. In other embodiments, if some or a majority of the analyzed images renders a non-passing verdict, then the methods output an indication that the parting surface needs adjustment in the vicinity of that tooth.
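The two aggregation policies just described can be expressed compactly; the following minimal sketch is illustrative only.

```python
def needs_adjustment(view_verdicts, policy="any"):
    # view_verdicts: list of booleans for one tooth; True means the view passed.
    failures = sum(1 for passed in view_verdicts if not passed)
    if policy == "any":                        # one failing view suffices
        return failures >= 1
    return failures > len(view_verdicts) / 2   # majority of views must fail
```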
In one embodiment, the validation neural network comprises a Convolutional Neural Network (CNN). A CNN may embody a variety of different network configurations, including networks with a different number of layers, a different number of nodes per layer, differing use of dropout layers, differing use of convolutional layers, and differing use of dense layers, among other differences.
In other embodiments, the validation neural network may draw upon elements of a Multi-view CNN (MVCNN) architecture. As a brief summary, the input of the network is an arbitrary number of images of a 3D scene. All the images pass through a shared copy of a feature extraction CNN. These features are then pooled using a view pooling mechanism and fed into a classification network, which is typically a fully connected network. The fundamental difference from a standard CNN is that this kind of architecture allows the use of multiple views of the same scene. Training works in a similar way with one change: instead of passing in one image and label/value at a time, the methods pass in multiple views of the mesh as images with one label/value at a time.
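A minimal PyTorch sketch of such an MVCNN-style validator is given below; the layer sizes and the max view-pooling choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiViewValidator(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(          # shared across all views
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, n_classes)    # fully connected classifier

    def forward(self, views):                   # views: (batch, n_views, 3, H, W)
        b, v = views.shape[:2]
        feats = self.backbone(views.flatten(0, 1))       # (b*v, 64)
        pooled = feats.view(b, v, -1).max(dim=1).values  # view pooling
        return self.head(pooled)
```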
In still other embodiments, the validation CNN (which works with 2D raster images) can be replaced with a neural network that works directly on 3D data, such as a MeshGAN, a GraphCNN, or a GraphGAN.
One example provides images of the tooth in relation to a mold parting surface into both 1) the Lingual-bias NN and 2) the Facial-bias NN.
The methods loop through each tooth, determining whether the mold parting surface is correctly positioned in relation to that tooth, or whether the mold parting surface needs adjustment in either the lingual or facial directions, in the vicinity of the tooth.
In other embodiments, the pair of NNs comprising the Lingual-bias NN and the Facial-bias NN, each of which is operable to perform 2-class classification, may be replaced with a single NN operable to perform 3-class classification. This 3-class classification NN would be trained on 2D raster images from 3 classes:
The 3-class classification NN would render predictions from this set of three class labels.
In other embodiments, an N-class classification NN could be employed to assign one of N possible class labels to each data sample, corresponding to N distinct states of the appliance component (e.g., a mold parting surface).
In other embodiments, views from both the facial and the lingual sides can be incorporated into one NN, as opposed to having two separate NNs. In this case, a graph convolutional network would take the entire tooth mesh as an input and output one regression value which represents the amount of “radial” adjustment for that particular tooth. The input to such a NN (the original 3D scene) strictly contains more information than a few images arbitrarily rendered from the scene.
The following is one embodiment of a neural network used in the implementation for the 2D raster image embodiment of the validation component:
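The concrete layer listing is not reproduced here. Purely as an illustrative stand-in consistent with this description, and not the actual network of the embodiment, a small pass/no-pass validation CNN might look as follows in PyTorch; all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ValidationCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Dropout(0.5), nn.Linear(64, 2),  # pass / no-pass
        )

    def forward(self, x):   # x: (batch, 3, H, W) rendered raster view
        return self.classifier(self.features(x))
```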
One embodiment uses a neural network which is trained on 1) examples of correct appliance components and 2) examples of appliance components which have been systematically altered to be incorrect. This neural network distinguishes between correct and incorrect appliance components using multiple 2D renderings of the component, taken from different views.
The NN can be trained to distinguish between 1) a correct parting surface and 2) a parting surface which is too far lingual. The NN was tested on 3 patient cases, totaling 54 teeth. In this test, 50 of the teeth yielded correct predictions and 4 teeth yielded incorrect predictions.
In this method, the neural network is trained to validate the correctness of a mold parting surface. This embodiment reflects 2-class classification, where the neural network was trained on two classes of data, i.e., where the parting surface is either:
The diagram in
Each of the 30 views (
This method of visualizing the results of the neural network is advantageous because the method organizes the large number of views of the tooth for a single test case and enables a human to quickly scan through and grasp the results of the test case.
The diagram in
The neural network is trained on ground truth data of the following 3 classes.
The various embodiments described herein can be implemented using a variety of different neural networks. Embodiment 2 uses a CNN. Embodiment 1 uses a Graph Convolutional Neural Network (GraphCNN). Other embodiments may involve elements derived in whole or in part from other types of neural networks, including the following: Perceptron (P); Feed Forward (FF); Radial Basis Network (RBF); Deep Feed Forward (DFF); Recurrent Neural Networks (RNN); Long Short-Term Memory (LSTM); Gated Recurrent Unit (GRU); Auto Encoder (AE); Variational Auto Encoder (VAE); Denoising Auto Encoder (DAE); Sparse Auto Encoder (SAE); Capsule Autoencoder (CAE); Stacked Capsule Autoencoders (SCAE); Deep Belief Network (DBN); Deep Convolutional Network (DCN); Deconvolutional Network (DN); Generative Adversarial Network (GAN); Liquid State Machine (LSM); and Neural Turing Machine (NTM).
The GraphCNN can operate on dental data which are provided in a 3D form, such as a 3D Mesh. The mesh includes both vertices and instructions on how to arrange the vertices into faces. Implicit in the definitions of the faces is information about the edges which connect the vertices.
The CNN can operate on dental data which are provided in the form of 2D raster images. The 2D raster images can use color or shading to highlight areas of interest within the dental anatomy (e.g., the use of red and blue coloration, or light and dark shading, to denote the facial and lingual portions of a tooth which result from application of the mold parting surface to that tooth).
These neural networks may be trained on data which has undergone augmentation. In the case of 3D mesh data, augmentation can involve stochastic or deterministic transforms applied to the vertices or the faces, so as to change the shape of the 3D mesh without altering the essential identity of the mesh. This variation in mesh shape can help a classifier avoid overfitting when the meshes are used as training data. In the case of 2D raster images, the images may be resized, stretched, rotated, sheared, or undergo the introduction of noise. As with the 3D data, these 2D augmentations of the training data can help the neural networks avoid overfitting at training time.
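As a non-limiting illustration, the 2D augmentations described above could be expressed with torchvision transforms; the specific parameter values are assumptions.

```python
import torch
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    # Rotation, stretching, and shearing, as described above.
    transforms.RandomAffine(degrees=15, translate=(0.05, 0.05),
                            scale=(0.9, 1.1), shear=5),
    transforms.ToTensor(),
    # Additive noise, clamped back to the valid pixel range.
    transforms.Lambda(lambda t: (t + 0.01 * torch.randn_like(t)).clamp(0.0, 1.0)),
])
```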
The neural networks described herein can incorporate various activation functions, such as ReLU. Other activation functions include: binary step, identity, logistic, and tanh. The neural networks can incorporate downsampling techniques, such as pooling and max-pooling. The neural networks can reduce overfitting and reduce generalization error using regularization techniques such as dropout.
The following are other examples of dental appliances which may benefit from the validation techniques described herein.
In some embodiments, the validation techniques described herein can be applied to the design of custom lingual brackets. Digital 3D views of lingual brackets placed on teeth could be used to train a validation NN that would render a pass/fail verdict on the lingual bracket design. This feedback could be acted upon by a trained technician or sent to the automation software that generated the lingual bracket to improve the design of the next iteration of the lingual bracket. For lingual brackets, a bonding pad is created for a specific tooth by outlining a perimeter on the tooth, creating a thickness to form a shell, and then subtracting-out the tooth via a Boolean operation. Bracket bodies are selected from a library, placed on the pad and united to the pad via Boolean addition. Various bracket components (e.g., hooks and wings) are adjusted to best adapt to the particular geometry of the tooth and gingiva and are united into the bracket body to complete the digital design of the bracket which is exported as a 3D geometry file. In some embodiments, the STL format may be used for the 3D geometry file.
Brackets are selected from a library and custom positioned on a tooth. Fine adjustments are made based on local tooth anatomy in the bonding region, and some customization of torque and rotation is possible through compensation within the adhesive bond line between the tooth and the bracket. A NN is trained to recognize discrepancies in the bracket placements, whether those placements were automated placements or technician-produced placements.
In other embodiments, the validation techniques described herein can be applied to the design of CTAs (clear tray aligners), for example the 3D data that are used to design aligner trays. An example of such data is a 3D representation (e.g., 3D mesh) of the patient's teeth called a “fixture model” that is then sent to a 3D printer. Parameters such as the location of the trim line, the geometry and position of attachments, bite ramps, or slits can be validated. The trim line is where the aligner is trimmed during thermoforming. More complex features possible in direct 3D printed aligners (local thickness, reinforcing rib geometry, flap positioning, etc.) can also be subject to the validation techniques described herein.
A digital 3D model of a patient's teeth and gums showing the trim line could be used to train a validation NN that would render a pass/fail verdict on the CTA. This feedback could be acted upon by a trained technician or sent to the automation software that generated the CTA to improve the design of the next iteration of the CTA. The CTAs are a series of removable, nearly invisible plastic trays that are shaped to move the patient's teeth progressively along a series of predetermined positions.
Other dental appliances which can be validated using the validation techniques described herein include data or structures related to implant placement, or other types of dental restoration (such as veneer, crown, or bridge) design.
Also, the validation techniques described herein could be used to validate a bracket placement, including either or both of a manual placement by a human expert and an automatic placement by an algorithm.
These embodiments include, for example, the following: using a machine learning segmentation module to provide the capability of segmenting out hardware from scanned arches.
The hardware can be in the form of brackets, braces, or other complicated external artifacts.
A deep learning model is used to automatically segment teeth from a 3D mesh. This process can be divided into two steps: model development/training, and model deployment. During training (flow chart 1 in
The flow chart in
As more data is acquired, machine learning methods and particularly deep learning methods can perform on par with or exceed the performance of explicitly programmed methods. Deep learning methods have the significant advantage of removing the need for hand-crafted features as they are able to infer several useful features using a combination of several non-linear functions of higher dimensional latent or hidden features, directly from the data through the process of training. While trying to solve the segmentation problem, directly operating on the malocclusion 3D mesh might be desirable.
Deep Learning for Tooth Segmentation from the Gum:
A deep learning model performs tooth segmentation from 3D mesh data using MeshCNN. MeshCNN is a general-purpose deep neural network for 3D triangular meshes, which can be used for tasks such as 3D shape classification or segmentation. This framework includes convolution, pooling, and unpooling layers which are applied directly on mesh edges and has an advantage over other approaches because it is invariant to mesh rotation, scale, and translation changes. Deep learning algorithms including MeshCNN have two main development steps: 1) Model training and 2) Model deployment.
1. Model Training
Model training makes use of multiple unsegmented and segmented digital 3D models for historic case data. Prior to use, these 3D models may undergo some mesh cleanup and resampling. For our case data, many standard mesh cleanup operations were performed including hole filling, degenerate edge removal, island removal, etc. For computational efficiency during model training, mesh decimation was also performed to decrease the number of faces to a smaller number (roughly 3000). To increase the number of 3D mesh samples used to train the deep neural network, data augmentation techniques including nonuniform scale, vertex shift, and edge flipping were used. The unsegmented meshes as well as labels for each mesh edge were input into the MeshCNN framework. As is standard for deep learning models, the model was trained through a process that iteratively adjusts a set of weights to minimize the difference between predicted and actual segmentation labels. The trained model was then evaluated by predicting segmentation labels for a reserve set of cases that were not used during training and measuring accuracy. The model achieved 97% accuracy in correctly identifying edges as either belonging to teeth or gums.
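By way of illustration only, comparable cleanup and decimation could be performed with an open-source toolkit such as Open3D; the embodiments do not prescribe this tooling, and the sketch below is an assumption about one possible implementation.

```python
import open3d as o3d

def preprocess_mesh(path, target_faces=3000):
    mesh = o3d.io.read_triangle_mesh(path)
    # Standard cleanup operations (island removal could additionally use
    # cluster_connected_triangles() to drop small disconnected components).
    mesh.remove_duplicated_vertices()
    mesh.remove_degenerate_triangles()
    mesh.remove_duplicated_triangles()
    mesh.remove_unreferenced_vertices()
    # Decimate to roughly the face budget used during model training.
    return mesh.simplify_quadric_decimation(
        target_number_of_triangles=target_faces)
```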
2. Model Deployment
The model deployment stage makes use of the trained model that was developed during Step 1: Model Training. The trained model takes as input an unsegmented 3D scan for a new case. Any mesh cleanup or resampling that was performed on 3D meshes during the model training stage should also be applied to the new 3D scan data. The trained model outputs a set of labels that indicate, for each edge, whether the edge belongs to the “gum” or “tooth” class.
Examples of segmentation results for some upper and lower arches are shown in
The segmentation results created above were generated by assuming that edges in the mesh belonged to one of two classes: (1) tooth, (2) gum. Alternatively, the edges could be labeled as belonging to one of multiple classes, for example:
A deep learning model, such as MeshCNN, can be trained to label edges as belonging to one of multiple classes.
This method uses GDL to infer parts or segments of an object scanned using different scanning hardware. This method uses a machine learning approach to infer a segmentation of an input point cloud. These segments correspond to the individual teeth and gingiva (gums). The model was trained using a dataset of point clouds acquired using intra-oral scans (henceforth referred to as data points), embodied as collections of (x,y,z) coordinates of each point in the point cloud and the associated segmentation of the points into teeth and gingiva.
This map can later be used in other geometric operations. For instance, in the case of digital orthodontics, this model could be used to standardize the incoming point cloud in a coordinate system that is conducive for processing without the need for having manual input in the loop. This effectively and significantly reduces the processing time for each case and also reduces the need to train human workers to perform this task.
The flow chart in
The flow chart in
During training, both a point cloud and the associated segmentation are passed in, whereas during testing, only the point cloud is passed in.
Stages in the Workflow:
The (augmented) point cloud is passed through the machine learning model and the associated approximate segmentation is obtained. The steps related to the use of the machine learning model are provided below.
The training process uses training data, which is a set of pairs of data points and the associated segmentations. This data is assumed to be available before the creation of the model.
The methods pass batches randomly selected from the training data set into the model and compute the loss function. This loss function measures the dissimilarity between the ground truth segmentations and the predicted segmentations.
The methods infer the gradients from the computed loss function and update the weights of the model. This process is repeated either for a predefined number of iterations or until a certain objective criterion is met.
The methods can use a validation set available at the beginning of training. This data set is similar to the training data set, in that it is a set of paired data points and associated segmentations.
After a set number of training iterations, the methods pass the validation set to the model and compute the loss function value. This value serves as a measure of how well the model generalizes on unseen data. The validation loss values can be used as a criterion for stopping the training process.
These embodiments include, for example, the following.
Case complexity: Use a regression module to classify the complexity level of a treatment for a case given a scanned arch.
Case characteristics: Use a regression model to classify scanned arch meshes based on existing labels of case characteristics, such as bite relationship (Class 1, 2, or 3), bite (overbite, overjet, deep bite, anterior/posterior crossbite), midline offset, anterior leveling, spaces/crowding, arch form, and protocols applied (extrusion, expansion, distalization).
Predict treatment duration: Use a regression module to classify the complexity level of a treatment for a case given a scanned arch, which is later used to predict the amount of care and treatment time needed.
This embodiment includes a machine learning method to determine the relative pose or coordinate system for a 3D object with respect to a global frame of reference. Such a method has impact on problems such as orthodontic treatment planning.
The problem of determining pose of a 3D object is usually resolved using computational geometry approaches. 3D pose estimation from 2D images, especially of humans and human faces, is a well-studied problem. However, there are scenarios where the relative pose of a 3D object given a frame of reference is important, and information about the shape of the object in 3D is available. Traditionally, explicit description of shape features and matching to a template or registering to a template are used to determine the pose. For example, the Iterative Closest Point (ICP) algorithm can be used to register an observed target 3D shape to a canonical template. Then the inferred transformation matrix can be used to transform the pose of the reference template to the target shape.
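As a concrete illustration of this template-registration approach, the following sketch uses Open3D's ICP registration; the function name and distance threshold are assumptions for illustration.

```python
import numpy as np
import open3d as o3d

def pose_from_template(target_points, template_points, max_dist=1.0):
    # target_points / template_points: (N, 3) NumPy arrays of surface points.
    target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(target_points))
    template = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(template_points))
    result = o3d.pipelines.registration.registration_icp(
        template, target, max_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    # The 4x4 transformation maps the canonical template onto the target and
    # therefore carries the target's pose relative to the template frame.
    return result.transformation
```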
Deep learning methods directly applied on 3D shape representations have been used to solve two problems: 1) object classification; and 2) semantic segmentation or vertex/element-wise classification. Using similar techniques to predict the pose or coordinate system is possible. The requirement is that the model predicts a set of real numbers or a transformation matrix that represents the pose—position and orientation of the 3D object with respect to the global frame of reference. This can be represented by seven output parameters—3 for translation and 4 for the quaternion representation of the rotation. This is fewer than the 12 parameters that would be required to represent the full transformation matrix. However, this representation is not limiting, and other representations such as axis-angle or Euler angles can also be used.
Method: Given a large amount of training data of mesh geometry (e.g., mesh representations of teeth) and corresponding output transformation parameters as labels, a mesh-based or point-based deep learning model can be trained, for example using PointNet, PointCNN, etc. Additionally, during training, data augmentation can be performed on the input mesh, such as under-sampling, rotating, and permuting the points. This can help generate thousands of augmented input data samples from a single source, greatly increasing the chance that the algorithm delivers higher performance.
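A minimal PyTorch sketch of such a pose regressor with the 7-parameter output described above is given below; the layer sizes are illustrative assumptions, not the actual model.

```python
import torch
import torch.nn as nn

class PoseRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared per-point MLP followed by global max pooling (PointNet-style).
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 256, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 7),   # 3 translation + 4 quaternion parameters
        )

    def forward(self, points):   # points: (batch, 3, n_points)
        features = self.point_mlp(points).max(dim=2).values
        out = self.head(features)
        translation, quaternion = out[:, :3], out[:, 3:]
        quaternion = quaternion / quaternion.norm(dim=1, keepdim=True)
        return translation, quaternion
```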
The following are exemplary embodiments for coordinate system prediction: Method of receiving 3D point cloud or mesh data, using a machine learning algorithm to predict relative pose and position given a global frame of reference; and method of receiving 3D point cloud or mesh data, using a registration algorithm to align point cloud to a known set of one or more templates and then using the results to determine relative pose and position with respect to a global frame of reference.
These embodiments can be used, for example, where the 3D point cloud represents teeth and where the registration algorithm can be ICP, ICP with point to plane distance metric, etc.
These methods use GDL to infer the orientation/coordinate system of an object using only a point cloud obtained from its surface by different scanning hardware.
In these methods, a machine learning approach is used to infer a map between the point cloud and an associated coordinate system. An example of such an algorithm can use a modification of PointNet. The methods train the model using a dataset of point clouds acquired using intra-oral scans (referred to as data points), embodied as collections of (x,y,z) coordinates of each point in the point cloud and the associated coordinate systems, embodied in a six-dimensional representation. The model functions as a regression map between the point cloud domain and the coordinate system domain, in that, given a point cloud, the model infers the associated coordinate system.
This map can later be used in other geometric operations. For instance, in the case of digital orthodontics, this model can be used to standardize the incoming point cloud in a coordinate system that is conducive for processing without the need for having a human in the loop. This effectively and significantly reduces the processing time for each case and also reduces the need to train human workers to perform this task.
The flow chart of
The flow chart of
Method workflow: Input point clouds are obtained from segmenting teeth from a scanned arch. This point cloud is originally in a “global coordinate system.” The following are stages in the workflow:
The standardized point cloud is passed through the machine learning model and the associated approximate coordinate system is obtained. The steps related to the use of the machine learning model are provided below.
The training process uses training data, which is a set of pairs of data points and the associated coordinate systems. This data is assumed to be available before the creation of the model.
The methods pass batches randomly selected from the training data set into the model and compute the loss function. This loss function measures the dissimilarity between the ground truth coordinate systems and the predicted coordinate systems.
The methods infer the gradients from the computed loss function and update the weights of the model. This process is repeated either for a predefined number of iterations or until a certain objective criterion is met.
The methods assume that there is a validation set available at the beginning of training. This data set is similar to the training data set, in that it is a set of paired data points and associated coordinate systems.
After a set number of training iterations, the methods pass the validation set to the model and compute the loss function value. This value serves as a measure of how well the model generalizes on unseen data. The validation loss values can be used as a criterion for stopping the training process.
Below is the description of our experimental setup for this task, and the results for some of the teeth.
A set of 65 cases (possibly partially complete) was divided into a training and a validation set using a 4:1 split. Each case was a collection of a point cloud and its associated human-annotated coordinate system. The point clouds corresponding to these cases had variable input point densities and were non-homogeneous in their sizes as well. Only the (x,y,z) coordinates of the points were used as feature vectors.
These embodiments include, for example, the following.
Grouping doctors and preferences: Using an unsupervised approach such as clustering, providers (e.g., doctors, technicians, or others) are grouped based on their treatment preferences. Treatment preferences could be indicated in a treatment prescription form, or they could be based upon characteristics in treatment plans such as setup characteristics (e.g., amount of bite correction or midline correction in planned final setups), staging characteristics (e.g., treatment duration, tooth movement protocols, or overcorrection strategies), or outcomes (e.g., number of revisions/refinements).
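By way of a non-limiting illustration, such grouping could be performed with k-means clustering (scikit-learn); the feature columns and values below are invented for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per provider; the columns (mean planned bite correction in mm,
# mean midline correction in mm, mean treatment duration in months, and
# mean number of revisions) are assumed for illustration.
X = np.array([
    [1.2, 0.4, 14.0, 1.0],
    [0.3, 0.1, 9.0, 0.0],
    [1.1, 0.5, 15.0, 2.0],
])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(
    StandardScaler().fit_transform(X))
print(labels)  # cluster index (preference group) per provider
```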
Using a supervised approach and provider identifications in existing data, a recommendations system is trained for each provider based on their past preferences.
Using a supervised approach, long paragraphs of provider's (e.g., doctor's) notes are translated or converted into the correct order of procedures that setup technicians follow during treatment design.
The flow chart in
After provider preferences have been summarized by machine learning, the treatment planning algorithm considers those preferences and generates customized future treatment plans according to each provider's (e.g., doctor's) past treatments. The customization of treatment can reduce the number of revisions of plans between the doctors and technicians. The following Table provides an exemplary data structure for storing these provider preferences and customized treatment plans. The customized treatment plans can be stored in other ways and templates.
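The Table itself is not reproduced here. Purely as a hypothetical illustration of such a data structure, in which every field name and value is an assumption, one provider record might be organized as follows.

```python
# Hypothetical provider-preference record; not the actual Table layout.
provider_profile = {
    "provider_id": "D-0412",
    "preferences": {
        "bite_correction": "moderate",
        "midline_correction": True,
        "overcorrection_strategy": "staged",
    },
    "customized_plan_template": {
        "protocols": ["expansion", "distalization"],
        "estimated_duration_months": 12,
    },
}
```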
The methods and processes described herein can be implemented in, for example, software or firmware modules for execution by one or more processors such as processor 20. Information generated by the methods and processes can be displayed on a display device such as display device 16. If user interaction is required or desired for the methods and processes, such interaction can be provided via an input device such as input device 18.
The GDL and machine learning embodiments described herein can be combined in order to use the GDL processing for any combination of the embodiments, such as the following.
The mesh or model cleanup process described in Section I can be performed before or along with the dental restoration prediction process described in Section I to provide a cleaned up mesh or model to the dental restoration prediction process.
The mesh or model cleanup process described in Section I, along with the segmentation process described in Section II and the coordinate system process described in Section III, can be performed before or along with the dental restoration prediction process described in Section I to provide a cleaned up mesh or model with segmentation and a coordinate system to the dental restoration prediction process.
The mesh or model cleanup process described in Section I can be performed before or along with the dental restoration validation process described in Section I to provide a cleaned up mesh or model to the dental restoration validation process.
The mesh or model cleanup process described in Section I, along with the segmentation process described in Section II and the coordinate system process described in Section III, can be performed before or along with the dental restoration validation process described in Section I to provide a cleaned up mesh or model with segmentation and a coordinate system to the dental restoration validation process.
The dental restoration prediction process described in Section I can be performed with the dental restoration validation process described in Section I, one before the other or at least partially simultaneously.
The mesh or model cleanup process described in Section I can be performed before, or along with, the combination of the dental restoration prediction process and the dental restoration validation process described in Section I to provide a cleaned up mesh or model to the dental restoration prediction and validation processes.
The mesh or model cleanup process described in Section I, along with the segmentation process described in Section II and the coordinate system process described in Section III, can be performed before, or along with, the combination of the dental restoration prediction process and the dental restoration validation process described in Section I to provide a cleaned up mesh or model, segmented and with a coordinate system, to the dental restoration prediction and validation processes.
The mesh or model cleanup process described in Section I can be performed before or along with the segmentation process described in Section II to provide a cleaned up mesh or model to the segmentation process.
The mesh or model cleanup process described in Section I can be performed before or along with the coordinate system process described in Section III to provide a cleaned up mesh or model to the coordinate system process.
The segmentation process described in Section II can be performed with the coordinate system process described in Section III, one before the other or at least partially simultaneously, to provide a mesh or model both segmented and with a coordinate system.
The mesh or model cleanup process described in Section I can be performed before, or along with, the combination of the segmentation process described in Section II and the coordinate system process described in Section III to provide a cleaned up mesh or model to the segmentation process and the coordinate system process.
The mesh or model cleanup process described in Section I, the dental restoration prediction and validation processes described in Section I, the segmentation process described in Section II, and the coordinate system process described in Section III can be selectively used with the grouping providers process described in Section IV when generating the customized treatment plans.
PCT Filing: PCT/IB2021/061230, filed 12/2/2021 (WO)

Related U.S. Provisional Application: 63/124,263, filed December 2020 (US)