Embodiments of the present disclosure relate to the field of dentistry and, in particular, to systems and methods for determining three-dimensional (3D) data for 2D points in intraoral images.
Dental impressions of a subject's intraoral 3D surface, e.g., teeth and gingiva, are used for planning dental procedures. Traditional dental impressions are made using a dental impression tray filled with an impression material, e.g., PVS or alginate, into which the subject bites. The impression material then solidifies into a negative imprint of the teeth and gingiva, from which a 3D model of the teeth and gingiva can be formed.
Digital dental impressions utilize intraoral scanning to generate 3D digital models of an intraoral 3D surface of a subject. Digital intraoral scanners often use structured light 3D imaging. The surface of a subject's teeth may be highly reflective and somewhat translucent, which may reduce the contrast in the structured light pattern reflecting off the teeth. Therefore, in order to improve the capture of an intraoral scan, when using a digital intraoral scanner that utilizes structured light 3D imaging, a subject's teeth are frequently coated with an opaque powder prior to scanning in order to facilitate a usable level of contrast of the structured light pattern, e.g., in order to turn the surface into a scattering surface. While intraoral scanners utilizing structured light 3D imaging have made some progress, additional advantages may be had.
A few example implementations are summarized. These example implementations should not be construed as limiting.
In a first implementation, a method comprises: projecting, by one or more structured light projectors of an intraoral scanner, a light pattern comprising a plurality of projector rays onto a dental site; capturing, by a plurality of cameras of the intraoral scanner, a plurality of images of at least a portion of the light pattern projected onto the dental site, wherein each camera of the plurality of cameras captures an image of the plurality of images, the image comprising a plurality of points of at least the portion of the light pattern projected onto the dental site; determining, for each projector ray of the plurality of projector rays, one or more candidate points of the plurality of points that might have been caused by the projector ray; processing information for each projector ray using a trained machine learning model, wherein the trained machine learning model generates one or more outputs comprising, for each projector ray, and for each candidate point associated with the projector ray, a probability that the candidate point corresponds to the projector ray; and determining three-dimensional (3D) coordinates for at least some of the plurality of points in the plurality of images based on the one or more outputs of the trained machine learning model.
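By way of non-limiting illustration only, the following Python sketch outlines one possible flow consistent with the first implementation: candidate points are gathered per projector ray, scored by a stand-in for the trained machine learning model, and converted to 3D coordinates. All helper names (e.g., `Candidate`, `score_model`, `triangulate`) are hypothetical assumptions and are not part of the disclosed implementations.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

import numpy as np


@dataclass
class Candidate:
    point_2d: np.ndarray   # 2D pixel coordinates of the detected feature
    camera_index: int      # index of the camera that captured the feature
    ray_distance: float    # distance along the projector ray at the intersection


def triangulate(ray_origin: np.ndarray, ray_dir: np.ndarray, distance: float) -> np.ndarray:
    """3D coordinate at the given distance along a projector ray."""
    return ray_origin + distance * ray_dir / np.linalg.norm(ray_dir)


def solve_correspondence(
    projector_rays: Sequence[tuple],                   # (origin, direction) pairs
    candidates_per_ray: Sequence[Sequence[Candidate]],
    score_model: Callable[[int, Candidate], float],    # stand-in for the trained ML model
    min_probability: float = 0.5,
) -> dict:
    """Return {ray_index: 3D coordinate} for rays with a sufficiently probable candidate."""
    coords = {}
    for ray_idx, (origin, direction) in enumerate(projector_rays):
        best, best_p = None, 0.0
        for cand in candidates_per_ray[ray_idx]:
            p = score_model(ray_idx, cand)  # probability the candidate corresponds to the ray
            if p > best_p:
                best, best_p = cand, p
        if best is not None and best_p >= min_probability:
            coords[ray_idx] = triangulate(origin, direction, best.ray_distance)
    return coords
```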
A second implementation may further extend the first implementation. In the second implementation, each of the plurality of images is a two-dimensional (2D) image.
A third implementation may further extend any of the first or second implementations. In the third implementation, the method further comprises: determining, for each projector ray, and for each candidate point of the one or more candidate points that might have been caused by the projector ray, a distance at which the candidate point intersects with the projector ray, wherein the information for the projector ray that is input into the trained machine learning model comprises the distance.
A fourth implementation may further extend the third implementation. In the fourth implementation, the method further comprises: for each projector ray, grouping one or more candidate points from different images of the plurality of images for which the distance matches into a candidate intersection, wherein the candidate intersection comprises an intersection of the one or more candidate points with the projector ray.
A fifth implementation may further extend the fourth implementation. In the fifth implementation, the distances for candidate points match if they vary by less than a threshold amount.
A sixth implementation may further extend the fourth or fifth implementation. In the sixth implementation, the method further comprises: determining, for each candidate intersection, a triangulation point of the candidate intersection, wherein the information for the projector ray that is input into the trained machine learning model comprises the triangulation point.
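The grouping and triangulation of the fourth through sixth implementations may be pictured with the following hedged sketch, which clusters candidate points from different cameras whose along-ray distances agree within a threshold; the function and field names are illustrative assumptions only.

```python
def group_candidate_intersections(candidates, distance_threshold=0.2):
    """candidates: list of (camera_index, distance_along_ray, point_2d) for one projector ray.
    Returns groups of candidates whose along-ray distances differ by less than the threshold."""
    groups = []
    for cam_idx, dist, point in sorted(candidates, key=lambda c: c[1]):
        for group in groups:
            if abs(group["distance"] - dist) < distance_threshold:
                group["members"].append((cam_idx, point))
                n = len(group["members"])
                # keep a running mean as the group's representative intersection distance
                group["distance"] = (group["distance"] * (n - 1) + dist) / n
                break
        else:
            groups.append({"distance": dist, "members": [(cam_idx, point)]})
    return groups
```

A triangulation point for each resulting group could then be taken, for example, as the position along the projector ray at the group's representative distance.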
A seventh implementation may further extend any of the third through sixth implementations. In the seventh implementation, the one or more structured light projectors comprise a plurality of structured light projectors, the method further comprising: determining, for each projector ray, an index of a structured light projector of the plurality of structured light projectors that generated the projector ray, wherein the information for the projector ray that is input into the trained machine learning model comprises the index.
An eighth implementation may further extend the seventh implementation. In the eighth implementation, a first subset of the plurality of structured light projectors produces light having a first wavelength, and wherein a second subset of the plurality of structured light projectors produces light having a second wavelength, the method further comprising: determining the 3D coordinates for one or more points of a first subset of points of the plurality of points having the first wavelength; and independently determining the 3D coordinates for one or more additional points of a second subset of points of the plurality of points having the second wavelength.
A ninth implementation may further extend the eighth implementation. In the ninth implementation, the method further comprises: identifying one or more projector rays for which candidate points from at least one of the first subset of points or the second subset of points have not been selected; combining information for the first subset of points and the second subset of points; and determining the 3D coordinates for one or more additional points of the first subset of points and the 3D coordinates for one or more additional points of the second subset of points after combining the information.
A 10th implementation may further extend any of the third through 9th implementations. In the 10th implementation, the method further comprises: determining, for each projector ray, and for one or more candidate points associated with the projector ray, one or more properties associated with the projector ray and the one or more candidate points, wherein the information for the projector ray that is input into the trained machine learning model comprises the one or more properties.
An 11th implementation may further extend the 10th implementation. In the 11th implementation, the one or more properties comprise a distance from an epi-polar line associated with the projector ray.
A 12th implementation may further extend the 11th implementation. In the 12th implementation, the distance from the epi-polar line comprises an orthogonal distance from the epi-polar line.
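As a small worked example of the property in the 11th and 12th implementations, the orthogonal distance of a detected 2D point from an epi-polar line may be computed as follows; the line coefficients are assumed to come from scanner calibration and the function name is hypothetical.

```python
import numpy as np


def orthogonal_epipolar_distance(point_2d, line_abc):
    """Orthogonal distance of a 2D point (x, y) from the line a*x + b*y + c = 0."""
    a, b, c = line_abc
    x, y = point_2d
    return abs(a * x + b * y + c) / np.hypot(a, b)
```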
A 13th implementation may further extend any of the 10th through 12th implementations. In the 13th implementation, the one or more properties comprise, for an image associated with a candidate point, a triangulation error that is determined based on a distance between a camera that captured the image and an origin of the projector ray.
A 14th implementation may further extend any of the 10th through 13th implementations. In the 14th implementation, the one or more properties comprise an intensity associated with the captured point.
A 15th implementation may further extend any of the 10th through 14th implementations. In the 15th implementation, the captured point comprises a captured spot, and wherein the one or more properties comprise a spot size of the captured spot.
A 16th implementation may further extend any of the 10th through 15th implementations. In the 16th implementation, the one or more properties comprise a color of the dental site at the intersection of a candidate point with the projector ray as determined from one or more color images captured at least one of before or after capture of the plurality of images.
A 17th implementation may further extend any of the 1st through 16th implementations. In the 17th implementation, the method further comprises: generating a tuple for a projector ray comprising: distances and probabilities for one or more top candidate points for the projector ray; and distances and probabilities for one or more top candidate points for one or more additional projector rays that are proximate to the projector ray; and inputting the tuple into a second trained machine learning model, wherein the second trained machine learning model outputs an updated probability for one or more candidate points for the projector ray.
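One possible, purely illustrative layout of the tuple described in the 17th implementation is sketched below, where the top candidate (distance, probability) pairs for a projector ray and for its neighboring rays are flattened into a fixed-size vector for a second model; the padding scheme and names are assumptions.

```python
import numpy as np


def build_ray_tuple(ray_candidates, neighbor_candidates, k=3):
    """ray_candidates: list of (distance, probability) pairs for the projector ray.
    neighbor_candidates: list of such lists, one per neighboring projector ray."""
    def top_k(cands):
        best = sorted(cands, key=lambda c: c[1], reverse=True)[:k]
        best += [(0.0, 0.0)] * (k - len(best))  # pad so every ray contributes a fixed-size block
        return [value for pair in best for value in pair]

    features = top_k(ray_candidates)
    for neighbor in neighbor_candidates:
        features.extend(top_k(neighbor))
    return np.asarray(features, dtype=np.float32)
```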
An 18th implementation may further extend any of the 1st through 17th implementations. In the 18th implementation, the plurality of images are associated with a current frame, and wherein a previous plurality of images was generated at a prior frame prior to generation of the plurality of images, the method further comprising: determining, for a projector ray, a 3D coordinate associated with the projector ray for the prior frame; and updating, for a candidate point for the projector ray, the probability that the candidate point corresponds to the projector ray based on the 3D coordinate associated with the projector ray for the prior frame.
A 19th implementation may further extend any of the 1st through 18th implementations. In the 19th implementation, the method further comprises: using a second trained machine learning model to select candidate points for a plurality of projector rays based on one or more inputs comprising probabilities of candidate points corresponding to projector rays, wherein the 3D coordinates are determined based on the selected candidate points.
A 20th implementation may further extend any of the 1st through 19th implementations. In the 20th implementation, a computer readable medium comprises instructions that, when executed by a processing device, cause the processing device to perform the method of any of the 1st through 19th implementations.
A 21st implementation may further extend any of the 1st through 19th implementations. In the 21st implementation, an intraoral scanning system comprises: the intraoral scanner to generate the plurality of images; and a computing device, wherein the computing device is to perform the method of any of the 1st through 19th implementations.
In a 22nd implementation, a method comprises: projecting, by one or more structured light projectors of an intraoral scanner, a light pattern comprising a plurality of projector rays onto a dental site; capturing, by a plurality of cameras of the intraoral scanner, a plurality of images of at least a portion of the light pattern projected onto the dental site, wherein each camera of the plurality of cameras captures an image of the plurality of images, the image comprising a plurality of points of at least the portion of the light pattern projected onto the dental site; determining, for each projector ray of the plurality of projector rays, one or more candidate points of the plurality of points that might have been caused by the projector ray, each candidate point of the one or more candidate points having a determined probability of corresponding to the projector ray; using a trained machine learning model to select candidate points for a plurality of projector rays based on one or more inputs comprising probabilities of candidate points corresponding to projector rays; and determining three-dimensional (3D) coordinates for at least some of the plurality of points in the plurality of images based on the selected candidate points for the plurality of projector rays.
A 23rd implementation may further extend the 22nd implementation. In the 23rd implementation, the method further comprises: generating an input comprising a candidate point for a projector ray, one or more additional candidate points for the projector ray, and one or more additional projector rays for the candidate point; and providing the input to the trained machine learning model, wherein the trained machine learning model outputs a selection of the candidate point or one of the one or more additional candidate points for the projector ray.
A 24th implementation may further extend the 23rd implementation. In the 24th implementation, the method further comprises: removing the selected candidate point from association with the one or more additional projector rays that were associated with the selected candidate point.
A 25th implementation may further extend the 24th implementation. In the 25th implementation, the method further comprises: generating a next input comprising a next candidate point for a next projector ray, one or more next additional candidate points for the next projector ray, and one or more next additional projector rays for the next candidate point; providing the next input to the trained machine learning model, wherein the trained machine learning model outputs a selection of the next candidate point or one of the one or more next additional candidate points for the next projector ray; and removing the selected next candidate point from association with the one or more next additional projector rays that were associated with the selected next candidate point.
A 26th implementation may further extend the 25th implementation. In the 26th implementation, the method further comprises: repeating the generating of the next input, the providing of the next input to the trained machine learning model, and the removing of the selected next candidate point for a plurality of next additional projector rays until no remaining projector rays have an associated candidate point with at least a threshold probability.
A 27th implementation may further extend the 26th implementation. In the 27th implementation, the method further comprises: reducing the threshold probability to a second threshold probability; and repeating the generating of the next input, the providing of the next input to the trained machine learning model, and the removing of the selected next candidate point for a plurality of additional projector rays until no remaining projector rays have an associated candidate point with at least the second threshold probability.
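The iterative selection of the 25th through 27th implementations may be summarized by the following hedged sketch, in which a candidate is selected, the corresponding point is removed from competing projector rays, and the probability threshold is lowered once no candidate clears the current threshold; the data layout is an assumption.

```python
def iterative_selection(ray_to_candidates, thresholds=(0.9, 0.7, 0.5)):
    """ray_to_candidates: {ray_id: {point_id: probability}}; mutated in place.
    Returns {ray_id: point_id} for the selected pairings."""
    selected = {}
    for threshold in thresholds:          # lower the bar once no candidate clears it
        progress = True
        while progress:
            progress = False
            for ray_id, cands in ray_to_candidates.items():
                if ray_id in selected or not cands:
                    continue
                point_id, prob = max(cands.items(), key=lambda kv: kv[1])
                if prob >= threshold:
                    selected[ray_id] = point_id
                    # the chosen point can no longer explain any other projector ray
                    for other_cands in ray_to_candidates.values():
                        other_cands.pop(point_id, None)
                    progress = True
    return selected
```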
A 28th implementation may further extend any of the 23rd through 27th implementations. In the 28th implementation, the method further comprises: generating a first list associating projector rays with candidate intersections, wherein each candidate intersection comprises an intersection of a projector ray of the plurality of projector rays and a candidate point of the one or more candidate points that might have been caused by the projector ray, the first list comprising, for each projector ray, one or more candidate intersections associated with the projector ray; and generating a second list of the plurality of points, the second list comprising, for each point of the plurality of points, one or more candidate intersections associated with the point; wherein at least one of the first list or the second list is used to generate the input.
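The two lists of the 28th implementation can be illustrated as simple lookup structures, one keyed by projector ray and one keyed by detected point, both referencing the same candidate intersections; this is a minimal sketch with an assumed tuple ordering.

```python
from collections import defaultdict


def build_lookup_lists(candidate_intersections):
    """candidate_intersections: iterable of (ray_id, point_id, probability) tuples."""
    by_ray, by_point = defaultdict(list), defaultdict(list)
    for intersection in candidate_intersections:
        ray_id, point_id, _probability = intersection
        by_ray[ray_id].append(intersection)      # first list: ray -> candidate intersections
        by_point[point_id].append(intersection)  # second list: point -> candidate intersections
    return by_ray, by_point
```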
A 29th implementation may further extend any of the 22nd through 28th implementations. In the 29th implementation, a computer readable medium comprises instructions that, when executed by a processing device, cause the processing device to perform the method of any of the 22nd through 28th implementations.
A 30th implementation may further extend any of the 22nd through 28th implementations. In the 30th implementation, an intraoral scanning system comprises: the intraoral scanner to generate the plurality of images; and a computing device, wherein the computing device is to perform the method of any of the 22nd through 28th implementations.
In a 31st implementation, a method comprises: using a first trained machine learning model to determine probabilities that captured points of a captured light pattern in one or more images correspond to projected points of a projected light pattern; using a second trained machine learning model to determine correspondence between a plurality of the captured points and a plurality of the projected points based on one or more of the determined probabilities; and determining depth information for at least some of the plurality of captured points based on the determined correspondence.
A 32nd implementation may further extend the 31st implementation. In the 32nd implementation, a computer readable medium comprises instructions that, when executed by a processing device, cause the processing device to perform the method of the 31st implementation.
A 33rd implementation may further extend the 31st implementation. In the 33rd implementation, an intraoral scanning system comprises: the intraoral scanner to generate the plurality of images; and a computing device, wherein the computing device is to perform the method of the 31st implementation.
In a 34th implementation, a method comprises: using one or more trained machine learning models to determine correspondence between captured points of a captured light pattern in one or more images and projected points of a projected light pattern; and determining depth information for at least some of the plurality of captured points based on the determined correspondence.
A 35th implementation may further extend the 34th implementation. In the 35th implementation, a computer readable medium comprises instructions that, when executed by a processing device, cause the processing device to perform the method of the 34th implementation.
A 36th implementation may further extend the 34th implementation. In the 36th implementation, an intraoral scanning system comprises: the intraoral scanner to generate the plurality of images; and a computing device, wherein the computing device is to perform the method of the 34th implementation.
In a 37th implementation, a method comprises: projecting, by one or more structured light projectors of an intraoral scanner, a light pattern comprising a plurality of projector rays onto a dental site, wherein the plurality of projector rays form a plurality of features of the light pattern; capturing, by a plurality of cameras of the intraoral scanner, a plurality of images of at least a portion of the light pattern projected onto the dental site, wherein each camera of the plurality of cameras captures an image of the plurality of images, the image comprising a subset of the plurality of features; determining, for each projector ray of the plurality of projector rays, one or more candidate points of the plurality of features that might have been caused by the projector ray, each combination of a candidate point and a projector ray corresponding to a candidate intersection; processing information for each candidate intersection to determine a probability that the candidate point of the candidate intersection corresponds to the projector ray of the candidate intersection; selecting a subset of candidate intersections based at least in part on determined probabilities; and determining three-dimensional (3D) coordinates for at least some of the plurality of features according to the selected subset of the candidate intersections.
A 38th implementation may further extend the 37th implementation. In the 38th implementation, processing the information for a projector ray is performed using a trained machine learning model, wherein the trained machine learning model generates one or more outputs comprising, for each candidate intersection, the probability that the candidate point of the candidate intersection corresponds to the projector ray of the candidate intersection.
A 39th implementation may further extend any of the 37th through 38th implementations. In the 39th implementation, the selecting the subset of the candidate intersections is performed using a trained machine learning model based on one or more inputs comprising the determined probabilities.
A 40th implementation may further extend any of the 37th through 39th implementations. In the 40th implementation, each of the plurality of images is a two-dimensional (2D) image.
A 41st implementation may further extend any of the 37th through 40th implementations. In the 41st implementation, the light pattern comprises a pattern of spots, and wherein the plurality of features comprises a plurality of discrete unconnected spots.
A 42nd implementation may further extend any of the 37th through 40th implementations. In the 42nd implementation, the light pattern comprises a checkerboard pattern, and wherein the plurality of features comprises a plurality of regions of the checkerboard pattern.
A 43rd implementation may further extend any of the 37th through 42nd implementations. In the 43rd implementation, the processing the information is performed using a first trained machine learning model, and wherein the selecting the subset of the candidate intersections is performed using a second trained machine learning model.
A 44th implementation may further extend any of the 37th through 43rd implementations. In the 44th implementation, the method further comprises: determining, for a feature of the plurality of features, that no projector ray of the plurality of projector rays has a candidate intersection for the feature with a probability that meets a probability threshold; and removing the feature from consideration.
A 45th implementation may further extend any of the 37th through 44th implementations. In the 45th implementation, the method further comprises: determining, for a projector ray of the plurality of projector rays, that no candidate intersection associated with the projector ray has a probability that meets a probability threshold; and removing the projector ray from consideration, wherein no 3D coordinate is determined for the projector ray.
A 46th implementation may further extend any of the 37th through 45th implementations. In the 46th implementation, the method further comprises: determining, for a projector ray of the plurality of projector rays, that a first candidate intersection associated with the projector ray has a first probability, that a second candidate intersection associated with the projector ray has a second probability, and that a delta between the first probability and the second probability is less than a threshold; and removing the projector ray from consideration, wherein no 3D coordinate is determined for the projector ray.
A 47th implementation may further extend the 46th implementation. In the 47th implementation, the first probability and the second probability are each at or above a probability threshold.
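The ambiguity test of the 46th and 47th implementations may be illustrated as follows: if the two most probable candidate intersections for a projector ray both meet the probability threshold and are nearly tied, the ray is removed from consideration. The threshold values and function name are illustrative assumptions.

```python
def is_ambiguous(probabilities, probability_threshold=0.5, delta_threshold=0.1):
    """probabilities: probabilities of all candidate intersections for one projector ray."""
    top = sorted(probabilities, reverse=True)[:2]
    if len(top) < 2:
        return False
    first, second = top
    return (first >= probability_threshold
            and second >= probability_threshold
            and (first - second) < delta_threshold)
```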
A 48th implementation may further extend any of the 37th through 47th implementations. In the 48th implementation, at most one candidate intersection is determined for each projector ray of the plurality of projector rays.
A 49th implementation may further extend any of the 37th through 48th implementations. In the 49th implementation, the plurality of projector rays are arranged in a known grid pattern (e.g., in a known order), the method further comprising: determining, for one or more remaining projector rays for which a candidate intersection has not been selected, one or more candidate intersections that fail to preserve the known order; and removing the one or more candidate intersections from consideration.
A 50th implementation may further extend the 49th implementation. In the 50th implementation, the method further comprises: determining, for each remaining projector ray of the one or more remaining projector rays, and for each remaining candidate intersection associated with the remaining projector ray, an updated probability that the remaining candidate intersection corresponds to the remaining projector ray.
A 51st implementation may further extend any of the 37th through 50th implementations. In the 51st implementation, the plurality of projector rays are arranged in a first known order along a first axis and in a second known order along a second axis, the method further comprising: determining, for one or more remaining projector rays for which a candidate intersection has not been selected, one or more candidate intersections that fail to preserve at least one of the first known order or the second known order; and removing the one or more candidate intersections from consideration.
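A hedged sketch of the order-preservation constraint of the 49th and 51st implementations is shown below: a candidate for a remaining projector ray is kept only if its image coordinate falls between the features of the nearest already-solved rays along a grid axis. The monotonicity assumption and helper name are illustrative only.

```python
def preserves_order(ray_grid_x, candidate_image_x, solved):
    """solved: {grid_x_of_solved_ray: image_x_of_its_selected_feature}.
    The candidate must lie between the features of the nearest solved rays on either side."""
    left = [img_x for gx, img_x in solved.items() if gx < ray_grid_x]
    right = [img_x for gx, img_x in solved.items() if gx > ray_grid_x]
    if left and candidate_image_x <= max(left):
        return False
    if right and candidate_image_x >= min(right):
        return False
    return True
```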
A 52nd implementation may further extend any of the 37th through 51st implementations. In the 52nd implementation, the plurality of projector rays are arranged in a hexagonal grid pattern.
A 53rd implementation may further extend the 52nd implementation. In the 53rd implementation, the method further comprises: performing an affine transformation to at least one of the plurality of projector rays or the plurality of features to transform the hexagonal grid pattern into a rectangular grid pattern, wherein the first axis and the second axis are axes of the rectangular grid pattern.
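The affine transformation of the 53rd implementation may be illustrated as follows, assuming a hexagonal lattice with unit spacing: the inverse of the lattice basis matrix maps hexagonal-grid positions onto rectangular (row/column) coordinates. The basis and spacing are assumptions made for illustration.

```python
import numpy as np

# Columns are the two lattice basis vectors of a unit-spacing hexagonal grid.
HEX_BASIS = np.array([[1.0, 0.5],
                      [0.0, np.sqrt(3) / 2.0]])


def hex_to_rect(points_xy):
    """points_xy: (N, 2) positions on the hexagonal grid.
    Returns (N, 2) coordinates in a rectangular (row/column) grid frame."""
    return np.asarray(points_xy) @ np.linalg.inv(HEX_BASIS).T
```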
A 54th implementation may further extend any of the 37th through 53rd implementations. In the 54th implementation, a computer readable medium comprises instructions that, when executed by a processing device, cause the processing device to perform the method of any of the 37th through 53rd implementations.
A 55th implementation may further extend any of the 37th through 53rd implementations. In the 55th implementation, an intraoral scanning system comprises: the intraoral scanner to generate the plurality of images; and a computing device, wherein the computing device is to perform the method of any of the 37th through 53rd implementations.
In a 56th implementation, a method comprises: receiving probabilities relating structured light features in one or more images captured by one or more cameras of an intraoral scanner with projector rays of a light pattern projected by one or more structured light projectors of the intraoral scanner; determining three-dimensional (3D) coordinates of a subset of the structured light features by associating the subset of the structured light features with a subset of the projector rays based on the received probabilities; constraining projector ray candidates for non-associated structured light features by removing one or more projector ray candidates for non-associated structured light features that do not preserve order with the subset of the structured light features associated with the subset of the projector rays; and solving, after constraining the projector ray candidates for the non-associated structured light features, for 3D coordinates of at least a subset of the non-associated structured light features by associating at least the subset of the non-associated structured light features with a subset of the non-associated projector rays.
A 57th implementation may further extend the 56th implementation. In the 57th implementation, each of the one or more images is a two-dimensional (2D) image.
A 58th implementation may further extend any of the 56th through 57th implementations. In the 58th implementation, the light pattern comprises a pattern of spots, and wherein the structured light features comprise a plurality of discrete unconnected spots.
A 59th implementation may further extend any of the 56th through 57th implementations. In the 59th implementation, the light pattern comprises a checkerboard pattern, and wherein the structured light features comprise a plurality of regions of the checkerboard pattern.
A 60th implementation may further extend any of the 56th through 59th implementations. In the 60th implementation, the method further comprises: determining, for a feature of the structured light features, that no projector ray candidate has a probability associating the projector ray candidate with the feature that meets a probability threshold; and removing the feature from consideration.
A 61st implementation may further extend any of the 56th through 60th implementations. In the 61st implementation, the method further comprises: determining, for a feature of the structured light features, that a first projector ray candidate has a first probability associating the first projector ray candidate with the feature, that a second projector ray candidate has a second probability associating the second projector ray candidate with the feature, and that a delta between the first probability and the second probability is less than a threshold; and removing the feature from consideration, wherein no 3D coordinate is solved for the feature.
A 62nd implementation may further extend the 61st implementation. In the 62nd implementation, the first probability and the second probability are each at or above a probability threshold.
A 63rd implementation may further extend any of the 56th through 62nd implementations. In the 63rd implementation, at most one projector ray is associated with each structured light feature.
A 64th implementation may further extend any of the 56th through 63rd implementations. In the 64th implementation, the method further comprises: determining updated probabilities associating non-associated structured light features in the one or more images with non-associated projector rays of the light pattern responsive to constraining the projector ray candidates.
A 65th implementation may further extend any of the 56th through 64th implementations. In the 65th implementation, the projector rays are arranged in a hexagonal grid pattern.
A 66th implementation may further extend any of the 56th through 65th implementations. In the 66th implementation, the method further comprises: performing an affine transformation to at least one of the projector rays or the structured light features to transform the hexagonal grid pattern into a rectangular grid pattern, wherein the constraining is performed after the affine transformation.
A 67th implementation may further extend any of the 56th through 66th implementations. In the 67th implementation, a computer readable medium comprises instructions that, when executed by a processing device, cause the processing device to perform the method of any of the 56th through 66th implementations.
A 68th implementation may further extend any of the 56th through 66th implementations. In the 68th implementation, an intraoral scanning system comprises: the intraoral scanner to generate the plurality of images; and a computing device, wherein the computing device is to perform the method of any of the 56th through 66th implementations.
In a 69th implementation, a method comprises: receiving candidate pairings of structured light features in one or more images captured by one or more cameras of an intraoral scanner with projector rays of a light pattern projected by one or more structured light projectors of the intraoral scanner, each of the candidate pairings comprising a probability that a structured light feature corresponds to a projector ray; removing structured light features from consideration that have no candidate pairings with probabilities that are at or above a threshold; and solving for 3D coordinates of at least a subset of the structured light features by selecting candidate pairings for the subset of the structured light features based at least in part on the probabilities.
A 70th implementation may further extend the 69th implementation. In the 70th implementation, the method further comprises: determining, for a feature of the structured light features, that a first candidate pairing has a first probability associating a first projector ray with the feature, that a second candidate pairing has a second probability associating a second projector ray with the feature, and that a delta between the first probability and the second probability is less than a threshold; and removing the feature from consideration, wherein no 3D coordinate is solved for the feature.
A 71st implementation may further extend any of the 69th through 70th implementations. In the 71st implementation, a computer readable medium comprises instructions that, when executed by a processing device, cause the processing device to perform the method of any of the 69th through 70th implementations.
A 72nd implementation may further extend any of the 69th through 70th implementations. In the 72nd implementation, an intraoral scanning system comprises: the intraoral scanner to generate the plurality of images; and a computing device, wherein the computing device is to perform the method of any of the 69th through 70th implementations.
In a 73rd implementation, a method comprises: determining candidate pairings of structured light features in one or more images captured by one or more cameras of an intraoral scanner with projector rays of a light pattern projected by one or more structured light projectors of the intraoral scanner; determining, for each candidate pairing of the candidate pairings, a probability that a structured light feature of the candidate pairing corresponds to a projector ray of the candidate pairing; and determining 3D coordinates of at least a subset of the structured light features by selecting candidate pairings based at least in part on determined probabilities.
A 74th implementation may further extend the 73rd implementation. In the 74th implementation, the method further comprises: removing one or more candidate pairings for which a known order of structured light features is not preserved; and solving for 3D coordinates of one or more additional structured light features by selecting one or more remaining candidate pairings for the one or more additional structured light features.
A 75th implementation may further extend any of the 73rd through 74th implementations. In the 75th implementation, the method further comprises: removing structured light features from consideration that have no candidate pairings with probabilities that are at or above a threshold.
A 76th implementation may further extend any of the 73rd through 75th implementations. In the 76th implementation, the method further comprises: determining, for a feature of the structured light features, that a first candidate pairing has a first probability associating a first projector ray with the feature, that a second candidate pairing has a second probability associating a second projector ray with the feature, and that a delta between the first probability and the second probability is less than a threshold; and removing the feature from consideration, wherein no 3D coordinate is solved for the feature.
A 77th implementation may further extend any of the 73rd through 76th implementations. In the 77th implementation, determining a probability that a structured light feature of a candidate pairing corresponds to a projector ray of the candidate pairing comprises: processing information for the candidate pairing using a trained machine learning model, wherein the trained machine learning model generates an output comprising the probability that the structured light feature corresponds to the projector ray.
A 78th implementation may further extend any of the 73rd through 77th implementations. In the 78th implementation, selecting the candidate pairings based at least in part on the determined probabilities comprises processing one or more inputs comprising one or more of the determined probabilities using a trained machine learning model, wherein the trained machine learning model outputs one or more selections of candidate pairings.
A 79th implementation may further extend any of the 73rd through 78th implementations. In the 79th implementation, each of the one or more images is a two-dimensional (2D) image.
An 80th implementation may further extend any of the 73rd through 78th implementations. In the 80th implementation, a computer readable medium comprises instructions that, when executed by a processing device, cause the processing device to perform the method of any of the 73rd through 78th implementations.
An 81st implementation may further extend any of the 73rd through 78th implementations. In the 81st implementation, an intraoral scanning system comprises: the intraoral scanner to generate the plurality of images; and a computing device, wherein the computing device is to perform the method of any of the 73rd through 78th implementations.
Embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings.
Described herein are methods and systems for processing images of dental surfaces illuminated by structured light and determining depth information for points on the dental surfaces based on captured points (e.g., structured light features) of the structured light pattern in the images. Reconstruction of a 3D point cloud from points in 2D images (e.g., spots in 2D images) requires solving a complex problem of correspondence matching between 2D points in the images and corresponding projector rays of structured light projected onto a dental surface. Embodiments disclosed herein apply machine learning techniques to assign probability scores to pairs of projector rays and candidate points (e.g., structured light features) that might have been caused by the projector rays, where such pairs may be referred to herein as candidate intersections or simply candidates. In embodiments, properties (also referred to as features) associated with candidate intersections are determined based on extraction of information from images and/or information about the cameras and/or structured light projectors associated with the candidate intersections. The extracted information may relate to matching quality, and may be processed by a trained machine learning model to assign, for each candidate intersection, a probability of the candidate point of the candidate intersection having been caused by the projector ray of the candidate intersection. In embodiments, a second trained machine learning model processes probabilities of candidate intersections being correct to select candidate intersections (e.g., to select, for each projector ray, one of the candidate points associated with that projector ray). Embodiments select candidate intersections in a manner that chooses “best” candidates for given projector rays while optimizing a global score summation. Accordingly, in embodiments a best candidate intersection may not be the highest probability candidate intersection for a given projector ray, but may instead be a candidate intersection that results in a highest combined probability of a set of candidate points corresponding to associated projector rays. As used herein, a candidate point for a projector ray is a point or structured light feature that might have been caused by the projector ray. A candidate point for a projector ray corresponds to a candidate intersection that includes the projector ray and the point/feature.
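The notion of optimizing a global score summation, rather than greedily taking each projector ray's highest-probability candidate, can be illustrated with a generic one-to-one assignment solver. This is offered only as an analogy for the selection behavior described above (which in embodiments is performed by a second trained machine learning model), not as a description of the disclosed selection logic; the probability values are made up for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# prob[i, j] = probability that detected point j was caused by projector ray i
prob = np.array([[0.90, 0.85, 0.05],
                 [0.88, 0.10, 0.05],
                 [0.05, 0.10, 0.80]])

rays, points = linear_sum_assignment(prob, maximize=True)
# Ray 0 gives up its locally best point 0 (0.90) in favor of point 1 (0.85),
# because only point 0 explains ray 1 well; the total 0.85 + 0.88 + 0.80
# exceeds any assignment that hands ray 0 its top choice.
print(list(zip(rays, points)), prob[rays, points].sum())
```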
Embodiments provide a system and method for solving a correspondence problem between projector rays (e.g., points output by a structured light projector) and camera rays (e.g., points in images captured by cameras). In embodiments, the correspondence problem is solved with a maximal solve rate, a minimal error rate, and a minimal computation power usage. In embodiments, the correspondence problem is solved even in instances where some projector rays in the field of view of one or more cameras may be missed due, for example, to low signal quality. Additionally, in embodiments the correspondence problem is solved even in instances where some detected points are false points due to noise. In embodiments, the correspondence problem is solved in real time or near-real time as images are generated. Accordingly, as images are captured by an intraoral scanner, processing logic may solve the correspondence problem and determine 3D coordinates for points in the captured images to form point clouds. The point clouds may constitute intraoral scans, and may be stitched to previously captured 3D point clouds/intraoral scans and/or to a 3D surface generated from such previously captured 3D point clouds/intraoral scans.
Various embodiments are described herein. It should be understood that these various embodiments may be implemented as stand-alone solutions and/or may be combined. Accordingly, references to an embodiment, or one embodiment, may refer to the same embodiment and/or to different embodiments. Some embodiments are discussed herein with reference to intraoral scans and intraoral images. However, it should be understood that embodiments described with reference to intraoral scans also apply to lab scans or model/impression scans. A lab scan or model/impression scan may include one or more images of a dental site or of a model or impression of a dental site. Various embodiments are discussed with regards to candidate intersections. It should be understood that these embodiments also apply equally to candidate intersection groups, which are groups of candidate intersections from different images/cameras for which a same or similar intersection distance of a projector ray is determined.
Computing device 105 may be coupled to one or more intraoral scanners 150 (also referred to as a scanner) and/or a data store 125 via a wired or wireless connection. In one embodiment, multiple scanners 150 in dental office 108 wirelessly connect to computing device 105. In one embodiment, scanner 150 is wirelessly connected to computing device 105 via a direct wireless connection. In one embodiment, scanner 150 is wirelessly connected to computing device 105 via a wireless network. In one embodiment, the wireless network is a Wi-Fi network. In one embodiment, the wireless network is a Bluetooth network, a Zigbee network, or some other wireless network. In one embodiment, the wireless network is a wireless mesh network, examples of which include a Wi-Fi mesh network, a Zigbee mesh network, and so on. In an example, computing device 105 may be physically connected to one or more wireless access points and/or wireless routers (e.g., Wi-Fi access points/routers). Intraoral scanner 150 may include a wireless module such as a Wi-Fi module, and via the wireless module may join the wireless network via the wireless access point/router.
Computing device 106 may also be connected to a data store (not shown). The data stores may include local data stores and/or remote data stores. Computing device 105 and computing device 106 may each include one or more processing devices, memory, secondary storage, one or more input devices (e.g., such as a keyboard, mouse, tablet, touchscreen, microphone, camera, and so on), one or more output devices (e.g., a display, printer, touchscreen, speakers, etc.), and/or other hardware components.
Computing device 105 and/or data store 125 may be located at dental office 108 (as shown), at dental lab 110, or at one or more other locations such as a server farm that provides a cloud computing service. Computing device 105 and/or data store 125 may connect to components that are at a same or a different location from computing device 105 (e.g., components at a second location that is remote from the dental office 108, such as a server farm that provides a cloud computing service). For example, computing device 105 may be connected to a remote server, where some operations of intraoral scan application 115 are performed on computing device 105 and some operations of intraoral scan application 115 are performed on the remote server.
Intraoral scanner 150 may include a probe (e.g., a hand held probe) for optically capturing three-dimensional structures. The intraoral scanner 150 may be used to perform an intraoral scan of a patient's oral cavity. An intraoral scan application 115 running on computing device 105 may communicate with the scanner 150 to effectuate the intraoral scan. A result of the intraoral scan may be intraoral scan data 135A, 135B through 135N that may include one or more sets of intraoral scans and/or sets of intraoral 2D images. Each intraoral scan may include a 3D image or point cloud that may include depth information of a portion of a dental site. In embodiments, intraoral scans include x, y and z information.
A captured 3D image or point cloud may be generated based on multiple 2D images captured in parallel (e.g., at the same time) by different cameras. Scanner 150 may include one or more structured light projectors that output structured light at one or a few wavelengths, which may illuminate a dental site with the structured light. Multiple cameras of the scanner 150 may capture images of the dental site illuminated by the structured light from different angles. Captured images may be 2D images of the dental site illuminated by the structured light. Triangulation may be performed to determine the depth information about the 2D images. For example, each point of the structured light captured in one or more 2D images may have known 2D coordinates, but may initially lack depth information. The depth information may be determined using triangulation based on known information about a location of an origin of a projector ray of a structured light projector that caused a point to appear on the dental site and a location of a camera sensor that captured the point. Additionally, for points captured in multiple images (e.g., each image captured by a different camera), known locations of camera sensors of the two cameras may additionally or alternatively be used to perform triangulation and determine the depth of the point. However, it can be difficult to determine which projector rays correspond to which points (e.g., which spots or other structured light features) in the captured 2D images. Accordingly, in embodiments intraoral scan application 115 and/or other logic processes captured images to determine which points in captured images correspond to which projector rays of the structured light projector(s). Details for solving such a correspondence problem are set forth in greater detail below.
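As a minimal illustration of the triangulation described above, the depth of a captured point may be recovered from the calibrated projector ray and the camera ray through the detected pixel by finding the midpoint of the shortest segment between the two (generally skew) rays; the ray origins and directions are assumed to be known from calibration, and the function name is hypothetical.

```python
import numpy as np


def triangulate_rays(p0, d0, p1, d1):
    """p0, d0: projector ray origin/direction; p1, d1: camera ray origin/direction.
    Returns the 3D point midway between the closest points on the two rays."""
    d0 = d0 / np.linalg.norm(d0)
    d1 = d1 / np.linalg.norm(d1)
    w = p0 - p1
    a, b, c = d0 @ d0, d0 @ d1, d1 @ d1
    d, e = d0 @ w, d1 @ w
    denom = a * c - b * b
    if abs(denom) < 1e-12:          # near-parallel rays: triangulation is unstable
        return None
    t0 = (b * e - c * d) / denom    # parameter along the projector ray
    t1 = (a * e - b * d) / denom    # parameter along the camera ray
    return 0.5 * ((p0 + t0 * d0) + (p1 + t1 * d1))
```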
Intraoral scan data 135A-N may also include color 2D images and/or images of particular wavelengths (e.g., near-infrared (NIRI) images, infrared images, ultraviolet images, etc.) of a dental site in embodiments. In embodiments, intraoral scanner 150 alternates between generation of 3D intraoral scans (e.g., in which structured light is projected and 2D images of a dental site illuminated by the structured light are captured and processed to determine 3D point clouds) and one or more types of 2D intraoral images (e.g., color images, NIRI images, etc.) during scanning. For example, one or more 2D color images may be generated between generation of a fourth and fifth intraoral scan by outputting white light and capturing reflections of the white light using multiple cameras.
Intraoral scanner 150 may include multiple different cameras (e.g., each of which may include one or more image sensors) that generate additional 2D images (e.g., 2D color images) of different regions of a patient's dental arch concurrently. Intraoral 2D images may include 2D color images, 2D infrared or near-infrared (NIRI) images, and/or 2D images generated under other specific lighting conditions (e.g., 2D ultraviolet images). The 2D images may be used by a user of the intraoral scanner to determine where the scanning face of the intraoral scanner is directed and/or to determine other information about a dental site being scanned. The 2D images may also be used to apply a texture mapping to a 3D surface and/or 3D model of the dental site generated from the intraoral scans.
The scanner 150 may transmit the intraoral scan data 135A, 135B through 135N to the computing device 105. Computing device 105 may store some or all of the intraoral scan data 135A-135N in data store 125. In some embodiments, intraoral scan application 115 processes the intraoral scan data 135A-N to determine which points in captured 2D images correspond to which projector rays of structured light projectors, and to ultimately generate a 3D point cloud based on the points in the 2D images. The process of solving the correspondence problem and determining which points correspond to which projector rays is described in greater detail below.
According to an example, a user (e.g., a practitioner) may subject a patient to intraoral scanning. In doing so, the user may apply scanner 150 to one or more patient intraoral locations. The scanning may be divided into one or more segments (also referred to as roles). As an example, the segments may include a lower dental arch of the patient, an upper dental arch of the patient, one or more preparation teeth of the patient (e.g., teeth of the patient to which a dental device such as a crown or other dental prosthetic will be applied), one or more teeth which are contacts of preparation teeth (e.g., teeth not themselves subject to a dental device but which are located next to one or more such teeth or which interface with one or more such teeth upon mouth closure), and/or patient bite (e.g., scanning performed with closure of the patient's mouth with the scan being directed towards an interface area of the patient's upper and lower teeth). Via such scanner application, the scanner 150 may provide intraoral scan data 135A-N to computing device 105. The intraoral scan data 135A-N may be provided in the form of intraoral scan data sets, each of which may include 2D intraoral images (e.g., color 2D images) and/or 3D intraoral scans (e.g., based on 2D images of a dental site illuminated by structured light) of particular teeth and/or regions of a dental site. In one embodiment, separate intraoral scan data sets are created for the maxillary arch, for the mandibular arch, for a patient bite, and/or for each preparation tooth. Alternatively, a single large intraoral scan data set is generated (e.g., for a mandibular and/or maxillary arch). Intraoral scans may be provided from the scanner 150 to the computing device 105 in the form of one or more points (e.g., one or more pixels and/or groups of pixels). For instance, the scanner 150 may provide an intraoral scan as one or more point clouds. The intraoral scans may each comprise height information (e.g., a height map that indicates a depth for each pixel). In some embodiments, the intraoral scans include multiple 2D images of a dental site illuminated by one or more structured light projectors, which are then processed to generate a 3D point cloud. The processing of the 2D images may be performed on the scanner 150 before transmission to the computing device 105 or may be performed on the computing device 105 after receipt of the 2D images from scanner 150.
The manner in which the oral cavity of a patient is to be scanned may depend on the procedure to be applied thereto. For example, if an upper or lower denture is to be created, then a full scan of the mandibular or maxillary edentulous arches may be performed. In contrast, if a bridge is to be created, then just a portion of a total arch may be scanned which includes an edentulous region, the neighboring preparation teeth (e.g., abutment teeth) and the opposing arch and dentition. Alternatively, full scans of upper and/or lower dental arches may be performed if a bridge is to be created.
By way of non-limiting example, dental procedures may be broadly divided into prosthodontic (restorative) and orthodontic procedures, and then further subdivided into specific forms of these procedures. Additionally, dental procedures may include identification and treatment of gum disease, sleep apnea, and intraoral conditions. The term prosthodontic procedure refers, inter alia, to any procedure involving the oral cavity and directed to the design, manufacture or installation of a dental prosthesis at a dental site within the oral cavity (dental site), or a real or virtual model thereof, or directed to the design and preparation of the dental site to receive such a prosthesis. A prosthesis may include any restoration such as crowns, veneers, inlays, onlays, implants and bridges, for example, and any other artificial partial or complete denture. The term orthodontic procedure refers, inter alia, to any procedure involving the oral cavity and directed to the design, manufacture or installation of orthodontic elements at a dental site within the oral cavity, or a real or virtual model thereof, or directed to the design and preparation of the dental site to receive such orthodontic elements. These elements may be appliances including but not limited to brackets and wires, retainers, clear aligners, or functional appliances.
In embodiments, intraoral scanning may be performed on a patient's oral cavity during a visitation of dental office 108. The intraoral scanning may be performed, for example, as part of a semi-annual or annual dental health checkup. The intraoral scanning may also be performed before, during and/or after one or more dental treatments, such as orthodontic treatment and/or prosthodontic treatment. The intraoral scanning may be a full or partial scan of the upper and/or lower dental arches, and may be performed in order to gather information for performing dental diagnostics, to generate a treatment plan, to determine progress of a treatment plan, and/or for other purposes. The dental information (intraoral scan data 135A-N) generated from the intraoral scanning may include 3D scan data, 2D color images, NIRI and/or infrared images, and/or ultraviolet images, of all or a portion of the upper jaw and/or lower jaw. The intraoral scan data 135A-N may further include one or more intraoral scans showing a relationship of the upper dental arch to the lower dental arch. These intraoral scans may be usable to determine a patient bite and/or to determine occlusal contact information for the patient. The patient bite may include determined relationships between teeth in the upper dental arch and teeth in the lower dental arch.
Intraoral scanners may work by moving the scanner 150 inside a patient's mouth to capture all viewpoints of one or more teeth. During scanning, the scanner 150 calculates distances to solid surfaces in some embodiments. These distances may be recorded as 3D point clouds in some embodiments. Each scan (e.g., point cloud) is overlapped algorithmically, or 'stitched', with the previous set of scans to generate a growing 3D surface. As such, each scan is associated with a rotation in space, or a projection, that defines how it fits into the 3D surface.
During intraoral scanning, intraoral scan application 115 may register and stitch together two or more intraoral scans generated thus far from the intraoral scan session to generate a growing 3D surface. In one embodiment, performing registration includes capturing 3D data of various points of a surface in multiple scans, and registering the scans by computing transformations between the scans. One or more 3D surfaces may be generated based on the registered and stitched together intraoral scans during the intraoral scanning. The one or more 3D surfaces may be output to a display so that a doctor or technician can view their scan progress thus far. As each new intraoral scan is captured and registered to previous intraoral scans and/or a 3D surface, the one or more 3D surfaces may be updated, and the updated 3D surface(s) may be output to the display. A view of the 3D surface(s) may be periodically or continuously updated according to one or more viewing modes of the intraoral scan application. In one viewing mode, the 3D surface may be continuously updated such that an orientation of the 3D surface that is displayed aligns with a field of view of the intraoral scanner (e.g., so that a portion of the 3D surface that is based on a most recently generated intraoral scan is approximately centered on the display or on a window of the display) and a user sees what the intraoral scanner sees. In one viewing mode, a position and orientation of the 3D surface is static, and an image of the intraoral scanner is optionally shown to move relative to the stationary 3D surface.
Intraoral scan application 115 may generate one or more 3D surfaces from intraoral scans, and may display the 3D surfaces to a user (e.g., a doctor) via a graphical user interface (GUI) during intraoral scanning. In embodiments, separate 3D surfaces are generated for the upper jaw and the lower jaw. This process may be performed in real time or near-real time to provide an updated view of the captured 3D surfaces during the intraoral scanning process. As scans are received, these scans may be registered and stitched to a 3D surface.
When a scan session or a portion of a scan session associated with a particular scanning role (e.g., upper jaw role, lower jaw role, bite role, etc.) is complete (e.g., all scans for a dental site have been captured), intraoral scan application 115 may generate a virtual 3D model of one or more scanned dental sites (e.g., of an upper jaw and a lower jaw). The final 3D model may be a set of 3D points and their connections with each other (i.e., a mesh). To generate the virtual 3D model, intraoral scan application 115 may register and stitch together the intraoral scans generated from the intraoral scan session that are associated with a particular scanning role. The registration performed at this stage may be more accurate than the registration performed during the capturing of the intraoral scans, and may take more time to complete than the registration performed during the capturing of the intraoral scans. In one embodiment, performing scan registration includes capturing 3D data of various points of a surface in multiple scans, and registering the scans by computing transformations between the scans. The 3D data may be projected into a 3D space of a 3D model to form a portion of the 3D model. The intraoral scans may be integrated into a common reference frame by applying appropriate transformations to points of each registered scan and projecting each scan into the 3D space.
In one embodiment, registration is performed for adjacent or overlapping intraoral scans (e.g., each successive frame of an intraoral video). Registration algorithms are carried out to register two adjacent or overlapping intraoral scans and/or to register an intraoral scan with a 3D model, which essentially involves determination of the transformations which align one scan with the other scan and/or with the 3D model. Registration may involve identifying multiple points in each scan (e.g., point clouds) of a scan pair (or of a scan and the 3D model), surface fitting to the points, and using local searches around points to match points of the two scans (or of the scan and the 3D model). For example, intraoral scan application 115 may match points of one scan with the closest points interpolated on the surface of another scan, and iteratively minimize the distance between matched points. Other registration techniques may also be used.
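By way of illustration, a minimal point-to-point registration step of the kind described above (match each point to its nearest neighbor in the other scan, solve for the rigid transform that minimizes the matched distances, and iterate) might look as follows. This is a sketch only; it assumes NumPy/SciPy and Nx3 point clouds, and the function names are illustrative rather than the actual registration used by intraoral scan application 115.

```python
# Minimal point-to-point ICP sketch (illustrative only; the scanner's actual
# registration pipeline may differ). Assumes two Nx3 NumPy point clouds.
import numpy as np
from scipy.spatial import cKDTree

def icp_step(source, target):
    """One ICP iteration: match each source point to its nearest target point,
    then compute the rigid transform (R, t) minimizing the matched distances."""
    tree = cKDTree(target)
    _, idx = tree.query(source)            # nearest-neighbor correspondences
    matched = target[idx]

    # Closed-form rigid alignment (Kabsch) between the matched point sets.
    src_c, dst_c = source.mean(0), matched.mean(0)
    H = (source - src_c).T @ (matched - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def register(source, target, iterations=20):
    """Iteratively refine the transform aligning `source` onto `target`."""
    current = source.copy()
    for _ in range(iterations):
        R, t = icp_step(current, target)
        current = current @ R.T + t
    return current
```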
Intraoral scan application 115 may repeat registration for all intraoral scans of a sequence of intraoral scans to obtain transformations for each intraoral scan, to register each intraoral scan with previous intraoral scan(s) and/or with a common reference frame (e.g., with the 3D model). Intraoral scan application 115 may integrate intraoral scans into a single virtual 3D model by applying the appropriate determined transformations to each of the intraoral scans. Each transformation may include rotations about one to three axes and translations within one to three planes.
Intraoral scan application 115 may generate one or more 3D models from intraoral scans, and may display the 3D models to a user (e.g., a doctor) via a graphical user interface (GUI). The 3D models can then be checked visually by the doctor. The doctor can virtually manipulate the 3D models via the user interface with respect to up to six degrees of freedom (i.e., translated and/or rotated with respect to one or more of three mutually orthogonal axes) using suitable user controls (hardware and/or virtual) to enable viewing of the 3D model from any desired direction. If scaling of the image on the screen is also considered, then the doctor can virtually manipulate the 3D models with respect to up to seven degrees of freedom (the previously described six degrees of freedom in addition to zoom or scale).
After completion of the 3D surface(s) and/or 3D model(s), and/or during generation of the 3D surface(s) and/or 3D model(s), intraoral scan application 115 may perform texture mapping to map color information to the 3D surface(s) and/or 3D model(s). Color images (e.g., images generated under white light conditions) may be processed, and color information from these color images may be added to the 3D surface(s) and/or 3D model(s). In some embodiments, the color information is used to improve an accuracy of solving the correspondence problem for at least some points in captured 2D images of intraoral scans.
Reference is now made to
For some applications, structured light projectors 22 are positioned within probe 28 such that each structured light projector 22 faces an object 32 outside of intraoral scanner 20 that is placed in its field of illumination, as opposed to positioning the structured light projectors in a proximal end of the handheld wand and illuminating the object by reflection of light off a mirror and subsequently onto the object. In embodiments, the structured light projectors 22 and cameras 24 are a distance of less than 20 mm from the object 32, or less than 15 mm from the object 32, or less than 10 mm from the object 32. The distance may be measured as a distance between a camera/structured light projector and a plane orthogonal to an imaging axis of the intraoral scanner (e.g., where the imaging axis of the intraoral scanner may be perpendicular to a longitudinal axis of the intraoral scanner). Alternatively, the distance may be measured differently for each camera as a distance from the camera to the object 32 along a ray from the camera to the object.
In some embodiments, the structured light projectors are disposed at a distal end of the wand and face an object to be scanned (e.g., is directed approximately orthogonal to a longitudinal axis of a probe of the wand). In some embodiments, one or more structured light projectors are oriented approximately parallel to the longitudinal axis of the probe and face a mirror in the probe, which redirects structured light projected by the structured light projector(s) onto an object to be imaged. Similarly, for some applications, cameras 24 are positioned within probe 28 at a distal end of the probe such that each camera 24 faces an object 32 outside of intraoral scanner 20 that is placed in its field of view. In some embodiments, one or more cameras is oriented approximately parallel to the longitudinal axis of the probe and toward a mirror, and views the object by reflection of light off the mirror and into the camera. In some embodiments, the projectors and cameras of the intraoral scanner are arranged into component groups (e.g., imaging units) each including one or more structured light projectors and one or more cameras. In embodiments, component groups are arranged in series along the longitudinal axis of the probe. In an embodiment, each component group includes a projector with two or more cameras disposed about the projector. By combining multiple cameras and structured light projectors within probe 28, the scanner is able to have an overall large field of view while maintaining a low profile probe.
In some applications, cameras 24 each have a large field of view β (beta) of at least 45 degrees, e.g., at least 70 degrees, e.g., at least 80 degrees, e.g., 85 degrees. In some applications, the field of view may be less than 120 degrees, e.g., less than 100 degrees, e.g., less than 90 degrees. In one embodiment, a field of view β (beta) for each camera is between 80 and 90 degrees, which may be particularly useful because it provides a good balance among pixel size, field of view and camera overlap, optical quality, and cost. Cameras 24 may include an image sensor 58 and objective optics 60 including one or more lenses. To enable close focus imaging, cameras 24 may focus at an object focal plane 50 that is located between 1 mm and 30 mm, e.g., between 4 mm and 24 mm, e.g., between 5 mm and 11 mm, e.g., 9 mm-10 mm, from the lens that is farthest from the sensor. In some applications, cameras 24 may capture images at a frame rate of at least 30 frames per second, e.g., at a frame rate of at least 75 frames per second, e.g., at least 100 frames per second. In some applications, the frame rate may be less than 200 frames per second.
A large field of view achieved by combining the respective fields of view of all the cameras (e.g., of multiple component groups) may improve accuracy due to a reduced amount of image stitching errors, especially in edentulous regions, where the gum surface is smooth and there may be fewer clear high resolution 3D features. Having a larger field of view enables large smooth features, such as the overall curve of the tooth, to appear in each image frame, which improves the accuracy of stitching respective surfaces obtained from multiple such image frames.
Similarly, structured light projectors 22 may each have a large field of illumination α (alpha) of at least 45 degrees, e.g., at least 70 degrees. In some applications, the field of illumination α (alpha) may be less than 120 degrees, e.g., less than 100 degrees.
For some applications, in order to improve image capture, each camera 24 has a plurality of discrete preset focus positions, in each of which the camera focuses at a respective object focal plane 50. Each of cameras 24 may include an autofocus actuator that selects a focus position from the discrete preset focus positions in order to improve a given image capture. Additionally or alternatively, each camera 24 includes an optical aperture phase mask that extends a depth of focus of the camera, such that images formed by each camera are maintained focused over all object distances located between 1 mm and 30 mm, e.g., between 4 mm and 24 mm, e.g., between 5 mm and 11 mm, e.g., 9 mm-10 mm, from the lens that is farthest from the sensor.
In some applications, structured light projectors 22 and cameras 24 are coupled to rigid structure 26 in a closely packed and/or alternating fashion, such that (a) a substantial part of each camera's field of view overlaps the field of view of neighboring cameras, and (b) a substantial part of each camera's field of view overlaps the field of illumination of neighboring projectors. Optionally, at least 20%, e.g., at least 50%, e.g., at least 75% of the projected pattern of light is in the field of view of at least one of the cameras at an object focal plane 50 that is located at least 4 mm from the lens that is farthest from the sensor. Due to different possible configurations of the projectors and cameras, some of the projected pattern may never be seen in the field of view of any of the cameras, and some of the projected pattern may be blocked from view by object 32 as the scanner is moved around during a scan.
Rigid structure 26 may be a non-flexible structure to which structured light projectors 22 and cameras 24 are coupled so as to provide structural stability to the optics within probe 28. Coupling all the projectors and all the cameras to a common rigid structure helps maintain geometric integrity of the optics of each structured light projector 22 and each camera 24 under varying ambient conditions, e.g., under mechanical stress as may be induced by the subject's mouth. Additionally, rigid structure 26 helps maintain stable structural integrity and positioning of structured light projectors 22 and cameras 24 with respect to each other.
Reference is now made to
Reference is now made to
Typically, the distal-most (toward the positive x-direction in
In embodiments, the number of structured light projectors 22 in probe 28 may range from two, e.g., as shown in row (iv) of
In an example application, an apparatus for intraoral scanning (e.g., an intraoral scanner 150) includes an elongate wand comprising a probe at a distal end of the elongate handheld wand, at least two light projectors disposed within the probe, and at least four cameras disposed within the probe. Each light projector may include at least one light source configured to generate light when activated, and a pattern generating optical element that is configured to generate a pattern of light when the light is transmitted through the pattern generating optical element. Each of the at least four cameras may include a camera sensor (also referred to as an image sensor) and one or more lenses, wherein each of the at least four cameras is configured to capture a plurality of images that depict at least a portion of the projected pattern of light on an intraoral surface. A majority of the at least two light projectors and the at least four cameras may be arranged in at least two rows that are each approximately parallel to a longitudinal axis of the probe, the at least two rows comprising at least a first row and a second row. In at least one embodiment, as shown in row (v), the intraoral scanner includes two rows of cameras (e.g., two rows of three cameras each) and a single row of structured light projectors (e.g., five structured light projectors) disposed between the two rows of cameras.
In a further application, a distal-most camera along the longitudinal axis and a proximal-most camera along the longitudinal axis of the at least four cameras are positioned such that their optical axes are at an angle of 90 degrees or less with respect to each other from a line of sight that is perpendicular to the longitudinal axis. Cameras in the first row and cameras in the second row and/or third row may be positioned such that optical axes of the cameras in the first row are at an angle of 90 degrees or less with respect to optical axes of the cameras in the second row and/or third row from a line of sight that is coaxial with the longitudinal axis of the probe. A remainder of the at least four cameras other than the distal-most camera and the proximal-most camera have optical axes that are substantially parallel to the longitudinal axis of the probe. Some of the at least two rows may include an alternating sequence of light projectors and cameras. In some embodiments, some rows contain only projectors and some rows contain only cameras (e.g., as shown in row (v)).
In a further application, the distal-most camera along the longitudinal axis and the proximal-most camera along the longitudinal axis are positioned such that their optical axes are at an angle of 35 degrees or less with respect to each other from the line of sight that is perpendicular to the longitudinal axis. The cameras in the first row and the cameras in the second row and/or third row may be positioned such that the optical axes of the cameras in the first row are at an angle of 35 degrees or less with respect to the optical axes of the cameras in the second row and/or third row from the line of sight that is coaxial with the longitudinal axis of the probe.
In a further application, the at least four cameras may have a combined field of view of 25-45 mm along the longitudinal axis and a field of view of 20-40 mm along a z-axis corresponding to distance from the probe.
Returning to
Processor 96 may run a surface reconstruction algorithm that may use detected patterns (e.g., dot patterns) projected onto object 32 to generate a 3D surface of the object 32, as described in greater detail below with reference to
For some applications, all data points taken at a specific time are used as a rigid point cloud, and multiple such point clouds are captured at a frame rate of over 10 captures per second. The plurality of point clouds are then stitched together using a registration algorithm, e.g., iterative closest point (ICP), to create a dense point cloud. A surface reconstruction algorithm may then be used to generate a representation of the surface of object 32.
For some applications, at least one temperature sensor 52 is coupled to rigid structure 26 and measures a temperature of rigid structure 26. Temperature control circuitry 54 disposed within intraoral scanner 20 (a) receives data from temperature sensor 52 indicative of the temperature of rigid structure 26 and (b) activates a temperature control unit 56 in response to the received data. Temperature control unit 56, e.g., a PID controller, keeps probe 28 at a desired temperature (e.g., between 35 and 43 degrees Celsius, between 37 and 41 degrees Celsius, etc.). Keeping probe 28 above 35 degrees Celsius, e.g., above 37 degrees Celsius, reduces fogging of the glass surface of intraoral scanner 20, through which structured light projectors 22 project and cameras 24 view, as probe 28 enters the intraoral cavity, which is typically around or above 37 degrees Celsius. Keeping probe 28 below 43 degrees, e.g., below 41 degrees Celsius, prevents discomfort or pain.
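As an illustrative sketch of the temperature regulation described above, a PID control loop might be structured as follows; the gains, setpoint, and heater interface shown here are hypothetical and are not specified by the disclosure.

```python
# Illustrative PID loop for holding a probe near a setpoint temperature.
# Gains, setpoint, and the heater command interface are hypothetical.
class PIDController:
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured_temp, dt):
        """Return a heater command from the current temperature reading."""
        error = self.setpoint - measured_temp
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# e.g., hold the probe near 39 degrees Celsius (within the 35-43 C range above)
controller = PIDController(kp=2.0, ki=0.1, kd=0.5, setpoint=39.0)
```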
In some embodiments, heat may be drawn out of the probe 28 via a heat conducting element 94, e.g., a heat pipe, that is disposed within intraoral scanner 20, such that a distal end 95 of heat conducting element 94 is in contact with rigid structure 26 and a proximal end 99 is in contact with a proximal end 100 of intraoral scanner 20. Heat is thereby transferred from rigid structure 26 to proximal end 100 of intraoral scanner 20. Alternatively or additionally, a fan disposed in a handle region 174 of intraoral scanner 20 may be used to draw heat out of probe 28.
In embodiments, the scanner 150 of
Reference is now made to
In some applications, each structured light projector 22 projects at least 400 (e.g., 550) discrete unconnected spots 33 onto an intraoral three-dimensional surface during a scan. In embodiments, each structured light projector 22 projects a plurality of projector rays or beams, each corresponding to a discrete spot. In some applications, each structured light projector 22 projects less than 3000 discrete unconnected spots 33 onto an intraoral surface during a scan. In order to reconstruct the three-dimensional surface from projected sparse distribution 34, correspondence between respective projected spots 33 and the spots detected by cameras 24 should be determined, as further described hereinbelow with reference to
For some applications, structured light projector 22 includes a light source (e.g., a laser diode) and a pattern generating optical element. The pattern generating optical element may be, for example, a diffractive optical element (DOE) that generates distribution 34 of discrete unconnected spots 33 of light when the light source transmits light through DOE onto object 32. As used herein, a spot of light is defined as a small area of light having any shape. For some applications, different structured light projectors 22 generate spots having different respective shapes, i.e., every spot 33 generated by a specific DOE has the same shape, and the shape of spots 33 generated by at least one DOE is different from the shape of spots generated by at least one other DOE 39. Alternatively, different structured light projectors project spots having the same shape. In some embodiments, different light projectors project spots having a different wavelength or color (e.g., structured light projectors may be divided into a first class that emits a first wavelength, a second class that emits a second wavelength, a third class that emits a third wavelength, and so on). By way of example, some pattern generating optical elements may generate circular spots 33 (such as is shown in
Reference is now made to
With specific reference to
In embodiments, during a calibration process one or more machine learning models are trained using image data captured from one or more known objects (e.g., one or more calibration objects). Training the machine learning models requires a large amount of data that is fed to the machine learning models. In one embodiment, a reference object is first scanned with an intraoral scanner. The reference object may be an object for which processing logic has a ground-truth surface mesh (e.g., generated using an external, high resolution/accurate scanner). Resulting data, including all candidate rays, may be input into a dedicated file or data store. In one embodiment, using a “stitch-to-reference” technique, processing logic takes the trajectories of the intraoral scanner (determined by the stitching of the captured images to the reference), simulates spots using the calibrated projectors' rays, and shoots them onto the reference surface. This provides simulated spots and simulated ray distances. These simulated spots and ray distances can then be compared to “real” candidates to determine whether a candidate that is close enough to be considered the correct candidate is identified. In embodiments, candidates that do not fulfill this rule are considered wrong candidates.
Additionally, or alternatively, calibration values may be stored based on camera rays 86 corresponding to pixels on camera sensor 58 of each one of cameras 24, and projector rays 88 corresponding to projected spots 33 of light from each structured light projector 22. For example, calibration values may be stored for (a) a plurality of camera rays 86 corresponding to a respective plurality of pixels on camera sensor 58 of each one of cameras 24, and (b) a plurality of projector rays 88 corresponding to a respective plurality of projected spots 33 of light from each structured light projector 22.
By way of example, the following calibration process may be used. A high accuracy dot target, e.g., black dots on a white background, is illuminated from below and an image is taken of the target with all the cameras. The dot target is then moved perpendicularly toward the cameras, i.e., along the z-axis, to a target plane. The dot-centers are calculated for all the dots in all respective z-axis positions to create a three-dimensional grid of dots in space. A distortion and camera pinhole model is then used to find the pixel coordinate for each three-dimensional position of a respective dot-center, and thus a camera ray is defined for each pixel as a ray originating from the pixel whose direction is towards a corresponding dot-center in the three-dimensional grid. The camera rays corresponding to pixels in between the grid points can be interpolated. The above-described camera calibration procedure is repeated for all respective wavelengths of respective laser diodes 36, such that included in the stored calibration values are camera rays 86 corresponding to each pixel on each camera sensor 58 for each of the wavelengths.
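A sketch of how per-pixel camera rays could be derived from the three-dimensional grid of dot centers is shown below; it assumes the pinhole-and-distortion fit is delegated to OpenCV, and the array names and use of cv2 are illustrative assumptions rather than the calibration code of the disclosed scanner.

```python
# Sketch: derive a camera ray for a given pixel from the 3D grid of dot centers
# described above. The pinhole/distortion fit is delegated to OpenCV.
import numpy as np
import cv2

def calibrate_camera_rays(object_points, image_points, image_size):
    """object_points: list of (N, 3) dot-center grids (one per z position);
    image_points: list of (N, 2) detected dot centers in pixel coordinates."""
    ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        [op.astype(np.float32) for op in object_points],
        [ip.astype(np.float32) for ip in image_points],
        image_size, None, None)

    # For a pixel of interest, undistort it and build the ray direction through
    # the pinhole; rays for in-between pixels can be interpolated as described.
    def pixel_to_ray(px):
        undist = cv2.undistortPoints(np.array([[px]], np.float32), K, dist)
        direction = np.array([undist[0, 0, 0], undist[0, 0, 1], 1.0])
        return direction / np.linalg.norm(direction)

    return K, dist, pixel_to_ray
```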
After cameras 24 have been calibrated and all camera ray 86 values stored, structured light projectors 22 may be calibrated as follows. A flat featureless target is used and structured light projectors 22 are turned on one at a time. Each spot is located on at least one camera sensor 58. Since cameras 24 are now calibrated, the three-dimensional spot location of each spot is computed by triangulation based on images of the spot in multiple different cameras. The above-described process is repeated with the featureless target located at multiple different z-axis positions. Each projected spot on the featureless target will define a projector ray in space originating from the projector.
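For illustration, fitting a projector ray to the triangulated 3D positions of the same spot observed at multiple z-axis positions can be done with a least-squares line fit, e.g., via SVD of the centered points; this sketch assumes NumPy and illustrative variable names.

```python
# Sketch: fit a projector ray to the triangulated 3D positions of one projected
# spot observed at several z positions of the flat target (least-squares line fit).
import numpy as np

def fit_projector_ray(spot_positions_3d):
    """spot_positions_3d: (M, 3) triangulated positions of one spot at M target planes.
    Returns (origin, direction) of the best-fit projector ray."""
    points = np.asarray(spot_positions_3d, dtype=float)
    centroid = points.mean(axis=0)
    # The principal direction of the centered points is the ray direction.
    _, _, vt = np.linalg.svd(points - centroid)
    direction = vt[0]
    if direction[2] < 0:                   # orient away from the projector (toward +z)
        direction = -direction
    return centroid, direction / np.linalg.norm(direction)
```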
Reference is now made to
Reference is now made to
In one embodiment, the one or more machine learning models include one or more neural networks. For example, the one or more machine learning models may be or include one or more deep neural networks, convolutional neural networks, recurrent neural networks, and so on. Other types of machine learning models that may be used include support vector machines, random forest models, k-nearest neighbors models, Bayesian classifiers, logistic regression algorithms, and so on. The one or more machine learning models may include machine learning models trained using supervised learning, semi-supervised learning, unsupervised learning, and/or reinforcement learning. In some embodiments, the one or more trained machine learning models include at least two machine learning models, as discussed further with reference to
In embodiments, one or more images are generated by an intraoral scanner. The intraoral scanner may include multiple cameras, each of which may capture the same or a different subset of projector points projected by one or more structured light projectors of the intraoral scanner. Multiple images (e.g., each captured by a different camera at a same time) may include captured points that correspond to one or more of the same projected points (e.g., to the same projector rays). The one or more machine learning models may be tasked with determining which of the captured points correspond to which of the projected points (e.g., to which projector rays). This is a non-trivial problem. In embodiments, the one or more machine learning models determine the correspondence between captured points and projector points very quickly (e.g., on the order of micro-seconds or milliseconds) and in real time or near-real time so that this information can be used to construct a 3D point cloud that constitutes an intraoral scan generated at a given time and be registered and stitched to one or more additional intraoral scans previously generated during an intraoral scanning session and/or a 3D surface being constructed as intraoral scanning progresses. In some embodiments, the one or more images are pre-processed prior to inputting data for the intraoral images into the one or more trained machine learning models. The pre-processing may be performed to generate feature vectors or feature data sets (e.g., including properties associated with projected points and/or captured points) that provide useful information from which the one or more machine learning models can a) determine probabilities of captured points in images corresponding to projected points (e.g., projector rays) output by structured light projectors and b) ultimately pair projected points with captured points. Various examples of types of features or properties that are extracted are discussed in greater detail below with reference to
One type of machine learning model that may be used to perform some or all of the above tasks is an artificial neural network, such as a deep neural network. Artificial neural networks generally include a feature representation component with a classifier or regression layers that map features to a desired output space. A convolutional neural network (CNN), for example, hosts multiple layers of convolutional filters. Pooling is performed, and non-linearities may be addressed, at lower layers, on top of which a multi-layer perceptron is commonly appended, mapping top layer features extracted by the convolutional layers to decisions (e.g. classification outputs). Deep learning is a class of machine learning algorithms that use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. Deep neural networks may learn in a supervised (e.g., classification) and/or unsupervised (e.g., pattern analysis) manner. Deep neural networks include a hierarchy of layers, where the different layers learn different levels of representations that correspond to different levels of abstraction. In deep learning, each level learns to transform its input data into a slightly more abstract and composite representation. In an image recognition application, for example, the raw input may be a matrix of pixels; the first representational layer may abstract the pixels and encode edges; the second layer may compose and encode arrangements of edges; the third layer may encode higher level shapes (e.g., teeth, lips, gums, etc.); and the fourth layer may recognize a scanning role. Notably, a deep learning process can learn which features to optimally place in which level on its own. The “deep” in “deep learning” refers to the number of layers through which the data is transformed. More precisely, deep learning systems have a substantial credit assignment path (CAP) depth. The CAP is the chain of transformations from input to output. CAPs describe potentially causal connections between input and output. For a feedforward neural network, the depth of the CAPs may be that of the network and may be the number of hidden layers plus one. For recurrent neural networks, in which a signal may propagate through a layer more than once, the CAP depth is potentially unlimited.
Training of a neural network may be achieved in a supervised learning manner, which involves feeding a training dataset consisting of labeled inputs through the network, observing its outputs, defining an error (by measuring the difference between the outputs and the label values), and using techniques such as deep gradient descent and backpropagation to tune the weights of the network across all its layers and nodes such that the error is minimized. In many applications, repeating this process across the many labeled inputs in the training dataset yields a network that can produce correct output when presented with inputs that are different than the ones present in the training dataset. In high-dimensional settings, such as large images, this generalization is achieved when a sufficiently large and diverse training dataset is made available. In embodiments, at least some training data includes images and associated 3D point clouds of one or more known objects (e.g., calibration objects). Some training data may include images and associated 3D point clouds (e.g., intraoral scans determined from the images) with labels indicating correspondence between captured points and projector points or projector rays. In some embodiments, the known object may have a known position and/or orientation (e.g., be positioned on a movable stage at a known position and/or orientation), and thus it may be known a priori which captured points correspond to which projector points (e.g., projector rays). Each training data item may include a set of images generated of an object (e.g., a known object) and a label indicating for each captured point in each of the images which projector ray or projector point the captured point corresponds to.
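A minimal supervised training sketch for a candidate-scoring network is shown below; it assumes PyTorch, and the architecture, feature dimension, and data layout are illustrative assumptions rather than the trained machine learning model of the embodiments.

```python
# Minimal supervised training sketch for a candidate-scoring network (PyTorch
# assumed; architecture, feature dimension, and data layout are illustrative).
import torch
import torch.nn as nn

class CandidateScorer(nn.Module):
    def __init__(self, num_features=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1))              # logit: candidate is / is not the true intersection

    def forward(self, x):
        return self.net(x).squeeze(-1)

def train(model, loader, epochs=10):
    """loader yields (features, label) batches; label is 1 if the candidate
    intersection corresponds to the projector ray, else 0."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for features, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(features), labels.float())
            loss.backward()                # backpropagation of the error
            optimizer.step()               # gradient-descent weight update
    return model
```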
In one embodiment, the one or more machine learning models are trained to output, for one or more candidate intersections of a captured point and a projector ray that might have caused that captured point, a probability that the projector ray caused the captured point. In some embodiments, candidate intersections of multiple images, each captured by a different camera, for which agreement is found on the same or approximately the same distance value for a projector ray are grouped into a candidate intersection group. The one or more machine learning models may additionally be trained to process information for candidate intersection groups, and to output probability scores for such candidate intersection groups. A probability score for a candidate intersection or candidate intersection group, collectively referred to simply as candidates, may be a probability that a distance value of the candidate intersection or candidate intersection group is correct. An output of a machine learning model may be a tuple including probabilities for multiple different candidate intersections. An output of a machine learning model may be a probability for a single candidate intersection or candidate intersection group. Accordingly, one or more machine learning models may consider only a single candidate intersection or candidate intersection group at a time, and one or more other machine learning models may consider multiple candidate intersections and/or candidate intersection groups at a time.
At block 1010, processing logic determines depth information for at least some of the plurality of captured points based on the determined correspondence. Each intersection of a captured point with a projector point (e.g., a projector ray) on a surface of an imaged object may occur at a known depth or distance from the intraoral scanner. Accordingly, if a captured point is determined to have been caused by a particular projector ray, then the depth or distance from the intraoral scanner can be determined. Additionally, the x, y coordinates of each of the captured points are known in each of the images (e.g., based on information about the pixel(s) associated with the captured points). Accordingly, once the depth information is determined a 3D coordinate or location of the intersection of the captured point with the corresponding projected point or projector ray can be determined as well. In this manner 3D coordinates for some or all of the captured points (each having been candidate points for one or more projector rays) in the images can be determined. These 3D coordinates are used to construct a 3D point cloud, which may then be registered to and/or stitched to a 3D surface and/or previously determined 3D point clouds.
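For illustration, once a captured point has been paired with a projector ray and the intersection distance is known, the 3D coordinate is simply the point along the calibrated projector ray at that distance. The sketch below assumes NumPy and illustrative data structures for the calibrated rays.

```python
# Sketch: convert accepted pairings of projector rays and distances into 3D
# coordinates and assemble a point cloud. Ray origins/directions come from
# calibration; data-structure names are illustrative.
import numpy as np

def intersection_to_3d(ray_origin, ray_direction, distance):
    """Return the 3D coordinate: origin + distance * unit direction."""
    d = np.asarray(ray_direction, dtype=float)
    return np.asarray(ray_origin, dtype=float) + distance * d / np.linalg.norm(d)

def build_point_cloud(correspondences, rays):
    """correspondences: iterable of (ray_id, distance) for accepted pairings;
    rays: dict ray_id -> (origin, direction). Returns an (N, 3) point cloud."""
    return np.array([intersection_to_3d(*rays[ray_id], dist)
                     for ray_id, dist in correspondences])
```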
In some embodiments, the first trained machine learning model is a neural network, such as a deep neural network, a convolutional neural network, a recurrent neural network, and so on. In some embodiments, the first trained machine learning model is a support vector machine, a random forest model, a k-nearest neighbors model, a Bayesian classifier, a logistic regression algorithm, and so on. The first machine learning model may have been trained using supervised learning, semi-supervised learning, unsupervised learning, and/or reinforcement learning.
In some embodiments, the one or more images are pre-processed prior to inputting data for the intraoral images into the first trained machine learning model. The pre-processing may be performed to generate feature vectors or feature data sets (also referred to as property data sets) that provide useful information from which the one or more machine learning models can a) determine probabilities of captured points in images corresponding to projector points (e.g., projector rays) output by structured light projectors and b) ultimately pair projector points with captured points. Various examples of types of properties/features that are extracted are discussed in greater detail below with reference to
In embodiments, multiple candidate intersections are determined for a set of images, where each image in the set was generated by a different camera at the same time or at approximately the same time. Based on the set of images, processing logic determines all possible candidate intersections, where a candidate intersection is an intersection of a candidate point from an image with a projector ray or projected point. In embodiments, processing logic groups together matching candidate intersections from different images/cameras to generate candidate intersection groups. In embodiments, a feature vector or feature set/property data set may be generated for each candidate intersection and/or candidate intersection group (collectively referred to as candidates). The feature vector or feature set/property data set (which may include one or more of the images and/or may exclude one or more of the images) for a candidate is input into the first trained machine learning model, and the first trained machine learning model outputs a probability score for the candidate (e.g., a probability of a captured point having been caused by a projector ray, which is also a probability of the projector ray having intersected a surface of a dental site at a particular distance from the intraoral scanner). This process may be repeated for each candidate (e.g., for each candidate intersection and/or candidate intersection group). In some embodiments, the probabilities for multiple candidates are determined in parallel (e.g., feature sets for multiple candidates may be input into the first trained machine learning model together, which may provide an output that includes probabilities for each of the candidates).
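By way of example, assembling a property data set per candidate and scoring a batch of candidates with the first trained model might look like the following sketch; the field names, the assumption of a fixed (padded) number of cameras per candidate, and the use of PyTorch are illustrative assumptions rather than the disclosure's exact feature set.

```python
# Sketch: flatten per-candidate properties into a feature vector and score a
# batch of candidates with the first trained model. Field names are illustrative.
import numpy as np
import torch

def candidate_features(candidate):
    """Combine camera-agnostic and camera-specific properties into one vector.
    Assumes each candidate lists the same (padded) number of cameras so that
    feature vectors have equal length."""
    agnostic = [candidate["distance"], *candidate["triangulation_point"],
                candidate["projector_index"], candidate["num_cameras"]]
    per_camera = []
    for cam in candidate["cameras"]:
        per_camera += [cam["epipolar_distance"], cam["intensity"],
                       cam["spot_size"], cam["triangulation_error"]]
    return np.array(agnostic + per_camera, dtype=np.float32)

def score_candidates(model, candidates):
    """Run the first trained model on all candidates at once; return one
    probability per candidate."""
    batch = torch.from_numpy(np.stack([candidate_features(c) for c in candidates]))
    with torch.no_grad():
        return torch.sigmoid(model(batch)).numpy()
```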
At block 1110, processing logic uses a second trained machine learning model to determine correspondence between a plurality of captured points and a plurality of projected points (e.g., projector rays that cause the projected points) based on one or more of the determined probabilities determined at block 1105. In some embodiments, an input for the second trained machine learning model includes a set of probabilities for multiple candidates (e.g., for multiple candidate intersections and/or candidate intersection groups). For example, processing logic may select a candidate having a highest probability. The candidate may include a candidate point from one or more image and a projector ray. Processing logic may then determine one or more probabilities for other candidates that also include the candidate point from the one or more image or the projector ray. For example, processing logic may select candidates also associated with the candidate point or projector ray that have next highest probabilities. The input comprising information for the candidates and their associated probabilities (e.g., probability scores) may be input into the second trained machine learning model, which may output a selection of one of the candidates, and therefore of a correspondence between a captured point and a projector ray. This process may be repeated until some criteria is satisfied, wherein with each iteration a new candidate may be selected. In one embodiment, the process is repeated until no remaining candidates can be determined with at least a threshold level of confidence (e.g., with at least an 85% confidence, a 90% confidence, an 80% confidence, etc.).
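The iterative selection loop described above might be sketched as follows, with the second trained model abstracted as a callable that picks one candidate from a short list of competing candidates and their probabilities; the conflict rule and confidence threshold shown here are illustrative assumptions.

```python
# Sketch of the iterative selection loop. `choose_fn` stands in for the second
# trained model: it maps a list of (candidate, probability) pairs to the index
# of the chosen candidate. Threshold and conflict handling are illustrative.
def select_correspondences(candidates, probabilities, choose_fn, min_confidence=0.85):
    """candidates: list of (point_id, ray_id); probabilities: matching scores."""
    remaining = list(zip(candidates, probabilities))
    accepted = []
    while remaining:
        # Seed with the highest-probability candidate still in play.
        best_idx = max(range(len(remaining)), key=lambda i: remaining[i][1])
        if remaining[best_idx][1] < min_confidence:
            break
        point_id, ray_id = remaining[best_idx][0]
        # Gather competing candidates sharing the captured point or projector ray.
        competing = [rc for rc in remaining
                     if rc[0][0] == point_id or rc[0][1] == ray_id]
        chosen = competing[choose_fn(competing)]
        accepted.append(chosen[0])
        # Remove the chosen candidate and everything that conflicts with it.
        remaining = [rc for rc in remaining
                     if rc[0][0] != chosen[0][0] and rc[0][1] != chosen[0][1]]
    return accepted
```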
At block 1115, processing logic determines depth information (and 3D coordinates) for at least some of the plurality of captured points based on the determined correspondence information. Each intersection of a captured point with a projector point (e.g., a projector ray) on a surface of an imaged object may occur at a known depth or distance from the intraoral scanner. Accordingly, if a captured point is determined to have been caused by a particular projector ray, then the depth or distance from the intraoral scanner can be determined. Additionally, the x, y coordinates of each of the captured points are known in each of the images (e.g., based on information about the pixel(s) associated with the captured points). Accordingly, once the depth information is determined, a 3D coordinate or location of the intersection of the captured point with the corresponding projector point or projector ray can be determined as well. In this manner 3D coordinates for some or all of the candidate points in the images can be determined. These 3D coordinates are used to construct a 3D point cloud, which may then be registered to and/or stitched to a 3D surface and/or previously determined 3D point clouds.
The captured images may be 2D images. However, 3D information may be determined based on the 2D images by matching up captured points/features in the images with projector rays (e.g., projected points or features) output by the structured light projectors. In embodiments, the 2D images are transmitted to a computing device (e.g., computing device 105 of
At block 1215, processing logic determines, for each projector ray of the plurality of projector rays of the projected light pattern, one or more candidate points/features in the images that might have been caused by the projector ray. A candidate intersection or pairing may be determined for each pairing of a projector ray and a candidate point/feature of an image.
At block 1220, processing logic processes information for each projector ray to determine probability information for each combination of a projector ray and an associated candidate point/feature (e.g., for each candidate intersection). The probability information for a pair of a projector ray and an associated candidate point/feature that might have been caused by the projector ray (e.g., for a candidate intersection/pairing) may be a probability that the projector ray caused the candidate point/feature. The output of the trained machine learning model may be a probability score (e.g., ranging from a probability of 0% to a probability of 100%). In some embodiments, the probability information is generated using a trained machine learning model (e.g., as described with reference to
In one embodiment, processing logic determines, for each projector ray, and for one or more candidate points/features associated with the projector ray (e.g., for each candidate intersection/pairing), one or more properties (also referred to as features) associated with the pairing of the projector ray and the candidate point (e.g., one or more properties associated with the candidate intersection). The properties may include triangulation properties, distance from an epi-polar line, light intensity, spot size, and/or other information. In some embodiments, candidate intersections from multiple images are grouped together to form a candidate intersection group. The candidate intersection group may have a single candidate intersection that is representative of each of the candidate intersections in the candidate intersection group. Properties may be determined for the representative candidate intersection of the candidate intersection group. In some embodiments, determined properties may be divided into camera-agnostic properties and camera-specific properties. A candidate intersection may be representative of a candidate intersection group, and may be associated with multiple different cameras/images. Camera-agnostic properties may be shared by each of the images of a candidate intersection group. Each image in a candidate intersection group may be associated with a particular camera, and may have its own unique camera-specific properties for a given candidate intersection. One or more of the various determined properties may be used to determine the probability information in embodiments.
In one embodiment, at block 1222, processing logic selects one or more of the candidate intersections based on the probabilities. A selection of a candidate intersection is a selection of pairing of a projector ray and a candidate point or structured light feature for the projector ray, and provides a depth or distance at which the projector ray interfaced with a 3D intraoral surface. In some embodiments, the candidate intersection selection is performed using a trained machine learning model (e.g., as described with reference to
At block 1230, processing logic determines depth information (and 3D coordinates) for at least some of the plurality of captured points/features based on the selected candidate intersections for the plurality of projector rays (e.g., based on the selected candidate pairing). Each intersection of a captured point with a projector point (e.g., a projector ray) on a surface of an imaged object may occur at a known depth or distance from the intraoral scanner. Accordingly, if a captured point/feature is determined to have been caused by a particular projector ray, then the depth or distance from the intraoral scanner can be determined. Additionally, the u, v coordinates of each of the captured points are known in each of the images (e.g., based on information about the pixel(s) associated with the captured points). Accordingly, once the depth information is determined, a 3D coordinate or location of the intersection of the captured point/feature with the corresponding projector point/feature or projector ray can be determined as well. In this manner 3D coordinates for some or all of the candidate points/features in the images can be determined. These 3D coordinates are used to construct a 3D point cloud, which may then be registered to and/or stitched to a 3D surface and/or previously determined 3D point clouds.
The captured images may be 2D images. However, 3D information may be determined based on the 2D images by matching up captured points in the images with projector rays (e.g., projected points) output by the structured light projectors. In embodiments, the 2D images are transmitted to a computing device (e.g., computing device 105 of
At block 1255, processing logic determines, for each projector ray of the plurality of projector rays of the projected light pattern, one or more candidate points in the images that might have been caused by the projector ray. A candidate intersection may be determined for each pairing of a projector ray and a candidate point of an image.
At block 1260, processing logic processes information for each projector ray using a trained machine learning model. The trained machine learning model generates one or more outputs containing probability information for each combination of a projector ray and an associated candidate point (e.g., for each candidate intersection). The probability information for a pair of a projector ray and an associated candidate point that might have been caused by the projector ray (e.g., for a candidate intersection) may be a probability that the projector ray caused the candidate point. The output of the trained machine learning model may be a probability score (e.g., ranging from a probability of 0% to a probability of 100%).
In one embodiment, at block 1265 processing logic determines, for each projector ray, and for one or more candidate points associated with the projector ray (e.g., for each candidate intersection), one or more properties (also referred to as features) associated with the pairing of the projector ray and the candidate point(s) (e.g., one or more properties or features associated with the candidate intersection). The properties or features may include triangulation properties or features, distance from an epi-polar line, light intensity, spot size, and/or other information. In some embodiments, candidate intersections from multiple images are grouped together to form a candidate intersection group. The candidate intersection group may have a single candidate intersection that is representative of each of the candidate intersections in the candidate intersection group. Properties or features may be determined for the representative candidate intersection of the candidate intersection group. In some embodiments, determined properties or features may be divided into camera-agnostic properties or features and camera-specific properties or features. A candidate intersection may be representative of a candidate intersection group, and may be associated with multiple different cameras/images. Camera-agnostic properties or features may be shared by each of the images of a candidate intersection group. Each image in a candidate intersection group may be associated with a particular camera, and may have its own unique camera-specific properties or features for a given candidate intersection.
In one embodiment, at block 1270, for each pair of a projector ray and an associated candidate point or points (e.g., for a candidate), the determined properties or features are input into the trained machine learning model. The trained machine learning model may determine the probability for the candidate based on the input properties or features.
At block 1275, processing logic may use a second machine learning model to select candidates (e.g., to select candidate points for a plurality of projector rays) based on one or more inputs comprising probabilities of candidate points corresponding to projector rays (e.g., based on probability scores associated with the candidates). In one embodiment, the second machine learning model receives probabilities associated with multiple candidates, and outputs a selection of one of the candidates. This process may be repeated multiple times until no candidates remain that satisfy one or more criteria.
In an example, processing logic may select a candidate having a highest probability. The candidate may include a candidate point from one or more images/cameras and a projector ray. Processing logic may then determine one or more probabilities for other candidates that also include the candidate point from the one or more images/cameras or the projector ray. For example, processing logic may select candidates also associated with the candidate point(s) or projector ray that have next highest probabilities. The input comprising information for the candidates and their associated probabilities (e.g., probability scores) may be input into the second trained machine learning model, which may output a selection of one of the candidates, and therefore of a correspondence between a captured point in one or more images and a projector ray. This process may be repeated until some criteria is satisfied, wherein with each iteration a new candidate (e.g., a new candidate intersection or candidate intersection group) may be selected. In one embodiment, the process is repeated until no remaining candidates can be determined with at least a threshold level of confidence (e.g., with at least an 85% confidence, a 90% confidence, an 80% confidence, etc.).
At block 1280, processing logic determines depth information (and 3D coordinates) for at least some of the plurality of captured points based on the selected candidate points for the plurality of projector rays (e.g., based on the selected candidates). Each intersection of a captured point with a projector point (e.g., a projector ray) on a surface of an imaged object may occur at a known depth or distance from the intraoral scanner. Accordingly, if a captured point is determined to have been caused by a particular projector ray, then the depth or distance from the intraoral scanner can be determined. Additionally, the x, y coordinates of each of the captured points are known in each of the images (e.g., based on information about the pixel(s) associated with the captured points). Accordingly, once the depth information is determined, a 3D coordinate or location of the intersection of the captured point with the corresponding projector point or projector ray can be determined as well. In this manner 3D coordinates for some or all of the candidate points in the images can be determined. These 3D coordinates are used to construct a 3D point cloud, which may then be registered to and/or stitched to a 3D surface and/or previously determined 3D point clouds.
Given an intersection of a projector ray with the imaged surface, several cameras might capture the same illuminated point (e.g., a 2D spot) that was generated. Based on calibration data for the cameras and projectors, processing logic can calculate the projector ray's intersection distance for each camera independently by triangulating the projector ray and a camera ray (there is a continuous mapping between a pixel coordinate and its corresponding camera ray direction). How to perform calibration and generate such calibration data is discussed in greater detail in U.S. Application Ser. No. 16/446,181, filed Jun. 19, 2019, which is incorporated by reference herein. Ideally, if processing logic could locate each spot or point with infinite accuracy, then each camera would estimate the same projector ray's distance. But in practice, a noisy measurement of the captured point (e.g., spot) location gives some deviation, with statistically higher triangulation error as the ray's distance increases (the triangle's height becomes much greater than its base).
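For illustration, the per-camera triangulation can be sketched as finding the closest approach between the projector ray and the camera ray (the two rays rarely intersect exactly because of noise); the following assumes NumPy and calibrated ray origins and directions, with illustrative names.

```python
# Sketch: estimate the intersection distance along the projector ray for one
# camera by finding the closest approach between the projector ray and the
# camera ray obtained from calibration.
import numpy as np

def triangulate_distance(proj_origin, proj_dir, cam_origin, cam_dir):
    """Return (distance along the projector ray, closest-approach gap)."""
    d1 = proj_dir / np.linalg.norm(proj_dir)
    d2 = cam_dir / np.linalg.norm(cam_dir)
    w0 = proj_origin - cam_origin
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b                 # ~0 when the rays are nearly parallel
    if abs(denom) < 1e-12:
        return None, None
    s = (b * e - c * d) / denom           # parameter along the projector ray
    t = (a * e - b * d) / denom           # parameter along the camera ray
    gap = np.linalg.norm((proj_origin + s * d1) - (cam_origin + t * d2))
    return s, gap
```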
At block 1310, processing logic determines whether all projector rays have been processed. If not all projector rays have been processed, the method may return to block 1302 for selection of another projector ray. If all projector rays have been processed (e.g., all candidate intersections and their associated distances have been determined for all projector rays), then the method may proceed to block 1312.
As a preprocessing step, processing logic may try to reduce the problem complexity by grouping candidate intersections from different cameras that agree (roughly) on the same projector ray's distance. At block 1312, processing logic may group candidate points associated with a same projector ray (e.g., candidate intersections) from different images for which the determined distance matches or approximately matches into a candidate point group (also referred to as a candidate intersection group). Two candidate intersections from different images may be said to match if they vary by less than a threshold amount (e.g., the distance between the points is less than a threshold amount, such as 2, 3 or 4 pixels along one or more axes). In one embodiment, processing logic determines a number of cameras/images associated with the candidate (e.g., number of candidate intersections in a candidate intersection group). The more cameras that agree on a candidate intersection, the higher the likelihood that the candidate intersection is an actual intersection of a projector ray and a captured point.
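The grouping preprocessing step might be sketched as follows: candidate intersections that agree, within a tolerance, on the distance along the same projector ray are merged into one candidate intersection group. The tolerance value and data layout are illustrative assumptions.

```python
# Sketch of the grouping preprocessing step: candidate intersections from
# different cameras that roughly agree on a projector ray's distance are merged
# into one candidate intersection group. Tolerance is illustrative.
from collections import defaultdict

def group_candidates(candidate_intersections, tolerance=0.2):
    """candidate_intersections: list of dicts with keys 'ray_id', 'camera_id',
    'distance'. Returns, per ray, a list of groups of agreeing intersections."""
    by_ray = defaultdict(list)
    for ci in candidate_intersections:
        by_ray[ci["ray_id"]].append(ci)

    groups_per_ray = {}
    for ray_id, cis in by_ray.items():
        cis.sort(key=lambda c: c["distance"])
        groups, current = [], [cis[0]]
        for ci in cis[1:]:
            if ci["distance"] - current[-1]["distance"] <= tolerance:
                current.append(ci)        # agrees with the running group
            else:
                groups.append(current)
                current = [ci]
        groups.append(current)
        groups_per_ray[ray_id] = groups
    return groups_per_ray
```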
Returning to
As previously mentioned, the intraoral scanner may include multiple structured light projectors. Different structured light projectors may output a different light pattern, may output light having a different wavelength, may output a light pattern having differently shaped spots, etc. Each projector ray may be associated with a structured light projector that produced that projector ray. At block 1318, processing logic may determine an index of the structured light projector for each projector ray, and may add the index to each of the candidate intersections/candidate intersection groups (candidates) associated with the projector ray.
At block 1320, processing logic generates a set of camera-agnostic properties or features for each pair of a projector ray and associated candidate point (e.g., for each candidate intersection or candidate intersection group). The set of camera-agnostic properties or features may include, for example, distance from intraoral scanner, 3D location of the intersection (triangulation point), index of the associated structured light projector, and so on. The set of camera-agnostic properties or features associated with a candidate intersection group may be used to form an input for the first trained machine learning model in embodiments.
At block 1502 of method 1500, processing logic selects a pair of a projector ray and an associated candidate point. In other words, processing logic selects a candidate (e.g., a candidate intersection or a candidate intersection group). At block 1504, processing logic selects an image (which corresponds to a camera) comprising the candidate point (e.g., the candidate point of the candidate intersection or the candidate intersection group). At block 1506, processing logic determines an epi-polar line for the projector ray for the selected image. The epi-polar line may be a line representing the projector ray's path as viewed by the selected camera. For example, line 92 in
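For illustration, the distance of a captured point from the epi-polar line can be computed from the line in implicit form a·x + b·y + c = 0; how the line coefficients are obtained from calibration is not shown here, and the names are illustrative.

```python
# Sketch: perpendicular pixel distance of a captured 2D point from the
# projector ray's epi-polar line, with the line in implicit form a*x + b*y + c = 0.
import numpy as np

def epipolar_distance(point_xy, line_abc):
    """Return the perpendicular distance from point (x, y) to line (a, b, c)."""
    x, y = point_xy
    a, b, c = line_abc
    return abs(a * x + b * y + c) / np.hypot(a, b)
```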
At block 1510, processing logic may determine an intensity of the captured point in the selected image (e.g., for the selected camera). Intensity of projected light decreases with distance. Accordingly, there is a rough correlation between distance and intensity which can be used by the first trained machine learning model to assist an estimation of a probability that a particular captured point was caused by a particular projector ray.
At block 1514, processing logic may determine a triangulation error based on a distance of the triangulation point from the camera and a distance between the camera and the structured light projector that emitted the projector ray (e.g., an origin of the projector ray). Processing logic may determine a distance between the camera and the structured light projector (or between the pixel in the camera that captured the point and the origin of the ray in the structured light projector). Processing logic may also determine the distance between the intraoral scanner and the candidate intersection. The distance may be a distance to the camera or an orthogonal distance to a line between the camera and the structured light projector. Processing logic may determine a triangulation angle for the candidate intersection based on the distance to the point of intersection and the distance between the camera that captured the point and the structured light projector that generated the point. The lower the distance to the point and the greater the distance between the camera and projector, the greater the triangulation angle and thus the greater the accuracy of the determined distance to the candidate intersection. Accordingly, the accuracy of the candidate intersection may be directly proportional to the distance between the camera and projector and may be inversely proportional to the distance to the candidate intersection. In one embodiment, an error for the determined distance to the candidate intersection may be determined using the function e = z²/(b·f), where e is an error for the determined distance to the point, z is the determined distance to the point, f is the focal length of the cameras, and b is the base (the distance between the camera and projector). In embodiments, the error associated with the triangulation angle of the point is determined, and the error is used as a feature/property to be included in an input for the candidate intersection to a trained machine learning model.
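A sketch of this error estimate, using the formula above, is shown below; units follow whatever the calibration uses, and the function name is illustrative.

```python
# Sketch of the triangulation-error estimate described above: error grows with
# the square of the distance and shrinks with a larger camera-projector baseline
# and a longer focal length.
def triangulation_error(z, baseline, focal_length):
    """e = z**2 / (baseline * focal_length)"""
    return z ** 2 / (baseline * focal_length)
```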
At block 1516, processing logic determines a spot size of the captured point in the image that is associated with the candidate. In embodiments, spot size is a function of distance from the intraoral scanner. The size of a spot or other structured light feature generally increases with distance from the intraoral scanner. Processing logic may use calibration data indicating the approximate spot/feature size that should be detected at various distances. Alternatively, such information may be derived by the machine learning model during training of the machine learning model. For example, spots projected onto a surface at a first, relatively close distance may have a first general spot size, spots projected onto a surface at a second distance that is greater than the first distance may have a second, larger general spot size, and spots projected onto a surface at a third distance that is greater than the second distance may have a third, even larger general spot size.
At block 1518, processing logic may determine a color of the surface at an intersection of the projector ray and the candidate point for the image based on one or more nearest white light (e.g., color) images generated by the camera. The intraoral scanner may alternate between generation of 2D images while structured light of one or a few wavelengths is projected and generation of 2D images while white light is projected. 2D images captured during projection of white light provide color information. A pixel or pixels associated with the captured point in the selected image may be determined. The color for the same pixel or pixels in a white light 2D image generated by the camera before and/or after generation of the image using structured light may also be determined. An assumption may be made that the intraoral scanner did not move, or moved only minimally, between capture of the structured light in the image and capture of the white light in a subsequent and/or previous image. Accordingly, the color of the pixel(s) in the white light image may be assigned to the candidate point for the selected image (and to the candidate comprising the candidate point). The color information may be used to help determine the probability of a candidate that includes the candidate point. For example, if a candidate includes candidate points from multiple cameras, but those cameras disagree on the color of the surface at the intersection, this may lower confidence in the candidate.
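A hedged sketch of how such a color lookup and cross-camera color check might be implemented, assuming the spot's pixel location in the structured light image is already known; the helper names and the small averaging window are illustrative choices, not taken from the specification.

```python
import numpy as np

def spot_color_from_white_light(white_light_image: np.ndarray, pixel_yx: tuple,
                                window: int = 1) -> np.ndarray:
    """Mean RGB color around the spot's pixel location in the nearest white-light
    frame, assuming negligible scanner motion between the two frames."""
    y, x = pixel_yx
    patch = white_light_image[max(y - window, 0):y + window + 1,
                              max(x - window, 0):x + window + 1]
    return patch.reshape(-1, 3).mean(axis=0)

def color_disagreement(colors_per_camera: list) -> float:
    """Spread of per-camera colors for the same candidate; a large value may
    lower confidence in the candidate."""
    colors = np.stack(colors_per_camera)
    return float(np.linalg.norm(colors.max(axis=0) - colors.min(axis=0)))
```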
At block 1520, processing logic may determine a difference between a distance associated with the candidate point (e.g., a distance of the candidate intersection or candidate intersection group that includes the candidate point) and the average distance for the candidate intersection group (also referred to as candidate point group). If a candidate point is part of a candidate intersection group and its distance of intersection is very close to the average distance of intersection to the candidate intersection group, then the confidence for the candidate intersection group may increase. However, if a candidate point is part of a candidate intersection group and its distance of intersection is not as close to the average distance of intersection for the candidate intersection group, then the confidence for the candidate intersection group may decrease.
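A minimal sketch of this feature, assuming the per-point intersection distances of a candidate intersection group are available as a list (the function name is hypothetical):

```python
def distance_from_group_mean(candidate_distance: float, group_distances: list) -> float:
    """Absolute difference between a candidate point's intersection distance and
    the mean intersection distance of its candidate intersection group; a small
    value supports the group, a large value argues against it."""
    mean_distance = sum(group_distances) / len(group_distances)
    return abs(candidate_distance - mean_distance)
```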
At block 1522, processing logic generates a set of camera-specific properties or features for the pair of projector ray and associated candidate point (or candidate point group). In other words, processing logic may generate a set of camera-specific properties or features for a candidate intersection or candidate intersection group.
At block 1524, processing logic determines whether all images in the candidate point group or candidate intersection group have been processed. If not all images in the candidate point group or candidate intersection group have been processed, the method returns to block 1504 and another image is selected for the candidate intersection group. If all images have been processed for the candidate intersection group, the method continues to block 1526.
At block 1526, processing logic determines whether all pairs of projector rays and candidate points/candidate point groups (e.g., whether all candidate intersections or candidate intersection groups) have been processed. If so, the method ends. If not, the method returns to block 1502 and another candidate intersection or candidate intersection group is selected.
After information associated with each particular candidate (e.g., each combination of a possible intersection of a projector ray and a candidate point from one or more images) is processed by the first trained machine learning model (e.g., at block 1220 of method 1200), probabilities of the projector ray having caused each of the candidate points (e.g., probability scores of each candidate) may be provided.
Once probabilities are determined for candidates (e.g., candidate intersections and/or candidate intersection groups), processing logic may use that probability information to select candidates (e.g., to select which candidate points in images were caused by which projector rays). In some embodiments, such a selection process is performed using a second trained machine learning model, such as at block 1275 of
At block 1805 of method 1800, processing logic selects a projector ray. At block 1810, processing logic determines one or more additional projector rays that are proximate to the selected projector ray.
Returning to
At block 1820 processing logic determines distances and probabilities for one or more (e.g., n) top candidates (e.g., each including a candidate point from one or more images) for each of the additional projector rays, where there may be up to k additional projector rays considered. Each top candidate may correspond to a candidate having a highest to nth highest probability. Such information may be in the form of candidates comprising the candidate points and associated probabilities as output by a trained machine learning model.
At block 1825, processing logic generates a tensor comprising the distances and probabilities determined at block 1820. Assuming that the n top candidates are selected for each of the k+1 rays (the selected ray and its k neighboring rays), the tensor may be a (k+1)×n×2 tensor. Different rays can have different numbers of candidates; in embodiments the tensor input into the trained machine learning model is standardized, such that a set number of highest scoring candidates is used regardless of the number of candidates for a given projector ray. The value of n may be selected based on a tradeoff between accuracy and performance (e.g., CPU load). At block 1830, processing logic inputs the tensor into a trained machine learning model (e.g., a neural network). The output of the trained machine learning model may be an updated probability for one or more candidates of the selected projector ray. Alternatively, the output may be a weight (e.g., an adjustment factor) to apply to the probability scores for one or more candidates associated with the projector ray.
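A minimal sketch of how such a standardized (k+1)×n×2 tensor might be assembled; zero-padding for rays with fewer than n candidates is an assumption, not taken from the specification.

```python
import numpy as np

def build_neighborhood_tensor(candidates_per_ray: list, n: int) -> np.ndarray:
    """Build a (k+1) x n x 2 tensor of (distance, probability) pairs.

    candidates_per_ray is a list of k+1 lists (the selected ray first, then its
    k neighbors); each inner list holds (distance, probability) tuples. Rays with
    fewer than n candidates are zero-padded so the shape is fixed."""
    k_plus_1 = len(candidates_per_ray)
    tensor = np.zeros((k_plus_1, n, 2), dtype=np.float32)
    for i, candidates in enumerate(candidates_per_ray):
        top = sorted(candidates, key=lambda c: c[1], reverse=True)[:n]
        for j, (distance, probability) in enumerate(top):
            tensor[i, j, 0] = distance
            tensor[i, j, 1] = probability
    return tensor
```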
At block 1830, processing logic determines whether or not all projector rays have been processed. If there are remaining projector rays that have not yet been processed, the method returns to block 1805 and a new projector ray is selected. If all projector rays have been processed, the method ends.
At block 1905 of method 1900, processing logic selects a projector ray. At block 1910, processing logic determines a 3D coordinate (or just a distance value) for the selected projector ray in one or more prior frames.
At block 1915, processing logic selects a candidate point for the selected projector ray (e.g., a candidate intersection for the projector ray). At block 1920, processing logic updates the probability for the candidate point for the projector ray in the current frame based on a difference between a first distance value of the 3D coordinate for the projector ray in the prior frame(s) and the distance value associated with the candidate point (e.g., candidate intersection) in the current frame. In one embodiment, the candidate intersection score is updated according to the following normal distribution:
Where μ is the previous estimated ray distance and x is the current candidate's ray distance. In embodiments, the candidate score is not updated if the ray was not solved in the previous frame.
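The normal distribution referenced above is not reproduced in the text. One common form consistent with the stated variables is a Gaussian weight w(x) = exp(−(x − μ)²/(2σ²)) applied to the candidate's score; the sketch below uses that form, with σ treated as a tuning parameter that the text does not specify.

```python
import math

def temporal_score_weight(x: float, mu: float, sigma: float = 0.5) -> float:
    """Gaussian weight based on how far the candidate's ray distance x is from
    the previously estimated ray distance mu. The normalization and the value of
    sigma are assumptions; the text only states that a normal distribution in x
    with mean mu is used."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# e.g.: updated_score = score * temporal_score_weight(candidate_distance, prior_distance)
```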
At block 1925, processing logic determines whether all candidate points (candidate intersections) for the selected projector ray have been processed. If there are remaining candidates that have not yet been processed, the method returns to block 1915 and a new candidate is selected. If all candidate points for the projector ray have been processed, the method continues to block 1930.
At block 1930, processing logic determines whether or not all projector rays have been processed. If there are remaining projector rays that have not yet been processed, the method returns to block 1905 and a new projector ray is selected. If all projector rays have been processed, the method ends.
Once processing logic has determined a probability score for each candidate (e.g., for each candidate intersection or each candidate intersection group) of a given ray, processing logic needs to choose the correct candidate intersection as the actual intersection of the projector ray with an imaged surface (if there is any). The correct candidate is not necessarily the candidate that has the highest probability score. Based on the manner in which the first machine learning model that outputs probability scores for candidate intersections is trained, the predicted probability scores should provide, for each candidate, a probability of that candidate being the correct solution. The purpose of candidate selection is therefore to choose, from the entire set of options and combinations, the best combination of candidates. This amounts to a global optimization of selecting candidate intersections that in the aggregate have the highest combined probability score. However, the probability score for any given individual projector ray may not be maximized.
Solving the entire correspondence problem all at once is quite difficult. Additionally, candidates may be candidate intersection groups (e.g., groups of 2D spots), and the spots in any given group may participate in several different candidate intersection groups. Accordingly, in embodiments processing logic constrains each spot to participate in just one candidate. That is, once processing logic chooses a specific candidate, processing logic eliminates all of the spots of that candidate from the rest of the unsolved candidates. To do so, processing logic solves candidate selection in a greedy manner in which it prioritizes candidates according to their score, and then updates the rest of the candidates once a candidate has been selected (e.g., once a distance for a projector ray is solved and/or a correspondence between a projector ray and a candidate point is determined).
In an embodiment, the full set of spots, candidates, and projector rays can be summarized in two linked tables that associate spots (or captured points) with projector rays via candidates (e.g., candidate intersection groups and/or candidate intersections) as an intermediate step, such that (Rays↔Candidates↔Spots).
In one embodiment, at block 2005 processing logic generates a first table or list of projector rays. The first list may comprise, for each projector ray, one or more candidates associated with the projector ray and their associated probability scores. At block 2010, processing logic generates a second list or table of the plurality of points or spots. The second list may comprise, for each point or spot of the plurality of points or spots, one or more candidates associated with the point or spot. The pair of tables may be as follows:
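As an illustration of the two linked tables, a minimal sketch using hypothetical ray, spot, and candidate identifiers (the actual tables are built from the detected spots and calibration data):

```python
# Hypothetical identifiers used only for illustration: rays r0/r1, spots s0/s1/s2,
# candidates c0..c2. Each candidate carries a probability score.
ray_table = {            # Rays -> Candidates (with probability scores)
    "r0": {"c0": 0.93, "c1": 0.41},
    "r1": {"c2": 0.88},
}
spot_table = {           # Spots -> Candidates the spot participates in
    "s0": ["c0"],
    "s1": ["c0", "c2"],
    "s2": ["c1", "c2"],
}
```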
Returning to
At block 2020, processing logic generates an input for a trained machine learning model (e.g., for the second trained machine learning model used at block 1225 of method 1200). The generated input may include the candidate and its associated probability score, as well as one or more additional candidates and their associated probability scores. The one or more additional candidates may be those candidates for the projector ray of the candidate having a next highest probability score and/or those candidates for the captured point or spot of the candidate having a next highest probability score.
In the example of
In one embodiment, the input associated with a candidate that is prepared for the machine learning model includes n highest probability scores associated with the projector ray of the candidate and n highest probability scores associated with the captured point or spot of the candidate. In one embodiment n is equal to two, which may result in four values being included in the input, where two of the values are the same (e.g., two instances of the probability score of the candidate intersection). Three different examples of inputs for the machine learning model are provided in
Returning to
At block 2028, processing logic may determine a 3D coordinate for the captured point (e.g., spot) of the selected candidate.
At block 2030, processing logic removes candidate intersections for the point/spot of the selected candidate intersection from the second table or list and removes the candidate intersections for the projector ray of the selected candidate intersection from the first table or list.
At block 2035, processing logic determines whether any remaining candidates have at least a threshold probability score. If so, the method returns to block 2015, and a next highest probability candidate is selected. If no remaining candidates have at least the threshold probability score, the method continues to block 2040. At block 2040, processing logic determines whether the probability threshold is currently set to a lowest possible setting for the probability threshold. If so, the method ends. If the current probability threshold is not set to the lowest setting for the probability threshold, the method continues to block 2025. At block 2025, processing logic lowers the probability threshold. The method then returns to block 2015, and a candidate intersection having a next highest probability score is selected.
Processing logic may solve the correspondence in a descending order starting from the most confident candidates, and may determine candidate selection for candidates associated with the highest probability scores until there are no remaining candidates with scores meeting an initial probability threshold. Processing logic may then incrementally lower the probability threshold, and repeat the process one or more times using the lowered probability threshold. This process may be repeated one or more additional times until the probability threshold has been lowered to a minimum setting for the probability threshold. Proceeding in this fashion ensures that selections are initially made with the highest confidence, which eliminates some possibilities and may increase the accuracy of selection for further candidates that might have lower probability scores. Each time processing logic solves some candidates, it updates the lists (removing the solved candidates from the lists).
To keep computation as fast as possible, in one embodiment processing logic divides the range down to the minimal allowed threshold into three evenly spaced thresholds (e.g., 97.5%, 95%, 92.5%).
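A hedged sketch of the greedy selection with descending thresholds described above, assuming a candidate-centric representation in which each candidate id maps to its projector ray, its participating spots, and its probability score (this data layout is an assumption equivalent to the two linked tables):

```python
def greedy_candidate_selection(candidates: dict,
                               thresholds=(0.975, 0.95, 0.925)) -> dict:
    """Greedy selection sketch. `candidates` maps candidate_id -> (ray_id,
    spot_ids, probability). At each threshold, the highest-probability eligible
    candidate is chosen; competing candidates sharing its ray or any of its
    spots are then removed, mirroring the list updates described above."""
    remaining = dict(candidates)
    solved = {}   # ray_id -> chosen candidate_id
    for threshold in thresholds:
        while True:
            eligible = [(p, cid) for cid, (_, _, p) in remaining.items() if p >= threshold]
            if not eligible:
                break
            _, best = max(eligible)
            ray, spots, _ = remaining.pop(best)
            solved[ray] = best
            # Enforce the one-ray / one-spot rule.
            remaining = {cid: v for cid, v in remaining.items()
                         if v[0] != ray and not (set(v[1]) & set(spots))}
    return solved
```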
In one embodiment, at block 2415 processing logic performs candidate generation for projector rays of blue spots, and at block 2435 processing logic performs candidate generation for green spots. Candidate generation may be performed by using calibration data to match up captured spots in captured 2D images with projected spots (each corresponding to a projector ray). Spot detection yields many candidate intersections of projected spots and captured spots.
Returning to
Returning to
Returning again to
At block 2460, processing logic may add information for unsolved projector rays from one or more prior frames, adjusting the probability scores for candidates of those projector rays based on distance information for the projector rays in the previous frame(s). After adjusting the probability scores, processing logic may again run the candidate selection algorithms to solve for more candidates. At block 2465 the set of 3D points is updated based on the additional selected candidates.
At the end of method 2400 there may not be solved intersections for all projector rays. Such projector rays with unsolved intersections (and thus unsolved distances/depths) may not be discarded. Enough solutions may be determined to generate a point cloud that has enough properties/features to register and stitch to a 3D surface. Once such registration and stitching is performed, then there may be additional 3D surface data that may be used to help determine the distance/depth information for the previously unsolved projector rays. As further scans are generated and information from those scans is added to the 3D surface, the 3D surface may accumulate data for more points on the surface, further improving the data that can be used to help solve the correspondence problem for previously unsolved projector rays. For example, the 3D coordinates of nearby points may be used to adjust the probability scores for candidates associated with projector rays, after which a candidate selection may be made.
At block 2810, the intraoral scanner is driven to capture, using one or more cameras, a plurality of images of at least a portion of the light pattern projected onto the dental site. Each camera of the plurality of cameras may capture a distinct image of the dental site at the same time or at approximately the same time, where each of the images comprises a plurality of captured points of at least the portion of the light pattern.
At block 2815, processing logic determines, for each projector ray of the plurality of projector rays of the projected light pattern, one or more candidate points in the images that might have been caused by the projector ray. A candidate intersection may be determined for each pairing of a projector ray and a candidate point of an image.
At block 2820, processing logic divides the projector rays into a first subset of projector rays having a first wavelength and a second subset of projector rays having a second wavelength.
At block 2822, processing logic processes properties/features for each candidate intersection using a trained machine learning model. The trained machine learning model generates one or more outputs containing probability information for each candidate intersection. In one embodiment, different machine learning models are trained for processing candidates for the first and second subsets. A first machine learning model may be trained to process the first subset (e.g., may be trained to process information for rays having the first wavelength) and a second machine learning model may be trained to process the second subset (e.g., may be trained to process information for rays having the second wavelength).
At block 2825, processing logic uses a second trained machine learning model to select candidate intersections for the first subset. At block 2828, processing logic uses a second trained machine learning model to select candidate intersections for the second subset.
At block 2830, processing logic determines depth information (and 3D coordinates) for at least some of the plurality of captured points in the captured images based on the selected candidate intersections.
At block 2835, processing logic may determine one or more projector rays for which candidate points from the first subset and/or second subset have not been selected, and may combine information for the first subset and the second subset. At block 2840, processing logic may then determine 3D coordinates for at least some of the remaining points in the first subset and/or the second subset using the combined information.
At block 2915, processing logic determines candidate pairings of structured light features captured in the one or more images with projector rays of the structured light pattern. For each projector ray of the plurality of projector rays of the projected light pattern, processing logic may determine one or more candidate pairings each including a structured light feature (e.g., a point of the structured light pattern in the image) and a projector ray. Such candidate pairings may also be referred to as candidate intersections.
At block 2920, processing logic determines, for each candidate pairing, a probability that the structured light feature of the candidate pairing corresponds to (e.g., was caused by) the projector ray of the candidate pairing. Such probabilities may be determined as described elsewhere herein (e.g., optionally using one or more trained machine learning models and/or based on one or more properties associated with candidate pairings).
At block 2925, processing logic determines 3D coordinates of at least a subset of the structured light features in the images by selecting candidate pairings based at least in part on the determined probabilities. Such selection of candidate pairings (e.g., of candidate points/features) may be determined as described elsewhere herein (e.g., optionally using one or more trained machine learning models). Each candidate pairing represents an intersection of a captured structured light feature with a projector ray on a surface of an imaged object at a known depth or distance from the intraoral scanner. Accordingly, if a captured point/feature is determined to have been caused by a particular projector ray (based on selection of a candidate pairing), then the depth or distance from the intraoral scanner can be determined for the structured light feature of the candidate pairing.
At block 2945, processing logic determines candidate pairings of structured light features captured in the one or more images with projector rays of the structured light pattern. For each projector ray of the plurality of projector rays of the projected light pattern, processing logic may determine one or more candidate pairings each including a structured light feature (e.g., a point of the structured light pattern in the image) and a projector ray.
Returning to
At block 2955, processing logic selects one or more candidate intersections (candidate pairings) having the highest probabilities. Such selection of candidate intersections may be performed as described elsewhere herein (e.g., optionally using one or more trained machine learning models).
Once one or more candidate intersections (candidate pairings) are selected, other candidate intersections (candidate pairings) may be eliminated from consideration using one or more rules. For example, each projector ray can cause at most a single point or feature on an imaged object. Similarly, each structured light feature on an imaged object can only be caused by a single projector ray. Accordingly, once a candidate intersection (candidate pairing) is selected for a projector ray, all other candidate intersections (candidate pairings) for the projector ray (e.g., each associated with another possible structured light feature) may be eliminated. Similarly, once a candidate intersection (candidate pairing) is selected for a structured light feature, all other candidate intersections (candidate pairings) for the structured light feature (e.g., associated with other projector rays) may be eliminated.
Referring back to
Returning to
In an example, referring to
Returning to
At block 2975, processing logic removes from consideration any structured light features whose candidate intersections cannot be solved with a threshold level of accuracy/confidence. As a result, data points associated with such structured light features may not be included in a 3D point cloud generated from the 2D image(s). By removing data for structured light features that cannot be solved with a sufficient level of confidence, the accuracy of the 3D point cloud can be increased, resulting in a more accurate 3D model of the scanned dental site generated from the 3D point cloud and other 3D point clouds produced during intraoral scanning. Incorrectly choosing a candidate intersection can reduce the resolution of a 3D point cloud, and ultimately of a generated 3D model of a dental site. It can therefore be beneficial to obtain as clean a set of points as possible in order to reproduce an accurate surface. Accordingly, in embodiments points having below a threshold level of confidence or accuracy may be dropped and not included in the 3D point cloud.
At block 2980, processing logic determines 3D coordinates for at least some of the plurality of points (e.g., structured light features) in the images based on the selected candidate intersections for the plurality of projector rays. The 3D coordinates may be used to generate a 3D point cloud.
It can be mathematically complex to determine ordering information of structured light features/projector rays relative to one another for a hexagonal grid pattern. Accordingly, in some embodiments one or more transformations are performed to convert the hexagonal grid pattern into a rectangular grid pattern, in which ordering of structured light features/projector rays is much easier to determine.
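One possible transformation, shown as a hedged sketch, treats the hexagonal lattice as being spanned by basis vectors (1, 0) and (1/2, √3/2) and applies the corresponding inverse mapping (a shear plus scale) to recover integer row/column indices; the specification does not state which transformation is used, so this is only an illustration.

```python
import math

def hex_to_rect(x: float, y: float, spacing: float = 1.0) -> tuple:
    """Map a point of a hexagonal lattice (basis vectors (1, 0) and (1/2, sqrt(3)/2),
    scaled by `spacing`) to rectangular (row, col) indices. The assumed lattice
    geometry is an illustrative choice."""
    row = 2.0 * y / (math.sqrt(3.0) * spacing)
    col = x / spacing - 0.5 * row
    return round(row), round(col)
```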
Returning to
Returning to
At block 3076, processing logic removes any such candidate intersections that fail to preserve the known order from consideration. Examples of such candidate intersections that fail to preserve a known order are shown in
In embodiments, the plurality of projector rays/structured light features are arranged in a first known order along a first axis (e.g., x-axis) and in a second known order along a second axis (e.g., y-axis) in the image plane. Processing logic may eliminate from consideration candidate intersections that fail to preserve either the first known order in the horizontal direction or the second known order in the vertical direction.
In embodiments, once a point (e.g., a structured light feature such as a spot) is solved for, processing logic can eliminate numerous candidate intersections that fail to maintain the correct ordering. In embodiments, up to approximately half of the remaining candidate intersections surrounding a solved point may be eliminated based on ordering.
In some embodiments, each camera may include one or more “confusion” regions, at which it is more difficult to determine whether a known order is preserved. Such confusion regions are regions at which inherent errors may cause the ordering information to be incorrect. The error regions may be one or a few pixels wide in some embodiments. In embodiments, an area around a solved point may be divided into four quadrants 3445A, 3445B, 3445C, 3445D as shown, by drawing diagonal lines through the solved point. Any candidate intersections that fall on or near the line dividing two quadrants may be in a confusion region, and may not be eliminated using ordering information in embodiments. In some embodiments, a confusion region 3450A, 3450B, 3450C, 3450D is a region around each of the lines separating quadrants. In some embodiments, a confusion region is about plus or minus 2.5% or plus or minus 5% around each line dividing quadrants.
In embodiments, ordering information for a candidate intersection may be determined based on the quadrant within which the candidate intersection falls. If the candidate intersection is in quadrant 3445A or 3445C, then ordering information may be determined to maintain ordering in the X axis in an embodiment. If the candidate intersection is in quadrant 3445B or 3445D, then ordering information may be determined to maintain ordering in the Y axis in an embodiment.
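A hedged sketch of this quadrant test, assuming the quadrants are formed by the two diagonals through the solved point and that the confusion region is a small margin around those diagonals (the margin value and function name are illustrative):

```python
def ordering_axis(solved_xy: tuple, candidate_xy: tuple,
                  confusion_fraction: float = 0.05):
    """Decide which ordering constraint (x or y) applies to a candidate
    intersection relative to a solved point, based on which diagonal quadrant it
    falls in. Candidates within `confusion_fraction` of a diagonal are treated as
    being in a confusion region and return None (no ordering constraint)."""
    dx = candidate_xy[0] - solved_xy[0]
    dy = candidate_xy[1] - solved_xy[1]
    if abs(abs(dx) - abs(dy)) <= confusion_fraction * max(abs(dx), abs(dy), 1e-9):
        return None          # on or near a diagonal: confusion region
    return "x" if abs(dx) > abs(dy) else "y"
```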
At block 3105 of method 3100, processing logic receives probabilities relating structured light features in one or more images captured by one or more cameras of an intraoral scanner with projector rays of a light pattern projected by one or more structured light projectors of the intraoral scanner. The images may have been generated as discussed earlier herein. The probabilities relating structured light features to projector rays may have been determined as described earlier herein (e.g., optionally using machine learning). For example, candidate intersections may be identified, each representing a candidate pairing of a structured light feature and a projector ray, and each candidate intersection may be associated with a probability.
At block 3110, processing logic determines 3D coordinates of a subset of structured light features by associating the subset of the structured light features with a subset of projector rays based on the received probabilities. The 3D coordinates may be determined by selecting candidate intersections that satisfy one or more selection criteria, where each candidate intersection may be associated with a 3D coordinate, as described earlier herein.
At block 3115, processing logic constrains projector ray candidates for non-associated structured light features (e.g., for structured light features for which a candidate intersection has not yet been selected) by removing one or more projector ray candidates (e.g., removing one or more candidate intersections each associated with a different projector ray) for non-associated structured light features that do not preserve order with the subset of the structured light features associated with the subset of the projector rays (e.g., that do not preserve order with an already selected candidate intersection).
At block 3120, processing logic solves, after constraining the projector ray candidates for the non-associated structured light features, for 3D coordinates of at least a subset of the non-associated structured light features by associating at least the subset of the non-associated structured light features with a subset of the non-associated projector rays. Once the projector ray candidates are constrained (e.g., one or more candidate intersections are removed from consideration), the probabilities of remaining projector ray candidates (e.g., remaining candidate intersections) may be increased to a threshold appropriate for selection. 3D coordinates may then be solved for such candidate rays in embodiments.
At block 3205 of method 3200, processing logic receives probabilities relating structured light features in one or more images captured by one or more cameras of an intraoral scanner with projector rays of a light pattern projected by one or more structured light projectors of the intraoral scanner. The images may have been generated as discussed earlier herein. The probabilities relating structured light features to projector rays may have been determined as described earlier herein (e.g., optionally using machine learning). For example, candidate intersections may be identified, each representing a candidate pairing of a structured light feature and a projector ray, and each candidate intersection may be associated with a probability.
At block 3208, processing logic solves for 3D coordinates of at least a subset of the structured light features by associating at least the subset of the structured light features with a subset of projector rays, as discussed earlier herein. For example, processing logic may select candidate intersections for one or more structured light features/points in one or more captured images.
At block 3210, processing logic determines structured light features that have no candidate pairings (e.g., no candidate intersections associating the structured light feature with a projector ray) with probabilities that are at or above a first threshold.
At block 3215, processing logic determines, for one or more structured light features, that a first candidate pairing has a first probability associating a first projector ray with the structured light feature and that a second candidate pairing has a second probability associating a second projector ray with the structured light feature. Processing logic may further determine that the first and second probabilities are each above the first threshold. Processing logic may compare the first and second probabilities to determine a delta between the first probability and the second probability. Processing logic may then determine whether the delta is above a second threshold.
At block 3220, those structured light features for which the determined delta between the probabilities of the two possible candidate pairings is less than the second threshold are removed from consideration. This ensures that all solved-for 3D coordinates have a very high accuracy (e.g., about 95%, about 99%, about 99.9%, etc.).
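A minimal sketch of the two-threshold ambiguity test described in blocks 3210–3220, assuming the per-feature candidate pairing probabilities are available as a list (the threshold values are configuration choices not specified in the text):

```python
def is_unambiguous(probabilities: list, first_threshold: float,
                   second_threshold: float) -> bool:
    """Keep a structured light feature only if its best candidate pairing clears
    the first threshold and beats the runner-up by at least the second threshold
    (the delta test described above)."""
    if not probabilities:
        return False
    ordered = sorted(probabilities, reverse=True)
    if ordered[0] < first_threshold:
        return False
    if len(ordered) > 1 and (ordered[0] - ordered[1]) < second_threshold:
        return False
    return True
```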
The example computing device 3500 includes a processing device 3502, a main memory 3504 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM), etc.), a static memory 3506 (e.g., flash memory, static random access memory (SRAM), etc.), and a secondary memory (e.g., a data storage device 3528), which communicate with each other via a bus 3508.
Processing device 3502 represents one or more general-purpose processors such as a microprocessor, central processing unit, or the like. More particularly, the processing device 3502 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 3502 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. Processing device 3502 is configured to execute the processing logic (instructions 3526) for performing operations and steps discussed herein.
The computing device 3500 may further include a network interface device 3522 for communicating with a network 3564. The computing device 3500 also may include a video display unit 3510 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 3512 (e.g., a keyboard), a cursor control device 3514 (e.g., a mouse), and a signal generation device 3520 (e.g., a speaker).
The data storage device 3528 may include a machine-readable storage medium (or more specifically a non-transitory computer-readable storage medium) 3524 on which is stored one or more sets of instructions 3526 embodying any one or more of the methodologies or functions described herein, such as instructions for intraoral scan application 3515, which may correspond to intraoral scan application 115 of
The computer-readable storage medium 3524 may also be used to store dental modeling logic 3550, which may include one or more machine learning modules, and which may perform the operations described herein above. The computer readable storage medium 3524 may also store a software library containing methods for the intraoral scan application 115. While the computer-readable storage medium 3524 is shown in an example embodiment to be a single medium, the term “computer-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable storage medium” shall also be taken to include any medium other than a carrier wave that is capable of storing or encoding a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure. The term “computer-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent upon reading and understanding the above description. Although embodiments of the present disclosure have been described with reference to specific example embodiments, it will be recognized that the disclosure is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than a restrictive sense. The scope of the disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
The present application claims priority to U.S. Provisional Patent Application No. 63/461,804, filed on Apr. 25, 2023, which is herein incorporated by reference in its entirety.