The present invention relates to a localization apparatus and method using a machine learning technique.
In recent years, machine learning techniques called deep learning have been used for an image recognition process. Among the machine learning techniques, methods described in Non-PTL 1, Non-PTL 2, Non-PTL 3, and Non-PTL 4 are available as methods for detecting specific position coordinates of an object imaged in an image.
The related art described above trains a deep learning network by using training image data in which the position coordinates desired to be detected are specified and, as teacher data (correct data), a two-dimensional Gaussian heatmap centered on those position coordinates. The trained deep learning network (called a trained learning model) then performs inference on a new image, and the position coordinates desired to be detected in the new image are detected. Since the inference result output by the trained learning model is a two-dimensional Gaussian heatmap, the coordinates of its peak value are output as the position coordinates desired to be detected. In addition, the peak value itself represents the reliability (certainty) of the inferred position coordinates.
In the related art described above, when the position coordinates desired to be detected are specified, since the two-dimensional Gaussian heatmap can be automatically generated, there is an advantage in that the teacher data can be easily generated.
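As a minimal sketch of how such teacher data could be generated automatically from specified coordinates, the following builds a two-dimensional Gaussian heatmap as a list of rows. The function name `gaussian_heatmap` and the parameter `sigma` are hypothetical and chosen only for illustration; the related art does not specify them.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Return a height x width heatmap (list of rows) peaked at (cx, cy).

    sigma is an assumed spread parameter, not a value from the source.
    """
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Teacher heatmap for a 9 x 9 training image annotated at x=4, y=3.
teacher = gaussian_heatmap(9, 9, cx=4, cy=3)
```

The peak value is 1.0 at the annotated pixel and falls off equally in every direction, which is the isotropic shape the trained model is expected to reproduce.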
When position coordinates are used to automate the operation of apparatuses such as industrial products, it is required to automatically determine whether the position coordinates inferred by a trained learning model are correct. For this purpose, it is conceivable to use the reliability.
However, in the related art, the inferred peak of the heatmap indicates both the position coordinates desired to be detected and the reliability at the same time. When the peak value is erroneous, there is therefore a possibility that an operation is continued while incorrect position coordinates or an incorrect reliability are treated as correct.
It is therefore necessary to use a scale different from the inferred peak value of the heatmap as the reliability, or to calculate the position coordinates and the reliability by using different scales.
An object of the invention is to solve the above-described problems of the related art and to provide a localization apparatus and method capable of implementing position detection with high detection performance only by providing an image and coordinates desired to be detected.
In order to solve the above-described problems of the related art, a localization apparatus according to the invention includes: a deep learning model trained by using a plurality of combinations of training image data in which a position desired to be detected on an image obtained by imaging a subject is specified and teacher image data generated by arranging a pixel group different from background of the subject with respect to the subject at the position desired to be detected on the training image data; a position coordinate calculation unit calculating position coordinates by using inference image data obtained by inputting an image of a new subject of which position coordinates are desired to be detected to the deep learning model; and a reliability calculation unit calculating reliability of the position coordinates calculated by the position coordinate calculation unit by using information about the pixel group of the inference image data output from the deep learning model.
According to the invention, since the reliability can be obtained by using a scale different from the peak value of the heatmap, the possibility that the peak value is incorrect can be determined. Furthermore, when the position coordinates or the reliability are calculated from the shape of the pixel group of the inference data, since the pixel group independent of the shape of the subject is used as the teacher data, the position coordinates or the reliability can be obtained without depending on the shape of the subject.
The invention relates to a localization apparatus including: a deep learning model trained by using training image data in which position coordinates desired to be detected are specified and teacher image data in which a pixel group representing a shape independent of a subject of the training image data is arranged at a position relative to the position coordinates desired to be detected; a position coordinate calculation unit calculating position coordinates by using inference image data output from the deep learning model, and a reliability calculation unit calculating reliability by using global information of the pixel group of the inference image data output from the deep learning model.
Hereinafter, embodiments of the invention will be described in detail with reference to the drawings. It is noted that the embodiments described below do not limit the invention according to the claims, and all of the elements described in the embodiments and combinations of the elements are not necessarily essential to solution of the invention. In addition, the embodiments disclosed in the present application also include forms in which various elements described in the embodiments are appropriately combined.
A first embodiment of the invention will be described with reference to
The deep learning model 110 is a trained learning model in which a pre-training is executed by using a plurality of pairs of training image data and teacher data. The deep learning model 110 is a fully-convolutional encoder-decoder network as described in Citation List (Non-PTLs 1 to 4). An example of the training image data and the teacher data will be described with reference to
An image 210 in
An image 220 in
A plurality of pairs of training image data and teacher data as illustrated in
The behavior of the trained deep learning model 110 will be described with reference to
The position coordinate calculation unit 120 calculates position coordinates from the inference image 320 inferred by the deep learning model 110. Specifically, by obtaining the peak having a maximum pixel value from the inference image 320, the peak (position coordinate 303) of the heatmap 304 is detected and the coordinate is output.
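The peak search described above can be sketched as a plain scan for the maximum pixel value. This is only an illustration: the function name `find_peak` and the small `inference` array are hypothetical stand-ins for the inference image 320.

```python
def find_peak(heatmap):
    """Return ((x, y), value) of the maximum pixel in a 2-D list of rows."""
    best_value = float("-inf")
    best_xy = None
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy, best_value


# Hypothetical 4 x 4 inference image with a single clear peak.
inference = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.4, 0.2, 0.0],
    [0.0, 0.2, 0.9, 0.3],
    [0.0, 0.0, 0.3, 0.1],
]
peak_xy, peak_value = find_peak(inference)
```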
The reliability calculation unit 130 calculates the reliability from the shape of the heatmap 304 formed at the peak positions output by the position coordinate calculation unit 120 from the inference image 320 inferred by the deep learning model 110. As a specific calculation method, a known technique for quantifying the difference in shape, such as a method of quantifying an isotropy of the heatmap, a method of obtaining a difference from the heatmap used for the teacher data, a method of obtaining a correlation value with the heatmap used for the teacher data, and the like, or a combination technique thereof may be used.
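One possible way to quantify the isotropy of a heatmap, given here only as a sketch, is to compare the two eigenvalues of the intensity-weighted second-moment matrix around the peak. The description does not prescribe this particular measure; the function names `isotropy` and `elliptical_gaussian` and the window `radius` are assumptions made for illustration.

```python
import math


def isotropy(heatmap, cx, cy, radius=3):
    """Ratio of minor to major second-moment eigenvalue around (cx, cy).

    Values near 1.0 mean an isotropic (round) heatmap; values near 0 mean
    an elongated one. The window radius is an assumed parameter.
    """
    h, w = len(heatmap), len(heatmap[0])
    total = sxx = syy = sxy = 0.0
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            weight = heatmap[y][x]
            dx, dy = x - cx, y - cy
            total += weight
            sxx += weight * dx * dx
            syy += weight * dy * dy
            sxy += weight * dx * dy
    if total <= 0.0:
        return 0.0
    sxx, syy, sxy = sxx / total, syy / total, sxy / total
    # Eigenvalues of the symmetric 2 x 2 matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    major, minor = half_trace + spread, half_trace - spread
    return minor / major if major > 0.0 else 0.0


def elliptical_gaussian(h, w, cx, cy, sx, sy):
    """Hypothetical test heatmap with separate spreads along x and y."""
    return [
        [
            math.exp(-((x - cx) ** 2 / (2 * sx ** 2) + (y - cy) ** 2 / (2 * sy ** 2)))
            for x in range(w)
        ]
        for y in range(h)
    ]


round_score = isotropy(elliptical_gaussian(11, 11, 5, 5, 1.5, 1.5), 5, 5)
long_score = isotropy(elliptical_gaussian(11, 11, 5, 5, 3.0, 1.0), 5, 5)
```

A round heatmap scores close to 1, while a heatmap stretched along one axis scores markedly lower, so the score could serve as a reliability that is independent of the peak value.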
Next, behavior of the localization apparatus 100 when the difficulty level of detection of the position coordinates is high will be described with reference to
An image 410 in
The position coordinate calculation unit 120 detects the peak of the heatmap 403 by obtaining the peak having a maximum pixel value from the inference image 420 and outputs the coordinates of the peak.
The reliability calculation unit 130 calculates the reliability from the shape of the heatmap 403 formed at the peak positions output by the position coordinate calculation unit 120 from the inference image 420. Since the heatmap 403 is less isotropic than the heatmap 204 used for the teacher data 220, low reliability is output. Accordingly, it can be seen that the position coordinates output by the position coordinate calculation unit 120 are not appropriate.
In the above-described embodiment, only inappropriate position coordinates are output, but the plurality of position coordinates and the corresponding reliability may be output, and appropriate position coordinates may be selected by a post-stage process (not illustrated). In addition, although the configuration of the localization apparatus 100 is changed, the output of the reliability calculation unit 130 may be input to the position coordinate calculation unit 120 so as to output the position coordinates having the highest reliability.
First, the deep learning model 110 is trained by using the plurality of pairs of the training image data and the teacher data (S501). Examples of the training image data and the teacher data are illustrated in the training image data 210 in
Next, the image of which position coordinates are desired to be detected is input to the trained deep learning model to obtain the inference image (S502). The example of the image of which position coordinates are desired to be detected is illustrated as the image 310 in
Up to N peak coordinates whose pixel values are equal to or greater than a specific threshold value are obtained from the obtained inference image (S503). The specific threshold value and N are set in advance.
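Step S503 can be sketched as a search for local maxima that clear the threshold, keeping at most N of them. The following is an illustration under assumptions: the function name `top_peaks`, the 8-neighbor definition of a local maximum, and the sample values are not taken from the description.

```python
def top_peaks(heatmap, threshold, max_peaks):
    """Return up to max_peaks entries ((x, y), value), strongest first.

    A pixel counts as a peak when its value is at least `threshold` and
    strictly greater than all of its 8 neighbors (an assumed definition;
    flat plateaus are therefore not reported).
    """
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            value = heatmap[y][x]
            if value < threshold:
                continue
            neighbors = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (nx, ny) != (x, y)
            ]
            if all(value > n for n in neighbors):
                peaks.append((value, x, y))
    peaks.sort(reverse=True)
    return [((x, y), value) for value, x, y in peaks[:max_peaks]]


# Hypothetical inference image with a true peak and a weaker false peak.
inference = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.2, 0.6, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.3],
]
candidates = top_peaks(inference, threshold=0.5, max_peaks=3)
```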
Next, the reliability is calculated from the shape of the heatmap corresponding to the obtained peak coordinates (S504). Examples of the heatmaps are illustrated as 403 and 404 in
Finally, the peak coordinates having the maximum reliability are obtained, and the peak coordinates are output (S505). At this time, reliability information corresponding to the output peak coordinates may also be output together. An output destination may be a control unit of the computer (not illustrated) (control unit that controls an external apparatus) or a display screen, or may be an external processing apparatus connected to the computer (not illustrated).
According to the present embodiment, since the reliability is calculated by using information different from the peak value of the heatmap, when the peak position is erroneously detected, the output information can indicate that the detection is likely to be erroneous.
A second embodiment of the invention will be described with reference to
The deep learning model 610 is a trained learning model in which a pre-training is executed by using a plurality of pairs of training image data and teacher data. The deep learning model 610 can perform semantic segmentation. That is, background-likeness and object-likeness for each class are calculated for each pixel, and the label corresponding to the largest calculated value is output as the pixel value. It can be said that each pixel is classified into background or an object. In some cases, as a ratio of a maximum value (or a value of a softmax function) to a total value of background-likeness and object-likeness for each class is greater, likelihood becomes higher. An example of the training image data and the teacher data will be described with reference to
The training image data 210 in
The circular
The plurality of pairs of the training image data as illustrated in
Behavior of the trained deep learning model 610 will be described with reference to
The position coordinate calculation unit 620 calculates position coordinates from the inference image 820 inferred by the deep learning model 610. Specifically, a blob having an area equal to or greater than a specific threshold value is obtained from the inference image 820, the center-of-gravity coordinates of the obtained blob are obtained, and the center-of-gravity coordinates are output. A well-known method may be used for detecting the blobs and calculating the center-of-gravity coordinates. In addition, selection may be prioritized by using the likelihood at the center-of-gravity coordinates or the average value of the likelihood over the whole blob. This likelihood value may be output in association with the corresponding center-of-gravity coordinates.
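As one example of such a well-known method, the sketch below labels connected foreground pixels with a breadth-first search and returns the area and center of gravity of each sufficiently large blob. The function name `find_blobs`, the 4-connectivity, and the sample mask are assumptions for illustration only.

```python
from collections import deque


def find_blobs(mask, min_area):
    """Return [(area, (cx, cy)), ...] for blobs with area >= min_area.

    mask is a 2-D list of 0/1 pixel labels; 4-connectivity is assumed.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one connected component starting at (x0, y0).
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(pixels) >= min_area:
                cx = sum(p[0] for p in pixels) / len(pixels)
                cy = sum(p[1] for p in pixels) / len(pixels)
                blobs.append((len(pixels), (cx, cy)))
    return blobs


# Hypothetical segmentation output: one 3 x 3 blob and one noise pixel.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
blobs = find_blobs(mask, min_area=4)
```

Sorting the returned list by area in descending order and taking the first entry would give the largest blob, should only a single candidate be needed.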
The reliability calculation unit 630 calculates the reliability from the shape of the blob 804 formed at the center-of-gravity position output by the position coordinate calculation unit 620 from the inference image 820 inferred by the deep learning model 610. As a specific calculation method, a known technique for quantifying the difference in shape, such as a method of quantifying a degree of circularity of the blob, a method of obtaining a correlation value with the blob used as the teacher data, a method of polar-coordinate-converting the blob from the center of gravity and quantifying unevenness at the boundary, and the like, or a combination technique thereof may be used.
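The idea of viewing the blob boundary in polar coordinates around the center of gravity can be sketched as follows: the distance from the center of gravity to each boundary pixel is collected, and the relative spread of those distances is used as an unevenness score. This is one possible reading rather than the method of the description; the function name `radial_unevenness` and the use of the coefficient of variation are assumptions.

```python
import math


def radial_unevenness(pixels):
    """Coefficient of variation of boundary radii around the centroid.

    pixels is a list of (x, y) tuples belonging to one blob. A value near
    0 means a nearly circular outline; larger values mean a less circular
    one. A reliability could be derived as, e.g., 1 minus this value.
    """
    if not pixels:
        raise ValueError("blob must contain at least one pixel")
    pixel_set = set(pixels)
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    # Boundary pixels have at least one 4-neighbor outside the blob.
    boundary = [
        (x, y)
        for x, y in pixels
        if any(
            (x + dx, y + dy) not in pixel_set
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
        )
    ]
    radii = [math.hypot(x - cx, y - cy) for x, y in boundary]
    mean = sum(radii) / len(radii)
    if mean == 0.0:
        return 0.0
    variance = sum((r - mean) ** 2 for r in radii) / len(radii)
    return math.sqrt(variance) / mean


# Hypothetical blobs: a disk of radius 5 and a 15 x 3 bar.
disk = [(x, y) for y in range(21) for x in range(21)
        if (x - 10) ** 2 + (y - 10) ** 2 <= 25]
bar = [(x, y) for y in range(3) for x in range(15)]
disk_score = radial_unevenness(disk)
bar_score = radial_unevenness(bar)
```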
Next, behavior of the localization apparatus 600 when the difficulty level of detection of the position coordinates is high will be described with reference to
An image 410 in
An image 920 in
The position coordinate calculation unit 620 obtains blobs having an area which is a specific threshold value or more from the inferred image 920 inferred by the deep learning model 610, obtains the center-of-gravity coordinates of the obtained blobs, and outputs the center-of-gravity coordinates.
The reliability calculation unit 630 calculates the reliability from the shape of the blob 903 formed at the center-of-gravity position output by the position coordinate calculation unit 620 from the inference image 920. Since the blob 903 has a low degree of circularity, a low reliability is output. Accordingly, it can be seen that the position coordinates output by the position coordinate calculation unit 620 are not appropriate. In the above-described embodiment, only inappropriate position coordinates are output, but the plurality of position coordinates and the corresponding reliability may be output, and appropriate position coordinates may be selected by a post-stage process (not illustrated). In addition, although the configuration of the localization apparatus is changed, the position coordinates having the highest reliability may be output by inputting the output of the reliability calculation unit to the position coordinate calculation unit.
First, the deep learning model is trained by using the plurality of pairs of the training image data and the teacher data (S1001). Examples of the training image data and the teacher data are illustrated in the training image data 210 in
Next, the inference image is obtained by inputting the image of which position coordinates are desired to be detected to the trained deep learning model 610 (S1002). The example of the image of which position coordinates are desired to be detected is illustrated as the image 310 in
Up to N blobs having an area equal to or greater than a specific threshold value are obtained from the obtained inference image, and the center-of-gravity position of each blob is obtained (S1003). The specific threshold value and N are set in advance.
Reliability is calculated from the shape of the blob corresponding to the obtained center-of-gravity coordinates (S1004). As a specific calculation method, a known technique for quantifying the difference in shape, such as a method of quantifying a degree of circularity of the blob, a method of obtaining a correlation value with the blob used as the teacher data, a method of polar-coordinate-converting the blob from the center of gravity and quantifying unevenness at the boundary, and the like, or a combination technique thereof may be used.
Finally, the center-of-gravity coordinates having the highest reliability are obtained, and the coordinates are output (S1005). At this time, the reliability corresponding to the center-of-gravity coordinates to be output may also be output.
In addition, in the above-described example, the teacher data is the circular figure, but the teacher data may be a polygon such as a quadrangle. Even in that case, the feature calculated by a well-known technique for the shape may be used as the reliability.
According to the present embodiment, since the position coordinates are obtained by calculating the center of gravity of the blobs having the same pixel values, the position coordinates can be determined uniquely, so that the position coordinates can be obtained by a relatively simple process in comparison with the method described in the first embodiment.
A modified example of the second embodiment will be described with reference to
The present modified example differs from the second embodiment in that the reliability calculation unit 630 described in the second embodiment is not provided. The deep learning model 1810 is the same as the deep learning model 610 of
Next, behavior of the localization apparatus 1800 when the difficulty level of detection of the position coordinates is high will be described with reference to
An image 410 in
The position coordinate calculation unit 1820 obtains the blob having the largest area from the inference image 1920, obtains the center-of-gravity coordinates of the obtained blob, and outputs the center-of-gravity coordinates. The present modified example uses the characteristic that the blob whose center-of-gravity coordinates are originally to be obtained generally has a larger area than other blobs that constitute noise. Accordingly, when a plurality of blobs are detected in the inference image 1920, the blob having the largest area is determined to be the blob to be detected, without calculating the reliability of each of the plurality of blobs.
First, the deep learning model 1810 is trained by using the plurality of pairs of the training image data and the teacher data (S2001). Examples of the training image data and the teacher data are the same as the training image data 210 and teacher data 720 described with reference to
Next, the inference image is obtained by inputting the image of which position coordinates are desired to be detected to the trained deep learning model (S2002). The example of the image of which position coordinates are desired to be detected is illustrated as the image 410 in
A blob having the largest area is obtained from the obtained inference image, and the center-of-gravity position of the blob is obtained (S2003). Finally, the obtained coordinates are output (S2005).
In an example in the related art, when the peak becomes flat or a plurality of peaks appear in the vicinity, complicated processing is required for detection of the peak. However, in the present modified example, there is an advantage that a simple process of obtaining the center-of-gravity coordinates of the blob is enough.
In addition, the present modified example has an advantage in that the position coordinates can be obtained without obtaining a reliability index.
A third embodiment of the invention will be described with reference to
The deep learning model 1110 is a trained learning model in which a pre-training is executed by using a plurality of pairs of training image data and teacher data. The deep learning model 1110 can also perform the semantic segmentation. The deep learning model 1110 performs inference a plurality of times while randomly applying dropout to internal nodes and determines the label of each pixel by using the average value thereof. In addition, it is assumed that an image whose pixel values are the variations across the plurality of dropouts is also output.
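The averaging over a plurality of dropout passes can be sketched as below. The sketch assumes that the model has already produced one object-likeness map per pass; how those maps are produced is outside its scope. The function name `summarize_passes`, the use of per-pixel variance as the variation, and the 0.5 label threshold are assumptions for illustration.

```python
def summarize_passes(prob_maps, label_threshold=0.5):
    """Combine per-pass probability maps into labels, mean and variance.

    prob_maps is a list of 2-D lists, one per dropout pass, each holding
    an object-likeness value in [0, 1] per pixel. The label threshold of
    0.5 is an assumed value.
    """
    n = len(prob_maps)
    h, w = len(prob_maps[0]), len(prob_maps[0][0])
    mean = [
        [sum(m[y][x] for m in prob_maps) / n for x in range(w)]
        for y in range(h)
    ]
    variance = [
        [sum((m[y][x] - mean[y][x]) ** 2 for m in prob_maps) / n for x in range(w)]
        for y in range(h)
    ]
    labels = [
        [1 if mean[y][x] >= label_threshold else 0 for x in range(w)]
        for y in range(h)
    ]
    return labels, mean, variance


# Three hypothetical dropout passes over a 1 x 3 image.
passes = [
    [[1.0, 0.25, 0.75]],
    [[1.0, 0.0, 0.25]],
    [[1.0, 0.125, 1.0]],
]
labels, mean, variance = summarize_passes(passes)
```

Pixels on which the passes agree have a variance of zero, while pixels on which they disagree have a larger variance, which is the information the reliability calculation can draw on.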
In the second embodiment, the plurality of pairs of training image data 210 and teacher data 720 as described with reference to
Behavior of the trained deep learning model 1110 will be described with reference to
An image 1210 in
An image 1220 in
The position coordinate calculation unit 1120 in the localization apparatus 1100 illustrated in
On the other hand, the reliability calculation unit 1130 in the localization apparatus 1100 calculates the reliability from the shape of the blob 1203 formed at the center-of-gravity position output by the position coordinate calculation unit 1120 from the inference image 1220 in
In the above-described embodiment, only inappropriate position coordinates are output, but the plurality of position coordinates and the corresponding reliability may be output, and appropriate position coordinates may be selected by a post-stage process (not illustrated). Further, although the configuration of the localization apparatus 1100 is changed, the output of the reliability calculation unit 1130 may be input to the position coordinate calculation unit 1120 to output the position coordinates with highest reliability.
First, the deep learning model is trained by using the plurality of pairs of the training image data and the teacher data (S1301). This deep learning model performs the semantic segmentation by randomly performing dropout on the internal nodes a plurality of times and determines the label of the pixel by using the average value. In addition, it is assumed that the image in which the pixel values are variations in the semantic segmentation by means of the plurality of dropouts is also output. Examples of the training image data and the teacher data are illustrated as the training image data 210 in
Next, the image of which position coordinates are desired to be detected is input to the trained deep learning model 1110. The semantic segmentation is performed a plurality of times while randomly performing dropout, and two inference images are obtained: one in which the pixel labels are determined by using the average value, and one in which the pixel values are the variations in the semantic segmentation across the plurality of dropouts (S1302). The example of the image of which position coordinates are desired to be detected is illustrated as image 410 in
Up to N blobs having an area equal to or greater than a specific threshold value are obtained from the inference image in which pixel labels are determined by using the average value obtained by performing the semantic segmentation by randomly performing dropout a plurality of times, and the center-of-gravity position of each blob is obtained (S1303). The specific threshold value and N are set in advance.
The blob corresponding to the obtained center-of-gravity coordinates is specified from the inference image of which pixel value is a variation in the semantic segmentation by means of the plurality of dropouts, and the reliability is calculated from the shape of the blob (S1304). As a specific calculation method, a known technique for quantifying the difference in shape, such as a method of quantifying a degree of circularity of the blob, a method of obtaining a correlation value with the blob used as the teacher data, and a method of polar-coordinate-converting the blob from the center of gravity and quantifying unevenness at the boundary, or a combination technique thereof may be used.
Finally, the center-of-gravity coordinates having the maximum reliability are obtained, and the center-of-gravity coordinates are output (S1305). At this time, the reliability corresponding to the center-of-gravity coordinates to be output may also be output.
In addition, in the above-described example, the teacher data is a circular figure, but the teacher data may be a polygon such as a quadrangle. Even in that case, features calculated by a well-known technique for a shape may be used as the reliability.
According to the present embodiment, since the center-of-gravity coordinates and the reliability are obtained by using the average value and variation in the semantic segmentation by means of the plurality of dropouts, the robustness against noise becomes stronger (higher), and the position coordinates can be obtained with higher reliability.
Next, an example in which the localization apparatus 100, 600, 1100, or 1800 described in the first to third embodiments is applied to the actual apparatus will be described.
The sample machining apparatus 1400 is connected to a display apparatus 1410 and an input/output apparatus 1411 via the bus 1406. The sample machining apparatus 1400 may be wired or wirelessly connected to the display apparatus 1410 and the input/output apparatus 1411. It is noted that, although the display apparatus 1410 and the input/output apparatus 1411 are provided outside the sample machining apparatus 1400 in
In order to pick up a portion of a specific sample, the sample machining apparatus 1400 has a function of conveying the specific sample into the apparatus and allowing the tip of the needle (not illustrated) embedded in the needle unit 1401 to approach a specific location of the sample.
A processing flow for allowing the tip of the needle (not illustrated) embedded in the needle unit 1401 to approach the specific location of the sample will be described with reference to
First, a target sample (not illustrated) is conveyed into the sample machining apparatus 1400 by using the conveying unit 1402 (S1501). Next, a machining unit (not illustrated) performs machining on the specific location of the conveyed sample (S1502). The specific location of the machined sample is detected by using a localization program generated in advance in the computer 1403 and the image of the sample imaged by the image sensor 1404 (S1503). This localization program may be a localization program corresponding to the flowchart illustrated in
In a case where the localization program corresponding to the flowchart illustrated in
When continuing the operation, the tip of the needle is detected by using the localization program prepared in advance in the computer 1403 and the image of the needle imaged by the image sensor 1404 (S1504). As this localization program, the localization program corresponding to the flowchart illustrated in
Herein, when the calculated reliability is lower than a predetermined threshold value, the operator may be notified by the alarm. In addition, when the position coordinates calculated by using another localization program are compared and both exist within the allowable range, the operation may be continued. When the operator is notified by the alarm, the operator may refer to the image of the image sensor displayed on the display apparatus and the position coordinates, so that the operation may be determined to be continued or interrupted. When continuing the operation, the needle is moved by controlling the needle unit 1401 so that the position coordinates of the detected specific location of the sample and the position coordinates of the tip of the needle are allowed to approach each other (S1505).
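The decision between continuing the operation and notifying the operator can be sketched as follows. This is one possible reading of the passage, in which agreement with a second localization program is treated as a fallback when the reliability is low. The function name `decide`, its parameters, and the returned strings are hypothetical.

```python
import math


def decide(position, reliability, min_reliability,
           other_position=None, tolerance=0.0):
    """Return "continue" or "alarm" for one localization result.

    position and other_position are (x, y) tuples. other_position is the
    result of a second, independent localization program, if available.
    tolerance is the allowable distance between the two results.
    """
    if reliability >= min_reliability:
        return "continue"
    if other_position is not None:
        distance = math.hypot(position[0] - other_position[0],
                              position[1] - other_position[1])
        if distance <= tolerance:
            # Both programs agree within the allowable range.
            return "continue"
    return "alarm"


high = decide((10.0, 12.0), reliability=0.9, min_reliability=0.5)
low = decide((10.0, 12.0), reliability=0.2, min_reliability=0.5)
agree = decide((10.0, 12.0), 0.2, 0.5, other_position=(11.0, 12.0), tolerance=2.0)
differ = decide((10.0, 12.0), 0.2, 0.5, other_position=(20.0, 12.0), tolerance=2.0)
```

When "alarm" is returned, the final choice to continue or interrupt remains with the operator, as described above.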
According to the present embodiment, since the center-of-gravity coordinates and the reliability are obtained by the same method as described in the third embodiment, the robustness against noise becomes stronger, and the tip position of the needle can be obtained with higher reliability.
Next, another example in which the localization apparatus 100, 600, 1100, or 1800 described in the first to third embodiments is applied to an actual apparatus will be described.
The sample inspection apparatus 1600 is connected to a display apparatus 1610 and an input/output apparatus 1611 via the bus 1606. The sample inspection apparatus 1600 may be wired or wirelessly connected to the display apparatus 1610 and the input/output apparatus 1611. It is noted that, although the display apparatus 1610 and the input/output apparatus 1611 are provided outside the sample inspection apparatus 1600 in
A processing flow for detecting a specific location of a sample will be described with reference to
First, a target sample (not illustrated) is conveyed into the sample inspection apparatus 1600 by using the conveying unit 1602 (S1701). Next, the specific location of the sample is detected by using the localization program stored in advance in the computer 1603 and the image of the sample imaged by the image sensor 1604 or the charged particle beam unit 1601 (S1702). This localization program is a localization program corresponding to the flowchart described in the third embodiment with reference to
Herein, when the calculated reliability is lower than a predetermined threshold value, an operator may be notified by the alarm. In addition, when the position coordinates calculated by using another localization program stored in the computer 1603 are compared and both exist within the allowable range, the operation may be continued. When the operator is notified by the alarm, the operator refers to the image detected by the image sensor 1604 displayed on the display apparatus 1610 and the position coordinates of the center-of-gravity obtained in S1303 of the flowchart illustrated in
According to the present embodiment, since the center-of-gravity coordinates and the reliability are obtained by the same method as described in the third embodiment, the robustness against noise becomes stronger, and the specific location of the sample can be obtained with higher reliability.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/012616 | 3/25/2021 | WO |