The present disclosure relates to an inspection method of a target object using an image.
A technique for carrying out an inspection for an abnormality using an image of a product has been proposed. For example, Patent Document 1 discloses an appearance inspection device which captures images of a tablet, which is the product to be inspected, from three directions, and performs a shape inspection, a color inspection, and a crack inspection on the images in the three directions to determine whether or not the tablet is acceptable.
In the appearance inspection device of Patent Document 1, the same inspection is performed in the three directions with respect to the image of the object to be inspected. However, in reality, abnormalities tend to vary from surface to surface or from part to part of each product to be inspected.
It is one object of the present disclosure to provide an inspection device capable of performing an abnormality determination in an image recognition method suitable for each plane or each portion of a product to be inspected.
According to an example aspect of the present disclosure, there is provided a learning device including:
According to another example aspect of the present disclosure, there is provided a learning method including:
According to a further example aspect of the present disclosure, there is provided a recording medium storing a program, the program causing a computer to perform a process including:
According to a further example aspect of the present disclosure, there is provided an inspection device including:
According to a still further example aspect of the present disclosure, there is provided an inspection method including:
According to a yet still further example aspect of the present disclosure, there is provided a recording medium storing a program, the program causing a computer to perform a process including:
According to the present disclosure, it becomes possible to perform an abnormality determination in an image recognition method suitable for each plane or each portion of an inspection object.
In the following, example embodiments will be described with reference to the accompanying drawings.
[Overview of Inspection]
First, an overview of inspection by an inspection device 100 according to the present disclosure will be described.
A light 3 and a high-speed camera 4 are disposed above the rail 2. Depending on the shape of the object and the type of abnormality to be detected, a plurality of lights with various intensities and lighting ranges are installed. Especially in the case of a small object such as the tablet 5, since the type, the degree, the position, and the like of an abnormality are not known in advance, several lights may be used to capture images under various lighting conditions.
The high-speed camera 4 captures images of the tablet 5 under illumination at high speed and outputs the captured images to the inspection device 100. By capturing images with the high-speed camera 4 while the tablet 5 is moving, it is possible to capture a minute abnormality existing on the tablet 5 without missing it. Specifically, the abnormality which occurs on the tablet may be the adhesion of a hair, a minute crack, or the like.
The tablet 5 is reversed by a reversing mechanism provided on the rail 2. In
[Hardware Configuration]
The interface 11 inputs and outputs data to and from an external device. Specifically, the image sequence (temporal images) of the tablet captured by the camera 4 is input through the interface 11. Also, a determination result of the abnormality generated by the inspection device 100 is output to the external device through the interface 11.
The processor 12 corresponds to one or more processors each being a computer such as a CPU (Central Processing Unit), and controls the entire inspection device 100 by executing programs prepared in advance. The processor 12 may be a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array). The processor 12 executes an inspection process to be described later.
The memory 13 is formed by a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The memory 13 is also used as a working memory during executions of various processes by the processor 12.
The recording medium 14 is a non-volatile and non-transitory recording medium such as a disk-shaped recording medium or a semiconductor memory and is formed to be detachable with respect to the inspection device 100. The recording medium 14 records various programs executed by the processor 12. When the inspection device 100 performs the various processes, the programs recorded on the recording medium 14 are loaded into the memory 13 and executed by the processor 12.
The DB 15 stores the image sequence input from the camera 4 as needed. The input section 16 includes a keyboard, a mouse, and the like for a user to perform instructions and input. The display section 17 is formed by, for instance, a liquid crystal display, and displays a recognition result of the target object.
[Functional Configuration]
The target object region extraction unit 21 extracts a region of the tablet 5 which is a target object to be inspected from the input image sequence, and outputs an image sequence (hereinafter, referred to as the “target object image sequence”) indicating the region of the target object. The target object image sequence corresponds to a set of images in which only a portion of the target object is extracted from the images captured by the camera 4 as illustrated in
The group discrimination unit 22 uses a group discrimination model to classify a plurality of frame images forming the target object image sequence. The group discrimination unit 22 outputs the image sequence of each group acquired by the classification to the corresponding recognizer 23. Each of the recognizers 23 uses its recognition model to perform image recognition with respect to the image sequence of the corresponding group, and determines whether or not an abnormality exists. Each of the recognizers 23 outputs the determination result to the integration unit 24. Note that the learning of the group discrimination model used by the group discrimination unit 22 and of the recognition models used by the recognizers 23 will be described later.
The integration unit 24 generates a final determination result of the tablet 5 based on the determination results output by the plurality of recognizers 23. For instance, in a case where each of the recognizers 23 performs a binary decision (0: normal, 1: abnormal) on the normality or abnormality of the tablet 5, the integration unit 24 uses a max function and decides the final determination result so as to indicate the abnormality when even one of the determination results of the three groups indicates the abnormality. Moreover, in a case where each of the recognizers 23 outputs a degree of abnormality of the tablet 5 in a range of "0" to "1", the integration unit 24 outputs, as the final determination result, the degree of abnormality of the image having the highest degree of abnormality by using the max function.
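For instance, the integration by the max function can be illustrated by the following minimal sketch in Python, where the function and variable names are hypothetical and not part of the actual implementation.

```python
# Minimal sketch of the integration step of the integration unit 24
# (hypothetical names; each value is assumed to come from one recognizer 23).

def integrate_binary(results):
    """Binary case: each result is 0 (normal) or 1 (abnormal).
    The final result is abnormal if even one group result is abnormal."""
    return max(results)

def integrate_degree(degrees):
    """Degree case: each value is a degree of abnormality in [0, 1].
    The highest degree among all groups becomes the final result."""
    return max(degrees)

# Usage example with three groups
print(integrate_binary([0, 1, 0]))         # -> 1 (abnormal)
print(integrate_degree([0.1, 0.85, 0.3]))  # -> 0.85
```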
In the above-described configuration, the target object region extraction unit 21 corresponds to an example of an acquisition means, the group discrimination unit 22 corresponds to an example of a group discrimination means, the recognizers 23 correspond to an example of a recognition means, and the integration unit 24 corresponds to an example of an integration means.
[Process of Each Part]
(Acquisition of Target Object Image Sequence)
(Learning of Group Discrimination Unit and Recognizer)
In
As illustrated in
Thus, since the need for two recognition models arises, a group discrimination model G is trained to classify all samples S into two groups. In detail, the group discrimination model G is trained using the correct answer sample group k1 and the incorrect answer sample group k1′. When the training of the group discrimination model G is completed, all samples S are input to the acquired group discrimination model G, and the incorrect answer sample group k1″ is acquired. Since the aforementioned incorrect answer sample group k1′ is a result of inference by the recognition model M1 and does not necessarily match the discrimination result of the group discrimination model G, the incorrect answer sample group acquired by the group discrimination model G is distinguished as k1″.
Since the group discrimination model G which classifies all samples S into two groups has thus been acquired, a second recognition model is generated next. In detail, the incorrect answer sample group k1″ is used to train a recognition model M2 different from the recognition model M1. Then, inference is performed by inputting the incorrect answer sample group k1 into the acquired recognition model M2, to acquire the correct answer sample group k2 and the incorrect answer sample group k2′ of the recognition model M2.
Here, the incorrect answer sample group k2′ is a sample group for which it is difficult to correctly determine the abnormality even with the added recognition model M2. In other words, the recognition models M1 and M2 are not sufficient to correctly determine all samples S, and an additional recognition model is needed. Therefore, the number of necessary recognition models is next increased by one to N=3, and the group discrimination model G is trained to classify all samples S into three groups.
Thus, the above-described loop process, in which the group discrimination model is updated and a recognition model is added, is repeated until any of the following end conditions is satisfied.
Accordingly, it becomes possible to perform the abnormality determination using an appropriate number of the recognizers 23 in accordance with the target object image sequence generated by capturing.
Note that the method for updating the group discrimination model G as the number of recognition models increases depends on the type of the group discrimination model G. For instance, in a case where a k-means or an SVM (Support Vector Machine) is used as the group discrimination model G, a model is added for the update. In addition, in a case where a KD-tree is used as the group discrimination model G, the number of groups is increased and re-learning is performed.
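As a non-limiting sketch of such updates, assuming scikit-learn as the library and using toy placeholder data (not the actual inspection data), the two cases described above could be written as follows.

```python
# Sketch of updating the group discrimination model G as the number of
# recognition models grows (assumes scikit-learn; toy placeholder data).
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KDTree

group_models = []  # k-means / SVM case: one model is added per update

def add_svm_group_model(features, incorrect_mask):
    # Add a binary model separating the newly found incorrect-answer samples;
    # the collection of added models together realizes the k-group split.
    model = SVC(kernel="rbf")
    model.fit(features, incorrect_mask.astype(int))
    group_models.append(model)
    return group_models

def rebuild_kdtree_group_model(group_centers):
    # KD-tree case: re-learn by rebuilding the tree over the increased number
    # of group centers, so each image is assigned to the nearest group.
    return KDTree(np.asarray(group_centers))

# Toy usage: 50 feature vectors, of which the first 10 are still incorrect
feats = np.random.default_rng(1).normal(size=(50, 8))
mask = np.zeros(50, dtype=bool)
mask[:10] = True
add_svm_group_model(feats, mask)
tree = rebuild_kdtree_group_model(feats[:3])  # e.g. three group centers
```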
In actual training, the number of samples belonging to the incorrect answer sample group decreases as the above loop process is repeated. Therefore, in order to train the group discrimination model and the recognition model to be added, it is necessary to secure a sufficient number of training samples by data augmentation. Moreover, since the iterations of the loop process cause an imbalance between the numbers of data in the correct and incorrect answer sample groups, it is desirable to eliminate the imbalance by oversampling or undersampling as necessary.
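The balancing mentioned above can be illustrated with a simple random oversampling sketch; the names and file paths below are hypothetical, and actual implementations may use other augmentation or undersampling techniques.

```python
# Sketch of balancing the correct/incorrect answer sample groups before
# training an added recognition model (simple random oversampling).
import random

def oversample_to_match(minority, majority, seed=0):
    """Randomly duplicate minority samples until both groups have equal size."""
    rng = random.Random(seed)
    balanced = list(minority)
    while len(balanced) < len(majority):
        balanced.append(rng.choice(minority))
    return balanced, list(majority)

# Usage: if only 40 incorrect-answer images remain against 400 correct ones,
# the incorrect group is duplicated up to 400 samples before training.
incorrect = [f"ng_{i}.png" for i in range(40)]
correct = [f"ok_{i}.png" for i in range(400)]
balanced_incorrect, _ = oversample_to_match(incorrect, correct)
print(len(balanced_incorrect))  # -> 400
```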
When the incorrect answer images are acquired, a group learning unit 42 increments the iteration number k of the loop process by one (k=k+1), trains the group discrimination model so as to perform the classification into k(=2) groups, and generates the group discrimination unit parameters P2.
In the second step of the loop process (k=2), the group discrimination unit parameters P2 acquired in the first step are set to the group discrimination unit 22. The group discrimination unit 22 performs the inference of dividing the target object image sequence 32 into two groups. Accordingly, incorrect answer estimation images 35 (corresponding to the aforementioned incorrect answer sample group k1) are acquired. The recognizer learning unit 41 trains the second recognizer 23 using the incorrect answer estimation images 35 and the input label sequence 33, and generates the recognizer parameters P1 corresponding to the second recognizer 23. Moreover, the target object image sequence 32 is input to the second recognizer 23 acquired by the training to perform the inference, and the correct/incorrect answer images 34 are acquired. The correct answer images correspond to the aforementioned correct answer sample group k2, and the incorrect answer images correspond to the aforementioned incorrect answer sample group k2′.
When incorrect answer images are acquired, the group learning unit 42 further increments the iteration number k of the loop process by one, trains the group discrimination model so as to perform grouping into k(=3) groups, and generates the group discrimination unit parameters P2. Next, in the same manner as in the second step, a process of a third step (k=3) is executed. Accordingly, the loop process is iteratively executed until the aforementioned end condition is satisfied, and the recognition models and the group discrimination model are obtained based on the recognizer parameters P1 and the group discrimination unit parameters P2 at the end of the process.
In the above-described configuration, the target object region extraction unit 21 corresponds to an example of an acquisition means, and the recognizer learning unit 41 and the group learning unit 42 correspond to an example of a learning means.
Next, the k(=1)th recognizer 23 performs the inference on the target object image sequence 32 (step S13). The recognizer learning unit 41 trains the k-th recognizer 23 using the inference result of the k-th recognizer 23 and the input label, and acquires the recognizer parameters P1. Moreover, the recognizer learning unit 41 performs the inference on the target object image sequence 32 by the trained recognizer 23, and outputs the correct/incorrect answer images 34 (step S14).
Next, the group learning unit 42 increments the iteration number k by 1 (k=k+1), trains the group discrimination model so as to discriminate k groups using the correct/incorrect answer images 34, and acquires the group discrimination unit parameters P2 (step S15).
Next, the group discrimination unit 22 extracts the features from the target object image sequence 32, performs the group discrimination, and outputs images classified into the k groups (step S16). Next, the k-th recognizer 23 performs the inference with respect to the k-th group of images (that is, the images estimated as the incorrect answer images of the (k−1)th recognizer 23) (step S17). Next, the recognizer learning unit 41 trains the k-th recognizer 23 using the inference result of the k-th recognizer 23 and the input label, and acquires the recognizer parameters P1. The recognizer learning unit 41 performs the inference on the target object image sequence 32 by the k-th recognizer 23 after the training, and outputs the correct/incorrect answer images 34 (step S18).
Next, the group learning unit 42 increments k by 1 (k=k+1), trains the group discrimination model using the correct/incorrect answer images 34 so as to discriminate the k groups, and acquires the group discrimination unit parameters P2 (step S19).
Next, it is determined whether or not the above-described end condition is satisfied (step S20). When the end condition is not satisfied (step S20: No), the learning process goes back to step S16. On the other hand, when the end condition is satisfied (step S20: Yes), the learning process is terminated.
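To summarize steps S11 to S20, the following is a minimal runnable sketch of the alternating loop, in which the feature vectors, the toy labels, the SVM recognizers, and the k-means group discrimination model are merely stand-ins and not the actual implementation.

```python
# Minimal runnable sketch of the alternating training loop (steps S11 to S20).
# Stand-ins: image features are plain vectors, each recognizer is a small SVM,
# and the group discrimination model is k-means (all hypothetical).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                 # stand-in features of the target object images
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)   # 0: normal, 1: abnormal (toy labels)

recognizers = []
k = 1
rec = SVC().fit(X, y)                  # steps S13/S14: train the first recognizer
recognizers.append(rec)
correct = rec.predict(X) == y          # correct/incorrect answer images

for _ in range(5):                     # loop until an end condition is met
    if correct.all():                  # e.g. no incorrect answers remain
        break
    k += 1                             # steps S15/S19: re-train the group model with k groups
    group_model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    groups = group_model.predict(X)    # step S16: classify all samples into k groups
    # steps S17/S18: the group holding most still-incorrect samples gets a new recognizer
    hard_group = np.bincount(groups[~correct], minlength=k).argmax()
    hard = groups == hard_group
    if len(np.unique(y[hard])) < 2:    # guard for the toy data: the SVM needs two classes
        break
    rec = SVC().fit(X[hard], y[hard])
    recognizers.append(rec)
    correct[hard] = rec.predict(X[hard]) == y[hard]
```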
(At Inspection (at Inference))
The target object region extraction unit 21 generates the target object image sequence 36 based on the input image sequence, and outputs the target object image sequence 36 to the group discrimination unit 22. The group discrimination unit 22 classifies images of the target object image sequence 36 into N groups, and outputs the classified images to the N recognizers 23. Each of the N recognizers 23 determines the presence or absence of an abnormality in each input image, and outputs the determination result to the integration unit 24. The integration unit 24 integrates the input determination results and outputs the final determination result.
Next, the group discrimination unit 22 extracts the features from the target object image sequence 36, performs the discrimination into the N groups, and outputs the image sequence for each of the N groups (step S33). Subsequently, the N recognizers respectively perform the abnormality determination based on the image sequences of the corresponding groups (step S34). After that, the integration unit 24 performs a final determination by integrating the respective determination results of the recognizers 23 for each group (step S35). Accordingly, the inspection process is terminated.
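The inference flow of steps S33 to S35 can be sketched as follows, assuming a trained group discrimination model and N trained recognizers with scikit-learn style predict() interfaces; the function and argument names are hypothetical stand-ins for the actual models.

```python
# Minimal sketch of the inspection (inference) flow of steps S33 to S35.
import numpy as np

def inspect(target_images, group_model, recognizers):
    """target_images: array of feature vectors of the target object image sequence."""
    groups = group_model.predict(target_images)       # step S33: split into N groups
    degrees = []
    for g, recognizer in enumerate(recognizers):       # step S34: per-group determination
        group_images = target_images[groups == g]
        if len(group_images) == 0:
            continue                                   # empty group: see the note below
        degrees.append(recognizer.predict(group_images).max())
    return max(degrees)                                # step S35: integrate with max
```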
Note that the group discrimination unit 22 classifies the images of the target object image sequence into a plurality of groups; however, in a case where there is a group among the plurality of groups to which not even one captured image belongs, the inspection device 100 may determine that the inspection is insufficient, and may output that determination as the final determination result.
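This check could be realized, for instance, by the following small sketch (hypothetical names).

```python
# Sketch of the "insufficient inspection" check: if some group received no
# captured image at all, the device reports that the inspection is insufficient.
import numpy as np

def check_coverage(groups, n_groups):
    present = set(np.unique(groups).tolist())
    missing = [g for g in range(n_groups) if g not in present]
    if missing:
        return f"inspection insufficient: no image for group(s) {missing}"
    return None
```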
As described above, according to the first example embodiment, the training of each recognition model of the recognizers 23 and the training of the group discrimination model of the group discrimination unit 22 are alternately repeated to generate a necessary number of recognition models and the group discrimination model for classifying the images of the image sequence into the necessary number of groups. Therefore, it is possible to improve accuracy of the abnormality determination using an appropriate number of recognizers.
Next, a second example embodiment will be described. In the second example embodiment, each of a group discrimination unit and recognizers is formed by a neural network (NN: Neural Network) to perform end-to-end learning. Accordingly, the group discrimination unit and the recognizers form a single unit, and the learning is performed consistently.
[Hardware Configuration]
A hardware configuration of an inspection device 200 of the second example embodiment is the same as that of the first example embodiment, and explanations thereof will be omitted.
[Functional Configuration]
The target object images are also input to the post-stage NN. The post-stage NN corresponds to a recognizer which performs the abnormality determination, and has a relatively heavy structure. The post-stage NN extracts the features of each of the images from the input target object image sequence, performs the abnormality determination, and outputs degrees of abnormality. The degrees of abnormality output by the post-stage NN are integrated by the integration unit 24, and the integrated degree is output as the final determination result.
As the post-stage NN, for instance, a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network) can be used. In a case where the post-stage NN is the CNN, the weights output by the pre-stage NN are multiplied by a loss value calculated for each image to perform the learning. In a case where the post-stage NN is the RNN, the weights output by the pre-stage NN are multiplied by temporal features to perform the learning. In a case where the pre-stage NN outputs the weights by the pixel unit, the post-stage NN may be designed to further multiply the feature map of an intermediate layer by the weights. In this case, it is necessary to resize the weights output by the pre-stage NN in accordance with the size of the feature map.
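The CNN case can be sketched as follows, assuming PyTorch as the framework and toy network shapes that are hypothetical rather than the actual network design: the pre-stage NN outputs one weight per image, and the per-image loss of the post-stage NN is multiplied by that weight so that both networks are trained together.

```python
# Sketch of the CNN case: the lightweight pre-stage NN outputs one weight per
# image, and the per-image loss of the heavier post-stage NN is multiplied by
# that weight so that both networks are trained together (toy shapes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreStageNN(nn.Module):
    """Light pre-stage network: outputs one weight per input image."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, stride=2, padding=1)
        self.head = nn.Linear(8, 1)

    def forward(self, x):
        h = F.adaptive_avg_pool2d(F.relu(self.conv(x)), 1).flatten(1)
        return torch.sigmoid(self.head(h)).squeeze(1)    # shape: (batch,)

class PostStageNN(nn.Module):
    """Heavier post-stage network: per-image abnormality logit."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        h = F.relu(self.conv1(x))
        h = F.relu(self.conv2(h))
        h = F.adaptive_avg_pool2d(h, 1).flatten(1)
        return self.head(h).squeeze(1)                    # shape: (batch,)

pre, post = PreStageNN(), PostStageNN()
images = torch.randn(4, 3, 64, 64)           # a toy target object image sequence
labels = torch.tensor([0., 1., 0., 0.])      # 0: normal, 1: abnormal

weights = pre(images)                         # per-image weights from the pre-stage NN
logits = post(images)
per_image_loss = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
loss = (weights * per_image_loss).mean()      # per-image loss weighted, then averaged
loss.backward()                               # trains the pre-stage and post-stage NNs together
# For pixel-unit weights, F.interpolate could resize a weight map to the size
# of an intermediate feature map before multiplying it in.
```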
As described above, the NN is formed by the pre-stage NN and the post-stage NN, and by simultaneously and consistently training the pre-stage NN and the post-stage NN, the weighting of the pre-stage NN is learned so as to increase the recognition accuracy of the post-stage NN. At that time, the weight for an image which is difficult to recognize is expected to increase, improving the recognition ability for such difficult images.
In the second example embodiment, the post-stage NN corresponding to the recognizer is regarded as a single NN; however, by using the weighting as a machine-learning attention mechanism, different parameter sets of the post-stage NN are functionally used as a plurality of recognition models.
[At Learning]
(Configuration at Learning)
The recognizer 52 performs the abnormality determination by extracting the features of the target object image sequence 32 based on the weights output by the weighting unit 51, and outputs the degree of abnormality. The learning unit 53 performs the learning of the weighting unit 51 and the recognizer 52 based on the input label sequence 33 and the degree of abnormality output by the recognizer 52, and generates weighting unit parameters P3 and recognizer parameters P4.
(Learning Process)
Next, the weighting unit 51 outputs the weights by the image unit (or the pixel unit) for the target object image sequence 32 by using the pre-stage NN (step S43). Next, the recognizer 52 performs the inference by the post-stage NN described above (step S44). In a case where the NN 50 is the RNN, the recognizer 52 weights the temporal features using the weights output in step S43.
Next, the learning unit 53 performs the learning of the weighting unit 51 and the recognizer 52 using the inference result of the recognizer 52 and the input label, to acquire the weighting unit parameters P3 and the recognizer parameters P4 (step S45). Note that in a case where the NN 50 is the CNN, the learning unit 53 weights the loss by using the weights output at step S43. After that, the learning process is terminated.
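For the RNN case mentioned in steps S43 to S45, the weighting of the temporal features can be sketched as follows (assuming PyTorch; the shapes and names are hypothetical).

```python
# Sketch of the RNN case: per-frame weights from the pre-stage NN scale the
# temporal features before the recurrent recognizer aggregates them (toy shapes).
import torch
import torch.nn as nn

frames = 8
features = torch.randn(1, frames, 64)    # per-frame features of one image sequence
weights = torch.rand(1, frames, 1)        # per-frame weights from the pre-stage NN

gru = nn.GRU(input_size=64, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)

weighted = features * weights             # weight the temporal features
_, h = gru(weighted)                      # recurrent aggregation over time
degree = torch.sigmoid(head(h[-1]))       # degree of abnormality for the sequence
print(degree)
```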
[At Inspection (at Inference)]
(Configuration at Inspection)
The target object image sequence 36 formed by the images acquired by capturing the actual inspection object is input to the weighting unit 51. The weighting unit 51 generates weights by the image unit (or the pixel unit) based on the target object image sequence 36, and outputs the weights to the recognizer 52. The recognizer 52 performs the abnormality determination using the target object image sequence 36 and the weights, and outputs each degree of abnormality as the determination result to the integration unit 24. The integration unit 24 integrates the input degrees of abnormality and outputs a final determination result.
(Inspection Process)
Next, the weighting unit 51 outputs weights by the image unit (or the pixel unit) for the target object image sequence 36 (step S53). Next, the recognizer 52 performs the abnormality determination of the target object image sequence 36 (step S54). In a case where the NN 50 is the RNN, the recognizer 52 weights the temporal features with the weights output in step S53. Subsequently, the integration unit 24 performs the final determination by integrating the degrees of abnormality output by the recognizer 52 (step S55). After that, the process is terminated.
As described above, in the second example embodiment, the group discrimination unit and the recognizer are formed by the NN and are simultaneously and consistently learned. In detail, the group discrimination unit is formed by the pre-stage NN, and the recognizer is formed by the post-stage NN. Therefore, it is possible to perform the group discrimination with the pre-stage NN and to perform the abnormality determination with the post-stage NN, whose different parameter sets are functionally used as a plurality of recognition models.
A part or all of the example embodiments described above may also be described as the following supplementary notes, but not limited thereto.
(Supplementary Note 1)
A learning device comprising:
(Supplementary Note 2)
The learning device according to supplementary note 1, wherein the learning means alternately repeats training of the group discrimination model and training of the recognition models.
(Supplementary Note 3)
The learning device according to supplementary note 2, wherein the learning means increases a number of the recognition models in a case where inference results by the recognition models include an incorrect answer.
(Supplementary Note 4)
The learning device according to supplementary note 2 or 3, wherein the learning means terminates in any of a case in which a number of iterations of the training of the group discrimination model and the training of the recognition models reaches a predetermined number, a case in which accuracy of the recognition models reaches a predetermined accuracy, and a case in which a range of improvement in the accuracy of the recognition models is lower than or equal to a predetermined threshold.
(Supplementary Note 5)
The learning device according to any one of supplementary notes 1 to 4, wherein the recognition models determine an abnormality of the target object included in the captured images.
(Supplementary Note 6)
The learning device according to supplementary note 1, wherein
(Supplementary Note 7)
The learning device according to supplementary note 6, wherein
(Supplementary Note 8)
A learning method comprising:
(Supplementary Note 9)
A recording medium storing a program, the program causing a computer to perform a process comprising:
(Supplementary Note 10)
An inspection device comprising:
(Supplementary Note 11)
An inspection method comprising:
(Supplementary Note 12)
A recording medium storing a program, the program causing a computer to perform a process comprising:
While the disclosure has been described with reference to the example embodiments and examples, the disclosure is not limited to the above example embodiments and examples. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the claims.