This patent application is based on and claims priority pursuant to 35 U.S.C. § 119(a) to Japanese Patent Application Nos. 2017-006524, filed on Jan. 18, 2017, and 2017-104704, filed on May 26, 2017, in the Japan Patent Office, the entire disclosure of which is hereby incorporated by reference herein.
The present invention relates to an information processing apparatus, an information processing method, and a recording medium.
A machine learning algorithm is known, which determines a state of certain data. The machine learning can be widely applied, such as to determine whether or not a subject appearing in an image is a person, identify a scene appearing in an image, identify a sentence, identify audio, and the like.
The machine learning algorithm is also used to identify a material or to inspect a defect in the material. For example, an abnormality such as a defect can be detected using multi-resolution analysis. In such case, a defect or a non-defect is determined using a plurality of images having respective resolutions, based on detection of an abnormal quantity or comparison with a feature value prepared in advance. However, in order to determine a degree of abnormality, it has been necessary to adjust various parameters since there is no statistically meaningful threshold value.
Example embodiments of the present invention include an information processing apparatus, which: applies a plurality of different spatial filters to one input image to generate a plurality of filtered images; calculates, for each of a plurality of pixels included in each of the plurality of filtered images, a score indicating a value determined by a difference from a corresponding one of a plurality of model groups, using the plurality of model groups that respectively correspond to the plurality of filtered images and each include one or more models having a parameter representing a target shape; calculates an integrated score indicating a result of integrating the scores of the respective plurality of pixels corresponding to each other over the plurality of filtered images; and determines an abnormality based on the integrated score.
Example embodiments of the present invention include an information processing method and a recording medium storing an information processing program.
A more complete appreciation of the disclosure and many of the attendant advantages and features thereof can be readily obtained and understood from the following detailed description with reference to the accompanying drawings, wherein:
The accompanying drawings are intended to depict embodiments of the present invention and should not be interpreted to limit the scope thereof. The accompanying drawings are not to be considered as drawn to scale unless explicitly noted.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the present invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In describing embodiments illustrated in the drawings, specific terminology is employed for the sake of clarity. However, the disclosure of this specification is not intended to be limited to the specific terminology so selected and it is to be understood that each specific element includes all technical equivalents that have a similar function, operate in a similar manner, and achieve a similar result.
Hereinafter, embodiments of an information processing apparatus, an information processing method, and a program according to the present invention will be described in detail referring to the accompanying drawings.
In the example of
The CPU 131 controls the entire operation of the information processing apparatus 130. The CPU 131 uses a predetermined area of the RAM 133 as a work area to execute a program stored in the ROM 132, the memory 134, or the like, and implements various functions of the information processing apparatus 130. Specific contents of the functions of the information processing apparatus 130 will be described later.
The ROM 132 is a non-volatile memory (non-rewritable memory) for storing the program, various setting information, and the like related to the information processing apparatus 130.
The RAM 133 is a storage device such as synchronous dynamic random access memory (SDRAM), and functions as the work area of the CPU 131 or a buffer memory, etc.
The memory 134 is an auxiliary storage device such as a hard disk drive (HDD). The input device 135 accepts operation by a user, and may be implemented by a keyboard, a mouse, touch panel, etc. The display 136 displays various types of information relating to the information processing apparatus 130, and includes a liquid crystal display, for example.
The device I/F 137 is an interface for connecting the information processing apparatus 130 to the camera 120 and the output apparatus 140, for example. The communication I/F 138 is an interface for connecting the information processing apparatus 130 with a network such as the Internet. For example, instead of the device I/F 137, the information processing apparatus 130 may be connected to the camera 120 and the output apparatus 140 via the communication I/F 138.
The obtainer 201 obtains the captured image from the camera 120. The generator 202 applies a plurality of different spatial filters to one input image (the captured image obtained by the obtainer 201) to generate a plurality of filtered images. Here, the number of filters to be applied is 12, but it is not limited to 12. In a case where 12 filters are used as in the present embodiment, for example, 3 scales×4 directions (0 degree direction, 45 degree direction, 90 degree direction, 135 degree direction) can be set. For example, as filter coefficients, four filter matrices represented by equations 1 to 4 below can be used.
As for the scales, the above-described filters are applied to the equal magnification image and also to images obtained by reducing the input image to ¼ and ⅛ of its size, and the results for the reduced images are restored to equal magnification images, so that a total of 12 filtered images are obtained. Here, the captured image to which none of the above-described filters is applied (exceptionally, the captured image may be regarded as one aspect of the filtered image) is added, and abnormality detection is performed using 13 filtered images in this example.
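The generation of the 13 images described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the 3×3 kernels are hypothetical stand-ins (the coefficients of equations 1 to 4 are not reproduced here), the reduction is assumed to apply per side using block averaging, the restoration is assumed to use nearest-neighbour replication, and all function names are illustrative.

```python
# Hypothetical 3x3 directional kernels (stand-ins, NOT the coefficients of equations 1 to 4).
KERNELS = {
    0:   [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]],
    45:  [[-1, -1, 2], [-1, 2, -1], [2, -1, -1]],
    90:  [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]],
    135: [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]],
}


def convolve3x3(img, kernel):
    """Apply a 3x3 kernel to a 2-D list of pixel values, replicating edge pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def reduce_image(img, f):
    """Shrink by factor f per side with block averaging (sizes assumed divisible by f)."""
    h, w = len(img) // f, len(img[0]) // f
    return [[sum(img[y * f + j][x * f + i] for j in range(f) for i in range(f)) / (f * f)
             for x in range(w)] for y in range(h)]


def restore_image(img, f):
    """Enlarge by factor f per side with nearest-neighbour replication."""
    return [[img[y // f][x // f] for x in range(len(img[0]) * f)]
            for y in range(len(img) * f)]


def filter_bank(img, factors=(1, 4, 8)):
    """Return the unfiltered image plus 3 scales x 4 directions = 13 images."""
    outputs = [img]  # the captured image itself is kept as the 13th image
    for f in factors:
        small = img if f == 1 else reduce_image(img, f)
        for angle in (0, 45, 90, 135):
            filtered = convolve3x3(small, KERNELS[angle])
            outputs.append(filtered if f == 1 else restore_image(filtered, f))
    return outputs
```

All 13 outputs have the same pixel dimensions as the input, which is what allows pixels to be matched across filtered images in the later integration step.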
The learning device 203 learns the image of the non-defective object 110 based on a plurality of images having no defect (non-defective images) prepared in advance. More specifically, the learning device 203 learns a model group including one or more models each having a parameter representing a target shape for each of the plurality of spatial filters. In this example, the learning device 203 learns 13 model groups corresponding one-to-one with 13 filtered images. In this example, a mean (pixel mean value) μ(x, y) and a variance (pixel variance value) σ2(x, y) of pixel values of the respective plurality of non-defective images are adopted as parameters, but the parameters are not limited to the mean and the variance.
Hereinafter, the learning method will be described. Here, an example will be described of learning a model group including a plurality (K) of models, for any one spatial filter. As for the models, it is assumed that a plurality of pixels has a normal distribution, and it is a premise that there is the plurality (K) of such models. An image that can be observed is assumed to be an image generated from any of the plurality of models. Here, it is unknown which model the image is observed from, and the model is a hidden variable. When learning is completed (when estimation of the model is completed), the pixel mean value μ(x, y) and the pixel variance value σ2(x, y) are obtained for each of the plurality of models. During inspection, presence or absence of the defect in the captured image is determined based on the parameters for each of the plurality of models.
Since the hidden variable and the model parameters cannot be determined at the same time, here, the learning is performed by using an expectation-maximization (EM) algorithm, which is effective for estimating the model parameters in a case where there is the hidden variable. Hereinafter, an E step and an M step of the EM algorithm will be separately described.
The learning is started from the E step first. The learning device 203 calculates a Z-score for each of the K models, for each of the plurality of pixels included in the input image (n non-defective images (non-defective filtered images) corresponding to the one spatial filter). A Z-score Znk(x, y) for the k-th model of a pixel (x, y) included in the n-th input image is represented by equation 5 below. In equation 5 below, In(x, y) is a pixel value (luminance value) of the pixel (x, y) of the n-th input image. In addition, μk(x, y) and σ2k(x, y) are parameters of the pixel (x, y) of the k-th model. More specifically, μk(x, y) is a pixel mean value of the pixel (x, y) of the k-th model, and σ2k(x, y) is a pixel variance value of the pixel (x, y) of the k-th model. The Z-score is a value representing an outlier in a case where a normal distribution is assumed for a probability model.
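Written out from the definitions in this paragraph, a Z-score of this kind has the form below. This is a sketch consistent with the description (the difference from the model mean divided by the model standard deviation, assumed here to be signed), not a verbatim copy of equation 5. The subscripted symbols correspond to In(x, y), μk(x, y), and σ2k(x, y) in the text.

```latex
Z_{nk}(x, y) = \frac{I_n(x, y) - \mu_k(x, y)}{\sqrt{\sigma_k^{2}(x, y)}}
```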
Next, the learning device 203 obtains a probability enk that the n-th input image In corresponds to the k-th model k. The probability enk can be represented by equation 6 below. In this example, the learning device 203 assigns the Z-score to an equation of a standard normal distribution with a mean of 0 and a variance of 1, to calculate a probability density for each pixel, and calculates a product of the probability density for each pixel or each area to obtain a joint probability. Note that, X and Y in equation 6 below are the number of pixels in the lateral direction and the vertical direction of the input image, respectively. Here, the probability enk is obtained from a distribution of the pixel values over the entire image, not for each pixel. As a result, it is possible to appropriately obtain a probability of which model corresponds to the input image while viewing the entire parts of the input image.
Next, the learning device 203 uses the above-described probability enk to obtain a burden ratio γnk corresponding to an expected value of occurrence of the input image from any model. The burden ratio γnk can be obtained by equation 7 below. N represents the total number of input images, and K represents the number of models. The above is the content of the E step.
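The E step described above can be summarized in the following form. These expressions are a sketch derived from the description (a product of standard normal densities over all X × Y pixels, followed by normalization over the K models), not a verbatim copy of equations 6 and 7; the summation index j is introduced here for illustration.

```latex
e_{nk} = \prod_{x=1}^{X} \prod_{y=1}^{Y} \frac{1}{\sqrt{2\pi}}
         \exp\!\left( -\frac{Z_{nk}(x, y)^{2}}{2} \right),
\qquad
\gamma_{nk} = \frac{e_{nk}}{\sum_{j=1}^{K} e_{nj}}
```

Because the product runs over every pixel, it underflows quickly for images of realistic size, so an implementation would typically accumulate the logarithm of e_{nk} and normalize after subtracting the largest value.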
After the E step is completed, the learning device 203 estimates the parameters of each model in the M step. More specifically, the learning device 203 performs weighting with the burden ratio γnk to obtain the pixel mean value μk(x, y) of the pixels of the k-th model. In this example, the pixel mean value μk(x, y) can be obtained by equation 8 below.
The learning device 203 performs weighting with the burden ratio γnk to obtain the pixel variance value σ2k(x, y) of the pixels of the k-th model. In this example, the pixel variance value σ2k(x, y) can be obtained by equation 9 below.
Nk in equations 8 and 9 is obtained by equation 10 below.
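The M step described above corresponds to weighted estimates of the following form. This is a sketch consistent with the description (weighting by the burden ratio and normalizing by Nk), not a verbatim copy of equations 8 to 10.

```latex
N_k = \sum_{n=1}^{N} \gamma_{nk},
\qquad
\mu_k(x, y) = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk}\, I_n(x, y),
\qquad
\sigma_k^{2}(x, y) = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk}
                     \bigl( I_n(x, y) - \mu_k(x, y) \bigr)^{2}
```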
After the above-described M step is completed, the learning device 203 returns to the E step again, and repeats the processing until a parameter variation from the previous time becomes equal to or less than a threshold value (until a convergence condition is satisfied). By repeating the above-described E step and M step, the parameters of the model can be estimated in a state where there is the hidden variable. Note that, as for an initial value, for example, μk(x, y) may be a random number and σ2k(x, y) may be 1, or the pixel value of the input image may be set as μk(x, y) as the initial value of the model, in a case where it is clear which model the input image is desired to be classified into, so that the user can classify types while viewing the input image. As described above, the learning device 203 learns the parameters (μk(x, y), σ2k(x, y)) of the K models.
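The alternation of the E step and the M step until convergence can be sketched as below. This is a minimal illustration under stated assumptions, not the embodiment's implementation: images are flattened into lists of pixel values, the joint probability is accumulated in the log domain to avoid underflow, and `var_floor`, `tol`, and `max_iter` are hypothetical parameters added for numerical safety. The function name `em_fit` is illustrative.

```python
import math


def em_fit(images, init_means, var_floor=1e-6, tol=1e-6, max_iter=100):
    """Fit K per-pixel Gaussian models to N flattened images with EM.

    images:     list of N lists, each holding P pixel values.
    init_means: list of K lists of P initial pixel mean values.
    Returns (means, variances, gammas).
    """
    n_pix, k_mod = len(images[0]), len(init_means)
    means = [list(m) for m in init_means]
    variances = [[1.0] * n_pix for _ in range(k_mod)]  # initial variance of 1
    gammas = []
    for _ in range(max_iter):
        # E step: log joint probability per (image, model), then burden ratio.
        gammas = []
        for img in images:
            log_e = []
            for k in range(k_mod):
                s = 0.0
                for p in range(n_pix):
                    z = (img[p] - means[k][p]) / math.sqrt(variances[k][p])
                    # log of the standard normal density evaluated at the Z-score
                    s += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
                log_e.append(s)
            top = max(log_e)
            w = [math.exp(v - top) for v in log_e]
            total = sum(w)
            gammas.append([v / total for v in w])
        # M step: burden-ratio-weighted mean and variance for each model and pixel.
        change = 0.0
        for k in range(k_mod):
            n_k = sum(g[k] for g in gammas)
            if n_k <= 0.0:
                continue  # no image is attributed to this model; keep its parameters
            for p in range(n_pix):
                mu = sum(g[k] * img[p] for g, img in zip(gammas, images)) / n_k
                var = sum(g[k] * (img[p] - mu) ** 2
                          for g, img in zip(gammas, images)) / n_k
                var = max(var, var_floor)  # assumption: avoid division by zero
                change = max(change, abs(mu - means[k][p]),
                             abs(var - variances[k][p]))
                means[k][p], variances[k][p] = mu, var
        if change <= tol:  # convergence: parameter variation below threshold
            break
    return means, variances, gammas
```

Passing representative images as `init_means` mirrors the initialization described above for the case where it is clear which model each kind of input image should be classified into.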
Next, the learning device 203 determines whether or not the parameter variation from the previous time is equal to or less than the threshold value (step S15). In a case where a result in step S15 is negative (step S15: No), the processing in step S11 and subsequent steps described above is repeated. In a case where the result in step S15 is positive (step S15: Yes), the parameters calculated in step S14 are determined as the final parameters (step S16). The parameters determined as described above are stored in the memory 134 or the like, for example.
As described above, in the learning processing of the present embodiment, the pixel mean value μk(x, y) and the pixel variance value σ2k(x, y) that optimize the burden ratio γnk are determined and stored. (a) of
Referring back to
The first calculator 211 uses a plurality of model groups corresponding one-to-one with the plurality of filtered images, to calculate a score indicating a value corresponding to a difference from a corresponding one of the model groups (in this example, a higher value is indicated as the difference from the model group is larger), for each of the plurality of pixels included in each of the plurality of filtered images. The first calculator 211 calculates the score, for each of the plurality of pixels included in each of the plurality of filtered images, based on the pixel values of the respective pixels and the parameters of the corresponding one of the model groups. Here, the score is represented by the Z-score.
Hereinafter, a method will be described that uses the model group corresponding to any one filtered image to calculate the Z-score of each pixel included in that filtered image; the Z-score is calculated by the same method for each pixel of the other filtered images. Here, an example will be described in which K models are included in the model group corresponding to the one filtered image; however, the number of models included in a model group is arbitrary, and a model group may include only one model, for example.
When the one filtered image (input image) is generated by the generator 202 and input to the first calculator 211, the first calculator 211 uses equation 5 above to calculate the Z-score Znk(x, y) for each model, for each pixel included in the one filtered image. In addition, the first calculator 211 uses equation 6 above to obtain the probability enk. Then, the first calculator 211 uses equation 11 below to obtain an outlier from the model, that is, a defect estimation amount Sn(x, y), for each pixel included in the one filtered image. In a multi-model in which a model group includes a plurality of models, the defect estimation amount Sn(x, y) is the Z-score based on the occurrence probability of the learned model. In this example, the first calculator 211 calculates the defect estimation amount Sn(x, y) of each pixel of the one filtered image as the final Z-score. That is, in the present embodiment, in a case where a model group corresponding to any filtered image includes a plurality of models, for each of the plurality of pixels included in the filtered image, the first calculator 211 determines the final Z-score of each of the pixels on the basis of a unit score indicating a value corresponding to a difference from each model of each of the pixels (in this example, the Z-score Znk(x, y) for each of the K models), and the probability enk that the filtered image corresponds to each model.
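One possible reading of this combination is sketched below. Equation 11 is not reproduced here, so the specific formula is an assumption: the per-model Z-score magnitudes are averaged with weights proportional to the probability enk that the filtered image corresponds to each model. The function name `defect_estimation` and the use of the absolute value are likewise illustrative choices.

```python
import math


def defect_estimation(image, means, variances):
    """Per-pixel score for one flattened filtered image against K models.

    Assumption: the final score is the sum over models of the Z-score
    magnitude weighted by the normalized probability of each model.
    """
    k_mod, n_pix = len(means), len(image)
    # Z-score of every pixel for every model.
    z = [[(image[p] - means[k][p]) / math.sqrt(variances[k][p])
          for p in range(n_pix)] for k in range(k_mod)]
    # Log of the joint probability of the whole image under each model.
    log_e = [sum(-0.5 * v * v - 0.5 * math.log(2.0 * math.pi) for v in z[k])
             for k in range(k_mod)]
    top = max(log_e)
    w = [math.exp(v - top) for v in log_e]
    total = sum(w)
    weights = [v / total for v in w]
    return [sum(weights[k] * abs(z[k][p]) for k in range(k_mod))
            for p in range(n_pix)]
```

With this weighting, an image that clearly matches one learned model is scored almost entirely against that model, so a pixel is flagged only when it deviates from the model the image most plausibly came from.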
Here, in a portion where the luminance values of the image (learned image) obtained by the learning are substantially uniform, if the luminance value of a corresponding portion of the input image for inspection is slightly different from the luminance value of the learned image, the Z-score becomes large as a major abnormality. However, since a minute difference between pixels in that portion cannot be known by human senses, the minute difference is not regarded as abnormal in general visual inspection. Thus, for the purpose of matching to such human visual characteristics, by changing equation 5 above to equation 12 below, accuracy of the inspection can be improved.
Here, sat(x) in equation 12 above is a function represented by equation 13 below.
\mathrm{sat}(x) = \begin{cases} x & \text{if } x > c \\ c & \text{if } x \le c \end{cases} [Mathematical Equation 13]
In equation 13 above, c is a constant and is a parameter to be adjusted while a result of the test is examined. By applying the function represented by equation 13 above to the variance value, even in a case where the luminance values of the respective pixels included in an area of the learned image are all or almost uniform, the variance value that is used is not 0 or a very small value but the constant value c. For that reason, even in a case where the mean luminance value of a portion of the learned image where the luminance values are substantially uniform differs slightly from the luminance value of the corresponding portion of the input image during inspection, the Z-score can be inhibited from becoming too large.
Note that a function other than the one represented by equation 13 above may be used, as long as the learned variance value does not become too small; for example, equation 5 above may be changed to equation 14 below.
Here, d in equation 14 above is a constant and is a parameter to be adjusted while a result of the test is examined. By using this equation, the denominator does not become excessively small, so that the Z-score does not become too large.
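The two variance corrections can be sketched as below. The function `sat` follows the piecewise definition of equation 13; the forms of the corrected Z-scores are assumptions, since equations 12 and 14 are not reproduced here: the first applies `sat` to the variance, and the second adds the constant d to the variance inside the square root. The function names are illustrative.

```python
import math


def sat(value, c):
    """Return value if it exceeds c, otherwise c (the form of equation 13)."""
    return value if value > c else c


def z_score_sat(pixel, mean, variance, c):
    """Z-score with the variance clamped from below by c (assumed form)."""
    return (pixel - mean) / math.sqrt(sat(variance, c))


def z_score_offset(pixel, mean, variance, d):
    """Z-score with a constant d added to the variance (assumed form)."""
    return (pixel - mean) / math.sqrt(variance + d)
```

For a uniform learned region with variance 0.0001, a luminance shift of 1 gives an uncorrected Z-score of 100, whereas `z_score_sat(101.0, 100.0, 0.0001, 4.0)` gives 0.5.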
As described above, in the portion where the luminance values of the learned image are substantially uniform, when the luminance value of the corresponding portion of the input image during inspection is shifted even a little, the shift is detected as a large abnormality; however, the minute difference between the pixels cannot be known by the human senses. Thus, by reducing sensitivity of abnormality detection in that portion, the above disadvantage can be solved. That is, the information processing apparatus 130 of the present embodiment may have a function (variance corrector) of correcting the pixel variance value indicating the variance of the pixel values of the non-defective image to a value larger than a threshold value (a very small value such as 0). For example, the above-described first calculator 211 may also serve as the variance corrector, or the variance corrector may be provided separately from the above-described first calculator 211.
As described above, the first calculator 211 calculates the Z-score for each of the plurality of pixels included in each of the plurality of filtered images. In the following description, the Z-score of the pixel (x, y) of the m-th filtered image may be represented as Zm(x, y).
Here, since a normal distribution is assumed as the occurrence probability of each pixel, the Z-score indicates an occurrence probability of the corresponding pixel of the input image represented by a multiple of σ in the standard normal distribution, when the learned model is considered. In this example, an example has been described in which a multi-model is used; however, the same thing may be performed assuming a single model, of course. In that case, K=1 is set, the model is obtained by equations 8 and 9 above during learning, and the Z-score is calculated by equation 5 during detection. Here, it is assumed that the pixels have a normal distribution; however, to further improve the accuracy, modeling may be performed as a mixed Gaussian distribution using the EM algorithm, in the same way as performed in the multi-model.
The second calculator 212 calculates an integrated score Ztotal(x, y) indicating a result of integrating the Z-scores Zm(x, y) of the respective plurality of pixels corresponding to each other over the plurality of filtered images. That is, the integrated score Ztotal(x, y) may be regarded as being calculated for each pixel of one image into which the plurality of filtered images is integrated. In this example, it is assumed that the 13 filtered images have the same number of pixels and that the pixels correspond to each other. Here, since the Z-score Zm(x, y) is expressed in units of the standard deviation of the standard normal distribution, the second calculator 212 calculates the integrated score Ztotal(x, y), for each of the plurality of pixels corresponding to each other over the plurality of filtered images, on the basis of a joint probability of occurrence probabilities Pm(x, y) corresponding to the Z-scores Zm(x, y) of the respective plurality of pixels. More specifically, the second calculator 212 calculates the occurrence probability Pm(x, y) corresponding to the Z-score Zm(x, y) by equation 15 below, and calculates the integrated score Ztotal(x, y) by equation 16 below.
The integrated score Ztotal(x, y) is a value of the Z-score in consideration of all the model groups. This value expresses all the spatial filters, that is, elements such as various scales and various edge directions, uniformly as an occurrence probability grounded in the standard deviation of the standard normal distribution, and coincides with the value indicating an acceptable range represented by a multiple of σ, which is often used in production processes and the like. For that reason, by setting the threshold value on this value, it is not necessary to set an individual threshold value for each of the plurality of spatial filters, and further, it is possible to determine an abnormality with a well-founded criterion such as the value indicating the acceptable range represented by the multiple of σ.
As described above, in the present embodiment, during inspection, the plurality of different spatial filters is applied to one input image (captured image) to generate the plurality of filtered images, and the Z-score Zm(x, y) corresponding to the difference from the corresponding one of the model groups is calculated for each of the plurality of pixels included in each of the plurality of filtered images. Then, the integrated score Ztotal(x, y) is calculated by integrating the Z-scores Zm(x, y) of the respective plurality of pixels corresponding to each other over the plurality of filtered images.
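One way to realize this integration is sketched below. Equations 15 and 16 are not reproduced here, so the details are assumptions: the occurrence probability is taken as the two-sided tail probability of the standard normal distribution, the joint probability as the product over the filtered images, and the integrated score as the Z-score whose two-sided tail probability equals that product. The `floor` parameter and the function names are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution: mean 0, standard deviation 1


def occurrence_probability(z):
    """Two-sided tail probability of observing a deviation of at least |z|."""
    return 2.0 * _STD.cdf(-abs(z))


def integrated_score(z_values, floor=1e-300):
    """Combine the Z-scores of one pixel position across all filtered images."""
    joint = 1.0
    for z in z_values:
        joint *= occurrence_probability(z)
    joint = max(joint, floor)  # assumption: keep the probability strictly positive
    # Convert the joint probability back to a multiple of sigma.
    return -_STD.inv_cdf(joint / 2.0)
```

With a single filtered image the function returns the magnitude of the original Z-score, and every additional deviating filter response raises the result, so the output stays on the same "multiple of σ" scale regardless of which filters contributed.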
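Because the integrated score is on a single "multiple of σ" scale, the abnormality decision can reduce to one comparison per pixel. The sketch below assumes a 2-D list of integrated scores and a hypothetical threshold of 3 (that is, 3σ); both the function name and the default value are illustrative rather than taken from the embodiment.

```python
def abnormal_pixels(z_total, threshold=3.0):
    """Return a mask marking pixels whose integrated score exceeds the threshold."""
    return [[value > threshold for value in row] for row in z_total]
```

A single threshold then applies to every scale and edge direction at once, instead of one tuned value per spatial filter.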
Referring back to
The determination result notifier 206 notifies the output apparatus 140 of information indicating the abnormal area determined by the determiner 205. The output apparatus 140 receiving the notification outputs information for notification of the abnormal area (the information may be audio information or image information).
As described above, in the present embodiment, the abnormal area of the input image (captured image) is determined on the basis of the integrated score Ztotal(x, y) obtained by integrating the Z-scores Zm(x, y) of the respective plurality of pixels corresponding to each other over the plurality of filtered images, so that abnormality determination can be performed uniformly, on a consistent criterion, for various objects.
In the above, the embodiments according to the present invention have been described; however, the present invention is not limited to the above-described embodiments, and in the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. In addition, various inventions can be formed by appropriately combining the constituent elements disclosed in the above-described embodiments. For example, some constituent elements may be removed from all the constituent elements described in the embodiment. Further, different embodiments and modifications may be appropriately combined.
The program executed by the information processing system 100 according to the embodiment described above may be stored in a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, a digital versatile disk (DVD), a universal serial bus (USB) memory device, or may be provided or distributed via a network such as the Internet. In addition, various programs may be provided by being incorporated in ROM or the like in advance.
Hereinafter, modifications will be described.
Modification 1
For example, the second calculator 212 may calculate a mean value of the Z-scores Zm(x, y) of the respective plurality of pixels, as the integrated score Ztotal(x, y), for each of the plurality of pixels corresponding to each other over the plurality of filtered images. In this case, the second calculator 212 can calculate the integrated score Ztotal(x, y) by equation 17 below.
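A mean of this kind can be written as follows. This is a sketch consistent with the description rather than a verbatim copy of equation 17; M, the number of filtered images (13 in the example above), is a symbol introduced here for illustration.

```latex
Z_{\mathrm{total}}(x, y) = \frac{1}{M} \sum_{m=1}^{M} Z_m(x, y)
```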
Modification 2
For example, the second calculator 212 may calculate a total value of the Z-scores Zm(x, y) of the respective plurality of pixels, as the integrated score Ztotal(x, y), for each of the plurality of pixels corresponding to each other over the plurality of filtered images. In this case, the second calculator 212 can calculate the integrated score Ztotal(x, y) by equation 18 below.
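A total of this kind can be written as follows. This is a sketch consistent with the description rather than a verbatim copy of equation 18; M, the number of filtered images, is a symbol introduced here for illustration.

```latex
Z_{\mathrm{total}}(x, y) = \sum_{m=1}^{M} Z_m(x, y)
```

Unlike the joint-probability form, a plain total grows with the number of filtered images, so a threshold chosen for it depends on M.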
Modification 3
For example, a wavelet transform, which computes the outputs of a filter group at one time, may be used.
The above-described embodiments are illustrative and do not limit the present invention. Thus, numerous additional modifications and variations are possible in light of the above teachings. For example, elements and/or features of different illustrative embodiments may be combined with each other and/or substituted for each other within the scope of the present invention.
Each of the functions of the described embodiments may be implemented by one or more processing circuits or circuitry. Processing circuitry includes a programmed processor, as a processor includes circuitry. A processing circuit also includes devices such as an application specific integrated circuit (ASIC), digital signal processor (DSP), field programmable gate array (FPGA), and conventional circuit components arranged to perform the recited functions.
Number | Date | Country | Kind |
---|---|---|---|
2017-006524 | Jan 2017 | JP | national |
2017-104704 | May 2017 | JP | national |
Number | Date | Country | |
---|---|---|---|
20180204316 A1 | Jul 2018 | US |