The present application is based on, and claims priority from JP Application Serial Number 2020-094200, filed May 29, 2020, and JP Application Serial Number 2020-094205, filed May 29, 2020, the disclosures of which are hereby incorporated by reference herein in their entirety.
The present disclosure relates to a technology using a capsule network.
In the related art, a capsule network is known as an algorithm in machine learning (International Publication No. 2019/083553; Geoffrey Hinton, Sara Sabour, Nicholas Frosst, “MATRIX CAPSULES WITH EM ROUTING”, published as a conference paper at ICLR 2018; Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton, “Dynamic Routing Between Capsules”, 31st Conference on Neural Information Processing Systems (NIPS 2017)).
A capsule network is an algorithm model having a unit called a capsule at each node of a network. A typical capsule in a capsule network inputs and outputs a pose and an activation. The pose indicates a state of the capsule that outputs the pose and takes the form of a vector or a matrix. The activation is a scalar quantity indicating the activity of the capsule that outputs the activation. The pose and the activation of a capsule are determined from the outputs of a plurality of capsules in the previous layer, for example, from their poses and activations, by using a technique called routing-by-agreement. The routing-by-agreement is not limited to a particular algorithm, but is preferably performed by an Expectation-Maximization (EM) algorithm. The capsule network typically has a multi-layer structure. Some of the layers that configure the multi-layer structure are called “capsule layers”. Preferably, each of the capsule layers has one or more capsules aligned in a so-called depth direction. Each of the plurality of capsules arranged in the depth direction in one capsule layer calculates the pose and the activation based on the output from the previous layer, and stores the calculated pose and activation in a memory so that they can be input into the corresponding plurality of capsules in the next capsule layer. That is, in each capsule layer, the pose and the activation are calculated for each capsule. The final layer has the same number of capsules as the number of target classes to be discriminated, and the class corresponding to the capsule in which the activation is maximized is output. The class discrimination is also called label discrimination. When the capsule network is designed as a software program, the number of capsules at one depth may be one in each capsule layer.
However, in the following specification, in accordance with typical embodiments during learning and estimation, each capsule layer is also described as having, on a conceptual plane perpendicular to the axis in the depth direction, that is, a plane intersecting the axis at each depth, a plurality of capsules that configure a two-dimensional array determined by the kernel size and the stride.
In the related art, when the class discrimination is performed by using the capsule network, a result of the class discrimination is output, but a discrimination basis of the output class is unknown, and it is difficult to know the discrimination basis.
(1) According to a first aspect of the present disclosure, there is provided a method executed by one or more processors. The method causes the one or more processors to execute: performing learning of a first model of a capsule network type including one or more capsule layers each having one or more capsules to reproduce correspondence between a plurality of first data elements included in a first data set and a pre-label corresponding to each of the plurality of first data elements; and inputting the first data set into the learned first model and acquiring first intermediate data based on at least one of a first activation and a first pose included in the one or more capsules, for the one or more capsule layers.
(2) According to a second aspect of the present disclosure, there is provided a method for causing one or more processors to execute using a first model learned in advance. The first model is a capsule network type including one or more capsule layers each having one or more capsules, and is learned to reproduce correspondence between a plurality of first data elements included in a first data set and a pre-label corresponding to each of the plurality of first data elements. The method includes: acquiring first intermediate data based on at least one of a first activation and a first pose included in the one or more capsules, for each of the one or more capsule layers when the first data set is input into the learned first model; inputting a second data element into the first model and acquiring second intermediate data based on at least one of a second activation and a second pose included in the one or more capsules, for each of the one or more capsule layers; and calculating a similarity between the first intermediate data and the second intermediate data, for the one or more capsule layers.
(3) According to a third aspect of the present disclosure, an apparatus is provided. The apparatus includes one or more processors, and the one or more processors are configured to execute: performing learning of a first model of a capsule network type including one or more capsule layers each having one or more capsules to reproduce correspondence between a plurality of first data elements included in a first data set and a pre-label corresponding to each of the plurality of first data elements; and inputting the first data set into the learned first model and acquiring first intermediate data based on at least one of a first activation and a first pose included in the one or more capsules, for the one or more capsule layers.
(4) According to a fourth aspect of the present disclosure, an apparatus is provided. The apparatus includes: a storage device that stores a first model which is a capsule network type including one or more capsule layers each having one or more capsules, and which is learned to reproduce correspondence between a plurality of first data elements included in a first data set and a pre-label corresponding to each of the plurality of first data elements; and one or more processors. The one or more processors are configured to execute: acquiring first intermediate data based on at least one of a first activation and a first pose included in the one or more capsules, for each of the one or more capsule layers when the first data set is input into the learned first model; inputting a second data element into the first model and acquiring second intermediate data based on at least one of a second activation and a second pose included in the one or more capsules, for each of the one or more capsule layers; and calculating a similarity between the first intermediate data and the second intermediate data, for the one or more capsule layers.
(5) According to a fifth aspect of the present disclosure, a non-transitory computer-readable medium is provided. The non-transitory computer-readable medium stores instructions for causing one or more processors to execute: performing learning of a first model of a capsule network type including one or more capsule layers each having one or more capsules to reproduce correspondence between a plurality of first data elements included in a first data set and a pre-label corresponding to each of the plurality of first data elements; and inputting the first data set into the learned first model and acquiring first intermediate data based on at least one of a first activation and a first pose included in the one or more capsules, for each of the one or more capsule layers.
(6) According to a sixth aspect of the present disclosure, a non-transitory computer-readable medium storing instructions for causing one or more processors to execute using a first model learned in advance is provided. The first model is a capsule network type including one or more capsule layers each having one or more capsules, and is learned to reproduce correspondence between a plurality of first data elements included in a first data set and a pre-label corresponding to each of the plurality of first data elements. The instructions cause the one or more processors to further execute: acquiring first intermediate data based on at least one of a first activation and a first pose included in the one or more capsules, for each of the one or more capsule layers when the first data set is input into the learned first model; inputting a second data element into the first model and acquiring second intermediate data based on at least one of a second activation and a second pose included in the one or more capsules, for each of the one or more capsule layers; and calculating a similarity between the first intermediate data and the second intermediate data for the one or more capsule layers.
In the present embodiment, the first data set 12 is stored in a storage device of the discrimination device 20 from the external device via the data interface. The first data set 12 is used for performing the learning of the first model 30. The first data set 12 has first data elements 12A, 12B, and 12C, and pre-labels 14 corresponding to the first data elements 12A, 12B, and 12C. Of the pre-labels 14, a label corresponding to the first data element 12A is also called a pre-label 14A, a label corresponding to the first data element 12B is also called a pre-label 14B, and a label corresponding to the first data element 12C is also called a pre-label 14C. The first data set 12 includes a plurality of first data elements 12A, 12B, and 12C acquired by sensors. The sensors are various sensors such as an RGB camera, an infrared camera, a depth sensor, a microphone, an acceleration sensor, and a gyro sensor, and are cameras in the present embodiment.
As illustrated in
The pre-label 14 is stored in a first input data set 10 in association with each of the first data elements 12A to 12C. The pre-label has a non-defective product label as a first pre-label and a defective product label as a second pre-label. The pre-label 14 may be simply referred to as the label 14. The defective product label is associated with the first data element 12A as the pre-label. The defective product label is associated with the first data element 12B as the pre-label. The non-defective product label is associated with the first data element 12C as the pre-label. That is, the first model 30 in the present embodiment is used in a case of manufacturing a product in which three spots are printed or engraved on each surface of the cube, and discriminates between the non-defective product and the defective product.
The discrimination device 20 illustrated in
The first model 30 is a learning model of a capsule network type, and is a hierarchy type having a plurality of layers. In a neural network of the related art, one neuron receives an output of a scalar quantity from each of a plurality of other neurons and outputs one scalar quantity by a nonlinear transformation, whereas in the learning model of the capsule network type, as described above, a node called a capsule propagates information by inputting and outputting the scalar quantity, a vector, or matrix type data according to a routing-by-agreement. Regarding the capsule network type algorithm, the contents disclosed in the following are adopted: International Publication No. 2019/083553; Geoffrey Hinton, Sara Sabour, Nicholas Frosst, “MATRIX CAPSULES WITH EM ROUTING”, published as a conference paper at ICLR 2018; Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton, “Dynamic Routing Between Capsules”, 31st Conference on Neural Information Processing Systems (NIPS 2017).
A flow of generating a vector Mj, which is a pose of an output destination capsule, from a vector Mi, which is a pose of an input source capsule, by the routing-by-agreement, here EM routing, will be described. First, Vij is calculated from the input vector Mi.
[Math. 1]
$$V_{ij} = \left(V_{ij}^{1}, \ldots, V_{ij}^{h}, \ldots, V_{ij}^{H}\right)^{T} \tag{1.1}$$
$$V_{ij} = W_{ij} M_{i} \tag{1.2}$$
Here, each element in the parentheses on the right side of the equation (1.1) is a scalar value configuring the vector Vij, and H is the number of dimensions of the vector. Further, Wij is a weight matrix. Further, the subscripts i and j satisfy the following equation (2), and ΩL and ΩL+1 are the sets of capsule numbers included in a layer L and a layer L+1, respectively.
[Math. 2]
$$\forall i \in \Omega_{L},\ \forall j \in \Omega_{L+1} \tag{2}$$
In the EM routing, the following M step and E step are repeated an appropriate number of times to calculate a weighted average uij of Vij, and then uij is converted to the output vector Mj, thereby generating the output vector Mj. The weighted average uij is expressed by the following equation (3).
[Math. 3]
Here, each element in the parentheses on the right side of the above equation (3) is a scalar value configuring the weighted average uij.
First, the routing starts from a state initialized by the following equation (4).
[Math. 4]
$$R_{ij} = 1/\left|\Omega_{L+1}\right| \tag{4}$$
M step:
[Math. 5]
$$R_{ij} \leftarrow R_{ij}\, a_{i} \tag{5}$$
Rij is adjusted by multiplying it by the activation ai of the input source capsule i, as in the above equation (5).
The weighted average uij of Vij is obtained by the above equation (6), and a weighted variance of Vij is obtained by the above equation (7).
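[Math. 8]
$$a_{j} = \operatorname{logistic}\left\{\lambda\left(\beta_{a} - \sum_{h}\left\{\left(\beta_{u} + \log \sigma_{j}^{h}\right)\sum_{i} R_{ij}\right\}\right)\right\} \tag{8}$$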
The activation aj of the output destination capsule j is calculated by the above equation (8). Here, βa and βu are parameters and are learned together with Wij. Further, logistic is a general logistic function.
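The M step for one output destination capsule j can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the weighted average and the weighted variance are assumed to take the form used in the cited “MATRIX CAPSULES WITH EM ROUTING” paper (a mean and a variance of Vij weighted by Rij), and the function name `m_step`, the guard constant `eps`, and all sample numbers are hypothetical.

```python
import math


def logistic(x):
    """Standard logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def m_step(votes, r, a_in, beta_a, beta_u, lam, eps=1e-9):
    """M step for a single output destination capsule j.

    votes[i][h] : element h of V_ij from input source capsule i
    r[i]        : routing coefficient R_ij
    a_in[i]     : activation a_i of input source capsule i
    Returns (u, var, a_j): weighted average, weighted variance, activation.
    """
    # Equation (5): R_ij <- R_ij * a_i
    r = [r_i * a_i for r_i, a_i in zip(r, a_in)]
    r_sum = sum(r) + eps  # eps is a hypothetical guard against division by zero
    dims = len(votes[0])

    # Assumed form of the weighted average of V_ij, per dimension h
    u = [sum(r_i * v[h] for r_i, v in zip(r, votes)) / r_sum for h in range(dims)]
    # Assumed form of the weighted variance of V_ij, per dimension h
    var = [
        sum(r_i * (v[h] - u[h]) ** 2 for r_i, v in zip(r, votes)) / r_sum
        for h in range(dims)
    ]

    # Equation (8): sigma is taken as the square root of the variance,
    # so log(sigma) = 0.5 * log(variance).
    cost = sum((beta_u + 0.5 * math.log(var_h + eps)) * r_sum for var_h in var)
    a_j = logistic(lam * (beta_a - cost))
    return u, var, a_j


# Hypothetical example: three input capsules with two-dimensional votes and
# two capsules in layer L+1, so every R_ij starts at 1/2 (equation (4)).
votes = [[1.0, 2.0], [1.0, 2.0], [3.0, 0.0]]
r0 = [0.5, 0.5, 0.5]
a_in = [1.0, 1.0, 0.0]  # the third input capsule is inactive
u, var, a_j = m_step(votes, r0, a_in, beta_a=0.0, beta_u=0.0, lam=1.0)
```

In this example the two active input capsules cast identical votes, so the weighted variance is close to zero and the activation of the output destination capsule approaches 1, which illustrates how agreement among the votes raises the activation.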
E step:
Next, Rij is updated by the following equations (9) and (10).
In the learning stage of the first model 30, the first model 30 is learned to reproduce the correspondence between each of the first data elements 12A, 12B, and 12C of the first data set 12 and each of the pre-labels 14A, 14B, and 14C corresponding to each of the first data elements 12A, 12B, and 12C of the first data set 12. The first model 30 includes one or more capsule layers having one or more capsules. In the present embodiment, the first model 30 is configured of a plurality of capsule layers each having a plurality of capsules. A detailed configuration of the first model 30 will be described later.
The processor 24 realizes various functions by executing various programs stored in the storage device 22. The processor 24 functions as, for example, a learning section, a first acquisition section, a second acquisition section, or a calculation section. In another embodiment, at least a part of the above-mentioned various functions may be realized by a hardware circuit. Here, in the present specification, the “processor” is a term including a CPU, a GPU, and a hardware circuit.
The output section 26 is used to output various information. The output section 26 is, for example, a liquid crystal monitor. As various information, for example, information about the label of the data element discriminated by using the learned first model 30 is displayed. The output section 26 may be a speaker that outputs audio instead of a display device such as the liquid crystal monitor.
By applying a 5×5 kernel with stride “2” to each of the data elements 12A to 12C of the first data set 12, a partial range that gives the output to one node of the convolution layer 33 is determined within a range of the data elements. The number of kernels applied to each of the data elements 12A to 12C is 32. Therefore, the convolution layer 33 is configured such that a vertical axis, which is a first axis, and a horizontal axis, which is a second axis orthogonal to the first axis, are each divided into 14 regions. Further, the number of channels, which is the depth of the convolution layer 33, is 32, which is the same as the number of kernels. The “partial range” is one region specified by a position on the vertical axis and a position on the horizontal axis on the data element. However, as is clear from the following explanation, the size of the “partial range” on the data element differs depending on which of the capsule layers 35, 37, 38, and 39 the capsule or the partial region Rx corresponding to the “partial range” belongs to. The “partial region” is a region specified by a position on the vertical axis and a position on the horizontal axis in the capsule layer. Each “partial region” in the capsule layer has dimensions of “Height”דWidth”דDepth” corresponding to the vertical axis, the horizontal axis, and the channel. In the present embodiment, the number of capsules included in one “partial region” is “1×1×number of depths”. In the present specification, the numerical values of “35”, “37”, “38”, and “39” are substituted for “x” in the notation “partial region Rx” depending on the capsule layers 35, 37, 38, and 39. For example, the partial region R35 indicates a region in the capsule layer 35.
By applying the 1×1×32 kernel to the convolution layer 33 with the stride “1”, from among the partial regions R33 of the convolution layer 33, the partial region R33 giving an output to one capsule of the primary capsule layer 35 is determined. Here, since 16 types of kernels are used with the same size and the same stride, the number of capsules corresponding to one partial region R33 of the convolution layer 33 is 16 in the primary capsule layer 35. A transformation matrix is used to generate the output from the node of the convolution layer 33 to the capsule of the primary capsule layer 35, and no routing-by-agreement is used. The dimensions of the kernel for convolving into the capsule layer may be expressed as “Height”דWidth”דDepth”דNumber of pose M elements” when the number of channels and the number of pose elements are also taken into consideration. According to this expression, the dimensions of the kernel used for convolution from the convolution layer 33 to the primary capsule layer 35 are 1×1×32×16.
By applying a 3×3×16 kernel to the primary capsule layer 35 with stride “1”, from among the partial regions R35 of the primary capsule layer 35, the partial region(s) R35 giving an output to the capsule included in one partial region R37 of the first capsule layer 37 is determined. Here, since 12 types of kernels are used with the same size, the same dimension, and the same stride, the number of capsules included in the partial region R37 of the first capsule layer 37 is 12. The routing-by-agreement is used to generate the output from the capsule of the primary capsule layer 35 to the capsule of the first capsule layer 37. Here, the kernel applied to the lower layer 35 is also expressed as specifying 3×3×16 capsules of the lower layer 35 used to determine one capsule of the upper layer 37 according to the routing-by-agreement. This also applies to the following explanation.
By applying a 7×7×12 kernel to the first capsule layer 37 with stride “2”, from among the partial regions R37 of the first capsule layer 37, the partial region(s) R37 giving an output to one partial region R38 of the second capsule layer 38 is determined. Here, since six types of kernels are used with the same size, the same dimension, and the same stride, the number of capsules included in the partial region R38 of the second capsule layer 38 is 6. The routing-by-agreement is used when generating the capsule of the second capsule layer 38 from the capsule of the first capsule layer 37.
By applying a 3×3×6 kernel to the second capsule layer 38 with stride “1”, from among the partial regions R38 of the second capsule layer 38, the partial region(s) R38 giving an output to one partial region R39 of the classification capsule layer 39 is determined. Here, since two types of kernels are used with the same size, the same dimension, and the same stride, the number of capsules included in the partial region R39 of the classification capsule layer 39 is 2. The routing-by-agreement is used when generating the capsule of the classification capsule layer 39 from the capsule of the second capsule layer 38.
The classification capsule layer 39 that is the final layer is configured of one partial region R39. The classification capsule layer 39 classifies the data elements input into the first model 30, into predetermined labels. In the present embodiment, the predetermined labels are the non-defective product label and the defective product label. In the classification capsule layer 39, out of the two capsules, the label corresponding to the capsule having the maximum activation a is output. The label output from the classification capsule layer 39 is output by the output section 26 by being controlled by the processor 24.
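The layer sizes implied by the kernels and strides above can be checked with a short Python sketch. It assumes that every kernel is applied without padding, which the text does not state; the helper name `out_size` and the intermediate sizes it derives for the layers 37 and 38 are therefore illustrative. Under this assumption the classification capsule layer 39 comes out as a single partial region with two capsules, which agrees with the description.

```python
def out_size(n, kernel, stride):
    """Output length along one axis for a kernel applied without padding (assumption)."""
    return (n - kernel) // stride + 1


# (layer name, kernel height/width, stride, capsules per partial region)
layers = [
    ("primary capsule layer 35", 1, 1, 16),
    ("first capsule layer 37", 3, 1, 12),
    ("second capsule layer 38", 7, 2, 6),
    ("classification capsule layer 39", 3, 1, 2),
]

size = 14  # the convolution layer 33 is divided into 14 regions per axis
shapes = {"convolution layer 33": (size, size, 32)}
for name, kernel, stride, depth in layers:
    size = out_size(size, kernel, stride)
    shapes[name] = (size, size, depth)  # Height x Width x Depth

for name, shape in shapes.items():
    print(name, shape)
```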
In
Next, in step S12, the first data set 12 to be learned by the first model 30 is prepared. The order of steps S10 and S12 is not limited to the above, and step S12 may be executed before step S10.
Next, in step S14, each of the first data elements 12A to 12C of the first data set 12 is sequentially input into the first model 30. The first model 30 is learned to reproduce the correspondence between each of the data elements 12A to 12C of the first data set 12 and the pre-label corresponding to each of the data elements 12A to 12C. The processor 24 performs the learning of the first model 30 by using, for example, an algorithm of a mini-batch gradient descent method. In the present embodiment, the processor 24 performs the learning of the first model 30 by using the algorithm of the mini-batch gradient descent method in which the size of the mini-batch, which is a subset of the data elements, is set to “32” and the number of epochs is set to “20000”.
In step S16, the processor 24 re-inputs the first data set 12 into the first model 30 learned in step S14, and the following is executed for each of the first capsule layer 37, the second capsule layer 38, and the classification capsule layer 39, which are the capsule layers. That is, based on at least one of the first activation a1 and the first pose M1 included in each capsule of each of the layers 37, 38, and 39, the processor 24 acquires the first intermediate data in association with the partial range on the first data elements 12A to 12C to which each capsule corresponds. The storage device 22 stores a relative position of the associated first intermediate data and the associated partial range. The associated partial range itself may be stored in the storage device 22. Here, the partial range on the first data elements 12A to 12C corresponding to each capsule is also referred to as a first partial data element. Further, in the following, the stored first intermediate data and first partial data element are also referred to as collection data element 32. In another embodiment, in the same manner as in each of the layers 37, 38, and 39, the first intermediate data may also be acquired in the primary capsule layer 35 in association with the first partial data element. Further, when the partial range information indicating the region of the first partial data element is not included in the information about the similarity described later, the first intermediate data need not be associated with the first partial data element. It is not always necessary to acquire the first intermediate data from all the capsule layers. For example, the first intermediate data may be acquired only from the second capsule layer 38, or may be acquired from a combination of several layers. This also applies to the second intermediate data described below.
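How the size of the partial range on the data element grows from layer to layer can be sketched as follows. The sketch uses the kernel sizes and strides given for the layers 33 to 39 and assumes no padding and no dilation; the function name `partial_range` and the returned `step` value are hypothetical helpers introduced only for illustration.

```python
def partial_range(layers):
    """Return {layer name: (size, step)} measured on the input data element.

    size : side length, in input pixels, of the range seen by one partial region
    step : shift, in input pixels, between adjacent partial regions
    """
    size, step = 1, 1
    result = {}
    for name, kernel, stride in layers:
        size += (kernel - 1) * step  # the range widens by (kernel - 1) steps
        step *= stride               # strides of successive layers multiply
        result[name] = (size, step)
    return result


# (layer name, kernel height/width, stride) as given in the description
layers = [
    ("convolution layer 33", 5, 2),
    ("primary capsule layer 35", 1, 1),
    ("first capsule layer 37", 3, 1),
    ("second capsule layer 38", 7, 2),
    ("classification capsule layer 39", 3, 1),
]
ranges = partial_range(layers)
for name, (size, step) in ranges.items():
    print(f"{name}: {size}x{size} pixels, step {step}")
```

Under these assumptions, the partial region at row r and column c of a layer would correspond to the partial range whose upper-left corner is at (r × step, c × step) on the data element, which is one possible way to hold the relative position mentioned above.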
Further, in another embodiment, the first data set 12 may be divided into two groups: a group of the first data elements 12A, 12B, and 12C used for performing the learning of the first model 30 in step S14, and a group of the first data elements 12A, 12B, and 12C not used for performing the learning. In step S14, the learning of the first model 30 is performed by using only one group, and in step S16, the first intermediate data may be generated by using the two groups. In short, as long as the same pre-labels 14A, 14B, and 14C as the pre-labels 14A, 14B, and 14C of the first data elements 12A, 12B, and 12C used for performing the learning of the first model 30 are given, in step S16, data for generating the intermediate data is not limited only to the first data elements 12A, 12B, and 12C used for performing the learning of the first model 30.
As described above, according to the method executed by the processor 24, the first intermediate data can be acquired by step S16 based on at least one of the first activation a1 and the first pose M1 included in the capsule. Therefore, when the data element of the discrimination target is input into the first model 30 and the second intermediate data is acquired based on at least one of the second activation a2 and the second pose M2 included in the capsule, the similarity between a feature spectrum generated from the first intermediate data and a feature spectrum generated from the second intermediate data can be calculated. The second intermediate data may be the second pose M2 itself or the second activation a2 itself, or may be data obtained by processing the second pose M2 or the second activation a2, for example, by weighting. In the present embodiment, the second intermediate data is configured of the second pose M2 and the second activation a2. In another embodiment, the first intermediate data and the second intermediate data each may be the feature spectrum. The details of the feature spectrum will be described later. By calculating the similarity, for example, a capsule of which the similarity is less than a predetermined threshold value can be specified. Therefore, a discrimination basis of the input data element by using the first model can be output. Details of the output aspect of the discrimination basis will be described later.
As illustrated in
By inputting the second data elements 62A to 62C one by one into the learned first model 30, the class, that is, the label, is discriminated. For example, when the second data element 62A indicating one spot or the second data element 62B indicating two spots is input into the first model 30, it is discriminated that the product is defective, and when the second data element 62C indicating three spots is input, it is discriminated that the product is non-defective. Further, in the present embodiment, the processor 24 generates the discrimination basis for discriminating the label, and causes the output section 26 to display the discrimination basis together with the discriminated label. The method of generating the discrimination basis will be described later.
Next, in step S24, the processor 24 outputs, to the output section 26, as the discrimination result, the label corresponding to the capsule in which the second activation a2 of the classification capsule layer 39 is maximized based on the calculation result in step S22. The label discrimination result is not limited to the image information and may be any information that can be notified to the user. For example, the label discrimination result may be output as audio information. In this case, the output section 26 includes a speaker. In the following description, the label discrimination result is stored in the storage device 22 as a part of the collection data element 32.
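The selection of the label in step S24 amounts to taking the capsule with the maximum second activation a2. A minimal Python sketch follows; the function name `discriminate`, the order of the labels, and the activation values are hypothetical.

```python
def discriminate(activations, labels):
    """Return the label of the capsule whose activation is the maximum."""
    best = max(range(len(activations)), key=lambda k: activations[k])
    return labels[best]


# Hypothetical assignment of the two capsules of the classification
# capsule layer 39 to the two labels, and hypothetical activations a2.
labels = ["non-defective product", "defective product"]
a2 = [0.08, 0.93]
result = discriminate(a2, labels)
print(result)
```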
As described above, the label can be easily discriminated by inputting the second data elements 62A to 62C into the first model 30. Further, since the label discrimination result is output by the output section 26, the user can easily grasp the label discrimination result.
As illustrated in
Next, in step S34, the processor 24 calculates a feature spectrum Sp of the second data elements 62A to 62C and a feature spectrum Sp of the first data element 12C of the non-defective product label. Specifically, the processor 24 calculates each feature spectrum Sp from the first intermediate data and the second intermediate data for each of the partial regions R37, R38, and R39 of each of the first capsule layer 37, the second capsule layer 38, and the classification capsule layer 39. In the present specification, the feature spectrum Sp may be represented by arranging one or more poses M that are normalized, for each partial region Rx in the capsule layer, to an average of 0 and a variance of 1, or that are standardized by using a Softmax function. Further, the feature spectrum Sp may be represented by arranging each dimension or each element of the poses M weighted by the corresponding activation a for each partial region Rx. The weighting can be realized, for example, by taking a product of the pose M and a value of the activation a corresponding to the pose M. Further, the feature spectrum Sp may be represented by arranging the values of the activations a for each of the partial regions R37, R38, and R39 of the capsule layers 37, 38, and 39. Further, the arranged activations a may be normalized so that the average is 0 and the variance is 1. Further, the feature spectrum Sp may be represented by arranging the poses M and/or the activations a for each of the partial regions R37, R38, and R39 of the capsule layers 37, 38, and 39. The feature spectrum Sp may also be formed by converting the pose M having a plurality of dimensions, 16 dimensions in the present embodiment, into one dimension and arranging it without normalization.
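Several of the feature spectrum variants listed above can be sketched in Python for one partial region Rx. This is an illustrative sketch only: the function names, the `mode` strings, and the two-element poses in the example are hypothetical (the embodiment uses 16-element poses), and the Softmax variant is omitted.

```python
import math


def normalize(values):
    """Shift and scale values to an average of 0 and a variance of 1."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # constant input: avoid dividing by zero
    return [(v - mean) / std for v in values]


def feature_spectrum(poses, activations, mode):
    """Build a feature spectrum Sp for the capsules of one partial region.

    poses[k]       : pose M of capsule k, already flattened to a list
    activations[k] : activation a of capsule k
    """
    flat = [x for pose in poses for x in pose]  # poses arranged in one dimension
    if mode == "pose":                   # poses arranged without normalization
        return flat
    if mode == "pose_normalized":        # average 0, variance 1 within the region
        return normalize(flat)
    if mode == "pose_weighted":          # each pose element times its activation
        return [a * x for pose, a in zip(poses, activations) for x in pose]
    if mode == "activation":             # activations arranged as they are
        return list(activations)
    if mode == "activation_normalized":  # activations with average 0, variance 1
        return normalize(activations)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical partial region with two capsules and two-element poses.
poses = [[1.0, 3.0], [5.0, 7.0]]
activations = [0.5, 1.0]
sp_norm = feature_spectrum(poses, activations, "pose_normalized")
sp_weighted = feature_spectrum(poses, activations, "pose_weighted")
```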
The graph illustrated in
As illustrated in
Next, in step S40, the processor 24 outputs information about the similarity which is calculated, that is, derived, by using the output section 26. The information about the similarity includes at least one of hierarchy partial range information indicating a position, layer label information indicating a hierarchy, similarity information, and comparison information. The hierarchy partial range information is partial range information for each hierarchy. Although not limited, in the present embodiment, the hierarchy partial range information is information indicating the partial range on the first data elements 12A, 12B, and 12C and the second data elements 62A, 62B, and 62C corresponding to the partial region Rx in which the similarity is calculated for each hierarchy. In the above, the partial range on the first data elements 12A, 12B, and 12C corresponding to the partial region Rx is also referred to as the first partial data element. Further, in the above, the partial range on the second data elements 62A, 62B, and 62C corresponding to the partial region Rx is also referred to as the second partial data element. The layer label information is information for identifying the hierarchy of the first capsule layer 37, the second capsule layer 38, and the classification capsule layer 39 which are a plurality of capsule layers. The similarity information is information indicating the similarity between the partial regions Rx belonging to the same hierarchy, that is, the same capsule layers 37, 38, and 39. The comparison information is information indicating a magnitude relationship between the similarity and a predetermined threshold value. The comparison information is information generated when the processor 24 compares the similarity with a predetermined threshold value in step S40.
In the data element of the second data set 62, when the similarity is smaller than the predetermined threshold value, it may be interpreted that the similarity with the feature of the known image in the hierarchy is low, and when the similarity is equal to or more than the predetermined threshold value, it may be interpreted that the similarity with the feature of the known image in the hierarchy is high. The predetermined threshold value is, for example, a reference value indicating whether the similarity is high or low. In this way, the similarity can be used as a discrimination basis for the class determination of the non-defective product, the defective product, or the like. In step S40, information about the similarity, including the similarity, is generated and output to the output section 26. Here, when only one of the plurality of capsule layers 37, 38, and 39 is used as the discrimination basis, the partial range information may be used instead of the hierarchy partial range information. The partial range information is information indicating the partial range on the data element of which the similarity is calculated.
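The comparison information can be sketched as a mapping from each layer and partial region to “high” or “low”. In the Python sketch below, the function name `comparison_info`, the dictionary layout, the threshold value 0.80, and the similarity values are all hypothetical; only the rule (smaller than the threshold is low, equal to or more than the threshold is high) follows the description.

```python
def comparison_info(similarities, threshold):
    """Label each (layer, region) key 'high' or 'low' relative to the threshold."""
    return {
        key: "high" if sim >= threshold else "low"
        for key, sim in similarities.items()
    }


# Hypothetical similarities keyed by (layer label, partial region position).
similarities = {
    ("first capsule layer 37", (4, 7)): 0.31,
    ("second capsule layer 38", (1, 2)): 0.80,
    ("classification capsule layer 39", (0, 0)): 0.97,
}
info = comparison_info(similarities, threshold=0.80)

# Regions whose similarity is below the threshold can be reported
# as candidates for the discrimination basis.
low_regions = [key for key, level in info.items() if level == "low"]
```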
In step S36 of
Although the information about the similarity is output by using the liquid crystal monitor which is an example of the output section 26, the information may be output as audio information. In this case, the output section 26 includes a speaker.
As described above, by calculating the similarity based on the first intermediate data and the second intermediate data, for example, a capsule of which the similarity is less than a predetermined threshold value can be specified. Therefore, it is possible to output the discrimination basis of the second data elements 62A to 62C by using the first model 30. Further, in the processing of step S36 of
As illustrated in
As illustrated in
Next, in step S34a, the processor 24 calculates the feature spectra of the second data elements 62A to 62E input into the first model 30 and the feature spectra of the extracted and learned first data elements 12A to 12C. Since the calculation method is the same as the method described in step S34 of
Next, in step S36a, the processor 24 calculates the similarity between the feature spectra Sp of the second data elements 62A to 62C and the feature spectra Sp of the first data elements 12A to 12C. As the similarity, for example, the cosine similarity is used. In step S36, for each partial region Rx of the first capsule layer 37, the second capsule layer 38, and the classification capsule layer 39, the highest of the similarities calculated against all the first data elements 12A to 12C is set as the similarity of the partial region Rx. The similarity of the partial region Rx is stored in the storage device 22 in association with the partial range of the image data element corresponding to the partial region Rx and the activation a of the partial region Rx.
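The calculation described for one partial region Rx can be sketched as follows, assuming that each feature spectrum Sp is available as a flat list of numbers. The function names and the handling of a zero-length spectrum are assumptions made for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two feature spectra of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero spectrum is treated as dissimilar
    return dot / (norm_u * norm_v)


def region_similarity(second_spectrum, first_spectra):
    """Highest cosine similarity of one second-data spectrum against
    the spectra of all first data elements for the same partial region."""
    return max(cosine_similarity(second_spectrum, sp) for sp in first_spectra)


# Made-up spectra for illustration.
first_spectra = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(region_similarity([1.0, 0.0], first_spectra))
```

Keeping only the maximum means that a partial region is regarded as similar when it resembles at least one of the learned first data elements.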
Next, in step S40a, the processor 24 outputs the information about the calculated similarity by using the output section 26. The information about the similarity includes at least one of the hierarchy partial range information and the partial range information indicating the position, the layer label information indicating the hierarchy, the similarity information, and the comparison information. The difference between step S40 illustrated in
As described above, to the extent that the second explanatory processing has the same configuration and performs the same processing as the first explanatory processing, the same effects are achieved. For example, by calculating the similarity based on the first intermediate data and the second intermediate data, the capsule of which the similarity is less than a predetermined threshold value can be specified, so that the discrimination basis of the second data element using the first model can be output.
The cosine similarities illustrated in
In the bar graphs of
The cosine similarity illustrated by each bar graph of No. 1 to No. 4 is calculated by the following method. In the following, only a part of the known first data elements 12A to 12C used for learning is used, but all of them may be used. That is, between the randomly-selected 200 known first data elements 12A-12C and the prepared second data elements 62B-62E as comparison targets, the similarities are calculated for each of the partial regions R37 and R38 which are also the strides in the first capsule layer 37 and the second capsule layer 38, and only the value of the cosine similarity with the maximum similarity is stored. This operation is performed on 100 randomly selected second data elements 62B to 62E, and the bar graphs represent the resulting statistics. The standard deviation is represented by an error bar. In the present embodiment, the cosine similarity has a minimum value of “−1” and a maximum value of “1”.
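The aggregation behind one bar and its error bar might look like the following sketch. It assumes that the bar height is the mean of the stored maximum cosine similarities and that the error bar is the population standard deviation; the text above does not state which statistic or which form of standard deviation is used, so both are assumptions, as is the function name.

```python
import statistics


def summarize_max_similarities(max_similarities):
    """Return (mean, standard deviation) of the stored maximum cosine
    similarities, one value per sampled second data element."""
    mean = statistics.fmean(max_similarities)
    std = statistics.pstdev(max_similarities)  # assumption: population form
    return mean, std


# Made-up values for illustration; the embodiment samples 100 elements.
mean, std = summarize_max_similarities([0.9, 0.7, 0.8, 0.8])
print(mean, std)
```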
In
In
Instead of the similarity calculation used in
The feature spectrum Sp for calculating the similarity is not limited to the above description. For example, the feature spectrum Sp may be generated by weighting the pose M included in the partial region Rx by a correction value acquired by applying the Softmax function to the value of the activation a, or the feature spectrum Sp may be generated by arranging the value of the pose M for each element with respect to the element in which the pose M included in the partial region Rx is rearranged in one dimension. Therefore, the calculation method of the similarity may be for calculating the similarity between the feature spectrum Sp configured of the first pose M1 and the feature spectrum Sp configured of the second pose M2.
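One possible reading of the first variation, in which the pose M is weighted by a correction value acquired by applying the Softmax function to the activation a, is sketched below. The sketch assumes that the Softmax function is taken over the activations of the capsules in one partial region Rx and that each pose is already flattened to one dimension; the function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable Softmax over a list of activations."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_pose_spectrum(poses, activations):
    """Build a feature spectrum by scaling each capsule's flattened pose
    by the Softmax-corrected activation of that capsule and concatenating."""
    weights = softmax(activations)
    spectrum = []
    for pose, weight in zip(poses, weights):
        spectrum.extend(weight * element for element in pose)
    return spectrum


# Two capsules with made-up 2-element poses and equal activations.
print(weighted_pose_spectrum([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))
```

With this weighting, capsules with a relatively low activation contribute less to the spectrum, and therefore less to the similarity calculated from it.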
Further, the calculation method of the similarity is not limited to the method described above. The calculation method of the similarity may be, for example, for calculating the similarity between the first activation a1 and the second activation a2. Specifically, the calculation method of the similarity may be for calculating the similarity between the feature spectrum Sp having a plurality of first activations a1 as elements and the feature spectrum Sp having a plurality of second activations a2 as elements.
According to the above embodiment, by calculating the similarity between the first intermediate data and the second intermediate data, for example, the capsule of which the similarity is less than a predetermined threshold value can be specified. Therefore, it is possible to output the discrimination basis of the second data elements 62A to 62E by using the first model 30. By outputting the label discrimination basis, it is possible to improve, for example, a manufacturing process or the like for reducing the occurrence of the defective products by analyzing the label discrimination basis.
According to the above embodiment, the size of the partial range corresponding to a group of capsules (also called a capsule group) in the depth direction, which is a partial range of input data elements (images in the present embodiment) and is included in each layer of the hierarchical type capsule network model, tends to qualitatively increase from small to large from the lower layer to the upper layer, and the size is adjustable to some extent under this tendency. Therefore, when the features included in the data element have the hierarchical property, the network structure of the first model 30 can be adjusted so that each of the layers included in the first model 30 corresponds to the feature of each hierarchy, that is, corresponds to the feature of each size. Further, in the hierarchical type capsule network model, a correlation of the partial range on the data element (for example, on the image) corresponding to the capsule group in each layer is maintained from the lower layer to the upper layer. From this, by comparing between the first intermediate data and the second intermediate data for each of the capsule layers 37, 38, and 39, the discrimination result of each data element of the second input data set 60 can be explained from a viewpoint of each hierarchy of the feature. The first intermediate data is data acquired from the capsule layers 37, 38, and 39 by giving to the learned first model 30, as an input, the first data set 12 configured of a plurality of data elements used for learning. Further, the second intermediate data is data acquired from the capsule layers 37, 38, and 39 by giving to the first model 30, as an input, the data elements not used for learning, each data element of the second data set 62 in the present embodiment.
In the above embodiment, the similarity between the feature spectrum Sp of the first intermediate data and the feature spectrum Sp of the second intermediate data is the cosine similarity, but the similarity is not limited to this, and various similarities may be used depending on elements for comparing the similarity. For example, the similarity may be a square error, a similarity based on an inner or outer product of two vectors, a distance between two points represented by two vectors, or a similarity based on a norm.
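Some of the alternatives listed above can be sketched for two feature spectra of equal length as follows. The function names are hypothetical, and the conversion of a distance into a similarity is only one of many possible choices, introduced here as an assumption.

```python
import math


def squared_error(u, v):
    """Sum of squared differences; smaller means more similar."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def inner_product(u, v):
    """Inner product of two vectors; larger means more similar."""
    return sum(a * b for a, b in zip(u, v))


def euclidean_distance(u, v):
    """Distance between the two points represented by the vectors."""
    return math.sqrt(squared_error(u, v))


def distance_to_similarity(distance):
    """Assumed mapping of a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + distance)


print(squared_error([1.0, 2.0], [1.0, 4.0]))
print(inner_product([1.0, 2.0], [3.0, 4.0]))
print(distance_to_similarity(euclidean_distance([0.0, 0.0], [3.0, 4.0])))
```

Because a squared error and a distance decrease as two spectra become more alike, the direction of the comparison with the predetermined threshold value would have to be chosen to match the measure.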
In the above embodiment, the first explanatory processing and the second explanatory processing may be configured to be automatically switchable by the processor 24. For example, in a case where the processor 24 executes the first explanatory processing, when a ratio discriminated to be the defective product to a total number of input data elements becomes equal to or more than a predetermined value, the first explanatory processing may be switched to the second explanatory processing.
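The switching condition can be sketched as follows. The function name, the return labels, and the way the counts are passed in are assumptions for illustration.

```python
def select_explanatory_processing(num_defective, num_total, ratio_threshold):
    """Return "second" when the ratio of data elements discriminated to be
    defective is equal to or more than the predetermined value, else "first"."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    if num_defective / num_total >= ratio_threshold:
        return "second"
    return "first"


# Made-up counts for illustration.
print(select_explanatory_processing(5, 10, 0.5))
print(select_explanatory_processing(1, 10, 0.5))
```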
In the above embodiment, the program stored in the non-volatile storage medium 23 is executed by one processor 24, but may be executed by two or more processors 24.
The generation method of the first intermediate data and the second intermediate data is not limited to the above embodiment, and for example, the first intermediate data and the second intermediate data may be generated by using a k-means method. Further, the first intermediate data and the second intermediate data may be generated by using conversion such as PCA, ICA, or Fisher. Further, the conversion methods of the first intermediate data and the second intermediate data may be different.
The present disclosure is not limited to the above embodiments, and can be realized in various aspects without departing from the spirit thereof. For example, the present disclosure can also be realized by the following aspects. The technical features in the above embodiments corresponding to technical features in each of the aspects described below can be replaced or combined as appropriate in order to solve some or all of the problems of the present disclosure, or achieve some or all of the effects of the present disclosure. Further, if the technical feature is not described as essential in the present specification, it can be appropriately deleted.
(1) According to a first aspect of the present disclosure, a method executed by one or more processors is provided. The method causes the one or more processors to execute: performing learning of a first model of a capsule network type including one or more capsule layers each having one or more capsules to reproduce correspondence between a plurality of first data elements included in a first data set and a pre-label corresponding to each of the plurality of first data elements; and inputting the first data set into the learned first model and acquiring first intermediate data based on at least one of a first activation and a first pose included in the one or more capsules, for the one or more capsule layers. According to this aspect, the first intermediate data can be acquired based on at least one of the first activation and the first pose included in the capsule. Therefore, the second intermediate data is acquired based on at least one of the second activation and the second pose included in the capsule when the second data element of the discrimination target is input into the first model. The similarity between the first intermediate data and the second intermediate data can be calculated. By calculating the similarity, for example, the capsule of which the similarity is less than a predetermined threshold value can be specified, so that the discrimination basis of the second data element using the first model can be output.
(2) According to the second aspect of the present disclosure, there is provided a method for causing one or more processors to execute using a first model learned in advance. The first model is a capsule network type including one or more capsule layers each having one or more capsules, and is learned to reproduce correspondence between a plurality of first data elements included in a first data set and a pre-label corresponding to each of the plurality of first data elements. The method includes: acquiring first intermediate data based on at least one of a first activation and a first pose included in the one or more capsules, for each of the one or more capsule layers when the first data set is input into the learned first model; inputting a second data element into the first model and acquiring second intermediate data based on at least one of a second activation and a second pose included in the one or more capsules, for each of the one or more capsule layers; and calculating a similarity between the first intermediate data and the second intermediate data, for the one or more capsule layers. According to this aspect, the first intermediate data can be acquired based on at least one of the first activation and the first pose included in the capsule. Therefore, the second intermediate data is acquired based on at least one of the second activation and the second pose included in the capsule when the second data element of the discrimination target is input into the first model. The similarity between the first intermediate data and the second intermediate data can be calculated. By calculating the similarity, for example, the capsule of which the similarity is less than a predetermined threshold value can be specified, so that the discrimination basis of the second data element using the first model can be output.
(3) In the above aspect, it may further include outputting the information about the calculated similarity. According to this aspect, the user can easily grasp the information about the similarity.
(4) In the above aspect, it may further include inputting the second data element into the first model to discriminate the label of the second data element. According to this aspect, the label of the second data element using the first model can be discriminated.
(5) In the above aspect, it may further include outputting the discrimination result of the label. According to this aspect, the user can easily grasp the label discrimination result.
(6) In the above aspect, the capsule layer may have a plurality of the capsules, and the acquiring of the first intermediate data may include acquiring the first intermediate data included in each of the plurality of capsules, and associating the first partial range that is a part of the first data element corresponding to the acquired first intermediate data with the corresponding first intermediate data. The acquiring of the second intermediate data may include acquiring the second intermediate data included in each of the plurality of capsules, and associating the second partial range that is a part of the second data element corresponding to the acquired second intermediate data with the corresponding second intermediate data. The calculating of the similarity may include calculating the similarity between the first intermediate data of the first partial range and the second intermediate data of the second partial range corresponding to the first partial range. According to this aspect, the similarity between the first partial range and the second partial range can be calculated by calculating the similarity between the first intermediate data of the first partial range and the second intermediate data of the second partial range corresponding to the first partial range. Therefore, it is possible to easily grasp which range in the second data element is used as the basis for discriminating the label.
(7) In the above aspect, the information about the similarity may include the partial range information indicating the first partial range and the second partial range for which the calculation of the similarity is performed. According to this aspect, the user can easily grasp which partial range is used as the basis for discriminating the label.
(8) In the above aspect, the capsule layer may have the hierarchy structure configured of a plurality of layers, and the calculating of the similarity may include calculating the similarity between the first intermediate data of the first partial range and the second intermediate data of the second partial range corresponding to the first partial range, for each of the capsule layers. According to this aspect, the similarity between the first intermediate data and the second intermediate data can be calculated for each layer of the plurality of capsule layers.
(9) In the above aspect, the capsule layer may have a plurality of the capsules, and the acquiring of the first intermediate data may include acquiring the first intermediate data included in each of the plurality of capsules, and associating the first partial range which is a part of the first data element corresponding to the acquired first intermediate data with the corresponding first intermediate data. The acquiring of the second intermediate data may include acquiring the second intermediate data included in each of the plurality of capsules, and associating the second partial range that is a part of the second data element corresponding to the acquired second intermediate data with the corresponding second intermediate data. The calculating of the similarity may include calculating the similarity between the first intermediate data and the second intermediate data. According to this aspect, the similarity between the first partial range and the second partial range can be calculated by calculating the similarity between the first intermediate data of the first partial range and the second intermediate data of the second partial range. Therefore, it is possible to easily grasp which region in the second data element is used as the basis for discriminating the label.
(10) In the above aspect, the information about the similarity may include partial range information indicating the second partial range for which the calculation of the similarity is performed. According to this aspect, the user can easily grasp which partial range is used as the basis for discriminating the label.
(11) In the above aspect, the information about the similarity may include the partial range information indicating the first partial range and the second partial range for which the calculation of the similarity is performed. According to this aspect, the user can easily grasp which partial range is used as the basis for discriminating the label.
(12) In the above aspect, the capsule layer may have the hierarchy structure configured of a plurality of layers, and the calculating of the similarity may include calculating the similarity between the first intermediate data and the second intermediate data for each capsule layer. According to this aspect, the similarity between the first intermediate data and the second intermediate data can be calculated for each layer of the plurality of capsule layers.
(13) In the above aspect, the information about the similarity may further include at least one of layer label information for identifying the hierarchy of the plurality of capsule layers, similarity information indicating the similarity for each hierarchy, hierarchy partial range information indicating the hierarchy partial range which is the second partial range in which the similarity is calculated for each hierarchy, and comparison information indicating the magnitude relationship between the similarity and the predetermined threshold value. According to this aspect, the user can grasp the information about the similarity in more detail.
(14) In the above aspect, the first intermediate data may include at least the first pose, the second intermediate data may include at least the second pose, and calculating the similarity may include calculating the similarity between the first pose and the second pose. According to this aspect, the similarity can be calculated by using the first pose and the second pose.
(15) In the above aspect, the first intermediate data may include at least the first activation, the second intermediate data may include at least the second activation, and the calculating of the similarity may include calculating the similarity between the first activation and the second activation. According to this aspect, the similarity can be calculated by using the first activation and the second activation.
(16) In the above aspect, the first intermediate data may include the first pose and the first activation, and the second intermediate data may include the second pose and the second activation. The calculating of the similarity may include weighting the first pose by the first activation, weighting the second pose by the second activation, and calculating the similarity between the weighted first pose and the weighted second pose. According to this aspect, the similarity can be calculated by using the weighted first pose and the weighted second pose.
(17) According to a third aspect of the present disclosure, an apparatus is provided. The apparatus includes: one or more processors, in which the one or more processors are configured to execute: performing learning of a first model of a capsule network type including one or more capsule layers each having one or more capsules to reproduce correspondence between a plurality of first data elements included in a first data set and a pre-label corresponding to each of the plurality of first data elements; and inputting the first data set into the learned first model and acquiring first intermediate data based on at least one of a first activation and a first pose included in the one or more capsules, for the one or more capsule layers. According to this aspect, the first intermediate data can be acquired based on at least one of the first activation and the first pose included in the capsule. Therefore, the second intermediate data is acquired based on at least one of the second activation and the second pose included in the capsule when the second data element of the discrimination target is input into the first model. The similarity between the first intermediate data and the second intermediate data can be calculated. By calculating the similarity, for example, the capsule of which the similarity is less than a predetermined threshold value can be specified, so that the discrimination basis of the second data element using the first model can be output.
(18) According to a fourth aspect of the present disclosure, an apparatus is provided. The apparatus includes a storage device which is a capsule network type including one or more capsule layers each having one or more capsules, and stores a first model learned to reproduce correspondence between a plurality of first data elements included in a first data set and a pre-label corresponding to each of the plurality of first data elements; a first acquisition section which acquires the first intermediate data based on at least one of the first activation and the first pose included in the one or more capsules, for each of the one or more capsule layers when the first data set is input into the learned first model; a second acquisition section which inputs the second data element into the first model and acquires the second intermediate data based on at least one of the second activation and the second pose included in the one or more capsules, for each of the one or more capsule layers; and a calculation section which calculates the similarity between the first intermediate data and the second intermediate data in the one or more capsule layers. According to this aspect, the first intermediate data can be acquired based on at least one of the first activation and the first pose included in the capsule. Therefore, the second intermediate data is acquired based on at least one of the second activation and the second pose included in the capsule when the second data element of the discrimination target is input into the first model. The similarity between the first intermediate data and the second intermediate data can be calculated. By calculating the similarity, for example, the capsule of which the similarity is less than a predetermined threshold value can be specified, so that the discrimination basis of the second data element using the first model can be output.
(19) According to a fifth aspect of the present disclosure, a non-temporary computer-readable medium is provided. The non-temporary computer-readable medium stores instructions for causing one or more processors to execute: performing learning of a first model of a capsule network type including one or more capsule layers each having one or more capsules to reproduce correspondence between a plurality of first data elements included in a first data set and a pre-label corresponding to each of the plurality of first data elements; and inputting the first data set into the learned first model and acquiring first intermediate data based on at least one of a first activation and a first pose included in the one or more capsules, for each of the one or more capsule layers. According to this aspect, the first intermediate data can be acquired based on at least one of the first activation and the first pose included in the capsule. Therefore, the second intermediate data is acquired based on at least one of the second activation and the second pose included in the capsule when the second data element of the discrimination target is input into the first model. The similarity between the first intermediate data and the second intermediate data can be calculated. By calculating the similarity, for example, the capsule of which the similarity is less than a predetermined threshold value can be specified, so that the discrimination basis of the second data element using the first model can be output.
(20) According to a sixth aspect of the present disclosure, a non-temporary computer-readable medium executed by using the first model learned in advance is provided. The first model is a capsule network type including one or more capsule layers each having one or more capsules, and is learned to reproduce correspondence between a plurality of first data elements included in a first data set and a pre-label corresponding to each of the plurality of first data elements. The instructions cause one or more processors to further execute: acquiring first intermediate data based on at least one of a first activation and a first pose included in the one or more capsules, for each of the one or more capsule layers when the first data set is input into the learned first model; inputting a second data element into the first model and acquiring second intermediate data based on at least one of a second activation and a second pose included in the one or more capsules, for each of the one or more capsule layers; and calculating a similarity between the first intermediate data and the second intermediate data for the one or more capsule layers. According to this aspect, the first intermediate data can be acquired based on at least one of the first activation and the first pose included in the capsule. Therefore, the second intermediate data is acquired based on at least one of the second activation and the second pose included in the capsule when the second data element of the discrimination target is input into the first model. The similarity between the first intermediate data and the second intermediate data can be calculated. By calculating the similarity, for example, the capsule of which the similarity is less than a predetermined threshold value can be specified, so that the discrimination basis of the second data element using the first model can be output.
In addition to the above aspects, the present disclosure can be realized in a form of a system including a non-volatile storage medium in which a computer program is recorded, or an apparatus.
Number | Date | Country | Kind |
---|---|---|---|
2020-094200 | May 2020 | JP | national |
2020-094205 | May 2020 | JP | national |