The present application is based on, and claims priority from JP Application Serial Number 2021-200573, filed Dec. 10, 2021, the disclosure of which is hereby incorporated by reference herein in its entirety.
The present disclosure relates to a technique for determining the class of data to be determined using a machine learning model.
U.S. Pat. No. 5,210,798 and WO 2019/083553 describe a capsule network, which is a vector neural network type machine learning model using vector neurons. A vector neuron is a neuron whose input and output are vectors. A capsule network is a machine learning model having a vector neuron called a capsule as a node of the network. A vector neural network type machine learning model such as a capsule network can be used to determine the class of input data.
In known technology, when the amount of the data used in learning and of the data to be determined is large, there may be cases in which the learning time of the machine learning model, and the time taken to input the data to be determined into the machine learning model and determine the class, are long.
According to a first aspect of the present disclosure, a learning method for M number of vector neural network type machine learning models including a plurality of vector neuron layers, the M number of machine learning models being used in determining a class of data to be determined is provided, M being an integer of two or more. The learning method includes (a) preparing a plurality of pieces of data for learning including data for input and a pre-label associated with the data for input, (b) dividing the plurality of pieces of data for learning into one or more groups to generate one or more input learning data groups, and (c) training the M number of machine learning models so that a correspondence between the data for input and the pre-label associated with the data for input is reproduced, by inputting the corresponding input learning data groups respectively into the M number of machine learning models, wherein (b) includes (b1) dividing the plurality of pieces of data for input into one or more regions to generate, as one of the input learning data groups, a collection of first type divided input data after division belonging to the same region, or (b2) dividing the plurality of pieces of data for learning belonging to one class into one or more groups to generate, as one of the input learning data groups, a collection of second type divided input data after division.
According to a second aspect of the present disclosure, a determining method for determining a class of data to be determined using M number of vector neural network type machine learning models including a plurality of vector neuron layers is provided, M being an integer of two or more. The determining method includes (a) preparing the M number of machine learning models trained using a plurality of pieces of data for learning including data for input and a pre-label associated with the data for input, wherein each one of the M number of machine learning models is trained using one corresponding group of one or more input learning data groups, the one or more input learning data groups being obtained by dividing the plurality of pieces of data for learning, (b) preparing M number of known feature spectrum groups corresponding to the M number of machine learning models after training, wherein the M number of known feature spectrum groups include a known feature spectrum group obtained from an output of a specific layer from among the plurality of vector neuron layers by inputting the input learning data groups into the M number of machine learning models after the training, (c) obtaining individual data used in class determination of the data to be determined for each one of the M number of machine learning models by inputting data to be determined for input generated from the data to be determined into each one of the M number of machine learning models after the training, wherein, for each one of the M number of machine learning models, the individual data is generated using at least one of (i) a similarity between a feature spectrum calculated from an output of the specific layer according to input of the data to be determined for input into the machine learning model and the known feature spectrum group, and (ii) an activation value corresponding to a determination value for each class output from an output layer of the machine learning model according to an 
input of the data to be determined for input, and (d) executing class determination for the data to be determined using M number of pieces of the individual data obtained respectively for the M number of machine learning models, wherein (a) includes one of (a1) dividing the plurality of pieces of data for input into one or more regions, and using a collection of first type divided input data after division belonging to the same region as one of the input learning data groups, and (a2) executing division processing to divide the plurality of pieces of data for learning belonging to one class into one or more groups, and using a collection of second type divided input data after the division processing as one of the input learning data groups.
According to a third aspect of the present disclosure, a learning apparatus for M number of vector neural network type machine learning models including a plurality of vector neuron layers, the M number of machine learning models being used in determining a class of data to be determined is provided, M being an integer of two or more. The learning apparatus includes a memory, and a processor configured to execute training of the M number of machine learning models, wherein the processor executes processing to divide a plurality of pieces of data for learning including data for input and a pre-label associated with the data for input into one or more groups to generate the one or more input learning data groups, and processing to train the M number of machine learning models so that a correspondence between the data for input and the pre-label associated with the data for input is reproduced by inputting the corresponding input learning data groups respectively into the M number of machine learning models, the processing to generate the one or more input learning data groups includes
processing to divide the plurality of pieces of data for input into one or more regions and generate, as one of the input learning data groups, a collection of first type divided input data after division belonging to the same region, or processing to divide the plurality of pieces of data for learning belonging to one class into one or more groups to generate, as one of the input learning data groups, a collection of second type divided input data after division.
According to a fourth aspect of the present disclosure, a determining apparatus for determining a class of data to be determined using M number of vector neural network type machine learning models including a plurality of vector neuron layers is provided, M being an integer of two or more. The determining apparatus includes a memory configured to store the M number of machine learning models trained using a plurality of pieces of data for learning including data for input and a pre-label associated with the data for input, wherein each one of the M number of machine learning models is trained using one corresponding group of one or more input learning data groups, the one or more input learning data groups being obtained by dividing the plurality of pieces of data for learning, and
a processor configured to execute class determination of the data to be determined by inputting the data to be determined into the M number of machine learning models, wherein the processor executes processing to generate M number of known feature spectrum groups corresponding to the M number of machine learning models after training, wherein the M number of known feature spectrum groups include a known feature spectrum group obtained from an output of a specific layer from among the plurality of vector neuron layers by inputting the input learning data groups into the M number of machine learning models after the training, processing to obtain individual data used in class determination of the data to be determined for each one of the M number of machine learning models by inputting data to be determined for input generated from the data to be determined into each one of the M number of machine learning models after the training, wherein, for each one of the M number of machine learning models, the individual data is generated using at least one of (i) a similarity between a feature spectrum calculated from an output of the specific layer according to input of the data to be determined for input into the machine learning model and the known feature spectrum group, and (ii) an activation value corresponding to a determination value for each class output from an output layer of the machine learning model according to an input of the data to be determined for input, and processing to execute class determination for the data to be determined using M number of pieces of the individual data obtained respectively for the M number of machine learning models, and the input learning data group is either
a collection of first type divided input data after division belonging to the same region of one or more regions obtained by dividing the plurality of pieces of data for input, or a collection of second type divided input data after the division processing in which the plurality of pieces of data for learning belonging to one class are divided into one or more groups.
According to a fifth aspect of the present disclosure, a non-transitory computer-readable storage medium storing a computer program configured to cause a processor to execute training of M number of vector neural network type machine learning models including a plurality of vector neuron layers, the M number of machine learning models being used in determining a class of data to be determined is provided, M being an integer of two or more. The computer program includes a function (a) of dividing a plurality of pieces of data for learning including data for input and a pre-label associated with the data for input into one or more groups to generate the one or more input learning data groups, and a function (b) of training the M number of machine learning models so that a correspondence between the data for input and the pre-label associated with the data for input is reproduced by inputting the corresponding input learning data groups respectively into the M number of machine learning models, wherein the function (a) includes
a function of dividing the plurality of pieces of data for input into one or more regions to generate, as one of the input learning data groups, a collection of first type divided input data after division belonging to the same region, or a function of dividing the plurality of pieces of data for learning belonging to one class into one or more groups to generate, as one of the input learning data groups, a collection of second type divided input data after division.
According to a sixth aspect of the present disclosure, a non-transitory computer-readable storage medium storing a computer program configured to cause a processor to execute determination of a class of data to be determined using M number of vector neural network type machine learning models including a plurality of vector neuron layers is provided, M being an integer of two or more. The computer program includes a function (a) of storing the M number of machine learning models trained using a plurality of pieces of data for learning including data for input and a pre-label associated with the data for input, wherein each one of the M number of machine learning models is trained using one corresponding group of one or more input learning data groups, the one or more input learning data groups being obtained by dividing the plurality of pieces of data for learning, a function (b) of generating M number of known feature spectrum groups corresponding to the M number of machine learning models after training, wherein the M number of known feature spectrum groups include a known feature spectrum group obtained from an output of a specific layer from among the plurality of vector neuron layers by inputting the input learning data groups into the M number of machine learning models after the training, a function (c) of obtaining individual data used in class determination of the data to be determined for each one of the M number of machine learning models by inputting data to be determined for input generated from the data to be determined into each one of the M number of machine learning models after the training, wherein, for each one of the M number of machine learning models, the individual data is generated using at least one of (i) a similarity between a feature spectrum calculated from an output of the specific layer according to input of the data to be determined for input into the machine learning model and the known feature spectrum group, and (ii) an 
activation value corresponding to a determination value for each class output from an output layer of the machine learning model according to an input of the data to be determined for input, and a function (d) of executing class determination for the data to be determined using M number of pieces of the individual data obtained respectively for the M number of machine learning models, wherein the input learning data group is either a collection of first type divided input data after division belonging to the same region of one or more regions obtained by dividing the plurality of pieces of data for input, or a collection of second type divided input data after the division processing in which the plurality of pieces of data for learning belonging to one class are divided into one or more groups.
The processor 110 functions as a data generating unit 112 and as a class determination processing unit 114. The data generating unit 112 generates data to be input into a machine learning model 200 from pieces of data of an input learning data group IDG used in training the machine learning model 200 and from data for input IM, such as the data to be determined IM. The class determination processing unit 114 executes class determination processing of the data to be determined IM. The data generating unit 112 and the class determination processing unit 114 are implemented by the processor 110 executing the computer program stored in the memory 120.
The data generating unit 112 executes one of the following two types of data processing to generate the input learning data group IDG.
(1) First Data Processing:
Each one of the plurality of pieces of data for input IM is divided into M or more regions, and a collection of the first type divided input data IDa after division that belongs to the same region is generated as one input learning data group IDG.
(2) Second Data Processing:
Division processing is executed to divide the plurality of pieces of data for input IM belonging to one class into M or more groups, and a collection of the second type divided input data IDb after division is generated as one input learning data group IDG.
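The two types of data processing described above can be outlined, purely as an illustrative and non-limiting sketch (the function names are hypothetical, and each piece of data for input IM is assumed to be a one-dimensional array):

```python
def split_into_parts(seq, m):
    """Split a sequence into m contiguous parts of near-equal size."""
    k, r = divmod(len(seq), m)
    parts, start = [], 0
    for i in range(m):
        size = k + (1 if i < r else 0)
        parts.append(seq[start:start + size])
        start += size
    return parts

def first_data_processing(pieces_of_data_for_input, m):
    """Divide each piece of data for input IM into m regions and collect
    the first type divided input data IDa belonging to the same region
    into one input learning data group IDG."""
    groups = [[] for _ in range(m)]
    for im in pieces_of_data_for_input:
        for region, ida in enumerate(split_into_parts(im, m)):
            groups[region].append(ida)
    return groups

def second_data_processing(pieces_of_one_class, m):
    """Divide the pieces of data for input IM belonging to one class into
    m groups of second type divided input data IDb, each group forming
    one input learning data group IDG."""
    return split_into_parts(pieces_of_one_class, m)
```

In the first data processing, each region index yields one group; in the second, each chunk of whole pieces yields one group.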
The class determination processing unit 114 includes a similarity calculation unit 310 and a total determination unit 320. The class determination processing unit 114 inputs the data to be determined IM into M number of machine learning models 200 and uses a plurality of individual data DD obtained for each one of the M number of machine learning models 200 to determine the class of the data to be determined IM. The details will be described below. Note that the data input into the machine learning model 200 is denoted by the reference sign IM regardless of the type of data.
In the foregoing, at least one of the functions of the data generating unit 112 and the class determination processing unit 114 may be implemented by a hardware circuit. In the present specification, the term processor includes such a hardware circuit. The processor for executing the class determination processing may be a processor included in a remote computer connected to the determining apparatus 20 via a network.
The memory 120 stores the plurality of machine learning models 200, a learning data group TDG, the plurality of input learning data groups IDG, and a plurality of known feature spectrum groups KSp. The machine learning model 200 is used in the processing by the class determination processing unit 114. Each one of the plurality of machine learning models 200 is a vector neural network type machine learning model including a plurality of vector neuron layers. An example configuration and operation of the machine learning model 200 will be described below. The number of machine learning models 200 is represented by M, and M can be set to any integer greater than or equal to 2. In the present embodiment, a case in which five machine learning models 200 are used is described. Note that when the five machine learning models 200 are referred to separately, the suffix “_T (T is an integer from 1 to 5)” is attached at the end. That is, the five machine learning models 200 are machine learning models 200_1 to 200_5. Note that the five machine learning models 200 are also referred to as the first model 200_1, the second model 200_2, the third model 200_3, the fourth model 200_4, and the fifth model 200_5.
The learning data group TDG is a collection of data for learning TD, which is training data. In the present embodiment, each piece of data for learning TD of the learning data group TDG includes spectral data as data for input and a pre-label LB associated with the spectral data. In the present embodiment, the pre-label LB is a label indicating the type of the target object 10. Note that in the present embodiment, “label” and “class” have the same meaning. The input learning data group IDG is a data group generated by the data generating unit 112 using the learning data group TDG. M or more input learning data groups IDG are generated. The input learning data groups IDG are generated by dividing the plurality of pieces of data for learning TD composing the learning data group TDG into M or more groups. Note that in the present embodiment, the number of the input learning data groups IDG is M, the same as the number of the machine learning models 200. Note that when the five input learning data groups IDG are referred to separately, the suffix “_T (T is an integer from 1 to 5)” is attached at the end. That is, the five input learning data groups IDG are input learning data groups IDG_1 to IDG_5.
The known feature spectrum group KSp is a collection of feature spectra obtained when the learning data group TDG is input into the trained machine learning model 200. The feature spectra are described below. Each machine learning model 200 uses the learning data group TDG and the known feature spectrum group KSp corresponding to that machine learning model 200.
In the present embodiment, the data for input IM to be input is spectral data and is thus data of a one-dimensional array. For example, the data for input IM is data obtained by extracting 36 representative values at 10 nm intervals from the spectral data in a range from 380 nm to 730 nm.
In the example of
The configuration of each layer 210 to 250 in
Description of Configuration of Machine Learning Model 200
Conv layer 210: Conv [32, 6, 2]
PrimeVN layer 220: PrimeVN [26, 1, 1]
ConvVN1 layer 230: ConvVN1 [20, 5, 2]
ConvVN2 layer 240: ConvVN2 [16, 4, 1]
ClassVN layer 250: ClassVN [Nm, 3, 1]
Vector dimension VD: VD=16
In the description of these layers 210 to 250, the character string before the parentheses is a layer name, and the numbers inside the parentheses are the number of channels, the surface size of the kernel, and the stride, in this order. For example, for the Conv layer 210, the layer name is “Conv”, the number of channels is 32, the surface size of the kernel is 1×6, and the stride is 2. In
The Conv layer 210 is a layer composed of scalar neurons. The other four layers 220 to 250 are layers composed of vector neurons. A vector neuron is a neuron whose input and output are vectors. In the above description, the dimension of the output vector of each vector neuron is constant at 16. In the following description, the term “node” is used as a generic term encompassing both the scalar neuron and the vector neuron.
In
As is well known, a resolution W1 in the y direction after convolution is given by the following equation.
W1 = Ceil{(W0 − Wk + 1)/S}  (1)
Here, W0 is the resolution before convolution, Wk is the surface size of the kernel, S is the stride, and Ceil{X} is a function that rounds X up to the nearest integer.
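Equation (1) can be traced numerically. The following non-limiting sketch (the function name is hypothetical) applies Equation (1) successively to the example layer configuration given above, starting from 1×36 input data:

```python
import math

def resolution_after_convolution(w0, wk, s):
    """W1 = Ceil{(W0 - Wk + 1) / S}  ... Equation (1)."""
    return math.ceil((w0 - wk + 1) / s)

# Tracing the example configuration above (input resolution 1x36):
w = resolution_after_convolution(36, 6, 2)   # Conv layer 210    -> 16
w = resolution_after_convolution(w, 1, 1)    # PrimeVN layer 220 -> 16
w = resolution_after_convolution(w, 5, 2)    # ConvVN1 layer 230 -> 6
w = resolution_after_convolution(w, 4, 1)    # ConvVN2 layer 240 -> 3
w = resolution_after_convolution(w, 3, 1)    # ClassVN layer 250 -> 1
```

The resulting resolutions 6, 3, and 1 match the numbers of partial regions of the ConvVN1, ConvVN2, and ClassVN layers referred to later in the description.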
The resolution of each layer illustrated in
The ClassVN layer 250 has Nm number of channels. In the example of
In
As illustrated in
In the present disclosure, the vector neuron layer used for calculation of a similarity is also referred to as a “specific layer”. As the specific layer, a discretionary number of one or more vector neuron layers can be used. Note that the configuration of the feature spectrum Sp and the calculation method of a similarity S using the feature spectrum Sp are described below.
Description of Configuration of Each Layer
Conv layer 210: Conv [32, 5, 2]
PrimeVN layer 220: PrimeVN [16, 1, 1]
ConvVN1 layer 230: ConvVN1 [12, 3, 2]
ConvVN2 layer 240: ConvVN2 [6, 3, 1]
ClassVN layer 250: ClassVN [Nm, 4, 1]
Vector dimension VD: VD=16
The machine learning model 200 illustrated in
When the data to be determined IM is input into the five machine learning models 200, the feature spectrum Sp is calculated from the specific layer of each one of the five machine learning models 200, and these are input into the similarity calculation unit 310. The similarity calculation unit 310 calculates a by class similarity Sclass, which is the similarity between the feature spectrum Sp and the known feature spectrum group KSp of the corresponding specific layer.
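The similarity measure itself is not fixed by the description above, so the following is only a sketch under the assumption that cosine similarity is used and that the by class similarity Sclass takes the highest similarity per class (the function names are hypothetical):

```python
import math

def cosine_similarity(sp, ksp):
    """Similarity between a feature spectrum Sp and one known feature
    spectrum KSp (cosine similarity is assumed for illustration)."""
    dot = sum(a * b for a, b in zip(sp, ksp))
    norm = math.sqrt(sum(a * a for a in sp)) * math.sqrt(sum(b * b for b in ksp))
    return dot / norm

def by_class_similarity(sp, known_spectra_by_class):
    """Sclass: for each class, the highest similarity between Sp and the
    known feature spectra KSp belonging to that class."""
    return {cls: max(cosine_similarity(sp, ksp) for ksp in spectra)
            for cls, spectra in known_spectra_by_class.items()}
```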
In step S12, the data generating unit 112 executes the first data processing. Specifically, the data generating unit 112 divides the plurality of data for learning TD prepared in step S10 and generates M number of input learning data groups IDG. An example of step S12 executed by the data generating unit 112 via the first data processing will be described using
The data generating unit 112 executes the division of a single piece of data for input IM for each data for input IM to generate the input learning data group IDG, which is a collection of the first type divided input data IDa belonging to the same region.
As illustrated in
The vertical axis in
The number of feature spectra Sp obtained from the output of the ConvVN1 layer 230 with respect to one piece of data is equal to the number of plane positions (x, y) of the ConvVN1 layer 230, that is, the number of the partial regions R230, and is thus 6. Similarly, three feature spectra Sp are obtained from the output of the ConvVN2 layer 240 with respect to one piece of data, and one feature spectrum Sp is obtained from the output of the ClassVN layer 250.
When the data for input IMa after division of the input learning data group IDG is input again into the trained machine learning model 200, the similarity calculation unit 310 calculates the feature spectrum Sp illustrated in
Each record of the known feature spectrum group KSp_ConvVN1 includes a parameter m for distinguishing between the M number of machine learning models 200, a parameter i indicating the label or a class, a parameter j indicating the specific layer, a parameter k indicating the partial region Rn, a parameter q indicating the data number, and the known feature spectrum KSp associated with the parameters i, j, k, and q. The known feature spectrum KSp is the same as the feature spectrum Sp of
The class parameter i is class classification information indicating which class the known feature spectrum KSp belongs to and has the same value of 1 to 3 as the label. The parameter j of the specific layer has a value of 1 to 3 indicating one of the three specific layers 230, 240, and 250. The parameter k of the partial region Rn has a value indicating which one of the plurality of partial regions Rn is included in each specific layer, that is, a value indicating which plane position (x, y). Since the number of partial regions R230 in the ConvVN1 layer 230 is 6, k=1 to 6. The parameter q of the data number indicates the number of the data for input IMa after division to which the same label is attached and has values of 1 to max1 for the class 1, 1 to max2 for the class 2, and 1 to max3 for the class 3. The known feature spectrum KSp associated with the parameter i, which is the class classification information, in this manner is also referred to as the by class known feature spectrum KSp.
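The record structure described above may be sketched, in a non-limiting way, as a small data structure (field and type names are hypothetical):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class KnownFeatureSpectrumRecord:
    m: int                  # which of the M machine learning models (1 to M)
    i: int                  # class (label) the spectrum belongs to (1 to 3)
    j: int                  # specific layer (1: ConvVN1, 2: ConvVN2, 3: ClassVN)
    k: int                  # partial region Rn, i.e., plane position (x, y)
    q: int                  # data number within the class (1 to max_i)
    ksp: Tuple[float, ...]  # the known feature spectrum KSp itself
```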
As described above, the known feature spectrum groups KSp are obtained from the output of the specific layer by inputting a corresponding input learning data group IDG into each one of the M number of machine learning models 200_1 to 200_5.
Note that the plurality of input learning data groups IDG used in step S20 are not necessarily the same as the plurality of input learning data groups IDG used in step S14. However, if some or all of the plurality of pieces of input learning data ID used in step S14 are also used in step S20, there is an advantage in that it is not necessary to prepare new input learning data ID.
As illustrated in
As illustrated in
As illustrated in
As illustrated in
As illustrated in
In the first integration processing, the total determination unit 320 calculates the cumulative activation value by adding together the five activation values a for each of the three classes. Then, as illustrated in
In the second integration processing, the total determination unit 320 generates a similarity for determination by integrating the respective similarities S calculated for the first model 200_1 to the fifth model 200_5. In the present embodiment, as illustrated in
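The two integration processings can be outlined as a non-limiting sketch (the way the similarities are integrated is assumed here to be a plain average, and the function names are hypothetical):

```python
def first_integration(per_model_activations):
    """First integration processing: add together the activation values a
    of the M models for each class to obtain the cumulative activation
    value per class."""
    classes = per_model_activations[0].keys()
    return {c: sum(a[c] for a in per_model_activations) for c in classes}

def second_integration(per_model_similarities):
    """Second integration processing: integrate the similarities S of the
    M models into one similarity for determination (a plain average is
    assumed here for illustration)."""
    return sum(per_model_similarities) / len(per_model_similarities)
```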
As illustrated in
According to the first embodiment described above, as illustrated in
Note that the total determination unit 320 may omit step S42 and step S46. In other words, the total determination unit 320 may set the class with the highest activation value for determination as the determination class regardless of the magnitude of the similarity for determination. This makes it possible to easily decide the determination class using the activation value for determination without using the similarity for determination.
The class determination process illustrated in
B-1. Other Embodiment 1 of Class Determination Process:
In step S34a, the similarity calculation unit 310 generates, as an element of the individual data DD, in addition to the activation value a corresponding to each class and the similarity S, a pre-determined class using the activation value a and the similarity S. The similarity calculation unit 310 generates the pre-determined class by executing the following processes (1) and (2).
Process (1):
In this process, for each one of the first model 200_1 to the fifth model 200_5, when the similarity S is equal to or greater than a predetermined threshold, the class corresponding to the activation value a with the highest value from among the activation values a corresponding to the classes is set as the pre-determined class.
Process (2):
In this process, for each one of the first model 200_1 to the fifth model 200_5, when the similarity S is less than a predetermined threshold, an unknown class different from the class corresponding to the pre-label is set as the pre-determined class.
The threshold in the processes (1) and (2) described above is set to a value at which the data to be determined for input IM_1 to IM_5 are estimated to be not similar to the data for input IMa_1 to IMa_5 after division relating to each class. In the present embodiment, the threshold is set to 0.7.
As described above, according to the process (1), the class corresponding to the activation value a with the largest value can be set as the pre-determined class. Also, as described above, since an unknown class can be set as a pre-determined class according to the process (2), class determination of the data to be determined IM can be executed with higher accuracy. In the present disclosure, an unknown class is represented by class 0.
By the similarity calculation unit 310 executing the processes (1) and (2) described above in step S34a, class 1 is generated as the pre-determined class for the first model 200_1, class 0, i.e., an unknown class, is generated for the second model 200_2, class 1 is generated for the third model 200_3, class 1 is generated for the fourth model 200_4, and class 1 is generated for the fifth model 200_5.
Next, in step S48, the total determination unit 320 sets the class most prevalent among the pre-determined classes of the first model 200_1 to the fifth model 200_5 as the class of the data to be determined IM. This makes it possible to easily determine the class of the data to be determined IM using the pre-determined classes.
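Processes (1) and (2) and the majority decision of step S48 can be sketched as follows, purely for illustration (the function names are hypothetical; the threshold 0.7 and the unknown class 0 follow the description above):

```python
from collections import Counter

UNKNOWN_CLASS = 0   # class 0 represents the unknown class
THRESHOLD = 0.7     # threshold used in processes (1) and (2)

def pre_determined_class(similarity_s, activations):
    """Process (1): when S is equal to or greater than the threshold,
    take the class whose activation value a is highest.
    Process (2): when S is less than the threshold, the unknown class."""
    if similarity_s >= THRESHOLD:
        return max(activations, key=activations.get)
    return UNKNOWN_CLASS

def majority_class(pre_classes):
    """Step S48: the most prevalent pre-determined class among the M
    models becomes the class of the data to be determined IM."""
    return Counter(pre_classes).most_common(1)[0][0]
```

With the pre-determined classes of the example above (class 1, class 0, class 1, class 1, class 1), the majority decision yields class 1.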
In the other embodiment 1 of the class determination process described above, the similarity calculation unit 310 determines the pre-determined class taking into account a threshold, but the process is not limited thereto. For example, the similarity calculation unit 310 may, in step S34a, generate a class corresponding to the activation value a with the highest value from among the activation values a corresponding to the classes for the first model 200_1 to the fifth model 200_5, regardless of the magnitude of the similarity S.
B-2. Other Embodiment 2 of Class Determination Process:
In step S48b, the total determination unit 320 sets, from among the pre-determined classes of the first model 200_1 to the fifth model 200_5, the pre-determined class with the highest similarity S as the class of the data to be determined IM. In the example illustrated in
Note that the other embodiment 2 of the class determination process is not limited to that described above. For example, in step S48b, the total determination unit 320 may calculate the sum or product of the similarities S included in the individual data DD for each class of the same pre-determined class and set the pre-determined class with the highest calculated value as the determination class of the data to be determined IM. This will be described using the following examples.
First model . . . (Pre-determined class=Class 1, Similarity S=0.8)
Second Model . . . (Pre-determined class=Class 1, Similarity S=0.7)
Third Model . . . (Pre-determined class=Class 3, Similarity S=0.7)
Fourth Model . . . (Pre-determined class=Class 2, Similarity S=0.9)
Fifth Model . . . (Pre-determined class=Class 2, Similarity S=0.8)
In the case described above, the total determination unit 320 calculates the sum of the similarities S included in the individual data DD for each class of the same pre-determined class, for example. Regarding the sum, the sum of the similarities S of class 1 is 1.5, the sum of the similarities S of class 2 is 1.7, and the sum of the similarities S of class 3 is 0.7. Thus, the total determination unit 320 sets, as the determination class of the data to be determined IM, the class 2 with the highest sum of 1.7. Accordingly, the determination class of the data to be determined IM can be easily determined using the similarity S, without using the activation value a.
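The sum-of-similarities variant described above can be sketched as follows (the function name is hypothetical), using the five model results from the example:

```python
def class_by_similarity_sum(individual_data):
    """individual_data: (pre-determined class, similarity S) pairs, one
    per model.  Sum S per class; the class with the highest sum becomes
    the determination class, as in the worked example above."""
    sums = {}
    for cls, s in individual_data:
        sums[cls] = sums.get(cls, 0.0) + s
    return max(sums, key=sums.get), sums

best, sums = class_by_similarity_sum(
    [(1, 0.8), (1, 0.7), (3, 0.7), (2, 0.9), (2, 0.8)])
# best is class 2, whose sum of similarities is 1.7
```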
B-3. Other Embodiment 3 of Class Determination Process:
B-4. Other Embodiment 4 of Determination Process:
In the other embodiments 1 to 3 of the determination process described above, when one of the plurality of pre-determined classes corresponding to the plurality of machine learning models 200_1 to 200_5 indicates the unknown class, the total determination unit 320 may set the unknown class as the determination class of the data to be determined IM regardless of the classes indicated by the other pre-determined classes. When one of the plurality of pre-determined classes indicates the unknown class, there is a likelihood that the data to be determined IM is unknown. Thus, by setting the class of the data to be determined IM to the unknown class when one of the pre-determined classes indicates the unknown class, class determination can be executed with higher accuracy.
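The unknown-class override described above can be sketched, in a non-limiting way, as (the function name is hypothetical):

```python
from collections import Counter

UNKNOWN_CLASS = 0  # class 0 represents the unknown class

def determination_class_with_unknown_override(pre_classes):
    """When any of the pre-determined classes of the models 200_1 to
    200_5 is the unknown class, the determination class is set to the
    unknown class regardless of the other pre-determined classes;
    otherwise a majority decision is taken as in step S48."""
    if UNKNOWN_CLASS in pre_classes:
        return UNKNOWN_CLASS
    return Counter(pre_classes).most_common(1)[0][0]
```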
According to another embodiment of the determination process described above, as illustrated in
In the other embodiment illustrated in
Next, the similarity calculation unit 310 generates, as a pre-determined class as an element of the individual data DD, a class associated with the representative similarity with the highest value from among the representative similarities of the by class similarity Sclass calculated for each class. In this manner, the pre-determined class can be easily generated using the by class similarity Sclass, without using the activation value a. Here, when the representative similarity with the highest value is less than a predetermined threshold, instead of the class associated with the representative similarity, an unknown class different from a class corresponding to a pre-label is generated as the pre-determined class as an element of the individual data. In this manner, an unknown class can be generated as the pre-determined class, and thus a pre-determined class can be generated with higher accuracy. In the present embodiment, the predetermined threshold is set to 0.7 as in the processes (1) and (2) described above.
In the class determination process, first, the similarity calculation unit 310 obtains the individual data DD by inputting the data to be determined for input IM_1 to IM_5 into the corresponding two machine learning models 200. Next, the similarity calculation unit 310 sets which individual data DD of the two machine learning models 200 corresponding to each of the regions R1 to R5 to use in class determination. Specifically, the similarity calculation unit 310 calculates a model reliability Rmodel that depends on the similarity S included in the individual data DD and sets the individual data DD obtained from the machine learning model 200 with the highest model reliability Rmodel as the data to be used in class determination. By executing integration processing via the first integration processing and the second integration processing in a similar manner as in the first embodiment described above, the activation value for determination and the similarity for determination are calculated for the five pieces of individual data DD used in class determination set for each of the regions R1 to R5. In addition, as in the first embodiment, the total determination unit 320 sets the determination class from the activation value for determination and the similarity for determination. Note that the setting method for setting the individual data DD obtained from the machine learning model 200 with the highest model reliability Rmodel as the data to be used in class determination can be applied to other embodiments of the present disclosure. For example, in the first embodiment illustrated in
For example, any of the following can be used as a reliability function for obtaining the model reliability Rmodel from the similarity S.
Rmodel(i)=H1[S(i)]=S(i) (3a)
Rmodel(i)=H2[S(i)]=Ac(i)×Wt+S(i)×(1−Wt) (3b)
Rmodel(i)=H3[S(i)]=Ac(i)×S(i) (3c)
where Ac(i) is an activation value corresponding to the determination value with the highest value in the output layer of the machine learning model 200, and Wt is a weighting coefficient satisfying 0&lt;Wt&lt;1.
The reliability function H1 of the above-described Equation (3a) is an identity function using the similarity S as is as the model reliability Rmodel. The reliability function H2 of the above-described Equation (3b) is a function for obtaining the model reliability Rmodel by finding the weighted average of the similarity S and the activation value Ac. The reliability function H3 of the above-described Equation (3c) is a function for obtaining the model reliability Rmodel by multiplying the similarity S and the activation value Ac. Other reliability functions may also be used. For example, a function may be used in which a power of the similarity S is used as the model reliability Rmodel. Thus, a model reliability Rmodel can be obtained that is dependent on the similarity S. Additionally, the model reliability Rmodel preferably has a positive correlation to the similarity S.
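The three reliability functions of Equations (3a) to (3c) can be sketched as plain functions; the names h1, h2, and h3 are illustrative only.

```python
def h1(s):
    # Equation (3a): identity function, the similarity S is used as is
    return s

def h2(s, ac, wt):
    # Equation (3b): weighted average of the activation value Ac and the similarity S
    assert 0.0 < wt < 1.0  # Wt must satisfy 0 < Wt < 1
    return ac * wt + s * (1.0 - wt)

def h3(s, ac):
    # Equation (3c): product of the activation value Ac and the similarity S
    return ac * s
```

Each returns a model reliability Rmodel with a positive correlation to the similarity S, as the text above recommends.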
In the first embodiment described above, data for input is divided into M or more number of regions, and a collection of the first type of divided input data IDa after division that belongs in the same region is generated as one input learning data group IDG by the data generating unit 112 as first data processing. However, the first data processing may include dividing the data for input into one or more or two or more regions to generate, as one input learning data group IDG, a collection of the first type divided input data IDa after division belonging to the same region. In this case, the same input learning data group IDG may be input into at least two models of the M number of machine learning models 200. Since the performance of the machine learning model 200 with each training may change even with the same input learning data, a plurality of the machine learning models 200 trained with the same input learning data may be used. With known techniques, when a single machine learning model is trained and class determination of the data to be determined IM is performed, the determination accuracy may be reduced. However, according to the first embodiment and the other embodiments of the first embodiment described above, since class determination is performed using the plurality of machine learning models 200 with different class determination performances, class determination accuracy can be improved. Such an effect can also be achieved by the second embodiment described below.
As illustrated in
As illustrated in
In the second embodiment also, the pre-preparation process illustrated in
According to the second embodiment described above, as illustrated in
In the second embodiment described above, as the second data processing, the data generating unit 112 executes division processing to divide the plurality of data for input IM belonging to one class into M or more number of pieces and generates a collection of the second type divided input data IDb after division as one input learning data group IDG. However, the second data processing may include dividing the plurality of data for learning TD belonging to one class into one or more or two or more groups to generate, as one input learning data group IDG, a collection of the second type divided input data IDb after division. For example, suppose the plurality of data for learning TD belonging to one class are referred to as data for learning TD1, TD2, and TD3. In this case, the following seven input learning data groups IDG are generated. For each of the machine learning models 200_1 and 200_2, one or more data groups are selected from the following seven generated input learning data groups IDG and used in learning.
(1) First input learning data group . . . Configuration by data for learning TD1.
(2) Second input learning data group . . . Configuration by data for learning TD2.
(3) Third input learning data group . . . Configuration by data for learning TD3.
(4) Fourth input learning data group . . . Configuration by data for learning TD1 and TD2.
(5) Fifth input learning data group . . . Configuration by data for learning TD1 and TD3.
(6) Sixth input learning data group . . . Configuration by data for learning TD2 and TD3.
(7) Seventh input learning data group . . . Configuration by data for learning TD1, TD2, and TD3.
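The seven groups above are all non-empty combinations of TD1, TD2, and TD3, and can be enumerated as in this minimal sketch; the function name is hypothetical.

```python
from itertools import combinations

def generate_input_learning_data_groups(data_for_learning):
    """Return every non-empty combination of the given pieces of data for
    learning; for three pieces this yields the seven groups listed above."""
    groups = []
    for r in range(1, len(data_for_learning) + 1):
        groups.extend(list(c) for c in combinations(data_for_learning, r))
    return groups

groups = generate_input_learning_data_groups(["TD1", "TD2", "TD3"])
print(len(groups))  # → 7
```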
Note that since the performance of the machine learning models 200_1 and 200_2 with each training may change even with the same input learning data, a plurality of the machine learning models 200 trained with the same input learning data may be used. That is, in the second data processing, the plurality of data for learning TD belonging to one class may be divided into one or more groups regardless of the number of machine learning models 200.
Any one of the following three methods can be used as the calculation method of the by class similarity Sclass described above, for example.
(1) First calculation method M1 for obtaining by class similarity Sclass without considering correspondence between the partial regions Rn in the feature spectrum Sp and the known feature spectrum group KSp
(2) Second calculation method M2 for obtaining by class similarity Sclass based on the corresponding partial regions Rn in the feature spectrum Sp and the known feature spectrum group KSp
(3) Third calculation method M3 for obtaining by class similarity Sclass without considering the partial regions Rn at all
Hereinafter, a method of calculating the by class similarity Sclass_ConvVN1 from the output of the ConvVN1 layer 230 according to the three calculation methods M1, M2, and M3 will be sequentially described. Note that in the following description, the parameter m of the machine learning model 200 and the parameter q of the data to be determined IM are omitted.
In the first calculation method M1, the local similarity S(i, j, k) is calculated using the following equation.
S(i, j, k)=max[G{Sp(j, k), KSp(i, j, k=all, q=all)}] (c1)
where i is a parameter indicating the class,
j is a parameter indicating the specific layer,
k is a parameter indicating the partial region Rn,
q is a parameter indicating the data number,
G{a, b} is a function for obtaining the similarity between a and b,
Sp(j,k) is a feature spectrum obtained from the output of a specific partial region k of the specific layer j according to the data to be determined,
KSp(i, j, k=all, q=all) is, from the known feature spectrum group KSp illustrated in
max[X] is a logical calculation that takes the maximum value of the values of X.
Note that for the function G{a, b} for obtaining the similarity, for example, a formula for obtaining cosine similarity, a formula for obtaining similarity corresponding to distance, or the like can be used.
The three types of class similarities Sclass(i, j) illustrated on the right side of
As described above, in the first calculation method M1 for the by class similarity,
(1) the local similarity S(i, j, k) which is the similarity between the feature spectrum Sp obtained from the output of the specific partial region k of the specific layer j according to the data to be determined IM and all of the known feature spectrum KSp associated with the specific layer j and each class i is obtained, and
(2) the by class similarity Sclass(i, j) is obtained by taking the maximum value, the average value, the minimum value, or the modal value of the local similarity S(i, j, k) for the plurality of partial regions k for each class i. According to the first calculation method M1, the by class similarity Sclass(i, j) can be obtained by a relatively simple calculation and process.
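A minimal sketch of the first calculation method M1, assuming cosine similarity for the function G{a, b} and the maximum as the reduction over partial regions (the average, minimum, or mode could be substituted); the data layout and names are illustrative.

```python
import numpy as np

def cosine_similarity(a, b):
    # One possible choice for the function G{a, b}
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def by_class_similarity_m1(sp, ksp, reduce=max):
    """sp: feature spectrums Sp(j, k) of one specific layer j, shape (K, D),
    one row per partial region k.
    ksp: dict mapping class i to an array (N, D) of known feature spectrums
    pooled over all partial regions and data numbers (k=all, q=all).
    Step (1): local similarity S(i, j, k) per Equation (c1);
    step (2): reduce over the partial regions k (maximum by default)."""
    sclass = {}
    for cls, known in ksp.items():
        local = [max(cosine_similarity(sp[k], known[n]) for n in range(len(known)))
                 for k in range(len(sp))]
        sclass[cls] = reduce(local)
    return sclass
```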
In the second calculation method M2, the local similarity S(i, j, k) is calculated using the following equation.
S(i, j, k)=max[G{Sp(j, k), KSp(i, j, k, q=all)}] (c2)
where KSp(i, j, k, q=all) is the known feature spectrum of all the data numbers q in the specific partial region k of the specific layer j associated with the class i in the known feature spectrum group KSp illustrated in
In the first calculation method M1 described above, the known feature spectrum KSp(i, j, k=all, q=all) in all of the partial regions k of the specific layer j is used. However, in the second calculation method M2, only the partial region k of the feature spectrum Sp(j, k) and the known feature spectrum KSp(i, j, k, q=all) for the same partial region k are used. The other methods in the second calculation method M2 are the same as in the first calculation method M1.
In the second calculation method M2 for the by class similarity,
(1) the local similarity S(i, j, k) which is the similarity between the feature spectrum Sp obtained from the output of the specific partial region k of the specific layer j according to the data to be determined IM and all of the known feature spectrum KSp associated with the specific partial region k of the specific layer j and each class i is obtained, and
(2) the by class similarity Sclass(i, j) is obtained by taking the maximum value, the average value, the minimum value, or the modal value of the local similarity S(i, j, k) for the plurality of partial regions k for each class i. According to the second calculation method M2 also, the by class similarity Sclass(i, j) can be obtained by a relatively simple calculation and process.
The by class similarity Sclass(i, j) obtained via the third calculation method M3 is calculated using the following equation.
Sclass(i, j)=max[G{Sp(j, k=all), KSp(i, j, k=all, q=all)}] (c3)
where Sp(j, k=all) is the feature spectrum obtained from the output of all of the partial regions k of the specific layer j according to the data to be determined IM.
As described above, in the third calculation method M3 for the by class similarity,
(1) the by class similarity Sclass(i, j) which is the similarity between all of the feature spectrums Sp obtained from the output of the specific layer j according to the data to be determined IM and all of the known feature spectrum KSp associated with the specific layer j and each class i is obtained.
According to the third calculation method M3, the by class similarity Sclass(i, j) can be obtained by an even simpler calculation and process.
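A corresponding sketch of the third calculation method M3, under the same assumption of cosine similarity for the function G{a, b}; names and array layout are illustrative.

```python
import numpy as np

def by_class_similarity_m3(sp_all, ksp_all):
    """sp_all: one feature spectrum Sp(j, k=all) covering all partial regions
    of the specific layer j, shape (D,).
    ksp_all: dict mapping class i to an array (N, D) of known feature
    spectrums KSp(i, j, k=all, q=all).
    Equation (c3): the by class similarity is the maximum similarity between
    sp_all and any known spectrum of the class."""
    def g(a, b):  # cosine similarity assumed for G{a, b}
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return {cls: max(g(sp_all, known[n]) for n in range(len(known)))
            for cls, known in ksp_all.items()}
```

Because the partial regions are not distinguished at all, this variant needs only one pass over the known spectrums per class.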
The calculation method for the output vector of each layer in the machine learning model 200 illustrated in
In each node of the PrimeVN layer 220, a scalar output of 1×1×32 nodes in the Conv layer 210 is regarded as a 32-dimensional vector, and a vector output at the node is obtained by multiplying this vector by a transformation matrix. The transformation matrix is an element of a kernel having a surface size of 1×1 and is updated by the learning of the machine learning model 200. Note that the processing of the Conv layer 210 and the processing of the PrimeVN layer 220 can be integrated to form one primary vector neuron layer.
When the PrimeVN layer 220 is referred to as a "lower layer L" and the ConvVN1 layer 230 adjacent to an upper side thereof is referred to as an "upper layer L+1", the output at each node of the upper layer L+1 is determined using the following equations.
vij=WLij×MLi (E1)
uj=Σivij (E2)
aj=F(|uj|) (E3)
ML+1j=aj×uj/|uj| (E4)
Here, MLi is an output vector of the ith node in the lower layer L,
ML+1j is an output vector of the jth node in the upper layer L+1,
vij is a prediction vector of the output vector ML+1j,
WLij is a prediction matrix for calculating the prediction vector vij from the output vector MLi in the lower layer L,
uj is the sum of the prediction vectors vij, that is, a sum vector, which is a linear combination,
aj is an activation value, which is a normalization coefficient obtained by normalizing a norm |uj| of the sum vector uj, and
F(X) is a normalization function for normalizing X.
As the normalization function F(X), for example, the following Equations (E3a) or (E3b) can be used.
aj=F(|uj|)=exp(β|uj|)/Σkexp(β|uk|) (E3a)
aj=F(|uj|)=|uj|/Σk|uk| (E3b)
Here, k is an ordinal number for all the nodes in the upper layer L+1, and
β is an adjustment parameter which is an optional positive coefficient, for example, β=1.
In the above-described Equation (E3a), the activation value aj is obtained by normalizing the norm |uj| of the sum vector uj with a softmax function for all the nodes in the upper layer L+1. On the other hand, in Equation (E3b), the activation value aj is obtained by dividing the norm |uj| of the sum vector uj by the sum of the norms |uj| for all the nodes in the upper layer L+1. Note that, as the normalization function F(X), a function other than Equations (E3a) and (E3b) may be used.
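Both normalization functions can be sketched as follows, assuming the norms |uj| of all the nodes in the upper layer L+1 are collected into one array; the function names are illustrative.

```python
import numpy as np

def activation_softmax(norms, beta=1.0):
    """Equation (E3a): softmax of the sum-vector norms |u_j| over all nodes
    in the upper layer L+1; beta is the adjustment parameter."""
    e = np.exp(beta * np.asarray(norms, dtype=float))
    return e / e.sum()

def activation_norm_ratio(norms):
    """Equation (E3b): each norm |u_j| divided by the sum of the norms."""
    norms = np.asarray(norms, dtype=float)
    return norms / norms.sum()
```

In both cases the activation values aj sum to one across the layer, so each aj reads as a relative output intensity.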
The ordinal number i in the above-described Equation (E2) is, for the sake of convenience, assigned to the node in the lower layer L used to determine the output vector ML+1j at the jth node in the upper layer L+1 and takes a value from 1 to n. Also, an integer n is the number of nodes in the lower layer L used to determine the output vector ML+1j in the jth node in the upper layer L+1. Thus, the integer n is given by the following equation.
n=Nk×Nc (E5)
Here, Nk is the surface size of the kernel, and Nc is the number of channels in the PrimeVN layer 220 which is the lower layer. In the example of
One kernel used to obtain the output vector in ConvVN1 layer 230 has 1×5×26=130 elements with a kernel size of 1×5 as the surface size and a number of channels of 26 in the lower layer as the depth, and each of these elements is the prediction matrix WLij. Also, in order to generate the output vectors having 20 channels in the ConvVN1 layer 230, 20 sets of these kernels are necessary. Therefore, the number of prediction matrices WLij of the kernel used to obtain the output vector in the ConvVN1 layer 230 is 130×20=2600. These prediction matrices WLij are updated by the learning of the machine learning model 200.
As can be seen from the above-described Equations (E1) to (E4), the output vector ML+1j at each node in the upper layer L+1 is obtained by the following calculation.
(a) The prediction vector vij is obtained by multiplying the output vector MLi at each node in the lower layer L by the prediction matrix WLij,
(b) the sum vector uj, which is the sum of the prediction vectors vij obtained from each node in the lower layer L, that is, the linear combination, is obtained,
(c) the activation value aj, which is the normalization coefficient obtained by normalizing the norm |uj| of the sum vector uj, is obtained, and
(d) the sum vector uj is divided by the norm |uj| and further multiplied by the activation value aj.
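Steps (a) to (d) can be sketched as a single forward computation, here assuming the softmax normalization of Equation (E3a) and an array layout in which the prediction matrices WLij are stacked per upper node j and lower node i; all names are illustrative.

```python
import numpy as np

def vector_neuron_layer(m_lower, w, beta=1.0):
    """m_lower: (n, d_in) output vectors M^L_i of the n lower-layer nodes.
    w: (J, n, d_out, d_in) prediction matrices W^L_ij for the J upper nodes."""
    # (a) prediction vectors v_ij = W^L_ij x M^L_i
    v = np.einsum('jnoi,ni->jno', w, m_lower)
    # (b) sum vectors u_j = sum_i v_ij (a linear combination)
    u = v.sum(axis=1)
    norms = np.linalg.norm(u, axis=1)
    # (c) activation values a_j by normalizing |u_j| over all upper nodes (E3a)
    e = np.exp(beta * norms)
    a = e / e.sum()
    # (d) M^{L+1}_j = a_j * u_j / |u_j|, so each output vector has length a_j
    return (a / norms)[:, None] * u
```

No dynamic routing loop appears: one evaluation of the four steps yields all upper-layer output vectors.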
Note that the activation value aj is a normalization coefficient obtained by normalizing the norm |uj| for all the nodes in the upper layer L+1. Thus, the activation value aj can be considered as an index indicating a relative output intensity at each node among all the nodes in the upper layer L+1. The norm used in Equations (E3), (E3a), (E3b), and (E4) is an L2 norm indicating a vector length in a typical example. At this time, the activation value aj corresponds to the vector length of the output vector ML+1j. Since the activation value aj is only used in the above-described Equations (E3) and (E4), it does not need to be output from the node. However, it is also possible to configure the upper layer L+1 to output the activation value aj to the outside.
The configuration of a vector neural network is almost the same as the configuration of a capsule network, and a vector neuron of a vector neural network corresponds to a capsule of a capsule network. However, the calculation according to the above-described Equations (E1) to (E4) used in the vector neural network is different from the calculation used in the capsule network. The biggest difference between the two networks is that in the capsule network, the prediction vector vij on the right side of the above-described Equation (E2) is multiplied by a weight, and the weight is searched for by repeating dynamic routing a plurality of times. On the other hand, in the vector neural network of the present embodiment, since the output vector ML+1j can be obtained by calculating the above-described Equations (E1) to (E4) once in order, there is an advantage in that it is not necessary to repeat the dynamic routing and the calculation is faster. In addition, in the vector neural network of the present embodiment, the amount of memory required for calculation is smaller than that of the capsule network; according to an experiment by the inventor of the present disclosure, a memory amount of approximately ½ to ⅓ of that of the capsule network is sufficient.
In terms of using a node at which a vector is input and output, the vector neural network is the same as the capsule network. Thus, the advantages of using the vector neuron are also common to the capsule network. In addition, regarding the feature of a larger region being expressed as the position is higher, and the feature of a smaller region being expressed as the position is lower in the plurality of layers 210 to 250, the vector neural network is the same as a normal convolutional neural network. Here, the term "feature" means a characteristic portion included in the input data to a neural network. Regarding the output vector at a certain node including spatial information of the feature represented by the node, the vector neural network and the capsule network are superior to the normal convolutional neural network. That is, the vector length of the output vector at the certain node represents an existence probability of the feature represented by the node, and the vector direction represents the spatial information such as a direction and a scale of the feature. Accordingly, the vector directions of the output vectors at two nodes belonging to the same layer represent a positional relationship between the respective features. Alternatively, it can be said that the vector directions of the output vectors at two nodes represent a variation of the features. For example, in the case of a node corresponding to a feature of an "eye", the direction of the output vector may represent variations such as the narrowness of the eye, the way the eye rises, and the like. In a normal convolutional neural network, it is said that the spatial information of the feature is lost via pooling processing. As a result, there is an advantage in that the vector neural network and the capsule network have excellent performance in identifying the input data compared with a normal convolutional neural network.
The advantages of vector neural networks can also be thought of as follows. That is, in the vector neural network, there is an advantage in that the output vector at the node expresses the feature of the input data as coordinates in a continuous space. Thus, the output vector can be evaluated such that the features are similar if the vector directions are close. In addition, there is also an advantage in that even if the feature included in the input data is not covered by the training data, the feature can be determined by interpolation. On the other hand, since a normal convolutional neural network is subjected to random compression by the pooling processing, there is a disadvantage in that the features of the input data cannot be expressed as the coordinates in the continuous space.
Since the outputs at the nodes in the ConvVN2 layer 240 and the ClassVN layer 250 are also determined in the same manner by using the above-described Equations (E1) to (E4), a detailed description thereof will be omitted. The resolution of the ClassVN layer 250, which is the uppermost layer, is 1×1, and the number of channels is n1.
The output of the ClassVN layer 250 is converted into a plurality of determination values Class 0 to Class 2 for the known classes. These determination values are typically values normalized by the softmax function. Specifically, for example, the determination value for each class can be obtained by executing a calculation including calculating the vector length of the output vector from the output vector at each node in the ClassVN layer 250 and normalizing the vector length of each node by the softmax function. As described above, the activation value aj obtained by the above-described Equation (E3) is a value corresponding to the vector length of the output vector ML+1j and is normalized. Accordingly, the activation value aj at each node in the ClassVN layer 250 may be output and used as is as the determination value for each class.
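The conversion of the ClassVN layer outputs into determination values can be sketched as follows, assuming the vector length of each node is normalized by the softmax function as described above; the function name is illustrative.

```python
import numpy as np

def determination_values(class_vectors):
    """class_vectors: (n_classes, d) output vectors of the ClassVN layer nodes.
    The vector length of each node is normalized with the softmax function to
    give the determination value for each known class."""
    lengths = np.linalg.norm(np.asarray(class_vectors, dtype=float), axis=1)
    e = np.exp(lengths)
    return e / e.sum()
```

Alternatively, as the text notes, the already-normalized activation values aj of the ClassVN nodes may be used as the determination values directly.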
In the embodiment described above, as the machine learning model 200, the vector neural network for obtaining the output vector by the calculation of the above-described Equations (E1) to (E4) is used, but instead, the capsule network described in U.S. Pat. No. 5,210,798 and WO 2019/083553 may be used.
I. Other Aspects:
The present disclosure is not limited to the embodiments described above, and may be implemented in various aspects without departing from the spirit of the disclosure. For example, the present disclosure can be achieved in the aspects described below. Appropriate replacements or combinations may be made to the technical features in the above-described embodiments which correspond to the technical features in the aspects described below to solve some or all of the problems of the disclosure or to achieve some or all of the advantageous effects of the disclosure. Additionally, when the technical features are not described herein as essential technical features, such technical features may be deleted appropriately.
(1) According to a first aspect of the present disclosure, a learning method for M (an integer of two or more) number of vector neural network type machine learning models including a plurality of vector neuron layers, the M number of machine learning models being used in determining a class of data to be determined is provided. The learning method includes (a) preparing a plurality of pieces of data for learning including data for input and a pre-label associated with the data for input; (b) dividing the plurality of pieces of data for learning into one or more groups to generate one or more input learning data groups; and (c) training the M number of machine learning models so that a correspondence between the data for input and the pre-label associated with the data for input is reproduced, by inputting the corresponding input learning data groups respectively into the M number of machine learning models, wherein (b) includes (b1) dividing the plurality of pieces of data for input into one or more regions to generate, as one of the input learning data groups, a collection of first type divided input data after division belonging to the same region, or (b2) dividing the plurality of pieces of data for learning belonging to one class into one or more groups to generate, as one of the input learning data groups, a collection of second type divided input data after division.
According to this aspect, a collection of first type divided input data can be used as one input learning data group in the learning of one machine learning model, or a collection of second type divided input data can be used as one input learning data group in the learning of one machine learning model. This makes it possible to reduce the amount of data used in the learning of one machine learning model, and thus suppresses the learning from taking a long time.
(2) According to a second aspect of the present disclosure, a determining method for determining a class of data to be determined using M (an integer of two or more) number of vector neural network type machine learning models including a plurality of vector neuron layers is provided. The determining method includes (a) preparing the M number of machine learning models trained using a plurality of pieces of data for learning including data for input and a pre-label associated with the data for input, wherein each one of the M number of machine learning models is trained using one corresponding group of one or more input learning data groups, the one or more input learning data groups being obtained by dividing the plurality of pieces of data for learning; (b) preparing M number of known feature spectrum groups corresponding to the M number of machine learning models after training, wherein the M number of known feature spectrum groups include a known feature spectrum group obtained from an output of a specific layer from among the plurality of vector neuron layers by inputting the input learning data groups into the M number of machine learning models after the training; (c) obtaining individual data used in class determination of the data to be determined for each one of the M number of machine learning models by inputting data to be determined for input generated from the data to be determined into each one of the M number of machine learning models after the training, wherein, for each one of the M number of machine learning models, the individual data is generated using at least one of (i) a similarity between a feature spectrum calculated from an output of the specific layer according to input of the data to be determined for input into the machine learning model and the known feature spectrum group, and (ii) an activation value corresponding to a determination value for each class output from an output layer of the machine learning model according to an input 
of the data to be determined for input; and (d) executing class determination for the data to be determined using M number of pieces of the individual data obtained respectively for the M number of machine learning models, wherein (a) includes one of (a1) dividing the plurality of pieces of data for input into one or more regions, and using a collection of first type divided input data after division belonging to the same region as one of the input learning data groups, and (a2) executing division processing to divide the plurality of pieces of data for learning belonging to one class into one or more groups, and using a collection of second type divided input data after the division processing as one of the input learning data groups.
According to this aspect, a collection of first type divided input data can be used as one input learning data group in the learning of one machine learning model, or a collection of second type divided input data can be used as one input learning data group in the learning of one machine learning model. This makes it possible to determine a class of the data to be determined using a machine learning model that can reduce the amount of data used in the learning of one machine learning model, and thus suppresses the learning from taking a long time.
(3) In the aspect described above, each one of the M number of pieces of individual data may include the activation value corresponding to each class; and (d) may include setting, as a determination class, a class with a highest activation value for determination calculated using a cumulative activation value obtained by adding together the activation values of the M number of pieces of individual data for each class. According to this aspect, the determination class can be easily determined using the activation value for determination.
(4) In the aspect described above, each one of the M number of pieces of individual data may include the similarity; and (d) may include (d1) generating a similarity for determination by integrating the respective similarities of the machine learning models, and (d2) setting, when the similarity for determination is equal to or greater than a predetermined threshold, a class with a highest activation value for determination as the determination class, and setting, when the similarity for determination is less than the threshold, regardless of the activation value for determination, an unknown class different from a class corresponding to a pre-label, as the determination class. According to this aspect, the determination class can be determined with high accuracy using the similarity for determination and the activation value for determination.
(5) In the aspect described above, (c) may include generating a class corresponding to the activation value with a highest value from among the activation values corresponding to the classes for each machine learning model as a pre-determined class as an element of the individual data. According to this aspect, a class corresponding to the determination class candidate can be generated by setting the class corresponding to the activation value with the highest value as the pre-determined class.
(6) In the aspect described above, (c) may include generating, when the similarity is equal to or greater than a predetermined threshold, a class corresponding to the activation value with a highest value from among the activation values corresponding to the classes as a pre-determined class as an element of the individual data for each machine learning model, and generating, when the similarity is less than the threshold, the pre-determined class as an unknown class different from a class corresponding to the pre-label as an element of the individual data for each machine learning model. According to this aspect, since an unknown class can be set as a pre-determined class, class determination of the data to be determined can be executed with higher accuracy.
(7) In the aspect described above, (c) may include (i) for each one of the plurality of specific layers, calculating a multiplication value obtained by multiplying a weighting coefficient set for each one of the plurality of specific layers and the similarity corresponding to one of the specific layers and setting a sum of the multiplication values calculated as the similarity used in class determination, or (ii) setting a maximum value or a minimum value of the similarities corresponding to the plurality of specific layers as the similarity used in class determination. According to this aspect, even when a plurality of specific layers are provided, similarity used for the class determination can be easily calculated.
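The two combination options of aspect (7) can be sketched as follows; the function signature is an assumption for illustration.

```python
def combine_layer_similarities(layer_sims, weights=None, mode="weighted_sum"):
    """layer_sims: similarities corresponding to the plurality of specific layers.
    weights: weighting coefficient set for each specific layer (option (i) only).
    """
    # (i) Sum of the multiplication values of weight and similarity per layer.
    if mode == "weighted_sum":
        return sum(w * s for w, s in zip(weights, layer_sims))
    # (ii) Maximum or minimum of the layer similarities.
    if mode == "max":
        return max(layer_sims)
    if mode == "min":
        return min(layer_sims)
    raise ValueError(mode)
```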
(8) In the aspect described above, each known feature spectrum included in the known feature spectrum group prepared in (b) is associated with class classification information indicating which class the known feature spectrum belongs to, and when the known feature spectrum associated with the class classification information is referred to as a by class known feature spectrum, (c) may include calculating a by class similarity, which is the similarity between the by class known feature spectrum and the feature spectrum, for each class, and generating the class associated with the by class similarity with a highest value from among the by class similarities calculated for the classes as a pre-determined class as an element of the individual data. According to this aspect, the pre-determined class can be easily generated using the by class similarity.
(9) In the aspect described above, the calculating of the by class similarity in (c) may include
calculating the similarity between each one of the plurality of by class known feature spectrums and the feature spectrum for each class, and calculating a representative similarity of the plurality of similarities for each class by executing statistical processing of the plurality of similarities calculated for each class; and the generation in (c) may include generating the class associated with the representative similarity with a highest value from among the representative similarities calculated for each class as the pre-determined class as an element of the individual data. According to this aspect, the pre-determined class can be easily generated using the representative similarity.
(10) In the aspect described above, the statistical processing of the plurality of similarities may include calculating a maximum value, a median value, an average value, or a modal value of the plurality of similarities as the representative similarity. According to this aspect, the pre-determined class can be easily generated using the representative similarity.
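Aspects (8) through (10) can be sketched as a small routine that reduces the per-class similarities to a representative similarity and takes the class with the highest value. The choice of reducer mirrors the statistical processing options of aspect (10); all names are illustrative.

```python
import statistics

def by_class_pre_determined_class(sims_by_class, stat="max"):
    """sims_by_class: mapping from class label to the list of similarities
    between the feature spectrum and that class's by class known feature
    spectrums."""
    reducers = {
        "max": max,                      # maximum value
        "median": statistics.median,     # median value
        "mean": statistics.mean,         # average value
        "mode": statistics.mode,         # modal value
    }
    reduce = reducers[stat]
    # Representative similarity per class via the chosen statistical processing.
    representative = {c: reduce(sims) for c, sims in sims_by_class.items()}
    # The class with the highest representative similarity is the pre-determined class.
    return max(representative, key=representative.get)
```

Note that the chosen statistic can change the outcome: with `{"A": [0.2, 0.9], "B": [0.5, 0.7]}`, the maximum selects "A" while the average selects "B".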
(11) In the aspect described above, (c) may further include generating an unknown class different from a class corresponding to the pre-label instead of the class associated with the by class similarity as the pre-determined class as an element of the individual data when the highest value is less than a predetermined threshold. According to this aspect, an unknown class can be generated as the pre-determined class, and thus a pre-determined class can be generated with higher accuracy.
(12) In the aspect described above, (d) may include setting a most prevalent class from the pre-determined classes included in the individual data of the plurality of machine learning models as a class of the data to be determined. According to this aspect, the class of the data to be determined can be easily determined using the pre-determined class.
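The majority vote of aspect (12) is a one-liner in practice; this sketch assumes the pre-determined classes are collected into a list, one per model.

```python
from collections import Counter

def majority_vote(pre_determined_classes):
    # Aspect (12): the most prevalent pre-determined class across the M
    # machine learning models becomes the class of the data to be determined.
    return Counter(pre_determined_classes).most_common(1)[0][0]
```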
(13) In the aspect described above, (c) may include generating a similarity between a feature spectrum calculated from an output of the specific layer and the known feature spectrum group as an element of the individual data; and (d) may include (i) setting a class with a highest value for the similarity from among the pre-determined classes included in the individual data of the plurality of machine learning models as a class of the data to be determined, or (ii) calculating a sum or product of the similarities included in the individual data of the machine learning models with the same pre-determined class and setting the pre-determined class with a highest calculated value as a class of the data to be determined. According to this aspect, the determination class of the data to be determined can be easily generated using the similarity.
(14) In the aspect described above, (c) may include generating a similarity between a feature spectrum calculated from an output of the specific layer and the known feature spectrum group as an element of the individual data; and (d) may include calculating a reference value for each one of the plurality of machine learning models using the similarity and a weighting coefficient preset for each one of the plurality of machine learning models, and setting a class of the data to be determined using the pre-determined class and the reference value calculated. According to this aspect, the class of the data to be determined can be determined taking into account the weighting coefficient set for each machine learning model.
(15) In the aspect described above, in the reference value calculating step, the reference value may be calculated for each one of the machine learning models by multiplying the similarity and the weighting coefficient; and the class setting step may include (i) setting the pre-determined class with a highest sum of the reference values of the machine learning models with the same pre-determined class as a class of the data to be determined, or (ii) setting the pre-determined class of the machine learning model with a maximum or minimum value for the reference value as a class of the data to be determined. According to this aspect, the class of the data to be determined can be determined taking into account the weighting coefficient set for each machine learning model.
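Aspects (14) and (15), option (i), can be sketched as follows: each model's reference value is its similarity multiplied by its preset weighting coefficient, and the reference values are summed per pre-determined class. The data layout is an assumption for illustration.

```python
def class_by_reference_value(individual, weights):
    """individual: list of (pre_determined_class, similarity) pairs, one per model.
    weights: weighting coefficient preset for each machine learning model."""
    totals = {}
    for (cls, sim), w in zip(individual, weights):
        # Reference value for this model: similarity times weighting coefficient.
        totals[cls] = totals.get(cls, 0.0) + sim * w
    # (i) The pre-determined class with the highest summed reference value
    # becomes the class of the data to be determined.
    return max(totals, key=totals.get)
```

With uniform weights, two models voting "A" outweigh one model voting "B"; down-weighting those models can flip the result, which is the point of the per-model coefficients.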
(16) In the aspect described above, (d) may include, when one of the plurality of pre-determined classes corresponding to the plurality of machine learning models indicates an unknown class different from a class corresponding to the pre-label, setting the unknown class as a class of the data to be determined, regardless of classes indicated by other pre-determined classes. According to this aspect, a class of the data to be determined can be set as the unknown class when one of the pre-determined classes indicates an unknown class.
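The veto rule of aspect (16) can be layered on top of any of the earlier combination schemes; a majority vote is used here only as a placeholder fallback.

```python
from collections import Counter

def determine_with_unknown_veto(pre_determined_classes):
    # Aspect (16): if any model's pre-determined class is the unknown class,
    # the data to be determined is classified as unknown, regardless of the
    # classes indicated by the other pre-determined classes.
    if "unknown" in pre_determined_classes:
        return "unknown"
    return Counter(pre_determined_classes).most_common(1)[0][0]
```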
(17) In the aspect described above, the division processing in (a2) may be (i) executed by performing clustering of the plurality of pieces of data for learning belonging to the same class, or (ii) executed by randomly extracting the plurality of pieces of data for learning belonging to the same class via sampling with replacement. According to this aspect, the collection of the second type divided input data can be easily generated by performing clustering of the plurality of pieces of data for learning or by random extraction via sampling with replacement.
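Option (ii) of aspect (17), sampling with replacement, can be sketched with the standard library; the clustering option (i) would typically use a k-means-style routine instead. The group sizes and seed are illustrative parameters.

```python
import random

def divide_by_sampling(samples, n_groups, group_size, seed=0):
    """samples: the pieces of data for learning belonging to one class.
    Returns n_groups collections of second type divided input data, each
    drawn randomly with replacement (so one sample may appear in several
    groups, or several times in one group)."""
    rng = random.Random(seed)
    return [[rng.choice(samples) for _ in range(group_size)]
            for _ in range(n_groups)]
```

Each resulting group can then serve as one input learning data group for one of the M machine learning models, shrinking the per-model training set.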
(18) According to a third aspect of the present disclosure, a learning apparatus for M (an integer of two or more) number of vector neural network type machine learning models including a plurality of vector neuron layers, the M number of machine learning models being used in determining a class of data to be determined is provided. The learning apparatus includes a memory; and a processor configured to execute training of the M number of machine learning models, wherein the processor executes processing to divide a plurality of pieces of data for learning including data for input and a pre-label associated with the data for input into one or more groups to generate the one or more input learning data groups, and processing to train the M number of machine learning models so that a correspondence between the data for input and the pre-label associated with the data for input is reproduced, by inputting the corresponding input learning data groups respectively into the M number of machine learning models; the processing to generate the one or more input learning data groups includes
processing to divide the plurality of pieces of data for input into one or more regions and generate, as one of the input learning data groups, a collection of first type divided input data after division belonging to the same region, or processing to divide the plurality of pieces of data for learning belonging to one class into one or more groups to generate, as one of the input learning data groups, a collection of second type divided input data after division.
According to this aspect, a collection of first type divided input data can be used as one input learning data group in the learning of one machine learning model, or a collection of second type divided input data can be used as one input learning data group in the learning of one machine learning model. This makes it possible to reduce the amount of data used in the learning of one machine learning model, and thus suppresses the learning from taking a long time.
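The first type division, splitting each piece of data for input into regions, can be sketched for the common case of 2-D array data divided along a grid. The nested-list representation and grid parameters are assumptions; the disclosure does not fix a data format.

```python
def divide_into_regions(image, n_rows, n_cols):
    """Split a 2-D array (nested list) into an n_rows x n_cols grid of
    regions. Collecting the sub-array from the same grid position across
    all pieces of data for input yields one input learning data group of
    first type divided input data."""
    h, w = len(image), len(image[0])
    rh, rw = h // n_rows, w // n_cols
    regions = []
    for r in range(n_rows):
        for c in range(n_cols):
            regions.append([row[c * rw:(c + 1) * rw]
                            for row in image[r * rh:(r + 1) * rh]])
    return regions
```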
(19) According to a fourth aspect of the present disclosure, a determining apparatus for determining a class of data to be determined using M (an integer of two or more) number of vector neural network type machine learning models including a plurality of vector neuron layers is provided. The determining apparatus includes a memory configured to store the M number of machine learning models trained using a plurality of pieces of data for learning including data for input and a pre-label associated with the data for input, wherein each one of the M number of machine learning models is trained using one corresponding group of one or more input learning data groups, the one or more input learning data groups being obtained by dividing the plurality of pieces of data for learning; and
a processor configured to execute class determination of the data to be determined by inputting the data to be determined into the M number of machine learning models, wherein the processor executes processing to generate M number of known feature spectrum groups corresponding to the M number of machine learning models after training, wherein the M number of known feature spectrum groups include a known feature spectrum group obtained from an output of a specific layer from among the plurality of vector neuron layers by inputting the input learning data groups into the M number of machine learning models after the training; processing to obtain individual data used in class determination of the data to be determined for each one of the M number of machine learning models by inputting data to be determined for input generated from the data to be determined into each one of the M number of machine learning models after the training, wherein, for each one of the M number of machine learning models, the individual data is generated using at least one of (i) a similarity between a feature spectrum calculated from an output of the specific layer according to input of the data to be determined for input into the machine learning model and the known feature spectrum group, and (ii) an activation value corresponding to a determination value for each class output from an output layer of the machine learning model according to an input of the data to be determined for input; and processing to execute class determination for the data to be determined using M number of pieces of the individual data obtained respectively for the M number of machine learning models; and the input learning data group is either
a collection of first type divided input data after division belonging to the same region of one or more regions obtained by dividing the plurality of pieces of data for input, or a collection of second type divided input data after the division processing in which the plurality of pieces of data for learning belonging to one class are divided into one or more groups.
According to this aspect, a collection of first type divided input data can be used as one input learning data group in the learning of one machine learning model, or a collection of second type divided input data can be used as one input learning data group in the learning of one machine learning model. This makes it possible to determine a class of the data to be determined using a machine learning model that can reduce the amount of data used in the learning of one machine learning model, and thus suppresses the learning from taking a long time.
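The similarity between a feature spectrum and a known feature spectrum group, used throughout the aspects above, is not given a formula in this section. One plausible reading, assumed here purely for illustration, is the highest cosine similarity between the feature spectrum and any member of the group.

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similarity_to_known_group(feature_spectrum, known_group):
    # Assumed definition: the similarity to the known feature spectrum group
    # is the best match against any single known feature spectrum.
    return max(cosine(feature_spectrum, k) for k in known_group)
```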
(20) According to a fifth aspect of the present disclosure, a non-transitory computer-readable storage medium storing a computer program configured to cause a processor to execute training of M (an integer of two or more) number of vector neural network type machine learning models including a plurality of vector neuron layers, the M number of machine learning models being used in determining a class of data to be determined is provided. The computer program includes a function (a) of dividing a plurality of pieces of data for learning including data for input and a pre-label associated with the data for input into one or more groups to generate the one or more input learning data groups; and a function (b) of training the M number of machine learning models so that a correspondence between the data for input and the pre-label associated with the data for input is reproduced, by inputting the corresponding input learning data groups respectively into the M number of machine learning models, wherein the function (a) includes
a function of dividing the plurality of pieces of data for input into one or more regions to generate, as one of the input learning data groups, a collection of first type divided input data after division belonging to the same region, or a function of dividing the plurality of pieces of data for learning belonging to one class into one or more groups to generate, as one of the input learning data groups, a collection of second type divided input data after division.
According to this aspect, a collection of first type divided input data can be used as one input learning data group in the learning of one machine learning model, or a collection of second type divided input data can be used as one input learning data group in the learning of one machine learning model. This makes it possible to reduce the amount of data used in the learning of one machine learning model, and thus suppresses the learning from taking a long time.
(21) According to a sixth aspect of the present disclosure, a non-transitory computer-readable storage medium storing a computer program configured to cause a processor to execute determination of a class of data to be determined using M (an integer of two or more) number of vector neural network type machine learning models including a plurality of vector neuron layers is provided. The computer program includes a function (a) of storing the M number of machine learning models trained using a plurality of pieces of data for learning including data for input and a pre-label associated with the data for input, wherein each one of the M number of machine learning models is trained using one corresponding group of one or more input learning data groups, the one or more input learning data groups being obtained by dividing the plurality of pieces of data for learning; a function (b) of generating M number of known feature spectrum groups corresponding to the M number of machine learning models after training, wherein the M number of known feature spectrum groups include a known feature spectrum group obtained from an output of a specific layer from among the plurality of vector neuron layers by inputting the input learning data groups into the M number of machine learning models after the training; a function (c) of obtaining individual data used in class determination of the data to be determined for each one of the M number of machine learning models by inputting data to be determined for input generated from the data to be determined into each one of the M number of machine learning models after the training, wherein, for each one of the M number of machine learning models, the individual data is generated using at least one of (i) a similarity between a feature spectrum calculated from an output of the specific layer according to input of the data to be determined for input into the machine learning model and the known feature spectrum group, and (ii) an activation 
value corresponding to a determination value for each class output from an output layer of the machine learning model according to an input of the data to be determined for input; and a function (d) of executing class determination for the data to be determined using M number of pieces of the individual data obtained respectively for the M number of machine learning models, wherein the input learning data group is either a collection of first type divided input data after division belonging to the same region of one or more regions obtained by dividing the plurality of pieces of data for input, or a collection of second type divided input data after the division processing in which the plurality of pieces of data for learning belonging to one class are divided into one or more groups.
According to this aspect, a collection of first type divided input data can be used as one input learning data group in the learning of one machine learning model, or a collection of second type divided input data can be used as one input learning data group in the learning of one machine learning model. This makes it possible to determine a class of the data to be determined using a machine learning model that can reduce the amount of data used in the learning of one machine learning model, and thus suppresses the learning from taking a long time.
The present disclosure may be embodied in various forms other than that described above. For example, the present disclosure can be embodied as a non-transitory storage medium storing a computer program.
Number | Date | Country | Kind
---|---|---|---
2021-200573 | Dec 2021 | JP | national