The present application is based on, and claims priority from JP Application Serial Number 2021-189877, filed Nov. 24, 2021, the disclosure of which is hereby incorporated by reference herein in its entirety.
The present disclosure relates to a regression processing device configured to execute regression processing using a machine learning model, a method, and a non-transitory computer-readable storage medium storing a computer program.
U.S. Pat. No. 5,210,798 and WO 2019/083553 each disclose a so-called capsule network as a machine learning model of a vector neural network type using vector neurons. A vector neuron is a neuron whose input and output are in a vector expression. The capsule network is a machine learning model in which the vector neuron, called a capsule, is a node of the network. The vector neural network-type machine learning model such as the capsule network is applicable to classification processing of input data.
However, application of the vector neural network to regression processing has not been sufficiently examined in the related art, and hence a technique enabling highly accurate regression processing using the vector neural network has been desired.
According to a first aspect of the present disclosure, there is provided a regression processing device configured to execute regression processing of obtaining a predicted output value with respect to input data using a machine learning model including a vector neural network including a plurality of vector neuron layers. The regression processing device includes a regression processing unit configured to execute the regression processing, and a memory configured to store a known feature spectrum group obtained from an output of a specific layer of the machine learning model when a plurality of pieces of teaching data are input to the machine learning model. The regression processing unit is configured to execute processing (a) of obtaining the predicted output value with respect to the input data using the machine learning model, processing (b) of reading out the known feature spectrum group from the memory, processing (c) of calculating a degree of similarity relating to the predicted output value between the known feature spectrum group and a feature spectrum obtained from an output of the specific layer when the input data is input to the machine learning model, and processing (d) of outputting the predicted output value using the degree of similarity.
According to a second aspect of the present disclosure, there is provided a method of executing regression processing of obtaining a predicted output value with respect to input data using a machine learning model including a vector neural network including a plurality of vector neuron layers. The method includes (a) obtaining the predicted output value with respect to the input data using the machine learning model, (b) reading out, from a memory, a known feature spectrum group obtained from an output of a specific layer of the machine learning model when a plurality of pieces of teaching data are input to the machine learning model, (c) calculating a degree of similarity relating to the predicted output value between the known feature spectrum group and a feature spectrum obtained from an output of the specific layer when the input data is input to the machine learning model, and (d) outputting the predicted output value using the degree of similarity.
According to a third aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing a computer program for causing a processor to execute regression processing of obtaining a predicted output value with respect to input data using a machine learning model including a vector neural network including a plurality of vector neuron layers. The computer program causes the processor to (a) obtain the predicted output value with respect to the input data using the machine learning model, (b) read out, from a memory, a known feature spectrum group obtained from an output of a specific layer of the machine learning model when a plurality of pieces of teaching data are input to the machine learning model, (c) calculate a degree of similarity relating to the predicted output value between the known feature spectrum group and a feature spectrum obtained from an output of the specific layer when the input data is input to the machine learning model, and (d) output the predicted output value using the degree of similarity.
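For illustration, the processing (a) through (d) shared by the three aspects may be sketched as follows. This is a minimal Python sketch; the names `model.predict`, `model.feature_spectrum`, and the threshold-based output in step (d) are assumptions made for illustration, not limitations of the disclosure.

```python
import numpy as np

def max_cosine_similarity(sp, known):
    # maximum cosine similarity of one feature spectrum against
    # every known feature spectrum (rows of `known`)
    sims = known @ sp / (np.linalg.norm(known, axis=1) * np.linalg.norm(sp))
    return float(sims.max())

def regress_with_similarity(model, x, known_spectra, threshold=0.8):
    # `known_spectra` is assumed to have been read out from the memory (step (b))
    y_pred = model.predict(x)                      # (a) predicted output value
    sp = model.feature_spectrum(x)                 # (c) feature spectrum of the specific layer
    sm = max_cosine_similarity(sp, known_spectra)  # (c) degree of similarity
    return y_pred, sm, sm >= threshold             # (d) output using the degree of similarity
```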
The information processing device 100 includes a processor 110, a memory 120, an interface circuit 130, and an input device 140 and a display device 150 that are coupled to the interface circuit 130. The camera 400 is also coupled to the interface circuit 130. For example, though not limited thereto, the processor 110 is provided with a function of executing the processing described below in detail, and a function of displaying, on the display device 150, data obtained through the processing and data generated in the course of the processing.
The processor 110 functions as a learning execution unit 112 that executes learning of a machine learning model and a regression processing unit 114 that executes regression processing for input data. The regression processing unit 114 includes a degree of similarity arithmetic unit 310 and an output execution unit 320. Each of the learning execution unit 112 and the regression processing unit 114 is implemented when the processor 110 executes a computer program stored in the memory 120. Alternatively, the learning execution unit 112 and the regression processing unit 114 may be implemented with a hardware circuit. The term “processor” in the present disclosure encompasses such a hardware circuit. Further, one or a plurality of processors that execute the learning processing or the regression processing may be processors included in one or a plurality of remote computers coupled via a network.
In the memory 120, a machine learning model 200, a teaching data group TD, and a known feature spectrum group GKSp are stored. The machine learning model 200 is used for processing executed by the regression processing unit 114. A configuration example and an operation of the machine learning model 200 are described later. The teaching data group TD is a group of labeled data used for learning of the machine learning model 200. In the present exemplary embodiment, the teaching data group TD is a set of image data. The known feature spectrum group GKSp is a set of feature spectra that are obtained by inputting teaching data to the machine learning model 200 that is previously learned. The feature spectrum is described later.
An image having a size of 28×28 pixels is input into the input layer 210. A configuration of each of the layers other than the input layer 210 is described as follows.
In the description of each of the layers, the character string before the brackets indicates a layer name, and the numbers in the brackets indicate the number of channels, a kernel surface size, and a stride in the stated order. For example, the layer name of the Conv layer 220 is “Conv”, the number of channels is 32, the kernel surface size is 5×5, and the stride is two.
Each of the input layer 210 and the Conv layer 220 is a layer configured with scalar neurons. Each of the other layers 230 to 260 is a layer configured with vector neurons. The vector neuron is a neuron whose input and output are in a vector expression. In the description given above, the dimension of an output vector of an individual vector neuron is constant at 16. In the description given below, the term “node” is used as a superordinate concept of the scalar neuron and the vector neuron.
As is well known, a resolution W1 after convolution is given by the following equation.
W1=Ceil{(W0−Wk+1)/S} (A1)
Here, W0 is a resolution before convolution, Wk is the kernel surface size, S is the stride, and Ceil{X} is a function of rounding up digits after the decimal point in the value X.
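As a quick numerical check of Equation (A1), a short sketch (the function name is ours):

```python
import math

def conv_resolution(w0: int, wk: int, s: int) -> int:
    # W1 = Ceil{(W0 - Wk + 1)/S}  ...(A1)
    return math.ceil((w0 - wk + 1) / s)

# The Conv layer 220: a 28x28 input, a 5x5 kernel, and a stride of two
print(conv_resolution(28, 5, 2))  # -> 12
```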
The resolution of each of the layers is illustrated in
The RegressVN layer 260 has M channels. M is the number of predicted output values that are output from the machine learning model 200. In the present exemplary embodiment, M is one, and one predicted output value θpr is output. The predicted output value θpr is not a discrete value but a continuous value. The number M of predicted output values may be two or more. For example, when a three-dimensional object image is used as input data, the machine learning model 200 may be configured so that three rotation angles about three axes thereof are obtained as predicted output values.
As an activation function of the RegressVN layer 260, the linear function in Equation (A2) may be used.
aj=∥uj∥ (A2)
Here, aj indicates a norm of an output vector after activation in the j-th neuron in the layer, uj is an output vector before activation in the j-th neuron in the layer, and ∥uj∥ indicates a norm of the vector uj. In other words, an output of the RegressVN layer 260 is a value corresponding to a length of the vector uj before activation.
As an activation function of the RegressVN layer 260, various functions other than the linear function in Equation (A2) given above may be used. However, a softmax function is not suitable. A freely-selected activation function may be used for each layer other than the RegressVN layer 260.
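For illustration, Equation (A2) may be written as follows (a minimal sketch; the function name is ours):

```python
import numpy as np

def regress_vn_activation(u):
    # Equation (A2): a_j = ||u_j||, the norm of the pre-activation output
    # vector u_j of the j-th neuron; axis=-1 treats the last axis as the
    # vector dimension
    return np.linalg.norm(u, axis=-1)
```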
In the present disclosure, a vector neuron layer used for calculation of the degree of similarity is also referred to as a “specific layer”. As the specific layer, the vector neuron layers other than the ConvVN2 layer 250 may be used. One or more vector neuron layers may be used, and the number of vector neuron layers is freely selectable. Note that a configuration of the feature spectrum and an arithmetic method of the degree of similarity using the feature spectrum are described later.
In Step S120, the learning execution unit 112 executes learning of the machine learning model 200 using the teaching data group TD. A freely-selected loss function may be used at the time of learning. In the present exemplary embodiment, the mean squared error (MSE) is used. After completion of learning, the machine learning model 200 that is previously learned is stored in the memory 120.
In Step S130, the learning execution unit 112 inputs a plurality of pieces of teaching data again to the machine learning model 200 that is previously learned, and generates the known feature spectrum group GKSp. The known feature spectrum group GKSp is a set of feature spectra, which is described later.
The number of feature spectra Sp that can be obtained from an output of the ConvVN2 layer 250 with respect to one piece of input data is equal to the number of plane positions (x, y) of the ConvVN2 layer 250, in other words, the number of partial regions R250, which is nine.
The learning execution unit 112 inputs the teaching data again to the machine learning model 200 that is previously learned, calculates the feature spectra Sp illustrated in
Each record in the known feature spectrum group GKSp includes a parameter k indicating the order of the partial region Rn in the layer, a parameter c indicating the class, a parameter q indicating the data number, and a known feature spectrum KSp. The known feature spectrum KSp is the same as the feature spectrum Sp in
The parameter k of the partial region Rn is a value indicating any one of the plurality of partial regions Rn included in the specific layer, in other words, any one of the plane positions (x, y). In a case of the ConvVN2 layer 250, the number of partial regions R250 is nine, and hence k=1 to 9. The parameter q of the data number indicates a serial number of the teaching data, and is a value from 1 to max. For example, max=5000.
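For illustration only, one record of the known feature spectrum group GKSp may be represented as in the following sketch; the class name and field names are assumptions, not the present disclosure's data format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class KnownSpectrumRecord:
    k: int           # partial region Rn (plane position); k = 1 to 9 for the ConvVN2 layer 250
    c: int           # parameter indicating the class
    q: int           # data number; q = 1 to max (for example, max = 5000)
    ksp: np.ndarray  # known feature spectrum KSp

# the known feature spectrum group GKSp as a collection of such records
gksp: list[KnownSpectrumRecord] = []
```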
The plurality of pieces of teaching data used in Step S130 are not required to be the same as the plurality of pieces of teaching data used in Step S120. When part or entirety of the plurality of pieces of teaching data used in Step S120 is also used in Step S130, there is no need to prepare new teaching data, which is advantageous.
In Step S240, the regression processing unit 114 inputs the input data to the machine learning model 200, and obtains the predicted output value θpr. In the present exemplary embodiment, the predicted output value θpr is a rotation angle of the hand-written characters included in the input data. In Step S250, the regression processing unit 114 obtains the feature spectrum Sp illustrated in
In Step S270, the output execution unit 320 outputs the predicted output value θpr using the degree of similarity.
As described above, in the present exemplary embodiment, the regression processing can be executed with high accuracy using the machine learning model 200 including the vector neural network. Further, the predicted output value θpr is output using the degree of similarity Sm, and hence a reliable predicted output value θpr can be obtained.
For example, any one of the following methods may be employed as the arithmetic method of the degree of similarity described above.
(1) A first arithmetic method M1 for obtaining a degree of similarity without considering correspondence of partial regions Rn of the feature spectrum Sp and the known feature spectrum group GKSp.
(2) A second arithmetic method M2 for obtaining a degree of similarity in the corresponding partial regions Rn of the feature spectrum Sp and the known feature spectrum group GKSp.
(3) A third arithmetic method M3 for obtaining a degree of similarity without considering the partial region Rn at all.
In the following description, the methods of calculating a degree of similarity from an output of the ConvVN2 layer 250 are sequentially described in accordance with the arithmetic methods M1, M2, and M3.
In the first arithmetic method M1, the local degree of similarity S(k) is calculated using the following equation.
S(k)=max[G{Sp(k),KSp(k=all,q=all)}] (B1),
where
k is a parameter indicating the partial region Rn;
q is a parameter indicating the data number;
G{a, b} is the function for obtaining a degree of similarity between a and b;
Sp(k) is the feature spectrum obtained from an output of the specified partial region k of the specific layer in accordance with the input data;
KSp(k=all, q=all) are known feature spectra for all the data numbers q in all the partial regions k of the specific layer in the known feature spectrum group GKSp illustrated in
max[X] is an operation for obtaining a maximum value among the values X.
Note that, as the function G{a, b} for obtaining the degree of similarity, for example, an equation for obtaining a cosine degree of similarity or a degree of similarity corresponding to a distance may be used.
The three types of the degrees of similarity Sm, which are illustrated on the right side of
As described above, in the first arithmetic method M1 for obtaining a degree of similarity,
(1) the local degree of similarity S(k) is obtained, the local degree of similarity S(k) being a degree of similarity between the feature spectrum Sp obtained from an output of the specified partial region k of the specific layer in accordance with the input data and all the known feature spectra KSp associated with the specific layer, and
(2) the degree of similarity Sm is obtained by obtaining the maximum value, the average value, or the minimum value of the local degree of similarity S(k) for the plurality of partial regions k.
With the first arithmetic method M1, the degree of similarity Sm can be obtained in a calculation and a procedure that are relatively simple.
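For illustration, a minimal NumPy sketch of the first arithmetic method M1, assuming the cosine degree of similarity as G{a, b} and the known feature spectra stacked in an array indexed by (k, q); the names and array layout are our assumptions:

```python
import numpy as np

def cosine(a, b):
    # G{a, b}: a cosine degree of similarity between two spectra
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_m1(sp, ksp, aggregate=max):
    # sp:  array (K, D)    -- feature spectra Sp(k) of the K partial regions
    # ksp: array (K, Q, D) -- known feature spectra KSp(k, q)
    all_known = ksp.reshape(-1, ksp.shape[-1])       # KSp(k=all, q=all)
    local = [max(cosine(s, ks) for ks in all_known)  # S(k), Equation (B1)
             for s in sp]
    return aggregate(local)  # maximum, average, or minimum over the regions k
```

To obtain the average value instead of the maximum value, an averaging function may be passed as `aggregate`, for example `aggregate=lambda v: sum(v)/len(v)`.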
S(k)=max[G{Sp(k),KSp(k,q=all)}] (B2),
where
KSp(k, q=all) are known feature spectra for all the data numbers q in the specified partial region k of the specific layer in the known feature spectrum group GKSp illustrated in
In the first arithmetic method M1 described above, the known feature spectra KSp(k=all, q=all) in all the partial regions k of the specific layer are used. In contrast, the second arithmetic method M2 uses only the known feature spectra KSp(k, q=all) in the same partial region k as that of the feature spectrum Sp(k). The other contents of the second arithmetic method M2 are similar to those of the first arithmetic method M1.
In the second arithmetic method M2 for obtaining a degree of similarity,
(1) the local degree of similarity S(k) is obtained, the local degree of similarity S(k) being a degree of similarity between the feature spectrum Sp obtained from an output of the specified partial region k of the specific layer in accordance with the input data and all the known feature spectra KSp associated with the specified partial region k of the specific layer, and
(2) the degree of similarity Sm is obtained by obtaining the maximum value, the average value, or the minimum value of the local degree of similarity S(k) for the plurality of partial regions k.
With the second arithmetic method M2, the degree of similarity Sm can also be obtained in a calculation and a procedure that are relatively simple.
The degree of similarity Sm obtained in the third arithmetic method M3 is calculated using the following equation.
Sm=max[G{Sp(k=all),KSp(k=all,q=all)}] (B3),
where
Sp(k=all) is the feature spectrum obtained from an output of all the partial regions k of the specific layer in accordance with the input data.
As described above, in the third arithmetic method M3 for obtaining a degree of similarity, the degree of similarity Sm is obtained as a degree of similarity between all the feature spectra Sp obtained from an output of the specific layer in accordance with the input data and all the known feature spectra KSp associated with the specific layer.
With the third arithmetic method M3, the degree of similarity Sm can be obtained in a calculation and a procedure that are even simpler.
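Continuing the sketch above with the same assumed array layout and the `cosine` function, the second and third arithmetic methods may be written as follows; again, the names are illustrative only.

```python
def similarity_m2(sp, ksp, aggregate=max):
    # S(k) compares Sp(k) only with KSp(k, q=all), Equation (B2)
    local = [max(cosine(sp[k], ksp[k, q]) for q in range(ksp.shape[1]))
             for k in range(sp.shape[0])]
    return aggregate(local)

def similarity_m3(sp, ksp):
    # Sm compares every Sp(k) with every KSp(k, q) in one step, Equation (B3)
    all_known = ksp.reshape(-1, ksp.shape[-1])
    return max(cosine(s, ks) for s in sp for ks in all_known)
```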
Each of the three arithmetic methods M1 to M3 described above is a method for calculating a degree of similarity using an output of one specific layer. However, a calculation for a degree of similarity can be executed while one or more of the plurality of vector neuron layers 240, 250, and 260 illustrated in
Arithmetic methods for obtaining an output of each of the layers illustrated in
For each of the nodes of the PrimeVN layer 230, a vector output of the node is obtained by regarding the scalar outputs of 1×1×32 nodes of the Conv layer 220 as a 32-dimensional vector and multiplying this vector by a transformation matrix. The transformation matrix is an element of a kernel having a surface size of 1×1, and is updated by learning of the machine learning model 200. Note that the processing in the Conv layer 220 and the processing in the PrimeVN layer 230 may be integrated so as to configure one primary vector neuron layer.
When the PrimeVN layer 230 is referred to as a “lower layer L”, and the ConvVN1 layer 240 that is adjacent on the upper side is referred to as an “upper layer L+1”, an output of each node of the upper layer L+1 is determined using the following equations.

vij=WLij·MLi (E1)
uj=Σi vij (E2)
aj=F(∥uj∥) (E3)
ML+1j=aj×uj/∥uj∥ (E4)

where
MLi is an output vector of an i-th node in the lower layer L;
ML+1j is an output vector of a j-th node in the upper layer L+1;
vij is a predicted vector of the output vector ML+1j;
WLij is a prediction matrix for calculating the predicted vector vij from the output vector MLi of the lower layer L;
uj is a sum vector being a sum, that is, a linear combination, of the predicted vectors vij;
aj is an activation value being a normalization coefficient obtained by normalizing a norm |uj| of the sum vector uj; and
F(X) is a normalization function for normalizing X.
For example, as the normalization function F(X), Equation (E3a) or Equation (E3b) given below may be used.

aj=F(∥uj∥)=exp(β∥uj∥)/Σk exp(β∥uk∥) (E3a)
aj=F(∥uj∥)=∥uj∥/Σk∥uk∥ (E3b)

where
k is an ordinal number for all the nodes in the upper layer L+1; and
β is an adjustment parameter being a freely-selected positive coefficient, for example, β=1.
In addition to these, a sigmoid function may be used as the normalization function F(X). The term “sigmoid function” collectively refers to functions whose graphs have an S-shaped curve. Examples thereof include a logistic function F(x)=1/(1+exp(−βx)) and a hyperbolic tangent function tanh(x).
In Equation (E3a) given above, the activation value aj is obtained by normalizing the norm |uj| of the sum vector uj with the softmax function over all the nodes in the upper layer L+1. Meanwhile, in Equation (E3b), the activation value aj is obtained by dividing the norm |uj| of the sum vector uj by the sum of the norms of all the nodes in the upper layer L+1. Note that, as the normalization function F(X), a function other than Equation (E3a) and Equation (E3b) may be used.
For the sake of convenience, the ordinal number i in Equation (E2) given above is allocated to each of the nodes in the lower layer L used for determining the output vector ML+1j of the j-th node in the upper layer L+1, and takes a value from 1 to n. The integer n is the number of nodes in the lower layer L used for determining the output vector ML+1j of the j-th node in the upper layer L+1, and is given by the following equation.
n=Nk×Nc (E5)
Here, Nk is a kernel surface size, and Nc is the number of channels of the PrimeVN layer 230 being a lower layer. In the example of
One kernel used for obtaining an output vector of the ConvVN1 layer 240 has a surface size of 3×3 and a depth of 16, which is the number of channels in the lower layer, and hence has 144 (=3×3×16) elements, each of which is a prediction matrix WLij. Further, in order to generate output vectors of the 12 channels of the ConvVN1 layer 240, 12 of these kernels are required. Therefore, the number of prediction matrices WLij of the kernels used for obtaining the output vectors of the ConvVN1 layer 240 is 1,728 (=144×12). These prediction matrices WLij are updated by learning of the machine learning model 200.
As understood from Equation (E1) to Equation (E4) given above, the output vector ML+1j of each of the nodes in the upper layer L+1 is obtained by the following calculation.
(a) the predicted vector vij is obtained by multiplying the output vector MLi of each of the nodes in the lower layer L by the prediction matrix WLij;
(b) the sum vector uj being a sum of the predicted vectors vij of the respective nodes in the lower layer L, which is a linear combination, is obtained;
(c) the activation value aj being a normalization coefficient is obtained by normalizing the norm |uj| of the sum vector uj; and
(d) the sum vector uj is divided by the norm |uj|, and is further multiplied by the activation value aj.
Note that the activation value aj is a normalization coefficient obtained by normalizing the norm |uj| over all the nodes in the upper layer L+1. Therefore, the activation value aj can be considered as an index indicating a relative output intensity of each node among all the nodes in the upper layer L+1. The norm used in Equation (E3) and Equation (E4) is, in a typical example, an L2 norm indicating a vector length. In this case, the activation value aj corresponds to a vector length of the output vector ML+1j. The activation value aj is used only in Equation (E3) and Equation (E4) given above, and hence is not required to be output from the node. However, the upper layer L+1 may be configured so that the activation value aj is output to the outside.
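For illustration, the calculation of (a) through (d) may be sketched in NumPy as follows. The function name, the array shapes, and the use of Equation (E3a) as the normalization function F are assumptions made for the sketch, not the disclosure's own implementation.

```python
import numpy as np

def vector_neuron_forward(M_L, W, beta=1.0):
    # M_L: array (n, D_L)         -- output vectors of the n lower-layer nodes
    # W:   array (n, J, D_U, D_L) -- prediction matrices W_ij
    v = np.einsum('ijud,id->iju', W, M_L)   # (a) predicted vectors v_ij, Equation (E1)
    u = v.sum(axis=0)                       # (b) sum vectors u_j, Equation (E2)
    norms = np.linalg.norm(u, axis=1)       # |u_j|
    e = np.exp(beta * norms)
    a = e / e.sum()                         # (c) activation values a_j, Equation (E3a)
    M_U = a[:, None] * u / norms[:, None]   # (d) output vectors M_j^{L+1}, Equation (E4)
    return M_U, a
```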
A configuration of the vector neural network is substantially the same as a configuration of the capsule network, and the vector neuron in the vector neural network corresponds to the capsule in the capsule network. However, the calculation with Equation (E1) to Equation (E4) given above, which is used in the vector neural network, is different from a calculation used in the capsule network. The most significant difference between the two is that, in the capsule network, the predicted vector vij on the right side of Equation (E2) given above is multiplied by a weight, and the weight is searched for by repeating dynamic routing a plurality of times. Meanwhile, in the vector neural network of the present exemplary embodiment, the output vector ML+1j is obtained by calculating Equation (E1) to Equation (E4) given above once in a sequential manner. Thus, there is no need to repeat dynamic routing, and the calculation can be executed faster, which are advantageous points. Further, the vector neural network of the present exemplary embodiment requires a smaller amount of memory for the calculation than the capsule network. According to an experiment conducted by the inventor of the present disclosure, the vector neural network requires approximately ⅓ to ½ of the memory amount of the capsule network, which is also an advantageous point.
The vector neural network is similar to the capsule network in that a node whose input and output are in a vector expression is used. Therefore, the vector neural network is also similar to the capsule network in that the vector neuron is used. Further, in the plurality of layers 220 to 260, an upper layer expresses a feature of a larger region, and a lower layer expresses a feature of a smaller region, which is similar to the general convolutional neural network. Here, the “feature” indicates a feature included in input data to the neural network. In the vector neural network or the capsule network, an output vector of a certain node contains space information, that is, information relating to a spatial feature expressed by the node. In this regard, the vector neural network and the capsule network are superior to the general convolutional neural network. In other words, a vector length of an output vector of a certain node indicates an existence probability of a feature expressed by the node, and the vector direction indicates space information such as a direction and a scale of the feature. Therefore, vector directions of output vectors of two nodes belonging to the same layer indicate a positional relationship between the respective features. Alternatively, it can also be said that the vector directions of the output vectors of the two nodes indicate feature variations. For example, when a node corresponds to a feature of an “eye”, a direction of the output vector may express variations such as smallness of an eye and an almond shape of an eye. It is said that, in the general convolutional neural network, space information relating to a feature is lost due to pooling processing. As a result, as compared to the general convolutional neural network, the vector neural network and the capsule network are excellent in a function of distinguishing input data.
The advantageous points of the vector neural network can be considered as follows. In other words, the vector neural network has an advantageous point in that an output vector of a node expresses features of the input data as coordinates in a continuous space. Therefore, the output vectors can be evaluated in such a manner that similar vector directions indicate similar features. Further, even when features contained in input data are not covered in the teaching data, the features can be interpolated and distinguished from each other, which is also an advantageous point. In contrast, in the general convolutional neural network, pooling processing causes disorderly compression, and hence features of input data cannot be expressed as coordinates in a continuous space, which is a drawback.
An output of each of the nodes in the ConvVN2 layer 250 and the RegressVN layer 260 is similarly determined using Equation (E1) to Equation (E4) given above, and detailed description thereof is omitted. A resolution of the RegressVN layer 260 being the uppermost layer is 1×1, and the number of channels thereof is M.
In the RegressVN layer 260, the linear function in Equation (A2) given above or the like may be used as an activation function in place of Equation (E3) given above. In other words, the output vector of the RegressVN layer 260 is converted into the predicted output value θpr by the linear function in Equation (A2) given above. Alternatively, the above-mentioned sigmoid function may be used as an activation function.
In the exemplary embodiment described above, as the machine learning model 200, the vector neural network that obtains an output vector by a calculation with Equation (E1) to Equation (E4) given above is used. Instead, the capsule network disclosed in each of U.S. Pat. No. 5,210,798 and WO 2019/083553 may be used.
Other Aspects:
The present disclosure is not limited to the exemplary embodiment described above, and may be implemented in various aspects without departing from the spirit of the disclosure. For example, the present disclosure can also be achieved in the following aspects. Appropriate replacements or combinations may be made to the technical features in the above-described exemplary embodiment which correspond to the technical features in the aspects described below to solve some or all of the problems of the disclosure or to achieve some or all of the advantageous effects of the disclosure. Additionally, when the technical features are not described herein as essential technical features, such technical features may be deleted appropriately.
(1) According to a first aspect of the present disclosure, there is provided a regression processing device configured to execute regression processing of obtaining a predicted output value with respect to input data using a machine learning model including a vector neural network including a plurality of vector neuron layers. The regression processing device includes a regression processing unit configured to execute the regression processing, and a memory configured to store a known feature spectrum group obtained from an output of a specific layer of the machine learning model when a plurality of pieces of teaching data are input to the machine learning model. The regression processing unit is configured to execute processing (a) of obtaining the predicted output value with respect to the input data using the machine learning model, processing (b) of reading out the known feature spectrum group from the memory, processing (c) of calculating a degree of similarity relating to the predicted output value between the known feature spectrum group and a feature spectrum obtained from an output of the specific layer when the input data is input to the machine learning model, and processing (d) of outputting the predicted output value using the degree of similarity.
With this device, the regression processing can be executed with high accuracy using the machine learning model including the vector neural network. Further, the predicted output value is output using the degree of similarity. Thus, a highly reliable predicted output value can be obtained.
(2) In the regression processing device described above, the processing (d) may involve processing of outputting the degree of similarity, together with the predicted output value.
With this device, a user is allowed to determine whether the predicted output value is reliable based on the degree of similarity.
(3) In the regression processing device described above, the processing (d) may involve processing of outputting a degree of reliability of the predicted output value according to the degree of similarity, together with the predicted output value.
With this device, a user can easily understand the degree of reliability of the predicted output value.
(4) In the regression processing device described above, the processing (d) may involve processing of determining that the predicted output value is valid when the degree of similarity is equal to or greater than a predetermined threshold value and determining that the predicted output value is invalid when the degree of similarity is less than the threshold value.
With this device, when the degree of similarity is less than the threshold value, the degree of reliability of the predicted output value is low. Thus, it can be determined that the predicted output value obtained by the machine learning model is invalid.
(5) In the regression processing device described above, the specific layer may have a configuration in which a vector neuron arranged in a plane defined with two axes including a first axis and a second axis is arranged as a plurality of channels along a third axis being a direction different from the two axes. The feature spectrum may be any one of (i) a first type of a feature spectrum obtained by arranging a plurality of element values of an output vector of a vector neuron at one plane position in the specific layer, over the plurality of channels along the third axis, (ii) a second type of a feature spectrum obtained by multiplying each of the plurality of element values of the first type of the feature spectrum by an activation value corresponding to a vector length of the output vector, and (iii) a third type of a feature spectrum obtained by arranging the activation value at one plane position in the specific layer, over the plurality of channels along the third axis.
With this device, the feature spectrum can easily be obtained.
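For illustration, the three types may be sketched as follows, assuming the output of the specific layer at one plane position is available as an array of output vectors together with the corresponding activation values (the names and shapes are our assumptions):

```python
import numpy as np

def feature_spectrum(v, a, kind="first"):
    # v: array (C, D) -- output vectors of the C channels at one plane position (x, y)
    # a: array (C,)   -- activation values corresponding to the vector lengths
    if kind == "first":    # (i) element values arranged over the channels
        return v.reshape(-1)
    if kind == "second":   # (ii) element values multiplied by the activation value
        return (v * a[:, None]).reshape(-1)
    if kind == "third":    # (iii) the activation values themselves
        return a.copy()
    raise ValueError(kind)
```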
(6) According to a second aspect of the present disclosure, there is provided a method of executing regression processing of obtaining a predicted output value with respect to input data using a machine learning model including a vector neural network including a plurality of vector neuron layers. The method includes (a) obtaining the predicted output value with respect to the input data using the machine learning model, (b) reading out, from a memory, a known feature spectrum group obtained from an output of a specific layer of the machine learning model when a plurality of pieces of teaching data are input to the machine learning model, (c) calculating a degree of similarity relating to the predicted output value between the known feature spectrum group and a feature spectrum obtained from an output of the specific layer when the input data is input to the machine learning model, and (d) outputting the predicted output value using the degree of similarity.
(7) According to a third aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing a computer program for causing a processor to execute regression processing of obtaining a predicted output value with respect to input data using a machine learning model including a vector neural network including a plurality of vector neuron layers. The computer program causes the processor to (a) obtain the predicted output value with respect to the input data using the machine learning model, (b) read out, from a memory, a known feature spectrum group obtained from an output of a specific layer of the machine learning model when a plurality of pieces of teaching data are input to the machine learning model, (c) calculate a degree of similarity relating to the predicted output value between the known feature spectrum group and a feature spectrum obtained from an output of the specific layer when the input data is input to the machine learning model, and (d) output the predicted output value using the degree of similarity.
The present disclosure may be achieved in various forms other than the above-mentioned aspects. For example, the present disclosure can be implemented in forms including a computer program for achieving the functions of the regression processing device, and a non-transitory storage medium storing the computer program.