The present invention relates to an information processing device, an information processing method, and an information processing program.
An enormous amount of data can be collected from a plurality of types of sensors by using IoT and AI. For example, it is possible to collect a large number of various types of data such as image data of a road measured by an in-vehicle camera, position data of a vehicle measured by a GPS, soil data of a paddy field measured by a drone, and growth data of rice plants measured by a sensor.
Meanwhile, relationships between such data are complicated and difficult to grasp. With the progress of smart cities, autonomous driving technology, and the like, it is necessary to simultaneously analyze various types of data by the same analysis method and take countermeasures based on the analysis results. Therefore, a technique of performing cooperative learning and inference in consideration of the relationship between data is becoming important.
One type of conventional machine learning for performing data analysis is deep learning, in which layers of neurons are deeply multilayered. An example thereof is a convolutional neural network (CNN). The CNN is a deep learning model having convolutional layers that extract features of an image, pooling layers that reduce dimensions of the extracted features, and fully connected layers that classify the image. Learning is to adjust a degree of connection (weight) between neurons such that a value of an output result matches the input image, and inference is to determine the input image by using the CNN subjected to the weight adjustment.
However, in the CNN, a neuron itself serving as a learning unit lacks the ability to accurately express a feature element in data. In particular, because the pooling layers are used, a spatial positional relationship between a plurality of feature elements is lost. Thus, a relationship between features in data and inference results is unclear. That is, the interpretability of the CNN's learning with respect to data is insufficient, making the model difficult for humans to understand. Therefore, it is difficult to achieve human-in-the-loop (HITL), in which humans are partially involved in determination.
In view of this, Non Patent Literature 1 discloses a capsule network as a model capable of recognizing and recording feature elements in data. The feature elements in the data are input and output as scalars in the CNN, but, in the capsule network, the feature elements are input and output as vectors. Each axis of the vector stores a feature element in the data. In a capsule network that executes a task of "recognizing the presence of a line", for example, information regarding the thickness of the line is stored in the first axis, and the orientation of the line is stored in the second axis. The capsule refers to a state in which information of multidimensional feature elements in pixel units, object units, or the like is vectorized and held.
Unlike the CNN, in the capsule network, (1) a vector is used instead of a scalar, (2) processing of giving a spatial relationship to a feature element in data is added, (3) dynamic routing is used instead of backpropagation to weight an output vector, and (4) a squashing function is used instead of a sigmoid function as an activation function. Hereinafter, a processing procedure of the capsule network will be outlined.
Procedure 1. Image data is subjected to convolution processing to obtain a plurality of multidimensional vectors ui, which are referred to as primary capsules (see Non Patent Literature 2).
Procedure 2. The vectors ui of the primary capsules are multiplied by a weighting matrix Wij corresponding to the vectors ui to obtain vectors ûj|i (u with a hat symbol (^) above it) (see Non Patent Literature 2). For example, a weight indicating a spatial relationship between a hand and a human body is given. Further, a vector of each capsule for a foot or the like is also subjected to Procedure 1 and Procedure 2.
Procedure 3. Each vector ûj|i is multiplied by a weight cij, and the sum sj = Σi cij ûj|i is set as the input vector for the next layer (see Non Patent Literature 2). The weight cij is calculated by a method called dynamic routing at the time of learning the capsule network.
Procedure 4. In the next layer, a squashing function is applied to the input vector sj to obtain an output vector vj (see Non Patent Literature 2). The output vector vj is a vector of the capsule for the human body based on a spatial positional relationship between parts. The squashing function is an activation function that scales a length of the input vector sj within a range of 0 to 1 while maintaining an orientation thereof. The output vector vj is referred to as a digit capsule, and a layer thereof is referred to as a digit capsule layer.
Here, the weight cij is a degree of connection indicating how the vectors ui are connected in the whole and is calculated from cij = exp(bij)/Σk exp(bik) by using a softmax function. A calculation method for optimizing the value of bij at the time of learning is dynamic routing, and a situation of connection between the vectors ui in the whole can be dynamically changed. Dynamic routing is a method of calculating how important each capsule is when a high-level capsule in the next layer is generated from a lower-level capsule in a previous layer.
Here, the value of bij is updated by "bij ← bij + ûj|i · vj". The expression "ûj|i · vj" is referred to as agreement calculation to calculate a matching degree between a vector of a primary capsule in the previous layer and a vector of a digit capsule in the next layer.
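For illustration only, the routing loop of Procedures 3 and 4 together with the above update of bij can be sketched as follows; the capsule counts, dimensions, and three-iteration count are arbitrary assumptions for the sketch, not values from the present disclosure:

```python
import numpy as np

def squash(s):
    # Activation that scales the vector length into [0, 1) while keeping
    # the orientation (Procedure 4).
    norm2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + 1e-9)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: prediction vectors u^_j|i, shape (num_primary, num_digit, dim).
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                       # routing logits b_ij
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over j
        s = (c[:, :, None] * u_hat).sum(axis=0)       # s_j = sum_i c_ij u^_j|i
        v = squash(s)                                 # v_j = squash(s_j)
        b = b + np.einsum('ijd,jd->ij', u_hat, v)     # agreement u^_j|i . v_j
    return v

rng = np.random.RandomState(0)
u_hat = rng.randn(8, 3, 4)    # 8 primary capsules, 3 digit capsules, dim 4
v = dynamic_routing(u_hat)
```

Because the squashing bounds each output length below 1, the agreement term grows only for routes whose predictions align with the emerging digit capsule.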
However, the agreement calculation takes a lot of calculation resources and time, which is inefficient. In particular, the number of parameters of the weighting matrix Wij used for calculating ûj|i and vj is enormous. The number of parameters is two to three times that of the CNN, and thus the time required for the value of bij to converge is ten to thirty times that of the CNN. As a result, the output vector vj is calculated while a plurality of feature elements in the data are being intertwined, and thus the feature elements may not be appropriately separated from each other.
Therefore, Non Patent Literature 2 discloses a graph capsule network. In the graph capsule network, the calculation method of dynamic routing having a large amount of calculation is replaced with a calculation method of multihead attention-based graph pooling (see Non Patent Literature 2).
Processing procedures of the graph capsule network are different from those of the capsule network in Procedure 2 and Procedure 3. The multidimensional vectors ui obtained in Procedure 1 are transformed into respective nodes of a graph having K2 nodes with identity. Then, the transformed vector ui′ of each node is multiplied by a weighting matrix A indicating a weight (=attention) between adjacent nodes to obtain a graph corresponding to the vectors ui. Thereafter, a capsule group of the feature vectors ui′ included in the graph is dimensionally reduced to obtain an input vector sj for the next layer.
However, Non Patent Literature 2 merely discloses a technique of encapsulating image data having feature elements in a space. Thus, it is difficult to apply the technique to time-series data having feature elements on a time axis. Further, Non Patent Literature 2 does not disclose a mechanism for separating feature elements of time-series data from each other. Thus, an output vector is calculated while the feature elements in the time-series data are being intertwined.
The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a technique capable of improving classification accuracy of each feature element included in multidimensional time-series data having feature elements in a spacetime.
An information processing device according to an aspect of the present invention includes: an input unit that inputs a plurality of pieces of time-series data measured by a respective plurality of sensors at different positions; and a processing unit that divides the plurality of pieces of the time-series data into a plurality of pieces of partial time-series data at predetermined time intervals, generates a plurality of primary capsules each including a feature vector of each of the plurality of pieces of the partial time-series data regarding the plurality of pieces of the time-series data, performs graph modeling to generate a weighting matrix in which a connection relationship between the plurality of primary capsules is indicated by a weight corresponding to a distance between the plurality of sensors and the predetermined time interval, and performs graph Fourier transform on the feature vector of each of the plurality of primary capsules based on the weighting matrix.
An information processing device according to an aspect of the present invention includes: an input unit that inputs a plurality of pieces of time-series data measured by a respective plurality of sensors at different positions; and a processing unit that uses the plurality of pieces of the time-series data to separately generate a plurality of primary capsules each including a feature vector of each piece of the time-series data and a plurality of primary capsules each including a feature vector at each predetermined time interval.
An information processing method according to an aspect of the present invention is an information processing method performed in an information processing device, the information processing method including: a step of inputting a plurality of pieces of time-series data measured by a respective plurality of sensors at different positions; and a step of dividing the plurality of pieces of the time-series data into a plurality of pieces of partial time-series data at predetermined time intervals, generating a plurality of primary capsules each including a feature vector of each of the plurality of pieces of the partial time-series data regarding the plurality of pieces of the time-series data, performing graph modeling to generate a weighting matrix in which a connection relationship between the plurality of primary capsules is indicated by a weight corresponding to a distance between the plurality of sensors and the predetermined time interval, and performing graph Fourier transform on the feature vector of each of the plurality of primary capsules based on the weighting matrix.
An information processing method according to an aspect of the present invention is an information processing method performed in an information processing device, the information processing method including: a step of inputting a plurality of pieces of time-series data measured by a respective plurality of sensors at different positions; and a step of using the plurality of pieces of the time-series data to separately generate a plurality of primary capsules each including a feature vector of each piece of the time-series data and a plurality of primary capsules each including a feature vector at each predetermined time interval.
An information processing program according to an aspect of the present invention causes a computer to function as the above information processing device.
The present invention can provide a technique capable of improving classification accuracy of each feature element included in multidimensional time-series data having feature elements in a spacetime.
Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the drawings, the same parts are denoted by the same reference signs, and the description thereof is omitted.
In order to solve the above problems, the present invention discloses two embodiments.
A first embodiment discloses a method of, regarding multidimensional time-series data having feature elements in a spacetime, generating a capsule by simultaneously graphically modeling the feature elements in the spacetime. In particular, graph modeling is performed by using a weighting matrix based on both a spatial axis and a time axis in order to apply the present invention not only to feature elements on the spatial axis but also to feature elements on the time axis. In order to improve classification accuracy of each feature element included in the time-series data, graph Fourier transform for transforming a feature element in a spatiotemporal domain into a spectral domain is performed. Therefore, the present invention can also be applied to multidimensional time-series data having feature elements on the time axis, and the feature elements can be completely separated from each other.
A second embodiment discloses a method of, regarding multidimensional time-series data having feature elements in a spacetime, separately encapsulating feature elements on a spatial axis and on a time axis. The feature elements are separated into the feature elements on the spatial axis and the feature elements on the time axis and are then encapsulated. Thus, the present invention can also be applied to multidimensional time-series data having feature elements on the time axis, and feature elements can be completely separated from each other.
The input unit 11 has a function of inputting a plurality of pieces of time-series data measured by a respective plurality of sensors at different positions.
The processing unit 12 has a function of dividing the plurality of pieces of the time-series data into a plurality of pieces of partial time-series data at predetermined time intervals, generating a plurality of primary capsules each including a feature vector of each of the plurality of pieces of the partial time-series data regarding the plurality of pieces of the time-series data, performing graph modeling to generate a weighting matrix in which a connection relationship between the plurality of primary capsules is indicated by a weight corresponding to a distance between the plurality of sensors and the predetermined time interval, and performing graph Fourier transform on the feature vector of each of the plurality of primary capsules based on the weighting matrix.
The processing unit 12 has a function of generating a plurality of capsules in a spectral domain by classifying a plurality of signal values included in a respective plurality of bases obtained by the graph Fourier transform according to signal values at the same position in the bases and calculating attention of a classification class on the basis of a magnitude of the signal values included in the plurality of capsules.
The output unit 13 has a function of outputting an inference result of a task inferred on the basis of a calculation result of the attention of the classification class.
The storage unit 14 has a function of storing arbitrary data used for learning or inference. The storage unit 14 stores, for example, the plurality of pieces of the time-series data input to the information processing device 1, a learning model used for learning or inference, and various parameters of the learning model.
The input unit 11 inputs a group X {x1, x2, . . . , xm} (xi ∈ R^l) of m pieces of time-series data. Here, m denotes the total number of pieces of time-series data. R denotes a real number space. l denotes the time length of the time-series data xi. l is also the total number of timestamps (measurement times). The time-series data is, for example, time-series data of electrocardiographic values measured by a sensor 1, time-series data of electroencephalogram values measured by a sensor 2, . . . , and time-series data measured by a sensor m. The time-series data used in the first embodiment is preferably data sensitive to a spectral space.
Next, the processing unit 12 inputs the group X of m pieces of the time-series data to a gated recurrent unit (GRU) 21.
Next, the processing unit 12 changes the axis order of the feature vector group ϕslice ∈ R^(m×S×K) from "m-axis, S-axis, K-axis" to "K-axis, m-axis, S-axis" and encapsulates feature vectors u1 to uN (N = m×S), each along the K-axis.
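As an illustrative sketch of this axis reordering and encapsulation (the sizes m, S, and K below are arbitrary assumptions, and random values stand in for the GRU output):

```python
import numpy as np

m, S, K = 4, 5, 16                       # hypothetical: m sensors, S slices, K dims
phi_slice = np.random.RandomState(0).randn(m, S, K)  # stands in for the GRU features

# Reorder "m-axis, S-axis, K-axis" -> "K-axis, m-axis, S-axis", then flatten
# the (m, S) plane so that each of the N = m*S positions yields one K-dim capsule.
phi_perm = np.transpose(phi_slice, (2, 0, 1))        # shape (K, m, S)
primary = phi_perm.reshape(K, m * S).T               # shape (N, K), N = m*S
```

Each row of `primary` is then one primary capsule holding the K-dimensional feature vector of one (sensor, slice) position.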
Next, the processing unit 12 performs graph modeling. Graph modeling is processing of determining a weight between primary capsules. That is, graph modeling is processing of generating a weighting matrix to be applied to the primary capsule group Ωprimary{u1, u2, . . . , uN}.
In graph modeling, modeling into a graph G(V, E) is performed by setting each primary capsule as a node (N = |V| nodes in total) and the degree of connection (weight) between adjacent primary capsules as an edge E.
As a method of generating a weighting matrix, there are a method of manually setting a weight (attention) between nodes and a method of determining the weight by machine learning. In a case where the weight is manually set, the weighting matrix Ai,j ∈ R^(N×N) is calculated by Equation (1).
The expression "ti − tj" denotes the difference between adjacent timestamps in the time-axis direction between the nodes. The expression "si − sj" denotes the spatial distance between sensors in the spatial-axis direction between the nodes. In a case where the weight is manually set, a human sets in advance the difference between the adjacent timestamps and the spatial distance between the sensors. Regarding this point, Non Patent Literature 2 discloses that a weighting matrix is generated only on the basis of the spatial axis, as can be seen from Equation (1) of Non Patent Literature 2. Meanwhile, in the present embodiment, the weighting matrix based on both the spatial axis and the time axis is generated as in Equation (1) described above. Thus, the present invention can also be applied to time-series data having feature elements on the time axis.
Meanwhile, in a case where the weight is determined by machine learning, the weight of the weighting matrix is subjected to machine learning. The weight of each edge is arbitrarily set in advance, and the weight is automatically updated in the process of machine learning. As shown in Equation (2), the processing unit 12 prepares two learnable matrices Wkey and Wquery, multiplies each of the two matrices by the primary capsule group Ωprimary, and then applies the softmax function to calculate the weighting matrix Ai,j.
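Since the body of Equation (2) is not reproduced in this text, the sketch below assumes one plausible reading of it: the primary capsule group is projected by the two learnable matrices, the projections are scored pairwise, and each row is normalized with the softmax function. All sizes are illustrative.

```python
import numpy as np

def learned_weight_matrix(omega, w_key, w_query):
    # omega: primary capsule group, shape (N, K).
    # w_key, w_query: the two learnable matrices Wkey and Wquery, shape (K, d).
    key = omega @ w_key                       # (N, d) projections
    query = omega @ w_query                   # (N, d) projections
    scores = query @ key.T                    # (N, N) pairwise attention scores
    e = np.exp(scores - scores.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)   # each row sums to 1

rng = np.random.RandomState(0)
N, K, d = 6, 8, 4                             # illustrative sizes
A = learned_weight_matrix(rng.randn(N, K), rng.randn(K, d), rng.randn(K, d))
```

In training, Wkey and Wquery would be updated by backpropagation, so the edge weights between primary capsules are learned rather than set by hand.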
The above is the method of generating the weighting matrix Ai,j. At this point, the processing unit 12 calculates a diagonal matrix Di,j ∈ R^(N×N) of the weighting matrix Ai,j by calculating the sum of each column as shown in Equation (3).
The processing unit 12 replaces the feature vectors {u1, u2, . . . , uN} of the N primary capsules with signal values F {f1, f2, . . . , fN} of N nodes. This replacement is processing for handling the feature vectors of the primary capsules as the signal values of the nodes. A signal value fi has the same value as the feature vector ui on the K-axis.
Next, the processing unit 12 performs graph Fourier transform to transform the graph model from the spatiotemporal domain into the spectral domain.
First, the processing unit 12 calculates the graph Laplacian L ∈ R^(N×N) by subtracting the weighting matrix Ai,j from the diagonal matrix Di,j as shown in Equation (4).
Next, the processing unit 12 spectrally decomposes the graph Laplacian L to calculate a transposed matrix U^T of a unitary matrix U ∈ R^(N×Dtrans) as shown in Equation (5).
Thereafter, as shown in Equation (6), the processing unit 12 multiplies the signal values F of the N nodes by the transposed matrix U^T of the unitary matrix U to calculate signal values F̂ ∈ R^(Dtrans×K) in the spectral domain. Hereinafter, a symbol with a hat (^) above it denotes a value in the spectral domain.
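The chain of Equations (3) to (6) can be sketched as follows; the sketch assumes a symmetric weighting matrix and toy sizes, and the eigendecomposition of the Laplacian plays the role of the spectral decomposition of Equation (5):

```python
import numpy as np

def graph_fourier_transform(A, F):
    # A: symmetric weighting matrix, shape (N, N); F: node signals, shape (N, K).
    D = np.diag(A.sum(axis=1))       # Equation (3): diagonal (degree) matrix
    L = D - A                        # Equation (4): graph Laplacian
    _, U = np.linalg.eigh(L)         # Equation (5): spectral decomposition of L
    return U.T @ F                   # Equation (6): F^ = U^T F

rng = np.random.RandomState(1)
A = rng.rand(5, 5)
A = (A + A.T) / 2                    # toy symmetric weights between 5 nodes
F = rng.randn(5, 3)                  # 3-dim signal value per node
F_hat = graph_fourier_transform(A, F)
```

Because U is orthogonal for a symmetric Laplacian, the transform is invertible and preserves the total signal energy, which is what allows the feature elements to be examined per frequency basis without loss.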
The signal values F̂ subjected to the graph Fourier transform form a matrix of Dtrans bases u1 to uDtrans, each having K entries. That is, K feature vectors (hereinafter also referred to as "dimensions" in order to facilitate understanding) are output for each base u. The total number K is the same as the total number of signal values of each node before the transform (the total number of feature vectors on the K-axis). Note that there is no positional correspondence between the signal values before and after the graph Fourier transform.
As described above, in the present embodiment, the signal value F of each node in the spatiotemporal domain is subjected to the graph Fourier transform into the signal value F̂ in the spectral domain. Thus, feature elements of the signal values between the plurality of nodes can be reliably separated by the principle of the graph Fourier transform.
That is, in steps S104 and S105, graph modeling is performed to first give a weighted connection relationship between the nodes. However, the feature element of each node is supposed to differ from node to node, and thus the influence between the nodes is eliminated by performing the graph Fourier transform, that is, by transforming the spacetime into a spectral space so that the axis becomes neither the spatial axis nor the time axis but a frequency axis.
Next, the processing unit 12 performs filtering processing in order to extract only a target signal value from the K signal values F̂ included in each base u in the spectral domain.
In step S106, in some cases, it is unknown which dimension is effective for a final classification task among K dimensions included in each base u in the spectral domain. Therefore, an unimportant noise signal is eliminated by performing the filtering processing such that only a target dimension remains.
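Since the concrete form of the filtering is given only by Equation (7), which is not reproduced here, the sketch below shows one plausible realization under stated assumptions: the K dimensions are ranked by their average magnitude across bases, and all but the strongest ones are zeroed as noise.

```python
import numpy as np

def filter_dimensions(F_hat, keep=2):
    # F_hat: spectral-domain signal values, shape (Dtrans, K).
    # Assumption: importance of a dimension = its mean magnitude across bases.
    score = np.abs(F_hat).mean(axis=0)           # importance per dimension
    mask = np.zeros_like(score)
    mask[np.argsort(score)[-keep:]] = 1.0        # keep the `keep` strongest
    return F_hat * mask                          # zero out noise dimensions

F_hat = np.array([[3.0, 0.1, 2.0],
                  [2.5, 0.2, 1.5]])              # toy values, Dtrans=2, K=3
y_hat = filter_dimensions(F_hat, keep=2)
```

Here the middle dimension, having the smallest average magnitude, is eliminated, while the target dimensions pass through unchanged.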
Next, the processing unit 12 transposes the signal values ŷ ∈ R^(Dtrans×K) subjected to the filtering processing into signal values ŷ ∈ R^(K×Dtrans) as preprocessing for the graph attention calculation described later.
A plurality of dimensions included in each capsule are completely separated from each other by the graph Fourier transform already performed. An important capsule can be determined on the basis of the magnitude of the scalars of the dimensions included in the capsule, and this serves as the source for determining the attention of the graph.
Next, the processing unit 12 calculates the attention of the graph. Specifically, as shown in Equation (8), a graph attention Att is calculated by multiplying the capsule group Ω̂ in the spectral domain by a learnable parameter W. The parameter W is a matrix in the form of R^(Dtrans×M). M denotes the total number of classification tasks.
Thereafter, the processing unit 12 calculates an output vector v on the basis of the graph attention Att. Specifically, first, as shown in Equation (9), the capsule group Ω̂ is multiplied by the transposed matrix Att^T of the graph attention Att to calculate a classification result S {s1, s2, . . . , sM} (si ∈ R^(Dtrans)). The classification result S is a matrix in the form of M×Dtrans.
Thereafter, the output vector v is calculated by applying the squashing function shown in Equation (10) to each classification result si. A capsule of the output vector is referred to as a high-level capsule.
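The sequence of Equations (8) to (10) can be sketched as follows; the shapes follow the text above, while the concrete sizes and random values are illustrative assumptions:

```python
import numpy as np

def squash(s):
    # Equation (10): scale each vector's length into [0, 1) while keeping
    # its orientation.
    norm2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + 1e-9)

def high_level_capsules(omega_hat, W):
    # omega_hat: capsule group in the spectral domain, shape (K, Dtrans).
    # W: learnable parameter, shape (Dtrans, M).
    att = omega_hat @ W            # Equation (8): graph attention Att, (K, M)
    S = att.T @ omega_hat          # Equation (9): classification result S, (M, Dtrans)
    return squash(S)               # Equation (10): output vectors v, one per class

rng = np.random.RandomState(0)
K_dims, Dtrans, M = 6, 8, 3        # illustrative sizes
omega_hat = rng.randn(K_dims, Dtrans)
v = high_level_capsules(omega_hat, rng.randn(Dtrans, M))
```

Each row of the result is the high-level capsule for one classification task, and its length (below 1 by construction) can be read as the confidence for that task.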
Finally, the output unit 13 outputs the output vector v of the high-level capsule.
Summarizing the above, in the first embodiment, the GRU processing, the division processing, and the transform processing are applied to composite time-series data obtained from the plurality of sensors to generate primary capsules, which makes it possible to store a plurality of feature elements for each dimension and to implement a learning model capable of disentangling feature elements in the time-series data.
Further, in the first embodiment, the feature elements in the time-series data are separated from each other by simultaneously graphically modeling a relationship with a spacetime for the time-series data sensitive to the spectral space, applying the graph Fourier transform, and calculating attention of the graph in the spectral domain. This makes it possible to implement a learning process capable of effectively disentangling the feature elements.
According to the first embodiment, the information processing device 1 includes: the input unit 11 that inputs a plurality of pieces of time-series data measured by a respective plurality of sensors at different positions; and the processing unit 12 that divides the plurality of pieces of the time-series data into a plurality of pieces of partial time-series data at predetermined time intervals, generates a plurality of primary capsules each including a feature vector of each of the plurality of pieces of the partial time-series data regarding the plurality of pieces of the time-series data, performs graph modeling to generate a weighting matrix in which a connection relationship between the plurality of primary capsules is indicated by a weight corresponding to a distance between the plurality of sensors and the predetermined time interval, and performs graph Fourier transform on the feature vector of each of the plurality of primary capsules based on the weighting matrix. Thus, it is possible to apply the present invention also to multidimensional time-series data having feature elements on the time axis and also to completely separate the feature elements from each other. As a result, it is possible to provide a technique capable of improving classification accuracy of each feature element included in multidimensional time-series data having feature elements in a spacetime.
In the first embodiment, the GRU processing, the division processing, and the transform processing are performed on time-series data to generate primary capsules. However, this is merely an example of a method of generating primary capsules. Other types of processing may be performed as long as spatiotemporal feature elements can be separated. For example, convolution processing can be performed.
In the first embodiment, spatiotemporal feature elements are separated by the graph Fourier transform. This is suitable for time-series data that is difficult for humans to separate feature elements. Meanwhile, feature elements can be easily separated depending on the type of time-series data. Therefore, in order to easily apply the present invention to the latter time-series data, a method of separately encapsulating feature elements in time-series data on the spatial axis and on the time axis will be described in the second embodiment.
As in the first embodiment, the information processing device 1 includes the input unit 11, the processing unit 12, the output unit 13, and the storage unit 14.
The processing unit 12 has a function of using a plurality of pieces of time-series data to separately generate a plurality of primary capsules each including a feature vector of each piece of the time-series data and a plurality of primary capsules each including a feature vector at each predetermined time interval.
The processing unit 12 has a function of not only generating the two pluralities of primary capsules, but also generating a feature vector by extracting an entire feature included in the plurality of pieces of the time-series data for each partial region and generating primary capsules each including the feature vector at each predetermined time interval by using the plurality of pieces of the time-series data.
The processing unit 12 has a function of performing attention routing on each of the three pluralities of primary capsules to generate three digit capsule groups and inferring a task on the basis of the magnitude of the feature vectors included in the three digit capsule groups.
The input unit 11 inputs a group X {x1, x2, . . . , xc} (xi ∈ R^l) of c pieces of time-series data. Here, c denotes the total number of pieces of time-series data, and each piece is also referred to as a channel. R denotes a real number space. l denotes the time length of the time-series data xi. l is also the total number of timestamps (measurement times). The time-series data is, for example, position data of coordinates (x, y, z) of a hand, an elbow, or the like that changes with the lapse of time. The time-series data used in the second embodiment is preferably time-series data other than "data sensitive to the spectral space".
Steps S202 to S204 are first branch processing of acquiring an entire feature of the time-series data group X. In the present embodiment, convolution processing is performed to acquire the entire feature. Hereinafter, the processing will be described.
First, the processing unit 12 inputs the group X of c pieces of the time-series data to a first convolutional layer 31 and inputs the output from the first convolutional layer to a second convolutional layer 32.
The first convolutional layer 31 performs convolution processing on the group X of c pieces of the time-series data by using a filter having a kernel size k1 and the number of target channels c1. Therefore, the time-series data group X ∈ R^(c×l) is transformed into a time-series data group x1 ∈ R^(c1×(l−k1+1)).
The second convolutional layer 32 performs convolution processing on the time-series data group x1 ∈ R^(c1×(l−k1+1)) by using a filter having a kernel size k2 and the number of target channels c2. Therefore, the time-series data group x1 ∈ R^(c1×(l−k1+1)) is transformed into a time-series data group x2 ∈ R^(c2×(l−k1−k2+2)).
The first convolutional layer 31 and the second convolutional layer 32 perform, for example, various types of processing such as 1D batch normalization processing, application of a rectified linear unit (ReLU), dropout processing at a rate of 0.3, and application of a squeeze and excitation (SE) block.
Thereafter, the processing unit 12 inputs the output from the second convolutional layer 32 to a third convolutional layer 33. The third convolutional layer 33 performs convolution processing on the time-series data group x2 ∈ R^(c2×(l−k1−k2+2)) by using a filter having a kernel size k3 and the number of target channels c3. Therefore, the time-series data group x2 ∈ R^(c2×(l−k1−k2+2)) is transformed into a time-series data group x3 ∈ R^(c3×l1) (l1 = l−k1−k2−k3+3). Application of the SE block is not performed in the third convolutional layer.
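The length arithmetic of the three stacked convolutions can be checked with a short sketch; the concrete values of l, k1, k2, and k3 below are illustrative assumptions:

```python
def conv1d_out_len(l, k):
    # Valid 1D convolution with stride 1 shortens a length-l sequence
    # to length l - k + 1.
    return l - k + 1

l, k1, k2, k3 = 100, 5, 5, 3           # illustrative input length and kernel sizes
l1 = conv1d_out_len(conv1d_out_len(conv1d_out_len(l, k1), k2), k3)
# Chaining the three layers gives l1 = l - k1 - k2 - k3 + 3, as stated above.
```

Each layer subtracts (ki − 1) from the length, so three layers subtract (k1 + k2 + k3 − 3) in total.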
Next, as shown in Equation (11), the processing unit 12 calculates a learnable matrix W1 ∈ R^(c3×l1) for the time-series data group x3 to calculate a first primary capsule group Ω1primary {u1, u2, . . . , ul1} having a plurality of primary capsules of feature vectors ui for each timestamp. Thereafter, the processing unit 12 transforms Ω1primary into Ω1primary ∈ R^(1×c3×l1).
Next, as shown in Equation (12), the processing unit 12 calculates ϕ1digit ∈ R^(cls×c3×l1) by calculating routing attention A1 ∈ R^(cls×1×l1) for the first primary capsule group Ω1primary. Here, cls denotes the total number of classification classes.
Thereafter, the processing unit 12 calculates a first digit capsule group Ω1digit ∈ R^(cls×c3) by summing over the last dimension of ϕ1digit.
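The body of Equation (12) is not reproduced in this text; the sketch below assumes the routing attention is applied multiplicatively with broadcasting, which is consistent with the shapes stated above. All sizes are illustrative.

```python
import numpy as np

def attention_routing(primary, att):
    # primary: first primary capsule group, shape (1, c3, l1).
    # att: routing attention A1, shape (cls, 1, l1).
    phi = att * primary              # broadcast -> phi1digit, (cls, c3, l1)
    return phi.sum(axis=-1)          # sum over last (timestamp) dim -> (cls, c3)

cls, c3, l1 = 4, 6, 10               # illustrative sizes
rng = np.random.RandomState(2)
digit = attention_routing(rng.randn(1, c3, l1), rng.rand(cls, 1, l1))
```

The same pattern applies to the second and third branches, with the channel axis c or the timestamp axis l in place of l1.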
After step S201, steps S205 to S208 are performed in parallel with steps S202 to S204. Steps S205 to S208 are second branch processing of dividing the time-series data group X by the spatial axis to acquire a feature of each channel. In the present embodiment, the feature of each channel is acquired by using GRU as in the first embodiment. Hereinafter, the processing will be described.
First, the processing unit 12 divides the group X ∈ R^(c×l) of c pieces of the time-series data in the spatial-axis direction to acquire a group xset {x1, x2, . . . , xc} of c channels.
Next, the processing unit 12 inputs the time-series data of each channel included in the channel group xset to each GRU 34. Each GRU 34 shares the same parameters, inputs the time-series data of its channel, and outputs cgru feature vectors from one piece of the time-series data via g layers.
Thereafter, the processing unit 12 calculates ϕ2primary {ϕ′1, ϕ′2, . . . , ϕ′c} (ϕ2primary ∈ R^(cgru×c)) by using a hidden state ϕ′ ∈ R^(cgru) of each channel output from each GRU 34.
Next, as shown in Equation (13), the processing unit 12 calculates a learnable matrix W2 ∈ R^(cgru×c) for ϕ2primary to calculate a second primary capsule group Ω2primary {w1, w2, . . . , wc} having a plurality of primary capsules of feature vectors wi for each channel. The length of the signal value included in each capsule of the primary capsule group (spatial axis) is cgru.
Next, as shown in Equation (14), the processing unit 12 calculates ϕ2digit ∈Rcls×cgru×c by calculating routing attention A2 ∈Rcls×1×c for the second primary capsule group Ω2primary. Here, cls denotes the total number of classification classes.
Thereafter, the processing unit 12 calculates a second digit capsule group Ω2digit ∈Rcls×cgru by summing over the last dimension of ϕ2digit.
After step S201, steps S209 to S212 are performed in parallel with steps S202 to S208. Steps S209 to S212 are third branch processing of dividing the time-series data group X by the time axis to acquire a feature of each timestamp. In the present embodiment, the feature of each timestamp is acquired by using GRU as in the first embodiment. Hereinafter, the processing will be described.
First, the processing unit 12 inputs a group X ∈Rl of c pieces of the time-series data to a GRU 35. The GRU 35 receives the group X of c pieces of the time-series data and then outputs cgru feature vectors via the g layers.
Next, the processing unit 12 calculates ϕ3primary{ω1, ω2, . . . , ωl} (ϕ3primary ∈Rcgru×l) by using a hidden state ω ∈Rcgru of each timestamp output from the GRU 35.
Next, as shown in Equation (15), the processing unit 12 applies a learnable matrix W3 ∈Rcgru×l to ϕ3primary to calculate a third primary capsule group Ω3primary{v1, v2, . . . , vl} having a plurality of primary capsules of the feature vectors vl of each timestamp. The length of the signal value included in each capsule of the primary capsule group (time axis) illustrated in
Next, as shown in Equation (16), the processing unit 12 calculates ϕ3digit ∈Rcls×cgru×l by calculating routing attention A3 ∈Rcls×1×l for the third primary capsule group Ω3primary. Here, cls denotes the total number of classification classes.
Thereafter, the processing unit 12 calculates a third digit capsule group Ω3digit ∈Rcls×cgru by summing over the last dimension of ϕ3digit.
Next, the processing unit 12 applies batch normalization processing and the rectified linear unit to each of the first to third digit capsule groups Ωkdigit (k ∈{1, 2, 3}) and then calculates a norm Vk{vote1k, . . . , voteclsk}, which is a voting content of each classification class cls included in each of the digit capsule groups. The norm votejk is an index indicating the reliability of the j-th classification class included in the k-th digit capsule group.
Thereafter, as shown in Equation (17), the processing unit 12 multiplies the norms Vk (k ∈{1, 2, 3}) of the first to third digit capsule groups by a learnable attention weight set (a1, a2, a3) and sums the results, thereby calculating a final voting result.
That is, the correct answer of each digit capsule group can be determined from the length of each feature vector included in that digit capsule group. However, the correct answer may differ among the three digit capsule groups, and thus the correct answer is selected by voting on the basis of the weights in step S213. One voting method is, for example, weighting with three scalars as described above: the weight of a group considered to be correct is set large, and the weight of a group considered to be incorrect is set small. The weights may be set by a human or learned.
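The norm computation and weighted voting described above can be sketched as follows. Equation (17) itself is not reproduced in this excerpt, so the weighted sum of per-class norms is an assumption based on the surrounding text, and the capsule dimensions and the weight values are illustrative only.

```python
import numpy as np

cls = 4                                   # assumed number of classification classes
rng = np.random.default_rng(1)

# Stand-ins for the three digit capsule groups after batch normalization and ReLU:
# Ω1digit ∈ R^{cls×c3}, Ω2digit ∈ R^{cls×cgru}, Ω3digit ∈ R^{cls×cgru}
omega_digit = [rng.random((cls, d)) for d in (8, 12, 12)]

# Norm V^k: the length of each class's capsule, an index of that class's reliability
V = np.stack([np.linalg.norm(o, axis=1) for o in omega_digit])   # shape (3, cls)

# Learnable attention weights (a1, a2, a3); here fixed scalars for illustration
a = np.array([0.5, 0.3, 0.2])

vote = (a[:, None] * V).sum(axis=0)       # final voting result, one score per class
predicted_class = int(vote.argmax())      # inference result output by the output unit
```

A branch whose digit capsules are more reliable can thus dominate the vote through a larger weight, which matches the text's note that the weights may be hand-set or learned.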
Finally, the output unit 13 outputs the voting result as an inference result.
Summarizing the above, in the second embodiment, for time-series data other than "time-series data sensitive to the spectral space", a primary capsule group that is not separated by the spatial axis or the time axis, a primary capsule group separated by the spatial axis, and a primary capsule group separated by the time axis are generated. A routing operation is then applied to each of the primary capsule groups, and the three resulting digit capsule groups are integrated by a learnable voting mechanism. This makes it possible to improve classification accuracy and also to improve the interpretability of the learning process by branch analysis or the like.
According to the second embodiment, the information processing device 1 includes: the input unit 11 that inputs a plurality of pieces of time-series data measured by a respective plurality of sensors at different positions; and the processing unit 12 that uses the plurality of pieces of the time-series data to separately generate a plurality of primary capsules each including a feature vector of each piece of the time-series data and a plurality of primary capsules each including a feature vector at each predetermined time interval. This makes it possible to apply the present invention also to multidimensional time-series data having feature elements on the time axis and to completely separate the feature elements from each other. As a result, it is possible to provide a technique capable of improving classification accuracy of each feature element included in multidimensional time-series data having feature elements in a spacetime.
The present invention is not limited to the above embodiments. The present invention may be modified in various manners within the gist of the present invention. The first embodiment and the second embodiment may be combined.
The information processing devices 1 according to the first embodiment and second embodiment described above can be achieved by using, for example, a general-purpose computer system including a CPU 901, a memory 902, a storage 903, a communication device 904, an input device 905, and an output device 906 as shown in
The information processing device 1 may be implemented by one computer. The information processing device 1 may be implemented by a plurality of computers. The information processing device 1 may be a virtual machine that is implemented in a computer. The program for the information processing device 1 can be stored in a computer-readable recording medium such as an HDD, an SSD, a USB memory, a CD, or a DVD. The program for the information processing device 1 can also be distributed via a communication network.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/JP2021/032649 | 9/6/2021 | WO | |