The present disclosed technology relates to a signal identifier.
An object of signal identification according to the present disclosed technology is to predict a category of a signal, that is, to classify a signal into a class to which the signal belongs. The signal handled here includes a signal obtained by electrically converting image data.
It is widely known that machine learning is effective for a problem of classification, that is, a problem of predicting a category. It is also widely known that a neural network is used as a learning model to be machine-learned.
A variational autoencoder is known as one of the generation models using a neural network. In the technical field of machine learning, a learning device that learns a feature of input data, which is learning data, using a variational autoencoder has also been proposed. The variational autoencoder outputs an average and a variance of a latent variable z expressed by a multidimensional normal distribution. A learning device in which the learning accuracy of the average and the variance of the latent variable z in a variational autoencoder is improved has been disclosed (for example, Patent Literature 1).
Patent Literature 1: JP 2020-154561 A
Incidentally, a human can view a certain image, determine what an object shown in the image represents, and classify the image. Humans make this classification determination on the basis of words and concepts created by human beings. For example, a human associates the word “bird” with the concept “it has the body surface covered with specific feathers and has a beak and wings”. Furthermore, within the concepts developed by human beings, it is also possible, for example, to create a subordinate concept “sparrow” from a broader concept “bird”. The broader concept and the subordinate concept can be replaced with a large classification and a small classification in the classification problem.
Even if an object shown in an image is unknown, a human can make a prediction on the basis of a concept developed by human beings. For example, assume that there is a person who does not know “emu” but knows other birds. When the person looks at an image showing “emu”, the person can predict that it is a kind of bird because it has the body surface covered with specific feathers and has a beak and wings.
On the other hand, in the conventional learning model exemplified in Patent Literature 1, for signal data belonging to an unlearned class, it is possible to calculate the closest one among the learned classes as a candidate on the basis of a feature of an image such as color. However, the conventional learning model does not have the concepts that have been developed by human beings. Therefore, in the conventional technology, there is a fear that an unlearned image of “emu” is classified as “close to capybara”, which is not a bird, on the basis of an image feature such as a brown color, and that a classification undesirable for humans is thereby performed.
An object of a signal identifier according to the present disclosed technology is to solve the above problem and to perform prediction on signal data of an unlearned class in accordance with a concept developed by human beings.
A signal identifier according to the present disclosed technology includes an inference model that generates a latent variable in which a distribution for each class in a latent space is defined according to a class of classification, and a second latent variable in which a distribution for each large classification in the latent space is defined according to a large classification of a broader concept of the class.
Since the signal identifier according to the present disclosed technology has the above-described configuration, prediction can be performed on signal data of an unlearned class in accordance with a concept developed by human beings.
Embodiment 1
As illustrated in the drawings, the signal identifier 3 according to the present disclosed technology includes a learning unit 31 and an inference unit 36.
A signal data set (1) illustrated in the drawings is a set of learning data input to the learning unit 31, and includes signal data and teacher data (34) given as a label for each piece of signal data.
The teacher data (34) may be simple data allocated to each label in advance in an implementation manner, for example, a letter, a number, an alphabet, a symbol, or a combination thereof. For example, an integer of 1001 may be allocated in advance to the label of “Bird, Columbiformes, Columbidae”. In addition, in the case of labels related to living organisms, the integers allocated to the labels may follow an allocation method in accordance with the above-described concepts developed by human beings, such as 0 to 999 for mammals, 1000 to 1999 for birds, and 2000 to 2999 for fish. A label of a conceptually close class may be allocated a close integer. Further, the type of number allocated to a label is not limited to a one-dimensional number, and may be a multi-dimensional number such as (1001, B, . . . , 0).
In a preferred example of the teacher data (34) according to the present disclosed technology, a distance between one piece of teacher data (34) and another piece of teacher data (34) is defined, and the distance decreases when their concepts are close.
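For reference, the following Python fragment sketches one way to implement such teacher data under the allocation described above; the label names, the concrete integers, and the absolute-difference distance are illustrative assumptions and are not taken from the present disclosure.

# A minimal sketch of teacher-data allocation, assuming the integer ranges
# described above (0 to 999 for mammals, 1000 to 1999 for birds, and so on).
TEACHER_DATA = {
    "Mammal, Rodentia, Caviidae": 120,         # mammals: 0 to 999
    "Bird, Columbiformes, Columbidae": 1001,   # birds: 1000 to 1999
    "Bird, Casuariiformes, Dromaiidae": 1050,  # conceptually close to other birds
    "Fish, Cypriniformes, Cyprinidae": 2003,   # fish: 2000 to 2999
}

def teacher_distance(label_a: str, label_b: str) -> int:
    # Distance between two pieces of teacher data: small when the concepts
    # are close (same broader classification), large otherwise.
    return abs(TEACHER_DATA[label_a] - TEACHER_DATA[label_b])

# Two birds are closer to each other than a bird is to a mammal.
assert teacher_distance("Bird, Columbiformes, Columbidae",
                        "Bird, Casuariiformes, Dromaiidae") < teacher_distance(
                        "Bird, Casuariiformes, Dromaiidae",
                        "Mammal, Rodentia, Caviidae")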
The learning model generated through learning by the known signal learning unit 33 of the learning unit 31 is illustrated in the center of the drawing.
Input signal data (2) illustrated in the drawings is signal data to be identified, which is input to the inference unit 36 in the inference phase.
In addition, the identification result (4) illustrated in the drawings is the result of class identification that the inference unit 36 outputs for the input signal data (2).
The operation of the signal identifier 3 will become more clear by the following description divided into a learning phase and an inference phase.
The operation of the signal identifier 3 in the learning phase becomes clear by comparison with conventional machine learning.
The conventional machine learning is known to have been developed, with respect to the classification problem, from the viewpoint of how to draw a boundary for each class in a space. One example of a technology based on this viewpoint is the support vector machine. The support vector machine is designed to obtain a classification surface having a margin, and a non-linear classification surface such as a curved surface is also known. Here, the space is called a feature amount space or a latent space.
Conventional supervised machine learning considers, with respect to labeled input data, a space in which only the features of the input data are variables. Taking the above-described “emu” and “capybara” as an example, both images have the feature that their color is brown, and are thus plotted at close places in the feature amount space. For this reason, in the conventional technology, there is a fear that an unlearned image of “emu” is classified in a manner undesirable for humans, such as “close to capybara”, on the basis of an image feature such as a brown color.
The signal identifier 3 according to the present disclosed technology considers not only variables including features of the input data but also a variable based on the teacher data. Therefore, the present disclosed technology may consider a feature amount space including variables including features of the input data and a variable based on the teacher data. The variable based on the teacher data may be the type of number allocated to the label described above. Taking the above-described “emu” and “capybara” as an example, the variables including the features of the two pieces of input data have close values, but the variables based on the two pieces of teacher data do not have close values. Therefore, in the present disclosed technology, there is no fear that a classification undesirable for humans, such as “close to capybara” for an unlearned image of “emu”, would occur.
In the present disclosed technology, as described above, the dimension of the feature amount space may be obtained by adding the dimension of the variable including the features of the input data and the dimension of the variable based on the teacher data. Furthermore, in the present disclosed technology, a coordinate transformation may be performed to reflect the information of the teacher data while setting the dimension of the feature amount space as the dimension of the variable including the features of the input data.
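For reference, the two options described above may be sketched in Python as follows; the use of NumPy, the function names, and the simple translation used for the coordinate transformation are illustrative assumptions, not the implementation of the present disclosure.

import numpy as np

def augment_by_dimension_addition(z_feature: np.ndarray,
                                  t_variable: np.ndarray) -> np.ndarray:
    # Option (a): the dimension of the feature amount space is the sum of the
    # dimension of the feature variable and that of the teacher-data variable.
    return np.concatenate([z_feature, t_variable])

def augment_by_coordinate_transformation(z_feature: np.ndarray,
                                         class_offset: np.ndarray) -> np.ndarray:
    # Option (b): reflect the information of the teacher data by a coordinate
    # transformation (here, a simple class-dependent translation) while keeping
    # the dimension of the feature amount space unchanged.
    return z_feature + class_offset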
Such a structure, which has continuity not only with respect to continuous change of the input data in the feature amount space or the latent space but also with respect to continuous change of the teacher data, is referred to as a “manifold structure” in the present disclosed technology. A method for implementing a manifold structure in a space without changing its dimension becomes more clear by the following description. The expression “continuous change” herein may be paraphrased as “minute change” or “located in the vicinity”.
The difference between the conventional technology and the present disclosed technology also appears in the loss function used in the learning phase. The loss function is also referred to as a cost function or an evaluation function.
The learning model illustrated in the drawing includes an inference model, an identification model, and a generation model. In the inference model, the signal data (x) is input, and the latent variable z and the second latent variable are generated from the features of the signal data (x).
The latent variable z illustrated in the drawing follows a Gaussian distribution whose average is μ and whose variance is σ², and a distribution is defined for each class in the latent space.
The inference model may be, for example, a neural network or another mathematical model.
In the identification model in the drawing, the latent variable z is input, and a class identification result is output.
The identification model may be, for example, a neural network or another mathematical model.
In the generation model in the drawing, the latent variable z is input, and restored signal data (x̂) is output.
The generation model may be, for example, a neural network or another mathematical model.
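For reference, the following Python (PyTorch) fragment sketches the three models as small neural networks: an inference model (encoder) outputting the average μ and the log-variance of the latent variable z, an identification model outputting a class identification result from z, and a generation model (decoder) restoring x̂ from z. The layer sizes, the fully connected architecture, and the reparameterized sampling are illustrative assumptions, since the present disclosure does not fix an architecture.

import torch
import torch.nn as nn

class InferenceModel(nn.Module):
    # Encoder: outputs the average and the log-variance of the latent variable z.
    def __init__(self, in_dim=784, z_dim=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, z_dim)       # average of z
        self.logvar = nn.Linear(128, z_dim)   # logarithm of the variance of z

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

class IdentificationModel(nn.Module):
    # Outputs a class identification result from the latent variable z.
    def __init__(self, z_dim=16, n_classes=10):
        super().__init__()
        self.head = nn.Linear(z_dim, n_classes)

    def forward(self, z):
        return self.head(z)  # logits; a softmax yields class probabilities

class GenerationModel(nn.Module):
    # Decoder: restores the signal data x-hat from the latent variable z.
    def __init__(self, z_dim=16, out_dim=784):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(),
                                  nn.Linear(128, out_dim))

    def forward(self, z):
        return self.body(z)

def sample_z(mu, logvar):
    # Reparameterization: sample z from the Gaussian defined by mu and the variance.
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)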
The inference model, the identification model, and the generation model change in the learning process so as to achieve the purpose of learning. The above-described loss function is obtained by quantifying the purpose of learning. The varying portions of the inference model, the identification model, and the generation model are referred to as weight parameters or simply parameters.
The learning device according to the conventional technology includes, in its loss function, a term related to a “reconfiguration error” expressed by the following mathematical expression. The reconfiguration error is a difference between the signal data (x) and the restored signal data (x̂).
L_r := ∥x − x̂∥_1   (1)
Although Expression (1) is defined by the 1-norm, the term related to the reconfiguration error is not limited thereto. The term related to the reconfiguration error may be defined by another norm such as the 2-norm, or may be defined by the square of the 2-norm that can be used in the least squares method.
The loss function used by the learning unit 31 according to the present disclosed technology includes a term related to “identification error” in addition to the reconfiguration error. The identification error is a difference between the teacher data (t) and the class identification result. The term related to the identification error in the loss function is expressed by, for example, the following mathematical expression.
L_c := −Σ_k t_k log(ŷ_k)   (2)

Here, t_k and ŷ_k represent the k-th elements of the teacher data (t) and the class identification result, respectively. Expression (2) is defined as a general expression using the cross entropy as an error function, but is not limited thereto.
The loss function used by the learning unit 31 more preferably further includes two terms related to the KL divergence. The two terms related to the KL divergence are expressed, for example, by the following mathematical expressions.

L_KL := D_KL[N(μ, σ²) ∥ N(m, I)]   (3)

L_KLM := D_KL[N(μ_H, σ_H²) ∥ N(0, I)]   (4)
The KL divergence is a measure of how similar two probability distributions are. D_KL[·∥·] in Expression (3) and Expression (4) represents a function for obtaining the KL divergence. Further, I in Expression (3) and Expression (4) represents an identity matrix. Expression (3) is the KL divergence between a Gaussian distribution having an average of μ and a variance of σ² and a Gaussian distribution having an average of m and a variance of I. Expression (4) is the KL divergence between a Gaussian distribution having an average of μ_H and a variance of σ_H² and a normal distribution having an average of 0 and a variance of I. The role of these two KL divergences will become more clear by the following.
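For reference, when the variances σ² and σ_H² are diagonal, the KL divergence of Expression (3) has the standard closed form below; this is a well-known property of Gaussian distributions and not a formula taken from Patent Literature 1. Expression (4) is the special case with m = 0 and (μ_H, σ_H²) in place of (μ, σ²).

D_KL[N(μ, σ²) ∥ N(m, I)] = (1/2) Σ_j ( σ_j² + (μ_j − m_j)² − 1 − log σ_j² )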
The signal identifier 3 according to Embodiment 1 may use, as the loss function used for learning by the learning unit 31, a loss function expressed by the following mathematical expression.

L := L_r + α·L_c + β·L_KL + γ·L_KLM   (5)
Here, α, β, and γ are weights. The learning of the learning unit 31 is performed so as to minimize the loss function expressed by Expression (5). For updating the parameters of the inference model, the identification model, and the generation model, for example, an optimization method such as the stochastic gradient descent method may be used. Each of the learned inference model, identification model, and generation model is represented as a learned model (35) in the drawing.
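As a minimal Python (PyTorch) sketch of Expression (5), assuming diagonal Gaussians and the correspondence of α, β, and γ to the terms as read above, the four terms may be combined as follows; the function signature and the reduction by summation are illustrative assumptions.

import torch
import torch.nn.functional as F

def loss_function(x, x_hat, logits, t, mu, logvar, m, mu_h, logvar_h,
                  alpha=1.0, beta=1.0, gamma=1.0):
    # Expression (5): L = L_r + alpha * L_c + beta * L_KL + gamma * L_KLM.
    # (1) reconfiguration error; the 1-norm is used here, another norm may be used.
    l_r = torch.sum(torch.abs(x - x_hat))
    # (2) identification error: cross entropy between the teacher data (t)
    # and the class identification result.
    l_c = F.cross_entropy(logits, t)
    # (3) KL[N(mu, sigma^2) || N(m, I)], assuming diagonal covariance.
    l_kl = 0.5 * torch.sum(logvar.exp() + (mu - m) ** 2 - 1.0 - logvar)
    # (4) KL[N(mu_H, sigma_H^2) || N(0, I)] in the second latent space.
    l_klm = 0.5 * torch.sum(logvar_h.exp() + mu_h ** 2 - 1.0 - logvar_h)
    return l_r + alpha * l_c + beta * l_kl + gamma * l_klm

The parameters of the inference model, the identification model, and the generation model may then be updated so as to minimize this loss by, for example, torch.optim.SGD or torch.optim.Adam.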
The effect of the term L_c illustrated in Expression (2) is to update the identification model so that the signal identifier 3 outputs a correct class identification result.
In addition, the effect of the term L_KLM illustrated in Expression (4) is that the plots of a plurality of classes having the same large classification of the broader concept form one Gaussian distribution in the second latent space. In other words, classes having the same large classification of the broader concept are close in distance in the second latent space. In the case of different large classifications of the broader concept, the distance in the second latent space is long even if the features of the images are similar.
By including the term L_c and the term L_KLM in the loss function, the learning unit 31 is trained to extract the manifold structure of the entire signal data set.
The effect of the term L_r illustrated in Expression (1) is to update the generation model so that the generation model correctly restores the signal data (x).
In addition, the effect of the term L_KL shown in Expression (3) is to cause the plots to form a Gaussian distribution for each class in the latent space.
In the present disclosed technology, since the center m of each class has the manifold structure of the entire data set, the positional relationship of the Gaussian distributions of the classes can take over the manifold structure of the second latent space.
To summarize the above, it can be said that the latent space is in units of individual pieces of signal data, as in the conventional technology, whereas the second latent space is in units of classes viewed macroscopically.
In the example illustrated in the drawing, the result of learning according to the conventional technology and the result of learning according to the present disclosed technology are compared, with “dog” being an unlearned class.
In the result of learning according to the conventional technology, there is no regularity in the distribution of the learned classes, and large classification according to the broader concepts of “animal” and “machine” is not performed.
In contrast, in the result of learning according to the present disclosed technology, large classification according to the broader concepts of “animal” and “machine” is performed, and the distribution of “dog”, which is an unlearned class, appears at a position close to the distribution of “cat”, which is also an animal.
The operation of the signal identifier 3 in the inference phase will become more clear by the following description.
In the inference phase, the inference unit 36 uses the learned model (35) learned in the learning phase.
In the learned model (35), a Gaussian distribution is defined in the latent space for each learned class.
The learned model (35) and the input signal data (2) are input to a signal identification unit 37 of the inference unit 36. The signal identification unit 37 plots the latent variable z of the input signal data (2) in the latent space and calculates a correlation with each Gaussian distribution of the learned classes defined by the learned model (35).
Incidentally, the Gaussian distribution is also referred to as the normal distribution and is a type of probability distribution. Abnormality detection is known as one of the techniques using the normal distribution. Furthermore, as a method for measuring the degree of deviation of a certain sample using the measurement result of the normal distribution, a method using the Mahalanobis distance is known.
It is conceivable that the inference unit 36 according to the present disclosed technology also calculates the identification result (4) of the input signal data (2) using the Mahalanobis distance.
D_M(z_x, p_k) = ∥(z_x − μ_k,Train)^T (Σ_k,Train)^−1 (z_x − μ_k,Train)∥_2   (6)
Here, k represents a serial number of the learned class, and the subscript “Train” represents that learning has been completed. In addition, the superscript T represents transposition. In addition, z_x in Expression (6) represents the latent variable z of the input signal data (2), p_k represents the Gaussian distribution of the learned class k, and μ_k,Train and Σ_k,Train represent the average and the covariance of that distribution, respectively.
On the basis of the Mahalanobis distance calculated by Expression (6), the inference unit 36 outputs the identification result (4), for example, the learned class that minimizes the Mahalanobis distance, as expressed by the following expression.

k̂ := argmin_k D_M(z_x, p_k)   (7)
The signal identification unit 37 of the inference unit 36 may determine an equal probability curve representing an n % section in the distribution for each class as a boundary for recognizing that signal data belongs to the class. That is, if z_x is inside the equal probability curve of a certain class, the signal identification unit 37 may determine, as the identification result, that z_x is likely to belong to that class. In addition, if z_x is not inside the equal probability curve of any class, the signal identification unit 37 may determine, as the identification result, that z_x is likely to belong to an unlearned class. In a case where z_x is not inside the equal probability curve of any class, the signal identification unit 37 may output information of the closest class from the information of the distribution of the class having the closest Mahalanobis distance, or may output the large classification that is the broader concept.
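For reference, this decision rule may be sketched in Python as follows. The use of a chi-squared quantile to approximate the n % equal probability curve relies on the standard fact that the squared Mahalanobis distance of a Gaussian sample follows a chi-squared distribution; the function name and the data layout are illustrative assumptions.

import numpy as np
from scipy.stats import chi2

def identify(z_x, class_stats, n_percent=95.0):
    # class_stats maps each learned class k to (mu_k_train, cov_k_train),
    # the average and covariance of its Gaussian distribution in the latent space.
    best_class, best_d2 = None, np.inf
    for k, (mu_k, cov_k) in class_stats.items():
        diff = z_x - mu_k
        # Squared Mahalanobis distance, corresponding to Expression (6).
        d2 = float(diff @ np.linalg.inv(cov_k) @ diff)
        if d2 < best_d2:
            best_class, best_d2 = k, d2
    # The n % equal probability curve corresponds to a chi-squared quantile
    # of the squared Mahalanobis distance.
    threshold = chi2.ppf(n_percent / 100.0, df=z_x.shape[0])
    if best_d2 <= threshold:
        return best_class  # inside the curve: likely belongs to this learned class
    return None            # outside every curve: likely an unlearned class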
As described above, since the signal identifier 3 according to Embodiment 1 has the above-described configuration and functions, prediction can be performed on signal data of an unlearned class in accordance with the concept developed by human beings.
The signal identifier 3 according to the present disclosed technology can be used as a device that performs signal identification of a radio wave signal acquired by a radar, identification of an image acquired by a camera, and other signal identification, and thus has industrial applicability.
3: signal identifier, 31: learning unit, 33: known signal learning unit, 36: inference unit, 37: signal identification unit, 50: processor, 51: memory, 52: signal input interface, 53: signal processing processor, 54: display interface
This application is a Continuation of PCT International Application No. PCT/JP2021/008581 filed on Mar. 5, 2021, which is hereby expressly incorporated by reference into the present application.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/JP2021/008581 | Mar 2021 | US
Child | 18212501 | | US