The embodiments discussed herein are related to a non-transitory recording medium storing a determination control program, determination control device, and determination control method.
Hitherto a probability distribution of normal data is learnt by unsupervised training, and abnormal data is detected by comparing a probability distribution of determination target data against the normal data probability distribution.
For example, technology has been proposed in which a probability distribution in latent space proportional to a probability distribution in real space is captured by an autoencoder compatible with rate-distortion theory that minimizes latent variable entropy, and abnormal data is detected from a difference from the latent space probability distribution. For example, related arts are disclosed in Rate-Distortion Optimization Guided Autoencoder for Isometric Embedding in Euclidean Latent Space (ICML 2020) and "Fujitsu Develops World's First AI technology to Accurately Capture Characteristics of High-Dimensional Data Without Labeled Training Data", [online], Jul. 13, 2020 [search date Sep. 13, 2020], Internet <URL: https://www.fujitsu.com/global/about/resources/news/press-releases/2020/0713-01.html>.
According to an aspect of the embodiments, a non-transitory recording medium is stored with a determination control program that causes a computer to execute a process. The process includes estimating, as a probability distribution, a low-dimensional feature value obtained by encrypting input data, the low-dimensional feature value having a lower dimensionality than the input data, generating output data by decrypting a feature value resulting from adding noise to the low-dimensional feature value, and adjusting respective parameters of the encrypting, the estimating, and the decrypting based on a cost including an error between the input data and the output data and including an entropy of the probability distribution, wherein, in a determination as to whether or not target input data is normal, a determination standard for the determination is controlled based on information obtained from another probability distribution estimated by encrypting the target input data using the parameters after the adjusting.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Explanation follows regarding an example of an exemplary embodiment according to technology disclosed herein, with reference to the drawings.
First, prior to explaining details about each exemplary embodiment, explanation follows regarding issues in cases in which normal/abnormal determination uses a probability distribution exhibited by a low-dimensional feature extracted from input data, and in cases in which features in input data exhibit various probability distributions.
Explanation follows regarding an example of a case in which medical images imaging an organ of a human body or the like serve as the input data. Examples of medical images serving as the input data are schematically illustrated at a lower part of
However, as illustrated at the lower part of
Thus in order to address this, control is performed in each of the following exemplary embodiments so as to enable determination of normal or abnormal with good accuracy even in cases in which a probability distribution exhibited by a low-dimensional feature extracted from the input data exhibits various probability distributions.
A determination control device 10 according to a first exemplary embodiment includes, from a functional perspective, an autoencoder 20, an estimation section 12, an adjustment section 14, and a determination section 16, as illustrated in
First, explanation follows regarding functional sections that function during training, with reference to
The autoencoder 20 includes an encryption section 22, a noise generation section 24, an adding section 26, and a decryption section 28, as illustrated in
The encryption section 22 encrypts multi-dimensional input data so as to extract a low-dimensional feature value z having a lower dimensionality than the input data. More specifically, the encryption section 22 extracts the low-dimensional feature value z from input data x using an encryption function fθ(x) including a parameter θ. For example, the encryption section 22 is able to apply a convolutional neural network (CNN) algorithm as the encryption function fθ(x). The encryption section 22 outputs the extracted low-dimensional feature value z to the adding section 26.
The noise generation section 24 generates noise ε that is a random number based on a distribution having the same dimensionality as the low-dimensional feature value z, having no inter-correlation between dimensions, and having a mean of 0. The noise generation section 24 outputs the generated noise ε to the adding section 26.
The adding section 26 generates a low-dimensional feature value ẑ (denoted by a hat "^" above "z" in the drawings) resulting from adding the noise ε input from the noise generation section 24 to the low-dimensional feature value z input from the encryption section 22, and outputs the low-dimensional feature value ẑ to the decryption section 28.
The decryption section 28 generates output data x̂ (denoted by a hat "^" above "x" in the drawings) having the same dimensionality as input data x by decrypting the low-dimensional feature value ẑ input from the adding section 26. More specifically, the decryption section 28 generates the output data x̂ from the low-dimensional feature value ẑ by using a decryption function gφ(ẑ) including a parameter φ. For example, the decryption section 28 may apply a transposed CNN algorithm as the decryption function gφ(ẑ).
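To make the flow of the autoencoder 20 concrete, the following is a minimal sketch of the encode, add-noise, and decode pipeline, written in PyTorch under assumed layer sizes and noise scale; it illustrates the structure described above and is not the implementation used in the embodiments.

```python
# Minimal sketch of autoencoder 20: encryption section 22 (CNN encoder),
# noise generation/adding sections 24 and 26, decryption section 28
# (transposed CNN). All channel counts and the noise scale are assumptions.
import torch
import torch.nn as nn

class NoisyAutoencoder(nn.Module):
    def __init__(self, in_channels=1, latent_channels=8, noise_sigma=1.0):
        super().__init__()
        # f_theta(x): extracts the low-dimensional feature value z
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, latent_channels, kernel_size=4, stride=2, padding=1),
        )
        # g_phi(z_hat): generates output data x_hat with the input dimensionality
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 16, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, in_channels, kernel_size=4, stride=2, padding=1),
        )
        self.noise_sigma = noise_sigma

    def forward(self, x):
        z = self.encoder(x)                           # low-dimensional feature value z
        eps = torch.randn_like(z) * self.noise_sigma  # zero-mean, uncorrelated noise
        z_hat = z + eps                               # adding section 26
        x_hat = self.decoder(z_hat)                   # output data x_hat
        return x_hat, z
```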
The estimation section 12 acquires the low-dimensional feature value z extracted by the encryption section 22, and estimates the low-dimensional feature value z as a probability distribution. More specifically, the estimation section 12 estimates a probability distribution Pψ(z) including parameter ψ using a probability distribution mixture model configured from plural distributions. The present exemplary embodiment will now be described for a case in which the probability distribution model is a Gaussian mixture model (GMM). In this case the estimation section 12 estimates a probability distribution Pψ(z) by calculating parameters π, Σ, μ of the following Equation (1) using a maximum likelihood estimation method or the like.
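Equation (1) itself is not reproduced in this text; given the parameters π, Σ, μ named above and the definitions that follow, it presumably takes the standard K-component Gaussian mixture form:

Pψ(z) = Σ_{k=1}^{K} πk·N(z; μk, Σk)   (1)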
In Equation (1) K is a number of the normal distributions contained in the GMM, μk is a mean vector of the kth normal distribution, Σk is a covariance matrix of the kth normal distribution, and πk is a weight (mixing coefficient) of the kth normal distribution, wherein the sum of all πk is 1. Moreover, the estimation section 12 computes an entropy R = −log(Pψ(z)) of the probability distribution Pψ(z).
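As a rough illustration of this estimation, the GMM parameters can be fitted by maximum likelihood and the entropy R evaluated with scikit-learn as sketched below; the component count and feature dimensionality are assumptions, and the random array stands in for actual encoder outputs.

```python
# Sketch of the estimation section 12: fit a GMM P_psi(z) to low-dimensional
# feature values z by maximum likelihood (EM), then compute R = -log P_psi(z).
import numpy as np
from sklearn.mixture import GaussianMixture

z_train = np.random.randn(1000, 8)        # stand-in for encoder outputs z
gmm = GaussianMixture(n_components=3, covariance_type="full")
gmm.fit(z_train)                          # estimates pi_k, mu_k, Sigma_k

z = z_train[:1]                           # a single feature value
R = -gmm.score_samples(z)                 # entropy R = -log P_psi(z)
```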
The adjustment section 14 adjusts each of the respective parameters θ, φ, ψ of the encryption section 22, the decryption section 28, and the estimation section 12 based on a training cost including an error between input data x and output data x̂ corresponding to this input data and including the entropy R computed by the estimation section 12. For example, as expressed by the following Equation (2), the adjustment section 14 repeatedly performs processing to generate the output data x̂ from the input data x while updating the parameters θ, φ, ψ so as to minimize a training cost L1 expressed by a weighted sum of the error between x and x̂, and the entropy R. The parameters of the autoencoder 20 and the estimation section 12 are trained thereby.
L1 = E_{x˜p(x), ε˜N(0, σ²)}[R + λ·D]   (2)
Note that in Equation (2), λ is a weighting coefficient and D is an error between x and x̂, for example D = (x − x̂)².
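A single parameter update implementing Equation (2) might look as follows; this is a hedged sketch in which `model` is an autoencoder like the one sketched earlier and `neg_log_likelihood` is an assumed differentiable stand-in for the entropy term −log Pψ(z) (in practice the GMM parameter ψ is updated jointly with θ and φ).

```python
# One training step minimizing L1 = E[R + lambda * D] (Equation (2)).
import torch

def training_step(model, neg_log_likelihood, optimizer, x, lam=1.0):
    x_hat, z = model(x)                   # forward pass with added noise
    R = neg_log_likelihood(z).mean()      # entropy term R = -log P_psi(z)
    D = ((x - x_hat) ** 2).mean()         # reconstruction error D
    loss = R + lam * D                    # training cost L1
    optimizer.zero_grad()
    loss.backward()                       # gradients w.r.t. theta, phi, psi
    optimizer.step()
    return loss.item()
```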
Next, description follows regarding functional sections that function during determination, with reference to
The encryption section 22 extracts the low-dimensional feature value z from the input data x by encrypting the input data x based on an encryption function fθ(x) set with the parameter θ after being adjusted by the adjustment section 14.
The estimation section 12 acquires the low-dimensional feature value z extracted by the encryption section 22, and estimates the probability distribution Pψ(z) of the low-dimensional feature value z by using the GMM set with the parameter ψ after being adjusted by the adjustment section 14. Moreover, the estimation section 12 computes the entropy R = −log(Pψ(z)) of the probability distribution Pψ(z) similarly to during training. Furthermore, the estimation section 12 also computes a membership coefficient γ indicating a probability that the low-dimensional feature value z belongs to each of the plural normal distributions configuring the GMM. In cases in which the GMM is configured from K normal distributions, fπ(πk) = γk, which can be computed from the weights πk of the normal distributions included in Equation (1), is employed to express the membership coefficient γ as a K-dimensional vector γ = (γ1, γ2, . . . , γk, . . . , γK). The membership coefficient γ is accordingly computed in the process of estimating the probability distribution Pψ(z).
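Continuing the scikit-learn sketch above, the membership coefficient γ corresponds to the component responsibilities that the fitted GMM assigns to a feature value, as in this hedged snippet:

```python
# gamma: posterior probability of each of the K components for the value z.
gamma = gmm.predict_proba(z)              # shape (1, K); each row sums to 1
cluster_id = int(gamma.argmax())          # cluster that z belongs to
```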
The determination section 16 controls, based on information obtained from the probability distribution Pψ(z) estimated employing the adjusted parameters θ, φ, ψ, a determination standard for determining whether or not the determination target input data is normal. More specifically, the determination section 16 employs the membership coefficient γ computed by the estimation section 12 as the information obtained from the probability distribution Pψ(z), and identifies cluster information indicating which cluster the low-dimensional feature value z belongs to from among plural clusters equivalent to the plural normal distributions configuring the GMM.
By training a probability distribution model configured from plural distributions, such as a GMM, as the probability distribution model, the parameter ψ of the GMM is adjusted such that the plural normal distributions corresponding to a trend in a broad feature exhibited by the low-dimensional feature value z are contained in the probability distribution model. For example, in cases in which the input data is medical images such as illustrated in
Then, from among determination standards pre-determined for each respective cluster, the determination section 16 sets the determination standard corresponding to the identified cluster information, namely corresponding to the cluster the low-dimensional feature value z belongs to. Note that the determination standard for each respective cluster can be determined in advance experimentally. For example, the entropy computed during training for each respective cluster the low-dimensional feature value z belongs to may be employed as the determination standard for each respective cluster.
For the determination target input data, the determination section 16 determines whether the input data is normal or abnormal by comparing the entropy computed by the estimation section 12 against the determination standard set according to the cluster information, and outputs a result of the determination.
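Put together, the determination step reduces to a per-cluster threshold comparison; the sketch below uses placeholder threshold values purely for illustration.

```python
# Sketch of the determination section 16: compare the entropy R against the
# determination standard of the cluster identified via gamma.
thresholds = {0: 5.0, 1: 7.5, 2: 6.2}    # per-cluster standards (placeholders)

def is_abnormal(R: float, cluster_id: int) -> bool:
    return R > thresholds[cluster_id]    # above the standard -> abnormal

result = "abnormal" if is_abnormal(float(R[0]), cluster_id) else "normal"
```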
The determination control device 10 may, for example, be implemented by a computer 40 as illustrated in
The storage section 43 may be implemented by, for example, a hard disk drive (HDD), solid state drive (SSD), or flash memory. A determination control program 50 to cause the computer 40 to function as the determination control device 10 by executing training processing and determination processing, described later, is stored on the storage section 43 serving as a storage medium. The determination control program 50 includes an autoencoder process 60, an estimation process 52, an adjustment process 54, and a determination process 56.
The CPU 41 reads the determination control program 50 from the storage section 43, expands the determination control program 50 into the memory 42, and sequentially executes the processes of the determination control program 50. The CPU 41 operates as the autoencoder 20 illustrated in
The functions implemented by the determination control program 50 may be implemented by, for example, a semiconductor integrated circuit, and more particularly by an application specific integrated circuit (ASIC).
Next, description follows regarding operation of the determination control device 10 according to the first exemplary embodiment. When adjusting the parameters of the autoencoder 20 and the estimation section 12, training input data x is input to the determination control device 10, and the training processing illustrated in
First the training processing will be described in detail, with reference to
At step S12, the encryption section 22 extracts the low-dimensional feature value z from the input data x using the encryption function fθ(x) including the parameter θ, and outputs the low-dimensional feature value z to the adding section 26.
Next at step S14, the estimation section 12 estimates the probability distribution Pψ(z) of the low-dimensional feature value z using the GMM including the parameter ψ. The estimation section 12 also computes the entropy R = −log(Pψ(z)) of the probability distribution Pψ(z).
Next at step S16, the noise generation section 24 generates noise ε that is a random number based on a distribution having the same dimensionality as the low-dimensional feature value z, having no inter-correlation between dimensions, and having a mean of 0, and outputs the noise ε to the adding section 26. The adding section 26 then generates a low-dimensional feature value ẑ resulting from adding the noise ε input from the noise generation section 24 to the low-dimensional feature value z input from the encryption section 22, and outputs the low-dimensional feature value ẑ to the decryption section 28. Furthermore, the decryption section 28 decrypts the low-dimensional feature value ẑ using the decryption function gφ(ẑ) including the parameter φ, and generates output data x̂.
Next at step S18, the adjustment section 14 computes an error between the input data x and the output data x̂ generated at step S16, such as D = (x − x̂)², for example.
Next at step S20, the adjustment section 14 computes a training cost L1 expressed by, for example, a weighted sum of the error D computed at step S18, and the entropy R computed by the estimation section 12 at step S14, as expressed in Equation (2).
Next at step S22, the adjustment section 14 updates the parameter θ of the encryption section 22, the parameter φ of the decryption section 28, and the parameter ψ of the estimation section 12 so as to decrease the training cost L1.
Next at step S24, the adjustment section 14 determines whether or not training has converged. For example, training can be determined as having converged in cases in which the number of times of repeatedly updating the parameters has reached a specific number of times, cases in which the value of the training cost L1 has stopped changing, and the like. Processing returns to step S12 in cases in which training has not converged, and the processing of steps S12 to S22 is repeated for the next input data x. The training processing is ended in cases in which the training has converged.
Next, a detailed description follows regarding the determination processing, with reference to
At step S32, the encryption section 22 extracts the low-dimensional feature value z from the input data x by using the encryption function fθ(x) including the parameter θ.
Next at step S34, the estimation section 12 estimates the probability distribution Pψ(z) of the low-dimensional feature value z using the GMM including the parameter ψ. Moreover, the estimation section 12 computes the entropy R = −log(Pψ(z)) of the probability distribution Pψ(z). Furthermore, the estimation section 12 computes the membership coefficient γ of the GMM.
Next at step S36, the determination section 16 identifies, as cluster information indicating which cluster the low-dimensional feature value z belongs to, a cluster equivalent to the normal distribution corresponding to the maximum coefficient from among the coefficients γk contained in the K-dimensional vector that is the computed membership coefficient γ.
Next at step S38, from among the determination standards pre-determined for each respective cluster, the determination section 16 sets the determination standard corresponding to the cluster information identified at step S36, namely corresponding to the cluster the low-dimensional feature value z belongs to. The determination section 16 then determines for the determination target input data x whether the input data x is normal or abnormal by comparing the entropy R computed by the estimation section 12 at step S34 against the determination standard that was set.
Next at step S40, the determination section 16 outputs the result of normal/abnormal determination, and then ends the determination processing.
As described above, the determination control device according to the first exemplary embodiment estimates, as a probability distribution, the low-dimensional feature value obtained by encrypting the input data, and generates the output data by decrypting the feature value resulting from adding noise to the low-dimensional feature value. Moreover, the determination control device adjusts the respective parameters for the encryption, the probability distribution estimation, and the decryption based on the training cost including the error between the input data and the output data and the entropy of the probability distribution. Then, in determining whether or not the determination target input data is normal using the parameters after adjustment, the determination control device sets the determination standard corresponding to the cluster that the low-dimensional feature value belongs to. This enables normal/abnormal determination to be performed by comparison against a local feature within a cluster, after the low-dimensional feature value has been clustered using the broad feature it exhibits. Distinguishing between normal and abnormal is accordingly suppressed from becoming difficult even in cases in which an input data feature exhibits various probability distributions and the difference between normal and abnormal is a local feature, thereby enabling control such that determination of normal or abnormal can be performed with good accuracy.
Next, description follows regarding a second exemplary embodiment. Note that detailed explanation will be omitted regarding parts of the determination control device according to the second exemplary embodiment common to those of the determination control device 10 according to the first exemplary embodiment.
A determination control device 210 according to the second exemplary embodiment includes, from a functional perspective, an autoencoder 220, an estimation section 212, an adjustment section 214, and a determination section 216, as illustrated in
First, explanation follows regarding functional sections that function during training, with reference to
As illustrated in
The lower encryption section 221 extracts an intermediate output y of low-dimensional feature values from input data x using an encryption function fθy(x) including a parameter θy. The lower encryption section 221 outputs the extracted intermediate output y to the lower adding section 225 and the upper encryption section 222. The upper encryption section 222 extracts a low-dimensional feature value z from the intermediate output y using an encryption function fθz(y) including a parameter θz. The upper encryption section 222 outputs the extracted low-dimensional feature value z to the upper adding section 226. A CNN algorithm may be applied as the encryption function fθy(x) and the encryption function fθz(y).
The lower noise generation section 223 generates a noise εy having the same dimensionality as the intermediate output y, and outputs the generated noise to the lower adding section 225. The upper noise generation section 224 generates a noise εz having the same dimensionality as the low-dimensional feature value z, and outputs the generated noise to the upper adding section 226. The noise εy and noise εz are each a random number based on a distribution having no inter-correlation between dimensions, and having a mean of 0.
The lower adding section 225 adds the noise εy input from the lower noise generation section 223 to the intermediate output y input from the lower encryption section 221 so as to generate an intermediate output ŷ (denoted by a hat "^" above "y" in the drawings), and outputs the intermediate output ŷ to the lower decryption section 227. The upper adding section 226 adds the noise εz input from the upper noise generation section 224 to the low-dimensional feature value z input from the upper encryption section 222 so as to generate a low-dimensional feature value ẑ, and outputs the low-dimensional feature value ẑ to the upper decryption section 228.
The lower decryption section 227 generates output data x̂ having the same dimensionality as the input data x by decrypting the intermediate output ŷ input from the lower adding section 225 using a decryption function gφy(ŷ) including a parameter φy. The upper decryption section 228 generates an intermediate output ŷ′ having the same dimensionality as the intermediate output y by decrypting the low-dimensional feature value ẑ input from the upper adding section 226 using a decryption function gφz(ẑ) including a parameter φz. A transposed-convolution CNN algorithm may be applied as the decryption function gφy(ŷ) and the decryption function gφz(ẑ).
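The two-level structure can be summarized in a short sketch like the following, again in PyTorch with assumed layer sizes and noise scales; it mirrors the training-time data flow x → y → z, with x̂ decoded from ŷ and ŷ′ decoded from ẑ.

```python
# Minimal sketch of autoencoder 220: lower/upper encryption (221, 222),
# noise adding (225, 226), and lower/upper decryption (227, 228).
import torch
import torch.nn as nn

class TwoLevelAutoencoder(nn.Module):
    def __init__(self, in_ch=1, mid_ch=8, top_ch=4, sigma_y=1.0, sigma_z=1.0):
        super().__init__()
        self.enc_lower = nn.Conv2d(in_ch, mid_ch, 4, stride=2, padding=1)            # f_theta_y
        self.enc_upper = nn.Conv2d(mid_ch, top_ch, 4, stride=2, padding=1)           # f_theta_z
        self.dec_lower = nn.ConvTranspose2d(mid_ch, in_ch, 4, stride=2, padding=1)   # g_phi_y
        self.dec_upper = nn.ConvTranspose2d(top_ch, mid_ch, 4, stride=2, padding=1)  # g_phi_z
        self.sigma_y, self.sigma_z = sigma_y, sigma_z

    def forward(self, x):
        y = self.enc_lower(x)                            # intermediate output y
        z = self.enc_upper(y)                            # low-dimensional feature z
        y_hat = y + torch.randn_like(y) * self.sigma_y   # lower adding section 225
        z_hat = z + torch.randn_like(z) * self.sigma_z   # upper adding section 226
        x_hat = self.dec_lower(y_hat)                    # output data x_hat
        y_hat_prime = self.dec_upper(z_hat)              # intermediate output y_hat'
        return x_hat, y, z, y_hat_prime
```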
Similarly to the estimation section 12 in the first exemplary embodiment, the estimation section 212 acquires the low-dimensional feature value z extracted by the upper encryption section 222, and estimates a probability distribution Pψz(z) of the low-dimensional feature value z using the GMM including the parameter ψz. The estimation section 212 also computes the entropy Rz = −log(Pψz(z)) of the probability distribution Pψz(z).
Furthermore, the estimation section 212 also acquires the intermediate output y extracted by the lower encryption section 221 and the intermediate output ŷ′ generated by the upper decryption section 228, and estimates the intermediate output y as a conditional probability distribution under local feature values of the intermediate output y and the intermediate output ŷ′. For example, the estimation section 212 employs a multi-dimensional Gaussian distribution model including a parameter ψy to estimate a conditional probability distribution Pψy(y|ŷ′).
More specifically, the estimation section 212, for example, uses an auto-regressive (AR) model such as a masked CNN or the like to estimate parameters μ and σ of a multi-dimensional Gaussian distribution from information in peripheral regions to the intermediate output y and the intermediate output ŷ′. An AR model is a model that predicts a next frame from directly preceding frames. For example, when a masked CNN having a kernel size of 1 is utilized for a case in which the input data is image data, as illustrated in
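The masked CNN idea can be illustrated with a PixelCNN-style causal convolution as below; the kernel size of 3 and the particular masking pattern are illustrative assumptions, the point being that μ and σ at each position are predicted only from already-scanned peripheral pixels (the decoded context ŷ′ can be concatenated as additional, unmasked input channels).

```python
# Hedged sketch of an AR-style masked convolution: the center weight and all
# "future" raster-scan positions are zeroed so each output depends only on
# the peripheral region preceding that pixel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kh, kw = self.kernel_size
        mask = torch.ones(1, 1, kh, kw)
        mask[:, :, kh // 2, kw // 2:] = 0   # center pixel and pixels after it
        mask[:, :, kh // 2 + 1:, :] = 0     # all rows below the center
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

context = MaskedConv2d(16, 16, kernel_size=3, padding=1)  # causal features
head = nn.Conv2d(16, 2, kernel_size=1)    # per-pixel (mu, log sigma) estimate
```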
Moreover, the estimation section 212 employs the estimated μ(y) and σ(y) to compute the entropy Ry = −log(Pψy(y|ŷ′)) of the conditional probability distribution Pψy(y|ŷ′) using the following Equation (3). Note that i in Equation (3) is a variable to identify the pixels ((m, n) in the image data example above) in each dimension of the intermediate output y.
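Equation (3) is not reproduced in this text; for per-element Gaussian parameters μi(y) and σi(y), it presumably takes the standard negative log-likelihood form, summed over the elements i:

Ry = −log Pψy(y|ŷ′) = Σi [ (yi − μi(y))² / (2σi(y)²) + log(√(2π)·σi(y)) ]   (3)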
The adjustment section 214 computes a training cost L2 including an error between the input data x and the output data x̂ corresponding to this input data, and including the entropy Rz and the entropy Ry computed by the estimation section 212. The adjustment section 214 adjusts the respective parameters θz, θy, φz, φy, ψz, ψy in the lower encryption section 221, the upper encryption section 222, the lower decryption section 227, the upper decryption section 228, and the estimation section 212 based on the training cost L2. For example, the adjustment section 214 repeatedly executes processing to generate the output data x̂ from the input data x while updating the parameters θz, θy, φz, φy, ψz, ψy so as to minimize the training cost L2 expressed by a weighted sum of the error between x and x̂ and the entropies Rz and Ry as illustrated in the following Equation (4). The parameters of the autoencoder 220 and the estimation section 212 are trained thereby.
L2 = E_{x˜p(x), εy˜N(0, σy²), εz˜N(0, σz²)}[Rz + Ry + λ·D]   (4)
Next, description follows regarding functional sections that function during determination, with reference to
The lower encryption section 221 extracts the intermediate output y of the low-dimensional feature value from the input data x by encrypting the input data x based on the encryption function fθy(x) set with the parameter θy adjusted by the adjustment section 214, and inputs the intermediate output y to the upper encryption section 222.
The upper encryption section 222 extracts the low-dimensional feature value z from the intermediate output y by encrypting the intermediate output y based on the encryption function fθz(y) set with the parameter θz adjusted by the adjustment section 214, and inputs the low-dimensional feature value z to the upper decryption section 228.
The upper decryption section 228 generates the intermediate output y′ having the same dimensionality as the intermediate output y by decrypting the low-dimensional feature value z input from the upper encryption section 222 using the decryption function gφz(z) including the parameter φz adjusted by the adjustment section 214.
The estimation section 212 acquires the low-dimensional feature value z extracted by the upper encryption section 222, and estimates the probability distribution Pψz(z) of the low-dimensional feature value z using the GMM set with the parameter ψz adjusted by the adjustment section 214. The estimation section 212 computes the membership coefficient γ of the GMM in the process to estimate the probability distribution Pψz(z).
Moreover, the estimation section 212 also acquires the intermediate output y extracted by the lower encryption section 221 and the intermediate output y′ generated by the upper decryption section 228. The estimation section 212 estimates the intermediate output y as the conditional probability distribution Pψy(y|y′) under local feature values of the intermediate output y and the intermediate output y′ using a multi-dimensional Gaussian distribution model including the parameter ψy adjusted by the adjustment section 214. The estimation section 212 estimates the parameters μ(y) and σ(y) of the multi-dimensional Gaussian distribution while estimating the conditional probability distribution Pψy(y|y′).
Moreover, the estimation section 212 uses the following Equation (5) to compute a difference ΔRy between the entropy Ry as computed from the estimated μ(y) and σ(y) using Equation (3), and an expected value of entropy as computed from the estimated σ(y).
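Equation (5) is likewise not reproduced here. Since the expected value of the per-element Gaussian entropy is ½·log(2πσi(y)²) + ½, a plausible reconstruction, subtracting it from Ry of Equation (3), is:

ΔRy = Ry − Σi [ ½·log(2πσi(y)²) + ½ ] = Σi [ (yi − μi(y))² / (2σi(y)²) − ½ ]   (5)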
Similarly to the determination section 16 of the first exemplary embodiment, the determination section 216 uses the membership coefficient γ computed by the estimation section 212 to identify cluster information indicating which cluster the low-dimensional feature value z belongs to. The determination section 216 sets, from among respective determination standards pre-determined for each cluster, a determination standard corresponding to the identified cluster information, namely to the cluster the low-dimensional feature value z belongs to. Then for the determination target input data x, the determination section 216 determines the input data x to be normal or abnormal by comparing the entropy difference ΔRy computed by the estimation section 212 against the determination standard set corresponding to the cluster the low-dimensional feature value z belongs to.
The determination control device 210 may be implemented by, for example, the computer 40 illustrated in
The CPU 41 reads the determination control program 250 from the storage section 43, expands the determination control program 250 into the memory 42, and sequentially executes the processes of the determination control program 250. The CPU 41 operates as the autoencoder 220 illustrated in
Note that the functions implemented by the determination control program 250 may, for example, be implemented by a semiconductor integrated circuit, and more particularly by an application specific integrated circuit (ASIC).
Next, description follows regarding operation of the determination control device 210 according to the second exemplary embodiment. When adjusting the parameters of the autoencoder 220 and the estimation section 212, the training processing illustrated in
First the training processing will be described in detail, with reference to
At step S212, the lower encryption section 221 uses the encryption function fθy(x) including the parameter θy to extract the intermediate output y of the low-dimensional feature value from the input data x, and outputs the intermediate output y to the lower adding section 225 and the upper encryption section 222. The upper encryption section 222 uses the encryption function fθz(y) including the parameter θz to extract the low-dimensional feature value z from the intermediate output y, and outputs the low-dimensional feature value z to the upper adding section 226.
Next at step S213, the estimation section 212 estimates the probability distribution Pψz(z) of the low-dimensional feature value z using the GMM including the parameter ψz. The estimation section 212 also computes the entropy Rz = −log(Pψz(z)) of the probability distribution Pψz(z).
Next, at step S214, the lower noise generation section 223 generates noise εy that is a random number based on a distribution having the same dimensionality as the intermediate output y, having no inter-correlation between dimensions, and having a mean of 0, and outputs the noise εy to the lower adding section 225. The lower adding section 225 then generates an intermediate output ŷ resulting from adding the noise εy input from the lower noise generation section 223 to the intermediate output y input from the lower encryption section 221, and then outputs the intermediate output ŷ to the lower decryption section 227. Furthermore, the lower decryption section 227 decrypts the intermediate output ŷ using the decryption function gφy(ŷ) including the parameter φy, and generates output data x̂.
Next at step S216, the adjustment section 214 computes an error between the input data x and the output data x̂ generated at step S214, such as D = (x − x̂)², for example.
Next at step S217, the upper noise generation section 224 generates noise εz that is a random number based on a distribution having the same dimensionality as the low-dimensional feature value z, having no inter-correlation between dimensions, and having a mean of 0, and outputs the noise εz to the upper adding section 226. The upper adding section 226 generates a low-dimensional feature value ẑ resulting from adding the noise εz input from the upper noise generation section 224 to the low-dimensional feature value z input from the upper encryption section 222, and outputs the low-dimensional feature value ẑ to the upper decryption section 228. Furthermore, the upper decryption section 228 decrypts the low-dimensional feature value ẑ using a decryption function gφz(ẑ) including the parameter φz, and generates an intermediate output ŷ′.
Next, at step S218, the estimation section 212 uses an AR model, for example, to extract a peripheral region to each of the intermediate output y extracted by the lower encryption section 221 and the intermediate output ŷ′ generated by the upper decryption section 228. The estimation section 212 then estimates the intermediate output y as a conditional probability distribution Pψy(y|ŷ′) by estimating parameters μ(y) and σ(y) of a multi-dimensional Gaussian distribution. The estimation section 212 employs the estimated μ(y) and σ(y) to compute the entropy Ry = −log(Pψy(y|ŷ′)) of the conditional probability distribution Pψy(y|ŷ′) using Equation (3).
Next at step S219, the adjustment section 214 computes a training cost L2 expressed by a weighted sum of the error D computed at step S216 and the entropies Rz and Ry computed at step S213 and step S218, for example as expressed in Equation (4).
Next, at step S220, the adjustment section 214 updates the respective parameters θz, θy, φz, φy, ψz, ψy of the lower encryption section 221, the upper encryption section 222, the lower decryption section 227, the upper decryption section 228, and the estimation section 212 so as to decrease the training cost L2.
Next, at step S24, the adjustment section 214 determines whether or not training has converged. In cases in which training has not converged, processing returns to step S212, and the processing of step S212 to step S220 is repeated for the next input data x. The training processing is ended in cases in which training has converged.
Next detailed description will be given regarding determination processing, with reference to
At step S232, the lower encryption section 221 extracts an intermediate output y from the input data x by using the encryption function fθy(x), and outputs the intermediate output y to the upper encryption section 222. The upper encryption section 222 extracts a low-dimensional feature value z from the intermediate output y using the encryption function fθz(y).
Next at step S233, the upper decryption section 228 decrypts the low-dimensional feature value z using the decryption function gφz(z) and generates an intermediate output y′.
Next at step S234, the estimation section 212 uses an AR model, for example, to extract a peripheral region to each of the intermediate output y extracted by the lower encryption section 221 and the intermediate output y′ generated by the upper decryption section 228. The estimation section 212 then estimates the intermediate output y as a conditional probability distribution Pψy(y|y′) by estimating parameters μ(y) and σ(y) of a multi-dimensional Gaussian distribution.
Next at step S235, the estimation section 212 uses Equation (5) to compute a difference ΔRy between the entropy Ry as computed at step S234 from the estimated μ(y) and σ(y) using Equation (3), and an expected value of entropy as computed from the estimated σ(y).
Next, at step S236, the estimation section 212 uses a GMM to estimate a probability distribution Pψz(z) for the low-dimensional feature value z, and computes a membership coefficient γ of the GMM.
Next, at step S237, based on the membership coefficient γ computed at step S236, the determination section 216 identifies cluster information indicating which cluster the low-dimensional feature value z belongs to.
Next at step S238, from among respective determination standards pre-determined for each respective cluster, the determination section 216 sets a determination standard corresponding to the cluster information identified at step S237, namely to the cluster the low-dimensional feature value z belongs to. Then for the determination target input data x, the determination section 216 determines the input data x to be normal or abnormal by comparing the entropy difference ΔRy computed by the estimation section 212 at step S235 against the determination standard that was set.
Next, at step S40, the determination section 216 outputs a determination result of normal or abnormal, and ends the determination processing.
As described above, the determination control device according to the second exemplary embodiment extracts an intermediate output of the low-dimensional feature value by lower layer encryption, and extracts the low-dimensional feature value by upper layer encryption. Moreover, for respective outputs of the decrypted intermediate output and low-dimensional feature value, the determination control device estimates a conditional probability distribution of data of interest under information of the peripheral region to the data of interest in the intermediate output. Moreover, similarly to in the first exemplary embodiment, the determination control device sets a determination standard corresponding to the cluster that the low-dimensional feature value belongs to. The determination control device then determines whether or not the determination target input data is normal using the entropy of the estimated conditional probability distribution and the determination standard. This accordingly enables determination of normal or abnormal to be performed by evaluation of a local feature expressed by the intermediate output under a broad feature expressed by the low-dimensional feature value. Distinguishing between normal and abnormal is accordingly suppressed from becoming difficult even in cases in which the input data feature exhibits various probability distributions and a difference between normal and abnormal is a local feature, thereby enabling control such that determination of normal or abnormal can be performed with good accuracy.
Note that in the second exemplary embodiment, a uniform distribution U(−½, ½) may be employed for the noise εy added to the intermediate output y to generate the intermediate output ŷ. In such cases, the conditional probability distribution Pψy(y|ŷ′) estimated during training is that of the following Equation (6). Moreover, the entropy difference ΔRy computed during estimation is that of the following Equation (7). Note that C in Equation (7) is a constant determined experimentally according to the designed model.
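Equations (6) and (7) are not reproduced in this text. Under uniform noise U(−½, ½), the construction standard in rate-distortion autoencoders would convolve the Gaussian with the unit-width uniform, so per element i, Equation (6) presumably has the form

Pψy(yi|ŷ′) = Φ((yi − μi(y) + ½)/σi(y)) − Φ((yi − μi(y) − ½)/σi(y))   (6)

where Φ is the standard normal cumulative distribution function, and Equation (7) then presumably subtracts from Ry the expected entropy implied by Equation (6):

ΔRy = Ry − Σi h(σi(y)) + C   (7)

with h(σi(y)) denoting the expected per-element entropy under Equation (6), which generally lacks a closed form, hence the experimentally determined constant C.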
Moreover, although each of the exemplary embodiments has been described mainly based on examples of cases in which the input data is image data, the input data may be waveform data, such as that of an electrocardiogram or an electroencephalogram. In such cases, for example, a CNN or the like adapted to one dimension may be employed as the algorithm used for the encryption and the like, as sketched below.
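For instance, a hedged one-dimensional counterpart of the encoder sketches above could be as simple as the following; the channel counts and kernel size are assumptions.

```python
# 1-D CNN encoder sketch for waveform input (e.g. an electrocardiogram):
# the 2-D convolutions become 1-D; everything else is unchanged.
import torch.nn as nn

encoder_1d = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=8, stride=2, padding=3),
    nn.ReLU(),
    nn.Conv1d(16, 8, kernel_size=8, stride=2, padding=3),
)
```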
Moreover, although each of the exemplary embodiments has been described mainly with respect to determination control devices in which the functional sections employed during training and during determination are included in a single computer, there is no limitation thereto. A training device including an autoencoder, an estimation section, and an adjustment section, in which the parameters prior to adjustment are employed, and a determination device including an autoencoder, an estimation section, and a determination section, in which the already adjusted parameters are employed, may each be configured by a separate computer.
Moreover, although each of the exemplary embodiments has been described for an embodiment in which the determination control program is pre-stored (installed) in the storage section, there is no limitation thereto. The program according to the technology disclosed herein may be provided in a format stored on a storage medium such as a CD-ROM, DVD-ROM, USB memory, or the like.
In related technology there is an issue in that determination of normal or abnormal sometimes cannot be made with good accuracy in cases in which the input data feature exhibits various probability distributions, and a feature of a probability distribution indicating abnormal data is buried in differences between the various probability distributions.
The technology disclosed herein enables determination of normal or abnormal to be performed with good accuracy even in cases in which an input data feature exhibits various probability distributions.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application No. PCT/JP2020/035558, filed Sep. 18, 2020, the disclosure of which is incorporated herein by reference in its entirety.