The present invention relates to a learning apparatus, an anomaly detecting apparatus, a learning method, a method for detecting an anomaly, and a program.
For a business operator who operates an information and communication technology (ICT) system, grasping the state of an anomaly occurring in the ICT system and quickly responding to the anomaly is one of the important tasks. Hence, methods for detecting an anomaly occurring in an ICT system at an early stage have been studied in the related art. In particular, there has been proposed an unsupervised anomaly detecting technique using deep learning (DL), in which a normal state is learned from normal data of the ICT system and an anomaly is detected by calculating a degree of deviation from the normal state at the time of a test (for example, Non Patent Literatures 1 and 2).
The ICT system provides various services, and the users who use these services also have various tendencies. Hence, in order to learn the normal state of the ICT system with the unsupervised anomaly detecting technique using the DL, a large amount of normal data is required. In general, since an ICT system spends far more time in the normal state than in an abnormal state, a large amount of normal data can be collected from an ICT system that has been operated for a long period of time.
However, there are cases where only a small amount of normal data can be collected. For example, immediately after a new ICT system is constructed, a sufficient amount of normal data cannot yet be collected. Hence, an anomaly cannot be detected by the unsupervised anomaly detecting technique until a sufficient amount of normal data has been collected.
In addition, for example, in a case where the normal state of an ICT system has changed because a new service is provided, the conventional unsupervised anomaly detecting technique cannot be used, and thus, similarly, an anomaly cannot be detected until a sufficient amount of normal data is collected.
An embodiment of the present invention has been made in view of the points described above, and an object thereof is to realize unsupervised anomaly detection in a target system with a small amount of normal data.
In order to achieve the above-described object, a learning apparatus according to an embodiment includes: an input unit that inputs a set of normal data of a first system serving as a target domain and a set of normal data of a second system serving as a source domain; and a learning unit that learns a model including a first encoder having data of the target domain as an input, a second encoder having data of the source domain as an input, a discriminator having output data of either the first encoder or the second encoder as an input to discriminate whether the output data is data indicating a feature of the target domain or data indicating a feature of the source domain, and a DeepSVDD having the output data as an input, by using the set of normal data of the first system and the set of normal data of the second system.
It is possible to realize unsupervised anomaly detection in a target system with a small amount of normal data.
Hereinafter, an embodiment of the present invention will be described. In the embodiment, an unsupervised anomaly detecting technique will be described that transfers information obtained when learning the normal state of an ICT system having a large amount of normal data to an ICT system having only a small amount of normal data. The technique focuses on the point that, although the configuration and functions differ for each ICT system, ICT systems having similar configurations and similar functions have similar normal states. This unsupervised anomaly detecting technique makes it possible to obtain an anomaly detector capable of detecting an anomaly in an ICT system having only a small amount of normal data (hereinafter, also referred to as a target system).
In addition, an anomaly detecting apparatus 10 that creates an anomaly detector by the unsupervised anomaly detecting technique and detects an anomaly of the target system with the anomaly detector will be described.
Hereinafter, a theoretical configuration of the unsupervised anomaly detecting technique according to the embodiment will be described.
First, an ICT system having a large amount of normal data is set as a source domain S, and an ICT system (target system) having a small amount of normal data is set as a target domain T.
In addition, assuming that one item of normal data obtained from the source domain S is n-dimensional vector data xs=[x1(s), . . . , xn(s)], a data set including the n-dimensional vector data xs is set as follows.
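For example, writing the j-th item of normal data of the source domain as x_s^(j) (the superscript (j) is used here only for illustration), this data set can be expressed as

$$D_S = \left\{ x_s^{(1)}, x_s^{(2)}, \ldots, x_s^{(|D_S|)} \right\}, \qquad x_s^{(j)} \in \mathbb{R}^n.$$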
Here, n represents the number of types of data obtained in the source domain S, and |DS| represents the number of items of n-dimensional vector data.
Similarly, assuming that one item of normal data obtained from the target domain T is m-dimensional vector data xt=[x1(t), . . . , xm(t)], a data set including the m-dimensional vector data xt is set as follows.
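Similarly, writing the j-th item of normal data of the target domain as x_t^(j) (again, the superscript is used only for illustration), this data set can be expressed as

$$D_T = \left\{ x_t^{(1)}, x_t^{(2)}, \ldots, x_t^{(|D_T|)} \right\}, \qquad x_t^{(j)} \in \mathbb{R}^m.$$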
Here, m represents the number of types of data obtained in the target domain T, and |DT| represents the number of items of m-dimensional vector data.
Next, a model used in the unsupervised anomaly detecting technique according to the embodiment will be described. As a technique for detecting an anomaly in each of the source domain S and the target domain T, an encoder, which is a type of DL, and a deep support vector data description (DeepSVDD: a support vector data description method using deep learning) (Reference Literature 1) are used. More specifically, after input data (normal data) is compressed by an encoder E, learning is performed so that the hypersphere (its volume or the area of its hyperspherical surface) onto which the compressed data (feature values) is mapped by the DeepSVDD is minimized. Accordingly, after the learning, anomaly detection is performed by the DeepSVDD and the encoder having data of the target domain as an input. Hereinafter, the encoder of the source domain S is denoted by ES, a parameter thereof is denoted by θS, the encoder of the target domain T is denoted by ET, and a parameter thereof is denoted by θT.
In addition, as a technique for transferring knowledge in a certain domain to another domain, a technique of using a generative adversarial network (GAN) is known (Reference Literature 2). In this respect, in the unsupervised anomaly detecting technique according to the embodiment, a model obtained by combining the encoders ES and ET and the DeepSVDD is used in the GAN-based transfer learning technique.
Specifically, by extracting feature values from the source domain S and the target domain T with the encoders ES and ET, respectively, an expression (feature) that can be transferred from the source domain S is acquired and applied to the target domain T. In addition, a discriminator D(⋅; θD) that discriminates from which domain the output ES(xs; θS) or ET(xt; θT) of the encoder ES(⋅; θS) or ET(⋅; θT) originates (that is, discriminates, from the outputs of the encoders, from which domain the input data of the encoders was obtained) is prepared. Here, the discriminator D(⋅; θD) is expressed by a neural network, and θD represents a parameter thereof. That is, the discriminator D is a neural network that discriminates whether its input data is data derived from the source domain or data derived from the target domain. Further, the dimension of the input layer of the discriminator D(⋅; θD) is the same as the dimension of the output layers of the encoders ES(⋅; θS) and ET(⋅; θT) (for example, both are a dimension number k).
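As an illustration of this model structure, the following is a minimal sketch in PyTorch; the numbers of layers, the hidden widths, the concrete dimensions n, m, and k, and the output dimension of the DeepSVDD network are assumptions of the sketch and are not prescribed by the embodiment.

import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64):
    # A small two-layer perceptron used for every component of this sketch.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

n, m, k = 20, 15, 8   # n: source input dim, m: target input dim, k: shared feature dim

E_S = mlp(n, k)       # encoder of the source domain S (parameters theta_S)
E_T = mlp(m, k)       # encoder of the target domain T (parameters theta_T)
D = nn.Sequential(    # discriminator: k-dimensional feature -> probability of "target domain"
    nn.Linear(k, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
DSVDD = mlp(k, 4)     # DeepSVDD network whose outputs are gathered around a center c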
The above-described model is the learning target. Hereinafter, i is an index of data input to the model which is the learning target, and i=1, . . . , |DS|+|DT|. In addition, it is assumed that only either si∈{1, . . . , |DS|} or ti∈{1, . . . , |DT|} is satisfied for each i. Further, for i with which si∈{1, . . . , |DS|} is not satisfied, for example, si=|DS|+1 or the like may be set. Similarly, for i with which ti∈{1, . . . , |DT|} is not satisfied, for example, ti=|DT|+1 or the like may be set. Hereinafter, the symbols (subscripts) "si" and "ti" assigned at the lower right of "x" are expressed as "xs_i" and "xt_i" in the text of the specification.
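For example, when |DS| = 3 and |DT| = 2, i runs over 1, . . . , 5, and the indices can be assigned so that si∈{1, 2, 3} (with ti = |DT|+1 = 3) for i = 1, 2, 3, and ti∈{1, 2} (with si = |DS|+1 = 4) for i = 4, 5.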
In this case, a schematic diagram of the model which is the learning target is illustrated in the drawings.
Next, a loss function of the model which is the learning target is defined.
First, a loss function of the discriminator D(⋅; θD) is defined below.
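For example, writing the binary cross entropy as LS (as explained below) and using the domain label di and the index sets NS and NT, a loss of the following form can be used (the normalization by |NS|+|NT| is an assumption of this sketch):

$$L_D = \frac{1}{|N_S|+|N_T|}\left(\sum_{i \in N_S} L_S\!\left(D(E_S(x_{s_i}; \theta_S); \theta_D),\, d_i\right) + \sum_{i \in N_T} L_S\!\left(D(E_T(x_{t_i}; \theta_T); \theta_D),\, d_i\right)\right)$$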
Here, LS represents binary cross entropy. In addition, di represents whether the i-th item of data is in the source domain or the target domain. For example, di=0 is set when the i-th item of data input to the model is in the source domain, and di=1 is set when the i-th item of data is in the target domain. Further, NS is the set of i satisfying si∈{1, . . . , |DS|}, and NT is the set of i satisfying ti∈{1, . . . , |DT|}. Further, note that the input data to the model is xs_i∈DS when i∈NS, and the input data to the model is xt_i∈DT when i∈NT.
Next, a loss function of the DeepSVDD DSVDD (⋅; θDSVDD) is defined below.
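For example, with the center c described next, the following form, which draws the DeepSVDD outputs of both domains toward c, can be used (the normalization by |NS|+|NT| is an assumption of this sketch):

$$L_{SVDD} = \frac{1}{|N_S|+|N_T|}\left(\sum_{i \in N_S} \left\| \mathrm{DSVDD}(E_S(x_{s_i}; \theta_S); \theta_{DSVDD}) - c \right\|^2 + \sum_{i \in N_T} \left\| \mathrm{DSVDD}(E_T(x_{t_i}; \theta_T); \theta_{DSVDD}) - c \right\|^2\right)$$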
Here, θDSVDD represents a parameter of DSVDD. In addition, c is a constant that is fixed in advance before the learning of the model and is calculated from the data sets for learning (that is, DS and DT). Specifically, after the parameters θS and θT of the encoders ES(⋅; θS) and ET(⋅; θT) and the parameter θDSVDD of the DeepSVDD DSVDD(⋅; θDSVDD) are initialized, DSVDD(ES(xs; θS); θDSVDD) and DSVDD(ET(xt; θT); θDSVDD) are calculated for each xs∈DS and each xt∈DT, and the average thereof is defined as c.
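A minimal sketch of this initialization of c, assuming the PyTorch modules E_S, E_T, and DSVDD from the earlier sketch and tensors X_s and X_t that hold DS and DT row by row, is as follows.

import torch

@torch.no_grad()
def compute_center(E_S, E_T, DSVDD, X_s, X_t):
    # Forward all normal data of both domains through the freshly initialized
    # networks and take the mean of the DeepSVDD outputs as the center c.
    z_s = DSVDD(E_S(X_s))   # shape: (|D_S|, output dimension of DSVDD)
    z_t = DSVDD(E_T(X_t))   # shape: (|D_T|, output dimension of DSVDD)
    return torch.cat([z_s, z_t], dim=0).mean(dim=0)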
Collectively, the loss function of the model which is the learning target is defined below.
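For example, combining the DeepSVDD loss LSVDD, the discriminator loss LD, and a regularization term on the weights of the DeepSVDD gives a loss of the following form; the sign and weighting of the LD term shown here correspond to a GAN-style min-max formulation and are an assumption of this sketch:

$$L_{\mathrm{total}} = L_{SVDD} - L_D + \frac{\lambda}{2} \sum_{l=1}^{L} \left\| W_l \right\|_F^2$$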
Here, Wl represents the weight of the l-th layer of the neural network representing the DeepSVDD DSVDD, L represents the number of layers, and λ represents a hyperparameter. In addition, ∥⋅∥F represents the Frobenius norm. Further, note that Wl is included in the parameter θDSVDD. In addition, note that |NS|=|DS| and |NT|=|DT|.
Accordingly, parameter learning is performed so as to minimize the loss function expressed in Expression 5 above. That is, learning is performed so as to minimize the hypersphere centered at c with respect to the encoders ES and ET and the DeepSVDD DSVDD, and to maximize the discrimination performance of the discriminator D with respect to the encoders ES and ET and the discriminator D. Specifically, the parameter learning is performed as follows.
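For example, with the loss Ltotal sketched above, the learning can be written as the min-max problem

$$\min_{\theta_S,\, \theta_T,\, \theta_{DSVDD}} \; \max_{\theta_D} \; L_{\mathrm{total}},$$

in which θS, θT, and θDSVDD are updated so that the hypersphere around c shrinks, while θD is updated so that the discrimination performance of the discriminator D improves; this concrete form is an assumption of this sketch.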
Further, various parameter learning techniques can be considered, and appropriate techniques may be used. For example, optimization using Adam (Reference Literature 3) can be used.
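A minimal training-loop sketch in PyTorch follows. It assumes the modules E_S, E_T, D, and DSVDD and the center c from the earlier sketches, tensors X_s and X_t holding DS and DT, and freely chosen values for the learning rate, λ, and the number of epochs; it alternates a discriminator update with an update of the encoders and the DeepSVDD, following the min-max sketch above.

import torch
import torch.nn as nn

def train(E_S, E_T, D, DSVDD, X_s, X_t, c, lam=1e-3, lr=1e-4, epochs=100):
    bce = nn.BCELoss()
    d_s = torch.zeros(len(X_s), 1)   # d_i = 0 for source-domain data
    d_t = torch.ones(len(X_t), 1)    # d_i = 1 for target-domain data
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    opt_m = torch.optim.Adam(
        list(E_S.parameters()) + list(E_T.parameters()) + list(DSVDD.parameters()), lr=lr)
    for _ in range(epochs):
        z_s, z_t = E_S(X_s), E_T(X_t)

        # (1) Discriminator step: improve discrimination between the two domains.
        d_loss = bce(D(z_s.detach()), d_s) + bce(D(z_t.detach()), d_t)
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # (2) Encoder / DeepSVDD step: shrink the hypersphere around c, regularize
        #     the DeepSVDD weights, and make the two domains harder to discriminate.
        svdd = (((DSVDD(z_s) - c) ** 2).sum() + ((DSVDD(z_t) - c) ** 2).sum()) / (len(X_s) + len(X_t))
        reg = sum((W ** 2).sum() for name, W in DSVDD.named_parameters() if "weight" in name)
        adv = bce(D(z_s), d_s) + bce(D(z_t), d_t)
        loss = svdd + lam / 2 * reg - adv
        opt_m.zero_grad()
        loss.backward()
        opt_m.step()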
After the learning by Expression 6 above is performed, anomaly detection is performed by DSVDD(ET(⋅; θT); θDSVDD) for which the learning has been completed. Specifically, data which is an anomaly detection target in the target domain is denoted by x, and DSVDD(ET(x; θT); θDSVDD) is calculated. In a case where the difference between DSVDD(ET(x; θT); θDSVDD) and c exceeds a predetermined threshold, the data is determined to be abnormal; otherwise, it is determined to be normal. Further, the threshold can be set arbitrarily. For example, when the difference between DSVDD(ET(xt; θT); θDSVDD) and c is calculated for each xt∈DT using DSVDD(ET(⋅; θT); θDSVDD) obtained after the learning, and the average and variance thereof are denoted by μ and σ2, respectively, it is conceivable to use μ+σ2, μ−σ2, or the like as the threshold.
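A minimal sketch of this detection step, assuming the learned PyTorch modules E_T and DSVDD and the center c from the earlier sketches, and assuming that the difference from c is measured as the Euclidean distance, is as follows.

import torch

@torch.no_grad()
def choose_threshold(E_T, DSVDD, c, X_t):
    # One of the options mentioned above: mu + sigma^2 over the normal data D_T.
    dist = torch.norm(DSVDD(E_T(X_t)) - c, dim=1)
    return dist.mean() + dist.var()

@torch.no_grad()
def detect(E_T, DSVDD, c, x, threshold):
    # x: m-dimensional vector data of the target domain (the anomaly detection target).
    dist = torch.norm(DSVDD(E_T(x)) - c)
    return "abnormal" if dist > threshold else "normal"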
Next, a hardware configuration of the anomaly detecting apparatus 10 according to the embodiment will be described with reference to the drawings.
As illustrated in the drawings, the anomaly detecting apparatus 10 according to the embodiment includes an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a processor 105, and a memory device 106.
The input device 101 is, for example, a keyboard, a mouse, a touch panel, or the like. The display device 102 is, for example, a display or the like.
The external I/F 103 is an interface with an external device such as a recording medium 103a. The anomaly detecting apparatus 10 can read from and write to the recording medium 103a via the external I/F 103. Further, examples of the recording medium 103a include a compact disc (CD), a digital versatile disk (DVD), a secure digital memory card (SD memory card), and a universal serial bus (USB) memory card.
The communication I/F 104 is an interface for connecting the anomaly detecting apparatus 10 to a communication network. The processor 105 is, for example, any of various arithmetic devices such as a central processing unit (CPU) and a graphics processing unit (GPU). The memory device 106 is any of various storage devices such as a hard disk drive (HDD), a solid state drive (SSD), a random access memory (RAM), a read only memory (ROM), and a flash memory.
The anomaly detecting apparatus 10 according to the embodiment has the hardware configuration described above, and can thereby realize the various processes described later.
Next, a functional configuration of the anomaly detecting apparatus 10 according to the embodiment will be described with reference to the drawings.
As illustrated in the drawings, the anomaly detecting apparatus 10 according to the embodiment includes a learning unit 201, an inference unit 202, and a user interface unit 203. Each of these units is realized by, for example, the processor 105 executing one or more programs installed in the anomaly detecting apparatus 10.
In addition, the anomaly detecting apparatus 10 according to the embodiment includes a target domain DB 204, a source domain DB 205, and a learned model DB 206. Each of these DBs (databases) is realized by, for example, the memory device 106.
The learning unit 201 learns the model described above (that is, the model including the encoders ES and ET, the DeepSVDD DSVDD, and the discriminator D) by using the data set DT stored in the target domain DB 204 and the data set DS stored in the source domain DB 205, and stores the learned model in the learned model DB 206.
The inference unit 202 uses DSVDD(ET(⋅; θT); θDSVDD) included in the learned model stored in the learned model DB 206 as an anomaly detector, and determines, using the anomaly detector, whether or not an anomaly has occurred in the target system for the m-dimensional vector data x that is the anomaly detection target.
The user interface unit 203 outputs the determination result by the inference unit 202 to the user. For example, the user interface unit 203 outputs the determination result to a terminal or the like used by an operator or the like of the target system.
The target domain DB 204 stores a data set DT of the target domain T. The source domain DB 205 stores a data set DS of the source domain S. The learned model DB 206 stores a learned model.
Further, the functional configuration of the anomaly detecting apparatus 10 described above is an example, and the present embodiment is not limited thereto.
Next, a flow of the overall processes executed by the anomaly detecting apparatus 10 according to the embodiment will be described with reference to the drawings.
Step S101: The learning unit 201 learns the model described above by using the data set DT stored in the target domain DB 204 and the data set DS stored in the source domain DB 205, and stores the learned model in the learned model DB 206.
Step S102: The inference unit 202 uses DSVDD(ET(⋅; θT); θDSVDD) included in the learned model stored in the learned model DB 206 as an anomaly detector, and determines, using the anomaly detector, whether or not an anomaly has occurred in the target system for the m-dimensional vector data x that is the anomaly detection target. That is, the inference unit 202 determines that an anomaly has occurred when the difference between DSVDD(ET(x; θT); θDSVDD) and c exceeds a predetermined threshold, and determines that the target system is normal otherwise.
Step S103: The user interface unit 203 outputs the determination result (normal or abnormal) of Step S102 to the user. Further, the user interface unit 203 may output the result to the user only when the determination result of Step S102 is abnormal.
As described above, even when there is only a small amount of normal data of the target system, the anomaly detecting apparatus 10 according to the embodiment can detect an anomaly of the target system by the unsupervised anomaly detecting technique using the DL, by transferring information on the normal state of an ICT system having a large amount of normal data.
Further, as described above, the anomaly detecting apparatus 10 has a learning phase and an inference phase. In the embodiment, the same anomaly detecting apparatus 10 executes both the learning phase and the inference phase, but these phases may be executed by different apparatuses. In addition, the anomaly detecting apparatus 10 in the learning phase may be referred to as a "learning apparatus" or the like.
The present invention is not limited to the above-mentioned specifically disclosed embodiments, and various modifications and changes, combinations with known technologies, and the like can be made without departing from the scope of the claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---
PCT/JP2021/039838 | 10/28/2021 | WO |