The present invention relates to an information processing apparatus, a creation method, and a creation program.
A conventionally known approach to anomaly-based anomaly detection using unsupervised learning is to learn probability distributions of normal data from the normal data and create models. Here, if a learned model is created without dividing data, the detection performance degrades, but the learning cost decreases, and also the model can be reused. On the other hand, if a learned model is created by dividing data based on a certain index such as IP address, the detection performance improves, but the learning cost increases, and the model cannot be reused. Thus, there are trade-offs. Furthermore, there also exists a method of performing an exhaustive check regarding various division granularities to find an appropriate division granularity that does not degrade the detection performance.
However, the aforementioned method of performing an exhaustive check regarding various division granularities to find an appropriate division granularity that does not degrade the detection performance requires a high learning cost, and therefore, there is a problem in that it is difficult to determine an appropriate data division method at a low learning cost.
In order to address the above-described problem and achieve an object, an information processing apparatus of the present invention includes: a calculation unit configured to calculate, with respect to datasets into which data is divided based on individual labels serving as candidates for an index when the data is divided, an amount of information for each of division methods that use the respective labels; a division unit configured to divide the data into a plurality of datasets based on the division method that provides the highest amount of information, of the amounts of information calculated by the calculation unit; and a creation unit configured to create, with use of the datasets divided by the division unit, a learned model for each of the datasets.
The present invention has the effect of making it possible to determine an appropriate data division method at a low learning cost.
Hereinafter, embodiments of an information processing apparatus, a creation method, and a creation program according to the present application will be described in detail based on the drawings. Note that the information processing apparatus, the creation method, and the creation program according to the present application are not limited to the following embodiments.
In an embodiment below, the configuration of an information processing apparatus 10 according to a first embodiment and the flow of processing performed by the information processing apparatus 10 will be described in this order, and finally, the effects of the first embodiment will be described.
First, the configuration of a detection system according to the first embodiment will be described using
The information processing apparatus 10 acquires normal-state data and detection target data regarding the devices 30, learns the acquired normal-state data, and performs anomaly detection on the acquired detection target data. For example, the information processing apparatus 10 acquires logs and the like of communications that are performed between the external network 40 and the devices 30 and that pass through the gateway 20. The devices 30 each may be, for example, an IoT device, such as a surveillance camera or a wearable device. For example, in the case where a device 30 is a surveillance camera, the information processing apparatus 10 can acquire traffic data at the time when the resolution of the surveillance camera is changed, as normal-state data.
Next, the configuration of the information processing apparatus 10 will be described using
The input/output unit 11 receives data input from a user. Examples of the input/output unit 11 include input devices, such as a mouse and a keyboard, and display devices, such as a display and a touch screen. The communication unit 12 performs data communication with other apparatuses via a network. For example, the communication unit 12 is an NIC (Network Interface Card). The communication unit 12 performs data communication with the gateway 20, for example.
The storage unit 14 is a storage device, such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or an optical disk. Note that the storage unit 14 may also be a data-rewritable semiconductor memory, such as a RAM (Random Access Memory), a flash memory, or an NVSRAM (Non Volatile Static Random Access Memory). The storage unit 14 stores an OS (Operating System) and various programs that are executed by the information processing apparatus 10. Furthermore, the storage unit 14 stores various kinds of information that are used to execute the programs. In addition, the storage unit 14 has a learned model storage unit 14a. The learned model storage unit 14a stores parameters and the like of learned models.
The control unit 13 controls the entire information processing apparatus 10. The control unit 13 is, for example, an electronic circuit, such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a TPU (Tensor Processing Unit), or an MPU (MicroProcessing Unit), or an integrated circuit, such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The control unit 13 has an internal memory for storing programs that specify various processing procedures, as well as control data, and executes processing using the internal memory. The control unit 13 functions as various processing units when the various programs run. For example, the control unit 13 has an acquisition unit 13a, a calculation unit 13b, a division unit 13c, a creation unit 13d, and a detection unit 13e.
The acquisition unit 13a acquires traffic data as learning data or detection target data. For example, the acquisition unit 13a may acquire traffic data from the devices 30 in real time, or may be configured to acquire traffic data that is input automatically or manually at predetermined times.
Here, a specific example of the traffic data acquired by the acquisition unit 13a will be described using
The calculation unit 13b calculates, with respect to datasets into which data is divided based on individual labels serving as candidates for an index when the data is divided, the amount of information for each of division methods that use the respective labels. For example, upon receiving the traffic data acquired by the acquisition unit 13a, the calculation unit 13b creates a list of labels serving as the candidates for division. Note that the label list may be set manually in advance.
Then, the calculation unit 13b, for example, calculates a mutual information score with respect to a label f using equation (1) below. Hereinafter, let “f” denote a label, and “v_f” be a value taken by the label f. Note that, although the second term requires a high calculation cost, it is a common term that does not depend on f and therefore may be ignored in the calculation here.
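A standard decomposition of the amount of mutual information that is consistent with the description above (a first term that depends on the label f, and a second term that is common to all labels) can be written as follows. The exact form shown here is an assumption, and the entropy symbol H is introduced only for illustration.

```latex
\begin{align}
I(x, v_f) &= -\sum_{v_f} p(v_f)\, H(x \mid v_f) + H(x), \\
H(x \mid v_f) &= -\int p(x \mid v_f) \log p(x \mid v_f)\, dx, \qquad
H(x) = -\int p(x) \log p(x)\, dx .
\end{align}
```

Under this form, comparing labels only requires the first term, since H(x) is identical for every candidate label f.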
Note that it is assumed that the distribution of “x|v_f” in the calculation of the amount of mutual information is already known. For estimation of the distribution of “x|v_f”, a VAE (Variational AutoEncoder) may be used as a method for performing probability density estimation from sampling (see Reference 1 below).
However, when the calculation unit 13b estimates the distribution of “x|v_f” using the VAE, the calculation is costly. For this reason, MINE (Mutual Information Neural Estimation), which is a method for calculating the amount of mutual information from sampling, may be used (see Reference 2 below). The calculation unit 13b may be configured to calculate the amount of mutual information for each label using the MINE. Since the calculation unit 13b can calculate the amount of mutual information for each label using the MINE without involving estimation of the probability distribution p(x) from a dataset x, the calculation cost can be reduced.
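For orientation, MINE is commonly formulated as maximizing the Donsker-Varadhan lower bound on mutual information over the parameters of a neural network. The symbols T_θ (the network) and θ (its parameters) are not used elsewhere in this description and are introduced here only for illustration.

```latex
\begin{equation}
I(x, v_f) \;\ge\; \sup_{\theta}\;
\mathbb{E}_{p(x, v_f)}\!\left[ T_\theta(x, v_f) \right]
- \log \mathbb{E}_{p(x)\,p(v_f)}\!\left[ e^{T_\theta(x, v_f)} \right]
\end{equation}
```

Both expectations can be approximated from samples: the first from observed pairs of data and label value, the second from pairs in which the label values have been shuffled. This is why no explicit estimate of p(x) is needed.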
The division unit 13c divides the data into a plurality of datasets based on the division method that provides the highest amount of information, of the amounts of information calculated by the calculation unit 13b. For example, when there exist division methods f1 and f2 using respective labels, the division unit 13c compares I(x, v_f1) and I(x, v_f2) and divides the data based on the label that provides the higher amount of information. That is to say, the division unit 13c divides the data into datasets, one for each value v_f taken by the selected label. Note that a label, for example, f1, is not limited to a label consisting of a single item, such as Src IP, and may also be constituted by a tuple, such as (Src IP, Dst Port). In addition, when the difference between the scores of the amount of information of the labels calculated by the calculation unit 13b is small, the division unit 13c may divide the data into large datasets such that the number of models is small.
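The selection and division described above can be sketched as follows. This is a minimal illustration, not the apparatus itself: it replaces the VAE or MINE estimate with a simple plug-in entropy over a discretized feature, and the record fields (src_ip, dst_port, size), the function names, and the sample records are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def label_value(record, label):
    """Return the value v_f of a label; a label may be one field or a tuple of fields."""
    if isinstance(label, tuple):
        return tuple(record[k] for k in label)
    return record[label]

def entropy(values):
    """Plug-in Shannon entropy (in nats) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def score(records, label, feature):
    """Label-dependent part of the mutual information: -sum_v p(v) H(x | v).
    The term H(x) is omitted because it is the same for every label."""
    groups = defaultdict(list)
    for r in records:
        groups[label_value(r, label)].append(r[feature])
    n = len(records)
    return -sum(len(g) / n * entropy(g) for g in groups.values())

def divide(records, candidates, feature):
    """Pick the candidate label with the highest score and split the records by its values."""
    best = max(candidates, key=lambda f: score(records, f, feature))
    datasets = defaultdict(list)
    for r in records:
        datasets[label_value(r, best)].append(r)
    return best, dict(datasets)

# Hypothetical traffic records: packet size is determined by the source address,
# not by the destination port.
records = [
    {"src_ip": "10.0.0.1", "dst_port": 80, "size": "small"},
    {"src_ip": "10.0.0.1", "dst_port": 443, "size": "small"},
    {"src_ip": "10.0.0.2", "dst_port": 80, "size": "large"},
    {"src_ip": "10.0.0.2", "dst_port": 443, "size": "large"},
]
best, datasets = divide(records, ["dst_port", "src_ip"], "size")
```

A tuple label such as ("src_ip", "dst_port") can be passed as a candidate in the same way. With these records it would tie with "src_ip", which corresponds to the case noted above where the coarser division with fewer models may be preferred.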
The creation unit 13d creates, with use of the datasets divided by the division unit 13c, a learned model for each dataset. For example, the creation unit 13d generates, for each of the divided datasets, a learned model for estimating the probability distribution p(x) from a dataset x by probability density estimation, and stores the learned model in the learned model storage unit 14a. Note that p(x) may be a logarithm, such as log p(x).
The detection unit 13e estimates the probability of occurrence of detection target data using the learned models learned by the creation unit 13d, and if the probability of occurrence is lower than a predetermined threshold value, the detection unit 13e detects an anomaly.
For example, when the acquisition unit 13a has acquired new data x′, the detection unit 13e calculates the occurrence probability p(x′) using the learned models, and then outputs a report regarding an anomaly, or outputs an alert, if the occurrence probability p(x′) is lower than a preset threshold value.
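The per-dataset model creation and threshold-based detection described above can be sketched as follows. As an assumption made only to keep the example short, a one-dimensional Gaussian fit stands in for the probability density estimation; the dataset keys, sample values, function names, and threshold are hypothetical.

```python
import math
from statistics import mean, pstdev

def fit_gaussian(samples):
    """Create a 'learned model' (mean, standard deviation) for one dataset."""
    mu = mean(samples)
    sigma = max(pstdev(samples), 1e-6)  # guard against zero variance
    return mu, sigma

def log_p(x, model):
    """Log occurrence probability density log p(x) under the fitted Gaussian."""
    mu, sigma = model
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def is_anomaly(x, model, threshold):
    """Report an anomaly when log p(x) falls below the preset threshold."""
    return log_p(x, model) < threshold

# Hypothetical normal-state data (e.g. bytes per packet), already divided by source address.
datasets = {
    "10.0.0.1": [100.0, 102.0, 98.0, 101.0, 99.0],
    "10.0.0.2": [1500.0, 1490.0, 1510.0, 1505.0, 1495.0],
}
models = {key: fit_gaussian(samples) for key, samples in datasets.items()}
THRESHOLD = -10.0  # hypothetical threshold on log p(x')
```

In this sketch a value of 1500.0 is flagged when it arrives from 10.0.0.1 but not when it arrives from 10.0.0.2, which illustrates why a separate model per dataset can improve detection performance.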
[Processing Procedures of Information Processing Apparatus]
Next, an example of processing procedures of the information processing apparatus 10 according to the first embodiment will be described using
As illustrated in
Subsequently, the division unit 13c divides the data based on the label of the division method that provides the highest score (step S104). After that, the creation unit 13d creates a learned model for each dataset (step S105).
As described above, the information processing apparatus 10 according to the first embodiment calculates, with respect to datasets into which data is divided based on individual labels serving as candidates for an index based on which the data is to be divided, the amount of information for each of division methods that use the respective labels. Then, the information processing apparatus 10 divides the data into a plurality of datasets based on the division method that provides the highest amount of information of the calculated amounts of information. Next, with use of the thus divided datasets, the information processing apparatus 10 creates a learned model for each dataset. Therefore, the information processing apparatus 10 can determine an appropriate data division method at a low learning cost.
Moreover, the information processing apparatus 10 according to the first embodiment calculates the amount of mutual information for each label using the MINE, and can therefore calculate the amount of mutual information for each label without involving estimation of the probability distribution p(x) from a dataset x. Thus, the information processing apparatus 10 can reduce the calculation cost.
Moreover, the information processing apparatus 10 according to the first embodiment estimates the probability of occurrence of detection target data using the learned models created by the creation unit 13d, and if the probability of occurrence is lower than a predetermined threshold value, the information processing apparatus 10 detects an anomaly. Thus, the information processing apparatus 10 can detect an anomaly in, for example, an IoT device with high accuracy.
Here, with use of
In the first embodiment above, a case has been described in which the information processing apparatus 10 has the acquisition unit 13a, the calculation unit 13b, the division unit 13c, the creation unit 13d, and the detection unit 13e; however, the present invention is not limited to this, and the functions of the various units may be distributed to a plurality of apparatuses. Here, a detection system according to another embodiment will be described using
The acquisition unit 110 of the data acquiring apparatus 100 acquires traffic data as learning data or detection target data. Upon acquiring the data, the acquisition unit 110 sends the acquired data to the score calculator 200. If detection target data is acquired, the acquisition unit 110 sends the acquired detection target data to the detector 400.
Upon receiving the traffic data, the calculation unit 210 of the score calculator 200 creates a list of labels serving as candidates for division. Then, as in the first embodiment, the calculation unit 210 calculates the mutual information scores and sends the calculated scores to the data acquiring apparatus 100.
Upon receiving the calculated scores, the division unit 120 of the data acquiring apparatus 100 divides the data into a plurality of datasets based on the division method that provides the highest amount of information, of the calculated amounts of information. Then, the division unit 120 sends the datasets to the learning machine 300.
Upon receiving the datasets, the creation unit 310 of the learning machine 300 creates, with use of the received datasets, a learned model for each dataset. Then, the creation unit 310 sends the created learned models to the detector 400.
The detection unit 410 of the detector 400, with use of the learned models created by the creation unit 310, estimates the probability of occurrence of detection target data newly acquired by the acquisition unit 110, and if the probability of occurrence is lower than a predetermined threshold value, the detection unit 410 detects an anomaly.
As described above, in the detection system according to the other embodiment, the plurality of apparatuses have the functional units (the acquisition unit 110, the division unit 120, the calculation unit 210, the creation unit 310, and the detection unit 410) in a distributed manner. The detection system according to the other embodiment achieves similar effects to those of the first embodiment.
[System Configuration, Etc.]
The components of the apparatuses illustrated in the drawings are conceptual representations of functions, and need not be physically configured in the manner illustrated in the drawings. In other words, specific forms of distribution and integration of the apparatuses are not limited to those illustrated in the drawings, and the entirety or a portion of the individual apparatuses may be functionally or physically distributed or integrated in suitable units depending on various loads or use conditions. Furthermore, all or any suitable part of the processing functions implemented by the apparatuses may be realized by a CPU and a program analyzed and executed by the CPU, or may be realized by hardware using wired logic.
Moreover, of the processing steps described herein in the embodiments, all or part of the processing steps that have been described as being performed automatically may also be performed manually.
Alternatively, all or part of the processing steps that have been described as being performed manually may also be performed automatically using a known method. In addition, the processing procedures, control procedures, specific names, and information including various kinds of data and parameters described hereinabove or illustrated in the drawings can be suitably changed unless otherwise stated.
[Program]
It is also possible to create a program that describes processing executed by the information processing apparatus described in the foregoing embodiment and is written in a computer-executable language. For example, it is also possible to create a creation program that describes processing executed by the information processing apparatus 10 according to the embodiment and is written in a computer-executable language. In this case, similar effects to those of the foregoing embodiment can be achieved by a computer executing the creation program. Furthermore, processing similar to that of the foregoing embodiment may be also realized by recording the creation program in a computer-readable recording medium, and causing a computer to load and execute the creation program recorded in this recording medium.
As illustrated in
Here, as illustrated in
Moreover, the various kinds of data described in the foregoing embodiments are stored as program data in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 loads the program module 1093 or the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary, and executes various processing procedures.
Note that the program module 1093 and the program data 1094 related to the creation program need not be stored in the hard disk drive 1090, and may also be stored in, for example, a removable storage medium and loaded by the CPU 1020 via a disk drive or the like. Alternatively, the program module 1093 and the program data 1094 related to the creation program may also be stored in another computer that is connected via a network (a LAN (Local Area Network), a WAN (Wide Area Network), or the like) and loaded by the CPU 1020 via the network interface 1070.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/019963 | 5/20/2019 | WO | 00 |