This application is based upon and claims the benefit of priority of the prior Japanese Patent application No. 2022-108028, filed on Jul. 4, 2022, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a computer-readable recording medium having stored therein a detection program, a detection apparatus, and a detection method.
Network-based intrusion detection systems (NIDSs) are well known. Such systems issue alerts when any anomalous communication that coincides with a detection rule is detected. However, a group of alerts (alert group) issued by an NIDS may include false positive alerts issued when normal communications are misdetected as anomalous communications.
Anomaly detection techniques for detecting suspicious (e.g., anomalous) terminals (anomaly hosts) using an alert group including false positive alerts are also well known. For example, an anomaly detection technique uses an alert group including false positive alerts to train a machine learning model so that states in which an NIDS misjudges communications as anomalous communications despite being normal communications are regarded as normal states (states without attacks). Upon screening of an alert group, the anomaly detection technique inputs an alert into the trained machine learning model, and determines that a communication pertaining to that alert is an anomalous communication (the alert is not a false positive alert) when the inference result acquired from the input alert indicates a deviation from normal states. The anomaly detection technique then identifies a terminal that has performed the anomalous communication as an anomaly host.
In the anomaly detection technique, to keep up with updates to anomalous communication techniques, such as attacks, it is important to acquire new training data over time and retrain the machine learning model. In such an anomaly detection technique, at least the alert group related to an anomaly host is discarded from the alert group issued by the NIDS. Because alerts related to the anomaly host also include alerts on normal communications (false positive alerts), discarding the entire alert group related to the anomaly host may make it difficult to ensure a sufficient volume of training data, which may affect the accuracy of the model.
On the other hand, when all alerts in an alert group issued by the NIDS are used as training data, dangerous cyber attacks (hereinafter referred to as “attacks”) gradually become undetectable if alerts related to such attacks are added. This is because if the training data includes many alerts of similar types, those alerts are no longer “abnormal” alerts (they are regarded as normal communications). As a result, alerts indicating actual attacks might not be detected as “attacks”.
Hence, the training accuracy of a machine learning model involved in anomaly detection may decline, and the attack detection capability may be reduced.
According to an aspect of the embodiments, a non-transitory computer-readable recording medium having stored therein a detection program for causing a computer to execute a process including: identifying a terminal performing an anomalous communication, based on a machine learning model trained with normal communications that satisfy a certain condition as training data; and adding, to the training data used for the training of the machine learning model, at least one alert of a first alert group related to a plurality of communications performed by the identified terminal, the at least one alert being identified based on a degree of contribution of a feature included in the alert to the identification.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
Hereinafter, an embodiment of the present disclosure will be described with reference to the drawings. Note that the embodiment that will be described is merely exemplary, and it is not intended to exclude various variations or applications of techniques that are not described. For example, the present embodiment may be modified in various forms without departing from the scope thereof. In the drawings used in the following description, elements denoted by the like reference symbols denote the same or similar elements unless otherwise stated.
Hereinafter, a technique for improving the learning accuracy of a machine learning model for detecting anomalous communications in anomaly detection will be described with reference to an anomaly detection apparatus 100 according to one embodiment (see
An anomaly detection apparatus 100 according to one embodiment represents one example of a detection apparatus. The anomaly detection apparatus 100 may be an NIDS per se or an apparatus connected to an NIDS, as will be described below. The anomaly detection apparatus 100 may be a virtual server (VM; Virtual Machine), or may be a physical server. The functions of the anomaly detection apparatus 100 may be embodied by a single computer, or may be embodied by two or more computers. Furthermore, at least some of the functions of the anomaly detection apparatus 100 may be embodied through hardware (HW) resources and network (NW) resources provided by a cloud environment.
As illustrated in
The processor 10a represents one example of a processing unit configured to perform a wide variety of controls and computations. The processor 10a may be communicatively connected to each block in the computer 10 via a bus 10i. The processor 10a may be a multiprocessor including a plurality of processors, may be a multi-core processor including a plurality of processor cores, or may have a configuration including a plurality of multi-core processors.
Examples of the processor 10a include an integrated circuit (IC), such as a CPU, an MPU, a GPU, an APU, a DSP, an ASIC, and an FPGA, for example. A combination of two or more of these integrated circuits may be used as the processor 10a. CPU is an abbreviation for Central Processing Unit, and MPU is an abbreviation for Micro Processing Unit. GPU is an abbreviation for Graphics Processing Unit, and APU is an abbreviation for Accelerated Processing Unit. DSP is an abbreviation for Digital Signal Processor, ASIC is an abbreviation for Application Specific IC, and FPGA is an abbreviation for Field-Programmable Gate Array.
For example, in the case where the anomaly detection apparatus 100 performs a machine learning process in addition to an attack detection process according to one embodiment, the processor 10a may be a combination of a processing unit, such as a CPU, which performs the attack detection process, and an accelerator which performs the machine learning process. Examples of the accelerator include the GPU, APU, DSP, ASIC, and FPGA described above, for example.
The memory 10b represents one example of HW configured to store information, such as a wide variety of data and programs. Examples of the memory 10b include either or both of a volatile memory, such as a Dynamic Random Access Memory (DRAM), and a nonvolatile memory, such as a Persistent Memory (PM).
The storage 10c represents one example of HW configured to store information, such as a wide variety of data and programs. Examples of the storage 10c include a wide variety of storage devices, such as a magnetic disk device, e.g., a Hard Disk Drive (HDD), a semiconductor drive device, e.g., a Solid State Drive (SSD), and a nonvolatile memory. Examples of the nonvolatile memory include a flash memory, a Storage Class Memory (SCM), and a Read Only Memory (ROM).
The storage 10c may store a program 10g (detection program) that embodies all or a part of the various functions of the computer 10.
For example, the processor 10a in the anomaly detection apparatus 100 can embody the functions of the anomaly detection apparatus 100 (control unit 110 exemplified in
The IF unit 10d represents one example of a communication IF configured to perform controls on connections and communications between the anomaly detection apparatus 100 and various networks, including networks between unillustrated devices. Examples of such devices include network devices that provide the anomaly detection apparatus 100 with data, computers such as servers which perform machine learning processes based on data output from the anomaly detection apparatus 100, and user (operator) terminals that receive alerts, for example.
For example, the IF unit 10d may include an adapter compliant with a local area network (LAN) standard such as the Ethernet® or an optical communication standard such as the Fibre Channel (FC). The adapter may support either or both of wireless and wired communication techniques.
Note that the program 10g may be downloaded to the computer 10 from the network via the communication IF and stored in the storage 10c.
The IO unit 10e may include either or both of an input device and an output device. Examples of the input device include a keyboard, a mouse, and a touch panel, for example. Examples of the output device include a monitor, a projector, and a printer, for example. Furthermore, the IO unit 10e may include a touch panel or any other device in which an input device and a display device are integrated.
The reader 10f represents one example of a reader that reads data and information of a program recorded on a recording medium 10h. The reader 10f may include a connection terminal or device to which the recording medium 10h can be connected or inserted. Examples of the reader 10f include an adapter compliant with the Universal Serial Bus (USB) or any of other standards, a drive device for accessing a recording disk, and a card reader for accessing a flash memory, such as an SD card, for example. Note that the recording medium 10h may store the program 10g, and the reader 10f may read the program 10g from the recording medium 10h and store it in the storage 10c.
Examples of the recording medium 10h may include non-transitory computer-readable recording media, such as magnetic/optical disks and flash memories, for example. Examples of magnetic/optical disks may include flexible disks, compact discs (CDs), digital versatile discs (DVDs), Blu-ray discs, and holographic versatile discs (HVDs), for example. Examples of flash memories may include USB memories and SD cards, for example.
The above-described HW configuration of the computer 10 is merely exemplary. Accordingly, in the computer 10, HW may be added or omitted (e.g., any blocks may be added or omitted), divided, or combined in any combinations, or a bus may be added or omitted, where it is deemed appropriate.
In
The network apparatus 2 may be a communication apparatus, such as a router or a hub. The NIDS 1 is communicatively connected to the network apparatus 2. The NIDS 1 acquires communication data sent from and received by the network apparatus 2 through mirroring. The NIDS 1 is a network-based intrusion detection system that analyzes the acquired communication data and issues an alert when any communication that coincides with a detection rule is detected. The NIDS 1 may send the alert to the operator terminal 3 of an operator 3b.
An alert is information indicating a detection of a communication that coincides with the detection rule, and may be, as one example, a warning log indicating occurrence of an attack. For example, an alert group issued by the NIDS 1 may include false positive alerts issued when normal communications are misdetected as anomalous communications. If the number of false positive alerts increases, there is a risk of alert fatigue in which the operator 3b engaging in monitoring becomes desensitized to alerts, delaying the response to an alert on an actual attack.
The anomaly detection apparatus 100 uses an anomaly detection technique for screening of an alert group issued by the NIDS 1. The anomaly detection apparatus 100 may be the NIDS 1 per se, may be an apparatus included in the NIDS 1, or may be an apparatus connected to the NIDS 1 via a network.
The anomaly detection technique is a technique that learns states in which the NIDS 1 misjudges normal communications as anomalous communications, so that states (anomalies) deviating from such states become detectable. The anomaly detection apparatus 100 has a control unit 110.
The control unit 110 uses the training data to implement the learning process (training process) in machine learning. In other words, the anomaly detection apparatus 100 trains an anomaly detection model 120 (machine learning model) by the control unit 110.
The control unit 110 identifies an anomaly host 5 performing an anomalous (abnormal) communication based on the anomaly detection model 120 that has been already trained. The control unit 110 identifies a first alert group related to a plurality of communications performed by the anomaly host 5. The control unit 110 may perform a control to notify the operator terminal 3 of information on the anomaly host 5. The control unit 110 adds, to the training data used for training of the anomaly detection model 120, at least one alert of the first alert group. The process by the control unit 110 will be described later.
The training alert group 21 includes false positive alerts. The training alert group 21 may include false positive alerts at a percentage of preferably 80% or more, more preferably 90% or more. All alerts in the training alert group 21 may be false positive alerts. In selection of the training alert group 21, alerts related to truly anomalous communications (e.g., attacks) may be excluded. When the model is trained for the first time, the training alert group 21 may be a real alert group, which is a set of alerts issued by the NIDS 1 when the NIDS 1 determines that the corresponding communications are anomalous, or may be training data acquired from an external source. In the second and subsequent training of the anomaly detection model 120, a third alert group identified in an inference phase is added to the training alert group 21, and the updated training alert group 21 is used. The third alert group will be discussed later.
The control unit 110 trains the anomaly detection model 120 based on a false positive alert issued on a communication misjudged by the NIDS 1 as an anomalous communication despite being a normal communication, so that states in which normal communications are misjudged as anomalous communications are regarded as normal states (states without attacks). A communication misdetected as an anomalous communication despite being a normal communication represents one example of a normal communication that satisfies a certain condition.
The anomaly detection apparatus 100 (e.g., the control unit 110) may perform the following processes (1) to (7).
The control unit 110 performs screening of the real alert group 22. When an alert of the real alert group 22 is input into the trained anomaly detection model 120 and the inference result obtained from the input indicates a deviation from normal states, the control unit 110 may determine that the alert is an alert related to a truly anomalous communication (is not a false positive alert). The control unit 110 may then identify (detect) the causal host that is the source of the anomalous communication (e.g., a packet) as an anomaly host 5. Note that the control unit 110 may classify alerts in the real alert group 22 based on source hosts (terminals), and may make determinations on behaviors of communications by each host to detect an anomaly host 5. A plurality of anomaly hosts 5 may be detected through the screening of the real alert group 22.
The detection result on the anomaly host 5 is reported to the operator terminal 3 of the operator 3b. The detection result may include information to specify the anomaly host 5, such as the Internet Protocol (IP) address, as an example. In response to receiving the report, the operator 3b may conduct a hearing with the user of the anomaly host 5, or take any other action. In this manner, according to the anomaly detection apparatus 100, through the screening of the real alert group 22 output from the NIDS 1, an alert related to a communication that is truly anomalous (is not a false positive alert) is extracted from the real alert group 22 and a notification to the operator terminal 3 is made. This reduces the frequency of alerts received by the operator 3b.
The control unit 110 identifies a first alert group 23 related to a plurality of communications performed by the identified anomaly host 5. The first alert group 23 may include a plurality of first alerts 23a. The first alert group 23 is also referred to as anomaly host alert group.
The control unit 110 calculates the degree of contribution made by a feature included in each of the first alerts 23a, to the identification in the process (3) of identifying the first alerts 23a. The degree of contribution is an indicator of the degree of the basis provided by the feature in the identification of the anomaly host 5 as a host performing an anomalous communication.
The control unit 110 identifies one or more second alerts 24 which are one or more alerts in the first alert group 23 and which have calculated degrees of contribution satisfying a certain first condition. The second alert(s) 24 represents one example of a second alert. When more than one second alerts 24 are present, the second alerts 24 may be referred to as a second alert group.
The control unit 110 identifies a third alert group 25, the third alert group 25 being the first alert group 23 excluding the second alert(s) 24. The third alert group 25 represents one example of a third alert.
Note that the control unit 110 may identify, from the first alert group 23, a third alert group 25 composed of a plurality of alerts having degrees of contribution satisfying a certain condition. The certain condition may be the opposite of the first condition. In this case, the control unit 110 may omit performing the identification process of second alerts described in (5) above, or may perform the identification process of the third alert group 25 in parallel with the identification process of second alerts. Identifications of the second alert 24 and the third alert group 25 will be described later.
The control unit 110 adds at least one alert of the third alert group 25 to the training alert group 21. This allows the at least one alert of the third alert group 25 related to a communication by the anomaly host 5 to be reused as training data.
Now referring to
As illustrated in
In the inference phase, the data acquisition unit 121 acquires a real alert group 22 (see
The anomaly host detection unit 122 identifies an anomaly host 5 performing an anomalous communication based on an anomaly detection model 120 that has been already trained. The detection result transmission unit 123 transmits the detection result on the anomaly host 5 to the operator terminal 3 of the operator 3b or the like.
In the inference step, the anomaly host detection unit 122 acquires a predicted value for each terminal (host) based on the inference result acquired through the input of the real alert group 22 into the anomaly detection model 120. In the example in
The anomaly host detection unit 122 may detect (identify) a terminal (host) which has a predicted value equal to or greater than a certain threshold value x, as an anomaly host 5. Alternatively, the anomaly host detection unit 122 may detect (identify) X terminal(s) (host(s)) (X is an integer of 1 or more) having highest predicted values from the top, as anomaly host(s) 5. The detection of the anomaly hosts 5 may involve identifying the IP addresses of the possible anomaly hosts 5, for example.
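As one non-limiting illustration of the threshold-based and top-X selection described above, the following sketch may be considered. The function and variable names, and the mapping from IP addresses to predicted values, are hypothetical and are not part of the embodiment.

```python
def detect_anomaly_hosts(predicted, threshold=None, top_x=None):
    """Return IP addresses of hosts judged anomalous.

    Either select every host whose predicted value is equal to or greater
    than `threshold`, or select the `top_x` hosts having the highest
    predicted values from the top.
    """
    if threshold is not None:
        return [ip for ip, value in predicted.items() if value >= threshold]
    # Rank hosts by predicted value in descending order and keep the top X.
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    return ranked[:top_x]
```

For example, `detect_anomaly_hosts({"10.0.0.5": 0.92, "10.0.0.7": 0.31}, threshold=0.8)` would identify the host at 10.0.0.5 as an anomaly host.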
Referring back to
As illustrated in
The first alert group identification unit 124 identifies, from the real alert group 22, alerts of which source (host) IP address coincides with the IP address of the anomaly host 5, as a first alert group 23. In the example in
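The matching of source IP addresses described above may be sketched as follows. The alert representation (a dictionary with a hypothetical `src_ip` key) is an assumption for illustration only.

```python
def identify_first_alert_group(real_alert_group, anomaly_host_ip):
    """Collect, as the first alert group, every alert whose source IP
    address coincides with the IP address of the anomaly host."""
    return [alert for alert in real_alert_group
            if alert["src_ip"] == anomaly_host_ip]
```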
Referring back to
The alert identification unit 126 identifies at least one alert of the first alert group 23 as an alert to be added to the training alert group 21 (third alert group 25), the at least one alert being identified based on the degrees of contribution 240 of the features 231 included in the first alert 23a.
The third alert group addition unit 127 adds at least one alert of the third alert group 25 to the training alert group 21. Note that the third alert group addition unit 127 may add all alerts in the third alert group 25 to the training alert group 21.
In the first alert 23a illustrated in
The degree of contribution calculation unit 125 calculates the degree of contribution 240 of each of the plurality of features 231. The first alert 23a illustrated in
For example, the alert identification unit 126 may add, to the third alert group 25, a first alert 23a of the first alert group 23 in which the degree of contribution 240 of every feature 231 satisfies the certain condition. The certain condition is, for example, that the degree of contribution 240 of a feature 231 is less than a certain threshold value.
In the first alert 23a in
In the method of identifying the second alerts 24 and the third alert group 25, a degree of contribution 240 being equal to or greater than a certain threshold value means that the corresponding feature 231 is likely to have contributed to identification of a source terminal of a communication as an anomaly host 5. A greater degree of contribution 240 indicates that the degree of contribution is high. Conversely, a degree of contribution 240 less than the certain threshold value means that the corresponding feature 231 is less likely to have contributed to identification of a source terminal of the communication as an anomaly host 5.
In
In
In one method of identifying the third alert group 25 by the alert identification unit 126, the alert identification unit 126 deletes the second alert 24 from the first alert group 23. As a result, the first alert group 23 including a first alert(s) remaining undeleted is identified as the third alert group 25. In
In another method of identifying the third alert group 25 by the alert identification unit 126, the alert identification unit 126 may identify, from the first alert group 23, a third alert group 25 composed of a plurality of alerts where the degrees of contribution 240 satisfy a certain condition. The certain condition may be the opposite of the first condition. The certain condition may be such that the degree of contribution 240 of each feature 231 included in the first alert 23a is less than a certain threshold value. In
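The two identification methods above can be illustrated by a single partitioning sketch: alerts containing at least one feature whose contribution satisfies the first condition become second alerts 24, and the remainder forms the third alert group 25. The alert representation (a `contributions` dictionary per alert) is a hypothetical assumption.

```python
def split_first_alert_group(first_alert_group, threshold):
    """Partition the first alert group: an alert containing at least one
    feature whose degree of contribution is equal to or greater than
    `threshold` is a second alert; all other alerts (where every
    contribution is below the threshold) form the third alert group."""
    second_alerts, third_alert_group = [], []
    for alert in first_alert_group:
        if any(c >= threshold for c in alert["contributions"].values()):
            second_alerts.append(alert)
        else:
            third_alert_group.append(alert)
    return second_alerts, third_alert_group
```

Deleting the second alerts from the first alert group (the first method) and directly collecting the alerts whose contributions are all below the threshold (the second method) yield the same third alert group.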
The method used to calculate the degree of contribution 240 for identifying the third alert group 25 from the first alert group 23 will be described.
In
Cases #1 to #6 indicate patterns of combinations of Features A to C in Steps #1 to #4. In Cases #1 to #6, none of the features 231 is used in Step #1. In Step #2, one of Features A, B, and C is used. In Step #3, a combination of two features, i.e., the feature 231 used in Step #2 and one of the remaining features 231, is used. In Step #4, a combination of three features, i.e., the two features 231 used in Step #3 and the one remaining feature 231, is used.
The degree of contribution calculation unit 125 uses the anomaly detection model 120 to calculate 24 predicted values for Cases #1 to #6 based on the 24 (6×4) patterns of alerts using the features 231 used in Steps #1 to #4. As the number of features 231 increases with an increase in the number of steps, the number of predicted values also increases.
When n features are present in a first alert 23a, the number of cases is n!, and the number of steps is n+1. Hence, the anomaly detection model 120 calculates (n+1)·n! predicted values based on (n+1)·n! patterns of alerts.
In each of Cases #1 to #6, the degree of contribution calculation unit 125 may determine the Shapley value of each feature 231 by calculating the average of the increases in predicted values at the step where the feature 231 is used for the first time. The Shapley value of Feature A is a value of (the sum of the increases in the predicted values of Step #2 in Case #1, Step #2 in Case #2, Step #3 in Case #3, Step #4 in Case #4, Step #3 in Case #5, and Step #4 in Case #6) divided by the number of cases (i.e., 6).
With regard to Feature A, the average of the increases in predicted values at the step where Feature A is used for the first time, i.e., the degree of contribution 240 (Shapley value) of Feature A, is (10+10+10+25+40+20)/6≅19.2. Similarly, the degree of contribution 240 (Shapley value) of Feature B is (10+25+20+20+5+25)/6=17.5, and the degree of contribution 240 (Shapley value) of Feature C is (30+15+20+5+5+5)/6≅13.3. Thus, the descending order of values of the degree of contribution 240 is Features A, B, and C.
The degrees of contribution 240, however, are not limited to Shapley values. As the number of features n increases, the volume of computations for calculating Shapley values increases. Thus, in place of the actual Shapley value, an approximation of the Shapley value may be used as the degree of contribution 240 in actual machine learning. Alternatively, the degree of contribution calculation unit 125 may calculate the degree of contribution 240 of a feature 231 based on the number or the frequency of occurrences.
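The exact permutation-based calculation described above may be sketched as follows. This sketch assumes that the predicted value can be evaluated as a function of the subset of features used (the `predict` callable below is a hypothetical stand-in for the anomaly detection model 120); as noted, an approximation would be preferred in practice when the number of features is large.

```python
from itertools import permutations

def shapley_values(features, predict):
    """Exact Shapley values: for every ordering of the features, record the
    increase in the predicted value at the step where each feature is first
    added, then average those increases over all orderings."""
    totals = {f: 0.0 for f in features}
    orders = list(permutations(features))
    for order in orders:
        used = frozenset()
        previous = predict(used)  # Step #1: no feature is used yet
        for f in order:
            used = used | {f}
            current = predict(used)
            totals[f] += current - previous  # increase when f is first used
            previous = current
    return {f: total / len(orders) for f, total in totals.items()}
```

For an additive model, the Shapley value of each feature equals its weight, which provides a simple sanity check of the sketch.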
In
In order to prevent the degree of contribution 240 of a feature (e.g., IP header length) which has the same value in almost all alerts from becoming too high, such a feature 231 may be excluded in advance.
In place of the number of occurrences of each of the features 231, the degree of contribution 240 may be the frequency of occurrences (the number of occurrences/the sum of the numbers of occurrences) of each feature 231.
In
According to the method illustrated in
The degree of contribution 240 may alternatively be a value determined by dividing the sum of the numbers of occurrences of features of a similar type by the total number of occurrences (“the sum of the numbers of occurrences of the features of the similar type”/“the total number of occurrences”). A feature of the similar type may be a feature having the same item 232, or a feature having the same item 232 and having a value 233 within a predetermined range.
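The occurrence-frequency alternative described above may be sketched as follows. The alert representation (a `features` dictionary mapping each item 232 to its value 233) is a hypothetical assumption for illustration.

```python
from collections import Counter

def occurrence_frequencies(first_alert_group):
    """Frequency-of-occurrence contribution: count how often each
    (item, value) feature pair appears across the alert group, then
    divide each count by the total number of occurrences."""
    counts = Counter()
    for alert in first_alert_group:
        for item, value in alert["features"].items():
            counts[(item, value)] += 1
    total = sum(counts.values())
    return {pair: count / total for pair, count in counts.items()}
```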
Next, an example of the operation of the anomaly detection apparatus 100 according to one embodiment will be described.
As exemplified in
The control unit 110 uses the training alert group 21 to implement the training process (learning process) in machine learning. In other words, the anomaly detection apparatus 100 trains the anomaly detection model 120 (machine learning model) by the control unit 110 (Step S12).
At least one alert from a third alert group 25 is added to the training alert group 21 over time, as will be described below. Accordingly, in the second and subsequent training (retraining) of the anomaly detection model 120, the training alert group 21 to which the alerts have been added will be used.
As exemplified in
The control unit 110 performs an anomaly detection (Step S22). The anomaly host detection unit 122 identifies (detects) an anomaly host 5 performing an anomalous communication based on the anomaly detection model 120 that has been trained. The detection result transmission unit 123 may transmit the detection result on the anomaly host 5 to the operator terminal 3 of the operator 3b or the like. The first alert group identification unit 124 identifies a first alert group 23 related to a plurality of communications performed by the identified anomaly host 5.
The degree of contribution calculation unit 125 calculates, for each first alert 23a included in the first alert group 23, the corresponding degrees of contribution 240 made by the respective features 231 included in the first alert 23a in the identification of the anomaly host 5 (Step S23).
The alert identification unit 126 selects one alert (first alert 23a) from the first alert group 23 of the anomaly host 5 (Step S24).
The alert identification unit 126 determines whether or not the selected alert contains any feature 231 whose degree of contribution 240 satisfies a first condition (Step S25). In one example, the first condition may be that the degree of contribution 240 is equal to or greater than a certain threshold value.
If the selected alert includes a feature 231 that satisfies the first condition (see the YES route in Step S25), for example, if the alert includes a feature 231 having a degree of contribution 240 equal to or greater than a certain threshold value, the process proceeds to Step S26. If the selected alert does not contain any feature 231 that satisfies the first condition (see the NO route in Step S25), for example, if no feature 231 in the alert has a degree of contribution 240 equal to or greater than the certain threshold value, the process proceeds to Step S27.
The process for the YES route in Step S25 represents one example of the process of identifying, from the first alert group 23 related to a plurality of communications performed by the identified anomaly host 5, one or more second alerts 24, each of the one or more second alerts 24 including a feature 231 whose degree of contribution 240 satisfies a certain first condition.
In other words, the process of the NO route in Step S25 represents one example of the process of identifying one or more alerts as a third alert group 25 from the first alert group 23, each alert in the third alert group 25 including only features 231 whose degrees of contribution 240 satisfy the certain condition. The certain condition may be that the degrees of contribution 240 of all features 231 included in an alert are less than a certain threshold value.
In Step S26, the alert identification unit 126 discards the selected alert. The alert to be discarded in Step S26 is the second alert 24. Step S26 represents one example of the process of removing the second alert 24 from the first alert group 23.
In Step S27, the third alert group addition unit 127 adds the selected alert to the training alert group 21 (training data). The process in Step S27 represents one example of the process of adding, to the training alert group 21 (training data) used for training of the anomaly detection model 120 (machine learning model), at least one alert of the third alert group 25, the third alert group 25 being the first alert group 23 excluding the second alert 24.
The alert identification unit 126 determines whether or not any unselected (unprocessed) alert is present in the first alert group 23 of the anomaly host 5 (Step S28). If an unselected alert is present (YES route in Step S28), the process returns to Step S24 and the next alert is selected in Step S24. If no unselected alert is present (NO route in Step S28), the process ends.
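The flow of Steps S24 to S28 described above may be sketched as a single loop. The names and the per-alert `contributions` dictionary are hypothetical assumptions; the threshold stands for the first condition of Step S25.

```python
def update_training_alert_group(training_alert_group, first_alert_group, threshold):
    """Steps S24 to S28: examine each alert of the anomaly host in turn.
    An alert containing a feature whose degree of contribution is equal to
    or greater than the threshold is discarded (Step S26); every other
    alert is appended to the training alert group (Step S27)."""
    for alert in first_alert_group:  # Steps S24/S28: select each alert in turn
        if any(c >= threshold for c in alert["contributions"].values()):
            continue  # Step S26: discard the second alert 24
        training_alert_group.append(alert)  # Step S27: add to training data
    return training_alert_group
```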
In one method according to one embodiment, the computer 10 implements the functions of the anomaly detection apparatus 100. The computer 10 performs the process of identifying the anomaly host 5 performing an anomalous communication, based on a machine learning model trained with normal communications that satisfy a certain condition as training data. The computer 10 performs the process of adding, to the training data for the machine learning model, an alert of the first alert group 23 related to the plurality of communications performed by the identified anomaly host 5, the alert being identified based on a degree of contribution 240 of a feature 231 included in the alert to the identification.
According to the above method, an alert that is identified from the first alert group 23 of the anomaly host 5 in view of the degree of contribution 240 is added to the training alert group 21. This allows at least one alert to be used as training data, rather than discarding the entire first alert group 23 related to the plurality of communications performed by the anomaly host 5. This allows false positive alerts included in the first alert group 23 to be used as the training data, thereby preventing a decrease in the volume of the training data and ensuring that a larger volume of training data becomes available. As a result, the learning accuracy of the machine learning model in detection of anomalous communications can be improved.
Furthermore, the alert that is added to the training data is identified based on the degree of contribution 240 of a feature 231 to the identification. Thus, based on degrees of contribution 240, alerts related to attacks are prevented from being added to the training data. This prevents the attack detection capability from decreasing due to addition of alerts related to attacks to the training data.
Furthermore, according to the method according to one embodiment, the process of adding the alert to the training data includes identifying, from the first alert group 23, one or more second alerts 24 including a feature 231 having a degree of contribution 240 equal to or greater than a certain threshold value. Furthermore, the process of adding the alert to the training data includes adding, to the training data, at least one alert of a third alert group 25, the third alert group 25 being the first alert group 23 excluding the second alerts 24.
A second alert 24 that includes a feature 231 having a degree of contribution 240 equal to or greater than the certain threshold value is prevented from being added to the training data. Thus, alerts related to attacks are prevented from being added to the training data. This prevents the attack detection capability from decreasing due to addition of alerts related to attacks to the training data.
Furthermore, according to the method according to one embodiment, the process of identifying of the one or more second alerts 24 includes a prevention process when a plurality of anomaly hosts 5 are identified in the process of identifying an anomaly host 5. The prevention process prevents an alert including a feature 231 that is common in a plurality of first alert groups 23 related to a plurality of communications performed by the respective anomaly hosts 5, from being identified as a second alert 24.
This prevents an alert whose feature 231 commonly has a high degree of contribution 240 across the alerts of the respective anomaly hosts 5 from being discarded. As a result, only more clearly anomalous alerts are excluded from the training data. This ensures that a larger volume of training data becomes available.
Furthermore, according to a method according to one embodiment, a feature 231 is a combination of an item 232 and the value of that item 232 in each alert in the first alert group 23. The degree of contribution 240 is an indicator of the degree of a basis provided by a feature 231 in identification of an anomaly host 5 as a host performing an anomalous communication.
By taking the combination of the item 232 and the value of that item 232 into consideration, it is possible to prevent an alert related to an attack from being added to the training data, based on an indicator of the degree of a basis in identification of an anomaly host 5 as a host performing an anomalous communication.
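A feature 231 as a combination of an item 232 and its value can be sketched as follows. The field names in the sample alert and the helper name `extract_features` are assumptions for illustration, not the actual data format.

```python
# Illustrative sketch: a feature 231 is the pair (item 232, value of that
# item 232) taken from an alert. Field names here are assumptions.

def extract_features(alert: dict) -> set:
    """Return the set of (item, value) combinations of an alert."""
    return {(item, value) for item, value in alert.items()}

alert = {"destination port": 8080, "IDS signature number": 4567}
features = extract_features(alert)
```

Treating the item and its value together distinguishes, for example, destination port 8080 from destination port 80, so that a degree of contribution 240 can be attached to each distinct combination.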
In the anomaly detection apparatus 100 according to one embodiment, it has been described that a first alert 23a that includes one or more features 231 having a degree of contribution 240 equal to or greater than a certain threshold value is identified as a second alert 24. In other words, the anomaly detection apparatus 100 prevents a first alert 23a that includes one or more features 231 having a degree of contribution 240 equal to or greater than the certain threshold value from being added to the third alert group 25. However, the anomaly detection apparatus 100 is not limited to this case.
As illustrated in (6), when a plurality of anomaly hosts 5 (terminals) are identified by the anomaly host detection unit 122, the prevention unit 128 prevents an alert related to a communication that is commonly performed by the plurality of anomaly hosts 5 (terminals) from being identified as a second alert 24. Specifically, the prevention unit 128 refrains from identifying, as a second alert 24, an alert including a feature 231 that is common in a plurality of first alert groups 23 related to a plurality of communications performed by the respective anomaly hosts 5.
In other words, when a plurality of anomaly hosts 5 are identified by the anomaly host detection unit 122, the prevention unit 128 identifies an alert related to a communication that is commonly performed by the plurality of anomaly hosts 5 as a third alert group 25. Specifically, the prevention unit 128 and the alert identification unit 126 identify, as a third alert group 25, alerts that include a feature 231 that is common in a plurality of first alert groups 23 related to a plurality of communications performed by the respective anomaly hosts 5.
In the illustrated example, Alert #A1, Alert #A2, and Alert #A3 on Anomaly Host #1 include Feature A (destination port=80), Feature B (destination port=8080), and Feature C (destination port=2908 and IDS signature number=4567), respectively.
Alert #B1, Alert #B2, and Alert #B3 on Anomaly Host #2 include Feature D (destination port=8000), Feature B (destination port=8080), and Feature E (destination port=3908 and IDS signature number=7891), respectively.
In the modification, the prevention unit 128 prevents Alert #A2 and Alert #B2, which include the common Feature B, from being identified as second alerts 24. On the other hand, Alerts #A1, #A3, #B1, and #B3, which do not include any common feature 231, are identified as second alerts 24.
The alert identification unit 126 refrains from identifying Alert #A2 and Alert #B2 as second alerts 24. As a result, the alert identification unit 126 does not delete (discard) Alert #A2 and Alert #B2 from the first alert groups 23-1 and 23-2.
The remaining alerts in the first alert group 23 excluding the second alerts 24 compose the third alert group 25.
In the modification, Alert #A2 and Alert #B2 are not included in the second alerts 24, but are included in the third alert group 25. In other words, the prevention unit 128 promotes identification of Alert #A2 and Alert #B2 as the third alert group 25. As a result, Alert #A2 and Alert #B2 included in the third alert group 25 are also added to the training alert group 21.
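The worked example with Anomaly Host #1 and Anomaly Host #2 can be sketched as follows. This is an illustrative sketch, not the actual implementation: the one-feature-per-alert encoding and the helper name `split_alerts` are simplifying assumptions.

```python
# Illustrative sketch of the prevention process: alerts whose feature is
# common to the first alert groups of two or more anomaly hosts are kept
# as the third alert group; the remaining alerts become second alerts.

def split_alerts(first_alert_groups):
    """Given one first alert group per anomaly host, encoded as
    {alert_id: feature}, return (second_alerts, third_alerts)."""
    # Collect features shared by at least two hosts' first alert groups
    feature_sets = [set(group.values()) for group in first_alert_groups]
    common = {f for i, s in enumerate(feature_sets)
              for j, t in enumerate(feature_sets)
              if i < j for f in s & t}
    second, third = [], []
    for group in first_alert_groups:
        for alert_id, feature in group.items():
            (third if feature in common else second).append(alert_id)
    return second, third

# Mirrors the example: Feature B (dst_port=8080) is common to both hosts.
host1 = {"A1": "dst_port=80", "A2": "dst_port=8080", "A3": "dst_port=2908,sig=4567"}
host2 = {"B1": "dst_port=8000", "B2": "dst_port=8080", "B3": "dst_port=3908,sig=7891"}
second, third = split_alerts([host1, host2])
```

Here Alert #A2 and Alert #B2 end up in the third alert group while the remaining four alerts are identified as second alerts, matching the example.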
Alerts on communications sent from a plurality of hosts on a regular basis are likely to be false positives and may not indicate real attacks. Hence, Alert #A2 and Alert #B2, which are likely to be false positive alerts, can be used as training data. However, if an alert including Feature B pertains to a communication typically performed by an attacker 5a, the control unit 110 may apply the method according to one embodiment to the alert including Feature B.
If the selected alert does not include any feature 231 that satisfies the first condition (see NO route in Step S35), the process proceeds to Step S36. If the selected alert contains a feature 231 that satisfies the first condition (see the YES route in Step S35), the process proceeds to Step S37.
In Step S36, the third alert group addition unit 127 adds the selected alert to the training alert group 21 (training data).
In Step S37, the prevention unit 128 determines whether or not the selected alert includes a feature 231 that is common in the plurality of first alert groups 23-1, 23-2, . . . , and 23-M (M is an arbitrary integer) related to the plurality of communications performed by the respective anomaly hosts 5.
If the selected alert does not include any feature 231 that is common in the plurality of first alert groups 23-1, 23-2, . . . , and 23-M (see NO route in Step S37), the process proceeds to Step S38. If the selected alert includes a feature 231 that is common in the plurality of first alert groups 23 (see YES route in Step S37), the process proceeds to Step S36. If a plurality of anomaly hosts 5 have not been identified, that is, if only one anomaly host 5 has been identified, no common feature 231 is present (see NO route in Step S37) and the process proceeds to Step S38.
In Step S38, the alert identification unit 126 discards the selected alert.
The process in the YES route of Step S37 and the process in Step S36 represent one example of the process of preventing an alert including a feature 231 that is common in alerts in the first alert groups 23-1, . . . , and 23-M of the respective anomaly hosts 5 from being identified as a second alert 24. The alert to be discarded in Step S38 is the second alert 24.
In other words, the process in the YES route of Step S37 and the process in Step S36 represent one example of a process of promoting identification of an alert including a feature 231 that is common in alerts in the first alert groups 23-1, . . . , and 23-M of the respective anomaly hosts 5 as a third alert group 25.
The alert identification unit 126 determines whether or not any unselected (unprocessed) alert is present in the first alert group 23 of the anomaly host 5 (Step S39). If any unselected alert is present (YES route in Step S39), the process returns to Step S34 and the next alert is selected in Step S34. If no unselected alert is present (NO route in Step S39), the process ends.
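The flow of Steps S34 through S39 in the modification can be sketched as follows. This is an illustrative sketch under stated assumptions: the alert layout, the threshold value of the first condition, and the helper name `screen_with_prevention` are not from the embodiment itself.

```python
# Illustrative sketch of Steps S34-S39: low-contribution alerts and
# alerts whose high-contribution feature is common to several anomaly
# hosts go to the training data (Step S36); the rest are discarded as
# second alerts (Step S38).

CONTRIB_THRESHOLD = 0.5  # hypothetical threshold for the "first condition"

def screen_with_prevention(first_alert_groups, training_alerts):
    """Process every alert of every anomaly host's first alert group."""
    # Pre-compute features common to two or more first alert groups
    feature_sets = [{f for a in group for f in a["contributions"]}
                    for group in first_alert_groups]
    common = set()
    for i in range(len(feature_sets)):
        for j in range(i + 1, len(feature_sets)):
            common |= feature_sets[i] & feature_sets[j]

    for group in first_alert_groups:
        for alert in group:                               # Steps S34/S39
            high = {f for f, c in alert["contributions"].items()
                    if c >= CONTRIB_THRESHOLD}
            if not high:                                  # Step S35: NO route
                training_alerts.append(alert)             # Step S36
            elif high & common:                           # Step S37: common feature
                training_alerts.append(alert)             # Step S36 (prevention)
            # else: Step S38 -> discarded as a second alert

training = []
g1 = [{"id": "A1", "contributions": {"dst_port=80": 0.9}},
      {"id": "A2", "contributions": {"dst_port=8080": 0.8}}]
g2 = [{"id": "B1", "contributions": {"dst_port=8000": 0.9}},
      {"id": "B2", "contributions": {"dst_port=8080": 0.7}}]
screen_with_prevention([g1, g2], training)
```

In this sketch, the alerts sharing the high-contribution feature `dst_port=8080` are retained as training data, while the host-specific high-contribution alerts are discarded, matching the worked example of the modification.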
As described above, the method according to the modification can achieve effects similar to the effects of the one embodiment, and can also increase the number of alerts identified as a third alert group 25, thus contributing to enrichment of training data.
The technique according to one embodiment described above can be embodied in the following variations and modifications.
For example, the functional blocks included in the anomaly detection apparatus 100 may be combined together in any combination, or one functional block may be divided.
Furthermore, the anomaly detection apparatus 100 may also have a configuration (system) in which a plurality of apparatuses cooperate with each other via a network to implement the process functions. In this case, various servers or apparatuses, such as DB servers, application servers, and web servers, may cooperate with each other via a network to implement the process functions as the anomaly detection apparatus 100.
In one aspect, the present disclosure can improve the learning accuracy of a machine learning model that detects anomalous communications.
Throughout the descriptions, the indefinite article “a” or “an”, or adjective “one” does not exclude a plurality.
All examples and conditional language recited herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2022-108028 | Jul 2022 | JP | national |