An embodiment relates to a cyberattack information analysis program, a cyberattack information analysis method, and an information processing apparatus.
In recent years, cyberattacks such as unauthorized access through a network have become a serious problem. In order to take measures against such cyberattacks, it is important to analyze the huge volume of cyberattack information that is observed every day, specify an attack source, and monitor the attack source.
Related art is disclosed in Japanese Laid-open Patent Publication No. 2015-76863.
According to an aspect of the embodiments, a non-transitory computer-readable recording medium records a cyberattack information analysis program for causing a computer to execute processes of: a collecting process of collecting a plurality of pieces of cyberattack information; a specifying process of analyzing the plurality of pieces of collected cyberattack information, specifying a plurality of addresses of cyberattack sources included in the plurality of pieces of cyberattack information, and specifying a period in which each of the specified addresses of the plurality of cyberattack sources is observed; a determining process of determining an address range or some addresses included in the address range as monitoring targets according to a result of comparing a first period distribution of an observed period corresponding to the plurality of specified addresses and a second period distribution of an observed period for each address range; and an outputting process of outputting information regarding the determined address range or some addresses included in the address range.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
As a technology for specifying an attack source from a huge volume of cyberattack information, a technology is known that extracts a combination of a plurality of attacked destination communication devices having the same attack source communication device, on the basis of a detection time at which a network device detects an attack and a period, including the detection time, in which the attack is performed.
However, there is an extremely high possibility that an IP address used for a cyberattack or the like is a single-use IP address. Even if such a single-use IP address is analyzed, the labor spent on the analysis may be wasted. Therefore, work has been needed to specify the IP addresses that are significant for analysis from among the large number of IP addresses used for cyberattacks, and this work places a heavy burden on an analyzer whose working time is limited by busy daily work.
According to one aspect, a cyberattack information analysis program, a cyberattack information analysis method, and an information processing apparatus that can assist in specifying significant information for analysis of cyberattacks may be provided.
Hereinafter, a cyberattack information analysis program, a cyberattack information analysis method, and an information processing apparatus according to an embodiment will be described with reference to the drawings. Configurations with the same functions in the embodiment are denoted by the same reference signs, and redundant description will be omitted. Note that the cyberattack information analysis program, the cyberattack information analysis method, and the information processing apparatus described in the following embodiment are merely examples, and do not limit the embodiment. Additionally, each of the embodiments below may be appropriately combined unless otherwise contradicted.
As illustrated in
Note that a campaign is a name given to a series of cyberattack activities (a collection of a plurality of cyberattacks) by the same attacker, the same attack force, and the same attack operation. For example, a user (analyzer) inputs a campaign name or a malware name corresponding to a campaign to be analyzed as the target campaign 12. Furthermore, for example, a list of campaign names to be processed may be input as the target campaign 12.
In other words, the cyber threat intelligence 11 is an example of cyberattack information. Furthermore, at the time of STIX version 1.1.1, the cyber threat intelligence 11 is described in an eXtensible Markup Language (XML) format as illustrated in
For example, in an area 11a sandwiched by tags of “Observables”, an observed IP, domain, malware hash value, and the like are described. In an area 11b sandwiched by tags of “Indicators”, information indicating an indicator that characterizes a cyberattack event is individually described. Specifically, in the area 11b, an indicator that characterizes the cyberattack is described together with a tool used to create a detection indicator from a type of the detection indicator, an observable related to the detection indicator, an attack stage phase, a trace, and the like.
Furthermore, in an area 11c sandwiched by tags of “TTPs”, the attack method that is used, for example, spam mail, malware, or a watering hole attack, is described. Furthermore, in an area 11d sandwiched by tags of “Exploit_Targets”, information indicating weak points of the assets targeted in a cyberattack event, such as weak points of the software and the system to be attacked, is individually described from viewpoints such as the vulnerability, the type of vulnerability, settings, and configurations.
Furthermore, in an area 11e sandwiched by tags of “Campaigns”, a name of a series of attacks (campaign) or the like is described. In other words, in the area 11e, information regarding the campaign of the cyberattack is described. By referring to the name of the campaign in the area 11e, it is possible to identify which campaign the cyberattack with respect to the cyber threat intelligence 11 belongs to.
Furthermore, in an area 11f sandwiched by tags of “Threat_Actors”, information regarding a person/organization for contributing to the cyberattack is individually described from viewpoints of a type of the attacker of the cyberattack, a motive of the attacker, a skill of the attacker, an intention of the attacker, or the like. Specifically, in the area 11f, an IP address of an unauthorized access source (attack source), a mail address, or information regarding an account of a social network service is described.
In this way, in the areas 11a to 11f of the cyber threat intelligence 11, together with the name of the campaign indicating the campaign of the cyberattack, the information indicating the feature of the cyberattack such as the observables (IP, domain, hash value, or the like) of the cyberattack or the TTP, that is feature information (detection indicator) of the cyberattack is described. Note that, as a source used to share the cyber threat intelligence 11, there are Open Threat Exchange (OTX) that is provided by AlienVault and can be used for free, and the like. Furthermore, if a platform for managing the cyber threat intelligence 11 is used, it is possible to confirm content of the cyber threat intelligence 11 or to see a relationship between the cyber threat intelligences 11.
Next, the information processing apparatus 1 analyzes the collected cyber threat intelligence 11 and specifies a plurality of addresses (for example, IP address) of cyberattack sources of the target campaign 12. Furthermore, the information processing apparatus 1 specifies a period when each of the specified addresses is observed (hereinafter, referred to as survival period) by analyzing the collected cyber threat intelligence 11.
Next, the information processing apparatus 1 compares an overall distribution of the survival periods corresponding to the plurality of specified addresses and a distribution of survival periods for each address range. Next, the information processing apparatus 1 determines an address range or some addresses included in the address range as a monitoring target according to the comparison result between the overall distribution and the distribution for each address range.
Next, the information processing apparatus 1 outputs information regarding the determined address range or some addresses included in the address range, for example, as an output list 51 in a list format. For example, the information processing apparatus 1 outputs the output list 51 to a monitor 103 (refer to
From the output list 51 that has been output, an analyzer (user) can easily find, as monitoring targets, an address range in which the survival periods of the attack source addresses are distributed differently from the overall distribution, and some addresses included in that address range.
Next, details of the information processing apparatus 1 will be described. The information processing apparatus 1 includes a preprocessing unit 20, a survival period learning unit 30, a detection unit 40, and an output unit 50.
The preprocessing unit 20 receives an input of the target campaign 12, collects the cyber threat intelligence 11 corresponding to the target campaign 12 among the plurality of cyber threat intelligences 11 stored in the cyber threat intelligence DB 10, and executes preprocessing. In other words, the preprocessing unit 20 is an example of a collection unit.
Specifically, the preprocessing unit 20 collects the cyber threat intelligence 11 corresponding to the target campaign 12 from among the plurality of cyber threat intelligences 11 stored in the cyber threat intelligence DB 10 and executes the preprocessing, and stores the data on which the preprocessing has been executed in IP address group information 21 and cyber threat intelligence and IP address group information 22.
For example, the preprocessing unit 20 extracts an IP address such as “XXX.XXX.XXX.XXX” or “YYY.YYY.YYY.YYY” from a part sandwiched by tags of “AddressObj:Address_Value”. Similarly, the preprocessing unit 20 extracts an attack method from a part sandwiched by tags of the tactics, techniques, and procedures (TTPs). Furthermore, the preprocessing unit 20 extracts courses of action from a part sandwiched by tags of the courses of action (Courses_Of_Action). Furthermore, the preprocessing unit 20 extracts the vulnerability to be used from a part sandwiched by tags of the attack target (Exploit_Targets). Furthermore, the preprocessing unit 20 extracts a name of a campaign from a part sandwiched by tags of the campaign. Note that, in a case where no data exists, it is assumed that no information exists. Furthermore, in a case where a title of the cyber threat intelligence 11 includes a time stamp (time information) such as “report for certain malware, period”, the time information is extracted.
Next, the preprocessing unit 20 determines whether or not the cyber threat intelligence 11 is related to the target campaign 12 on the basis of the element extracted from the cyber threat intelligence 11 (S11). Specifically, the preprocessing unit 20 determines whether or not the cyber threat intelligence 11 is targeted on the target campaign 12 on the basis of whether or not the campaign name in the element extracted from the cyber threat intelligence 11 matches the campaign name of the target campaign 12.
In a case where the cyber threat intelligence 11 is targeted on the target campaign 12 (S11: YES), the preprocessing unit 20 stores an IP address when the IP address indicating the attack source extracted from the cyber threat intelligence 11 is not stored in the IP address group information 21. Furthermore, the preprocessing unit 20 stores the IP address extracted from the cyber threat intelligence 11 in association with an ID indicating the cyber threat intelligence 11 in the cyber threat intelligence and IP address group information 22 (S12).
Note that, in a case where the cyber threat intelligence 11 is not targeted on the target campaign 12 (S11: NO), the preprocessing unit 20 skips the process in S12 and proceeds to S13.
Next, the preprocessing unit 20 determines whether or not an unselected cyber threat intelligence 11 exists as an element to be extracted in the cyber threat intelligence DB 10 (S13). In a case where the unselected cyber threat intelligence 11 exists (S13: YES), the preprocessing unit 20 selects the unselected cyber threat intelligence 11 as the element to be extracted and returns the process to S10. In a case where the unselected cyber threat intelligence 11 does not exist (S13: NO), the process on all the cyber threat intelligences 11 is completed. Therefore, the preprocessing unit 20 terminates the preprocessing.
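The preprocessing flow (S10 to S13) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names, the tuple layout of the reports argument, and the idea that the campaign name has already been extracted from each report are all hypothetical. Only the Address_Value tag name is taken from the description above.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Reduce a namespaced tag such as '{uri}Address_Value' to 'Address_Value'."""
    return tag.rsplit("}", 1)[-1]


def extract_addresses(xml_text):
    """Return the text of every element whose local name is Address_Value."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if local_name(el.tag) == "Address_Value" and el.text and el.text.strip()
    ]


def preprocess(reports, target_campaign):
    """Collect attack source addresses for one campaign.

    reports: iterable of (report_id, campaign_name, xml_text) tuples
             (hypothetical layout chosen for this sketch).
    Returns (ip_group, report_to_ips), which stand in for the IP address
    group information 21 and the cyber threat intelligence and IP address
    group information 22, respectively.
    """
    ip_group = []
    report_to_ips = {}
    for report_id, campaign_name, xml_text in reports:
        # S11: skip reports that do not belong to the target campaign.
        if campaign_name != target_campaign:
            continue
        addresses = extract_addresses(xml_text)
        # S12: store each address once, and record which report listed it.
        for address in addresses:
            if address not in ip_group:
                ip_group.append(address)
        report_to_ips[report_id] = addresses
    return ip_group, report_to_ips
```

Matching on the local tag name avoids hard-coding the namespace URIs of a particular STIX release, which this sketch does not assume.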
Returning to
Next, the survival period learning unit 30 refers to a WHOIS record of the selected IP address and stores data of a subnet that is an address range of the IP address in the survival period information 32 (S21).
Note that the IP address range (subnet) is a group of several IP addresses and, for example, is a group of addresses in the CIDR notation (CIDR block) such as “AAA.AAA.AAA.0/22”, or the like. In the present embodiment, the CIDR block is exemplified as the IP address range (subnet). However, the IP addresses may be grouped for each domain, and the IP address range is not particularly limited to the CIDR block.
Next, on the basis of the data of the IP address range, the survival period learning unit 30 collects, from among the unselected IP addresses in the IP address group information 21, any IP addresses that are in the same band as the selected IP address (S22).
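Grouping addresses by IP address range (S21 and S22) can be illustrated with the Python standard library ipaddress module. The sketch below does not perform a WHOIS lookup; it assumes that the CIDR blocks have already been obtained and are passed in as strings. The function name and the example addresses are hypothetical.

```python
import ipaddress


def group_by_range(addresses, cidr_blocks):
    """Assign each address to the first CIDR block that contains it.

    addresses:   iterable of IP address strings
    cidr_blocks: iterable of CIDR strings, assumed to come from WHOIS records
    Returns {cidr_block: [addresses in that block]}.
    """
    networks = [ipaddress.ip_network(block) for block in cidr_blocks]
    groups = {str(network): [] for network in networks}
    for address in addresses:
        ip = ipaddress.ip_address(address)
        for network in networks:
            if ip in network:
                groups[str(network)].append(address)
                break
    return groups


if __name__ == "__main__":
    print(group_by_range(
        ["192.0.2.7", "192.0.2.200", "203.0.113.5"],
        ["192.0.2.0/24", "203.0.113.0/24"],
    ))
```

Addresses that fall in none of the given blocks are simply left out of the result in this sketch.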
Next, the survival period learning unit 30 refers to the cyber threat intelligence and IP address group information 22 and counts the number of cyber threat intelligences 11 in which the IP address selected in S20 and the IP address collected in S22 appear, respectively. Next, on the basis of the counted number, the survival period learning unit 30 obtains a survival period of each IP address and stores the obtained survival period in the IP address group information 21 and the survival period information 32 (S23).
The cyber threat intelligence 11 is issued at a predetermined cycle, for example, as a weekly report and the like. Therefore, the IP address described in the cyber threat intelligence 11 is an address that survives (is observed) as an attack source in the week in the cyber threat intelligence 11. Therefore, the survival period learning unit 30 can obtain the survival period (survival week) of the IP address by counting the number of cyber threat intelligences 11 in which the IP address appears.
Note that, in the present embodiment, it is assumed that the number of cyber threat intelligences 11 in which the IP address exists corresponds to the number of weeks in which the IP address survives. However, the method of calculating the survival period is not limited to the above method. For example, in a case where a daily report is assumed, the number of survival days can be obtained as a survival period by counting the number of cyber threat intelligences 11. Furthermore, in a case where the cyber threat intelligence 11 includes date information, the cyber threat intelligences 11 in which the IP address appears may be arranged in chronological order, and a survival period such as “2018/1/1 to 2018/1/31” may be calculated on the basis of the first (2018/1/1) and the last (2018/1/31) date information.
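The two ways of obtaining a survival period described above can be sketched as follows. The input layouts (a mapping from a report ID to its addresses, and a list of dated reports) and the function names are assumptions made for this illustration.

```python
from collections import Counter
from datetime import date


def survival_by_count(report_to_ips):
    """Survival period as the number of reports in which each address appears.

    With weekly reports, the count corresponds to the number of survival weeks.
    An address listed twice in the same report is counted once.
    """
    counts = Counter()
    for addresses in report_to_ips.values():
        counts.update(set(addresses))
    return dict(counts)


def survival_by_dates(dated_reports):
    """Survival period as (first date, last date) on which each address appears.

    dated_reports: iterable of (datetime.date, addresses) pairs in any order.
    """
    spans = {}
    for report_date, addresses in dated_reports:
        for address in addresses:
            first, last = spans.get(address, (report_date, report_date))
            spans[address] = (min(first, report_date), max(last, report_date))
    return spans


if __name__ == "__main__":
    weekly = {"w1": ["192.0.2.1", "192.0.2.2"], "w2": ["192.0.2.1"]}
    print(survival_by_count(weekly))
    print(survival_by_dates([(date(2018, 1, 1), ["192.0.2.1"]),
                             (date(2018, 1, 31), ["192.0.2.1"])]))
```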
Next, the survival period learning unit 30 determines whether or not an unselected IP address exists in the IP address group information 21 (S24). In a case where the unselected IP address exists (S24: YES), the survival period learning unit 30 selects the unselected IP address and returns the process to S20. In a case where the unselected IP address does not exist (S24: NO), the process on all the IP addresses is completed. Therefore, the survival period learning unit 30 terminates the survival period learning process.
Returning to
Here, in the present embodiment, because the focus is on long-life IP addresses rather than single-use IP addresses, a long-life threshold used to identify a long-life IP address is obtained. For example, the detection unit 40 calculates, from the overall statistical information, the survival period that marks the top 5% of the survival periods and sets the calculated value as the long-life threshold.
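One possible way to obtain such a threshold is sketched below. The description does not fix how the top 5% is computed, so this sketch assumes a nearest-rank 95th percentile, with an address counted as long-life when its survival period strictly exceeds the returned value. The function name and the parameter top_percent are hypothetical.

```python
import math


def long_life_threshold(survival_periods, top_percent=5):
    """Return the nearest-rank (100 - top_percent)th percentile.

    Assumption of this sketch: survival periods strictly greater than the
    returned value are treated as belonging to the top of the distribution.
    """
    ordered = sorted(survival_periods)
    if not ordered:
        raise ValueError("at least one survival period is required")
    rank = math.ceil((100 - top_percent) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Twenty addresses: fifteen seen for 1 week, four for 2 weeks, one for 3 weeks.
    periods = [1] * 15 + [2] * 4 + [3]
    print(long_life_threshold(periods))
```

With the sample data in the usage lines, the threshold is two weeks, so only the address observed for three weeks is treated as long-life.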
Next, the detection unit 40 selects an unselected IP address range from the survival period information 32 (S31). Next, the detection unit 40 refers to a survival period of an IP address belonging to the selected IP address range from the survival period information 32 and creates statistical information regarding the selected IP address range. Here, the detection unit 40 calculates a ratio of the long-life IP address in the IP address range (long-life rate) on the basis of the calculated long-life threshold by using the following formula (1) and stores a calculation result in the survival period information 32 (S32).
Long-life rate=(the number of IP addresses having a survival period exceeding the long-life threshold in the IP address range)/(the number of IP addresses in the IP address range) (1)
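Formula (1) translates directly into a few lines of Python. The function name and the choice to return 0.0 for an empty range are assumptions of this sketch.

```python
def long_life_rate(range_periods, threshold):
    """Formula (1): share of addresses in a range whose survival period
    exceeds the long-life threshold.

    range_periods: survival periods of the addresses in one IP address range
    """
    periods = list(range_periods)
    if not periods:
        return 0.0  # assumption: an empty range has no long-life addresses
    long_lived = sum(1 for period in periods if period > threshold)
    return long_lived / len(periods)


if __name__ == "__main__":
    # Four addresses in a range; two of them survive longer than 2 weeks.
    print(long_life_rate([1, 3, 4, 1], threshold=2))
```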
Next, the detection unit 40 determines whether or not an unselected IP address range exists in the survival period information 32 (S33). In a case where the unselected IP address range exists (S33: YES), the detection unit 40 selects the unselected IP address range and returns the process to S31. In a case where the unselected IP address range does not exist (S33: NO), the detection unit 40 advances the process to S34.
In S34, the detection unit 40 registers the IP address range to be monitored and the long-life IP address in the band in the output list 51 on the basis of the long-life rate of each IP address range in the survival period information 32. Specifically, the detection unit 40 registers the IP address range in which the long-life rate exceeds a predetermined threshold and an IP address that exceeds the long-life threshold (referred to as long-life IP address) in the output list 51 as monitoring targets (S34) and terminates the process.
Because the long-life threshold is set with reference to the top 5% of the overall distribution, the predetermined threshold for the long-life rate is set, for example, so that a long-life rate higher than 5% is detected. With this setting, the detection unit 40 can obtain the IP address range of which the long-life rate is higher than that in the overall distribution of the survival periods and the long-life IP address in the IP address range.
Note that, in the present embodiment, the long-life threshold based on the top 5% in the distribution as the statistical information is calculated, and the overall distribution and the distribution for each IP address range are compared with each other by using the threshold with which the long-life rate in the IP address range exceeds 5%. Then, the IP address range in which the long-life rate exceeds 5% with respect to the overall distribution and the long-life IP address in the IP address range are monitored. However, other statistical information may be used to compare the distributions. For example, by calculating an average of the survival periods, the IP address range to be monitored and the IP address in the IP address range may be obtained on the basis of a difference between an average in the overall distribution and an average in the IP address range.
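The registration in S34 and the average-based alternative mentioned above can be sketched as follows. The nested dictionary layout, the function names, the default rate threshold of 0.05, and the parameter min_gap are assumptions made for this illustration, not values fixed by the embodiment.

```python
from statistics import mean


def select_targets(range_to_periods, long_life_threshold, rate_threshold=0.05):
    """Build an output list of ranges whose long-life rate exceeds rate_threshold.

    range_to_periods: {cidr_block: {ip_address: survival_period}}
    """
    output_list = []
    for cidr_block, periods in range_to_periods.items():
        if not periods:
            continue
        long_life_ips = sorted(
            ip for ip, period in periods.items() if period > long_life_threshold
        )
        rate = len(long_life_ips) / len(periods)
        if rate > rate_threshold:
            output_list.append({
                "range": cidr_block,
                "long_life_rate": rate,
                "long_life_ips": long_life_ips,
            })
    return output_list


def ranges_by_mean_gap(range_to_periods, min_gap):
    """Alternative comparison: ranges whose average survival period exceeds
    the overall average by more than min_gap."""
    all_periods = [p for periods in range_to_periods.values() for p in periods.values()]
    if not all_periods:
        return []
    overall = mean(all_periods)
    return [
        cidr_block
        for cidr_block, periods in range_to_periods.items()
        if periods and mean(list(periods.values())) - overall > min_gap
    ]
```

Either function yields candidate ranges; the first also lists the long-life IP addresses within each selected range, which corresponds to what is registered in the output list 51.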
Returning to
From the output list 51, a user can easily find an IP address range of which the survival period of the attack source address is different from that of the overall distribution or a long-life IP address included in the address range as monitoring targets.
A graph G10 illustrated in
A graph G11 illustrated in
Therefore, in an IP address range such as “x.x.0.0/16”, whose histogram is illustrated in the graph G11, the long-life rate is higher than that of the overall distribution. In this example, in consideration of the values in the top 5% of the distribution, an IP address is a long-life IP address if its survival period is equal to or longer than three weeks. Therefore, the long-life rate of “x.x.0.0/16” is significantly higher. Because such an IP address range means that the attacker uses each IP address for a long time, there is a high possibility that the attacker's intention is reflected more strongly than in other bands. Therefore, by setting “x.x.0.0/16”, which has a high long-life rate as in the graph G11, as a monitoring target, the cyberattack can be efficiently analyzed.
A graph G20 illustrated in
(Modification)
Note that the survival period learning unit 30 may access a predetermined information processing server that manages a Domain Name System (DNS) and specify a domain corresponding to at least some addresses of the addresses of the plurality of specified cyberattack sources.
Furthermore, the output unit 50 accesses the DNS again, either at the time when the survival period learning unit 30 specifies the domain or at a time different from the time of the earlier access to the DNS, and determines whether or not the address corresponding to the specified domain is different from the previous address. Next, in a case where the address corresponding to the specified domain is different from the previous address, the output unit 50 includes information regarding the newly specified address in the output list 51 and outputs the output list 51.
In this way, the information processing apparatus 1 may specify the domain corresponding to the address of the cyberattack source and track the address corresponding to the domain. With this operation, the user can easily track another IP address, which is associated with the domain and different from the previous address, for the domain corresponding to the addresses of the plurality of cyberattack sources specified by the cyber threat intelligence 11.
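The tracking described in this modification can be sketched as a comparison between the addresses seen at an earlier lookup and those returned by a later lookup. The sketch assumes IPv4 and uses the system resolver through socket.gethostbyname_ex from the Python standard library; the function names and the layout of the previous argument are hypothetical, and the resolver is passed in as a parameter so that it can be replaced.

```python
import socket


def resolve_ipv4(domain):
    """Return the set of IPv4 addresses the system resolver gives for a domain."""
    _hostname, _aliases, addresses = socket.gethostbyname_ex(domain)
    return set(addresses)


def newly_seen_addresses(previous, resolver=resolve_ipv4):
    """Report addresses that a domain resolves to now but did not before.

    previous: {domain: collection of addresses seen at the earlier lookup}
    Returns {domain: sorted list of newly seen addresses}; domains whose
    addresses have not changed are omitted.
    """
    changes = {}
    for domain, old_addresses in previous.items():
        new_addresses = resolver(domain) - set(old_addresses)
        if new_addresses:
            changes[domain] = sorted(new_addresses)
    return changes
```

In this sketch, the returned mapping would be the information added to the output list 51 when the address associated with a domain has changed.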
As described above, the information processing apparatus 1 includes the preprocessing unit 20, the survival period learning unit 30, the detection unit 40, and the output unit 50. The preprocessing unit 20 collects the plurality of cyber threat intelligences 11. The survival period learning unit 30 analyzes the plurality of collected cyber threat intelligences 11 and specifies a plurality of addresses of the cyberattack sources included in the plurality of cyber threat intelligences 11. Furthermore, the survival period learning unit 30 specifies a period in which each of the specified addresses of the plurality of cyberattack sources is observed (survival period). The detection unit 40 compares the distribution of the survival periods corresponding to the plurality of specified addresses and the distribution of the survival periods for each address range. Next, the detection unit 40 determines an address range or some addresses included in the address range as a monitoring target according to the comparison result of the distributions. The output unit 50 outputs information regarding the address range determined by the detection unit 40 or some addresses included in the address range.
With this operation, the user can easily find, as monitoring targets, an address range whose distribution of survival periods is different from the distribution of the survival periods of the plurality of cyberattack sources as a whole, and some addresses included in that address range. The distribution of the survival periods of the monitoring target is different from, for example, the overall distribution in which the ratio of single-use IP addresses is significantly high, and there is a high possibility that the attacker intentionally uses these monitoring targets. Therefore, a user can easily find a significant monitoring target for the analysis of the cyberattacks.
Furthermore, the survival period learning unit 30 accesses the predetermined information processing server (DNS) and specifies the domain corresponding to at least a part of the addresses of the plurality of specified cyberattack sources. The DNS is accessed again, either at the time when the domain is specified or at a time different from the time of the earlier access to the DNS, and in a case where the address corresponding to the specified domain is different from a previous address, the output unit 50 outputs information regarding the newly specified address. With this operation, the user can track another IP address, which is associated with the domain and different from the previous address, for the domain corresponding to the addresses of the plurality of cyberattack sources specified by the cyber threat intelligence 11, and it is possible to enhance the analysis quality.
Furthermore, the detection unit 40 determines, for each address range, whether or not the ratio of addresses observed for a longer period than a predetermined threshold (long-life addresses) in the distribution of the survival periods for the address range is higher than that in the distribution of the survival periods corresponding to the plurality of specified addresses. Next, the detection unit 40 determines an address range that is determined as having the higher ratio or some addresses in the address range as monitoring targets. As a result, the user can easily find the address range having a higher long-life address ratio or some addresses included in the address range as monitoring targets.
Furthermore, the detection unit 40 determines an address (long-life address) that is observed for a longer period than a predetermined threshold from among the addresses included in the address range as a monitoring target. As a result, the user can easily find the long-life address as a monitoring target.
Furthermore, the preprocessing unit 20 collects the cyber threat intelligence 11 related to a predetermined campaign such as the target campaign 12 from the cyber threat intelligence DB 10 so that the user can easily find the address range regarding the predetermined campaign or some addresses included in the address range.
Furthermore, the survival period learning unit 30 specifies a survival period by counting, in chronological order, the cyber threat intelligences 11 each including the specified address of each of the plurality of cyberattack sources. As a result, the information processing apparatus 1 can easily specify the survival period by counting the number of cyber threat intelligences 11 in which the address of the cyberattack source is posted, from among the cyber threat intelligences 11 that are regularly issued, such as a weekly report or a monthly report.
Note that the components of each of the illustrated devices are not necessarily physically configured as illustrated in the drawings. In other words, the specific aspects of separation and integration of each of the apparatuses and devices are not limited to the illustrated aspects, and all or some of the apparatuses or devices can be functionally or physically separated and integrated in any unit, in accordance with various loads, use status, and the like.
In addition, the various processing functions executed by the information processing apparatus 1 may be executed entirely or in part on a central processing unit (CPU) (or a microcomputer, such as a microprocessor unit (MPU) or a micro controller unit (MCU)). Furthermore, needless to say, all or any part of the various processing functions may be executed by a program analyzed and executed on a CPU (or a microcomputer such as an MPU or an MCU), or on hardware by wired logic. In addition, the various processing functions executed by the information processing apparatus 1 may be executed by a plurality of computers in cooperation through cloud computing.
Meanwhile, the various processes described in the above embodiment can be achieved by execution of a prepared program on a computer. Thus, there will be described below an example of a computer (hardware) that executes a program having functions similar to the above embodiment.
As illustrated in
The hard disk device 109 stores a program 111 used to execute various processes of the preprocessing unit 20, the survival period learning unit 30, the detection unit 40, the output unit 50, or the like described in the embodiment. In addition, the hard disk device 109 stores various types of data 112 to which the program 111 refers. The input device 102 receives, for example, an input of operation information from an operator. The monitor 103 displays, for example, various screens operated by the operator. The interface device 106 is connected to, for example, a printing device or the like. The communication device 107 is connected to a communication network such as a local area network (LAN), and exchanges various types of information with the external device via the communication network.
The CPU 101 reads the program 111 stored in the hard disk device 109, loads the program 111 into the RAM 108, and executes the program 111 so as to execute various processes of the preprocessing unit 20, the survival period learning unit 30, the detection unit 40, the output unit 50, or the like. Note that the program 111 need not be stored in the hard disk device 109. For example, the program 111 that is stored in a storage medium that can be read by the information processing apparatus 1 may be read and executed. The storage medium which can be read by the information processing apparatus 1 corresponds to, for example, a portable recording medium such as a CD-ROM, a DVD, or a universal serial bus (USB) memory, a semiconductor memory such as a flash memory, a hard disk drive, and the like. Alternatively, the program 111 may be prestored in a device connected to a public line, the Internet, a LAN, or the like, and the information processing apparatus 1 may read the program 111 from the device to execute the program 111.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application PCT/JP2018/027140 filed on Jul. 19, 2018 and designated the U.S., the entire contents of which are incorporated herein by reference.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/JP2018/027140 | Jul 2018 | US |
| Child | 17130467 | | US |