The present invention relates to a system and the like for detecting, for example, an irregularity event with regard to communication.
Various methods are known as methods for monitoring communication in an information processing system. For example, PTL 1 discloses a detection device for monitoring a communication network. The detection device estimates a condition of the communication by estimating whether an event relating to a certain communication is irregular, based on a log relating to a communication event (hereinafter referred to as “event”) transmitted and received in a communication network.
PTL 1: Japanese Unexamined Patent Application Publication No. 2010-531553
However, according to the detection device disclosed in PTL 1, the accuracy for estimating whether the event is irregular is low. This is because, in the detection device, it is difficult to define an irregular communication by using a query and the like.
Such an irregular communication will be explained below. For convenience of explanation, it is assumed that the frequency of communication via TCP port 80 among multiple hosts (information processing devices, communication devices) is extremely low. TCP is an abbreviation of Transmission Control Protocol. In this case, for example, every time the detection device receives information indicating that a communication via TCP port 80 is executed, it must search all recently executed communications via TCP port 80 between each pair of hosts. The detection device identifies the communications via TCP port 80 from among all the communications, calculates the frequency of execution of the identified communications, and estimates that the event relating to the received communication is irregular only in a case where the communication is executed between hosts for which the calculated frequency is low.
Therefore, it is a main object of the present invention to provide an event estimation device and the like capable of estimating with high accuracy whether a communication is irregular.
In order to achieve the aforementioned object, as an aspect of the present invention, an event estimation device including:
model generation means for generating, based on a frequency of communications, a model for calculating an irregularity degree, that represents how irregular a communication is, which is high in case that the frequency is low and which is low in case that the frequency is high; and
estimation means for calculating the irregularity degree by applying the model to a frequency of a certain communication, and estimating that the certain communication is irregular in case that the calculated irregularity degree satisfies a criterion and that, otherwise, the certain communication is non-irregular.
In addition, as another aspect of the present invention, an event estimation method including:
generating, based on a frequency of communications, a model for calculating an irregularity degree, that represents how irregular a communication is, which is high in case that the frequency is low and which is low in case that the frequency is high, calculating the irregularity degree by applying the model to a frequency of a certain communication, and estimating that the certain communication is irregular in case that the calculated irregularity degree satisfies a criterion and that, otherwise, the certain communication is non-irregular.
Furthermore, the object is also realized by an associated event estimation program, and a computer-readable recording medium which records the program.
According to an event estimation device and the like of the present invention, whether a communication is irregular can be estimated with high accuracy.
Subsequently, example embodiments for carrying out the present invention will be explained in detail with reference to drawings.
A configuration of an event estimation device 101 according to a first example embodiment of the present invention will be explained in detail with reference to
The event estimation device 101 according to the first example embodiment includes a model generation unit 102 and an estimation unit 103. The event estimation device 101 may further include a query execution unit 104.
In a communication database 503, graph information (explained later, for example,
First, a target information processing system, in which an irregular event relating to communication is detected in accordance with a result estimated by the event estimation device 101 and the like, will be explained with reference to
For convenience of explanation, it is assumed that, in the information processing system, multiple information processing devices (which are represented as a host 1001a to a host 1001d) execute communication with each other. In this case, the event estimation device 101 according to the present example embodiment estimates whether communications executed among the host 1001a to the host 1001d are irregular.
The host 1001a to the host 1001d include an agent 1002a to an agent 1002d, respectively, for monitoring communications between the hosts. For example, the agent 1002a monitors communication executed by the host 1001a. The agent 1002b monitors communication executed by the host 1001b. The agent 1002c monitors communication executed by the host 1001c. The agent 1002d monitors communication executed by the host 1001d.
The agent 1002a transmits information about a communication (hereinafter referred to as “communication information”) to a converter 1003 whenever the host 1001a transmits or receives information. Similarly to the agent 1002a, each of the agent 1002b to the agent 1002d monitors communication executed by the host on which that agent resides.
The converter 1003 receives communication information transmitted by each agent, and analyzes the received communication information. For example, the converter 1003 identifies, in the received communication information, an identifier representing a transmission-side host (hereinafter referred to as “transmission-side identifier”), an identifier representing a reception-side host (hereinafter referred to as “reception-side identifier”), and information representing a content transmitted and received in the communication. Subsequently, the converter 1003 sets the identified transmission-side identifier as a label of a starting node and sets the identified reception-side identifier as a label of an ending node. The converter 1003 generates a directed graph by setting the identified information as a label of a directed edge extending from the starting node to the ending node. More specifically, the converter 1003 uses the identified transmission-side identifier, the identified reception-side identifier, and the identified information to generate graph information representing an aspect of the communication. The converter 1003 stores the generated graph information in a communication database 1004. Hereinafter, a node may also be referred to as a vertex.
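The conversion described above can be illustrated with a minimal sketch. The record fields, the name to_graph, and the dictionary layout are assumptions made for illustration only; the specification does not prescribe a particular data format for the converter 1003.

```python
from collections import defaultdict
from typing import NamedTuple


class CommunicationInfo(NamedTuple):
    """Hypothetical record reported by an agent."""
    sender: str      # transmission-side identifier
    receiver: str    # reception-side identifier
    timestamp: str   # date and time of the communication
    protocol: str
    size: int        # communication quantity (unit assumed to be bytes)


def to_graph(records):
    """Map each (starting node, ending node) pair to its list of edge labels."""
    graph = defaultdict(list)
    for r in records:
        # The directed edge runs from the transmission host to the reception host.
        graph[(r.sender, r.receiver)].append(
            {"time": r.timestamp, "protocol": r.protocol, "size": r.size}
        )
    return dict(graph)


records = [
    CommunicationInfo("host_a", "host_b", "2015-01-01T10:00:00", "TCP/80", 1200),
    CommunicationInfo("host_a", "host_b", "2015-01-01T10:05:00", "TCP/80", 800),
    CommunicationInfo("host_b", "host_c", "2015-01-01T11:00:00", "UDP/53", 90),
]
graph = to_graph(records)
```

In this sketch, two communications from host_a to host_b yield two labels on the same directed edge, while the reverse direction has no edge.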
An operator 1006 sets, into an interface 1005, a query for retrieving information relating to communication satisfying a predetermined condition from among the communications to be monitored. For example, in a case where the operator 1006 monitors “communication transmitted via TCP port 80”, the operator 1006 sets, in the interface 1005, a query described with a predetermined criterion such as “TCP port=80”. At a certain timing (e.g., when a query is set, at regular intervals, when a new communication is executed, and the like), the interface 1005 identifies, in the communication database 1004, information relating to communication satisfying the criterion in the set query. When a communication matching the criterion in the set query is identified, the interface 1005 outputs information about the identified communication to the operator 1006.
Similarly to the communication database 1004 explained above, the communication database 503 illustrated in
In each example embodiment of the present application, communication bodies represent information processing devices capable of performing communication via a communication network. Alternatively, the communication bodies represent network devices capable of performing communication via a communication network. A network device is, for example, a computer such as a personal computer or a server, a network printer, a firewall, a router, a network switch, or the like.
Processing in the interface 1005 in a case where a query for retrieving information relating to communication satisfying a predetermined criterion is set will be explained with reference to
In case that a query is input (YES in step S201), the query interface (hereinafter also referred to as “IF”) 1005 stores the query (step S202). The interface 1005 may convert the query into a configuration suitable for searching the query (step S203). For example, the interface 1005 may convert the query into a search tree, or may convert the query into an aspect for performing search using a hash function.
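As one possible illustration of step S202 and step S203, the following sketch stores each query and also places it in a hash-based index keyed by protocol. The class name QueryInterface, the dictionary form of a query, and the choice of the protocol as the hash key are assumptions; the specification only states that a search tree or a hash function may be used.

```python
from collections import defaultdict


class QueryInterface:
    """Hypothetical stand-in for the query-handling part of the interface 1005."""

    def __init__(self):
        self.queries = []                     # step S202: stored queries
        self.by_protocol = defaultdict(list)  # step S203: hash-based index

    def register(self, query):
        self.queries.append(query)
        # Queries without a protocol restriction are filed under the key None.
        self.by_protocol[query.get("protocol")].append(query)

    def candidates(self, protocol):
        """Return the queries that may apply to a communication using this protocol."""
        return self.by_protocol.get(protocol, []) + self.by_protocol.get(None, [])


interface = QueryInterface()
interface.register({"type": "novelty", "threshold": 0.8, "protocol": "TCP/80"})
interface.register({"type": "time zone", "threshold": 0.5})
```

With such an index, only the queries filed under the protocol of a new communication (plus the unrestricted ones) need to be examined, rather than every stored query.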
Subsequently, a query which is input into the interface 1005 will be explained with reference to
When roughly classified, the GUI exemplified in
The irregularity degree designation item further includes a type IF 301 for setting an index representing a type relating to the irregularity degree (explained later), a threshold value IF 302 for setting a threshold value serving as a reference estimating whether communication is irregular, and an option IF 303 for setting an option relating to the irregularity degree.
Via the type IF 301, a type relating to a function of calculating the irregularity degree, with which whether a communication is irregular is estimated, is set from among multiple choices. For example, a type “novelty” represents a function of estimating that the communication is irregular in case that communication is executed between communication bodies that usually do not execute communication. A “time zone” included in the type IF 301 represents a function of estimating that the communication among multiple communication bodies is irregular in case that communication is executed in a time zone in which communication is usually not executed among them. The time zone is a certain time of a day, a certain day of a week, a certain day of a month, and the like, and can be set via the option IF 303.
A “communication frequency” included in the type IF 301 represents a function of estimating that the communication is irregular in case that the cycle of communication executed among multiple communication bodies is different from a cycle of communication executed normally among them. A “communication quantity” included in the type IF 301 represents a function of estimating that the communication is irregular in case that the communication quantity of the communication executed between communication bodies is different from a communication quantity of communication executed normally between them.
A threshold value, which represents a criterion for estimating whether a communication is irregular, for the irregularity degree of a type set via the type IF 301 can be set via the threshold value IF 302. The threshold value is, for example, a value representing a criterion for estimating whether the communication is irregular by using a model (explained later, for example,
The option IF 303 allows an input of information that needs to be additionally set with regard to the irregularity degree of the type set via the type IF 301. The option IF 303 may be shown as necessary. For example, in case that the “time zone” is selected with the type IF 301, the option IF 303 may be shown. For example, the option IF 303 can set, as a time zone, a certain time of a day (Time of the Day), a certain day of a week (Day of Week), a certain day of a month (Day of Month), or the like.
It is assumed that, in case that the “communication quantity” is selected with the type IF 301, the option IF 303 is shown. Via the option IF 303, a period for measuring the communication quantity can be set. The number of items that can be set via the option IF 303 is not limited to one, and multiple items may be set as necessary.
The information IF includes a transmission host IF 304, a reception host IF 305, and a protocol IF 306. The information IF may include other IFs, and is not limited to the following explanation.
Via the transmission host IF 304, communication bodies transmitting information (hereinafter referred to as “transmission host”) relating to the communication to be searched can be set. Via the reception host IF 305, communication bodies receiving information (hereinafter referred to as “reception host”) relating to the communication to be searched can be set.
Examples of methods for setting communication bodies include a method for designating an IP (Internet Protocol) address, a method for designating a MAC (Media Access Control) address, a method for designating a host name, or the like.
It is not always necessary to set information for designating the transmission host via the transmission host IF 304. It is not always necessary to set information for designating the reception host via the reception host IF 305. For example, in case that the transmission host and reception hosts are designated, the event estimation device 101 estimates whether a communication between the designated transmission host and the designated reception host is irregular by using a query exemplified in
A protocol relating to the communication to be determined as irregular or not can be designated via the protocol IF 306. Examples of methods for designating a protocol include a method for designating a protocol name, a method for designating a TCP/UDP (User Datagram Protocol) port number, and the like.
The event estimation device 101 estimates whether a communication executed in accordance with the designated protocol is irregular. In case that a protocol is not designated, the event estimation device 101 may estimate whether a communication is irregular without limiting the protocol.
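The optional nature of the transmission host, reception host, and protocol settings can be sketched as follows. The Query class, its field names, and the function matches are hypothetical; a field left as None is treated as "not designated", so it does not limit the communications examined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Query:
    """Hypothetical in-memory form of the items set via the form."""
    anomaly_type: str
    threshold: float
    sender: Optional[str] = None    # transmission host IF 304 (optional)
    receiver: Optional[str] = None  # reception host IF 305 (optional)
    protocol: Optional[str] = None  # protocol IF 306 (optional)


def matches(query, sender, receiver, protocol):
    """True if the communication falls within the scope designated by the query."""
    return all(
        wanted is None or wanted == actual
        for wanted, actual in (
            (query.sender, sender),
            (query.receiver, receiver),
            (query.protocol, protocol),
        )
    )


port80_only = Query("novelty", 0.8, protocol="TCP/80")
unrestricted = Query("novelty", 0.8)
```

Here port80_only applies to any pair of hosts communicating via TCP port 80, and unrestricted applies to every communication.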
In the form exemplified in
The form may include, for example, an IF capable of inputting a port number or the like. The form does not always need to include all the items such as the type IF 301. More specifically, the form is not limited to the aspect exemplified in
In
item 1: a communication of which type (“Anomaly Type” in
item 2: a communication of which threshold value (“Threshold” in
item 3: a communication of which protocol (“Protocol” in
More specifically, the query exemplified in
For convenience of explanation, it is assumed that a basic syntax relating to a query is based on EPL (Event Processing Language). However, in each example embodiment of the present invention, the query exemplified in
In case that a query is designated with a text format, a type, a threshold value, an option, or the like can be designated just like the case of designating a query via GUI.
The example illustrated in
Subsequently, communication information and graph information relating to processing performed in the event estimation device 101 according to the present example embodiment will be explained. First, the communication information will be explained with reference to
The communication information is information where for example, a device identifier capable of identifying a transmission host executing communication, a device identifier capable of identifying a reception host executing communication, a date and time when communication is executed, a protocol of the communication, a communication quantity transmitted and received in the communication, and the like are associated with each other. This represents that information having the communication quantity is communicated from the transmission host to the reception host at the date and time in accordance with the protocol of the communication. For example, in the communication information exemplified in
Subsequently, the graph information will be explained with reference to
The graph information is information where a device identifier capable of identifying a transmission host, a device identifier representing a reception host, and communication information about communication executed between the transmission host and the reception host are associated. For example, in the communication information, a time for communication, a protocol relating to the communication, and a communication quantity (data size) transmitted and received in the communication are associated. In the communication information, a model generated with regard to the communication (exemplified in
In the graph information exemplified in
In the graph information, the two device identifiers are associated by using an aspect in which the two vertices are connected via arrows. The arrow represents communication executed between devices represented by each of the device identifiers. For example, in the graph information exemplified in
Further, in the graph information, communication information about the communication is attached as a label of an edge representing the communication. For example, in the graph information exemplified in
More specifically, in the graph information, for example, the device identifier for identifying the transmission host, the device identifier representing the reception host, and the communication information about communication executed between the transmission host and the reception host are associated by using the graph explained above.
Subsequently, how a graph is represented and processed in the information processing device will be explained. For example, the graph is expressed by using adjacent vertex information where a vertex identifier representing a certain vertex and a vertex identifier representing a vertex connected to (adjacent to) the certain vertex are associated. The graph may be represented by using vertex edge information where a vertex identifier representing a certain vertex and an edge identifier representing an edge connected to the certain vertex are associated.
In a case where a graph is represented by adjacent vertex information, information attached to a certain vertex (for example, the device identifier explained above) is represented by vertex label information where an identifier representing the certain vertex and the information are associated. An identifier representing the certain vertex and an information identifier representing the information may be associated in the vertex label information.
In a case where a graph is represented by vertex edge information, information attached to a certain edge (for example, the date and time, the model, and the like explained above) is represented by edge label information where an edge identifier representing the certain edge and the information are associated.
In a case where information is attached to both of the vertex and the edge in the graph, the graph may be represented by the vertex label information explained above and the edge label information explained above. The aspect for representing the graph is not limited to the example explained above.
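The representations described above can be sketched with plain dictionaries. All identifiers and values below are hypothetical, and the edge_endpoints table is an added assumption so that both ends of an edge can be recovered from its identifier; the vertex edge information is assumed here to list only outgoing edges.

```python
# Adjacent vertex information: vertex identifier -> adjacent vertex identifiers.
adjacent_vertices = {"v1": ["v2"], "v2": ["v3"], "v3": []}

# Vertex edge information: vertex identifier -> identifiers of connected edges.
vertex_edges = {"v1": ["e1"], "v2": ["e2"], "v3": []}

# Assumption for illustration: edge identifier -> (starting vertex, ending vertex).
edge_endpoints = {"e1": ("v1", "v2"), "e2": ("v2", "v3")}

# Vertex label information: vertex identifier -> device identifier.
vertex_labels = {"v1": "host_a", "v2": "host_b", "v3": "host_c"}

# Edge label information: edge identifier -> communication information.
edge_labels = {
    "e1": {"time": "2015-01-01T10:00:00", "protocol": "TCP/80", "size": 1200},
    "e2": {"time": "2015-01-01T11:00:00", "protocol": "UDP/53", "size": 90},
}


def describe_edges(vertex_id):
    """List (transmission host, reception host, label) for edges of a vertex."""
    result = []
    for edge_id in vertex_edges[vertex_id]:
        src, dst = edge_endpoints[edge_id]
        result.append((vertex_labels[src], vertex_labels[dst], edge_labels[edge_id]))
    return result
```

Processing described as "processing for the graph" then amounts to lookups and updates on tables of this kind.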
For convenience of explanation, in each example embodiment of the present invention, processing executed by each unit is represented as processing for the graph, but the processing is realized as processing executed with regard to information such as the vertex edge information and the like described above.
Subsequently, processing in the event estimation device 101 according to the present example embodiment will be explained. The processing in the event estimation device 101 roughly includes processing for generating a model and processing for determining whether a communication is irregular based on the generated model.
First, processing for generating a model in the event estimation device 101 according to the present example embodiment will be explained. The model generation unit 102 generates a model to be referred to (explained later, for example,
Subsequently, processing for determining whether a communication is irregular based on the generated model in the event estimation device 101 according to the present example embodiment will be explained with reference to
The processing in the event estimation device 101 will be explained with reference to an example of a case where the information processing system exemplified in
Subsequently, in accordance with the query exemplified in
The estimation unit 103 applies the read model to the calculated parameter to calculate the irregularity degree (step S102). Subsequently, the estimation unit 103 determines whether the calculated irregularity degree satisfies a criterion (step S103). The criterion is whether the irregularity degree is more than a predetermined threshold value.
In case that the calculated irregularity degree is more than the threshold value (YES in step S103), the estimation unit 103 associates the communication with a label indicating an irregular communication (step S104). In case that the calculated irregularity degree does not satisfy the criterion (NO in step S103), the estimation unit 103 associates the communication with a label indicating a non-irregular communication (step S105). Although the estimation unit 103 associates the communication with the label in step S104 or step S105, the estimation unit 103 may instead classify the communications into irregular communication and non-irregular communication on the basis of whether the irregularity degree is more than the threshold value.
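Steps S102 to S105 can be sketched as follows. The function name estimate, the representation of a model as a callable, and the example model 1 / (1 + frequency) are assumptions for illustration; the specification only requires that the irregularity degree be high for a low frequency and low for a high frequency.

```python
def estimate(model, parameter, threshold):
    """Apply a model to a parameter and label the communication."""
    degree = model(parameter)             # step S102
    if degree > threshold:                # step S103
        return degree, "irregular"        # step S104
    return degree, "non-irregular"        # step S105


def example_model(frequency):
    """Hypothetical model: the degree falls as the frequency rises."""
    return 1.0 / (1.0 + frequency)


rare = estimate(example_model, 0, threshold=0.5)
common = estimate(example_model, 9, threshold=0.5)
```

In this sketch a communication never seen before (frequency 0) receives the degree 1.0 and is labeled irregular, whereas a communication with frequency 9 receives a degree below the threshold value and is labeled non-irregular.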
The processing for calculating the parameter in the processing shown in step S102 may be executed in advance, and in this case, for example, a parameter relating to the communication processing in the data stored in the communication database 503 (data structure is exemplified in
With reference to
More specifically, the graph indicates an aspect of communication executed among multiple communication bodies. For example, in a case where communication processing is executed to transmit information from the vertex a to the vertex b, the model generation unit 102 may identify an arrow from the vertex a to the vertex b, and may update a frequency attached as a label of the identified arrow on the basis of the date and time of the communication processing. The data structure exemplified in
Subsequently, processing for generating graph information relating to communication executed by communication bodies and storing the graph information to the communication database 503 will be explained with reference to
For convenience of explanation, the communication bodies are assumed to be hosts (i.e., a host a to a host d). The host a to the host d are assumed to have an agent a to an agent d, respectively, for monitoring communication of the hosts. More specifically, the agent a to the agent d are assumed to be resident on the host a to the host d, respectively.
In a case where the host a executes communication (i.e., communication occurs) (YES in step S301), the agent a notifies communication information about the communication (exemplified in
The converter 1003 reads the identifier of the transmission host relating to a certain communication, the identifier of the reception host relating to the communication, the date and time when the communication is executed, the protocol of the communication, and the communication quantity of the communication from the communication information received from each agent. The converter 1003 converts the read information to the graph information (for example,
For example, in a case where the graph information is updated in the communication database 503, the model generation unit 102 may generate a model based on the updated graph information. For example, the model generation unit 102 executes processing such as reading a time from the updated graph information, classifying the read time into each time zone, and calculating the frequency of communication executed within each time zone, so that the model generation unit 102 generates a model (step S305). The processing for generating the model will be explained later in detail for each of the types of “novelty”, “time zone”, and the like. The model generation unit 102 stores the generated model into the communication database 503 as a label of an edge connecting the identifier of the transmission host and the identifier of the reception host (step S306).
A procedure for generating a model in step S305 in the model generation unit 102 will be explained in a more specific manner. The processing for generating a model in the model generation unit 102 will be explained with reference to an example where the type is, for example, “novelty”, “time zone”, “communication frequency”, and “communication quantity”, respectively.
Processing for generating a model in the model generation unit 102 in case that the type is “novelty” will be explained.
The model generation unit 102 generates a histogram representing a history of communication frequency, based on graph information stored in the communication database 503. In this case, for example, the model generation unit 102 reads the date and time (timing) relating to communication executed in accordance with a certain protocol between the transmission host and the reception host from the graph information. The model generation unit 102 classifies the read timing into a predetermined period, and calculates the communication frequency in the period, so that the model generation unit 102 generates the histogram.
For example, in case that there is a period in which the frequency is zero, the model generation unit 102 may add a small value (for example, 1) to the frequency of each period for which the histogram is calculated. In this case, for example, even in a case where the frequency is not included in the graph information stored in the communication database 503, the model generation unit 102 calculates the frequency on the basis of the small value. In this case, the model generation unit 102 generates a model where execution of communication in the period is assumed. Therefore, the model generation unit 102 generates the appropriate histogram.
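The histogram with the added small value can be sketched as follows. The function name frequency_histogram, the use of one calendar day as the predetermined period, and the sample timings are assumptions for illustration.

```python
from collections import Counter
from datetime import date, datetime


def frequency_histogram(timestamps, periods, smoothing=1):
    """Count communications per period, adding a small value to every period."""
    counts = Counter(ts.date() for ts in timestamps)  # period assumed to be a day
    return {p: counts.get(p, 0) + smoothing for p in periods}


history = [
    datetime(2015, 1, 1, 10, 0),
    datetime(2015, 1, 1, 12, 0),
    datetime(2015, 1, 3, 9, 0),
]
periods = [date(2015, 1, 1), date(2015, 1, 2), date(2015, 1, 3)]
histogram = frequency_histogram(history, periods)
```

Because the small value is added to every period, the period with no recorded communication (2 January in this sample) still has a non-zero frequency.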
The model generation unit 102 generates a model by, e.g., switching a high level of frequency and a low level of frequency in the histogram. For example, in case that the frequency in the histogram is high, the model generation unit 102 sets the irregularity degree low. In case that the frequency in the histogram is low, the model generation unit 102 sets the irregularity degree high. As a result, the model generation unit 102 generates a model exemplified in
In case that the type is “novelty”, the estimation unit 103 calculates the frequency of communication executed during a certain period. The estimation unit 103 reads a model included in the graph information generated by the model generation unit 102 on the basis of a result obtained by referring to the communication database 503 and applies the read model to the calculated frequency, so that the estimation unit 103 calculates the irregularity degree. In this case, in case that communication is executed in a period with low frequency, the estimation unit 103 estimates that the communication is irregular. Therefore, as illustrated in
Processing for generating a model in the model generation unit 102 in case that the type is “time zone” and further the option is “Time of the Day” will be explained.
The model generation unit 102 generates a histogram representing a history of communication frequency in a certain time zone on the basis of the graph information stored in the communication database 503. In this case, for example, the model generation unit 102 classifies a timing of communication between the transmission host and the reception host in accordance with a certain protocol into multiple time zones, and calculates the frequency in the time zones, so that the model generation unit 102 generates the histogram. For example, the model generation unit 102 generates a histogram relating to each of time zones generated by dividing a day.
The model generation unit 102 generates a model as exemplified in
The estimation unit 103 calculates a time zone including a timing of a certain communication. The estimation unit 103 reads a model included in the graph information generated by the model generation unit 102 from the communication database 503 and applies the read model to the calculated time zone, so that the estimation unit 103 calculates the irregularity degree. Hereinafter, the estimation unit 103 estimates whether a communication is irregular by executing processing similar to the above processing.
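For the “Time of the Day” option, model generation and its application can be sketched as follows. Dividing a day into 24 one-hour time zones and the formula 1 - frequency / peak frequency are assumptions chosen for illustration; the specification only requires that a low frequency give a high irregularity degree.

```python
from datetime import datetime


def build_time_of_day_model(timestamps, smoothing=1):
    """Return 24 irregularity degrees, one per one-hour time zone."""
    counts = [smoothing] * 24
    for ts in timestamps:
        counts[ts.hour] += 1
    peak = max(counts)
    # Assumed inversion: the busiest time zone gets degree 0.0.
    return [1.0 - c / peak for c in counts]


def irregularity(model, ts):
    """Look up the degree of the time zone that contains the timing ts."""
    return model[ts.hour]


history = [
    datetime(2015, 1, 1, 9, 5),
    datetime(2015, 1, 2, 9, 10),
    datetime(2015, 1, 3, 9, 15),
    datetime(2015, 1, 3, 10, 0),
]
model = build_time_of_day_model(history)
night = irregularity(model, datetime(2015, 1, 4, 3, 0))
morning = irregularity(model, datetime(2015, 1, 4, 9, 30))
```

With this sample history, a communication at 3 o'clock falls in a time zone with no recorded communication and receives a higher degree than one at 9 o'clock, the busiest time zone.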
In case that the type is “time zone”, the estimation unit 103 estimates that a communication is irregular, when the communication is executed in a time zone where the communication frequency is low. More specifically, in the example illustrated in
Processing for generating a model in the model generation unit 102 in case that the type is “communication frequency” will be explained.
The model generation unit 102 generates a histogram representing a history of communication frequency on the basis of the communication information stored in the communication database 503. In this case, for example, the model generation unit 102 calculates a time interval of communication between the transmission host and the reception host in accordance with a certain protocol. The model generation unit 102 divides the calculated interval into sections, and calculates the frequency in each of the sections, so that the model generation unit 102 generates a histogram.
The model generation unit 102 generates a model as exemplified in
The estimation unit 103 calculates an interval of a certain communication with regard to the certain communication. The estimation unit 103 reads a model included in the graph information generated by the model generation unit 102 from the communication database 503 and applies the read model to the calculated communication interval, so that the estimation unit 103 calculates the irregularity degree. Hereinafter, the estimation unit 103 estimates whether a communication is irregular by executing processing similar to the above processing.
As described above, in case that the type is “communication frequency”, the model generation unit 102 calculates an interval between timings of two communications. For example, the event estimation device 101 may include a state (not shown) storing a timing of an immediately preceding communication.
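The state storing the timing of the immediately preceding communication can be sketched as follows. The class name IntervalTracker and the keying by transmission host, reception host, and protocol are assumptions for illustration.

```python
from datetime import datetime


class IntervalTracker:
    """Hypothetical state holding the last timing per (sender, receiver, protocol)."""

    def __init__(self):
        self._last = {}

    def observe(self, sender, receiver, protocol, ts):
        """Record ts and return the interval in seconds since the previous timing."""
        key = (sender, receiver, protocol)
        previous = self._last.get(key)
        self._last[key] = ts
        if previous is None:
            return None  # no preceding communication, so no interval yet
        return (ts - previous).total_seconds()


tracker = IntervalTracker()
first = tracker.observe("host_a", "host_b", "TCP/80", datetime(2015, 1, 1, 10, 0))
second = tracker.observe("host_a", "host_b", "TCP/80", datetime(2015, 1, 1, 10, 5))
```

The interval returned for the second communication (300 seconds in this sample) is the parameter to which the model would be applied.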
Processing for generating a model in the model generation unit 102 in case that the type is “communication quantity” will be explained.
The model generation unit 102 generates a histogram representing a history of communication frequency on the basis of the graph information stored in the communication database 503. In this case, for example, the model generation unit 102 reads communication quantities transmitted and received in a communication between the transmission host and the reception host in accordance with a certain protocol. Subsequently, the model generation unit 102 classifies the read communication quantities into sections, and calculates the frequency in each of the sections to generate a histogram. In this case, the frequency represents a frequency of a certain communication quantity measured with regard to communication executed between the transmission host and the reception host in accordance with a certain protocol within a certain time.
The model generation unit 102 generates a model as exemplified in
The estimation unit 103 calculates the communication quantity transmitted and received in a certain communication. The estimation unit 103 reads the model generated by the model generation unit 102 from the communication database 503 and applies the read model to the calculated communication quantity, thereby calculating the irregularity degree. Thereafter, the estimation unit 103 estimates whether the communication is irregular by executing processing similar to the processing described above.
In a case where the type is “communication quantity”, a communication is likely to be irregular when its communication quantity differs from the communication quantity that is normally transmitted and received. Therefore, the model generation unit 102 generates a model in which the irregularity degree is lower as the communication quantity is closer to the normally transmitted and received communication quantity, and higher as the communication quantity deviates further from it.
In a case where the type is “communication quantity”, it is necessary to calculate the sum of the communication quantities within a window time (i.e., a certain time). Therefore, the event estimation device 101 may have a state (not shown) capable of storing the communications within the window time.
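The histogram-based model for “communication quantity” can be sketched as follows. The fixed section width and the formula (one minus the relative frequency of the section) are assumptions chosen only to reproduce the stated tendency that a rarely observed quantity yields a higher irregularity degree; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def build_histogram(quantities: Iterable[int], section_width: int) -> Counter:
    """Classify communication quantities into sections and count each section."""
    return Counter(quantity // section_width for quantity in quantities)


def irregularity_degree(histogram: Counter, quantity: int, section_width: int) -> float:
    """Assumed model: 1 - relative frequency of the section containing `quantity`."""
    total = sum(histogram.values())
    if total == 0:
        return 1.0  # no history at all: treat as maximally irregular (assumption)
    return 1.0 - histogram[quantity // section_width] / total


# Past quantities (in bytes) between one host pair for one protocol.
history = [100, 120, 130, 900]
model = build_histogram(history, section_width=100)

usual = irregularity_degree(model, 110, section_width=100)     # frequent section
unusual = irregularity_degree(model, 5000, section_width=100)  # never observed
```

In this sketch a quantity near the usual values receives a low degree and a quantity that never appeared in the history receives the highest degree.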
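A state that keeps the communications within the window time might look like the following sketch. The class name `WindowState`, the use of seconds, and the assumption that communications arrive in time order are illustrative choices, not details given in the embodiment.

```python
from collections import deque
from typing import Deque, Tuple


class WindowState:
    """Holds (timing, quantity) pairs that fall within the window time."""

    def __init__(self, window_time: float) -> None:
        self.window_time = window_time
        self._events: Deque[Tuple[float, int]] = deque()
        self._total = 0

    def add(self, timing: float, quantity: int) -> int:
        """Record a communication and return the summed quantity in the window.

        Assumes that calls are made in non-decreasing order of `timing`.
        """
        self._events.append((timing, quantity))
        self._total += quantity
        # Discard communications that are at least `window_time` older than `timing`.
        while self._events and self._events[0][0] <= timing - self.window_time:
            _, expired_quantity = self._events.popleft()
            self._total -= expired_quantity
        return self._total


window = WindowState(window_time=60.0)
window.add(0.0, 100)
window.add(30.0, 50)
current_sum = window.add(70.0, 10)  # the communication at timing 0.0 has expired
```

Keeping a running total avoids re-summing all stored communications each time a new communication is received.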
A procedure for executing processing in accordance with a query in a case where communication is executed on a host will be explained with reference to
For example, in a case where communication is executed among the host a to the host d (i.e., communication occurs) (YES in step S401), the agent a to the agent d transmit communication information about the communication to the converter (step S402). The converter receives the communication information, and converts the received communication information into graph information (step S403). The processing shown in step S401 to step S403 is similar to the processing shown in step S301 to step S303 illustrated in
Before searching for a query that matches the communication information, the query execution unit 104 calculates the irregularity degree relating to the communication included in the communication information on the basis of the model stored in the communication database 503 (step S405).
Hereinafter, operations for each item that can be set to the type will be explained.
First, in a case where the type is “novelty”, the query execution unit 104 reads, from the communication database 503, a model associated with the transmission host information, the reception host information, and the protocol that are included in the received communication information. The query execution unit 104 calculates the irregularity degree by applying the read model to the information about the timing when the communication is executed.
In a case where the type is “time zone”, the query execution unit 104 reads, from the communication database 503, a model associated with the transmission host information, the reception host information, and the protocol that are included in the received communication information. Then, the query execution unit 104 calculates the irregularity degree by applying the read model to the timing when the communication is executed.
In a case where the type is “communication frequency”, the query execution unit 104 reads, from the communication database 503, a model associated with the transmission host information, the reception host information, and the protocol that are included in the received communication information. The query execution unit 104 calculates the difference between the timing of the communication included in the communication information and the timing at which the immediately preceding communication of the same protocol in the same section was executed, and applies the read model to the calculated difference, thereby calculating the irregularity degree.
In a case where the type is “communication quantity”, the query execution unit 104 reads, from the communication database 503, a model associated with the transmission host information, the reception host information, and the protocol that are included in the received communication information. The query execution unit 104 calculates the sum of the communication quantities, within the window time designated by the query, of the received communication information and of the communication information held in the state for communications with the same protocol in the same section, and applies the read model to the calculated sum, thereby calculating the irregularity degree.
The query execution unit 104 searches for a query matching (agreeing with) the communication information from among the stored queries (step S406). The query execution unit 104 estimates that a query matches the communication information in a case where the calculated irregularity degree is more than the threshold value. In a case where there exists a matching query (YES in step S407), the query execution unit 104 notifies the operator 1006 of the matching query via the query IF (step S408). For a model of a type (“communication frequency”, “communication quantity”, and the like) that requires past communication information, the query execution unit 104 may store the communication information (step S409).
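The matching in steps S406 and S407 can be sketched as below. The fields of `Query` loosely follow the type, threshold value, transmission host, reception host, and protocol items of the reference signs list, but the data layout, the exact-match comparison of hosts and protocol, and the function name `matching_queries` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Query:
    type: str               # e.g. "novelty", "communication quantity"
    threshold: float        # threshold value for the irregularity degree
    transmission_host: str
    reception_host: str
    protocol: str


@dataclass(frozen=True)
class Communication:
    transmission_host: str
    reception_host: str
    protocol: str


def matching_queries(
    queries: List[Query],
    communication: Communication,
    degree_by_type: Dict[str, float],
) -> List[Query]:
    """Return the stored queries whose threshold is exceeded by the irregularity degree."""
    matches = []
    for query in queries:
        same_section = (
            query.transmission_host == communication.transmission_host
            and query.reception_host == communication.reception_host
            and query.protocol == communication.protocol
        )
        degree = degree_by_type.get(query.type)
        # A query matches only when the degree is strictly more than its threshold.
        if same_section and degree is not None and degree > query.threshold:
            matches.append(query)
    return matches
```

Here `degree_by_type` stands for the irregularity degrees calculated in step S405, one per type; the queries returned would be the ones reported to the operator in step S408.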
Subsequently, the advantages relating to the event estimation device 101 according to the first example embodiment will be explained.
The event estimation device 101 can estimate whether a communication is irregular with a high degree of accuracy. This is because the model generation unit 102 calculates a model appropriate for calculating the irregularity degree.
The irregularity detection device disclosed in PTL 1 calculates a percentile relating to an event stored in a history on the basis of the history of events that have occurred. Subsequently, the irregularity detection device discovers an irregular event based on the calculated percentile. However, in a case where the number of events that have occurred is small, for example, the history does not necessarily store events of all types. Therefore, the irregularity detection device does not necessarily discover an irregular event.
In contrast, the model generation unit 102 generates an appropriate model by executing the processing explained above. The model generation unit 102 generates a model in which the irregularity degree is high in a case where the communication frequency is low, and in which the irregularity degree is low in a case where the communication frequency is high. The estimation unit 103 determines whether a communication is irregular in accordance with the model. Therefore, the event estimation device 101 can estimate whether a communication is irregular with a high degree of accuracy.
Further, in a case where there is a section in which the frequency is zero, for example, the model generation unit 102 adds a small value (for example, one) to the frequency in each section, so that the model generation unit 102 can generate a model with which the irregularity degree relating to the communication can be calculated appropriately. Therefore, the event estimation device 101 accurately estimates whether a communication is irregular based on an appropriate model.
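The effect of adding a small value to every section can be illustrated with a short sketch. The negative-logarithm form of the irregularity degree is an assumption used here only because it makes the problem with a zero frequency visible (the logarithm of zero is undefined); the function names are hypothetical.

```python
import math
from typing import List


def smoothed_probabilities(frequencies: List[int], small_value: float = 1.0) -> List[float]:
    """Add `small_value` to the frequency of each section, then normalize."""
    smoothed = [frequency + small_value for frequency in frequencies]
    total = sum(smoothed)
    return [value / total for value in smoothed]


def irregularity_degrees(frequencies: List[int], small_value: float = 1.0) -> List[float]:
    """Assumed model: irregularity degree = -log(probability of the section)."""
    return [-math.log(p) for p in smoothed_probabilities(frequencies, small_value)]


# The second section has a frequency of zero in the history.
frequencies = [8, 0, 2]
degrees = irregularity_degrees(frequencies)
```

With the small value added, the section that never occurred still receives a finite degree, and it remains the highest of the three, which is consistent with a lower frequency giving a higher irregularity degree.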
In a case where the type is “novelty”, the event estimation device 101 according to the present example embodiment can estimate whether a communication is irregular with a high degree of accuracy. This is because, in many cases, communications are frequently executed within a certain period and are seldom executed outside that period.
The reason why the event estimation device 101 according to the present example embodiment can estimate whether a communication is irregular with a high degree of accuracy in a case where the type is “novelty” will be explained in detail. As explained above for the processing relating to the case where the type is “novelty”, the relationship between the frequency and the irregularity degree is such that a communication of a lower frequency has a higher irregularity degree, and a communication of a higher frequency has a lower irregularity degree. In a case where a communication is executed at a timing away from a period in which communications are frequently executed, the communication is likely to be irregular. In accordance with the processing explained above, the event estimation device 101 determines that a communication executed at a timing away from a period in which communications are frequently executed is irregular. Therefore, the event estimation device 101 according to the present example embodiment can estimate whether a communication is irregular with a high degree of accuracy.
In a case where the type is “time zone”, the event estimation device 101 according to the present example embodiment can estimate whether a communication is irregular with a high degree of accuracy.
The reason why the event estimation device 101 according to the present example embodiment can estimate whether a communication is irregular with a high degree of accuracy in a case where the type is “time zone” will be explained. The relationship between the frequency and the irregularity degree is such that a communication executed in a time zone in which similar communication events (communications) seldom occur has a higher degree of irregularity, and a communication executed in a time zone in which similar (or the same) communications are frequently executed has a lower degree of irregularity. Therefore, the event estimation device 101 according to the present example embodiment generates a model in which a time zone with a lower frequency has a higher irregularity degree and a time zone with a higher frequency has a lower irregularity degree. Because such a model is appropriate, the irregularity of communications can be determined accurately.
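A “time zone” model of this kind can be sketched by treating each hour of the day as one section. The one-hour granularity, the formula (one minus the share of past communications in that hour), and the function names are assumptions for illustration.

```python
from datetime import datetime
from typing import Iterable, List


def hourly_histogram(timings: Iterable[datetime]) -> List[int]:
    """Count past communications per hour of day (24 sections)."""
    counts = [0] * 24
    for timing in timings:
        counts[timing.hour] += 1
    return counts


def time_zone_irregularity(counts: List[int], timing: datetime) -> float:
    """Assumed model: 1 - share of past communications in the same hour."""
    total = sum(counts)
    if total == 0:
        return 1.0  # no history: treat as maximally irregular (assumption)
    return 1.0 - counts[timing.hour] / total


history = [
    datetime(2015, 9, 7, 9, 0),
    datetime(2015, 9, 7, 9, 30),
    datetime(2015, 9, 8, 9, 45),
    datetime(2015, 9, 8, 10, 15),
]
counts = hourly_histogram(history)

daytime = time_zone_irregularity(counts, datetime(2015, 9, 9, 9, 10))
night = time_zone_irregularity(counts, datetime(2015, 9, 9, 3, 0))
```

A communication in the hour that dominates the history receives a low degree, whereas a communication at an hour with no past communications receives the highest degree.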
In a case where the type is “communication frequency”, the event estimation device 101 according to the present example embodiment can estimate whether a communication is irregular with a high degree of accuracy.
The reason why the event estimation device 101 according to the present example embodiment can estimate whether a communication is irregular with a high degree of accuracy in a case where the type is “communication frequency” will be explained. When communications are executed with an interval different from the normal interval, this indicates that an irregular phenomenon occurs. The event estimation device 101 employs, as the frequency, an interval between a communication timing and a subsequent communication timing, and the event estimation device 101 generates a model such that in a case where the frequency of the interval is lower, the irregularity degree is higher, and in a case where the frequency of the interval is higher, the irregularity degree is lower. Therefore, the event estimation device 101 according to the present example embodiment can generate an appropriate model.
In a case where the type is “communication quantity”, the event estimation device 101 according to the present example embodiment can estimate whether a communication is irregular with a high degree of accuracy.
The reason why the event estimation device 101 according to the present example embodiment can estimate whether a communication is irregular with a high degree of accuracy in a case where the type is “communication quantity” will be explained. When an information quantity different from a normal information quantity is communicated, this indicates that an irregular phenomenon occurs. The event estimation device 101 employs, as the frequency, a communication quantity for a certain period of time, and the event estimation device 101 generates a model such that in a case where the frequency of the communication quantity is lower, the irregularity degree is higher, and in a case where the frequency of the communication quantity is higher, the irregularity degree is lower. Therefore, the event estimation device 101 according to the present example embodiment can generate an appropriate model.
Therefore, the event estimation device 101 according to the present example embodiment can estimate whether a communication is irregular with a high degree of accuracy.
Subsequently, the second example embodiment of the present invention, which is based on the first example embodiment explained above, will be explained.
In the following explanation, characteristic portions relating to the present example embodiment will be mainly described, and the same reference numerals are given to the same configurations as those of the first example embodiment described above, and redundant explanation will be omitted.
The configuration of the event estimation device 201 according to the second example embodiment and the processing performed by the event estimation device 201 will be described with reference to
The event estimation device 201 according to the second example embodiment includes a communication extraction unit 202, a model generation unit 102, and an estimation unit 103.
Graph information (for example,
For example, in accordance with the updating of the graph information in the communication database 503, the communication extraction unit 202 reads, from the communication database 503, a communication having a high degree of similarity to the communication included in the updated graph information (step S501). The degree of similarity represents how similar two communications are. For convenience of explanation, the read communication will be referred to as “first communication”. In this case, a high degree of similarity indicates that the two communications are similar or the same.
For example, in a case where various kinds of information about communications are associated with edges in the graph information, the communication extraction unit 202 may calculate the degree of similarity on the basis of the information. For example, in a case where the information is represented with a symbol or a numerical value, the distance between pieces of the information can be calculated, and the distance can be employed as the degree of similarity.
In a case where the calculated degree of similarity is more than the predetermined value, the communication extraction unit 202 estimates that the communication is similar to (or the same as) information included in the graph information. In a case where the calculated degree of similarity is less than the predetermined value, the communication extraction unit 202 estimates that the communication is not similar to (or not the same as) information included in the graph information.
The communication extraction unit 202 selects a communication having a high degree of similarity by executing the processing described above (step S501).
Alternatively, the communication extraction unit 202 may select similar (or the same) information by applying a clustering algorithm to the symbol or numerical value representing the information.
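The distance-based selection can be sketched as follows. The Euclidean distance, the conversion of a distance into a similarity by 1 / (1 + distance) so that a smaller distance gives a higher degree of similarity, and the numerical feature vectors are all assumptions for illustration; the embodiment does not fix a particular distance or conversion.

```python
import math
from typing import Dict, List, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed conversion: similarity = 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def select_similar(
    candidates: Dict[str, Sequence[float]],
    updated: Sequence[float],
    predetermined_value: float,
) -> List[str]:
    """Return the communications whose similarity to `updated` exceeds the value."""
    return [
        name
        for name, features in candidates.items()
        if similarity(features, updated) > predetermined_value
    ]


# Hypothetical numerical information associated with edges: (port, quantity in KB).
candidates = {
    "c1": (80.0, 1.0),
    "c2": (80.0, 1.5),
    "c3": (443.0, 20.0),
}
selected = select_similar(candidates, updated=(80.0, 1.0), predetermined_value=0.5)
```

The selected communications are the ones that would be passed to the model generation unit 102 for histogram generation.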
The model generation unit 102 generates a model relating to the communication by generating the histogram as described above with regard to the communication selected by the communication extraction unit 202 (step S101).
Subsequently, the estimation unit 103 calculates the irregularity degree by applying the generated model (step S102). The estimation unit 103 determines whether the calculated irregularity degree satisfies a criterion (step S103). In a case where the calculated irregularity degree is more than the threshold value (YES in step S103), the estimation unit 103 associates the communication with a label indicating an irregular communication (step S104). In a case where the calculated irregularity degree does not satisfy the criterion (NO in step S103), the estimation unit 103 associates the communication with a label indicating a non-irregular communication (step S105). In step S104 or step S105, the estimation unit 103 associates the communication with the label; alternatively, the estimation unit 103 may classify the communication as an irregular communication or a non-irregular communication on the basis of whether the irregularity degree is more than the threshold value.
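Steps S103 to S105 amount to a threshold comparison, which can be sketched as below. The label strings, the function names, and the dictionary of irregularity degrees keyed by a communication identifier are hypothetical.

```python
from typing import Dict, List, Tuple

IRREGULAR = "irregular"
NON_IRREGULAR = "non-irregular"


def label(degree: float, threshold: float) -> str:
    """Step S103: a degree strictly more than the threshold is labeled irregular."""
    return IRREGULAR if degree > threshold else NON_IRREGULAR


def classify(degrees: Dict[str, float], threshold: float) -> Tuple[List[str], List[str]]:
    """Split communications into irregular and non-irregular groups."""
    irregular: List[str] = []
    non_irregular: List[str] = []
    for communication_id, degree in degrees.items():
        if label(degree, threshold) == IRREGULAR:
            irregular.append(communication_id)      # corresponds to step S104
        else:
            non_irregular.append(communication_id)  # corresponds to step S105
    return irregular, non_irregular


irregular_ids, non_irregular_ids = classify(
    {"c1": 0.95, "c2": 0.10, "c3": 0.80}, threshold=0.8
)
```

A degree equal to the threshold value is treated as non-irregular here, because the criterion is stated as being more than the threshold value.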
Subsequently, the effects of the event estimation device 201 according to the second example embodiment will be explained.
The event estimation device 201 according to the present example embodiment can estimate whether a communication is irregular with a still higher degree of accuracy. There are two reasons for this, Reason 1 and Reason 2.
(Reason 1) The configuration of the event estimation device 201 according to the second example embodiment includes the configuration of the event estimation device 101 according to the first example embodiment.
(Reason 2) The communication extraction unit 202 selects a communication having a high similarity degree so that the model generation unit 102 can generate an appropriate model.
(Hardware Configuration Example)
A configuration example of hardware resources that realize the event estimation device in the above-described example embodiments of the present invention using a single calculation processing apparatus (an information processing apparatus or a computer) will be described. However, the event estimation device may be realized using at least two physically or functionally separate calculation processing apparatuses. Further, the event estimation device may be realized as a dedicated apparatus.
The non-volatile recording medium 24 is, for example, a computer-readable Compact Disc, Digital Versatile Disc, Universal Serial Bus (USB) memory, or Solid State Drive. The non-volatile recording medium 24 allows a related program to be held and carried without a power supply. The non-volatile recording medium 24 is not limited to the above-described media. Further, a related program can be carried via a communication network by way of the communication I/F 27 instead of the non-volatile recording medium 24.
In other words, when executing a software program (a computer program: hereinafter, referred to simply as a “program”) stored on the disk 23, the CPU 21 copies the program onto the memory 22 and executes arithmetic processing. The CPU 21 reads data necessary for program execution from the memory 22. When display is needed, the CPU 21 displays an output result on the display 28. When a program is input from the outside, the CPU 21 reads the program from the input device 25. The CPU 21 interprets and executes an event estimation program present on the memory 22 corresponding to a function (processing) indicated by each unit illustrated in
In other words, in such a case, it is conceivable that the present invention can also be made using the event estimation program. Further, it is conceivable that the present invention can also be made using a computer-readable, non-transitory recording medium storing the event estimation program.
The present invention has been described using the above-described example embodiments as example cases. However, the present invention is not limited to the above-described example embodiments. In other words, the present invention is applicable with various aspects that can be understood by those skilled in the art without departing from the scope of the present invention.
This application is based upon and claims the benefit of priority from Japanese patent application No. 2014-184088, filed on Sep. 10, 2014, the disclosure of which is incorporated herein in its entirety.
101 Event estimation device
102 Model generation unit
103 Estimation unit
104 Query execution unit
503 Communication database
a Vertex
b Vertex
c Vertex
d Vertex
201 Event estimation device
202 Communication extraction unit
301 Type IF
302 Threshold value IF
303 Option IF
304 Transmission host IF
305 Reception host IF
306 Protocol IF
20 Calculation processing device
21 CPU
22 Memory
23 Disk
24 Non-volatile recording medium
25 Input device
26 Output device
27 Communication IF
28 Display
1001a Host
1002a Agent
1001b Host
1002b Agent
1001c Host
1002c Agent
1001d Host
1002d Agent
1003 Converter
1004 Communication database
1005 Interface
1006 Operator
Number | Date | Country | Kind |
---|---|---|---|
2014-184088 | Sep 2014 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2015/004523 | 9/7/2015 | WO | 00 |