Embodiments described herein relate generally to simulations of malware propagation through computer networks.
Conventional malware protection mechanisms are reactive to the detection of malware in a network or the widespread distribution of anti-malware measures. Such approaches are known as “diagnosis and treatment”. Mitigation measures such as anti-malware or malware-specific protective measures may not be known for some time after an infection has been studied for its effects. Accordingly, it is beneficial to provide improvements in the simulation of the propagation of such infections through computer networks, thereby allowing for faster and more appropriate selection of malware protection mechanisms.
The present application relates to simulation of a network and, in particular, a network subject to a threat or attack such as malware or the like. The simulation is arranged to simulate the propagation of the threat through the network as each entity in the network (i.e. each system or device) goes through a process of being susceptible to infection, then infected, then detected (i.e. infection is detected), then ultimately removed (e.g. the infection is either remediated, mitigated or the entity is disconnected/removed from the network).
The reproduction number R0 (i.e. how many secondary computer systems are infected by a primary computer system on average) constitutes a key characteristic of an infectious threat. Determining this number for a given malware is particularly challenging in practice because it is difficult or impossible to attribute secondary infection cases to a primary infection case. For this reason, values of R0 are typically estimated.
In the present application, the inventors propose using the simulation to determine one or more values of R0. The simulation can be used to overcome the challenge of attributing secondary infected entities to primary infected entities and so a value for R0 can be calculated for the simulated entities, rather than estimated.
In use, the simulation can be used to forecast R0 for a given threat. Such a forecast can initially be made without any responsive measures. The simulation can then be executed again for each of a number of possible responsive measures, with R0 forecast for each deployment, so that the forecast R0 values can be compared. This allows an appropriate responsive measure to be selected for a real-world system.
In accordance with a first aspect of the invention, there is provided a computer-implemented method of simulating a propagation of a malware through a set of computer systems, the method comprising: identifying a plurality of first simulated computer systems infected with a simulated malware; for each of the first simulated computer systems, infecting a number of neighbouring second simulated computer systems, the number being zero or a positive integer; and determining a value of a reproduction number, R0, based on the total number of second simulated computer systems and the number of first simulated computer systems.
The method may include repeating the steps of identifying a plurality of first simulated computer systems, infecting a number of neighbouring second simulated computer systems and determining a value of a reproduction number, R0, over a plurality of time periods, such that a value of the reproduction number is determined for each of the time periods.
The method may include, for each of the time periods: deploying one or more simulated malware protection measures configured to inhibit the propagation of the simulated malware; and associating the value of the reproduction number, R0, determined for the time period with the simulated malware protection measures for the time period.
The method may include: deploying one or more simulated malware protection measures configured to inhibit the propagation of the simulated malware; and associating the value(s) of the reproduction number, R0, with the simulated malware protection measures.
The simulated malware protection measure may include one or more of: an anti-malware facility; a malware filter; a malware detector; a block, preclusion or cessation of interaction; and a reconfiguration of one or more simulated computer systems.
Determining the value of the reproduction number, R0, may include: obtaining the total number of second simulated computer systems by summing the numbers of second simulated computer systems; determining the number of first simulated computer systems; and dividing the total number of second simulated computer systems by the number of first simulated computer systems.
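The sum-and-divide calculation just described can be sketched in a few lines of Python. This is a minimal illustration only; the function name reproduction_number and the list-of-counts input are assumptions made for the example and do not appear in the application.

```python
from typing import Sequence


def reproduction_number(secondary_counts: Sequence[int]) -> float:
    """Divide the total number of second systems by the number of first systems.

    secondary_counts[i] is the number of second simulated computer systems
    infected by the i-th first simulated computer system (zero is allowed).
    """
    if not secondary_counts:
        raise ValueError("at least one first simulated computer system is required")
    total_secondary = sum(secondary_counts)          # sum over all first systems
    return total_secondary / len(secondary_counts)   # average per first system


# Two first systems that infected one and two neighbours respectively.
print(reproduction_number([1, 2]))
```

A first system that infects no neighbours still counts in the denominator, which is why a zero entry lowers the resulting value.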
The number of second simulated computer systems infected by each first simulated computer system may be determined according to an infection rate.
The method may include identifying one or more simulated computer systems as being susceptible to the simulated malware; and/or identifying one or more simulated computer systems as being insusceptible to the simulated malware.
The method may include determining a value of the effective reproduction number, Rt, based on the value of the reproduction number and the proportion of simulated computer systems identified as being susceptible to the simulated malware.
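As a hedged sketch, the effective reproduction number can be computed by scaling R0 by the susceptible proportion. The function name and the choice to pass node counts rather than a pre-computed proportion are assumptions made for this example.

```python
def effective_reproduction_number(r0: float, susceptible: int, total: int) -> float:
    """Return Rt = R0 * p, where p is the proportion of susceptible systems."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= susceptible <= total:
        raise ValueError("susceptible must lie between 0 and total")
    proportion = susceptible / total
    return r0 * proportion


# With every system susceptible, Rt equals R0.
print(effective_reproduction_number(1.5, susceptible=7, total=7))
```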
In accordance with a second aspect of the invention, there is provided a computer-implemented malware protection method to protect at least a subset of a set of computer systems from a malware, the method comprising: accessing a model of the set of computer systems; simulating a propagation of the malware through the set of computer systems using the model, wherein the simulating comprises any one of the methods set out above; and identifying, based on the determined value(s) of the reproduction number, R0, one or more malware protection measures to be deployed to one or more of the set of computer systems.
The method may include deploying the one or more malware protection measures to the one or more computer systems.
In accordance with a third aspect of the invention, there is provided a system including one or more processors and a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform any one of the methods set out above.
In accordance with a fourth aspect of the invention, there is provided a computer program comprising instructions that, when executed by a processor, cause the processor to perform any one of the methods set out above.
In the following, embodiments will be described with reference to the drawings in which:
Simulations of the present application model the propagation of a threat/infection/malware across the network based on modelled network communications and interactions between entities. Conventionally, such simulations employ a variety of parameters in such a model including: infection rates; detection rates; removal rates and the like. In conventional simulations, these rates are defined for the entire simulation, or at least a network or sub-network.
Malicious software, also known as computer contaminants or malware, is software that is intended to do direct or indirect harm in relation to one or more computer systems. Such harm can manifest as the disruption or prevention of the operation of all or part of a computer system, accessing private, sensitive, secure and/or secret data, software and/or resources of computing facilities, or the performance of illicit, illegal or fraudulent acts. Malware includes, inter alia, computer viruses, worms, botnets, trojans, spyware, adware, rootkits, keyloggers, dialers, malicious browser extensions or plugins and rogue security software.
Malware proliferation can occur in a number of ways. Malware can be communicated as part of an email such as an attachment or embedding. Alternatively, malware can be disguised as, or embedded, appended or otherwise communicated with or within, genuine software. Some malware is able to propagate via storage devices such as removable, mobile or portable storage including memory cards, disk drives, memory sticks and the like, or via shared or network attached storage. Malware can also be communicated over computer network connections such as the internet via websites or other network facilities or resources. Malware can propagate by exploiting vulnerabilities in computer systems such as vulnerabilities in software or hardware components including software applications, browsers, operating systems, device drivers or networking, interface or storage hardware.
A vulnerability is a weakness in a computer system, such as a computer, operating system, network of connected computers or one or more software components such as applications. Such weaknesses can manifest as defects, errors or bugs in software code that present an exploitable security weakness. An example of such a weakness is a buffer-overrun vulnerability, in which, in one form, an interface designed to store data in an area of memory allows a caller to supply more data than will fit in the area of memory. The extra data can overwrite executable code stored in the memory and thus such a weakness can permit the storage of malicious executable code within an executable area of memory. An example of such malicious executable code is known as ‘shellcode’ which can be used to exploit a vulnerability by, for example, the execution, installation and/or reconfiguration of resources in a computer system. Such weaknesses, once exploited, can bootstrap a process of greater exploitation of a target system, and propagation of the malware to other computer systems. The effects of malware on the operation and/or security of a computer system lead to a need to identify malware in a computer system in order to implement protective and/or remedial measures.
While malware detection is often directed to computer systems themselves or the networks over which they communicate, embodiments of the present invention recognise that interactions between computer systems transcend the physical interconnections therebetween.
In particular, embodiments of the present invention are directed to addressing interactions between electronic devices or computer systems that arise from communication between pairs of electronic devices or computer systems in a network. Such interactions can include, for example, interactions between users of each of a pair of electronic devices or computer systems using, inter alia, social media, messaging, electronic mail or file sharing facilities. Thus, embodiments of the present invention employ a model or simulation of a set of electronic devices or computer systems in which interacting pairs of computer systems are identified, such interactions being based on previous communication occurring between the electronic devices or computer systems in the pair.
Notably, such a model (or simulation) may disregard intermediates in an interaction—such as physical resources or other computer systems involved in a communication. For example, an interaction arising from a social media communication between two users using each of a pair of computer systems will involve potentially multiple physical or logical networks, intermediate servers, service provider hosts, intermediate communication appliances and the like.
As a result, a model (or simulation) of the physical communication becomes burdened by the intermediate features of a typical inter-computer communication. In contrast, embodiments of the present invention address the endpoints of an interaction such as the computer systems through which users communicate. A similar analysis can be conducted for interactions involving email, electronic messaging, file sharing and the like.
Ideally, the behaviour and characteristics of an infection in the simulation of the infected network accurately reflect the behaviour and characteristics of an infection in a real network. Embodiments of the present invention relate to improvements in such simulations, providing a more accurate simulation of the propagation of malware through a network (i.e. compared with the propagation of malware through a real network). This allows for a more effective determination of suitable mitigation measures that can be employed to mitigate the spread of the malware, or infection, throughout the network. The deployment of malware protection measures is targeted to provide an effective and/or efficient inhibition of the propagation of the infection on the network.
The nature and type of malware protection measures themselves are understood by those skilled in the art and can include, inter alia: anti-malware facilities; malware filters; malware detectors; a block, preclusion or cessation of interaction and/or communication, such as between computer systems; and/or a reconfiguration of one or more computer systems or communications facilities therebetween.
Embodiments of the present invention identify computer systems or interacting pairs of computer systems for the deployment of malware protection measures based on a simulation of a propagation of malware through the model of a set of computer systems. Such a simulation employs simulation parameters including: a rate of interaction (or a contact rate) between each interacting pair of computer systems (i.e. a number of interactions per time period); a rate of transmission of the malware between interacting computer systems per interaction; a rate of detection of malware in the network; and a rate of removal of computer systems from the network to slow or stop the rate of infection. Some or all of these parameters may be derived statistically according to a statistical distribution. In some embodiments, some or all of these parameters may be determined based on historical interaction information over a historical time period. In some embodiments, some or all of these parameters are determined based on one or more machine learning processes based on historical interaction information.
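One possible way to hold these simulation parameters is a small validated record. The sketch below is illustrative only: the class name, the field names and the decision to treat the transmission, detection and removal rates as probabilities between 0 and 1 are assumptions for the example, and the numeric values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationParameters:
    contact_rate: float       # interactions per interacting pair per time period
    transmission_rate: float  # probability of transmission per interaction
    detection_rate: float     # probability that an infected system is detected
    removal_rate: float       # probability that a detected system is removed

    def __post_init__(self) -> None:
        if self.contact_rate < 0:
            raise ValueError("contact_rate must be non-negative")
        for name in ("transmission_rate", "detection_rate", "removal_rate"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie between 0 and 1, got {value}")


# Hypothetical values, chosen only to show construction.
params = SimulationParameters(
    contact_rate=3.0, transmission_rate=0.2, detection_rate=0.5, removal_rate=0.4
)
print(params)
```

Validating at construction time means that parameters derived from a statistical distribution or a learning process are checked before the simulator ever uses them.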
Preferably, mitigation measures are intended to directly affect the transmission rate, detection rate, and/or removal rate for a malware, or an infection, propagating through a network. For example, implementing an adjustment or supplement to security facilities such as antimalware, proxies, firewalls and the like within the network such as by modifying policies for such facilities can directly affect one or more of these rates.
Mitigative measures can further include protective or interruptive measures including one or more of, inter alia: deployment of malware remediation facilities such as anti-malware; the isolation of a subset of the network by interrupting communications along one or more selected edges; the disconnection of one or more devices from the network; the instigation of protective measures in respect of data stored at devices in advance of their predicted infection such as backup, storage, offlining or disconnection of sensitive data stores; the generation of new networks of devices such as to exclude devices predicted to be infected; affecting a transmission rate within a network or between pairs of devices in the network such as by throttling or otherwise affecting a rate or frequency of communication between devices, or to limit/constrain a “size” of communication (e.g. payload size) or otherwise constrain communication (e.g. imposing new limits) [all of these are particularly beneficial]; and the propagation of alerts and/or information to devices on the network. Such mitigative measures can be determined and configured cognisant of the time required to effect such mitigation and the forecast state of the network and malware infection over such a time period.
Preferably, an edge 212 constitutes an indication that at least one interaction has taken place over at least a predetermined historic time period between computer systems in a pair. Preferably, the existence of an edge 212 is not determinative, indicative or reflective in and of itself of a degree, frequency, or propensity of interaction between computer systems in a pair. Rather, the edge 212 identifies that interaction between nodes can take place or has taken place. In some embodiments, edges 212 can have associated, for example, inter alia: an edge identifier; an identification of a pair of nodes (and/or the corresponding electronic devices or computer systems) that the edge interconnects; and/or interaction frequency information between a pair of computer systems.
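By way of illustration, an edge carrying the attributes listed above could be represented as follows, together with a helper that derives the first-degree neighbours of each node. The names Edge and build_adjacency, and the example node labels, are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Edge:
    edge_id: str
    node_a: str
    node_b: str
    # Optional, since the mere existence of an edge says nothing about frequency.
    interaction_frequency: Optional[float] = None


def build_adjacency(edges: List[Edge]) -> Dict[str, Set[str]]:
    """Map each node to the set of nodes it has interacted with."""
    adjacency: Dict[str, Set[str]] = {}
    for edge in edges:
        adjacency.setdefault(edge.node_a, set()).add(edge.node_b)
        adjacency.setdefault(edge.node_b, set()).add(edge.node_a)
    return adjacency


edges = [Edge("e1", "A", "B"), Edge("e2", "B", "C", interaction_frequency=4.0)]
adjacency = build_adjacency(edges)
print(sorted(adjacency["B"]))
```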
It will be appreciated by those skilled in the art that, while the model 200 is illustrated as a literal graph in the arrangement of
The arrangement of
The simulator 202 is operable on the basis of simulation parameters including: a contact rate as a number of interactions between pairs of interacting computer systems in a time period; and a transmission rate 250 as a rate of transmission of a malware between computer systems in a pair of systems per interaction. The transmission rate 250 is a probability of transmission of a malware from one node to another node during an interaction between the nodes. The transmission rate 250 may incorporate aspects of a malware infection process.
For example, in the case of malware transmitted as a web-link between two computer systems by email, the transmission rate can reflect all of: a probability that an email is communicated between the two computer systems; a probability that the email includes the malicious web-link; and a probability that a recipient accesses the malicious web-link resulting in malware infection.
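If the component probabilities in the email example are assumed to be independent (an assumption made here for illustration, not something the text states), the combined transmission rate is simply their product. The function name and the numeric values below are hypothetical.

```python
import math


def combined_transmission_rate(*probabilities: float) -> float:
    """Multiply component probabilities, assuming they are independent."""
    if not probabilities:
        raise ValueError("at least one probability is required")
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probabilities must lie between 0 and 1, got {p}")
    return math.prod(probabilities)


# Hypothetical values: an email is sent, it contains the link, the link is opened.
rate = combined_transmission_rate(0.5, 0.2, 0.1)
print(rate)
```

Where the events are correlated, for example because a recipient is more likely to open a link from a frequent contact, a plain product would misstate the rate and a conditional model would be needed.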
The simulator 202 can operate on the basis of configurable characteristics such as simulation assumptions. For example, the simulator 202 may operate on the basis that any computer system as represented by a node in the model 200 can only transmit the malware to first-degree neighbours according to the model 200.
Further, the simulator 202 preferably operates on the basis that each computer system has a state of infection at a point in time. States of infection at a point in time can include, for example: a state of susceptibility in which a computer system is susceptible to infection, such as a computer system that is not and has not been so far infected and is not specifically protected from infection by a particular malware; a state of infected in which a computer system is subject to infection by the malware at the point in time; and a state of removed or remediated in which a computer system is remediated of a past infection or protected from prospective infection by the malware.
It will be appreciated by those skilled in the art that sub-states of these states can also be employed, such as, inter alia: an infected state that is not infectious (i.e. transmission of malware cannot be effected by a computer system in such a state); an infected state that is infectious; an infected state that is detected; and an infected state that is not detected (such as might be determined by the simulator 202).
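The states and sub-states can be captured with a handful of Boolean flags per node. The following is a sketch under assumed names (NodeState, can_transmit); how a real simulator decides whether an infected node is infectious is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    susceptible: bool = True
    infected: bool = False
    infectious: bool = False  # sub-state: infected and able to transmit
    detected: bool = False    # sub-state: infection has been detected
    removed: bool = False     # remediated, protected or disconnected

    def infect(self, infectious: bool = True) -> None:
        """Move a susceptible node into the infected state."""
        if self.susceptible and not self.removed:
            self.susceptible = False
            self.infected = True
            self.infectious = infectious

    def can_transmit(self) -> bool:
        """Only infected, infectious nodes that remain on the network transmit."""
        return self.infected and self.infectious and not self.removed


node = NodeState()
node.infect()
print(node.can_transmit())
```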
Therefore, in use, the simulator 202 is operable for a time period to model the propagation of a malware infection. In some embodiments, one or more predetermined source computer systems represented in the model 200 are selected as originating computer systems for the malware infection such that propagation is simulated from such originating computer systems. Preferably, the simulator 202 is executed for each of a plurality of time periods so as to model the propagation of the malware in the set of computer systems over time. Additionally or alternatively, the simulator 202 can be executed a plurality of times for each of a plurality of predetermined source computer systems selected as originating computer systems for the malware infection.
At step 300, several parameters are defined for the simulation and input into the simulator, including a removal rate, a detection rate and an infection rate. Each of these parameters takes a value between 0 and 1. In some embodiments, one or more malware protection measures may be deployed. The malware protection measures may be any one of the malware protection measures described herein.
In the present example, each of the nodes has four Boolean states: susceptible, infected, detected, and removed. Initially, every node is set to susceptible, uninfected, undetected and not removed. At step 302, an outbreak is initiated by setting the state of one or more nodes to infected. These nodes may be referred to as “outbreak nodes”.
At step 304, detected nodes are removed from the network according to the removal rate. That is to say, the infected nodes that have been detected are isolated from the rest of the network (i.e. by severing the connection to neighbouring nodes). The state of these nodes is set to “removed”.
At step 306, a number of the infected nodes are detected according to the detection rate. The state of these nodes is set to “detected”.
At step 308, the neighbours of the infected nodes are determined. The susceptible neighbours are then infected according to the infection rate at step 310. The state of these nodes is set to “infected”.
At step 320, it is determined whether the infection is finished, i.e. whether the malware is able to spread any further, or if the maximum number of steps for the simulation has been reached. For example, if the infected nodes are all removed, then these nodes cannot infect any other nodes and the malware cannot spread any further. If the malware cannot spread any further, the simulation ends at step 322, and optionally statistics for the simulation are calculated. Otherwise, if the malware can still spread further, steps 304-310 are repeated.
Steps 304 to 308 are discussed above in a particular sequence. However, it will be appreciated that this sequence is not intended to be limiting, and steps 304 to 308 may instead be carried out in any appropriate sequence or order.
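The loop formed by steps 302 to 320 can be sketched as below. This is one possible reading, not the application's implementation: it assumes each rate is applied as an independent per-node (or per-neighbour) probability, represents the model as a dictionary of neighbour sets, and uses a seeded random generator so that runs are repeatable. All names are hypothetical.

```python
import random
from typing import Dict, Iterable, Set


def simulate(
    adjacency: Dict[str, Set[str]],
    outbreak: Iterable[str],
    infection_rate: float,
    detection_rate: float,
    removal_rate: float,
    max_steps: int = 100,
    seed: int = 0,
) -> Dict[str, Set[str]]:
    rng = random.Random(seed)
    infected: Set[str] = set(outbreak)  # step 302: outbreak nodes
    susceptible: Set[str] = set(adjacency) - infected
    detected: Set[str] = set()
    removed: Set[str] = set()

    for _ in range(max_steps):
        # Step 304: remove previously detected nodes according to the removal rate.
        for node in sorted(detected - removed):
            if rng.random() < removal_rate:
                removed.add(node)
        # Step 306: detect infected nodes according to the detection rate.
        for node in sorted(infected - detected):
            if rng.random() < detection_rate:
                detected.add(node)
        # Steps 308 and 310: infect susceptible neighbours of infected nodes
        # that are still connected to the network.
        newly_infected: Set[str] = set()
        for node in sorted(infected - removed):
            for neighbour in sorted(adjacency[node] & susceptible):
                if neighbour not in newly_infected and rng.random() < infection_rate:
                    newly_infected.add(neighbour)
        susceptible -= newly_infected
        infected |= newly_infected
        # Step 320: stop once no connected infected node has a susceptible neighbour.
        if not any(adjacency[node] & susceptible for node in infected - removed):
            break

    return {"susceptible": susceptible, "infected": infected,
            "detected": detected, "removed": removed}


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
result = simulate(path, ["a"], infection_rate=1.0, detection_rate=0.0, removal_rate=0.0)
print(sorted(result["infected"]))
```

Iterating over sorted node names keeps the sequence of random draws independent of set ordering, which makes a given seed reproducible across runs.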
As is further shown in
The protector 208 may identify one or more computer systems or interacting pairs of computer systems (such as are represented by edges 212 in the model 200) for the deployment of malware protection measures. Such identified systems or pairs of systems can be selected based on, for example, inter alia: a computer system or interacting pair of systems through which malware propagates in the simulation to a subset of other computer systems in the set of computer systems; identifying a subset of computer systems having a relatively greater, or greatest, proportion of computer systems infected by the malware according to the simulation, so as to identify one or more computer systems or pairs of systems as a gateway, link or bridge to such identified subset; a number of computer systems to which the malware is propagated via a computer system or pair of systems; and other criteria as will be apparent to those skilled in the art. For example, “choke-points” in the model 200 can be identified by the protector 208 based on the simulator 202 output as nodes or pairs of nodes representing computer systems or interacting pairs of systems constituting pathways for propagation of the malware to subsets of nodes in the model 200. The malware protection measures deployed by the protector 208 can include those previously described, and in this way at least a subset of the set of computer systems can be protected from the malware by the targeted deployment of malware protection measures.
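One of the criteria above, the number of computer systems to which the malware is propagated via a given system, can be approximated by counting downstream infections in the simulator output. The sketch assumes the output is available as a mapping from each node to the nodes it directly infected, and that each node is infected at most once so the mapping forms a forest; the function name is hypothetical.

```python
from typing import Dict, List


def downstream_counts(infected_by: Dict[str, List[str]]) -> Dict[str, int]:
    """Count, for every node, all infections that trace back through it."""
    counts: Dict[str, int] = {}

    def count(node: str) -> int:
        if node in counts:
            return counts[node]
        total = 0
        for child in infected_by.get(node, []):
            total += 1 + count(child)  # the child itself plus everything below it
        counts[node] = total
        return total

    for node in infected_by:
        count(node)
    return counts


# Hypothetical simulator output: who directly infected whom.
tree = {"N0": ["N5"], "N5": ["N4", "N6"], "N4": ["N3"]}
counts = downstream_counts(tree)
print(sorted(counts.items(), key=lambda item: -item[1]))
```

Nodes with high counts are candidate choke-points, although a practical protector would likely weigh this against the other criteria listed.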
In view of the above, the simulator 202 can be used to simulate the propagation of malware through a computer network (or computer system) in a realistic manner, such that the protector 208 may use the simulation to improve the selection of one or more appropriate malware protection measures for implementing in the computer network. Such appropriately selected one or more malware protection measures may then be deployed in a computer network (or computer system), either in response to a real malware infection, or as a pre-emptive measure to prevent or reduce the likelihood of an infection propagating. Consequently, the more realistic the simulation of malware propagation in the modelled computer network is, the better the protector 208 is able to select an appropriate and effective malware protection measure to contain, counteract, or pre-emptively prevent a real malware infection in the computer network.
Step 400 is similar to step 300 of
In the present example, each of the nodes has four Boolean states: susceptible, infected, detected, and removed. Initially, every node is set to susceptible, uninfected, undetected and not removed. In some examples, one or more nodes may be patched (i.e. the state of the nodes is set to insusceptible).
Step 402 is similar to step 302; however, in addition to the actions of step 302, a secondary infections list is created for each node. These infection lists are used to track which nodes each node has infected. Initially, each infection list is empty.
Steps 404-410 are the same as steps 304-310 of
At step 412, following the step 410 of each primary infected node infecting a number of neighbouring secondary nodes, the secondary infection list for each primary node is updated with the identification number of the secondary node(s) that it infected.
Steps 414-418 define how the reproduction number R0 is determined. At step 414, all the primary infected nodes are determined. At step 416, the secondary infection list for each primary infected node is obtained, and the length of the list (i.e. the number of entries) is determined. For example, if a primary node has infected three secondary nodes, then the length of the secondary infection list is three.
At step 418, the lengths of all the infection lists are summed together, and the resulting number is divided by the number of primary infected nodes. The result of this calculation is the R0 value. In examples where a mitigation measure is deployed, the determined value of the reproduction number R0 provides an objective measure of the impact of the mitigation measure on the spread of an infection.
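Steps 412 to 418 can be sketched with a dictionary of secondary infection lists. The helper names record_infection and r0_from_lists are assumptions for this example.

```python
from typing import Dict, Iterable, List


def record_infection(lists: Dict[str, List[str]], primary: str, secondary: str) -> None:
    """Step 412: add the secondary node to the primary node's infection list."""
    lists.setdefault(primary, []).append(secondary)
    lists.setdefault(secondary, [])  # the new node starts with an empty list


def r0_from_lists(lists: Dict[str, List[str]], primary_nodes: Iterable[str]) -> float:
    """Steps 414-418: average the list lengths over the primary infected nodes."""
    lengths = [len(lists.get(node, [])) for node in primary_nodes]
    if not lengths:
        raise ValueError("at least one primary infected node is required")
    return sum(lengths) / len(lengths)


lists: Dict[str, List[str]] = {"P": []}
for secondary in ("S1", "S2", "S3"):
    record_infection(lists, "P", secondary)
print(r0_from_lists(lists, ["P"]))
```

Because each infection is recorded against the node that caused it, no attribution has to be inferred afterwards, which is the property that lets R0 be calculated rather than estimated.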
In some examples, a value of the effective reproduction number Rt may be determined, by multiplying the value of the reproduction number R0 and the proportion of nodes, p, that are susceptible to the malware.
If the infection has not finished, steps 404-418 are repeated for one or more further time periods, and a new R0 value is determined for each iteration of steps 404-418. In this way, a series of R0 values may be obtained, with each R0 value corresponding to a given time period. In some examples, one or more mitigation measures may be deployed for each time period. For example, in a first time period a first node may be removed, in a second time period a second node may be removed, and in a third time period a third node may be patched. In this case, each R0 value corresponds to a particular mitigation measure, and the effects of different mitigation measures over the course of the simulation can be evaluated.
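A simple way to associate each time period's R0 value with the measures deployed in that period is to keep one record per period. The sketch below uses invented R0 values and hypothetical names; the period-to-period change it reports is only a rough indicator, since other factors such as depletion of susceptible nodes also move R0.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PeriodRecord:
    period: int
    measures: Tuple[str, ...]  # measures deployed during this period
    r0: float                  # R0 determined for this period


def r0_changes(records: List[PeriodRecord]) -> List[Tuple[Tuple[str, ...], float]]:
    """Pair each period's measures with the change in R0 since the previous period."""
    changes = []
    for previous, current in zip(records, records[1:]):
        changes.append((current.measures, current.r0 - previous.r0))
    return changes


# Invented values, for illustration only.
records = [
    PeriodRecord(1, ("remove first node",), 1.0),
    PeriodRecord(2, ("remove second node",), 1.5),
    PeriodRecord(3, ("patch third node",), 1.0),
]
for measures, delta in r0_changes(records):
    print(measures, delta)
```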
If the infection has finished, the simulation ends at step 422, and optionally, statistics for the simulation can be calculated. If desired, the simulation can be repeated using one or more mitigation measures if no mitigation measures were initially used, or using different mitigation measures if mitigation measures were initially used.
An example of a network graph is shown in
At step S1, the time for the simulation is initialised to t=01:00. A secondary infections list is initialised for each node. The state of the central node N0 is set to infected with a malware.
At step S2, detected nodes are removed according to the removal rate. At present, no nodes have been detected, so no nodes are removed.
At step S3, infected nodes are detected according to the detection rate. In the present example, node N0 is detected.
At step S4, neighbours of infected nodes are infected according to the infection rate. In the present example, the node N5 is infected, i.e. the malware is passed on to node N5 from node N0. The secondary infections list for node N0 is updated to include the node N5.
At step S5, a value of the reproduction number R0 is determined. First, all the primary infected nodes are determined. Then, the length of the secondary infections list for each initially infected node is obtained. In this case, the only node that was initially infected with the malware is node N0. The secondary infections list for node N0 includes one node (N5), and thus has a length of 1. The lengths of the secondary infections lists are then averaged over the number of primary infected nodes to obtain a value of the reproduction number R0. In the present case, this average is 1 divided by 1. Therefore R0=1.
Optionally, a value of the effective reproduction number Rt may be determined, by multiplying the value of the reproduction number R0 and the proportion of nodes that are susceptible to the malware p. In the present example, every node is susceptible to the malware, and therefore Rt=R0×p=1×1=1.
At step S6, the time is incremented by 1 hour, so t=02:00.
At step S7, detected nodes are removed according to the removal rate. Since node N0 has been detected, it is removed.
At step S8, infected nodes are detected according to the detection rate. In the present case, no new nodes happen to be detected.
At step S9, neighbours of primary infected nodes are infected according to the infection rate. In the present example, node N5 randomly infects two other nodes N4 and N6, i.e. the malware is passed on from node N5 to nodes N4 and N6. The secondary infections list for node N5 is updated to include the nodes N4 and N6.
At step S10, a value of the reproduction number R0 is determined. First, all the primary infected nodes are determined. In this case, the two nodes that were initially infected with the malware are node N0 and node N5. The secondary infections list is then obtained for each primary infected node, and the length of each secondary infections list is obtained. In this case, the secondary infections list for node N0 includes one node (N5) and the secondary infections list for node N5 includes two nodes (N4 and N6). The lengths of the secondary infections lists for these nodes are thus 1 and 2 respectively. The lengths of the secondary infections lists are then averaged over the number of primary infected nodes to obtain a value of the reproduction number R0. In the present case, this average is (1+2)/2. Therefore R0=1.5. Again, Rt=R0=1.5, since all nodes are susceptible (p=1).
At step S11, the time is incremented by 1 hour, so t=03:00. For this step, nodes N1 and N2 are patched.
At step S12, the detected nodes are removed according to the removal rate. In the present example, there are no newly detected nodes, so no nodes are removed.
At step S13, infected nodes are detected according to the detection rate. In the present case, node N4 is randomly detected.
At step S14, neighbours of infected nodes are infected according to the infection rate. In the present example, node N4 randomly infects node N3. The secondary infections list for node N4 is updated to include the node N3.
At step S15, a value of the reproduction number R0 is determined. The lengths of the secondary infections lists of the initially infected nodes (nodes N0, N4, N5 and N6) are determined, giving 1, 1, 2, 0. These values are then averaged over the number of initially infected nodes to obtain a value of the reproduction number R0. In the present case, this average is (1+1+2+0)/4=1. Therefore R0=1. Since nodes N1 and N2 are not susceptible to the malware, only 5 of the 7 nodes are susceptible. Therefore Rt=R0×p=1×(5/7)=5/7≈0.71.
In the present case, no more nodes can be infected, therefore the infection has finished.
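The worked example can be replayed deterministically to check the arithmetic. The sketch below hard-codes the infections and patches described in steps S1 to S15 instead of drawing them at random, and assumes a seven-node network N0 to N6 (inferred from the 5/7 proportion). As in the example, p counts every unpatched node as susceptible, and the primary infected nodes in a period are those infected before that period's new infections.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

ALL_NODES = ["N0", "N1", "N2", "N3", "N4", "N5", "N6"]

# One entry per time period: (new infections as (infector, target), nodes patched).
PERIODS: List[Tuple[List[Tuple[str, str]], List[str]]] = [
    ([("N0", "N5")], []),
    ([("N5", "N4"), ("N5", "N6")], []),
    ([("N4", "N3")], ["N1", "N2"]),
]


def replay(periods, outbreak: List[str], all_nodes: List[str]):
    secondary: Dict[str, List[str]] = {node: [] for node in all_nodes}
    infected = list(outbreak)
    patched = set()
    results = []
    for infections, newly_patched in periods:
        patched.update(newly_patched)
        primaries = list(infected)  # nodes infected before this period's spread
        for infector, target in infections:
            secondary[infector].append(target)
            infected.append(target)
        r0 = Fraction(sum(len(secondary[n]) for n in primaries), len(primaries))
        p = Fraction(len(all_nodes) - len(patched), len(all_nodes))
        results.append((r0, r0 * p))  # (R0, Rt) for this period
    return results


results = replay(PERIODS, ["N0"], ALL_NODES)
for period, (r0, rt) in enumerate(results, start=1):
    print(period, r0, rt)
```

Using exact fractions avoids rounding when comparing the computed values with those quoted in the example.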
At step 600, a model of a set of computer systems is accessed. The model may be a graph, e.g. as shown in
At step 602, propagation of the malware through the set of computer systems is simulated using the model. The simulation may be performed using any of the methods described herein, for example the method described above in relation to
At step 604, one or more malware protection measures that are to be deployed to one or more computer systems are identified. The malware protection measures are identified based on the results of the simulation performed in step 602.
At step 606, the malware protection measures identified in step 604 are deployed to the one or more computer systems.
Performing the above method using a simulation where one or more values of a reproduction number are determined can be useful for identifying an appropriate malware protection measure to deploy in a real-world network. For example, if the determined reproduction number is relatively high (e.g. 2-3), then a strict malware protection measure (e.g. removing computer systems from the network) may be deployed. Alternatively, if the determined reproduction number is relatively low (e.g. <1), then a less severe malware protection measure (e.g. patching computer systems) may be deployed.
Furthermore, in examples where one or more mitigation measures are deployed as part of the simulation, the reproduction number associated with each mitigation measure can be used to form a guided decision in a real-world system. For example, reproduction number values determined for different mitigation measures can be compared, and the mitigation measure(s) exhibiting the greatest impact (or at least a degree of impact meeting a threshold degree) on the reproduction number can be selected for deployment to protect a real-world network. Hence, determining the reproduction numbers simplifies the process of selecting one or more malware protection measures for deployment in a real-world network.
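The comparison described above could be sketched as follows: given a baseline R0 from a run without measures and an R0 per candidate measure, select the measure with the largest reduction, provided it meets a threshold. The function name, the measure labels and the R0 values are hypothetical.

```python
from typing import Dict, Optional


def select_measure(
    baseline_r0: float,
    r0_by_measure: Dict[str, float],
    min_reduction: float = 0.0,
) -> Optional[str]:
    """Return the measure with the greatest R0 reduction that meets the threshold."""
    reductions = {
        measure: baseline_r0 - r0
        for measure, r0 in r0_by_measure.items()
        if baseline_r0 - r0 >= min_reduction
    }
    if not reductions:
        return None
    # Sorting first makes ties resolve alphabetically, so results are repeatable.
    return max(sorted(reductions), key=reductions.get)


# Invented simulation results, for illustration only.
choice = select_measure(2.0, {"patch systems": 1.2, "isolate subnet": 0.8})
print(choice)
```

Returning None when no candidate meets the threshold leaves room for the caller to fall back to a stricter measure or to run further simulations.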
Any of the above discussed methods may be performed using a computer system or similar computational resource, or system comprising one or more processors and a non-transitory memory storing one or more programs configured to execute the method. Likewise, a non-transitory computer readable storage medium may store one or more programs that comprise instructions that, when executed, carry out the methods described herein.
Whilst certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the application. Indeed, the novel devices and methods described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the devices, methods and products described herein may be made without departing from the scope of the present application. The word “comprising” can mean “including” or “consisting of” and therefore does not exclude the presence of elements or steps other than those listed in any claim or the specification as a whole. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope of the application.
Number | Date | Country | Kind |
---|---|---|---|
2203371.6 | Mar 2022 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2023/053494 | 2/13/2023 | WO |