The present disclosure relates to a method of generating invariants for detecting cyber-attacks, and more particularly but not exclusively, to a method of generating invariants for detecting distributed cyber-attacks on a cyber-physical system having a number of system components, and also to an apparatus thereof.
Cyber-physical systems integrate physical processes with computation and networking capabilities, allowing monitoring and control of process components using embedded computers and networking systems. Such cyber-physical systems are vulnerable to both physical and cyber attacks. While employing physical security to guard a walled facility may be necessary to prevent physical attacks, it is not sufficient to prevent or detect cyber attacks. When the cyber-physical system is deployed in a critical infrastructure such as a water treatment plant or power generation facility, it becomes even more critical to prevent successful attacks on these systems. There has been a continued increase in the number of security-related incidents on such infrastructures. For example, one report indicated 25 incidents on water systems out of a total of 295 reported incidents in one year. Of these incidents, 22 of the attacks had reached “Level 6-Critical Systems”. Given the dependence on water, power, and other critical infrastructure, it is important that such infrastructure be secured against both external and internal malicious actors.
A proposed method for detecting cyber-physical attacks on critical infrastructure included theoretical and simulation studies on distributed attack detection in power grids, water treatment plants, and automotive systems. The method used state-based invariants to identify deviation of the plant process from its normal behaviour, also termed a process anomaly. A design-centric (DeC) approach was proposed to derive such invariants. The invariants were derived directly from the plant design, such as a Process and Instrumentation Diagram (P&ID) for water treatment or a Line Diagram for power generation systems. The proposed method produced invariants that were coded and installed inside controllers or placed on the plant communications network to serve as process monitors. An alert was generated when any one invariant had been violated, i.e. evaluated to false. However, the alert was only indicative of a process anomaly, which could have been due either to a fault in one or more components of the plant or to a cyber attack.
Invariant generation and analysis using machine learning techniques have also been attempted. However, the techniques that are known have not been able to detect distributed attacks to a satisfactory degree of accuracy to ensure critical infrastructures are secured against both external and internal malicious actors.
Therefore, it is desirable to provide a method of generating invariants for distributed attack detection which addresses at least one of the drawbacks of the prior art and/or to provide the public with a useful choice.
Various aspects of the present disclosure will now be described in order to provide a general overview of the present disclosure. The following summary by no means delineates the scope of the invention.
According to a first aspect, there is provided a method of generating invariants for distributed attack detection on a cyber physical system having a number of system components. The method includes (i) deriving design invariants based on the system design of the cyber physical system, including physical specifications of the system components, (ii) obtaining operational data of the cyber physical system, including operational attributes of the system components, (iii) generating operational invariants from the obtained operational data, and (iv) correlating the operational invariants with the design invariants to generate an integrated set of invariants for detecting distributed cyber attacks on the cyber physical system.
Advantageously, by integrating design invariants with operational invariants, the integrated set improves the accuracy of distributed attack detection and reduces false alarms.
The operational invariants may be validated operational invariants.
Producing the validated operational invariants may include validating the operational invariants against the system design of the cyber physical system.
Obtaining the operational data may include collecting network packets, decoding the network packets for state information of sensors to derive the operational attributes of the system components, and producing an invariant dataset for generating the operational invariants.
The method may further include reducing the invariant dataset to produce a reduced invariant dataset for generating the operational invariants.
Producing the reduced invariant dataset may include processing the operational attributes of the system components to produce discrete valued attributes.
The operational attributes may be real valued attributes, and producing discrete valued attributes may include discretizing the real valued attributes to binary valued attributes.
The method may further include monitoring the sensors corresponding to the system components of the discrete valued attributes for changes in the discrete values over a specific period of time.
The method may further include selecting operational attributes which exhibit change in the discrete values as part of the reduced invariant dataset for generating the operational invariants.
The method may further include forming one or more of the discrete valued attributes into itemsets, and selecting the itemsets that satisfy a preselected minimum support level as part of the reduced invariant dataset.
The method may further include generating association rules that satisfy a preselected minimum confidence level from the itemsets. The operational invariants may be the association rules for defining a relationship between the operational attributes of each system component.
Correlating the operational invariants with the design invariants may include comparing the operational invariants to the design invariants, and removing highly correlated attributes to form the integrated set of invariants.
The method may further include coding the integrated set of invariants as respective computer codes, and programming controllers with the respective computer codes for monitoring process anomalies in the cyber physical system.
The cyber physical system may be a water treatment or power generation plant.
According to a second aspect, there is provided an apparatus for generating invariants to detect distributed attacks on a cyber physical system having a number of system components. The apparatus includes a first invariant generator configured to derive design invariants based on system design of the cyber physical system including physical specifications of the system components, a data collector configured to obtain operational data of the cyber physical system including operational attributes of the system components, a second invariant generator configured to generate operational invariants from the obtained operational data, and a processor configured to correlate the operational invariants with the design invariants to generate an integrated set of invariants for detecting distributed cyber attacks on the cyber physical system.
The operational invariants may be validated operational invariants.
The apparatus may further include a rule validation processor configured to validate the operational invariants against the system design of the cyber physical system to produce the validated operational invariants.
Exemplary embodiments will now be described with reference to the accompanying drawings, in which:
One or more embodiments of the present disclosure will now be described with reference to the figures. It should be noted that the use of the term “an embodiment” in various parts of the specification does not necessarily refer to the same embodiment. Features described in one embodiment may not be present in other embodiments, nor should they be understood as being precluded from other embodiments merely by the absence of the features from those embodiments. Further, various features described may be present in some embodiments and not in others.
Additionally, figures have been provided to aid in the description of the preferred embodiments. The figures and the following description should not take away from the generality of the preceding summary. The following description contains specific examples for illustrative purposes. The person skilled in the art would appreciate that variations and alterations to the specific examples are possible and within the scope of the present disclosure. For illustrative purposes, specific embodiments are described with respect to a Secure Water Treatment Plant (SWaT) which utilizes a cyber physical system. However, it should be understood that the embodiments are equally applicable to other infrastructures e.g. a power generation plant, that employ cyber physical systems.
The physical layer 200 of the SWaT 100 is herein described with reference to
The following notations are used in
Programmable Logic Controller (PLC):
Px, x ∈ {1, 2, 3, 4, 5, 6}, for each stage of the treatment
Referring to
Sensors and actuators: The physical layer 200 of SWaT 100 contains a total of 68 sensors and actuators. It should be noted that not all of the sensors and actuators are shown in
Plant supervision and control: A Supervisory Control and Data Acquisition (SCADA) workstation is located in the plant control room. Data or control access to nearly all plant components is available via this workstation. A plant operator can view process state and set process parameters via the workstation. A Human Machine Interface (HMI) is also located inside the plant room and can be used to view process state and set parameters. Control code can be loaded into each PLC via the workstation. A historian is available for recording process state as well as network packet flows at preset time intervals.
Communications: With reference to
SWaT operation: Operation of the plant is initiated by an operator at the SCADA workstation, from which the plant can also be controlled when needed. State information can be viewed at the workstation or at the HMI, and is recorded in the historian. Process anomaly detectors, i.e. monitors, developed by researchers have been installed in SWaT 100. The detectors generate visual alerts and send messages to the operator. All alerts generated by the monitors, i.e. coded invariants, are recorded in the historian. SWaT 100 can be attacked by compromising its communications network at all levels, as well as directly by accessing the PLCs, the SCADA workstation, and the HMI. Physical attacks are feasible in SWaT 100 through several means, such as replacing or removing sensors, disconnecting wires between sensors/actuators and the PLCs, or removing power to one or more actuators.
While physical attacks on the physical layer 200 may be prevented through physical security, physical security is inadequate to prevent cyber attacks. The embodiments described herein therefore use invariants to detect and prevent cyber attacks on the multi-layer network. An invariant is a condition that holds during the operation of a physical plant when the plant is in a given state.
Let X(t) denote a time (t) dependent n-dimensional state vector for the plant, consisting of state variables that can be observed via sensors. X(t)=Xc(t)∪Xd(t), where Xc(t) and Xd(t) denote, respectively, vectors of continuous valued and discrete valued state variables. For example, the state of a motorized valve such as MV101 is discrete valued, while the water level of tank T101 measured by level sensor LIT101 is continuous valued. All state variables are taken to evolve with time, and hence time is not explicitly indicated, e.g., X≡X(t). Furthermore, a state variable x∈X may be discrete or continuous.
Let f(X) and g(X) denote Boolean functions, and h(Xc)∈R+ denote a function on a continuous state variable. The following types of invariants are presented.
$$x \;\mathrm{op}\; v \tag{1}$$
$$f(X) \Rightarrow g(X) \tag{2}$$
$$h(x) < \varepsilon, \quad x \in X_c \tag{3}$$
where v∈R is a constant, ε>0 an error threshold, x∈Xc a continuous state variable, and op denotes a relational operator. Invariants of type (1) are simple and intended to check a state variable against its upper and lower limits. Such invariants might be redundant when the same checks are already coded in the plant control algorithms. Invariants of type (2) are to be interpreted as “if f(X) then g(X).” Such an invariant is also referred to as an association rule in the description. Invariants of type (3) are used to compare predicted values of a continuous state variable with measured values from the corresponding sensor. The error threshold ε is determined based on the error in the measurements reported by the corresponding sensor.
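As an illustration only, the three invariant types can be sketched as Python predicates over one sampled state. The dictionary-based state, the tag names, the numeric encoding of CLOSED as 1, and the limit values are assumptions for the sketch, not plant values; each predicate returns True while the invariant holds, so an alert corresponds to False.

```python
import operator
from typing import Callable, Dict

# Hypothetical representation of one sampled state: tag name -> value
State = Dict[str, float]
Invariant = Callable[[State], bool]

# Relational operators permitted in type (1) invariants "x op v"
OPS: Dict[str, Callable[[float, float], bool]] = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
    ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
}


def type1(tag: str, op: str, v: float) -> Invariant:
    """Type (1): compare one state variable with a constant v."""
    return lambda s: OPS[op](s[tag], v)


def type2(f: Invariant, g: Invariant) -> Invariant:
    """Type (2): f(X) => g(X), which is false only when f holds and g does not."""
    return lambda s: (not f(s)) or g(s)


def type3(h: Callable[[State], float], eps: float) -> Invariant:
    """Type (3): h(x) < eps, e.g. h = |predicted - measured| for one sensor."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return lambda s: h(s) < eps


# Example: LIT101 > H => MV101 = CLOSED, with H = 800 and CLOSED encoded as 1 (both hypothetical)
rule = type2(type1("LIT101", ">", 800.0), type1("MV101", "==", 1))
```

Composing type (2) from two type (1) predicates mirrors how invariant (5) below combines a level check with a valve-state check.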
Each invariant is coded in an appropriate language depending on where in the plant it is placed. In SWaT 100, invariants are coded in structured text and placed inside the PLCs to serve as process monitors. These monitors can also be placed on the communications network. It is understood that the skilled person would be aware of the best location(s) for invariants in a plant.
State variables in an operational plant are sampled at pre-specified instants by obtaining measurements from the corresponding sensors. The states of actuators are obtained by sampling sensors built into the actuators. Each invariant is evaluated soon after the data is sampled. An alert is generated when any invariant evaluates to false. In distributed attack detection, functions f(X) and g(X) may use state variables from multiple stages of the plant. Thus, the state vector X can be written as [X1, X2, . . . , Xn], where Xi is the state vector for stage ‘i’ of the plant, 1≤i≤n. An invariant that uses state variables from stage ‘i’ only is considered local to stage ‘i’, i.e. a local invariant. If an invariant uses state variables from more than one stage, it is considered a global invariant. A distributed attack on a system may occur at one or more of its stages. Therefore, a mix of local and global invariants is used for distributed attack detection.
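The local/global distinction can be sketched by mapping each state variable to its stage. This assumes, as a hypothetical convention suggested by tag names such as LIT101, LIT301, and P602, that the first digit of a tag identifies the stage; a real deployment would take the mapping from the plant design instead.

```python
import re
from typing import Iterable, Set


def stage_of(tag: str) -> int:
    """Hypothetical convention: the first digit in a tag names its stage,
    e.g. LIT101 -> 1, LIT301 -> 3, P602 -> 6."""
    match = re.search(r"\d", tag)
    if match is None:
        raise ValueError(f"no stage digit in tag {tag!r}")
    return int(match.group())


def stages_used(tags: Iterable[str]) -> Set[int]:
    """Set of plant stages whose state variables an invariant refers to."""
    return {stage_of(tag) for tag in tags}


def classify(tags: Iterable[str]) -> str:
    """'local' if every state variable comes from one stage, otherwise 'global'."""
    stages = stages_used(tags)
    if not stages:
        raise ValueError("an invariant must use at least one state variable")
    return "local" if len(stages) == 1 else "global"
```

Under this convention, an invariant relating LIT301 to P101 spans stages 1 and 3 and is therefore global, while one relating LIT101 to MV101 is local to stage 1.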
At step 310, design invariants 311 are derived using an invariant generator. The derivation is based on the system design of SWaT 100. The system design is captured in the physical specifications of SWaT system components, Process and Instrumentation Diagrams (P&IDs), and State Condition Graphs (SCGs). Design invariants 311 are derived using control algorithms and the physical specification of SWaT 100. Alternatively, with a P&ID provided as input, design invariants 311 may also be generated by the invariant generator using fundamental laws of physics. Design invariants 311 may also be generated using an SCG. An SCG captures the conditions needed to change the state of an actuator such as a pump or a motorized valve. These conditions lead to type (2) invariants. Type (1) invariants are derived from physical specifications of the plant components, while those of type (3) are derived from the physics of water flow.
Some examples of invariants derived from the system design of SWaT 100 are given next.
With reference
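$$\mathrm{LIT101}(k) < \mathrm{HH} \tag{4}$$
$$\mathrm{LIT101}(k) > \mathrm{H} \Rightarrow \mathrm{MV101} = \mathrm{CLOSED} \tag{5}$$
$$\mathrm{LIT301}(k) < \mathrm{L} \Rightarrow \mathrm{P101} = \mathrm{ON} \tag{6}$$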
LIT101
The following explains how the invariants work to detect system anomalies. The above invariants are coded to generate alerts when they evaluate to false. For example, invariant (4) generates an alert when the water level in tank T101 goes above the HH marker. Invariant (5) generates an alert if motorized valve MV101 is not CLOSED when the water level in tank T101 is above the H marker. Similarly, invariant (6) generates an alert when pump P101 is OFF and the water level in tank T301 is below L. Invariant (7) is used for predicting the water level in tank T101 (L_T101) given the amount of inflow (Win) and outflow (Wout), with α being the proportionality constant that converts flow to level in the tank. In the context of attack detection, (7) is not an invariant. Instead, it is used to create an invariant such as the following:
where n is the number of samples over which the average is computed and ε is the error tolerance beyond which the process is considered to be in an anomalous state. Considerations in selecting values of n and ε are in [2]. Table 1 lists several parameters used while coding the derived invariants.
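A minimal sketch of how invariants (4) to (6) and an averaged prediction check might be evaluated on each sample. The marker values L, H, and HH are placeholders rather than the Table 1 parameters. The residual monitor assumes that the averaged invariant compares the mean absolute difference between the predicted level (from (7)) and the measured level over the last n samples with ε; that averaging rule is an assumption, not the stated form.

```python
from collections import deque
from typing import Any, Deque, Dict, List

# Placeholder marker levels; the actual values are plant parameters (cf. Table 1).
L, H, HH = 500.0, 800.0, 1000.0


def check_design_invariants(s: Dict[str, Any]) -> List[str]:
    """Return the labels of invariants (4)-(6) that evaluate to false for one sample."""
    alerts: List[str] = []
    if not s["LIT101"] < HH:                          # (4) LIT101 < HH
        alerts.append("(4)")
    if s["LIT101"] > H and s["MV101"] != "CLOSED":    # (5) LIT101 > H => MV101 = CLOSED
        alerts.append("(5)")
    if s["LIT301"] < L and s["P101"] != "ON":         # (6) LIT301 < L => P101 = ON
        alerts.append("(6)")
    return alerts


class ResidualMonitor:
    """Alert when the mean |predicted - measured| over the last n samples reaches eps
    (assumed averaging rule for an invariant built from the prediction (7))."""

    def __init__(self, n: int, eps: float) -> None:
        if n <= 0 or eps <= 0:
            raise ValueError("n and eps must be positive")
        self.n = n
        self.eps = eps
        self.window: Deque[float] = deque(maxlen=n)  # keeps only the last n errors

    def update(self, predicted: float, measured: float) -> bool:
        """Add one sample; return True if an alert should be raised."""
        self.window.append(abs(predicted - measured))
        if len(self.window) < self.n:
            return False  # too few samples to form the average yet
        return sum(self.window) / self.n >= self.eps
```

With the placeholder markers, a sample in which LIT101 is 1050, MV101 is OPEN, LIT301 is 400, and P101 is OFF violates all three invariants and yields ["(4)", "(5)", "(6)"]. Averaging over n samples keeps a single noisy reading from raising an alert, which is the trade-off that the choice of n and ε controls.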
At step 320, operational data 321 of SWaT 100 is obtained from publicly available datasets. Alternatively, the operational data 321 may be collected from normal operation of SWaT, i.e. “SWaT Normal Data”. For example, a data collection infrastructure can be put in place to capture and save state information generated by sensors. In SWaT 100, this data may be collected by capturing network packets, decoding the network packets for state information, and saving the state information in a historian. The collected operational data 321 will later be used to derive rules (invariants) that represent the normal behaviour of SWaT 100. In this way, the operational invariants 331 so derived are able to detect process anomalies that deviate from the normal behaviour of SWaT 100.
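Once packets have been decoded, the readings still have to be collated into a table with one row per sampling instant. The sketch below assumes a hypothetical decoded form (timestamp, tag, value) and forward-fills tags that were not refreshed at a given instant; packet capture and protocol decoding themselves are outside its scope.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical decoded reading: (timestamp in seconds, tag, value)
Reading = Tuple[int, str, float]


def collate(readings: Iterable[Reading], tags: List[str]) -> List[Dict[str, float]]:
    """Build one row per timestamp from decoded readings.
    Tags not refreshed at a timestamp keep their last known value (forward fill);
    tags not seen yet are omitted from the row."""
    by_time: Dict[int, Dict[str, float]] = {}
    for ts, tag, value in readings:
        by_time.setdefault(ts, {})[tag] = value
    rows: List[Dict[str, float]] = []
    last: Dict[str, float] = {}
    for ts in sorted(by_time):
        last.update(by_time[ts])
        row: Dict[str, float] = {"Timestamp": ts}
        row.update({t: last[t] for t in tags if t in last})
        rows.append(row)
    return rows
```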
To collect the normal data, SWaT 100 is started in a state in which tanks T101 and T103 are near state L, UF is active, and RO is inactive. To simulate the operation of a commercial plant, the feedback from RO tank T601 to tank T101 is disabled and all pure water generated from RO is sent to drain.
Soon after starting the data collection process, the plant moves to its full capacity of producing about 5 gallons/minute of pure water. The time-stamped dataset collected over a 7-day period consists of an Excel spreadsheet with 53 columns and 496,800 rows. Columns 1 and 53 contain, respectively, the time stamp and whether there was an attack or not. The normal data set is created without any detected attacks. The remaining columns contain the sensor data indicating the states of various plant components including tanks, valves, pumps, and meters, as well as data on chemical properties including pH, conductivity, and the Oxidation Reduction Potential (ORP).
The collected data is collated into a SWaT dataset to be used for generating operational invariants in the following step. Hence, the SWaT dataset is also referred to interchangeably as the invariant dataset. For illustration, Table 2 lists some sample data extracted from the SWaT dataset. For example, data in the first row indicates that valve MV101 is ‘2’, i.e. ‘OPEN’, and pump P101 is ‘2’, i.e. ‘ON’. The inflow and outflow rates into and from tank T101, as indicated by FIT101 and FIT102 respectively, are around 2.47 m³/h. The nearly equal inflow and outflow rates are consistent with the water level in T101, which hovers around 261 as indicated by LIT101.
Deriving Operational Invariants from SWaT Dataset
At step 330, operational invariants 331 are generated from the invariant dataset, i.e. the operational data 321, using association rule mining. Association Rule Mining (ARM) is a rule-based machine learning method for uncovering relationships between seemingly unrelated data in databases. Such a relationship is expressed as a rule such as LIT301(k)<L⇒P101=ON. In such rules, the item to the left of ⇒ is referred to as the antecedent and the one to the right as the consequent. ARM is used for a variety of applications including predicting customer behaviour, product clustering, web usage mining, catalogue design, store layout, intrusion detection, and bioinformatics.
In practical applications, discovering rules such as the one mentioned above poses several challenges for large datasets. In particular, the number of possible rules grows exponentially with the total number of dimensions, also referred to as items or attributes, in the dataset. Exhaustive rule generation therefore quickly becomes computationally intractable. To make the problem tractable, only “interesting” rules are selected. In addition, other statistical techniques are applied to reduce the number of attributes in the invariant dataset. As a result, a reduced invariant dataset is produced. This is further explained below.
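To make the exponential growth concrete: for d distinct items, counting every rule X⇒Y whose antecedent and consequent are non-empty and disjoint gives 3^d − 2^(d+1) + 1 rules, since each item goes into X, into Y, or neither, minus the assignments that leave X or Y empty. The sketch below computes this count and checks it by brute force for small d. Treating items independently is a simplification, because two values of the same attribute never co-occur in a real row.

```python
from itertools import product


def rule_count(d: int) -> int:
    """Number of rules X => Y over d items with X, Y non-empty and disjoint."""
    return 3 ** d - 2 ** (d + 1) + 1


def rule_count_brute(d: int) -> int:
    """Brute-force check: assign each item to neither (0), X (1), or Y (2)."""
    return sum(
        1
        for assignment in product((0, 1, 2), repeat=d)
        if 1 in assignment and 2 in assignment  # both sides must be non-empty
    )
```

For example, rule_count(6) is 602, while rule_count(15) is already 14,283,372.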
The state space of all possible rules that can be generated depends on the number of attributes and the number of unique values of each attribute in the dataset. A continuous valued attribute can take virtually unlimited values, so a virtually unlimited number of rules could be generated, rendering the problem intractable. ARM therefore requires the attributes to be discrete valued, whereas the SWaT dataset consists of real valued, binomial, and trinomial attributes. It is therefore necessary to discretize the real valued attributes to binary valued attributes to reduce the state space and, consequently, the set of possible rules.
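A minimal discretization sketch. The text does not state how the boundaries were chosen, so the threshold is an explicit parameter here (for example a tank marker such as H, which is an assumption), with the median of the series as a hypothetical fallback.

```python
import statistics
from typing import List, Optional, Sequence


def binarize(values: Sequence[float], threshold: Optional[float] = None) -> List[int]:
    """Map a real valued attribute to {0, 1}: 1 where value > threshold, else 0.
    If no threshold is given, the median of the series is used (hypothetical default)."""
    if threshold is None:
        threshold = statistics.median(values)
    return [1 if v > threshold else 0 for v in values]
```

A boundary tied to a physical marker keeps the resulting rules interpretable in plant terms, whereas a purely statistical split such as the median may not correspond to any meaningful process state.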
In SWaT 100, sensors record the values of attributes and the states of the various components. Transforming these attributes to binary valued attributes requires special care. For the most part, the actuators are either in the OPEN or CLOSED state for valves, and in the ON or OFF state for pumps. However, during the transition between the two states, these attributes assume a third value, making them ternary valued. The transition between the two states lasts less than 10 seconds and usually occurs after a long interval. Thus, the transition value of ternary valued attributes was replaced by the value of the state towards which the transition was headed, i.e. for a motorized valve, by OPEN if the transition was from CLOSED to OPEN, and by CLOSED if it was from OPEN to CLOSED. This change from ternary valued to binary valued attributes further reduced the possible state space used in the ARM procedure.
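The replacement of transition values can be sketched as a backward scan that assigns each transition sample the next settled state. The encoding is an assumption: the text only confirms that ‘2’ means OPEN or ON, so 1 for CLOSED/OFF and 0 for the transition state are hypothetical here.

```python
from typing import List, Optional, Sequence

# Assumed encoding: 0 = transition, 1 = CLOSED/OFF, 2 = OPEN/ON
TRANSITION = 0


def resolve_transitions(states: Sequence[int]) -> List[Optional[int]]:
    """Replace each transition sample by the state the actuator is heading to,
    i.e. the next non-transition value. A trailing transition with no known
    target is left as None."""
    out: List[Optional[int]] = list(states)
    target: Optional[int] = None
    for i in range(len(out) - 1, -1, -1):  # scan backwards so the target is known
        if out[i] == TRANSITION:
            out[i] = target
        else:
            target = out[i]
    return out
```

For a valve moving from CLOSED to OPEN and back, [1, 1, 0, 0, 2, 2, 0, 1] becomes [1, 1, 2, 2, 2, 2, 1, 1].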
To further reduce the possible set of rules, a naive feature selection may be applied. All the sensor and actuator attributes (after conversion to binary valued attributes) that did not change their values throughout the seven days of data are removed from the dataset. These included three types of attributes: all the backup actuators that remained in the OFF, or CLOSED, state during data collection because none of the active actuators failed; the actuators that remained in the ON, or OFF, state throughout the data collection process; and the sensor attributes that did not exhibit a change in value after discretization. Consequently, none of the attributes from the processes in stages 4 and 5 qualified for the final set, and the attribute set was reduced from fifty-one to fifteen attributes. In this way, only dynamic attributes that give meaningful information are selected. An exemplary list of dynamic attributes selected from the SWaT dataset is provided in Table 3.
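The naive feature selection amounts to dropping every attribute whose discretized column takes a single value. A sketch with hypothetical column data:

```python
from typing import Dict, List, Tuple

Columns = Dict[str, List[int]]


def drop_constant(columns: Columns) -> Tuple[Columns, List[str]]:
    """Keep only attributes whose discretized value changes at least once.
    Returns the kept columns and the sorted names of the removed ones."""
    kept = {name: col for name, col in columns.items() if len(set(col)) > 1}
    removed = sorted(set(columns) - set(kept))
    return kept, removed
```

Here a column such as a hypothetical backup pump that stayed OFF for the whole capture would be reported in the removed list.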
It is noted that highly correlated attributes may also be removed to reduce redundant attributes to further reduce the state space.
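Removal of highly correlated attributes could be sketched as a greedy filter on the Pearson correlation coefficient. Both the choice of Pearson correlation and the 0.95 cut-off are assumptions for illustration; the text specifies neither the measure nor the threshold. The filter presumes constant columns have already been dropped, since their correlation is undefined.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns: Dict[str, List[float]], threshold: float = 0.95) -> List[str]:
    """Greedily keep an attribute only if |r| < threshold against every attribute
    already kept; returns the kept attribute names in input order."""
    kept: List[str] = []
    for name, col in columns.items():
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, which member of a correlated pair survives depends on column order; in practice one might prefer to keep the attribute that is easier to interpret.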
To make the problem even more tractable, only “interesting” rules are selected using statistical constraints. Rules that meet a minimum criterion of support and confidence are deemed interesting. The concept of support and confidence is explained in the following section.
Let D denote a dataset of interest. A collection of values of one or more attributes, e.g., the pair water level and state of a motorized valve, is known as an item set. Item sets that satisfy a minimum support are referred to as frequent item sets. Support for an item set A in D is the proportion of examples (rows, or transactions) e in the dataset that contain A. Formally, support can be defined as follows.
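Written as a formula (a standard formulation consistent with the definition above, with |·| denoting the number of rows):

$$\operatorname{supp}(A) = \frac{\lvert \{\, e \in D : A \subseteq e \,\} \rvert}{\lvert D \rvert}$$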
It should be noted that setting a high support leads to few frequent item sets and thus a conservative model, whereas a small support results in an explosion of frequent item sets which will likely include rare item sets.
A frequent item set can be partitioned in more than one way into antecedents and consequents to generate rules of the type X⇒Y. Only rules that satisfy a minimum confidence level defined by the user are retained in the final set of association rules. Confidence is defined as the proportion of examples containing the antecedent that also contain the consequent; it measures how often the rule holds in the dataset when X has occurred. The confidence of a rule X⇒Y is defined as follows.
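In the same notation, the usual definition of confidence, consistent with its description above, is:

$$\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}$$

where supp(X ∪ Y) is the support of the item set containing the items of both X and Y. A confidence of 100% therefore means that every row containing X also contains Y.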
Thus, confidence can be interpreted as an estimate of the conditional probability of Y given X. Setting a low value of confidence yields rules that may be less accurate than those generated with a higher confidence. X and Y can each have one, two, or more attributes depending on the size of the frequent item set.
In the present embodiments, rules are mined with 100% confidence and a minimum support of 0.77%. The FP-growth frequent pattern mining algorithm is used to mine the association rules, using the implementation provided by the Python Orange-Associate library.
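The sketch below illustrates the mining step with thresholds like those named above. It deliberately substitutes a simple level-wise (Apriori-style) search for FP-growth so that it needs no third-party package; for the same data and thresholds both approaches find the same frequent item sets, but FP-growth is far more efficient on a dataset of SWaT's size. Items are hypothetical (attribute, value) pairs over the discretized attributes.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Item = Tuple[str, int]            # hypothetical (attribute, discretized value) pair
Itemset = FrozenSet[Item]
Rule = Tuple[Itemset, Itemset, float, float]  # antecedent, consequent, support, confidence


def frequent_itemsets(rows: Sequence[Dict[str, int]], min_support: float) -> Dict[Itemset, float]:
    """Level-wise search returning every item set whose support >= min_support."""
    if not 0 < min_support <= 1:
        raise ValueError("min_support must be in (0, 1]")
    transactions = [frozenset(row.items()) for row in rows]
    n = len(transactions)

    def support(itemset: Itemset) -> float:
        # fraction of rows that contain every item of the item set
        return sum(1 for t in transactions if itemset <= t) / n

    level = {frozenset([item]) for t in transactions for item in t}
    result: Dict[Itemset, float] = {}
    while level:
        frequent = {s: support(s) for s in level}
        frequent = {s: v for s, v in frequent.items() if v >= min_support}
        result.update(frequent)
        # join frequent k-item sets that differ in one item into (k+1)-item candidates
        keys = list(frequent)
        level = {a | b for a in keys for b in keys if len(a | b) == len(a) + 1}
    return result


def association_rules(freq: Dict[Itemset, float], min_confidence: float) -> List[Rule]:
    """Split each frequent item set into antecedent => consequent; keep confident rules."""
    rules: List[Rule] = []
    for itemset, supp in freq.items():
        for k in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, k)):
                # every subset of a frequent item set is frequent, so freq[ante] exists
                conf = supp / freq[ante]
                if conf >= min_confidence:
                    rules.append((ante, itemset - ante, supp, conf))
    return rules
```

With the settings of the present embodiments, the calls would be frequent_itemsets(rows, 0.0077) followed by association_rules(freq, 1.0), where rows holds one dictionary of discretized attribute values per sample.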
Table 4 lists the exemplary invariants that are derived. Notably, the list includes a large number of global invariants as compared to local invariants. This tilt towards global invariants points to the power of distributed attack detection, as global invariants relate state variables across stages and are therefore capable of detecting attacks that compromise all sensors and actuators at any one stage of SWaT 100.
In general, the following two challenges have to be overcome when deriving the operational invariants 331.
Transformation of Attributes:
Some of the attributes in the dataset are real valued, while ARM usually works on binomial attributes. Transforming these real valued attributes into binomial attributes is a challenging task, as the absence of proper boundaries may lead to incorrect rules or rules with low accuracy. There is also a problem with the trinomial attributes that represent motorized valves, which pass through a transition state. Hence, mapping this transition state to either the ON (OPEN) or the OFF (CLOSED) state is important, or else false alarms may be generated.
Very Large Set of Rules:
Association rule mining generates a large set of rules, most of which have low accuracy. The number of rules can be controlled through the support threshold. However, increasing the support level may cause the loss of important rules that do not have enough occurrences in the dataset to meet the support threshold. On the other hand, reducing the support threshold generates a large set of rules. Notably, some attribute values occur only rarely in the dataset. For example, there are 3164 rows where P602=ON, out of a total of 410400 rows. This implies that any rule containing P602=ON can have a maximum support of 3164/410400, i.e. 0.77%. Hence, unless the support is decreased to this level, no rule including P602=ON can be generated. Consequently, a large set of rules needs to be scanned in order to obtain meaningful and accurate rules.
At step 340, the design invariants 311 are correlated with the operational invariants 331 to produce an integrated set of invariants 341. In the absence of automation, deriving design invariants 311 requires an expert level of understanding of the physical process in SWaT 100. The invariants derived are thus accurate in their depiction of the physical processes in SWaT 100. However, due to the complexity of the task, certain hidden patterns may be overlooked by experts, limiting the scope of the derived invariants. On the other hand, although ARM is blind to the control strategy specifications and to the physical laws that govern the physical process of the system, generating operational invariants 331 yields invariants that are insightful and complex, and that may well have been overlooked by the experts. However, some obvious invariants might not be identified by ARM. Table 5 lists invariants that are common to both the design invariants 311 and the operational invariants 331.
On the other hand, many design invariants 311 derived in step 310 differ from the operational invariants 331 generated by ARM from the operational data 321 obtained in step 320. Table 6 lists design invariants that are not common to the operational invariants 331. The difference may arise because the ARM algorithm is unable to identify certain underlying relationships between different components of the physical system, because information was lost during discretization (e.g., for LIT101) and feature removal, or because the corresponding behaviour was not present in the dataset during the time window in which the data was collected. For example, in Table 6, invariants 3, 20, and 37 could not have been derived because, in the dataset used, the water levels of the tanks in SWaT 100 remain within the normal range between L and H. The tanks can only reach LL or HH if the plant is under attack, if an actuator is faulty, or if the plant is restarted with near-empty tanks.
Deriving design invariants 311 manually becomes increasingly complex as the size of the antecedent grows. Thus, design invariants 311 derived by comparing pairs of features, e.g., MV101 and LIT101, are relatively easier to derive than those in which, for example, 6 or more features are compared simultaneously. Advantageously, the operational invariants 331 generated in step 330 are able to capture the relationships between multiple sensors and actuators across different processes of SWaT 100 without any constraint on the size of the antecedent. Invariants that depend on multiple sensors and actuators, instead of single or pairwise sensors and actuators, may thus be generated.
Indeed, exclusion of invariants in the monitoring system of SWaT 100 would likely lead to attacks not being detected. If only operational invariants 331 are implemented in SWaT 100, then design invariants 311 that are not common to operational invariants 331 would not be implemented. For example, (LIT101≤L⇒MV101=OPEN) which is found in the first row of Table 6, and those corresponding to type (8) would not have been implemented. Thus, a simple single point attack that spoofs LIT101 values while keeping MV101 open, could lead to an overflow in tank T101. Several similar attacks can be derived that would not be detected.
By correlating the design invariants 311 to operational invariants 331, a richer integrated set of invariants 341 is obtained. Thus, higher accuracy of attack detection is achieved than when either the design invariants 311 or operational invariants 331 is used without the other.
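A deliberately simplified sketch of the integration step: invariants are represented as (antecedent, consequent) pairs of condition strings, identical invariants from both sources are kept once, and all others are retained. The representation and condition strings are hypothetical, and matching here is purely syntactic, whereas the correlation step described above may also involve comparing invariants that are expressed differently but mean the same thing.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical representation: (antecedent conditions, consequent conditions)
Invariant = Tuple[FrozenSet[str], FrozenSet[str]]


def make_invariant(antecedent: Iterable[str], consequent: Iterable[str]) -> Invariant:
    """Build an order-independent invariant from condition strings."""
    return frozenset(antecedent), frozenset(consequent)


def integrate(design: Set[Invariant], operational: Set[Invariant]) -> Tuple[Set[Invariant], Set[Invariant]]:
    """Return (integrated set, invariants common to both sources).
    Syntactically identical invariants are kept once; all others are retained."""
    common = design & operational
    integrated = design | operational
    return integrated, common
```

Retaining the design-only invariants is what preserves checks such as LIT101≤L⇒MV101=OPEN that the mined set alone would miss.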
In the present embodiment, the apparatus 500 comprises an invariant generator 511 configured to receive plant design 150 and component specification 160 of SWaT 100. The invariant generator 511 is configured to generate design invariants 311 from the plant design 150 and the component specifications 160. Therefore, the invariant generator 511 is also termed as a design invariant generator. The design invariant generator 511 is communicatively coupled to a code generator 551 which is configured to receive the design invariants 311.
The step of deriving design invariants 311 will now be described with reference to step 610 of
The apparatus 500 further comprises a data collector 521, which receives sensor data 522 collected from SWaT 100 and outputs operational data 321.
The step of obtaining operational data will now be described with reference to step 620 of
The apparatus 500 further comprises an operational invariant generator 531 communicatively coupled to the data collector 521. The operational invariant generator 531 receives the operational data 321 and generates the operational invariants 331 from the operational data 321. This involves a number of processes, which are described in the section “Deriving Operational Invariants from SWaT Dataset”. The operational invariant generator 531 therefore includes a number of components. In particular, the operational invariant generator 531 comprises a feature selector 532, which receives the operational data 321 and outputs a feature set 533. The feature selector 532 is communicatively coupled to a frequent itemset generator 534, which receives the feature set 533 and generates frequent itemsets 536 at a preselected level of support 535 from the selected feature set 533. The frequent itemset generator 534 is communicatively coupled to an association rule generator 537, which receives the frequent itemsets 536 and generates association rules (the operational invariants 331) at a preselected level of confidence 538.
The step of generating the operational invariants 331 is described herein with reference to step 630 of
Next, a subset of the original dataset containing only the selected feature set 533 is passed to the frequent itemset generator 534, which generates frequent itemsets 536 from it at the preselected level of support 535.
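The disclosure mines frequent itemsets with FP-growth. The sketch below instead uses the simpler level-wise Apriori search, which returns the same frequent itemsets for a given minimum support, to show what "frequent itemsets at a preselected level of support" means. Transactions are assumed to be sets of `name=value` items, and `frequent_itemsets` is a hypothetical helper, not the Orange-Associate API.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: List[Itemset],
                      min_support: float) -> Dict[Itemset, float]:
    """Level-wise (Apriori) search for itemsets with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    """
    n = len(transactions)
    result: Dict[Itemset, float] = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate of size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(level)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets with exactly k items.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return result


transactions = [
    frozenset({"P101=ON", "MV101=OFF"}),
    frozenset({"P101=ON", "MV101=OFF"}),
    frozenset({"P101=OFF", "MV101=ON"}),
    frozenset({"P101=ON", "MV101=ON"}),
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
```

Raising `min_support` shrinks the result, which is how the support threshold helps limit the number of candidate invariants.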
The frequent itemsets 536 are then input to the association rule generator 537, which generates the association rules (operational invariants 331) at the preselected level of confidence 538. Support and confidence thresholds are parameters that help control invariant explosion and reduce the chance of false alarms. In cyber-physical systems, a sufficiently high level of accuracy is vital to keep the false alarm rate from becoming unacceptably high.
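To make the role of the confidence threshold concrete, the following sketch derives rules from a dictionary of frequent itemsets and their supports, assumed to be the complete output of a frequent-itemset miner. The function name and the tuple layout of a rule are assumptions made for illustration. With the made-up supports below, `MV101=OFF -> P101=ON` has confidence 1.0 and is kept, while `P101=ON -> MV101=OFF` has a confidence of about 0.67 and is discarded at a threshold of 0.9.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
# (antecedent, consequent, support, confidence)
Rule = Tuple[Itemset, Itemset, float, float]


def association_rules(itemsets: Dict[Itemset, float],
                      min_confidence: float) -> List[Rule]:
    """Generate rules X -> Y with confidence >= min_confidence.

    confidence(X -> Y) = support(X | Y) / support(X). Every subset of a
    frequent itemset is itself frequent, so support(X) is in `itemsets`
    as long as `itemsets` is a complete miner output.
    """
    rules: List[Rule] = []
    for itemset, support in itemsets.items():
        for r in range(1, len(itemset)):  # non-empty, proper antecedents
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                confidence = support / itemsets[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


itemsets = {
    frozenset({"P101=ON"}): 0.75,
    frozenset({"MV101=OFF"}): 0.5,
    frozenset({"MV101=ON"}): 0.5,
    frozenset({"P101=ON", "MV101=OFF"}): 0.5,
}
rules = association_rules(itemsets, min_confidence=0.9)
```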
Unlike the design invariants 311, which are derived from the physical processes of SWaT 100, the generated operational invariants 331 have not yet been implemented in SWaT 100. Therefore, optionally, the apparatus 500 further comprises a rule validation processor 541 communicatively coupled to the operational invariant generator 531. The rule validation processor 541 is arranged to receive the operational invariants 331 from the operational invariant generator 531 and to validate them against the SWaT plant design 150 and the component specifications 160 to produce validated operational invariants 352.
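How the rule validation processor 541 checks rules against the plant design 150 is not specified here. One plausible check, assumed purely for illustration, represents the P&ID as a symmetric adjacency map of components and keeps a rule only if every component it mentions exists in the design and each antecedent component is physically connected to each consequent component; rules linking unrelated parts of the plant are treated as spurious correlations. All names in the sketch are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent) of 'name=value' items
Graph = Dict[str, Set[str]]                   # assumed symmetric adjacency from the P&ID


def component(item: str) -> str:
    """'MV101=OFF' -> 'MV101'."""
    return item.split("=", 1)[0]


def connected(graph: Graph, src: str, dst: str) -> bool:
    """Breadth-first search for a path between two components."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def validate(rules: List[Rule], graph: Graph) -> List[Rule]:
    """Keep rules whose components exist in the design and are physically linked."""
    valid = []
    for antecedent, consequent in rules:
        names = {component(i) for i in antecedent | consequent}
        if not names.issubset(graph):  # mentions a component absent from the design
            continue
        if all(connected(graph, component(a), component(c))
               for a in antecedent for c in consequent):
            valid.append((antecedent, consequent))
    return valid


graph = {"MV101": {"LIT101"}, "LIT101": {"MV101", "P101"}, "P101": {"LIT101"}, "P301": set()}
rules = [
    (frozenset({"MV101=OFF"}), frozenset({"P101=ON"})),  # linked via LIT101: kept
    (frozenset({"P301=ON"}), frozenset({"P101=ON"})),    # no physical path: dropped
    (frozenset({"X999=ON"}), frozenset({"P101=ON"})),    # unknown component: dropped
]
validated = validate(rules, graph)
```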
The step of validating the operational invariants 331 will now be described with reference to step 640 of
The rule validation processor 541 is communicatively coupled to the code generator 551 which receives the validated operational invariants 352. Notably, if the rule validation processor 541 is not required, then the operational invariant generator 531 may be communicatively coupled directly to the code generator 551 (not shown in
The code generator 551 comprises a processor (not shown) which correlates the design invariants 311 and validated operational invariants 352 to produce the integrated set of invariants 341. The code generator 551 then encodes the integrated set of invariants 341 to produce coded integrated invariants 552.
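The correlation step and the encoding format are described only at a high level. The sketch below assumes that both kinds of invariant are represented as (antecedent, consequent) pairs of `TAG=LITERAL` conditions on integer-valued tags. It removes redundancy by dropping exact duplicates and any invariant implied by a more general one (a subset antecedent with a superset consequent), then emits one IEC 61131-3 structured-text `IF` statement per invariant that latches an alarm bit when the antecedent holds but the consequent does not. The alarm array `INV_ALARM` and all tag names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

# (antecedent, consequent), each a set of 'TAG=LITERAL' conditions.
Invariant = Tuple[FrozenSet[str], FrozenSet[str]]


def integrate(design: Iterable[Invariant],
              operational: Iterable[Invariant]) -> List[Invariant]:
    """Merge both invariant sets, dropping duplicates and implied invariants."""
    # General invariants first: smaller antecedent, then larger consequent.
    ordered = sorted(set(design) | set(operational),
                     key=lambda inv: (len(inv[0]), -len(inv[1]),
                                      sorted(inv[0]), sorted(inv[1])))
    kept: List[Invariant] = []
    for ante, cons in ordered:
        # (a -> c) implies (ante -> cons) when a <= ante and cons <= c.
        implied = any(a <= ante and cons <= c for a, c in kept)
        if not implied:
            kept.append((ante, cons))
    return kept


def _conjunction(conditions: FrozenSet[str]) -> str:
    parts = (cond.split("=", 1) for cond in sorted(conditions))
    return " AND ".join(f"{tag} = {literal}" for tag, literal in parts)


def to_structured_text(invariants: List[Invariant], alarm: str = "INV_ALARM") -> str:
    """One structured-text IF statement per invariant; the alarm bit latches TRUE."""
    lines = []
    for n, (ante, cons) in enumerate(invariants, start=1):
        lines.append(f"IF ({_conjunction(ante)}) AND NOT ({_conjunction(cons)}) "
                     f"THEN {alarm}[{n}] := TRUE; END_IF;")
    return "\n".join(lines)


design = [(frozenset({"LIT101_HIGH=1"}), frozenset({"MV101=0"}))]
operational = [
    (frozenset({"LIT101_HIGH=1"}), frozenset({"MV101=0"})),            # duplicate
    (frozenset({"LIT101_HIGH=1", "P101=1"}), frozenset({"MV101=0"})),  # implied
    (frozenset({"MV101=0"}), frozenset({"P101=1"})),
]
integrated = integrate(design, operational)
print(to_structured_text(integrated))
```

Sorting general invariants first guarantees that an implied invariant is always compared against the one that implies it.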
The step of producing coded integrated invariants 552 will now be described with reference to step 650 of
The apparatus further comprises a monitor placement 561 communicatively coupled to the code generator 551. The monitor placement 561 receives the coded integrated invariants 552 and places the coded integrated invariants 552 inside respective PLCs in SWaT 100.
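The placement logic is not spelled out here. As an illustrative assumption, each invariant can be assigned to the PLC that owns every tag it reads; an invariant whose tags span several PLCs cannot be evaluated locally and is instead marked for placement on the communications network, where all of its inputs would have to be observable. The tag-to-PLC map and the `NETWORK` label are hypothetical.

```python
from typing import Dict, FrozenSet, List, Mapping, Tuple

Invariant = Tuple[FrozenSet[str], FrozenSet[str]]  # 'TAG=LITERAL' conditions


def place_monitors(invariants: List[Invariant],
                   tag_owner: Mapping[str, str]) -> Dict[str, List[Invariant]]:
    """Group invariants by the PLC that owns all of their tags.

    Raises KeyError if an invariant mentions a tag missing from `tag_owner`.
    """
    placement: Dict[str, List[Invariant]] = {}
    for ante, cons in invariants:
        owners = {tag_owner[cond.split("=", 1)[0]] for cond in ante | cons}
        target = next(iter(owners)) if len(owners) == 1 else "NETWORK"
        placement.setdefault(target, []).append((ante, cons))
    return placement


tag_owner = {"LIT101": "PLC1", "MV101": "PLC1", "P101": "PLC1", "MV201": "PLC2"}
invariants = [
    (frozenset({"LIT101=1"}), frozenset({"MV101=0"})),  # all tags on PLC1
    (frozenset({"P101=1"}), frozenset({"MV201=1"})),    # spans PLC1 and PLC2
]
placement = place_monitors(invariants, tag_owner)
```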
Placement of the coded integrated invariants 552 will now be described with reference to step 660 of
The present disclosure defines an attack detection system that uses both the control strategy of the plant system and association rule mining to discover the inherent behaviour of the plant system for detecting process anomalies. The design invariants 311 and the operational invariants 331 are derived/generated separately, and a combined set of invariants 341, free of redundancy, is generated and implemented to monitor a plant process. Compared with using either the design invariants 311 or the operational invariants 331 alone, this improves the accuracy of distributed attack detection and reduces false alarms. That said, if the operational invariants 331 are to be used alone, they should be augmented with other approaches for deriving invariants that involve continuous variables, such as LIT in SWaT 100. Furthermore, operational invariants may be generated continuously while operational data 321 is being collected during plant operation. This would enable parameters, e.g. the opening and closing times of a valve, to be retuned as the plant ages and its components degrade. Notably, tuning the parameters of the design invariants 311 in this way is ineffective, because their derivation assumes the parameters available at the time of plant design.
Additionally, the derivation of the design invariants should be automated if it is to be used in large plants. Such plants generally have hundreds, if not thousands, of sensors and actuators, making it practically impossible to manually generate even simple invariants with an antecedent of size 1.
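To illustrate why automation is feasible, the sketch below generates simple design invariants with an antecedent of size 1 from a tabular description of each tank. The two templates (a high level implies the inlet valve is closed; a low level implies the outlet pump is off), the `TankSpec` fields and the tag naming are assumptions for illustration only, not the heuristics used in the disclosure.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple

Invariant = Tuple[FrozenSet[str], FrozenSet[str]]


@dataclass(frozen=True)
class TankSpec:
    """Hypothetical per-tank entry extracted from the P&ID and component specs."""
    level_sensor: str  # e.g. "LIT101"
    inlet_valve: str   # e.g. "MV101"
    outlet_pump: str   # e.g. "P101"


def tank_invariants(spec: TankSpec) -> List[Invariant]:
    """Two assumed templates, each with a single-condition antecedent."""
    return [
        # Level at or above the high setpoint: inlet valve must be closed.
        (frozenset({f"{spec.level_sensor}_HIGH=1"}), frozenset({f"{spec.inlet_valve}=0"})),
        # Level at or below the low setpoint: outlet pump must be off.
        (frozenset({f"{spec.level_sensor}_LOW=1"}), frozenset({f"{spec.outlet_pump}=0"})),
    ]


def design_invariants(specs: Sequence[TankSpec]) -> List[Invariant]:
    """Apply the templates to every tank in the plant."""
    return [inv for spec in specs for inv in tank_invariants(spec)]


specs = [TankSpec("LIT101", "MV101", "P101"), TankSpec("LIT301", "MV201", "P301")]
invariants = design_invariants(specs)
```

Because each template is applied mechanically, adding tanks to the specification adds invariants without any manual derivation.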
Violation of an invariant does not necessarily imply a cyber attack; it could also be due to the failure of one or more components. State information has to be analysed to determine whether an alert generated by monitors derived from the invariants is due to a cyber attack or to component failure.
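As a minimal sketch, assuming invariants are held as (antecedent, consequent) pairs of `TAG=VALUE` conditions, a monitor evaluated against a single logged state snapshot could report violations as follows. It only reports which invariants evaluate to false; deciding whether a violation stems from an attack or a component failure requires the further state analysis described above and is not attempted here. Tag names and values are hypothetical.

```python
from typing import FrozenSet, List, Mapping, Tuple

Invariant = Tuple[FrozenSet[str], FrozenSet[str]]


def holds(conditions: FrozenSet[str], state: Mapping[str, str]) -> bool:
    """True if every 'TAG=VALUE' condition matches the state."""
    return all(state.get(tag) == value
               for tag, value in (cond.split("=", 1) for cond in conditions))


def violated(invariants: List[Invariant], state: Mapping[str, str]) -> List[int]:
    """Indices of invariants whose antecedent holds but whose consequent does not."""
    return [i for i, (ante, cons) in enumerate(invariants)
            if holds(ante, state) and not holds(cons, state)]


invariants = [
    (frozenset({"LIT101_HIGH=1"}), frozenset({"MV101=0"})),
    (frozenset({"MV101=0"}), frozenset({"P101=1"})),
]
state = {"LIT101_HIGH": "1", "MV101": "1", "P101": "0"}
alerts = violated(invariants, state)  # only the first invariant is violated
```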
As can be appreciated from the described embodiments, process anomalies are used to detect cyber-physical attacks on critical infrastructure such as water treatment and electric power generation plants. Process anomalies can be identified using rules that govern the physical and chemical behaviour of the process within a plant. These rules, often referred to as invariants (or monitors when implemented), are derived/generated by integrating both the plant design and the data generated in an operational plant.
Although the present disclosure has been described with reference to specific exemplary embodiments, various modifications may be made to the embodiments without departing from the scope of the invention as laid out in the claims. For example, each invariant is coded in an appropriate language depending on where in the plant it is placed. In SWaT 100, the invariants are coded in structured text and placed inside the PLCs to serve as process monitors. However, these monitors could also be placed at level 1 and level 0 of the communications network, in which case care must be taken to ensure that all data needed to evaluate the invariants is available on the network. It is understood that the skilled person would know the best location(s) to place the invariants in a plant.
Furthermore, while the association rules are mined here using the FP-growth frequent pattern mining algorithm, implemented with the Python Orange-Associate library, association rule mining (ARM) could be implemented using any one of the many algorithms available to the skilled person. Similarly, different heuristic techniques at the skilled person's disposal may be used to generate the design invariants 311.
The various embodiments discussed above may be practised with steps in a different order from that disclosed in the description and illustrated in the Figures. Modifications and alternative constructions apparent to the skilled person are understood to be within the scope of the disclosure.
| Number | Date | Country | Kind |
|---|---|---|---|
| 548/2017 | Oct 2017 | PK | national |
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/SG2018/050522 | 10/23/2018 | WO | 00 |