The Controller Area Network (CAN) bus, designed by Robert Bosch GmbH in 1983, is the most popular standard for in-vehicle network communication. The CAN bus is a broadcast medium that uses a lossless bitwise arbitration method to resolve contention during data transmissions among the different Electronic Control Units (ECUs) in a vehicle. Due to the lack of CAN frame encryption and authentication, a CAN bus is vulnerable to various attacks, which can in general be divided into message injection, suspension, and falsification. Existing CAN bus anomaly detection mechanisms either detect only one or two of these attacks, or require numerous CAN messages per prediction, which makes real-time performance hard to achieve.
In detail, every CAN message is assigned a message identifier (CAN ID) based on its functionality and priority. A message with a lower ID wins the contention when two messages collide. In a vehicle CAN bus, messages containing crucial information, such as those related to the powertrain and vehicle safety, are assigned lower CAN IDs, while infotainment and telematics messages have higher CAN IDs.
The CAN bus protocol does not provide any authentication or encryption mechanisms, so attackers have the chance to compromise the CAN bus in a vehicle by inserting forged messages. For a given CAN ID, once forged messages overwhelm normal messages, attackers can take control of the relevant operations in the targeted vehicle, which can cause severe consequences if that CAN ID is related to the powertrain or vehicle safety. Furthermore, the recent development of Intelligent Transportation Systems (ITS) and the Internet of Things (IoT) continuously expands the attack surface, which includes sensors, Wi-Fi, On-Board Diagnostics (OBD) and others. In 2021, a global automotive cyber security report based on 633 publicly reported incidents over the preceding decade showed an exponential growth trend in cyber-attacks on connected vehicles.
Considering this, several Intrusion Detection Systems (IDSs) have been proposed to increase the security of the CAN bus. In general, an adversary that wants its forged messages to overwhelm normal ones can inject forged messages (DoS or fuzzy attack), suspend normal messages (suspension attack), or falsify the data contents of normal messages (replay or spoofing attack). ECUs in the same CAN network usually transmit their messages at a comparatively fixed frequency, which makes the statistics of CAN message sequences comparatively stable. Consequently, if either of the first two attack strategies occurs, message frequencies or message sequences are likely to change, which is the motivation for existing frequency- or sequence-based detection methods. Other approaches consider message falsification attacks; they are based on the observation that message contents with the same CAN ID do not vary too much.
The situation becomes complicated when message injection/suspension attacks are considered together with message falsification attacks. Message falsification attacks do not change message frequencies or sequence patterns, so they evade IDSs that consider only message injection/suspension. On the other hand, IDSs for message falsification cannot identify Denial-of-Service (DoS) attacks or bus-off attacks, which do not need to change normal message contents. Recently, some related works have tried to detect both of the above kinds of attacks simultaneously based on Long Short-Term Memory (LSTM) autoencoders and bloom filtering. However, these schemes need to generate a separate model and analyze the relevant messages for each CAN ID, which can induce very high computation complexity and intrusion detection delay.
The accompanying drawings provide visual representations which will be used to describe various representative embodiments more fully and can be used by those skilled in the art to understand better the representative embodiments disclosed and their inherent advantages. In these drawings, like reference numerals identify corresponding or analogous elements.
The various apparatus and devices described herein provide mechanisms for anomaly detection in a Controller Area Network (CAN bus) or other networks where messages between nodes follow one or more patterns.
While this present disclosure is susceptible of embodiment in many different forms, there is shown in the drawings and will herein be described in detail specific embodiments, with the understanding that the embodiments shown and described herein should be considered as providing examples of the principles of the present disclosure and are not intended to limit the present disclosure to the specific embodiments shown and described. In the description below, like reference numerals are used to describe the same, similar or corresponding parts in the several views of the drawings. For simplicity and clarity of illustration, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.
The disclosure relates to an intrusion detection system (IDS) that can detect and classify various attacks simultaneously, in as little as three milliseconds, based on Graph Neural Networks (GNNs). The disclosure is described below with reference to a CAN bus. However, the disclosed mechanisms may be applied to other networks that exhibit one or more message patterns. In various embodiments, directed attributed graphs are generated from message streams in given message intervals. Graph node attributes denote the data contents of messages, while each graph edge attribute represents the frequency of a particular message pair in the given time interval. A GNN is then trained on the generated message graphs.
Optionally, IDS 100 may include anomaly classifier 110 configured to process the input feature vectors using a second graph neural network, based on the same graph data, to produce second output feature vectors, and then classify the anomaly. The anomaly may be classified by processing the second output feature vectors through one or more second output layers. The classifier need only be operated when an anomaly is detected by anomaly detector 108.
Since training data may be highly imbalanced, IDS 100 uses a two-stage classifier cascade that is composed of a one-class classifier for anomaly detection and a multi-class classifier for attack classification. In one embodiment, an openmax layer is used in the multi-class classifier to tackle new anomalies from unknown classes. To take advantage of crowdsourcing while protecting user data privacy, federated learning may be used to train a universal model that covers different driving scenarios and vehicle states. Extensive experiment results show the effectiveness and efficiency of the disclosed approach.
The present disclosure relates to a CAN bus Intrusion Detection System (IDS), based on a graph neural network (GNN), which can efficiently detect message injection, suspension, and falsification attacks simultaneously. Directed attributed graphs are constructed to include both statistical CAN message sequences and message contents. Therein, message sequences are described by the nodes, edges and edge attributes of the graphs, while message contents are summarized as the corresponding node attributes. For example, data contents of messages with a particular CAN ID may be preprocessed by the "READ" method (disclosed in M. Marchetti and D. Stabili, "READ: Reverse engineering of automotive data frames," IEEE Trans. Inf. Forensics Secur., vol. 14, no. 4, pp. 1083-1097, 2018). With these generated CAN message graphs, a two-stage GNN-based classifier cascade is trained to build the disclosed IDS. Alternatively, data contents, such as signals and counters, may be extracted from message payloads using known message protocols. For example, a 64-bit payload may include four 15-bit values denoting encoded wheel speeds and a 4-bit checksum. Protocol information may be provided in a *.DBC database file or other format and stored for use by the IDS computer.
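As an illustration of the protocol-based alternative, the following minimal sketch extracts signals under the hypothetical payload layout described above (four 15-bit wheel speeds followed by a 4-bit checksum). The layout and function name are illustrative assumptions, not a real DBC definition.

```python
def extract_signals(payload: bytes) -> dict:
    """Split a 64-bit payload into four 15-bit wheel speeds and a 4-bit checksum.

    The layout is the hypothetical example from the text, not a real protocol.
    """
    assert len(payload) == 8, "classical CAN data field is at most 8 bytes"
    bits = int.from_bytes(payload, byteorder="big")
    speeds, shift = [], 64
    for _ in range(4):
        shift -= 15
        speeds.append((bits >> shift) & 0x7FFF)  # take the next 15-bit block
    return {"wheel_speeds": speeds, "checksum": bits & 0xF}  # final 4 bits

print(extract_signals(bytes.fromhex("123456789ABCDEF0")))
```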
IDS 400 includes first stage 412 that performs anomaly detection. The first stage includes a graph neural network (GNN) 416. The structure of GNN 416 is based on the graph structure 414. Input feature vectors 418 are processed by GNN 416 to produce output feature vectors 420. In turn, output feature vectors 420 are passed through a one-class classification layer 422 that determines whether a sequence of messages contains normal data 424 or one or more anomalies 426. An advantage of using one-class classification layer 422 for anomaly detection is that attacked data are usually hard to acquire in the real world; normal data therefore usually dominate the training set, so the data may become highly imbalanced.
Once an attacked data sample is captured by first-stage detector 412, the data sample is processed by second-stage classifier 428 for attack classification. The second stage includes a graph neural network (GNN) 430. Again, the structure of GNN 430 is based on the graph structure 414′ which is the same as structure 414. Input feature vectors 418′ are processed by GNN 430 to produce output feature vectors 432. In turn, output feature vectors 432 are passed through a multi-class classification layer that classifies the attack. During training, the classification layer may be a softmax layer 434 that produces a multi-class classification 436. During normal operation (inference) an openmax layer 438 is used to produce a classification 440 that can include new anomalies from potentially unknown classes, which will be buffered for further investigation and open world recognition.
As discussed in more detail below, different vehicle states can lead to variations in message sequences, message contents, and hence message graphs. Therefore, a model trained on one vehicle is constrained by that vehicle's limited driving scenarios (e.g., a vehicle may mostly be driven locally at low speed with many stops) and vehicle states, and so cannot be applied to other vehicles with different driving scenarios and vehicle states (e.g., a vehicle mostly driven on the highway at high speed with few stops). To take advantage of crowdsourcing while protecting user data privacy, a federated learning framework may be adopted to train a universal model that covers a wide range of driving scenarios and vehicle states.
A CAN bus Intrusion Detection System (IDS) is disclosed that can efficiently detect CAN message injection/suspension and message falsification attacks at the same time. Instead of a simple combination of the above two traditional IDSs, a CAN message graph is used to integrate message contents with statistical message sequences in terms of CAN ID pairs.
Further, a graph neural network (GNN) is disclosed that can be used for directed attributed graphs. Considering that attacked data are hard to acquire in the real world, which may cause highly imbalanced training sets, a two-stage classifier cascade is used to tackle normal and attacked CAN data respectively. An openmax layer is used to cope with new anomalies from potentially unknown classes.
Still further, a federated learning mechanism is disclosed to cover different driving scenarios and vehicle states while protecting data privacy.
Performance of the proposed IDS is discussed below with reference to extensive experiments based on several real-world datasets. Through comparisons with three baselines, it is shown that the IDS can achieve similar performance to both IDSs based on statistical CAN message sequences and message contents. Besides, federated learning can effectively combine models derived under different driving scenarios and vehicle states and improve intrusion detection performance.
Attacks may be introduced into a network through a physical interface, such as a maintenance or sensor interface, or through a wireless interface, such as an entertainment or information interface.
Attacks to in-vehicle networks or CAN buses may include message sniffing, message injection, message suspension and message falsification. Therein, the latter three categories of attacks can further impact the normal functionality of vehicles.
Message injection attack is usually considered together with message suspension attack, since they are both related to message frequency or statistical message sequence in terms of CAN ID. In practice, message falsification attack is realized through a combination of message sniffing, message injection and message suspension.
Specific attack types include:
Denial-of-Service (DoS): This attack can also be called bus-off attack, which aims at paralyzing CAN bus systems through continuously injecting legitimate CAN messages with a low CAN ID, e.g., 0x000. Since the CAN bus uses a lossless bitwise arbitration method to tackle data transmission contention, some functionalities assigned with higher CAN IDs will never get the chance to be transmitted.
Fuzzy: This attack will generate and transmit CAN messages with random CAN IDs and data contents. The CAN IDs will range from 0x000 to 0x7FF, and some of them originally may not be used in the compromised vehicles. Such a type of attack can interfere with some functionalities of victims if extra lower CAN IDs are introduced into the systems. Besides, the randomly generated data contents can mislead vehicles if those CAN IDs are initially used.
Suspension: This is the message suspension attack itself. The attacker tries to compromise some ECUs in the CAN bus system and stop them from sending any CAN messages.
Replay: This attack tries to compromise some ECUs in the CAN bus system and store valid CAN messages from a particular time period, which are transmitted later. Such an attack can mislead compromised vehicles since those stored CAN messages are outdated. If the attacker can further suspend the sniffed ECUs, the attack becomes a message falsification attack.
Spoofing: Similar to replay attack, spoofing attack will at first try to sniff some ECUs in the CAN bus system. However, a spoofing attack will try to impersonate those compromised ECUs by simulating their message transmission frequencies, while the related data contents are usually forged. Message falsification attack can also be realized here by further suspending those compromised ECUs.
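For concreteness, the following sketch shows how each of the five attack types above manipulates a simplified CAN message stream, modeled here as a list of (CAN ID, payload) tuples. The helper names and the stream model are illustrative assumptions, not part of the disclosed IDS.

```python
import random

def dos(stream, n):
    """DoS/bus-off: inject n highest-priority messages (CAN ID 0x000)."""
    return [(0x000, b"\x00" * 8)] * n + stream

def fuzzy(stream, n):
    """Fuzzy: inject messages with random CAN IDs (0x000-0x7FF) and payloads."""
    forged = [(random.randint(0x000, 0x7FF), random.randbytes(8))  # Python 3.9+
              for _ in range(n)]
    mixed = stream + forged
    random.shuffle(mixed)
    return mixed

def suspension(stream, victim_id):
    """Suspension: a compromised ECU stops sending its messages."""
    return [m for m in stream if m[0] != victim_id]

def replay(stream, victim_id, at):
    """Replay: re-transmit sniffed (now outdated) messages of a victim ID."""
    sniffed = [m for m in stream if m[0] == victim_id]
    return stream[:at] + sniffed + stream[at:]

def spoofing(stream, victim_id, forged_payload):
    """Spoofing: keep the victim's transmission pattern, forge its contents."""
    return [(i, forged_payload if i == victim_id else p) for i, p in stream]
```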
Intrusion Detection Systems
IDSs can be divided into anomaly detection-based and signature-based methods, which respectively identify intrusions by comparison with normal data and with known attacks. Most CAN bus IDSs are anomaly detection-based methods. Message injection or suspension attacks change CAN message frequencies or statistical message sequences in terms of CAN ID pairs, which is the basis of the related works. Prior approaches either use message frequencies directly for intrusion detection or apply data analysis methods to statistical message sequences. Such methods compare two message sequences through statistical metrics, such as cosine similarity, Pearson correlation or the chi-squared test. If a significant change in message frequencies or sequences (metric values larger than given thresholds) is detected, they predict that an intrusion occurred in the second message interval. Other approaches rely on machine learning based methods, such as LSTM autoencoders, to detect message injection or suspension attacks through statistical CAN message sequence reconstruction. In addition, it has been proposed to convert CAN message streams to images and train a Generative Adversarial Network (GAN) for CAN bus intrusion detection.
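As a concrete illustration of the threshold-based statistical approach described above, the following sketch compares the CAN ID frequency vectors of two successive message intervals with cosine similarity and flags the second interval when the similarity drops below a threshold. The threshold value is illustrative, not one reported by the cited works.

```python
import numpy as np

def id_frequency_vector(interval, id_space=0x800):
    """Count how often each of the 2048 possible CAN IDs appears in an interval."""
    counts = np.zeros(id_space)
    for can_id, _payload in interval:
        counts[can_id] += 1
    return counts

def injection_suspected(prev_interval, curr_interval, threshold=0.9):
    """Flag the current interval when its ID distribution shifts sharply."""
    u = id_frequency_vector(prev_interval)
    v = id_frequency_vector(curr_interval)
    cosine = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return cosine < threshold
```

Note that, as observed later in this disclosure, such a detector presumes the previous interval is a trustworthy reference, which cannot always be guaranteed during intrusion detection.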
None of the above IDSs can properly tackle message falsification attacks. Existing works targeting message falsification attacks assume that message contents, or some bits therein, do not vary much. Early works consider the Hamming distance between the bit representations of each pair of CAN messages, or message entropy, for intrusion detection. Machine learning approaches adopt techniques originally used for identifying human actions.
Recently, the development of natural language processing inspires some CAN bus IDSs based on LSTM to detect message injection/suspension attacks together with message falsification attacks. It has also been proposed to consider both CAN message frequencies and message contents through bloom filtering and ensemble learning. However, these schemes need to either consider each CAN ID separately or train multiple machine learning models. Considering each CAN ID separately will induce high data collection delay, since sufficient numbers of CAN messages are required for each CAN ID.
The disclosed IDS retains the advantage of IDSs based on CAN message sequences, which only need a sufficiently large number of CAN messages overall. On the other hand, training multiple machine learning models brings much higher computation complexity than training a single model. Furthermore, most existing CAN bus IDSs, especially those based on LSTM, detect attacks on the basis of a trustworthy reference (a normal CAN message sequence in the previous time slot), which usually cannot be guaranteed during intrusion detection.
Preliminaries and Basics
CAN Message Format
In accordance with various embodiments of the disclosure, when the IDS has buffered enough previously unknown attacks or anomalies, “open world” recognition strategies are used to tackle them.
A comprehensive comparison between the disclosed IDS and existing schemes is summarized in TABLE I. References to the schemes used for comparison are provided below. A comparison of detection functionality is also listed in the table. Existing IDSs for the CAN bus system can only tell whether an attack happens or not (binary), while the disclosed IDS can also identify the specific attack type and tackle previously unknown anomalies (open multi-class).
CAN Message Graph
As discussed above, CAN messages with each CAN ID are usually transmitted at a comparatively fixed frequency. It has been observed that fixed frequencies further imply stable statistical message sequences in terms of CAN ID pairs. In addition, several Electronic Control Units (ECUs) may need to collaborate with each other to accomplish a vehicle operation task, which is realized by transmitting successive CAN messages. For example, after a CAN message reflecting a gas (accelerator) increase is transmitted, it is likely to be followed by a CAN message denoting an increase in the revolutions per minute (rpm) of the vehicle engine and another CAN message representing vehicle acceleration. Considering these two factors, the sequences should follow comparatively fixed patterns.
With this observation, data analysis metrics, such as cosine similarity and Pearson correlation, or machine learning models, such as LSTM autoencoders, can be applied to statistical message sequences for intrusion detection. For real-time analysis, CAN messages are considered in intervals or windows, which usually span 100 to 200 messages. Streaming this number of CAN messages usually takes on the order of milliseconds. If the message interval is too short, e.g., fewer than 100 messages, the message sequence can become unstable, since messages with higher CAN IDs do not need to be transmitted very frequently and these CAN IDs may not appear in some message intervals. Message intervals may overlap.
Besides statistical message sequence, a message graph is another possible structure to describe CAN message streams. Compared with message sequences, message graphs can further embed message contents. Graph structures have both edge attributes and node attributes, and they provide the possibility to simultaneously detect all the three categories of attacks mentioned above.
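A minimal sketch of this graph construction follows, assuming an interval is a list of (CAN ID, payload) tuples in arrival order: one node per observed CAN ID, a directed edge (a, b) for each message with ID b that immediately follows a message with ID a, and the pair count as the edge attribute. Node attributes, i.e., the preprocessed message contents, are attached separately as described in the next subsection.

```python
from collections import defaultdict

def build_message_graph(interval):
    """Build a directed attributed graph from one CAN message interval."""
    nodes = sorted({can_id for can_id, _ in interval})
    edge_count = defaultdict(int)
    for (a, _), (b, _) in zip(interval, interval[1:]):
        edge_count[(a, b)] += 1  # frequency of this consecutive CAN ID pair
    return nodes, dict(edge_count)

# Example: a DoS attack floods ID 0x000, so the self-loop edge (0, 0)
# acquires an abnormally large attribute relative to normal traffic.
```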
CAN Message Content Preprocessing
As described above, the data content of each CAN message can be used to identify message falsification attacks. In addition, proper data division may be used to improve intrusion detection accuracy. The message protocol may be known in advance; alternatively, the protocol can be inferred (reverse engineered) from the messages themselves. For example, CAN message contents may be divided based on the rate of bit-flips. The general idea is that, for each signal semantic, the most significant bit of the related data block varies much more slowly than the least significant one. In this case, if a bit with a high bit-flip rate is followed by one with a low rate, those two bits probably belong to two different signal semantics or data blocks.
In order to calculate the bit-flip rates, a certain number of CAN messages are collected for each CAN ID. Note that the CAN message content division only needs to be conducted once, based on the training set, before the IDS model is trained; consequently, it introduces no data collection delay during intrusion detection. Besides, any CAN message content associated with a new CAN ID during intrusion detection may be neglected, since such a message simply indicates a fuzzy attack.
In the descriptions below, CAN message content is described by a node matrix $V \in \mathbb{R}^{n \times n'}$. Message contents with different CAN IDs may be separated into different numbers of data blocks, each of which is further converted to a decimal number. The integer $n'$ in the node matrix denotes the maximal number of data blocks across all CAN IDs. The number of data blocks is aligned for all CAN IDs by prepending the required number of 0s.
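The following sketch illustrates this preprocessing under a simplifying assumption: a block boundary is placed wherever the bit-flip rate drops sharply from one bit position to the next (the MSB of a new signal flips much more slowly than the LSB of the previous one). The drop threshold and helper names are illustrative.

```python
import numpy as np

def bit_flip_rates(payload_bits):
    """payload_bits: (num_messages, 64) array of 0/1 values for one CAN ID."""
    flips = np.abs(np.diff(payload_bits, axis=0))
    return flips.mean(axis=0)  # per-bit flip rate across successive messages

def split_points(rates, drop=0.5):
    """A boundary where a fast-flipping bit precedes a slow-flipping one."""
    return [i + 1 for i in range(len(rates) - 1) if rates[i] - rates[i + 1] > drop]

def node_attributes(bits, boundaries, max_blocks):
    """Convert each data block to a decimal and left-pad with 0s to align."""
    blocks = np.split(bits, boundaries)
    values = [int("".join(str(b) for b in block), 2) for block in blocks]
    return [0] * (max_blocks - len(values)) + values
```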
First Stage Classifier
Graph Neural Network
Graph learning has recently drawn increasing attention. Current graph learning frameworks can be divided into three levels, i.e., node level, edge level and graph level. Therein, node level algorithms can be utilized to determine whether a particular CAN message is forged or not. The disclosed CAN bus IDS is based on graph level algorithms. All three levels of models start with graph convolution layers, and graph level schemes realize graph classification tasks by further introducing pooling and readout layers.
The convolution layer of GNN takes the following form:
$$Z = f(\tilde{D}^{-1}\bar{A}XW). \tag{1}$$
The node matrix $V$ is concatenated with the edge matrix $E$ to generate a descriptor for CAN message graphs, denoted as $X \in \mathbb{R}^{n \times (n+n')} = [V; E]$. Data content vectors in the node matrix may be zero-padded to the maximum length. Such an operation embeds edge attributes and node attributes into a feature vector for each node; in other words, the attributes of edges starting from a given node are treated as part of that node's attributes. $\bar{A} = A + I$ represents the adjacency matrix with added self-loops, which are realized through the identity matrix $I$. The CAN message graphs are directed and may already contain self-loops, so the diagonal elements of the adjacency matrix $A$ are set to zero before the identity matrix $I$ is added. $\tilde{D}$ is a diagonal degree matrix with $\tilde{d}_{ii} = \sum_{j=1}^{n} \bar{a}_{ij}$, where $\tilde{d}_{ii}$ and $\bar{a}_{ij}$ are respectively elements of $\tilde{D}$ and $\bar{A}$ with the corresponding indices. $W \in \mathbb{R}^{(n+n') \times c}$ denotes the model parameters of a $1 \times 1 \times c$ convolution layer, where $c$ is the number of feature channels in the convolution layer. $f(\cdot)$ is a pointwise nonlinear activation function, such as the Rectified Linear Unit (ReLU).
The whole graph convolution process can be explained as follows. Node and edge attributes are first fed into the convolution layer through a linear feature transformation $XW$. Afterwards, for each node, its "channel descriptor," denoted by the rows of the matrix $Y = XW$, is propagated to its neighborhood, including itself, through $\bar{A}Y$; this is why the adjacency matrix $\bar{A} = A + I$ includes self-loops. The propagation results are then normalized by multiplying them with the inverse diagonal degree matrix $\tilde{D}^{-1}$, which aims at keeping a fixed feature scale after graph convolution. Finally, a pointwise nonlinear activation function $f(\cdot)$ is applied before the graph convolution results are output.
Similar to convolutional neural networks applied in image processing, multiple convolution layers are stacked in order to capture graph substructure features at different scales. In this case, the $t$-th convolution layer computes

$$Z_t = f(\tilde{D}^{-1}\bar{A}Z_{t-1}W_t). \tag{2}$$

Here, $Z_{t-1}$ and $Z_t$ respectively represent the input and output matrices of the $t$-th convolution layer, and $Z_0 = X$. $W_t \in \mathbb{R}^{c_{t-1} \times c_t}$ denotes the model parameters of the $t$-th layer, where $c_{t-1}$ and $c_t$ are the numbers of feature channels of the previous and current layers.
This completes the design of the graph convolution layers. For graph classification tasks, pooling and readout layers are also used. Besides pooling layers for down-sampling, a sorting operation may be used to order the nodes of the message graph according to their structural roles, which facilitates feeding similar graphs into the readout layers in a comparatively consistent node order. As will be discussed below, vehicle states can sometimes cause variations in CAN message graphs, since some ECUs in vehicles are not always activated, in order to prolong battery life. Such a situation can further induce node indexing issues, which are solved by this sorting operation. The readout layers are usually composed of one or more 1-Dimensional (1-D) convolution layers and dense layers.
One-Class Classification
Traditionally, the last layer of a neural network for classification tasks is a sigmoid or softmax layer; the softmax function converts a vector of K real numbers into a probability distribution over K possible outcomes. Intrusions into CAN buses are hard to acquire from real vehicles, so, unless simulated data are used, a training set may be highly imbalanced. To avoid this imbalance, the first-stage classifier is designed as an anomaly detection-based IDS in which the conventional sigmoid or softmax layer is replaced with a one-class classification layer. This enables the model to be trained using only "normal" CAN bus data.
One-class classification is widely used for anomaly detection and can train a classifier only by using normal data. Known approaches include One-Class Support Vector Machine (OC-SVM) and Support Vector Data Description (SVDD). Compared with OC-SVM, SVDD has better computation scalability and better performance in tackling high dimensionality. Either approach may be used with a GNN for intrusion detection.
Similar to OC-SVM, SVDD derives from the traditional Support Vector Machine (SVM). However, instead of using a hyperplane to define the soft classification border, SVDD tries to find a hypersphere, with center $o \in \mathbb{R}^{c}$ and radius $r$, that encloses the normal data:

$$\min_{r,\,\xi}\quad r^2 + \frac{1}{\nu K}\sum_{k=1}^{K} \xi_k \tag{3a}$$

subject to

$$\forall k,\quad \lVert \phi(X_k) - o \rVert^2 \le r^2 + \xi_k, \qquad \xi_k \ge 0, \tag{3b}$$

where $K$ is the number of CAN message graphs in the training set.
In equation (3a), $\xi$ is called a slack vector, in which all slack variables $\xi_k \ge 0$ together construct a soft classification border or hypersphere. The soft classification border allows some of the data samples, or support vectors, originally from one class to cross the borderline and fall in the other class, which helps to avoid model overfitting by reducing the radius of the hypersphere to a certain extent. On the other hand, the hypersphere cannot be shrunk without limit, since this can potentially induce model underfitting. In this case, a weight $\nu \in (0, 1]$ is introduced to balance model overfitting and underfitting. In equation (3b), the function $\phi(X_k)$ is the kernel function usually used in SVM for space mapping.

Here, the whole architecture of the GNN, except the last one-class classification layer, is considered to be the kernel function. $X_k$ refers to the $k$-th CAN message graph in the training set, and $\lVert \cdot \rVert$ is the L2 norm.
The constraint of the above optimization problem can be embedded into its objective function. The $k$-th element of the slack vector, $\xi_k$, is initialized as

$$\xi_k = \max\{0,\ \lVert \phi(X_k) - o \rVert^2 - r^2\}. \tag{4}$$
$\xi_k$ in equation (3a) is then replaced with equation (4). In addition, a model parameter regularization term is introduced to further avoid model overfitting. This provides the following optimization problem without any constraint, which can be seen as the loss function of the whole GNN model of the present disclosure:

$$\min_{r,\,W_{all}}\quad r^2 + \frac{1}{\nu K}\sum_{k=1}^{K} \max\{0,\ \lVert \phi(X_k) - o \rVert^2 - r^2\} + \frac{\lambda}{2}\lVert W_{all} \rVert^2. \tag{5}$$
Here, $\lambda$ is the weight decay that balances the penalty on large model parameters, and $W_{all}$ denotes a vector which includes all the parameters in the GNN model except the last one-class layer. The center $o$ of the hypersphere is considered fixed rather than a coefficient to be tuned: some data samples are first chosen from the training set and passed through the initialized GNN (excluding the last one-class classification layer), the outputs of the last dense layer are recorded, and $o$ is set to the mean of these outputs. Note that the above loss function considers all of the CAN message graphs in the training set at the same time; if gradient descent is used for model parameter updates, the CAN message graphs may instead be processed in batches. To limit computation complexity, the training process may be based on mini-batch Stochastic Gradient Descent (SGD).
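A compact PyTorch sketch of this loss follows, assuming phi_out holds the GNN embeddings of a mini-batch and o is the fixed center computed from the initial forward pass; the names and optimizer wiring are illustrative.

```python
import torch

def svdd_loss(phi_out, o, r, nu, lam, params):
    """Soft-boundary loss of equation (5) on a mini-batch of embeddings.

    phi_out: (batch, c) embeddings phi(X_k); o: fixed center; r: learnable
    radius (e.g., a torch.nn.Parameter updated jointly with the GNN weights).
    """
    dist2 = ((phi_out - o) ** 2).sum(dim=1)      # ||phi(X_k) - o||^2
    slack = torch.clamp(dist2 - r ** 2, min=0)   # xi_k from equation (4)
    reg = sum((w ** 2).sum() for w in params)    # penalty on W_all
    return r ** 2 + slack.mean() / nu + 0.5 * lam * reg
```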
Second-Stage Classifier
After the first-stage classifier filters out all the anomalies, the message graphs with anomalies are sent to the second-stage classifier to detect the specific attack type. The second-stage classifier can be seen as a signature-based IDS. In order to tackle the potential new anomalies from unknown classes, such as zero-day attacks, the second-stage classifier further introduces an openmax layer.
During the training process, the second-stage classifier still utilizes the traditional softmax function in the output layer. The openmax layer is deployed during intrusion detection to replace the softmax layer; it revises the output of the penultimate layer of the original classifier so as to reject data samples that are not close enough to any known class. The output of the penultimate layer can be denoted as an activation vector $A(X) \in \mathbb{R}^{C \times 1}$, where $C$ represents the number of known types of attacks. The design of the openmax layer may be based on meta-recognition algorithms, in which the distribution of the activation vector is analyzed through Extreme Value Theory (EVT) and a Weibull distribution is fitted. In detail, the EVT fitting in the openmax layer design adapts the concepts of Nearest Class Mean and Nearest Non-Outlier, per attack type, to the activation vector. Each attack type $c_i$ is represented by a mean activation vector (MAV) $\mu_i$, derived by averaging all the activation vectors of the training samples belonging to that attack type:

$$\mu_i = \frac{1}{|S_i|}\sum_{X \in S_i} A(X), \tag{6}$$

where $S_i$ denotes the set of training samples of attack type $c_i$. The distances $\lVert A(X) - \mu_i \rVert$ between these activation vectors and the MAV are then computed, and the largest distances (the tail) are fitted to a Weibull distribution with parameters $(\alpha_i, \beta_i, \gamma_i)$, where $\alpha_i$, $\beta_i$ and $\gamma_i$ respectively denote the shape, scale and position parameters of the Weibull distribution. $\gamma_i$ has the same dimensionality as $\mu_i$.
When a new anomaly CAN message graph arrives at the second-stage classifier, the openmax layer follows METHOD 1, listed below, for attack type classification with potential unknown anomaly rejection. From METHOD 1, it can be seen that the openmax layer in general adapts the softmax function to open world recognition, which is realized by introducing an extra class $c_0$ (line 8 of METHOD 1). Such a class is used to collect all the anomaly CAN message graphs that are not sufficiently similar to any existing attack type, and thus infers a potential unknown class. Line 4 of METHOD 1 generates the probabilities that the new anomaly CAN message graph belongs to each of a selected number of top-ranked classes or attack types. Such probabilities are derived from the corresponding fitted Weibull distributions and are used to revise the activation vector (line 5 of METHOD 1).
Method 1: Openmax Attack Type Classification with Potential Unknown Rejection.

Input: A(X) = [a1, a2, . . . , aC]^T: activation vector of the new anomaly CAN message graph X; MAV μi and fitted Weibull model (αi, βi, γi) for each known attack type ci; η: number of top-ranked classes to revise; ε: probability threshold.

1: s ← indices of A(X) sorted in descending order of activation
2: ω ← [1, 1, . . . , 1]^T
3: for r = 1, . . . , η do
4:   ω[s(r)] ← 1 − ((η − r)/η) · F_Weibull(∥A(X) − μ_s(r)∥; α_s(r), β_s(r), γ_s(r))
5:   Â(X) ← A(X) ○ ω // Element-wise product.
6: end for
7: â0 ← Σ_{i=1}^{C} ai (1 − ω[i])
8: introduce the extra class c0 with pseudo-activation â0
9: P(ci | X) ← softmax([â0, Â(X)])
10: ĉ ← argmax_i P(ci | X); report X as a new anomaly of unknown class (c0) if ĉ = c0 or P(ĉ | X) < ε
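A Python sketch of this procedure follows, assuming each known attack type's Weibull model was fitted offline (e.g., with scipy) to the tail distances between training activation vectors and the class MAV. The parameter names, the rank-based scaling in line 4, and the default values of η and ε are illustrative.

```python
import numpy as np
from scipy.stats import weibull_min

def openmax(a, mavs, weibulls, eta=3, eps=0.5):
    """a: (C,) activation vector; mavs: (C, C); weibulls: (shape, loc, scale)."""
    C = len(a)
    omega = np.ones(C)
    ranked = np.argsort(a)[::-1]                     # line 1: rank known classes
    for rank, i in enumerate(ranked[:eta], start=1):
        d = np.linalg.norm(a - mavs[i])              # distance to the class MAV
        cdf = weibull_min.cdf(d, *weibulls[i])       # Weibull tail probability
        omega[i] = 1.0 - ((eta - rank) / eta) * cdf  # line 4
    a_hat = a * omega                                # line 5: revise activations
    a0 = float(np.sum(a * (1.0 - omega)))            # lines 7-8: pseudo-activation
    logits = np.concatenate(([a0], a_hat))
    p = np.exp(logits - logits.max())                # line 9: softmax over c0..cC
    p /= p.sum()
    pred = int(np.argmax(p))                         # line 10: predict or reject
    return "unknown (c0)" if pred == 0 or p[pred] < eps else f"attack type c{pred}"
```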
Federated Graph Neural Network Learning
Vehicle State and CAN Message Relationship
Previous works assume that CAN bus intrusions can change statistical message sequences and message contents. In fact, changes of vehicle states can also cause CAN message variations but in a reasonable way. Intuitively, CAN message contents will change to reflect different vehicle states. On the other hand, vehicle states can also affect CAN message sequences since some ECUs may not get activated all the time to prolong battery life in vehicles. For example, tire pressure sensors will sleep most of the time and wake up only when vehicles start to travel at high speeds (over 40 km/h), or during diagnosis and the initial CAN ID binding phases.
The above claims can be validated using datasets without any attacks.
Based on the above two comparisons, a step change can be seen when two message sequences are 50 intervals apart, which cannot be observed in the first situation. Over a short time interval, i.e., when two consecutive message sequences are compared, the vehicle state does not change; in this case, message sequences vary only within a certain range, as reflected in the accompanying drawings.
Federated Learning
Federated learning is a type of distributed machine learning. For example, a cloud server may collaborate with several local users for model training. The whole training set in federated learning is actually composed of the local datasets owned by each local device, which are non-Independent and Identically Distributed (non-IID). During model training, each local device trains a local model on its own dataset and uploads only the model parameters to the cloud server for aggregation, through which data privacy is protected. In the present application, the non-IID property can be caused by different driving scenarios or vehicle states. In order to improve the intrusion detection performance of the disclosed IDS, a federated learning environment may be used: vehicles in different states can train the disclosed GNN model on their own CAN bus data and then upload model parameters to the cloud server for model aggregation.
By way of example, two federated learning schemes, referred to as "FedAvg" and "FedProx," are evaluated below. Both schemes are based on mini-batch SGD to update local parameters. The whole training process can be divided into a certain number of communication rounds between a cloud server and vehicles. In each communication round, some of the vehicles are selected to upload their model parameters to the cloud server for aggregation after local iterations. In the $l$-th selected vehicle, some generated CAN message graphs are selected at each local iteration. During the training of the first-stage classifier, for each selected message graph $X_k$, the corresponding loss is derived from the loss function defined in equation (5) and the current local model parameters of the $l$-th vehicle, denoted as $f(r_l, W_{all,l}, X_k)$. Under mini-batch SGD, the mini-batch loss of the $l$-th selected vehicle is calculated by averaging the losses across all selected message graphs:

$$F(r_l, W_{all,l}) = \frac{1}{m_l}\sum_{X_k \in P_l} f(r_l, W_{all,l}, X_k), \tag{7}$$

where $P_l$ is the set of selected message graphs in the $l$-th vehicle at each local iteration, and $m_l$ denotes the total number of elements in $P_l$.
For the local model parameter update in the $l$-th vehicle, the mini-batch gradient $G_\tau(r_l, W_{all,l})$ at each local iteration $\tau$ is derived by computing the partial derivatives of $F(r_l, W_{all,l})$ with respect to each model parameter. The model parameters in the $l$-th vehicle are then updated as follows:

$$[r_l, W_{all,l}]_{\tau+1} \leftarrow [r_l, W_{all,l}]_\tau - \eta\, G_\tau(r_l, W_{all,l}). \tag{8}$$
Here, $[r_l, W_{all,l}]_\tau$ denotes a vector which includes all parameters in the disclosed one-class GNN model, and $\eta$ represents the learning rate. After a given number of local iterations, all $L$ selected vehicles upload their learned model parameters to the cloud server for aggregation:

$$[r, W_{all}] = \frac{1}{L}\sum_{l=1}^{L} [r_l, W_{all,l}]. \tag{9}$$

The result is considered as the globally learned model parameters after each communication round. Finally, $[r, W_{all}]$ is broadcast to all vehicles for the following local iterations.
The above model parameter update generally follows "FedAvg," which does not necessarily provide a convergence guarantee. "FedProx" improves convergence performance by further introducing a proximal term, which ensures that updated local model parameters do not drift away from the global model parameters derived in the last communication round. Mathematically, the mini-batch loss function of the $l$-th vehicle is modified to

$$F(r_l, W_{all,l}) + \frac{\mu}{2}\left\lVert [r_l, W_{all,l}] - [r, W_{all}]_{pre} \right\rVert^2. \tag{10}$$

Here, $[r, W_{all}]_{pre}$ represents the global model parameters derived in the previous communication round, and $\mu$ is a hyperparameter balancing the effect of the proximal term. The training of the second-stage classifier follows a similar process, except that the traditional softmax function and cross-entropy loss are respectively used as the output layer and loss function. In addition, deployment of the openmax layer requires the Weibull fitting, which is conducted at the end of model training. In the federated learning scenario, the cloud server first requests vehicles to upload the activation vectors of the anomaly CAN message graphs in their training sets, together with the corresponding labels. Afterwards, the Weibull fitting is conducted in the cloud server. Since only activation vectors, rather than raw data, need to be uploaded, the privacy guarantee of federated learning is not violated. Moreover, the Weibull fitting requires only a single communication round between the cloud server and the vehicles, so it does not induce much extra communication overhead.
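The following sketch outlines one communication round under this scheme, with the proximal term of equation (10) switched on by μ > 0. The function names, the plain parameter average in the aggregation step, and the training-loop wiring are illustrative assumptions.

```python
import copy
import torch

def local_step(model, loss_fn, batch, global_params, lr=0.01, mu=0.0):
    """One mini-batch SGD step in a vehicle; mu > 0 adds the FedProx term."""
    loss = loss_fn(model, batch)
    if mu > 0:  # (mu/2) * ||[r_l, W_all,l] - [r, W_all]_pre||^2
        loss = loss + 0.5 * mu * sum(
            ((w - g) ** 2).sum() for w, g in zip(model.parameters(), global_params))
    loss.backward()
    with torch.no_grad():
        for w in model.parameters():
            if w.grad is not None:
                w -= lr * w.grad       # update of equation (8)
                w.grad = None

def aggregate(local_models):
    """Cloud server: average uploaded parameters across L selected vehicles."""
    global_model = copy.deepcopy(local_models[0])
    with torch.no_grad():
        for group in zip(global_model.parameters(),
                         *(m.parameters() for m in local_models)):
            group[0].copy_(torch.stack([p.data for p in group[1:]]).mean(dim=0))
    return global_model  # broadcast back to all vehicles, per equation (9)
```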
Experimental Results
Datasets and Experiment Setup
In this section, experiments conducted to evaluate our GNN-based IDS under CAN message injection attacks, message suspension attacks and message falsification attacks are described. In particular, for the first-stage classifier, the disclosed IDS is compared with three baselines on anomaly detection, which respectively represent IDSs considering statistical CAN message sequences and message contents. For the second-stage classifier, performance is evaluated for classifying specific attack types and identifying new anomalies from unknown classes.
The first baseline, described by M. Jedh, L. ben Othmane, N. Ahmed, and B. Bhargava in "Detection of message injection attacks onto the CAN bus using similarity of successive messages-sequence graphs," arXiv:2104.03763, 2021, develops three kinds of strategies to detect CAN bus intrusions via message sequences, i.e., thresholds for cosine similarity, thresholds for Pearson correlation, and LSTM. Data for evaluation were collected on a Ford Transit 500 and separated into three sets. The first dataset is composed of 23,963 normal CAN messages, while message injection attacks are considered in the remaining two datasets. In the second and third datasets, the CAN IDs related to vehicle speed and rpm (254 and 115) are respectively targeted, and compromised CAN messages are randomly injected into these two datasets after given time points. In total, the second and third datasets contain 88,492 and an unreported number of CAN messages, respectively. The second baseline is described by A. Taylor, S. Leblanc, and N. Japkowicz, "Anomaly detection in automobile control network data with long short-term memory networks," in Proc. IEEE Int. Conf. Data Science and Advanced Analytics, 2016, pp. 130-139. The third baseline is described by H. Sun et al. in "Anomaly detection for in-vehicle network using CNN-LSTM with attention mechanism," IEEE Trans. Veh. Technol., vol. 70, no. 10, pp. 10880-10893, 2021. Both the second and third baselines use IDSs based on message contents and LSTM. The evaluation dataset is the CAN Signal Extraction and Translation Dataset provided by the Hacking and Countermeasure Research Lab. To facilitate comparisons, anomaly data with the five attack types (DoS, fuzzy, suspension, replay, and spoofing) are generated.
The graph neural network (GNN) model used in the examples below first has four graph convolution layers, with the last layer having only one feature channel for the convenience of node sorting, which is realized in a SortPooling layer. The SortPooling layer is followed by two 1-D convolution layers, one MaxPooling layer, one dense layer and one dropout layer. A Support Vector Data Description (SVDD) classifier is used in the first-stage classifier, while an openmax layer is used in the second-stage classifier. During GNN model training, a large number of hyperparameters can be tuned. The results below focus on the number of feature channels in the first three graph convolution layers; different settings are also evaluated for the two hyperparameters of the SVDD classifier, i.e., $\nu$ and $\lambda$, and the hyperparameter $\mu$ in FedProx is explored as well. In detail, the output of the SortPooling layer is a $k_s \times \sum_{t=1}^{4} c_t$ tensor, where $k_s$ is set to a value that is not larger than the number of nodes in 60% of the CAN message graphs in the training set. The first 1-D convolution layer has 16 feature channels and is followed by a MaxPooling layer with filter size 2 and step size 2. The second 1-D convolution layer has 32 feature channels, a filter size 5 and step size 1. The dense layer has 128 hidden neurons and is followed by a dropout layer with a 0.5 drop rate. For activation functions, the hyperbolic tangent (tanh) is selected in the graph convolution layers, and ReLU in all other layers that require one. Mini-batch SGD is performed with the Adam optimizer described by D. P. Kingma and J. L. Ba in "Adam: A method for stochastic optimization," arXiv:1412.6980, 2014.
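The readout stack described above can be sketched in PyTorch as follows, using the DGCNN convention of flattening the SortPooling output to a single channel before the first 1-D convolution. The channel counts c1 to c3 and the value of ks are placeholders for the tuned hyperparameters.

```python
import torch
import torch.nn as nn

c1 = c2 = c3 = 32            # feature channels of the first three graph convs
c_total = c1 + c2 + c3 + 1   # the fourth graph conv layer has one channel
ks = 30                      # SortPooling keeps the top-ks sorted nodes

readout = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=c_total, stride=c_total),  # first 1-D conv
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),                  # filter 2, step 2
    nn.Conv1d(16, 32, kernel_size=5, stride=1),             # second 1-D conv
    nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(128),                                     # dense layer
    nn.ReLU(),
    nn.Dropout(0.5),
)

# The (ks x sum_t c_t) SortPooling output is flattened to one channel:
x = torch.randn(8, 1, ks * c_total)   # batch of 8 graphs
print(readout(x).shape)               # torch.Size([8, 128])
```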
Statistical CAN Message Sequences
In this subsection, training performance is evaluated under different GNN model settings to select the best setting based on experimental results. In detail, the number of feature channels in the graph convolution layers is selected from {32, 64, 128}, $\nu$ from {0.001, 0.01, 0.1}, and $\lambda$ from {10^-4, 10^-5, 10^-6}. The first dataset of the first baseline, which contains only normal CAN messages, is used as the training set. The message interval is chosen to be 100 CAN messages, and the mini-batch size is 10. Each setting is evaluated with 10-fold cross validation, repeated 10 times.
Comparisons are then made with the first baseline to show the effectiveness of the disclosed IDS in detecting CAN message injection attacks. Note that in the original second and third dataset of the first baseline, data contents of injected CAN messages also get changed. In detail, all injected rpm and speed messages respectively end with “FFF” and “FFFF.” In order to reflect the ability of the disclosed IDS to detect intrusions only changing statistical message sequences, one of normal messages with the same CAN ID is randomly selected in the same message interval, and its data content is used in the relevant injected CAN messages. These modifications simulate DoS attacks or bus-off attacks, which cannot be identified by IDSs considering CAN message contents. The comparison with the first baseline based on a 100 CAN message long interval is shown in Table II, in which the considered metrics include accuracy, precision, recall and F1-score. In this table, rpm and speed respectively represent those two corresponding attacks. CS, PC, LSTM-CS and LSTM-PC respectively represent the three kinds of intrusion detection strategies in the first baseline, i.e., thresholds for Cosine Similarity, thresholds for Pearson Correlation and statistical CAN message sequence reconstruction based on LSTM for intrusion detection. GNN corresponds to the disclosed IDS. Based on the comparison, it can be seen that the disclosed IDS can achieve even better performance than the first baseline in detecting CAN message injection attacks.
In order to evaluate the scalability of the disclosed IDS, the situation where message intervals have different lengths during intrusion detection, i.e., 150 and 200, is considered. The results are shown in TABLE III. From this table, it can be seen that the disclosed IDS can still achieve fairly high performance even when the length of message intervals during intrusion detection is different from the length during model training.
Finally, it is shown that message intervals longer than 200 can impact real-time performance of IDSs, since collecting 200 CAN messages will cost more than 100 milliseconds. The average prediction time of the disclosed IDS for each CAN message graph is 3 milliseconds.
TABLE II shows comparison results with the first baseline (in %). The anomaly threshold in the first baseline is set at 0.87.
TABLE III shows a scalability evaluation (in %)
CAN Message Contents
The related dataset used for the following results is first separated into a training set (the first 80%) and a test set (the last 20%). It is assumed here that the first 80% of the dataset covers most of the vehicle states appearing in the last 20%, which is validated by the following experimental results. As described above, the training set here also contains only normal CAN messages. The message interval length is set to 100 for both the training and test sets. Although the second and third baselines consider CAN message contents for intrusion detection, they can both detect message injection (DoS and fuzzy), message suspension, and message falsification (replay and spoofing) attacks at the same time. Therefore, all five attack types are applied in the test set for comprehensive comparisons; the attacks are randomly deployed in different message intervals of the test set. The GNN model hyperparameters and the mini-batch size are the same as those selected in the last subsection. TABLE IV shows the related comparison results, where LSTM-P and CLAM respectively denote the second and third baselines.
TABLE IV shows comparison results with the second and third baseline (in %). The anomaly threshold in the second and third baselines is set at 0.83.
From the results, it can be seen that in general, the disclosed IDS can achieve comparable performance to the second and third baselines. Message falsification attacks, i.e., replay and spoofing attacks, are harder to detect if the related data contents have only slight changes compared with normal CAN message contents, which explains why the disclosed IDS performs slightly less effectively against message falsification attacks. Owing to the robustness of the CAN bus, IDSs targeting statistical message sequences can usually achieve high performance, since injected CAN messages must overwhelm normal messages to take control of CAN buses; this is especially true for DoS attacks, which usually make significant changes to statistical message sequences. However, message falsification attacks can easily control CAN buses by only slightly modifying normal CAN messages. For example, a speed change from 35 km/h to 40 km/h can trigger the activity of some ECUs, such as tire pressure sensors. It may therefore be beneficial to balance the sensitivity of related IDSs in practical applications.
TABLE V shows comparison results for real-time performance.
Although LSTM-P and CLAM can detect all three kinds of attacks at the same time, they both need a certain number of observations for each CAN ID to detect intrusions based on LSTM, which can induce high data collection delay and impact real-time performance. TABLE V shows a real-time performance comparison. In TABLE V, NoO, DCT, and DT respectively represent the Number of Observations, Data Collection Time, and Detection Time for each intrusion detection. The NoO values of LSTM-P and CLAM count messages per CAN ID, while the NoO of the GNN is the total number of CAN messages. Even for the CAN IDs with the highest message frequency, only 4 to 6 messages appear in every 100 CAN messages. In this case, the total number of CAN messages required before each intrusion detection must be larger than 200 for LSTM-P and CLAM, i.e., over 2 times larger than the number needed by the disclosed IDS. Such a difference further causes the DCT difference and seriously impacts real-time performance, which is also reflected in TABLE V. The DT of the disclosed IDS is 3.20 ms on average; although this is larger than the DT of CLAM, the gap is almost negligible compared with the DCT difference. The memory consumption of the first-stage classifier for intrusion detection is 354.6 KB, which is also less than that of LSTM-P (13,417 KB) and CLAM (682 KB).
Attack Type Classification
In this subsection, the performance of our second-stage classifier is evaluated using the same dataset as the previous subsection. In contrast to the previous subsection, the five attack types are applied uniformly to the whole dataset before training (80%) and test (20%) separation. The model training of the second-stage classifier is quite similar to that of the first-stage classifier except that the traditional softmax and cross-entropy function are respectively used as the output layer and loss function. The GNN has the same structure and hyperparameters as that in the first-stage classifier.
For specific attack type classification, the openmax layer is used. Initially, open world recognition is not considered; that is, $c_0$ is not introduced and the probability threshold $\varepsilon$ is set to 0.
Next, the ability of the second-stage classifier to identify new anomalies from unknown classes is evaluated. Here, one attack type is left out as the "unknown" class and the other four attack types are considered during model training. For each targeted attack type, the most appropriate probability threshold $\varepsilon$ in METHOD 1 is determined based on the optimal cut-point of the corresponding receiver operating characteristic (ROC) curve.
TABLE VI shows the evaluation results. Note that here, the four attack types used in model training are combined into a "general" class, through which the open world recognition problem can be seen as a two-class classification problem. In general, the second-stage classifier can effectively identify new anomalies that differ from any existing attack type, which usually indicates an unknown class. However, the fuzzy, replay and spoofing attacks are sometimes difficult to identify when they are treated as the "unknown" class.
Effect of Federated Learning
As has been discussed previously, vehicle states can affect statistical CAN message sequences and message contents. In this subsection, the impact of federated learning on intrusion detection performance is evaluated based on the first-stage classifier. The related experiments are conducted on the datasets of the first baseline. The first dataset is split into 10 portions for local training to simulate an environment with 10 vehicles, while the second and third datasets are still used as test sets. The GNN model hyperparameters and the mini-batch size are the same as before in each simulated vehicle. The center $o$ of the hypersphere in the one-class classification layer is set to the average of the local centers derived from all simulated vehicles. Experiments are first conducted to explore the hyperparameter settings in FedProx, which are illustrated in the accompanying drawings.
TABLE VII shows the accuracy comparisons based on test sets and GNN model parameters after 120 communication rounds.
For the situation without federated learning (without FL), only data in the first simulated vehicle is used for training. Based on the comparisons, it can be seen that intrusion detection performance degrades if training sets cover only limited vehicle states. In a practical application, such a situation can happen if only one vehicle with limited driving scenarios is tracked. However, this issue can be solved by collecting and integrating multiple local models derived from different vehicles. For a more comprehensive comparison, a traditional centralized learning scenario is also included. It can be seen that the convergence performance of FedProx is comparable to that of centralized learning. Similar trends can be identified for the second-stage classifier.
The disclosed CAN bus IDS is based on a graph neural network (GNN), which can efficiently detect CAN message injection, suspension, and falsification attacks at the same time. A CAN message graph is developed to integrate statistical message sequences with message contents. A GNN fit for directed attributed graphs is constructed and trained to predict intrusions. A two-stage classifier cascade is used to tackle normal and attacked CAN data respectively. This is useful since attack data are hard to acquire in the real world and may cause highly imbalanced training sets. In the first-stage classifier, a one-class classification layer is used for anomaly detection in the GNN rather than a conventional softmax layer, so the GNN may be trained with only normal CAN messages. Once anomalous CAN data are spotted, they are further passed to the second-stage classifier to determine the specific attack type; there, an openmax layer is introduced to tackle anomalies from potentially unknown classes. Changes of vehicle states can affect CAN message graph patterns in a reasonable way; accordingly, federated learning is considered to cover a wide range of driving scenarios and vehicle states while protecting user data privacy. The disclosed IDS has been validated on several real-world datasets and compared with three baselines. Experimental results show that the disclosed IDS can achieve performance similar to IDSs based only on statistical CAN message sequences or message contents.
In this document, relational terms such as first and second, top and bottom, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” “includes,” “including,” “has,” “having,” or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “comprises . . . a” does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Reference throughout this document to "one embodiment," "certain embodiments," "an embodiment," "implementation(s)," "aspect(s)," or similar terms means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of such phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments without limitation.
The term “or,” as used herein, is to be interpreted as an inclusive or meaning any one or any combination. Therefore, “A, B or C” means “any of the following: A; B; C; A and B; A and C; B and C; A, B and C.” An exception to this definition will occur only when a combination of elements, functions, steps or acts are in some way inherently mutually exclusive.
As used herein, the term "configured to," when applied to an element, means that the element may be designed or constructed to perform a designated function, or that it has the required structure to enable it to be reconfigured or adapted to perform that function.
Numerous details have been set forth to provide an understanding of the embodiments described herein. The embodiments may be practiced without these details. In other instances, well-known methods, procedures, and components have not been described in detail to avoid obscuring the embodiments described. The disclosure is not to be considered as limited to the scope of the embodiments described herein.
Those skilled in the art will recognize that the present disclosure has been described by means of examples. The present disclosure could be implemented using hardware component equivalents such as special purpose hardware and/or dedicated processors which are equivalents to the present disclosure as described and claimed. Similarly, dedicated processors and/or dedicated hard wired logic may be used to construct alternative equivalent embodiments of the present disclosure.
Dedicated or reconfigurable hardware components used to implement the disclosed mechanisms may be described, for example, by instructions of a hardware description language (HDL), such as VHDL, Verilog or RTL (Register Transfer Language), or by a netlist of components and connectivity. The instructions may be at a functional level or a logical level or a combination thereof. The instructions or netlist may be input to an automated design or fabrication process (sometimes referred to as high-level synthesis) that interprets the instructions and creates digital hardware that implements the described functionality or logic.
The HDL instructions or the netlist may be stored on non-transitory computer readable medium such as Electrically Erasable Programmable Read Only Memory (EEPROM); non-volatile memory (NVM); mass storage such as a hard disc drive, floppy disc drive, optical disc drive; optical storage elements, magnetic storage elements, magneto-optical storage elements, flash memory, core memory and/or other equivalent storage technologies without departing from the present disclosure. Such alternative storage devices should be considered equivalents.
Various embodiments described herein are implemented using dedicated hardware, configurable hardware or programmed processors executing programming instructions that are broadly described in flow chart form that can be stored on any suitable electronic storage medium or transmitted over any suitable electronic communication medium. A combination of these elements may be used. Those skilled in the art will appreciate that the processes and mechanisms described above can be implemented in any number of variations without departing from the present disclosure. For example, the order of certain operations carried out can often be varied, additional operations can be added, or operations can be deleted without departing from the present disclosure. Such variations are contemplated and considered equivalent.
The various representative embodiments, which have been described in detail herein, have been presented by way of example and not by way of limitation. It will be understood by those skilled in the art that various changes may be made in the form and details of the described embodiments resulting in equivalent embodiments that remain within the scope of the appended claims.
This application claims the benefit of provisional application Ser. No. 63/393,336 filed Jul. 29, 2022 and titled “Graph-Based Can Bus Anomaly Detection Method,” the entire content of which is hereby incorporated by reference.
Number | Date | Country
---|---|---
63/393,336 | Jul. 29, 2022 | US