This disclosure relates generally to anomaly detection, and, more particularly, to methods and apparatus for detecting a side channel attack using a cache state.
Over the past few years, micro-architectural side channel attacks have evolved from theoretical attacks on cryptographic algorithm implementations to highly practical generic attack primitives. For example, attacks such as Meltdown and Spectre exploit vulnerabilities in modern processors and break memory isolation among processes and/or privilege layers to gain access to data from other applications and/or the operating system (OS). Such data may include passwords, personal photos, emails, instant messages, and even business-critical documents. Side channel attacks exploit the fact that hardware resources are physically shared among processes running in different isolation domains.
The figures are not to scale. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts.
Side channel attacks exploit the fact that hardware resources of a computing system, such as the cache, branch predictor, branch target buffer, execution units, etc., are physically shared among processes running on the computing system. Mitigations against side channel attacks have mainly focused on patching and on proposing new architecture designs. However, not all systems can be patched. Even where possible, patching can be difficult. Moreover, patching sometimes introduces a large amount of operational overhead including, for example, physically replacing hardware components. Example approaches disclosed herein seek to mitigate side channel attacks by early detection of such attacks, enabling responsive actions to be taken to avoid the impact(s) of a side channel attack.
Cache Side Channel Attacks (SCAs) are serious threats to information security where multiple processes/virtual machines (VMs) execute on the same physical machine (e.g., share hardware resources of the physical machine). The cache of the central processing unit (CPU) is one of the most dangerous shared resources since the CPU cache is shared by all of the cores in a CPU package. As a result, the CPU cache represents a possible attack vector to perform fine-grained, high-bandwidth, low-noise cross-core attacks.
A cache SCA typically includes three phases: a priming phase, a triggering phase, and an observing phase. In the priming phase, an attacker places the system into a desired initial state (e.g., flushes cache lines). In the triggering phase, a victim performs some action that conveys information through a side channel. In the observing phase, the attacker detects the presence of the information conveyed through the side channel. Such information may include sensitive information such as, for example, passwords, personal photos, emails, instant messages, business-critical information, social security numbers, etc.
To leak sensitive information, the cache SCAs utilize one or more techniques such as Flush+Reload, Evict+Reload, Prime+Probe, Flush+Flush, etc. In Flush+Reload and Evict+Reload techniques, the attacker begins by evicting a cache line shared with the victim from the cache. After the victim (e.g., a personal computer, a phone, a processor platform, an on-board vehicle processor, etc.) executes for a while, the attacker measures the time it takes to perform a memory read at the address corresponding to the evicted cache line. If the victim accessed the monitored cache line, the data will be in the cache and the access will be fast. By measuring the access time, the attacker learns whether the victim accessed the monitored cache line between the eviction and probing operations.
In Prime+Probe attacks, the attacker fills the targeted cache set(s) by accessing an eviction set (a sequence of memory addresses mapped into the same cache set) and then waits for a time interval. As the victim process operates, the victim process may evict cache lines. In the observing phase, the attacker measures the cache access time to prime the targeted cache set(s) and identifies the evicted cache lines to extract the data access pattern of the victim application.
In Flush+Flush attacks, the attacker measures differences in the duration(s) of flushing a cache line. In this attack, the attacker flushes all of the cache lines and lets the victim process run normally. The attacker then again flushes all of the cache lines and measures the execution time of the flushing instruction. If the victim process has accessed a specific memory location, the data will be cached and the flushing instruction will take a longer time.
In the priming and observing phases of cache SCAs, the attacker repeatedly accesses the targeted cache set(s) or cache set(s) containing the targeted cache lines at a high frequency. Note that the anomalous cache behavior only occurs in the priming and observing phases of the attack, while the triggering phase resembles normal program behavior. Example approaches disclosed herein can be used to detect such access patterns and, in response to such detection, perform a responsive action to mitigate the effects of the SCA.
In example approaches disclosed herein, a machine learning (ML) analysis of cache access patterns in a system is performed to detect ongoing cache SCAs (speculative or traditional) in an early phase (e.g., during the priming phase, and/or during the triggering phase). In example approaches disclosed herein, a machine learning model is trained using a histogram of cache set states to characterize cache access behaviors corresponding to a priming phase, a triggering phase, an observing phase, or as a non-attack. During operation, cache set states are sampled, and a histogram is created. The histogram and/or values derived from the histogram are used as an input to the machine learning model to classify the cache state and detect an ongoing attack (e.g., determine if the cache state samples belong to any phase of a cache SCA).
The example processor 105 of the illustrated example of
The example OSS/VMM 110 of the illustrated example of
The example benign process 112 of the illustrated example of
The example unknown process 116 of the illustrated example of
The example side channel anomaly detector 102 of the illustrated example of
The example anomaly detection orchestrator 120 of the illustrated example of
The example cache state interface 125 of the illustrated example of
The example cache state memory 127 of the illustrated example of
The example histogram generator 130 of the illustrated example of
The example histogram repository 135 of the illustrated example of
The example histogram analyzer 140 of the illustrated example of
The example machine learning model trainer 155 of the illustrated example of
The example model data store 150 of the illustrated example of
The example machine learning model processor 145 of the illustrated example of
The example multiple hypothesis tester 160 of the illustrated example of
The example p-value produced by the multiple hypothesis testing represents a similarity of the generated histogram(s) to the one or more characteristic benign histograms. In examples disclosed herein, p-values are created on a scale of zero to one. However, any other scale or nomenclature for representing a similarity may additionally or alternatively be used. A high p-value (e.g., a p-value near or approaching one, a p-value greater than or equal to 0.8) represents a high similarity to a benign histogram (i.e., that the generated histograms represent benign activity), whereas a low p-value (e.g., a p-value near or approaching zero, a p-value less than or equal to 0.2) represents a low similarity to the benign histogram (i.e., that the generated histograms do not represent benign activity).
While an example manner of implementing the side channel anomaly detector 102 is illustrated in
Flowcharts representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the side channel anomaly detector 102 of
As mentioned above, the example processes of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
Once training is complete, the example side channel anomaly detector 102 enters the operational phase 302. The example cache state interface 125 samples the cache state from the cache 108. (Block 320). An example approach for sampling the cache state is described in further detail below in connection with
The example histogram generator 130 generates a histogram of the sampled cache state(s). (Block 330). In examples disclosed herein, a separate histogram is created for each sampled cache state. That is, a first histogram is created corresponding to the L1-D cache (e.g., an L1-D histogram), and a second histogram is created corresponding to the L1-I cache (e.g., an L1-I histogram). However, any number of histograms corresponding to any number of caches may additionally or alternatively be created.
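By way of illustration, one possible realization of the histogram generation of block 330 is sketched below in Python. The per-set sample values, the number of bins, and the normalization are assumptions made only for purposes of the example and are not limitations of the disclosed approach.

import numpy as np

# Hypothetical sampled cache state: one access/occupancy count per cache set
# (e.g., 64 sets for an L1 cache). A real implementation would obtain these
# values from the cache state interface 125 rather than from a random generator.
rng = np.random.default_rng(0)
sampled_l1d_set_states = rng.poisson(lam=4, size=64)
sampled_l1i_set_states = rng.poisson(lam=4, size=64)

NUM_BINS = 16  # assumed histogram resolution

def build_histogram(set_state_counts, num_bins=NUM_BINS):
    # Bin the per-set counts and normalize so that histograms taken over
    # different sampling windows are comparable.
    counts = np.asarray(set_state_counts, dtype=float)
    hist, _ = np.histogram(counts, bins=num_bins, range=(0.0, counts.max() + 1.0))
    return hist / max(hist.sum(), 1.0)

l1d_hist = build_histogram(sampled_l1d_set_states)  # L1-D histogram
l1i_hist = build_histogram(sampled_l1i_set_states)  # L1-I histogram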
The example histogram analyzer 140 extracts a histogram statistic(s) from the histogram(s). (Block 340). In examples disclosed herein, the statistic includes at least one of a minimum value, a maximum value, a percentile value, a standard deviation value, a skewness value, one or more values representing a range in the histogram, an upper-hinge value of the histogram, etc. In examples disclosed herein, separate statistics are created for each of the histogram(s). That is, a first statistic (and/or first set of statistics) is created for the L1-D histogram, and a second statistic (and/or second set of statistics) is created for the L1-I histogram.
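Continuing the sketch above, the statistic extraction of block 340 could, for example, be implemented as follows; the particular feature set and its ordering are illustrative assumptions rather than requirements.

import numpy as np
from scipy.stats import skew

def histogram_statistics(hist):
    # Summarize a single histogram into the kinds of statistics listed above.
    hist = np.asarray(hist, dtype=float)
    return {
        "min": hist.min(),
        "max": hist.max(),
        "range": hist.max() - hist.min(),
        "std": hist.std(),
        "skewness": skew(hist),
        "percentile_90": np.percentile(hist, 90),
        "upper_hinge": np.percentile(hist, 75),  # upper hinge approximated by the 75th percentile
    }

# Separate statistics are produced for each histogram, e.g.:
example_hist = np.array([0.30, 0.25, 0.15, 0.10, 0.08, 0.05, 0.04, 0.03])
example_stats = histogram_statistics(example_hist)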
The example machine learning model processor 145 operates on the statistic(s) with the machine learning model(s) stored in the model data store 150 to determine one or more classifications. (Block 350). In examples disclosed herein, each machine learning model includes one or more one-class support vector machine(s) (SVMs). Each SVM is capable of producing a binary indication of whether the statistic(s) indicate a particular type of activity (e.g., no attack, a priming phase of an attack, a triggering phase of an attack, an observing phase of an attack, etc.). In examples disclosed herein, separate machine learning models are used for each of the statistic(s) and/or set of statistics and, accordingly, are used in connection with each of the corresponding statistic(s). For example, a first machine learning model is used to classify the L1-D statistics and a second machine learning model is used to classify the L1-I statistics.
In connection with each machine learning model, binary values are returned for each respective particular type of activity (e.g., a classification). For example, a machine learning model may return a binary value (e.g., true or false) indicating whether a priming phase is identified, a binary value indicating whether a triggering phase is identified, a binary value indicating whether an observing phase is identified, and/or a binary value indicating whether benign activity is identified. However, any other type(s) of value(s) may additionally or alternatively be returned. For example, a numeric value representing a similarity score to a particular type of phase/activity may be returned. Moreover, in some examples, certain types of activities may be omitted from the returned data.
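One possible realization of the classification of block 350 and the per-activity outputs described above is sketched below, assuming one-class SVMs previously fitted during a training phase and the feature ordering of the earlier statistics sketch; the model names and the mapping of outputs to activity types are hypothetical.

import numpy as np

FEATURE_ORDER = ("min", "max", "range", "std", "skewness", "percentile_90", "upper_hinge")

def to_vector(stats):
    # Arrange the statistics dictionary into a fixed-order feature vector.
    return np.array([[stats[name] for name in FEATURE_ORDER]])

def classify(stats, models):
    # `models` maps an activity label (e.g., "benign", "priming", "triggering",
    # "observing") to a fitted one-class SVM. OneClassSVM.predict returns +1 for
    # inliers (pattern matched) and -1 for outliers.
    x = to_vector(stats)
    return {label: bool(model.predict(x)[0] == 1) for label, model in models.items()}

# Example usage (hypothetical, once per monitored cache):
# l1d_result = classify(l1d_stats, l1d_models)
# l1i_result = classify(l1i_stats, l1i_models)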
The example anomaly detection orchestrator 120 then determines whether the output classification(s) identify an attack. (Block 360). For example, an attack may be identified if at least one of the classifications indicates that the priming phase returns a result of true or that the observing phase returns a result of true. In some examples, an attack may be identified if the triggering phase returns a result of true, and/or if the benign activity classification returns a result of false. If no attack is identified (e.g., block 360 returns a result of NO), the example anomaly detection orchestrator 120 determines whether any further re-training is to occur. (Block 395). If training is not to occur (e.g., block 395 returns a result of NO), control returns to block 320, where regular monitoring continues. In some examples, additional checks to determine whether to terminate the process 300 of
Returning to block 360, if the example anomaly detection orchestrator 120 determines that an attack has been identified (block 360 returns a result of YES), the example multiple hypothesis tester 160 conducts multiple hypothesis testing comparing the histogram(s) to one or more characteristic benign histograms stored in the histogram repository 135 to generate a p-value. (Block 370). In examples disclosed herein, the multiple hypothesis testing performed by the multiple hypothesis tester 160 is implemented using a Kolmogorov-Smirnov test. However, other types of multiple hypothesis testing algorithms may additionally or alternatively be used. The example p-value produced by the multiple hypothesis testing represents a similarity of the generated histogram(s) to the one or more characteristic benign histograms. In examples disclosed herein, p-values are created on a scale of zero to one. However, any other scale or nomenclature for representing a similarity may additionally or alternatively be used. A high p-value (e.g., a p-value near or approaching one, a p-value greater than or equal to 0.8) represents a high similarity to a benign histogram (i.e., that the generated histograms represent benign activity), whereas a low p-value (e.g., a p-value near or approaching zero, a p-value less than or equal to 0.2) represents a low similarity to the benign histogram (i.e., that the generated histograms do not represent benign activity).
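For block 370, the Kolmogorov-Smirnov comparison against the stored benign histograms could, for example, be carried out as sketched below. Whether the test operates on the raw sampled cache state values or on the binned histogram values, and how the p-values from the multiple comparisons are aggregated (here, the maximum), are implementation choices assumed only for the example.

from scipy.stats import ks_2samp

def benign_similarity(observed_samples, benign_reference_sample_sets):
    # Compare the current observations against each stored benign reference;
    # a high p-value means the observations are similar to at least one
    # characteristic benign profile.
    p_values = [ks_2samp(observed_samples, reference).pvalue
                for reference in benign_reference_sample_sets]
    return max(p_values)

# Example usage (hypothetical):
# p_value = benign_similarity(current_l1d_samples, stored_benign_l1d_samples)
# anomaly = p_value < p_value_threshold  # compared at block 380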
The example anomaly detection orchestrator 120 compares the p-value against the threshold (e.g., the threshold trained in connection with block 310, and described in further detail below in connection with
In response to the detection of the anomaly signifying potential onset or incidence of cache side channel attacks, (block 380 returning a result of YES), the example anomaly detection orchestrator 120 implements one or more responsive actions (e.g., error handling techniques) to mitigate such side channel attacks. (Block 390).
For example, the anomaly detection orchestrator 120 may inform the corresponding system software (OS/VMM) 110 of the detected anomaly through available inter-process communication and/or other communication approaches (e.g., flags, interrupts, etc.). In some examples, additional information such as, for example, attacker and/or victim domain identifiers (e.g., process identifiers and/or virtual machine identifiers of the process suspected to be under attack, process identifiers and/or virtual machine identifiers of the process suspected to be performing an attack) are utilized in classification and, as such, the OS/VMM 110 is notified of that information as well. In some examples, such information is obtained by a runtime and/or scheduler of the OS/VMM 110. Such information enables the two domains (e.g., the attack domain and the victim domain) to be physically separated (e.g., on two separate cores, on two separate CPUs) by the scheduler of the OS/VMM 110. Such separation minimizes the shared hardware resources between the two domains (process, VM, etc.) and thereby minimizes a risk that sensitive data may be exposed.
In some examples, the anomaly detection orchestrator 120 informs the OS/VMM 110 about the potential onset of the side channel attack. The OS/VMM 110 can enable one or more architectural feature(s) that defend against cache side channels. Such architectural features may be disabled by default to avoid performance costs, but may be enabled in situations where the potential onset of such an attack is detected. Such architectural features may include, for example, cache partitioning through cache allocation technology in the LLC (of that CPU), activating memory tagging based capabilities for the L1-I/D caches, limiting speculation of memory accesses across domains, activating flushing of at least the L1-I/D caches across context switches, etc.
In some examples, the performance of the responsive action involves further analysis to determine whether a side channel attack (or a particular phase thereof) is being performed. That is, the detection/identification disclosed above in connection with
In some examples, other error handling methods are used by the OS/VMM 110 such as, for example, dynamically enabling settings and/or thresholds in a memory manager to prevent the use of shared memory by the attacker process and the victim process (which is critical in Flush+Reload and Flush+Flush attacks) and flushing all TLB states, and/or disabling or limiting the privileges of the CLFLUSH instruction to Ring 0 only (such that use of the instruction will trap to the OS, enabling the OS to monitor its use).
The example histogram generator 130 generates one or more histogram(s) of the sampled cache state(s). (Block 420). In examples disclosed herein, a separate histogram is created for each sampled cache state. That is, a first histogram is created corresponding to the L1-D cache (e.g., an L1-D histogram), and a second histogram is created corresponding to the L1-I cache (e.g., an L1-I histogram). However, any number of histograms corresponding to any number of caches may additionally or alternatively be created. The histogram(s) created in connection with block 420 represent benign operation (e.g., not an attack). The histogram(s) are stored in the histogram repository 135.
Next, the side channel anomaly detector 102 creates histogram(s) representing an attack. The example anomaly detection orchestrator 120 causes the OS/VMM 110 to execute the known (but non-malicious) attack process 114. (Block 425). In examples disclosed herein, the non-malicious attack process 114 is intended to simulate an actual side channel attack, but is not intended to actually expose sensitive information. In some examples, the attack process 114 is an actual attack process that would, under normal circumstances, expose sensitive information. However, in some examples, additional safeguards may be put in place to prevent the sharing of such sensitive information such as, for example, a firewall and/or other communication preventing approaches to prevent the attack process from sharing such sensitive information with a third party.
While the example attack process 114 is operating in a priming phase, the example cache state interface 125 samples the cache state from the cache 108. (Block 430). As noted above, an example approach for sampling the cache state is described in further detail below in connection with
Upon completion of the priming phase of the attack process 114, the attack moves into a triggering phase. While the example attack process 114 is operating in the triggering phase, the example cache state interface 125 samples the cache state from the cache 108. (Block 435). Upon completion of the triggering phase of the attack process 114, the attack moves into an observing phase. While the example attack process 114 is operating in the observing phase, the example cache state interface 125 samples the cache state from the cache 108. (Block 440). In examples disclosed herein, the cache state information from each of the priming phase, triggering phase, and observing phase are stored separately from each other, as there are expected to be different cache access patterns in each of those phases. However, in some examples, cache state information for the separate phases may be combined. The example anomaly detection orchestrator 120 then causes the OS/VMM 110 to terminate the attack process 114. (Block 442).
Using the collected cache state information representing an attack, the example histogram generator 130 creates histogram(s) corresponding to each of the phases of the attack. (Block 445). In examples disclosed herein, separate histogram(s) are created for each sampled cache state. That is, a first histogram is created corresponding to the L1-D cache (e.g., an L1-D histogram) in the priming phase, a second histogram is created corresponding to the L1-I cache (e.g., an L1-I histogram) in the priming phase, a third histogram is created corresponding to the L1-D cache (e.g., an L1-D histogram) in the triggering phase, a fourth histogram is created corresponding to the L1-I cache (e.g., an L1-I histogram) in the triggering phase, a fifth histogram is created corresponding to the L1-D cache (e.g., an L1-D histogram) in the observing phase, and a sixth histogram is created corresponding to the L1-I cache (e.g., an L1-I histogram) in the observing phase. However, any number of histograms corresponding to any number of caches and/or attack phases may additionally or alternatively be created. The histogram(s) are stored in the histogram repository 135.
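The per-phase histogram creation of block 445 could, for example, be organized as shown below, reusing the build_histogram helper from the sketch following block 330; the data layout is an assumption made for illustration only.

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical cache state samples collected during blocks 430-440, keyed by
# (cache, attack phase). A real implementation would use the samples stored
# in the cache state memory 127.
collected_states = {(cache, phase): rng.poisson(lam=4, size=64)
                    for cache in ("L1-D", "L1-I")
                    for phase in ("priming", "triggering", "observing")}

attack_histograms = {key: build_histogram(samples)
                     for key, samples in collected_states.items()}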
The example histogram analyzer 140 extracts histogram statistics from the histograms created in block 420 and block 445. (Block 450). That is, at least one statistic is determined in connection with each of the benign histogram(s), the priming phase attack histogram(s), the triggering phase attack histogram(s), and/or the observing phase attack histogram(s). In examples disclosed herein, the statistic(s) include at least one of a minimum value, a maximum value, a percentile value, a standard deviation value, a skewness value, one or more values representing a range in the histogram, an upper-hinge value of the histogram, etc. In examples disclosed herein, separate statistics are created for each of the histogram(s). That is, a first statistic (and/or first set of statistics) is created for the L1-D histogram, and a second statistic (and/or second set of statistics) is created for the L1-I histogram.
The example machine learning model trainer 155, in connection with the example machine learning model processor 145, trains one or more models based on the histogram statistics to produce an indication of whether an unknown process is exhibiting characteristics of an attack. (Block 455). In examples disclosed herein, the machine learning model is implemented by a one-class support vector machine (SVM). However, any other type of machine learning model may additionally or alternatively be used. During training, the example machine learning model trainer 155 updates the model(s) stored in the model data store 150 to reduce an amount of error generated by the example machine learning model processor 145 when using the histogram statistics to attempt to correctly output the desired response.
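A minimal sketch of the training of block 455 is shown below, assuming one one-class SVM per activity label and the feature vectors produced by the statistics sketch above; the hyperparameters are illustrative only.

import numpy as np
from sklearn.svm import OneClassSVM

def train_phase_models(stats_by_label):
    # `stats_by_label` maps a label (e.g., "benign", "priming", "triggering",
    # "observing") to a matrix of feature vectors for that label.
    models = {}
    for label, feature_matrix in stats_by_label.items():
        model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
        model.fit(np.asarray(feature_matrix, dtype=float))
        models[label] = model
    return models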
In examples disclosed herein, the model is trained to learn the cache access patterns of normal and/or benign program execution and of various attack phases (e.g., a priming phase, a triggering phase, an observing phase, etc.). In some examples, the triggering phase of an attack is considered normal and/or benign. In some examples, combinations of data from benign and/or attack operations are combined to allow for training to identify an attack occurring during benign operation. In some examples, the priming statistics and observing statistics are combined into a single malicious label and are considered for the detection as a binary classification problem.
As a result of the training, a model and/or an update to an existing model is created and is stored in the model data store 150. In examples disclosed herein, the model update can be computed with any sort of model learning algorithm such as, for example, Stochastic Gradient Descent.
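As one example of such an update, a linear one-class SVM trained with stochastic gradient descent supports incremental fitting; the sketch below uses scikit-learn's SGDOneClassSVM (an assumption about the chosen library, not a requirement of the disclosed approach), with dummy feature batches standing in for newly collected histogram statistics.

import numpy as np
from sklearn.linear_model import SGDOneClassSVM

model = SGDOneClassSVM(nu=0.05, random_state=0)

# Initial fit on an existing batch of (dummy) seven-dimensional feature vectors.
initial_batch = np.random.default_rng(2).normal(size=(128, 7))
model.partial_fit(initial_batch)

# As new histogram statistics are collected, the stored model can be refined
# incrementally instead of being retrained from scratch.
new_batch = np.random.default_rng(3).normal(size=(32, 7))
model.partial_fit(new_batch)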
The example multiple hypothesis tester 160 conducts multiple hypothesis testing comparing the attack histogram(s) to one or more characteristic benign histograms stored in the histogram repository 135 to generate a p-value threshold. (Block 460). In examples disclosed herein, the multiple hypothesis testing performed by the multiple hypothesis tester 160 is implemented using a Kolmogorov-Smirnov test. However, other types of multiple hypothesis testing algorithms may additionally or alternatively be used. The example p-value threshold produced by the multiple hypothesis testing represents a similarity threshold of the attack histograms and benign histograms that can be used to determine if histogram(s) of an unknown process are more similar to an attack operation or benign operation. In examples disclosed herein, p-value thresholds are created on a scale of zero to one. However, any other scale or nomenclature for representing a similarity and/or similarity threshold may additionally or alternatively be used. In examples disclosed herein, the resulting p-value threshold is set to reduce the number of false positives and/or false negatives. As such, additional histograms stored in the histogram repository 135 may be considered (e.g., histograms collected during operation of the side channel anomaly detector 102) to reduce a number of false positives. The example process 400 of
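One way the p-value threshold of block 460 could be derived is sketched below: the attack-phase observations and held-out benign observations are each scored against the stored benign references, and the threshold is placed between the two score populations. The aggregation (maximum p-value) and the threshold rule (midpoint) are illustrative assumptions.

from scipy.stats import ks_2samp

def max_p_value(samples, benign_references):
    return max(ks_2samp(samples, reference).pvalue for reference in benign_references)

def derive_p_value_threshold(attack_sample_sets, heldout_benign_sets, benign_references):
    attack_scores = [max_p_value(s, benign_references) for s in attack_sample_sets]
    benign_scores = [max_p_value(s, benign_references) for s in heldout_benign_sets]
    # Attack observations should score low and benign observations high; placing
    # the threshold between the two populations limits false positives/negatives.
    return (max(attack_scores) + min(benign_scores)) / 2.0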
If the monitoring interval has not been reached (e.g., block 530 returns a result of NO), the example cache state interface 125 continues to wait until the monitoring interval has been reached. Upon reaching the monitoring interval (e.g., block 530 returns a result of YES), the example cache state interface 125 determines whether the collection of the cache state information is complete. (Block 540). In examples disclosed herein, cache state collection may be considered complete upon the expiration of the monitoring interval (which may, in some examples, represent a single sampling of cache state information). However, in some examples, collection may be considered to be complete when a threshold number of monitoring intervals has been reached (e.g., ten intervals, fifteen minutes, etc.).
If the collection of the cache state information is not complete (e.g., block 540 returns a result of NO), control returns to block 510, where the process of
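The sampling loop described above (blocks 510-540) could, for example, be structured as sketched below; the read_cache_state() interface, the monitoring interval, and the completion criterion are hypothetical placeholders rather than features of any particular hardware.

import time

MONITORING_INTERVAL_S = 0.010   # assumed monitoring interval
SAMPLES_PER_COLLECTION = 10     # assumed completion criterion

def collect_cache_states(read_cache_state):
    samples = []
    while True:
        samples.append(read_cache_state())     # sample the cache state
        time.sleep(MONITORING_INTERVAL_S)      # wait until the monitoring interval is reached
        if len(samples) >= SAMPLES_PER_COLLECTION:
            return samples                     # collection of cache state information is complete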
The processor platform 700 of the illustrated example includes a processor 712. The processor 712 of the illustrated example is hardware. For example, the processor 712 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example anomaly detection orchestrator 120, the example cache state interface 125, the example histogram generator 130, the example histogram analyzer 140, the example machine learning model processor 145, the example machine learning model trainer 155, and the example multiple hypothesis tester 160.
The processor 712 of the illustrated example includes a local memory 713 (e.g., a cache, such as the cache 108 of
The processor platform 700 of the illustrated example also includes an interface circuit 720. The interface circuit 720 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
In the illustrated example, one or more input devices 722 are connected to the interface circuit 720. The input device(s) 722 permit(s) a user to enter data and/or commands into the processor 712. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
One or more output devices 724 are also connected to the interface circuit 720 of the illustrated example. The output devices 724 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 720 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 720 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 726. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
The processor platform 700 of the illustrated example also includes one or more mass storage devices 728 for storing software and/or data. Examples of such mass storage devices 728 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.
The machine executable instructions 732 of
From the foregoing, it will be appreciated that example methods, apparatus and articles of manufacture have been disclosed that enable detection of side channel attacks. Some such methods, apparatus and articles of manufacture disclosed herein improve the efficiency of using a computing device by enabling earlier detection of side channel attacks, for example, detecting an ongoing side channel attack before a data leak can occur. In this manner, data leaks can be prevented without the need for patching existing systems, applications, and/or hardware, thereby achieving one or more improvement(s) in the functioning of a computer.
Example 1 includes an apparatus for detecting side channel attacks, the apparatus comprising a histogram generator to generate a histogram representing cache access activities, a histogram analyzer to determine at least one statistic based on the histogram, a machine learning model processor to apply a machine learning model to the at least one statistic to identify an attempt to perform a side channel attack, a multiple hypothesis tester to perform multiple hypothesis testing to determine a probability of the cache access activities being benign, and an anomaly detection orchestrator to, in response to the machine learning model processor identifying that the at least one statistic is indicative of the side channel attack and the probability not satisfying a similarity threshold, cause the performance of a responsive action to mitigate the side channel attack.
Example 2 includes the apparatus of example 1, wherein the machine learning model is implemented by a support vector machine.
Example 3 includes the apparatus of example 1, wherein the multiple hypothesis tester is to perform the multiple hypothesis testing using a Kolmogorov-Smirnov test.
Example 4 includes the apparatus of example 1, further including a machine learning model trainer to train the machine learning model based on a benign histogram representative of benign cache access activities.
Example 5 includes the apparatus of example 1, further including a machine learning model trainer to train the machine learning model based on an attack histogram representative of cache access activities performed during the side channel attack.
Example 6 includes the apparatus of example 1, further including a cache state interface to sample a cache state of a processor, the histogram generator to generate the histogram based on the sampled cache state.
Example 7 includes at least one non-transitory computer-readable medium comprising instructions that, when executed, cause at least one processor to at least create a histogram representing cache access activities, determine at least one statistic based on the histogram, apply a machine learning model to the at least one statistic to attempt to identify a side channel attack, perform multiple hypothesis testing on the histogram to determine a probability of the cache access activities being benign, and in response to determining that the at least one statistic is indicative of the side channel attack and the probability not satisfying a similarity threshold, perform a responsive action to mitigate the side channel attack.
Example 8 includes the at least one non-transitory computer-readable medium of example 7, wherein the machine learning model is implemented using a support vector machine.
Example 9 includes the at least one non-transitory computer-readable medium of example 7, wherein the instructions, when executed, cause the at least one processor to perform the multiple hypothesis testing with a Kolmogorov-Smirnov test.
Example 10 includes the at least one non-transitory computer-readable medium of example 7, wherein the instructions, when executed, cause the at least one processor to train the machine learning model based on a benign histogram representative of benign cache access activities.
Example 11 includes the at least one non-transitory computer-readable medium of example 7, wherein the instructions, when executed, cause the at least one processor to train the machine learning model based on an attack histogram representative of cache access activities performed during the side channel attack.
Example 12 includes the at least one non-transitory computer-readable medium of example 7, wherein the instructions, when executed, cause the at least one processor to sample a cache state of the at least one processor, wherein the histogram is generated based on the sampled cache state.
Example 13 includes an apparatus for detecting side channel attacks, the apparatus comprising means for generating a histogram representing cache access activities, means for determining at least one statistic from the histogram, means for applying a machine learning model to classify the at least one statistic as indicative of a side channel attack, means for testing using the histogram based on multiple hypothesis testing to determine a probability of the cache access activities being benign, and means for mitigating the side channel attack in response to determining that the at least one statistic is indicative of the side channel attack and the probability not satisfying a similarity threshold.
Example 14 includes the apparatus of example 13, wherein the machine learning model includes a support vector machine.
Example 15 includes the apparatus of example 13, wherein the testing means is to perform the multiple hypothesis testing using a Kolmogorov-Smirnov test.
Example 16 includes the apparatus of example 13, further including means for training the machine learning model based on a benign histogram representative of benign cache access activities.
Example 17 includes the apparatus of example 13, further including means for training the machine learning model based on an attack histogram representative of cache access activities performed during the side channel attack.
Example 18 includes the apparatus of example 13, further including means for sampling a cache state, the means for generating to generate the histogram based on the sampled cache state.
Example 19 includes a method for detecting side channel attacks, the method comprising creating, by executing an instruction with a processor, a histogram representing cache access activities, determining at least one statistic based on the histogram, applying a machine learning model to the at least one statistic to attempt to identify a side channel attack, performing multiple hypothesis testing on the histogram to determine a probability of the cache access activities being benign, and in response to identifying the side channel attack and the probability not satisfying a similarity threshold, performing a responsive action to mitigate the side channel attack.
Example 20 includes the method of example 19, wherein the machine learning model is implemented using a support vector machine architecture.
Example 21 includes the method of example 19, wherein the performing of the multiple hypothesis testing includes performing a Kolmogorov-Smirnov test.
Example 22 includes the method of example 19, further including training the machine learning model based on a benign histogram representative of benign cache access activities.
Example 23 includes the method of example 19, further including training the machine learning model based on an attack histogram representative of cache access activities performed during the side channel attack.
Example 24 includes the method of example 19, further including sampling a cache state of the processor, wherein the histogram is generated based on the sampled cache state.
Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.