The present application claims the benefit under 35 U.S.C. § 119 of European Patent Application No. EP 20201311.6 filed on Oct. 12, 2020, which is expressly incorporated herein by reference in its entirety.
The present invention relates to a method for a neural network. The present invention further relates to an apparatus for a neural network.
Exemplary embodiments of the present invention relate to a method, for example a computer-implemented method, for a neural network, for example an artificial deep neural network (DNN), comprising: providing a plurality of training data sets, each training data set comprising input data for the neural network and associated output data, training the neural network based on the plurality of training data sets and a loss function, wherein the loss function is based on a bit-wise correlation of an output value provided by the neural network and a predetermined function characterizing an operation of a physical system. In some embodiments, this makes it possible to increase a correlation coefficient within a side channel attack (SCA).
According to further exemplary embodiments of the present invention, the providing of the training data sets comprises: determining a plurality of profiling traces, wherein each profiling trace characterizes at least one physical parameter, for example an electrical power consumption, of the physical system during execution of the predetermined function, and determining a respective output value of the predetermined function for each of the plurality of profiling traces, and, optionally, using the profiling traces and the output values as said training data sets.
In some embodiments of the present invention, a profiling trace can, e.g., be characterized by a plurality of measurements of an electrical power consumption of the physical system, e.g., a time series of such measurements, which may, e.g., be represented by a vector.
In some embodiments of the present invention, a respective output value may be obtained from evaluating the predetermined function, which may, e.g., comprise a cryptographic primitive and/or function, such as, e.g., the S-Box (nonlinear substitution) operation according to the Advanced Encryption Standard (AES). As an example, in some embodiments, input data such as, e.g., a plaintext p and a (secret) key k may be used, and the function f(p, k), e.g., the S-Box operation, may be evaluated based on the input data p, k, to obtain the respective output value v=f(p, k).
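As a hedged illustration of how an output value v=f(p, k) might be computed, the following Python sketch assumes that f is the AES S-Box applied to p XOR k. The text above names the S-Box only as an example, and the combination with XOR as well as the helper names (gf_mul, gf_inv, rotl8, aes_sbox, output_value) are assumptions made for this sketch.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0, as in AES."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^(-1) because a^255 == 1
        result = gf_mul(result, a)
    return result


def rotl8(x, shift):
    """Rotate a byte left by `shift` bit positions."""
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def aes_sbox(x):
    """AES S-Box: field inversion followed by the affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def output_value(p, k):
    """Assumption for this sketch: v = f(p, k) = S-Box(p XOR k)."""
    return aes_sbox(p ^ k)
```

In a profiling phase, output_value(p, k) could be evaluated once per recorded trace, since both the plaintext byte p and the key byte k are known there.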
Note that in some embodiments of the present invention, e.g., for a profiling phase, e.g., for training the DNN, the usually secret key k is known and may be used to determine the output value v. In some embodiments of the present invention, in a further phase, e.g., after the DNN has been trained (profiling phase), the trained DNN may be used to analyze and/or “attack” (e.g., perform side channel analysis) the physical system and/or a similar system, e.g., to determine a secret key used by the physical system. In some embodiments, e.g., during the further (e.g., “attack”) phase, the information represented by the trained DNN may be used to determine the (then) secret key.
According to further exemplary embodiments of the present invention, the neural network is a convolutional neural network, e.g., a neural network providing convolution operations to process data provided to the DNN as input data and/or data derived from the input data.
According to further exemplary embodiments of the present invention, a backpropagation technique is used for the training.
According to further exemplary embodiments of the present invention, the loss function can be characterized by the following equation:
wherein lbit characterizes a bit value of a leakage value associated with the predetermined function, wherein θbit characterizes a bit value of the output value, wherein i is an index variable, wherein B is a total number of bits, e.g., of the function value, wherein CO characterizes a correlation loss function, and wherein |·| characterizes an absolute value.
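The equation itself is not reproduced in the present text. A form that is consistent with the symbol definitions given here is sketched below. It is offered as a reconstruction and not as the authoritative formula. The name of the left-hand side and the index range i=1, . . . , B are assumptions.

```latex
L_{\mathrm{bit}} \;=\; \sum_{i=1}^{B} \left|\, CO\!\left(l_{\mathrm{bit},i},\ \theta_{\mathrm{bit},i}\right) \right|
```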
According to further exemplary embodiments of the present invention, the correlation loss function can be characterized by the following equation:
wherein cov( ) is a covariance, wherein σl characterizes the standard deviation of the input vector l, wherein σθ characterizes the standard deviation of the input vector θ, wherein D characterizes the batch size (e.g., characterizing a number of, e.g., power traces considered), wherein
According to further exemplary embodiments of the present invention, the method further comprises weighting the bit values lbit of the leakage value using weighting coefficients, wherein weighted bit values lbit_w are obtained, and performing the correlation based on the weighted bit values lbit_w (“weighted bit leakage”).
In some embodiments of the present invention, a bit-wise correlation may be performed based on the weighted bit values lbit_w.
In some embodiments of the present invention, a non-bit-wise correlation may be performed based on the weighted bit values lbit_w. In other words, in some embodiments, the weighting of the bit values lbit of the leakage value using weighting coefficients may be performed, and a non-bit-wise correlation may be performed based on a result thereof.
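The weighting of the bit values and a subsequent non-bit-wise correlation may, e.g., be sketched as follows in Python. The coefficient values, the least-significant-bit-first ordering, the example encodings, and the function names are hypothetical and serve only to illustrate the principle.

```python
import math


def weighted_bit_values(leakage_value, weights):
    """Return lbit_w: bit i of the leakage value scaled by weights[i] (LSB first)."""
    return [w * ((leakage_value >> i) & 1) for i, w in enumerate(weights)]


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0.0 or std_y == 0.0:
        return 0.0
    return cov / (std_x * std_y)


# Hypothetical weighting coefficients for a 4-bit value.
weights = [1.0, 0.5, 2.0, 0.25]
leakage_values = [0b0101, 0b0011, 0b1110, 0b1000, 0b0111, 0b0000]

# Non-bit-wise variant: collapse the weighted bits into one scalar per value.
weighted_leakage = [sum(weighted_bit_values(v, weights)) for v in leakage_values]

# Stand-in for network outputs; here simply an affine function of the leakage.
encodings = [2.0 * l + 1.0 for l in weighted_leakage]
correlation = pearson(weighted_leakage, encodings)
```

For a bit-wise variant, the individual entries returned by weighted_bit_values would be correlated one bit position at a time instead of being summed first.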
According to further exemplary embodiments of the present invention, a further neural network, e.g., a perceptron, is used to approximate at least some of the weighting coefficients. In some embodiments, the weighting coefficients comprise information on hardware aspects such as, e.g., parasitic capacitance and/or (transmission) line/wire lengths, e.g., between different register cells of the physical system, which may influence, e.g., the electric power consumption.
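How a perceptron might approximate such weighting coefficients can be illustrated with the following minimal Python sketch. It assumes a single linear neuron with one input per bit, trained by full-batch gradient descent on a mean squared error against synthetic, noise-free leakage values. The training objective, the data, and all names are assumptions of this sketch; the text above does not specify how the further neural network is trained.

```python
def bits_of(value, num_bits):
    """Bits of `value`, least significant bit first."""
    return [(value >> i) & 1 for i in range(num_bits)]


def train_perceptron(samples, targets, num_bits, lr=0.1, epochs=2000):
    """Fit weights and bias of one linear neuron by full-batch gradient descent."""
    weights = [0.0] * num_bits
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * num_bits
        grad_b = 0.0
        for x, t in zip(samples, targets):
            err = sum(w * xi for w, xi in zip(weights, x)) + bias - t
            for i in range(num_bits):
                grad_w[i] += 2.0 * err * x[i] / n
            grad_b += 2.0 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


NUM_BITS = 4
# Hypothetical per-bit coefficients of the synthetic "device".
TRUE_WEIGHTS = [1.0, 0.8, 1.2, 0.5]

samples = [bits_of(v, NUM_BITS) for v in range(2 ** NUM_BITS)]
targets = [sum(w * b for w, b in zip(TRUE_WEIGHTS, x)) for x in samples]

learned_weights, learned_bias = train_perceptron(samples, targets, NUM_BITS)
```

In this synthetic setting the learned weights approach the hypothetical coefficients. With measured traces, noise and the chosen loss function would determine how closely the coefficients can be approximated.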
According to further exemplary embodiments of the present invention, the weighted bit leakage may be characterized by the equation
wherein it is assumed that R is a B-bit register with an input RD and a registered output RQ, and wherein η refers to a non-data dependent noise factor.
According to further exemplary embodiments, the method further comprises modifying a design and/or structure of the physical system based on at least one of the approximated weighting coefficients. In some embodiments, this way, asymmetry between different bit lines of the physical system may be reduced, so that the physical system may be hardened against side channel attacks.
According to further exemplary embodiments of the present invention, the method further comprises using the neural network for determining information on at least one unknown and/or secret parameter of the physical system and/or of a further physical system, which for example is structurally identical with the physical system at least to some extent. In some embodiments, the DNN trained according to the principle of the embodiments may, e.g., be used for side channel attacks (SCA), e.g., based on correlation power analysis (CPA).
Further exemplary embodiments of the present invention relate to an apparatus configured to perform the method according to the embodiments.
Further exemplary embodiments of the present invention relate to a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method according to the embodiments.
Further exemplary embodiments of the present invention relate to a computer-readable storage medium comprising instructions which, when executed by a computer, cause the computer to carry out the method according to the embodiments.
Further exemplary embodiments of the present invention relate to a data carrier signal carrying and/or characterizing the computer program according to the embodiments.
Further exemplary embodiments of the present invention relate to a use of the method according to the embodiments and/or of the apparatus according to the embodiments and/or of the computer program according to the embodiments for at least one of: a) performing a side channel analysis and/or attack on a physical system, b) evaluating a design of a physical system, e.g., regarding its vulnerability to side channel attacks, c) determining a correlation between a predicted output of a physical system and a physical parameter, e.g., power consumption, of the physical system, d) determining secret data, e.g., a secret cryptographic key.
Some exemplary embodiments will now be described with reference to the figures.
Exemplary embodiments relate to a method,
According to further exemplary embodiments, the providing 100 of the training data sets TDS comprises,
In some embodiments, the training data sets TDS may be characterized by DTrain={xi,vi}, with i=1, . . . , NProfiling, wherein NProfiling characterizes a number of the training data sets TDS.
In some embodiments, a profiling trace xi can, e.g., be characterized by a plurality of measurements of an electrical power consumption of the physical system PS, e.g., a time series of such measurements, which may, e.g., be represented by a vector xi. In some embodiments, the physical system PS may be or comprise an electronic device, e.g., an electronic device configured to perform cryptographic functions.
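The structure of the training data sets DTrain={xi,vi} may, e.g., be illustrated by the following Python sketch, in which measured traces are replaced by simulated ones. The simulation (Gaussian noise plus the Hamming weight of vi at one sample position), the placeholder for f(p, k), and all function names and parameter values are assumptions of this sketch and do not describe an actual measurement setup.

```python
import random


def hamming_weight(value):
    """Number of set bits in `value`."""
    return bin(value).count("1")


def predetermined_function(p, k):
    """Hypothetical placeholder for f(p, k); a real setup might use an S-Box."""
    return p ^ k


def simulate_trace(v, num_samples, leak_index, noise_std, rng):
    """Simulated power trace: noise everywhere, plus leakage at one sample."""
    trace = [rng.gauss(0.0, noise_std) for _ in range(num_samples)]
    trace[leak_index] += hamming_weight(v)
    return trace


def build_training_sets(key, n_profiling, num_samples=20, seed=0):
    """Return a list of (x_i, v_i) pairs for a known key byte."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_profiling):
        p = rng.randrange(256)  # random plaintext byte
        v = predetermined_function(p, key)
        x = simulate_trace(v, num_samples, leak_index=5, noise_std=0.1, rng=rng)
        data.append((x, v))
    return data


d_train = build_training_sets(key=0x2B, n_profiling=100)
```

Each element of d_train pairs a trace vector xi with the corresponding output value vi, which mirrors the pairing described above.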
In some embodiments, a respective output value vi may be obtained from evaluating the predetermined function F, which may, e.g., comprise a cryptographic primitive and/or function, such as, e.g., the S-Box (nonlinear substitution) operation according to the Advanced Encryption Standard (AES). As an example, in some embodiments, input data such as, e.g., a plaintext p and a (secret) key k may be used, and the function f(p, k) (exemplarily symbolized by reference sign “F” in
Note that in some embodiments, e.g., for a profiling phase, e.g., for training the DNN, the—usually secret—key k is known and may be used to determine the output value v. In some embodiments, in a further phase, e.g., after the DNN NN has been trained, e.g., during the profiling phase, the trained DNN may be used to analyze and/or “attack” (e.g., perform side channel analysis) the physical system PS and/or a similar system PS' (see
According to further exemplary embodiments, the neural network NN (
In some embodiments, a number of processing elements PE of the input layer IL may, e.g., correspond with the number of power consumption measurement values of a profiling trace xi, wherein, e.g., each processing element PE of the input layer IL is configured to process one power consumption measurement value of the profiling trace xi.
In some embodiments, the processing element of the output layer OL may be configured to output a floating point number as output value OV, e.g., characterizing an encoding. In other words, in some embodiments, the DNN NN may be used as an encoder network that determines the output value OV based on the input data, e.g., power traces of the physical system PS.
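A forward pass through such an encoder network may, e.g., be sketched as follows in Python. The layer sizes, the tanh activation, the random initialization, and the function names are assumptions of this sketch; only the overall shape (one input value per measurement, two hidden layers, one floating point output value) follows the description above.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output element."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights in a small range and zero biases (hypothetical choice)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def encode(trace, layers):
    """Map a trace to a single floating point output value."""
    activation = trace
    for weights, biases in layers[:-1]:
        activation = [math.tanh(z) for z in dense(activation, weights, biases)]
    weights, biases = layers[-1]
    return dense(activation, weights, biases)[0]


rng = random.Random(0)
TRACE_LENGTH = 20  # one input element per measurement value
layers = [
    init_layer(TRACE_LENGTH, 8, rng),  # first hidden layer
    init_layer(8, 4, rng),             # second hidden layer
    init_layer(4, 1, rng),             # output layer with one element
]
example_trace = [0.1 * i for i in range(TRACE_LENGTH)]
output_value_ov = encode(example_trace, layers)
```

The untrained network in this sketch only demonstrates the data flow; the training described in this document would adjust the weights and biases.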
In some embodiments, at least some of the layers IL, HL1, HL2, OL may be fully connected, as exemplarily depicted by
According to further exemplary embodiments, a backpropagation technique is used for the training 102 (
According to further exemplary embodiments, the loss function e.g., used for backpropagation can be characterized by the following equation:
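wherein lbit characterizes a bit value of a leakage value associated with the predetermined function F, wherein θbit characterizes a bit value of the output value, wherein i is an index variable, wherein B is a total number of bits, e.g., of the function value, wherein CO characterizes a correlation loss function, and wherein |·| characterizes an absolute value.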
As can be seen, in some embodiments, the loss function is evaluated in a bit-wise manner, e.g., processing, by means of the sum, corresponding single bits of the output value OV (represented by θ in the above equation) and of the leakage value l (corresponding with the output value vi of the function F) each. In other words, in some embodiments, the correlation loss function CO is provided with respective single bit values lbit, θbit and evaluated with these two input bits, and after that an absolute value of the respective value of the correlation loss function CO is obtained, wherein this procedure is repeated B times, adding the individual absolute values of the bit-wise evaluated correlation loss function CO.
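The bit-wise evaluation described above may, e.g., be sketched as follows in Python. The sketch assumes that the network provides one real-valued output per bit position and that the correlation is a plain Pearson correlation over a batch. Whether the loss uses this sum directly, its negative, or an offset version is not specified here, so the function below only computes the sum of absolute bit-wise correlations. All names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over a batch (0.0 if one input is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0.0 or std_y == 0.0:
        return 0.0
    return cov / (std_x * std_y)


def bitwise_abs_correlation(theta, leakage_values, num_bits):
    """Sum over all bit positions of |corr(l_bit, theta_bit)| within one batch.

    theta[d][i] is the network output for bit i of batch element d (assumed),
    leakage_values[d] is the integer leakage value of batch element d.
    """
    total = 0.0
    for i in range(num_bits):
        l_bit = [(v >> i) & 1 for v in leakage_values]
        theta_bit = [row[i] for row in theta]
        total += abs(pearson(l_bit, theta_bit))
    return total


# Example batch: all 4-bit values, with outputs that match the bits exactly.
batch_leakage = list(range(16))
batch_theta = [[float((v >> i) & 1) for i in range(4)] for v in batch_leakage]
score = bitwise_abs_correlation(batch_theta, batch_leakage, num_bits=4)
```

Because of the absolute value, outputs that are perfectly anti-correlated with a bit contribute as much as perfectly correlated ones.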
According to further exemplary embodiments, the correlation loss function CO can be characterized by the following equation:
wherein cov( ) is a covariance, wherein σl characterizes the standard deviation of the input vector l, wherein σθ characterizes the standard deviation of the input vector θ, wherein D characterizes the batch size, wherein
According to further exemplary embodiments,
Note that in some embodiments, a bit-wise correlation may be performed based on the weighted bit values lbit_w. Also, note that in some (other) embodiments, a non-bit-wise correlation may be performed based on the weighted bit values lbit_w.
According to further exemplary embodiments, as shown in
According to further exemplary embodiments, the method further comprises modifying 106 a design and/or structure of the physical system PS based on at least one of the approximated weighting coefficients. In some embodiments, this way, e.g., asymmetry between different bit lines of the physical system PS may be reduced, wherein the physical system PS may be hardened against side channel attacks.
According to further exemplary embodiments, the method further comprises using 104 (
Element e2 symbolizes measurement equipment which may, e.g., be used to determine the profiling traces in some embodiments. Element e3 exemplarily symbolizes graphically a profiling trace forming part of the training data set DTrain. Element e4 symbolizes the training data set TDS that may be used for training 102, and element e5 symbolizes the neural network NN, e.g., during a training phase. Element e6 symbolizes an optional CPA that may be performed using the trained neural network NN in some embodiments.
In some embodiments, during an attack phase, a (new) set of attack traces DAttack may be obtained by operating an actual target device PS′, e.g., a physical system which is structurally identical or similar to the profiling device PS, whereby the secret key k is, e.g., fixed and unknown. Element e7 symbolizes the target device, and element e8 symbolizes measurement equipment which may, e.g., be used to determine the attack traces in some embodiments, similar to element e2. Element e9 symbolizes an exemplary attack trace, and element e10 symbolizes the set of attack traces DAttack.
The secret key k of the target device PS′ may in some embodiments be determined or recovered by applying a CPA e6, wherein the attack traces DAttack are, e.g., encoded by the DNN NN, e5 that was trained in the profiling phase.
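A key recovery by CPA on encoded attack traces may, e.g., be sketched as follows in Python. The sketch assumes a Hamming weight leakage hypothesis on S-Box(p XOR k) for a single key byte and replaces the outputs of the trained DNN by an idealized stand-in. The leakage hypothesis, the stand-in encodings, and all names are assumptions of this sketch.

```python
import math


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def aes_sbox(x):
    """AES S-Box via field inversion (x^254) and affine transformation."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)
    out = inv
    for shift in range(1, 5):
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out ^ 0x63


SBOX = [aes_sbox(x) for x in range(256)]


def hamming_weight(value):
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cpa_recover_key(plaintexts, encodings):
    """Return the key byte guess with the highest absolute correlation."""
    best_key, best_score = 0, -1.0
    for guess in range(256):
        hypothesis = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        score = abs(pearson(hypothesis, encodings))
        if score > best_score:
            best_key, best_score = guess, score
    return best_key, best_score


# Idealized stand-in for DNN-encoded attack traces of a device with key 0x3C.
SECRET_KEY = 0x3C
plaintexts = list(range(256))
encodings = [float(hamming_weight(SBOX[p ^ SECRET_KEY])) for p in plaintexts]
recovered_key, best_correlation = cpa_recover_key(plaintexts, encodings)
```

With real encodings the correlation for the correct key guess would be lower than in this idealized case, and more attack traces might be needed to distinguish it from the other guesses.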
In some embodiments, after a profiling phase, the DNN outputs (encoded traces and optimized leakage) may, e.g., be used as an input for a template attack (TA) or to train a simple (linear) classifier such as logistic regression.
Further exemplary embodiments, as shown in
The apparatus 200 comprises at least one calculating unit 202 and at least one memory unit 204 associated with (i.e., usable by) said at least one calculating unit 202 for at least temporarily storing a computer program PRG and/or data DAT, wherein said computer program PRG is, e.g., configured to at least temporarily control an operation of said apparatus 200, e.g., the execution of a method according to the embodiments, cf., e.g., the exemplary flow chart of
In some embodiments, said at least one calculating unit 202 comprises at least one core 202a, 202b, 202c for executing said computer program PRG or at least parts thereof, e.g., for executing the method according to the embodiments or at least one or more steps thereof.
According to further preferred embodiments, the at least one calculating unit 202 may comprise at least one of the following elements: a microprocessor, a microcontroller, a digital signal processor (DSP), a programmable logic element (e.g., FPGA, field programmable gate array), an ASIC (application specific integrated circuit), hardware circuitry, a tensor processor, a graphics processing unit (GPU). According to further preferred embodiments, any combination of two or more of these elements is also possible.
According to further preferred embodiments, the memory unit 204 comprises at least one of the following elements: a volatile memory 204a, particularly a random-access memory (RAM), a non-volatile memory 204b, particularly a Flash-EEPROM. Preferably, said computer program PRG is at least temporarily stored in said non-volatile memory 204b. Data DAT, which may, e.g., be used for executing the method according to the embodiments, may at least temporarily be stored in said RAM 204a.
According to further preferred embodiments, an optional computer-readable storage medium SM comprising instructions, e.g., in the form of a computer program PRG, may be provided, wherein said computer program PRG, when executed by a computer, i.e., by the calculating unit 202, may cause the computer 202 to carry out the method according to the embodiments. As an example, said storage medium SM may comprise or represent a digital storage medium such as a semiconductor memory device (e.g., solid state drive, SSD) and/or a magnetic storage medium such as a disk or hard disk drive (HDD) and/or an optical storage medium such as a compact disc (CD) or DVD (digital versatile disc) or the like.
According to further preferred embodiments, the apparatus 200 may comprise an optional data interface 206, preferably for bidirectional data exchange with an external device (not shown). As an example, by means of said data interface 206, a data carrier signal DCS may be received, e.g., from said external device, for example via a wired or a wireless data transmission medium, e.g., over a (virtual) private computer network and/or a public computer network such as, e.g., the Internet. According to further preferred embodiments, the data carrier signal DCS may represent or carry the computer program PRG according to the embodiments, or at least a part thereof.
In some embodiments, the apparatus 200 may receive profiling traces xi and/or attack traces and/or other data vi via the data carrier signal DCS.
Further exemplary embodiments relate to a computer program PRG comprising instructions which, when the program is executed by a computer 202, cause the computer 202 to carry out the method according to the embodiments.
Further exemplary embodiments, as shown in
The principle of the embodiments may, e.g., be used for power-based side channel attacks (SCAs), e.g., against security-enabled devices. Power-based SCAs exploit information leakage from the power consumption or electromagnetic emanations of a device to extract secret information such as cryptographic keys, even though the employed algorithms are mathematically sound.
SCAs can be divided into two categories. Non-profiled SCA techniques aim to recover the secret key k by performing statistical calculations on power measurements of the device under attack, based on a hypothesis of the device's leakage. Profiled SCAs assume a stronger adversary who is in possession of a profiling device, i.e., an open copy of the attacked device which the adversary can manipulate to characterize the leakages very precisely in a first step. Once this has been done, the built model can be used to attack the actual target device in the key extraction phase.
In some embodiments, the DNN NN is trained to learn an encoding of the input data (e.g., power traces and/or electromagnetic parameters or the like) that maximizes a Pearson correlation with a hypothetical power consumption (aka leakage).
Some embodiments provide improvements of a CO scheme, wherein the first aspect is based on the bitwise DNN loss function calculation. A second aspect uses an additional (e.g., small) DNN fNN (
Number | Date | Country | Kind |
---|---|---|---|
20201311.6 | Oct 2020 | EP | regional |