The present invention relates to a method for securely checking a code and a circuit system for carrying out the presented method, also referred to as a tester or checker, which is secured against fault attacks.
Redundant codes are used in safety-relevant systems in which, in the event of an error, the error is recognized by a code checker, thus allowing a critical situation to be averted. In this regard, m-out-of-n codes also play a role. In addition, for cryptographic applications, random generators which are to have a self-test at their disposal according to the recommendations of the National Institute of Standards and Technology (NIST) are necessary (also see the separate publication entitled “Recommendation for Random Number Generation Using Deterministic Random Bit Generators,” SP 800-90, March 2007). The implementation of a self-test for any given deterministic random generator may involve a high level of effort. When an m-out-of-n code is used for the implementation, the recommended self-test may be easily achieved using a code checker.
An m-out-of-n code is an error detection code having a code word length of n bits, each code word including exactly m instances of a “one.”
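The definition above can be sketched as a simple weight test. This is a minimal illustration only; the function name `is_m_out_of_n` and the representation of a code word as a list of bits are assumptions made for the example and do not appear in the description.

```python
def is_m_out_of_n(word, m, n):
    """Return True if `word` (a sequence of 0/1 bits) is a valid m-out-of-n code word."""
    return (
        len(word) == n
        and all(bit in (0, 1) for bit in word)
        and sum(word) == m  # exactly m ones
    )


valid_word = [1, 0, 1, 0]      # 2-out-of-4 code word
corrupted_word = [1, 1, 1, 0]  # one bit flipped from 0 to 1

print(is_m_out_of_n(valid_word, 2, 4))
print(is_m_out_of_n(corrupted_word, 2, 4))
```

Any error that changes bits in one direction only (all 0 to 1, or all 1 to 0) alters the number of ones and is therefore caught by this test.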
For generating an m-out-of-n code, for example a mask generator having m-out-of-n encoding may be used. One configuration of such a mask generator is illustrated in
Mask generators, like other cryptographic devices and cryptological algorithms, are subjected to attacks for the purpose of manipulating or reading out protected data. In encryption methods commonly used nowadays, such as the Advanced Encryption Standard (AES), keys are used which, due to the key length of 128 bits and greater, are not ascertainable by trial and error (so-called “brute force” attacks), even when rapid computational techniques are used. Therefore, an attacker also examines side effects of an implementation, such as the variation of power consumption over time, the execution time, or the electromagnetic emission of a circuit during the encryption operation. Since these attacks do not directly target a function, they are referred to as side channel attacks.
These side channel attacks (SCA) make use of the physical implementation of a cryptosystem in a device. The control unit having cryptographic functions is observed during execution of the cryptological algorithms in order to find correlations between the observed data and the hypotheses for the secret key.
Numerous side channel attacks are believed to be understood, such as those discussed, for example, in the publication by Mangard, Oswald, and Popp in “Power Analysis Attacks,” Springer 2007. A successful attack on the secret key of the AES may be practicably carried out in particular by Differential Power Analysis (DPA).
In DPA, the power consumption of a microprocessor during cryptographic computations is recorded, and traces of the power consumption are compared to hypotheses, using statistical methods.
In known processes which make DPA more difficult, an intervention is made in the algorithm itself. With masking, the operations are carried out using randomly altered operands, and as a result the random value is then removed, which means that the random chance does not have an impact on the result. Another option is so-called hiding, in which an attempt is made to compensate for high-low transitions using corresponding low-high transitions.
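The masking principle described above can be sketched for the simplest case, a linear XOR operation. The function name `masked_xor` and the 8-bit operand width are assumptions chosen for illustration; masking of nonlinear operations is considerably more involved and is not shown.

```python
import secrets


def masked_xor(data, key, width=8):
    """Compute data XOR key without ever processing the unmasked data directly.

    Hypothetical sketch of Boolean masking for a linear operation.
    """
    mask = secrets.randbits(width)     # fresh random value for every call
    masked_data = data ^ mask          # operand is randomly altered
    masked_result = masked_data ^ key  # operation is carried out on the masked operand
    return masked_result ^ mask        # random value is removed from the result


# The random mask has no influence on the final result.
print(masked_xor(0x3A, 0xC5) == (0x3A ^ 0xC5))
```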
As explained above, in the present state of the art of computational technology, recent cryptographic processes such as the Advanced Encryption Standard (AES) are well protected against so-called brute force attacks, i.e., testing all possibilities by trial and error, due to the length of the key and the complexity of the process itself. The attacks by a potential attacker are therefore increasingly focused on the implementations. Using so-called side channel attacks, the attacker attempts to obtain information from the power consumption during processing of the algorithm, from the electromagnetic emission, or from the operand-dependent duration of the processing, from which the secret key may be deduced. However, if the secret key or the input/output signals of a cryptographic operation are linked to a mask that is unknown to the attacker, an attack is made more difficult or even prevented. The attacker will then initially attempt to find out the secret mask.
One option for improving the robustness against such side channel attacks is to use in a mask generator a system of automatic state machines or state machines having the same configuration, which are supplied with an input signal on the input side and which generate an output signal as a function of their state, each state machine always having a different state than the other state machines of the system. It is assumed that, due to the number of ones and zeros which is the same in each case (and thus a Hamming weight which is the same), and due to transitions of these states for identical input signals, each having the same Hamming distance, the power consumption is independent of the particular state of the state machines that are used.
It is known that so-called fault attacks may be used to bring a circuit into a state which is actually not provided for normal operation. This state provides an option for more easily ascertaining the secret key. Thus, for example, by a targeted alteration of the operating voltage (spike attacks), or by using electromagnetic fields or radiation, for example alpha particles or lasers, the state of individual or all used state machines could be changed into a state (0, 0, . . . , 0). If a bit vector generated in this way is used for masking a key, the originally provided protection of the key from side channel attacks is completely or at least partially lost. It is thus easier to ascertain the secret key. Using specialized code checkers, in particular for m-out-of-n code a check may be made very easily as to whether one or also multiple bits (in particular in one direction) has/have been corrupted.
Such code checkers are discussed, for example, in the publication by A. P. Stroele and S. Tarnick, entitled “Programmable Embedded Self-Testing Checkers for All-Unidirectional Error Detecting Codes,” Proceedings of the 17th IEEE VLSI Test Symposium, Dana Point, Calif., 1999, pages 361-369. The publication describes a code checker which monitors the outputs of a system in order to detect occurring errors as quickly as possible. The checker is composed of a number of full adders and flip-flops, and has a uniform structure. A simplified circuit for the same purpose is discussed in another publication by S. Tarnick, entitled “Design of Embedded Constant Weight Code Checkers Based on Averaging Operations,” Proceedings of the 16th IEEE On-Line Testing Symposium, Corfu Island, Greece 2010, pages 255-260.
The publication WO 2006/003023 A2 discusses a method and a system for recognizing unidirectional errors in words of systematic unordered code. This system also includes a number of full adders and flip-flops. The system, which includes a translation circuit and a Berger-type code checker, may be tested using a small number of code words.
The code checkers in the cited publications are configured in such a way that they test themselves. For this purpose, the code space is reduced, using a first checker, so that only one-half of the code bits are present, and also only one-half of the code bits have the value 1 (m/2 out of n/2). This procedure is carried out, for example, until a 1-out-of-2 code (dual-rail code) is present. However, this is valid only when m=n/2.
This dual-rail code is ultimately checked in a self-testing dual-rail code checker as described, for example, in the following article: S. Kundu, S. M. Reddy, “Embedded Totally Self-Checking Checkers: A Practical Design,” Design and Test of Computers, 1990, Volume 7, Edition 4, pages 5-12.
A disadvantage of known code checkers is that the code checkers themselves are not resistant to an attack, for example DPA. Regardless of whether or not a fault attack is present, an attacker could make use of the periods of the code check to draw conclusions concerning the secret key that is used.
Against this background, a method for securely checking a code having the features described herein and a circuit system according to the description herein for carrying out the method are presented. Embodiments result from the dependent claims and the description.
Using the presented method, the risk of a DPA attack on the code checker is eliminated. This makes it possible to continuously check for errors the state of a structure having 2^n automatic state machines, each having n bits, when all of these automatic state machines are configured to always have different states. DPA is no longer able to exploit the testing itself. This allows the implementation of a DPA-resistant random generator according to NIST recommendations, for example in the publication NIST SP 800-90, in which a self-test of a deterministic random bit generator (DRBG) is required.
The method proposed herein, at least in some of the embodiments, goes far beyond the NIST requirements, which stipulate only a self-test. Due to the option for monitoring, significantly increased protection from fault attacks, for example, is ensured.
Further advantages and embodiments of the present invention result from the description and the appended drawings.
It is understood that the features stated above and to be explained below may be used not only in the particular stated combination, but also in other combinations or alone without departing from the scope of the present invention.
The present invention is schematically illustrated based on specific embodiments in the drawings, and is described in greater detail with reference to the drawings.
Transformation elements TE_0, TE_1, TE_2, . . . , TE_15 form an output signal, not described in greater detail here, from input signal 102 supplied to them. These output signals are combined, and then yield a signature S 120 having 256 bits. Transformation elements TE_0, TE_1, TE_2, . . . , TE_15 each have an automatic state machine or a state machine ZA whose state information is stored, for example, in the form of a digital data word of predefinable width. For example, state machine ZA may have a memory capacity of 4 bits, so that a total of 16 different states are possible. State machines ZA of each system 104, 106, 108, 110 have the same configuration. The word “same” means that starting with identical input signals 102 and an identical initialization state, each state machine ZA assumes the same successor state as another identical state machine ZA in a subsequent processing cycle.
It is also provided that each state machine ZA in each case always has a different state than all other state machines ZA of corresponding systems 104, 106, 108, or 110. DPA attacks which attempt to draw conclusions concerning an internal processing state of circuit system 100 or of individual transformation elements TE_0, TE_1, TE_2, . . . , TE_15 based on an analysis of an electrical current and/or power consumption or of interference emissions are thus made more difficult.
It is advantageous when the number of provided transformation elements TE_0, TE_1, TE_2, . . . , TE_15 corresponds to the number of maximum possible different states of state machine ZA, in the present case, 16. As a result, every theoretically possible state is always present, i.e., for each processing cycle, in exactly one state machine ZA, so that in each case only one combination of all 16 possible states is “visible” from the outside, i.e., by a possible attacker carrying out a DPA attack. Even in a subsequent processing cycle, in which the individual state machines ZA each change their state according to a predefined rule, once again exactly one of the 16 possible states is present overall in each of the 16 state machines ZA, so that in turn all 16 states are simultaneously “visible” from the outside.
As a result, a possible attacker is not able to deduce a state of the internal signal processing in transformation elements TE_0, TE_1, TE_2, . . . , TE_15 from a corresponding electromagnetic emission which is present in a customary implementation of circuit system 100, or also from the electrical power consumption of circuit system 100. For an ideally symmetrical configuration of all components, the electrical power consumption is always constant, so that the emitted electromagnetic field in each case experiences no significant changes during a change of state between successive processing cycles. Based on signature S 120, a bit vector 130 having 128 bits is generated by a linear gate in block 122. The linear gate may be an XOR gate or also an XNOR gate, for example. To further hinder the task of the potential attacker, the outputs of the various transformation elements are interchanged prior to the linear gate. One meaningful measure in this regard is to rotate the states within a system as a function of the input data.
Illustrated mask generator 100 utilizes so-called nonlinear signature formation. It is thus known how a structure based on p state machines having the same configuration and each having q state bits may be built whose power consumption is independent of the particular state of these state machines. For this purpose, a complete set of state machines (COSSMA) must be provided. This occurs precisely when p=2^q. If each state machine now has a different initial state, (p*q)/2 ones and the same number of zeros are thus necessarily present in the p*q bits. In addition, all of these state machines of such a system are provided with the same input signals. For a given input signal, if each of these state machines has a unique successor state and a unique precursor state, the states of the p state machines are different from one another at any time, and therefore this must necessarily involve a complete set of all possible states. Thus, a (p*q)/2-out-of-(p*q) code is present at any point in time of the processing of input data.
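The invariant stated above can be checked with a small simulation. The sketch below assumes q=4 and uses randomly chosen bijective next-state tables as a stand-in for the actual state machine rule, which is not specified here; the names `make_transition`, `step`, and `count_ones` are hypothetical.

```python
import random

Q = 4       # state bits per state machine
P = 2 ** Q  # number of state machines in a complete set


def make_transition(rng):
    """Build one bijective next-state table per input symbol (illustrative rule only)."""
    table = {}
    for symbol in (0, 1):
        successors = list(range(P))
        rng.shuffle(successors)  # a permutation: unique successor and unique precursor
        table[symbol] = successors
    return table


def step(states, symbol, table):
    """Apply the same input symbol to every state machine."""
    return [table[symbol][state] for state in states]


def count_ones(states):
    """Total number of ones over all p*q state bits."""
    return sum(bin(state).count("1") for state in states)


rng = random.Random(0)  # fixed seed, used only to make the illustration repeatable
table = make_transition(rng)
states = list(range(P))
rng.shuffle(states)     # every state machine starts in a different state

for _ in range(100):
    states = step(states, rng.randrange(2), table)
    assert sorted(states) == list(range(P))  # still a complete set of states
    assert count_ones(states) == P * Q // 2  # 32-out-of-64 code at every step
```

Because each next-state table is a permutation, distinct states can never collide, which is why the set of states stays complete regardless of the input sequence.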
In one practical example, q=4 and therefore p=2^4=16. The 16 state machines then always have the states 0, 1, 2, . . . , 15, and only the assignment of these states to the individual state machines varies. When p*q=64, there are always exactly 32 ones and 32 zeros at the outputs of all of these state machines. This 32-out-of-64 code could be checked using a code checker as described above according to the related art. However, such a code checker would be very complex, since even in a first reduction stage in a circuit for a weighted average value formation for code reduction, a so-called weight averaging circuit WAC, 32 full adder cells, and also two flip-flops would be necessary. In the second stage, 16 full adders and two flip-flops would then be necessary, and so forth, until only two full adders and two flip-flops would be necessary. With 62 full adders (approximately 8 GE each), 10 flip-flops (approximately 8 GE each), and 6 dual-rail checkers (approximately 4 GE each), a total outlay of approximately 600 gate equivalents (GE) would be necessary. If this were carried out for a 4-fold structure having 4*64 bits, a total of approximately 2400 GE of circuit complexity would be present in the parallel implementation.
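The gate-equivalent estimate above can be reproduced with a few lines of arithmetic. The per-cell costs and cell counts are those stated in the text; the variable names are chosen for the example.

```python
FULL_ADDER_GE = 8
FLIP_FLOP_GE = 8
DUAL_RAIL_CHECKER_GE = 4

# Full adders per reduction stage: 32, 16, 8, 4, 2
full_adders_per_stage = [64 >> stage for stage in range(1, 6)]
full_adders = sum(full_adders_per_stage)         # 62
flip_flops = 2 * len(full_adders_per_stage)      # two per stage
dual_rail_checkers = 6                           # count as stated in the text

total_ge = (
    full_adders * FULL_ADDER_GE
    + flip_flops * FLIP_FLOP_GE
    + dual_rail_checkers * DUAL_RAIL_CHECKER_GE
)
print(full_adders, flip_flops, total_ge, 4 * total_ge)
```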
In contrast, the presented implementation makes use of the fact that in the same bit positions of the state machines, the same number of ones is present at any point in time. Therefore, the checking may be subdivided, and in each case only 16 bits may be tested in a checking step. The remaining 3×16 bits are then tested in three further checking steps. In contrast to the code checkers provided according to the related art, the flip-flops upstream and downstream from the full adders may be completely dispensed with in the weight averaging circuit when a counter, which is present in the circuit anyway, is utilized and in each case one bit thereof is used on a weight averaging circuit WAC (code reducer), for example as input x0. To implement the circuit with self-testing, the carry-in inputs of the weight averaging circuit and of the dual-rail checkers must assume all possible combinations at least once.
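The subdivision into four checking steps can be sketched as follows, assuming 16 state machines with 4 state bits each. The helper names `bit_slice` and `slices_are_balanced` are hypothetical; the sketch only tests the per-position weight and does not model the adder-based checker hardware.

```python
def bit_slice(states, position):
    """Collect the bit at `position` from every state machine (16 bits in total)."""
    return [(state >> position) & 1 for state in states]


def slices_are_balanced(states, q=4):
    """Check each of the q bit positions separately for an 8-out-of-16 code."""
    half = len(states) // 2
    return all(sum(bit_slice(states, position)) == half for position in range(q))


complete_set = [9, 4, 15, 0, 7, 2, 13, 6, 11, 1, 14, 3, 8, 5, 12, 10]
print(slices_are_balanced(complete_set))

faulty_set = list(complete_set)
faulty_set[2] = 0  # fault forces one state machine into state (0, 0, 0, 0)
print(slices_are_balanced(faulty_set))
```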
The MSBs of the 16 state machines are used as input bits in this circuit. When the 16 state machines all have a different state, exactly 8 ones are contained in the 16 input bits (8-out-of-16 code). As disclosed in the literature according to the related art shown (Stroele, Tarnick), a 4-out-of-8 code is generated at the 8 outputs w′0, w′1, . . . w′7 of 304 precisely when the input has been an 8-out-of-16 code and the reduction circuit contains no errors. Input x0 generates an output x1, where x1≠x0 when no error is present. Thus, a 1-out-of-2 code is present for this first signal pair. To ensure the quality of self-testing, x0 must be changed frequently, and in addition d0 . . . d15 should not be constant either.
Sum bits of the full adders are denoted by sum_n (n=0, 1, 2, . . . ), and transfer input bits of the full adders are denoted by cin_n (n=0, 1, 2, . . . ). The transfer output bits (outputs of full adders 202), which are transferred into the next stage as signals w_n (n=0, 1, 2, . . . ), are denoted by cout_n (n=0, 1, 2, . . . ).
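A single code reducer stage built from the signals just named can be modeled in software. The sketch below assumes that adjacent input bits (d0, d1), (d2, d3), and so on are fed to the operand inputs of one full adder each; this pairing and the function names are assumptions made for illustration. It relies on the identity weight(d) + x0 = 2 * weight(w) + sum_last, which follows from adding up a + b + cin = 2 * cout + sum over all full adders of the chain.

```python
def full_adder(a, b, cin):
    """Return (sum, cout) of a one-bit full adder."""
    total = a + b + cin
    return total & 1, total >> 1


def wac(bits, x_in):
    """One weight averaging stage.

    The sum bit of each full adder feeds the transfer input of the next one;
    the transfer outputs form the reduced code word w. The last sum bit is
    returned inverted, so that (x_in, x_out) is a 1-out-of-2 pair in the
    error-free case.
    """
    assert len(bits) % 2 == 0
    cin = x_in
    w = []
    for i in range(0, len(bits), 2):
        sum_bit, cout = full_adder(bits[i], bits[i + 1], cin)
        w.append(cout)
        cin = sum_bit
    return w, 1 - cin


d = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1]  # an 8-out-of-16 word
w, x1 = wac(d, 0)
print(w, sum(w), x1)
```

For a valid input the reduced word has half the weight and half the length, and x1 differs from x0; an odd input weight makes x1 equal to x0, while an even but wrong weight is passed on as a wrong weight to the next stage.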
Lastly,
All 4-to-1 multiplexers 302 are controlled in the same way via counter bits e0 and e1, so that in each case they select the same position bit of state machine 300 as bit gi. Thus, depending on the four states of these two counter bits, a given bit is selected in each case from one of the connected 16 state machines 300, which is then processed in code reducer or WAC_16 304. In the error-free case, these inputs should correspond to an 8-out-of-16 code. The eight outputs w′0 . . . w′7 of WAC_16 result in a 4-out-of-8 code and are connected to the inputs of WAC_8 or code reducer 306. WAC_8 306 has a similar configuration to WAC_16 304, except that it has only half the number of full adders, and the last sum bit is connected to output x3 in an inverted manner. Code reducer or WAC_4 308 which is then additionally provided has only two full adders and two outputs, x6 and x7, to which the carry-out of these full adders is connected. Additional output x5 is the inverted sum output of the second full adder in code reducer or WAC_4 308.
In the error-free case, respective pairs x0 and x1, x2 and x3, x4 and x5, and x6 and x7 in each case supply a dual-rail code (or 1-out-of-2 code); i.e., a signal of these pairs is always exactly 1. It is now sufficient only to test whether this property is fulfilled for all of these signal pairs. This check is carried out in so-called two-rail code checkers TRC according to
e2 . . . e0 is an event counter which is incremented with each code check (in each case, 16 of the 64 bits are checked in four phases).
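The four-phase check can be sketched end to end as a software model. The assignment of counter bits to the stage inputs (x0=e0, x2=e1, x4=e2), the pairing of adjacent bits at each full adder, and all function names are assumptions made for this illustration; the actual wiring is defined by the drawings.

```python
def full_adder(a, b, cin):
    total = a + b + cin
    return total & 1, total >> 1  # (sum, cout)


def wac(bits, x_in):
    """One reducer stage: returns the transfer outputs and the last sum bit."""
    cin, w = x_in, []
    for i in range(0, len(bits), 2):
        cin, cout = full_adder(bits[i], bits[i + 1], cin)
        w.append(cout)
    return w, cin


def signal_pairs(d, x0, x2, x4):
    """Reduce 16 bits through WAC_16, WAC_8, and WAC_4 and return the four signal pairs."""
    w1, s1 = wac(d, x0)   # 16 -> 8 bits
    w2, s2 = wac(w1, x2)  # 8 -> 4 bits
    w3, s3 = wac(w2, x4)  # 4 -> 2 bits
    x1, x3, x5 = 1 - s1, 1 - s2, 1 - s3  # inverted last sum bits
    x6, x7 = w3
    return [(x0, x1), (x2, x3), (x4, x5), (x6, x7)]


def all_dual_rail(pairs):
    return all(a != b for a, b in pairs)


def check_states(states, counter=0):
    """Check 16 of the 64 state bits in each of four phases."""
    for _ in range(4):
        e0, e1, e2 = counter & 1, (counter >> 1) & 1, (counter >> 2) & 1
        position = e0 | (e1 << 1)  # e0, e1 select the bit position
        d = [(state >> position) & 1 for state in states]
        if not all_dual_rail(signal_pairs(d, e0, e1, e2)):
            return False
        counter = (counter + 1) % 8  # event counter e2 . . . e0
    return True


print(check_states(list(range(16))))
faulty = list(range(16))
faulty[5] = 0
print(check_states(faulty))
```

In this model all four pairs are dual-rail exactly when the selected 16 bits contain eight ones: an odd weight at any stage breaks that stage's counter pair, and an even but wrong weight propagates until the last pair (x6, x7) becomes (0, 0) or (1, 1).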
It is thus possible to check whether each of these automatic state machines has a different state at the point in time of the check, which indicates error-free functioning. In this method, however, it is possible for the check itself to allow conclusions concerning the secret states of the automatic state machine, for example by examining the power consumption of the code checker during the check. This is where the presented method comes into play.
TRC 400 forms a dual-rail output signal at an output 412 from two dual-rail-coded signals at the two inputs 402 and 404. Output 412 is also formed as a dual-rail pair when the dual-rail code of both input signal pairs at inputs 402 and 404 is not impaired and TRC 400 itself operates in an error-free manner.
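The behavior described for the two-rail checker can be illustrated with the commonly used two-rail checker cell. The gate-level structure of TRC 400 is not given in the text, so the equations below are an assumption, and the function names are hypothetical.

```python
from itertools import product


def trc(pair_a, pair_b):
    """Two-rail checker cell: combine two signal pairs into one output pair."""
    a0, a1 = pair_a
    b0, b1 = pair_b
    z0 = (a0 & b0) | (a1 & b1)
    z1 = (a0 & b1) | (a1 & b0)
    return z0, z1


def is_dual_rail(pair):
    return pair[0] != pair[1]


# Exhaustive check: the output is dual-rail exactly when both inputs are.
for a0, a1, b0, b1 in product((0, 1), repeat=4):
    output = trc((a0, a1), (b0, b1))
    expected = is_dual_rail((a0, a1)) and is_dual_rail((b0, b1))
    assert is_dual_rail(output) == expected
```

Several such cells can be connected as a tree so that a single output pair summarizes all signal pairs.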
As shown in
A code error is present when the two output signals of dual rail checker 504 are the same. The signal “error” 510 is equal to 1 and “not an error” 512 is equal to 0 as soon as the two outputs of 504 are the same. In the error-free case, 510 is equal to 0 and 512 is equal to 1. When input signals x0, x2, and x4 assume any arbitrary combination, the TRCs are self-testing. This property is ensured by counter bits e2 . . . e0 when the counter counts from 0 to 7. The code of the counter is arbitrary (binary code, Gray code, excess 3 code, counting forward or backward), provided that all allocated positions of the used bits occur in sequence. The signal “error” at output 510 of equivalence element 506 in
The code checker according to
The proposed circuit requires 14 full adders (8 GE each), 3 inverters (0.5 GE each), 16×4:1 multiplexers (7.5 GE each), 3 TRCs (4 GE each), and 2 XOR/XNOR (2.5 GE each). This is approximately 250 GE total, and thus significantly less than the above-mentioned figure of 600 GE. For 4 COSSMA structures, either 4×250=1000 GE are necessary, or the operation for the four structures is carried out successively on the same hardware and additionally requires 64×4:1 multiplexers having 480 GE, i.e., approximately 750 GE total.
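The component tally above can be reproduced as follows. The counts and per-cell costs are those given in the text; the dictionary layout and names are chosen for the example.

```python
# (count, gate equivalents per cell) for each component of the proposed circuit
components = {
    "full adder": (14, 8.0),
    "inverter": (3, 0.5),
    "4:1 multiplexer": (16, 7.5),
    "TRC": (3, 4.0),
    "XOR/XNOR": (2, 2.5),
}

total_ge = sum(count * cost for count, cost in components.values())
parallel_ge = 4 * total_ge         # four COSSMA structures, each with its own checker
shared_multiplexer_ge = 64 * 7.5   # 64 multiplexers when one checker is shared

print(total_ge, parallel_ge, shared_multiplexer_ge)
```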
In a generalization of the method, other codes which do not satisfy the condition m=n/2 may also be checked.
For the case that m≠n/2, the m-out-of-n code cannot be reduced to two bits via multiple stages as in
If m=2 and n=16, it is possible to carry out only the first stage according to
These dual-rail outputs of the customary code checkers are checked in the TRCs according to
Thus, a circuit system for checking an m-out-of-n code using a multistage code reducer is described which is suited in particular for carrying out the presented method, at least one stage of this code checker being composed of multiple full adders, and in the first stage n/2 full adders being used in which the sum bit of a full adder in each case is led to the transfer input of the next full adder, and the n/2 transfer bits are output to the n/2 full adders. In addition, it may be provided that the transfer input of the first full adder is connected to the output of a first counter bit, and the sum output of the last full adder is output, and that the first counter bit and the sum bit of the last full adder form a first signal pair.
Furthermore, it may be provided that the second stage of the code checker is composed of n/4 full adders, and that the n/2 output bits of the first stage are connected to the operand inputs of the n/4 full adders of the second stage of the code checker, the sum bits of the full adders in each case being connected to the transfer input of the next full adder, and the n/4 transfer bits being output to the n/4 full adders, a second counter bit being connected to the transfer input of the first full adder of the second stage, and this second counter bit together with the output sum bit of the last full adder of the second stage forming a second signal pair.
In addition, further stages of the code checker may be added, provided that up to only two transfer bits of two full adders which form a dual-rail signal pair (for m=n/2) may be output, or another suitable code checker is connected to one of the stages (for m≠n/2), and for the last stage, for the case that m=n/2, either a last signal pair is formed from the connected last counter bit and the sum output of the second full adder, or a code checker checks the code of the preceding stage and outputs a dual-rail signal pair.
For the signal pairs (first, second, . . . last), in each case a signal may be inverted, and modified signal pairs may thus be formed. These modified signal pairs together with the dual-rail signal pair, connected to one another, are led to a two-rail checker in such a way that a last two-rail checker outputs a signal pair which forms a 1-out-of-2 code when the code and the code checker are free of errors, and therefore a check may be made for errors in the m-out-of-n code or in the test circuit itself.
The mentioned counter bits may be varied in such a way that all states of these counter bits are incorporated during successive checking steps (of one or multiple code words), and that various code words may be selected for checking, using various counter bits.
In addition, the m-out-of-n code to be checked may be split into multiple subcodes. These subcodes may be successively checked on the same code reducer or code checker. For this purpose, the inputs of the code reducer may be switched between the various subcodes.
Alternatively, these subcodes may be simultaneously checked on different code reducers.
Thus,
Even in the first stage of code reducer 206 according to
The presented method is now based on intermixing and interchanging the input signals in an unpredictable manner. This is possible due to the fact that the code checker delivers the same result, regardless of the sequence of the input signals.
In this way it is ensured that the position of the code bits, and thus the possibly secret precursor stage, cannot be deduced by successfully analyzing the current pattern during decoding.
It is thus ensured that for any value of r, other combinations of proximity relationships in s0 through s15 result, and therefore in each case other signals are also entered in an adder of structure WAC_16.
The intermixing also acts indirectly on the proximities of the subsequent stages. Since the signals of r are not predictable and are not known to a potential attacker, the attacker is therefore also not able to carry out attacks on the output signals of the code checker stages or their internal intermediate signals. The proposed shifts are listed in Table 1 below. However, any other desired associations are also possible when in each case all bits d0 through d15 are present in bits s0 through s15 for any value of r.
Association of Input Bits d0 . . . d15 with Output Bits s0 . . . s15 as a Function of r
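The condition that all bits d0 through d15 must appear in s0 through s15 for any value of r can be illustrated with one possible association. The affine index mapping below is purely an assumption for the example and is not the association of Table 1; the 7-bit encoding of r and the function name `interchange` are likewise hypothetical.

```python
import secrets


def interchange(d, r):
    """Map input bits d0 . . . d15 to s0 . . . s15 as a function of r (0 <= r < 128).

    Hypothetical association: s[i] = d[(a * i + b) mod 16] with an odd
    multiplier a and an offset b derived from r. An odd multiplier is
    invertible modulo 16, so every input bit appears exactly once.
    """
    a = 2 * (r >> 4) + 1  # odd multiplier 1, 3, . . . , 15
    b = r & 0xF           # offset 0 . . . 15
    return [d[(a * i + b) % 16] for i in range(16)]


d = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1]  # 8-out-of-16 word
r = secrets.randbelow(128)                            # unpredictable selection value
s = interchange(d, r)

# The weight, and hence the result of the code check, does not depend on r.
print(sum(d) == sum(s))
```

Different multipliers change which input bits meet at the same full adder, while different offsets change which full adder receives a given pair.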
The presented method is usable in principle in all deterministic random bit generators which are based, for example, on a COSSMA and are therefore resistant to DPA attacks. In particular, the method may be used with nonsystematic codes. However, use with systematic codes is also conceivable when it is ensured that only the information bits are interchanged.
Thus, the method is also usable for a Berger code, for example, when only the information bits and not the check bits are interchanged in an appropriate manner. For a Berger code, the check bits represent the number of ones (binarily represented and inverted) in the information bits. When the information bits are interchanged, the number of ones at that location remains the same. Accordingly, for this code the check may also be carried out with interchanged information bits.
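The Berger code property described above can be sketched as follows, assuming 8 information bits and 4 check bits. The function names and the most-significant-bit-first ordering of the check bits are assumptions made for the example.

```python
import random


def berger_check_bits(info_bits, width):
    """Number of ones in the information bits, binarily represented and inverted."""
    ones = sum(info_bits)
    inverted = ~ones & ((1 << width) - 1)
    return [(inverted >> k) & 1 for k in reversed(range(width))]


def berger_is_valid(info_bits, check_bits):
    return berger_check_bits(info_bits, len(check_bits)) == list(check_bits)


info = [1, 0, 1, 1, 0, 0, 1, 0]
check = berger_check_bits(info, 4)

# Interchange only the information bits; the check bits keep their positions.
shuffled_info = list(info)
random.shuffle(shuffled_info)
print(berger_is_valid(shuffled_info, check))

# A unidirectional error in the information bits is still detected.
corrupted_info = list(shuffled_info)
corrupted_info[corrupted_info.index(0)] = 1
print(berger_is_valid(corrupted_info, check))
```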
For a parity code which is a systematic code, a check is made as to whether the number of ones, including the parity bit, is even or odd. Here as well, the sequence does not play a role. The bits for the parity check may be arbitrarily interchanged, and the parity bit may also be included in this interchange.
For a Hamming code, although the position of the bits does play a role, when the code check is considered as a sum of parity checks, for each parity check the bits considered in the parity check may be arbitrarily interchanged prior to the code checker. In this case, however, the parity bit is preferably not included in the interchange when an error correction is to be made, since the parity bits include information concerning the bit streams to be corrected. For security reasons (for prevention of fault attacks), however, a correction is not very meaningful. Thus, if a Hamming code is to be used only for recognizing multiple errors without correction, the interchange for each parity check, including a parity bit, is possible. It should be noted that some of the bits of the code word are included in multiple parity checks. These bits are then interchanged, possibly in a different way, for each of these checks.
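Treating the check as a set of independent parity checks can be sketched for a (7,4) Hamming code used for error detection only. The choice of this particular code, the bit position layout, and the function names are assumptions made for illustration.

```python
import random

# Bit positions (1-based) covered by each of the three parity checks.
PARITY_CHECKS = [[1, 3, 5, 7], [2, 3, 6, 7], [4, 5, 6, 7]]


def encode(d3, d5, d6, d7):
    """Place four data bits and compute the parity bits at positions 1, 2, and 4."""
    word = [0] * 7
    word[2], word[4], word[5], word[6] = d3, d5, d6, d7
    word[0] = d3 ^ d5 ^ d7
    word[1] = d3 ^ d6 ^ d7
    word[3] = d5 ^ d6 ^ d7
    return word


def parity_results(word, rng):
    """Evaluate every parity check with its own, independent interchange of the bits."""
    results = []
    for positions in PARITY_CHECKS:
        order = list(positions)
        rng.shuffle(order)  # interchange, parity bit included
        parity = 0
        for position in order:
            parity ^= word[position - 1]
        results.append(parity)
    return results


rng = random.Random()
code_word = encode(1, 0, 1, 1)
print(parity_results(code_word, rng))  # all zeros for a valid code word

faulty_word = list(code_word)
faulty_word[4] ^= 1                    # single bit error at position 5
print(any(parity_results(faulty_word, rng)))
```

A bit such as position 7 takes part in all three checks and may be placed differently in each of them.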
In this sense, code checkers for the self-test measures mentioned at the outset for a DRBG are meaningfully usable with m-out-of-n code, Berger code, parity code, and Hamming code, and the code check itself is not attackable via DPA.
One possible procedure for a Berger code is illustrated in a flow chart in
Secure checking is made possible by interchanging information bits 702 in interchanging unit 706, i.e., prior to the actual check.
A modified embodiment for a Hamming code is illustrated in
The cyclic interchange may also be used for all of the above-described interchange operations. Provided that multiple multiplexer 602 illustrated in
The cyclic interchanging according to
As already mentioned in the preceding embodiments, bits may also be added to a code word in the transfer unit. This is possible whenever a valid code word thus once again results. Thus, for a 4-out-of-8 code, for example, four ones and four zeros may be added to the code word at any arbitrary position. The resulting code word is then an 8-out-of-16 code word. For a parity code word, any arbitrary number of zeros and an even number of ones may be added, and a valid code word having a correspondingly increased bit width is obtained. For a Berger code, any arbitrary number of zeros may be added in the information part.
The above-described examples show options via which the observation of the current pattern may be made more difficult for an attacker by increasing the bit width of the code word, since the attacker is not able to distinguish between the original bits of the original code word and the additionally added bits (dummy bits). Adding code bits may take place in addition to the interchanging. Furthermore, the additionally added bits may also be interchanged, or their position may be determined as a function of nonpredictable bits.
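Adding dummy bits at unpredictable positions can be sketched for the 4-out-of-8 example mentioned above. The function name `add_dummy_bits` and the use of the operating system's random source are assumptions made for the illustration.

```python
import secrets


def add_dummy_bits(word, extra_ones, extra_zeros, rng):
    """Insert dummy ones and zeros at positions chosen by `rng`."""
    padded = list(word)
    for bit in [1] * extra_ones + [0] * extra_zeros:
        padded.insert(rng.randrange(len(padded) + 1), bit)
    return padded


rng = secrets.SystemRandom()            # nonpredictable position selection
code_word = [1, 0, 0, 1, 1, 0, 1, 0]    # 4-out-of-8 code word
padded_word = add_dummy_bits(code_word, 4, 4, rng)

# The result is an 8-out-of-16 code word regardless of the chosen positions.
print(len(padded_word), sum(padded_word))
```

For a parity code the same helper could be called with an even number of extra ones and any number of extra zeros.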
In principle, the first code word may be transferred into at least one second code word; i.e., a transfer may be made into exactly one second code word or into a number of second code words.
Number | Date | Country | Kind |
---|---|---|---|
10 2011 078 645.7 | Jul 2011 | DE | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2012/061769 | 6/20/2012 | WO | 00 | 4/17/2014 |