The invention relates to a microprocessor equipped with an arithmetic and logic unit and with a hardware security module.
Numerous attacks are possible in order to obtain information about a binary code or to cause unexpected operation of the binary code. For example, attacks known under the name “fault injection” or “fault attack” may be implemented. These attacks involve disrupting the operation of the microprocessor or of the memory containing the binary code using various physical means such as, inter alia, modifying the supply voltages, modifying the clock signal or exposing the microprocessor to electromagnetic waves.
Using such attacks, an attacker is able to alter the integrity of machine instructions or data in order for example to recover a secret key of a cryptographic system, bypass security mechanisms such as verification of a PIN code during authentication, or simply prevent the execution of a function essential to the security of a critical system.
These attacks may notably cause three types of fault, called execution faults, when the binary code is executed:
1) altering the instructions of the machine code that is executed,
2) altering the data stored in the main memory or in registers of the microprocessor, and
3) altering the control flow of the machine code.
The control flow corresponds to the execution path that is followed when the machine code is executed. The control flow is conventionally depicted in the form of a graph, known under the name “control flow graph”.
To detect such execution faults, it has already been proposed to associate an error correction code with each data item processed by the microprocessor. The error correction code associated with the result of the instruction that processes these data is then computed from the error correction codes of the processed data. In this way, if a fault occurs when this instruction is executed, the result obtained does not correspond to the computed error correction code, which allows the fault to be detected. Such a solution is for example disclosed in application FR3071082. However, the algorithm for constructing the error correction code associated with a data item is known. An attacker can therefore inject faults that modify the error correction code computed for the result so that it corresponds to the faulted result. In this case, the execution fault is not detected.
To overcome this disadvantage, it has been proposed to replace the error correction code with an integrity code. This integrity code is constructed from the data item and, in addition, using a secret key known only to the microprocessor. It is thus difficult for an attacker to modify an integrity code so that it corresponds to a faulted result, because the attacker does not know the secret key. However, it must still be possible to construct the integrity code for the result from the integrity codes associated with the processed data, without using the result of the instruction executed by the arithmetic and logic unit. For example, such a solution is described in the following article: L. De Meyer, V. Arribas, S. Nikova, V. Nikov and V. Rijmen: “M&M: Masks and Macs against physical attacks”, IACR Transactions on Cryptographic Hardware and Embedded Systems, pages 25-50, 2019. This article is subsequently denoted by the term “DEMEYER2019”. The method described in this article for computing the integrity code for the result from the integrity codes of the processed data is complex, because it relies on multiplications in a Galois field. Hardware circuits that compute multiplications in a Galois field are complex and slow. The method of DEMEYER2019 is therefore difficult to implement in a microprocessor, even for Boolean operations.
Prior art is also known from EP3457620A1 and from the following article: Savry Olivier et al.: “Confidaent: Control Flow protection with Instruction and Data Authenticated Encryption”, 2020 23rd Euromicro Conference on Digital System Design, 26/08/2020, pages 246-253.
The objective is to propose a microprocessor that has the same level of security for at least one executed arithmetic and logic operation as that described in the article of DEMEYER2019 but that is easier to produce. The invention therefore relates to such a microprocessor.
The invention will be better understood on reading the description that follows, which is given solely by way of non-limiting example, with reference to the drawings, in which:
In the figures, the same references have been used to designate elements that are the same. In the rest of this description, features and functions that are well known to those skilled in the art will not be described in detail.
In this description, the following definitions have been adopted.
A “program” designates a set of one or more predetermined functions that it is desired to have executed by a microprocessor.
A “source code” is a representation of the program in a computer language, which cannot be executed directly by a microprocessor and is intended to be converted, by a compiler, into a machine code able to be executed directly by the microprocessor.
A program or a code is said to be “able to be executed directly” or “directly executable” when it is able to be executed by a microprocessor without this microprocessor needing to compile it beforehand by way of a compiler or to interpret it by way of an interpreter.
An “instruction” denotes a machine instruction able to be executed by a microprocessor. Such an instruction consists of:
an opcode, or operation code, that codes the nature of the operation to be executed, and
one or more operands defining the value(s) of the parameters of this operation.
The registers in which the data item or data to be processed by an instruction are stored are typically identified by one or more operands of the instruction. Likewise, the register Rres−p in which the result Dres−p of the execution of an instruction needs to be stored can also be identified by an operand of this instruction.
“Logic instruction” is used to denote an instruction from the set of instructions of the microprocessor 2 that, when executed by the arithmetic and logic unit, stores the result of a Boolean operation in a register Rres−p of the microprocessor. The opcode of the logic instruction identifies the Boolean operation to be executed by the arithmetic and logic unit in order to modify or combine the data item or data D1 to Dn.
The “&” symbol is used below to generically denote a Boolean operation. Thus, the notation D1&D2& . . . &Dn generically denotes a Boolean operation executed by the microprocessor 2 between the data D1 to Dn. When n = 1, the Boolean operation is the complement operation, also known by the name “NOT”. When n is greater than or equal to two, the Boolean operation is chosen from the group made up of the following Boolean operations and compositions thereof:
the “OR” logic operation,
the “EXCLUSIVE-OR” logic operation,
the “AND” logic operation.
The following notations are used to denote Boolean operations:
the “OR” logic operation is denoted by the symbol “ + ”,
the “EXCLUSIVE-OR” logic operation is denoted by the symbol “XOR”,
the “AND” logic operation is denoted by the symbol “ . ”,
the “NOT” Boolean operation is denoted by the symbol “ ′ ” placed after the variable for which the complement is computed.
“Arithmetic instruction” is used to denote an instruction from the set of instructions of the microprocessor 2 that, when executed by the unit 10, stores the result of an arithmetic operation in a register Rres−p of the microprocessor. An arithmetic operation is different from a Boolean operation. An arithmetic operation typically belongs to the group made up of bit shift operations, bit rotation operations, addition operations, multiplication operations and division operations.
“Arithmetic and logic instruction” denotes both a logic instruction and an arithmetic instruction. Unless indicated otherwise, the term “instruction” denotes an arithmetic and logic instruction below.
A “machine code” is a set of machine instructions. It is typically a file containing a sequence of bits with the value “0” or “1”, these bits coding the instructions to be executed by the microprocessor. The machine code is able to be executed directly by the microprocessor, that is to say without the need for a preliminary compilation or interpretation.
A “binary code” is a file containing a sequence of bits bearing the value “0” or “1”. These bits code data and instructions to be executed by the microprocessor. The binary code thus comprises at least one machine code and also, in general, digital data processed by this machine code.
The expression “execution of a function” is understood to designate execution of the instructions making up this function.
A block of bits of a data item or of a variable is a group of consecutive bits of this data item or of this variable. The size of a block of bits is equal to the number of bits contained in this block.
The microprocessor 2 here comprises:
an arithmetic and logic unit 10;
a set 12 of registers;
a control module 14;
a data input/output interface 16;
an instruction loader 18 having a program counter 26;
a queue 22 of instructions to be executed; and
a hardware security module 28.
The memory 4 is configured to store the instructions of a binary code 30 of a program to be executed by the microprocessor 2. The memory 4 is a random access memory, typically a volatile memory. The memory 4 may be a memory external to the microprocessor 2, as shown in
By way of illustration, the binary code 30 notably comprises a machine code 32 of a secure function. Each secure function corresponds to a set of several lines of code, for example several hundred or thousand lines of code, stored at successive addresses in the memory 4. Each line of code corresponds here to a machine word. A line of code is thus loaded into a register of the microprocessor 2 in a single read operation. Likewise, a line of code is written to the memory 4 by the microprocessor 2 in a single write operation. Each line of code codes either a single instruction or a single data item.
By way of illustration, the microprocessor 2 is a reduced instruction set computer more commonly known by the acronym RISC.
The loader 18 loads the next instruction to be executed by the unit 10 into the queue 22 from the memory 4. More precisely, the loader 18 loads the instruction to which the program counter 26 points. To this end, the queue 22 comprises a succession of multiple registers.
The unit 10 is notably configured to execute, one after another, the instructions loaded into the queue 22. The instructions loaded into the queue 22 are generally executed in the order in which they were stored in this queue 22. The unit 10 is also capable of storing the result of these executed instructions in one or more of the registers of the set 12.
In this description, “execution by the microprocessor 2” and “execution by the unit 10” will be used synonymously.
The module 14 is configured to move data between the set 12 of registers and the interface 16. The interface 16 is notably able to acquire data and instructions, for example from the memory 4 and/or the medium 6, which are external to the microprocessor 2. Here, to speed up transfers of data and instructions between the microprocessor 2 and the memory 4, the interface 16 comprises one or more cache memories. To simplify
The module 28 is capable of automatically executing the various operations described in detail in the sections that follow in order to make the execution of the arithmetic and logic instructions by the unit 10 secure. The module 28 operates independently and without using the unit 10. It is thus capable of processing the lines of code before and/or after they are processed by the unit 10. To this end, it notably comprises a secure nonvolatile memory 29 and various hardware computation circuits. This memory 29 can only be accessed via the module 28. In this embodiment, the module 28 is configured to execute operations such as the following operations:
verifying an integrity code,
constructing an integrity code from a data item,
constructing the integrity code for a result from the integrity codes of the processed data.
Each computation circuit constructs an integrity code Cres−t for a result Dres−p from the integrity codes C1 to Cn of the data D1 to Dn processed by the unit 10, without directly using the result Dres−p. Here, there is a computation circuit 110 for the logic instructions. There is also a computation circuit for each arithmetic instruction whose execution needs to be made secure. Here, the module 28 comprises five computation circuits 120, 130, 140, 149 and 160, each associated with a respective arithmetic instruction. These computation circuits are described in more detail in section IV that follows.
The memory 29 is used to store the secret information required for the operation of the module 28. Here, it therefore notably comprises a pre-stored secret key α.
In this example of embodiment, the set 12 comprises general registers that can be used to store any type of data. The size of each of these registers is sufficient to store a data item or a result and the integrity code associated therewith.
A data interchange bus 24 that connects the various components of the microprocessor 2 to one another is shown in
The medium 6 is typically a non-volatile memory. It is for example an EEPROM or flash memory. Here, it contains a backup copy 40 of the binary code 30. It is typically this copy 40 that is automatically copied to the memory 4 to restore the code 30, for example after a power failure or the like or just before the execution of the code 30 starts.
By injecting faults while the unit 10 is operating, it is possible to disrupt its operation so that the result of the execution of the arithmetic and logic instruction does not correspond to that expected. The unit 10 is then said to have been caused to malfunction. This section describes a solution for detecting such a malfunction of the unit 10.
The registers R1 to Rn denote the registers of the set 12 containing the data D1 to Dn, respectively, to be processed when the instruction is executed. The register Rres−p denotes the register of the set 12 in which the result of the execution of the arithmetic and logic instruction is stored.
The size, in number of bits, of each data item D1, D2 and Dres−p is equal to 2^d, where d is a whole number typically greater than four or five.
The structures of the registers R1, R2 and Rres−p are identical and shown in the specific case of the register Ri in
a bit range containing the data item Di,
a range containing an integrity code Ci allowing the integrity and the authenticity of the data item Di to be checked.
The code Ci is generated by the module 28 using a pre-programmed relationship defined generically by the following relationship: Ci=Qα(Di), where:
the subscript i identifies a register among the registers R1, R2 and Rres−p, and
the function Qα is a function pre-programmed in the module 28 and configured by the secret key α.
The function Qα is defined by the following relationship: Qα(Di)=P o Fα(Di), where the symbol “o” denotes the function-composition operation. The function P is a predetermined function. In the first embodiments described below, the function P is the identity function. Thus, in these first embodiments, the function Qα is equal to the function Fα. Examples where the function P is different from the identity function are given in the section dealing with variants.
The function Fα is a homomorphism from a set A equipped with the “&” Boolean operation to a set B equipped with the same “&” Boolean operation, such that Fα(D1&D2)=Fα(D1) & Fα(D2) for every “&” Boolean operation. Here, the sets A and B are each the set of numbers that can be coded over 2^d bits, that is to say the set of possible data D1 and D2. Consequently, for any “&” Boolean operation, the circuit 110 simply computes the integrity code Cres−t associated with the result Dres−p of the Boolean operation D1 & D2 using the following relationship: Cres−t=C1 & C2. When the Boolean operation executed is the complement of the data item D1, the circuit 110 computes the code Cres−t associated with the result Dres−p using the following relationship: Cres−t=C1′, where the symbol “ ′ ” denotes the bitwise complement operation, which replaces each bit equal to “0” by “1” and each bit equal to “1” by “0”.
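The function Fα is defined by the following relationship: Fα(x)=E0 o E1 o . . . o ENbE−1(x), where:
each function Eq is a stage of transpositions that can be executed in parallel,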
NbE is the number of stages of transpositions, and
the subscript q is an order number between zero and NbE−1.
The number NbE is greater than one and less than or equal to d. Preferably, the number NbE is equal to d. In
Each stage Eq of transpositions is defined by the following relationship: Eq(x)=Tαm,q o . . . o Tαj,q o . . . o Tα1,q o Tα0,q(x), where:
x is a variable whose size, in terms of the number of bits, is equal to the size of the data item Di,
Tαj,q is a conditional transposition, configured by the parameter αj,q, that permutes two blocks of bits B2j+1,q and B2j,q of the variable x when the parameter αj,q is equal to “1” and that does not permute these two blocks of bits when the parameter αj,q is equal to “0”,
“m+1” is the total number of transpositions Tαj,q of the stage Eq,
“j” is an order number identifying the transposition Tαj,q among the other transpositions of the stage Eq. The subscript “j” therefore also identifies the position of the blocks B2j+1,q and B2j,q in the variable x. In this application, the blocks are classified in ascending order of their subscript, which depends on the value of the subscript j.
Each transposition Tαj,q is distinguished from all of the other transpositions of the function Fα by the fact that it is the only one that permutes the two blocks B2j+1,q and B2j,q when the parameter αj,q is equal to “1”. Moreover, the blocks B2j+1,q and B2j,q of all of the transpositions Tαj,q of the same stage Eq are different from one another and do not overlap. Thus, all of the transpositions Tαj,q of the stage Eq can be executed in parallel. Here, the stages Eq are executed one after the other in descending order of the subscripts q.
Moreover, this function Fα has the following characteristics:
the blocks B2j+1,q and B2j,q permuted by the transpositions Tαj,q are adjacent blocks,
the size of the blocks B2j+1,q and B2j,q is equal to 2^q,
the number m+1 of transpositions Tαj,q per stage Eq is equal to 2^(d−q−1).
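The structure of Fα can be illustrated with a minimal Python model. This is a sketch under assumptions not taken from the description: NbE = d, the key α is stored as a list `alpha` in which `alpha[q][j]` holds the parameter αj,q, and bit 0 of an integer is its least significant bit. The names `f_alpha` and `random_key` are hypothetical. Since each stage only swaps blocks of bits, Fα is a permutation of the bit positions, which is why it commutes with every bitwise Boolean operation, complement included.

```python
import random


def random_key(d, rng):
    """Draw a hypothetical key: alpha[q][j] is the parameter alpha_{j,q}.

    With NbE = d, stage q holds 2**(d - q - 1) conditional transpositions.
    """
    return [[rng.randint(0, 1) for _ in range(2 ** (d - q - 1))] for q in range(d)]


def f_alpha(x, alpha, d):
    """Apply the model of F_alpha to a 2**d-bit value x (bit 0 = least significant)."""
    for q in reversed(range(d)):  # stages are applied in descending order of q
        size = 1 << q  # blocks B_{2j,q} and B_{2j+1,q} hold 2**q bits each
        mask = (1 << size) - 1
        for j, k in enumerate(alpha[q]):
            if k:  # conditional transposition: swap only when alpha_{j,q} = 1
                lo_pos, hi_pos = 2 * j * size, (2 * j + 1) * size
                lo = (x >> lo_pos) & mask
                hi = (x >> hi_pos) & mask
                x &= ~((mask << lo_pos) | (mask << hi_pos))
                x |= (hi << lo_pos) | (lo << hi_pos)
    return x


d = 4  # 16-bit data items
full = (1 << (1 << d)) - 1
rng = random.Random(1)
alpha = random_key(d, rng)
for _ in range(1000):
    d1, d2 = rng.getrandbits(1 << d), rng.getrandbits(1 << d)
    c1, c2 = f_alpha(d1, alpha, d), f_alpha(d2, alpha, d)
    # The model of F_alpha is a homomorphism for every bitwise Boolean operation.
    assert f_alpha(d1 & d2, alpha, d) == c1 & c2
    assert f_alpha(d1 | d2, alpha, d) == c1 | c2
    assert f_alpha(d1 ^ d2, alpha, d) == c1 ^ c2
    assert f_alpha(~d1 & full, alpha, d) == ~c1 & full
```

The loop exercises the homomorphism relied on by the circuit 110 on 1000 random pairs of 16-bit data items; it is a sanity test of the model, not a proof.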
The notations B2j+1,q and B2j,q indicate that these are the (2j+1)-th and 2j-th blocks of 2^q bits, respectively, of the variable x. In
In the case of
The operation of the microprocessor 2 in order to make the execution of arithmetic and logic instructions secure will now be described in more detail with reference to
The method begins by providing, in a step 86, the binary code 30. During this step, in this example, the binary code 30 is loaded into the memory 4 from the medium 6. Next, execution of the binary code 30 by the microprocessor 2 begins.
In a step 88, each time a data item Di is stored in the cache memory 27, the module 28 computes the code Ci using the relationship Ci=Fα(Di). Next, the data item Di and the code Ci associated therewith are both stored in the memory 27.
Each time an instruction to load a data item into one of the registers Ri is executed by the unit 10, in a step 90, the data item Di and the code Ci are written to this register Ri.
Prior to the execution of an arithmetic and logic instruction between two data items D1 and D2, step 90 is executed once for the data item D1 and once for the data item D2.
Next, each time an arithmetic and logic instruction is about to be executed by the unit 10, just before it is executed, in a step 94, the module 28 checks whether there is an error in the data item Di contained in the register Ri identified by an operand of the instruction to be executed.
During this step, for each register Ri in question, the module 28 checks, using the code Ci contained in the register Ri, whether or not the data item Di currently stored in this register has an error. For example, the module 28 computes a code Ci* using the relationship Ci*=Fα(Di), without using the code Ci stored in the register Ri. If the code Ci* computed in this way is identical to the code Ci stored in the register Ri, then the integrity and authenticity of the data item Di are confirmed. In that case, the module 28 detects no error and proceeds to a step 96. Otherwise, the module 28 proceeds to a step 102.
In step 102, the module 28 triggers signalling of an execution fault.
If the module 28 detects no error, in step 96, the microprocessor 2 decodes the arithmetic and logic instruction and then the unit 10 executes it and stores its result Dres−p in the register Rres−p.
When the executed instruction is an arithmetic and logic instruction whose execution is secure, in parallel with step 96 or after the execution of step 96, in a step 98, the module 28 computes the code Cres−t by using only the codes Ci associated with the data Di processed by the unit 10 in step 96. Thus, when the data D1 and D2 are processed, the code Cres−t is computed by combining the codes C1 and C2 stored in the registers R1 and R2, respectively, prior to execution of the logic instruction.
More precisely, when the executed instruction is a logic instruction, the circuit 110 computes the code Cres−t using the following relationship: Cres−t=C1 & C2, where the “&” symbol denotes the Boolean operation executed by the unit 10 in step 96.
When the executed instruction is an arithmetic instruction for which the module 28 comprises a specific computation circuit for computing the code Cres−t, then this specific circuit is selected and the code Cres−t is computed by this specific circuit. Examples of such specific computation circuits are described in detail in the section that follows.
Next, in a step 100, the module 28 checks whether the computed code Cres−t corresponds to a code Cres−p computed from the result Dres−p stored in the register Rres−p. In the case of the circuits 110, 120, 130, 149 and 160, the code Cres−p is computed by implementing the relationship Cres−p=Fα(Dres−p). When the code Cres−t is computed by the circuit 140, the code Cres−p is equal to the result Dres−p.
Next, the module 28 compares the computed codes Cres−p and Cres−t. If these codes are different, the module 28 triggers the execution of step 102. Otherwise, this means that the code Cres−t corresponds to the code Cres−p and therefore that there was no fault during the execution of the instruction by the unit 10. In this last case, no signalling of an execution fault is triggered and the method continues with the execution of the next instruction in the queue 22.
The execution of steps 98 and 100 allows a malfunction in the unit 10 to be detected, because the computed codes Cres−p and Cres−t are identical only if the unit 10 has executed the arithmetic and logic instruction correctly. In the case of a logic instruction, this can be explained simply by the following relationship: Cres−p=Fα(Dres−p)=Fα(D1&D2)=Fα(D1) & Fα(D2)=C1 & C2=Cres−t. In the case of arithmetic instructions, this can be explained by the structure and the operations performed by the implemented computation circuit.
If the instruction executed in step 96 is the complement operation for the data item D1, in step 98, the code Cres−t is computed using the following relationship: Cres−t=C1′. The remainder of the method is then identical to what was described earlier. In the case of the complement operation, the codes Cres−p and Cres−t are identical only if the unit 10 has operated correctly. This can be demonstrated using the following relationship: Cres−p=Fα(Dres−p)=Fα(D1′)=Fα(D1)′=C1′=Cres−t.
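Steps 88 to 100 can be sketched for a two-operand logic instruction with the following Python model. It is illustrative only and rests on stated assumptions: a bit-permutation model of Fα with the hypothetical key layout `alpha[q][j]` = αj,q and NbE = d, a register modelled as a (data, code) pair, `operator.and_`, `operator.or_` or `operator.xor` standing for the “&” operation, and an execution fault simulated by XOR-ing a mask into the result produced by the unit 10. The names `load`, `check`, `execute_logic` and `FaultDetected` are hypothetical.

```python
import operator


class FaultDetected(Exception):
    """Stands in for the signalling of an execution fault (step 102)."""


def f_alpha(x, alpha, d):
    """Bit-permutation model of F_alpha; alpha[q][j] is the parameter alpha_{j,q}."""
    for q in reversed(range(d)):  # stages applied from q = d-1 down to q = 0
        size = 1 << q
        mask = (1 << size) - 1
        for j, k in enumerate(alpha[q]):
            if k:
                lo_pos, hi_pos = 2 * j * size, (2 * j + 1) * size
                lo, hi = (x >> lo_pos) & mask, (x >> hi_pos) & mask
                x &= ~((mask << lo_pos) | (mask << hi_pos))
                x |= (hi << lo_pos) | (lo << hi_pos)
    return x


def load(data, alpha, d):
    """Steps 88 and 90: a register holds the pair (D_i, C_i = F_alpha(D_i))."""
    return (data, f_alpha(data, alpha, d))


def check(reg, alpha, d):
    """Step 94: recompute C_i* from D_i and compare it with the stored C_i."""
    data, code = reg
    if f_alpha(data, alpha, d) != code:
        raise FaultDetected("corrupted operand")


def execute_logic(op, r1, r2, alpha, d, fault_mask=0):
    """Steps 94 to 100 for a two-operand logic instruction."""
    check(r1, alpha, d)
    check(r2, alpha, d)
    d_res_p = op(r1[0], r2[0]) ^ fault_mask  # step 96; fault_mask simulates a fault
    c_res_t = op(r1[1], r2[1])               # step 98: uses only C1 and C2
    if f_alpha(d_res_p, alpha, d) != c_res_t:  # step 100: C_res-p versus C_res-t
        raise FaultDetected("unit 10 malfunction")
    return (d_res_p, c_res_t)


alpha = [[1, 0, 1, 1], [0, 1], [1]]  # d = 3, i.e. 8-bit data items
r1, r2 = load(0b1100_1010, alpha, 3), load(0b0110_0110, alpha, 3)
print(execute_logic(operator.and_, r1, r2, alpha, 3)[0] == 0b0100_0010)  # prints True
try:
    execute_logic(operator.xor, r1, r2, alpha, 3, fault_mask=0b0000_1000)
except FaultDetected as exc:
    print("fault detected:", exc)  # prints "fault detected: unit 10 malfunction"
```

In this model, any non-zero `fault_mask` applied to the result is detected: by the homomorphism property, Fα(Dres−p XOR mask) equals Fα(Dres−p) XOR Fα(mask), and Fα(mask) is non-zero because Fα is a bijection.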
In response to an execution fault being signalled, in a step 104, the microprocessor 2 implements one or more countermeasures. A wide range of countermeasures are possible, with very different degrees of severity. For example, the countermeasures that are implemented may range from simply displaying or storing an error message, without interrupting the normal execution of the machine code 32, to definitively taking the microprocessor 2 out of service. The microprocessor 2 is considered to be out of service when it is definitively put into a state in which it is incapable of executing any machine code. Between these extreme degrees of severity, there are many other possible countermeasures, such as:
using a human-machine interface to indicate detection of the faults,
immediately interrupting the execution of the machine code 32 and/or reinitializing it, and
deleting the machine code 32 from the memory 4 and/or deleting the backup copy 40 and/or deleting the secret data.
The function Fα described earlier preserves bit locality. This denotes the property according to which the bits of a data item Di that are placed within the block B2j+1,q or B2j,q still remain within this block. In other words, the transpositions of the stages Eq−1 to E0 that apply to the bits of this block B2j+1,q or B2j,q cannot permute a bit of this block with a bit placed outside this block. On the other hand, the bits of the block B2j+1,q or B2j,q can be moved within this block by applying these transpositions. This stems from the fact that, for all of the stages Eq for which q is less than or equal to NbE−2 and for all of the transpositions Tαj,q of this stage, the blocks B2j+1,q and B2j,q are both placed within one and the same block Bl,q+1 of the higher stage Eq+1. It is this particular property of the function Fα that allows simple and fast computation circuits for computing the code Cres−t to be produced for each arithmetic instruction. This is illustrated below in the particular case of bit shift instructions, a rotation instruction and an addition instruction. On the basis of these examples, a person skilled in the art is capable of producing other computation circuits for computing the code Cres−t for other arithmetic instructions.
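The bit-locality property can be checked numerically with a short Python sketch. It uses a compact bit-permutation model of Fα (hypothetical key layout `alpha[q][j]` = αj,q, NbE = d) and feeds it one-hot values to find where each data bit lands; `destination` and `has_bit_locality` are illustrative names. The check confirms that, for every q, the bits of each aligned block of 2^q bits of the data item end up together in a single aligned block of 2^q bits of the code.

```python
import random


def f_alpha(x, alpha, d):
    """Bit-permutation model of F_alpha; alpha[q][j] is the parameter alpha_{j,q}."""
    for q in reversed(range(d)):
        size = 1 << q
        mask = (1 << size) - 1
        for j, k in enumerate(alpha[q]):
            if k:
                lo_pos, hi_pos = 2 * j * size, (2 * j + 1) * size
                lo, hi = (x >> lo_pos) & mask, (x >> hi_pos) & mask
                x &= ~((mask << lo_pos) | (mask << hi_pos))
                x |= (hi << lo_pos) | (lo << hi_pos)
    return x


def destination(bit, alpha, d):
    """Position in C_i reached by bit number `bit` of D_i."""
    return f_alpha(1 << bit, alpha, d).bit_length() - 1


def has_bit_locality(alpha, d):
    """True if every aligned block of 2**q data bits lands in one aligned 2**q-bit code block."""
    n = 1 << d
    dest = [destination(i, alpha, d) for i in range(n)]
    for q in range(d + 1):
        size = 1 << q
        for start in range(0, n, size):
            code_blocks = {dest[i] // size for i in range(start, start + size)}
            if len(code_blocks) != 1:
                return False
    return True


rng = random.Random(3)
for d in range(1, 6):  # data items of 2 to 32 bits
    for _ in range(50):
        key = [[rng.randint(0, 1) for _ in range(2 ** (d - q - 1))] for q in range(d)]
        assert has_bit_locality(key, d)
```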
In
The bits of the code Cres−t are denoted by the symbols a′0 to a′7, these bits a′0 to a′7 being classified in the same order as the bits of the code C1.
For each stage Eq of the function Fα, the circuit 120 comprises a corresponding stage EDq. Each stage EDq comprises a component CDαj,q for each transposition Tαj,q of the stage Eq. More precisely, each component CDαj,q is associated with a respective corresponding transposition Tαj,q. The components CDαj,q are classified in ascending order of their subscript, which varies depending on the subscript j.
For q less than NbE−1, the components CDαj,q are all structurally identical to one another. One of these components CDαj,q is shown in more detail in
The output 122 delivers the result a.k′+c.k, where a, k and c are the values received at the inputs 126, 128 and 129, respectively.
The output 123 delivers the result b.k+c.k′, where b is the value received at the input 127.
The output 124 delivers the result a.k+b.k′.
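The three expressions above mean that each output of a component CDαj,q is a 2-to-1 multiplexer controlled by the parameter k: output 122 selects c when k = 1 and a otherwise, output 123 selects b when k = 1 and c otherwise, and output 124 selects a when k = 1 and b otherwise. The following Python check, written for this explanation with a hypothetical function name `cd_outputs`, enumerates the 16 input combinations to confirm this reading.

```python
from itertools import product


def cd_outputs(a, b, c, k):
    """Outputs 122, 123 and 124 of a component CD, written as in the description."""
    nk = 1 - k  # k' (complement of k)
    out_122 = (a & nk) | (c & k)  # a.k' + c.k
    out_123 = (b & k) | (c & nk)  # b.k + c.k'
    out_124 = (a & k) | (b & nk)  # a.k + b.k'
    return out_122, out_123, out_124


for a, b, c, k in product((0, 1), repeat=4):
    # Each output simply selects one of its inputs depending on k.
    assert cd_outputs(a, b, c, k) == ((c, b, a) if k else (a, c, b))
```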
The inputs 126 and 127 of each component CDαj,0 of the stage ED0 are connected to the 2j-th and (2j+1)-th bits, respectively, of the code C1. For q greater than zero and less than NbE−1, the inputs 126 and 127 of each component CDαj,q are connected to the outputs 124 of the components CDα2j,q−1 and CDα(2j+1),q−1, respectively, of the stage EDq−1.
The input 128 of each component CDαj,q receives the parameter αj,q of the transposition Tαj,q with which it is associated.
For q greater than zero, the outputs 122 and 123 of each component CDαj,q are connected to the input 129 of the components CDα(2j+1),q−1 and CDα2j,q−1 , respectively. For q=0, the outputs 122 and 123 of each component CDαj,0 deliver the (2j+1)-th and 2j-th bits, respectively, of the code Cres−t.
The stage EDNbE−1 comprises a single component CDα0,NbE−1. The component CDα0,NbE−1 is shown in more detail in
The inputs 136 and 137 of the component CDα0,NbE−1 are connected to the outputs 124 of the components CDα0,NbE−2 and CDα1,NbE−2, respectively. The input 138 receives the parameter α0,NbE−1. The outputs 132 and 133 are connected to the inputs 129 of the components CDα1,NbE−2 and CDα0,NbE−2, respectively.
When the instruction executed in step 96 is a shift instruction for shifting one bit to the left, the circuit 120 is selected in order to perform step 98. To this end, the bits of the code C1 are delivered to the inputs 126 and 127 of the components CDαj,0. In response, the components CDαj,0 deliver the results present at their outputs 124 to the inputs 126 and 127 of the components of the higher stage ED1. This process is repeated stage by stage until the component CDα0,NbE−1 is reached. The component CDα0,NbE−1 then delivers the results a.k′ and b.k that are present at its outputs 132 and 133, respectively, to the inputs 129 of the components CDαj,NbE−2. In response, the components CDαj,NbE−2 deliver the results a.k′+c.k and b.k+c.k′ to the inputs 129 of the components of the next lower stage. The process is then repeated stage by stage until the components CDαj,0 of the stage ED0 are reached. The components CDαj,0 then deliver the various bits of the computed code Cres−t at their outputs 122 and 123. Next, the method continues with step 100.
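The two passes of the circuit 120 can be modelled in Python to check that the computed code Cres−t equals Fα applied to the data item shifted one bit to the left. This sketch rests on assumptions that are not stated explicitly in the description: NbE = d, so that the stage EDNbE−1 holds a single component; the shift is a logical shift that inserts a 0 at the least significant bit and discards the most significant bit, which is consistent with the outputs a.k′ and b.k of the component CDα0,NbE−1 (they equal the outputs 122 and 123 of an ordinary component with c = 0); and the key is stored as `alpha[q][j]` = αj,q. The names `f_alpha` and `circuit_120` are hypothetical.

```python
import random


def f_alpha(x, alpha, d):
    """Bit-permutation model of F_alpha; alpha[q][j] is the parameter alpha_{j,q}."""
    for q in reversed(range(d)):
        size = 1 << q
        mask = (1 << size) - 1
        for j, k in enumerate(alpha[q]):
            if k:
                lo_pos, hi_pos = 2 * j * size, (2 * j + 1) * size
                lo, hi = (x >> lo_pos) & mask, (x >> hi_pos) & mask
                x &= ~((mask << lo_pos) | (mask << hi_pos))
                x |= (hi << lo_pos) | (lo << hi_pos)
    return x


def circuit_120(c1, alpha, d):
    """Model of circuit 120: computes F_alpha(D1 << 1) from C1 = F_alpha(D1) only."""
    # Upward pass: record inputs (a, b) of every CD_{j,q}; output 124 is a.k + b.k'.
    inputs = []
    level = [(c1 >> i) & 1 for i in range(1 << d)]  # bits of C1 feed stage ED_0
    for q in range(d):
        pairs = [(level[2 * j], level[2 * j + 1]) for j in range(len(alpha[q]))]
        inputs.append(pairs)
        level = [a if k else b for (a, b), k in zip(pairs, alpha[q])]
    # Downward pass: the top component behaves as if its input 129 (c) were 0.
    carry = [0]
    for q in reversed(range(d)):
        nxt = [0] * (2 * len(alpha[q]))
        for j, ((a, b), k) in enumerate(zip(inputs[q], alpha[q])):
            c = carry[j]
            nxt[2 * j + 1] = c if k else a  # output 122: a.k' + c.k
            nxt[2 * j] = b if k else c      # output 123: b.k + c.k'
        carry = nxt
    # After stage ED_0, carry[i] is bit i of C_res-t.
    return sum(bit << i for i, bit in enumerate(carry))


rng = random.Random(5)
d = 3  # 8-bit data, matching the bits a'0 to a'7
mask = (1 << (1 << d)) - 1
for _ in range(200):
    alpha = [[rng.randint(0, 1) for _ in range(2 ** (d - q - 1))] for q in range(d)]
    d1 = rng.getrandbits(1 << d)
    assert circuit_120(f_alpha(d1, alpha, d), alpha, d) == f_alpha((d1 << 1) & mask, alpha, d)
```

In this model, the value on output 124 of a component CDαj,q is the most significant bit of the corresponding block of 2^(q+1) bits in the intermediate value obtained just before the stage Eq is applied, and the value on its input 129 is the bit that the shift brings into that block.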
The operation of the circuit 120 is illustrated in
The following notations are used below to describe the circuit 130. The symbols BCy,r, BCIx,r, BCRx,r and BDz,r denote the blocks of 2^r bits at the position y in the code C1, at the position x in an intermediate code CI, at the position x in the code Cres−p and at the position z in the data item D1, respectively. The subscripts y, x and z here are order numbers that begin at 0 and increase by 1 each time there is a move from one block of 2^r bits to the next block of 2^r bits, moving towards the most significant bits.
The circuit 130 comprises a higher permutator 131 and a scheduler 134. The permutator 131 computes, from the code C1, the position of the blocks of 2^r bits within the code Cres−t. To that end, here, the permutator 131 comprises stages EDr to EDNbE−1 that are identical to the stages EDr to EDNbE−1 of the circuit 120, except that they manipulate blocks of 2^r bits instead of blocks of 1 bit.
More precisely, the inputs 126 and 127 of each component CDαj,q, for q greater than or equal to r, receive blocks of 2^r bits rather than single bits. Thus, the outputs 122 and 123 of the components CDαj,r deliver to the scheduler 134 an intermediate code CI in which each block BCIx,r of 2^r bits is at the desired location, that is to say at the location that it needs to occupy in the code Cres−t.
Moreover, each block BCIx,r is identical to a corresponding block BCy,r of the code C1. The reason is that the permutator 131 is only able to move the blocks BCy,r with respect to one another in order to obtain the intermediate code CI. The permutator 131 does not permute the bits placed within a block BCy,r. The position of the block BCy,r which is identical to the block BCIx,r placed at the position x in the code CI, is denoted y below.
The order of the 2^r bits within the block BCIx,r is identical to the order of the 2^r bits within the corresponding block BCy,r. The arrangement of the 2^r bits within the block BCy,r results from application of some of the transpositions of the stages Er−1 to E0 to a corresponding block BDz,r when the code C1 is calculated. The position of the block BDz,r from which the positions of the bits placed within the block BCy,r are computed is denoted z below. The position z of the block BDz,r is not necessarily the same as the position y of the block BCy,r, because the transpositions of the stages ENbE−1 to Er may have moved this block BDz,r before the transpositions of the next stages are applied to it. The composition of the transpositions Tαj,q of the stages Er−1 to E0 applied, during construction of the code C1, to the bits placed within the block BDz,r in order to obtain the block BCy,r is denoted Fαcy. The key αcy is a subset of the key α that contains only the parameters of the transpositions of the function Fαcy. The key αcy is called the “current key” below because it is the one that explains the present arrangement of the bits within the block BCy,r. The key αcy is dependent on the position y of the block BCy,r. The parameter of the transposition Tαj,q that is contained in the current key αcy is also denoted αcj,q below.
In the code Cres−p, the arrangement of the 2^r bits within the block BCRx,r results from the application, to the corresponding block BDz,r, of some of the transpositions of the stages Er−1 to E0. The corresponding block BDz,r is the same as the one that corresponds to the block BCy,r. The composition of the transpositions Tαj,q of the stages Er−1 to E0 applied, during construction of the code Cres−p, to the bits placed within the block BDz,r in order to obtain the block BCRx,r is denoted Fαsx. The key αsx is called the “desired key” below because it is the one that determines the arrangement of the bits within the block BCRx,r. The key αsx depends on the position x of the block BCRx,r. The parameter of the transposition Tαj,q that is contained in the desired key αsx is also denoted αsj,q below.
It follows from the explanations above that the arrangement of the 2^r bits within the block BCIx,r is not necessarily identical to the arrangement of the 2^r bits within the block BCRx,r that occupies the same position x in the code Cres−t. The reason is that the current key αcy is not necessarily identical to the desired key αsx. The scheduler 134 rearranges the order of the bits within each block BCIx,r to obtain the desired block BCRx,r.
To explain this, let us suppose that the data item D1 comprises four blocks of 2^r bits each, denoted in order BD3,r, BD2,r, BD1,r and BD0,r. It is also supposed in this example that NbE = d = 4 and r = 2 and that the parameters α0,3, α1,2 and α0,2 are equal to 1, 1 and 0, respectively. The computation of the code C1 is broken down into first and second successive phases. During the first phase, the transpositions of the stages ENbE−1 to Er are applied. Thus, during this first phase, only whole blocks BDz,r of the data item D1 are permuted. In this example, at the end of this first phase, the order of the blocks BDz,r is as follows: BD0,r, BD1,r, BD3,r and BD2,r.
Next, during the second phase, the transpositions of the stages Er−1 to E0 are applied. This second phase permutes only the bits within each of the blocks BDz,r: no transposition applied during this phase moves a bit placed within a block BDz,r to another block. At the end of this second phase, the code C1 is obtained. The four blocks, of 2^r bits each, of the code C1 obtained are denoted in order BC3,r, BC2,r, BC1,r and BC0,r.
During the second phase, the bits of the block BD0,r are permuted by applying those transpositions of the stages Er−1 to E0 that apply only to the 2^r most significant bits. This is because, at the end of the first phase, the block BD0,r occupies the location of the most significant bits, that is to say the position y = 3 here. The composition of the transpositions of the stages Er−1 to E0 that apply only to the 2^r most significant bits is denoted Fαc3. Similarly, the compositions of transpositions of the stages Er−1 to E0 that permute only the bits placed within the blocks BD1,r, BD3,r and BD2,r are denoted Fαc2, Fαc1 and Fαc0, respectively. The block BC3,r of the code C1 is therefore the result of applying the function Fαc3 to the bits of the block BD0,r. Similarly, the blocks BC2,r, BC1,r and BC0,r of the code C1 are the results of applying the functions Fαc2, Fαc1 and Fαc0 to the blocks BD1,r, BD3,r and BD2,r, respectively.
Following the logic shift of 2^r bits to the left, the result Dres−p is equal to the concatenation, in order, of the blocks BD2,r, BD1,r, BD0,r and a block [0] of 2^r null bits.
Application of the function Fα to the result Dres−p in order to compute the code Cres−p is broken down, similarly, into a first phase and then a second phase. At the end of the first phase, the blocks of the result Dres−p are in the following order: [0], BD0,r, BD2,r and BD1,r. During the second phase, the functions Fαs3 to Fαs0 are applied to the blocks [0], BD0,r, BD2,r and BD1,r, respectively. Thus, the blocks BCR3,r, BCR2,r, BCR1,r and BCR0,r of the code Cres−p are the results of applying the functions Fαs3 to Fαs0 to the blocks [0], BD0,r, BD2,r and BD1,r, respectively.
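The block orderings in this example can be checked with a short Python sketch. It assumes conventions that are not spelled out in this passage: position 0 is the least significant block, and the transposition Tαj,q exchanges the blocks B2j+1,q and B2j,q when αj,q = 1. Only the first phase (stages ENbE−1 to Er, which move whole blocks of 2^r bits) is modelled, and the function name first_phase is hypothetical.

```python
def first_phase(blocks, alpha, d, r):
    """Apply the transpositions of stages E_{d-1}..E_r to whole 2**r-bit blocks.

    blocks[i] is a label for the block at position i (i = 0: least significant).
    alpha maps (j, q) to the parameter of transposition T(j, q); T(j, q)
    exchanges B(2j+1, q) and B(2j, q) when the parameter equals 1 (assumed).
    """
    blocks = list(blocks)
    for q in range(d - 1, r - 1, -1):
        half = 1 << (q - r)  # size of B(2j, q), counted in 2**r-bit blocks
        for j in range(1 << (d - q - 1)):
            if alpha[(j, q)]:
                lo = 2 * j * half
                blocks[lo:lo + half], blocks[lo + half:lo + 2 * half] = (
                    blocks[lo + half:lo + 2 * half], blocks[lo:lo + half])
    return blocks


alpha = {(0, 3): 1, (1, 2): 1, (0, 2): 0}    # parameters of the example
d1 = ["BD0", "BD1", "BD2", "BD3"]            # D1, least significant block first
dres_p = ["[0]", "BD0", "BD1", "BD2"]        # D1 shifted left by 2**r bits
print(first_phase(d1, alpha, 4, 2)[::-1])      # ['BD0', 'BD1', 'BD3', 'BD2']
print(first_phase(dres_p, alpha, 4, 2)[::-1])  # ['[0]', 'BD0', 'BD2', 'BD1']
```

The lists are reversed before printing so that they read most significant block first, as in the text.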
The intermediate code CI delivered by the permutator 131 is the concatenation, in order, of the blocks [0], BC3,r, BC0,r and BC2,r. The block BCI2,r is therefore the result of applying the function Fαc3 to the block BD0,r. The desired block BCR2,r, which occupies the same position in the code Cres−p, is the result of applying the function Fαs2 to the block BD0,r. Thus, in the case of the block BCI2,r, the role of the scheduler 134 is to cancel the application of the transpositions of the function Fαc3 and to apply the transpositions of the function Fαs2 instead, in order to obtain the desired block BCR2,r.
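The role of the scheduler 134 for one block can be stated behaviourally: because each transposition is its own inverse and the transpositions of one stage act on disjoint sub-blocks, cancelling Fαc3 amounts to running its stages in the opposite order, after which Fαs2 is applied. The Python sketch below models only this input/output behaviour, not the comparator-based structure of the scheduler; the function names and the in-block indexing (j counted from the least significant sub-block of the 2^r-bit block) are hypothetical.

```python
def apply_stages(bits, params, stages):
    """Apply the in-block transpositions of the given stages, in that order.

    bits: list of bits of one 2**r-bit block, index 0 = least significant.
    params: maps (j, q) -> 0/1; T(j, q) exchanges sub-blocks 2j+1 and 2j of size 2**q.
    """
    out = list(bits)
    for q in stages:
        size = 1 << q
        for j in range(len(out) // (2 * size)):
            if params[(j, q)]:
                lo = 2 * j * size
                out[lo:lo + size], out[lo + size:lo + 2 * size] = (
                    out[lo + size:lo + 2 * size], out[lo:lo + size])
    return out


def f_block(bits, params, r):
    """Composition of stages E_{r-1}..E_0 restricted to one block."""
    return apply_stages(bits, params, range(r - 1, -1, -1))


def f_block_inv(bits, params, r):
    """Inverse: the same involutions, stages E_0..E_{r-1}."""
    return apply_stages(bits, params, range(r))


def schedule(block_ci, current, desired, r):
    """Turn F_current(BD) into F_desired(BD) without knowing BD."""
    return f_block(f_block_inv(block_ci, current, r), desired, r)


bd0 = [1, 1, 0, 1]                        # hypothetical content of BD0,r
cur = {(0, 1): 1, (0, 0): 0, (1, 0): 1}   # hypothetical parameters of Fαc3
des = {(0, 1): 0, (0, 0): 1, (1, 0): 1}   # hypothetical parameters of Fαs2
bci2 = f_block(bd0, cur, 2)
print(schedule(bci2, cur, des, 2) == f_block(bd0, des, 2))  # True
```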
For this, for example, for each stage Er−1 to E0 of the function Fα, the scheduler 134 comprises a corresponding stage EOr−1 to EO0. Each stage EOq comprises 2^(d−q−1) comparators COj,q. Each comparator COj,q is associated with the corresponding transposition Tαj,q, which permutes the blocks BCI2j+1,q and BCI2j,q of the intermediate code CI when the value of its parameter is equal to one. The size of the blocks BCI2j+1,q and BCI2j,q is equal to 2^q. To simplify
The current key αc and the desired key αs are obtained as follows, for example. The key α is divided into 2^r blocks BKz,r of 2^r bits each. Each block BKz,r contains only the parameters αj,r−1 to αj,0 of the transpositions to be applied, when the code C1 is calculated, to the bits placed within the respective block BDz,r of the data item D1. Within each block BKz,r, the parameters αj,r−1 to αj,0 are classified in a predetermined order. For example, here, they are first classified in descending order of stages, and the various parameters αj,q of one and the same stage Eq are then classified in descending order of subscript j. The desired key αs is equal to the key α in which the parameters are classified as described above. Next, the various blocks BKz,r are permuted, for example, by a circuit identical to the permutator 131. The key containing the blocks BKz,r permuted in this way is equal to the current key αc. The parameters αcj,q and αsj,q are the parameters that occupy the same position in the current key αc and the desired key αs, respectively.
For each stage Eq of the function Fα, the circuit 140 comprises a stage ETq. Each stage ETq has four inputs 142 to 145 and three outputs 146 to 148. The input 142 receives a code CPq to be permuted. Each code CPq comprises 2^(d−q) blocks BCPj,q of 2^q bits each. These blocks BCPj,q do not overlap and are immediately consecutive. The subscript j is equal to the order number of the block BCPj,q, counting from the block BCP0,q, which contains the least significant bits.
Each input 142 is connected to the output 146 of the previous stage ETq+1, except the input 142 of the stage ETNbE−1, which receives the code C1.
The input 143 receives the parameters αj,q of the stage Eq of the function Fα. Here, to this end, the input 143 of the stage ETq is connected to the output 147 of the previous stage ETq+1.
The input 144 receives a permutation key KETq that contains only the parameters αj,q−1 to αj,0 required for implementing the transpositions of the stages Eq−1 to E0 of the function Fα. This key KETq is divided into 2^(d−q) blocks BKj,q of 2^q bits each. Each block BKj,q contains only the parameters αj,q−1 to αj,0 of the transpositions to be applied to the bits placed within the block Bj,q of the data item D1. Within each block BKj,q, the parameters αj,q−1 to αj,0 are classified in a predetermined order. For example, here, they are first classified in descending order of stages, and the various parameters of one and the same stage Eq−1 are then classified in descending order of subscript j. For example, let us suppose that q = 3; the transpositions to be applied to the block B2,3 are then, successively, the transpositions Tα2,2, Tα5,1, Tα4,1, Tα11,0, Tα10,0, Tα9,0 and Tα8,0. This follows from the organization of the transpositions Tαj,q in the function Fα described with reference to
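The ordering used for the key blocks BKj,q can be generated mechanically. The Python sketch below lists, for a block Bj,q, the transpositions of the stages Eq−1 to E0 that act inside it, in descending order of stage and then of subscript j. It relies on the convention that a transposition Tαj′,q′ exchanges the two halves of the block Bj′,q′+1; the function name is hypothetical.

```python
def transpositions_in_block(j, q):
    """Return the (j', q') pairs of the transpositions T(j', q') acting inside
    block B(j, q), ordered by descending stage q' and then descending j'."""
    order = []
    for s in range(q - 1, -1, -1):        # stages E_{q-1} down to E_0
        count = 1 << (q - s - 1)          # transpositions of stage s inside B(j, q)
        first = j * count                 # lowest subscript j' inside B(j, q)
        order += [(jj, s) for jj in range(first + count - 1, first - 1, -1)]
    return order


print(transpositions_in_block(2, 3))
# [(2, 2), (5, 1), (4, 1), (11, 0), (10, 0), (9, 0), (8, 0)]
```

For q = 3 and j = 2 it reproduces the sequence given above; in general the list holds 2^q − 1 entries.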
The input 145 receives the coefficient aq of the shift to be applied to the data item D1.
The stage ETq comprises two permutators 150 and 151, two shift registers 154 and 155 and two multiplexers 158 and 159.
The permutator 150 executes the transpositions of the stage Eq on the code received at its input 142. In other words, the permutator 150 executes the following function: Tαm,q o . . . o Tαj,q o . . . o Tα0,q(CPq), where m is equal to 2^(d−q−1) − 1.
To do this, the permutator 150 is connected to the input 142 in order to receive the code CPq to be permuted and to the input 143 in order to receive the parameters αm,q to α0,q.
The code permuted by the permutator 150 is transmitted directly to a first input of the multiplexer 158 and, in parallel, to an input of the register 154.
The register 154 performs a logic shift of 2^q bits to the left on the bits received at its input in order to obtain a permuted and shifted code, which is delivered to a second input of the multiplexer 158.
The multiplexer 158 connects its first input directly to the output 146 if the coefficient aq received at the input 145 is equal to zero. If the coefficient aq received is equal to one, the multiplexer 158 connects its second input directly to the output 146.
The permutator 151 and the register 155 are identical to the permutator 150 and the register 154, respectively. They therefore perform the same operations as the permutator 150 and the register 154, respectively, but applied to the key KETq, that is to say to the key received at the input 144.
The multiplexer 159 selects the permuted key delivered by the permutator 151 if the coefficient aq is equal to zero. Otherwise, it selects the permuted and shifted key delivered by the register 155. Moreover, using the key selected on the basis of the coefficient aq, the multiplexer 159 delivers to the output 147 the parameters αj,q−1 required for configuring the transpositions of the stage ETq−1. It also delivers the key KETq−1 to the output 148.
The circuit 140 operates as follows: the permutator 150 of the stage ETq cancels the effect of the transpositions Tαj,q of the stage Eq that were applied to the data item D1 when the code C1 was calculated. It should be remembered here that Tαj,q o Tαj,q is the identity function. Thus, in the permuted code delivered by the permutator 150, the blocks BCPj,q occupy the same positions as they had in the data item D1. However, within each block BCPj,q, the positions of the bits correspond to the result of applying the transpositions of the stages Eq−1 to E0 to the bits placed within the block Bj,q of the data item D1. The order of the bits within the blocks BCPj,q is therefore not the same as the order of the bits within the block Bj,q of the data item D1. This does not matter for the application of a logic shift of 2^q bits to the left, because such a shift moves only whole blocks of 2^q bits. Thus, applying the shift of 2^q bits in the stage ETq allows the code Cres−t to be computed without requiring:
the data item D1 to be found from the code C1, then
the various logic shifts of 2q bits to be applied to this data item D1 that has been found.
In this embodiment, the parameters αj,q−1 to αj,0 required for configuring all of the transpositions Tαj,q−1 to Tαj,0 to be applied to the bits placed within a block BCPj,q received at the input 142 are placed within the block BKj,q of the same size occupying the same position j in the key KETq received at the input 144. At the output 148, this relationship between the position of the blocks BCPj,q−1 and the position of the blocks BKj,q−1 is preserved. In other words, in the key delivered at the output 148, the block BKj,q−1 contains all of the parameters αj,q−2 to αj,0 required for configuring the transpositions to be applied to the bits placed within the block BCPj,q−1 delivered at the output 146.
To preserve this relationship, the permutator 151 and the register 155 apply the same transpositions and the same shifts, respectively, as those applied to the code CPq, but this time to the key KETq received. Next, the multiplexer 159 extracts from the selected key the parameters αj,q−1 to be delivered to the output 147. The multiplexer 159 also extracts the parameters αj,q−2 to αj,0 and generates the key KETq−1 that is delivered to the output 148. Here, the multiplexer 159 identifies the parameters αj,q to be extracted on the basis of their position in the selected key.
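Consistent with the later statement that the circuit 140 computes a code equal to the result Dres−p in the absence of a fault, the behaviour of the stages ETq can be modelled in Python. This is a functional sketch, not a description of the hardware: each block of the code travels together with the block of the key that configures its remaining transpositions (the role of the permutator 151 and the register 155), the transposition of stage Eq is undone (permutator 150), and the code is shifted by one block of 2^q bits when aq = 1 (register 154). It assumes that bit index 0 is the least significant bit and that Tαj,q exchanges B2j+1,q and B2j,q when αj,q = 1; all function names are hypothetical.

```python
import random


def swap_blocks(bits, j, q):
    """Exchange blocks B(2j+1, q) and B(2j, q) of a bit list (index 0 = LSB)."""
    size = 1 << q
    lo = 2 * j * size
    out = list(bits)
    out[lo:lo + size] = bits[lo + size:lo + 2 * size]
    out[lo + size:lo + 2 * size] = bits[lo:lo + size]
    return out


def f_alpha(bits, alpha, d):
    """F_alpha: stages E_{d-1} down to E_0."""
    for q in range(d - 1, -1, -1):
        for j in range(1 << (d - q - 1)):
            if alpha[(j, q)]:
                bits = swap_blocks(bits, j, q)
    return bits


def split_key(key, q):
    """Split the key of a 2**(q+1)-bit block into the keys of its two halves."""
    lo, hi = {}, {}
    for (j, s), v in key.items():
        if s < q:
            per_half = 1 << (q - s - 1)   # transpositions of stage s per half
            if j < per_half:
                lo[(j, s)] = v
            else:
                hi[(j - per_half, s)] = v
    return lo, hi


def circuit_140(code, alpha, a, d):
    """Stages ET_{d-1}..ET_0: undo stage q, then shift left by 2**q bits if a[q]."""
    blocks = [(list(code), dict(alpha))]  # (block bits, key of that block)
    for q in range(d - 1, -1, -1):
        nxt = []
        for bits, key in blocks:
            half = 1 << q
            lower, upper = bits[:half], bits[half:]
            key_lo, key_hi = split_key(key, q)
            if key[(0, q)]:               # undo T(0, q) on code and key alike
                lower, upper = upper, lower
                key_lo, key_hi = key_hi, key_lo
            nxt += [(lower, key_lo), (upper, key_hi)]
        if a[q]:                          # logic shift by one 2**q-bit block
            zero_key = {k: 0 for k in nxt[0][1]}
            nxt = [([0] * (1 << q), zero_key)] + nxt[:-1]
        blocks = nxt
    return [b for bits, _ in blocks for b in bits]


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


d, n = 4, 16
rng = random.Random(1)
alpha = {(j, q): rng.randint(0, 1) for q in range(d) for j in range(1 << (d - q - 1))}
d1, s = 0xBEEF, 5
a = [(s >> q) & 1 for q in range(d)]      # s = a0 + 2*a1 + 4*a2 + 8*a3
c1 = f_alpha(to_bits(d1, n), alpha, d)
print(hex(from_bits(circuit_140(c1, alpha, a, d))), hex((d1 << s) % (1 << n)))
```

In this model the code never has to be converted back into D1 before shifting, which mirrors the point made above.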
The output 146 of the stage ET0 delivers the code Cres−t to be compared with the code Cres−p in order to check, in step 100, that the execution of the shift instruction has taken place without a fault.
For each stage Eq of the function Fα, the circuit 149 comprises a corresponding stage ERq. Each stage ERq comprises a component CCαj,q for each transposition Tαj,q of the stage Eq. More precisely, each component CCαj,q is associated with a respective corresponding transposition Tαj,q.
For q less than NbE−2, each component CCαj,q is identical to the component CDαj,q of the circuit 120. The component CCα0,NbE−1 is shown in
The output 156 always delivers the same value as that received at the input 152. The output 157 always delivers the same value as that received at the input 153. In other words, the component CCα0,NbE−1 always reverses the position of the bits received at its inputs. This allows the most significant bit to be reinjected at the location intended to receive the least significant bit.
The operation of the circuit 149 can be deduced from the explanations given for the circuit 120.
The circuit 160 comprises a stage 162 of adders and a carry look-ahead unit 164.
The stage 162 comprises an adder ADp for each pair of bits ap, bp to be added. The subscript p denotes the position of the bits in the codes C1, C2 and Cres−t. These adders ADp are structurally identical to one another and differ only in the bits ap and bp that they add.
The adder ADp is shown in more detail in
The input 172 receives a carry cp to be used in the addition of the bits ap and bp.
The output 174 delivers the result ap.bp. The output 175 delivers the result ap + bp. Here, "." denotes the logical AND and "+" the logical OR. The output 176 delivers the bit sp, which the adder ADp computes using the following relationship: sp = ap XOR bp XOR cp.
The function of the unit 164 is to rapidly propagate the various carries cp to be used by the adders ADp. To that end, here, the unit 164 computes the carries cp from the information delivered at the outputs 174 and 175 of each adder ADp.
In this embodiment, the unit 164 comprises a stage EAq for each stage Eq of the function Fα. Each stage EAq comprises 2^(d−q−1) components CAαj,q, where the subscript j is the order number of the component CAαj,q in the stage EAq. Each component CAαj,q is associated with a respective transposition Tαj,q of the function Fα.
Here, the components CAαj,q are all structurally identical to one another and are distinguished only by their connection to the other components of the circuit 160.
The component CAαj,q is shown in more detail in
The output 189 delivers the result Pl.Pr. The output 190 delivers the result ci.k′ + (Gl + Pl.ci).k, where ci is the value received at the input 184. The output 191 delivers the result ci.k + (Gr + Pr.ci).k′.
The input 185 receives the parameter αj,q of the transposition Tαj,q associated with this component CAαj,q.
The inputs 180 and 181 of each component CAαj,0 of the stage EA0 are connected to the outputs 174 and 175, respectively, of the adder AD2j. The inputs 182 and 183 of each component CAαj,0 of the stage EA0 are connected to the outputs 174 and 175, respectively, of the adder AD2j+1.
The outputs 190 and 191 of each component CAαj,0 of the stage EA0 are connected to the inputs 172 of the adders AD2j and AD2j+1, respectively.
For q greater than zero:
the inputs 180 and 181 of each component CAαj,q of the stage EAq are connected to the outputs 188 and 189, respectively, of the component CAα2j,q−1 of the lower stage EAq−1,
the inputs 182 and 183 of each component CAαj,q are connected to the outputs 188 and 189 of the component CAα(2j+1),q−1 of the lower stage EAq−1,
the outputs 190 and 191 of each component CAαj,q are connected to the inputs 184 of the components CAα2j,q−1 and CAα(2j+1),q−1, respectively, of the stage EAq−1.
The outputs 188 and 189 of the component CAα0,NbE−1 are not used. The input 184 of the component CAα0,NbE−1 allows the carry computed by another circuit, for example identical to the circuit 160, to be received. This allows multiple circuits 160 to be linked to one another, so as to perform additions on data of greater size.
The unit 164 functions as a conventional carry look-ahead computation unit. Here, however, this conventional unit is modified to take account of the transpositions Tαj,q and therefore the parameters αj,q that are used for computing the code Cres−t. In summary, the components CAαj,q propagate the carry to the right when the parameter αj,q is equal to zero and to the left when the parameter αj,q is equal to one.
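The carry routing of the circuit 160 can be checked with a bit-level Python sketch. It rests on several assumptions that should be read as such: the parameter k in the formulas of the outputs 190 and 191 is taken to be the parameter αj,q received at the input 185, with k′ its complement; the inputs 180/181 are taken to carry (Gr, Pr) and the inputs 182/183 (Gl, Pl); and, since the formula of the output 188 is not reproduced in this passage, it is assumed to deliver the group generate signal Gl + Pl.Gr when k = 0 and Gr + Pr.Gl when k = 1. Under these assumptions, adding C1 = Fα(D1) and C2 = Fα(D2) bit by bit yields Fα(D1 + D2) modulo 2^(2^d). The names lookahead_carries and circuit_160 are hypothetical.

```python
import random


def f_alpha(bits, alpha, d):
    """F_alpha on a bit list (index 0 = LSB): stages E_{d-1}..E_0."""
    bits = list(bits)
    for q in range(d - 1, -1, -1):
        size = 1 << q
        for j in range(1 << (d - q - 1)):
            if alpha[(j, q)]:
                lo = 2 * j * size
                bits[lo:lo + size], bits[lo + size:lo + 2 * size] = (
                    bits[lo + size:lo + 2 * size], bits[lo:lo + size])
    return bits


def lookahead_carries(g, p, alpha, d, cin=0):
    """Carries c_p for the adders AD_p, routed through the components CA(j, q)."""
    G = {(i, -1): g[i] for i in range(1 << d)}   # level -1 = adders AD_p
    P = {(i, -1): p[i] for i in range(1 << d)}
    for q in range(d):                           # upward pass: outputs 188, 189
        for j in range(1 << (d - q - 1)):
            gr, pr = G[(2 * j, q - 1)], P[(2 * j, q - 1)]
            gl, pl = G[(2 * j + 1, q - 1)], P[(2 * j + 1, q - 1)]
            k = alpha[(j, q)]
            G[(j, q)] = (gl | (pl & gr)) if k == 0 else (gr | (pr & gl))  # assumed 188
            P[(j, q)] = pl & pr                                           # output 189
    c = {(0, d - 1): cin}                        # carry received at input 184 of the root
    for q in range(d - 1, -1, -1):               # downward pass: outputs 190, 191
        for j in range(1 << (d - q - 1)):
            ci, k = c[(j, q)], alpha[(j, q)]
            gr, pr = G[(2 * j, q - 1)], P[(2 * j, q - 1)]
            gl, pl = G[(2 * j + 1, q - 1)], P[(2 * j + 1, q - 1)]
            c[(2 * j, q - 1)] = (ci & (1 - k)) | ((gl | (pl & ci)) & k)
            c[(2 * j + 1, q - 1)] = (ci & k) | ((gr | (pr & ci)) & (1 - k))
    return [c[(i, -1)] for i in range(1 << d)]


def circuit_160(c1, c2, alpha, d):
    g = [x & y for x, y in zip(c1, c2)]          # output 174: ap.bp
    p = [x | y for x, y in zip(c1, c2)]          # output 175: ap + bp
    carries = lookahead_carries(g, p, alpha, d)
    return [x ^ y ^ z for x, y, z in zip(c1, c2, carries)]  # output 176: sp


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


d, n = 4, 16
rng = random.Random(7)
alpha = {(j, q): rng.randint(0, 1) for q in range(d) for j in range(1 << (d - q - 1))}
x, y = 0x1234, 0xF0F0
c_res_t = circuit_160(f_alpha(to_bits(x, n), alpha, d), f_alpha(to_bits(y, n), alpha, d), alpha, d)
print(c_res_t == f_alpha(to_bits((x + y) % (1 << n), n), alpha, d))
```

With every parameter equal to zero, Fα is the identity and the sketch reduces to an ordinary carry look-ahead adder.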
Variants of the Function Qα:
In the relationship Qα(Di) = P o Fα(Di), the function P is not necessarily the identity function. For example, the function P is a compression function that constructs, from the bits of the result Fα(Di), a code Ci whose size, in number of bits, is less than 2^d. The reason is that when the function P is the identity function, the size of the code Ci is equal to the size of the data item Di, that is to say 2^d bits. In some contexts, however, it is desirable to reduce the size of the code Ci, for example in order to reduce the space that it takes up in the cache memory 27. By way of illustration, to this end, the function P is the function that performs the following operations:
1) the function P divides the result Fα(Di) into two blocks P0 and P1 of bits of the same size, then
2) the function P performs an “EXCLUSIVE-OR” between the blocks P0 and P1. In this case, the size of the code Ci is halved and is equal to 2^(d−1).
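A minimal sketch of this compression function P, assuming that the code is held as a Python integer and that P0 is the low half and P1 the high half of Fα(Di) (the order does not affect an EXCLUSIVE-OR); the name compress_xor_halves is hypothetical.

```python
def compress_xor_halves(code, n):
    """P: split an n-bit value Fα(Di) into two halves and XOR them (n/2-bit result)."""
    half = n // 2
    p0 = code & ((1 << half) - 1)   # low half
    p1 = code >> half               # high half (code is assumed to fit in n bits)
    return p0 ^ p1


print(hex(compress_xor_halves(0xABCD, 16)))  # 0x66
```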
Many other compression functions P are possible. The function P can also be different from both the identity function and a compression function. For example, the function P is an encryption function or another function.
When the function P is different from the identity function, each of the computation circuits described here is broken down into a first and a second subcircuit. The first subcircuit is identical to one of the computation circuits described earlier. This first subcircuit therefore delivers a code Cres−int, which, in the absence of an execution fault, is always equal to the result Fα(Dres−p). The second subcircuit applies the predetermined function P to the code Cres−int in order to obtain the code Cres−t.
The various variants are described below in the particular case where the function P is equal to the identity function. However, these variants also apply to the case where the function P is different from the identity function.
As a variant, the transposition Tαj,q permutes the blocks B2j+1,q and B2j,q when the parameter αj,q = 0 and does not permute them when the parameter αj,q = 1.
The function Fα has been described in the particular case where the stages of transpositions first transpose the blocks of greater size and end by transposing the blocks of smaller size. However, as a variant, the stages Eq of transpositions can be executed and classified in reverse order. In this case, the transpositions of smaller size are applied first, ending by applying the transposition Tα0,NbE−1 of greater size. The order in which the various stages Eq are classified does not modify the bit locality property described earlier. Thus, even when the order of the stages Eq is reversed, it is possible to construct fast and simple computation circuits for computing the code Cres−t for arithmetic instructions.
As a variant, one or more stages of the function Fα are omitted.
Some of the transpositions Tαj,q can be omitted. In this case, at least one of the stages comprises fewer than 2^(d−q−1) transpositions Tαj,q.
Variants of the Computation Circuits for Computing the Code Cres−t:
The teaching provided here in the case of a few arithmetic instructions, such as shifts, rotations and additions, can be applied to other arithmetic instructions. In particular, it is possible to take the teaching provided in these particular cases as a basis for developing computation circuits for computing a code Cres−t for other arithmetic instructions. For example, the circuit 120 can be modified to compute the code Cres−t corresponding to a logic shift instruction for shifting 1 bit to the right. In practice, it is sufficient, for this purpose, to retain the same circuit as the circuit 120, but to send the complement of the parameter αj,q, that is to say the parameter αj,q′, to the input 128 or 138 of each of the components CDαj,q.
Equally, it is possible to construct a computation circuit for computing the code Cres−t for an arithmetic shift instruction for shifting one bit to the left or to the right. An arithmetic shift is distinguished from the logic shifts described earlier by the fact that the most significant bit remains unchanged and is therefore not shifted, unlike the other bits.
In the circuit 140, each 2^q-bit shift register can be replaced by a register that performs a rotation of 2^q bits. The circuit thus obtained computes the code Cres−t that corresponds to an instruction for rotating the bits of the data item D1 by aNbE−1·2^(NbE−1) + . . . + aq·2^q + . . . + a0 bits to the left. By replacing these registers with registers that perform a shift to the right or a rotation to the right, the circuit obtained computes the code Cres−t corresponding to an instruction for shifting or rotating by aNbE−1·2^(NbE−1) + . . . + aq·2^q + . . . + a0 bits to the right.
It is also possible to link multiple computation circuits in order to compute the code Cres−t for an operation that corresponds to the composition of multiple suboperations, for each of which there is already a computation circuit for computing the code Cres−t. For example, if the executed operation is a logic shift instruction for shifting two bits to the left, the code Cres−t is computed by applying the circuit 120 twice. To that end, the code C1 is first injected at the inputs of the circuit 120 and a first intermediate code CIres−t is obtained. Next, the code CIres−t is injected at the inputs of this same circuit 120 in order to obtain the desired code Cres−t.
Similarly, the circuits 120 and 130 can be linked in order to compute the code Cres−t for a logic shift to the left for any number of bits.
If the coefficients aq are constant, as a variant, the inputs 145 of the circuit 140 are omitted. In this case, the coefficients aq are wired inside each stage ETq. For example, the multiplexer 158 is omitted. If the value of the coefficient aq is always equal to one, then the output of the register 154 is connected directly to the output 146. The multiplexer 159 is also simplified, since it always selects the output of the register 155. Conversely, if the coefficient aq is always equal to zero, then the output of the permutator 150 is connected directly to the output 146 and the register 154 is omitted. Equally, the register 155 is omitted.
As a variant, the component CAα0,NbE−1 of the circuit 160 is devoid of the outputs 188 and 189, which are not used.
As a variant, the output 175 of the component ADp delivers the result ap XOR bp rather than the result ap + bp.
In a simplified embodiment, only the execution of some arithmetic or logic instructions is secured. For example, only the execution of one of the following instructions is secured by implementing the method described here: the bit shift instruction, the bit rotation instruction and the bit addition instruction. The execution of the other instructions, such as the logic instructions, is then not secured. This means that, for these other instructions, no code Cres−t is calculated and step 100 is omitted.
Other Variants:
The module 28 is not necessarily a single-block hardware module. As a variant, it is made up of multiple hardware submodules that each perform one of the specific functions of the module 28. These hardware submodules are then preferably embedded as close as possible to the data that they process. For example, in this case, the hardware submodule that computes the code Ci associated with each data item Di is embedded in the cache memory 27. The code Ci associated with each data item Di stored in the cache memory 27 is then computed locally in this cache memory.
As a variant, each instruction of the machine code is also associated with an integrity code Fα(Ii) computed from the value of the loaded instruction Ii. This code Fα(Ii) is verified just before the unit 10 executes the instruction Ii. This allows the signalling of an execution fault to be triggered if the instruction Ii is modified in the queue 22.
It is possible to associate the code Ci with the data item Di in various ways. For example, instead of being stored in the same register Ri as the one that contains the data item Di, the code Ci is stored in a register RCi associated with the register Ri.
The secret key α can be modified, for example, at regular intervals.
Other embodiments of step 100 are possible. For example, the module 28 computes, as in the case of the circuit 140, a code Cres−t that is equal to the result Dres−p in the absence of an execution fault. In this case, the code Cres−t is computed using the following relationship: Cres−t = Fα^−1(CIres−t), where:
the function Fα^−1 is the inverse of the function Fα, and
CIres−t is the code computed, for example, by the circuits 120, 130, 149 or 160.
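Because each transposition Tαj,q is its own inverse and the transpositions of one stage act on disjoint blocks, the inverse of Fα can be obtained by applying the same stages in the opposite order, from E0 up to ENbE−1. The Python sketch below illustrates this, assuming that bit index 0 is the least significant bit and that Tαj,q exchanges B2j+1,q and B2j,q when αj,q = 1; the function f_alpha and its inverse flag are hypothetical names.

```python
import random


def f_alpha(bits, alpha, d, inverse=False):
    """F_alpha (stages E_{d-1}..E_0) or, if inverse=True, its inverse (E_0..E_{d-1})."""
    bits = list(bits)
    stages = range(d) if inverse else range(d - 1, -1, -1)
    for q in stages:
        size = 1 << q
        for j in range(1 << (d - q - 1)):
            if alpha[(j, q)]:
                lo = 2 * j * size
                bits[lo:lo + size], bits[lo + size:lo + 2 * size] = (
                    bits[lo + size:lo + 2 * size], bits[lo:lo + size])
    return bits


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


d, n = 4, 16
rng = random.Random(3)
alpha = {(j, q): rng.randint(0, 1) for q in range(d) for j in range(1 << (d - q - 1))}
dres_p = (0x1234 << 1) % (1 << n)              # e.g. the result of a 1-bit left shift
ci_res_t = f_alpha(to_bits(dres_p, n), alpha, d)  # what CIres−t equals without a fault
c_res_t = from_bits(f_alpha(ci_res_t, alpha, d, inverse=True))
print(c_res_t == dres_p)  # True
```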
The various computation circuits for computing the code Cres−t that are described here can be implemented independently of one another.
Computing the code Ci using a secret key α makes the method for executing the machine code more robust against attempted attacks. The reason is that it is then more difficult for the attacker to falsify the code Cres−t so that it corresponds to an expected code when an execution fault has been deliberately introduced. Thus, the methods described earlier have the same advantages in terms of robustness as the one described in the article DEMEYER2019. Moreover, the use of a function Fα that has the locality property described earlier makes it possible to obtain computation circuits for computing the code Cres−t that are simpler and faster than those required for implementing the method of the article DEMEYER2019.
The circuits 120, 130, 140, 149 and 160 each allow simple and fast computation of the code Cres−t corresponding to a specific arithmetic instruction.
The circuit 110 also allows simple and fast computation of the code Cres−t for all Boolean operations.
Number | Date | Country | Kind |
---|---|---|---|
21 04898 | May 2021 | FR | national |