Embodiments of the present invention relate to a memory access that is at least partially randomized.
A first embodiment relates to a device for a memory access, the device comprising
The first component may be a processor or a controller. The second component may be another processing device, which may also be referred to as an address randomization unit.
A second embodiment relates to a second component that is arranged for
A third embodiment relates to a method for accessing a memory comprising
A fourth embodiment is directed to a computer program product directly loadable into a memory of a digital processing device, comprising software code portions for performing the steps of the method described herein.
Embodiments are shown and illustrated with reference to the drawing(s). The drawing(s) serve to illustrate the basic principle, so that only aspects necessary for understanding the basic principle are illustrated. The drawing(s) are not to scale. In the drawing(s) the same reference characters denote like features.
For many attacks on smart cards or embedded systems an attacker needs to know at what time which data value is processed. The attacker may probe the value, do a template attack or falsify the value. An example is the implementation of a cryptographic algorithm: If the attacker knows the time at which intermediate values are processed, he can perform various attacks on such intermediate values.
In order to raise the burden for an attack to be successful, the steps of an algorithm may be randomized, i.e. operations may be executed in a random order. However, not all steps may be subject to such reordering, because a certain succession may be inherent and necessary for the algorithm to work correctly and produce the results required.
There exist several algorithms, though, which have a degree of freedom with regard to the sequence in which their operations are performed. For example, in the Advanced Encryption Standard (AES), the SubBytes operation is performed on 16 bytes. The sequence in which the 16 operations are performed can be randomized, i.e. the 16 lookups may be performed in a random order. More generally, operations that could be conducted in parallel may be subject to randomization when they are performed in sequence.
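A minimal Python sketch of such software-side reordering is given below. It assumes a placeholder per-byte operation (`placeholder_op`, a hypothetical stand-in for a lookup such as SubBytes) and Python's `random.Random` as the source of the processing order; neither is part of the described embodiments.

```python
import random

def apply_in_random_order(values, op, rng):
    """Apply op to every element of values, visiting the indices in a random order."""
    order = list(range(len(values)))
    rng.shuffle(order)               # processing sequence chosen in software
    result = [None] * len(values)
    for i in order:                  # each operation is independent of the others
        result[i] = op(values[i])
    return result, order

def placeholder_op(x):
    # Hypothetical stand-in for a byte-wise lookup such as SubBytes.
    return (x * 7 + 3) & 0xFF

state = list(range(16))
shuffled_result, order = apply_in_random_order(state, placeholder_op, random.Random(1))
in_order_result = [placeholder_op(x) for x in state]
```

The result does not depend on the order in which the 16 independent operations are executed; only the point in time at which each byte is processed changes.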
A disadvantage, however, may be that the randomization for software implementations is also done in software. Hence, the random values that are used to randomize the execution sequence of a software implementation are produced by this very software. Consequently, the random values themselves are subject to the same attack as the data that needs to be protected.
For example: If an attacker probes (e.g., by putting needles on) a CPU data bus, he learns about the random values and all data that are processed in a randomized order (determined by said random values). In such a scenario, the randomization has no effect, because the attacker is aware of it and may remove its effect without significant difficulty.
Similar considerations may apply for determining a power consumption of a device under attack: If the attacker is able to build a template on the values that are processed by the CPU, this template can be used to reveal the random values as well as the protected values, which may render the randomization ineffective.
Examples presented herein in particular allow randomizing an execution sequence of a software implementation in hardware, whereby the randomization may preferably be hidden from the software. Hence, random values that are used to generate the randomization sequence may not be provided and/or processed by the software and cannot be determined by an attack directed at the software. As the software is not aware of the randomization sequence, the processor (e.g. CPU) or the software cannot leak (or supply) any information about such randomization.
It is noted that the randomization may be a pseudo-randomization. Hence, any random value may be either a true random value or it may be calculated by a deterministic mechanism. In this context, good random numbers are qualified by the property that the attacker has no advantage in guessing the next random number even if he knows all previously generated random numbers.
An advantage of this approach is that it shifts the rather difficult problem of protecting one large component (e.g., a CPU) towards protecting several components, i.e. in addition to the large component a smaller component (e.g., a random generator and/or an address randomization unit). Hence, the attacker needs to direct his efforts towards several components (i.e. the large and the smaller one) instead of only one.
The ARU 102 maps an input address a to an output address a′. The mapping may be based on at least one of the following:
The mapping may in particular be based on the address a. The configuration c may be explicitly provided or it may be hard-coded in the ARU 102. Optionally, the address a may be part of the configuration c. It is also an option that the access type t is supplied by the CPU 101 to the ARU 102.
The internal state s can be updated by different means. For example, the internal state s may be set to an initial state when loading the configuration c and it may be updated by read or write accesses of the CPU 101. It is noted that read and/or write operations may lead to different state transitions and may hence have a (different) effect on the randomization of the address a.
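The mapping of an address a to an address a′ as a function of the configuration c, the internal state s and the access type t may be modeled as in the following Python sketch. The class name, the field names and the specific state transition (only write accesses advance a counter, and a new permutation is drawn after one pass over the area) are assumptions made for illustration and are not taken from the described embodiments.

```python
import random

class AddressRandomizationUnit:
    """Toy model: a' = f(a, c, s, t) with a configured area and an internal state."""

    def __init__(self, rng):
        self.rng = rng        # stands in for a random number generator
        self.base = 0         # configuration c: start of the randomized area
        self.size = 0         # configuration c: length of the randomized area
        self.perm = []        # internal state s: current permutation
        self.writes = 0       # internal state s: write accesses seen so far

    def _reset_state(self):
        self.perm = list(range(self.size))
        self.rng.shuffle(self.perm)
        self.writes = 0

    def load_configuration(self, base, size):
        # Loading the configuration sets the internal state to an initial state.
        self.base, self.size = base, size
        self._reset_state()

    def map(self, a, access_type):
        # Addresses outside the configured area pass through unchanged.
        if not (self.base <= a < self.base + self.size):
            return a
        a_out = self.base + self.perm[a - self.base]
        # Assumed transition: reads leave the state untouched, writes advance it.
        if access_type == "write":
            self.writes += 1
            if self.writes == self.size:
                self._reset_state()
        return a_out
```

In this model, read accesses and write accesses lead to different state transitions, so the same address a may be mapped differently depending on the history of accesses.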
The configuration c of the ARU 102 may comprise at least one of the following:
At least one of the following configurations of the ARU 102 could be switched during runtime and may hence be subject to the randomization:
A sequence may be randomized in which the software performs an AES S-box operation on 16 byte values. Performing such an AES S-box operation on 16 byte values means calculating $y_i = S(x_i)$, where $x_0, \ldots, x_{15}$ are input bytes and $y_0, \ldots, y_{15}$ are output bytes, i.e. 16 operations that may be conducted in parallel.
In a common software implementation this is done by repeating the following operations for each byte:
By using the ARU 102 to randomize the sequence in which the bytes are processed, the software code may (substantially) be the same. The ARU 102 only needs to be configured before executing the code.
In this example, the ARU 102 may be configured with a read memory area of 16 bytes and a write memory area of 16 bytes. For read operations, the ARU 102 may be configured such that the next 16 read operations from the read memory area will lead to a random permutation of the 16 byte values that are stored in the read memory area. Hence, the ARU 102 may replace the addressing within the 16 bytes provided by the CPU 101 with a random permutation. In other words, without the ARU 102, the CPU 101 would use a predetermined scheme to address the 16 bytes, but this predetermined addressing scheme is randomized by the ARU 102 with the help of the random number generator 104.
For write operations to the write memory area, the ARU 102 takes the same index that has been used to read a byte from the read memory area.
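The interplay between an unchanged software loop and the address randomization may be sketched in Python as follows. The class `SboxAru`, the function `sub_bytes`, the memory layout and the use of `random.Random` in place of the random number generator are assumptions for illustration. The S-box table is computed from its standard definition (multiplicative inverse in GF(2^8) followed by the affine transformation) to keep the sketch self-contained.

```python
import random

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def _sbox_entry(a):
    inv = 0 if a == 0 else next(x for x in range(1, 256) if gf_mul(a, x) == 1)
    out = 0x63
    for s in range(5):  # inv XOR its left-rotations by 1..4 bits, XOR 0x63
        out ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
    return out

SBOX = [_sbox_entry(a) for a in range(256)]

class SboxAru:
    """Toy ARU: permutes 16 reads; each write reuses the index of the last read."""

    def __init__(self, read_base, write_base, rng):
        self.read_base, self.write_base = read_base, write_base
        self.perm = list(range(16))
        rng.shuffle(self.perm)
        self.pos = 0
        self.index = 0
        self.trace = []          # order in which the bytes were actually read

    def map(self, addr, access_type):
        if access_type == "read" and self.read_base <= addr < self.read_base + 16:
            self.index = self.perm[self.pos]
            self.pos = (self.pos + 1) % 16
            self.trace.append(self.index)
            return self.read_base + self.index
        if access_type == "write" and self.write_base <= addr < self.write_base + 16:
            return self.write_base + self.index
        return addr

def sub_bytes(memory, aru, src, dst):
    # The software addresses the bytes in a fixed order 0..15.
    for k in range(16):
        x = memory[aru.map(src + k, "read")]
        memory[aru.map(dst + k, "write")] = SBOX[x]

memory = list(range(100, 116)) + [0] * 16
aru = SboxAru(read_base=0, write_base=16, rng=random.Random(7))
sub_bytes(memory, aru, src=0, dst=16)
```

After the loop, `memory[16:32]` holds the S-box outputs at the same positions as in a non-randomized run, while `aru.trace` records the permuted order in which the bytes were actually processed.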
An AES MixColumns operation may use a specific sequence of operations on specific data (an AES state column). The following formulas describe the AES MixColumns operation on a single column.
$b_0 = 2 \cdot a_0 + 3 \cdot a_1 + a_2 + a_3$
$b_1 = a_0 + 2 \cdot a_1 + 3 \cdot a_2 + a_3$
$b_2 = a_0 + a_1 + 2 \cdot a_2 + 3 \cdot a_3$
$b_3 = 3 \cdot a_0 + a_1 + a_2 + 2 \cdot a_3$
Note that the multiplications $2 \cdot a_x$ and $3 \cdot a_x$ represent multiplications in the finite field $GF(2^8)$ and the additions represent a bitwise exclusive-OR operation (XOR). Hence, the order in which the individual operands are exclusive-OR combined is not decisive for the results of the AES MixColumns operation.
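As a worked illustration of the single-column operation, the following Python sketch evaluates the four output bytes using multiplication in GF(2^8) with the AES reduction polynomial x^8 + x^4 + x^3 + x + 1. The function names `xtime`, `mul3` and `mix_column` are chosen for this sketch only.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def mul3(a):
    """Multiply a byte by 3, i.e. (2 * a) XOR a."""
    return xtime(a) ^ a

def mix_column(a):
    """Return [b0, b1, b2, b3] for one column [a0, a1, a2, a3]."""
    a0, a1, a2, a3 = a
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]

column = [0xDB, 0x13, 0x53, 0x45]
mixed = mix_column(column)          # [0x8E, 0x4D, 0xA1, 0xBC]

# XOR is commutative and associative, so the operand order does not matter.
b0_reordered = column[3] ^ column[2] ^ mul3(column[1]) ^ xtime(column[0])
```

Because the operands are combined by XOR, `b0_reordered` equals `mixed[0]`, which is the degree of freedom that allows the individual accesses to be reordered.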
Furthermore, the operation may use a temporary state variable “TEMP” of 16 bytes to store intermediate results.
The following Verilog-like description shows an exemplary functionality of the ARU 102 when configured for such AES MixColumns operation:
Initially, a random permutation generator (RPG) in the ARU is reset to zero, i.e. all values from 0 to 15 are available for selection (line 2). Then, the first value (i) of a random permutation is generated (line 3).
An output address y is generated depending on the type of a RAM access. If a read access is generated (line 6), the ARU generates an address for a specific byte based on the random value i and a source address x provided by the software. Hence, the random value i[3:2] determines the current column, the source address x[1:0] determines the current byte of the current column and the random value i[1:0] determines the output byte address of the current column. The software may ensure that the source address is correctly generated and all four bytes of the current column are correctly multiplied and XORed together to calculate the result byte.
In case of a write access, the ARU generates the output address y, which equals the current random value i (line 8). At the same time a new random value is generated (line 10).
The operation finishes when all 16 values have been selected.
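The behavior described above may be modeled by the following Python sketch. It is not a reproduction of the Verilog-like description. In particular, the read address formation is an assumption: the sketch offsets x[1:0] by i[1:0] modulo 4, which is one way to let i[1:0] select the output byte while the software always applies the same coefficients (2, 3, 1, 1), since the MixColumns matrix is circulant. The names `MixColumnsAru` and `mix_columns`, the column-major layout (index = 4 * column + row) and the use of `random.Random` are likewise assumptions.

```python
import random

def xtime(a):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

class MixColumnsAru:
    """Toy address generator driven by a random permutation of 0..15."""

    def __init__(self, rng):
        self.rng = rng
        self.remaining = list(range(16))   # reset: all values 0..15 available
        self.i = self._next()              # first value of the permutation

    def _next(self):
        if not self.remaining:
            return None
        return self.remaining.pop(self.rng.randrange(len(self.remaining)))

    def done(self):
        return self.i is None

    def map(self, x, access_type):
        i = self.i
        if access_type == "read":
            # ASSUMPTION: column from i[3:2], byte from x[1:0] offset by i[1:0].
            return (i & 0b1100) | ((x + i) & 0b11)
        y = i                    # write: output address equals the current i
        self.i = self._next()    # a new random value is generated
        return y

def mix_columns(state, rng):
    aru = MixColumnsAru(rng)
    temp = [0] * 16              # temporary state for the results
    while not aru.done():
        r = [state[aru.map(x, "read")] for x in range(4)]
        result = xtime(r[0]) ^ xtime(r[1]) ^ r[1] ^ r[2] ^ r[3]
        temp[aru.map(0, "write")] = result
    return temp

state = [0xDB, 0x13, 0x53, 0x45, 0xF2, 0x0A, 0x22, 0x5C,
         0x01, 0x01, 0x01, 0x01, 0xC6, 0xC6, 0xC6, 0xC6]
mixed_state = mix_columns(state, random.Random(3))
```

Under these assumptions the software loop never learns the value i, yet after 16 iterations the temporary state holds the complete MixColumns result regardless of the order in which the bytes were produced.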
Essentially, the ARU functionality to perform a random read of shares from a memory may correspond to an AES S-box operation. The difference may be determined by the operation provided by the software to the data read from the memory before it is written back.
The examples suggested herein may in particular be based on at least one of the following solutions. In particular, combinations of the following features could be utilized in order to reach a desired result. The features of the method could be combined with any feature(s) of the device, apparatus or system or vice versa.
A device for a memory access is suggested, the device comprising
Hence, the overall operation is distributed between two units, i.e. the first and the second component. These may be arranged on the same die, circuit or piece of silicon. Alternatively, they may be arranged on different such pieces. By randomizing the memory access for the portion of operations (e.g., operations that may be conducted in parallel), an attacker is confronted with the additional difficulty that some memory accesses are (or may appear) arbitrary. Hence, by probing only the first component, the attacker cannot easily determine at what time which data value is processed. This increases the security of the device.
It is in particular an option that the communication between the first component and the second component has at least some protection against tampering and/or sniffing.
It is a further option that the first component and the second component are separate physical elements with a communication link between each other.
It is noted that “randomized manner” may be based on a random value. The random value mentioned herein may be an actual random value or a pseudorandom value.
In an embodiment, the device further comprises a random number generator for providing a random value to the second component for accessing the memory in the randomized manner.
In an embodiment, the second component accesses the memory in the randomized manner
Hence, the results of the portion of operations may be stored in a memory area determined by the first component. However, the sequence in which the memory is accessed is randomized via the second component, i.e. an additional piece of hardware that is separate from the software that runs on the first component.
In an embodiment, the second component comprises a random generator for providing a random value to the second component for accessing the memory in the randomized manner.
In an embodiment, the second component accesses the memory in the randomized manner
In an embodiment, the first component is a processor or a controller.
In an embodiment, the second component is an address randomization unit that is configured via the first component.
In an embodiment, the second component is configured via the first component by setting at least one of the following:
Hence, a configuration may determine an address range of the memory which is to be used by the second component. The portion of operations is conducted on this address range, wherein the second component provides the randomized access.
As an option, the randomized access conducted by the second component may be permuted.
In an embodiment, the portion of the operations are operations that can be conducted in parallel.
In an embodiment, the portion of the operations comprises one of the following:
In an embodiment, the second component comprises an internal state that is updated via a configuration provided by the first component.
In an embodiment, the internal state is set by a read or a write operation of the first component that is conducted via the second component.
In an embodiment, the second component accesses the memory in the randomized manner, wherein a randomized access of the second component to the memory is determined based on at least one of the following:
A second component is provided that is arranged for
In an embodiment, the information is used for at least a portion of operations conducted by the first component via the second component.
A method is suggested for accessing a memory, the method comprising:
The features described above apply accordingly to such a method.
In an embodiment, the method further comprises:
In an embodiment, the method further comprises:
A computer program product is provided, which is directly loadable into a memory of a digital processing device, comprising software code portions for performing the steps of the method as described herein.
In one or more examples, the functions described herein may be implemented at least partially in hardware, such as specific hardware components or a processor. More generally, the techniques may be implemented in hardware, processors, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium, i.e., a computer-readable transmission medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more central processing units (CPUs), digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a single hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Although various exemplary embodiments of the invention have been disclosed, it will be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the spirit and scope of the invention. It will be obvious to those reasonably skilled in the art that other components performing the same functions may be suitably substituted. It should be mentioned that features explained with reference to a specific figure may be combined with features of other figures, even in those cases in which this has not explicitly been mentioned. Further, the methods of the invention may be achieved either in all-software implementations, using the appropriate processor instructions, or in hybrid implementations that utilize a combination of hardware logic and software logic to achieve the same results. Such modifications to the inventive concept are intended to be covered by the appended claims.