One or more aspects of the present invention relate generally to programmable integrated circuit devices and, more particularly, to a method and apparatus for authenticating a bitstream used to configure such programmable devices.
Programmable logic devices (PLDs) exist as a well-known type of programmable integrated circuit (IC) device (“programmable device”) that may be programmed by a user to perform specified logic functions. There are different types of programmable logic devices, such as programmable logic arrays (PLAs) and complex programmable logic devices (CPLDs). One type of programmable logic device, known as a field programmable gate array (FPGA), is very popular because of a superior combination of capacity, flexibility, time-to-market, and cost.
An FPGA typically includes an array of configurable logic blocks (CLBs) surrounded by a ring of programmable input/output blocks (IOBs). The CLBs and IOBs are interconnected by a programmable interconnect structure. The CLBs, IOBs, and interconnect structure are typically programmed by loading a stream of configuration data (known as a bitstream) into internal configuration memory cells that define how the CLBs, IOBs, and interconnect structure are configured. An FPGA may also include various dedicated logic circuits, such as memories, microprocessors, digital clock managers (DCMs), and input/output (I/O) transceivers.
Programmable devices, such as FPGAs, can include decryption circuitry on-chip in order to process bitstreams that have been encrypted to provide design security. Without knowledge of the appropriate encryption key, it is difficult to analyze a bitstream in order to understand or clone the design. As a further security measure, programmable devices may include authentication logic that can be used to detect whether an encrypted bitstream (or any bitstream) has been altered.
Notably, a bitstream may include various fields that generally contain both configuration data and instructions for modifying registers in the configuration logic of the programmable device. The instructions may include setting the configuration rate, setting a startup sequence, and the like. Configuration logic in a programmable device may include various registers that implement command and control of the configuration process, where the values of such registers can be set using the instructions in the bitstream. One current bitstream authentication mechanism involves authenticating the bitstream before the device is activated, but after the instructions have been executed. That is, the instructions in the bitstream take effect as soon as they are encountered by the configuration logic. As a result, an attacker can tamper with the instructions in the bitstream in an attempt to defeat the authentication and encryption mechanisms and gain access to the design.
Accordingly, there exists a need in the art for a method and apparatus for authenticating a bitstream used to configure programmable devices that overcomes the aforementioned deficiencies.
A method of authenticating a bitstream coupled to a programmable device to configure the programmable device is described. The method can include: receiving the bitstream via a configuration port of the programmable device, the bitstream including instructions for programming configuration registers of the programmable device and at least one embedded message authentication code (MAC); initially storing at least a portion of the instructions in a memory of the programmable device without programming the configuration registers; computing, at the programmable device, at least one actual MAC based on the bitstream using a hash algorithm; comparing the at least one actual MAC with the at least one embedded MAC, respectively; and executing each instruction stored in the memory to program the configuration registers until any one of the at least one actual MAC is not the same as a corresponding one of the at least one embedded MAC, after which any remaining instructions in the memory are not executed.
In an embodiment, the at least one embedded MAC can include a single embedded MAC computed with respect to all of the instructions. The at least a portion of the instructions stored in the memory can comprise all of the instructions.
In an embodiment, the instructions can include delay-sensitive instructions and delay-insensitive instructions. The at least a portion of the instructions stored in the memory can include only the delay-insensitive instructions. The method can further include executing each delay-sensitive instruction as such instruction is received in the bitstream at the programmable device.
In an embodiment, the instructions can include head instructions and tail instructions. The bitstream can include configuration data between the head instructions and the tail instructions. The at least one embedded MAC can include a first MAC for the head instructions and a second MAC for a combination of the configuration data and the tail instructions.
In an embodiment, the bitstream includes at least one decrypt word count (DWC) that indicates a number of words respectively associated with the at least one embedded MAC. In an embodiment, the method can include decrypting at least a portion of the bitstream upon receipt from the configuration port. In an embodiment, the at least a portion of the bitstream is decrypted using a shared symmetric key.
Also disclosed is another method of authenticating a bitstream coupled to a programmable device to configure the programmable device. The method can include: receiving the bitstream via a configuration port of the programmable device, the bitstream including instructions for programming configuration registers of the programmable device and configuration data; performing at least one consistency check on the bitstream periodically; executing each of the instructions as received in the bitstream to program the configuration registers until any one of the at least one consistency check fails, after which any remaining instructions in the bitstream are not executed.
In an embodiment, performing the at least one consistency check can include: analyzing each of the instructions as received in the bitstream to detect an ill-formed instruction by comparing each instruction against a plurality of valid instructions; and indicating consistency check failure if any one of the instructions is detected as an ill-formed instruction.
In an embodiment, each of the instructions can include a corresponding checksum. Performing the consistency check(s) can include: processing the checksum for each of the instructions; and indicating consistency check failure if the checksum of any one of the instructions fails.
In an embodiment, the bitstream can include a plurality of checksums occurring periodically throughout. Performing the consistency check(s) can include: initializing a timer to a predefined value, the timer counting down towards zero; validating each checksum as each of the plurality of checksums is received; re-initializing the timer to the predefined value after each valid checksum; and indicating consistency check failure if the timer reaches zero.
In an embodiment, the bitstream can include a message authentication code (MAC) computed with respect to all of the instructions in the bitstream.
An apparatus for authenticating a bitstream coupled to a programmable device to configure the programmable device can include: a memory, a controller coupled to the memory, and an authenticator coupled to the controller. The controller can be configured to: (1) receive the bitstream via a configuration port of the programmable device, the bitstream including instructions for programming configuration registers of the programmable device and at least one embedded message authentication code (MAC); and (2) initially store at least a portion of the instructions in a memory of the programmable device without programming the configuration registers. The authenticator can be configured to: (1) compute, at the programmable device, at least one actual MAC based on the bitstream using a hash algorithm; and (2) compare the at least one actual MAC with the at least one embedded MAC, respectively. The controller can provide each instruction stored in the memory for execution to program the configuration registers until any one of the at least one actual MAC is not the same as a corresponding one of the at least one embedded MAC, after which any remaining instructions in the memory are not provided for execution.
Accompanying drawings show exemplary embodiments in accordance with one or more aspects of the invention. However, the accompanying drawings should not be taken to limit the invention to the embodiments shown, but are for explanation and understanding only.
The architecture 100 includes a large number of different programmable tiles including multi-gigabit transceivers (MGTs 101), configurable logic blocks (CLBs 102), random access memory blocks (BRAMs 103), input/output blocks (IOBs 104), configuration logic 116, clocking logic 117, digital signal processing blocks (DSPs 106), specialized input/output blocks (I/O 107) (e.g., configuration ports and clock ports), and other programmable blocks 108, such as digital clock managers, analog-to-digital converters, system monitoring logic, and so forth. The layout of the physical structures implementing the programmable logic plane 100 on the IC may be the same or similar to the layout of the logical architecture shown in
In some FPGAs, each programmable tile includes a programmable interconnect element (INT 111) having standardized connections to and from a corresponding interconnect element in each adjacent tile. Therefore, the programmable interconnect elements taken together implement the programmable interconnect structure for the illustrated FPGA. The programmable interconnect element (INT 111) also includes the connections to and from the programmable logic element within the same tile, as shown by the examples included at the top of
For example, a CLB 102 can include a configurable logic element (CLE 112) that can be programmed to implement user logic plus a single programmable interconnect element (INT 111). A BRAM 103 can include a BRAM logic element (BRL 113) in addition to one or more programmable interconnect elements. Typically, the number of interconnect elements included in a tile depends on the height of the tile. In the pictured embodiment, a BRAM tile has the same height as five CLBs, but other numbers (e.g., four) can also be used. A DSP tile 106 can include a DSP logic element (DSPL 114) in addition to an appropriate number of programmable interconnect elements. An IOB 104 can include, for example, two instances of an input/output logic element (IOL 115) in addition to one instance of the programmable interconnect element (INT 111). As will be clear to those of skill in the art, the actual I/O pads connected, for example, to the I/O logic element 115 are manufactured using metal layered above the various illustrated logic blocks, and typically are not confined to the area of the input/output logic element 115.
The FPGA architecture 100 may also include one or more dedicated processor blocks (PROC 110). The processor block 110 comprises a microprocessor core, as well as associated control logic. The processor block 110 is coupled to the programmable logic of the FPGA in a well known manner.
In the pictured embodiment, a columnar area near the center of the die (shown hatched in
Some FPGAs utilizing the architecture illustrated in
Note that
The configuration port 202 is configured to receive a bitstream for controlling the configuration circuit 208 to program the configuration memory 210. Storage locations in the configuration memory 210 control the programmable logic of the programmable device (e.g., CLBs, IOBs, etc.). The bitstream generally includes configuration data to be stored in the configuration memory 210, and instructions for execution by the configuration circuit 208 to facilitate storage of the configuration data in the configuration memory 210. The instructions may load particular values into registers 218 of the configuration circuit 208 in order to control operation of the configuration process.
The data portion 304 includes a header instruction field 310, a configuration data field 312, a tail instruction field 314, a MAC field 316, and other fields 318. The header instruction field 310 includes instruction(s) for reading and/or writing one or more of the registers 218 in the configuration circuit 208 prior to loading configuration data into the configuration memory 210. The configuration data field 312 includes the actual configuration data to be loaded into the configuration memory 210. The tail instruction field 314 includes instruction(s) for reading and/or writing one or more of the registers 218 in the configuration circuit 208 after the configuration data has been loaded into the configuration memory 210. The MAC field 316 can include a MAC code produced by a hash algorithm for a particular number of words preceding the MAC field 316 as indicated by the DWC field 306. For example, the MAC code may be a keyed-hash MAC (HMAC) code produced using a SHA-256 algorithm, although other kinds of hash algorithms can be used. The MAC code in the MAC field 316 is referred to herein as the “embedded MAC”, to distinguish it from the MAC code produced as the bitstream is processed by the authentication logic 206. The other fields 318 can include various gap, no-operation, padding, dummy, and similar fields. It is to be understood that the arrangement of the other fields 318 in the structure 300 of
Referring to
In some embodiments, the authentication circuitry 206 authenticates the bitstream by executing a hash algorithm and comparing the result to embedded MAC code(s) in the bitstream. Various embodiments of the authentication process are described below. The authentication circuitry 206 provides the bitstream to the configuration circuit 208. The configuration circuit 208 loads the configuration data into the configuration memory 210 based on the instructions that program the registers 218. If the authentication circuitry 206 determines that the bitstream is not authentic (i.e., has been altered), then the authentication circuitry 206 can signal a configuration error and stop all or a portion of the configuration process performed by the configuration circuit 208 and decryptor 204.
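The comparison of an actual MAC against an embedded MAC can be illustrated with a minimal sketch. It assumes HMAC-SHA-256 (one of the options named in this description), 32-bit words, and a word count playing the role of the DWC. The key value, the helper names `compute_mac` and `embedded_mac_matches`, and the sample payload are hypothetical and are not taken from any particular device.

```python
import hashlib
import hmac

WORD_BYTES = 4  # assumption: 32-bit bitstream words


def compute_mac(key: bytes, covered: bytes) -> bytes:
    """Return the HMAC-SHA-256 tag over the covered words."""
    return hmac.new(key, covered, hashlib.sha256).digest()


def embedded_mac_matches(key: bytes, data: bytes, word_count: int,
                         embedded_mac: bytes) -> bool:
    """Compare the embedded MAC with the MAC of the first `word_count` words."""
    covered = data[: word_count * WORD_BYTES]
    # compare_digest avoids leaking the mismatch position through timing
    return hmac.compare_digest(compute_mac(key, covered), embedded_mac)


key = b"\x00" * 32               # hypothetical shared key
payload = bytes(range(16))       # four 32-bit words of sample data
tag = compute_mac(key, payload)  # what a bitstream generator would embed
tampered = bytes([payload[0] ^ 0x01]) + payload[1:]  # one flipped bit
```

In this sketch, `embedded_mac_matches(key, payload, 4, tag)` holds for the untouched payload and fails for `tampered`, which corresponds to the configuration-error case described above.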
Exemplary embodiments relate to preventing an attacker from tampering with the bitstream instructions in order to obviate the encryption and authentication mechanisms and gain unauthorized access to the bitstream data. In one embodiment, all of the instructions in the bitstream are stored in memory prior to being sent to the configuration circuit 208. Once the full bitstream has been loaded and authenticated, then the instructions are read from the memory and executed by the configuration circuit 208.
Notably, the controller 214 can store each instruction in the bitstream in the memory 212. The configuration data can be loaded into the configuration memory 210. Once the entire bitstream has been read, the authenticator 216 executes the authentication algorithm (e.g., on the data portion 304) to obtain an actual MAC. The authenticator 216 compares the actual MAC with the embedded MAC to determine if the bitstream is authentic. If so, the controller 214 provides the instructions from the memory 212 to the configuration circuit 208 and the configuration process is allowed to finish. If the bitstream is not authentic, the controller 214 stops the configuration process and the instructions are not provided to the configuration circuit 208 for execution. Thus, instructions in a bitstream having unauthorized alterations are never executed.
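The store-then-execute behavior can be sketched in software as follows. This is only an illustration under stated assumptions: a Python list stands in for the memory 212, a callback stands in for the configuration circuit 208, HMAC-SHA-256 is assumed as the hash algorithm, and the MAC is assumed to cover only the instructions. The function name `configure_deferred` is hypothetical.

```python
import hashlib
import hmac
from typing import Callable, List


def configure_deferred(
    key: bytes,
    instructions: List[bytes],
    embedded_mac: bytes,
    execute: Callable[[bytes], None],
) -> bool:
    """Buffer every instruction, authenticate, then execute only if authentic."""
    buffered: List[bytes] = []            # stands in for the memory 212
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for instruction in instructions:
        buffered.append(instruction)      # stored, not yet executed
        mac.update(instruction)           # actual MAC accumulates as data arrives
    if not hmac.compare_digest(mac.digest(), embedded_mac):
        return False                      # configuration stops; nothing was executed
    for instruction in buffered:
        execute(instruction)              # stands in for the configuration circuit 208
    return True
```

The point of the sketch is the ordering: no call to `execute` occurs before the comparison, so an altered instruction never takes effect.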
In some cases, the instructions in the bitstream may include one or more instructions that must be executed as the configuration data is loaded to the configuration memory 210. For example, an instruction to change or set the configuration clock frequency may need to be executed by the configuration circuit 208 prior to loading the configuration data into the configuration memory 210. Such instructions are referred to herein as “delay-sensitive” instructions, since execution of such instructions cannot be delayed.
Thus, in another embodiment, the controller 214 only stores in the memory 212 those instructions that are not delay-sensitive. The delay-sensitive instructions are forwarded to the configuration circuit 208 as they are received in the bitstream. In this embodiment, the delay-sensitive instructions may be moved outside the encrypted portion of the bitstream (i.e., in the plain text portion of the bitstream before the DWC field).
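A rough sketch of this routing is given below. The instruction representation (an opcode name paired with an operand) and the opcode names such as `SET_CCLK_FREQ` are hypothetical placeholders chosen for illustration; real configuration instructions are encoded differently.

```python
from typing import Callable, List, Set, Tuple

Instruction = Tuple[str, int]  # hypothetical (opcode name, operand) pair

# Hypothetical set of opcodes that cannot wait for authentication
DELAY_SENSITIVE: Set[str] = {"SET_CCLK_FREQ"}


def route(
    instructions: List[Instruction],
    execute_now: Callable[[Instruction], None],
) -> List[Instruction]:
    """Forward delay-sensitive instructions at once; return the rest for storage."""
    deferred: List[Instruction] = []
    for instruction in instructions:
        if instruction[0] in DELAY_SENSITIVE:
            execute_now(instruction)      # forwarded as received
        else:
            deferred.append(instruction)  # held until authentication completes
    return deferred
```

The returned list corresponds to the instructions that would be held in the memory 212 until the MAC comparison succeeds.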
In the embodiments above, a single MAC code is embedded in the bitstream and is used to authenticate the entire bitstream. In another embodiment, the bitstream may be configured with multiple MAC code fields each having a MAC code for a different portion of the bitstream. In general, the configuration plane 200 reads instructions from the bitstream, authenticates the portion of the bitstream read so far, executes the instructions if authentic, reads additional instructions, and then repeats.
If header instruction field 310 does not fit in memory 212, the field 310 may be split into smaller pieces, each including a portion of the header instruction field and an embedded MAC for that portion. A separate DWC may be included with each smaller piece of the field 310. Thus, section 512 may be repeated any number of times. In particular, the memory 212 may be sized to only contain one instruction. Alternatively, the memory 212 may be sized to contain one decrypted block of data (e.g., 128 bits for AES).
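The read, authenticate, execute, repeat cycle for a bitstream carrying several embedded MACs can be sketched as follows. The sketch assumes each piece carries its own HMAC-SHA-256 tag computed over that piece alone; whether a real device would instead authenticate cumulatively is not settled here. The names `seal` and `configure_segmented` and the tuple layout are hypothetical.

```python
import hashlib
import hmac
from typing import Callable, List, Tuple

Segment = Tuple[List[bytes], bytes]  # (instructions, embedded MAC for them)


def seal(key: bytes, instructions: List[bytes]) -> Segment:
    """Build a segment the way a bitstream generator might (illustrative only)."""
    tag = hmac.new(key, b"".join(instructions), hashlib.sha256).digest()
    return (instructions, tag)


def configure_segmented(
    key: bytes,
    segments: List[Segment],
    execute: Callable[[bytes], None],
) -> int:
    """Execute segments in order; stop at the first MAC mismatch.

    Returns the number of segments whose instructions were executed.
    """
    done = 0
    for instructions, embedded in segments:
        actual = hmac.new(key, b"".join(instructions), hashlib.sha256).digest()
        if not hmac.compare_digest(actual, embedded):
            break                      # remaining instructions are never executed
        for instruction in instructions:
            execute(instruction)
        done += 1
    return done
```

Only one segment needs to be held at a time, which is why a small memory suffices. Note that independent per-segment tags, as assumed here, would not by themselves detect reordered segments; a sequence number inside the authenticated data is one common remedy.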
Returning to
In the embodiment above, two MAC codes were embedded in the bitstream. It is to be understood that more than two MAC codes can be embedded in the bitstream for different fields or combinations of fields. Further, the structure 500 in
If at step 608 the authenticator 216 computes an actual MAC that matches the embedded MAC, the authenticator 216 signals no error to the controller 214. At step 612, the controller 214 forwards the instructions to the configuration circuit 208 for execution. At step 614, the controller 214 determines whether the bitstream includes additional instructions and/or configuration data. If so, the method 600 returns to step 602. Otherwise, the method 600 ends. Although the steps of the method 600 are shown sequentially, it is to be understood that some steps may be performed contemporaneously with other steps. For example, while the instructions are being executed at step 612, additional instructions can be stored in the memory at step 602.
If at step 710 the authenticator 216 computes an actual MAC that matches the embedded MAC, the authenticator 216 signals no error to the controller 214. At step 714, the controller 214 forwards the instructions to the configuration circuit 208 for execution. At step 716, the controller 214 determines whether the bitstream includes additional instructions and/or configuration data. If so, the method 700 returns to step 702. Otherwise, the method 700 ends. Although the steps of the method 700 are shown sequentially, it is to be understood that some steps may be performed contemporaneously with other steps. For example, while the instructions are being executed at step 714, additional instructions can be read and stored in the memory at steps 702 and 704.
In the embodiments described above, the bitstream is authenticated using one or more embedded MAC codes and a corresponding hash algorithm. As noted, in case of a single MAC code, one or more unauthorized instructions can be executed before the bitstream is found to be inauthentic. In another embodiment, the configuration plane 200 can stop the configuration process upon detection of an ill-formed instruction. Such detection can be used to combat a “flipped-bit” attack that scrambles one block before flipping the bits in a following block.
For example, the bitstream can be formatted according to the structure 300. As instructions are received in the bitstream, the controller 214 can analyze the instructions to detect an ill-formed instruction. For example, the controller 214 may include a list of valid instructions. If an attacker scrambles bits in the encrypted bitstream to employ a flipped-bit attack, then a decryption of the scrambled bits may result in a set of data that does not result in a valid instruction. Once such an invalid instruction is detected, the controller 214 can stop the configuration process. In this manner, an altered bitstream can be detected before the MAC authentication is performed. The actual probability of successfully detecting an ill-formed instruction is based on the probability that a random pattern of bits will result in an ill-formed instruction given a particular set of bit patterns resulting in valid instructions.
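The check against a list of valid instructions, and the resulting detection probability, can be illustrated with a short sketch. The opcode position (top byte of a 32-bit word) and the set of valid opcodes are assumptions made up for the example, as are the function names.

```python
from typing import Iterable, Set

VALID_OPCODES: Set[int] = {0x01, 0x02, 0x03, 0x04}  # hypothetical valid set


def opcode_of(word: int) -> int:
    """Extract the opcode; assumes it occupies the top byte of a 32-bit word."""
    return (word >> 24) & 0xFF


def first_ill_formed(words: Iterable[int]) -> int:
    """Return the index of the first ill-formed instruction, or -1 if none."""
    for index, word in enumerate(words):
        if opcode_of(word) not in VALID_OPCODES:
            return index
    return -1


def detection_probability(valid_count: int, opcode_bits: int) -> float:
    """Chance that a uniformly random opcode field is not a valid opcode."""
    return 1.0 - valid_count / (1 << opcode_bits)
```

With 4 valid opcodes in an 8-bit field, `detection_probability(4, 8)` evaluates to 0.984375, so a scrambled word would be caught about 98.4% of the time under these assumed parameters. A denser opcode space lowers that figure, which motivates the checksum variants.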
In another embodiment, a parity bit, instruction sequence field, or checksum can be added to each instruction in the bitstream. The checksum can be a cyclic redundancy checksum (CRC). For example, a 32-bit CRC value can be added for each instruction, increasing the probability of detecting an ill-formed instruction by 2^32 (two to the 32nd power). Incorporation of a 32-bit CRC for each instruction requires an attacker to try approximately 2^31 changes to the bitstream to get a fake instruction into the bitstream. It is to be understood that more or fewer than 32 CRC bits can be used for each instruction in the bitstream. In some cases, one or more instructions may already include unused bits that can be re-purposed for use as CRC bits, parity, or a sequence field.
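A per-instruction CRC can be sketched with the CRC-32 routine in Python's standard `zlib` module. The framing (a 4-byte big-endian CRC appended to each instruction) and the sample instruction bytes are assumptions for illustration; an actual bitstream format may place the check bits elsewhere or use a different polynomial.

```python
import struct
import zlib


def append_crc(instruction: bytes) -> bytes:
    """Return the instruction followed by its 32-bit CRC (big-endian)."""
    return instruction + struct.pack(">I", zlib.crc32(instruction))


def crc_ok(framed: bytes) -> bool:
    """Check that the trailing 4 bytes match the CRC of the preceding bytes."""
    body, stored = framed[:-4], framed[-4:]
    return struct.unpack(">I", stored)[0] == zlib.crc32(body)


framed = append_crc(b"\x30\x00\x80\x01")          # hypothetical instruction word
flipped = bytes([framed[0] ^ 0x80]) + framed[1:]  # single flipped bit
```

Here `crc_ok(framed)` holds and `crc_ok(flipped)` does not. A block scrambled into effectively random bits would pass a 32-bit check only with probability 2^-32.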
In another embodiment, a checksum such as a CRC can be added to the bitstream periodically after a predefined number of words. For example, a checksum can be inserted after every four words in the bitstream. The AES encryption standard algorithm decrypts 128-bit blocks. If the checksum check is the first 32-bit instruction in a 4-word block, this will catch an attack before a hacked instruction can execute, with the attack escaping detection with a probability of only 1:2^32. A smaller CRC, for example 24 bits, has the advantage of being smaller and using less memory, but gives a lower probability of detecting a hacked instruction sequence. Since the flipped-bit attack scrambles one block before flipping the bits in the following block, a scrambled block will fail the checksum check before any hacked instruction can execute in the configuration circuit 208. To ensure that the checksum check gets executed, a watchdog counter can be employed (e.g., in the controller 214) that counts down with each instruction executed since the last successful checksum check. Every time the checksum check succeeds, the counter can be reset. If the counter reaches zero, the controller 214 can stop the configuration process, since a checksum check has not been completed due to an altered bitstream.
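The interaction between periodic checksums and the watchdog can be sketched as below. The sketch makes several assumptions that differ in detail from the description: each checksum is a CRC-32 over the instructions received since the previous checksum, those instructions are held until the checksum validates, and the stream is modeled as tagged items. The names `run_with_watchdog` and `crc_item` and the status strings are hypothetical.

```python
import struct
import zlib
from typing import Callable, List, Tuple

Item = Tuple[str, bytes]  # ("ins", payload) or ("crc", 4-byte checksum)


def crc_item(instructions: List[bytes]) -> Item:
    """Build the checksum item covering a group of instructions."""
    return ("crc", struct.pack(">I", zlib.crc32(b"".join(instructions))))


def run_with_watchdog(
    items: List[Item],
    limit: int,
    execute: Callable[[bytes], None],
) -> str:
    """Process items; return 'ok', 'bad-checksum', or 'watchdog'."""
    remaining = limit              # watchdog counter, counts down toward zero
    pending: List[bytes] = []      # instructions awaiting a valid checksum
    for kind, payload in items:
        if kind == "crc":
            if payload != crc_item(pending)[1]:
                return "bad-checksum"
            for instruction in pending:
                execute(instruction)
            pending.clear()
            remaining = limit      # re-initialize after each valid checksum
        else:
            pending.append(payload)
            remaining -= 1
            if remaining == 0:     # no valid checksum arrived in time
                return "watchdog"
    return "ok"                    # any unverified trailing items never ran
```

In this model `limit` must exceed the number of instructions between checksums, so a group of three instructions needs a limit of at least four. Stripping the checksum items from the stream then trips the watchdog before anything executes.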
In the embodiments described above, the authenticator 216 can perform the checksum checks on the bitstream. The authenticator 216 can inform the controller 214 in case of any failed checksum and the controller 214 can stop the configuration process in case of a failed checksum. Either the authenticator 216 or the controller 214 can implement the watchdog timer in cases where the bitstream includes periodic checksums.
While the foregoing describes exemplary embodiments in accordance with one or more aspects of the present invention, other and further embodiments in accordance with the one or more aspects of the present invention may be devised without departing from the scope thereof, which is determined by the claims that follow and equivalents thereof. Claims listing steps do not imply any order of the steps. Trademarks are the property of their respective owners.