The instant application, application Ser. No. 14/964,424, is a reissue of application Ser. No. 09/932,247, filed on Aug. 16, 2001, now U.S. Pat. No. 6,996,725.
The principles of the present invention generally relate to processors, and more particularly, by way of example but not limitation, to security microcontrollers with encryption features.
Electronic devices are a vital force for creating and perpetuating the engine that drives today's modern economy; concomitantly, electronic devices improve the standard of living of people in our society. Furthermore, they also play an important role in providing entertainment and other enjoyable diversions. A central component of many of these electronic devices is the processing unit. Processing units may be broadly divided into two categories: (i) processors used as central processing units (CPUs) of (e.g., personal) computers and (ii) embedded processors (a.k.a. microcontrollers, microprocessors, etc.) (e.g., processors operating in cars, microwaves, wireless phones, industrial equipment, televisions, other consumer electronic devices, etc.). Although CPUs of computers garner the lion's share of reports and stories presented by the popular press, they account for less than 1% of all processors sold, while microcontrollers account for greater than 99% of all processors sold. Consequently, significant time and money are also expended on research and development to improve the efficiency, speed, security, feature set, etc. of microcontrollers. These aspects of microcontrollers may be improved, individually or in combination, by improving one or more of the individual aspects of which microcontrollers are composed. Exemplary relevant aspects of microcontrollers include, but are not limited to: processing core, memory, input/output (I/O) capabilities, security provisions, clocks/timers, program flow flexibility, programmability, etc.
Many uses of microcontrollers involve a need for security. The security may be related to the executable code, the data being manipulated, and/or the functioning of the microcontroller. For example, it is important to guard against the possibility of someone monitoring program code and deciphering communication protocols and/or information that are communicated between an automated teller machine (ATM) and the computer at an associated bank. Criminals, hackers, and mischief makers continually attempt to thwart and break existing security measures. Conventional, relatively relaxed and crackable, standards and approaches lead to security deficiencies because it is fairly easy to decipher or to access conventional systems and interfaces, and then control or otherwise jeopardize the microcontroller's mission. One technique for cracking a microcontroller's security is to monitor information entering the microcontroller and information exiting the microcontroller in order to effectively reverse engineer the program code by revealing what information may be effectuating the execution of a given instruction. It is therefore apparent that newer and stricter security measures are needed to safeguard against information theft, corruption, and/or misuse.
The deficiencies of the prior art are overcome by the methods, systems, and arrangements of the present invention. For example, as heretofore unrecognized, it would be beneficial to utilize block decryption after several reads from external memory. In fact, it would be beneficial if multiple executable instructions were decrypted simultaneously and accessible to a core processor/instruction decoder via, e.g., a decryption buffer, a cache, etc.
In certain embodiment(s), multiple instructions are loaded into a buffer from an external memory. The multiple instructions are decrypted at least substantially simultaneously and then made available to processing core and/or instruction decoder. An instruction desired by the processing core/instruction decoder may be routed directly thereto, or all or a portion of the multiple instructions may be first transferred to a cache. Also in certain embodiment(s), “n” bytes (e.g., an exemplary eight bytes) of encrypted information may be loaded byte-by-byte (or in chunks of multiple bytes) into an “n”-byte-wide encrypted buffer from an external memory. The “n” encrypted bytes may then be jointly decrypted using a designated encryption/decryption scheme, after which they may be forwarded to an “n”-byte-wide decrypted buffer. A processing core/instruction decoder, possibly in conjunction with a memory management unit (MMU)/memory controller, determines and requests a program instruction address. If the program instruction address hits in an associated instruction cache, then the instruction byte may be retrieved therefrom. If the program instruction address misses the cache but hits in the decrypted buffer, the requested instruction byte may be forwarded immediately to the processing core/instruction decoder while the “n”-bytes of the decrypted buffer are moved into the cache at the appropriate location approximately simultaneously. The buffers and decryptor, possibly in conjunction with the MMU/memory controller, may also be configured for prefetching.
A more complete understanding of the methods, systems, and arrangements of the present invention may be obtained by reference to the following Detailed Description when taken in conjunction with the accompanying drawings wherein:
The numerous innovative features of the present application are described with particular reference to the illustrated exemplary embodiments. However, it should be understood that this class of embodiments provides only a few examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present invention do not necessarily delimit any of the various aspects of the claimed invention. Moreover, some statements may apply to some inventive features, but not to others.
Security of systems including processors is problematic in that it can be extremely difficult to protect against program copying, corruption, and/or misuse. Security measures that may be taken include using block encryption of code and/or data, self-destruct inputs, or programmable countermeasures against attacks. External encryption sequencing is a critical aspect of security when fetching external data or instructions stored in an external memory. The information stored in the external memory may be encrypted using any of many encryption approaches and standards, such as, for example, the Data Encryption Standard (DES) algorithm, the triple DES algorithm, the Advanced Encryption Standard (AES) algorithm, etc. While byte-length instructions may be encrypted individually, cracking such encryption is relatively easy because a would-be hacker monitoring the processor may track input instructions and output results on something of a one-to-one basis. Block encryption, on the other hand, may be performed on multiple instructions/bytes simultaneously (e.g., a block of eight instructions composed of eight bytes). Employing block encryption, however, may result in processor stalls because multiple encrypted instructions may be loaded for decryption before a decrypted instruction can be provided to the processing core/instruction decoder. An en/decryption scheme may be designed so as to rely on caching and prefetching (e.g., of blocks of instructions) to reduce or minimize the occurrence of stalls of the microcontroller. Consequently, a well-designed and well-implemented microcontroller in accordance with the principles of the present invention may execute even large loops and jumps without having to stall. In short, improvements to certain security aspects of microcontrollers may be accomplished by modifying the memory, I/O capabilities, security provisions, interrelationships therebetween, etc. of the microcontrollers.
Referring now to
Referring to
It should be noted that in certain embodiment(s) the decrypted information, or at least part thereof, may be forwarded directly from the decrypted buffer 225 to the CPU/instruction decoder 235, thus bypassing the cache 230 for at least that decrypted information. In such embodiment(s), the decrypted information may optionally be substantially simultaneously (e.g., within predictable time delays/lags of circuit elements) loaded into the cache 230. Also, it should be understood that in certain embodiment(s) the microcontroller 100 of
Referring to
Referring to
Referring to
(1) A hit occurs if there is a match and the corresponding tag is valid; or
(2) A miss occurs if there is no match or if there is a match but the tag is invalid. A hit enables the corresponding instruction or operand byte to be used for execution. A miss results in a stall of the program execution, waiting for the needed block to be loaded and decrypted. Address generation logic 415 provides the external addresses, for example, to program memory for external instruction fetch. The internal address bus 240A provides the addresses for accessing the cache 230 on reads and writes.
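The hit/miss rule in items (1) and (2) can be sketched in a few lines. This is only an illustration: the `Tag` record and its `value` and `valid` fields are hypothetical names chosen for the sketch, not structures taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    value: int   # stored high-order address bits (hypothetical field name)
    valid: bool  # tag valid bit (hypothetical field name)


def is_hit(tag: Tag, address_tag_bits: int) -> bool:
    """Hit only when the stored tag matches AND is valid; otherwise a miss."""
    return tag.valid and tag.value == address_tag_bits
```

A match against an invalid tag is therefore treated exactly like no match at all.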
External data accesses are transferred through the data encryptor 420. The block size for external data memory encryption may be an exemplary one byte, but multiple-byte blocks may alternatively be implemented. If the corresponding encryption enable bit of a particular data memory chip is cleared to a logic 0, for example, the data may be transferred directly to/from the accumulator and the data memory chip. Thus, no encryption/decryption need be involved. The memory controller 405 may be responsible for coordinating the internal address bus 240A and the internal data bus 240D activities, the cache/tag access and update, program block decryption, and external data/program memory accesses.
Program decryption, on the other hand, may be effectuated on blocks of 64 bits through the program decryptor 400. This program decryptor 400 may be based, for example, on the full 16-round DES algorithm and may also be capable of supporting single or triple DES operations. The microcontroller 100 may, for example, be designed to decrypt a 64-bit block in five machine cycles for single DES operations and to decrypt a 64-bit block in 7 machine cycles for triple DES operations. The program decryptor 400 may include the following exemplary units and/or aspects:
The DES algorithm, which may be employed in accordance with the present invention, is based on the principle of building up a sequence of simple operations to form an overall complex operation, each round of operation providing very little security if separated. The basic operation of each round consists of some permutations and substitutions, determined by a subset of the key bits, performed on one half of the data. In this operation, one-half may be encrypted and the other half may continually pass through unchanged, providing invertible paths for decryption. The operation can be performed using the following equations:
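Given an initial input data block $M$ divided into left and right halves $(L_0, R_0)$, $M$ is transformed after the $i$th round into $M_i = (L_i, R_i)$ defined by
$$L_i = R_{i-1} \qquad (1)$$
$$R_i = L_{i-1} + f(R_{i-1}, K_i) \qquad (2)$$
where $K_i$ is the encryption subkey for the $i$th round and $+$ denotes bitwise modulo 2 addition. This is easily invertible if the key is known since, given $(L_i, R_i)$, one may recover $(L_{i-1}, R_{i-1})$ by
$$L_{i-1} = R_i + f(R_{i-1}, K_i) = R_i + f(L_i, K_i) \qquad (3)$$
$$R_{i-1} = L_i \qquad (4)$$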
The f function itself need not be invertible; any desired f function may be used. However, the f function used in the DES is designed to provide a high level of security through specially chosen base 2 values in the so-called S-boxes.
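The invertibility argument of equations (1) through (4) can be checked with a short sketch. The `toy_f` function below is a deliberately non-invertible stand-in chosen for illustration; it is not the DES f function, and the 32-bit half width and the subkey values in the usage note are likewise assumptions. The initial and final permutations are omitted.

```python
def toy_f(half, subkey):
    """Hypothetical, non-invertible round function (NOT the DES f function)."""
    return (half * half + subkey) & 0xFFFFFFFF


def feistel_round(left, right, subkey, f=toy_f):
    """Equations (1) and (2): L_i = R_{i-1}, R_i = L_{i-1} XOR f(R_{i-1}, K_i)."""
    return right, left ^ f(right, subkey)


def feistel_round_inverse(left, right, subkey, f=toy_f):
    """Equations (3) and (4): L_{i-1} = R_i XOR f(L_i, K_i), R_{i-1} = L_i."""
    return right ^ f(left, subkey), left


def encrypt(left, right, subkeys, f=toy_f):
    """Apply one round per subkey, in order."""
    for subkey in subkeys:
        left, right = feistel_round(left, right, subkey, f)
    return left, right


def decrypt(left, right, subkeys, f=toy_f):
    """Undo the rounds by walking the subkeys in reverse order."""
    for subkey in reversed(subkeys):
        left, right = feistel_round_inverse(left, right, subkey, f)
    return left, right
```

With 16 arbitrary subkeys such as `list(range(1, 17))`, calling `decrypt(*encrypt(l, r, keys), keys)` returns the original pair `(l, r)` even though `toy_f` cannot itself be inverted, which is the property the text relies on.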
The 16 subkeys can be generated from the 56 key bits, which are originally generated by a random number generator, by first applying an initial 56-bit permutation to the original 56 key bits and then using the shifting sequence (1,1,2,2,2,2,2,2,1,2,2,2,2,2,2,1). A permutation of length 48 is applied after each shift to pick out the subkeys. An exemplary random number generator that may be employed along with the principles of the present invention is described in U.S. Nonprovisional application for patent Ser. No. 09/879,686, filed on Jun. 12, 2001, and entitled “IMPROVED RANDOM NUMBER GENERATOR”. U.S. Nonprovisional application for patent Ser. No. 09/879,686 is hereby incorporated by reference in its entirety herein.
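The shifting step of the subkey generation can be sketched as follows, using the 16-entry left-shift schedule of the published DES standard (one entry per round). The sketch assumes the 56 permuted key bits are held as two 28-bit halves, as in standard DES; the function names are hypothetical, and the initial key permutation and the length-48 selection permutation are intentionally left out.

```python
# Left-shift schedule of the published DES standard, one entry per round.
DES_SHIFTS = (1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1)
MASK28 = (1 << 28) - 1


def rotl28(value, count):
    """Rotate a 28-bit value left by `count` bit positions."""
    return ((value << count) | (value >> (28 - count))) & MASK28


def rotated_halves(c0, d0):
    """Yield the pair of rotated 28-bit key halves for each of the 16 rounds.

    The permutation that selects 48 subkey bits from each pair is omitted.
    """
    c, d = c0 & MASK28, d0 & MASK28
    for shift in DES_SHIFTS:
        c, d = rotl28(c, shift), rotl28(d, shift)
        yield c, d
```

Because the shifts sum to 28, the halves return to their starting values after the sixteenth round.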
A predefined permutation is used to permute the 64 data bits before dividing them into two halves. The f function is performed on one half of the input data during each round. These data bits are mapped into an 8×6 diagram as follows:
Modulo 2 addition of the above 48 numbers with the subkey bits provides the reference for the result of the f function from the 4×16 S-box. The 8 S-boxes can be referenced from the Data Encryption Standard (DES). This round of encryption is completed by modulo 2 addition of the resulting base 2 numbers, after a predefined permutation, with the other half of the data bits.
This process is repeated through all 16 rounds with a different set of subkeys. A final permutation then takes place which is the inverse of the initial permutation.
The security of a system having a processor and a memory, for example in accordance with certain embodiment(s) of the present invention, can be further improved if the encryption/decryption keys are modified by, for example, the address of the block (or other level of addressable granularity such as the byte, word, page, etc.) that is being encrypted/decrypted. Such a modification effectively creates a different key for each block (or other level of granularity) that is being encrypted/decrypted. Advantageously, if the key is dependent on the block address, no two (2) blocks can be swapped with each other or otherwise moved. When blocks can be swapped or otherwise moved, then another avenue of attack is available to a would-be hacker because the would-be hacker is able to change the order of execution of the system. Limiting or blocking this avenue of attack using the modification scheme described in this paragraph can be accomplished in many ways. By way of example but not limitation, this modification may be effectuated by (i) having the relevant address “xor”ed with the key of (or with more than one key if, e.g., the triple DES is selected as) the encryptor/decryptor algorithm, (ii) having the relevant address added to the key or keys, (iii) having the relevant address affect the key(s) in a non-linear method such as through a table lookup operation, (iv) having one or more shifts of the relevant address during another action such as that of (i) or (ii), some combination of these, etc.
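Options (i) and (ii) above can be sketched as simple key derivations. This is a minimal illustration only: the 64-bit key register width, the function names, and the choice to wrap the addition at the key width are assumptions of the sketch, not details given in the text.

```python
KEY_BITS = 64                      # assumed key register width
KEY_MASK = (1 << KEY_BITS) - 1


def key_xor_address(base_key, block_address):
    """Option (i): XOR the block address into the key."""
    return (base_key ^ block_address) & KEY_MASK


def key_plus_address(base_key, block_address):
    """Option (ii): add the block address to the key, wrapping at the key width."""
    return (base_key + block_address) & KEY_MASK
```

Two blocks at different addresses then see different effective keys, so a block moved to another address no longer decrypts to its original contents.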
While the modification described in the above paragraph can cause code under attack to become garbled and useless, the following modification described in this and the succeeding paragraph can recognize that code is or has become garbage and, optionally, take an action such as destroying the code, destroying or erasing one or more keys, destroying part(s) or all of the chip, etc. It should be noted that either modification may be used singly or together in conjunction with certain embodiment(s) in accordance with the present invention. For this modification, an integrity check can be added to the system so as to ensure that the integrity of the code has not been compromised by an attacker. Such an integrity check may be accomplished in many ways. By way of example but not limitation, this modification may be effectuated by fetching a checksum byte or bytes after a block of code is fetched. This fetched checksum may be associated (e.g., by proximity, addressability, a correspondence table or algorithm, etc.) with the block of code previously fetched. It should be noted that the addressing order/location, as well as the fetching order, of the fetched block and the fetched checksum may be changed. It is possible, for example, to use different busses and/or different RAMs to store the encrypted code and the checksum(s), but in a presently preferred embodiment, each is stored in different section(s) of the same RAM (but may alternatively be stored in the same section).
After the block of encrypted code is fetched and decrypted, it may be latched into a checksum calculation circuit (or have a checksum operation performed in the same “circuit” as that of the decryption). A checksum of the decrypted code is computed by the checksum circuit/operation. The calculated checksum may be compared with the fetched checksum. If they differ, then the block of code may be considered to have failed the integrity check. The calculated checksum may be calculated in many different ways. By way of example but not limitation, the checksum (i) may be an “xor” of the fetched block of information, (ii) may be a summation of the fetched block of information, (iii) may be a CRC of the fetched block of information, (iv) may be some combination thereof, etc. If the fetched block fails the checksum comparison, then various exemplary actions may be taken by the system/chip. By way of example but not limitation, actions that may be taken responsive to a failed checksum integrity check include: (i) a destructive reset that may, for example, result in the clearing of internal key information and/or internal RAMs, (ii) an evasive action sequence may be started, (iii) an interrupt may allow the system program/chip to take action, (iv) some combination thereof, etc.
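The comparison of a calculated checksum with a fetched checksum can be sketched as below for options (i) and (ii). The one-byte checksum width and all function names are assumptions made for the illustration; the response to a failed check is left to the caller.

```python
from functools import reduce


def xor_checksum(block: bytes) -> int:
    """Option (i): XOR of all bytes in the block."""
    return reduce(lambda acc, byte: acc ^ byte, block, 0)


def sum_checksum(block: bytes) -> int:
    """Option (ii): byte-wide summation, wrapping at 8 bits (assumed width)."""
    return sum(block) & 0xFF


def passes_integrity_check(decrypted_block: bytes, fetched_checksum: int,
                           checksum=xor_checksum) -> bool:
    """Compare the calculated checksum with the fetched one."""
    return checksum(decrypted_block) == fetched_checksum
```

Note that both of these simple checksums are unchanged when bytes are reordered within a block, which is one reason a CRC or a combination of checks may be preferred.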
The microcontroller 100 of
In certain embodiment(s), the program decryptor 400 may be accessed by loaders via specific program block decryption registers in the SFR. The loaders can select either an encryption or a decryption operation by setting/clearing the relevant bits when using the program decryptor 400. The data encryptor 420 may perform byte encryption and decryption on data transferred to and from external memory (not explicitly illustrated in
Referring to
Tags 410 may be made invalid on reset, and the, e.g., byte-wide internal address bus 240A may be initialized to the first external program memory block with byte offset 0h. After a reset, the memory controller 405 may begin fetching program code from the external memory. Eight consecutive external addresses are generated by the address generator 415 (illustrated in
The Least Recently Used (LRU) bit 520 of each tag 410 is used by the memory controller 405 to determine which block is replaced upon a write to the cache 230. An access to a byte in a block clears/sets that block's LRU bit 520 and sets/clears the corresponding block's LRU bit 520 in the other cache way 230a or 230b at the same index address. An exemplary cache block replacement policy is as follows:
(1) Replace the invalid block. If both tags 410 are invalid, replace the block in Way 0 230a; and
(2) If there are no invalid blocks, replace the LRU block. Replacing a valid block causes the necessity of reloading that particular block if accessed again. A newly written block is set to be valid and to have the LRU bit(s) updated.
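The replacement policy of items (1) and (2) can be sketched for a single index of a two-way cache. The text leaves the polarity of the LRU bit open ("clears/sets"); this sketch assumes that a set bit marks the least recently used way. The `Tag` record and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    valid: bool = False
    lru: bool = False  # assumed polarity: True marks the least recently used way


def touch(ways, accessed):
    """Record an access: the accessed way becomes most recently used."""
    ways[accessed].lru = False
    ways[1 - accessed].lru = True


def choose_victim(ways):
    """Select the way to replace at one index."""
    for way, tag in enumerate(ways):  # rule (1): an invalid block, Way 0 first
        if not tag.valid:
            return way
    return 0 if ways[0].lru else 1    # rule (2): the LRU block


def fill(ways):
    """Write a new block: mark it valid and update the LRU bits."""
    victim = choose_victim(ways)
    ways[victim].valid = True
    touch(ways, victim)
    return victim
```

Starting from two invalid tags, successive fills go to Way 0, then Way 1, and afterwards to whichever way was accessed least recently.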
The MMU is responsible for managing and operating the cache 230, the program and data encryptors 400 and 420, and the accessing of external memory 205. The MMU may include the cache control, the program decryptor 400, and the external address generator 415. The memory controller 405 may provide the following exemplary control functions:
The memory controller 405 controls the cache 230 read/write access by monitoring the PC address activities and comparing the PC address with the tags 410. A hit enables a cache read; a miss causes a stall of the CPU clocks while the memory controller 405 fetches the needed block of code from the external memory 205 (if not already somewhere in the pipeline such as in the program decryptor 400). The requested code block is placed into one of the two appropriate index locations according to the active replacement policy. The corresponding tag 410a or 410b is also updated while normal program execution resumes.
The internal address is latched by the address generator 415 during a data transfer instruction or during a complete miss (block is also not in pipeline). Otherwise, the address generator 415 simply fetches the next sequential block after passing a block to the decryptor 400D. The internal address provides the address to the cache 230a or 230b and tags 410a and 410b. As illustrated in
With respect to specific exemplary implementation(s), external memory 205 may be initialized by a ROM boot loader built into the microcontroller 100, according to certain embodiment(s) of the present invention. When the loader is invoked, the external program memory space appears as a data memory to the ROM and can therefore be initialized. Invoking the loader causes the loader to generate new encryption keys (e.g., from a random number generator), thus invalidating all information external to the part. Invocation of the loader also invalidates the tags 410 and erases the cache 230. The program loading process may be implemented by the following exemplary procedures: the program is read through a serial port, encrypted, and written to the external memory space via data transfer instructions. The encryption process is done in blocks of an exemplary 8 bytes. Eight consecutive bytes are loaded to the program decryption block through an SFR interface, encrypted, then read out. It should be noted that the SFR interface, in certain embodiment(s), may only be available while executing from the loader in ROM mode or a user loader mode. The encrypted code is then written by the loader to the external memory 205 through the byte-wide data bus (e.g., a byte-wide embodiment of bus 210). The byte-wide address bus is driven by the address generator 415 on a data transfer instruction. Eight consecutive memory writes are required to store one block of encrypted code. Data-transfer-instruction-type information transferred between the accumulator and the external memory 205 may travel via a bus 425 if the information is intended for program space. The data information may or may not be encrypted (depending on the value of encryption select bits). The address generator 415 latches the data transfer instruction address, and the data encryptor 420 drives or reads the byte-wise data bus during the data transfer instruction. 
With respect to these exemplary implementation(s), data encryption is thus byte-wise and real-time, which contrasts with program encryption, which is a block encryption taking multiple cycles.
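The block-wise program loading described above can be sketched as follows. The `encrypt_block` callable is a hypothetical stand-in for the program encryption step, and zero-padding the final block to 8 bytes is an assumption of the sketch; the text does not say how a partial final block is handled.

```python
BLOCK_BYTES = 8


def load_program(program, encrypt_block, external_memory, base=0):
    """Encrypt a program in 8-byte blocks and write it out one byte at a time.

    `encrypt_block` is a hypothetical callable mapping 8 bytes to 8 bytes.
    `external_memory` is any mutable byte sequence, e.g. a bytearray.
    """
    padded = program + bytes(-len(program) % BLOCK_BYTES)  # assumed zero padding
    for start in range(0, len(padded), BLOCK_BYTES):
        encrypted = encrypt_block(padded[start:start + BLOCK_BYTES])
        # Eight consecutive byte-wide writes store one encrypted block.
        for offset, byte in enumerate(encrypted):
            external_memory[base + start + offset] = byte
```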
Referring to
Similar RAM structure can be used for the cache tags with the following exemplary features:
It should be noted that the hit signal is an important path for a cache read. A cache read includes propagation delays caused by the tag RAM read, tag value comparison, and cache read access, to be performed all in one machine cycle. The hit/miss signals are therefore generated quickly (e.g., in less than two oscillator clocks). If a read miss occurs, the CPU/instruction decoder or equivalent stalls until the targeted code block is ready to be placed into the cache 230. There are many levels of stall penalties for a single DES read miss.
These include the following exemplary latencies for single DES operations:
The stall penalties for a triple DES miss include the following exemplary latencies:
Hits on the finished decrypted buffer 400DB do not cause a miss. The CPU/instruction decoder or equivalent can execute from the decrypted buffer 400DB directly, while the corresponding block in cache 230a or 230b is replaced with the buffer instruction data.
Referring now to
Instruction fetching from the external program memory (from the perspective of the program execution unit) is actually a read from the cache. To that end, the cache is accessed, or at least checked (along with the address of the block in the decrypted buffer), prior to external code fetching. It should be noted that internal program memory may also be present and accessed. To check the cache, at steps 730a and 730b, 13 high order PC address bits are compared with values of the cache tags that are addressable by a block index (e.g., PC bits [8:3]). It should be noted that other total numbers as well as divisions of the address bits may alternatively be employed and that steps 730a and 730b may be performed in a different order, substantially simultaneously, etc. If there is a match (at step 730b) and the tag is valid (at step 730a), then, at step 735, a memory controller/MMU generates a hit signal and allows a read access to that particular cache location at step 740. If, on the other hand, a cache miss occurs (as determined at steps 730a and 730b), but the PC address matches the block that is currently in the decrypted buffer (and the decryption process is completed) at step 745, the corresponding instruction byte is transferred directly to the data bus (at step 720c) for the program execution unit, the cache and tags are updated (at steps 720a and 720b, respectively), and execution continues without a stall (at step 725).
A stall occurs when there is no tag match or the tag is matched but invalid and when the data is not available in the decrypted buffer. Such a miss temporarily stalls program execution, and if the desired instruction(s) are not already in the decryption unit or currently being loaded thereto from the encrypted buffer as determined at step 750, the current PC address is used to fetch the desired code from the external program memory (at steps 705, 710, et seq.). If the PC address does match that of the encrypted instructions, then flow can continue at the decryption or forwarding-to-the-decryption unit stages (at steps 710, 715, et seq.). It should be understood that the PC address is not necessarily the first byte of the block address; however, the memory controller/MMU may set the byte offset bits to 0h for block alignment in the buffers, cache, etc. Program execution may be resumed after the targeted code block reaches the point (at step 725) where the decryption process is finished and the opcode/operand byte may be fed to the program execution unit directly from the decrypted buffer (at step 720c) (e.g., while the cache and tags are updated).
To minimize or reduce the latency associated with external code fetching, processors may be designed with the assumption that program execution is frequently sequential and often follows a rule of locality (e.g., the 90/10 rule). The memory controller/MMU may (pre-)fetch the next consecutive code block, decrypt it, and have it ready in the decrypted buffer. The memory controller/MMU is typically responsible for controlling these activities, and it is responsible for checking possible matches between the current PC address and the blocks in the encrypted buffer, decryption unit, and decrypted buffer. For example, if the current PC address matches the block in the decrypted buffer (as determinable at step 745), the requested byte of instruction data in the decrypted buffer is used for program execution (at step 725 via step 720c) while the related block is written to the cache (at step 720a). If, on the other hand, the current PC address matches the block in the encrypted buffer (or optionally the decryption unit) (at step 750), the encrypted instruction data in the encrypted buffer is transferred to the decryption unit and decrypted (at step 710). A match to either buffer causes the instruction data therein to be placed in the cache at the appropriate address-based index. In the case of a match to the decrypted buffer, the decrypted instruction data can be placed into the cache without delay; otherwise, a stall occurs until the block having the requested instruction byte is decrypted and forwarded to the decrypted buffer, whereafter the decrypted instruction byte may be driven for execution and the cache and tags updated. As described above, upon a miss of the cache, the requested instruction byte may be driven directly from the decrypted buffer while also being substantially simultaneously written to the cache.
Any other instruction bytes in the same block that are subsequently requested are located in and may be read from the cache, until the block is replaced or rendered invalid.
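The fetch flow described above (cache check, decrypted-buffer check, stall on a complete miss, and prefetch of the next sequential block) can be sketched in simplified form. Several simplifications are assumptions of the sketch: the cache has a single way rather than two, the encrypted buffer and decryption unit are collapsed into one `_load` step, the tag is taken to be all PC bits above bit 8, and `decrypt_block` is a hypothetical callable standing in for the program decryptor.

```python
BLOCK_BYTES = 8


def split_address(pc):
    """Split a PC into (tag, index, offset), with index = PC[8:3], offset = PC[2:0]."""
    return pc >> 9, (pc >> 3) & 0x3F, pc & 0x7


class FetchPath:
    def __init__(self, external_memory, decrypt_block):
        self.memory = external_memory        # encrypted program bytes
        self.decrypt_block = decrypt_block   # hypothetical: 8 bytes -> 8 bytes
        self.cache = {}                      # index -> (tag, decrypted block)
        self.decrypted = None                # (block address, decrypted block)
        self.stalls = 0

    def _load(self, block_addr):
        """Fetch one encrypted block and place its plaintext in the decrypted buffer."""
        encrypted = bytes(self.memory[block_addr:block_addr + BLOCK_BYTES])
        self.decrypted = (block_addr, self.decrypt_block(encrypted))

    def fetch(self, pc):
        tag, index, offset = split_address(pc)
        block_addr = pc & ~(BLOCK_BYTES - 1)  # byte offset bits forced to 0
        entry = self.cache.get(index)
        if entry is not None and entry[0] == tag:
            return entry[1][offset]           # cache hit
        if self.decrypted is None or self.decrypted[0] != block_addr:
            self.stalls += 1                  # complete miss: wait for fetch and decrypt
            self._load(block_addr)
        block = self.decrypted[1]
        self.cache[index] = (tag, block)      # cache and tag are updated
        next_addr = block_addr + BLOCK_BYTES  # prefetch the next sequential block
        if next_addr < len(self.memory):
            self._load(next_addr)
        return block[offset]                  # byte driven from the decrypted buffer
```

In this model, straight-line execution stalls only on the very first block, because each later block is already waiting in the decrypted buffer when it is requested; a jump to a block that is neither cached nor prefetched stalls again.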
Although embodiment(s) of the methods, systems, and arrangements of the present invention have been illustrated in the accompanying Drawings and described in the foregoing Detailed Description, it will be understood that the present invention is not limited to the embodiment(s) disclosed, but is capable of numerous rearrangements, modifications, and substitutions without departing from the spirit and scope of the present invention as set forth and defined by the following claims.
Number | Name | Date | Kind |
---|---|---|---|
3573747 | Adams et al. | Apr 1971 | A |
3609697 | Blevins et al. | Sep 1971 | A |
3798360 | Feistel | Mar 1974 | A |
5943421 | Grabon | Aug 1999 | A |
5982887 | Hirotani | Nov 1999 | A |
6003117 | Buer et al. | Dec 1999 | A |
6061449 | Candelore et al. | May 2000 | A |
6523118 | Buer | Feb 2003 | B1 |
6741729 | Bjorn | May 2004 | B2 |
7039814 | DaCosta | May 2006 | B2 |
7073069 | Wasson | Jul 2006 | B1 |
7089418 | Ellison | Aug 2006 | B1 |
20020116606 | Gehring | Aug 2002 | A1 |
20020188839 | Noehring | Dec 2002 | A1 |
Entry |
---|
Product Bulletin entitled, “VMS320 High Speed PCMCIA Security Tokken Crytographic Engine”, by VLSI Technology, Inc., 1997, PB-0497-020, (pp. 6). |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 09932247 | Aug 2001 | US |
Child | 14964424 | | US |