1. Field of the Invention
The current invention relates to cryptography, and in particular to modules for the encryption of plaintext and/or decryption of ciphertext.
2. Description of the Related Art
Encryption and decryption are cryptographic processes that convert plaintext into ciphertext and vice versa, respectively. Plaintext refers to text-based data (i.e., a sequence of bit strings) that is typically readily readable by and comprehensible to a human. Note that, more generally, plaintext refers to the input to an encryption algorithm, and the plaintext may well be gibberish. Data encryption is a process used to convert a block of plaintext into a block of ciphertext, where ciphertext typically appears to be gibberish not readily readable by or comprehensible to a human. Note that, more generally, ciphertext refers to the output of an encryption algorithm, and the ciphertext might happen to resemble recognizable text. A typical flow of information in cryptography involves inputting original plaintext into an encryption algorithm that outputs ciphertext, transmitting the ciphertext, and then inputting the ciphertext into a complementary decryption algorithm that outputs the original plaintext.
One way to encrypt plaintext involves using a key. The resulting ciphertext is decrypted using the appropriate corresponding key. A cryptographic system that uses the same key for both encryption and decryption is known as a symmetric cryptographic system. A collection of functions and their inverses that use keys and map strings of a fixed length to strings of the same length is known as a block cipher. One popular symmetric block cipher is the Advanced Encryption Standard (AES), described in Federal Information Processing Standards Publication (FIPS) 197, incorporated herein by reference in its entirety. Older FIPS-approved symmetric block ciphers include the Data Encryption Standard (DES) and triple-DES.
Symmetric block ciphers are used in multiple endeavors and for multiple purposes. One use for symmetric block ciphers is for the cryptographic protection of data on block-oriented storage devices, such as typical computer hard drives. Two typical characteristics of storage-device data protection transforms are that they (1) are length-preserving, meaning a block of ciphertext is the same length as the corresponding block of plaintext and (2) allow for independent processing of data units.
A symmetric block cipher, such as AES, may be used in a variety of operational modes, each involving a different way of using the block cipher. Several operational modes are useful for avoiding having identical outputs from identical inputs into a cryptographic algorithm. Since having identical outputs from identical inputs can be used by an adversary to break a cryptographic algorithm, it can be useful to have operational modes that, given an input stream including identical blocks A0, A1, . . . , Ag (in other words, A0=A1= . . . =Ag), output corresponding but non-identical output blocks B0, B1, . . . , Bg (in other words, B0≠B1≠ . . . ≠Bg), respectively.
Several basic modes are described in the National Institute of Standards and Technology (NIST) Special Publication (SP) 800-38A, titled “Recommendation for Block Cipher Modes of Operation,” and incorporated herein by reference in its entirety. An additional mode of operation is described in NIST Special Publication 800-38D, titled “Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC,” incorporated herein by reference in its entirety. Yet another mode of operation, LRW (named for Liskov, Rivest, and Wagner), is described in the IEEE's 2006 P1619D5 publication, titled “Draft Standard Architecture for Encrypted Shared Storage Media,” incorporated herein by reference in its entirety.
A further additional mode of operation, XTS (XEX-based Tweaked codebook mode with ciphertext Stealing, where “XEX” is from “XOR-Encrypt-XOR”), is described in IEEE's Std 1619-2007 publication, titled “IEEE Standard for Cryptographic Protection of Data on Block-Oriented Storage Devices,” incorporated herein by reference in its entirety, which describes one system for storage-device data protection. The XTS mode is also described in NIST Special Publication Draft 800-38E, titled “Recommendation for Block Cipher Modes of Operation: The XTS-AES Mode for Confidentiality on Block-Oriented Storage Devices,” incorporated herein by reference in its entirety.
Yet another mode of operation, which combines counter (CTR) mode encryption with Cipher Block Chaining Message Authentication Code (CBC-MAC), is the Counter with CBC-MAC (CCM) mode. CCM mode is described in the Internet Engineering Task Force's (IETF's) request for comment (RFC) 3610, incorporated herein by reference in its entirety. The various modes use the plaintext, input vectors, and cipher functions in a variety of ways, described in more detail below.
Each of the various modes described below uses a cryptographic kernel executing an underlying block-cipher algorithm that is assumed to be a FIPS-approved symmetric-key block-cipher algorithm where a secret, random key K has been established between the parties to the communication. As previously noted, examples of such block-cipher algorithms include AES, DES, and triple-DES. A forward cipher function under the key K applied to block X is designated as CIPHK(X). An inverse cipher function under the key K applied to block X is designated as CIPH−1K(X). It should be noted that, in some operational modes, the forward cipher function CIPHK(X) is used for both encryption and decryption.
A first basic mode described in NIST Special Publication 800-38A is electronic codebook (ECB) mode, which features, for a given key, the assignment of a fixed ciphertext block to each corresponding plaintext block, analogous to the assignment of code words in a codebook. ECB mode is specified in Equation (1) below:
ECB Encryption: Cj=CIPHK(Pj) for j=1 . . . n (1.1)
ECB Decryption: Pj=CIPH−1K(Cj) for j=1 . . . n (1.2)
where Cj is the jth ciphertext block of n ciphertext blocks, CIPHK(Pj) is a forward cipher function of the block-cipher algorithm under the key K applied to jth plaintext block Pj of n plaintext blocks, and CIPH−1K(Cj) is an inverse cipher function of the block-cipher algorithm under the key K applied to Cj to produce Pj. Note that, using the ECB mode, with a given key and cipher function, any given plaintext block gets encrypted to the same ciphertext block and vice versa.
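The codebook property noted above can be illustrated with a minimal Python sketch. The functions `ciph` and `ciph_inv` are hypothetical stand-ins for CIPHK and CIPH−1K: a toy four-round Feistel permutation built on SHA-256, assumed here only so that the example runs with the standard library. It is not AES and offers no security.

```python
import hashlib

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, r: int, half: bytes) -> bytes:
    # Round function of the toy Feistel network (an assumption, NOT AES).
    return hashlib.sha256(key + bytes([r]) + half).digest()[:8]

def ciph(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for the forward cipher CIPH_K on a 16-byte block."""
    left, right = block[:8], block[8:]
    for r in range(4):
        left, right = right, _xor(left, _f(key, r, right))
    return left + right

def ciph_inv(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for the inverse cipher; undoes the rounds in reverse."""
    left, right = block[:8], block[8:]
    for r in reversed(range(4)):
        left, right = _xor(right, _f(key, r, left)), left
    return left + right

def ecb_encrypt(key: bytes, blocks: list) -> list:
    return [ciph(key, p) for p in blocks]        # C_j = CIPH_K(P_j)

def ecb_decrypt(key: bytes, blocks: list) -> list:
    return [ciph_inv(key, c) for c in blocks]    # P_j = CIPH^-1_K(C_j)

key = b"0123456789abcdef"
plain = [b"A" * 16, b"B" * 16, b"A" * 16]
cipher = ecb_encrypt(key, plain)
# Blocks 0 and 2 of `plain` are identical, so blocks 0 and 2 of `cipher` are too.
```

Because each block is processed independently, the repeated plaintext block is visible in the ciphertext, which is the weakness that the chained and counter-based modes address.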
A second basic mode described in NIST Special Publication 800-38A is the cipher block chaining (CBC) mode, which features the combining (i.e., chaining) of a given plaintext block with a previous ciphertext block. An initialization vector (IV) is needed for combination with the first plaintext block. The operation represented herein by the symbol ⊕ is an exclusive-OR (XOR) operation. The XOR operation is sometimes referred to as bitwise addition. As used herein, unless otherwise indicated, an addition operation refers to an XOR operation. The CBC mode is specified in Equation (2) below, where the terms are as defined above:
As can be seen, in encryption, each successive plaintext block after the first is added to the previous output/ciphertext block to produce the new input block, and the forward cipher function is applied to each input block to produce the ciphertext block. In decryption, to recover any subsequent plaintext block after the first, the inverse cipher function is applied to the corresponding ciphertext block, and the resulting block is XORed with the previous ciphertext block.
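The chaining described above, in which C1=CIPHK(P1 ⊕ IV) and each later Cj=CIPHK(Pj ⊕ Cj−1), can be sketched in Python as follows. As an assumption made only for runnability, `ciph` and `ciph_inv` are a toy Feistel permutation built on SHA-256 rather than a FIPS-approved cipher.

```python
import hashlib

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, r: int, half: bytes) -> bytes:
    # Toy round function (an assumption, NOT AES).
    return hashlib.sha256(key + bytes([r]) + half).digest()[:8]

def ciph(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for r in range(4):
        left, right = right, _xor(left, _f(key, r, right))
    return left + right

def ciph_inv(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for r in reversed(range(4)):
        left, right = _xor(right, _f(key, r, left)), left
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    out, prev = [], iv
    for p in blocks:
        prev = ciph(key, _xor(p, prev))   # input block = P_j xor C_{j-1}
        out.append(prev)
    return out

def cbc_decrypt(key: bytes, iv: bytes, blocks: list) -> list:
    out, prev = [], iv
    for c in blocks:
        out.append(_xor(ciph_inv(key, c), prev))  # P_j = CIPH^-1_K(C_j) xor C_{j-1}
        prev = c                                  # next block chains on this ciphertext
    return out

key, iv = b"0123456789abcdef", b"\x00" * 15 + b"\x01"
plain = [b"A" * 16, b"A" * 16, b"B" * 16]
cipher = cbc_encrypt(key, iv, plain)
```

The only state carried from one block to the next is the previous ciphertext block, which is why that block is sufficient to resume a CBC stream after an interruption.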
A third basic mode described in NIST Special Publication 800-38A is the Cipher Feedback (CFB) mode, which has an initialization vector as the initial input block and feeds back successive ciphertext segments into successive input blocks in the forward cipher to generate output blocks that are added to plaintext blocks to produce corresponding ciphertext blocks, and vice versa. The blocks of plaintext and ciphertext in CFB mode are b bits long; however, plaintext blocks are encrypted in segments of length s, where 1≦s≦b. The CFB mode is specified in Equation (3) below, where P#j is the jth plaintext segment (having a length s), C#j is the jth ciphertext segment (having a length s), Ij is the jth input block to the cipher function, Oj is the jth output block of the cipher function, LSBx(y) represents the x least-significant bits of y, MSBx(y) represents the x most-significant bits of y, and “∥” represents a concatenation operation. It should be noted that n in CFB mode represents the number of plaintext and/or ciphertext segments, which is not necessarily equal to the number of plaintext and/or ciphertext blocks.
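A minimal Python sketch of the segment feedback described above follows. It assumes byte-aligned segments (s given in bytes rather than bits) and uses a hypothetical `ciph` built from truncated SHA-256 in place of CIPHK; since CFB uses only the forward cipher, the stand-in need not be invertible.

```python
import hashlib

B = 16  # block size b, in bytes

def ciph(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for CIPH_K (NOT AES): truncated SHA-256.
    return hashlib.sha256(key + block).digest()[:B]

def cfb(key: bytes, iv: bytes, segments: list, s: int, decrypt: bool = False) -> list:
    """Process s-byte segments; the forward cipher serves both directions."""
    out, inp = [], iv                            # I_1 = IV
    for seg in segments:
        o = ciph(key, inp)                       # O_j = CIPH_K(I_j)
        res = bytes(x ^ y for x, y in zip(seg, o[:s]))  # XOR with MSB_s(O_j)
        out.append(res)
        c = seg if decrypt else res              # the ciphertext segment C#_j
        inp = inp[s:] + c                        # I_{j+1} = LSB_{b-s}(I_j) || C#_j
    return out

key, iv = b"0123456789abcdef", bytes(range(16))
segs = [b"ab", b"cd", b"ef", b"gh"]
enc = cfb(key, iv, segs, s=2)
dec = cfb(key, iv, enc, s=2, decrypt=True)
```

In both directions the value fed back is the ciphertext segment, so the decrypting side can rebuild every input block from the IV and the received ciphertext alone.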
A fourth basic mode described in NIST Special Publication 800-38A is the Output Feedback (OFB) mode, which iterates the forward cipher function on an initialization vector (IV) to generate a sequence of output blocks that are added to plaintext blocks to produce corresponding ciphertext blocks, and vice versa. In OFB mode, the IV should be a nonce, i.e., the IV should be unique for each execution of the mode under the given key. In OFB encryption, the IV is processed by the forward cipher function to produce the first output block, which is added to the first plaintext block to produce the first ciphertext block. The first output block is then enciphered to produce the second output block, which is added to the second plaintext block to produce the second ciphertext block, and so on, i.e., output blocks of the forward cipher function are used as inputs to successive applications of the forward cipher function to produce new output blocks for adding to corresponding plaintext blocks. The OFB mode is specified in Equation (4) below, where P*n represents the last block of the plaintext, which may be a partial block of u bits, and C*n represents the last block of the ciphertext, which may be a partial block of u bits:
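The output-feedback iteration described above can be sketched in Python as follows. The `ciph` function is a hypothetical truncated-SHA-256 stand-in for CIPHK, assumed only so the example is self-contained; the final partial block is handled by keeping only as many output bytes as there are data bytes.

```python
import hashlib

B = 16  # block size in bytes

def ciph(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for CIPH_K (NOT AES): truncated SHA-256.
    return hashlib.sha256(key + block).digest()[:B]

def ofb(key: bytes, iv: bytes, data: bytes) -> bytes:
    """OFB encryption and decryption are the same operation."""
    out, o = bytearray(), iv
    for i in range(0, len(data), B):
        o = ciph(key, o)                 # O_j = CIPH_K(O_{j-1}), with O_0 = IV
        chunk = data[i:i + B]
        # zip() stops at the shorter input, so a partial final block
        # is XORed with only the leading bytes of the output block.
        out += bytes(x ^ y for x, y in zip(chunk, o))
    return bytes(out)

key, iv = b"0123456789abcdef", bytes(range(16))
msg = b"thirty-seven bytes of sample message!"
ct = ofb(key, iv, msg)
```

The output blocks depend only on the key and the IV, not on the data, which is why reusing an IV under the same key would reuse the same output sequence.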
A fifth basic mode described in NIST Special Publication 800-38A is the Counter (CTR) mode, which applies the forward cipher to a set of input blocks, called counters, to produce a sequence of output blocks that are added to plaintext blocks to produce corresponding ciphertext blocks, and vice versa. The counters should be unique for each message encrypted under a given key. One way to achieve this result is by starting with an initial input block and iteratively incrementing its value to get the subsequent counters. The forward cipher function is applied to each counter block, and the resulting output blocks are added to the corresponding plaintext blocks to produce the ciphertext blocks. The CTR mode is specified in Equation (5) below, where the jth counter block is represented by Tj:
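A minimal Python sketch of counter-mode processing follows. Two assumptions are made for illustration: `ciph` is a truncated-SHA-256 stand-in for CIPHK, and the counters are generated by incrementing the whole initial block as a 128-bit big-endian integer, which is only one of the permissible incrementing methods.

```python
import hashlib

B = 16  # block size in bytes

def ciph(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for CIPH_K (NOT AES): truncated SHA-256.
    return hashlib.sha256(key + block).digest()[:B]

def ctr(key: bytes, t1: bytes, data: bytes) -> bytes:
    """CTR encryption and decryption are the same operation."""
    out = bytearray()
    counter = int.from_bytes(t1, "big")
    for i in range(0, len(data), B):
        t = (counter % (1 << 128)).to_bytes(B, "big")   # counter block T_j
        o = ciph(key, t)                                # O_j = CIPH_K(T_j)
        out += bytes(x ^ y for x, y in zip(data[i:i + B], o))
        counter += 1                                    # assumed incrementing rule
    return bytes(out)

key, t1 = b"0123456789abcdef", bytes(15) + b"\x05"
msg = b"A" * 32 + b"tail"
ct = ctr(key, t1, msg)
```

Because each output block depends only on its own counter, the blocks can be computed in any order, and the next unused counter is the only state needed to continue a stream.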
One block-cipher mode described in NIST Special Publication 800-38D is Galois/Counter Mode (GCM), which is a variation on the above-described CTR mode. GCM mode combines an encryption function referred to as GCTR and a hashing function referred to as GHASH. A device encrypting in GCM mode takes plaintext and Additional Authenticated Data (AAD) and outputs (1) a ciphertext based on the plaintext and (2) a hashed message digest, also called a tag, based on both the AAD and the ciphertext.
The GCTR encryption of plaintext string X, given key K and initial counter block ICB and resulting in ciphertext string Y (i.e., Y=GCTRK(ICB, X)) is specified in Equation (6) below, where “┌x┐” represents the result of the application of the ceiling function to the number x, n is the number of blocks in plaintext string X, len(W) represents the bit-string length of bit string W (e.g., len(“01000101”)=8), int(W) is the integer for which the bit string W is a representation (e.g., int(“0100”)=4), “[x]s” is an s-character bit-string representation of the integer x (e.g., [4]8=“0000 0100”), CBi is the ith counter block, X*n represents the last block of plaintext string X, which may be an incomplete block, and inc32(W) represents the result of an incrementing function on bit string W, where inc32(W)=MSBlen(W)−32(W) ∥ [int(LSB32(W))+1 mod 2^32]32 (in other words, inc32(W) increments the right-most 32 bits of bit string W by 1, with the result reduced modulo 2^32):
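The inc32 function and its use in GCTR can be sketched in Python as follows. The sketch assumes byte-aligned strings and, as a stand-in for CIPHK, a hypothetical `ciph` based on truncated SHA-256; only `inc32` and the loop structure are meant to mirror the definitions above.

```python
import hashlib

def ciph(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for CIPH_K (NOT AES): truncated SHA-256.
    return hashlib.sha256(key + block).digest()[:16]

def inc32(w: bytes) -> bytes:
    """Increment the right-most 32 bits of w modulo 2**32; leave the rest alone."""
    head, tail = w[:-4], w[-4:]
    n = (int.from_bytes(tail, "big") + 1) % (1 << 32)
    return head + n.to_bytes(4, "big")

def gctr(key: bytes, icb: bytes, x: bytes) -> bytes:
    """Y = GCTR_K(ICB, X); an empty X yields an empty Y."""
    y, cb = bytearray(), icb                      # CB_1 = ICB
    for i in range(0, len(x), 16):
        block = x[i:i + 16]                       # last block may be incomplete
        y += bytes(a ^ b for a, b in zip(block, ciph(key, cb)))
        cb = inc32(cb)                            # CB_{i+1} = inc32(CB_i)
    return bytes(y)

key = b"0123456789abcdef"
icb = b"\xaa" * 12 + b"\xff\xff\xff\xfe"
msg = b"forty bytes of sample plaintext data...."
ct = gctr(key, icb, msg)
```

Note that a carry out of the low 32 bits is discarded rather than propagated into the upper 96 bits, which distinguishes inc32 from a full-width increment.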
As can be seen, the GCTR mode is simply a variation of the above-described CTR mode where the specified counter-block incrementing method is the inc32(W) function. The GCM mode of operation, given key K, initialization vector IV, plaintext P, and AAD A, which uses GCTR encryption and GHASH hashing, is specified in Equation (7) below, where t is a supported tag length associated with the key, J0 is a pre-counter block generated from initialization vector IV, H is the hash subkey for the GHASH function, 0^m is an m-bit bit string of “0”s, u and v are integers, S is a text block, T is the resultant authentication tag, and C is the resultant ciphertext string:
One AES-based ciphering system described in the above-referenced IEEE P1619D5 publication is the LRW (named for Liskov, Rivest, and Wagner) transform. The LRW transform for the jth 128-bit plaintext block Pj of plaintext string P takes a 256-, 320-, or 384-bit key K and a 128-bit tweak value ij. As noted below, key K is used as two keys: master key K1 and tweakable key K2. A tweak value is the name given in the LRW transform to a nonce. Typically, the tweak value is the sequential address or number of block Pj within string P.
The key K is used as two keys, namely K1 and K2, where K2 is the last 128 bits of key K and K1 is the first 128, 192, or 256 bits of key K. The LRW encryption and decryption modes of operation for the block Pj are specified in Equation (8), below, where TT, PP, and CC are temporary binary strings, Cj is the resulting 128-bit ciphertext block, and ⊗ represents modular multiplication over the binary Galois field GF(2^128), modulo x^128+x^7+x^2+x+1.
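The Galois-field multiplication and the way it brackets the block cipher can be sketched in Python as follows. The sketch assumes the usual LRW structure, in which TT is the product of K2 and the tweak, PP is the plaintext XORed with TT, CC is the cipher output for PP under K1, and the ciphertext is CC XORed with TT. Blocks are modeled as Python integers in which bit k is the coefficient of x^k; the mapping between bytes and polynomial coefficients is an assumption, and `toy_enc`/`toy_dec` are a hypothetical invertible stand-in for AES.

```python
MASK = (1 << 128) - 1
POLY = (1 << 128) | 0x87          # x^128 + x^7 + x^2 + x + 1

def gf_mul(a: int, b: int) -> int:
    """Multiply a and b in GF(2^128) modulo POLY (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a                # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1                   # multiply a by x
        if a >> 128:
            a ^= POLY             # reduce when degree reaches 128
    return r

# Toy invertible "cipher" on 128-bit integers (an assumption, NOT AES).
M = 0x9E3779B97F4A7C15F39CC0605CEDC835      # arbitrary odd multiplier
M_INV = pow(M, -1, 1 << 128)                # requires Python 3.8+

def toy_enc(k: int, x: int) -> int:
    return ((x ^ k) * M) & MASK

def toy_dec(k: int, y: int) -> int:
    return ((y * M_INV) & MASK) ^ k

def lrw_encrypt(k1: int, k2: int, tweak: int, p: int) -> int:
    tt = gf_mul(k2, tweak)        # TT = K2 (x) i_j
    pp = p ^ tt                   # PP = P_j xor TT
    cc = toy_enc(k1, pp)          # CC = cipher under K1 applied to PP
    return cc ^ tt                # C_j = CC xor TT

def lrw_decrypt(k1: int, k2: int, tweak: int, c: int) -> int:
    tt = gf_mul(k2, tweak)
    return toy_dec(k1, c ^ tt) ^ tt

k1, k2 = 0x0123456789ABCDEF << 64, 0xFEDCBA9876543210
p = 0x00112233445566778899AABBCCDDEEFF
c = lrw_encrypt(k1, k2, 7, p)
```

Since TT depends only on K2 and the tweak, it is the natural quantity to hold as per-stream state between consecutive blocks.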
The LRW transform has a special operation for the last two blocks Pm−1 and Pm of plaintext string P whose bit length is not a multiple of 128, where the bit length of final block Pm is b bits. The encryption procedure, described in the above-referenced IEEE P1619D5 publication, involves (1) performing the LRW encryption transform on Pm−1 to get CC, (2) returning the first b bits of CC as Cm, and (3) performing the LRW encryption transform on the concatenation of Pm and the last (128-b) bits of CC to get Cm−1. The corresponding LRW decryption procedure for the corresponding ciphertext blocks reverses this transformation.
As noted above, the XTS mode of operation is described in the IEEE Std 1619-2007 publication. The XTS encryption and decryption modes of operation for the jth block Pj of plaintext string P are specified in Equation (9) below, where α is a primitive element of Galois field GF(2^128), i is a tweak value typically corresponding to the logical block address of the first block of plaintext string P (but can also be some other non-negative integer), and the other elements are as defined above.
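The multiplication by α and the running tweak can be sketched in Python as follows. The sketch assumes the common XTS structure, in which the initial TT is the tweak enciphered under K2, each block is XORed with TT before and after ciphering under K1, and TT is multiplied by α between blocks; `mul_alpha` uses the little-endian byte convention with feedback constant 0x87. The functions `toy_enc` and `toy_dec` are a hypothetical invertible stand-in for AES.

```python
MASK = (1 << 128) - 1
M = 0x9E3779B97F4A7C15F39CC0605CEDC835      # arbitrary odd multiplier
M_INV = pow(M, -1, 1 << 128)                # requires Python 3.8+

def toy_enc(k: int, block: bytes) -> bytes:
    # Toy invertible stand-in for the forward cipher (NOT AES).
    x = int.from_bytes(block, "big")
    return (((x ^ k) * M) & MASK).to_bytes(16, "big")

def toy_dec(k: int, block: bytes) -> bytes:
    y = int.from_bytes(block, "big")
    return (((y * M_INV) & MASK) ^ k).to_bytes(16, "big")

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def mul_alpha(t: bytes) -> bytes:
    """Multiply a 16-byte value by alpha (i.e., by x) in GF(2^128)."""
    n = int.from_bytes(t, "little") << 1
    if n >> 128:                             # carry out of bit 127
        n = (n & MASK) ^ 0x87                # reduce by x^128 + x^7 + x^2 + x + 1
    return n.to_bytes(16, "little")

def xts_blocks(f1, k1: int, k2: int, tweak: bytes, blocks: list) -> list:
    """f1 is toy_enc to encrypt or toy_dec to decrypt; K2 always enciphers."""
    t = toy_enc(k2, tweak)                   # TT for the first block
    out = []
    for blk in blocks:
        out.append(xor(f1(k1, xor(blk, t)), t))
        t = mul_alpha(t)                     # TT for the next block
    return out

k1, k2 = 0x1111 << 100, 0x2222 << 90
tweak = (42).to_bytes(16, "little")
plain = [b"A" * 16, b"A" * 16, b"B" * 16]
cipher = xts_blocks(toy_enc, k1, k2, tweak, plain)
```

Encryption and decryption share the same TT sequence, so a single stored TT value suffices to resume either direction at the next block.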
The XTS transform has a special operation for the last two blocks Pm−1 and Pm of plaintext string P, whose bit length is not a multiple of 128, where the bit length of final block Pm is b bits. This operation is referred to as ciphertext stealing. The encryption procedure, described in the above-referenced IEEE Std 1619-2007 publication, involves (1) performing the XTS encryption on Pm−1 to get CC, (2) returning the first b bits of CC as Cm, and (3) performing the XTS encryption transform on the concatenation of Pm and the last (128-b) bits of CC to get Cm−1. The corresponding XTS decryption procedure for the corresponding ciphertext blocks reverses this transformation.
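The three ciphertext-stealing steps, and their reversal, can be sketched in Python as follows. The sketch assumes byte-aligned lengths and a per-block transform that takes the block position j as a second argument; `toy_enc` and `toy_dec` are hypothetical stand-ins for the per-block XTS (or LRW) transform, not the real thing.

```python
MASK = (1 << 128) - 1
M = 0x9E3779B97F4A7C15F39CC0605CEDC835      # arbitrary odd multiplier
M_INV = pow(M, -1, 1 << 128)                # requires Python 3.8+
K = 0x0123456789ABCDEF0123456789ABCDEF      # toy key

def toy_enc(block: bytes, j: int) -> bytes:
    # Toy position-dependent block transform (an assumption, NOT XTS-AES).
    x = int.from_bytes(block, "big")
    return (((x ^ (K + j)) * M) & MASK).to_bytes(16, "big")

def toy_dec(block: bytes, j: int) -> bytes:
    y = int.from_bytes(block, "big")
    return (((y * M_INV) & MASK) ^ (K + j)).to_bytes(16, "big")

def steal_encrypt(enc, p_prev: bytes, p_last: bytes, j: int):
    """p_prev is the full block at position j; p_last is the partial final block."""
    b = len(p_last)
    cc = enc(p_prev, j)                      # (1) transform P_{m-1} to get CC
    c_last = cc[:b]                          # (2) C_m = first b bytes of CC
    c_prev = enc(p_last + cc[b:], j + 1)     # (3) transform P_m || tail of CC
    return c_prev, c_last

def steal_decrypt(dec, c_prev: bytes, c_last: bytes, j: int):
    b = len(c_last)
    pp = dec(c_prev, j + 1)                  # recovers P_m || tail of CC
    p_last = pp[:b]
    p_prev = dec(c_last + pp[b:], j)         # rebuild CC, then invert step (1)
    return p_prev, p_last

p_prev, p_last = bytes(range(16)), b"tail!"
c_prev, c_last = steal_encrypt(toy_enc, p_prev, p_last, 3)
```

The total ciphertext length equals the total plaintext length, which is the length-preserving property required for block-oriented storage.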
As described above, the CCM mode of operation combines counter (CTR) mode encryption with cipher block chaining message authentication code (CBC-MAC). A device encrypting in CCM mode takes a plaintext message M, additional authenticated data D, a nonce N, and a key K. An initial authentication block B0 is generated from flags, the nonce N, and the length of message M in bytes (“l(M)”). 128-bit blocks B1, . . . , Bn are formed from the additional authenticated data D and the plaintext message M. The authentication using CBC-MAC is performed as per Equation (10) below, where Xi is the ith output block of the forward cipher function using key K, T is the unencrypted authentication tag, m is the size in bytes of the field for unencrypted authentication tag T, and first-m-bytes(W) is a function that returns the first m bytes of W:
The message M and authentication tag T are then encrypted using CTR mode encryption. A key-stream of blocks Si is defined as Si=CIPHK(Ai) for i=0, 1, 2, . . . , where Ai is a block comprising flags, the nonce N, and counter i. S0 is used to generate encrypted authentication tag U, where U=T ⊕ first-m-bytes(S0). The message M is then encrypted by performing an XOR operation on the bytes of message M with the first l(M) bytes of the concatenation of S1, S2, . . . . Note that S0 is not used in the encryption of the message M.
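The CBC-MAC computation and the key-stream use described above can be sketched in Python as follows. Several simplifications are assumed and should not be read as the RFC 3610 encoding: the flag bytes 0x59 and 0x01, the 13-byte nonce with a 2-byte counter, and the zero-padded layout of the authentication blocks are illustrative choices, and `ciph` is a truncated-SHA-256 stand-in for CIPHK.

```python
import hashlib

def ciph(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for CIPH_K (NOT AES): truncated SHA-256.
    return hashlib.sha256(key + block).digest()[:16]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))   # truncates to the shorter input

def pad16(data: bytes) -> bytes:
    return data + bytes(-len(data) % 16)

def auth_blocks(nonce: bytes, d: bytes, msg: bytes) -> list:
    # Simplified, hypothetical B_0, B_1, ..., B_n formatting.
    b0 = bytes([0x59]) + nonce + len(msg).to_bytes(2, "big")
    body = pad16(d) + pad16(msg)
    return [b0] + [body[i:i + 16] for i in range(0, len(body), 16)]

def cbc_mac(key: bytes, blocks: list, m: int) -> bytes:
    x = ciph(key, blocks[0])                 # X_1 = CIPH_K(B_0)
    for b in blocks[1:]:
        x = ciph(key, xor(x, b))             # X_{i+1} = CIPH_K(X_i xor B_i)
    return x[:m]                             # T = first-m-bytes(X_{n+1})

def s_block(key: bytes, nonce: bytes, i: int) -> bytes:
    # S_i = CIPH_K(A_i), with A_i = flags || N || counter i (assumed layout)
    return ciph(key, bytes([0x01]) + nonce + i.to_bytes(2, "big"))

def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    n = (len(data) + 15) // 16
    stream = b"".join(s_block(key, nonce, i) for i in range(1, n + 1))
    return xor(data, stream)                 # S_0 is not used for the message

def ccm_encrypt(key, nonce, d, msg, m=8):
    t = cbc_mac(key, auth_blocks(nonce, d, msg), m)
    u = xor(t, s_block(key, nonce, 0))       # U = T xor first-m-bytes(S_0)
    return ctr_xor(key, nonce, msg), u

def ccm_decrypt(key, nonce, d, c, u, m=8):
    msg = ctr_xor(key, nonce, c)
    t = xor(u, s_block(key, nonce, 0))
    if t != cbc_mac(key, auth_blocks(nonce, d, msg), m):   # compare T with T'
        raise ValueError("authentication failed")
    return msg

key, nonce = b"0123456789abcdef", bytes(range(13))
c, u = ccm_encrypt(key, nonce, b"header", b"a short secret message")
```

A production implementation would also compare the tags in constant time; the plain comparison here keeps the sketch short.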
A device decrypting in CCM mode takes ciphertext message C, additional authenticated data D, nonce N, and key K. The key-stream of blocks Si is generated as described above and used for adding to tag U and ciphertext message C to produce unencrypted tag T and plaintext message M. The corresponding CBC-MAC is then recomputed to generate T′, which is compared to T to authenticate plaintext message M and additional authenticated data D.
Novel systems and methods would be useful, which (1) allow greater flexibility with multiple operational modes and data streams and (2) do not require significant additional resources, such as integrated-circuit (IC) floor space in a hardware implementation.
One embodiment of the invention can be a multi-mode cryptography (MM-C) module. The MM-C module is adapted to process an input string-data block using corresponding key data and corresponding mask data to generate an output string-data block. The MM-C module comprises (a) a data-stream (D-S) processing module adapted to process a corresponding input data block in accordance with the corresponding key data and corresponding mask data to generate an output data block, wherein the input data block is derived from at least one of the corresponding input string-data block and the corresponding mask data, (b) a key expansion and selection (E&S) module adapted to provide the corresponding key data to the D-S processing module, (c) a mask generation/updating (G/U) module adapted to provide the corresponding mask data to the D-S processing module, and (d) a controller adapted to control operations of the D-S processing module, the E&S module, and the G/U module such that the MM-C module processes, in an interleaved manner, a first data stream in a first cryptographic mode and a second data stream in a second cryptographic mode.
Another embodiment of the invention can be a multi-mode cryptography (MM-C) method for processing input string-data blocks using corresponding key data and corresponding mask data to generate output string-data blocks. The method comprises, for each corresponding input data block (a) providing the corresponding key data and the corresponding mask data and (b) processing the corresponding input data block in accordance with the corresponding key data and corresponding mask data to generate an output data block, wherein the input data block is derived from at least one of the corresponding input string-data block and the corresponding mask data. The method comprises processing, in an interleaved manner, a first data stream in a first cryptographic mode and a second data stream in a second cryptographic mode.
Yet another embodiment of the invention can be a storage controller comprising a multi-mode cryptography (MM-C) module. The MM-C module is adapted to process an input string-data block using corresponding key data and corresponding mask data to generate an output string-data block. The MM-C module comprises (a) a data-stream (D-S) processing module adapted to process a corresponding input data block in accordance with the corresponding key data and corresponding mask data to generate an output data block, wherein the input data block is derived from at least one of the corresponding input string-data block and the corresponding mask data, (b) a key expansion and selection (E&S) module adapted to provide the corresponding key data to the D-S processing module, (c) a mask generation/updating (G/U) module adapted to provide the corresponding mask data to the D-S processing module, and (d) a controller adapted to control operations of the D-S processing module, the E&S module, and the G/U module such that the MM-C module processes, in an interleaved manner, a first data stream in a first cryptographic mode and a second data stream in a second cryptographic mode.
Other aspects, features, and advantages of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which like reference numerals identify similar or identical elements.
DMA controllers generally coordinate data transfers between a computer system's storage devices and the computer system's memory or external devices. DMA engine 101 of storage controller 100 also provides encryption and decryption functionality for data stored in the one or more storage devices. DMA engine 101 comprises multimode AES (MM-AES) module 108, which performs encryption and decryption of string data read from or written to the one or more storage devices. DRAM controller 102 interfaces with the computer-system DRAM. SAS core 103 interfaces with the one or more storage devices. PCI-E core 104 interfaces with the computer-system motherboard. Internal RAM 105 is for use by the components of storage controller 100. CPU 106 is the controller for the components of storage controller 100. Peripheral controller 107 interfaces with peripheral devices such as input/output (I/O) devices.
After processing each block of a data stream, other than the final string-data block of the data stream, MM-AES module 108 generates or updates an internal state associated with the data stream, where the internal state is data-stream data needed to process the next string-data block of the data stream. Note that some processing modes, such as ECB mode, do not require an internal state and, therefore, (1) have empty internal states, (2) have no internal states, or (3) effectively ignore their internal states. MM-AES module 108 is thus able to process a first set of one or more blocks of a first data stream, then process one or more blocks of a second data stream, and then process a second set of one or more subsequent blocks of the first data stream, where data generated during the processing of the first set of blocks is saved and then used to process the second set of blocks. This is an example of interleaved processing.
MM-AES module 108 is capable of interleaved processing of up to eight different data streams. As noted above, interleaved processing allows MM-AES module 108 to, for example, begin processing a first data stream and then begin processing a second data stream before completing processing of the first data stream. Each data stream has (i) a corresponding AES operational mode, including selection between encryption and decryption, and (ii) a corresponding AES key. Note that the respective data streams may have the same or different operational modes and/or keys. Many AES modes are also associated with a corresponding mask. As used herein, unless otherwise indicated, a mask for a data stream being processed using a particular AES mode refers to mode-specific intermediate information (e.g., the above-described internal state) needed to allow continuing processing of the data stream in accordance with the AES mode if processing is interrupted. Masks may include information such as, for example, counter blocks, previous output blocks, hash data, and/or length data. MM-AES module 108 is adapted to store up to eight masks and eight AES keys, each identifiable by a three-bit identifier.
Note that, in some AES modes, an internal state undergoes one or more transformations between the processing of two consecutive string-data blocks. For example, in CTR-mode encryption, the counter block used for a previous plaintext data block is incremented and enciphered before being added to a present plaintext data block to generate a present ciphertext output block. MM-AES module 108 increments the counter block as part of the processing of the previous block and stores the incremented counter block as the mask. When the present string-data block is processed, the mask is input to the forward AES function, and the result is added to the present string-data block to generate the ciphertext output block for the present string-data block. In general, MM-AES module 108 performs mask pre-processing up to, but not including, AES forward/inverse ciphering. Thus, for example, in the XTS-mode processing of a previous string-data block, the previous string-data block's TT string block is multiplied by α to generate the present string-data block's TT string block, which is stored as the mask for use in processing a present string-data block. Note that, in alternative implementations of MM-AES module 108, different degrees of pre-processing may be performed between consecutive string-data blocks of a data stream.
MM-AES module 108 receives string-data blocks in series, with each string-data block accompanied by a corresponding key ID and mask ID. For a given data stream, the corresponding key is provided via key data path 201b concurrently with (i) the data stream's first data block (which, like all data blocks, is provided via path 201a) and (ii) the corresponding key ID, which is provided via control line 201c. Subsequent data blocks of that given stream are accompanied by the key ID, provided via control line 201c, but do not need the key itself. In other words, a data stream's key does not need to be provided more than once. Each received string-data block is separately processed based on its corresponding key and mask. Since different data streams may be interleaved, consecutively received string-data blocks may be part of the same data stream or of different data streams. For data streams where the processing of a subsequent string-data block depends on a result of processing of the previous string-data block (e.g., for CBC encryption), a corresponding mask is cached by MM-AES module 108 for future use by the subsequent data block of that same data stream. For data streams that require initialization vectors, those vectors are supplied as data blocks via data path 201a, with an appropriate corresponding control instruction on control line 201c. Any other required operational masks may be generated and updated internally and on the fly by MM-AES module 108 during processing of the corresponding data stream.
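The interplay of key IDs, mask IDs, and cached masks described above can be modeled in software with the following Python sketch. It is a behavioral illustration only, not a description of the hardware: the class name `InterleavedCtr`, its methods, and the whole-block counter increment are hypothetical, only CTR-mode streams are modeled, and `ciph` is a truncated-SHA-256 stand-in for the forward AES function.

```python
import hashlib

def ciph(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for the forward cipher (NOT AES).
    return hashlib.sha256(key + block).digest()[:16]

def inc(block: bytes) -> bytes:
    # Assumed incrementing rule: whole block as a 128-bit big-endian counter.
    return ((int.from_bytes(block, "big") + 1) % (1 << 128)).to_bytes(16, "big")

class InterleavedCtr:
    """Toy model of eight key slots and eight mask slots (three-bit IDs)."""
    SLOTS = 8

    def __init__(self):
        self.keys = [None] * self.SLOTS
        self.masks = [None] * self.SLOTS

    def start(self, key_id: int, mask_id: int, key: bytes, first_counter: bytes):
        # The key and initial counter are supplied once, with the first block.
        self.keys[key_id] = key
        self.masks[mask_id] = first_counter

    def process(self, key_id: int, mask_id: int, block: bytes) -> bytes:
        counter = self.masks[mask_id]
        out = bytes(a ^ b for a, b in zip(block, ciph(self.keys[key_id], counter)))
        # Pre-process the mask now so it is ready when this stream resumes.
        self.masks[mask_id] = inc(counter)
        return out

engine = InterleavedCtr()
engine.start(0, 0, b"key-for-stream-A", bytes(16))
engine.start(1, 1, b"key-for-stream-B", b"\x01" * 16)
a = [b"A1" * 8, b"A2" * 8]
b = [b"B1" * 8, b"B2" * 8]
# Blocks arrive interleaved: A, B, A, B.
mixed = [engine.process(0, 0, a[0]), engine.process(1, 1, b[0]),
         engine.process(0, 0, a[1]), engine.process(1, 1, b[1])]
```

Processing the same blocks one stream at a time yields identical outputs, since each stream's result depends only on its own key and cached mask.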
MM-AES module 108 comprises interface and buffers (“I&B”) module 201 and multimode AES encoding/decoding (“E/D”) module 202. I&B module 201 receives (1) string data, e.g., plaintext and ciphertext, via data path 201a, (2) keys via data path 201b, and (3) control instructions via control line 201c. I&B module 201 functions to synchronize and control the provision of these data, keys, and instructions to MM-AES E/D module 202. The control instructions received via control line 201c indicate, for example, whether the associated string data is to be encrypted or decrypted, which AES mode to use, and identify the data stream to which the particular string data belongs. I&B module 201 provides to MM-AES E/D module 202 (1) string data via data path 201g and (2) keys via data path 201h. I&B module 201 also passes through the control instructions on control line 201c. In addition, I&B module 201 caches keys for use by MM-AES E/D module 202.
MM-AES E/D module 202 outputs (1) processed, i.e., encrypted or decrypted, string data via data path 202a, (2) status flags, such as module state, key status, and mask status, via data path 202b, and (3) any error flags via data path 202c. Authentication tags, when their provision is necessary, are provided via data path 202a, with an appropriate corresponding indicator status flag (e.g., “tag_valid”) on data path 202b. It should be noted that individual data paths may represent, in varied physical implementations, discrete, independent data buses or portions of shared data buses. Data paths 201a, 201g, and 202a are 128-bit buses able to accommodate all the bits of a single entire AES data block in parallel. Data paths 201b and 201h are 256-bit buses able to accommodate all the bits of an entire AES key (whether 128-, 192-, or 256-bit) in parallel. Note that some compound keys, such as key K in LRW mode, which comprises K1 and K2, may be provided as two separate keys.
MM-AES E/D module 202 comprises controller 203, mask generation/updating (“G/U”) module 204, data-stream (D-S) processing module 205, and key expansion and selection (“E&S”) module 206. Controller 203 receives control signal 201c and outputs control signals 203a, 203b, and 203c to mask G/U module 204, D-S processing module 205, and key E&S module 206, respectively. Modules 204, 205, and 206 output (1) stream status flags via data path 202b and (2) error flags via data path 202c.
Key expansion and selection module 206 receives keys via data path 201h and provides the expanded and selected keys to data-stream processing module 205 via data path 206a. D-S processing module 205 also receives string data via data path 201g and outputs processed string data via data path 202a. D-S processing module 205 may provide data, such as processed string data, to mask generation/updating module 204 via path 205a and may receive data (e.g., masks for processing data, initialization vectors, and string data) from mask G/U module 204 via data path 204a. Note that initialization vectors may either be generated by, or simply passed through by, mask G/U module 204 for provision to D-S processing module 205. Mask G/U module 204 may receive string data via data path 201g and may receive keys via key path 201h.
Controller 203 controls the operation of the other modules of MM-AES E/D module 202 via control lines 203a, 203b, and 203c so that appropriate processing is performed on the input string data and corresponding keys. Controller 203 includes a finite state machine (FSM) for the control of modules 204, 205, and 206 and the operation flow of E/D module 202. Mask G/U module 204 handles the masks by, for example, (i) performing Galois-field multiplications and other operations to generate or update masks, (ii) storing masks for the various data streams being processed by MM-AES module 108, and (iii) storing initialization vectors (“IVs”) when needed. Note that mask G/U module 204 may also store and/or pass through to D-S processing module 205 input string data or other data. D-S processing module 205 performs the mode-specific AES encrypting and decrypting (e.g., by performing the forward or inverse AES cipher function) using (i) the input string-data block, the mask, and/or IV and (ii) the expanded key provided by key E&S module 206. Note that, depending on the processing mode, the AES cipher function may be applied to either the input string-data block or the corresponding mask.
Key E&S module 206 synchronously expands the key corresponding to an input string-data block and provides the appropriate corresponding key data for the AES-round processing performed by D-S processing module 205. In other words, for each round of performing the AES cipher function, key E&S module 206 provides the appropriate segment of the expanded key schedule to D-S processing module 205. Since each input string-data block is received with a corresponding key and is not necessarily preceded in processing by a string-data block from the same data stream, key E&S module 206 dynamically performs this expansion and provision for each input string-data block.
As noted above, MM-AES module 108 processes each received input string-data block individually. Each string-data block received via path 201a is accompanied by (a) the corresponding key on path 201b and (b) the corresponding instructions indicating processing mode on control line 201c. This allows MM-AES module 108 to process multiple data streams where the different streams use different keys (including keys of different lengths) and/or different processing modes (including selecting encryption or decryption). It should be noted that MM-AES module 108 may receive commands via control line 201c without corresponding data blocks on path 201a. Also, as indicated elsewhere herein, data blocks other than input string-data blocks may be received on path 201a. Similarly, as indicated elsewhere herein, data blocks other than output string-data blocks may be output on path 202a.
Mask generation/updating module 204 stores a mask for each particular stream requiring a mask so that, if MM-AES module 108 returns to processing that stream, then mask G/U module 204 has available the prior mask for processing the next string-data block. MM-AES module 108 can also be updated to properly process (e.g., encrypt or decrypt) string-data blocks in accordance with new AES or other modes of operation not described above, including future modes of operation not yet invented.
Note that one optional mode of processing is transparent processing, also called bypassing, where data blocks are pipelined through MM-AES module 108 without encryption or decryption. Transparent mode may be used to simplify processing of a sequence of mixed data blocks having blocks that do not require encryption/decryption by avoiding both (i) extracting data blocks out of the sequence and (ii) creating bypass mechanisms. Transparent mode may also be used to implement non-encrypting authentication protocols, such as the IEEE media access control (“MAC”) security (MACsec) protocol, described in the IEEE 802.1AE Standard, incorporated herein by reference in its entirety.
Storage controller 100 of
The ability of MM-AES module 108 to externally store and read masks also provides added flexibility to operational modes, such as XTS-AES, that use ciphertext-stealing to encrypt or decrypt data streams whose respective final blocks are not the standard length (e.g., 128 bits). As explained above, ciphertext stealing uses joint processing of the final two blocks of the data stream for encryption and decryption. MM-AES module 108 may store one of the final processed blocks in mask G/U module 204. Alternatively, MM-AES module 108 may store one of the final pair of processed blocks in the above-described external storage space of storage controller 100.
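The joint processing of the final two blocks can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the described hardware: it follows the ciphertext-stealing arrangement commonly described for XTS, the function names are hypothetical, and `toy_enc`/`toy_dec` are stand-ins that are NOT ciphers and exist only so the example runs. The intermediate value `cc` plays the role of the processed block that is held while the other block of the pair is processed.

```python
BLOCK = 16  # bytes per full block (128 bits)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cts_encrypt_tail(enc, p_full, p_part, mask_prev, mask_last):
    """Encrypt the last full block and the trailing partial block together.

    enc(block, mask) -> block is a caller-supplied masked block encryption.
    Returns (full ciphertext block, partial ciphertext block).
    """
    b = len(p_part)
    if len(p_full) != BLOCK or not 0 < b < BLOCK:
        raise ValueError("need one full block and one non-empty partial block")
    cc = enc(p_full, mask_prev)               # held until the next step
    c_part = cc[:b]                           # becomes the short final block
    c_full = enc(p_part + cc[b:], mask_last)  # pad with the "stolen" bytes
    return c_full, c_part


def cts_decrypt_tail(dec, c_full, c_part, mask_prev, mask_last):
    """Invert cts_encrypt_tail; note that the masks are used in swapped order."""
    b = len(c_part)
    pp = dec(c_full, mask_last)
    p_part = pp[:b]
    p_full = dec(c_part + pp[b:], mask_prev)
    return p_full, p_part


def toy_enc(block, mask):
    # Stand-in only: XOR mask, rotate bytes left by one, XOR mask again.
    x = _xor(block, mask)
    return _xor(x[1:] + x[:1], mask)


def toy_dec(block, mask):
    # Inverse of toy_enc.
    y = _xor(block, mask)
    return _xor(y[-1:] + y[:-1], mask)


if __name__ == "__main__":
    m1, m2 = bytes([0xAA]) * BLOCK, bytes([0x55]) * BLOCK
    c_full, c_part = cts_encrypt_tail(toy_enc, bytes(range(16)), b"tail!", m1, m2)
    print(cts_decrypt_tail(toy_dec, c_full, c_part, m1, m2))
```

The total ciphertext length equals the total plaintext length, which is the length-preserving property that makes ciphertext stealing attractive for storage devices.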
Table 1, below, shows exemplary processing commands for XTS-mode AES encryption of four separate consecutive string-data blocks of a single data stream. Note that, as described above, XTS (like LRW mode) uses a key K that comprises two independently used keys, K1 and K2, where K2 is used to encrypt the tweak and K1 is used to encrypt an intermediate block resulting from the addition of an input string-data block and a corresponding mask.
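As a rough illustration of how the masks for four consecutive blocks might relate to one another, the sketch below derives each mask from the previous one by a Galois-field multiplication and applies it on both sides of a block encryption. It assumes the little-endian byte convention and reduction constant 0x87 commonly used for XTS, and it takes the tweak already encrypted under K2 as an input. The names `xts_next_mask`, `xts_masks`, and `xex_block` are hypothetical, and `enc_k1` is any caller-supplied 16-byte block encryption under K1.

```python
MASK_128 = (1 << 128) - 1


def xts_next_mask(mask: bytes) -> bytes:
    """Multiply a 16-byte mask by the primitive element of GF(2^128)."""
    n = int.from_bytes(mask, "little") << 1
    if n >> 128:
        # Reduce modulo x^128 + x^7 + x^2 + x + 1.
        n = (n & MASK_128) ^ 0x87
    return n.to_bytes(16, "little")


def xts_masks(encrypted_tweak: bytes, count: int):
    """Return the masks for `count` consecutive blocks of one data unit."""
    masks, m = [], encrypted_tweak
    for _ in range(count):
        masks.append(m)
        m = xts_next_mask(m)
    return masks


def xex_block(enc_k1, block: bytes, mask: bytes) -> bytes:
    """Add the mask, encrypt the intermediate block under K1, add the mask again."""
    intermediate = bytes(a ^ b for a, b in zip(block, mask))
    encrypted = enc_k1(intermediate)
    return bytes(a ^ b for a, b in zip(encrypted, mask))


if __name__ == "__main__":
    # The "encrypted tweak" here is an arbitrary placeholder value.
    for m in xts_masks(bytes([0x01] + [0x00] * 15), 4):
        print(m.hex())
```

Only the first mask requires a cipher invocation; each later mask is an update of its predecessor, which is why storing the prior mask per stream is sufficient to resume processing.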
Table 2, below, shows exemplary processing commands for GCM-mode AES encryption of 3 blocks of input data. Note that the final block may be incomplete.
Data blocks and corresponding keys are processed in parallel data pipelines by MM-AES module 108. The particular way that an implementation of MM-AES module 108 is configured may determine the average number of clock cycles that MM-AES module 108 will require to process a data block. For a hardware implementation, there are typically trade-offs between processing speed and circuit size. A larger circuit, as in a fully-unrolled implementation, would generally be able to start processing an entire incoming string-data block every single clock cycle. In other words, while the pipeline is full, the fully-unrolled implementation processes 14 blocks at a time, each block in a different stage of processing.
Unrolling is a hardware-implementation technique for adding hardware components to allow for faster average processing through pipelining. As would be appreciated by one of ordinary skill in the art, various degrees of unrolling are possible in implementing a device in hardware, where less unrolling saves integrated-circuit (IC) floor space at the expense of processing speed and more unrolling increases processing speed at the expense of greater IC floor space. Meanwhile, a smaller circuit with no unrolling may process a single data block at a time, i.e., without concurrently processing other data blocks. Intermediate-level circuits, such as a half-unrolled implementation, may take up less floor space than the larger fully-unrolled circuit and require fewer clock cycles on average per data block than the smaller not-unrolled circuit.
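The speed side of this trade-off can be approximated with a simple model. The sketch below is a simplification under assumptions that are not taken from the description: every segment performs one round per clock cycle, the pipeline never stalls, and a block re-enters the first segment once for each additional pass. The function names are hypothetical.

```python
from math import ceil


def avg_cycles_per_block(rounds: int, segments: int) -> int:
    """Clock cycles between accepted blocks while the pipeline stays full.

    Under the stated assumptions, the entry segment is occupied once per
    pass, so each block costs ceil(rounds / segments) entry slots.
    """
    if rounds < 1 or segments < 1:
        raise ValueError("rounds and segments must be positive")
    return ceil(rounds / segments)


def blocks_in_flight(segments: int) -> int:
    """Number of blocks being processed concurrently in a full pipeline."""
    return segments


if __name__ == "__main__":
    designs = (("fully unrolled", 14), ("half unrolled", 7), ("not unrolled", 1))
    for name, segments in designs:
        costs = [avg_cycles_per_block(r, segments) for r in (10, 12, 14)]
        print(f"{name:15s} segments={segments:2d} cycles/block (10,12,14 rounds)={costs}")
```

In this model a fully-unrolled design accepts one block per cycle, a half-unrolled design one block every two cycles, and a design with no unrolling one block every 10, 12, or 14 cycles depending on the key length.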
Each of the various pipelines comprises seven linearly connected segments to correspond with the seven linearly connected segments of the main data-processing path. As used herein, when referring to a plurality of segments, the term “linearly connected” refers to a plurality of segments that forms a pipeline and includes a first segment, a last segment, and zero or more intermediate segments, where (i) the first segment is connected to a subsequent segment, (ii) the last segment is connected to a preceding segment, and (iii) each intermediate segment is connected to both a preceding segment and a subsequent segment. Note that additional connections between segments (e.g., feedback connections) are possible. Along with the preliminary AddRoundKey( ) operation, a data block begins its transformation, e.g., round one of the AES transformation, in the first segment of the data-processing path. The data block then proceeds through segments 2 to 7, e.g., the second to seventh rounds. After segment 7, the processed block is fed back to the first segment for further processing, e.g., the eighth round. Note that, therefore, segment 1 is adapted to receive both (a) new input data blocks and (b) feedback data blocks, but only one for further processing in any particular clock cycle. Depending on the AES key used, an input data block may be fully processed in 10, 12, or 14 rounds. It should be noted that, in some AES modes, additional operations, such as XOR operations, may be performed after the AES round transformations are completed but before providing an output block via path 202a.
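The correspondence between AES rounds and pipeline segments described above can be written out as a small mapping. This is a sketch under the assumption that each round occupies exactly one segment in order; it does not model pass-through of a transitory block, the preliminary AddRoundKey( ) operation, or the post-round operations performed before output. The names `round_location` and `schedule` are hypothetical.

```python
SEGMENTS = 7
ROUNDS_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}


def round_location(round_number: int):
    """Return (pass, segment), both 1-based, for a 1-based AES round number."""
    if round_number < 1:
        raise ValueError("round numbers start at 1")
    pass_index, segment_index = divmod(round_number - 1, SEGMENTS)
    return pass_index + 1, segment_index + 1


def schedule(key_bits: int):
    """List the (pass, segment) visited by every round for a given key length."""
    rounds = ROUNDS_BY_KEY_BITS[key_bits]
    return [round_location(r) for r in range(1, rounds + 1)]


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(bits, schedule(bits))
```

Round eight maps to pass 2, segment 1, which corresponds to the feedback from segment 7 to the first segment.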
I&B module 201 of MM-AES module 108 comprises data synchronizer 301 and key-cache module 302. Data synchronizer 301 comprises a plurality of registers that cache the data provided on data path 201a for timely (i.e., synchronous) provision to E/D module 202 via data path 201g along with the corresponding command instructions provided to controller 203 via control path 201c. Key-cache module 302 stores up to eight AES keys corresponding to the up-to-eight data streams that may be concurrently processed by MM-AES module 108. Key-cache module 302 provides information about the lengths of its cached keys to controller 203 via path 302a. Controller 203 uses the key-length information in its control of key E&S module 206 and other modules.
Mask G/U module 204 comprises multipliers module 303 and registers module 304. Multipliers module 303 performs binary Galois-field multiplications for generating and updating masks (e.g., in XTS, LRW, and GCM modes). Registers module 304 caches the generated and/or updated masks and provides them, as needed, to data-stream processing module 205. Registers module 304 may also store initialization vectors (IVs) for processing modes that use IVs (e.g., CBC and CTR modes).
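A binary Galois-field multiplication of the kind used for GCM can be sketched in a few lines. The sketch below assumes the bit ordering and reduction constant of the GCM specification (NIST SP 800-38D), in which the leftmost bit of a block is the coefficient of x^0. It is a bit-serial software illustration, not the structure of multipliers module 303, and the name `gf128_mul` is hypothetical. Blocks are represented as 128-bit integers obtained from big-endian bytes.

```python
R = 0xE1 << 120          # reduction constant: 11100001 followed by 120 zero bits
ONE = 1 << 127           # the field element "1" in GCM bit ordering


def gf128_mul(x: int, y: int) -> int:
    """Multiply two 128-bit blocks in GF(2^128) using GCM conventions."""
    if not (0 <= x < 1 << 128 and 0 <= y < 1 << 128):
        raise ValueError("operands must be 128-bit values")
    z, v = 0, y
    for i in range(128):
        # Scan the bits of x from leftmost to rightmost.
        if (x >> (127 - i)) & 1:
            z ^= v
        # Multiply v by the field variable, reducing when a bit falls off.
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


if __name__ == "__main__":
    h = int.from_bytes(bytes(range(16)), "big")   # placeholder value
    print(hex(gf128_mul(h, ONE)))
```

A hardware multiplier would typically trade the 128-iteration loop for parallel logic, in the same way that unrolling trades circuit size for speed.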
Controller 203 comprises shift register 305 and FSM-based controller 306, which are parallel pipelines to the processing pipelines of data-stream processing module 205 and key E&S module 206. Shift register 305 comprises seven segments (not shown) and is used to synchronize commands and related information with their corresponding data blocks as those data blocks are processed through the data-processing pipeline of data-stream processing module 205. When needed, command and related information blocks are looped from the last segment of shift register 305 to the first segment of shift register 305 via feedback path 305a. Controller 306 comprises seven segments (not shown), each of which (1) receives commands and related data from a corresponding segment of shift register 305 and (2) controls corresponding segments in data-stream processing module 205 and key E&S module 206 based on those received commands and related data. Each segment of controller 306 accesses a block's set of commands and related information from a corresponding segment of shift register 305 via path 305b, which comprises paths 305(1)b-305(7)b. Controller 306 may provide feedback to shift register 305 via a feedback path (not shown). Control path 203c to key E&S module 206 comprises seven paths 203(1)c-203(7)c, each going from a segment of controller 306 to a corresponding segment of key E&S module 206. Control path 203b to data-stream processing module 205 comprises constituent control lines 306(1)a-306(7)a and 306b.
Since certain operations may need to be performed only once for a data block (including, e.g., feedback operations resulting from the half-unrolled architecture) or only once for an entire data stream, the first segment of controller 306 may be dedicated to orchestrating those operations in addition to the regular AES-round operations, while the other six segments of controller 306 orchestrate only the regular AES-round operations. Alternative embodiments may have several or all of the segments of controller 306 capable of orchestrating any operations of mask G/U module 204, data-stream processing module 205, and/or key E&S module 206.
Key E&S module 206 comprises seven segments, 206(1)-206(7), which together form a parallel pipeline to the main data-processing pipeline of data-stream processing module 205. The pipelines are generally implemented as 128-bit wide pathways comprising interconnected logic gates (including latches and/or flip-flops) transforming the data as it goes from segment to segment. Key E&S module 206 receives an AES key from key cache 302 via path 201h and then performs key expansions synchronously for each round of AES processing of the corresponding data block. The appropriate segment of the dynamically expanded key schedule is provided to a corresponding segment in data-stream processing module 205 via one of paths 206(1)a-206(7)a. While a data block is being processed, the corresponding AES-key data moves from one segment of key E&S module 206 to the next with each round, looping, when needed, from segment 206(7) to segment 206(1) via path 206b.
Data-stream processing module 205 comprises shift register 307 and main data-processing block 308. Shift register 307 comprises seven segments (not shown) and functions as a parallel pipeline of auxiliary data corresponding to data blocks in main data-processing block 308 for keeping the corresponding auxiliary data with the data block for use as needed (e.g., for XOR operations before and/or after AES processing of the data block). Depending on the processing mode for a particular string-data block, the string-data block may be (i) processed via main data-processing block 308 with mask data as auxiliary data moving in parallel in shift register 307 or (ii) moving, as the auxiliary data, in shift register 307 in parallel with the processing of the corresponding mask data in main data-processing block 308. Auxiliary data for processing the corresponding string-data block is provided by shift register 307 via path 307a to segment 308(1) of main data-processing block 308. Shift register 307 is controlled by controller 306 via path 306b. When needed, auxiliary-data blocks are looped from segment 7 of shift register 307 to segment 1 of shift register 307 via path 307b.
Data-processing block 308 is the main pipelined data path for encrypting or decrypting data blocks and comprises seven segments, each controlled by controller 306 via one of 306(1)a-306(7)a. The topmost segment, segment 308(1), receives either a new input block from I&B module 201 via path 201g or a fed-back partly-processed data block from segment 308(7) via path 308a. Each segment comprises the hardware circuitry for performing the transformations of one round of the AES algorithm, encryption or decryption, on the received data block, using the corresponding key segment in key E&S module 206 and command segment in controller 306. Each segment 308(i) is adapted to perform one round of a cipher-function transformation on a transitory data block. As used here, the term “transitory block” refers to the state of a data block in any round of a cipher-function transformation. Note that, depending on control instructions from controller 203, a segment 308(i) may simply pass through a transitory data block without transforming the transitory data block.
Segment 308(1) additionally comprises circuitry for performing (1) preliminary-round processing, (2) first-round processing, (3) last-round processing, and (4) auxiliary-data processing, whether before or after the rounds of AES processing. Segment 308(1) is adapted to both start processing a new input data block and finish processing a fed-back processed data block for output via path 202a. Note that, in alternative implementations, different and/or additional segments of data-processing block 308 may be adapted to perform auxiliary-data processing and/or output the output block via path 202a.
An embodiment of the invention has been described where MM-AES module 108 of
An implementation of MM-AES module 108 of
An embodiment of the invention has been described wherein for each string-data block processed by MM-AES module 108 of
An embodiment of the invention has been described as comprising an MM-AES module. The invention is not limited to systems using AES. Alternative embodiments use different symmetric block ciphers such as, for example, DES and triple-DES. The generic term multimode cryptography (MM-C) module is used to refer to modules embodying the invention regardless of the particular symmetric block cipher used.
Unless indicated otherwise, the term “determine” and its variants as used herein refer to obtaining a value through measurement and, if necessary, transformation. For example, to determine an electrical-current value, one may measure a voltage across a current-sense resistor, and then multiply the measured voltage by an appropriate value to obtain the electrical-current value. If the voltage passes through a voltage divider or other voltage-modifying components, then appropriate transformations can be made to the measured voltage to account for the voltage modifications of such components and to obtain the corresponding electrical-current value.
As used herein in reference to data transfers between entities in the same device, and unless otherwise specified, the term “receive” and its variants can refer to receipt of the actual data, or the receipt of one or more pointers to the actual data, wherein the receiving entity can access the actual data using the one or more pointers.
Exemplary embodiments have been described wherein particular entities (a.k.a. modules) perform particular functions. However, the particular functions may be performed by any suitable entity and are not restricted to being performed by the particular entities named in the exemplary embodiments.
Exemplary embodiments have been described with data flows between entities in particular directions. Such data flows do not preclude data flows in the reverse direction on the same path or on alternative paths that have not been shown or described. Paths that have been drawn as bidirectional do not have to be used to pass data in both directions.
As used herein, the term “cache” and its variants refer to a dynamic computer memory that is preferably (i) high-speed and (ii) adapted to have its present contents repeatedly overwritten with new data. To cache particular data, an entity can have a copy of that data stored in a determined location, or the entity can be made aware of the memory location where a copy of that data is already stored. Freeing a section of cached memory allows that section to be overwritten, making that section available for subsequent writing, but does not require erasing or changing the contents of that section.
References herein to the verb “to generate” and its variants in reference to information or data do not necessarily require the creation and/or storage of new instances of that information. The generation of information could be accomplished by identifying an accessible location of that information. The generation of information could also be accomplished by having an algorithm for obtaining that information from accessible other information.
As used herein in reference to an element and a standard, the term “compatible” means that the element communicates with other elements in a manner wholly or partially specified by the standard, and would be recognized by other elements as sufficiently capable of communicating with the other elements in the manner specified by the standard. The compatible element does not need to operate internally in a manner specified by the standard.
The present invention may be implemented as circuit-based processes, including possible implementation as a single integrated circuit (such as an ASIC or an FPGA), a multi-chip module, a single card, or a multi-card circuit pack. As would be apparent to one skilled in the art, various functions of circuit elements may also be implemented as processing steps in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer.
The present invention can be embodied in the form of methods and apparatuses for practicing those methods. The present invention can also be embodied in the form of program code embodied in tangible media, such as magnetic recording media, optical recording media, solid state memory, floppy diskettes, CD-ROMs, hard drives, or any other non-transitory machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of program code, for example, stored in a non-transitory machine-readable storage medium including being loaded into and/or executed by a machine, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.
It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the scope of the invention as expressed in the following claims.
Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments necessarily mutually exclusive of other embodiments. The same applies to the term “implementation.”
Unless explicitly stated otherwise, each numerical value and range should be interpreted as being approximate as if the word “about” or “approximately” preceded the value or range. As used in this application, unless otherwise explicitly indicated, the term “connected” is intended to cover both direct and indirect connections between elements.
For purposes of this description, the terms “couple,” “coupling,” “coupled,” “connect,” “connecting,” or “connected” refer to any manner known in the art or later developed in which energy is allowed to be transferred between two or more elements, and the interposition of one or more additional elements is contemplated, although not required. The terms “directly coupled,” “directly connected,” etc., imply that the connected elements are either contiguous or connected via a conductor for the transferred energy.
The use of figure numbers and/or figure reference labels in the claims is intended to identify one or more possible embodiments of the claimed subject matter in order to facilitate the interpretation of the claims. Such use is not to be construed as limiting the scope of those claims to the embodiments shown in the corresponding figures.
The embodiments covered by the claims in this application are limited to embodiments that (1) are enabled by this specification and (2) correspond to statutory subject matter. Non-enabled embodiments and embodiments that correspond to non-statutory subject matter are explicitly disclaimed even if they fall within the scope of the claims.
Although the steps in the following method claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those steps, those steps are not necessarily intended to be limited to being implemented in that particular sequence.