This application is a national stage filing under 35 U.S.C. §371 of International Application No. PCT/JP2011/074468, filed on Oct. 24, 2011, which claims priority to Japanese Application No. 2010-274807 filed on Dec. 9, 2010. Each of these applications is incorporated herein by reference in its entirety.
The present invention is related to an encryption processing device, an encryption processing method, and a program. More particularly, the present invention is related to an encryption processing device, an encryption processing method, and a program executing shared key block ciphers with a Feistel structure or generalized Feistel structure.
As information-oriented societies progress, the demand for information security technologies to safely protect the information used continues to increase. One configuration element of information security technologies is encryption technology, and these encryption technologies are currently used in various products and systems.
There are many different encryption processing algorithms, and one basic example of such a technology is what is known as a shared key block cipher. In a shared key block cipher, the encryption key and the decryption key are the same shared key. During encryption and decryption processing, multiple keys are generated from the shared key, and data transformation processing is repeatedly executed in units of block data sizes such as 64-bit, 128-bit, and 256-bit block units.
Representative shared key block cipher algorithms include the DES (Data Encryption Standard), which was the previous US standard, and the AES (Advanced Encryption Standard), which is the current US standard. Many other shared key block ciphers exist and are still being proposed even now, and CLEFIA, proposed by Sony Corporation in 2007, is also a shared key block cipher.
This kind of shared key block cipher algorithm is mainly configured with an encryption processing unit including a round function executing unit for repeatedly executing input data transformations and a key scheduling unit for generating round keys applied to each round of the round function unit. The key scheduling unit generates an expanded key with increased bit counts based on a master secret key (main key), and then generates round keys (secondary keys) applied for each round function unit of encryption processing based on the generated expanded key.
A commonly known specific structure for executing this kind of algorithm repeatedly executes a round function that includes a linear transformation unit and a non-linear transformation unit. The most common structures of this kind are the Feistel structure and the generalized Feistel structure. The Feistel structure and the generalized Feistel structure convert plaintext into ciphertext by a simple repetition of a round function that includes an F function functioning as a data transformation function. The F function executes linear transformation processing and non-linear transformation processing. Further details disclosing encryption processing applying the Feistel structure may be found in NPL 1 and NPL 2, for example.
The two types of embodiments of encryption algorithms include software embodiments and hardware embodiments. Implementing hardware embodiments with smaller circuit sizes leads to decreased costs and lower power consumption. For this reason, many implementation methods to reduce size have been proposed for both new and existing algorithms.
For example, Hamalainen, Alho, Hannikainen, Hamalainen, et al., propose an implementation to reduce the size of devices using the AES encryption algorithm with a Substitution Permutation Network (SPN) structure. Details on this implementation method to reduce size are disclosed in NPL 3 [Panu Hamalainen, Timo Alho, Marko Hannikainen, and Timo D. Hamalainen, Design and implementation of low-area and low-power aes encryption hardware core. In DSD, pages 577-583. IEEE Computer Society, 2006.9].
However, this implementation method to reduce size is applicable to processing sequences involving AES algorithms using the SPN structure, and so this results in a problem in which a sufficient reduction in size is not attainable when directly applied to DES and CLEFIA encryption algorithms with the previously described Feistel structure and a generalized Feistel structure, which are different from the SPN structure.
Further, the previously described AES encryption is an encryption algorithm using the SPN structure, and the DES encryption and the CLEFIA encryption are encryption algorithms using the Feistel structure and the generalized Feistel structure, which are different from the SPN structure. Details on these structures will be described in the following paragraphs.
The present invention has been made in light of the previously described situation, with the aims of providing an encryption processing device, encryption processing method, and a program to attain a reduction in size regarding encryption processing structures using the Feistel structure and generalized Feistel structure.
According to a first aspect of the present invention,
an encryption processing device includes
an encryption processing unit for dividing and inputting data to be processed in data blocks of a configured bit size into a plurality of lines, and repeatedly executing data transformation processing applying a round function on the data transferred to each line;
wherein the encryption processing unit includes
a register for storing the operation results from the operation unit,
and wherein the operation unit is configured to sequentially obtain the data from the register, perform an operation in the sequence in which the data was obtained, and store the results in the register,
and wherein the operation unit includes
and the matrix operation executing unit
executes, when a first cycle of a matrix operation is executed, an operation on the data in the second line during the execution of a matrix operation on the data in the first line.
Further, regarding an embodiment of the encryption processing device of the present invention, the matrix operation executing unit is configured to execute the matrix operation over a plurality of cycles on a plurality of data units sequentially output from an upstream non-linear transformation unit, and to perform an operation on the data in the second line in conjunction with the matrix operation on the unit data input from the non-linear transformation unit during a first cycle of the plurality of cycles.
Further, regarding an embodiment of the encryption processing device of the present invention, the encryption processing device is configured without an independent register for storing the data from the second line that is required for executing an operation on the data in the second line after the required operation cycle of the matrix operation on the data in the first line finishes, and is configured to use a register for storing the results of the matrix operation in progress on the data in the first line as a register for storing the data from the second line.
Further, regarding an embodiment of the encryption processing device of the present invention, the matrix operation executing unit performs an XOR operation on the matrix operation process data regarding the first line and the data in the second line during an initial cycle for executing the matrix operation on the data in the first line.
Further, regarding an embodiment of the encryption processing device of the present invention, the matrix operation executing unit is configured to execute the matrix operation applying a cyclic matrix or a Hadamard matrix.
Further, regarding an embodiment of the encryption processing device of the present invention, the encryption processing unit acts as an executing unit of the round function, including a non-linear transform unit for executing non-linear transformations, and a matrix operation executing unit functioning as a linear transformation unit for executing linear transformations applying a matrix.
Further, regarding an embodiment of the encryption processing device of the present invention, the matrix operation executing unit sequentially inputs output from an S-box functioning as the non-linear transform unit and performs a matrix operation on the input data as one cycle of processing.
Further, regarding an embodiment of the encryption processing device of the present invention, the encryption processing executed by the encryption processing unit applies a Feistel structure or a generalized Feistel structure.
Further, regarding an embodiment of the encryption processing device of the present invention, the encryption processing executed by the encryption processing unit follows the CLEFIA encryption algorithm.
Further, according to a second aspect of the present invention,
an encryption processing method executes an encryption processing by an encryption processing device, the encryption processing method including
an encryption processing step in which an encryption processing unit divides and inputs data to be processed in data blocks of a configured bit size into a plurality of lines, and repeatedly executes data transformation processing applying a round function on the data transferred to each line;
wherein during the encryption processing step, transformation processing is executed on the data for a first line configured from the plurality of lines, an operation is performed on the data from the first line and a different, second line regarding the generated transformed data, and an operation is repeatedly executed to use the data obtained as a result as input data for a next round;
and wherein, when a matrix operation processing is in an initial cycle, an operation is executed on the data in the second line during the execution cycle of a matrix operation for executing the processing to generate transform data from the data in the first line.
Further, according to a third aspect of the present invention,
a program executing an encryption processing in an encryption processing device, the program including
an encryption processing step in which an encryption processing unit divides and inputs data to be processed in data blocks of a configured bit size into a plurality of lines, and repeatedly executes data transformation processing applying a round function on the data transferred to each line;
wherein during the encryption processing step, transformation processing is executed on the data for a first line configured from the plurality of lines, an operation is performed on the data from the first line and a different, second line regarding the generated transformed data, and an operation is repeatedly executed to use the data obtained as a result as input data for a next round;
and wherein, when a matrix operation processing is in an initial cycle, an operation is executed on the data in the second line during the execution cycle of a matrix operation for executing the processing to generate transform data from the data in the first line.
Further, the program of the present invention is, for example, a program provided on a recording medium to an information processing device or computer system capable of executing various program code. Processing according to the program is achieved when a program executing unit in the information processing device or computer system executes this kind of program.
Other purposes, features, and advantages of the present invention should be understood from the detailed description based on the attached figures and embodiments described later. Note that a system as used in the present specification is a logical assembly of multiple devices, and the devices of each configuration are not necessarily in the same casing.
The configuration of an embodiment of the present invention enables a reduction in size and lower power consumption of an encryption processing configuration applying a generalized Feistel structure.
Specifically, an encryption processing configuration applying a generalized Feistel structure in which data is divided and input into multiple lines, and data transformation processing is repeatedly executed applying a round function on the data transferred into each line, wherein during an execution cycle of matrix operation in which a matrix operation executing unit executes a linear transformation processing applying a matrix to the data in a first line, an operation is performed on the matrix operation process data in a first cycle and the data for a second line. This configuration enables a register to be used for both the storage of the data for the second line and the storage of the results of the matrix operation on the first line of data in progress, a reduction in the total number of registers, and thus a reduction in size. Further, the reduction in size of the circuit configuration also enables a reduction in power consumption due to a reduction in the number of elements.
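The register sharing described above can be illustrated with a minimal sketch. It is not the circuit of the embodiment: the lane coefficients, the feedback wiring, the reduction polynomial, and the names `gf_mul`, `LANE_COEFF`, `serial_matrix_xor`, and `reference` are all assumptions chosen for illustration, using a cyclic matrix with first row (2, 3, 1, 1). The point shown is only that, if the feedback path carries the second line's data during the first cycle, four accumulator bytes end up holding the matrix result XORed with the second line, without a separate register for the second line.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8); the polynomial x^8+x^4+x^3+x+1 is an assumption."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result

# Hypothetical per-lane multipliers. Together with the feedback wiring below
# they realise a cyclic matrix whose first row is (2, 3, 1, 1).
LANE_COEFF = (1, 1, 3, 2)

def serial_matrix_xor(x, second_line):
    """Return (M * x) XOR second_line, one input byte per cycle, four accumulator bytes."""
    acc = [0, 0, 0, 0]
    for cycle, din in enumerate(x):
        if cycle == 0:
            # First cycle: feed the second line's bytes into the XOR stage
            # instead of zero, so they are absorbed into the accumulators.
            feedback = [second_line[(k + 1) % 4] for k in range(4)]
        else:
            # Later cycles: each lane takes its neighbour's partial result.
            feedback = [acc[(k + 1) % 4] for k in range(4)]
        acc = [gf_mul(LANE_COEFF[k], din) ^ feedback[k] for k in range(4)]
    return acc

def reference(x, second_line):
    """Direct (non-serialised) computation used only to check the sketch."""
    first_row = (2, 3, 1, 1)
    out = []
    for i in range(4):
        y = 0
        for j in range(4):
            y ^= gf_mul(first_row[(j - i) % 4], x[j])
        out.append(y ^ second_line[i])
    return out
```

For example, `serial_matrix_xor([0xDB, 0x13, 0x53, 0x45], [0x10, 0x20, 0x30, 0x40])` gives the same four bytes as `reference` with the same arguments.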
An encryption processing device, an encryption processing method, and a program according to the present invention will be described in detail below with reference to the figures. The description is organized into the following sections.
1. Overview of Shared Key Block Ciphers
2. Overview of Method to Reduce Size of AES Encryption Algorithms Applying the SPN Structure
3. Details on Structure and Processing of Matrix Calculating Circuits Regarding Reduced Size SPN Structures
4. Issues with Applying Reduced Size SPN Structures to Generalized Feistel Structures
5. Structures to Achieve Size Reduction of Generalized Feistel Structures
6. Advantageous Effects and Other Embodiments of Structures Regarding the Present Invention
7. Example Configuration of Encryption Processing Device as an IC Card
First, an overview of shared key block ciphers that are applicable to the present invention will be described. The shared key block cipher (hereafter, block cipher) according to the present specification indicates ciphers according to the following definition.
A plaintext P and a key K are input into a block cipher, which outputs a ciphertext C. The bit length of the plaintext and the ciphertext is called a block size, and this is represented as n here. n may be set to an arbitrary integer value, but this is usually a single predetermined value for each block cipher algorithm. Note that there may also be algorithms which handle multiple block lengths. A block cipher with a block length of n may be called an n-bit block cipher.
The key bit length is represented as k. The key length can be set to an arbitrary integer value. Shared key block cipher algorithms are compatible with either one or multiple key sizes. For example, a certain block cipher algorithm A may have a configuration that supports a block size n of 128, and several key sizes with bit lengths k of 128, 192, or 256.
The bit sizes of the plaintext [P], the ciphertext [C], and the key [K] are as follows:
Plaintext P: n bit
Ciphertext C: n bit
Key K: k bit
Block ciphers may be divided into two portions. One portion is a key scheduling unit 111, which inputs the key K, expands the bit length of the input secret key K by some predetermined step, and outputs the expanded key K′ (bit length k′). The other portion is a data encryption unit 112, which inputs the plaintext P together with a round key RK or similar generated from the expanded key K′ input from the key scheduling unit 111, executes encryption processing applying the round key RK or similar, and thereby executes a data transformation to generate the ciphertext C. Further, as previously described, the decryption processing is implemented by changing the data encryption unit 112.
In this way, the shared key block cipher algorithm is configured with the data encryption unit 112 including a round function to repeatedly execute transformation of the input data, and the key scheduling unit 111 to generate round keys applied to each round of a round function unit. The key scheduling unit 111 inputs the secret key K, and generates round keys to be input into each round function. For example, regarding a block cipher configured to perform a round function of r rounds, a round key is input into the round function corresponding to each round from 1 to r, labeled RK1, RK2, . . . , RKr. Also, the key scheduling unit 111 outputs an initial key IK and a final key FK to the data encryption unit 112, and an XOR operation is performed on these keys and the processed data.
As previously described, the Feistel structure is the most common structure used in the data encryption unit 112 regarding shared key block ciphers.
The Feistel structure includes a structure to convert plaintext into ciphertext by a simple repetition of a round function including an F function as a data transformation function. The F function executes linear transformation processing and non-linear transformation processing.
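The repetition of a round function built around an F function can be sketched as follows. This is a toy illustration only: `toy_f` is a hypothetical placeholder and not the F function of DES, CLEFIA, or the present embodiment, and the 32-bit half size and the round keys in the usage note are arbitrary assumptions. The sketch shows that decryption reuses the same F function with the round keys in reverse order, even though the F function itself is not invertible.

```python
MASK32 = 0xFFFFFFFF

def toy_f(half: int, round_key: int) -> int:
    """Hypothetical F function: key mixing, a non-linear step, then a linear step."""
    x = (half ^ round_key) & MASK32
    x = (x * x + x) & MASK32                      # placeholder non-linear step (not a real S-box)
    return ((x << 7) | (x >> 25)) & MASK32        # placeholder linear step (bit rotation)

def feistel_encrypt(left: int, right: int, round_keys):
    """Apply one Feistel round per round key: (L, R) -> (R, L xor F(R))."""
    for rk in round_keys:
        left, right = right, left ^ toy_f(right, rk)
    return left, right

def feistel_decrypt(left: int, right: int, round_keys):
    """Undo the rounds in reverse order using the same F function."""
    for rk in reversed(round_keys):
        left, right = right ^ toy_f(left, rk), left
    return left, right
```

With assumed round keys such as `[0x01234567, 0x89ABCDEF, 0xDEADBEEF]`, calling `feistel_decrypt(*feistel_encrypt(L, R, keys), keys)` returns the original `(L, R)`.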
The right side of
As illustrated in the right side of
The F function for each round is input with the round keys RK1 through RKr generated from the expanded key K′ input from the key scheduling unit 111.
There are various types of configurations of the F function, and as a known example, the configuration of the F function 120 illustrated in
Further, the structure illustrated in
The configuration illustrated in
The data to be processed is not limited to being divided into only two parts, as various configurations may be created. Feistel structures which are not limited to two divisions are called generalized Feistel structures.
An example of a generalized Feistel structure will be described with reference to
The Feistel structure described with reference to
The configuration illustrated in
However, the data flow becomes complex when dividing the n-bit input into four parts as illustrated in
Similar to the F function 120 described with reference to
As with the 4-line generalized Feistel structure illustrated in
Further,
The following description uses the 4-line generalized Feistel structure as an example to which the present invention is applicable. However, the present invention is not limited to a 4-line generalized Feistel structure, and may be applied to a 2-line Feistel structure or a generalized Feistel structure that processes any arbitrary number of lines greater than two.
Next, as a precursor to the description of the embodiments according to the present invention, an overview of a previously proposed method to reduce the size of AES encryption algorithms applying the SPN structure will be described.
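A 4-line generalized Feistel structure can be sketched in the same toy style. The line wiring below is an assumption (two F functions per round acting on the first and third lines, XORed into the second and fourth lines, followed by a rotation of the lines), since the figure-level details are not reproduced in the text; `toy_f`, `gfn4_encrypt`, and `gfn4_decrypt` are hypothetical names, and each line is taken to be 32 bits (n = 128).

```python
MASK32 = 0xFFFFFFFF

def toy_f(word: int, round_key: int) -> int:
    """Hypothetical F function on one n/4-bit line (32 bits here)."""
    x = (word ^ round_key) & MASK32
    x = (x * x + x) & MASK32                      # placeholder non-linear step
    return ((x << 7) | (x >> 25)) & MASK32        # placeholder linear step

def gfn4_encrypt(lines, round_keys):
    """Each round: XOR F outputs of lines 0 and 2 into lines 1 and 3, then rotate the lines."""
    x0, x1, x2, x3 = lines
    for rk0, rk1 in round_keys:
        x1 ^= toy_f(x0, rk0)
        x3 ^= toy_f(x2, rk1)
        x0, x1, x2, x3 = x1, x2, x3, x0
    return (x0, x1, x2, x3)

def gfn4_decrypt(lines, round_keys):
    """Undo each round: reverse the rotation, then repeat the same XORs."""
    x0, x1, x2, x3 = lines
    for rk0, rk1 in reversed(round_keys):
        x0, x1, x2, x3 = x3, x0, x1, x2
        x1 ^= toy_f(x0, rk0)
        x3 ^= toy_f(x2, rk1)
    return (x0, x1, x2, x3)
```

The XOR of an F function output into a neighbouring line is the step that has no counterpart in the SPN structure.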
As previously described, configurations have been proposed, for example, by Hamalainen, Alho, Hannikainen, and Hamalainen to reduce the size of AES encryption algorithms with Substitution Permutation Network (SPN) structures, such as by reducing the required register count. Details on this method to reduce size are disclosed in NPL 3 [Panu Hamalainen, Timo Alho, Marko Hannikainen, and Timo D. Hamalainen. Design and implementation of low-area and low-power aes encryption hardware core. In DSD, pages 577-583. IEEE Computer Society, 2006.9].
The method to reduce the size of this AES encryption algorithm will be described.
First, a round function of the AES encryption algorithm applying the SPN structure will be described with reference to
Further, similar to the Feistel structure, the AES encryption algorithm applying the SPN structure repeatedly executes a round function multiple times.
The round function executing unit illustrated in
The configuration includes a nonlinear transform unit 201 comprised of 16 S-box units with an 8-bit input and output for executing nonlinear transform processing, a shift row executing unit 202 for switching the 8-bit output from the S-box units configuring the nonlinear transform unit 201, a linear transform unit 203 comprised of four matrix operation units for executing linear transform operations by inputting the output from the shift row executing unit 202 in 32-bit units and applying this to a matrix, and an XOR operation unit 204 comprised of 4 operation units for performing XOR operations with 32-bit round keys against the 32-bit output from each of the four matrix operation units configuring the linear transform unit 203.
The example illustrated in
According to this implementation of AES, when one round of the round function processing, that is to say, the series of processing performed by the nonlinear transform unit 201, the shift row executing unit 202, the linear transform unit 203, and the XOR operation unit 204, is executed as one cycle, it is desirable to have a configuration in which a data encryption unit is configured with at least the 16 units of S-box circuits and the four matrix operation circuits, as illustrated in
Hamalainen, et al. were able to achieve a reduction in size of the data encryption unit by configuring the one round of the round function processing as a sequential serial processing of 16 cycles instead of as a single cycle.
According to this configuration to reduce size, only one S-box circuit is used, and one matrix operation is executed over four cycles. Such an implementation enables the size of the matrix operation circuit to be reduced.
Regarding the configuration illustrated in
As described with reference to
With the configuration in
Also, as described with reference to
As illustrated in
The output from the S-box 252 is input into a matrix operation circuit 253, and linear transformation processing applying a matrix is executed by the matrix operation circuit 253. Further, according to the configuration in
Processing of the four matrix operation circuits in the linear transform unit 203 illustrated in
The XOR operation processing of the XOR operation unit 204 illustrated in
The data substitution processing of the shift row executing unit 202 illustrated in
There is only one S-box unit according to the executable configuration of the AES algorithm by the SPN structure proposed by Hamalainen, et al. illustrated in
The number of registers is 152 bits worth of registers as illustrated in
The executable configuration of the AES algorithm applying the SPN structure proposed by Hamalainen, et al. illustrated in
Next, the configuration and processing of the matrix operation circuit regarding the reduced size SPN structure described with reference to
The linear transformation processing applying the matrix executed by the matrix operation circuit 253 within the configuration of the AES algorithm using the SPN structure proposed by Hamalainen, et al. described with reference to
A simplified data path as illustrated in
A register group 261 as in
The operation of the matrix operation circuit 253 for executing the linear transformation processing applying a matrix will be described using
Further, the (x0, x1, x2, x3) illustrated in Expression 1 corresponds to the input into the matrix operation circuit 253 (output from the S-box unit), and (y0, y1, y2, y3) corresponds to the output from the matrix operation circuit 253 (linear transformation result).
The 4×4 matrix corresponds to the matrix applied to the matrix operation circuit 253 (linear transformation matrix).
Further, the elements in the 4×4 linear transformation matrix are represented as hexadecimal values.
Each value of (x0, x1, x2, x3) in the present example represents 8-bit data which is the output per one cycle from the S-box 252. Each value of the output (y0, y1, y2, y3) is 8-bit data.
Further, processing of the linear transform unit 203 comprised of the four matrix operation units illustrated in
Therefore, the output from the four S-box units illustrated in
For example, when the matrix operation processing is executed by the matrix transform unit 203a illustrated in
The input from the S-box 252 into the matrix operation circuit 253 illustrated in
The linear transformation result applying a matrix using this data is output as the (y0, y1, y2, y3).
This data transformation is performed by the matrix operation circuit 253 illustrated in
As previously described, each of the x0, x1, x2, and x3 output from each cycle of the S-box 252 is 8-bit data, and each of the y0, y1, y2, and y3 as the result of the linear transformation by the matrix operation circuit 253 applying a matrix is also 8-bit data.
Next, the processing that occurs during each cycle will be described.
The matrix operation circuit 253 illustrated in
At the topmost line L1 illustrated in
At the second line L2 as well, the x0, which is the input data (din) is directly stored in the register r17 passing through an XOR operation unit 282.
At the third line L3 and the fourth line L4, the x0, which is the input data (din), is multiplied by a previously specified value of either 2 or 3 over a finite field. That is to say, the following multiplication is executed by multiplying units 285 and 286.
Calculations x0·2 and x0·3 are performed.
These calculation results are stored in registers r18 and r19 passing through XOR operation units 283 and 284.
Further, a multiplication unit is not configured for the first line L1 and the second line L2, but this is equivalent to multiplying the x0, which is the input data (din), by a previously specified value of 1 over a finite field.
The x1, x2, and x3, which are the input data (din), are input for the second cycle, the third cycle, and the fourth cycle, respectively. The second cycle, third cycle, and fourth cycle are different from the first cycle in that the enable signal input into the logical AND circuits 271 through 274 is set to one.
With this configuration, XOR operations are performed at the XOR operation units 281 through 284 between the multiplied input data and the output from the logical AND circuits 271 through 274, and the results are stored in registers r16 through r19.
As a result of such processing, the results of calculations according to the above (Expression 1) are stored in registers r16 through r19 after completion of the four cycles. That is to say,
(dout0,dout1,dout2,dout3)=(y0,y1,y2,y3).
In this way, the matrix operation is executed according to the previously described (Expression 1) by the matrix operation circuit 253 illustrated in
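The four-cycle operation described above can be modelled in software as a minimal sketch. Several details are assumptions rather than facts taken from the figures: the matrix is taken to be the AES MixColumns matrix (cyclic, first row 02, 03, 01, 01), the third and fourth lines are assumed to carry the multipliers 3 and 2 respectively, and each register is assumed to receive its neighbour's value through the AND gate. The list `r` stands in for registers r16 through r19, and `gf_mul`, `LANE_COEFF`, and `matrix_circuit` are hypothetical names.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8+x^4+x^3+x+1 (the AES field)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result

# Multipliers for lines L1..L4. Lines L1 and L2 have no multiplier (factor 1);
# which of L3 and L4 carries 2 or 3 is an assumption of this sketch.
LANE_COEFF = (1, 1, 3, 2)

def matrix_circuit(x):
    """Consume one S-box output byte per cycle and return (y0, y1, y2, y3) after four cycles."""
    r = [0, 0, 0, 0]                                  # models registers r16..r19
    for cycle, din in enumerate(x):
        # Enable is 0 in the first cycle, so the AND gates output zero and
        # the registers simply load the multiplied input.
        enable_mask = 0x00 if cycle == 0 else 0xFF
        gated = [r[(k + 1) % 4] & enable_mask for k in range(4)]
        # XOR stage: multiplied input XOR gated neighbour register.
        r = [gf_mul(LANE_COEFF[k], din) ^ gated[k] for k in range(4)]
    return r
```

Under these assumptions, feeding the column `[0xDB, 0x13, 0x53, 0x45]` one byte per cycle leaves `[0x8E, 0x4D, 0xA1, 0xBC]` in the registers, which matches the direct matrix product.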
Further, the processing described with reference to
Further, the (x0, x1, x2, x3) illustrated in Expression 2 correspond to the input into the matrix operation circuit 290 illustrated in
The 4×4 matrix corresponds to the matrix applied to the matrix operation circuit 290 (linear transformation matrix).
Further, the elements in the 4×4 linear transformation matrix are represented as hexadecimal values.
The difference between the matrix operation circuit 290 that achieves a matrix operation applying the Hadamard matrix illustrated in
Multiplying units 291 through 294 are configured to correspond to the linear transformation matrix elements comprised of the 4×4 Hadamard matrix illustrated in Expression 2.
The logical AND circuits are changed to multiplexers 295 through 298, and the input to each of the registers r16 through r19 is configured to be selected from one of three possibilities: the output from either of two other registers, or zero.
These configurations are differentiating points.
The configuration to reduce the size of the AES encryption configuration using the SPN structure proposed by Hamalainen, et al. described with reference to
The registers required for one round of the round calculation can be estimated simply as listed below. Here, the block size n, which is the size of the data to be processed in the round calculation, is set to 128 bits.
(1) 128-bit register for storing the round key
(2) 128-bit register for storing the processing data
(3) 32-bit register for storing the results of currently processing calculations regarding the matrix operation applying the linear transformation matrix
Registers (2) and (3) are the minimum desired for the data calculation unit, which results in a desired minimum of 160 bits (128+32) worth of registers.
However, the configuration proposed by Hamalainen, et al. illustrated in
The configuration proposed by Hamalainen, et al. does not use the 8-bit value after being input into the matrix operation circuit from the S-box unit for the next round. Taking a further look at this configuration, the register for the first 8 bits of the 32 bits input into the matrix operation circuit from the S-box unit is a shared register within the matrix operation circuit, and so this 8-bit register is deleted.
As previously described, Hamalainen, et al. have achieved a reduction in size of the SPN structure. However, this configuration to reduce size is a specialized configuration corresponding to SPN structures, and thus a sufficient reduction in size could not be attained when applying this configuration to reduce size to generalized Feistel structures. The related issues will be described next. Further, Feistel structures are included in the meaning of generalized Feistel structures for the following description.
If the configuration proposed by Hamalainen, et al. described with reference to
Also, with generalized Feistel structures, after the F function calculation in the round function, an XOR operation is performed with another line, and this step does not exist with the SPN structure. For this reason, the bit lengths of lines in the generalized Feistel structures have to accommodate this circuit for performing this XOR operation.
Further, the block size functioning as the processing data size for the round operation is set to n bits. As previously described with reference to
A register group 301 in
Further, the calculation executed applying an encryption algorithm data path (calculation execution circuit) applying the 4-line generalized Feistel structure illustrated in
This means that a round function is executed including an F function in the 4-line generalized Feistel structure illustrated in
A specific example of the F function in the round function is illustrated in
The F function illustrated in
(a) an XOR operation unit 321 for executing XOR operations with round keys,
(b) a non-linear transform unit [S] 322 comprised of S-box units for executing non-linear transformation processing against the output from the XOR operation unit 321, and
(c) a linear transform unit [M] 323 for performing linear transformation processing by the matrix operation against the output of the non-linear transform unit [S] 322.
However, the input and output corresponding to the F function in the 4-line generalized Feistel structure has a size of n/4 bits.
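The three stages (a) through (c) can be sketched for one n/4-bit line with n = 128, that is, four bytes. The S-box and the matrix below are placeholders chosen only so that the sketch runs: `SBOX` is a hypothetical table built from field inversion, the linear step uses a hypothetical cyclic matrix with first row (2, 3, 1, 1), and `gf_mul`, `gf_pow`, and `f_function` are illustrative names, not those of any real cipher specification.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8); the polynomial x^8+x^4+x^3+x+1 is an assumption."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result

def gf_pow(a: int, e: int) -> int:
    """Raise a byte to a power in GF(2^8) by square-and-multiply."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result

# Placeholder 8-bit S-box: field inversion (x^254) followed by a constant XOR.
SBOX = [gf_pow(x, 254) ^ 0x63 for x in range(256)]

def f_function(line_bytes, round_key_bytes, first_row=(2, 3, 1, 1)):
    """F function sketch: (a) round key XOR, (b) S-box per byte, (c) cyclic matrix."""
    keyed = [b ^ k for b, k in zip(line_bytes, round_key_bytes)]      # (a)
    substituted = [SBOX[b] for b in keyed]                            # (b)
    out = []
    for i in range(4):                                                # (c)
        y = 0
        for j in range(4):
            y ^= gf_mul(first_row[(j - i) % 4], substituted[j])
        out.append(y)
    return out
```

In a serialized implementation, stage (b) would deliver one byte per cycle to stage (c) rather than all four at once.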
Further, the matrix used by the linear transform unit [M] 323 for the matrix operation applied to the linear transformation processing is assumed to be a cyclic matrix with a first row of elements (a, b, c, d). That is to say, this matrix is illustrated in (Expression 3) as follows.
The block size n, which is the processing unit, is set to
n=128-bit
so as to compare the configuration of the AES encryption algorithm applying the SPN structure previously described with reference to
The circuit illustrated in
As illustrated in
Data with a size of n/16 bits is input into the S-box 303 illustrated in
Further, according to the configuration in
The data non-linearly transformed by the S-box 303 is next input into a matrix operation circuit 304 at one cycle intervals of the n/16-bit data. The matrix operation circuit 304 executes a linear transformation processing applying a predetermined matrix.
Next, the operation executing circuit excluding the register group 301 within the data path of the encryption algorithm applying the 4-line generalized Feistel structure illustrated in
According to the operation circuit illustrated in
The number of XOR operation circuits is also increased.
In this way, when applying the configuration proposed by Hamalainen, et al. to the generalized Feistel structure, in addition to the registers corresponding to the block length, a register for one line worth of data and an XOR operation circuit have to be added as per the operation circuit illustrated in
The increase in registers has a large effect on the scale of the circuit, and so an implementation method that requires only the registers corresponding to the block length would be desirable.
Note that the gate size of a register is relatively large compared to other cells, so an increase in the number of registers greatly affects the overall gate size. Accordingly, as one approach to reducing size, it is important to consider an implementation method that suppresses the increase in registers.
Next, a configuration according to the present invention, that is to say, a configuration to reduce the size of generalized Feistel structures will be described.
As described in the previous section, registers and XOR operation circuits are increased when applying the implementation method proposed by Hamalainen, et al. to the configuration for executing an encryption algorithm having a generalized Feistel structure, which does not result in a reduction in size.
The particular difference between the encryption algorithm applying the SPN structure and the encryption algorithm applying the generalized Feistel structure is that, in the generalized Feistel structure, once the matrix operation result is obtained, it is XORed with the data of another line.
That is to say, with the encryption algorithm applying the generalized Feistel structure, both registers for storing the intermediate results of the matrix operation and registers for storing the data of the other line have to be provided.
Also, according to the encryption algorithm applying the generalized Feistel structure, once the matrix operation for one line of data finishes, the matrix operation for a new line starts in the next cycle. For this reason, the XOR operation with the other line has to be completed within this one cycle. Thus, an XOR operation circuit for one line worth of data has to be provided.
The configuration according to the present invention described below uses the associative law of XOR operations, that is to say, the expression below, and is thus able to eliminate otherwise required registers by changing the operation sequence.
[Math. 4]
(a⊕b)⊕c=a⊕(b⊕c) Expression 4
Even though the order of the XOR operations is changed, both sides of Expression 4 above yield the same result. According to the present invention, changing the operation order on the basis of this law enables otherwise required registers to be eliminated.
Specifically, the operation order is changed so that the intermediate results of the matrix operation in progress are XORed into a register that is already storing the data of another line. With this change, the intermediate results of the matrix operation do not have to be stored separately, which enables a reduction in the number of registers.
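The reordering permitted by Expression 4 can be illustrated with a minimal Python sketch. The function names, the sample values, and the use of GF(2^8) multiplication with the reduction polynomial 0x11B are assumptions made only for illustration; this passage of the specification does not fix the field or the coefficients.

```python
from functools import reduce
from operator import xor
from typing import Sequence


def gf_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two bytes in GF(2^8); `poly` is an assumed reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def row_then_xor(coeffs: Sequence[int], xs: Sequence[int], e: int) -> int:
    """Original order: finish the matrix row y first, then XOR the other line's value e."""
    y = reduce(xor, (gf_mul(c, x) for c, x in zip(coeffs, xs)), 0)
    return e ^ y


def xor_then_row(coeffs: Sequence[int], xs: Sequence[int], e: int) -> int:
    """Changed order: start the accumulator at e, then fold in one product per cycle."""
    acc = e  # the register already holding the other line's data
    for c, x in zip(coeffs, xs):
        acc ^= gf_mul(c, x)
    return acc
```

Both functions return the same value for any inputs, because XOR is associative (and commutative). The second form needs only the one accumulator, which is the property the changed operation order relies on.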
Further, the circuit illustrated in
A register group 501 as in
However, the register group 501 as in
As previously described, the register group 301 illustrated in
In contrast, the registers included in the register group 501 illustrated in
The bit count n of the block size of the processing data, that is to say, the block size serving as the unit of encryption processing applying the 4-line generalized Feistel structure, is
n=128 bits.
With this setting, according to the configuration as in
96+64=160-bit
as the sum of
(3/4)n-bit=96-bit in the register group 301, and
eight 8-bit registers (8×8=64-bit) in operation units other than the register group 301
are necessary.
In contrast, according to the configuration as in
64+64=128-bit
as the sum of
(1/2)n-bit=64-bit in the register group 501, and
eight 8-bit registers (8×8=64-bit) in operation units other than the register group 501
are necessary.
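The two register totals above can be reproduced with a short calculation. The following Python sketch is illustrative only; the function name and the dictionary keys are hypothetical, while the fractions (3/4)n and (1/2)n and the eight n/16-bit registers are taken from the text.

```python
def register_budget(n: int = 128) -> dict:
    """Return the register bit counts for a block size of n bits."""
    unit = n // 16                 # width of one processing unit (8 bits when n = 128)
    operation_regs = 8 * unit      # eight registers R0..R7 outside the register group
    baseline = (3 * n) // 4 + operation_regs   # register group 301 plus R0..R7
    proposed = n // 2 + operation_regs         # register group 501 plus R0..R7
    return {
        "baseline": baseline,
        "proposed": proposed,
        "saved": baseline - proposed,
    }
```

For n = 128 the difference is 32 bits, that is, n/4 bits or one line worth of data.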
That is to say, a significant reduction in registers is achieved, as only 128 bits of registers are configured in the configuration of the present invention for the operation circuit applying the 4-line generalized Feistel structure illustrated in
Compared to the configuration as in
This will be described in detail next, but the reduction with the configuration of the present invention illustrated in
The operation sequence enabling the removal of these registers will be described in detail now.
A special configuration regarding the processing of the present invention is implemented for the operation sequence applying a matrix in the matrix operation circuit for performing linear transformations in order to reduce the number of registers. The operation sequence applying a circuit configuration as represented by the data path of the present invention as illustrated in
Further,
The difference between each processing will be described using Table 1 (
First, a matrix operation sequence used when the implementation method proposed by Hamalainen, et al. corresponding to the SPN structure is simply applied to the 4-line generalized Feistel structure is described with reference to
Let us assume that the output data (x0, x1, x2, x3) from the S-box 303 is sequentially input into the matrix operation circuit 304 following the data path illustrated in
The matrix operation circuit 304 outputs the results (y0, y1, y2, y3) of the matrix operation applying a matrix to an XOR operation unit 305.
The XOR operation unit 305 performs an XOR operation on the output from the matrix operation circuit 304 (y0, y1, y2, y3) and the output from another line in the 4-line generalized Feistel structure (E0, E1, E2, E3). The output from another line (E0, E1, E2, E3) is, for example, equivalent to the processing result of the round operation from the previous round.
Further, each portion of the input into the matrix operation circuit 304 (x0, x1, x2, x3) has a size of n/16 bits, and each portion of the output (y0, y1, y2, y3) as well as of the output from another line (E0, E1, E2, E3) also has a size of n/16 bits.
At this time, the values stored in the registers R0, R1, . . . , R7 illustrated in
Each element of the matrix operation result based on the input element x0 input into the matrix operation circuit 304 is stored in the registers R0, R1, R2, and R3 during the first cycle (1 cycle). At this timing, the enable signal (en) input in a logical AND circuit 313 is set to zero, and the multiplication result produced by a multiplication unit 311 based on the input element x0 is stored in the registers R0, R1, R2, and R3, that is,
the stored value in the register R0 is the result of (d·x0),
the stored value in the register R1 is the result of (c·x0),
the stored value in the register R2 is the result of (b·x0), and
the stored value in the register R3 is the result of (a·x0).
Afterwards, the input element x1 is input into the matrix operation circuit 304 for the second cycle. During the second through fourth cycles, the enable signal (en) input into the logical AND circuit 313 is set to one. The XOR operation unit 312 executes an XOR operation on the multiplication result produced by the multiplication unit 311 based on the input element x1 and the values stored in the registers R0, R1, R2, and R3 from the previous cycle, and the result is newly stored in the registers R0, R1, R2, and R3.
Also, the output element E0 from another line is stored in the register R7 during this second cycle.
The input element x2 is input into the matrix operation circuit 304 for the third cycle. With the enable signal (en) still set to one, the XOR operation unit 312 executes an XOR operation on the multiplication result produced by the multiplication unit 311 based on the input element x2 and the values stored in the registers R0, R1, R2, and R3 from the previous cycle, and the result is newly stored in the registers R0, R1, R2, and R3.
Also, the output element E0 from another line is stored in the register R6 and the element E1 is stored in the register R7 during this third cycle.
The input element x3 is input into the matrix operation circuit 304 for the fourth cycle. The input of all input data (x0, x1, x2, x3) completes, and the matrix operation results (y0, y1, y2, y3) are stored in the registers R0, R1, R2, and R3 during the fourth cycle.
In the next, fifth cycle, the XOR operation unit 305 performs an XOR operation on the output from another line (E0, E1, E2, E3) and (y0, y1, y2, y3), which are the matrix operation results (linear transformation results) produced by the matrix operation circuit 304 applying a matrix, and the resulting values are stored in the registers R4, R5, R6, and R7.
The values stored in the registers, that is to say, the data illustrated in (Expression 5) below, are input through a line 306 illustrated in
[Math. 5]
E0⊕y0
E1⊕y1
E2⊕y2
E3⊕y3 Expression 5
Further, the values illustrated in (Expression 5) above are equivalent to the round output data (D) from the connection unit between rounds illustrated in
Also, in the fifth cycle, the operation result for the first element x′0 of the next input values (x′0, x′1, x′2, x′3) into the matrix operation circuit 304 is stored in the registers R0, R1, R2, and R3.
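The five-cycle sequence just described can be modeled in software as a rough check. The following Python sketch is not the circuit itself: it assumes that the accumulation in cycles two through four rotates the partial sums between the registers (R[i] takes R[i+1] XOR the new product, which is what places y0 through y3 in R0 through R3 for a cyclic matrix), that multiplication is in GF(2^8) with the reduction polynomial 0x11B, and that the other-line values shift in through R7. The names `gf_mul` and `baseline_sequence` are hypothetical.

```python
from typing import List, Sequence


def gf_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two bytes in GF(2^8); `poly` is an assumed reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def baseline_sequence(first_row: Sequence[int],
                      xs: Sequence[int],
                      es: Sequence[int]) -> List[int]:
    """Model cycles 1..5 and return R4..R7, i.e. (E0^y0, E1^y1, E2^y2, E3^y3)."""
    a, b, c, d = first_row
    coeffs = (d, c, b, a)              # multipliers feeding R0, R1, R2, R3
    r = [0] * 8                        # registers R0..R7
    for cycle, x in enumerate(xs, start=1):
        prods = [gf_mul(k, x) for k in coeffs]
        if cycle == 1:
            r[0:4] = prods             # en = 0: store the products only
        else:
            # en = 1: accumulate with the assumed rotation between registers
            r[0:4] = [prods[i] ^ r[(i + 1) % 4] for i in range(4)]
            # E0, E1, E2 shift in through R7 during cycles 2, 3, 4
            r[5], r[6], r[7] = r[6], r[7], es[cycle - 2]
    # Fifth cycle: E3 arrives, and (E0..E3) is XORed with (y0..y3) into R4..R7
    e_now = [r[5], r[6], r[7], es[3]]
    r[4:8] = [e ^ y for e, y in zip(e_now, r[0:4])]
    return r[4:8]
```

In this model the other-line values E0, E1, and E2 occupy registers of their own while R0 through R3 hold the partial sums, so both sets of registers are needed at the same time. With the first row (2, 3, 1, 1), inputs (0xDB, 0x13, 0x53, 0x45), and all-zero E values, the function returns a list equal to [0x8E, 0x4D, 0xA1, 0xBC].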
Next, the transfer process in units of cycles of matrix operations performed by a matrix operation circuit 504 using the data path according to the present invention illustrated in
The data (x0, x1, x2, x3) as the output from an S-box unit 503 similar to that described with reference to
If the matrix operation is performed using the configuration illustrated in
[Math. 6]
E0⊕y0
E1⊕y1
E2⊕y2
E3⊕y3 Expression 6
The values illustrated in (Expression 6) above are stored in the registers R4, R5, R6, and R7 during the fifth cycle, and are input through a line 506 into the register group 501 as the data to be used for the next round.
The configuration illustrated in
Each cycle in this processing will be described now.
According to the processing using the data path illustrated in
E0 is stored in register R5,
E1 is stored in register R6, and
E2 is stored in register R7.
During the first cycle, the output (E0, E1, E2, E3), which consists of the values E0, E1, and E2 stored in their respective registers together with the new output value E3 output from the register group 501 through the output line 521, is input into an XOR operation unit 512. Further, these operations are controlled by a control unit not illustrated, or on the basis of clock input information.
The XOR operation unit 512 executes an XOR operation on these output values (E0, E1, E2, E3) and the multiplication results produced by the multiplication unit 511 based on the input element x0, that is,
d·x0,
c·x0,
b·x0,
a·x0,
and the results of this XOR operation are stored in the registers R0, R1, R2, and R3.
That is to say,
the value E1 stored in the register R6 is input into the XOR operation unit 512 through a multiplexer m0, and the result of its XOR operation with (d·x0) is stored in the register R0,
the value E2 stored in the register R7 is input into the XOR operation unit 512 through a multiplexer m1, and the result of its XOR operation with (c·x0) is stored in the register R1,
the value E3 output from the register group through the output line 521 is input into the XOR operation unit 512 through a multiplexer m2, and the result of its XOR operation with (b·x0) is stored in the register R2, and
the value E0 stored in the register R5 is input into the XOR operation unit 512 through a multiplexer m3, and the result of its XOR operation with (a·x0) is stored in the register R3.
That is to say, each value illustrated in (Expression 7) below is stored in the registers R0, R1, R2, and R3.
[Math. 7]
E1⊕d·x0
E2⊕c·x0
E3⊕b·x0
E0⊕a·x0 Expression 7
Further, each multiplexer 513 (m0 through m3) functions as a selector that outputs one input selected from two inputs.
The values stored in the registers R7, R6, and R5 and the output value from the output line 521 are configured to be output during the first cycle. Further, this is controlled by a control unit not illustrated.
According to the configuration of the present invention, at the input timing of the input x0 from the S-box 503 into the matrix operation circuit 504 illustrated in
According to the configuration of the present invention, the XOR operation with the output from another line (E0, E1, E2, E3) is performed preemptively in this way. As a result, the output from another line (E0, E1, E2, E3) does not have to be held separately until the four-cycle matrix operation period completes. This change in the operation sequence enables a reduction in the number of required registers.
Afterwards, the input element x1 is input into the matrix operation circuit 504 for the second cycle. During the second through fourth cycles, the multiplexer 513 (m0 through m3) is controlled to select and output the values stored in the registers R0, R1, R2, and R3.
As a result, the XOR operation unit 512 executes an XOR operation on the multiplication result produced by the multiplication unit 511 based on the input element x1 and the values stored in the registers R0, R1, R2, and R3 during the previous cycle, and the results are newly stored in the registers R0, R1, R2, and R3.
Also, the output element E′0 from another line is stored in the register R7 during this second cycle.
The input element x2 is input into the matrix operation circuit 504 for the third cycle. The XOR operation unit 512 executes an XOR operation on the multiplication result produced by the multiplication unit 511 based on the input element x2 and the values stored in the registers R0, R1, R2, and R3 during the previous cycle, and the results are newly stored in the registers R0, R1, R2, and R3.
Also, the output element E′0 from another line is stored in the register R6 during this third cycle, and E′1 is stored in the register R7.
The input element x3 is input into the matrix operation circuit 504 for the fourth cycle. The input of all input data (x0, x1, x2, x3) completes, and the results of XORing the matrix operation results (y0, y1, y2, y3) with the output from another line (E0, E1, E2, E3) are stored in the registers R0, R1, R2, and R3 during the fourth cycle.
In the next, fifth cycle, the output from the next other line (E′0, E′1, E′2, E′3) is input into the XOR operation unit 512 as the values stored in the registers R7, R6, and R5 and the output from the output line 521.
The XOR operation unit 512 performs the XOR operation on this input and the multiplication result produced by the multiplication unit 511 based on the input element x′0 newly input into the matrix operation circuit 504, and stores the result in the registers R0, R1, R2, and R3.
At this timing, the values that were stored in the registers R0, R1, R2, and R3 are transferred to the registers R4, R5, R6, and R7.
The values stored in the registers, that is to say, the data illustrated in (Expression 8) below, are input through the line 506 illustrated in
[Math. 8]
E0⊕y0
E1⊕y1
E2⊕y2
E3⊕y3 Expression 8
With the configuration according to the present invention, preemptively executing the XOR operation with the output from another line (E0, E1, E2, E3) during the matrix operation processing eliminates the need to provide separate registers for storing the output from another line (E0, E1, E2, E3) and for storing the intermediate results of the matrix operation in progress. These registers are shared, which reduces the number of required registers.
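The changed operation sequence can also be modeled in software to confirm that it yields the values of Expression 8. The following Python sketch rests on the same kind of assumptions as any software model of this data path: the partial sums are assumed to rotate between the registers each cycle (R[i] takes R[i+1] XOR the new product), multiplication is assumed to be in GF(2^8) with the reduction polynomial 0x11B, and the cyclic matrix is assumed to rotate each row one position to the right. The names `gf_mul`, `proposed_sequence`, and `direct_result` are hypothetical.

```python
from typing import List, Sequence


def gf_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two bytes in GF(2^8); `poly` is an assumed reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def proposed_sequence(first_row: Sequence[int],
                      xs: Sequence[int],
                      es: Sequence[int]) -> List[int]:
    """Model cycles 1..4; R0..R3 end up holding (E0^y0, E1^y1, E2^y2, E3^y3)."""
    a, b, c, d = first_row
    coeffs = (d, c, b, a)              # multipliers feeding R0, R1, R2, R3
    r = [0, 0, 0, 0]                   # registers R0..R3
    for cycle, x in enumerate(xs, start=1):
        prods = [gf_mul(k, x) for k in coeffs]
        if cycle == 1:
            # Multiplexers m0..m3 select the other-line values E1, E2, E3, E0
            selected = [es[1], es[2], es[3], es[0]]
        else:
            # Multiplexers select the (assumed rotated) accumulator values
            selected = [r[(i + 1) % 4] for i in range(4)]
        r = [p ^ s for p, s in zip(prods, selected)]
    return r


def direct_result(first_row: Sequence[int],
                  xs: Sequence[int],
                  es: Sequence[int]) -> List[int]:
    """Reference: multiply by the cyclic matrix first, then XOR with (E0..E3)."""
    row = list(first_row)
    rows = [row[-i:] + row[:-i] for i in range(4)]   # each row rotated right by i
    out = []
    for matrix_row, e in zip(rows, es):
        y = 0
        for coeff, x in zip(matrix_row, xs):
            y ^= gf_mul(coeff, x)
        out.append(e ^ y)
    return out
```

Under these assumptions `proposed_sequence` and `direct_result` agree for any inputs, while the model needs only the four accumulator registers during the matrix operation. The rotated order in which the E values are selected in the first cycle (E1, E2, E3, E0) is what lets each value arrive in the register with the matching index after four cycles.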
As previously described, according to the data path for executing the encryption processing applying a generalized Feistel structure according to the present invention illustrated in
That is to say, the XOR operation is preemptively executed on the output from another line (E0, E1, E2, E3) equivalent to the processing result of the round operation for the previous round, for example.
As described with reference to
According to the configuration of the present invention, by implementing multiplexers for one line worth of data ((n/16)×4 in the present embodiment) in this way, one line worth of registers required for the configuration illustrated in
Also, lower power consumption may be expected along with this reduction in size.
In particular, because the gate size of a register is relatively large compared to other cells, the reduction of one line worth of registers contributes remarkably to the reduction in size.
Further, according to the previously described embodiment, a representative example of a configuration applying the present invention was described being applied to a 4-line generalized Feistel structure. However, the processing sequence described with reference to
Also, according to the embodiment previously described, the example was described using a cyclic matrix as the matrix applied to the matrix operation circuit, but the matrix applied to the matrix operation circuit is not limited to cyclic matrices; other types of matrices, such as a Hadamard matrix, may also be applied.
Further, the matrix applied to the matrix operation circuit is not limited to a 4×4 matrix; the matrix may be of any size x×x, where x is a natural number of at least two.
Also, the configuration having the F function previously described with reference to
Further, though the configuration described in the previous embodiment uses the 4-line generalized Feistel structure, which is an example of a type 2 generalized Feistel structure, the present invention may also be applied to type 1 and type 3 generalized Feistel structures, and a similar effect can be expected.
The data path illustrated in
In this way, a matrix operation using a similar processing sequence to that previously described with reference to
Specifically, when implementing the configuration as in
Further, when the configuration is the 2-line Feistel structure illustrated in
Lastly,
A CPU (central processing unit) 701 illustrated in
An encryption processing unit 703 executes the encryption processing configuration, for example, as described with reference to
Further, the example here illustrates the encryption processing method as an individual module, but instead of configuring this kind of independent encryption processing module, a configuration may be implemented in which the encryption processing program is stored in ROM, and the CPU 701 reads and executes the program stored in ROM, for example.
A random number generating unit 704 executes processing to generate random numbers used for the generation of keys required in the encryption processing.
A transmission/reception unit 705 is a data transmission processing unit for executing data transmission with external devices, executes data transmission with card readers/writers, IC modules, etc., and executes output of ciphertext generated within the IC module or data input from external devices such as card readers/writers, etc.
This concludes the detailed description of the present invention with reference to specific embodiments. However, it should be understood by those skilled in the art that various modifications and substitutions may be made insofar as they are within the scope of the present invention. That is to say, the present invention was disclosed using embodiments as examples, and this should not be interpreted as limiting the present invention. The claims should be referenced to determine the scope of the present invention.
Further, the series of processing described in the specification may be executed by hardware, software, or some combination thereof. When executing the processing with software, the program to which the processing sequence is recorded may be installed on and executed from a memory internal to a computer assembled from specialized hardware, or may be a program installed to and executed from a general-purpose computer capable of executing various processing.
For example, the program may be recorded in advance on a recording medium such as a hard disk or ROM (Read Only Memory). Alternatively, the program may be temporarily or permanently stored on (recorded to) removable recording media such as a flexible disk, CD-ROM (Compact Disc Read Only Memory), DVD (Digital Versatile Disc), MO (Magneto optical) disk, magnetic disk, and semiconductor memory. This kind of removable media may be provided as so-called packaged software.
Further, in addition to being installed on a computer from a removable recording medium as previously described, the program may be transferred wirelessly from a download site to a computer, or may be transferred to a computer via a wired connection to a network such as a LAN (Local Area Network) or the Internet. The program transferred in this way is received by the computer and may then be installed on a recording medium in the computer, such as a hard disk.
Further, the various processing described in the specification is not limited to being executed serially according to the description, and may be executed in parallel or individually depending on the processing capabilities of the device to execute the processing or as desired. Also, the system in the present specification is a configuration of a logical collection of multiple devices, and thus is not limited to a configuration in which each configuration device is installed in the same chassis.
As previously described, the configuration of the embodiment of the present invention enables a reduction in size and lower power consumption of an encryption processing configuration applying a generalized Feistel structure.
Specifically, an encryption processing configuration applying a generalized Feistel structure in which data is divided and input into multiple lines, and data transformation processing is repeatedly executed applying a round function on the data transferred into each line, wherein during an execution cycle of matrix operation in which a matrix operation executing unit executes a linear transformation processing applying a matrix to the data in a first line, an operation is performed on the matrix operation process data in a first cycle and the data for a second line. This configuration enables a register to be used for both the storage of the data for the second line and the storage of the results of the matrix operation on the first line of data in progress, a reduction in the total number of registers, and thus a reduction in size. Further, the reduction in size of the circuit configuration also enables a reduction in power consumption due to a reduction in the number of elements.
Number | Date | Country | Kind |
---|---|---|---|
2010-274807 | Dec 2010 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2011/074468 | 10/24/2011 | WO | 00 | 5/31/2013 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2012/077419 | 6/14/2012 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
20070083768 | Isogai et al. | Apr 2007 | A1 |
20100091991 | Shibutani et al. | Apr 2010 | A1 |
Number | Date | Country |
---|---|---|
2003-345244 | Dec 2003 | JP |
Entry |
---|
Hamalainen et al., Design and Implementation of Low-area and Low-power AES Encryption Hardware Core. Proceeding of the 9th EUROMICRO Conference on Digital System Design (DSD06). IEEE 2006 pp. 577-583. |
Nyberg, Generalized Feistel Networks. Advances in Cryptology - ASIACRYPT '96. 1996. pp. 91-104. |
Satoh et al., Hardware-Focused Performance Comparison for the Standard Block Ciphers AES, Camellia, and Triple-DES. Information Security. Lecture Notes in Computer Science. 6th International Conference. Bristol, UK. ISC 2003, LNCS 2851. Oct. 1-3, 2003: 252-266. |
Sugawara et al., High-performance ASIC implementations of the 128-bit block cipher CLEFIA. IEEE. International Symposium on Circuits and Systems. 2008: 2925-2928. |
Zheng et al., On the Construction of Block Ciphers Provably Secure and Not Relying on Any Unproved Hypotheses (Extended Abstract). CRYPTO '89 Proceedings on Advances in cryptology. Springer-Verlag New York, Inc. New York, NY, USA. 1989: 461-480. |
Number | Date | Country | |
---|---|---|---|
20130251144 A1 | Sep 2013 | US |