The present invention generally relates to the secure hash standard. More specifically, the present invention relates to a method and system for implementing a secure hash algorithm (SHA-1) specified by the secure hash standard with hardware resources.
The SHA-1 generally operates as follows. The SHA-1 takes as input a message whose length is less than $2^{64}$ bits. The message is padded so that its total length is a multiple of 512 bits, and the padded message is then divided into 512-bit blocks. The 512-bit blocks are processed sequentially, and the cumulative result is a 160-bit message digest.
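A minimal Python sketch of the padding and block-splitting steps follows. It assumes the message is a whole number of bytes (the standard also permits arbitrary bit lengths), and the names `sha1_pad` and `split_blocks` are illustrative rather than taken from the source. Per the Secure Hash Standard, padding appends a single 1 bit, then 0 bits, then the original length as a 64-bit big-endian integer.

```python
def sha1_pad(message: bytes) -> bytes:
    """Pad a byte-aligned message to a multiple of 512 bits (64 bytes)."""
    bit_length = len(message) * 8
    if bit_length >= 2**64:
        raise ValueError("SHA-1 input must be shorter than 2**64 bits")
    padded = message + b"\x80"                            # a single 1 bit, then seven 0 bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)    # 0 bits up to 448 mod 512
    padded += bit_length.to_bytes(8, "big")               # original length as 64 bits
    return padded


def split_blocks(padded: bytes) -> list[bytes]:
    """Cut the padded message into consecutive 512-bit blocks."""
    if len(padded) % 64:
        raise ValueError("padded length must be a multiple of 64 bytes")
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]


print(len(split_blocks(sha1_pad(b"abc"))))  # → 1
```

Note that a 56-byte message already needs a second block, because the 0x80 byte and the 8-byte length field no longer fit in the first one.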
The SHA-1 performs eighty rounds of processing for each 512-bit block. For each of four groups of twenty rounds, the SHA-1 uses one of four Boolean functions and one of four constant values, to be further described below. Once all eighty processing rounds are completed, five 32-bit intermediate variables are updated. The process is then repeated for the next 512-bit block. Once all the 512-bit blocks are processed, the final, cumulative values of the five intermediate variables represent the 160-bit message digest. The details with respect to the processing of the 512-bit blocks will be further described below.
As mentioned above, the SHA-1 converts the message into 512-bit blocks and then processes the 512-bit blocks one at a time. More specifically, each 512-bit block to be processed is divided into sixteen (16) longwords W0, W1, . . . , W15, where W0 is the leftmost longword. Each longword is thirty-two (32) bits in length. The SHA-1 uses a five-longword circular buffer to maintain the five 32-bit intermediate variables, a, b, c, d and e.
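As an illustration, the division of one block into longwords can be written with Python's standard `struct` module; `block_to_words` is a hypothetical helper name. SHA-1 interprets each longword big-endian, so W0 comes from the first four bytes of the block.

```python
import struct


def block_to_words(block: bytes) -> list[int]:
    """Divide one 512-bit block into sixteen 32-bit longwords W0..W15.

    W0 is the leftmost longword; SHA-1 reads each longword big-endian.
    """
    if len(block) != 64:
        raise ValueError("a SHA-1 block is exactly 64 bytes (512 bits)")
    return list(struct.unpack(">16I", block))


words = block_to_words(bytes(range(64)))
print(hex(words[0]), hex(words[15]))  # → 0x10203 0x3c3d3e3f
```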
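Prior to processing the first 512-bit block, the intermediate variables are initialized with the constant values H0 through H4 (in hexadecimal), respectively, as follows:

$$
\begin{aligned}
a &= H_0 = \texttt{0x67452301}\\
b &= H_1 = \texttt{0xEFCDAB89}\\
c &= H_2 = \texttt{0x98BADCFE}\\
d &= H_3 = \texttt{0x10325476}\\
e &= H_4 = \texttt{0xC3D2E1F0}
\end{aligned}
$$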
After the intermediate variables are initialized, the processing of the 512-bit blocks takes place as follows:
For t = 16 to 79, let

$$W_t = S^1(W_{t-3} \oplus W_{t-8} \oplus W_{t-14} \oplus W_{t-16}),$$

where $S^k(\cdot)$ denotes a k-bit circular left shift and $\oplus$ denotes bitwise XOR.
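The message schedule can be sketched in Python as below; the helper names are illustrative. The second function shows an equivalent way to compute the same words with only sixteen words of storage, overwriting W_{t-16} with W_t in a circular buffer. The Secure Hash Standard describes an equivalent "alternate method" of this kind; the offsets 13, 8, 2 and 0 used here are derived from the recurrence (t-3, t-8, t-14 and t-16 taken mod 16) and are an assumption of this sketch, not constants quoted from the source.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, k: int) -> int:
    """S^k: k-bit circular left shift of a 32-bit word (0 < k < 32)."""
    return ((x << k) | (x >> (32 - k))) & MASK32


def expand_schedule(w16: list[int]) -> list[int]:
    """Extend W0..W15 to the eighty words W0..W79 used by the rounds."""
    w = list(w16)
    for t in range(16, 80):
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def schedule_16_word_buffer(w16: list[int]) -> list[int]:
    """Same schedule computed in place in a 16-word circular buffer.

    Slot t mod 16 holds W_t; W_{t-3}, W_{t-8}, W_{t-14} and W_{t-16} sit at
    offsets 13, 8, 2 and 0 from t (mod 16), and W_{t-16} is overwritten by W_t.
    """
    buf = list(w16)
    out = []
    for t in range(80):
        s = t % 16
        if t >= 16:
            buf[s] = rotl32(buf[(s + 13) % 16] ^ buf[(s + 8) % 16]
                            ^ buf[(s + 2) % 16] ^ buf[s], 1)
        out.append(buf[s])
    return out
```

The circular-buffer form trades the 80-word array for 16 words of memory plus modular address arithmetic, which is attractive when storage is the scarce resource.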
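The eighty (80) rounds of processing for each 512-bit block are executed according to the following equations, evaluated in the order listed:

For t = 0 to 79 do

$$
\begin{aligned}
\mathrm{TEMP} &= S^5(a) + f_t(b, c, d) + e + W_t + K_t\\
e &= d\\
d &= c\\
c &= S^{30}(b)\\
b &= a\\
a &= \mathrm{TEMP}
\end{aligned}
$$

where "+" represents addition modulo $2^{32}$.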
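The round equations can be sketched as a pure function on the five-word state. This is an illustrative software model, not the patent's data path; `sha1_round` and its parameters are hypothetical names, and the caller supplies f_t, W_t and K_t.

```python
from typing import Callable

MASK32 = 0xFFFFFFFF
State = tuple[int, int, int, int, int]


def rotl32(x: int, k: int) -> int:
    """S^k: k-bit circular left shift of a 32-bit word (0 < k < 32)."""
    return ((x << k) | (x >> (32 - k))) & MASK32


def sha1_round(state: State, f: Callable[[int, int, int], int], w_t: int, k_t: int) -> State:
    """One SHA-1 round; returns the new (a, b, c, d, e)."""
    a, b, c, d, e = state
    # "+" is addition modulo 2**32, hence the final mask
    temp = (rotl32(a, 5) + f(b, c, d) + e + w_t + k_t) & MASK32
    # Every right-hand side uses the values from the start of the round
    return (temp, a, rotl32(b, 30), c, d)


print(sha1_round((1, 2, 3, 4, 5), lambda b, c, d: b ^ c ^ d, 0, 0))  # → (42, 1, 2147483648, 3, 4)
```

Returning a fresh tuple means no variable is overwritten before it is read, so the result does not depend on the order in which the assignments are written.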
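The function $f_t(b, c, d)$ and the constant $K_t$ vary during the eighty (80) rounds of processing as follows, where $\wedge$, $\vee$, $\neg$ and $\oplus$ denote bitwise AND, OR, NOT and XOR:

$$
f_t(b, c, d) = \begin{cases}
(b \wedge c) \vee (\neg b \wedge d), & 0 \le t \le 19\\
b \oplus c \oplus d, & 20 \le t \le 39\\
(b \wedge c) \vee (b \wedge d) \vee (c \wedge d), & 40 \le t \le 59\\
b \oplus c \oplus d, & 60 \le t \le 79
\end{cases}
$$

$$
K_t = \begin{cases}
\lfloor 2^{32} \times \sqrt{2}/4 \rfloor = \texttt{0x5A827999}, & 0 \le t \le 19\\
\lfloor 2^{32} \times \sqrt{3}/4 \rfloor = \texttt{0x6ED9EBA1}, & 20 \le t \le 39\\
\lfloor 2^{32} \times \sqrt{5}/4 \rfloor = \texttt{0x8F1BBCDC}, & 40 \le t \le 59\\
\lfloor 2^{32} \times \sqrt{10}/4 \rfloor = \texttt{0xCA62C1D6}, & 60 \le t \le 79
\end{cases}
$$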
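The round-dependent function and constant can be sketched as follows; the helper names are illustrative. `k_from_sqrt` checks each K_t against its definition using exact integer square roots, since 2^32 × √n / 4 = √(n × 2^60), so no floating-point rounding is involved.

```python
import math

MASK32 = 0xFFFFFFFF


def f_t(t: int, b: int, c: int, d: int) -> int:
    """Round function for round t (0 <= t <= 79) on 32-bit words."""
    if 0 <= t <= 19:
        return (b & c) | ((b ^ MASK32) & d)   # choose: c where b is 1, d where b is 0
    if 40 <= t <= 59:
        return (b & c) | (b & d) | (c & d)    # majority of b, c, d
    if 20 <= t <= 79:
        return b ^ c ^ d                      # parity, rounds 20-39 and 60-79
    raise ValueError("t must be in 0..79")


def k_t(t: int) -> int:
    """Round constant K_t: one value per group of twenty rounds."""
    if not 0 <= t <= 79:
        raise ValueError("t must be in 0..79")
    return (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)[t // 20]


def k_from_sqrt(n: int) -> int:
    """floor(2**32 * sqrt(n) / 4), computed exactly as isqrt(n * 2**60)."""
    return math.isqrt(n << 60)


print([hex(k_from_sqrt(n)) for n in (2, 3, 5, 10)])
```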
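After the eighty (80) rounds of processing (t = 0 to 79) are completed, i.e., after a 512-bit block is processed, the intermediate variables a, b, c, d and e are updated as follows:

$$
\begin{aligned}
a &= a + H_0\\
b &= b + H_1\\
c &= c + H_2\\
d &= d + H_3\\
e &= e + H_4
\end{aligned}
$$

The updated values are also stored back into H0 through H4, so that they serve as the chaining values for the next 512-bit block.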
After processing the last 512-bit block, the message digest is the 160-bit string represented by the five (5) longwords, a, b, c, d and e. The foregoing is a brief description of the SHA-1. Details with respect to the operations of the SHA-1 are well understood.
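To tie the steps together, here is a compact software reference model of the whole algorithm, assuming byte-aligned input. It is the kind of golden model one might use to check a hardware implementation against, not the patent's hardware method, and it is checked against Python's standard `hashlib.sha1`.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF
H_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)


def rotl32(x: int, k: int) -> int:
    return ((x << k) | (x >> (32 - k))) & MASK32


def f(t: int, b: int, c: int, d: int) -> int:
    if t < 20:
        return (b & c) | ((b ^ MASK32) & d)
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d


def sha1(message: bytes) -> bytes:
    # Padding: 1 bit, 0 bits, then the 64-bit big-endian bit length
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)

    h = list(H_INIT)
    for offset in range(0, len(padded), 64):
        # Split the block into W0..W15 and expand to W0..W79
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for t in range(16, 80):
            w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h
        for t in range(80):
            temp = (rotl32(a, 5) + f(t, b, c, d) + e + w[t] + K[t // 20]) & MASK32
            a, b, c, d, e = temp, a, rotl32(b, 30), c, d

        # Update round: add this block's result into the chaining values H0..H4
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e))]

    return b"".join(struct.pack(">I", x) for x in h)


assert sha1(b"abc") == hashlib.sha1(b"abc").digest()
```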
The SHA-1 is typically implemented using software. A person of ordinary skill in the art will know how to implement the SHA-1 using software. Using software to implement the SHA-1, however, has a number of shortcomings. For example, it is relatively easy to break into a software program designed to implement the SHA-1, thereby revealing that the SHA-1 is used for encrypting messages. By ascertaining the type of encryption algorithm that is being used to encrypt messages, a hacker may then successfully decrypt the message digests to obtain the messages. Hence, it would be desirable to provide a method and system that is capable of offering a more secure implementation of the SHA-1.
According to one exemplary embodiment of the present invention, an integrated circuit for implementing the secure hash algorithm is provided. According to this exemplary embodiment, the integrated circuit includes a data path and a controller controlling operation of the data path. The data path is capable of handling each round of processing reiteratively. In one implementation, the data path includes a data multiplexor, an address multiplexor, a memory, a first processing multiplexor, a second processing multiplexor, a first register, a second register, a shifter and an arithmetic logic unit. By coupling these various components of the data path, as further described below, the data path can be used to execute the secure hash algorithm in a reiterative manner.
In another implementation, the controller includes an address control module and a finite state machine. The address control module further includes a pico code ROM and a number of counters. The address control module uses a pico code memory address, the state of the finite state machine and various counter bits to generate a physical memory address and appropriate control bits to control the operation of the data path.
Reference to the remaining portions of the specification, including the drawings and claims, will reveal other features and advantages of the present invention. Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with respect to the accompanying drawings, in which like reference numbers indicate identical or functionally similar elements.
a-c are selected illustrative timing diagrams showing operations of the respective components of the data path in accordance with the present invention.
The present invention in the form of one or more exemplary embodiments is now described. According to an exemplary embodiment of the present invention, an integrated circuit is provided to implement the Secure Hash Algorithm (SHA-1) specified by the Secure Hash Standard as promulgated by the National Institute of Standards and Technology.
The parallelizability of the SHA-1 allows a continuum of hardware implementations that trade off performance against hardware complexity. Assume that performance, or throughput, is given by the following equation:

$$\text{Throughput} = \frac{512 \times f_{\max}}{81 \times m}\ \text{bits per second}$$

where $f_{\max}$ represents the maximum clock frequency, 81 represents the 80 processing rounds plus one update round, and m represents the number of clock periods required for each processing round.
In one implementation where m = 16 and $f_{\max}$ = 100 MHz, the resulting throughput is 39.5 Mb/s, or 4.94 MB/s, or approximately five (5) kilobytes per millisecond. Experimentally, it has been determined that this 5 MB/s implementation requires approximately 1500 gates, 128 bytes of RAM and 132 bytes of ROM. In another implementation, with m = 1, $f_{\max}$ = 100 MHz and roughly an order of magnitude more hardware, a throughput of 79 MB/s, or 79 kilobytes per millisecond, is achieved.
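The throughput figures can be reproduced from the throughput equation with a few lines of Python; `sha1_throughput_bps` is a hypothetical helper name.

```python
def sha1_throughput_bps(f_max_hz: float, m: int) -> float:
    """Throughput = (512 * f_max) / (81 * m) bits per second.

    Each 512-bit block takes 80 processing rounds plus one update round,
    and each round takes m clock periods.
    """
    return 512 * f_max_hz / (81 * m)


for m in (16, 1):
    bps = sha1_throughput_bps(100e6, m)
    print(f"m={m:2d}: {bps / 1e6:6.1f} Mb/s = {bps / 8e6:6.2f} MB/s")
```

With m = 16 this gives about 39.5 Mb/s (4.94 MB/s), and with m = 1 about 79 MB/s, matching the figures stated above; the sixteenfold speedup comes entirely from the clock periods per round.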
In an exemplary embodiment the data path 10 shown in
The finite state machine 32 is capable of assuming a number of states. In the exemplary embodiment shown in
According to an exemplary embodiment, the data stored within the ROM 44 is organized in a pico code format.
The memory 16 is organized based on a memory map.
As mentioned above, the pico code memory address is used to generate the physical memory address for accessing the memory 16. Generally, the physical memory address is generated from the pico code memory address, the state of the finite state machine 32, and various counter bits from the second mod-16 counter 38.
The physical memory address, A[4:0], used to access the memory 16 is generated from the pico code memory address in the following manner. When the pico code memory address bits [12-11] are “00”, A[4] is set to “0” and A[3:0] is determined as follows: (constant+t (mod 16)) mod 16, where the constant is:
When the pico code memory address bits [12-11] are “01”, A[4], A[2] and A[1] are set to “1”. A[3] is set as follows: if [t>=40], then A[3] is set to “1”, else A[3] is set to “0”. A[0] is set as follows: if ([20<=t<=39] OR [t>=60]), then A[0] is set to “1”, else A[0] is set to “0”.
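The effect of the “01” rule can be checked with a small model. It assumes that the bits forced to 1 are A[4], A[2] and A[1], since A[3] and A[0] are set separately; `k_address` is a hypothetical name. Under that reading, each group of twenty rounds fetches its K_t from its own memory address.

```python
def k_address(t: int) -> int:
    """A[4:0] for fetching K_t (pico code bits [12-11] == "01").

    Assumes A[4], A[2] and A[1] are forced to 1; A[3] and A[0] follow the text.
    """
    if not 0 <= t <= 79:
        raise ValueError("t must be in 0..79")
    a3 = 1 if t >= 40 else 0
    a0 = 1 if (20 <= t <= 39) or t >= 60 else 0
    return (1 << 4) | (a3 << 3) | (1 << 2) | (1 << 1) | a0


print(sorted({k_address(t) for t in range(80)}))  # → [22, 23, 30, 31]
```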
When the pico code memory address bits [12-11] are “10”, A[4] is set to “1” and A[3] is set to “0”. A[2:0] are set as follows using the state of the finite state machine 32 and the pico code memory address bits [10-8]:
if ([FSM_STATE=INIT] OR [FSM_STATE=UPDATE]) then A[2:0]=bits[10-8] else
if([bits[10-8]=“101”] AND [t<20]) then A[2:0]=(“001”−t[mod5]) mod 5
else if ([bits[10-8]=“101”] AND [t>=20]) then A[2:0]=(“011”−t[mod5]) mod 5
else if([bits[10-8]=“111”] AND [t<20]) then A[2:0]=(“011”−t[mod5]) mod 5
else if ([bits[10-8]=“111”] AND [t>=20]) then A[2:0]=(“001”−t[mod5]) mod 5
else A[2:0]=(bits[10-8]−t[mod5]) mod 5
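The “10” rule can be modeled as below. Several assumptions are hypothetical: codes 0 through 4 in bits [10-8] are taken to name the logical variables a through e, the string "PROCESS" stands for any finite state machine state other than INIT and UPDATE, and A[4:0] is assembled as 0b10 followed by A[2:0]. On this reading, subtracting t mod 5 rotates the mapping from logical variables to physical memory words by one position per round, which emulates the five-longword circular buffer without moving any data.

```python
def variable_slot(bits_10_8: int, t: int, fsm_state: str) -> int:
    """A[2:0] when pico code bits [12-11] are "10", following the rules in the text."""
    if fsm_state in ("INIT", "UPDATE"):
        return bits_10_8                      # direct addressing outside the rounds
    if bits_10_8 == 0b101:
        base = 0b001 if t < 20 else 0b011
    elif bits_10_8 == 0b111:
        base = 0b011 if t < 20 else 0b001
    else:
        base = bits_10_8
    return (base - t % 5) % 5                 # Python's % keeps the result in 0..4


def variable_address(bits_10_8: int, t: int, fsm_state: str) -> int:
    """Full A[4:0]: A[4] = 1, A[3] = 0, A[2:0] from variable_slot."""
    return 0b10000 | variable_slot(bits_10_8, t, fsm_state)


# Hypothetical example: where logical variable b (code 1) lives in rounds 0..5
print([variable_address(1, t, "PROCESS") for t in range(6)])  # → [17, 16, 20, 19, 18, 17]
```

Under these assumptions, the word that held a in round t is read as b in round t + 1, and the new value of a is written into the word that held e, exactly the data movement the round equations call for.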
When the pico code memory address bits [12-11] are “11” then A[4:0] are set to the pico code memory address bits [12-8].
Operations of the data path 10 are illustrated by a number of selected timing diagrams.
In an exemplary embodiment, the data path 10 and the controller including the finite state machine 32 and the address control module 34 are implemented as part of an integrated circuit using hardware. The integrated circuit can be embedded in a mobile communication device, such as a mobile phone, where encryption and decryption functions are desired for security purposes. Furthermore, the data path 10 and the controller can be implemented using reconfigurable hardware resources within an adaptive computing architecture. Details relating to the adaptive computing architecture and how reconfigurable hardware resources are used to implement functions on an on-demand basis are disclosed in U.S. patent application Ser. No. 09/815,122 entitled “ADAPTIVE INTEGRATED CIRCUITRY WITH HETEROGENEOUS AND RECONFIGURABLE MATRICES OF DIVERSE AND ADAPTIVE COMPUTATIONAL UNITS HAVING FIXED, APPLICATION SPECIFIC COMPUTATIONAL ELEMENTS,” filed on Mar. 22, 2001, the disclosure of which is hereby incorporated by reference in its entirety as if set forth in full herein for all purposes. Based on the disclosure provided herein, it will be appreciated by a person of ordinary skill in the art that the present invention can be implemented using hardware in various different manners.
It should also be understood that, based on the disclosure provided herein, a person of ordinary skill in the art will appreciate that minor modifications can be made to the present invention to accommodate and implement a number of other encryption/decryption algorithms.
It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference for all purposes in their entirety.
The present application is a continuation-in-part application of U.S. patent application Ser. No. 09/815,122 entitled “ADAPTIVE INTEGRATED CIRCUITRY WITH HETEROGENEOUS AND RECONFIGURABLE MATRICES OF DIVERSE AND ADAPTIVE COMPUTATIONAL UNITS HAVING FIXED, APPLICATION SPECIFIC COMPUTATIONAL ELEMENTS,” filed on Mar. 22, 2001, the disclosure of which is hereby incorporated by reference in its entirety as if set forth in full herein for all purposes.
Relation | Number | Date | Country
---|---|---|---
Parent | 10093156 | Mar 2002 | US
Child | 12353267 | | US

Relation | Number | Date | Country
---|---|---|---
Parent | 09815122 | Mar 2001 | US
Child | 10093156 | | US