This invention relates to a method of generating a code of the type, for example, used to verify integrity of data, such as between a source node and a destination node in a communications network. This invention also relates to a processing apparatus of the type, for example, that processes data to generate a code, such as is used to verify integrity of data.
In the field of digital communications, data is commonly communicated from a source node to a destination node. Typically, the source node, having a block of data to be transmitted, appends a code to the block of data, the code relating to the block of data to be transmitted and serving as a mechanism for verifying that the block of data is free of errors upon receipt thereof following transmission. One example of this technique is known as Cyclic Redundancy Checking (CRC), and involves the source node applying a 16- or 32-bit polynomial to the block of data, the result of the polynomial constituting a CRC code that is appended to the block of data. Upon receipt of the block of data and the CRC code by the destination node, the destination node applies the same polynomial to the block of data, and the result is compared with the CRC code appended by the source node. If the result of applying the polynomial at the destination node agrees with the CRC code appended to the block of data received, the block of data is deemed received free of errors. However, in the event that the result of the application of the polynomial at the destination node does not match the CRC code appended by the source node, the destination node usually notifies the source node to re-transmit the block of data.
This technique is used in relation to a number of communication technologies, for example: Media Access Control (MAC) of Ethernet, Third Generation wireless communications systems as standardised by the Third Generation Partnership Project (3GPP), as well as certain aspects of Internet-related technology, such as Stream Control Transmission Protocol (SCTP) and the Asynchronous Transfer Mode (ATM) Adaptation Layer 5 (AAL-5).
G. Griffiths and G. C. Stones, “The tea-leaf reader algorithm: An efficient implementation of CRC-16 and CRC-32” (Communications of the ACM, vol. 30, No. 7, July 1987), T. V. Ramabadran, S. S. Gaitonde, “A Tutorial on CRC Computations” (IEEE Micro, August 1988), and D. V. Sarwate, “Computation of Cyclic Redundancy Checks via Table Look-up” (Communications of the ACM, vol. 31, No. 8, August 1988) all describe conventional “parallel CRC calculation” techniques for generating CRC codes based upon contemporaneous look-ups of multiple bits. An exclusive-OR (XOR) is then performed on the result of the look-up and a successive input bit-string to generate a new value as an index for a subsequent look-up. Hence, generation of an index relies upon the result of a previous look-up.
The above-described data dependency is a bottleneck that prevents modern high-performance microprocessors from maximising Central Processing Unit (CPU) performance. For example, if a look-up (a load instruction) takes three cycles to complete, then four data-dependent look-ups take twelve cycles on any CPU architecture. In contrast, four data-independent look-ups are completed in fewer cycles than an equivalent number of data-dependent look-ups. In this respect, a pipelined super-scalar CPU can complete four data-independent table look-ups in seven cycles if one table look-up takes three cycles to complete. Other architectures supporting parallel multiple memory bank accesses, such as StarCore developed by StarCore, LLC, or the TI C6xx family of processors available from Texas Instruments, only require three cycles to carry out four data-independent table look-ups. This is achieved by duplicating the table in four different memory banks, one table look-up taking three cycles to complete.
However, although data-independency is desirable, data-dependency is inevitable due to the nature of the mathematics underlying the technique, i.e. since the process of problem solving used to generate the CRC code needs related data.
According to the present invention, there is provided a processing apparatus and a method of generating a code for verifying integrity of data as set forth in the appended claims.
At least one embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Throughout the following description identical reference numerals will be used to identify like parts. In relation to changes to bits, updated parts of bit streams described herein will be underlined.
Referring to
The communications device/apparatus comprises an MC7447 32-bit processor available from Freescale Semiconductor, Inc. and constituting a processing resource 100. The skilled person will, however, appreciate from the description later herein that the above-described functionality can be implemented on other 32-bit processors.
The processing resource 100 has a scalar core and comprises, inter alia, an input 102 coupled to a Load/Store Unit (LSU) 104 capable of communicating with an Integer Unit (IU) 106 and a parallel processing unit, such as a so-called “AltiVec” Single Instruction Multiple Data (SIMD) engine 107, the LSU 104 also being coupled to an output 108. The skilled person will, of course, appreciate that the processing resource 100 comprises other operational units not described herein for the sake of conciseness and simplicity, since such operational units do not have a direct bearing on the examples described herein.
Turning to
An N-bit binary string, B, 200 is defined as $B(x) = b_0x^{N-1} + b_1x^{N-2} + \ldots + b_{N-2}x + b_{N-1}$, i.e. $B = b_0b_1 \ldots b_{N-1}$. A fixed (M+1)-bit value is likewise defined as $G(x) = g_0x^M + g_1x^{M-1} + \ldots + g_{M-1}x + g_M$.
A CRC code for the N-bit binary string, B, is an M-bit value $CRC_M = c_0c_1 \ldots c_{M-1}$, where $c_i$ ($i = 0, 1, 2, \ldots, M-1$) are coefficients of the polynomial $CRC_M(x) = c_0x^{M-1} + \ldots + c_{M-2}x + c_{M-1} = B(x) \cdot x^M \bmod G(x)$. That is, the CRC code, $CRC_M$, is the remainder obtained by left-shifting the string B(x) by M bits and then dividing the result by G(x).
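The definition above can be illustrated with a short sketch. The generator 0x180F (x^12 + x^11 + x^3 + x^2 + x + 1, a commonly used CRC-12 polynomial) and the helper names `poly_mod` and `crc_m` are assumptions made purely for illustration; the description does not fix a particular G(x) at this point.

```python
# Minimal sketch of CRC_M(x) = B(x) * x^M mod G(x) over GF(2).
# Integers are used as bit strings: bit k is the coefficient of x^k.

G = 0x180F                 # assumed generator: x^12 + x^11 + x^3 + x^2 + x + 1
M = G.bit_length() - 1     # degree of G(x), here 12


def poly_mod(a: int, g: int) -> int:
    """Remainder of a(x) divided by g(x) over GF(2) (subtraction is XOR)."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        # Cancel the leading term of a(x) with a shifted copy of g(x).
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a


def crc_m(b: int) -> int:
    """CRC of the bit string b: left-shift by M bits, then reduce modulo G(x)."""
    return poly_mod(b << M, G)


message = 0b1011001110001111        # arbitrary example bit string B
code = crc_m(message)               # M-bit CRC code
codeword = (message << M) ^ code    # B followed by its CRC code
```

Under these assumptions, a receiver can confirm integrity by checking that `poly_mod(codeword, G)` is zero, which is equivalent to recomputing the CRC code and comparing it with the appended one.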
An L-bit parallel processing approach to calculating the CRC code over B can be achieved when B is accessed in units of length L bits, i.e. L-bit units, and a $2^L$-entry table is provided to store the $CRC_M$ values for the $2^L$ possible L-bit units, i.e. by pre-calculating a look-up table, LTU, defined as $LTU(z) = CRC_M(z)$ for $z = 0, 1, 2, \ldots, 2^L-1$. For example, if L=8, we have a byte-wise table look-up approach to calculate the CRC code.
A conventional approach to generating the CRC code handles the N-bit binary string, B, as a plurality of L-bit wide data units, Bi, and iteratively performs a look-up in the look-up table, LTU, in respect of each L-bit wide data unit, Bi, in turn, according to a known pattern of table look-ups. For each iteration, an exclusive-OR (XOR) function is then performed on a result obtained from the look-up table and a previous result of the XOR function. In this respect, for example, a 2nd lookup LTU(B1′) relies upon a 1st lookup result LTU(B0), while a 3rd lookup relies on a 2nd lookup result and so on. A final remainder of the above process is the desired CRCM value.
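The conventional byte-wise approach (L=8) can be sketched as follows, to make the data dependency visible. The generator 0x180F and the names `poly_mod`, `LTU` and `crc_bytewise` are illustrative assumptions, and the sketch assumes M is at least 8; it is not taken from the cited references.

```python
# Sketch of a conventional table-driven CRC (L = 8), assuming M >= 8.

G = 0x180F                 # assumed generator (a common CRC-12), illustration only
M = G.bit_length() - 1
MASK = (1 << M) - 1


def poly_mod(a: int, g: int) -> int:
    """Remainder of a(x) divided by g(x) over GF(2)."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a


# LTU(z) = CRC_M(z) = z * x^M mod G(x) for every byte value z.
LTU = [poly_mod(z << M, G) for z in range(256)]


def crc_bytewise(data: bytes) -> int:
    crc = 0
    for byte in data:
        # The table index is formed from the previous look-up result,
        # so each look-up must wait for the one before it.
        index = (crc >> (M - 8)) ^ byte
        crc = ((crc << 8) & MASK) ^ LTU[index]
    return crc


data = bytes(range(1, 40))          # arbitrary example data
reference = poly_mod(int.from_bytes(data, "big") << M, G)
```

Because `index` cannot be formed until the preceding iteration has updated `crc`, the look-ups in this loop cannot be issued independently of one another.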
In contrast, the improved CRC algorithm “postpones” the data-dependent table lookups for several (for example, K) of the L-bit units, Bi, to allow K data-independent look-ups to take place. In order to achieve deferral of the data-dependent look-up stage, one or more compression tables are built for use in K data-independent table look-ups that are launched independently.
It is possible to postpone the data-dependent table look-ups due to the absence of carry propagation for addition operations on polynomials having a base Galois Field ($GF(2^n)$), i.e. there is no carry propagation for XOR operations on binary strings. Consequently, through recursive application of data-independent table look-ups 202, 204, the N-bit binary string, B, is compressed to form a compressed binary string, B′, 206 having the same $CRC_M$ value as the N-bit binary string, B. The theory supporting the above compression of the N-bit binary string, B, will now be explained in greater detail.
The skilled person should understand that in order for compression to be possible a congruence equivalence has to exist. In this respect, it can be proven that a given function, A(x), is congruent to B(x) modulo G(x) if and only if there exists another function, Q(x), such that A(x)−B(x)=Q(x)G(x). For the sake of conciseness of description, an actual proof of the above congruence equivalence has been omitted herein.
Referring to
where W(x) is an “integral part” 302 of the N-bit binary string, B, and t(x) is a T-bit trailer 304 or “left-over part”, such that $B(x) = W(x)x^T + t(x)$. In this respect, m vectors are needed to hold the N-bit binary string, B, but since m vectors result in a string longer than N bits, the part of the N-bit binary string remaining after the first (m−1) vectors 300 is the T-bit trailer 304, which is shorter than one vector in length. The integral part W(x) 302 therefore constitutes the first (m−1) complete vectors.
It can be proven that $(W(x) - Q(x)G(x))x^T + t(x)$ is congruent to B(x) modulo G(x), where Q(x) is an arbitrary function. However, for the sake of conciseness the proof will not be described herein. The congruence holds when Q(x)G(x) is subtracted from W(x), the integral part of B(x); that is, after subtracting a multiple of G(x) from the integral part of the data frame, the result of the subtraction still has the same CRC value as the N-bit string of the data frame, B(x) 200.
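Although the proof is omitted, the congruence can be checked numerically. The following sketch uses an assumed generator (0x180F) and arbitrary example values for W, t, T and Q; the helper names `poly_mod` and `clmul` are hypothetical.

```python
# Numerical check: subtracting Q(x)G(x) from the integral part W(x)
# leaves the remainder modulo G(x) unchanged.

G = 0x180F                 # assumed generator, for illustration only


def poly_mod(a: int, g: int) -> int:
    """Remainder of a(x) divided by g(x) over GF(2)."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a


def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial multiplication of a(x) and b(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


T = 24                               # trailer length in bits (example value)
W = 0x0123456789ABCDEF0F1E2D3C       # integral part (example value)
t = 0xABCDEF                         # T-bit trailer (example value)
Q = 0x5A5A5                          # arbitrary Q(x)

original = (W << T) ^ t                      # B(x) = W(x) x^T + t(x)
modified = ((W ^ clmul(Q, G)) << T) ^ t      # (W(x) - Q(x)G(x)) x^T + t(x)
```

Over GF(2), subtraction and addition are both XOR, so `W ^ clmul(Q, G)` corresponds to W(x) − Q(x)G(x). The two strings differ, yet `poly_mod(original, G)` and `poly_mod(modified, G)` agree.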
In relation to the integral part 302, the integral part 302 can be expressed as:
$W(x) = v_0(x)x^{(m-2)KL} + v_1(x)x^{(m-3)KL} + v_2(x)x^{(m-4)KL} + \ldots + v_{m-2}(x)$
Taking a first term, $v_0(x)x^{(m-2)KL}$, of the above expression for the integral part W(x) 302, the first term can be expressed as follows:
$v_0(x)x^{(m-2)KL} = (B_0x^{(K-1)L} + B_1x^{(K-2)L} + \ldots + B_{K-2}x^L + B_{K-1})x^{(m-2)KL}$
In order to achieve compression of the data frame, B(x), 200 it is necessary to replace the first term, $v_0(x)x^{(m-2)KL}$, with a congruent polynomial having a lower degree than the first term. The above replacement results in the integral part, W(x), 302 being reduced to W′(x) (not shown), W′(x) being one vector shorter in length than W(x). In this way, the same CRC code can be generated in respect of the resultant, compressed frame.
Furthermore, by introducing an integer C limiting the number of vectors updated, where 0<C<(m−2), the first term is equivalent to:
$v_0(x)x^{(m-2)KL} = (B_0x^{CKL+(K-1)L} + B_1x^{CKL+(K-2)L} + \ldots + B_{K-2}x^{CKL+L} + B_{K-1}x^{CKL})x^{(m-2-C)KL}$
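For each $B_ix^{CKL}$ term, where $i = 0, 1, \ldots, K-1$, there exists a function $q_i(x)$ such that $B_ix^{CKL} = q_i(x)G(x) + r_i(x)$, where the degree of $r_i(x)$ is less than the degree of G(x), i.e. $\deg r_i(x) < M$. Consequently:
$v_0(x)x^{(m-2)KL} = ((q_0G + r_0)x^{(K-1)L} + (q_1G + r_1)x^{(K-2)L} + \ldots + (q_{K-2}G + r_{K-2})x^L + (q_{K-1}G + r_{K-1}))x^{(m-2-C)KL}$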
The above expression is equivalent to:
$v_0(x)x^{(m-2)KL} \equiv (r_0x^{(K-1)L} + r_1x^{(K-2)L} + \ldots + r_{K-2}x^L + r_{K-1})x^{(m-2-C)KL} \pmod{G(x)}$
The term, $r_i = B_ix^{CKL} \bmod G(x)$, is thus pre-calculated and stored in a compression look-up table $LTU(B_i)$ for $B_i = 0, 1, \ldots, 2^L-1$. The term, $r_i = LTU(B_i)$, may overlap with a subsequent term, $r_{i+1} = LTU(B_{i+1})$, but the same compression look-up table is nevertheless used to obtain the result of an operand multiplied by $x^{CKL}$ modulo G(x).
At this stage, the first term, v0, is eliminated while the subsequent terms, v1, v2, . . . , vC, are updated as v1′, v2′, . . . , vC′, such that:
(vc+1 . . . vm-2),
and
$B(x) = W(x)x^T + t(x) \equiv W'(x)x^T + t(x) \pmod{G(x)}$
From the above illustration, it can be seen that W(x)=W′(x) mod G(x), where W′(x) is the compressed integral part resulting from elimination of the leading vector (first term), v0, and shifting and performing XOR operations on the results of the compression table look-ups r0, r1, . . . rK-1.
Hence, by recursively applying the above approach to a leading vector of the N-bit binary string, a binary string of arbitrary length can be reduced to a binary string having C vectors as a compressed integral part thereof. The reduced binary string is congruent to the original binary string modulo G(x).
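The recursive elimination of the leading vector can be sketched in byte-oriented form. This is a simplified illustration only: the generator 0x180F, the choice K=4 bytes per vector with L=8 and C=2, and the names `COMPRESS`, `compress` and `crc_m` are assumptions, and the sketch does not treat the trailer separately as the description does.

```python
# Sketch: eliminate the leading vector with K data-independent look-ups,
# XOR-ing the staggered remainders into the following C vectors.

G = 0x180F                 # assumed generator, for illustration only
M = G.bit_length() - 1
K, C = 4, 2                # K bytes per vector (L = 8), C vectors updated


def poly_mod(a: int, g: int) -> int:
    """Remainder of a(x) divided by g(x) over GF(2)."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a


def crc_m(data: bytes) -> int:
    """Reference CRC computed directly from the definition."""
    return poly_mod(int.from_bytes(data, "big") << M, G)


# Compression table: r = z * x^(C*K*L) mod G(x) for every byte value z.
COMPRESS = [poly_mod(z << (C * K * 8), G) for z in range(256)]


def compress(data: bytes) -> bytes:
    buf = bytearray(data)
    start = 0
    # Continue while a leading vector and C further full vectors remain.
    while len(buf) - start >= (C + 1) * K:
        acc = 0
        for i in range(K):
            # These K look-ups depend only on input bytes, not on each other.
            acc ^= COMPRESS[buf[start + i]] << ((K - 1 - i) * 8)
        start += K                                  # leading vector eliminated
        window = int.from_bytes(buf[start:start + C * K], "big") ^ acc
        buf[start:start + C * K] = window.to_bytes(C * K, "big")
    return bytes(buf[start:])


data = bytes((7 * i + 3) % 256 for i in range(45))  # arbitrary example data
short = compress(data)
```

Under these assumptions `short` is shorter than (C+1)·K bytes and has the same CRC as `data`, so only the short residue needs to be passed to a conventional, data-dependent CRC routine.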
In practice, the integer C is chosen as 2 in order to ensure that a final look-up result from the compression look-up table, $LTU(B_{K-1})$, is right aligned with $v_2$, and hence to avoid wasting processor resources, as only $v_1$ and $v_2$ need updating. However, other values can be chosen for the integer, C, to postpone the data-dependency further.
In relation to a vector-based parallel processor, such as the AltiVec engine (
As mentioned above, $r_i$, where $i = 0, 1, \ldots, 15$, is a remainder of byte i, $B_i$, of vector j, $v_j$, times $x^{256}$ with respect to the function G(x), i.e. $B_i \cdot x^{256} \bmod G(x)$. For the function G(x) with degree of M, each remainder, $r_i$, occupies M bits.
The AltiVec engine is capable of carrying out 16 $B_i \cdot x^{256} \bmod G(x)$ look-ups for $i = 0, 1, \ldots, 15$ in parallel by splitting each byte to be looked-up into a corresponding pair of nibbles, namely a high nibble, $H_i$, and a low nibble, $L_i$. In this way, the following expression illustrates that the same result as $B_i \cdot x^{256}$ can be achieved by carrying out look-ups in respect of nibble pairs:
$B_i \cdot x^{256} = (H_i \cdot x^4 + L_i) \cdot x^{256} = H_i \cdot x^{260} + L_i \cdot x^{256} \bmod G$
Hence, instead of a single compression look-up table for vector-based parallel processing, two 16-entry compression look-up tables are employed: one in respect of the high nibble (LUTH) and one in respect of the low nibble (LUTL):
$LUTH(H) = H \cdot x^{260} \bmod G$,
and
$LUTL(L) = L \cdot x^{256} \bmod G$.
Combining results obtained from these look-up tables (using an XOR operation) allows results to be obtained equivalent to using a single compression look-up table:
LTU(HL)=LUTH(H)+LUTL(L)
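The equivalence of the two 16-entry nibble tables and a single 256-entry table can be checked exhaustively. The generator 0x180F and the helper names are assumptions made for illustration; the identity itself holds for any G(x) because reduction modulo G(x) is linear over GF(2).

```python
# Sketch: two 16-entry nibble tables reproduce the 256-entry byte table.

G = 0x180F                 # assumed generator, for illustration only


def poly_mod(a: int, g: int) -> int:
    """Remainder of a(x) divided by g(x) over GF(2)."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a


LUTH = [poly_mod(h << 260, G) for h in range(16)]    # H * x^260 mod G
LUTL = [poly_mod(l << 256, G) for l in range(16)]    # L * x^256 mod G
LTU = [poly_mod(b << 256, G) for b in range(256)]    # B * x^256 mod G


def lookup_by_nibbles(byte: int) -> int:
    """Combine the high-nibble and low-nibble look-ups with XOR."""
    return LUTH[byte >> 4] ^ LUTL[byte & 0x0F]
```

Sixteen-entry tables are a natural fit for a permute-style instruction that selects among a small number of table bytes in parallel, which is the motivation given above for splitting each byte into nibbles.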
The remainders obtained are staggered or “staircased” in M-bit wide blocks, i.e. the result of mod G(x), as a result of the long division, effectively being arranged for the performance of XOR operations as shown in
In another embodiment, if the vectors v0, v1, v2 . . . vm-1, are assumed to be shorter in length, in particular, four bytes in length each, then the scalar core of the processing resource 100 can compress the N-bit binary string, B(x), on a word-wise basis in accordance with the principles already described herein. In a further embodiment, the vector-based compression technique described above can be used to generate an intermediate compressed binary string that is subsequently subjected to a word-wise compression by arranging the vectors as units of smaller lengths, i.e. words.
The above compression technique will now be described in the context of CRC12. However, the skilled person will appreciate that other variants of CRC can be implemented using the above technique. In operation (
The number of vectors (vecs) is then calculated (Step 602) along with the number of trailing bytes, which is used to determine the number of words (words) 612 after vector compression has been completed, of the N-bit binary string. The N-bit string, B(x), is then arranged as a series of vectors, v0, v1, v2, followed by the T-bit trailer 304 comprising a trailing word, Tw0, and a trailing byte TB0.
According to the conventional method of generating a CRC12B, the CRC12B in respect of the N-bit binary string, B(x), is CRC12B(B(x))=0xF19. However, using the above described improved CRC code generation technique, the same CRC code can be generated as the CRC12B code generated by the conventional method.
In this respect, the LSU 104 implements a loop, by firstly initializing (Step 604) a counter, i, to zero. The LSU 104 then verifies (Step 606) that the counter is less than a vector number that is two less than the total number of vectors, vecs. The LSU 104, in conjunction with the AltiVec engine 107, then performs a compression iteration (Step 608) in respect of a vector corresponding to the counter. The compression iteration is performed as follows.
Firstly, the first vector, v0, is broken into a high nibble vector, vH, and a low nibble vector, vL, such that:
vH=9FED CBA9 0765 4321
vL=9123 4567 89AB CDEF
Because the processing resource 100 operates on vectors for this stage of the improved CRC algorithm, a first compression look-up table (Table 1, below) and a second compression look-up table (Table 2, below) are used, both generated using the already described algebraic technique.
Four vectors are needed to store each of the first and second vector compression tables to facilitate a so-called “vperm” instruction supported by the AltiVec engine 107. The first compression table is used in relation to Low (L)-nibble vectors and the second compression table is used in relation to High (H)-nibble vectors. Further, a two-byte result of a parallel nibble look-up needs two vectors to store.
The first and second compression look-up tables are then accessed by the AltiVec engine 107 to obtain Most Significant (MS) bytes corresponding to the high and low nibble vectors vH and vL:
vCcrc12_MSB(vH)=0901060e09060109000f080007080f07
vCcrc12_MSB(vL)=0a090a030d04070e030a09000e07040d
The results of the above two compression table look-ups are then subjected to an XOR operation:
to generate the MS bytes for the compression of the first vector, v0. The first and second compression look-up tables are then accessed again to obtain Least Significant (LS) bytes corresponding to the high and low nibble vectors vH and vL:
vCcrc12_LSB(vH)=8437fda36910da840079b3ed275e94ca
vCcrc12_LSB(vL)=9c79fd84f58c0871e59c18611069ed94
The results of the above two compression table look-ups are then subjected to an XOR operation:
As can be seen from
(vCcrc12_MSB(v0)<<8)⊕(vCcrc12_LSB(vH)⊕vCcrc12_LSB(vL))=03:10420d239e9ad5f6e0e4ab85383c735e
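The combination of the look-up results quoted above can be reproduced with plain integer arithmetic. The sketch below uses only the hexadecimal values stated in the text; the variable names are hypothetical, and the grouping of the operations (MS bytes shifted left by 8 bits, then XORed with the LS bytes) is the reading under which the stated result is obtained.

```python
# Reproduce the staggered XOR of the MS-byte and LS-byte look-up results.

msb_h = int("0901060e09060109000f080007080f07", 16)   # vCcrc12_MSB(vH)
msb_l = int("0a090a030d04070e030a09000e07040d", 16)   # vCcrc12_MSB(vL)
lsb_h = int("8437fda36910da840079b3ed275e94ca", 16)   # vCcrc12_LSB(vH)
lsb_l = int("9c79fd84f58c0871e59c18611069ed94", 16)   # vCcrc12_LSB(vL)

msb = msb_h ^ msb_l            # MS bytes for the compression of v0
lsb = lsb_h ^ lsb_l            # LS bytes for the compression of v0

# Each MS byte sits one byte position above its LS byte, hence the shift.
combined = (msb << 8) ^ lsb

combined_hex = f"{combined:034x}"      # 17 bytes: one overflow byte plus 16
```

Here `combined_hex` is `0310420d239e9ad5f6e0e4ab85383c735e`, i.e. the leading byte 03 followed by the 16-byte value given in the text.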
The counter, i, is then incremented (Step 610) and the LSU 104 determines (Step 606) once again whether the counter is still less than (vecs−2) vectors. Depending upon the size of the N-bit binary string, B(x), the above process of table look-ups is repeated (Steps 608, 610, 606) until all bytes in the (vecs−2) vectors being compressed have been looked up.
In the present example, the N-bit string only comprises 3 vectors and so after one iteration of the above process (Step 608), another XOR operation is performed on the result of the previous XOR operation and the vectors v1:v2 to yield a partially compressed string of bits B′, constituting the elimination of the first vector, v0, from B(x):
Thereafter, the processing resource 100 switches from compressing the N-bit string on a vector basis to a word compression basis (
In this respect, the LSU 104 calculates (Step 612) the number of words based upon the length of the vectors of the previous iteration using the AltiVec engine 107 (32 bytes) and the number of bytes in the trailer 304, the number of bytes in the trailer 304 being calculated as (bytes mod 4). Thereafter, the LSU 104 re-initialises (Step 614) the counter, i, to zero and determines (Step 616) whether the counter is less than the number of words less the constant, c. In this example, the constant c is 2 as mentioned above so as to preserve right alignment. Since this is the first iteration of the word-wise compression, the processing resource 100 proceeds to access a word (4-byte) compression table (Table 3, below). Each entry in the word compression table is 2 bytes wide and padded with 4 leading binary zeros to facilitate right byte-alignment.
Consequently, the LSU 104 initially accesses the word compression table to look up each word in the partially compressed string, B′. Hence, for the first word 700 (w1), the look-up results are as follows:
wCcrc12(10)=46C (reference 710)
wCcrc12(11)=229 (reference 712)
wCcrc12(12)=8E6 (reference 714)
wCcrc12(13)=EA3 (reference 716)
Thereafter, an XOR operation is performed on the 4 results of the above look-ups and w2:w3 in order to obtain w2′:w3′:
w2′:w3′ = 14151617 18191a1b ⊕ (46C << 24) ⊕ (229 << 16) ⊕ (8E6 << 8) ⊕ EA3 = 14151613 7638F2B8
thereby reducing the partially compressed string B′(x), since the first word, w1, is eliminated. Hence, a first iteration of the word-wise compression (Step 618) results in a further compressed binary string, B″:
B″=14151613:7638F2B8:18191a1b:1c1d1e1c:30632f00: babff3d1:c8cd81ae:14115d71:30313233:84=w2′:w3′:w4:w5:w6:w7:w8:w0:B0
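The XOR arithmetic of the first word-wise iteration can be checked with the look-up results quoted above. The variable names are hypothetical; the byte-wise staggering of the four 12-bit results is the arrangement under which the stated value of w2′:w3′ is obtained.

```python
# Reproduce w2':w3' from w2:w3 and the four look-up results for word w1.

lookups = [0x46C, 0x229, 0x8E6, 0xEA3]   # wCcrc12(10), (11), (12), (13)
window = 0x1415161718191A1B              # w2:w3 before the update

acc = 0
for i, remainder in enumerate(lookups):
    # Byte i of the word is staggered (3 - i) byte positions to the left.
    acc ^= remainder << (8 * (3 - i))

updated = window ^ acc                   # w2':w3'
updated_hex = f"{updated:016X}"
```

Here `updated_hex` is `141516137638F2B8`, matching the values 14151613 and 7638F2B8 given for w2′ and w3′.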
The counter, i, is then incremented (Step 620) and the LSU 104 again determines (Step 616) whether the counter is still less than the predetermined maximum of (word-c). In the present example, the counter, i, is still less than (word-c) and so the word-wise compression process is repeated (Step 618) using the further compressed binary string B″, resulting in a final compressed binary string B′″:
B′″=7a1747a5:af156735:30313233:84
At this point, it should be noted that the final compressed binary string B′″ cannot be compressed further, but that it has the same CRC12 value/code as the N-bit binary string, B.
The conventional CRC12 algorithm is therefore applied (Step 622) to the final compressed binary string B′″, resulting in:
crc12B(B′″)=0xF19
This is the same result as previously stated above in relation to performing the conventional CRC algorithm exclusively on the N-bit binary string. As can be seen from the above example, other than in relation to the conventional CRC12 algorithm, the compression table look-ups performed above are data-independent. It is thus possible to provide a method of generating a code for verifying integrity of data, and an apparatus therefor, that is capable of incorporating more instructions into a given number of CPU cycles than is possible through use of existing data-dependent CRC algorithms. Consequently, higher performance is achieved through the increase in Instructions Per Cycle (IPC).
The above described example using exemplary data employed both vector-based and word-wise compression of the N-bit binary string 200. However, the skilled person will appreciate that the vector-based implementation can be used as the sole means of compressing the N-bit binary string 200 prior to implementing the conventional CRC algorithm, or alternatively that the word-wise implementation can be used as the sole means of compressing the N-bit binary string 200 prior to implementing the conventional CRC algorithm.
Although, in the above examples, a single microprocessor constitutes the processing resource 100, the skilled person will appreciate that more than one suitably-equipped processing unit can constitute the processing resource 100.
Alternative embodiments of the invention can be implemented as a computer program product for use with a computer system, the computer program product being, for example, a series of computer instructions stored on a tangible data recording medium, such as a diskette, CD-ROM, ROM, or fixed disk, or embodied in a computer data signal, the signal being transmitted over a tangible medium or a wireless medium, for example, microwave or infrared. The series of computer instructions can constitute all or part of the functionality described above, and can also be stored in any memory device, volatile or non-volatile, such as semiconductor, magnetic, optical or other memory device.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2005/053119 | 6/30/2005 | WO | 00 | 12/28/2007 |