The present invention relates to cyclic redundancy check (CRC) computations, and more particularly to table look-up techniques for error detection using CRC.
Information transmitted electronically may be vulnerable to corruption due to various factors including noise in the transmission channel, intentional tampering, etc. For example, errors may be introduced to a message transmitted over a network by the transmission media and/or the electrical or optical components comprising the network. To establish the integrity of a message, a sender may attach a checksum to the transmitted message that can be employed by the recipient to check for any of various transmission errors that may occur while the message is being sent.
A simple example of a checksum implementation includes appending the sum of the bytes in the message at the end of the transmission. The recipient may then add up the bytes in the received message and compare the result with the checksum. If one or more of the bytes in the message were corrupted during the transmission, the sum will likely not match the appended checksum, thus indicating that the message may have been corrupted. However, this relatively simple checksum technique will fail if the various bytes in the message are corrupted in such a way that individual byte errors compensate one another to produce a sum consistent with the checksum. The probability of a checksum technique failing to identify an error can be reduced by introducing more complex techniques than simple summing.
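A minimal Python sketch of the byte-sum checksum and of the compensating-error failure described above follows. Reducing the sum modulo 256 (a one-byte checksum) and the function name `byte_sum_checksum` are assumptions made for illustration.

```python
def byte_sum_checksum(message: bytes) -> int:
    # Assumption: the "sum of the bytes" is kept to a single byte (modulo 256).
    return sum(message) % 256


original = bytes([0x10, 0x20, 0x30])
checksum = byte_sum_checksum(original)

# A single corrupted byte changes the sum, so the error is detected.
single_error = bytes([0x11, 0x20, 0x30])
# Two errors that compensate one another (+1 and -1) leave the sum unchanged.
compensating = bytes([0x11, 0x1F, 0x30])

print(byte_sum_checksum(single_error) == checksum)   # False: error detected
print(byte_sum_checksum(compensating) == checksum)   # True: error missed
```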
It should be appreciated that the term “checksum” refers generally to any information appended or otherwise included in a transmission indicating one or more properties of the message, and is not limited to sums or any other particular operation. For example, a checksum may be a quotient remainder, a product, a sum or may include one or more other transformations based on the content of the message. The term “message” herein refers to the content portion of an electronic transmission (i.e., without the checksum). The term “transmission” is used herein to describe the combination of at least the message and the checksum.
Cyclic redundancy check (CRC) methods involve forming a checksum from the remainder of dividing the message by a predetermined binary number. For example, the message may be considered as a large binary number, wherein the first bit in the message may operate as the most significant bit (MSB) of the number and the final bit in the message may operate as the least significant bit (LSB), or vice versa. The message may then be divided by a predetermined binary number known to both the sender and receiver of the message. The sender attaches the remainder of the division as the checksum, and the receiver repeats the division operation on the received message to verify that the resulting remainder matches the transmitted checksum.
The efficacy of the above scheme in detecting certain types of transmission errors depends, in part, on the binary number used as the divisor. Certain classes of divisors have properties that can more readily detect transmission errors of different types. In particular, certain polynomials exhibit desirable properties (e.g., randomness) that, when operated on using polynomial arithmetic, and more particularly Galois field polynomial arithmetic, provide a basis for performing effective CRC computations.
As discussed above, a message to be transmitted may be considered as a single large binary number. This binary number may then be divided by the binary representation of a chosen polynomial, referred to as a generator polynomial (e.g., polynomial 10 illustrated in
CRC operations employing generator polynomials are typically done using Galois field arithmetic (sometimes referred to as polynomial arithmetic). In Galois field arithmetic, addition and subtraction are equivalent to a logical exclusive-OR (XOR) operation as shown in Table 1 below. Certain generator polynomials are known to have generally desirable characteristics that lend themselves to detection of a variety of transmission errors, while having a low probability of missing errors due to, for example, internal compensation. Numerous generally effective generator polynomials are known in the art. However, any generator polynomial may be used.
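Because addition and subtraction in GF(2) are both XOR, polynomial division reduces to repeated shift-and-XOR steps. The following is a minimal Python sketch, assuming each polynomial is stored as an integer whose bit i is the coefficient of x^i; the name `gf2_mod` and the small generator x^3 + x + 1 are illustrative choices, not taken from the figures.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of dividend / divisor in GF(2)[x].

    Polynomials are stored as integers: bit i holds the coefficient of x^i.
    """
    if divisor == 0:
        raise ZeroDivisionError("generator polynomial must be nonzero")
    deg = divisor.bit_length() - 1
    # Align the divisor under the leading 1 of the dividend and "subtract" it,
    # which in GF(2) is a bitwise XOR; repeat until the degree drops below deg.
    while dividend.bit_length() - 1 >= deg:
        dividend ^= divisor << (dividend.bit_length() - 1 - deg)
    return dividend


generator = 0b1011                 # x^3 + x + 1 (illustrative)
message = 0b11010011101100
# Sender: append deg(generator) zero bits and keep the remainder as the checksum.
checksum = gf2_mod(message << 3, generator)
print(bin(checksum))                                   # 0b100
# Receiver: the message followed by its checksum leaves a zero remainder.
print(gf2_mod((message << 3) ^ checksum, generator))   # 0
```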
A division operation in GF(2) may be performed by computing successive XOR operations between the divisor and the dividend. For example,
The division operation in
As message 355 is shifted through the LFSR, the state vector continues to change, based on the content of message 355 and the feedback connections or “taps” formed at various stages of the LFSR. For example, LFSR 300 includes feedback connections 310a and 310b, which provide the value stored at the MSB of the register to a respective summing element 325a and 325b situated between predetermined stages of the LFSR. The summing elements perform modulo-2 arithmetic on their inputs (i.e., the summing elements perform a logical XOR operation on respective input values). The feedback connections are arranged according to the generator polynomial being used. For example, feedback connections 310a and 310b implement the generator polynomial shown in the division operation of
A received message may be shifted through LFSR 300 as a binary stream from right to left. As the message is shifted through the LFSR, the feedback connections perform a division operation equivalent to the operation shown in
As the MSB of message 220 is shifted out of the LFSR (e.g., on the next clock pulse), feedback connections 410a and 410b take on a value of 1. As a result, the value in storage element R4 will be XOR'ed with feedback connection 410a and the result shifted into storage element R5. Likewise, the value in R3 is XOR'ed with feedback connection 410b and the result is shifted into storage element R4. The result after the first modular shift is shown in
In the configuration illustrated in
It should be appreciated that bits can be streamed into LFSR 400 to provide a division operation on any size message to perform a checksum validation. An LFSR may include any number of storage elements, i.e., the shift register may be of any length, and may implement any generator polynomial (e.g., the feedback connections may be of any configuration or arrangement to implement a desired generator polynomial). An LFSR may be implemented in hardware, software, or a combination of both. While hardware solutions are typically faster, software solutions provide generality and obviate the need for dedicated hardware to perform CRC computations. For example, software solutions can easily incorporate and switch between any number of generator polynomials.
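A software LFSR of this kind can be sketched in a few lines of Python. In the hypothetical `lfsr_remainder` function below, the generator polynomial is simply a parameter, so switching polynomials requires no hardware change. The sketch assumes the message is shifted in MSB first and that `poly` holds only the generator's lower-order coefficients (the tap positions), with the leading x^width term implicit. Shifting the message followed by `width` zero bits leaves the remainder of the message multiplied by x^width and divided by the generator polynomial.

```python
def lfsr_remainder(bits, poly: int, width: int) -> int:
    """Shift a bit stream, MSB first, through a width-stage LFSR.

    `poly` holds the generator's coefficients below x^width (the x^width
    term is implicit); each set bit marks a feedback tap position.
    """
    mask = (1 << width) - 1
    reg = 0
    for bit in bits:
        feedback = (reg >> (width - 1)) & 1   # bit about to leave the last stage
        reg = ((reg << 1) | bit) & mask       # shift the next message bit in
        if feedback:
            reg ^= poly                       # modulo-2 summing elements at the taps
    return reg


message = [int(c) for c in "11010011101100"]
# Generator x^3 + x + 1: width 3, taps 0b011. Three zero bits flush the message.
print(bin(lfsr_remainder(message + [0, 0, 0], poly=0b011, width=3)))   # 0b100
```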
Software implementations, however, may significantly increase the computational cost of performing a CRC. In particular, the algorithm illustrated in
Look-up tables (LUTs) have been employed to speed up CRC computation by allowing multiple bits to be processed in a single operation. By pre-computing states of the LFSR and storing the results in an LUT, multiple states may be bypassed via an index into the LUT. For example, in
The number of states that an LFSR may be advanced depends, in part, on the generator polynomial being used, and the number of bits of an incoming message that are simultaneously considered. In particular, the distance between an initial state and an advanced state (i.e., the number of intervening states) depends on the number of bits being considered that precede the first feedback connection. For example, in
The index 565 of LFSR 500 (i.e., the contents of register 520 at iteration zero) may be used to address LUT 550 to obtain the associated advanced state stored as an entry in LUT 550. The obtained advanced state may then be loaded into the LFSR, obviating the need to iterate through the intervening states. As shown, only the values of S at the next iteration (i.e., S(1)) are obtained from LUT 550; the values of register 520 preceding the first feedback connection are obtained by shifting message 555 into LFSR 500 a number of times equal to the number of states by which the LFSR has been advanced, forming the next index into LUT 550. Accordingly, on each iteration four bits of the message are processed simultaneously.
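A conventional LUT of the kind discussed above might look like the following Python sketch. It assumes the 32-bit CRC-32 generator polynomial 0x04C11DB7 processed MSB first, and it advances 8 states per look-up with a 256-entry table rather than the 4 states of LFSR 500; `crc_bitwise`, `crc_table` and `TABLE` are illustrative names.

```python
POLY = 0x04C11DB7     # assumed generator: the CRC-32 polynomial, MSB-first form
MASK = 0xFFFFFFFF


def crc_bitwise(data: bytes, crc: int = 0) -> int:
    """Reference: advance the 32-bit LFSR one message bit at a time."""
    for byte in data:
        crc ^= byte << 24                 # message bits enter at the MSB end
        for _ in range(8):
            msb = crc & 0x80000000
            crc = (crc << 1) & MASK
            if msb:
                crc ^= POLY
    return crc


# Pre-compute, for each possible 8-bit index, the state reached after
# 8 shifts: one table entry per possible index value.
TABLE = [crc_bitwise(bytes([b])) for b in range(256)]


def crc_table(data: bytes, crc: int = 0) -> int:
    """Advance the CRC 8 states per look-up instead of one state per shift."""
    for byte in data:
        index = ((crc >> 24) ^ byte) & 0xFF
        crc = ((crc << 8) & MASK) ^ TABLE[index]
    return crc


print(crc_table(b"123456789") == crc_bitwise(b"123456789"))   # True
```

In this conventional scheme the table holds one entry per possible index, so processing more bits per look-up grows the table exponentially.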
One embodiment according to the present invention includes a method for advancing a state of a cyclic redundancy check (CRC) computation on a transmitted message via a look-up table (LUT) storing a plurality of entries associated with possible states of the CRC computation, the method comprising acts of computing a plurality of indexes based at least on a current state and a message chunk of the transmitted message, each index of the plurality of indexes addressing a location in the LUT, obtaining a plurality of entries from the LUT, each entry acquired from the location indicated by a respective one of the plurality of indexes, and computing an advanced state based on the plurality of entries.
Another embodiment according to the present invention includes a computer readable medium encoded with a program for execution on at least one processor, the program, when executed on the at least one processor, performing a method of advancing a state of a cyclic redundancy check (CRC) computation on a transmitted message via a look-up table (LUT) storing a plurality of entries associated with possible states of the CRC computation. The method comprises acts of computing a plurality of indexes based at least on a current state and a message chunk of the transmitted message, each index of the plurality of indexes addressing a location in the LUT, obtaining a plurality of entries from the LUT, each entry acquired from the location indicated by a respective one of the plurality of indexes, and computing an advanced state based on the plurality of entries.
Another embodiment according to the present invention includes an apparatus for advancing a state of a cyclic redundancy check (CRC) computation on a transmitted message. The apparatus comprises an addressable storage area encoded with a look-up table (LUT), at least one input adapted to receive a message chunk from the transmitted message and a current state of the CRC computation, and at least one controller coupled to the at least one input, the at least one controller adapted to compute a plurality of indexes based on the message chunk and the current state, use each of the plurality of indexes to address a respective location of the LUT to obtain an entry from each of the locations, and compute an advanced state based on the obtained entries.
The number of states by which an LFSR may be advanced may be increased by considering more of the message preceding the first tap when computing an LUT. For example, in
An increase in the number of states by which a CRC computation is advanced on each iteration incurs a corresponding increase in the size of the LUT required to store possible combinations of advanced states. For example, the number of entries stored by an LUT typically increases as $2^n$, where n is the length of the index or state vector used to access the LUT. Applicant has identified and developed methods and apparatus for generating an LUT for a CRC computation that requires substantially less storage space than conventional methods. In one embodiment, multiple indexes are computed to address, in parallel, an LUT to obtain information from the LUT that, in combination, may be used to determine an advanced state.
As discussed above, computation times may be decreased by obtaining an advanced state from an LUT, rather than arriving at the advanced state by iteratively shifting u through the LFSR on a bit-by-bit basis (either physically in hardware, logically in software, or a combination of both). Based on the generic form of LFSR 700, Applicant has developed methods for generating an LUT to store values pre-computed to advance an LFSR by k states, wherein the LUT has fewer than $2^k$ entries. In one embodiment, knowledge of the tap configuration (i.e., characteristics of the generator polynomial) is used to generate an LUT that does not require storing an advanced state for an exhaustive list of possible initial states. This realization stems in part from Applicant's work in Galois field mathematics beginning with the formulation of the generalized LFSR in
For scalar data, the * operator is used to indicate a bitwise AND operation. For vector and matrix data, the * operator is used to indicate the operation shown as follows:
where, as mentioned above, the * operator acts on the scalar values inside the matrices in equation 3 as a bitwise AND operation. The formulation in equation 1 may be expressed more succinctly as,
$$x(k+1) = A * x(k) \oplus B * u_0 \tag{4}$$
where x(k+1) is the state vector after the LFSR has been shifted once from an initial state x(k). That is, the column vector $[x_{31}(k), \ldots, x_0(k)]^T$ represents the values stored in the stages of LFSR 700 at some reference instant (i.e., this column vector, denoted x(k) in equation 4, represents an initial or current state of LFSR 700). Similarly, the column vector $[x_{31}(k+1), \ldots, x_0(k+1)]^T$ represents the state immediately succeeding the initial state x(k) after a single “shift” of the LFSR in view of a first bit $u_0$ of binary number u.
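Assuming the * operator on matrices and vectors is ordinary matrix multiplication over GF(2), in which each scalar product is a bitwise AND and the products along a row are combined by XOR, it can be sketched in Python as follows. Matrices are plain lists of 0/1 rows, a column vector is an n×1 matrix, and `gf2_matmul` is a hypothetical name.

```python
def gf2_matmul(a, b):
    """The * operator for matrices over GF(2): AND the entries, XOR the sums."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    result = []
    for row in a:
        out_row = []
        for j in range(cols):
            acc = 0
            for k in range(inner):
                acc ^= row[k] & b[k][j]   # scalar *: AND; accumulation: XOR
            out_row.append(acc)
        result.append(out_row)
    return result


A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
x = [[1], [1], [0]]                 # a column vector as a 3x1 matrix
print(gf2_matmul(A, x))             # [[0], [1], [1]]
```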
Matrix A depends on the tap configuration ($h_0$ through $h_{31}$) and effects a shift of the LFSR. Matrix B also depends on the tap configuration and performs the operation of the feedback connections. It should be appreciated that matrices A and B include similar information. In particular,
$$B = A * [1, 0, 0, \ldots, 0]^T \tag{5}$$
This relationship may be used to simplify the expression in equation 4. For example, let k = 0 to define an arbitrary initial state x(0). By multiplying $u_0$ by $[1, 0, 0, \ldots, 0]^T$ and substituting equation 5 into equation 4, equation 4 becomes,
$$x(1) = A * \left(x(0) \oplus [u_0, 0, 0, \ldots, 0]^T\right) \tag{6}$$
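Equations 4 through 6 can be checked numerically with a short Python sketch. It assumes a 32-stage LFSR implementing the CRC-32 generator polynomial 0x04C11DB7 (MSB first), stores each GF(2) matrix as a list of 32 column bit masks with bit 31 (x31) playing the role of the first entry of each vector, and uses the hypothetical helper names `shift_matrix` and `apply`.

```python
POLY = 0x04C11DB7      # assumed generator: CRC-32 polynomial, MSB-first form
WIDTH = 32


def shift_matrix(poly: int = POLY, width: int = WIDTH) -> list:
    """Columns of A as bit masks; column j is A applied to state bit j.

    Bit width-1 (x31, the top of the column vector) is the "first" column."""
    cols = [1 << (j + 1) for j in range(width - 1)]   # ordinary shift toward x31
    cols.append(poly)                                  # x31 feeds back via the taps
    return cols


def apply(cols: list, v: int) -> int:
    """Matrix * vector over GF(2): XOR the columns selected by the bits of v."""
    out = 0
    for j, col in enumerate(cols):
        if (v >> j) & 1:
            out ^= col
    return out


A = shift_matrix()
first = 1 << (WIDTH - 1)                  # [1, 0, ..., 0]^T with x31 on top
B = apply(A, first)                       # equation 5

x0, u0 = 0xDEADBEEF, 1
eq4 = apply(A, x0) ^ (B if u0 else 0)                # A*x(0) XOR B*u0
eq6 = apply(A, x0 ^ (u0 << (WIDTH - 1)))             # A*(x(0) XOR [u0, 0, ..., 0]^T)
print(B == POLY, eq4 == eq6)              # True True
```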
Proceeding in a similar manner, the current state vector after the second shift (i.e., k=1) can be expressed as,
$$x(2) = A * x(1) \oplus B * u_1 \tag{7}$$
or,
$$x(2) = A * \left(x(1) \oplus [u_1, 0, 0, \ldots, 0]^T\right) \tag{8}$$
where again, x(2) is the state vector after the second shift and $u_1$ is the second bit of u being introduced to the LFSR. Substituting the expression of equation 6 into equation 8 yields,
$$x(2) = A * \left(A * \left(x(0) \oplus [u_0, 0, 0, \ldots, 0]^T\right)\right) \oplus A * [u_1, 0, 0, \ldots, 0]^T \tag{9}$$
It should be appreciated that $A * [u_1, 0, 0, \ldots, 0]^T$ is merely the first column of A multiplied by $u_1$. The product $A * A$ (i.e., $A^2$) results in a matrix having a second column equal to the first column of A. Applicant has appreciated that the operation $A^2 * [0, u_1, 0, \ldots, 0]^T$ extracts the second column of $A^2$ (multiplied by $u_1$), which is equal to the first column of A. That is,
$$A^2 * [0, u_1, 0, \ldots, 0]^T = A * [u_1, 0, 0, \ldots, 0]^T \tag{10}$$
The above equivalency allows equation 9 to be rewritten as,
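$$x(2) = A * \left(A * \left(x(0) \oplus [u_0, 0, 0, \ldots, 0]^T\right)\right) \oplus A^2 * [0, u_1, 0, \ldots, 0]^T \tag{11}$$
which may be simplified to,
$$x(2) = A^2 * \left(x(0) \oplus [u_0, u_1, 0, \ldots, 0]^T\right) \tag{12}$$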
Taking further powers of A (i.e., $A^3$, $A^4$, $A^5$, etc.) successively shifts the columns of the previous power to the right and generates a new first column. Accordingly, repeating the substitutions shown in equations 8 through 11 provides an expression for an arbitrary advanced state of the LFSR as follows:
$$x(N) = A^N * \left(x(0) \oplus [u_0, u_1, u_2, \ldots, u_{N-1}, 0, \ldots, 0]^T\right) \tag{13}$$
It should be appreciated that the advanced state x(N) is expressed in terms of an initial state x(0), powers of A, and an N-bit chunk of u. For example, the state reached by advancing 32 states from an initial state may be determined as follows:
$$x(32) = A^{32} * \left(x(0) \oplus [u_0, u_1, u_2, \ldots, u_{31}]^T\right) \tag{14}$$
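Equations 10 and 13 can be checked in the same spirit. The Python sketch below again assumes the CRC-32 generator polynomial 0x04C11DB7 on a 32-stage register, builds the columns of A^N by shifting each unit vector N times with zero input, and compares the one-shot result of equation 13 with N bit-serial shifts; all function names are illustrative.

```python
POLY = 0x04C11DB7      # assumed generator: CRC-32 polynomial, MSB-first form
WIDTH = 32
MASK = (1 << WIDTH) - 1


def step(x: int, u: int = 0) -> int:
    """One shift: x(k+1) = A * (x(k) XOR [u, 0, ..., 0]^T), with x31 as the MSB."""
    x ^= u << (WIDTH - 1)
    msb = x >> (WIDTH - 1)
    x = (x << 1) & MASK
    return x ^ POLY if msb else x


def power_columns(n: int) -> list:
    """Columns of A^n: each unit vector e_j pushed through n zero-input shifts."""
    cols = []
    for j in range(WIDTH):
        v = 1 << j
        for _ in range(n):
            v = step(v)
        cols.append(v)
    return cols


def apply(cols: list, v: int) -> int:
    """Matrix * vector over GF(2): XOR the columns selected by the bits of v."""
    out = 0
    for j, col in enumerate(cols):
        if (v >> j) & 1:
            out ^= col
    return out


def advance_eq13(x0: int, chunk: int, n: int) -> int:
    """x(n) = A^n * (x(0) XOR [u0, ..., u_{n-1}, 0, ..., 0]^T); u0 is chunk's top bit."""
    if not 1 <= n <= WIDTH:
        raise ValueError("n must be between 1 and the register width")
    return apply(power_columns(n), x0 ^ (chunk << (WIDTH - n)))


def advance_serial(x0: int, chunk: int, n: int) -> int:
    """The same n states reached one message bit at a time, u0 first."""
    x = x0
    for i in range(n):
        x = step(x, (chunk >> (n - 1 - i)) & 1)
    return x


# Equation 10: the second column of A^2 (bit 30) equals the first column of A (bit 31).
print(power_columns(2)[WIDTH - 2] == power_columns(1)[WIDTH - 1])   # True
print(advance_eq13(0x12345678, 0xCAFEBABE, 32)
      == advance_serial(0x12345678, 0xCAFEBABE, 32))                 # True
```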
In general, an arbitrary advanced state may be determined by,
$$x(N) = A^N * \left(x(0) \oplus u(0)\right) \tag{15}$$
where $A^N$ is a square matrix whose dimension equals the number of LFSR stages (32×32 for LFSR 700), x(0) is the 32-bit state vector, and u(0) holds the next N bits of u (e.g., an N-bit message chunk of a transmitted message), padded with zeros when N is less than the length of the state vector. Applicant has appreciated that $A^N$ may be pre-computed, for example, to form a basis for a look-up table. By partitioning matrix $A^N$ and the corresponding indexes, the ultimate size of the LUT may be reduced. For example, consider the case where N is chosen to be 31, and partition $A^{31}$ column-wise as follows:
$$A^{31} = \left[\, E_1 \;\; E_2 \;\; E_3 \;\; E_4 \,\right] \tag{16}$$
where $E_1$, $E_2$, $E_3$, and $E_4$ are respective portions of $A^{31}$, each being a matrix of size 32×8. From equation 15, let
$$Y = x(0) \oplus u(0) \tag{17}$$
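and partition Y as follows:
$$Y = \left[\, Y_1^T \;\; Y_2^T \;\; Y_3^T \;\; Y_4^T \,\right]^T \tag{18}$$
where $Y_1$, $Y_2$, $Y_3$, and $Y_4$ are the first, second, third and fourth bytes of Y, respectively. Accordingly, the state vector x(31) may be written as,
$$x(31) = \left[\, E_1 \;\; E_2 \;\; E_3 \;\; E_4 \,\right] * \left[\, Y_1^T \;\; Y_2^T \;\; Y_3^T \;\; Y_4^T \,\right]^T \tag{19}$$
which can be expressed as,
$$x(31) = E_1 * Y_1 \oplus E_2 * Y_2 \oplus E_3 * Y_3 \oplus E_4 * Y_4 \tag{20}$$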
In equation 20, each $E_i * Y_i$ is a vector of length 32. Keeping in mind the relative expense of the matrix operation *, computing $E_i * Y_i$ on each iteration of a CRC computation to advance the state may become prohibitive from a computational standpoint. However, $E_i$ may be pre-computed, since it depends only on the configuration of the taps of the LFSR (i.e., $E_i$ depends only on the known generator polynomial). Accordingly, Applicant has appreciated that $E_i * Y_i$ may be computed for all possible values of $Y_i$ to form a look-up table. For example, in the case where x(31) is being determined, each $Y_i$ may be a byte long and therefore can take on 256 possible values (i.e., 0-255). Thus, computing $E_i * Y_i$ (e.g., where i = 1, 2, 3, 4) for all values of $Y_i$ results in an LUT of size 4×256, i.e., 1024 entries. Accordingly, when a particular value of Y is obtained (i.e., by computing $x(0) \oplus u(0)$), it can be used to index the LUT. For example, Y may be partitioned into multiple bytes $Y_i$ and used to address respective locations in the LUT to obtain entries $E_i * Y_i$. The entries obtained from the LUT may then be XOR'ed together (as shown in equation 20) to obtain the desired advanced state (e.g., x(31)).
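The partitioned look-up described above can be sketched in Python as follows, under the same assumptions as before (a 32-stage register with the CRC-32 generator polynomial 0x04C11DB7, MSB first). Each of the four 256-entry portions holds E_i * Y_i for every byte value, computed by applying A^N to the byte placed in its position rather than by storing A^N explicitly; `build_partitioned_lut` and `advance` are hypothetical names.

```python
POLY = 0x04C11DB7      # assumed generator: CRC-32 polynomial, MSB-first form
MASK = 0xFFFFFFFF


def step(x: int) -> int:
    """A * x: one LFSR shift with zero message input."""
    msb = x >> 31
    x = (x << 1) & MASK
    return x ^ POLY if msb else x


def times_a_power(v: int, n: int) -> int:
    """A^n * v, evaluated as n shifts rather than with a stored matrix."""
    for _ in range(n):
        v = step(v)
    return v


def build_partitioned_lut(n: int) -> list:
    """lut[i][b] = E_{i+1} * Y_{i+1} for every byte value b.

    Portion i = 0 is addressed by the most significant byte of Y (Y1),
    which selects the first eight columns of A^n, i.e., E1."""
    return [[times_a_power(b << (24 - 8 * i), n) for b in range(256)]
            for i in range(4)]


def advance(lut: list, x0: int, u: int) -> int:
    """Equation 20: XOR of four entries addressed by the bytes of Y = x(0) XOR u(0)."""
    y = x0 ^ u
    return (lut[0][(y >> 24) & 0xFF] ^ lut[1][(y >> 16) & 0xFF]
            ^ lut[2][(y >> 8) & 0xFF] ^ lut[3][y & 0xFF])


lut = build_partitioned_lut(31)          # N = 31, as in the example above
x0 = 0x12345678
u = 0xCAFEBABE                           # u(0): 31 message bits plus one zero pad bit
print(advance(lut, x0, u) == times_a_power(x0 ^ u, 31))   # True
print(sum(len(part) for part in lut))                     # 1024
```

Because A^N is linear over GF(2), the four partial products can be looked up independently and combined with XOR, which is what keeps the table at 4×256 entries.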
It should be appreciated that the LUT may be viewed as a single LUT or as multiple LUTs, either of which may be addressed in sequence or in parallel. When performed in parallel, the information needed to determine an advanced state may be obtained substantially during a single read operation. The size of an LUT will depend on a chosen N, which may also influence how the LUT and indexes are partitioned. Any size may be chosen for N and any arrangement of partitioning may be used, as the aspects of the invention are not limited in this respect.
A look-up table 855 storing possible advanced states corresponding to a generator polynomial may be pre-computed. For example, a matrix $A^N$ may be computed based on the generator polynomial for any desired value of N, where N generally indicates the number of states advanced on each iteration. However, in some implementations N may not exactly equal the number of states advanced on each iteration.
The matrix $A^N$ may be used in connection with various combinations of initial states to compute advanced states corresponding to each of the initial states. For example, an index may be defined as shown in equation 17. The index Y typically will have a length equal to the larger of the length of the generalized LFSR state vector x and the length of the message chunk u being considered on each iteration, which may be chosen to be the same length. In one embodiment, the value of the index Y is the XOR of the initial state vector and the message chunk, as shown in equation 15. An index Y of length 32, therefore, may take on $2^{32}$ values.
In conventional LUTs, an initial state vector is used to obtain a corresponding advanced state from an LUT. Accordingly, an advanced state for each possible initial state is stored in the LUT. For example, for a 32-bit initial state vector, the LUT may have $2^{32}$ entries to store advanced states for each of the possible initial states that a CRC computation may potentially be in. Applicant has appreciated that the index Y may be partitioned into a number of parts, with each part being considered in a substantially independent manner. For example, a 32-bit index Y may be partitioned into four byte-length parts $Y_1$, $Y_2$, $Y_3$, and $Y_4$. When treated independently, each part may take on $2^8$ different values, for a total of $4 \times 2^8$ (1024) possible values. By likewise partitioning matrix $A^{31}$ into a corresponding number of portions (as shown in equation 16), an LUT of reduced size may be provided. In particular, every possible value of the first part of index Y (i.e., $Y_1$) may be multiplied by the first part of matrix $A^{31}$ (i.e., $E_1$) to form a first portion of LUT 855. Likewise, every possible value of the second part of index Y (i.e., $Y_2$) may be multiplied by the second part of matrix $A^{31}$ (i.e., $E_2$) to form a second portion of LUT 855. This process may be repeated until all corresponding portions of the LUT have been computed.
As discussed above, the index lengths and number of partitions illustrated herein are merely exemplary, and any desired configuration may be used to achieve a desired reduction in LUT size. As shown above, index Y and matrix AN are generalized and can be dimensioned and partitioned in any way, and the aspects of the invention are not limited for use with any particular sizes, partitions and/or configurations. The pre-computed LUT 855 may then be indexed during a subsequent CRC computation, as discussed in further detail below.
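The storage trade-off can be made concrete with a small Python helper that counts entries when an index is split into equal-width parts. The function `partitioned_lut_entries` is hypothetical and assumes each w-bit part addresses its own portion of 2^w entries.

```python
def partitioned_lut_entries(index_bits: int, part_bits: int) -> int:
    """Entries needed when an index of `index_bits` bits is split into equal
    parts of `part_bits` bits, each part addressing its own 2**part_bits entries."""
    if index_bits % part_bits:
        raise ValueError("this sketch assumes equal-width parts")
    return (index_bits // part_bits) * 2 ** part_bits


for part in (32, 16, 8, 4):
    print(part, partitioned_lut_entries(32, part))
# 32 4294967296   (a single, unpartitioned 32-bit index)
# 16 131072
# 8 1024
# 4 128
```

Narrower parts shrink the table further but require more look-ups and XOR operations per iteration, which is one way to read the configuration freedom described above.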
Assume that in method 800, n bits of a transmitted message are to be considered simultaneously (i.e., the CRC computation may be advanced by n states on each iteration). In act 810, a first n bits of the message (i.e., message chunk 805i) and an initial or current state vector 815i associated with the CRC computation are obtained. For example, the current state vector may initially be a zero vector on the first iteration (i.e., on iteration i=0) or may take on some other initial value. It should be appreciated that when a CRC computation is implemented in software, a current state may simply be a number that is updated and maintained throughout the course of the computation. The term “current state” or “current state vector” refers herein to the state of a CRC computation at a given instant. Each current state may function as an initial state from which to compute an advanced state.
In act 820, message chunk 805i and current state vector 815i are employed to compute a plurality of indexes into LUT 855. In one embodiment, the message chunk 805i and current state vector 815i are XOR'ed together to form a concatenated index into look-up table 855 (e.g., forming concatenated index Y as shown in equation 15). The concatenated index may then be partitioned into a plurality of indexes 835 that address respective portions of LUT 855. For example, the concatenated index may include 32 bits, which are separated into four byte-length indexes 835a-835d.
In act 830, the plurality of indexes 835 formed from the concatenated index are used to access LUT 855 to obtain respective entries. For example, indexes 835a-835d may each reference an associated entry in LUT 855. Data at the associated addresses may then be acquired (e.g., entries 845a-845d may be read from the LUT) to obtain information about a corresponding advanced state. Indexes 835 may be logical addresses that map to addressable portions of the LUT, or may correspond to any other type of mapping that allows a value corresponding to the index to be retrieved from the LUT. For example, indexes 835 may undergo one or more operations to transform each index into the actual physical address of the corresponding entry in the LUT.
In act 840, entries 845 obtained from the LUT are employed to compute an advanced state vector advanced from the current state vector by n states, i.e., by a number of states equal to the length of message chunk 805i. For example, the entries 845a-845d acquired from the LUT may be XOR'ed together to form the advanced state vector 815i+1. The current state vector may then be updated to equal the advanced state vector for the subsequent iteration i+1.
Act 810 may then be repeated in a subsequent iteration using the updated current state (i.e., the advanced state computed on the previous iteration), in combination with the next n bits of the message, to compute new indexes into the LUT. This process may be repeated until all bits of the message have been processed, at which point the updated current state vector may represent the remainder of a division operation between the generator polynomial used to form the LUT 855 and the transmitted message. The obtained remainder may then be compared with the transmitted checksum to determine whether the message was corrupted during transmission.
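Putting acts 810 through 840 together, the iterations of method 800 can be sketched in Python as below. This is an illustration under stated assumptions, not the claimed implementation: it assumes n = 32, the CRC-32 generator polynomial 0x04C11DB7 processed MSB first, a zero initial state by default, no final XOR, and a message whose length is a multiple of four bytes. `LUT` stands in for LUT 855, and the function names are hypothetical.

```python
POLY = 0x04C11DB7      # assumed generator: CRC-32 polynomial, MSB-first form
MASK = 0xFFFFFFFF


def step(x: int, u: int = 0) -> int:
    """One LFSR shift with message bit u entering at the MSB (x31)."""
    x ^= u << 31
    msb = x >> 31
    x = (x << 1) & MASK
    return x ^ POLY if msb else x


def times_a32(v: int) -> int:
    """A^32 * v, evaluated as 32 shifts with zero input."""
    for _ in range(32):
        v = step(v)
    return v


# Stand-in for LUT 855: four 256-entry portions, one per byte of the index.
LUT = [[times_a32(b << (24 - 8 * p)) for b in range(256)] for p in range(4)]


def crc_method_800(message: bytes, state: int = 0) -> int:
    """Advance the CRC 32 states per iteration (length must be a multiple of 4)."""
    if len(message) % 4:
        raise ValueError("this sketch only handles whole 32-bit chunks")
    for i in range(0, len(message), 4):
        chunk = int.from_bytes(message[i:i + 4], "big")           # act 810
        y = state ^ chunk                                          # act 820: concatenated index
        indexes = [(y >> s) & 0xFF for s in (24, 16, 8, 0)]        # split into four indexes
        entries = [LUT[p][idx] for p, idx in enumerate(indexes)]   # act 830
        state = entries[0] ^ entries[1] ^ entries[2] ^ entries[3]  # act 840
    return state


def crc_bitwise(message: bytes, state: int = 0) -> int:
    """Bit-serial reference: one shift per message bit, MSB of each byte first."""
    for byte in message:
        for k in range(7, -1, -1):
            state = step(state, (byte >> k) & 1)
    return state


msg = b"12345678"
remainder = crc_method_800(msg)
print(remainder == crc_bitwise(msg))                         # True
print(crc_method_800(msg + remainder.to_bytes(4, "big")))    # 0
```

Under these assumptions (zero initial state, no final XOR), running the same loop over the message followed by its big-endian remainder leaves a zero state, which is one way a receiver may validate a transmission.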
It should be appreciated that the number of bits in the message chunk considered on each iteration may be any number, as the aspects of the invention are not limited for use with any particular choice of message chunk length, or the number of states advanced upon each iteration. In addition, a concatenated index may be of any length and may be partitioned into any number of indexes of any length to obtain any number of entries from the LUT. Similarly, the LUT may include any number of portions addressable by the indexes formed from the concatenated index.
In many processor architectures, it is common for operations to be applied to data having word length boundaries that may depend on the bus and/or register lengths of the processor. For example, bus widths may determine how much data is obtained in a single read operation and/or register lengths may determine how much data is transferred in load and store operations. By designing a CRC computation as described in connection with
In method 800, advanced state computations may be implemented with an XOR operation between a current state and a message chunk, a parallel index into the LUT, and XOR operations between the entries obtained from the LUT on each iteration. While other minor register operations may be required, from a computational standpoint an iteration substantially consists of the above operations, providing a computationally efficient CRC. In addition, the CRC computation may be advanced by n states without requiring an LUT having $2^n$ entries. For example, in the CRC computation illustrated in
The above-described embodiments of the present invention can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. It should be appreciated that any component or collection of components that perform the functions described above can be generically considered as one or more controllers that control the above-discussed functions. The one or more controllers can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware (e.g., one or more processors) that is programmed using microcode or software to perform the functions recited above.
It should be appreciated that the various methods outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or conventional programming or scripting tools, and also may be compiled as executable machine language code.
In this respect, it should be appreciated that one embodiment of the invention is directed to a computer readable medium (or multiple computer readable media) (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, etc.) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the invention discussed above. The computer readable medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present invention as discussed above.
It should be understood that the term “program” is used herein in a generic sense to refer to any type of computer code or set of instructions that can be employed to program a computer or other processor to implement various aspects of the present invention as discussed above. Additionally, it should be appreciated that according to one aspect of this embodiment, one or more computer programs that when executed perform methods of the present invention need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present invention.
Various aspects of the present invention may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing and is therefore not limited in its application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways.
Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.