The proposed solution deals primarily with the processing of data packets in Ethernet-based computer networks; however, it is general enough to be applicable to a wide range of other data transfer mechanisms that use some kind of CRC value to ensure data integrity (e.g. high-bandwidth memory technology). During transfer over a medium, data are susceptible to random bit errors or burst errors, which usually must be detected before further processing. Damaged data should then be discarded, as their original meaning (semantics) may have been significantly altered by the introduced errors. The solution therefore belongs to the field of data transfers, telecommunication technology, and services.
CRC—cyclic redundancy check
FPGA—field-programmable gate array
To ensure the integrity of variable-length data packets during transfer over a medium, a CRC value is computed and appended to each packet before its transmission. After the packet has been transferred and received by the other communicating side, a new CRC value is computed from the received data and compared with the value appended to the packet by the sender. Equality of both CRC values signifies that no error was detected during transmission. If, on the other hand, the CRC values differ, the data have been altered in transit and the received message is invalid. An independent CRC value must be computed for each transferred packet (transaction), based only on the data it contains.
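As a minimal illustration of the sender and receiver roles, the following sketch uses Python's standard `zlib.crc32`, which implements the same CRC-32 polynomial as the Ethernet frame check sequence. The helper names `append_crc` and `check_crc`, and the choice to append the value as four little-endian bytes, are assumptions made only for this example.

```python
import zlib


def append_crc(payload: bytes) -> bytes:
    """Sender (TX) side: compute CRC-32 over the payload and append it.
    Appending as 4 little-endian bytes is an assumption for this sketch."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def check_crc(frame: bytes) -> bool:
    """Receiver (RX) side: recompute CRC-32 over the received data and
    compare it with the value appended by the sender."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "little")
    return zlib.crc32(payload) == received


frame = append_crc(b"example packet")
print(check_crc(frame))      # True
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit of the data
print(check_crc(corrupted))  # False: CRC-32 detects every single-bit error
```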
Current solutions are able to perform basic CRC computations with relatively high theoretical throughput. However, their main shortcoming is the lack of support for computing, in parallel, the CRC values of multiple individual packets transferred simultaneously (i.e. sharing a single data bus word). This considerably limits the throughput these solutions can actually achieve, especially when very short packets are processed. The negative impact of this shortcoming grows as data buses keep getting wider to meet rising throughput requirements. Insufficient throughput of CRC computation over data packets can therefore significantly limit the overall transfer speed of the whole communication.
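To make the effect of short packets concrete, the following arithmetic sketch assumes a hypothetical design that finalizes at most one CRC per clock cycle, so that each packet occupies whole bus words on its own. The bus widths, the 200 MHz clock and the packet lengths are illustrative values, not parameters of the presented solution, and framing overheads are ignored.

```python
def raw_bus_bps(bus_bytes: int, clock_hz: int) -> int:
    """Raw data bus throughput in bits per second."""
    return bus_bytes * 8 * clock_hz


def one_packet_per_word_bps(bus_bytes: int, clock_hz: int, packet_bytes: int) -> int:
    """Effective throughput if each packet occupies whole bus words on its
    own (hypothetical design with at most one CRC finalized per word)."""
    words = -(-packet_bytes // bus_bytes)  # ceiling division
    return packet_bytes * 8 * clock_hz // words


clock = 200_000_000  # hypothetical 200 MHz clock
print(raw_bus_bps(64, clock))                   # 102400000000 (102.4 Gb/s)
print(one_packet_per_word_bps(64, clock, 65))   # 52000000000 (52 Gb/s)
print(raw_bus_bps(256, clock))                  # 409600000000 (409.6 Gb/s)
print(one_packet_per_word_bps(256, clock, 64))  # 102400000000 (102.4 Gb/s)
```

Under these assumptions, 64-byte packets on a 256-byte bus use only a quarter of the raw bus throughput, which illustrates why the loss becomes worse as buses get wider.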
The throughput disadvantages mentioned above are eliminated by the Architecture for High-speed Computation of Error-detecting CRC Codes of Data Packets Transferred via Directly Connected Bus according to the presented solution. Its principle is that the data bus word is divided among multiple (a total of N) individual submodules that compute CRC values of the transferred packets. The number of these submodules is given by the data bus width, or more specifically by the maximal possible number of packets that can finish in a single data word on this bus. Every submodule is capable of computing a CRC value based on its assigned part of the data word and on the intermediate CRC values computed by the preceding submodules. The internal architecture of each submodule enables correct CRC computation for every valid situation that can occur in the processed part of the data word. When a packet starts in the part, the data preceding the packet start are masked at the data input of the submodule. If the same packet also ends in the same part, a multiplexer forwards the masked data input to a dedicated CRC end-handling logic, and the resulting CRC value is provided at the output. If, on the other hand, a packet continuing from previous parts ends in this part, the unaltered input data are used together with the intermediate CRC value from the preceding submodules, and the finalized CRC value is provided at the output. If a starting packet does not end in the same part, the masked input data are used to compute an intermediate CRC value, which is passed on to the subsequent submodules. Finally, if the processed part contains neither a packet start nor a packet end, the unaltered input data are used to compute a base CRC value, which is then accumulated with the intermediate value from the preceding submodule, and the resulting intermediate CRC value is again passed on to the next submodules.
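The required number of submodules N can be estimated with simple arithmetic. The sketch below assumes that packets may end at any byte position of the word and are at least `min_packet_bytes` long, so consecutive packet ends are at least that many bytes apart while all ends within one word lie within a span of `word_bytes - 1` bytes. The function name and the chosen bus widths are hypothetical.

```python
def max_packet_ends(word_bytes: int, min_packet_bytes: int) -> int:
    """Maximal number of packets that can finish within one bus word,
    assuming packets may end at any byte position and are at least
    `min_packet_bytes` long (gaps between packets only lower the count)."""
    return (word_bytes - 1) // min_packet_bytes + 1


# Hypothetical bus widths with a 64-byte minimum packet length
# (64 bytes is the minimum Ethernet frame size, including the FCS).
for bits in (512, 1024, 2048):
    print(bits, max_packet_ends(bits // 8, 64))
# 512 1
# 1024 2
# 2048 4
```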
The behaviour of each submodule is controlled solely by the packet position signals that form part of the connected input data bus.
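The case analysis of a single submodule can be captured in a small behavioural model. The sketch below is a hypothetical software model, not the circuit itself. It assumes the standard CRC-32 (the variant computed by `zlib.crc32`), models input masking by slicing, and assumes every packet is at least as long as one word part, so that a part holds at most one packet start and one packet end. The names `crc_raw`, `accumulate`, `submodule`, `process_word` and `crc_stream` are invented for illustration. Each modelled submodule is driven only by the start/end offsets for its part, and the `carry` variable stands in for the register that holds the intermediate CRC between clock cycles in a serial chain.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial (IEEE 802.3, as used by zlib)
ONES = 0xFFFFFFFF  # CRC-32 initial value and final XOR value


def crc_raw(state, data):
    """Advance the raw CRC-32 register over `data` (no init / final XOR)."""
    for byte in data:
        state ^= byte
        for _ in range(8):
            state = (state >> 1) ^ POLY if state & 1 else state >> 1
    return state


def accumulate(prev, data):
    """Combine an intermediate CRC with the base CRC of `data`.
    The raw register is linear over GF(2): continuing `prev` over `data`
    equals `prev` advanced over len(data) zero bytes XOR the base CRC of
    `data` computed from a zero state, regardless of packet boundaries."""
    return crc_raw(prev, bytes(len(data))) ^ crc_raw(0, data)


def submodule(part, prev, start, end):
    """Hypothetical model of one submodule. `start` / `end` are the packet
    start offset and exclusive packet end offset within `part`, or None.
    Returns (finished CRC or None, intermediate CRC for the next one)."""
    if start is not None and end is not None and start < end:
        # Packet starts and ends here: masked data go to end handling.
        return crc_raw(ONES, part[start:end]) ^ ONES, None
    finished = nxt = None
    if end is not None:
        # Packet continuing from previous parts ends here.
        finished = accumulate(prev, part[:end]) ^ ONES
    if start is not None:
        # Packet starts here and continues: masked data, fresh CRC state.
        nxt = crc_raw(ONES, part[start:])
    elif end is None and prev is not None:
        # No boundary in this part: accumulate and pass the value on.
        nxt = accumulate(prev, part)
    return finished, nxt


def process_word(word, n, starts, ends, carry):
    """Serial chain of n submodules over one bus word."""
    size = len(word) // n
    finished = []
    for i in range(n):
        crc, carry = submodule(word[i * size:(i + 1) * size], carry,
                               starts[i], ends[i])
        if crc is not None:
            finished.append(crc)
    return finished, carry


def crc_stream(packets, word_bytes=16, n=4):
    """Pack `packets` back to back into words (each packet assumed to be at
    least word_bytes // n long), derive per-part start/end signals, and
    run the serial chain word by word."""
    stream = b"".join(packets)
    stream += bytes(-len(stream) % word_bytes)  # idle padding in last word
    size = word_bytes // n
    starts = [None] * (len(stream) // size)
    ends = [None] * (len(stream) // size)
    pos = 0
    for p in packets:
        starts[pos // size] = pos % size
        pos += len(p)
        ends[(pos - 1) // size] = pos - (pos - 1) // size * size
    results, carry = [], None
    for w in range(len(stream) // word_bytes):
        word = stream[w * word_bytes:(w + 1) * word_bytes]
        done, carry = process_word(word, n, starts[w * n:(w + 1) * n],
                                   ends[w * n:(w + 1) * n], carry)
        results.extend(done)
    return results


packets = [b"abcd", b"hello world", b"wxyz", b"0123456789abcdefXYZ", b"vwxyz"]
assert crc_stream(packets) == [zlib.crc32(p) for p in packets]
```

In this model, one word can finish several packets (for example, the first word completes both `b"abcd"` and `b"hello world"`), while a long packet is carried across parts and words through the intermediate value.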
In a preferred embodiment, the described architecture is implemented within an FPGA chip that receives, processes and sends data packets in Ethernet-based computer networks or in high-bandwidth memory (HBM) systems. The architecture is usually placed on the chip in two identical and independent instances for each communication port: one instance for the transmitting (TX) side (appending the CRC value to the packet) and the other for the receiving (RX) side (comparing the CRC values).
The advantage of the proposed solution is that it maintains a very high CRC computation throughput when processing packets of arbitrary valid lengths, including the shortest possible ones. Multiple independent CRC values can be computed in every FPGA clock cycle, because the processing of the data bus word is divided among multiple submodules, which can either cooperate on a long packet or independently handle multiple short ones. Another advantage of the solution is the ability to fine-tune the architecture to the specific parameters of a particular data bus and of the packets transferred over it. The submodules for CRC computation are connected in a homogeneous manner and share a unified interface; therefore, the top-level circuit structure can be altered without difficulty.
The principle of the proposed solution is further explained and described using the attached drawings. The architecture of the solution has two implementation variants: serial and parallel.
In general, the subject of the new solution is a circuit architecture, in two versions, for the high-speed computation of cyclic redundancy check (CRC) codes that can handle multiple (up to N) data packets in every single clock cycle when processing a wide, directly connected data transfer bus. The whole functionality of the circuit is divided into N submodules, where the attached
The diagram presented in the
Circuit connection at
Circuit connection in
Circuit connection at
The Architecture for High-speed Computation of Error-detecting CRC Codes of Data Packets Transferred via Directly Connected Bus according to the presented solution is industrially applicable in circuits for stream or batch processing of data divided into smaller independent pieces called packets or transactions. Compared with commonly applied solutions, it allows multiple such data packets to be processed in parallel within a single clock cycle (a single data bus word), thus considerably increasing the effective throughput of data integrity checking even for very wide data buses.
The solution disclosed above addresses the problem of high-speed computation of error-detecting CRC codes of data packets by means of an architecture connected directly to the data bus. Firstly, the data outputs of the data bus are interconnected with N parallel submodules (9 or 19) specialized in computing CRC values from given parts of the data bus word (9.1 or 19.1), where N is given by the maximal number of data packets that can finish in a single data bus word. Secondly, a unique form of distribution of intermediate CRC values is realized between the submodules (9) through signals (9.2, 9.4) and register (10) in the serial version of the top-level architecture, or between the submodules (19) through signals (19.2, 19.4, 19.5, 19.6) and register (20) in the parallel version, where the internal structure of the individual submodules (9 or 19) is specifically tailored to such an arrangement. Finally, the structure of each submodule (9 or 19), which processes one part of the data bus word, separates the main CRC value computation (4), performed without any regard to packet boundaries, from the specific alterations of this process required to correctly handle continuing, starting or ending data packets. These alterations are realized independently, mainly by component (1), connected to the data and control signals of the input data bus (1.1, 1.2, 1.3), for handling packet starts; by component (8), connected to the masked data signal (3.1) and the intermediate CRC values (7.2) through multiplexers (3, 7) controlled by output (2.2) of component (2), for handling packet ends; and by component (6) together with multiplexers (5), which handle the correct aggregation and distribution of the intermediate CRC values (4.1, 5.1, 5.4, 6.1) for each submodule (9 or 19). Altogether, such a parallel arrangement of submodules enables the finalization of independent CRC values (9.3 or 19.3) for multiple (up to N) data packets that end simultaneously in the same word of the connected data bus.
Number | Date | Country | Kind |
---|---|---|---|
PV 2018-270 | Jun 2018 | CZ | national |