This invention relates to a protocol for communication between devices and, more particularly, to the processing of transaction layer packets between a requesting device and a receiving device.
Communication protocols, of which there are many, enable different types of connected devices to converse. PCI Express, for example, is a serial input/output (I/O) protocol in which devices, such as chips or adapter cards, communicate with one another using packets.
PCI Express employs a scalable serial interface. Two low-voltage, differentially driven signal pairs, one for transmit, one for receive, constitute a PCI Express link between two devices. (The PCI Express™ Base Specification, Revision 1.0a, was published by the PCI Special Interest Group, www.pcisig.com, on Apr. 15, 2003.)
The PCI Express protocol defines a transaction layer, a link layer, and a physical layer, present in both a transmit device and a receive device, the devices being connected by a PCI Express link. At the transmit device, the transaction layer assembles packets of transaction requests, such as reads and writes, from the device core. Header information is added to the transaction request, to produce transaction layer packets (TLPs). The link layer of the transmitting device applies a data protection code, such as a cyclic redundancy check (CRC), and assigns a sequence number to each TLP. At the physical layer, the TLP is framed and converted to a serialized format, then is transmitted across the link at a frequency and width compatible with the receiving device.
At the receiving device, the process is reversed. The physical layer converts the serialized data back into packet form, and stores the extracted TLP in memory at the link layer. The link layer verifies the integrity of the received TLP, such as by performing a CRC check of the packet, and also confirms the sequence number of the packet. Once both checks are performed, the TLP, excluding the sequence number and the link layer CRC, is forwarded to the transaction layer. The transaction layer disassembles the packet into information (e.g., read or write requests) that is deliverable to the device core. The transaction layer also detects unsupported TLPs and may perform its own data integrity check. If the packet transmission fails, the link layer requests retransmission of the TLP, known as a link layer retry (LLR).
While effective, the division of labor between the various layers in the communication link may produce undesirable latency in processing the transaction. The latency on a link depends on many factors, including pipeline delays, width and operational frequency of the link, and electrical transmission delays. The communications protocol itself may also produce an undesirable latency.
For example, link layer processing is completed in its entirety before a packet is transferred to the transaction layer. Put another way, the transaction layer is unable to begin processing the packet until the link layer is done processing the packet. This method ensures that transactions are not forwarded to the core unless validated by the link layer. However, the scheme also causes some latency in the processing of the packet.
As another example, at the receiving device, the TLP is stored at the link layer and again stored at the transaction layer. Link layer processing of the TLP occurs in link layer memory before the TLP is sent to the transaction layer. Likewise, transaction layer processing of the TLP occurs in transaction layer memory before the TLP is sent to the device core. Because processing of the TLP is completed separately in each layer, both the link layer and the transaction layer must provide memory space for the transaction.
Thus, there is a continuing need for a communications protocol that overcomes the shortcomings of the prior art.
In accordance with the embodiments described herein, a receiving device including a physical layer, a link layer, a transaction layer, and a core, is disclosed in which transaction layer packets are speculatively forwarded from the link layer to the transaction layer before processing at the link layer is completed, and without the use of memory storage at the link layer. A link layer engine minimally processes the data link layer packet by checking the sequence number only and not the CRC before forwarding the packet to the transaction layer. This allows the transaction layer to pre-process the packet, such as verifying header information. However, the transaction layer is unable to make the transaction globally available until the link layer has verified the CRC of the packet. The simultaneous processing of the packet by both the link layer and the transaction layer reduces latency, in some embodiments, and lessens the amount of memory needed for processing.
In the following detailed description, reference is made to the accompanying drawings, which show by way of illustration specific embodiments. Other embodiments will become apparent to those of ordinary skill in the art upon reading this disclosure. The following detailed description is, therefore, not to be construed in a limiting sense, as the scope of the present invention is defined by the claims.
In
Two low-voltage, differentially driven signal pairs, or links 50A and 50B (collectively, links 50), establish a conduit between the devices 10, through which the devices may communicate. The link 50A carries transactions that are sent from the device 10A (as transmitter) to the device 10B (as receiver). Likewise, the link 50B carries transactions that are sent from the device 10B (as transmitter) to the device 10A (as receiver).
Each device consists of distinct functional layers for processing transactions. The device 10A includes a core 12A, a transaction layer 20A, a link layer 30A, and a physical layer 40A. The device 10B includes a core 12B, a transaction layer 20B, a link layer 30B, and a physical layer 40B. Transaction request 14A originates from the core 12A of the device 10A while transaction request 14B originates from the core 12B of the device 10B (collectively, transaction requests 14). Either device may be a transmitter or a receiver, depending on the direction of communication. Further, both devices 10A and 10B are involved in the processing of either the transaction request 14A or the transaction request 14B.
Arrows in
The header 52, which appears at the beginning of the TLP 22, is a set of fields that includes information about the transaction request 14, such as the purpose of the transaction and other characteristics. In some embodiments, the header 52 is twelve to sixteen bytes in length, and includes such information as the transaction type, the transaction length, and the identification (ID) of the requesting device. The data field 54 includes any data involved in the transaction. (For a write transaction, the data field 54 includes the data to be written, as one example.) For transactions that involve no data, the data field is of length zero. Once the TLP 22 is assembled at the transaction layer 20A, the TLP 22 is passed to the link layer 30A within the device 10A.
At the link layer 30A, a new transaction layer packet (TLP) 32 is constructed by adding fields to the TLP 22. The link layer 30A is an intermediate stage between the transaction layer 20A and the physical layer 40A. To ensure that the packets are reliably transmitted to the receiving device 10B, the link layer 30A assigns a sequence number 56 to each TLP. In
The physical layer 40A takes the TLP 32 and prepares it for serial transmission over the link 50A. A frame 62 is added to the beginning of the TLP and a second frame 64 is added to the end of the TLP, resulting in packet 42. The packet 42 is then transmitted, one bit 44 at a time, over the link 50A, to be received by the device 10B (i.e., the receiving device).
At the receiving device 10B, a reverse process transforms the packet back into a form that can be processed by the core 12B. The serialized stream of bits 44 received by the device 10B is assembled into a packet 42 in the physical layer 40B, where it is stripped of the frames 62 and 64 and sent to the link layer 30B as TLP 32 (which includes the TLP 22). The link layer 30B confirms the sequence number 56 and verifies the CRC 58. If one or both indicators are erroneous, the link layer 30B requests retransmission of the transaction request 14, by sending a link layer retry (LLR) signal to the transmitting device 10A (going through the link 50B). If the sequence number 56 and CRC 58 are correct, the link layer sends the TLP 22 (minus the sequence number and CRC) to the transaction layer 20B.
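The layering described above, in which the TLP 22 is wrapped into the TLP 32 and then into the framed packet 42, and is unwrapped in reverse at the receiving device, can be sketched as follows. This is a minimal illustration and not the PCI Express wire format: the one-byte frame markers, the two-byte field holding a 12-bit sequence number, the field byte order, and the use of `zlib.crc32` as the data protection code are all assumptions made for the example, and every function name is hypothetical.

```python
import struct
import zlib

# Hypothetical one-byte markers standing in for the frames 62 and 64.
FRAME_START = b"\xfb"
FRAME_END = b"\xfd"


def transaction_layer_build(header: bytes, data: bytes = b"") -> bytes:
    """TLP 22: the header 52 followed by an optional data field 54."""
    return header + data


def link_layer_wrap(seq: int, tlp22: bytes) -> bytes:
    """TLP 32: sequence number 56, then the TLP 22, then the CRC 58."""
    body = struct.pack(">H", seq & 0x0FFF) + tlp22  # assumed 12-bit sequence number
    return body + struct.pack(">I", zlib.crc32(body))


def physical_layer_frame(tlp32: bytes) -> bytes:
    """Packet 42: the TLP 32 placed between the frames 62 and 64."""
    return FRAME_START + tlp32 + FRAME_END


def physical_layer_unframe(packet42: bytes) -> bytes:
    """Strip the frames 62 and 64, returning the TLP 32."""
    if packet42[:1] != FRAME_START or packet42[-1:] != FRAME_END:
        raise ValueError("bad framing")
    return packet42[1:-1]


def link_layer_unwrap(tlp32: bytes, expected_seq: int) -> bytes:
    """Check the sequence number 56 and the CRC 58, then return the TLP 22."""
    body = tlp32[:-4]
    (crc,) = struct.unpack(">I", tlp32[-4:])
    (seq,) = struct.unpack(">H", body[:2])
    if seq != expected_seq or zlib.crc32(body) != crc:
        raise ValueError("link layer retry needed")
    return body[2:]
```

A round trip through the three wrapping steps and the two unwrapping steps returns the original header and data, while a packet corrupted in transit is rejected at the link layer step.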
Once the TLP 22 has reached the transaction layer 20B, the packet has already passed data integrity checks at the link layer. However, the transaction layer 20B checks several fields of the header 52 to ensure proper processing of the TLP 22, before sending it on to the core 12B. Finally, the transaction layer 20B submits the transaction request 14 to the core 12B. Thus, the transaction request 14 that started at the core 12A of the device 10A is successfully received by the core 12B of the device 10B.
Transaction requests 14 submitted by the device 10B are similarly processed. If, for example, the transaction request 14 from the device 10A is one in which a response is expected, the core 12B of the device 10B will issue a transaction request in the other direction, back to the device 10A. In any event, a transaction request 14 initiated by the core 12B becomes a TLP 22 at the transaction layer 20B, a TLP 32 at the link layer 30B, and a serially transmitted packet 42 at the physical layer 40B. Serialized bits 44 traverse the link 50B, to be received by the device 10A, and assembled into packet 42 in the physical layer 40A. There, the frames 62 and 64 are stripped off, the TLP 32 is sent to the link layer 30A, where the sequence number 56 and CRC 58 are verified, then the header 52 and data 54 portions (i.e., the TLP 22) are sent to the transaction layer 20A. The transaction layer 20A processes the header (and transaction layer CRC, if present), and submits the transaction request 14 to the core 12A of the receiving device 10A.
In
The transaction request 14 is processed as a sequence of distinct operations, as described above. In
CRC is used to detect transmission errors and loss of packets. CRC processing typically involves polynomial or modulo-based mathematics being performed on some portion of the packet or on the entire packet. The CRC verification may start with the sequence number 56, and include the header 52, the data 54, and the CRC 58. The result produced is compared with an expected result, such as zero. As another possibility, the CRC verification may include the sequence number 56, the header 52, and the data 54, such that the result produced is compared with the CRC 58. In some embodiments, a 32-bit polynomial CRC is calculated over the sequence number 56, the header 52, and the data 54 of the TLP. A myriad of other possibilities for data integrity verification are known. CRC verification can be performed automatically on a serial bitstream as it is being transmitted from one location to another.
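The two verification styles just described, and the ability to compute a CRC on a stream without buffering the packet, can be illustrated with a short sketch. It uses Python's `zlib.crc32` purely as a stand-in 32-bit CRC; the actual polynomial, bit ordering, and expected constant used on a link are not specified here, and the function names are hypothetical. With this particular CRC the expected constant is a fixed nonzero value rather than zero, so the sketch derives it instead of assuming it.

```python
import struct
import zlib


def crc_streaming(chunks) -> int:
    """Fold successive chunks into a running CRC without storing the packet."""
    running = 0
    for chunk in chunks:
        running = zlib.crc32(chunk, running)
    return running


def verify_by_compare(body: bytes, received_crc: int) -> bool:
    """Compute over sequence number, header, and data; compare with the CRC field."""
    return zlib.crc32(body) == received_crc


# Constant produced when a correct CRC (appended little-endian) is run through
# the same calculation. Derived from the empty message rather than hard-coded.
RESIDUE = zlib.crc32(struct.pack("<I", zlib.crc32(b"")))


def verify_by_residue(body_and_crc: bytes) -> bool:
    """Compute over the whole packet, CRC included; compare with a constant."""
    return zlib.crc32(body_and_crc) == RESIDUE
```

Both checks accept the same packets; the residue form is convenient for a serial stream because the receiver does not need to know where the CRC field begins before the last bit arrives.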
Once both the sequence number and the CRC are verified, the link layer engine 34 sends the header and data of the TLP 32 (i.e., the TLP 22) to the memory 26 of the transaction layer 20 (block 190).
Once the TLP 22 is in the memory 26, the transaction layer engine 24 can begin processing the TLP. The transaction layer engine 24 checks the header 52 for pertinent information about the transaction request (block 192). If information in the header is erroneous, the transaction layer drops the transaction and either reports the associated error to the sending device or denotes the error in a transaction log (block 194). Once the header (and CRC) are verified, the engine 24 sends the transaction request (and data 54, if present) to the core 12 of the device 10 (block 196). Thus, the processing of a transaction request within the prior art receiving device of
An alternative protocol is illustrated in
The link layer 130 includes a link layer engine 134 for processing a TLP 132 received from the physical layer 140. The TLP 132 includes a sequence number 156, a header 152, data 154, and a CRC 158. As in the prior art, the link layer engine 134 processes both the sequence number 156 and the CRC 158. However, after processing the sequence number, but before processing the CRC, the link layer engine 134 sends the header 152 and the data 154 portions of the TLP 132 to the transaction layer 120.
The link layer 130 of the receiving device 100 has no memory, as was found in the prior art receiving device (see
TLPs 132 that are received with a sequence number 156 that does not match the expected sequence number are of no interest to the transaction layer 120. In
For a given TLP, where the sequence number 156 is greater than expected and the CRC status is good (first table entry), the link layer engine 134 logs an error, to indicate that a sequence number synchronization error may have occurred. A link layer retry is issued by the link layer engine 134, if not already in progress. Thus, the current TLP is ignored by the link layer engine 134 and is not forwarded to the transaction layer. Where the sequence number 156 is greater than expected, but the CRC status is bad (second table entry), a link layer retry is issued by the link layer engine 134 (in response to the bad CRC), if not already in progress, and the current TLP is ignored.
Where the sequence number 156 is less than the expected sequence number, the TLP is also ignored. When the CRC is good (third table entry), the current TLP is a retransmitted packet that was already serviced by the transaction layer. Thus, the current TLP may be ignored. When the CRC is bad (fourth table entry), it cannot be determined which field of the packet is in error (since both the sequence number and the CRC are bad). The link layer engine 134 issues a link layer retry, if not already in progress. Again, the current TLP is ignored.
Thus, the packets that are of interest to the transaction layer 120 are the ones for which the sequence number 156 matches the expected sequence number. This allows the link layer engine 134 to process the sequence number alone and send the header 152 and the data 154 of the TLP 132 to the transaction layer 120, once the sequence number is confirmed as correct.
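The four table entries, together with the matching case, reduce to a small decision function. The sketch below is illustrative only: the names `Action` and `link_layer_action` are hypothetical, sequence number wrap-around is ignored, and the choice to issue a retry when a matching sequence number is followed by a bad CRC is an assumption consistent with the retry behavior described for the other bad-CRC cases.

```python
from typing import NamedTuple


class Action(NamedTuple):
    forward: bool    # send the header 152 and data 154 to the transaction layer
    retry: bool      # issue a link layer retry, if one is not already in progress
    log_error: bool  # record a possible sequence number synchronization error


def link_layer_action(seq: int, expected: int, crc_ok: bool) -> Action:
    """Map a received sequence number and CRC status to the link layer response."""
    if seq > expected:
        # First and second table entries: ignore the TLP and retry;
        # log an error only when the CRC is good.
        return Action(forward=False, retry=True, log_error=crc_ok)
    if seq < expected:
        # Third entry: an already-serviced retransmission, silently ignored.
        # Fourth entry: bad CRC, so a retry is issued.
        return Action(forward=False, retry=not crc_ok, log_error=False)
    # Matching sequence number: forwarded speculatively before the CRC result
    # is known (assumed: a bad CRC then triggers a retry).
    return Action(forward=True, retry=not crc_ok, log_error=False)
```

Only the last branch forwards anything, which is why the sequence number check alone is enough to decide whether speculative forwarding should begin.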
Since the TLP 132 is transmitted serially to the link layer 130 from the physical layer 140, the link layer engine 134 receives the sequence number 156 as the first field of the packet. While the sequence number 156 is being confirmed, the link layer engine 134 is already beginning to process the CRC 158.
CRC protection typically adds latency because the packet is not considered useful downstream until the CRC is validated. Whatever the validation method, CRC verification may be performed on the incoming serial bitstream without storing the packet contents in memory. Upon receiving the first bits of the packet, the link layer engine 134 verifies the sequence number 156 and then routes the subsequent bits (i.e., the header and data fields) to storage 126 in the transaction layer 120, performing the CRC verification on the bits of the packet 132 as they pass from the physical layer, through the link layer (without being stored), to the transaction layer.
At the transaction layer, a transaction layer engine 124 performs pre-processing of the TLP 122, which includes the header 152 and the data 154 that were speculatively transmitted by the link layer engine 134. The transaction layer engine 124 ensures that the transaction request 114 is not globally visible (i.e., available to the core) until validated by the link layer engine 134. The memory 126 within the transaction layer 120, however, stores both speculatively transmitted packets and verified packets simultaneously. Thus, pointers are used to distinguish between the packets having different status, which are stored in the same memory.
For illustration, the memory 126 of
The transaction layer engine 124 uses a load pointer 28A, a speculative pointer 28B, and an unload pointer 28C (collectively, pointers 28) to keep track of the status of the TLPs 122 within the memory 126. The load pointer 28A points to the address where the current TLP 122A is speculatively stored. Any new packets sent by the link layer engine are stored at the address pointed to by the load pointer. The unload pointer 28C points to the address where TLPs which are ready for transmission to the core 112 are stored. The TLP 122C has been “released” both by the link layer engine 134, having passed CRC verification, and by the transaction layer engine 124, having been processed there as well.
Between the load pointer 28A and the unload pointer 28C, the speculative pointer 28B essentially floats, pointing to intermediate address locations of the memory 126. The position of the speculative pointer 28B is governed by whether or not the link layer engine 134 has confirmed, to the transaction layer engine 124, the validity of the speculatively forwarded TLP.
Take the TLP 122B, for example. In
If, instead, the CRC of the TLP 122B is determined to be bad by the link layer engine 134, the transaction layer engine 124 is notified and the load pointer 28A is moved “down” one address location, in a direction towards the speculative pointer 28B. The effect of this downward movement of the load pointer 28A is to cause a subsequently loaded TLP to be written over the TLP 122B. This is an appropriate result, since the TLP 122B failed the CRC validation.
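The three-pointer bookkeeping can be sketched as a small ring of slots. This is a simplified model under stated assumptions, not the disclosed memory organization: each TLP occupies exactly one slot, the load pointer is modeled as the next slot to be written, at most the most recently stored TLP is rejected at a time, and the class and method names are hypothetical.

```python
class SpeculativeBuffer:
    """Ring of TLP slots tracked by load, speculative, and unload pointers."""

    def __init__(self, slots: int = 8):
        self.mem = [None] * slots
        self.load = 0    # pointer 28A: next slot to be written
        self.spec = 0    # pointer 28B: first slot not yet CRC-verified
        self.unload = 0  # pointer 28C: next verified slot to hand to the core

    def store(self, tlp) -> None:
        """Speculatively store a forwarded TLP at the load pointer."""
        if self.load - self.unload >= len(self.mem):
            raise BufferError("memory full")
        self.mem[self.load % len(self.mem)] = tlp
        self.load += 1

    def crc_good(self) -> None:
        """Link layer reports a good CRC: speculative pointer moves toward load."""
        if self.spec == self.load:
            raise RuntimeError("no speculative TLP outstanding")
        self.spec += 1

    def crc_bad(self) -> None:
        """Link layer reports a bad CRC: load pointer moves back toward
        the speculative pointer, so the next store overwrites the bad TLP."""
        if self.spec == self.load:
            raise RuntimeError("no speculative TLP outstanding")
        self.load -= 1

    def unload_to_core(self):
        """Return the next verified TLP, or None if none has been released."""
        if self.unload == self.spec:
            return None
        tlp = self.mem[self.unload % len(self.mem)]
        self.unload += 1
        return tlp
```

Slots between the unload and speculative pointers hold verified TLPs, and slots between the speculative and load pointers hold TLPs still awaiting a CRC result, so a single memory serves both populations without copying.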
A flow diagram in
If, however, the sequence number matches the expected sequence number, the link layer engine 134 speculatively forwards the header 152 and the data 154 of the TLP 132 to the transaction layer (block 176). The forwarded TLP 122 is stored in the memory 126 of the transaction layer 120 (block 180). At this point, both the link layer and the transaction layer may simultaneously process part of the transaction request 114. At the transaction layer 120, the transaction layer engine 124 checks the header of the TLP for information about the transaction (block 182). If the header is incorrect, such as when the header information is inconsistent with the type of transaction being sent, the transaction layer engine 124 drops the transaction and either reports the associated error or records the error in a transaction log (block 184). Otherwise, the header is considered correct. Even after the header (and the transaction layer CRC, if present) is verified, the transaction layer engine 124 is unable to forward the transaction request to the core 112 until the request is “released” by the link layer engine 134.
Meanwhile, the link layer engine 134 is processing the CRC of the TLP 132, after having forwarded part of the TLP to the transaction layer (block 186). If the CRC is not correct, the link layer engine 134 will notify the transaction layer engine 124 that the TLP is bad (block 194). The transaction layer engine will change the location of the load pointer 28A, moving it toward the speculative pointer 28B (block 190). This has the effect of causing subsequent packets to overwrite the current TLP. If the CRC is correct, the link layer engine will so notify the transaction layer engine (block 192). In response, the transaction layer engine 124 changes the location of the speculative pointer 28B, moving it toward the load pointer 28A (block 188). This ensures that subsequent packets will not be written over the current packet. TLPs that complete verification are sent to the core 112.
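The overall receive flow can be walked through in a few lines. The sketch below is a sequential simulation of what the disclosure describes as simultaneous processing, and it makes several assumptions for illustration: a two-byte sequence number field, a trailing four-byte CRC computed with `zlib.crc32`, four-byte transfer chunks, and a caller-supplied `header_ok` predicate standing in for the transaction layer header check. The function name and event labels are hypothetical.

```python
import struct
import zlib


def receive_tlp(tlp132: bytes, expected_seq: int, header_ok) -> dict:
    """Return the events observed and the TLP released to the core (or None)."""
    (seq,) = struct.unpack(">H", tlp132[:2])
    if seq != expected_seq:
        # Mismatched sequence number: nothing is forwarded.
        return {"events": ["ignored"], "to_core": None}

    payload, crc_field = tlp132[2:-4], tlp132[-4:]
    running = zlib.crc32(tlp132[:2])  # CRC covers the sequence number too
    staged = bytearray()              # stands in for the transaction layer memory
    for i in range(0, len(payload), 4):
        chunk = payload[i:i + 4]
        running = zlib.crc32(chunk, running)  # link layer: CRC on the fly
        staged += chunk                       # transaction layer: speculative store
    events = ["forwarded"]

    # Transaction layer pre-processing happens before the CRC verdict is used.
    header_good = header_ok(bytes(staged))
    events.append("header_checked" if header_good else "dropped")

    # Link layer verdict releases or rejects the speculatively stored TLP.
    crc_good = running == struct.unpack(">I", crc_field)[0]
    events.append("released" if crc_good else "rejected")

    # The request becomes visible to the core only if both checks passed.
    to_core = bytes(staged) if (header_good and crc_good) else None
    return {"events": events, "to_core": to_core}
```

The point of the example is the gating in the last step: the header check can finish early, but nothing reaches the core until the link layer has released the packet.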
The receiving device 100 (
While the invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the invention.