1. Field of the Invention
The present invention generally relates to exchanging data on a bus between devices and, more particularly, to exchanging data between devices on a single bus using multiple virtual channels.
2. Description of the Related Art
A system on a chip (SOC) generally includes one or more integrated processor cores, some type of embedded memory, such as a cache shared between the processor cores, and peripheral interfaces, such as external bus interfaces, on a single chip to form a complete (or nearly complete) system. Often SOCs communicate with other devices, such as a memory controller or graphics processing unit (GPU), by exchanging data packets over an external bus. Often, the devices will communicate over a single external bus utilizing multiple streams of data, commonly referred to as virtual channels.
Virtual channels are referred to as virtual because, although multiple virtual channels may utilize a common physical interface (e.g., the bus), they appear and act as separate channels. Virtual channels may be implemented using various logic components (e.g., switches, multiplexors, etc.) utilized to route data, received over the common bus, from different sources to different destinations, in effect, as if there were separate physical channels between each source and destination. An advantage to utilizing virtual channels is that various processes utilizing the data streamed by the virtual channels may operate in parallel, which may improve system performance (e.g., while one process is receiving/sending data over the bus, another process may be manipulating data and not need the bus).
In a system that utilizes multiple virtual channels to exchange data over a common bus, data is typically exchanged using data packets sent over the virtual channels. For example, these packets may include command packets, such as packets to request data and packets to respond with requested data. When a transmitting device sends a packet, the receiving device typically replies with a packet acknowledging the packet was received. Occasionally, due to some type of bus error, a packet can be lost, which is typically detected when a reply packet acknowledging that packet is not received in a given amount of time.
In conventional systems, the loss of a data packet requires all data packets, for all virtual channels, that have been sent after the lost data packet to also be retransmitted (or “retried”). Unfortunately, even those commands issued by virtual channels that did not experience the lost packet are retried. In other words, a lost packet on a single virtual channel can adversely affect the performance of the entire physical link.
Accordingly, what are needed are methods and systems to reduce the impact that a lost packet on one virtual channel has on other virtual channels.
The present invention generally provides methods and systems that reduce the impact that a lost packet on one virtual channel has on other virtual channels.
One embodiment provides a method of communicating with an external device over a bus utilizing a plurality of virtual channels, each virtual channel representing a stream of data exchanged on the bus. The method generally includes maintaining at least one link retry timer for each virtual channel used to send data packets to the external device, initializing a first link retry timer in conjunction with sending a first data packet to the external device over a corresponding first virtual channel, and resending one or more previously sent and unacknowledged data packets to the external device over the corresponding virtual channel, in response to expiration of the first link retry timer.
Another embodiment provides an integrated circuit (IC) device. The device generally includes one or more processor cores, a bus interface for transferring data to and from an external device via an external bus, and link retry logic circuitry. The link retry logic circuitry is generally configured to maintain at least one link retry timer for each of a plurality of virtual channels used to send data packets from the one or more processing cores to the external device via the bus interface and initiate the resending of data packets over a virtual channel in response to detecting expiration of a corresponding link retry timer.
Another embodiment provides a system generally including a bus, one or more external devices, and a system on a chip (SOC). The SOC has one or more processor cores and link retry logic circuitry configured to maintain at least one link retry timer for each of a plurality of virtual channels used to send data packets from the one or more processing cores to the one or more external devices via the external bus and initiate the resending of data packets over a virtual channel in response to detecting expiration of a corresponding link retry timer.
So that the manner in which the above recited features, advantages and objects of the present invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments thereof which are illustrated in the appended drawings.
It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
Embodiments of the present invention generally allow lost packets on one virtual channel to be retried without requiring all subsequently issued packets, sent over other virtual channels, to be retried. In other words, command retries may be performed on a “per virtual channel” basis. As a result, these other virtual channels may not suffer reductions in their bandwidth due to a lost packet occurring on another virtual channel. For some embodiments, at least one link retry timer may be maintained for each of a plurality of virtual channels used to send data packets to an external device.
As used herein, the term virtual channel generally refers to a stream of data from one component to another. Virtual channels may be implemented using various logic components (e.g., switches, multiplexors, etc.) utilized to route data, received over a common bus, from different sources to different destinations, in effect, as if there were separate physical channels between each source and destination.
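The routing behavior described above may be illustrated by the following minimal Python sketch. The packet format (a channel identifier paired with a payload) is an illustrative assumption, not part of the described embodiment; the sketch simply shows how traffic arriving interleaved on one shared link may be separated into per-channel streams.

```python
from collections import defaultdict

def demux(packets):
    """Route (channel_id, payload) pairs arriving interleaved on one
    shared link into separate per-channel queues, so each stream
    behaves as if it had its own physical channel."""
    queues = defaultdict(list)
    for channel_id, payload in packets:
        queues[channel_id].append(payload)
    return queues
```

In effect, downstream consumers of each queue may proceed independently, which is what allows one channel's traffic to be processed while another channel's traffic is still in flight.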
In the following, reference is made to embodiments of the invention. However, it should be understood that the invention is not limited to specific described embodiments. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the invention. Furthermore, in various embodiments the invention provides numerous advantages over the prior art. However, although embodiments of the invention may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the invention. Thus, the following aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, reference to “the invention” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).
As illustrated, each processor core 112 may have access to its own primary (L1) cache 114, as well as a larger shared secondary (L2) cache 116. In general, copies of data utilized by the processor cores 112 may be stored locally in the L2 cache 116, preventing or reducing the number of relatively slower accesses to external main memory 140. Similarly, data utilized often by a processor core may be stored in its L1 cache 114, preventing or reducing the number of relatively slower accesses to the L2 cache 116.
The CPU 110 may communicate with external devices, such as a graphics processing unit (GPU) 130 and/or a memory controller 136, via a system or frontside bus (FSB) 128. The CPU 110 may include an FSB interface 120 to pass data between the external devices and the processor cores 112 (through the L2 cache) via the FSB 128. An FSB interface 132 on the GPU 130 may include components similar to those of the FSB interface 120, configured to exchange data with one or more graphics processors 134, an input/output (I/O) unit 138, and the memory controller 136 (illustratively shown as integrated with the GPU 130).
As illustrated, the FSB interface 120 may include a physical layer 122, link layer 124, and transaction layer 126. The physical layer 122 may include hardware components for implementing the hardware protocol necessary for receiving and sending data over the FSB 128. The physical layer 122 may exchange data with the link layer 124 which may format data received from or to be sent to the transaction layer 126.
As illustrated, the transaction layer 126 may exchange data with the processor cores 112 via a CPU bus interface 118. For some embodiments, data may be sent over the FSB as packets. Therefore, the link layer 124 may contain circuitry configured to encode into packets or “packetize” data received from the transaction layer 126 and to decode packets of data received from the physical layer 122, which may include a serializer 243 and a de-serializer 244 (shown in
As shown in
As illustrated, the virtual channels may be used to transfer data into and out of a shared buffer pool 210. Each virtual channel may be allocated a different portion of the shared buffer pool. For example, the first transmit-side virtual channel 220₁ may be allocated and utilize buffers 211 and 212 to hold request commands and data that will be sent in packets to an external device, while the second transmit-side virtual channel 220₂ may be allocated and utilize buffers 213 and 214 to hold response commands and data to be sent to the external device (e.g., in response to commands received therefrom). Similarly, the first receive-side virtual channel 220₃ may be allocated and utilize buffer 215 to hold request commands and data received from the external device, while the second receive-side virtual channel 220₄ may be allocated and utilize buffers 216 and 217 to hold response commands and data received from the external device.
For some embodiments, each data packet sent to the external device on a virtual channel may be assigned a sequence count. Acknowledgement packets received from the external device for a particular packet may contain this assigned count to indicate that packet has been successfully received by the external device. Each virtual channel may utilize a unique sequence, different from those used by other virtual channels.
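The per-channel sequence counting described above may be sketched as follows. The class and method names are illustrative assumptions; the point shown is that each virtual channel advances its own counter independently, so acknowledgements on one channel never consume sequence numbers belonging to another.

```python
class VirtualChannelSequencer:
    """Assigns each outgoing packet a sequence count that is unique
    within its virtual channel; channels do not share counters."""

    def __init__(self):
        self._counters = {}  # channel_id -> next sequence count

    def next_sequence(self, channel_id):
        # Each virtual channel utilizes its own independent sequence.
        count = self._counters.get(channel_id, 0)
        self._counters[channel_id] = count + 1
        return count
```

An acknowledgement packet carrying a given count back to the sender may then be matched against the count assigned at transmission to confirm which packet was received.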
Each transmitting device may have a data structure that is used to retain pertinent command information in case packet retries are required on its transmit virtual channels. For example, this data structure may retain (or buffer) a series of packets that have been sent. In the event any of these packets are not acknowledged in some predetermined period of time, that packet and all subsequent packets may be retried. As illustrated, for some embodiments, this data structure may be implemented using a circular buffer 222. The circular buffer 222 may provide a straightforward method for matching commands with their corresponding sequence count. A given command packet may always have the same index in the queue, and various pointers into the circular buffer will wrap around as they reach the top (hence the term circular). Similar data structures operating in a similar manner may also be utilized on the GPU side, to track data packets sent to the CPU over virtual channels 220₃ and 220₄.
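A minimal sketch of such a circular buffer is shown below. The capacity and slot layout are illustrative assumptions; what the sketch captures is the property described above, that a packet's sequence count always maps to the same index, with indices wrapping around as they reach the top.

```python
class RetryBuffer:
    """Circular buffer retaining sent packets until they are
    acknowledged, so unacknowledged packets can be retried."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity

    def store(self, sequence_count, packet):
        # A given sequence count always lands at the same index;
        # the modulo makes the index wrap around (hence "circular").
        self.slots[sequence_count % self.capacity] = packet

    def fetch(self, sequence_count):
        return self.slots[sequence_count % self.capacity]
```

Because the mapping from sequence count to index is fixed, matching a retried command to its buffered copy requires no searching.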
For each circular buffer 222, a set of pointers may be maintained that indicate important buffer entries. For example, as illustrated in
As previously described, occasionally data packets sent to the external device over one of the virtual channels may be lost (e.g., due to a bit error resulting in a bad checksum), as indicated by the failure to receive an acknowledgement from a receiving device. Referring back to
Each link timer may be initiated and reset in conjunction with the sending of packets and receiving of corresponding acknowledgement packets, respectively. For example, at step 304, a link timer for a virtual channel may be activated when sending a packet on that virtual channel. In other words, the link timer may be initialized to the maximum amount of time within which a sent packet must be acknowledged before it is declared lost. For some embodiments, this maximum acknowledgement time may be programmable, for example, by writing to a control register.
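The timer behavior described above may be sketched as follows; the class name and the explicit `now` parameter (standing in for a hardware clock) are illustrative assumptions, and the programmable timeout stands in for the control-register setting.

```python
class LinkRetryTimer:
    """One link retry timer per virtual channel. The timeout is
    programmable, modeling the control-register setting described."""

    def __init__(self, timeout):
        self.timeout = timeout   # maximum acknowledgement time
        self.deadline = None     # inactive until a packet is sent

    def activate(self, now):
        # Initialized when a packet is sent: the packet must be
        # acknowledged before this deadline or it is declared lost.
        self.deadline = now + self.timeout

    def expired(self, now):
        return self.deadline is not None and now >= self.deadline
```

Because each virtual channel maintains its own timer, expiration on one channel triggers retries only for that channel's outstanding packets.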
At step 306, in response to detecting expiration of a link timer for a corresponding virtual channel, outstanding packets for that virtual channel, but not all virtual channels, may be retried. In other words, other virtual channels that have not experienced lost packets may continue to send packets without retrying outstanding packets, which may increase overall system bandwidth.
For some embodiments, link retry logic 230 may perform operations for each virtual channel, for example, according to the exemplary operations 400 of
Regardless, if a new packet is to be sent on a virtual channel, as determined at step 404, the new packet is sent, at step 406, and a link timer for that virtual channel is activated, at step 408. For some embodiments, the circular buffer may be updated, at step 410. For example, the command sent at step 406 may have been indicated by the send pointer 253, which may be subsequently incremented. As previously described, for some embodiments, a separate link timer may not be maintained for each packet sent. Therefore, rather than reset the link timer with each packet sent, the link retry logic may contain additional logic to determine when to reset/initialize the link timer. For example, a single common link timer may be reset after receiving acknowledge packets (with the link timer used to monitor acknowledgement timeout of a subsequently sent packet).
If an acknowledgement packet is received, as determined at step 412, the circular buffer and/or link timer may be updated, at step 414. For example, because an outstanding packet has been acknowledged, the start pointer 254 used to indicate the start of outstanding packets in the circular buffer 222 may be incremented to point to a subsequently issued outstanding packet. If a single link timer is utilized for each virtual channel, the link timer may be re-initialized, as described above, to begin monitoring for an acknowledgement of this subsequently issued outstanding packet.
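The acknowledgement handling of step 414 may be sketched as follows for the single-timer-per-channel case. The state dictionary and its keys are illustrative assumptions; the sketch shows the start pointer advancing past the acknowledged packet and the common link timer remaining active only while packets are still outstanding.

```python
def handle_acknowledgement(state):
    """Advance the start pointer past the acknowledged packet and keep
    the single per-channel link timer running only if older packets
    are still outstanding. State keys are illustrative."""
    state["start_pointer"] += 1
    if state["start_pointer"] < state["send_pointer"]:
        # Re-initialize the timer to monitor the next outstanding packet.
        state["timer_active"] = True
    else:
        # No packets outstanding; the timer goes idle.
        state["timer_active"] = False
    return state
```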
If an acknowledgement packet is not received before a corresponding link timer has expired, as determined at step 416, outstanding packets for this virtual channel may be retried, at step 418. For some embodiments, all packets that have been sent on this virtual channel since the unacknowledged packet and including the unacknowledged packet may be retried. The unacknowledged packets that should be retried may be determined by examining pointers maintained by the circular buffer 222.
For example, if the Start Pointer 254 points to the earliest command not acknowledged and the Send Pointer 253 points to the next command to send, packets starting with that pointed to by Start Pointer 254 up to the packet just before the packet pointed to by Send Pointer (i.e., Send Pointer-1) may be considered outstanding and may be retried. For other embodiments that maintain a separate link timer for each outstanding data packet on a particular virtual channel, only those commands sent after a command whose corresponding link timer has expired (as well as that command) may need to be retried. In either case, only outstanding data packets sent over virtual channels experiencing lost packets may be resent.
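The outstanding range defined by the two pointers may be computed as in the following sketch. The function name is an illustrative assumption; the modulo arithmetic handles the wrap-around of pointers in the circular buffer described earlier.

```python
def outstanding_packets(start_pointer, send_pointer, capacity):
    """Return the circular-buffer indices of outstanding packets: from
    the earliest unacknowledged packet (Start Pointer) up to, but not
    including, the next packet to send (Send Pointer), with wrap-around
    handled modulo the buffer capacity."""
    count = (send_pointer - start_pointer) % capacity
    return [(start_pointer + i) % capacity for i in range(count)]
```

On link timer expiration, only the packets at these indices, belonging to the one virtual channel whose timer expired, would be retried.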
By maintaining one or more link timers for each virtual channel, only those virtual channels experiencing lost packets may require packets to be retried. Other virtual channels, not experiencing lost packets, may avoid having to retry packets, thereby increasing overall system bandwidth.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.