1. Field
Embodiments of the invention relate to the field of computer systems and more specifically, but not exclusively, to queue resource sharing for an input/output controller.
2. Background Information
Input/output (I/O) devices of a computer system often communicate with the system's central processing unit (CPU) and system memory via a chipset. The chipset may include a memory controller and an input/output controller. Devices of the computer system may be connected using various buses, such as a Peripheral Component Interconnect (PCI) bus.
A new generation of PCI bus, called PCI Express, has been promulgated by the PCI Special Interest Group. PCI Express uses high-speed serial signaling and allows for point-to-point communication between devices. Communications along a PCI Express connection are made using packets. Interrupts are also delivered as packets, using the Message Signaled Interrupt (MSI) scheme.
Current implementations assign dedicated resources to each PCI Express port of an I/O controller.
Non-limiting and non-exhaustive embodiments of the present invention are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified.
In the following description, numerous specific details are set forth to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that embodiments of the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring understanding of this description.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Referring to
A central processing unit (CPU) 106 and memory 108 are coupled to MCH 102. CPU 106 may include, but is not limited to, an Intel Pentium®, Xeon®, or Itanium® family processor, or the like. Memory 108 may include, but is not limited to, Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Synchronous Dynamic Random Access Memory (SDRAM), Rambus Dynamic Random Access Memory (RDRAM), or the like. MCH 102 may also be coupled to a graphics card 110 via PCI Express link 126 (PCI Express is discussed further below). In an alternative embodiment, MCH 102 may be coupled to an Accelerated Graphics Port (AGP) interface (not shown).
ICH 104 may include support for a Serial Advanced Technology Attachment (SATA) interface 112, an Integrated Drive Electronics (IDE) interface 114, a Universal Serial Bus (USB) 116, and a Low Pin Count (LPC) bus 118.
ICH 104 may also include PCI Express ports 120-1 to 120-4 that may operate substantially in compliance with the PCI Express Base Specification Revision 1.0a, Apr. 15, 2003. While the embodiment shown in
Each port 120 is coupled to an add-in device via a PCI Express link, such as PCI Express link 124. In the embodiment of
Alternative embodiments of computer system 100 may include other PCI Express port configurations (embodiments of port configurations are discussed below in conjunction with
Link 200 supports at least 1 lane. Each lane represents a set of differential signaling pairs, one pair for transmitting and one pair for receiving, resulting in a total of 4 signals. A x1 link includes 1 lane. The width of link 200 may be aggregated using multiple lanes to increase the bandwidth of the connection between ICH 104 and device 128. In one embodiment, link 200 may be a x1, x2, or x4 link. Thus, a x4 link includes 4 lanes. In other embodiments, link 200 may provide up to a x32 link. In one embodiment, a lane in one direction has a rate of 2.5 Gigabits per second.
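As a purely illustrative sketch of the lane arithmetic above, the following Python fragment computes the raw one-direction signaling rate and the signal count for a given link width. The function names are hypothetical, and the rate is the raw per-lane figure quoted above; it does not account for any encoding or protocol overhead.

```python
# Raw signaling rate per lane, per direction, as stated in the text.
LANE_RATE_GBPS = 2.5

def raw_link_rate_gbps(lanes: int) -> float:
    """Raw one-direction signaling rate for a link of the given width."""
    if not 1 <= lanes <= 32:
        raise ValueError("lane count must be between 1 and 32")
    return lanes * LANE_RATE_GBPS

def signal_count(lanes: int) -> int:
    """Each lane has one transmit pair and one receive pair: 4 signals."""
    return lanes * 4
```

Under these assumptions, a x4 link carries four times the raw rate of a x1 link in each direction and uses 16 signals.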
Information between devices is communicated using packets.
In general, the Transaction Layer assembles and disassembles Transaction Layer Packets (TLPs), such as TLP 252. TLP 252 includes a header 262 and data 264. TLPs may be used to communicate read and write transactions. TLPs may also include command functions, such as an interrupt.
The Data Link Layer serves as an intermediate stage between the Transaction Layer and the Physical Layer. The Data Link Layer may perform link management and data integrity verification. The Data Link Layer creates a Data Link Layer Packet (DLLP) 254 by adding a sequence number 260 and a Cyclic Redundancy Check (CRC) 266 for transmission. On the receive side, the Data Link Layer checks the integrity of packet 250 using CRC 266. If the receiving Data Link Layer detects an error, the Data Link Layer may request that the packet be re-transmitted.
The Physical Layer takes information from the Data Link Layer and transmits a packet across the PCI Express link. The Physical Layer adds packet framing 258 and 268 to indicate the start and end of packet 250. The Physical Layer may include drivers, buffers, and other circuitry to interface packet 250 with link 200.
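The layering described above may be pictured with the following minimal Python sketch, in which each layer wraps the output of the layer above it. The field widths, the framing byte values, and the use of `zlib.crc32` as the integrity check are assumptions chosen only for illustration; they are not taken from this description or from the PCI Express specification.

```python
import zlib

# Placeholder framing markers (assumed values, for illustration only).
START, END = b"\xfb", b"\xfd"

def transaction_layer(header: bytes, data: bytes) -> bytes:
    """Assemble a TLP from a header and a data payload."""
    return header + data

def data_link_layer(tlp: bytes, seq: int) -> bytes:
    """Prepend a sequence number and append a CRC over both."""
    body = seq.to_bytes(2, "big") + tlp
    return body + zlib.crc32(body).to_bytes(4, "big")

def physical_layer(link_packet: bytes) -> bytes:
    """Add start and end framing around the link-layer packet."""
    return START + link_packet + END

def receive(packet: bytes):
    """Strip framing and check the CRC; None models a retransmit request."""
    inner = packet[len(START):-len(END)]
    body, crc = inner[:-4], inner[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != crc:
        return None
    return int.from_bytes(body[:2], "big"), body[2:]
```

A receiver that gets `None` back would, in this sketch, ask the sender to re-transmit the packet with that sequence number.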
Referring to
In port width configuration 301, port 1 uses lane 1, port 2 uses lane 2, port 3 uses lane 3, and port 4 uses lane 4. Configuration 301 results in four x1 connections for devices.
In port width configuration 302, port 1 uses lanes 1 and 2. Port 2 is disabled. Port 3 uses lanes 3 and 4. Port 4 is disabled. Configuration 302 results in two x2 connections.
In port width configuration 303, port 1 uses lanes 1 and 2. Port 2 is disabled. Port 3 uses lane 3 and port 4 uses lane 4. Configuration 303 results in one x2 and two x1 connections.
In port width configuration 304, port 1 uses lanes 1-4. Ports 2-4 are disabled. Thus, configuration 304 results in one x4 connection.
Turning to
Device port 218 has associated receive buffers 410 as well as replay buffer 412 and transmit buffers 414.
Turning to
Transmit buffers 406 include posted buffer 420, non-posted buffer 422, and completions buffer 424. Posted buffer 420 holds TLPs that do not require a reply from the receiver, such as a write transaction. Non-posted buffer 422 holds TLPs that may require a reply from the receiver, such as a read request.
Completions buffer 424 holds TLPs that are to be transmitted to device 128 in response to non-posted TLPs received from device 128. For example, ICH 104 may receive a read request (non-posted transaction) from device 128. The requested information is retrieved from memory and provided to ICH 104. The retrieved information is formed into one or more TLPs that may be placed in completions buffer 424 awaiting transmission to device 128.
Replay buffer 404 is used to maintain a copy of all transmitted TLPs until the receiving device acknowledges reception of the TLP. Once the TLP has been successfully received, that TLP may be removed from the Replay buffer 404 to make room for additional TLPs. If an error occurs, then the TLP may be re-transmitted from Replay buffer 404.
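The replay behavior described above may be sketched as follows. This is a minimal Python illustration, assuming that TLPs are tracked by sequence number and that the buffer has a fixed capacity; the class and method names are hypothetical.

```python
from collections import OrderedDict

class ReplayBuffer:
    """Holds a copy of each transmitted TLP until it is acknowledged."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._pending = OrderedDict()  # sequence number -> TLP copy

    def record(self, seq: int, tlp: bytes) -> None:
        """Keep a copy of a TLP at the time it is transmitted."""
        if len(self._pending) >= self.capacity:
            raise BufferError("replay buffer full")
        self._pending[seq] = tlp

    def acknowledge(self, seq: int) -> None:
        """Receiver confirmed reception: free the entry for new TLPs."""
        self._pending.pop(seq, None)

    def replay(self, seq: int) -> bytes:
        """An error occurred: return the stored copy for re-transmission."""
        return self._pending[seq]

    def __len__(self) -> int:
        return len(self._pending)
```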
Receive buffers 408 include posted buffer 426, non-posted buffer 428, and completions buffer 430. Receive buffers 408 store the received TLPs until the receiving device is ready to act on the received packets.
Turning to
Transmit buffers 440 include VC(1) Transmit Buffers 440-1 to VC(N) Transmit Buffers 440-N. Port 120-1 has a single associated Replay buffer 442. Receive buffers 444 include VC(1) Receive Buffers 444-1 to VC(N) Receive Buffers 444-N. Each VC Transmit and VC Receive Buffer may include posted, non-posted, and completions buffers as described above in conjunction with
Turning to
Further, it will be understood that embodiments of load and unload pointers are not limited to five bits, as shown in
In other embodiments, an index pointer may be more or less than three bits if the size of a quarter is more or less than 8 entries. For example, the index pointer may be 4 bits wide [3:0] for 16 entries per quarter, or in another example, the index pointer may be 5 bits wide [4:0] for 32 entries per quarter. In other embodiments, the number of entries of a quarter does not have to correspond to a binary based number (discussed further below).
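The relationship between quarter size and index pointer width may be illustrated with the short Python sketch below. It assumes, consistent with the five-bit example, that the segment pointer occupies the two bits directly above the index pointer; the function names are hypothetical.

```python
def index_pointer_bits(entries_per_quarter: int) -> int:
    """Smallest bit width able to address every entry of a quarter."""
    if entries_per_quarter < 2:
        raise ValueError("a quarter needs at least two entries")
    return (entries_per_quarter - 1).bit_length()

def split_pointer(ptr: int, index_bits: int = 3):
    """Split a combined pointer into (segment, index) fields.

    Assumes a two-bit segment field sits directly above the index field,
    as in the five-bit example (ptr[4:3] segment, ptr[2:0] index).
    """
    index = ptr & ((1 << index_bits) - 1)
    segment = (ptr >> index_bits) & 0b11
    return segment, index
```

With 12 entries per quarter, for instance, the sketch yields a 4-bit index pointer even though not every 4-bit value corresponds to a valid entry.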
Referring to
Referring again to
A shared resource queue inlet 606 and a shared resource queue outlet 604 are coupled to queue 602. Shared resource queue inlet 606 receives load index pointer 616 and load segment pointer 618 for processing of TLP data received at TLP data in 608. Load segment pointer 618 identifies the quarter selected for loading of the data, and load index pointer 616 identifies the entry within the quarter for loading the data. Shared resource queue inlet 606 also receives port width configuration 620 to be used for identifying the selected quarter and its entry for loading of TLP data.
Shared resource queue outlet 604 receives unload segment pointer 612 and unload index pointer 614. Shared resource queue outlet 604 also receives port width configuration 620. Outlet 604 uses pointers 612 and 614, and the port width configuration 620, to determine which quarter and entry to unload data from to a particular port. The data is outputted from shared resource queue outlet 604 at TLP data out 610 to the designated port.
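One way to picture how the inlet and outlet locate an entry is the Python sketch below. It models the queue as a flat array of four quarters and assumes that the port width configuration supplies a base quarter for each port (the first quarter allocated to that port), to which the segment pointer is added. The base-quarter parameter, the 8-entry quarter size, and all names are assumptions made for illustration.

```python
ENTRIES_PER_QUARTER = 8   # assumed quarter depth
QUARTERS = 4

class SharedQueue:
    """Flat model of a queue divided into four equally sized quarters."""

    def __init__(self):
        self.entries = [None] * (QUARTERS * ENTRIES_PER_QUARTER)

    def address(self, base_quarter: int, segment: int, index: int) -> int:
        """Map (port's base quarter, segment, index) to a flat entry number."""
        return (base_quarter + segment) * ENTRIES_PER_QUARTER + index

    def load(self, base_quarter, segment, index, tlp):
        """Inlet: write TLP data into the selected entry."""
        self.entries[self.address(base_quarter, segment, index)] = tlp

    def unload(self, base_quarter, segment, index):
        """Outlet: read TLP data from the selected entry."""
        return self.entries[self.address(base_quarter, segment, index)]
```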
Turning to
Starting in a block 702, a load segment pointer, a load index pointer, and TLP data are received at a shared resource queue inlet. Proceeding to a block 704, the selected quarter of the shared resource queue is determined from the load segment pointer and the port width configuration. The port width configuration indicates how the quarters of the shared resource queue are allocated to the ports.
Continuing to a block 706, the entry within the selected quarter is determined from the load index pointer and the port width configuration. In a block 708, the queue entry is loaded with the received TLP data.
Proceeding to a decision block 710, the logic determines if the limit of the selected quarter has been reached. If the answer to decision block 710 is no, then the logic proceeds to a block 720 to increment the load index pointer. This increment of the index pointer sets the index pointer to the next available entry for loading of TLP data. The logic then returns to block 702.
If the answer to decision block 710 is yes, then the logic proceeds to a block 712 to wrap the load index pointer to the start of the selected quarter. Continuing to a decision block 714, the logic determines if the limit of the number of allocated quarters has been reached. If the answer to decision block 714 is yes, then the logic continues to a block 718 to wrap the segment pointer. The logic then returns to block 702.
If the answer to decision block 714 is no, then the logic continues to a block 716 to increment the load segment pointer. The logic then returns to block 702.
As an example of wrapping the index pointer and segment pointer, consider port width configuration 303. For this example, assume Q1 and Q2 are allocated to port 1, while Q3 and Q4 are allocated to ports 3 and 4, respectively. The segment pointer of port 1 starts at 00b (where “b” indicates a binary number). The port 1 segment pointer increments to 01b when the end of Q1 is reached. This segment pointer wraps around to 00b after the end of Q2 is reached because port 1 has a two quarter address limit. However, the segment pointer of port 3 always stays at 00b because it has a one quarter address limit. It will be understood that in block 718 for port 3, the segment pointer wraps by staying at 00b. The segment pointer of port 4 operates in a substantially similar manner as the segment pointer of port 3.
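The pointer update of blocks 710 through 720 may be sketched in Python as follows. This is a minimal illustration, assuming that the limit of a quarter is its last entry and that the number of allocated quarters is derived from the port width configuration; the function name and parameters are hypothetical.

```python
def advance(segment: int, index: int,
            entries_per_quarter: int, quarters_allocated: int):
    """Return the (segment, index) pair that follows the current one."""
    if index < entries_per_quarter - 1:       # block 710: limit not reached
        return segment, index + 1             # block 720: increment index
    index = 0                                 # block 712: wrap index
    if segment >= quarters_allocated - 1:     # block 714: last quarter?
        return 0, index                       # block 718: wrap segment
    return segment + 1, index                 # block 716: increment segment
```

For configuration 303 with 8 entries per quarter, calling `advance` eight times for port 1 (two quarters) moves the segment pointer from 00b to 01b, and eight more calls return it to 00b; for port 3 (one quarter) the segment pointer remains at 00b throughout.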
Referring to
Port width configuration information is provided to various multiplexers when handling the load and unload pointers. In one embodiment, this port width configuration information acts as select inputs to the multiplexers. These multiplexers will be described below using examples of various port width configurations. It will be understood that the use of “!” in
An example of operations by the embodiment of
At multiplexer (mux) 816, since port 4 is in a x1 configuration, the port 4 unload index pointer, shown as p4_unload_ptr[2:0], is passed to Q4. Since port 3 is not in a x2 configuration and port 1 is not in a x4 configuration, these unload index pointers are not passed through mux 816.
At mux 818, port 3 unload index pointer, shown as p3_unload_ptr[2:0], goes to Q3 since port 1 is not in a x4 configuration. At mux 820, since port 2 is in a x1 configuration, port 2's unload index pointer, p2_unload_ptr[2:0], is passed to Q2.
Continuing with this port width configuration 301 example, the unload segment pointers 808 will now be discussed. The logic of the unload segment pointers is grouped into a single mux 810. Corresponding logic for the load segment pointers 802 is provided in de-mux 812.
Since ports 1-4 are all in a x1 configuration, all their segment pointers remain at value 00b. The data from Q2 is always sent to P2 and data from Q4 is always sent to P4, as shown at TLP data out 836. The bits p1_unload_ptr[3] and p3_unload_ptr[3] are input to mux 814. Since port 3 is in a x1 configuration, p3_unload_ptr[3] is passed to mux 832. Since the value of p3_unload_ptr[3] is 0b, Q3 data is sent to P3 of TLP data out 836.
The output of mux 832 is also input to mux 828. Since p1_unload_ptr[4] is 0b, the output of mux 830 is selected. Mux 830 outputs Q1 data since the value of p1_unload_ptr[3] is 0b. Thus, Q1 data is sent to P1.
Turning to the load portion of
In another example, port width configuration 304 will be used. In this configuration, all 4 lanes are assigned to port 1 and ports 2-4 are disabled. Thus, Q1-Q4 are allocated for use by port 1. Port 1 unload index pointer is sent directly to Q1. At mux 820, port 1 unload index pointer, shown as p1_unload_ptr[2:0], is passed to Q2 since port 1 is not in a x1 configuration. At mux 818, port 1 unload index pointer is passed to Q3 since port 1 is in a x4 configuration. At mux 816, the port 1 unload index pointer is passed to Q4 since port 1 is in a x4 port width configuration.
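The index pointer selection described for muxes 816, 818, and 820 may be summarized with the Python sketch below, which returns, for each quarter, the port whose unload index pointer is passed to it. The select conditions are inferred from the two examples given for configurations 301 and 304, and their priority order in the mux 816 case is an assumption; the function name and the dictionary representation of port widths are hypothetical.

```python
def unload_index_source(width: dict) -> dict:
    """Map each quarter to the port whose index pointer drives it.

    `width` maps port number to lane count (0 for a disabled port).
    """
    port1_x1 = width[1] == 1
    port1_x4 = width[1] == 4
    port3_x2 = width[3] == 2
    return {
        "Q1": 1,                                           # wired directly
        "Q2": 2 if port1_x1 else 1,                        # mux 820
        "Q3": 1 if port1_x4 else 3,                        # mux 818
        "Q4": 1 if port1_x4 else (3 if port3_x2 else 4),   # mux 816
    }

# Port widths for configurations 301 to 304 as described earlier.
CONFIG_301 = {1: 1, 2: 1, 3: 1, 4: 1}
CONFIG_302 = {1: 2, 2: 0, 3: 2, 4: 0}
CONFIG_303 = {1: 2, 2: 0, 3: 1, 4: 1}
CONFIG_304 = {1: 4, 2: 0, 3: 0, 4: 0}
```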
An embodiment of the incrementing of segment and index pointers for configuration 304 may be summarized as follows. The unload segment pointer of port 1, p1_unload_ptr[4:3], may start at 00b and work through Q1. At the end of Q1, the index pointer wraps around, and the segment pointer may advance to 01b to start unloading from Q2. At the end of Q2, the index pointer wraps around again, while the segment pointer advances to 10b for Q3. At the end of Q3, the index pointer wraps around, and the segment pointer advances to 11b for Q4. At the end of Q4, the index pointer and the segment pointer wrap around to a value of 0.
Referring to
When the port 1 unload segment pointer is 01b, data from Q2 is sent to P1. At mux 830, p1_unload_ptr[3] selects Q2, and at mux 828, p1_unload_ptr[4] selects Q2 data from mux 830.
When port 1 unload segment pointer is 10b, data from Q3 is sent to P1. Mux 832 outputs Q3 data since the value of p1_unload_ptr[3] from mux 814 is 0b. Mux 828 then forwards Q3 data to P1 of TLP data out 836 since p1_unload_ptr[4] is 1b.
When port 1 unload segment pointer is 11b, data from Q4 is sent to P1. Q4 data is forwarded by mux 832 since p1_unload_ptr[3] from mux 814 is 1b. This Q4 data is forwarded by mux 828 to P1 since p1_unload_ptr[4] is 1b.
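The data path to P1 through muxes 814, 828, 830, and 832 may be modeled with the Python sketch below. It is a behavioral illustration only: the select condition of mux 814 is assumed here to be whether port 1 is in a x4 configuration, which is consistent with both examples above but is not stated explicitly, and all function and parameter names are hypothetical.

```python
def mux(select: int, in0, in1):
    """Two-input multiplexer: returns in1 when select is 1, else in0."""
    return in1 if select else in0

def p1_data_out(q: dict, p1_segment: int, p3_bit3: int, port1_is_x4: bool):
    """Select the quarter whose data is sent to P1.

    q maps "Q1".."Q4" to the data at each quarter's unload entry;
    p1_segment is the two-bit value p1_unload_ptr[4:3].
    """
    p1_bit3 = p1_segment & 1
    p1_bit4 = (p1_segment >> 1) & 1
    sel_832 = mux(port1_is_x4, p3_bit3, p1_bit3)   # mux 814 (select assumed)
    out_830 = mux(p1_bit3, q["Q1"], q["Q2"])       # mux 830
    out_832 = mux(sel_832, q["Q3"], q["Q4"])       # mux 832
    return mux(p1_bit4, out_830, out_832)          # mux 828
```

In the x4 case the four segment values 00b, 01b, 10b, and 11b select Q1, Q2, Q3, and Q4 in turn; in the x1 case the segment pointer stays at 00b and Q1 is always selected.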
The load index pointers 804 and load segment pointers 802 operate in a similar fashion in port width configuration 304. From the above two examples, one skilled in the art will appreciate the operation of the embodiment of
Turning to
The index pointer counts up until the index pointer reaches its maximum value, stored at 918. In one embodiment, the maximum value corresponds to the number of entries of a quarter of the shared resource queue.
In one embodiment, the depth of a quarter need not be a binary depth (a power of two, such as 2, 4, or 8). The embodiment of
The configuration of the port associated with pointer 900 is indicated by cfg_x1 input shown at 904 and the cfg_x2 input shown at 906. If neither cfg_x1 nor cfg_x2 is set to “1”, then it is assumed that the port is in a x4 configuration. The setting of the configuration allows for the segment pointer (ptr[3] and ptr[4]) to be incremented accordingly.
A wrap bit 908 is used to determine whether the quarter associated with the load and unload pointers is empty or full. In one embodiment, if the segment and index values of the load pointer and the unload pointer are the same, and the wrap bits of the load and unload pointers are equal, then the quarter is empty. If the segment and index values are the same but the wrap bits are not equal, then the quarter is full.
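The empty and full tests described above may be expressed as the following Python sketch. The `Pointer` tuple is a hypothetical representation of a load or unload pointer; how and when the wrap bit toggles is not modeled here.

```python
from typing import NamedTuple

class Pointer(NamedTuple):
    wrap: int      # wrap bit
    segment: int   # segment pointer value
    index: int     # index pointer value

def same_position(load: Pointer, unload: Pointer) -> bool:
    """True when segment and index values of both pointers match."""
    return (load.segment, load.index) == (unload.segment, unload.index)

def is_empty(load: Pointer, unload: Pointer) -> bool:
    """Same position and equal wrap bits: nothing left to unload."""
    return same_position(load, unload) and load.wrap == unload.wrap

def is_full(load: Pointer, unload: Pointer) -> bool:
    """Same position but differing wrap bits: no room left to load."""
    return same_position(load, unload) and load.wrap != unload.wrap
```

When the two pointers are at different positions, neither condition holds and the queue region is partially filled.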
Embodiments as described herein provide for queue resource sharing for an I/O controller. Instead of having dedicated queues at each port of the ICH, embodiments herein provide a single queue that may be shared by multiple ports. This may result in a lower gate count and smaller die area than used by port-dedicated resources. Further, embodiments herein provide shared queue resources for I/O controllers having multiple port width configurations.
The above description of illustrated embodiments of the invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the embodiments to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible, as those skilled in the relevant art will recognize. These modifications can be made to embodiments of the invention in light of the above detailed description.
The terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification. Rather, the following claims are to be construed in accordance with established doctrines of claim interpretation.