Contained herein is material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent disclosure by any person as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates to computer systems; more particularly, the present invention relates to cache memory systems.
Many storage, networking, and embedded applications require fast input/output (I/O) throughput for optimal performance. I/O processors allow servers, workstations, and storage subsystems to transfer data faster, reduce communication bottlenecks, and improve overall system performance by offloading I/O processing functions from a host central processing unit (CPU). Typically, I/O processors process Scatter Gather Lists (SGLs) generated by the host to initiate the necessary data transfers. Usually, these SGLs are moved from host memory to the I/O processor's local memory before the I/O processor begins processing them. The SGLs are subsequently processed by being read from local memory.
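For illustration only, an SGL can be pictured as a list of (address, length) element pairs, each describing one fragment of a transfer. The structure and field names below are hypothetical sketches, not taken from any particular host or I/O processor specification.

```python
# Hypothetical scatter-gather list: each element names a buffer
# fragment by physical address and byte length.
from dataclasses import dataclass

@dataclass
class SglElement:
    address: int  # physical address of the fragment
    length: int   # fragment size in bytes

def total_transfer_bytes(sgl):
    """Sum the fragment lengths to get the full transfer size."""
    return sum(e.length for e in sgl)

sgl = [SglElement(0x1000, 512), SglElement(0x8000, 1024), SglElement(0x2400, 256)]
print(total_transfer_bytes(sgl))  # total bytes described by the list
```

Walking such a list fragment by fragment is what lets a DMA engine move a logically contiguous buffer that is physically scattered across host memory.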
The invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which:
According to one embodiment, a mechanism to pull data into a processor cache is described. In the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
A chipset 107 is also coupled to bus 105. Chipset 107 includes a memory control hub (MCH) 110. MCH 110 may include a memory controller 112 that is coupled to a main system memory 115. Main system memory 115 stores data and sequences of instructions that are executed by CPU 102 or any other device included in system 100. In one embodiment, main system memory 115 includes dynamic random access memory (DRAM); however, main system memory 115 may be implemented using other memory types. Additional devices may also be coupled to bus 105, such as multiple CPUs and/or multiple system memories.
Chipset 107 also includes an input/output control hub (ICH) 140 coupled to MCH 110 via a hub interface. ICH 140 provides an interface to input/output (I/O) devices within computer system 100. For instance, ICH 140 may be coupled to a Peripheral Component Interconnect Express (PCI Express) bus adhering to Specification Revision 2.1, developed by the PCI Special Interest Group of Portland, Oreg.
According to one embodiment, ICH 140 is coupled to an I/O processor 150 via a PCI Express bus. I/O processor 150 transfers data to and from ICH 140 using SGLs.
Referring to
The XSI is a split address/data bus in which the address and data phases of a transaction are tied together by a unique Sequence ID. Further, the XSI bus provides a command called “Write Line” (or “Write” in the case of writes smaller than a cache line) to perform cache line writes on the bus. Whenever a PUSH attribute is set during a Write Line (or Write), one of the CPUs 202 (CPU_1 or CPU_2) on the bus will claim the transaction if a Destination ID (DID) provided with the transaction matches the ID of that particular CPU 202.
Once the targeted CPU 202 accepts the Write Line (or Write) with PUSH, the agent that originated the transaction provides the data on the data bus. During the address phase, the agent generating the command generates a Sequence ID; then, during the data transfer, the agent supplying data uses the same Sequence ID. During reads, the agent claiming the command supplies the data, while during writes, the agent that generated the command provides the data.
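The claim decision described above can be sketched as follows. This is a minimal illustrative model, not the XSI command encoding; the class, method, and field names are invented for this sketch.

```python
# Sketch of the claim decision for a Write Line (or Write) with PUSH.
# A CPU agent claims the transaction only when the PUSH attribute is
# set and the Destination ID (DID) matches its own ID; it then records
# the Sequence ID so it can match the later data transfer.
class CpuAgent:
    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.pending_seq_ids = set()  # Sequence IDs awaiting data

    def observe_command(self, cmd):
        """Return True if this agent claims the transaction."""
        if cmd["push"] and cmd["did"] == self.agent_id:
            self.pending_seq_ids.add(cmd["seq_id"])
            return True
        return False

cpu1 = CpuAgent(agent_id=1)
print(cpu1.observe_command({"push": True, "did": 1, "seq_id": 7}))  # True: DID matches
print(cpu1.observe_command({"push": True, "did": 2, "seq_id": 8}))  # False: targets another agent
```

Recording the Sequence ID at claim time is what later allows the split data phase to be matched back to the right transaction.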
In one embodiment, XSI bus functionality is implemented to enable DMA controller 220 to pull data directly into a cache of a CPU 202. In such an embodiment, DMA controller 220 issues a set of Write Line (and/or Write) with PUSH commands targeting a CPU 202 (e.g., CPU_1). CPU_1 accepts the commands, stores the Sequence IDs, and waits for data.
DMA controller 220 then generates a sequence of Read Line (and/or Read) commands with the same Sequence IDs used during the Write Line (or Write) with PUSH commands. Interface unit 230 claims the Read Line (or Read) commands and generates corresponding commands on the external bus. When data returns from host system 200, interface unit 230 generates corresponding data transfers on the XSI bus. Because the Sequence IDs match, CPU_1 claims the data transfers and stores the data in its local cache.
At processing block 340, DMA controller 220 generates read commands to the XSI Bus with the same Sequence IDs. At processing block 350, external bus interface 230 claims the read command and generates read commands on the external bus. At processing block 360, external bus interface 230 places received data (e.g., SGLs) on the XSI bus. At processing block 370, CPU_1 accepts the data and stores the data in the cache. At processing block 380, DMA controller 220 monitors data transfers on the XSI bus and interrupts CPU_1. At processing block 390, CPU_1 begins processing the SGLs that are already in the cache.
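The sequence of processing blocks above can be sketched end to end as a small simulation. All class, method, and variable names here are invented for illustration; the sketch models only the Sequence ID matching that moves SGL data from the host directly into the CPU's cache, not bus timing or signaling.

```python
# End-to-end sketch of the pull-into-cache flow: the DMA controller
# issues Write-with-PUSH commands, then Reads with the same Sequence
# IDs; the external interface fetches host data, and the CPU claims
# the returning transfers into its cache by matching Sequence IDs.

class Cpu:
    def __init__(self, did):
        self.did = did
        self.pending = set()   # Sequence IDs accepted via PUSH
        self.cache = {}        # seq_id -> cached data line

    def accept_push(self, seq_id, did):
        """Accept a Write-with-PUSH if the DID targets this CPU."""
        if did == self.did:
            self.pending.add(seq_id)

    def claim_data(self, seq_id, data):
        """Claim a data transfer whose Sequence ID was accepted earlier."""
        if seq_id in self.pending:
            self.cache[seq_id] = data
            self.pending.discard(seq_id)

class ExternalInterface:
    def __init__(self, host_memory):
        self.host_memory = host_memory   # models host memory contents

    def read(self, seq_id, addr):
        # Claim the Read, fetch from the host, and return the data
        # transfer tagged with the same Sequence ID.
        return seq_id, self.host_memory[addr]

def dma_pull(cpu, iface, transfers):
    """transfers: list of (seq_id, host_addr) pairs to pull."""
    for seq_id, _ in transfers:          # Write Line with PUSH phase
        cpu.accept_push(seq_id, did=cpu.did)
    for seq_id, addr in transfers:       # Read phase, same Sequence IDs
        sid, data = iface.read(seq_id, addr)
        cpu.claim_data(sid, data)        # CPU stores data in its cache

host = {0x100: "SGL-entry-0", 0x140: "SGL-entry-1"}
cpu1 = Cpu(did=1)
dma_pull(cpu1, ExternalInterface(host), [(10, 0x100), (11, 0x140)])
print(cpu1.cache)  # SGL entries now resident in CPU_1's cache
```

After the loop completes, the SGL entries are resident in the cache without ever touching the I/O processor's local memory, matching the single-transfer property described below.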
The above-described mechanism takes advantage of the PUSH cache capability of a CPU within an I/O processor to move SGLs directly into the CPU's cache. Thus, only one data (SGL) transfer occurs on the internal bus. As a result, traffic on the internal bus is reduced and latency is improved, since the SGLs need not first be moved into a local memory external to the I/O processor.
Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that any particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of various embodiments are not intended to limit the scope of the claims, which in themselves recite only those features regarded as essential to the invention.