The present invention relates generally to reducing memory fetch latency and, more particularly, to methods and apparatus for reducing memory fetch latency using a next fetch hint.
In a typical bus-based computer system, one or more processors may be connected to a memory controller. The one or more processors and the memory controller may be connected with shared or point-to-point buses. That is, generally speaking, a processor may be connected to a memory controller via a processor bus.
Internal processor frequencies are commonly reaching 2 GHz, with some running over 5 GHz. However, due to electrical limitations, it is not possible to run the interface (i.e., a processor bus) between a processor and a memory controller at such speeds. For example, for a non-serial processor bus, a data rate of 1000 MT/s is approaching the limit of what can be signaled. As such, the processor bus can be a bottleneck in bandwidth-intensive applications, such as STREAM, SPECfp/SPECint, or SPECjbb.
Due to the rate of signaling for data returns, the rate at which commands may be issued on a processor bus may be limited. For instance, on a quad-pumped processor bus, a request may be issued only once every two cycles, so that, when reading from memory, the request rate cannot outrun the maximum rate at which data can be returned.
Requests generated internally by a processor may therefore queue up inside the processor, waiting for their turn to gain access to the processor bus. Work has been done in the past to prioritize prefetch reads relative to demand reads, but given how fast processor cores are becoming, by the time a prefetch read reaches a processor bus queue, it may have morphed into a demand read, and any delay by the memory controller in processing the read may impact system performance.
In a first aspect of the invention, a processor may be provided. The processor may include logic, coupled to the processor, adapted to issue a currently issued memory fetch over a processor bus. The currently issued memory fetch may include a next fetch hint including information about a next memory fetch.
In a second aspect of the invention, a memory controller may be provided. The memory controller may include logic, coupled to the memory controller, adapted to receive a currently issued memory fetch. The currently issued memory fetch may include a next fetch hint including information about a next memory fetch. The memory controller may begin a memory access corresponding to the next memory fetch before the next memory fetch is received by the memory controller.
In a third aspect of the invention, a system may be provided. The system may include a processor, a memory controller, a processor bus to connect the processor to the memory controller, and logic. The logic may be coupled to the processor, and may issue a currently issued memory fetch from the processor to the memory controller over the processor bus. The currently issued memory fetch may include a next fetch hint including information about a next memory fetch.
In a fourth aspect of the invention, a method may be provided. The method may include issuing a currently issued memory fetch from a processor to a memory controller over a processor bus. The currently issued memory fetch may include a next fetch hint including information about a next memory fetch.
Other features and aspects of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings.
What is needed is a method that allows a memory controller, in effect, to view a processor bus queue, so that it may begin processing a memory fetch before that fetch is issued on the processor bus. An embodiment of the present invention may provide a method for a processor to communicate, as part of a currently issued memory fetch (i.e., bus request), information about a next memory fetch it may issue. This may allow a memory controller to begin the next memory fetch while the next memory fetch is still in the processor bus queue, prior to its issuance on the processor bus. When the next memory fetch is then issued, a memory access (e.g., a DRAM access) will already have commenced, and the data may be returned with reduced latency. The information about the next memory fetch may be referred to as a next fetch hint.
In an embodiment, the processor bus 106 may be a quad-pumped data bus. On a quad-pumped data bus, bus requests 200 may be issued once every other cycle, and may queue up inside the processor bus queue 108, waiting for their time slice on the processor bus 106. The presence of other requesters on the processor bus 106 may cause further queuing within the processor bus queue 108.
In an embodiment, the processor 102 may examine a next queued request (e.g., a next memory fetch) in the processor bus queue 108, and provide a next fetch hint 210 as part of a currently issued memory fetch (i.e., bus request 200). The next fetch hint 210 may indicate the address of the next memory fetch.
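A minimal sketch of this processor-side behavior, expressed in C, might look as follows. All structure and function names (bus_request, bus_queue, issue_with_hint) and the queue depth are hypothetical, and the hint here carries the full address of the next memory fetch; the two-bit encoding discussed below is a more compact alternative.

    #include <stdbool.h>
    #include <stdint.h>

    #define QUEUE_DEPTH 16  /* assumed processor bus queue capacity */

    struct bus_request {
        uint64_t addr;        /* fetch address */
        bool     hint_valid;  /* is a next fetch hint present? */
        uint64_t hint_addr;   /* address of the next memory fetch */
    };

    struct bus_queue {
        struct bus_request entries[QUEUE_DEPTH];
        unsigned head;        /* index of the oldest queued request */
        unsigned count;       /* number of queued requests */
    };

    /* Dequeue the request at the head of the processor bus queue and,
     * if another request is queued behind it, attach that request's
     * address as the next fetch hint before the request goes out on
     * the bus. */
    struct bus_request issue_with_hint(struct bus_queue *q)
    {
        struct bus_request cur = q->entries[q->head];

        if (q->count > 1) {
            const struct bus_request *next =
                &q->entries[(q->head + 1) % QUEUE_DEPTH];
            cur.hint_valid = true;
            cur.hint_addr  = next->addr;
        } else {
            cur.hint_valid = false;
        }

        q->head = (q->head + 1) % QUEUE_DEPTH;
        q->count--;
        return cur;
    }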
The operation of the bus-based system 100 is now described with reference to the accompanying drawings.
In an embodiment, to take advantage of streaming applications, or of the “adjacent sector” prefetch behavior of the processor 102, the next fetch hint may be limited to a small set of possible next fetches. For example, if two bits of the request phase 202 were used as the next fetch hint 210, the possible combinations could be (assuming a 64 B cacheline): 00: no next fetch hint; 01: the next bus request may be to the following 64 B cacheline; 10: the next bus request may be to the cacheline 128 B ahead; and 11: the next bus request may be to the previous 64 B cacheline.
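As a concrete illustration, a sketch of this two-bit scheme in C might look as follows; the enum and function names are hypothetical, and a 64 B cacheline is assumed, per the example above.

    #include <stdint.h>

    #define CACHELINE 64u  /* assumed 64 B cacheline */

    enum next_fetch_hint {
        HINT_NONE     = 0x0,  /* 00: no next fetch hint         */
        HINT_NEXT_64  = 0x1,  /* 01: following 64 B cacheline   */
        HINT_NEXT_128 = 0x2,  /* 10: cacheline 128 B ahead      */
        HINT_PREV_64  = 0x3   /* 11: previous 64 B cacheline    */
    };

    /* Map the next queued fetch address onto one of the four
     * encodings; any stride the scheme does not cover falls back to
     * "no hint". */
    enum next_fetch_hint encode_hint(uint64_t cur, uint64_t next)
    {
        if (next == cur + CACHELINE)     return HINT_NEXT_64;
        if (next == cur + 2 * CACHELINE) return HINT_NEXT_128;
        if (next == cur - CACHELINE)     return HINT_PREV_64;
        return HINT_NONE;
    }

    /* Inverse mapping, usable by the memory controller: derive the
     * predicted next address from the current request address and the
     * hint bits. */
    uint64_t decode_hint(uint64_t cur, enum next_fetch_hint h)
    {
        switch (h) {
        case HINT_NEXT_64:  return cur + CACHELINE;
        case HINT_NEXT_128: return cur + 2 * CACHELINE;
        case HINT_PREV_64:  return cur - CACHELINE;
        default:            return cur;  /* HINT_NONE: no prediction */
        }
    }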
The memory controller 104 may use the next fetch hint 210 to manipulate the address of the current bus request 200, and issue a request for the resulting address to memory before the processor 102 actually issues its next request (e.g., the next memory fetch). Then, when the processor 102 does issue that request, the request may be matched with the already in-flight memory (e.g., DRAM) access, resulting in a lower latency for the second request.
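Building on the hint encoding and the decode_hint() routine sketched above, the controller-side flow might be sketched as follows. The in-flight tracking structure, its depth, and all names are hypothetical, and the DRAM command issue itself is elided.

    #include <stdbool.h>
    #include <stdint.h>

    /* enum next_fetch_hint and decode_hint() are from the previous
     * sketch. */

    #define MAX_INFLIGHT 8  /* assumed in-flight access capacity */

    struct inflight { uint64_t addr; bool valid; bool speculative; };

    static struct inflight inflight[MAX_INFLIGHT];

    static void start_dram_access(uint64_t addr, bool speculative)
    {
        for (unsigned i = 0; i < MAX_INFLIGHT; i++) {
            if (!inflight[i].valid) {
                inflight[i] =
                    (struct inflight){ addr, true, speculative };
                /* ... issue the activate/read to DRAM here ... */
                return;
            }
        }
        /* No free slot: drop the access if it was speculative; a hint
         * is a performance optimization, never required for
         * correctness. */
    }

    /* Handle one bus request arriving from the processor. */
    void on_bus_request(uint64_t addr, enum next_fetch_hint hint)
    {
        bool matched = false;

        /* First, try to match this demand request against an access
         * that an earlier request's hint already set in motion. */
        for (unsigned i = 0; i < MAX_INFLIGHT; i++) {
            if (inflight[i].valid && inflight[i].addr == addr) {
                inflight[i].speculative = false;  /* now a demand read */
                matched = true;
                break;
            }
        }
        if (!matched)
            start_dram_access(addr, false);

        /* Then use the hint, if any, to begin the next access early. */
        if (hint != HINT_NONE)
            start_dram_access(decode_hint(addr, hint), true);
    }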
The foregoing description discloses only exemplary embodiments of the invention. Modifications of the above-disclosed embodiments of the present invention which fall within the scope of the invention will be readily apparent to those of ordinary skill in the art. For instance, although embodiments are described with reference to environments including a processor bus, in alternative embodiments, environments may include a processor bus interface and/or a network protocol. Further, although the next fetch hint 210 is described as two bits of the request phase 202, a larger or smaller number of bits could be used. Similarly, a larger or smaller number of possible next fetch hints could be supported.
Accordingly, while the present invention has been disclosed in connection with exemplary embodiments thereof, it should be understood that other embodiments may fall within the spirit and scope of the invention as defined by the following claims.