Many computer systems use a prefetch instruction to transfer lines of instructions or data from memory to cache. One disadvantage of using the prefetch instruction is that the processor may not immediately use a transferred line, so the transfer consumes time and cache bandwidth that could be used for other operations. Furthermore, the transferred line may replace a line currently being used by the processor.
Another disadvantage of using the prefetch instruction arises when the processor requires multiple lines, each needing its own prefetch. Code space may be wasted on prefetch instructions, and issuing each line transfer at the correct time adds complexity to a program.
Prefetching may also be done in hardware by automatically transferring lines that are likely to be used by the processor. A disadvantage of hardware prefetching is that it may bring more lines into the cache than the processor needs, thereby increasing the cache access time and reducing the advantage of caching.
Limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the present application with reference to the drawings.
Aspects of the present invention may be found in a computer system with cache storage, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.
These and other advantages, aspects and novel features of the present invention, as well as details of an illustrated embodiment thereof, will be more fully understood from the following description and drawings.
Aspects of the present invention relate to performing a prefetch operation. Systems and methods to support programmable prefetching of one or more lines of instructions or data into cache storage of a computer system are disclosed. A secondary cache is used to avoid the transfer of a line that is currently being used by the processor. Sequential prefetching is made possible by presetting control registers.
The processor 101 may execute a prefetch instruction 111. In accordance with the prefetch instruction 111 and a plurality of preset control registers 109, a current line may be requested 113 by the secondary cache 105 and may be transferred 115 from the memory 107 during program execution. The current line may be stored by address as one of a plurality of addressed lines in the secondary cache 105.
The processor 101 may access consecutive memory locations sequentially. For example, the current line and the next line may be sequentially addressed. The processor 101 may execute one prefetch instruction 111 for the current line and the next line may be refilled automatically in the background. This may reduce memory latency, and therefore, applications with a significant amount of data streaming can be executed more efficiently.
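Sequential addressing of the current line and the next line can be illustrated with a short sketch; the 64-byte line size and the addresses are hypothetical values chosen for illustration, not taken from the specification:

```python
LINE_SIZE = 64  # hypothetical cache line size in bytes

def line_address(addr):
    # Align an address down to the start of its cache line.
    return addr & ~(LINE_SIZE - 1)

def next_line_address(addr):
    # The sequentially addressed "next line" immediately follows
    # the current line, so one prefetch of the current line lets
    # the next line be refilled automatically in the background.
    return line_address(addr) + LINE_SIZE

print(hex(next_line_address(0x1004)))  # → 0x1040 (current line is 0x1000)
```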
Line transfer may be controlled by software or hardware. A software prefetch may use the prefetch instruction 111 to specify and transfer the line. The hardware prefetch may automatically transfer lines likely to be used by the processor.
Checking and transferring a sequentially addressed line may be repeated according to one or more field(s) in a control register 109. For example, there may be two fields, SPF and HPF, to control the software and hardware prefetching respectively. Alternatively, prefetching instructions (I) may be distinguished from prefetching data (D) by using 4 fields, SPF_I, SPF_D, HPF_I, and HPF_D.
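As a sketch of the four-field alternative, the fields SPF_I, SPF_D, HPF_I, and HPF_D might be packed as single-bit enables in the control register 109; the bit positions and one-bit widths below are assumptions for illustration, not defined by the specification:

```python
# Hypothetical bit layout for control register 109: one enable bit
# per field (the specification does not fix these positions).
SPF_I = 1 << 0  # software prefetch of instructions
SPF_D = 1 << 1  # software prefetch of data
HPF_I = 1 << 2  # hardware prefetch of instructions
HPF_D = 1 << 3  # hardware prefetch of data

def set_fields(reg, *fields):
    # Enable the given fields in a control-register value.
    for f in fields:
        reg |= f
    return reg

def field_enabled(reg, field):
    return bool(reg & field)

ctrl = set_fields(0, SPF_D, HPF_D)  # enable data prefetching only
print(field_enabled(ctrl, HPF_D))  # True
print(field_enabled(ctrl, HPF_I))  # False
```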
The SPF field may be set according to a required prefetch duration. Table 1 defines exemplary settings for SPF.
When the hardware prefetch is used and the secondary cache 105 is accessed, a line request 113 is sent to memory 107 and the requested line is transferred 115 from memory 107 to the secondary cache 105. The HPF field may be set according to a required prefetch duration. Table 2 defines exemplary settings for HPF.
A line may be stored in the secondary cache 105 with an associated bit, pf_next, that indicates a line request 113. The associated bit, pf_next, may be set according to a field in the control register 109. The value of pf_next may be set when the addressed line is transferred into the secondary cache 105 and may be based on SPF_I, HPF_I, SPF_D, and HPF_D for instruction requests and data requests, respectively. Table 3 defines exemplary settings for pf_next.
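The selection of fields for setting pf_next can be sketched as follows; the exact encoding is given by Table 3, so the simple OR used here is an illustrative assumption, as is treating each field as a boolean enable:

```python
def compute_pf_next(is_instruction, spf_i, hpf_i, spf_d, hpf_d):
    # pf_next marks a line whose sequential successor should be
    # requested. Instruction requests consult SPF_I/HPF_I and data
    # requests consult SPF_D/HPF_D, as described in the text; the
    # OR combination below is a hypothetical simplification of
    # Table 3, not its exact contents.
    if is_instruction:
        return bool(spf_i or hpf_i)
    return bool(spf_d or hpf_d)
```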
The execution of a program may require a line from memory. That line may be requested by address from primary cache 303. If the requested line is in primary cache 305a, the processor may access the requested line from primary cache 307 without requiring a memory transfer that may introduce delay. If a cache miss occurs at primary cache 305b, secondary cache is searched by address for the requested line 309. If a cache miss occurs at secondary cache 309a, the requested line is transferred from memory to primary cache 311, where the processor may access the requested line from primary cache 307.
If the address of the requested line is found in secondary cache 309b, the requested line is transferred from secondary cache to primary cache 313. After transferring the requested line from secondary cache to primary cache, secondary cache may be checked for a next line, and if the next line is not in secondary cache, the next line may be transferred from memory to secondary cache. The next line becomes another of the plurality of addressed lines in secondary cache.
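The lookup and next-line refill flow described above can be modeled in a few lines; the two-set cache model, the 64-byte line size, and the transfer counter are hypothetical simplifications for illustration, not the patented implementation:

```python
LINE_SIZE = 64  # hypothetical line size in bytes

class CacheSystem:
    # Simplified model: each cache is a set of resident line addresses.
    def __init__(self):
        self.primary = set()
        self.secondary = set()
        self.memory_transfers = 0

    def _fetch_from_memory(self, line):
        self.memory_transfers += 1
        return line

    def access(self, addr):
        line = addr & ~(LINE_SIZE - 1)
        if line in self.primary:
            return line                    # primary hit: no transfer
        if line in self.secondary:
            self.primary.add(line)         # secondary -> primary transfer
            nxt = line + LINE_SIZE         # check secondary for next line
            if nxt not in self.secondary:
                self.secondary.add(self._fetch_from_memory(nxt))
            return line
        # Miss in both caches: transfer from memory to primary cache.
        self.primary.add(self._fetch_from_memory(line))
        return line
```

For example, if the line at 0x1000 is already in the secondary cache, an access to 0x1008 moves it to the primary cache and causes the next line at 0x1040 to be fetched into the secondary cache in the background.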
The present invention is not limited to the particular aspects described. Variations of the examples provided above may be applied to a variety of processors without departing from the spirit and scope of the present invention.
Accordingly, the present invention may be realized in hardware, software, or a combination of hardware and software. The present invention may be realized in a centralized fashion in an integrated circuit or in a distributed fashion where different elements are spread across several circuits. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
The present invention may also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
While the present invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. Therefore, it is intended that the present invention not be limited to the particular embodiment disclosed, but that the present invention will include all embodiments falling within the scope of the appended claims.
The same mechanism can be applied to i) level-0 cache and level-1 cache (primary cache in this embodiment), ii) level-2 cache (also called secondary cache here) and level-3 cache (also called tertiary cache).
Number | Date | Country
---|---|---
60667481 | Apr 2005 | US