1. Technical Field
The present invention relates generally to computer memory, and particularly to computer cache for storing frequently accessed lines.
2. Description of Related Art
Cache refers to an upper level memory used in computers. When selecting memory systems, designers typically must balance performance and speed with cost and other limitations. In order to create the most effective machines possible, multiple types of memory are typically implemented.
In most computer systems, the processor is more likely to request information that has recently been requested. Cache memory, which is faster but smaller than main memory, is used to store instructions used by the processor so that when an address line that is stored in cache is requested, the cache can present the information to the processor faster than if the information must be retrieved from main memory. Hence, cache memories improve performance.
Cache performance is becoming increasingly important in computer systems. Cache hits, which occur when a requested line is held in cache and therefore need not be fetched from main memory, save time and resources in a computer system. Several types of cache have therefore been developed in order to increase the likelihood of consistent cache hits and to reduce misses as much as possible.
Several types of cache have been used in prior art systems. Instruction caches (I-caches) exploit the temporal and spatial locality of storage to permit instruction fetches to be serviced without incurring the delay associated with accessing the instructions from the main memory. However, cache lines that are used frequently, but spaced apart temporally or spatially, may still be evicted from a cache depending on the associativity and size of the cache. On a cache miss, the processor incurs the penalty of fetching the line from main memory, thus reducing the overall performance.
Therefore, it would be advantageous to have a method and apparatus that allows lines to continue to be cached based on the frequency of their use, potentially increasing the overall hit rates of cache memories.
In an example of a preferred embodiment, a cache system for a computer system is provided with a first cache for storing a first plurality of instructions and a second cache for storing a second plurality of instructions, wherein each instruction in the first cache has an associated counter that is incremented when the instruction is accessed. In this embodiment, when the counter reaches a threshold, the related instruction is copied from the first cache into the second cache, where it is retained, without being overwritten, for a longer period than it would be in the first cache.
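The two-cache arrangement described above can be sketched as follows. This is a minimal illustrative model, not the claimed hardware: the class and method names, the dictionary-based cache structures, and the threshold value are all assumptions introduced for clarity.

```python
class TwoLevelInstructionCache:
    """Sketch: a first cache with per-line access counters, and a second
    cache that receives lines whose counters reach a threshold."""

    def __init__(self, threshold=3):  # threshold value is an assumption
        self.first = {}    # address -> [instruction, access counter]
        self.second = {}   # address -> instruction (longer-lived copies)
        self.threshold = threshold

    def load(self, address, instruction):
        # A newly fetched line enters the first cache with a zeroed counter.
        self.first[address] = [instruction, 0]

    def access(self, address):
        # A hit in either cache satisfies the request.
        if address in self.second:
            return self.second[address]
        entry = self.first.get(address)
        if entry is None:
            return None  # miss: the caller would fetch from main memory
        entry[1] += 1    # increment this line's counter
        if entry[1] >= self.threshold:
            # Copy the frequently used line into the second cache,
            # where it survives longer than in the first cache.
            self.second[address] = entry[0]
        return entry[0]
```

For example, with a threshold of 2, a line is copied into the second cache on its second access and thereafter hits there even if the first cache later evicts it.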
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
With reference now to the figures and in particular with reference to
With reference now to
An operating system runs on processor 202 and is used to coordinate and provide control of various components within data processing system 200 in
Those of ordinary skill in the art will appreciate that the hardware in
For example, data processing system 200, if optionally configured as a network computer, may not include SCSI host bus adapter 212, hard disk drive 226, tape drive 228, and CD-ROM 230, as noted by dotted line 232 in
The depicted example in
The processes of the present invention are performed by processor 202 using computer implemented instructions, which may be located in a memory such as, for example, main memory 204, memory 224, or in one or more peripheral devices 226-230.
The present invention teaches an innovative cache system for a computer system, for example, the system shown in
In an example of a preferred embodiment, a Dynamic Frequent Instruction cache (DFI-cache) is implemented with a main cache, for example, an instruction cache (I-cache). Both caches are queried simultaneously, so that if a line is held in either cache, the query will result in a hit.
In one embodiment, each line in the main cache is outfitted with an associated counter that increments whenever that address line is accessed. When the counter reaches a certain number, the line is removed from the main cache and placed in the DFI-cache. The DFI-cache thereby holds more frequently accessed lines longer than the main cache.
In another embodiment, the main cache is supplemented with a hardware counter that tracks the most frequently referenced lines. When an item is to be eliminated from the main cache, the highest counter value determines which line is removed. The removed line is preferably moved into a DFI-cache, so that more frequently accessed lines remain in cache longer.
When a counter reaches a predetermined threshold (or a variable threshold, depending on implementation), the line is removed from I-cache 302 and placed in DFI-cache 300, for example, line 304A. In this example, the DFI-cache is fully associative and follows a LRU (Least Recently Used) policy for determining which line to overwrite when a new line is to be added.
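The fully associative, LRU-managed DFI-cache described above can be sketched as follows. The capacity and the `OrderedDict`-based structure are illustrative assumptions; only the policy (evict the least recently used line when full) comes from the text.

```python
from collections import OrderedDict

class LRUDFICache:
    """Sketch of a fully associative cache with LRU replacement."""

    def __init__(self, capacity=4):  # capacity is an assumption
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> instruction, LRU first

    def insert(self, address, instruction):
        if address in self.lines:
            self.lines.move_to_end(address)
        elif len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[address] = instruction

    def lookup(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return self.lines[address]
        return None
```

A lookup refreshes a line's recency, so a line that keeps hitting in the DFI-cache is never the one chosen for overwrite.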
The DFI-cache is an additional cache that stores instruction lines that have been determined to be frequent, for example, by the associated counter reaching a threshold. In the above example, the I-cache and DFI-cache are preferably queried simultaneously. If the sought instruction is in either the DFI-cache or the I-cache, a hit results. The DFI will be updated when an instruction is determined to be frequent.
Though the present invention describes “frequent” instruction lines by the associated counter reaching a threshold, other methods of designating an instruction as “frequent” also can be implemented within the scope of the present invention.
The DFI keeps frequently used lines in cache longer than the normal instruction cache, so that the normal instruction cache can invalidate lines, including lines deemed frequently used. When such an instruction is requested again, it is found in the DFI cache, and a hit results. Hence, by the mechanism of the present invention, some cache misses are turned into hits. When a line is found in the DFI-cache only, that line may be retained only in the DFI-cache, it may be deleted from the DFI-cache and copied into the main cache, or it may be kept in the DFI-cache and copied into the main cache. In preferred embodiments, frequency of accessed lines in the DFI-cache is also measured by using a counter. For example, an algorithm that keeps track of the last “X” accesses can be used, where “X” is a predetermined number. Other methods of keeping track of the frequency of accessed lines in the DFI-cache also can be implemented within the scope of the present invention. The DFI cache can be organized as a direct mapped or set associative cache, and the size is preferably chosen to provide the needed tradeoff between space and performance.
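The "last X accesses" bookkeeping mentioned above can be sketched with a sliding window. The window size, the deque-based structure, and the method names are assumptions; the text specifies only that frequency is measured over the last X accesses.

```python
from collections import deque

class LastXFrequencyTracker:
    """Sketch: counts how often each line appears among the last X accesses."""

    def __init__(self, window=8):  # X, the window size, is an assumption
        self.window = window
        self.recent = deque()      # addresses of the last X accesses
        self.counts = {}           # address -> occurrences within the window

    def record(self, address):
        self.recent.append(address)
        self.counts[address] = self.counts.get(address, 0) + 1
        if len(self.recent) > self.window:
            # The oldest access slides out of the window.
            old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def frequency(self, address):
        return self.counts.get(address, 0)
```

Because old accesses age out of the window, a line that was hot long ago but is no longer referenced eventually reads as infrequent, unlike a monotonically incrementing counter.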
In another illustrative embodiment, not only is a counter, such as counter 308A associated with a frequently used cache line 306A of I-cache 302 incremented when that cache line 306A is accessed, but the counters associated with other cache lines in the I-cache 302 are decremented. When a line is to be replaced in the I-cache, the line with the lowest counter number is chosen to be replaced.
This process allows the DFI cache 300 to hold lines with higher counter values, which are accessed more frequently, for a longer time.
This example process describes a main cache, comparable to cache 302 of
If the counter does exceed the threshold and the auxiliary cache is full, an entry in the auxiliary cache is selected to be replaced (step 410). If the auxiliary cache is not full, or after an entry in the auxiliary cache has been selected to be replaced, the cache line is moved from the main cache to the auxiliary cache (step 412). Note that this includes removal of the cache line from the main cache. Next, the data is accessed from the auxiliary cache (step 414) and the process ends.
If the memory address sought in step 402 is not found in the main cache, the auxiliary cache is checked (step 418). If the address is found there, the process moves to step 414 and the data is accessed from the auxiliary cache. If the memory address is not in the auxiliary cache, the main cache is checked to see whether it is full (step 420). If the main cache is full, an entry in the main cache is selected to be replaced (step 422), the data is moved into the main cache from the main memory of the computer system (step 424), and the process ends. If the main cache is not full, the data is moved from main memory into the main cache (step 424) and the process ends.
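The lookup flow of steps 402 through 424 can be sketched as a single function. The dict-based caches are assumptions, and the victim-selection here is deliberately arbitrary (popping an unspecified entry), since the flow, not the replacement policy, is what these steps describe.

```python
def lookup(address, main_cache, aux_cache, main_memory,
           threshold, main_capacity, aux_capacity):
    """Sketch of the flowchart: main-cache hit with counter check,
    then aux-cache hit, then fill from main memory on a full miss."""
    if address in main_cache:                          # hit in main cache
        data, count = main_cache[address]
        count += 1                                     # bump the counter
        if count <= threshold:
            main_cache[address] = (data, count)
            return data
        # Counter exceeded the threshold: move the line to the aux cache.
        if len(aux_cache) >= aux_capacity:
            aux_cache.pop(next(iter(aux_cache)))       # select a victim
        del main_cache[address]                        # remove from main cache
        aux_cache[address] = data
        return data
    if address in aux_cache:                           # hit in aux cache
        return aux_cache[address]
    # Miss in both caches: fill the main cache from main memory.
    if len(main_cache) >= main_capacity:
        main_cache.pop(next(iter(main_cache)))         # select a victim
    data = main_memory[address]
    main_cache[address] = (data, 0)
    return data
```

Repeated lookups of the same address walk the flowchart end to end: the first call fills the main cache, later calls increment the counter, and once the threshold is exceeded the line migrates to the auxiliary cache.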
The process starts with a memory request being received at cache (step 502). In this example, the cache described is comparable to main cache 302 of
If the desired address is not in the cache (step 504), then the cache line with the lowest counter is chosen to be replaced (step 512). The chosen cache line is then replaced with new data (step 514). The process then ends.
In this embodiment, the main cache is the same as described in
This hardware counter of the present invention is useful, for example, in determining which address from the main cache should be moved into the auxiliary cache, and for determining which line in auxiliary cache (e.g., the DFI-cache) to remove when the cache is full. For example, in one embodiment, main cache is I-cache, and auxiliary cache is DFI-cache. The addresses in I-cache each have a position in hardware counter 600. When I-cache is full and must add another address line, it must expel an address line to make room. The determination as to which line in I-cache to expel (and preferably to write into DFI-cache) is made with reference to hardware counter 600. In preferred embodiments, the most frequently accessed lines are removed from I-cache, i.e., those appearing highest in the hardware counter stack 600. In this example, if a new address were to be added, it would be added at the bottom of the stack, while address 1 would be removed entirely from the I-cache and placed in the DFI-cache. When an item is removed from I-cache, it is also removed from the counter 600.
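The ordering behavior of hardware counter 600 can be sketched in software as a list kept sorted by reference count. The list representation and method names are assumptions; the hardware analogy is only that the most referenced address sits at the top, new addresses enter at the bottom, and the top entry is the one expelled to the DFI-cache.

```python
class ReferenceStack:
    """Sketch of counter 600: addresses ordered by reference count,
    most frequently referenced first."""

    def __init__(self):
        self.entries = []   # (count, address), most referenced at index 0

    def reference(self, address):
        for i, (count, addr) in enumerate(self.entries):
            if addr == address:
                self.entries[i] = (count + 1, addr)
                break
        else:
            self.entries.append((1, address))  # new lines enter at the bottom
        # Stable sort keeps ties in their existing order.
        self.entries.sort(key=lambda e: -e[0])

    def pop_most_referenced(self):
        # Called when the I-cache must expel a line: the topmost (most
        # frequently referenced) address is removed, e.g. into the DFI-cache.
        count, address = self.entries.pop(0)
        return address
```

A real hardware counter would maintain this ordering with comparators rather than a sort; the sketch only reproduces the observable behavior.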
In another embodiment, each line in the DFI-cache has a counter associated with it. When a line in the DFI-cache is hit, the values of counters associated with the other lines in the DFI-cache are decremented. Thus, more frequently used lines in the DFI-cache have a higher value than less frequently used lines. When a line is to be replaced in the DFI-cache, the line with the lowest counter number is replaced. In this way, the DFI-cache holds frequently used lines longer than the I-cache.
When removing lines from the DFI-cache, other methods can be used to determine what line to remove. In some cases, removing the line with the fewest hits may be undesirable or inefficient. Known algorithms for page replacement can be implemented in the context of the present invention, such as a working set replacement algorithm. Such an algorithm uses a moving window in time, and pages or lines not referred to within the specified time are removed from the working set or cache.
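The working-set idea can be sketched as a trimming step over last-reference timestamps. The interface is an assumption (a mapping from line to last reference time, a current time, and a window length); the text specifies only that lines unreferenced within the moving window drop out.

```python
def working_set(reference_times, now, window):
    """Sketch: keep lines referenced within the last `window` time units;
    lines last referenced before (now - window) fall out of the set."""
    kept = {line: t for line, t in reference_times.items()
            if now - t <= window}
    evicted = sorted(set(reference_times) - set(kept))
    return kept, evicted
```

Unlike a pure hit-count policy, this removes lines by recency within a time window, so a line with many old hits but no recent ones is still evicted.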
It is important to note that while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and a variety of forms and that the present invention applies equally regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media, such as a floppy disk, a hard disk drive, a RAM, CD-ROMs, DVD-ROMs, and transmission-type media, such as digital and analog communications links, wired or wireless communications links using transmission forms, such as, for example, radio frequency and light wave transmissions. The computer readable media may take the form of coded formats that are decoded for actual use in a particular data processing system.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.