The invention relates in general to microprocessors, and in particular to replacement strategies in least-recently-used (LRU) approaches such as LRU caches.
Many different processor architectures are known in the art. State-of-the-art processors typically make use of caches to improve memory access. Cache is the name given to the first level of the memory hierarchy encountered after a processing unit. The processing unit can be a central processing unit (CPU). However, since the concept of improving performance by means of a cache mechanism is very popular, the term cache is generally applied whenever buffering is employed to locally store commonly reused items. Other examples of caches are file caches or name caches. For example, a cache is used to buffer items of memories on lower levels. Such memories on lower levels can be a main memory or even a disk storage.
Real caches have thousands of block frames and real memories can have billions of blocks. The set associative cache 30 of
Set associative caches are commonly used in processor architectures. It should be noted in
A common and simple strategy to find a block frame within a set when blocks are written to a cache is the first-in first-out (FIFO) approach. The block that was written first (the oldest block) is overwritten when a new block goes into the set. The following example is offered as illustrative for further understanding of this concept. A write pointer can mark the position of the block frame within a set where the next block goes. Once the block frame has been written, the pointer is incremented. The pointer is reset to the beginning of the set when it exceeds the end of the set. Such an approach is easy to implement. However, this approach is not an optimal strategy, as used block frames are overwritten regardless of how often they are queried.
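The write-pointer behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not part of the disclosed apparatus; the class name `FifoSet` and its attributes are hypothetical names chosen for the sketch.

```python
class FifoSet:
    """One cache set with FIFO replacement driven by a wrapping write pointer.

    Hypothetical illustration; names are not taken from the disclosure.
    """

    def __init__(self, num_frames):
        self.frames = [None] * num_frames  # block frames of the set
        self.write_ptr = 0                 # frame that receives the next block

    def write(self, block):
        """Store a block at the write pointer and return the block it replaced."""
        victim = self.frames[self.write_ptr]
        self.frames[self.write_ptr] = block
        self.write_ptr += 1                      # advance after each write
        if self.write_ptr >= len(self.frames):   # past the end of the set
            self.write_ptr = 0                   # wrap to the beginning
        return victim


fifo = FifoSet(4)
for blk in ["A", "B", "C", "D"]:
    fifo.write(blk)
evicted = fifo.write("E")  # replaces the oldest block, however often it was queried
```

In this sketch the fifth write evicts the block written first, which shows the weakness noted above: the pointer carries no information about how often a block frame is queried.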
A better strategy can be the least-frequently-used (LFU) approach. The block frame of a set which has been queried the least is overwritten. However, the LFU approach is not adequate when block frames with a high number of queries in a set have not been used for a long time. Such block frames keep their high counts, so the LFU approach requires additional concepts to allow block frames with a high number of queries to be selected for writing, which can make it very expensive.
Another good strategy is the least-recently-used (LRU) approach. The block frame of a set that has been used least recently is selected for writing. This approach is easier to implement than the LFU approach and, hence, is applied more often. The strategy of selecting the block frame that is to be overwritten next is called a replacement strategy. A new kind of apparatus and method to implement LRU replacement strategies is within the scope of the present invention.
Other replacement strategies include, for example, “random,” in which the block frame to be replaced is randomly selected, and “clock,” which uses a sequential approach that queries a status bit to determine the block to be selected for replacement.
Caches are just one example, albeit a very striking one, of applications for LRU replacement strategies. Caches have to be very fast, and the logic elements that implement a replacement strategy in a set have to be small in order to keep the area of the whole cache small. Therefore, there is a need for a high-performance, small-size implementation of LRU replacement strategies that provides a very simple circuit and mechanism to select the block frame to be replaced.
In an exemplary embodiment, the present invention is an electronic system to implement a replacement strategy. The system includes a set of N blocks and a set of N priority modules. Each of the set of N blocks is capable of storing at least one value and each of the N priority modules is electrically coupled to a select one of the set of N blocks. Each of the set of N priority modules includes a priority level register configured to store a priority level value where the priority level is an integer within a range of 0 to N−1, an incrementor configured to generate a next higher priority level value, an equal comparator configured to compare the priority level value with a reference value and generate an equal signal when the priority level value and the reference value are equal, the reference value being an integer from 0 to N−1. Each of the set of N priority modules further includes a second comparator configured to compare the priority level value with the reference value and generate a second signal when the priority level value is greater than the reference value and a logic circuit configured to load the priority level register, the logic circuit further configured to be responsive to the equal signal and the second signal.
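The behavior of a single priority module can be modeled in software as follows. This is a hedged sketch, not the circuit itself: the function name `next_priority_level` is hypothetical, and the choice to leave a priority level unchanged when it is greater than the reference value is an assumption inferred from the description of the equal signal and the second signal.

```python
def next_priority_level(pl, ref, n):
    """Return the value loaded into the priority level register.

    pl  : current priority level value, an integer in 0..n-1
    ref : reference value, an integer in 0..n-1
    n   : number of blocks (and priority modules) in the set
    """
    if not (0 <= pl < n and 0 <= ref < n):
        raise ValueError("priority level and reference must lie in 0..n-1")

    incremented = pl + 1     # incrementor: next higher priority level value
    equal = (pl == ref)      # equal comparator output
    greater = (pl > ref)     # second comparator output

    # Logic circuit (assumed behavior): reset on equal, hold on greater,
    # otherwise load the incremented value.
    if equal:
        return 0
    if greater:
        return pl
    return incremented       # pl < ref, so the result never exceeds n-1


levels = [0, 1, 2, 3]
updated = [next_priority_level(pl, 2, 4) for pl in levels]
```

Applying the same reference value to all N modules at once keeps the priority levels distinct in this model: the module that matches the reference drops to 0, and every module below it moves up by one.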
In another exemplary embodiment, the present invention is a method of reading a block from a set of N blocks in a data processing environment. The method includes storing a select one of a plurality of priority level values in each of a set of N priority modules, determining whether a selected block in the set of N blocks is available using an address of the block, determining a current priority level value where the current priority level value is a priority level of the selected block to be read, reading the selected block, resetting the current priority level to zero, and incrementing to a next higher priority level each priority level of a set of N priority level registers that is lower than a reference value.
In another exemplary embodiment, the present invention is a method of replacing a current block in a set of N blocks with a new block in the set of N blocks in a data processing environment. The method includes storing one of a plurality of priority level values in each of a plurality of priority modules, determining whether the current block has a priority level value of N−1, overwriting the current block with the new block, resetting the priority level value assigned to the current block to zero, and incrementing each priority level of the set of N priority level registers to a next higher priority level except for the priority level assigned to the current block.
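The replacement method above can be illustrated with a short Python sketch. It assumes that the N priority level values are distinct and cover 0 to N−1, so that exactly one block holds N−1; the function name `replace_block` and the list-based representation are hypothetical and serve only as an illustration.

```python
def replace_block(blocks, levels, new_block):
    """Overwrite the block whose priority level is N-1 and update all levels.

    blocks : list of N stored blocks
    levels : list of N priority level values (assumed distinct, 0..N-1)
    Returns the index of the block frame that was overwritten.
    """
    n = len(blocks)
    victim = levels.index(n - 1)   # block frame marked for replacement
    blocks[victim] = new_block     # overwrite the current block
    for i in range(n):
        if i == victim:
            levels[i] = 0          # reset the replaced block's level to zero
        else:
            levels[i] += 1         # every other level moves one step higher
    return victim


blocks = ["A", "B", "C", "D"]
levels = [2, 0, 3, 1]
victim = replace_block(blocks, levels, "E")
```

Because every other level is below N−1 before the update, incrementing them cannot overflow in this model, and the levels remain distinct afterward.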
The appended drawings illustrate exemplary embodiments of the present invention only and, therefore, are not to be considered as limiting the scope of the present invention.
A method and apparatus for replacement strategies are disclosed herein. An exemplary embodiment of the replacement strategy presented in this disclosure is a replacement strategy in set associative caches. However, the apparatus and method presented herein can be used in various applications where easy implementations for replacement strategies are desired. The method and apparatus store a priority level for each block frame to determine which block frame is to be selected for replacement. A priority level of N−1 marks a block frame to be replaced, and a priority level of 0 is assigned to a block frame when the block is queried. The entire logic is small and allows implementation in area-critical applications.
With reference to
Using a third multiplexer 113, a signal OW can be used to decide whether the PL register 101 is loaded with the subsequent PL value on the second multiplexer output line 151 or with a reset value “ext” applied to the priority module 100.
When a plurality of priority modules 100 are used in a circuit to implement a replacement strategy, each of the plurality of modules 100 holds a different value at each clock cycle and, hence, has to be reset with a different value at reset time.
Each of the plurality of memories 201 can hold a block. When a certain block has to be written to a select one of the plurality of memories 201 in the current set, the input data (the block) are applied to each of the plurality of memories 201 in parallel and a write signal 261 is set to true. A logic circuit 211 prevents both a read and a write signal being applied simultaneously and sets the write signal 263. The set write signal 263 then enables that one of the plurality of AND gates 205 which receives a true signal from one of the plurality of comparators 203 as described above. The enabled AND gate of the plurality of AND gates 205 sends a write enable signal (wen) to the corresponding one of the plurality of memories 201. Thus, when a write signal 261 is set the input data 255 are stored in one of the plurality of memories 201 that is marked by the corresponding one of the plurality of priority modules 100. A priority module 100 marks its corresponding memory 201 when the PL value which is stored in that priority module 100 has the maximum PL value, which is 3 in the case of the embodiment shown in
When a block has to be read from the implementation of a set of a four-way set associative cache shown in
The applied reference value causes the priority module 100 that exactly has that PL value to reset its PL value to zero and the remaining priority modules 100 which have lower PL values to increase their PL values. Thus, the logic shown in
With reference to
Steps 404, 405, and 406 can be performed in parallel or sequentially in that order. In step 404, the block is read from the memory with the given address. According to step 405, the PL value of the block which is read is set to 0. Finally, in step 406, the PL values that are lower than the PL value the read block held before it was reset are each increased by one. Steps 405 and 406 ensure that the PL values are set properly before a subsequent block is read or written.
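Steps 404 to 406 can be sketched in Python as shown below. The sketch is illustrative only: the function name `read_block`, the tag list used to locate the block by its address, and the behavior on a miss (returning `None` without changing any PL value) are assumptions introduced for the example and are not taken from the disclosure.

```python
def read_block(tags, blocks, levels, address):
    """Read the block stored under `address` and update the PL values.

    tags   : list of N addresses, one per block frame (hypothetical lookup)
    blocks : list of N stored blocks
    levels : list of N PL values (assumed distinct, 0..N-1)
    """
    if address not in tags:
        return None                  # assumed miss handling: nothing changes
    way = tags.index(address)
    current = levels[way]            # PL value of the block to be read

    data = blocks[way]               # step 404: read the block
    levels[way] = 0                  # step 405: set its PL value to 0
    for i in range(len(levels)):     # step 406: raise the lower PL values
        if i != way and levels[i] < current:
            levels[i] += 1
    return data


tags = [0x10, 0x20, 0x30, 0x40]
blocks = ["A", "B", "C", "D"]
levels = [2, 0, 3, 1]
data = read_block(tags, blocks, levels, 0x10)
missed = read_block(tags, blocks, levels, 0x99)
```

The PL value of the read block is saved before step 405 so that step 406 compares against the value the block held before it was reset; PL values above it are left unchanged.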
Referring now to
This application claims priority from U.S. Provisional Patent Application Ser. No. 60/864,435, entitled “Method and Apparatus for Least Recently Used Replacement,” filed Nov. 6, 2006, which is hereby incorporated by reference in its entirety.
| Number | Date | Country |
|---|---|---|
| 60864435 | Nov 2006 | US |