Disclosed aspects are directed to power management and efficiency improvement of memory systems. More specifically, exemplary aspects are directed to selective refresh mechanisms for dynamic random access memory (DRAM) for decreasing power consumption and increasing availability of the DRAM.
DRAM systems provide low-cost data storage solutions because of the simplicity of their construction. Essentially, DRAM cells are made up of a switch or transistor coupled to a capacitor. DRAM systems are organized as DRAM arrays comprising DRAM cells disposed in rows (or lines) and columns. As can be appreciated, given the simplicity of DRAM cells, the construction of DRAM systems incurs low cost, and high-density integration of DRAM arrays is possible. However, because capacitors are leaky, the charge stored in the DRAM cells needs to be periodically refreshed in order to correctly retain the information stored therein.
Conventional refresh operations involve reading out each DRAM cell (e.g., line by line) in a DRAM array and immediately writing back the data read out to the corresponding DRAM cells without modification, with the intent of preserving the information stored therein. Accordingly, the refresh operations consume power. Depending on specific implementations of DRAM systems (e.g., double data rate (DDR), low power DDR (LPDDR), embedded DRAM (eDRAM) etc., as known in the art) a minimum refresh frequency is defined, wherein if a DRAM cell is not refreshed at a frequency that is at least the minimum refresh frequency, then the likelihood of information stored therein becoming corrupted increases. If the DRAM cells are accessed for memory access operations such as read or write operations, the accessed DRAM cells are refreshed as part of performing the memory access operations. To ensure that the DRAM cells are being refreshed at least at a rate which satisfies the minimum refresh frequency even when the DRAM cells are not being accessed for memory access operations, various dedicated refresh mechanisms may be provided for DRAM systems.
It is recognized, however, that periodically refreshing each line of a DRAM, e.g., in an implementation of a large last level cache such as a level 3 (L3) Data Cache eDRAM, may be too expensive in terms of time and power to be feasible in conventional implementations. In an effort to mitigate the time expense, some approaches are directed to refreshing groups of two or more lines in parallel, but these approaches may also suffer from drawbacks. For instance, if the number of lines which are refreshed at a time is relatively small, then the time consumed for refreshing the DRAM may nevertheless be prohibitively high, which may curtail availability of the DRAM for other access requests (e.g., reads/writes). This is because the ongoing refresh operations may delay or block the access requests from being serviced by the DRAM. On the other hand, if the number of lines being refreshed at a time is large, the corresponding power consumption is seen to increase, which in turn may raise demands on the robustness of power delivery networks (PDNs) used to supply power to the DRAM. A more complex PDN can also reduce routing tracks available for other wiring associated with the DRAM circuitry and increase the die size of the DRAM.
Thus, there is a recognized need in the art for improved refresh mechanisms for DRAMs which avoid the aforementioned drawbacks of conventional implementations.
Exemplary aspects of the invention are directed to systems and methods for selective refresh of caches, e.g., a last-level cache of a processing system implemented as an embedded DRAM (eDRAM). The cache may be configured as a set-associative cache with at least one set and two or more ways in the at least one set, and a cache controller may be provided, configured for selective refresh of lines of the at least one set. The cache controller may include two or more refresh bit registers comprising two or more refresh bits, each refresh bit associated with a corresponding one of the two or more ways, and two or more reuse bit registers comprising two or more reuse bits, each reuse bit associated with a corresponding one of the two or more ways. The refresh and reuse bits are used in determining whether or not to refresh an associated line in the following manner. The cache controller may further include a least recently used (LRU) stack comprising two or more positions, each position associated with a corresponding one of the two or more ways, the two or more positions ranging from a most recently used position to a least recently used position, wherein positions towards the most recently used position of a threshold designated for the LRU stack comprise more recently used positions and positions towards the least recently used position of the threshold comprise less recently used positions. The cache controller is configured to selectively refresh a line in a way of the two or more ways if the position of the way is one of the more recently used positions and if the refresh bit associated with the way is set, or the position of the way is one of the less recently used positions and if the refresh bit and the reuse bit associated with the way are both set.
For example, an exemplary aspect is directed to a method of refreshing lines of a cache. The method comprises associating a refresh bit and a reuse bit with each of two or more ways of a set of the cache, associating a least recently used (LRU) stack with the set, wherein the LRU stack comprises a position associated with each of the two or more ways, the positions ranging from a most recently used position to a least recently used position, and designating a threshold for the LRU stack, wherein positions towards the most recently used position of the threshold comprise more recently used positions and positions towards the least recently used position of the threshold comprise less recently used positions. A line in a way of the cache is selectively refreshed if the position of the way is one of the more recently used positions and if the refresh bit associated with the way is set, or the position of the way is one of the less recently used positions and if the refresh bit and the reuse bit associated with the way are both set.
Another exemplary aspect is directed to an apparatus comprising a cache configured as a set-associative cache with at least one set and two or more ways in the at least one set and a cache controller configured for selective refresh of lines of the at least one set. The cache controller comprises two or more refresh bit registers comprising two or more refresh bits, each refresh bit associated with a corresponding one of the two or more ways, two or more reuse bit registers comprising two or more reuse bits, each reuse bit associated with a corresponding one of the two or more ways, and a least recently used (LRU) stack comprising two or more positions, each position associated with a corresponding one of the two or more ways, the two or more positions ranging from a most recently used position to a least recently used position, wherein positions towards the most recently used position of a threshold designated for the LRU stack comprise more recently used positions and positions towards the least recently used position of the threshold comprise less recently used positions. The cache controller is configured to selectively refresh a line in a way of the two or more ways if the position of the way is one of the more recently used positions and if the refresh bit associated with the way is set, or the position of the way is one of the less recently used positions and if the refresh bit and the reuse bit associated with the way are both set.
Yet another exemplary aspect is directed to an apparatus comprising a cache configured as a set-associative cache with at least one set and two or more ways in the at least one set and means for tracking positions associated with each of the two or more ways of the at least one set, the positions ranging from a most recently used position to a least recently used position, and wherein positions towards the most recently used position of a threshold designated for the positions comprise more recently used positions and positions towards the least recently used position of the threshold comprise less recently used positions. The apparatus further comprises means for selectively refreshing a line in a way of the cache if the position of the way is one of the more recently used positions and if a first means for indicating refresh associated with the way is set, or the position of the way is one of the less recently used positions and if the first means for indicating refresh and a second means for indicating reuse associated with the way are both set.
Another exemplary aspect is directed to a non-transitory computer-readable storage medium comprising code, which, when executed by a computer, causes the computer to perform operations for refreshing lines of a cache. The non-transitory computer-readable storage medium comprises code for associating a refresh bit and a reuse bit with each of two or more ways of a set of the cache, code for associating a least recently used (LRU) stack with the set, wherein the LRU stack comprises a position associated with each of the two or more ways, the positions ranging from a most recently used position to a least recently used position, code for designating a threshold for the LRU stack, wherein positions towards the most recently used position of the threshold comprise more recently used positions and positions towards the least recently used position of the threshold comprise less recently used positions, and code for selectively refreshing a line in a way of the cache if the position of the way is one of the more recently used positions and if the refresh bit associated with the way is set, or the position of the way is one of the less recently used positions and if the refresh bit and the reuse bit associated with the way are both set.
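Purely for illustration, and not as part of any claimed aspect, the selective refresh condition recited in the aspects above may be summarized in a brief sketch. The function name should_refresh and its parameters are hypothetical labels chosen for readability; the condition itself merely mirrors the two cases described above.

```python
def should_refresh(position, threshold, refresh_bit, reuse_bit):
    """Illustrative sketch of the selective refresh condition.

    position  -- the way's position in the LRU stack (0 = most recently used)
    threshold -- positions before this value are "more recently used"
    """
    if position < threshold:
        # More recently used positions: refresh whenever the refresh bit is set.
        return bool(refresh_bit)
    # Less recently used positions: refresh only if the line has also seen reuse.
    return bool(refresh_bit) and bool(reuse_bit)
```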
The accompanying drawings are presented to aid in the description of aspects of the invention and are provided solely for illustration of the aspects and not limitation thereof.
Aspects of the invention are disclosed in the following description and related drawings directed to specific aspects of the invention. Alternate aspects may be devised without departing from the scope of the invention. Additionally, well-known elements of the invention will not be described in detail or will be omitted so as not to obscure the relevant details of the invention.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects. Likewise, the term “aspects of the invention” does not require that all aspects of the invention include the discussed feature, advantage or mode of operation.
The terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting of aspects of the invention. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Further, many aspects are described in terms of sequences of actions to be performed by, for example, elements of a computing device. It will be recognized that various actions described herein can be performed by specific circuits (e.g., application specific integrated circuits (ASICs)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of computer readable storage medium having stored therein a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects of the invention may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the aspects described herein, the corresponding form of any such aspects may be described herein as, for example, “logic configured to” perform the described action.
In exemplary aspects of this disclosure, selective refresh mechanisms are provided for DRAMs, e.g., eDRAMs implemented in last level caches such as L3 caches. The eDRAMs may be integrated on the same system on chip (SoC) as a processor accessing the last level cache (although this is not a requirement). For such last level caches, it is recognized that a significant proportion of cache lines thereof may not receive any hits after being brought into a cache, since locality of these cache lines may be filtered at inner level caches such as level 1 (L1) and level 2 (L2) caches which are closer to the processor making access requests to the caches. Further, in a set associative cache implementation of the last level caches, with cache lines organized in two or more ways in each set, it is also recognized that among the cache lines that hit in the last level caches, the corresponding hits may be confined to a subset of ways including more recently used ways of a set (e.g., the 4 more recently used positions in a least recently used (LRU) stack associated with a set of the last level cache comprising 8 ways). Accordingly, the selective refresh mechanisms described herein are directed to selectively refreshing only the lines which are likely to be reused, particularly if the lines are in less recently used ways of a cache configured using DRAM technology.
In one aspect, 2 bits, referred to as a refresh bit and a reuse bit, are associated with each way (e.g., by augmenting a tag associated with the way, for example, with two additional bits). Further, a threshold is designated for the LRU stack of the cache, wherein the threshold denotes a separation between more recently used lines and less recently used lines. In one aspect, the threshold may be fixed, while in another aspect, the threshold can be dynamically changed, using counters to profile the number of ways which receive hits.
In general, the refresh bit being set to “1” (or simply, being “set”) for a way is taken to indicate that a cache line stored in the associated way is to be refreshed. The reuse bit being set to “1” (or simply, being “set”) for a way is taken to indicate that the cache line in the way has seen at least one reuse. In exemplary aspects, a cache line with its refresh bit set will be refreshed while the cache line is in a way whose position is more recently used; but if the position of the way crosses the threshold to a less recently used position, then the cache line is refreshed if its refresh bit is set and its reuse bit is also set. This is because cache lines in less recently used ways are generally recognized as not likely to see a reuse and therefore are not refreshed unless their reuse bit is set to indicate that these cache lines have seen a reuse.
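The per-way bookkeeping described above may be visualized with a short sketch. The class name SetRefreshState, the list-based representation of the LRU stack, and the defaults of eight ways and a threshold of four are illustrative assumptions only; an actual implementation would typically realize this state as registers and tag-array bits as described elsewhere herein.

```python
class SetRefreshState:
    """Illustrative per-set bookkeeping; names and layout are hypothetical."""

    def __init__(self, num_ways=8, threshold=4):
        self.num_ways = num_ways
        self.threshold = threshold          # positions 0..threshold-1 are "more recently used"
        self.refresh_bits = [0] * num_ways  # one refresh bit per way
        self.reuse_bits = [0] * num_ways    # one reuse bit per way
        # lru_stack[i] is the way occupying position i (0 = MRU, num_ways-1 = LRU)
        self.lru_stack = list(range(num_ways))

    def refresh_candidates(self):
        """Ways whose lines would be refreshed under the exemplary policy."""
        candidates = []
        for position, way in enumerate(self.lru_stack):
            more_recently_used = position < self.threshold
            if self.refresh_bits[way] and (more_recently_used or self.reuse_bits[way]):
                candidates.append(way)
        return candidates
```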
By selectively refreshing lines in this manner, power consumption involved in the refresh operations is reduced. Moreover, by not refreshing certain lines which may have been conventionally refreshed, the availability of the cache for other access operations, such as read/write operations, is increased.
With reference first to
As shown, in one example for the sake of illustration, cache 104 may be a set associative cache with four sets 104a-d. Each set 104a-d may have multiple ways of cache lines (also referred to as cache blocks). Eight ways w0-w7 of cache lines for set 104c have been representatively illustrated in the example of
In exemplary aspects, a threshold may be used to demarcate entries of LRU stack 105c, with positions towards the most recently used (MRU) position of the threshold being referred to as more recently used positions and positions towards the least recently used (LRU) position of the threshold being referred to as less recently used positions. With such a threshold designation, lines of set 104c in ways associated with more recently used positions of LRU stack 105c may generally be refreshed, while lines in ways associated with less recently used positions may not be refreshed unless they have seen a reuse. A selective refresh in this manner is performed by using two bits to track whether a line is to be refreshed or not.
The above-mentioned two bits are representatively shown as refresh bit 110c and reuse bit 112c associated with each way w0-w7 of set 104c. Refresh bit 110c and reuse bit 112c may be configured as additional bits of a tag array (not separately shown). More generally, in alternative examples, refresh bit 110c may be stored in any memory structure such as a refresh bit register (not identified with a separate reference numeral in
In an exemplary aspect, cache controller 103 (or any other suitable logic) may be configured to perform exemplary refresh operations on cache 104 based on the statuses or values of refresh bit 110c and reuse bit 112c for each way, which allows selectively refreshing only lines in ways of set 104c which are likely to be reused. The description provides example functions which may be implemented in cache controller 103, for performing selective refresh operations on cache 104, and more specifically, selective refresh of lines in ways w0-w7 of set 104c of cache 104. In exemplary aspects, a line in a way is refreshed only when the associated refresh bit 110c of the way is set, and is not refreshed when the associated refresh bit 110c of the way is not set (or set to a value “0”). The following policies may be used in setting/resetting refresh bit 110c and reuse bit 112c for each line of set 104c.
When a new cache line is inserted in cache 104, e.g., in set 104c, the corresponding refresh bit 110c is set (e.g., to value “1”). The way for a newly inserted cache line will be in a more recently used position in LRU stack 105c. The position of the way starts falling from more recently used to less recently used positions as lines are inserted into other ways. Refresh bit 110c will remain set until the position associated with the way in which the line is inserted in LRU stack 105c crosses the above-noted threshold to go from a more recently used line designation to a less recently used line designation.
Once the position of the way changes to a less recently used designation, refresh bit 110c for the way is updated based on the value of reuse bit 112c. If reuse bit 112c is set (e.g., to value “1”), e.g., if the line has experienced a cache hit, then refresh bit 110c is also set and the line will be refreshed, until the line becomes stale (i.e., its reuse bit 112c is reset or set to value “0”). On the other hand, if reuse bit 112c is not set (e.g., set to value “0”), e.g., if the line has not experienced a cache hit, then refresh bit 110c is set to “0” and the line is no longer refreshed.
On a cache miss for a line in set 104c, the line may be installed in a way of set 104c and its refresh bit 110c may be set to “1” and reuse bit 112c reset or set to “0”. The relative usage of the line is tracked by the position of its way in LRU stack 105c. As previously described, once the way crosses the threshold into positions designated as less recently used in LRU stack 105c, and if the line has not been reused (i.e., reuse bit 112c is “0”), then the corresponding refresh bit 110c is reset or set to “0”, to avoid refreshing stale lines which have not recently been used and may not have a high likelihood of reuse.
For a cache hit on a line in a way of set 104c, if its refresh bit 110c is set, then its reuse bit 112c is also set and the line is returned or delivered to the requestor, e.g., processor 102. In some aspects, a cache hit may be treated as a cache miss for a line in a way if refresh bit 110c is not set (or set to “0”) for that way. In further detail, a line in a way that has its refresh bit 110c not set (or set to “0”) is assumed to have exceeded a refresh limit and accordingly is treated as being stale, and so, is not returned to processor 102. The request for the cache line which is treated as a miss is then sent to a next level of backing memory, e.g., main memory 106 so a fresh and correct copy may be fetched again into cache 104.
In an aspect, if a line is in a way of set 104c which has crossed the threshold towards the MRU position into more recently used positions (e.g., the line is in the four more recently used positions) in LRU stack 105c, and if reuse bit 112c is set, then refresh bit 110c is also set, since the line has seen a reuse, and so the line is always refreshed. On the other hand, if a line crosses the threshold into more recently used positions and its reuse bit 112c is not set, then refresh bit 110c is reset or set to “0”, since the line has not seen a reuse and, as such, may have a low probability of future reuse; correspondingly, a refresh of the line is halted or not performed.
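The policies described in the preceding paragraphs may be summarized in a brief sketch that builds on the SetRefreshState example above. The handler names (on_miss_fill, on_hit, on_threshold_crossing) are hypothetical, and the sketch assumes that the threshold-crossing handler is invoked whenever a way's position in LRU stack 105c crosses the designated threshold in either direction.

```python
class SelectiveRefreshController(SetRefreshState):
    """Illustrative event handlers mirroring the exemplary policies above."""

    def on_miss_fill(self, way):
        """A new line is installed in 'way' after a miss: mark it for refresh, no reuse yet."""
        self.refresh_bits[way] = 1
        self.reuse_bits[way] = 0
        self._promote_to_mru(way)

    def on_hit(self, way):
        """Return True if the hit can be serviced; False if it is treated as a miss."""
        if not self.refresh_bits[way]:
            # Refresh bit clear: the line is treated as stale, and a fresh copy
            # is fetched from the next level of backing memory.
            return False
        self.reuse_bits[way] = 1
        self._promote_to_mru(way)
        return True

    def on_threshold_crossing(self, way):
        """Whenever the way's position crosses the threshold (in either direction),
        the refresh bit is re-derived from the reuse bit."""
        self.refresh_bits[way] = 1 if self.reuse_bits[way] else 0

    def _promote_to_mru(self, way):
        # Move the way to the most recently used position of the LRU stack.
        self.lru_stack.remove(way)
        self.lru_stack.insert(0, way)
```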
In some aspects, rather than a fixed threshold as described above, a dynamically variable threshold may be used in association with positions of LRU stack 105c for example set 104c of cache 104. The threshold may be dynamically changed, for example, based on program phase or some other metric.
In some designs, it may be desirable to reduce the hardware and/or associated resources for counters 205c of
In yet another implementation, although not explicitly shown, counters may be provided for only a subset of the overall number of sets of cache 104. For example, if counters N1-N4 are provided for tracking the upper half of ways of four out of 16 sets in an implementation of cache 104 (not corresponding to the illustration shown in
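Since the disclosure leaves the precise adjustment policy open, the following is only one possible sketch of counter-based threshold adaptation: per-position hit counters are accumulated over a profiling window, and the threshold is moved to the smallest value whose positions capture most of the observed hits. The class and method names, the window length, and the 95% coverage figure are assumptions made solely for illustration.

```python
class DynamicThresholdProfiler:
    """Illustrative counter-based threshold adaptation (a sketch only)."""

    def __init__(self, num_ways=8, interval=100_000):
        self.hits_per_position = [0] * num_ways  # one counter per LRU-stack position
        self.interval = interval                 # profiling window, in recorded hits
        self.recorded = 0

    def record_hit(self, position):
        """Count a hit observed at the given LRU-stack position."""
        self.hits_per_position[position] += 1
        self.recorded += 1

    def maybe_update_threshold(self, current_threshold, coverage=0.95):
        """At the end of a window, choose the smallest threshold whose positions
        capture most (e.g., 95%) of the observed hits."""
        if self.recorded < self.interval:
            return current_threshold
        total = sum(self.hits_per_position)
        running, new_threshold = 0, len(self.hits_per_position)
        for position, hits in enumerate(self.hits_per_position):
            running += hits
            if total and running / total >= coverage:
                new_threshold = position + 1
                break
        # Start a new profiling window (e.g., at a program phase change).
        self.hits_per_position = [0] * len(self.hits_per_position)
        self.recorded = 0
        return new_threshold
```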
Accordingly, it will be appreciated that exemplary aspects include various methods for performing the processes, functions and/or algorithms disclosed herein. For example, method 300 is directed to a method of refreshing lines of a cache (e.g., cache 104) as discussed further below.
In Block 302, method 300 comprises associating a refresh bit and a reuse bit with each of two or more ways of a set of the cache (e.g., associating, by cache controller 103, refresh bit 110c and reuse bit 112c with ways w0-w7 of set 104c).
Block 304 comprises associating a least recently used (LRU) stack with the set, wherein the LRU stack comprises a position associated with each of the two or more ways, the positions ranging from a most recently used position to a least recently used position (e.g., LRU stack 105c of cache controller 103 associated with set 104c, with positions ranging from MRU to LRU).
Block 306 comprises designating a threshold for the LRU stack, wherein positions towards the most recently used position of the threshold comprise more recently used positions and positions towards the least recently used position of the threshold comprise less recently used positions (e.g., a fixed threshold or a dynamic threshold, with positions towards MRU position of the threshold in LRU stack 105c shown as more recently used positions and positions towards the LRU position of the threshold shown as less recently used positions in
In Block 308, a line in a way of the cache may be selectively refreshed if the position of the way is one of the more recently used positions and if the refresh bit associated with the way is set; or if the position of the way is one of the less recently used positions and if the refresh bit and the reuse bit associated with the way are both set (e.g., cache controller 103 may be configured to selectively direct a refresh operation to be performed on a line in a way of the two or more ways w0-w7 of set 104c of cache 104 if the position of the way is one of the more recently used positions and if refresh bit 110c associated with the way is set; or if the position of the way is one of the less recently used positions and if refresh bit 110c and reuse bit 112c associated with the way are both set).
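Method 300 may be illustrated end to end with a short usage sketch, assuming the SelectiveRefreshController and refresh_candidates examples above; the refresh_line callback is a hypothetical stand-in for the actual refresh operation directed by cache controller 103.

```python
def refresh_sweep(sets, refresh_line):
    """One pass of selective refresh over all sets (illustrative only).

    'sets' maps a set index to a SelectiveRefreshController, and 'refresh_line'
    is a placeholder callback that performs the actual read/write-back.
    """
    for set_index, state in sets.items():
        for way in state.refresh_candidates():
            refresh_line(set_index, way)   # Block 308: only qualifying lines are refreshed

# Example: an 8-way set where way 5 sits in a less recently used position
# but has seen reuse, so it is still refreshed.
state = SelectiveRefreshController(num_ways=8, threshold=4)
state.on_miss_fill(5)
state.on_hit(5)                      # sets the reuse bit and promotes way 5
for w in (0, 1, 2, 3):               # later fills push way 5 past the threshold
    state.on_miss_fill(w)
state.on_threshold_crossing(5)       # refresh bit stays set because reuse bit is set
refresh_sweep({0: state}, lambda s, w: print(f"refresh set {s}, way {w}"))
```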
It will be appreciated that aspects of this disclosure also include any apparatus configured to or comprising means for performing the functionality described herein. For example, an exemplary apparatus according to one aspect comprises a cache (e.g., cache 104) configured as a set-associative cache with at least one set (e.g., set 104c) and two or more ways (e.g., ways w0-w7) in the at least one set. As such, the apparatus may comprise means for tracking positions associated with each of the two or more ways of the at least one set (e.g., LRU stack 105c), the positions ranging from a most recently used position to a least recently used position, and wherein positions towards the most recently used position of a threshold designated for the positions comprise more recently used positions and positions towards the least recently used position of the threshold comprise less recently used positions. The apparatus may also comprise means (e.g., cache controller 103) for selectively refreshing a line in a way of the cache if: the position of the way is one of the more recently used positions and if a first means for indicating refresh (e.g., refresh bit 110c) associated with the way is set; or the position of the way is one of the less recently used positions and if the first means for indicating refresh and a second means for indicating reuse (e.g., reuse bit 112c) associated with the way are both set.
An example apparatus in which exemplary aspects of this disclosure may be utilized will now be discussed in relation to
Accordingly, in a particular aspect, input device 430 and power supply 444 are coupled to the system-on-chip device 422. Moreover, in a particular aspect, as illustrated in
It should be noted that although
Those of skill in the art will appreciate that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Further, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The methods, sequences and/or algorithms described in connection with the aspects disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
Accordingly, an aspect of the invention can include computer-readable media embodying a method for selective refresh of a DRAM. Accordingly, the invention is not limited to illustrated examples and any means for performing the functionality described herein are included in aspects of the invention.
While the foregoing disclosure shows illustrative aspects of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the aspects of the invention described herein need not be performed in any particular order. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.