TECHNICAL FIELD
The present invention relates generally to data storage controllers and, in particular, to establishing cache discard and destage policies.
BACKGROUND ART
A data storage controller, such as an International Business Machines Enterprise Storage Server®, receives input/output (I/O) requests directed toward an attached storage system. The attached storage system may comprise one or more enclosures including numerous interconnected disk drives, such as a Direct Access Storage Device (DASD), Redundant Array of Independent Disks (RAID Array), Just A Bunch of Disks (JBOD), etc. If I/O read and write requests are received at a faster rate than they can be processed, the storage controller will queue the I/O requests in a primary cache, which may comprise one or more gigabytes of volatile storage, such as random access memory (RAM), dynamic random access memory (DRAM), etc. A copy of certain modified (write) data may also be placed in a secondary, non-volatile storage (NVS) cache, such as a battery backed-up volatile memory, to provide additional protection of write data in the event of a failure at the storage controller. Typically, the secondary cache is smaller than the primary cache due to the cost of NVS memory.
In many current systems, an entry is included in a least recently used (LRU) list for each track that is stored in the primary cache. Commonly-assigned U.S. Pat. No. 6,785,771, entitled “Method, System, and Program for Destaging Data in Cache” and incorporated herein by reference, describes one such system. A track can be staged from the storage system to cache to satisfy a read request. Additionally, write data for a track may be stored in the primary cache before being transferred to the attached storage system to preserve the data in the event that the transfer fails. Each entry in the LRU list comprises a control block that indicates the current status of a track, the location of the track in cache, and the location of the track in the storage system. A separate NVS LRU list is maintained for tracks in the secondary NVS cache and is managed in the same fashion. In summary, the primary cache includes both read and modified (write) tracks while the secondary cache includes only modified (write) tracks. Thus, the primary LRU list (also known as the “A” list) includes entries representing read and write tracks while the secondary LRU list (also known as the “N” list) includes entries representing only write tracks. Although the primary and secondary LRU lists may each be divided into a list for sequential data (an “accelerated” list) and a list for random data (an “active” list), for purposes of this disclosure no such distinction will be made.
Referring to the prior art cache management sequences illustrated in FIGS. 1A-1F and FIGS. 2A and 2B, list entries marked with a prime symbol (′) represent modified track entries while those without the prime symbol represent unmodified or read entries. FIG. 1A illustrates examples of A and N lists which have already been partially populated with read and write entries. New entries are added to the most recently used (MRU) end of the LRU list to represent each track added to the primary cache. In FIG. 1B, a new write entry E′ has been added to the MRU end of both lists. As the new entries are added to the MRU ends, existing entries are “demoted” towards the LRU end of the lists. When a request is received to access a track, a search is made in the primary cache and, if an entry for the requested track is found (known as a “hit”), the entry is moved up to the MRU end of the list (FIG. 1C).
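By way of illustration only, the list behavior described above can be modeled in a few lines of Python. This is a rough sketch, not the controller's actual data structure: the LRUList class and its method names are invented here, and insertion order stands in for the doubly-linked control-block structure a real controller would maintain.

```python
from collections import OrderedDict

class LRUList:
    """Minimal model of an A or N list; the front of the ordering
    represents the MRU end and the back represents the LRU end."""

    def __init__(self):
        # Maps track id -> modified flag; ordering tracks recency.
        self._entries = OrderedDict()

    def add(self, track, modified):
        # New entries go to the MRU end; existing entries are demoted
        # implicitly as newer ones are placed in front of them.
        self._entries[track] = modified
        self._entries.move_to_end(track, last=False)

    def hit(self, track):
        # A cache hit promotes the entry back to the MRU end (FIG. 1C).
        if track in self._entries:
            self._entries.move_to_end(track, last=False)
            return True
        return False

    def lru_entry(self):
        # The candidate for discard or destage sits at the LRU end.
        return next(reversed(self._entries), None)

a_list = LRUList()
for track, modified in [("A", True), ("B", False), ("C", False),
                        ("D", True), ("E", True)]:
    a_list.add(track, modified)
a_list.hit("B")            # B is promoted to the MRU end
print(a_list.lru_entry())  # -> 'A', the entry closest to demotion
```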
When additional space in the primary cache is needed to buffer additional requested read data and modified data, one or more tracks represented by entries at the LRU end of the LRU list are discarded from the cache, and the corresponding entries are removed from the primary LRU list (FIG. 1F, in which entry A′ has been discarded from both caches when new entry H′ is added). A read data track in the primary cache may be discarded from the cache quickly because the data is already stored on a disk in the storage system and does not need to be destaged. However, a modified (write) data track in the primary and secondary caches may be discarded from the caches and lists only after it has been safely destaged to the storage system. Such a destage procedure may take as much as 100 times as long as discarding unmodified read data.
Due to the size difference between the primary and secondary caches, if a write data entry is discarded from the secondary (NVS) list after the associated track has been destaged from the secondary cache, it is possible that the entry and track remain in the primary LRU list and cache (FIGS. 2A and 2B, in which write entry D′ is discarded from the secondary list while remaining in the primary list). In such an event, the status of the entry will be changed from “modified” to “unmodified” and the track will remain available for a read request (FIG. 2B; entry D′ has been changed to D).
As noted above, if the primary cache does not have enough empty space to receive additional data tracks (as from FIG. 1D to FIG. 1E), existing tracks are discarded. In one currently used process, the primary LRU list is scanned from the LRU end for one or more unmodified (read) data entries whose corresponding tracks can be discarded quickly. During the scan, modified (write) data entries are skipped due to the longer time required to destage such tracks (FIG. 1E; unmodified entry C has been discarded). Even if the modified data entries are destaged rather than skipped, they may not free space quickly enough for the new entries, and while a destage is in progress those entries must still be skipped. As a result, during heavy write loads, some tracks may be modified several times and remain in the secondary cache for a relatively long time. Such tracks will also remain in the primary cache with the “modified” status before being destaged. Moreover, even after such tracks have eventually been destaged from the secondary cache, they may remain in the primary cache as “unmodified” and, if near the MRU end of the primary list, may receive another opportunity (or “life”) to move through the primary list. When there are many modified tracks in the primary cache, list scans have to skip over many entries and may not be able to identify enough unmodified tracks to discard to make room for new tracks. As will be appreciated, skipping over so many cached tracks takes a significant amount of time and wastes processor cycles. Because of these factors, the read replacement and write retirement policies are interdependent, and write cache management is coupled to read cache management.
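The cost of that prior-art scan can be sketched as follows; the function name and the (track, modified) tuple representation are invented for this example, which simply walks the list from the LRU end and counts the modified entries it must skip.

```python
def prior_art_discard_scan(a_list, needed):
    """Model of the prior-art scan: walk from the LRU end toward the
    MRU end, skipping modified entries, until enough unmodified
    tracks have been found to discard."""
    discarded, skipped = [], 0
    # a_list is ordered MRU-first, so scan it in reverse.
    for track, modified in reversed(a_list):
        if modified:
            skipped += 1          # destage is too slow; skip the entry
            continue
        discarded.append(track)   # unmodified: discard immediately
        if len(discarded) >= needed:
            break
    return discarded, skipped

# Under a heavy write load most entries near the LRU end are modified,
# so the scan wastes cycles skipping over them:
a_list = [("H", True), ("G", True), ("F", False), ("E", True),
          ("D", True), ("C", False), ("B", True), ("A", True)]
print(prior_art_discard_scan(a_list, needed=2))
# -> (['C', 'F'], 4): freeing two tracks cost four skips
```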
Thus, notwithstanding the use of LRU lists to manage cache destaging operations, there remains a need in the art for improved techniques for managing data in cache and performing the destage operation.
SUMMARY OF THE INVENTION
The present invention provides a system, method and program product for more efficient cache management discard/destage policies. Prior to or during a scan of the primary cache LRU list (the “A” list) for discard candidates, modified (write) data entries are moved to the most recently used (MRU) end of the list, allowing the scan to proceed in an efficient manner without having to skip over modified data entries. Optionally, a status bit may be associated with each modified data entry. When the entry is moved to the MRU end of the A list, its status bit is changed from an initial state (such as 0) to a second state (such as 1), indicating that it is a candidate to be discarded. If a write track requested to be accessed is found in the primary cache (a “hit”), the status bit of the corresponding A list entry is changed back to the first state, preventing the track from being discarded. Thus, write tracks are allowed to remain in the primary cache only as long as necessary.
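Stated as Python, the optional status-bit transitions amount to the following; the field and function names are illustrative only and are not drawn from the claims.

```python
# Hypothetical status-bit transitions for a modified A-list entry:
#   0 -> 1 when the entry is moved to the MRU end ahead of a scan
#   1 -> 0 when the corresponding track takes a cache hit
# A destaged entry is discarded only if its status bit is still 1.

def on_mru_move(entry):
    entry["status_bit"] = 1   # the entry is now a discard candidate

def on_hit(entry):
    entry["status_bit"] = 0   # a hit protects the track from discard

def may_discard(entry):
    return entry["status_bit"] == 1

entry = {"track": "E", "status_bit": 0}
on_mru_move(entry)
on_hit(entry)
print(may_discard(entry))  # -> False: the hit preserved the track
```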
BRIEF DESCRIPTION OF THE DRAWINGS
FIGS. 1A-1F illustrate a prior art sequence of cache management;
FIGS. 2A and 2B illustrate another prior art sequence of cache management;
FIG. 3 is a block diagram of a data processing environment in which the present invention may be implemented;
FIG. 4 illustrates examples of LRU lists employed in the present invention;
FIGS. 5A and 5B illustrate a sequence of cache management according to one aspect of the present invention;
FIGS. 6A-6F illustrate a sequence of cache management according to another aspect of the present invention; and
FIGS. 7A-7E illustrate a sequence of cache management according to still another aspect of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
FIG. 3 is a block diagram of a data processing environment 300 in which the present invention may be implemented. A storage controller 310 receives input/output (I/O) requests from one or more hosts 302A, 302B, 302C to which the storage controller 310 is attached through a network 304. The I/O requests are directed to tracks in a storage system 306 having disk drives in any of several configurations, such as a Direct Access Storage Device (DASD), a Redundant Array of Independent Disks (RAID Array), Just A Bunch of Disks (JBOD), etc. The storage controller 310 includes a processor 312, a cache manager 314 and a cache 320. The cache manager 314 may comprise either a hardware component or a software/firmware component executed by the processor 312 to manage the cache 320. The cache 320 comprises a first portion and a second portion. In one embodiment, the first cache portion is a volatile storage 322 and the second cache portion is non-volatile storage (NVS) 324. The cache manager 314 is configured to temporarily store read (unmodified) and write (modified) data tracks in the volatile storage portion 322 and to temporarily store only write (modified) data tracks in the non-volatile storage portion 324.
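The division of the cache 320 into the two portions can be pictured with the following sketch; the class and field names are invented for illustration, and the dictionaries merely stand in for the volatile and NVS memory regions.

```python
from dataclasses import dataclass, field

@dataclass
class Cache:
    volatile: dict = field(default_factory=dict)  # read + write tracks
    nvs: dict = field(default_factory=dict)       # write tracks only

@dataclass
class StorageController:
    cache: Cache = field(default_factory=Cache)

    def write(self, track, data):
        # Modified data is held in both portions until destaged.
        self.cache.volatile[track] = data
        self.cache.nvs[track] = data

    def stage_read(self, track, data_from_disk):
        # Staged read data lives only in the volatile portion.
        self.cache.volatile[track] = data_from_disk

ctrl = StorageController()
ctrl.write("T1", b"...")
ctrl.stage_read("T2", b"...")
print(sorted(ctrl.cache.volatile), sorted(ctrl.cache.nvs))
# -> ['T1', 'T2'] ['T1']
```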
Although in the described implementations the data is managed as tracks in cache, in alternative embodiments the data may be managed in other data units, such as a logical block address (LBA), etc.
The cache manager 314 is further configured to establish a set of data track lists for the volatile cache portion 322 (the “A” lists) and a set of data track lists for the NVS cache portion 324 (the “N” lists). As illustrated in FIG. 4, one list in each set may be established to hold entries for random access data (the “Active” lists) and the second list to hold entries for sequential access data (the “Accel” lists). In the illustration, the Active lists are larger than the Accel lists; however, this need not be so. Moreover, the present invention is not dependent upon the presence of a division of track entries between Active and Accel lists and further description hereinafter will make no such distinction.
FIGS. 5A and 5B illustrate a sequence of cache management according to one aspect of the present invention. In FIG. 5A, the A list has been filled with read and write entries from the MRU end to the LRU end. Entries have also been entered into the N list from the MRU end to the LRU end, but the list is not yet full. Either some time before the addition of a new read or write entry into the A list, or as part of the process to add a new entry, the A list is rearranged by the cache manager 314 in preparation for the addition of the new entry. As summarized in the Background hereinabove, in a prior art process, the A list would be scanned from the LRU end up towards the MRU end to locate the first unmodified read entry. The track associated with that entry would then be discarded, making room in the volatile cache for the new entry. In contrast, however, in one variation of the present invention, the cache manager 314 moves all or enough of the modified (write) data entries to the MRU end of the A list, leaving one or more unmodified data entries at the LRU end (FIG. 5B). Then, when the cache manager 314 initiates a scan of the A list, no time or processor cycles are wasted trying to identify an unmodified data entry: such an entry is already at the LRU end and can immediately be discarded. In another variation, a scan of the A list is initiated and modified data entries are moved to the MRU end until an unmodified data entry is at the LRU end; the data track represented by that entry is then discarded.
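Both variations can be sketched in Python as follows, again with invented names and with the list ordered MRU-first; an actual controller would manipulate linked control blocks rather than copy list elements.

```python
def promote_modified_before_scan(a_list):
    """Variation 1: move every modified entry to the MRU end so that
    unmodified entries collect at the LRU end (FIG. 5B).
    Entries are (track, modified) tuples, ordered MRU-first."""
    modified = [e for e in a_list if e[1]]
    unmodified = [e for e in a_list if not e[1]]
    return modified + unmodified  # discard candidates now sit at the end

def promote_during_scan(a_list):
    """Variation 2: move modified entries up only until an unmodified
    entry is exposed at the LRU end, then discard its track."""
    while a_list and a_list[-1][1]:
        a_list.insert(0, a_list.pop())  # modified entry -> MRU end
    return a_list.pop() if a_list else None  # track to discard

a_list = [("F", False), ("E", True), ("D", True), ("C", False), ("A", True)]
print(promote_modified_before_scan(list(a_list)))
# -> [('E', True), ('D', True), ('A', True), ('F', False), ('C', False)]
print(promote_during_scan(list(a_list)))
# -> ('C', False): discarded without any skipping
```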
FIGS. 6A-6F illustrate a first optional enhancement to the embodiment described with respect to FIGS. 5A and 5B. Each write data entry includes an extra status bit which is initially set to 0 (FIG. 6A). For simplicity in implementing the present invention, all entries may include the status bit, initially set to 0. However, status bits associated with unmodified entries will remain at 0. In FIG. 6B, modified data entries (A′, D′ and E′) have been moved to the MRU end of the A list and their status bits have been changed to 1, indicating that they have progressed at least partially through the A list one time. As in the sequence of FIGS. 5A and 5B, all or some of the modified entries may be moved and they may be moved either prior to a scan or during a scan in which enough entries for modified data are moved to expose an entry for unmodified data at the LRU end.
Subsequently, a request is received by the storage controller 310 from a host 302 to access a modified track, such as track E′. Because the track is in the cache 320, it may be quickly read out of the cache instead of having to be retrieved from the storage system 306. The “hit” on track E′ causes the cache manager 314 to move the corresponding data entry to the MRU end of the A list and to change its status bit back to 0 (FIG. 6C), allowing the entry to move through the list again. Another write track added to the NVS cache 324 fills that cache, and its entry (G′) fills the N list. The addition of its entry to the A list also forces the read entry at the LRU end of the A list (C) to be discarded (FIG. 6D). When still another write track is added to the NVS cache 324, the associated entry (H′) is added to the N list, forcing the write entry at the LRU end of the N list (A′) to be destaged to the storage system 306. The corresponding entry in the A list is changed from a modified to an unmodified state (FIG. 6E). Since its status bit is 1, the entry (A) is discarded from the A list (FIG. 6F), either immediately or during a subsequent scan.
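A compact model of the FIG. 6E-6F behavior on destage completion follows; the dictionary fields are illustrative only.

```python
def on_destage_complete_fig6(a_list, track):
    """When a track is destaged from the NVS cache, its A-list entry
    becomes unmodified; if its status bit is 1 (it already had its
    pass through the list), the entry is discarded as well."""
    for i, entry in enumerate(a_list):
        if entry["track"] == track:
            entry["modified"] = False      # A' -> A (FIG. 6E)
            if entry["status_bit"] == 1:
                del a_list[i]              # discard (FIG. 6F)
            return

a_list = [{"track": "A", "modified": True, "status_bit": 1},
          {"track": "C", "modified": False, "status_bit": 0}]
on_destage_complete_fig6(a_list, "A")
print([e["track"] for e in a_list])  # -> ['C']; A has been discarded
```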
FIGS. 7A-7E illustrate an alternative procedure to that illustrated in FIGS. 6A-6E. The initial sequence (FIGS. 7A and 7B) is the same as that in the preceding procedure (FIGS. 6A and 6B). Next, a read hit on track A′ leaves the associated A list entry at the MRU end of the A list (or moves it there if it was previously demoted towards the LRU end). Additionally, the entry's status bit is changed from 1 to 0, allowing the entry to move through the list again (FIG. 7C). A new write entry (G′) causes the N list to become full (FIG. 7D) while another new write entry (H′) forces A′ to be destaged from the N list. The cache manager 314 determines that the status bit of its corresponding A list entry is 0; therefore, the cache manager 314 changes the state from modified (A′) to unmodified (A) and does not immediately discard the entry from the A list. The entry (A) is given another opportunity to move through the A list and will be discarded only at a time when it is unmodified and has a status bit of 1.
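For contrast, the FIG. 7E behavior can be sketched the same way; here a status bit of 0 spares the entry, which is retained as unmodified for another pass through the list. As before, the names are invented for illustration.

```python
def on_destage_complete_fig7(a_list, track):
    """When a track with status bit 0 is destaged from the NVS cache,
    its A-list entry is merely converted to unmodified and retained;
    it is discarded later, once it is unmodified with status bit 1."""
    for entry in a_list:
        if entry["track"] == track:
            entry["modified"] = False  # A' -> A, but not discarded
            return

a_list = [{"track": "A", "modified": True, "status_bit": 0}]
on_destage_complete_fig7(a_list, "A")
print(a_list)
# -> [{'track': 'A', 'modified': False, 'status_bit': 0}]: another life
```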
It is important to note that while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and a variety of forms and that the present invention applies regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media such as a floppy disk, a hard disk drive, a RAM, and CD-ROMs and transmission-type media such as digital and analog communication links.
The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. For example, the foregoing describes specific operations occurring in a particular order. In alternative implementations, certain of the operations may be performed in a different order, modified or removed. Moreover, steps may be added to the above described operation and still conform to the described implementations. Further, operations described herein may occur sequentially or may be processed in parallel. The embodiments described were chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated. Moreover, although described above with respect to methods and systems, the need in the art may also be met with a computer program product containing instructions for managing cached data in a data storage controller.