Storage systems can be used to store relatively large amounts of data. Such storage systems can be provided in a network, such as a storage area network, to allow for remote access over the network by one or more hosts. An issue associated with storage systems is the possibility of failure, which may result in loss of data.
One recovery technique that has been implemented with some storage systems involves taking “snapshots” of data, with a snapshot being a copy of data taken at a particular time. A snapshot of data is also referred to as a point-in-time representation of data. If recovery of data is desired, data can be restored to a prior state by reconstructing the snapshot.
Multiple snapshots of data at different times can be stored in the storage system. Such snapshots represent different generations of snapshots (with a “generation” referring to the particular time at which the snapshot was taken).
A snapshot subsystem of a storage system can be implemented with a snapshot primary volume and snapshot pool volumes, where the snapshot pool volumes are used to store old data. Typically, non-updated data is kept in the snapshot primary volume, while the snapshot pool volumes are used to store prior generations of data that have been modified at different times. Different snapshots can include different combinations of data from the snapshot primary volume and one or more volumes in the snapshot pool.
The storage system can receive requests from one or more hosts to actively utilize snapshots. For example, in a storage system that is capable of maintaining 64 snapshots, there may be up to 64 outstanding input/output requests to snapshots at a given time.
For improved throughput, caches are typically provided in storage systems. Caches are implemented with memory devices that have higher access speeds than the persistent storage devices (e.g., magnetic disk drives) that are part of the storage system. If an access request can be satisfied from the cache (a cache hit), then an input/output (I/O) access of the slower persistent storage devices can be avoided. However, conventional cache management algorithms do not effectively handle scenarios in which there are multiple outstanding requests for snapshots, where the outstanding requests (which may be from multiple hosts) may each involve an access of the snapshot primary volume. N (N>1) hosts issuing I/O requests against N snapshots will produce workloads at N different, approximately Gaussian-distributed disk head locations (assuming that the persistent storage devices are disk drives). If each of the N snapshot requests involves an access of the snapshot primary volume, then the disk head(s) associated with the snapshot primary volume will be diverted by the N snapshot read activities. Note that the primary volume may also be concurrently handling normal read requests (reads of the current data, rather than reads of snapshot data).
The increased workload and the fact that the snapshot primary volume is being accessed by multiple outstanding requests increase the likelihood of a cache miss, which can result in performance degradation, particularly during write operations to the snapshot subsystem. Note that each write to a snapshot subsystem can result in three times the I/O traffic, since a write to a snapshot subsystem involves the following: (1) read old data from the snapshot primary volume; (2) write the old data to the snapshot pool of volumes; and (3) write the new data to the snapshot primary volume. Conventional cache management algorithms that are not designed to handle snapshots effectively will lead to increased cache misses, which in turn will cause degradation of the performance of the storage system.
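To make the tripled I/O traffic concrete, the following is a minimal sketch (in Python, purely for illustration) of the copy-on-write write path just described; the class and method names, and the read_block/write_block helpers, are assumptions made here rather than part of any actual storage controller implementation.

```python
class SnapshotSubsystem:
    """Illustrative sketch of a snapshot subsystem with a primary volume and a pool volume."""

    def __init__(self, primary_volume, pool_volume):
        self.primary = primary_volume   # snapshot primary volume
        self.pool = pool_volume         # snapshot pool (holds prior generations of data)

    def write(self, block_addr, new_data, generation):
        # A single host write produces three I/O operations:
        old_data = self.primary.read_block(block_addr)              # (1) read old data from the primary
        self.pool.write_block((generation, block_addr), old_data)   # (2) preserve the old data in the pool
        self.primary.write_block(block_addr, new_data)              # (3) write the new data to the primary
```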
Some embodiments of the invention are described with respect to the following figures:
The storage controller 102 includes a processor 118, cache control logic 121, and a cache 120. The cache 120 is used to cache data stored in the persistent storage 104, such that subsequent reads can be satisfied from the cache 120 for improved performance. The cache 120 is implemented with storage devices that have higher access speeds than the storage devices used to implement the persistent storage 104. For example, the cache 120 can be implemented with semiconductor memories, such as dynamic random access memories (DRAMs), static random access memories (SRAMs), flash memories, and so forth. The cache control logic 121 manages the cache 120 according to one or more cache management algorithms. The storage controller 102 is connected to a network interface 122 to allow the storage controller 102 to communicate over the network 106 with hosts 108 and with an administrative station 110.
Although the cache control logic 121 and cache 120 are depicted as being part of the storage controller 102 in
A snapshot subsystem 105 can be provided in the persistent storage 104. The snapshot subsystem 105 is used for storing snapshots corresponding to different generations of data. A “snapshot” refers to a point-in-time representation of data in the storage system 100. Different generations of snapshots refer to snapshots taken at different points in time. In case of failure, one or more generations of snapshots can be retrieved to recover lost data.
In accordance with some embodiments, “sticky” indicators can be associated with certain data items stored in the cache 120. In some embodiments, the data items associated with sticky indicators are data items associated with certain segments of the snapshot subsystem 105, such as one or more snapshot primary volumes. In certain scenarios, data items may have to be replaced (sacrificed) if the cache 120 needs additional storage space to store other data (e.g., write data associated with write requests). A sticky indicator associated with a data item in the cache is an indicator that prevents displacement of the data item in the cache according to some predefined criteria. In some embodiments, a sticky indicator can be a counter (referred to as a “reclamation escape counter”) associated with a particular data item stored in the cache. The counter can be adjusted (incremented or decremented) as the particular data item moves through a queue associated with the cache. The particular data item is not allowed to be replaced (sacrificed) until the counter has reached a predetermined value (e.g., zero or some other value).
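As one illustration of the concept, the following hedged sketch shows a cache entry carrying a sticky indicator implemented as a reclamation escape counter; the field and method names are assumptions introduced here for clarity, not part of the described embodiments.

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    data: bytes = b""
    reclamation_escape_counter: int = 0   # a value of 0 means the entry may be sacrificed

    def may_be_sacrificed(self) -> bool:
        # The entry is not allowed to be replaced until the counter reaches the
        # predetermined value (zero in this sketch).
        return self.reclamation_escape_counter == 0
```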
The snapshot subsystem 105 includes a snapshot primary volume A and an associated snapshot pool A, where the snapshot pool A includes one or more volumes. The volumes in the snapshot pool A are used to store prior versions of data that have previously been modified. A “volume” refers to a logical collection of data. Unmodified data, from the perspective of each snapshot, is maintained in the snapshot primary volume A. Multiple generations of snapshots can be maintained, with each snapshot generation made up of data that is based on a combination of unmodified data from the snapshot primary volume A and previously modified data from one or more volumes in the snapshot pool A. The persistent storage can have multiple snapshot primary volumes, with another snapshot primary volume B and associated snapshot pool B illustrated in the example of
As further depicted in
The storage system 100 is also accessible by an administrative station 110, which can also be implemented with a computer. The administrative station 110 is used to control various settings associated with the storage system 100. In accordance with some embodiments, settings that can be adjusted by the administrative station 110 include settings related to which data items of the cache 120 are to be associated with sticky indicators.
In some embodiments, the user at the administrative station 110 can indicate that cached data items for one or more of the snapshot primary volumes in the snapshot subsystem 105 are to be associated with sticky indicators. The setting can be a global setting that indicates that cached data items for all snapshot primary volumes are to be associated with sticky indicators. Alternatively, a user can selectively indicate that a single one or some subset of the snapshot primary volumes is to be associated with sticky indicators.
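A possible (purely illustrative) way such a setting could be consulted is sketched below; the settings keys and the function name are assumptions, not part of any actual administrative interface.

```python
def should_use_sticky_indicator(settings, volume_id, snapshot_primary_volumes):
    """Decide whether cached data items of volume_id are to get a sticky indicator."""
    if settings.get("all_snapshot_primary_volumes_sticky", False):
        return volume_id in snapshot_primary_volumes            # global setting
    return volume_id in settings.get("sticky_volumes", set())   # selectively chosen subset
```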
As depicted in the example of
The host write data remains in the dirty queue 202 until the write data is destaged to the persistent storage 104. After destaging, the duplexed write data are moved to the clean queue 204 and free queue 206, with one copy of the write data re-linked onto the clean queue 204, and the other copy of the write data re-linked onto the free queue 206. Read requests can be satisfied from the clean queue 204 (a cache hit for a read request). The clean queue 204 is also referred to as a read queue. Any data in the free queue 206 can be overwritten.
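The following rough sketch, using simple deques and assumed entry attributes (addr, data, duplex_copy), illustrates the destage flow just described: dirty entries are written to persistent storage, after which one copy is re-linked onto the clean queue and the duplexed copy onto the free queue.

```python
from collections import deque

def destage(dirty_queue: deque, clean_queue: deque, free_queue: deque, persistent_storage):
    """Destage dirty entries, then re-link the duplexed copies onto the clean and free queues."""
    while dirty_queue:
        entry = dirty_queue.pop()                   # oldest (tail) dirty entry first
        persistent_storage.write(entry.addr, entry.data)
        clean_queue.appendleft(entry)               # one copy can now satisfy read cache hits
        free_queue.appendleft(entry.duplex_copy)    # the other copy may be overwritten
```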
As depicted in the example of
When no available space exists in the free queue 206, then the LRU entry of the clean queue 204 can be sacrificed to the free queue 206 to be overwritten (replaced) with new host write data. Such a replacement algorithm is an LRU replacement algorithm. In another implementation, another type of replacement algorithm can be used. Sacrificing an entry of the clean queue 204 to the free queue 206 means that such entry of the clean queue 204 is logically linked to the free queue 206.
The queues illustrated in
Note that in each of the queues 202, 204, and 206, the head entry is identified by a head pointer, where the head entry contains the newest data item (the most recently accessed data item), while the tail entry is identified by a tail pointer, where the tail entry contains the oldest data item (the least recently accessed data item). As depicted in
Using a conventional LRU replacement algorithm, if no available space exists in the free queue 206, the entry in the clean queue 204 containing the LRU data item is sacrificed for storing new write data. Sacrificing entries (and associated data items) from the clean queue 204 means that there is a reduced opportunity for a cache hit for subsequent read requests that would otherwise have been satisfied by the data items in the sacrificed entries.
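For reference, a minimal sketch of the conventional LRU behavior described above might look as follows (head at the left end of each deque, tail at the right); the function name and structure are illustrative assumptions.

```python
from collections import deque

def allocate_for_host_write(clean_queue: deque, free_queue: deque):
    """Return a cache entry that new host write data may overwrite."""
    if not free_queue and clean_queue:
        lru_entry = clean_queue.pop()        # tail of the clean queue: the LRU data item
        free_queue.appendleft(lru_entry)     # sacrificed to the free queue
    return free_queue.pop() if free_queue else None   # oldest free entry is reused
```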
As depicted in
For example, there may be multiple requests for data items associated with a snapshot primary volume (requests for normal data as well as requests for snapshot data) pending at various times. Moreover, a host write to a snapshot primary volume may involve a read of a snapshot primary volume (in addition to a write to a snapshot pool volume and a write to the snapshot primary volume). In the above scenarios, retaining snapshot-related data items in the cache 120 for a longer period of time would tend to significantly enhance the storage system performance since it increases cache hits for read requests.
Use of the sticky indicators provides for a modified LRU algorithm, which takes into account values of the sticky indicators when deciding whether or not a data item that has reached a sacrifice point of the clean queue should be sacrificed.
In other implementations, note that the management area 304 can also include a backwards pointer field to store a backwards pointer to point to a previous data item.
A data item in the clean queue 204 moves from the head of the queue toward the tail as read requests are received and processed. A cache hit in the clean queue 204 will result in the corresponding data item (the data item that provided the cache hit) being moved to the head of the clean queue 204. When a data item moves to the tail of the clean queue 204 (or some other predefined sacrifice point of the clean queue 204), the clean queue entry containing the data item becomes a candidate for sacrificing to the free queue 206. However, if a data item is associated with a sticky indicator 308, then the clean queue entry containing the data item is not sacrificed even though the data item has reached the sacrifice point (e.g., tail) of the clean queue 204, unless the sticky indicator has reached a predefined value. In the case where the sticky indicator 308 is a reclamation escape counter, this means that the data item associated with the sticky indicator is not sacrificed unless the counter has decremented to zero, for example (or has otherwise counted to some other predefined value). Each time such a data item reaches the sacrifice point of the clean queue 204, the counter is decremented (or otherwise adjusted). For example, a reclamation escape counter having a starting value of X would allow the data item to make X more trips through the queue before the clean queue entry containing the data item becomes a candidate for sacrificing. As a result, this data item would be approximately X times more likely to provide a read cache hit.
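One way the modified check at the sacrifice point could be sketched is shown below; where the surviving entry is re-linked (here, back to the head of the clean queue) and the counter's starting value are assumptions made for illustration, not a definitive implementation.

```python
from collections import deque

def maybe_sacrifice_tail(clean_queue: deque, free_queue: deque) -> bool:
    """Examine the entry at the clean queue's sacrifice point (tail) and either
    sacrifice it to the free queue or, if its reclamation escape counter is still
    non-zero, decrement the counter and give it another trip through the queue."""
    if not clean_queue:
        return False
    candidate = clean_queue.pop()                    # entry at the sacrifice point
    if getattr(candidate, "reclamation_escape_counter", 0) > 0:
        candidate.reclamation_escape_counter -= 1    # adjust the counter on each pass
        clean_queue.appendleft(candidate)            # re-link; not sacrificed this time
        return False
    free_queue.appendleft(candidate)                 # counter exhausted: sacrifice
    return True
```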
In the foregoing discussion, reference has been made to a data item “moving” through a queue. According to some implementations, instead of actually moving data items through the queue, it is the head pointer and tail pointer that are updated (1) based on accesses of the queue, and (2) for the clean queue 204, also based on whether the sticky indicator field 308 has reached a predetermined value. Thus, moving a data item in a queue can refer to either physically moving the data item in the queue, or logically moving the data item in the queue by updating pointers or by some other mechanism.
Based on the settings, the storage controller 102 is able to associate (at 504) sticky indicators with certain data items in the cache. Thus, if the settings indicate that data items of a particular snapshot primary volume are to be associated with sticky indicators, then when a data item associated with the particular snapshot primary volume is retrieved into the clean queue 204 of the cache 120, the storage controller 102 will associate a sticky indicator with that data item.
The storage controller 102 updates (at 506) the sticky indicators as the corresponding data items move through the clean queue 204. More specifically, as a data item associated with a sticky indicator moves to a sacrifice point (e.g., tail) of the clean queue 204, the sticky indicator is updated (e.g., a reclamation escape counter is decremented).
The storage controller sacrifices (at 508) a particular data item that is associated with a sticky indicator if the sticky indicator has a predefined value (e.g., reclamation escape counter has decremented to 0) and the data item has reached the sacrifice point of the clean queue. A particular data item being sacrificed refers to an entry of the clean queue being sacrificed to the free queue.
However, a data item associated with a sticky indicator is not sacrificed if the sticky indicator has not reached the predetermined value even though the data item has reached the sacrifice point.
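Tying the steps together, a rough sketch (under the same illustrative assumptions as the earlier sketches) of step 504 might look as follows; the starting value of the counter is an assumed policy value, not taken from the description above.

```python
def associate_and_insert(entry, volume_id, clean_queue, settings, sticky_volumes,
                         initial_count=3):
    # (504) associate a sticky indicator (a reclamation escape counter) with data
    # items of the snapshot primary volumes selected in the settings; initial_count
    # is an assumed starting value for illustration only
    if settings.get("all_snapshot_primary_volumes_sticky", False) or volume_id in sticky_volumes:
        entry.reclamation_escape_counter = initial_count
    clean_queue.appendleft(entry)   # newly retrieved data items enter at the head
```

Steps 506 and 508 would then be carried out each time space is needed in the free queue, by decrementing the counter of the entry at the sacrifice point and sacrificing the entry only once the counter has reached the predefined value, as in the earlier modified-LRU sketch.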
Instructions of software described above (including software of the storage controller 102, for example) are loaded for execution on a processor. The processor includes microprocessors, microcontrollers, processor modules or subsystems (including one or more microprocessors or microcontrollers), or other control or computing devices. As used here, a “processor” can refer to a single component or to plural components.
Data and instructions (of the software) are stored in respective storage devices, which are implemented as one or more computer-readable or computer-usable storage media. The storage media include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; and optical media such as compact disks (CDs) or digital video disks (DVDs).
In the foregoing description, numerous details are set forth to provide an understanding of the present invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these details. While the invention has been disclosed with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover such modifications and variations as fall within the true spirit and scope of the invention.