A storage system can include a persistent storage medium to store data. A persistent storage medium is a storage medium that maintains data stored in the storage medium even if power to the system is lost or removed. For improved performance, the storage system can also include a write cache to temporarily store write data (associated with a write operation) that has not yet been written to the persistent storage medium. The write operation can be in response to a request from a remote requesting device. For improved performance, the storage system can send an acknowledgement of the write request to the requesting device after the write data has been stored to the write cache, but prior to storing the write data to the persistent storage medium.
Some implementations of the present disclosure are described with respect to the following figures.
A storage system can include a storage device or an arrangement of storage devices. In examples of storage systems that include multiple storage devices, the storage devices can be arranged as an array of storage devices, such as an array of disk drives or other types of storage devices, including solid state storage devices. Generally, the storage device(s) include persistent storage media.
In the ensuing discussion, reference is made to a “persistent storage medium” of a storage system, which can refer to either the persistent storage medium of one storage device, or the persistent storage media of multiple storage devices.
For improved throughput of the storage system, a write cache can be provided in the storage system. The write cache is used to temporarily store write data (as well as any metadata associated with the write data) for a write operation that is responsive to a write request received from a remote requesting device over a network. The storage system can send an acknowledgement of successful completion of the write request to the requesting device over the network, even if the write data and associated metadata for the write request have not yet been written to a persistent storage medium. Because the storage system does not have to wait until the write data (and any associated metadata) is written to the persistent storage medium before sending the acknowledgement, the requesting device receives the acknowledgement sooner, which provides an appearance of faster operation of the storage system.
The write cache can be implemented with a memory, such as a dynamic random access memory (DRAM), a static random access memory (SRAM), or other type of memory, that has an access speed that is greater than an access speed of the persistent storage medium of the storage system. Generally, a DRAM or SRAM is a volatile memory that can lose data stored in the memory if power is removed from the memory. Thus, if a storage system including a write cache implemented with volatile memory were to lose power unexpectedly, the content of the write cache can be lost. This can result in a data inconsistency condition, since an acknowledgement of successful completion of a write request has already been sent to the requesting device even though write data associated with the write request has not yet been stored in the persistent storage medium of the storage system. In such a scenario, the requesting device expects the write data for the write request to be persistently stored in the storage system, even though such data is lost if the content of the write cache in the storage system is lost.
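The inconsistency described above can be illustrated with a toy model. This is only a sketch: the class `WriteBackStorage` and its method names are hypothetical and do not come from the disclosure, and Python dictionaries stand in for the volatile write cache and the persistent storage medium.

```python
class WriteBackStorage:
    """Toy model that acknowledges writes once they reach a volatile cache."""

    def __init__(self):
        self.cache = {}       # volatile write cache (lost on power loss)
        self.persistent = {}  # persistent storage medium

    def write(self, block, data):
        # The data is only in the cache when the acknowledgement is returned.
        self.cache[block] = data
        return "ACK"

    def flush(self):
        # Move cached write data to the persistent storage medium.
        self.persistent.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        # Without backup power, the volatile cache content disappears.
        self.cache.clear()


storage = WriteBackStorage()
ack = storage.write(7, b"payload")
storage.power_loss()

# The requester holds an acknowledgement for data that no longer exists.
lost = ack == "ACK" and 7 not in storage.persistent
```

In this model the write is acknowledged but never reaches `persistent`, which is the condition a power backup arrangement is meant to prevent.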
To address the issue of lost data when power is unexpectedly lost in a storage system, a power backup arrangement can be provided at the storage system. The power backup arrangement includes a backup power source and any associated component(s) that is (are) used to provide temporary power for some time duration to allow the content of a write cache to be preserved in case of loss of power to the storage system.
Different storage systems (such as different generations of storage systems or different versions of storage systems) may employ different types of power backup arrangements. For example, a first type of storage system can use a first type of power backup arrangement in which the backup power source is to supply power to a first memory (which can be volatile memory) but not to certain other components of the system, such as a processor and possibly another memory in the system. Using a backup power source to supply power to the volatile memory effectively provides a non-volatile memory (the combination of the backup power source and the volatile memory constitutes the non-volatile memory), since the backup power source can be used to temporarily provide power to the volatile memory to prevent the volatile memory from losing its data so long as the backup power source is able to continue to supply power to the volatile memory.
In some examples, the backup power source can include a battery arrangement (that includes one battery or multiple batteries). In other examples, the backup power source can include a capacitor arrangement (including a capacitor or multiple capacitors) whose charge can be used to temporarily power the volatile memory. With the first type of power backup arrangement, the backup power source (e.g. a battery arrangement or capacitor arrangement) can have a smaller power capacity that is able to supply power for a relatively short period of time. Using a smaller capacity backup power source can reduce the overall cost of a storage system as compared to the cost associated with use of a larger capacity power source that can be used to power more components and can last a longer period of time.
As further examples, a second type of storage system can include a second type of power backup arrangement, which can include a larger capacity backup power source (e.g. a larger capacity battery arrangement or larger capacity capacitor arrangement) to supply power to a larger number of components, including a first memory (which can be implemented with volatile memory), a processor of the storage system, and possibly other components in the storage system.
Although reference is made to two different types of backup power arrangements, it is noted that in other examples, more than two different types of backup power arrangements can be selectively used in different types of storage systems.
Depending upon which type of power backup arrangement is used in a storage system, different powerloss data protection techniques can be used in the storage system. A powerloss data protection technique can refer to a technique to protect against loss of data in a write cache when power to the storage system is lost. For example, a first powerloss data protection technique is used if the storage system uses the first type of power backup arrangement, while a second powerloss data protection technique is used if the storage system employs the second type of power backup arrangement. Different powerloss data protection techniques can employ different respective processes to protect data in the write cache in response to detecting loss of power to the storage system.
In scenarios where a manufacturer of storage systems can build different types of storage systems that employ respective different types of power backup arrangements, it may not be desirable to provide different storage system code (in the form of machine-readable instructions) for the different storage systems depending upon which type of power backup arrangement is employed by each specific storage system. A storage system code can refer to software or firmware (or more generally, machine-readable instructions) that manages various operations of a storage system, including saving data to a persistent storage medium, storing data in a write cache, exchanging communications with a remote requesting device, recovering from a power loss to the storage system, and so forth.
In some examples, a first storage system code can be used for storage systems that employ the first type of power backup arrangement, while a second, different storage system code can be used in a storage system that employs the second type of power backup arrangement. Having to write and maintain different storage system code for different storage systems can increase manufacturing complexity and cost.
In accordance with some implementations of the present disclosure, a common storage system code that is capable of supporting any of different types of power backup arrangements can be used in storage systems. In some examples, the storage system code can perform a process according to
Responsive to determining which of the different types of power backup arrangements is used in the storage system, the storage system code dynamically selects (at 104), from among different powerloss data protection techniques corresponding to the different types of power backup arrangements, a selected powerloss data protection technique to use in saving write data that has not yet been written to a persistent storage medium of the storage system. Such write data may have been written to the write cache, but has not yet been stored to the persistent storage medium from the write cache.
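One way to picture the dynamic selection is a lookup from the detected arrangement type to a technique. The following is a minimal sketch under stated assumptions: the enum `BackupArrangement`, the functions `first_technique` and `second_technique`, and the table `TECHNIQUES` are hypothetical names introduced for illustration, and how the arrangement type is actually detected is not modeled.

```python
from enum import Enum


class BackupArrangement(Enum):
    # First type: backup power reaches the cache memory but not the processor.
    MEMORY_ONLY = 1
    # Second type: backup power reaches the cache memory and the processor.
    MEMORY_AND_PROCESSOR = 2


def first_technique():
    return "first powerloss data protection technique"


def second_technique():
    return "second powerloss data protection technique"


# Hypothetical table mapping each arrangement type to its technique.
TECHNIQUES = {
    BackupArrangement.MEMORY_ONLY: first_technique,
    BackupArrangement.MEMORY_AND_PROCESSOR: second_technique,
}


def select_technique(arrangement):
    """Return the technique that corresponds to the detected arrangement."""
    try:
        return TECHNIQUES[arrangement]
    except KeyError:
        raise ValueError(f"unsupported arrangement: {arrangement!r}") from None


selected = select_technique(BackupArrangement.MEMORY_AND_PROCESSOR)
```

Keeping the mapping in one table means that supporting a further arrangement type would only require a new entry, which matches the goal of a single common storage system code.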
To enhance efficiency associated with using the different powerloss data protection techniques, each of the different powerloss data protection techniques is responsive to a loss of power of a storage system by performing a save of the write data according to a common format to a specified storage location (such as a recovery storage medium); in other words, the format of the saved write data as saved by each of the different powerloss data protection techniques is the same. The common format used to save the write data organizes the write data as well as metadata associated with the write data in a specific manner. For example, the write data and metadata can be divided into distinct sections or have other organizations according to the specified format.
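As an illustration of a common format with distinct sections, the sketch below packs a metadata section and a write data section behind a fixed-size header. The layout is an assumption made for this example only: the marker `PLSV`, the header fields, and the use of JSON for metadata are hypothetical and are not specified by the disclosure.

```python
import json
import struct

MAGIC = b"PLSV"                  # hypothetical marker for a saved image
HEADER = struct.Struct("<4sII")  # marker, metadata length, write data length


def pack_image(metadata, write_data):
    """Serialize metadata and write data into one image with two sections."""
    meta = json.dumps(metadata, sort_keys=True).encode("utf-8")
    header = HEADER.pack(MAGIC, len(meta), len(write_data))
    return header + meta + write_data


def unpack_image(image):
    """Split an image back into its metadata and write data sections."""
    magic, meta_len, data_len = HEADER.unpack_from(image)
    if magic != MAGIC:
        raise ValueError("not a saved write cache image")
    meta_start = HEADER.size
    data_start = meta_start + meta_len
    metadata = json.loads(image[meta_start:data_start].decode("utf-8"))
    write_data = image[data_start:data_start + data_len]
    return metadata, write_data


image = pack_image({"volume": 3, "block": 42}, b"cached bytes")
metadata, write_data = unpack_image(image)
```

Because every technique would write the same layout, a single reader such as `unpack_image` can serve the recovery path regardless of which technique produced the image.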
Although reference is made to storing write data in a write cache and writing such write data from the write cache to the persistent storage medium of the storage system, it is noted that the write cache can also store metadata that is to be written to the persistent storage medium. Metadata can refer to information associated with the write data and that relates to a feature or multiple features of the storage system. For example, a feature of a storage system can include thin provisioning, in which the storage system, which can be shared by multiple requesting devices, provides an appearance of having more storage capacity than the storage system has. With thin provisioning, the storage resources of the storage system are allocated on demand, instead of up front. Metadata related to thin provisioning can include information (such as information of a file system) that keeps track of storage resources that have been allocated.
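The thin provisioning metadata described above can be sketched as a mapping that grows only when a logical block is first written. The class `ThinVolume` and its fields are hypothetical and serve only to illustrate on-demand allocation; a real storage system would track allocations in its own structures.

```python
class ThinVolume:
    """Toy thin-provisioned volume that allocates physical blocks on demand."""

    def __init__(self, advertised_blocks, physical_blocks):
        self.advertised_blocks = advertised_blocks  # capacity shown to requesters
        self.free = list(range(physical_blocks))    # unallocated physical blocks
        self.mapping = {}                           # metadata: logical -> physical

    def write(self, logical_block):
        if not 0 <= logical_block < self.advertised_blocks:
            raise IndexError("logical block outside advertised capacity")
        if logical_block not in self.mapping:
            if not self.free:
                raise RuntimeError("out of physical capacity")
            # Allocate only when the block is written for the first time.
            self.mapping[logical_block] = self.free.pop(0)
        return self.mapping[logical_block]


volume = ThinVolume(advertised_blocks=1000, physical_blocks=4)
first = volume.write(900)   # allocates a physical block
again = volume.write(900)   # reuses the existing allocation
```

The `mapping` dictionary plays the role of the metadata: if it were lost with the write cache, the system could no longer tell which physical blocks hold which logical blocks.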
Another example feature of a storage system can include a snapshot feature, where a snapshot stores a version of data at a respective point in time. The snapshot is part of backup data that can be used to recover data of the storage system in case of data corruption or storage system error or failure. The metadata related to snapshots can track pieces of data stored in the snapshots.
In further examples, the metadata can relate to other features of the storage system.
The write data and associated metadata according to the common format can be saved to a recovery storage medium, which can be persistent in that the recovery storage medium can maintain its stored content even when power in the storage system is removed. The recovery storage medium can include a disk drive (or disk drives), a solid state drive (or solid state drives), or any other type of storage medium that can store data persistently. The recovery storage medium can be used as a staging storage region to store write data and any associated metadata during a recovery process for recovering data to the persistent storage medium after a power loss to the storage system has occurred.
In response to loss of power in the storage system, write data (and any metadata) in the write cache is stored (at 106) according to the common format using the selected powerloss data protection technique to the recovery storage medium.
In response to recovery of power in the storage system, the write data in the recovery storage medium is used (at 108) to recover the write data (and any metadata) of the write cache.
The storage system 200 includes a backup power source 202 (e.g. a battery arrangement including a battery or multiple batteries) that is able to provide power to a write cache memory 204, which can be implemented as a volatile memory. A combination of the backup power source 202 and the write cache memory 204 can be considered a non-volatile memory. In some examples, the write cache memory 204 can be implemented as a memory module that includes multiple memory devices, such as multiple DRAM devices. An example of such a memory module can be a dual inline memory module (DIMM), although other types of memory modules can be employed. In yet further examples, the write cache memory 204 can be implemented as a single standalone memory device, or as a group of memory devices arranged on a circuit board.
The storage system 200 further includes a recovery storage medium 205, in which write data in the write cache memory 204 can be saved according to the common format as discussed above for the purpose of recovering the write data (and any associated metadata) in response to loss of power to the storage system 200.
The storage system 200 also includes a processor 206. Although just one processor is shown in
The machine-readable instructions stored in the machine-readable storage medium 208 include power backup arrangement determination instructions 210 to determine a given type of power backup arrangement from among different types of power backup arrangements used by the storage system 200. The different types of power backup arrangements can include a first type of power backup arrangement that uses the backup power source 202 to power the first memory 204 but not the processor 206, and a second type of power backup arrangement that uses the backup power source 202 to power the first memory 204 and the processor 206.
The machine-readable instructions can also include powerloss data protection technique selection instructions 212 that can, based on the determined given type of power backup arrangement used by the storage system, dynamically select, from among different powerloss data protection techniques corresponding to the different types of power backup arrangements, a selected powerloss data protection technique to use, in response to a loss of power to the storage system 200, in storing write data in the write cache memory 204 to the recovery storage medium 205 according to a specified format that is in common with a format used by another of the different powerloss data protection techniques in storing write data responsive to a power loss.
The machine-readable instructions further include write data recovery instructions 214 to recover the write data of the write cache memory 204 using the write data stored to the recovery storage medium according to the specified format.
The power backup arrangement determination instructions 210, the powerloss data protection technique selection instructions 212, and the write data recovery instructions 214 can be part of the storage system code discussed above that performs tasks according to
In
The storage system 300 further includes the machine-readable storage medium 208 that stores storage system code 308. Since the storage system 300 uses the first type of power backup arrangement, the storage system code 308 executable on the processor 206 dynamically selects the first powerloss data protection technique to protect write data and associated metadata in the DIMM 302 against a power loss. In some examples, since the DIMM 302 is backed up by the backup power source 202, the first powerloss data protection technique employed by the storage system code 308 stores write data (and associated metadata) for write operations requested by a remote requesting device to the DIMM 302, but not to another memory that is not power backed by the backup power source 202.
In response to a loss of power to the storage system 300, while power from the backup power source 202 is available to the DIMM 302, the microcontroller 304, and the flash memory 306, the microcontroller 304 can automatically transfer data (including write data and associated metadata) from the DIMM 302 to the flash memory 306.
After data has been transferred from the DIMM 302 to the flash memory 306, data is safely held in the flash memory 306 until power is restored to the storage system 300. In this way, the data in the DIMM 302 would not be lost due to the loss of power to the storage system 300. The backup power source 202 can last for some specified amount of time during which the microcontroller 304 can perform the transfer of data from the DIMM 302 to the flash memory 306.
Later, in response to detecting that power is restored to the storage system 300, the microcontroller 304 can transfer the data from the flash memory 306 back to the DIMM 302.
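The transfer from the DIMM to the flash memory and back can be modeled in a few lines. This is a simplified sketch: the class `BackedDimm` and its event methods are hypothetical, and the timing constraint of the backup power source is not modeled.

```python
class BackedDimm:
    """Toy model of a DIMM paired with flash and a transfer controller."""

    def __init__(self):
        self.dimm = bytearray()  # volatile cache content
        self.flash = b""         # non-volatile copy

    def on_power_loss(self):
        # Copy the DIMM content to flash while backup power is available.
        self.flash = bytes(self.dimm)
        # Once backup power is exhausted, the volatile content is gone.
        self.dimm = bytearray()

    def on_power_restored(self):
        # Copy the saved content from flash back into the DIMM.
        self.dimm = bytearray(self.flash)


module = BackedDimm()
module.dimm = bytearray(b"write data and metadata")
module.on_power_loss()
module.on_power_restored()
restored = bytes(module.dimm)
```

After the two events, `restored` holds the same bytes that were in the DIMM before the power loss, so a later recovery step can work from the DIMM content.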
After data has been transferred from the flash memory 306 to the DIMM 302, a data recovery process can continue. For example, the data in the DIMM 302 can be saved according to the common format to the recovery storage medium 205, and the data in the recovery storage medium 205 can be processed to cause the data (including write data and metadata of the write cache) to be written to the disk array 310. During the recovery process, the metadata in the recovery storage medium 205 can be processed to determine how respective portions of write data in the recovery storage medium 205 are to be stored to recover the storage system 300 to a state prior to the power loss. For example, if redundancy such as redundant array of independent disks (RAID) redundancy is employed, then the metadata can be used to determine how write data can be saved from the recovery storage medium 205 to the disk array 310 in a manner consistent with the redundancy scheme employed. Similarly, if the write data relates to snapshots, then the metadata can specify how the write data is to be stored to the disk array 310 as respective snapshots.
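The role of metadata during recovery can be sketched as a replay loop that uses each metadata entry to place a portion of the saved write data. The entry fields `block`, `offset`, and `length`, and the function `replay`, are hypothetical; redundancy schemes and snapshots are not modeled, and a dictionary stands in for the disk array.

```python
def replay(metadata_entries, write_data, disk):
    """Write each saved portion of write data to the block its metadata names."""
    for entry in metadata_entries:
        start = entry["offset"]
        end = start + entry["length"]
        disk[entry["block"]] = write_data[start:end]
    return disk


# Hypothetical saved content: two cached writes laid out back to back.
entries = [
    {"block": 10, "offset": 0, "length": 4},
    {"block": 11, "offset": 4, "length": 4},
]
disk_array = replay(entries, b"AAAABBBB", {})
```

Here the metadata alone determines where each portion lands, which is why the write data and its metadata are saved together.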
In examples according to
Since the storage system 400 uses the second type of power backup arrangement, the storage system code 308 executable on the processor 206 dynamically selects the second powerloss data protection technique to use in protecting write data and associated metadata in the write cache (including the DIMM 302 and the second memory 402) against loss of power to the storage system 400.
The larger capacity backup power source 202 in the storage system 400 can provide power to more components than the smaller capacity backup power source 202 in the storage system 300 of
Once power is restored to the storage system 400, the content of the recovery storage medium 205 can be used to recover the write data and the associated metadata for writing to the disk array 310, in the manner discussed above in connection with
With the first powerloss data protection technique that is used with the storage system 300 of
In contrast, with the second powerloss data protection technique that is used with the storage system 400 of
The machine-readable instructions further include write data storing instructions 504 to, in response to loss of power in the storage system, store, using the selected powerloss data protection technique, write data corresponding to write operations in the storage system according to the common format to a recovery storage medium.
The machine-readable instructions further include recovery instructions 506 to recover from the loss of power of the storage system by using the write data according to the common format stored using the selected powerloss data protection technique.
The storage medium 208 of
In the foregoing description, numerous details are set forth to provide an understanding of the subject disclosed herein. However, implementations may be practiced without some of these details. Other implementations may include modifications and variations from the details discussed above. It is intended that the appended claims cover such modifications and variations.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2016/030042 | 4/29/2016 | WO | 00 |