Field
The present application is generally related to storage systems and, more specifically, to input/output (I/O) processing methods of storage systems.
Related Art
The atomic write operation is used in the related art for the purpose of reducing overhead in the OS or middleware and the number of I/Os to the flash memory.
Atomic write reduces the number of write operations by bundling two or more write operations into one write operation. Additionally, atomic write assures that the write is performed in an all or nothing manner for two or more write operations.
In the related art, atomic write is utilized for Solid State Drives (SSDs) and is realized by the flash translation layer (FTL) of the SSD.
Reducing the number of write operations to the flash memory improves the flash memory's endurance.
Because the FTL assures the all or nothing write operation, the SSD does not overwrite the old data targeted by the write operation before the operation completes. However, storage systems do not presently have this feature, and the methods utilized in the FTL cannot be applied to a storage system as they are.
For example, many types of storage media can be installed in a storage system. Examples of storage media installed in the storage system include SSDs supporting the atomic write operation, SSDs that do not support the atomic write operation, and Hard Disk Drives (HDDs). Related art storage systems cannot determine which media support the atomic write operation. Therefore, the atomic write operation is not utilized in related art storage systems.
Aspects of the present application may include a storage system, which may involve a storage device; and a controller with a cache unit. The controller may be configured to manage a status in which a first data and a second data corresponding to an atomic write command are stored in the cache unit, and a third data and a fourth data are maintained in the storage system, the third data being a previous data to be updated by the first data, and the fourth data being a previous data to be updated by the second data; and may be further configured to handle the atomic write command such that the status is maintained until the controller stores a plurality of data corresponding to the atomic write command in the cache unit.
Aspects of the present application may also include a method, which may involve managing a status in which a first data and a second data corresponding to an atomic write command are stored in a cache unit, and a third data and a fourth data are maintained in a storage system, the third data being a previous data to be updated by the first data, and the fourth data being a previous data to be updated by the second data; and handling the atomic write command such that the status is maintained until a plurality of data corresponding to the atomic write command is stored in the cache unit.
Aspects of the present application may also include a computer readable storage medium storing instructions for executing a process. The instructions may involve managing a status in which a first data and a second data corresponding to an atomic write command are stored in a cache unit, and a third data and a fourth data are maintained in a storage system, the third data being a previous data to be updated by the first data, and the fourth data being a previous data to be updated by the second data; and handling the atomic write command such that the status is maintained until a plurality of data corresponding to the atomic write command is stored in the cache unit.
Some example implementations are described with reference to drawings. Any example implementations that are described herein do not restrict the inventive concept in accordance with the claims, and one or more elements that are described in the example implementations may not be essential for implementing the inventive concept.
In the following descriptions, the process is described while a program is handled as a subject in some cases. A program is executed by a processor to perform the predetermined processing operations. Consequently, the subject of the processing can also be regarded as the processor. The processing that is disclosed while a program is handled as a subject can also be a process that is executed by a processor that executes the program or by an apparatus that is provided with the processor (for example, a control device, a controller, and a storage system). Moreover, a part or the whole of a process that is executed when the processor executes a program can also be executed by a hardware circuit as a substitute for or in addition to a processor.
The instructions for the program may be stored in a computer readable storage medium, which includes tangible media such as flash memory, random access memory (RAM), HDD and the like. Alternatively, instructions may be stored in the form of a computer readable signal medium, which includes non-tangible media such as carrier waves.
Example implementations described herein are directed to protocols for facilitating an atomic write command in a storage system. The storage system may maintain a status in which data corresponding to an atomic write command is stored in a cache unit for writing to the storage devices of the storage system, while the old data is maintained in the storage system. The status can be maintained until all of the data corresponding to the atomic write command has been stored in the cache unit.
As atomic write commands may involve one or more write locations on one or more storage devices, multiple data streams may be used in the atomic write command. In such a case, the cache unit is configured such that the multiple data corresponding to the atomic write command is stored in the cache unit, and the original old data across the one or more storage devices is maintained until all of the multiple data streams have been stored in the cache unit. After that, the data may be destaged to the storage devices.
By maintaining such a status for the storage system, the all or nothing feature of the atomic write command can be realized, because the old data in the storage system is not overwritten until processing of the data into the cache is complete. If the processing fails, then the write data stored in the cache can be discarded without being written to the storage devices of the storage system.
Processor 203 executes processing by executing programs stored in storage program 208. Moreover, the processor 203 executes processing by using information stored in storage control information 207.
Disk I/F 204 is coupled to at least one HDD 206 as an example of a physical storage device via a bus. For example, a volume 205 for managing data is formed from at least one storage region of the HDD 206. The physical storage device is not restricted to an HDD 206 and can also be an SSD or a Digital Versatile Disk (DVD). Moreover, multiple HDDs 206 can be grouped into a parity group, and a high reliability technique such as RAID (Redundant Array of Independent Disks) can also be used.
Storage control information 207 stores a wide variety of information used by a wide variety of programs. Storage program 208 stores a wide variety of programs, such as a read processing program or a write processing program. Cache unit 201 caches the data stored in HDD 206 in order to boost performance.
The storage write program 230 is a program to receive a write command from the server 100 and store the write data in the storage system. The cache allocation program 231 is a program to allocate a cache area for read and write commands from the server. The destage program 232 writes the data from the cache unit 201 to the HDD 206. The destage program 232 is called by other programs and is also executed periodically. As described above, the programs can contain instructions stored in a computer readable storage medium or a computer readable signal medium for execution by a processor or controller. Further details of these programs are described below.
The destage flag indicates whether the data in the cache unit 201 is to be destaged (i.e., written) to the HDD 206. If the value of the destage flag is OFF, the data will not be written to the HDD 206; if the value is ON, the data will be written to the HDD 206 by the destage program 232. In this example, the data is structured as a table. Generally, a tree structure is used for cache management; however, example implementations described herein are not limited to any particular data structure for cache management, and other data structures known in the art may be substituted therefor.
“Cache free” is at the head of the queue and indicates which caches are free. In this example, cache addresses 1024, 1536, 2560 and so on are free.
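The table-plus-free-queue arrangement can be illustrated with a minimal Python sketch. The class names, field names, and dictionary keyed by (volume ID, address) are illustrative assumptions rather than structures defined by the storage system; only the free addresses 1024, 1536 and 2560 come from the example above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CacheEntry:
    volume_id: int
    address: int         # address within the volume
    cache_address: int   # address of the segment in cache unit 201
    destage_flag: bool   # True means "ON": the destage program may write it


class CacheDirectory:
    """Hypothetical cache management table with a "Cache free" queue."""

    def __init__(self, free_addresses):
        self.free_queue = deque(free_addresses)  # head = next free segment
        self.table = {}                          # (volume_id, address) -> entry

    def allocate(self, volume_id, address, destage_flag=True):
        # Take the segment at the head of the free queue.
        if not self.free_queue:
            raise MemoryError("no free cache segment")
        entry = CacheEntry(volume_id, address, self.free_queue.popleft(), destage_flag)
        self.table[(volume_id, address)] = entry
        return entry

    def release(self, volume_id, address):
        # Return the segment to the tail of the free queue.
        entry = self.table.pop((volume_id, address))
        self.free_queue.append(entry.cache_address)
```

With `CacheDirectory([1024, 1536, 2560])`, the first allocation receives segment 1024, and releasing it appends 1024 to the tail of the free queue.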
Then, the program calls the cache allocation program to allocate the cache area (S101). After the allocation, the program notifies the server 100 that the storage system can receive the write data (S102). The program then receives the write data and stores it in the allocated cache area (S103). Next, the program confirms whether un-transferred write data remains in the server (S104). This confirmation can be realized by using the write data length information received in the first step. If write data remains, the program returns to S101. Otherwise, the program sends the completion message to the server 100 and terminates the processing (S105).
If not all of the write data of the atomic write command can be transferred because of a failure of the server, network switch, or cable, the part of the write data that has already been written to the cache area should be deleted.
However, a portion of the write data may already have overwritten the old data, in which case deleting only that portion of the write data is difficult. Besides failures of the server, network, or cable, there are also situations in which the data cannot be written to the cache area or HDD in the storage system due to various other obstacles known in the art. Furthermore, the server may direct cancellation after a portion of the write data has been written.
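The S101 to S105 loop can be sketched as follows, assuming the write command carries the total write length and the server delivers the data as a sequence of chunks. `write_non_atomic`, `server_chunks`, and `allocate` are hypothetical stand-ins for the storage write program, the server interface, and the cache allocation program.

```python
def write_non_atomic(total_length, server_chunks, allocate):
    """Sketch of S101-S105: each received chunk is stored in the cache at once."""
    chunks = iter(server_chunks)
    areas = []
    received = 0
    while received < total_length:   # S104: un-transferred write data remains?
        area = allocate()            # S101: allocate a cache area
        # S102: "transfer ready" notification to the server (not modeled)
        data = next(chunks)          # S103: receive the write data ...
        area.extend(data)            # ... and store it in the allocated area
        areas.append(area)
        received += len(data)
    return "complete", areas         # S105: completion message
```

Because every chunk lands in the cache as soon as it arrives, a transfer that stops after the first chunk leaves that chunk in the cache, which is the situation addressed below.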
Described below are three example implementations to assure the application of the all or nothing feature of the atomic write.
First Example Implementation
In a first example implementation, methods are utilized to assure the all or nothing feature of the atomic write operation. In particular, cache memory installed in the storage system is utilized as described below. The first example implementation is described in
In the first example implementation, an atomic write command containing write data A and B is issued to the partial areas 301. The storage system receives data A from the server, allocates a temporary cache area 304, and stores data A in it. The storage system does not store the write data A in the partial cache area 302, so as to avoid overwriting the old data (indicated in this example as A′). The storage system then receives data B from the server and stores it in the temporary cache area 304 in the same manner as data A. After both write data A and B have been received, they are copied from the temporary cache area 304 to the cache area 302.
When a part of the write data of the atomic write command cannot be written to the cache or cannot be transferred from the server due to failure, the write data already received can be deleted by releasing the temporary cache area. Therefore, the all or nothing feature of the atomic write command can be realized.
After the allocation, the program notifies the server 100 that the storage system can receive the write data (S202). Next, the program decides whether the write processing can be continued (S203). For example, the result of the decision is “No” if the next data is not transferred within a predetermined period, if a cancellation of the write command is received, and/or if the write data cannot be written due to a failure of a storage resource. If the result of S203 is “No,” the program updates the temporary cache table to release the allocated temporary cache area 304 (S211). The write data is not written to the volume, because the copy from the temporary cache area 304 to the cache area 302 is never executed.
If the result of S203 is “Yes,” the program proceeds to S204, where it receives the write data and stores it in the allocated temporary cache area 304. Next, the program confirms whether un-transferred write data remains in the server (S205). If the result of S205 is “Yes,” the program returns to S201 and repeats the process for the next write data.
If the result of S205 is “No,” all write data is stored in the temporary cache area 304. Therefore, the program starts to copy the write data to the cache area 302 corresponding to the volume. First, the program sends the completion message to the server (S206) and calls the cache allocation program to allocate the cache areas 302 corresponding to the volume (S207). In the example in FIG. 7, two cache areas 302 are allocated. After allocation of the cache areas 302, the program copies the write data from the temporary cache area 304 to the allocated cache areas 302 (S208).
Once the write data has been copied, the temporary cache area 304 is no longer required. Thus, the program updates the temporary cache table to release the allocated temporary cache area 304 (S209): the In-Use flag is set to “OFF,” and the volume ID and address are set to “-”. Then, the program terminates the processing (S210).
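The S203 decision can be expressed as a single predicate over the three abort conditions listed above. This is a minimal sketch; the `TransferState` fields, the timeout parameter, and the use of a monotonic clock are assumptions made for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class TransferState:
    last_data_time: float          # when the previous chunk arrived
    cancel_received: bool = False  # server sent a cancellation
    resource_failed: bool = False  # cache or HDD failure detected


def can_continue(state, timeout_s, now=None):
    """S203: return False ("No") if any abort condition holds."""
    now = time.monotonic() if now is None else now
    if now - state.last_data_time > timeout_s:  # next data not transferred in time
        return False
    if state.cancel_received:                   # cancellation of the write command
        return False
    if state.resource_failed:                   # storage resource failure
        return False
    return True
```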
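The first example implementation's flow (S201 to S211) reduces to "stage everything in a private area, then copy." In this hypothetical sketch, a dictionary stands in for the cache area 302 of the volume, another for the temporary cache area 304, and `can_continue` is any callable implementing the S203 decision.

```python
def atomic_write_temp_cache(chunks, volume_cache, can_continue):
    """All-or-nothing write using a temporary cache area (S201-S211).

    chunks:       list of (volume_address, data) pairs from the server
    volume_cache: dict standing in for cache area 302 (address -> data)
    can_continue: callable implementing the S203 decision
    """
    temp_area = {}                     # temporary cache area 304 (In-Use)
    for address, data in chunks:       # S205 loop over remaining write data
        # S201/S202: allocate temporary area, send "transfer ready" (implicit)
        if not can_continue():         # S203
            temp_area.clear()          # S211: release; old data is untouched
            return "aborted"
        temp_area[address] = data      # S204: store in temporary area only
    # S206: completion message to the server (not modeled)
    for address, data in temp_area.items():  # S207/S208: allocate 302, copy
        volume_cache[address] = data
    temp_area.clear()                  # S209: release temporary area (In-Use OFF)
    return "complete"                  # S210
```

If the decision turns to "No" after data A has arrived, the volume cache still holds the old data A′, because nothing was copied out of the temporary area.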
Second Example Implementation
The second example implementation is directed to methods to utilize the atomic write operation efficiently in the storage system, and is described in
When a part of the write data of the atomic write command cannot be written to the cache or cannot be transferred from the server due to a failure or the like, the write data already received can be deleted by releasing the cache area 302, because the old data is not included in the cache area 302. However, the data A may be written to the HDD 206 before the data B is received, in which case the old data of A would be overwritten. To avoid this overwriting, the second example implementation defers the destaging of the data A until the data B is received. Thus, the all or nothing feature of the atomic write command can be realized.
The program calls the cache allocation program to allocate the cache area (S304). During this allocation, the cache management table 220 is updated and the value in the destage flag field is set to “OFF”. If the value of the destage flag is “OFF,” the destage program does not execute the destage processing for the data. Therefore, the destaging of data A before receiving data B in
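The effect of the destage flag on the destage program can be sketched in a few lines. The dictionary layout of the cache table and the function name `destage` are illustrative assumptions; the point is only that entries whose flag is “OFF” never reach the HDD.

```python
def destage(cache_table, hdd):
    """Hypothetical destage pass: write only entries whose destage flag is ON."""
    written = []
    for address, entry in cache_table.items():
        if entry["destage_flag"]:      # OFF entries stay in the cache only
            hdd[address] = entry["data"]
            written.append(address)
    return written
```

A periodic call to `destage` therefore leaves the old data A′ on the HDD for as long as the new data A is held with its flag “OFF”.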
After that allocation, the program notifies the server 100 that the storage system can receive the write data (S305). Next, the program decides whether the write processing can be continued or not (S306) in a similar manner to S203 in
If the result of S306 is “Yes,” the program proceeds to S307, where it receives the write data and stores it in the allocated cache area 302. Next, the program confirms whether un-transferred write data remains in the server (S308), in a similar manner to S205 in
If the result of S306 is “No,” the program releases the cache areas already allocated for this atomic write operation (S311). None of the write data is written to the volume, because the write data has not been destaged.
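A minimal sketch of the second example implementation's write path follows. It assumes, as described above, that the target addresses hold no old data in the cache. The step that switches the destage flags to “ON” once all data has been received is an assumption inferred from "defers the destaging of the data A until reception of the data B"; the function name and table layout are hypothetical.

```python
def atomic_write_deferred_destage(chunks, cache_table, can_continue):
    """All-or-nothing write by deferring destage (S304-S311).

    cache_table maps volume address -> {"data": ..., "destage_flag": bool};
    the destage program only writes entries whose flag is True.
    """
    allocated = []
    for address, data in chunks:
        # S304: allocate cache area 302 with the destage flag OFF
        cache_table[address] = {"data": None, "destage_flag": False}
        allocated.append(address)
        # S305: "transfer ready" to the server (not modeled)
        if not can_continue():               # S306
            for a in allocated:              # S311: release allocated areas
                del cache_table[a]
            return "aborted"
        cache_table[address]["data"] = data  # S307
    # All data received (S308 "No"); assumed step: allow destaging now
    for a in allocated:
        cache_table[a]["destage_flag"] = True
    return "complete"
```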
Third Example Implementation
The third example implementation is described in
Write side 305 is used to store the write data written from the server. Read side 306 is used to store the old data. In
In RAID-5, new parity data is calculated when data is written. Generally, the new parity data is calculated from the new data (data A), the old data, and the old parity data. Therefore, read side 306 is used to store the old data read from the HDD 206, as well as the old parity data. The third example implementation leverages these two types of cache area.
A staging field and a use field are added to the cache management table. If the staging field is “ON,” the data of the HDD 206 has been staged to the read side 306. The staging status can also be managed at a finer granularity than the cache allocation unit by using a bitmap instead of a flag.
The use field manages the use of the read side 306. If the use field is “Parity,” the parity making processing is using the read side 306. If the use field is “ATM,” the atomic write operation is using the read side 306. If the use field is “null,” no processing is using the read side 306. By using this information, parity making and atomic write are mutually excluded on the read side 306: the atomic write data held in the read side 306 is not erased by read processing (staging) of the old data, and parity is not incorrectly calculated from the write data of the atomic write command.
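The parity calculation referred to here is the standard RAID-5 read-modify-write update, in which the new parity is the XOR of the new data, the old data, and the old parity. The Python below illustrates it; the function names are illustrative, and blocks are assumed to be equally sized byte strings.

```python
def xor_bytes(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must have the same length")
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def new_parity(new_data, old_data, old_parity):
    # Read-modify-write: P_new = D_new XOR D_old XOR P_old
    return xor_bytes(new_data, old_data, old_parity)
```

The old data and old parity inputs are what the read side 306 holds, which is why atomic write data parked there must not be mistaken for old data.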
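The bitmap variant of the staging status can be sketched as one bit per sub-block of a read-side segment. The class name, the sub-block granularity, and the range checks are assumptions for illustration.

```python
class StagingBitmap:
    """Hypothetical per-sub-block staging status for one read-side segment.

    Bit i set means sub-block i has been staged from the HDD to read side 306.
    """

    def __init__(self, blocks_per_segment):
        self.size = blocks_per_segment
        self.bits = 0

    def _mask(self, first, count):
        if first < 0 or count < 1 or first + count > self.size:
            raise ValueError("range outside the segment")
        return ((1 << count) - 1) << first

    def mark_staged(self, first, count):
        self.bits |= self._mask(first, count)

    def is_staged(self, first, count):
        mask = self._mask(first, count)
        return (self.bits & mask) == mask

    def clear(self):
        # Equivalent to setting the staging flag to "OFF" for the whole segment.
        self.bits = 0
```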
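The exclusion provided by the use field behaves like a simple ownership flag on the read side 306. In this sketch, the enum values mirror the field values named above, while the `ReadSide` class and its `acquire`/`release` methods are hypothetical.

```python
from enum import Enum


class Use(Enum):
    NULL = "null"      # no processing is using the read side
    PARITY = "Parity"  # parity making processing owns the read side
    ATM = "ATM"        # atomic write owns the read side


class ReadSide:
    """Hypothetical guard for read side 306 based on the use field."""

    def __init__(self):
        self.use = Use.NULL

    def acquire(self, user):
        # Grant the read side only if no other processing is using it.
        if self.use is not Use.NULL:
            return False
        self.use = user
        return True

    def release(self, user):
        # Only the current owner may reset the use field to "null".
        if self.use is not user:
            raise RuntimeError("read side not held by " + user.value)
        self.use = Use.NULL
```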
In the example of
Next, the program decides whether the write processing can be continued or not (S403), in a similar manner to S203 in
Then, the program copies the write data from the read side 306 to the write side 305 (S407) and updates the cache management table, changing the staging field to “OFF” and the use field to “null” (S408). Changing the staging field prevents the write data of the atomic write operation from being used as old data for parity calculation. Changing the use field releases the exclusion between parity calculation and the atomic write. Finally, the program terminates the processing (S409).
Differences between the first and third example implementations are as follows. The first example implementation allocates a temporary cache area of a given (e.g., predetermined) size beforehand. The first example implementation may also require more cache area than the third example implementation, because the temporary cache area has both a read side and a write side. Moreover, the first example implementation may require additional management information and a program to manage the temporary cache area.
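The third example implementation can be sketched with one dictionary per cache segment. The keys, the "busy" result when the use field is not “null”, and the cleanup on an S403 "No" result (modeled on S211 and S311 of the earlier implementations) are all assumptions; only the storage of write data in the read side, the S407 copy, and the S408 field updates come from the description.

```python
def atomic_write_read_side(chunks, segment, can_continue):
    """Third implementation: buffer atomic write data in read side 306.

    segment is a dict with keys "write_side", "read_side", "staging", "use".
    The old data in write_side is untouched until all write data arrives.
    """
    if segment["use"] != "null":              # read side busy (e.g. parity)
        return "busy"
    segment["use"] = "ATM"
    segment["read_side"] = {}
    for address, data in chunks:
        if not can_continue():                # S403
            segment["read_side"] = {}         # assumed: discard partial data
            segment["staging"] = False
            segment["use"] = "null"
            return "aborted"
        segment["read_side"][address] = data  # store in read side 306
    segment["write_side"].update(segment["read_side"])  # S407: copy
    segment["staging"] = False                # S408: not usable as old data
    segment["use"] = "null"                   #       release the exclusion
    return "complete"
```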
Fourth Example Implementation
Many technologies use flash memory as cache unit 201, including configurations in which both DRAM and flash memory are installed as cache unit 201. In a fourth example implementation, methods that may improve the endurance and performance of the flash memory 401 are applied in the storage system: non-atomic write commands are processed by the DRAM, while atomic write commands are processed by the flash memory.
When the storage system receives an atomic write command from the server, the cache area is allocated from the flash memory 401, which supports the atomic write command. The storage system issues the atomic write command to the flash memory 401, so the processing described in the first example implementation does not need to be executed. Moreover, because the writes are bundled into one atomic write, the performance and endurance of the flash memory 401 may be improved.
If the result of S501 is “No,” the program calls the cache allocation program to allocate a cache area from the DRAM 400 (S509) and executes the write operation for a non-atomic write command (S510). The program then proceeds to S508 to send the completion message and terminate the processing. If the result of S501 is “Yes,” the program calls the cache allocation program to allocate a cache area from the flash memory 401 (S502). This allocates a cache area for all of the write data of the write command. After the cache allocation, the program issues an atomic write command to the flash memory 401 (S503). The program then receives a “transfer ready” indication from the flash memory 401 (S504) and forwards it to the server (S505). Next, the program receives the write data from the server and stores it in the allocated area of the flash memory 401 (S506); in this way, the storage system transfers the write data to the flash memory 401. After the transfer, the program confirms whether un-transferred write data remains in the server (S507). If the result of S507 is “Yes,” the program returns to S504 and repeats the above process for the next write data. If the result of S507 is “No,” all of the data has been received, so the program sends the completion message to the server and terminates the processing (S508). Thus, the all or nothing feature is realized by the flash memory 401.
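The S501 branch amounts to routing by command type. In this hypothetical sketch, a dictionary stands in for the DRAM 400, and `FlashCache` models the flash memory 401's native atomicity by collecting the full update before applying it. `route_write`, `FlashCache`, and the command dictionary layout are assumptions, and the transfer-ready exchange of S504 to S507 is folded into a single call.

```python
class FlashCache:
    """Stand-in for flash memory 401, which supports atomic write natively."""

    def __init__(self):
        self.data = {}

    def atomic_write(self, chunks):
        # All or nothing: every chunk is collected before any becomes visible.
        pending = dict(chunks)
        self.data.update(pending)


def route_write(command, dram, flash):
    if not command["atomic"]:                    # S501 "No"
        for address, data in command["chunks"]:  # S509/S510: DRAM 400 cache
            dram[address] = data
    else:                                        # S501 "Yes"
        flash.atomic_write(command["chunks"])    # S502-S507: flash 401
    return "complete"                            # S508
```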
First, the storage write program (5) receives the write command and analyzes it (S600). More specifically, the storage write program (5) is called from the kernel and obtains the write command request from the I/O queue. Then, the program confirms whether any other write commands are in the I/O queue (S601). If the result of S601 is “No,” the program executes the processing in the same manner as in S501 to S510 in
After the cache allocation, the program forms an atomic write command and issues it to the flash memory 401 (S603). The program then selects a write command to process (S604) and executes the following steps for the selected write command. The program receives a “transfer ready” indication from the flash memory 401 (S605) and sends it to the server (S606). The program receives the write data from the server and stores it in the allocated area of the flash memory 401 (S607), thereby transferring the write data to the flash memory 401.
After the transfer, the program confirms whether un-transferred write data remains in the server (S608). If the result of S608 is “Yes,” the program returns to S605 and executes the above process for the next write data. If the result of S608 is “No,” all write data of the write command selected in S604 has been received. Accordingly, the program sends the completion message to the server (S609). Then, the program confirms whether any unprocessed write commands remain in the write command list obtained in S601 (S610).
If the result of S610 is “Yes,” the program returns to S604, selects the next write command for processing, and executes S605 to S609 for that command. Eventually, the result of S610 will be “No,” and the program terminates the processing (S611).
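The S600 to S611 flow bundles several queued write commands into one atomic write to the flash memory 401. In this sketch, `AtomicFlash` is a toy model with a hypothetical `begin_atomic`/`transfer` interface: it commits only after every announced chunk has arrived. As a simplification, the sketch takes the bundling path even when only one command is queued, whereas the description routes that case through S501 to S510.

```python
from collections import deque


class AtomicFlash:
    """Toy model of flash memory 401 (hypothetical interface): an atomic
    write becomes visible only after every announced chunk has arrived."""

    def __init__(self):
        self.data = {}
        self._pending = {}
        self._expected = 0
        self._received = 0

    def begin_atomic(self, expected_chunks):      # S603: atomic write issued
        self._pending, self._expected, self._received = {}, expected_chunks, 0

    def transfer(self, address, data):            # S607: data stored in flash
        self._pending[address] = data
        self._received += 1
        if self._received == self._expected:      # everything arrived: commit
            self.data.update(self._pending)
            self._pending = {}


def process_queue(io_queue, flash):
    """Bundle every queued write command into one atomic write (S600-S611)."""
    commands = [io_queue.popleft()]               # S600: first write command
    while io_queue:                               # S601: other queued commands
        commands.append(io_queue.popleft())
    flash.begin_atomic(sum(len(cmd) for cmd in commands))  # S602/S603
    completed = 0
    for cmd in commands:                          # S604: select next command
        for address, data in cmd:                 # S605-S608: transfer its data
            flash.transfer(address, data)
        completed += 1                            # S609: completion message
    return completed                              # S610 "No": S611
```

Here each write command is a list of (address, data) pairs; the flash contents change only once the last chunk of the last command has been transferred.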
Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations within a computer. These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to most effectively convey the essence of their innovations to others skilled in the art. An algorithm is a series of defined steps leading to a desired end state or result. In the example implementations, the steps carried out require physical manipulations of tangible quantities for achieving a tangible result.
Moreover, other implementations of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the example implementations disclosed herein. Various aspects and/or components of the described example implementations may be used singly or in any combination. It is intended that the specification and examples be considered as examples, with a true scope and spirit of the application being indicated by the following claims.
The application is a continuation of U.S. application Ser. No. 13/897,188, filed on May 17, 2013, the disclosure of which is incorporated by reference in its entirety for all purposes.
Relation | Number | Date | Country
---|---|---|---
Parent | 13897188 | May 2013 | US
Child | 15226695 | | US