The present invention relates generally to data storage systems, and specifically to actions taken to detect and correct data integrity problems related to write commands in such systems.
Mass storage systems implement several kinds of mechanisms in order to ensure continued data availability and integrity. A high percentage of data integrity problems in such systems are caused at the time, and as part, of the very act of writing data to the media. One approach typically used to overcome such problems is known as “Write-Read-Verify”.
Typically, whenever a host computer writes data to the system, this data is temporarily stored in cache and the command is immediately acknowledged, and thus the latency of the individual write command is shortened. The cache eventually writes the data into the permanent media. Under the “Write-Read-Verify” approach, this write transaction is immediately followed by a second transaction, whereby the cache reads the data just written and compares the result of this read transaction with the data originally written by the host and temporarily stored in cache. If the comparison shows that the data was not correctly written, the write transaction from cache to disk can be repeated until it is completed successfully.
The extra transactions incurred when following the “Write-Read-Verify” approach considerably increases the rate of internal activity within the storage system. In systems working under heavy workload activity, this increase may affect the system's overall performance.
There is therefore a need for procedures that ensure the integrity of data just written to the permanent media in mass storage systems, but which have a lower negative impact in the system's overall performance.
In embodiments of the present invention, a data storage system comprises a group of mass storage devices which store respective data therein, the data being accessed by one or more hosts transmitting input/output (IO) requests to the storage system. The data is stored redundantly in the system, so that at least two mass storage devices each have a copy of the data. The IO requests comprise IO write requests, wherein data is written redundantly to at least two mass storage devices, and IO read requests, wherein data is read from one of the devices.
A method for ensuring integrity of a data portion written by a controller and stored on a disk drive is provided that includes, among other things, forming at least one queue of a plurality of verification tasks associated with the disk drive and executing at least one verification task associated with the data portion in accordance with the queue. The method also includes identifying each datum of the data portion as one of faulty and not faulty in accordance with the verification task.
The method may further include writing the data portion from a cache to the disk drive and repairing each datum identified as faulty.
The method may further include temporarily storing the data portion in a memory buffer. The memory buffer may be part of the disk controller or part of a cache memory of a storage system.
The method may further include erasing the data portion temporarily stored in the memory buffer after performing the verification task.
The repairing operation may include: taking no action; issuing a message to a user or to a system manager indicating that a fault has been identified; rewriting the data portion on the disk drive with the data portion temporarily stored in the memory buffer; and/or overwriting the data portion with a further data portion obtained from one or more alternative locations.
The method may further include defining the at least one verification task. The method verification task may include issuing a verify command for the data portion on the disk drive; reading the data portion from the disk drive at the location where it was written; sending a read request to an alternative location for a corresponding data portion in a system in communication with the controller; comparing the data portion in the disk drive with the corresponding data portion in the alternative location; reading meta-data associated with the data portion and verifying data sanity in the data portion in accordance with the metadata; reading further meta-data associated with the corresponding data portion in the alternative location and verifying data sanity in the data portion in accordance with the further metadata; and/or comparing metadata associated with the data portion in the disk drive with the further metadata associated with the corresponding data portion in the alternative location.
The method may further include acknowledging the completion of the write request, and the verification task may be executed substantially after the acknowledging operation.
The queue may be formed according to a scheme of: first in first out (FIFO); last in first out (LIFO); last recently used (LRU); most recently used (MRU); and/or random access.
The method may further include managing the at least one queue. The queue may be managed by: performing the at least one verification task before a maximum time elapses since the verification task was added to the queue; performing the at least one verification task after a minimum time elapses since the verification task was added to the queue; performing the at least one verification task when the disk controller determines there is a low demand for high priority read/write tasks; performing the at least one verification task when the disk controller determines an optimal time is reached based on a system demand overview; performing the at least one verification task when the at least one queue is a maximal length; performing the at least one verification task when a time stamp for the verification task exceeds a maximal time; performing the at least one verification task when an average time to perform a plurality of performed verification tasks in the queue exceeds a maximal value; and/or performing each verification task a maximal value of the most recent verification task.
The identifying operation may include: inability to read the data portion from the disk drive; inability to read the data portion from the disk drive within a given time limit; disagreement between the data portion read and a corresponding data portion read from an alternative location; disagreement between metadata associated with the data portion and the data portion; disagreement between the metadata associated with the data portion and further metadata associated with the corresponding data portion from the alternative location; and/or disagreement between two or more data instances of the corresponding data portion from the alternative location.
A data storage apparatus is provided that includes a storage media adapted to store data, a source media adapted to read data, and a controller adapted to receive write commands, read data from the source media, and write data to the storage media. The controller is adapted to manage at least one queue of a plurality of verification tasks, each of the verification tasks associated with a data portion read from the source media and written to the storage media. The controller is adapted to execute each verification task associated with the data portion in accordance with the queue. The controller is adapted to identify each datum of the data portion as one of faulty and not faulty in accordance with the verification task.
The controller may be adapted to repair each datum identified as faulty.
A device is provided that is adapted to execute a method for ensuring integrity of data written by a controller and stored on a disk drive. The device includes a managing arrangement adapted to manage at least one queue associated with the disk drive, the queue including a plurality of verification tasks, each verification task being associated with a data portion of the data. The device further includes a performing arrangement adapted to perform each verification task in accordance with the queue and an identifying arrangement adapted to identify each datum of the data portion as one of faulty and not faulty in accordance with the verification task. The device also includes a repairing arrangement adapted to repair each datum identified as faulty.
A computer-readable storage medium is provided that contains a set of instructions for a computer. The set of instructions includes managing at least one queue associated with the disk drive, the queue including a plurality of verification tasks, each verification task being associated with a data portion. The set of instruction also includes performing the verification task associated with the data portion in accordance with the queue and identifying each datum of the data portion as one of faulty and not faulty in accordance with the verification task. The set of instruction further includes repairing each datum identified as faulty.
A method for ensuring integrity of a data portion written by a controller and stored on a disk drive is provided. The method includes flagging with at least one scrubbing flag at least one data partition of the disk drive and scanning the disk drive for the scrubbing flags. The method also includes assigning at least one verification task to the data partition flagged with the scrubbing flag and executing the verification task assigned to the data partition. The method further includes identifying each datum of the data portion as one of faulty and not faulty in accordance with the verification task.
The method may further include writing the data portion from a cache to the data partition and repairing each datum identified as faulty.
The scanning may be performed: at regular intervals of time; after writing a predetermined amount of data; and/or after writing a predetermined number of write operations.
The scrubbing flag may include a verification task indicator and the assigning operation may include reading the verification task indicator and assigning the verification task based on the verification task indicator.
The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings, a brief description of which is given below.
Reference is now made to
In embodiments of the present invention, data is stored on disks 12 as data blocks which are in turn organized into sets of consecutive data blocks called “partitions”. Partitions are the basic data portions used to manage data transactions between the controller and the disks, and between the controller and other components of a storage system with which it may communicate, and the present invention describes a method to ensure the integrity of data associated with partitions. The terms “partition” and “data portions” are used herein equivalently and they may be freely interchanged throughout this document. It must be further pointed out that in an exemplary embodiment of the present invention, where the controller 20 is a component of a storage system, sequences of consecutive partitions may be taken to form the basic storage unit of the system, known as a logical unit (LU) . LUs are thus logical sequences of data blocks, each of which may be associated with a logical address (LA) . A partition may thus be defined as a range of consecutive blocks in an LU. In embodiment of the present invention partitions may be considered to be of equal size.
The controller may be adapted to receive data that is to be written into the disks and to retrieve data form the disk and communicate it to other components of a storage system that are requesting it. Whenever data is sent to the controller in order to be stored on disks, for instance if a data partition is sent to controller 20 in order to be stored in one of the disks 12 associated with it, the main controller module 204 may store the data associated with that partition in the data memory buffer 206 and it may at the same time create a corresponding entry in the partition table 17. This entry may be used to manage the partition lifecycle while it exists in one of the disks 12 associated with controller 20. The controller 20 may eventually transmit to the disks, via disk communication module 208, the data associated with the partition, and the data may be stored on the disk in a substantially permanent way.
In embodiments of the present invention, the storage system of which the controller is part may be a redundant storage system, namely, a system in which more than one physical copy of every logical partition is stored. Table 17 may contain a column 226 indicating an alternative location in the storage system where the second physical coy of the partition indicated in this entry is located.
In a conventional write-verify-read algorithm, a partition may be written into the disk and may continue to be stored in the cache. Then the cache may immediately try to read the partition that has just been written. If the read operation is successful, the algorithm ends. If it is not successful, the cache (or more generally the disk controller) may take the partition that is stored in the cache and write it again. The same verification by means of a read attempt may be performed again.
In an exemplary embodiment of the present invention, a write operation is performed and then the verify is performed only when it is convenient to the system in terms of overall system considerations, which is discussed in detail in the following discussion.
In
Thus, while in some exemplary embodiments the data may be kept in cache until the verification is completed, there are alternative verification modes that do not require keeping the data in cache until it has been verified. Therefore, there are various methods for implementing the system.
Therefore, the present invention provides a system with several options and for each task created a specific verification option is chosen. The system may determine that all of the tasks are of a certain type and then of another type, or choose at random what type of verification to assign to this task. In one exemplary method, one verification type is applied to all tasks.
Notice also that it is possible that the partition for which we are defining a verification task has already a verification task in the queue waiting to be performed. In this case, the existing verification tasks for this partition can be deleted when the new task for this partition is added to the queue (or alternatively, the new task may overwrite the existing one) and indeed the new data can be temporarily stored in the same place where the previous data for this partition was temporarily stored. One possibility to implement this is by adding a bit in the partition table that indicates that a partition has a verification task in the queue. If the bit is on, we will look for that task in the queue and modify accordingly. It will also tell us where the data is temporarily stored. Every time that a verification task is created, the corresponding bit may have to be updated in the table. This bit may be a scrubbing flag.
Additionally, writing to a specific partition may cause damage in the partition that immediately precedes or immediately follows (on the disk), the specific partition. Thus, in an alternative exemplary embodiment, creating a task for a given partition also creates a task for the preceding partition and/or the following partition. The preceding partition and the following partition may be identified by the partition table. In this situation, the data corresponding to the preceding/following partition may not be in a cache, and therefore the verification may be just to attempt to read the data from the media, or issue a “verify” command. In the event there is a problem, then the correct data may be brought from an alternate location. The alternative location may be identified by the partition table.
Defining a verification task may be followed by the formation of a verification queue. Alternatively, a newly created verification task may be added to an already existing verification queue. Verification queues may be managed based on various schemes. The queue may be managed by an algorithm such as LRU (last recently used), MRU (most recently used), LIFO (last in first out) or FIFO (first in first out). Queue management algorithms may determine where to add the new task to the queue. Usually the new task is added to the end or tail of the queue, but alternative methods are possible, for instance adding to the middle or at a random position in the queue.
Another queue management issue addresses how to determine which verification task should be performed at any given moment. The queue is managed so that when the time comes to execute a verification task, the queue identifies which task to perform. Each verification task may have a timestamp which may be useful as part of the handling of the queue.
The appropriate time for executing a verification task may be decided by the main controller module 204 as part of the overall handling of the cache. The cache may be a disk controller and may have many demands made upon it from various systems, and may also have many tasks to perform. An exemplary embodiment of the present invention may determine the appropriate prioritization of the execution of a verification task according to a general overview of the system, and not necessarily because the partition has just been written. For instance, if the demands on the cache are momentarily high, then tasks like reading from and/or writing to the disk may be prioritized, and the verification task may be postponed. On the other hand, if there are many verification tasks in the queue that need to be performed, the cache may determine that completing verification tasks should be given priority. Additionally, the temporarily stored data may occupy precious cache space, which may weigh in favor of performing the verification tasks.
The prioritization of verification tasks may be made based on the kind of task to be executed by the cache. Additionally, the prioritization may be made based on and/or account for additional parameters for modifying the prioritization. These additional parameters may include: a maximal length of the queue, above which verification tasks may be immediately executed; a maximal time of the oldest verification task (as determined from a time stamp); a maximal value for the average times of tasks in the queue; a maximal time elapsed since the most recent verification task; etc.
An exemplary embodiment of the present invention may include writing data to a partition and verifying the data after some delay. Therefore, the present invention may include a type of scrubbing that, instead of checking all partitions in the system, addresses only partitions that have been modified recently and/or partitions that are proximate (e.g. either preceding or succeeding) to partitions that have been modified recently.
In an alternative exemplary embodiment of the present invention, a verification task is not created at the time of writing the data from cache to disk, but a flag or other indicator is associated with the data partition indicating that it requires verification. This write verification method may use a polling algorithm for selecting verification tasks. In this manner, the verification task is created after a delay from the write operation, either immediately before the verification task is performed, or before another delay before the verification task is performed.
For instance, a partition that has been written is marked in some manner so that at some later point in time (e.g., when the demands on the cache are reduced), some or all partitions may be scanned to determine which partitions are marked. When a marked partition is found, then a verification task may be created and/or executed for that partition. Therefore, the verification task need not be defined at the time of the write operation, but may be defined at some later point in time and/or immediately prior to execution. Thus, the queue may be of partitions to be verified and the particular verification task may be created at some later point in time.
It will be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and subcombinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.