The present invention relates generally to data storage in computer systems, and specifically to bad block management in a RAID system.
Data storage devices divide data storage capacity into sectors or “blocks.” A single physical drive may have many blocks. RAID (Redundant Array of Independent Disks) systems are storage systems that provide redundant arrays of hard disks. RAID systems protect against data loss due to hard disk failure. High-availability storage systems combine RAID techniques with hardware and firmware implementations that ensure the highest degree of data accessibility. High-availability storage systems must protect against the failure of major components, such as a controller, cache memory, or power supply.
Most commonly marketed high-availability RAID systems address the high level items that could cause an interruption to data accessibility. However, RAID manufacturers have overlooked the management of media errors. Media errors are errors encountered by a data storage system while attempting to access data from a physical drive. They are caused by failed sectors or “bad blocks” on the physical drive. When media errors occur in a single physical drive, the file or files using the bad blocks must be deleted. When media errors occur in a physical drive that is part of a RAID implementation, the RAID system attempts to recover the media error.
The design and implementation of a RAID system must take into consideration a practical and effective strategy for dealing with media errors. The storage device itself can manage media errors, or media errors can be managed by software or “firmware.” Under certain scenarios the firmware of a device creates media errors on a block of a disk (hereafter referred to as puncturing) by corrupting the Error Correcting Code (ECC) on the block. The firmware uses Small Computer System Interface (SCSI) commands READ LONG and WRITE LONG to corrupt the ECC and thereby record what blocks on the physical drive to puncture.
Existing systems utilizing Software Bad Block Management (SBBM) allocate an SBBM table. The SBBM table records each bad block as one entry. Device compatibility considerations limit the size of SBBM tables to 254 entries. When the SBBM table for a particular drive is exhausted, there is no option but to mark the drive as failed and the drive becomes unusable. In a RAID system, once a drive is dropped, the logical volume becomes degraded and the redundancy of the volume no longer exists. Any subsequent drive failure can cause the whole logical volume to go offline which causes data loss and data unavailability.
Media errors in sequential blocks, commonly called clustered media errors, are not uncommon. Clustered media errors may fill all available entries in an SBBM table very quickly. As the capacity of physical drives increases, the probability of having media errors on those physical drives also increases.
Consequently, it would be advantageous if a method and apparatus existed that were suitable for managing large numbers of clustered bad blocks in a storage system, and for dynamically expanding the capacity of SBBM.
Accordingly, the present invention is directed to a novel method and apparatus for managing large numbers of clustered bad blocks in a storage system, and for dynamically expanding the capacity of SBBM.
The present invention teaches a method of managing sequential bad blocks by storing the Logical Block Address (LBA) of the first bad block in the sequence and the number of bad blocks in the sequence. A data storage element storing the LBA of the first bad block in the sequence and the number of bad blocks in the sequence may also store a pointer to the next data storage element storing similar information concerning a subsequent sequence of bad blocks.
By this method, an SBBM table of 254 entries may store 254 separate sequences of bad blocks rather than 254 individual bad blocks. Furthermore, by using pointers to subsequent entries, the SBBM table may be expandable beyond the 254 entry limit, yet still compatible with existing standards.
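By way of illustration only, the entry layout described above may be sketched as follows. The names BadBlockEntry, build_entries and as_pairs are hypothetical and do not appear in the specification. The sketch assumes that the sequence count includes the first bad block, so that an entry covers the LBAs from lba through lba + count - 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BadBlockEntry:
    """One sequence of bad blocks (hypothetical layout, for illustration)."""
    lba: int                                 # LBA of the first bad block in the sequence
    count: int                               # number of consecutive bad blocks (assumed to include the first)
    next: Optional["BadBlockEntry"] = None   # entry describing the following sequence, if any


def build_entries(bad_lbas: List[int]) -> Optional[BadBlockEntry]:
    """Collapse individual bad LBAs into a chain of sequence entries."""
    head = None
    tail = None
    for lba in sorted(set(bad_lbas)):
        if tail is not None and lba == tail.lba + tail.count:
            tail.count += 1                  # block extends the current sequence
        else:
            entry = BadBlockEntry(lba, 1)    # block starts a new sequence
            if tail is None:
                head = entry
            else:
                tail.next = entry
            tail = entry
    return head


def as_pairs(head: Optional[BadBlockEntry]) -> List[Tuple[int, int]]:
    """Follow the next pointers and list each (lba, count) pair."""
    pairs = []
    while head is not None:
        pairs.append((head.lba, head.count))
        head = head.next
    return pairs
```

In this sketch, six individual bad blocks at LBAs 7, 8, 100, 101, 102 and 500 occupy three entries rather than six.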
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention claimed. The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate an embodiment of the invention and together with the general description, serve to explain the principles.
The numerous objects and advantages of the present invention may be better understood by those skilled in the art by reference to the accompanying figures in which:
Reference will now be made in detail to the subject matter disclosed, which is illustrated in the accompanying drawings. The scope of the invention is limited only by the claims; numerous alternatives, modifications and equivalents are encompassed. For the purpose of clarity, technical material that is known in the technical fields related to the embodiments has not been described in detail to avoid unnecessarily obscuring the description. Any reference to Software Bad Block Management (SBBM) should be understood to encompass Logical Drive Bad Block Management (LDBBM) as well.
Referring to
Existing SBBM tables may store up to 254 bad block entries 100. Each bad block entry 100 may store a sequence of bad blocks up to some maximum defined by the data type of the sequence count storage 104 in each bad block entry 100. Storage devices with firmware utilizing bad block entries 100 within existing SBBM tables can identify, and therefore manage, many times more bad blocks than existing implementations of SBBM tables. By managing more bad blocks, a storage device utilizing the present invention may continue to operate when a conventional storage device would have no choice but to fail. The present invention therefore enhances the reliability of data storage devices and data storage systems utilizing redundant data storage devices such as RAID systems.
The bad block entry 100 may include a next entry pointer 106 to point to a subsequent bad block entry 100 identifying the sequence of bad blocks that follows the present bad block entry 100 based on the LBA stored in the LBA storage 102 and the number of bad blocks stored in the sequence count storage 104. A storage device with firmware utilizing bad block entries 100 with next entry pointers 106 could effectively expand the SBBM table beyond 254 entries by utilizing a portion of the storage on the storage device as an SBBM expansion.
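One possible way to picture such an expansion is sketched below. The sketch assumes, purely for illustration, that the next entry pointer is stored as a slot index and that slots numbered 254 and above reside in the expansion area. The names SbbmStore, BASE_SLOTS and END are hypothetical and are not taken from the specification.

```python
BASE_SLOTS = 254   # entry limit of a conventional SBBM table, per the text
END = -1           # hypothetical marker meaning "no next entry"


class SbbmStore:
    """Index-linked entries; slots past BASE_SLOTS model the SBBM expansion."""

    def __init__(self):
        self.slots = []          # each slot is [lba, count, next_index]

    def append(self, lba, count):
        """Add an entry after the current last entry and link it in."""
        index = len(self.slots)
        if self.slots:
            self.slots[-1][2] = index    # previous last entry now points here
        self.slots.append([lba, count, END])
        return index

    def in_expansion(self, index):
        """True when the slot lies beyond the conventional 254 entry table."""
        return index >= BASE_SLOTS

    def walk(self):
        """Yield (lba, count) for every entry by following next indices."""
        index = 0 if self.slots else END
        while index != END:
            lba, count, next_index = self.slots[index]
            yield lba, count
            index = next_index
```

Because traversal relies only on the next index of each entry, the same loop visits entries in the base table and in the expansion area without distinguishing between them.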
Referring to
Referring to
Referring to
Implementing the present invention may require mechanisms for adding, deleting, consolidating and splitting bad block entries 100. Storage devices utilizing existing SBBM tables may simply add and remove bad blocks to the SBBM table as necessary. The present invention requires the firmware in a storage device to identify when a new bad block is adjacent to an existing bad block entry 100, and modify that bad block entry 100 accordingly. Furthermore, a bad block may bridge two previously separate bad block entries 100, in which case, the two bad block entries 100 should be consolidated. A storage device may overwrite one or more bad blocks such that a bad block entry 100 may no longer identify a continuous sequence of bad blocks; in that case the bad block entry 100 should be split into multiple bad block entries 100. Likewise, the sequence count storage 104 of the bad block entry 100 may define a maximum sequence length based on the data type of the sequence count storage 104, in which case, a sequence may need to be split into multiple bad block entries 100.
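The splitting of an over-long sequence may be sketched as follows. The function name split_run and the parameter max_count are hypothetical; max_count stands in for whatever limit the data type of the sequence count storage 104 imposes, for example 255 for an assumed 8-bit field.

```python
def split_run(lba, count, max_count):
    """Split one sequence into pieces that each fit the count field.

    Returns a list of (lba, count) pairs covering the same blocks.
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    pieces = []
    while count > 0:
        take = min(count, max_count)   # largest piece the field can hold
        pieces.append((lba, take))
        lba += take                    # next piece starts where this one ends
        count -= take
    return pieces
```

Under the assumed 8-bit limit, a sequence of 600 bad blocks starting at LBA 1000 would be recorded as three entries starting at LBAs 1000, 1255 and 1510.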
Referring to
If the SBBM list 300 is not empty, the firmware may determine 506 if the new bad block 500 is already recorded in the SBBM list 300. The firmware may make this determination by traversing the SBBM list 300 to find the bad block entry 100 with the largest LBA less than or equal to the LBA of the new bad block 500, then comparing the LBA of the new bad block 500 to the LBA and sequence count of the identified bad block entry 100. If the LBA of the new bad block 500 is between the LBA of the bad block entry 100 and the LBA plus the sequence count of the bad block entry 100, the new bad block 500 is already recorded and the process ends 520. If the new bad block 500 is not already recorded, the firmware may determine 508 if the new bad block 500 should be recorded after the last entry 304 in the SBBM list 300 by comparing the LBA of the new bad block 500 to the LBA plus sequence count of the last entry 304 in the SBBM list 300.
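The check for an already recorded bad block may be sketched as follows. The function name is_recorded is hypothetical, and the SBBM list 300 is modeled as a Python list of (lba, count) pairs sorted by LBA. The sketch assumes that an entry covers the LBAs from lba through lba + count - 1; the specification does not state whether the upper bound is inclusive.

```python
def is_recorded(entries, lba):
    """Return True if lba falls inside an existing sequence.

    entries: (start_lba, count) pairs sorted by start_lba.
    """
    candidate = None
    for start, count in entries:
        if start > lba:
            break                      # every later entry starts even higher
        candidate = (start, count)     # largest start <= lba seen so far
    if candidate is None:
        return False                   # lba precedes every recorded sequence
    start, count = candidate
    return start <= lba < start + count
```

For example, with entries (10, 3), (50, 1) and (100, 5), the block at LBA 12 is reported as recorded, while the block at LBA 13 is not.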
If the LBA of the new bad block 500 is greater than the LBA plus the sequence count of the bad block entry 100 identified by the last entry pointer 304 in the SBBM list, the firmware may create 510 a new bad block entry 100 and add 512 the new bad block entry to the end of the SBBM list 300. Referring to
If the LBA of the new bad block 500 is not greater than the LBA plus the sequence count of the bad block entry 100 identified by the last entry pointer 304 of the SBBM list 300, the firmware may determine 514 if the LBA of the new bad block 500 is less than the LBA of the bad block entry 100 identified by the first entry pointer 302 of the SBBM list 300. If the LBA of the new bad block 500 is less than the LBA of the bad block entry 100 identified by the first entry pointer 302 of the SBBM list 300, the firmware may create 510 a new bad block entry 100 and add 518 the new bad block entry 100 to the beginning of the SBBM list 300. Referring to
If the LBA of the new bad block 500 is not less than the LBA of the bad block entry 100 identified by the first entry pointer 302 of the SBBM list 300, the new bad block 500 may be inserted 522 somewhere in the SBBM list 300. Referring to
If the new bad block 500 is sequential to the next greater bad block entry, the firmware may determine 604 if the new bad block 500 is sequential to the next lesser bad block entry by determining if the LBA of the new bad block 500 is one greater than the LBA plus the sequence count of the next lesser bad block entry. If the new bad block 500 is sequential to the next lesser bad block entry, the next lesser bad block entry, the new bad block 500 and the next greater bad block entry may all be consolidated into a single bad block entry 100. The firmware may consolidate the next lesser bad block entry, the new bad block 500 and the next greater bad block entry by incrementing 606 the sequence count in the sequence count storage 104 of the next lesser bad block entry by one, and adding 608 the sequence count in the sequence count storage 104 of the next greater bad block entry. The firmware may then copy 610 the next entry pointer 106 from the next greater bad block entry to the next lesser bad block entry. The firmware may then determine 620 if the consolidated bad block entry 100 is approaching a maximum value.
As detailed herein, the data type of the sequence count storage 104 may limit the number of bad blocks each bad block entry 100 can identify. The firmware of a storage device utilizing the present invention may monitor the sequence count of each bad block entry 100 to determine if a bad block entry 100 is approaching a maximum limit. If the firmware determines that the sequence count of a bad block entry 100 has reached a maximum, the firmware may split 622 the bad block entry 100. Referring to
If the new bad block 500 is sequential to the next greater bad block but not sequential to the next lesser bad block entry, the firmware may modify the next greater bad block entry. The firmware may replace 612 the LBA in the LBA storage 102 of the next greater bad block entry with the LBA of the new bad block 500. The firmware may then increment 614 the sequence count in the sequence count storage 104 of the next greater bad block entry. The firmware may then determine 620 if the next greater bad block entry 100 is approaching a maximum value as described herein.
If the new bad block 500 is not sequential to the next greater bad block entry, the firmware may determine 616 if the new bad block 500 is sequential to the next lesser bad block entry by determining if the LBA of the new bad block 500 is one greater than the LBA plus the sequence count of the next lesser bad block entry. If the new bad block 500 is sequential to the next lesser bad block entry, the firmware may increment 618 the sequence count in the sequence count storage 104 of the next lesser bad block entry. The firmware may then determine 620 if the next lesser bad block entry 100 is approaching a maximum value as described herein.
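The adjacency cases described above may be sketched as follows. This is a simplified illustration rather than a rendering of the numbered steps: the function name add_bad_block is hypothetical, the SBBM list 300 is modeled as a sorted Python list of [lba, count] pairs instead of a pointer-linked structure, and an entry is assumed to cover the LBAs from lba through lba + count - 1. The check against a maximum sequence count is omitted.

```python
def add_bad_block(entries, lba):
    """Record one new bad block, merging with neighbouring sequences.

    entries: sorted list of [start_lba, count]; modified in place.
    """
    # Find the first entry that starts above the new block.
    pos = 0
    while pos < len(entries) and entries[pos][0] <= lba:
        pos += 1
    lesser = entries[pos - 1] if pos > 0 else None
    greater = entries[pos] if pos < len(entries) else None

    if lesser is not None and lba < lesser[0] + lesser[1]:
        return                                   # already recorded

    joins_lesser = lesser is not None and lba == lesser[0] + lesser[1]
    joins_greater = greater is not None and lba + 1 == greater[0]

    if joins_lesser and joins_greater:
        lesser[1] += 1 + greater[1]              # bridge: consolidate both entries
        del entries[pos]
    elif joins_greater:
        greater[0] = lba                         # sequence now starts one block earlier
        greater[1] += 1
    elif joins_lesser:
        lesser[1] += 1                           # sequence grows by one block
    else:
        entries.insert(pos, [lba, 1])            # isolated block gets its own entry
```

For example, given entries [10, 2] and [13, 2], recording the block at LBA 12 bridges the two sequences and leaves the single entry [10, 5].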
If the new bad block 500 is not sequential to the next lesser bad block entry or the next greater bad block entry, the firmware may insert 624 a new bad block entry 100 into the SBBM list 300. Referring to
Whenever a storage device overwrites bad blocks, the blocks may no longer contain media errors. In that case, firmware implementing the present invention may remove the blocks from the SBBM list 300.
Referring to
If the write operation did not end before the first entry or start after the last entry, the firmware may determine 1106 the bad block entry 100 in the SBBM list 300 with the smallest LBA greater than or equal to the LBA of the write operation (next greater bad block entry), and the bad block entry 100 in the SBBM list 300 with the largest LBA less than or equal to the LBA of the write operation (next lesser bad block entry). The firmware may then determine 1108 if the write operation occurred entirely between the next lesser bad block entry and the next greater bad block entry. If the write operation occurred entirely between the next lesser bad block entry and the next greater bad block entry, the process ends 1126.
If the write operation did not occur entirely between the next lesser bad block entry and the next greater bad block entry, some portion of at least one bad block entry 100 may be overwritten by the write operation, and the firmware may remove such portions from the SBBM list 300. The firmware may determine 1110 if the write operation started at an LBA within the sequence of the next lesser bad block entry by determining if the LBA of the write operation was less than the LBA plus sequence count of the next lesser bad block entry. If the write operation did not start at an LBA within the sequence of the next lesser bad block entry, the firmware may determine 1112 if the write operation ended within the sequence of the next greater bad block entry by determining if the LBA plus the sequence count of the write operation was less than the LBA plus the sequence count of the next greater bad block entry. If the write operation did not end within the sequence of the next greater bad block entry, the firmware may delete 1116 the next greater bad block entry by modifying the next entry pointer 106 of the next lesser bad block entry to point to the bad block entry 100 identified by the next entry pointer 106 of the next greater bad block entry. The firmware may then determine 1100 new LBA and sequence count values for any bad blocks overwritten during the write operation, but not accounted for during the previous sequence, and begin the process again.
If the firmware determines that the write operation did not start at an LBA within the sequence of the next lesser bad block entry, and determines that the write operation ended within the sequence of the next greater bad block entry, the firmware may adjust 1114 the sequence count stored in the sequence count storage 104 of the next greater bad block entry by a value equal to the difference between the LBA of the next greater bad block entry and the sum of the LBA and sequence count of the write operation. The firmware may also adjust 1114 the LBA stored in the LBA storage 102 of the next greater bad block entry to the LBA plus the sequence count of the write operation. The process then ends 1126.
If the firmware determines that the write operation started at an LBA within the sequence of the next lesser bad block entry, and determines that the write operation ended within the sequence of the next lesser bad block entry, the firmware may insert 1122 a new bad block entry 100 having an LBA stored in the LBA storage 102 equal to the LBA plus the sequence count of the write operation, and having a sequence count stored in the sequence count storage 104 equal to the sum of the LBA and sequence count of the next lesser bad block entry minus the sum of the LBA and sequence count of the write operation. The firmware may set the next entry pointer 106 of the new bad block entry 100 to the bad block entry 100 identified by the next entry pointer 106 of the next lesser bad block entry, and it may set the next entry pointer 106 of the next lesser bad block entry to point to the new bad block entry 100. The firmware may also adjust 1120 the sequence count stored in the sequence count storage 104 of the next lesser bad block entry to reflect the difference between the LBA of the write operation and the LBA stored in the LBA storage 102 of the next lesser bad block entry. The process then ends 1126.
If the firmware determines that the write operation started at an LBA within the sequence of the next lesser bad block entry, and that the write operation did not end within the sequence of the next lesser bad block entry, the firmware may determine 1130 if the write operation ended before the start of the next greater bad block entry. If the firmware determines that the write operation ended before the start of the next greater bad block entry, the firmware may adjust 1134 the sequence count stored in the sequence count storage 104 of the next lesser bad block entry to reflect the difference between the LBA of the write operation and the LBA stored in the LBA storage 102 of the next lesser bad block entry. The process then ends 1126.
If the firmware determines that the write operation ended after the start of the next greater bad block entry, the firmware may determine 1132 if the write operation ended before the end of the next greater bad block entry as described herein. If the firmware determines that the write operation did not end before the end of the next greater bad block entry, the firmware may delete 1128 the next greater bad block entry as described herein, and adjust 1124 the sequence count stored in the sequence count storage 104 of the next lesser bad block entry to reflect the difference between the LBA of the write operation and the LBA stored in the LBA storage 102 of the next lesser bad block entry. The firmware may then determine 1100 new LBA and sequence count values for the remainder of the write operation and begin the process again.
If the firmware determines that the write operation did end before the end of the next greater bad block entry, the firmware may adjust 1136 the sequence count stored in the sequence count storage 104 of the next lesser bad block entry to reflect the difference between the LBA of the write operation and the LBA stored in the LBA storage 102 of the next lesser bad block entry. The firmware may also adjust 1138 the sequence count stored in the sequence count storage 104 of the next greater bad block entry by a value equal to the difference between the LBA of the next greater bad block entry and the sum of the LBA and sequence count of the write operation. The firmware may also adjust 1138 the LBA stored in the LBA storage 102 of the next greater bad block entry to the LBA plus the sequence count of the write operation. The process then ends 1126.
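The net effect of the trimming, splitting and deleting cases described above may be sketched as follows. This is a simplified single pass over the list, not a rendering of the numbered decision steps; the function name clear_written is hypothetical. The SBBM list 300 is modeled as a sorted Python list of (lba, count) pairs, and both an entry and a write operation are assumed to cover the LBAs from lba through lba + count - 1.

```python
def clear_written(entries, write_lba, write_count):
    """Return the sequences that remain bad after a write operation.

    entries: (start_lba, count) pairs sorted by start_lba.
    """
    write_end = write_lba + write_count          # first LBA after the write
    remaining = []
    for start, count in entries:
        end = start + count                      # first LBA after the sequence
        if end <= write_lba or start >= write_end:
            remaining.append((start, count))     # write did not touch this entry
            continue
        if start < write_lba:
            # Leading part survives: shorten the sequence count.
            remaining.append((start, write_lba - start))
        if end > write_end:
            # Trailing part survives: move the LBA past the write.
            remaining.append((write_end, end - write_end))
        # An entry lying wholly inside the write is dropped.
    return remaining
```

For example, a write of 3 blocks at LBA 15 into a bad sequence of 10 blocks starting at LBA 10 leaves two entries, (10, 5) and (18, 2), corresponding to the case in which a new bad block entry is inserted after the shortened one.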
A storage device implementing methods described herein may effectively manage more bad blocks than is possible with existing technology. Such a storage device would have improved reliability and would be particularly suitable for implementation in a RAID system.
Referring to
It is believed that the present invention and many of its attendant advantages will be understood by the foregoing description, and it will be apparent that various changes may be made in the form, construction, and arrangement of the components thereof without departing from the scope and spirit of the invention or without sacrificing all of its material advantages. The form hereinbefore described being merely an explanatory embodiment thereof, it is the intention of the following claims to encompass and include such changes.