The present disclosure relates to an approach that improves disk reliability by periodically disabling drives included in a RAID configuration for a period of time, allowing the disabled drives to rest and consequently improving reliability.
RAID is an acronym for “Redundant Array of Independent Disks,” a storage technology that provides increased reliability and functionality through redundancy. RAID technology includes computer data storage schemes that can divide and replicate data among multiple physical drives. Various quality levels of drives are currently marketed. “Enterprise-class drives” are typically found in robust environments that often require continuous availability around the clock, every day of the year. Enterprise-class drives are often used as storage in online-accessible servers and the like, such as those serving web sites. In contrast, desktop-class drives are found in less robust environments, such as a typical user's computer system. Enterprise-class and desktop-class drives differ in a number of criteria, such as error recovery time limits, rotational vibration tolerances, error correction and data integrity, and other quality features. These differences generally allow enterprise-class drives to operate in a robust environment that might cause their desktop-class counterparts to fail. Because of the more robust quality criteria found in enterprise-class drives, the cost of enterprise-class drives is typically considerably higher than that of desktop-class drives of otherwise similar specifications (e.g., capacity, speed, etc.).
An approach is provided to inactivate a selected drive included in a RAID configuration that provides data redundancy by using a predefined RAID algorithm. While the selected drive is inactive, write requests are handled by identifying the data blocks destined to be written to each of the drives included in the RAID configuration, including the selected drive. The identification of the blocks to be written to the various drives is based on the RAID algorithm, and it further identifies a data block address that corresponds to each of the data blocks. The data blocks destined for one or more non-selected (active) drives are written to those drives at the corresponding data block addresses. The data block destined for the selected (inactive) drive is instead written to a memory area outside of the RAID configuration, along with the data block address corresponding to that data block. After a period of time, the selected drive is reactivated. During reactivation, the data block addresses and their corresponding data blocks are read from the memory area, and each data block is written to the selected drive at its corresponding data block address.
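The summarized flow can be sketched in a few lines of Python. This is an illustrative sketch only: the class and function names, the list-based memory area, and the in-memory drive model are assumptions for exposition, not part of the disclosure.

```python
class Drive:
    """In-memory stand-in for a physical drive (an assumed model;
    a real controller would address physical media)."""
    def __init__(self):
        self.blocks = {}
        self.active = True

    def write(self, addr, data):
        self.blocks[addr] = data

def raid_write(drives, blocks_by_drive, memory_area):
    """Route each (address, data block) pair produced by the RAID
    algorithm: active drives are written directly, while the inactive
    drive's block and its address are captured in the memory area."""
    for i, (addr, data) in blocks_by_drive.items():
        if drives[i].active:
            drives[i].write(addr, data)
        else:
            memory_area.append((addr, data))

def reactivate(drive, memory_area):
    """Bring the rested drive up to date by replaying the deferred
    writes at their recorded addresses, then empty the memory area."""
    drive.active = True
    for addr, data in memory_area:
        drive.write(addr, data)
    memory_area.clear()

drives = [Drive(), Drive(), Drive()]
drives[1].active = False                  # drive 1 is resting
area = []
raid_write(drives, {0: (5, b'a'), 1: (5, b'b'), 2: (5, b'p')}, area)
reactivate(drives[1], area)
print(drives[1].blocks)                   # {5: b'b'}
```

The deferred block reaches drive 1 only after reactivation, while drives 0 and 2 are written immediately.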
The foregoing is a summary and thus contains, by necessity, simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.
The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings, wherein:
Certain specific details are set forth in the following description and figures to provide a thorough understanding of various embodiments of the invention. Certain well-known details often associated with computing and software technology are not set forth in the following disclosure, however, to avoid unnecessarily obscuring the various embodiments of the invention. Further, those of ordinary skill in the relevant art will understand that they can practice other embodiments of the invention without one or more of the details described below. Finally, while various methods are described with reference to steps and sequences in the following disclosure, the description as such is for providing a clear implementation of embodiments of the invention, and the steps and sequences of steps should not be taken as required to practice this invention. Instead, the following is intended to provide a detailed description of an example of the invention and should not be taken to be limiting of the invention itself. Rather, any number of variations may fall within the scope of the invention, which is defined by the claims that follow the description.
The following detailed description will generally follow the summary of the invention, as set forth above, further explaining and expanding the definitions of the various aspects and embodiments of the invention as necessary. To this end, this detailed description first sets forth a computing environment in
Northbridge 115 and Southbridge 135 connect to each other using bus 119. In one embodiment, the bus is a Direct Media Interface (DMI) bus that transfers data at high speeds in each direction between Northbridge 115 and Southbridge 135. In another embodiment, a Peripheral Component Interconnect (PCI) bus connects the Northbridge and the Southbridge. Southbridge 135, also known as the I/O Controller Hub (ICH), is a chip that generally implements capabilities that operate at slower speeds than the capabilities provided by the Northbridge. Southbridge 135 typically provides various busses used to connect various components. These busses include, for example, PCI and PCI Express busses, an ISA bus, a System Management Bus (SMBus or SMB), and/or a Low Pin Count (LPC) bus. The LPC bus often connects low-bandwidth devices, such as boot ROM 196 and “legacy” I/O devices (using a “super I/O” chip). The “legacy” I/O devices (198) can include, for example, serial and parallel ports, keyboard, mouse, and/or a floppy disk controller. The LPC bus also connects Southbridge 135 to Trusted Platform Module (TPM) 195. Other components often included in Southbridge 135 include a Direct Memory Access (DMA) controller, a Programmable Interrupt Controller (PIC), and a storage device controller, which connects Southbridge 135 to nonvolatile storage device 185, such as a hard disk drive, using bus 184. RAID controller 180 is used to provide a hardware-based RAID configuration attached to system 100 via PCI Express 1-lane interface 178. The RAID configurations described herein can be either hardware-based or software-based RAID configurations.
ExpressCard 155 is a slot that connects hot-pluggable devices to the information handling system. ExpressCard 155 supports both PCI Express and USB connectivity as it connects to Southbridge 135 using both the Universal Serial Bus (USB) and the PCI Express bus. Southbridge 135 includes USB Controller 140 that provides USB connectivity to devices that connect to the USB. These devices include webcam (camera) 150, infrared (IR) receiver 148, keyboard and trackpad 144, and Bluetooth device 146, which provides for wireless personal area networks (PANs). USB Controller 140 also provides USB connectivity to other miscellaneous USB connected devices 142, such as a mouse, removable nonvolatile storage device 145, modems, network cards, ISDN connectors, fax, printers, USB hubs, and many other types of USB connected devices. While removable nonvolatile storage device 145 is shown as a USB-connected device, removable nonvolatile storage device 145 could be connected using a different interface, such as a Firewire interface, etcetera.
Wireless Local Area Network (LAN) device 175 connects to Southbridge 135 via the PCI or PCI Express bus 172. LAN device 175 typically implements one of the IEEE 802.11 standards of over-the-air modulation techniques that all use the same protocol to wirelessly communicate between information handling system 100 and another computer system or device. Optical storage device 190 connects to Southbridge 135 using Serial ATA (SATA) bus 188. Serial ATA adapters and devices communicate over a high-speed serial link. The Serial ATA bus also connects Southbridge 135 to other forms of storage devices, such as hard disk drives. Audio circuitry 160, such as a sound card, connects to Southbridge 135 via bus 158. Audio circuitry 160 also provides functionality such as audio line-in and optical digital audio in port 162, optical digital output and headphone jack 164, internal speakers 166, and internal microphone 168. Ethernet controller 170 connects to Southbridge 135 using a bus, such as the PCI or PCI Express bus. Ethernet controller 170 connects information handling system 100 to a computer network, such as a Local Area Network (LAN), the Internet, and other public and private computer networks.
While
The Trusted Platform Module (TPM 195) shown in
If it is not time to inactivate one of the drives in the RAID configuration, then decision 520 branches to the “no” branch whereupon normal RAID operations are used to read and write data to all of the drives in the RAID configuration with none of the drives being inactive. Depending on the amount of inactive time desired per drive, normal operation using all of the drives in an active fashion may continue for some time. For example, a schedule could be established allowing normal operations for some amount of time (e.g., an hour, etc.) and then one of the drives is selected and inactivated for some amount of time (e.g., a half hour, etc.) followed by normal operations for an amount of time (e.g., another hour, etc.), followed by the next drive in the configuration being inactivated for an amount of time (e.g., a half hour, etc.) and so on. In this fashion, each drive is able to occasionally rest for a period of time so that the drive is not continuously used for an overly extended period which may otherwise cause the drive to prematurely fail.
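The alternating schedule described above can be sketched as follows. The period lengths, function name, and generator-based form are illustrative assumptions; the disclosure does not fix particular durations.

```python
# Durations are example values only (the text suggests, e.g., an hour
# of normal operation followed by a half hour of rest per drive).
ACTIVE_PERIOD_SECS = 60 * 60
REST_PERIOD_SECS = 30 * 60

def rest_schedule(num_drives):
    """Yield (drive_index, active_secs, rest_secs) tuples indefinitely,
    cycling through the drives so each one periodically rests while
    the others remain active."""
    drive = 0
    while True:
        yield (drive, ACTIVE_PERIOD_SECS, REST_PERIOD_SECS)
        drive = (drive + 1) % num_drives

# First few rest turns for a four-drive configuration:
sched = rest_schedule(4)
print([next(sched)[0] for _ in range(6)])   # [0, 1, 2, 3, 0, 1]
```

Each drive thus gets a bounded rest window on a rotating basis, so no drive runs continuously for an overly extended period.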
Returning to decision 510, if it is time to inactivate one of the drives included in the RAID configuration, then decision 510 branches to the “yes” branch whereupon, at step 510, a drive is selected from the RAID configuration (e.g., the next drive in the series is cyclically selected, etc.). At step 525, the selected drive is inactivated. In the example shown, Disk 0 (330) is selected from RAID configuration 320. In a cyclical implementation, after Disk 0 is reactivated, the next drive to be selected would be Disk 1 (331), followed by Disk 2 (332), followed by Disk 3, and so on until the last drive in the configuration is selected, at which point the selection reverts back to the first drive (Disk 0 (330)) and the inactivation process continues. In addition, at step 525, a trigger, such as a timer, is initiated so that when the trigger occurs (e.g., a half hour of inactive time, etc.) the inactive drive will be reactivated as shown in
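A deadline-based timer is one simple way to realize the trigger mentioned above. The class name and the use of a monotonic clock are illustrative assumptions; the disclosure only requires some trigger that fires when the rest period elapses.

```python
import time

class ReactivationTrigger:
    """Records when the selected drive was inactivated and fires once
    the configured rest period has elapsed (a sketch; a hardware
    controller might instead use an interrupt-driven timer)."""
    def __init__(self, rest_secs):
        # time.monotonic() is immune to wall-clock adjustments.
        self.deadline = time.monotonic() + rest_secs

    def fired(self):
        return time.monotonic() >= self.deadline

trigger = ReactivationTrigger(rest_secs=0.01)
time.sleep(0.02)
print(trigger.fired())   # True
```

When `fired()` returns true, processing proceeds to the reactivation steps described below.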
While the selected drive is inactive, requests received at the RAID controller (a software- or hardware-implemented controller) are handled as shown in steps 535 through 570. At step 535, a request is received at the RAID controller. A decision is made as to whether the request is to read data from the RAID configuration or to write data to the RAID configuration (decision 540). If the request is to read data from the RAID configuration, then decision 540 branches to the “read” branch whereupon, at step 550, the RAID controller identifies the responsive data that is stored on the active (non-selected) drives, using the redundancy provided by the RAID algorithm to obtain from the active drives any data that would otherwise have been retrieved from the selected (inactive) drive. The responsive data retrieved from the non-selected (active) drives is returned to the requestor at step 555.
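The disclosure leaves the redundancy mechanism to the predefined RAID algorithm. As one concrete possibility (an illustrative choice, not mandated by the text), a RAID 5-style parity scheme lets the inactive drive's block be rebuilt by XOR-ing the surviving blocks of the same stripe:

```python
def reconstruct_block(stripe_blocks, missing_index):
    """Rebuild the block held by the inactive drive from the surviving
    blocks of the same stripe. With RAID 5 parity, the missing block
    equals the XOR of all other blocks in the stripe (data + parity)."""
    surviving = [b for i, b in enumerate(stripe_blocks)
                 if i != missing_index]
    missing = bytearray(len(surviving[0]))
    for block in surviving:
        for j, byte in enumerate(block):
            missing[j] ^= byte
    return bytes(missing)

# Example stripe: blocks 0 and 1 are data, block 2 is their XOR parity.
stripe = [b'\x0f\x0f', b'\xf0\x00', b'\xff\x0f']
print(reconstruct_block(stripe, 1))   # b'\xf0\x00'
```

The read can therefore be satisfied entirely from the active drives, leaving the resting drive untouched.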
Returning to decision 540, if the request was to write data to the RAID configuration, then decision 540 branches to the “write” branch whereupon, at step 560, the RAID controller identifies the data blocks to be written to both the active drives and the inactive drive. At step 560, the data destined to be written to the active (non-selected) drives is written to those drives at block addresses determined by the RAID algorithm. At step 570, the data blocks destined for the inactive (selected) drive are written to memory area 300 along with the data block addresses corresponding to the data blocks.
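One way to hold the deferred writes (an illustrative choice; the disclosure only requires that the blocks and their addresses be kept in a memory area outside the RAID configuration) is a mapping keyed by block address, so that repeated writes to the same address retain only the latest data:

```python
class MemoryArea:
    """Deferred writes for the inactive drive, keyed by block address.
    Keying by address means a block rewritten several times while the
    drive rests is replayed only once, with its latest contents."""
    def __init__(self):
        self.pending = {}

    def record(self, block_address, data_block):
        self.pending[block_address] = data_block

area = MemoryArea()
area.record(7, b'old')
area.record(7, b'new')   # same address: latest data wins
area.record(8, b'zzz')
print(sorted(area.pending.items()))   # [(7, b'new'), (8, b'zzz')]
```

This keeps the memory footprint bounded by the number of distinct addresses written during the rest period rather than by the total number of writes.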
After the request has been handled by the RAID controller, a decision is made as to whether it is time to reactivate the inactive (selected) drive (decision 580). If it is not time to reactivate the selected drive, then decision 580 branches to the “no” branch which loops back to receive the next request at the RAID controller and handle it as described above. This looping continues until it is time to reactivate the selected drive, at which point decision 580 branches to the “yes” branch whereupon, at predefined process 590, the selected drive is reactivated (see
At step 630, the selected drive is activated (e.g., powered on, etc.). At step 640, the first entry from memory area 300 is selected; each entry provides a data block address and a data block, the data block address being the address at which the data block is to be written to the selected drive. At step 650, the data block is written to the selected drive at the corresponding data block address. A decision is made as to whether there are more entries in memory area 300 to process (decision 660). If there are more entries to process, then decision 660 branches to the “yes” branch, which loops back to select the next entry from memory area 300, providing the next data block address and data block to be written to the selected drive. This looping continues until all of the entries from memory area 300 have been processed, so that all of the writes that were destined for the selected drive while it was inactive have been written to the drive. When there are no more entries in memory area 300 to process, decision 660 branches to the “no” branch.
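The replay loop just described can be sketched as follows. The function name and the callable standing in for the controller's low-level write are assumptions made for illustration.

```python
def replay_deferred_writes(drive_write, pending):
    """Walk every (address, data block) entry captured while the drive
    was inactive, write each block back at its recorded address, and
    empty the memory area when done."""
    for addr, data in list(pending.items()):
        drive_write(addr, data)
    pending.clear()

# Model the reactivated drive as a dict of address -> data.
disk = {}
pending = {3: b'x', 9: b'y'}
replay_deferred_writes(lambda a, d: disk.__setitem__(a, d), pending)
print(disk, pending)   # {3: b'x', 9: b'y'} {}
```

After the loop drains the memory area, the selected drive holds the same data it would have held had it never been inactivated.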
At step 670, the requests received while the selected drive was being reactivated and updated with the data stored in memory area 300 are retrieved from buffer memory 620. These buffered requests are processed using normal RAID (read/write) operations using all of the (now active) drives. Once the buffered requests have been processed, processing returns to the calling routine (see
One of the preferred implementations of the invention is a client application, namely, a set of instructions (program code) or other functional descriptive material in a code module that may, for example, be resident in the random access memory of the computer. Until required by the computer, the set of instructions may be stored in another computer memory, for example, in a hard disk drive, or in a removable memory such as an optical disk (for eventual use in a CD ROM) or floppy disk (for eventual use in a floppy disk drive). Thus, the present invention may be implemented as a computer program product for use in a computer. In addition, although the various methods described are conveniently implemented in a general purpose computer selectively activated or reconfigured by software, one of ordinary skill in the art would also recognize that such methods may be carried out in hardware, in firmware, or in more specialized apparatus constructed to perform the required method steps. Functional descriptive material is information that imparts functionality to a machine. Functional descriptive material includes, but is not limited to, computer programs, instructions, rules, facts, definitions of computable functions, objects, and data structures.
While particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that, based upon the teachings herein, changes and modifications may be made without departing from this invention and its broader aspects. Therefore, the appended claims are to encompass within their scope all such changes and modifications as are within the true spirit and scope of this invention. Furthermore, it is to be understood that the invention is solely defined by the appended claims. It will be understood by those with skill in the art that if a specific number of an introduced claim element is intended, such intent will be explicitly recited in the claim, and in the absence of such recitation no such limitation is present. As a non-limiting example, as an aid to understanding, the following appended claims contain usage of the introductory phrases “at least one” and “one or more” to introduce claim elements. However, the use of such phrases should not be construed to imply that the introduction of a claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an”; the same holds true for the use in the claims of definite articles.