The invention relates to hard disk drive systems, and more particularly, to time correlating grown defects found on a hard disk of a hard disk drive system.
A hard disk of a hard disk drive storage system is generally a rigid rotating platter having a planar magnetic surface on which digital data is stored. When hard disks are manufactured, they often have defective areas on them that cannot be used to store information. The defects typically occur in blocks, each of which typically corresponds to 512 bytes.
After hard disks are manufactured, and prior to shipment to the customer, the hard disks normally are tested to determine the number and location of defective blocks on the disks. Defective blocks that are present on a disk before it is shipped to the customer are called “primary” defects. Blocks on a disk that become defective after it has been placed in operation are called “grown” defects.
Hard disk drive systems typically include a mechanism that detects defective blocks on the disk and reports the defective blocks upon being queried by a source that is external to the disk drive system, such as, for example, a host processor that is interfaced to the disk drive system by a Small Computer System Interface (SCSI).
The hard disk drive system (HDDS) 6 typically includes a controller 11 (e.g., a SCSI controller), a hard disk controller (HDC) 12 and a recording channel 13. The recording channel 13 typically includes physical and electrical components, such as, for example, the read/write channels and magnetic recording head (not shown), the read/write head armature (not shown), the hard disk (not shown), and the pre-amplifier (not shown).
The HDDS driver software program allows various hard disk drive system parameters, such as a defective block parameter, for example, to be accessed. The manner in which such a driver program can be used to access a defective block parameter will now be described with reference to
When the driver program is executed by the host processor 2, the host processor 2 sends a “READ DEFECT DATA” command over system bus 4 to SCSI host adapter 5. The SCSI host adapter 5 translates the command into a SCSI command and sends the SCSI command over the SCSI bus 7 to the SCSI controller 11 of the disk drive system 6. Upon receiving the command, the SCSI controller 11 queries the HDC 12, which controls the writing and reading of information to and from the recording channel 13. In response to the query from the SCSI controller 11, the HDC 12 sends a response to the SCSI controller 11. The SCSI controller 11 transmits the response over the SCSI bus 7 to the SCSI host adapter 5, which translates the SCSI response into the language of the HDDS driver program and sends the translated response to the host processor 2.
The available defect list formats are described below in table 1. The defect list format field describes to the HDDS the format in which the HDDS must return the data.
If “REQ_PLIST” (bit 4) is asserted, the addresses of the primary defective blocks contained on the hard disk are to be returned in a list to the host processor in the format that the host requests by setting bits 0, 1, and 2 of byte 2 (“DEFECT LIST FORMAT”) according to table 1. If “REQ_GLIST” (bit 3) is asserted, the addresses of the grown defective blocks contained on the hard disk are to be returned in a list to the host processor in the format that the host requests by setting bits 0, 1, and 2 of byte 2 (“DEFECT LIST FORMAT”) according to table 1.
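By way of illustration only, the bit layout of byte 2 described above can be sketched as follows. The function names are hypothetical and are not part of any driver program described herein, and the three-bit format code is treated as an opaque value because its meanings are defined by table 1.

```python
# Bit positions within byte 2 of the request, as described in the text.
REQ_PLIST = 1 << 4   # bit 4: return primary defects
REQ_GLIST = 1 << 3   # bit 3: return grown defects
FORMAT_MASK = 0b111  # bits 0-2: DEFECT LIST FORMAT (codes per table 1)


def build_byte2(req_plist: bool, req_glist: bool, list_format: int) -> int:
    """Pack the two request flags and the 3-bit format code into byte 2."""
    if not 0 <= list_format <= FORMAT_MASK:
        raise ValueError("defect list format must fit in bits 0-2")
    value = list_format
    if req_plist:
        value |= REQ_PLIST
    if req_glist:
        value |= REQ_GLIST
    return value


def parse_byte2(value: int) -> tuple:
    """Unpack byte 2 into (req_plist, req_glist, list_format)."""
    return (bool(value & REQ_PLIST), bool(value & REQ_GLIST), value & FORMAT_MASK)
```

For example, a host requesting only grown defects with format code 0b101 would set byte 2 to `build_byte2(False, True, 0b101)`.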
One of the disadvantages of the existing response data format shown in
Attempts have been made to provide technology that makes it easier to diagnose hard disk drive systems. Self-monitoring, analysis and reporting technology (S.M.A.R.T.) is disk drive system monitoring and reporting technology that, when coupled with supporting software, enables the reliability of a disk drive system to be predicted and reported. S.M.A.R.T. technology typically monitors certain parameters and determines when a threshold condition has occurred. When a threshold condition occurs, the occurrence of the threshold condition is reported to the end user to enable the end user to take action to prevent failure (e.g., backup data on another storage device).
Although S.M.A.R.T. technology has proven effective at enabling certain types of disk drive system failures to be predicted, it does not implicitly or explicitly report information about grown defects as a function of time over the SCSI bus that would enable a typical customer to determine whether or not a disk drive system is about to fail as a result of grown defects. A rapid increase over time in the number of grown defects contained on a hard disk can indicate a loss of capacity and performance and that hard disk failure is imminent. If such information were available to the customer, the customer could take proactive steps to prevent valuable data from being lost. Accordingly, a need exists for a method and apparatus for recording the occurrence of grown defects and the points in time at which the grown defects occurred, and for reporting the recorded information to a host system.
The invention provides a method and an apparatus for registering a point in time at which a memory element of a hard disk becomes defective. The apparatus comprises first logic configured to detect one or more occurrences of defective memory elements of a hard disk, second logic configured to record each respective time event at which a respective occurrence of a defective memory element was detected by the first logic, and third logic configured to report the recorded information to a host system.
The method comprises detecting when one or more memory elements of a hard disk become defective, recording each respective time event at which a respective defective memory element was detected, and reporting the recorded information to the host system.
The invention also provides a computer program for recording time events at which defective memory elements of a hard disk occur and for reporting the recorded information. The computer program is embodied on a computer-readable medium and comprises a first code segment for detecting when one or more memory elements of a hard disk become defective, a second code segment for recording each respective time event at which a respective defective memory element was detected, and a third code segment for reporting the recorded information to a host system.
The invention also provides a computer program for generating a request to obtain information from a hard disk drive storage system. The information requested relates to detected occurrences of defective memory elements and time events at which the occurrences of defective memory elements were detected. The program comprises a first code segment for generating a request for time correlated defect information from a hard disk drive storage system and a second code segment for processing a response to the request received from the hard disk drive storage system. The response includes a list of detected defective memory elements of a hard disk and points in time at which the defective memory elements were detected.
These and other features and advantages of the invention will become apparent from the following description, drawings and claims.
In accordance with the present invention, a hard disk drive system is configured with logic that time correlates grown defects. Preferably, the hard disk drive system time correlates grown defects by associating each occurrence of a grown defect with a time stamp. When the hard disk drive system receives a read defect data command sent by the host processor, the hard disk drive system returns time correlated grown defect information to the host processor. The host processor preferably executes a software program that processes the time correlated grown defect information to determine any changes in the number of grown defects that are occurring as a function of time. Preferably, the host processor causes the relationship of grown defects as a function of time to be displayed on a display monitor so that a user can determine the health of the disk drive system, e.g., whether a rapid increase in the number of grown defects occurring over time indicates that a failure of the disk drive system is imminent.
The hard disk drive system 50 of the invention typically includes a SCSI controller 51, an HDC 60 and a recording channel 70. The recording channel 70 typically includes read/write channels and magnetic recording head (not shown), a read/write head armature (not shown), a hard disk (not shown), and a pre-amplifier (not shown).
The HDC 60 will typically include the logic of the invention that time correlates grown defects by time stamping grown defects as they occur, recording the time stamped grown defects, and reporting the recorded time stamped grown defect information to a host system. The logic for performing these functions will typically be implemented in firmware in the HDC 60, although it may be implemented in one or more other components of the hard disk drive system 50, such as in one or more components of the recording channel 70, for example. The invention is not limited with respect to the location at which this logic is physically implemented in the hard disk drive system 50. For purposes of describing an exemplary embodiment of the invention, it will be assumed that the logic of the invention for time correlating grown defects is implemented in the HDC 60.
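By way of illustration only, the three functions of the logic (time stamping, recording, and reporting) can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than the firmware itself: the class and field names are hypothetical, the time stamp is assumed to be a number of seconds supplied by an injectable clock, and the list is held in memory rather than on the hard disk.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GrownDefect:
    block_address: int   # address of the defective block (assumed integer)
    detected_at: float   # time stamp in seconds (assumed unit)


class GrownDefectLog:
    """Hypothetical model of the time-correlating logic."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._entries: List[GrownDefect] = []

    def record(self, block_address: int) -> GrownDefect:
        """Time stamp a newly reported grown defect and store it."""
        entry = GrownDefect(block_address, self._clock())
        self._entries.append(entry)
        return entry

    def report(self) -> List[GrownDefect]:
        """Return a copy of the recorded list, oldest entry first."""
        return list(self._entries)
```

Passing the clock in as a parameter is a design choice made only so that the sketch can be exercised deterministically; an actual drive would use whatever time source it has available.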
Whenever a grown defect occurs on the hard disk, the recording channel 70 reports the occurrence to the HDC 60. The capability of hard disk drive systems to report the occurrence of a grown defect is not new. Known hard disk drive systems have this capability, as demonstrated above by the description of
Each grown defect corresponds to a defective memory element of the hard disk. Typically, a grown defect occurs when a memory element corresponding to a block of memory of the hard disk becomes defective. However, the invention is not limited with respect to the size of the memory element of the hard disk that has to be defective in order for a grown defect to have occurred.
The defect list containing the time stamped grown defects may be stored on the hard disk itself or in some other memory device of the hard disk drive system 50. The host processor 30 retrieves the defect list and processes it to determine the health of the hard disk drive system 50. The operations of the host processor 30 are controlled by the operating system (OS) 80 of the host processor 30. The OS 80 may include code that instructs the host processor 30 to retrieve and process the defect list containing the time stamped grown defect occurrences from the hard disk drive system 50. Alternatively, the host processor 30 may execute a hard disk monitoring and reporting software program 90 that instructs the host processor 30 to retrieve and process the defect list. The OS 80 or the program 90 may instruct the host processor 30 to display information relating to the time stamped grown defect occurrences on a display monitor 22 so that an end user can view the information and decide whether to take action to prevent data stored on the hard disk from being lost.
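By way of illustration only, one way a host-side program might process a retrieved list of time stamps is to count the grown defects falling in each fixed-length interval and compare the most recent interval with the earlier ones. The function names, the interval length, and the threshold factor are all assumptions made for this sketch; no particular health criterion is prescribed herein.

```python
from collections import Counter
from typing import List, Sequence


def defects_per_interval(timestamps: Sequence[float], interval: float) -> List[int]:
    """Count defects in consecutive intervals, from the first to the last
    interval that contains a defect (empty intervals in between count 0)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    counts = Counter(int(t // interval) for t in timestamps)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [counts.get(i, 0) for i in range(first, last + 1)]


def is_growth_accelerating(counts: Sequence[int], factor: float = 2.0) -> bool:
    """Return True if the latest interval holds at least `factor` times the
    mean count of all earlier intervals (an assumed heuristic)."""
    if len(counts) < 2:
        return False
    earlier = counts[:-1]
    baseline = sum(earlier) / len(earlier)
    return counts[-1] > 0 and counts[-1] >= factor * baseline
```

The per-interval counts could equally be plotted on the display monitor so that the end user can judge the trend directly.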
The request issued by the host processor 30 may be similar to the request CDB shown in
It should be noted that
The response to the request shown in
The invention is not limited to use with the SCSI protocol. Other protocols, including the Serial Attached SCSI (SAS) protocol, the Fibre Channel protocol, the Advanced Technology Attachment (ATA) protocol, the Advanced Technology Attachment Packet Interface (ATAPI) protocol, the Serial ATA (SATA) protocol, the Universal Serial Bus (USB) protocol, and the Institute of Electrical and Electronics Engineers (IEEE) 1394 protocols, for example, are also suitable for use with the invention. These other protocols use requests and responses that are different from those described above for the SCSI protocol. However, those skilled in the art will understand, in view of the description provided herein, the manner in which those protocols can be modified to enable time stamped grown defects to be retrieved by the host system. Therefore, a description of the manner in which those protocols can be modified to achieve the goals of the invention will not be provided herein in the interest of brevity.
In addition to or in lieu of displaying the list or information associated with the list, the host processor or OS may cause some other action to occur based on the information. For example, the host processor may evaluate the list and cause data stored on the hard disk to be backed up in a backup storage element (not shown) when the host processor determines that information on the list indicates that failure of the hard disk is imminent. Other actions are also possible. For example, in a system that includes multiple hard disks, the host processor may halt the storing of information on a disk that is about to fail and cause the information to be stored instead on a different hard disk of the system.
Also, although the invention preferably uses time stamps to time correlate defects, it is preferable, but not necessary, to record the exact instant in time at which a defect occurred. Instead, a time interval during which a defect occurred can be recorded. The term “time event,” as used herein, means an instant in time or a time interval. A time interval may include multiple instants in time. Therefore, the phrase “a time event at which” a defective memory element is detected can mean an exact instant in time at which a defect occurs or is detected, as well as a time interval during which a memory element becomes defective or the defect is detected.
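By way of illustration only, a “time event” covering both meanings can be represented as a start and an end, with an exact instant being the case in which the two are equal. The type and its member names are hypothetical and are offered merely as one possible representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeEvent:
    start: float  # beginning of the interval (assumed unit: seconds)
    end: float    # end of the interval; equal to start for an instant

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("end must not precede start")

    @classmethod
    def instant(cls, t: float) -> "TimeEvent":
        """Build a time event that denotes a single instant in time."""
        return cls(t, t)

    @property
    def is_instant(self) -> bool:
        return self.start == self.end

    def contains(self, t: float) -> bool:
        """Return True if instant t falls within this time event."""
        return self.start <= t <= self.end
```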
It should be noted that the invention has been described with reference to some preferred and exemplary embodiments and that the invention is not limited to these embodiments. Modifications may be made to the embodiments shown herein and all such modifications are within the scope of the invention.