System and Method for Detecting False Positive Information Handling System Device Connection Errors

Abstract
False positive error warnings associated with hot insertion or removal of a device on an SAS link are filtered by comparing the timing of error warnings with the timing of the hot insertion or removal. An SCSI Enclosure Processor monitors physical device presence events through a side band bus, such as an I2C bus interfaced with the physical devices. Upon detection of an error associated with the SAS link, an error filter module retrieves time stamped physical device presence events from the SCSI Enclosure Processor, compares the time stamp of each physical device presence event with the time of the error, and suppresses the warning if a time stamp falls within a predetermined time of the error.
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be better understood, and its numerous objects, features and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference number throughout the several figures designates a like or similar element.



FIG. 1 depicts a block diagram of an information handling system that filters errors detected at an SAS link to avoid false positive error warnings due to hot insertion or removal of a device; and



FIG. 2 depicts a flow diagram of a process for filtering link error warnings to avoid false positive error warnings due to hot insertion or removal of a device.





DETAILED DESCRIPTION

Filtering error warnings associated with an information handling system physical and electrical interconnect, such as an SAS link, to account for errors generated by hot insertion or removal of a device avoids issuance of false positive error warnings at the information handling system. For purposes of this disclosure, an information handling system may include any instrumentality or aggregate of instrumentalities operable to compute, classify, process, transmit, receive, retrieve, originate, switch, store, display, manifest, detect, record, reproduce, handle, or utilize any form of information, intelligence, or data for business, scientific, control, or other purposes. For example, an information handling system may be a personal computer, a network storage device, or any other suitable device and may vary in size, shape, performance, functionality, and price. The information handling system may include random access memory (RAM), one or more processing resources such as a central processing unit (CPU) or hardware or software control logic, ROM, and/or other types of nonvolatile memory. Additional components of the information handling system may include one or more disk drives, one or more network ports for communicating with external devices as well as various input and output (I/O) devices, such as a keyboard, a mouse, and a video display. The information handling system may also include one or more buses operable to transmit communications between the various hardware components.


Referring now to FIG. 1, a block diagram depicts an information handling system 10 that filters errors detected at an SAS link to avoid false positive error warnings due to hot insertion or removal of a device. Information handling system 10 has plural processing components to process information, such as a CPU 12, RAM 14, plural network interface cards (NICs) 16 and a chipset 20. Chipset 20 interfaces with a network controller, such as an SAS controller 22 in a host bus adapter (HBA) or RAID-on-chip (ROC), to support network controller interaction with CPU 12. SAS controller 22 provides an interface for the processing components to interact with SAS devices through an SAS environment or service delivery subsystem 24 that supports an SAS link 26. Devices supported by SAS link 26 include hard disk drives 28 or other types of SAS devices 30, such as tape drives, optical drives, scanners, facsimile devices, etc. A link end device monitoring subsystem 32, such as an SCSI Enclosure Processor (SEP), monitors the primary SAS link and end device environment by communicating with devices 28 and 30 through an out of band management bus 34, such as an I2C bus. The SAS environment 24, including link 26, devices 28 and 30, SEP 32 and I2C bus 34, may be integrated within information handling system 10 or distributed.
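SEP 32 is described as reporting time stamped physical device presence events, but the description does not state how those events are derived from the out of band bus. One plausible approach, sketched here under stated assumptions, is for the SEP to periodically read a per-slot presence snapshot and record an insertion or removal whenever a slot's state changes. The `PresenceEvent` record, the `PresenceMonitor` class and its `poll` method, and the boolean snapshot format are hypothetical illustrations, not an actual SEP interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class PresenceEvent:
    slot: int         # hypothetical drive-bay index behind the SEP
    kind: str         # "insert" or "remove"
    timestamp: float  # seconds on the SEP's clock (assumed)


@dataclass
class PresenceMonitor:
    """Hypothetical SEP-side log of presence changes observed over the side band bus."""
    events: List[PresenceEvent] = field(default_factory=list)
    _prev: Optional[List[bool]] = field(default=None, init=False, repr=False)

    def poll(self, snapshot: Sequence[bool], now: float) -> List[PresenceEvent]:
        """Record one presence snapshot (True = device present) and log any changes."""
        new_events: List[PresenceEvent] = []
        if self._prev is not None:
            # Compare each slot against the previous snapshot to detect hot plug activity.
            for slot, (was, present) in enumerate(zip(self._prev, snapshot)):
                if present and not was:
                    new_events.append(PresenceEvent(slot, "insert", now))
                elif was and not present:
                    new_events.append(PresenceEvent(slot, "remove", now))
        self._prev = list(snapshot)
        self.events.extend(new_events)
        return new_events
```

In this sketch the first poll only establishes a baseline, so devices already present at start-up do not generate insertion events.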


In operation, SAS controller 22 manages communication of information between processing components of information handling system 10 and SAS devices 28 or 30 through SAS link 26. SAS controller 22 also communicates through SAS link 26 with SEP 32 to monitor environmental information gathered through I2C management bus 34. An error detector 36 monitors traffic through SAS link 26 to detect errors and tracks the errors in a physical error log counter 38. By tracking the rate of change of errors in physical error log counter 38 over time, error detector 36 generates warnings of link failure or impending link failure, such as failing connections in SAS delivery subsystem 24, target devices 28 or 30, or other components of the SAS solution set. Issuance of visual warnings by error detector 36 when the values in log counter 38 exceed or are about to exceed a predetermined level of errors allows the end user to take corrective action. However, hot insertion or removal of a device at SAS link 26 generates errors that error detector 36 incorrectly perceives as a failed or failing connection, resulting in issuance of a false positive error warning.
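The rate-of-change tracking described for error detector 36 can be illustrated with a minimal sketch. It assumes the detector samples physical error log counter 38 at known times, treats a counter at or above a hypothetical `limit` as a link failure, and treats a linear projection that would reach `limit` within a hypothetical `horizon_s` look-ahead as an impending failure. The `ErrorDetector` class and its parameter names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class ErrorDetector:
    limit: int        # hypothetical predetermined error level
    horizon_s: float  # hypothetical look-ahead window for "about to exceed"
    _last: Optional[Tuple[float, int]] = field(default=None, init=False, repr=False)

    def sample(self, timestamp: float, count: int) -> Optional[str]:
        """Feed one reading of the error log counter; return a warning or None."""
        prev, self._last = self._last, (timestamp, count)
        if count >= self.limit:
            return "link failure"            # already at or above the level
        if prev is None:
            return None                      # need two samples to estimate a rate
        dt = timestamp - prev[0]
        if dt <= 0:
            return None                      # ignore duplicate or out-of-order samples
        rate = (count - prev[1]) / dt        # errors per second
        if count + rate * self.horizon_s >= self.limit:
            return "impending link failure"  # projected to reach the level soon
        return None
```

A burst of errors from a hot insertion produces a steep rate and therefore an "impending link failure" warning in this sketch, which is exactly the false positive the filter is meant to catch.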


In order to avoid issuance of false positive error warnings, an error filter module 40 monitors error warnings generated by error detector 36 and filters those error warnings to account for errors generated by hot insertion or removal of a physical device at SAS link 26. If error detector 36 issues an error warning, error filter module 40 confirms or refutes the error warning determination so that false positive error warnings, such as those caused by hot insertion or removal of a device at SAS link 26, are not issued. For example, error filter module 40 queries SEP 32 for time stamped information about physical device presence events and compares the time of each physical device presence event with the time the error warning was generated. If a physical device presence event correlates sufficiently with generation of an error warning, such as by occurring within a predetermined time period of it, then error filter module 40 suppresses issuance of the error warning. If insufficient temporal correspondence is found between the error warning and any physical device presence event, such as a hot insertion or removal, then error filter module 40 allows issuance of the warning at information handling system 10. Although FIG. 1 depicts SEP 32 as providing physical device environmental monitoring for presence events, in alternative embodiments other types of hardware, firmware or software can monitor the environment of physical devices through an out of band bus to provide environmental information to error filter module 40.
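The temporal correlation performed by error filter module 40 reduces to a window comparison. The following sketch assumes presence event times and warning times share a common clock in seconds, and uses a hypothetical `window_s` parameter for the predetermined time period. It treats the window as symmetric, so an event shortly before or shortly after the warning both count; a stricter embodiment might consider only events that precede the warning. The function names are illustrative.

```python
from typing import Iterable, Optional


def correlated_event(warning_time: float, event_times: Iterable[float],
                     window_s: float) -> Optional[float]:
    """Return the presence-event time nearest the warning if within window_s, else None."""
    best: Optional[float] = None
    for t in event_times:
        gap = abs(warning_time - t)
        if gap <= window_s and (best is None or gap < abs(warning_time - best)):
            best = t
    return best


def should_issue(warning_time: float, event_times: Iterable[float],
                 window_s: float) -> bool:
    """True means issue the warning; False means suppress it as a likely false positive."""
    return correlated_event(warning_time, event_times, window_s) is None
```

Returning the matched event, rather than only a yes/no answer, lets the filter record which hot plug event was used to refute a given warning.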


Referring now to FIG. 2, a flow diagram depicts a process for filtering link error warnings to avoid false positive error warnings due to hot insertion or removal of a device. The process begins at step 42 with monitoring of the physical error log counter for errors. At step 44, if an error is not detected, the process returns to step 42 to continue monitoring for errors. If at step 44 an error is detected, the process continues to step 46 to confirm or refute the error. At step 46, time stamped physical device presence information is retrieved from the link end device monitoring subsystem. At step 48, the physical device presence event information is analyzed to determine whether a physical device presence event occurred within a predetermined time of the detected error event. If sufficient temporal correspondence exists between a physical device presence event and the error event, the process continues to step 50, where the error event is refuted and filtered out from issuance as a false positive, and the process returns to step 42. If at step 48 insufficient temporal correspondence exists between any physical device presence event and the error event, the process continues to step 52 to generate a link error warning and then returns to step 42 to continue monitoring the physical error log counter for additional errors.
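The FIG. 2 flow can be sketched as a loop with each branch annotated by its step number. This is a simplified illustration, not the disclosed implementation: `counter_samples` stands in for monitoring the physical error log counter, `fetch_presence_times` is a hypothetical callback that queries the link end device monitoring subsystem, and `window_s` plays the role of the predetermined time.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def run_filter_loop(
    counter_samples: Iterable[Tuple[float, bool]],
    fetch_presence_times: Callable[[], Sequence[float]],
    window_s: float,
) -> Tuple[List[float], List[float]]:
    """Return (issued warning times, refuted error times) for the given samples."""
    issued: List[float] = []
    refuted: List[float] = []
    for t, error_detected in counter_samples:      # step 42: monitor the counter
        if not error_detected:                     # step 44: no error, keep monitoring
            continue
        presence_times = fetch_presence_times()    # step 46: retrieve presence events
        if any(abs(t - p) <= window_s for p in presence_times):  # step 48: correlate
            refuted.append(t)                      # step 50: refute as false positive
        else:
            issued.append(t)                       # step 52: generate link error warning
    return issued, refuted
```

For example, with an error at 10.0 s, another at 50.0 s, a single hot insertion at 9.0 s and a 2.0 s window, the first error is refuted and only the second produces a warning.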


Although the present invention has been described in detail, it should be understood that various changes, substitutions and alterations can be made hereto without departing from the spirit and scope of the invention as defined by the appended claims.

Claims
  • 1. An information handling system comprising: plural processing components operable to process information; a link controller interfaced with the processing components and operable to communicate information between the processing components and a link; a link interfaced with the link controller, the link operable to communicate information between the link controller and one or more devices interfaced with the link; a link end device monitoring subsystem interfaced with the link and a management bus, the link end device monitoring subsystem operable to communicate with the one or more devices through the management bus to monitor the status of the devices; an error detector interfaced with the link and operable to determine if errors associated with communication of information through the link exceed a predetermined error threshold; and an error filter module interfaced with the error detector and the link end device monitoring subsystem, the error filter module operable to filter errors determined by the error detector if a predetermined status is detected for a device by the link end device monitoring subsystem.
  • 2. The information handling system of claim 1 wherein the error filter module filters errors by: retrieving time stamped physical device presence information from the link end device monitoring subsystem; comparing the time of a physical device presence event with the time of the error; and filtering out the error if the physical device presence event is within a predetermined time period of the error.
  • 3. The information handling system of claim 2 wherein the error filter module filters errors further by: allowing the error to issue if the physical device presence event is separated from the error by more than the predetermined time period.
  • 4. The information handling system of claim 2 wherein the physical device presence event comprises a hot insertion of a device to the link.
  • 5. The information handling system of claim 2 wherein the physical device presence event comprises a hot removal of a device from the link.
  • 6. The information handling system of claim 1 wherein the link comprises an SAS link, the link controller comprises an SAS controller and the link end device monitoring subsystem comprises an SCSI enclosure processor.
  • 7. The information handling system of claim 6 wherein the management bus comprises an I2C bus.
  • 8. The information handling system of claim 6 wherein the device comprises a hard disk drive.
  • 9. The information handling system of claim 6 wherein the device comprises a tape drive.
  • 10. The information handling system of claim 6 wherein the device comprises an optical drive.
  • 11. A method for filtering errors associated with a link, the method comprising: communicating information across a link; detecting a predetermined link error threshold of errors associated with the communicating of information across the link; determining if a physical device presence event occurred within a predetermined time of the detecting a predetermined link error threshold of errors; and filtering the predetermined link error threshold of errors if the physical device presence event occurred within the predetermined time.
  • 12. The method of claim 11 further comprising: issuing an error message if a physical device presence event did not occur within the predetermined time.
  • 13. The method of claim 11 wherein the determining if a physical device presence event occurred within a predetermined time of the detecting a predetermined link error threshold of errors further comprises: retrieving time stamped physical device presence information from a side band link end device monitoring subsystem; and comparing the time of a physical device presence event with a time of the error.
  • 14. The method of claim 13 wherein the link comprises an SAS link and the side band link end device monitoring subsystem comprises an SCSI Enclosure Processor that interfaces with physical devices through a side band bus.
  • 15. The method of claim 14 wherein the side band bus comprises an I2C bus.
  • 16. The method of claim 13 wherein communicating information further comprises communicating information with a storage device.
  • 17. A system for managing communication across an SAS link with one or more devices, the system comprising: an error detector operable to detect errors in communication of information across the SAS link and to issue error warnings if the detected errors exceed a predetermined threshold; a link end device monitoring subsystem interfaced with the SAS link and interfaced with the one or more devices through a side band bus, the link end device monitoring subsystem operable to determine device presence events associated with the devices; and an error filter module interfaced with the error detector and the link end device monitoring subsystem, the error filter module operable to filter an error warning if the error warning occurs within a predetermined time of a device presence event.
  • 18. The system of claim 17 wherein the link end device monitoring subsystem comprises an SCSI Enclosure Processor.
  • 19. The system of claim 18 wherein the side band bus comprises an I2C bus.
  • 20. The system of claim 17 wherein the device presence events comprise a hot insertion of a device to the SAS link.