1. Technical Field
The present invention relates generally to computer systems and in particular to distributed storage systems. Still more particularly, the present invention relates to a method and system for improving error analysis and reporting in computer systems that include distributed storage systems.
2. Description of the Related Art
Distributed storage systems are increasingly utilized in the computer industry. Over the last several years, significant changes have occurred in how persistent storage devices are attached to computer systems. With the introduction of Storage Area Network (SAN) and Network Attached Storage (NAS) technologies, storage devices have evolved from locally attached, low-capability, passive devices to remotely attached, high-capability, active devices that are capable of deploying vast file systems and file sets.
As the storage devices and infrastructure become more and more distributed, isolating and correcting errors that occur within the distributed system becomes much more difficult. While a system administrator or monitoring component may be made aware of an error, it is not easy to determine where within the distributed system the error actually occurred. The administrator is forced to carry out a time-consuming, device-by-device analysis to determine whether the error occurred in the database, the file system, the storage device driver, the network connecting the distributed storage to the host system, or the distributed storage server.
A major contributing factor to the difficulty of conventional error resolution within a distributed system is that error tracking and reporting in distributed storage systems are not coordinated. When a problem is detected on the host system, the occurrence/detection of the problem is not conveyed to the storage server(s). As a result, error tracking and logging may not even be turned on at the storage server(s). Conversely, when an error is detected at a storage server, the error is not reported to the host system, and error tracking and logging is not turned on at the host system.
Notably, even if tracking is turned on at both the storage server and the host system, it may be impossible to correlate events because the time stamps at each system differ. Additionally, the target and/or level of tracking at the host system and storage server may be incompatible. As an example, the host may be tracking storage writes at the highest level of detail while the storage server logs only the fact that a write to the device has occurred.
There are several current procedures for collating system trace and error logs in a distributed computing environment. Among these are "syslog," a program utility introduced in BSD 4.2 and available on most Unix variants (e.g., AIX, Solaris, Linux), and "streams," another program utility originally introduced as part of AT&T System V UNIX. Syslog provides a set of procedures that are not storage system specific. Streams provides a mechanism that enables a message to traverse a stack bi-directionally. However, the mechanism is not applied to error tracking and reporting.
U.S. Patent Application No. 20020188711, titled "Failover Processing in a Storage System," provides policies for managing fault tolerance and high availability configurations. The approach encapsulates the knowledge of failover recovery between components within a storage server and between storage server systems. The encapsulated knowledge includes information about which components are participating in a Failover Set, how they are configured for failover, what the Fail-Stop policy is, and what steps to perform when "failing over" a component. However, there is no mechanism for tracking errors across the entire storage system.
The present invention recognizes the above limitations in tracking and recording errors occurring in distributed storage systems, and the invention details a method to coordinate error tracking, level setting, and reporting among the components of a distributed storage system that resolves the above-described problem. These and other benefits are provided by the invention described herein.
Disclosed is a method and system for coordinating system-wide error tracking, level setting, and reporting among the components of a distributed storage system. These functions are initiated via a single trigger operation. Each component of the distributed system, e.g., the host system(s) and storage server(s), includes a trigger generation and response (TGR) utility, which generates an error tracking trigger (ETT). The ETT comprises three primary sub-components: (1) a representation of the action that the initiator wants the stack error tracking mechanisms to take (e.g., start error logging for a specific target at a specific level of detail); (2) a message containing human readable data that the initiator wants the stack to immediately post in its logs; and (3) a route, representing the direction (to the host or to the storage server) in which the trigger is to be transmitted through the stack.
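By way of illustration only, the ETT described above may be modeled as a small record carrying the three sub-components. The following Python sketch is purely exemplary; the type names, field names, and enumerated route values are assumptions introduced for illustration and are not part of any particular implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    """Direction in which the trigger is transmitted through the stack."""
    TO_HOST = "to_host"
    TO_STORAGE_SERVER = "to_storage_server"


@dataclass
class ErrorTrackingTrigger:
    """Exemplary error tracking trigger (ETT) with its three sub-components."""
    action: str    # action the initiator wants the error tracking mechanisms to take,
                   # e.g., "start error logging for target X at detail level N"
    message: str   # human readable data each layer immediately posts to its log
    route: Route   # direction of transmission through the stack
```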
The TGR utility provides a software interface utilized to construct and transmit the error tracking trigger (ETT). The interface is accessible to all components of the stack and to all host system applications with the appropriate permissions. The ETT may be initiated by host system applications or by any layer of the stack, and the ETT is transmitted one layer at a time through the stack. Each intervening layer of the stack is designed/provided with a utility to examine the ETT and take the appropriate action(s) designated by the trigger. An error log is also maintained by each layer of the stack to record information about the error. User access is provided to these logs, and the user/administrator is able to review log entries immediately before and after the message for unusual events and determine the source, timing, and cause of errors. Accordingly, distributed tracking and logging of errors within the storage system is coordinated to track specific targets at a specific level of detail.
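Continuing the exemplary sketch above, the software interface provided by the TGR utility may, under the same illustrative assumptions, be reduced to two operations: one that constructs an ETT on behalf of an application or stack layer, and one that posts the message locally and hands the ETT to the next layer along its route. The class and method names below are hypothetical.

```python
class TGRUtility:
    """Exemplary software interface for constructing and transmitting an ETT."""

    def __init__(self, layer_name: str, next_layer: "TGRUtility | None" = None):
        self.layer_name = layer_name
        self.next_layer = next_layer    # adjacent layer in the route direction
        self.error_log: list[str] = []  # per-layer error log described above

    def build_trigger(self, action: str, message: str, route: Route) -> ErrorTrackingTrigger:
        """Construct an ETT on behalf of an application or any layer of the stack."""
        return ErrorTrackingTrigger(action=action, message=message, route=route)

    def issue_trigger(self, trigger: ErrorTrackingTrigger) -> None:
        """Post the trigger message locally, then forward it one layer at a time."""
        self.error_log.append(trigger.message)
        if self.next_layer is not None:
            self.next_layer.issue_trigger(trigger)
```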
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.
The invention itself, as well as a preferred mode of use, further objects, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
The present invention provides a method and system for coordinating error tracking, level setting, and reporting among the various layers/components of a distributed storage system via a single trigger operation. Each component of the distributed system includes a trigger generation and response (TGR) utility, which generates an error tracking trigger (ETT). The ETT comprises three primary sub-components: (1) an action that the initiator wants the stack's error tracking mechanisms to take; (2) a message containing human readable data that the initiator wants the stack to immediately post in its logs; and (3) a route, representing the direction that the trigger is to be transmitted through the stack. The ETT is transmitted one layer at a time through the stack, and each intervening layer of the stack is equipped with a utility to examine the ETT and take the appropriate action(s) designated by the trigger. An error log is maintained by each layer of the stack and used to record information about the error and enable user determination of the source, timing, and cause of errors.
While
With reference now to
Additionally, host system 101 includes a trigger generation and response (TGR) utility 131 and logical volume manager (LVM) 133 (within the illustrated embodiment), which, along with other software components executing on each device within the connected distributed storage network, enable the various functional features of the invention. Specifically, TGR utility 131 generates a software construct referred to herein as an error tracking trigger (ETT), which is described in greater detail below.
From a distributed storage network-level view, applications, such as databases 135 and file systems 137, execute on the host system 101 accessing virtualized storage pools (not shown). These storage pools are constructed by the host system(s) using file systems and/or logical volume managers and are physically backed by actual storage residing at one or more of the storage servers. As applications issue input/output (I/O) operations to the storage pools, these requests are passed through the host file system 137, host logical volume manager 133, and host device drivers 139.
For the purposes of the invention, the above-described processing pipeline (host file system, host logical volume manager, host device driver, storage network protocol, and storage server modules) is collectively referred to as the distributed storage system software stack. It is noted, however, that the described software stack is a logical definition presented for illustration only. Certain implementations may not contain all of the above components explicitly or may contain additional/differently named components providing similar functional features. For example, some operating systems which implement the features of the invention may not have a logical volume manager, but rather combine the function (of an LVM) with the file system. Those skilled in the art appreciate that the functional features of the invention are applicable to any of the different logical configurations of the software stack.
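For illustration only, this logical software stack may be represented as an ordered list of layers; the route sub-component of the ETT then simply determines the order in which the layers are visited. The layer names below mirror the description above but are otherwise hypothetical, and the sketch reuses the exemplary Route enumeration introduced earlier.

```python
# Exemplary, purely logical representation of the distributed storage software
# stack; an actual implementation may omit the LVM or introduce other layers.
HOST_TO_SERVER_STACK = [
    "host_file_system",
    "host_logical_volume_manager",
    "host_device_driver",
    "storage_network_protocol",
    "storage_server_modules",
]


def layers_along_route(route: Route) -> list[str]:
    """Return the layers in the order the ETT traverses them for a given route."""
    if route is Route.TO_STORAGE_SERVER:
        return list(HOST_TO_SERVER_STACK)
    return list(reversed(HOST_TO_SERVER_STACK))
```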
According to one embodiment, a software interface is provided for constructing and transmitting the error tracking trigger (ETT). This interface is accessible to all components of the stack and to all host system applications with the appropriate permissions. The interface is synonymous with, or a component part of, trigger generation and response (TGR) utility 131. Thus, the invention applies to file systems and databases. Notably, a general application of the features of the invention enables the initiation of a trigger by host system applications or by any layer of the stack.
An exemplary ETT is illustrated by
Returning to
Once the ETT is completed, the LVM issues the ETT, as shown at block 320, and the host's device driver receives the trigger and invokes a trigger receive/send algorithm, indicated by block 322. Then, the trigger is transmitted to the storage server, which invokes a trigger receive algorithm when the server receives the trigger, as indicated at block 324. Notably, the trigger receive algorithm is invoked by each layer of the stack.
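The trigger receive/send algorithm invoked at each layer (blocks 322 and 324) may be sketched as follows, again under the illustrative assumptions introduced earlier (reusing the hypothetical ErrorTrackingTrigger and layers_along_route): each layer takes the action designated by the trigger, posts the message to its own error log, and forwards the trigger to the next layer along the route until the destination is reached.

```python
def receive_trigger(trigger: ErrorTrackingTrigger,
                    remaining_layers: list[str],
                    logs: dict[str, list[str]]) -> None:
    """Exemplary trigger receive/send algorithm invoked by each layer of the stack."""
    if not remaining_layers:
        return                                    # trigger has reached its destination
    layer, rest = remaining_layers[0], remaining_layers[1:]
    # Take the action designated by the trigger, e.g., enable tracking of the
    # named target at the requested level of detail (details omitted here).
    # Post the human readable message to this layer's own error log.
    logs.setdefault(layer, []).append(trigger.message)
    # Transmit the trigger one layer at a time in the designated direction.
    receive_trigger(trigger, rest, logs)
```

In this sketch, invoking receive_trigger(ett, layers_along_route(ett.route), logs) causes every visited layer to post the same marker message, which is what later allows the disparate logs to be correlated.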
Next, at block 410, the message within the trigger is interpreted and a determination is made whether the error information should be recorded within an error log maintained by the destination component (i.e., the host system in the present example). Each component maintains an error log utilized for storing relevant information about errors that are encountered. If the error information is not of the type that is required to be placed in the error log, the process ends at block 412. If the error information is required to be logged, however, the required information about the error is recorded within the log, as indicated at block 414, and then the process ends at block 412.
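The logging decision of blocks 410 through 414 may be expressed, illustratively, as a small predicate applied at the destination component. The notion of a set of "required" action types is an assumption used here only to make the decision concrete.

```python
def record_if_required(trigger: ErrorTrackingTrigger,
                       error_log: list[dict],
                       required_actions: set[str]) -> bool:
    """Exemplary sketch of blocks 410-414: log the error information only when
    it is of a type required to be placed in the destination's error log."""
    if trigger.action not in required_actions:
        return False                   # block 412: nothing to record; process ends
    error_log.append({                 # block 414: record the required information
        "action": trigger.action,
        "message": trigger.message,
    })
    return True
```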
Once the trigger arrives at the storage server, the trigger message is read by the server. In general, the interface composes the trigger and initiates the trigger's transmission through the stack in the designated direction. The trigger is transmitted one layer at a time so that each intervening layer of the stack can examine the trigger and take appropriate actions. As the layers of the stack take the action designated by the trigger, the distributed tracking and logging of the system becomes coordinated and tracks a specific target at a specific level of detail. Additionally, as each layer of the stack posts the message to its log, the problem of correlating disparate system logs is resolved. Users reviewing the logs are able to examine log entries immediately before and after the message for unusual events, in order to determine the source, timing, and cause of errors.
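Because every layer posts the same human readable message, a reviewer can locate that message in each layer's log and inspect the surrounding entries. The helper below is a hypothetical illustration of such a review step, not a described feature of the invention.

```python
def entries_around_message(log: list[str], message: str, window: int = 3) -> list[str]:
    """Return the log entries immediately before and after the posted trigger
    message so that unusual events can be reviewed (exemplary helper only)."""
    for index, entry in enumerate(log):
        if entry == message:
            start = max(0, index - window)
            return log[start:index + window + 1]
    return []
```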
As a final matter, it is important to note that, while an illustrative embodiment of the present invention has been, and will continue to be, described in the context of a fully functional computer system with installed management software, those skilled in the art will appreciate that the software aspects of an illustrative embodiment of the present invention are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the present invention applies equally regardless of the particular type of signal bearing media used to actually carry out the distribution. Examples of signal bearing media include recordable type media such as floppy disks, hard disk drives, and CD-ROMs, and transmission type media such as digital and analog communication links.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.