Method of recording information system events

Information

  • Patent Grant
  • 6282673
  • Patent Number
    6,282,673
  • Date Filed
    Wednesday, October 1, 1997
  • Date Issued
    Tuesday, August 28, 2001
Abstract
A method of recording events occurring in an information processing system in a memory unit. A system recorder is used as part of a server system which supports communication of digital information for a microcontroller network. The server system monitors the status of several system functions including temperature, cooling fan speeds, and the presence or absence of canisters and power supplies. The system updates the pertinent event messages and identification codes in the memory unit including the time such event or change in status occurred.
Description




APPENDICES




Appendix A, which forms a part of this disclosure, is a list of commonly owned copending U.S. patent applications. Each one of the applications listed in Appendix A is hereby incorporated herein in its entirety by reference thereto.




COPYRIGHT RIGHTS




A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.




BACKGROUND OF THE INVENTION




1. Field of the Invention




The invention relates generally to information processing systems, such as system servers and personal computers (PCs). More particularly, this invention relates to the management and maintenance of information system failures.




2. Description of the Related Art




Information processing systems, such as computer system servers, have virtually become an inseparable part of information processing networks. These systems communicate and process an enormous amount of information in a relatively short time. To perform these sophisticated tasks, a computer system server typically includes various subsystems and components such as a plurality of microprocessors, memory modules, various system and bus control units, and a wide variety of data input/output (I/O) devices. These computer components communicate information using various data rates and protocols over multiple system buses. The demand for faster processing speeds, and the revolutionary fast-track development of computer systems, have necessitated the use of interconnecting devices. The wide variety of these devices, coupled with various data transfer protocols, has added special complexity to the management and maintenance of faults occurring in such information systems.




To facilitate the understanding of the invention, a brief description of the I²C bus protocol is first provided. FIG. 1 is a functional block diagram of an exemplary I²C bus application. As shown in FIG. 1, an I²C Bus 100 is provided to support data transfer among a variety of I²C devices. The I²C Bus 100 is a serial interface bus that allows multiple I²C devices to communicate via a bi-directional, two-wire serial bus. The I²C Bus 100 comprises two wires: a serial data line (SDA) 102 and a serial clock line (SCL) 104. The SDA 102 carries data transmissions among I²C devices, and the SCL 104 carries the clock timing information that synchronizes the data transmission. A complete system usually consists of at least one microcontroller and other peripheral devices such as memory units and input/output (I/O) expanders for transferring data on the I²C Bus 100. These peripheral devices may include liquid crystal display (LCD) and light emitting diode (LED) drivers, random access memory (RAM) and read only memory (ROM) devices, clock/calendars, I/O expanders, and analog-to-digital (A/D) and digital-to-analog (D/A) converters.




As shown in FIG. 1, a micro-controller A 106 and a micro-controller B 108 are coupled to the I²C Bus 100 for exchanging information on the I²C Bus 100. Additionally, an I²C-ISA Interface 110 is connected to the I²C Bus 100 to provide an access interface between industry standard architecture (ISA) devices and I²C devices. An LCD driver 112 is coupled to the I²C Bus 100 for displaying information accessed from other I²C devices located on the I²C Bus 100. An I/O Expander 114 is also coupled to the I²C Bus 100 to enable I/O devices (not shown in this figure) to obtain direct access to the I²C Bus 100. Moreover, a memory device 116 such as a RAM or an electrically erasable programmable read only memory (EEPROM) is also coupled to the I²C Bus 100 to provide storage of data transmitted by other I²C devices.




Each device connected to the I²C bus is software addressable by a unique address, and simple master/slave relationships exist at all times. The term “master” refers to an I²C device which initiates a transfer command to another I²C device, generates clock signals, and terminates the transfer on the I²C bus. The term “slave” refers to the I²C device which receives the transfer command from the master device on the I²C bus. The I²C bus is a true multi-master bus which includes collision detection and arbitration to prevent data corruption if two or more masters simultaneously initiate data transfer. Moreover, I²C devices act as transmitters and receivers. A “transmitter” is the I²C device which sends the data to the I²C Bus 100. A “receiver” is the I²C device which receives the data from the I²C Bus 100. Arbitration refers to a procedure whereby, if more than one master simultaneously attempts to control the I²C Bus 100, only one is allowed to do so and the transmitted message is not corrupted.




The I²C Bus 100 supports up to 40 I²C devices and may have a maximum length of 25 feet. The I²C Bus 100 supports a transfer data rate of up to 100 kilobits/second (kbps) in “standard mode,” or up to 400 kbps in “fast mode.” Data transfers over the I²C Bus 100 follow a well-defined protocol. A transfer always takes place between a master and a slave. All bus transfers are bounded by a “Start” and a “Stop” condition. In the standard mode, the first byte after the Start condition usually determines which slave will be selected by the master. In the fast mode, the first two bytes after the Start condition usually determine which slave will be selected by the master. Each peripheral device on the I²C Bus 100 has a unique 8-bit address in the standard mode, or a 10-bit address in the fast mode. The address is hard-coded for each type of I²C device, but some devices provide an input pin that allows a designer to specify one bit of the device's I²C address. This allows two identical I²C devices used on the same bus to be addressed individually.




With the increased complexity of information processing systems, the frequency of system failures due to system- and component-level errors has increased. Some of the problems are found in the industry standard architecture (ISA) bus used in IBM PC-compatible computers. The enhanced ISA (EISA) provided some improvement over the ISA architecture of the IBM PC/AT, but more resistance to failure and higher performance are still required. Other problems may exist in interface devices, such as bus-to-bus bridges. Additionally, problems may exist in bus peripheral devices such as microcontrollers, central processors, power supplies, cooling fans, and other similar components.




With these added components and subsystems, occasional system failures have become inevitable. Existing information systems do not currently provide a tool for managing these failures. More importantly, present systems do not possess the means to more efficiently diagnose and restore the system from the occurrence of such failures. Therefore, when failures occur, there is a need to identify the events leading up to these failures. The ability to identify the events leading up to system failures minimizes downtime and ensures more efficient system maintenance and repair in the future.




SUMMARY OF THE INVENTION




One embodiment of the invention provides a method of recording event messages with a real-time stamp in a memory unit. The method records an event occurring in an information processing system having a computer bus and a system recorder. The method comprises the acts of accessing the system recorder via the computer bus, transmitting a message to the system recorder in response to the event, and storing the message in a memory unit.











BRIEF DESCRIPTION OF THE DRAWINGS




The invention will be better understood by referring to the following detailed description, which should be read in conjunction with the accompanying drawings, in which:





FIG. 1 is a functional block diagram of an exemplary I²C bus application.

FIG. 2 is a functional block diagram of one embodiment of the invention.

FIG. 3 is a flow chart describing the decisional steps performed by one embodiment of the system recorder.

FIG. 4 is a flow chart describing the steps of performing an exemplary read from and/or write operations to a first block of the memory unit.

FIG. 5 is a flow chart describing the steps of performing an exemplary read operation from a second block of the memory unit.

FIG. 6 is a flow chart describing the steps of performing an exemplary write operation to a second block of the memory unit.











DETAILED DESCRIPTION OF THE INVENTION




The invention provides a method of recording a time-stamped history of events leading up to the failure of an information system server. The method may be applied to a black box recorder (hereinafter the “System Recorder”). One embodiment of the invention involves system operation on an Inter-Integrated-Circuit (I²C) bus. The operation of the System Recorder on an I²C bus should not, however, be construed to imply any limitations on the bus protocol which may be used with this invention. The invention may be implemented using virtually any bus protocol.




Referring now to FIG. 2, a functional block diagram of an embodiment of the invention is shown. This embodiment depicts a system server 200 which comprises the so-called “Intrapulse” system. The system server 200 is a network of microcontrollers integrated to support the transfer of data messages within the system server 200. The system server 200 performs control and monitoring functions of processing units, power supplies, cooling fans, and similar functions. To achieve its monitoring objective, the system server 200 utilizes a variety of control, diagnostic, monitoring, and logging processors. The system server 200 may employ switches, indicators, or other controls to perform its monitoring and control functions. Optionally, the system server 200 need not employ any switches, indicators, or other controls, to perform these functions. This characteristic is often referred to as the “fly-by-wire” feature.




The system server 200 is a part of and supports one or more microcontroller networks (not shown in this figure). A microcontroller network further includes a variety of peripheral components such as AC/DC power supplies and cooling fans. The system server 200 may be divided into two main subsystems: the system board 206 and the back plane 204. Communication between main system processing units (e.g., CPU 212) and the system interface 214 on the system board 206 is supported by an industry standard architecture (ISA) Bus 208. Communication among the devices located on the back plane 204 is supported by an I²C Bus 210. The I²C Bus 210 also supports communication between the back plane 204 and the system board 206.




The system board 206 comprises a system interface 214 and a plurality of central processor units (CPUs) including CPU “A Controller” 216 and CPU “B Controller” 218 interconnected via the I²C Bus 210. One or more system CPUs 212 communicate with the devices on the system board 206 via an ISA Bus 208. The system interface 214 provides a bridging function between the CPU 212 and the A Controller 216 and B Controller 218 on the system board 206. In addition, the system interface 214 interconnects the CPU 212 with other devices on the I²C Bus 210. A remote interface 240 is connected to the system board 206 to allow remote access by outside clients to the system server 200. Using a client modem 244, a client computer 246 accesses a server modem 242 via a remote link 243. The server modem 242 is typically directly connected to the remote interface 240 to support communication between the server system 200 and the client computer 246.




The back plane 204 includes a system recorder 220 and a chassis controller 222 interconnected via the I²C Bus 210. The system recorder 220 includes a real-time clock (RTC) 221. Additionally, a non-volatile random access memory (NVRAM) 224 is directly connected to the system recorder 220. A plurality of canister controllers are also coupled to the I²C Bus 210 to communicate with devices located on the back plane 204 and the system board 206. These canister controllers include “Canister Controller A” 232, “Canister Controller B” 234, “Canister Controller C” 236, and “Canister Controller D” 238 (the “canister controllers”). Generally, a canister is a detachable module which provides expandability to a plurality of peripheral component interconnect (PCI) devices. FIG. 2 does not show the canister controllers as part of the back plane 204 because they are removable units.




One embodiment of the system recorder 220 is a high-performance, CMOS, fully-static, 8-bit microcontroller which controls read and write operations from and into the NVRAM 224, respectively. The system recorder 220 of FIG. 2 has a multi-level deep stack, and multiple internal and external interrupt sources. The system recorder 220 may employ a Harvard architecture for allowing a 14-bit wide instruction word with separate 8-bit wide data. The system recorder 220 has 192 bytes of RAM and 33 I/O pins. In addition, several peripheral features are available, including: three timer/counters, two Capture/Compare modules, and two serial ports. The system recorder 220 can directly or indirectly address its register files or data memory. All special function registers including the program counter are mapped in the data memory. The system recorder 220 has a synchronous serial port which may be configured as either a two-wire I²C bus or a 3-wire serial peripheral interface (SPI). An 8-bit parallel slave port is also provided. The system recorder 220 may be based on microcontrollers manufactured by Microchip Technology Inc., e.g., the PIC16C6X family of microcontrollers.




The RTC 221 is integrated in the system recorder 220 on the back plane 204. The RTC 221 comprises two 32-bit counters which keep track of real time and elapsed time in seconds. The RTC 221 comprises a four-byte field (i.e., 32 bits) for recording time for over 125 years (2^32 seconds) without having to reset itself. It is designed to count seconds when its input power (Vcc) is applied and continually count seconds under battery backup regardless of the condition of Vcc. The continuous counter is used to derive time of day, week, month, and year by using a software algorithm. Alternatively, the RTC 221 is used under the control of the system recorder 220 to record real time events. Communication to and from the RTC 221 takes place via a 3-wire serial port. A one byte protocol selects read/write functions, counter clear functions and oscillator trim. The RTC 221 records real time in an absolute format. The O/S uses a reference point in time in order to synchronize the RTC 221 with the standard 24-hour time format.
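By way of illustration only, the derivation of calendar time from the 32-bit seconds counter may be sketched as follows. The reference point `REFERENCE` and the function names are hypothetical; the disclosure states only that the O/S supplies a reference point in time, not what that reference is.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical reference point; the disclosure does not specify one.
REFERENCE = datetime(1997, 1, 1, tzinfo=timezone.utc)


def counter_to_datetime(seconds: int, reference: datetime = REFERENCE) -> datetime:
    """Map a 32-bit seconds counter onto calendar time relative to a reference."""
    if not 0 <= seconds < 2**32:
        raise ValueError("counter must fit in a four-byte field")
    return reference + timedelta(seconds=seconds)


def years_until_rollover() -> float:
    """Approximate span of a 32-bit seconds counter, in 365.25-day years."""
    return 2**32 / (365.25 * 24 * 3600)
```

Under these assumptions a counter value of 86,400 maps to one day after the reference point, and the counter spans roughly 136 years, consistent with the "over 125 years" figure stated above.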




One embodiment of the NVRAM 224 is a 128-kbyte memory unit which is organized as 131,072 words by 8 bits. Each NVRAM 224 has a self-contained lithium energy source and control circuitry which continuously monitors its input voltage (Vcc) for an out-of-tolerance condition (e.g., +/− 10% of 5 Volts). When such a condition occurs, the lithium energy source is automatically switched on and write protection is unconditionally enabled to prevent data corruption. With special firmware, the NVRAM 224 is divided into two blocks: a first block having 64 kbytes of memory space, and a second block having 64 kbytes of memory space. The first block of the NVRAM 224 is a fixed-variable memory block which stores ID codes of the devices installed in the network. In addition to ID codes, the first block of NVRAM 224 may also store one or more address pointers, each pointing to a memory address in the second block of NVRAM 224. An address pointer may be a head pointer (indicating a start address) or a tail pointer (indicating an end address). The second block is a memory block which stores message codes in connection with events occurring in the network. The NVRAM 224 may be based upon devices manufactured by Dallas Semiconductor Corporation, e.g., the DS1245Y/AB 1024K Nonvolatile SRAM.




Once the system server 200 is powered on, the system recorder 220 writes an internal message entry to the NVRAM 224. When the power up process is enabled, the back plane 204 monitors the status of several system events and functions. These functions may include system temperature, fan speeds, and changes in the installation or presence of canisters and power supplies. Non-specific or general faults on most devices in the microcontroller network may be monitored in a summary bit. However, the fans, canisters, and temperature of the CPU may be monitored with particularity.




The back plane 204 monitors a plurality of temperature sensors located on a temperature bus (not shown in this figure) once every predetermined time interval, e.g., every second. Each temperature sensor comprises a transducer connected to and having an address at a serial bus (not shown in this figure) on the back plane 204. These transducers are read in the same sequence as their address order. The temperature may range between −25 and +70 degrees Celsius. If any of the temperature sensors reaches +55 degrees Celsius, or −25 degrees Celsius, then a warning is issued, and a message corresponding to that event is written to the NVRAM 224, and sent to other destinations via the system interface 214 and the remote interface 240. If any of the temperature sensors reaches +70 degrees Celsius, then a shutdown command is typically issued and the system is powered off.
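The temperature thresholds described above may be sketched, purely as an illustration, as a classification applied to each sensor in address order. The function names, the dictionary of readings, and the returned labels are hypothetical and are not taken from the disclosure; only the three threshold values are.

```python
WARN_HIGH_C = 55    # warning threshold stated in the text
WARN_LOW_C = -25    # lower warning threshold stated in the text
SHUTDOWN_C = 70     # shutdown threshold stated in the text


def classify_temperature(celsius: float) -> str:
    """Return 'shutdown', 'warning', or 'ok' for one sensor reading."""
    if celsius >= SHUTDOWN_C:
        return "shutdown"
    if celsius >= WARN_HIGH_C or celsius <= WARN_LOW_C:
        return "warning"
    return "ok"


def scan_sensors(readings: dict) -> list:
    """Classify readings keyed by sensor address, visiting them in address order."""
    return [(address, classify_temperature(readings[address]))
            for address in sorted(readings)]
```

In this sketch a "warning" result would correspond to writing an event message to the NVRAM 224 and notifying the interfaces, and a "shutdown" result to powering the system off; those actions are left out for brevity.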




The back plane 204 monitors the presence of the canisters several times per second. There are several methods to determine the presence or absence of a canister. To monitor the canister corresponding to Canister Controller A 232, for example, the chassis controller 222 sends a reset pulse to that canister, preferably through a direct one-wire serial bus connection. If the canister is changed/replaced, then the chassis controller 222 updates a canister presence bit accordingly and sends a canister event message to the system recorder 220 and remote interface 240, preferably via the I²C Bus 210. The system recorder 220 replaces the ID code (e.g., a serial number string) of the previous canister (corresponding to Canister Controller A 232) with the ID code of the current canister in the NVRAM 224 accordingly. If a canister is removed from the server system 200, then the length of the ID code string of that (absent) canister is set to zero. However, if a new canister is installed in its place, the ID code of the new canister is written to the NVRAM 224. Serial numbers are typically stored in NVRAM 224 in BCD format.
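As a minimal sketch of BCD storage of a serial number, the following assumes packed BCD (two decimal digits per byte) with a leading zero added to odd-length strings. Both choices are assumptions made for illustration; the disclosure states only that BCD format is typically used. The function names are hypothetical.

```python
def to_bcd(digits: str) -> bytes:
    """Pack a decimal digit string into packed BCD, two digits per byte."""
    if digits and not (digits.isascii() and digits.isdigit()):
        raise ValueError("serial number must contain decimal digits only")
    if len(digits) % 2:
        digits = "0" + digits  # assumption: pad odd lengths with a leading zero
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


def from_bcd(data: bytes) -> str:
    """Unpack packed BCD bytes back into a decimal digit string."""
    return "".join(f"{byte >> 4}{byte & 0x0F}" for byte in data)
```

An empty string packs to zero bytes, which is one way to represent the zero-length ID code of an absent canister.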




Similarly, the back plane 204 monitors the presence or absence of power supplies several times per second. To monitor a particular power supply, the chassis controller 222 transmits a reset pulse to detect a pulse of the power supply, preferably via a direct one-wire serial bus. If a power supply is replaced, the chassis controller 222 updates the presence bit for that power supply and sends a message corresponding to that power supply event to the NVRAM 224 and the remote interface 240. If a power supply is removed from the network, then the length of the ID code (e.g., serial number string) of that (absent) power supply is set to zero. However, if a new power supply is installed in the network, the system recorder 220 writes the ID code of that power supply into the NVRAM 224.




Similarly, the back plane 204 may monitor the speeds of the cooling fans of all CPUs in the same sequence as the CPUs' address order. For instance, the cooling fan of the system board 206 generally has a low-speed limit of about 30 revolutions per second (rps). Moreover, the cooling fan of a canister typically has a low-speed limit of about 20 rps. If the speed of any fan falls below its set low limit, a fan fault corresponding to that fan is issued. In addition, the system recorder 220 writes a fan event message into the NVRAM 224. Corrective measures such as setting a fan fault LED on, and setting the fan speed to high, may also be performed.
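The fan monitoring described above may be illustrated with the following sketch. The dictionary keys, the function name, and the action labels are hypothetical; only the two low-speed limits and the list of responses come from the text.

```python
# Low-speed limits in revolutions per second, as stated in the text.
LOW_LIMIT_RPS = {"system_board": 30, "canister": 20}


def check_fan(kind: str, speed_rps: float) -> list:
    """Return the responses for one fan reading; an empty list means no fault."""
    if speed_rps >= LOW_LIMIT_RPS[kind]:
        return []
    return [
        "issue_fan_fault",     # fault corresponding to that fan
        "log_fan_event",       # event message written to the NVRAM
        "set_fault_led_on",    # optional corrective measure
        "set_fan_speed_high",  # optional corrective measure
    ]
```

A reading exactly at the limit is treated as healthy here, since the text speaks of a speed that falls below the limit.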




The protocol of the I²C Bus 210 uses an address in the memory NVRAM 224 of the system server 200 as the means of identifying various control and diagnostic commands. Any system function is queried by generating a “read” request. Conversely, a function can be executed by generating a “write” request to an address specified in the protocol format. An I²C device in the system server 200 initiates read and write requests by sending a message on the I²C bus. A read or write request may consist of a payload, a message, and a packet. A payload is the data included in the request command. A message is a wrapper around the payload. In addition to the data, the message includes a slave address, a least significant bit (LSBit), a most significant bit (MSBit), a data type, a command ID (LSByte and MSByte), and status. A packet is a wrapper around a message that is transferred to the ISA Bus 208. The packet includes check sum and inverted slave address fields.




The slave address is typically a 7-bit wide field which specifies the identification code of a slave device. The slave address usually occupies the first byte of the message. The LSBit may specify the type of activity that is taking place on the bus. If the LSBit is set to 1 (i.e., high), the master is reading from a slave device. If the LSBit is set to 0 (i.e., low), then the master is writing to a slave device. The MSBit is bit 7 of the second byte (bits 0-7) of the message which specifies the type of command being executed. If the MSBit is set to a 1, then the command is a read command. If the MSBit is set to a 0, then the command is a write command. The data type specifies the data format of a read or write command. There are several data types that may be used in the server system 200. The data types include: a bit data type, a byte data type, a string data type, a log data type, an event data type, a queue data type, a byte array data type, a lock data type, and a screen data type. These data types determine the value specified in the Type field of a message.
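The first two bytes of a message, as described above, may be sketched as follows. This is an illustration under stated assumptions: the 7-bit slave address is taken to occupy the upper seven bits of the first byte (the usual I²C convention), and the data type is taken to occupy the lower seven bits of the second byte. The latter placement, the numeric type codes, and the function names are hypothetical and are not given in the disclosure.

```python
def encode_header(slave_addr: int, read: bool, data_type: int) -> bytes:
    """Build the first two message bytes from address, direction, and type."""
    if not 0 <= slave_addr < 0x80:
        raise ValueError("slave address must fit in 7 bits")
    if not 0 <= data_type < 0x80:
        raise ValueError("data type must fit in 7 bits (assumed layout)")
    byte0 = (slave_addr << 1) | int(read)   # LSBit: 1 = read, 0 = write
    byte1 = (int(read) << 7) | data_type    # MSBit (bit 7): 1 = read command
    return bytes([byte0, byte1])


def decode_header(header: bytes) -> dict:
    """Split the first two message bytes back into their fields."""
    byte0, byte1 = header[0], header[1]
    return {
        "slave_addr": byte0 >> 1,
        "bus_read": bool(byte0 & 0x01),
        "command_read": bool(byte1 & 0x80),
        "data_type": byte1 & 0x7F,
    }
```

For example, a read addressed to the hypothetical slave 0x21 with the hypothetical type code 0x03 encodes to the bytes 0x43 and 0x83.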




A bit data type is typically used for a simple logic value, such as True (1) and False (0), or On (1) and Off (0). The byte data type is used for a single-byte value, with a variable length of 0 through FF (hexadecimal). A string data type is used for a variable-length string of data having a length of 0 to FF bytes. The log data type is used to write a byte string to a circular log buffer, such as the NVRAM 224. The log data type records system events in the NVRAM 224. A byte array data type is used for general data storage which is not anticipated in the implementation of the Intrapulse system. An event data type is used to alert external interfaces of certain events occurring in the system server 200, such as status changes in the CPU, power supplies, canisters, cooling fans, temperature, screen, queue, and O/S timeout. A screen data type is used to communicate character mode screen data from BIOS to the remote interface unit 240.




The command ID (LSByte) specifies the least significant byte of the device address. Command ID (MSByte) specifies the most significant byte of the device address. The status byte specifies whether or not a command has been executed successfully. A non-zero entry indicates an execution failure. The check sum byte specifies a direction control byte to ensure the integrity of a message on the bus. The check sum byte is typically calculated in the system server 200 firmware. Finally, the inverted slave address byte specifies the slave address in an inverted format. The inverted slave address byte is also calculated in the system server 200 firmware.
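The packet wrapper may be illustrated with the sketch below. The disclosure does not specify the check sum algorithm or the order of the two trailing fields, so this sketch assumes an 8-bit additive check sum followed by the bitwise inverse of the first message byte. Both are assumptions made solely for illustration, and the function names are hypothetical.

```python
def checksum(message: bytes) -> int:
    """Assumed algorithm: sum of all message bytes, truncated to 8 bits."""
    return sum(message) & 0xFF


def inverted_slave_address(first_byte: int) -> int:
    """Bitwise inverse of the slave address byte, kept to 8 bits."""
    return ~first_byte & 0xFF


def wrap_packet(message: bytes) -> bytes:
    """Append the check sum and inverted slave address fields to a message."""
    if not message:
        raise ValueError("message must contain at least the slave address byte")
    return message + bytes([checksum(message), inverted_slave_address(message[0])])


def verify_packet(packet: bytes) -> bool:
    """Check both trailing fields against the enclosed message."""
    if len(packet) < 3:
        return False
    message, check, inverted = packet[:-2], packet[-2], packet[-1]
    return check == checksum(message) and inverted == inverted_slave_address(message[0])
```

Carrying the address in both true and inverted form lets a receiver reject a packet whose address byte was corrupted in transit, independently of the check sum.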




Referring now to FIG. 3, a flow chart is provided for describing the decisional steps performed by the system recorder 220 upon receiving interrupt commands from other microcontrollers. At step 302, the system recorder 220 is typically in an idle state, i.e., waiting for commands from other microcontrollers in the network. At step 304, the system recorder 220 determines if an interrupt command is detected from other microcontrollers. If no interrupt command is detected, then at step 306, the system recorder 220 checks if a reset command is received from other microcontrollers. A reset command is a request to clear all memory cells in the NVRAM 224. If a reset command is detected, then at step 308, the system recorder 220 clears all memory cells in the NVRAM 224 and returns to its idle state at step 302, and the entire process repeats itself. If a reset command is not detected, then at step 310, the system recorder 220 updates the RTC 221 time every one second. At this step, the system recorder 220 reads the real time clock and saves the real time in its local register (not shown in this figure).




If, at step 304, an interrupt command is detected from other microcontrollers, the system recorder 220 determines the type of data in the interrupt command at step 312. For the purpose of logging message events in the NVRAM 224, the log data and event data types are pertinent. As noted above, the log data type is used to write a byte string to a circular log buffer, such as the NVRAM 224. The log data type records system events in the NVRAM 224. The maximum number of bytes that can be written in a log entry is 249 bytes. For some embodiments of the invention, the system recorder 220 adds a total of six bytes at the beginning of the interrupt command: a two-byte identification code (ID), and a four-byte timestamp for recording the real time of the occurrence of the system event.
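The six-byte prefix described above may be sketched as follows. The big-endian byte order and the function name are assumptions made for illustration; the disclosure specifies only the field widths and the 249-byte limit.

```python
import struct

MAX_LOG_BYTES = 249   # maximum log entry body stated in the text
PREFIX_BYTES = 6      # two-byte ID plus four-byte timestamp


def build_log_entry(entry_id: int, timestamp: int, body: bytes) -> bytes:
    """Prepend a two-byte ID and a four-byte timestamp to a log entry body."""
    if len(body) > MAX_LOG_BYTES:
        raise ValueError("log entry body exceeds 249 bytes")
    # ">HI" = big-endian unsigned 16-bit ID, then unsigned 32-bit timestamp
    # (the byte order is an assumption, not taken from the disclosure).
    return struct.pack(">HI", entry_id, timestamp) + body
```

With a full 249-byte body the stored entry is 255 bytes long, which fits a length field of 0 to FF bytes.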




Based on the interpretation of the data type at step 314, the system recorder 220 determines whether the interrupt command is intended to be communicated to the first block or second block of the NVRAM 224. If the interrupt command is intended to go to the first block of NVRAM 224, then the process described in FIG. 4 is followed. If the interrupt command is not intended to be transmitted to the first block of NVRAM 224, then it is intended to go to the second block of NVRAM 224. At step 316, the system recorder 220 determines whether the interrupt command is a read or write command for the second block. If the interrupt command is a read command, then the process described in FIG. 5 is followed. If the interrupt command is not a read command, then it is a write command and the process described in FIG. 6 is followed.
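The decisional steps of FIG. 3 may be summarized, as an illustration only, by the following sketch. The representation of an interrupt command as a dictionary with `block` and `read` keys, and the returned labels, are hypothetical. How the data type maps to a target block is not detailed in the text, so the target block is taken here as already known.

```python
def next_action(interrupt=None, reset=False) -> str:
    """Choose the branch taken from the idle state (step 302)."""
    if interrupt is None:                      # step 304: no interrupt detected
        if reset:                              # step 306: reset command received
            return "clear_nvram"               # step 308
        return "update_rtc"                    # step 310
    # Steps 312 and 314: the data type selects the target block.
    if interrupt["block"] == 1:
        return "fig4_first_block"
    # Step 316: read or write to the second block.
    if interrupt["read"]:
        return "fig5_read_second_block"
    return "fig6_write_second_block"
```

Each branch returns to the idle state when it completes, so a caller would invoke this function once per pass through the loop.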




Referring to FIG. 4, a flow chart is provided for describing the steps of performing a read from and/or write to the first block of the NVRAM 224. As noted above, the first block of the NVRAM 224 is a 64-kbyte memory block. The first block is a fixed-variable memory block which stores ID codes of the devices installed in the network. Hence, a command addressed to the first block is typically generated by a controller (e.g., chassis controller 222) responsible for updating the presence or absence of devices in the network. The process described in FIG. 4 is followed when, at step 314 (shown in FIG. 3), the system recorder 220 determines that the command interrupt is intended for the first block of the NVRAM 224.




As shown in FIG. 4, at step 402, the system recorder 220 determines whether the interrupt command is to read from or write to the NVRAM 224. If the command interrupt is a read command, then at step 404, the system recorder 220 loads the address pointer at the intended address location in NVRAM 224. At step 406, the system recorder 220 reads the intended message from the address location in the NVRAM 224, and forwards the read data to the master device (i.e., device requesting the read operation) in the network. After the read operation is complete, at step 412, the system recorder 220 issues an interrupt return command to return to its idle state at step 302 (shown in FIG. 3).




If at step 402 the system recorder 220 determines that the interrupt command is a write command, then at step 408, the system recorder 220 loads the address pointer at the intended address location in NVRAM 224. The system recorder 220 typically checks on the availability of memory space in NVRAM 224 prior to executing a write operation (see FIG. 6 for details). At step 410, the system recorder 220 writes the event message to the address location in the NVRAM 224, and forwards a confirmation to the master device in the network. After the write operation is complete, at step 412, the system recorder 220 issues an interrupt return command to return to its idle state at step 302 (shown in FIG. 3).




Referring now to FIG. 5, a flow chart is provided for describing the steps of performing a read operation from the second block of the NVRAM 224. As noted above, the second block of the NVRAM 224 is a 64-kbyte memory block. The second block is a memory block which stores event messages in connection with events occurring in the network. Hence, a command addressed to the second block is typically generated by a controller responsible for updating the occurrence of such events. The process described in FIG. 5 is followed when, at step 316 (shown in FIG. 3), the system recorder 220 determines that the interrupt command is a read command intended to be transmitted to the second block of the NVRAM 224.




As shown in FIG. 5, if the system recorder 220 determines that the interrupt command is a read operation, then at step 502, the system recorder 220 loads an address pointer to the intended address in the second block of NVRAM 224. At step 504, the system recorder 220 performs a read operation of the first logged message from the NVRAM 224 commencing with the intended address location. For a read operation, it is preferable that only the 65534 (FFFEh) and 65533 (FFFDh) addresses be recognized. The address 65534 specifies the address of the oldest valid message. The address 65533 specifies the address of the next message following the last message read from the log in NVRAM 224. The last address in the second block of the NVRAM 224 is 65279 (FEFFh). This is also the address at which the system recorder 220 performs a pointer wrap operation (see FIG. 6 for details). In doing so, the system recorder 220 redirects the address pointer to the beginning of the second block of the NVRAM 224. Hence, the next message address after address 65279 is 0. To perform a read operation of the entire second block in chronological order, the timestamp is read first. Then, the message logged at address 65534 is read. This message constitutes the first logged message. Then, the message logged at address 65533 is read. This message is the next logged message. Then, address 65533 is read repeatedly to retrieve all subsequently logged messages. Reading at address 65533 continues until the status field returns a non-zero value.
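The two-address read scheme can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `EventLog` class, the `read_all` helper, and the `STATUS_END` code are hypothetical, the log is modeled as an in-memory list kept oldest first, the initial timestamp read is omitted, and a non-zero status is assumed to mean that no further messages remain.

```python
ADDR_OLDEST = 0xFFFE   # 65534: address of the oldest valid message
ADDR_NEXT = 0xFFFD     # 65533: address of the message after the last one read
STATUS_OK = 0
STATUS_END = 1         # assumed non-zero code meaning "no more messages"


class EventLog:
    """Toy stand-in for the second block; messages are kept oldest first."""

    def __init__(self, messages):
        self._messages = list(messages)
        self._cursor = 0   # index of the next message to hand out

    def read(self, address):
        """Return (status, message) for one of the two recognized addresses."""
        if address == ADDR_OLDEST:
            self._cursor = 0                   # restart at the oldest message
        elif address != ADDR_NEXT:
            raise ValueError("unrecognized read address")
        if self._cursor >= len(self._messages):
            return STATUS_END, None
        message = self._messages[self._cursor]
        self._cursor += 1
        return STATUS_OK, message


def read_all(log):
    """Read the whole log in chronological order."""
    out = []
    status, message = log.read(ADDR_OLDEST)    # first logged message
    while status == STATUS_OK:
        out.append(message)
        status, message = log.read(ADDR_NEXT)  # repeat until status is non-zero
    return out
```

A master device using this scheme never needs to know physical message addresses; it issues one read at 65534 and then polls 65533 until the status changes.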




At step 506, the system recorder 220 determines whether the address location has reached the end of the second block in the NVRAM 224. If the address location has not reached the end of the second block, then at step 508, the system recorder 220 performs a read operation of the next logged message using the addressing scheme described above. The system recorder 220 transmits all read messages to the master device via the I2C bus. If the address location has reached the end of the second block, then at step 510, the system recorder 220 issues an interrupt return command to return to its idle state 302 (shown in FIG. 3).




Referring now to FIG. 6, a flow chart is provided for describing the steps of performing a write operation to the second block of the NVRAM 224. Typically, a command addressed to the second block is generated by a controller (e.g., chassis controller 222) responsible for updating the occurrence of such events. The process described in FIG. 6 is followed when, at step 316 (shown in FIG. 3), the system recorder 220 determines that the interrupt command is a write command directed to the second block of the NVRAM 224.




As shown in FIG. 6, if the system recorder 220 determines that the interrupt command is a write command, then at step 602, the system recorder 220 loads an address pointer to the intended address in the second block of NVRAM 224. At step 604, the system recorder 220 determines whether a memory space is available in the second block of NVRAM 224 to perform the requested write operation. If a memory space is not available in the second block, then at step 606, the system recorder 220 performs a pointer wrap operation. In doing so, the system recorder 220 redirects the address pointer to the beginning of the second block of the NVRAM 224. The system recorder 220 erases the memory space corresponding to a single previously logged message which occupies that memory space. Additional previously logged messages are erased only if more memory space is required to perform the present write operation.
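The pointer wrap and selective erase can be sketched as a small circular log. The sketch rests on assumptions that the text does not state: "memory space is not available" is modeled as the message not fitting between the address pointer and the end of the block, and the `WrapLog` class with its `records` bookkeeping is hypothetical. Only the last address, 65279 (FEFFh), comes from the description.

```python
LAST_ADDRESS = 0xFEFF            # 65279, last address in the second block
BLOCK_SIZE = LAST_ADDRESS + 1


class WrapLog:
    """Toy circular log; tracks which previously logged messages are still live."""

    def __init__(self, size=BLOCK_SIZE):
        self.size = size
        self.data = bytearray(size)
        self.pointer = 0         # next write address
        self.records = []        # (start, length) of live messages, in write order

    def write(self, message: bytes) -> int:
        """Store one message and return the address it was written to."""
        n = len(message)
        if n == 0 or n > self.size:
            raise ValueError("message must fit in the block")
        if self.pointer + n > self.size:   # no room before the end of the block
            self.pointer = 0               # pointer wrap to the beginning
        start, end = self.pointer, self.pointer + n
        # Erase only the previously logged messages that occupy the needed space.
        self.records = [(s, l) for (s, l) in self.records
                        if s + l <= start or s >= end]
        self.data[start:end] = message
        self.records.append((start, n))
        self.pointer = end
        return start
```

With a 10-byte block, for example, a fourth 3-byte message wraps to address 0 and displaces only the first message, leaving the second and third intact.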




If the system recorder 220 determines that a memory space is available in the second block of the NVRAM 224, then at step 608, the system recorder 220 fetches the time from the real-time clock 221 and stamps (i.e., appends) the real time to the message being written. As noted above, the real time comprises a four-byte field (i.e., 32 bits) which is appended to the message being written. At step 610, the system recorder 220 writes the time-stamped message to the second block of the NVRAM 224. At step 612, the system recorder 220 issues an interrupt return command to return to its idle state 302 (shown in FIG. 3).
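The time-stamping step can be illustrated with a short sketch. The four-byte field width comes from the description; the encoding is an assumption, since the text here does not say how the real time is represented. The sketch assumes a big-endian count of seconds, and the `stamp` and `split_stamp` names are hypothetical.

```python
import struct
import time


def stamp(message: bytes, now=None) -> bytes:
    """Append a four-byte (32-bit) time field to the message."""
    seconds = int(time.time() if now is None else now) & 0xFFFFFFFF
    return message + struct.pack(">I", seconds)   # assumed big-endian seconds


def split_stamp(stamped: bytes):
    """Recover (message, seconds) from a time-stamped message."""
    (seconds,) = struct.unpack(">I", stamped[-4:])
    return stamped[:-4], seconds
```

Passing `now` explicitly stands in for the value fetched from the real-time clock and keeps the sketch deterministic.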




Upon the occurrence of a system failure, system maintenance personnel retrieve the logged event messages in chronological order to identify and trace the events leading up to the time when the failure occurred. As a result, the system failure is easily repairable with minimal downtime.




In view of the foregoing, it will be appreciated that the invention overcomes the longstanding need for logging and recording the occurrence of information system events leading up to the occurrence of system failures without the disadvantages of having complex failure diagnosis. The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive and the scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.



Claims
  • 1. A method of recording an event occurring in an information processing system having a computer bus and a system recorder, said method comprising: accessing the system recorder via the computer bus; transmitting a message to the system recorder in response to the event, wherein said message comprises a plurality of bits representing at least a slave address, a least significant bit, a most significant bit, a data type, a command ID, and a status field; and storing the message in a memory unit.
  • 2. The method as defined in claim 1, wherein the act of accessing the system recorder includes the act of accessing a real-time clock for time-stamping said message.
  • 3. The method as defined in claim 1, wherein the act of storing the message includes the act of storing the message in a non-volatile random access memory (NVRAM) unit having an exclusive power source.
  • 4. The method as defined in claim 1, wherein the act of storing in the memory unit includes the act of storing in a plurality of memory blocks, wherein one block of the plurality of memory blocks stores a device identification code.
  • 5. The method as defined in claim 4, wherein another block of the plurality of memory blocks stores an event message.
  • 6. The method as defined in claim 1, wherein the act of accessing the system recorder includes the act of supporting a read operation from the memory unit.
  • 7. The method as defined in claim 1, wherein the act of accessing via the computer bus includes the act of accessing via an inter-integrated-circuit (I2C) bus.
  • 8. In an information processing system having a plurality of components and experiencing first-type and second-type events, a method of storing the events comprising: receiving a signal indicative of at least one of the first-type and the second-type events; storing information relating to the first-type event associated with one of the plurality of components in a first block of a memory; and storing information relating to the second-type event associated with one of the plurality of components in a second block of the memory in a wrap around sequence, wherein said signal comprises at least a message of a failure in one of the plurality of components, and wherein the first and the second blocks of the memory are configured to store the information without restriction of time.
  • 9. The method as defined in claim 8, wherein the act of storing the information relating to the first-type event includes the act of storing at least an identification number.
  • 10. The method as defined in claim 8, wherein the act of storing the information relating to the first-type event includes the act of storing an event message relating to a change in the presence of one of the plurality of components.
  • 11. A program storage device storing instructions that when executed by a computer perform a method of recording an event occurring in an information processing system having a computer bus and a system recorder, said method comprising: accessing the system recorder via the computer bus; transmitting a message to the system recorder in response to the event, wherein said message comprises a plurality of bits representing at least a slave address, a least significant bit, a most significant bit, a data type, a command ID, and a status field; and storing the message in a memory unit.
  • 12. The device as defined in claim 11, wherein the method further comprises time-stamping said message for storage.
  • 13. The device as defined in claim 11, wherein storing the message includes storing the message in a non-volatile random access memory (NVRAM) having an exclusive source.
  • 14. The device as defined in claim 11, wherein storing the message includes storing the message in a plurality of memory blocks, and wherein one block of the plurality of memory blocks stores a device identification code.
  • 15. The device as defined in claim 14, wherein another block of the plurality of memory blocks stores an event message.
  • 16. The device as defined in claim 11, further comprising performing a read operation from the memory unit.
  • 17. The device as defined in claim 11, wherein accessing the system recorder via the computer bus includes accessing the system recorder via an inter-integrated circuit (I2C) bus.
  • 18. A program storage device storing instructions that when executed by a computer perform, in an information processing system having a plurality of components and experiencing first-type and second-type events, a method of storing the events comprising: receiving a signal indicative of at least one of the first-type and the second-type events; storing information relating to the first-type event associated with one of the plurality of components in a first block of a memory; and storing information relating to the second-type event associated with one of the plurality of components in a second block of the memory in a wrap around sequence, wherein the signal comprises at least a message of a failure in one of the plurality of components, and wherein the first and the second blocks of the memory are configured to store the information without restriction of time.
  • 19. The device as defined in claim 18, wherein storing information relating to a first-type event includes storing information relating to at least an identification number.
  • 20. The device as defined in claim 18, wherein storing information relating to the first-type event includes storing information that relates to a change in the presence of one of the plurality of components.
  • 21. The device as defined in claim 18, wherein storing information relating to the second-type event includes storing information that relates to a failure occurring in one of the plurality of components.
  • 22. The device as defined in claim 18, further comprising allocating the size of the first block to be equal to the size of the second block.
  • 23. A method of recording information relating to an event experienced by a component in an information system having a processor and a memory unit, said method comprising: receiving a command including data from the processor, wherein the command comprises a plurality of bits representing at least a slave address, a least significant bit, a most significant bit, a data type, a command ID, and a status field; determining the data type of the command; and performing at least one of a read and write operation in the memory unit in response to the act of determining the data type of the command.
  • 24. The method as defined in claim 23, wherein the act of receiving a command includes the act of receiving an interrupt command.
  • 25. The method as defined in claim 23, wherein the act of receiving the command includes the act of determining if the command has been received.
  • 26. The method as defined in claim 25, further comprising the act of determining if a reset command has been received.
  • 27. The method as defined in claim 26, further comprising the act of clearing the memory unit if the reset command is received.
  • 28. The method as defined in claim 26, further comprising the act of saving a real time if the reset command is not received.
  • 29. The method as defined in claim 25, further comprising the act of determining if the command is intended for a first block of the memory unit.
  • 30. The method as defined in claim 29, further comprising the act of determining if the command is a read command.
  • 31. The method as defined in claim 30, further comprising the act of performing a read operation from the memory unit, if the command is a read command.
  • 32. The method as defined in claim 30, further comprising the act of performing a write operation to the memory unit, if the command is not a read command.
  • 33. The method as defined in claim 29, further comprising the act of determining if the command is a read command, if the command is not intended for the first block of the memory unit.
  • 34. The method as defined in claim 33, further comprising the act of performing a read operation from the memory unit, if the command is a read command.
  • 35. The method as defined in claim 33, further comprising the act of performing a write operation to the memory unit, if the command is not a read command.
  • 36. The method as defined in claim 23, further comprising the act of returning to an idle state after performing a read operation.
  • 37. The method as defined in claim 25, further comprising the act of returning to an idle state after performing a write operation.
  • 38. A method of recording an event occurring in an information processing system having a computer bus, said method comprising: monitoring the occurrence of the event; generating a message in response to the event; accessing a system recorder via the computer bus; transmitting the message to the system recorder, wherein said message comprises a plurality of bits representing at least a slave address, a least significant bit, a most significant bit, a data type, a command ID, and a status field; time-stamping the message using a substantially real-time clock; and storing the message in a memory unit.
  • 39. The method as defined in claim 38, wherein the act of monitoring the occurrence of the event includes the act of monitoring the occurrence of a failure in the information processing system.
  • 40. In an information processing system having a plurality of components and experiencing an event, a method of storing the event comprising: storing information relating to the presence of one of the plurality of components into a first memory block; and storing information relating to the failure of one of the plurality of components into a second memory block in a wrap around sequence, wherein the first and second memory blocks are configured to store the information without restriction of time.
RELATED APPLICATIONS

The subject matter of U.S. Patent Application entitled BLACK BOX RECORDER FOR INFORMATION SYSTEM EVENTS, filed on Oct. 1, 1997, application Ser. No. 08/942,381, is related to this application. The benefit under 35 U.S.C. §119(e) of the following U.S. provisional application(s) is hereby claimed:

Provisional Applications (3)
Number Date Country
60/046397 May 1997 US
60/047016 May 1997 US
60/046416 May 1997 US