Method of recording information system events

Description

APPENDICES

Appendix A, which forms a part of this disclosure, is a list of commonly owned copending U.S. patent applications. Each one of the applications listed in Appendix A is hereby incorporated herein in its entirety by reference thereto.

COPYRIGHT RIGHTS

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates generally to information processing systems, such as system servers and personal computers (PCs). More particularly, this invention relates to the management and maintenance of information system failures.

2. Description of the Related Art

Information processing systems, such as computer system servers, have virtually become an inseparable part of information processing networks. These systems communicate and process an enormous amount of information in a relatively short time. To perform these sophisticated tasks, a computer system server typically includes various subsystems and components such as a plurality of microprocessors, memory modules, various system and bus control units, and a wide variety of data input/output (I/O) devices. These computer components communicate information using various data rates and protocols over multiple system buses. The demand for faster processing speeds, and the revolutionary fast-track development of computer systems, have necessitated the use of interconnecting devices. The wide variety of these devices, coupled with various data transfer protocols, has added special complexity to the management and maintenance of faults occurring in such information systems.

To facilitate the understanding of the invention, a brief description of the I²C bus protocol is first provided. FIG. 1 is a functional block diagram of an exemplary I²C bus application. As shown in FIG. 1, an I²C Bus 100 is provided to support data transfer among a variety of I²C devices. The I²C Bus 100 is a serial interface bus that allows multiple I²C devices to communicate via a bi-directional, two-wire serial bus. The I²C Bus 100 comprises two wires: a serial data line (SDA) 102 and a serial clock line (SCL) 104. The SDA 102 carries data transmissions among I²C devices, and the SCL 104 carries the clock timing information that synchronizes the data transmission. A complete system usually consists of at least one microcontroller and other peripheral devices such as memory units and input/output (I/O) expanders for transferring data on the I²C Bus 100. These peripheral devices may include liquid crystal display (LCD) and light emitting diode (LED) drivers, random access memory (RAM) and read only memory (ROM) devices, clock/calendars, I/O expanders, and analog-to-digital (A/D) and digital-to-analog (D/A) converters.

As shown in FIG. 1, a micro-controller A 106 and a micro-controller B 108 are coupled to the I²C Bus 100 for exchanging information on the I²C Bus 100. Additionally, an I²C-ISA Interface 110 is connected to the I²C Bus 100 to provide an access interface between industry standard architecture (ISA) devices and I²C devices. An LCD driver 112 is coupled to the I²C Bus 100 for displaying information accessed from other I²C devices located on the I²C Bus 100. An I/O Expander 114 is also coupled to the I²C Bus 100 to enable I/O devices (not shown in this figure) to obtain direct access to the I²C Bus 100. Moreover, a memory device 116 such as a RAM or an electrically erasable programmable read only memory (EEPROM) is also coupled to the I²C Bus 100 to provide storage of data transmitted by other I²C devices.

Each device connected to the I²C bus is software addressable by a unique address, and simple master/slave relationships exist at all times. The term “master” refers to an I²C device which initiates a transfer command to another I²C device, generates clock signals, and terminates the transfer on the I²C bus. The term “slave” refers to the I²C device which receives the transfer command from the master device on the I²C bus. The I²C bus is a true multi-master bus which includes collision detection and arbitration to prevent data corruption if two or more masters simultaneously initiate data transfer. Moreover, I²C devices act as transmitters and receivers. A “transmitter” is the I²C device which sends the data to the I²C Bus 100. A “receiver” is the I²C device which receives the data from the I²C Bus 100. Arbitration refers to a procedure whereby, if more than one master simultaneously attempts to control the I²C Bus 100, only one is allowed to do so and the transmitted message is not corrupted.
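
The arbitration procedure described above can be illustrated with a minimal Python sketch. It assumes the wired-AND behavior of the I²C data line (a 0 driven by any master overrides a 1), which this description does not spell out; the function name `arbitrate` and the bit-list representation are hypothetical.

```python
def arbitrate(transmissions):
    """Return the index of the master that wins arbitration.

    Each entry is the list of bits one master tries to send. The data
    line is modelled as wired-AND: it reads 0 if any active master
    drives 0. A master that sends 1 but reads back 0 backs off.
    """
    active = set(range(len(transmissions)))
    length = min(len(bits) for bits in transmissions)
    for i in range(length):
        line = min(transmissions[m][i] for m in active)  # wired-AND of active masters
        active = {m for m in active if transmissions[m][i] == line}
        if len(active) == 1:
            break
    # Identical transmissions never collide; pick the lowest index in that case.
    return min(active)


# Master 1 wins at the third bit, where it drives 0 against master 0's 1.
winner = arbitrate([[1, 0, 1, 1], [1, 0, 0, 1]])
```

Because the winning master never sees a bit on the line that differs from the one it sent, its message reaches the bus unchanged, which is the property the paragraph above describes.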

The I²C Bus 100 supports up to 40 I²C devices and may have a maximum length of 25 feet. The I²C Bus 100 supports a transfer data rate of up to 100 kilobits/second (kbps) in “standard mode,” or up to 400 kbps in “fast mode.” Data transfers over the I²C Bus 100 follow a well-defined protocol. A transfer always takes place between a master and a slave. All bus transfers are bounded by a “Start” and a “Stop” condition. In the standard mode, the first byte after the Start condition usually determines which slave will be selected by the master. In the fast mode, the first two bytes after the Start condition usually determine which slave will be selected by the master. Each peripheral device on the I²C Bus 100 has a unique 8-bit address in the standard mode, or a 10-bit address in the fast mode. The address is hard-coded for each type of I²C device, but some devices provide an input pin that allows a designer to specify one bit of the device's I²C address. This allows two identical I²C devices used on the same bus to be addressed individually.

With the increased complexity of information processing systems, the frequency of system failures due to system- and component-level errors has increased. Some of the problems are found in the industry standard architecture (ISA) bus used in IBM PC-compatible computers. The enhanced ISA (EISA) provided some improvement over the ISA architecture of the IBM PC/AT, but more resistance to failure and higher performance are still required. Other problems may exist in interface devices, such as bus-to-bus bridges. Additionally, problems may exist in bus peripheral devices such as microcontrollers, central processors, power supplies, cooling fans, and other similar components.

With these added components and subsystems, occasional system failures have become inevitable. Existing information systems do not currently provide a tool for managing these failures. More importantly, present systems do not possess the means to more efficiently diagnose and restore the system from the occurrence of such failures. Therefore, when failures occur, there is a need to identify the events leading up to these failures. The ability to identify the events leading up to system failures minimizes downtime and ensures more efficient system maintenance and repair in the future.

SUMMARY OF THE INVENTION

One embodiment of the invention provides a method of recording event messages with a real-time stamp in a memory unit. The method records an event occurring in an information processing system having a computer bus and a system recorder. The method comprises the acts of accessing the system recorder via the computer bus, transmitting a message to the system recorder in response to the event, and storing the message in a memory unit.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be better understood by referring to the following detailed description, which should be read in conjunction with the accompanying drawings, in which:

FIG. 1 is a functional block diagram of an exemplary I²C bus application.

FIG. 2 is a functional block diagram of one embodiment of the invention.

FIG. 3 is a flow chart describing the decisional steps performed by one embodiment of the system recorder.

FIG. 4 is a flow chart describing the steps of performing an exemplary read from and/or write operations to a first block of the memory unit.

FIG. 5 is a flow chart describing the steps of performing an exemplary read operation from a second block of the memory unit.

FIG. 6 is a flow chart describing the steps of performing an exemplary write operation to a second block of the memory unit.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides a method of recording a time-stamped history of events leading up to the failure of an information system server. The method may be applied to a black box recorder (hereinafter the “System Recorder”). One embodiment of the invention involves system operation on an Inter-Integrated-Circuit (I²C) bus. The operation of the System Recorder on an I²C bus should not, however, be construed to imply any limitations on the bus protocol which may be used with this invention. The invention may be implemented using virtually any bus protocol.

Referring now to FIG. 2, a functional block diagram of an embodiment of the invention is shown. This embodiment depicts a system server 200 which comprises the so-called “Intrapulse” system. The system server 200 is a network of microcontrollers integrated to support the transfer of data messages within the system server 200. The system server 200 performs control and monitoring functions of processing units, power supplies, cooling fans, and similar functions. To achieve its monitoring objective, the system server 200 utilizes a variety of control, diagnostic, monitoring, and logging processors. The system server 200 may employ switches, indicators, or other controls to perform its monitoring and control functions. Optionally, the system server 200 need not employ any switches, indicators, or other controls, to perform these functions. This characteristic is often referred to as the “fly-by-wire” feature.

The system server 200 is a part of and supports one or more microcontroller networks (not shown in this figure). A microcontroller network further includes a variety of peripheral components such as AC/DC power supplies and cooling fans. The system server 200 may be divided into two main subsystems: the system board 206 and the back plane 204. Communication between main system processing units (e.g., CPU 212) and the system interface 214 on the system board 206 is supported by an industry standard architecture (ISA) Bus 208. Communication among the devices located on the back plane 204 is supported by an I²C Bus 210. The I²C Bus 210 also supports communication between the back plane 204 and the system board 206.

The system board 206 comprises a system interface 214 and a plurality of central processor units (CPUs) including CPU “A Controller” 216 and CPU “B Controller” 218 interconnected via the I²C Bus 210. One or more system CPUs 212 communicate with the devices on the system board 206 via an ISA Bus 208. The system interface 214 provides a bridging function between the CPU 212 and the A Controller 216 and B Controller 218 on the system board 206. In addition, the system interface 214 interconnects the CPU 212 with other devices on the I²C Bus 210. A remote interface 240 is connected to the system board 206 to allow remote access by outside clients to the system server 200. Using a client modem 244, a client computer 246 accesses a server modem 242 via a remote link 243. The server modem 242 is typically directly connected to the remote interface 240 to support communication between the server system 200 and the client computer 246.

The back plane 204 includes a system recorder 220 and a chassis controller 222 interconnected via the I²C Bus 210. The system recorder 220 includes a real-time clock (RTC) 221. Additionally, a non-volatile random access memory (NVRAM) 224 is directly connected to the system recorder 220. A plurality of canister controllers are also coupled to the I²C Bus 210 to communicate with devices located on the back plane 204 and the system board 206. These canister controllers include “Canister Controller A” 232, “Canister Controller B” 234, “Canister Controller C” 236, and “Canister Controller D” 238 (the “canister controllers”). Generally, a canister is a detachable module which provides expandability to a plurality of peripheral component interconnect (PCI) devices. FIG. 2 does not show the canister controllers as part of the back plane 204 because they are removable units.

One embodiment of the system recorder 220 is a high-performance, CMOS, fully-static, 8-bit microcontroller which controls read and write operations from and into the NVRAM 224, respectively. The system recorder 220 of FIG. 2 has a multi-level deep stack, and multiple internal and external interrupt sources. The system recorder 220 may employ a Harvard architecture for allowing a 14-bit wide instruction word with separate 8-bit wide data. The system recorder 220 has 192 bytes of RAM and 33 I/O pins. In addition, several peripheral features are available, including: three timer/counters, two Capture/Compare modules, and two serial ports. The system recorder 220 can directly or indirectly address its register files or data memory. All special function registers including the program counter are mapped in the data memory. The system recorder 220 has a synchronous serial port which may be configured as either a two-wire I²C bus or a 3-wire serial peripheral interface (SPI). An 8-bit parallel slave port is also provided. The system recorder 220 may be based on microcontrollers manufactured by Microchip Technology Inc., e.g., the PIC16C6X family of microcontrollers.

The RTC 221 is integrated in the system recorder 220 on the back plane 204. The RTC 221 comprises two 32-bit counters which keep track of real time and elapsed time in seconds. The RTC 221 comprises a four-byte field (i.e., 32 bits) for recording time for over 125 years (2^32 seconds) without having to reset itself. It is designed to count seconds when its input power (Vcc) is applied and continually count seconds under battery backup regardless of the condition of Vcc. The continuous counter is used to derive time of day, week, month, and year by using a software algorithm. Alternatively, the RTC 221 is used under the control of the system recorder 220 to record real time events. Communication to and from the RTC 221 takes place via a 3-wire serial port. A one byte protocol selects read/write functions, counter clear functions and oscillator trim. The RTC 221 records real time in an absolute format. The O/S uses a reference point in time in order to synchronize the RTC 221 with the standard 24-hour time format.
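
The relationship between the 32-bit seconds counter and calendar time can be sketched in Python as follows. This is only an illustration: the reference point `REFERENCE` is a hypothetical choice, since the description does not name the reference point the O/S uses, and the function name `counter_to_datetime` is likewise an assumption.

```python
from datetime import datetime, timedelta

COUNTER_BITS = 32
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Span of a 32-bit seconds counter before it wraps, expressed in years.
span_years = (2 ** COUNTER_BITS) / SECONDS_PER_YEAR

# Hypothetical reference point; the description does not specify one.
REFERENCE = datetime(1970, 1, 1)


def counter_to_datetime(seconds, reference=REFERENCE):
    """Convert an absolute seconds count into a calendar date and time."""
    return reference + timedelta(seconds=seconds & 0xFFFFFFFF)


# One day, one hour, one minute and one second after the reference point.
example = counter_to_datetime(86_400 + 3_661)
```

Under these assumptions the counter spans roughly 136 years, which is consistent with the "over 125 years" stated above.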

One embodiment of the NVRAM 224 is a 128-kbyte memory unit which is organized as 131,072 words by 8 bits. Each NVRAM 224 has a self-contained lithium energy source and control circuitry which continuously monitors its input voltage (Vcc) for an out-of-tolerance condition (e.g., +/− 10% of 5 Volts). When such a condition occurs, the lithium energy source is automatically switched on and write protection is unconditionally enabled to prevent data corruption. With special firmware, the NVRAM 224 is divided into two blocks: a first block having 64 kbytes of memory space, and a second block having 64 kbytes of memory space. The first block of the NVRAM 224 is a fixed-variable memory block which stores ID codes of the devices installed in the network. In addition to ID codes, the first block of NVRAM 224 may also store one or more address pointers, each pointing to a memory address in the second block of NVRAM 224. An address pointer may be a head pointer (indicating a start address) or a tail pointer (indicating an end address). The second block is a memory block which stores message codes in connection with events occurring in the network. The NVRAM 224 may be based upon devices manufactured by Dallas Semiconductor Corporation, e.g., the DS1245Y/AB 1024K Nonvolatile SRAM.
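
The two-block partition can be pictured with a short Python sketch. It assumes, purely for illustration, that the first block occupies the lower half of the physical address space and the second block the upper half; the description does not state the physical layout, and `physical_address` is a hypothetical helper.

```python
NVRAM_SIZE = 131_072      # 128 kbytes, organized as 131,072 words by 8 bits
BLOCK_SIZE = 64 * 1024    # each of the two blocks holds 64 kbytes


def physical_address(block, offset):
    """Map (block number, offset within block) to a physical NVRAM address.

    Assumes block 1 (ID codes, pointers) sits below block 2 (event log).
    """
    if block not in (1, 2):
        raise ValueError("block must be 1 or 2")
    if not 0 <= offset < BLOCK_SIZE:
        raise ValueError("offset outside the 64-kbyte block")
    return (block - 1) * BLOCK_SIZE + offset
```

With this layout the two blocks together cover exactly the 131,072 addressable bytes of the device.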

Once the system server 200 is powered on, the system recorder 220 writes an internal message entry to the NVRAM 224. When the power up process is enabled, the back plane 204 monitors the status of several system events and functions. These functions may include system temperature, fan speeds, and changes in the installation or presence of canisters and power supplies. Non-specific or general faults on most devices in the microcontroller network may be monitored in a summary bit. However, the fans, canisters, and temperature of the CPU may be monitored with particularity.

The back plane 204 monitors a plurality of temperature sensors located on a temperature bus (not shown in this figure) once every predetermined time interval, e.g., every second. Each temperature sensor comprises a transducer connected to and having an address at a serial bus (not shown in this figure) on the back plane 204. These transducers are read in the same sequence as their address order. The temperature may range between −25 and +70 degrees Celsius. If any of the temperature sensors reaches +55 degrees Celsius, or −25 degrees Celsius, then a warning is issued, and a message corresponding to that event is written to the NVRAM 224, and sent to other destinations via the system interface 214 and the remote interface 240. If any of the temperature sensors reaches +70 degrees Celsius, then a shutdown command is typically issued and the system is powered off.
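
The temperature thresholds above can be expressed as a small Python sketch. The function names and the dictionary of readings keyed by sensor address are hypothetical; only the threshold values and the address-order polling come from the description.

```python
WARN_HIGH_C = 55    # warning at or above +55 degrees Celsius
WARN_LOW_C = -25    # warning at or below -25 degrees Celsius
SHUTDOWN_C = 70     # shutdown at or above +70 degrees Celsius


def classify_temperature(celsius):
    """Map one sensor reading to 'ok', 'warning', or 'shutdown'."""
    if celsius >= SHUTDOWN_C:
        return "shutdown"
    if celsius >= WARN_HIGH_C or celsius <= WARN_LOW_C:
        return "warning"
    return "ok"


def poll(readings):
    """Classify all sensors in address order; readings maps address -> Celsius."""
    return [(addr, classify_temperature(readings[addr])) for addr in sorted(readings)]
```

In a fuller model, a "warning" result would trigger the write to the NVRAM 224 and the notifications described above, and a "shutdown" result would trigger the power-off command.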

The back plane 204 monitors the presence of the canisters several times per second. There are several methods to determine the presence or absence of a canister. To monitor the canister corresponding to Canister Controller A 232, for example, the chassis controller 222 sends a reset pulse to that canister, preferably through a direct one-wire serial bus connection. If the canister is changed/replaced, then the chassis controller 222 updates a canister presence bit accordingly and sends a canister event message to the system recorder 220 and remote interface 240, preferably via the I²C Bus 210. The system recorder 220 replaces the ID code (e.g., a serial number string) of the previous canister (corresponding to Canister Controller A 232) by the ID code of the current canister in the NVRAM 224 accordingly. If a canister is removed from the server system 200, then the length of the ID code string of that (absent) canister is set to zero. However, if a new canister is installed in its place, the ID code of the new canister is written to the NVRAM 224. Serial numbers are typically stored in NVRAM 224 in BCD format.

Similarly, the back plane 204 monitors the presence or absence of power supplies several times per second. To monitor a particular power supply, the chassis controller 222 transmits a reset pulse to detect a pulse of the power supply, preferably via a direct one-wire serial bus. If a power supply is replaced, the chassis controller 222 updates the presence bit for that power supply and sends a message corresponding to that power supply event to the NVRAM 224 and the remote interface 240. If a power supply is removed from the network, then the length of the ID code (e.g., serial number string) of that (absent) power supply is set to zero. However, if a new power supply is installed in the network, the system recorder 220 writes the ID code of that power supply into the NVRAM 224.

Similarly, the back plane 204 may monitor the speeds of the cooling fans of all CPUs in the same sequence as the CPUs' address order. For instance, the cooling fan of the system board 206 generally has a low-speed limit of about 30 revolutions per second (rps). Moreover, the cooling fan of a canister typically has a low-speed limit of about 20 rps. If the speed of any fan falls below its set low limit, a fan fault corresponding to that fan is issued. In addition, the system recorder 220 writes a fan event message into the NVRAM 224. Corrective measures such as setting a fan fault LED on, and setting the fan speed to high, may also be performed.
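
A minimal Python sketch of the fan check follows. The limits are the approximate values given above; the action names and the `check_fan` function are hypothetical labels for the fault handling described in the paragraph.

```python
# Approximate low-speed limits from the description, in revolutions per second.
LOW_LIMIT_RPS = {"system_board": 30, "canister": 20}


def check_fan(kind, speed_rps):
    """Return the actions taken for one fan reading (empty list if healthy)."""
    if speed_rps >= LOW_LIMIT_RPS[kind]:
        return []
    # Speed fell below the low limit: issue the fault and corrective measures.
    return ["log_fan_event", "fan_fault_led_on", "set_fan_speed_high"]
```

A reading exactly at the limit is treated as healthy here, because the description refers to a speed that falls below the limit.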

The protocol of the I²C Bus 210 uses an address in the memory NVRAM 224 of the system server 200 as the means of identifying various control and diagnostic commands. Any system function is queried by generating a “read” request. Conversely, a function can be executed by generating a “write” request to an address specified in the protocol format. An I²C device in the system server 200 initiates read and write requests by sending a message on the I²C bus. A read or write request may consist of a payload, a message, and a packet. A payload is the data included in the request command. A message is a wrapper around the payload. In addition to the data, the message includes a slave address, a least significant bit (LSBit), a most significant bit (MSBit), a data type, a command ID (LSByte and MSByte), and status. A packet is a wrapper around a message that is transferred to the ISA Bus 208. The packet includes check sum and inverted slave address fields.

The slave address is typically a 7-bit wide field which specifies the identification code of a slave device. The slave address usually occupies the first byte of the message. The LSBit may specify the type of activity that is taking place on the bus. If the LSBit is set to 1 (i.e., high), the master is reading from a slave device. If the LSBit is set to 0 (i.e., low), then the master is writing to a slave device. The MSBit is bit 7 of the second byte (0-7 bits) of the message which specifies the type of command being executed. If the MSBit is set to a 1, then the command is a read command. If the MSBit is set to a 0, then the command is a write command. The data type specifies the data format of a read or write command. There are several data types that may be used in the server system 200. The data types include: a bit data type, a byte data type, a string data type, a log data type, an event data type, a queue data type, a byte array data type, a lock data type, and a screen data type. These data types determine the value specified in the Type field of a message.

A bit data type is typically used for a simple logic value, such as True (1) and False (0), or On (1) and Off (0). The byte data type is used for a single-byte value, with a variable length of 0 through FF (hexadecimal). A string data type is used for a variable-length string of data having a length of 0 to FF bytes. The log data type is used to write a byte string to a circular log buffer, such as the NVRAM 224. The log data type records system events in the NVRAM 224. A byte array data type is used for general data storage which is not anticipated in the implementation of the Intrapulse system. An event data type is used to alert external interfaces of certain events occurring in the system server 200, such as status changes in the CPU, power supplies, canisters, cooling fans, temperature, screen, queue, and O/S timeout. A screen data type is used to communicate character mode screen data from BIOS to the remote interface unit 240.

The command ID (LSByte) specifies the least significant byte of the device address. Command ID (MSByte) specifies the most significant byte of the device address. The status byte specifies whether or not a command has been executed successfully. A non-zero entry indicates an execution failure. The check sum byte specifies a direction control byte to ensure the integrity of a message on the bus. The check sum byte is typically calculated in the system server 200 firmware. Finally, the inverted slave address byte specifies the slave address in an inverted format. The inverted slave address byte is also calculated in the system server 200 firmware.
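
The payload, message, and packet layering can be sketched in Python as shown below. Several details are assumptions made only for illustration, because the description does not fix them: the order of the fields, the placement of the data type in the low seven bits of the second byte, the additive modulo-256 check sum, and the omission of the status byte from a request. The function names are hypothetical.

```python
def build_packet(slave_address, read, data_type, command_id, payload):
    """Wrap a payload in a message, then wrap the message in a packet.

    Field order and the additive check sum are assumptions, not taken
    from the description.
    """
    # First byte: 7-bit slave address with the read/write LSBit.
    first = ((slave_address & 0x7F) << 1) | (1 if read else 0)
    # Second byte: MSBit (bit 7) flags a read command; low bits carry the type.
    second = (0x80 if read else 0x00) | (data_type & 0x7F)
    header = bytes([first, second, command_id & 0xFF, (command_id >> 8) & 0xFF])
    message = header + bytes(payload)
    checksum = sum(message) & 0xFF      # assumed algorithm
    inverted = ~first & 0xFF            # slave address byte, bitwise inverted
    return message + bytes([checksum, inverted])


def verify_packet(packet):
    """Check both integrity fields of a packet built by build_packet."""
    message, checksum, inverted = packet[:-2], packet[-2], packet[-1]
    return checksum == (sum(message) & 0xFF) and inverted == (~message[0] & 0xFF)


pkt = build_packet(0x22, False, 0x04, 0x0102, b"\x10\x20")
```

Carrying both a check sum and an inverted copy of the slave address gives the receiver two independent checks, so a single corrupted byte in either the address or the body is detected in this sketch.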

Referring now to FIG. 3, a flow chart is provided for describing the decisional steps performed by the system recorder 220 upon receiving interrupt commands from other microcontrollers. At step 302, the system recorder 220 is typically in an idle state, i.e., waiting for commands from other microcontrollers in the network. At step 304, the system recorder 220 determines if an interrupt command is detected from other microcontrollers. If no interrupt command is detected, then at step 306, the system recorder 220 checks if a reset command is received from other microcontrollers. A reset command is a request to clear all the memory cells in the NVRAM 224. If a reset command is detected, then at step 308, the system recorder 220 clears all memory cells in the NVRAM 224 and returns to its idle state at step 302, and the entire process repeats itself. If a reset command is not detected, then at step 310, the system recorder 220 updates the RTC 221 time every one second. At this step, the system recorder 220 reads the real time clock and saves the real time in its local register (not shown in this figure).

If, at step 304, an interrupt command is detected from other microcontrollers, the system recorder 220 determines the type of data in the interrupt command at step 312. For the purpose of logging message events in the NVRAM 224, the log data and event data type are pertinent. As noted above, the log data type is used to write a byte string to a circular log buffer, such as the NVRAM 224. The log data type records system events in the NVRAM 224. The maximum number of bytes that can be written in a log entry is 249 bytes. For some embodiments of the invention, the system recorder 220 adds a total of six bytes at the beginning of the interrupt command: a two-byte identification code (ID), and a four-byte timestamp for recording the real time of the occurrence of the system event.
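
The six-byte prefix can be illustrated with a short Python sketch. Two points are assumptions, since the description leaves them open: the big-endian byte order, and the reading that the 249-byte limit applies to the message supplied by the sender rather than to the stored entry. The name `make_log_entry` is hypothetical.

```python
import struct

MAX_LOG_BYTES = 249   # maximum bytes in a log entry, per the description


def make_log_entry(entry_id, timestamp, message):
    """Prefix a log message with a 2-byte ID and a 4-byte timestamp."""
    if len(message) > MAX_LOG_BYTES:
        raise ValueError("log entry exceeds 249 bytes")
    # Big-endian packing is an assumption; the byte order is not specified.
    return struct.pack(">HI", entry_id, timestamp) + bytes(message)


entry = make_log_entry(1, 100, b"fan")
```

Under this reading, the largest stored entry is 249 + 6 = 255 bytes, which fits the FF (hexadecimal) upper bound given for string lengths elsewhere in the description.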

Based on the interpretation of the data type at step 314, the system recorder 220 determines whether the interrupt command is intended to be communicated to the first block or second block of the NVRAM 224. If the interrupt command is intended to go to the first block of NVRAM 224, then the process described in FIG. 4 is followed. If the interrupt command is not intended to be transmitted to the first block of NVRAM 224, then it is intended to go to the second block of NVRAM 224. At step 316, the system recorder 220 determines whether the interrupt command is a read or write command for the second block. If the interrupt command is a read command, then the process described in FIG. 5 is followed. If the interrupt command is not a read command, then it is a write command and the process described in FIG. 6 is followed.
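
The decisions of FIG. 3 can be summarized in a small Python sketch. The function name, the tuple representation of an interrupt command, and the returned labels are hypothetical; the branching follows steps 304 through 316 as described above.

```python
def next_action(interrupt=None, reset=False):
    """Pick the system recorder's next action from its idle state.

    interrupt is None, or a (block, is_read) tuple describing the command.
    """
    if interrupt is None:
        # Steps 306-310: no interrupt, so handle a reset or refresh the time.
        return "clear_nvram" if reset else "update_rtc"
    block, is_read = interrupt
    if block == 1:
        return "fig4_first_block"            # step 314: first block
    # Step 316: second block, split by read or write.
    return "fig5_read_second_block" if is_read else "fig6_write_second_block"
```

Each returned label corresponds to one of the procedures described with reference to FIGS. 4 through 6, after which the recorder returns to its idle state.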

Referring to FIG. 4, a flow chart is provided for describing the steps of performing a read from and/or write to the first block of the NVRAM 224. As noted above, the first block of the NVRAM 224 is a 64-kbyte memory block. The first block is a fixed-variable memory block which stores ID codes of the devices installed in the network. Hence, a command addressed to the first block is typically generated by a controller (e.g., chassis controller 222) responsible for updating the presence or absence of devices in the network. The process described in FIG. 4 is followed when, at step 314 (shown in FIG. 3), the system recorder 220 determines that the command interrupt is intended for the first block of the NVRAM 224.

As shown in FIG. 4, at step 402, the system recorder 220 determines whether the interrupt command is to read from or write to the NVRAM 224. If the command interrupt is a read command, then at step 404, the system recorder 220 loads the address pointer at the intended address location in NVRAM 224. At step 406, the system recorder 220 reads the intended message from the address location in the NVRAM 224, and forwards the read data to the master device (i.e., device requesting the read operation) in the network. After the read operation is complete, at step 412, the system recorder 220 issues an interrupt return command to return to its idle state at step 302 (shown in FIG. 3).

If at step 402 the system recorder 220 determines that the interrupt command is a write command, then at step 408, the system recorder 220 loads the address pointer at the intended address location in NVRAM 224. The system recorder 220 typically checks on the availability of memory space in NVRAM 224 prior to executing a write operation (see FIG. 6 for details). At step 410, the system recorder 220 writes the event message to the address location in the NVRAM 224, and forwards a confirmation to the master device in the network. After the write operation is complete, at step 412, the system recorder 220 issues an interrupt return command to return to its idle state at step 302 (shown in FIG. 3).

Referring now to FIG. 5, a flow chart is provided for describing the steps of performing a read operation from the second block of the NVRAM 224. As noted above, the second block of the NVRAM 224 is a 64-kbyte memory block. The second block is a memory block which stores event messages in connection with events occurring in the network. Hence, a command addressed to the second block is typically generated by a controller responsible for updating the occurrence of such events. The process described in FIG. 5 is followed when, at step 316 (shown in FIG. 3), the system recorder 220 determines that the interrupt command is a read command intended to be transmitted to the second block of the NVRAM 224.

As shown in FIG. 5, if the system recorder 220 determines that the interrupt command is a read operation, then at step 502, the system recorder 220 loads an address pointer to the intended address in the second block of NVRAM 224. At step 504, the system recorder 220 performs a read operation of the first logged message from the NVRAM 224 commencing with the intended address location. For a read operation, it is preferable that only the 65534 (FFFEh) and 65533 (FFFDh) addresses be recognized. The address 65534 specifies the address of the oldest valid message. The address 65533 specifies the address of the next message following the last message read from the log in NVRAM 224. The last address in the second block of the NVRAM 224 is 65279 (FEFFh). This is also the address at which the system recorder 220 performs a pointer wrap operation (see FIG. 6 for details). In doing so, the system recorder 220 redirects the address pointer to the beginning of the second block of the NVRAM 224. Hence, the address of the next message after the 65279 address is 0. To perform a read operation of the entire second block in a chronological order, the timestamp is read first. Then, the message logged at address 65534 is read second. This message constitutes the first logged message. Then, the message logged at address 65533 is read next. This message is the next logged message. Then, the message logged at address 65533 is read again to read all subsequently logged messages. The reading at address 65533 continues until the status field returns a non-zero value.
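
The chronological read sequence can be modelled with a toy Python sketch. The `LogReader` class is a hypothetical stand-in for the system recorder's side of the exchange and holds messages in a plain list rather than in NVRAM; only the two recognized addresses and the non-zero end-of-log status come from the description. The initial timestamp read is omitted.

```python
OLDEST = 0xFFFE   # 65534: address of the oldest valid message
NEXT = 0xFFFD     # 65533: message following the last one read


class LogReader:
    """Toy model of the two read addresses; messages are held oldest first."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.cursor = 0

    def read(self, address):
        """Return (status, message); a non-zero status means nothing was read."""
        if address == OLDEST:
            self.cursor = 0
        elif address != NEXT:
            return 1, None              # only the two addresses are recognized
        if self.cursor >= len(self.messages):
            return 1, None              # past the newest message
        message = self.messages[self.cursor]
        self.cursor += 1
        return 0, message


def read_all(reader):
    """Read the whole log in chronological order."""
    out = []
    status, message = reader.read(OLDEST)
    while status == 0:
        out.append(message)
        status, message = reader.read(NEXT)
    return out
```

The requesting device therefore needs no knowledge of where messages physically sit in the second block: it reads address 65534 once and then repeats address 65533 until a non-zero status is returned.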

At step 506, the system recorder 220 determines whether the address location has reached the end of the second block in the NVRAM 224. If the address location has not reached the end of the second block, then at step 508, the system recorder 220 performs a read operation of the next logged message using the addressing scheme described above. The system recorder 220 transmits all read messages to the master device via the I2C bus. If the address location has reached the end of the second block, then at step 510, the system recorder 220 issues an interrupt return command to return to its idle state 302 (shown in FIG. 3).
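
By way of illustration only, the control flow of steps 502 through 510 may be sketched as follows. The function name `service_read`, the modeling of the second block as a list holding one message per slot, the `transmit` callback standing in for transmission to the master device, and the string returned for the idle state are all assumptions introduced for the example.

```python
def service_read(block, start, transmit):
    """Follow steps 502-510 for a read command on the second block.

    block:    list of logged messages (assumption: one message per slot)
    start:    index of the intended address
    transmit: callable that sends one message to the master device
    """
    pointer = start                   # step 502: load the address pointer
    if pointer < len(block):
        transmit(block[pointer])      # step 504: read the first logged message
        pointer += 1
    while pointer < len(block):       # step 506: end of the block reached?
        transmit(block[pointer])      # step 508: read the next logged message
        pointer += 1
    return "idle"                     # step 510: interrupt return to idle state
```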

Referring now to FIG. 6, a flow chart is provided for describing the steps of performing a write operation to the second block of the NVRAM 224. Typically, a command addressed to the second block is generated by a controller (e.g., chassis controller 222) responsible for updating the occurrence of such events. The process described in FIG. 6 is followed when, at step 316 (shown in FIG. 3), the system recorder 220 determines that the interrupt command is a write command directed to the second block of the NVRAM 224.

As shown in FIG. 6, if the system recorder 220 determines that the interrupt command is a write command, then at step 602, the system recorder 220 loads an address pointer to the intended address in the second block of NVRAM 224. At step 604, the system recorder 220 determines whether a memory space is available in the second block of NVRAM 224 to perform the requested write operation. If a memory space is not available in the second block, then at step 606, the system recorder 220 performs a pointer wrap operation. In doing so, the system recorder 220 redirects the address pointer to the beginning of the second block of the NVRAM 224. The system recorder 220 erases the memory space corresponding to a single previously logged message which occupies that memory space. Additional previously logged messages are erased only if more memory space is required to perform the present write operation.
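
By way of illustration only, the pointer wrap and erase behavior described above may be sketched as follows. The `WrapLog` class, its byte-addressed model, and its bookkeeping of stored messages as (start, data) pairs are assumptions introduced for the example; the description does not specify how the recorder tracks message boundaries.

```python
from collections import deque


class WrapLog:
    """Hypothetical byte-addressed log with pointer wrap (assumption)."""

    def __init__(self, size):
        self.size = size
        self.pointer = 0          # next write address
        self.entries = deque()    # (start, data) per stored message, oldest first

    def write(self, data):
        if len(data) > self.size:
            raise ValueError("message larger than the log block")
        # Steps 604/606: no room before the end of the block, so wrap to 0.
        if self.pointer + len(data) > self.size:
            self.pointer = 0
        start, end = self.pointer, self.pointer + len(data)
        # Erase only the previously logged messages that occupy the space
        # needed for this write; all other messages are left intact.
        self.entries = deque(
            (s, d) for s, d in self.entries if s + len(d) <= start or s >= end
        )
        self.entries.append((start, data))   # step 610: write the message
        self.pointer = end
        return start
```

With a 10-byte block, three 4-byte writes land at addresses 0, 4, and 0 again; the third write wraps and erases only the first message, leaving the second in place.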

If the system recorder 220 determines that a memory space is available in the second block of the NVRAM 224, then at step 608, the system recorder 220 fetches the time from the real-time clock 221 and stamps (i.e., appends) the real time to the message being written. As noted above, the real time comprises a four-byte field (i.e., 32 bits) which is appended to the message being written. At step 610, the system recorder 220 writes the time-stamped message to the second block of the NVRAM 224. At step 612, the system recorder 220 issues an interrupt return command to return to its idle state 302 (shown in FIG. 3).
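
By way of illustration only, the time-stamping of step 608 may be sketched as follows. The function names `stamp` and `unstamp`, the big-endian byte order, and the interpretation of the real time as an unsigned count of seconds are assumptions introduced for the example; the description states only that a four-byte (32-bit) field is appended to the message.

```python
import struct


def stamp(message, seconds):
    """Append a 4-byte (32-bit) time field to a message."""
    # Assumption: big-endian unsigned integer, truncated to 32 bits.
    return message + struct.pack(">I", seconds & 0xFFFFFFFF)


def unstamp(stamped):
    """Split a stamped message back into (message, seconds)."""
    (seconds,) = struct.unpack(">I", stamped[-4:])
    return stamped[:-4], seconds
```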

Upon the occurrence of a system failure, system maintenance personnel retrieve the logged event messages in chronological order to identify and trace the events leading up to the time when the failure occurred. As a result, the system failure can be diagnosed and repaired with minimal downtime.

In view of the foregoing, it will be appreciated that the invention overcomes the longstanding need for logging and recording the occurrence of information system events leading up to the occurrence of system failures without the disadvantages of having complex failure diagnosis. The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive and the scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A method of recording an event occurring in an information processing system having a computer bus and a system recorder, said method comprising: accessing the system recorder via the computer bus; transmitting a message to the system recorder in response to the event, wherein said message comprises a plurality of bits representing at least a slave address, a least significant bit, a most significant bit, a data type, a command ID, and a status field; and storing the message in a memory unit.
2. The method as defined in claim 1, wherein the act of accessing the system recorder includes the act of accessing a real-time clock for time-stamping said message.
3. The method as defined in claim 1, wherein the act of storing the message includes the act of storing the message in a non-volatile random access memory (NVRAM) unit having an exclusive power source.
4. The method as defined in claim 1, wherein the act of storing in the memory unit includes the act of storing in a plurality of memory blocks, wherein one block of the plurality of memory blocks stores a device identification code.
5. The method as defined in claim 4, wherein another block of the plurality of memory blocks stores an event message.
6. The method as defined in claim 1, wherein the act of accessing the system recorder includes the act of supporting a read operation from the memory unit.
7. The method as defined in claim 1, wherein the act of accessing via the computer bus includes the act of accessing via an inter-integrated-circuit (I2C) bus.
8. In an information processing system having a plurality of components and experiencing first-type and second-type events, a method of storing the events comprising: receiving a signal indicative of at least one of the first-type and the second-type events; storing information relating to the first-type event associated with one of the plurality of components in a first block of a memory; and storing information relating to the second-type event associated with one of the plurality of components in a second block of the memory in a wrap around sequence, wherein said signal comprises at least a message of a failure in one of the plurality of components, and wherein the first and the second blocks of the memory are configured to store the information without restriction of time.
9. The method as defined in claim 8, wherein the act of storing the information relating to the first-type event includes the act of storing at least an identification number.
10. The method as defined in claim 8, wherein the act of storing the information relating to the first-type event includes the act of storing an event message relating to a change in the presence of one of the plurality of components.
11. A program storage device storing instructions that when executed by a computer perform a method of recording an event occurring in an information processing system having a computer bus and a system recorder, said method comprising: accessing the system recorder via the computer bus; transmitting a message to the system recorder in response to the event, wherein said message comprises a plurality of bits representing at least a slave address, a least significant bit, a most significant bit, a data type, a command ID, and a status field; and storing the message in a memory unit.
12. The device as defined in claim 11, wherein the method further comprises time-stamping said message for storage.
13. The device as defined in claim 11, wherein storing the message includes storing the message in a non-volatile random access memory (NVRAM) having an exclusive source.
14. The device as defined in claim 11, wherein storing the message includes storing the message in a plurality of memory blocks, and wherein one block of the plurality of memory blocks stores a device identification code.
15. The device as defined in claim 14, wherein another block of the plurality of memory blocks stores an event message.
16. The device as defined in claim 11, further comprising performing a read operation from the memory unit.
17. The device as defined in claim 11, wherein accessing the system recorder via the computer bus includes accessing the system recorder via an inter-integrated circuit (I2C) bus.
18. A program storage device storing instructions that when executed by a computer perform, in an information processing system having a plurality of components and experiencing first-type and second-type events, a method of storing the events comprising: receiving a signal indicative of at least one of the first-type and the second-type events; storing information relating to the first-type event associated with one of the plurality of components in a first block of a memory; and storing information relating to the second-type event associated with one of the plurality of components in a second block of the memory in a wrap around sequence, wherein the signal comprises at least a message of a failure in one of the plurality of components, and wherein the first and the second blocks of the memory are configured to store the information without restriction of time.
19. The device as defined in claim 18, wherein storing information relating to a first-type event includes storing information relating to at least an identification number.
20. The device as defined in claim 18, wherein storing information relating to the first-type event includes storing information that relates to a change in the presence of one of the plurality of components.
21. The device as defined in claim 18, wherein storing information relating to the second-type event includes storing information that relates to a failure occurring in one of the plurality of components.
22. The device as defined in claim 18, further comprising allocating the size of the first block to be equal to the size of the second block.
23. A method of recording information relating to an event experienced by a component in an information system having a processor and a memory unit, said method comprising: receiving a command including data from the processor, wherein the command comprises a plurality of bits representing at least a slave address, a least significant bit, a most significant bit, a data type, a command ID, and a status field; determining the data type of the command; and performing at least one of a read and write operation in the memory unit in response to the act of determining the data type of the command.
24. The method as defined in claim 23, wherein the act of receiving a command includes the act of receiving an interrupt command.
25. The method as defined in claim 23, wherein the act of receiving the command includes the act of determining if the command has been received.
26. The method as defined in claim 25, further comprising the act of determining if a reset command has been received.
27. The method as defined in claim 26, further comprising the act of clearing the memory unit if the reset command is received.
28. The method as defined in claim 26, further comprising the act of saving a real time if the reset command is not received.
29. The method as defined in claim 25, further comprising the act of determining if the command is intended for a first block of the memory unit.
30. The method as defined in claim 29, further comprising the act of determining if the command is a read command.
31. The method as defined in claim 30, further comprising the act of performing a read operation from the memory unit, if the command is a read command.
32. The method as defined in claim 30, further comprising the act of performing a write operation to the memory unit, if the command is not a read command.
33. The method as defined in claim 29, further comprising the act of determining if the command is a read command, if the command is not intended for the first block of the memory unit.
34. The method as defined in claim 33, further comprising the act of performing a read operation from the memory unit, if the command is a read command.
35. The method as defined in claim 33, further comprising the act of performing a write operation to the memory unit, if the command is not a read command.
36. The method as defined in claim 23, further comprising the act of returning to an idle state after performing a read operation.
37. The method as defined in claim 25, further comprising the act of returning to an idle state after performing a write operation.
38. A method of recording an event occurring in an information processing system having a computer bus, said method comprising: monitoring the occurrence of the event; generating a message in response to the event; accessing a system recorder via the computer bus; transmitting the message to the system recorder, wherein said message comprises a plurality of bits representing at least a slave address, a least significant bit, a most significant bit, a data type, a command ID, and a status field; time-stamping the message using a substantially real-time clock; and storing the message in a memory unit.
39. The method as defined in claim 38, wherein the act of monitoring the occurrence of the event includes the act of monitoring the occurrence of a failure in the information processing system.
40. In an information processing system having a plurality of components and experiencing an event, a method of storing the event comprising: storing information relating to the presence of one of the plurality of components into a first memory block; and storing information relating to the failure of one of the plurality of components into a second memory block in a wrap around sequence, wherein the first and second memory blocks are configured to store the information without restriction of time.

RELATED APPLICATIONS

The subject matter of U.S. Patent Application entitled BLACK BOX RECORDER FOR INFORMATION SYSTEM EVENTS, filed on Oct. 1, 1997, application Ser. No. 08/942,381, is related to this application. The benefit under 35 U.S.C. §119(e) of the following U.S. provisional application(s) is hereby claimed:

Provisional Applications (3)

Number	Date	Country
60/046397	May 1997	US
60/047016	May 1997	US
60/046416	May 1997	US
