1. Technical Field
The present invention relates to computer networks in general, and more particularly, to a method for providing notifications of a failing node to other nodes within a computer network.
2. Description of Related Art
High-availability computer networks typically include multiple interconnected nodes (or computer systems). Since the processing load of a computer network may be distributed across multiple nodes, the nodes within a high-availability computer network are becoming increasingly interdependent. If one node within a computer network experiences a failure, the problem can impair the performance of other nodes within the computer network.
In a conventional high-availability computer network, a failing node is aware of its own failure and can send a failure notification to service personnel when a problem occurs. However, a node that depends on the failing node will continue to operate normally (i.e., without any knowledge of the failure) until it attempts to contact the failing node. Upon learning of the failure, the dependent node must handle the unexpected failure reactively. Furthermore, the dependent node typically cannot determine the details of a failure occurring on another node. Thus, considerable time and resources may be spent determining the cause, severity, and potential corrective actions for the failing node.
Consequently, it would be desirable to provide an improved method for supplying notifications of a failing node to other nodes within a computer network.
In accordance with a preferred embodiment of the present invention, a first node monitors data traffic within a computer network. If the data traffic includes data exchanged between the first node and a second node, the first node adds the second node to a list of interested nodes stored within the first node. If the first node experiences an error, the first node generates an error notification packet that includes a hop limit value corresponding to a pre-defined number of levels of nodes within the computer network through which the error notification packet may propagate. The first node sends the error notification packet with the hop limit value to the second node and to the other nodes within its list of interested nodes. After receiving the error notification packet, the second node decrements the hop limit value, performs one or more actions, and, if the hop limit value is greater than zero, also forwards the error notification packet to each node within its own list of interested nodes.
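The sending side of this summary can be sketched as follows. This is a minimal, non-authoritative illustration, assuming that traffic is observed as (source, destination) pairs and that a packet carries an origin name, an error identifier, and the hop limit value; the class names `ErrorNotificationPacket` and `Node`, the method names, and the `error_id` and `details` fields are all hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ErrorNotificationPacket:
    origin: str        # node that experienced the error
    error_id: int      # hypothetical identifier distinguishing distinct errors
    hop_limit: int     # number of further levels the packet may propagate
    details: str = ""  # hypothetical free-form error description


@dataclass
class Node:
    name: str
    hop_limit_counter: int = 1  # pre-defined propagation depth for this node
    interested_nodes: list[str] = field(default_factory=list)

    def observe_traffic(self, src: str, dst: str) -> None:
        """Add the peer to the interested-nodes list if this node took part in the exchange."""
        if src == self.name and dst != self.name:
            peer = dst
        elif dst == self.name and src != self.name:
            peer = src
        else:
            return  # traffic between two other nodes is ignored
        if peer not in self.interested_nodes:
            self.interested_nodes.append(peer)

    def make_error_notification(
        self, error_id: int, details: str
    ) -> list[tuple[str, ErrorNotificationPacket]]:
        """Build one (destination, packet) pair per interested node, each carrying this node's hop limit."""
        packet = ErrorNotificationPacket(self.name, error_id, self.hop_limit_counter, details)
        return [(peer, packet) for peer in self.interested_nodes]


# Usage: node A exchanges data with B and C, then experiences an error.
a = Node("A", hop_limit_counter=1)
a.observe_traffic("A", "B")
a.observe_traffic("C", "A")
a.observe_traffic("B", "C")  # ignored: A is not involved
a.observe_traffic("A", "B")  # duplicate peer: not added twice
outgoing = a.make_error_notification(7, "disk failure")
print([dst for dst, _ in outgoing])  # ['B', 'C']
```

Keeping the list ordered and free of duplicates means each interested node receives exactly one packet from the failing node, regardless of how often the two nodes exchanged data.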
All features and advantages of the present invention will become apparent in the following detailed written description.
The invention itself, as well as a preferred mode of use, further objects, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
With reference now to the drawings, and in particular to
For the present embodiment, each of nodes 105A-105G within computer network 100 is similarly configured and includes a processor, a memory, and an input/output (I/O) interface. For example, node 105A includes a processor 110A, which is coupled to a memory 115A and an I/O interface 120A. I/O interface 120A enables node 105A to communicate with one or more other nodes, such as node 105B and node 105C, within computer network 100.
With reference now to
With the present invention, a node that experiences an error sends an error notification packet to one or more interested nodes, each of which may in turn send its own error notification packet to the nodes on its own list of interested nodes. A hop limit counter, such as hop limit counter 205, contains a pre-defined value that determines how far an error notification packet will propagate within a computer network, and each error notification packet carries the value from the hop limit counter of the node that sends it.
For example, if node 105A experiences an error, node 105A will send an error notification packet to other nodes. Since interested nodes list 200 includes nodes B, C, E, and N, node 105A will send an error notification packet to nodes B, C, E, and N, each of which will in turn send its own error notification packet to other nodes according to its respective interested nodes list. Since the value within hop limit counter 205 is 1, the error notification packet can propagate only one additional level of nodes, so each of nodes B, C, E, and N forwards its own error notification packet only to the nodes on its interested nodes list.
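The example above can be traced with a small breadth-first simulation. This is a sketch under stated assumptions, not the described implementation: the interested nodes lists of nodes B, C, E, N and beyond are hypothetical (the specification gives only node A's list), a node receiving a packet with hop limit h is assumed to forward it with hop limit h - 1, and duplicate receipts are dropped. The function name `propagate` is likewise illustrative.

```python
from collections import deque


def propagate(interested: dict[str, list[str]], origin: str, hop_limit: int) -> dict[str, int]:
    """Return the level (1 = direct recipient) at which each node first receives the notification."""
    level: dict[str, int] = {}
    seen = {origin}  # the originating node does not process its own notification
    # Each queue entry is (recipient, hop limit carried by the packet, level of recipient).
    queue = deque((peer, hop_limit, 1) for peer in interested.get(origin, []))
    while queue:
        node, hops, depth = queue.popleft()
        if node in seen:
            continue  # duplicate receipt: processing ends here
        seen.add(node)
        level[node] = depth
        if hops > 0:  # forward only while the received hop limit is positive
            for peer in interested.get(node, []):
                queue.append((peer, hops - 1, depth + 1))
    return level


# Hypothetical topology: only A's list (B, C, E, N) is taken from the example.
interested = {
    "A": ["B", "C", "E", "N"],
    "B": ["A", "D"],
    "C": ["A", "F"],
    "E": ["G"],
    "N": ["A", "H"],
    "D": ["I"],  # D is reached at level 2, so I should never be notified
}
levels = propagate(interested, "A", hop_limit=1)
print(sorted(n for n, d in levels.items() if d == 1))  # ['B', 'C', 'E', 'N']
print(sorted(n for n, d in levels.items() if d == 2))  # ['D', 'F', 'G', 'H']
```

With a hop limit of 1, the notification reaches A's interested nodes plus exactly one further level; node I, which sits on the list of a second-level node, is never notified.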
Referring now to
Referring now to
Referring now to
Referring now to
Next, a determination is made whether or not the node that received the error notification packet has previously received the error notification packet, as shown in block 332. If the node has previously received the error notification packet, the process terminates at block 345. Otherwise, another determination is made whether or not the hop limit value included in the error notification packet is greater than 0, as shown in block 335. If the hop limit value is not greater than 0, the node that received the error notification packet will not forward the error notification packet, and the process terminates at block 345. Otherwise, if the hop limit value is greater than 0, the node that received the error notification packet forwards the error notification packet to each node on its corresponding list of interested nodes, as depicted in block 340, and the process returns to block 330. As mentioned above, the maximum number of times an error notification packet can be forwarded is dictated by the hop limit value in the first error notification packet.
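The receive-side decisions in blocks 332, 335, and 340 can be sketched as a single handler. This is a minimal illustration, assuming that a packet is identified by its origin and a hypothetical `error_id`, that the forwarded copy carries the received hop limit value minus one, and that the node's "one or more actions" (not detailed in this section) are represented by a placeholder log entry recorded after the duplicate check. The names `ErrorNotificationPacket`, `ReceivingNode`, and `on_receive` are hypothetical.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ErrorNotificationPacket:
    origin: str     # node that experienced the error
    error_id: int   # hypothetical identifier used for duplicate detection
    hop_limit: int  # hop limit value carried by this packet


@dataclass
class ReceivingNode:
    name: str
    interested_nodes: list[str]
    seen: set[tuple[str, int]] = field(default_factory=set)
    actions: list[str] = field(default_factory=list)

    def on_receive(self, packet: ErrorNotificationPacket) -> list[tuple[str, ErrorNotificationPacket]]:
        """Return the (destination, packet) pairs this node forwards."""
        key = (packet.origin, packet.error_id)
        # Block 332: a previously received packet ends processing (block 345).
        if key in self.seen:
            return []
        self.seen.add(key)
        # Placeholder for the node's actions on receipt (assumed, not specified here).
        self.actions.append(f"handle error {packet.error_id} from {packet.origin}")
        # Block 335: a hop limit value not greater than 0 ends processing (block 345).
        if packet.hop_limit <= 0:
            return []
        # Block 340: forward a copy with a decremented hop limit to every interested node.
        forwarded = replace(packet, hop_limit=packet.hop_limit - 1)
        return [(peer, forwarded) for peer in self.interested_nodes]


b = ReceivingNode("B", ["A", "D"])
pkt = ErrorNotificationPacket("A", 7, 1)
out = b.on_receive(pkt)
print([(dst, p.hop_limit) for dst, p in out])  # [('A', 0), ('D', 0)]
print(b.on_receive(pkt))                       # [] (duplicate, block 332)
d = ReceivingNode("D", ["I"])
print(d.on_receive(out[1][1]))                 # [] (hop limit 0, block 335)
```

Checking for duplicates before the hop limit keeps a node from acting on, or re-forwarding, the same notification when it arrives along more than one path.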
As has been described, the present invention provides an improved method for providing notifications of a failing node to other nodes within a computer network.
While an illustrative embodiment of the present invention has been described in the context of a fully functional storage system, those skilled in the art will appreciate that the software aspects of an illustrative embodiment of the present invention are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the present invention applies equally regardless of the particular type of media used to actually carry out the distribution. Examples of the types of media include recordable type media such as thumb drives, floppy disks, hard drives, CD ROMs, DVDs, and transmission type media such as digital and analog communication links.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.