A distributed system typically includes a number of interconnected nodes. Each node typically includes a processor and memory (e.g., Random Access Memory (RAM)), as well as the hardware and software necessary to communicate with other nodes in the distributed system. The interconnected nodes may also communicate with each other using an overlay network. Nodes belonging to the overlay network route messages to one another over the underlying network infrastructure (e.g., Internet Protocol (IP), Transmission Control Protocol (TCP), etc.). Whereas the underlying network infrastructure has the information and capability to route messages directly between specific computers, overlay networks typically maintain only partial routing information and rely on successive forwarding through intermediate nodes to deliver a message to its final intended destination.
One common use for overlay networks is to build distributed hash tables (DHTs). In one implementation, each node in the overlay network is associated with a Globally Unique Identifier (GUID) and stores a part of the DHT. When a node (i.e., the requesting node) requires a piece of data stored on another node (i.e., the target node) in the overlay network, the requesting node first determines the GUID associated with the target node, which contains the requested data. The requesting node then searches its routing table entries (i.e., its DHT entries) for the node (i.e., an intermediate node) whose GUID is closest to the target node's GUID, and forwards the request to that intermediate node. The intermediate node follows the same process, comparing the target node's GUID with its own routing table entries. This process repeats until the request reaches the target node. Typically, the overlay network maintains enough information in the DHT to determine the appropriate intermediate node at each hop.
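The hop-by-hop forwarding described above can be sketched as a greedy search. This is a minimal illustration only, assuming integer GUIDs and numeric distance (absolute difference) as the measure of "closest"; real DHTs define distance differently (e.g., XOR or ring distance), and the routing tables here are hypothetical.

```python
from typing import Dict, List


def closest(entries: List[int], target: int) -> int:
    """Return the routing-table entry whose GUID is numerically closest to target."""
    return min(entries, key=lambda guid: abs(guid - target))


def route(tables: Dict[int, List[int]], start: int, target: int) -> List[int]:
    """Forward a request hop by hop toward target; return the GUIDs visited."""
    path = [start]
    current = start
    while current != target:
        nxt = closest(tables[current], target)
        # If no entry is closer than the current node, no further progress is possible.
        if abs(nxt - target) >= abs(current - target):
            raise LookupError(f"routing stalled at node {current}")
        path.append(nxt)
        current = nxt
    return path


# Hypothetical overlay: each node knows only a few other GUIDs (partial routing information).
tables = {
    10: [20, 40],
    20: [10, 40, 55],
    40: [20, 55, 70],
    55: [40, 70],
    70: [55, 40],
}
print(route(tables, 10, 70))  # [10, 40, 70]
```

Each node consults only its own partial table, which is why the request passes through node 40 rather than reaching node 70 directly.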
To store data in the aforementioned overlay network, the data is loaded onto a particular node (i.e., a target node) in the overlay network and associated with a GUID. The target node then publishes the fact that it holds data with that GUID. Another node in the network (i.e., the root node) stores the necessary information in its portion of the DHT to indicate that the data associated with the GUID is stored on the target node. Note that any given node in the overlay network may operate both as a target node (i.e., it stores data) and as a root node (i.e., it maintains part of the DHT). Typically, a given root node is responsible only for a certain range of GUIDs.
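Publishing and locating data through range-owning root nodes can be illustrated as follows. This is a simplified sketch, assuming integer GUIDs and that each root node owns a contiguous range beginning at a known start value; `RootDirectory`, `publish`, and `locate` are hypothetical names, and the whole directory is held in one process for clarity.

```python
from bisect import bisect_right
from typing import Dict, List


class RootDirectory:
    """Hypothetical sketch: root nodes each own a contiguous range of GUIDs."""

    def __init__(self, range_starts: List[int]) -> None:
        # range_starts[i] is the first GUID owned by root node i (sorted ascending).
        self.range_starts = sorted(range_starts)
        # One DHT fragment per root node: GUID -> node that stores the data.
        self.tables: List[Dict[int, str]] = [{} for _ in self.range_starts]

    def root_for(self, guid: int) -> int:
        """Index of the root node whose range contains guid."""
        idx = bisect_right(self.range_starts, guid) - 1
        if idx < 0:
            raise KeyError(f"no root node owns GUID {guid}")
        return idx

    def publish(self, guid: int, storing_node: str) -> None:
        """Called by the node that stores the data to announce where it lives."""
        self.tables[self.root_for(guid)][guid] = storing_node

    def locate(self, guid: int) -> str:
        """Ask the responsible root node which node stores the data."""
        return self.tables[self.root_for(guid)][guid]


directory = RootDirectory([0, 100, 200])
directory.publish(150, "node-B")
print(directory.root_for(150), directory.locate(150))  # 1 node-B
```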
In general, in one aspect, the invention relates to a method for monitoring a target node in a distributed system, comprising determining a plurality of neighbor nodes of the target node, determining a plurality of neighbor watch nodes, wherein the plurality of neighbor watch nodes are selected from the plurality of neighbor nodes, monitoring at least one selected from the group consisting of data sent by the target node and data received by the target node, using at least one of the plurality of neighbor watch nodes to obtain tracking information, and determining, using at least one of the plurality of neighbor watch nodes, an action to perform using the tracking information and a response policy, wherein the action is specified in the response policy, wherein the distributed system implements an overlay network for message delivery.
In general, in one aspect, the invention relates to a distributed system, comprising a target node and a plurality of neighbor watch nodes configured to monitor at least one selected from the group consisting of data sent by the target node and data received by the target node, using at least one of the plurality of neighbor watch nodes to obtain tracking information, and determine, using at least one of the plurality of neighbor watch nodes, an action to perform using the tracking information and a response policy, wherein the action is specified in the response policy, wherein each of the plurality of neighbor watch nodes is a neighbor of the target node, and wherein the distributed system implements an overlay network for message delivery.
In general, in one aspect, the invention relates to a computer readable medium comprising software instructions for monitoring a target node in a distributed system, comprising software instructions to determine a plurality of neighbor nodes of the target node, determine a plurality of neighbor watch nodes, wherein the plurality of neighbor watch nodes are selected from the plurality of neighbor nodes, monitor at least one selected from the group consisting of data sent by the target node and data received by the target node, using at least one of the plurality of neighbor watch nodes to obtain tracking information, and determine, using at least one of the plurality of neighbor watch nodes, an action to perform using the tracking information and a response policy, wherein the action is specified in the response policy, wherein the distributed system implements an overlay network for message delivery.
Other aspects of the invention will be apparent from the following description and the appended claims.
Exemplary embodiments of the invention will be described with reference to the accompanying drawings. Like items in the drawings are shown with the same reference numbers.
In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid obscuring the invention.
In general, embodiments of the invention relate to a method and system for monitoring a target node in a distributed system. More specifically, one or more neighboring nodes of the target node monitor the target node and determine whether the target node is operating within specified operating parameters. In one or more embodiments of the invention, if the target node is operating outside the specified operating parameters, one or more of the neighboring nodes that are monitoring the target node may take an appropriate action based on a response policy. In one embodiment of the invention, the appropriate action may include disconnecting the target node from the overlay network (e.g., a DHT-based overlay network), sending the target node a warning of potential disconnection, ceasing to use results generated by the target node, etc.
Once the neighbor node(s) have been determined, a subsequent determination is made about which of the neighbor node(s) will be designated to monitor the target node (i.e., neighbor watch nodes) (ST104). In one embodiment of the invention, the aforementioned determination is made using a pre-set policy. For example, the pre-set policy may select the neighbor watch nodes based on the GUIDs of the neighbor node(s). Those skilled in the art will appreciate that any pre-set policy for determining neighbor watch node(s) may be used.
Those skilled in the art will appreciate that in one or more embodiments of the invention, all the neighbor node(s) of a given target node may be neighbor watch nodes. Those skilled in the art will also appreciate that a given target node may have only one neighbor node, in which case that single neighbor node is designated as the neighbor watch node.
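One possible GUID-based pre-set policy for ST104 is sketched below. The specific rule (choose the k neighbors whose GUIDs are numerically closest to the target's GUID, breaking ties by the smaller GUID) is a hypothetical example, not the policy defined by the source; it does reproduce the two cases noted above, since a target with k or fewer neighbors has all of them designated, including the single-neighbor case.

```python
from typing import List


def select_watch_nodes(target_guid: int, neighbor_guids: List[int], k: int) -> List[int]:
    """Hypothetical pre-set policy: the k neighbors whose GUIDs are closest to the target's.

    If the target has k or fewer neighbors, every neighbor becomes a watch node.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank by distance to the target GUID; break ties by the smaller GUID.
    ranked = sorted(neighbor_guids, key=lambda g: (abs(g - target_guid), g))
    return ranked[:k]


print(select_watch_nodes(50, [10, 45, 58, 90, 52], k=3))  # [52, 45, 58]
print(select_watch_nodes(50, [77], k=3))                  # [77]
```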
Once the neighbor watch nodes have been determined, a tracking policy is specified (ST106). In one embodiment of the invention, the tracking policy specifies the types of data to track (i.e., monitor). More specifically, the neighbor watch nodes include functionality to monitor data received by the target node and data sent from the target node, and the tracking policy determines which specific data to monitor. In some embodiments, all the data is monitored. In other embodiments, only specific pieces of data are monitored. The tracking policy also specifies how to extract (and/or aggregate) relevant information from the monitored data. For example, the tracking policy may include one or more algorithms used to determine the average response time of the target node.
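A tracking policy can be modeled as two parts: a filter that selects which monitored data to track and an aggregation that turns the selected data into tracking information. The sketch below assumes a hypothetical `Message` record with a kind, a direction, and a measured response time; the message kinds and field names are illustrative only. The aggregation computes the average response time mentioned above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Message:
    kind: str                 # hypothetical message type, e.g. "lookup" or "store"
    direction: str            # "to_target" or "from_target"
    response_ms: float = 0.0  # measured latency, for responses sent by the target


@dataclass
class TrackingPolicy:
    """Hypothetical tracking policy: which data to track and how to aggregate it."""
    select: Callable[[Message], bool]
    aggregate: Callable[[List[Message]], Dict[str, float]]

    def apply(self, observed: List[Message]) -> Dict[str, float]:
        return self.aggregate([m for m in observed if self.select(m)])


def avg_response_time(msgs: List[Message]) -> Dict[str, float]:
    return {"avg_response_ms": mean(m.response_ms for m in msgs) if msgs else 0.0}


# Track only the target's responses to lookup requests.
policy = TrackingPolicy(
    select=lambda m: m.kind == "lookup" and m.direction == "from_target",
    aggregate=avg_response_time,
)
observed = [
    Message("lookup", "from_target", 40.0),
    Message("lookup", "from_target", 60.0),
    Message("store", "from_target", 500.0),  # filtered out: not a lookup
    Message("lookup", "to_target"),          # filtered out: not sent by the target
]
print(policy.apply(observed))  # {'avg_response_ms': 50.0}
```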
In addition to the tracking policy, a response policy is also specified (ST108). The response policy may include, but is not limited to, one or more specified operating parameters for the target node, a voting policy, and one or more actions to be taken if the target node is operating outside the specified operating parameter(s). In one embodiment of the invention, the specified operating parameter(s) may include any operating parameter of interest. For example, the specified operating parameter(s) may indicate the minimum level of reliability of the target node (e.g., the percentage of time that the target node responds in a timely manner), the minimum level of availability of the target node (e.g., the percentage of time the target node is online), and the minimum level of validity of data generated by the target node (e.g., the percentage of time that the target node provides valid data). The above examples are for illustration purposes only and are not intended to limit the scope of the invention.
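The three example operating parameters can be expressed as minimum fractions and compared against tracking information. This is an illustrative sketch: the metric names (`reliability`, `availability`, `validity`) and the choice to treat a missing metric as 0 (and therefore a violation) are assumptions, not requirements of the source.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OperatingParameters:
    """Hypothetical minimum levels, each expressed as a fraction in [0, 1]."""
    min_reliability: float   # share of requests answered in a timely manner
    min_availability: float  # share of time the target is online
    min_validity: float      # share of responses judged valid

    def violations(self, tracking: Dict[str, float]) -> List[str]:
        """Names of parameters the tracking information falls below (missing counts as 0)."""
        limits = {
            "reliability": self.min_reliability,
            "availability": self.min_availability,
            "validity": self.min_validity,
        }
        return [name for name, floor in limits.items() if tracking.get(name, 0.0) < floor]


params = OperatingParameters(min_reliability=0.95, min_availability=0.99, min_validity=0.98)
observed = {"reliability": 0.97, "availability": 0.90, "validity": 0.99}
print(params.violations(observed))  # ['availability']
```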
In one embodiment of the invention, the voting policy specifies how the neighbor watch nodes collectively determine whether to take action (and, in some cases, what action to take) when the target node is operating outside the specified operating parameters. In one embodiment of the invention, the voting policy specifies the minimum number of neighbor watch nodes that must agree to take action on the target node. Further, in one embodiment of the invention, the voting policy may specify a minimum number of neighbor watch nodes required to reach a quorum (i.e., the number of neighbor watch nodes that must participate in the vote) and the percentage of the quorum that must vote in favor of taking action. Those skilled in the art will appreciate that other policies governing voting by the neighbor watch nodes may be included in the voting policy. In one embodiment of the invention, the voting policy prevents a single neighbor watch node from unilaterally taking an action on the target node (e.g., disconnecting the target node from the distributed network).
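A voting policy combining a quorum, a minimum number of agreeing nodes, and a required percentage might look like the sketch below. It assumes the percentage is measured against the votes actually cast; setting `min_agree` to 2 or more is one way to prevent a single neighbor watch node from acting unilaterally. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VotingPolicy:
    min_agree: int          # minimum number of watch nodes that must vote for action (>= 2 blocks unilateral action)
    quorum: int             # minimum number of watch nodes that must participate
    fraction_needed: float  # share of cast votes that must favor action

    def decide(self, ballots: Dict[str, bool]) -> bool:
        """ballots maps watch-node id -> True (take action) or False (do not)."""
        cast = len(ballots)
        in_favor = sum(ballots.values())
        if cast < self.quorum:
            return False  # no quorum, so no action
        return in_favor >= self.min_agree and in_favor / cast >= self.fraction_needed


policy = VotingPolicy(min_agree=2, quorum=3, fraction_needed=0.6)
print(policy.decide({"A": True}))                        # False: no quorum
print(policy.decide({"A": True, "B": True, "C": False}))  # True: 2 of 3 in favor
```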
In one embodiment of the invention, the actions listed in the response policy may include, but are not limited to: (1) disconnecting the target node from the distributed network (i.e., the neighbor node(s) stop communicating with the target node, thereby effectively disconnecting the target node from the network), (2) sending a warning message to the target node indicating that the target node will be disconnected within a certain time period unless it begins to operate within the specified operating parameters, and (3) ceasing to use data sent from the target node. Those skilled in the art will appreciate that ST106 and ST108 may be performed in any order and at any time prior to implementing the invention.
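The three listed actions can be combined into an escalation rule, as sketched below. The escalation itself (warn first, distrust the target's data during a grace period, then disconnect if the violation persists) is a hypothetical policy chosen for illustration; the source lists the actions but does not prescribe this ordering.

```python
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    DISCONNECT = auto()   # neighbor nodes stop communicating with the target
    WARN = auto()         # warn the target it will be disconnected after a grace period
    IGNORE_DATA = auto()  # stop using data sent from the target


def choose_action(violating: bool, warned_at: Optional[float], now: float,
                  grace_s: float) -> Optional[Action]:
    """Hypothetical escalation: warn, then disconnect if still violating after grace_s seconds."""
    if not violating:
        return None
    if warned_at is None:
        return Action.WARN
    if now - warned_at >= grace_s:
        return Action.DISCONNECT
    return Action.IGNORE_DATA  # within the grace period, do not use the target's data


print(choose_action(True, None, now=0.0, grace_s=60.0))  # Action.WARN
print(choose_action(True, 0.0, now=90.0, grace_s=60.0))  # Action.DISCONNECT
```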
At this stage the distributed system includes the necessary information (e.g., the tracking policy, the response policy, etc.) to implement one or more embodiments of the invention.
Once the communication channel between the neighbor watch nodes has been established, the neighbor watch nodes proceed to track data sent by the target node and/or data received by the target node (ST122). The tracking of data sent from the target node and/or received by the target node is performed in accordance with the tracking policy. The result of applying the tracking policy to the data received by the target node and/or data sent by the target node is referred to as tracking information. In one embodiment of the invention, the tracking information is communicated to one or more of the neighbor watch nodes via the communication channel established in ST120.
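Sharing tracking information over the communication channel can be sketched as each neighbor watch node keeping the latest report from every peer and deriving a combined view. This sketch assumes the tracking information is a pair of counters (requests seen, requests answered in time) and that each watch node observes a distinct set of requests, so summing the counters does not double-count; `WatchNode` and `share` are hypothetical names, and the channel is simulated by direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class WatchNode:
    """Hypothetical watch node that exchanges tracking counters with its peers."""
    node_id: str
    # reporter id -> (requests seen, requests answered in time)
    reports: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def record_local(self, seen: int, on_time: int) -> None:
        self.reports[self.node_id] = (seen, on_time)

    def receive(self, reporter: str, seen: int, on_time: int) -> None:
        # Keyed by reporter, so a re-sent report replaces the old one instead of adding to it.
        self.reports[reporter] = (seen, on_time)

    def reliability(self) -> Optional[float]:
        seen = sum(s for s, _ in self.reports.values())
        on_time = sum(t for _, t in self.reports.values())
        return on_time / seen if seen else None


def share(nodes: List[WatchNode]) -> None:
    """Simulate the channel: every node sends its local report to every other node."""
    for sender in nodes:
        seen, on_time = sender.reports[sender.node_id]
        for peer in nodes:
            if peer is not sender:
                peer.receive(sender.node_id, seen, on_time)


a, b = WatchNode("A"), WatchNode("B")
a.record_local(seen=10, on_time=9)
b.record_local(seen=10, on_time=7)
print(a.reliability())  # 0.9
share([a, b])
print(a.reliability(), b.reliability())  # 0.8 0.8
```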
At certain time intervals, the neighbor watch nodes determine whether the target node is operating within the specified operating parameters (ST124). Those skilled in the art will appreciate that other events, as opposed to time intervals, may trigger ST124. Continuing with the discussion of ST124, the neighbor watch nodes use the tracking information, the voting policy, and the specified operating parameters to determine whether the target node is operating outside the specified operating parameters. In one embodiment of the invention, one of the neighbor watch nodes initiates a vote to take action on the target node. The other neighbor watch nodes then vote to determine whether to take action on the target node. Each neighbor watch node uses the tracking information it generated, together with the tracking information received from the other neighbor watch nodes, to determine how to vote (i.e., to vote for or against taking action). Because different neighbor watch nodes may have different tracking information available, the vote is not necessarily unanimous.
If the neighbor watch nodes determine that the target node is operating outside the specified operating parameters, then the neighbor watch nodes may take one or more of the actions specified in the response policy (ST126). Alternatively, if the neighbor watch nodes determine that the target node is operating within the specified operating parameters, then the neighbor watch nodes continue to track data sent by the target node and/or data received by the target node (ST128).
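One evaluation round (ST124, followed by ST126 or ST128) can be sketched as follows. The assumptions are that each watch node's view of the target is summarized as a single reliability fraction and that the voting policy is reduced to a quorum plus a minimum number of agreeing nodes; because the views differ, the individual votes may differ too. Function names and return strings are hypothetical.

```python
from typing import Dict


def node_vote(view_reliability: float, min_reliability: float) -> bool:
    """A watch node votes to act if its own view of the target is below the threshold."""
    return view_reliability < min_reliability


def evaluation_round(views: Dict[str, float], min_reliability: float,
                     quorum: int, min_agree: int) -> str:
    """One pass of ST124: collect votes, then act (ST126) or keep tracking (ST128)."""
    ballots = {node: node_vote(v, min_reliability) for node, v in views.items()}
    in_favor = sum(ballots.values())
    if len(ballots) >= quorum and in_favor >= min_agree:
        return "take_action"      # ST126: apply an action from the response policy
    return "continue_tracking"    # ST128: keep monitoring the target


# Views differ because each watch node saw different traffic, so the vote is split 2-1.
print(evaluation_round({"A": 0.80, "B": 0.92, "C": 0.97}, 0.95, quorum=3, min_agree=2))
# take_action
```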
As discussed above, in one or more embodiments of the invention, a subset of the neighbor nodes may be designated as neighbor watch nodes. In
Once the communication channel between the neighbor nodes is established, the neighbor watch nodes may proceed to monitor/track data received by Node A (100) and/or data sent from Node A (100). In one embodiment of the invention, the neighbor watch nodes track data received by Node A (100) and/or data sent from Node A (100) by, for example, intercepting messages communicated to and/or from Node A (100) via the DHT-based overlay network.
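Intercepting messages that a neighbor relays to and from the target node is enough to measure response times and detect unanswered requests. The sketch assumes that each overlay message carries an identifier and that the target's reply reuses the request's identifier; `RelayTracker` and its hooks are hypothetical and would be called by the node's forwarding logic.

```python
from typing import Dict, List


class RelayTracker:
    """Hypothetical hook on a neighbor that relays overlay messages to and from the target."""

    def __init__(self, target: str, timeout_s: float) -> None:
        self.target = target
        self.timeout_s = timeout_s
        self.pending: Dict[str, float] = {}  # message id -> time forwarded to the target
        self.latencies: List[float] = []     # observed response times, in seconds

    def on_forward(self, msg_id: str, dest: str, now: float) -> None:
        # Only requests addressed to the target are tracked.
        if dest == self.target:
            self.pending[msg_id] = now

    def on_reply(self, msg_id: str, src: str, now: float) -> None:
        if src == self.target and msg_id in self.pending:
            self.latencies.append(now - self.pending.pop(msg_id))

    def overdue(self, now: float) -> List[str]:
        """Requests the target has not answered within timeout_s."""
        return [m for m, t in self.pending.items() if now - t > self.timeout_s]


tracker = RelayTracker("NodeA", timeout_s=2.0)
tracker.on_forward("m1", "NodeA", now=0.0)
tracker.on_forward("m2", "NodeA", now=1.0)
tracker.on_reply("m1", "NodeA", now=0.5)
print(tracker.latencies, tracker.overdue(now=3.5))  # [0.5] ['m2']
```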
In the distributed system shown in
The preceding discussion described an embodiment of the invention in which the target node was known in advance. In another embodiment of the invention, each node in the distributed system includes functionality to determine the neighbor node(s) of a target node, functionality to determine neighbor watch nodes from the neighbor node(s), functionality to implement the tracking policy, and functionality to implement the response policy. In this embodiment, the nodes (which include the aforementioned functionality) are deployed in the distributed system. At some point during the operation of the distributed system, a target node may be selected. Once the target node is selected, its neighbor watch node(s) begin to monitor the target node (and take action on the target node if it is operating outside the specified operating parameters). At a later point, the node may cease to be a target node (i.e., there is no longer a need to monitor it). Once the node is no longer a target node, its neighbor watch node(s) cease monitoring it.
Those skilled in the art will appreciate that the distributed system may include more than one target node. Further, a given node may be both a target node and a neighbor watch node. Moreover, a given node may be a neighbor node and a neighbor watch node for more than one target node.
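The dynamic designation of targets, and the fact that one node may hold several roles at once, can be captured with simple bookkeeping. This is a sketch under the assumption that a mapping from each target node to its set of neighbor watch nodes is available; `MonitoringRegistry` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Set


class MonitoringRegistry:
    """Hypothetical bookkeeping of which watch nodes monitor which targets."""

    def __init__(self) -> None:
        self.watchers: Dict[str, Set[str]] = defaultdict(set)  # target -> watch nodes

    def designate(self, target: str, watch_nodes: Set[str]) -> None:
        self.watchers[target] |= watch_nodes

    def release(self, target: str) -> None:
        # The node is no longer a target, so its watch nodes stop monitoring it.
        self.watchers.pop(target, None)

    def targets_watched_by(self, node: str) -> Set[str]:
        return {t for t, w in self.watchers.items() if node in w}


registry = MonitoringRegistry()
registry.designate("A", {"B", "C"})
registry.designate("B", {"A", "C"})  # B is both a target and a watch node for A
print(sorted(registry.targets_watched_by("C")))  # ['A', 'B']
registry.release("A")
print(sorted(registry.targets_watched_by("C")))  # ['B']
```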
An embodiment of the invention may be implemented on virtually any type of computer regardless of the platform being used. For example, as shown in
As noted above, embodiments of the invention may be used to determine the validity of data generated by a target node. In one embodiment of the invention, the validity of data generated by the target node may be determined by having two or more of the neighbor watch nodes send a request for the same piece of data. Once each of these neighbor watch nodes receives a response containing that piece of data, the neighbor watch nodes compare the responses received from the target node to determine whether the piece of data is valid. If the responses received by the individual neighbor watch nodes do not fall within a limited and/or pre-specified range of one another, then the data generated by the target node is deemed invalid.
Alternatively, neighbor nodes may temporarily hold the data sent from the target node until the neighbor watch nodes reach an agreement (using the voting policy) as to what data should be sent from the target node (i.e., whether the data sent from the target node is valid). Once an agreement is reached, the neighbor watch nodes signal the neighbor nodes (which include the neighbor watch nodes) to forward the held data to its destination, provided the held data corresponds to the data that the neighbor watch nodes agreed is valid.
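For numeric data, the comparison can be sketched as checking that all copies returned to the requesting watch nodes lie within a tolerance of one another. The tolerance-based rule and the `responses_valid` name are assumptions for illustration; non-numeric data would typically be compared for equality instead.

```python
from typing import Dict


def responses_valid(responses: Dict[str, float], tolerance: float) -> bool:
    """Hypothetical check: valid only if all watch nodes' copies agree within tolerance."""
    if len(responses) < 2:
        raise ValueError("need responses from at least two watch nodes")
    values = list(responses.values())
    return max(values) - min(values) <= tolerance


print(responses_valid({"W1": 42.0, "W2": 42.0, "W3": 42.1}, tolerance=0.5))  # True
print(responses_valid({"W1": 42.0, "W2": 97.0}, tolerance=0.5))              # False
```

A target that returns inconsistent answers to different requesters is exposed by this comparison even when no single answer looks implausible on its own.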
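The hold-and-release behavior can be sketched with content digests. The sketch assumes that each neighbor watch node votes for the SHA-256 digest of the data it considers valid and that a digest endorsed by at least `min_agree` watch nodes counts as the agreement; both the digest-based vote and the function names are hypothetical. If no agreement has been reached yet, nothing is forwarded and the data remains held.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def digest(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def agreed_digest(votes: Dict[str, str], min_agree: int) -> Optional[str]:
    """Digest endorsed by at least min_agree watch nodes, if any (hypothetical voting rule)."""
    if not votes:
        return None
    best, count = Counter(votes.values()).most_common(1)[0]
    return best if count >= min_agree else None


def release_held(held: List[Tuple[str, bytes]], votes: Dict[str, str],
                 min_agree: int) -> List[Tuple[str, bytes]]:
    """Return the held (destination, payload) pairs that match the agreed-valid data."""
    agreed = agreed_digest(votes, min_agree)
    if agreed is None:
        return []  # no agreement yet: forward nothing
    return [(dest, p) for dest, p in held if digest(p) == agreed]


held = [("X", b"v1"), ("Y", b"v2")]
votes = {"W1": digest(b"v1"), "W2": digest(b"v1"), "W3": digest(b"v2")}
print(release_held(held, votes, min_agree=2))  # [('X', b'v1')]
```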
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.