This relates to networks and, more particularly, to selecting a subset of network nodes from a given set of network nodes. It is applicable, for example, to monitoring the behavior of networks.
A data network such as the Internet comprises nodes (e.g., routers) and links that interconnect the nodes. A typical objective of such networks is to establish connections between nodes that utilize the network most effectively, which translates to the objective of choosing a best path from a given originating node of a connection to a given terminating node of the connection. One well known algorithm for choosing a path from an originating node to a terminating node is the Open Shortest Path First (OSPF) algorithm, where each link of the network has an associated cost; a path from node N1 to node N2 is said to have a cost that corresponds to the sum of the costs of the links which form the path, and the algorithm identifies a path that has the lowest cost.
There is a recognized need to know the operational state of the network (such as packet loss rate, packet delay through the routers and links, etc.) and, to that end, there is a need to measure the traffic that flows through the various links and nodes. This need exists in the data network as a whole, and also in sub-networks of the data network, such as virtual private networks within a data network.
Whether it is the entire network or a sub-network, the situation typically is the same: an administrator desires to monitor a specific set of nodes (herein referred to as branch nodes) and is able to perform this monitoring through equipment or modules that the administrator is able to install in any of a given set of network nodes (herein referred to as potential monitoring nodes). The branch nodes and the potential monitoring nodes may or may not be disjoint; that is, one or more of the potential monitoring nodes may also be branch nodes.
It would be beneficial to be able to choose a small set of nodes from among the set of potential monitoring nodes as the actual monitoring nodes.
An improvement in the art is realized with a method that, in general, identifies a subset of nodes from a given set of nodes; that subset satisfies a requirement related to disjointness of shortest paths to nodes in another given set of nodes. In connection with a network monitoring embodiment, the disclosed method, for each branch node, chooses a pair of nodes that are reached by the branch node by disjoint paths. For the set of branch nodes, the method chooses a set of monitoring nodes from the set of potential monitoring nodes such that each branch node can be monitored by at least two monitoring nodes and, moreover, that the nodes that are chosen to monitor a branch node monitor that node through paths from the branch node that are node disjoint (except for the branch node). That is, given a set B of branch nodes b, and a set of potential monitoring nodes, a subset M is chosen from among the potential monitoring nodes that ensures that each of the nodes b in B is monitored via a pair of disjoint paths, each of which terminates at one of the monitoring nodes in M. However, branch nodes that are also potential monitoring nodes may monitor themselves, and in such circumstances the branch node that is also a monitoring node does not require a pair of monitoring nodes.
Some of the potential monitoring nodes are chosen to be included in M in a first step, by identifying a subset of the branch nodes that are “t-good” nodes (defined hereinafter), and choosing for those “t-good” branch nodes a subset of the potential monitoring nodes as First Partner (FP) nodes. In a second step, another subset of the potential monitoring nodes is chosen to be included in M as Second Partner (SP) nodes for those branch nodes, thereby providing the necessary monitoring means for those branch nodes. In a third step, other nodes are chosen to be included in M, using a greedy algorithm, to handle the branch nodes that are not “t-good.” Optionally, a “minimalization” step is included to reduce the set of nodes chosen in the above-mentioned three steps.
The objective is to find a small subset, M, of nodes m from a set of K potential monitoring nodes, such that for each branch node, b, there are two distinct nodes mi and mj in M such that no node except b lies both on some shortest path from b to mi and on some shortest path from b to mj (there may be more than one shortest path to each). Such a pair {mi,mj} is said to “cover” b. It is not a requirement of this invention, but it is helpful to think of the node pairs {mi,mj} as consisting of a first monitoring partner (FP) node and a second monitoring partner (SP) node. If a branch node b is also a potential monitoring node, then the definition of “covering b” is slightly different. Such a node can be covered either by two distinct monitoring nodes, exactly as above, or by itself alone, with no other monitoring node.
A node b typically has a number of outgoing links. In trying to reach a given other node such as a potential monitoring node, the particular routing algorithm that is employed in the
To illustrate, from b1 to m1 in
In accord with the principles of this invention, a potential monitoring node m is “good” for node bj if all of the lowest cost paths from bj to m depart node bj via one and the same link.
Additionally, node bj is considered to be “t-good” if there are at least t potential monitoring nodes that are “good” for bj.
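The “good” and “t-good” tests can be sketched in code. The following is a minimal illustration, not the disclosed implementation: it assumes an undirected network with symmetric, non-negative integer link costs stored as a dictionary of dictionaries, and the names `dijkstra`, `first_hops`, `is_good` and `t_good_nodes` are hypothetical. The case where the branch node is itself the potential monitoring node is not treated here.

```python
import heapq

INF = float("inf")

def dijkstra(graph, source):
    """Lowest-cost distance from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def first_hops(graph, b, m):
    """Neighbors of b that begin at least one lowest-cost path from b to m."""
    dist_to_m = dijkstra(graph, m)  # assumption: link costs are symmetric
    if b == m or b not in dist_to_m:
        return set()
    return {x for x, w in graph[b].items()
            if x in dist_to_m and w + dist_to_m[x] == dist_to_m[b]}

def is_good(graph, b, m):
    """m is 'good' for b if every lowest-cost path leaves b on the same link."""
    return len(first_hops(graph, b, m)) == 1

def t_good_nodes(graph, branch_nodes, monitors, t):
    """Branch nodes for which at least t potential monitoring nodes are good."""
    return [b for b in branch_nodes
            if sum(is_good(graph, b, m) for m in monitors) >= t]
```

Integer costs are assumed so that the equality test on path costs is exact; with floating-point costs a tolerance would be needed.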
In many experimental runs of the method disclosed herein, t was fixed at one half the number of the potential monitoring nodes, i.e., t=K/2, and the results were quite satisfactory. In some experimental runs a value of t=1 was found to be even better.
Step 11 illustratively proceeds by creating a table in step 20 and iteratively executing a process in steps 21-23 to remove the table rows as expeditiously as possible. Specifically, step 20 creates a table with a column for each potential monitoring node (thus, there are K columns), and a row for each one of the “t-good” branch nodes that were identified in step 10. Each cell of a row for node bj illustratively has a “1” if the corresponding potential monitoring node is “good” for the branch node, and a “0” otherwise. Alternatively, each cell contains the label of the node that is reached first in the lowest-cost path from the branch node to that cell's associated potential monitoring node (or the label of the outgoing link itself). In cases where there is more than one lowest-cost path and their paths use different links incident from node bj, that exit bj to different nodes, the cell identifies each of the different reached nodes (or the outgoing links).
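The two table forms described for step 20 can be illustrated as follows. This is a sketch under stated assumptions: the input `first_hop_sets` is a hypothetical mapping from a (branch node, potential monitoring node) pair to the set of first-reached nodes over all lowest-cost paths, and `build_tables` is an illustrative name. A branch node that is also a potential monitoring node receives its own label as a dummy entry.

```python
def build_tables(first_hop_sets, branch_nodes, monitors):
    """Return (binary, labels) tables with one row per branch node
    and one column per potential monitoring node."""
    labels = {}
    binary = {}
    for b in branch_nodes:
        labels[b] = {}
        binary[b] = {}
        for m in monitors:
            if m == b:
                cell = [b]  # dummy entry: b can monitor itself
            else:
                cell = sorted(first_hop_sets.get((b, m), ()))
            labels[b][m] = cell
            # "1" only when all lowest-cost paths leave b on a single link
            binary[b][m] = 1 if len(cell) == 1 else 0
    return binary, labels
```

A cell of the label table that lists more than one node corresponds to a “0” in the binary table.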
While for computer execution purposes use of the “0” and “1” is advantageous, for expository purposes the use of the alternative is deemed clearer and, therefore, the table below employs this alternative approach.
For t chosen at
relative to the
It is not uncommon for some of the branch nodes to be also potential monitoring nodes. If such a branch node is chosen as a monitoring node, whether in the course of executing step 11, or specifically for the purpose of monitoring the node, no second monitoring node is necessary because it can monitor itself. Such nodes can, however, be covered in the usual way by two monitoring nodes. For such a branch node b, the cell corresponding to row b and column b contains a “1” in the first, binary implementation of the table, and the cell contains the dummy entry b in the second implementation of the table. In the example, if node b1 happened to also be a monitoring node m6, aside from the fact that the above table would have another column, the cell corresponding to the m6 column and the b1 row would have the entry b1.
After the table is created, control passes to step 21 where a monitoring node is chosen as an FP node. The monitoring node that is chosen is the one that hits the largest number of branch nodes, which in the context of the created table means the column with the largest number of cells that identify a single node. In the case of the table above, nodes m1 and m5 have fewer such cells than nodes m2, m3, and m4 (m1 hits 4 branch nodes and m5 hits 5 branch nodes, whereas m2, m3, and m4 each hit 6 branch nodes) so the algorithm, in this case, chooses one of the three nodes m2, m3, and m4.
In the example, node m2 is chosen at this step as an FP node (though nodes m3 or m4 could have been chosen).
Control then passes to step 22 which identifies the outgoing link of each branch node that is hit by the chosen FP node, removes the rows of the branch nodes that are hit by the chosen node, reforms the table with the remaining rows, and passes control to step 23, which determines whether there are any remaining rows. In the embodiment where the table cells contain the first-reached node, or the outgoing link, the step of identifying the outgoing link is merely recording the values in the cells.
As an aside, what this removal effectively states is that the branch node bj of a removed row will be covered by using the chosen monitoring node as the FP node and some other node, not yet chosen, as the SP node (excluding, as indicated above, branch nodes that are potential monitoring nodes and are themselves chosen as FP nodes). As defined above, a node b (that is not a chosen monitoring node) is said to be covered when there exist two distinct monitoring nodes, mi and mj, such that all lowest cost paths from node b to node mi are node-disjoint from all lowest cost paths from node b to node mj (except for b, of course).
Returning to the algorithm and the
It is noted that the outgoing links toward the FPs that are identified by step 22 for the above example are (b1,m2), (b2,b1), (b3,m2), (b4,m2), (b5,m2), and (b6,m3), for b1 through b6, respectively.
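The loop of steps 21 through 23 can be sketched as a greedy selection over the label form of the table. This is an illustration only: `choose_first_partners` is a hypothetical name, the table is assumed to map each branch node to a dictionary of column entries (each a list of first-reached nodes), and ties are broken by sorted column name, which is an assumption since any of the tied columns may be chosen.

```python
def choose_first_partners(labels):
    """Greedily pick FP columns until every row has been removed.

    Returns (chosen, assignment), where assignment[b] is the pair
    (FP node, first-reached node on the outgoing link of b).
    """
    remaining = dict(labels)
    chosen = []
    assignment = {}
    while remaining:
        columns = sorted({m for row in remaining.values() for m in row})

        def hits(m):
            # Rows whose cell under column m identifies a single node
            return [b for b, row in remaining.items() if len(row.get(m, ())) == 1]

        best = max(columns, key=lambda m: len(hits(m)))
        hit = hits(best)
        if not hit:
            break  # no column hits any remaining row (not expected for t-good rows)
        chosen.append(best)
        for b in hit:
            assignment[b] = (best, remaining[b][best][0])  # record outgoing link
            del remaining[b]
    return chosen, assignment
```

Because every row belongs to a “t-good” branch node with t of at least one, each row has at least one single-node cell, so the loop removes every row.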
Having chosen the FP nodes, the next task is to choose the SP nodes and, as indicated above, the task of choosing the SP nodes is executed in step 12, which comprises steps 30 and 31.
Step 30 creates a second-pass table by identifying, for each t-good branch node b, the potential monitoring nodes m such that all least cost paths from b to m do NOT use the outgoing link identified in step 22, for branch node b, and placing a “1” in cell (b,m), while placing a “0” in all other cells. In addition, if a t-good branch node b is also a potential monitoring node, then in cell (b,b) we place a “1”. Control then passes to step 31 which chooses a set of monitoring nodes (i.e., columns) as the SP nodes that, together, hit all of the rows in the table.
It may be noted that in constructing the table of step 30 it is required to not include those rows corresponding to branch nodes that are also FP nodes because they do not need SP nodes for their proper monitoring. They monitor themselves.
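Steps 30 and 31 can be sketched in the same style. The code below is a hedged illustration: `second_pass_table` and `choose_second_partners` are hypothetical names, `first_hop_sets` is an assumed mapping from (branch node, potential monitoring node) to the set of first-reached nodes over all lowest-cost paths, and `fp_hop` is an assumed mapping from each branch node to the first-reached node on the outgoing link recorded in step 22. The column selection is greedy with ties broken by sorted name, which is one way (an assumption) to pick columns that together hit all rows.

```python
def second_pass_table(first_hop_sets, fp_hop, monitors, fp_nodes):
    """Cell (b, m) is 1 when no lowest-cost path from b to m uses b's FP link."""
    table = {}
    for b, hop in fp_hop.items():
        if b in fp_nodes:
            continue  # b is itself an FP node and monitors itself
        row = {}
        for m in monitors:
            if m == b:
                row[m] = 1  # b is also a potential monitoring node
            else:
                hops = first_hop_sets.get((b, m), set())
                row[m] = 1 if hops and hop not in hops else 0
        table[b] = row
    return table

def choose_second_partners(table):
    """Greedily choose SP columns until every row is hit."""
    uncovered = set(table)
    chosen = []
    while uncovered:
        columns = sorted({m for b in uncovered for m in table[b]})
        best = max(columns, key=lambda m: sum(table[b][m] for b in uncovered))
        hit = {b for b in uncovered if table[b][best]}
        if not hit:
            break  # remaining rows cannot be hit by any column
        chosen.append(best)
        uncovered -= hit
    return chosen, uncovered
```

A non-empty second return value would indicate rows for which no SP node exists under the recorded FP links.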
In the illustrative example of the
What the table below indicates is that choosing node m1 hits all but one of the branch nodes, while the other potential monitoring nodes hit fewer branch nodes. Step 31 therefore chooses m1 as an SP node and removes the nodes that are hit by the choice of node m1. The only node that remains un-hit is node b2, which is hit by monitoring nodes m3 and m4; step 31 chooses one of these monitoring nodes (for example, node m3) and passes control to step 40.
Expressing the process involved in steps 10 and 11 somewhat more mathematically, one needs to determine, for each branch node bj, link e=(bj,x) leaving bj, and potential monitoring node mk, whether all lowest-cost OSPF paths from bj to mk depart bj via the link e=(bj,x). All lowest-cost paths from bj to mk depart bj via the link e=(bj,x) if and only if, for all links (bj,y) with y≠x, cost(bj,y)+dist(y,mk)>cost(bj,x)+dist(x,mk), where dist(x,mk) stands for the cost of the lowest-cost path from x to mk.
Expressing the process involved in step 12 somewhat more mathematically, one needs to determine, for each branch node bj, link e=(bj,x) leaving bj, and potential monitoring node mk, whether all lowest-cost OSPF paths from bj to mk avoid link e. All lowest-cost OSPF paths from bj to mk avoid link e=(bj,x) if and only if cost(bj,x)+dist(x, mk)>dist(bj, mk).
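The two inequality tests stated above translate directly into code. In this sketch the function names are hypothetical, `cost[b][x]` is assumed to hold the cost of link (b,x), and `dist_to_m[v]` is assumed to hold dist(v,mk) for one fixed potential monitoring node mk, computed beforehand by any shortest-path routine.

```python
INF = float("inf")

def all_paths_depart_via(cost, dist_to_m, b, x):
    """True if every lowest-cost path from b to mk leaves b on link (b, x)."""
    via_x = cost[b][x] + dist_to_m.get(x, INF)
    return all(cost[b][y] + dist_to_m.get(y, INF) > via_x
               for y in cost[b] if y != x)

def all_paths_avoid(cost, dist_to_m, b, x):
    """True if no lowest-cost path from b to mk leaves b on link (b, x)."""
    return cost[b][x] + dist_to_m.get(x, INF) > dist_to_m.get(b, INF)
```

When two outgoing links tie for the lowest cost, neither link passes the first test and neither passes the second, which matches the case of a table cell that lists more than one first-reached node.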
Defining Sjk as the set of all nodes x on some shortest bj-to-mk path, Sjk can be computed for all branch nodes bj and monitoring nodes mk, and then the pair {mk,mn} of monitoring nodes covers bj (which is not a potential monitoring node) if and only if mk and mn are two distinct potential monitoring nodes and the intersection of Sjk and Sjn is a set that contains only branch node bj. As an aside, a node x belongs to Sjk if and only if dist(bj,x)+dist(x,mk)=dist(bj, mk).
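The set Sjk and the resulting cover test can be sketched as follows. This is an illustration under assumptions: the network is given as a dictionary of dictionaries with symmetric integer link costs, all-pairs distances are obtained with the Floyd-Warshall method (a choice made here for brevity, not one stated in the text), and the names `all_pairs`, `shortest_path_nodes` and `covers` are hypothetical. The case of a branch node covering itself is not handled in this sketch.

```python
INF = float("inf")

def all_pairs(graph):
    """All-pairs lowest-cost distances (Floyd-Warshall)."""
    nodes = list(graph)
    dist = {u: {v: (0 if u == v else graph[u].get(v, INF)) for v in nodes}
            for u in nodes}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def shortest_path_nodes(dist, b, m):
    """S: every node x with dist(b,x) + dist(x,m) == dist(b,m)."""
    if dist[b][m] == INF:
        return set()  # m is unreachable from b
    return {x for x in dist if dist[b][x] + dist[x][m] == dist[b][m]}

def covers(dist, b, mk, mn):
    """True if the distinct pair {mk, mn} covers branch node b."""
    if mk == mn:
        return False
    common = shortest_path_nodes(dist, b, mk) & shortest_path_nodes(dist, b, mn)
    return common == {b}
```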
As indicated above, the method steps disclosed above do not handle the nodes that are not “t-good” (“t-bad” for short). It is the function of step 40 to handle those nodes, but if no such nodes exist then, of course, control passes to the next step, which is step 50.
There are different approaches that can be taken for handling these “t-bad” nodes in step 40. One such approach is a “greedy” algorithm where one of the remaining potential monitoring nodes (RPM nodes) is considered, and the network is analyzed to determine how many of the “t-bad” nodes can be covered by choosing that node. In the above example, the RPM nodes are nodes m4 and m5—because node m2 was chosen as a FP node and nodes m1 and m3 were chosen as SP nodes. The analysis is repeated, each time for a different chosen RPM node, until all of the RPM nodes have been considered. Then one of the RPM nodes is chosen that, together with the already chosen potential monitoring nodes (an FP, SP, or a previously chosen RPM) covers the largest number of the “t-bad” nodes. That chosen RPM node is removed from the RPM node set. Once an RPM node is chosen, step 40 determines whether any “t-bad” nodes still remain that are not covered. If so, the steps involving choosing the RPM nodes one at a time, determining how many uncovered “t-bad” nodes can be covered, and choosing an RPM node that covers the largest number of uncovered “t-bad” nodes, are repeated, until no uncovered “t-bad” nodes remain.
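The greedy handling of the “t-bad” nodes can be sketched as below. The sketch assumes a caller-supplied predicate `covers(b, mi, mj)` that reports whether the pair {mi,mj} covers b; the names `is_covered` and `greedy_cover_t_bad`, and the tie-breaking by sorted node name, are illustrative assumptions.

```python
from itertools import combinations

def is_covered(b, chosen, covers):
    """b is covered by itself if chosen, or by some pair of chosen nodes."""
    if b in chosen:
        return True
    return any(covers(b, mi, mj) for mi, mj in combinations(sorted(chosen), 2))

def greedy_cover_t_bad(t_bad, chosen, rpm, covers):
    """Add RPM nodes one at a time, each time taking the node that covers
    the most still-uncovered t-bad nodes together with those already chosen."""
    chosen = set(chosen)
    rpm = set(rpm)
    uncovered = {b for b in t_bad if not is_covered(b, chosen, covers)}
    while uncovered and rpm:
        def gain(m):
            return sum(is_covered(b, chosen | {m}, covers) for b in uncovered)

        best = max(sorted(rpm), key=gain)
        if gain(best) == 0:
            break  # no single RPM node helps
        chosen.add(best)
        rpm.remove(best)
        uncovered = {b for b in uncovered if not is_covered(b, chosen, covers)}
    return chosen, uncovered
```

If the returned set of uncovered nodes is not empty, no single remaining node adds coverage.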
It is possible that no single added RPM node will cover any additional “t-bad” node. A different chosen order might result in more complete coverage of the “t-bad” nodes, or perhaps even a complete coverage, so such a different order might be tried. Alternatively, the RPM nodes are considered in pairs.
Trying all possible pairs is guaranteed to yield coverage of the “t-bad” nodes whenever a feasible solution exists, because in a feasible solution every branch node b is covered either by some pair of distinct potential monitoring nodes or, if b is also a potential monitoring node, by b itself. Of course, the pair that is best to choose is the one that covers the largest number of remaining uncovered “t-bad” nodes.
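The pairwise variant can be sketched as a search over all pairs of remaining potential monitoring nodes. The function name `best_rpm_pair` is hypothetical, and `is_covered(b, monitor_set)` is an assumed caller-supplied predicate that reports whether b is covered by the given set of monitoring nodes.

```python
from itertools import combinations

def best_rpm_pair(uncovered, chosen, rpm, is_covered):
    """Return the pair of RPM nodes that, added to the chosen set, covers
    the most uncovered nodes, together with that count."""
    best_pair, best_gain = None, 0
    for pair in combinations(sorted(rpm), 2):
        trial = set(chosen) | set(pair)
        gain = sum(1 for b in uncovered if is_covered(b, trial))
        if gain > best_gain:
            best_pair, best_gain = pair, gain
    return best_pair, best_gain
```

A result of `(None, 0)` would mean that no pair covers any of the uncovered nodes.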
Step 50 takes into account the fact that the choices made for the FP nodes and SP nodes and RPM nodes are sufficient to cover all of the branch nodes (i.e., presenting a feasible solution), but it is not necessarily a minimal set for covering all of the branch nodes. That is, the entire set of chosen potential monitoring nodes (chosen FP nodes, SP nodes, and RPM nodes) might be reduced, and optional step 50 “minimalizes” this set. Illustratively, step 50 takes each of the monitoring nodes, temporarily removes it from the set and determines whether each of the branch nodes is still covered by a pair of monitoring nodes. If so, the temporarily removed node is removed permanently. If not, the temporarily removed node is returned to the set. This operation is performed on each of the chosen monitoring nodes.
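The optional “minimalization” of step 50 can be sketched as a single pass over the chosen nodes. The name `minimalize` is hypothetical, `is_covered(b, monitor_set)` is an assumed caller-supplied predicate, and visiting the nodes in sorted order is an assumption, since no order is specified.

```python
def minimalize(chosen, branch_nodes, is_covered):
    """Drop each chosen monitoring node whose removal leaves every
    branch node still covered."""
    kept = set(chosen)
    for m in sorted(chosen):
        trial = kept - {m}  # temporarily remove m
        if all(is_covered(b, trial) for b in branch_nodes):
            kept = trial  # removal becomes permanent
    return kept
```

The result is minimal in the sense that no single remaining node can be removed, which need not be the smallest possible set; a different visiting order can yield a different result.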
It may be noted that if the considered network is such that at least one feasible solution exists, then the method disclosed herein will find one such feasible solution; and experimental results indicate that the method disclosed herein yields a feasible solution that is quite close to optimum, and within a reasonable processing time.