The elements of the network matrix MN are the costs of links (link costs). Each link has a root end that is a node included in the network N and a tail end that is another node directly connected to the root end node. In this example, the cost (link cost) of a link AB that has node A as the root end node and node B as the tail end node is four, the cost of a link AC that has node A as the root end node and node C as the tail end node is three, the cost of a link BC that has node B as the root end node and node C as the tail end node is five, the cost of a link BD that has node B as the root end node and node D as the tail end node is three, and the cost of a link CD that has node C as the root end node and node D as the tail end node is three. Since the other elements in the network matrix MN correspond to pairs of nodes that are not directly connected, the link costs of such elements are set at infinity in this example to exclude such elements (link costs) from searches. Depending on the system used for the search, the value indicating that elements are to be excluded from the search is not limited to infinity.
An example of searching for shortest paths from node A as a starting node to other nodes will now be described. Since the links BA, CA, and DA that have node A as the tail end node are to be excluded from the search, the costs of such links are set in advance at infinity. In addition, since the links AA, BB, CC, and DD do not exist, such link costs are also set at infinity.
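For illustration only, the network matrix MN prepared for this search from node A might be written in code as follows (a sketch; the node order A, B, C, D and the use of math.inf as the value that excludes an element from the search are assumptions made for the sketch, and only the five link costs listed above are taken from the example):

```python
import math

INF = math.inf   # value indicating that an element is excluded from the search

# Rows are root end nodes and columns are tail end nodes, in the assumed order A, B, C, D.
# The diagonal (AA, BB, CC, DD), the links BA, CA, and DA into the starting node A,
# and all elements not listed in the example are set at infinity.
#        A    B    C    D
MN = [
    [INF,   4,   3, INF],   # links from A: AB = 4, AC = 3
    [INF, INF,   5,   3],   # links from B: BC = 5, BD = 3
    [INF, INF, INF,   3],   # links from C: CD = 3
    [INF, INF, INF, INF],   # links from D
]
```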
An example of searching for shortest paths using the algorithm including steps a1 to a4 described above will now be described for the network matrix MN. First, as shown in
In this algorithm, when the link cost of one of the subject links being searched becomes zero due to the subtraction, the tail end node of that link becomes a "reached node". Accordingly, as shown in
Out of the link costs included in the network matrix MN, the link costs of links whose tail end node is the reached node are changed to the value indicating that such link costs are to be excluded from further searches. As shown in
Out of the link costs produced by the subtraction, the values aside from zero (a zero indicating a reached node) are the link costs of links to be subjected to a continued search. In addition, the link costs of links that have reached nodes as their root end nodes are added as links to be searched. Accordingly, in this example, the link costs of links that have node A or node C, which have been assigned the reached node flag "1" in the path matrix MR, as their root end nodes are the costs of the links subjected to the next search.
Next, the lowest value “1” out of the link costs of the subject links for the next search is subtracted from the link costs. As a result, as shown in
Also, out of the link costs included in the network matrix MN, the link costs of links that have the reached nodes as their tail end nodes are changed to the value that indicates that such links are to be excluded from further searches. As shown in
Out of the link costs produced by the subtraction, the values aside from zero (a zero indicating a reached node) are the link costs of links to be subjected to a continued search. In addition, the link costs of links that have reached nodes as their root end nodes are added as links to be searched. Accordingly, in this example, the link costs of links that have node A, node B, or node C, which have been assigned the reached node flag "1" in the path matrix MR, as their root end nodes are the costs of links subjected to the next search.
Next, the lowest value “2” out of the link costs of the subject links is subtracted from the link costs. As a result, as shown in
Also, out of the link costs included in the network matrix MN, the costs of links that have reached nodes as their tail end nodes are changed to the value that indicates that such links are to be excluded from searches. This means that as shown in
In this algorithm, the lowest cost out of the link costs of the subject links being searched is subtracted from such link costs. Accordingly, the link cost of at least one link becomes zero, so that at least one reached node is found by each subtraction. This means that regardless of the magnitudes of the link costs, it is possible to find all of the shortest paths from a starting node to the other nodes included in the network by repeating the process described above a number of times equal to the number of nodes in the network aside from the starting node, that is, (Na−1) times, where Na is the number of nodes included in the network. Accordingly, the number of iterations of the loop process increases at most substantially in proportion to the total number of nodes. If the number of nodes included in the network to be searched is known, the shortest paths from the starting node to the other nodes will definitely be found by carrying out (Na−1) iterations, so, as described earlier, it is not actually necessary to confirm whether every element in the network matrix has been set at infinity.
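The loop described above can be sketched in software as follows (for illustration only; the function and variable names are not taken from the description, a matrix of numbers stands in for the hardware data sets described later, and the path matrix MR is represented here as a matrix in which a flag of 1 marks each link whose cost reached zero):

```python
import math

INF = math.inf   # value that excludes a link from the search

def ampsa(MN, start):
    """Search for shortest paths from one starting node.

    MN[root][tail] is the link cost from the root end node to the tail end node
    (INF means no link or excluded).  Returns a path matrix MR of the same shape
    in which a flag of 1 marks the links along which nodes were reached.
    """
    Na = len(MN)
    cost = [[INF] * Na for _ in range(Na)]   # link costs of the subject links being searched
    MR = [[0] * Na for _ in range(Na)]
    reached = {start}

    # Initially only the links whose root end node is the starting node are searched.
    for tail in range(Na):
        cost[start][tail] = MN[start][tail]

    for _ in range(Na - 1):                  # (Na - 1) iterations are always sufficient
        active = [c for row in cost for c in row if c != INF]
        if not active:
            break                            # no further nodes can be reached
        lowest = min(active)                 # find the lowest value of the subject links
        for root in range(Na):
            for tail in range(Na):
                if cost[root][tail] != INF:
                    cost[root][tail] -= lowest            # subtract the lowest value
                    if cost[root][tail] == 0:             # tail end node becomes a reached node
                        MR[root][tail] = 1
                        reached.add(tail)
        for node in reached:
            for root in range(Na):
                cost[root][node] = INF       # exclude links whose tail end node has been reached
            for tail in range(Na):
                # add links whose root end node is a reached node to the search
                if tail not in reached and cost[node][tail] == INF and MN[node][tail] != INF:
                    cost[node][tail] = MN[node][tail]
    return MR
```

For the four-node example above, ampsa(MN, 0) first reaches node C, then node B (lowest value "1"), and then node D (lowest value "2"), matching the iterations described. In the variant described next (named MPSA later in the description), the subtracted amount would simply be the smallest unit for expressing link costs ("1" in this example) and the loop would repeat until every node has been reached, rather than a fixed (Na−1) times.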
On the other hand, it is also possible to use an algorithm that subtracts the smallest unit for expressing link costs from the link costs being searched. In this example, the smallest unit for the link costs is “1”, and therefore in one iteration of the loop, “1” is subtracted from the link costs of the subject links being searched. When the subtraction loop is repeated and one of the link costs reaches zero, a reached node is found. In this algorithm, the number of iterations of the loop will increase depending on the link costs.
The functions that realize the routing unit 80 are provided by special purpose modules (dedicated modules) or by a combination of software and hardware resources that are shared with other functions, for example, the CPU or the reconfigurable processor. The routing unit 80 includes a function 82 that refers to a routing table 81 and selects the next hop to which the packets are to be transmitted. The routing unit 80 also includes a function 83 for updating the routing table 81 using a suitable routing protocol, such as OSPF, that can dynamically update a routing table, to obtain information showing the configuration (arrangement) of the network relating to routing, including information on routers in the vicinity. The updating function 83 sets the network whose information has been obtained by the routing protocol as the network to be analyzed (subject network) and provides a lower-level analysis system 30 with a network matrix 39 generated to represent the subject network, or with the network information obtained for generating the network matrix 39, so as to carry out routing analysis. In a link-state routing algorithm such as OSPF, link information between routers is regularly exchanged by LSAs (Link-State Advertisements). Accordingly, all of the link information (i.e., all of the link costs) in the subject network is known to the nodes that belong to the subject network. The network matrix 39 can therefore include link cost information for the subject network obtained by LSAs.
The analysis system 30 has a hierarchical construction. The function (the third function) 31 on the highest level converts the network to be analyzed (the subject network that has been provided by a higher-level application or system) into hierarchical networks, thereby dividing the subject network into small-scale networks. The function (the second function) 21 on an intermediate level provides the lowest costs between nodes in the small-scale networks to the higher level function and thereby makes it possible to reconfigure the networks on the higher level. The function (the first function) 11 on the lowest level searches for the shortest paths between nodes in the small-scale networks and provides the shortest paths to the higher level function, thereby making it possible to calculate the lowest costs between nodes.
A first system 10 including the first function 11 on the lowest level is equipped with a processor 50 in which a plurality of circuits can be reconfigured, a control unit 51 for reconfiguring the processor 50, inputting data into it, and controlling it, and a memory 52. Circuit configuration information 54 for reconfiguring the circuits is stored in the memory 52, and at appropriate timing, circuits for carrying out a path search are reconfigured in the processor 50 by a configuration control function 53 of the control unit 51. Accordingly, different circuits can be reconfigured in the processor 50 when a path search is not required, thereby making it possible for other functions to use the processor 50.
In the processor 50, a circuit 61 for initially inputting data for a path search, a circuit 62 for reinputting data for the path search, a circuit 63 for finding the lowest value, a circuit 64 for subtracting, a circuit 65 for setting which nodes have been reached, and a circuit 66 for generating data for a continued search are reconfigured. The first function 11 for controlling the circuits constructed in the processor 50 is loaded in the control unit 51 by an appropriate program. The first function 11 includes an input unit 12 having a function for initial inputting, which selects a starting node for the initial input and repeats the initial inputting into the processor 50 (the circuits configured in the processor 50) with all of the nodes shown in the network matrix MN as starting nodes, and a loop controller 13 having a function that repeats the reinputting into the processor 50 until all of the other nodes have been reached from the starting node.
Typical PE 55 for constructing the reconfigurable processor 50 are elements whose functions can each be freely set using a look-up table. Elements with internal data paths suited to special functions or processing, such as elements for arithmetic/logic operations, delay elements, memory elements, elements for generating addresses for inputting or outputting data, and elements for inputting or outputting data, are arranged in the matrix 50 of the data processing apparatus 15. By arranging elements that are roughly divided into functional groups, it is possible to reduce redundancy, so that the AC characteristics and processing speed can be improved.
More specifically, the DAPDNA matrix 50 includes 368 PE 55, and under program control of the RISC 51, configuration data is supplied to the respective PE 55 from the RISC 51 or the memory 52 via a control bus 16. The functions of the respective PE 55 and the connections made by the wire groups 57 are controlled using the configuration data, and by doing so a variety of data flows (data paths) can be freely constructed on the matrix 50. Accordingly, the matrix 50 is a reconfigurable processor in which the circuits constructed using the PE 55 can be freely changed by the program.
To connect the matrix 50 to the periphery, such as to an external memory 25, and to input and output the data to be processed, the data processing apparatus 15 includes an input buffer 18, an output buffer 19, and a bus switching unit (a bus interface or BSU) 17 that functions as an access arbitration unit. The buffers 18 and 19 each include four buffer elements and function so as to manage data inputted into and outputted from the circuits constructed in the matrix 50.
The control unit 55a of each PE receives configuration data from the RISC 51 via the control bus 16 and controls the configuration of the internal data path region 55b. Accordingly, in each PE 55, the states of the shift circuits, the mask circuits, and the arithmetic logic unit are set by the control unit 55a so that various types of arithmetic operations and logic operations can be performed. Each PE 55 also includes input registers for latching (setting) input data according to the clock signal and output registers for latching (setting) output data according to the clock signal. This means that once the content of the processing, calculation, or function to be performed by the PE 55 has been determined, the latency until the inputted data is subjected to calculation and outputted is determined. Accordingly, it is possible to easily arrange circuits for pipeline processing by connecting a plurality of PE 55 using the wires 57 and thereby provide the circuits with a large throughput.
The circuit 62 shown in
In the circuit 63 for finding the lowest value, the lowest value out of the data pieces in the data sets A1 to A4 showing the link costs to be processed for the search is detected by a knockout method. In the circuit 64 for subtraction, the found lowest value is subtracted from every data piece in the data sets A1 to A4 that is to be processed for the search. Data pieces whose cost values are set at infinity ("7F") are not subjected to this operation. In the circuit 65 for setting a reached node, the determination flag on the leading bit of a data piece whose link cost has become zero is set at "1" to show which node has been reached.
The circuit 66 for generating data for a continued search includes a circuit for changing the data pieces (link costs) included in the data sets A1 to A4. In the changing circuit, when the determination flag of a data piece has been set at "1", the link costs of all links whose tail end node is the corresponding reached node are set at infinity to exclude such link costs from further processing for searches. That is, the changing circuit changes the link costs in the column of the network matrix MN corresponding to the node with the reached flag (the determination flag of "1") to infinity. The circuit 66 also includes a circuit that sets the values after subtraction of the subject links whose link costs do not become zero as the link costs of links for a continued search. Therefore, the circuit 66 outputs new data sets A1 to A4 including the link costs set at infinity for the reached node or nodes and the link costs after subtraction for the other links.
These new data sets A1 to A4 are supplied to the re-input circuit 62 and at appropriate timing are supplied again to the circuit 63 that detects the lowest value. The process is repeated a predetermined number of times, and the data sets in which all of the link costs are infinity gather in the memory 52. The determination flags of these data sets are used as the path matrix MR that shows the shortest paths. The functions of the circuit 61 for initial inputting and the circuit 62 for reinputting data can also be realized by software. For example, a process can be carried out under software control that temporarily stores the new data sets outputted from the circuit 66 for generating data for a continued search in the memory 52 and reinputs them at appropriate timing that does not cause the pipeline to stall. Alternatively, if the reconfigurable processor 50 is used by other functions according to time sharing, the circuits for carrying out a path search can be reconfigured at timing that does not adversely affect execution of the other functions and the path search can be continued by supplying the reinput data under software control.
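The following is a rough software sketch of what one pass through the circuits 63 to 66 does to the data pieces (for illustration only: the packing of one determination-flag bit and seven cost bits per piece and the use of "7F" for infinity follow the description above, but organizing the pieces as the rows of a small matrix is an assumption, and the initial inputting and re-inputting performed by the circuits 61 and 62, which determine which links are currently subject to the search, are not modeled):

```python
INF7 = 0x7F   # 7-bit cost value meaning infinity (excluded from the search)
FLAG = 0x80   # leading bit of each data piece: the determination (reached node) flag

def one_pass(data_sets):
    """One pass through circuits 63 to 66, sketched in software.

    data_sets is assumed to be a list of rows of 8-bit data pieces (for example the
    four rows of a 4x4 network matrix); each piece packs one flag bit and a 7-bit cost.
    """
    costs = [[piece & 0x7F for piece in row] for row in data_sets]
    active = [c for row in costs for c in row if c != INF7]
    if not active:
        return data_sets
    lowest = min(active)                          # circuit 63: find the lowest value (knockout method)
    reached_columns = set()
    out = [list(row) for row in data_sets]
    for i, row in enumerate(costs):
        for j, c in enumerate(row):
            if c == INF7:                         # "7F" pieces are not subjected to the operation
                continue
            c -= lowest                           # circuit 64: subtract the lowest value
            if c == 0:                            # circuit 65: set the determination flag to "1"
                out[i][j] = FLAG | c
                reached_columns.add(j)
            else:
                out[i][j] = c                     # circuit 66: reduced cost continues in the search
    for row in out:                               # circuit 66: links into reached nodes become infinity
        for j in reached_columns:
            row[j] |= INF7                        # cost bits forced to "7F"; the flag bit is kept
    return out
```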
In the processor 50, a large number of processing elements PE 55 are provided, and by using such PE 55, it is possible to construct circuits for processing a plurality of data sets in parallel. A 4×4 matrix is used in the present specification, and current DAPDNA processors are arranged to process 32-bit input data sets. Accordingly, by dividing each 32-bit data set into four 8-bit data pieces, the data sets representing the 4×4 matrix are generated. Alternatively, by configuring circuits in the DAPDNA that process sixteen of such 32-bit data sets, each of which has four data pieces, parallel processing can be carried out for the link costs included in an 8×8 network matrix MN. Also, as described earlier, the elements PE provided in the DAPDNA operate in synchronization with the clock, and in each PE, FF (flip-flops) for latching data in synchronization with the clock are disposed at the input and/or output ends. Accordingly, circuits that fundamentally carry out pipeline processing are constructed in the DAPDNA, thereby making it possible to carry out the processing of the path search described above by pipeline processing.
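A minimal sketch of the packing of four 8-bit data pieces into one 32-bit data set might look as follows (the byte order within the 32-bit word is an assumption made for illustration):

```python
def pack(pieces):
    """Pack four 8-bit data pieces into one 32-bit data set."""
    assert len(pieces) == 4 and all(0 <= p <= 0xFF for p in pieces)
    word = 0
    for p in pieces:
        word = (word << 8) | p
    return word

def unpack(word):
    """Split one 32-bit data set back into its four 8-bit data pieces."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]
```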
With Dijkstra's algorithm, a calculation carried out at a given point in time is affected by the preceding calculation results, and therefore Dijkstra's algorithm is not suited to a parallel reconfigurable processor such as a DAPDNA. In addition, when Dijkstra's algorithm is implemented on a data flow machine, since there is dependency between loop iterations, a long feedback loop is required, thereby reducing the throughput of the processing. In addition, since searching lists takes time, Dijkstra's algorithm is not a shortest path searching algorithm suited to a parallel data flow machine.
Unlike Dijkstra's algorithm, in one iteration of processing by the shortest path searching algorithm included in the present invention, the present position is advanced along every branch by the distance (i.e., the link cost) of the link with the lowest cost, and at least one reached node is found. Accordingly, the inventors have named this shortest path searching algorithm an "AMPSA" (Advanced-Multi-route Parallel Search Algorithm). As described earlier, the algorithm included in the present invention may use a method where in one iteration of processing, the present position is advanced by one cost unit on every branch starting from a starting node and the path where the present position first reaches another node is set as the shortest path. The inventors have named this algorithm an "MPSA" (Multi-route Parallel Search Algorithm). With MPSA, when the present position has reached a node, the present position is thereafter also advanced on the branches that extend from such reached node, and the processing ends when the present position has reached every node.
AMPSA and MPSA are algorithms that suppress the dependency of the processing and can carry out a search simultaneously on a plurality of paths, and therefore are suited to parallel processing. MPSA and AMPSA can also be expressed as a matrix calculation. As described later, this matrix calculation algorithm is scalable to a larger network. When AMPSA that subtracts the lowest value is executed on a DAPDNA, the number of clock cycles consumed for the execution of the algorithm will not depend on the link cost of the path from the starting node to the furthest node, and, when searches are carried out for every node out of n nodes with a lattice-like mesh topology, is O(n). Accordingly, with AMPSA, compared to Dijkstra's algorithm, the amount by which the calculation load increases relative to an increase in the number of nodes is extremely small.
The amount of calculation with Dijkstra's algorithm is O(n²) for a number n of nodes, so that when all the nodes are searched, the amount of calculation is O(n³). This means that with Dijkstra's algorithm, when the scale of the network increases, there is a sudden increase in the amount of calculation, which makes a large amount of CPU power and memory necessary. Also, since finding a shortest path has recently become more complicated due to the need to consider a plurality of types of information, such as bandwidth and wavelength, and since a normal serial processor must calculate the shortest paths separately, there is a large increase in calculation time. A combination of the AMPSA or MPSA included in the present invention and a parallel reconfigurable processor like the DAPDNA can solve this problem. In addition, with a schedule where the routing table is updated once every few minutes or several times an hour, the parallel reconfigurable processor like the DAPDNA can be utilized for other network processing, for example, finding best matches for IP addresses, which means that the present system is also suited to making effective use of hardware resources.
As described above, the combination of the AMPSA or MPSA included in the present invention and a synchronous parallel processor like DAPDNA is capable of independent calculation by pipeline processing. That is, even for the same network, AMPSA or MPSA can carry out processing for finding shortest paths independently for different starting nodes. In addition, the DAPDNA can process data by pipeline processing in element units and in units of single clock cycles. Accordingly, the combination of the AMPSA or MPSA and DAPDNA has a large throughput which makes it possible to carry out a shortest path search for multiple (n) starting nodes in substantially the same calculation time as that of a shortest path search for a single starting node.
For ease of understanding the present invention,
In more detail, when carrying out a shortest path search for a network with an 8×8 network matrix using a DAPDNA, although it depends on the circuit construction, around 480 clock cycles are consumed per larger data set. Normally, when searching for shortest paths for all nodes, a total of "480 clock cycles×the number of nodes" will be consumed. However, by using the pipeline processing configured in the DAPDNA, results are consecutively outputted in units of clock cycles. Therefore, for a network that has 8 nodes and an 8×8 network matrix, when carrying out shortest path searches for all the nodes in the network, that is, when finding shortest paths that have all of the nodes as starting nodes and reach all of the other nodes, a total of only 487 (=480+8−1) clock cycles is consumed. In addition, a plurality of networks can be processed using the same pipeline processing. For example, it is possible to complete shortest path searches for all nodes in three networks, each of which has 8 nodes, in a total of 503 (=480+8+8+8−1) clock cycles.
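The clock cycle counts above follow from filling the pipeline once and then obtaining one result per clock cycle, as the following sketch shows (the 480-cycle latency is the figure quoted above and depends on the circuit construction):

```python
def pipelined_cycles(latency, data_sets):
    """Total clock cycles: fill the pipeline once, then one result per clock cycle."""
    return latency + data_sets - 1

print(pipelined_cycles(480, 8))       # all 8 starting nodes of one 8-node network: 487 cycles
print(pipelined_cycles(480, 8 * 3))   # three 8-node networks processed back to back: 503 cycles
```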
When the number of network groups increases, the number of inputted network matrices also increases. If the number of data sets for inputting the network matrices consecutively exceeds the 480 or more clock cycles required for data to pass through the configured shortest path search circuits, it will not be possible to advance to the second iteration on the first data set until all of the initial inputting has been completed. In this case, the first data set may be stored in a memory and/or the pipeline configured in a DAPDNA may be extended, or, to extend the pipeline, a plurality of groups of circuits and/or a plurality of processors (DAPDNAs) may be serially connected. It is also possible to configure a plurality of groups of circuits and/or connect a plurality of processors in parallel to increase the parallelism.
In this way, the first process of the first system 10 includes steps 71 to 77. Out of these steps, at least steps 72 to 75 should preferably be carried out by parallel processing on a parallel processor like a DAPDNA. That is, the process 70 from finding a lowest value (step 72) to outputting the link costs for a continued search (step 75) can be carried out in parallel for all of the link costs included in a larger data set using a parallel processor. Also, by processing all of the link costs included in a larger data set in parallel using a parallel processor, it is possible to use pipeline processing and to obtain a large throughput.
As shown in
The second system 20 includes a memory 25 for storing the network matrix MN that is provided to the first system 10, the path matrix MR obtained from the first system 10, and the lowest cost matrix MC. The second function 21 of system 20 includes a lowest cost finder (cost finder) 22 having a function for calculating, using the network matrix MN and the path matrix MR, the lowest costs from a given starting node to other nodes, and an MC generator 23 having a function for generating the lowest cost matrix MC including the lowest cost from each node to every other node by repeating the process described above with all of the nodes included in the network matrix as starting nodes. The method of generating the lowest cost matrix MC is the same as described earlier with reference to
Note that although the second process is shown in
As shown in
Nodes that make links for connecting (directly connecting) across groups are called boundary nodes Bn and the boundary nodes Bn in all of the groups are extracted. When the number of extracted boundary nodes Bn is greater than m, the division into groups is repeated to make one or more intermediate levels (layers). By repeating this process, as shown in
Carrying out processing after dividing the large-scale network having n nodes into small-scale networks having m nodes is also effective in reducing the amount of calculation by exploiting the locality of the connections in the network. That is, the network matrix of a large-scale network having n nodes includes n by n elements. However, since each node will not be connected to all of the other (n−1) nodes, most of the link costs of the elements included in the network matrix will be infinity, showing that the corresponding nodes are not connected. By dividing into small-scale networks, it is possible to reflect the links between local nodes efficiently in the network matrices of the small-scale networks, thereby reducing the amount of data to be processed. Accordingly, the scale of the hardware required to carry out the search for shortest paths can be reduced and the processing time can also be reduced.
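As a rough illustration of this point (the figures below are hypothetical, and the additional, much smaller matrices built from the boundary nodes on the higher levels are ignored):

```python
# Hypothetical example: a large-scale network of n nodes divided into groups of m nodes.
n, m = 1024, 8
full_matrix_elements = n * n                 # one n x n network matrix, mostly infinity
grouped_matrix_elements = (n // m) * m * m   # n/m small m x m network matrices on the lowest level
print(full_matrix_elements, grouped_matrix_elements)   # 1048576 versus 8192 elements
```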
Here, consider the case of finding a shortest path from node a11 belonging to group A1 as a source node to node b14 belonging to group B1 as a destination node. As shown in
Step 1—The MN generator 33 generates the network matrix of the group A1 on the lowest level. The MC finder 34 supplies the network matrix to the second system 20 and obtains the lowest cost matrix A1c for the group A1 (see
Step 2—The MN generator 33 generates the network matrix of the group B1 on the lowest level. The MC finder 34 supplies the network matrix to the second system 20 and obtains the lowest cost matrix B1c for the group B1 (see
Step 3—The MN generator 33 generates the network matrix of the group A2 on the higher level (in this case the highest level) that includes the boundary nodes a22, a23, and a24 that make links for connecting the group A1 and the group B1 (see
Step 4—The MC finder 34 obtains the lowest cost matrix A2c for the higher level group A2 (see
Step 5—From the lowest cost matrix A2c of the group A2, the MC finder 34 further generates the matrix A2c′ for adding the lowest costs of the paths between the boundary nodes included in the groups A1 and B1 to the lowest costs of the paths between the boundary nodes and the destination node included in the group B1. That is, the matrix A2c′ is generated so as to add the lowest costs w5 and w6 of the paths between the boundary node a22 (a13) included in the group A1 and the boundary nodes a23 (b11) and a24 (b12) included in the group B1, which includes the destination node b14 (see
Step 6—The MC finder 34 adds the matrices A2c′ and A1c′ to the lowest cost matrix B1c of the group B1. By doing so, the cost matrix A0c including the costs of every path from the source node a11 to every node included in the group B1 via the boundary nodes a22, a23, and a24 is generated (see
Step 7—For the destination node b14, the cost (y3+w5+x2) of the path via the boundary node a23 (b11) and the cost (y6+w6+x2) of the path via the boundary node a24 (b12) are compared, and the path with the lower cost is selected as the shortest path from the source node a11 to the destination node b14. In the same way, the shortest path that has node b13 as the destination node can be determined by comparing the cost (y2+w5+x2) of the path via the boundary node a23 (b11) and the cost (y5+w6+x2) of the path via the boundary node a24 (b12) included in the cost matrix A0c.
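For illustration only, the comparison in step 7 can be sketched as follows (the symbols x2, w5, w6, y3, and y6 are those used above; the numerical values are placeholders, since the example does not give concrete costs):

```python
# Placeholder values; only the structure of the comparison matters.
x2 = 2          # lowest cost from the source node a11 to the boundary node a13 within group A1
w5, w6 = 4, 6   # lowest costs a13 -> b11 and a13 -> b12 on the higher level (group A2)
y3, y6 = 3, 1   # lowest costs b11 -> b14 and b12 -> b14 within group B1

candidates = {
    "via a23 (b11)": y3 + w5 + x2,
    "via a24 (b12)": y6 + w6 + x2,
}
shortest = min(candidates, key=candidates.get)
print(shortest, candidates[shortest])   # the path with the lower total cost is the shortest path
```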
Next, by using the MN generator 33 having the function that generates the network matrix and the MC finder 34 having the function that obtains the lowest cost matrix, the lowest costs are found in order from the small-scale networks on the lowest level and then the shortest paths in the large-scale network are found. In step 125, the MN generator 33 generates a network matrix MN for each of the plurality of networks on the lowest level. In step 126, the MC finder 34 supplies the respective network matrices to the second system 20 and obtains the lowest cost matrices MC. In step 127, if the lowest cost matrix MC of the highest level has been obtained, the process of system 30 is completed and the lowest cost matrix MC for the subject network 39 to be analyzed is found.
Accordingly, in step 129, the system 30 finds shortest paths that have all of the nodes included in the subject network 39 as their destination nodes and updates the routing table 81. The above method is one method of finding the shortest paths with each node as the destination node. That is, in the above method, for each of a plurality of paths from a source node to a destination node that pass through boundary nodes included in the higher-level networks, a total is calculated of the lowest cost from the source node to a boundary node within the lowest-level network that includes the source node, the lowest cost between the boundary nodes within the network or networks on the higher level, and the lowest cost from a boundary node to the destination node within the lowest-level network that includes the destination node. Next, the path for which this total of the lowest costs is lowest is set as the shortest path from the source node to the destination node.
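Expressed as a formula (the notation is introduced here for clarity and is not taken from the description), with s the source node, d the destination node, and b_s and b_d boundary nodes of the lowest-level networks that include s and d respectively:

$$C(s,d)=\min_{b_s,\,b_d}\bigl[\,C_{\mathrm{low}}(s,b_s)+C_{\mathrm{high}}(b_s,b_d)+C_{\mathrm{low}}(b_d,d)\,\bigr]$$

where C_low denotes the lowest cost within a lowest-level network and C_high denotes the lowest cost between boundary nodes on the higher level or levels.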
When a higher level network exists, in step 128, the MN generator 33 extracts the link costs between the boundary nodes that are used to generate the higher level network from the lowest cost matrix MC, and generates a network matrix MN for the higher level network.
When the network matrix MN has been generated, in step 126, the MC finder 34 supplies the network matrix MN to the second system 20 and obtains the lowest cost matrix MC. That is, for the higher level network, the second system 20 (the second process) obtains the lowest costs from each boundary node as a source node to the other boundary nodes.
In an alternate method of finding shortest paths with each node as the destination node, in step 128, information on the lower-level networks is included (incorporated) when the network matrix MN of a higher-level network is generated. For example, the MN generator can generate a different higher-level network matrix for each destination node by expressing the costs on the higher level so that they include not only the lowest costs between boundary nodes on the higher level but also the lowest costs to the source node and the destination node. That is, the lowest costs relating to the boundary nodes connecting to the source node can include the lowest cost between such boundary nodes and the source node, and the lowest costs relating to the boundary nodes connecting to the destination node can include the lowest cost between such boundary nodes and the destination node. With this method, by finding the lowest costs of the network matrix of the highest level, or by finding the lowest costs of the network matrix of a higher level that includes a boundary node relating to the source node and a boundary node relating to the destination node, it is possible to find the shortest paths from the source node to the destination node.
When a normal sequential processor carries out a shortest path search for n starting points, n times the calculation time for one starting point is required. On the other hand, since parallel processing is possible when the processing is carried out using a DAPDNA operating according to the algorithm included in the present invention, the calculation load hardly increases whether there is one starting node or n starting nodes. Basically, if n nodes are added, processing can be carried out with the processing time increased by n clock cycles. When the calculation circuit is designed so that calculation is carried out with 8 nodes in a group, the amount of calculation will hardly be affected by the number of nodes in each group. If there is an average of two boundary nodes per group, the number of groups S on a network of n nodes is given by the equation shown in