The present invention relates to a network, and in particular to a network allowing multiple users access to distributed storage devices.
There has been a tendency for offices and employees to become geographically separated while, simultaneously, both employees and organizations have sought to have unified access to company data systems almost as if they were all located at a single office. In addition, companies and users wish their data system to be robust. That is to say, the communications channels and storage devices should be able to tolerate a reasonably small number of faults without loss of service. As a result storage area networks, SANs, have gained popularity with companies looking for efficient distributed storage solutions. Workers in the field of storage area networks often describe a SAN as a set of “fabric elements” connecting a set of hosts to a set of storage devices. In this terminology “hosts” are computers which may need to retrieve data from, or deposit data on, the storage devices. Thus the hosts are typically users' computers. The “fabric elements” are the interconnections between the hosts and the data stores and comprise both real elements, such as cables, routers and network hubs, as well as intangible items such as a communications link having a prescribed bandwidth. The communications link may be routed through a physical cable which the company has access to, or may be a communications link provided by a third party and which, in reality, represents a share of a much faster physical data link, such as a fiber optic cable, which the third party uses to move data from one place to another. Herein the term “devices” will be used to encompass both hosts and data stores, whereas “fabric elements” will be used to refer to components and communications links forming the network to which the devices are connected. Devices and fabric elements are examples of Network Elements, that is components forming the network.
Each network element within the SAN has one or more ports. A port is the connection which a network element makes to a communication link and each link has a port at each end thereof.
The SAN is constructed from real world components and hence each component choice brings with it a physical limitation and a real monetary cost. It is therefore important that the components that make up the SAN are appropriately selected to achieve a suitable cost-performance balance.
The design and installation of storage area networks is a potentially very complex matter. Typically network designers and providers base their designs on a few well known topologies. This can result in SANs being over-provisioned in their capabilities by quite a considerable margin. This, in turn, can result in the network installation cost, as mainly determined by the hardware components which were bought for it, being much higher than was strictly necessary to meet the performance criteria laid down by the user.
As noted before, a SAN is specified by providing a list of hosts (user devices that wish to access data), a list of storage devices upon which the data can be stored, a list of possible types of fabric nodes (cables, routers, hubs) and a description of the data flow requirements from a specified host to a specified storage device. The data flow requirements define the bandwidth of the data link required between the host and the storage device. These requirements are specified by a network designer based upon knowledge of the use to which the network will be put. The network designer then passes the design parameters to the SAN designer.
The SAN designer then has to juggle these user requirements with further limitations of the SAN design, such as not being able to split a data flow between a host and a device, and the limited number of ports available on each of the fabric nodes. Hewlett Packard had a project called “Appia”, reported by Ward et al, “Appia: Automatic Storage Area Network Fabric Design”, Proceedings of the FAST 2002 Conference on file and storage technologies, pages 203 to 217, January 2002, which demonstrated that algorithmic optimisation techniques could quickly specify a topology that satisfies a SAN's design requirements and competes with designs created by human SAN experts. Although the Appia algorithms were able to quickly determine a possible SAN topology, they are not guaranteed to find an optimal solution.
According to a first aspect of the present invention there is provided a method of optimising a network comprising the steps of:
According to a second aspect of the present invention there is provided a computer program for causing a computer to perform a method of optimising a network comprising the steps of:
According to a third aspect of the present invention there is provided a method of optimising a storage network configuration, the method comprising the steps of: taking an Ith set of network configurations and using genetic modification to derive from the Ith set of configurations an (I+1)th set of configurations, having at least one change applied to them; and computing a cost for at least some of the configurations of the (I+1)th set.
Preferably the process is iterated. Advantageously successive iterations are based on one or more solutions from a previous iteration which exhibited a comparatively low cost compared to other solutions found in that iteration. The probability that a solution is used as the basis of a further iteration may be a function, and preferably an inverse function of the cost of that solution. Thus high cost solutions have a low probability of being selected compared to lower cost solutions. Advantageously “elitism” is invoked such that the lowest cost solution from any generation is automatically included into the next generation. This ensures that irrespective of the number of iterations or generations over which the process is run, the lowest cost solution is always carried forward to the final generation.
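The selection rule described above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the weight 1/cost is merely one possible inverse function of cost (it assumes all costs are strictly positive).

```python
import random


def select_parents(costs, n_parents, rng):
    """Pick indices of solutions with probability inversely related to cost.

    Assumption: weight = 1 / cost, one of many possible inverse functions.
    """
    weights = [1.0 / c for c in costs]
    return rng.choices(range(len(costs)), weights=weights, k=n_parents)


def elite_index(costs):
    """Index of the lowest cost solution, carried unchanged into the next generation."""
    return min(range(len(costs)), key=costs.__getitem__)


# Hypothetical costs for three candidate solutions.
costs = [4.0, 1.0, 2.0]
rng = random.Random(0)
parents = select_parents(costs, 4, rng)  # low cost solutions are favoured
elite = elite_index(costs)               # solution 1 has the lowest cost
```

Because the elite solution is copied forward in every generation, the best cost seen so far can never get worse from one generation to the next.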
According to a fourth aspect of the present invention there is provided a computer program for causing a computer to perform a method of optimising a storage network configuration, the method comprising the steps of:
The present invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIGS. 3a and 3b schematically illustrate an “ant behaviour” based approach to selecting a low cost or quickest route;
FIGS. 14a and 14b schematically illustrate multilayer networks designed in accordance with the present invention; and
As described hereinbefore, Hewlett Packard has disclosed algorithms for storage area network design.
It can be seen that the first attempt by the flow merge algorithm results in four devices (encircled in the Figure) having port violations, that is, more connections than the number of ports that the device actually has. The second iteration reduces the number of port violations by introducing a switch S1 between the first host H1 and the first device D1. The switch connects to the devices D1 and D2. In spite of this introduction of a switch there are still three port violations. The third iteration removes the direct connection between host H2 and device D1 and instead routes the data flow through the switch S1. This solution has reduced the number of port violations from three to two. The fourth iteration results in the removal of a direct connection from host H2 to device D2 and instead routes this connection via the switch S1. This has reduced the number of port violations to one. The fifth iteration sees the introduction of a second switch, S2, which is connected to hosts H2 and H3 and also to device D3. This results in a solution which does not have any port violations. However, stopping at this solution would be a mistake as it is relatively costly. The next iteration results in a solution in which one of the switches is removed, and all of the hosts connect into the remaining switch, as do all of the devices, but direct connections also exist between host H1 and device D3 and also between host H3 and device D1.
This example demonstrates that even for a seemingly trivial network there are a plurality of configurations that may be solutions to the network connection problem, and that some solutions incur a greater cost penalty than others.
In order to exemplify the workings of an embodiment of the present invention it is helpful to consider a simple network.
Ant Optimisation
A first approach to finding the optimal solution is based on so called ant optimisation. The inspiration for ant colony optimisation algorithms comes from knowledge of path selection in ant species using pheromone trails. The ant deposits a pheromone as it walks. Thus a subsequent ant, when facing a choice of routes, will tend to take the path which has the highest concentration of pheromone on it, as this path has had the largest number of ants travelling on it previously. If multiple paths have the same amount of pheromone, each of the equivalent paths is chosen with equal probability. Occasionally an ant will ignore the pheromone concentrations and simply pick a path randomly. It is also accepted that the pheromone evaporates at some rate over time such that paths age out. These simple rules result in an “emergent” behaviour of a colony finding the shortest path between two points. This can be shown by considering the journey of four ants, as shown in
FIGS. 3a and 3b give a simple illustration of the process involved in “ant colony optimisation”. In
FIG. 3b shows the ant positions at some time later, when the ant A2 travelling along the lower path has arrived at point 42, whereas the ant A1 travelling along the upper path is still in transit. Similarly ant A4 has arrived at point 40 whilst ant A3 is still in transit. Thus, an ant departing from point 42 at this time or just after is more likely to choose path “B” over path “A” as there will be a greater concentration of pheromone on the lower path. Providing the paths are used frequently, path B will become the chosen path.
In order to optimise the network, the storage area network design problem is translated into a path optimisation problem that can be solved by virtual ants which act as investigation agents to investigate connections within the network. The designer assumes that a set of F flows exists through a set of N fabric nodes. The number of fabric nodes N is generally greater than the number of fabric nodes strictly necessary to solve the problem. A colony of virtual ants is then created and allowed to choose routes through the storage area network. A route in this context is the assignment of a flow f from the set of flows F to a fabric node n from the set of fabric nodes N. The routes chosen allow a network topology to be inferred.
In order to choose its path, a particular ant will iterate through a set of required flows. The flows are ordered by decreasing bandwidth requirements. The ant chooses a fabric node, or direct connection, for each of the flows that it is to pass through. As the path is constructed the set of possible nodes to which the next flow can be assigned is restricted to only those nodes which will result in a feasible network. If there are no feasible fabric node choices and the set of possible nodes is therefore empty, the ant is “terminated” and ignored. It then becomes possible to evaluate the resulting network in terms of a “cost” to implement the network. The calculation of cost is comparable to calculating the length of a particular tour. At the end of each generation, each ant updates the pheromone concentration values along its path according to a predetermined rule.
The exact probability of a particular ant choosing a particular fabric node for a particular flow during a particular generation (t) is, in general, a combination of both pheromone concentration τ and heuristic desirability ξ as defined below:
where
It is assumed that there are K ants in each generation t. The probabilities are normalised by dividing the numerator by the sum over all feasible choices Nfeasible in order to ensure that the probabilities sum to 1. These feasible choices are those known choices that are consistent with a path corresponding to a buildable network. The probability of selecting an infeasible node is 0, and equation 1 gives the probability of selecting each of the feasible nodes. If there are no feasible nodes, then the ant has reached a “dead end” and it and the path are terminated.
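The normalisation described above can be sketched in a few lines. The sketch assumes the conventional ant colony form in which each feasible node is scored by τ^α·ξ^β and the scores are divided by their sum; the function name, the dictionary layout and the numerical values are illustrative assumptions rather than values taken from the description.

```python
def route_probabilities(tau, xi, feasible, alpha=1.0, beta=-1.0):
    """Return {node: probability} for routing one flow, or None at a dead end.

    tau, xi  : dicts mapping each node to its pheromone value and undesirability
    feasible : nodes whose selection still leads to a buildable network
    Assumed scoring rule: tau**alpha * xi**beta, normalised over feasible nodes.
    """
    scores = {n: (tau[n] ** alpha) * (xi[n] ** beta) for n in feasible}
    total = sum(scores.values())
    if total == 0:
        return None  # no feasible node: the ant and its path are terminated
    probs = {n: 0.0 for n in tau}  # infeasible nodes keep probability 0
    for node, score in scores.items():
        probs[node] = score / total
    return probs


# Hypothetical values: equal pheromone everywhere, differing undesirability.
tau = {"direct": 1.0, "switch0": 1.0, "switch1": 1.0}
xi = {"direct": 2.0, "switch0": 4.0, "switch1": 4.0}
probs = route_probabilities(tau, xi, feasible=["direct", "switch0", "switch1"])
```

With β negative, a larger undesirability ξ lowers the score of a node, so in this example the direct connection is the most likely choice.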
In order to evaluate equation 1 it is necessary to determine the pheromone values. In order to do this, the pheromone values may be stored in a matrix which has dimensions F by N. The pheromone value of a particular route, that is flow f assigned to a fabric node n, is then determined by accessing the matrix at location (f, n). Each position of the matrix is initialised at the start of a run with a preset value. At the end of each run, which constitutes a generation, the pheromone levels are updated in accordance with equations 2 and 3 below:
τfn(t+1)=(1−P)*τfn(t)+Dkb (equation 2)
τfn(t+1)=(1−P)*τfn(t) (equation 3)
In each case P is a constant representing a coefficient of decay. The coefficient of decay models the rate of evaporation for the pheromone between each run. A particular route's pheromone concentration is only updated by a particular ant if the ant's path has passed through that location.
Equations 2 and 3 correspond to a form of elitism. Equation 2 is used for all routes (f, n) which have been traversed by the best ant in the generation. Equation 3 is used for all other routes. Dkb is either set to a constant or is set such that the increase in pheromone at this point is inversely proportional to the cost of the solution specified by the best ant kb.
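A minimal sketch of the update in equations 2 and 3 follows. The matrix is held as a list of rows (one row per flow, one column per fabric node); the function name and the numerical values are assumptions chosen only for illustration.

```python
def update_pheromone(tau, best_routes, decay, deposit):
    """Apply equation 2 to routes used by the best ant and equation 3 elsewhere.

    tau         : F x N matrix as a list of lists, tau[f][n]
    best_routes : set of (f, n) pairs traversed by the best ant
    decay       : coefficient of decay P
    deposit     : pheromone increment Dkb
    """
    return [
        [
            (1.0 - decay) * value + (deposit if (f, n) in best_routes else 0.0)
            for n, value in enumerate(row)
        ]
        for f, row in enumerate(tau)
    ]


# Hypothetical 2 x 2 matrix initialised to a preset value of 1.0.
tau = [[1.0, 1.0], [1.0, 1.0]]
best_routes = {(0, 1), (1, 0)}
new_tau = update_pheromone(tau, best_routes, decay=0.1, deposit=0.1)
# Routes on the best ant's path stay near 1.0; all others decay to about 0.9.
```

Setting `deposit` to a value inversely proportional to the best ant's cost, rather than to a constant, would give the second variant of Dkb mentioned above.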
It is also necessary to determine the heuristic desirability value. For our purposes we will define ξ for the heuristic desirability as an undesirability and choose β to be negative so that the term ξ^β in equation 1 represents a weighted desirability. The concept of undesirability can express several ideas, namely:
Undesirability ξ is defined as a combination of the undesirability of adding a particular new fabric node (Un) and the undesirability of adding new ports (UPh, UPd or UPn) on a host, device, or fabric node respectively. The heuristic undesirability Un of adding a particular new fabric node (n) is influenced by ideas (1) and (2) above, and is given as a formula in equation 4.
Here Brem(m) is the remaining bandwidth available on the already used fabric nodes Nused, b(g) is the bandwidth required for flow g, and the sum Σ from g=f+1 to |F| is over flows not yet assigned.
The undesirability UPh, UPd or UPn of adding a new port, is influenced by ideas (3), (4) and (5) above, and is defined in Equation 5.
In this equation Bused
The undesirability of a new route is defined to be one plus the sum of all relevant undesirabilities. For example, if the route does not require the addition of any new fabric elements, then ξ=1; if it only requires that a new host port is added, ξ=UPh+1; if the route requires that a new fabric node and fabric node port are added to the network, ξ=Un+UPn+1, etc. No monetary cost term is included in the heuristics. That is, there is no distinction, in terms of ξ, made between different fabric node choices because of the monetary cost associated with that choice. This is something that is expected to be learned by the system. Since only the best ant updates the pheromone table, and the monetary cost of the network specified by that ant is incorporated into the updated pheromone values, as in equation 2, more cost effective routes are reinforced through τ.
Finding the Best Ant in the Colony
The best ant in a generation is determined by comparing the number of port violations, the amount of over-allocated bandwidth, and the monetary cost of each ant's resulting network. It is desirable to make a distinction between host or device port violations and fabric node port violations. This is because it is harder to resolve host/device port violations than fabric node port violations, since a fabric node generally can support more ports (8-16) than a host or device (2). This makes one host/device port violation more significant than a fabric node port violation. When two ants are compared, it is beneficial to first look at the relative number of host/device port violations. An ant specifying a network with a smaller number of this type of port violation is considered to be better than the ant with a greater number. If two ants have the same number of host/device port violations, then the number of fabric node port violations is examined. If necessary the amount of overused bandwidth is compared, followed by the monetary cost of each of the networks. Determination of the best ant in a colony is used when updating the pheromone matrix using equations 2 and 3. An equation similar to equation 2 can be used to calculate a cost (which is the inverse of the fitness) for each ant.
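The comparison order described above is a lexicographic ordering, which can be sketched with a tuple whose fields are compared from left to right. The class and field names and the sample figures are hypothetical.

```python
from typing import NamedTuple


class AntResult(NamedTuple):
    """Summary of one ant's network; fields are ordered by comparison priority."""
    host_device_violations: int   # compared first
    fabric_violations: int        # compared second
    overused_bandwidth: float     # compared third
    monetary_cost: float          # compared last


def best_ant(results):
    """Return the index of the best ant; tuples compare field by field."""
    return min(range(len(results)), key=lambda i: results[i])


# Hypothetical generation of three ants.
results = [
    AntResult(1, 0, 0.0, 100.0),   # cheap, but has a host/device port violation
    AntResult(0, 3, 5.0, 900.0),
    AntResult(0, 3, 5.0, 800.0),   # ties with ant 1 until monetary cost
]
winner = best_ant(results)
```

Here the first ant loses despite its low monetary cost, because host/device port violations take precedence over every other criterion.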
Consider one generation of two ants attempting to solve a four flow problem as shown in
We can then begin the search through a solution space using virtual ants.
The process is repeated for a predetermined number of iterations. For simplicity in calculations we will set α=Kh=Kd=Ks=Cd=1 and β=−1. These values, while drastically simplifying the following demonstration of the algorithm, are not realistic settings as they produce large unwanted disparities in the values calculated. The maximum bandwidth available on each port is 10e07 MB. Starting with the first ant (ant 0), we solve equation 1 for each possible routing choice for the first flow. Beginning with a direct connection as the first routing choice, the value τ is obtained directly from the pheromone matrix for the particular flow (flow f) and routing choice (directConnection).
The value of ξ is slightly more complex, and in this case it is calculated based on the need to add a new host port and a new device port in order to route a flow from host 2 to device 1. Therefore for flow 3 to be a direct connection we can solve ξ=UPh+UPd+1 as illustrated in equation 6.
The value of ΣƒremainingB is 6.8e07+5.4e07+1.0e07=13.2e07. The amount of remaining bandwidth available on the ports available in the system, is calculated as the maximum amount of bandwidth available from each port multiplied by the number of ports and then subtracting off the amount of bandwidth used once this flow has been allocated on this fabric node. Therefore ΣPusedBα=10e07*2-9.7e07*2=0.6e07. The amount of fabric node bandwidth available is zero, as when flow 3 is routed directly there are no fabric nodes in the network. As such ξ becomes
We repeat this series of calculations for each of the other potential routes for flow 3: switch 0 and switch 1. The value of ξ for the probability of choosing switch 0 or switch 1 is composed of UPh, UPd, UPn and Un, as new host, device and fabric node ports would each have to be added to the network in addition to a new fabric node. The computed values of Ux for switch 0 and switch 1 are set out in table 2 below:
With knowledge of the appropriate value of τ and ξ for each of the possible choices for routing of flow 3, it is then possible to compute the appropriate probability of selecting each choice. The probability Pflow3,switch0 of selecting switch 0 is computed in equation 7.
We can also compute Pflow3,switch1 and Pflow3,directConnection in this manner, the results of which are shown in table 3 below:
We then allow ant 0 to choose a route for flow 3 following the computed probabilities. For the sake of this example we will choose switch 0. Once the first route has been selected, the ant will then compute probabilities for flow 2 (the second flow in the series presented), as shown in table 4, and choose a route, say through switch 0 again. Routes are then chosen for flow 1 and flow 0 in the same manner.
Once ant 0 has chosen its path, ant 1 also compiles a set of routes similarly (this may be done in parallel). Once all ants have designed a network by choosing routes for each flow, the generation is complete and the pheromone matrix is updated.
The value by which a particular location in the matrix changes may be computed in several ways depending on the particular use of the algorithm (i.e. using elitism or letting all ants affect the pheromone matrix at different levels). In this example we will use elitism, with all flow/fabric node pairs which the best ant (kb) has chosen incremented by a fixed amount Dkb=0.1. If we assume that ant 0 chose switch 0, switch 0, switch 0, and a directConnection for flow 3, flow 2, flow 1 and flow 0 respectively, and ant 1 chose to route all flows through switch 1, then comparing the network designs arising from these two sets of choices we see that ant 0 is better than ant 1. The updated pheromone matrix appears in table 5, where the decay rate (P) is set to 0.1.
Thus a routing can be derived by picking the most favoured flows.
“Genetic Optimisation”
Potential solutions to the SAN problem (and indeed general network problems) can be investigated using genetic optimisation. The genetic techniques used to cause genetic modification of an Ith set of network configurations to derive an (I+1)th set of network configurations, which represents a next generation of solutions whose members are children of the parent solutions in the Ith set, will now be described. However in summary (and as will be described later with reference to
The connections between the hosts and devices as set out in
An alternate genome 0121 as shown in
A further genome 0101 as shown in
In genetics, it is well known that a genome can change through a combination of two processes. One of these is combination of the partial attributes of two genomes to create a new genome, and the other process is mutation where one or more genes within the genome may become altered. Returning to FIGS. 5 to 10, it can in fact be seen that each of the three networks shown results from the mutation of the third gene 14 specifying the interconnect arrangement between host 0 and device 1. All other genes within the genome remain unchanged within each of these three examples. Combination of genomes is analogous to the breeding of solutions. The combination can occur in different ways as illustrated in
In the alternative breeding arrangement shown in
In the breeding process, mutations naturally occur and hence the mutation process may be applied to the genomes produced by the breeding process. In general, mutation occurs at a low rate and hence for each gene in the genome a mutation function is applied to it where the far greater probability is that the gene will be left unaltered. However there remains a small possibility that any gene will be marked for change, and once a gene is marked to mutate a further randomisation can be performed in order to change the content of that gene to a new value 0, 1, 2, 3, and so on. Here numbers 1, 2 and 3 represent switches 1, 2 and 3 (and so on). The randomisation process is preferably weighted against the insertion of a new switch or other fabric element, with the weighting against the insertion of a new switch increasing as the number of iterations or generations increases. This tends to stop the system from introducing new high cost solutions late in the optimisation process.
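The mutation step can be sketched as below. Several details are assumptions made for illustration: the function name, the default mutation rate, the reading of gene value 0 as a direct connection, and the particular weighting 1/(1+penalty×generation) used to discourage fabric elements that do not yet appear in the genome.

```python
import random


def mutate(genome, n_switches, generation, rng, rate=0.05, penalty=0.1):
    """Return a mutated copy of a genome (a list of integer genes).

    Gene value 0 is assumed to denote a direct connection and k > 0 switch k.
    Values naming a switch not yet used by the genome receive a weight that
    shrinks as the generation count grows (an assumed weighting scheme).
    """
    used = set(genome)
    values = list(range(n_switches + 1))
    new_weight = 1.0 / (1.0 + penalty * generation)
    weights = [1.0 if (v == 0 or v in used) else new_weight for v in values]
    child = []
    for gene in genome:
        if rng.random() < rate:  # most genes are left unaltered
            gene = rng.choices(values, weights=weights, k=1)[0]
        child.append(gene)
    return child


rng = random.Random(1)
parent = [0, 1, 2, 1]
child = mutate(parent, n_switches=3, generation=20, rng=rng)
```

At generation 20 with the assumed penalty of 0.1, an unused switch is one third as likely to be drawn as a value already present in the genome.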
Evaluation of Solutions
Each genome represents a SAN which is a theoretically viable solution. However as the trivial example in
Thus for each network a cost is computed. The cost can be computed in many ways and an exemplary cost computation is given in equation 8:
C=W1*Cm+W2*Phd+W3*Pf+W4*b (equation 8)
The coefficients W1, W2, W3 and W4 represent the relative importance of each cost term. The terms Cm, Phd, Pf and b represent, respectively, the monetary cost of each of the components necessary, the number of host or storage device port violations, the number of fabric node port violations, and the amount of bandwidth in any communications channel which is in excess of the capability of that channel to carry.
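The weighted sum of equation 8 can be written directly as a small function. The function name, argument names and the sample weights are illustrative assumptions; the terms are taken to be already scaled to comparable ranges.

```python
def network_cost(c_m, p_hd, p_f, b, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted cost of one candidate network, following equation 8.

    c_m  : monetary cost term
    p_hd : host or storage device port violations
    p_f  : fabric node port violations
    b    : bandwidth in excess of channel capability
    """
    w1, w2, w3, w4 = weights
    return w1 * c_m + w2 * p_hd + w3 * p_f + w4 * b


# Hypothetical weights that penalise host/device port violations most heavily.
weights = (1.0, 4.0, 2.0, 1.0)
buildable = network_cost(0.5, 0.0, 0.0, 0.0, weights)
unbuildable = network_cost(0.25, 0.5, 0.0, 0.25, weights)
```

With weights of this kind a cheaper but unbuildable network still scores worse than a more expensive network with no violations.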
The terms Cm, Phd, Pf and b are advantageously normalised to lie between zero and one. This is achieved by dividing each term by an over-approximation of its worst case value. Thus, for example, the worst case monetary cost Cmw is approximated by the following formula:
Cmw=(Nh+Ch)+(Nd*Cd)+(Nf*2)*(Cl+max(Cp))+Nf*max(Cf) (equation 9)
where
Thus each possible solution is evaluated for its financial cost and its “badness” of unbuildability due to port violations or bandwidth violations. Typically each iteration of the genetic algorithm can be considered as being a new generation, and the number of individuals (and hence possible solutions) may be constrained. Therefore in each generation the possible solutions can be evaluated and then ranked in order of cost.
From the cost table, individuals are selected to form the next generation by breeding. To create new offspring, two parents from the existing generation are selected randomly, but with those having a lower cost being given preferential weighting compared to those parents having a higher cost. A parent is not removed from the table after it has been used for breeding, and hence may breed several times, with different partners. Thus lower cost solutions can be regarded as being promiscuous. Furthermore elitism may be applied so that the lowest cost solution, in this case G12, is also copied directly into the next generation, thereby preserving its genome intact. The breeding process as described with respect to
In the example considered hereinbefore the network has only been one layer deep, i.e. there has been one switch or hub between devices. However, this has only been the case so as to keep the example simple and in practice networks may have several layers. Consider for example a network having three servers S1, S2, S3 that are to connect to three user devices D1, D2 and D3 via a network of two layers. We may constrain the solutions by specifying that each layer can have four elements in it and that the maximum number of ports on a device is 6.
Let the flows F (each denoted as [server_id,device_id]) be: [1,1],[1,2],[1,3],[2,1],[3,2],[3,3].
Then a flow for this application is denoted:
In that genome the votes (i.e. the status of a slot) for each slot in layer 1 are:
And the votes for each slot in layer 2 are:
With these instantiations, the network flows can be drawn as shown in
It can be seen that the single layer approach can be extended to define genomes for use in a multi-layer network. Once the genomes have been established they can be combined and mutated using the processes described for the single layer genomes, and then the cost of implementing each solution tested as described hereinbefore. Thus the extension to multi-layer system is easy to perform.
The computer implements the process illustrated in
Number | Date | Country | Kind |
---|---|---|---|
0416484.4 | Jul 2004 | GB | national |