This application is the US national phase of international application PCT/GB2003/002631 filed 19 Jun. 2003 which designated the U.S. and claims benefit of EP 02254294.8, dated 20 Jun. 2002, the entire content of which is hereby incorporated by reference.
The present invention relates to a distributed computer and to a method of operating a computer forming a component of a distributed computer.
The relatively low cost of today's microprocessors means that the most economic way of building a powerful computer is to interconnect a number of low cost microprocessors to provide a distributed computer. Although a purpose-built distributed computer will often be a unit of equipment comprising tens or hundreds of processors interconnected via a high-speed bus, the common arrangement of desktop PCs interconnected by an office LAN is also a form of distributed computer.
One application of a distributed computer is the carrying out of a task which is too demanding to be solved quickly by a computer having a single processor. In such a case, it is necessary to divide the task to be performed amongst the plurality of processors present in the distributed computer. This is known as processor allocation or ‘load balancing’.
Distributed computers should also be tolerant to the failure or shutdown of one of the processors within them—systems of this type are disclosed, for example, in International Patent Application WO 01/82678, and European Patent applications 0 887 731 and 0 750 256.
A number of processor allocation or load balancing algorithms have been disclosed. In EAGER D. L., LAZOWSKA, E. D., and ZAHORJAN, J.: “Adaptive Load Sharing in Homogeneous Distributed Systems,” IEEE Trans. On Software Engineering, vol. SE-12, pp. 662-675, May 1986, three algorithms are considered. One of those algorithms involves each processor which creates a new process (i.e. contemplates starting another component of the task): a) finding whether it is overloaded, and, b) if so, sending the new process to another randomly-chosen processor. The processor receiving the new process then carries out a similar procedure. This continues either until a processor accepts the new process or a hop-count is exceeded.
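By way of illustration only, the random forwarding procedure just described might be sketched as follows. The function name, the use of a simple numeric load figure, and the rule that the last processor reached must accept once the hop count is exhausted are assumptions made for this sketch and are not taken from the cited paper.

```python
import random


def place_process(start, loads, threshold, max_hops, rng=random):
    """Return the index of the processor that ends up accepting a new process.

    loads     -- list of current load figures, one per processor
    threshold -- a processor at or above this load counts as overloaded
    max_hops  -- transfers allowed before the process must be accepted
    """
    current = start
    for _ in range(max_hops):
        if loads[current] < threshold or len(loads) == 1:
            break  # not overloaded (or nowhere else to go): accept here
        # Overloaded: pass the process to another randomly chosen processor.
        others = [i for i in range(len(loads)) if i != current]
        current = rng.choice(others)
    loads[current] += 1  # accepted, possibly because the hop count ran out
    return current
```

With two processors having loads of 5 and 0 and a threshold of 3, a process created on the first processor is passed to the second, which accepts it.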
In other algorithms, one or more processors are given the task of tracking how heavily-loaded other processors in the distributed computer are. If the processors within the distributed computer are organised into a logical hierarchy independent of the physical structure of the network interconnecting the different processors, the task of monitoring levels of usage of the processors can be split up in accordance with that hierarchy. An example of this is seen in WITTIE, L. D., and VAN TILBORG, A. M.: “MICROS, a Distributed Operating System for MICRONET, A Reconfigurable Network Computer,” IEEE Trans. On Computers, vol. C-29, pp. 1133-1144, December 1980. New processes can be generated anywhere within the logical hierarchy and are escalated sufficiently far up the hierarchy to a ‘manager’ processor which has a sufficient number of subordinates to carry out the task. The manager then delegates the component tasks back down the hierarchy.
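The escalation and delegation just described can be pictured with the following minimal sketch. The class and function names are hypothetical, and counting every processor below a manager as a subordinate is an assumption of the sketch rather than a feature of the cited system.

```python
class Processor:
    """A node in a logical management hierarchy (names are illustrative)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def subordinates(self):
        """All processors below this one in the hierarchy."""
        found = []
        for child in self.children:
            found.append(child)
            found.extend(child.subordinates())
        return found


def find_manager(origin, needed):
    """Escalate from origin until a processor has enough subordinates."""
    node = origin
    while len(node.subordinates()) < needed:
        if node.parent is None:
            return None  # even the root cannot supply enough processors
        node = node.parent
    return node


def delegate(manager, needed):
    """Hand the component tasks back down to the first `needed` subordinates."""
    return [p.name for p in manager.subordinates()[:needed]]
```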
According to a first aspect of the present invention, there is provided a method of dividing a task amongst a plurality of nodes within a distributed computer, said method comprising:
By calculating task group topology data representing nodes and interconnections between them in dependence on requirements data entered by a user/administrator, and then distributing a task to be performed between nodes in accordance with the calculated topology, a more flexible method of utilising the resources of a distributed computer than has hitherto been known is provided. It is to be understood that the task group will not necessarily equate to the physical topology of the nodes and interconnections between them in the distributed computer. The nodes and connections used will often be a subset of those available—also a logical connection represented in the task group topology data might represent a concatenation of a plurality of physical connections.
Preferably, said topology calculation comprises the step of comparing said requirements data with node capability data for a node available to join said task group. This provides a convenient mechanism for automatically generating the task group topology.
Preferably, said requirements data is arranged in accordance with a predefined data structure defined by requirements format data stored in said computer, said method further comprising the step of verifying that said requirements data is formatted in accordance with said predefined data structure by comparing said requirements data to said requirements format data. Defining the format of said requirements data in this way allows for easier communication of requirements data between computers. In preferred embodiments, the eXtensible Markup Language (XML) is used to define the format data, and known XML parsing programs are used to check the format of requirements data.
Similar considerations apply to the node capability data.
In some embodiments, said method further comprises the step of operating a node seeking to join said task group to generate node capability data and send said data to one or more nodes already included within said task group.
Advantageously, said task distribution involves a node forwarding a task to a node which neighbours it in said task group topology. This provides a convenient way of utilising the generated topology in the subsequent calculation.
According to a second aspect of the present invention, there is provided a distributed computer apparatus comprising:
Advantageously, each node further has recorded therein received program data execution code executable to receive program data from another of said nodes and to execute said program. Preferably, said plurality of processor nodes comprise computers executing different operating systems programs, and said received program execution code is further executable to provide a similar execution environment on nodes despite the differences in said operating system programs. This means that embodiments of the invention can carry out calculations across a heterogeneous computer network and increases the possibilities for utilising the processing power and memory of idle computers in a typical computer network comprising computers based on different hardware architectures and/or running different operating system programs.
According to a third aspect of the present invention, there is provided a method of operating a member node of a distributed computing network, said method comprising:
By controlling a member node of a distributed computing network to compare profile data from another computer with criteria indicated by membership policy data accessible to the member node, and updating distributed computing network data accessible to the member node if said profile data indicates that said one or more criteria is met, a distributed network whose membership accords with said policy data is built up. Provided the policy reflects the distributed task that is to be shared amongst the members of the distributed computing network, a distributed computer network whose membership is suited to the distributed task to be shared is built up.
Preferably, the member node stores said distributed network membership data. This results in a distributed computing network which is more robust than networks where this data is stored in a central database. Similarly, in some embodiments, said member node stores said membership policy data.
In preferred embodiments, the method further comprises the steps of:
This allows the distributed computing network to be dynamically reconfigured in response, for example, to a change in the task to be performed or the addition of a new type of node which might apply to become a member of the distributed computing network.
According to a fourth aspect of the present invention, there is provided a computer program product loadable into the internal memory of a digital computer comprising:
By way of example only, specific embodiments of the present invention will now be described with reference to the accompanying Figures in which:
Attached to the fixed local area network 10 are a server computer 218, and three desktop PCs (219, 220, 221). The first wireless local area network 12 has a wireless connection to a first laptop computer 223, the second wireless local area network 14 has wireless connections to a second laptop computer 224 and a personal digital assistant 225.
Also illustrated is a compact disc which carries software which can be loaded directly or indirectly onto each of the computing devices of
As dictated by the DTD, a profile document consists of eight sections, some of which themselves contain one or more fields.
In the present embodiment, the eight sections relate to:
An example of an XML document created in accordance with the DTD shown in
The fields specified in the Document Type Definition and the values placed in the above profile written in accordance with that DTD will be self-explanatory to those skilled in the art. The generation of a profile document in accordance with the above DTD will be described further on.
Policy documents may also cause the node which receives them to carry out an action specified in the policy.
As dictated by the DTD, a policy document consists of two sections, each of which has a complex logical structure.
The first section 100 refers to the creator of the policy and includes fields which indicate the level of authority enjoyed by the creator of the policy (some computing devices may be programmed not to take account of policies generated by a creator who has a level of authority below a predetermined level), the unique name of the policy, the name of any policy it is to replace, times at which the policy is to be applied etc.
The second section 102 refers to the individual computing devices or classes of computing devices to which the policy is applicable, and sets out the applicable policy 104 for each of those individual computing devices or classes of computing devices.
Each policy comprises a set of ‘conditions’ 106 and an action 108 which is to be carried out if all those ‘conditions’ are met. The conditions are in fact values of various fields, e.g. processing power (represented here as ‘BogoMIPS’, a term used in Linux operating systems and derived from ‘bogus’ and ‘Millions of Instructions Per Second’) and free memory. It will be seen that many of the conditions correspond to fields found in a profile document.
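By way of illustration only, the relationship between conditions and action might be modelled as follows. The Policy class, the field name FreeMemory, the action label and the 'at least' comparison rule are all assumptions made for this sketch; only the BogoMIPS field name is taken from the description.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    conditions: dict  # field name -> minimum acceptable value (assumed rule)
    action: str       # label for what to do when every condition holds


def evaluate(policy, profile):
    """Return the policy's action if the profile meets all conditions, else None."""
    for field, minimum in policy.conditions.items():
        if profile.get(field, 0) < minimum:
            return None
    return policy.action
```

A profile reporting 1500 BogoMIPS and 128000 units of free memory would satisfy a policy requiring 1000 and 64000 respectively, whereas a profile which omits a field named in the policy would not.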
An example of an XML document created in accordance with the DTD shown in
Much of the above program is explained in Bubak M, Plaszczak P, “Hydra—Decentralized And Adaptative Approach To Distributed Computing”, Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, 5th International Workshop, PARA 2000, 18-20 Jun. 2000, Springer-Verlag pp 242-9. The salient features of the classes are given below together with a full description of the additions and alterations made in order to implement the present embodiment.
As explained in that paper, the purpose of the software is to allow a task to be shared amongst a plurality of computing devices. A user must provide a sub-class of a predetermined SimpleTask or CompositeTask abstract class in order to specify the task that he or she wishes to be carried out by the devices (218-225) included within the internetwork.
Whenever a new task arrives at the computing device running the program, the Secretary module 106 handles its reception and stores it using the Task Repository 108 module until the task is carried out as explained below.
The Work Manager module 110 causes a task to be carried out if a task arrives at the computing device and the computing device has sufficient resources to carry out that task. Each task results in the starting of a new execution thread 112 which carries out the task or, if insufficient resources are available at the device, delegates some or all of the task to one of a selected subset (218-220, 225) of computing devices (218-225) which form a task group suitable for carrying out the task. The manner in which the task group (218-220, 225) is assembled will be explained below.
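The behaviour of such an execution thread can be pictured with the following minimal sketch. It is written in Python rather than as part of the program described here, and the function name, the callables standing in for other devices and the choice of the first task group member as delegate are all assumptions.

```python
import threading


def start_task(task, enough_resources, task_group, log):
    """Start a thread that runs `task` locally or delegates it.

    task             -- zero-argument callable standing in for the user's task
    enough_resources -- zero-argument callable; True if this device can cope
    task_group       -- list of callables, each accepting a task (stand-ins
                        for other devices in the task group)
    log              -- list that records where the task was carried out
    """
    def body():
        if enough_resources():
            log.append(("local", task()))
        elif task_group:
            # Delegate the whole task to the first available member.
            log.append(("delegated", task_group[0](task)))
        else:
            log.append(("rejected", None))

    thread = threading.Thread(target=body)
    thread.start()
    return thread
```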
The Guardian module 114 provides the interface to the other computing devices in the internetwork (
The Topology Centre module 118 maintains a remote graph data structure—a graph in this sense being a network comprising a plurality of nodes connected to one another via links. Each of the computing devices which is a member of the task group (218-220, 225) is represented by an RMI remote object in the remote graph data structure. When computing devices connect to or are disconnected from the computing device network, this is requested using RMI and results in the computing devices updating their remote graph data structures accordingly.
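A graph of this kind might, purely for illustration, be held as a set of neighbours for each member. The sketch below uses ordinary in-process objects rather than RMI remote objects, and its class and method names are hypothetical.

```python
class TaskGroupGraph:
    """Undirected graph of task group members, stored as adjacency sets."""

    def __init__(self):
        self.links = {}

    def connect(self, a, b):
        """Record a link between two members, adding them if necessary."""
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)

    def disconnect(self, node):
        """Remove a node and every link that touches it."""
        for neighbour in self.links.pop(node, set()):
            self.links[neighbour].discard(node)

    def neighbours(self, node):
        return sorted(self.links.get(node, set()))
```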
Lastly, the Initiator module comprises two objects. One, the Initiator object, initiates the computing device. The other, the ReferenceServer object, maintains the references to the created modules.
Each of the computing devices (218-225) also stores a launch script. The processes carried out by each computing device on execution of that script are illustrated in
Turning to
Thereafter, in step 132, a MetaDataHandler execution thread is started together with another execution thread (step 140) which runs the Initiator class (
Many of the fields of the profile document are to be found in the files created at the time of the preliminary system information collection step (step 130) as follows:
The remaining entries in the profile are generated by utility software which forms part of the MetaDataHandler thread.
The MetaDataHandler thread then opens a socket on port 1240 and listens for connections from other computing devices. The action taken in response to receiving a file via that socket will be explained below with reference to
The part of the script which launches the Initiator class may include the RMI name of a computing device to connect to (it will not if the computing device concerned is the first node in the task group). If it does, then the Initiator class results in an attempt to connect to that node. An example will now be explained with reference to
A script including a reference to the server 218 is run on the PC 219. As explained above, this results in the Initiator class 120 being run on the PC 219. This in turn requests HydraNodeConnector 150 to connect to the server 218 (HydraNodeConnector is an interface for connection decision making, implemented by RegnoTopologyCentre 118). HydraNodeConnector decides to fulfil the request and sends it to Guardian 152, which passes it to NodeGateImpl 154. As mentioned above, NodeGateImpl encapsulates RMI technology. NodeGateImpl 154 uses the Naming class (a standard RMI facility) to obtain a reference to NodeGate of the server 218 (NodeGate is the node remote interface seen by other nodes, normally implemented by NodeGateImpl). As soon as it has the reference, NodeGateImpl 154 requests NodeGate of the server 218 to connect. The request contains the remote reference to RemoteGraphNode of the PC 219 and the XML profile document representing the capabilities of the PC 219.
When received at the server 218, the request is passed to the Guardian and then to the HydraNodeConnector. As explained below, the MetaDataHandler thread determines whether the request to connect to the distributed computing network should be accepted and informs HydraNodeConnector accordingly. In the present case, the connection is accepted. Hence, HydraNodeConnector supplies the local RemoteGraphNode with a reference to its counterpart on the PC 219 and orders the RemoteGraphNode to establish a connection. The server 218 and the PC 219 exchange references and link to each other using their internal connection mechanisms.
The task group topology databases in the server 218 and the PC 219 are then updated accordingly.
The response of a computing device running the MetaDataHandler execution thread to receipt of a profile XML document will now be explained with reference to
On receiving a profile file (step 170), the MetaDataHandler checks that the XML document is well-formed—a concept which will be understood by those skilled in the art (step 172). This check is carried out by an XML parser—in the present case the Xerces XML parser available from the Apache Software Foundation is used. Thereafter, in step 174, the MetaDataHandler recognises the input file as a profile which results in the use of an evaluateConditions method of a PolicyHandler class to check the profile against any policies stored in the computing device which has received the profile document.
This involves a comparison of the values stored in the profile with those stored in the policy. The nature of that comparison (i.e. whether, for example, the value in the profile must be equal to the value in the policy or can also be greater than) is programmed into the PolicyHandler class. To give an example, the policy example given above includes a value of 112000K between <HD> tags. The profile example given above has two sets of data relating to permanent memory, one for each of two hard discs. The second set of data is:
<HDTotal>16496</HDTotal>
<HDUsed>12007</HDUsed>
In this case, the PolicyHandler class is programmed to calculate the amount of free hard disc space (i.e. 4489K) and will refuse connection since that amount is not greater than or equal to the required 112000K of permanent storage.
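The arithmetic of this example can be reproduced with the short sketch below. It is not the PolicyHandler class itself: the function name and the enclosing Disc element are hypothetical, the latter being added only so that the two quoted elements form a fragment which can be parsed on its own.

```python
import xml.etree.ElementTree as ET

# The two inner elements are quoted from the profile; the <Disc> wrapper is a
# hypothetical container added only so that the fragment parses on its own.
fragment = "<Disc><HDTotal>16496</HDTotal><HDUsed>12007</HDUsed></Disc>"
required_kb = 112000  # value held between the <HD> tags of the policy


def free_space(xml_text):
    """Return total minus used space for one hard disc entry."""
    disc = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    return int(disc.findtext("HDTotal")) - int(disc.findtext("HDUsed"))


free_kb = free_space(fragment)
accepted = free_kb >= required_kb
print(free_kb, accepted)  # prints: 4489 False
```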
In step 178, it is determined whether all the required conditions are met. If they are, the connection is formed (step 180) and the task group topology data is updated (step 182) as described above. If one or more of the conditions is not met, then the profile is forwarded to another node in the internetwork (step 184).
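By way of illustration only, steps 178 to 184 might be sketched as follows. The sketch assumes that each policy is a set of minimum values, that a refused profile is forwarded to neighbours of the refusing node in the task group topology, and that each node is tried at most once; the function and parameter names are hypothetical.

```python
def meets(policy, profile):
    """True if every field in the profile reaches the policy's minimum."""
    return all(profile.get(f, 0) >= m for f, m in policy.items())


def request_join(nodes, links, target, newcomer, profile, tried=None):
    """Try to join at `target`; on refusal forward the profile onwards.

    nodes -- name -> policy (dict of minimum values) for current members
    links -- name -> set of neighbouring member names (topology data)
    Returns the name of the member that accepted, or None.
    """
    tried = set() if tried is None else tried
    tried.add(target)
    if meets(nodes[target], profile):
        links[target].add(newcomer)                    # form the connection
        links.setdefault(newcomer, set()).add(target)  # update topology data
        return target
    for neighbour in sorted(links[target]):
        if neighbour not in tried and neighbour in nodes:
            accepted_by = request_join(nodes, links, neighbour, newcomer, profile, tried)
            if accepted_by is not None:
                return accepted_by
    return None
```

A device whose profile fails the policy of the node it first approaches can thus still join the task group at a neighbouring node whose policy is less stringent.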
If, on the other hand, the file received on the port associated with the MetaDataHandler execution thread is a policy, then the processing shown in
The first step is identical to that carried out in relation to the receipt of a profile file. After receipt (step 190), the file is checked (step 192) to see whether it is well-formed. Thereafter, the policy file is validated by checking it against the structure defined in the relevant DTD. As will be understood by those skilled in the art, the DTD may be incorporated directly in the policy file, or it can be a separate file which is referenced in an XML DOCTYPE declaration as a Universal Resource Identifier (URI). The policy document therefore includes information on the location of the DTD to use—normally, the DTD will be stored at an accessible web server. Thereafter, the Network Policy subsystem is started (step 194). This then causes a check to be carried out to see whether the policy uses the correct date system and has sensible values for parameters (step 196). The computing device receiving the policy then extracts the domain and/or subject-list within the policy document (step 198). A test (step 200) is then carried out to see whether the receiving computing device is within a domain to which the policy applies or is included in a list of subjects to which the policy applies.
If the computing device is not in the target group then it forwards the policy to its neighbours which are yet to receive the policy (step 202). This forwarding step is carried out in accordance with the so-called echo pattern explained in Koon-Seng Lim and Rolf Stadler, ‘Developing pattern-based management programs’, Center for Telecommunications Research and Department of Electrical Engineering, Columbia University, New York, CTR Technical Report 503-01-01, Aug. 6, 2001. The physical topology information 34 found in the profile is used as an input to this step.
If the computing device is within the target group then it checks whether it already has the policy (steps 204 and 206). If the policy is already stored, then it is just forwarded (step 208) as explained in relation to step 202 above. Alternatively, the current policy can be overwritten, thus providing a mechanism for updating a policy.
If the policy is not already stored, then it is stored (step 210). Copies of the policy are then forwarded as explained above. It is to be noted that the policy may specify that the node receiving the policy is to re-send its profile to the node to which it initially connected. If this is combined with a replacement of the policy adopted by the node to which it initially connected, repeating the joining steps explained above will re-configure the distributed computing network in accordance with the replacement policy.
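The handling of a received policy (steps 200 to 210) can be pictured with the following minimal sketch. It replaces the echo pattern with a simple breadth-first flood in which a shared record of recipients stands in for each node's knowledge of which neighbours are yet to receive the policy; that simplification, together with the data layout and all names used, is an assumption of the sketch.

```python
from collections import deque


def distribute_policy(nodes, links, origin, policy):
    """Flood a policy through the network, storing it on target devices.

    nodes  -- name -> {"domain": str, "policies": dict of stored policies}
    links  -- name -> set of neighbouring device names
    policy -- {"name": str, "domains": set, "subjects": set}
    Returns the set of devices that received the policy.
    """
    received = {origin}
    queue = deque([origin])
    while queue:
        name = queue.popleft()
        node = nodes[name]
        in_target = node["domain"] in policy["domains"] or name in policy["subjects"]
        if in_target and policy["name"] not in node["policies"]:
            node["policies"][policy["name"]] = policy  # store the new policy
        # Forward only to neighbours that are yet to receive the policy.
        for neighbour in sorted(links[name]):
            if neighbour not in received:
                received.add(neighbour)
                queue.append(neighbour)
    return received
```

Devices outside the target group pass the policy on without storing it, so the policy still reaches target devices which are not directly connected to its point of entry.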
An example of the operation of the above embodiment will now be explained with reference to
The administrator of the internetwork of
He supplies that policy to the server computer 218 and runs a script as explained above, but without specifying the IP address of a host to connect to. Thereafter, he amends the script to specify the server 218 as the device to connect to, makes the condition relating to processor speed less stringent, and copies the amended policy to each of the computing devices within the internetwork. He then runs the script in numerical order of host addresses (i.e. he runs it on personal computer 219 first, then personal computer 220 etc).
In this example, it is supposed that the resultant attempts to connect to the server 218 by the personal computer 221 and the laptop computers 223 and 224 fail because their utilisation is greater than 5%. As explained in relation to
However, the personal digital assistant might pass the utilisation test, but fail the test on processor speed. In this case, although the server 218 rejects the request, the personal computer 219 will accept the request.
It will be realised by those skilled in the art, that the resulting logical topology (which places the fastest processors closest to the centre of the task group) will result in better performance than had the personal digital assistant connected directly to the server 218. It will be seen how the generation of policies and profiles and comparison of the two prior to accepting a connection to a task group allows the automatic generation of a logical topology which suits the nature of the distributed task which is to be carried out. Thus, the same set of network nodes can be arranged into different distributed networks in dependence on policies which might reflect, for example, a requirement for large amounts of memory (e.g. in a file-sharing network), a requirement for low latency (e.g. in a multi-player gaming network), a requirement for stored energy to drive a radio transmitter (in an ad hoc wireless network) or a requirement for processing power (e.g. in a network performing a massive calculation).
Many variations on the above embodiment are possible. Some of the possible variations are listed below:
Number | Date | Country | Kind |
---|---|---|---|
02254294 | Jun 2002 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/GB03/02631 | 6/19/2003 | WO | 00 | 12/10/2004 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO04/001598 | 12/31/2003 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5423037 | Hvasshovd | Jun 1995 | A |
5442791 | Wrabetz | Aug 1995 | A |
5732397 | Detore | Mar 1998 | A |
5745687 | Randell | Apr 1998 | A |
5774668 | Choquier et al. | Jun 1998 | A |
5790848 | Wlaschin | Aug 1998 | A |
5829023 | Bishop | Oct 1998 | A |
5881231 | Takagi et al. | Mar 1999 | A |
5978791 | Farber | Nov 1999 | A |
6128590 | Stadel et al. | Oct 2000 | A |
6249844 | Schloss | Jun 2001 | B1 |
6272612 | Bordaz | Aug 2001 | B1 |
6330621 | Bakke et al. | Dec 2001 | B1 |
6336177 | Stevens | Jan 2002 | B1 |
6353608 | Cullers et al. | Mar 2002 | B1 |
6393485 | Chao et al. | May 2002 | B1 |
6405284 | Bridge | Jun 2002 | B1 |
6438705 | Chao et al. | Aug 2002 | B1 |
6463457 | Armentrout et al. | Oct 2002 | B1 |
6505283 | Stoney | Jan 2003 | B1 |
6605286 | Steidler | Aug 2003 | B2 |
6622221 | Zahavi | Sep 2003 | B1 |
6631449 | Borrill | Oct 2003 | B1 |
6662235 | Callis et al. | Dec 2003 | B1 |
6801949 | Bruck et al. | Oct 2004 | B1 |
6871219 | Noordergraaf | Mar 2005 | B2 |
6898634 | Collins | May 2005 | B2 |
6961539 | Schweinhart et al. | Nov 2005 | B2 |
7062556 | Chen et al. | Jun 2006 | B1 |
7069295 | Sutherland et al. | Jun 2006 | B2 |
7127606 | Wheeler et al. | Oct 2006 | B2 |
7152077 | Veitch et al. | Dec 2006 | B2 |
7296221 | Treibach-Heck et al. | Nov 2007 | B1 |
7434257 | Garg et al. | Oct 2008 | B2 |
7610333 | Robertson et al. | Oct 2009 | B2 |
20010034709 | Stoifo et al. | Oct 2001 | A1 |
20010034791 | Clubb et al. | Oct 2001 | A1 |
20020002577 | Garg et al. | Jan 2002 | A1 |
20020091833 | Grimm et al. | Jul 2002 | A1 |
20020099815 | Chatterjee et al. | Jul 2002 | A1 |
20020114341 | Sutherland et al. | Aug 2002 | A1 |
20020129248 | Wheeler et al. | Sep 2002 | A1 |
20020133681 | McBrearty et al. | Sep 2002 | A1 |
20020138471 | Dutta et al. | Sep 2002 | A1 |
20020138659 | Trabaris et al. | Sep 2002 | A1 |
20020156893 | Pouyoul et al. | Oct 2002 | A1 |
20020184310 | Traversat et al. | Dec 2002 | A1 |
20030032391 | Schweinhart et al. | Feb 2003 | A1 |
20030046270 | Leung et al. | Mar 2003 | A1 |
20030061491 | Jaskiewicz et al. | Mar 2003 | A1 |
20030115251 | Fredrickson et al. | Jun 2003 | A1 |
20030163457 | Yano et al. | Aug 2003 | A1 |
20030204856 | Buxton | Oct 2003 | A1 |
20040054807 | Harvey et al. | Mar 2004 | A1 |
20040064568 | Arora et al. | Apr 2004 | A1 |
20050022014 | Shipman | Jan 2005 | A1 |
20050050291 | Chen et al. | Mar 2005 | A1 |
20050257220 | McKee | Nov 2005 | A1 |
20060117046 | Robertson et al. | Jun 2006 | A1 |
20060149836 | Robertson et al. | Jul 2006 | A1 |
20080059746 | Fisher | Mar 2008 | A1 |
Number | Date | Country |
---|---|---|
0 481 231 | Sep 1991 | EP |
0 515 073 | Nov 1992 | EP |
0 750 256 | Jun 1996 | EP |
0 887 731 | Jun 1998 | EP |
1248441 | Oct 2002 | EP |
2002-027375 | Jan 2002 | JP |
9630839 | Oct 1996 | WO |
WO9809402 | Mar 1998 | WO |
9944334 | Sep 1999 | WO |
0182678 | Nov 2001 | WO |
0229551 | Apr 2002 | WO |
03069480 | Aug 2003 | WO |
04001598 | Dec 2003 | WO |
2005121965 | Dec 2005 | WO |
Number | Date | Country | |
---|---|---|---|
20050257220 A1 | Nov 2005 | US |