This application claims priority from French Patent Application No. 14 55446 filed Jun. 13, 2014, the contents of which are incorporated herein by reference in their entirety.
The invention concerns a management system for managing an interconnection network.
“Interconnection network” is understood here as any dedicated computer network (such as an InfiniBand network), or more generally, any collection of computer elements, particularly distributed processors, with physical communication links between them.
A management system utilizing a secondary network, which can be of the Ethernet type, is configured to manage this computer network in an out-of-band mode.
However, with the growth of the size of supercomputers, the topologies of high-performance computer networks are becoming denser and more complex. The result is that the out-of-band management, by means of a dedicated management network also called “secondary network,” of the interconnection network of a supercomputer requires more than one component in order to:
Moreover, for purposes of scalability and robustness, each of the aforementioned tasks is to be performed by a separate process, potentially on a different dedicated machine. To that end, these components need an efficient communication mechanism that allows them to share a common global state as they exchange messages. Said mechanism should be provided by the management system. The messages are exchanged on the secondary network (or management network) of the interconnection network of the supercomputer. The management system allows the processes responsible for management of the interconnection network to communicate and to share a global state, a subset of which represents the state of the interconnection network of the supercomputer (i.e., the statuses of all of the equipment making up the interconnection network of the supercomputer).
In order to ensure communication between the processes being executed on different machines of the management network, said management network must, in particular:
In that regard, there are systems for sharing states, such as distributed hash tables. However, none of the existing solutions meets all of the aforementioned requirements that should be met by a management system. In particular, a distributed hash table cannot satisfy the last two requirements mentioned above.
An object of the present invention is to propose an interprocess communication mechanism in the form of a management system that meets the aforementioned requirements.
Another object of the present invention is to propose a client/server-type communication architecture in order to interconnect distributed processes.
Another object of the present invention is to propose a system managing an interconnection network based on an interprocess communication.
Another object of the present invention is to propose an asynchronous and disconnected interprocess communication mechanism.
Another object of the present invention is to propose a method for managing the interconnection network of a supercomputer.
To those ends, the invention relates, according to a first aspect, to a server of a system for managing an interconnection network, said server comprising:
The server of a management system of an interconnection network has, according to various embodiments, the following features, which may be combined:
According to a second aspect, the invention relates to a client of a system for managing an interconnection network, said client comprising:
The client of a system for managing an interconnection network further comprises a business process, said business process being provided with a publication client interface associated with said business process in such a way that said business process can publish an update of the global state of the interconnection network.
Advantageously, the data published by the client process is a message in the form of a “key-value” message.
According to a third aspect, the invention relates to a management system for managing an interconnection network comprising the server and the client introduced above.
According to a fourth aspect, the invention relates to a supercomputer comprising an interconnection network and the management system cited above.
Moreover, the supercomputer comprises:
Other objects and advantages of the invention will be seen from the description of the embodiments provided hereinbelow with reference to the appended drawings in which:
The system for managing an interconnection network is based on a client/server-type network architecture implementing different communication paradigms depending on the connection interfaces.
With reference to
The server 10 is configured to hold the global state of the interconnection network. Said global state comprises information concerning the interconnection network of the supercomputer. Said global state is stored in a key-value associative data structure 2. Said data structure 2 is stored in a random-access memory of the server 10.
The key-value associative data structure 2 is a data container, preferably local to the server 10, which has a particular protocol for adding, removing and searching for elements. Said key-value associative data structure 2 associates a key with a value. The uniqueness of the keys should be ensured by the sender processes. If an already-existing key is updated, the former value is overwritten by the new one.
In one embodiment, the key-value associative data structure 2 is an associative table, also called hash table or hashmap, having a predefined association or hashing function. Advantageously, said particular data structure enables quick access to a value as a function of a key.
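By way of non-limiting illustration, the behavior of such an associative table can be sketched as follows in Python. The class name `GlobalState`, its method names and the key `switch/42/status` are hypothetical and are not taken from the described embodiments; the sketch only shows that writing to an existing key overwrites the former value.

```python
class GlobalState:
    """Hypothetical in-memory stand-in for the key-value data structure 2."""

    def __init__(self):
        # A Python dict is a hash table: lookup by key is fast on average.
        self._entries = {}

    def put(self, key, value):
        # Adding an already-existing key overwrites the former value.
        self._entries[key] = value

    def get(self, key, default=None):
        return self._entries.get(key, default)

    def remove(self, key):
        # Removing an unknown key is silently ignored.
        self._entries.pop(key, None)


state = GlobalState()
state.put("switch/42/status", "up")
state.put("switch/42/status", "down")  # same key: the value is replaced
current_status = state.get("switch/42/status")
```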
The server 10 of the management network further comprises:
The configuration server interface 3 enables the configuration of the management system to be communicated to a client wishing to join said management system of the interconnection network. The configuration sent comprises, in particular, the addresses (or URL for “Uniform Resource Locator”) of the other connection interfaces cited above, namely the publication server interface 4, the collection server interface 5, and the snapshot server interface 6.
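As a purely illustrative sketch, the configuration sent to a joining client could be serialized as a small JSON document carrying the three addresses. The function name `build_config_reply`, the field names, the host name `mgmt-server` and the port numbers are assumptions made for this example only; the described embodiments do not specify a wire format.

```python
import json


def build_config_reply(publication, collection, snapshot):
    """Serialize the addresses of interfaces 4, 5 and 6 (hypothetical format)."""
    return json.dumps({
        "publication": publication,  # publication server interface 4
        "collection": collection,    # collection server interface 5
        "snapshot": snapshot,        # snapshot server interface 6
    })


# Hypothetical addresses, written in the usual "tcp://host:port" form.
reply = build_config_reply(
    "tcp://mgmt-server:5556",
    "tcp://mgmt-server:5557",
    "tcp://mgmt-server:5558",
)
endpoints = json.loads(reply)  # what a client would do on receipt
```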
Preferably, the connection server interfaces 3-6 are ZeroMQ sockets (the document at the link http://zeromq.org/intro:read-the-manual specifies the ZeroMQ or ZMQ sockets). Indeed, this embodiment makes it possible to avoid the problems of the request/response paradigm.
It should be noted that in the server implementation of the management system represented in
Said server implementation of the management system can be written in C, Python or any other appropriate programming language, and it can be single-threaded or multi-threaded.
In a single-threaded implementation of the server 10, the server executes the following loop, based on an event paradigm over the connection server interfaces 3, 5, 6:
Advantageously, this implementation, based on an event paradigm over the connection server interfaces 3, 5, 6, makes it possible to react quickly to an action from a client process (request or update). The conditional instruction “ACCORDING TO” manages the three connection server interfaces (configuration 3, collection 5 and snapshot 6) and a predefined waiting time (a timeout). It returns a list comprising the connection interface(s) triggered by an event of a connection server interface 3, 5, 6. If no event has occurred when the timeout expires, the server 10 sends a presence message (called a heartbeat message) to the clients of the management system.
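A minimal sketch of one iteration of such an event-driven loop is given here, assuming plain sockets and the Python standard `selectors` module in place of ZeroMQ sockets. The function `run_once`, the interface names and the handlers are hypothetical, and the sketch does not reproduce any particular processing order of the interfaces.

```python
import selectors
import socket


def run_once(selector, timeout, on_heartbeat):
    """Wait for events on the registered interfaces; heartbeat on timeout."""
    ready = selector.select(timeout)  # list of (key, event mask) pairs
    if not ready:
        on_heartbeat()  # no event before the timeout expired
        return []
    triggered = []
    for key, _mask in ready:
        handler, name = key.data
        handler(key.fileobj.recv(4096))
        triggered.append(name)
    return triggered


# Demonstration with local socket pairs standing in for interfaces 3, 5, 6.
log = []
selector = selectors.DefaultSelector()
client_ends = {}
for name in ("configuration", "collection", "snapshot"):
    server_end, client_end = socket.socketpair()
    handler = lambda data, n=name: log.append((n, data))
    selector.register(server_end, selectors.EVENT_READ, (handler, name))
    client_ends[name] = client_end


def heartbeat():
    log.append(("heartbeat", b""))


idle = run_once(selector, 0.01, heartbeat)       # nothing pending
client_ends["collection"].sendall(b"key=value")  # a client sends an update
busy = run_once(selector, 1.0, heartbeat)        # collection is triggered
```

In this sketch the single thread handles one batch of events at a time, which is why no lock around the shared state is needed.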
The particular order in which the connection server interfaces 3, 5, 6 are processed allows:
Advantageously, in a single-threaded implementation of the server 10, no locking mechanism is needed to prevent simultaneous and concurrent access to the data structure 2 and/or simultaneous utilization of a connection server interface 3-6.
The server 10 of the management system receives and processes key-value-type entries that can be interpreted as orders. These entries can originate from a client of the management system. Advantageously, this makes it possible to have cached entries, particularly when the data structure 2 is written in system files (the case of a UNIX-based system, for example).
A client 20 of the management system is understood here as a client process 21, which is generally configured to support at least one business process 31 for the execution of a business code 30.
The client 20 of the management system of an interconnection network comprises:
The client process 21 manages the incoming communications from the client 20 of the management system. For this purpose, said client process 21 has its own input client interfaces 24 and 26, namely that of the snapshot 26 and that of the subscription 24.
The publication client interface 25 allows a business process 31 to update the global state of the interconnection network that is shared and stored on the server side.
In
In order to be connected to the server 10, the client 20 is connected first to the configuration server interface 3 in order to receive all of the configuration parameters from the management system, including in particular the addresses (URLs) of the other connection server interfaces 4-6 of the server 10. In one embodiment, upon receipt on the configuration server interface 3 of a configuration request (for example, of the CONFIG? type), the server 10 (see the link between the connection interfaces 3 and 23):
Upon receipt of the configuration parameters, the client 20 decides whether or not to retrieve a snapshot of the current global state of the interconnection network (see the link between the connection interfaces 6 and 26). If so:
Accordingly, the client 20 is responsible for connecting to the publication server interface 4 (see the link between the connection interfaces 4 and 24) in order to receive updates, and to the collection server interface 5 (see the link between the connection interfaces 5 and 25) to possibly send updates (or new entries).
Preferably, the updates published by the server 10 are timestamped. Advantageously, the timestamping on the server side guarantees consistency over time (eventual consistency) of the global state within each client 20. It should be noted that a client 20 can ignore updates sent during the retrieval of a snapshot when the timestamps show that said updates predate the snapshot received.
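The filtering of stale updates on the client side can be sketched as follows. The function `apply_updates`, the tuple layout `(timestamp, key, value)` and the numeric timestamps are assumptions chosen for illustration.

```python
def apply_updates(snapshot, snapshot_time, updates):
    """Merge buffered updates into a snapshot, skipping those that predate it.

    `updates` is an iterable of (timestamp, key, value) tuples stamped by
    the server (hypothetical layout).
    """
    state = dict(snapshot)
    for timestamp, key, value in updates:
        if timestamp < snapshot_time:
            continue  # already reflected in the snapshot
        state[key] = value
    return state


snapshot = {"port/1": "up", "port/2": "up"}
buffered = [
    (90, "port/1", "down"),   # sent before the snapshot was taken: ignored
    (110, "port/2", "down"),  # sent after the snapshot: applied
]
merged = apply_updates(snapshot, snapshot_time=100, updates=buffered)
```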
Beyond the connection time of a client 20, the server 10 is configured to react to each request for update and to regularly send presence (heartbeat) messages.
Upon receipt of a request for update, or more generally, of a command, via the collection server interface 5:
When the input data carries a special field, such as “purge” or “TTL” (for “time to live”), the server 10 can delete the corresponding entry from the local data structure 2. Otherwise, the input data is added to the data structure 2 if its key does not yet exist, or updates the existing entry if it does.
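One possible reading of this processing is sketched here. The entry layout (a dictionary with `key`, `value`, and optional `purge` and `ttl` fields), the function names and the expiry rule (an entry is dropped once its time to live, counted in seconds from receipt, has elapsed) are assumptions and are not specified by the description.

```python
def handle_entry(store, entry, now):
    """Apply one incoming entry to the local store (hypothetical layout)."""
    key = entry["key"]
    if entry.get("purge"):
        store.pop(key, None)  # explicit deletion request
        return
    ttl = entry.get("ttl")
    expires_at = now + ttl if ttl is not None else None
    store[key] = (entry["value"], expires_at)  # add or overwrite


def expire(store, now):
    """Delete every entry whose time to live has elapsed."""
    expired = [k for k, (_, exp) in store.items()
               if exp is not None and exp <= now]
    for key in expired:
        del store[key]


store = {}
handle_entry(store, {"key": "a", "value": 1}, now=0)
handle_entry(store, {"key": "b", "value": 2, "ttl": 5}, now=0)
expire(store, now=10)                      # "b" has outlived its TTL
keys_after_expiry = sorted(store)
handle_entry(store, {"key": "a", "purge": True}, now=10)
keys_after_purge = sorted(store)
```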
Following are examples of requests that the server 10 can receive:
The basic entity here is a message based on the “key-value” paradigm. This entity, in addition to the key-value pair, provides other information such as the sequence number, the universally unique identifier of that entity, the identity of the sender thereof, or the time to live of the entity.
This information can be sent in one or more data frames. Preferably, said information is sent in a first frame containing the key for the subscription mechanism, and a second frame containing the rest of the information (data, identifier of the sender, sequence number, for example).
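A two-frame layout of this kind can be sketched as follows, assuming a JSON-encoded second frame. The function names, the field names and the choice of JSON are illustrative assumptions; only the split between a key frame and a frame carrying the remaining information comes from the description.

```python
import json
import uuid


def encode(key, value, sender, sequence, ttl=None):
    """Return [key frame, body frame] as two byte strings."""
    body = {
        "value": value,
        "sender": sender,
        "sequence": sequence,
        "uuid": str(uuid.uuid4()),  # unique identifier of this entity
        "ttl": ttl,
    }
    # The key travels alone in the first frame so that a subscription
    # mechanism can match on it without parsing the body.
    return [key.encode("utf-8"), json.dumps(body).encode("utf-8")]


def decode(frames):
    """Inverse of encode: return (key, body dictionary)."""
    key_frame, body_frame = frames
    return key_frame.decode("utf-8"), json.loads(body_frame.decode("utf-8"))


frames = encode("switch/42/status", "down", sender="topology", sequence=7)
key, body = decode(frames)
```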
Preferably, a dynamic downtime is calculated before the probe of the collection server interface 5 initiates a period of inactivity.
Advantageously, the server implementation 10 of the management network makes it possible:
In one embodiment, the management system is used for managing the interconnection (or computing) network of a supercomputer. For this purpose, the following modules can be required:
When the additional components cited above start up, the server 10 of the management network is already present so that all of the data published by the topology manager can be retrieved by the other clients of the server 10.
Advantageously, the status updates are done one by one, and after aggregation, a special key, called trigger element, is sent. Upon receipt of said trigger element, the routing calculator triggers its phase of calculating new routing tables. Thus, it is possible to easily and quickly differentiate the different types of messages sent by the server of the management system:
This advantageously results in an aggregation of the events in order to gain efficiency. In particular, the routing calculator does not calculate the routing tables for each change of status of a single piece of equipment.
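The aggregation described above can be sketched as follows. The class `RoutingCalculator`, the key `trigger/routing` and the status keys are hypothetical, and the recomputation itself is replaced by a record of the batch of changes that would be processed.

```python
class RoutingCalculator:
    """Hypothetical client that recomputes routes only on a trigger element."""

    TRIGGER_KEY = "trigger/routing"  # assumed name of the special key

    def __init__(self):
        self.pending = {}  # status changes accumulated since the last run
        self.runs = []     # one recorded batch per recomputation

    def on_message(self, key, value):
        if key == self.TRIGGER_KEY:
            # A real calculator would compute new routing tables here.
            self.runs.append(dict(self.pending))
            self.pending.clear()
        else:
            self.pending[key] = value  # record the change, do not recompute


calculator = RoutingCalculator()
calculator.on_message("status/switch/1", "down")
calculator.on_message("status/switch/2", "down")
calculator.on_message("status/switch/1", "up")
calculator.on_message(RoutingCalculator.TRIGGER_KEY, "")
```

Three status messages thus lead to a single recomputation, which sees only the latest status of each piece of equipment.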
In this regard, a mechanism based on prefixes of keys makes it possible not only for each client to be subscribed to a subset of the global state, but also to make this differentiation. Moreover, said mechanism makes it possible to add, in the management system, other types of information that are found in specific subsets. For example, the following prefixes can be utilized in a management system:
Advantageously, the different embodiments described above utilize different communication paradigms depending on the connection interfaces. For example, the publication server interface 4 of the server 10 operates in “broadcast” mode and can use a “multicast” protocol (symmetrically with the subscription client interface 24 on the side of the client 20). Moreover, a client 20 can subscribe to one or more prefixes of keys, which can enable it to retrieve only a subset of the global state stored in the data structure 2.
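Subscription by key prefix can be sketched as a simple filter over the global state. The function `filter_state` and the prefixes `status/` and `route/` are hypothetical examples and are not the prefixes of any described embodiment.

```python
def filter_state(state, prefixes):
    """Return the subset of `state` whose keys start with a subscribed prefix."""
    wanted = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return {k: v for k, v in state.items() if k.startswith(wanted)}


global_state = {
    "status/switch/1": "up",
    "status/switch/2": "down",
    "route/switch/1": "table-a",
}
status_only = filter_state(global_state, ["status/"])
nothing = filter_state(global_state, [])  # no subscription, no data
```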
Number | Date | Country | Kind |
---|---|---|---|
14 55446 | Jun 2014 | FR | national |
Number | Date | Country |
---|---|---|
20150365284 A1 | Dec 2015 | US |