This invention relates to an information processing system, a management apparatus, and a management method.
Along with enlargement of a cluster system constructed in a data center or the like, a network to which each compute node in the cluster system is connected becomes large. For example, when tens of thousands of compute nodes exist, sometimes thousands of switches are arranged in the network. Because compute nodes and switches are connected by tens of thousands of cables in such a large cluster system, it is very difficult to manually find if there are incorrect cable connections.
There is a technique to find incorrect cable connections among devices. Specifically, a designed configuration is compared with a configuration in actual operation by matching design data concerning devices in a network system and a relation among the devices with data concerning whether or not the devices in the network system in actual operation exist and a relation among the devices.
However, an identifier (for example, a node ID in InfiniBand, or a MAC address and an IP address on Ethernet (registered trademark)) is allocated to a switch after operation of the network system starts. Therefore, there is a problem that the matching cannot be performed in a case where the association between an identifier allocated to the switch at design time and an identifier allocated to the switch after the operation of the network system starts is unclear.
Moreover, there is a technique in which information on connected devices, which is relevant to a configuration for the devices, is obtained plural times, a majority decision is made for the obtained connection device information, and the most probable connection device information is adopted. However, even if an actual connection relation is decided by a majority rule, it is impossible to solve the above problem.
Patent Document 1: International Publication Pamphlet No. WO 2006/117832
Patent Document 2: Japanese Laid-open Patent Publication No. 2000-31992
An information processing system relating to this invention includes: plural switches; plural information processing apparatuses; and a management apparatus. And the management apparatus comprises: a memory; and a processor configured to use the memory and execute a process. And the process includes: associating first switch identifiers included in first information with second switch identifiers included in second information by calculating a maximum matching in a bipartite graph for the first switch identifiers and the second switch identifiers, in which the first information represents designed cable connections between the plural switches and the plural information processing apparatuses, and the second information represents actual cable connections between the plural switches and the plural information processing apparatuses; converting, according to a result of the associating, types of the first switch identifiers included in the first information to types of the second switch identifiers; and detecting a difference between the first information that includes the first switch identifiers whose types were converted and the second information.
The object and advantages of the embodiment will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the embodiment, as claimed.
In this embodiment, a network is regarded as a graph in which an apparatus on the network (or a network interface) corresponds to a vertex, and a cable (or a bundle of cables) connecting apparatuses corresponds to an edge. “Network topology” corresponds to an isomorphism class of the graph generated in this manner.
A set of all permutations which do not change a shape of the graph forms a group. This group is called an automorphism group of the graph. If an actual connection situation is in accordance with design data, a graph based on connection data representing the actual connection situation and a graph based on the design data are isomorphic. In the following, a method to identify an incorrect cable connection based on such a graph theory will be explained.
The switch 301 and the switch 302 include plural ports. The switch 301 includes ports 3011 to 3013. The switch 302 includes ports 3021 to 3023. In
Each of the compute nodes 311 to 316 includes a connection end point (hereinafter referred to as a CEP). The CEP is, for example, a host channel adapter or a network interface card. The compute node 311 includes a CEP 3110. The compute node 312 includes a CEP 3120. The compute node 313 includes a CEP 3130. The compute node 314 includes a CEP 3140. The compute node 315 includes a CEP 3150. The compute node 316 includes a CEP 3160.
A port on a switch and a CEP on a compute node are physically connected by a cable (for example, an InfiniBand cable or an Ethernet (registered trademark) cable). The port 3011 on the switch 301 is connected to the CEP 3110 on the compute node 311. The port 3012 on the switch 301 is connected to the CEP 3120 on the compute node 312. The port 3013 on the switch 301 is connected to the CEP 3130 on the compute node 313. The port 3021 on the switch 302 is connected to the CEP 3140 on the compute node 314. The port 3022 on the switch 302 is connected to the CEP 3150 on the compute node 315. The port 3023 on the switch 302 is connected to the CEP 3160 on the compute node 316.
Assuming that cable connections illustrated in
A management apparatus 1 which performs a main processing is connected to the computer system 3.
The collector 101 generates connection data by collecting data representing a connection situation between switches and compute nodes, and stores the connection data in the connection data storage unit 102. The first association unit 103 performs a processing by using data stored in the connection data storage unit 102, data stored in the design data storage unit 105, and data stored in the end point data storage unit 104, and stores a result of the processing in the first association table storage unit 106. The graph generator 107 performs a processing by using data stored in the first association table storage unit 106, and stores a result of the processing in the graph data storage unit 108. The second association unit 109 performs a processing by using data stored in the graph data storage unit 108, and stores a result of the processing in the second association table storage unit 110. The detector 111 performs a processing by using data stored in the first association table storage unit 106 and data stored in the second association table storage unit 110.
Next, operation of the management apparatus 1 is explained by using
The collector 101 included in the manager 100 collects data representing a connection status between switches and CEPs, and generates connection data (
Returning to the explanation of
The first association unit 103 identifies a CEP ID corresponding to the switch ID identified at the step S13 from the connection data storage unit 102 (step S15).
The first association unit 103 identifies a CEP name corresponding to the CEP ID identified at the step S15 from the end point data storage unit 104 (step S17).
The first association unit 103 identifies a switch name corresponding to the CEP name identified at the step S17 from the design data storage unit 105 (step S19).
The first association unit 103 stores the switch ID identified at the step S13 in the first association table storage unit 106 in association with the switch name identified at the step S19 (step S21). The processing of the step S21 is skipped when data representing identical association has already been stored in the first association table storage unit 106.
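The steps S13 to S21 may be sketched, for example, as follows. This is only a minimal illustration and not a definitive implementation: the function name `first_association`, the representation of the connection data and the design data as lists of pairs, and the representation of the end point data as a dictionary are assumptions made for this sketch. It is also assumed that each CEP name appears for only one switch name in the design data.

```python
from collections import defaultdict


def first_association(connection_data, endpoint_data, design_data):
    """Return {switch ID: set of switch names that may be associated}.

    connection_data: iterable of (switch_id, cep_id) pairs (actual cabling)
    endpoint_data:   dict mapping cep_id -> cep_name
    design_data:     iterable of (switch_name, cep_name) pairs (designed cabling)
    All three layouts are assumptions of this sketch.
    """
    # Index the design data by CEP name for the lookup of step S19.
    switch_name_of = {cep_name: switch_name
                      for switch_name, cep_name in design_data}

    candidates = defaultdict(set)
    for switch_id, cep_id in connection_data:       # steps S13 and S15
        cep_name = endpoint_data.get(cep_id)        # step S17
        if cep_name is None:
            continue                                # unknown CEP ID
        switch_name = switch_name_of.get(cep_name)  # step S19
        if switch_name is None:
            continue                                # CEP not in the design data
        # Step S21: a set silently skips an identical association.
        candidates[switch_id].add(switch_name)
    return dict(candidates)
```

When every switch ID maps to exactly one switch name, and no switch name is shared by plural switch IDs, the association is decided by this processing alone.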
Processings at the step S13 to S21 are explained by using
Association is not always decided (in other words, a switch ID is not associated with a switch name on a one-to-one basis) by only performing processings at the step S13 to S21. As an example, a status as illustrated in
In the status as illustrated in
Returning to the explanation of
By performing the processings described above, it is possible to identify switch IDs and switch names that can be associated with each other. There is a case where the association is decided by the processings so far.
Returning to the explanation of
When it is determined that the association between the switch IDs and the switch names has been decided (step S3: Yes route), the processing shifts to the processing of the step S7 to perform a detection processing. On the other hand, when it is determined that the association between the switch IDs and the switch names has not been decided (step S3: No route), the manager 100 performs a second association processing (step S5). The second association processing is explained by
First, the graph generator 107 included in the manager 100 reads out data (namely, switch IDs and switch names) from the first association table storage unit 106 (
The graph generator 107 generates a bipartite graph in which switch IDs correspond to a first vertex set, switch names correspond to a second vertex set, and line segments connecting vertices possible to be associated correspond to edges (step S33). Then, the graph generator 107 stores the generated bipartite graph data in the graph data storage unit 108.
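The generation of the bipartite graph at the step S33 may be sketched, for example, as follows. The function name `build_bipartite_graph` and the input layout (a dictionary from each switch ID to the set of switch names possible to be associated) are assumptions made for this illustration.

```python
def build_bipartite_graph(candidates):
    """Build (first vertex set, second vertex set, edges).

    candidates: dict mapping switch_id -> set of switch names that may be
    associated with it (hypothetical layout of the first association table).
    """
    switch_ids = sorted(candidates)                     # first vertex set
    switch_names = sorted({name
                           for names in candidates.values()
                           for name in names})          # second vertex set
    # One edge for each pair possible to be associated.
    edges = sorted((switch_id, name)
                   for switch_id, names in candidates.items()
                   for name in names)
    return switch_ids, switch_names, edges
```

The vertex sets and edges are sorted only so that the result is reproducible; the order carries no meaning.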
If a switch ID is associated with a switch name on a one-to-one basis, data of a bipartite graph, which is illustrated in
Returning to the explanation of
Obtaining a maximum matching means selecting, from the edges which exist between the two vertex sets, a combination of edges so that the number of edges becomes a maximum and no two edges share an endpoint. An algorithm such as the Hungarian method is used to obtain the maximum matching. Since algorithms for the maximum matching are well known, a further explanation is omitted here.
If a maximum matching in the bipartite graph is obtained in this manner, it is possible to identify association so that mismatching is the least as a whole.
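As one illustration, a maximum matching may be computed by a simple augmenting path search (often called Kuhn's algorithm). This is a sketch under stated assumptions and is not necessarily the algorithm used in the embodiment: the function name `maximum_matching` and the dictionary input are hypothetical.

```python
def maximum_matching(candidates):
    """Return {switch_id: switch_name} forming a maximum matching.

    candidates: dict mapping switch_id -> iterable of switch names that
    may be associated with it (the edges of the bipartite graph).
    """
    owner_of = {}  # switch name -> switch ID currently matched to it

    def try_assign(switch_id, visited):
        # Search for an augmenting path starting at switch_id.
        for name in sorted(candidates[switch_id]):
            if name in visited:
                continue
            visited.add(name)
            owner = owner_of.get(name)
            # Take a free name, or re-route the current owner elsewhere.
            if owner is None or try_assign(owner, visited):
                owner_of[name] = switch_id
                return True
        return False

    for switch_id in sorted(candidates):
        try_assign(switch_id, set())
    return {switch_id: name for name, switch_id in owner_of.items()}
```

For example, when a switch ID `id-A` may be associated with `sw1` or `sw2` and a switch ID `id-B` may be associated only with `sw1`, the search re-routes `id-A` to `sw2` so that both switch IDs are matched. For a very large graph, an algorithm such as Hopcroft-Karp may be preferable, because this recursive sketch is not tuned for scale.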
Returning to the explanation of
First, the detector 111 reads out design data from the design data storage unit 105, and converts a switch name in the design data into a switch ID according to data representing association, which is stored in the first association table storage unit 106 or the second association table storage unit 110 (
The detector 111 converts a CEP name in the design data into a CEP ID according to data representing the association, which is stored in the end point data storage unit 104 (step S43).
The detector 111 detects a difference between the connection data stored in the connection data storage unit 102 and the design data which has been converted by the processings of the steps S41 and S43 (step S45).
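The steps S41 to S45 may be sketched, for example, as follows. The function name `detect_differences`, the pair-based data layout, and the two name-to-ID dictionaries are assumptions made for this illustration. It is also assumed that every switch name and every CEP name in the design data has already been associated with an ID.

```python
def detect_differences(design_data, connection_data,
                       switch_id_of_name, cep_id_of_name):
    """Return (missing, unexpected) sets of (switch_id, cep_id) pairs.

    design_data:       iterable of (switch_name, cep_name) pairs
    connection_data:   iterable of (switch_id, cep_id) pairs
    switch_id_of_name: dict switch_name -> switch_id (association result)
    cep_id_of_name:    dict cep_name -> cep_id (from the end point data)
    """
    # Steps S41 and S43: convert the names in the design data into IDs.
    converted = {(switch_id_of_name[switch_name], cep_id_of_name[cep_name])
                 for switch_name, cep_name in design_data}
    actual = set(connection_data)
    # Step S45: compare the converted design data with the connection data.
    missing = converted - actual      # designed but not actually connected
    unexpected = actual - converted   # actually connected but not designed
    return missing, unexpected
```

A cable connected to a wrong port typically appears once in each set: the designed pair is missing, and the actual pair is unexpected.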
For example, assume that connection data as illustrated in the left side of
Returning to the explanation of
For example, in the situation as illustrated in
The first association processing in the second embodiment is explained by using
The collector 101 outputs a request for execution of a processing to the first association unit 103. Accordingly, the first association unit 103 identifies an unprocessed switch ID from the connection data storage unit 102 (step S53).
The first association unit 103 identifies a CEP ID corresponding to the switch ID identified at the step S53 from the connection data storage unit 102 (step S55).
The first association unit 103 identifies a CEP name corresponding to the CEP ID identified at the step S55 from the end point data storage unit 104 (step S57).
The first association unit 103 identifies a switch name corresponding to the CEP name identified at the step S57 from the design data storage unit 105 (step S59).
The first association unit 103 sets a counter value for association between the switch ID identified at the step S53 and the switch name identified at the step S59 in the first association table storage unit 106 (step S61).
Returning to the explanation of
By performing the processings as described above, it is possible to employ the most probable association among the associations between switch IDs and switch names. However, even if these processings are performed, the association is not decided when, for example, plural switch names have the same counter value. In that case, the second association processing is performed. The processings in this embodiment are considered to be effective when, for example, the incorrect cable connections are sparse and plural cables are unlikely to be connected incorrectly all together.
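The counter-based association may be sketched, for example, as follows. The function names `count_associations` and `most_probable` and the data layouts are assumptions made for this illustration. A tie between switch names is returned as plural candidates, which corresponds to the case where the association is not decided.

```python
from collections import Counter, defaultdict


def count_associations(connection_data, endpoint_data, design_data):
    """Return {switch_id: Counter({switch_name: number of matching CEPs})}."""
    switch_name_of = {cep_name: switch_name
                      for switch_name, cep_name in design_data}
    counts = defaultdict(Counter)
    for switch_id, cep_id in connection_data:       # steps S53 and S55
        cep_name = endpoint_data.get(cep_id)        # step S57
        switch_name = switch_name_of.get(cep_name)  # step S59
        if switch_name is not None:
            counts[switch_id][switch_name] += 1     # step S61
    return counts


def most_probable(counts):
    """Keep, for each switch ID, the switch names with the greatest count."""
    result = {}
    for switch_id, counter in counts.items():
        best = max(counter.values())
        result[switch_id] = {name for name, count in counter.items()
                             if count == best}
    return result
```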
For example, in the situation as illustrated in
A processing performed by the regenerator 112 is explained by using
At the step S71, for example, a list as illustrated in FIG. 23 is generated. In the example of
Returning to the explanation of
The regenerator 112 identifies a switch name (hereinafter referred to as a switch name to be processed) corresponding to the switch ID to be processed from the second association table storage unit 110 (step S75).
The regenerator 112 identifies a CEP name corresponding to the switch name to be processed from the design data storage unit 105, generates a list of CEP names (step S77), and stores it in a storage device such as the main memory or the like.
Returning to the explanation of
The regenerator 112 regenerates connection data using the switch IDs to be processed and the CEP IDs identified at the step S79, and updates the connection data stored in the connection data storage unit 102 for the switch IDs (step S81). A part of the combinations of the switch IDs to be processed and the CEP IDs, which are stored in the connection data storage unit 102, is deleted by the processing of the step S81.
The regenerator 112 determines whether an unprocessed switch ID exists in the list generated at the step S71 (step S83). When it is determined that the unprocessed switch ID exists (step S83: Yes route), the processing returns to the processing of the step S73 to perform a processing for a next switch ID. On the other hand, when it is determined that the unprocessed switch ID does not exist (step S83: No route), the regenerator 112 requests the manager 100 to perform association again by using updated connection data.
By performing the processings as described above, CEP IDs which are considered to be invalid according to the result of the association already performed are removed from the connection data. Therefore, a result with higher precision is obtained by the association which is performed again.
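One possible reading of the regeneration of the connection data is sketched below. The function name `regenerate_connection_data` and the data layouts are hypothetical. It is assumed here that the CEP names of the design data are converted into CEP IDs through the end point data, and that a combination is kept only when its CEP ID is designed for the switch name associated with its switch ID.

```python
from collections import defaultdict


def regenerate_connection_data(connection_data, association,
                               design_data, endpoint_data):
    """Return the connection data without combinations considered invalid.

    connection_data: iterable of (switch_id, cep_id) pairs
    association:     dict switch_id -> switch_name (result of the association)
    design_data:     iterable of (switch_name, cep_name) pairs
    endpoint_data:   dict cep_id -> cep_name
    """
    cep_id_of_name = {name: cep_id for cep_id, name in endpoint_data.items()}

    # For each switch name, the CEP IDs designed to be connected to it.
    designed_ceps = defaultdict(set)
    for switch_name, cep_name in design_data:
        if cep_name in cep_id_of_name:
            designed_ceps[switch_name].add(cep_id_of_name[cep_name])

    regenerated = []
    for switch_id, cep_id in connection_data:
        switch_name = association.get(switch_id)
        if switch_name is None:
            # Switch ID without an association: left untouched (assumption).
            regenerated.append((switch_id, cep_id))
        elif cep_id in designed_ceps[switch_name]:
            regenerated.append((switch_id, cep_id))
        # Otherwise the combination is deleted.
    return regenerated
```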
The first association processing in the fourth embodiment is explained by using
The collector 101 included in the manager 100 collects data representing a connection status between switches and CEPs from the computer system 3, and generates connection data (
Returning to the explanation of
The first association unit 103 identifies a CEP ID corresponding to the switch ID identified at the step S93 from the connection data storage unit 102 (step S95).
The first association unit 103 identifies a CEP name corresponding to the CEP ID identified at the step S95 from the end point data storage unit 104 (step S97).
The first association unit 103 determines whether design data used for a processing is stored in the design data storage unit 105 (step S99). When it is determined that the design data used for a processing is stored in the design data storage unit 105 (step S99: Yes route), the processing shifts to the processing of the step S103 since it is possible to perform the first association processing.
On the other hand, when it is determined that the design data used for the processing is not stored in the design data storage unit 105 (step S99: No route), the first association unit 103 transmits, to the layout management apparatus 5, a notification request to inquire about a management apparatus which has the design data used for the processing. Then, the first association unit 103 transmits a data transmission request for the design data to the management apparatus which has the design data and which is notified by the layout management apparatus 5, and obtains the design data used for the processing from the management apparatus that is the destination of the request (step S101). Then, the first association unit 103 stores the obtained design data in the design data storage unit 105. There is also a case where only a part of the design data used for the processing is stored in the design data storage unit 105 (namely, a case where the own management apparatus has a part of the design data). In such a case as well, the processing shifts to the No route of the step S99.
The first association unit 103 identifies a switch name corresponding to the CEP name identified at the step S97 from the design data storage unit 105 (step S103).
The first association unit 103 stores the switch ID identified at the step S93 in the first association table storage unit 106 in association with the switch name identified at the step S103 (step S105). The processing of the step S105 is skipped when data representing the identical association has already been stored in the first association table storage unit 106.
The first association unit 103 determines whether an unprocessed switch ID exists in the connection data storage unit 102 (step S107). When it is determined that the unprocessed switch ID exists (step S107: Yes route), the processing returns to the processing of the step S93 to perform a processing for a next switch ID. On the other hand, when it is determined that the unprocessed switch ID does not exist (step S107: No route), the processing returns to the calling-source processing.
By performing the processings as described above, it is possible to deal with a situation that plural management apparatuses dispersedly perform processings. Although the design data is dispersedly arranged in the plural management apparatuses in the example described above, it is possible to deal with a situation that the connection data is dispersedly arranged in the plural management apparatuses by the same mechanism.
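The handling of dispersedly arranged design data at the steps S99 to S103 may be sketched, for example, as follows. The function name `lookup_switch_name` and the two callables `locate_owner` and `fetch_design` are hypothetical stand-ins for the inquiry to the layout management apparatus 5 and for the data transmission request. The actual communication is not modeled.

```python
def lookup_switch_name(cep_name, local_design, locate_owner, fetch_design):
    """Return the switch name designed for cep_name, fetching data if needed.

    local_design: dict cep_name -> switch_name held by this apparatus
    locate_owner: callable(cep_name) -> identifier of the apparatus that has
                  the design data (stands in for the layout management
                  apparatus)
    fetch_design: callable(owner) -> dict cep_name -> switch_name (stands in
                  for the data transmission request)
    """
    if cep_name not in local_design:              # step S99: No route
        owner = locate_owner(cep_name)            # inquiry
        local_design.update(fetch_design(owner))  # step S101: obtain and store
    return local_design.get(cep_name)             # step S103
```

Because the obtained design data is stored locally, a later lookup for the same CEP name does not cause another inquiry.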
Although the embodiments of this invention were explained above, this invention is not limited to those. For example, the functional block configuration of the management apparatus 1 does not always correspond to a program module configuration.
Moreover, as for the processing flow, as long as the processing results do not change, the order of the steps may be changed. Furthermore, plural steps may be executed in parallel.
Display data generated by the detection processing is not limited to the aforementioned one.
Moreover, although an example in which the design data is converted in the detection processing is illustrated, the connection data may be converted in the detection processing.
Moreover, when the association is not decided even if the second association processing has been performed, an alert may be displayed.
In addition, the management apparatus 1 and compute nodes 311 to 316 are computer devices as illustrated in
The aforementioned embodiment is summarized as follows:
An information processing system relating to a first aspect of this embodiment includes: (A) plural switches; (B) plural information processing apparatuses; and (C) a management apparatus. And the management apparatus includes: (c1) an association unit that performs associating first switch identifiers included in first information with second switch identifiers included in second information by calculating a maximum matching in a bipartite graph for the first switch identifiers and the second switch identifiers, in which the first information represents designed cable connections between the plural switches and the plural information processing apparatuses, and the second information represents actual cable connections between the plural switches and the plural information processing apparatuses; (c2) a conversion unit that converts, according to a result of the associating, the first switch identifiers included in the first information to the second switch identifiers; and (c3) a detector that detects a difference between the first information that includes the first switch identifiers which were converted and the second information.
Thus, it is possible to identify incorrect cable connections. Therefore, it is possible to reduce a cost required to remove the incorrect cable connections.
Moreover, the association unit may include: (c11) a generator that generates the bipartite graph in which first vertices correspond to the first switch identifiers, second vertices correspond to the second switch identifiers, and edges are line segments that connect the first vertices with the second vertices, in which the first vertices and the second vertices are possible to be associated; and (c12) a calculator that calculates the maximum matching in the bipartite graph generated by the generator. It is possible to perform association between switch identifiers properly by using the bipartite graph.
Moreover, the first information may include the first switch identifiers and identifiers of information processing apparatuses connected with switches identified by the first switch identifiers, and the second information includes the second switch identifiers and identifiers of information processing apparatuses connected with switches identified by the second switch identifiers. And the association unit may include: (c13) a first specification unit that specifies, for each of the second switch identifiers, one or more first switch identifiers of one or more switches that are connected with an information processing apparatus which corresponds to an information processing apparatus connected with the switch of the second switch identifier to specify combinations of first and second switch identifiers possible to be associated. Thus, it is possible to perform association by using the maximum matching in the bipartite graph after narrowing combinations of first and second switch identifiers possible to be associated.
Moreover, the first specification unit may include: (c131) a counter that counts, for each of the second switch identifiers, a number of information processing apparatuses for each of the combinations to determine that a combination whose number of information processing apparatuses is the greatest is a combination of first and second switch identifiers possible to be associated. Thus, it is possible to employ the most probable combination.
Moreover, the management apparatus may further include: (c4) a second specification unit that specifies, for each of combinations which are combinations of the first and second switch identifiers and are included in the result of the associating, identifiers of information processing apparatuses connected with the switch of the first switch identifier included in the combination; a deletion unit that deletes identifiers of information processing apparatuses other than specified identifiers of the information processing apparatuses; and an execution unit that executes the associating. Thus, a result with higher precision can be obtained by the associating performed again since it is possible to remove identifiers of information processing apparatuses considered to be invalid from the result of the associating among identifiers of information processing apparatus, which are included in the second information.
Moreover, the first specification unit may include: (c132) a first obtaining unit that obtains, upon detecting that the management apparatus does not have at least a portion of the first information and the second information, information of an apparatus that has the portion from an allocation management apparatus that manages allocation of the first information and the second information; and a second obtaining unit that obtains the portion from the apparatus identified by obtained information. Thus, it is possible to deal with a situation in which processings are performed by, for example, plural management apparatuses.
A management method relating to a second aspect of this embodiment includes: (D) associating first switch identifiers included in first information with second switch identifiers included in second information by calculating a maximum matching in a bipartite graph for the first switch identifiers and the second switch identifiers, in which the first information represents designed cable connections between the plural switches and the plural information processing apparatuses, and the second information represents actual cable connections between the plural switches and the plural information processing apparatuses; (E) converting, according to a result of the associating, the first switch identifiers included in the first information to the second switch identifiers; and (F) detecting a difference between the first information that includes the first switch identifiers which were converted and the second information.
Incidentally, it is possible to create a program causing a computer to execute the aforementioned processing, and such a program is stored in a computer readable storage medium or storage device such as a flexible disk, CD-ROM, DVD-ROM, magneto-optic disk, a semiconductor memory, and hard disk. In addition, the intermediate processing result is temporarily stored in a storage device such as a main memory or the like.
This application is a continuing application, filed under 35 U.S.C. section 111(a), of International Application PCT/JP2012/074024, filed on Sep. 20, 2012, the entire contents of which are incorporated herein by reference.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/JP2012/074024 | Sep 2012 | US |
| Child | 14661910 | | US |