The present invention relates to a management device and a management method.
Conventionally, in a large scale system, such as a distributed computer system in a large scale data center, the hardware responsible for processes is switched over, that is, processes are moved to different hardware, thereby enhancing the availability of the system. In one example, according to a known technique, a VM (Virtual Machine) host is operated on the hardware, and a VM guest is operated on this VM host.
The VM host is a program for virtually realizing the operating environment of another computer system. The VM guest operates as a virtual machine in an environment provided by the VM host, and is responsible for processes to be provided to a user. The VM guest can continue to perform processes even if it is moved to a different VM host.
Conventionally, there has been a known technique for detecting the occurrence of, or a sign of, a failure in a computer on which the VM host operates, and there has been a known technique for moving a virtual machine guest to a different host.
Patent Document 1: Japanese Laid-open Patent Publication No. 2010-039730
Patent Document 2: Japanese Laid-open Patent Publication No. 2007-233687
Patent Document 3: Japanese National Publication of International Patent Application No. 2007-536657
However, with the conventional techniques, the VM guest may be moved repeatedly to other VM hosts each time trouble occurs in a computer on which a VM host operates. This makes it difficult to identify the VM host on which the VM guest originally operated. If the VM host on which the VM guest originally operated cannot be identified, it is difficult to recall the VM guest to the original VM host.
If the moved VM guest is not recalled to the original host, the relationship between the VM host and the VM guest changes randomly as the system operates. As a result, the hardware may not be used as intended.
According to an aspect of an embodiment of the invention, a management device includes a memory and a processor coupled to the memory. The processor executes a process including: monitoring an operating state of a target device to be managed as a node of a network to be managed; moving a process executed by the target device to another node on the network when a sign of failure is detected as a result of the monitoring; determining, at activation of the target device, whether there is a process that has been moved from the target device to another node; and recalling the moved process from that node to the target device when such a process exists.
The object and advantages of the embodiment will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the embodiment, as claimed.
Preferred embodiments of the present invention will be explained with reference to accompanying drawings. In addition, the embodiments do not limit the technique disclosed herein.
The target device to be managed n1 is connected to a management device m1, the target device to be managed n2 is connected to a management device m2, the target device to be managed n3 is connected to a management device m3, and the target device to be managed n4 is connected to a management device m4. The management devices m1 to m4 build an overlay network for a network to which the target devices to be managed n1 to n4 belong, using a network interface of the target devices to be managed n1 to n4. The management devices m1 to m4 function as nodes of this overlay network, and can communicate with each other.
The management devices m1 to m4 have the same configuration, and thus the management device m1 will hereinafter be described by way of example. The management device m1 has a sign monitoring unit m14, a guest movement unit m15, and a guest recalling unit m16. The sign monitoring unit m14 monitors a sign of trouble in the target device to be managed n1. The guest movement unit m15 moves a process operated by the target device to be managed n1 to another target device to be managed, upon detection of a sign of trouble in the target device to be managed n1. The guest recalling unit m16 performs a process for recalling, as needed, the process moved from the target device to be managed n1 to another target device to be managed.
As illustrated in the drawings, the management device m1 further includes an overlay network building unit m11, a to-be-managed target search unit m12, and a management information generating unit m13.
The overlay network building unit m11 is a processing unit for building an overlay network for a target network to be managed, and has a communication processing unit m21, a hash processing unit m22, an information acquisition unit m23, and a notification unit m24.
The communication processing unit m21 performs a process for communicating with another node on the network in which the target device to be managed n1 participates as a node. The hash processing unit m22 obtains a hash value based on information acquired by the communication processing unit m21 from another node or information of the target device to be managed, and sets the obtained hash value as a key for the overlay network. The information acquisition unit m23 is a processing unit for acquiring information from another node of the overlay network through the communication processing unit m21. The notification unit m24 is a processing unit for notifying another node of the overlay network about information, through the communication processing unit m21.
The to-be-managed target search unit m12 treats the target device to be managed n1, which is directly connected to the management device m1, as a self node, and searches the overlay network built by the overlay network building unit m11 for nodes which belong to the same management region (domain) as the self node.
The management information generating unit m13 generates management information representing the node acquired through the searching by the to-be-managed target search unit m12, as a to-be-managed target node.
The sign monitoring unit m14 monitors an operating state of the hardware, for example, a fan, memory, CPU (Central Processing Unit), and power supply unit of the target device to be managed n1, to detect a sign of trouble therein.
When the sign monitoring unit m14 detects a sign of failure, the guest movement unit m15 moves a process executed by the target device to be managed n1 to another node on the overlay network.
The guest recalling unit m16 determines, at the activation of the target device to be managed n1, whether there is any process that has been moved from the target device to be managed n1 to another node. When it is determined that there is a process moved to another node, the guest recalling unit m16 recalls the moved process from the destination node.
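As an informal illustration of how the three units could cooperate, the following Python sketch models the monitor, move, and recall steps; the data structures and helper names (VMHost, monitor_and_move, recall_on_activation, and so on) are assumptions for explanation only, not the embodiment's implementation.

from dataclasses import dataclass, field
from typing import List


@dataclass
class VMHost:
    name: str
    domain: str
    failure_sign: bool = False                    # set when m14 detects a sign of trouble
    guests: List[str] = field(default_factory=list)
    original_guests: List[str] = field(default_factory=list)  # guests that originally ran here


def monitor_and_move(host: VMHost, other_hosts: List[VMHost]) -> None:
    """m14 + m15: when a sign of trouble is detected, move guests to another host of the same domain."""
    if not host.failure_sign:
        return
    destination = next((h for h in other_hosts
                        if h.domain == host.domain and not h.failure_sign), None)
    if destination is None:
        return
    destination.guests.extend(host.guests)        # the guests keep operating elsewhere
    host.guests.clear()


def recall_on_activation(host: VMHost, other_hosts: List[VMHost]) -> None:
    """m16: at activation, recall guests that originally operated on this host."""
    for guest in host.original_guests:
        if guest in host.guests:
            continue
        for other in other_hosts:
            if guest in other.guests:
                other.guests.remove(guest)
                host.guests.append(guest)
                break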
The management device m1 preferably operates as a management program on a computer which is the target device to be managed n1. In the example illustrated in the drawings, each of the domain A and the domain B includes three servers, and the management program operates on each of the servers.
In one of the servers 310 of the domain A, a VM (Virtual Machine) host program 311 for virtually realizing the operating environment of another computer system operates. Four VM guest programs 312 operate on the VM host program 311. In this server 310, an operation management program 311a further operates on the VM host program 311. The operation management program 311a operating on the VM host program 311 controls the server 310 to function as a management device. The target devices to be managed by this operation management program 311a include the server 310, the VM host program 311, and the VM guest programs 312 operating on the server 310.
In one of the servers 320 of the domain A, an OS (Operating System) 321 operates, and the operation management program 321a operates on the OS 321. This server 320 is connected to a switch 322 and a router 323. The operation management program 321a operating on this server 320 controls the server 320 to function as a management device. The target devices to be managed by this operation management program 321a include the server 320 itself, and the switch 322 and the router 323 connected to the server 320.
In one of the servers 330 of the domain A, an OS 331 operates, and the operation management program 331a operates on the OS 331. This server 330 is connected to a storage 332. The operation management program 331a operating on this server 330 controls the server 330 to function as a management device. The target devices to be managed by this operation management program 331a include the server 330 itself and the storage 332 connected to the server 330.
Like the domain A, in the three servers included in the domain B, the operation management program operates on the VM host program or the OS of the server, and controls each server to function as a management device. Thus, each server, various programs operating on each server, and the hardware connected to each server are managed by the operation management program operating on a corresponding server.
The operation management programs on the servers communicate with each other, and build the overlay network. In addition, the operation management program can collect information about another node in the domain to which the operation management program belongs, and generate management information. The operation management program can be acquired from a terminal which can be accessed from both of the domain A and the domain B.
When the server is activated, the management program pg10 is read from the HDD p13 and loaded into the memory p12. The CPU (Central Processing Unit) p11 sequentially executes the program loaded in the memory, thereby controlling the server to function as a management device. At this time, a communication interface p14 of the server is used as the interface of the overlay network in the management device.
In the DHT, pairs of a Key and a Value are distributed to and kept in the nodes participating in the overlay network. In the case of "Chord", a value obtained as a result of hashing with SHA (Secure Hash Algorithm)-1 is used as a Key. Each pair of a Key and a Value is stored in the first node, among the nodes on which the management program operates, that has a Key larger than the Key of the pair.
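As a rough sketch of how a name can be hashed with SHA-1 and folded into a Chord key space, the following Python fragment may help; the function name and the choice of a k-bit key space are illustrative assumptions, not part of the embodiment.

import hashlib


def chord_key(name: str, k: int = 7) -> int:
    """Hash a name with SHA-1 and fold the digest into a 2**k key space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** k)


# For example, chord_key("vmhost1.domain1.company.com") yields a Key in [0, 2**k).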
In the illustrated example, the vmhosts 1 to 3 and the servers 1 to 3 belong to the domain 1, are nodes on which the management program operates, and are drawn with a black circle mark.
As described above, a pair of a Key and a Value is stored in the first node, among the nodes on which the management program operates, that has a Key larger than the Key of the pair. Thus, the Keys 40 and 55 are stored in the node with the Key of 66.
In the case of "Chord", each node keeps, as routing information, information of the previous node, the following node, and the node corresponding to (self node key + 2^(x−1)) mod 2^k, where x is a natural number from 1 to k and k is the number of bits of the key. Specifically, each node has information of discrete nodes at key distances of 1, 2, 4, 8, 16, 32, 64, 128, and so on from its own key.
As a result, in the Chord DHT, each node can store the Value corresponding to a Key in the node having the next larger Key than that Key, and can acquire the Value corresponding to a Key from the node having the next larger Key than that Key.
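The two rules described above can be sketched as follows; the helper names are assumptions, and the node keys used in the closing comment are those of the example nodes on which the management program operates.

from typing import List


def successor(key: int, node_keys: List[int]) -> int:
    """Return the key of the node that stores the pair whose Key is `key`
    (the first participating node with a larger key, wrapping around the ring)."""
    ring = sorted(node_keys)
    for node_key in ring:
        if node_key > key:
            return node_key
    return ring[0]


def finger_targets(self_key: int, k: int = 7) -> List[int]:
    """Routing targets (self node key + 2^(x-1)) mod 2^k for x = 1..k."""
    return [(self_key + 2 ** (x - 1)) % (2 ** k) for x in range(1, k + 1)]


# With the management-program nodes of the example (keys 1, 15, 20, 66, 75, 100),
# successor(40, [1, 15, 20, 66, 75, 100]) and successor(55, [1, 15, 20, 66, 75, 100])
# both return 66, matching the statement that the Keys 40 and 55 are stored in the
# node with the Key of 66. For a node with the Key of 1 and k = 7, finger_targets(1)
# yields [2, 3, 5, 9, 17, 33, 65].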
In the hash table t1, a Key and Values are registered for each type of node as follows.
For a server, a server name is hashed with "SHA-1" to set a Key. The table has, as Values, a tag "server" representing a server, a server name, a key obtained using the server name, a list of IP addresses (IP list) that the server has, a list of WWNs (WWN list) that the server has, a manager flag representing whether the server functions as a management node, and a list of domains to which the server belongs and their domain keys.
For a VM host, a VM host name is hashed with "SHA-1" to set a Key. The table has, as Values, a tag "vmhost" representing a VM host, a VM host name, a key obtained using the VM host name, an IP list of the VM host, a list of domains to which the VM host belongs and their domain keys, and a list of VM guests operating on the VM host.
For a VM guest, a VM guest name is hashed with "SHA-1" to set a Key. The table has, as Values, a tag "vmguest" representing a VM guest, a VM guest name, a key obtained using the VM guest name, an IP list of the VM guest, and a name and a key of the VM host on which the VM guest operates.
For a switch, a switch name is hashed with "SHA-1" to set a Key. The table has, as Values, a tag "switch" representing a switch, a switch name, a key obtained using the switch name, an IP list of the switch, and a list of domains to which the switch belongs and their domain keys.
For a storage, a storage name is hashed with "SHA-1" to set a Key. The table has, as Values, a tag "storage" representing a storage, a storage name, a key obtained using the storage name, an IP list of the storage, a WWN list of the storage, and a list of domains to which the storage belongs and their domain keys.
For a user, a user name is hashed with "SHA-1" to set a Key. The table has, as Values, a tag "user" representing a user, a user name, a key obtained using the user name, and a list of group names to which the user belongs and their group keys.
For a group, a group name is hashed with "SHA-1" to set a Key. The table has, as Values, a tag "group" representing a group, a group name, a key obtained using the group name, and a list of user names belonging to the group and their keys.
For a domain, a domain name is hashed with "SHA-1" to set a Key. The table has, as Values, a tag "domain" representing a domain, a domain name, a key obtained using the domain name, and a list of keys of the management devices in the domain.
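As an illustration of the kind of pair of a Key and Values registered for a VM host, the following sketch builds one such entry; the dictionary layout and function names are assumptions used only to restate the Values listed above.

import hashlib
from typing import Dict, List, Tuple


def chord_key(name: str, k: int = 7) -> int:
    """Hash a name with SHA-1 into a 2**k key space (k is an assumed parameter)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big") % (2 ** k)


def vmhost_entry(name: str, ip_list: List[str], domains: List[str],
                 vmguests: List[str]) -> Tuple[int, Dict]:
    """Build the pair of a Key and Values registered in the hash table t1 for a VM host."""
    key = chord_key(name)
    values = {
        "tag": "vmhost",                                   # tag representing a VM host
        "name": name,                                      # VM host name
        "key": key,                                        # key obtained using the VM host name
        "ip_list": ip_list,                                # IP list of the VM host
        "domains": [(d, chord_key(d)) for d in domains],   # domains and domain keys
        "vmguests": vmguests,                              # VM guests operating on the VM host
    }
    return key, values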
In the illustrated example, the registered entries include the following. A registered entry includes "type" representing "vmguest", "node name" representing "vmguest12.domain1.company.com", "key" representing "70", "IP" representing "10.20.30.42", and "WWN" representing "null". Further, a registered entry includes "type" representing "vmguest", "node name" representing "vmguest13.domain1.company.com", "key" representing "85", "IP" representing "10.20.30.43", and "WWN" representing "null". A registered entry includes "type" representing "vmguest", "node name" representing "vmguest14.domain1.company.com", "key" representing "90", "IP" representing "10.20.30.44", and "WWN" representing "null".
The node management table t4 has, for each node, items of a type, a node name, a Key, a Domain Key, a Manager Flag, and a Managed Flag.
Specifically, the node management table t4 has an entry including “type” representing “vmhost”, “node name” representing “vmhost2.domain1.company.com”, “Key” representing “1”, “Domain Key” representing “5”, “Manager Flag” representing “false”, and “Managed Flag” representing “true”.
The node management table t4 has an entry including "type" representing "server", "node name" representing "server1.domain1.company.com", "Key" representing "15", "Domain Key" representing "5", "Manager Flag" representing "true", and "Managed Flag" representing "true".
The node management table t4 has an entry including “type” representing “server”, “node name” representing “server2.domain1.company.com”, “Key” representing “20”, “Domain Key” representing “5”, “Manager Flag” representing “false”, and “Managed Flag” representing “true”.
The node management table t4 has an entry including “type” representing “vmguest”, “node name” representing “vmguest11.domain1.company.com”, “Key” representing “55”, “Domain Key” representing “5”, “Manager Flag” representing “false”, and “Managed Flag” representing “true”.
The node management table t4 has an entry including “type” representing “server”, “node name” representing “server3.domain1.company.com”, “Key” representing “66”, “Domain Key” representing “5”, “Manager Flag” representing “false”, and “Managed Flag” representing “true”.
The node management table t4 has an entry including “type” representing “vmguest”, “node name” representing “vmguest12.domain1.company.com”, “Key” representing “70”, “Domain Key” representing “5”, “Manager Flag” representing “false”, and “Managed Flag” representing “true”.
The node management table t4 has an entry including “type” representing “vmhost”, “node name” representing “vmhost3.domain1.company.com”, “Key” representing “75”, “Domain Key” representing “5”, “Manager Flag” representing “false”, and “Managed Flag” representing “true”.
The node management table t4 has an entry including “type” representing “vmguest”, “node name” representing “vmguest13.domain1.company.com”, “Key” representing “85”, “Domain Key” representing “5”, “Manager Flag” representing “false”, and “Managed Flag” representing “true”.
The node management table t4 has an entry including “type” representing “vmguest”, “node name” representing “vmguest14.domain1.company.com”, “Key” representing “90”, “Domain Key” representing “5”, “Manager Flag” representing “false”, and “Managed Flag” representing “true”.
The node management table t4 has an entry including “type” representing “vmhost”, “node name” representing “vmhost1.domain1.company.com”, “Key” representing “100”, “Domain Key” representing “5”, “Manager Flag” representing “true”, and “Managed Flag” representing “true”.
The node management table t4 has an entry including "type" representing "switch", "node name" representing "switch1.domain1.company.com", "Key" representing "110", "Domain Key" representing "5", "Manager Flag" representing "false", and "Managed Flag" representing "true".
The node management table t4 has an entry including “type” representing “storage”, “node name” representing “storage1.domain1.company.com”, “Key” representing “115”, “Domain Key” representing “5”, “Manager Flag” representing “false”, and “Managed Flag” representing “true”.
The node management table t4 has an entry including "type" representing "vmguest", "node name" representing "vmguest21.domain1.company.com", "Key" representing "120", "Domain Key" representing "5", "Manager Flag" representing "false", and "Managed Flag" representing "true".
Accordingly, the node management table t4 is a table for managing the nodes belonging to the domain 1. Thus, those nodes belonging to the domain 2 are not registered in this table.
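The way the node management table t4 could be assembled from the collected entries, keeping only the nodes of the self node's domain, can be sketched as follows; the entry layout and function name are assumptions for illustration.

from typing import Dict, List


def build_node_management_table(entries: List[Dict], self_domain_key: int) -> List[Dict]:
    """Register only the nodes that belong to the self node's domain (e.g., domain key 5)."""
    table = []
    for values in entries:
        domain_keys = [domain_key for _, domain_key in values.get("domains", [])]
        if self_domain_key not in domain_keys:
            continue                 # nodes of other domains (e.g., the domain 2) are skipped
        table.append({
            "type": values["tag"],
            "node name": values["name"],
            "Key": values["key"],
            "Domain Key": self_domain_key,
            "Manager Flag": values.get("manager", False),
            "Managed Flag": True,
        })
    return table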
In the illustrated example, the routing table t5 of a node has the following items.
The routing table t5 has items, such as “distance” representing “3”, “node name” representing “vmhost2.domain1.company.com”, “Destination Key” representing “1”, and “Destination IP” representing “a1.b1.c1.d1”.
The routing table t5 has items, such as “distance” representing “5”, “node name” representing “vmhost2.domain1.company.com”, “Destination Key” representing “1”, and “Destination IP” representing “a1.b1.c1.d1”.
The routing table t5 has items, such as “distance” representing “9”, “node name” representing “vmhost2.domain1.company.com”, “Destination Key” representing “1”, and “Destination IP” representing “a1.b1.c1.d1”.
The routing table t5 has items, such as “distance” representing “17”, “node name” representing “vmhost2.domain1.company.com”, “Destination Key” representing “1”, and “Destination IP” representing “a1.b1.c1.d1”.
The routing table t5 has items, such as “distance” representing “33”, “node name” representing “node1.domain2.company.com”, “Destination Key” representing “4”, and “Destination IP” representing “a4.b4.c4.d4”.
The routing table t5 has items, such as "distance" representing "65", "node name" representing "node3.domain2.company.com", "Destination Key" representing "36", and "Destination IP" representing "a36.b36.c36.d36".
Accordingly, the routing table t5 defines that routing is directed to the Key 1 (IP: a1.b1.c1.d1) when any of the nodes (keys: 1, 2, 3, 5, 9, and 17) belonging to the domain 1 is the destination. The routing table t5 also defines that routing is directed to the Key 4 (IP: a4.b4.c4.d4) when the node (key: 33) belonging to the domain 1 is the destination, and that routing is directed to the Key 36 (IP: a36.b36.c36.d36) when the node (key: 65) belonging to the domain 2 is the destination.
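A sketch of next-hop selection over the routing table t5 is shown below. The selection rule (use the entry with the largest "distance" that does not exceed the destination Key) is the usual Chord-style greedy forwarding and is an assumption here; the embodiment itself only lists the table contents.

from typing import Dict, List, Optional


def next_hop(routing_table: List[Dict], destination_key: int) -> Optional[Dict]:
    """Pick the routing entry used to forward a message toward destination_key."""
    candidates = [entry for entry in routing_table if entry["distance"] <= destination_key]
    if not candidates:
        return None
    return max(candidates, key=lambda entry: entry["distance"])


routing_table_t5 = [
    {"distance": 3,  "Destination Key": 1,  "Destination IP": "a1.b1.c1.d1"},
    {"distance": 5,  "Destination Key": 1,  "Destination IP": "a1.b1.c1.d1"},
    {"distance": 9,  "Destination Key": 1,  "Destination IP": "a1.b1.c1.d1"},
    {"distance": 17, "Destination Key": 1,  "Destination IP": "a1.b1.c1.d1"},
    {"distance": 33, "Destination Key": 4,  "Destination IP": "a4.b4.c4.d4"},
    {"distance": 65, "Destination Key": 36, "Destination IP": "a36.b36.c36.d36"},
]

# next_hop(routing_table_t5, 17) selects the Key 1 entry (IP a1.b1.c1.d1), and
# next_hop(routing_table_t5, 65) selects the Key 36 entry (IP a36.b36.c36.d36),
# in line with the routing described above.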
When the sign monitoring process pg14 detects warning information as a sign of trouble in the operating state of the hardware, such as the fan, memory, CPU, and power supply unit (S102, Yes), the guest movement process pg15 searches the hash table t1 for another VM host (S103). At this time, the VM host to be searched for is preferably a VM host belonging to the same domain, that is, belonging to the same management region.
When another VM host has been found (S104, Yes), the guest movement process pg15 communicates with that VM host, and checks whether it has enough capacity to accept the VM guest of the self host (S105). When no other VM host has been found (S104, No), or when the found VM host does not have enough capacity (S105, No), the process ends as is.
On the other hand, when the found VM host has enough capacity (S105, Yes), the guest movement process pg15 moves the VM guest to the found VM host (S106) and updates VM guest information of the moved VM guest in the DHT (S107).
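The movement flow of Steps S102 to S107 can be sketched as follows; the host and DHT objects and their methods are assumptions introduced only to walk through the steps, not the embodiment's API.

def on_failure_sign(self_host, hash_table, dht) -> None:
    """Move the self host's VM guests when warning information is detected (S102: Yes)."""
    # S103: search the hash table t1 for another VM host, preferably of the same domain
    destination = next((host for host in hash_table.vm_hosts()
                        if host.name != self_host.name and host.domain == self_host.domain),
                       None)
    if destination is None:                                    # S104: No -> end as is
        return
    if not destination.has_capacity_for(self_host.guests):     # S105: No -> end as is
        return
    for guest in list(self_host.guests):                       # S105: Yes
        destination.accept(guest)                              # S106: move the VM guest
        self_host.guests.remove(guest)
        dht.update_vmguest_entry(guest, new_host=destination.name)  # S107: update the DHT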
When no VM guest information is included in the self-node table t2 (S203, No), the guest recalling process pg16 ends the process as is. On the contrary, when VM guest information is included in the self-node table t2 (S203, Yes), the guest recalling process pg16 searches the hash table t1 for information about the VM guest (S204), and identifies on which VM host the VM guest presently operates (S205). This identification is made by using the Key of the VM guest held in the self-node table t2 to determine which node keeps the corresponding hash table entry, and by finding the VM host from the Value of that entry.
The guest recalling process pg16 communicates with an operation management program on the destination VM host on which the VM guest presently operates, and inquires whether the VM guest can be moved (S206).
As a result of the inquiry, when the VM guest can be moved (S207, Yes), the guest recalling process pg16 moves the VM guest to the original VM host (S208), updates the hash table, and ends the process.
When the VM guest cannot be moved (S207, No), the guest recalling process pg16 returns to Step S206 and periodically repeats the inquiry to the destination VM host. Alternatively, when the destination VM host becomes able to release the VM guest, it may notify the recalling VM host of that fact.
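The recall flow of Steps S203 to S208 can be sketched in the same spirit; the table and host objects, their methods, and the polling interval are illustrative assumptions.

import time


def recall_guests(self_node_table, hash_table, self_host, poll_seconds: int = 60) -> None:
    """Recall VM guests that originally operated on self_host (S203 to S208)."""
    guests = self_node_table.vm_guest_entries()                 # S203
    if not guests:                                              # S203: No -> end as is
        return
    for guest in guests:
        values = hash_table.lookup(guest.key)                   # S204: search t1 for the guest
        destination = values["vmhost"]                          # S205: host it presently operates on
        if destination == self_host.name:
            continue                                            # nothing to recall for this guest
        # S206/S207: inquire of the destination host, retrying periodically while refused
        while not self_host.ask_destination_can_move(destination, guest):
            time.sleep(poll_seconds)
        self_host.accept(guest)                                 # S208: recall to the original host
        hash_table.update_vmguest_entry(guest, new_host=self_host.name)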
Accordingly, when the VM guest is moved, the hash table t1 is rewritten, so that it is possible to determine on which VM host the VM guest currently operates. The self-node table t2 is not rewritten upon movement of the VM guest, and thus represents the VM guest that originally operated on the VM host. In addition, the self-node table t2 is kept in the SAN, so that the information is not lost due to a failure in the VM host or a restart of the VM host.
Therefore, even when a further abnormality occurs in the destination VM host to which the VM guest has been moved and the VM guest is moved repeatedly, the VM guest can be recalled quickly and reliably without tracing the movement track of the VM guest.
As described above, the management device, the management program, and the management method according to this embodiment monitor the operating state of the target device to be managed as a node of a target network to be managed, and move a process executed by the target device to be managed to another node when a sign of failure is detected. At the activation of the target device to be managed, a determination is made as to whether there is a process which has been moved to another node, and the moved process is then recalled. Thus, the moved process can surely be recalled.
According to the disclosed management device, the management program, and the management method of the present application, the moved VM guest can be recalled to the VM host on which the VM guest originally operated.
All examples and conditional language recited herein are intended for pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment of the present invention has been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation of International Application No. PCT/JP2010/061565, filed on Jul. 7, 2010 and designating the U.S., the entire contents of which are incorporated herein by reference.