This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2015-059641, filed on Mar. 23, 2015, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to an information processing apparatus, a computer-readable recording medium, an information processing method, and a data center system.
Conventionally, there is a technology for monitoring apparatuses, such as computers, and systems, and addressing a failure when the failure occurs in an apparatus or a system targeted for the monitoring. Furthermore, in the conventionally performed handling of failures, after a failure is detected, pieces of log information or the like on an apparatus or the like in which a failure occurs are collected and analyzed and then the handling is performed. Furthermore, failures that can be handled by specific engineers are limited to some extent. With regard to the conventional technology, see Japanese Laid-open Patent Publication No. 2011-118685, Japanese Laid-open Patent Publication No. 2006-318311, and Japanese Laid-open Patent Publication No. 2011-197785, for example.
However, if a failure occurs in a data center system constituted by a plurality of data centers, in the conventional technology, there may be a case in which it is difficult to appropriately select an engineer who handles the failure that has occurred. Thus, there is a problem in that it takes time to handle the failure that has occurred in the data center.
According to an aspect of an embodiment, an information processing apparatus includes a receiving unit and a specifying unit. The receiving unit receives information on a failure that has occurred in each of data centers arranged in a plurality of locations. The specifying unit compares area information that indicates a characteristic related to a failure in the data center in which the failure has occurred with area information that is associated with an engineer based on a task. Then, the specifying unit specifies, as a failure handling candidate from among engineers, an engineer who is associated with area information that is similar to the area information that indicates the characteristic related to the failure in the data center in which the failure has occurred.
According to another aspect of an embodiment, a computer-readable recording medium has stored therein an information processing program. The information processing program causes a computer to execute a process. The process includes: receiving information on a failure that has occurred in each of data centers arranged in a plurality of locations; comparing area information that indicates a characteristic related to a failure in the data center in which the failure has occurred with area information that is associated with an engineer based on a task; and specifying, as a failure handling candidate from among engineers, an engineer who is associated with area information that is similar to the area information that indicates the characteristic related to the failure in the data center in which the failure has occurred.
According to still another aspect of an embodiment, an information processing method includes: receiving, performed by a computer, information on a failure that has occurred in each of data centers arranged in a plurality of locations; comparing, performed by the computer, area information that indicates a characteristic related to a failure in the data center in which the failure has occurred with area information that is associated with an engineer based on a task; and specifying, performed by the computer, as a failure handling candidate from among engineers, an engineer who is associated with area information that is similar to the area information that indicates the characteristic related to the failure in the data center in which the failure has occurred.
According to still another aspect of an embodiment, a data center system includes data centers and an information processing apparatus. The data centers are arranged in a plurality of locations. The information processing apparatus includes a receiving unit and a specifying unit. The receiving unit receives information on a failure that has occurred in each of the data centers. The specifying unit compares area information that indicates a characteristic related to a failure in the data center in which the failure has occurred with area information that is associated with an engineer based on a task. The specifying unit then specifies, as a failure handling candidate from among engineers, an engineer who is associated with area information that is similar to the area information that indicates the characteristic related to the failure in the data center in which the failure has occurred.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
Preferred embodiments of the present invention will be explained with reference to accompanying drawings. In the embodiment, it is assumed that the present invention is used in a data center system that includes a plurality of data centers that provide virtual machines. Furthermore, the present invention is not limited to the embodiment. The embodiments can be appropriately used in combination as long as processes do not conflict with each other.
Configuration of a Data Center System According to an Embodiment
The management center 10 manages the plurality of data centers 11. For example, in accordance with the occurrence of failure, the management center 10 analyzes the failure state, estimates a requested skill, and specifies an appropriate engineer. Furthermore, the management center 10 may also be integrated with one of the data centers 11.
The data centers 11 are arranged in locations that are geographically separate from each other. In the embodiment, it is assumed that each of the data centers 11 is arranged in a different region, such as a different country. For example, it is assumed that the data centers 11A, 11B, and 11C are set in an area A, an area B, and an area C, respectively. In the embodiment, a case in which the three data centers 11A, 11B, and 11C are set in the area A, the area B, and the area C, respectively, is described as an example; however, two or more of the data centers 11 may also be set in the same area. Furthermore, the data centers 11 may also communicate with each other. In the description below, when the data centers 11A, 11B, and 11C are described without distinction, they are referred to as the data center 11.
Hardware Configuration of the Management Center
In the following, the functional configuration of the management center 10 will be described with reference to
The management center 10 includes a failure management server 100, a failure contact terminal 200, and a failure handling terminal 300. The failure management server 100, the failure contact terminal 200, and the failure handling terminal 300 are connected by, for example, a network inside the management center 10 and can communicate with each other. The network inside the management center 10 is connected to the network 12 such that communication is possible, and can thus communicate with the data center 11 via the network 12. Furthermore, in the example illustrated in
The failure management server 100 is an information processing apparatus that, in accordance with a failure in the data center 11, analyzes the failure state, estimates a requested skill, and specifies an appropriate engineer. For example, if the failure management server 100 receives information related to the failure that has occurred in the data center 11, the failure management server 100 specifies an engineer who handles the failure as a failure handling candidate, on the basis of area information that indicates the characteristic related to the occurrence of the failure in the data center 11 in which the failure has occurred. In the description below, a case will be described in which the failure management server 100 receives a notification of the occurrence of the failure in the data center 11 as the information related to the failure that has occurred in the data center 11.
Furthermore, the failure contact terminal 200 and the failure handling terminal 300 are implemented by, for example, a desktop personal computer (PC), a notebook PC, a tablet type terminal, a mobile phone device, a personal digital assistant (PDA), or the like. For example, the failure contact terminal 200 is used by a contact person who performs the contact task of a failure. For example, the failure handling terminal 300 is used by a failure handling candidate. In a description below, the failure contact terminal 200 may sometimes be referred to as a failure contact person. Namely, in a description below, a failure contact person can alternatively be read as the failure contact terminal 200. Furthermore, in a description below, the failure handling terminal 300 may sometimes be referred to as a failure handling candidate. Namely, in a description below, the failure handling candidate can alternatively be read as the failure handling terminal 300.
Configuration of the Failure Management Server (Information Processing Apparatus)
In the following, the configuration of the failure management server 100 according to the first embodiment will be described. As illustrated in
The communication unit 110 is implemented by, for example, a network interface card (NIC). The communication unit 110 is connected to, for example, the network 12 in a wired or a wireless manner. Then, the communication unit 110 sends and receives information to and from the data center 11 via the network 12. Furthermore, the communication unit 110 sends and receives information to and from the failure contact terminal 200 or the failure handling terminal 300 via, for example, the network inside the management center 10.
The storing unit 120 is a database that includes a storage device that stores therein various kinds of data. For example, the storing unit 120 includes, as a storage device, a hard disk, a solid state drive (SSD), an optical disk, or the like. Furthermore, the storing unit 120 may also use, as a storage device, a semiconductor memory, such as a random access memory (RAM), a flash memory, a nonvolatile static random access memory (NVSRAM), or the like, that can rewrite data.
The storing unit 120 stores therein an operating system (OS) or various kinds of programs that are executed in the control unit 130. For example, the storing unit 120 stores therein various kinds of programs including a program that executes a process of specifying an engineer, which will be described later. Furthermore, the storing unit 120 stores therein various kinds of data that are used by the programs executed in the control unit 130. The storing unit 120 according to the embodiment includes a failure handling recording database 120A, a failure handling person database 120B, and an area similarity database 120C. In the failure handling recording database 120A, failure information 121, log information 122, and requested skill information 123 are stored. In the failure handling person database 120B, engineer information 124, and holding skill information 125 are stored. Furthermore, in the area similarity database 120C, area similarity information 126 is stored. The storing unit 120 stores therein setting information 127 and unregistered skill information 128.
The failure information 121 is data that stores therein information related to a failure that has occurred in the data center system 1. For example, the failure information 121 stores therein information, such as the storage location of a file in which the content of a failure that has occurred in the data center system 1 is described for each failure; the storage location of a file in which the handling content of the failure is described; the status that indicates the handling state of the failure; the engineer who performed the handling; and the like.
The item of the failure ID is an area that stores therein identification information that identifies a failure that has occurred in the data center system 1. A failure ID is attached to each failure that has occurred in the data center system 1 as the identification information that identifies that failure. In the item of the failure ID, the failure ID that is attached to the failure that has occurred in the data center system 1 is stored. The item of the failure information file path is an area that stores therein the storage location of a file in which the content of the failure that is identified by the failure ID is described. The item of the handling action content file path is an area that stores therein the storage location of a file in which the content of the handling with respect to the failure that is identified by the failure ID is described. The item of the failure status is an area that stores therein the handling status of the failure that is identified by the failure ID. The item of the engineer ID is an area that stores therein identification information that identifies an engineer who handles the failure that has occurred in the data center system 1. A description thereof in detail will be described with reference to
The example illustrated in
The log information 122 is data that stores therein log information related to a failure that has occurred in the data center system 1. For example, in the log information 122, the log information acquired from the data center 11 in which the failure has occurred is included. For example, the log information 122 stores therein information, such as the storage location of a file in which a device log acquired from the data center 11 in which a failure has occurred is described; the storage location of a file in which a monitoring log acquired from the data center 11 in which a failure has occurred is described; a vendor name of a device in which a failure has occurred; and the like.
The item of the failure ID is an area that stores therein the identification information that identifies the failure that has occurred in the data center system 1. The item of the path of a device log directory is an area that stores therein the storage location of a file of the log information that is acquired from the device in which the failure identified by the failure ID has occurred. The item of the path of a monitoring log directory is an area that stores therein the storage location of a file of the log information that is acquired from the monitoring server that monitors the device in which the failure identified by the failure ID has occurred. The item of the vendor is an area that stores therein vendor information, such as a manufacturer name, a serial number of the device, or the like, that is related to the device in which the failure identified by the failure ID has occurred.
The example illustrated in
The requested skill information 123 is data that stores therein information indicating whether an engineer who handles each failure occurring in the data center system 1 is requested to have an ability to handle the failure (hereinafter, sometimes referred to as a “skill”). For example, the requested skill information 123 stores therein, for each failure, information indicating whether a skill related to each of the various kinds of OSs, various kinds of services, various kinds of networks, and various kinds of data storage (for example, disk) is requested.
The item of the failure ID is an area that stores therein the failure ID attached to a failure that has occurred in the data center system 1. The item of the X (OS) is an area that stores therein information indicating whether the skill related to the X (OS) has been requested to handle the failure identified by the failure ID. The item of the service A is an area that stores therein information indicating whether the skill related to the service A has been requested to handle the failure identified by the failure ID. The item of the network A is an area that stores therein information indicating whether the skill related to the network A has been requested to handle the failure identified by the failure ID. The item of the disk A is an area that stores therein information indicating whether the skill related to the disk A has been requested to handle the failure identified by the failure ID.
The example illustrated in
The engineer information 124 is data that stores therein information about the engineers registered in the data center system 1. For example, the engineer information 124 is data that stores therein information about the engineers belonging to each of the data centers. Furthermore, for example, the engineer information 124 stores therein information about the engineer ID, the name, the contact address of an engineer, the action time of an engineer, the data center to which an engineer belongs, the language that can be used by an engineer, and the like.
The item of the engineer ID is an area that stores therein the identification information that identifies the engineers registered in the data center system 1. An engineer ID is attached to each of the engineers registered in the data center system 1 as the identification information that identifies each of the engineers. The item of the engineer ID stores therein the engineer ID that is attached to each of the engineers registered in the data center system 1. The item of the name is an area that stores therein the name of the engineer identified by the engineer ID. The item of the contact address is an area that stores therein the contact address (for example, an email address, a phone number, or the like) of the engineer identified by the engineer ID. The item of the action time is an area that stores therein the time occupied by the engineer identified by the engineer ID. The item of the area information is an area that stores therein the area information associated with an engineer on the basis of a task. For example, the item of the area information is an area that stores therein the area in which the data center to which the engineer identified by the engineer ID belongs is located. The item of the number of tasks is an area that stores therein the number of tasks that are being handled by the engineer identified by the engineer ID. Furthermore, the engineer information 124 is not limited to the information indicated above and may also include therein various kinds of information, such as information on a non-working day of an engineer.
The example illustrated in
The holding skill information 125 is data that stores therein information related to the skills held by the engineers registered in the data center system 1. For example, the holding skill information 125 stores therein, for each engineer, information indicating whether the engineer has the skill related to various kinds of OSs, whether the engineer has the skill related to various kinds of services, whether the engineer has the skill related to various kinds of networks, and the like.
The item of the engineer ID is an area that stores therein the engineer ID attached to the engineer registered in the data center system 1. The item of the X (OS) is an area that stores therein information indicating whether the engineer identified by the engineer ID has the skill related to the X (OS) or the like. The item of the service A is an area that stores therein information indicating whether the engineer identified by the engineer ID has the skill related to the service A or the like. The item of the network A is an area that stores therein information indicating whether the engineer identified by the engineer ID has the skill related to the network A or the like. The item of the disk A is an area that stores therein information indicating whether the engineer identified by the engineer ID has the skill related to the disk A or the like.
The example illustrated in
The area similarity information 126 is data that stores therein the information related to the similarity between the data centers 11. For example, the area similarity information 126 stores therein the information related to the similarity between each pair of the area A, the area B, and the area C. Here, in the embodiment, the similarity takes values from 0 to 1. A similarity value closer to 0 indicates that the areas are dissimilar, whereas a similarity value closer to 1 indicates that the areas are similar. Furthermore, the similarity is calculated on the basis of the area information that indicates the characteristic related to the occurrence of a failure in the data center in which the failure occurs and that is created for each area. For example, the similarity between areas in which similar failures occur may also be set high. Furthermore, for example, the similarity between areas that are similar in climate may also be set high.
The item of the area A is an area that stores therein the similarity to the area A. The item of the area B is an area that stores therein the similarity to the area B. The item of the area C is an area that stores therein the similarity to the area C.
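As an illustration only, the area similarity information 126 could be held as a symmetric lookup table. The following is a minimal sketch under stated assumptions, not the storage format of the embodiment: the names `AREAS`, `_SIMILARITY`, and `area_similarity` are hypothetical, the values 0.92 (area A and area C) and 0.25 (area B and area C) are the ones used elsewhere in this description, and 0.40 (area A and area B) as well as the value 1.0 for an area compared with itself are assumptions made for the example.

```python
AREAS = ("A", "B", "C")

# Unordered pair of areas -> similarity in the range 0 to 1.
# 0.92 (A, C) and 0.25 (B, C) follow the example in the description;
# 0.40 (A, B) is a made-up value used only for illustration.
_SIMILARITY = {
    frozenset(("A", "C")): 0.92,
    frozenset(("B", "C")): 0.25,
    frozenset(("A", "B")): 0.40,
}


def area_similarity(area_x: str, area_y: str) -> float:
    """Return the similarity between two areas; the lookup is symmetric."""
    if area_x not in AREAS or area_y not in AREAS:
        raise KeyError(f"unknown area: {area_x!r} or {area_y!r}")
    if area_x == area_y:
        return 1.0  # assumption: an area is fully similar to itself
    value = _SIMILARITY[frozenset((area_x, area_y))]
    if not 0.0 <= value <= 1.0:
        raise ValueError("similarity must lie between 0 and 1")
    return value
```

Keying the table on an unordered pair means the similarity of the area A to the area C and that of the area C to the area A cannot drift apart.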
The example illustrated in
Setting information 127 is data that stores therein a defined value needed for each process. For example, the setting information 127 stores therein the information, such as the file name of a device log, the file name of a monitoring log, a parent directory name that loads a device log, a parent directory name that loads a monitoring log, a threshold used to determine the similarity of log information, a threshold used to determine the skill of an engineer, and the like.
The item of the file name of a device log is an area that stores therein the file name of the device log received from the data center 11. The item of the file name of a monitoring log is an area that stores therein the file name of the monitoring log received from the data center 11. The parent directory name that loads a device log is an area that stores therein the parent directory name that loads the received device log. The parent directory name that loads a monitoring log is an area that stores therein the parent directory name that loads the received monitoring log. The similarity determination threshold is an area that stores therein the threshold that is used to determine the similarity of the log information. The skill determination threshold is an area that stores therein the threshold that is used to determine whether an engineer has a sufficient skill.
The example illustrated in
A description will be given here by referring back to
The receiving unit 131 receives information related to a failure that has occurred in each of the data centers 11. For example, if a failure occurs in the data center 11, the receiving unit 131 receives information that is sent from the data center 11 and that is related to the failure that has occurred.
The extracting unit 132 extracts engineers who can handle the failure that has occurred. For example, the extracting unit 132 may also determine, on the basis of the various kinds of log information received from the data center 11, the type of the failure that has occurred. In this case, the extracting unit 132 may also determine, on the basis of various kinds of technologies, the content of the failure that has occurred.
The extracting unit 132 extracts the engineers who can handle the failure on the basis of, for example, the skills of the engineers stored in the holding skill information 125 in the storing unit 120. For example, the extracting unit 132 estimates, from the information related to the past failure handling, such as the failure information 121, the requested skill information 123, or the like, the skill that is requested to handle the failure detected by the receiving unit 131. For example, the extracting unit 132 may also search the failure information 121 in the storing unit 120 for a past failure in which the same problem as that of the current failure occurred and may also estimate the skill requested by the retrieved past failure as the skill that is requested to handle the failure that has currently occurred. Furthermore, the extracting unit 132 may also estimate the skill requested for a failure in which the same problem occurred in the past and that is still being investigated as the skill requested to handle the failure that has occurred.
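The estimation described above can be sketched as a lookup over past failure records. This is a minimal sketch under stated assumptions, not the method of the embodiment: the `PastFailure` record, its `problem` label, and the function `estimate_requested_skills` are hypothetical, and taking the union of the skills when several past failures match is an assumption the description does not make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PastFailure:
    failure_id: str
    problem: str                 # hypothetical label for the kind of problem
    requested_skills: frozenset  # skills that were requested for this failure


def estimate_requested_skills(problem, history):
    """Estimate the requested skills from past failures with the same problem.

    Assumption: if several past failures match, their skills are merged.
    """
    skills = set()
    for past in history:
        if past.problem == problem:
            skills |= past.requested_skills
    return skills
```

An empty result means that no past failure with the same problem was found, in which case some other estimation would be needed.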
The extracting unit 132 extracts engineers who have the estimated skill. Specifically, if a failure related to software has occurred, the extracting unit 132 extracts an engineer who has the estimated skill and whose action time includes the time at which the failure has occurred. For example, in the examples illustrated in
When the extracting unit 132 estimates the skill requested to handle the failure that is detected by the receiving unit 131, the extracting unit 132 may also extract the engineer who can handle the failure by taking into account the experience with the skill. For example, if experience is also requested, in addition to the skill of the “network A”, for the failure that has occurred, the extracting unit 132 does not need to extract the engineer “A03” who has the skill of the “network A” but has no experience. Furthermore, if the extracting unit 132 estimates a plurality of skills requested to handle the failure received by the receiving unit 131, the extracting unit 132 may also extract only the engineers who have all of the skills that are estimated as the requested skills. Furthermore, the extracting unit 132 may also extract an engineer who has at least a predetermined number of the plurality of skills estimated as the requested skills. For example, if the number of skills estimated as the requested skills is five, the extracting unit 132 may also extract an engineer who has three skills out of the requested five skills. Furthermore, the extracting unit 132 may also allocate a weighting value to each of the plurality of skills estimated as the requested skills and extract an engineer for whom the sum of the weighting values of the skills held by the engineer exceeds a threshold. Furthermore, the extracting unit 132 may also classify the plurality of skills estimated as the requested skills into fundamental skills and optional skills and extract an engineer who has the fundamental skills and has at least a predetermined number of the optional skills.
The extraction of an engineer, performed by the extracting unit 132, who handles a failure is only an example and the extracting unit 132 may also extract an engineer on the basis of various criteria in accordance with a failure that has occurred or in accordance with a purpose of the handling.
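The four extraction criteria described above can each be written as a simple predicate over sets of skills. The following is a minimal sketch under stated assumptions: the function names, parameter names, and the representation of skills as Python sets are hypothetical and are not part of the embodiment.

```python
def has_all(held, requested):
    """True if the engineer holds every requested skill."""
    return requested <= held


def has_at_least(held, requested, minimum):
    """True if the engineer holds at least `minimum` of the requested skills."""
    return len(held & requested) >= minimum


def weighted_sum_exceeds(held, weights, threshold):
    """True if the weights of the requested skills the engineer holds sum to
    more than `threshold`. `weights` maps each requested skill to its weight."""
    return sum(w for skill, w in weights.items() if skill in held) > threshold


def has_fundamental_and_optional(held, fundamental, optional, min_optional):
    """True if the engineer holds all fundamental skills and at least
    `min_optional` of the optional skills."""
    return fundamental <= held and len(held & optional) >= min_optional
```

For instance, with five requested skills and a predetermined number of three, `has_at_least(held, requested, 3)` accepts an engineer who holds any three of the five, whereas `has_all(held, requested)` rejects that engineer.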
Furthermore, if a plurality of extracted engineers is present, the extracting unit 132 may also prioritize the plurality of extracted engineers. In this case, the extracting unit 132 may also give a higher priority to an engineer whose remaining action time, measured from the time at which the failure has occurred, is longer. For example, if a failure occurs at 13:00 (JST) and if the engineer “A01” and the engineer “A03” are extracted as the available engineers, the extracting unit 132 may also give the first priority to the engineer “A03” whose action time is longer from 13:00 (JST). Furthermore, the extracting unit 132 may also give a higher priority to an engineer who has a greater number of skills that are estimated as the requested skills. Furthermore, the extracting unit 132 may also give a higher priority to an engineer for whom the sum of the weighting values of the held skills is greater. The prioritization, by the extracting unit 132, of the engineers who handle the failure described above is only an example and the extracting unit 132 may also prioritize the engineers on the basis of various criteria in accordance with a failure that has occurred or in accordance with a purpose of the handling.
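The prioritization by remaining action time can be sketched as a sort. This is a minimal sketch under stated assumptions: the function names are hypothetical, each engineer's action time is represented only by its end time, and the end times in the usage note below are invented for illustration because the actual action times are not reproduced in this description.

```python
from datetime import datetime, timedelta


def remaining_action_time(failure_time: datetime, action_end: datetime) -> timedelta:
    """Time left in the engineer's action time, counted from the failure."""
    return max(action_end - failure_time, timedelta(0))


def prioritize(failure_time, action_end_by_engineer):
    """Return engineer IDs, longest remaining action time first."""
    return sorted(
        action_end_by_engineer,
        key=lambda eid: remaining_action_time(failure_time, action_end_by_engineer[eid]),
        reverse=True,
    )
```

If, purely as an assumption, the action time of the engineer “A01” ended at 17:00 and that of the engineer “A03” ended at 21:00 on the day of a failure at 13:00, `prioritize` would place “A03” first, which matches the ordering in the example above.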
The specifying unit 133 specifies, as a failure handling candidate, the engineer who handles the failure from among the engineers extracted by the extracting unit 132. For example, if two engineers with the engineer ID of “A01” and “A02” are extracted by the extracting unit 132, the specifying unit 133 specifies, between the two engineers “A01” and “A02” as the failure handling candidates, the engineer who is allowed to handle the failure. The specifying unit 133 specifies the failure handling candidate on the basis of the comparison between the area information that indicates the characteristic related to the failure occurrence in the data center 11 in which the failure has occurred and the area information that is associated with the engineer on the basis of the task. For example, the specifying unit 133 specifies, as the failure handling candidate, the engineer associated with the area information that is similar to the area information on the data center 11 in which the failure has occurred. For example, if a failure occurs in the data center 11C located in the area C and if two engineers with the engineer ID of “A01” and “A02” are extracted by the extracting unit 132, the specifying unit 133 specifies the failure handling candidate on the basis of the area associated with each of the engineers. In this case, the area associated with the engineer with the engineer ID “A01” is the area A and the similarity to the area C in which the failure has occurred is 0.92. In contrast, the area associated with the engineer with the engineer ID “A02” is the area B and the similarity to the area C in which the failure has occurred is 0.25. Consequently, the specifying unit 133 specifies the engineer with the engineer ID “A01” associated with the area having a higher similarity as the failure handling candidate. Furthermore, the extracting unit 132 and the specifying unit 133 may also be integrated as a specifying unit.
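The selection in this example can be sketched as taking the extracted engineer whose associated area has the highest similarity to the area in which the failure occurred. This is a minimal sketch, not the implementation of the specifying unit 133: the function name `specify_candidate` and the dictionary layouts are hypothetical, while the engineer IDs, areas, and similarity values are those of the example above.

```python
def specify_candidate(failure_area, engineer_areas, similarity):
    """Return the ID of the engineer whose area is most similar to the
    area of the failure.

    engineer_areas: engineer ID -> area associated with that engineer
    similarity: (engineer area, failure area) -> similarity from 0 to 1
    """
    return max(
        engineer_areas,
        key=lambda eid: similarity[(engineer_areas[eid], failure_area)],
    )


# Values taken from the example in the description.
engineer_areas = {"A01": "A", "A02": "B"}
similarity = {("A", "C"): 0.92, ("B", "C"): 0.25}
candidate = specify_candidate("C", engineer_areas, similarity)
```

With these values, `candidate` is “A01”, because 0.92 is greater than 0.25.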
The transmitting unit 134 sends various kinds of information to the data center 11. Specifically, for example, the transmitting unit 134 may also send the information related to the engineer specified by the specifying unit 133 to the data center 11 in which the failure has occurred.
Hardware Configuration of the Data Center
In the following, the functional configuration of the data center 11 will be described with reference to
The data center 11 includes a monitoring server 13, a plurality of servers 14A, and a plurality of storage media 14B. The plurality of the servers 14A and the plurality of the storage media 14B are monitored by the monitoring server 13 for the occurrence of a failure. When the servers 14A and the storage media 14B are described without distinction, the servers 14A and the storage media 14B are referred to as monitored devices 14. The monitoring server 13 and the plurality of the monitored devices 14 are connected by, for example, the network inside the data center 11 and can communicate with each other. The network inside the data center 11 is connected to the network 12 such that communication is possible, and can thus communicate with the management center 10 or the other data centers 11 via the network 12. Furthermore, in the example illustrated in
The monitoring server 13 is, for example, a server device that monitors the monitored device 14. Specifically, the monitoring server 13 monitors whether a failure occurs in the monitored device 14.
The server 14A is, for example, a server device that provides various kinds of services to a user. Furthermore, the storage media 14B are, for example, storage devices that provide a service of storing various kinds of information acquired from the user.
Configuration of the Monitoring Server
In the following, the configuration of the monitoring server 13 according to the embodiment will be described. As illustrated in
The communication unit 31 is implemented by, for example, a network interface card (NIC). The communication unit 31 is connected to, for example, the network 12 in a wired or a wireless manner. Then, the communication unit 31 sends and receives information to and from the management center 10 or the other data centers 11 via the network 12. Furthermore, the communication unit 31 sends and receives information to and from the monitored device 14 via, for example, the network inside the data center 11.
The storing unit 32 is a storage device that stores therein various kinds of data. For example, the storing unit 32 is a storage device, such as a hard disk, a solid state drive (SSD), or an optical disk. Furthermore, the storing unit 32 may also be a rewritable semiconductor memory, such as a random access memory (RAM), a flash memory, or a nonvolatile static random access memory (NVSRAM).
The storing unit 32 stores therein Operating Systems (OSs) or various kinds of programs that are executed in the control unit 33. For example, the storing unit 32 stores therein various kinds of programs including a program that executes a migration control process, which will be described later. Furthermore, the storing unit 32 stores therein various kinds of data that are used by the program executed by the control unit 33. For example, the storing unit 32 stores therein setting information 40.
The setting information 40 is data that stores therein defined values needed for each process. For example, the setting information 40 stores therein the file name of a device log, the file name of a monitoring log, the name of a script or the like that is used to collect device logs and vendor information, the name of a script or the like that is used to collect monitoring logs, and information related to the data center.
The item of the file name of a device log is an area that stores therein the file name of the device log of the monitored device 14 in which a failure occurs. The item of the file name of a monitoring log is an area that stores therein the file name of a monitoring log of the monitoring server 13. The item of the script name or the like that is used to collect the device logs and vendor information is an area that stores therein the name of the script or the command that is used to collect the device logs and the vendor information. The item of the script name or the like that is used to collect the monitoring logs is an area that stores therein the name of the script or the command that is used to collect the monitoring logs. The item of the information related to the data center is an area that stores therein various kinds of information related to the data center, such as the name of a system administrator, the contact address, the name of the data center, and area information.
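As an illustration only, the items of the setting information 40 described above could be held in a structure such as the following. The class names, the field names, and every value in the usage example are hypothetical placeholders introduced for this sketch; the embodiment does not prescribe any particular representation.

```python
from dataclasses import dataclass


@dataclass
class DataCenterInfo:
    """Information related to the data center (field names are hypothetical)."""
    administrator_name: str
    contact_address: str
    data_center_name: str
    area_information: str


@dataclass
class SettingInformation:
    """Defined values used by the monitoring server (field names are hypothetical)."""
    device_log_file: str            # file name of the device log
    monitoring_log_file: str        # file name of the monitoring log
    device_log_collector: str       # script or command that collects device logs and vendor information
    monitoring_log_collector: str   # script or command that collects monitoring logs
    data_center: DataCenterInfo


# Placeholder values, for illustration only.
setting_information = SettingInformation(
    device_log_file="device.log",
    monitoring_log_file="monitoring.log",
    device_log_collector="collect_device_logs.sh",
    monitoring_log_collector="collect_monitoring_logs.sh",
    data_center=DataCenterInfo(
        administrator_name="(administrator)",
        contact_address="(contact address)",
        data_center_name="(data center name)",
        area_information="C",
    ),
)
```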
The example illustrated in
A description will be given here by referring back to
The detecting unit 50 detects a failure that occurs in the monitored device 14 or the like operated in the data center 11. For example, the detecting unit 50 detects the operational status of the data center 11. Specifically, the detecting unit 50 detects, as the operational status of the data center 11, the failure status held by the operational status checking system that is operating in the data center 11. For example, the detecting unit 50 detects whether a failure has occurred by using a log or a thermal error of the basic input/output system (BIOS) of the monitoring server 13 on which the operational status checking system is operated, an event log of the OS of a virtual machine, a monitoring ALARM message, or the like.
If a failure occurs in the data center 11, the transmitting unit 51 sends the information related to the failure that has occurred to the management center 10. For example, if a failure occurs in the data center 11, the transmitting unit 51 sends, to the management center 10, the device log of the monitored device 14 in which the failure has occurred, the monitoring log of the monitoring server 13, or the like.
The receiving unit 52 receives various kinds of information sent from the management center 10. For example, if a failure occurs in the data center 11, the receiving unit 52 receives information related to the engineer who handles the failure from the management center 10.
Here, an example of specifying an engineer who handles a failure when the failure has occurred in the data center 11 in the data center system 1 will be described with reference to
First, if the monitoring server 13 in the data center 11 detects a failure in the monitored device 14, such as the server 14A or the storage media 14B, the monitoring server 13 collects logs (see (1) illustrated in
The failure management server 100 that received the notification of the occurrence of the failure checks the logs received from the monitoring server 13 against the logs stored in the failure handling recording database 120A and creates a requested skill list (see (3) illustrated in
Thereafter, the failure management server 100 creates a failure handling candidate list by using the requested skill list and information related to the engineers stored in the failure handling person database 120B (see (4) illustrated in
Thereafter, the failure management server 100 attaches the failure handling candidate list to the mail received from the monitoring server and sends the mail to the failure contact terminal 200 (see (6) in
In the following, a calculation of the similarity of logs will be described with reference to
For example, the collection log, which is the log that is collected when a failure has occurred, indicates that three error codes are output in the order of a 273rd warning, a third error, and a fourth error and indicates that an alert is sent. In contrast, for example, a log A in the log information 122 stored in the failure handling recording database 120A indicates that three error codes are output in the order of a 295th warning, the third error, and the fourth error and indicates that an alert is sent. Accordingly, in the collection log and the log A, the error code that is output second is the same third error and the error code that is output third is the same fourth error. Here, in the embodiment, the failure management server 100 uses, as the similarity, the value obtained by dividing the number of matching error codes by the total number of error codes. Accordingly, the similarity of the collection log to the log A is ⅔=0.67.
In contrast, for example, a log B in the log information 122 stored in the failure handling recording database 120A indicates that three error codes are output in the order of a 101st warning, a 103rd warning, and the fourth error and indicates that an alert is sent. Accordingly, in the collection log and the log B, only the error code that is output third, namely the fourth error, is the same in both logs. Accordingly, the similarity of the collection log to the log B is ⅓=0.33.
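The calculation for the log A and the log B can be reproduced with the short sketch below. The function name `log_similarity` and the string labels for the error codes are assumptions made for illustration. The text only covers logs of equal length, so dividing by the length of the longer log is likewise an assumption.

```python
def log_similarity(collected, recorded):
    """Fraction of positions at which both logs hold the same entry.

    Entries are compared position by position, as in the example above.
    Assumption: when the logs differ in length, the longer one sets the denominator.
    """
    total = max(len(collected), len(recorded))
    if total == 0:
        return 0.0
    same = sum(1 for a, b in zip(collected, recorded) if a == b)
    return same / total


collection_log = ["warning 273", "error 3", "error 4"]
log_a = ["warning 295", "error 3", "error 4"]
log_b = ["warning 101", "warning 103", "error 4"]

print(round(log_similarity(collection_log, log_a), 2))  # 0.67
print(round(log_similarity(collection_log, log_b), 2))  # 0.33
```

Because the function only compares entries for equality, the same sketch applies unchanged when the entries are operations rather than error codes.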
Furthermore, the calculation example EX2 illustrated in
For example, the collection log indicates that an alert is sent after three operations are performed in the order of an operation A, an operation C, and an operation D. In contrast, for example, the log A in the log information 122 stored in the failure handling recording database 120A indicates that an alert is sent after three operations are performed in the order of an operation B, the operation C, and the operation D. Accordingly, in the collection log and the log A, the operation that is performed second is the same operation C and the operation that is performed third is the same operation D. Here, in the embodiment, the failure management server 100 uses the value obtained by dividing the number of the same operations by the number of all of the operations as the similarity. Accordingly, the similarity of the collection log to the log A is ⅔=0.67.
In contrast, for example, the log B in the log information 122 stored in the failure handling recording database 120A indicates that an alert is sent after three operations are performed in the order of an operation X, an operation Y, and an operation D. Accordingly, in the collection log and the log B, the operation that is performed third is the same operation D. Accordingly, the similarity of the collection log to the log B is ⅓=0.33.
Furthermore, the calculation example EX3 illustrated in
In the following, information that is updated at the time of a process of specifying an engineer (failure handling candidate) who performs failure handling will be described with reference to
First, when the failure management server 100 receives a notification of the occurrence of a failure, the failure management server 100 adds, to the failure information 121, the information related to the failure that has occurred. This point will be described with reference to
Furthermore, in addition to adding the information to the failure information 121, the failure management server 100 adds, to the log information 122, the information related to the failure that has occurred. This point will be described with reference to
In the following, a process of creating a requested skill list by the failure management server 100 will be described with reference to
In the example illustrated in
Thus, the failure management server 100 extracts the records with the failure ID of F01 and F03 in the requested skill information 123. Furthermore, if the number of extracted records is less than, for example, the threshold TH12 indicated in
In the following, a process of creating a failure handling candidate list by the failure management server 100 will be described with reference to
First, the failure management server 100 calculates a skill value and an experience value of each of the engineers by using the requested skill list and the holding skill information 125. In the example illustrated in
Furthermore, when the failure management server 100 calculates an experience value, the failure management server 100 sums the aggregate values in the requested skill list for the items marked “experienced” in the holding skill information T125-1. For example, the engineer with the engineer ID “A02” has experience with the X (OS) and the service A. Thus, the failure management server 100 calculates the experience value of the engineer with the engineer ID “A02” as 2, which is obtained by adding the aggregate value 1 of the X (OS) and the aggregate value 1 of the service A.
Here, the failure management server 100 extracts engineers with the skill value equal to or greater than a predetermined threshold. In the example illustrated in
The failure management server 100 extracts the record of each target engineer from the engineer information 124, creates the engineer information T124-1, and adds the skill value and the experience value of each of the engineers. Then, the failure management server 100 creates engineer information T124-2 by replacing the area information in the engineer information T124-1 with the similarity between the area associated with each of the engineers and the area in which the data center 11 in which the failure has occurred is located. For example, the area associated with the engineer with the engineer ID “A02” is the area B, and the data center 11 in which the failure has occurred is located in the area C. Thus, the failure management server 100 replaces the area information in the record of the engineer with the engineer ID “A02” with the similarity of “0.25” between the area B and the area C. Furthermore, for example, the area associated with the engineer with the engineer ID “A03” is the area A. Thus, the failure management server 100 replaces the area information in the record of the engineer with the engineer ID “A03” with the similarity of “0.92” between the area A and the area C.
Thereafter, by using the engineer information T124-2 in which the area information is replaced by the similarity, the failure management server 100 classifies the candidates for the failure handling, which will be described in detail later. Furthermore, the failure management server 100 sends a mail to the failure contact terminal 200 on the basis of the engineer information T124-2. For example, the contact person who uses the failure contact terminal 200 determines, on the basis of the information acquired from the failure contact terminal 200, the engineer (failure handling candidate) who performs the failure handling. Furthermore, the failure management server 100 may also determine, on the basis of the engineer information T124-2, the engineer (failure handling candidate) who performs the failure handling. Furthermore, for example, when the failure management server 100 and the contact person who uses the failure contact terminal 200 allow all of the specified failure handling candidates to perform the failure handling, the failure management server 100 and the contact person do not need to perform the determination described above.
In the following, the flow of a process performed after an engineer who performs the failure handling has been specified will be described with reference to
First, the failure handling candidate (the failure handling terminal 300) acquires the failure state via a hearing from the data center 11 in which a failure has occurred (see (1) in
Then, on the basis of the information or the like obtained from the logs or the hearing, the failure handling candidate checks and handles the failure that has occurred (see (4) in
After the failure handling has been completed, the failure handling candidate records the status in the failure management server 100 (see (5) in
Then, the failure management server 100 records the failure handling recorded by the failure handling candidate in the failure handling recording database 120A (see (6) in
In the following, the information that is updated after the failure handling has been completed will be described with reference to
First, if the failure handling has been completed, the failure management server 100 updates the information, in the failure information 121, that is related to the record associated with the failure that has been handled. This point will be described with reference to
Then, if the failure handling has been completed, the failure management server 100 adds the information on the record associated with the failure that has been handled to the requested skill information 123. This point will be described with reference to
Furthermore, if the failure handling has been completed, the failure management server 100 updates the information, in the engineer information 124, on the record that is associated with the failure handling candidate and that has the failure ID “F05”. This point will be described with reference to
Furthermore, if the failure handling has been completed, the failure management server 100 updates the information on the record, in the holding skill information 125, that is associated with the engineer who has the engineer ID “A03” and who is the failure handling candidate. This point will be described with reference to
In the following, a case in which an unregistered skill is added to a skill item will be described on the basis of
The unregistered skill information 128 is data that stores therein the information related to unregistered skills that have not been added to the skill items in the requested skill information 123 and the holding skill information 125. For example, if “other” is selected when the failure handling process is recorded, the failure management server 100 registers, in the unregistered skill information 128, the content of the skill that is entered as text, the failure ID thereof, and the engineer ID.
The item of the table ID is an area that stores therein the identification information that identifies the information related to the unregistered skill that has been registered. A table ID is attached, as the identification information that identifies each piece of information, to the information related to each unregistered skill that has been registered in the unregistered skill information 128. In the item of the table ID, the table ID attached to the information related to the unregistered skill that has been registered is stored. The item of the failure ID is an area that stores therein the identification information that identifies the failure that occurs in the data center system 1. For example, in the item of the failure ID, the failure ID that is input when “other” is selected at the time of recording the failure handling process is stored. The item of the skill content is an area that stores therein the skill content requested when the failure handling process is performed. The item of the registered engineer ID is an area that stores therein the engineer ID of the failure handling candidate. For example, in the item of the registered engineer ID, the engineer ID that is input when “other” is selected at the time of recording the failure handling process is stored.
The example illustrated in
In the following, a description will be given of an example in which the unregistered skill in the unregistered skill information 128 is added to the skill item in the requested skill information 123 or the holding skill information 125. Specifically, a description will be given of an example in which the skill content “service B (software)” of T01 and the skill content “service B (platform)” of T03 are integrated into a single skill item of “service B” and the integrated “service B” is added to the requested skill information 123 and the holding skill information 125. In this way, similar skills in the unregistered skill information 128 may also be added to the requested skill information 123 and the holding skill information 125 as an integrated skill item.
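One way the integration of T01 and T03 might be carried out is sketched below. The embodiment does not state how similar skills are recognized; grouping entries by their name with a trailing parenthesized qualifier removed is purely an assumption of this sketch, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def base_skill_name(skill_content):
    """Drop a trailing parenthesized qualifier, e.g. 'service B (software)' -> 'service B'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", skill_content).strip()


def integrate_unregistered(entries):
    """Group table IDs of unregistered skills under one integrated skill item.

    `entries` is a list of (table ID, skill content) pairs.
    """
    merged = defaultdict(list)
    for table_id, skill_content in entries:
        merged[base_skill_name(skill_content)].append(table_id)
    return dict(merged)


entries = [("T01", "service B (software)"), ("T03", "service B (platform)")]
print(integrate_unregistered(entries))  # {'service B': ['T01', 'T03']}
```

In practice an operator would likely review such groupings before the integrated item is added, since a purely textual rule can merge skills that are not actually equivalent.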
First, the failure management server 100 adds the unregistered skill to the requested skill information 123 as a skill item. This point will be described with reference to
Furthermore, the failure management server 100 adds the unregistered skill to the holding skill information 125 as the skill item. This point will be described with reference to
Furthermore, the failure management server 100 may also update the area similarity information at predetermined intervals (for example, once a week or the like). An example of a process in which the failure management server 100 updates the area similarity information will be described below. For example, the failure management server 100 extracts, from the records in the failure information, each record in which the failure status is “completed”. For example, the failure management server 100 performs a statistical process on the basis of the “failure section” of the file indicated by the “handling action content file path” and the “area information on the data center in which the failure has occurred” in the extracted record, and aggregates the results for each area. For example, the failure section may also be created on the basis of the failure caused by a geographical characteristic. The geographical characteristic mentioned here may include various kinds of information, such as a climate, the stability of the electrical power supply, or the like. For example, the failure section may also include the climate calculated on the basis of the frequency of the failure caused by a temperature and humidity as the geographical characteristic. Furthermore, for example, the failure section may also include an environment that is calculated on the basis of the frequency of the failure caused by the environment, such as cosmic rays, a hardware failure, or the like. Furthermore, for example, the failure section may also be created on the basis of the failure that occurred in the data center 11 in the past. For example, the failure section may also include the hardware quality or the software quality calculated on the basis of the frequency of, for example, a hardware failure.
Furthermore, for example, the failure section may also include the learning level of an operator calculated on the basis of the frequency of the failure caused by, for example, an operation error and a setting error. Furthermore, the failure section may also be subdivided in accordance with the purpose. For example, the failure section “climate” may also be divided into a “high-temperature environment failure”, a “low-temperature environment failure”, a “failure caused by the humidity”, and the like. Then, for all of the combinations of the areas, the failure management server 100 calculates the similarity between the areas from the itemized aggregate values obtained by the aggregation for each area and updates the area similarity information with the obtained result.
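The update process described above can be sketched as follows. The embodiment does not specify which similarity measure is applied to the itemized aggregate values; cosine similarity is used here purely as an assumption. The function names and the sample records are also hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def aggregate_by_area(completed_records):
    """Count failure sections per area from (area, failure section) pairs."""
    counts = {}
    for area, section in completed_records:
        counts.setdefault(area, Counter())[section] += 1
    return counts


def cosine_similarity(c1, c2):
    """Cosine similarity of two count vectors (an assumed measure, not from the source)."""
    dot = sum(c1[s] * c2[s] for s in set(c1) | set(c2))
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def update_area_similarity(completed_records):
    """Compute the similarity for every combination of two areas."""
    counts = aggregate_by_area(completed_records)
    return {
        (a, b): cosine_similarity(counts[a], counts[b])
        for a, b in combinations(sorted(counts), 2)
    }


# Hypothetical records whose failure status is "completed".
records = [
    ("A", "climate"), ("A", "climate"),
    ("B", "operator learning level"),
    ("C", "climate"),
]
similarities = update_area_similarity(records)
```

With these sample records, the areas A and C share the same failure profile and score highest, while the area B, which has only operator-related failures, scores zero against both.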
Flow of the Process Performed in the Data Center System
In the following, the flow of the process performed by the data center system 1 according to the embodiment will be described on the basis of
If the operation is not possible (No at Step s103), the monitored device 14 that accepted the request from the monitoring server 13 sends an error response to the monitoring server 13 (Step s104). In contrast, if the operation is possible (Yes at Step s103), the monitored device 14 executes the script and collects logs and the vendor information (Step s105). Thereafter, the monitored device 14 sends the collected information to the monitoring server 13 (Step s106).
The monitoring server 13 that received the information from the monitored device 14 executes the monitoring log collection script and collects the monitoring logs (Step s107). Then, the monitoring server 13 creates a mail that describes the DC information, that is, the information related to the data center defined in the setting file (Step s108).
Then, if an error response is received from the monitored device 14 at Step s104 (Yes at Step s109), the monitoring server 13 attaches the collected logs to the created mail and sends the mail to the management center 10 (Step s110). Then, the management center 10 that received the mail performs the process at Step s112 illustrated in
In the following, the process performed on the management center 10 side that received the mail will be described.
First, the control unit 130 in the management center 10 that received the mail from the monitoring server 13 issues a failure ID (Step s112). Then, the control unit 130 acquires a log file from the incoming email and loads the acquired file (Step s113). Furthermore, the control unit 130 acquires the area information and the device vendor information from the incoming email (Step s114). Furthermore, the control unit 130 registers the issued ID (issued failure ID) and the area information in the failure handling recording database 120A (hereinafter, referred to as a failure handling record DB 120A) (Step s115).
The failure handling record DB 120A that received the registration from the control unit 130 adds a new record (Step s116). Then, the failure handling record DB 120A sets the input ID that is the failure ID acquired from the control unit 130 in the failure ID (Step s117). Furthermore, the failure handling record DB 120A sets “not yet started” in the failure status (Step s118). Furthermore, the failure handling record DB 120A sets the input area information in the “area information on the data center in which the failure has occurred” and notifies the control unit 130 of the input area information (Step s119).
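Steps s116 to s119 can be pictured with the following minimal sketch, in which the failure information is modeled as an in-memory list of dictionaries. The function name `register_failure`, the key names, and the list-based storage are assumptions for illustration; the embodiment describes a database, not this structure.

```python
def register_failure(failure_information, failure_id, area_information):
    """Add a new failure record with its initial status and return it."""
    record = {                                  # Step s116: add a new record
        "failure_id": failure_id,               # Step s117: set the input failure ID
        "failure_status": "not yet started",    # Step s118: set the initial status
        "area_information": area_information,   # Step s119: set the input area information
    }
    failure_information.append(record)
    return record


failure_information = []
new_record = register_failure(failure_information, "F05", "C")
```

The returned record stands in for the notification that the failure handling record DB 120A sends back to the control unit 130.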
The control unit 130 that received the notification from the failure handling record DB 120A registers the failure ID, the path for the log file, and the vendor information in the failure handling record DB 120A (Step s120).
The failure handling record DB 120A that accepted the registration from the control unit 130 adds a new record (Step s121). Then, the failure handling record DB 120A sets the input ID that is the failure ID acquired from the control unit 130 to the failure ID (Step s122). Furthermore, if the device log is also registered (Yes at Step s123), the failure handling record DB 120A sets an input path to the device log and the file path for the monitoring log, sets the input vendor information to the vendor, and notifies the control unit 130 of the result (Step s124). Thereafter, the control unit 130 that received the notification performs the process at Step s127 illustrated in
As illustrated in
The failure handling record DB 120A that received the request searches the failure information 121 for records by using the condition that the failure status is “completed” (Step s128). Then, the failure handling record DB 120A returns the list of the matching failure IDs to the control unit 130 (Step s129).
The control unit 130 that acquired the list of the matching failure IDs requests, from the failure handling record DB 120A (the log information 122), the records that have the acquired failure IDs (acquired IDs) (Step s130).
The failure handling record DB 120A that accepted the request extracts the record from the log information 122 by using the input ID that is the input failure ID as a key and returns the extracted record to the control unit 130 (Step s131).
The control unit 130 that acquired the extracted record from the failure handling record DB 120A sets the variable i to 0, performs the process at Steps s133 to s135, and repeats the process of incrementing the variable i by 1 by the number of times that corresponds to the number of extracted records (Step s132). First, the control unit 130 calculates the similarity of the log and the vendor information related to the failure that has occurred and the log and the vendor information related to the record i (Step s133). For example, by calculating the similarity of the logs illustrated in
After the control unit 130 ends the processes that are repeatedly performed at Steps s132 to s135, if the number of IDs acquired at Step s135 is greater than the predetermined threshold (Yes at Step s136), the control unit 130 performs the process at Step s137 illustrated in
As illustrated in
The failure handling record DB 120A that accepted the request extracts a record from the requested skill information 123 by using the input ID that is the input failure ID as a key and returns the extracted record to the control unit 130 (Step s138).
The control unit 130 that acquired the extracted record from the failure handling record DB 120A creates the list that has each of the skill items of the extracted records (Step s139). For example, the control unit 130 may also create, on the basis of the list of the skill item in the requested skill table, a list that has each of the skill items of the extracted records. For example, the control unit 130 may also create a list that has each of the skill items of the record extracted from the requested skill list that is created on the basis of the process illustrated in
Here,
The failure handling person DB 120B that accepted the request returns, to the control unit 130, all of the records in the holding skill information 125 as the extracted record (Step s202).
The control unit 130 that acquired the extracted record from the failure handling person DB 120B creates an empty temporary file (Step s203). Then, the control unit 130 sets the variable i to 0 and repeats the processes at Steps s205 to s213, incrementing the variable i by 1 each time, by the number of times corresponding to the number of extracted records (Step s204). In each iteration, the control unit 130 first sets the skill value to 0 and sets the experience value to 0 (Step s205). Then, the control unit 130 sets the variable j to 0 and repeats the processes at Steps s207 to s211, incrementing the variable j by 1 each time, by the number of times corresponding to the number of extracted records (Step s206). In each of these iterations, the control unit 130 first sets the list value to the value of the item j in the requested skill list (Step s207).
Then, if the skill item j of the record i is “skilled” (Yes at Step s208), the control unit 130 updates the skill value to the value obtained by adding the skill value to the list value (Step s209). Thereafter, the control unit 130 performs the process at Step s210. Furthermore, if the skill item j of the record i is not “skilled” (No at Step s208), the control unit 130 performs the process at Step s210.
If the skill item j of the record i is “experienced” (Yes at Step s210), the control unit 130 updates the experience value to the value obtained by adding the experience value to the list value (Step s211). Then, the control unit 130 returns to Step s206 and repeats the processes. Furthermore, if the skill item j of the record i is not “experienced” (No at Step s210), the control unit 130 returns to Step s206 and repeats the processes.
After the control unit 130 ends the processes that are repeatedly performed at Steps s206 to s211, the control unit 130 determines whether the updated skill value is equal to or greater than the predetermined threshold (Step s212). If the updated skill value is equal to or greater than the predetermined threshold (Yes at Step s212), the control unit 130 outputs the engineer ID, the skill value, and the experience value to the temporary file (Step s213). Then, the control unit 130 returns to Step s204 and repeats the processes. In contrast, if the updated skill value is less than the predetermined threshold (No at Step s212), the control unit 130 returns to Step s204 and repeats the processes without outputting the record.
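The nested loops at Steps s204 to s213 can be sketched as follows. All names, the sample engineer IDs, the skill items, and the threshold value are hypothetical. The sketch also assumes that the inner loop runs over the items of the requested skill list, and it returns a list where the embodiment writes to a temporary file.

```python
def score_engineers(requested_skill_list, holding_skills, skill_threshold):
    """Return (engineer ID, skill value, experience value) for engineers who pass the threshold."""
    results = []
    for engineer_id, items in holding_skills.items():           # loop over records i (Step s204)
        skill_value = 0                                         # Step s205
        experience_value = 0
        for item, list_value in requested_skill_list.items():   # loop over items j (Steps s206, s207)
            level = items.get(item)
            if level == "skilled":                              # Step s208
                skill_value += list_value                       # Step s209
            if level == "experienced":                          # Step s210
                experience_value += list_value                  # Step s211
        if skill_value >= skill_threshold:                      # Step s212
            results.append((engineer_id, skill_value, experience_value))  # Step s213
    return results


# Hypothetical requested skill list: skill item -> aggregate value.
requested = {"skill P": 2, "skill Q": 1}

# Hypothetical holding skill information: engineer ID -> skill item -> level.
holding = {
    "E01": {"skill P": "skilled", "skill Q": "experienced"},
    "E02": {"skill P": "experienced", "skill Q": "experienced"},
}

print(score_engineers(requested, holding, skill_threshold=1))  # [('E01', 2, 1)]
```

In this sample the engineer “E02” has an experience value of 3 but a skill value of 0, so the record is not output, which mirrors the fact that only the skill value is compared with the threshold at Step s212.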
After the control unit 130 ends the processes repeatedly performed at Steps s204 to s213, the control unit 130 reads the created temporary file (Step s214). Then, the control unit 130 performs the process at Step s215 illustrated in
As illustrated in
The failure handling person DB 120B that accepted the request extracts a record from the engineer information 124 by using the input ID as a key and then returns the extracted record to the control unit 130 (Step s216).
The control unit 130 that acquired the record from the failure handling person DB 120B creates a temporary table in which the columns of the “skill value” and the “experience value” are added to the returned record (Step s217).
Then, the control unit 130 sets the variable i to 0 and repeats the processes at Steps s219 and s220, incrementing the variable i by 1 each time, by the number of times corresponding to the number of records that are output to the temporary file (Step s218). In each iteration, the control unit 130 first acquires, from the read data, the information on the “engineer ID”, the “skill value”, and the “experience value” in the i-th output record (Step s219). Then, the control unit 130 sets the acquired “skill value” and “experience value” in the items of the “skill value” and the “experience value”, respectively, in the record in the temporary table that matches the acquired engineer ID (Step s220). Then, the control unit 130 returns to Step s218 and repeats the processes.
After the control unit 130 ends the processes repeatedly performed at Steps s218 to s220, the control unit 130 refers to the mail and acquires the area information on the data center (DC) (Step s221).
Then, after the control unit 130 sets the variable i to 0, the control unit 130 performs the processes at Steps s223 and s224 and repeats the process of incrementing the variable i by 1 by the number of times corresponding to the number of records in the temporary table (Step s222). First, the control unit 130 acquires the similarity between the areas registered in the area similarity database 120C (hereinafter, referred to as the area similarity DB 120C) from the area information in the table (=record i) and the area information acquired at Step s221 (Step s223). Then, the control unit 130 overwrites the value acquired at Step s223 to the area information in the table (=record i) (Step s224). For example, the control unit 130 may also overwrite the area information on the basis of the area similarity information 126 illustrated in
As illustrated in
In the following, the process performed after a failure handling candidate is specified will be described with reference to
First, a responsible person (contact person) at the contact desk inputs, at the failure contact terminal 200, the “engineer ID” and the “failure ID” to the failure management server 100 (Step s307).
The control unit 130 in the failure management server 100 that received an input from the failure contact terminal 200 inputs the engineer ID and the failure ID to the failure handling record DB 120A (the failure information 121) (Step s308). For example, the control unit 130 notifies the failure handling record DB 120A that a failure handling candidate has been input.
The failure handling record DB 120A that received the input from the control unit 130 sets the input engineer ID in the item of the “engineer ID” in the record, in the failure information 121, that has the input failure ID and notifies the control unit 130 of the setting (Step s309).
The control unit 130 that received the notification from the failure handling record DB 120A inputs the engineer ID to the failure handling person DB 120B (the engineer information 124) (Step s310). For example, the control unit 130 notifies the failure handling person DB 120B that the engineer information has been updated.
The failure handling person DB 120B that received the input from the control unit 130 increments the item of the “number of tasks” in the record that has the input ID in the engineer information 124 by 1 and notifies the control unit 130 of this state (Step s311).
The control unit 130 that received the notification from the failure handling person DB 120B notifies the failure contact terminal 200 (the failure contact person) of the completion of the registration (Step s312).
The failure contact terminal 200 (the failure contact person) checks the completion of the registration process that is received from the control unit 130 in the failure management server 100 (Step s313), whereby the registration process has been completed.
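As a rough illustration of Steps s308 to s312, the following Python sketch models the failure information 121 and the engineer information 124 as in-memory dictionaries. The function name assign_engineer, the dictionary keys, and the returned message are hypothetical, and looking up both records before changing either one is a design choice of this sketch rather than something the description requires.

```python
from typing import Any, Dict


def assign_engineer(
    failure_info: Dict[str, Dict[str, Any]],
    engineer_info: Dict[str, Dict[str, Any]],
    failure_id: str,
    engineer_id: str,
) -> str:
    """Record the selected engineer and count the new task."""
    # Look up both records first so that an unknown ID raises KeyError
    # before anything is modified (assumption of this sketch).
    failure = failure_info[failure_id]
    engineer = engineer_info[engineer_id]

    failure["engineer_id"] = engineer_id  # Step s309
    engineer["number_of_tasks"] += 1      # Step s311
    return "registration completed"       # Step s312
```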
In the following, the registration process performed on the failure information will be described with reference to
First, the failure handling candidate inputs, at the failure handling terminal 300, the “failure ID” and the failure information to the failure management server 100 (Step s314).
The control unit 130 in the failure management server 100 that received an input from the failure handling terminal 300 stores therein the failure information as a file (Step s315). Then, the control unit 130 inputs the file path that is stored together with the failure ID to the failure handling record DB 120A (Step s316).
The failure handling record DB 120A that received the input from the control unit 130 extracts the record from the failure information 121 by using the input ID as a key (Step s317). Then, the failure handling record DB 120A sets the input file path to the “failure information file path” in the extracted record (Step s318). Thereafter, the failure handling record DB 120A changes the “failure status” of the extracted record from “not yet started” to “being investigated” and notifies the control unit 130 of the status (Step s319).
The control unit 130 that received the notification from the failure handling record DB 120A notifies the failure handling terminal 300 (the failure handling candidate) of the completion of the registration (Step s320).
The failure handling terminal 300 (the failure handling candidate) that received the notification from the control unit 130 in the failure management server 100 checks the completion of the registration process (Step s321), whereby the registration process has been completed.
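The record update at Steps s317 to s319 can be sketched as follows in Python. This is a sketch under stated assumptions: the function name register_failure_information and the dictionary keys are hypothetical, the file is assumed to have been stored already at Step s315, and rejecting a record whose status is not “not yet started” is an added guard that the description does not mention.

```python
from typing import Any, Dict


def register_failure_information(
    failure_info: Dict[str, Dict[str, Any]],
    failure_id: str,
    file_path: str,
) -> str:
    """Attach the stored file path and advance the failure status."""
    record = failure_info[failure_id]  # Step s317: extract by failure ID
    if record["failure_status"] != "not yet started":
        # Assumption: only a failure that has not been started may move on.
        raise ValueError(f"unexpected status: {record['failure_status']}")
    record["failure_information_file_path"] = file_path  # Step s318
    record["failure_status"] = "being investigated"      # Step s319
    return record["failure_status"]
```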
In the following, the registration process performed on the failure information will be described with reference to
First, a contact person logs into the failure management server 100 by using the input screen at the failure handling terminal 300 (Step s401). At this point, the contact person may also be a failure handling candidate or may also be another contact person who acquired the information requested for the registration from the failure handling candidate.
The control unit 130 in the failure management server 100 into which the contact person has logged requests the list of skills from the failure handling record DB 120A (Step s402).
The failure handling record DB 120A that received the request returns the information on the items in the table in the requested skill information 123 to the control unit 130 (Step s403).
The control unit 130 that acquired the item information in the table in the requested skill information 123 creates an input screen and displays the input screen on the failure handling terminal 300 (Step s404).
Then, the contact person inputs various kinds of information in the input screen that is displayed on the failure handling terminal 300 (Step s405). At this time, the contact person may also input a skill by using a method of selecting the skill from the list.
The control unit 130 in the failure management server 100 that received the input from the failure handling terminal 300 stores the handling action content as a file (Step s406). Then, the control unit 130 inputs the failure ID and the saved file path to the failure handling record DB 120A (Step s407).
The failure handling record DB 120A that received the input extracts the record that has the input ID (Step s408). Then, the failure handling record DB 120A sets the file path in the item of the “handling action content file path” (Step s409). Then, the failure handling record DB 120A changes the failure status from “being investigated” to “completed” and notifies the control unit 130 of the change (Step s410). The control unit 130 that received the notification from the failure handling record DB 120A performs the process at Step s411 illustrated in
As illustrated in
The failure handling record DB 120A that received the input adds a new record to the requested skill information 123 (Step s412). Then, the failure handling record DB 120A sets the failure ID (Step s413). Then, the failure handling record DB 120A sets “Yes” in the item of the corresponding skill, sets “No” in the other items, and notifies the control unit 130 of the result (Step s414).
The control unit 130 that received the notification from the failure handling record DB 120A inputs the input ID to the failure handling person DB 120B (Step s415).
The failure handling person DB 120B that received the input extracts the record that has the input ID from the engineer information 124 (Step s416). Then, the failure handling person DB 120B decrements the number of tasks of the extracted record by 1 and notifies the control unit 130 of the result (Step s417).
The control unit 130 that received the notification from the failure handling person DB 120B inputs the engineer ID and the input skill to the failure handling person DB 120B (Step s418).
The failure handling person DB 120B that received the input extracts the record that has the input ID from the holding skill information 125 (Step s419). Then, the failure handling person DB 120B sets “experienced” to each of the items of the input skill in the extracted record and notifies the control unit 130 of it (Step s420).
The control unit 130 that received the notification from the failure handling person DB 120B notifies the failure handling terminal 300 (contact person) of the completion of the input (Step s421).
The failure handling terminal 300 (contact person) that received the notification from the control unit 130 in the failure management server 100 checks the completion of the input (Step s422), whereby the registration process has been completed.
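The database updates at Steps s412 to s420 can be summarized with the following Python sketch. The function name complete_handling, the dictionary layouts, and the skill names used in the usage note are hypothetical; the sketch only mirrors the three updates described above and does not cover the file storage or the status change at Steps s406 to s410.

```python
from typing import Any, Dict, Iterable, List


def complete_handling(
    requested_skills: Dict[str, Dict[str, str]],
    engineer_info: Dict[str, Dict[str, Any]],
    holding_skills: Dict[str, Dict[str, str]],
    failure_id: str,
    engineer_id: str,
    used_skills: Iterable[str],
    skill_items: List[str],
) -> None:
    """Apply the updates made when a handling action is registered."""
    used = set(used_skills)

    # Steps s412 to s414: new requested-skill record for this failure,
    # "Yes" for each skill that was needed and "No" for the others.
    requested_skills[failure_id] = {
        skill: ("Yes" if skill in used else "No") for skill in skill_items
    }

    # Steps s416 and s417: the engineer has one task fewer.
    engineer_info[engineer_id]["number_of_tasks"] -= 1

    # Steps s419 and s420: mark each used skill as experienced.
    for skill in used:
        holding_skills[engineer_id][skill] = "experienced"
```

For example, calling the function with used_skills=["network"] and skill_items=["network", "storage"] records “Yes” for network and “No” for storage in the new requested-skill record.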
In the following, the additional process of the skill item will be described with reference to
First, the administrator of the management center 10 inputs the skill name and the table ID to the failure management server 100 (Step s501). Furthermore, the administrator may also input the skill name and the table ID to the failure management server 100 via the dedicated terminal or may also directly input the subject data to the failure management server 100.
The control unit 130 in the failure management server 100 that received the input from the administrator of the management center 10 inputs the input skill name and the table ID to the failure handling record DB 120A (Step s502).
The failure handling record DB 120A that received the input adds a skill item to the requested skill information 123 (Step s503). Then, the failure handling record DB 120A sets “Yes” to the value of the skill item added to the record that has the input table ID in the requested skill information 123 (Step s504). Furthermore, the failure handling record DB 120A sets “No” to the value of the skill item added to the record that does not have the input table ID and notifies the control unit 130 of the result (Step s505).
The control unit 130 that received the notification from the failure handling record DB 120A inputs the input skill name and the table ID to the failure handling person DB 120B (Step s506).
The failure handling person DB 120B that received the input adds a skill item to the holding skill information 125 (Step s507). Then, the failure handling person DB 120B sets “skilled/experienced” to the value of the skill item added to the record that has the input table ID (Step s508). Furthermore, the failure handling person DB 120B sets “unskilled/unexperienced” to the value of the skill item added to the record that does not have the input table ID and notifies the control unit 130 of the result (Step s509).
The control unit 130 that received the notification from the failure handling person DB 120B inputs the input table ID to the storing unit 120 (hereinafter, referred to as a DB 120) (Step s510).
The DB 120 that received the input deletes the record that has the input table ID from the unregistered skill information 128 and notifies the control unit 130 of the result (Step s511).
The control unit 130 that received the notification from the DB 120 notifies the administrator of the management center 10 that the input has been completed (Step s512).
The administrator of the management center 10 who received the notification from the control unit 130 in the failure management server 100 checks the completion of the input (Step s513), whereby the registration process has been completed.
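The addition of a skill item at Steps s503 to s511 can be sketched in Python as follows. This sketch rests on a strong assumption that the description does not confirm: the requested skill information 123, the holding skill information 125, and the unregistered skill information 128 are each modeled as a dictionary keyed by the table ID. The function name add_skill_item is likewise hypothetical.

```python
from typing import Dict


def add_skill_item(
    requested_skills: Dict[str, Dict[str, str]],
    holding_skills: Dict[str, Dict[str, str]],
    unregistered_skills: Dict[str, str],
    skill_name: str,
    table_id: str,
) -> None:
    """Add a new skill item (column) to both skill tables."""
    # Steps s503 to s505: "Yes" only in the record with the input table ID.
    for record_id, record in requested_skills.items():
        record[skill_name] = "Yes" if record_id == table_id else "No"

    # Steps s507 to s509: same pattern for the holding skill information.
    for record_id, record in holding_skills.items():
        record[skill_name] = (
            "skilled/experienced"
            if record_id == table_id
            else "unskilled/unexperienced"
        )

    # Step s511: the skill is no longer unregistered.
    unregistered_skills.pop(table_id, None)
```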
In the following, a process of updating the area similarity will be described with reference to
The control unit 130 in the failure management server 100 requests, from the failure handling record DB 120A, the record in which the failure status is “completed” (Step s601).
The failure handling record DB 120A that received the request searches the failure information 121 for the record by using the condition that the failure status is “completed” (Step s602). Then, the failure handling record DB 120A returns the record extracted from the failure information 121 to the control unit 130 (Step s603).
The control unit 130 that acquired the extracted record from the failure handling record DB 120A checks the “failure section” of the file indicated by the “handling action content file path” in the extracted record and aggregates the extracted records for each area (Step s604).
Then, after setting the variable a to 0, the control unit 130 repeats the processes at Steps s606 to s608, incrementing the variable a by 1 each time, once for each area (Step s605). Furthermore, after setting the variable b to the value obtained by incrementing the variable a by 1, the control unit 130 repeats the processes at Steps s607 and s608, incrementing the variable b by 1 each time, until the variable b reaches the number of areas (Step s606). First, the control unit 130 calculates the similarity of the area a to the area b on the basis of the aggregate value for each “failure section” acquired at Step s604 (Step s607). Then, the control unit 130 sets the calculated similarity value in the cells for the “area a” and the “area b” in the “area similarity table” and notifies the area similarity DB 120C of the result (Step s608).
The area similarity DB 120C that received the notification from the control unit 130 overwrites the set value in the two cells, i.e., (column, row)=(area a, area b) and (area b, area a), respectively, and notifies the control unit 130 of the result (Step s609).
The control unit 130 that received the notification from the area similarity DB 120C returns to Step s606 and repeats the process. After the control unit 130 ends the process repeatedly performed at Steps s605 to s608, the control unit 130 ends the registration of the update.
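The pairwise update at Steps s604 to s609 can be sketched in Python as follows. The description does not state which similarity measure is used at Step s607; cosine similarity over the per-area counts of each “failure section” is assumed here purely for illustration. The function names and the (area, failure section) tuple layout of the completed records are also hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_by_area(
    completed_records: Iterable[Tuple[str, str]],
) -> Dict[str, Counter]:
    """Step s604: count failure sections per area."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for area, failure_section in completed_records:
        counts[area][failure_section] += 1
    return counts


def cosine_similarity(counts_a: Counter, counts_b: Counter) -> float:
    """Assumed measure for Step s607 (not specified in the description)."""
    sections = set(counts_a) | set(counts_b)
    dot = sum(counts_a[s] * counts_b[s] for s in sections)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def update_area_similarity(
    completed_records: Iterable[Tuple[str, str]],
) -> Dict[Tuple[str, str], float]:
    """Build the symmetric area similarity table."""
    counts = aggregate_by_area(completed_records)
    areas = sorted(counts)
    table: Dict[Tuple[str, str], float] = {}
    for a in range(len(areas)):             # Step s605
        for b in range(a + 1, len(areas)):  # Step s606
            value = cosine_similarity(counts[areas[a]], counts[areas[b]])
            # Steps s608 and s609: write both (a, b) and (b, a).
            table[(areas[a], areas[b])] = value
            table[(areas[b], areas[a])] = value
    return table
```

Starting the variable b at a + 1 visits each unordered pair of areas once, which is why both cells are written in the same pass.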
As described above, the information processing apparatus according to the embodiment (in the embodiment, the failure management server 100) includes the receiving unit 131 and the specifying unit 133. The receiving unit 131 receives a notification indicating that a failure occurs in each of the data centers 11 arranged in a plurality of locations. The specifying unit 133 compares the area information that indicates the characteristic related to the occurrence of the failure in the data center 11 in which the failure has occurred with the area information that is associated with an engineer on the basis of the task and specifies the engineer who is associated with the area information that is similar to the area information on the data center 11 in which the failure has occurred by giving a higher priority to the engineer than engineers belonging to the other data center. Consequently, the failure management server 100 can speed up the handling of the failure that has occurred in the data center.
Furthermore, in the failure management server 100 according to the embodiment, the specifying unit 133 compares area information that is associated with the characteristic related to the occurrence of a past failure in the data center 11 in which the failure occurred with the area information that is associated with the engineer on the basis of the task and specifies the engineer who is associated with the area information that is similar to the area information on the data center 11 in which the failure has occurred by giving a higher priority to the engineer than engineers belonging to the other data center. Consequently, because the failure management server 100 specifies the engineer on the basis of the area information that is associated with the characteristic related to the occurrence of the past failure in the data center, the failure management server 100 can further speed up the handling of the failure that has occurred in the data center.
Furthermore, in the failure management server 100 according to the embodiment, the specifying unit 133 compares the area information that is associated with the geographical characteristic of the data center 11 in which a failure has occurred with the area information that is associated with the engineer on the basis of the task and specifies the engineer who is associated with the area information that is similar to the area information on the data center 11 in which the failure has occurred by giving a higher priority to the engineer than engineers belonging to the other data center. Consequently, because the failure management server 100 specifies the engineer on the basis of the area information by taking into account the geographical characteristic of the data center, the failure management server 100 can further speed up the handling of the failure that has occurred in the data center.
Furthermore, in the failure management server 100 according to the embodiment, the specifying unit 133 compares the area information on the data center in which a failure has occurred with the area information that is associated with the engineer on the basis of the area information on the data center in which the engineer handled a failure in the past, and specifies the engineer who is associated with the area information that is similar to the area information on the data center in which the failure has occurred by giving a higher priority to the engineer than engineers belonging to the other data center. Consequently, because the failure management server 100 specifies the engineer on the basis of the area information on the location of the data center in which the engineer performed failure handling in the past, the failure management server 100 can further speed up the handling of the failure that has occurred in the data center.
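The prioritization performed by the specifying unit 133 can be illustrated with the following Python sketch, which looks up the area similarity for each candidate and orders the candidates from the most similar area to the least similar one. The function name rank_candidates and the dictionary keys are hypothetical. Treating an engineer whose area equals the area of the failed data center as having a similarity of 1.0, and treating a missing pair as 0.0, are assumptions of this sketch.

```python
from typing import Any, Dict, List, Tuple


def rank_candidates(
    candidates: List[Dict[str, Any]],
    failed_dc_area: str,
    area_similarity: Dict[Tuple[str, str], float],
) -> List[Dict[str, Any]]:
    """Order candidates by similarity of their area to the failed area."""

    def similarity(area: str) -> float:
        if area == failed_dc_area:
            return 1.0  # assumption: identical areas are fully similar
        # Assumption: an unknown pair counts as not similar at all.
        return area_similarity.get((area, failed_dc_area), 0.0)

    scored = [
        {**candidate, "area_similarity": similarity(candidate["area"])}
        for candidate in candidates
    ]
    # sorted() is stable, so candidates with equal similarity keep
    # their original relative order.
    return sorted(scored, key=lambda c: c["area_similarity"], reverse=True)
```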
Furthermore, the components of each device illustrated in the drawings are only for conceptually illustrating the functions thereof and are not always physically configured as illustrated in the drawings. In other words, the specific shape of a separate or integrated device is not limited to the drawings. Specifically, all or part of the device can be configured by functionally or physically separating or integrating any of the units depending on various loads or use conditions. For example, each of the processing units, such as the receiving unit 131, the extracting unit 132, the specifying unit 133, and the transmitting unit 134, may also be integrated as a single unit. Furthermore, the process performed by each of the processing units may also be appropriately separated into processes performed by a plurality of processing units. Furthermore, all or any part of the processing functions performed by each device can be implemented by a CPU and by programs analyzed and executed by the CPU or implemented as hardware by wired logic.
Information Processing Program
Furthermore, various kinds of processes described in the above embodiment can be implemented by executing programs prepared in advance for a computer system, such as a personal computer, a workstation, or the like. Accordingly, in the following, a description will be given of an example of a computer system that executes a program having the same function as that performed in the embodiment described above.
As illustrated in
The HDD 420 stores therein, in advance, an information processing program 420a having the same function as that performed by the receiving unit 131, the extracting unit 132, the specifying unit 133, and the transmitting unit 134 described above. The information processing program 420a may also appropriately be separated.
Furthermore, the HDD 420 stores therein various kinds of information. For example, the HDD 420 stores therein various kinds of data that are used for the OS or production planning.
Then, the CPU 410 reads the information processing program 420a from the HDD 420 and executes the program, whereby the information processing program 420a executes the same operation as that executed by each of the processing units in the embodiment. Namely, the information processing program 420a executes the same operation as that performed by the receiving unit 131, the extracting unit 132, the specifying unit 133, and the transmitting unit 134.
Furthermore, the information processing program 420a described above does not always need to be initially stored in the HDD 420.
For example, the program may be stored in a “portable physical medium”, such as a flexible disk (FD), a CD-ROM, a DVD disk, a magneto-optic disk, an IC card, or the like, that is to be inserted into the computer 400. Then, the computer 400 may read and execute the program from the portable physical medium.
Furthermore, the program may also be stored in “another computer (or a server)” connected to the computer 400 via a public circuit, the Internet, a LAN, a WAN, or the like. Then, the computer 400 may also read and execute the program from the other computer.
According to an aspect of an embodiment of the present invention, an advantage is provided in that it is possible to speed up handling of a failure that has occurred in a data center.
All examples and conditional language recited herein are intended for pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2015-059641 | Mar 2015 | JP | national |