The present invention relates to a correct data generation device, a correct data generation method, and a correct data generation program.
Network monitoring operation consists of a process of detecting state changes and alarms of devices through continuous monitoring, grasping an event such as a failure or construction, analyzing and isolating failure factors or the like, and performing failure recovery (handling).
In this process, a maintenance person (remote worker) who manages the entire network dispatches a local worker to a site when physical work such as repair or replacement of a failed device is required on the site. The maintenance person remotely manages, from an aggregation base, the devices arranged all over the country.
In order to handle an event that has occurred in a network, it is important for the maintenance person to grasp what kind of event (construction or failure) has caused the detected group of alarms.
Non Patent Literature 1 discloses a technology of correlating a plurality of alarms occurring by one event by combining network connection configuration information and a predefined rule.
A single event causes alarms to be generated from a plurality of devices, and a plurality of events occur simultaneously in a network across the country, so that a large number of alarms are generated. A maintenance person correlates (associates) these alarms into a group for each event.
A method using machine learning has been proposed as a technology of automating this association. However, machine learning requires training on a large amount of correct data in which alarms are correlated in units of events.
However, manually checking several tens of thousands of alarms generated per day one by one and giving a result of correlation to each alarm would increase the burden on the maintenance person. Therefore, it is desired to easily generate correct data.
Although it is possible to generate alarm correlation data on the basis of the method of Non Patent Literature 1 and use the data for machine learning, the maintenance person needs to define a rule in advance. Therefore, the creation of correct data using Non Patent Literature 1 cannot be fully automated.
The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a correct data generation device, a correct data generation method, and a correct data generation program for easily generating correct data for causing machine learning of alarm correlation.
In order to achieve the above object, an aspect of the present invention includes: an acquisition unit that acquires alarm information output from a plurality of devices; a correlation unit that associates, from the alarm information, alarms whose occurrence times are within a first time width and whose recovery times are within a second time width, as a group of alarms that have occurred by a same event; and a generation unit that generates correct data in which identification information of the event is set for each alarm of the group of alarms that has been associated.
An aspect of the present invention is a correct data generation method performed by a correct data generation device, the method including steps of: acquiring alarm information output from a plurality of devices; associating, from the alarm information, alarms whose occurrence times are within a first time width and whose recovery times are within a second time width, as a group of alarms that have occurred by a same event; and generating correct data in which identification information of the event is set for each alarm of the group of alarms that has been associated.
An aspect of the present invention is a correct data generation program for causing a computer to function as the correct data generation device.
According to the present invention, it is possible to provide a correct data generation device, a correct data generation method, and a correct data generation program for easily generating correct data for causing machine learning of alarm correlation.
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
In the illustrated example, an alarm a and an alarm d occur almost simultaneously due to failure occurrence 201 (event). Then, the alarm a and the alarm d indicating recovery are generated almost simultaneously in response to completion of handling 202 of the failure occurrence 201. That is, the alarms that have occurred by the same event have features that the occurrence times are close and the recovery times are close.
In the present embodiment, on the basis of this feature, among past alarms for which handling has been completed, alarms having close occurrence times and close recovery times are determined as a group of alarms that have occurred by the same event, and correct data in which identification information of the same event is set to each alarm of the group of alarms is generated. As a result, it is possible to easily generate correct data used for machine learning only by inputting past alarm information for which handling has been completed to the correct data generation device 1. Therefore, in the present embodiment, operation by a maintenance person can be made unnecessary.
In the illustrated example, the correct data generation device 1 determines the alarms a and d having close occurrence times and close recovery times as alarms by the same event, and generates correct data in which identification information of the event is set to the alarms a and d.
The acquisition unit 11 acquires alarm information 101 output (issued) from a plurality of devices and stores the acquired alarm information 101 in the alarm information DB 15. For example, the acquisition unit 11 acquires alarm information from at least one operation system (OpS). A device (for example, a network device) of a general communication carrier is monitored by the OpS. The OpS provides functions such as collecting alarm information from devices and displaying an alarm screen for a maintenance person. The acquisition unit 11 acquires the alarm information stored in the OpS at a predetermined timing (for example, every n minutes), stores the alarm information in the alarm information DB 15, and sends the alarm information to the preprocessing unit 12.
The preprocessing unit 12 preprocesses each alarm of the alarm information 101. Each alarm includes, for example, an alarm type, an occurrence time or a recovery time, a device ID, location information (physical location), and the like. In the present embodiment, since an alarm occurrence time and a recovery time are used, the preprocessing unit 12 associates an alarm including an occurrence time with an alarm including a recovery time by using an alarm type, a device ID, and the like as keys. As a result, the correlation unit 13 can acquire the occurrence time and the recovery time of each alarm.
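The pairing performed by the preprocessing unit 12 can be sketched as follows. This is only an illustrative sketch and not the implementation of the embodiment: the record layout (`RawAlarm`, `PairedAlarm`), the `kind` field, and the rule of matching an occurrence with the next recovery having the same key are assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RawAlarm:
    # Hypothetical record layout; the source only lists the kinds of fields.
    alarm_type: str
    device_id: str
    kind: str        # "occurrence" or "recovery" (assumed encoding)
    time: datetime


@dataclass
class PairedAlarm:
    alarm_type: str
    device_id: str
    occurred_at: datetime
    recovered_at: datetime


def pair_alarms(records):
    """Pair each occurrence record with the next recovery record that has
    the same (alarm type, device ID) key."""
    open_alarms = {}  # key -> occurrence time still waiting for a recovery
    paired = []
    for rec in sorted(records, key=lambda r: r.time):
        key = (rec.alarm_type, rec.device_id)
        if rec.kind == "occurrence":
            # Keep the earliest unrecovered occurrence for this key.
            open_alarms.setdefault(key, rec.time)
        elif rec.kind == "recovery" and key in open_alarms:
            paired.append(
                PairedAlarm(rec.alarm_type, rec.device_id,
                            open_alarms.pop(key), rec.time)
            )
    return paired
```

In this sketch, alarms that have not yet recovered are simply left out of the result, which is consistent with using only past alarms for which handling has been completed.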
Examples of the alarm type include a type indicating a device failure (for example, Eqp failure alarm) and a type indicating an interface-related abnormality of a device (for example, link down alarm). The location information indicates a physical position (for example, installed buildings, areas, and the like) of the device that has issued the alarm, and the like.
The correlation unit 13 associates, from the alarm information 101, alarms whose occurrence times are within a predetermined time width (within a first time width) and whose recovery times are within a predetermined time width (within a second time width), as a group of alarms that have occurred by the same event. That is, the correlation unit 13 groups a plurality of alarms whose occurrence times are within a predetermined time width and whose recovery times are within a predetermined time width.
In other words, the correlation unit 13 sets, as a group of alarms by the same event, a certain alarm and any other alarm that occurred within the first time width before or after the occurrence time t1 of the certain alarm and that recovered within the second time width before or after the recovery time t2 of the certain alarm. The same value or different values may be used for the first time width of the occurrence time and the second time width of the recovery time.
The reference numeral 401 indicates occurrence times of the alarms a to e. The alarms a to e caused by a plurality of events occur in a short time width. In such a case where a plurality of events occur simultaneously, it is difficult to classify an alarm into event units only by the occurrence time.
A reference numeral 402 indicates correlation of the correlation unit 13 of the present embodiment.
Specifically, the correlation unit 13 associates the alarm a and the alarm b, whose occurrence times are within the first time width (here, within 1 second) and whose recovery times are within the second time width (here, within 1 second), as a group of alarms generated by the same event 1.
The correlation unit 13 associates the alarm c, the alarm d, and the alarm e whose occurrence times are within the first time width and whose recovery times are within the second time width, as a group of alarms generated by the same event 2. As described above, the correlation unit 13 can easily group each alarm into event units by using the closeness of the occurrence time and the closeness of the recovery time.
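The grouping by closeness of occurrence time and recovery time can be sketched as follows. The embodiment does not specify a particular grouping procedure, so the greedy approach below (taking the earliest ungrouped alarm as the reference alarm) is an assumption, as are the function name `correlate` and the sample timestamps, which are invented for illustration.

```python
from datetime import datetime, timedelta


def correlate(alarms, first_width, second_width):
    """Group alarms whose occurrence times lie within first_width and whose
    recovery times lie within second_width of a reference alarm.

    alarms: list of (name, occurred_at, recovered_at) tuples.
    Returns a list of groups, each a list of alarm names.
    """
    remaining = sorted(alarms, key=lambda a: a[1])
    groups = []
    while remaining:
        ref = remaining.pop(0)  # earliest ungrouped alarm is the reference
        group, rest = [ref], []
        for a in remaining:
            close_occurrence = abs(a[1] - ref[1]) <= first_width
            close_recovery = abs(a[2] - ref[2]) <= second_width
            (group if close_occurrence and close_recovery else rest).append(a)
        groups.append([a[0] for a in group])
        remaining = rest
    return groups


# Invented timestamps: all five alarms occur within about one second,
# but they recover at two clearly separated times.
base = datetime(2021, 6, 7, 12, 0, 0)
rec1 = base + timedelta(minutes=10)
rec2 = base + timedelta(minutes=30)
sample = [
    ("a", base, rec1),
    ("b", base + timedelta(seconds=0.5), rec1 + timedelta(seconds=0.4)),
    ("c", base + timedelta(seconds=0.2), rec2),
    ("d", base + timedelta(seconds=0.8), rec2 + timedelta(seconds=0.5)),
    ("e", base + timedelta(seconds=1.0), rec2 + timedelta(seconds=0.9)),
]
one_second = timedelta(seconds=1)
groups = correlate(sample, one_second, one_second)
print(groups)  # [['a', 'b'], ['c', 'd', 'e']]
```

With the occurrence time alone, all five sample alarms would fall into one group; the recovery time is what separates them into two events in this sketch.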
The correlation unit 13 may narrow down each alarm of a group of alarms grouped as alarms by the same event by using location information included in the alarm information. The location information indicates a location (physical position) where the device that has issued the alarm is disposed.
Specifically, the correlation unit 13 may extract only alarms whose device locations are close from a group of alarms grouped by using occurrence time and recovery time, and may delete alarms whose device locations are far from the group of alarms. Being close in location means that a certain device and another device are disposed within a predetermined range (within a predetermined distance). Being far in location means that another device is disposed outside a predetermined range from the position of a certain device.
For example, in the group of alarms (alarms c, d, e) of the event 2 illustrated in
The correlation unit 13 may acquire the location information of each alarm from the configuration information DB 16. The configuration information DB 16 is a database that stores information regarding a network configuration of each device. The information regarding the network configuration includes a device ID, location information, an IP address, a port (IF), connection destination information of the port, and the like of each device. In this case, the correlation unit 13 may acquire the location information of each alarm of the group of alarms from the configuration information DB 16 by using the device ID or the like included in the alarm as a key, and narrow down each alarm of the group of alarms by using the location information.
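The narrowing by location can be sketched as follows. This is a sketch under stated assumptions: the configuration information is modeled as a plain dictionary `CONFIG_DB` keyed by device ID, locations are modeled as planar coordinates, the first alarm of the group is taken as the reference device, and all names and values are hypothetical. The embodiment itself only states that devices within a predetermined range are treated as close.

```python
import math

# Hypothetical stand-in for the configuration information DB 16:
# device ID -> (x, y) position in kilometers (invented values).
CONFIG_DB = {
    "dev-x": (0.0, 0.0),
    "dev-y": (1.0, 1.0),
    "dev-z": (300.0, 400.0),
}


def narrow_by_location(group, config_db, max_distance):
    """Keep only alarms whose device lies within max_distance of the
    device of the first alarm in the group.

    group: list of (alarm_name, device_id) tuples.
    """
    ref_position = config_db[group[0][1]]
    return [
        (name, device_id)
        for name, device_id in group
        if math.dist(ref_position, config_db[device_id]) <= max_distance
    ]


group = [("x", "dev-x"), ("y", "dev-y"), ("z", "dev-z")]
narrowed = narrow_by_location(group, CONFIG_DB, max_distance=10.0)
print(narrowed)  # [('x', 'dev-x'), ('y', 'dev-y')]
```

If the location information is instead a building or area name, the distance test could be replaced by an equality test on that name.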
The generation unit 14 generates correct data in which identification information of a common event is set to each alarm of the associated group of alarms. For example, as in correct data 102 illustrated in
The correlation unit 13 associates, from the alarm information 101, alarms whose occurrence times are within a first time width and whose recovery times are within a second time width, as a group of alarms that have occurred by the same event (S13). The correlation unit 13 may narrow down the alarms of each associated group of alarms by using the location information. The generation unit 14 generates correct data in which identification information of a corresponding event is set for each alarm of the group of alarms, and outputs the generated correct data (S14).
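The generation of correct data in S14 can be sketched as follows. This is an illustrative sketch only: the function name `generate_correct_data`, the row layout, and the identifier format `event1`, `event2`, and so on are assumptions; the embodiment requires only that the same identification information be set to every alarm of a group.

```python
def generate_correct_data(groups):
    """Attach a common event identifier to every alarm of each group.

    groups: list of groups, each a list of alarm names.
    Returns one row (dict) per alarm.
    """
    rows = []
    for event_no, group in enumerate(groups, start=1):
        event_id = f"event{event_no}"  # assumed identifier format
        for alarm in group:
            rows.append({"alarm": alarm, "event_id": event_id})
    return rows


correct_data = generate_correct_data([["a", "b"], ["c", "d", "e"]])
for row in correct_data:
    print(row["alarm"], row["event_id"])
```

Each row can then be used as a labeled sample when training a model for alarm correlation.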
The correct data generation device 1 of the present embodiment described above includes: an acquisition unit 11 that acquires alarm information output from a plurality of devices; a correlation unit 13 that associates, from the alarm information, alarms whose occurrence times are within a first time width and whose recovery times are within a second time width, as a group of alarms that have occurred by the same event; and a generation unit 14 that generates correct data in which identification information of the event is set for each alarm of the group of alarms that has been associated.
As a result, in the present embodiment, it is possible to easily generate correct data for causing machine learning of association (correlation) of alarms. Specifically, by automating the creation of correct data, which otherwise imposes a heavy burden on a maintenance person, it is possible to efficiently generate correct data only by inputting past alarm information.
The correlation unit 13 of the present embodiment may narrow down each alarm of the group of alarms by using location information of the device. By using the location information (physical position) of the device, an alarm can be associated with high accuracy.
For the correct data generation device 1 described above, for example, a general-purpose computer system as illustrated in
The correct data generation device 1 may be implemented by one computer or may be implemented by a plurality of computers. The correct data generation device 1 may be a virtual machine that is implemented in a computer.
The program for the correct data generation device 1 can be stored in a computer-readable recording medium such as an HDD, an SSD, a universal serial bus (USB) memory, a compact disc (CD), or a digital versatile disc (DVD), or can be distributed via a network.
The present invention is not limited to the embodiment and the modifications described above, and various modifications can be made within the scope of the gist of the present invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/021593 | 6/7/2021 | WO |