This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2008-204615, filed on Aug. 7, 2008, the entire contents of which are incorporated herein by reference.
The present invention relates to a method of network management.
The monitoring of faults in network management is generally conducted using an operation manager installed in a network management system (NMS), a router forming the network, and an operation agent installed at a node (such as a hub or network element). When a fault such as a network break is detected, the operation agent sends an information message to the operation manager, as well as a signal indicating the held status in response to polling performed periodically by the operation manager. Various timers, including a timer for setting a timeout period (e.g., a fault detection time) for the detection of faults, are provided for the fault detection performed by the operation agent. Similarly, various timers, including a timer for setting a timeout period (e.g., a status acquisition error detection time) for which the operation manager waits for a response from the operation agent, are provided for the operation manager. The values of these timers are dependent on each other. If the value of a timer is set too short, a fault is detected erroneously even though the system is in fact operating normally. Conversely, if the value of the timer is set too long, it takes longer to detect the occurrence of a fault, which adversely affects the corrective actions taken subsequently. In the case of a simple network configuration, the time in which a fault is detected by the operation agent and the response time are determined substantially uniquely by the type of the node. Consequently, the values of the timers have conventionally been determined and held constant according to the type of the node.
In recent years, as the internet protocol (IP) technology has evolved, the building of large-scale private IP networks that are combinations of various carrier network services is in progress. In such a large-scale private IP network, even if nodes forming the network are of the same type, the timing of information messages and the response time from each node responsive to periodic polling from a network management system (NMS) are different depending on the network capacity and on settings of router priority control. Therefore, in such a large-scale private IP network, it is necessary to appropriately tune the values of various timers throughout the operation.
1) First, the timeout period for detection of a fault (fault detection time) and the timeout period for polling (status acquisition error detection time) are designed in advance depending on the contents of services offered to the network user and on the node type and network capacity (P: Plan).
2) The designed values are used as parameters in monitoring a commercial network (D: Do).
3) Data indicated by information messages based on the results of the step 2) are totaled and checked (C: Check).
4) The results of the check are analyzed. An improvement plan is discussed (A: Act).
5) Data derived by the discussed improvement plan is fed back to the design (P).
Tuning is performed by this procedure using a PDCA cycle.
The operation manager 11 of the network management system (NMS) 1 sends out SNMP (simple network management protocol) or telnet commands in a given sequence to the operation agent 41 of the node 4, thus performing operations such as configuration transfer or port control. The NMS 1 has a command catalog delivery portion 13 that delivers a catalog of commands to the node 4, the commands being used for controlling the operation of the node 4. The delivered command catalog is held in a system configuration setting file 42 in the node 4. The network is centrally monitored by periodically acquiring the status from the operation agent 41 of the node 4 by means of the operation manager 11 of the network management system 1 using SNMP or another protocol.
The relationship between the location at which a fault occurs and the detection of the fault or an error occurring in acquiring the status (hereinafter may be referred to as a “status acquisition error”) is as follows. When a fault occurs in the operation agent 41 within the node 4 or another fault occurs at the node 4, a status acquisition error occurs in the operation manager 11 of the network management system 1 without any fault being detected. If a fault occurs either on the physical link 33 connected with the node 4 or on a single logical link connected with the node 4 (e.g., a fault on one link), the operation manager 11 of the management system 1 does not produce a status acquisition error but detects the fault on the physical link by being informed of the fault by the nodes 2 and 4. If a fault occurs on a redundant logical link connected to the node 4 (i.e., a fault on both links), the operation manager 11 of the management system 1 produces a status acquisition error without detecting a fault from the node 4.
On the other hand, the network management system 1 sets a timer value into a timer setting file 12 in response to each individual manipulation or command according to the type of the node of the monitored network. This prevents the system from remaining in an operation response waiting state for a long time when a fault occurs at the node or in the network. A retry subroutine is also used to suppress frequent detection of errors on temporary communication failures. As described previously, in an IP network, even with the same node type in the same network, different responses are made to the same request according to the following parameters: (a) type of network used, (b) configuration of adjacent node, (c) circuit class, and (d) priority control level of the node for each individual kind of data.
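The interplay of the timeout period and the retry subroutine described above can be sketched as follows. This is a minimal illustration, not the actual implementation; the function and parameter names are assumptions introduced here for clarity.

```python
def poll_status(poll_once, timeout_s=5.0, max_retries=2):
    """Poll a node's operation agent, retrying on transient timeouts so
    that temporary communication failures are not reported as status
    acquisition errors. All names are illustrative.

    poll_once(timeout_s) returns the node status, or raises TimeoutError
    when no response arrives within timeout_s seconds."""
    for attempt in range(max_retries + 1):
        try:
            return poll_once(timeout_s)
        except TimeoutError:
            pass  # transient failure: retry instead of raising an error
    # All attempts exhausted: report a status acquisition error.
    raise RuntimeError("status acquisition error")
```

A node that answers within the retry budget is reported normally; only when every attempt times out does the manager declare a status acquisition error, which is why the timeout and retry values must be tuned to the network's actual response behavior.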
In the set of preparatory tasks, patterns regarding various timers are designed (step S1) and data about the monitored node is registered into the network management system 1 (step S2). The designed patterns include various parameters such as machine type, vendor, physical link (capacity), logical link, used carrier network, conditions under which an access is made to the carrier network, redundant logical link, and scale (range) of the serviced configuration. In the set of setting modification tasks, the results of the pattern design are reflected in the timer setting file 12 when the network is configured (step S3).
Then, in the operation phase for monitoring and operating the network, fault statistical data produced by the monitoring subroutine is extracted (step S4). The results of the extracted statistical data are then evaluated and analyzed (step S5). Subsequently, the timer values are readjusted based on the results of the evaluation and analysis of the fault statistical data (step S6). For example, the timer values readjusted in step S6 are reflected by a manual tuning operation.
As described above, the prior art monitors the large-scale private IP network for faults. However, the following problem exists: the communication quality (response performance) varies depending on the bandwidth of the carrier-offered shared IP network, on the quality of each individual line, on the timer settings for detection of router faults, and on the amount of configuration of the network outside a customer's edge. Therefore, the operation may not be monitored appropriately if only pre-designed timer values are used.
In consequence, a technique is desired that is capable of readjusting the relationship between the timer for detecting a status acquisition error in the network management system and the fault detection timer in the node, with the fewest possible steps from the design phase, in a manner corresponding to the various connection configuration considerations that often occur in a large-scale private IP network (such as configuration modification, elimination and consolidation, addition of other machine types, and use of carrier networks).
Meanwhile, JP-A-9-149045 discloses a technique for monitoring and controlling a network using a network management system (NMS) but fails to disclose any technique for setting timer values for monitoring the network for faults as described previously.
In view of the foregoing problem with the prior art, it is an object of the present invention to provide a method of network management capable of strictly classifying faults by setting timer values appropriately for both a network management system and a node according to the network configuration.
A network management system comprises a unit for identifying a node whose settings are to be modified from design pattern information about a network to be managed; a unit for finding, based on a timer control template, values of various timers included in the network management system and in the identified node whose settings are to be modified; and a unit for causing the found timer values to be reflected simultaneously in the network management system and in the node whose settings are to be modified.
The preferred embodiments of the present invention are hereinafter described.
In
The configuration management functional portion 120 has a connection configuration management portion 122 and a connection configuration searching portion 121. The connection configuration management portion 122 manages the whole configuration of a large-scale private IP network including a node 2, a link 3, and a node 4 as a network configuration dataset 110. The connection configuration searching portion 121 searches the network configuration dataset 110 and creates a setting modification target node list 130.
The fault monitoring functional portion 140 finds values of timers in the network management system 100 and in the node 4 based on a timer control template 141 and on the setting modification target node list 130. The monitoring functional portion 140 has a timer control functional portion 142 for setting the value of the timer in the network management system 100 into a timer setting file 143 such that the timer values found as described above are reflected substantially simultaneously. Furthermore, the timer control functional portion 142 requests a command catalog execution portion 152 included in the node communication control functional portion 150 to communicate with the node 4. In addition, the fault monitoring functional portion 140 has a fault statistics output portion 144 for producing fault detection statistics output data 145.
The node communication control functional portion 150 has a node status acquisition portion 151 for acquiring information about the status of the node 4 by periodically polling the operation agent 41 of the node 4 via the node 2 and link 3, and the command catalog execution portion 152 for distributing a command catalog to the system configuration setting file 42 from the node 2 and link 3 via the operation agent 41 of the node 4. The node status acquisition portion 151 corresponds to an operation manager.
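The division of labor among the functional portions described above might be sketched as follows. All class and method names here are illustrative assumptions, not identifiers from the source; the sketch only shows how a search over the network configuration dataset yields a target node list, and how found timer values are then reflected in both the NMS and the nodes.

```python
class ConnectionConfigurationManagement:
    """Holds the network configuration dataset (cf. 110) and answers
    searches that produce a setting modification target node list (cf. 130)."""
    def __init__(self, dataset):
        self.dataset = dataset  # e.g. {node_name: design_pattern}

    def search(self, pattern):
        """Return the list of nodes whose design pattern matches."""
        return [n for n, p in self.dataset.items() if p == pattern]


class TimerControl:
    """Derives timer values from a timer control template (cf. 141) and
    reflects them both in the NMS timer setting file and in each node."""
    def __init__(self, template):
        self.template = template

    def apply(self, nodes, timer_file, execute_catalog):
        for node in nodes:
            timer_file[node] = self.template["timeout_s"]   # NMS side
            execute_catalog(node, self.template)            # node side
```

The point of the structure is that a single apply pass updates the manager's timer setting file and pushes a command catalog to every target node, so the two sides never drift apart.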
In the set of preparatory tasks, patterns regarding various timers are designed (step S11), and data about the monitored node is registered into the network management system 100 (step S12). The designed patterns are registered as the network configuration dataset 110.
Referring back to
Referring back to
Referring back to
Then, the timer control functional portion 142 of the fault monitoring functional portion 140 asks the command catalog execution portion 152 of the node communication control functional portion 150 to modify the settings of the node 4. In this operation, the contents of the timer control template 141 are used as parameters (step S144).
Then, the command catalog execution portion 152 of the node communication control functional portion 150 creates a catalog of commands to be executed for the node 4, using the applicable node name and the contents of the timer control template 141 (step S145).
Then, the command catalog execution portion 152 of the node communication control functional portion 150 introduces the catalog of commands into the node 4 and modifies the system configuration setting file 42 in the node 4 (step S146).
Referring back to
The timer control functional portion 142 of the fault monitoring functional portion 140 causes the values of the various timers to be repetitively reflected in a batch in other nodes with similar designs.
Furthermore, the timer control functional portion 142 of the fault monitoring functional portion 140 may isolate the effects of monitoring carrier-dependent unmonitored devices as a separate design pattern by incorporating, into the node design pattern, the conditions of the communication quality between the node and the network management system, the quality depending on the device gaining access to the carrier network. Consequently, what is transmitted may be limited to the optimum notification messages needed for the classification of faults.
After the processing of the subroutine for modifying the settings as described so far, program control returns to the subroutine of
The processing described so far causes the values of the timers in the network management system 100 and the node 4 to be set in a batch, either by batch activation or by a manual operation. Alternatively, the timer values may be set dynamically whenever a polling operation is performed.
Referring to
Then, timer values are acquired for each individual parameter from a timer value management table set T3 for each individual parameter (step S23). The table set T3 includes a device type table T31, a network type table T32, a device configuration pattern table T33, an accommodated terminal number table T34, and a priority control level table T35.
Weight values for the respective parameters are acquired from a weight management table T4 (step S24). Where no weights are used in later computation, this processing step is omitted.
Timer values are then calculated according to a given calculation formula from the timer values at each parameter and from the weight values (step S25). The following formulas may be used as the given calculation formula.
timer value=timer value at a specific device type+timer value at a specific network type+timer value for a device configuration pattern+timer value corresponding to the number of accommodated terminals+timer value at a priority control level (1)
timer value=timer value at a specific device type×weight+timer value at a specific network type×weight+timer value for a device configuration pattern×weight+timer value corresponding to the number of accommodated terminals×weight+timer value at a priority control level×weight (2)
timer value=[timer value at a specific device type|timer value at a specific network type|timer value for a device configuration pattern|timer value corresponding to the number of accommodated terminals|timer value at a priority control level], (3)
timer value=[timer value at a specific device type×weight|timer value at a specific network type×weight|timer value for a device configuration pattern×weight|timer value corresponding to the number of accommodated terminals×weight|timer value at a priority control level×weight]. (4)
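Formulas (1) through (4) can be captured in a single helper, sketched below. The sum forms correspond to formulas (1) and (2); reading the bracketed `[a|b|…]` notation of formulas (3) and (4) as a maximum over the candidate values is an assumption made here for illustration, as the text does not define the bracket operator.

```python
def timer_value(parts, weights=None, combine="sum"):
    """Combine per-parameter timer values into one timer value.

    parts holds the timer values for the five parameters: device type,
    network type, device configuration pattern, number of accommodated
    terminals, and priority control level.

    combine="sum" implements formulas (1)/(2); combine="max" is one
    possible reading of the bracketed forms (3)/(4) (an assumption)."""
    if weights is None:
        weights = [1] * len(parts)  # unweighted case: formulas (1)/(3)
    weighted = [p * w for p, w in zip(parts, weights)]
    return sum(weighted) if combine == "sum" else max(weighted)
```

With per-parameter values of, say, 1 through 5 seconds, formula (1) yields their sum (15), while the max reading of formula (3) yields the largest single contribution (5); the weight table T4 simply scales each contribution before combining.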
Then, a retry number is acquired from a retry number management table T5 based on the found timer value (step S26).
Then, the found timer value is reflected in and applied to both the network management system 100 and the node 4 (step S27).
Then, the monitored node is polled and monitored (step S28). After the monitoring of that node is completed, polling of the next node is started (step S29).
A value indicating that the node is not monitored may be defined in the timer value management table set T3 for each parameter. The calculation of a timer value may be nullified by setting the corresponding parameter values (e.g., to “100”) in the node fundamental information table T2 for any node whose timer settings are to be made invalid.
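The not-monitored sentinel described above might work as in the following sketch. The sentinel value of 100 comes from the text's example; the table layout and parameter names are assumptions introduced here for illustration.

```python
NOT_MONITORED = 100  # sentinel value; the text gives "100" as an example

def node_timer(params, tables):
    """Look up a per-parameter timer value for a node and sum the
    contributions; return None (node excluded from monitoring) if any
    parameter carries the sentinel. Table and key names are illustrative."""
    if any(v == NOT_MONITORED for v in params.values()):
        return None  # calculation nullified: node is not monitored
    return sum(tables[k][v] for k, v in params.items())
```

Marking a single parameter with the sentinel thus removes the node from timer calculation entirely, without deleting its entry from the fundamental information table.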
As described so far, in the network management system including a database having network type information, information about connection configurations, and information about priority control, the timer values regarding operations on the node and a command sequence for the operations may be modified based on design pattern information (such as network type information, connection configuration information, and information about priority control levels), as well as on the information about the type of the node. Consequently, when large-scale private IP networks having varied machine types, networks, and levels of priority control are managed, it is possible to circumvent frequent unwanted communication timeouts and prolonged operation waiting times.
New timer control templates 141 for individual design patterns for modifying the timer value settings regarding physical/logical link faults detected by the node are prepared for a command catalog execution portion 152 that issues command catalogs to the node from the network management system. As a consequence, when modifications are made to a large-scale private IP network (such as variations in the network configuration, addition of a carrier network, addition of a different network type, addition of a different machine type, or the like), the operator may easily modify the timer values of the node by making use of the network management system.
It is possible to isolate the effects of monitoring a carrier-dependent unmonitored device as a separate design pattern by incorporating, into the node design pattern, the quality conditions of the communication between the node and the network management system, the quality depending on the carrier network access device (such as an ADSL modem or protective device). What is transmitted may be limited to the optimum information messages needed for the classification of faults.
The present invention has been described so far using preferred embodiments. While specific examples have been illustrated in explaining the invention, various modifications and changes may be made thereto without departing from the broad gist and scope of the present invention delineated by the appended claims. That is, it should not be construed that the present invention is limited by the details of the specific examples and/or the accompanying drawings.
Number | Date | Country | Kind
---|---|---|---
2008-204615 | Aug 2008 | JP | national