The present invention relates to the telecommunication field and, more particularly, to a method for increasing the availability of a telecommunication network. Still more particularly, the invention concerns a method for the recovery of corrupted configuration information of network elements.
This application is based on and claims the benefit of European Patent Application No. 05291596.4 filed on Jul. 22, 2005, which is incorporated by reference herein.
In a telecommunication network, corruption of the configuration information of a network element can occur. The configuration information is stored locally at the network element in a memory (for example a non-volatile memory such as a hard disk, or a volatile memory such as a RAM, Random Access Memory) and corruption can occur, for example, for the following reasons:
Corruption of the configuration information can also occur in case of incorrect provisioning of the network element configuration, for example during setup of the network; this occurs, for example, if the configuration of a network element does not fit the configuration of other network elements directly or indirectly connected to it.
In both cases the consequence is corruption (errors or loss) of configuration information stored in the network element, for example:
The corruption can be traffic affecting; an automatic recovery mechanism with a small recovery time is therefore required, where the recovery time is the time needed to restore the correct configuration information after detection of the corruption.
The availability of a telecommunication network can be increased according to different known solutions.
A first solution requires a central network management station controlling a domain of network elements and connected (directly or indirectly) to each network element, using network management protocols like SNMP (Simple Network Management Protocol), Q3 or TL1 (Transaction Language 1). The management station stores backup configuration information for each network element. In case of corruption of configuration information, the network element affected by the corruption (hereafter referred to as the corrupted network element) must notify the failure to the management station, which selects the backup configuration information and afterwards transmits it to the corrupted network element. Since the bandwidth available between the manager and the controlled network elements is limited, this solution has the following disadvantages:
A second solution requires hardware redundancy, for example the duplication of the network element or, within the same network element, the duplication of the memory in order to store locally a backup copy of the configuration information. In case of detection of corrupted configuration information, the redundant hardware storing the backup configuration information is selected. This solution is time-efficient and can also recover from multiple corruptions at the same time, but it is too expensive because it requires duplication of hardware resources. Moreover, it has the disadvantage that in some cases the backup configuration information must be updated, because the configuration information can change. An example is configuration information indicating the active cross-connections between input-output ports of a network element; these cross-connections can change, because the routes of the existing connections carried by the network can change, new connections can be configured or existing connections can be removed. It is therefore required to update the backup configuration information periodically by transmitting updated backup configuration information from the central network management station to each controlled network element, and this increases the management traffic between the central network management station and the controlled network elements (where the bandwidth is limited).
In view of drawbacks of the known solutions, the main object of the present invention is to provide a method for recovering corrupted configuration information of a recovering network element. This object is achieved by a method according to claim 1. The basic idea is to avoid usage of a central network management station and to share among a group of neighbor network elements the storing of configuration information at least in part correlated to the configuration information stored in the recovering network element; the group of neighbor network elements can communicate with the recovering network element such that the corrupted configuration information stored in the recovering network element can be recovered by transmitting to the recovering network element correlated configuration information stored in a neighbor network element.
A further object of the invention is to provide a method for identification of neighbor network elements; this is achieved by a method according to claim 2.
Advantages of the invention are:
The recovering network element can communicate with neighbor network elements, to which it can be connected directly or indirectly through intermediate network elements. This is achieved, for example, by the Control Plane Elements of the new network architectures based on the Automatically Switched Optical Network (ASON) defined in ITU-T G.8080/Y.1304 (11/2001), wherein the control plane elements (CPEs) are interconnected with each other and communicate according to a signalling protocol. Each CPE controls one or more network elements, also called Transport Plane Elements (TPE), for configuration of the connections starting from the controlled network element (also called source network element), in order to provide a fast detection of a failure, a fast and efficient configuration of new connections within the Transport Plane, modification of the connections previously set up and a faster restoration function providing backup connections for protecting connections affected by a failure. Various signalling protocols can fit the ASON architecture, like the Resource Reservation Protocol (RSVP) defined in RFC2205, RFC2209 and RFC2750, the Resource Reservation Protocol - Traffic Engineering (RSVP-TE) defined in RFC3209 and ITU-T G.7713.2, the Label Distribution Protocol (LDP) defined in RFC3036, the Constraint Based - Label Distribution Protocol (CR-LDP) defined in ITU-T G.7713.3 and RFC3472, and the Private Network to Network Interface (PNNI) defined in ITU-T G.7713.1.
In the ASON architecture each CPE further needs to know properties of network resources for calculating the connections; this information is distributed to the CPEs through routing protocols, like Routing Information Protocol (RIP), Interior Gateway Routing Protocol (IGRP), Open Shortest Path First (OSPF, defined in RFC2328), Open Shortest Path First - Traffic Engineering (OSPF-TE, defined in RFC3630), Intermediate System to Intermediate System (IS-IS, defined in RFC1142), Exterior Gateway Protocol (EGP) and Border Gateway Protocol (BGP, defined in RFC1771).
The routing protocol can also be used for transmitting from the neighbor CPEs (that is the CPEs controlling the neighbor network elements) to the recovering CPE (that is the CPE controlling the recovering network element) the indication of the configuration information stored in the neighbor CPEs, periodically or when the configuration information changes or after receiving a request from the recovering CPE. The transmission from the neighbor to the recovering network element of the correlated configuration information stored in the neighbor network element can be performed, for example, according to the File Transfer Protocol (FTP), through the channel available between the neighbor CPEs and the recovering CPE; the bandwidth available for this channel is greater (and more reliable) than the bandwidth available between a central network management station and the controlled network elements and therefore the recovery time is smaller and also more controlled.
According to the above architecture, each CPE can communicate with every other CPE of the network, so that all the other CPEs can be neighbor CPEs of the recovering CPE; however, usually only a subset of the possible CPEs can provide the correlated configuration information to the recovering CPE, and therefore only a group of the possible CPEs is selected, according to different criteria, for example:
The transmission delay is the time it takes for transmission of the correlated configuration information from the neighbor to the recovering CPE and depends for example on the propagation delay of the physical connections between the neighbor and recovering CPEs and on the time required by the intermediate CPEs (that is the CPEs between the neighbor and recovering CPEs) for receiving, processing, routing and transmitting the correlated configuration information towards the recovering CPE. The round trip delay is the time it takes for transmitting a request of correlated configuration information from the recovering to the neighbor CPE and for receiving an answer.
The transmission delay and the round trip delay are estimated before the transmission of the correlated configuration information, for example through the routing protocols by messages exchanged between the recovering and the neighbor CPE, and thus an estimated value of the transmission delay or of the round trip delay is stored at the recovering CPE. Reliability of a CPE can be evaluated, for example, according to whether the CPE is protected by redundancy in case of a hardware failure.
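As an illustration of how a recovering CPE could combine the stored estimates into a selection, the following is a minimal sketch only. The record fields, the cost formula and the penalty weights are all assumptions introduced for the example; the description names the criteria (delay, intermediate elements, reliability) but does not prescribe how they are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeighborInfo:
    """Values a recovering CPE might keep per neighbor (hypothetical field names)."""
    name: str
    est_transmission_delay_ms: float  # estimated before any transfer takes place
    hops: int                         # number of intermediate CPEs on the path
    protected: bool                   # True if the CPE is protected by hardware redundancy


def cost(n: NeighborInfo,
         hop_penalty_ms: float = 5.0,
         unprotected_penalty_ms: float = 50.0) -> float:
    """Illustrative cost: estimated delay plus penalties (weights are arbitrary assumptions)."""
    total = n.est_transmission_delay_ms + hop_penalty_ms * n.hops
    if not n.protected:
        total += unprotected_penalty_ms
    return total


def select_neighbor(candidates):
    """Return the candidate with the minimum cost, or None if there is no candidate."""
    return min(candidates, key=cost, default=None)


neighbors = [
    NeighborInfo("N1", 12.0, 0, False),
    NeighborInfo("N2", 20.0, 1, True),
    NeighborInfo("N3", 40.0, 3, True),
]
best = select_neighbor(neighbors)
print(best.name if best else "no neighbor available")
```

With these made-up numbers the closest neighbor N1 is not chosen, because the assumed penalty for a missing redundancy protection outweighs its lower delay.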
A network element usually stores different types of configuration information:
The correlated configuration information can be transmitted from a neighbor network element to a recovering network element at pre-defined time units (for example, periodically) or can be transmitted only after a recovering network element sends a first request to the neighbor network element and the first request is received by the neighbor network element; the first request can be transmitted, for example, after detection of corruption of the configuration information stored in the recovering network element.
Corruption of the configuration information stored in the recovering network element can be detected before or after the transmission of correlated configuration information stored in the neighbor network elements.
Before the transmission, detection can be performed by checking the checksum of the configuration information stored in the recovering network element and detecting an error. Afterwards, the recovering network element transmits to the neighbor network element the first request for configuration information stored in the neighbor network element and correlated to the configuration information affected by the error. After receiving the first request, the neighbor network element transmits to the recovering network element an answer indicating the correlated configuration information stored in the neighbor network element. The answer is received by the recovering network element from one or more neighbor network elements and is processed, in order to select the neighbor network element from which to perform the transmission; the selection can be performed according to different rules, for example by choosing the neighbor network element with the minimum cost to reach, wherein the cost can be evaluated at the recovering network element taking into account at least one of the following criteria:
Detection of corruption of the configuration information can also be performed after the transmission of correlated configuration information, by receiving the correlated configuration information at the recovering network element and comparing the configuration information stored in the recovering network element with the received correlated configuration information. The advantage of detection before the transmission is to reduce the traffic between the neighbor network elements and the recovering network element, because the transmission is performed only from the neighbor network element having the minimum cost to reach, while the advantage of detection after the transmission is to reduce the recovery time, because the recovering network element already stores a copy of correlated configuration information received from the neighbor network elements and (after detecting the corruption) the corrupted configuration information can be immediately replaced by the received one.
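The two detection variants can be sketched as follows. This is a simplified illustration under stated assumptions: CRC-32 stands in for the unspecified checksum, the function names are hypothetical, and the correlated configuration information received from the neighbor is treated as a byte-identical copy, which is a simplification of "at least in part correlated".

```python
import zlib
from typing import Tuple


def checksum(data: bytes) -> int:
    """CRC-32 is used here only as an example checksum (an assumption)."""
    return zlib.crc32(data)


def is_corrupted(stored: bytes, expected_checksum: int) -> bool:
    """Detection before transmission: a purely local check against a saved checksum."""
    return checksum(stored) != expected_checksum


def recover_from_copy(stored: bytes, received_copy: bytes) -> Tuple[bytes, bool]:
    """Detection after transmission: compare with the copy already received from a
    neighbor and replace the stored data on mismatch.

    Returns the data to keep and a flag telling whether a replacement happened.
    """
    if stored != received_copy:
        return received_copy, True
    return stored, False


good = b"port=p1;vc=4.33"
saved_checksum = checksum(good)
damaged = b"port=p1;vc=4.3?"

print(is_corrupted(good, saved_checksum))     # local check on intact data
print(is_corrupted(damaged, saved_checksum))  # local check on damaged data
restored, replaced = recover_from_copy(damaged, good)
```

In the first variant nothing is transferred until the local check fails; in the second the copy is already on hand, so the replacement is immediate.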
In case the recovering and neighbor network elements store a great amount of configuration information, this information can be divided into different smaller parts defined “components”; each component can be for example one or more files. This division can be used in combination with the groups of neighbor network elements indicated above, that is the configuration information can be not only divided into different groups (corresponding to the different types of configuration information), but the configuration information of each group can be sub-divided into components. Thus the components of the same type of configuration information (that is a software program or a database of configured connections/cross-connections or a database of performance monitoring functions) are grouped into a “repository”, which is described by an identifier indicating the specific type of configuration information. Each component of a repository is identified by a name indicating the specific functionality of the component inside the repository and by a version indicating specific functionalities of the component; for example, the name can be a string, while the version can be an integer number or a date. More specifically:
Referring to
More specifically, the components can be transmitted from a neighbor network element to a recovering network element at pre-defined time units (for example, periodically) or can be transmitted only after a recovering network element sends a first request to the neighbor network elements and the first request is received by a neighbor network element; the first request can be transmitted, for example, after detection of corruption of a component stored at the recovering network element.
Corruption of a component stored at the recovering network element can be detected before or after the transmission of the component stored at the neighbor network elements.
Before the transmission, detection can be performed by checking the checksum of the component stored in the recovering network element and detecting an error. Afterwards, the recovering network element transmits to the neighbor network elements a first request for the components stored in the neighbor network elements; the first request transmitted from the recovering to the neighbor network elements can include the name and the version of the component stored at the recovering network element. This is required for identifying from the list the neighbor network elements storing a compatible component, that is a component having the same name and a compatible version, wherein the term “compatible” means that the downloaded component can replace the corrupted component. Referring to a component of a software program, this occurs for example if the version of the downloaded component is more recent than the version of the corrupted component, because a new software version can perform the same functionalities as a previous version (plus the new ones). After receiving the first request, the neighbor network elements transmit to the recovering network element an answer indicating a list of the components stored at the neighbor network elements, including, for each component, the name and the version. The answer is received by the recovering network element from one or more neighbor network elements and is processed, in order to identify from the list the compatible components and the corresponding neighbor network elements and in order to select the neighbor network element from which to perform the transmission; the selection can be performed according to the rules and criteria indicated above, that is from the neighbor network element having the minimum cost to reach, wherein the cost is calculated taking into account reliability, bandwidth, estimated transmission delay or number of intermediate network elements.
Thus the recovering network element transmits to the identified neighbor network element a second request for transmission of the identified compatible component. After receiving the second request, the neighbor network element transmits to the recovering network element the compatible component stored in the neighbor network element. Alternatively, the first request is received by each neighbor network element and the compatible component stored in the neighbor network element is immediately transmitted to the recovering network element, which receives the compatible component from each neighbor network element and selects only one.
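The identification of compatible components in the received answers can be sketched as below. This is an illustration under assumptions: the `Component` record and function names are hypothetical, versions are taken to be integers, and "compatible" is modelled as "same repository, same name, and a version equal to or more recent than the corrupted one"; the description gives the more-recent case only as an example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Component:
    repository: str  # identifier of the type of configuration information
    name: str        # functionality of the component inside the repository
    version: int     # integer version (a date would also be possible)


def is_compatible(offered: Component, corrupted: Component) -> bool:
    """Assumed rule: same repository and name, version not older than the corrupted one."""
    return (offered.repository == corrupted.repository
            and offered.name == corrupted.name
            and offered.version >= corrupted.version)


def compatible_offers(answers: Dict[str, List[Component]],
                      corrupted: Component) -> List[Tuple[str, Component]]:
    """Scan the answers (neighbor id -> list of stored components) for compatible ones."""
    return [(neighbor, comp)
            for neighbor, comps in answers.items()
            for comp in comps
            if is_compatible(comp, corrupted)]


corrupted = Component("software", "routing", 3)
answers = {
    "N1": [Component("software", "routing", 2)],
    "N2": [Component("software", "routing", 4), Component("software", "signalling", 4)],
    "N3": [Component("software", "routing", 3)],
}
offers = compatible_offers(answers, corrupted)
print([neighbor for neighbor, _ in offers])
```

The resulting list of offers would then be ranked by the cost to reach each neighbor, as described above.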
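The two-request exchange can be simulated in memory as follows. This is a sketch only: the `Neighbor` class, its method names and the pre-computed `cost_to_reach` table are hypothetical, no real transport (such as FTP) is involved, and the compatibility rule (same name, version not older) is the same assumption as in the example of the more recent software version.

```python
class Neighbor:
    """In-memory stand-in for a neighbor network element (hypothetical API)."""

    def __init__(self, name, components):
        self.name = name
        self._components = dict(components)  # component name -> (version, payload)

    def handle_first_request(self, comp_name, version):
        """Answer with the list of (name, version) of all stored components."""
        return [(n, v) for n, (v, _) in self._components.items()]

    def handle_second_request(self, comp_name):
        """Transmit the payload of the requested component."""
        return self._components[comp_name][1]


def recover_component(comp_name, version, neighbors, cost_to_reach):
    """First request to all neighbors, selection by minimum cost, second request to one."""
    candidates = []
    for nb in neighbors:
        answer = nb.handle_first_request(comp_name, version)
        if any(n == comp_name and v >= version for n, v in answer):
            candidates.append(nb)
    if not candidates:
        return None
    chosen = min(candidates, key=lambda nb: cost_to_reach[nb.name])
    return chosen.name, chosen.handle_second_request(comp_name)


neighbors = [
    Neighbor("N1", {"routing": (1, b"routing-v1")}),
    Neighbor("N2", {"routing": (2, b"routing-v2")}),
    Neighbor("N3", {"routing": (3, b"routing-v3")}),
]
cost_to_reach = {"N1": 5, "N2": 30, "N3": 10}
result = recover_component("routing", 2, neighbors, cost_to_reach)
print(result)
```

In the alternative variant every neighbor would send its payload straight after the first request, trading extra traffic for the saved round trip of the second request.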
In case of configuration information indicating a software program, the correlation is achieved by:
In case of a database storing configuration information, the correlation is achieved by storing in the neighbor network elements information of the configured connections crossing the recovering and the neighbor network elements. Each connection is identified uniquely in the network at least by:
For example the first row indicates that the recovering network element receives from N1 (Neigh.Id=N1) the connection c1 (Conn.Id=c1) in the direction from N1 to R (Dir=Out) on port p1 (Port Id=p1) on Virtual Channel # 4.33.
The recovering network element can restore a corrupted configuration of part of the connections from the neighbor network elements to the recovering network element by configuring the part of the connections taking into account:
The recovering network element needs to recover not only the correlated configuration information of the external part of the connections from the neighbor to the recovering network element, but also the correlated configuration information of the internal part of the connections (also indicated with cross-connections) between input and output ports of the recovering network element. This is indicated by Table 4 of
The example refers to unidirectional connections, but it is possible to recover configuration of bi-directional connections, by filling the Xc field in the associated row. In the above example, we would have the Xc field of the first row filled with p3.4.33<->p1.2.53.
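A possible way to interpret an Xc field such as the one quoted above is sketched below. The layout is an assumption made for illustration: each endpoint is read as a port identifier followed by a dot and a virtual channel number (so `p3.4.33` is port `p3`, channel `4.33`), `<->` marks a bi-directional cross-connection, and a plain `->` is assumed for the unidirectional case, which the text does not spell out. All type and function names are hypothetical.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    port: str
    channel: str


class CrossConnection(NamedTuple):
    a: Endpoint
    b: Endpoint
    bidirectional: bool


def parse_endpoint(text: str) -> Endpoint:
    """Split 'p3.4.33' into port 'p3' and channel '4.33' (assumed layout)."""
    port, _, channel = text.partition(".")
    if not port or not channel:
        raise ValueError(f"malformed endpoint: {text!r}")
    return Endpoint(port, channel)


def parse_xc(field: str) -> CrossConnection:
    """Parse an Xc field; '<->' is bi-directional, '->' is assumed unidirectional."""
    if "<->" in field:
        left, right = field.split("<->", 1)
        bidirectional = True
    elif "->" in field:
        left, right = field.split("->", 1)
        bidirectional = False
    else:
        raise ValueError(f"malformed Xc field: {field!r}")
    return CrossConnection(parse_endpoint(left.strip()),
                           parse_endpoint(right.strip()),
                           bidirectional)


xc = parse_xc("p3.4.33<->p1.2.53")
print(xc)
```

A recovering network element could apply such parsing to the rows received from its neighbors in order to re-create the internal cross-connections between its input and output ports.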
In case of configuration information indicating a database of performance monitoring (PM) functions, the correlation is achieved by storing in the neighbor network elements a list of PM counters (for maintenance or for Quality of Service purposes) for monitoring the connections crossing the recovering and neighbor network elements, and by storing corresponding parameters, for example:
The terms “recovering network element” and “neighbor network element” are used only for explaining the invention; in the telecommunication network there is no differentiation defined in advance between the network elements: this means that at a given time any network element can be “neighbor” or “recovering”. Therefore a network element includes both transmitting means (for performing the neighbor functionalities) adapted to transmit to another network element correlated configuration information and receiving means (for performing the recovering functionalities) adapted to receive from another network element correlated configuration information. In fact, depending on the function performed:
The method can be advantageously implemented on a network element, for example an Add-Drop Multiplexer, a switch or a router. The network element includes storing means adapted to store configuration information at least in part correlated to configuration information stored in another network element and transmitting means adapted to transmit to the other network element at least part of the correlated configuration information. The network element further includes receiving means adapted to receive from the other network element part of the configuration information stored in the other network element. The network element also includes processing means, for example a microprocessor (external or embedded into an ASIC/FPGA) adapted to run a software program (for example written in C) for performing one or more steps of the inventive method. The processing means are adapted to select the other network element from possible network elements taking into account at least one of the following criteria:
The method can be advantageously implemented in a telecommunication network including a recovering network element and a neighbor network element. The recovering network element and the neighbor network element include storing means adapted to store configuration information; the configuration information stored in the neighbor network element is at least in part correlated to the configuration information stored in the recovering network element. The neighbor network element includes transmitting means adapted to transmit to the recovering network element correlated configuration information stored in the neighbor network element and the recovering network element includes receiving means adapted to receive from the neighbor network element correlated configuration information. The network includes at least one further network element including storing means adapted to store configuration information; the configuration information stored in the further network element is at least in part correlated to the configuration information stored in at least one neighbor network element. The further network element includes transmitting means adapted to transmit to at least one neighbor network element corresponding correlated configuration information and includes receiving means adapted to receive from at least one neighbor network element correlated configuration information.
Number | Date | Country | Kind |
---|---|---|---|
05291596.4 | Jul 2005 | EP | regional |