The present invention generally relates, in a first aspect, to a procedure for optical network survival against multiple failures, and more particularly to a procedure that allows an optical network to quickly react in case of working paths failures by pre-calculating a set of recovery paths for each working path linking an origin node with a destination node, with backward reserve of resources.
In a second aspect, the invention generally relates to a system for optical network survival against multiple failures, and more particularly to a system arranged for pre-calculating a set of recovery paths for each working path and simultaneously using them to recover a working path failure.
Currently, optical transport networks are the solution for moving huge volumes of data between geographically distant locations. An optical transport network is made of photonic switches interconnected by fibre links. Failures in the fibre links or the nodes are not uncommon. They are caused by failure of equipment in the link (e.g. amplifiers) or by cuts in the fibres, e.g. due to roadworks, digging, power failure or bad weather. To estimate the number of failures in a network, the FCC reports can be consulted; they found that long haul networks experience 3 cuts per year for every 1500 km of fibre [Grover03]. That implies a cut every four days in a typical long haul network with 45000 km of fibre. Thus, it is necessary to provide the network with means of maintaining service continuity in the presence of failures. Not only should the network be able to react to a single failure, but also to a multiple failure situation in which several links or nodes are affected simultaneously (or one after the other). A double failure can lead to service disconnection. It is not uncommon for another fault to happen in the network before a previous failure has been repaired. Also, catastrophic situations, like those provoked by bad weather or power outages, affect multiple geographically close locations.
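The four-day figure follows directly from the cited cut rate. As a quick check, assuming that cuts are spread uniformly over the fibre length and over the year:

```latex
\[
\frac{3\ \text{cuts/year}}{1500\ \text{km}} \times 45000\ \text{km} = 90\ \text{cuts/year},
\qquad
\frac{365\ \text{days/year}}{90\ \text{cuts/year}} \approx 4.1\ \text{days between cuts}.
\]
```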
Summing up, as optical networks transport huge volumes of data, a quick recovery time is critical to avoid the loss of Tbs of data. The impact of network unavailability is studied in [Grover03] and is summarized in
Currently, the solution for providing survivability in optical transport networks is based on GMPLS [RFC3471], an evolutionary advance of MPLS that supports packet switching, time division and wavelength multiplexing. GMPLS is the basis of an optical transport network control plane. The functional specification of the GMPLS recovery is described in [RFC 4426]. For the rest of the present description, the terminology for Recovery (Protection and Restoration) for GMPLS specified in [RFC 4427] will be used.
GMPLS defines different ways of achieving survivability. In an optical transport network, end to end data connections are known as LSPs (label switched paths). According to [RFC 4426], an LSP may be subject to local (span), segment and end-to-end recovery. Local (span) recovery refers to the protection of the link between two neighbour switches. Segment protection refers to the recovery of a segment between two nodes. Finally, end-to-end protection refers to the protection of an entire LSP from the ingress to the egress node.
According to [RFC4427], protection and restoration of switched LSPs under tight time constraints is a challenging problem. This is particularly relevant to optical networks that consist of Time Division Multiplex (TDM) and/or all-optical (photonic) cross-connects referred to as GMPLS nodes (or simply nodes, or even sometimes “Label Switching Routers, or LSRs”) connected in a general topology [RFC3945].
For the rest of the present description, the working LSP will refer to the LSP that transports normal user traffic, and the recovery LSP to the LSP that transports normal user traffic when the working LSP fails.
GMPLS [RFC4426] and ITU-T [G.801] define several schemes of survivability, grouped in protection and restoration schemes, which are next summarized, together with combined schemes.
Protection Schemes:
a) 1+1 Dedicated Protection
This scheme is based on the pre-establishment of a dedicated resource-disjoint protection recovery LSP (Label Switched Path) associated with the working LSP. Traffic is duplicated and simultaneously sent on both paths; when a failure is detected, the traffic is taken from the path that is not affected by the failure.
b) 1:1 Protection with Extra Traffic
This scheme is similar to 1+1, but allows transporting extra traffic in the recovery LSP. This extra traffic will be preempted in case of failure.
c) 1:n Protection
A recovery LSP is set to protect n working LSPs. In the event that 2 or more LSPs fail, only one of them can use the recovery LSP.
d) m:n Protection
m recovery LSPs are set to protect n working LSPs. If more than m LSPs fail, some of the working LSPs cannot be recovered.
e) SMP (Shared Meshed Protection)
According to SMP, each working connection is protected by a pre-configured protection path. The protection path can share some resources with other protection paths. The resources are reserved for the protection, but the recovery paths are not established. After the failure notification and location, the protection path is fully established.
The main difference with the previous schemes is that parts of the recovery LSP are shared with other recovery LSPs (not the whole recovery path).
Restoration Schemes:
a) Pre-Planned LSP Restoration
Before the failure detection and notification, one or more restoration LSPs are pre-computed and signalled between the same ingress-egress node pair as the working LSP, but not established (cross-connected). After the notification of the failure and its location, one recovery LSP is selected among those pre-calculated and is completely established.
b) Shared-Mesh Restoration
“Shared-mesh” restoration is defined as a particular case of the pre-planned LSP restoration that reduces the restoration resource requirements by allowing multiple restoration LSPs (initiated from distinct ingress nodes) to share common resources (including links and nodes.).
This mechanism is very similar to the shared meshed protection (SMP) in the ITU-T. The main difference is that SMP reserves the resources, while in the shared meshed restoration there is no guarantee.
c) LSP Restoration
After failure detection and notification, an alternate LSP is computed, signalled and fully established. The alternate LSP is signalled from the ingress node and may reuse the intermediate node's resources of the working LSP under failure condition (and may also include additional intermediate nodes.)
There are no specific recovery LSPs activated protecting the working LSP.
However, the working LSP can potentially be restored through any alternate available route, with or without any pre-computed restoration route. In this case the resources for the recovery LSP can be preallocated, but explicit signalling is needed to activate the recovery LSPs. The inventors refer to the latter as pre-computed restoration.
Combined Schemes
Additionally, these schemes can be combined to obtain further levels of protection:
1+1 Protection+Restoration Combined (PRC):
In this case, the path is protected by a dedicated LSP, and when either the working LSP or the protection LSP fail, they are restored.
[RFC 4872] defines the RSVP-TE Extensions in Support of End-to-End Generalized Multi-Protocol Label Switching (GMPLS) Recovery.
Survivability in Optical Transport Networks is also being standardized in the ITU-T SG-15. In this context, Shared Meshed Protection is currently being defined (G.SMP).
There are several patent documents that aim to solve the survivability in optical transport networks, some of which are next cited and briefly described.
US20050259570A1—Fault recovery method for multi protocol label switching network, involves receiving fault event notification that indicates occurrence of fault after fault localization is performed, and performing alternative path calculation. The method involves receiving a fault event notification that indicates occurrence of a fault after fault localization is performed. A predetermined waiting time more than a time taken to receive state information notifications of links other than a link that is being utilized as an LSP is awaited. Alternative path calculation is performed based on the fault event notification and the state information notifications [Hitachi].
US20030084367A1—Communication network e.g. mesh network, accommodates traffic path on fault recovery layer through specific transport path when one transport path is not working properly.
WO2008006268A1—Method for realizing service protection in automatically switched optical network involves establishing connection according to recovery path establishment request, and switching service from work paths to recovery paths. [Huawei]
One of the main limitations of the different solutions is the need to locate the failure in the network. Locating the failure needs a Hello-like protocol for the discovery and messages for the notification [Rozycki07]. The time to locate the failure depends on the transmission time between the node closest to the failure and the origin node of the affected LSPs, and on the processing time in the nodes. This time can be in the range of hundreds of milliseconds, depending on the size of the network. Thus, avoiding this time can be the difference between no impact on the services and interruptions in the services.
To summarize, the main limitations of current solutions are:
Protection Schemes:
The 1+1, 1:1, 1:n and m:n protection schemes are not able to survive a double/multiple failure. Moreover, in the cases with extra traffic, this traffic has to be preempted in case of failure. Only the 1+1 mechanism guarantees the recovery of all the LSPs.
On the other hand, protection schemes are the only ones to recover in less than 50 ms.
Restoration Schemes:
a) LSP Restoration
The first step in this scheme is to know the exact location of the failure. Thus, the network must have mechanisms to provide such information. The next step is the calculation of the recovery LSP, which cannot begin until the information on the location of the failure arrives at the node. The calculation of the recovery LSP includes updating the topology, calculating an alternative path, assigning a new wavelength and checking all the physical restrictions of the path. Next, the route must be signalled and all the cross-connections performed.
All these steps can take several seconds in wavelength switched optical networks. Moreover, there are no guarantees that the calculated route does not interfere with other calculated routes. In case there are collisions, recovery time increases significantly.
b) Pre-Planned LSP Restoration/Shared Meshed Restoration
In order to reduce the restoration time, a recovery path (or a set of recovery paths) can be computed in advance. Also, as noted by RFC4426, in case of multiple failures, the shared mesh restoration capacity can be claimed for more than one failed LSP and the recovery LSP can be activated for one of them at most.
These schemes are the most adequate for recovering from multiple failures. However, current mechanisms need the notification of the fault to start the recovery process. Although multiple recovery paths can be pre-computed, only one of them can be selected and signalled. Thus, if the selected recovery LSP fails, e.g. because it uses the same shared resources as another LSP, the recovery has to be attempted again, lengthening the recovery time.
In the best case, a successful pre-planned Restoration will be in the order of hundreds of milliseconds. Thus, HELLO problems between the routers may appear and TCP sessions start to fail. When several LSP restorations collide, the restoration time can go up to several seconds, increasing the problems.
1+1 Protection+Restoration Combined (PRC):
This mechanism can survive a double failure in the same sense as the restoration. As long as the working and recovery LSPs are set, the mechanism is fast.
However, resources consumption is very high and knowledge of the location of the failure is again needed to survive against multiple failures.
Two transmitters are needed simultaneously, and if the second failure happens before the working LSP has been re-established, a restoration is needed, taking a long time, and with no guarantee.
The method described in US20050259570A1 [Hitachi] begins with a notification of the location of the failure.
The method described in US20030084367A1 [NEC] does not need to know the location of the failure, so the recovery is sped up. However, it is aimed at the recovery of single failures. In the presence of multiple failures, it needs complementary mechanisms.
WO2008006268A1 [Huawei] proposal speeds the recovery in automatically switched optical networks. However, it is mainly aimed at the recovery of a single failure.
Other patent documents disclosing mechanisms for the recovery of a single failure are: US20070242605 A1, US20030043427A1 and US20040109687A1, the latter requiring locating the failure.
“Multiple link failure recovery in survivable optical networks”, Xiaofei Cheng, Xu Shao and Yixin Wang, Photonic Network Communications, Volume 14, Number 2, 159-164, DOI: 10.1007/s11107-007-0071-4, discloses a method for recovery from multiple link failures, but is focused on the reservation of backup bandwidth, which increases the resources consumption.
Summarizing, none of the current solutions is able to respond quickly to a multiple failure with a low use of resources and with guarantees. Most of the solutions need to be notified first of the exact location of the failure.
Although restoration is able to respond to multiple failures, it takes on average several seconds to recover, a time that increases significantly in case of collisions.
Fast recovery of a single fault, in less than 50 ms, is only possible with protection schemes, which double the network resources dedicated for survivability.
It is necessary to offer an alternative to the state of the art which covers the gaps found therein, particularly those related to the lack of proposals providing a fast recovery for multiple failures.
To that end, the present invention provides, in a first aspect, a procedure for optical network survival against multiple failures, comprising using a pre-planned path restoration scheme for computing in advance, or pre-calculating, recovery paths for recovering failed working paths.
Contrary to the prior art proposals, the procedure of the first aspect of the invention characteristically comprises pre-calculating a set of recovery paths for each working path linking an origin node with a destination node and, in case a single or multiple failure has occurred in a working path, simultaneously using the recovery paths of said set of recovery paths to try to communicate said origin node with said destination node, by simultaneously sending different request recovery messages through said recovery paths, and selecting, at said destination node, the recovery paths to be used depending on said different request recovery messages sent.
The present invention is based on the pre-planned restoration schemes and improves them to react quickly in case of multiple failures, avoiding the need to know the exact location of the failure. For some embodiments, the invention will be able to recover in less than 200 ms in many cases, avoiding problems in the TCP sessions and in the communication between the routers.
For an embodiment, said paths of said set of recovery paths are three or more in number, such that at least one of the recovery paths is valid in case of a double link failure.
The procedure comprises, as per an embodiment, detecting a working LSP failure at said origin node, for example by means of loss of signal or loss of quality indications.
Other embodiments of the procedure of the first aspect of the invention are described in appended claims 5 to 13, and in a next section of the present description.
A second aspect of the invention concerns a system for optical network survival against multiple failures, comprising:
Contrary to conventional systems, in the system of the second aspect of the invention said processing means are arranged for pre-calculating a set of recovery paths for each working path, and said control means are arranged for simultaneously using the recovery paths of said set of recovery paths to try to establish communications between said origin node and said destination node, in case at least one failure has occurred in said working path.
The system of the second aspect of the invention implements, for some embodiments, the method of the first aspect of the invention.
Other embodiments of the system of the second aspect of the invention are described in appended claims 15 to 20, and in a next section of the present description with reference to the attached drawings.
The previous and other advantages and features will be more fully understood from the following detailed description of embodiments, with reference to the attached drawings (some of which have already been described in the Prior State of the Art section), which must be considered in an illustrative and non-limiting manner, in which:
The present invention, as for its first and second aspects, aims at a quick recovery of an optical network in case of a double/multiple failure while keeping the use of resources low. It relies on the following concepts:
The main concept of the procedure is, firstly, the pre-calculation of a set of recovery paths (route+wavelength+optical parameters for the transponder) for each working path. The set of recovery paths is configurable, according to the desired level of protection. The present invention includes a mechanism to calculate all the paths for a double failure case, in such a way that at least one of the paths is valid in case of a double link failure.
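The double-failure property stated above can be checked for a candidate set of recovery paths. The following is a minimal sketch, not the path computation of the invention: it assumes paths are given as node sequences and links are undirected, and the function names and the small topology in the usage note are hypothetical.

```python
from itertools import combinations


def links_of(path):
    """Return the set of undirected links traversed by a node sequence."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def survives_all_double_failures(working, recovery_paths, all_links):
    """Return True if, for every pair of failed links that breaks the working
    path, at least one recovery path avoids both failed links."""
    working_links = links_of(working)
    recovery_links = [links_of(p) for p in recovery_paths]
    for failed in combinations(all_links, 2):
        failed = set(failed)
        if not (failed & working_links):
            continue  # working path unaffected, nothing to recover
        if not any(not (failed & rl) for rl in recovery_links):
            return False  # every recovery path is hit by this failure pair
    return True


# Hypothetical topology: three link-disjoint routes between A and Z.
links = [frozenset(l) for l in [("A", "B"), ("B", "Z"), ("A", "C"),
                                ("C", "Z"), ("A", "D"), ("D", "Z")]]
working = ["A", "B", "Z"]
print(survives_all_double_failures(working, [["A", "C", "Z"], ["A", "D", "Z"]], links))
print(survives_all_double_failures(working, [["A", "C", "Z"]], links))
```

With a single recovery path, a failure pair that hits both the working path and that recovery path leaves the connection unrecoverable, which is why the size of the set determines the level of protection.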
The invention relies on end-to-end LSP recovery. The failure has to be detected at the ingress node, either by means of loss of signal (LS) or Loss of quality (LQ) indications.
Once the failure is detected at the optical layer in the ingress node, simultaneous request recovery messages are sent to the egress node, each of them following the corresponding pre-calculated path. All request recovery messages have the same LSP identifier, a common failure id, and a different recovery LSP id per path. At the destination (egress node), depending on the embodiment, one or more request recovery messages will arrive. Only one of them will be chosen. The preferred option is to select the first request recovery message to arrive, ignoring the rest of request recovery messages with the same LSP id and failure id. When the intermediate nodes receive the request recovery messages, the resources are not reserved, only checked if they are available for the given LSP.
The set up of the backup path is made through the reverse path. Note only one response recovery message, or response to a request recovery message, is sent from the egress node, as the rest of request recovery messages are ignored. When the intermediate nodes receive the positive recovery response, i.e. the response recovery message, the resources are activated (e.g. the cross-connections are made).
With this mechanism the chances that a connection survives in case of double failure are enhanced. Furthermore, the mechanism does not need to wait to know where the failure has happened.
The invention also proposes an implementation of the recovery message based on extending the RSVP messages.
Thus, the invention is an enhancement of the pre-planned LSP restoration scheme to react fast in case of multiple failures. It is also suitable to be implemented in Shared meshed restoration schemes.
Next the main modules of the system of the second aspect of the invention are described, for the embodiment illustrated in
Element 101 is a Multi-path Computation element: This module is in charge of the computation of a set of recovery LSPs for a working LSP in the network. The set of paths computed by this element cover multiple fault cases. This module is composed of sub-modules 102, 103 and 104.
Sub-module 102, PCEP: This element is in charge of the communication with submodule 106 of the Optical Node Survivability Controller (element 105).
Sub-module 103 is the multipath computation, wavelength assignment and impairment validation:
This element is in charge of computing the set of recovery LSPs for a working LSP in the network. The paths must cover all the fault cases of interest (single, double . . . ). It must calculate a wavelength for each recovery LSP. It must validate the feasibility of each calculated path.
A possible, not excluding others, mechanism to perform the calculation of the paths is described next:
First of all, it is noted that, in order to be able to achieve 100% availability against failures of order n, all nodes need to be connected to at least n+1 other nodes. Otherwise, a node that is not connected through at least n+1 links can be isolated in some failure case of order n, making it impossible to ensure 100% availability.
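The connectivity condition above is easy to test on a topology before computing any path. The sketch below assumes the topology is available as an adjacency mapping; the function name and the example network are hypothetical. The check only covers the condition stated above, which is necessary but does not by itself guarantee that enough disjoint paths exist between every node pair.

```python
def nodes_below_degree(adjacency, n):
    """Return the nodes with fewer than n + 1 distinct neighbours, i.e. the
    nodes that can be isolated by some failure of order n."""
    return sorted(node for node, neighbours in adjacency.items()
                  if len(set(neighbours)) < n + 1)


# Hypothetical four-node ring A-B-C-D with one chord A-C.
topology = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["A", "C"],
}
print(nodes_below_degree(topology, 1))  # single failures: no node at risk
print(nodes_below_degree(topology, 2))  # double failures: B and D at risk
```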
The next described mechanism to calculate different paths that survive against a double failure is just one of several mechanisms which can be used for calculating the set of paths. Depending on the embodiment, all or only part of the calculated paths are used for recovery purposes. The number of recovery paths can be limited, reducing the availability.
The mechanism works as follows:
For each source destination pair in the network:
Sub-Module 104, Topology:
This sub-module is in charge of listening to IGP protocols and keeping the Traffic Engineering DataBase up to date with the topology and the use of the lambdas in the different interfaces.
Element 105, which is an Optical Node Survivability Controller, is a module in charge of:
This module is composed of five sub-modules, namely sub-modules 106, 107, 108, 109 and 110, and is in charge of the end to end LSP recovery.
Sub-module PCEP 106 is in charge of the communication with sub-module 102 of the multi-path computation element 101. Sub-module 106 receives requests for recovery paths for a working LSP from the Core (decision) sub-module 108. The sub-module 106 processes the answers from the multi-path computation element 101 and stores them in the path cache 107.
Sub-module Path Cache 107 maintains a set of recovery LSPs for each working LSP starting in that node. Associated with each LSP, the optical node should keep all the information needed for a quick set up of the lightpath (e.g. power balance). The set of paths can be updated at any time.
Sub-module Core 108 is the intelligence of the node. Every time a working LSP starting from its node is established, it requests a set of recovery paths from sub-module 106. Once the set of recovery paths is received, it calculates the adjustments needed in the node to establish each of these LSPs and stores them in the Path Cache 107.
The origin node has, or is associated with, all of the above modules and sub-modules. The destination node and each of the intermediate nodes have, or are associated with, respective core sub-modules 108: in the intermediate nodes, to process the request and response recovery messages received and to check, reserve and/or activate the needed resources; in the destination node, to receive the request recovery message and to generate and send the corresponding response recovery message.
In the event of a failure in a working LSP starting in the node (due to any kind of failure, single or multiple), which is notified by element 109 (fault detection), one request recovery message is created for each available recovery path. Each message is passed to the signalling module (110), which forwards it through the control plane interfaces.
As shown in
A request recovery message with different Recovery_LSP_ID is sent for each recovery LSP. Thus, n simultaneous messages are sent from the Source node, where n is the number of pre-calculated paths for each LSP. Each of the request recovery messages will follow its recovery path.
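How the origin node could derive the n simultaneous messages from its path cache can be sketched as follows. This is an illustration under stated assumptions, not the message format of the invention: the class name, field names and types are hypothetical, and only the identifiers named in the description (LSP identifier, failure id, recovery LSP id, path and wavelength) are modelled.

```python
from dataclasses import dataclass
from itertools import count
from typing import Tuple


@dataclass(frozen=True)
class RequestRecovery:
    lsp_id: int            # identifier of the failed working LSP
    failure_id: int        # common to all requests triggered by one failure
    recovery_lsp_id: int   # different for each recovery path
    path: Tuple[str, ...]  # node sequence the request has to follow
    wavelength: int        # hypothetical encoding of the pre-assigned lambda


_failure_ids = count(1)  # assumption: failure ids are local counters


def build_requests(lsp_id, cached_paths):
    """Create one request per cached recovery path of a failed working LSP.

    cached_paths is a list of (node_sequence, wavelength) tuples, as they
    might be stored in the path cache.
    """
    failure_id = next(_failure_ids)
    return [
        RequestRecovery(lsp_id, failure_id, index, tuple(nodes), wavelength)
        for index, (nodes, wavelength) in enumerate(cached_paths, start=1)
    ]


cache = [(["S", "A", "D"], 1), (["S", "B", "D"], 2), (["S", "C", "E", "D"], 1)]
for request in build_requests(lsp_id=7, cached_paths=cache):
    print(request)
```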
When a core sub-module 108 of a node receives a request recovery message and the final destination of the LSP is not that node, i.e. it is an intermediate node, it checks whether resources are available for such path and whether the route of the recovery LSP is possible (it may not be possible if it uses a link which has failed). Note that at this point the resources are not reserved; it is only checked whether they are available for the given LSP.
The set up of the backup path is made through the reverse path. When core module 108 receives a response recovery message, it reserves the resources in the optical node. Note only one recovery response is sent from the egress node, as the rest are ignored. When the intermediate nodes receive the positive recovery response, the resources are activated (e.g. the cross-connections are made).
With this mechanism the chances that a connection survives in case of double failure are enhanced. Furthermore, the mechanism does not need to wait to know where the failure has happened.
When core module 108 receives a recovery message whose destination is its optical node, i.e. it is the destination node, and it is the first recovery message for that LSP and failure case to arrive there, said core module 108 of said destination node responds to the message if there are available resources. To this end, a list of LSP_ID, FailureId and Recovery_LSP_ID is maintained. If the incoming recovery message has an LSP_ID and FailureId that are already in the list, the message is discarded.
A Response recovery message is used to confirm to the Source node that the Request recovery message arrived and that the corresponding path is available. This message is composed of the following fields, as shown in
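The first-arrival selection at the destination node can be sketched with a small lookup keyed by LSP_ID and FailureId. The class and method names are hypothetical. One point is an assumption of this sketch rather than something the description settles: a request that cannot be answered for lack of resources is not recorded, so a later request for the same failure can still be accepted.

```python
class EgressSelector:
    """Keeps the list of (LSP_ID, FailureId) pairs already answered."""

    def __init__(self):
        self._accepted = {}  # (lsp_id, failure_id) -> recovery_lsp_id

    def on_request(self, lsp_id, failure_id, recovery_lsp_id,
                   resources_available=True):
        """Return True if a response recovery message should be sent."""
        key = (lsp_id, failure_id)
        if key in self._accepted:
            return False  # a request for this LSP and failure was already answered
        if not resources_available:
            return False  # assumption: unanswered requests are not recorded
        self._accepted[key] = recovery_lsp_id
        return True


selector = EgressSelector()
print(selector.on_request(7, 1, recovery_lsp_id=2))  # first arrival: answered
print(selector.on_request(7, 1, recovery_lsp_id=1))  # same LSP and failure: discarded
print(selector.on_request(7, 2, recovery_lsp_id=1))  # new failure case: answered
```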
Sub-module 110, Signalling, is in charge of constructing the Request and response recovery message and sending them to the network. A possible, not excluding others, way of constructing these messages is by extending the RSVP protocol.
Next different embodiments of the procedure of the first aspect of the invention, and of use of the system of the second aspect of the invention, are described with reference to
The other possible case is when there are more recovery paths arriving to the destination. As shown in
The Destination node receives a message through at least one path that is available after a failure. So, when the Destination node receives a message, it analyses the Path field of the Request recovery message in order to know the sequence of nodes, that is, the path that is available. The Destination node then generates the Response recovery message with the same information that was included in the Request recovery message and sends it back through the path determined in the Path field.
In case the Destination node receives more than one Request recovery message, it accepts the message that arrives first, which corresponds to the shortest path, and discards later messages. The advantage of accepting the first message is that it corresponds to the path with the lowest delay.
In the transmission time of the Response recovery message, one of the important stages of the invention takes place. The stage referred is “the backward reservation of the resources needed for the transmission”. This is possible because the response recovery messages include all the information, that is, the node sequence and thus the link sequence, and the wavelength (it is considered a network without wavelength conversion so the wavelength assigned to a path is the same for all the links of the path).
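The difference between the forward check and the backward reservation can be illustrated as follows. This is a simplified sketch under stated assumptions: links are modelled as directed (node, node) tuples with a set of free wavelengths, the same wavelength is used on every link (no wavelength conversion, as stated above), and the function names are hypothetical. The roll-back on a failed reservation is an assumption of the sketch, not a step taken from the description.

```python
def forward_check(path, wavelength, free):
    """Request pass: check, without reserving, that the wavelength is free on
    every link of the path. free maps (u, v) tuples to sets of wavelengths."""
    return all(wavelength in free.get(link, set())
               for link in zip(path, path[1:]))


def backward_reserve(path, wavelength, free):
    """Response pass: walk from the destination back to the origin and reserve
    the wavelength link by link. Returns the reserved links in the order they
    were taken, or None if some link was no longer available."""
    reserved = []
    for link in reversed(list(zip(path, path[1:]))):
        if wavelength not in free.get(link, set()):
            for done in reserved:
                free[done].add(wavelength)  # assumption: undo partial reservation
            return None
        free[link].remove(wavelength)
        reserved.append(link)
    return reserved


free = {("S", "A"): {1, 2}, ("A", "D"): {1}}
path = ["S", "A", "D"]
print(forward_check(path, 1, free))     # resources are available, nothing taken yet
print(backward_reserve(path, 1, free))  # links reserved starting at the destination
print(forward_check(path, 1, free))     # wavelength 1 is no longer free
```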
After a given time the Source node receives the Response recovery message. The Source node analyses the Path field of the Response recovery message in order to know the available path and resumes sending, through the newly established path, the information that was being transmitted before the double failure, as is shown in
Current mechanisms for the survivability of an optical network either take several seconds or are very fast (less than 50 ms) but use a high number of resources and do not handle multiple failures.
This invention provides the following features:
Perform a Fast recovery (less than one second) of an end to end optical path in case of a double failure in the network.
Perform a Fast recovery (less than one second) of an end to end optical path in case of a catastrophic multiple failure in the network.
Recover from a multiple failure without having to know the exact location of the failure.
Low use of network resources (no need to dedicate transponders and wavelengths) to guarantee the survival against double failure.
A person skilled in the art could introduce changes and modifications in the embodiments described without departing from the scope of the invention as it is defined in the attached claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| P201130064 | Jan 2011 | ES | national |

| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/EP2011/074090 | 12/27/2011 | WO | 00 | 9/30/2013 |