This application is a National stage of International Application No. PCT/EP2012/067981, filed Sep. 13, 2012, which is hereby incorporated by reference.
This invention relates to a method of realigning a node in a label switched network and to a node for use in a label switched network.
A telecommunications network is typically made up of a plurality of nodes which are interconnected. Routes are created through the network by choosing a set of links between the nodes so that a route can enter the network at an ingress node, traverse the network by hopping between nodes via the links and exit the network at an egress node. Typically each node has knowledge of the routes traversing that node.
Periodically, it is necessary for nodes to be restarted. Prior to the restart, it is likely that one or more routes will traverse the node, and thus after a restart it is necessary for the node to recover the state information about routes traversing the node in order to allow the one or more routes to be resurrected successfully after the restart. This recovery of state is typically termed “realignment” of the node.
In the prior art, two approaches have typically been used to realign nodes following a restart. In the first approach, nodes carry out periodic backups of their state to some form of non-volatile memory and are then able to restore that state following a restart. In a traditional circuit-based domain this approach is called “hard state”. In such a network (typically a time division multiplex (TDM) or dense wavelength division multiplex (DWDM) network) the circuit paths are generally quite static and do not vary over short periods of time. Thus periodic snapshots taken in the form of backups generally work quite well because the routes in the network are unlikely to have changed in any significant way between backups. However, if significant changes to the network have been made, a very large manual effort is required to reconcile the local circuit status with the actual network status following a restart.
In other types of networks such as packet switched networks, for example Internet protocol (IP) or multi-protocol label switched (MPLS) networks, the routes change much more quickly than in traditional circuit-based networks. Accordingly, a second approach to node restart has been adopted for this type of network, in which a restarted node talks to neighbouring nodes in the network, i.e. nodes which are interconnected with the restarted node, to gain state information about the routes in place before the restart. In the context of such networks this type of restart is known as a “graceful restart” and is described in Internet Request for Comments (RFC) 5063. This prior art graceful restart procedure assumes that only one node restarts at a time. However, if multiple restarts occur, the information available from neighbouring nodes, as postulated in RFC 5063, is likely to be incomplete and the resurrection of previous routes will fail. Furthermore, if the network has fast changing routes, a backup and restore method will also fail because route changes since the last backup will often be too great and thus the restarted node will have an unacceptably outdated restarted state.
Thus there is a need for recovery of node state information to be possible in the event of multiple node restarts in a network having relatively rapid changes in routing.
It is an object of the present invention to overcome at least some of the problems of the prior art node restart approaches described above.
The present invention provides a method of realigning a node in a label switched network.
Typically the network will have a plurality of nodes and the method includes periodically maintaining backup path status information for the node. In response to restarting the node, the label switched paths are re-established with the other nodes in the network using the backup status information. This is achieved by communicating with an adjacent node in order to establish a path reliability value for each path recorded in the backup status information using a reliability value from an adjacent node, in order to establish node realignment.
Advantageously, during restart, the node maintains a path reliability value for a path traversing the node. These path reliability values may be exchanged with adjacent nodes for a path traversing the adjacent nodes to establish a cumulative path reliability value for the path. In this way, the probability of the path being a valid path may be increased since the existence of the path is based upon multiple stored backups in the network rather than just the backup status information of the single node being restarted.
Preferably the cumulative path reliability value is compared with a path reliability threshold to determine whether the related path is reliable and thus may be considered to be a path which was in existence before the node restart. Advantageously the path reliability threshold is a function of the number of Hops in the related path.
The invention also provides a node for use in a label switched network comprising a plurality of nodes. The node includes a processor which is operable periodically to create a backup status record for the node. The node also includes a memory arranged to store the backup status record and a processor arranged to restart the node on command and to restore the node to a state defined by the backup status record. The processor is also arranged to calculate a path reliability value. The node has a network interface which is operable to receive path status information and path reliability values from adjacent nodes.
Following a restart, the node may receive a path reliability value from an adjacent node and the processor may modify the received path reliability value dependent on an internally held path reliability value for the same path. This modified path reliability value may then be passed to another adjacent node. The processor may be arranged to modify the path reliability value to establish a cumulative path reliability value for the path. The processor may further be arranged to compare the cumulative path reliability value with a path reliability threshold to determine whether the related path is reliable and thus may be properly set up following the restart. Preferably the processor is arranged to calculate the path reliability threshold as a function of the number of Hops in the related path.
The invention also provides a computer program product which, when executed on a computer, causes the computer to carry out the method of the invention.
Embodiments of the invention will now be described by way of example with reference to the drawings, in which:
The invention is described below in connection with generalised multi-protocol label switching (GMPLS) but the technique is applicable to any network having multiple nodes and routes set up through those nodes and in particular, to label-switched nodes.
Typically in such a network, a node or “network element” (NE) may be restarted due to a software problem or normal maintenance. The restart of the control unit in the NE causes re-initialisation of the control plane meaning that the control plane information about the label switched paths (LSPs) in the Network Element (NE) may be lost. Typically the data plane is unaffected.
Briefly, the NE stores periodic backup files of the path status, restores itself according to this backup file and then communicates with adjacent nodes to test the validity and currency of the backup file. This is described in more detail below.
In general terms, a particular LSP is considered to be valid when a significant number of the NEs traversed by that LSP also have the same information relating to that LSP. Hence, the more NEs that have recorded the existence of an LSP immediately before the NE restart, the more likely it is that the LSP was indeed current before the restart and had not been removed.
Thus with reference to
During a restart, link management protocol (LMP) information is used in order to determine whether a neighbouring node, which is typically a node a single Hop away, is UP or DOWN. When the first control channel with an adjacent NE is up, the adjacency with that NE is considered to be UP. When the last control channel of an adjacent NE goes down, the adjacency with that NE is considered to be DOWN and it is likely that the neighbouring, or adjacent, NE has restarted.
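The UP/DOWN rule described above can be illustrated with a minimal sketch. The class name `Adjacency`, its methods and the channel identifiers are hypothetical and are not taken from LMP or from the embodiment; the sketch only models the rule that the first control channel coming up makes the adjacency UP and the last control channel going down makes it DOWN.

```python
class Adjacency:
    """Tracks the control channels towards one neighbouring NE (hypothetical helper)."""

    def __init__(self):
        # Identifiers of the control channels that are currently up.
        self.up_channels = set()

    @property
    def state(self):
        # UP as long as at least one control channel is up, otherwise DOWN.
        return "UP" if self.up_channels else "DOWN"

    def channel_up(self, channel_id):
        self.up_channels.add(channel_id)
        return self.state

    def channel_down(self, channel_id):
        self.up_channels.discard(channel_id)
        return self.state


adj = Adjacency()
states = [
    adj.channel_up("cc1"),    # first channel up: adjacency becomes UP
    adj.channel_up("cc2"),    # a second channel: still UP
    adj.channel_down("cc1"),  # one channel remains: still UP
    adj.channel_down("cc2"),  # last channel down: adjacency becomes DOWN
]
print(states)
```

In this sketch a DOWN result would be the point at which a node could suspect that the neighbour has restarted.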
In order to realign a neighbour, different Notify messages are used as explained below in connection with
The realignment procedure uses three new Notify messages:
The new realignment procedure also uses the concept of an LSP Degree of Reliability which is explained in detail below.
LSP Degree of Reliability
An adjacent NE, i.e. any NE which is traversed by a particular LSP, can be in one of two states, namely Aligned or Restarted. In addition, information about an LSP can be Reliable or Not_Reliable, with a certain degree of reliability.
An adjacency is considered to be Aligned when, viewed from a particular NE, the adjacent NE has information about LSPs which is entirely consistent between the NEs.
When an adjacent NE is considered to be in the Restarted state, it will instead hold information about LSPs which do not have an adequate or consistent degree of reliability compared with an adjacent NE. In the preferred embodiment, the degree of reliability for an LSP which is Reliable is zero whilst a Not_Reliable LSP can have a degree of reliability assuming any value between 1 and 28.
When the two sides of an adjacency, i.e. two NEs linked along an LSP, exchange the same LSP information and the state of the LSP has a degree of reliability indicated as reliable, then the state of that adjacency moves to Aligned.
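By way of illustration only, the Aligned condition can be sketched as a comparison of the LSP information held on the two sides of an adjacency. The function name `adjacency_state` and the dictionary representation (LSP identifier mapped to a pair of LSP information and degree of reliability) are assumptions made for this sketch and are not part of the described embodiment.

```python
RELIABLE = 0  # degree of reliability of a Reliable LSP, as stated in the text


def adjacency_state(local_lsps, remote_lsps):
    """Return "Aligned" or "Restarted" for one adjacency.

    Both arguments map an LSP identifier to a tuple
    (lsp_info, degree_of_reliability); this layout is an assumption.
    """
    # Both sides must know exactly the same set of shared LSPs.
    if local_lsps.keys() != remote_lsps.keys():
        return "Restarted"
    for lsp_id, (info, dr) in local_lsps.items():
        remote_info, remote_dr = remote_lsps[lsp_id]
        # Same LSP information on both sides, and the LSP marked Reliable.
        if info != remote_info or dr != RELIABLE or remote_dr != RELIABLE:
            return "Restarted"
    return "Aligned"


side_a = {"a": ("NE1-NE2-NE3", 0)}
side_b = {"a": ("NE1-NE2-NE3", 0)}
side_c = {"a": ("NE1-NE2-NE3", 1)}  # same information but Not_Reliable

print(adjacency_state(side_a, side_b))
print(adjacency_state(side_a, side_c))
```

The first call reports Aligned and the second reports Restarted, because one side still holds the LSP with a non-zero degree of reliability.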
In order to show how the procedure may be implemented, we define the following parameters:
At restart, the NE 2 rebuilds its database by retrieving LSP information from .rpp files held in its persistent storage 6 or restored from a backup 8. Because this information is not reliable (it may be out-of-date), the degree of reliability for each LSP is assigned the value DR=1.
When the DR reaches a threshold (Th) value the reliability for an LSP is moved to 0 (Reliable).
Assuming an LSP traverses #_Hops hops, the rule is:
If DR>=Th then DR=0
where Th=#_Hops/X
and X is a configurable value defined on a per-network basis.
As explained below, since the DR value gradually increments as the P_DR values propagate through a network, deriving the threshold from the number of hops helps normalise the threshold value for networks of different sizes. The value X allows adjustment of the weighting given to the threshold value, Th: a lower value of X gives a higher Th, so that an LSP is considered Reliable only when a larger number of NEs have information about the LSP in their backup data. The relevance of this will become more apparent later.
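The threshold rule can be expressed as a short sketch. The function names `threshold` and `apply_threshold` and the numerical values are chosen for illustration only. The text does not state whether Th is rounded, so the quotient is used unrounded here as an assumption.

```python
def threshold(hops, x):
    """Th = #_Hops / X, used unrounded (rounding is not specified in the text)."""
    return hops / x


def apply_threshold(dr, hops, x):
    """Return 0 (Reliable) once DR has reached Th, otherwise DR unchanged."""
    return 0 if dr >= threshold(hops, x) else dr


# Illustrative values: an LSP of 6 hops with X=2 gives Th=3.
print(threshold(6, 2))           # Th for 6 hops and X=2
print(apply_threshold(2, 6, 2))  # DR below Th stays Not_Reliable
print(apply_threshold(3, 6, 2))  # DR at Th becomes Reliable
print(apply_threshold(2, 6, 3))  # a larger X lowers Th to 2
```

With these values, lowering X from 3 to 2 raises Th from 2 to 3, so one more NE must confirm the LSP before it is treated as Reliable.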
This procedure is repeated for each LSP in the rebuilt database. Every time there is an LSP “unknown” by an adjacent NE, its DR, P_DR and R_DR are considered ‘null’. Null values are considered less than 1 but higher than 0, so somewhere between Not_Reliable and Reliable.
Thus in pseudo-code we have:
Procedure
NE Restarted (start):
Each Aligned neighbouring NE (i.e. neighbours that haven't been re-started so have a completely reliable LSP database) receives a Notify_Update from the restarted NE
The NE (Restarted) must process the information received from its Aligned neighbouring NEs.
Thus:
NE Restarted (realigning):
The state of the LSPs is the same for both the Upstream and Downstream side of the adjacency while the state of the adjacency depends on the state of all circuits it shares with the neighbour.
This is explained in more detail with reference to
Firstly, with reference to
The DR values 50 are shown being incremented as messages pass between the NEs in the direction of the arrows. The R_DR values are labeled 52 and the P_DR values are labeled 54. In this case, Th is set at 3.
It will be seen that at restart, NE 2′-7 only has DR=1 for route a. But NE 2′-2 and NE 2′-4 also have route a in their backup information and thus the cumulative DR increases as it propagates through these NEs.
When P_DR reaches NE7 2′-7, the DR for route a at NE7 is finally incremented up to the Th threshold. This NE is then able to issue a Notify_Add message with full setup information, allowing all the other NEs in the LSP to set the LSP as reliable (DR=0) and to set the route up.
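The accumulation described above can be approximated by the following sketch. It assumes, purely for illustration, that each NE holding the route in its backup contributes 1 to the propagated value and that the value travels in a single direction along a hypothetical path NE2, NE4, NE7. The actual topology of the figure and the exact P_DR and R_DR message exchange are not reproduced.

```python
def propagate(path, aware, th):
    """Walk an LSP hop by hop, accumulating a propagated degree of reliability.

    path:  ordered NE names along the LSP (hypothetical)
    aware: set of NEs whose backup contains the LSP
    th:    reliability threshold Th
    Returns (first NE at which Th is reached, or None, and the per-NE trace).
    """
    cumulative = 0
    trace = []
    for ne in path:
        if ne in aware:
            cumulative += 1  # assumption: each aware NE contributes DR=1
        trace.append((ne, cumulative))
        if cumulative >= th:
            return ne, trace  # this NE could now issue a Notify_Add
    return None, trace


reached_at, trace = propagate(["NE2", "NE4", "NE7"], {"NE2", "NE4", "NE7"}, th=3)
print(reached_at, trace)
```

Under these assumptions the threshold of 3 is reached at NE7, matching the narrative in which NE7 is the node that issues the Notify_Add message.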
In the discussion above, it has been assumed that DR for an LSP reaches the Th threshold. However, it is necessary to deal with the case in which the Th threshold is not reached for an LSP.
With reference to
The rule used by the NE in order to decide whether to keep or delete the LSP is the following:
If (LSP_Length−#_NE_Unaware)+DR <Th then the LSP can be removed
Where:
All the NEs have restarted; NE1 2″-1 and NE6 2″-6 recover, via .rpp, information about circuit a.
The first number on each arrow between NEs is the DR while the second one is the #_NE_Unaware.
The boxes 56 represent the cumulative #_NE_Unaware.
The Th is again 3.
When the Notify_Update with DR=1 and #_NE_Unaware reaches NE6 2″-6 we have:
So: (6-5)+1<3 and thus it is possible to delete the LSP.
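The removal rule and the arithmetic above can be checked with a minimal sketch. The function name `can_remove` and its parameter names are illustrative only.

```python
def can_remove(lsp_length, ne_unaware, dr, th):
    """Removal rule: (LSP_Length - #_NE_Unaware) + DR < Th."""
    return (lsp_length - ne_unaware) + dr < th


# Values from the example above: LSP_Length=6, #_NE_Unaware=5, DR=1, Th=3.
print(can_remove(6, 5, 1, 3))  # (6-5)+1 = 2, which is below Th

# Hypothetical contrast: only 2 unaware NEs on the same LSP.
print(can_remove(6, 2, 1, 3))  # (6-2)+1 = 5, which is not below Th
```

The first call returns True, so the LSP can be deleted; in the contrasting case the LSP would be kept.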
The realignment procedure is closed when all the LSPs a NE shares with a neighbour are in the reliable state, that is, are all with DR=0.
The process starts, step 100 and the NE creates a backup, step 102. When the NE restarts, step 104, the NE maintains a path reliability value for each path in the backup, step 105 and begins to re-establish paths, step 106, by exchanging path reliability values with adjacent nodes and deriving a cumulative reliability value for each path, step 108. The cumulative path reliability value is compared with a threshold, step 110 and a decision made whether to create or delete the path, step 112. When all the paths are deemed in a reliable state, the process stops, step 114.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2012/067981 | 9/13/2012 | WO | 00 | 3/13/2015 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2014/040628 | 3/20/2014 | WO | A |
Entry |
---|
“PCT International Search Report for PCT Counterpart Application No. PCT/EP2012/067981”, (May 31, 2013), 3 pages. |
Jiangweilian, et al., “Mechanism of multiple adjacent nodes RSVP graceful restart Simultaneously”, draft-jian-ccamp-multinodes-rsvp-restart-00, Network Working Group Internet Draft, The Internet Society, (Nov. 20, 2005), 12 pages. |
Rahman, et al., “RSVP Graceful Restart Extensions”, draft-rahman-rsvp-restart-extensions-00.txt, (Oct. 2003), 9 pages. |
Satyanarayana, et al., “Extensions to GMPLS Resource Reservation Protocol (RSVP) Graceful Restart”, RFC 5063, The IETF Trust, http://tools.ietf.org/html/rfc5063 (Sep. 2007), 24 pages. |
Wu, et al., “Recovery from control plane failures in the CR-LDP and O-UNI signalling protocols”, Design of Reliable Communication Networks, 2003. (DRCN 2003). Proceedings. Fourth International Workshop on, IEEE, (Oct. 19-22, 2003), pp. 139-146. |
Wu, et al., “Recovery from control plane failures in the LDP signalling protocol”, Optical Switching and Networking, ELSEVIER, (2005), pp. 148-162. |
International Preliminary Report on Patentability, Application No. PCT/EP2012/067981, dated Mar. 26, 2015, 9 pages. |
Number | Date | Country | |
---|---|---|---|
20150244614 A1 | Aug 2015 | US |