In modern data communications, data for an end-to-end network connection may be carried through two or more separate provider networks, each being owned and/or operated by a separate communications carrier. For example, a communications subscriber's data may travel through separate provider networks operated by AT&T and MCI. Such separate provider networks, or sub-networks as referred to herein, are capable of inter-operating in certain ways thanks to widely observed standards. However, each provider network is managed substantially independently, and information about the internal structure of each provider network is generally guarded with great care for reasons of business security and confidentiality. The sub-network operators expose the minimum information about network internals that is necessary to comply with pertinent interface standards. Networks including such separate sub-networks are herein termed “heterogeneous networks”. The sub-networks themselves may be referred to as “opaque” networks due to the very limited information about the network internals that is available or “visible” outside the sub-networks.
There is a current trend toward increased use of so-called Multi-Protocol Label Switching (MPLS) techniques in wide-area networks such as the heterogeneous networks described above. MPLS enables the establishment of connections known as “label-switched paths” (LSPs) on which data may be exchanged by endpoint devices, such as switches and routers within endpoint networks for which the heterogeneous network provides wide-area communications. MPLS can be used in conjunction with signaling protocols such as Resource Reservation Protocol with Traffic Engineering extensions (RSVP-TE), which provides mechanisms for establishing, maintaining, and tearing down LSPs (which are also referred to herein as TE LSPs for “Traffic Engineering Label Switched Paths”). A recent variant of MPLS is known as Generalized MPLS or GMPLS.
One facet of operation of networks employing MPLS Traffic Engineering and other protocols is the identification and signaling of network failures and support for recovery from such failures. Using RSVP-TE, for example, it is possible for a network device that detects a failure to generate a Path Error message that identifies the location of the failure, and to send the Path Error message to a path-originating device that is responsible for maintaining the path. The path-originating device can respond to the Path Error message by attempting to set up a new communications path to replace the original path, which will in many cases have been torn down automatically upon occurrence of the failure. In a Path Setup message for the new path, the path-originating device may specify that the new path should avoid the location of the failure. Other devices within the network perform path calculations for the new path while observing this constraint as signaled by the path-originating device. Note that in this case the entire path is not computed by the path-originating device (also referred to as the “head-end” LSR (Label Switched Router)) but by several network devices along the path.
In heterogeneous networks, the interfaces between sub-networks of different providers, and the interfaces between sub-networks and endpoint (or user) networks, are well-specified in order to promote interoperability. The interface between sub-networks is referred to as a network-to-network interface (NNI), and the interface between a sub-network and an endpoint network is referred to as a user-to-network interface (UNI). One common set of NNI and UNI specifications in use is that promulgated by the Optical Internetworking Forum (OIF). In these specifications, there is a provision for identifying the location of a network failure by using certain messages and appropriate data fields thereof. An error message sent from one sub-network to another may include no information in the pertinent fields, or may include an identification of only the edge-most device in the sub-network, in order to minimize the exposure of internal sub-network information as discussed above. More specific information about the location of a failure stays within the sub-network in which the failure occurred. From the perspective of external devices, such as those of a customer or another network provider on the other side of a UNI or NNI, there is little or no information about the specific location of the failure.
One shortcoming of the above-described operation of heterogeneous networks is the inability of devices outside a sub-network containing a failure to make the best routing decisions for new communications paths to be established in response to the failure, due to the lack of detailed information about the location of the failure. Based on the limited information it has, an external device such as a path-originator may be forced to use a relatively expensive alternative path, or may erroneously conclude that no alternative path exists, when in fact the sub-network with the failure may itself be capable of providing part of a more attractive alternative path. Additionally, because of the lack of detailed failure information, error logging at the external device is incomplete, and the diagnosis of problems in the heterogeneous network can be adversely affected. In some networks, third-party network managers are utilized to provide network management services across multiple provider networks of a heterogeneous network, and the operation of these networks can also be adversely affected by the lack of detailed failure information outside of each provider network.
In accordance with the present invention, methods and apparatus are disclosed for error recovery in heterogeneous networks in which it is desired to maintain the confidentiality of information about the internal structure and/or operation of sub-networks. In particular, methods and apparatus are disclosed for establishing a new communications path between two end devices in a computer network in response to failure of a previously established communications path between the two end devices, when the failure occurs in a sub-network at a failure location whose identity is not to be revealed to an external device outside the sub-network that is responsible for maintaining communications between the two end devices.
In response to the failure, the external device is signaled that the failure has occurred by a device within the sub-network having the failure, such as by a router adjacent to a failed communications link. The signaling includes an encoded identifier of the failure location that enables identification of the failure location within the sub-network while masking the identity of the failure location to the external device. In one embodiment, the failure location information is encrypted into an encrypted sub-object within a signaling message. In another embodiment, the signaling message includes a token that is associated with the failure location information, which remains stored within the sub-network.
In response to the signaling of the failure, the external device issues a path-establishment message for the new communications path. The path-establishment message indicates that the new communications path should exclude the failure location as identified by the encoded identifier, which is included in the path-establishment message. In the case of an encrypted sub-object, the encrypted sub-object is included as a corresponding sub-object of the path-establishment message. When the token approach is utilized, the token is included in the path-establishment message.
A device within the sub-network responds to the path-establishment message by determining whether a path segment for the new communications path can be provided through the sub-network while excluding the failure location as identified by the encoded identifier from the path-establishment message. This may involve decrypting the encrypted sub-object or retrieving stored failure location information based on the token included in the path-establishment message. Based on whether such a path segment can be provided, the device then proceeds accordingly with the path setup operation.
By using the encoded identifier, the external device is able to more precisely specify a failure location to be avoided when a new communications path is being set up, and can therefore avoid the above-discussed problems that arise from the lack of detailed failure location information. However, the detailed information about the failure location is encoded so as to be unavailable outside the sub-network, thus preserving such information in confidence as may be desired by network providers.
The foregoing and other objects, features and advantages of the invention will be apparent from the following description of particular embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.
Each provider network 16 includes a plurality of switches or routers 18 (18-11, 18-12 and 18-21 through 18-24), which may be IP routers or optical or time-division-multiplex (TDM) switches. Such devices are referred to generally herein as “routers”. The routers 18 are responsible for establishing and maintaining communications paths for the transfer of data among endpoint devices such as the edge devices 14 of the endpoint networks 12. In one embodiment, the routers 18 are so-called “label switched routers” (LSRs) that utilize a switching technology referred to as Multi-Protocol Label Switching (MPLS) or Generalized Multi-Protocol Label Switching (GMPLS). In accordance with MPLS techniques, communications paths known as “traffic engineering label-switched paths” (TE LSPs) are established through the provider networks 16 as respective sets of entries in forwarding tables maintained in the routers 18. Each router 18 uses a label contained within each received communications message to retrieve a corresponding entry from its forwarding table that indicates which port of the router the message should be forwarded on. Each entry generally also includes another label that is used in the outgoing message in place of the label received with the incoming message. The functions of consulting the local forwarding table and forwarding the message accordingly are repeated at each router 18 to realize each communications path that has been established.
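By way of illustration only, the label lookup and label swap performed at each router 18 may be sketched as follows. The table contents, port names, label values, and the particular sequence of routers are hypothetical and are not taken from the drawings; the sketch merely shows one forwarding-table entry being consulted per hop.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical forwarding tables, one per router:
# incoming label -> (outgoing port, outgoing label).
TABLES: Dict[str, Dict[int, Tuple[str, int]]] = {
    "18-21": {100: ("to-18-22", 200)},
    "18-22": {200: ("to-18-24", 300)},
    "18-24": {300: ("to-14-2", 400)},
}

# Hypothetical wiring: the router at the far end of each port
# (None means the message leaves the provider network).
NEXT_HOP: Dict[str, Optional[str]] = {
    "to-18-22": "18-22",
    "to-18-24": "18-24",
    "to-14-2": None,
}


def forward(router: Optional[str], label: int) -> List[Tuple[str, int, str, int]]:
    """Follow a label-switched path hop by hop, swapping the label at each router."""
    hops = []
    while router is not None:
        # The incoming label selects the forwarding-table entry.
        out_port, out_label = TABLES[router][label]
        hops.append((router, label, out_port, out_label))
        # The outgoing label replaces the incoming label on the next hop.
        router, label = NEXT_HOP[out_port], out_label
    return hops
```

Calling `forward("18-21", 100)` in this sketch yields one tuple per router traversed, each recording the incoming label, the outgoing port, and the swapped outgoing label.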
The TE LSPs used in the heterogeneous network 10 of
Although in the present description RSVP-TE is utilized as the signaling protocol, it is to be understood that RSVP-TE is provided as an illustrative example only and that in alternative embodiments other communication protocols may be used for the functions described herein, including the signaling of network failures as described below.
It is assumed that routers 18 adjacent to the failure 20 detect the presence of the failure 20 and take actions to notify other devices in the network so that (1) any communications paths that are affected by the failure 20 can be re-routed if possible to restore communications, and (2) new communications paths avoid the failure 20. Many network routing protocols already provide for failure location information to be “flooded” so as to become known by all routing devices in the network for these purposes. Unfortunately, normal flooding mechanisms may be too slow. The requests for new paths may be re-tried several times while the information about the failure 20 is flooded throughout the network. If this mechanism is the only one relied upon for restoring communications, customers' communications may experience unacceptably long downtime.
In addition to the aforementioned flooding of failure location information, a router 18 may initiate the tearing down of a communications path along which a failure such as failure 20 has occurred. In the example of
The limited network view depicted in
In step 24, the occurrence of the failure is signaled to the external device. This signaling might be initiated, for example, by a router adjacent to the failure such as the router 18-22 in
In step 26, the external device that receives the failure signaling (e.g. edge device 14-1) issues a new path-establishment message for a new communications path. The path-establishment message indicates that the new communications path should exclude the failure location, as identified by the encoded identifier which is included with the path-establishment message. Mechanisms for providing such indications in the context of RSVP-TE in particular are described below. This path-establishment message can be sent into the sub-network containing the failure on the possibility that an alternative path through the sub-network can be created.
In step 28, a device within the sub-network containing the failure responds to the path-establishment message by determining whether a path segment through the sub-network can be provided that excludes the failure location as identified by the encoded identifier in the path-establishment message. In one embodiment, this step includes decrypting the encrypted identifier of the failure location, and then performing normal path-computation operations with the constraint of excluding the identified failure location. This operation may be carried out, for example, by an edge router such as the router 18-21. The remainder of the normal path setup process then ensues with respect to the new communications path.
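The constrained path computation of step 28 may be sketched as follows, purely as an illustrative example. The sketch assumes that the failure location has already been recovered from the encoded identifier and is expressed as a link between two routers; the adjacency map is hypothetical, and a breadth-first search stands in for whatever path-computation algorithm a given router actually employs.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set

# Hypothetical topology inside the sub-network (undirected links).
ADJACENCY: Dict[str, List[str]] = {
    "18-21": ["18-22", "18-23"],
    "18-22": ["18-21", "18-24"],
    "18-23": ["18-21", "18-24"],
    "18-24": ["18-22", "18-23"],
}


def compute_segment(
    adjacency: Dict[str, List[str]],
    src: str,
    dst: str,
    excluded_links: Set[FrozenSet[str]],
) -> Optional[List[str]]:
    """Return a fewest-hop path from src to dst that avoids every excluded link,
    or None if no such path segment exists."""
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for neighbor in adjacency.get(node, []):
            # Skip links named in the exclusion constraint and nodes already reached.
            if frozenset((node, neighbor)) in excluded_links or neighbor in visited:
                continue
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None
```

A `None` result corresponds to the case in which no path segment excluding the failure location can be provided, so that the device would proceed accordingly with the path setup operation, for example by rejecting the request.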
In an RSVP-TE environment, the messaging of steps 24 and 26 can be carried out using Path Error and Path Setup messages.
As mentioned above, the signaling of the fault location 20 can be accomplished using an RSVP-TE Path Error message. Per the RSVP-TE specification, each Path Error message may include an ErrorSpec object that is used to provide information about the location of the failure. However, in a heterogeneous network such as network 10, the ErrorSpec object will normally be used to hold the address of the edge router of the sub-network where the failure occurred (e.g. router 18-21) as described above. Thus, an extension is proposed for RSVP-TE by which the Path Error message includes a “private error location” sub-object that includes the encoded identifier of the actual failure location within the sub-network.
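The relationship between the ErrorSpec object and the private error location sub-object may be sketched as follows. This is an illustrative model only: the class and field names are hypothetical, the message is not encoded in the RSVP-TE wire format, and the keystream construction built from SHA-256 is a toy stand-in chosen so that the example needs only the standard library. It provides no integrity protection, and a practical embodiment would be expected to use a vetted cipher.

```python
import hashlib
import os
from dataclasses import dataclass


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def encode_location(key: bytes, location: str) -> bytes:
    """Encode a failure location so that it is opaque without the sub-network key."""
    nonce = os.urandom(8)  # fresh nonce per message, sent in the clear
    data = location.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def decode_location(key: bytes, blob: bytes) -> str:
    """Recover the failure location inside the sub-network."""
    nonce, body = blob[:8], blob[8:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


@dataclass
class PathError:
    error_spec: str                 # address of the sub-network edge router only
    private_error_location: bytes   # encoded identifier of the actual failure location


# Hypothetical key shared only among routers of the sub-network.
SUBNET_KEY = b"example-subnetwork-key"
message = PathError("18-21", encode_location(SUBNET_KEY, "link 18-22/18-24"))
```

The external device sees only the edge router address and an opaque byte string, which it can copy unchanged into a later path-establishment message; a router holding `SUBNET_KEY` can call `decode_location` to recover the actual failure location.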
When the originating device (e.g. router 14-1) generates a new Path Setup message, it includes the token 52 in the Private Exclude Object 48. A router 18 within the provider network 16-2 that receives the Path Setup message (e.g. router 18-21) uses the token from the Private Exclude Object 48 to look up the locally stored error information data object 50, and then uses the failure location information in that object in its path computations for the newly requested path.
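The token approach may be sketched as follows, again for illustration only. The class name, method names, and the use of a random hexadecimal string as the token are assumptions; the sketch shows only that the failure location information remains stored within the sub-network while the token alone is carried in the signaling messages.

```python
import secrets
from typing import Dict, Optional


class FailureTokenStore:
    """Hypothetical store kept inside the sub-network: token -> failure location."""

    def __init__(self) -> None:
        self._by_token: Dict[str, str] = {}

    def issue(self, failure_location: str) -> str:
        """Record the failure location and return an unguessable token for it."""
        token = secrets.token_hex(8)  # random value, unrelated to the location itself
        self._by_token[token] = failure_location
        return token

    def resolve(self, token: str) -> Optional[str]:
        """Look up the failure location for a token received in a Path Setup message."""
        return self._by_token.get(token)


store = FailureTokenStore()
# Sent outward in the Path Error message in place of the failure location.
token = store.issue("link 18-22/18-24")
# Later, a router receiving the Path Setup message resolves the token locally.
location = store.resolve(token)
```

An unknown token resolves to `None` in this sketch, leaving the receiving router free to decide how to treat an exclusion it cannot interpret.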
It will be appreciated that the Path Setup message for the new communications path may not be received at the same edge router 18 that emitted the Path Error message from the provider network 16-2. When the above-described encryption method is used, this situation is automatically handled correctly as long as the decryption key has been distributed to all routers 18 that might receive the Path Setup message. When the token approach of
In the above description, the signaling for communications path failures and for new communications paths to be established is described as following generally the same paths as the communications paths themselves, which can be typical in an RSVP-TE environment for example. However, in alternative embodiments the signaling may travel different paths or may travel through an entirely separate network that may be dedicated to signaling. In such alternative embodiments, any signaling entities can be seen as extensions of and included within their respective sub-networks for purposes of the presently disclosed techniques. Thus, a signaling entity that operates on behalf of provider network 16-2, for example, will generally have access to the same information and perform the same signaling functions as described above for router 18-21, i.e., generating either an encrypted Private Error Location sub-object 40 or token 52, decrypting an encrypted Private Exclude Error sub-object 48, etc.
The techniques described herein can be used in a hierarchical or nested manner. For example, provider network 16-2 of
While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.