The present invention relates to methods of operating a control plane for a communications network, to control domain servers for such control planes, to network management systems for such control planes, and to corresponding computer programs for operating such control planes.
Control planes are known for various types of communications networks. They can be applied to access, metro and core networks and all the intermediate types, pushing for solutions that will soon span end to end. The control plane has well known advantages including, among others, significant OPEX and CAPEX savings, traffic restoration and so on. However, the control plane has two main drawbacks: limited scalability, and imperfect control of the network resources in use, which can lead to concurrent resource utilization.
Different flavors of control plane (i.e. centralized or distributed) can be used to mitigate one of the two drawbacks, but unavoidably make the other one even worse. The centralized approach, for example, allows better control of the resources of the network, improves their utilization and reduces concurrent resource utilization to a minimum (or even to zero in some cases), but it has significant scalability issues. On the other hand, the distributed approach allows fully automated management of very large networks, with the unavoidable problem of poorer control of the resources being used.
One new approach is based on a split architecture (split between control plane and forwarding plane) that allows a better trade-off between the two above mentioned difficulties and thus potentially enables better performance of the network. The architecture is based on three pieces of equipment: the central Network Management System (NMS), one or more Control Domain servers, and a number of Forwarding Elements.
Forwarding Elements are grouped into domains, each of which has a Control Domain server “speaking” the control plane protocols and presenting its domain outside its boundaries as a single Network Element with a number of interfaces equal to the number of Forwarding Elements. What happens inside each domain is only a concern of the Control Domain server. The set up, tear down and restoration of LSPs are done by the Control Domain server via management plane interactions. There is no control plane message flowing between the forwarding elements of each domain, just a dumb forwarding by the border nodes towards the Control Domain server.
Coordination of actions of these three parts of the control plane can be degraded after recovery from a failure.
Embodiments of the invention provide improved methods and apparatus. According to a first aspect of the invention, there is provided a method of operating a control plane having a central network management system, one or more control domain servers coupled to the network management system, and a plurality of forwarding elements, for controlling respective network entities, and grouped into control domains. Each domain has one of the control domain servers coupled to the forwarding elements of that domain to manage control plane signalling for the domain. The network management system has a record of the circuits set up in all of the control domains, and each forwarding element has a record of those of the circuits set up to use its respective network entity. After a failure in a part of the control plane, amongst the control domain server, the forwarding elements, and the network management system, communication is re-established between those parts of the control plane affected by the failure. There is a step of checking automatically for inconsistencies, caused by the failure, between the records held at the different parts of the control plane, by comparing checklists of the records of the circuits stored at the different parts of the control plane. If an inconsistency is found, the records at the part or parts affected by the failure are updated automatically using a copy of a corresponding but more up to date record sent from another of the parts of the control plane.
A benefit of these features is that the updating can enable the reliability of the network to be improved, since inconsistencies are likely to cause further failures. Notably, since the checking can be achieved without forwarding entire copies of the records to or from the NMS, the improved reliability can be achieved more efficiently, as the quantity of control plane communications and the processing resources needed can be reduced. Thus the arrangement is more scalable.
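By way of a purely illustrative sketch, the following Python fragment shows how a checklist comparison of this kind might look. The data model (circuit identities keyed to route information) and the function names are assumptions made only for the example; the method does not prescribe any particular representation.

```python
def build_checklist(records):
    """Summarize a set of circuit records as a sorted list of circuit identities."""
    return sorted(records.keys())

def find_inconsistencies(checklist_a, checklist_b):
    """Return the circuit identities present in one checklist but not the other."""
    only_in_a = set(checklist_a) - set(checklist_b)
    only_in_b = set(checklist_b) - set(checklist_a)
    return only_in_a, only_in_b

# Hypothetical example: the NMS view and one forwarding element's view after a failure.
nms_records = {"lsp-1": {"route": ["FE1", "FE2"]}, "lsp-2": {"route": ["FE1", "FE3"]}}
fe_records = {"lsp-1": {"route": ["FE1", "FE2"]}}

missing_at_fe, missing_at_nms = find_inconsistencies(
    build_checklist(nms_records), build_checklist(fe_records))
print(missing_at_fe)  # {'lsp-2'}: only this record needs to be copied across
```

Only the circuit identities are exchanged to detect the inconsistency; the full record of the differing circuit is transferred only once the difference has been found.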
Any additional features can be added; some such additional features are set out below. The method can have the step of using the control domain server to generate a checklist for any of the forwarding elements of its control domain based on the records held at the respective forwarding element, for use in the step of checking for inconsistencies. By centralizing the generation of the checklist for each domain, this can enable the forwarding element to be simpler and not need to generate the checklist, and thus be more compatible with existing or legacy standards.
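A minimal sketch of this centralized checklist generation is given below. The query_fe_circuits call is a hypothetical stand-in for whatever management plane request the control domain server uses to read a forwarding element's records; the forwarding element itself only has to answer that query.

```python
def query_fe_circuits(fe_id):
    # Stand-in for a management plane request to the forwarding element;
    # canned data is returned here purely for illustration.
    return {"FE1": ["lsp-1", "lsp-2"], "FE2": ["lsp-1"]}.get(fe_id, [])

def cds_generate_checklist(fe_id):
    """The CDS builds the checklist on behalf of the FE, so the FE never
    needs to implement any checklist logic of its own."""
    return sorted(query_fe_circuits(fe_id))

print(cds_generate_checklist("FE1"))  # ['lsp-1', 'lsp-2']
```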
Another such additional feature is the checking step comprising forwarding a checklist of the records between the control domain server and the network management system, and using either the network management system or the control domain server to compare the checklists for the records held at the different parts, without forwarding entire copies of all the records between the network management system and the control domain server. By concentrating the checking at the network management system or the control domain server, this can help make the process more efficient and thus scalable.
Another such additional feature is at least one of the control domain servers being arranged to maintain a record of current checklists for the forwarding elements of its domain. This can help enable quicker checking by avoiding the need to generate the checklist on the fly.
Another such additional feature is, in the case of the failure being at the network management system, using the checklist in the records at the control domain server for the comparing step, comparing it with corresponding information in the records at the network management system. This can enable the check to be made more efficiently, particularly if it avoids the need to transfer more details from the CDS or the FE.
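The following sketch, under the same hypothetical data model as above, shows the control domain server side of this case: because the CDS maintains current checklists, it can answer the recovered network management system directly, without involving the forwarding elements.

```python
class ControlDomainServer:
    def __init__(self, domain_checklists):
        # e.g. {"FE1": ["lsp-1", "lsp-2"], "FE2": ["lsp-1"]}; maintained continuously.
        self.domain_checklists = domain_checklists

    def on_nms_checklist_request(self):
        """Return the cached per-FE checklists for the restarted NMS to compare
        against its own, possibly stale, records."""
        return self.domain_checklists

cds = ControlDomainServer({"FE1": ["lsp-1", "lsp-2"], "FE2": ["lsp-1"]})
print(cds.on_nms_checklist_request())
```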
Another such additional feature is the checking for inconsistencies comprising, for the case of the failure being at the control domain server, generating current checklists at the control domain server by requesting a copy of circuit information from the records at the forwarding elements in the respective domain. This can enable more efficient operation, and thus more scalability, by sending a reduced amount of information sufficient for checking, without sending all the records unless they are needed for updating.
Another such additional feature is the checking step also comprising the step of comparing the current checklist at the control domain server with corresponding information in the records at the network management system. This can help enable the check to be more efficient, as it avoids the need to send all the records if there turns out to be no inconsistency.
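A possible realization of this CDS-restart case is sketched below: the restarted control domain server first rebuilds a per-FE checklist by asking each forwarding element only for its circuit identities, and then compares the result with the corresponding information held at the network management system. The function names and the symmetric-difference comparison are illustrative assumptions only.

```python
def rebuild_domain_checklists(fe_ids, query_fe_circuits):
    """Rebuild a checklist per forwarding element from circuit identities only."""
    return {fe_id: sorted(query_fe_circuits(fe_id)) for fe_id in fe_ids}

def compare_with_nms(domain_checklists, nms_checklists):
    """Return, per FE, the circuits whose presence differs between CDS and NMS."""
    differences = {}
    for fe_id, cds_list in domain_checklists.items():
        delta = set(cds_list) ^ set(nms_checklists.get(fe_id, []))
        if delta:
            differences[fe_id] = delta
    return differences

# Hypothetical usage with canned data.
checklists = rebuild_domain_checklists(
    ["FE1", "FE2"], lambda fe: {"FE1": ["lsp-1", "lsp-2"], "FE2": ["lsp-1"]}[fe])
print(compare_with_nms(checklists, {"FE1": ["lsp-1"], "FE2": ["lsp-1"]}))  # {'FE1': {'lsp-2'}}
```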
Another such additional feature is the updating step comprising the steps of: the control domain server sending a request to selected ones of the forwarding elements to send a copy of the record which needs updating, the control domain server forwarding the copy to the network management system, and the network management system using these copies to update its records in relation to the respective forwarding elements. This can enable the updating to be more efficient, if the CDS selects which records to forward rather than forwarding all the records.
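A sketch of this selective update is shown below; fetch_record_from_fe and push_record_to_nms are hypothetical placeholders for the real management plane interactions, and only the records flagged as inconsistent are ever transferred.

```python
def realign_nms(differences, fetch_record_from_fe, push_record_to_nms):
    """Copy only the inconsistent circuit records from the FEs to the NMS."""
    for fe_id, circuit_ids in differences.items():
        for circuit_id in circuit_ids:
            record = fetch_record_from_fe(fe_id, circuit_id)
            if record is not None:
                push_record_to_nms(fe_id, circuit_id, record)

# Hypothetical usage, continuing the example above.
realign_nms({"FE1": {"lsp-2"}},
            lambda fe, cid: {"route": ["FE1", "FE3"]},
            lambda fe, cid, rec: print("NMS updated", fe, cid, rec))
```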
Another such additional feature is the checking for inconsistencies comprising, for the case of the failure being at one of the forwarding elements, the steps of: the forwarding element sending to the control domain server an indication that the forwarding element needs to rebuild its records, and the control domain server sending this indication on to the network management system. This could enable the FE to do the checking and enable the CDS to control the updating.
Another such additional feature is the updating step comprising the network management system selecting from its records information about any circuits using the respective forwarding element, and sending this to the control domain server; the control domain server sending this information on to the forwarding element; and the forwarding element updating its records. This can enable the CDS to control the updating.
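The FE-restart case might be sketched as follows, again under an assumed data model in which each record held at the NMS carries the route of the circuit; the "rebuild needed" indication and the helper names are illustrative only.

```python
def nms_select_circuits_for_fe(nms_records, fe_id):
    """The NMS picks, from its full database, only the circuits routed through fe_id."""
    return {cid: rec for cid, rec in nms_records.items() if fe_id in rec["route"]}

def handle_fe_rebuild_indication(fe_id, nms_records, send_to_fe):
    # The CDS has relayed the FE's "records lost, please rebuild" indication to the NMS.
    circuits = nms_select_circuits_for_fe(nms_records, fe_id)
    send_to_fe(fe_id, circuits)  # delivered via the CDS in the split architecture
    return circuits

nms_records = {"lsp-1": {"route": ["FE1", "FE2"]}, "lsp-2": {"route": ["FE1", "FE3"]}}
print(handle_fe_rebuild_indication("FE2", nms_records, lambda fe, c: None))  # only lsp-1
```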
Another such additional feature is using the updated records at the parts of the control plane in a restoration procedure for restoring a failed circuit by rerouting onto a new route. In this procedure there is a particular need for consistency of records, as otherwise the new route may not be set up and valuable customer traffic may be lost while any inconsistency is being resolved.
Another aspect of the invention provides a control domain server for a control plane of a communications network, the control plane having a central network management system, one or more control domain servers coupled to the network management system, and a plurality of forwarding elements, for controlling respective network entities, and grouped into control domains, each domain having a respective one of the control domain servers coupled to the forwarding elements of that domain to manage control plane signalling for the domain. The network management system has a record of the circuits set up in all of the control domains, each forwarding element has a record of those of the circuits set up to use its respective network entity, and the control domain server has parts arranged to re-establish communication between those parts of the control plane affected by a failure, and then to check for inconsistencies, caused by the failure, between the records held at the different parts of the control plane. The control domain server also has a part arranged for updating, if an inconsistency is found, the records at the part or parts affected by the failure, by requesting a copy of a corresponding but more up to date record from the respective forwarding element or from the network management system, and forwarding the copy on to the other of these.
A possible additional feature is the control domain server having a part arranged to generate a checklist for any of the forwarding elements of its control domain based on the records held at the respective forwarding element, for use in the step of checking for inconsistencies.
Another such additional feature is the control domain server having a part arranged to maintain a record of current checklists for the forwarding elements of its domain.
Another aspect of the invention provides a network management system for a control plane of a communications network, the control plane also having one or more control domain servers coupled to the network management system, and a plurality of forwarding elements for controlling respective network entities, and grouped into control domains, each domain having a respective one of the control domain servers coupled to the forwarding elements of that domain to manage control plane signalling for the domain. The network management system has a record of the circuits set up in all of the control domains, each forwarding element has a record of those of the circuits set up to use its respective network entity, and the network management system has parts for re-establishing communication between those parts of the control plane affected by a failure, and then checking for inconsistencies, caused by the failure, between the records held at the different parts of the control plane. If an inconsistency is found, the records affected by the failure are updated by requesting a copy of a corresponding but more up to date record from the respective forwarding element.
Another aspect of the invention provides a computer program on a computer readable medium, for operating a control plane of a communications system, the control plane having a central network management system, one or more control domain servers coupled to the network management system, and a plurality of forwarding elements, for controlling respective network entities, and grouped into control domains. Each domain has a respective one of the control domain servers coupled to the forwarding elements of that domain to manage control plane signalling for the domain, the network management system having a record of the circuits set up in all of the control domains, and each forwarding element having a record of those of the circuits set up to use its respective network entity. The computer program has instructions which, when executed by a processor or processors at one or more parts of the control plane amongst the control domain server, the network management system and the forwarding element, cause the processor or processors to re-establish communication between those parts of the control plane affected by a failure, and then to check for inconsistencies, caused by the failure, between the records held at the different parts of the control plane. If an inconsistency is found, the records at the part or parts affected by the failure are updated by requesting a copy of a corresponding but more up to date record from the respective forwarding element or from the network management system, and forwarding the copy on to the other of these.
Any of the additional features can be combined together and combined with any of the aspects. Other effects and consequences will be apparent to those skilled in the art, especially compared to other prior art. Numerous variations and modifications can be made without departing from the claims of the present invention. Therefore, it should be clearly understood that the form of the present invention is illustrative only and is not intended to limit the scope of the present invention.
How the present invention may be put into effect will now be described by way of example with reference to the appended drawings.
The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto but only by the claims. The drawings described are only schematic and are non-limiting. In the drawings, the size of some of the elements may be exaggerated and not drawn to scale for illustrative purposes.
ROADM Reconfigurable optical add drop multiplexer/demultiplexer
CAPEX Capital expenditure
OPEX Operating expenditure
Where the term “comprising” is used in the present description and claims, it does not exclude other elements or steps and should not be interpreted as being restricted to the means listed thereafter. Where an indefinite or definite article is used when referring to a singular noun e.g. “a” or “an”, “the”, this includes a plural of that noun unless something else is specifically stated. Elements or parts of the described nodes or networks may comprise logic encoded in media for performing any kind of information processing. Logic may comprise software encoded in a disk or other computer-readable medium and/or instructions encoded in an application specific integrated circuit (ASIC), field programmable gate array (FPGA), or other processor or hardware.
References to nodes can encompass any kind of switching node, not limited to the types described, not limited to any level of integration, or size or bandwidth or bit rate and so on.
References to switches can encompass switches or switch matrices or cross connects of any type, whether or not the switch is capable of processing or dividing or combining the data being switched.
References to software can encompass any type of programs in any language executable directly or indirectly on processing hardware.
References to processors, hardware, processing hardware or circuitry can encompass any kind of logic or analog circuitry, integrated to any degree, and not limited to general purpose processors, digital signal processors, ASICs, FPGAs, discrete components or logic and so on. References to a processor are intended to encompass implementations using multiple processors which may be integrated together, or co-located in the same node or distributed at different locations for example.
References to optical paths can refer to spatially separate paths or to different wavelength paths multiplexed together with no spatial separation for example.
References to circuits can encompass any kind of circuit, connection or communications service between nodes of the network, and so on.
References to a checklist can encompass any kind of summarised version of the complete information describing the circuits set up, such as a list of the identities of the circuits, a compressed version of the complete information, information which indirectly references the circuit identities and so on.
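To illustrate the range of forms a checklist can take, the short fragment below contrasts a plain list of circuit identities with a compressed digest of that list; the specific identities and the choice of digest are assumptions made purely for the example.

```python
import hashlib
import json

circuit_ids = ["lsp-1", "lsp-2", "lsp-7"]

plain_checklist = sorted(circuit_ids)
digest_checklist = hashlib.sha256(json.dumps(plain_checklist).encode()).hexdigest()

# A digest lets two parties detect that their records differ by exchanging a few
# bytes; the plain list additionally tells them which circuits differ.
print(plain_checklist, digest_checklist[:16])
```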
References to Control Domain Server are intended to encompass any centralized entity able to manage control plane signaling and routing for a set of Forwarding Elements.
References to Forwarding Elements are intended to encompass entities within a network node which are able to manage data plane and control plane signaling.
By way of introduction to the embodiments, how they address some issues with conventional designs will be explained.
The split architecture high-level concept is illustrated in the appended drawings.
The routers adjacent to any of the domains act as though they are connected to a single LSR node. In fact the clouds representing the domains are seen externally as a single IP node, independently of the actual implementation within the domain itself. Such a description can also be referred to as “aggregation network as a platform”. A good way to understand this concept is the analogy of the split architecture domain to a multi-chassis system, where multiple line cards are grouped and controlled by a Routing Engine centrally located in a dedicated platform (the Control Domain server is analogous to the central Routing Engine, and the Forwarding Elements correspond to the different chassis).
This type of split architecture can provide the same type of services and features as today's IP centric MPLS networks. With a shared rather than an independent Control Plane, as new services are added, the processing requirements increase and the likelihood of one service affecting another also increases; therefore, the overall stability of the system may be affected. For instance, with an IP VPN service, a virtual private LAN service (VPLS), and an Internet service running on the same router, the routing tables, service tunnels and other logical circuits are scaled together, and at some point will combine to exceed limits that each would not reach alone. When introducing new services in this environment, there is the need to perform compound scaling and regression testing, in both the lab and in the field, and to determine how these services are affecting each other. However, by removing the tight coupling between the control and forwarding planes of a router, each can be allowed new degrees of freedom to scale and innovate. Customers, sessions and services can scale on one hand, or traffic can scale on the other hand, and neither control nor traffic scaling will be dependent on the other. Virtualization advantages can also arise: there is no need to deploy different NEs for different services. True convergence can arise because of network segmentation and isolation between geographically and/or operationally separate domains, or because of network simplification and isolation. Also, considering a Transport operational model, from an IP and control plane perspective, the problem of managing 100s to 1000s of Access/Aggregation nodes can be simplified to managing a small set (1s to 10s) of nodes. From an NE perspective, the 100s to 1000s of Access/Aggregation nodes can be managed similarly to the way Transport nodes are operated: failures are treated as HW or physical connectivity problems which can be fixed by changing physical parameters, e.g. replacing HW modules or chassis, rebooting the system, or replacing a fiber or pluggable optical module. HW issues are reported to the server as alarms which can give operational staff information about how to troubleshoot problems.
Speaking of traditional control plane features:
Signaling, which today is done by RSVP-TE, LDP or T-LDP (in the case of a distributed approach), can be handled as follows in the split architecture:
Regarding the Routing functionality, the same reasoning as above for the signaling is also valid.
Regarding path computation, this can be performed in a distributed manner, by the head NE (ingress node), or centrally by the NMS. The split architecture can offer added value in terms of multi-technology (for example packet and optical) coordination, or for easily implementing specific algorithms not included in existing solutions.
Regarding dynamic recovery (restoration), this is usually a distributed functionality performed by the control plane. Centralizing this would mean having a real time interaction between the network and the central element (the split architecture module in this case). This behavior is not available today in the current packet IPT-NMS.
Split architecture is not intended to replace the NMS in its already available centralized functionality. In fact the NMS is typically designed for planning and provisioning activity that defines long-term characteristics of the network, which typically have a lifespan of months to years. The parts of the control plane having the split architecture are meant for “real-time” automation, for handling live operational events that change frequently. Moreover, the split architecture module of the NMS can be co-located with the rest of the NMS or, in many cases, not co-located with it, if it is preferred to allow more flexibility and more independence from a single supplier.
Referencing the seamless MPLS (draft-ietf-mpls-seamless-mpls) model, split architecture would enable such a deployment in the case of Aggregation and Access NEs not supporting the protocols required (IBGP, ISIS, LDP). The split architecture in this case would take the role of a control plane agent to handle ISIS, IBGP and LDP signaling on behalf of the AGN and AN nodes.
Generalized Multiprotocol Label Switching (GMPLS) provides a control plane framework to manage arbitrary connection oriented packet or circuit switched network technologies. Two major protocol functions of GMPLS are Traffic Engineering (TE) information synchronization and connection management. The first function synchronizes the TE information databases of the nodes in the network and is implemented with either Open Shortest Path First Traffic Engineering (OSPF-TE) or Intermediate System to Intermediate System Traffic Engineering (ISIS-TE). The second function, managing the connection, is implemented by Resource ReSerVation Protocol Traffic Engineering (RSVP-TE).
The Resource ReSerVation Protocol (RSVP) is described in IETF RFC 2205, and its extension to support Traffic Engineering driven provisioning of tunnels (RSVP-TE) is described in IETF RFC 3209. Relying on the TE information, GMPLS supports hop-by-hop, ingress and centralized path computation schemes. In hop-by-hop path calculation, each node determines only the next hop, according to its best knowledge. In the case of the ingress path calculation scheme, the ingress node, that is, the node that requests the connection, specifies the route as well. In a centralized path computation scheme, a function of the node requesting a connection, referred to as a Path Computation Client (PCC), asks a Path Computation Element (PCE) to perform the path calculations, as described in IETF RFC 4655: “A Path Computation Element (PCE)-Based Architecture”. In this scheme, the communication between the Path Computation Client and the Path Computation Element can be in accordance with the Path Computation Element Communication Protocol (PCEP), described in IETF RFC 5440.
A problem related to these types of split architecture is the possibility of misalignment between the entities, which are located apart from each other: between the Control Domain server and its Forwarding Elements, and between the NMS, which is always present, and the other entities.
In order to have the whole architecture working correctly, the three entities should always be aligned, in the sense of having data records which are consistent, particularly the records of circuits which are currently set up. Misalignment can be caused either by problems on the DCN or by a temporary outage of one of the entities. When the connection is restored, the databases may have out-of-date information and thus be inconsistent with each other.
To address the misalignment issue, there is a realignment procedure, having the steps described below.
Three sub-procedures are defined in the following, respectively covering the failure and recovery of the NMS, of the Control Domain server, and of a Forwarding Element.
In order to minimize the creation of orphans, the NMS also sends to the restarting FE the information related to the LSPs that it was not able to delete due to the restart of the FE. After the realignment, those LSPs are deleted with the usual individual Delete messages.
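A hedged sketch of this orphan-avoidance step is given below; pending_deletes stands for the LSPs the NMS was unable to delete while the FE was restarting, and delete_lsp for the usual Delete message. The names and data shapes are assumptions for illustration only.

```python
def realign_restarting_fe(circuits_from_nms, pending_deletes, delete_lsp):
    """Install the NMS view on the restarting FE, then remove the LSPs that
    should have been deleted during the outage using ordinary Delete messages."""
    fe_records = dict(circuits_from_nms)  # realignment of the FE's records
    for lsp_id in pending_deletes:
        if lsp_id in fe_records:
            delete_lsp(lsp_id)            # usual per-LSP Delete message
            del fe_records[lsp_id]
    return fe_records

# Hypothetical usage: lsp-2 could not be deleted while the FE was down.
print(realign_restarting_fe({"lsp-1": {}, "lsp-2": {}}, ["lsp-2"],
                            lambda lsp: print("Delete", lsp)))
```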
There are two timers, namely T1 and T2:
Nota Bene: each time the CDS receives an ImAlive message, it restarts the re-alignment procedure. That happens even if the first Information message has already been sent.
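The restart-on-ImAlive behaviour might be captured as in the sketch below; the class and method names are assumptions, and the exact roles of timers T1 and T2 are not modelled here.

```python
class CdsRealigner:
    """Minimal model of the CDS reaction to ImAlive messages."""

    def __init__(self, start_realignment):
        self._start_realignment = start_realignment

    def on_im_alive(self, fe_id):
        # Every ImAlive restarts the re-alignment procedure from the beginning,
        # even if the first Information message of a previous run was already sent.
        self._start_realignment(fe_id)

CdsRealigner(lambda fe_id: print("re-alignment restarted for", fe_id)).on_im_alive("FE1")
```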
Other variations can be envisaged within the scope of the claims.