The present invention relates generally to communication systems and, in particular, to facilitating high availability in secure network transport.
High availability in secure systems is often achieved via redundancy. For transport networks, such as wireless backhaul networks, the existing solutions are not scalable and do not recover automatically when a failure occurs while maintaining security. Switchover times are long enough that existing services (e.g., VoIP calls) are terminated, with a visible impact on performance. In today's fast switching networks, a system architecture incorporating new techniques is needed to maintain security, reliability and load balance so that transport resources can be recovered quickly to prevent service interruption.
Specific embodiments of the present invention are disclosed below with reference to
Simplicity and clarity in both illustration and description are sought to effectively enable a person of skill in the art to make, use, and best practice the present invention in view of what is already known in the art. One of skill in the art will appreciate that various modifications and changes may be made to the specific embodiments described below without departing from the spirit and scope of the present invention. Thus, the specification and drawings are to be regarded as illustrative and exemplary rather than restrictive or all-encompassing, and all such modifications to the specific embodiments described below are intended to be included within the scope of the present invention.
The advent of wireless high speed packet data has caused the Radio Access Network (RAN) in wireless networks to evolve from a circuit switched to a packet switched “all IP” network, in an effort to meet high capacity demand efficiently and to interface and operate with other packet data networks. As these IP networks are deployed, wireless operators demand that the transport services be reliable. Furthermore, the transport network elements are required to operate at high availability in a secure environment while maintaining high data throughput capacity.
While the performance of traditional transport networks is determined by bandwidth limitations and by reliability requirements (networks use some form of sparing scheme to meet the reliability objectives), it is possible to operate the transport gear so that it is limited only by the hardware capacity. This may be done by performing load balancing and fault management at the same time, so that the hardware is utilized more efficiently.
In addition, “all IP” networks, telecommunication equipment and computers use open interfaces and protocols for communication based on the TCP/IP protocol suite, which makes them vulnerable to internal and external attacks. These network assets need to be protected against these threats, as required by the service operators.
One way to protect the network equipment and traffic in transit is to protect the layer 3 (L3) traffic by using IPsec tunnels. IPsec tunnels protect the network interfaces and the L3 traffic and above layers by supporting host authentication, traffic confidentiality, integrity protection, anti-replay and non-repudiation on a per IP packet basis. Although IPsec is an effective security solution reaching many security dimensions, the failure of an IPsec tunnel creates a reliability condition that must be addressed. This is particularly important in large networks with many hosts, where the likelihood of failures and security attacks is higher. Providing reliable L3 transport with high availability while preserving the security policies during failures requires resource diversity, usually implemented via some form of redundancy. In order to provide an automatic recovery service that overcomes IPsec failures, is self-healing and requires no manual intervention, several components are proposed: a detection mechanism to detect tunnel failure; a trigger mechanism driven by the detection system to initiate recovery procedures; a fault management recovery procedure to switch over the traffic while preserving security; and a mechanism of detection and activation to switch back to the original network configuration, after detecting that the failed equipment has been repaired, to re-establish load balance, all while preserving security.
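As a sketch of how the detection, trigger and switch-back components could fit together, the following Python assumes that a keepalive probe (for example, something in the spirit of IKEv2 liveness checks) is sent periodically over each tunnel. The class `TunnelMonitor`, its thresholds and the `on_change` callback are hypothetical names chosen for illustration, not part of the described system. Under this assumption, failure detection latency is roughly `miss_threshold` times the keepalive interval, which is the quantity that would have to be kept small to avoid dropping VoIP calls.

```python
from typing import Callable


class TunnelMonitor:
    """Hypothetical keepalive-based liveness monitor for one IPsec tunnel.

    The tunnel is declared DOWN after `miss_threshold` consecutive missed
    keepalives (failure detection) and UP again after `recover_threshold`
    consecutive replies (repair detection). Each state change calls
    `on_change`, which is where a switchover or switch-back would be triggered.
    """

    def __init__(self, name: str,
                 on_change: Callable[[str, bool], None],
                 miss_threshold: int = 3,
                 recover_threshold: int = 3) -> None:
        self.name = name
        self.on_change = on_change
        self.miss_threshold = miss_threshold
        self.recover_threshold = recover_threshold
        self.up = True
        self._misses = 0
        self._replies = 0

    def observe(self, reply_received: bool) -> None:
        """Record the outcome of one keepalive probe."""
        if reply_received:
            self._misses = 0
            self._replies += 1
            if not self.up and self._replies >= self.recover_threshold:
                self.up = True
                self.on_change(self.name, True)   # trigger switch-back
        else:
            self._replies = 0
            self._misses += 1
            if self.up and self._misses >= self.miss_threshold:
                self.up = False
                self.on_change(self.name, False)  # trigger switchover


# Usage: three misses declare the tunnel down; three replies bring it back.
events = []
monitor = TunnelMonitor("eNB1-SGW1", lambda n, up: events.append((n, up)))
for reply in [True, False, False, False, False, True, True, True]:
    monitor.observe(reply)
print(events)  # [('eNB1-SGW1', False), ('eNB1-SGW1', True)]
```

Requiring several consecutive replies before declaring the tunnel up again is one possible way to avoid flapping between tunnels while a repair is still in progress; the thresholds shown are placeholders.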
Such actions should be performed quickly in order to maintain high levels of quality of service. For instance, a reliability requirement driven by some service providers is to implement security in the backhaul network without significant impact on the overall end-to-end availability. This means IPsec failure detection, switchover and recovery should be done very quickly to prevent VoIP call drops and other service discontinuities.
Thus, in view of the desires of system operators, a system should provide a transport solution that supports load balance, high availability and security with high performance. The main components of such a solution are: a fault management mechanism for high availability, load balancing of backhaul traffic, and secure communication.
The present invention can be more fully understood with reference to
In general, network nodes and security gateways are known to comprise components such as processing units and network interfaces. In addition and again generally speaking, processing units and network interfaces are well-known components themselves. For example, processing units are known to comprise basic components such as, but neither limited to nor necessarily requiring, microprocessors, microcontrollers, memory devices, application-specific integrated circuits (ASICs), and/or logic circuitry. Such components are typically adapted to implement algorithms and/or protocols that have been expressed using high-level design languages or descriptions, expressed using computer instructions, expressed using signaling flow diagrams, and/or expressed using logic flow diagrams.
Thus, given a high-level description, an algorithm, a logic flow, a messaging/signaling flow, and/or a protocol specification, those skilled in the art are aware of the many design and development techniques available to implement a processing unit that performs the given logic. Therefore, network nodes and security gateways represent known devices that have been adapted, in accordance with the description herein, to implement multiple embodiments of the present invention. Furthermore, those skilled in the art will recognize that aspects of the present invention may be implemented in and across various physical components and none are necessarily limited to single platform implementations. For example, processing units and/or network interfaces, in either network nodes or security gateways, may be implemented in or across one or more network components, such as one or more network platforms/servers. Also, although the network nodes in the figures are depicted as eNBs, thereby providing a concrete example to the reader, network nodes can be more generally characterized as IP hosts implemented in or across one or more network components, such as one or more network platforms/servers.
Diagram 100 shows an example network topology to illustrate some basic principles that further some desired architecture goals. High availability is achieved by using redundancy. The simplest level of redundancy is a 1+1 system where functions are supported in two identically prepared mate systems. Security GW 1 and Security GW 2 are two mates of a single system called the Security Gateway. The system is dimensioned to support its designed processing capacity either with both mates or with a single mate, in case the other mate is down. During normal operation, many IP hosts (eNB1, eNB2, eNB3, eNB4) are connected to the Security Gateway. For large networks, the Security GW can terminate many hundreds of eNBs, and for powerful Security GWs, a single Security Gateway can terminate many thousands of eNBs. In this approach, high availability is achieved via the redundant 1+1 system, and load balance is achieved via the configuration deployment. This means that during normal operation (i.e., when both Security GW mates are up and running), half of the eNB IP hosts are connected to Security GW1, while the other half of the eNBs are connected to Security GW2, as shown in diagram 100. The specific interfaces between the eNBs and the Security GWs are provisioned during initialization of each eNB, and do not need to be modified during operation.
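The provisioning-time load balance can be sketched as follows, assuming (purely for illustration) that eNBs are alternated between the two mates in the order they are provisioned. Diagram 100 does not require any particular split, and `assign_preferred` is a hypothetical helper, not part of the described system.

```python
from collections import Counter
from typing import Dict, List, Tuple


def assign_preferred(enbs: List[str],
                     mates: Tuple[str, str] = ("SGW1", "SGW2")) -> Dict[str, str]:
    """Alternate eNBs between the Security GW mates (hypothetical policy)."""
    return {enb: mates[i % len(mates)] for i, enb in enumerate(enbs)}


assignment = assign_preferred(["eNB1", "eNB2", "eNB3", "eNB4"])
print(assignment)
# {'eNB1': 'SGW1', 'eNB2': 'SGW2', 'eNB3': 'SGW1', 'eNB4': 'SGW2'}
print(Counter(assignment.values()))
# Counter({'SGW1': 2, 'SGW2': 2})
```

Because the assignment is fixed at provisioning, each mate carries about half the eNBs in normal operation, and the single-mate dimensioning described above covers the case in which one mate must carry all of them.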
Communication security is provided via IPsec. Each IPsec tunnel terminates at an eNB and at a Security Gateway. To be secure and reliable with respect to the Security Gateways, each eNB terminates two IPsec tunnels: one tunnel connected to Security GW1 and one tunnel connected to Security GW2. Since the eNB hardware is not duplicated in this example, the eNB represents a single point of failure. However, because traffic from many eNBs is concentrated at the Security Gateway, it is far more important for the Security Gateway to be reliable than for any single eNB, and Security Gateway redundancy is far cheaper to implement than eNB high availability.
During normal operation, traffic is load balanced with a granularity of a single eNB and transport is secure. When a failure occurs, resources must be switched to address the failure and re-establish service. Each traffic direction (downlink and uplink) must be treated separately, because redundancy is asymmetric: in this example the Security Gateway is duplicated but the eNB is not. For a system in which both the eNB and the Security Gateways are duplicated, the same ideas described in this approach can be applied symmetrically to downlink and uplink traffic.
Specifically, in a scenario where the eNB and the Security Gateway are both redundant, the ideas proposed herein can be extended to each eNB mate, where each eNB mate is connected with each Security Gateway mate for a total of four independent connections. In this configuration, each eNB1 mate behaves in the same manner as eNB1 in the single-eNB scenario, but the Security Gateway mates must route the downlink traffic to the preferred IPsec tunnel or, if it is not available, to the alternate IPsec tunnel.
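The fully redundant case can be sketched as follows. The mate names `eNB1a`/`eNB1b` and the choice of `eNB1a` as the preferred eNB mate are hypothetical; the sketch only illustrates that the full mesh yields four tunnels and that each Security Gateway mate falls back from its preferred tunnel to its alternate one.

```python
from itertools import product
from typing import Optional, Set, Tuple

Tunnel = Tuple[str, str]  # (eNB mate, Security GW mate)

ENB_MATES = ("eNB1a", "eNB1b")   # hypothetical names for the two eNB1 mates
SGW_MATES = ("SGW1", "SGW2")

# Full mesh: each eNB mate has a tunnel to each Security GW mate (2 x 2 = 4).
TUNNELS = list(product(ENB_MATES, SGW_MATES))


def downlink_choice(sgw: str, up: Set[Tunnel]) -> Optional[Tunnel]:
    """Tunnel a given SGW mate uses toward eNB1: preferred, else alternate."""
    preferred, alternate = (ENB_MATES[0], sgw), (ENB_MATES[1], sgw)
    if preferred in up:
        return preferred
    if alternate in up:
        return alternate
    return None  # no operational tunnel from this SGW mate


up = set(TUNNELS) - {("eNB1a", "SGW1")}   # one tunnel has failed
print(len(TUNNELS))                     # 4
print(downlink_choice("SGW1", up))      # ('eNB1b', 'SGW1')
print(downlink_choice("SGW2", up))      # ('eNB1a', 'SGW2')
```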
Central to the implementation of load balance and security is the concept of a preferred IPsec tunnel. The preferred IPsec tunnel is the tunnel that the sender chooses for its traffic whenever that tunnel is operational. The preferred tunnel is set on a per eNB basis (but alternatively could be set per interface), and represents the mechanism to load balance the traffic during normal operation. For load balance, half the eNBs have their preferred IPsec tunnels assigned to the top Security Gateway (SGW1), while the other half of the eNBs have their preferred IPsec tunnel assigned to the bottom Security Gateway (SGW2). The preferred IPsec tunnel is provisioned at the eNB and at the Security Gateway interfaces, and both ends are assigned to the same physical IPsec tunnel. This is desirable in order to be able to load balance the downlink and the uplink at the same time. This can also simplify the IPsec policy implementation and troubleshooting, especially during the phase of recovery and re-establishment of the load balance condition.
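Because both ends must agree on the same physical tunnel, a provisioning audit is one way to catch configurations in which an eNB's uplink and downlink would traverse different Security Gateway mates. The following is a minimal sketch assuming each side exposes a table from eNB to preferred-tunnel identifier; the table contents, the tunnel identifiers and `preferred_mismatches` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical provisioning tables: eNB -> identifier of its preferred tunnel,
# as configured on the eNB side and on the Security Gateway side.
enb_side: Dict[str, str] = {"eNB1": "T1-SGW1", "eNB2": "T2-SGW2", "eNB3": "T3-SGW1"}
sgw_side: Dict[str, str] = {"eNB1": "T1-SGW1", "eNB2": "T2-SGW1", "eNB3": "T3-SGW1"}


def preferred_mismatches(enb_cfg: Dict[str, str], sgw_cfg: Dict[str, str]) -> List[str]:
    """List eNBs whose two ends disagree on (or lack) a preferred tunnel."""
    names = sorted(set(enb_cfg) | set(sgw_cfg))
    return [n for n in names if enb_cfg.get(n) != sgw_cfg.get(n)]


print(preferred_mismatches(enb_side, sgw_side))  # ['eNB2']
```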
In this downlink approach, a preferred IPsec tunnel is configured per eNB. Both the eNB and the Security Gateway should be provisioned with this information. At any given time, the Security Gateway monitors the preferred IPsec tunnel, and if the tunnel is running correctly, the Security Gateway sends traffic to the eNB via the preferred IPsec tunnel. If the preferred tunnel fails in the downlink, the Security Gateway routes traffic via the alternate IPsec tunnel. When the preferred IPsec tunnel is operational again, the Security Gateway switches routes back so that it sends downlink traffic to the eNB via the preferred IPsec tunnel. In this way, load balance is re-established once the failure has been repaired. As an illustration, the following steps describe in detail how a network thus configured would handle a downlink failure caused by a tunnel failure:
In general, some, if not all, of the embodiments described herein are effective to automatically detect, repair and recover IPsec tunnels affected by failures of transport gear (L2/L3 switches) as well as of the IPsec gateway components. Load balance is also an integral part of the approach. When a failure is repaired, the architecture in various embodiments will re-establish load balance and high availability automatically at L2 and L3 and preserve security during the switchover and recovery process.
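The downlink behavior described above (use the preferred tunnel while it is operational, fall back to the alternate tunnel on failure, and switch back after repair) can be sketched as a stateless selection rule evaluated by the Security Gateway. `TunnelPair`, `downlink_tunnel` and the tunnel names are hypothetical; a real implementation would take the set of operational tunnels from its failure detection mechanism.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class TunnelPair:
    """Per-eNB tunnel provisioning held by the Security Gateway (hypothetical)."""
    preferred: str
    alternate: str


def downlink_tunnel(pair: TunnelPair, up: Set[str]) -> Optional[str]:
    """Preferred tunnel if operational, else the alternate, else no route."""
    if pair.preferred in up:
        return pair.preferred
    if pair.alternate in up:
        return pair.alternate
    return None


enb1 = TunnelPair(preferred="eNB1-SGW1", alternate="eNB1-SGW2")
timeline: List[Set[str]] = [
    {"eNB1-SGW1", "eNB1-SGW2"},  # normal operation
    {"eNB1-SGW2"},               # preferred tunnel failed
    {"eNB1-SGW1", "eNB1-SGW2"},  # preferred tunnel repaired
]
print([downlink_tunnel(enb1, up) for up in timeline])
# ['eNB1-SGW1', 'eNB1-SGW2', 'eNB1-SGW1']
```

Because the preferred tunnel is always checked first, switch-back and re-establishment of load balance follow from the same rule as the original routing decision; in practice, the point at which a repaired tunnel is considered operational would likely be debounced by the detection mechanism.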
The detailed and, at times, very specific description above is provided to effectively enable a person of skill in the art to make, use, and best practice the present invention in view of what is already known in the art. In the examples, specifics are provided for the purpose of illustrating possible embodiments of the present invention and should not be interpreted as restricting or limiting the scope of the broader inventive concepts.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments of the present invention. However, the benefits, advantages, solutions to problems, and any element(s) that may cause or result in such benefits, advantages, or solutions, or cause such benefits, advantages, or solutions to become more pronounced are not to be construed as a critical, required, or essential feature or element of any or all the claims.
As used herein and in the appended claims, the term “comprises,” “comprising,” or any other variation thereof is intended to refer to a non-exclusive inclusion, such that a process, method, article of manufacture, or apparatus that comprises a list of elements does not include only those elements in the list, but may include other elements not expressly listed or inherent to such process, method, article of manufacture, or apparatus. The terms a or an, as used herein, are defined as one or more than one. The term plurality, as used herein, is defined as two or more than two. The term another, as used herein, is defined as at least a second or more. Unless otherwise indicated herein, the use of relational terms, if any, such as first and second, top and bottom, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The terms including and/or having, as used herein, are defined as comprising (i.e., open language). The term coupled, as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically. Terminology derived from the word “indicating” (e.g., “indicates” and “indication”) is intended to encompass all the various techniques available for communicating or referencing the object/information being indicated. Some, but not all, examples of techniques available for communicating or referencing the object/information being indicated include the conveyance of the object/information being indicated, the conveyance of an identifier of the object/information being indicated, the conveyance of information used to generate the object/information being indicated, the conveyance of some part or portion of the object/information being indicated, the conveyance of some derivation of the object/information being indicated, and the conveyance of some symbol representing the object/information being indicated. The terms program, computer program, and computer instructions, as used herein, are defined as a sequence of instructions designed for execution on a computer system. This sequence of instructions may include, but is not limited to, a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a shared library/dynamic load library, a source code, an object code and/or an assembly code.