The present invention relates to communication network equipment (NE), and more particularly to packet switching and routing devices used in communication networks that provide support for 1+1 and N:1 line-card redundancy in the data path across a switch/router backplane. The focus of the present invention is minimizing data loss in the presence of a line card failure.
A desirable characteristic of a data network is resiliency. A line card is part of a switch/router which is used to receive and process data units from other devices and to forward the data units to other devices. A card in the system may have no external line connections to other network elements (NE) and still connect to other cards within the same system via a switching fabric. The invention presented in this case covers both card types. Ordinarily, when a line card fails, the data units which would otherwise traverse it are lost until a dynamic routing protocol reconfigures the switch/router to forward the data units on the other line cards. This reconfiguration may take several seconds or even minutes.
Alternatively, modern switches/routers provide line card redundancy. A device implementing line card redundancy has primary line cards and protection line cards. A line card is an active line card when it sends and receives data units. When there is no failure, the primary card is ordinarily active, but when the primary card fails, the protection card becomes active.
There are two types of line card redundancy: 1+1 and N:1. 1+1 line card redundancy refers to a configuration where for each protected primary card there is a dedicated protection card. N:1 line card redundancy refers to a configuration where there is a single protection card for N protected primary cards. 1+1 redundancy allows for a primary card to fail over (where “failing over” means that the protection card is sending and receiving the data units destined for the failed primary card). N:1 redundancy allows for only a single card out of N protected cards to fail over, because after the first failure, the protection card will no longer be available as a backup for the remaining N−1 cards.
In previously proposed implementations of 1+1 redundancy and N:1 redundancy, the NE took considerable time to enable the flow of data units through a protection line card when the primary line card it was protecting failed.
The present invention includes systems and methods which facilitate efficient switchover from a primary line card to a protection line card in case of primary line card failure. When a failure of a line card is detected, alarms will be generated and consolidated, and the failed line card is identified by the switch/router. Once the failed line card is identified, the protection card for this primary card will become active. In case the failed card was not active, no action related to redundancy will be taken.
The efficiency is facilitated by maintaining information describing the redundancy pairings. For 1+1 redundancy the invention sends every data unit to both the primary and the protection line cards; for N:1 redundancy, it is first necessary to enable the switchover before the data units are sent to the protection card. In this case, every data unit is sent to either the primary or the protection line card, but not to both.
In certain embodiments of the invention, this information is stored as a redundancy table on every line card. This table is indexed by the IDs of the slots which hold primary line cards, and for every such slot it includes the ID of the slot holding the corresponding protection card, a 1+1 redundancy indicator, and an N:1 redundancy indicator.
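By way of illustration only and without limitation, the redundancy table and the lookup it supports may be sketched in Python as follows. The names `RedundancyEntry`, `RedundancyTable`, and `destination_slots` are hypothetical and chosen for this sketch; the actual table layout and encoding of an embodiment may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RedundancyEntry:
    """One row of the redundancy table (field names are assumptions)."""
    protection_slot: int   # ID of the slot holding the protection card
    one_plus_one: bool     # 1+1 redundancy indicator
    n_to_one: bool         # N:1 redundancy indicator (set once switchover is enabled)


# The table is indexed by the slot ID of each protected primary line card.
RedundancyTable = Dict[int, RedundancyEntry]


def destination_slots(table: RedundancyTable, dest_slot: int) -> List[int]:
    """Return the slot IDs to which a data unit addressed to dest_slot is sent."""
    entry = table.get(dest_slot)
    if entry is None:
        # Slot is not protected: forward to the original destination only.
        return [dest_slot]
    if entry.one_plus_one:
        # 1+1: every data unit goes to both the primary and the protection card.
        return [dest_slot, entry.protection_slot]
    if entry.n_to_one:
        # N:1 after switchover: the data unit goes to the protection card only.
        return [entry.protection_slot]
    # N:1 before switchover: the data unit goes to the primary card only.
    return [dest_slot]
```

In this sketch, a 1+1 entry always yields two destinations, whereas an N:1 entry yields exactly one destination, which changes from the primary slot to the protection slot once the N:1 indicator is set.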
In certain embodiments, the 1+1 redundancy requirement for sending two replicas of the same data unit to two different line cards is met by using multicast functionality. In the preferred embodiment, for the switch/router with mesh switch fabric, the replication occurs at the level of the switch fabric hardware by writing the two replicas on two links of the mesh.
In certain embodiments, it may be preferred for the redundancy to be revertive, that is, to return automatically to the initial state once the failure on the primary card is cured. In other embodiments, it may be preferred for the redundancy scheme to be non-revertive, that is, to remain in the state where the protection card is active even though the failure on the primary card has been cured.
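The difference between the revertive and non-revertive behaviors may be illustrated, without limitation, by the following minimal Python sketch. The class `ProtectionPair` and its method names are hypothetical; the sketch models only which card of a primary/protection pair is active and ignores alarm consolidation and fabric programming.

```python
from dataclasses import dataclass


@dataclass
class ProtectionPair:
    """State of one primary/protection pair (illustrative names only)."""
    revertive: bool
    active: str = "primary"       # which card currently carries traffic
    primary_failed: bool = False

    def on_primary_failure(self) -> None:
        # The protection card becomes active when the primary card fails.
        self.primary_failed = True
        self.active = "protection"

    def on_primary_repaired(self) -> None:
        self.primary_failed = False
        if self.revertive:
            # Revertive: return to the initial state once the failure is cured.
            self.active = "primary"
        # Non-revertive: the protection card simply stays active.
```

Under these assumptions, a revertive pair ends with the primary card active after a failure and repair, while a non-revertive pair ends with the protection card active.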
The present invention may be understood more fully by reference to the following detailed description of the preferred embodiments of the present invention, illustrative examples of specific embodiments of the invention, and the appended figures in which:
An exemplary switch/router comprises a chassis with slots and a switch fabric. The switch fabric has a number of uniquely addressable interfaces, with a single interface corresponding to each slot. In some implementations of the switch/router, multiple slots can share the same fabric thread. In that case, additional systems and methods are required to properly identify the exact slot for which data units on a given thread are destined. This invention allows this capability by properly identifying a slot and the associated fabric thread. In the preferred embodiment, it is assumed, without loss of generality, that there is a one-to-one correspondence between a slot and a fabric thread. When a line card is inserted in a slot, it connects to the switch fabric through one of these uniquely addressable interfaces. One line card is then able to forward data units to another line card by forwarding the data units to the appropriate switch fabric interface. Every physical slot on the chassis corresponds to one addressable switch fabric interface. These addressable interfaces are referred to as slot IDs when there is a one-to-one correspondence between a fabric thread and a slot.
Line cards designed to receive and to send traffic on various media are inserted into the slots and connect to the switch fabric. The line cards may have network ports, which are also uniquely identifiable among the ports of the given line card. When these ports exist, the connection between the network ports and the line card can be made in many ways. In the preferred embodiment, this connection is implemented over a cross connect that connects ports to a line card by software configuration. The ports can be logical media channels (e.g., STS1) or physical ports. Alternatively, the connection between the physical ports and the line card may be implemented in line card hardware. Every port in the switch can be uniquely identified by the ID of the slot into which the line card was inserted and the ID of the port on the line card.
A line card comprises both an ingress component and an egress component. The ingress component comprises an interface to the switch fabric for transmission of the data units to other line cards and one or more input ports on which data units are received from the other network elements (NE), depending on whether the card has network ports or not. The egress component comprises the interface to the switch fabric for receiving the data units from other line cards of the switch, and output ports for transmitting the data unit to the other NEs, depending on whether the card has network ports or not. In the preferred embodiment, both ingress and egress components are part of a single line card. However, in certain embodiments, the ingress and the egress components may be parts of separate physical line cards.
A line card can be schematically illustrated, as shown in
The invention provides two redundancy configurations: 1+1 redundancy and N:1 redundancy. In 1+1 redundancy, a line card to be protected is called a primary line card. Another line card, which must be identical to its primary card in every aspect (such as protocols, port rates, and configurations), is chosen to be its protection card.
1+1 redundancy is explained concretely and without limitation by an example. In
It is apparent that if 1+1 protection of every line card is desired, one additional card will be required for every primary card, in effect doubling the number of line cards required. Since half of the cards in this configuration will be idle at any given moment, the system will always be underutilized. To alleviate this doubling, N:1 redundancy may be used.
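The card-count comparison above may be made concrete by the following short Python sketch. It assumes, purely for illustration, that under N:1 redundancy the primary cards are partitioned into groups of at most N cards, each group sharing one protection card; the function names are hypothetical.

```python
import math


def cards_one_plus_one(primaries: int) -> int:
    """Total line cards when every primary has a dedicated protection card."""
    return 2 * primaries


def cards_n_to_one(primaries: int, n: int) -> int:
    """Total line cards when each group of up to n primaries shares one protection card."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # One protection card per (possibly partial) group of n primaries.
    return primaries + math.ceil(primaries / n)
```

Under these assumptions, eight primary cards require sixteen cards in total with 1+1 redundancy, but only ten cards with 4:1 redundancy.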
N:1 Redundancy is illustrated in FIGS. 3A-B. In
In
Once this cross-connect connection 119 is established, the flow of data units would have the following path: line card 1A (96), line card nA (94), the connection 119 of the programmable cross-connect, transmission link 117 to line card 3B on Switch/Router B, and then across the switch fabric of system B (100) to line card 1B (106). The flow in the opposite direction would traverse the same elements in the reverse direction.
N:1 redundancy is capable of supporting a single line card failure at one time. If a second line card, for example line card 2A (92), on system A (90) were to fail, the data units which ordinarily traverse that line card would be lost until a routing protocol of a higher network layer reconfigured the routing tables (in other NEs) so that data units could bypass this second failed card.
The present invention introduces a method that enables efficient redirection of the data units destined for a failed primary line card to its protection card. The efficiency is achieved by reducing the number of operations and the time required to effectuate the redirection.
A preferred embodiment of the invention is based on a programmed table lookup that returns control information for steering data units from a primary line card to a protection line card that becomes active as a result of a failure. After a data unit is received on an input port, the line card and slot to which the data unit is to be forwarded are determined from information in the data unit header (e.g., IP destination address in case of IP or VPI/VCI in case of ATM) and forwarding state information (e.g., IP forwarding table in case of IP). The data unit is segmented into FDUs, and control information is placed in each FDU that, among other things, identifies the destination slot for the FDU. Each FDU of the same data unit is destined to the same slot. Before the FDU is forwarded across the switch fabric, the redundancy table, shown in
The steps shown in
This embodiment provides the functionality preferred for 1+1 redundancy, namely sending data units to both active and non-active line cards. When the alarms signaling the failure of a protected line card are received by the system, only a minimal time is required to switch the non-active protection line card to be active and vice versa. Thus, the redirection of the traffic takes just a few clock cycles and consequently just a few data units, if any, will be lost due to the failure.
1+1 redundancy requires sending two identical data units to two different line cards simultaneously. This resembles multicast functionality. In certain embodiments, the switch/router comprises a mesh switch fabric. The actual replicating of data units to the correct line cards is preferably done at the hardware level by a single command that indicates to the fabric hardware the slots to which the data unit should be sent. This is known as an “enable write” command, and it enables the writing of the data unit to both mesh interfaces that connect to the destination slots. In this manner, transmitting the data unit to two line cards does not require increased memory bandwidth or additional scheduling cycles of the ingress line card. The replication method is described in greater detail in the Multicast application. It should also be noted that the switchover of data flows from a primary card to a redundant card happens without the need for reprogramming the forwarding information.
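One way to picture a single command that names every destination slot is a bitmask with one bit per mesh link, as in the following Python sketch. The bit-per-slot encoding and the function name `enable_write_mask` are assumptions made for illustration only; the actual format of the “enable write” command is described in the Multicast application and is not reproduced here.

```python
from typing import Iterable


def enable_write_mask(slots: Iterable[int]) -> int:
    """Build a mask with bit i set for each destination slot i (assumed encoding)."""
    mask = 0
    for slot in slots:
        if slot < 0:
            raise ValueError("slot IDs must be non-negative")
        mask |= 1 << slot
    return mask


# A data unit for primary slot 1 with protection slot 5 under 1+1 redundancy
# would, in this sketch, be written once with both link bits enabled.
mask = enable_write_mask([1, 5])
```

Because both destinations are expressed in one mask, the ingress side issues one write regardless of whether one or two line cards receive the data unit, which is consistent with the statement that no additional memory bandwidth or scheduling cycles are required.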
In this embodiment, N:1 redundancy requires that, upon a detected failure of a primary line card, the N:1 redundancy bit in the appropriate row of the redundancy table be set from ‘0’ to ‘1’, in addition to changing the state of the protection line card. Once the N:1 redundancy bit is set, data units will be forwarded to the protection line card as explained.
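The N:1 switchover step may be sketched, without limitation, as follows. The class `NToOneGroup` and its members are hypothetical names; the sketch models only the setting of the N:1 redundancy bit and the fact that a single protection card can back up only one failed primary card at a time.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Row:
    """Redundancy table row for one primary slot (illustrative)."""
    protection_slot: int
    n_to_one: bool = False   # the N:1 redundancy bit, initially '0'


class NToOneGroup:
    def __init__(self, primary_slots: Iterable[int], protection_slot: int) -> None:
        self.table: Dict[int, Row] = {s: Row(protection_slot) for s in primary_slots}
        self.protected_slot: Optional[int] = None  # primary currently backed up, if any

    def on_failure(self, slot: int) -> bool:
        """Handle a failure of a primary card; return True if the protection card took over."""
        if slot not in self.table or self.protected_slot is not None:
            # Unknown slot, or the protection card is already in use.
            return False
        self.table[slot].n_to_one = True   # set the bit from '0' to '1'
        self.protected_slot = slot         # the protection card becomes active for this slot
        return True
```

In this sketch, a second failure within the same group returns `False`, reflecting that the data units of the second failed card are not redirected by the redundancy mechanism.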
In certain embodiments of the invention, the protection groups may be configured to operate in a revertive or a non-revertive mode. In the revertive mode, when the failed primary card is cured, it becomes active again and the protection card becomes inactive. For example in
In the preferred embodiment, the FDUs are stored in one or more virtual output queues (VoQ) before they are transmitted on the fabric, as described in the Scheduler application. When a line card asserts backpressure flow control on a particular VoQ on an ingress line card, dequeueing from that VoQ ceases until backpressure is de-asserted. In the 1+1 case, the active line card and the protection line card can assert backpressure asynchronously to the same VoQ on an ingress line card. In that case, when either or both of these line cards assert backpressure on a VoQ, that VoQ is put in a state wherein the data units are not forwarded to either of those line cards. Both cards have to de-assert backpressure on a VoQ for data units to be sent out from that VoQ.
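The backpressure rule for the 1+1 case may be illustrated by the following minimal Python sketch, in which a VoQ records the set of slots currently asserting backpressure and dequeues only when that set is empty. The class `VirtualOutputQueue` and its methods are hypothetical and do not represent the scheduler described in the Scheduler application.

```python
from collections import deque
from typing import Deque, Optional, Set


class VirtualOutputQueue:
    """A VoQ that stops dequeueing while any destination card asserts backpressure."""

    def __init__(self) -> None:
        self.fdus: Deque[bytes] = deque()
        self.backpressure_from: Set[int] = set()  # slot IDs asserting backpressure

    def enqueue(self, fdu: bytes) -> None:
        self.fdus.append(fdu)

    def assert_backpressure(self, slot: int) -> None:
        self.backpressure_from.add(slot)

    def deassert_backpressure(self, slot: int) -> None:
        self.backpressure_from.discard(slot)

    def dequeue(self) -> Optional[bytes]:
        # Either card asserting backpressure blocks the queue for both cards.
        if self.backpressure_from or not self.fdus:
            return None
        return self.fdus.popleft()
```

Tracking the asserting slots as a set means that the two cards may assert and de-assert in any order, and the queue resumes only after both have de-asserted.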
The invention described and claimed herein is not to be limited in scope by the preferred embodiments herein disclosed, since these embodiments are intended as illustrations of several aspects of the invention. Any equivalent embodiments are intended to be within the scope of this invention. Indeed, various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims.
This application incorporates by reference in their entireties and for all purposes the following patent applications, all of which are owned or subject to a right of assignment to the assignee of the present application and all of which were filed concurrently together with the present application: (1) the application titled “METHODS AND SYSTEMS FOR EFFICIENT MULTICAST ACROSS A MESH BACKPLANE”, by Bitar et al. and identified by attorney docket no. BITAR 7-11-1 (Ser. no. ______) (hereafter, the “Multicast application”); (2) the application titled “VARIABLE PACKET-SIZE BACKPLANES FOR SWITCHING AND ROUTING SYSTEMS”, by Bitar et al. and identified by attorney docket no. BITAR 5-9-3 (Ser. no. ______) (hereafter, the “Variably-sized FDU application”); (3) the application titled “A UNIFIED SCHEDULING AND QUEUEING ARCHITECTURE FOR A MULTISERVICE SWITCH”, by Bitar et al. and identified by attorney docket no. BITAR 4-8-2 (Ser. no. ______) (hereafter, the “Scheduler application”); and (4) the application titled “SYSTEMS AND METHODS FOR SMOOTH AND EFFICIENT ROUND-ROBIN SCHEDULING”, by Bitar et al. and identified by attorney docket no. BITAR 8-4 (Ser. no. ______) (hereafter, the “SEWDRR application”).