The present invention relates to methods and systems for layer 3 packet forwarding. More particularly, the present invention relates to methods and systems for hitless restart of layer 3 packet forwarding.
Open Systems Interconnect (OSI) layer 3 forwarding devices, such as Internet Protocol (IP) routers, maintain a routing database and forwarding tables to control the forwarding of layer 3 packets. The routing database may be generated by participating in layer 3 routing protocols to obtain next hop information for received packets. The forwarding tables may be generated based on lookups in the routing table.
Participating in layer 3 routing protocols, such as Open Shortest Path First (OSPF), Border Gateway Protocol (BGP), Intermediate System to Intermediate System (IS-IS), Protocol Independent Multicast (PIM) Dense Mode (PIM-DM), PIM Sparse Mode (PIM-SM), Distance Vector Multicast Routing Protocol (DVMRP), and Core Based Trees (CBT), can consume a high percentage of the processor cycles of a layer 3 forwarding device, such as an IP router. Accordingly, IP routers typically have a management module that participates in these protocols and distributes routes learned from participating in the layer 3 routing protocols to input/output modules that actually forward packets. For reliability purposes, some layer 3 forwarding devices may include a backup management module to take over participation in layer 3 routing protocols in the event of failure of the main management module. The switching of control from the main management module to the backup management module is referred to as failover.
One goal of failover mechanisms is to be hitless with regard to packet forwarding. As used herein, the term “hitless failover” refers to continuing packet forwarding for existing connections when the main management module on a layer 3 forwarding device fails. In some conventional layer 3 forwarding devices, failover mechanisms are not hitless. That is, when the main management module fails, the backup management module must be booted from scratch, and/or it must learn routing table entries by participating in the layer 3 routing protocols. Packets on existing connections will be dropped or routed around the failed device until the new main management module participates in the IP routing protocols to reestablish itself with other nodes in the network. Higher protocol layers running on end nodes are required to retransmit dropped packets. This type of restart can be referred to as “cold restart.”
In light of the problems associated with cold restart, hitless restart mechanisms have been proposed. One hitless restart mechanism is proposed in Moy, J., Hitless OSPF Restart, draft-ietf-ospf-hitless-restart-02.text, February 2002 (hereinafter, “Moy”). According to Moy, an OSPF router attempting a hitless restart originates grace link state advertisements (LSAs) announcing the intention to perform a hitless restart and asking for a grace period. During the grace period, its neighbors continue to announce the restarting router in their LSAs as if it were operating normally, and packets are routed through the restarting router using its forwarding tables, which are preserved during the restart. One problem with the solution proposed in Moy is that it requires an extension to the OSPF protocol, in that routers adjacent to the restarting router must recognize the grace LSAs and give the restarting router a grace period to restart its OSPF protocol function.
Another hitless restart mechanism that has been proposed is described in Sangli et al., Graceful Restart Mechanism for BGP, draft-ietf-idr-restart-05.text (June 2002) (hereinafter, “Sangli”). Sangli proposes a graceful restart mechanism for BGP. BGP, or border gateway protocol, is a routing protocol for routers that are not in the same administrative domain. The graceful restart mechanism proposed in Sangli requires the router requesting to restart its BGP protocol to send a message to its BGP peers indicating its ability to preserve its forwarding state during BGP restart. Like the solution proposed in Moy, the peer routers wait for a predetermined time period before removing the BGP router from their forwarding tables. Also like the solution proposed in Moy, the BGP restart mechanism proposed in Sangli requires that neighboring routers participate in extensions to the BGP protocol.
Another problem with the restart mechanisms proposed in both Sangli and Moy is that they only relate to specific routing protocols. A given router may run multiple protocols, requiring a separate restart mechanism for each protocol. A possible solution to this hitless restart problem is to run all of the routing protocols on the backup management module so that restart can occur seamlessly. However, running all of the routing protocols on the backup management module is processor-intensive and requires synchronization between the databases and protocol state machines of the main and backup management modules. Moreover, neither Moy nor Sangli discusses or presents a solution to the problem of linking hardware and software forwarding table entries after a re-start.
Accordingly, in light of these problems associated with conventional restart mechanisms, there exists a long-felt need for improved methods and systems for hitless restart of layer 3 forwarding.
The present invention includes a method for hitless restart of layer 3 packet forwarding in response to failure of a management service module (MSM). In a layer 3 forwarding device, such as an IP router, one management service module functions as a master and another management service module functions as a slave. The master management service module builds a layer 3 routing table by participating in layer 3 routing protocols. This layer 3 routing table is stored in memory. A hardware layer 3 forwarding table is constructed by performing lookups in the layer 3 routing table and storing the results in hardware. This layer 3 forwarding table is replicated to hardware-specific forwarding tables which may be located on input/output modules and/or either or both management service modules. A software copy of the hardware forwarding table may also be stored on both the master and slave management service modules.
When the master management service module fails, the slave management service module allows the forwarding hardware to continue operation. The slave management service module also starts running layer 3 routing protocols to build another forwarding table. Entries in the forwarding table built using layer 3 routing protocols are linked with entries in the hardware forwarding table using the software copy received from the former master management service module. Any entries that are not linked within a predetermined time period are preferably deleted from both hardware and the software copy of the hardware layer 3 forwarding table.
Because the slave management service module maintains hardware and software copies of the hardware forwarding tables formerly managed by the master management service module, hitless restart can be performed for existing routes, i.e., those routes for which an entry existed in the hardware forwarding table. New routes may be learned in the normal manner by participating in IP routing protocols after restart. The slave management service module need not participate in layer 3 routing protocols prior to restart. As a result, the need for synchronization between the master and slave management service modules is reduced.
Accordingly, it is an object of the invention to provide improved methods and systems for hitless restart of layer 3 packet forwarding.
It is another object of the invention to provide improved methods and systems for hitless restart of layer 3 packet forwarding that reduce the amount of state information required to be replicated between master and slave management service modules.
Some of the objects of the invention having been stated hereinabove, other objects will become evident as the description proceeds when taken in connection with the accompanying drawings as best described hereinbelow.
Preferred embodiments of the invention will now be explained with reference to the accompanying drawings of which:
Methods and systems for hitless restart of layer 3 packet forwarding may be implemented in any suitable layer 3 forwarding device, such as an IP router.
In the illustrated example, layer 3 forwarding device 100 includes a plurality of input/output modules 101-106. Input/output modules 101-106 send and receive layer 3 packets over a network. Input/output modules 101-106 may each be implemented as printed circuit boards plugged into slots in layer 3 forwarding device 100. A switch fabric 107 connects the input/output modules to each other and to master and slave management service modules 108 and 109. Switch fabric 107 may be any suitable type of switching fabric. In one exemplary embodiment, switch fabric 107 includes a plurality of gigabit Ethernet connections, one half of which are managed by master management service module 108, and the other half of which are managed by slave management service module 109.
Master and slave management service modules 108 and 109 each include hardware and software for implementing hitless failover.
Hardware forwarding table 110 stores destination addresses of received packets and corresponding forwarding information. This forwarding table is replicated to input/output modules 101-106 to enable forwarding of packets, as illustrated in
The present invention is not limited to storing software copies of hardware forwarding tables. In an alternate embodiment, software copies 112 may be omitted and entries may be accessed by accessing hardware forwarding tables 110 directly.
According to an important aspect of the invention, master management service module 108 includes a routing table 115 that is preferably not replicated to slave management service module 109. Routing table 115 is preferably constructed by participating in IP routing protocols. Exemplary IP routing protocols in which master MSM 108 may participate include any of the above-referenced IP routing protocols, such as BGP, OSPF, IS-IS, etc. Slave MSM 109 preferably does not participate in IP routing protocols until a restart occurs.
Master and slave management service modules 108 and 109 may communicate with each other over a suitable reliable communications mechanism.
In one example, the reliable communication mechanism may be shared memory.
That is, master MSM 108 may be capable of writing to memory of slave MSM 109, but not vice versa. The reason for implementing one-way shared memory is so that a bug in slave MSM 109 will not affect the operation of master MSM 108.
In one embodiment of the invention, hardware forwarding table 110 contains individual IP addresses and corresponding forwarding information. Table 1 shown below illustrates an example of forwarding table information that may be included in hardware forwarding table 110.
In Table 1, individual IP addresses extracted from received packets may be stored along with corresponding forwarding information. The forwarding information is illustrated in text format as MAC_ADDR for media access control addresses, VLAN_ID for virtual local area network identifiers, and port_ID for I/O port identifiers. It is understood that in the actual implementation of the invention, binary values corresponding to actual MAC and VLAN addresses and output ports would be present in this table. Storing individual IP addresses extracted from received packets reduces the need to implement a longest prefix matching algorithm in hardware. However, the present invention is not limited to storing individual IP addresses in hardware forwarding table 110. In an alternate embodiment of the invention, a longest prefix matching algorithm may be implemented in hardware and the individual entries illustrated in Table 1 may be replaced by address prefixes and subnet masks.
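By way of illustration only, the exact-match lookup described above may be sketched in software as follows. The names `ForwardingInfo`, `hardware_forwarding_table`, and `forward_lookup`, as well as the sample addresses and values, are hypothetical and are not taken from this description; an actual embodiment would hold binary values in hardware rather than strings in a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ForwardingInfo:
    """Forwarding information stored against one destination IP address."""
    mac_addr: str  # MAC_ADDR column (illustrative value)
    vlan_id: int   # VLAN_ID column (illustrative value)
    port_id: int   # port_ID column (illustrative value)


# Hypothetical table keyed by the full destination IP address (host routes).
hardware_forwarding_table: Dict[str, ForwardingInfo] = {
    "192.0.2.10": ForwardingInfo("00:00:5e:00:53:01", 10, 1),
    "192.0.2.20": ForwardingInfo("00:00:5e:00:53:02", 10, 2),
}


def forward_lookup(dest_ip: str) -> Optional[ForwardingInfo]:
    """Exact-match lookup: no prefix matching is needed; a miss returns None."""
    return hardware_forwarding_table.get(dest_ip)
```

Because each key is a complete address, a lookup in this sketch is a single exact match, which is the property that reduces the need for longest prefix matching in hardware.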
Table 2 shown below illustrates an example of entries that may be included in a routing table, such as routing table 115 illustrated in
As illustrated in Table 2, each entry in routing table 115 may include an address prefix and a subnet mask. The subnet mask is applied to destination IP addresses in received packets and the result is compared with the prefix in the prefix portion of the table. The entry having the longest prefix is considered to be a match. The corresponding forwarding information is extracted from routing table 115 and used to route the packet to its intended destination. As discussed above, the entries in routing table 115 may be built by participating in IP routing protocols.
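The mask-and-compare lookup described above may be sketched as follows. This is a minimal illustration under stated assumptions: the table contents, the next-hop labels, and the function name `route_lookup` are hypothetical, and a linear scan is used only for clarity rather than as a suggestion of how routing table 115 is organized.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical routing entries: (prefix, subnet mask, forwarding information).
ROUTING_TABLE: List[Tuple[str, str, str]] = [
    ("10.0.0.0", "255.0.0.0", "next_hop_A"),
    ("10.1.0.0", "255.255.0.0", "next_hop_B"),
    ("0.0.0.0", "0.0.0.0", "default_gateway"),
]


def route_lookup(dest_ip: str) -> Optional[str]:
    """Return the forwarding information of the longest matching prefix."""
    dest = int(ipaddress.IPv4Address(dest_ip))
    best_len, best_info = -1, None
    for prefix, mask, info in ROUTING_TABLE:
        mask_int = int(ipaddress.IPv4Address(mask))
        prefix_int = int(ipaddress.IPv4Address(prefix))
        # Apply the subnet mask to the destination and compare with the prefix.
        if (dest & mask_int) == prefix_int:
            prefix_len = bin(mask_int).count("1")  # number of mask bits set
            if prefix_len > best_len:
                best_len, best_info = prefix_len, info
    return best_info
```

In this sketch, a destination of 10.1.2.3 matches all three entries, and the entry with the 16-bit mask is selected because it has the longest prefix.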
Returning to
In step ST5, it is determined whether master MSM 108 has failed. Master MSM 108 may fail for any number of reasons, including hardware and software exceptions or management action to force activation of slave MSM 109, e.g., to replace master MSM 108 or upgrade software executing on master MSM 108. The failure may be detected by slave MSM 109 by any number of mechanisms, including the absence of heartbeat messages from master MSM 108 or a failure message indicating that a failure has occurred. If there is no master MSM failure, master MSM 108 continues operating as normal and controls the operation of layer 3 forwarding device 100.
If master MSM 108 fails, in step ST6, slave MSM 109 becomes the master. According to an important aspect of the invention, packet forwarding continues for existing routes or network traffic flows because hardware forwarding table 110 was replicated to slave MSM 109 and I/O modules 101-106. Thus, provided that the entries in these forwarding tables are still valid, packet forwarding will continue without error.
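One of the detection mechanisms mentioned above, the absence of heartbeat messages, may be sketched as follows. The class name `HeartbeatMonitor`, the timeout value, and the choice to report no failure before the first heartbeat has been seen are assumptions made for illustration; the description does not specify them.

```python
from typing import Optional


class HeartbeatMonitor:
    """Hypothetical monitor run by the slave MSM to watch the master MSM."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Optional[float] = None

    def record_heartbeat(self, now: float) -> None:
        """Note the time at which a heartbeat from the master was received."""
        self.last_seen = now

    def master_failed(self, now: float) -> bool:
        """Declare failure when no heartbeat has arrived within the timeout."""
        if self.last_seen is None:
            return False  # assumption: no verdict before the first heartbeat
        return (now - self.last_seen) > self.timeout_s
```

Timestamps are passed in explicitly so that the sketch can be exercised without a real clock; a deployed monitor would more likely read a monotonic clock.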
In step ST7, slave MSM 109 begins participation in IP routing protocols to build an IP routing table. Because slave MSM is not required to run these IP routing protocols in advance of failure, the problem of synchronization between master MSM 108 and slave MSM 109 is eliminated.
Once slave MSM 109 becomes the master, slave MSM 109 may immediately (i.e., with enhanced priority) begin sending layer 2 keepalive messages to its neighbors. This reduces the likelihood that the neighbors will declare a topology change and thus send messages around this switch rather than to it.
In step ST8, slave MSM 109 waits for a timer to expire to begin linking entries in its newly constructed routing table with entries in software copy 112 of hardware forwarding table 110. Once this timer expires, in step ST9, slave MSM 109 begins the process of linking entries in its newly constructed routing table with entries in software copy 112 of hardware forwarding table 110. In step ST10, slave MSM 109 searches its newly constructed routing table and determines whether a matching entry exists for an entry in hardware forwarding table 110. If a corresponding entry has not been received via IP routing protocols for the new routing table, a matching entry for the entry in hardware forwarding table 110 will not be found. If a match is not found, in step ST11, the entry is deleted from both software copy 112 of hardware forwarding table 110 and from hardware forwarding table 110. In step ST12, slave MSM 109 determines whether all entries have been checked. If not all entries have been checked, control proceeds to step ST13, where the next entry is located and checked for age out. The process continues until all of the entries in forwarding table 110 have either been validated or deleted.
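The linking and age-out pass of steps ST9 through ST13 may be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the function name `link_or_age_out` is hypothetical, the tables are modeled as dictionaries keyed by destination address, and a "matching entry" is assumed to mean that the destination falls within some prefix of the newly constructed routing table. The timer of step ST8 is assumed to have already expired.

```python
import ipaddress
from typing import Dict, List, Tuple


def link_or_age_out(
    hardware_table: Dict[str, str],
    software_copy: Dict[str, str],
    new_route_prefixes: List[str],
) -> Tuple[List[str], List[str]]:
    """Validate each forwarding entry against the new routing table.

    Entries with a match are kept (linked); entries without a match are
    deleted from both the software copy and the hardware table.
    """
    networks = [ipaddress.ip_network(p) for p in new_route_prefixes]
    linked: List[str] = []
    deleted: List[str] = []
    # Iterate over a snapshot of the keys so entries can be deleted safely.
    for dest in list(software_copy):
        addr = ipaddress.ip_address(dest)
        if any(addr in net for net in networks):
            linked.append(dest)             # match found: entry is validated
        else:
            del software_copy[dest]         # no match: age out of software copy
            hardware_table.pop(dest, None)  # and out of the hardware table
            deleted.append(dest)
    return linked, deleted
```

Walking the software copy rather than the hardware table reflects the role the description gives to software copy 112; where the software copy is omitted, the same loop could walk the hardware table directly.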
Thus, using the steps illustrated in
As stated above, one advantage of the present invention is that layer 3 and higher layer protocol state information does not need to be replicated on slave MSM 109. However, layer 2 state information is preferably replicated on slave MSM 109.
Master MSM 108 also executes a layer 3 protocol 404, such as one or more IP routing protocols, which creates layer 3 state information 406, such as reachability and topology information for IP destinations. This information is preferably not mirrored to slave MSM 109. Omitting the mirroring of layer 3 protocol information avoids the need for synchronizing layer 3 protocols between master and slave MSMs 108 and 109. This results in significant reductions in complexity and processing by forwarding device 100.
An operational configuration file 408, which contains information regarding the number, type, and location of modules within layer 3 forwarding device 100, is preferably stored in memory on master MSM 108. The operational configuration includes a change file 410, which stores an original configuration, and configuration changes 412 implemented since the original configuration. An example of a configuration change is the addition of a new input/output module to forwarding device 100. Operational configuration 408 is preferably mirrored to slave MSM 109 prior to failover to enable slave MSM 109 to start operating under the same configuration previously recognized by master MSM 108. Finally, hardware forwarding table 110 and software copy 112 are preferably mirrored to slave MSM 109 to enable hitless failover. Software copy 112 is then used to link entries in the newly created routing table efficiently with entries stored in hardware.
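The relationship between the original configuration and the configuration changes may be sketched as follows. The function name `effective_config`, the slot-keyed dictionary layout, and the convention that a value of `None` marks removal of a module are all assumptions made for illustration; the description does not define the format of operational configuration 408.

```python
import copy
from typing import Any, Dict, List, Tuple


def effective_config(
    original: Dict[str, Any],
    changes: List[Tuple[str, Any]],
) -> Dict[str, Any]:
    """Apply configuration changes, in order, on top of the original."""
    config = copy.deepcopy(original)  # leave the original untouched
    for key, value in changes:
        if value is None:
            config.pop(key, None)     # assumption: None marks a removal
        else:
            config[key] = value       # addition or modification
    return config
```

Under these assumptions, mirroring both the original configuration and the ordered list of changes would allow the slave MSM to derive the same operating configuration that the master MSM was using at the time of failover.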
Thus, the present invention includes improved methods and systems for hitless failover of layer 3 packet forwarding that avoid the need for mirroring layer 3 state information between master and slave management service modules. The slave MSM maintains a copy of a layer 3 forwarding table received from the master MSM. However, the slave MSM does not maintain the routing table or layer 3 state information maintained by the master management service module. Upon failover, the slave management service module is able to continue forwarding of packets for existing routes and begins construction of a new routing table. Entries in the new routing table are linked with entries in the forwarding table. Because the slave management service module is capable of continuing packet forwarding without implementing layer 3 and higher protocols, the need for maintaining synchronized upper layer protocol information is reduced.
It will be understood that various details of the invention may be changed without departing from the scope of the invention. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation, the invention being defined by the claims.
Number | Date | Country |
---|---|---|
20040001485 A1 | Jan 2004 | US |