1. Field of the Application
Generally, this application relates to communication networks. More specifically, it relates to methods and systems for fragmentation and reassembly for IP tunnels in hardware pipelines.
2. Description of the Related Art
In traditional networking environments, networking devices are connected by physical wires or wireless links. For example, L2 Ethernet networks are constructed using wired links, bridges and switches, and L3 IP networks are constructed by physically connecting multiple L2 Ethernet networks together using routers. To increase the flexibility and reduce installation overhead, tunneling technologies have been introduced to allow multiple nodes or networks to be connected via logical links instead of physical links. This allows network administrators to construct networks that are independent of the underlying physical topology, thus increasing the flexibility of the network topology. For example, network administrators can connect two disjoint networks in two different geographic locations by running an IP tunnel between two sites within the two networks. The two networks are transparent to the internetworking infrastructure between the two networks (e.g., the IP tunnel, etc.).
Tunneling technology is typically implemented by adding an encapsulation outside of the original payload, for example, an IP datagram. The encapsulating header is responsible for transporting the payload from one location to another location. Once the encapsulated payload reaches the destination, the network node decapsulates the packet, extracts the data out of the original payload, and processes the data like a regular, non-tunneled packet.
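The encapsulation and decapsulation steps can be illustrated with a minimal sketch. It assumes an IPv4 outer header without options (20 bytes) and the IP-in-IP protocol number 4; the function names `encapsulate`, `decapsulate` and `checksum` are hypothetical and are not taken from any described embodiment.

```python
import struct

OUTER_HEADER_LEN = 20  # IPv4 header without options
IPPROTO_IPIP = 4       # protocol number used for IP-in-IP encapsulation


def checksum(header: bytes) -> int:
    """One's-complement checksum over a 20-byte IPv4 header."""
    total = sum(struct.unpack("!10H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(inner: bytes, src: bytes, dst: bytes) -> bytes:
    """Prepend an outer IPv4 header that carries the inner packet as payload."""
    total_len = OUTER_HEADER_LEN + len(inner)
    header = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0, total_len,   # version/IHL, TOS, total length
                         0, 0,                 # identification, flags/offset
                         64, IPPROTO_IPIP, 0,  # TTL, protocol, checksum placeholder
                         src, dst)
    # Fill in the checksum field (bytes 10-11 of the header).
    header = header[:10] + struct.pack("!H", checksum(header)) + header[12:]
    return header + inner


def decapsulate(packet: bytes) -> bytes:
    """Strip the outer header and return the original inner packet."""
    header_len = (packet[0] & 0x0F) * 4
    if packet[9] != IPPROTO_IPIP:
        raise ValueError("not an IP-in-IP packet")
    return packet[header_len:]
```

In this sketch the tunnel endpoint recovers exactly the bytes that entered the tunnel, and the encapsulated packet is 20 bytes longer than the original.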
Tunnels are widely used in modern networking infrastructure. For example, the IP security protocol, IPSec, uses tunnels to form a secure connection between two networks or between a host and a network so they can be logically connected. Two disjoint Internet Protocol version 6 (IPv6) networks can be connected by an IPv6-in-IPv4 tunnel even though the internetworking infrastructure between them does not support IPv6 (e.g., it supports only IPv4).
Tunnels, especially IP tunnels, increase flexibility, but also create some problems. The biggest problem is that packet sizes normally increase in the tunnel encapsulation process. For example, an IP-in-IP tunnel increases the packet size by 20 bytes. If the original packet size is the same as the maximum transmission unit (MTU) size of the transmission link, the tunneled packet will exceed the MTU limitation by 20 bytes. To solve this problem, network protocols are typically designed to fragment the outgoing packets to ensure that the total transmission payload does not exceed the MTU size. During fragmentation, each packet is divided into multiple segments before it is sent out, where each segment does not exceed the MTU size. The tunnel termination node will then reassemble all of the received segments back into the original packet before extracting the payload and forwarding the original packet to the destination. This process is typically called IP fragmentation and reassembly.
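The arithmetic can be illustrated with a short sketch. It assumes a 20-byte IPv4 header without options and relies on the IPv4 rule that fragment offsets are expressed in 8-byte units; the `Fragment` type and the `fragment` and `reassemble` functions are hypothetical names used only for illustration.

```python
from typing import Iterable, List, NamedTuple

IP_HEADER_LEN = 20  # assumed IPv4 header size (no options)


class Fragment(NamedTuple):
    offset_units: int     # fragment offset in 8-byte units
    more_fragments: bool  # MF flag
    payload: bytes


def fragment(payload: bytes, mtu: int,
             header_len: int = IP_HEADER_LEN) -> List[Fragment]:
    """Split a datagram payload so that header + payload never exceeds the MTU."""
    # Every fragment except the last must carry a multiple of 8 bytes.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small for the header")
    fragments = []
    for start in range(0, len(payload), max_payload):
        chunk = payload[start:start + max_payload]
        more = start + max_payload < len(payload)
        fragments.append(Fragment(start // 8, more, chunk))
    return fragments


def reassemble(fragments: Iterable[Fragment]) -> bytes:
    """Rebuild the payload from fragments received in any order."""
    ordered = sorted(fragments, key=lambda f: f.offset_units)
    return b"".join(f.payload for f in ordered)
```

For a 1500-byte inner packet carried over a link with a 1500-byte MTU, the outer datagram would be 1520 bytes, so this sketch produces two fragments with payloads of 1480 and 20 bytes.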
Logically, the IP layer is responsible for the typical fragmentation and reassembly process, and should reassemble packets before passing the datagram to upper layer stacks (e.g., generic routing encapsulation (GRE) 150, transmission control protocol (TCP) 160, user datagram protocol (UDP) 170, IP Security Protocol (IPSec) 180, etc.). Likewise, if the packet coming from an upper layer exceeds the MTU size, the IP layer should fragment it before passing it to lower layer interfaces (e.g., physical interface 110, 120, logical interface 130, etc.).
There are generally two typical implementations for packet fragmentation and reassembly used in IP tunneling. Both implementations have at least some negative impact on latency or throughput, or both.
In the first typical implementation, switching processors pass packets to a separate host processor, or CPU, for additional fragmentation and reassembly processing. For example, during the reception process, if a packet fragment is detected by the switching processor, it passes the fragment to the host CPU. The IP stack on the host CPU reassembles the IP fragments before it passes the reassembled packet back to the switching hardware for additional processing. During the transmission process, if the switching hardware detects that the packet size exceeds the MTU size for the outgoing interface, it again passes the packet to the host CPU, which is then responsible for fragmenting the packet before sending the fragments back to an outgoing interface.
A drawback of this method is that all fragments will require slow path host CPU intervention. Host CPU processing is slower than inline, hardware processing. If the percentage of packets requiring fragmentation is relatively high within a given network, the total throughput of the network will slow down significantly. A second drawback of this typical implementation is latency and jitter. Fragmented packets will have much higher forwarding latency (normally on the order of milliseconds) compared with the latency of non-fragmented packets (normally on the order of microseconds). This increased latency can negatively affect latency-sensitive applications, such as, for example, streaming media and voice over IP (VoIP) applications. Another drawback of this implementation is out-of-order packet/fragment delivery. If a non-fragmented packet comes immediately after a fragmented packet, the second, non-fragmented packet will likely be forwarded out first, and the first, fragmented packet (once reassembled) will be forwarded second, via the host CPU. This creates out-of-order packet delivery that can negatively affect TCP application throughput.
The second typical fragmentation and reassembly implementation is to use a separate fragmentation and reassembly co-processor. If a fragmented packet is received from an interface, it is passed to the co-processor where fragments are stored in packet memory. After all the fragments arrive, the co-processor reassembles the segments and passes the reassembled packet on for IP processing and forwarding. For outgoing packets, if fragmentation is required, the packet is stored in temporary storage and the co-processor fragments the entire packet before all fragments are transmitted sequentially out of the interface. A drawback of this approach is complexity and cost. An additional packet store-and-forward stage is added to the packet processing path, which means additional memory for packet storage and additional forwarding latency. This increases cost and reduces network application throughput.
Therefore, what are needed are systems and methods for efficiently implementing IP fragmentation and reassembly for tunneled packets, possibly combined with encryption, decryption and forwarding, that are suitable for hardware pipelines of network switching processors.
A novel flow-through architecture for fragmentation and reassembly of tunnel packets in network devices is presented. The fragmentation and reassembly of tunneled packets are handled in the hardware pipeline to achieve line-rate processing of the traffic flow without the need for additional store and forward operations typically provided by a host processor or a co-processor. In addition, the hardware pipeline may perform fragmentation and reassembly of packets using encrypted tunnels by performing segment-by-segment crypto. A network device implementing fragment reassembly can include an ingress hardware pipeline that reassembles fragmented packets between a media access control (MAC) of the device and an output packet memory of the device, where the incoming fragmented packets can be encrypted and/or tunneled. A network device implementing packet fragmentation can include an egress hardware pipeline that fragments packets between an input packet memory of the device and the MAC, where the outgoing fragments can be encrypted and/or tunneled.
Aspects and features of this application will become apparent to those ordinarily skilled in the art from the following detailed description of certain embodiments in conjunction with the accompanying drawings, wherein:
Embodiments will now be described in detail with reference to the drawings, which are provided as illustrative examples of certain embodiments so as to enable those skilled in the art to practice the embodiments and are not meant to limit the scope of the application. Where aspects of certain embodiments can be partially or fully implemented using known components or steps, only those portions of such known components or steps that are necessary for an understanding of the embodiments will be described, and detailed description of other portions of such known components or steps will be omitted so as not to obscure the embodiments. Further, certain embodiments are intended to encompass presently known and future equivalents to the components referred to herein by way of illustration.
In certain embodiments, a novel flow-through implementation in the hardware pipeline for fragmentation and reassembly of tunnel packets attempts to solve at least some of the problems associated with the typical IP reassembly and fragmentation designs. The fragmentation and reassembly of tunneled packets are handled in the hardware pipeline without the need for any additional store and forward operations. In addition, certain embodiments can work with fragmented packets in encrypted tunnels, where fragments can be decrypted before they are reassembled, and where the fragmentation of a packet can happen before encrypting the fragments. As used herein, the words frame, packet, datagram, segment, message, cell, data, information and the like are not meant to be limiting to any particular network protocol, appliance or layer, but instead are generically meant to indicate any type of information or data unit.
In certain embodiments, IP fragmentation in switching devices can be implemented at an egress processing stage. It should be noted that the blocks shown in the exemplary egress hardware pipeline of
In certain embodiments, IP fragmentation may be needed for packets that are to be transmitted as encrypted packets. In that case, the packet can be fragmented 260 in the egress hardware pipeline taking into account constraints in payload size imposed by the specific cryptographic algorithms used. For example, when using the Advanced Encryption Standard with Cipher Block Chaining (AES-CBC), all fragments except the last fragment should have a payload that is a multiple of 16 bytes. As each fragment is created, it is sent through the encryption block and the encryptor encrypts the payload. Some state is retained once a fragment is encrypted and this state is utilized to initiate the encryption for the next fragment. Segment-by-segment encryption can be performed to accomplish flow-through fragmentation as described in commonly-assigned and co-pending U.S. patent application Ser. No. 11/351,331 filed on Feb. 8, 2006 and entitled “Methods and Systems for Incremental Crypto Processing of Fragmented Packets,” which is fully incorporated herein by reference for all purposes.
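The block-size constraint and the retained state can be illustrated with a minimal sketch. It is not the method of the incorporated application. The block transform `toy_block_encrypt` is a deliberately insecure placeholder standing in for AES, the payload is assumed to be already padded to a multiple of 16 bytes, and all function names are hypothetical. The point shown is only that, in CBC mode, carrying the last ciphertext block forward as the chaining value lets each fragment be encrypted as it is created.

```python
from typing import List, Tuple

BLOCK = 16  # AES block size in bytes


def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    """Placeholder for a real block cipher such as AES. NOT secure."""
    return bytes(b ^ k for b, k in zip(block, key))


def split_for_cbc(payload: bytes, max_fragment_payload: int) -> List[bytes]:
    """Cut the payload so every fragment but the last is a multiple of 16 bytes."""
    size = max_fragment_payload // BLOCK * BLOCK
    if size <= 0:
        raise ValueError("fragment payload limit is smaller than one block")
    return [payload[i:i + size] for i in range(0, len(payload), size)]


def cbc_encrypt_segment(segment: bytes, key: bytes,
                        chain: bytes) -> Tuple[bytes, bytes]:
    """Encrypt one segment in CBC mode; return (ciphertext, next chaining value)."""
    if len(segment) % BLOCK:
        raise ValueError("segment length must be a multiple of the block size")
    out = bytearray()
    for i in range(0, len(segment), BLOCK):
        mixed = bytes(p ^ c for p, c in zip(segment[i:i + BLOCK], chain))
        chain = toy_block_encrypt(mixed, key)  # retained state for the next block
        out += chain
    return bytes(out), chain
```

Because a multiple of 16 bytes is also a multiple of 8 bytes, fragments sized this way also satisfy the IPv4 requirement that fragment offsets fall on 8-byte boundaries.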
In certain embodiments, an exemplary switching device can implement flow-through reassembly of fragmented packets as part of ingress hardware pipeline 300. It should be noted that the blocks shown in the exemplary ingress hardware pipeline of
Certain embodiments can be used for flow-through reassembly of clear as well as encrypted tunnels. If a tunnel identified as part of ingress hardware pipeline 300 is an encrypted tunnel (e.g. IPSec tunnel, PPP-SSH, CIPE, etc.), certain embodiments can use decryption logic to handle decryption on a segment-by-segment basis, the results of which can be combined together during the reassembly process after the last segment arrives. For example, segment-by-segment decryption can be performed to accomplish flow-through reassembly of encrypted fragments as described in commonly-assigned and co-pending U.S. patent application Ser. No. 11/351,331 filed on Feb. 8, 2006 and entitled “Methods and Systems for Incremental Crypto Processing of Fragmented Packets,” which is fully incorporated herein by reference for all purposes.
In certain embodiments, incoming non-fragmented packets can run through tunnel table processing where all tunneled packets are identified. If the incoming packet is encrypted it goes through decryption. In the case where it is a clear tunnel packet, i.e., packets on the tunnel are not encrypted, the decryption processing is bypassed. The decrypted (or clear tunnel) packet is then subjected to L2/L3 switching and/or firewall/ACL processing, as appropriate, and if needed, the inner header is updated. The inner header editing can do minor updates, such as, for example, updating the IP DiffServ Code Point (DSCP) for the inner packet if needed by ACL. The decrypted packet is then stored in a packet buffer in packet memory and the pointer to the packet buffer is queued into the egress queue. Based on the scheduling criteria, the packet buffer can be dequeued by the scheduler and sent for egress processing.
As shown in
From step 406, if the MF flag in the IP header is set (i.e., MF=1) and the OFFSET field in the IP header is 0, this packet is the first, or initial, fragment of an IP datagram. Most of the processing for the first fragment is the same as for non-fragmented packets (discussed above). The detailed processing of the first fragment begins at step 408, where the previously retrieved tunnel table offset is checked for a non-zero value. If the tunnel table offset is non-zero, then a partial IP fragment exists in the tunnel table and, as discussed further below, the IP reassembly queue should ultimately be flushed because, as previously noted, this incoming packet is the first fragment of an IP datagram. At this point, flow 400a transfers to flow 400b via connector Ain.
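The MF and OFFSET test can be sketched as follows. The sketch reads the 16-bit flags and fragment offset word at byte 6 of an IPv4 header, where the MF flag is bit 0x2000 and the offset occupies the low 13 bits. The `FragmentKind` enumeration and the `classify` function are hypothetical names and do not correspond to numbered steps of the flow.

```python
import struct
from enum import Enum

MF_BIT = 0x2000       # "more fragments" flag in the flags/offset word
OFFSET_MASK = 0x1FFF  # 13-bit fragment offset, in 8-byte units


class FragmentKind(Enum):
    NOT_FRAGMENTED = "not fragmented"
    FIRST = "first fragment"
    MIDDLE = "middle fragment"
    LAST = "last fragment"


def classify(ipv4_header: bytes) -> FragmentKind:
    """Classify a packet from the MF flag and OFFSET field of its IPv4 header."""
    (flags_and_offset,) = struct.unpack_from("!H", ipv4_header, 6)
    more_fragments = bool(flags_and_offset & MF_BIT)
    offset = flags_and_offset & OFFSET_MASK
    if more_fragments:
        return FragmentKind.FIRST if offset == 0 else FragmentKind.MIDDLE
    return FragmentKind.LAST if offset != 0 else FragmentKind.NOT_FRAGMENTED
```

A hardware pipeline would make the same four-way decision with a simple comparison on these two header fields.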
As shown in
If the payload check at step 422 indicates an IEEE 802.3 type, then L2, L3 switching and/or ACL processing 424, 426, 428 are performed normally, as necessary, and flow control is passed through connector Aout. If the payload check at step 422 indicates an IP type, then L3 switching and/or ACL processing 426, 428 are performed normally and flow control is passed through connector Aout. However, if IPSec is inside at step 429, then IPSec header parsing and decryption 430, 432 can be performed prior to L3 switching and/or ACL processing 426, 428, followed by passing flow control through connector Aout.
If the payload check at step 422 indicates an IEEE 802.11 type, 802.11 header parsing 434 can be performed. Then the 802.11 packet can be checked for encryption 436. If the packet is encrypted, then it can be decrypted 438 and L2/L3 switching and/or ACL processing 424, 426, 428 are performed, as necessary, and flow control is passed through connector Aout. If the 802.11 packet is not encrypted, it can be further checked for IPSec 440. If IPSec is inside of the 802.11 packet, then IPSec parsing and decryption 430, 432 can be performed prior to L3 switching and/or ACL processing 426, 428, followed by passing flow control through connector Aout. If the 802.11 packet is not encrypted and IPSec is not inside, then L2/L3 switching and/or ACL processing 424, 426, 428 are performed, as necessary and flow control is passed through connector Aout. For each of these payload types, firewall processing (not shown) can additionally be performed as necessary.
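The payload-type dispatch can be summarized in a short sketch that returns the ordered list of processing steps for each case. The step labels reuse the reference numerals from the description. The function name `ingress_steps`, its parameters, and the "ip" payload label are assumptions made for illustration, and firewall processing is omitted.

```python
from typing import List

L2_L3_ACL = ["L2 switching 424", "L3 switching 426", "ACL processing 428"]
L3_ACL = ["L3 switching 426", "ACL processing 428"]
IPSEC = ["IPSec header parsing 430", "IPSec decryption 432"]


def ingress_steps(payload_type: str, encrypted: bool = False,
                  ipsec_inside: bool = False) -> List[str]:
    """Return the processing steps performed before flow passes to connector Aout."""
    if payload_type == "802.3":
        return list(L2_L3_ACL)
    if payload_type == "ip":  # assumed label for the IP payload case
        return IPSEC + L3_ACL if ipsec_inside else list(L3_ACL)
    if payload_type == "802.11":
        steps = ["802.11 header parsing 434"]
        if encrypted:                      # encryption check 436
            return steps + ["802.11 decryption 438"] + L2_L3_ACL
        if ipsec_inside:                   # IPSec check 440
            return steps + IPSEC + L3_ACL
        return steps + L2_L3_ACL
    raise ValueError("unknown payload type: " + payload_type)
```

Each branch performs the steps "as necessary" in the described flow, so the returned list represents the maximum set of steps for a case rather than a mandatory sequence.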
For certain embodiments, discussed previously in relation to flow 400b, decryption logic decrypts the fragment and packet format information and the pointer to the decryption context can be stored in the tunnel table. A temporary decryption state can also be stored in the tunnel table. Additionally, an intermediate packet integrity check value can be calculated for the segment and the result can be stored in the tunnel table. No replay counter update is performed at this point in the flow as the packet has not yet been authenticated.
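The per-tunnel state can be pictured with a minimal sketch. The field names of `TunnelEntry` are hypothetical, and HMAC-SHA-256 from the Python standard library stands in for whatever integrity algorithm the tunnel actually uses. The sketch shows an intermediate integrity value being extended segment by segment, with the replay update deferred until the complete packet has been authenticated.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class TunnelEntry:
    next_offset: int = 0                   # non-zero while a partial datagram is pending
    decrypt_state: Optional[bytes] = None  # temporary decryption state (e.g., chaining value)
    icv: Optional[hmac.HMAC] = None        # intermediate integrity check value
    replay_updated: bool = False           # set only after authentication succeeds


def absorb_segment(entry: TunnelEntry, auth_key: bytes, segment: bytes) -> None:
    """Fold one segment into the intermediate integrity check value."""
    if entry.icv is None:
        entry.icv = hmac.new(auth_key, digestmod=hashlib.sha256)
    entry.icv.update(segment)
    entry.next_offset += len(segment)
    # No replay counter update here: the packet is not yet authenticated.


def finish(entry: TunnelEntry, expected_icv: bytes) -> bool:
    """After the last segment, compare the accumulated value with the received one."""
    if entry.icv is None:
        return False
    ok = hmac.compare_digest(entry.icv.digest(), expected_icv)
    if ok:
        entry.replay_updated = True  # replay state may be updated only now
    return ok
```

Because the integrity value is accumulated incrementally, the result after the last segment matches the value computed over the whole packet, and no segment has to be buffered for the check.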
As shown in
Returning to step 406 of flow 400a in
As shown in
As shown in
Returning to step 406 of flow 400a in
In certain embodiments, the exemplary data flow described in
For clear tunnels, the ingress packet processing is the same as above with respect to
For encrypted tunnels, the ingress processing can be accomplished in two passes. When an out-of-order IP fragment is found, the fragment is not decrypted, because the intermediate decryption context cannot be used to decrypt out-of-order fragments. When such a fragment is enqueued, a special flag (for example, “isEncrypted”) can be set to indicate that it arrived out of order and has not yet been decrypted. Here too, the IP reassembly queue can be maintained as an ordered list, with the IP offset in the IP header forming the basis for ordering.
At the time of enqueuing, the IP reassembly queue is traversed to figure out whether there has been a fragment which is marked with the “isEncrypted” flag and for which the previous fragment has arrived and been enqueued with this flag not set. If such a fragment is found, then it is looped back into the pipeline and ingress processing is performed for such packets. During ingress processing these fragments can be decrypted. If at the time of enqueuing the fragments into the IP reassembly queue it is found that all the intermediate fragments and the last fragment have arrived and the “isEncrypted” flag is false for all the enqueued fragments, then the fragments are moved to the output queue and the rest of the egress processing is the same as discussed above.
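The two-pass handling can be sketched with an ordered queue. The `QueuedFragment` type and the functions `enqueue`, `next_to_loop_back` and `is_complete` are hypothetical names; offsets and lengths are in bytes, and the actual decryption performed during loopback is outside the sketch.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class QueuedFragment:
    offset: int                                 # byte offset; the only ordering key
    length: int = field(compare=False)
    more_fragments: bool = field(compare=False)
    is_encrypted: bool = field(compare=False)   # arrived out of order, not yet decrypted


def enqueue(queue: List[QueuedFragment], frag: QueuedFragment) -> None:
    """Insert the fragment so the queue stays ordered by IP offset."""
    bisect.insort(queue, frag)


def next_to_loop_back(queue: List[QueuedFragment]) -> Optional[QueuedFragment]:
    """Find a flagged fragment whose immediate predecessor is present and decrypted."""
    for prev, frag in zip(queue, queue[1:]):
        if (frag.is_encrypted and not prev.is_encrypted
                and prev.offset + prev.length == frag.offset):
            return frag
    return None


def is_complete(queue: List[QueuedFragment]) -> bool:
    """True when all fragments are contiguous, decrypted, and the last has arrived."""
    if not queue or queue[-1].more_fragments:
        return False
    expected = 0
    for frag in queue:
        if frag.is_encrypted or frag.offset != expected:
            return False
        expected = frag.offset + frag.length
    return True
```

In this sketch a fragment returned by `next_to_loop_back` would be sent back through the ingress pipeline for decryption, after which its flag is cleared and the completeness check is repeated.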
Certain embodiments can support the need to reassemble IP fragments from multiple IP packets belonging to a tunnel. In such embodiments, multiple IP reassembly flow contexts can be maintained per tunnel. The rest of the data processing is similar to the one described above with reference to
Although certain embodiments described above illustrate a mechanism for reassembling tunneled packets and fragmentation of tunneled packets, these embodiments can be easily extended to include the reassembly and/or fragmentation of non-tunneled IP datagrams. If non-tunneled IP packets need to be reassembled, then an IP reassembly flow entry is created for every combination of source IP address, destination IP address, protocol and ID. Further, the IP reassembly flow entry keeps a pointer to the stored IP fragments in the memory. The rest of the reassembly mechanism is similar to certain embodiments described above. Likewise, if an IP packet needs fragmentation, but is not tunneled, the egress process can simply not perform egress tunnel header creation and the rest of the fragmentation mechanism is similar to certain embodiments described above.
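A reassembly flow table keyed by source address, destination address, protocol and ID can be sketched as follows. The `FlowKey` and `ReassemblyTable` names are hypothetical, offsets are in bytes, and a Python dictionary of fragment payloads stands in for the pointer to fragments stored in packet memory.

```python
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    src: str
    dst: str
    protocol: int
    ident: int  # IP identification field


class ReassemblyTable:
    def __init__(self) -> None:
        # One entry per flow: fragments by byte offset, plus total length once known.
        self._flows: Dict[FlowKey, dict] = {}

    def add(self, key: FlowKey, offset: int, payload: bytes,
            more_fragments: bool) -> Optional[bytes]:
        """Store a fragment; return the reassembled payload once it is complete."""
        flow = self._flows.setdefault(key, {"frags": {}, "total": None})
        flow["frags"][offset] = payload
        if not more_fragments:
            flow["total"] = offset + len(payload)
        if flow["total"] is None:
            return None  # last fragment not seen yet
        data = bytearray()
        while len(data) < flow["total"]:
            chunk = flow["frags"].get(len(data))
            if not chunk:
                return None  # a hole remains in the datagram
            data += chunk
        del self._flows[key]  # flow entry is released after reassembly
        return bytes(data)
```

Keeping a separate entry per key lets fragments of several datagrams be interleaved on the wire without being mixed during reassembly.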
Although the application has been particularly described with reference to embodiments thereof, it should be readily apparent to those of ordinary skill in the art that various changes, modifications, substitutes and deletions are intended within the form and details thereof, without departing from the spirit and scope of the application. Accordingly, it will be appreciated that in numerous instances some features of certain embodiments will be employed without a corresponding use of other features. Further, those skilled in the art will understand that variations can be made in the number and arrangement of inventive elements illustrated and described in the above figures. It is intended that the scope of the appended claims include such changes and modifications.
This application is related to and claims the benefit of U.S. Provisional Patent Application No. 60/673,482, filed Apr. 21, 2005, which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
60673482 | Apr 2005 | US