The present disclosure relates to forwarding packets in a network device.
In a packet-switched or packet mode computer network, data is transmitted in the form of packets (sometimes referred to as datagrams, segments, blocks, cells or frames) according to predefined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP). A sequence of packets transmitted from a source device to a destination device is referred to as a network flow.
Packets generally comprise control information and actual data (also known as payload). The control information is data that intermediate network devices (e.g., switches, routers, etc.) use to forward the packet from the source device to the destination device. The control information may comprise, for example, source and destination addresses (e.g., source and destination Media Access Control (MAC) addresses), error detection codes (i.e., checksums), sequencing information, etc. This control information is generally found in a portion of the packet referred to as the packet header (i.e., the information that precedes the actual data within the packet) and/or the packet trailer (i.e., the information that follows the actual data within the packet).
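By way of illustration only, the split between control information and payload can be sketched for one common case. The sketch assumes an untagged Ethernet II frame (6-byte destination MAC address, 6-byte source MAC address, 2-byte EtherType, then payload); the function name and the sample frame are hypothetical and are not taken from the disclosure.

```python
import struct

def parse_ethernet_header(frame: bytes) -> dict:
    """Split the 14-byte Ethernet II header (control information) from the payload."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet II header")
    # "!" = network byte order; 6s = 6 raw bytes; H = unsigned 16-bit integer.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst_mac": dst.hex(":"),   # destination MAC address
        "src_mac": src.hex(":"),   # source MAC address
        "ethertype": ethertype,    # identifies the payload protocol
        "payload": frame[14:],     # actual data
    }

# Made-up sample frame: broadcast destination, arbitrary source, EtherType 0x0800.
frame = bytes.fromhex("ffffffffffff" "001122334455" "0800") + b"hello"
header = parse_ethernet_header(frame)
```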
Overview
Techniques are provided for forwarding packets via an intermediate network device. A packet comprising a destination MAC address is received at a first port of a network device having a plurality of bi-directional ports. A second port of the network device to which the packet should be forwarded is identified through the use of at least an approximate ingress table at the first port comprising a plurality of compressed destination MAC addresses each having an associated egress port, and the packet is forwarded to the second port. At the second port, a subsequent network device to which the packet should be forwarded is identified through the use of an exact egress table at the second port including exact destination MAC addresses each associated with a network device connected to the second port, and the packet is forwarded to the subsequent network device.
Example Embodiments
A computer network is a communication system that links two or more computers or other network devices so that the network devices may communicate, share resources, access centrally stored information, etc. In a packet-switched network, such communication occurs through the exchange of packets. In the example of
Packet 16 comprises control information 20 and actual data (payload) 22. The actual data 22 may comprise, for example, video data, numeric data, alphanumeric data, voice data, etc. The control information 20 comprises information that is used by switches 14(2), 14(1), and 14(3) to direct packet 16 along the segments to host device 12(6). Control information 20 may comprise, for example, source and destination addresses, error detection codes (i.e., checksums), sequencing information, etc. In an Ethernet network, the source/destination addresses in control information 20 are unique identifiers assigned to network interfaces of network devices, referred to as Media Access Control (MAC) addresses. As such, in the example of
In a conventional packet switching network, a switch forwards a packet through the use of a central forwarding table. The central forwarding table includes entries comprising: (1) the exact MAC addresses for all host devices in the computer network, (2) the exact virtual local area network (VLAN) identifier, and (3) the associated interface identifier for the interface to be used to forward the packet. Generally, the MAC address has a length of 48 bits and the VLAN identifier has a length of 12 bits, yielding 60 bits per entry in addition to the associated interface identifier.
When a conventional switch receives a packet containing a destination MAC address that is absent from the central forwarding table, the packet is broadcast to every host device in the network (referred to as a broadcast flood), thereby consuming valuable network bandwidth. As a result, designers have increased the size of the central forwarding table to accommodate as many entries as there are hosts in the network in order to avoid broadcast floods. However, computer networks, particularly Ethernet networks, may be very large and include a large number of host devices (e.g., on the order of several hundreds of thousands of host devices). This is particularly true given the advent of server virtualization in datacenters, in which a single physical server can host multiple virtual servers each having its own MAC address. As such, in order to accommodate these large numbers of MAC addresses, the central forwarding table utilizes a large amount of memory (e.g., Random Access Memory (RAM)) in an application-specific integrated circuit (ASIC) on the switch.
Techniques are described herein for forwarding packets in such a way so as to substantially reduce the memory utilized by an intermediate network device (e.g., switch) to forward a packet by eliminating use of a central forwarding table containing the exact MAC addresses for all host devices in the computer network. In accordance with techniques described herein, a packet is forwarded by a switch through the use of an approximate ingress table and an exact egress table.
When packet 16 is received at port 30(1), port 30(1) determines which of ports 30(2) or 30(3) is the correct egress for the packet. Port 30(1) identifies the correct egress port through the use of approximate ingress table 32 that includes a plurality of approximate destination MAC addresses each having an associated egress port. Further details of the use of approximate ingress tables to forward packets are provided below.
The compressed forwarding information in each table entry 50(1)-50(N) is a compressed representation of a MAC address for a destination host device in computer network 10 and an associated VLAN identifier. The compressed forwarding information in each table entry 50(1)-50(N) also has an interface (port) associated therewith. In such an example, the size of the compressed forwarding information in each of table entries 50(1)-50(N) may be, for example, approximately 8 to 16 bits (assuming the compressed forwarding information is derived from the MAC address and VLAN identifier), compared with the 60 bits of the uncompressed MAC address and VLAN identifier.
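By way of illustration only, one way to obtain compressed forwarding information of this size is to hash the MAC address and VLAN identifier and keep the low-order bits. The sketch below assumes CRC-32 as the hash function, which the disclosure does not specify; `compress_key`, the addresses, and the VLAN number are hypothetical.

```python
import zlib

def compress_key(mac: str, vlan: int, bits: int = 16) -> int:
    """Hash a (MAC address, VLAN identifier) pair down to `bits` bits.

    CRC-32 is an assumption made for this sketch only.
    """
    raw = bytes.fromhex(mac.replace(":", "")) + vlan.to_bytes(2, "big")
    return zlib.crc32(raw) & ((1 << bits) - 1)

# Approximate ingress table: compressed key -> associated egress port.
# Two different addresses may alias to the same key; the later entry then wins.
approx_ingress_table = {}
for mac, vlan, port in [
    ("00:11:22:33:44:55", 10, "30(2)"),
    ("66:77:88:99:aa:bb", 10, "30(3)"),
]:
    approx_ingress_table[compress_key(mac, vlan)] = port
```

Because each key is only 8 to 16 bits wide, distinct addresses can map to the same entry, which is why the egress side performs an exact comparison.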
It is to be appreciated that the destination MAC address and the VLAN identifier are merely examples of the contents of compressed forwarding information in a table entry 50(1)-50(N) and that other control information may additionally or alternatively be included in an entry.
In certain circumstances, the hash value generated through the ingress processing based on control information 20 in packet 16 may not have a match stored in approximate ingress table 32(1). In such cases, packet 16 is broadcast to all the other ports (ports 30(2) and 30(3)) in switch 14(1). Each port then uses the egress processing described below to determine if the destination device is connected to that port and, if so, to forward the packet. If the destination device identified in packet 16 is not connected to an egress port 30(2) or 30(3), the egress port may ignore the packet and send a correction notification to port 30(1), as described below.
In operation, when packet 16 is received by port 30(3) from port 30(1), processor 38(3) compares the destination MAC address in packet 16 to the exact forwarding information in table entries 55(1)-55(N) in exact egress table 34(3). The comparison, which uses a single memory access, identifies the network device attached to port 30(3) to which packet 16 should be forwarded. Subsequently, packet 16 is forwarded via network interface 37(3) to a subsequent network device. The subsequent network device may be the destination host device (direct connection) or, as shown in
When packet 16 is received by port 30(3) and the MAC address and, optionally, other control information contained therein are compared to the entries in exact egress table 34(3), there are two potential results. First, as noted above, the egress processing may determine that the control information in packet 16 matches the exact forwarding information contained in one of the entries 55(1)-55(N), and packet 16 is forwarded, directly or indirectly, to the identified destination device. Second, the egress processing may determine that the control information in packet 16 does not match the exact forwarding information contained in any of the entries 55(1)-55(N). As noted above, in such circumstances, packet 16 was incorrectly forwarded to the port because the hash value computed based on the control information in the packet erroneously matched the compressed forwarding information for another destination device. This results from, for example, aliasing in computing the compressed forwarding information.
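By way of illustration only, the two potential results of the egress comparison can be sketched as a single exact-match lookup. The table contents, device labels, and the function name `egress_lookup` are hypothetical; a Python dictionary stands in for whatever single-access memory structure an actual port would use.

```python
from typing import Optional

# Exact egress table for one port: full destination MAC address -> attached device.
# Addresses and device labels are made up for illustration.
exact_egress_table = {
    "00:11:22:33:44:55": "switch 14(3)",
    "66:77:88:99:aa:bb": "host device A",
}

def egress_lookup(dst_mac: str) -> Optional[str]:
    """Return the attached device for dst_mac, or None if the packet was
    forwarded to this port in error (e.g., because of aliasing at ingress)."""
    return exact_egress_table.get(dst_mac.lower())

hit = egress_lookup("00:11:22:33:44:55")   # first result: exact match
miss = egress_lookup("de:ad:be:ef:00:01")  # second result: no match
```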
In such circumstances, because the egress processing has now looked up the exact forwarding information for packet 16 in exact egress table 34(3), the egress processing can trigger a correction process that prevents future incorrect forwarding of packets directed towards the destination device specified in packet 16. Further details of this correction process are provided below. However, one effect of this correction process is the transmission of a notification to port 30(1) that the port sent packet 16 to the wrong egress port. Port 30(1) is configured to maintain a correction table 42(1) that includes the exact forwarding information (i.e., destination addresses) in received packets that will result in forwarding of packets to the wrong egress port. Specifically, as shown in
As noted above, each of ports 30(1)-30(3) is bi-directional and includes an approximate ingress table 32(1)-32(3), respectively, an exact egress table 34(1)-34(3), respectively, and a correction table 42(1)-42(3), respectively. The entries in each of these tables include a Layer 2 address (source address and/or destination address) and/or other control information. It is to be appreciated that the techniques described herein are not limited to any specific types of control information in the table entries.
As noted above, the ingress and egress processing logic and the approximate, correction, and exact tables are stored in memory 40(1)-40(3). Each memory 40(1)-40(3) may comprise read only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, or electrical, optical, or other physical/tangible memory storage devices. The processors 38(1)-38(3) are, for example, microprocessors or microcontrollers that each execute instructions for the process logic 43(1)-43(3) and 44(1)-44(3) stored in memory 40(1)-40(3), respectively. Thus, in general, the memory 40(1)-40(3) may each comprise one or more computer readable storage media (e.g., a memory device) encoded with software comprising computer executable instructions, and when the software is executed (by the processors 38(1)-38(3)) it is operable to perform the operations described herein in connection with ingress logic 43(1)-43(3) and egress logic 44(1)-44(3).
After the packet is received by the second port, at 78 the egress processing in the second port identifies a subsequent network device to which the packet should be forwarded. The egress processing identifies the subsequent network device through the use of an exact egress table at the second port including exact destination MAC addresses each associated with a network device connected to the second port. At 78, the packet is forwarded to the subsequent network device identified by the egress processing.
At 95, packet 16 is received via network interface 37(1) of port 30(1). At 96, the association of the source MAC address of the received packet 16 to port 30(1) is determined. That is, a check is performed to determine if the source MAC address of packet 16 is already present in the ingress-side correction table. If the source MAC address is not present, the source MAC address is added to the ingress-side correction table and the source MAC address is associated with the port identifier of port 30(1).
At 100, at least a portion of the control information 20, particularly the destination MAC address, in packet 16 is compared to the entries 60(1)-60(N) in ingress-side correction table 42(1). At 105, a determination is made as to whether the destination MAC address in packet 16 matches the exact forwarding information in any of the entries 60(1)-60(N). If the destination MAC address in packet 16 matches the exact forwarding information in any of the entries 60(1)-60(N), method 90 proceeds to 110. At 110, using the destination MAC address, the egress port for the packet 16 is identified, the packet is forwarded to this port, and method 90 ends.
If the destination MAC address in packet 16 does not match the exact forwarding information in any of the entries 60(1)-60(N), method 90 proceeds to 115. At 115, a hash function is used to compute a hash value of the destination MAC address (and optionally other control information) in packet 16. The hash value is then compared to the entries 50(1)-50(N) in approximate ingress table 32(1). At 120, a determination is made as to whether the hash value matches the compressed forwarding information in any of entries 50(1)-50(N). If the computed hash value matches the compressed forwarding information in any of the entries 50(1)-50(N), method 90 proceeds to 125. At 125, using the port associated with the matching compressed forwarding information, the egress port for the packet 16 is identified, the packet is forwarded to this port, and method 90 ends.
If the computed hash value does not match the compressed forwarding information in any of entries 50(1)-50(N), method 90 proceeds to 130. At 130, packet 16 is broadcast to all other ports in switch 14(1) (i.e., ports 30(2) and 30(3)), and method 90 ends.
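By way of illustration only, the ingress decision sequence of method 90 (operations 100 through 130) can be sketched as follows. The sketch assumes CRC-32 truncated to 16 bits as the hash function and Python dictionaries as the tables; the function names and sample addresses are hypothetical.

```python
import zlib

def compress(mac: str, bits: int = 16) -> int:
    """Hypothetical hash: CRC-32 of the MAC address truncated to `bits` bits."""
    return zlib.crc32(bytes.fromhex(mac.replace(":", ""))) & ((1 << bits) - 1)

def ingress_forward(dst_mac, correction_table, approx_table, other_ports):
    """Return (egress ports, reason) for a packet received at an ingress port."""
    # 100/105/110: an exact match in the correction table takes priority.
    if dst_mac in correction_table:
        return [correction_table[dst_mac]], "correction"
    # 115/120/125: otherwise hash the address and consult the approximate table.
    port = approx_table.get(compress(dst_mac))
    if port is not None:
        return [port], "approximate"
    # 130: no match anywhere, so broadcast to all other ports in the switch.
    return list(other_ports), "broadcast"

known = "00:11:22:33:44:55"
corrected = "66:77:88:99:aa:bb"
approx_table = {compress(known): "30(2)"}
correction_table = {corrected: "30(3)"}
other_ports = ["30(2)", "30(3)"]
```

Checking the correction table first means that an address known to alias is never looked up in the approximate table again.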
At 145, packet 16 is received at port 30(3). As noted above, packets, such as packet 16, include control information. This control information may comprise a destination MAC address as well as a source MAC address. At 150, the source MAC address in packet 16 is compared to the entries 55(1)-55(N) in exact egress table 34(3). At 155, a determination is made as to whether the source MAC address matches the exact forwarding information in any of entries 55(1)-55(N). If a match is found, the egress processing determines that packet 16 was sourced by port 30(3) and is erroneously looping back. As such, method 140 proceeds to 160 where packet 16 is dropped and method 140 ends.
If, at 155, a match between the source MAC address and the forwarding information in any of entries 55(1)-55(N) is not found, method 140 proceeds to 165. At 165, the destination MAC address in packet 16 is compared to entries 55(1)-55(N) in exact egress table 34(3). At 170, a determination is made as to whether the destination MAC address matches the exact forwarding information in any of entries 55(1)-55(N). If a match is found, method 140 proceeds to 175 where packet 16 is forwarded, via network interface 37(3), to the network device identified in the matching entry 55(1)-55(N) and method 140 ends.
If, at 170, a match between the destination MAC address in packet 16 and the exact forwarding information in entries 55(1)-55(N) is not found, method 140 proceeds to 180. At 180, a determination is made as to whether packet 16 is a special broadcast/special flood packet (i.e., whether packet 16 was received at port 30(3) as a result of a broadcast). This determination is made based on a bit in control information 20 of packet 16 referred to herein as the special broadcast or special flood bit. If the special broadcast bit is set, packet 16 is determined to be a broadcast packet. As such, another flood is not used and method 140 proceeds to 185 where the packet 16 is forwarded on this port.
Alternatively, if the special broadcast bit is not set, then packet 16 is determined to be a regular packet that was received from another port (i.e., the packet was not broadcast to all ports). In other words, if packet 16 is not a special broadcast packet, then the packet came from an original ingress port and not a proxy port that set the special flood bit and sent the packet to all ports. If packet 16 is a regular packet (i.e., not a special flood packet), then, at 186, the association of the source address of the packet to the ingress port from where it was sourced is determined. That is, a check is performed to determine if the compressed source address is present in the port's approximate ingress table. If it is not present, the address is inserted and the source port (ingress port which sent the packet) is associated with the address.
At 190, a special broadcast is performed to flood all ports with the packet. Operations that occur subsequent to such a flood are described below.
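By way of illustration only, the egress decision sequence of method 140 (operations 150 through 190) can be sketched as follows. The packet is modeled as a dictionary with hypothetical field names (`src_mac`, `dst_mac`, `special_flood`, `ingress_port`), and CRC-32 truncated to 16 bits is assumed as the hash function; none of these names come from the disclosure.

```python
import zlib

def compress(mac: str, bits: int = 16) -> int:
    """Hypothetical hash: CRC-32 of the MAC address truncated to `bits` bits."""
    return zlib.crc32(bytes.fromhex(mac.replace(":", ""))) & ((1 << bits) - 1)

def egress_process(packet, exact_egress_table, approx_ingress_table):
    """Return the action an egress port takes for a received packet."""
    # 150/155/160: the source is attached to this port, so the packet is looping back.
    if packet["src_mac"] in exact_egress_table:
        return "drop"
    # 165/170/175: exact destination match, forward to the attached device.
    if packet["dst_mac"] in exact_egress_table:
        return "forward to " + exact_egress_table[packet["dst_mac"]]
    # 180/185: already a special broadcast packet, so no further flood is used.
    if packet["special_flood"]:
        return "forward on this port"
    # 186: learn the compressed source address and the port that sent the packet.
    approx_ingress_table.setdefault(compress(packet["src_mac"]), packet["ingress_port"])
    # 190: regular packet that arrived here in error, so perform a special broadcast.
    return "special broadcast"

exact = {"aa:aa:aa:aa:aa:01": "host 12(6)"}
approx = {}
base = {"src_mac": "bb:bb:bb:bb:bb:02", "dst_mac": "cc:cc:cc:cc:cc:03",
        "special_flood": False, "ingress_port": "30(1)"}
```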
As noted above, in certain circumstances, an egress port, such as port 30(3), can perform a special broadcast of a packet when the packet was erroneously received by the port. To perform a special broadcast, a bit in the control information of the packet (i.e., the special broadcast bit) is set and the packet is forwarded to all the ports (including itself). When all the ports receive this special broadcast packet, if the intended destination host device is connected to a port, the corresponding port will understand that another port is trying to reach it, but has wrong information in its approximate ingress table (i.e., due to aliasing). With this knowledge, the correct egress port can now inform the ingress port that if it wants to reach the host device identified in the special broadcast packet, it should send the packet to that port. This correction notification is sent via a special control message or through the supervisor software. Once the ingress port learns of the correct port for packets having the specific destination MAC address, the destination MAC address is added to the ingress correction table.
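Special broadcasts may also advantageously prevent looping of a packet. When a packet is broadcast by a port, it is also received by the ingress port that forwarded the packet. However, as noted above, the egress processing at that port finds the source MAC address of the packet in its exact egress table, determines that the packet is erroneously looping back, and drops the packet.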
If a certain MAC address is removed from the exact egress table on an egress port (e.g., when a network device is disconnected from the egress port), the egress port notifies all the other ports of this removal (via a correction message) so that they can remove the same entry from their respective correction tables as well as approximate ingress tables. Similarly, such correction messages may be used to add entries, as noted above, to the correction tables and the approximate ingress tables (i.e., when a new network device is connected to an egress port).
At 225 a correction message, as noted above, is received at the ingress port. At 230, a determination is made as to whether the correction message is for the addition of an entry to the correction table. If the correction message is for addition of an entry, method 220 proceeds to 235 where the MAC address and the associated port are extracted from the correction message and this pair is added to the correction table.
If, at 230, it is determined that the correction message is not for addition of an entry (i.e., the correction message is for deletion), then method 220 proceeds to 240 where the MAC address is extracted from the message. At 245, a search is conducted in the correction table for an entry corresponding to the extracted MAC address, and this corresponding entry is deleted from the correction table. At 250, a search is conducted in the approximate ingress table for an entry corresponding to the extracted MAC address, and this corresponding entry is deleted from the approximate ingress table.
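By way of illustration only, the handling of a correction message in method 220 (operations 225 through 250) can be sketched as follows. The message layout (a dictionary with `op`, `mac`, and `port` fields) is hypothetical, as is the use of CRC-32 truncated to 16 bits as the hash function.

```python
import zlib

def compress(mac: str, bits: int = 16) -> int:
    """Hypothetical hash: CRC-32 of the MAC address truncated to `bits` bits."""
    return zlib.crc32(bytes.fromhex(mac.replace(":", ""))) & ((1 << bits) - 1)

def handle_correction(message, correction_table, approx_ingress_table):
    """Apply one correction message to an ingress port's tables."""
    mac = message["mac"]
    if message["op"] == "add":
        # 230/235: record the exact address together with its correct egress port.
        correction_table[mac] = message["port"]
    else:
        # 240/245: deletion, remove the exact entry from the correction table.
        correction_table.pop(mac, None)
        # 250: remove the corresponding compressed entry as well. An aliasing
        # address that shares the same compressed key loses its entry too.
        approx_ingress_table.pop(compress(mac), None)

mac = "00:11:22:33:44:55"
correction_table = {}
approx_ingress_table = {compress(mac): "30(2)"}

handle_correction({"op": "add", "mac": mac, "port": "30(3)"},
                  correction_table, approx_ingress_table)
after_add = dict(correction_table)

handle_correction({"op": "delete", "mac": mac},
                  correction_table, approx_ingress_table)
```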
As noted above, one advantage of the packet forwarding techniques described herein is reduced memory utilization (when compared to conventional techniques using a centralized forwarding table containing the exact MAC addresses of all network devices in the computer network). More particularly, the combination of an approximate ingress table and an exact egress table may achieve correct forwarding with only about 15% to 25% of the memory utilized in conventional techniques. The memory utilized to forward packets using an approximate ingress table and an exact egress table may be determined as described below, where:
In a traditional system the forwarding tables on a chip would consume the amount of memory given below by Equation (1).
$$T = n(k + d)\ \text{bits} \qquad \text{Equation (1)}$$
Using techniques described herein, the approximate ingress table on a chip would consume the amount of memory given below by Equation (2), and the exact egress table would consume the amount of memory given below by Equation (3).
$$A = n\left(ck + \lceil \log_2 p \rceil\right)\ \text{bits} \qquad \text{Equation (2)}$$
$$E = \frac{n(k + d)}{h}\ \text{bits} \qquad \text{Equation (3)}$$
As such, the total memory consumed in accordance with certain techniques described herein may be given below by Equation (4).
$$A + E = n\left(ck + \lceil \log_2 p \rceil\right) + \frac{n(k + d)}{h}\ \text{bits} \qquad \text{Equation (4)}$$
Therefore, the memory savings is given below by Equation (5).
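$$S = \left[1 - \frac{A + E}{T}\right] \times 100\% = \left\{1 - \left[\frac{ck + \lceil \log_2 p \rceil}{k + d} + \frac{1}{h}\right]\right\} \times 100\% \qquad \text{Equation (5)}$$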
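By way of illustration only, Equations (1) through (5) can be evaluated numerically. The symbol definitions are not reproduced in this text, so the meanings used here are assumptions inferred from the equations: n is taken as the number of table entries, k as the uncompressed key width in bits (MAC address plus VLAN identifier), d as the width of the associated data in bits, c as the compression ratio applied to the key, p as the number of ports, and h as the factor by which the exact table is divided across egress ports. All numeric values are made up.

```python
from fractions import Fraction

def forwarding_memory(n, k, d, c, p, h):
    """Evaluate Equations (1)-(5); returns (T, A, E, S) with S in percent."""
    port_bits = (p - 1).bit_length()   # ceiling(log2[p]) for integer p >= 1
    T = n * (k + d)                    # Equation (1): conventional table
    A = n * (c * k + port_bits)        # Equation (2): approximate ingress table
    E = Fraction(n * (k + d), h)       # Equation (3): exact egress table
    S = (1 - (A + E) / T) * 100        # Equation (5): memory savings
    return T, A, E, S

# Assumed values: 100,000 entries, 60-bit key compressed to 8 bits (c = 8/60),
# 4 bits of associated data, 16 ports, exact table divided by 16.
T, A, E, S = forwarding_memory(n=100_000, k=60, d=4, c=Fraction(2, 15), p=16, h=16)
print(T, A, E, S)  # 6400000 1200000 400000 75
```

Under these assumed values, the approximate and exact tables together use 25% of the conventional memory, which is at the upper end of the 15% to 25% range stated above.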
The above description is intended by way of example only.