The present disclosure relates to scheduling multicast packets for transmission.
Traditional Internet Protocol (IP) communication allows a source device to send packets to a single network-connected destination device (unicast transmission) or to all network-connected destination devices (broadcast transmission). A third technique, referred to as multicast transmission or multicasting, has evolved to support increased demands for various audio and video applications (e.g., online conferences, video on demand, etc.) that involve high data rate transmission to multiple (but not all) destination devices.
Multicasting is a form of communication that allows a source device to send an IP packet to a network for distribution to multiple destination devices. The network usually includes at least one multicast-enabled networking device (e.g., router or other switching device) configured to replicate the packet and forward the replicated packets to the multiple destination devices. Multicast-enabled networking devices typically include a centralized controller to replicate the received packets, and one or more output interfaces (ports) to forward the replicated packets to the destination devices over various data links.
In a multicast networking device having a plurality of output ports, a hierarchical packet scheduling tree is generated for a first port. The hierarchical packet scheduling tree is generated from a transmission queue for the first port. A sequential search of the hierarchical packet scheduling tree is performed to determine a packet pointer to a first packet in the transmission queue of the first port. Based on the packet pointer, packet data for the first packet is obtained, and the first packet comprising the packet data is transmitted to a destination device via the first port.
Input ports 15(1), 15(2), and 15(3) and output ports 20(1), 20(2), and 20(3) may be provided by one or more line cards. In the example of
The multicast storage 45 of memory 35 may include one or more packet queues, one or more hierarchical packet scheduling structures (as described below), packet data, etc. The dequeue engines 50(1), 50(2), and 50(3) are associated with port 20(1), port 20(2), and port 20(3), respectively. The dequeue engines 50(1), 50(2), and 50(3) each include logic that enables the processor 30 to dequeue packet information from queues and logic that allows the processor 30 to perform the operations described further below.
Memory 35 may comprise read only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, or other electrical, optical, or physical/tangible memory storage devices. The processor 30 is, for example, a microprocessor or microcontroller that executes instructions for the per-port multicast logic 40 and dequeue engines 50(1), 50(2), and 50(3). Thus, in general, the memory 35 may comprise one or more tangible (non-transitory) computer readable storage media (e.g., a memory device) encoded with software comprising computer executable instructions that, when executed by the processor 30, are operable to perform the operations described herein in connection with per-port multicast logic 40, dequeue engines 50(1), 50(2), and 50(3), and, more generally, the per-port multicast techniques.
In operation, incoming packets are received at one of the input ports 15(1)-15(3) from a source device 60(1)-60(n). An example sequence of packets received from source devices 60(1)-60(n) is shown in
Per-port multicast logic 40 is executed by processor 30 so that the processor 30 may perform packet processing of the received packets 66-69. In general, this processing may include replication of the received packets 66-69 and forwarding of the packets via routing/switching fabric 25 to one or more of the output ports 20(1), 20(2), and 20(3) for transmission to one or more of the destination devices 75(1), 75(2), and 75(3). In the example of
Packet replication is known in the art and, as such, the details of packet replication are not provided herein. In addition, any specific elements of multicast networking device 10 that may be used to perform such packet replication (e.g., buffers, controllers, control lists, lookup circuits, replication tables, etc.) have been omitted from
As noted, port 20(1) is, for example, a 40GE port that subscribes only to the third flow C, port 20(2) is a 10GE port that subscribes to all three flows A, B, and C, and port 20(3) is a 1GE port that subscribes only to the first and second flows A and B. In a conventional multicast networking device, although port 20(1) subscribes to only one flow and operates at a faster speed than the other ports 20(2) and 20(3), port 20(1) cannot send the replicated packet 67 from the third flow until the replicated packet 66 is first sent by ports 20(2) and 20(3). This performance-limiting phenomenon is referred to as head-of-line blocking (HOL blocking or HOLB) and results, for example, from the slower speeds of ports 20(2) (10GE) and 20(3) (1GE) relative to port 20(1) (40GE), flow-specific restrictions relating to the first flow A, etc.
Certain conventional arrangements use a per-packet queue data structure in which the egress processing elements (i.e., egress dequeue engines) can locate only the first packet in any sequence, thereby suffering from HOL blocking issues. Other conventional arrangements attempt to prevent HOL blocking by using ingress multicast packet replication. However, ingress multicast packet replication is costly in terms of silicon real estate and may not be practical for certain applications (e.g., switch-on-chip solutions).
Presented herein are per-port multicast processing techniques that are configured to prevent such HOL blocking issues without the need for costly ingress multicast packet replication. More specifically, the per-port multicast processing techniques use a per-port hierarchical data structure that enables the ports to locate queued packets and transmit those packets independently and immediately (i.e., without waiting for other ports to transmit). In other words, in the example of
As described further below, the per-port multicast processing techniques use a bit to represent each packet and create a per-port searchable hierarchical tree that is subsequently used to schedule the multicast packets. Because each port has a completely independent structure, the per-port hierarchical tree is scalable, supports line rate performance even while simultaneously serving fan-out ports of different speeds, and eliminates HOL blocking.
In this example, received packets are replicated (as known) and scheduled for transmission via one of the ports 1 to N. More specifically, the packets are placed into one of a plurality of transmission queues 140(1)-140(N). In the example of
The first packets scheduled for transmission are at the first level (level 1) of the transmission queues 140(1)-140(N), while the last packets scheduled for transmission are at the last level (level x) of the transmission queues 140(1)-140(N).
Method 80 begins at 85 where the packet pointers 150(1)-150(x) are used to generate a fanout vector for each port.
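The fanout-vector step may be illustrated with a short Python sketch. This is purely illustrative and is not part of the described embodiments: the function name build_fanout_vectors, the list-of-sets queue representation, and the example fanouts are assumptions made for clarity.

```python
# Minimal sketch: derive a per-port fanout vector from a shared multicast
# transmission queue. Each queued packet occupies one queue position (its
# packet pointer); a port's vector holds one bit per position, set to 1
# when that port must transmit the packet at that position.

def build_fanout_vectors(queue, num_ports):
    """queue: list of fanout sets in arrival order; queue[i] is the set of
    port indices subscribed to the packet at packet pointer i."""
    vectors = [[0] * len(queue) for _ in range(num_ports)]
    for pointer, fanout in enumerate(queue):
        for port in fanout:
            vectors[port][pointer] = 1  # one bit per packet, per port
    return vectors

# Hypothetical example: four queued packets, three ports.
queue = [{1, 2}, {0, 1}, {1, 2}, {0, 1}]
print(build_fanout_vectors(queue, 3))
# -> [[0, 1, 0, 1], [1, 1, 1, 1], [1, 0, 1, 0]]
```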
At 90, the fanout vectors are used to generate a hierarchical packet scheduling tree for each port. In other words, port-specific hierarchical packet scheduling trees are generated for each of the ports 1 to N.
As shown, hierarchical packet scheduling tree 165 includes three hierarchical levels 170(1)-170(3). The first hierarchical level 170(1) comprises the fanout vector 155 in which the vector entries 160(1)-160(x) are grouped into logical blocks 175(1)-175(w), each containing sixteen (16) vector entries. The size of these logical blocks may vary depending, for example, on the size of the fanout vector or on other design considerations.
As described above, each vector entry 160(1)-160(x) corresponds to an entry in a packet pointer associated with port 1. As such, each vector entry 160(1)-160(x) may be indexed with respect to a selected packet pointer. As shown, the first entry in block 175(1) is vector entry 160(1) from packet pointer 150(1). As such, packet pointer 150(1) is selected as the base packet pointer for indexing purposes, and each of the other entries in block 175(1) may be indexed with respect to packet pointer 150(1). Similarly, the first entry in block 175(w) is vector entry 160(w) from a packet pointer 150(w). As such, packet pointer 150(w) is selected as the base packet pointer for indexing purposes, and each of the other entries in block 175(w) may be indexed with respect to packet pointer 150(w). This indexing is shown in
At 95, the next (upper) hierarchical level in hierarchical scheduling tree 165 is populated based on values in lower level blocks 175(1)-175(w). More specifically, as shown in
In the example of
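This population step may be sketched in the same illustrative Python as above, with the sixteen-entry block size of the example; the function name and sample vector are assumptions. Each upper-level entry is simply the logical OR of one 16-entry block in the level below.

```python
# Sketch of the population step: each upper-level entry is the logical OR
# of one 16-entry block of the level below, i.e., 1 when at least one
# packet covered by that block awaits transmission on the port.

BLOCK_SIZE = 16

def populate_next_level(level):
    """level: list of 0/1 entries; returns the next (upper) level."""
    return [
        1 if any(level[i:i + BLOCK_SIZE]) else 0
        for i in range(0, len(level), BLOCK_SIZE)
    ]

fanout_vector = [0] * 32
fanout_vector[1] = fanout_vector[17] = 1  # packets at pointers 1 and 17
print(populate_next_level(fanout_vector))  # -> [1, 1]
```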
As noted, an upper level entry set to the bit value of 1 means that there is a packet for transmission on the port at a location identified in the corresponding lower level block. In order to be able to locate this packet, a linked list is created for any such upper level entries having a bit value of 1. In the example of
At 100, a determination is made as to whether there are additional upper hierarchical levels that are to be populated. This determination is made, for example, by determining if more than one entry exists in the current hierarchical level. If more than one entry exists, method 80 returns to 95 where the next level is populated. However, if only one entry exists, method 80 proceeds to 105. In the example of
In the example of
As noted, an upper level entry set to the bit value of 1 means that there is a packet for transmission on the port at a location identified in the corresponding lower level block. In order to be able to locate this packet, another linked list is created for any such upper level entries having a bit value of 1. In the example of
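Steps 95 and 100 together may be sketched as a loop that repeatedly OR-reduces 16-entry blocks until a single root entry remains. One simplification is assumed here: whereas the description maintains explicit linked lists from a set upper-level entry to its lower-level block, in this sketch that link is positional (entry j of any level covers entries j*16 through j*16+15 of the level below) and so is recovered by index arithmetic.

```python
# Hedged sketch of the tree build (steps 90-100): OR-reduce 16-entry blocks
# until one root entry remains. The upper-to-lower "linked list" of the
# description is modeled positionally: entry j of a level covers entries
# j*16 .. j*16+15 of the level below.

BLOCK_SIZE = 16

def build_scheduling_tree(fanout_vector):
    """Returns the hierarchy bottom-up: levels[0] is the fanout vector and
    levels[-1] is the single root entry."""
    levels = [list(fanout_vector)]
    while len(levels[-1]) > 1:  # step 100: stop once one entry remains
        lower = levels[-1]
        levels.append([
            1 if any(lower[i:i + BLOCK_SIZE]) else 0
            for i in range(0, len(lower), BLOCK_SIZE)
        ])
    return levels

vec = [0] * 256
vec[1] = vec[200] = 1
print([len(lvl) for lvl in build_scheduling_tree(vec)])  # -> [256, 16, 1]
```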
At 105, a sequential search of the populated hierarchical scheduling tree 165 is conducted with a dequeue engine corresponding to port 1. More particularly, the dequeue engine will first examine the highest level in the hierarchical scheduling tree 165 to determine if the entry 185 at this level has a bit value of 0 or 1. If the bit value is zero, the dequeue engine determines that there are no packets that need to be transmitted on this port (i.e., the multicast queue for this port is empty) and the method ends. However, if entry 185 has a bit value of 1, then the linked list for entry 185 is used to locate the entry 180(1) in the next level 170(2) that has a bit value of 1. After this entry 180(1) is located, the linked list for entry 180(1) is used to locate the block 175(1) in the next level 170(1) that includes a bit value of 1. The bit position of the first bit value of 1 (vector entry 160(2)) is located and is used to determine (infer) the packet pointer for that entry.
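The search at 105 may be sketched as follows, operating on the levels hierarchy from the previous sketch. This is again an illustrative Python rendering rather than the dequeue engine logic itself: starting from the root, each set bit selects the covering block one level down, and the bit position reached in the fanout vector yields the packet pointer.

```python
# Sketch of the sequential search (step 105): walk from the root toward the
# fanout vector, at each level descending into the block covered by the
# current set entry; the final bit position is the packet pointer (indexed
# from the block's base packet pointer).

BLOCK_SIZE = 16

def find_first_packet_pointer(levels):
    """levels: hierarchy from build_scheduling_tree (levels[0] = fanout
    vector). Returns the first queued packet pointer, or None if empty."""
    if levels[-1][0] == 0:
        return None  # root entry is 0: the port's multicast queue is empty
    index = 0
    for level in reversed(levels[:-1]):  # walk from root toward the leaves
        base = index * BLOCK_SIZE        # start of the covering block
        offset = level[base:base + BLOCK_SIZE].index(1)  # first set bit
        index = base + offset
    return index

vec = [0] * 256
vec[2] = vec[200] = 1
print(find_first_packet_pointer(build_scheduling_tree(vec)))  # -> 2
```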
At 110, the packet pointer is used to generate a memory pointer and, at 115, the memory pointer is used to locate the packet for transmission on port 1. At 120, the packet is transmitted on port 1.
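Steps 110 through 120 amount to simple address arithmetic under one common layout assumption, namely fixed-size packet buffers indexed by the packet pointer. The base address and slot size below are purely hypothetical and do not come from the description.

```python
# Hypothetical sketch of steps 110-115: with fixed-size packet buffers, the
# memory pointer is base-plus-offset arithmetic on the packet pointer. The
# constants are illustrative assumptions, not values from the description.

SLOT_SIZE = 2048           # assumed per-packet buffer size, in bytes
BUFFER_BASE = 0x4000_0000  # assumed base address of the multicast storage

def memory_pointer(packet_pointer):
    """Maps a packet pointer to the address of its buffered packet data."""
    return BUFFER_BASE + packet_pointer * SLOT_SIZE

print(hex(memory_pointer(2)))  # -> 0x40001000
```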
In summary, the hierarchical packet scheduling tree allows for indirect derivation of the packet pointer, and there is no need to wait for the pointer to reach the head-of-the-line (i.e., the front of the transmission queue). Since each port operates independently on its own data structure, there are no HOL blocking issues.
Stated differently, every block 175(1)-175(w) can store up to 16 frames in the same multicast queue in the order in which they are received. Each frame corresponds to a packet for transmission on the subject port (port 1 in
In the example of
In the example of
The per-port multicast processing techniques presented herein use a compact and searchable per-port (i.e., port-specific) tree to schedule multicast packets. The use of this data structure eliminates the issues associated with ingress multicast replication and improves multicast performance, even while simultaneously supporting ports having different speeds (e.g., 1GE/10GE/40GE/100GE speeds). The per-port multicast processing techniques may also eliminate HOL blocking between fanout ports, and the compact representation of the fanouts makes implementation feasible for switch-on-chip solutions.
The above description is intended by way of example only.