Switches are commonly used to transfer information. A common, prior art mesh switch architecture is illustrated in
The present invention is directed toward a data switch for transferring data. In one embodiment, the data switch includes an A port group, a B port group, and an AB connector. In this embodiment, the A port group includes an A interface, a first A port that is electrically connected to the A interface, and a second A port that is electrically connected to the A interface. The B port group includes a B interface, a first B port that is electrically connected to the B interface, and a second B port that is electrically connected to the B interface. Further, the AB connector directly connects the A interface to the B interface so that data from the first A port is transferred from the A interface to the B interface via the AB connector.
With this design, in certain embodiments, because the AB connector services a number of ports, the switch can have a large number of ports with a relatively small form factor. Further, this switch architecture can have very high bandwidth because of the amount of data that can be flowing through the switch in parallel.
Additionally, the data switch can include a C port group that includes a C interface, a first C port that is electrically connected to the C interface, and a second C port that is electrically connected to the C interface. In this embodiment, the data switch can also include an AC connector that directly connects the A interface to the C interface, and a BC connector that directly connects the B interface to the C interface.
Further, the data switch can include a D port group that includes a D interface, a first D port that is electrically connected to the D interface, and a second D port that is electrically connected to the D interface. In this embodiment, the data switch can include an AD connector that directly connects the A interface to the D interface, a BD connector that directly connects the B interface to the D interface, and a CD connector that directly connects the C interface to the D interface.
In one embodiment, each of the connectors has enough bandwidth to support a maximum combined input bandwidth of the respective ports. With this design, the switch supports parallel data transfer between the ports.
Further, one or more of the port groups can include more than two ports. For example, one or more of the port groups can include three, four, or five ports that are connected to the respective interface.
In one embodiment, each of the ports includes an input buffer and an output buffer. Moreover, the data switch can include a control system that utilizes a switching algorithm that controls the transfer of data packets between the ports. For example, the control system can be a distributed, decentralized system that includes a port control system at each port that controls the transfer of data.
In one embodiment, the switching algorithm includes a burst read function that causes each of the ports to sequentially send all of the data packets in each input buffer, per priority, without waiting for a response. The burst read function can provide a significant performance increase in randomized data packet traffic, as it allows data packets to be transmitted that could otherwise be blocked by a packet at the front of the queue that is waiting for congestion at its intended destination port to be resolved.
In certain embodiments, if the first A port is attempting to send a first data packet to the first B port at the same time that the second A port is attempting to send a second data packet (with the same priority as the first data packet) to the first B port, the switching algorithm stops the burst read function so that the second A port stops sending the second data packet until an acceptance is received by the first A port. Stated in another fashion, if two source ports of a particular port group are attempting to send data to the same destination port with the same priority, the switching algorithm stops one of the source ports from sending its data to the destination port until the other data has been sent. With this design, the bandwidth reserved for the second A port can be used by the first A port to transfer the first data packet, expediting the data transfer to the first B port.
In another embodiment, if the first A port is attempting to sequentially send a first data packet and a second data packet (with the same priority) to the first B port, the switching algorithm stops the burst read function if an abort is received for the first data packet and waits until an acknowledgement is received for the first data packet prior to attempting to send the second data packet. Stated in another fashion, if one of the ports has a plurality of data packets to send to the same destination port, with the same priority, and an abort is received, the switching algorithm waits for the acknowledgement from the destination port prior to sending the next data packet. With this design, the switching algorithm prevents out of order data packets from being sent, because these data packets will not be accepted out of order and, if sent, will use resources that can be allocated for sending other packets.
The present invention is also directed to a switching algorithm and a method for transferring data.
The novel features of this invention, as well as the invention itself, both as to its structure and its operation, will be best understood from the accompanying drawings, taken in conjunction with the accompanying description, in which similar reference characters refer to similar parts, and in which:
In one embodiment, the switch 14 includes a plurality of ports 28, a plurality of interfaces (“I/F”) 30, and a plurality of electrical connectors 32. The design of each of these components can vary pursuant to the teachings provided herein. As an overview, in
In one embodiment, the integrated circuit 10 supports the components of the switch 14.
Each of the ports 28 provides a connector point for connecting the switch 14 to the integrated circuit 10. The number of ports 28 in the switch 14 can be changed to achieve the design requirements of the switch 14. In
In
In one embodiment, each of the ports 28 includes an output buffer 28A that provides temporary storage of data that is leaving the respective port 28 and an input buffer 28B that provides temporary storage of data arriving at the respective port. In one embodiment, there is a separate memory for each priority data packet. Alternatively, portions of a single memory can be used for each priority data packet.
Further, each port 28 can include a packet tracker 28C (sometimes referred to as a Protocol Enforcement “PE” Buffer) that tracks a certain number of packets. For example, the packet tracker 28C can track four packets per priority, per port. Alternatively, the packet tracker 28C can be designed to track more than four or fewer than four packets per priority, per port.
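The buffering structure described above can be sketched in code. The following is a minimal, hypothetical illustration (the class name, field names, and `track` method are illustrative, not from the specification) of a port with per-priority input and output buffers and a packet tracker that tracks up to four packets per priority:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Port:
    """Illustrative sketch of a port with per-priority input/output
    buffers and a packet tracker (PE buffer) limited to a fixed
    number of in-flight packets per priority."""
    priorities: int = 2
    tracker_depth: int = 4  # e.g. four packets per priority, per port
    input_buffers: dict = field(default_factory=dict)
    output_buffers: dict = field(default_factory=dict)
    tracker: dict = field(default_factory=dict)

    def __post_init__(self):
        # One separate queue per priority, as described above.
        for p in range(self.priorities):
            self.input_buffers[p] = deque()
            self.output_buffers[p] = deque()
            self.tracker[p] = deque()

    def track(self, priority, packet_id):
        """Record an in-flight packet; refuse when the tracker for
        this priority is already full."""
        if len(self.tracker[priority]) >= self.tracker_depth:
            return False
        self.tracker[priority].append(packet_id)
        return True
```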
The number of interfaces 30 used in the switch 14 can be varied according to the number of port groups 34A-34D. In certain embodiments, each port group 34A-34D includes an interface 30. Thus, the number of interfaces 30 is equal to the number of port groups 34A-34D. Alternatively, the switch 14 can be designed with more than four or fewer than four interfaces 30.
In
The number of connectors 32 used in the switch 14 can be varied according to the number of interfaces 30. In
In one embodiment, the connectors 32 between interfaces 30 have enough bandwidth to support the aggregate bandwidth of the ports 28 in the port group 34. For example, the bandwidth of the connectors 32 can be time-sliced so that all ports 28 in each port group 34 have a dedicated portion of the connector 32 bandwidth, each portion of which is large enough to support the maximum bandwidth that the port 28 can provide. In this way, the parallel data transfer advantage in bandwidth that is achieved in the traditional mesh architecture is maintained while the number of connectors 32 required can be reduced to make this hybrid architecture more size-efficient.
As one non-exclusive example, each interface 30 can have a bandwidth of approximately 10 gigabits/second. In this example, if all of the ports 28 of a particular interface 30 have data to transmit, each of the ports 28 would get 2.5 gigabits/second for a 10 gigabit/second system. Alternatively, (i) if only three ports 28 have data to transmit, each of the ports 28 would get 3.3 gigabits/second for a 10 gigabit/second system, (ii) if only two ports 28 have data to transmit, each of the ports 28 would get 5 gigabits/second for a 10 gigabit/second system, or (iii) if only one port 28 has data to transmit, this port 28 would get 10 gigabits/second for a 10 gigabit/second system.
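The per-port shares in the example above follow from evenly time-slicing the connector bandwidth among the ports that currently have data to transmit. A short sketch (the function name and parameters are illustrative, not from the specification):

```python
def per_port_bandwidth(total_gbps, active_ports):
    """Evenly time-slice a connector's total bandwidth among the
    ports of a port group that currently have data to transmit."""
    if active_ports == 0:
        return 0.0
    return total_gbps / active_ports

# A 10 gigabit/second interface shared by four, three, two, or one
# active ports yields 2.5, ~3.3, 5, and 10 gigabits/second per port.
for n in (4, 3, 2, 1):
    print(n, "active port(s):", round(per_port_bandwidth(10.0, n), 1), "Gb/s each")
```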
Additionally, the switch 14 includes a switch control system 63 that controls the transfer of each data packet in the switch 14. In one embodiment, the switch control system 63 is a distributed, decentralized control system with each port 28 including a separate port control system 63A. In this embodiment, each port control system 63A can independently make decisions regarding its port, in parallel with the other port control systems 63A. Additionally, each of the interfaces 30 can also include an interface control system 63B that controls the flow of data to and from that interface 30. In this example, each of the control systems 63A, 63B is merely a place where control and logic can occur.
Alternatively, for example, the control of data can occur in just the ports 28 with the separate port control systems 63A, or just the interfaces 30 with the interface control systems 63B.
As an overview, in one embodiment, the port control systems 63A use a switching algorithm in which all data packets stored in the buffer 28B of each port 28 of a given priority are read out sequentially without waiting to see if a particular packet is accepted or rejected at the intended destination port. Stated in another fashion, each data packet in the buffer 28B of the port 28 is sent sequentially without waiting for acknowledgements or aborts. In this embodiment, the data packets in each port 28 are read out sequentially with the highest priority data packets granted transmission before the lower priority data packets. For example, data packets with priority 1 in the port will be transmitted before data packets with priority 0 in the port. In this example, if the port only has two data packets with priority 1 and three data packets with priority 0, the two priority 1 data packets will be sequentially sent and then the three priority 0 data packets will be sequentially sent without waiting to see if a particular packet is accepted or rejected at the intended destination port. This algorithm used by the port control system 63A can be referred to as a “burst read algorithm”.
In this design, the acceptance or rejection of a particular data packet is determined later, when the source port receives either an acknowledgment or an abort signal from the intended destination port for each packet that had been read out. This architecture is a simple, space-efficient solution to head-of-line blocking for packets within the input buffer of a particular priority. This burst read algorithm can provide a significant performance increase in randomized traffic, as it allows packets to be transmitted that could otherwise be blocked by a packet at the front of the queue that is waiting for congestion at its intended destination port to be resolved.
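The burst read algorithm described above can be sketched as follows. This is a minimal, hypothetical illustration (function and variable names are illustrative, not from the specification): every queued packet is sent in order, highest priority first, without waiting for an acknowledgement or abort; acceptance is resolved later.

```python
from collections import deque

def burst_read(input_buffers, send):
    """Burst read sketch: drain every queued packet, highest
    priority first, sending each packet without waiting to see
    whether it is accepted or rejected at its destination.

    input_buffers maps priority (higher number = higher priority)
    to a deque of packets; send is the transmit callback."""
    sent = []
    for priority in sorted(input_buffers, reverse=True):
        queue = input_buffers[priority]
        while queue:
            packet = queue.popleft()
            send(packet)      # no wait for acknowledgement or abort
            sent.append(packet)
    return sent
```

For example, with two priority 1 packets and three priority 0 packets queued, the two priority 1 packets are sent first, then the three priority 0 packets, all without waiting for responses.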
The port at which the data packet 64 starts (the first A port 36 in the previous example) can be referred to as the "source port", while the port to which the data packet 64 is directed (the first C port 40 in the previous example) can be referred to as the "destination port". Further, the interface 30 which is sending the data packet 64 is referred to as the upstream interface (the A interface 44 in the previous example), and the interface 30 which is receiving the data packet 64 is referred to as the downstream interface (the C interface 48 in the previous example).
In one embodiment, each interface 44-50 contains logic that is used by the interface control system 63B for both upstream and downstream data since each port 28 can be both a source port and destination port. More specifically,
With this design, when data packets from one or more source ports (not shown in
Further, when the interface 30 is a downstream interface, data flow from one or more source ports (not shown in
In this example, the C interface 48 has selected the data packet 64A from the first A port 36 over the data packet 64B from the first B port 38. For example, (i) the data packet 64A from the first A port 36 could have the same priority and have been chosen before the data packet 64B from the first B port 38, or (ii) another, more elaborate fairness algorithm could have been used.
As can be seen in
As can be seen in
It should be noted that with the decentralized control system disclosed herein, the PE logic for each of the destination interfaces is operating concurrently and independently of each other and each of the destination interfaces takes care of its own packet transfers. As can be seen from
In this example, the switch includes four interfaces and there are four interface flows running concurrently. Alternatively, the switch can have more than four interfaces and more than four interface flows running concurrently.
In certain embodiments, the present switching algorithms provide high performance bandwidth while ensuring that all of the ports are serviced fairly. In many applications, the specific data flow that a switch will have to transfer is constantly changing, and the specifics are frequently evolving. The present switching algorithms are designed to handle various manners of traffic.
The initial goal of the switching algorithms was high performance with fairness during completely randomized traffic. However, many switches have a more uniform traffic flow such as in a backplane operation. This type of traffic has a structure where many ports attempt to send data packets to one port (the backplane port, for example). In certain integrated circuits, backplane traffic is a significant portion of the overall data flow through the switch, although there is always a component of the data flow that may be random (such as control plane traffic).
In certain embodiments, the switching algorithm provides fair, high performance switching in a randomized environment while also having the architecture that provides good performance in backplane traffic.
In one embodiment, the switching algorithm recognizes the presence of backplane data flow and adapts so that data is efficiently transferred in the presence of backplane data flow. This solution can also quickly revert to the nominal algorithm in the case where the traffic changes and is no longer just backplane traffic.
One enhancement is to the arbitration between ports in a port group. In the switch 14 illustrated in
In a backplane environment, multiple source ports are trying to send data to the same destination port. In the present invention, whenever two or more source ports (of a particular port group) are competing for the same destination port, the switching algorithm enforces a fairness algorithm that services one of them while rejecting the others.
With the burst read algorithm as defined above, the rejected source ports would continue trying to access the destination port due to the distributed nature of the switch architecture but would only be granted access once the source port that was transmitting finished its packet and the fairness algorithm then selected a different source port in the group. The continued attempts to access a destination port that is servicing some other source port would take up bandwidth in the connector that the four ports share. This bandwidth is wasted until the packet attempting to be transmitted is actually able to be received by the output port. During backplane traffic, this can be a major cost such as when all four of the source ports in a group are vying for the same destination port.
In one embodiment, if two or more source ports (of a particular port group) are competing for the same destination port with the same priority, the switching algorithm of the source interface stops allocating bandwidth over the connector to one of the source ports for that particular priority and that particular destination. In this embodiment, for example, if the first A port is attempting to send a first data packet to the first B port at the same time that the second A port is attempting to send a second data packet (with the same priority) to the first B port, the switching algorithm of the A interface stops trying to send the second data packet until an acceptance is received by the first A port. Stated in another fashion, if two source ports of a particular port group are attempting to send data to the same destination port with the same priority, the switching algorithm at the source interface stops one of the source ports from sending the data with that priority to that particular destination port until the other data has been sent. With this design, the bandwidth reserved for the second A port can be used to transfer the first data packet, expediting the data transfer to the first B port. Stated in another fashion, this allows the bandwidth that would have been wasted by the second A port to be reallocated to the other A ports, including the first A port.
In this embodiment, the logic of the upstream interface recognizes that the packets from the ports in the port group will collide. Instead of retrying itself and taking up bandwidth, the rejected port turns itself ‘invisible’ to the algorithm controlling access to the shared connector. This allows the bandwidth of the rejected port to be reallocated. When this is done for all three of the ports in the port group that were not granted access to the destination port, this allows all the bandwidth of the connector to be given to the one source port that was accepted. Invisibility is cleared whenever an ‘end of packet’ is seen which will allow all the source ports to attempt access to the destination port and the fairness algorithm to select one.
This solution improves the bandwidth for the connector while not changing the fundamental architecture of the switch. Without wholesale changes to the switching algorithm, the invisibility enhancement allows the algorithm to adapt to a high-collision traffic environment such as a backplane environment while not impacting regular traffic since only those packets that are rejected because of a collision with other ports in the group going to the same destination with the same priority are made invisible.
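The invisibility enhancement described above can be sketched in code. This is a hypothetical illustration (function names, the request tuples, and the first-come selection standing in for the fairness algorithm are all illustrative, not from the specification): when ports in a group collide on the same destination and priority, one is serviced and the losers are marked invisible until an end-of-packet clears the set.

```python
def allocate_connector(requests, invisible):
    """Invisibility sketch. Each request is (port, dest, priority).
    When two or more ports of the group contend for the same
    destination with the same priority, one is serviced and the
    others are marked 'invisible' so that their share of the shared
    connector's bandwidth can be reallocated to the winner."""
    granted = {}
    for port, dest, priority in requests:
        if port in invisible:
            continue  # an invisible port does not retry or take bandwidth
        key = (dest, priority)
        if key in granted:
            invisible.add(port)  # collision: lose arbitration, go invisible
        else:
            granted[key] = port  # stand-in for the fairness selection
    return granted

def end_of_packet(invisible):
    """Invisibility is cleared on 'end of packet', letting all source
    ports re-attempt access so the fairness algorithm can select one."""
    invisible.clear()
```

For example, if ports A1 and A2 both request the same destination with the same priority, A1 is granted, A2 goes invisible and stays off the connector until `end_of_packet` is seen.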
Another option for enhancement of the switching algorithms in a backplane traffic environment is called backplane traffic mode. As discussed above, the burst read function can have a negative performance impact in a backplane traffic environment. If all of the packets in a source port buffer (for a given priority) have the same destination (as would be the case in a backplane traffic environment), then continuous burst reading may cause the source port to attempt to transfer the wrong packet (out of order), thereby using bandwidth that otherwise could be allocated to send other data packets. This can reduce the performance of the switch.
In this embodiment, if one of the ports has a plurality of data packets to send to the same destination port, with the same priority, the switching algorithm begins sequentially sending the data packets. However, if an abort is received, the switching algorithm halts the burst read function and quits sending the data packets to that destination port with that priority until an acknowledgement is received from the destination port for the aborted packet. With this design, the switching algorithm prevents out of order data packets from being sent, because these data packets will not be accepted out of order and, if sent, will use resources that can be allocated for sending other packets.
In this example, if the first A port is attempting to sequentially send a first data packet and a second data packet (with the same priority) to the first B port, and an abort is received for the first data packet, the switching algorithm of the source port stops the burst read function for the first A port, for that priority, and waits until an acknowledgement is received for the first data packet prior to sending the second data packet. Stated in another fashion, in the backplane traffic mode, if an abort is received, the logic of the source port turns off the burst read function for the source port, for that priority, when all of the packets in the source port buffer are destined for the same destination port, provided the source port has packets in the packet tracker positions. In backplane traffic mode, the progression from one packet tracker position to the next, to initiate the reading out of the next packet, is made when the packet that was read out is acknowledged by the destination port. With this design, the switching algorithm at the source port prevents out of order data packets from being sent, because these packets will use resources that can be allocated for sending other packets.
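The backplane traffic mode behavior described above can be sketched as follows. This is a hypothetical illustration (names and the synchronous `transmit` callback are illustrative, not from the specification; a real switch would wait for the acknowledgement asynchronously): the burst halts on an abort and does not advance to the next packet until the aborted packet is acknowledged, so packets are never offered out of order.

```python
from collections import deque

def backplane_send(queue, transmit):
    """Backplane-mode sketch: read packets out sequentially, but on
    an abort, halt the burst and keep retrying the same packet until
    it is acknowledged before advancing to the next packet.

    transmit(packet) returns 'ack' or 'abort' for that attempt."""
    sent = []
    while queue:
        packet = queue[0]          # do not advance past the head packet
        status = transmit(packet)
        if status == "abort":
            continue               # burst halted: retry until acknowledged
        queue.popleft()            # acknowledged: progress to next packet
        sent.append(packet)
    return sent
```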
While the particular switch as herein shown and disclosed in detail is fully capable of obtaining the objects and providing the advantages herein before stated, it is to be understood that it is merely illustrative of one or more embodiments and that no limitations are intended to the details of construction or design herein shown other than as described in the appended claims.