1. Field of the Invention
The present invention relates generally to networks. Particularly, the present invention relates to transmission bandwidth control.
2. Description of the Related Art
Storage networks can comprise several Fibre Channel switches interconnected in a fabric topology. These switches are interconnected by a number of inter-switch links (ISLs), which carry both data and control information. An ISL is terminated at a port on each of the two switches it connects to. The ISL typically provides a physical link between the two switches. Frames/packets can be transmitted between the switch ports over the ISL. The rate at which these packets can be transmitted depends upon, among other factors, the bandwidth provided at the port and the buffer-to-buffer credit established between the two ports connected by the ISL.
Typically, traffic transmitted from one switch port to another, via an ISL, can consist of multiple flows, where each flow can be associated with a pair of devices within the storage network (e.g., host-storage device pair). Frames associated with these flows are temporarily stored in a buffer associated with the transmitter of the port before being transmitted. If only a single buffer is used per transmitter, a single flow may block the frames associated with other flows. To mitigate this problem, the ISL can be logically split into one or more virtual channels (VCs), where each VC has an associated buffer. Data flows can then be directed over separate VCs to avoid blocking. Each VC can support one or more data flows.
The bandwidth provided by a port can be divided among the VCs associated with that port. For example, a port having a 10 Gbps transmitting bandwidth and 10 VCs can allow each VC equal transmitting bandwidth of 1 Gbps. However, such schemes, employing fair division, may be disadvantageous when one or more VCs include data flows that deserve more bandwidth than data flows on other VCs. For example, a data flow between two mission-critical applications may require and deserve more bandwidth than a data flow for simple data backup. Thus, traffic through different VCs can have different quality of service (QoS) requirements. In such cases weighted division of bandwidth can allocate bandwidth to a VC based on its assigned weight. However, these methods do not provide precise individual control over the bandwidths assigned to one or more VCs.
Another technique for bandwidth control is called credit throttling. In credit throttling, a receiving port can throttle the number of credits sent to a transmitting port on the other end of an ISL in order to control the received bandwidth at the receiving port. However, in this case the transmitter itself has no control over its transmission bandwidth. The receiving port connected on the other end of the ISL controls the transmission bandwidth of the transmitter.
An input/output port on a switch can be connected to an input/output port on an adjacent switch using inter-switch links (ISLs). Traffic flow between the two ports can be divided into logical channels or virtual channels (VCs). The transmitter can maintain a separate queue for each VC.
A bandwidth limiting circuit can be coupled with the transmitting port for controlling the bandwidth of one or more VCs associated with that port. The bandwidth limiting circuit can include a register that is initially loaded with a threshold value TH, which threshold value is related to the maximum bandwidth allocated for the associated group of VCs. The register is incremented periodically (at a rate r) with the threshold value. The register is decremented by the frame length in bytes each time a frame is transmitted from one of the VCs belonging to the group. A comparator compares the register value to zero. The group is enabled to transmit a frame when the register value is greater than zero. The maximum bandwidth allocated to the group of VCs can be determined approximately by the ratio of the threshold value TH and the rate r.
A bandwidth guarantee circuit associated with a group of VCs guarantees the group a minimum bandwidth. The bandwidth guarantee circuit includes bandwidth limiting circuits associated with each group of VCs. Additionally, the bandwidth guarantee circuit enables a group of VCs based on a fairness algorithm if the output of the comparators of all the bandwidth limiting circuits is zero. As a result, the bandwidth guarantee circuit guarantees at least a minimum bandwidth determined by the bandwidth limiting circuit and provides additional bandwidth based on the fairness algorithm.
The sum of bandwidths of all groups should be less than or equal to the maximum bandwidth provided by the port.
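By way of illustration only, the constraint above can be checked with a short sketch. The function names and the use of Mbps as the unit are assumptions made for this example and do not appear in the description.

```python
def fits_port_budget(group_bw_mbps, port_bw_mbps):
    """Return True when the group bandwidths sum to no more than the port bandwidth."""
    return sum(group_bw_mbps) <= port_bw_mbps


def spare_bandwidth(group_bw_mbps, port_bw_mbps):
    """Port bandwidth left over after every group's allocation (may be negative)."""
    return port_bw_mbps - sum(group_bw_mbps)


# Hypothetical allocation: three groups of 1000 Mbps on a 4000 Mbps port.
print(fits_port_budget([1000, 1000, 1000], 4000))
print(spare_bandwidth([1000, 1000, 1000], 4000))
```

A negative result from `spare_bandwidth` would indicate that the groups are over-subscribed relative to the port.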
Bandwidth limiting and bandwidth guarantee can also be provided on host bus adaptors within a host device connected to the network.
The present invention has other advantages and features which will be more readily apparent from the following detailed description of the invention and the appended claims, when taken in conjunction with the accompanying drawings, in which:
A variety of devices can be connected to the fabric 102. A Fibre Channel fabric supports both point-to-point and loop device connections. A point-to-point connection is a direct connection between a device and the fabric. A loop connection is a single fabric connection that supports one or more devices in an “arbitrated loop” configuration, wherein signals travel around the loop through each of the loop devices. Hubs, bridges, and other configurations may be added to enhance the connections within an arbitrated loop.
On the fabric side, devices are coupled to the fabric via fabric ports. A fabric port (F_Port) supports a point-to-point fabric attachment. A fabric loop port (FL_Port) supports a fabric loop attachment. Both F_Ports and FL_Ports may be referred to generically as Fx_Ports. Typically, ports connecting one switch to another switch are referred to as expansion ports (E_Ports). In addition, generic ports may also be employed for fabric attachments. For example, G_Ports, which may function as either E_Ports or F_Ports, and GL_Ports, which may function as either E_Ports or Fx_Ports, may be used.
On the device side, each device coupled to a fabric constitutes a node. Each device includes a node port by which it is coupled to the fabric. A port on a device coupled in a point-to-point topology is a node port (N_Port). A port on a device coupled in a loop topology is a node loop port (NL_Port). Both N_Ports and NL_Ports may be referred to generically as Nx_Ports. The label N_Port or NL_Port may be used to identify a device, such as a computer or a peripheral, which is coupled to the fabric.
In the embodiment shown in
Switches S1 110, S2 112, S3 114, and S4 116 are connected with one or more inter-switch links (ISLs). Switch S1 110 can be connected to switches S2 112, S3 114, and S4 116 via ISLs 180a, 180b, and 180c, respectively. Switch S2 112 can be connected to switch S3 114 by ISL 180d. Switch S3 114 can be connected to switch S4 116 via ISL 180e. Note that although only single links between various switches have been shown, links between any two switches can include multiple ISLs. The fabric can use link aggregation or trunking to form single logical links comprising multiple ISLs between two switches. For example, if link 180a comprised three 2 Gbps ISLs, the three ISLs can be aggregated into a single logical link between switches S1 110 and S2 112 with a bandwidth equal to the sum of the bandwidths of the individual ISLs, i.e., 6 Gbps. It is also conceivable to have more than one logical link between two switches, where each logical link is composed of one or more trunks. The fabric 102 with multiple switches interconnected with ISLs can provide multiple paths with multiple bandwidths for devices to communicate with each other.
Ports 206 and 208 can include one or more logical channels VC0 228-VCn 232, also known as virtual channels in Fibre Channel networks. Each virtual channel is allocated its own queue within the switch. The transmitter 212, for example, determines the virtual channel that an outgoing frame needs to be on. The transmitter 212 can then place the frame in the queue corresponding to that virtual channel. Typically, frames with the same source and destination (denoted by, e.g., S_ID and D_ID) pair are sent and received via the same virtual channel. However, each virtual channel can carry frames having various source-destination pairs. In other words, each virtual channel VC0 228-VCn 232 can carry frames associated with different data flows.
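The queueing behavior described above can be sketched as follows. The description does not specify how the transmitter chooses a virtual channel, so the mapping in `select_vc`, the value of `NUM_VCS`, and the dictionary-based frame layout are all assumptions made for illustration. Any deterministic function of the S_ID/D_ID pair would keep a given flow on one virtual channel while still letting several flows share it.

```python
from collections import deque

NUM_VCS = 8  # hypothetical number of virtual channels on the port


def select_vc(s_id: int, d_id: int, num_vcs: int = NUM_VCS) -> int:
    """Assumed mapping: a deterministic function of the (S_ID, D_ID) pair."""
    return (s_id * 31 + d_id) % num_vcs


# One queue per virtual channel, as described for the transmitter.
queues = [deque() for _ in range(NUM_VCS)]


def enqueue(frame: dict) -> int:
    """Place a frame on the queue of its virtual channel and return the VC index."""
    vc = select_vc(frame["s_id"], frame["d_id"])
    queues[vc].append(frame)
    return vc
```

Because the mapping is many-to-one, frames with different source-destination pairs can land on the same queue, which matches the statement that one virtual channel can carry several data flows.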
Note that the virtual channel concept in FC networks should be distinguished from “virtual circuit” (which is sometimes also called “virtual channel”) in ATM networks. An ATM virtual circuit is an end-to-end data path with a deterministic routing from the source to the destination. That is, in an ATM network, once the virtual circuit for an ATM cell is determined, the entire route throughout the ATM network is also determined. In contrast, an FC virtual channel is a local logical channel for a respective link between switches. That is, an FC virtual channel only spans a single link. When an FC data frame traverses a switch, the virtual channel information can be carried by appending a temporary tag to the frame. This allows the frame to be associated with the same VC identifier on the outgoing link. However, the VC identifier does not determine a frame's routing, because frames with different destinations can have the same VC identifier and be routed to different outgoing ports. An ATM virtual circuit, on the other hand, spans from the source to the destination over multiple links. Furthermore, an FC virtual channel carries FC data frames, which are of variable length. An ATM virtual circuit, however, carries ATM cells, which are of fixed length. Furthermore, frames having different end-to-end routes may share the same FC virtual channel. In contrast, all the data cells in an ATM virtual circuit belong to the same source/destination pair.
Referring back to
Switches 202 and 204 can also include transmitter bandwidth policy circuits 238 and 240 associated with transmitters 212 and 218, respectively. Policy circuit 238 allows the switch 202 to establish bandwidth policies related to each virtual channel or each group of virtual channels. Policy circuit 238 can include bandwidth limiting and bandwidth guarantee circuits (discussed in further detail below) associated with each VC or each group. For example, policy circuit 238 can include n bandwidth limiting circuits and n bandwidth guarantee circuits, where n is the total number of virtual channels VC0 228-VCn 232 supported by port 206. Alternatively, the number of bandwidth limiting and bandwidth guarantee circuits can be equal to the maximum number of groups of VCs that can be allocated per port. For example, if port 206 can assign a maximum of 48 different groups of VCs, then the policy circuits can include 48 bandwidth limiting circuits and 48 bandwidth guarantee circuits.
Frames associated with a VC are input to the VC's queue for transmission. Several factors dictate when a frame on the head of a VC's queue is eligible for transmission. For example, these factors can include speed matching, credit availability, class of service, de-skew time, and bandwidth availability. In case of bandwidth availability, the bandwidth policy circuits 238 can send an enable signal to the appropriate queue at the transmitter 212 to indicate that the frame at the head of that queue has met the bandwidth policy requirement, and is ready to be transmitted. For example,
When VCs are combined into a group, an enable signal for the group signifies that a frame at the head of any one of the queues associated with the VCs in the group can be transmitted. The VC, and the associated queue, can be selected based on the group policy. For example, if a fairness policy is observed, each VC will be selected in turn every time an enable signal is received. Of course, other selection schemes, such as weighted priority, random selection, etc. can also be employed. As stated earlier, a group may include only a single VC, and in such cases receiving a group enable signal will enable the frame on the head of the queue associated with that single VC.
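The in-turn selection described above can be sketched as a round-robin selector. The class name `VcGroup` and its methods are hypothetical, and skipping VCs whose queues are empty is an assumption of this sketch rather than something the description states.

```python
from collections import deque


class VcGroup:
    """Round-robin choice among the VC queues of one group (illustrative only)."""

    def __init__(self, vc_ids):
        self.vc_ids = list(vc_ids)
        self.queues = {vc: deque() for vc in self.vc_ids}
        self._next = 0  # index of the VC to be considered first on the next enable

    def on_enable(self):
        """Handle one group enable signal; return (vc, frame), or None if all queues are empty."""
        n = len(self.vc_ids)
        for offset in range(n):
            idx = (self._next + offset) % n
            vc = self.vc_ids[idx]
            if self.queues[vc]:  # assumption: empty queues are skipped
                self._next = (idx + 1) % n
                return vc, self.queues[vc].popleft()
        return None
```

A group holding a single VC degenerates to the simple case noted above, where each enable releases the frame at the head of that one queue. A weighted or random policy would replace only the loop inside `on_enable`.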
Discussion now turns to the transmitter bandwidth policy circuits 238 (and 240).
Group BW limiting circuit 300 includes a group counter register C 304 that stores a value, based on which the group's VCs are enabled. The size of register C 304 is typically the same as or larger than the size of threshold register 302. Assuming that the size is n bits, the counter register C 304 can be built using n flip-flops. Of course, other well-known digital structures for storing a series of bits can also be employed. Although the inputs and output signals/interconnects in
Input to counter register C 304 is controlled by 3-to-1 multiplexer 308. Multiplexer 308 receives three data inputs: one from the threshold register 302, one from adder 310, and one from subtracter 312. Control inputs RST 314, FLA 316, and ‘r’ tick 318 determine which one of the three inputs to the multiplexer 308 is provided to the counter register C 304. Control input RST 314 can be a reset signal that is asserted on power-up or when the counter register C 304 needs to be reset to an initial value. Control signal FLA 316 (Frame Length Available) can be received whenever the frame length of a frame that is transmitted from a VC belonging to the group becomes available. Control input ‘r’ tick 318 can be a periodic pulse signal that activates every ‘r’ seconds. Alternatively, ‘r’ tick 318 can be a non-periodic signal that, on average, provides a predetermined number of pulses per second. When control input RST 314 is asserted, the multiplexer 308 can pass the output of threshold register 302 to the counter register C 304. When control input ‘r’ tick 318 is asserted, multiplexer 308 can pass the output of adder 310 to the input of counter register C 304. And when input FLA 316 is asserted, the output of the subtracter 312 can be passed to the input of the counter register C 304.
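The selection performed by multiplexer 308 can be modeled behaviorally as a next-value function. This is only a sketch: the function name and argument names are hypothetical, and the priority applied when more than one control input is asserted at once (RST first, then ‘r’ tick, then FLA) is an assumption, since the description does not address simultaneous assertion.

```python
def next_counter_value(c, th, fl, rst=False, r_tick=False, fla=False):
    """Value loaded into the counter register for one evaluation of the control inputs.

    c  : current counter value
    th : value held in the threshold register
    fl : length in bytes of the frame just transmitted
    """
    if rst:
        return th        # threshold register output selected
    if r_tick:
        return c + th    # adder output selected
    if fla:
        return c - fl    # subtracter output selected
    return c             # no control input asserted: the register holds its value
```

For example, `next_counter_value(50, 50, 2000, fla=True)` returns -1950, the value the counter holds after a 2000-byte frame is sent from a group whose counter was 50.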
Adder 310 can add the value TH stored in the threshold register 302 to the current value stored in the counter register C 304. The resultant value, C+TH, can then be loaded into the counter register C 304 every ‘r’ seconds. Subtracter 312 can subtract the value FL, representing the length of the frame (in bytes) that has been transmitted, from the current value stored in the counter register C 304. The resultant value, C−FL, can then be loaded into the counter register C 304 when control signal FLA is asserted. Adder 310 and subtracter 312 can be n bits in size and can carry out 2's complement addition and subtraction. In other words, they can operate with both positive and negative numbers. A 2's complement representation of numbers usually represents negative numbers with a value ‘1’ in the MSB, and represents positive numbers with a value ‘0’ in the MSB. Operationally, the BW limiting circuit 300 increments the counter register C 304 by a value TH every ‘r’ seconds, and decrements the counter register C 304 by a value FL whenever a frame is transmitted by a VC belonging to the group.
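The 2's complement behavior of an n-bit register can be illustrated with the following sketch. The 16-bit width and the helper names are assumptions chosen for the example. The description leaves the width as n.

```python
N_BITS = 16                      # hypothetical register width
MASK = (1 << N_BITS) - 1         # keeps results within N_BITS
SIGN_BIT = 1 << (N_BITS - 1)     # MSB position


def add(c_raw, th):
    """n-bit adder: C + TH, wrapped to the register width."""
    return (c_raw + th) & MASK


def sub(c_raw, fl):
    """n-bit subtracter: C - FL, wrapped to the register width."""
    return (c_raw - fl) & MASK


def to_signed(raw):
    """Interpret a raw n-bit pattern as a 2's complement number."""
    return raw - (1 << N_BITS) if raw & SIGN_BIT else raw


c = sub(50, 2000)                # counter held 50, a 2000-byte frame was sent
print(hex(c), to_signed(c))      # 0xf862 -1950
print(bool(c & SIGN_BIT))        # True: the MSB is 1 for a negative value
```

In such a representation a comparison against zero can be derived from the MSB together with a check that the remaining bits are not all zero, although the description does not specify how comparator 306 is built.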
Output of the counter register C 304 can be fed to a comparator 306, which compares the value stored in the counter register C 304 to 0. If the value is greater than 0 then the output of the comparator can be a single bit ‘1’, and if the value is less than or equal to 0 then the output of the comparator can be a single bit ‘0’. Output of the comparator 306 can be fed to the group VC enable signal, which can allow at least one frame associated with the VCs from the group scheduled for transmission to be transmitted. Therefore, if the value in the counter register C 304 is greater than 0, then the group VCs can be enabled for transmitting frames, otherwise the group VCs can be disabled for transmission. Note that the value chosen for comparison may be different than 0. For example, the value for comparison can be approximately equal to 0, such as −1, −2, +1, +2, etc. In cases where the value of TH is much smaller than the transmitted frame size in bytes, then the value of comparison can be anywhere between −TH to TH with only small effect on the actual bandwidth allocated to the group VCs.
Discussion now turns to the operation of BW limiting circuit 300 in limiting the bandwidth of the group of VCs, as shown in the exemplary flowchart 400 of
Step 418 in
Therefore, as long as the value of the counter register C 304 remains negative, the execution can repeatedly proceed through steps 406-408-412-418-406. When an ‘r’ tick signal is received every r seconds, step 420 can also be executed after step 418 and before returning to step 406. This can allow the value C of the counter register C 304 to increment by value TH every r seconds. Eventually, the current value of counter register C 304 can become greater than zero, which event is shown at 508 in
Note that counter register C 304 can have a value that is not greater than the threshold value TH. In other words, in step 420, when adding TH to the current value of C results in a value that is greater than TH, the adder can store the value TH, instead of the actual sum of C and TH, in the counter register C 304. For example, in
Referring back to
To simplify analysis, two assumptions can be made. First, that the port transmits the maximum allowable frame size each time a frame is transmitted. Second, that the VC has satisfied all other factors necessary for it to successfully transmit a frame when the VC is enabled for frame transmission by the BW limiting circuit, i.e., as soon as the counter becomes positive, the port is able to transmit the frame immediately. Both assumptions are valid because they represent the worst-case conditions under which bandwidth limiting is to be provided. In other words, the above two assumptions result in the maximum number of bytes being transmitted per unit time, and the bandwidth limiting circuit should be able to limit the bandwidth under such conditions.
For maximum bandwidth, the pattern of counter C in
As an example for demonstrating bandwidth limiting, the value of TH can be set to 50 and the value of r can be set to 8 micro-seconds. The frame length FL is assumed to be 2000 bytes. Initially, the counter register C 304 can be loaded with the value 50. Because this value is greater than zero, a frame can be transmitted. Once the frame length is subtracted from C, the resultant value in the counter register C 304 will be −1950. Every 8 micro-seconds the TH value of 50 will be added to C. Therefore, every 8 micro-seconds the value of C will progress as −1950, −1900, −1850, and so on until it becomes greater than zero, at +50. When C is equal to +50, another frame can be transmitted and the FL value will be subtracted from C. The progression of C from −1950 to +50 in steps of 50 will require 40 increments. Therefore, from the instant the counter C was decremented to −1950 due to the transmission of the first frame to the instant when C reaches +50 and transmission of the second frame takes place, 40×8 micro-seconds=320 micro-seconds will have elapsed. Within these 320 micro-seconds, 2000 bytes of information were transmitted. Therefore, the bandwidth will be equal to (2000 bytes)/320 micro-seconds. This is equal to 50 M bits per second. In other words, setting the value of TH to 50 and r to 8 micro-seconds results in a maximum bandwidth of 50 M bits per second.
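The arithmetic of this example can be reproduced with a short simulation. The function and variable names are hypothetical, and the sketch relies on the two simplifying assumptions stated earlier: every frame has the same length and is transmitted as soon as the counter becomes positive.

```python
TH = 50      # threshold value added on each 'r' tick
R_US = 8     # tick period r, in micro-seconds
FL = 2000    # frame length, in bytes


def ticks_between_frames(th, fl):
    """Count the 'r' ticks needed for the counter to become positive after one frame."""
    c = th           # counter is loaded with TH on reset
    c -= fl          # first frame is sent at once because c > 0
    ticks = 0
    while c <= 0:    # group stays disabled while the counter is not greater than zero
        c += th
        ticks += 1
    return ticks


ticks = ticks_between_frames(TH, FL)
elapsed_us = ticks * R_US
mbps = FL * 8 / elapsed_us       # bits per micro-second equals Mbps
print(ticks, elapsed_us, mbps)   # 40 320 50.0
```

Changing `FL` while keeping `TH` and `R_US` fixed leaves the steady-state result unchanged whenever `FL` is a multiple of `TH`, because a longer frame costs proportionally more ticks.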
The same result can also be obtained by plugging the values of TH and r into the expression for maximum bandwidth determined earlier, which yields BWmax = TH/r = 50 bytes/8 micro-seconds = 6.25×10^6 bytes per second = 50 M bits per second.
Setting the value of r to 8 micro-seconds produces a convenient relationship between the value TH and the resultant bandwidth, such that the resultant bandwidth is no more than TH Mbps. For example, if the required value of BWmax is 2 Gbps, then the value TH can be set to 2000.
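The relationship between TH, r, and the resulting bandwidth can be written as a pair of helper functions. The names `max_bandwidth_mbps` and `threshold_for` are hypothetical, and the sketch assumes that TH counts bytes and that r is expressed in micro-seconds, as in the example above.

```python
def max_bandwidth_mbps(th, r_us=8.0):
    """BWmax = TH / r in bytes per micro-second, converted to Mbps (8 bits per byte)."""
    return 8.0 * th / r_us


def threshold_for(bw_mbps, r_us=8.0):
    """Threshold TH that yields the requested maximum bandwidth for a given r."""
    return bw_mbps * r_us / 8.0


print(max_bandwidth_mbps(50))    # 50.0
print(threshold_for(2000))       # 2000.0
print(threshold_for(2000, 4.0))  # 1000.0
```

With r equal to 8 micro-seconds the factor of 8 cancels, which is why TH can be read directly as a bandwidth in Mbps. With any other value of r the conversion has to be applied explicitly.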
Operation of bandwidth guarantee circuit 600 can be described with the aid of the exemplary flowchart 700 shown in
Referring back to step 708, if the current value C of the register counter C 304 is less than or equal to 0, then the execution moves to step 712. If the fairness algorithm, shown in the FA block 606 in
Comparing the bandwidth guarantee flowchart 700 of
Typically, values stored in the group bandwidth threshold registers of all groups can be selected such that the total bandwidth for all groups is less than or equal to the maximum port bandwidth. For example, assume that the value of r is 8 micro-seconds. Then the value TH for a group will specify a bandwidth of TH Mbps assigned to that group. For n groups, the total bandwidth assigned to the port will be the sum of the values stored in each group's bandwidth threshold register. In other words, the total bandwidth of the port is greater than or equal to
$$\sum_{i=1}^{n} TH_i$$
where $TH_i$ is the value programmed into the group bandwidth threshold register for the ith group. So, as an example, if there were three groups, each with a threshold value of 1000 (i.e., 1 Gbps), with a port bandwidth of 4 Gbps, the bandwidth guarantee circuit can guarantee each group a bandwidth of 1 Gbps. Therefore, if each group can be utilized to the extent that it can transmit at a bandwidth of 1 Gbps, then the bandwidth guarantee circuit can enable sufficient frames for each group to achieve 1 Gbps. Additional bandwidth required by each group can be provided from the remaining 1 Gbps bandwidth of the port, and this can be based on a fairness algorithm, as shown by way of example in
The FA block 606 can also include an enable signal 610 that allows the activation/deactivation of bandwidth guarantee for a particular port. For example, if no bandwidth guarantee is required, the BW guarantee enable signal 610 is de-asserted. As a result, the outputs of the FA block 606 coupled to the OR gates 608a-608n are pulled low. Because one of the two inputs to each OR gate is a zero, the output of each OR gate depends only on the other input. In other words, once the FA block 606 is disabled, the enable signals for each group will depend upon the outputs of their respective BW limiting circuits only.
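The combination of the BW limiting circuit outputs with the FA block outputs through the OR gates can be modeled with the following sketch. The function name and arguments are hypothetical. How the fairness algorithm picks a group is not modeled and is passed in as `fa_pick`, and the rule that the FA block drives an output only when every comparator output is zero follows the summary given earlier in this description.

```python
def group_enables(limiter_outputs, fa_pick, guarantee_enabled=True):
    """Per-group enable bits: comparator output OR fairness-algorithm output.

    limiter_outputs   : list of 0/1 comparator outputs, one per group
    fa_pick           : index of the group chosen by the fairness algorithm (assumed input)
    guarantee_enabled : models the BW guarantee enable signal
    """
    all_zero = not any(limiter_outputs)
    fa_outputs = [
        1 if (guarantee_enabled and all_zero and i == fa_pick) else 0
        for i in range(len(limiter_outputs))
    ]
    # One OR gate per group combines the two sources of enable.
    return [lim | fa for lim, fa in zip(limiter_outputs, fa_outputs)]


print(group_enables([1, 0, 0], fa_pick=2))                           # [1, 0, 0]
print(group_enables([0, 0, 0], fa_pick=2))                           # [0, 0, 1]
print(group_enables([0, 0, 0], fa_pick=2, guarantee_enabled=False))  # [0, 0, 0]
```

With `guarantee_enabled` set to False every FA output is zero, so the result equals `limiter_outputs`, which corresponds to pure bandwidth limiting.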
Although the preceding descriptions of bandwidth limiting and bandwidth guarantee circuits have been described within the context of a network switch (e.g., 202 and 204 in
Furthermore, the preceding description of bandwidth limiting and bandwidth guarantee circuits is not limited to Fibre Channel networks, and can be used in direct link networks such as Ethernet, wireless 802.11, etc., and in packet-switched networks such as the Internet.
The above description is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this disclosure. The scope of the invention should therefore be determined not with reference to the above description, but instead with reference to the appended claims along with their full scope of equivalents.