This application is related to U.S. patent application publication number 20020078299, titled “Caching System and Method for a Network Storage System” by Lin-Sheng Chiou, Mike Witkowski, Hawkins Yao, Cheh-Suei Yang, and Sompong Paul Olarig, which was filed on Dec. 14, 2000, now U.S. Pat. No. 6,792,507, and which is incorporated herein by reference in its entirety for all purposes; U.S. patent application Ser. No. 10/015,047, titled “System, Apparatus and Method for Address Forwarding for a Computer Network” by Hawkins Yao, Cheh-Suei Yang, Richard Gunlock, Michael L. Witkowski, and Sompong Paul Olarig, which was filed on Oct. 26, 2001 and which is incorporated herein by reference in its entirety for all purposes; U.S. patent application publication number 20030200330, titled “System And Method For Load-Sharing Computer Network Switch” by Sompong Paul Olarig, Mark Lyndon Oelke, and John E. Jenne, which was filed on Apr. 22, 2002, and which is incorporated herein by reference in its entirety for all purposes; U.S. patent application Ser. No. 10/039,190, titled “Network Processor Interface System” by Sompong Paul Olarig, Mark Lyndon Oelke and John E. Jenne, which is being filed concurrently on Dec. 31, 2001, and which is incorporated herein by reference in its entirety for all purposes; and U.S. patent application Ser. No. 10/039,189, titled “Xon/Xoff Flow Control for Computer Network” by Hawkins Yao, John E. Jenne and Mark Lyndon Oelke, which is being filed concurrently on Dec. 31, 2001, and which is incorporated herein by reference in its entirety for all purposes.
The present invention is related to computer networks. More specifically, the present invention is related to providing flow control of information for a computer network.
Fibre Channel standards define protocols for link-level and end-to-end congestion control. However, these standard protocols do not eliminate head of line (HOL) blocking within a switch. HOL blocking is a problem for internal switching that occurs when several packets at the head of an input queue block packets from being forwarded to output ports. Storage Area Network (SAN) switches that share egress buffer resources are particularly susceptible to HOL blocking when they become congested because, unlike typical TCP/IP switches and routers, a SAN switch does not discard traffic when it becomes congested.
The Fibre Channel link-level flow control mechanism (buffer-to-buffer credits, or BB Credits) is typically provided on a per-link basis to devices attached to the SAN switch. Occasionally, several ingress ports may share a pool of BB Credits to receive traffic. In most SAN switches, egress congestion is not communicated to the ingress ports to limit the amount of ingress traffic. As a result, HOL blocking may occur within the switch as pools of shared memory become congested. Another major problem with the buffer-to-buffer flow control model is that it is difficult to determine the number of BB Credits needed to move frames efficiently. This is critical because the system needs enough credits to provide a continuous flow between ports.
The invention overcomes the above-identified problems as well as other shortcomings and deficiencies of existing technologies by providing end-to-end, e.g., ingress port to egress port, traffic flow control through a computer network at the system level.
The present invention is directed to a method for providing buffer-to-buffer credit port-level flow control for a computer network in operative communication with a plurality of ingress and egress network processors, each having an egress port and an ingress port that is associated with a buffer-to-buffer credit value corresponding to the current number of frames the ingress port may send, a buffer value corresponding to the current total frame size the ingress port may send, and a pending credit value corresponding to the pending buffer-to-buffer credits an egress port may issue to the ingress port.
In an exemplary embodiment of the present system and method for flow control, buffer-to-buffer flow control is implemented to manage frame traffic from a selected ingress port based on the number and size of the frames the port is permitted to send. The port is issued credits that correspond to the number and size of the frames that the port may send. These credits are decremented when a frame is sent and may be incremented when the frame reaches its destination.
The present invention is directed to a method comprising the steps of: sending a frame from the ingress port to a destination egress port, if the ingress port has a sufficient buffer-to-buffer credit value and buffer value; decrementing the buffer-to-buffer credit value associated with the ingress port; decrementing the buffer value associated with the ingress port; determining whether to increment the buffer-to-buffer credit value associated with the ingress port; incrementing the pending credit value associated with the ingress port; and determining whether to send a credit message to the ingress port. A set of network processors is associated with a bridge. The computer system may further comprise a switch fabric; and the network processors may be in operative communication with the switch fabric via the associated bridge. The step of determining whether to increment the buffer-to-buffer credit value may further comprise: incrementing the buffer-to-buffer credit value associated with the ingress port if the product of one plus the buffer-to-buffer credit value times the maximum frame size in bytes is less than or equal to a minimum egress buffering value. The minimum egress buffering value may correspond to the minimum amount of egress buffering that is available for any one egress port. The step of determining whether to send a credit message to the ingress port may further comprise: sending the credit message if the pending credit value is greater than, or equal to, a credit watermark value. The method may further comprise the step of: increasing the buffer value if the credit message is sent.
An advantage of the present flow control schemes is that HOL blocking is substantially eliminated. The present flow control schemes alleviate the problems of increased system latency, unintentionally dropped packets, and time-out situations. Another advantage of the present flow control schemes is that more efficient data streaming is provided for the computer network. Other advantages will be apparent in view of the figures and detailed description below.
A more complete understanding of the present disclosure and advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, wherein:
While the present invention is susceptible to various modifications and alternative forms, specific exemplary embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
The present invention relates to a system and method for providing traffic flow control through a computer network, e.g., a SAN, at the system level. The presently disclosed system and method for flow control involves a dynamic buffer-to-buffer flow control scheme that uses a credit/debit-based scheme to manage traffic to a particular port. Generally, the flow control scheme limits the traffic associated with a selected port in the computer network based on the size and number of frames or packets that are to be passed through that port.
For the present disclosure, the network processor 30 need only provide limited computational capabilities, which may be satisfied by even rudimentary digital processors. These digital processors need not reside within the present invention; any necessary processing may instead be performed by remote processors. In a preferred embodiment of the present invention, however, latency may be reduced by having the network processor 30 within the system.
In the exemplary embodiment shown in
Each network processor 30 has ingress buffering that is used to implement a VOQ for each egress Fibre Channel port 65 in the system. In the example discussed above, each network processor 30 implements 512 VOQs, one for each egress Fibre Channel port 65 in the system. Each network processor 30 also has egress buffering that is used to implement at least two outbound queues, one for each egress Fibre Channel port 65 connected to the network processor 30. Each network processor 30 monitors the depth of the egress buffers for each of its two Fibre Channel ports 60 and 65.
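By way of illustration only, and not as part of the original disclosure, the queue organization just described might be modeled roughly as follows. The class and method names are hypothetical; the 512-queue count follows the example above, and frames are assumed to be byte strings so that buffer depth can be measured in bytes.

```python
from collections import deque

NUM_EGRESS_PORTS = 512    # one VOQ per egress Fibre Channel port in the system
LOCAL_PORTS_PER_NP = 2    # each network processor serves two Fibre Channel ports

class NetworkProcessorQueues:
    """Illustrative queue layout for a single network processor."""

    def __init__(self):
        # Ingress side: one virtual output queue (VOQ) per egress port in the system.
        self.voq = [deque() for _ in range(NUM_EGRESS_PORTS)]
        # Egress side: one outbound queue per locally attached Fibre Channel port.
        self.outbound = [deque() for _ in range(LOCAL_PORTS_PER_NP)]

    def egress_depth_bytes(self, local_port: int) -> int:
        # The network processor monitors how deep each egress buffer is;
        # here frames are assumed to be byte strings.
        return sum(len(frame) for frame in self.outbound[local_port])
```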
The flow-control scheme of the present disclosure utilizes a dynamic buffer-to-buffer flow control mechanism to control traffic between ports. Standard fibre channel buffer-to-buffer flow control mechanisms use a credit/debit based algorithm to control traffic between the N-Port and F-Port of a fibre channel link. Table I below shows an example of how fibre channel buffer-to-buffer flow control can be extended across the fabric switch to handle frame transfers between two network processors, “NP 1” and “NP 250.”
The example shown in Table I illustrates that the fibre channel buffer-to-buffer flow control scheme may be extended all the way to the egress port within the network switch. The egress NP determines when to issue a credit message that translates into a BB Credit to the ingress port. Therefore, the egress network processor throttles the ingress port transmission rate by controlling when the BB Credit is sent. As discussed above, a major problem with this flow control model is that it is difficult to determine the number of BB Credits that are needed to efficiently move the frames. It is important that the system has enough credits to be able to provide a continuous flow between a single ingress and egress port.
One challenge is that there is a wide range of fibre channel frame sizes. If the system uses the minimum number of BB Credits needed for a continuous flow of the largest fibre channel frames, then a stream of small fibre channel frames would be unnecessarily throttled due to a lack of BB Credits. On the other hand, if the number of BB Credits is set to the number of credits needed to stream the smallest fibre channel frames, then large fibre channel frames place extreme buffering requirements on the system. To minimize the buffering requirements for large fibre channel frames and to enable small fibre channel frames to stream, a dynamic BB Credit level flow control is needed.
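To make this trade-off concrete, consider the following back-of-the-envelope sketch. The 36-byte and 2148-byte frame sizes are the Fibre Channel extremes quoted later in this description; the assumption that 16 credits suffice to stream maximum-size frames is purely illustrative.

```python
# Hypothetical illustration of why frame-count credits size poorly.
MIN_FRAME = 36            # bytes, smallest Fibre Channel frame
MAX_FRAME = 2148          # bytes, largest Fibre Channel frame
CREDITS_FOR_LARGE = 16    # assumed number of credits that keeps max-size frames streaming

# Buffering the egress side must reserve if every credit could be a max-size frame:
buffering_large = CREDITS_FOR_LARGE * MAX_FRAME    # 34,368 bytes
# Bytes actually in flight if those same credits carry only minimum-size frames:
in_flight_small = CREDITS_FOR_LARGE * MIN_FRAME    # 576 bytes -- the link starves

# Keeping small frames streaming at the same byte rate needs roughly
# MAX_FRAME / MIN_FRAME (about 60x) as many credits, which multiplies the
# worst-case buffering reservation accordingly.
credits_for_small = CREDITS_FOR_LARGE * MAX_FRAME // MIN_FRAME   # ~954 credits
buffering_small = credits_for_small * MAX_FRAME                  # ~2 MB reserved

print(buffering_large, in_flight_small, credits_for_small, buffering_small)
```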
The presently disclosed dynamic BB Credit flow control scheme involves byte-based connectivity between the ingress and egress network processors. The ingress network processor is permitted to send a predefined amount of traffic to an egress network processor. This traffic can be made up of a large number of small frames or a small number of large frames. Regardless of the size characteristics of the traffic, the ingress network processor preferably never sends more than the predefined amount of traffic to the egress network processor. After the egress network processor has transmitted a frame out of the egress port, it sends a credit to the ingress network processor containing the frame size in bytes. The ingress network processor then uses this credit to increase its pool of permissible traffic.
This BB Credit flow control scheme dynamically allocates BB Credits based on the amount of egress buffering available at the egress network processor. This fibre channel buffer-to-buffer flow control scheme operates on a per-frame basis, so if there are a lot of small fibre channel frames, the BB Credits are given quickly, which permits the small frames to stream. If there are a lot of large fibre channel frames, the BB Credits are given out at the slower rate that the egress port can handle. If an egress port is congested, the egress network processor gives credits back to the ingress network processor at the rate at which it is transmitting traffic out of the network switch.
The byte-based credits sent from the egress network processor to the ingress network processor of a network switch may generate overhead that uses valuable switch fabric bandwidth. Accordingly, another embodiment of the dynamic BB Credit flow control scheme combines credit messages. Instead of generating a byte-based credit for every frame transmitted out the egress port, the BB Credit flow control scheme may combine multiple credits. The egress network processor tracks the amount of byte-based credit owed to each ingress port, and once a credit level is reached, the egress network processor generates a credit for the accumulated byte total. The credit level may be programmable so that the level may be defined by a user for a particular system or network. The ingress network processor tracks the amount of egress buffering available for each egress port. BB Credits are based on the minimum amount of egress buffering available at any one egress port.
In another embodiment of the BB Credit flow control scheme, the information shown in Tables II and III is tracked at each ingress and egress port, respectively. Table II below shows the variables to be tracked at the ingress port, where N corresponds to the number of ports in the system. The Max_BB_Credit variable corresponds to the maximum number of BB Credits that may be negotiated for that port. The Cur_BB_Credit variable is the current BB Credit value for the port. Max_Buf represents the maximum amount, e.g., in bytes, of egress buffering for a single egress port. Cur_Buf is the currently available amount, e.g., in bytes, of egress buffering, tracked by the ingress port for each egress port.
Table III below shows the variables to be tracked at the egress port, where N corresponds to the number of ports in the system. Credit_Watermark is a variable that is used by the system to determine when to issue a credit back to an ingress port. Pending_Credit represents the amount of pending egress buffering credit for each ingress port.
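Tables II and III themselves are not reproduced here. As a sketch only, the per-port state they describe might be held in structures such as the following; the field names mirror the variables named in the text, while everything else (the language, the initialization) is an assumption.

```python
from dataclasses import dataclass

N = 512  # number of ports in the system, per the example above

@dataclass
class IngressPortState:
    """Sketch of the Table II variables kept at each ingress port."""
    max_bb_credit: int        # Max_BB_Credit: maximum BB Credits negotiated for the port
    cur_bb_credit: int        # Cur_BB_Credit: BB Credits currently available to the port
    max_buf: int              # Max_Buf: maximum egress buffering (bytes) for one egress port
    cur_buf: list = None      # Cur_Buf[n]: egress buffering (bytes) believed free at egress port n

    def __post_init__(self):
        if self.cur_buf is None:
            # Assume every egress port starts with its full buffering available.
            self.cur_buf = [self.max_buf] * N

@dataclass
class EgressPortState:
    """Sketch of the Table III variables kept at each egress port."""
    credit_watermark: int        # Credit_Watermark: threshold for returning a credit message
    pending_credit: list = None  # Pending_Credit[n]: bytes of credit owed to ingress port n

    def __post_init__(self):
        if self.pending_credit is None:
            self.pending_credit = [0] * N
```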
At step 505, the ingress port is on standby to receive a frame from the ingress device. When an ingress device sends a frame to the ingress port, it is determined at step 510 whether the ingress port has sufficient BB Credit to send the frame. Generally, the ingress device 445 only tracks the number of BB Credits that were negotiated. If the ingress port has insufficient BB Credit, then the frame cannot be sent at this time. Accordingly, at step 515, the ingress port must wait for a BB Credit before it may send a frame.
If it is determined at step 510 that the ingress port has a sufficient BB Credit value, then the ingress port sends the frame to the ingress network processor at step 520. The ingress device decrements its BB Credit count at step 525. The ingress NP receives the frame and decrements the ingress port's Cur_BB_Credit at step 530.
At step 535, the ingress network processor determines whether there is sufficient Cur_Buf[n] at the egress network processor to send the frame. If Cur_Buf[n] is insufficient at the egress network processor, the ingress network processor must wait. If Cur_Buf[n] is sufficient, the ingress network processor sends the frame to the egress network processor at step 540. The ingress network processor then decrements Cur_Buf[n] at step 545 by the frame size plus the associated overhead.
At step 550, the ingress network processor then determines whether or not to give a BB Credit to the ingress device. If the system determines that there is sufficient egress buffering for any one egress port, then the network processor gives a BB Credit to the ingress device at step 555 and then increments the Cur_BB_Credit for the ingress port at step 560. In one embodiment of the present BB Credit flow control scheme, if (Cur_BB_Credit+1)*MAX_FC_FRAME_SIZE ≤ Minimum(Cur_Buf[n]), then the ingress network processor immediately gives a BB Credit to the ingress device or port and increments Cur_BB_Credit. The value Minimum(Cur_Buf[n]) is the minimum amount of egress buffering available for any one egress port. Otherwise, the ingress network processor must wait until it receives a credit message from the egress network processor before it may increment the ingress port's BB Credit. In this particular embodiment, the system checks whether there is enough egress buffering for all the current BB Credits plus one (assuming each BB Credit is associated with a maximum-sized fibre channel frame). Typically, for Fibre Channel networks, the smallest frame is 36 bytes and the largest is 2148 bytes.
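Expressed as code, the check just described might look like the following sketch; the function name is hypothetical, and the 2148-byte constant is the largest Fibre Channel frame size noted above.

```python
MAX_FC_FRAME_SIZE = 2148  # bytes; largest Fibre Channel frame noted above

def may_give_bb_credit(cur_bb_credit: int, cur_buf: list) -> bool:
    """Step 550 check: return the BB Credit immediately only if every
    outstanding credit, plus one more, could still arrive as a maximum-size
    frame without exhausting the least-provisioned egress port."""
    return (cur_bb_credit + 1) * MAX_FC_FRAME_SIZE <= min(cur_buf)
```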
In one embodiment of the present BB Credit flow control scheme, if Pending_Credit[n] ≥ Credit_Watermark, the egress network processor creates a credit message to send back to the ingress network processor. If the system determines that a sufficient amount of buffering is available, then a credit message is generated and sent at step 640. The credit message includes the amount of buffering freed up, which is equal to Pending_Credit[n]. The variable Pending_Credit[n] is then preferably set to zero at step 645 before proceeding back to step 605. Otherwise, the flow control process proceeds to step 505 and the ingress port waits for the next frame to send.
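A corresponding sketch of the egress-side bookkeeping, with hypothetical function and parameter names, might be:

```python
def on_egress_transmit(pending_credit: list, ingress_port: int, frame_bytes: int,
                       credit_watermark: int, send_credit_message) -> None:
    """After a frame leaves the egress port, accumulate its size as pending
    credit for the originating ingress port; once the watermark is reached,
    send one combined credit message and reset the counter (steps 640/645)."""
    pending_credit[ingress_port] += frame_bytes
    if pending_credit[ingress_port] >= credit_watermark:
        # The credit message carries the total amount of buffering freed up.
        send_credit_message(ingress_port, pending_credit[ingress_port])
        pending_credit[ingress_port] = 0
```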
Table IV below shows an example of how the BB Credit flow control scheme controls frame traffic across the network switch 410. For the purposes of illustration, the process shown in Table IV is based on the following values: Max_BB_Credit=3; Max_Buf=8 KB; Credit_Watermark=4 KB; and the maximum fibre channel frame size equals 2 KB. Furthermore, for the purposes of illustration, the example shown in Table IV is based on the following assumptions: all fibre channel frames are equal to the maximum fibre channel frame size; there is no overhead; and the same egress port is used throughout Table IV to simplify the example.
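Because Table IV itself is not reproduced here, the following self-contained sketch walks through the same parameters, combining the ingress-side check and the egress-side watermark logic sketched above. It is an illustrative reconstruction of the accounting, not the patent's own table.

```python
# Walk-through of the illustration parameters above (Max_BB_Credit = 3,
# Max_Buf = 8 KB, Credit_Watermark = 4 KB, every frame 2 KB, no overhead,
# a single egress port).
KB = 1024
FRAME = 2 * KB
CREDIT_WATERMARK = 4 * KB

cur_bb_credit = 3       # BB Credits currently available at the ingress port
cur_buf = 8 * KB        # egress buffering the ingress NP believes is free
pending_credit = 0      # byte credit the egress NP owes the ingress NP

def ingress_send():
    """Consume one BB Credit and one frame of buffering, then hand the
    BB Credit straight back only if a worst-case frame would still fit."""
    global cur_bb_credit, cur_buf
    cur_bb_credit -= 1
    cur_buf -= FRAME
    if (cur_bb_credit + 1) * FRAME <= cur_buf:
        cur_bb_credit += 1

def egress_transmit():
    """Accumulate pending credit; at the watermark, deliver a combined
    credit message, restoring cur_buf and possibly a BB Credit."""
    global pending_credit, cur_buf, cur_bb_credit
    pending_credit += FRAME
    if pending_credit >= CREDIT_WATERMARK:
        cur_buf += pending_credit
        pending_credit = 0
        if (cur_bb_credit + 1) * FRAME <= cur_buf:
            cur_bb_credit += 1

for _ in range(4):
    ingress_send()
# After four 2 KB frames: cur_bb_credit == 0 and cur_buf == 0, so the
# ingress port stalls until the egress side frees buffering.
egress_transmit()   # 2 KB pending: below the 4 KB watermark, no message yet
egress_transmit()   # 4 KB pending: credit message sent, buffering restored
print(cur_bb_credit, cur_buf, pending_credit)   # -> 1 4096 0
```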
In one exemplary embodiment of the BB Credit buffer scheme of the present invention, one frame occupies one buffer/credit. Typically, one buffer contains 2 KB of memory. Thus, a small frame packet of 36 bytes, for example, still consumes an entire 2 KB of memory. Accordingly, in another exemplary embodiment of the BB Credit scheme, several smaller frame packets may be logically grouped and associated with a single buffer. For example, several small frames of about 36 bytes may be collapsed into one 2 KB buffer. The system may keep track of the ordering via hardware logic. For example, the system may maintain a scoreboard to track the order in which the frame packets are to be sent. In this exemplary embodiment, the system maximizes the number of credits available for larger frames and thus potentially increases the throughput between the sender and receiver.
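As an illustrative sketch of this grouping idea (the 2 KB buffer size follows the text, while the list-based scoreboard merely stands in for the hardware logic described):

```python
BUFFER_BYTES = 2 * 1024   # one credit's worth of buffer memory (2 KB)

def pack_small_frames(frames):
    """Group consecutive small frames into shared 2 KB buffers and record,
    in a simple 'scoreboard', where each frame landed so that transmit
    order can be preserved."""
    buffers, scoreboard = [], []
    current, used = [], 0
    for seq, frame in enumerate(frames):
        if used + len(frame) > BUFFER_BYTES and current:
            buffers.append(current)            # current buffer is full
            current, used = [], 0
        scoreboard.append((seq, len(buffers), len(current)))  # frame -> (buffer, slot)
        current.append(frame)
        used += len(frame)
    if current:
        buffers.append(current)
    return buffers, scoreboard

# Fifty-six 36-byte frames (56 * 36 = 2016 bytes) share a single 2 KB
# buffer/credit instead of consuming fifty-six separate credits.
bufs, order = pack_small_frames([b"\x00" * 36 for _ in range(56)])
print(len(bufs))   # -> 1
```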
The presently disclosed flow control schemes provide a number of advantages. One advantage of the present invention is that the flow control scheme substantially eliminates head of line (HOL) blocking. As discussed above, Fibre Channel standards define link-level and end-to-end congestion control protocols, but these standard protocols do not eliminate HOL blocking. HOL blocking is a problem for internal switching that occurs when several packets at the head of an input queue block packets from being forwarded to output ports. The dynamic BB Credit flow control scheme prevents problems caused by HOL blocking such as increased system latency, unintentionally dropped packets, and time-out problems.
Another advantage of the presently disclosed flow control schemes is that they allow for more efficient data streaming. Instead of frame-based flow control that does not account for frame size, the dynamic BB Credit flow control scheme provides byte-based connectivity between the ingress and egress network processors. The ingress network processor is permitted to send a predefined amount of traffic to the egress network processor. This flow control scheme allows the system to dynamically give BB Credits based on the amount of buffering available.
The invention, therefore, is well adapted to carry out the objects and attain the ends and advantages mentioned, as well as others inherent therein. While the invention has been depicted, described, and is defined by reference to exemplary embodiments of the invention, such references do not imply a limitation on the invention, and no such limitation is to be inferred. The invention is capable of considerable modification, alteration, and equivalents in form and function, as will occur to those ordinarily skilled in the pertinent arts and having the benefit of this disclosure. The depicted and described embodiments of the invention are exemplary only, and are not exhaustive of the scope of the invention. Consequently, the invention is intended to be limited only by the spirit and scope of the appended claims, giving full cognizance to equivalents in all respects.
Number | Name | Date | Kind |
---|---|---|---|
4755930 | Wilson, Jr. et al. | Jul 1988 | A |
5140682 | Okura et al. | Aug 1992 | A |
5247649 | Bandoh | Sep 1993 | A |
5515376 | Murthy et al. | May 1996 | A |
5530832 | So et al. | Jun 1996 | A |
5602841 | Lebizay et al. | Feb 1997 | A |
5611049 | Pitts | Mar 1997 | A |
5699548 | Choudhury et al. | Dec 1997 | A |
5778429 | Sukegawa et al. | Jul 1998 | A |
5835756 | Caccavale | Nov 1998 | A |
5835943 | Yohe et al. | Nov 1998 | A |
5845280 | Treadwell, III et al. | Dec 1998 | A |
5845324 | White et al. | Dec 1998 | A |
5852717 | Bhide et al. | Dec 1998 | A |
5864854 | Boyle | Jan 1999 | A |
5873100 | Adams et al. | Feb 1999 | A |
5878218 | Maddalozzo, Jr. et al. | Mar 1999 | A |
5881229 | Singh et al. | Mar 1999 | A |
5918244 | Percival | Jun 1999 | A |
5930253 | Brueckheimer et al. | Jul 1999 | A |
5933849 | Srbljic et al. | Aug 1999 | A |
5944780 | Chase et al. | Aug 1999 | A |
5944789 | Tzelnic et al. | Aug 1999 | A |
5978841 | Berger | Nov 1999 | A |
5978951 | Lawler et al. | Nov 1999 | A |
5987223 | Narukawa et al. | Nov 1999 | A |
5991810 | Shapiro et al. | Nov 1999 | A |
6041058 | Flanders et al. | Mar 2000 | A |
6044406 | Barkey et al. | Mar 2000 | A |
6081883 | Popelka et al. | Jun 2000 | A |
6085234 | Pitts et al. | Jul 2000 | A |
6098096 | Tsirigotis et al. | Aug 2000 | A |
6128306 | Simpson et al. | Oct 2000 | A |
6138209 | Krolak et al. | Oct 2000 | A |
6147976 | Shand et al. | Nov 2000 | A |
6243358 | Monin | Jun 2001 | B1 |
6289386 | Vangemert | Sep 2001 | B1 |
6400730 | Latif et al. | Jun 2002 | B1 |
6484209 | Momirov | Nov 2002 | B1 |
6532501 | McCracken | Mar 2003 | B1 |
6584101 | Hagglund et al. | Jun 2003 | B1 |
6594701 | Forin | Jul 2003 | B1 |
6597699 | Ayres | Jul 2003 | B1 |
6615271 | Lauck et al. | Sep 2003 | B1 |
6657962 | Barri et al. | Dec 2003 | B1 |
6687247 | Wilford et al. | Feb 2004 | B1 |
6721818 | Nakamura | Apr 2004 | B1 |
6731644 | Epps et al. | May 2004 | B1 |
6735174 | Hefty et al. | May 2004 | B1 |
6747949 | Futral | Jun 2004 | B1 |
6757791 | O'Grady et al. | Jun 2004 | B1 |
6765871 | Knobel et al. | Jul 2004 | B1 |
6765919 | Banks et al. | Jul 2004 | B1 |
6785241 | Lu et al. | Aug 2004 | B1 |
6792507 | Chiou et al. | Sep 2004 | B1 |
20010037435 | Van Doren | Nov 2001 | A1 |
20010043564 | Bloch et al. | Nov 2001 | A1 |
20020004842 | Ghose et al. | Jan 2002 | A1 |
20020010790 | Ellis et al. | Jan 2002 | A1 |
20020012344 | Johnson et al. | Jan 2002 | A1 |
20020024953 | Davis et al. | Feb 2002 | A1 |
20020071439 | Reeves et al. | Jun 2002 | A1 |
20020186703 | West et al. | Dec 2002 | A1 |
20020188786 | Barrow et al. | Dec 2002 | A1 |
20030002506 | Moriwaki et al. | Jan 2003 | A1 |
20030012204 | Czeiger et al. | Jan 2003 | A1 |
20030014540 | Sultan et al. | Jan 2003 | A1 |
20030048792 | Xu et al. | Mar 2003 | A1 |
20030063348 | Posey, Jr. | Apr 2003 | A1 |
20030074449 | Smith et al. | Apr 2003 | A1 |
20030084219 | Yao et al. | May 2003 | A1 |
20030093541 | Lolayekar et al. | May 2003 | A1 |
20030093567 | Lolayekar et al. | May 2003 | A1 |
20030126297 | Olarig et al. | Jul 2003 | A1 |
20030128703 | Zhao et al. | Jul 2003 | A1 |
20030154301 | McEachern et al. | Aug 2003 | A1 |
20030163555 | Battou et al. | Aug 2003 | A1 |
20030195956 | Bramhall et al. | Oct 2003 | A1 |
20030198231 | Kalkunte et al. | Oct 2003 | A1 |
20030202520 | Witkowski et al. | Oct 2003 | A1 |
20050018619 | Banks et al. | Jan 2005 | A1 |
20050243734 | Nemirovsky et al. | Nov 2005 | A1 |
Number | Date | Country | |
---|---|---
20030126223 A1 | Jul 2003 | US |