The present invention relates generally to the storage and data networking fields, and more particularly, relates to a scheduler, scheduling method, and computer program product for implementing Quality-of-Service (QoS) scheduling with a cached status array.
Related United States patent applications by William John Goetzinger, Glen Howard Handlogten, James Francis Mikos, and David Alan Norgaard, assigned to the present assignee, are being filed on the same day as the present patent application, including:
U.S. patent application Ser. No. 10/004373, entitled “QoS SCHEDULER AND METHOD FOR IMPLEMENTING PEAK SERVICE DISTANCE USING NEXT PEAK SERVICE TIME VIOLATED INDICATION”;
U.S. patent application Ser. No. 10/002416, entitled “QoS SCHEDULER AND METHOD FOR IMPLEMENTING QUALITY OF SERVICE WITH AGING TIME STAMPS”;
U.S. patent application Ser. No. 10/004217, entitled “QoS SCHEDULER AND METHOD FOR IMPLEMENTING QUALITY OF SERVICE ANTICIPATING THE END OF A CHAIN OF FLOWS”;
U.S. patent application Ser. No. 10/016518, entitled “WEIGHTED FAIR QUEUE HAVING EXTENDED EFFECTIVE RANGE”;
U.S. patent application Ser. No. 10/015994, entitled “WEIGHTED FAIR QUEUE SERVING PLURAL OUTPUT PORTS”;
U.S. patent application Ser. No. 10/015760, entitled “WEIGHTED FAIR QUEUE HAVING ADJUSTABLE SCALING FACTOR”; and
U.S. patent application Ser. No. 10/002085, entitled “EMPTY INDICATORS FOR WEIGHTED FAIR QUEUES”.
Storage and data networks are designed to support the integration of high quality voice, video, and high speed data traffic. Storage and data networking promises to provide transparent data sharing services at high speeds. It is easy to see that rapid movement and sharing of diagrams, pictures, movies, audio, and the like requires tremendous bandwidth. Network management is concerned with the efficient management of every bit of available bandwidth.
A need exists for a high speed scheduler for networking that ensures the available bandwidth will not be wasted and that the available bandwidth will be efficiently and fairly allocated. The scheduler should permit many network traffic flows to be individually scheduled per their respective negotiated Quality-of-Service (QoS) levels. This would give system administrators the ability to efficiently tailor their gateways, switches, storage area networks (SANs), and the like. Various QoS levels can be set up using combinations of precise guaranteed bandwidth, required by video for example, and limited or unlimited best effort bandwidth for still pictures, diagrams, and the like. Selecting a small amount of guaranteed bandwidth, with the addition of some bandwidth from the pool of best effort bandwidth, should guarantee that even during the highest peak periods, critical data will be delivered to its application at that guaranteed rate.
A scheduler advantageously may be added to a network processor to enhance the quality of service (QoS) provided by the network processor subsystem.
Known high-performance network processor scheduler systems are able to search entire calendars in one system cycle to update calendar status, that is, active flow status. As performance requirements increase, high-performance schedulers will no longer be able to search entire calendar arrays within one system cycle; bandwidth constraints no longer allow the entire calendar array to be searched each cycle. A new technique is needed to perform calendar updates.
A principal object of the present invention is to provide a QoS scheduler, scheduling method, and computer program product for implementing Quality-of-Service (QoS) scheduling with a cached status array. Other important objects of the present invention are to provide such QoS scheduler, scheduling method, and computer program product for implementing Quality-of-Service (QoS) scheduling with a cached status array substantially without negative effect and that overcome some disadvantages of prior art arrangements.
In brief, a QoS scheduler, scheduling method, and computer program product are provided for implementing Quality-of-Service (QoS) scheduling with a cached status array. A plurality of calendars are provided for scheduling the flows. An active flow indicator is stored for each calendar entry in a calendar status array (CSA). A subset of the active flow indicators from the calendar status array (CSA) is stored in a cache. The calendar status array (CSA) is updated based upon a predefined calendar range and resolution. The subset of the active flow indicators from the calendar status array (CSA) is used to determine a given calendar for servicing.
In accordance with features of the invention, the cache copy subset of the active flow indicators from the calendar status array (CSA) is used to increment a current pointer (CP) by an identified number of positions up to a current time (CT) value, where the identified number of positions is equal to a variable number of inactive flow indicators up to the current time (CT) value and the identified number of positions has a maximum value equal to a number of entries in the cache.
The present invention together with the above and other objects and advantages may best be understood from the following detailed description of the preferred embodiments of the invention illustrated in the drawings, wherein:
Having reference now to the drawings, in
Scheduler 200 of the preferred embodiment permits many network traffic flows, for example, 64 thousand (64K) network traffic flows to be individually scheduled per their respective assigned Quality-of-Service (QoS) level. Each flow is basically a one-way connection between two different points. QoS parameters are held in a flow queue control block (FQCB), such as in the external flow queue memory 112. QoS parameters include sustained service distance (SSD), peak service distance (PSD), queue distance (QD), port identification (ID), and the like. There can be, for example, 64 thousand flows and a FQCB for each flow.
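As one illustration only, the per-flow QoS parameters described above might be grouped in a flow queue control block as in the following C sketch; the field names, widths, and layout are assumptions made for illustration and are not the actual FQCB format.

```c
#include <stdint.h>

/* Hypothetical flow queue control block (FQCB) layout; the fields and
 * widths are illustrative assumptions, not the hardware format. */
typedef struct {
    uint32_t sustained_service_distance; /* SSD: spacing for guaranteed rate */
    uint32_t peak_service_distance;      /* PSD: spacing for peak rate       */
    uint32_t queue_distance;             /* QD: weight for WFQ service       */
    uint16_t port_id;                    /* target output port               */
    uint32_t head_fcb;                   /* head of the frame (FCB) chain    */
    uint32_t tail_fcb;                   /* tail of the frame (FCB) chain    */
} fqcb_t;

/* One FQCB per flow, for example 64K flows. */
#define NUM_FLOWS (64 * 1024)
static fqcb_t fqcb_table[NUM_FLOWS];
```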
Referring now to
For a flow enqueue request received by queue manager 208, the flow's FQCB information is retrieved from one of the external SRAM 226 or 228 or internal array 230 and examined to determine if the new frame should be added to an existing frame string for a given flow, start a new frame string, or be discarded. In addition, the flow queue may be attached to a calendar or ring for servicing in the future. Read and write request messages received by queue manager 208 are used to initialize flows.
Port back-pressure from the dataflow 104 to the scheduler 200 occurs via the port status request message originated from the dataflow and applied to the calendar and rings block 220. When a port threshold is exceeded, all WFQ and PBS traffic associated with that port is held in the scheduler 200, and the selection logic of winner partition 222 does not consider those flows as potential winners. When port back-pressure is removed, the flows associated with that port are again eligible to be winners.
Calendars and rings block 220 includes, for example, three calendars (low latency service (LLS), normal latency service (NLS), peak bandwidth service (PBS)) and weighted fair queues (WFQs). The calendars are time based. The weighted fair queues (WFQs) are weight based. The WFQs are also referred to as best effort queues because WFQs can only schedule excess bandwidth and therefore can have no bandwidth guarantee associated with them.
Flows are attached to one or more of the three calendars (LLS, NLS, PBS) and one WFQ ring 220 in a manner consistent with their QoS parameters. For example, if a flow has a guaranteed bandwidth component, it is attached to a time based calendar. If a flow has a WFQ component, it is attached to the WFQ ring. A flow may have both a guaranteed and a best effort or WFQ component. The calendars 220 are used to provide guaranteed bandwidth at both a low latency service (LLS) and a normal latency service (NLS) packet rate. Flows are scheduled for service at a certain time in the future. WFQ rings are used by the weighted fair queuing algorithm. Entries are chosen based upon position in the WFQ rings 220 without regard to time. The WFQ rings 220 are work conserving, that is, idle only when there are no flows to be serviced. A flow set up using a WFQ ring can optionally have a peak bandwidth limit associated with it.
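The attachment rule can be sketched in C as follows. The flow_qos_t structure and the attach helpers are hypothetical, and mapping the optional peak bandwidth limit onto the PBS calendar is an assumption made for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-flow QoS summary, used only for this illustration. */
typedef struct {
    bool     guaranteed;    /* has a guaranteed-bandwidth component           */
    bool     low_latency;   /* guaranteed component uses LLS (otherwise NLS)  */
    bool     best_effort;   /* has a WFQ (best effort) component              */
    bool     peak_limited;  /* optional peak bandwidth limit (assumed -> PBS) */
    uint16_t port_id;
} flow_qos_t;

enum { CAL_LLS = 0, CAL_NLS = 1, CAL_PBS = 2 };

extern void attach_to_calendar(int flow, int calendar);   /* hypothetical */
extern void attach_to_wfq_ring(int flow, uint16_t port);  /* hypothetical */

/* Attach a flow to calendars and/or the WFQ ring per its QoS parameters. */
static void attach_flow(int flow, const flow_qos_t *qos)
{
    if (qos->guaranteed)                 /* guaranteed bandwidth: time based */
        attach_to_calendar(flow, qos->low_latency ? CAL_LLS : CAL_NLS);
    if (qos->best_effort) {              /* best effort: weight based ring   */
        attach_to_wfq_ring(flow, qos->port_id);
        if (qos->peak_limited)           /* optional peak limit              */
            attach_to_calendar(flow, CAL_PBS);
    }
}
```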
Scheduler 200 performs high speed scheduling, for example, processing 27 million frames per second (Mframes/second). Scheduling rates per flow for the LLS, NLS and PBS calendars 220 range, for example, from 10 Gigabits per second (Gbps) down to 3.397 Kilobits per second (Kbps). Rates do not apply to the WFQ ring.
SRAM 226 is an external high speed, for example, quad data rate (QDR) SRAM containing flow queue information or flow queue control block (FQCB) information and frame information or frame control block (FCB) information. SRAM 228 is, for example, an optional external QDR SRAM containing flow queue information or flow queue control block (FQCB) information, depending on the number of flows. Internal array 230 contains, for example, 4K FQCBs or 64K entries of aging information. Internal array 230 may be used in place of the external SRAM 228 if fewer than four thousand (4K) flows are required and is also used to hold time stamp aging information. Internal array 230 containing FQCB aging information is used with logic that searches through the flows and invalidates expired time stamps.
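A minimal sketch of the time stamp aging sweep mentioned above, under an assumed per-flow entry layout and expiry test:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_FLOWS (64 * 1024)

/* Hypothetical per-flow aging entry held in internal array 230. */
typedef struct {
    bool     valid;
    uint32_t time_stamp;   /* time at which the flow's stamp was issued */
} aging_entry_t;

static aging_entry_t aging[NUM_FLOWS];

/* Invalidate time stamps older than max_age relative to current_time;
 * unsigned subtraction tolerates wrap of the time counter. */
static void age_flows(uint32_t current_time, uint32_t max_age)
{
    for (uint32_t f = 0; f < NUM_FLOWS; f++) {
        if (aging[f].valid &&
            (uint32_t)(current_time - aging[f].time_stamp) > max_age)
            aging[f].valid = false;   /* stamp expired; invalidate it */
    }
}
```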
Queue manager 208 performs the queuing operation of scheduler 200 generally as follows: a linked list or string of frames is associated with each flow. Frames are always enqueued to the tail of the linked list and always dequeued from the head of the linked list. Flows are attached to one or more of four calendars/rings (LLS, NLS, PBS, WFQ) 220 using the QoS parameters. Selection of which flow to service is done by examining the calendars/rings 220 in the order LLS, NLS, PBS, WFQ; the frame at the head of the selected flow is then selected for service. The flow queues are not grouped in any predetermined way by target port. The port number for each flow is user programmable. All WFQ flows with the same port ID are attached to the same WFQ ring. The QoS parameters also apply to the discard flow. The discard flow address is user selectable and is set up at configuration time.
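The tail-enqueue, head-dequeue discipline on a flow's frame chain can be modeled as a singly linked list; the following is an illustrative sketch, not the queue manager's actual implementation.

```c
#include <stddef.h>

/* Frame control block (FCB): one node per frame in a flow's chain. */
typedef struct fcb {
    struct fcb *next;      /* next frame in the flow's chain */
    /* frame descriptor fields omitted */
} fcb_t;

typedef struct {
    fcb_t *head;           /* frames are always dequeued from the head */
    fcb_t *tail;           /* frames are always enqueued at the tail   */
} flow_queue_t;

/* Enqueue a frame at the tail of the flow's chain. */
static void flow_enqueue(flow_queue_t *q, fcb_t *f)
{
    f->next = NULL;
    if (q->tail)
        q->tail->next = f;
    else
        q->head = f;
    q->tail = f;
}

/* Dequeue the frame at the head of the flow's chain. */
static fcb_t *flow_dequeue(flow_queue_t *q)
{
    fcb_t *f = q->head;
    if (f) {
        q->head = f->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    return f;
}
```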
When a flow enqueue request is sent to the scheduler 200, its frame is tested for possible discard using information from the flow enqueue request message and information stored in the FQCB. If the frame is to be discarded, the FQCB pointer is changed from the FQCB in the flow enqueue request message to the discard FQCB. Otherwise, the frame is added to the tail end of the FCB chain associated with the FQCB. In addition, the flow is attached, if it is not already attached, to the appropriate calendar (LLS, NLS, PBS) or ring (WFQ). As time passes, selection logic of winner partition 222 determines which flow is to be serviced (first LLS, then NLS, then PBS, then WFQ). If a port bandwidth threshold has been exceeded, the WFQ and PBS components associated with that port are not eligible to be selected. When a flow is selected as the winner, the frame at the head of the FCB chain for the flow is dequeued and a port enqueue response message is issued to the dataflow 104. If the flow is eligible for a calendar reattach, the flow is reattached to the appropriate calendar (LLS, NLS, PBS) or ring (WFQ) in a manner consistent with the QoS parameters.
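The strict selection order with port back-pressure masking can be sketched as follows; the winner-query helpers are hypothetical placeholders, and the actual winner partition 222 operates on calendar and ring state rather than on function calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: each returns a candidate flow ID, or -1 if the
 * corresponding calendar or ring has no eligible entry this tick. */
extern int  lls_winner(void);
extern int  nls_winner(void);
extern int  pbs_winner(uint16_t *port);
extern int  wfq_winner(uint16_t *port);
extern bool port_over_threshold(uint16_t port);

/* Pick the next flow to service: LLS first, then NLS, then PBS, then WFQ.
 * PBS and WFQ candidates are skipped while their port is back-pressured. */
static int select_winner(void)
{
    int flow;
    uint16_t port;

    if ((flow = lls_winner()) >= 0)
        return flow;
    if ((flow = nls_winner()) >= 0)
        return flow;
    if ((flow = pbs_winner(&port)) >= 0 && !port_over_threshold(port))
        return flow;
    if ((flow = wfq_winner(&port)) >= 0 && !port_over_threshold(port))
        return flow;
    return -1;   /* nothing eligible this tick */
}
```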
Scheduler 200 of the preferred embodiment keeps track of multiple calendars 220. For example, calendars 220 include 5 epochs of low latency service (LLS) calendars, 5 epochs of normal latency service (NLS) calendars, and 5 epochs of peak service (PS) calendars, each calendar epoch including 512 entries. Also, the scheduler 200 needs to be able to keep track of many rings, for example, 66 WFQ rings, each with 2 parsecs of 256 entries, for a total of 41,472 locations. Any of these locations could potentially need to be updated. Conventional network processor designs read all entries each system cycle and then performed a search based on the results. Bandwidth constraints no longer allow the entire calendar array to be searched each cycle.
In accordance with features of the preferred embodiment, a calendar status array (CSA) 300 provides an indication that an LLS, NLS, or PS calendar, or a WFQ ring, has an active flow attached. In the preferred embodiment, 1 bit is used for each possible calendar or ring location, and two on-chip arrays CSA 1, CSA 2, 300 store both the calendar and ring active flow indicators. Access to the arrays CSA 1, CSA 2, 300 is shared during a scheduler interval or scheduler tick that, for example, is equal to 6 clock cycles. For example, CSA access is shared with the WFQ rings 220 getting 2 reads per tick and the calendars 220 getting 1 read. For example, the WFQ rings access 256 bits via two 128-bit wide arrays defining CSA 1, CSA 2, 300. The calendar 220 uses, for example, one of the 128-bit wide arrays defining CSA 1, 300 for each CSA access.
Referring now to
A portion of the data of on-chip CSA 300 is accessible in one cycle; for example, for each CSA access, ¼, or 128, of the 512 flow status indicators are read. For example, in a first CSA access, 128 flow status indicators corresponding to calendar entries (0:127) are read for CSA address 0. 32 of those bits are stored in a CSA cache 302 corresponding to the calendar epoch for that CSA read. When the cache overlaps two contiguous CSA addresses, for example, 0 and 1, CSA address 1 will be read. The cache 302 will be refreshed with the new information for the bits corresponding to CSA address 1, and the bits corresponding to CSA address 0 will be shifted from the high order bits in the cache 302 to the appropriate low order bits. The main reason for using the cached CSA bits is to allow simultaneous access to all calendar epochs without having to read all calendar epochs' worth of status information each time.
A subset of the data of on-chip array CSA1, 300 is cached for each of the calendars 220 in a cache 302 labeled CACHED COPY 302 in
The cache copy data contained in the cache 302 is used to determine if a given calendar is ready to dequeue a frame. A current pointer (CP) stored in an on-chip register points to a calendar entry that may be picked for servicing when current time (CT) stored in another on-chip register is greater than or equal to the CP. The current pointer (CP) determines where CSA 300 is accessed, with one of four addressed portions of CSA 300 accessible in one cycle. The cache 302 stores 32 bits of flow status indicators from the on-chip CSA 1, 300 based upon the current pointer (CP). For example, with current pointer (CP) equal to 24, flow status indicator bits 24–55 from the on-chip CSA 1, 300 are loaded into the cached copy data of cache 302.
The cached copy data in cache 302 is used with the CSA 300 to accommodate a wrap condition where the current pointer (CP) spans two different CSA addresses. For example, for CP at a calendar entry with an active flow, for example, calendar entry 120, the 32 bits of cached copy data in cache 302 includes 8 active flow indicator bits corresponding to calendar entries (120:127) from the previous cache copy access of CSA 300 that are shifted to a low portion of cache 302 and 24 active flow indicator bits corresponding to calendar entries (128:151) in a top portion of cache 302 from one read of CSA 300.
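The refresh and wrap behavior described above can be sketched as follows, assuming 512 calendar entries, 128-bit CSA reads, and a 32-bit cache whose bit 0 corresponds to the entry at the current pointer; the data layout is an illustration, not the hardware design. With CP equal to 24 this returns status bits 24 through 55, and with CP equal to 120 it combines the bits for entries (120:127) from one CSA word with the bits for entries (128:151) from the next.

```c
#include <stdint.h>

#define CSA_ENTRIES   512   /* one active-flow bit per calendar entry */
#define CSA_WORD_BITS 128   /* one CSA read returns 128 bits          */
#define CACHE_BITS    32    /* cached window per calendar epoch       */

/* csa[a] holds status bits for entries a*128 .. a*128+127 (bit i of the
 * word corresponds to entry a*128+i).  __uint128_t is a GCC/Clang
 * extension, used here only for brevity. */
static __uint128_t csa[CSA_ENTRIES / CSA_WORD_BITS];

/* Return the 32 status bits starting at calendar entry cp, reading a
 * second contiguous CSA word when the window spans two addresses --
 * the previously read bits land in the low order cache positions and
 * the newly read bits in the high order positions. */
static uint32_t refresh_cache(uint32_t cp)
{
    uint32_t words   = CSA_ENTRIES / CSA_WORD_BITS;
    uint32_t lo_addr = (cp / CSA_WORD_BITS) % words;
    uint32_t hi_addr = (lo_addr + 1) % words;
    uint32_t offset  = cp % CSA_WORD_BITS;

    __uint128_t window = csa[lo_addr] >> offset;
    if (offset > CSA_WORD_BITS - CACHE_BITS)      /* window spans two words */
        window |= csa[hi_addr] << (CSA_WORD_BITS - offset);

    return (uint32_t)window;   /* bit i = status of calendar entry cp + i */
}
```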
The cached copy data in cache 302 is also used to allow the current pointer (CP) to catch-up to current time (CT). The CP can regularly fall behind CT. For example, when servicing a calendar entry that contains several chained flows, the CT increments once each time a flow is serviced. The CP does not increment until all flows are serviced in a chain 402 of multiple flows for FLOW ID as shown in
In accordance with features of the preferred embodiment, CP is allowed to be incremented more than 1 position per tick by utilizing the 32-bit entry cache 302. CP is allowed to be incremented by an identified number of calendar entries having no active flows attached, up to the CT. To illustrate this, consider Example 1, in which a window within the 32-bit cache 302 reveals the relationship between CP and CT as follows:
In Example 1, CP is currently pointing to a calendar entry with an active flow, for example, entry 312 as shown. Once this flow is serviced, CP may be incremented by 4, since the indicator bits between CT and CP+4 are 0. This indicates that no active flows are attached to the next 4 calendar entries. By incrementing CP by 4, for example, to calendar entry 316, CP will once again equal CT. CP is never incremented past CT. This technique allows CP to be incremented by at most 32 positions each tick, that is, the number of entries in the cache 302.
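One plausible reading of this catch-up rule, sketched in C: after the entry at CP is serviced, CP advances past consecutive inactive entries, never past CT and never beyond the entries visible in the cached window. The alignment of cache bit 0 with the entry at CP follows the earlier sketch and is an assumption.

```c
#include <stdint.h>

#define CACHE_BITS 32   /* cached window: bit i reflects calendar entry CP + i */

/* Advance cp past consecutive calendar entries with no active flow attached,
 * bounded by ct and by the entries visible in the cached window.
 * Assumes ct >= cp with no wrap of the time values. */
static uint32_t catch_up(uint32_t cp, uint32_t ct, uint32_t cache)
{
    uint32_t limit = ct - cp;            /* how far behind CT the pointer is  */
    if (limit > CACHE_BITS - 1)
        limit = CACHE_BITS - 1;          /* only cached entries can be tested */

    uint32_t steps = 0;
    while (steps < limit && ((cache >> (steps + 1)) & 1) == 0)
        steps++;                         /* next entry has no active flow     */

    /* Example 1: cp = 312, ct = 316, cache bits 1..4 clear -> returns 316. */
    return cp + steps;
}
```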
In accordance with features of the preferred embodiment, calendars 220 are segmented into epochs. Epoch is a term used to identify a technique that increases the effective range of a calendar 220 without increasing the physical size of the on-chip array CSA 300 by segmenting the calendar into sections or epochs. Epoch 0 has the highest resolution and lowest range. Epoch p has a range of n^p times the range of the first epoch and a resolution of 1/n^p that of the first epoch, where n equals a set scaling factor. As the epoch number increases, calendar range is extended and resolution is reduced.
Referring now to
Epoch 0s are accessed every 4 ticks
Epoch 1s are accessed every 16 ticks
Epoch 2s are accessed every 64 ticks
Epoch 3s are accessed every 256 ticks
Epoch 4s are accessed every 1024 ticks
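This spacing is consistent with a scaling factor n of 4: within every group of 4 calendar read slots, three serve the epoch 0 calendars and the fourth is handed down to the higher epochs. The decode below is an assumed interleaving that reproduces the listed periods; it is illustrative only.

```c
#define NUM_EPOCHS    5   /* epochs 0..4 for each calendar type            */
#define CALS_PER_SLOT 3   /* LLS, NLS and PBS share each group of 4 ticks  */

/* Decode which (calendar, epoch) pair receives the single calendar CSA
 * read on a given scheduler tick.  Returns 0 on success, or -1 when the
 * tick carries no calendar read (about 1 tick in 1024 here).
 * Example: LLS epoch 0 wins ticks 0, 4, 8, ...; LLS epoch 1 wins ticks
 * 3, 19, 35, ... -- every 16 ticks, matching the list above. */
static int decode_tick(unsigned tick, unsigned *calendar, unsigned *epoch)
{
    for (unsigned e = 0; e < NUM_EPOCHS; e++) {
        unsigned slot = tick % 4;
        if (slot < CALS_PER_SLOT) {
            *calendar = slot;       /* 0 = LLS, 1 = NLS, 2 = PBS */
            *epoch    = e;
            return 0;
        }
        tick /= 4;                  /* pass this slot to the next epoch */
    }
    return -1;
}
```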
Referring now to
CSA_Calendar_Update<=
In addition to the regularly scheduled read of the CSA 300 for a given calendar cache 302, the cache 302 can be updated by snooping the CSA location for an enqueue or reattach event. This gets the data into the cache 302 when the normally scheduled update would not deliver it in time. Also, if the same flow is picked twice in a row as a winner from the same calendar or ring, the CSA 300 will not be updated, as there is not enough time to add and remove the bit from the CSA; special hardware is used in this case to properly schedule the flow.
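A sketch of the snoop path just described: when an enqueue or reattach event targets a calendar entry that falls within the currently cached window, the corresponding cache bit can be set immediately rather than waiting for the next scheduled CSA read. The membership test assumes the cache covers the 32 entries starting at CP and ignores calendar wrap for brevity.

```c
#include <stdint.h>

#define CACHE_BITS 32

/* Snoop an enqueue or reattach event: if calendar entry 'entry' lies in
 * the cached window starting at cp, set its bit in the cached copy at
 * once instead of waiting for the next scheduled CSA read.  The backing
 * CSA word would be updated as well (not shown). */
static void snoop_attach(uint32_t entry, uint32_t cp, uint32_t *cache)
{
    uint32_t offset = entry - cp;            /* wrap ignored for brevity */
    if (offset < CACHE_BITS)
        *cache |= (uint32_t)1 << offset;     /* cache sees the attach now */
}
```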
Referring now to
A sequence of program instructions or a logical assembly of one or more interrelated modules defined by the recorded program means 604, 606, 608, 610 directs the computer system 100 for implementing Quality-of-Service (QoS) scheduling with a cached status array of the preferred embodiment.
While the present invention has been described with reference to the details of the embodiments of the invention shown in the drawing, these details are not intended to limit the scope of the invention as claimed in the appended claims.