A. Field of the Invention
The present invention relates generally to data switching and routing, and more particularly, to systems and methods for controlling data flow.
B. Description of Related Art
Routers receive data on physical media, such as optical fiber, analyze the data to determine its destination, and output the data on physical media in accordance with the destination. Routers were initially designed using a general purpose processor executing large software programs. As line rates and traffic volume increased, however, general purpose processors could not scale to meet these new demands. For example, as functionality was added to the software, such as accounting and policing functionality, these routers suffered performance degradation. In some instances, the routers failed to handle traffic at line rate when the new functionality was turned on.
To meet the new demands, purpose-built routers were architected. Purpose-built routers are designed and built with components optimized for routing. They not only handled higher line rates and higher network traffic volume, they also added functionality without compromising line rate performance.
Flow-control refers to the metering of packet flow through the network and/or through the router. For example, it may be desirable to limit the number of packets received from a certain port of the router to a pre-designated rate. One known method of implementing flow-control is based on a credit system. With this method, each data flow that is to be controlled is associated with a credit counter. As packets in the flow are transmitted by the router, the credit counter is decremented. Conversely, the credit counter is incremented based on a credit replenishment scheme, such as by periodically incrementing the credit counter up to a maximum credit amount. The router checks the credit counter before transmitting a packet and drops the packet if the credit counter is below a predetermined value. Through the operation of this type of credit-counter, the router can enforce a data flow policy such as limiting the maximum transmission rate for a particular data flow below a certain rate.
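By way of illustration only, the traditional hard-drop credit scheme described above might be sketched as follows. The class name, field names, and the choice to charge credits by packet length are assumptions made for this sketch and are not taken from any particular router.

```python
from dataclasses import dataclass


@dataclass
class CreditCounter:
    """Hard-drop credit counter for one data flow (hypothetical names)."""
    credits: int          # current credit count
    max_credits: int      # ceiling applied during replenishment
    refill: int           # credits added per replenishment period
    threshold: int = 0    # packets are dropped while credits < threshold

    def replenish(self) -> None:
        # Periodic replenishment, capped at the maximum credit amount.
        self.credits = min(self.credits + self.refill, self.max_credits)

    def on_packet(self, length: int) -> bool:
        """Return True to transmit the packet, False to drop it."""
        if self.credits < self.threshold:
            return False          # counter below the threshold: drop
        self.credits -= length    # assumed: one credit per byte sent
        return True
```

Because the decision is a strict threshold test, every packet is dropped once the counter falls below the threshold until the next replenishment.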
ISP 104 may wish to give each customer a predetermined guaranteed bandwidth. The total bandwidth of the data flow coming from each customer should not exceed this bandwidth. If it does, ISP 104 may drop packets from the customer's data flow. Traditional credit-based flow control techniques, such as those discussed above, may be used by ISP 104 to manage the bandwidth being used by the ISP's customers.
One drawback of traditional credit-based flow control techniques is that these techniques tend to produce “choppy” traffic patterns when interacting with other network protocols, such as the commonly used Transmission Control Protocol (TCP).
Accordingly, there is a need in the art to improve traditional flow control techniques.
Systems and methods consistent with the principles of the invention, among other things, provide for improved data flow policy enforcement mechanisms.
One aspect consistent with the principles of the invention is directed to a data flow policing device. The device includes a policer and a memory. The policer is configured to receive a policing request that includes an indication of a packet belonging to a data flow. The policer determines whether the packet is within specification using a function that implements a probabilistic comparison based on a credit count associated with the data flow. The memory stores a data structure corresponding to the data flow. The data structure includes the credit count of the data flow.
Another aspect consistent with the principles of the invention is directed to a data flow policing device that includes a memory configured to store data structures corresponding to a plurality of data flows and a policer. The data structures include at least a credit count associated with the data flows. The policer receives a policing request that includes an indication of a packet belonging to at least one of the data flows. The policer, in response to the request, reads the data structure corresponding to the data flow from the memory, determines whether the packet is within specification based on the credit count associated with the data flow, and writes an updated version of the read data structure to the memory.
Yet another aspect consistent with the invention is directed to a method that includes receiving a request to perform a credit based flow control operation. The request identifies at least the length of a data packet and a flow to which the data packet belongs. The method also includes reading a data structure corresponding to the identified flow from a memory, where the data structure includes at least an indication of a credit count associated with the data flow. Further, the method includes determining whether the data packet is within specification based on the credit count and the packet length, and updating the data structure in the memory.
A further aspect consistent with the invention is a network device comprising a physical interface configured to receive packets from and transmit packets to a network and a processing unit. The processing unit is configured to store the received packets and examine header information of the packets to determine a destination device for the packets. The processing unit includes a route lookup unit that comprises a plurality of route lookup engines, a policer, and a memory. The policer receives a policing request for a packet associated with a data flow from one of the route lookup engines and determines whether the packet is within specification based on information contained in the data structure associated with the data flow. The memory is coupled to the policer and stores the data structure corresponding to the data flow.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, explain the invention.
The following detailed description of the invention refers to the accompanying drawings. The same reference numbers may be used in different drawings to identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents of the claim limitations.
As described herein, a rate policer enforces data flow policies for a number of data flows using a probabilistic policy enforcement mechanism. The state of each data flow is stored in a compact data structure, allowing the number of data flows handled by the policer to be programmably increased or decreased. The computations performed by the rate policer can be implemented in hardware to increase performance.
RE 330 performs high level management functions for system 300. For example, RE 330 may communicate with other networks and systems connected to system 300 to exchange information regarding network topology. RE 330 may create routing tables based on network topology information, create forwarding tables based on the routing tables, and forward the forwarding tables to PFEs 310. PFEs 310 may use the forwarding tables to perform route lookup for incoming packets. RE 330 may also perform other general control and monitoring functions for system 300.
PFEs 310 are each connected to RE 330 and switch fabric 320. PFEs 310 receive data at ports on links connected to a device or a network, such as a wide area network (WAN) or a local area network (LAN). Each link could be one of many types of transport media, such as optical fiber, Ethernet cable, or wireless. The data on the link is formatted according to one of several protocols, such as the synchronous optical network (SONET) standard or Ethernet.
PFE 310 processes incoming data by stripping off the data link layer. PFE 310 may convert header information from the remaining data into data structures referred to herein as “cells” (where a cell is a fixed length data unit). For example, in one embodiment, the data remaining after the data link layer is stripped off is packet data. PFE 310 includes the layer 2 (L2) and layer 3 (L3) packet header information, some control information regarding the packets, and the packet data in a series of cells called “D” cells. In one embodiment, the L2, L3, and the control information are stored in the first two cells of the series of cells.
PFE 310 may form a notification based on the L2, L3, and control information, and perform a route lookup using the notification and the routing table from RE 330 to determine destination information. PFE 310 may also further process the notification to perform protocol-specific functions, policing, and accounting, and might even modify the notification to form a new notification. One policing function that may be performed by PFE 310 is flow control, such as credit based flow control, as will be described below.
If the determined destination indicates that the packet should be sent out on a link connected to PFE 310, then PFE 310 retrieves the cells for the packet, converts the notification or new notification into header information, forms a packet using the packet data from the cells and the header information, and transmits the packet from the port associated with the link.
If the destination indicates that the packet should be sent to another PFE via switch fabric 320, then PFE 310 retrieves the cells for the packet, modifies the first two cells with the new notification and new control information, if necessary, and sends the cells to the other PFE via switch fabric 320. Before transmitting the cells over switch fabric 320, PFE 310 may append a sequence number to each cell, which allows the receiving PFE to reconstruct the order of the transmitted cells. Additionally, the receiving PFE uses the notification to form a packet using the packet data from the cells, and sends the packet out on the port associated with the appropriate physical link of the receiving PFE.
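As a minimal sketch of the sequence numbering described above, the following shows how a sending PFE might tag cells and how a receiving PFE might restore their order. The function names are hypothetical, and the sketch assumes the sequence number is simply the cell's position within one packet; sequence number width and wraparound are not modeled.

```python
from typing import List, Tuple


def tag_cells(cells: List[bytes]) -> List[Tuple[int, bytes]]:
    """Sending side: pair each cell with its position in the packet."""
    return list(enumerate(cells))


def restore_order(tagged: List[Tuple[int, bytes]]) -> List[bytes]:
    """Receiving side: sort by sequence number and strip the tags."""
    return [cell for _, cell in sorted(tagged, key=lambda pair: pair[0])]
```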
In summary, in one embodiment, RE 330, PFEs 310, and switch fabric 320 perform routing based on packet-level processing. PFEs 310 store each packet in cells while performing a route lookup using a notification, which is based on packet header information, including L2 and L3 layer header information. A packet might be received on one PFE and go back out to the network on the same PFE, or be sent through switch fabric 320 to be sent out to the network on a different PFE.
PIC 410 may transmit data between a link and FPC 420. Different PICs may be designed to handle different types of links. For example, one of PICs 410 may be an interface for an optical link, another PIC may be an interface for an Ethernet link, and another a wireless interface.
FPCs 420 perform routing functions and handle packet transfers to and from PICs 410 and switch fabric 320. For each packet it handles, FPC 420 performs the previously-discussed route lookup function. Although
As will be described in greater detail below, processing units 532 and 534 may process packet data flowing between PICs 410 and first I/O unit 536. Each processing unit 532 and 534 may operate to process packet data received from the PIC(s) connected to it and to process data received from first I/O unit 536.
More particularly, processing unit 532 or 534 may process packets from PIC 410 to convert the packets into data cells, and transmit the data cells to first I/O unit 536. Data cells are the data structure used by FPC 420 internally for transporting and storing data. In one implementation, data cells are 64 bytes in length.
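The conversion of packet data into fixed-length data cells might be sketched as follows, for illustration only. The 64 byte cell length is the one mentioned above; the function name and the zero padding of the final cell are assumptions of this sketch, and the header and control information carried in the first cells is not modeled.

```python
from typing import List

CELL_SIZE = 64  # bytes, per the one implementation mentioned above


def to_cells(packet: bytes, cell_size: int = CELL_SIZE) -> List[bytes]:
    """Split packet data into fixed-length cells, zero-padding the last."""
    cells = []
    for offset in range(0, len(packet), cell_size):
        chunk = packet[offset:offset + cell_size]
        cells.append(chunk.ljust(cell_size, b"\x00"))  # assumed padding
    return cells
```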
In the other direction, processing unit 532 or 534 receives data cells from first I/O unit 536, extracts certain information and packet data from the data cells, and creates a packet based on the extracted information. Processing unit 532 or 534 creates the packet header from the information extracted from the data cells. In one embodiment, processing unit 532 or 534 creates L2 and L3 header information based on the extracted information. The created L2 and L3 header information constitutes a new header that the packet uses as it is subsequently transmitted through the link.
Memory unit 540 may temporarily store data cells from first I/O unit 536 and second I/O unit 538 and notifications from R unit 542. Memory unit 540 may dispatch the notifications to first I/O unit 536 and second I/O unit 538. In response, first I/O unit 536 and second I/O unit 538 may use the address information in the notification to read out data cells from memory unit 540 that correspond to a notification. The notification received from memory unit 540 may have been modified by R unit 542 with route or encapsulation lookup results. First I/O unit 536 and second I/O unit 538 may update the data cells read out of memory unit 540 with information from the modified notification. The data cells, which now include information from the modified notification, are sent to processing unit 532, processing unit 534, or switch fabric 320, depending on which of first I/O unit 536 or second I/O unit 538 is processing the notification.
R unit 542 may receive notifications from first I/O unit 536 and second I/O unit 538. R unit 542 may receive one or more forwarding tables from RE 330 (
R unit 542 may provide route lookup, accounting, and policing functionality based on the notifications. Consistent with aspects of the invention, the policing function performed by R unit 542 includes probabilistic packet flow policing. This aspect of the invention will be described in more detail below.
In another embodiment consistent with the principles of the invention, route lookup engine 601, independently of the notifications, keeps track of which data flows are subject to policing. Based on this information, route lookup engine 601 determines whether to generate requests to policer 602.

Policer 602 receives policing requests from route lookup engine 601. In response, policer 602 determines whether the packet corresponding to the request is within its credit limit specification. If it is, route lookup engine 601 forwards the packet as normal. If it is not, route lookup engine 601 may drop the packet. Alternatively, instead of simply dropping the packet, route lookup engine 601 may perform some other function on the packet, such as tagging the packet for special handling. A packet corresponding to an acceptable credit count will be referred to herein as a packet that is “within specification” while packets that are to be dropped or tagged will be referred to as “out of specification” packets.
Data structure storage component 603 stores data structures, such as data structures 610-612, which are used by policer 602. In one implementation consistent with aspects of the invention, data structure storage component 603 is a high-speed random access memory that stores a data structure corresponding to each data flow on which policer 602 may operate. The data structures store information, such as the present credit count corresponding to the data flow.
R field 704 stores a value that determines the granularity of the time value kept by policer 602 when determining current_time. For example, based on R field 704, the time value kept by policer 602 may have a period of 6.55 micro-seconds (1/1024 of the core clock period), 210 micro-seconds (1/32 of the core clock period), 6712 micro-seconds (the core clock period), or 214784 micro-seconds (32 times the core clock period). U field 705 stores a value that represents the granularity with which the actual packet length (PLEN) is used in calculating the new credit value. In one implementation, PLEN is multiplied by 2 raised to the value stored in U field 705 to obtain an adjusted packet length that is then subtracted from credit_count. Thus, when U field 705 stores the value zero, the adjusted packet length is equal to the actual packet length. CL field 706 stores the value of credit_limit.
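For illustration, the scaling performed by R field 704 and U field 705 might be sketched as follows. The core clock period of 6712 micro-seconds and the four scale factors come from the description above; which R value selects which factor is not stated, so the mapping in R_SCALE is purely an assumption, as are the function names.

```python
CORE_PERIOD_US = 6712  # core clock period in micro-seconds, as given above

# Hypothetical encoding: the assignment of R values to scale factors
# is assumed for this sketch only.
R_SCALE = {0: 1 / 1024, 1: 1 / 32, 2: 1, 3: 32}


def tick_period_us(r: int) -> float:
    """Period of the policer's time value for a given R field value."""
    return CORE_PERIOD_US * R_SCALE[r]


def adjusted_length(plen: int, u: int) -> int:
    """PLEN multiplied by 2 raised to the value of the U field."""
    return plen << u
```

With U equal to zero the adjusted length equals PLEN; each increment of U doubles the credit charged per byte.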
In one exemplary implementation, data structure 710 may be a 128 bit structure divided into four 32 bit words. More specifically, unused pad field 708 and out-of-spec packet counter field 701 may be 32 bit fields, current credit field 702 may be a 19 bit field, and time credit field 703 may be a 13 bit field. R field 704, U field 705, CL field 706, and last adjustment time field 707 may be, respectively, two bit, four bit, four bit, and 22 bit fields.
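A minimal sketch of packing and unpacking such a 128 bit structure is given below. The field widths are those stated above; the ordering of the fields within the structure, the treatment of every field as unsigned, and all names are assumptions made for the sketch.

```python
from typing import Dict

# (name, width in bits), most significant field first. Widths follow the
# description; the ordering is assumed for illustration.
FIELDS = [
    ("pad", 32),
    ("out_of_spec_count", 32),
    ("credit_count", 19),
    ("time_credit", 13),
    ("r", 2),
    ("u", 4),
    ("cl", 4),
    ("last_adjust_time", 22),
]
assert sum(width for _, width in FIELDS) == 128


def pack(values: Dict[str, int]) -> int:
    """Pack field values into one 128 bit integer; absent fields are 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> Dict[str, int]:
    """Recover every field from a packed 128 bit integer."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

The two 32 bit fields each fill one word, the 19 and 13 bit fields share a third word, and the remaining four fields (2 + 4 + 4 + 22 bits) fill the fourth.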
Time_credit field 703, R field 704, U field 705, and CL field 706 are user programmable values that are generally held constant throughout the operation of system 300. In contrast, out-of-spec packet counter 701, credit_count field 702, and last_adjustment_time field 707 may be dynamically adjusted by policer 602 when it processes the data structure's corresponding packet.
In response to the request, policer 602 accesses data structure storage component 603 and requests the data structure corresponding to the data flow (Act 802). Policer 602 receives the corresponding data structure from data structure storage component 603 (Act 803). As described above, the fields in the data structure include a current credit field (credit_count) that stores the current number of credits associated with the flow and a credit limit field (CL) that specifies the maximum allowed value of the current credit field.
With the data structure corresponding to the active data flow in hand, policer 602 processes the data structure (Act 804). Policer 602 then returns an updated version of the data structure to data structure storage component 603 and transmits an indication of whether the packet is within specification or out of specification to route lookup engine 601 (Acts 805 and 806).
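The read, process, and write-back sequence of Acts 802 through 806 might be sketched as follows. The dictionary standing in for data structure storage component 603, the callable standing in for the processing of Act 804, and all names are hypothetical.

```python
from typing import Callable, Dict, Tuple

FlowState = Dict[str, int]  # stand-in for the per-flow data structure


def handle_request(
    store: Dict[int, FlowState],
    flow_id: int,
    plen: int,
    process: Callable[[FlowState, int], Tuple[FlowState, bool]],
) -> bool:
    """Serve one policing request and report whether it is within spec."""
    state = store[flow_id]                     # Acts 802-803: fetch state
    new_state, in_spec = process(state, plen)  # Act 804: process
    store[flow_id] = new_state                 # Act 805: write back
    return in_spec                             # Act 806: report result
```

Because the only per-flow state is the stored structure, supporting an additional flow in this sketch amounts to adding one more entry to the store.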
In general, policer 602 includes two main processing components, credit increment component 1002 and decision component 1003. Credit increment component 1002 and decision component 1003 receive a number of parameters. More particularly, credit increment component 1002 receives time_credit field 703, last_adjust_time field 707, and an indication of the current time (current_time). Time_credit and last_adjust_time are stored in the data structure corresponding to the data flow. Current_time is an indication of a current time value kept by policer 602. Last_adjust_time indicates the previous time that the credit counter was increased. Based on these three values, credit increment component 1002 calculates “credit_increment,” which indicates how much the current credit count should be incremented. More specifically, credit increment component 1002 generates credit_increment based on the difference between current_time and last_adjust_time multiplied by time_credit, which represents the amount of credit the counter receives per time increment. Stated more formally, credit_increment=(current_time−last_adjust_time)×time_credit.
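The credit_increment calculation might be sketched as follows. The formula is the one stated above; the modular subtraction, which lets the elapsed time remain correct if a 22 bit time value wraps around, is an assumption added for this sketch, as is the time_bits parameter.

```python
def credit_increment(
    current_time: int,
    last_adjust_time: int,
    time_credit: int,
    time_bits: int = 22,  # assumed width of the stored time value
) -> int:
    """(current_time - last_adjust_time) x time_credit."""
    # Assumed: time values wrap modulo 2**time_bits, so the difference
    # is taken modulo the same range.
    elapsed = (current_time - last_adjust_time) % (1 << time_bits)
    return elapsed * time_credit
```

For example, ten time increments at a time_credit of 5 yield a credit_increment of 50.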
Decision component 1003 receives credit_increment from credit increment component 1002. Additionally, decision component 1003 receives an indication of the packet length (PLEN), the credit limit for the flow, and the credit count (credit_count field 702) that was generated for the previous packet processed for the flow. The packet length is received from the requesting route lookup engine 601. Credit_limit and credit_count are included in the data structure corresponding to the flow.
Decision component 1003 generates a new credit value based on the packet length, credit_count, and credit_increment. For example, the new credit value may be generally calculated as the credit_count plus credit_increment minus the packet length (or minus a value derived from the packet length). However, the maximum allowable value for the new credit value is capped at the credit limit. Additionally, if decision component 1003 determines that the packet is out of specification and is to be dropped, the new credit value is not decremented based on the packet length.
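For illustration, the new credit calculation might be sketched as follows. The description does not state whether the cap is applied before or after the packet length is subtracted, or whether the result may go negative; this sketch assumes the cap is applied first and performs no lower clamp. The function and parameter names are hypothetical, and credit_limit is taken to be an already decoded value.

```python
def update_credit(
    credit_count: int,
    credit_increment: int,
    plen: int,
    credit_limit: int,
    in_spec: bool,
    u: int = 0,
) -> int:
    """Return the new credit value for one processed packet."""
    # Replenish first, capped at the credit limit (assumed ordering).
    refreshed = min(credit_count + credit_increment, credit_limit)
    if not in_spec:
        # Out-of-specification packets are not charged against the flow.
        return refreshed
    # Charge the adjusted packet length, PLEN x 2**U.
    return refreshed - (plen << u)
```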
Decision component 1003 may generate the indication of whether or not a packet is within specification using the probabilistic function discussed above. That is, when the value of new credit is within region 902 (
After policer 602 generates the new credit value and the indication of whether the packet is within specification, it returns the indication of within specification or out of specification to route lookup engine 601 and updates the data structure.
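One possible form of a probabilistic within-specification decision is sketched below. The exact function is not reproduced in this portion of the description, so the linear ramp between two bounds, the bounds themselves (low and high), and the function name are all assumptions of this sketch rather than the function used by decision component 1003.

```python
import random


def within_spec(
    new_credit: int, low: int, high: int, rng: random.Random
) -> bool:
    """Probabilistic comparison of a credit value against a soft band."""
    if new_credit >= high:
        return True    # ample credit: always within specification
    if new_credit < low:
        return False   # credit exhausted: always out of specification
    # Assumed linear ramp: the chance of passing grows with the credit.
    p_pass = (new_credit - low) / (high - low)
    return rng.random() < p_pass
```

Unlike a strict threshold test, a decision of this kind thins out packets gradually as credit runs low, which is the behavior the probabilistic mechanism is intended to provide.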
As described above, a rate policer enforces data flow policies for a number of data flows using a probabilistic policy enforcement mechanism. The probabilistic enforcement mechanism helps avoid the choppy and non-uniform bandwidth pattern associated with conventional hard-drop credit based rate policers. Because state information of each data flow operated on by the rate policer is stored as a relatively small data structure in memory, the number of flows operated on by the rate policer can be expanded simply by generating an additional data structure. Additionally, the core computation sections of the rate policer may be implemented in hardware, thus increasing the performance of the rate policer.
The foregoing description of preferred embodiments of the invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. Moreover, while a series of acts has been presented with respect to
Also, PFEs 310 may be implemented in hardware, software, or some combination thereof. For example, various portions of PFEs 310 may be implemented as application-specific integrated circuits (ASICs). The ASICs may be configured to perform some processing via dedicated logic, and may also be configured to perform some processing using microcode instructions that may be stored in memory. Those skilled in the router art will appreciate that the invention described herein might be practiced using a variety of hardware configurations in addition to, or instead of, ASICs. For example, some combination of general purpose processors, digital signal processors (DSPs), and programmable gate arrays (PGAs) may also be used to implement the functionality described herein.
No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used.
The scope of the invention is defined by the claims and their equivalents.
This application is a continuation of U.S. patent application Ser. No. 11/741,363, filed Apr. 27, 2007 which is a continuation of Ser. No. 10/098,493, filed Mar. 18, 2002, now U.S. Pat. No. 7,227,840, the disclosures of which are incorporated herein by reference.
Number | Date | Country |
---|---|---|
20100177638 A1 | Jul 2010 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 11741363 | Apr 2007 | US |
Child | 12732370 | | US |
Parent | 10098493 | Mar 2002 | US |
Child | 11741363 | | US |