The present invention relates to counters in a high speed network switch. More particularly, the present invention relates to a counter architecture with an overflow FIFO and a method thereof.
Statistics counters are used to perform data analytics in a high speed network device. To be useful, an architecture needs to store a large number of counters. Although off-chip DRAM (dynamic random access memory) can be used, it cannot accommodate high speed counter updates. On-chip SRAM (static random access memory) allows for greater speed but is very expensive. Since memory is one of the most expensive resources in an SOC (system on chip), it is critical to utilize the memory efficiently and flexibly. When storing multiple counters, there exists a tradeoff between using fewer, larger counters and more, smaller counters. Ideally, each counter is wide enough to avoid integer overflow, the wrapping around of the counter. However, in standard practice, this leads to overprovisioning, in which the worst case number of bits is assigned to every counter.
Embodiments of the present invention relate to an architecture that extends counter life by provisioning each counter for an average case and that handles overflow via an overflow FIFO and an interrupt to a process monitoring the counters. This architecture addresses a general optimization problem, which can be stated as follows: given N counters and a certain CPU read interval T, how can the number of storage bits needed to store and operate these N counters be minimized? Equivalently, this general optimization problem can also be stated as follows: given N counters and a certain amount of storage bits, how can the CPU read interval T be increased? This architecture extends the counter CPU read interval linearly with the depth of the overflow FIFO.
In one aspect, a counter architecture is provided. The counter architecture is typically implemented in a network device, such as a network switch. The counter architecture includes N wrap-around counters. Each of the N wrap-around counters is associated with a counter identification. In some embodiments, each of the N wrap-around counters is w-bits wide. In some embodiments, the N wrap-around counters are in an on-chip SRAM memory.
The counter architecture also includes an overflow FIFO that is used and shared by the N wrap-around counters. The overflow FIFO typically stores the associated counter identifications of all counters that are overflowing.
In some embodiments, the counter architecture also includes at least one interrupt sent to a CPU to read the overflow FIFO and one of the overflowed counters.
In some embodiments, in a timing interval T, the number of counter overflows is M = ceiling(EPS*T/2^w), wherein EPS is events per second, and w is the bit width of each counter. In some embodiments, EPS is packets per second for packet count. Alternatively, EPS is bytes per second for byte count.
In some embodiments, the overflow FIFO is M-deep and log2(N)-bits wide to capture all counter overflows.
In some embodiments, the counter architecture requires w*N + M*log2(N) total storage bits.
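The sizing relationships above can be illustrated with a short, purely illustrative Python sketch; the function names overflow_fifo_depth and total_storage_bits are chosen here for readability and are not part of any embodiment.

```python
import math

def overflow_fifo_depth(eps, t, w):
    # M = ceiling(EPS*T / 2^w): overflows expected in one read interval T
    return math.ceil(eps * t / 2**w)

def total_storage_bits(n, w, eps, t):
    # N w-bit wrap-around counters plus an M-deep, log2(N)-bit-wide overflow FIFO
    m = overflow_fifo_depth(eps, t, w)
    return w * n + m * math.ceil(math.log2(n))
```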
In another aspect, a method of a counter architecture is provided. The counter architecture includes at least one counter. The method includes incrementing a count in the at least one counter. Each of the at least one counter is typically associated with a counter identification. In some embodiments, the at least one counter is a wrap-around counter.
The method also includes, upon overflowing one of the at least one counter, storing the counter identification of the overflowed counter in a queue. In some embodiments, the queue is a FIFO buffer. In some embodiments, storing the counter identification in the queue sends an interrupt to a CPU to read values from the queue and the overflowed counter.
In some embodiments, the method also includes calculating an actual value of the overflowed counter from the read values.
In some embodiments, the method also includes, after reading the overflowed counter, clearing the overflowed counter.
In yet another aspect, a method of a counter architecture is provided. The counter architecture includes a plurality of wrap-around counters. The method includes incrementing counts in the plurality of wrap-around counters. Typically, each of the plurality of counters is associated with a counter identification. The method also includes, upon occurrence of an overflow of one of the plurality of wrap-around counters, storing the counter identification in an overflow FIFO, processing data at the head of the overflow FIFO, identifying a wrap-around counter by the data at the head of the overflow FIFO, reading a value stored in the identified wrap-around counter, and clearing the identified wrap-around counter.
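For illustration only, the increment-and-record behavior of this aspect can be sketched in software as follows. This is a behavioral model rather than the hardware implementation; the class name CounterBank and the use of a Python deque to stand in for the overflow FIFO are assumptions made for the example.

```python
from collections import deque

class CounterBank:
    """Behavioral model: N w-bit wrap-around counters sharing one overflow FIFO."""
    def __init__(self, n, w):
        self.w = w
        self.counters = [0] * n        # counter identification is the list index
        self.overflow_fifo = deque()   # shared overflow FIFO of counter identifications

    def increment(self, counter_id, amount=1):
        # assumes amount < 2^w, so at most one wrap per increment
        value = self.counters[counter_id] + amount
        if value >= 2**self.w:
            # the counter wraps around; record which counter overflowed
            self.overflow_fifo.append(counter_id)
        self.counters[counter_id] = value % (2**self.w)
```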
In some embodiments, each of the plurality of wrap-around counters has the same width.
In some embodiments, the overflow FIFO is shared by the plurality of wrap-around counters.
In some embodiments, the counter architecture is implemented in a network device.
In some embodiments, the method also includes repeating the steps of processing data at the head of the overflow FIFO, identifying a wrap-around counter, reading a value, and clearing the identified wrap-around counter, for as long as the overflow FIFO is not empty.
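A corresponding sketch of this repeated processing, reusing the illustrative CounterBank model above, might look as follows; it is only an example and not the hardware or driver implementation.

```python
def drain_overflow_fifo(bank):
    # Repeat while the overflow FIFO is not empty.
    readings = []
    while bank.overflow_fifo:
        counter_id = bank.overflow_fifo.popleft()   # data at the head of the FIFO
        value = bank.counters[counter_id]           # read the identified counter
        readings.append((counter_id, value))
        bank.counters[counter_id] = 0               # clear the identified counter
    return readings
```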
In yet another aspect, a network device is provided. The network device includes a common memory pool. Typically, memories from the common memory pool are separated into a plurality of banks. The network device also includes a counter architecture for extending CPU read interval. The counter architecture includes N wrap-around counters that use at least a subset of the plurality of banks. Typically, each of the N wrap-around counters is associated with a counter identification. The counter architecture also includes an overflow FIFO that stores the associated counter identifications of all counters that wrap around.
In some embodiments, the network device also includes SRAM. The N wrap-around counters are stored in SRAM. In some embodiments, the overflow FIFO is stored in SRAM. Alternatively, the overflow FIFO is fixed-function hardware.
In some embodiments, the network device also includes at least one interrupt sent to a CPU to read the overflow FIFO and to read and clear one of the N wrap-around counters.
In some embodiments, in a timing interval T, the number of counter overflows is M = ceiling(total_count_during_interval_T/2^w), wherein total_count_during_interval_T is determined by the bandwidth of the network device, and w is the bit width of each counter. In some embodiments, the total_count_during_interval_T is PPS*T for packet count, wherein PPS is packets per second. In some embodiments, the total_count_during_interval_T is BPS*T for byte count, wherein BPS is bytes per second.
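As a brief illustration, the same depth calculation applies whether the counters track packets or bytes; the helper name fifo_depth below is hypothetical, the PPS value is taken from the example later in the description, and the BPS value is an assumed placeholder.

```python
import math

def fifo_depth(total_count_during_interval_T, w):
    # M = ceiling(total_count_during_interval_T / 2^w)
    return math.ceil(total_count_during_interval_T / 2**w)

PPS, BPS, T, w = 654.8e6, 400e9 / 8, 1, 17   # BPS is a placeholder rate, not from the description
m_packets = fifo_depth(PPS * T, w)   # FIFO depth needed for packet counters
m_bytes = fifo_depth(BPS * T, w)     # FIFO depth needed for byte counters
```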
The foregoing will be apparent from the following more particular description of example embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the present invention.
In the following description, numerous details are set forth for purposes of explanation. However, one of ordinary skill in the art will realize that the invention can be practiced without the use of these specific details. Thus, the present invention is not intended to be limited to the embodiments shown but is to be accorded the widest scope consistent with the principles and features described herein.
Embodiments of the present invention relate to an architecture that extends counter life by provisioning each counter for an average case and that handles overflow via an overflow FIFO and an interrupt to a process monitoring the counters. This architecture addresses a general optimization problem, which can be stated as follows: given N counters and a certain CPU read interval T, how can the number of storage bits needed to store and operate these N counters be minimized? Equivalently, this general optimization problem can also be stated as follows: given N counters and a certain amount of storage bits, how can the CPU read interval T be increased? This architecture extends the counter CPU read interval linearly with the depth of the overflow FIFO.
The overflow FIFO stores the associated counter identifications of all counters that are overflowing. Typically, as soon as any of the N counters 105 overflows, the associated counter identification of the overflowed counter is stored in the overflow FIFO 110. An interrupt is sent to a CPU to read the overflow FIFO 110 and the overflowed counter. After the overflowed counter is read, the overflowed counter is cleared or reset.
In a timing interval T, the number of counter overflows is M = ceiling(PPS*T/2^w), wherein PPS is packets per second, and w is the bit width of each counter. The total count of packets during interval T is PPS*T. Assume PPS is up to 654.8 MPPS, T = 1 second, w = 17 and N = 16 k. Based on these assumptions, there are up to 4,996 overflow events per second.
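The arithmetic behind this example can be checked with a few lines of Python (purely illustrative):

```python
import math

PPS = 654.8e6      # packets per second
T = 1              # CPU read interval, seconds
w = 17             # counter width, bits

M = math.ceil(PPS * T / 2**w)
print(M)           # 4996 overflow events in interval T
```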
The overflow FIFO is typically M-deep and log2(N)-bits wide to capture all counter overflows. As such, the counter architecture 100 requires w*N + M*log2(N) total storage bits, where M = ceiling(PPS*T/2^w).
The graph 200 indicates that it is optimal for the counter architecture 100 to include counters that are 19-bits wide, as the total storage bits required is the least. Taking, for example, the two lowest points, w=18 and w=19, in the graph 200, the total numbers of storage bits needed are approximately 329.882 kb (= 18*16 k + (654.8 M/2^18)*log2(16 k)) and 328.781 kb (= 19*16 k + (654.8 M/2^19)*log2(16 k)), respectively.
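The two totals quoted above can be reproduced with the following illustrative snippet, in which M is left as the un-rounded expected number of overflows so that the result matches the approximate figures in the text.

```python
import math

N, PPS, T = 16 * 1024, 654.8e6, 1

for w in (18, 19):
    m = PPS * T / 2**w                   # expected overflows during T (not rounded up)
    bits = w * N + m * math.log2(N)      # counter storage plus overflow FIFO storage
    print(w, round(bits))                # 18 -> ~329,882 bits; 19 -> ~328,781 bits
```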
At a step 310, upon overflowing one of the at least one counter, the counter identification of the overflowed counter is stored in a queue. In some embodiments, the queue is a FIFO buffer. The queue is typically shared and used by all counters in the counter architecture 100. In some embodiments, storing the counter identification in the queue sends an interrupt to the CPU to read values from the queue and the overflowed counter. It is possible to then calculate the actual value of the overflowed counter from the read values. After the overflowed counter is read by the CPU, the overflowed counter is typically cleared or reset.
For example, assume a counter with 5 as its counter identification is the first counter to overflow during arithmetic operations. The counter identification (i.e., 5) is then stored in the queue, presumably at the head of the queue since counter #5 is the first counter to overflow. In the meantime, the count in counter #5 can still be incremented, and other counters can also overflow, with the counter identifications of those counters likewise being stored in the queue.
An interrupt is sent to the CPU to read the value at the head of the queue (i.e., 5). The CPU reads the current value stored in the counter associated with the counter identification (i.e., counter #5). Since the counter width is known, the actual value of the counter can be calculated. Specifically, the actual value of the counter is 2^w plus the current value stored in the counter. Continuing with the example, assume the current value of counter #5 is 2 and w=17. The actual value of counter #5 is 131,074 (= 2^17 + 2). As long as the queue is not empty, the CPU continuously reads and clears the values from the queue and the counters.
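For the example just given, the reconstruction is a one-line calculation (illustrative only):

```python
w = 17                 # counter width from the example
current_value = 2      # value read from counter #5 after it wrapped once
actual_value = 2**w + current_value
print(actual_value)    # 131074
```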
The final total count of a particular counter is the number of times the counter identification appears in the queue, multiplied by 2^w, plus the counter's remainder value.
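A sketch of this reckoning, assuming the queue contents are available to the CPU as a simple list of counter identifications, might look as follows; the function name final_total_count and the example values are illustrative.

```python
def final_total_count(queue, counter_id, remainder, w):
    # (occurrences of counter_id in the queue) * 2^w + remainder value
    return queue.count(counter_id) * 2**w + remainder

# e.g., a hypothetical counter #7 wrapped twice and holds a remainder of 3 with w = 17
print(final_total_count([7, 2, 7], 7, 3, 17))   # 262147
```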
Although the counters have been described as counting packets, it should be noted that the counters can be used for counting anything, such as bytes. Generally, an expected total count during T is calculated as EPS*T, where EPS is events per second. An upper bound on this total count during the time interval T can be established or calculated, since the network switch is typically designed with a certain bandwidth from which the event rate can be calculated.
One of ordinary skill in the art will realize other uses and advantages also exist. While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. Thus, one of ordinary skill in the art will understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.