The disclosure generally relates to packet processing and building tuples from the packets.
In some implementations, a network packet processor inputs a stream of network packets, manipulates the contents of the network packets, and outputs another stream of modified network packets. The manipulations may implement a protocol for processing network packets. For example, the network packet processor may implement a protocol layer of a communication protocol, and for a high-level packet received from a higher protocol layer and delivered to a lower protocol layer for eventual transmission on the communication media, the manipulations may encapsulate the high-level packet within a low-level packet of the lower protocol layer.
A common task in processing packets is to form a compact data tuple based on certain fields of a packet. The data tuple makes processing of the assembled data convenient. For example, in a packet classification task, certain address fields and/or type fields are extracted from a packet and then used together as a lookup key to determine the class of the packet. The particular fields and positions of the fields in the packet may vary depending on processing functions and protocols.
The data rate at which packets are transmitted presents challenges for processing the packets at a rate sufficient to keep pace with the data transmission rate. In packet processing applications, packets are streamed word-wise, for example using words that are 512 bits wide to achieve a 100 Gbps data rate. Each packet may comprise multiple 512-bit words. The fields of a packet that are used in constructing a tuple are generally located in different areas of the packet. Thus, the fields of a packet will be available at different discrete times. The times at which the fields become available are not necessarily static, since packet structures can vary from packet to packet, such as with variable field sizes.
A method for processing a data packet includes, in at least one stage of a plurality of stages of a pipeline circuit, extracting a respective packet field value from the data packet. In each stage of the plurality of stages, a respective tuple field value is inserted into a respective tuple register of the stage at a respective offset. The respective tuple field value in the at least one stage is based on the respective packet field value. In each stage of the plurality of stages except a last one of the stages, the contents of the respective tuple register of the stage are provided as input to a next one of the stages.
A packet processing circuit includes a plurality of pipeline stages. Each stage includes a field extraction circuit and a tuple construction circuit. The field extraction circuit is configured to receive a data packet and is configurable to extract none or a plurality of packet field values from the data packet. The tuple construction circuit is coupled to receive an input tuple and each packet field value from the field extraction circuit. The tuple construction circuit is configured to insert a respective tuple field value into the input tuple at a respective offset and output a tuple having the inserted respective tuple field value. The respective tuple field value is based on the at least one packet field value.
Other aspects and features will be recognized from consideration of the Detailed Description and Claims, which follow.
Various aspects and advantages of the methods and circuits will become apparent upon review of the following detailed description and upon reference to the drawings.
To achieve a suitable level of performance and flexibility, it may be desirable to aggregate field values of packets into tuples at a high data rate. In addition, it may be desirable to programmably select fields from data packets and formats of the tuples. In one approach, a method of processing a data packet includes, in at least one stage of multiple stages of a pipeline circuit, extracting a respective packet field value from the multiple fields of the data packet. In each of the stages, a respective tuple field value is inserted into a respective tuple register of the stage at a respective offset. In at least one stage in which the value of a field is extracted, the respective tuple field value is based on the respective packet field value. Depending on application requirements, the tuple field value may also be based on one or more constants or one or more input tuple field values. In each stage except the last stage, the contents of the respective tuple register of the stage are provided as input to a next one of the stages. With the pipelined approach, a tuple can be produced from an input stream of data packets in every cycle. With parallel circuitry, multiple tuples could be generated.
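The staged approach described above can be modeled in software. The following is a minimal sketch, not the disclosed circuit: it runs the stages sequentially rather than concurrently, and the names `make_stage` and `run_pipeline`, the byte-aligned packet fields, and the example offsets are all assumptions made for illustration.

```python
from typing import Callable, List

# A stage takes (packet_bytes, input_tuple) and returns the updated tuple.
Stage = Callable[[bytes, int], int]


def make_stage(pkt_offset: int, size_bytes: int, tuple_offset_bits: int) -> Stage:
    """Hypothetical stage: copy one byte-aligned packet field into the tuple."""
    def stage(packet: bytes, tup: int) -> int:
        # Extract the packet field value (big-endian, as on the wire).
        value = int.from_bytes(packet[pkt_offset:pkt_offset + size_bytes], "big")
        # Clear the destination bits, then insert the value at the offset.
        mask = ((1 << (8 * size_bytes)) - 1) << tuple_offset_bits
        return (tup & ~mask) | (value << tuple_offset_bits)
    return stage


def run_pipeline(stages: List[Stage], packet: bytes, initial_tuple: int = 0) -> int:
    """Feed the tuple produced by each stage into the next stage."""
    tup = initial_tuple
    for stage in stages:
        tup = stage(packet, tup)
    return tup


packet = bytes([0x11, 0x22, 0x33, 0x44])
stages = [make_stage(0, 1, 8), make_stage(2, 2, 16)]
result = run_pipeline(stages, packet)
```

In this sketch the first stage places the byte 0x11 at tuple bit offset 8 and the second stage places the two bytes 0x3344 at bit offset 16. In hardware, each stage would work on a different packet's tuple in the same cycle, which is what allows a tuple to be produced every cycle.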
If (Ethernet.type==0x800) //IPv4 type code
Else if (Ethernet.type==0x86dd) //IPv6 type code
Else
A tuple aggregation circuit is provided to construct tuples in a pipelined fashion.
Each stage of the pipeline circuit 400 includes a field extraction circuit 420, a constant staging circuit 422, a computation circuit 424, and a tuple construction circuit 426. Programmed control information is input to the circuit elements for controlling each circuit element. The programmed control information indicates which fields to extract from the packet, any constants to be used, the computation to be performed, and offsets and sizes of the tuple field values in the tuple. The programmed control information may be provided via a microprogramming control store (not shown), for example.
The field extraction circuit 420 is controllable to extract one or more fields from the input packet. For each field to be extracted by the field extraction circuit, the programmed control information indicates an offset of the field in the packet and a size of the field. For a tuple field value that is not based on a packet field, the programmed control information indicates to the field extraction circuit to not extract any fields from the packet. Further disclosure of a field extraction circuit is found in the co-pending patent application having Ser. No. 13/229,083, entitled “CIRCUIT AND METHOD FOR EXTRACTING FIELDS FROM PACKETS,” by Michael Attig, and assigned to Xilinx, Inc.; the entire contents of this co-pending application are incorporated by reference into this application. The extracted value(s) of the field(s) of the packet are output by the field extraction circuit and input to the computation circuit 424.
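As an illustration of extraction by offset and size, the following sketch pulls a field out of a packet held as bytes. It is a software model only and does not represent the referenced co-pending circuit. The function name `extract_field` and the choice of bit units for the offset and size are assumptions; the Ethernet layout used in the example (type field at byte 12) is the standard untagged frame layout.

```python
def extract_field(packet: bytes, offset_bits: int, size_bits: int) -> int:
    """Return the field of size_bits starting offset_bits from the packet start."""
    total_bits = 8 * len(packet)
    if offset_bits < 0 or size_bits <= 0 or offset_bits + size_bits > total_bits:
        raise ValueError("field lies outside the packet")
    # Treat the packet as one big-endian integer, then shift and mask.
    as_int = int.from_bytes(packet, "big")
    shift = total_bits - offset_bits - size_bits
    return (as_int >> shift) & ((1 << size_bits) - 1)


# Untagged Ethernet header: 6-byte destination, 6-byte source, 2-byte type.
frame = bytes(12) + b"\x86\xdd" + bytes(4)
ether_type = extract_field(frame, offset_bits=96, size_bits=16)
```

Here `ether_type` holds the IPv6 type code 0x86dd, the kind of value a later stage could test when selecting which further fields to extract.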
The constant staging circuit 422 stages constant values for input to the computation circuit 424. The programmed control information input to the constant staging circuit indicates which constant value, if any, is to be provided to the computation circuit. Depending on application requirements, multiple constant values may be provided to the computation circuit. The programmed control information input to the constant staging circuit may provide the constant values, or alternatively, reference constant values stored within the constant staging circuit. The time at which the constant value(s) is provided as input to the computation circuit coincides with the provision of the field value(s) as input to the computation circuit.
The computation circuit 424 computes the value of the tuple field to be inserted into the tuple based on registered packet field values, registered constant values, and/or a registered input tuple. The computation circuit may be an arithmetic logic unit that performs arithmetic and/or logic functions on designated operands. The operation(s) to be performed may be provided to the computation circuit as executable instructions. The instructions also indicate which registered values are the operands. A no-operation-type instruction may be used to indicate to the computation circuit that a registered value is to be output without changing its value. The computation circuit may provide values for multiple tuple fields depending on application requirements.
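A computation of this kind can be sketched as a small arithmetic logic unit. The instruction names in `Op`, the two-operand form, and the 16-bit default result width are assumptions for illustration; the description does not specify an instruction set.

```python
from enum import Enum


class Op(Enum):
    """Hypothetical instruction set for the sketch."""
    NOP = "nop"   # pass operand a through unchanged
    ADD = "add"
    AND = "and"
    OR = "or"
    XOR = "xor"


def compute(op: Op, a: int, b: int = 0, width_bits: int = 16) -> int:
    """Apply op to the registered operands and truncate to the field width."""
    if op is Op.NOP:
        result = a
    elif op is Op.ADD:
        result = a + b
    elif op is Op.AND:
        result = a & b
    elif op is Op.OR:
        result = a | b
    elif op is Op.XOR:
        result = a ^ b
    else:
        raise ValueError(f"unsupported operation: {op}")
    return result & ((1 << width_bits) - 1)
```

With `Op.NOP`, an extracted packet field or a staged constant passes straight through to become the tuple field value. The other operations combine two registered values, for example a packet field with a constant.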
The tuple construction circuit 426 inserts the tuple field value(s) from the computation circuit 424 into the proper location(s) in the in-process tuple (the tuple being constructed). The offset(s) provided in the programmed control information indicates the proper location(s) of the tuple field value(s). The size(s) provided in the programmed control information indicates the number of bits occupied by the tuple field value(s). Once the tuple field value(s) is inserted in the tuple, the tuple and packet are forwarded to the next stage in the pipeline. Since packets are streamed word-wise, a tuple does not necessarily have to wait until the entire packet has been received to proceed to the next stage. Rather, a tuple may be forwarded to the next stage once the word of the packet having the last needed packet field has been extracted and processed to create the tuple field value. If no field is extracted from an input packet to create any tuple field value, the tuple may be forwarded to the next stage at the same time the first packet word is forwarded to the next stage.
The data path including elements 502, 506, 536, 542, 554, 562, 564, and 566 may be viewed as a mask circuit within the tuple construction circuit, and the elements 510, 512, 532, 540, 552, 560, 568, and 572 may be viewed as a tuple insertion circuit within the tuple construction circuit.
The proper size mask is created by selecting a mask word with multiplexer 502 from mask words having mask sizes that correspond to the different possible sizes of tuple fields. In an example implementation, the mask bits are logic 0 bits and are right aligned in a mask word having logic 1 bits in all other positions. For example, for a tuple field of size 8 bits, the rightmost 8 bits of the mask word selected by and output from multiplexer 502 are logic 0 bits, and all other bits of the selected mask word are logic 1. The tuple field size signal 504 selects the proper mask word, and the selected mask word is stored in register 506.
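The mask-word selection can be modeled as a lookup into a small table of precomputed words, one per supported field size. In this sketch the 112-bit tuple width and the supported sizes of 8, 16, and 32 bits are assumptions, as are the names `MASK_WORDS` and `select_mask_word`.

```python
TUPLE_WIDTH_BITS = 112          # assumed tuple width
FIELD_SIZES_BITS = (8, 16, 32)  # assumed set of supported tuple field sizes

ALL_ONES = (1 << TUPLE_WIDTH_BITS) - 1

# One mask word per size: logic 0 in the rightmost `size` bits, logic 1 elsewhere.
MASK_WORDS = {size: ALL_ONES & ~((1 << size) - 1) for size in FIELD_SIZES_BITS}


def select_mask_word(field_size_bits: int) -> int:
    """Model of the multiplexer: the size signal picks one precomputed word."""
    return MASK_WORDS[field_size_bits]
```

Precomputing the words mirrors the multiplexer arrangement, in which the field size signal only selects among fixed inputs and no mask is computed at run time.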
In parallel with the selection of the mask word, the tuple field value is input via multiplexer 510 and register 512. Also, the field enable signal 514 provides the selection of the tuple field value via multiplexer 510 and the field offset via multiplexer 516. The state of the field enable signal is stored in register 518, and the field offset is stored in register 520. The tuple being constructed is input to register 522 also in parallel with selection of the mask word.
The mask word and the tuple field value are shifted in two stages. In stage 526, the tuple field value and the mask word are left shifted by a number of bits indicated by the low-order bits of the field offset 528, and in stage 530 the output of the first shift stage is shifted by a number of bits indicated by the high-order bits of the field offset. In stage 526, multiplexer 532 selects from inputs in which the tuple field value has been left shifted by 0 to n−1 bits. The notation “<<x” in the diagram indicates a circuit that left shifts the input by x bits. The input tuple field value 534 occupies the low-order (right-most) bits of the input word, and the other bits are logic 0. Logic 0 values are shifted in as the tuple field value is left shifted. The mask in the mask word is also left shifted, and multiplexer 536 selects the mask word that was shifted by the same number of bits as the tuple field value. The mask occupies the low-order bits in the input mask word 538, and the other bits are logic 1. Logic 1 bits are shifted in as the mask is left shifted.
The low-order bits of the field offset are used to control the selections by multiplexers 532 and 536. For selecting from words that have been left shifted by 0 to n−1 bits, bits 0 through log2(n) − 1 of the field offset are used.
The selected tuple field value is stored in register 540, and the selected mask word is stored in register 542. The tuple, field enable signal, and field offset are forwarded to registers 544, 546, and 548, respectively, to maintain proper timing within the pipeline and allow the next tuple and tuple field value to be processed.
In stage 530, the tuple field value and the mask are left shifted by a number of bits specified by the high-order bits of the field offset. In stage 530, multiplexer 552 selects from inputs in which the tuple field value has been left shifted by 0, n, 2n, ..., n(n−1) bits, and multiplexer 554 selects from inputs in which the mask has been left shifted by 0, n, 2n, ..., n(n−1) bits. For the tuple field value, logic 0 bits are shifted in, and for the mask word, logic 1 bits are shifted in. For selecting from words that have been left shifted by 0, n, 2n, ..., n(n−1) bits, bits log2(n) through 2·log2(n) − 1 of the field offset are used. The tuple, field enable signal, selected tuple field value, and selected mask word are stored in registers 556, 558, 560, and 562, respectively.
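The two shift stages can be modeled as follows. This is a minimal sketch under stated assumptions: n = 16 (so offsets 0 through 255 are representable), a 112-bit word width, and the helper names `shift_left_fill` and `two_stage_shift`, none of which are fixed by the description.

```python
N = 16                       # assumed number of shift choices per stage
LOG2_N = N.bit_length() - 1  # 4 offset bits select within each stage
WIDTH = 112                  # assumed word width in bits


def shift_left_fill(word: int, amount: int, fill_bit: int, width: int = WIDTH) -> int:
    """Left shift within `width` bits, shifting in copies of fill_bit."""
    shifted = (word << amount) & ((1 << width) - 1)
    if fill_bit:
        shifted |= (1 << amount) - 1  # logic 1 bits enter from the right
    return shifted


def two_stage_shift(word: int, offset: int, fill_bit: int, width: int = WIDTH) -> int:
    """Shift by the low-order offset bits, then by n times the high-order bits."""
    low = offset & (N - 1)               # offset bits 0 .. log2(n)-1
    high = (offset >> LOG2_N) & (N - 1)  # offset bits log2(n) .. 2*log2(n)-1
    stage1 = shift_left_fill(word, low, fill_bit, width)        # 0 .. n-1 bits
    return shift_left_fill(stage1, high * N, fill_bit, width)   # 0, n, ..., n(n-1)
```

For an offset of 37, the first stage shifts by 5 and the second by 32. Splitting the shift this way needs two selections among n inputs instead of one selection among n² inputs. A tuple field value is shifted with `fill_bit=0` and a mask word with `fill_bit=1`.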
The tuple from register 556 and the mask word from register 562 are input to AND circuit 564, which clears the bits in the tuple for the tuple field value to be inserted. The output is stored in register 566, and in parallel, the tuple field value from register 560 is stored in register 568, and the field enable signal is stored in register 570. The tuple with the cleared bits from register 566 and the tuple field value from register 568 are input to OR circuit 572, which outputs the tuple with the tuple field value inserted at the proper offset in the tuple. The tuple is stored in register 574, and in parallel, the field enable signal is forwarded for storage in register 576. The tuple is then ready for the next stage (if any) of the pipeline circuit 400 of
Multiple tuple fields may be inserted into a tuple in parallel in an example implementation. For each tuple field value to be inserted, the circuitry for shifting the tuple field value and constructing and shifting a mask would be replicated. The dashed line 578 input to AND circuit 564 represents the mask word having the shifted mask for the additional tuple field value. The dashed line 580 input to OR circuit 572 represents the additional shifted tuple field value.
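The AND and OR steps, including insertion of several fields at once, can be sketched as below. The function name `insert_fields` and the (value, offset, size) triple format are assumptions for illustration. The sketch also assumes the fields do not overlap, since overlapping fields would be combined by the OR.

```python
from typing import Iterable, Tuple


def insert_fields(tup: int, fields: Iterable[Tuple[int, int, int]], width: int) -> int:
    """Insert each (value, offset_bits, size_bits) field into the tuple."""
    all_ones = (1 << width) - 1
    combined_mask = all_ones
    combined_value = 0
    for value, offset, size in fields:
        field_bits = ((1 << size) - 1) << offset
        # Each shifted mask word has logic 0 only in its own field's bits.
        combined_mask &= ~field_bits & all_ones
        # Each shifted value has logic 0 everywhere outside its field.
        combined_value |= (value << offset) & field_bits
    cleared = tup & combined_mask    # AND step: clear the destination bits
    return cleared | combined_value  # OR step: drop in the new field values
```

Because every mask is ANDed and every value is ORed, adding a field adds one more input to each gate rather than another pass through the tuple, consistent with the additional dashed inputs described above.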
In Step 1, the masks for fields 1 and 2 are constructed. This involves creating a mask of 0xFFFF FFFF FFFF FFFF FFFF FFFF FF00 for field 1 and a mask of 0xFFFF FFFF FFFF FFFF FFFF FFFF 0000 for field 2. Note that the first mask clears 8 bits while the second mask clears 16 bits.
In Step 2, the fields and masks are aligned to the appropriate position in the tuple being constructed by using the appropriate offset for the input field. The aligned field and mask values for field 1 are 0x0006 0000 0000 0000 0000 0000 0000 and 0xFF00 FFFF FFFF FFFF FFFF FFFF FFFF, respectively. The aligned field and mask values for field 2 are 0x0000 0000 0000 0000 0000 0032 0000 and 0xFFFF FFFF FFFF FFFF FFFF 0000 FFFF, respectively.
In Step 3 the masks are applied to the input tuple. This results in a change to the value held in the srcPort from 0xFFFF to 0x0000. There is no change to the proto field, because it was already at 0x00.
In Step 4 the new fields are inserted. This results in a value of 0x06 for the proto field and 0x0032 for the srcPort field.
In Step 5 the result is output. The final tuple value is 0x0006 0000 0000 0000 0000 0032 8888.
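The five steps can be traced in a few lines of code. The input tuple is not stated in the steps themselves; the value used here is an assumption inferred from Step 3 (srcPort initially 0xFFFF, proto initially 0x00) and from the low-order 0x8888 in the result. The bit offsets 96 and 16 are likewise inferred from the aligned values in Step 2, and `hex_groups` is a hypothetical formatting helper.

```python
WIDTH = 112                  # 28 hex digits, as in the values above
ALL_ONES = (1 << WIDTH) - 1


def hex_groups(x: int) -> str:
    """Format a 112-bit value as 0x followed by seven groups of four hex digits."""
    digits = f"{x:028X}"
    return "0x" + " ".join(digits[i:i + 4] for i in range(0, 28, 4))


# Assumed input tuple: proto = 0x00, srcPort = 0xFFFF, low 16 bits = 0x8888.
input_tuple = int("0000 0000 0000 0000 0000 FFFF 8888".replace(" ", ""), 16)

# Step 1: right-aligned masks for an 8-bit field and a 16-bit field.
mask1 = ALL_ONES & ~0xFF
mask2 = ALL_ONES & ~0xFFFF

# Step 2: align values (shift in zeros) and masks (shift in ones).
value1 = 0x06 << 96                       # proto at bit offset 96
value2 = 0x0032 << 16                     # srcPort at bit offset 16
aligned_mask1 = ALL_ONES & ~(0xFF << 96)
aligned_mask2 = ALL_ONES & ~(0xFFFF << 16)

# Step 3: apply the masks to clear the destination bits.
cleared = input_tuple & aligned_mask1 & aligned_mask2

# Steps 4 and 5: insert the new field values and output the result.
result = cleared | value1 | value2
print(hex_groups(result))
```

Under these assumptions, each intermediate value formatted with `hex_groups` matches the corresponding value given in Steps 1 through 5.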
At block 608, the data for generating a tuple field value is obtained. As described above, the data may be one or more fields extracted from the input packet, one or more constant values, or one or more input tuple field values. The tuple field value is generated at block 610. The tuple field value may be an arithmetic or logic function of one or more packet field values, one or more constants, and/or one or more input tuple field values.
The tuple field value is inserted into the tuple at block 612, and the tuple is output at block 614. For stages other than the final stage, the tuple is output for processing by the next stage in the tuple construction pipeline, and for the final stage, the tuple is output from the pipeline.
In some FPGAs, each programmable tile includes a programmable interconnect element (INT 711) having standardized connections to and from a corresponding interconnect element in each adjacent tile. Therefore, the programmable interconnect elements taken together implement the programmable interconnect structure for the illustrated FPGA. The programmable interconnect element INT 711 also includes the connections to and from the programmable logic element within the same tile, as shown by the examples included at the top of
For example, a CLB 702 can include a configurable logic element CLE 712 that can be programmed to implement user logic plus a single programmable interconnect element INT 711. A BRAM 703 can include a BRAM logic element (BRL 713) in addition to one or more programmable interconnect elements. Typically, the number of interconnect elements included in a tile depends on the width of the tile. In the pictured FPGA, a BRAM tile has the same width as five CLBs, but other numbers (e.g., four) can also be used. A DSP tile 706 can include a DSP logic element (DSPL 714) in addition to an appropriate number of programmable interconnect elements. An IOB 704 can include, for example, two instances of an input/output logic element (IOL 715) in addition to one instance of the programmable interconnect element INT 711. As will be clear to those of skill in the art, the actual I/O pads connected, for example, to the I/O logic element 715 are manufactured using metal layered above the various illustrated logic blocks, and typically are not confined to the area of the input/output logic element 715.
In the pictured FPGA, a horizontal area near the center of the die (shown shaded in
Some FPGAs utilizing the architecture illustrated in
Note that
The methods and circuits are thought to be applicable to a variety of systems for constructing tuples. Other aspects and features will be apparent to those skilled in the art from consideration of the specification. The processes and circuits may be implemented as one or more processors configured to execute software, as an application specific integrated circuit (ASIC), or as a logic on a programmable logic device. It is intended that the described features and aspects be considered as examples only, with a true scope of the invention being indicated by the following claims.
Number | Name | Date | Kind |
---|---|---|---|
6778530 | Greene | Aug 2004 | B1 |
7100078 | Pass | Aug 2006 | B1 |
8358653 | Attig et al. | Jan 2013 | B1 |
8385340 | Attig et al. | Feb 2013 | B1 |
8443102 | Attig et al. | May 2013 | B1 |
8625438 | Attig | Jan 2014 | B1 |
20030046429 | Sonksen | Mar 2003 | A1 |
Entry |
---|
“400 Gb/s Programmable Packet Parsing on a single FPGA”, Michael Attig and Gordon Brebner, 2011, provided in IDS. |
Attig M. et al., “400 Gb/s Programmable Packet Parsing on a Single FPGA”, 2011 Seventh ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS '11), Oct. 3-4, 2011, pp. 12-23, Brooklyn, NY, US. |