1. Field of the Invention
This invention is related to the field of network communication and especially Ethernet communication, and more particularly to flow control on networks.
2. Description of the Related Art
Networking of computers and other electronic devices has become ubiquitous. While a variety of networking standards exist, Ethernet is one of the most popular. In particular, Gigabit Ethernet and 10 Gigabit Ethernet are becoming widely used.
The Ethernet standard currently does not permit the interruption of transmission of a packet. That is, once the first byte of a packet is transmitted on the communication media, the transmission must continue with consecutive bytes to the last byte of the packet without any “bubbles” or wait states in the transmission on the communication media (e.g. twisted pair copper wiring, optical fiber, etc.). If the source of the packet cannot supply all of the bytes of a packet, the packet is terminated and the receiver drops the packet as a bad packet.
As the bandwidth of the network interfaces has increased, the likelihood that other factors in a system become bottlenecks to transmission has also increased. For example, memory latency (in reading packets for transmission or writing packets that have been received) can become an issue. Contention for access to the memory (e.g. by processors or other devices in a host system) increases the effective memory latency, further exacerbating the effect.
Memory latency on the transmit side to read the packet from memory may be an issue since the packet may not be read quickly enough for complete transmission without any delays. Buffering in the network controller may be used to mitigate this effect, but it may not be feasible to include enough buffering in some cases. While the Ethernet standard specifies a maximum packet size of about 1500 bytes, many products implement larger packet sizes (e.g. 9 kilobytes or 16 kilobytes). Bandwidth is wasted by transmitting packets that must be dropped because the source cannot complete the transmission.
Similarly, memory latency on the receive side may prevent writing the packet data successfully to memory before a buffer in the network controller (or elsewhere in the system) overflows. The Ethernet standard (and particularly the Institute of Electrical and Electronics Engineers (IEEE) specification 802.3) permits the use of a flow control packet by a receiver. The flow control packet, which is also referred to as a pause packet, can be transmitted from a receiver to the transmitter if the receiver is temporarily unable to receive packets. The flow control packet directs the transmitter to cease transmission of any packets to the receiver for a period of time specified in the packet. The transmitter may transmit up to two more packets, and then ceases packet transmission for the requested time. The flow control packet can be used to avoid dropping packets at the receiver. For example, if memory latency is causing the receiver to be unable to receive packets, the flow control packet can be used to insert delay in packet transmission so that the memory system can “catch up”. However, the transmitter can transmit up to two more packets (each of which may be, e.g., up to 16 kilobytes in size) before the flow control takes effect. These packets can be dropped by the receiver if memory latency is an issue.
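As a rough worked example of the quantities involved: in IEEE 802.3 the pause time carried in a flow control packet is a 16-bit count of pause quanta, each equal to 512 bit times. The sketch below converts that count to seconds and sizes the worst-case receive buffering needed to absorb the up to two further packets described above. The function names and the 16 KB jumbo packet size are illustrative assumptions, not part of any standard API.

```python
PAUSE_QUANTUM_BITS = 512  # IEEE 802.3: one pause quantum is 512 bit times


def pause_duration_s(pause_time: int, link_bps: float) -> float:
    """Convert a 16-bit pause_time value (in pause quanta) to seconds."""
    if not 0 <= pause_time <= 0xFFFF:
        raise ValueError("pause_time is a 16-bit field")
    return pause_time * PAUSE_QUANTUM_BITS / link_bps


def inflight_bytes_after_pause(max_packet_bytes: int, extra_packets: int = 2) -> int:
    """Worst-case bytes the receiver must still accept after sending a pause
    packet, if the transmitter completes up to `extra_packets` more packets."""
    return extra_packets * max_packet_bytes


if __name__ == "__main__":
    for rate in (1e9, 10e9):
        print(f"{rate / 1e9:g} Gb/s: 1 quantum = {pause_duration_s(1, rate) * 1e9:g} ns, "
              f"max pause = {pause_duration_s(0xFFFF, rate) * 1e3:.3f} ms")
    print("buffer for two 16 KB packets:", inflight_bytes_after_pause(16 * 1024), "bytes")
```

At 1 Gb/s one quantum is 512 ns and the largest request (0xFFFF quanta) is about 33.55 ms; two 16 KB packets leave 32,768 bytes that the receiver may still have to accept.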
Quality of service (QOS) metrics are becoming increasingly common on networks. Users can pay for different levels of service. Users for which low bandwidth communication is sufficient and for which communication latency is a lesser issue can pay for low priority service. For other users requiring higher bandwidth and/or dedicated bandwidth, higher priority service can be purchased (typically at higher prices). To manage different levels of service, the network controllers implement separate buffers, or queues, for the different levels. The buffers can be even further subdivided according to user, transmitter, receiver, etc. To abstract the various divisions, a set of channels may be supported and priorities may be assigned to each channel.
In one embodiment, a system comprises a communication medium; a first controller coupled to the communication medium; and a second controller coupled to the communication medium. The first controller is configured to interrupt transmission of a packet on the communication medium to the second controller subsequent to transmission of a first portion of the packet. The first controller is further configured to transmit at least one control symbol on the communication medium in response to interrupting transmission of the packet, and to continue transmission of the packet with a second portion of the packet.
In another embodiment, a controller is configured to communicate packets on a communication medium. The controller comprises a media access controller (MAC) configured to transmit a packet as a plurality of bytes and a physical coding sublayer (PCS) circuit coupled to receive the plurality of bytes from the MAC. The PCS circuit is configured to encode each byte as a respective data symbol for transmission on the communication medium. The MAC is configured to interrupt transmission of the packet subsequent to transmitting a first portion of the plurality of bytes. The PCS circuit is configured to transmit a corresponding data symbol for each byte of the first portion and to transmit at least one control symbol in response to the interruption. The MAC is also configured to continue transmission of a second portion of the plurality of bytes, and the PCS circuit is configured to transmit corresponding data symbols for each byte of the second portion.
In other embodiments, a method comprises interrupting transmission of a packet on a communication medium. The packet comprises a plurality of bytes, and the interrupting is subsequent to transmitting a first portion of the plurality of bytes. Transmitting the first portion comprises encoding each byte of the first portion as a corresponding data symbol. Responsive to the interrupting, the method further comprises transmitting at least one control symbol on the communication medium. Transmission of the packet is continued with a second portion of the plurality of bytes, the transmission including encoding each byte of the second portion as a corresponding data symbol.
The following detailed description makes reference to the accompanying drawings, which are now briefly described.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
Turning now to
The network interface controllers 12A-12B (more briefly referred to below as controllers 12A-12B) are configured to transmit and receive packets on the communication medium 10. The network interface controllers 12A-12B may be link partners for each other on the communication medium 10. A link partner may include any device coupled to a communication medium 10 with a given device and capable of communicating over the communication medium 10 with the given device. In Gigabit/10 Gigabit (G/10G) Ethernet, each physical link is established between a pair of devices which are link partners.
The controllers 12A-12B may be similar, and thus may operate in a similar fashion. The controller 12A (and portions thereof) will be described in more detail below, and the controller 12B may be similar. The controller 12B will thus be the link partner in this example.
The controller 12A may be configured to associate a channel with a given packet. On packet transmission, the channel is specified by software, by storing the packet in memory locations assigned to the channel. The controller 12A may select a channel for transmission, and may read the next packet to be transmitted on that channel from the memory system 34A. Alternatively, the host 14A may include direct memory access (DMA) circuitry that may select the channel and fetch the packet from the memory system 34A to the controller 12A (or the controller 12A may include the DMA circuitry). For packet reception, the controller 12A may include programmable packet classification filters (not shown) that may identify the channel for a received packet. The received packet may be written to memory locations in the memory system 34A assigned to that channel. Packets may include a channel ID field carrying the channel identifier, in some embodiments.
Generally, the MAC 22A may include the circuitry for transmitting packets on behalf of the host 14A, and for receiving packets on behalf of the host 14A. The MAC 22A may also include various other circuitry implementing MAC layer protocols and operations, as needed. The MAC 22A may be configured to transmit packets as a plurality of bytes and to receive packets as a plurality of bytes.
The PCS 20A is coupled to the MAC 22A, and provides the line coding/decoding for the packets being transmitted. For example, G/10G Ethernet specifies 8b/10b encoding for the data transmission on the communication medium. Accordingly, the PCS 20A receives data from the MAC 22A for transmission (e.g. packets) and converts each 8 bit byte to a 10 bit symbol. Each 10 bit symbol received from the PMA 18A is converted to the corresponding 8 bit byte and provided to the MAC 22A. In the illustrated embodiment, the Gigabit Media Independent Interface (GMII) is used between the MAC 22A and the PCS 20A. Other embodiments may use the 10 Gigabit MII (XGMII). Still other embodiments may use any other interface.
The PMA 18A receives 8b/10b symbols from the PCS and converts them for transmission on the physical communication medium 10, and converts received signals to the 8b/10b symbols. For example, the symbols may be serially transmitted on one or more lanes of the communication medium 10. The PMD 16A includes the circuitry that physically drives and receives on the communication medium 10.
The communication medium 10 may comprise any medium over which packets may be transmitted between link partners. For example, in one embodiment, twisted pair copper cabling may be used. In another embodiment, optical fiber interconnect may be used. For Gigabit Ethernet, one lane of twisted pair or optical fiber may be provided in each direction. For 10G Ethernet, typically 4 lanes of optical fiber may be used in each direction, although twisted pair is also possible in some cases. Other communication media may be used in other embodiments. Still further, wireless communication media (e.g. broadcast in the air) may be used.
The 8b/10b symbol code space may be divided into data symbols and control symbols. Data symbols are symbols that represent particular data values. Each possible data value of a byte is mapped to at least one of the data symbols. In one implementation, each data value maps to two data symbols. One of the two symbols is selected for transmission for a given data value dependent on other transmission factors. For a given byte, the PCS 20A may be configured to generate the corresponding data symbol. The control symbols may be used to transmit control information. For example, control symbols may be defined to represent the start of a packet and the end of a packet. An idle control symbol may be defined, which indicates that no data is being transmitted. The idle control symbol is defined to be transmitted between packets (that is, between the end of packet symbol for one packet and the start of packet symbol for the next packet). The idle control symbol is also used, in some embodiments, as a control symbol transmitted if packet transmission is interrupted, described in more detail below.
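In standard 8b/10b coding, the transmission factor that selects between the two symbols for a value is the running disparity, the running balance of ones and zeros sent on the line. The sketch below shows that selection rule at the level of whole 10-bit code groups, using only two entries (the data value D0.0 and the control symbol K28.5) transcribed from the standard 8b/10b tables. The table and function names are hypothetical, and a real encoder tracks disparity per 6-bit and 4-bit sub-block and covers all 256 data values and the full set of control symbols.

```python
from typing import Dict, Tuple

# Two entries from the standard 8b/10b tables, written as bits abcdei fghj.
# Each value has an alternative for negative (RD-) and positive (RD+) running disparity.
CODE_TABLE: Dict[str, Tuple[str, str]] = {
    "D0.0": ("1001110100", "0110001011"),   # both alternatives balanced (disparity 0)
    "K28.5": ("0011111010", "1100000101"),  # alternatives have disparity +2 / -2
}


def disparity(code: str) -> int:
    """Number of ones minus number of zeros in a code group."""
    return code.count("1") - code.count("0")


def select_code(symbol: str, rd: int) -> Tuple[str, int]:
    """Choose the alternative for the current running disparity (rd is -1 or +1)
    and return it together with the running disparity after sending it."""
    rd_minus, rd_plus = CODE_TABLE[symbol]
    code = rd_minus if rd < 0 else rd_plus
    d = disparity(code)
    new_rd = rd if d == 0 else (1 if d > 0 else -1)
    return code, new_rd


if __name__ == "__main__":
    rd = -1
    for sym in ["K28.5", "D0.0", "K28.5"]:
        code, rd = select_code(sym, rd)
        print(sym, code, "RD after:", "+" if rd > 0 else "-")
```

Because control symbols occupy code groups that no data value maps to, a receiver can always distinguish an idle symbol sent in the middle of a packet from packet data.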
The MAC 22A may be configured to interrupt transmission of a packet during the transmission. That is, the MAC 22A transmits the packet as a plurality of bytes, and may interrupt the transmission subsequent to transmitting a first portion of the bytes (and prior to transmitting a second portion of the bytes). Each portion comprises at least one byte, and may comprise any number of bytes. The MAC 22A may interrupt a packet transmission multiple times, and thus there may be additional portions (i.e. a third portion, a fourth portion, etc.).
The PCS 20A, in response to an interruption of the packet, may be configured to generate at least one control symbol for transmission. In one embodiment, for example, idle control symbols may be generated by the PCS 20A until the MAC 22A resumes transmission of the packet (or another packet, in some embodiments). The controller 12A may have a defined transmission bandwidth (e.g. a dedicated transmission path on the communication medium 10, in the illustrated embodiments). The PCS 20A may generate idle symbols to fill the transmission bandwidth until the MAC 22A resumes transmission. In other embodiments, other control symbols may be generated. For example, another control symbol may be defined to indicate that the packet transmission is being paused, and will be resumed again. Such a control symbol may be transmitted by the PCS 20A. Still further, the end of packet symbol may be used, if the receiver otherwise is informed that the actual end of packet comes later. For example, some embodiments below may transmit a channel indication and a packet indication (start, middle, or end) with the start of packet symbol. The packet indication may indicate which portion of the packet is being transmitted.
The interface between the MAC 22A and the PCS 20A may include explicit signalling of the start and end of packets. For example, the GMII interface includes a data valid signal. Assertion of the data valid signal is currently interpreted as an implicit start of packet and deassertion of the data valid signal is currently interpreted as an implicit end of packet. By adding explicit start of packet and end of packet signalling, the data valid signal may be deasserted during packet transfer to interrupt the flow of packet bytes without causing the packet to terminate. The XGMII interface includes control values in the data transfer. An additional control value may be generated, or a current control value (such as idle) may be used. Alternatively, separate explicit start of packet and end of packet signalling may be used.
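To illustrate why explicit start and end signalling matters, the sketch below interprets the same sequence of interface cycles two ways: with the implicit rule (assertion of data valid marks a start, deassertion marks an end) and with explicit SOP/EOP signals. The cycle tuple layout and function names are assumptions made for illustration; the sketch does not model GMII timing.

```python
from typing import List, Optional, Tuple

# One tuple per clock cycle: (data_valid, sop, eop, byte or None).
Cycle = Tuple[bool, bool, bool, Optional[int]]


def frames_implicit(cycles: List[Cycle]) -> List[bytes]:
    """Implicit rule: assertion of data valid starts a packet, deassertion ends it."""
    frames: List[bytes] = []
    cur: Optional[bytearray] = None
    for valid, _sop, _eop, byte in cycles:
        if valid:
            if cur is None:
                cur = bytearray()
            cur.append(byte)
        elif cur is not None:
            frames.append(bytes(cur))  # the gap ends the packet early
            cur = None
    if cur is not None:
        frames.append(bytes(cur))
    return frames


def frames_explicit(cycles: List[Cycle]) -> List[bytes]:
    """Explicit rule: only SOP and EOP delimit a packet; gaps merely pause it."""
    frames: List[bytes] = []
    cur: Optional[bytearray] = None
    for valid, sop, eop, byte in cycles:
        if not valid:
            continue  # packet interrupted, not terminated
        if sop:
            cur = bytearray()
        if cur is not None:
            cur.append(byte)
            if eop:
                frames.append(bytes(cur))
                cur = None
    return frames


if __name__ == "__main__":
    cycles = [(True, True, False, 0x01), (True, False, False, 0x02),
              (False, False, False, None),  # interruption
              (True, False, True, 0x03)]
    print(frames_implicit(cycles))  # [b'\x01\x02', b'\x03']
    print(frames_explicit(cycles))  # [b'\x01\x02\x03']
```

Under the implicit rule the interruption splits one packet into two truncated ones, which a receiver would drop as bad packets; with explicit delimiters the packet survives intact.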
The MAC 22A may interrupt packet transmission in response to one or more events, in various embodiments. Any combination of sets of events may be implemented. One event may be the reception of a flow control packet from the link partner of the controller 12A (e.g. the controller 12B in the embodiment of
By interrupting packet transmission for a flow control packet, the controller 12A may reduce the incidence of dropped packets due to buffer overflow in the link partner, in some embodiments. Less buffering may be implemented for handling the flow controlled case, in some embodiments, since the controller 12A does not transmit the up to 2 additional packets that would otherwise follow receipt of the flow control packet. By interrupting packet transmission when bytes are temporarily unavailable, effects of memory latency/contention may be mitigated and the incidence of packet dropping due to memory latency in reading the packet may be reduced, in some embodiments. Interleaving packets from different channels may permit prioritizing higher priority packets over lower priority packets even during transmission of lower priority packets, without causing packet drop, in some embodiments. Furthermore, interleaving packets may, in some embodiments, simplify interfacing to a link partner that may bridge to an explicitly channelized interface such as the System Packet Interface Level 4 (SPI-4).
A flow control packet may generally be any packet which, when received in a device that communicates on the communication medium, is defined to cause the receiver to inhibit transmitting at least some packets to the initiator of the flow control packet. The flow control packet specified by the Ethernet standard, which includes a time field specifying the time interval during which packet transmission is to be inhibited, may be an example of a flow control packet. In another embodiment, a channelized flow control packet may be supported which specifies the time interval but also specifies the channels to which the flow control packet applies. In response to a channelized flow control packet, transmission of packets may be inhibited for the specified channels but permitted for other channels. The MAC 22A may interrupt transmission of a packet if the packet is in one of the specified channels. The channelized flow control packet may include a channel indication field, which may be coded to identify the channel(s) (e.g. a channel number, a list of channel numbers and optionally a number of channels in the list, a bit mask with a bit per channel that may be set to identify the channel, etc.).
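A bit mask is one of the channel indication encodings listed above. The sketch below assumes a hypothetical 32-channel mask in which bit n names channel n, builds the mask, and shows the resulting decision: an in-progress packet is interrupted only while the pause interval is active and the packet's channel is one of those named. All names here are illustrative.

```python
from typing import Iterable

NUM_CHANNELS = 32  # hypothetical mask width


def channel_mask(channels: Iterable[int]) -> int:
    """Encode channel numbers as a bit mask: bit n set means channel n is paused."""
    mask = 0
    for ch in channels:
        if not 0 <= ch < NUM_CHANNELS:
            raise ValueError(f"channel {ch} outside 0..{NUM_CHANNELS - 1}")
        mask |= 1 << ch
    return mask


def channel_paused(mask: int, channel: int) -> bool:
    return (mask >> channel) & 1 == 1


def should_interrupt(packet_channel: int, mask: int, pause_active: bool) -> bool:
    """Interrupt the current packet only if the pause interval has not expired and
    the flow control packet named this packet's channel; other channels continue."""
    return pause_active and channel_paused(mask, packet_channel)


if __name__ == "__main__":
    mask = channel_mask([1, 3])          # 0b1010
    print(should_interrupt(3, mask, True))   # True: channel 3 is paused
    print(should_interrupt(2, mask, True))   # False: channel 2 keeps transmitting
```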
The hosts 14A-14B may comprise any circuitry that uses the controllers 12A-12B to connect to a network (e.g. the communication medium 10 may be part of a network). As illustrated in
It is noted that, in addition to interrupting packet transmission as described above, the controller 12A may also terminate packet transmission (e.g. due to error, lengthy expected delay, or other reasons that may make dropping the packet desirable).
Turning now to
The interface illustrated in
A start of packet (SOP) signal and an end of packet (EOP) signal are also provided to explicitly signal a start and end of a packet. The MAC 22A may assert the SOP signal along with the TXV signal for the initial byte of a transmitted packet, and may assert the EOP signal along with the TXV signal for the last byte of a transmitted packet. Thus, the TXV signal may be deasserted during packet transmission to interrupt the transmission of the packet. The PCS 20A may similarly use the SOP and EOP signals to signal the start and end of received packets, and may deassert the RXV signal to indicate that the packet being received has been interrupted. The SOP and EOP signals driven by the MAC 22A (for transmitted packets) may be separate from the SOP and EOP signals driven by the PCS 20A (for received packets).
Since half duplex operation is not used in G/10G Ethernet, the CRS and COL signals are not used. In this embodiment, each of the signal lines carrying the CRS and COL signals may be reused as one of the SOP and EOP signals.
The XGMII interface specifies 32 bit TX and RX data buses (TXD and RXD buses in
Thus, the MAC 22A may use the start and terminate bytes as explicit start and end of packet indications. Alternatively, the SOP and EOP control signals may be included (dashed lines in
It is noted that, while an explicit start of packet and end of packet indication are included in the interfaces of
Turning now to
If the MAC 22A is initiating a packet transmission (decision block 40, “yes” leg), the MAC 22A may assert the SOP control signal (block 42). Similarly, if the MAC 22A has reached the end of a packet transmission or is terminating for another reason (decision block 44, “yes” leg), the MAC 22A may assert the EOP control signal (block 46). If the MAC 22A receives a flow control (FC) packet or has a flow control in progress from a previously received FC packet (decision block 48, “yes” leg), the MAC 22A may deassert the data valid (TXV) signal even if the MAC 22A has data to transmit (block 50). In this fashion, packet transmission may be interrupted and inhibited during the time interval requested by the FC packet. If no FC packet has been received or is in progress, and the MAC 22A has no data to transmit (decision block 52, “no” leg), the MAC 22A may also deassert the data valid (TXV) signal (block 50). Thus, packet transmission may be interrupted if no data is ready to be transmitted. If there is no FC packet received or in progress and there is data ready to be transmitted, the MAC 22A may assert the data valid (TXV) signal and transmit the data (block 54).
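The per-cycle decisions of the transmit flowchart can be sketched as a single function. This is a simplified model, not the controller's actual logic: the signal bundle and parameter names are hypothetical, and the sketch asserts SOP and EOP only together with a valid byte, as the interface description above specifies, rather than evaluating them independently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TxSignals:
    """Hypothetical MAC-to-PCS signal bundle for one TXCLK cycle."""
    sop: bool = False
    eop: bool = False
    txv: bool = False
    txd: Optional[int] = None


def mac_tx_cycle(first_byte: bool, last_byte: bool,
                 flow_control_active: bool, next_byte: Optional[int]) -> TxSignals:
    """One cycle of the transmit decisions (blocks 40-54)."""
    if flow_control_active or next_byte is None:
        # Blocks 48/52 -> 50: deassert TXV, interrupting the packet for this cycle.
        return TxSignals()
    # Block 54: drive the byte; blocks 42/46 flag the first and last bytes.
    return TxSignals(sop=first_byte, eop=last_byte, txv=True, txd=next_byte)


if __name__ == "__main__":
    print(mac_tx_cycle(True, False, False, 0x55))   # first byte, TXV asserted
    print(mac_tx_cycle(False, False, True, 0x56))   # pause in effect: TXV deasserted
```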
Turning now to
If the SOP signal is asserted by the MAC 22A (decision block 60, “yes” leg), the PCS 20A may transmit the SOP control symbol to the PMA 18A for transmission on the communication medium 10 (block 62). Similarly, if the EOP signal is asserted (decision block 64, “yes” leg), the PCS 20A may transmit the EOP control symbol to the PMA 18A (block 66). If the data valid (TXV) signal is asserted by the MAC 22A (decision block 68, “yes” leg), the PCS 20A may generate the 8b/10b encoding of the data (that is, the corresponding data symbol) and transmit the data symbol to the PMA 18A (block 70). If the data valid signal is not asserted by the MAC 22A (decision block 68, “no” leg), the PCS 20A may transmit the idle control symbol (block 72). Thus, the idle control symbol may be transmitted during times that a packet is interrupted, as well as between packets, in this embodiment.
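The PCS transmit flowchart can be sketched as a per-cycle function. Symbols are represented by placeholder strings ("SOP", "EOP", "IDLE", and "Dxx" for the data symbol of byte xx) rather than real 10-bit code groups; the sketch assumes that the SOP symbol precedes the byte it accompanies and the EOP symbol follows it, and the function name is hypothetical.

```python
from typing import List, Optional


def pcs_tx_cycle(sop: bool, eop: bool, txv: bool, txd: Optional[int]) -> List[str]:
    """Return the placeholder symbols sent to the PMA for one MAC cycle."""
    symbols: List[str] = []
    if sop:
        symbols.append("SOP")                  # blocks 60/62
    if txv:
        if txd is None:
            raise ValueError("TXV asserted without data")
        symbols.append(f"D{txd:02X}")          # blocks 68/70: encode the byte
    else:
        symbols.append("IDLE")                 # block 72: fill the gap with idle
    if eop:
        symbols.append("EOP")                  # blocks 64/66
    return symbols


if __name__ == "__main__":
    cycles = [(True, False, True, 0x10), (False, False, False, None),
              (False, True, True, 0x11)]
    stream = [s for c in cycles for s in pcs_tx_cycle(*c)]
    print(stream)  # ['SOP', 'D10', 'IDLE', 'D11', 'EOP']
```

The same idle symbol fills a gap whether it falls between packets or inside one; only the surrounding SOP and EOP symbols distinguish the two cases.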
Turning now to
If the PCS 20A receives an SOP control symbol from the PMA 18A (decision block 80, “yes” leg), the PCS 20A may assert the SOP control signal to the MAC 22A (block 82). Similarly, if the PCS 20A receives an EOP control symbol from the PMA 18A (decision block 84, “yes” leg), the PCS 20A may assert the EOP control signal to the MAC 22A (block 86). If an idle symbol is received from the PMA 18A (decision block 88, “yes” leg), the PCS 20A may deassert the data valid (RXV) signal to the MAC 22A (block 92). The MAC 22A may ignore the data on the RXD bus and await the next assertion of the data valid (RXV) signal. Otherwise, the PCS 20A may decode the data symbol and provide the data on the RXD bus, asserting the data valid (RXV) signal (block 90).
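On the receive side, the flowchart can be sketched as a translation from line symbols back to per-cycle interface signals. The placeholder symbol strings ("SOP", "EOP", "IDLE", "Dxx"), the RxSignals bundle, and the function name are assumptions; the sketch flags the byte following an SOP symbol as the start and the byte preceding an EOP symbol as the end.

```python
from typing import List, NamedTuple, Optional


class RxSignals(NamedTuple):
    """Hypothetical PCS-to-MAC signal bundle for one RXCLK cycle."""
    sop: bool
    eop: bool
    rxv: bool
    rxd: Optional[int]


def pcs_rx(symbols: List[str]) -> List[RxSignals]:
    out: List[RxSignals] = []
    pending_sop = False
    for sym in symbols:
        if sym == "SOP":                          # blocks 80/82
            pending_sop = True
        elif sym == "EOP":                        # blocks 84/86: mark the last byte
            if out and out[-1].rxv:
                out[-1] = out[-1]._replace(eop=True)
        elif sym == "IDLE":                       # blocks 88/92: RXV deasserted
            out.append(RxSignals(False, False, False, None))
        elif sym.startswith("D"):                 # block 90: decode, RXV asserted
            out.append(RxSignals(pending_sop, False, True, int(sym[1:], 16)))
            pending_sop = False
        else:
            raise ValueError(f"unknown symbol {sym!r}")
    return out


if __name__ == "__main__":
    rx = pcs_rx(["SOP", "D10", "IDLE", "D11", "EOP"])
    # The MAC ignores RXD while RXV is deasserted and keeps the packet open.
    print(bytes(c.rxd for c in rx if c.rxv))  # b'\x10\x11'
```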
It is noted that the flowcharts of
Turning next to
The SOP signal is asserted coincident with the transmission of the initial byte of the packet (P1), and is then deasserted. The TXV signal is also asserted, and remains asserted for the transmission of two additional bytes (P2 and P3). The TXV signal is deasserted for 4 cycles of the TXCLK, and then reasserted for the transfer of 5 bytes (P4 to P8). The TXV signal is again deasserted for 2 cycles, then reasserted for the transfer of 2 more bytes (P9 to P10). The EOP signal is asserted coincident with the transmission of P10, signalling the end of the packet.
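The waveform described above can be reproduced with a short script. It builds the per-cycle SOP, EOP, and TXV values from the burst pattern in the example (3 bytes, 4 idle cycles, 5 bytes, 2 idle cycles, 2 bytes) and lists the symbols that a PCS behaving as described above would place on the medium. The burst encoding, names, and placeholder symbol strings are illustrative assumptions.

```python
from typing import List, NamedTuple, Tuple


class Cycle(NamedTuple):
    sop: bool
    eop: bool
    txv: bool
    byte: str  # label such as "P4"; empty while TXV is deasserted


# Burst pattern from the example: (bytes sent with TXV asserted, idle cycles that follow)
EXAMPLE: List[Tuple[int, int]] = [(3, 4), (5, 2), (2, 0)]


def build_trace(bursts: List[Tuple[int, int]]) -> List[Cycle]:
    """Per-TXCLK-cycle values of SOP, EOP, and TXV for one interrupted packet."""
    total = sum(valid for valid, _ in bursts)
    trace: List[Cycle] = []
    n = 0
    for valid, idle in bursts:
        for _ in range(valid):
            n += 1
            trace.append(Cycle(sop=(n == 1), eop=(n == total), txv=True, byte=f"P{n}"))
        trace.extend(Cycle(False, False, False, "") for _ in range(idle))
    return trace


def line_symbols(trace: List[Cycle]) -> List[str]:
    """SOP before the first byte, IDLE for each gap cycle, EOP after the last byte."""
    symbols: List[str] = []
    for c in trace:
        if c.sop:
            symbols.append("SOP")
        symbols.append(c.byte if c.txv else "IDLE")
        if c.eop:
            symbols.append("EOP")
    return symbols


if __name__ == "__main__":
    print(" ".join(line_symbols(build_trace(EXAMPLE))))
    # SOP P1 P2 P3 IDLE IDLE IDLE IDLE P4 P5 P6 P7 P8 IDLE IDLE P9 P10 EOP
```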
As mentioned previously, in some embodiments, the MAC 22A may also be configured to interleave packets from different channels. In such embodiments, the MAC 22A may transmit bytes identifying the channel number for each set of bytes transmitted on the interconnect. The controller 12A may transmit an SOP control symbol to indicate that bytes of a packet are being transmitted. Another symbol identifying the bytes as the start, middle, or end of the corresponding packet may also be transmitted.
If the transmission is the start of a packet or is the resumption of the packet after interruption (decision block 100, “yes” leg), the PCS 20A may transmit the SOP control symbol and the MAC 22A may prepend the packet data with the channel number and an indication of whether the bytes comprise the start, middle, or end of the packet (block 102). The bytes may be the start of the packet if they include the initial byte of the packet. The bytes may be the middle of the packet if they do not include the initial byte or the last byte of the packet. The bytes may be the end of the packet if they include the last byte of the packet.
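The start/middle/end rule can be sketched as follows. The numeric codes for the indication, the big-endian two-byte channel field, and the function names are hypothetical, and the cut points stand in for wherever the MAC happens to interrupt the packet. The text leaves open how to mark a portion that contains both the initial and the last byte (an uninterrupted packet); this sketch labels it as a start.

```python
from enum import IntEnum
from typing import List


class Position(IntEnum):
    START = 0   # hypothetical codes for the start/middle/end indication
    MIDDLE = 1
    END = 2


def classify(offset: int, length: int, packet_len: int) -> Position:
    """Classify the portion packet[offset:offset + length]."""
    if offset == 0:
        return Position.START        # includes the initial byte
    if offset + length == packet_len:
        return Position.END          # includes the last byte
    return Position.MIDDLE           # includes neither


def portion_header(channel: int, pos: Position) -> bytes:
    """Two channel-number bytes followed by the one-byte indication."""
    return channel.to_bytes(2, "big") + bytes([pos])


def split_and_tag(packet: bytes, channel: int, cut_points: List[int]) -> List[bytes]:
    """Split a packet at sorted interruption offsets and prepend each portion's header."""
    bounds = [0, *cut_points, len(packet)]
    return [portion_header(channel, classify(a, b - a, len(packet))) + packet[a:b]
            for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    for part in split_and_tag(bytes(range(10)), channel=5, cut_points=[3, 8]):
        print(part.hex())
```

Each tagged portion carries its own channel number, so portions from different channels could be sent in any order and still be reassembled per channel.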
In some embodiments, each packet transmission that is interrupted may include an EOP control symbol as well as an SOP control symbol. Such transmissions may be compatible with link partners that do not implement the channel information and/or the flow controlling of packets on the communication medium 10. While two bytes/symbols are used for the channel number in the illustrated embodiment, other embodiments may use one byte/symbol or more than two bytes/symbols for the channel number. Additionally, the indication of start/middle/end of the packet may be transmitted before the channel symbols, if desired (e.g. on lane 1 instead of lane 3).
In one embodiment, the channel number symbols and the start/middle/end symbol may replace the first three bytes of the preamble of the packet. In G/10G Ethernet, the preamble is not strictly required, and so replacing these bytes should not affect functionality.
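Assuming the standard Ethernet preamble of seven 0x55 bytes followed by the start frame delimiter 0xD5, the replacement can be sketched at the byte level. The field order (channel number first, then the start/middle/end indication) follows the default described above; the function name and the big-endian channel encoding are assumptions, and the sketch works on MAC bytes only, ignoring how the PCS maps the start of the packet onto line symbols.

```python
PREAMBLE = bytes([0x55] * 7)  # standard Ethernet preamble
SFD = 0xD5                    # start frame delimiter


def tagged_preamble(channel: int, indication: int) -> bytes:
    """Replace the first three preamble bytes with a two-byte channel number and a
    one-byte start/middle/end indication; the remaining preamble and SFD are kept."""
    if not 0 <= indication <= 0xFF:
        raise ValueError("indication must fit in one byte")
    header = channel.to_bytes(2, "big")
    return header + bytes([indication]) + PREAMBLE[3:] + bytes([SFD])


if __name__ == "__main__":
    print(tagged_preamble(0x0102, 2).hex())  # 01020255555555d5
```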
In some embodiments, the use of explicit flow control of packets (and/or transmission of channel information with packet data) may be enabled or disabled dependent on whether or not the link partner supports the features. For example, on Ethernet networks, an auto negotiation protocol is used at power up for link partners to determine each other's capabilities. After the standard auto negotiation, the link partners may exchange messages about other capabilities.
The controller 12A may perform standard auto negotiation (block 120) followed by negotiation for explicit flow control (block 122). If the link partner supports explicit flow control (decision block 124, “yes” leg), the controller 12A may enable explicit flow control (block 126). Otherwise, the controller 12A may disable explicit flow control (block 128). The controller 12A may also negotiate for channel information transmission (block 130). If the link partner supports channel information transmission (decision block 132, “yes” leg), the controller 12A may enable channel information transmission (block 134). Otherwise, the controller 12A may disable channel information transmission (block 136).
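The enable/disable decisions that follow negotiation can be sketched as below. The Capabilities record and function name are hypothetical and the capability message exchange itself is not modelled; the sketch also checks the local controller's own support, which the flowchart takes for granted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical record of the optional features a controller supports."""
    explicit_flow_control: bool
    channel_info: bool


def resolve_features(local: Capabilities, partner: Capabilities) -> Capabilities:
    """Enable each optional feature only if both link partners support it
    (blocks 124-128 for explicit flow control, 132-136 for channel information)."""
    return Capabilities(
        explicit_flow_control=local.explicit_flow_control and partner.explicit_flow_control,
        channel_info=local.channel_info and partner.channel_info,
    )


if __name__ == "__main__":
    mine = Capabilities(explicit_flow_control=True, channel_info=True)
    partner = Capabilities(explicit_flow_control=True, channel_info=False)
    print(resolve_features(mine, partner))
    # Capabilities(explicit_flow_control=True, channel_info=False)
```

Each feature is resolved independently, mirroring the two separate negotiation steps in the flowchart.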
Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
This application is a continuation of U.S. patent application Ser. No. 11/211,259, filed on Aug. 25, 2005.
Relationship | Number | Date | Country
---|---|---|---
Parent | 11211259 | Aug 2005 | US
Child | 12753466 | | US