This disclosure relates to physical (PHY) layer devices. This disclosure also relates to a PHY layer device for providing burst absorption of network traffic.
Rapid advances in electronics and communication technologies, driven by immense user demand, have resulted in vast interconnected networks of computing devices capable of exchanging large amounts of data. Local Area Networks (LANs) may connect dozens or hundreds of computing devices in a single network. Perhaps the best known example of such interconnection of computing devices is the Internet or the World Wide Web, which continues to expand with each passing day. As technology continues to advance and interconnected computer networks grow in size and frequency of use, there is an increasing incentive to send and receive data more efficiently.
The innovation may be better understood with reference to the following drawings and description. In the figures, like reference numerals designate corresponding parts throughout the different views.
The discussion below makes reference to a PHY layer device. A PHY layer device may refer to a device that is implemented in the first layer (layer 1 or the physical layer) of the Open System Interconnection (OSI) model. Accordingly, the PHY layer device may be implemented without Media Access Control (MAC) logic. Logic implemented in the PHY layer device may be transparent to higher OSI level functionality of a network device and to external network devices as well. Other implementations of PHY layer devices are possible, however.
PHY Device 1110 includes a port interface 130. The port interface 130 may be communicatively coupled to a data port, such as an Ethernet port, a FireWire Port, a Universal Serial Bus (USB) port, or any other port configured to send or receive data, which the port may send or receive as a serial stream, or in other ways. The PHY Device 1110 may receive incoming data from the data port and transmit outgoing data to the data port through the port interface 130. The PHY Device 1110 also includes a switch interface 132 through which the PHY Device 1110 can send data to the switch 120 and receive data from the switch 120.
PHY Device 1110 further includes a receive datapath 140. PHY Device 1110 may process incoming data received at the port interface 130 through the receive datapath 140 before sending the data to the switch 120. The receive datapath 140 may include any number of units, hardware, logic, or modules to allow PHY Device 1110 to process incoming data received from a data port. In the example shown in
The deserializer unit 144 and the serializer unit 148 may collectively form a “SerDes” unit that may encode or decode data according to a SerDes encoding technique. For example, the SerDes in the receive datapath 140 (e.g., the deserializer unit 144 and the serializer unit 148) may encode incoming data received from the data port according to an 8b/10b SerDes encoding technique or a 64b/66b SerDes encoding technique. To that end, the SerDes may produce a 10 bit symbol from 8 bits of incoming data according to an 8b/10b encoding technique or a 66 bit symbol from 64 bits of incoming data according to a 64b/66b encoding technique. In one implementation, PHY Device 1110 may encode the incoming data through the deserializer unit 144 prior to storing the data in the queue 146. Alternatively, PHY Device 1110 may encode the incoming data through the serializer unit 148 after retrieving data from the queue 146. The queue 146 may be implemented as a First-In-First-Out (FIFO) queue.
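As a worked illustration of the coding overhead described above (not part of the disclosure itself), the following Python sketch computes the extra bits each scheme adds per payload bit and the serial line rate needed to carry a given payload rate. The payload rates used in the usage note are only examples.

```python
from fractions import Fraction

# (payload bits, coded bits) for the two line codes named in the text
LINE_CODES = {
    "8b/10b": (8, 10),
    "64b/66b": (64, 66),
}


def overhead(code: str) -> Fraction:
    """Extra coded bits added per payload bit, as an exact fraction."""
    payload, coded = LINE_CODES[code]
    return Fraction(coded - payload, payload)


def line_rate(payload_rate_gbps: float, code: str) -> float:
    """Serial line rate (Gb/s) needed to carry payload_rate_gbps of data."""
    payload, coded = LINE_CODES[code]
    return payload_rate_gbps * coded / payload


if __name__ == "__main__":
    for name in LINE_CODES:
        print(name, float(overhead(name)), line_rate(10.0, name))
```

For instance, carrying 10 Gb/s of payload with 64b/66b coding requires a 10.3125 Gb/s serial line, while 8 Gb/s of payload with 8b/10b coding requires a 10 Gb/s line.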
PHY Device 1110 also includes a transmit datapath 150 that may include any number of units, hardware, logic, or modules to allow PHY Device 1110 to process outgoing data received from the switch 120 before transmitting the outgoing data to a data port. For example, the switch 120 may communicate outgoing data to PHY Device 1110 as 10 bit or 66 bit symbols encoded according to an 8b/10b or 64b/66b SerDes encoding technique. The transmit datapath 150 may also include a SerDes unit to transform the received 10 bit or 66 bit symbols into corresponding 8 bit or 64 bit outgoing data.
PHY Device 1110 shown in
In operation, the system 100 may provide burst absorption of network traffic in the switch 120 through packet buffering in PHY layer devices, such as PHY Device 1110. For example, the switch 120 may identify high levels of network traffic congestion, such as when an ingress queue of the switch 120 reaches capacity or surpasses a high congestion threshold. The absorption logic 160 may then buffer incoming data received by PHY Device 1110 in the queue 146 and transmit the data to the switch 120 at a reduced rate, as discussed in greater detail below. Because PHY Device 1110 sends the incoming data to the switch 120 at a reduced rate, data may accumulate in the switch ingress queue more slowly. The switch 120 may then be able to process network data in the ingress queue faster than data accumulates there, lessening the amount of data stored in the ingress queue and alleviating the high level of network traffic congestion. When the high congestion level of network traffic in the switch 120 has passed, PHY Device 1110 may transmit data to the switch 120 at an accelerated rate in order to empty the network data buffered in the queue 146 of PHY Device 1110.
The absorption logic 160 may throttle the rate at which bits of data are transferred to the switch 120 (the transfer rate). As one example, the absorption logic 160 may slow the clock frequency of portions of the PHY Device 1110, including, for example, the serializer unit 148 and the switch interface 132. In this way, the earlier portions of the receive datapath 140, e.g., the port interface 130, the CDR unit 142, and the deserializer unit 144, may process incoming data at the initial or normal rate while later portions of the datapath, e.g., the serializer unit 148 and the switch interface 132, process the incoming data at the reduced rate. The difference in processing speed may cause incoming data to accumulate in the queue 146. Similarly, PHY Device 1110 may transfer bits of data to the switch 120 at an accelerated transfer rate by increasing the clock frequency of the later portions of the receive datapath 140, thereby emptying the contents of the queue 146.
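A minimal discrete-time model can show why a clock-rate mismatch between the two halves of the receive datapath 140 fills or drains the queue 146. The sketch below assumes both halves move whole words per common time step; the function name and parameters are hypothetical.

```python
def simulate_clock_throttle(in_words_per_step: int, out_words_per_step: int,
                            steps: int, start_depth: int = 0) -> list:
    """Return the depth of the (modeled) queue 146 after each time step."""
    depth = start_depth
    history = []
    for _ in range(steps):
        # Earlier stages (port interface, CDR, deserializer) keep writing
        # at the nominal rate.
        depth += in_words_per_step
        # Later stages (serializer, switch interface) read at the scaled
        # clock rate and cannot read more than is buffered.
        depth -= min(depth, out_words_per_step)
        history.append(depth)
    return history


if __name__ == "__main__":
    print(simulate_clock_throttle(4, 3, 5))                 # throttled
    print(simulate_clock_throttle(4, 5, 3, start_depth=3))  # accelerated
```

With four words arriving and three leaving per step, the queue grows by one word per step (`[1, 2, 3, 4, 5]` over five steps); raising the output rate to five words per step drains a three-word backlog in three steps (`[2, 1, 0]`).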
Alternatively, PHY Device 1110 may transmit the incoming data to the switch 120 at a reduced rate without changing clock frequency and without affecting the content of the incoming data. In the example shown in
In a similar fashion, PHY Device 1110 may transmit data to the switch 120 at an accelerated rate without changing clock frequency of any portion of the receive datapath 140. For example, and as discussed below, the absorption logic 160 may skip or forego transmission of selected data, such as an idle character, word, or symbol, in the received incoming data. In this manner, PHY Device 1110 may forward the incoming data to the switch 120 at an accelerated rate without changing clock frequency and without affecting the meaningful content of the incoming data, such as the non-idle content of the incoming data.
PHY Device 1110 may operate in multiple transmission modes depending on the network traffic congestion level of the switch 120. The absorption logic 160 may track the current transmission mode in which the PHY Device 1110 is operating. For example, the absorption logic 160 may store the current transmission mode as the transmission mode parameter 164. The transmission mode parameter 164 may be a value that the absorption logic 160 stores in a register or other memory space. In one implementation, PHY Device 1110 may operate in a nominal transmission mode, a throttled transmission mode, and an accelerated transmission mode. PHY Device 1110 may transition between transmission modes by changing the value of the transmission mode parameter 164 under various circumstances or in response to a transition condition. For example, the absorption logic 160 may change the transmission mode upon receiving a flow control message from the switch 120 or based on the amount of data in the queue 146. Transition between transmission modes is detailed in
In one implementation, the PHY device 210 operates in a nominal transmission mode when the queue 146 is empty. The queue 146 may remain empty during nominal transmission mode because processed incoming data is transmitted to the switch 120 at the same rate incoming data is received from the data port, e.g., at the nominal transfer rate. Thus, as seen in
During nominal transmission mode, incoming data deserialized by the deserializer unit 144 may momentarily pass through the queue 146, after which the data may be encoded and serialized by the serializer unit 148. Alternatively, the absorption logic 160 may instruct the PHY device 210 to bypass use of the queue 146 during nominal transmission mode. In the example shown in
The absorption logic 160 may periodically transmit pacing data (e.g., an idle word) instead of the next word of data in the queue 146. In one implementation, the absorption logic 160 may periodically forego reading the next word from the queue 146 and transmit an idle word to the serializer unit 148 instead. Thus, the absorption logic 160 may throttle the transfer rate at which the incoming data received from the port interface 130 is transmitted to the switch 120, without changing the clock frequency of the receive datapath 140. As the PHY device 210 operates in a throttled transmission mode and the absorption logic 160 interleaves pacing data into the data stream transmitted to the switch 120, the amount of data stored in the queue 146, or queued data, may increase. That is, each time the absorption logic 160 inserts pacing data into the data stream instead of the next data word of the incoming data, an additional word may accumulate in the queue 146. The contents of the queued data may change as the PHY Device 210 receives, processes, and transmits incoming data to the switch 120. However, the longer the PHY Device 210 operates in a throttled transmission mode, the greater the amount of queued data, e.g., buffered network traffic, that may be stored in the queue 146. Any amount of queued data may reflect that a PHY layer device has previously operated or is currently operating in a throttled transmission mode. For example, as seen in
The absorption logic 160 may interleave pacing data, e.g., an idle word, character, or symbol, into the data stream at a rate or period specified by the idle injection rate parameter 165. The idle injection rate parameter 165 may be stored as a register value in the memory 162 and may be implemented as a numerical value. The absorption logic 160 may insert pacing data (e.g., an idle word) after reading a number of words from the queue 146, the number specified by the idle injection rate parameter 165. For example, the idle injection rate parameter 165 may have a value of 3. That is, after reading three words from the queue 146, the absorption logic 160 may insert an idle word into the data stream instead of reading the next word from the queue 146. As seen in
As one implementation example, the absorption logic 160 may configure an idle injection counter that increments each time a word is read from the queue 146. When the idle injection counter is equal to the value of the idle injection rate parameter 165, the absorption logic 160 may reset the counter and insert an idle word into the data stream instead of reading the next word in the queue 146. Alternatively, the absorption logic 160 may configure the idle injection counter to start at a value equal to the idle injection rate parameter 165 and decrement each time a word is read from the queue 146. When the idle injection counter reaches a value of zero, the absorption logic 160 may then insert an idle word and reset the idle injection counter to the idle injection rate parameter 165.
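The incrementing-counter variant above can be sketched in Python as follows. The sketch assumes one incoming word arrives from the port and one word leaves toward the switch per clock cycle, and uses a hypothetical string `IDLE` in place of a protocol idle word; `IdleInjector` and `run_throttled` are illustrative names, not from the disclosure.

```python
from collections import deque

IDLE = "IDLE"  # hypothetical placeholder for a protocol idle word


class IdleInjector:
    """Counter-based pacing for the throttled transmission mode (sketch)."""

    def __init__(self, idle_injection_rate: int) -> None:
        self.rate = idle_injection_rate  # models idle injection rate parameter 165
        self.counter = 0                 # idle injection counter

    def next_word(self, queue: deque) -> str:
        if self.counter == self.rate:
            # Counter matches the parameter: reset it and send pacing data
            # instead of reading the next word from the queue.
            self.counter = 0
            return IDLE
        if not queue:
            return IDLE  # nothing buffered; not counted as a read
        self.counter += 1  # count each word read from the queue
        return queue.popleft()


def run_throttled(words: list, idle_injection_rate: int):
    """Enqueue one incoming word and send one word to the switch per cycle."""
    queue = deque()
    injector = IdleInjector(idle_injection_rate)
    sent = []
    for word in words:
        queue.append(word)
        sent.append(injector.next_word(queue))
    return sent, queue
```

With an idle injection rate of 3 and eight incoming words `D0` through `D7`, the switch receives `D0 D1 D2 IDLE D3 D4 D5 IDLE` and two words (`D6`, `D7`) remain in the queue, one for each idle inserted, matching the accumulation behavior described for the throttled transmission mode.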
The absorption logic 160 may transmit queued data (e.g., data stored in the queue 146) to the switch 120 at an accelerated transfer rate by omitting transmission of unselected data from the queued data. The absorption logic 160 may skip transmission of an unselected data word and instead transmit the next data word of the incoming data. Unselected data may be any data that the absorption logic 160 omits from the data stream for transmission to the switch 120. Unselected data may be incoming data or queued data that the absorption logic 160 omits and may take any number of forms. In one example, unselected data may be similar to the pacing data inserted by the absorption logic 160 when the PHY device 210 operates in a throttled transmission mode, such as extra data that does not affect the substantive content of the incoming data stream. For example, the unselected data may be idle, NULL, or NOP words, symbols, characters, or packets implemented by any communication protocol, such as an idle word used to indicate a gap between packets of the incoming data. The absorption logic 160 may also identify unselected data from any subset, pattern, sequence, or progression of potential unselected data. For example, the absorption logic 160 may identify potential unselected data as any idle, NULL, or NOP word, symbol, character, or packet implemented by any communication protocol, or any combination thereof. The absorption logic 160 may identify a subset of potential unselected data as unselected data to omit from transmission to the switch 120.
In the example shown in
As the PHY device 210 operates in an accelerated transmission mode and the absorption logic 160 foregoes transmission of unselected data in the data stream transmitted to the switch 120, the amount of queued data may decrease. For example, each time the absorption logic 160 omits transmission of unselected data, such as an idle word, and transmits the next data word instead, an additional word is removed from the queue 146. Content of the queue 146 may change as the PHY device 210 receives, processes, and transmits incoming data to the switch 120. However, the longer the PHY Device 210 operates in an accelerated transmission mode, the less the amount of queued data, e.g., buffered network traffic, that may be stored in the queue 146.
In the example shown in
The absorption logic 160 may omit unselected data (e.g., an identified idle word) and read the next data word from the queue 146 within a single clock cycle. That is, the absorption logic 160 may identify if the first data word read from the queue 146 in a clock cycle is potential unselected data and omit the identified potential unselected data from the data stream, whereupon the potential unselected data becomes unselected data. The absorption logic 160 may read and send the next data word in the queue 146. The identification of potential unselected data, omitting of the unselected data, reading of the next queued data word, and sending of the next queued data word may occur within a single clock cycle. In
In alternative implementations, the absorption logic 160 may skip transmission of the first data word and the second data word if both are identified as unselected data within a clock cycle, and instead transmit the third data word read from the queue 146. Similarly, the absorption logic 160 may skip any number of consecutively identified unselected data (e.g., idle words) from the queue 146, which may be limited by the amount of unselected data (e.g., number of idle words) the absorption logic 160 can identify within a single clock cycle.
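The single-cycle skipping described above might be modeled as follows. This is a sketch under assumptions: `max_skips` stands in for however many idle words the hardware can identify within one clock cycle, the queue is assumed to be draining with no new arrivals, and `IDLE` is a hypothetical placeholder for a protocol idle word.

```python
from collections import deque

IDLE = "IDLE"  # hypothetical placeholder for a protocol idle word


def accelerated_cycle(queue: deque, max_skips: int) -> str:
    """Emit one word for this clock cycle, omitting up to max_skips leading idles."""
    skipped = 0
    while queue and queue[0] == IDLE and skipped < max_skips:
        queue.popleft()  # potential unselected data becomes unselected data
        skipped += 1
    # Send the next queued word; if the skips emptied the queue, an idle goes out.
    return queue.popleft() if queue else IDLE


def drain(words: list, max_skips: int) -> list:
    """Drain a backlog with no new arrivals; each list entry is one clock cycle."""
    queue = deque(words)
    sent = []
    while queue:
        sent.append(accelerated_cycle(queue, max_skips))
    return sent
```

For the backlog `D0 IDLE D1 IDLE IDLE D2`, skipping at most one idle per cycle delivers all three data words in four cycles instead of six, and skipping up to two per cycle delivers them in three.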
In the example shown in
For example, the idle skip rate parameter 166 may have a value of 8. In one implementation, after identifying 8 potential unselected data words (e.g., idle words) read from the queue 146, the absorption logic 160 may identify the next potential unselected data word (e.g., idle word) as unselected data. That is, the absorption logic 160 may omit the next identified potential unselected data (e.g., idle word) from the data stream to the switch 120. Instead, the absorption logic 160 may read the next word from the queue 146. The absorption logic 160 may configure the idle skip rate parameter 166 to prevent the PHY Device 210 from emptying data buffered in the queue 146 too quickly, which may overwhelm the switch 120.
As one implementation example, the absorption logic 160 may configure an idle skip counter that increments each time the first data word read from the queue 146 is identified as potential unselected data. When the idle skip counter reaches a value equal to the idle skip rate parameter 166, the absorption logic 160 may omit transmission of the next identified potential unselected data word, thereby identifying this next potential unselected data as unselected data. The absorption logic 160 may then read the next queued data word to process. Alternatively, the absorption logic 160 may configure the idle skip counter in a reverse manner, decrementing to zero before skipping transmission of an unselected data word.
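A sketch of the incrementing idle skip counter, again in Python with hypothetical names. It assumes the counter is reset after each skip and that the word read in place of a skipped idle is sent without being counted; the description leaves both details open.

```python
from collections import deque

IDLE = "IDLE"  # hypothetical placeholder for a protocol idle word


class IdleSkipper:
    """Rate-limited idle skipping for the accelerated transmission mode (sketch)."""

    def __init__(self, idle_skip_rate: int) -> None:
        self.rate = idle_skip_rate  # models idle skip rate parameter 166
        self.counter = 0            # idle skip counter

    def cycle(self, queue: deque) -> str:
        if not queue:
            return IDLE
        word = queue.popleft()
        if word == IDLE:  # first word of the cycle is potential unselected data
            if self.counter == self.rate:
                # Omit this idle and send the next queued word instead.
                self.counter = 0
                return queue.popleft() if queue else IDLE
            self.counter += 1
        return word


def drain(words: list, idle_skip_rate: int) -> list:
    """Drain a backlog with no new arrivals; each list entry is one clock cycle."""
    queue = deque(words)
    skipper = IdleSkipper(idle_skip_rate)
    sent = []
    while queue:
        sent.append(skipper.cycle(queue))
    return sent
```

With the idle skip rate of 8 from the example, a backlog of nine idle words followed by a data word drains in nine cycles: the first eight idles are transmitted and the ninth is omitted so the data word goes out in its place.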
The switch logic 520 may transmit a flow control message to the PHY device 210 when the switch logic 520 identifies a high congestion condition. The high congestion condition may be determined based on network traffic level in the switch 120, for example when the amount of data in the ingress queue 510 exceeds a high congestion threshold parameter. The switch logic 520 may configure the high congestion threshold parameter to be a numerical value stored as a register value in the memory 530. As one example, the switch logic 520 may configure the high congestion threshold parameter to be 80% of the capacity of the ingress queue 510. In this example, when the switch logic 520 identifies that the amount of data in the ingress queue 510 has exceeded 80% of the capacity of the ingress queue 510, the switch logic 520 may send the control message 550 to the PHY device 210. For instance, the switch logic 520 may send a throttle message to the PHY device 210 instructing the PHY device 210 to reduce the transfer rate of incoming data to the switch 120. The PHY device 210 may receive the throttle message and the absorption logic 160 may transition operation of the PHY device 210 to a throttled transmission mode and insert pacing data into the data stream transmitted to the switch 120. In one implementation, the switch logic 520 can disregard or drop the pacing data, such as an idle word, received from the PHY device 210 instead of adding the pacing data to the ingress queue 510 for processing.
The switch logic 520 may also transmit a flow control message, such as the control message 550, to the PHY device 210 when the switch logic 520 identifies that a high congestion condition has been relieved. For instance, the switch logic 520 may identify that a congestion condition has been relieved when the amount of data in the ingress queue 510 drops below the high congestion threshold parameter discussed above. The switch logic 520 may then send a control message 550 to the PHY device 210 instructing the PHY device 210 to accelerate the transfer rate of incoming data to the switch 120.
Alternatively, the switch logic 520 may transmit a control message 550 to the PHY device 210 when the switch logic 520 identifies a low congestion condition. The low congestion condition may be determined based on network traffic level in the switch 120, for example when the amount of data in the ingress queue 510 drops below a low congestion threshold parameter. The switch logic 520 may configure the low congestion threshold parameter to be a numerical value stored as a register value in the memory 530. As one example, the switch logic 520 may configure the low congestion threshold parameter to be 50% of the capacity of the ingress queue 510. In this example, when the switch logic 520 identifies that the amount of data in the ingress queue 510 has dropped below 50% of the capacity of the ingress queue 510, the switch logic 520 may send a control message 550 to the PHY device 210. For instance, the switch logic 520 may transmit an accelerate message to the PHY device 210 instructing the PHY device 210 to accelerate the transfer rate of incoming data to the switch 120.
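The high and low congestion thresholds described above can be combined into a hysteresis check in the switch logic 520. The Python sketch below is illustrative only: `CongestionMonitor`, `FlowMsg`, and the integer-percent comparison are assumptions, and setting `low_pct` equal to `high_pct` gives the simpler behavior in which relief is signaled as soon as occupancy drops back below the high congestion threshold.

```python
from enum import Enum
from typing import Optional


class FlowMsg(Enum):
    THROTTLE = "throttle"      # hypothetical encoding of a throttle message
    ACCELERATE = "accelerate"  # hypothetical encoding of an accelerate message


class CongestionMonitor:
    """Watches ingress queue occupancy and decides when to send flow control."""

    def __init__(self, capacity: int, high_pct: int = 80, low_pct: int = 50) -> None:
        self.capacity = capacity
        self.high_pct = high_pct  # high congestion threshold parameter
        self.low_pct = low_pct    # low congestion threshold parameter
        self.congested = False

    def update(self, occupancy: int) -> Optional[FlowMsg]:
        # Compare occupancy/capacity against the percentages using integers only.
        if not self.congested and occupancy * 100 > self.capacity * self.high_pct:
            self.congested = True
            return FlowMsg.THROTTLE
        if self.congested and occupancy * 100 < self.capacity * self.low_pct:
            self.congested = False
            return FlowMsg.ACCELERATE
        return None  # no threshold crossed, no message
```

Tracking `congested` means each threshold crossing produces a single message rather than one message per occupancy sample.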
The PHY device 210 may receive a control message 550 from the switch 120 at the switch interface 132. The switch interface 132 may then pass the control message 550 through the transmit datapath 150. The absorption logic 160 may identify the control message 550 in the transmit datapath 150, for example by inspecting encoded deserialized words in the transmit datapath 150. Upon identifying the control message 550, the absorption logic 160 may respond based on information or instructions identified from the control message 550, for example by transitioning the PHY device 210 to a different transmission mode.
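One way the absorption logic 160 might pick flow-control messages out of the transmit datapath 150 is sketched below. The description does not define the message encoding, so the `(is_control, value)` word representation and the code values in `FlowCode` are hypothetical, as is the choice to consume flow-control words rather than forward them out the data port.

```python
from enum import Enum
from typing import List, Tuple

Word = Tuple[bool, int]  # hypothetical: (is this a control word?, 8-bit value)


class FlowCode(Enum):
    THROTTLE = 0xF1    # hypothetical reserved control value
    ACCELERATE = 0xF2  # hypothetical reserved control value


FLOW_VALUES = {code.value for code in FlowCode}


def split_tx_stream(words: List[Word]) -> Tuple[List[Word], List[FlowCode]]:
    """Separate flow-control messages from ordinary outgoing words."""
    outgoing: List[Word] = []
    messages: List[FlowCode] = []
    for is_control, value in words:
        if is_control and value in FLOW_VALUES:
            messages.append(FlowCode(value))  # handled by the absorption logic
        else:
            outgoing.append((is_control, value))  # continues toward the data port
    return outgoing, messages
```

Checking the control flag as well as the value keeps an ordinary data byte that happens to equal a reserved value from being misread as flow control.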
In one implementation, the absorption logic 160 of the PHY device 210 may communicate a control message to the switch 120. For instance, the absorption logic 160 may send an overflow message 552 to the switch 120 when the absorption logic 160 identifies an overflow condition. The absorption logic 160 may identify an overflow condition when the amount of data in the queue 146 exceeds an overflow threshold parameter. The absorption logic 160 may configure the overflow threshold parameter to be a numerical value stored as a register value in the memory 162. As one example, the absorption logic 160 may configure the overflow threshold parameter to be 95% of the capacity of the queue 146. In this example, when the absorption logic 160 identifies that the amount of data in the queue 146 has exceeded 95% of the capacity of the queue 146, the absorption logic 160 may send an overflow message 552 to the switch 120. The absorption logic 160 may also transition operation of the PHY device 210 to an accelerated transmission mode to lessen the amount of data in the queue 146, or to a nominal transmission mode to prevent data loss by keeping the queue 146 from overflowing.
In response to receiving an overflow message 552 from the PHY device 210, the switch logic 520 may use flow control methods specified by a communication protocol to stem network congestion levels in the switch 120. For example, the switch logic 520 may send an Ethernet PAUSE frame directed to external devices transmitting data to the switch 120. In another implementation, the switch logic 520 may not take any action in response to receiving the overflow message 552 from the PHY device 210, which may result in overflow of the ingress queue 510 and packet loss in the switch 120.
The PHY device 210 and the switch 120 may exchange flow control messages, such as the control message 550 and the overflow message 552, in any number of ways. In the example shown in
The absorption logic 160 may transition operation of the PHY device 210 from the nominal transmission mode 610 to the throttled transmission mode 620 in response, for example, to the transition condition 640 of receiving a throttle message from the switch 120. The absorption logic 160 may transition operation of the PHY device 210 from the accelerated transmission mode 630 to the throttled transmission mode 620 in response, for instance, to the transition condition 641 when the absorption logic 160 receives a throttle message and when the queue 146 has not exceeded an overflow threshold parameter.
The absorption logic 160 may also transition operation of the PHY device 210 to the nominal transmission mode 610. In one implementation shown in
Concerning transitions to the accelerated transmission mode 630, the absorption logic 160 may transition operation of the PHY device 210 to the accelerated transmission mode 630 in response to various transition conditions, such as the transition condition 643 occurring when the absorption logic 160 receives an accelerate control message from the switch 120. In an alternative embodiment where the PHY device 210 may operate in the nominal transmission mode 610 even when the queue 146 is not empty, the absorption logic 160 may likewise transition operation of the PHY device 210 from the nominal transmission mode 610 to the accelerated transmission mode 630 when the absorption logic 160 receives an accelerate message. Also, the absorption logic 160 may transition operation of the PHY device 210 from the throttled transmission mode 620 to the accelerated transmission mode 630 in response to the transition condition 644 occurring when the amount of data in the queue 146 exceeds an overflow threshold (which may be identified through an overflow threshold parameter), such as 95% of the capacity of the queue 146. When the amount of data in the queue 146 exceeds the overflow threshold, the absorption logic 160 may also transmit an overflow control message to the switch, such as the overflow message 552.
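The transitions above can be summarized as a transition function. In this Python sketch the `Mode` values mirror the reference numerals 610, 620, and 630; the accelerated-to-nominal transition when the queue 146 drains is an assumption drawn from the statement that the device operates in the nominal transmission mode when the queue 146 is empty, and `next_mode`, `Event`, and `TICK` are hypothetical names.

```python
from enum import Enum, auto
from typing import Tuple


class Mode(Enum):
    NOMINAL = 610
    THROTTLED = 620
    ACCELERATED = 630


class Event(Enum):
    THROTTLE_MSG = auto()    # throttle message from the switch
    ACCELERATE_MSG = auto()  # accelerate message from the switch
    TICK = auto()            # periodic queue check with no message (hypothetical)


def next_mode(mode: Mode, event: Event, depth: int, capacity: int,
              overflow_pct: int = 95) -> Tuple[Mode, bool]:
    """Return (new mode, whether to send an overflow message to the switch)."""
    overflow = depth * 100 > capacity * overflow_pct
    if mode is Mode.THROTTLED and overflow:
        return Mode.ACCELERATED, True    # condition 644, plus overflow message
    if event is Event.THROTTLE_MSG and mode is Mode.NOMINAL:
        return Mode.THROTTLED, False     # condition 640
    if event is Event.THROTTLE_MSG and mode is Mode.ACCELERATED and not overflow:
        return Mode.THROTTLED, False     # condition 641
    if event is Event.ACCELERATE_MSG and mode is Mode.THROTTLED:
        return Mode.ACCELERATED, False   # condition 643
    if mode is Mode.ACCELERATED and depth == 0:
        return Mode.NOMINAL, False       # assumption: queue fully drained
    return mode, False
```

Checking the overflow condition first gives it priority over any message that arrives in the same cycle, so a nearly full queue keeps draining rather than being throttled further.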
In one implementation, burst absorption activity by any logic, module, or unit of the absorption logic 160, the PHY device 210, the switch 120, or the switch logic 520 may operate according to the Physical Coding Sublayer (PCS). That is, the PHY device 210 and the absorption logic 160 may provide burst absorption to the switch 120 without any additional MAC logic. Similarly, the switch 120 and the switch logic 520 may manage flow control of the PHY device 210 without any additional MAC logic as well. The flow control messages communicated between the PHY device 210 and the switch 120 may also be implemented according to the PCS, for example through reserved PCS encodings on the SerDes link. In this way, the exchange of flow control messages between the PHY device 210 and the switch 120, as well as burst absorption activities by the PHY device 210 and the switch 120, may be transparent to external devices on the network or to higher layer processing on the network device that implements the PHY device 210 and the switch 120.
The methods, devices, and logic described above may be implemented in many different ways in many different combinations of hardware, software or both hardware and software. For example, all or parts of the system may include circuitry in a controller, a microprocessor, or an application specific integrated circuit (ASIC), or may be implemented with discrete logic or components, or a combination of other types of analog or digital circuitry, combined on a single integrated circuit or distributed among multiple integrated circuits. All or part of the logic described above may be implemented as instructions for execution by a processor, controller, or other processing device and may be stored in a tangible or non-transitory machine-readable or computer-readable medium such as flash memory, random access memory (RAM) or read only memory (ROM), erasable programmable read only memory (EPROM) or other machine-readable medium such as a magnetic or optical disk. Thus, a product, such as a computer program product, may include a storage medium and computer readable instructions stored on the medium, which when executed in an endpoint, computer system, or other device, cause the device to perform operations according to any of the description above.
The processing capability of the system may be distributed among multiple system components, such as among multiple processors and memories, optionally including multiple distributed processing systems. Parameters, databases, and other data structures may be separately stored and managed, may be incorporated into a single memory or database, may be logically and physically organized in many different ways, and may be implemented in many ways, including data structures such as linked lists, hash tables, or implicit storage mechanisms. Programs may be parts (e.g., subroutines) of a single program, separate programs, distributed across several memories and processors, or implemented in many different ways, such as in a library, such as a shared library (e.g., a dynamic link library (DLL)). The DLL, for example, may store code that performs any of the system processing described above. While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.