A collection of servers may be used to create a distributed computing environment. The servers may process multiple applications by receiving data inputs and generating data outputs. Network switches may be used to route data from various sources and destinations in the computing environment. For example, a network switch may receive network packets from one or more servers and/or network switches and route the packets to other servers and/or network switches. It may be the case that, as a packet is transmitted from one switch to another, the packet becomes corrupted. Corruption may be caused by faulty wiring in the network, electromagnetic interference, data noise introduced by a switch, or any other undesired network abnormality.
Many aspects of the present disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily drawn to scale, emphasis instead being placed upon clearly illustrating the principles of the disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
The present disclosure relates to debugging a packet as it is switched through a network made up of multiple nodes. As the packet is transmitted along a route, the packet may become corrupted. Various embodiments of the present disclosure allow for the identification of the source of corruption when the packet is transmitted along a multi-hop path.
Some packets may be handled as a store-and-forward packet (SAF packet) or a cut-through packet (CT packet). An SAF packet is a packet that is switched from one network node to another. At each network node, the entire SAF packet is received, stored, processed, and then forwarded to the next network node. Because an entire SAF packet is received by a network node before the SAF packet is forwarded, it may be relatively easy to identify the instant when an SAF packet is subjected to corruption using error detection and correction logic contained in each network node. A CT packet is a packet that is received by a particular network node and then forwarded to the next network node before the particular network node completely receives the CT packet. That is to say, a network node begins forwarding a beginning portion of a CT packet while or before the network node receives an end portion of the CT packet. In this respect, it may be the case that, at a single point in time, a CT packet is handled by multiple network nodes. Since error detection is typically performed by a network node after receiving the last bit of a packet, it may be difficult to detect an error before the node has begun transmitting the packet onward. The present disclosure allows for the identification of the source of corruption for a network that handles packets as CT packets.
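The behavioral difference between the two schemes can be sketched in a few lines of Python. The sketch below is illustrative only; the function names and the chunk-based model of packet arrival are assumptions made for exposition and are not part of the disclosure.

```python
def saf_forward(chunks, send):
    """Store-and-forward: buffer the entire packet before any of it leaves."""
    buffered = list(chunks)        # the whole packet must arrive first
    # error detection could run here, before a single byte is forwarded
    for chunk in buffered:
        send(chunk)

def ct_forward(chunks, send):
    """Cut-through: forward each chunk as soon as it is received."""
    for chunk in chunks:
        send(chunk)                # the packet's tail may not have arrived yet

# At one instant, a CT packet may therefore span several nodes at once.
packet = [b"header", b"payload-1", b"payload-2", b"fcs"]
ct_forward(iter(packet), lambda chunk: print("forwarded", chunk))
```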
With reference to
The access layer of the computing environment 100 may comprise a collection of computing devices such as, for example, servers 109. A server 109 may comprise one or more server blades, one or more server racks, or one or more computing devices configured to implement distributed computing.
To this end, a server 109 may comprise a plurality of computing devices that may be arranged, for example, in one or more server banks, computer banks, or other arrangements. For example, the server 109 may comprise a cloud computing resource, a grid computing resource, and/or any other distributed computing arrangement. Such computing devices may be located in a single installation. A group of servers 109 may be communicatively coupled to a network node 113. The network node 113 may relay input data to one or more servers 109 and relay output data from one or more servers 109. A network node 113 may comprise a switch, a router, a hub, a bridge, or any other network device that is configured to facilitate receiving, storing, processing, forwarding, and/or routing of packets.
The aggregation/distribution layer may comprise one or more network nodes 113. The network nodes 113 of the aggregation/distribution layer may route or otherwise relay data between network nodes 113 of the access layer. The core layer may comprise one or more network nodes 113 for routing or relaying data between network nodes 113 of the aggregation/distribution layer. Furthermore, the core layer may receive inbound data from a network 117 and route the inbound data throughout the core layer. The core layer may receive outbound data from the aggregation/distribution layer and route the outbound data to the network 117. Thus, the computing environment 100 may be in communication with a network 117 such as, for example, the Internet.
The computing environment 100 may further comprise a network state monitor 121. The network state monitor 121 may comprise one or more computing devices that are communicatively coupled to one or more network nodes 113 of the computing environment 100. The network state monitor 121 may be configured to execute one or more monitoring applications for identifying when packets are dropped in the computing environment 100.
The computing environment 100 is configured to generate, store, update, route, and forward packets 205. A packet 205 may vary in size from a few bytes to many kilobytes. A packet 205 expresses information that may be formatted in the digital domain. For example, the packet 205 may include a series of 1's and 0's that represent information.
Next, a general description of the operation of the various components of the computing environment 100 is provided. To begin, the various servers 109 may be configured to execute one or more applications or jobs in a distributed manner. The servers 109 may receive input data formatted as packets 205. The packets 205 may be received by the server 109 from a network 117. The received packets 205 may be routed through one or more network nodes 113 and distributed to one or more servers 109. Thus, the servers 109 may process input data that is received via the network 117 to generate output data. The output data may be formatted as packets 205 and transmitted to various destinations within the computing environment 100 and/or outside the computing environment 100.
As the servers 109 execute various applications, packets 205 are switched from one network node 113 to the next network node 113 to reach a destination. The route a packet 205 takes in the computing environment 100 may be characterized as a multi-hop path. The computing environment 100 may include undesirable conditions that cause a packet 205 to experience corruption as it travels along a multi-hop path. Corruption may be caused by faulty wiring in the computing environment 100, electromagnetic interference, data noise introduced by a network node 113 or server 109, or any other undesired network abnormality. As a result, corruption causes the bits of a packet 205 to be altered in a manner that leads to an undesirable destruction of the data included in the packet 205. The source of the corruption may be attributed to a particular component in the computing environment 100. Various embodiments of the present disclosure relate to identifying the source of the corruption. Remedial action may be taken in response to identifying the corruption source.
The packet 205 may be handled in the computing environment 100 according to a particular scheme. According to a store-and-forward (SAF) scheme, the packet 205 is handled as an SAF packet 205 such that a network node 113 receives the entire SAF packet 205 before forwarding it. The network node 113 may store the SAF packet 205 in memory such as a packet buffer. In this respect, the network node 113 absorbs the entire SAF packet 205 and stores the entire SAF packet 205 in a memory. After the entire SAF packet 205 is absorbed and stored, the network node 113 may process the SAF packet 205 and then forward the SAF packet 205 to the next network node 113. Processing the SAF packet 205 may involve performing error detection, packet scheduling, packet prioritization, or any other packet processing operation.
The packet 205 may be alternatively handled according to a cut-through (CT) scheme such that the packet 205 is handled as a CT packet 205. This is explained in further detail below with respect to at least
In
Specifically, in the non-limiting example of
In
The first network node 113a receives the first CT packet portion 205a as discussed in
In
The first network node 113a receives the third CT packet portion 205c while the second network node 113b receives the second CT packet portion 205b from the first network node 113a and while the third network node 113c receives the first CT packet portion 205a from the second network node 113b.
In
The first network node 113a forwards the third CT packet portion 205c to the second network node 113b. The second network node 113b receives the third CT packet portion 205c while forwarding the second CT packet portion 205b to the third network node 113c. The point in time represented in
In
The second network node 113b forwards the third CT packet portion 205c to the third network node 113c. The point in time represented in
The non-limiting examples of
With regard to
The network node 113 may correspond to a switch, a router, a hub, a bridge, or any other network device that is configured to facilitate the receiving, routing, and forwarding of packets 205. The network node 113 is configured to receive a packet 205 from a source and route the packet 205 toward a destination. The network node 113 may comprise one or more input ports 209 that are configured to receive one or more packets 205. The network node 113 also comprises a plurality of output ports 211. The network node 113 may perform various operations such as prioritization and/or scheduling for routing a packet 205 from one or more input ports 209 to one or more output ports 211.
The network node 113 may be configured to handle the packet 205 as an SAF packet, as a CT packet, or as either an SAF packet or as a CT packet. The time it takes for a packet 205 to flow through at least a portion of the network node 113 may be referred to as a “packet delay.” The packet delay under an SAF scheme may be greater than the packet delay under a CT scheme because the SAF scheme may require that the entire packet 205 be received before the packet 205 is forwarded.
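To make the delay comparison concrete, the following sketch estimates end-to-end latency under each scheme. The formulas and numbers are simplified assumptions made for illustration (a uniform link rate, negligible processing time, and a hypothetical cut-through threshold) and are not taken from the disclosure.

```python
def saf_latency(bits, rate_bps, hops):
    # each node serializes the entire packet before forwarding it
    return hops * (bits / rate_bps)

def ct_latency(bits, rate_bps, hops, cut_through_bits=512):
    # only a leading portion is serialized per hop; the body pipelines through
    return bits / rate_bps + (hops - 1) * (cut_through_bits / rate_bps)

# e.g., a 9 KB jumbo frame across 5 hops of 10 Gb/s links
print(saf_latency(9000 * 8, 10e9, 5))   # ~3.6e-05 s (36 microseconds)
print(ct_latency(9000 * 8, 10e9, 5))    # ~7.4e-06 s (about 7.4 microseconds)
```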
The network node 113 comprises one or more ingress packet processors 214. Each ingress packet processor 214 may be configured to be bound to a subset of the input ports 209. In this sense, an ingress packet processor 214 corresponds to a respective input port set. In addition to associating an incoming packet 205 with an input port set, the ingress packet processors 214 may be configured to process the incoming packet 205.
The network node 113 also comprises one or more egress packet processors 218. An egress packet processor 218 may be configured to be bound to a subset of the output ports 211. In this sense, each egress packet processor 218 corresponds to a respective output port set. In addition to associating an outgoing packet 205 with an output port set, the egress packet processors 218 may be configured to process the outgoing packet 205.
Inbound packets 205, such as those packets received by the input ports 209, are processed by processing circuitry 231. In various embodiments, the processing circuitry 231 is implemented as at least a portion of a microprocessor. The processing circuitry 231 may include one or more circuits, one or more processors, application specific integrated circuits, dedicated hardware, digital signal processors, microcomputers, central processing units, field programmable gate arrays, programmable logic devices, state machines, or any combination thereof. In yet other embodiments, processing circuitry 231 may include one or more software modules executable within one or more processing circuits. The processing circuitry 231 may further include memory 234 configured to store instructions and/or code that causes the processing circuitry 231 to execute data communication functions.
In various embodiments, the processing circuitry 231 may be configured to prioritize, schedule, or otherwise facilitate a routing of incoming packets 205 to one or more output ports 211. The processing circuitry 231 receives a packet 205 from one or more ingress packet processors 214. The processing circuitry 231 may perform operations such as packet scheduling and/or prioritization of a received packet 205. To this end, the processing circuitry 231 may comprise a traffic manager for managing network traffic through the network node 113.
To execute the functionality of the processing circuitry 231, a memory 234 may be utilized. For example, the processing circuitry 231 may comprise memory 234 for storing packets 205. In an SAF scheme, the memory 234 may be used to store the entire inbound packet 205 before the packet 205 is transmitted to the next network node 113.
After a packet 205 has been processed, the processing circuitry 231 sends the packet 205 to one or more egress packet processors 218 for transmitting the packet 205 via one or more output ports 211. To this end, the processing circuitry 231 is communicatively coupled to one or more ingress packet processors 214 and one or more egress packet processors 218. Although a number of ports/port sets are depicted in the example of
The processing circuitry 231 may include an error detector 237 for detecting whether the received packet 205 has been corrupted. The error detector 237 may execute an error detection operation such as, for example, a cyclic redundancy check (CRC). To detect an error, the packet 205 may include a frame check sequence that indicates a predetermined checksum. Before the packet 205 is received, a frame check sequence is generated for the packet 205 using an error detection algorithm such as CRC or any other hash function. The error detector 237 performs the error detection operation to generate a checksum 240. The checksum 240 is compared to the frame check sequence to determine whether a mismatch exists. If there is no mismatch, then it may be the case that the packet 205 was received without corruption. In other words, if the frame check sequence matches the checksum 240, then it may be deemed that the received data of the packet 205 is accurate and not corrupted. However, if there is a mismatch between the frame check sequence and the checksum 240, then corruption may have occurred such that the bits contained in the packet 205 have been undesirably altered.
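The sender-side generation and receiver-side comparison can be sketched as follows, using the CRC-32 routine from Python's standard zlib module as a stand-in for whatever error detection function the network nodes actually employ; the helper names are hypothetical.

```python
import zlib

def make_fcs(payload: bytes) -> bytes:
    """Sender side: compute a frame check sequence over the payload."""
    return zlib.crc32(payload).to_bytes(4, "big")

def is_corrupted(payload: bytes, fcs: bytes) -> bool:
    """Receiver side: recompute the checksum and compare it to the FCS."""
    checksum = zlib.crc32(payload).to_bytes(4, "big")
    return checksum != fcs

payload = b"example packet body"
fcs = make_fcs(payload)
assert not is_corrupted(payload, fcs)        # intact packet matches
assert is_corrupted(payload + b"\x00", fcs)  # an altered packet mismatches
```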
According to some embodiments, the processing circuitry 231 includes a packet scheme selector 243. The packet scheme selector 243 determines whether to handle the packet 205 as a CT packet or an SAF packet. The functionality of the packet scheme selector is discussed in further detail below with respect to at least
The following is a general description of the operation of the various components of the network node 113 that allow for identifying a source of corruption using debug indicators. The computing environment 100 may be configured to accommodate CT packets while allowing for identification of a source of corruption. The network node 113 may receive packets 205 that are handled as CT packets. The network node 113 may initiate an error check operation using the error detector 237. The error detector 237 may perform the error detection operation on portions of a CT packet 205 as the CT packet 205 is received by the network node 113. In this respect, a running error detection operation is initiated before the CT packet 205 is completely received by the network node 113. Thus, the error detector 237 begins calculating the checksum 240 while the CT packet 205 is being received. The error detector 237 may complete the calculation of the checksum 240 after the CT packet 205 is completely received.
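The running computation can be sketched as follows. Python's zlib.crc32 accepts a previous value as its second argument, so the checksum 240 can be updated portion by portion while earlier portions are already being forwarded; this is an illustrative fragment with hypothetical names, not the node's actual logic.

```python
import zlib

def receive_ct_packet(portions, forward):
    """Fold each arriving portion into a running CRC while forwarding it."""
    running = 0
    for portion in portions:
        forward(portion)                        # cut-through: transmit immediately
        running = zlib.crc32(portion, running)  # update the running checksum 240
    return running.to_bytes(4, "big")           # final value, known only at the end
```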
The CT packet 205 includes a frame check sequence. The error detector 237 compares the checksum 240 of the CT packet 205 to the frame check sequence included in the CT packet 205 to determine whether the data of the CT packet 205 has been corrupted. If there is no corruption (i.e., the frame check sequence matches the checksum 240), then no action is taken.
If there is a mismatch between the checksum 240 and the frame check sequence, then the processing circuitry 231 may generate a debug indicator to indicate that the CT packet 205 is corrupted. The processing circuitry 231 may insert the debug indicator into the CT packet 205. The debug indicator may be a tag, a signature, or any additional packet data inserted into the CT packet 205. The debug indicator is used to record the instance where corruption is first identified as the CT packet 205 travels along a multi-node path. In some embodiments, the processing circuitry 231 may insert the debug indicator by replacing the frame check sequence with the debug indicator. In this case, the size of the debug indicator equals the size of the frame check sequence. By replacing the frame check sequence with the debug indicator, the overall size of the CT packet may remain unchanged. In other embodiments, the debug indicator is inserted into the CT packet to supplement the CT packet as a packet addition. In this case, the CT packet size may increase with the addition of the debug indicator.
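The two insertion styles described above can be sketched as byte-level operations. This is a minimal illustration assuming a 4-byte frame check sequence at the end of the packet; the function names are hypothetical.

```python
def insert_by_replacement(packet: bytes, indicator: bytes) -> bytes:
    """Overwrite the trailing FCS with an equally sized debug indicator."""
    assert len(indicator) == 4          # assumed 4-byte FCS field
    return packet[:-4] + indicator      # overall packet size is unchanged

def insert_by_addition(packet: bytes, indicator: bytes) -> bytes:
    """Append the debug indicator as supplemental data; packet size grows."""
    return packet + indicator
```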
With reference to
The frame check sequence 312 is generated prior to the packet 205 being received by the network node 113. The frame check sequence 312 may be generated according to an error detection function that is used to verify whether the packet 205, as received by the network node 113, has been corrupted. The frame check sequence 312 is a value included in a frame check sequence frame. The frame check sequence frame may be positioned in the CT packet 205 according to a packet format protocol used by the various components in the computing environment 100 (
With reference to
The processing circuitry 231 (
However, if the checksum 240 does not match the frame check sequence 312, then it is deemed that the CT packet 205 is corrupted. In response to detecting corruption of the CT packet 205, the processing circuitry 231 generates a debug indicator 403 that signals that the CT packet 205 is corrupted.
The debug indicator 403 may include a global indicator 408, a local indicator 411, a toggle flag 414, or any other information used to identify a source of corruption. The global indicator 408 is a signature that indicates to the various components in a computing environment 100 (
The debug indicator 403 may also include a local indicator 411. The local indicator 411 may be a value that is dedicated to a particular network node 113. The local indicator 411 may be a unique identifier that corresponds to a network node 113 such that a network administrator may identify the specific network node 113 based on the local indicator 411. In response to detecting corruption, the network node 113 may insert the local indicator 411 into the CT packet 205 to allow a network administrator to identify which network node 113 initially detected the corruption.
The processing circuitry 231 may insert the debug indicator 403 into the CT packet 205 in response to detecting corruption. One or more next network nodes 113 may determine that corruption was previously detected based on the global indicator 408 and determine which network node 113 initially detected the corruption based on the local indicator 411.
In some embodiments, the processing circuitry 231 may insert the debug indicator 403 into the CT packet 205 by replacing the frame check sequence 312 with the debug indicator 403. By replacing the frame check sequence 312 with the debug indicator 403, the CT packet frame format may not need to be appended or adjusted. However, by effectively overriding the frame check sequence 312, it is likely that the one or more next network nodes 113 will determine a mismatch between the generated checksum 240 and the value in the frame check sequence frame, because that value was previously replaced with the debug indicator 403. Nevertheless, because the global indicator 408 is included in the frame check sequence frame, the one or more next network nodes 113 may determine that the corruption was previously detected.
It is statistically possible that the debug indicator 403 is equal to the checksum 240. The consequence is that a next network node 113 or a network administrator would be unable to differentiate between an inserted debug indicator 403 and a calculated checksum 240. According to various embodiments, to address this situation, a toggle flag 414 may be used by the particular network node 113 that initially detects corruption. This particular network node 113 sets the toggle flag 414 to specify whether the debug indicator 403 is equal to the checksum 240.
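One way to encode these fields is shown below. The field widths, the signature value, and the layout are hypothetical; the disclosure does not fix a particular format. The toggle flag 414 is set exactly when the chosen indicator value collides with the locally computed checksum 240.

```python
from dataclasses import dataclass

GLOBAL_INDICATOR = 0xDEB6            # hypothetical 16-bit well-known signature

@dataclass
class DebugIndicator:
    global_indicator: int            # tells all nodes corruption was detected
    local_indicator: int             # identifies the node that first detected it
    toggle_flag: bool                # disambiguates a checksum collision

def build_debug_indicator(node_id: int, checksum: bytes) -> DebugIndicator:
    """Build the indicator; set the toggle flag on a checksum collision."""
    indicator = DebugIndicator(GLOBAL_INDICATOR, node_id, toggle_flag=False)
    encoded = (indicator.global_indicator << 16) | (indicator.local_indicator & 0xFFFF)
    if encoded.to_bytes(4, "big") == checksum:
        indicator.toggle_flag = True  # statistically rare, but must be recorded
    return indicator
```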
Turning now to
To begin, at 603, the processing circuitry 231 initiates an error detection operation on a received packet 205. According to various embodiments, the packet is a CT packet 205. The processing circuitry may use an error detector 237 (
At 606, the processing circuitry 231 generates a checksum 240 (
At 609, the processing circuitry 231 compares the checksum 240 to the frame check sequence 312 to determine whether the CT packet 205 may be corrupted. If there is no mismatch between the checksum 240 and the frame check sequence 312, the flowchart ends. It is noted that the CT packet 205 is forwarded to the next network node 113 in any case, regardless of whether the CT packet 205 is corrupted.
At 612, if there is a mismatch between the checksum 240 and the frame check sequence 312, then the processing circuitry 231 determines whether a previous network node 113 has inserted the debug indicator 403 (
At 615, if the debug indicator 403 or a portion thereof is included in the CT packet 205, then the flowchart ends. This reflects the fact that the instant network node 113 is not the first network node 113 to determine that the CT packet 205 is corrupted. However, if the debug indicator 403 or a portion thereof is not included in the CT packet 205, then the instant network node 113 is the first network node 113 to determine that the CT packet 205 is corrupted. Accordingly, the flowchart branches to 618.
At 618, the processing circuitry 231 generates a debug indicator 403 to indicate corruption of the CT packet 205. The debug indicator 403 may signal to other network nodes 113 that corruption has been detected; additionally, the debug indicator 403 may specify the identity of the network node 113 in order to allow a source of the corruption to be determined.
At 621, if the generated debug indicator 403 is the same as the checksum 240 calculated by the processing circuitry 231, then the processing circuitry 231, at 624, sets the toggle flag 414 of the debug indicator 403. The processing circuitry 231 may insert the debug indicator 403 into the CT packet 205 as discussed above in the non-limiting example of
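Taken together, steps 603 through 624 can be sketched as a single function. The sketch reuses receive_ct_packet, GLOBAL_INDICATOR, and build_debug_indicator from the earlier illustrative fragments, and the placement of the global indicator 408 in the top two bytes of the frame check sequence frame is an assumption.

```python
def check_ct_packet(portions, fcs: bytes, node_id: int, forward):
    # 603/606: running error detection while the packet cuts through the node
    checksum = receive_ct_packet(portions, forward)
    # 609: no mismatch between checksum 240 and frame check sequence 312 -> done
    if checksum == fcs:
        return None
    # 612/615: a previous node already tagged the packet as corrupted -> done
    if int.from_bytes(fcs[:2], "big") == GLOBAL_INDICATOR:
        return None
    # 618/621/624: first detection; the toggle flag is set inside the builder
    return build_debug_indicator(node_id, checksum)
```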
The following is a general description of the operation of the various components of a network node 113 (
Referring to
Specifically,
To begin, at 702, the processing circuitry 231 determines a packet processing scheme. The packet processing scheme may be a cut-through (CT) packet processing scheme or a store-and-forward (SAF) packet processing scheme. In some embodiments, the packet processing scheme is determined according to an outcome of a number generator. The number generator comprises a deterministic random bit generator that generates the outcome according to a predefined probability. The predefined probability may be static or adjustable to control the percentage of packets 205 that are handled as CT packets 205 or SAF packets 205. Thus, the packet processing scheme may be selected randomly where the probability for an outcome is predetermined.
In other embodiments, the processing circuitry 231 determines the packet processing scheme by selecting 1 out of N inbound packets 205 to be handled as an SAF packet 205. For example, if N=5, then one out of every five sequential inbound packets 205 is handled according to an SAF scheme while the other four packets 205 are handled according to a CT scheme.
In other embodiments, the inbound packet 205 may be marked to specify how the inbound packet is to be handled. The inbound packet 205 may include a marker that corresponds to an SAF scheme or a CT scheme. Thus, the packet scheme selector 243 determines which packet processing scheme to apply to the inbound packet 205 according to the marker included in the inbound packet 205.
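The three selection embodiments can be summarized in one illustrative function. The parameter names and the 5% default SAF probability are assumptions; a hardware implementation would use a deterministic random bit generator rather than Python's random module, which stands in for it here.

```python
import random

def select_scheme(marker=None, n=None, packet_count=0, saf_probability=0.05):
    """Return 'CT' or 'SAF' for an inbound packet (names are hypothetical)."""
    if marker is not None:                  # marker embodiment
        return marker                       # the packet dictates its own scheme
    if n is not None:                       # 1-out-of-N embodiment
        return "SAF" if packet_count % n == 0 else "CT"
    # random embodiment: select SAF with a predefined probability
    return "SAF" if random.random() < saf_probability else "CT"

# e.g., with n=5, every fifth sequential packet is stored and forwarded
schemes = [select_scheme(n=5, packet_count=i) for i in range(10)]
```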
It may be the case that CT packets 205 have less packet delay because a network node 113 that receives a CT packet 205 may begin transmitting the CT packet 205 to the next network node 113 before the CT packet 205 is completely received by the network node 113. On the other hand, it may be desirable to perform error detection on SAF packets 205 because an SAF packet 205 may be dropped or otherwise flagged if an error is detected. Thus, it may be easier to identify a corruption source for an SAF packet 205. Accordingly, in the case where the packet processing scheme is selected by a predefined probability or by a value of N, the predefined probability or the value of N may be optimized to allow a significant percentage of packets 205 to be handled as CT packets.
At 705, the processing circuitry 231 uses a packet scheme selector 243 to select the scheme to be a CT scheme or an SAF scheme. If the packet scheme selector 243 selects a CT scheme, then the inbound packet 205 is handled as a CT packet 205 and the flowchart branches to 708.
At 708, the processing circuitry 231 processes the inbound packet 205 according to a CT scheme. In this respect, the processing circuitry 231 forwards a beginning portion of the inbound packet 205 to a next network node 113 before an ending portion of the inbound packet 205 is received by the network node 113.
At 711, in some embodiments, the processing circuitry 231 may perform error detection on the inbound CT packet 205. At least some of the functionality depicted in
At 705, if the packet scheme selector 243 selects an SAF scheme, then the inbound packet 205 is handled as an SAF packet 205 and the flowchart branches to 714. At 714, the processing circuitry 231 processes the inbound packet 205 according to an SAF scheme. In this respect, the processing circuitry 231 stores the inbound SAF packet 205 in the memory 234 (
At 717, the processing circuitry 231 performs error detection on the stored SAF packet 205. The error detector 237 may calculate a checksum 240 (
At 723, the processing circuitry 231 drops the corrupted SAF packet 205. The processing circuitry 231 may send a message to a network state monitor 121 (
If there is no corruption, then the flowchart branches to 726. At 726, the processing circuitry 231 inserts an error detection status into the SAF packet 205. The error detection status indicates that the error detection operation was performed. This information may be used to determine a source of corruption if that SAF packet 205 were to later become corrupted downstream. According to some embodiments, the error detection status may be inserted into unused portions of the packet header. The error detection status may also indicate that the inbound packet 205 was handled as an SAF packet. Thereafter, the processing circuitry 231 forwards the SAF packet 205 to the next network node 113.
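The SAF branch of the flowchart (steps 714 through 726) can be sketched as follows. The one-byte error detection status marker is a hypothetical stand-in for the unused header bits mentioned above, and the helper names are assumptions rather than the disclosed implementation.

```python
import zlib

def handle_saf_packet(packet: bytes, fcs: bytes, forward, report_drop):
    # 714: by this point the entire SAF packet is stored in memory
    checksum = zlib.crc32(packet).to_bytes(4, "big")   # 717: error detection
    if checksum != fcs:
        # 723: drop the corrupted packet and notify the network state monitor
        report_drop(packet)
        return
    # 726: record that error detection was performed, then forward the packet
    error_detection_status = b"\x01"   # hypothetical header marker
    forward(error_detection_status + packet + fcs)
```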
According to various embodiments, the predefined probability may be adjusted according to the size of the inbound packet 205 or according to the number of instances in which a particular network node 113 has detected corruption. The probability may be adjusted dynamically when the network node 113 detects corruption, or it may be set manually by a network administrator.
The flowcharts of
Although the flowcharts of
Also, any logic or application described herein, including the processing circuitry 231, that comprises software or code can be embodied in any non-transitory computer-readable medium for use by or in connection with an instruction execution system such as, for example, a processor in a computer system or other system. In this sense, the logic may comprise, for example, statements including instructions and declarations that can be fetched from the computer-readable medium and executed by the instruction execution system. In the context of the present disclosure, a “computer-readable medium” can be any medium that can contain, store, or maintain the logic or application described herein for use by or in connection with the instruction execution system.
The computer-readable medium can comprise any one of many physical media such as, for example, magnetic, optical, or semiconductor media. More specific examples of a suitable computer-readable medium would include, but are not limited to, magnetic tapes, magnetic floppy diskettes, magnetic hard drives, memory cards, solid-state drives, USB flash drives, or optical discs. Also, the computer-readable medium may be a random access memory (RAM) including, for example, static random access memory (SRAM) and dynamic random access memory (DRAM), or magnetic random access memory (MRAM). In addition, the computer-readable medium may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or other type of memory device.
It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
The present application claims the benefit of and priority to co-pending U.S. Provisional patent application titled, “Cut-Through Packet Management”, having Ser. No. 61/880,492, filed Sep. 20, 2013, which is hereby incorporated by reference herein in its entirety for all purposes.