A packet-level routing protocol is used in typical topologies of data centers and high-performance computing (HPC) applications. Such a protocol encodes a data packet with an address header. The address header contains information about the sending and receiving endpoints, how long the packet has been in the network, the number of times it has been re-routed, etc. This information allows the protocol to make appropriate decisions about whether, how, and where to route the packet.
The present disclosure, in accordance with one or more various embodiments, is described in detail with reference to the following figures. The figures are provided for purposes of illustration only and merely depict typical or example embodiments.
The figures are not exhaustive and do not limit the present disclosure to the precise form disclosed.
Current packet routing protocols provide flexibility and robustness in routing packets over an optical communication network. For example, a packet is given an address header that includes routing information. When a node in the network receives the packet in the optical domain, the node first converts the optical signals of the packet, including the address header and the payload, from the optical domain (optical signals) to the electrical domain (electrical signals), and stores the electrical signals in a buffer. The node then reads the address header in the electrical domain to determine where to send the packet. Once a routing decision is made, the address header and the payload of the packet are read from the buffer and converted back to optical signals to be transmitted to the optical communication network.
Such protocol implementations come at the expense of intricate hardware-aware design for the protocol, electrical buffering for storing the data (header and payload), and latencies in the routing and converting. Further, the header reduces the effective data bandwidth of the communication network, as the bits that comprise the header are not meaningful data for the workload that sends the data. While the conveniences of such protocols may be desirable for certain applications, certain HPC workloads may suffer significant latencies as a result.
As optical interconnects are increasingly implemented within communication networks, including serving as the fabric for HPC implementations, latency is further increased due to the need for optical-to-electrical conversions. Optical communications provide higher bandwidth potential than traditional electrical cabling and can transmit signals effectively over longer distances. However, the data must be converted into the electrical domain in order for the processors of the nodes to use the received data. Not only must the optical data be converted into the electrical domain for the processor to interpret, but, if the data is meant for a different endpoint or node, the data must also be converted back into the optical domain for transmission. This increases the latency in message response.
Embodiments of the technology disclosed herein provide systems and methods for optical routing of data packets. The technology reduces the latency of data transmission between nodes within a networking fabric by keeping the routing of data packets entirely in the optical domain before any optical-to-electrical conversions. In various embodiments, each node within the network includes a photonics circuit configured to transmit and receive optical signals. The technology allows routing decision-making in the optical domain. In some embodiments, a bit is used to route a packet one way or another at a decision-making circuit. That bit is then stripped off from the overall packet. The technique adds minimal overhead and effectively increases the payload fraction of the packet to near 100%. In some implementations, the need for buffering and complicated protocols is eliminated by allowing the hardware of a node to make routing decisions in the optical domain. Buffering can be eliminated as paths are open for traffic and there is no contention in the system. Eliminating the software-level decisions enables the data to be transmitted through the network substantially at the speed of light.
The network 100 further includes a network controller 102 coupled to each of the nodes of the network 100. The network controller 102 is configured to provision and control the nodes of the network 100. In some embodiments, the network controller 102 may provision each of the nodes such that each of the nodes, upon receiving a packet in the optical domain, is to read the first bit of a routing header of the packet to make a routing decision for the packet, strip the first bit of the routing header, and route the remainder of the packet to the network based on the routing decision. In some embodiments, a routing header may be encoded as a binary string.
As a non-limiting example, a routing header of a packet may be a binary string “100.” After the node 1 receives the packet and reads the first bit “1” of the routing header, the node 1 can make a routing decision such that the packet is to be transmitted through its output port 11 (i.e., to the right of the node 1). The first bit “1” is then stripped from the binary string such that the packet now has an updated routing header with a binary string “00.” The packet is then transmitted from the output port 11 of the node 1 to the node 21. The node 21 receives the packet and reads the first bit “0” of the updated routing header “00” to make a routing decision such that the packet is to be transmitted through its output port 210 (i.e., to the left of the node 21). The first bit “0” of the updated routing header is then stripped from the binary string such that the packet now has a routing header with a binary string “0.”
The packet is then transmitted from the output port 210 of the node 21 to the node 310. The node 310 receives the packet and reads the first bit “0” of the routing header to make a routing decision such that the packet is to be transmitted through its output port 3100 (i.e., to the left of the node 310). The first bit “0” of the routing header is then stripped from the binary string such that the routing header contains no routing data/binary string or the routing header is now eliminated. At this point, the packet contains only the payload and no routing header. The packet is then transmitted from the output port 3100 of the node 310 to the node 4100. Because the packet contains no routing header, indicating that the receiving node is the destination of the packet, the node 4100 does not make any routing decision and can proceed to convert the optical payload to electrical signals.
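The walk-through above can be reproduced with a short Python sketch. It is illustrative only and models the routing logic, not the optics. It assumes a naming scheme inferred from the example, in which a node label is its tree level followed by the bits consumed so far, and an output port label is the node label followed by the bit just read. The function name `trace_route` is hypothetical.

```python
def trace_route(header: str, start_level: int = 1) -> tuple[list[tuple[str, str, str]], str]:
    """Trace a packet through a binary tree of nodes.

    Returns (hops, destination), where each hop is
    (node label, bit read, output port label).
    ASSUMPTION: labels follow the pattern seen in the example
    (node 1 -> port 11 -> node 21 -> port 210 -> node 310 -> ...).
    """
    hops = []
    level, path = start_level, ""
    while header:                      # an empty header means "this is the destination"
        node = f"{level}{path}"
        bit, header = header[0], header[1:]   # read the first bit, then strip it
        port = node + bit                     # e.g. node 21 reading "0" -> port 210
        hops.append((node, bit, port))
        level, path = level + 1, path + bit   # descend one level along the chosen branch
    destination = f"{level}{path}"
    return hops, destination


if __name__ == "__main__":
    hops, destination = trace_route("100")
    for node, bit, port in hops:
        print(f"node {node} reads {bit!r}, sends via port {port}")
    print(f"destination: node {destination}")
```

For the header “100,” the sketch visits node 1, node 21, and node 310 and ends at node 4100, matching the example.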
Based on these techniques, the nodes (“intermediate nodes”) that are not the destination of the packet in the optical domain can make a routing decision without converting the optical signals (e.g., the payload) they receive into electrical signals. This reduces the latencies associated with the conventional routing protocols that require optical-to-electrical conversion in order for processors to read the routing header to make routing decisions. Moreover, because there is no optical-to-electrical conversion, the intermediate nodes do not need to buffer the packet in the electrical domain and then convert the electrical signals back to optical signals in order to transmit the packet to its destination in the network after the routing decision is made. Further, according to the disclosed techniques, the size of the routing header of the packet is gradually reduced as the packet is transmitted through the intermediate nodes, which in turn reduces the overall bandwidth required to transmit the packet in the network.
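The effect of the shrinking header on bandwidth can be made concrete with a small calculation. The Python sketch below is illustrative only: it assumes one routing bit is consumed per intermediate node, and the 1000-bit payload is a hypothetical figure chosen for the arithmetic, not a value from this disclosure. The function names are likewise hypothetical.

```python
def header_bits_per_link(header_len: int) -> list[int]:
    """Header bits carried on each successive link.

    The first entry is the link into the first node; the last entry is
    the link into the destination, where no header remains.
    ASSUMPTION: exactly one bit is stripped at each intermediate node.
    """
    return [header_len - k for k in range(header_len + 1)]


def payload_fraction(payload_bits: int, header_len: int) -> list[float]:
    """Fraction of transmitted bits that are payload, per link."""
    return [payload_bits / (payload_bits + h)
            for h in header_bits_per_link(header_len)]


if __name__ == "__main__":
    # Hypothetical packet: 3-bit routing header ("100") and 1000-bit payload.
    for link, frac in enumerate(payload_fraction(1000, 3), start=1):
        print(f"link {link}: payload fraction = {frac:.4f}")
```

Under these assumptions, a 3-bit header contributes 3, 2, 1, and 0 bits on the four successive links, so the payload fraction rises toward 100% as the packet approaches its destination.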
To implement these techniques, each of the nodes in the network 100 may be provided with a routing decision-making circuit to make a routing decision.
The data payload encoder 254 includes one or more ring resonators 262 (one is labeled in
The routing decision-making circuit 256 includes one or more ring resonators 264 (one is labeled in
For example, a packet may include a routing header encoded with a wavelength λ1 and a data payload encoded with another wavelength λ2. When the packet is received at the routing decision-making circuit 256, the ring resonator 264 labeled with “λ1” is configured to extract/read a first bit of the routing header. The optical signal extracted by the ring resonator 264 is converted to an electrical signal, which is sent to the controller 258 to make a routing decision. In some implementations, the routing decision function of the controller 258 may be integrated with/disposed at the decision-making circuit 256. Based on the routing decision, the controller 258 may direct the optical transceiver 282 to transmit the packet to a next node in the network. In some embodiments, if the routing header contains no data or the decision-making circuit 256 does not detect any routing header, the controller 258 can determine that its node is the destination of the packet and direct the ring resonator 264 labeled with “λ2” to extract the optical signals of the data payload that is encoded with wavelength λ2.
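The decision logic described above can be sketched in Python as follows. This is a software model of the control flow only, under stated assumptions: the routing header on λ1 and the payload on λ2 are represented as two independent fields, and the ring resonators, photodetection, and transceiver are not modeled. The names `OpticalPacket` and `decide` are hypothetical and do not appear in this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpticalPacket:
    header_l1: str      # routing header bits carried on wavelength λ1
    payload_l2: bytes   # data payload carried on wavelength λ2


def decide(packet: OpticalPacket):
    """Model the controller's decision for one received packet.

    - No header detected on λ1: this node is the destination, so the
      payload on λ2 is extracted.
    - Otherwise: only the first header bit is read to pick the output;
      the payload field is passed through untouched.
    """
    if not packet.header_l1:
        return ("extract_payload", packet.payload_l2)
    bit = packet.header_l1[0]
    remainder = OpticalPacket(packet.header_l1[1:], packet.payload_l2)
    return ("forward", bit, remainder)


if __name__ == "__main__":
    print(decide(OpticalPacket("10", b"data")))
    print(decide(OpticalPacket("", b"data")))
```

Keeping the header and payload in separate fields mirrors the point of the wavelength split: the header bit can be examined without touching the payload.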
Although the controller 258 is illustrated as independent from other components of the node 250, in some embodiments the control functions of the controller 258 may be broken down into individual control blocks and integrated with the routing header encoder 252, the data payload encoder 254, and the routing decision-making circuit 256, respectively.
As each node of the network 100 has the configurations illustrated in
At 460, the node 20 receives the remainder and reads the first bit of the updated routing header. At 462, the node 20 determines whether the first bit of the updated routing header is one (1). If the first bit of the updated routing header is not one (e.g., zero) (No at 462), at 464 the first bit of the updated routing header is stripped to generate a second-updated routing header. The second-updated routing header and the data payload (collectively the “remainder”) are routed through the output port 200 of the node 20. At 466, the remainder is received at the node 300.
If the first bit of the updated routing header is one (Yes at 462), at 468 the first bit of the updated routing header is stripped to generate a second-updated routing header. The second-updated routing header and the data payload (collectively the “remainder”) are routed through the output port 201 of the node 20. At 470, the remainder is received at the node 301.
Following 458, at 472, the node 21 receives the remainder and reads the first bit of the updated routing header. At 474, the node 21 determines whether the first bit of the updated routing header is one (1). If the first bit of the updated routing header is not one (e.g., zero) (No at 474), at 476 the first bit of the updated routing header is stripped to generate a second-updated routing header. The second-updated routing header and the data payload (collectively the “remainder”) are routed through the output port 210 of the node 21. At 478, the remainder is received at the node 310.
If the first bit of the updated routing header is one (Yes at 474), at 480 the first bit of the updated routing header is stripped to generate a second-updated routing header. The second-updated routing header and the data payload (collectively the “remainder”) are routed through the output port 211 of the node 21. At 482, the remainder is received at the node 311. As can be appreciated, the above algorithm can be applied to the nodes 300, 301, 310, and 311, or other nodes in the network 100 to route the data payload until it arrives at the destination node.
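The branches at 462 and 474 amount to a small lookup from (node, first bit) to (output port, next node). The Python sketch below transcribes the four branches described for the nodes 20 and 21 into such a table. It is illustrative only; the table name `FORWARDING` and the function `forward` are hypothetical, and extending the table to the nodes 300, 301, 310, and 311 is left open because their output ports are not enumerated in this passage.

```python
# (node, first bit) -> (output port, next node), transcribed from steps 460-482.
FORWARDING = {
    ("20", "0"): ("200", "300"),   # No at 462
    ("20", "1"): ("201", "301"),   # Yes at 462
    ("21", "0"): ("210", "310"),   # No at 474
    ("21", "1"): ("211", "311"),   # Yes at 474
}


def forward(node: str, header: str) -> tuple[str, str, str]:
    """Apply one routing step at `node`.

    Returns (output port, next node, second-updated header).
    Raises ValueError on an empty header, since this sketch does not
    model the destination case.
    """
    if not header:
        raise ValueError("empty routing header: nothing to route on")
    bit, rest = header[0], header[1:]      # read the first bit, then strip it
    port, next_node = FORWARDING[(node, bit)]
    return port, next_node, rest


if __name__ == "__main__":
    print(forward("21", "00"))
    print(forward("20", "1"))
```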
In response to determining that the routing header is not empty (No at 604), at 608 the node reads a first bit of the routing header to make a routing decision for the data payload. At 610, the node strips the first bit of the routing header in the optical domain to generate an updated routing header. At 612, the node routes the data payload and the updated routing header based on the routing decision to a next node in the network, without converting the optical data payload into electrical signals and without buffering the data payload at the node.
The node 700 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs the node 700 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by the node 700 in response to processor(s) 702 executing one or more sequences of one or more instructions of the packet encoding and routing logic 706 contained in the storage media 704. Execution of the sequences of instructions contained in the storage media 704 causes processor(s) 702 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
The term “non-transitory media,” and similar terms, as used herein refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks. Volatile media includes dynamic memory. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.
In summary, the disclosed techniques enable bit-level decision making for optical packet routing, purely in the optical domain. As the encoded data packet propagates through each decision point (e.g., intermediate node), the bit used for making the decision is removed. Furthermore, the techniques eliminate the need for optical-to-electrical conversion and electrical buffers. Further, by enabling decision-making in the optical domain, the techniques can be applied to optical computing to be combined with optical data transmission.
As used herein, a circuit might be implemented utilizing any form of hardware, software, or a combination thereof. For example, one or more processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logical components, software routines or other mechanisms might be implemented to make up a circuit. In implementation, the various circuits described herein might be implemented as discrete circuits or the functions and features described can be shared in part or in total among one or more circuits. Even though various features or elements of functionality may be individually described or claimed as separate circuits, these features and functionality can be shared among one or more common circuits, and such description shall not require or imply that separate circuits are required to implement such features or functionality.
In general, the word “component,” “engine,” “system,” “database,” “data store,” and the like, as used herein, can refer to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, C or C++. A software component may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software components may be callable from other components or from themselves, and/or may be invoked in response to detected events or interrupts. Software components configured for execution on computing devices may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression or decryption prior to execution). Such software code may be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware components may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors.
In common usage, the term “or” should always be construed in the inclusive sense unless the exclusive sense is specifically indicated or logically necessary. The exclusive sense of “or” is specifically indicated when, for example, the term “or” is paired with the term “either,” as in “either A or B.” As another example, the exclusive sense may also be specifically indicated by appending “exclusive” or “but not both” after the list of items, as in “A or B, exclusively” and “A or B, but not both.” Moreover, the description of resources, operations, or structures in the singular shall not be read to exclude the plural. Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps.
Terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. Adjectives such as “conventional,” “traditional,” “normal,” “standard,” “known,” and terms of similar meaning should not be construed as limiting the item described to a given time period or to an item available as of a given time, but instead should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known now or at any time in the future. The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to” or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent.