Clock synchronization loop

Information

  • Patent Grant
    11706014
  • Patent Number
    11,706,014
  • Date Filed
    Thursday, January 20, 2022
  • Date Issued
    Tuesday, July 18, 2023
Abstract
In one embodiment, a synchronized communication system includes a plurality of compute nodes, and clock connections to connect the compute nodes in a closed loop configuration, wherein the compute nodes are configured to distribute among the compute nodes a master clock frequency from any selected one of the compute nodes.
Description
FIELD OF THE INVENTION

The present invention relates to computer systems, and in particular, but not exclusively to, clock synchronization.


BACKGROUND

Clock and frequency synchronization among network devices is used in many network applications. One application of using a synchronized clock value is for measuring latency between two devices. If the clocks are not synchronized, the resulting latency measurement will be inaccurate.


Synchronous Ethernet (SyncE) is an International Telecommunication Union Telecommunication (ITU-T) Standardization Sector standard for computer networking that facilitates the transference of clock signals over the Ethernet physical layer. In particular, SyncE enables clock synchronization inside a network with respect to a master clock. Each network element (e.g., a switch, a network interface card (NIC), or router) needs to recover the master clock from high-speed data received from the master device clock source and use the recovered master clock for its own data transmission in a manner such that the master clock spreads throughout the network. SyncE provides synchronization with respect to clock frequency. The actual clock value (e.g., in Coordinated Universal Time (UTC) format) is handled by higher layer standards and protocols, such as Precision Time Protocol (PTP).


Time, clock, and frequency synchronization is crucial in some modern computer network applications. It enables 5G and 6G networks, and has been shown to enhance the performance of data center workloads. The SyncE standard improves Precision Time Protocol (PTP) accuracy by reducing the accumulated drift between PTP messages, and helps achieve an accurate time solution for an extended period after a PTP source is completely lost.


SUMMARY

There is also provided in accordance with still another embodiment of the present disclosure, a synchronized communication system, including a plurality of compute nodes, and clock connections to connect the compute nodes in a closed loop configuration, wherein the compute nodes are configured to distribute among the compute nodes a master clock frequency from any selected one of the compute nodes.


Further in accordance with an embodiment of the present disclosure, the system includes a controller to selectively block and unblock distribution of the master clock frequency in the closed loop responsively to one of the compute nodes being designated as a master clock.


Still further in accordance with an embodiment of the present disclosure the compute nodes include at least one of the following: a data processing unit (DPU), graphics processing unit (GPU), switch, network interface controller.


Additionally in accordance with an embodiment of the present disclosure each of the compute nodes includes one or more ports to transmit and receive respective communication signals over respective network links, and clock synchronization circuitry to process at least one of the respective communication signals received by the one or more ports so as to recover a respective remote clock.


Moreover in accordance with an embodiment of the present disclosure each of the compute nodes includes clock synchronization circuitry to recover a remote clock, a clock input port connected to another clock output port of a first one of the compute nodes via a first one of the clock connections, and configured to receive a clock signal at the master clock frequency from the first compute node, and a clock output port connected to another clock input port of a second one of the compute nodes via a second one of the clock connections.


Further in accordance with an embodiment of the present disclosure the first compute node and the second compute node are a same one of the compute nodes.


Still further in accordance with an embodiment of the present disclosure the clock synchronization circuitry is configured to discipline a local clock signal to the master clock frequency responsively to the recovered respective remote clock, or the received clock signal, and output the disciplined local clock signal via the clock output port to the second compute node.


Additionally in accordance with an embodiment of the present disclosure the clock synchronization circuitry includes a frequency synthesizer.


Moreover, in accordance with an embodiment of the present disclosure the frequency synthesizer is a frequency jitter synchronizer.


Further in accordance with an embodiment of the present disclosure the frequency synthesizer is a jitter network synchronizer clock.


Still further in accordance with an embodiment of the present disclosure the clock synchronization circuitry is configured to discipline a local clock signal to the master clock frequency responsively to the recovered respective remote clock, and output the disciplined local clock signal via the clock output port to the second compute node.


Additionally in accordance with an embodiment of the present disclosure the clock synchronization circuitry is configured to ignore the clock signal received by the clock input port.


Moreover, in accordance with an embodiment of the present disclosure, the system includes a controller to selectively block distribution of the master clock frequency in the closed loop by instructing the clock synchronization circuitry to ignore the clock signal received by the clock input port responsively to one of the compute nodes being designated as a master clock.


Further in accordance with an embodiment of the present disclosure the clock synchronization circuitry is configured to discipline a local clock signal to the master clock frequency responsively to the received clock signal, and output the disciplined local clock signal via the clock output port to the second compute node.


Still further in accordance with an embodiment of the present disclosure the compute nodes are configured to distribute the master clock frequency via respective ones of the clock connections using at least one of a one pulse per second (PPS) signal, or a 10 mega Hertz (10 MHz) signal.


There is also provided in accordance with still another embodiment of the present disclosure, a synchronized communication method, including connecting compute nodes with clock connections in a closed loop configuration, and distributing among the compute nodes a master clock frequency from any selected one of the compute nodes.


Additionally in accordance with an embodiment of the present disclosure, the method includes selectively blocking and unblocking distribution of the master clock frequency in the closed loop responsively to one of the compute nodes being designated as a master clock.


Moreover, in accordance with an embodiment of the present disclosure the compute nodes include at least one of the following: a data processing unit (DPU), graphics processing unit (GPU), switch, network interface controller.


Further in accordance with an embodiment of the present disclosure, the method includes recovering a remote clock, connecting a clock input port to another clock output port of a first one of the compute nodes via a first one of the clock connections, receiving a clock signal at the master clock frequency from the first compute node, and connecting a clock output port to another clock input port of a second one of the compute nodes via a second one of the clock connections.


Still further in accordance with an embodiment of the present disclosure the first compute node and the second compute node are a same one of the compute nodes.


Additionally in accordance with an embodiment of the present disclosure, the method includes disciplining a local clock signal to the master clock frequency responsively to the recovered respective remote clock, or the received clock signal, and outputting the disciplined local clock signal via the clock output port to the second compute node.


Moreover, in accordance with an embodiment of the present disclosure, the method includes disciplining a local clock signal to the master clock frequency responsively to the recovered respective remote clock, and outputting the disciplined local clock signal via the clock output port to the second compute node.


Further in accordance with an embodiment of the present disclosure, the method includes ignoring the clock signal received by the clock input port.


Still further in accordance with an embodiment of the present disclosure, the method includes selectively blocking distribution of the master clock frequency in the closed loop by instructing clock synchronization circuitry to ignore the clock signal received by the clock input port responsively to one of the compute nodes being designated as a master clock.


Additionally in accordance with an embodiment of the present disclosure, the method includes disciplining a local clock signal to the master clock frequency responsively to the received clock signal, and outputting the disciplined local clock signal via the clock output port to the second compute node.


Moreover, in accordance with an embodiment of the present disclosure, the method includes distributing the master clock frequency via respective ones of the clock connections using at least one of a one pulse per second (PPS) signal, or a 10 mega Hertz (10 MHz) signal.





BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood from the following detailed description, taken in conjunction with the drawings in which:



FIG. 1 is a block diagram view of a clock synchronization system with one compute node designated as a master clock constructed and operative in accordance with an embodiment of the present invention;



FIG. 2 is a block diagram view of the clock synchronization system of FIG. 1 with another compute node designated as the master clock;



FIG. 3 is a flowchart including steps in a method of operation of a controller of the system of FIG. 1;



FIG. 4 is a flowchart including steps in a method of operation of clock synchronization circuitry in a compute node in the system of FIG. 1;



FIG. 5 is a block diagram view of a clock synchronization system with two compute nodes constructed and operative in accordance with an alternative embodiment of the present invention; and



FIG. 6 is a more detailed block diagram view of a compute node in the system of FIG. 1.





DESCRIPTION OF EXAMPLE EMBODIMENTS

Clock synchronization between compute nodes remains an unsolved challenge in the networking industry. One solution is to use SyncE clock chaining, in which multiple SyncE-capable devices are chained together so that the master clock is distributed from one compute node at the root of the chain to the other compute nodes in the chain. The root is defined by the wiring topology.


SyncE clock chaining may have some limitations, including imposing a local clock hierarchy that is dictated by the physical wiring and introducing a possible "single point of failure," since the root controls the frequency of the entire chain. For example, if the compute node at the root malfunctions, it becomes impossible to distribute the clock among the remaining compute nodes. Therefore, if the master clock moves to another one of the compute nodes, the master clock cannot be distributed based on the physical wiring.


One solution to the above problems is to transfer information regarding frequency differences between the root and the new master clock via some centralized entity, such as a SyncE software daemon running on a central processing unit (CPU). However, this solution adds complexity to the interfaces between the software and the hardware/firmware, and to the software itself, and may add inaccuracies to the timing solution due to latencies and jitter of the control messages exchanged between the devices and the managing software. Additionally, this solution may add CPU load due to exchanging messages and performing calculations. It should be noted that CPU utilization is extremely important in 5G use cases, where SyncE is commonly required.


Embodiments of the present invention solve at least some of the above problems by using clock connections to connect the compute nodes in a closed loop configuration. For example, compute node 1 is connected to compute node 2, which is connected to compute node 3, which is connected to compute node 1, forming a closed loop. The closed loop may then be used to distribute a master clock frequency among the compute nodes from any selected one of the compute nodes in the closed loop by passing the master clock frequency from compute node to compute node in the closed loop. For example, if one of the compute nodes is designated as a master clock, the master clock frequency is distributed from the compute node designated as the master clock to the other compute nodes via the clock connections of the closed loop. If at a later time another one of the compute nodes is designated as the master clock (for example, because the compute node previously designated as the master clock malfunctioned), the master clock frequency is distributed from the compute node of the newly designated master clock to the other compute nodes via the clock connections of the closed loop. Therefore, if one of the compute nodes malfunctions, it is still possible to operate another one of the compute nodes to distribute the master clock frequency.


In some embodiments, a clock output port of one compute node is connected to the clock input port of another compute node with a cable or other connection (e.g., a trace on a circuit board), and so on, until all the compute nodes are connected together in a closed loop. For example, the clock output port of node 1 is connected to the clock input port of node 2, and so on. The clock output port of node 3 is connected to the clock input port of node 1, thereby completing the loop. Upon detecting a clock signal at its clock input port, a compute node in the closed loop uses the received clock signal to discipline its local clock signal. The received clock signal may then be output via the clock output port of that compute node to the next compute node in the chain, and so on.
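As an illustration only (not part of the patent text), the closed-loop wiring described above can be modeled as a mapping from each node's clock output port to the next node's clock input port. The minimal sketch below uses hypothetical node names and assumes three nodes wired output-to-input around the ring.

```python
# Minimal sketch (hypothetical names): model the closed-loop clock wiring in
# which each node's clock output port feeds the next node's clock input port.

def ring_wiring(node_ids):
    """Return {upstream_node: downstream_node} for a closed loop."""
    return {node_ids[i]: node_ids[(i + 1) % len(node_ids)]
            for i in range(len(node_ids))}

if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    wiring = ring_wiring(nodes)
    for out_node, in_node in wiring.items():
        print(f"clock_out of {out_node} -> clock_in of {in_node}")
    # clock_out of node1 -> clock_in of node2
    # clock_out of node2 -> clock_in of node3
    # clock_out of node3 -> clock_in of node1  (closes the loop)
```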


In some embodiments, the compute node designated as the master clock should not use a clock signal received from another compute node to discipline its local clock signal. Instead, the compute node designated as the master clock disciplines its local clock signal from a recovered remote clock. It is this recovered remote clock which is distributed around the loop to the other compute nodes. In some embodiments, software or firmware running on a controller breaks the chain of the closed loop so that the compute node designated as the master clock does not use a clock signal received via its clock input port. Therefore, software or firmware may instruct the compute node designated as the master clock to ignore the received clock signal at its clock input port and by default use the recovered remote clock to discipline its local clock signal. In other embodiments, software or firmware running on a controller breaks the chain of the closed loop so that the compute node designated as the master clock does not receive a clock signal via its clock input port. Therefore, in some embodiments, the software or firmware running on the controller may instruct the compute node, which would otherwise pass its clock via its clock output port to the compute node of the designated master clock, to not output a clock signal to the compute node of the designated master clock.
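To illustrate the two loop-breaking options just described, the hedged sketch below (hypothetical data structures and function names, not the patent's firmware interface) shows a controller either telling the master-designated node to ignore its clock input port, or telling the node immediately upstream to stop driving its clock output port.

```python
# Minimal sketch (assumed/hypothetical API): two ways a controller can "break"
# the closed loop at the node designated as the master clock.

from dataclasses import dataclass

@dataclass
class NodeConfig:
    ignore_clock_in: bool = False   # master may ignore the signal at its clock input port
    drive_clock_out: bool = True    # upstream node may be told to stop driving its output

def break_loop_at_master(configs, ring, master, strategy="ignore_input"):
    """Block master-frequency distribution into the master node.

    ring maps each node to the next (downstream) node in the closed loop.
    """
    if strategy == "ignore_input":
        # Option 1: the master disciplines from the recovered remote clock
        # and ignores whatever arrives at its clock input port.
        configs[master].ignore_clock_in = True
    elif strategy == "mute_upstream":
        # Option 2: the node immediately upstream of the master stops
        # outputting a clock signal toward the master.
        upstream = next(n for n, nxt in ring.items() if nxt == master)
        configs[upstream].drive_clock_out = False
    return configs

if __name__ == "__main__":
    ring = {"node1": "node2", "node2": "node3", "node3": "node1"}
    configs = {n: NodeConfig() for n in ring}
    break_loop_at_master(configs, ring, master="node2", strategy="mute_upstream")
    print(configs["node1"])  # NodeConfig(ignore_clock_in=False, drive_clock_out=False)
```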


Each of the compute nodes may include clock synchronization circuitry which performs at least some of the following: recovering a remote clock and disciplining a local clock signal based on the recovered remote clock, receiving the clock signal via the chain, disciplining the local clock signal based on the received clock signal, and passing the local clock signal to the next compute node in the chain. The clock synchronization circuitry may include a frequency jitter synchronizer, for example, a low or ultra-low frequency jitter synchronizer. An example of a suitable frequency synthesizer is the Ultra-Low Jitter Network Synchronizer Clock LMK05318, commercially available from Texas Instruments Inc., 12500 TI Boulevard, Dallas, Tex. 75243, U.S.A.


SYSTEM DESCRIPTION

Reference is now made to FIG. 1, which is a block diagram view of a clock synchronization system 10 with one compute node 12-2 designated as a master clock constructed and operative in accordance with an embodiment of the present invention. The system 10 includes a plurality of compute nodes 12 (labeled compute nodes 12-1, 12-2, 12-3), and a controller 14. Each compute node 12 may include processing circuitry 16, one or more ports 18, clock synchronization circuitry 20 (which optionally includes a frequency synchronizer 22), an oscillator 24, a clock input port 26, and a clock output port 28.


A plurality of clock connections 30 are configured to connect the compute nodes 12 in a closed loop configuration. For example, compute node 12-1 is connected to compute node 12-2, which is connected to compute node 12-3, which in turn is connected to compute node 12-1 via the clock connections 30 as described in more detail below.



FIG. 1 shows three compute nodes 12 connected together in a closed loop configuration. The system 10 may include two compute nodes 12 connected together in a closed loop configuration, as described in more detail with reference to FIG. 5. The system 10 may include more than three compute nodes 12 connected together in a closed loop configuration. The compute nodes 12 may be disposed on the same printed circuit board (not shown), with the clock connections 30 being implemented using printed circuit board (PCB) traces (not shown) on the circuit board between the compute nodes 12.


The processing circuitry 16 may include hardwired processing circuitry and/or one or more processors on which to execute software. The software may be downloaded to the compute node 12 or disposed on the compute node 12 at manufacture. The processing circuitry 16 may include packet processing circuitry, which may include a physical layer (PHY) chip and a MAC chip (not shown). The processing circuitry 16 may include switching circuitry, a data processing unit (DPU), a graphics processing unit (GPU), and/or any suitable processor, as described in more detail with reference to FIG. 6.


The port(s) 18 are configured to transmit and receive respective communication signals over respective network links, for example, to receive a clock synchronization signal or clock synchronization packets from a remote clock 32. The clock synchronization signal or clock synchronization packets may be received via any suitable interface via any suitable communication method and protocol.


The clock input port 26 of one of the compute nodes 12 (e.g., compute node 12-1) is connected to the clock output port 28 of another one of the compute nodes 12 (e.g., compute node 12-3) via one of the clock connections 30, and configured to receive a clock signal at the master clock frequency from the other compute node 12 (e.g., compute node 12-3). The clock output port 28 of one of the compute nodes 12 (e.g., compute node 12-1) is connected to the clock input port 26 of another one of the compute nodes 12 (e.g., compute node 12-2) via one of the clock connections 30. The clock output port 28 of the compute node 12-2 is connected to the clock input port 26 of the compute node 12-3 via one of the clock connections 30.


In general, the compute nodes 12 are configured to distribute among the compute nodes 12 a master clock frequency from any selected one of the compute nodes, for example, the compute node 12-2 designated as the master clock.


In the example of FIG. 1, the compute node 12-2 disciplines its local clock signal from the remote clock 32 and is designated as the master clock, for example by the controller 14. The compute node 12-2 distributes its local clock signal as the master clock frequency via the clock output port 28 of compute node 12-2 to the clock input port 26 of compute node 12-3. The compute node 12-3 disciplines its local clock signal responsively to the clock signal received at the clock input port 26 of compute node 12-3. The compute node 12-3 distributes its local clock signal as the master clock frequency via the clock output port 28 of compute node 12-3 to the clock input port 26 of compute node 12-1. The compute node 12-1 disciplines its local clock signal responsively to the clock signal received at the clock input port 26 of compute node 12-1. In some embodiments, the compute node 12-1 is instructed by the controller 14 not to distribute its local clock signal via the clock output port 28 of compute node 12-1. In other embodiments, the compute node 12-1 distributes its local clock signal as the master clock frequency via the clock output port 28 of compute node 12-1 to the clock input port 26 of compute node 12-2, which is instructed by the controller 14 to ignore the clock signal received at the clock input port 26 of compute node 12-2.


The compute nodes 12 may be configured to distribute the master clock frequency via respective clock connections 30 in the form of any signal that is scaled proportionally to the master clock frequency, such as one pulse per second (PPS) signal(s) or 10 megahertz (10 MHz) signal(s). The scaling factor may be used by the clock synchronization circuitry 20 of the outputting compute node 12 to scale the master clock frequency to one PPS or 10 MHz, for example, and by the clock synchronization circuitry 20 of the receiving compute node 12 to rebuild the received signal (e.g., one PPS or 10 MHz) to the master clock frequency.
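As a hedged numerical illustration of this scaling (the frequencies and scale factor below are arbitrary examples, not values from the patent), the distributing node divides its master clock frequency down to a 1 PPS or 10 MHz distribution signal, and the receiving node multiplies back up by the agreed factor.

```python
# Minimal sketch with arbitrary example numbers: scale a master clock frequency
# down to a distribution signal (1 PPS or 10 MHz) and rebuild it at the receiver.

DISTRIBUTION_HZ = 10e6          # could also be 1.0 for a one-pulse-per-second signal

def scale_for_distribution(master_hz, distribution_hz=DISTRIBUTION_HZ):
    """Outputting node: derive the distribution signal and the shared scale factor."""
    factor = master_hz / distribution_hz
    return distribution_hz, factor

def rebuild_master(distribution_hz, factor):
    """Receiving node: rebuild the master clock frequency from the distribution signal."""
    return distribution_hz * factor

if __name__ == "__main__":
    master_hz = 156.25e6                      # example reference frequency, assumption only
    sig_hz, factor = scale_for_distribution(master_hz)
    print(sig_hz, factor)                     # 10000000.0 15.625
    print(rebuild_master(sig_hz, factor))     # 156250000.0
```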


In some embodiments, the frequency synchronizer 22 is a frequency jitter synchronizer or a jitter network synchronizer clock. The frequency synchronizer 22 may be configured to tune a network frequency, feed the clock of the compute node 12, and provide phase lock loop (PLL) capabilities. In some embodiments, the frequency synchronizer 22 includes an application-specific integrated circuit (ASIC) and/or a programmable device with analog circuitry, mainly for phase lock loop (PLL) capabilities. The frequency synchronizer 22 may be a low or ultra-low frequency jitter synchronizer. An example of a suitable frequency synthesizer is the Ultra-Low Jitter Network Synchronizer Clock LMK05318, commercially available from Texas Instruments Inc., 12500 TI Boulevard, Dallas, Tex. 75243, U.S.A.


In the compute node 12-2 designated as the master clock, the frequency synchronizer 22 adjusts the output of the oscillator 24 to provide a local clock signal based on a clock recovered from the remote clock 32. In the compute node(s) 12-1, 12-3 not designated as the master clock, the clock signal received at the clock input port 26 is used by the frequency synchronizer 22 to drive the local clock signal, generally without using the output of the oscillator 24.


In some embodiments, the frequency synchronizer 22 is configured to use the clock signal received at the clock input port 26 if such a clock signal is received. If not, the frequency synchronizer 22 disciplines the local clock signal based on the output of the oscillator 24 and/or a recovered remote clock. Therefore, in some embodiments, software or firmware running on the controller 14 breaks the chain of the closed loop so that the compute node 12-2 designated as the master clock does not use a clock signal received at its clock input port 26 or does not receive a clock signal at its clock input port 26, as described in more detail with reference to FIG. 3.


When the compute nodes 12 boot up, each compute node 12 looks for a clock signal being received at its own clock input port 26 and if a clock signal is not found, the respective compute node 12 uses a local clock, for example, based on an output of the oscillator 24 in that compute node 12. Therefore, the first compute node 12 to boot up outputs a clock signal based on a local clock from its clock output port 28 to the next compute node 12 in the closed loop. The next compute node 12 then detects the clock signal input via its clock input port 26 and uses the received clock signal to discipline its local clock signal, and so on. When one of the compute nodes 12 is designated as a master clock, that compute node 12 does not use the clock signal received at its clock input port 26, but disciplines its local clock signal based on the remote clock 32 and outputs its local clock signal via its clock output port 28 to the next compute node 12 in the loop, and so on. Another option is to assign one of the compute nodes 12 as a default master clock.
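The boot-up behavior described above can be summarized, purely as an illustrative sketch with hypothetical names, as a simple clock-source priority: a node designated as the master disciplines from the recovered remote clock, a node that sees a signal at its clock input port disciplines from that signal, and otherwise the node free-runs on its local oscillator.

```python
# Minimal sketch (hypothetical model): clock-source selection for a node in the loop.

def select_clock_source(is_master, clock_in_present, remote_clock_recovered):
    """Return which reference the node's synchronizer disciplines to."""
    if is_master and remote_clock_recovered:
        return "recovered_remote_clock"   # master ignores its clock input port
    if clock_in_present:
        return "clock_input_port"         # slave follows the loop
    return "local_oscillator"             # e.g., just after boot, before the loop is up

if __name__ == "__main__":
    print(select_clock_source(False, False, False))  # local_oscillator (first node to boot)
    print(select_clock_source(False, True, False))   # clock_input_port
    print(select_clock_source(True, True, True))     # recovered_remote_clock
```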


Reference is now made to FIG. 2, which is a block diagram view of the clock synchronization system of FIG. 1 with compute node 12-3 designated as the master clock. The master clock may be moved from one compute node 12 to another due to many reasons, for example, the remote clock 32 used by one of the compute nodes 12 previously designated as the master clock may now be non-functional or deemed to be less accurate than a remote clock used by another one of the compute nodes 12 now designated as the master clock.


In the example of FIG. 2, the compute node 12-3 is now designated as the master clock (for example, by the controller 14), and disciplines its local clock signal from the remote clock 32. Either the compute node 12-3 ignores any clock signal received at its clock input port 26, or the controller 14 instructs the compute node 12-2 to cease outputting the local clock signal of compute node 12-2 via the clock output port 28 of compute node 12-2. The compute node 12-3 distributes its local clock signal as the master clock frequency via the clock output port 28 of compute node 12-3 to the clock input port 26 of the compute node 12-1. The compute node 12-1 disciplines its local clock signal responsively to the clock signal received at the clock input port 26 of compute node 12-1. The compute node 12-1 distributes its local clock signal as the master clock frequency via the clock output port 28 of compute node 12-1 to the clock input port 26 of compute node 12-2. The compute node 12-2 disciplines its local clock signal responsively to the clock signal received at the clock input port 26 of compute node 12-2. As mentioned above, in some embodiments, the compute node 12-2 is instructed by the controller 14 not to distribute its local clock signal via the clock output port 28 of compute node 12-2. In other embodiments, the compute node 12-2 distributes its local clock signal as the master clock frequency via the clock output port 28 of compute node 12-2 to the clock input port 26 of compute node 12-3, which is instructed by the controller 14 to ignore the clock signal received at the clock input port 26 of compute node 12-3.


Reference is now made to FIG. 3, which is a flowchart 300 including steps in a method of operation of the controller 14 of the system 10 of FIG. 1.


In some embodiments, the controller 14 is configured to run a software daemon which knows the topology of the system 10 (i.e., how the compute nodes 12 are connected in the closed loop) and which compute node 12 is the master clock (e.g., SyncE master) so that the software daemon knows where to block and unblock the closed loop. If the compute nodes 12 are disposed in different hosts, then the hosts may need to communicate with respect to blocking and unblocking the closed loop.


The controller 14 is configured to identify or designate one of the compute nodes 12 as the master clock (block 302). The controller 14 is configured to selectively block and unblock distribution of the master clock frequency in the closed loop responsively to one of the compute nodes 12 being designated as a master clock (block 304). In some embodiments, the controller 14 is configured to instruct the clock synchronization circuitry 20 of the compute node 12 designated as the master clock to ignore the clock signal received at its clock input port 26 responsively to that compute node 12 being designated as the master clock (block 306). In other embodiments, the controller 14 is configured to instruct the clock synchronization circuitry 20 of the compute node 12 (previously designated as a slave clock) located immediately prior to the compute node 12 designated as the master clock in the closed loop not to send its local clock signal via its clock output port 28 to the compute node 12 designated as the master clock (block 308).
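The hedged sketch below restates the controller steps of FIG. 3 (blocks 302-308) as a small routine; the data structures and function names are illustrative assumptions, not the patent's implementation.

```python
# Minimal sketch (illustrative only): controller logic corresponding to FIG. 3.
# Block 302: designate a master; block 304: block/unblock the loop;
# block 306 or 308: choose where the loop is broken.

def designate_master(ring, node_state, new_master, strategy="ignore_input"):
    # Block 302: record which node is now the master clock.
    for node in node_state:
        node_state[node]["is_master"] = (node == new_master)
        # Block 304: unblock the loop everywhere first.
        node_state[node]["ignore_clock_in"] = False
        node_state[node]["drive_clock_out"] = True
    # Blocks 306/308: re-block the loop just before the new master.
    if strategy == "ignore_input":
        node_state[new_master]["ignore_clock_in"] = True          # block 306
    else:
        upstream = next(n for n, nxt in ring.items() if nxt == new_master)
        node_state[upstream]["drive_clock_out"] = False           # block 308
    return node_state

if __name__ == "__main__":
    ring = {"12-1": "12-2", "12-2": "12-3", "12-3": "12-1"}
    state = {n: {} for n in ring}
    designate_master(ring, state, "12-2")
    print(state["12-2"])  # {'is_master': True, 'ignore_clock_in': True, 'drive_clock_out': True}
```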


In practice, some or all of the functions of the controller 14 may be combined in a single physical component or, alternatively, implemented using multiple physical components. These physical components may comprise hard-wired or programmable devices, or a combination of the two. In some embodiments, at least some of the functions of the controller 14 may be carried out by a programmable processor under the control of suitable software. This software may be downloaded to a device in electronic form, over a network, for example. Alternatively, or additionally, the software may be stored in tangible, non-transitory computer-readable storage media, such as optical, magnetic, or electronic memory.


Reference is now made to FIG. 4, which is a flowchart 400 including steps in a method of operation of the clock synchronization circuitry 20 in one of the compute nodes 12 (e.g., compute node 12-3) in the system 10 of FIG. 1.


The flowchart 400 is first traversed assuming that the compute node 12-3 is designated as a slave clock.


When the compute node 12-3 first boots up, the clock synchronization circuitry 20 of the compute node 12-3 is configured to generate a local clock signal responsively to an output from the oscillator 24 (block 402). After a short delay, assuming there is still no clock signal received by the clock input port 26 of the compute node 12-3, the clock synchronization circuitry 20 of the compute node 12-3 is configured to recover a remote clock, e.g., from the remote clock 32 (block 404). The step of block 404 may include the clock synchronization circuitry 20 being configured to process respective communication signal(s) received by the respective port(s) 18 so as to recover a respective remote clock (block 406). The clock synchronization circuitry 20 of the compute node 12-3 is configured to receive a clock signal via the clock input port 26 of the compute node 12-3 (block 408) from the previous compute node 12-2 in the closed loop. The clock synchronization circuitry 20 of the compute node 12-3 is configured to discipline its local clock signal to the master clock frequency responsively to the received clock signal (block 410). The clock synchronization circuitry 20 of the compute node 12-3 is configured to output the disciplined local clock signal via the clock output port 28 of the compute node 12-3 to the next compute node 12-1 in the closed loop (block 412).


The flowchart 400 is now traversed assuming that the compute node 12-3 is now designated as a master clock.


One or more of the steps of blocks 402-408 may be performed. If a clock signal is received by the clock synchronization circuitry 20 of the compute node 12-3 via the clock input port 26 of compute node 12-3, the clock synchronization circuitry 20 of the compute node 12-3 is configured to ignore the clock signal received by the clock input port 26 (block 414). The clock synchronization circuitry 20 of compute node 12-3 is configured to discipline the local clock signal of compute node 12-3 to the master clock frequency responsively to the recovered remote clock (recovered in the step of blocks 404 and/or 406) (block 416). The clock synchronization circuitry 20 of the compute node 12-3 is then configured to perform the step of block 412.
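The hedged sketch below walks through flowchart 400 for a single node, first as a slave and then as a master; it is a behavioral illustration under assumed names and example numbers, not the clock synchronization circuitry 20 itself.

```python
# Minimal sketch (behavioral illustration of FIG. 4, assumed names only).

def node_step(is_master, clock_in_signal, recovered_remote_clock, oscillator_clock):
    """Return (disciplined_local_clock, output_to_next_node)."""
    if is_master:
        # Blocks 414/416: ignore the clock input port, discipline from the remote clock.
        local = recovered_remote_clock
    elif clock_in_signal is not None:
        # Blocks 408/410: discipline to the clock signal received from the previous node.
        local = clock_in_signal
    else:
        # Block 402: no reference yet, free-run on the oscillator.
        local = oscillator_clock
    # Block 412: pass the disciplined local clock to the next node in the loop.
    return local, local

if __name__ == "__main__":
    # Slave path: follows the signal arriving at the clock input port.
    print(node_step(False, clock_in_signal=10e6, recovered_remote_clock=None,
                    oscillator_clock=9.999e6))
    # Master path: follows the recovered remote clock and ignores the input port.
    print(node_step(True, clock_in_signal=10e6, recovered_remote_clock=10.000001e6,
                    oscillator_clock=9.999e6))
```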


In practice, some or all of the functions of the clock synchronization circuitry 20 may be combined in a single physical component or, alternatively, implemented using multiple physical components. These physical components may comprise hard-wired or programmable devices, or a combination of the two. In some embodiments, at least some of the functions of the clock synchronization circuitry 20 may be carried out by a programmable processor under the control of suitable software. This software may be downloaded to a device in electronic form, over a network, for example. Alternatively, or additionally, the software may be stored in tangible, non-transitory computer-readable storage media, such as optical, magnetic, or electronic memory.


Reference is now made to FIG. 5, which is a block diagram view of a clock synchronization system 500 with two compute nodes 12 constructed and operative in accordance with an alternative embodiment of the present invention.


The clock synchronization system 500 is substantially the same as the system 10 except that in the clock synchronization system 500 there are only two compute nodes 12. The clock synchronization system 500 may be compared to combining compute nodes 12-1, 12-3 of system 10 into the same compute node 12-1, which is in a closed loop with the compute node 12-2.


In the clock synchronization system 500, the clock output port 28 of compute node 12-1 is connected to the clock input port 26 of compute node 12-2 via one of the clock connections 30, and the clock output port 28 of compute node 12-2 is connected to the clock input port 26 of compute node 12-1 via one of the clock connections 30 thereby forming the closed loop.


Reference is now made to FIG. 6, which is a more detailed block diagram view of one of the compute nodes 12 in the system 10 of FIG. 1. The compute node 12 may include any one or more of the following: a data processing unit (DPU) 600, a graphics processing unit (GPU) 602, a switch 604, or a network interface controller (NIC) 606.


Graphics processing units (GPUs) are employed to generate three-dimensional (3D) graphics objects and two-dimensional (2D) graphics objects for a variety of applications, including feature films, computer games, virtual reality (VR) and augmented reality (AR) experiences, mechanical design, and/or the like. A modern GPU includes texture processing hardware to generate the surface appearance, referred to herein as the “surface texture,” for 3D objects in a 3D graphics scene. The texture processing hardware applies the surface appearance to a 3D object by “wrapping” the appropriate surface texture around the 3D object. This process of generating and applying surface textures to 3D objects results in a highly realistic appearance for those 3D objects in the 3D graphics scene.


The texture processing hardware is configured to perform a variety of texture-related instructions, including texture operations and texture loads. The texture processing hardware accesses texture information by generating memory references, referred to herein as "queries," to a texture memory. The texture processing hardware retrieves surface texture information from the texture memory under varying circumstances, such as while rendering object surfaces in a 3D graphics scene for display on a display device, while rendering a 2D graphics scene, or during compute operations.


Surface texture information includes texture elements (referred to herein as “texels”) used to texture or shade object surfaces in a 3D graphics scene. The texture processing hardware and associated texture cache are optimized for efficient, high throughput read-only access to support the high demand for texture information during graphics rendering, with little or no support for write operations. Further, the texture processing hardware includes specialized functional units to perform various texture operations, such as level of detail (LOD) computation, texture sampling, and texture filtering.


In general, a texture operation involves querying multiple texels around a particular point of interest in 3D space, and then performing various filtering and interpolation operations to determine a final color at the point of interest. By contrast, a texture load typically queries a single texel and returns that texel directly to the user application for further processing. Because filtering and interpolating operations typically involve querying four or more texels per processing thread, the texture processing hardware is conventionally built to accommodate generating multiple queries per thread. For example, the texture processing hardware could be built to accommodate up to four texture memory queries performed in a single memory cycle. In that manner, the texture processing hardware is able to query and receive most or all of the needed texture information in one memory cycle.
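Purely as an illustration of the texture-operation arithmetic described above (and not part of the clock synchronization embodiments), the sketch below shows why a single filtered sample touches four texels: bilinear filtering queries the 2x2 texel neighborhood around the sample point and interpolates.

```python
# Minimal sketch (illustrative only): bilinear filtering queries four texels
# around a sample point and interpolates, which is why texture hardware is
# built to issue multiple texture memory queries per thread.

def bilinear_sample(texture, u, v):
    """texture is a 2D list of grayscale texels; (u, v) are texel-space coordinates."""
    x0, y0 = int(u), int(v)
    x1, y1 = x0 + 1, y0 + 1
    fx, fy = u - x0, v - y0
    # Four texel queries for one filtered sample.
    t00, t10 = texture[y0][x0], texture[y0][x1]
    t01, t11 = texture[y1][x0], texture[y1][x1]
    top = t00 * (1 - fx) + t10 * fx
    bottom = t01 * (1 - fx) + t11 * fx
    return top * (1 - fy) + bottom * fy

if __name__ == "__main__":
    tex = [[0.0, 1.0], [1.0, 0.0]]
    print(bilinear_sample(tex, 0.5, 0.5))  # 0.5
```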


Various features of the invention which are, for clarity, described in the contexts of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable sub-combination.


The embodiments described above are cited by way of example, and the present invention is not limited by what has been particularly shown and described hereinabove. Rather the scope of the invention includes both combinations and sub-combinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.

Claims
  • 1. A synchronized communication system, comprising: a plurality of compute nodes including a first compute node, one or more intermediate compute nodes, and a last compute node; and clock connections to connect the compute nodes in a closed loop configuration, wherein: each of the compute nodes has an output connected to an input of a next one of the compute nodes via a respective one of the clock connections, while the last compute node has an output connected to an input of the first compute node via another respective one of the clock connections; the compute nodes are configured to distribute among the compute nodes, via ones of the clock connections, a master clock frequency from any selected one of the compute nodes, which is designated as a master clock; at a first time one of the plurality of compute nodes is designated as the master clock and is configured to distribute the master clock frequency among the compute nodes; and at a second time another one of the plurality of compute nodes is designated as the master clock, and is configured to distribute the master clock frequency among the compute nodes.
  • 2. The system according to claim 1, further comprising a controller to selectively block and unblock distribution of the master clock frequency in the closed loop responsively to one of the compute nodes being designated as a master clock.
  • 3. The system according to claim 1, wherein the compute nodes include at least one of the following: a data processing unit (DPU), graphics processing unit (GPU), switch, network interface controller.
  • 4. The system according to claim 1, wherein each of the compute nodes comprises: one or more ports to transmit and receive respective communication signals over respective network links; and clock synchronization circuitry to process at least one of the respective communication signals received by the one or more ports so as to recover a respective remote clock.
  • 5. The system according to claim 1, wherein the first compute node comprises: clock synchronization circuitry to recover a remote clock; a clock input port connected to a clock output port of a third compute node of the plurality of compute nodes via a first one of the clock connections, and configured to receive a clock signal at the master clock frequency from the third compute node; and a clock output port connected to a clock input port of a second compute node of the plurality of compute nodes via a second one of the clock connections.
  • 6. The system according to claim 5, wherein the third compute node and the second compute node are a same one of the compute nodes.
  • 7. The system according to claim 5, wherein the clock synchronization circuitry is configured to: discipline a local clock signal to the master clock frequency responsively to: the recovered respective remote clock; or the received clock signal; and output the disciplined local clock signal via the clock output port to the second compute node.
  • 8. The system according to claim 7, wherein the clock synchronization circuitry comprises a frequency synthesizer.
  • 9. The system according to claim 8, wherein the frequency synthesizer is a frequency jitter synchronizer.
  • 10. The system according to claim 8, wherein the frequency synthesizer is a jitter network synchronizer clock.
  • 11. The system according to claim 5, wherein the clock synchronization circuitry is configured to: discipline a local clock signal to the master clock frequency responsively to the recovered respective remote clock; and output the disciplined local clock signal via the clock output port to the second compute node.
  • 12. The system according to claim 11, wherein the clock synchronization circuitry is configured to ignore the clock signal received by the clock input port.
  • 13. The system according to claim 12, further comprising a controller to selectively block distribution of the master clock frequency in the closed loop by instructing the clock synchronization circuitry to ignore the clock signal received by the clock input port responsively to one of the compute nodes being designated as a master clock.
  • 14. The system according to claim 5, wherein the clock synchronization circuitry is configured to: discipline a local clock signal to the master clock frequency responsively to the received clock signal; and output the disciplined local clock signal via the clock output port to the second compute node.
  • 15. The system according to claim 1, wherein the compute nodes are configured to distribute the master clock frequency via respective ones of the clock connections using at least one of: a one pulse per second (PPS) signal; or a 10 mega Hertz (10 MHz) signal.
  • 16. A synchronized communication method, comprising: connecting compute nodes including a first compute node, one or more intermediate compute nodes, and a last compute node, with clock connections in a closed loop configuration so that each of the compute nodes has an output connected to an input of a next one of the compute nodes via a respective one of the clock connections, while the last compute node has an output connected to an input of the first compute node via another respective one of the clock connections; distributing among the compute nodes, via ones of the clock connections, a master clock frequency from any selected one of the compute nodes, which is designated as a master clock; at a first time, designating one of the plurality of compute nodes as the master clock and distributing the master clock frequency among the compute nodes; and at a second time, designating another one of the plurality of compute nodes as the master clock, and distributing the master clock frequency among the compute nodes.
  • 17. The method according to claim 16, further comprising selectively blocking and unblocking distribution of the master clock frequency in the closed loop responsively to one of the compute nodes being designated as a master clock.
  • 18. The method according to claim 16, wherein the compute nodes include at least one of the following: a data processing unit (DPU), graphics processing unit (GPU), switch, network interface controller.
  • 19. The method according to claim 16, further comprising: recovering a remote clock; connecting a clock input port to a clock output port of a third compute node of the plurality of compute nodes via a first one of the clock connections; receiving a clock signal at the master clock frequency from the third compute node; and connecting a clock output port to a clock input port of a second compute node of the plurality of compute nodes via a second one of the clock connections.
  • 20. The method according to claim 19, wherein the third compute node and the second compute node are a same one of the compute nodes.
  • 21. The method according to claim 19, further comprising: disciplining a local clock signal to the master clock frequency responsively to: the recovered respective remote clock; or the received clock signal; and outputting the disciplined local clock signal via the clock output port to the second compute node.
  • 22. The method according to claim 19, further comprising: disciplining a local clock signal to the master clock frequency responsively to the recovered respective remote clock; and outputting the disciplined local clock signal via the clock output port to the second compute node.
  • 23. The method according to claim 22, further comprising ignoring the clock signal received by the clock input port.
  • 24. The method according to claim 23, further comprising selectively blocking distribution of the master clock frequency in the closed loop by instructing clock synchronization circuitry to ignore the clock signal received by the clock input port responsively to one of the compute nodes being designated as a master clock.
  • 25. The method according to claim 19, further comprising: disciplining a local clock signal to the master clock frequency responsively to the received clock signal; and outputting the disciplined local clock signal via the clock output port to the second compute node.
  • 26. The method according to claim 16, further comprising distributing the master clock frequency via respective ones of the clock connections using at least one of: a one pulse per second (PPS) signal; or a 10 mega Hertz (10 MHz) signal.
US Referenced Citations (140)
Number Name Date Kind
5392421 Lennartsson Feb 1995 A
5402394 Turski Mar 1995 A
5416808 Witsaman et al. May 1995 A
5491792 Grisham et al. Feb 1996 A
5564285 Jurewicz et al. Oct 1996 A
5592486 Lo et al. Jan 1997 A
5896524 Halstead, Jr. et al. Apr 1999 A
6055246 Jones Apr 2000 A
6084856 Simmons et al. Jul 2000 A
6144714 Bleiweiss et al. Nov 2000 A
6199169 Voth Mar 2001 B1
6289023 Dowling et al. Sep 2001 B1
6449291 Burns et al. Sep 2002 B1
6535926 Esker Mar 2003 B1
6556638 Blackburn Apr 2003 B1
6718476 Shima Apr 2004 B1
6918049 Lamb et al. Jul 2005 B2
7111184 Thomas, Jr. et al. Sep 2006 B2
7191354 Purho Mar 2007 B2
7245627 Goldenberg et al. Jul 2007 B2
7254646 Aguilera et al. Aug 2007 B2
7334124 Pham et al. Feb 2008 B2
7412475 Govindarajalu Aug 2008 B1
7440474 Goldman et al. Oct 2008 B1
7447975 Riley Nov 2008 B2
7483448 Bhandari et al. Jan 2009 B2
7496686 Coyle Feb 2009 B2
7535933 Zerbe et al. May 2009 B2
7623552 Jordan et al. Nov 2009 B2
7636767 Lev-Ran et al. Dec 2009 B2
7650158 Indirabhai Jan 2010 B2
7656751 Rischar et al. Feb 2010 B2
7750685 Bunch et al. Jul 2010 B1
7904713 Zajkowski et al. Mar 2011 B1
7941684 Serebrin et al. May 2011 B2
3065052 Fredriksson et al. Nov 2011 A1
8341454 Kondapalli Dec 2012 B1
8370675 Kagan Feb 2013 B2
8407478 Kagan et al. Mar 2013 B2
8607086 Cullimore Dec 2013 B2
8699406 Charles et al. Apr 2014 B1
8879552 Zheng Nov 2014 B2
8930647 Smith Jan 2015 B1
9344265 Karnes May 2016 B2
9397960 Arad et al. Jul 2016 B2
9549234 Mascitto Jan 2017 B1
9979998 Pogue et al. May 2018 B1
10014937 Di Mola Jul 2018 B1
10027601 Narkis et al. Jul 2018 B2
10054977 Mikhaylov et al. Aug 2018 B2
10164759 Volpe Dec 2018 B1
10320646 Mirsky et al. Jun 2019 B2
10637776 Iwasaki Apr 2020 B2
10727966 Izenberg et al. Jul 2020 B1
11070304 Levi et al. Jul 2021 B1
20010006500 Nakajima et al. Jul 2001 A1
20020027886 Fischer et al. Mar 2002 A1
20020031199 Rolston Mar 2002 A1
20040096013 Laturell et al. May 2004 A1
20040153907 Gibart Aug 2004 A1
20050033947 Morris et al. Feb 2005 A1
20050268183 Barmettler Dec 2005 A1
20060109376 Chaffee et al. May 2006 A1
20070008044 Shimamoto Jan 2007 A1
20070072451 Tazawa et al. Mar 2007 A1
20070104098 Kimura et al. May 2007 A1
20070124415 Lev-Ran et al. May 2007 A1
20070139085 Elliot et al. Jun 2007 A1
20070159924 Vook et al. Jul 2007 A1
20070266119 Ohly Nov 2007 A1
20080069150 Badt et al. Mar 2008 A1
20080285597 Downey et al. Nov 2008 A1
20090257458 Cui et al. Oct 2009 A1
20100280858 Bugenhagen Nov 2010 A1
20110182191 Jackson Jul 2011 A1
20110194425 Li et al. Aug 2011 A1
20120076319 Terwal Mar 2012 A1
20120301134 Davari et al. Nov 2012 A1
20130045014 Mottahedin et al. Feb 2013 A1
20130215889 Zheng et al. Aug 2013 A1
20130235889 Aweya et al. Sep 2013 A1
20130294144 Wang et al. Nov 2013 A1
20130315265 Webb, III et al. Nov 2013 A1
20130336435 Akkihal et al. Dec 2013 A1
20140153680 Garg et al. Jun 2014 A1
20140185216 Zeng et al. Jul 2014 A1
20140185632 Steiner et al. Jul 2014 A1
20140253387 Gunn et al. Sep 2014 A1
20140281036 Cutler et al. Sep 2014 A1
20140301221 Nadeau et al. Oct 2014 A1
20140321285 Chew et al. Oct 2014 A1
20150019839 Cardinelli et al. Jan 2015 A1
20150078405 Roberts Mar 2015 A1
20150127978 Cui et al. May 2015 A1
20150163050 Han et al. Jun 2015 A1
20150318941 Zheng et al. Nov 2015 A1
20160072602 Earl et al. Mar 2016 A1
20160110211 Karnes Apr 2016 A1
20160140066 Worrell et al. May 2016 A1
20160277138 Garg et al. Sep 2016 A1
20160315756 Tenea et al. Oct 2016 A1
20170005903 Mirsky Jan 2017 A1
20170126589 Estabrooks et al. May 2017 A1
20170160933 De Jong et al. Jun 2017 A1
20170214516 Rivaud et al. Jul 2017 A1
20170302392 Farra et al. Oct 2017 A1
20170331926 Raveh et al. Nov 2017 A1
20170359137 Butterworth et al. Dec 2017 A1
20180059167 Sharf et al. Mar 2018 A1
20180152286 Kemparaj et al. May 2018 A1
20180188698 Dionne et al. Jul 2018 A1
20180191802 Yang et al. Jul 2018 A1
20180227067 Hu et al. Aug 2018 A1
20180309654 Achkir et al. Oct 2018 A1
20190007189 Hossain et al. Jan 2019 A1
20190014526 Bader et al. Jan 2019 A1
20190089615 Branscomb et al. Mar 2019 A1
20190149258 Araki et al. May 2019 A1
20190158909 Kulkarni et al. May 2019 A1
20190273571 Bordogna et al. Sep 2019 A1
20190319729 Leong et al. Oct 2019 A1
20190349392 Wetterwald et al. Nov 2019 A1
20190379714 Levi et al. Dec 2019 A1
20200162234 Almog et al. May 2020 A1
20200169379 Gaist et al. May 2020 A1
20200304224 Neugeboren Sep 2020 A1
20200331480 Zhang et al. Oct 2020 A1
20200344333 Hawari et al. Oct 2020 A1
20200396050 Perras et al. Dec 2020 A1
20200401434 Thampi et al. Dec 2020 A1
20210141413 Levi et al. May 2021 A1
20210218431 Narayanan et al. Jul 2021 A1
20210243140 Levi et al. Aug 2021 A1
20210297230 Dror et al. Sep 2021 A1
20210318978 Hsung Oct 2021 A1
20210328900 Sattinger et al. Oct 2021 A1
20210392065 Sela et al. Dec 2021 A1
20220021393 Ravid et al. Jan 2022 A1
20220066978 Mishra et al. Mar 2022 A1
20220239549 Zhao et al. Jul 2022 A1
Foreign Referenced Citations (10)
Number Date Country
106817183 Jun 2017 CN
108829493 Nov 2018 CN
1215559 Sep 2007 EP
2770678 Aug 2014 EP
2011091676 May 2011 JP
2012007276 Jan 2012 WO
2013124782 Aug 2013 WO
2013143112 Oct 2013 WO
2014029533 Feb 2014 WO
2014138936 Sep 2014 WO
Non-Patent Literature Citations (36)
Entry
Ipclock, “IEEE 1588 Primer,” ip-clock.com, pp. 1-3, May 1, 2017 (downloaded from https://web.archive.org/web/20170501192647/http://ip-clock.com/ieee-1588-primer/).
U.S. Appl. No. 16/900,931 Office Action dated Apr. 28, 2022.
U.S. Appl. No. 16/683,309 Office Action dated Mar. 17, 2022.
U.S. Appl. No. 16/779,611 Office Action dated Mar. 17, 2022.
U.S. Appl. No. 17/120,313 Office Action dated Mar. 28, 2022.
U.S. Appl. No. 17/191,736 Office Action dated Apr. 26, 2022.
EP Application # 21214269.9 Search Report dated May 2, 2022.
U.S. Appl. No. 17/148,605 Office Action dated May 17, 2022.
EP Application #22151451.6 Search Report dated Jun. 17, 2022.
U.S. Appl. No. 16/779,611 Office Action dated Jun. 24, 2022.
“Precision Time Protocol,” PTP Clock Types, CISCO, pp. 1-52, Jul. 30, 2020, as downloaded from https://www.cisco.com/c/en/us/td/docs/dcn/aci/apic/5x/system-management-configuration/cisco-apic-system-management-configuration-guide-52x/m-precision-time-protocol.pdf.
U.S. Appl. No. 17/120,313 Office Action dated Aug. 29, 2022.
ITU-T Standard G.8262/Y.1362, “Timing characteristics of synchronous equipment slave clock”, pp. 1-44, Nov. 2018.
ITU-T Standard G.8264/Y.1364, “Distribution of timing information through packet networks”, pp. 1-42, Aug. 2017.
ITU-T Standard G.8261/Y.1361, “Timing and synchronization aspects in packet networks”, pp. 1-120, Aug. 2019.
IEEE Standard 1588™-2008: “IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems”, IEEE Instrumentation and Measurement Society, Revision of IEEE Standard 1588-2002, USA, pp. 1-289, Jul. 24, 2008.
Weibel et al., “Implementation and Performance of Time Stamping Techniques”, 2004 Conference on IEEE 1588, pp. 1-29, Sep. 28, 2004.
Working Draft Project American National Standard T10/1799-D, “Information Technology—SCSI Block Commands—3 (SBC-3)”, pp. 1-220, Revision 19, May 29, 2009.
“Infiniband Architecture: Specification vol. 1”, pp. 1-1727, Release 1.2.1, Infiniband Trade Association, Nov. 2007.
Mellanox Technologies, “Mellanox ConnectX IB: Dual-Port InfiniBand Adapter Cards with PCI Express 2.0”, pp. 1-2, USA, year 2008.
Wikipedia—“Precision Time Protocol”, pp. 1-8, Aug. 24, 2019.
IEEE Std 1588-2002, “IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems”, IEEE Instrumentation and Measurement Society, pp. 1-154, Nov. 8, 2002.
Weibel, H., “High Precision Clock Synchronization according to IEEE 1588 Implementation and Performance Issues”, Zurich University of Applied Sciences, pp. 1-9, Jan. 17, 2005.
Lu et al., “A Fast CRC Update Implementation”, Computer Engineering Laboratory, Electrical Engineering Department, pp. 113-120, Oct. 8, 2003.
Texas Instruments, “LMK05318 Ultra-Low Jitter Network Synchronizer Clock With Two Frequency Domains,” Product Folder, pp. 1-86, Dec. 2018.
Dlugy-Hegwer et al., “Designing and Testing IEEE 1588 Timing Networks”, Symmetricom, pp. 1-10, Jan. 2007.
Mellanox Technologies, “How to test 1PPS on Mellanox Adapters”, pp. 1-6, Oct. 22, 2019 downloaded from https://community.mellanox.eom/s/article/How-To-Test-1PPS-on-Mellanox-Adapters.
ITU-T recommendation, “G.8273.2/Y.1368.2—Timing characteristics of telecom boundary clocks and telecom time slave clocks”, pp. 1-50, Jan. 2017.
Wasko et al., U.S. Appl. No. 17/549,949, filed Dec. 14, 2021.
Levi et al., U.S. Appl. No. 17/120,313, filed Dec. 14, 2020.
Mula et al., U.S. Appl. No. 17/148,605, filed Jan. 14, 2021.
Levy et al., U.S. Appl. No. 17/313,026, filed May 6, 2021.
U.S. Appl. No. 17/191,736 Office Action dated Nov. 10, 2022.
U.S. Appl. No. 17/670,540 Office Action dated Jan. 18, 2023.
U.S. Appl. No. 17/191,736 Advisory Action dated Feb. 16, 2023.
U.S. Appl. No. 17/549,949 Office Action dated Mar. 30, 2023.