Clock queue with arming and/or self-arming features

Description

FIELD OF THE INVENTION

The present invention relates in general to systems and methods for accurate scheduling, including, but not limited to, accurate scheduling of packet transmission and related technologies, and specifically but not exclusively to such systems and methods in the context of a clock queue.

BACKGROUND OF THE INVENTION

Various systems and methods intended to allow accurate scheduling of packet transmission are known. Some examples are described in the following pending U.S. patent application: U.S. patent application Ser. No. 16/430,457 of Levi et al, published as US Published Patent Application 2019/0379714, the disclosure of which is hereby incorporated herein by reference.

The concept of memory protection, which is described, for example, in en.wikipedia.org/wiki/Memory_protection, may be useful in understanding certain embodiments of the present invention.

SUMMARY OF THE INVENTION

The present invention, in certain exemplary embodiments thereof, seeks to provide improved systems and methods for accurate scheduling of packet transmission and related technologies.

In certain exemplary embodiments, the present invention may be useful in the following scenario:

- Communication Networks such as Enhanced Common Public Radio Interface (eCPRI), Optical Data center Network (ODCN), video over IP (e.g., Society of Motion Picture and Television Engineers (SMPTE) 2110) and others, use Time Division Multiplex (TDM) or, sometimes, Time-Division-Multiple Access (TDMA) for communicating between endpoints, wherein a plurality of data sources share the same physical medium during different time intervals, which are referred to as timeslots.
- eCPRI is described, for example, in eCPRI Specification V2.0 (2019-05-10), by Ericsson AB, Huawei Technologies Co. Ltd, NEC Corporation and Nokia. One relevant implementation of eCPRI is described in the O-RAN specification. Optical datacenter networks are described, for example, in “NEPHELE: an end-to-end scalable and dynamically reconfigurable optical architecture for application-aware SDN cloud datacenters,” IEEE Communications Magazine (Volume: 56, Issue: 2, February 2018. DOI: 10.1109/MCOM.2018.1600804), by Paraskevas Bakopoulos et al.
- TDMA multiplexing in high performance networks requires good synchronization between the end points, which is usually achieved by high precision time bases. Specialized circuitry, such as that described by Xilinx RoE Framer IP documentation (Xilinx PB056 (v2.1) Oct. 30, 2019) may also be used to send and receive data in TDM network; however, such specialized circuitry may be expensive and inflexible.
- Certain exemplary embodiments of the present invention seek to provide network-time dependent network communications using network elements, including inexpensive network adapters such as Network Interface Controllers (NICs) in the context of Ethernet™, or Host Channel Adapters (HCAs) in the context of InfiniBand. While the description below focuses mainly on embodiments suitable for network adapters, the disclosed techniques are not limited to network adapters, and may be used with any suitable network elements, including, for example, switches and routers.

It is appreciated that, in certain exemplary embodiments, the present invention may also be used in scenarios involving one or more of the following: TDM Networking; optical switching; and time sensitive networking.

There is thus provided in accordance with an exemplary embodiment of the present invention a timing system including timing circuitry including an arming queue, a clock work queue, and a clock completion queue, wherein at least the clock work queue is to provide timing information, and the arming queue is to arm the clock work queue.

Further in accordance with an exemplary embodiment of the present invention the clock completion queue is also to provide timing information.

Still further in accordance with an exemplary embodiment of the present invention the clock work queue is for synchronizing a sending time of packets pointed to by entries in a send queue to hold entries pointing to packets to be transmitted, via interaction with the clock completion queue.

Additionally in accordance with an exemplary embodiment of the present invention the send queue is associated with an application running in a host external to the timing system.

Moreover in accordance with an exemplary embodiment of the present invention the send queue includes a plurality of send queues each of which is associated with an application running in a host external to the timing system.

Further in accordance with an exemplary embodiment of the present invention at least one of the clock work queue and the clock completion queue is implemented in firmware.

Still further in accordance with an exemplary embodiment of the present invention the send queue includes a plurality of send queues each of which is associated with an application running in a host external to the timing system, and at least one application is associated with a different protection domain than at least one other application.

Additionally in accordance with an exemplary embodiment of the present invention the timing system also includes packet sending circuitry to transmit one or more packets over a network, wherein the packet sending circuitry is further to transmit the one or more packets in accordance with the sending time of corresponding entries in the send queue.

Moreover in accordance with an exemplary embodiment of the present invention the timing circuitry is included in a network interface card (NIC).

Further in accordance with an exemplary embodiment of the present invention the packet sending circuitry and the timing circuitry are included in a network interface card (NIC).

Still further in accordance with an exemplary embodiment of the present invention the arming queue includes at least a first arming queue and a second arming queue, and the first arming queue is to arm the second arming queue, and the second arming queue is to arm the first arming queue.

There is also provided in accordance with another exemplary embodiment of the present invention a method for packet transmission including performing the following in timing circuitry, the timing circuitry including an arming queue, a clock work queue, and a clock completion queue: the clock work queue providing timing information, and the arming queue arming the clock work queue.

Further in accordance with an exemplary embodiment of the present invention the clock work queue synchronizes a sending time of packets pointed to by entries in a send queue to hold entries pointing to packets to be transmitted, via interaction with the clock completion queue.

Still further in accordance with an exemplary embodiment of the present invention the send queue is associated with an application running in a host external to the timing circuitry.

Additionally in accordance with an exemplary embodiment of the present invention the send queue includes a plurality of send queues each of which is associated with an application running in a host external to the timing circuitry.

Moreover in accordance with an exemplary embodiment of the present invention the method also includes, in packet sending circuitry, transmitting one or more packets over a network, wherein the packet sending circuitry transmits the one or more packets in accordance with the sending time of corresponding entries in the send queue.

Further in accordance with an exemplary embodiment of the present invention the arming queue includes at least a first arming queue and a second arming queue, and the method also includes the first arming queue arming the second arming queue, and the second arming queue arming the first arming queue.

Still further in accordance with an exemplary embodiment of the present invention the timing circuitry is included in a network interface card (NIC).

Additionally in accordance with an exemplary embodiment of the present invention the packet sending circuitry and the timing circuitry are included in a network interface card (NIC).

There is also provided in accordance with another exemplary embodiment of the present invention a timing system including a host system including a clock work queue and a clock completion queue, and timing circuitry in operative communication with the host system and including an arming queue, wherein at least the clock work queue is to provide timing information, and the arming queue is to arm the clock work queue.

Further in accordance with an exemplary embodiment of the present invention the clock completion queue is also to provide timing information.

Additionally in accordance with an exemplary embodiment of the present invention at least one of the clock work queue and the clock completion queue is implemented in firmware.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and appreciated more fully from the following detailed description, taken in conjunction with the drawings in which:

FIG. 1A is a simplified block diagram illustration of a clock queue based system, constructed and operative in accordance with an exemplary embodiment of the present invention;

FIG. 1B is a simplified block diagram illustration of a clock queue based system, comprising an alternative exemplary embodiment of the system of FIG. 1A;

FIG. 2A is a simplified block diagram illustration of a clock queue based system, constructed and operative in accordance with an exemplary embodiment of the present invention;

FIG. 2B is a simplified block diagram illustration of a clock queue based system, comprising an alternative exemplary embodiment of the system of FIG. 2A;

FIG. 3A is a simplified block diagram illustration of a particular example of the clock queue based system of FIG. 2A;

FIG. 3B is a simplified block diagram illustration of a particular example of the clock queue based system of FIG. 2B;

FIG. 4A is a simplified block diagram illustration of a clock queue based system, constructed and operative in accordance with an exemplary embodiment of the present invention;

FIG. 4B is a simplified block diagram illustration of a clock queue based system, comprising an alternative exemplary embodiment of the system of FIG. 4A; and

FIGS. 5-7 are simplified flowchart illustrations of exemplary modes of operation of exemplary embodiments of the present invention.

DETAILED DESCRIPTION OF AN EMBODIMENT

As described in U.S. patent application Ser. No. 16/430,457 of Levi et al, published as US Published Patent Application 2019/0379714, the disclosure of which has been incorporated herein by reference:

- A “send enable” work request (which may comprise a work queue element (WQE), as is known in InfiniBand) is posted to a so-called “master” send queue. The posted WQE has a form/contents which indicates that a WQE from a “different” queue (not from the master send queue) should be executed and sent. In the meantime, in the “different” queue, a slave send queue, WQEs are posted indicating that data should be sent. However, continuing with the present example, in the slave queue no doorbell is executed, so the WQEs in the slave queue are not executed and sent at the time that the WQEs are posted; such doorbell/s are generally sent to a network interface controller (NIC) which has access to the queues and to memory pointed to by WQEs. In the meantime, a hardware pacing mechanism causes doorbells to be generated by the NIC (generally every short and deterministic period of time, such as for example every few nanoseconds); these doorbells are executed in the master queue, causing NOP WQEs (each of which produces a delay as specified above) to be executed; finally, when the “send enable” work request in the master send queue is executed, this causes a doorbell to be issued to the slave queue, and the WQEs therein are then executed, causing data (packets) indicated by the slave queue WQEs to be sent. Thus, the master queue synchronizes the sending of data based on the WQEs in the slave queue.
- The solution described immediately above may create many queues, because there is one master queue per slave queue, and hence one master queue per stream of packets to be sent. An alternative solution may be implemented as follows, with all streams for a given bit rate being synchronized to a master queue for that bit rate:
- For every specific synchronization interval (that is, for every given time desired between doorbells in a slave queue, the doorbells causing, as described above, data packets to be sent) a reference queue (“master” queue) is established, containing a constant number of NOP work requests followed by a send enable work request. In the particular non-limiting example in which a NOP work request has the same transmission time as 8 bits and therefore represents 8 bits of delay (with the same being true for a send enable work request), then:

$\frac{(\text{number of NOP plus send enable work requests}) \times 8\ \text{bits}}{\text{bitrate}}$

should be exactly equal to the synchronization interval (to an accuracy of the transmission time of 8 bits). If higher accuracy is needed, the bitrate for the “master” queue and the number of NOP work requests could be increased in order to increase accuracy.

- After the NOP work requests as described above have been posted, the send enable work request as described above is posted. The send enable work request sends a doorbell to each slave queue, such that each slave queue will send data packets in accordance with the WQEs therein.
- Dedicated software (which could alternatively be implemented in firmware, hardware, etc.) indefinitely continues to post NOP and send enable work requests to the “master” queue, so that the process continues with subsequent synchronization intervals; it being appreciated that if no more data packets are to be sent, the dedicated software may cease to post NOP and send enable work requests in the “master” queue (which ceasing may be based on user intervention).
- From the above description it will be appreciated that the software overhead in this alternative solution is per synchronization period, not per transmitted queue, nor per bitrate.
- With reference to the above-described embodiments, alternatively the doorbell sent to the slave queue or queues may be sent when a completion queue entry (CQE) is posted to a completion queue, after processing of a send enable WQE.
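The synchronization-interval relation stated in the list above can be checked with a short calculation. The following is a minimal sketch only; the function names and the example figures (a 1 microsecond interval on a 10 Gb/s reference queue) are hypothetical and are not taken from the incorporated application. The choice to reject intervals that are not a whole number of 8-bit delays is likewise an assumption made for the sketch.

```python
from fractions import Fraction

# From the example in the text: each NOP or send enable work request
# represents 8 bits of delay.
BITS_PER_WORK_REQUEST = 8


def work_requests_per_interval(interval_seconds, bitrate_bps):
    """Total number of NOP plus send enable work requests whose combined
    transmission time equals the synchronization interval."""
    total = Fraction(interval_seconds) * bitrate_bps / BITS_PER_WORK_REQUEST
    if total.denominator != 1:
        # Assumption of this sketch: refuse intervals that cannot be met exactly.
        raise ValueError("interval is not a whole number of 8-bit delays")
    return total.numerator


def nop_requests_per_interval(interval_seconds, bitrate_bps):
    """The sequence ends with one send enable request; the rest are NOPs."""
    return work_requests_per_interval(interval_seconds, bitrate_bps) - 1


# Hypothetical figures: 1 microsecond interval, 10 Gb/s reference queue.
EXAMPLE_TOTAL = work_requests_per_interval(Fraction(1, 10**6), 10**10)
EXAMPLE_NOPS = nop_requests_per_interval(Fraction(1, 10**6), 10**10)
```

Under these assumed figures, raising the bitrate of the reference queue raises the number of work requests per interval proportionally, which corresponds to the finer accuracy mentioned above.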

Reference is now made to FIG. 1A, which is a simplified block diagram illustration of a clock queue based system, constructed and operative in accordance with an exemplary embodiment of the present invention. The system of FIG. 1A is generally, but not necessarily, comprised in a network interface card (NIC), it being appreciated that other suitable embodiments (which, in light of the present description, will be evident to persons skilled in the art) are also possible. In the system of FIG. 1A, similarly to what is described immediately above with reference to U.S. patent application Ser. No. 16/430,457 of Levi et al, a system generally designated 100 is illustrated.

The system 100 includes a clock work queue 120, which is a work queue that has been posted with dummy commands (NOP descriptors); these NOP descriptors are used for packet rate enforcement. In some exemplary embodiments, the commands posted to the clock work queue may not necessarily be NOP commands; other commands may also lead to the desired packet rate enforcement behavior. For simplicity of depiction and description and without limiting the generality of the foregoing, NOP commands are generally described herein.

In general, the system 100 will trigger every “clock-tick” time. If, by way of non-limiting example, the clock-tick is 500 nanoseconds, the system 100 will execute 2 million commands per second in the clock work queue 120, in order to maintain the desired pace.

It will be appreciated that it will be necessary to re-post the NOP commands to the clock work queue 120 (typically by software; although alternatively, by way of non-limiting example, by firmware) each time half of the queue (Queue-size/2 commands) has been executed, in order for the system 100, and in particular the clock queue 120, to run indefinitely. The NIC (or other system in which the system 100 is embedded) is configured not to check the NOP index, thereby to allow the software (for example) to write only a single NOP command into the clock work queue 120, and only update the doorbell record to send additional clock-queue-size/2 commands. The preceding is true since (in a typical case) all commands in the clock work queue 120 are the same, so that the clock work queue 120 may hold a single command with index 0, but a HW doorbell register (not shown) is armed to execute 16,000 such commands (by way of non-limiting example). In order to accomplish this, the system 100 is configured not to check the index of commands in the clock work queue 120, so that the system 100 will execute the same NOP command 16,000 times. While so operating, the system 100 will internally increment a “producer index” (pi, producer indexes being well known in the art) which is wrapped around at some maximum index value (such as, by way of non-limiting example, 16,000) and therefore the HW cannot be armed for more than 16,000 commands at once. In practice, actual limits (as opposed to 16,000) are generally an exact power of 2, such as, by way of non-limiting example, 16,384.

As just discussed, a typical size in entries of the clock work queue 120 would be 16K (16,384); at such a size, the software (for example) would need to arm (reload) the clock work queue 120 for every 8K commands that are executed. It will be appreciated that 8K commands at a pace of 2 million commands per second represent an interval of approximately 4 milliseconds. This would mean, in a software implementation, that software will have to “wake up” every 4 milliseconds to re-arm the clock work queue 120. In addition to CPU involvement in running such software, there is an important real-time restriction, since in the described scenario the software must wake up every 4 milliseconds. If the software wakes up too late, the clock queue 120 will become empty (stop ticking), having a very negative impact on the reliability of the system 100.
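The pace and re-arm figures in the preceding paragraphs follow from simple arithmetic, which the sketch below reproduces. The function names are hypothetical and are used for illustration only; exact fractions are used so that the results are not affected by floating-point rounding.

```python
from fractions import Fraction

NS_PER_SECOND = 10**9
NS_PER_MILLISECOND = 10**6


def commands_per_second(clock_tick_ns):
    """One command in the clock work queue is executed per clock tick."""
    return Fraction(NS_PER_SECOND, clock_tick_ns)


def rearm_interval_ms(queue_size, clock_tick_ns):
    """Time taken to execute half of the clock work queue, which is how
    often the queue must be re-armed to keep it from running empty."""
    half_queue = queue_size // 2
    return Fraction(half_queue * clock_tick_ns, NS_PER_MILLISECOND)


PACE = commands_per_second(500)                  # 500 ns clock tick
REARM_16K = rearm_interval_ms(16_384, 500)       # power-of-two queue size
REARM_16000 = rearm_interval_ms(16_000, 500)     # round-number queue size
```

With a 500 nanosecond tick the pace is 2 million commands per second; half of a 16,384-entry queue lasts 4.096 milliseconds, and half of a 16,000-entry queue lasts exactly 4 milliseconds.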

In addition, the inventors of the present invention believe that real-time requirements on software are extremely problematic, since such requirements are not functional requirements, and (to ensure reliability) should be tested against any contemplated actual system, in any load that the contemplated actual system is intended to run. Such a requirement is believed to be extremely problematic and challenging, and would add a significant cost to the system 100.

The system of FIG. 1A also comprises a clock completion queue 130, which contains an entry for the completion of each NOP command execution. Each such completion is generated every clock tick.

It is appreciated that one or both of the clock work queue 120 and the clock completion queue 130 may alternatively be situated in a host external to the system 100 and in operative communication therewith. It is also appreciated that each of the clock work queue 120 and the clock completion queue 130 may be implemented either in software or in firmware.

The system of FIG. 1A also comprises (optionally, in certain exemplary embodiments) a work queue (send queue) 110, which contains descriptors to data which needs to be accurately transmitted “to the wire” (to exit the system, such as a NIC, in which the system 100 of FIG. 1A is comprised, for network transmission) at a specific network time. The work queue 110 may, in certain exemplary embodiments, serve a particular application running on a host with which the system 100 is in operative communication.

It is appreciated that, while not shown in any of FIGS. 1A-4B, the systems of FIGS. 1A-4B each generally (but optionally) comprise a packet sending module or circuitry, as is known in the art, for sending packets “to the wire”. More precisely, the specific time as described may be considered to be “do not transmit before a specific time”. The “fencing” (accurate scheduling) of transmission is done by a special command that fences the execution until a specific index of a completion message is generated.
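The “do not transmit before a specific time” behavior can be illustrated with a toy model. This is a sketch under stated assumptions, not the fencing command itself: it assumes that completion number k in the clock completion queue is generated at time k × clock tick, that the send queue is processed strictly in order, and that each send queue entry carries the completion index it is fenced on. All names are hypothetical.

```python
from collections import deque


def fence_index_for_time(not_before_ns, clock_tick_ns):
    """Index of the first clock completion generated at or after the
    requested time (ceiling division), assuming completion k occurs at
    k * clock_tick_ns."""
    return -(-not_before_ns // clock_tick_ns)


def release_ready(send_queue, completions_generated):
    """Pop and return, in order, the packets at the head of the send queue
    whose fence index has been reached. Entries are (fence_index, packet)."""
    released = []
    while send_queue and send_queue[0][0] <= completions_generated:
        released.append(send_queue.popleft()[1])
    return released


# Hypothetical send queue: two packets fenced on completion 2, one on 5.
example_queue = deque([(2, "pkt-a"), (2, "pkt-b"), (5, "pkt-c")])
```

In this model a packet fenced on index 3 with a 500 nanosecond tick is never released before 1,500 nanoseconds, but may be released later if earlier entries in the same queue are still fenced.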

The inventors of the present invention believe that the system described in U.S. patent application Ser. No. 16/430,457 of Levi et al, which has been incorporated herein by reference (and similarly the system of FIG. 1A described immediately above) has certain drawbacks which are intended to be overcome in certain exemplary embodiments of the present invention. In particular (referring to FIG. 1A by way of non-limiting example), it would be necessary to repost packets to the clock queue 120 and to arm the doorbell record. (It is noted that the “doorbell record” referred to here is well known in the art, and is not shown in the drawings; it is a static entity implemented for example as a set of registers for each queue, holding for that queue a consumer index ci and a producer index pi. The consumer index ci indicates how many jobs (tasks) have been completed, while the producer index pi indicates how many jobs have been published for execution. When ci=pi there is no more work to do at the present time).
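The doorbell record, together with the index wrap-around discussed earlier for the clock work queue, can be modeled as a pair of wrapping counters. The class below is a minimal sketch with hypothetical names; in particular, limiting the outstanding work to one less than the wrap value (so that ci=pi always means “no work”) is a modeling assumption and is not stated in the text.

```python
class DoorbellRecord:
    """Toy model of a per-queue doorbell record holding a producer index
    (pi) and a consumer index (ci), both wrapping at max_index."""

    def __init__(self, max_index=16_384):
        self.max_index = max_index
        self.pi = 0  # how many jobs have been published (modulo max_index)
        self.ci = 0  # how many jobs have been completed (modulo max_index)

    def pending(self):
        """Jobs published but not yet completed; 0 means ci == pi."""
        return (self.pi - self.ci) % self.max_index

    def arm(self, count):
        """Publish 'count' more jobs. Because the indexes wrap, the queue
        cannot be armed beyond the wrap range in one go."""
        if count < 0 or self.pending() + count >= self.max_index:
            raise ValueError("cannot arm beyond the index wrap-around range")
        self.pi = (self.pi + count) % self.max_index

    def complete_one(self):
        """Complete one job; returns False when there is no work to do."""
        if self.pending() == 0:
            return False
        self.ci = (self.ci + 1) % self.max_index
        return True
```

In the clock queue case every job is the same NOP command, so the model needs no storage for the commands themselves; only the two indexes change.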

Generally speaking, such operations would take place under software control and would consume significant resources. Moreover, such operations would need to be “real time” in software terms, so that each queue of a given pace would need to be armed in accordance with a different real time pace. In a realistic scenario, a system would handle (by way of non-limiting example) 2 million packets per second. The inventors of the present invention further believe that, in a software-implemented system, changes (such as changes in clock rate/pace or addition of a clock at a new rate/pace) would cause a need to retest software due to the new burdens placed on the software. The present invention, in exemplary embodiments thereof, is intended to overcome such limitations, particularly, but not exclusively, by being designed to minimize or even eliminate software resources in reposting and arming as described above (in particular, with respect to the above discussion of real-time requirements in software).

Reference is now additionally made to FIG. 1B, which is a simplified block diagram illustration of a clock queue based system, comprising an alternative exemplary embodiment of the system of FIG. 1A. The system of FIG. 1B is similar to the system of FIG. 1A, except that for a subsystem 135 (comprising the clock work queue 120 and the clock completion queue 130) there may be a plurality of work queues 110 each of which, in certain exemplary embodiments, may serve a particular application running on a host with which the system 100 is in operative communication, such that a plurality of applications may be served by the subsystem 135.

The concept of memory protection (which is described, for example, in en.wikipedia.org/wiki/Memory_protection) may be useful in understanding the exemplary embodiment of FIG. 1B, as well as certain other exemplary embodiments of the present invention. In general, any given application will be associated with a particular protection domain; and different applications may be associated with different protection domains. In some cases, if the subsystem 135 is implemented in software, then in order to access the subsystem 135, a given application would generally need to be in the same protection domain as the subsystem 135. On the other hand, if the subsystem 135 is implemented in firmware, then the subsystem 135 will generally be in a trusted zone, and hence access between the subsystem 135 and any given application will be possible regardless of the particular protection domain with which the given application is associated. The previous explanation regarding a plurality of applications also applies, mutatis mutandis, to a plurality of virtual environments, such as virtual machines.
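The access rule just described can be summarized as a small predicate. This is an illustrative sketch only: the function name, the string labels “software” and “firmware”, and the representation of a protection domain as a plain identifier are all assumptions made here, and the rule reflects only the general case described above.

```python
def application_can_access(subsystem_implementation, subsystem_domain, application_domain):
    """General-case rule from the text: a firmware subsystem sits in a
    trusted zone and is reachable from any protection domain, whereas a
    software subsystem is reachable only from its own protection domain."""
    if subsystem_implementation == "firmware":
        return True
    if subsystem_implementation == "software":
        return subsystem_domain == application_domain
    raise ValueError("unknown implementation: " + repr(subsystem_implementation))
```

Under this rule a firmware-implemented subsystem can serve several applications (or virtual machines) in different protection domains, while a software-implemented one cannot.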

Reference is now made to FIG. 2A, which is a simplified block diagram illustration of a clock queue based system, constructed and operative in accordance with an exemplary embodiment of the present invention.

The system of FIG. 2A, generally designated 200, is similar to the system of FIG. 1A except as described below; the system of FIG. 2A comprises a send queue 210 (which is optional in certain exemplary embodiments) similar to the send queue 110 of FIG. 1A, a clock work queue 220 similar to the clock work queue 120 of FIG. 1A, and a clock completion queue 230 similar to the clock completion queue 130 of FIG. 1A.

Additionally, the system 200 of FIG. 2A comprises an arming queue 205. The arming queue 205 is constructed and operative to arm the clock work queue 220, thus simplifying the process described above for reposting and arming.

The arming queue 205 is posted with 2 different commands one after the other, repeatedly. One such command is a “wait” command. By way of non-limiting example, the wait command may be an instruction to wait for the next index which is 8000 greater than a current index in the clock work queue 220. Typically, at the 500 nanosecond clock tick of the example above, this would represent a 4 millisecond wait period. The other command is a “send_enable” command, which is a command to trigger a further 8000 doorbell records in the clock work queue 220. The action here described replaces the action described above as taking place in software; it will be appreciated that this action does not require software intervention.

As here described (by way of non-limiting example), the arming queue 205 need be triggered every 8000×4 milliseconds=32 sec; it is appreciated that such a requirement (triggering once every 32 seconds by software) represents negligible overhead. In other words, if software is required to carry out such a task only once every 32 seconds, there is no real-time requirement on the software, and the problems stated above with regard to real-time requirements on software are deemed to be overcome.
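The interaction between the arming queue and the clock work queue can be illustrated with a small simulation. This is a toy model under stated assumptions, not the disclosed hardware: wait entries are modeled as absolute completion counts, the clock work queue is modeled as two plain counters (pi and ci, without wrap-around), the arming queue is assumed to hold 8000 wait/send_enable pairs in the 32-second calculation, and all names are hypothetical.

```python
from collections import deque


def build_arming_entries(pairs, batch):
    """Alternating entries: wait until 'batch' more clock commands have
    completed, then send_enable a further 'batch' clock commands."""
    entries = deque()
    for k in range(1, pairs + 1):
        entries.append(("wait", k * batch))
        entries.append(("send_enable", batch))
    return entries


def simulate_arming(pairs, batch, initially_armed):
    """Run clock ticks until the arming queue is drained.
    Returns (ci, pi, stalled), where stalled means the clock ran empty."""
    arming = build_arming_entries(pairs, batch)
    pi, ci = initially_armed, 0
    while arming:
        if ci == pi:
            return ci, pi, True      # clock work queue stopped ticking
        ci += 1                      # one clock tick: one NOP completes
        while arming:
            kind, value = arming[0]
            if kind == "wait":
                if ci < value:
                    break            # wait condition not yet met
                arming.popleft()
            else:
                pi += value          # send_enable: arm more clock commands
                arming.popleft()
    return ci, pi, False


def software_repost_interval_ns(pairs, batch, clock_tick_ns):
    """How long one fully posted arming queue keeps the clock running."""
    return pairs * batch * clock_tick_ns
```

With 8000 pairs, a batch of 8000 commands and a 500 nanosecond tick, `software_repost_interval_ns` returns 32,000,000,000 nanoseconds, matching the 32 second figure above.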

Reference is now additionally made to FIG. 2B, which is a simplified block diagram illustration of a clock queue based system, comprising an alternative exemplary embodiment of the system of FIG. 2A. The system of FIG. 2B is similar to the system of FIG. 2A, except that for a subsystem 235 (comprising the clock work queue 220 and the clock completion queue 230) there may be a plurality of work queues 210 each of which, in certain exemplary embodiments, may serve a particular application running on a host with which the system 200 is in operative communication, such that a plurality of applications may be served by the subsystem 235.

Reference is now additionally made to FIG. 3A, which is a simplified block diagram illustration of a particular example of the clock queue based system of FIG. 2A; and to FIG. 3B, which is a simplified block diagram illustration of a particular example of the clock queue based system of FIG. 2B. The examples of FIGS. 3A and 3B show in detail exemplary queue entries in the arming queue 305, with alternating send_enable entries (each entry indicating an index 8000 greater than the previous entry) and wait entries (each indicating a wait for a next index which is 8000 greater than the previous).

Reference is now made to FIG. 4A, which is a simplified block diagram illustration of a clock queue based system, constructed and operative in accordance with an exemplary embodiment of the present invention.

The system of FIG. 4A, generally designated 400, is similar to the system of FIG. 2A except as described below; the system of FIG. 4A comprises a send queue 410 (which may in certain exemplary embodiments be optional) similar to the send queue 210 of FIG. 2A, a clock work queue 420 similar to the clock work queue 220 of FIG. 2A, and a clock completion queue 430 similar to the clock completion queue 230 of FIG. 2A.

In the system of FIG. 4A, compared to the system of FIG. 2A, the arming queue 205 of FIG. 2A has been replaced with an even arming queue 405 and an odd arming queue 407.

In the system 400, with two arming queues (the even arming queue 405 and the odd arming queue 407), each of the two arming queues contains wait and send_enable entries as described above with reference to FIG. 2A. In addition, at the end of each arming queue (the even arming queue 405 and the odd arming queue 407) there is an additional send_enable command which is operative to arm the “other” arming queue (the even arming queue 405 arms the odd arming queue 407 and vice versa). Thus, the system 400 is “self-arming” indefinitely, so that no software operation is needed.
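The self-arming arrangement can be illustrated by extending a toy clock model with two arming queues that hand off to each other. The following is a sketch under stated assumptions: wait entries are modeled as relative waits (a number of further clock completions), so that each arming queue can be replayed from its start when armed by the other; the clock work queue is modeled as two unbounded counters; and all names are hypothetical.

```python
def build_self_arming_queue(pairs, batch, other):
    """wait/send_enable pairs for the clock work queue, followed by a final
    send_enable that arms the other arming queue."""
    entries = []
    for _ in range(pairs):
        entries.append(("wait", batch))
        entries.append(("send_enable", "clock", batch))
    entries.append(("send_enable", other, None))
    return entries


def run_self_arming(ticks, pairs=2, batch=8):
    """Run 'ticks' clock ticks with no software intervention.
    Returns (ci, pi, handoffs between the even and odd arming queues)."""
    if pairs < 1 or batch < 1:
        raise ValueError("pairs and batch must be positive")
    queues = {
        "even": build_self_arming_queue(pairs, batch, "odd"),
        "odd": build_self_arming_queue(pairs, batch, "even"),
    }
    active, pos = "even", 0
    pi, ci = 2 * batch, 0     # clock work queue initially armed for two batches
    wait_start = 0            # completion count at which the current wait began
    handoffs = 0
    for _ in range(ticks):
        if ci == pi:
            raise RuntimeError("clock work queue ran empty")
        ci += 1               # one clock tick: one NOP completes
        while True:
            entry = queues[active][pos]
            if entry[0] == "wait":
                if ci - wait_start < entry[1]:
                    break     # keep waiting for more clock completions
                wait_start += entry[1]
                pos += 1
            elif entry[1] == "clock":
                pi += entry[2]            # re-arm the clock work queue
                pos += 1
            else:
                active, pos = entry[1], 0  # arm the other arming queue
                handoffs += 1
    return ci, pi, handoffs
```

In this model the loop can be run for any number of ticks without the clock work queue running empty, since each arming queue re-arms the other before it is exhausted.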

Persons skilled in the art will appreciate that, for simplicity of depiction and description, two arming queues (the even arming queue 405 and the odd arming queue 407) are described; in principle, three or more such queues may be used.

Reference is now additionally made to FIG. 4B, which is a simplified block diagram illustration of a clock queue based system, comprising an alternative exemplary embodiment of the system of FIG. 4A. The system of FIG. 4B is similar to the system of FIG. 4A, except that for a subsystem 435 (comprising the clock work queue 420 and the clock completion queue 430) there may be a plurality of work queues 410 each of which, in certain exemplary embodiments, may serve a particular application running on a host with which the system 400 is in operative communication, such that a plurality of applications may be served by the subsystem 435.

The various components comprised in the systems 100, 200, 300, and 400 and described above may also be termed herein, separately and collectively, “circuitry”.

Reference is now made to FIGS. 5-7, which are simplified flowchart illustrations of exemplary modes of operation of exemplary embodiments of the present invention. FIG. 5 represents an exemplary mode of operation, with FIGS. 6 and 7 representing further steps that may be added to the method of FIG. 5.

FIGS. 5-7 will be best understood with reference to the above discussion of FIGS. 2A-4B.

The method of FIG. 5 comprises the following steps which, as indicated in step 505, are performed in timing circuitry which comprises an arming queue, a clock work queue, and a clock completion queue.

The clock work queue provides timing information (step 510), while the arming queue arms the clock work queue (step 520). In certain embodiments, the clock completion queue may also provide timing information.

In FIG. 6, the clock work queue (additionally to the steps of FIG. 5) synchronizes a sending time of packets. The packets are pointed to by entries in a send queue. The send queue, in turn, is configured to hold entries pointing to packets to be transmitted. The synchronization occurs via interaction with the clock completion queue (step 610).

In FIG. 7 (additionally to the steps of FIG. 5 plus FIG. 6), packet sending circuitry transmits one or more packets over a network. The packet sending circuitry transmits the one or more packets in accordance with the sending time of corresponding entries in the send queue (step 710).

It is appreciated that software components of the present invention may, if desired, be implemented in ROM (read only memory) form. The software components may, generally, be implemented in hardware, if desired, using conventional techniques. It is further appreciated that the software components may be instantiated, for example: as a computer program product or on a tangible medium. In some cases, it may be possible to instantiate the software components as a signal interpretable by an appropriate computer, although such an instantiation may be excluded in certain embodiments of the present invention.

It is appreciated that various features of the invention which are, for clarity, described in the contexts of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable subcombination.

It will be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. Rather the scope of the invention is defined by the appended claims and equivalents thereof:

Claims

1. A timing system comprising: an arming queue; a clock work queue; and a clock completion queue, the arming queue, the clock work queue, and the clock completion queue being comprised in a hardware-instantiated network interface card (NIC), wherein at least the clock work queue is to provide timing information, and the arming queue comprises at least one “wait” work request and at least one “send enable” work request, and is to arm the clock work queue at least by sending a “send enable” work request to the clock work queue.
2. The timing system according to claim 1 and wherein the clock completion queue is also to provide timing information.
3. The timing system according to claim 1 and wherein the clock work queue is for synchronizing a sending time of packets pointed to by entries in a send queue to hold entries pointing to packets to be transmitted, via interaction with the clock completion queue.
4. The timing system according to claim 3 and wherein the send queue is associated with an application running in a host external to the timing system.
5. The timing system according to claim 3 and wherein the send queue comprises a plurality of send queues each of which is associated with an application running in a host external to the timing system.
6. The timing system according to claim 3 and wherein at least one of the clock work queue and the clock completion queue is implemented in firmware.
7. The timing system according to claim 6 and wherein the send queue comprises a plurality of send queues each of which is associated with an application running in a host external to the timing system, and at least one said application is associated with a different protection domain than at least one other said application.
8. The timing system according to claim 3 and also comprising packet sending circuitry to transmit one or more packets over a network, wherein the packet sending circuitry is further to transmit said one or more packets in accordance with the sending time of corresponding entries in the send queue.
9. The timing system according to claim 1 and wherein: the arming queue comprises at least a first arming queue and a second arming queue, and the first arming queue is to arm the second arming queue, and the second arming queue is to arm the first arming queue.
10. A method for packet transmission comprising: performing the following in: an arming queue; a clock work queue; and a clock completion queue, the arming queue, the clock work queue, and the clock completion queue being comprised in a hardware-instantiated network interface card (NIC): the clock work queue providing timing information; and the arming queue comprising at least one “wait” work request and at least one “send enable” work request and arming the clock work queue at least by sending a “send enable” work request to the clock work queue.
11. The method according to claim 10 and wherein the clock work queue synchronizes a sending time of packets pointed to by entries in a send queue to hold entries pointing to packets to be transmitted, via interaction with the clock completion queue.
12. The method according to claim 11 and wherein the send queue is associated with an application running in a host external to the timing circuitry.
13. The method according to claim 11 and wherein the send queue comprises a plurality of send queues each of which is associated with an application running in a host external to the timing circuitry.
14. The method according to claim 11 and also comprising, in packet sending circuitry, transmitting one or more packets over a network, wherein the packet sending circuitry transmits said one or more packets in accordance with the sending time of corresponding entries in the send queue.
15. The method according to claim 10 and wherein the arming queue comprises at least a first arming queue and a second arming queue, and the method also comprises: the first arming queue arming the second arming queue; and the second arming queue arming the first arming queue.
16. A timing system comprising: a host system comprising a clock work queue and a clock completion queue; and timing circuitry in operative communication with the host system and comprising an arming queue, wherein at least the clock work queue is to provide timing information, and the arming queue comprises at least one “wait” work request and at least one “send enable” work request, and is to arm the clock work queue at least by sending a “send enable” work request to the clock work queue, wherein the arming queue, the clock work queue, and the clock completion queue are instantiated in hardware.
17. The timing system according to claim 16 and wherein the clock completion queue is also to provide timing information.
18. The timing system according to claim 16 and wherein the clock work queue is for synchronizing a sending time of packets pointed to by entries in a send queue to hold entries pointing to packets to be transmitted, via interaction with the clock completion queue.
19. The timing system according to claim 16 and wherein the send queue comprises a plurality of send queues each of which is associated with an application running in the host, and at least one said application is associated with a different protection domain than at least one other said application.

RELATED APPLICATION INFORMATION

The present application claims priority from U.S. Provisional Patent Application Ser. No. 63/047,275 of Ariel Shahar et al, filed 2 Jul. 2020, the disclosure of which is hereby incorporated herein by reference.

US Referenced Citations (286)

Number	Name	Date	Kind
4933969	Marshall et al.	Jun 1990	A
5068877	Near et al.	Nov 1991	A
5325500	Bell et al.	Jun 1994	A
5353412	Douglas et al.	Oct 1994	A
5404565	Gould et al.	Apr 1995	A
5408469	Opher et al.	Apr 1995	A
5606703	Brady et al.	Feb 1997	A
5944779	Blum	Aug 1999	A
6041049	Brady	Mar 2000	A
6115394	Balachandran et al.	Sep 2000	A
6212197	Christensen et al.	Apr 2001	B1
6370502	Wu et al.	Apr 2002	B1
6438137	Turner et al.	Aug 2002	B1
6483804	Muller et al.	Nov 2002	B1
6507562	Kadansky et al.	Jan 2003	B1
6728862	Wilson	Apr 2004	B1
6816492	Turner et al.	Nov 2004	B1
6857004	Howard et al.	Feb 2005	B1
6937576	Di Benedetto et al.	Aug 2005	B1
7102998	Golestani	Sep 2006	B1
7124180	Ranous	Oct 2006	B1
7164422	Wholey, III et al.	Jan 2007	B1
7171484	Krause et al.	Jan 2007	B1
7224669	Kagan et al.	May 2007	B2
7245627	Goldenberg et al.	Jul 2007	B2
7313582	Bhanot et al.	Dec 2007	B2
7327693	Rivers et al.	Feb 2008	B1
7336646	Muller	Feb 2008	B2
7346698	Hannaway	Mar 2008	B2
7555549	Campbell et al.	Jun 2009	B1
7613774	Caronni et al.	Nov 2009	B1
7636424	Halikhedkar et al.	Dec 2009	B1
7636699	Stanfill	Dec 2009	B2
7676597	Kagan et al.	Mar 2010	B2
7738443	Kumar	Jun 2010	B2
7760743	Shokri et al.	Jul 2010	B2
8213315	Crupnicoff et al.	Jul 2012	B2
8255475	Kagan et al.	Aug 2012	B2
8370675	Kagan	Feb 2013	B2
8380880	Gulley et al.	Feb 2013	B2
8510366	Anderson et al.	Aug 2013	B1
8645663	Kagan et al.	Feb 2014	B2
8738891	Karandikar et al.	May 2014	B1
8761189	Shachar et al.	Jun 2014	B2
8768898	Trimmer et al.	Jul 2014	B1
8775698	Archer et al.	Jul 2014	B2
8811417	Bloch et al.	Aug 2014	B2
9110860	Shahar	Aug 2015	B2
9189447	Faraj	Nov 2015	B2
9294551	Froese et al.	Mar 2016	B1
9344490	Bloch et al.	May 2016	B2
9397960	Arad et al.	Jul 2016	B2
9456060	Pope et al.	Sep 2016	B2
9563426	Bent et al.	Feb 2017	B1
9626329	Howard	Apr 2017	B2
9756154	Jiang	Sep 2017	B1
10015106	Florissi et al.	Jul 2018	B1
10027601	Narkis et al.	Jul 2018	B2
10158702	Bloch et al.	Dec 2018	B2
10187400	Castro et al.	Jan 2019	B1
10284383	Bloch et al.	May 2019	B2
10296351	Kohn et al.	May 2019	B1
10305980	Gonzalez et al.	May 2019	B1
10318306	Kohn et al.	Jun 2019	B1
10320508	Shimizu et al.	Jun 2019	B2
10425350	Florissi	Sep 2019	B1
10521283	Shuler et al.	Dec 2019	B2
10528518	Graham et al.	Jan 2020	B2
10541938	Timmerman et al.	Jan 2020	B1
10547553	Shattah et al.	Jan 2020	B2
10621489	Appuswamy et al.	Apr 2020	B2
10727966	Izenberg et al.	Jul 2020	B1
11088971	Brody et al.	Aug 2021	B2
20020010844	Noel et al.	Jan 2002	A1
20020035625	Tanaka	Mar 2002	A1
20020150094	Cheng et al.	Oct 2002	A1
20020150106	Kagan et al.	Oct 2002	A1
20020152315	Kagan et al.	Oct 2002	A1
20020152327	Kagan et al.	Oct 2002	A1
20020152328	Kagan et al.	Oct 2002	A1
20020165897	Kagan et al.	Nov 2002	A1
20030002483	Zwack	Jan 2003	A1
20030018828	Craddock et al.	Jan 2003	A1
20030061417	Craddock et al.	Mar 2003	A1
20030065856	Kagan	Apr 2003	A1
20030120835	Kale et al.	Jun 2003	A1
20040030745	Boucher et al.	Feb 2004	A1
20040062258	Grow et al.	Apr 2004	A1
20040078493	Blumrich et al.	Apr 2004	A1
20040120331	Rhine et al.	Jun 2004	A1
20040123071	Stefan et al.	Jun 2004	A1
20040174820	Ricciulli	Sep 2004	A1
20040252685	Kagan et al.	Dec 2004	A1
20040260683	Chan et al.	Dec 2004	A1
20050097300	Gildea et al.	May 2005	A1
20050122329	Janus	Jun 2005	A1
20050129039	Biran et al.	Jun 2005	A1
20050131865	Jones et al.	Jun 2005	A1
20050223118	Tucker et al.	Oct 2005	A1
20050281287	Ninomi et al.	Dec 2005	A1
20060095610	Arndt	May 2006	A1
20060282838	Gupta et al.	Dec 2006	A1
20070127396	Jain et al.	Jun 2007	A1
20070127525	Sarangam et al.	Jun 2007	A1
20070162236	Lamblin et al.	Jul 2007	A1
20080040792	Larson et al.	Feb 2008	A1
20080104218	Liang et al.	May 2008	A1
20080126564	Wilkinson	May 2008	A1
20080168471	Benner et al.	Jul 2008	A1
20080181260	Vonog et al.	Jul 2008	A1
20080192750	Ko et al.	Aug 2008	A1
20080219159	Chateau et al.	Sep 2008	A1
20080244220	Lin et al.	Oct 2008	A1
20080263329	Archer et al.	Oct 2008	A1
20080288949	Bohra et al.	Nov 2008	A1
20080298380	Rittmeyer et al.	Dec 2008	A1
20080307082	Cai et al.	Dec 2008	A1
20090037377	Archer et al.	Feb 2009	A1
20090063816	Arimilli et al.	Mar 2009	A1
20090063817	Arimilli et al.	Mar 2009	A1
20090063891	Arimilli et al.	Mar 2009	A1
20090182814	Tapolcai et al.	Jul 2009	A1
20090240838	Berg et al.	Sep 2009	A1
20090247241	Gollnick et al.	Oct 2009	A1
20090292905	Faraj	Nov 2009	A1
20090296699	Hefty	Dec 2009	A1
20090327444	Archer et al.	Dec 2009	A1
20100017420	Archer et al.	Jan 2010	A1
20100049836	Kramer	Feb 2010	A1
20100074098	Zeng et al.	Mar 2010	A1
20100095086	Eichenberger et al.	Apr 2010	A1
20100185719	Howard	Jul 2010	A1
20100241828	Yu et al.	Sep 2010	A1
20100274876	Kagan et al.	Oct 2010	A1
20100329275	Johnsen et al.	Dec 2010	A1
20110060891	Jia	Mar 2011	A1
20110066649	Berlyant et al.	Mar 2011	A1
20110093258	Xu et al.	Apr 2011	A1
20110119673	Bloch et al.	May 2011	A1
20110173413	Chen et al.	Jul 2011	A1
20110219208	Asaad	Sep 2011	A1
20110238956	Arimilli et al.	Sep 2011	A1
20110258245	Blocksome et al.	Oct 2011	A1
20110276789	Chambers et al.	Nov 2011	A1
20120063436	Thubert et al.	Mar 2012	A1
20120117331	Krause et al.	May 2012	A1
20120131309	Johnson	May 2012	A1
20120254110	Takemoto	Oct 2012	A1
20130117548	Grover et al.	May 2013	A1
20130159410	Lee et al.	Jun 2013	A1
20130159568	Shahar et al.	Jun 2013	A1
20130215904	Zhou et al.	Aug 2013	A1
20130250756	Johri et al.	Sep 2013	A1
20130312011	Kumar et al.	Nov 2013	A1
20130318525	Palanisamy et al.	Nov 2013	A1
20130336292	Kore et al.	Dec 2013	A1
20140019574	Cardona et al.	Jan 2014	A1
20140033217	Vajda et al.	Jan 2014	A1
20140040542	Kim et al.	Feb 2014	A1
20140047341	Breternitz et al.	Feb 2014	A1
20140095779	Forsyth et al.	Apr 2014	A1
20140122831	Uliel et al.	May 2014	A1
20140136811	Fleischer et al.	May 2014	A1
20140189308	Hughes et al.	Jul 2014	A1
20140211804	Makikeni et al.	Jul 2014	A1
20140258438	Ayoub	Sep 2014	A1
20140280420	Khan	Sep 2014	A1
20140281370	Khan	Sep 2014	A1
20140362692	Wu et al.	Dec 2014	A1
20140365548	Mortensen	Dec 2014	A1
20140379714	Hankins	Dec 2014	A1
20150046741	Yen et al.	Feb 2015	A1
20150055508	Ashida et al.	Feb 2015	A1
20150074373	Sperber et al.	Mar 2015	A1
20150106578	Warfield et al.	Apr 2015	A1
20150143076	Khan	May 2015	A1
20150143077	Khan	May 2015	A1
20150143078	Khan et al.	May 2015	A1
20150143079	Khan	May 2015	A1
20150143085	Khan	May 2015	A1
20150143086	Khan	May 2015	A1
20150154058	Miwa et al.	Jun 2015	A1
20150178211	Hiramoto et al.	Jun 2015	A1
20150180785	Annamraju	Jun 2015	A1
20150188987	Reed et al.	Jul 2015	A1
20150193271	Archer et al.	Jul 2015	A1
20150212972	Boettcher et al.	Jul 2015	A1
20150261720	Kagan et al.	Sep 2015	A1
20150269116	Raikin et al.	Sep 2015	A1
20150278347	Meyer et al.	Oct 2015	A1
20150318015	Bose et al.	Nov 2015	A1
20150347012	Dewitt et al.	Dec 2015	A1
20150365494	Cardona et al.	Dec 2015	A1
20150379022	Puig et al.	Dec 2015	A1
20160055225	Xu et al.	Feb 2016	A1
20160092362	Barron et al.	Mar 2016	A1
20160105494	Reed et al.	Apr 2016	A1
20160112531	Milton et al.	Apr 2016	A1
20160117277	Raindel et al.	Apr 2016	A1
20160119244	Wang et al.	Apr 2016	A1
20160179537	Kunzman et al.	Jun 2016	A1
20160219009	French	Jul 2016	A1
20160246646	Craciunas et al.	Aug 2016	A1
20160248656	Anand et al.	Aug 2016	A1
20160283422	Crupnicoff et al.	Sep 2016	A1
20160294793	Larson et al.	Oct 2016	A1
20160299872	Vaidyanathan et al.	Oct 2016	A1
20160342568	Burchard et al.	Nov 2016	A1
20160352598	Reinhardt et al.	Dec 2016	A1
20160364350	Sanghi et al.	Dec 2016	A1
20170063613	Bloch et al.	Mar 2017	A1
20170093715	McGhee et al.	Mar 2017	A1
20170116154	Palmer et al.	Apr 2017	A1
20170187496	Shalev et al.	Jun 2017	A1
20170187589	Pope et al.	Jun 2017	A1
20170187629	Shalev et al.	Jun 2017	A1
20170187846	Shalev et al.	Jun 2017	A1
20170192782	Valentine et al.	Jul 2017	A1
20170199844	Burchard et al.	Jul 2017	A1
20170262517	Horowitz et al.	Sep 2017	A1
20170308329	A et al.	Oct 2017	A1
20170331926	Raveh et al.	Nov 2017	A1
20170344589	Kafai et al.	Nov 2017	A1
20180004530	Vorbach	Jan 2018	A1
20180046901	Xie et al.	Feb 2018	A1
20180047099	Bonig et al.	Feb 2018	A1
20180089278	Bhattacharjee et al.	Mar 2018	A1
20180091442	Chen et al.	Mar 2018	A1
20180097721	Matsui et al.	Apr 2018	A1
20180115529	Munger et al.	Apr 2018	A1
20180173673	Daglis et al.	Jun 2018	A1
20180262551	Demeyer et al.	Sep 2018	A1
20180278549	Mula et al.	Sep 2018	A1
20180285316	Thorson et al.	Oct 2018	A1
20180287928	Levi et al.	Oct 2018	A1
20180302324	Kasuya	Oct 2018	A1
20180321912	Li et al.	Nov 2018	A1
20180321938	Boswell et al.	Nov 2018	A1
20180349212	Liu et al.	Dec 2018	A1
20180367465	Levi	Dec 2018	A1
20180375781	Chen et al.	Dec 2018	A1
20190018805	Benisty	Jan 2019	A1
20190026250	Das Sarma et al.	Jan 2019	A1
20190044827	Ganapathi et al.	Feb 2019	A1
20190044875	Murty et al.	Feb 2019	A1
20190044889	Serres et al.	Feb 2019	A1
20190056972	Zhou et al.	Feb 2019	A1
20190065208	Liu et al.	Feb 2019	A1
20190068501	Schneder et al.	Feb 2019	A1
20190102179	Fleming et al.	Apr 2019	A1
20190102338	Tang et al.	Apr 2019	A1
20190102640	Balasubramanian	Apr 2019	A1
20190114533	Ng et al.	Apr 2019	A1
20190121388	Knowles et al.	Apr 2019	A1
20190124524	Gormley	Apr 2019	A1
20190138638	Pal et al.	May 2019	A1
20190141133	Rajan et al.	May 2019	A1
20190147092	Pal et al.	May 2019	A1
20190149486	Bohrer et al.	May 2019	A1
20190149488	Bansal et al.	May 2019	A1
20190171612	Shahar et al.	Jun 2019	A1
20190235866	Das Sarma et al.	Aug 2019	A1
20190278737	Kozomora et al.	Sep 2019	A1
20190303168	Fleming, Jr. et al.	Oct 2019	A1
20190303263	Fleming, Jr. et al.	Oct 2019	A1
20190319730	Webb et al.	Oct 2019	A1
20190324431	Cella et al.	Oct 2019	A1
20190339688	Cella et al.	Nov 2019	A1
20190347099	Eapen et al.	Nov 2019	A1
20190369994	Parandeh Afshar et al.	Dec 2019	A1
20190377580	Vorbach	Dec 2019	A1
20190379714	Levi et al.	Dec 2019	A1
20200005859	Chen et al.	Jan 2020	A1
20200034145	Bainville et al.	Jan 2020	A1
20200057748	Danilak	Feb 2020	A1
20200103894	Cella et al.	Apr 2020	A1
20200106828	Elias et al.	Apr 2020	A1
20200137013	Jin et al.	Apr 2020	A1
20200202246	Lin et al.	Jun 2020	A1
20200265043	Graham et al.	Aug 2020	A1
20200274733	Graham et al.	Aug 2020	A1
20210203621	Ylisirnio et al.	Jul 2021	A1
20210218808	Graham et al.	Jul 2021	A1
20210243140	Levi et al.	Aug 2021	A1
20220078043	Marcovitch	Mar 2022	A1
20220201103	Keppel et al.	Jun 2022	A1

Foreign Referenced Citations (1)

Number	Date	Country
2012216611	Mar 2013	AU

Non-Patent Literature Citations (79)

Entry
U.S. Appl. No. 16/782,118 Office Action dated Jun. 15, 2022.
U.S. Appl. No. 17/147,487 Office Action dated Jun. 30, 2022.
U.S. Appl. No. 16/782,118 Office Action dated Sep. 7, 2022.
IEEE Standard 1588-2008, “IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems”, pp. 1-289, year 2008.
Weibel et al., “Implementation and Performance of Time Stamping Techniques”, 2004 Conference on IEEE 1588, pp. 1-29, Sep. 28, 2004.
InfiniBand™ Architecture Specification, vol. 1, Release 1.2.1, pp. 1-1727, Nov. 2007.
Lu et al., “A Fast CRC Update Implementation”, Computer Engineering Laboratory, Electrical Engineering Department, pp. 113-120, Oct. 8, 2003.
Mellette et al., “Toward Optical Switching in the Data Center”, IEEE 19th International Conference on High Performance Switching and Routing (HPSR), pp. 1-6, Bucharest, Romania, Jun. 18-20, 2018.
Bakopoulos et al., “NEPHELE: an end-to-end scalable and dynamically reconfigurable optical architecture for application-aware SDN cloud datacenters”, IEEE Communications Magazine, vol. 56, issue 2, pp. 1-26, Feb. 2018.
O-RAN Alliance, “O-RAN Fronthaul Working Group; Control, User and Synchronization Plane Specification”, ORAN-WG4.CUS.0-v01.00, Technical Specification, pp. 1-189, year 2019.
Vattikonda et al., “Practical TDMA for Datacenter Ethernet”, EuroSys conference, Bern, Switzerland, pp. 225-238, Apr. 10-13, 2012.
Ericsson AB et al., “Common Public Radio Interface: eCPRI Interface Specification”, V2.0, pp. 1-109, May 10, 2019.
Xilinx Inc., “Radio over Ethernet Framer v2.1”, PB056 (v2.1), pp. 1-9, Oct. 30, 2019.
Weibel, H., “High Precision Clock Synchronization according to IEEE 1588 Implementation and Performance Issues”, Zurich University of Applied Sciences, pp. 1-9, Jan. 17, 2005.
Sanchez-Palencia, J., “[RFC,v3,net-next,00/18] Time based packet transmission”, pp. 1-14, Mar. 7, 2018.
IEEE Std 802.1Qaz™, “IEEE Standard for Local and metropolitan area networks—Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks—Amendment 18: Enhanced Transmission Selection for Bandwidth Sharing Between Traffic Classes”, IEEE Computer Society, pp. 1-110, Sep. 30, 2011.
Crupnicoff et al., “Deploying Quality of Service and Congestion Control in InfiniBand-based Data Center Networks” White Paper, Mellanox Technologies Inc, Rev 1.0, pp. 1-19, Oct. 15, 2005.
Mathey et al., “Scalable Deadlock-Free Deterministic Minimal-Path Routing Engine for InfiniBand-Based Dragonfly Networks”, IEEE Transactions on Parallel and Distributed Systems, vol. 29, No. 1, pp. 183-197, Jan. 2018.
Wikipedia, “Precision Time Protocol”, pp. 1-9, Apr. 20, 2020.
SMPTE Standard, “Professional Media Over Managed IP Networks: Traffic Shaping and Delivery Timing for Video”, The Society of Motion Picture and Television Engineers, pp. 1-17, Nov. 22, 2017.
Wikipedia, “Time-Sensitive Networking”, pp. 1-12, Mar. 5, 2020.
Wikipedia, “Memory Protection,” pp. 1-6, last edited May 23, 2021.
Levi et al., U.S. Appl. No. 16/921,993, filed Jul. 7, 2020.
Mills, “Network Time Protocol (Version 1): Specification and Implementation,” RFC 1059, pp. 2-59, Jul. 1988.
Mills, “Internet Time Synchronization: The Network Time Protocol,” IEEE Transactions on Communication, vol. 39, No. 10, pp. 1482-1493, Oct. 1991.
Mills, “Network Time Protocol (Version 3): Specification, Implementation and Analysis,” RFC 1305, pp. 1-96, Mar. 1992.
Mills, “Network Time Protocol (NTP),” RFC 0958, pp. 2-15, Sep. 1985.
Levi et al., U.S. Appl. No. 17/067,690, filed Oct. 11, 2020.
Mula et al., U.S. Appl. No. 16/910,193, filed Jun. 24, 2020.
U.S. Appl. No. 17/147,487 Office Action dated Nov. 29, 2022.
Mellanox Technologies, “InfiniScale IV: 36-port 40GB/s Infiniband Switch Device”, pp. 1-2, year 2009.
Mellanox Technologies Inc., “Scaling 10Gb/s Clustering at Wire-Speed”, pp. 1-8, year 2006.
IEEE 802.1D Standard “IEEE Standard for Local and Metropolitan Area Networks—Media Access Control (MAC) Bridges”, IEEE Computer Society, pp. 1-281, Jun. 9, 2004.
IEEE 802.1AX Standard “IEEE Standard for Local and Metropolitan Area Networks—Link Aggregation”, IEEE Computer Society, pp. 1-163, Nov. 3, 2008.
Turner et al., “Multirate Clos Networks”, IEEE Communications Magazine, pp. 1-11, Oct. 2003.
Thayer School of Engineering, “An Slightly Edited Local Copy of Elements of Lectures 4 and 5”, Dartmouth College, pp. 1-5, Jan. 15, 1998 http://people.seas.harvard.edu/~jones/cscie129/nu_lectures/lecture11/switching/clos_network/clos_network.html.
“MPI: A Message-Passing Interface Standard,” Message Passing Interface Forum, version 3.1, pp. 1-868, Jun. 4, 2015.
Coti et al., “MPI Applications on Grids: a Topology Aware Approach,” Proceedings of the 15th International European Conference on Parallel and Distributed Computing (EuroPar'09), pp. 1-12, Aug. 2009.
Petrini et al., “The Quadrics Network (QsNet): High-Performance Clustering Technology,” Proceedings of the 9th IEEE Symposium on Hot Interconnects (HotI'01), pp. 1-6, Aug. 2001.
Sancho et al., “Efficient Offloading of Collective Communications in Large-Scale Systems,” Proceedings of the 2007 IEEE International Conference on Cluster Computing, pp. 1-10, Sep. 17-20, 2007.
Nudelman et al., U.S. Appl. No. 17/120,321, filed Dec. 14, 2020.
InfiniBand Architecture Specification, vol. 1, Release 1.2.1, pp. 1-1727, Nov. 2007.
Deming, “Infiniband Architectural Overview”, Storage Developer Conference, pp. 1-70, year 2013.
Fugger et al., “Reconciling fault-tolerant distributed computing and systems-on-chip”, Distributed Computing, vol. 24, Issue 6, pp. 323-355, Jan. 2012.
Wikipedia, “System on a chip”, pp. 1-4, Jul. 6, 2018.
Villavieja et al., “On-chip Distributed Shared Memory”, Computer Architecture Department, pp. 1-10, Feb. 3, 2011.
Ben-Moshe et al., U.S. Appl. No. 16/750,019, filed Jan. 23, 2020.
Bruck et al., “Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems”, IEEE Transactions on Parallel and Distributed Systems, vol. 8, No. 11, pp. 1143-1156, Nov. 1997.
Gainaru et al., “Using InfiniBand Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All”, EuroMPI '16, Edinburgh, United Kingdom, pp. 1-13, year 2016.
Pjesivac-Grbovic et al., “Performance analysis of MPI collective operations”, Cluster Computing, pp. 1-25, 2007.
Bruck et al., “Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems”, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures, pp. 298-309, Aug. 1, 1994.
Chiang et al., “Toward supporting data parallel programming on clusters of symmetric multiprocessors”, Proceedings International Conference on Parallel and Distributed Systems, pp. 607-614, Dec. 14, 1998.
Danalis et al., “PTG: an abstraction for unhindered parallelism”, 2014 Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, pp. 1-10, Nov. 17, 2014.
Cosnard et al., “Symbolic Scheduling of Parameterized Task Graphs on Parallel Machines,” Combinatorial Optimization book series (COOP, vol. 7), pp. 217-243, year 2000.
Jeannot et al., “Automatic Multithreaded Parallel Program Generation for Message Passing Multiprocessors using parameterized Task Graphs”, World Scientific, pp. 1-8, Jul. 23, 2001.
Stone, “An Efficient Parallel Algorithm for the Solution of a Tridiagonal Linear System of Equations,” Journal of the Association for Computing Machinery, vol. 10, No. 1, pp. 27-38, Jan. 1973.
Kogge et al., “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Transactions on Computers, vol. C-22, No. 8, pp. 786-793, Aug. 1973.
Hoefler et al., “Message Progression in Parallel Computing—To Thread or not to Thread?”, 2008 IEEE International Conference on Cluster Computing, pp. 1-10, Tsukuba, Japan, Sep. 29-Oct. 1, 2008.
Wikipedia, “Loop unrolling,” pp. 1-9, last edited Sep. 9, 2020 downloaded from https://en.wikipedia.org/wiki/Loop_unrolling.
Chapman et al., “Introducing OpenSHMEM: SHMEM for the PGAS Community,” Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, pp. 1-4, Oct. 2010.
Priest et al., “You've Got Mail (YGM): Building Missing Asynchronous Communication Primitives”, IEEE International Parallel and Distributed Processing Symposium Workshops, pp. 221-230, year 2019.
Wikipedia, “Nagle's algorithm”, pp. 1-4, Dec. 12, 2019.
Yang et al., “SwitchAgg: A Further Step Toward In-Network Computing,” 2019 IEEE International Conference on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking, pp. 36-45, Dec. 2019.
U.S. Appl. No. 16/430,457 Office Action dated Jul. 9, 2021.
EP Application # 20216972 Search Report dated Jun. 11, 2021.
U.S. Appl. No. 16/782,118 Office Action dated Jun. 3, 2021.
U.S. Appl. No. 16/789,458 Office Action dated Jun. 10, 2021.
U.S. Appl. No. 16/750,019 Office Action dated Jun. 15, 2021.
U.S. Appl. No. 17/495,824 Office Action dated Jan. 27, 2023.
EP Application # 22193564.6 Search Report dated Dec. 20, 2022.
Xu et al., “SLOAVx: Scalable Logarithmic AlltoallV Algorithm for Hierarchical Multicore Systems”, 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pp. 369-376, year 2013.
U.S. Appl. No. 16/782,118 Office Action dated Nov. 8, 2021.
“Message Passing Interface (MPI): History and Evolution,” Virtual Workshop, Cornell University Center for Advanced Computing, NY, USA, pp. 1-2, year 2021, as downloaded from https://cvw.cac.cornell.edu/mpi/history.
Pacheco, “A User's Guide to MPI,” Department of Mathematics, University of San Francisco, CA, USA, pp. 1-51, Mar. 30, 1998.
Wikipedia, “Message Passing Interface,” pp. 1-16, last edited Nov. 7, 2021, as downloaded from https://en.wikipedia.org/wiki/Message_Passing_Interface.
EP Application # 21183290.2 Search Report dated Dec. 8, 2021.
EP Application # 20156490.3 Office Action dated Sep. 27, 2023.
U.S. Appl. No. 17/495,824 Office Action dated Aug. 7, 2023.
U.S. Appl. No. 18/071,692 Office Action dated Sep. 27, 2023.

Related Publications (1)

	Number	Date	Country
	20220006606 A1	Jan 2022	US

Provisional Applications (1)

	Number	Date	Country
	63047275	Jul 2020	US
