Programmable user-defined peripheral-bus device implementation using data-plane accelerator (DPA)

Information

  • Patent Grant
  • Patent Number
    12,007,921
  • Date Filed
    Wednesday, November 2, 2022
  • Date Issued
    Tuesday, June 11, 2024
Abstract
A network adapter includes a network interface, a bus interface, a hardware-implemented data-path and a programmable Data-Plane Accelerator (DPA). The network interface is to communicate with a network. The bus interface is to communicate with an external device over a peripheral bus. The hardware-implemented data-path includes a plurality of packet-processing engines to process data units exchanged between the network and the external device. The DPA is to expose on the peripheral bus a User-Defined Peripheral-bus Device (UDPD), to run user-programmable logic that implements the UDPD, and to process transactions issued from the external device to the UDPD by reusing one or more of the packet-processing engines of the data-path.
Description
FIELD OF THE INVENTION

The present invention relates generally to computing and data communication systems, and particularly to methods and systems for user-defined implementation of peripheral-bus devices.


BACKGROUND OF THE INVENTION

Computing systems often use peripheral buses for communication among processors, memories and peripheral devices. Examples of peripheral buses include Peripheral Component Interconnect express (PCIe), Compute Express Link (CXL) bus, NVLink and NVLink-C2C. Peripheral devices may comprise, for example, network adapters, storage devices, Graphics Processing Units (GPUs) and the like.


SUMMARY OF THE INVENTION

An embodiment that is described herein provides a network adapter including a network interface, a bus interface, a hardware-implemented data-path and a programmable Data-Plane Accelerator (DPA). The network interface is to communicate with a network. The bus interface is to communicate with an external device over a peripheral bus. The hardware-implemented data-path includes a plurality of packet-processing engines to process data units exchanged between the network and the external device. The DPA is to expose on the peripheral bus a User-Defined Peripheral-bus Device (UDPD), to run user-programmable logic that implements the UDPD, and to process transactions issued from the external device to the UDPD by reusing one or more of the packet-processing engines of the data-path.


In various embodiments, the UDPD is one of a network adapter, a storage device, a Graphics Processing Unit (GPU) and a Field Programmable Gate Array (FPGA).


In an embodiment, in processing the data units, the data-path is to communicate over the peripheral bus with a network-adapter driver running on the external device, and, in processing the transactions issued to the UDPD, the DPA is to communicate over the peripheral bus with a UDPD driver running on the external device. In another embodiment, in processing the data units, the packet-processing engines in the data-path are to trigger one another in a pipeline independently of the DPA, and, in processing the transactions issued to the UDPD, the one or more of the packet-processing engines are to be invoked by the DPA.


In yet another embodiment, the data-path includes a hardware-implemented transport engine to perform transport-protocol checks and/or offloads on incoming communication data units and to select receive-queues for the incoming communication data units, and the DPA is to re-use the transport engine to perform transport-protocol checks and/or offloads on incoming UDPD data units associated with the UDPD, and to select receive-queues for the incoming UDPD data units.


In still another embodiment, the data-path includes a hardware-implemented address-translation engine to translate between virtual addresses in a first address space and addresses assigned to the communication data units in a second address space, and the DPA is to re-use the address-translation engine to translate between virtual addresses in a third address space and addresses assigned to UDPD data units associated with the UDPD in a fourth address space.


In a disclosed embodiment, the data-path includes at least one hardware-implemented Direct Memory Access (DMA) engine to scatter data from at least some of the communication data units to memory, and to transfer completion notifications for the communication data units, and the DPA is to re-use the DMA engine to scatter data from UDPD data units, associated with the UDPD, to the memory, and to transfer completion notifications for the UDPD data units.


In an example embodiment, the data-path includes a hardware-implemented message-signaled-interrupt engine to trigger the external device with interrupts upon completions of processing of at least some of the data units, and the DPA is to re-use the message-signaled-interrupt engine to trigger the external device with interrupts upon completions of processing of UDPD data units associated with the UDPD.


In an embodiment, the data-path includes an interrupt-moderation engine to throttle a rate of the interrupts that indicate the completions of the communication data units, and the DPA is to re-use the interrupt-moderation engine to throttle a rate of the interrupts that indicate the completions of the UDPD data units. In another embodiment, the data-path includes a doorbell-aggregation engine to coalesce doorbells relating to the communication data units, and the DPA is to re-use the doorbell-aggregation engine to coalesce doorbells relating to UDPD data units associated with the UDPD.


There is additionally provided, in accordance with an embodiment that is described herein, a method in a network adapter. The method includes communicating with a network, and communicating with an external device over a peripheral bus. Using a hardware-implemented data-path that includes a plurality of packet-processing engines, data units exchanged between the network and the external device are processed. Using a programmable Data-Plane Accelerator (DPA), a User-Defined Peripheral-bus Device (UDPD) is exposed on the peripheral bus, user-programmable logic that implements the UDPD is run, and transactions issued from the external device to the UDPD are processed by reusing one or more of the packet-processing engines of the data-path.


There is further provided, in accordance with an embodiment that is described herein, a network adapter including a network interface, a bus interface, a hardware-implemented data-path and a programmable Data-Plane Accelerator (DPA). The network interface is to communicate with a network. The bus interface is to communicate with an external device over a peripheral bus. The hardware-implemented data-path includes a plurality of packet-processing engines, to process data units exchanged between the network and the external device. The DPA is to run user-programmable logic that implements a User-Defined Peripheral-bus Device (UDPD), including reusing, in implementing the UDPD, one or more of the packet-processing engines of the data-path.


The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a block diagram that schematically illustrates a computing system employing user-defined peripheral-bus device implementation (UDDI), in accordance with an embodiment of the present invention;



FIG. 2 is a flow chart that schematically illustrates a method for processing inbound communication packets and User-Defined Peripheral Device (UDPD) packets in a Network Interface Controller (NIC), in accordance with an embodiment of the present invention; and



FIG. 3 is a flow chart that schematically illustrates a method for processing outbound communication packets and UDPD packets in a NIC, in accordance with an embodiment of the present invention.





DETAILED DESCRIPTION OF EMBODIMENTS
Overview

Embodiments of the present invention that are described herein provide improved methods and systems for user-defined implementation (e.g., software emulation) of peripheral devices in computing systems. In the disclosed embodiments, a network adapter provides users with means for specifying user-defined peripheral devices. This framework is referred to herein as user-defined peripheral-bus device implementation (UDDI).


Peripheral devices that can be specified and implemented using the disclosed techniques include, for example, network adapters (e.g., Network Interface Controllers—NICs), storage devices (e.g., Solid State Drives—SSDs), Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs). UDDI may be performed over various types of peripheral buses, e.g., Peripheral Component Interconnect express (PCIe), Compute Express Link (CXL) bus, NVLink and NVLink-C2C. In the present context, the terms “emulation” of a device and “user-defined implementation” of a device are used interchangeably.


As will be described below, the disclosed network adapter comprises a hardware-implemented data-path that comprises various packet-processing engines used for network communication. In addition, the network adapter comprises a programmable Data-Plane Accelerator (DPA) that runs user-programmable logic implementing the UDPD. The DPA implements (e.g., emulates) the UDPD by reusing one or more of the packet-processing engines of the network adapter's data-path.


Various examples of using the same packet-processing engines for network communication and for UDDI are described herein. Data-path packet-processing engines that can be reused for network communication and for UDDI comprise, for example, transport engines, address translation engines, Direct Memory Access (DMA) engines, message-signaled-interrupt (MSI/MSI-X) engines, interrupt moderation engines, doorbell aggregation engines, as well as various “memory-to-memory” accelerators that perform computations such as compression/decompression, encryption/decryption and hashing.


In some embodiments the network adapter communicates over the peripheral bus with a host or other external device. The network adapter exposes two bus interfaces over the peripheral bus, one interface used for network communication and the other interface (referred to as "UDPD interface" or "UDDI interface") used for UDDI. The host runs two software drivers: a "native NIC driver" (also referred to as "network-adapter driver") for performing network communication, and a UDPD driver for interacting with the UDPD. Both drivers are accessible to user applications running on the host.


The methods and systems described herein give users a high degree of flexibility in specifying peripheral devices. By carrying out the UDDI tasks in the network adapter, the disclosed techniques offload such tasks from the host, and also provide enhanced security and data segregation between different users. By reusing data-path packet-processing engines for both network communication and UDDI, the implementation in the network adapter is simpler and more efficient in terms of cost, size and power consumption ("performance per Watt").


System Description


FIG. 1 is a block diagram that schematically illustrates a computing system 20 employing user-defined peripheral-bus device implementation (UDDI), in accordance with an embodiment of the present invention. System 20 comprises a network adapter, in the present example a Network Interface Controller (NIC) 24, which serves a host 28. Host 28 and NIC 24 communicate with one another over a peripheral bus, in the present example a PCIe bus 34. NIC 24 is connected to a network 32, e.g., an Ethernet or InfiniBand™ network.


Host 28 comprises a host CPU 36 (also referred to as a host processor) and a host memory 40, e.g., a Random-Access Memory (RAM). Host processor 36 runs various user applications (not seen in the figure). The user applications may communicate over network 32 using NIC 24, and/or interact with one or more User-Defined Peripheral Devices (UDPD) implemented on NIC 24. Host processor 36 runs a native NIC driver 44 for providing network-communication services to the user applications, and a UDPD driver 48 for providing UDDI services to the user applications.


The configuration of system 20 seen in FIG. 1 is an example, non-limiting configuration. For example, alternatively to PCIe, the peripheral bus may comprise a CXL bus, an NVLink bus, an NVLink-C2C bus, or any other suitable peripheral bus. Host 28 is regarded herein as an example of an external device that can be served by NIC 24. Additionally or alternatively, an external device may comprise, for example, a peer device (e.g., GPU or FPGA) coupled to bus 34 or to the host. A host may be part of a multi-host configuration, in which NIC 24 serves multiple hosts over separate respective logical buses.


In some embodiments, NIC 24 comprises one or more network ports 52 for communicating over network 32, and a host interface 56 (also referred to as a bus interface) for communicating with host 28 (or other external device) over bus 34. NIC 24 further comprises a hardware-implemented data path 60 and a programmable Data-Plane Accelerator (DPA) 64. Data path 60 comprises a plurality of hardware-implemented packet-processing engines that perform various processing tasks needed for network communication between host 28 and network 32, e.g., for sending and receiving packets. DPA 64 runs, possibly among other tasks, user-programmable logic that implements the UDPD. As will be explained below, DPA 64 implements the UDPD by reusing one or more of the packet-processing engines of data path 60.


In the embodiment of FIG. 1, data path 60 comprises the following packet-processing engines:

    • A transport engine 68—An engine responsible for packet transport reliability and transport protocol implementation.
    • An address translation engine 72. Given a host virtual address (VA) and a memory key (MKEY) that identifies the buffer registration, address translation engine 72 translates the host virtual address into an IO Virtual Address (IOVA). Engine 72 may support one or more translation types, such as, for example:
      • Direct mapping—A mapping that translates VAs into respective IOVAs, within the address space defined by the MKEY.
      • Indirect mapping—A mapping that translates VAs into one or more additional IOVAs or {MKEY, VA} pairs, wherein MKEY may be either direct or indirect (the final step of indirection being a direct-mapped MKEY).
      • Patterned mapping (“strided mapping”)—A mapping that translates VAs into respective one or more IOVAs or {MKEY, VA} pairs in accordance with a periodic pattern of addresses. Each MKEY may be either direct or indirect.
    • One or more DMA engines 76. A given engine 76 is able to perform parallel, asynchronous and variable-size DMA operations (e.g., DMA read and DMA write) in host memory 40. DMA engine 76 typically receives an instruction comprising an opcode (read/write), one or more IOVAs (or one or more {MKEY, VA} pairs that are then translated into IOVAs) and a length, and executes the requested PCIe transactions to carry out the instruction. In case of a write instruction, the request descriptor may also comprise the data to be written ("inline data"). (An illustrative, non-limiting sketch of such descriptors is given following this list.)
    • In some embodiments, DPA 64 or data path 60 may additionally comprise one or more asynchronous DMA engines that are used only for UDDI. An asynchronous DMA engine typically receives instructions from DPA 64 to move data between host memory 40 and the DPA memory (fetch data from the host memory to the DPA memory, or write data from the DPA memory to the host memory), executes the instructions asynchronously without blocking forward progress of the DPA, and reports to the DPA once execution is completed.
    • An MSI-X engine 84. An engine that issues MSIX-type interrupts to host processor 36, and/or interrupts to DPA 64.
    • An interrupt moderation engine 88—An engine that throttles the rate of interrupts issued toward host processor 36 and/or toward DPA 64. Interrupt moderation engine 88 can be configured with a maximum rate of interrupts and/or with a maximum latency permitted in coalescing interrupts.
    • A doorbell aggregation engine 80—An engine that coalesces multiple doorbells that host processor 36 issues to a single queue. This sort of coalescing enables NIC 24 to execute only the last doorbell without pre-emption from other doorbells. Since in some embodiments UDDI queues are cyclic, doorbell aggregation engine 80 can store only the last producer index of the queue.
    • One or more memory-to-memory accelerators 92—Accelerators that accelerate complex computations. A given accelerator 92 typically reads its operands from memory and writes its output back to memory. Computations that may be accelerated include, for example, compression, decompression, encryption, decryption and hash-function evaluation.
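
By way of illustration only, the following listing is a minimal, non-limiting C sketch of the kinds of descriptors and configuration records that the address-translation, DMA, interrupt-moderation and doorbell-aggregation engines described above might consume. All type, structure and field names (e.g., mkey_t, struct xlate_request, struct dma_instr) are hypothetical, chosen merely to mirror the description, and do not correspond to any actual device or driver interface.

/* Hypothetical descriptor layouts mirroring the engines listed above.
 * All names and field widths are illustrative only. */
#include <stdint.h>
#include <stddef.h>

typedef uint32_t mkey_t;            /* identifies a buffer registration (MKEY)   */
typedef uint64_t iova_t;            /* I/O virtual address (IOVA)                */

/* Request to the address-translation engine: {MKEY, VA} -> one or more IOVAs. */
struct xlate_request {
    mkey_t   mkey;                  /* direct, indirect or strided registration  */
    uint64_t va;                    /* host virtual address to translate         */
    size_t   length;                /* length of the region to cover             */
};

/* Instruction to a DMA engine: opcode, target addresses, length, inline data. */
enum dma_opcode { DMA_READ, DMA_WRITE };

struct dma_instr {
    enum dma_opcode opcode;
    iova_t      iova[4];            /* already-translated IOVAs                  */
    unsigned    num_iova;
    size_t      length;
    const void *inline_data;        /* optional payload for write instructions   */
};

/* Configuration of the interrupt-moderation engine (maximum interrupt rate
 * and/or maximum coalescing latency, as described above). */
struct irq_moderation {
    uint32_t max_interrupt_rate;    /* interrupts per second                     */
    uint32_t max_latency_us;        /* maximum latency permitted when coalescing */
};

/* Doorbell-aggregation state for a cyclic queue: only the latest producer
 * index needs to be retained, as noted above. */
struct doorbell_agg {
    uint32_t queue_id;
    uint32_t last_producer_index;   /* overwritten by each coalesced doorbell    */
};

In practice, such descriptors would be filled in either by the hardware pipeline (for communication traffic) or by DPA 64 (for UDDI traffic), which is the reuse described herein.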


The configurations of system 20 and its various components, e.g., NIC 24 and host 28, as depicted in FIG. 1, are example configurations that are chosen purely for the sake of conceptual clarity. Any other suitable configurations can be used in alternative embodiments.


In various embodiments, the disclosed techniques can be used for implementing any suitable peripheral device, e.g., network adapters, storage devices that support various storage protocols, GPUs, FPGAs, etc. User-defined (e.g., emulated) storage devices may support various storage protocols, e.g., Non-Volatile Memory express (NVMe), block-device protocols such as virtio-blk, local or networked file systems, object storage protocols, network storage protocols, etc. Further aspects of UDDI and device emulation are addressed, for example, in U.S. patent application Ser. No. 17/211,928, entitled “Storage Protocol Emulation in a Peripheral Device,” filed Mar. 25, 2021, in U.S. patent application Ser. No. 17/372,466, entitled “Network Adapter with Efficient Storage-Protocol Emulation,” filed Jul. 11, 2021, in U.S. patent application Ser. No. 17/527,197, entitled “Enhanced Storage Protocol Emulation in a Peripheral Device,” filed Nov. 16, 2021, and in India Patent Application 202241052839, entitled “User-Defined Peripheral-Bus Device Implementation,” filed Sep. 15, 2022, which are assigned to the assignee of the present patent application and whose disclosures are incorporated herein by reference.


It is noted that the term "user" may refer to various entities, whether individuals or organizations. For example, in a given system, a user-defined peripheral device may be specified by one "user" but accessed (interfaced with) by a different "user". For example, the user specifying the user-defined peripheral device may be an infrastructure owner, whereas the user using the user-defined peripheral device may be a consumer. In a cloud environment, for example, the former user would be a Cloud Service Provider (CSP) and the latter user could be a guest or tenant. In some cases, however, a user-defined peripheral device may be specified and used by the same user.


In various embodiments, the various components of NIC 24 and host 28 can be implemented using hardware, e.g., using one or more Application-Specific Integrated Circuits (ASIC) and/or Field-Programmable Gate Arrays (FPGA), using software, or using a combination of hardware and software components.


In some embodiments, at least some of the functions of the disclosed system components, e.g., some or all functions of host CPU 36 and/or DPA 64, are implemented using one or more general-purpose processors, which are programmed in software to carry out the functions described herein. The software may be downloaded to the processors in electronic form, over a network, for example, or it may, alternatively or additionally, be provided and/or stored on non-transitory tangible media, such as magnetic, optical, or electronic memory.


UDDI with Reuse of Data-Path Engines by DPA

When implementing a UDPD, the UDPD interface exposed by NIC 24 typically appears to a user application as a dedicated, local peripheral device. The actual peripheral device, however, may be located remotely from host 28 (e.g., across network 32), shared by one or more other user applications and/or designed to use a different native interface than the user application, or emulated entirely using software.


Thus, in general, user-defined implementation of a peripheral device may involve accessing local devices, communication over a network with remote devices, as well as protocol translation. These operations typically involve sending and/or receiving data units to and from network 32, as well as processing data units in NIC 24.


Depending on the kind of peripheral device being implemented and the protocols involved, data units that are processed by NIC 24 may comprise, for example, packets, messages, data blocks, data objects, descriptors, contexts, work requests, completions, or any other suitable kind of data units. Some types of data units may be communicated over network 32, other types may be communicated with the host, and yet other types may be processed only internally in the NIC.


The embodiments described herein refer mainly to packets, for the sake of clarity, but the disclosed techniques are applicable to data units of any other suitable type. For clarity, data units (e.g., packets) that are processed by NIC 24 as part of UDDI, i.e., as part of implementing a user-defined peripheral device, are referred to as UDPD data units (with UDPD packets being an example). By the same token, data units (e.g., packets) that are processed by NIC 24 as part of network communication are referred to as communication data units (with communication packets being an example).


Using the above terminology, when serving user applications that run on host 28, NIC 24 reuses one or more of the processing engines of data path 60 for both (i) processing of communication packets as part of network communication using native NIC driver 44, and (ii) processing of UDPD packets as part of UDDI using UDPD driver 48.


Typically, although not necessarily, when processing communication packets, data path 60 operates in a pipelined manner, with one processing engine triggering another processing engine. This operation is typically independent of DPA 64. When processing UDPD packets, on the other hand, the various processing engines are typically invoked by DPA 64 as needed.


Reuse of Data-Path Engines—Inbound Packets


FIG. 2 is a flow chart that schematically illustrates a method for processing inbound communication packets and User-Defined Peripheral Device (UDPD) packets in NIC 24, in accordance with an embodiment of the present invention.


The left-hand side of the figure shows the processing of communication packets (also referred to as “communication process”). This process typically does not involve DPA 64. The right-hand side of the figure shows the processing of UDPD packets (also referred to as “UDDI process”). Operations that reuse the same packet-processing engine are marked in the figure by a connecting dashed line.


The communication process (left-hand side of the flow chart) begins with NIC 24 receiving a communication packet from network 32 via one of ports 52, at a communication packet reception stage 100.


At a transport processing stage 104, transport engine 68 performs applicable checks and offloads on the communication packet. Checks may comprise, for example, verification of the IP checksum and TCP checksum and/or checking of the network address (e.g., check for MAC spoofing or for invalid addresses). Offloads may comprise, for example, header decapsulation in tunneled protocols, management of Large Receive Offload (LRO) sessions, and termination of reliability protocols in RDMA such as Packet Sequence Number (PSN) checks. If all checks pass successfully, transport engine 68 selects a Receive Queue (RQ) for the packet, and issues a translation request to address translation engine 72 for the next buffer in the RQ.


At an address translation stage 108, address translation engine 72 translates the RQ buffer address into one or more IOVAs. At a packet scattering stage 112, DMA engine 76 scatters the packet to the IOVA. In some embodiments, the packet may be processed by one or more of memory-to-memory accelerators 92 as needed, e.g., to decompress and/or decrypt the packet.


At a completion scattering stage 116, DMA engine 76 scatters a completion of the packet to a Completion Queue (CQ) in host 28.


In some cases (e.g., depending on user configuration) the completion may trigger MSIX engine 84 to generate an MSIX to host processor 36, at an interrupt generation stage 120. When configured, interrupt moderation engine 88 may throttle the rate of MSIX issued toward the host, at an interrupt moderation stage 124.
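
For orientation only, the following is a minimal C sketch of the inbound communication flow of stages 100 through 124. Every function and type name (transport_rx_checks, xlate_next_rq_buffer, dma_scatter, and so on) is a hypothetical placeholder for a hardware engine of data path 60; the listing is illustrative and non-limiting, and is not an actual driver or firmware interface.

/* Illustrative sketch of the inbound communication path (stages 100-124).
 * Each function stands in for a hardware engine of data path 60; all
 * names, types and signatures are hypothetical. */
struct packet;                                        /* inbound packet            */
struct rq;                                            /* receive queue             */
struct iova_list { unsigned n; unsigned long long iova[4]; };

int  transport_rx_checks(struct packet *p);           /* checksums, address checks */
struct rq *transport_select_rq(struct packet *p);     /* RQ selection              */
struct iova_list xlate_next_rq_buffer(struct rq *q);  /* address translation       */
void dma_scatter(struct packet *p, struct iova_list *dst);
void dma_write_completion(struct rq *q, struct packet *p);
int  interrupt_due(struct rq *q);                     /* MSI-X arming + moderation */
void msix_raise(struct rq *q);

void rx_communication_packet(struct packet *pkt)
{
    /* Stage 104: transport engine 68 performs checks/offloads and selects an RQ. */
    if (!transport_rx_checks(pkt))
        return;                                       /* drop on failed checks     */
    struct rq *rq = transport_select_rq(pkt);

    /* Stage 108: address translation engine 72 resolves the RQ buffer to IOVAs.  */
    struct iova_list iovas = xlate_next_rq_buffer(rq);

    /* Stage 112: DMA engine 76 scatters the packet (optionally after
     * memory-to-memory acceleration such as decryption or decompression).        */
    dma_scatter(pkt, &iovas);

    /* Stage 116: DMA engine 76 scatters a completion to the host CQ.             */
    dma_write_completion(rq, pkt);

    /* Stages 120-124: MSI-X toward host processor 36, throttled by the
     * interrupt-moderation engine when configured.                               */
    if (interrupt_due(rq))
        msix_raise(rq);
}

Representing the pipeline as a single function is only a convenient way to show the order of engine invocations; as noted above, the engines trigger one another in hardware without involving DPA 64.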


The UDDI process (right-hand side of the flow chart) begins with NIC 24 receiving a UDPD packet from network 32 via ports 52, at a UDPD packet reception stage 130.


At a transport processing stage 134, transport engine 68 performs the applicable checks on the UDPD packet, e.g., verifies the IP checksum and TCP checksum, and the network address. If the checks pass successfully, transport engine 68 selects a Receive Queue (RQ) for the packet. In this case, however, the RQ is associated with DPA 64. In some embodiments the DPA receives the packet on its selected RQ. In other embodiments the packet is written directly to host memory 40, and only a packet-arrival notification is issued to the DPA.


At a UDDI stage 138, DPA 64 performs the applicable user-defined logic on the UDPD packet. As part of this stage, DPA 64 may invoke one or more of memory-to-memory accelerators 92 as needed, e.g., to decompress and/or decrypt the packet.


At a translation requesting stage 142, the DPA issues a translation request to address translation engine 72 for the target buffer. At an address translation stage 146, address translation engine 72 translates the RQ buffer address into one or more IOVAs.


At a packet scattering stage 150, DMA engine 76 scatters the UDPD packet to the one or more IOVAs. At a completion requesting stage 154, DPA 64 sends a command to DMA engine 76 to scatter a completion. In response, in some embodiments, DMA engine 76 scatters a completion of the packet to a Completion Queue (CQ) in host 28, at a completion scattering stage 158. In other embodiments, a different scheme for completion indication (e.g., incrementing of a counter) can be used.


In some cases (e.g., depending on user configuration) the completion may trigger MSIX engine 84 to generate an MSIX to host processor 36, at an interrupt generation stage 162. When configured, interrupt moderation engine 88 may throttle the rate of MSIX issued toward the host, at an interrupt moderation stage 166.
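
The corresponding UDDI flow of stages 130 through 166 can be sketched in the same hypothetical style. The sketch emphasizes the difference from the communication flow above: here the packet-processing engines are invoked explicitly by DPA 64 rather than triggering one another. All names (dpa_rx_udpd_packet, user_defined_logic, dpa_dma_scatter, and so on) are illustrative only.

/* Illustrative sketch of the inbound UDDI path (stages 130-166).
 * All names and signatures are hypothetical. */
struct packet;
struct udpd_ctx;                                      /* state of the user-defined device */
struct iova_list { unsigned n; unsigned long long iova[4]; };

void user_defined_logic(struct udpd_ctx *c, struct packet *p);    /* stage 138      */
struct iova_list dpa_xlate_target_buffer(struct udpd_ctx *c);     /* stages 142-146 */
void dpa_dma_scatter(struct packet *p, struct iova_list *dst);    /* stage 150      */
void dpa_dma_completion(struct udpd_ctx *c);                      /* stages 154-158 */
int  interrupt_due(struct udpd_ctx *c);
void msix_raise(struct udpd_ctx *c);                              /* stages 162-166 */

/* Runs on DPA 64 after transport engine 68 has steered an incoming UDPD
 * packet to a DPA-owned receive queue (stage 134). */
void dpa_rx_udpd_packet(struct udpd_ctx *ctx, struct packet *pkt)
{
    user_defined_logic(ctx, pkt);            /* user-programmable UDPD behavior,
                                                may invoke accelerators 92         */

    struct iova_list iovas = dpa_xlate_target_buffer(ctx);  /* reuse engine 72     */
    dpa_dma_scatter(pkt, &iovas);            /* reuse DMA engine 76                */
    dpa_dma_completion(ctx);                 /* CQ entry, or e.g. a counter        */

    if (interrupt_due(ctx))                  /* reuse MSI-X and moderation engines */
        msix_raise(ctx);
}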


Reuse of Data-Path Engines—Outbound Packets


FIG. 3 is a flow chart that schematically illustrates a method for processing outbound communication packets and UDPD packets in NIC 24, in accordance with an embodiment of the present invention. Here, too, the left-hand side of the flow chart shows the processing of communication packets (referred to as “communication process”), and the right-hand side of the flow chart shows the processing of UDPD packets (referred to as “UDDI process”). Operations that reuse the same packet-processing engine are marked in the figure by a connecting dashed line.


The communication process (left-hand side of the flow chart) begins with NIC 24 receiving a doorbell from native NIC driver 44, indicating a new outbound communication packet to be processed. The doorbell typically specifies a Send Queue (SQ) address. Doorbell aggregation engine 80 receives and processes the doorbell, at a doorbell processing stage 170.


Address translation engine 72 translates the SQ buffer address (and/or one or more other addresses contained in the request residing in the SQ buffer, e.g., a Work Queue Element (WQE) containing a pointer to data) into IOVA, at a translation stage 174. At a fetching stage 178, DMA engine 76 fetches the descriptors and payload of the communication packet from host memory 40, thereby composing the communication packet. In some embodiments, the packet may be processed by one or more of memory-to-memory accelerators 92 as needed, e.g., to compress and/or encrypt the packet.


At a transport processing stage 182, transport engine 68 processes the packet, including, for example, calculating and/or verifying fields such as IP checksum, TCP checksum and network addresses, and/or performing offloads such as Large Send Offload (LSO). Transport engine 68 may implement the transport layer, fully or partially, such as adding RDMA Packet Sequence Numbers (PSNs), etc. At a transmission stage 186, the communication packet is transmitted to network 32 via one of ports 52.


At a completion scattering stage 190, DMA engine 76 scatters a completion of the packet to the host CQ. In some cases (e.g., depending on user configuration), the completion may trigger MSIX engine 84 to generate an MSIX to host processor 36, at an interrupt generation stage 194. When configured, interrupt moderation engine 88 may throttle the rate of MSIX issued toward the host, at an interrupt moderation stage 198.
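
As with the inbound case, the outbound communication flow of stages 170 through 198 can be condensed into a short, non-limiting C sketch. All names (doorbell_coalesce, dma_fetch_packet, transport_tx_process, and so on) are hypothetical placeholders for the engines of data path 60.

/* Illustrative sketch of the outbound communication path (stages 170-198).
 * All names and signatures are hypothetical. */
struct sq;                                            /* send queue               */
struct packet;
struct iova_list { unsigned n; unsigned long long iova[4]; };

void doorbell_coalesce(struct sq *q, unsigned producer_index);        /* stage 170 */
struct iova_list xlate_sq_buffer(struct sq *q);                       /* stage 174 */
struct packet *dma_fetch_packet(struct sq *q, struct iova_list *src); /* stage 178 */
void transport_tx_process(struct packet *p);          /* checksums, LSO, PSNs      */
void port_transmit(struct packet *p);                 /* stage 186                 */
void dma_write_completion(struct sq *q);              /* stage 190                 */
int  interrupt_due(struct sq *q);
void msix_raise(struct sq *q);                        /* stages 194-198            */

void tx_communication_doorbell(struct sq *sq, unsigned producer_index)
{
    /* Stage 170: doorbell aggregation engine 80 coalesces doorbells.             */
    doorbell_coalesce(sq, producer_index);

    /* Stages 174-178: translate WQE/data addresses and fetch the packet by DMA
     * (optionally followed by memory-to-memory compression or encryption).       */
    struct iova_list iovas = xlate_sq_buffer(sq);
    struct packet *pkt = dma_fetch_packet(sq, &iovas);

    /* Stages 182-186: transport engine 68 processing, then transmission.         */
    transport_tx_process(pkt);
    port_transmit(pkt);

    /* Stages 190-198: completion to the host CQ, MSI-X with optional moderation. */
    dma_write_completion(sq);
    if (interrupt_due(sq))
        msix_raise(sq);
}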


The UDPD process (right-hand side of the flow chart) begins with NIC 24 receiving a doorbell from UDPD driver 48, indicating a new outbound UDPD packet to be processed. The doorbell typically specifies a UDDI queue address. In some embodiments, doorbell aggregation engine 80 receives and processes the doorbell, at a doorbell processing stage 202. At a doorbell trapping stage 206, DPA 64 traps the doorbell and applies the applicable user-defined processing to the trapped doorbell.


At a fetch requesting stage 210, DPA 64 issues a command to DMA engine 76 to fetch the descriptors and data of the UDPD packet from host memory 40. At an address translation stage 214, address translation engine 72 translates the SQ buffer address (and/or additional addresses indicated by the descriptors in the SQ buffer, such as virtio-net available-ring that points to a descriptor table, which in turn points to packets and/or additional entries in the descriptor table) into IOVA. At a fetching stage 218, DMA engine 76 fetches the descriptors and payload of the UDPD packet from host memory 40, thereby composing the UDPD packet.


At a UDDI stage 222, DPA 64 performs the applicable user-defined logic on the UDPD packet. As part of this stage, DPA 64 may invoke one or more of memory-to-memory accelerators 92 as needed, e.g., to compress and/or encrypt the packet.


At a send requesting stage 226, in some embodiments the DPA issues a command to transport engine 68 to send the packet. At a transport processing stage 230, transport engine 68 processes the packet, including, for example, calculating and/or verifying fields such as IP checksum, TCP checksum and network addresses. At a transmission stage 234, the UDPD packet is transmitted to network 32 via one of ports 52.


At a completion requesting stage 238, DPA 64 sends a command to DMA engine 76 to scatter a completion. In response, DMA engine 76 scatters a completion of the packet to a Completion Queue (CQ) in host 28, at a completion scattering stage 242. As noted above, a CQ is only one possible way of indicating completion. In other embodiments, any other implementation can be used, e.g., using a counter.


In some cases (e.g., depending on user configuration) the completion may trigger MSIX engine 84 to generate an MSIX to host processor 36, at an interrupt generation stage 246. When configured, interrupt moderation engine 88 may throttle the rate of MSIX issued toward the host, at an interrupt moderation stage 250.
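
The outbound UDDI flow of stages 202 through 250 can likewise be sketched as a handler that runs on DPA 64 and drives the same engines by explicit commands. As before, every name in the listing (dpa_tx_udpd_doorbell, dpa_dma_fetch, dpa_transport_send, and so on) is hypothetical and illustrative only.

/* Illustrative sketch of the outbound UDDI path (stages 202-250).
 * All names and signatures are hypothetical. */
struct udpd_ctx;
struct packet;
struct iova_list { unsigned n; unsigned long long iova[4]; };

void user_defined_doorbell_logic(struct udpd_ctx *c);                    /* stage 206       */
struct iova_list dpa_xlate_sq_buffer(struct udpd_ctx *c);                /* stage 214       */
struct packet *dpa_dma_fetch(struct udpd_ctx *c, struct iova_list *src); /* stages 210, 218 */
void user_defined_packet_logic(struct udpd_ctx *c, struct packet *p);    /* stage 222       */
void dpa_transport_send(struct packet *p);            /* stages 226-234           */
void dpa_dma_completion(struct udpd_ctx *c);          /* stages 238-242           */
int  interrupt_due(struct udpd_ctx *c);
void msix_raise(struct udpd_ctx *c);                  /* stages 246-250           */

/* Runs on DPA 64 after doorbell aggregation engine 80 delivers a doorbell
 * trapped from UDPD driver 48 (stages 202-206). */
void dpa_tx_udpd_doorbell(struct udpd_ctx *ctx)
{
    user_defined_doorbell_logic(ctx);

    struct iova_list iovas = dpa_xlate_sq_buffer(ctx);      /* reuse engine 72     */
    struct packet *pkt = dpa_dma_fetch(ctx, &iovas);        /* reuse DMA engine 76 */

    user_defined_packet_logic(ctx, pkt);     /* may invoke accelerators 92         */
    dpa_transport_send(pkt);                 /* reuse transport engine 68          */

    dpa_dma_completion(ctx);                 /* CQ entry or, e.g., a counter       */
    if (interrupt_due(ctx))                  /* reuse MSI-X and moderation engines */
        msix_raise(ctx);
}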


Although the embodiments described herein mainly address user-defined implementation of peripheral-bus devices, the methods and systems described herein can also be used in other applications, such as in implementing sub-device functionality within an existing device.


It will thus be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and sub-combinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art. Documents incorporated by reference in the present patent application are to be considered an integral part of the application except that to the extent any terms are defined in these incorporated documents in a manner that conflicts with the definitions made explicitly or implicitly in the present specification, only the definitions in the present specification should be considered.

Claims
  • 1. A network adapter, comprising: a network interface, to communicate with a network; a bus interface, to communicate with an external device over a peripheral bus; a hardware-implemented data-path, comprising a plurality of packet-processing engines, to process data units exchanged between the network and the external device; and a programmable Data-Plane Accelerator (DPA), to expose on the peripheral bus a User-Defined Peripheral-bus Device (UDPD), to run user-programmable logic that implements the UDPD, and to process transactions issued from the external device to the UDPD by reusing at least a given packet-processing engine among the packet-processing engines of the data-path, wherein (i) in processing the data units, the given packet-processing engine in the data-path is to be triggered by a preceding packet-processing engine in the data-path, and (ii) in processing the transactions issued to the UDPD, the given packet-processing engine is to be invoked by the DPA.
  • 2. The network adapter according to claim 1, wherein the UDPD is one of: a network adapter; a storage device; a Graphics Processing Unit (GPU); and a Field Programmable Gate Array (FPGA).
  • 3. The network adapter according to claim 1, wherein: in processing the data units, the data-path is to communicate over the peripheral bus with a network-adapter driver running on the external device; and in processing the transactions issued to the UDPD, the DPA is to communicate over the peripheral bus with a UDPD driver running on the external device.
  • 4. The network adapter according to claim 1, wherein: the given packet-processing engine comprises a hardware-implemented transport engine to perform transport-protocol checks and/or offloads on incoming communication data units and to select receive-queues for the incoming communication data units; and the DPA is to re-use the transport engine to perform transport-protocol checks and/or offloads on incoming UDPD data units associated with the UDPD, and to select receive-queues for the incoming UDPD data units.
  • 5. The network adapter according to claim 1, wherein: the given packet-processing engine comprises a hardware-implemented address-translation engine to translate between virtual addresses in a first address space and addresses assigned to the communication data units in a second address space; and the DPA is to re-use the address-translation engine to translate between virtual addresses in a third address space and addresses assigned to UDPD data units associated with the UDPD in a fourth address space.
  • 6. The network adapter according to claim 1, wherein: the given packet-processing engine comprises at least one hardware-implemented Direct Memory Access (DMA) engine to scatter data from at least some of the communication data units to memory, and to transfer completion notifications for the communication data units; and the DPA is to re-use the DMA engine to scatter data from UDPD data units, associated with the UDPD, to the memory, and to transfer completion notifications for the UDPD data units.
  • 7. The network adapter according to claim 1, wherein: the given packet-processing engine comprises a hardware-implemented message-signaled-interrupt engine to trigger the external device with interrupts upon completions of processing of at least some of the data units; and the DPA is to re-use the message-signaled-interrupt engine to trigger the external device with interrupts upon completions of processing of UDPD data units associated with the UDPD.
  • 8. The network adapter according to claim 1, wherein: the given packet-processing engine comprises an interrupt-moderation engine to throttle a rate of interrupts that indicate completions of at least some of the communication data units; and the DPA is to re-use the interrupt-moderation engine to throttle a rate of the interrupts that indicate the completions of the UDPD data units.
  • 9. The network adapter according to claim 1, wherein: the given packet-processing engine comprises a doorbell-aggregation engine to coalesce doorbells relating to the communication data units; and the DPA is to re-use the doorbell-aggregation engine to coalesce doorbells relating to UDPD data units associated with the UDPD.
  • 10. A method in a network adapter, the method comprising: communicating with a network; communicating with an external device over a peripheral bus; using a hardware-implemented data-path that includes a plurality of packet-processing engines, processing data units exchanged between the network and the external device; and using a programmable Data-Plane Accelerator (DPA), exposing on the peripheral bus a User-Defined Peripheral-bus Device (UDPD), running user-programmable logic that implements the UDPD, and processing transactions issued from the external device to the UDPD by reusing at least a given packet-processing engine among the packet-processing engines of the data-path, wherein (i) in processing the data units, the given packet-processing engine in the data-path is to be triggered by a preceding packet-processing engine in the data-path, and (ii) in processing the transactions issued to the UDPD, the given packet-processing engine is to be invoked by the DPA.
  • 11. The method according to claim 10, wherein the UDPD is one of: a network adapter; a storage device; a Graphics Processing Unit (GPU); and a Field Programmable Gate Array (FPGA).
  • 12. The method according to claim 10, wherein: processing the data units comprises communicating over the peripheral bus with a network-adapter driver running on the external device; and processing the transactions issued to the UDPD comprises communicating over the peripheral bus with a UDPD driver running on the external device.
  • 13. The method according to claim 10, wherein: the given packet-processing engine comprises a hardware-implemented transport engine; processing the data units comprises, by the hardware-implemented transport engine, performing transport-protocol checks and/or offloads on incoming communication data units and selecting receive-queues for the incoming communication data units; and re-using the transport engine by the DPA comprises performing transport-protocol checks and/or offloads on incoming UDPD data units associated with the UDPD, and selecting receive-queues for the incoming UDPD data units.
  • 14. The method according to claim 10, wherein: the given packet-processing engine comprises a hardware-implemented address-translation engine; processing the data units comprises, by the hardware-implemented address-translation engine, translating between virtual addresses in a first address space and addresses assigned to the communication data units in a second address space; and re-using the address-translation engine by the DPA comprises translating between virtual addresses in a third address space and addresses assigned to UDPD data units associated with the UDPD in a fourth address space.
  • 15. The method according to claim 10, wherein: the given packet-processing engine comprises a hardware-implemented Direct Memory Access (DMA) engine; processing the data units comprises, by the hardware-implemented Direct Memory Access (DMA) engine, scattering data from at least some of the communication data units to memory, and transferring completion notifications for the communication data units; and re-using the DMA engine by the DPA comprises scattering data from UDPD data units, associated with the UDPD, to the memory, and transferring completion notifications for the UDPD data units.
  • 16. The method according to claim 10, wherein: the given packet-processing engine comprises a hardware-implemented message-signaled-interrupt engine; processing the data units comprises, by the hardware-implemented message-signaled-interrupt engine, triggering the external device with interrupts upon completions of processing of at least some of the data units; and re-using the message-signaled-interrupt engine by the DPA comprises triggering the external device with interrupts upon completions of processing of UDPD data units associated with the UDPD.
  • 17. The method according to claim 10, wherein: the given packet-processing engine comprises a hardware-implemented interrupt-moderation engine; processing the data units comprises, by the hardware-implemented interrupt-moderation engine, throttling a rate of interrupts that indicate completions of at least some of the communication data units; and re-using the interrupt-moderation engine by the DPA comprises throttling a rate of the interrupts that indicate the completions of the UDPD data units.
  • 18. The method according to claim 10, wherein: the given packet-processing engine comprises a doorbell-aggregation engine; processing the data units comprises, by the doorbell-aggregation engine, coalescing doorbells relating to the communication data units; and re-using the doorbell-aggregation engine by the DPA comprises coalescing doorbells relating to UDPD data units associated with the UDPD.
  • 19. A network adapter, comprising: a network interface, to communicate with a network; a bus interface, to communicate with an external device over a peripheral bus; a hardware-implemented data-path, comprising a plurality of packet-processing engines, to process data units exchanged between the network and the external device; and a programmable Data-Plane Accelerator (DPA), to run user-programmable logic that implements a User-Defined Peripheral-bus Device (UDPD), including reusing, in implementing the UDPD, at least a given packet-processing engine among the packet-processing engines of the data-path, wherein (i) in processing the data units, the given packet-processing engine in the data-path is to be triggered by a preceding packet-processing engine in the data-path, and (ii) in processing transactions issued to the UDPD, the given packet-processing engine is to be invoked by the DPA.
US Referenced Citations (221)
Number Name Date Kind
5003465 Chisholm et al. Mar 1991 A
5463772 Thompson et al. Oct 1995 A
5615404 Knoll et al. Mar 1997 A
5768612 Nelson Jun 1998 A
5864876 Rossum et al. Jan 1999 A
5893166 Frank et al. Apr 1999 A
5954802 Griffith Sep 1999 A
6070219 Mcalpine et al. May 2000 A
6226680 Boucher et al. May 2001 B1
6321276 Forin Nov 2001 B1
6581130 Brinkmann et al. Jun 2003 B1
6701405 Adusumilli et al. Mar 2004 B1
6766467 Neal et al. Jul 2004 B1
6789143 Craddock et al. Sep 2004 B2
6901496 Mukund et al. May 2005 B1
6981027 Gallo et al. Dec 2005 B1
7171484 Krause et al. Jan 2007 B1
7225277 Johns et al. May 2007 B2
7263103 Kagan et al. Aug 2007 B2
7299266 Boyd et al. Nov 2007 B2
7395364 Higuchi et al. Jul 2008 B2
7464198 Martinez et al. Dec 2008 B2
7475398 Nunoe Jan 2009 B2
7502884 Shah et al. Mar 2009 B1
7548999 Haertel et al. Jun 2009 B2
7577773 Gandhi et al. Aug 2009 B1
7657659 Lambeth et al. Feb 2010 B1
7720064 Rohde May 2010 B1
7752417 Manczak et al. Jul 2010 B2
7809923 Hummel et al. Oct 2010 B2
7921178 Haviv Apr 2011 B2
7921237 Holland et al. Apr 2011 B1
7945752 Miller et al. May 2011 B1
8001592 Hatakeyama Aug 2011 B2
8006297 Johnson et al. Aug 2011 B2
8010763 Armstrong et al. Aug 2011 B2
8051212 Kagan et al. Nov 2011 B2
8103785 Crowley et al. Jan 2012 B2
8255475 Kagan et al. Aug 2012 B2
8260980 Weber et al. Sep 2012 B2
8346919 Eiriksson et al. Jan 2013 B1
8447904 Riddoch May 2013 B2
8504780 Mine et al. Aug 2013 B2
8645663 Kagan et al. Feb 2014 B2
8745276 Bloch et al. Jun 2014 B2
8751701 Shahar et al. Jun 2014 B2
8824492 Wang et al. Sep 2014 B2
8892804 Morein et al. Nov 2014 B2
8949486 Kagan et al. Feb 2015 B1
9038073 Kohlenz et al. May 2015 B2
9092426 Bathija et al. Jul 2015 B1
9298723 Vincent Mar 2016 B1
9331963 Krishnamurthi et al. May 2016 B2
9483290 Mantri et al. Nov 2016 B1
9678818 Raikin et al. Jun 2017 B2
9696942 Kagan et al. Jul 2017 B2
9727503 Kagan et al. Aug 2017 B2
9830082 Srinivasan et al. Nov 2017 B1
9904568 Vincent et al. Feb 2018 B2
10078613 Ramey Sep 2018 B1
10120832 Raindel et al. Nov 2018 B2
10135739 Raindel et al. Nov 2018 B2
10152441 Liss et al. Dec 2018 B2
10162793 Bshara et al. Dec 2018 B1
10210125 Burstein Feb 2019 B2
10218645 Raindel et al. Feb 2019 B2
10423774 Zelenov et al. Apr 2019 B1
10382350 Bohrer et al. Aug 2019 B2
10417156 Hsu et al. Sep 2019 B2
10628622 Sivaraman et al. Apr 2020 B1
10657077 Ganor et al. May 2020 B2
10671309 Glynn Jun 2020 B1
10684973 Connor et al. Jun 2020 B2
10715451 Raindel et al. Jul 2020 B2
10824469 Hirshberg et al. Nov 2020 B2
10841243 Levi et al. Nov 2020 B2
10999364 Itigin et al. May 2021 B1
11003607 Ganor et al. May 2021 B2
11080225 Borikar et al. Aug 2021 B2
11086713 Sapuntzakis et al. Aug 2021 B1
11126575 Aslanidis et al. Sep 2021 B1
11537548 Makhija et al. Dec 2022 B2
20020152327 Kagan et al. Oct 2002 A1
20030023846 Krishna et al. Jan 2003 A1
20030046530 Poznanovic Mar 2003 A1
20030120836 Gordon Jun 2003 A1
20040010612 Pandya Jan 2004 A1
20040039940 Cox et al. Feb 2004 A1
20040057434 Poon et al. Mar 2004 A1
20040158710 Buer et al. Aug 2004 A1
20040221128 Beecroft et al. Nov 2004 A1
20040230979 Beecroft et al. Nov 2004 A1
20050102497 Buer May 2005 A1
20050198412 Pedersen et al. Sep 2005 A1
20050216552 Fineberg et al. Sep 2005 A1
20060095754 Hyder et al. May 2006 A1
20060104308 Pinkerton et al. May 2006 A1
20060259291 Dunham Nov 2006 A1
20060259661 Feng et al. Nov 2006 A1
20070011429 Sangili et al. Jan 2007 A1
20070061492 Van Riel Mar 2007 A1
20070223472 Tachibana et al. Sep 2007 A1
20070226450 Engbersen et al. Sep 2007 A1
20070283124 Menczak et al. Dec 2007 A1
20070297453 Niinomi Dec 2007 A1
20080005387 Mutaguchi Jan 2008 A1
20080147822 Benhase et al. Jun 2008 A1
20080147904 Freimuth et al. Jun 2008 A1
20080168479 Purtell et al. Jul 2008 A1
20080313364 Flynn et al. Dec 2008 A1
20090086736 Foong et al. Apr 2009 A1
20090106771 Benner et al. Apr 2009 A1
20090204650 Wong et al. Aug 2009 A1
20090319775 Buer et al. Dec 2009 A1
20090328170 Williams et al. Dec 2009 A1
20100030975 Murray et al. Feb 2010 A1
20100095053 Bruce et al. Apr 2010 A1
20100095085 Hummel et al. Apr 2010 A1
20100211834 Asnaashari et al. Aug 2010 A1
20100217916 Gao et al. Aug 2010 A1
20100228962 Simon et al. Sep 2010 A1
20100322265 Gopinath Dec 2010 A1
20110023027 Kegel et al. Jan 2011 A1
20110119673 Bloch et al. May 2011 A1
20110213854 Haviv Sep 2011 A1
20110246597 Swanson et al. Oct 2011 A1
20120314709 Post et al. Dec 2012 A1
20130067193 Kagan et al. Mar 2013 A1
20130080651 Pope et al. Mar 2013 A1
20130103777 Kagan et al. Apr 2013 A1
20130125125 Karino et al. May 2013 A1
20130142205 Munoz Jun 2013 A1
20130145035 Pope Jun 2013 A1
20130159568 Shahar et al. Jun 2013 A1
20130263247 Jungck et al. Oct 2013 A1
20130276133 Hodges et al. Oct 2013 A1
20130311746 Raindel et al. Nov 2013 A1
20130325998 Hormuth et al. Dec 2013 A1
20130329557 Petry Dec 2013 A1
20130347110 Dalal Dec 2013 A1
20140089450 Raindel et al. Mar 2014 A1
20140089451 Eran et al. Mar 2014 A1
20140089631 King Mar 2014 A1
20140122828 Kagan et al. May 2014 A1
20140129741 Shahar et al. May 2014 A1
20140156894 Tsirkin et al. Jun 2014 A1
20140181365 Fanning et al. Jun 2014 A1
20140185616 Bloch et al. Jul 2014 A1
20140244965 Manula et al. Aug 2014 A1
20140254593 Mital et al. Sep 2014 A1
20140282050 Quinn et al. Sep 2014 A1
20140282561 Holt et al. Sep 2014 A1
20150006663 Huang Jan 2015 A1
20150012735 Tamir et al. Jan 2015 A1
20150032835 Sharp et al. Jan 2015 A1
20150081947 Vucinic et al. Mar 2015 A1
20150100962 Morita et al. Apr 2015 A1
20150288624 Raindel et al. Oct 2015 A1
20150319243 Hussain et al. Nov 2015 A1
20150347185 Holt et al. Dec 2015 A1
20150355938 Jokinen et al. Dec 2015 A1
20160065659 Bloch et al. Mar 2016 A1
20160085718 Huang Mar 2016 A1
20160132329 Gupte et al. May 2016 A1
20160154673 Morris Jun 2016 A1
20160226822 Zhang et al. Aug 2016 A1
20160342547 Liss et al. Nov 2016 A1
20160350151 Zou et al. Dec 2016 A1
20160378529 Wen Dec 2016 A1
20170031810 Bonzini et al. Feb 2017 A1
20170075855 Sajeepa et al. Mar 2017 A1
20170104828 Brown Apr 2017 A1
20170180273 Daly et al. Jun 2017 A1
20170187629 Shalev et al. Jun 2017 A1
20170237672 Dalal Aug 2017 A1
20170264622 Cooper et al. Sep 2017 A1
20170286157 Hasting et al. Oct 2017 A1
20170371835 Ranadive et al. Dec 2017 A1
20180004954 Liguori et al. Jan 2018 A1
20180067893 Raindel et al. Mar 2018 A1
20180109471 Chang et al. Apr 2018 A1
20180114013 Sood et al. Apr 2018 A1
20180167364 Dong et al. Jun 2018 A1
20180210751 Pepus et al. Jul 2018 A1
20180219770 Wu et al. Aug 2018 A1
20180219772 Koster et al. Aug 2018 A1
20180246768 Palermo et al. Aug 2018 A1
20180262468 Kumar et al. Sep 2018 A1
20180285288 Bernat et al. Oct 2018 A1
20180329828 Apfelbaum et al. Nov 2018 A1
20190012350 Sindhu et al. Jan 2019 A1
20190026157 Suzuki et al. Jan 2019 A1
20190116127 Pismenny et al. Apr 2019 A1
20190124113 Labana et al. Apr 2019 A1
20190163364 Gibb et al. May 2019 A1
20190173846 Patterson et al. Jun 2019 A1
20190190892 Menachem et al. Jun 2019 A1
20190199690 Klein Jun 2019 A1
20190243781 Thyamagondlu Aug 2019 A1
20190250938 Claes et al. Aug 2019 A1
20200012604 Agarwal Jan 2020 A1
20200026656 Liao et al. Jan 2020 A1
20200065269 Balasubramani et al. Feb 2020 A1
20200259803 Menachem et al. Aug 2020 A1
20200314181 Eran et al. Oct 2020 A1
20200401440 Sankaran et al. Dec 2020 A1
20210042255 Colenbrander Feb 2021 A1
20210111996 Pismenny et al. Apr 2021 A1
20210133140 Jeansonne et al. May 2021 A1
20210203610 Pismenny et al. Jul 2021 A1
20210209052 Chen et al. Jul 2021 A1
20220075747 Shuler et al. Mar 2022 A1
20220092135 Sidman Mar 2022 A1
20220100687 Sahin et al. Mar 2022 A1
20220103629 Cherian Mar 2022 A1
20220283964 Burstein et al. Sep 2022 A1
20220308764 Pismenny et al. Sep 2022 A1
20220309019 Duer et al. Sep 2022 A1
20220334989 Bar-Ilan et al. Oct 2022 A1
20220391341 Rosenbaum et al. Dec 2022 A1
20230010150 Ben-Ishay et al. Jan 2023 A1
Foreign Referenced Citations (3)
Number Date Country
1657878 May 2006 EP
2463782 Jun 2012 EP
2010062679 Jun 2010 WO
Non-Patent Literature Citations (39)
Entry
U.S. Appl. No. 17/211,928 Office Action dated May 25, 2023.
“Linux kernel enable the IOMMU—input/output memory management unit support”, pp. 1-2, Oct. 15, 2007 downloaded from http://www.cyberciti.biz/tips/howto-turn-on-linux-software-iommu-support.html.
Hummel M., “IO Memory Management Hardware Goes Mainstream”, AMD Fellow, Computation Products Group, Microsoft WinHEC, pp. 1-7, 2006.
PCI Express, Base Specification, Revision 3.0, pp. 1-860, Nov. 10, 2010.
NVM Express, Revision 1.0e, pp. 1-127, Jan. 23, 2013.
Infiniband Trade Association, “InfiniBandTM Architecture Specification”, vol. 1, Release 1.2.1, pp. 1-1727, Nov. 2007.
Shah et al., “Direct Data Placement over Reliable Transports”, IETF Network Working Group, RFC 5041, pp. 1-38, Oct. 2007.
Culley et al., “Marker PDU Aligned Framing for TCP Specification”, IETF Network Working Group, RFC 5044, pp. 1-75, Oct. 2007.
“MPI: A Message-Passing Interface Standard”, Version 2.2, Message Passing Interface Forum, pp. 1-64, Sep. 4, 2009.
Welsh et al., “Incorporating Memory Management into User-Level Network Interfaces”, Department of Computer Science, Cornell University, Technical Report TR97-1620, pp. 1-10, Feb. 13, 1997.
Tsirkin et al., “Virtual I/O Device (VIRTIO) Version 1.1”, Committee Specification Draft 01/Public Review Draft 01, OASIS Open, pp. 1-121, Dec. 20, 2018.
“Switchtec PAX Gen 4 Advanced Fabric PCle Switch Family—PM42100, PM42068, PM42052, PM42036, PM42028,” Product Brochure, Microchip Technology Incorporated, pp. 1-2, year 2021.
Regula, “Using Non-Transparent Bridging in PCI Express Systems,” PLX Technology, Inc., pp. 1-31, Jun. 2004.
Marcovitch et al., U.S. Appl. No. 17/987,904, filed Nov. 16, 2022.
Marcovitch, U.S. Appl. No. 17/707,555, filed Mar. 29, 2022.
Liss et al., U.S. Appl. No. 17/976,909, filed Oct. 31, 2022.
Mellanox Technologies, “Understanding On Demand Paging (ODP),” Knowledge Article, pp. 1-6, Feb. 20, 2019 downloaded from https://community.mellanox.com/s/article/understanding-on-demand-paging--odp-x.
U.S. Appl. No. 17/372,466 Office Action dated Feb. 15, 2023.
Shirey, “Internet Security Glossary, Version 2”, Request for Comments 4949, pp. 1-365, Aug. 2007.
Information Sciences Institute, “Transmission Control Protocol; DARPA Internet Program Protocol Specification”, Request for Comments 793, pp. 1-90, Sep. 1981.
InfiniBand TM Architecture Specification vol. 1, Release 1.3, pp. 1-1842, Mar. 3, 2015.
Stevens., “TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms”, Request for Comments 2001, pp. 1-6, Jan. 1997.
Netronome Systems, Inc., “Open vSwitch Offload and Acceleration with Agilio® CX SmartNICs”, White Paper, pp. 1-7, Mar. 2017.
Dierks et al., “The Transport Layer Security (TLS) Protocol Version 1.2”, Request for Comments: 5246 , pp. 1-104, Aug. 2008.
Turner et al., “Prohibiting Secure Sockets Layer (SSL) Version 2.0”, Request for Comments: 6176, pp. 1-4, Mar. 2011.
Rescorla et al., “The Transport Layer Security (TLS) Protocol Version 1.3”, Request for Comments: 8446, pp. 1-160, Aug. 2018.
Comer., “Packet Classification: A Faster, More General Alternative to Demultiplexing”, The Internet Protocol Journal, vol. 15, No. 4, pp. 12-22, Dec. 2012.
Salowey et al., “AES Galois Counter Mode (GCM) Cipher Suites for TLS”, Request for Comments: 5288, pp. 1-8, Aug. 2008.
Burstein, “Enabling Remote Persistent Memory”, SNIA-PM Summit, pp. 1-24, Jan. 24, 2019.
Chung et al., “Serving DNNs in Real Time at Datacenter Scale with Project Brainwave”, IEEE Micro Pre-Print, pp. 1-11, Mar. 22, 2018.
Talpey, “Remote Persistent Memory—With Nothing But Net”, SNIA-Storage developer conference , pp. 1-30, year 2017.
Microsoft, “Project Brainwave”, pp. 1-5, year 2019.
“NVM Express—Base Specifications,” Revision 2.0, pp. 1-452, May 13, 2021.
Pismenny et al., “Autonomous NIC Offloads”, submitted for evaluation of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '21), p. 1-18, Dec. 13, 2020.
Lebeane et al., “Extended Task queuing: Active Messages for Heterogeneous Systems”, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), pp. 933-944, Nov. 2016.
NVM Express Inc., “NVM Express over Fabrics,” Revision 1.0, pp. 1-49, Jun. 5, 2016.
U.S. Appl. No. 17/527,197 Office Action dated Sep. 28, 2023.
U.S. Appl. No. 17/976,909 Office Action dated Feb. 21, 2024.
U.S. Appl. No. 17/987,904 Office Action dated Apr. 11, 2024.
Related Publications (1)
Number Date Country
20240143528 A1 May 2024 US