Data processing unit with transparent root complex

Information

  • Patent Grant
  • Patent Number: 12,117,948
  • Date Filed: Monday, October 31, 2022
  • Date Issued: Tuesday, October 15, 2024
Abstract
Computing apparatus includes a central processing unit (CPU) and a root complex connected to the CPU and to a first peripheral component bus, which has at least a first downstream port for connection to at least one peripheral device. Switching logic has an upstream port for connection to a second downstream port on a second peripheral component bus of a host computer, and is connected to the root complex so that when a peripheral device is connected to the first downstream port on the first peripheral component bus, the switching logic presents the peripheral device to the host computer in an address space of the second peripheral component bus.
Description
FIELD OF THE INVENTION

The present invention relates generally to computing systems, and particularly to devices and methods for bridging memory address spaces among computing system components and peripheral devices.


BACKGROUND

PCI Express® (commonly referred to as PCIe) is a high-speed packet-based peripheral component bus standard, which is used in most current computer motherboards. The PCIe architecture is built around a “root complex,” which connects the central processing unit (CPU) and memory subsystem of the computer to the PCIe interconnect fabric. The root complex has one or more downstream ports, which connect to the upstream ports of PCIe endpoints, switches, or bridges to other PCIe buses. Each PCIe switch has a number of downstream ports, which may likewise connect to upstream ports of other endpoints, switches, or bridges, thus forming a sub-hierarchy within the PCIe fabric.


SUMMARY

Embodiments of the present invention that are described hereinbelow provide improved apparatus and methods for interconnecting host processors and peripheral devices.


There is therefore provided, in accordance with an embodiment of the invention, computing apparatus, including a central processing unit (CPU) and a root complex connected to the CPU and to a first peripheral component bus, which has at least a first downstream port for connection to at least one peripheral device. Switching logic has an upstream port for connection to a second downstream port on a second peripheral component bus of a host computer, and is connected to the root complex so that when a peripheral device is connected to the first downstream port on the first peripheral component bus, the switching logic presents the peripheral device to the host computer in an address space of the second peripheral component bus.


In a disclosed embodiment, the switching logic includes a virtual switch. Additionally or alternatively, the peripheral device includes a data storage device and/or a network interface controller (NIC).


In some embodiments, the switching logic is to present the peripheral device as a physical function in the address space of the second peripheral component bus. In one embodiment, the peripheral device is to expose a virtual function on the first peripheral component bus, and the switching logic is to present the virtual function as the physical function in the address space of the second peripheral component bus.


Alternatively or additionally, the switching logic is to present the peripheral device as a virtual function in the address space of the second peripheral component bus.


In some embodiments, the address space of the second peripheral component bus is a second address space, and the root complex is to present the peripheral device to the CPU in a first address space of the first peripheral component bus. In a disclosed embodiment, the switching logic is to receive a bus transaction via the upstream port referencing the second address space of the second peripheral component bus and directed to the peripheral device, to translate the bus transaction to the first address space, and to transmit the translated bus transaction over the first peripheral component bus.


In one embodiment, the switching logic is to reserve a segment of the address space of the second peripheral component bus for a dummy device, to enable hot-plugging of a further peripheral device in the reserved segment.


There is also provided, in accordance with an embodiment of the invention, a method for computing, which includes providing a peripheral device server including a root complex connected to a first peripheral component bus. A peripheral device is connected to a first downstream port on the first peripheral component bus. An upstream port of the peripheral device server is connected to a second downstream port on a second peripheral component bus of a host computer. Using switching logic in the peripheral device server coupled between the upstream port and the first downstream port, the peripheral device is presented to the host computer in an address space of the second peripheral component bus.


The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a block diagram that schematically illustrates a computer system, in accordance with an embodiment of the invention; and



FIGS. 2, 3 and 4 are block diagrams that schematically illustrate methods for interconnection between a host processor and a peripheral device, in accordance with an embodiment of the invention.





DETAILED DESCRIPTION OF EMBODIMENTS
Overview

In conventional computer and software architectures, the host computer communicates with peripheral devices via its own, local peripheral component bus, such as a PCIe bus. For this purpose, the host computer includes a root complex, as explained above. The operating system of the host computer defines the peripheral device functions that are available via the bus and assigns each one an address range on the bus, known as a base address register (BAR) in PCIe parlance. Application software running on the host processor can then access the peripheral device functions by writing to and reading from the assigned address ranges. The PCIe bus also permits “hot plugging,” in which devices can be connected to and disconnected from the bus while the computer is running.


In large computer networks, for example in a data center, multiple host computers commonly share the resources of a given peripheral device, such as a network interface controller (NIC), a graphics processing unit (GPU), or a storage device, such as a solid-state disk (SSD). These peripheral devices may be connected to a dedicated server, such as a storage server or “smart NIC,” which distributes the peripheral device services among the host computers. A peripheral device server of this sort is referred to herein, for the sake of simplicity and clarity, as a “data processing unit” (DPU), regardless of the type or types of peripheral devices that are connected to it. The DPU may run software that emulates the functionality of a peripheral device that is attached locally to a host computer, while the peripheral device is accessed through and controlled by the DPU. This sort of software-based emulation is described, for example, in U.S. Patent Application Publication 2022/0309019, whose disclosure is incorporated herein by reference.


Embodiments of the present invention that are described herein offer an alternative solution, in which host computers are able to interact with a peripheral device on the peripheral component bus of the DPU using standard bus transactions. The interaction is transparent to the host computer, as though the peripheral device were connected directly to the peripheral component bus of the host computer rather than to the DPU. Translation of transaction parameters is carried out by switching logic in the DPU and does not require software-based emulation (although it can be integrated with emulation to offer additional device capabilities to the host computer).


In the disclosed embodiments, address ranges on the host peripheral component bus are allocated to selected functions of a peripheral device or devices attached to the DPU. When bus commands are issued by software running on the host computer to these address ranges on the host bus, the commands are “tunneled” transparently through the root complex of the DPU, with appropriate conversion of the bus addresses and identifiers, to destination devices on the peripheral component bus of the DPU. Responses from the devices are tunneled back to the peripheral component bus of the host computer in similar fashion. By appropriate reservation of address ranges on the peripheral component bus of the host computer, it is also possible to enable the host computer to access devices that have been hot-plugged into the peripheral component bus of the host computer.


The embodiments that are described hereinbelow provide computing apparatus, such as a DPU, which comprises a central processing unit (CPU) and a root complex. The root complex is connected both to the CPU and to a peripheral component bus of the DPU, which has downstream ports for connection to peripheral devices. In addition, the DPU comprises switching logic, which is connected to the CPU and root complex of the DPU and has one or more upstream ports for connection to downstream ports on the peripheral component buses of one or more host computers (and is thus also in communication with the root complex of the host computer). When a peripheral device is connected to a downstream port on the peripheral component bus of the DPU, the switching logic can be configured to present the peripheral device to the host computer in the address space of the peripheral component bus of the host computer.


The switching logic typically comprises a virtual switch, which may be programmed in software to expose either physical functions or virtual functions, or both, of the peripheral devices on the bus of the DPU to the host computer. The switching logic may have multiple upstream ports, for connection to downstream ports on the peripheral component buses of multiple host computers, and may be programmed to assign different functions of the same peripheral device to different host computers. Each function has its own address range (BAR), which is translated by the switching logic to a corresponding address range on the peripheral component bus of the host computer to which it has been assigned.
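Because each host computer assigns BARs in its own address space, two hosts may choose the same base address for different functions. The per-host routing described above can therefore be pictured as a lookup keyed by upstream port. The following Python sketch is illustrative only: the `Route` and `VirtualSwitchRoutes` names, the port labels, and the addresses are hypothetical, and actual switching logic would hold such entries in hardware tables rather than scan a list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One exposed function: a host-side BAR mapped to a DPU-side BAR (hypothetical model)."""
    usp: str           # DPU upstream port the host is attached to, e.g. "USP44"
    host_base: int     # BAR base assigned by that host on its own bus
    size: int          # BAR size in bytes
    dpu_function: str  # function on the DPU bus that backs this BAR
    dpu_base: int      # BAR base of that function on the DPU bus


class VirtualSwitchRoutes:
    """Per-USP BAR lookup; each host has an independent address space."""

    def __init__(self) -> None:
        self._routes: list[Route] = []

    def add(self, route: Route) -> None:
        # Only BARs behind the same USP can collide; other hosts have their own spaces.
        for r in self._routes:
            same_port = r.usp == route.usp
            overlaps = (r.host_base < route.host_base + route.size
                        and route.host_base < r.host_base + r.size)
            if same_port and overlaps:
                raise ValueError(f"BAR overlap on {route.usp}")
        self._routes.append(route)

    def lookup(self, usp: str, host_addr: int) -> tuple[str, int]:
        """Return (DPU function, DPU-side address) for a host request arriving on `usp`."""
        for r in self._routes:
            if r.usp == usp and r.host_base <= host_addr < r.host_base + r.size:
                return r.dpu_function, r.dpu_base + (host_addr - r.host_base)
        raise LookupError(f"no function behind {usp} at {host_addr:#x}")


# Two VFs of the same SSD assigned to two hosts that chose the same BAR base.
routes = VirtualSwitchRoutes()
routes.add(Route("USP44", 0x100000, 0x1000, "SSD70.VF1", 0x201000))
routes.add(Route("USP46", 0x100000, 0x1000, "SSD70.VF2", 0x202000))
fn, addr = routes.lookup("USP44", 0x100010)
print(fn, hex(addr))  # SSD70.VF1 0x201010
```

Keying on the port, rather than on the host address alone, is what lets the same host-side BAR value resolve to different functions for different servers.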


System Description


FIG. 1 is a block diagram that schematically illustrates a computer system 20, in accordance with an embodiment of the invention. System 20 comprises a DPU 22, which supports transparent tunneling of root complex functionality for one or more servers 24.


Each server 24 comprises a host computer 26, which includes a CPU 28 and a memory 30. A root complex (RC) 32 in host computer 26 connects to a local peripheral component bus, such as a PCIe bus 34, having multiple downstream ports (DSPs) 36, 38, 40. In the pictured example, DSP 36 connects directly to a local peripheral device, such as a NIC 42 that serves server 24. DSP 40, on the other hand, connects to an upstream port (USP) 44 of DPU 22. Other USPs 46, 48 of DPU 22 connect to downstream ports on other servers 24.


DPU 22 comprises a CPU 50 and a memory 52, along with its own root complex (RC) 54, which connects to a PCIe bus 56 of DPU 22. PCIe bus 56 comprises multiple DSPs 58, 60, 62 for physical connection to respective endpoints. In the pictured example, DSP 58 connects to a NIC 64, which is connected to a packet network 66; DSP 60 connects to a GPU 68; and DSP 62 connects to an SSD 70. RC 54 presents the devices on PCIe bus 56 to CPU 50 in the native address space of the PCIe bus, thus enabling CPU 50 to access the peripheral devices directly. CPU 50 may perform device emulation functions on behalf of servers 24, such as the functions described in the above-mentioned U.S. Patent Application Publication 2022/0309019, but these functions are beyond the scope of the present description.


Switching logic 72 in DPU 22 connects to USPs 44, 46, 48 and to RC 54, so as to enable servers 24 to access the functions of the peripheral devices on PCIe bus 56, such as NIC 64, GPU 68, and/or SSD 70. For this purpose, switching logic 72 functions as a virtual switch, which is configured in software by CPU 50. In this respect, switching logic 72 appears to host computer 26 to be a PCIe switch, having USP 44 connected to DSP 40 and virtual downstream ports exposing the physical and/or virtual functions of one or more of the actual, physical endpoints on PCIe bus 56.


Switching logic 72 tunnels bus transactions received through USP 44 to bus 56, and similarly tunnels bus transactions from bus 56 through to USP 44. This tunneling functionality is carried out in real time by the switching logic, using tables to translate the bus addresses and the Bus Device Function (BDF) indicators between PCIe bus 34 of host computer 26 and PCIe bus 56 of DPU 22. The tables can be programmed individually for each peripheral device function that is exposed by DPU 22 to servers 24. Details of these tables are described hereinbelow.
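The translation tables store BDFs in 16-bit fields (for example, emu_space_reqid[15:0]). In a PCIe Requester or Completer ID, the bus number occupies bits 15:8, the device number bits 7:3, and the function number bits 2:0. The helpers below are a minimal Python sketch of that packing; the function names are hypothetical, and the sketch ignores Alternative Routing-ID Interpretation (ARI), under which the low 8 bits form a single function number.

```python
def bdf_to_id(bdf: str) -> int:
    """Pack 'bus:device.function' (hexadecimal fields) into a 16-bit PCIe ID."""
    bus_part, rest = bdf.split(":")
    dev_part, fn_part = rest.split(".")
    bus, dev, fn = (int(p, 16) for p in (bus_part, dev_part, fn_part))
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F and 0 <= fn <= 0x7):
        raise ValueError(f"BDF out of range: {bdf}")
    return (bus << 8) | (dev << 3) | fn


def id_to_bdf(rid: int) -> str:
    """Unpack a 16-bit PCIe ID into 'bus:device.function' notation."""
    if not 0 <= rid <= 0xFFFF:
        raise ValueError(f"ID out of range: {rid:#x}")
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7}"


# A Requester ID substitution expressed on packed IDs (values taken from Example I)
requester_map = {bdf_to_id("11:00.0"): bdf_to_id("12:00.0")}
print(id_to_bdf(requester_map[bdf_to_id("11:00.0")]))  # 12:00.0
```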


Although PCIe buses 34 and 56 in FIG. 1 have simple, flat topologies for the sake of simplicity, the principles of the present embodiments may similarly be applied to buses comprising multiple levels of switches and endpoints. Furthermore, although the present embodiments relate, for the sake of clarity, specifically to PCIe buses, the principles of the present invention are similarly applicable, mutatis mutandis, to peripheral component buses of other types. All such alternative embodiments and configurations are considered to be within the scope of the present invention.


Tunneling Configurations

When a peripheral device is physically connected to a PCIe bus, it exposes at least one physical function (PF) and may expose one or more virtual functions (VFs). The PF enables a host computer to control a wider range of capabilities of the peripheral device, such as single root input/output virtualization (SR-IOV) and power management, while the VFs provide access to only a narrower range of functionalities. In embodiments of the present invention, switching logic 72 can expose the PF and VFs to host computer 26 in different combinations, as illustrated in the figures that follow.


The functions of each peripheral device on bus 56 of DPU 22 are exposed to root complex 54 during the host enumeration phase of the bootup of DPU 22. (Each such peripheral device is referred to as an endpoint, or EP.) Software running on CPU 50 builds the configuration space of bus 56, including assigning a local BDF and BAR to each of the functions of each of the endpoints on bus 56. For some purposes, such as implementation of SR-IOV, the addresses of the VFs of a given endpoint may be separated by a specified, fixed stride. The configuration space also defines the capabilities of each function and indicates to switching logic 72 which functions and capabilities are available for tunneled use by servers 24 via USP 44, 46 or 48. Different functions of the same endpoint on bus 56 may be exposed to different servers 24 via the respective USPs, so that the servers can share the functions of a given peripheral device.
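In the PCI-SIG SR-IOV specification, the fixed spacing appears in two places: VF Routing IDs are derived from the PF's Routing ID using the First VF Offset and VF Stride fields, and each VF BAR describes a contiguous array of equally sized apertures, one per VF. The Python sketch below computes both under those rules; the function name and the example values are hypothetical.

```python
from __future__ import annotations


def vf_layout(pf_rid: int, first_vf_offset: int, vf_stride: int,
              vf_bar_base: int, vf_bar_size: int, num_vfs: int) -> list[tuple[int, int]]:
    """Return (routing_id, bar_address) for VF 1..num_vfs."""
    layout = []
    for n in range(1, num_vfs + 1):
        # Routing ID of VF n = PF RID + First VF Offset + (n - 1) * VF Stride
        rid = pf_rid + first_vf_offset + (n - 1) * vf_stride
        if rid > 0xFFFF:
            raise ValueError(f"VF{n} routing ID does not fit in 16 bits")
        # VF apertures are laid out back to back, each vf_bar_size bytes long
        bar = vf_bar_base + (n - 1) * vf_bar_size
        layout.append((rid, bar))
    return layout


# PF at 02:00.0 (0x0200), First VF Offset 1, VF Stride 1, 4 KB VF BARs
for rid, bar in vf_layout(0x0200, 1, 1, 0x200000, 0x1000, 3):
    print(f"RID {rid:#06x}  BAR {bar:#x}")
```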


Host computer 26 on server 24 carries out its own host enumeration process and thus builds the configuration space of its own bus 34. As a part of this process, functions exposed by switching logic 72 through the corresponding USP 44, 46 or 48 are also enumerated, and host computer 26 assigns a BAR to each function. Switching logic 72 may also expose a dummy function to host computer 26, which causes the host computer to reserve a BAR for the dummy function. This reserved BAR can later be used to make space to access hot-plugged devices on bus 34 of server 24, by “unplugging” the dummy device.
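Host firmware and operating systems apply their own allocation policies, but two PCI rules constrain all of them: BAR sizes are powers of two, and each BAR is aligned to its own size. The sketch below, whose names and sizes are hypothetical, assigns addresses under those rules and shows how a dummy function with a deliberately large BAR leaves a reserved window that a hot-plugged device could later occupy.

```python
from __future__ import annotations


def assign_bars(window_base: int, window_size: int,
                requests: dict[str, int]) -> dict[str, int]:
    """Assign naturally aligned bases to BAR requests, largest first."""
    next_free = window_base
    window_end = window_base + window_size
    assigned: dict[str, int] = {}
    # Placing large BARs first keeps alignment padding small.
    for name, size in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        if size <= 0 or size & (size - 1):
            raise ValueError(f"{name}: BAR size must be a power of two")
        base = (next_free + size - 1) & ~(size - 1)  # round up to a multiple of size
        if base + size > window_end:
            raise ValueError(f"{name}: does not fit in the bridge window")
        assigned[name] = base
        next_free = base + size
    return assigned


bars = assign_bars(0x100000, 0x400000, {
    "tunneled_pf": 0x1000,       # function exposed through the USP
    "dummy_hotplug": 0x100000,   # placeholder reserving 1 MB for later hot-plug
})
for name, base in bars.items():
    print(f"{name:14s} {base:#x}")
```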


In the configuration space of bus 34, the capabilities of the tunneled functions may be identical to those of the corresponding functions on bus 56, or they may differ. For example, as illustrated in the examples described below, the PF of a given EP may or may not be tunneled together with the VFs. As another example, a given VF on bus 56 may be exposed on bus 34 as though it were a PF. When the PF is tunneled to bus 34, certain capabilities of the PF, such as power management, may be masked so that host computer 26 is unable to interact with them. Additionally or alternatively, the VFs of a given endpoint may offer capabilities that the actual peripheral device does not offer but that are instead emulated by software running on DPU 22. For example, an SSD that is physically configured to support a given storage protocol may receive data from and return data to host computer 26 by emulating a different storage protocol.
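One way a virtual switch could mask a capability such as power management is to unlink it from the capability list in the configuration space it presents to the host. In a type 0 configuration header the list starts at the Capabilities Pointer (offset 0x34); each entry begins with a capability ID byte followed by a next-pointer byte; and the PCI Power Management capability has ID 0x01. The Python sketch below operates on a 256-byte copy of configuration space. It is an assumption about how masking might be done, not a description of switching logic 72, and it does not update the Capabilities List bit in the Status register.

```python
from __future__ import annotations

CAP_PTR = 0x34       # Capabilities Pointer in a type 0 header
PM_CAP_ID = 0x01     # PCI Power Management
PCIE_CAP_ID = 0x10   # PCI Express capability
MSIX_CAP_ID = 0x11   # MSI-X


def capability_ids(cfg: bytes) -> list[int]:
    """Walk the capability list and return the IDs in order."""
    ids, seen = [], set()
    ptr = cfg[CAP_PTR] & 0xFC          # low two bits are reserved
    while ptr:
        if ptr in seen:
            raise ValueError("capability list loops")
        seen.add(ptr)
        ids.append(cfg[ptr])
        ptr = cfg[ptr + 1] & 0xFC
    return ids


def mask_capability(cfg: bytes, cap_id: int) -> bytearray:
    """Return a copy of cfg with every `cap_id` entry unlinked from the list."""
    view = bytearray(cfg)
    link = CAP_PTR                     # byte that points at the current entry
    ptr = view[CAP_PTR] & 0xFC
    seen = set()
    while ptr:
        if ptr in seen:
            raise ValueError("capability list loops")
        seen.add(ptr)
        nxt = view[ptr + 1] & 0xFC
        if view[ptr] == cap_id:
            view[link] = nxt           # bypass this entry
        else:
            link = ptr + 1             # keep entry; its next pointer becomes the link
        ptr = nxt
    return view


# Hypothetical list: PM (0x40) -> PCIe (0x50) -> MSI-X (0x60)
cfg = bytearray(256)
cfg[CAP_PTR] = 0x40
cfg[0x40:0x42] = bytes([PM_CAP_ID, 0x50])
cfg[0x50:0x52] = bytes([PCIE_CAP_ID, 0x60])
cfg[0x60:0x62] = bytes([MSIX_CAP_ID, 0x00])
host_view = mask_capability(cfg, PM_CAP_ID)
print([hex(i) for i in capability_ids(host_view)])  # ['0x10', '0x11']
```

Working on a copy leaves the DPU's own view of the endpoint untouched, so CPU 50 can still use the masked capability locally.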


After the tunneled version of a given function (PF or VF) on bus 56 has been enumerated and configured by host computer 26, via RC 32, on bus 34, CPU 50 issues an “Engage Device” command instructing switching logic 72 to begin tunneling transactions between host computer 26 and the corresponding EP on bus 56. As part of this process, CPU 50 builds translation tables in the memory of switching logic 72, enabling the switching logic to translate bus commands and responses on the fly between the address spaces of buses 34 and 56. Other tables, such as the read context tables in the examples below, are built ad hoc per transaction. Simplified examples of these tables and their use are presented below.



FIG. 2 is a block diagram that schematically illustrates a method for interconnection of functions of a peripheral device on bus 56 of DPU 22 to corresponding functions on bus 34 of server 24, in accordance with an embodiment of the invention. In this example, the peripheral device serving as the physical endpoint (EP) on bus 56 is assumed to be SSD 70; but the principles of this and the subsequent examples are similarly applicable to other types of peripheral devices.


In the pictured example, SSD 70 exposes a PF 84 and multiple VFs 86 via DSP 62 on bus 56 of DPU 22. All these functions are exposed fully to server 24, which configures a corresponding PF 80 and VFs 82 on bus 34 via DSP 40. In the course of the enumeration, configuration, and “Engage Device” processes described above, CPU 50 of DPU 22 configures translation tables 90 in switching logic 72 for the respective BARs of PF 80 and VFs 82. (Alternatively, when there is a fixed stride between multiple VFs within a single BAR, as in SR-IOV configurations, a single translation table may be used for all these VFs in each direction of communication.) Translation tables 90 enable switching logic 72 to implement tunnels 88 between PFs 80 and 84 and between VFs 82 and 86. Thus, server 24 is able to control and interact with SSD 70 as though the SSD were physically attached to DSP 40.
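The single-table variant works because both buses preserve the same per-VF spacing: one base, limit, and offset cover every VF, and the VF number follows from the arithmetic. A minimal Python sketch, with a hypothetical class name and addresses:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StridedVfTranslation:
    """One translation entry covering an array of equally sized VF apertures."""
    host_base: int    # start of the VF aperture array on bus 34
    dpu_base: int     # start of the same array on bus 56
    vf_bar_size: int  # per-VF aperture size, i.e. the stride
    num_vfs: int

    def translate(self, host_addr: int) -> tuple[int, int]:
        """Map a host address to (VF number, DPU-side address)."""
        offset = host_addr - self.host_base
        if not 0 <= offset < self.vf_bar_size * self.num_vfs:
            raise ValueError(f"{host_addr:#x} is outside the VF aperture array")
        vf_number = offset // self.vf_bar_size + 1  # VFs are numbered from 1
        return vf_number, self.dpu_base + offset


entry = StridedVfTranslation(host_base=0x300000, dpu_base=0x500000,
                             vf_bar_size=0x1000, num_vfs=4)
vf, addr = entry.translate(0x302010)
print(vf, hex(addr))  # 3 0x502010
```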



FIG. 3 is a block diagram that schematically illustrates a method for interconnection of functions of a peripheral device on bus 56 of DPU 22 to corresponding functions on bus 34 of server 24, in accordance with another embodiment of the invention. In this case, PF 84 on bus 56 of DPU 22 is not exposed to server 24, so that DPU 22 maintains control of the actual PF 84 of the endpoint. Switching logic 72 exposes VFs 86 as though they were individual PFs of different physical endpoints. Server 24 configures corresponding PFs 80 on bus 34, connected by tunnels 88 to VFs 86.



FIG. 4 is a block diagram that schematically illustrates a method for interconnection of functions of a peripheral device on bus 56 of DPU 22 to corresponding functions on bus 34 of server 24, in accordance with yet another embodiment of the invention. In this case, too, PF 84 on bus 56 of DPU 22 is not exposed to server 24, and VFs 86 of the physical endpoint on bus 56 are connected by tunnels 88 to corresponding VFs 82 on bus 34. Since PCIe convention requires that a PF be present on the bus for each connected device, VFs 82 are accompanied by a dummy PF 92, which is not tunneled to any active function on bus 56.
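The three configurations of FIGS. 2, 3 and 4 differ only in which functions the host enumerates and where each one is tunneled. The sketch below models that host-side view for an endpoint with one PF and a number of VFs; the mode strings and types are hypothetical labels, not terms from the description.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostFunction:
    kind: str                   # "PF" or "VF", as enumerated on bus 34
    tunneled_to: Optional[str]  # function on bus 56, or None for a dummy PF


def host_view(mode: str, num_vfs: int) -> list[HostFunction]:
    ep_vfs = [f"VF{n}" for n in range(1, num_vfs + 1)]
    if mode == "fig2":  # PF and all VFs tunneled as-is
        return [HostFunction("PF", "PF")] + [HostFunction("VF", v) for v in ep_vfs]
    if mode == "fig3":  # DPU keeps the PF; each VF appears as a separate PF
        return [HostFunction("PF", v) for v in ep_vfs]
    if mode == "fig4":  # DPU keeps the PF; a dummy PF accompanies the tunneled VFs
        return [HostFunction("PF", None)] + [HostFunction("VF", v) for v in ep_vfs]
    raise ValueError(f"unknown mode: {mode}")


for mode in ("fig2", "fig3", "fig4"):
    print(mode, [(f.kind, f.tunneled_to) for f in host_view(mode, 2)])
```

In the "fig3" and "fig4" views the endpoint's own PF never appears, which corresponds to DPU 22 retaining control of PF 84.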


The following tables are simplified examples of translation tables 90 used by switching logic 72 in translating PCIe transaction layer packets (TLPs) received by DPU 22 from host computer 26 for transmission on bus 56 (Table I) and TLPs sent from DPU 22 to host computer 26 for transmission on bus 34 (Table II). The tables are followed by several examples illustrating typical bus transactions carried out through switching logic 72 using the tables. In these examples, the tables are assumed to be used for accessing bus addresses in the BAR 0x100000-0x101000 on bus 34.









TABLE I
ADDRESS TRANSLATION - HOST TO PHYSICAL EP

    • region_io
      Description: Set if the tunnel represents IO BAR, cleared if it represents Memory BAR.
      Value: 0
      Configuration: Communicated by DPU SW during “Engage Device” command.

    • region_start_addr[63:12]
      Description: Stores the BAR that was assigned by the DPU to the EP. When Host initiates Memory Request to the DPU Peripheral Device, the Address field in TLP will be translated using this field as base.
      Value: 0x200000
      Configuration: Communicated by DPU SW during “Engage Device” command.

    • emu_space_reqid[15:0]
      Description: Stores the BDF of the root port in DPU domain. When Host initiates Memory Request to the DPU Peripheral Device, the ‘RequesterID’ field in TLP will be replaced by this field.
      Value: 02:00.0
      Configuration: Known during DPU enumeration.

    • func_bar_log_size[5:0]
      Description: Reflects the size of the BAR.
      Value: 1 Kb
      Configuration: Communicated by DPU SW during “Engage Device” command.

    • num_of_func[1:0]
      Description: Reflects the number of Functions in DPU domain that are engaged to the Host.
      Value: 1
      Configuration: Communicated by DPU SW during “Engage Device” command.
















TABLE II
ADDRESS TRANSLATION - PHYSICAL EP TO HOST

    • host_pid[2:0], host_lid[2:0]
      Description: The USP to which the Host is connected.
      Value: Host#1
      Configuration: Communicated by DPU SW during “Engage Device” command.

    • dest_bdf[15:0] (xlated_bdf)
      Description: Holds the BDF of the root port in Host domain. When Physical EP initiates Message-Routed-by-BDF, the ‘BDF’ field in TLP will be replaced by this field.
      Value: 02:00.0
      Configuration: Known during Host enumeration.

    • emu_requested_id[15:0] (xlated_requester_id)
      Description: Holds the BDF of the DPU Peripheral Function in the Host domain. When Physical EP initiates transaction, the ‘RequesterID’ field in TLP will be replaced by this field.
      Value: 01:00.0
      Configuration: Communicated by DPU SW during “Engage Device” command.

    • num_of_func[15:0]
      Description: Reflects the number of Functions in DPU domain that are engaged to the Host.
      Value: 1
      Configuration: Communicated by DPU SW during “Engage Device” command.









Example I—Host to EP Write

Original TLP on Bus 34

    • Type: WRITE
    • BAR address: 0x100004
    • RequesterID: 11:00.0 (Host #1 CPU)
    • Data: 0xDEADBEAF

The BAR address is translated using the base address of the range (0x100000) and the region_start_addr field from Table I. The Requester ID is translated using the BDF provided by the emu_space_reqid field in Table I. The resulting translated TLP is as follows:

Translated TLP on Bus 56

    • Type: WRITE
    • BAR address: 0x100004−0x100000+0x200000=0x200004
    • RequesterID: 12:00.0
    • Data: 0xDEADBEAF

(In this example, the value “12:00.0” is globally configured as the Requester ID of root complex 54.)
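The two substitutions in Example I, rebasing the address and replacing the Requester ID, can be expressed as a small pure function. This is a simplified Python model rather than the hardware data path: the `HostToEpEntry` fields loosely mirror Table I (storing a full address instead of bits 63:12), the 4 KB BAR size is taken from the 0x100000-0x101000 range stated above, and all type names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MemRequestTlp:
    kind: str                   # "WRITE" or "READ"
    address: int
    requester_id: str
    data: Optional[int] = None  # payload for writes only


@dataclass(frozen=True)
class HostToEpEntry:
    """Simplified stand-in for one Table I entry plus the host-side BAR."""
    host_bar_base: int       # BAR assigned by host computer 26 on bus 34
    bar_size: int            # size of that BAR in bytes
    region_start_addr: int   # BAR of the EP on bus 56
    dpu_requester_id: str    # Requester ID substituted on bus 56


def translate_host_to_ep(tlp: MemRequestTlp, entry: HostToEpEntry) -> MemRequestTlp:
    offset = tlp.address - entry.host_bar_base
    if not 0 <= offset < entry.bar_size:
        raise ValueError(f"{tlp.address:#x} is not covered by this tunnel")
    # New address = original - host base + EP base; Requester ID replaced
    return replace(tlp,
                   address=entry.region_start_addr + offset,
                   requester_id=entry.dpu_requester_id)


entry = HostToEpEntry(0x100000, 0x1000, 0x200000, "12:00.0")
out = translate_host_to_ep(MemRequestTlp("WRITE", 0x100004, "11:00.0", 0xDEADBEAF), entry)
print(out.kind, hex(out.address), out.requester_id)  # WRITE 0x200004 12:00.0
```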





Example II—Host to EP Read

In response to a read request from host computer 26, switching logic 72 creates a temporary context table (shown below as Table III), which it then uses in completing the read transaction by sending data to the host computer.


Original TLP on Bus 34

    • Type: READ
    • BAR address: 0x100004
    • RequesterID: 11:00.0
    • Tag=0xA

Translated TLP on Bus 56

    • Type: READ
    • BAR address: 0x100004−0x100000+0x200000=0x200004
    • RequesterID: 12:00.0
    • Tag=0xB (read from a pool of free tags)

TABLE III
READ CONTEXT TABLE

    • Original tag: 0xA
    • Original Req ID: 11:00.0
    • Orig. Completer ID: 01:00.0










In response to the translated read TLP, SSD 70 will return a completion TLP over bus 56 with Requester ID 12:00.0 and Tag 0xB. Switching logic 72 uses Tables II and III in generating a translated TLP for transmission over bus 34:


Original TLP on Bus 56

    • Type: Completion
    • RequesterID: 12:00.0
    • CompleterID: 02:00.0
    • Tag=0xB
    • Data=0xDEADBEAF

Translated TLP on Bus 34

    • Type: Completion
    • RequesterID: 11:00.0
    • CompleterID: 01:00.0
    • Tag=0xA
    • Data=0xDEADBEAF





After completion of the transaction, the context table is deleted.
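The context-table mechanism of Example II can be modeled as a map from the tag used on bus 56 back to the original tag and IDs. The Python sketch below is a simplified illustration under stated assumptions: the class and method names are hypothetical, each read is assumed to return a single completion (a real read may be split into several completions), and the tag pool is seeded so that the first allocated tag is 0xB, as in the example.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ReadContext:
    """Mirrors the fields of Table III."""
    original_tag: int
    original_requester_id: str
    original_completer_id: str


@dataclass(frozen=True)
class Completion:
    requester_id: str
    completer_id: str
    tag: int
    data: int


class ReadTunnel:
    def __init__(self, free_tags: Iterable[int], dpu_requester_id: str,
                 host_completer_id: str) -> None:
        self._free_tags = deque(free_tags)
        self._dpu_requester_id = dpu_requester_id    # e.g. root complex 54
        self._host_completer_id = host_completer_id  # function ID seen by the host
        self._contexts: dict[int, ReadContext] = {}

    def forward_read(self, tag: int, requester_id: str) -> tuple[int, str]:
        """Allocate a local tag, save a context, return (new tag, new Requester ID)."""
        if not self._free_tags:
            raise RuntimeError("no free tags; the read must be held back")
        new_tag = self._free_tags.popleft()
        self._contexts[new_tag] = ReadContext(tag, requester_id, self._host_completer_id)
        return new_tag, self._dpu_requester_id

    def complete(self, tag: int, data: int) -> Completion:
        """Translate a completion from the EP and delete its context."""
        ctx = self._contexts.pop(tag)   # KeyError if no read is outstanding
        self._free_tags.append(tag)     # tag becomes reusable
        return Completion(ctx.original_requester_id, ctx.original_completer_id,
                          ctx.original_tag, data)

    def outstanding(self) -> int:
        return len(self._contexts)


tunnel = ReadTunnel([0xB, 0xC, 0xD], dpu_requester_id="12:00.0", host_completer_id="01:00.0")
new_tag, new_rid = tunnel.forward_read(0xA, "11:00.0")
print(hex(new_tag), new_rid)  # 0xb 12:00.0
print(tunnel.complete(new_tag, 0xDEADBEAF))
```

The same structure, with the roles of the two buses swapped, fits Example IV and Table IV.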


Example III—EP to Host Write

Switching logic 72 performs the following translation using the BDF provided by the emu_requested_id field in Table II:


Original TLP on Bus 56

    • Type: WRITE
    • RequesterID: 02:00.0
    • Address=0x80000

Translated TLP on Bus 34

    • Type: WRITE
    • RequesterID: 01:00.0
    • Address=0x80000
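For requests that the EP initiates, Example III shows only the Requester ID changing while the address passes through unchanged, which suggests that the EP's DMA addresses are already expressed in the host's address space. The Python sketch below models the Table II substitutions, including the dest_bdf replacement for messages routed by BDF; the type names and the `kind` strings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EpTlp:
    kind: str                       # "WRITE", "READ", or "MSG_BY_ID" (hypothetical labels)
    requester_id: str
    address: Optional[int] = None   # memory requests only
    dest_bdf: Optional[str] = None  # messages routed by BDF only


@dataclass(frozen=True)
class EpToHostEntry:
    """Simplified stand-in for one Table II entry."""
    host_port: str             # host_pid / host_lid: which USP leads to the host
    xlated_requester_id: str   # emu_requested_id
    xlated_bdf: str            # dest_bdf


def translate_ep_to_host(tlp: EpTlp, entry: EpToHostEntry) -> tuple[str, EpTlp]:
    """Return (egress port, translated TLP); addresses pass through unchanged."""
    out = replace(tlp, requester_id=entry.xlated_requester_id)
    if tlp.kind == "MSG_BY_ID":
        out = replace(out, dest_bdf=entry.xlated_bdf)
    return entry.host_port, out


entry = EpToHostEntry("Host#1", xlated_requester_id="01:00.0", xlated_bdf="02:00.0")
port, out = translate_ep_to_host(EpTlp("WRITE", "02:00.0", address=0x80000), entry)
print(port, out.requester_id, hex(out.address))  # Host#1 01:00.0 0x80000
```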





Example IV—EP to Host Read

In response to a read request from SSD 70, switching logic 72 creates a temporary context table (shown below as Table IV), which it then uses in completing the read transaction by conveying data from host computer 26 to SSD 70.


Original TLP on Bus 56

    • Type: READ
    • RequesterID: 02:00.0
    • Address: 0x80000
    • Tag=0xA

Translated TLP on Bus 34

    • Type: READ
    • RequesterID: 01:00.0
    • Address: 0x80000
    • Tag=0xB

TABLE IV
READ CONTEXT TABLE

    • Original tag: 0xA
    • Original Req ID: 02:00.0
    • Orig. Completer ID: 12:00.0










In response to the translated read TLP, host computer 26 will return a completion TLP over bus 34 with Requester ID 01:00.0 and Tag 0xB. Switching logic 72 uses Tables I and IV in generating a translated TLP for transmission over bus 56:


Original TLP on Bus 34

    • Type: Completion
    • RequesterID: 01:00.0
    • CompleterID: 11:00.0
    • Tag=0xB
    • Data=0xDEADBEAF

Translated TLP on Bus 56

    • Type: Completion
    • RequesterID: 02:00.0
    • CompleterID: 12:00.0
    • Tag=0xA
    • Data=0xDEADBEAF





After completion of the transaction, the context table is deleted.


It will be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and subcombinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.

Claims
  • 1. Computing apparatus, comprising: a central processing unit (CPU); a root complex connected to the CPU and to a first peripheral component bus, which has a first address space and has at least a first downstream port for connection to at least one peripheral device, wherein the root complex is to present the at least one peripheral device to the CPU in the first address space; and switching logic, which has an upstream port for connection to a second downstream port on a second peripheral component bus of a host computer, the second peripheral component bus having a second address space, wherein the switching logic is connected to the root complex so that when a peripheral device is connected to the first downstream port on the first peripheral component bus, the switching logic presents the peripheral device to the host computer in the second address space of the second peripheral component bus.
  • 2. The apparatus according to claim 1, wherein the switching logic comprises a virtual switch.
  • 3. The apparatus according to claim 1, wherein the switching logic is to present the peripheral device as a physical function in the address space of the second peripheral component bus.
  • 4. The apparatus according to claim 3, wherein the peripheral device is to expose a virtual function on the first peripheral component bus, and the switching logic is to present the virtual function as the physical function in the address space of the second peripheral component bus.
  • 5. The apparatus according to claim 1, wherein the switching logic is to present the peripheral device as a virtual function in the address space of the second peripheral component bus.
  • 6. The apparatus according to claim 1, wherein the peripheral device comprises a data storage device.
  • 7. The apparatus according to claim 1, wherein the peripheral device comprises a network interface controller (NIC).
  • 8. The apparatus according to claim 1, wherein the switching logic is to receive a bus transaction via the upstream port referencing the second address space of the second peripheral component bus and directed to the peripheral device, to translate the bus command to the first address space, and to transmit the translated bus command over the first peripheral component bus.
  • 9. The apparatus according to claim 1, wherein the switching logic is to reserve a segment of the address space for a dummy device, to enable hot-plugging of a further peripheral device in the reserved segment.
  • 10. A method for computing, comprising: providing a peripheral device server comprising a central processing unit (CPU) and a root complex connected to a first peripheral component bus having a first address space; connecting a peripheral device to a first downstream port on the first peripheral component bus, wherein the root complex presents the peripheral device to the CPU in the first address space; connecting an upstream port of the peripheral device server to a second downstream port on a second peripheral component bus of a host computer, the second peripheral component bus having a second address space; and using switching logic in the peripheral device server coupled between the upstream port and the first downstream port, presenting the peripheral device to the host computer in the second address space of the second peripheral component bus.
  • 11. The method according to claim 10, wherein the switching logic comprises a virtual switch coupled between the upstream port and the first downstream port.
  • 12. The method according to claim 10, wherein presenting the peripheral device comprises exposing the peripheral device as a physical function in the address space of the second peripheral component bus.
  • 13. The method according to claim 12, wherein coupling the peripheral device comprises exposing a virtual function of the peripheral device on the first peripheral component bus, and wherein exposing the peripheral device comprises presenting the virtual function as the physical function in the address space of the second peripheral component bus.
  • 14. The method according to claim 10, wherein presenting the peripheral device comprises exposing the peripheral device as a virtual function in the address space of the second peripheral component bus.
  • 15. The method according to claim 10, wherein the peripheral device comprises a data storage device.
  • 16. The method according to claim 10, wherein the peripheral device comprises a network interface controller (NIC).
  • 17. The method according to claim 10, and comprising receiving, by the switching logic, a bus transaction via the upstream port referencing the second address space of the second peripheral component bus and directed to the peripheral device, translating the bus command to the first address space, and transmitting the translated bus command over the first peripheral component bus.
  • 18. The method according to claim 10, and comprising reserving a segment of the address space for a dummy device, to enable hot-plugging of a further peripheral device in the reserved segment.
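Claims 17 and 18 describe two address-space mechanisms in the switching logic. The first translates a transaction that arrives on the upstream port, addressed in the host's (second) address space, into the server's (first) address space. The second reserves a host address segment for a dummy device so that a peripheral can later be hot-plugged into it. The following is a minimal software sketch of both ideas. It assumes simple base-plus-offset windows and models only memory writes. The names `Window`, `MemWrite`, and `SwitchingLogic` are hypothetical, and real PCIe TLP formats, configuration space, and requester-ID handling are deliberately left out. It is an illustration, not the patented implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Window:
    """Hypothetical BAR-like window exposed on the host (second) bus."""
    name: str
    host_base: int                    # start of the window in the host address space
    size: int                         # window length in bytes
    local_base: Optional[int] = None  # start in the server (first) address space; None = reserved dummy slot

    def contains(self, host_addr: int) -> bool:
        return self.host_base <= host_addr < self.host_base + self.size


@dataclass
class MemWrite:
    """Simplified memory-write transaction: target address plus payload."""
    addr: int
    data: bytes


class SwitchingLogic:
    """Hypothetical model of the switching logic's address translation."""

    def __init__(self) -> None:
        self.windows: List[Window] = []

    def expose(self, name: str, host_base: int, size: int, local_base: Optional[int]) -> None:
        # Refuse windows that overlap an existing one in the host address space.
        for w in self.windows:
            if host_base < w.host_base + w.size and w.host_base < host_base + size:
                raise ValueError(f"window {name} overlaps {w.name}")
        self.windows.append(Window(name, host_base, size, local_base))

    def reserve_dummy(self, name: str, host_base: int, size: int) -> None:
        # Claim a host range now (the "dummy device"), with no local mapping yet.
        self.expose(name, host_base, size, local_base=None)

    def hot_plug(self, name: str, local_base: int) -> None:
        # Bind a newly attached device's local address range to a reserved window.
        for w in self.windows:
            if w.name == name and w.local_base is None:
                w.local_base = local_base
                return
        raise KeyError(f"no free reserved window named {name}")

    def forward(self, txn: MemWrite) -> MemWrite:
        # Translate a host-side transaction into the local address space.
        for w in self.windows:
            if w.contains(txn.addr):
                if w.local_base is None:
                    raise LookupError(f"{w.name} is a reserved slot with no device")
                if txn.addr + len(txn.data) > w.host_base + w.size:
                    raise ValueError("transaction crosses window boundary")
                offset = txn.addr - w.host_base
                return MemWrite(w.local_base + offset, txn.data)
        raise LookupError(f"address {txn.addr:#x} not claimed by any window")


if __name__ == "__main__":
    sw = SwitchingLogic()
    sw.expose("nvme0", host_base=0x9000_0000, size=0x4000, local_base=0x2_0000_0000)
    sw.reserve_dummy("slot1", host_base=0x9000_4000, size=0x4000)
    print(hex(sw.forward(MemWrite(0x9000_0010, b"\x01\x02")).addr))  # 0x200000010
    sw.hot_plug("slot1", local_base=0x3_0000_0000)
    print(hex(sw.forward(MemWrite(0x9000_4008, b"\xff")).addr))      # 0x300000008
```

Reserving the dummy window up front means the host already owns the address range when a device is later hot-plugged. In this sketch, only the local mapping changes at plug time, so the host's view of the bus does not need to be rebuilt.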
US Referenced Citations (225)
Number Name Date Kind
5003465 Chisholm et al. Mar 1991 A
5463772 Thompson et al. Oct 1995 A
5615404 Knoll et al. Mar 1997 A
5768612 Nelson Jun 1998 A
5864876 Rossum et al. Jan 1999 A
5893166 Frank et al. Apr 1999 A
5954802 Griffith Sep 1999 A
6070219 McAlpine et al. May 2000 A
6226680 Boucher et al. May 2001 B1
6321276 Forin Nov 2001 B1
6581130 Brinkmann et al. Jun 2003 B1
6701405 Adusumilli et al. Mar 2004 B1
6766467 Neal et al. Jul 2004 B1
6789143 Craddock et al. Sep 2004 B2
6901496 Mukund et al. May 2005 B1
6981027 Gallo et al. Dec 2005 B1
7171484 Krause et al. Jan 2007 B1
7225277 Johns et al. May 2007 B2
7263103 Kagan et al. Aug 2007 B2
7299266 Boyd et al. Nov 2007 B2
7395364 Higuchi et al. Jul 2008 B2
7464198 Martinez et al. Dec 2008 B2
7475398 Nunoe Jan 2009 B2
7502884 Shah Mar 2009 B1
7548999 Haertel et al. Jun 2009 B2
7577773 Gandhi et al. Aug 2009 B1
7657659 Lambeth et al. Feb 2010 B1
7720064 Rohde May 2010 B1
7752417 Manczak et al. Jul 2010 B2
7809923 Hummel et al. Oct 2010 B2
7921178 Haviv Apr 2011 B2
7921237 Holland et al. Apr 2011 B1
7945752 Miller et al. May 2011 B1
8001592 Hatakeyama Aug 2011 B2
8006297 Johnson et al. Aug 2011 B2
8010763 Armstrong et al. Aug 2011 B2
8051212 Kagan et al. Nov 2011 B2
8103785 Crowley et al. Jan 2012 B2
8255475 Kagan et al. Aug 2012 B2
8260980 Weber et al. Sep 2012 B2
8346919 Eiriksson et al. Jan 2013 B1
8447904 Riddoch May 2013 B2
8504780 Mine et al. Aug 2013 B2
8645663 Kagan et al. Feb 2014 B2
8745276 Bloch et al. Jun 2014 B2
8751701 Shahar et al. Jun 2014 B2
8824492 Wang et al. Sep 2014 B2
8892804 Morein Nov 2014 B2
8949486 Kagan et al. Feb 2015 B1
9038073 Kohlenz et al. May 2015 B2
9092426 Bathija et al. Jul 2015 B1
9298723 Vincent Mar 2016 B1
9331963 Krishnamurthi May 2016 B2
9483290 Mantri Nov 2016 B1
9678818 Raikin et al. Jun 2017 B2
9696942 Kagan et al. Jul 2017 B2
9727503 Kagan et al. Aug 2017 B2
9830082 Srinivasan et al. Nov 2017 B1
9904568 Vincent et al. Feb 2018 B2
10078613 Ramey Sep 2018 B1
10120832 Raindel et al. Nov 2018 B2
10135739 Raindel et al. Nov 2018 B2
10152441 Liss et al. Dec 2018 B2
10162793 Bshara et al. Dec 2018 B1
10210125 Burstein Feb 2019 B2
10218645 Raindel et al. Feb 2019 B2
10423774 Zelenov et al. Apr 2019 B1
10382350 Bohrer et al. Aug 2019 B2
10417156 Hsu Sep 2019 B2
10628622 Sivaraman et al. Apr 2020 B1
10657077 Ganor et al. May 2020 B2
10671309 Glynn Jun 2020 B1
10684973 Connor et al. Jun 2020 B2
10715451 Raindel et al. Jul 2020 B2
10824469 Hirshberg et al. Nov 2020 B2
10841243 Levi et al. Nov 2020 B2
10999364 Itigin et al. May 2021 B1
11003607 Ganor et al. May 2021 B2
11080225 Borikar Aug 2021 B2
11086713 Sapuntzakis et al. Aug 2021 B1
11126575 Aslanidis et al. Sep 2021 B1
11537548 Makhija Dec 2022 B2
11550745 Kelm Jan 2023 B1
20020152327 Kagan et al. Oct 2002 A1
20030023846 Krishna et al. Jan 2003 A1
20030046530 Poznanovic Mar 2003 A1
20030120836 Gordon Jun 2003 A1
20040010612 Pandya Jan 2004 A1
20040039940 Cox et al. Feb 2004 A1
20040057434 Poon et al. Mar 2004 A1
20040158710 Buer et al. Aug 2004 A1
20040221128 Beecroft et al. Nov 2004 A1
20040230979 Beecroft et al. Nov 2004 A1
20050102497 Buer May 2005 A1
20050198412 Pedersen et al. Sep 2005 A1
20050216552 Fineberg et al. Sep 2005 A1
20060095754 Hyder et al. May 2006 A1
20060104308 Pinkerton et al. May 2006 A1
20060259291 Dunham et al. Nov 2006 A1
20060259661 Feng et al. Nov 2006 A1
20070011429 Sangili et al. Jan 2007 A1
20070061492 Van Riel Mar 2007 A1
20070223472 Tachibana et al. Sep 2007 A1
20070226450 Engbersen et al. Sep 2007 A1
20070283124 Menczak et al. Dec 2007 A1
20070297453 Niinomi Dec 2007 A1
20080005387 Mutaguchi Jan 2008 A1
20080092148 Moertl Apr 2008 A1
20080147822 Benhase et al. Jun 2008 A1
20080147904 Freimuth et al. Jun 2008 A1
20080168479 Purtell et al. Jul 2008 A1
20080313364 Flynn et al. Dec 2008 A1
20090086736 Foong et al. Apr 2009 A1
20090106771 Benner et al. Apr 2009 A1
20090204650 Wong et al. Aug 2009 A1
20090319775 Buer et al. Dec 2009 A1
20090328170 Williams et al. Dec 2009 A1
20100030975 Murray et al. Feb 2010 A1
20100095053 Bruce et al. Apr 2010 A1
20100095085 Hummel et al. Apr 2010 A1
20100211834 Asnaashari et al. Aug 2010 A1
20100217916 Gao et al. Aug 2010 A1
20100228962 Simon et al. Sep 2010 A1
20100322265 Gopinath et al. Dec 2010 A1
20110023027 Kegel et al. Jan 2011 A1
20110119673 Bloch et al. May 2011 A1
20110213854 Haviv Sep 2011 A1
20110246597 Swanson et al. Oct 2011 A1
20120314709 Post et al. Dec 2012 A1
20130067193 Kagan et al. Mar 2013 A1
20130080651 Pope et al. Mar 2013 A1
20130103777 Kagan et al. Apr 2013 A1
20130125125 Karino et al. May 2013 A1
20130142205 Munoz Jun 2013 A1
20130145035 Pope et al. Jun 2013 A1
20130159568 Shahar et al. Jun 2013 A1
20130263247 Jungck et al. Oct 2013 A1
20130276133 Hodges et al. Oct 2013 A1
20130311746 Raindel et al. Nov 2013 A1
20130325998 Hormuth et al. Dec 2013 A1
20130329557 Petry Dec 2013 A1
20130347110 Dalal Dec 2013 A1
20140089450 Raindel et al. Mar 2014 A1
20140089451 Eran et al. Mar 2014 A1
20140089631 King Mar 2014 A1
20140095753 Crupnicoff Apr 2014 A1
20140122828 Kagan et al. May 2014 A1
20140129741 Shahar et al. May 2014 A1
20140156894 Tsirkin et al. Jun 2014 A1
20140181365 Fanning et al. Jun 2014 A1
20140185616 Bloch et al. Jul 2014 A1
20140244965 Manula Aug 2014 A1
20140254593 Mital et al. Sep 2014 A1
20140282050 Quinn et al. Sep 2014 A1
20140282561 Holt et al. Sep 2014 A1
20150006663 Huang Jan 2015 A1
20150012735 Tamir et al. Jan 2015 A1
20150032835 Sharp et al. Jan 2015 A1
20150081947 Vucinic et al. Mar 2015 A1
20150100962 Morita et al. Apr 2015 A1
20150288624 Raindel et al. Oct 2015 A1
20150319243 Hussain et al. Nov 2015 A1
20150347185 Holt et al. Dec 2015 A1
20150355938 Jokinen et al. Dec 2015 A1
20160065659 Bloch et al. Mar 2016 A1
20160085718 Huang Mar 2016 A1
20160132329 Gupte et al. May 2016 A1
20160154673 Morris Jun 2016 A1
20160226822 Zhang et al. Aug 2016 A1
20160342547 Liss et al. Nov 2016 A1
20160350151 Zou et al. Dec 2016 A1
20160378529 Wen Dec 2016 A1
20170017609 Menachem Jan 2017 A1
20170031810 Bonzini Feb 2017 A1
20170075855 Sajeepa et al. Mar 2017 A1
20170104828 Brown et al. Apr 2017 A1
20170180273 Daly et al. Jun 2017 A1
20170187629 Shalev et al. Jun 2017 A1
20170237672 Dalal Aug 2017 A1
20170264622 Cooper et al. Sep 2017 A1
20170286157 Hasting et al. Oct 2017 A1
20170371835 Ranadive et al. Dec 2017 A1
20180004954 Liguori et al. Jan 2018 A1
20180067893 Raindel et al. Mar 2018 A1
20180109471 Chang et al. Apr 2018 A1
20180114013 Sood et al. Apr 2018 A1
20180167364 Dong et al. Jun 2018 A1
20180210751 Pepus et al. Jul 2018 A1
20180219770 Wu et al. Aug 2018 A1
20180219772 Koster et al. Aug 2018 A1
20180246768 Palermo et al. Aug 2018 A1
20180262468 Kumar et al. Sep 2018 A1
20180285288 Bernat et al. Oct 2018 A1
20180329828 Apfelbaum et al. Nov 2018 A1
20190012350 Sindhu et al. Jan 2019 A1
20190026157 Suzuki et al. Jan 2019 A1
20190116127 Pismenny et al. Apr 2019 A1
20190124113 Labana et al. Apr 2019 A1
20190163364 Gibb et al. May 2019 A1
20190173846 Patterson et al. Jun 2019 A1
20190190892 Menachem et al. Jun 2019 A1
20190199690 Klein Jun 2019 A1
20190243781 Thyamagondlu et al. Aug 2019 A1
20190250938 Claes et al. Aug 2019 A1
20200012604 Agarwal Jan 2020 A1
20200026656 Liao et al. Jan 2020 A1
20200065269 Balasubramani et al. Feb 2020 A1
20200259803 Menachem et al. Aug 2020 A1
20200314181 Eran et al. Oct 2020 A1
20200401440 Sankaran et al. Dec 2020 A1
20210042255 Colenbrander Feb 2021 A1
20210111996 Pismenny et al. Apr 2021 A1
20210133140 Jeansonne May 2021 A1
20210203610 Pismenny et al. Jul 2021 A1
20210209052 Chen et al. Jul 2021 A1
20220075747 Shuler et al. Mar 2022 A1
20220092135 Sidman Mar 2022 A1
20220100687 Sahin et al. Mar 2022 A1
20220103629 Cherian et al. Mar 2022 A1
20220283964 Burstein et al. Sep 2022 A1
20220308764 Pismenny et al. Sep 2022 A1
20220309019 Duer et al. Sep 2022 A1
20220334989 Bar-Ilan et al. Oct 2022 A1
20220391341 Rosenbaum et al. Dec 2022 A1
20230010150 Ben-Ishay et al. Jan 2023 A1
Foreign Referenced Citations (3)
Number Date Country
1657878 May 2006 EP
2463782 Jun 2012 EP
2010062679 Jun 2010 WO
Non-Patent Literature Citations (39)
Entry
“Switchtec PAX Gen 4 Advanced Fabric PCIe Switch Family—PM42100, PM42068, PM42052, PM42036, PM42028,” Product Brochure, Microchip Technology Incorporated, pp. 1-2, year 2021.
Regula, “Using Non-Transparent Bridging in PCI Express Systems,” PLX Technology, Inc., pp. 1-31, Jun. 2004.
Marcovitch et al., U.S. Appl. No. 17/987,904, filed Nov. 16, 2022.
Marcovitch, U.S. Appl. No. 17/707,555, filed Mar. 29, 2022.
Marcovitch et al., U.S. Appl. No. 17/979,013, filed Nov. 2, 2022.
Mellanox Technologies, “Understanding On Demand Paging (ODP),” Knowledge Article, pp. 1-6, Feb. 20, 2019 downloaded from https://community.mellanox.com/s/article/understanding-on-demand-paging--odp-x.
U.S. Appl. No. 17/372,466 Office Action dated Feb. 15, 2023.
U.S. Appl. No. 17/527,197 Office Action dated Sep. 28, 2023.
U.S. Appl. No. 17/211,928 Office Action dated May 25, 2023.
U.S. Appl. No. 17/979,013 Office Action dated Jan. 29, 2024.
U.S. Appl. No. 17/987,904 Office Action dated Apr. 11, 2024.
Shirey, “Internet Security Glossary, Version 2”, Request for Comments 4949, pp. 1-365, Aug. 2007.
Information Sciences Institute, “Transmission Control Protocol; DARPA Internet Program Protocol Specification”, Request for Comments 793, pp. 1-90, Sep. 1981.
InfiniBand™ Architecture Specification, vol. 1, Release 1.3, pp. 1-1842, Mar. 3, 2015.
Stevens, “TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms”, Request for Comments 2001, pp. 1-6, Jan. 1997.
Netronome Systems, Inc., “Open vSwitch Offload and Acceleration with Agilio® CX SmartNICs”, White Paper, pp. 1-7, Mar. 2017.
PCI Express® Base Specification , Revision 3.0, pp. 1-860, Nov. 10, 2010.
Dierks et al., “The Transport Layer Security (TLS) Protocol Version 1.2”, Request for Comments 5246, pp. 1-104, Aug. 2008.
Turner et al., “Prohibiting Secure Sockets Layer (SSL) Version 2.0”, Request for Comments 6176, pp. 1-4, Mar. 2011.
Rescorla et al., “The Transport Layer Security (TLS) Protocol Version 1.3”, Request for Comments 8446, pp. 1-160, Aug. 2018.
Comer, “Packet Classification: A Faster, More General Alternative to Demultiplexing”, The Internet Protocol Journal, vol. 15, No. 4, pp. 12-22, Dec. 2012.
Salowey et al., “AES Galois Counter Mode (GCM) Cipher Suites for TLS”, Request for Comments 5288, pp. 1-8, Aug. 2008.
Burstein, “Enabling Remote Persistent Memory”, SNIA-PM Summit, pp. 1-24, Jan. 24, 2019.
Chung et al., “Serving DNNs in Real Time at Datacenter Scale with Project Brainwave”, IEEE Micro Pre-Print, pp. 1-11, Mar. 22, 2018.
Talpey, “Remote Persistent Memory—With Nothing But Net”, SNIA Storage Developer Conference, pp. 1-30, year 2017.
Microsoft, “Project Brainwave”, pp. 1-5, year 2019.
“NVM Express—Base Specifications,” Revision 2.0, pp. 1-452, May 13, 2021.
Pismenny et al., “Autonomous NIC Offloads”, submitted for evaluation of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '21), pp. 1-18, Dec. 13, 2020.
Lebeane et al., “Extended Task queuing: Active Messages for Heterogeneous Systems”, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), pp. 933-944, Nov. 2016.
NVM Express Inc., “NVM Express over Fabrics,” Revision 1.0, pp. 1-49, Jun. 5, 2016.
“Linux kernel enable the IOMMU—input/output memory management unit support”, pp. 1-2, Oct. 15, 2007 downloaded from http://www.cyberciti.biz/tips/howto-turn-on-linux-software-iommu-support.html.
Hummel, “IO Memory Management Hardware Goes Mainstream”, AMD Fellow, Computation Products Group, Microsoft WinHEC, pp. 1-7, 2006.
NVM Express, Revision 1.0e, pp. 1-127, Jan. 23, 2013.
InfiniBand Trade Association, “InfiniBand™ Architecture Specification”, vol. 1, Release 1.2.1, pp. 1-1727, Nov. 2007.
Shah et al., “Direct Data Placement over Reliable Transports”, IETF Network Working Group, RFC 5041, pp. 1-38, Oct. 2007.
Culley et al., “Marker PDU Aligned Framing for TCP Specification”, IETF Network Working Group, RFC 5044, pp. 1-75, Oct. 2007.
“MPI: A Message-Passing Interface Standard”, Version 2.2, Message Passing Interface Forum, pp. 1-64, Sep. 4, 2009.
Welsh et al., “Incorporating Memory Management into User-Level Network Interfaces”, Department of Computer Science, Cornell University, Technical Report TR97-1620, pp. 1-10, Feb. 13, 1997.
Tsirkin et al., “Virtual I/O Device (VIRTIO) Version 1.1”, Committee Specification Draft 01/Public Review Draft 01, Oasis Open, pp. 1-121, Dec. 20, 2018.
Related Publications (1)
Number Date Country
20240143526 A1 May 2024 US