The present invention relates generally to computing systems, and particularly to memory access by peripheral devices.
Computing systems often comprise a peripheral device that is connected to a host via a peripheral bus. Peripheral devices may comprise, for example, network adapters, storage devices, accelerators and Graphics Processing Units (GPUs). Peripheral buses, also referred to as system buses, may comprise, for example, Peripheral Component Interconnect Express (PCIe), Advanced Extensible Interface (AXI), Compute Express Link (CXL), Nvlink or Nvlink Chip-to-Chip (Nvlink-C2C). In many computing systems, after initial memory registration, a peripheral device is capable of accessing memory regions in a memory of the system directly using Direct Memory Access (DMA).
An embodiment of the present invention that is described herein provides a system including a processing device and a peripheral device. The processing device is to assign a memory region in a memory. The peripheral device is to set a memory-access policy responsively to usage characteristics of the memory region, and to access data in the memory region using Direct Memory Access (DMA) in accordance with the memory-access policy.
In an embodiment, the processing device is to provide to the peripheral device context information describing the memory region, and the peripheral device is to access data in the memory region in accordance with the context information. In some embodiments, the memory-access policy specifies a caching policy for caching, in the peripheral device, portions of the data or portions of context information describing the memory region. In some embodiments, the memory-access policy specifies a prefetching policy for prefetching, in the peripheral device, portions of the data or portions of context information describing the memory region.
In example embodiments, the usage characteristics include one or more of: a pattern of addresses that characterizes access to the memory region, an access frequency that characterizes access to the memory region, an access direction that characterizes access to the memory region, a location of the memory region, and whether the memory region is pinned or unpinned.
In some embodiments, the peripheral device is to deduce the usage characteristics by tracking memory-access transactions performed in the memory region. Additionally or alternatively, the processing device is to generate a hint that is indicative of the usage characteristics of the memory region and to provide the hint to the peripheral device, and the peripheral device is to set the memory-access policy responsively to the hint.
In some embodiments, the processing device is to select the hint from a defined set of hints, and the peripheral device is to select the memory-access policy from a defined set of memory-access policies. In an example embodiment, one or both of the processing device and the peripheral device are to adaptively modify one or more of: one or more of the hints, one or more of the memory-access policies, and a mapping between the hints and the memory-access policies.
In a disclosed embodiment, the memory-access policy, and a mapping between the hint and the memory-access policy, are internal to the peripheral device and are not accessible to the processing device. In another embodiment, one or both of the processing device and the peripheral device are to provide an Application Programming Interface (API) for specifying one or more of: the hint, the memory-access policy, and a mapping between the hint and the memory-access policy. In yet another embodiment, the hint is an ad-hoc hint that is valid for a defined time period or for one or more memory-access transactions to be performed in the memory region.
There is additionally provided, in accordance with an embodiment of the present invention, a method including, using a processing device, assigning a memory region in a memory. In a peripheral device, a memory-access policy is set responsively to usage characteristics of the memory region, and data is accessed in the memory region using Direct Memory Access (DMA) in accordance with the memory-access policy.
The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:
In some computing systems, a host or other processing device provides a peripheral device with the capability to access memory regions directly using DMA. For example, a network adapter may read data directly from system memory in order to generate outgoing packets for transmission. Similarly, a network adapter may write data received in incoming packets, directly to the system memory. Other types of peripheral devices, such as storage devices or graphics accelerators, may also write and read data to and from the system memory using DMA, without involving the host in the data transfer.
In an example implementation, to enable the peripheral device to access a memory region directly, the host and the peripheral device carry out a process referred to as “memory registration”. As part of the memory registration process, the host provides the peripheral device with metadata, referred to as a “context”, which describes the memory region. The context may comprise, for example, a mapping between virtual addresses and physical addresses to be used by the peripheral device. Once the memory region has been registered, the host typically issues memory-access commands (e.g., read and write commands) to the peripheral device, but the actual data transfer is carried out directly between the peripheral device and the memory.
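The registration flow above can be illustrated with a minimal sketch. All names here (`Context`, `register_region`, `translate`) are hypothetical and chosen for illustration; a real context would also carry memory keys, permissions and other metadata, and the translation tables would live in device hardware rather than a Python dictionary.

```python
from dataclasses import dataclass, field

PAGE = 4096  # assumed page size for illustration

@dataclass
class Context:
    """Metadata describing a registered memory region (a "context")."""
    region_id: int
    va_base: int                                     # first virtual address of the region
    page_to_pa: dict = field(default_factory=dict)   # page index -> physical page base

    def translate(self, va: int) -> int:
        """Translate a virtual address inside the region to a physical address."""
        offset = va - self.va_base
        return self.page_to_pa[offset // PAGE] + offset % PAGE

def register_region(region_id: int, va_base: int, phys_pages: list) -> Context:
    """Host side of memory registration: build the context handed to the device."""
    return Context(region_id, va_base, dict(enumerate(phys_pages)))
```

Once such a context is handed over, the device resolves every DMA target address locally, without consulting the host on the data path.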
Different memory regions may differ from one another considerably in the way they are used by the host, and therefore in the manner the peripheral device is required to access them. For example, some memory regions may be accessed in a sequential manner, other memory regions may be accessed with some periodic or otherwise predictable pattern, and yet other memory regions may be accessed in a random manner. Some memory regions are mostly read from, whereas other memory regions are mostly written to. Some memory regions are accessed frequently, whereas other memory regions may be accessed rarely, or only once. In the present context, characteristics indicative of the manner in which a memory region is accessed, such as the above examples, are referred to herein as “usage characteristics” of a memory region. Additional examples are discussed further below.
The usage characteristics of memory regions can be valuable to the peripheral device in managing memory access, e.g., in deciding on operations such as caching and prefetching. For example, data caching provides considerable performance improvement in a memory region that is accessed frequently. In a memory region that is read only once, on the other hand, data caching does not improve performance and unnecessarily wastes memory resources. As another example, prefetching of data is highly effective in a memory region that is accessed sequentially, but practically useless when the memory is accessed randomly.
Conventionally, peripheral devices have no prior information as to the usage characteristics of the memory regions they are required to access. This lack of information prevents peripheral devices from optimizing their memory access operations.
Embodiments of the present invention that are described herein provide improved techniques for memory access by peripheral devices. In the disclosed embodiments, the peripheral device is made aware of the usage characteristics of a memory region. The peripheral device sets a memory-access policy for accessing the memory region based on the usage characteristics.
The peripheral device can acquire information as to the usage characteristics of a memory region in various ways. In some embodiments, the host (or some other processing device external to the peripheral device) sends the peripheral device a “hint” that is indicative of the usage characteristics. Various types of hints are described herein. In other embodiments, the peripheral device learns the usage characteristics regardless of any hints, by tracking memory-access transactions performed in the memory region. The latter embodiments can be used with legacy hosts that do not support the disclosed techniques.
In various embodiments, the peripheral device may set various kinds of memory-access policies, e.g., caching policies and/or prefetching policies, based on the usage characteristics of the memory region.
In some embodiments, the peripheral device holds (i) a list of possible hints, (ii) a list of possible memory-access policies, and (iii) a mapping between the hints and policies. Upon receiving a certain hint from the host, the peripheral device selects the appropriate memory-access policy based on the mapping. The hints, the policies, and/or the mapping between them, may be user-configurable and/or may vary over time.
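The hint and policy lists and the mapping between them can be sketched as simple tables. The hint names, policy names and parameters below are assumptions for illustration only; as the text notes, all three tables may be user-configurable and may change over time. Unknown hints fall back to a conservative default here.

```python
# (i) a list of possible hints
HINTS = {"SEQUENTIAL_READ", "RANDOM_READ", "READ_ONCE", "FREQUENT_ACCESS"}

# (ii) a list of possible memory-access policies
POLICIES = {
    "PREFETCH_AGGRESSIVE": {"prefetch_depth": 8, "cache": True},
    "CACHE_ONLY":          {"prefetch_depth": 0, "cache": True},
    "NO_CACHE":            {"prefetch_depth": 0, "cache": False},
}

# (iii) a mapping between hints and policies (user-configurable)
HINT_TO_POLICY = {
    "SEQUENTIAL_READ": "PREFETCH_AGGRESSIVE",
    "RANDOM_READ":     "CACHE_ONLY",
    "READ_ONCE":       "NO_CACHE",
    "FREQUENT_ACCESS": "CACHE_ONLY",
}

def select_policy(hint: str) -> dict:
    """On receiving a hint, select the memory-access policy via the mapping.

    An unrecognized hint maps to a conservative default policy.
    """
    return POLICIES[HINT_TO_POLICY.get(hint, "NO_CACHE")]
```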
In some embodiments, the policies and the hints-to-policies mapping are not exposed to the host. Such embodiments are useful, for example, for maintaining data privacy and security in a peripheral device that serves multiple different hosts.
The disclosed techniques enable a peripheral device to match memory-access operations (e.g., caching and prefetching) to actual usage characteristics, per memory region. As a result, computation and memory resources of the peripheral device can be used more efficiently.
In the embodiment of
Peripheral device 28 may comprise, for example, a network adapter that connects host 24 to a communication network (not shown). Examples of network adapters comprise InfiniBand™ Host Channel Adapters (HCAs) and Ethernet Network Interface Controllers (NICs). Alternatively, the peripheral device may comprise a storage device such as a Solid-State Disk (SSD), a storage controller or storage accelerator, a graphics accelerator such as a Graphics Processing Unit (GPU), an accelerator that offloads certain computational tasks from the host, and/or any other suitable type of peripheral device.
Peripheral bus 36 may comprise, for example, a Peripheral Component Interconnect Express (PCIe) bus, an Advanced Extensible Interface (AXI) bus, a Compute Express Link (CXL) bus, an Nvlink or Nvlink Chip-to-Chip (Nvlink-C2C) bus, or any other suitable type of peripheral bus. Peripheral bus 36 is also referred to as a system bus.
In some embodiments, CPU 40 of host 24 defines one or more memory regions 60 in memory 32. CPU 40 then instructs peripheral device 28 to access (e.g., read and/or write) these memory regions using DMA. In the present example, two regions 60 are defined, denoted “REGION X” and “REGION Y”.
Consider, for example, an embodiment in which peripheral device 28 is a network adapter that connects host 24 to a network. In this embodiment, a driver in CPU 40 may store data and metadata for outgoing packets in a memory region 60, and request peripheral device 28 to construct and send the packets. In response to the request, circuitry 48 of peripheral device 28 reads the data and/or metadata from the specified memory region 60 using DMA (i.e., directly over bus 36 without involving host 24), and constructs and sends the packets.
As another example, in an embodiment in which peripheral device 28 is a storage device, CPU 40 may request peripheral device 28 to read certain data from storage and write the data to one of regions 60 in memory 32. In response, circuitry 48 of peripheral device 28 retrieves the requested data from storage and writes the data using DMA to the specified memory region 60 in memory 32. Other examples of memory regions that may be subject to hints are work queues and completion queues.
In order to enable peripheral device 28 to access a given region 60 directly, CPU 40 and circuitry 48 carry out a memory registration process. As part of memory registration, CPU 40 sends circuitry 48 a context 64 for the memory region 60 being registered. Context 64 comprises various metadata that describes the memory region. Context 64 may comprise, for example, a virtual-to-physical address mapping to be used when accessing the memory region. The example of
The types of address translations (address mappings) specified in contexts 64 may vary depending on system implementation. In some embodiments, access to memory 32 involves a single translation of a Virtual Address (VA) into a Physical Address (PA). In these embodiments, a given context 64 typically stores a single type of address mapping. In other embodiments, e.g., in a virtualized environment, access to memory 32 involves two address translations: a first translation from a Guest Virtual Address (GVA) into a Guest Physical Address (GPA), followed by a second translation from the GPA into a Host Physical Address (HPA, also known as a Machine Physical Address (MPA) or System Physical Address (SPA)). In such embodiments, a given context 64 typically stores both types of address mapping.
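The two-stage translation described above (GVA to GPA, then GPA to HPA) can be sketched as two chained page-table walks. The flat dictionaries and function names below are assumptions for illustration; real implementations use multi-level hardware page tables.

```python
PAGE = 4096  # assumed page size for illustration

def walk(table: dict, addr: int) -> int:
    """One translation stage: map the page number, preserve the page offset."""
    return table[addr // PAGE] * PAGE + addr % PAGE

def gva_to_hpa(gva: int, stage1: dict, stage2: dict) -> int:
    """Two-stage translation as in a virtualized environment."""
    gpa = walk(stage1, gva)    # Guest Virtual Address -> Guest Physical Address
    return walk(stage2, gpa)   # Guest Physical Address -> Host Physical Address
```

A context that stores both mappings lets the device resolve a guest virtual address all the way to a host physical address without host involvement.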
In some embodiments, in a virtualized environment, peripheral device may be required to access memory 32 using either of two operational modes:
In accessing a given memory region 60, circuitry 48 of peripheral device 28 may perform operations such as caching and/or prefetching. In a typical prefetching operation, circuitry 48 predicts one or more addresses in region 60 from which data is likely to be read next, and reads these addresses before receiving an explicit request from host 24. In a typical caching operation, circuitry 48 caches selected portions of data in cache 56, in order to serve future requests for the data from cache 56 instead of from memory 32. Since the size of cache 56 is limited, circuitry 48 typically evicts selected portions of data from cache 56 using suitable eviction criteria.
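The caching operation with eviction described above can be sketched as a small least-recently-used (LRU) cache. LRU is only one possible eviction criterion, chosen here for illustration; the class and method names are assumptions, not the claimed implementation.

```python
from collections import OrderedDict

class LruCache:
    """Sketch of a limited-size cache (cf. cache 56) with LRU eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> data, least recently used first

    def get(self, addr):
        """Serve a request from the cache; None signals a miss (read memory)."""
        if addr in self.entries:
            self.entries.move_to_end(addr)  # mark as recently used
            return self.entries[addr]
        return None

    def put(self, addr, data):
        """Cache a portion of data, evicting the LRU entry if over capacity."""
        self.entries[addr] = data
        self.entries.move_to_end(addr)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```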
In some embodiments, caching and prefetching may be applied to portions of contexts 64, not only to portions of data. For example, in many practical cases the size of a context 64 is large, e.g., due to the size of the virtual-to-physical address mapping. In such cases, some or even all of context 64 may be stored in memory 32. Only selected portions of the context (e.g., individual address translations) may be prefetched as needed, and/or cached in cache 56.
Different memory regions 60 may be accessed by host 24 in different manners, i.e., with different usage characteristics. Several non-limiting examples of usage characteristics include the following:
Alternatively, any other suitable usage characteristics can be used. In some embodiments, host 24 defines and stores usage characteristic records 68 for the various memory regions 60. The example of
In order to enable peripheral device 28 to optimize its memory-access operations (e.g., caching and/or prefetching), CPU 40 provides circuitry 48 with hints indicative of the usage characteristics of the various memory regions 60. A hint provided for a certain memory region 60 may indicate a single usage characteristic (e.g., “access type is read-only”) or a combination of usage characteristics (e.g., “access type is read-only, and access pattern is random”).
The hint may be provided to peripheral device 28 as part of the initial memory registration of the memory region in question, or at any other time. For a given memory region, CPU 40 may send an updated hint that changes the usage characteristics of the region if necessary. In some embodiments, a given hint may be an “ad-hoc hint” that is valid for a certain time period or for a certain upcoming transaction or group of transactions. An example of such a hint can be “the next N transactions are expected to be read transactions”.
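An ad-hoc hint of the kind described above, valid only for a certain number of upcoming transactions, can be sketched as follows. The class name and interface are assumptions for illustration.

```python
class AdHocHint:
    """A hint valid only for a bounded number of upcoming transactions,
    e.g. "the next N transactions are expected to be read transactions"."""

    def __init__(self, hint: str, valid_for: int):
        self.hint = hint
        self.remaining = valid_for

    def consume(self):
        """Return the hint for the next transaction, or None once expired."""
        if self.remaining <= 0:
            return None
        self.remaining -= 1
        return self.hint
```

After the hint expires, the device simply reverts to the standing policy for the memory region.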
Upon receiving a hint for a certain memory region 60, circuitry 48 sets a memory-access policy for that region based on the hint. In some embodiments, the memory-access policy comprises a caching policy that defines (i) criteria for caching portions of data and/or metadata (e.g., portions of context 64) in cache 56, and/or (ii) criteria for evicting portions of data and/or metadata from cache 56. Additionally or alternatively, the memory-access policy comprises a prefetching policy that defines criteria for prefetching portions of data and/or metadata from memory 32 to peripheral device 28.
As another example, when using ATS, circuitry 48 may prefetch address translation requests, in which case a prefetching policy may be defined based on hints for this sort of prefetching as well. Additionally or alternatively, a caching policy for caching address translations in the ATC may be defined based on hints. Further additionally or alternatively, the memory-access policy may comprise any other suitable policy.
In various embodiments, circuitry 48 may use any suitable technique for setting a memory-access policy for a memory region 60 based on a received hint. In the embodiment of
In some embodiments, circuitry 48 of peripheral device 28 may comprise a policy enforcer (not seen in the figure) that is configured to enforce the memory-access policies.
The configurations of system 20, including those of host 24, peripheral device 28 and memory 32, are example configurations that are chosen purely for the sake of conceptual clarity. Any other suitable configurations can be used in alternative embodiments. For example, in some embodiments host 24 does not send hints to peripheral device 28. Instead, circuitry 48 in peripheral device 28 learns the usage characteristics of one or more memory regions 60 by monitoring memory-access transactions performed in memory 32. For example, circuitry 48 may track the addresses being accessed in the memory region, to determine whether the access pattern is sequential or random. As another example, circuitry 48 may track the times or frequencies of access to the memory region, to decide whether the memory region should be regarded as frequently-accessed or rarely-accessed.
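The address-tracking example above can be sketched as a simple classifier that inspects the sequence of accessed addresses. The stride value, the 80% threshold and the function name are assumptions chosen for illustration; a hardware implementation would track this incrementally rather than over a stored list.

```python
def classify_pattern(addresses, stride=8):
    """Classify an observed address sequence as "sequential" or "random".

    The sequence is deemed sequential if at least 80% of consecutive
    accesses advance by the expected stride (an illustrative threshold).
    """
    if len(addresses) < 2:
        return "unknown"
    hits = sum(1 for a, b in zip(addresses, addresses[1:]) if b - a == stride)
    return "sequential" if hits >= 0.8 * (len(addresses) - 1) else "random"
```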
As another example, in the embodiment of
In yet other embodiments, circuitry 48 may use a hybrid method that sets a memory-access policy for a memory region based on a combination of hint(s) and self-learning. As another example, circuitry 48 may set one memory-access policy based on hinting, and another memory-access policy, for the same memory region or for a different memory region, based on self-learning.
As yet another example, the disclosed techniques do not mandate the use of memory registration for defining a memory region 60. For example, a memory region may simply be a range of addresses being accessed by the peripheral device. In example embodiments, peripheral device 28 may access memory 32 using techniques such as “implicit On-Demand Paging” (implicit-ODP), Unified Virtual Memory (UVM), or Shared Virtual Addressing (SVA), none of which requires memory registration. In such embodiments, circuitry 48 in peripheral device 28 may allocate a context 64 per address range which has a particular usage characteristic. The boundaries of the memory region, and/or the applicable usage characteristic, may be either explicitly declared, or implicitly self-learned.
When using schemes such as ODP, SVA and UVM, a memory page may not be pinned to the host memory, and may therefore be swapped out to disk, and later swapped in to a different memory location. When using such schemes, a hint may also be indicative of how likely it is that the relevant memory is present on the host (or on a swap device). When using a swap device, circuitry 48 may issue an early paging request, e.g., using the PCIe Page Request Interface (PRI) or using a vendor-specific event. Such an early request is also regarded herein as a kind of prefetching, and prefetching policies for memory pages may therefore be defined based on hints.
At a hinting stage 84, CPU 40 sends circuitry 48 a hint that is indicative of one or more usage characteristics of the memory region in question. At a mapping stage 88, circuitry 48 maps the received hint to a corresponding memory-access policy. In an example embodiment, circuitry 48 uses hint list 72, policy list 76, and the mapping between them (
The method begins with a memory registration stage 96, similar to stage 80 of
At a policy setting stage 104, circuitry 48 maps the deduced usage pattern to a suitable memory-access policy. As one illustrative example, in response to deciding that the memory region in question is accessed only once or rarely, circuitry 48 may decide to set a “no caching” policy for the memory region. If, on the other hand, the memory region appears to be accessed very frequently, circuitry 48 may give preference to caching of data for this memory region.
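The frequency-based decision in the illustrative example above can be sketched as follows. The thresholds and policy names are assumptions chosen purely for illustration.

```python
def policy_from_frequency(accesses_per_sec: float) -> str:
    """Map a deduced access frequency to a caching policy (stage 104 sketch)."""
    if accesses_per_sec < 1.0:
        return "no-caching"       # rarely accessed: caching would waste cache space
    if accesses_per_sec > 1000.0:
        return "cache-preferred"  # hot region: give it preference in the cache
    return "default"
```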
At a memory access stage 108, circuitry 48 accesses the memory region using the selected memory-access policy.
In various embodiments, peripheral device 28 may use any suitable mapping between hints (in the case of
In the embodiment of
Service users 110 may create device-specific memory objects 118 that provide peripheral device 28 with metadata relating to memory regions 60. The metadata in objects 118 may comprise, for example, address ranges, virtual-to-physical address translations (e.g., VA-to-GPA or VA-to-HPA depending on the use-case, as described above), ATS entries, Memory Translation Tables (MTT), memory keys (mkeys), Process Address Space IDs (PASID), and/or any other suitable information. Peripheral device 28 uses the metadata in objects 118, and locally-stored Memory Translation Tables (MTTs) 126, to map memory regions of the application virtual memory space (or device driver physical memory space) to physical memory space, whether HPA or GPA. In some embodiments the address translations are defined using indirection, e.g., using one mkey that points to one or more other mkeys. This structure is denoted “KLM” in the figure.
In some embodiments, hints and usage characteristics are defined at finer granularity than an entire memory region. For example, for a given memory region, different usage characteristics and/or different hints may be defined per VM, per bus transaction, per Work-Queue Element (WQE), per packet, per address range within a memory region, etc.
In some embodiments, some objects 118 may comprise hints that are indicative of the usage characteristics of the memory regions they address, as described above. In some embodiments, service provider 114 may set memory-access policies, to be applied by peripheral device 28 on objects 118. In the present example, service provider 114 provides peripheral device 28 with hint list 72, policy list 76, and a hint→policy mapping that maps the hints to the policies.
As noted above, peripheral device 28 may comprise a policy enforcer (not seen in the figure) that optionally enforces the memory-access policies. The term “optionally” in this context means that the policy enforcer has the option of overriding, changing or ignoring the request to enforce a certain policy. For example, the policy enforcer may be requested to enforce a certain policy as a result of a hint, but at the same time self-learn that the actual usage characteristics of the memory region in question do not match the requested policy. In such a case, the policy enforcer may decide to enforce a different policy than requested, or to refrain from enforcing any of the policies. The policy enforcer may apply such decisions for the single mismatched memory region, for all memory regions supplied by the service user with this hint, for all hints supplied by the service user, etc.
The policy enforcer may expose the supported policies as capabilities to service provider 114. In an embodiment, service provider 114 specifies hint list 72, policy list 76 and the hint→policy mapping based on the supported policies, as reported by peripheral device 28.
An inset at the bottom of
Another example of a memory-access policy is a policy that determines attributes of the system bus transactions themselves. For example, PCIe Transaction Layer Packets (TLPs) contain Processing Hint (PH) and steering tag fields. These attributes may have an impact on target memory caching behavior. One example is “cache stashing”: Hints can be provided to inject data directly into L2 or L1 processing device caches, instead of to system level cache or DRAM. This sort of policy is also regarded herein as a kind of caching policy.
The configurations of system 20, including the configurations of host 24, peripheral device 28 and memory 32, as shown in
The various elements of host 24 and peripheral device 28 may be implemented in hardware, e.g., in one or more Application-Specific Integrated Circuits (ASICs) or FPGAs, in software, or using a combination of hardware and software elements. In some embodiments, certain functions, e.g., some or all functions of CPU 40 and/or circuitry 48, may be implemented using one or more general-purpose processors, which are programmed in software to carry out the functions described herein. The software may be downloaded to any of the processors in electronic form, over a network, for example, or it may, alternatively or additionally, be provided and/or stored on non-transitory tangible media, such as magnetic, optical, or electronic memory.
Although the embodiments described herein mainly address DMA transactions, the methods and systems described herein can also be used in other applications, such as in Remote DMA (RDMA). In such applications, host 24 and memory 32 may be remote from one another (i.e., over a network), or host 24 and peripheral device 28 may be remote from one another, or peripheral device 28 and memory 32 may be remote from one another. As another example, in RDMA applications the host that performs memory registration is not necessarily the same host that issues hints for that memory region.
It will thus be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and sub-combinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art. Documents incorporated by reference in the present patent application are to be considered an integral part of the application except that to the extent any terms are defined in these incorporated documents in a manner that conflicts with the definitions made explicitly or implicitly in the present specification, only the definitions in the present specification should be considered.