This application relates to the field of information technologies, and in particular, to a memory sharing control method and device, and a system.
With the popularization of big data technologies, applications in various fields have increasing requirements on computing resources. Large-scale computing, represented by applications such as graph computing and deep learning, is the latest direction of application development. In addition, as development of semiconductor technologies slows, application performance can no longer be continuously improved simply through processor upgrades, and multi-core processors have gradually become mainstream.
A multi-core processor system has an increasingly high requirement for memory capacity. As an indispensable component of a server, the memory accounts for 30% to 40% of the total cost of the server. Improving utilization of the memory is therefore an important means of reducing the total cost of ownership (TCO).
This application provides a memory sharing control method and device, a computer device, and a system, to improve utilization of memory resources.
According to a first aspect, this application provides a computer device, including at least two processing units, a memory sharing control device, and a memory pool, where the processing unit is a processor, a core in a processor, or a combination of cores in a processor, the memory pool includes one or more memories, and at least one memory in the memory pool is accessible by different processing units in different time periods.
The at least two processing units in the computer device can access the at least one memory in the memory pool in different time periods via the memory sharing control device, to implement memory sharing by a plurality of processing units, so that utilization of memory resources is improved.
Optionally, that at least one memory in the memory pool is accessible by different processing units in different time periods means that any two of the at least two processing units can separately access the at least one memory in the memory pool in different time periods. For example, the at least two processing units include a first processing unit and a second processing unit. In a first time period, a first memory in the memory pool is accessed by the first processing unit, and the second processing unit cannot access the first memory. In a second time period, the first memory in the memory pool is accessed by the second processing unit, and the first processing unit cannot access the first memory. Optionally, the processor may be a central processing unit (CPU), and one CPU may include two or more cores.
Optionally, one of the at least two processing units may be a processor, a core in a processor, a combination of a plurality of cores in a processor, or a combination of a plurality of cores in different processors. When a combination of a plurality of cores in one processor, or a combination of a plurality of cores in different processors, is used as a processing unit, a plurality of different cores can access a same memory when executing tasks in parallel, so that efficiency of performing parallel computing by the plurality of different cores can be improved.
Optionally, the memory sharing control device may separately allocate a memory from the memory pool to the at least two processing units based on a received control instruction sent by an operating system in the computer device. Specifically, a driver in the operating system may send, to the memory sharing control device over a dedicated channel, the control instruction used to allocate the memory in the memory pool to the at least two processing units. The operating system is implemented by the CPU in the computer device by executing related code. The CPU that runs the operating system has a privilege mode, and in this mode, the driver in the operating system can send the control instruction to the memory sharing control device over a dedicated channel or a specified channel.
Optionally, the memory sharing control device may be implemented by using a field programmable gate array (FPGA) chip, an application-specific integrated circuit (ASIC), or another similar chip. Circuit functions of the ASIC have been defined at the beginning of design, and the ASIC has features of high chip integration, being easy to implement mass tapeouts, low cost of a single tapeout, a small size, and the like.
In some possible implementations, the at least two processing units are connected to the memory sharing control device via a serial bus, and the memory sharing control device is configured to receive, via the serial bus, a first memory access request sent in a serial signal form by a first processing unit in the at least two processing units, where the first memory access request is used to access a first memory allocated to the first processing unit.
The serial bus has characteristics of high bandwidth and low latency. The at least two processing units are connected to the memory sharing control device via the serial bus, so that efficiency of data transmission between the processing unit and the memory sharing control device can be ensured.
Optionally, the serial bus is a memory semantic bus. The memory semantic bus includes but is not limited to a quick path interconnect (QPI), peripheral component interconnect express (PCIe), Huawei cache coherence system (HCCS), or compute express link (CXL) interconnect-based bus.
Optionally, the memory access request generated by the first processing unit is a memory access request in a parallel signal form. The first processing unit may convert the memory access request in the parallel signal form into the first memory access request in the serial signal form through an interface that can implement conversion between a parallel signal and a serial signal, for example, a Serdes interface, and send the first memory access request in the serial signal form to the memory sharing control device via the serial bus.
In some possible implementations, the memory sharing control device includes a processor interface, and the processor interface is configured to: convert the first memory access request into a second memory access request in a parallel signal form, so that the memory sharing control device accesses the first memory based on the second memory access request.
The processor interface converts the first memory access request into the second memory access request in the parallel signal form, so that the memory sharing control device can access the first memory, thereby implementing memory sharing without changing an existing memory access architecture.
Optionally, the processor interface is the interface that can implement the conversion between the parallel signal and the serial signal, for example, may be the Serdes interface.
In some possible implementations, the memory sharing control device includes a control unit, and the control unit is configured to: establish a correspondence between a memory address of the first memory in the memory pool and the first processing unit, to allocate the first memory from the memory pool to the first processing unit.
Optionally, the correspondence between the memory address of the first memory and the first processing unit may be dynamically adjusted as required.
Optionally, the memory address of the first memory may be a segment of consecutive physical memory addresses in the memory pool, which simplifies management of the first memory. Certainly, the memory address of the first memory may alternatively be several segments of inconsecutive physical memory addresses in the memory pool.
Optionally, memory address information of the first memory includes a start address of the first memory and a size of the first memory. The first processing unit has a unique identifier, and establishing a correspondence between the memory address of the first memory and the first processing unit may be establishing a correspondence between the unique identifier of the first processing unit and the memory address information of the first memory.
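For ease of understanding only, the following is a minimal C sketch of such a correspondence; the structure and function names, the table size, and the field layout are assumptions for illustration, not part of this application.

```c
#include <stddef.h>
#include <stdint.h>

/* One correspondence: the unique identifier of a processing unit is
 * bound to the start address and size of the memory allocated to it. */
struct mem_binding {
    uint32_t pu_id;      /* unique identifier of the processing unit */
    uint64_t start_addr; /* start address of the allocated memory */
    uint64_t size;       /* size of the allocated memory, in bytes */
    int      valid;      /* whether this correspondence is established */
};

#define MAX_BINDINGS 16
static struct mem_binding table[MAX_BINDINGS];

/* Establish a correspondence; returns 0 on success, -1 if the table is full. */
int establish_binding(uint32_t pu_id, uint64_t start, uint64_t size)
{
    for (size_t i = 0; i < MAX_BINDINGS; i++) {
        if (!table[i].valid) {
            table[i] = (struct mem_binding){pu_id, start, size, 1};
            return 0;
        }
    }
    return -1;
}
```

Under this sketch, dynamically adjusting the correspondence amounts to clearing or rewriting an entry in the table.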
In some possible implementations, the memory sharing control device includes a control unit, and the control unit is configured to:
Optionally, the first virtual memory device may be allocated to the first processing unit by establishing an access control table. For example, the access control table may include information such as the identifier of the first processing unit, an identifier of the first virtual memory device, and the start address and the size of the memory corresponding to the first virtual memory device. The access control table may further include permission information of accessing the first virtual memory device by the first processing unit, attribute information of a memory to be accessed (including but not limited to information about whether the memory is a persistent memory), and the like.
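As an illustration of such an access control table, the following C sketch uses assumed field names and permission bits; the actual table layout is not limited in this application.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERM_READ  0x1
#define PERM_WRITE 0x2

/* One access control table entry tying a processing unit to a virtual
 * memory device, with permission and attribute information. */
struct acl_entry {
    uint32_t pu_id;      /* identifier of the processing unit */
    uint32_t vmd_id;     /* identifier of the virtual memory device */
    uint64_t start_addr; /* start address of the corresponding memory */
    uint64_t size;       /* size of the corresponding memory */
    uint8_t  perm;       /* access permission bits */
    bool     persistent; /* attribute: whether the memory is persistent */
};

/* Check whether pu_id may write to addr under this entry. */
bool acl_allows_write(const struct acl_entry *e, uint32_t pu_id,
                      uint64_t addr)
{
    return e->pu_id == pu_id && (e->perm & PERM_WRITE) &&
           addr >= e->start_addr && addr < e->start_addr + e->size;
}
```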
In some possible implementations, the control unit is further configured to:
Optionally, the correspondence between the virtual memory device and the processing unit may be dynamically adjusted based on a memory resource requirement of the at least two processing units.
The correspondence between the virtual memory device and the processing unit is dynamically adjusted, so that memory resource requirements of different processing units in different service scenarios can be flexibly adapted, and utilization of memory resources can be improved.
Optionally, the preset condition may be that a memory access requirement of the first processing unit decreases, and a memory access requirement of the second processing unit increases.
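A minimal C sketch of this adjustment is shown below; the demand metric used as the preset condition is an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

struct vmd {             /* a virtual memory device */
    uint32_t id;
    uint32_t owner_pu;   /* processing unit it is currently allocated to */
    bool     bound;
};

/* Assumed preset condition: the first unit's memory access requirement
 * decreases while the second unit's requirement increases. */
static bool preset_condition_met(uint32_t pu1_demand, uint32_t pu2_demand)
{
    return pu1_demand < pu2_demand;
}

/* Cancel the correspondence with the first processing unit and
 * establish one with the second when the preset condition is met. */
void maybe_rebind(struct vmd *dev, uint32_t pu1, uint32_t pu2,
                  uint32_t pu1_demand, uint32_t pu2_demand)
{
    if (dev->bound && dev->owner_pu == pu1 &&
        preset_condition_met(pu1_demand, pu2_demand))
        dev->owner_pu = pu2;
}
```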
Optionally, the control unit is further configured to: adjust, based on a control instruction sent by a driver in an operating system, the correspondence between the virtual memory device and the processing unit.
In some possible implementations, the memory sharing control device further includes a cache unit, and the cache unit is configured to: cache data read by any one of the at least two processing units from the memory pool, or cache data evicted by any one of the at least two processing units.
Efficiency of accessing the memory data by the processing unit can be further improved by using the cache unit.
Optionally, the cache unit may include a level 1 cache and a level 2 cache. The level 1 cache may be a small-capacity cache with a read/write speed higher than that of the level 2 cache, for example, a 100-megabyte (MB) nanosecond-level cache. The level 2 cache may be a large-capacity cache with a read/write speed lower than that of the level 1 cache, for example, a 1-gigabyte (GB) dynamic random access memory (DRAM). Using both a level 1 cache and a level 2 cache increases the data access speed of the processor through the caches while also increasing cache space, expanding the range of memory that the processor can quickly access through the caches, and generally improving the memory access rate of the processor resource pool.
Optionally, data in the memory may be first cached in the level 2 cache, and the data in the level 2 cache is then cached in the level 1 cache based on a requirement of the processing unit for the memory data. Alternatively, data that is evicted by the processing unit or that temporarily does not need to be processed may be cached in the level 1 cache, and some data evicted from the level 1 cache may be cached in the level 2 cache, to ensure that the level 1 cache has sufficient space for other processing units to cache data.
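The following direct-mapped C sketch illustrates this two-level flow (memory data fills the level 2 cache first, is promoted to level 1 on demand, and level 1 evictions are demoted to level 2); the cache sizes and the mapping scheme are assumptions for illustration.

```c
#include <stdint.h>

#define L1_LINES 8    /* small, fast level 1 cache */
#define L2_LINES 64   /* larger, slower level 2 cache */

struct line { uint64_t tag; int valid; };

static struct line l1[L1_LINES], l2[L2_LINES];

/* Place a line in level 1; a line evicted from level 1 is demoted to
 * level 2 so that level 1 keeps space free for other processing units. */
static void fill_l1(uint64_t addr)
{
    struct line *e = &l1[addr % L1_LINES];
    if (e->valid)
        l2[e->tag % L2_LINES] = *e; /* demote the evicted line */
    e->tag = addr;
    e->valid = 1;
}

/* Access one line-granular address; returns the cache level that served
 * it (1 or 2), or 0 when the memory pool had to be read. */
int cache_access(uint64_t addr)
{
    struct line *e1 = &l1[addr % L1_LINES];
    if (e1->valid && e1->tag == addr)
        return 1;
    struct line *e2 = &l2[addr % L2_LINES];
    if (e2->valid && e2->tag == addr) {
        fill_l1(addr);              /* promote to level 1 on demand */
        return 2;
    }
    l2[addr % L2_LINES] = (struct line){addr, 1}; /* fill level 2 first */
    return 0;
}
```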
In some possible implementations, the memory sharing control device further includes a prefetch engine, and the prefetch engine is configured to: prefetch, from the memory pool, the data that needs to be read by any one of the at least two processing units, and cache the data in the cache unit.
Optionally, the prefetch engine may implement intelligent data prefetching by using a specified algorithm or a related artificial intelligence (AI) algorithm, to further improve efficiency of accessing the memory data by the processing unit.
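As one concrete stand-in for the specified algorithm (the algorithm itself is not fixed here; an AI-based predictor could equally be used), a sequential next-line prefetcher can be sketched as follows, with an assumed cache-line size.

```c
#include <stdint.h>

#define LINE_SIZE 64 /* assumed cache-line size in bytes */

/* Given the address a processing unit just read, return the address the
 * prefetch engine would fetch from the memory pool into the cache unit. */
uint64_t next_prefetch_addr(uint64_t last_addr)
{
    return (last_addr / LINE_SIZE + 1) * LINE_SIZE; /* next sequential line */
}
```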
In some possible implementations, the memory sharing control device further includes a quality of service (QoS) engine, and the QoS engine is configured to implement optimized storage, in the cache unit, of the data that needs to be cached by any one of the at least two processing units. By using the QoS engine, different capabilities of caching the memory data accessed by different processing units in the cache unit can be implemented. For example, a memory access request initiated by a processing unit with a high priority has exclusive cache space in the cache unit. In this way, it can be ensured that the data accessed by such a processing unit can be cached in time, so that service processing quality of this type of processing unit is ensured.
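For illustration, the exclusive cache space for high-priority processing units could be realized by partitioning the cache unit; the partition sizes below are assumptions.

```c
#include <stdint.h>

#define CACHE_LINES    128
#define RESERVED_LINES 32 /* exclusive to high-priority processing units */

/* Map a request to a region of the cache unit based on its priority, so
 * that high-priority data can always be cached in time. */
void cache_region(int high_priority, uint32_t *first, uint32_t *count)
{
    if (high_priority) {  /* exclusive region, never evicted by others */
        *first = 0;
        *count = RESERVED_LINES;
    } else {              /* shared region for all other requests */
        *first = RESERVED_LINES;
        *count = CACHE_LINES - RESERVED_LINES;
    }
}
```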
In some possible implementations, the memory sharing control device further includes a compression/decompression engine, and the compression/decompression engine is configured to: compress or decompress data related to memory access.
Optionally, a function of the compression/decompression engine may be disabled.
The compression/decompression engine may compress, by using a compression algorithm at a granularity of 4 kilobytes (KB) per page, data written by the processing unit into the memory, and then write the compressed data into the memory; or decompress compressed data when the processing unit reads the compressed data in the memory, and then send the decompressed data to the processor. In this way, a data transmission rate can be improved, and efficiency of accessing the memory data by the processing unit can be further improved.
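The write path can be sketched as follows at the 4 KB page granularity; the trivial run-length encoder stands in for whichever compression algorithm the engine actually uses and is purely illustrative (the output buffer must allow up to twice the page size in the worst case).

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Stand-in compression: run-length encode one page into dst and return
 * the compressed size (worst case 2 * PAGE_SIZE bytes). */
size_t compress_page(const uint8_t *src, uint8_t *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < PAGE_SIZE; ) {
        uint8_t b = src[i];
        size_t run = 1;
        while (i + run < PAGE_SIZE && src[i + run] == b && run < 255)
            run++;
        dst[out++] = (uint8_t)run; /* run length */
        dst[out++] = b;            /* byte value */
        i += run;
    }
    return out;
}

/* Write one page: compress when the engine is enabled, else copy as-is. */
size_t write_page(const uint8_t *page, uint8_t *mem, int engine_enabled)
{
    if (!engine_enabled) { /* the engine's function may be disabled */
        for (size_t i = 0; i < PAGE_SIZE; i++)
            mem[i] = page[i];
        return PAGE_SIZE;
    }
    return compress_page(page, mem);
}
```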
Optionally, the memory sharing control device further includes a storage unit, where the storage unit includes software code of at least one of the QoS engine, the prefetch engine, and the compression/decompression engine. The memory sharing control device may read the code in the storage unit to implement a corresponding function.
Optionally, the at least one of the QoS engine, the prefetch engine, and the compression/decompression engine may be implemented by using control logic of the memory sharing control device.
In some possible implementations, the first processing unit further has a local memory, and the local memory is used for memory access of the first processing unit. Optionally, the first processing unit may preferentially access the local memory. The first processing unit has a higher speed of accessing the local memory, so that the speed of accessing the memory by the first processing unit can be further improved.
In some possible implementations, the plurality of memories included in the memory pool are of different medium types. For example, the memory pool may include at least one of the following memory media: a DRAM, a phase change memory (PCM), a storage class memory (SCM), a static random access memory (SRAM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), a NAND flash memory, a spin-transfer torque random access memory (STT-RAM), or a resistive random access memory (RRAM). The memory pool may further include a dual in-line memory module (DIMM), or a solid-state disk (SSD).
Different memory media can meet memory resource requirements when different processing units process different services. For example, the DRAM has features of a high read/write speed and volatility, and a memory of the DRAM may be allocated to a processing unit that initiates hot data access. The PCM has a non-volatile feature, and a memory of the PCM may be allocated to a processing unit that accesses data that needs to be stored for a long term. In this way, flexibility of memory access control can be improved while a memory resource is shared.
For example, the memory pool includes a volatile DRAM storage medium and a non-volatile PCM storage medium. The DRAM and the PCM in the memory pool may be in a parallel architecture, with no hierarchical levels. Alternatively, a non-parallel architecture may be used in which the DRAM serves as a cache and the PCM serves as a main memory, that is, the DRAM is a first-level storage medium and the PCM is a second-level storage medium. For the architecture in which the DRAM and the PCM are parallel to each other, the control unit may store frequently-accessed hot data in the DRAM, in other words, establish a correspondence between a processing unit that initiates access to frequently-accessed hot data and a virtual memory device corresponding to the memory of the DRAM. In this way, the read/write speed of the memory data and the service life of the main memory system can be improved. The control unit may further establish a correspondence between a processing unit that initiates access to less frequently-accessed cold data and a virtual memory device corresponding to the memory of the PCM, to store the cold data in the PCM. In this way, security of important data can be ensured based on the non-volatile feature of the PCM. For the architecture in which the DRAM and the PCM are not parallel to each other, based on the high integration of the PCM and the low read/write latency of the DRAM, the control unit may use the PCM as a main memory to store various types of data, and use the DRAM as a cache. In this way, memory access efficiency and performance can be further improved.
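For the parallel architecture, the placement decision can be sketched as below; the access-count threshold is an assumed value, not part of this application.

```c
#include <stdint.h>

enum medium { MEDIUM_DRAM, MEDIUM_PCM };

#define HOT_THRESHOLD 100 /* assumed accesses-per-interval threshold */

/* Choose a medium for data based on how frequently it is accessed and
 * whether it must survive power loss (PCM is non-volatile). */
enum medium choose_medium(uint32_t access_count, int needs_persistence)
{
    if (needs_persistence)
        return MEDIUM_PCM;
    return access_count >= HOT_THRESHOLD ? MEDIUM_DRAM : MEDIUM_PCM;
}
```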
According to a second aspect, this application provides a system, including at least two computer devices according to the first aspect, and the at least two computer devices according to the first aspect are connected to each other through a network.
A computer device in the system can not only access its local memory pool via a memory sharing control device, to improve memory utilization, but can also access a memory pool on another computer device through the network. The range of the memory pool is expanded, so that utilization of memory resources can be further improved.
Optionally, the memory sharing control device in a computer device in the system may alternatively have a function of a network adapter, and can send an access request of a processing unit to another computer device in the system through the network, to access a memory of that computer device.
Optionally, a computer device in the system may alternatively include a network adapter having a serial-to-parallel interface (for example, a Serdes interface). The memory sharing control device in the computer device may send, by using the network adapter, a memory access request of a processing unit to another computer device in the system through the network, to access a memory of that computer device.
Optionally, the computer device in the system may be connected through an Ethernet-based network or a unified bus (U-bus)-based network.
According to a third aspect, this application provides a memory sharing control device, where the memory sharing control device includes a control unit, a processor interface, and a memory interface.
The processor interface is configured to receive memory access requests sent by at least two processing units, where the processing unit is a processor, a core in a processor, or a combination of cores in a processor.
The control unit is configured to separately allocate a memory from a memory pool to the at least two processing units, where at least one memory in the memory pool is accessible by different processing units in different time periods.
The control unit is further configured to access, through the memory interface, the memory allocated to the at least two processing units.
Via the memory sharing control device, different processing units can access the at least one memory in the memory pool in different time periods, so that memory resource requirements of the processing units can be met, and utilization of memory resources is improved.
Optionally, that at least one memory in the memory pool is accessible by different processing units in different time periods means that any two of the at least two processing units can separately access the at least one memory in the memory pool in different time periods. For example, the at least two processing units include a first processing unit and a second processing unit. In a first time period, a first memory in the memory pool is accessed by the first processing unit, and the second processing unit cannot access the first memory. In a second time period, the first memory in the memory pool is accessed by the second processing unit, and the first processing unit cannot access the first memory.
Optionally, the memory interface may be a double data rate (DDR) controller, or the memory interface may be a memory controller with a PCM control function.
Optionally, the memory sharing control device may separately allocate a memory from the memory pool to the at least two processing units based on a received control instruction sent by an operating system in the computer device. Specifically, a driver in the operating system may send, to the memory sharing control device over a dedicated channel, the control instruction used to allocate the memory in the memory pool to the at least two processing units. The operating system is implemented by the CPU in the computer device by executing related code. The CPU that runs the operating system has a privilege mode, and in this mode, the driver in the operating system can send the control instruction to the memory sharing control device over a dedicated channel or a specified channel.
Optionally, the memory sharing control device may be implemented by an FPGA chip, an ASIC, or another similar chip.
In some possible implementations, the processor interface is further configured to receive, via a serial bus, a first memory access request sent in a serial signal form by a first processing unit in the at least two processing units, where the first memory access request is used to access a first memory allocated to the first processing unit.
The serial bus has characteristics of high bandwidth and low latency. The first memory access request sent by the first processing unit in the at least two processing units in the serial signal form is received via the serial bus, so that efficiency of data transmission between the processing unit and the memory sharing control device can be ensured.
Optionally, the serial bus is a memory semantic bus. The memory semantic bus includes but is not limited to a QPI, PCIe, HCCS, or CXL protocol interconnect-based bus.
In some possible implementations, the processor interface is further configured to: convert the first memory access request into a second memory access request in a parallel signal form, and send the second memory access request to the control unit.
The control unit is further configured to access the first memory based on the second memory access request through the memory interface.
Optionally, the processor interface is the interface that can implement the conversion between the parallel signal and the serial signal, for example, may be the Serdes interface.
In some possible implementations, the control unit is further configured to establish a correspondence between a memory address of the first memory in the memory pool and the first processing unit, to allocate the first memory from the memory pool to the first processing unit.
Optionally, the correspondence between the memory address of the first memory and the first processing unit may be dynamically adjusted as required.
Optionally, the memory address of the first memory may be a segment of consecutive physical memory addresses in the memory pool, which simplifies management of the first memory. Certainly, the memory address of the first memory may alternatively be several segments of inconsecutive physical memory addresses in the memory pool.
Optionally, memory address information of the first memory includes a start address of the first memory and a size of the first memory. The first processing unit has a unique identifier, and establishing a correspondence between the memory address of the first memory and the first processing unit may be establishing a correspondence between the unique identifier of the first processing unit and the memory address information of the first memory.
In some possible implementations, the control unit is further configured to: virtualize a plurality of virtual memory devices from the memory pool, where a physical memory corresponding to a first virtual memory device in the plurality of virtual memory devices is the first memory; and allocate the first virtual memory device to the first processing unit.
Optionally, the virtual memory device corresponds to a segment of consecutive physical memory addresses in the memory pool, which simplifies management of the virtual memory device. Certainly, the virtual memory device may alternatively correspond to several segments of inconsecutive physical memory addresses in the memory pool.
Optionally, the first virtual memory device may be allocated to the first processing unit by establishing an access control table. For example, the access control table may include information such as the identifier of the first processing unit, an identifier of the first virtual memory device, and the start address and the size of the memory corresponding to the first virtual memory device. The access control table may further include permission information of accessing the first virtual memory device by the first processing unit, attribute information of a memory to be accessed (including but not limited to information about whether the memory is a persistent memory), and the like.
In some possible implementations, the control unit is further configured to: cancel a correspondence between the first virtual memory device and the first processing unit when a preset condition is met, and establish a correspondence between the first virtual memory device and a second processing unit in the at least two processing units.
Optionally, the correspondence between the virtual memory device and the processing unit may be dynamically adjusted based on a memory resource requirement of the at least two processing units.
The correspondence between the virtual memory device and the processing unit is dynamically adjusted, so that memory resource requirements of different processing units in different service scenarios can be flexibly adapted, and utilization of memory resources can be improved.
Optionally, the control unit is further configured to: adjust, based on a control instruction sent by a driver in an operating system, the correspondence between the virtual memory device and the processing unit.
In some possible implementations, the memory sharing control device further includes a cache unit.
The cache unit is configured to: cache data read by any one of the at least two processing units from the memory pool, or cache data evicted by any one of the at least two processing units.
Efficiency of accessing the memory data by the processing unit can be further improved by using the cache unit.
Optionally, the cache unit may include a level 1 cache and a level 2 cache. The level 1 cache may be a small-capacity cache with a read/write speed higher than that of the level 2 cache, for example, a 100-MB nanosecond-level cache. The level 2 cache may be a large-capacity cache with a read/write speed lower than that of the level 1 cache, for example, a 1-GB DRAM. Using both a level 1 cache and a level 2 cache increases the data access speed of the processor through the caches while also increasing cache space, expanding the range of memory that the processor can quickly access through the caches, and generally improving the memory access rate of the processor resource pool.
In some possible implementations, the memory sharing control device further includes a prefetch engine, and the prefetch engine is configured to: prefetch, from the memory pool, the data that needs to be read by any one of the at least two processing units, and cache the data in the cache unit.
Optionally, the prefetch engine may implement intelligent data prefetching by using a specified algorithm or an AI algorithm, to further improve efficiency of accessing the memory data by the processing unit.
In some possible implementations, the memory sharing control device further includes a quality of service (QoS) engine.
The QoS engine is configured to implement optimized storage, in the cache unit, of the data that needs to be cached by any one of the at least two processing units. By using the QoS engine, different capabilities of caching the memory data accessed by different processing units in the cache unit can be implemented. For example, a memory access request initiated by a processing unit with a high priority has exclusive cache space in the cache unit. In this way, it can be ensured that the data accessed by such a processing unit can be cached in time, so that service processing quality of this type of processing unit is ensured.
In some possible implementations, the memory sharing control device further includes a compression/decompression engine.
The compression/decompression engine is configured to: compress or decompress data related to memory access.
Optionally, a function of the compression/decompression engine may be disabled.
Optionally, the compression/decompression engine may compress, by using a compression algorithm at a granularity of 4 KB per page, data written by the processing unit into a memory, and then write the compressed data into the memory; or decompress compressed data when the processing unit reads the compressed data in the memory, and then send the decompressed data to the processor. In this way, a data transmission rate can be improved, and efficiency of accessing the memory data by the processing unit can be further improved.
Optionally, the memory sharing control device may further include a storage unit, where the storage unit includes software code of at least one of the QoS engine, the prefetch engine, and the compression/decompression engine. The memory sharing control device may read the code in the storage unit to implement a corresponding function.
Optionally, the at least one of the QoS engine, the prefetch engine, and the compression/decompression engine may be implemented by using control logic of the memory sharing control device.
According to a fourth aspect, this application provides a memory sharing control method, where the method is applied to a computer device, the computer device includes at least two processing units, a memory sharing control device, and a memory pool, the memory pool includes one or more memories, and the method includes:
The memory sharing control device receives a first memory access request sent by a first processing unit in the at least two processing units, where the processing unit is a processor, a core in a processor, or a combination of cores in a processor;
The memory sharing control device allocates a first memory from the memory pool to the first processing unit, where the first memory is accessible by a second processing unit in the at least two processing units in another time period.
The first processing unit accesses the first memory via the memory sharing control device.
According to the method, different processing units access the at least one memory in the memory pool in different time periods, so that memory resource requirements of the processing units can be met, and utilization of memory resources is improved.
In a possible implementation, the method further includes:
The memory sharing control device receives, via a serial bus, a first memory access request sent in a serial signal form by the first processing unit in the at least two processing units, where the first memory access request is used to access the first memory allocated to the first processing unit.
In a possible implementation, the method further includes:
The memory sharing control device converts the first memory access request into a second memory access request in a parallel signal form, and accesses the first memory based on the second memory access request.
In a possible implementation, the method further includes:
The memory sharing control device establishes a correspondence between a memory address of the first memory in the memory pool and the first processing unit in the at least two processing units.
In a possible implementation, the method further includes:
The memory sharing control device virtualizes a plurality of virtual memory devices from the memory pool, where a physical memory corresponding to a first virtual memory device in the plurality of virtual memory devices is the first memory.
The memory sharing control device further allocates the first virtual memory device to the first processing unit.
In a possible implementation, the method further includes:
The memory sharing control device cancels a correspondence between the first virtual memory device and the first processing unit when a preset condition is met, and establishes a correspondence between the first virtual memory device and the second processing unit in the at least two processing units.
In a possible implementation, the method further includes:
The memory sharing control device caches data read by any one of the at least two processing units from the memory pool, or caches data evicted by any one of the at least two processing units.
In a possible implementation, the method further includes:
The memory sharing control device prefetches, from the memory pool, the data that needs to be read by any one of the at least two processing units, and caches the data.
In a possible implementation, the method further includes:
The memory sharing control device controls optimized storage, in a cache storage medium, of the data that needs to be cached by any one of the at least two processing units.
In a possible implementation, the method further includes:
The memory sharing control device compresses or decompresses data related to memory access.
According to a fifth aspect, an embodiment of this application further provides a chip, and the chip is configured to implement a function implemented by the memory sharing control device according to the third aspect.
According to a sixth aspect, an embodiment of this application further provides a computer-readable storage medium, including program code. The program code includes instructions used to perform some or all of steps in any method provided in the fourth aspect.
According to a seventh aspect, an embodiment of this application further provides a computer program product. When the computer program product runs on a computer, the computer is enabled to perform any method according to the fourth aspect.
It may be understood that any memory sharing control device, computer-readable storage medium, or computer program product provided above is configured to perform a corresponding method provided above. Therefore, for an advantageous effect that can be achieved by the memory sharing control device, the computer-readable storage medium, or the computer program product, refer to an advantageous effect in the corresponding method. Details are not described herein again.
The following briefly describes the accompanying drawings required for describing embodiments. It is clear that the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative efforts.
The following describes embodiments of the present invention with reference to the accompanying drawings.
In the specification, claims, and accompanying drawings of this application, the terms “first”, “second”, and the like are intended to distinguish between similar objects but do not necessarily indicate a specific order or sequence. It should be understood that, the data termed in such a way is interchangeable in proper circumstances, so that embodiments described herein can be implemented in an order other than the order illustrated or described herein. In addition, the terms “first” and “second” are merely intended for a purpose of description, and shall not be understood as an indication or implication of relative importance or implicit indication of a quantity of indicated technical features. Therefore, a feature limited by “first” or “second” may explicitly or implicitly include one or more features.
In the specification and claims of this application, the terms “include”, “have” and any other variants mean to cover a non-exclusive inclusion, for example, a process, method, system, product, or device that includes a series of steps or modules is not necessarily limited to those expressly listed steps or modules, but may include other steps or modules not expressly listed or inherent to such a process, method, product, or device. Names or numbers of steps in this application do not mean that the steps in the method procedure need to be performed in a time/logical sequence indicated by the names or numbers. An execution sequence of the steps in the procedure that have been named or numbered can be changed based on a technical objective to be achieved, provided that same or similar technical effects can be achieved. Unit division in this application is logical division and may be other division during actual implementation. For example, a plurality of units may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the units may be implemented in electronic or other similar forms. This is not limited in this application. In addition, units or subunits described as separate components may or may not be physically separate, may or may not be physical units, or may be distributed into a plurality of circuit units. Some or all of the units may be selected depending on actual requirements to achieve the objectives of the solutions of this application.
It should be understood that the terms used in the descriptions of the various examples in the specification and claims of this application are merely intended to describe specific examples, but are not intended to limit the examples. The terms “one” (“a” and “an”) and “the” of singular forms used in the descriptions of various examples and the appended claims are also intended to include plural forms, unless otherwise specified in the context clearly.
It should also be understood that the term “and/or” used in the specification and claims of this application indicates and includes any or all possible combinations of one or more items in associated listed items. The term “and/or” describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate the following three cases: Only A exists, both A and B exist, and only B exists. In addition, the character “/” in this application usually indicates an “or” relationship between associated objects.
It should be understood that determining B based on A does not mean that B is determined based only on A. B may alternatively be determined based on A and/or other information.
It should be further understood that the term “include” (also referred to as “includes”, “including”, “comprises”, and/or “comprising”) used in this specification specifies presence of stated features, integers, steps, operations, elements, and/or components, but does not preclude presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should be further understood that the term “if” may be interpreted as a meaning “when” (“when” or “upon”), “in response to determining”, or “in response to detecting”. Similarly, according to the context, the phrase “if it is determined that” or “if (a stated condition or event) is detected” may be interpreted as a meaning of “when it is determined that”, “in response to determining”, “when (a stated condition or event) is detected”, or “in response to detecting (a stated condition or event)”.
It should be understood that “one embodiment”, “an embodiment”, and “a possible implementation” mentioned in the entire specification mean that particular features, structures, or characteristics related to an embodiment or the implementations are included in at least one embodiment of this application. Therefore, “in one embodiment”, “in an embodiment”, or “in a possible implementation” appearing throughout this specification does not necessarily mean a same embodiment. In addition, these specified features, structures, or characteristics may be combined in one or more embodiments in any proper manner.
First, some terms and related technologies in this application are explained and described, to facilitate understanding.
A memory controller is an important component that controls the memory inside a computer system and implements data exchange between the memory and a processor, and is a bridge for communication between the central processing unit and the memory. The memory controller is mainly configured to perform read and write operations on the memory, and may be roughly classified into a conventional memory controller and an integrated memory controller. In a conventional computer system, the memory controller is located in the northbridge chip of the mainboard chipset. In this structure, any data transmission between the CPU and the memory passes through the path "CPU-northbridge-memory-northbridge-CPU"; when the CPU reads or writes memory data, multi-level data transmission is required, causing long latency. The integrated memory controller is located inside the CPU, and data transmission between the CPU and the memory passes through the path "CPU-memory-CPU". In comparison with the conventional memory controller, latency of data transmission is greatly reduced.
A DRAM is a widely used memory medium. Unlike a disk medium, which is accessed sequentially, the DRAM allows the central processing unit to directly and randomly access any byte of the memory. The DRAM has a simple storage structure: each storage cell mainly includes a capacitor and a transistor. When the capacitor is charged, data "1" is stored; a state after the capacitor discharges completely represents data "0".
A PCM is a non-volatile memory that stores information based on a phase change storage material. Each storage unit in the PCM includes a phase change material (for example, a chalcogenide glass) and two electrodes. The phase change material can be converted between a crystalline state and an amorphous state by changing the voltage of the electrodes and the power-on time. In the crystalline state, the medium has low resistance; in the amorphous state, it has high resistance. Therefore, data may be stored by changing the state of the phase change material. The most typical characteristic of the PCM is non-volatility.
A serializer/deserializer (Serdes) converts parallel data into serial data at a transmit end and transmits the serial data to a receive end through a transmission line, where the serial data is converted back into parallel data, so that the quantity of transmission lines can be reduced, and system cost is reduced. The Serdes is a time division multiplexing (TDM), point-to-point communication technology. To be specific, a plurality of low-speed parallel signals (namely, parallel data) at the transmit end are converted into high-speed serial signals (namely, serial data), and the high-speed serial signals are then reconverted into low-speed parallel signals at the receive end through a transmission medium. The Serdes uses differential signals for transmission, so that interference and noise loaded on the two differential transmission lines cancel each other. This improves the transmission speed and also improves signal transmission quality. A parallel interface technology transmits multi-bit data in parallel, with a synchronous clock transmitted to delimit the data bytes; this manner is simple and easy to implement, but is usually used for short-range data transmission because a large quantity of signal lines is required. A serial interface technology is widely applied in long-distance data communication to transmit byte data bit by bit.
With continuous improvement of integrated circuit technologies, and especially of processor architecture design, processor performance has steadily improved. In comparison, memory performance has improved much more slowly. As this gap accumulates, the memory access speed falls seriously behind the computing speed of the processor, and the resulting memory bottleneck makes it difficult to exploit the advantage of a high-performance processor. For example, the memory access speed has become a major restriction on high performance computing (HPC).
In addition, multi-core processors have gradually replaced single-core processors, and the number of accesses to a memory (for example, an off-chip memory, also referred to as a main memory) made by a plurality of cores in a processor executing in parallel also greatly increases. This leads to a corresponding increase in the bandwidth requirement between the processor and the memory.
An access speed and bandwidth between the processor and the memory are usually improved by sharing memory resources.
Depending on whether there is a difference in processor-to-memory access, an architecture in which a plurality of processors share a memory may be divided into a centralized memory sharing system and a distributed memory sharing system. The centralized memory sharing system has features of a small quantity of processors and a single interconnection manner, and the memory is connected to all the processors via a cross switch or a shared bus.
The centralized memory sharing system has a single memory system, and is therefore faced with the problem that the required memory access bandwidth cannot be provided after the quantity of processors reaches a specified scale. This becomes a bottleneck that restricts performance. The distributed memory sharing system effectively resolves this problem.
In the non-uniform memory access (NUMA) system, the address space of the shared memory is managed by the respective processors. Due to a lack of a unified memory management mechanism, when a processor needs to use memory space managed by another processor, the memory resources cannot be shared flexibly, and utilization of the memory resources is low. In addition, when a processor accesses memory address space that it does not manage, long latency is usually caused because the access crosses the interconnect bus.
Embodiments of this application provide a memory sharing control device, a chip, a computer device, a system, and a method, and provide a new memory access architecture, in which a bridge for access between a processor and a shared memory pool (which may also be briefly referred to as a memory pool in embodiments) is established via a memory sharing control device, to improve utilization of memory resources.
The memory sharing control device 200 may be a chip located between a processor (a CPU or a core in a CPU) and a memory (also referred to as a main memory) in a computer device, for example, may be an FPGA chip.
The processor interface 202 is an interface through which the memory sharing control device 200 is connected to the processor 210. The interface can receive a serial signal sent by the processor, and convert the serial signal into a parallel signal. Based on the processor interface 202, the memory sharing control device 200 may be connected to the processor 210 via a serial bus. The serial bus has characteristics of high bandwidth and low latency, to ensure efficiency of data transmission between the processor 210 and the memory sharing control device 200. For example, the processor interface 202 may be a low latency-based Serdes interface. The Serdes interface serving as the processor interface 202 is connected to the processor via the serial bus, to implement conversion between the serial signal and the parallel signal based on serial-to-parallel logic. The serial bus may be a memory semantic bus. The memory semantic bus includes but is not limited to a QPI, PCIe, HCCS, or CXL protocol interconnect-based bus.
During specific implementation, the processor 210 may be connected to the serial bus through the Serdes interface, and is connected to the processor interface 202 (for example, the Serdes interface) of the memory sharing control device 200 via the serial bus. A memory access request initiated by the processor 210 is a memory access request in a parallel signal form. The memory access request in the parallel signal form is converted into a memory access request in a serial signal form through the Serdes interface in the processor 210, and the memory access request in the serial signal form is sent via the serial bus. After receiving the memory access request in the serial signal form from the processor 210 via the serial bus, the processor interface 202 converts the memory access request in the serial signal form into the memory access request in the parallel signal form, and sends the memory access request obtained through conversion to the control unit 201. The control unit 201 may access a corresponding memory based on the memory access request in the parallel signal form, for example, in a parallel manner. In this embodiment of this application, a parallel signal may be a signal that transmits a plurality of bits at a time, and a serial signal may be a signal that transmits one bit at a time.
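The bit-at-a-time idea behind this conversion can be illustrated with the following toy C program; real Serdes logic additionally performs encoding and clock recovery, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Transmit end: emit bit i of a 32-bit parallel word, one bit per cycle. */
static int serialize_bit(uint32_t word, int i)
{
    return (word >> i) & 1;
}

/* Receive end: accumulate received bits back into a parallel word. */
static uint32_t deserialize(const int bits[32])
{
    uint32_t word = 0;
    for (int i = 0; i < 32; i++)
        word |= (uint32_t)bits[i] << i;
    return word;
}

int main(void)
{
    uint32_t request = 0xDEADBEEF; /* an example request word */
    int wire[32];                  /* the serial line, one bit per cycle */
    for (int i = 0; i < 32; i++)
        wire[i] = serialize_bit(request, i);
    assert(deserialize(wire) == request); /* parallel form is recovered */
    return 0;
}
```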
Similarly, when the memory sharing control device 200 returns a response message of the memory access request to the processor 210, the response message in the parallel signal form is converted into a response message in the serial signal form through the processor interface 202 (for example, the Serdes interface), and the response message in the serial signal form is sent to the processor 210 via the serial bus. After receiving the response message in the serial signal form, the processor 210 converts the response message in the serial signal form into a parallel signal, and then performs subsequent processing.
The memory sharing control device 200 may access a corresponding memory in the memory 220 through the memory interface 203 used as a memory controller. For example, when the memory 220 is a shared memory pool including the DRAM, the memory interface 203 is a DDR controller having a DRAM control function, and is configured to implement interface control of a DRAM storage medium. When the memory 220 is a shared memory pool including the PCM, the memory interface 203 is a memory controller having a PCM control function, and is configured to implement interface control of a PCM storage medium.
It should be noted that one processor 210 shown in
One memory 220 shown in
The control unit 201 is configured to control memory access based on the memory access request, including but not limited to dividing memory resources in the shared memory pool into a plurality of independent memory resources, and separately allocating (for example, allocating on demand) the plurality of independent memory resources to the processing units in the processor resource pool. The independent memory resources obtained through division by the control unit 201 may be memory storage space corresponding to a segment of physical addresses in the shared memory pool. The physical addresses of the memory resources may be consecutive or inconsecutive. For example, the memory sharing control device 200 may virtualize a plurality of virtual memory devices based on the shared memory pool, and each virtual memory device corresponds to or manages some memory resources. The control unit 201 respectively allocates, by establishing a correspondence between different virtual memory devices and the processing units, the plurality of independent memory resources obtained through division in the shared memory pool to the processing units in the processor resource pool.
However, a correspondence between the processing unit and the memory resource is not fixed. When a specific condition is met, the correspondence may be adjusted. That is, the correspondence between the processing unit and the memory resource may be dynamically adjusted. That the control unit 201 adjusts the correspondence between the processing unit and the memory resource may include: receiving a control instruction sent by a driver in an operating system, and adjusting the correspondence based on the control instruction. The control instruction includes information about deleting, modifying, or adding the correspondence.
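A hypothetical encoding of such a control instruction is sketched below; the operation codes and fields are assumptions for illustration, and how the dedicated channel carries them is device-specific.

```c
#include <stdint.h>

enum ctrl_op { CTRL_ADD, CTRL_MODIFY, CTRL_DELETE };

/* One control instruction sent by the operating-system driver to adjust
 * the correspondence between processing units and memory resources. */
struct ctrl_instruction {
    enum ctrl_op op;  /* add, modify, or delete a correspondence */
    uint32_t pu_id;   /* processing unit the instruction refers to */
    uint32_t vmd_id;  /* virtual memory device to bind or unbind */
};
```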
For example, a computer device 20 (not shown in the figure) includes the processor 210, the memory sharing control device 200, and the memory 220 shown in
The driver in the operating system may send the control instruction to the memory sharing control device 200 over a dedicated channel or a specified channel. Specifically, when the processor that runs the operating system is in a privilege mode, the driver in the operating system can send the control instruction to the memory sharing control device 200 over the dedicated channel or specified channel. In this way, the driver in the operating system may send, over the dedicated channel, a control instruction for deleting, changing, or adding the correspondence.
The memory sharing control device 200 may be connected to the processor 210 through an interface (for example, the Serdes interface) that supports serial-to-parallel conversion. The processor 210 can communicate with the memory sharing control device 200 via the serial bus. Based on the characteristics of the high bandwidth and the low latency of the serial bus, even if the communication distance between the processor 210 and the memory sharing control device 200 is relatively long, the rate at which the processor 210 accesses the shared memory pool can still be ensured.
In addition, the control unit 201 may be further configured to implement data buffering control, data compression control, data priority control, or the like. Therefore, efficiency and quality of accessing the memory by the processor are further improved.
The following describes, by using an example in which an FPGA is used as a chip for implementing the memory sharing control device 200, an example of an implementation of the memory sharing control device 200 provided in this embodiment of this application.
As a programmable logic device, the FPGA may be classified into three types according to different principles of programmability: a static random access memory (SRAM)-based FPGA (SRAM-type FPGA), an anti-fuse-type FPGA, and a flash-type FPGA. Due to the erasability and volatility of the SRAM, the SRAM-type FPGA can be programmed repeatedly, but configuration data is lost upon a power failure. The anti-fuse-type FPGA can be programmed only once; after programming, its circuit function is fixed and cannot be modified again, and therefore does not change even when no power is supplied.
The following uses the SRAM-type FPGA as an example to describe an internal structure of the FPGA.
A configurable logic block (CLB) mainly includes internal programmable resources such as a lookup table (LUT), a multiplexer, a carry chain, and a D flip-flop. It is configured to implement different logic functions, and is the core of the entire FPGA chip.
A programmable input/output block (IOB) provides an interface between the FPGA and an external circuit and, when internal and external electrical characteristics of the FPGA differ, provides a proper drive for an input/output signal to implement matching. Electronic design automation (EDA) software can configure different electrical standards and physical parameters as required, for example, adjust the drive current and change the resistances of the pull-up and pull-down resistors. Usually, several IOBs are grouped into a bank, and FPGA chips of different series include a different quantity of IOBs in each bank.
A block random access memory (BRAM) is configured to store a large amount of data. To meet different data read/write requirements, the BRAM may be configured as a common storage structure such as a single-port RAM, a dual-port RAM, a content addressable memory (CAM), or a first in first out (FIFO) cache queue, and its storage bit width and depth can be changed based on design requirements. The BRAM extends the application scope of the FPGA and improves flexibility of the FPGA.
A switch matrix (SM) is an important part of the interconnection resources (IRs) inside the FPGA. Switch matrices are mainly distributed at the left end of each resource module; the matrices at the left ends of different modules are similar but not identical, and are configured to connect module resources. The other part of the interconnection resources inside the FPGA is wire segments. The wire segments and the SMs are used together to connect the resources of the entire chip.
The control unit 201 in
The processor interface 202 in
The encoder and the decoder encode and decode data, to ensure direct current balance of the serial data streams and as many signal transitions as possible. For example, an 8b/10b coding solution or a scrambling/descrambling solution may be used. The parallel-to-serial module and the serial-to-parallel module convert data between a parallel form and a serial form. A clock generation circuit generates a conversion clock for the parallel-to-serial circuit and is usually implemented by a phase locked loop. A clock recovery circuit provides a conversion control signal for the serial-to-parallel circuit; it is usually also implemented by a phase locked loop, but may alternatively be implemented by a phase interpolator or the like.
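To make the scrambling option concrete, the following is a minimal C sketch of an additive LFSR scrambler: the data bits are XORed with a pseudo-random sequence so that even a long run of identical bits yields a transition-rich, roughly DC-balanced serial stream. The 7-bit polynomial, the seed, and all names are illustrative assumptions rather than the coding scheme actually used by the Serdes interface; the receiver descrambles by XORing with an identically seeded LFSR.

```c
#include <stdint.h>
#include <stdio.h>

/* Additive LFSR scrambler sketch (polynomial x^7 + x^6 + 1, chosen only
 * for illustration). XORing data with the LFSR output randomizes the
 * stream, ensuring frequent signal transitions and DC balance. */
static uint8_t lfsr = 0x7F; /* any non-zero seed works */

static uint8_t scramble_bit(uint8_t data_bit)
{
    uint8_t fb = (uint8_t)(((lfsr >> 6) ^ (lfsr >> 5)) & 1u);
    lfsr = (uint8_t)(((lfsr << 1) | fb) & 0x7Fu);
    return data_bit ^ fb;
}

int main(void)
{
    /* A run of 16 zero bits becomes a pseudo-random pattern. */
    for (int i = 0; i < 16; i++)
        printf("%u", scramble_bit(0));
    printf("\n");
    return 0;
}
```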
The foregoing merely describes an example of an implementation of the Serdes interface. That is, the Serdes interface shown in
The memory interface 203 in
A control module 502 is configured to control initialization, power-off, and the like of a memory. In addition, the control module 502 may further control the depth of a memory queue used for memory access control, determine whether the memory queue is empty or full, determine whether a memory request is completed, determine an arbitration solution to be used, determine a scheduling manner to be used, and the like.
An address mapping module 503 is configured to implement conversion between an address in an access request and an address identifiable by the memory. For example, a memory address in a DDR4 memory system consists of six parts: Channel, Rank, Bankgroup, Bank, Row, and Column. Different address mapping manners yield different access efficiency.
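As a hedged illustration of such address mapping, the sketch below slices a physical address into the six DDR4 fields named above. The bit widths and field order are assumptions chosen for the example; a real controller picks a mapping that spreads consecutive accesses across channels and bank groups.

```c
#include <stdint.h>
#include <stdio.h>

/* One possible (assumed) DDR4 address layout, from low to high bits:
 * channel | column | bank | bank group | rank | row. */
typedef struct {
    uint32_t channel, column, bank, bankgroup, rank, row;
} Ddr4Addr;

static Ddr4Addr map_address(uint64_t phys)
{
    Ddr4Addr a;
    a.channel   = (uint32_t)(phys & 0x1);            /* 1 bit   */
    a.column    = (uint32_t)((phys >> 1) & 0x3FF);   /* 10 bits */
    a.bank      = (uint32_t)((phys >> 11) & 0x3);    /* 2 bits  */
    a.bankgroup = (uint32_t)((phys >> 13) & 0x3);    /* 2 bits  */
    a.rank      = (uint32_t)((phys >> 15) & 0x1);    /* 1 bit   */
    a.row       = (uint32_t)((phys >> 16) & 0xFFFF); /* 16 bits */
    return a;
}

int main(void)
{
    Ddr4Addr a = map_address(0x12345678ULL);
    printf("ch=%u col=%u bank=%u bg=%u rank=%u row=%u\n",
           a.channel, a.column, a.bank, a.bankgroup, a.rank, a.row);
    return 0;
}
```

Interleaving the channel on the lowest bit, as assumed here, makes adjacent cache lines alternate between channels; other orderings trade that parallelism for better row-buffer locality, which is why different mappings yield different access efficiency.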
A refresh module 504 is configured to implement scheduled refresh on the memory.
A DRAM consists of many repeated cells, and each cell includes a transistor (MOSFET) and a capacitor. The capacitor stores a charge that determines whether the logical state of the DRAM cell is 1 or 0. However, because the capacitor leaks, the charge is gradually lost over time, and consequently the data would be lost. Therefore, the refresh module 504 needs to perform scheduled refresh.
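As a rough worked example of the scheduling involved: if every cell must be refreshed within a retention window and rows are refreshed one at a time, the controller issues one refresh about every retention/rows interval. The 64 ms window and 8192 rows below are common textbook values used purely as assumptions, not values specified by this application.

```c
#include <stdio.h>

/* Derive the average refresh interval from an assumed 64 ms retention
 * window and 8192 rows: 64000 us / 8192 ~= 7.8 us per refresh. */
#define RETENTION_US 64000.0
#define ROW_COUNT    8192.0

int main(void)
{
    printf("issue one refresh about every %.2f us\n",
           RETENTION_US / ROW_COUNT);
    return 0;
}
```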
A scheduling module 505 is configured to place the access requests sent by the address mapping module 503 into different queues based on the request types, where the queues are memory access control queues. For example, the scheduling module may place an access request into a high-priority queue, and select the highest-priority request from the highest-priority queue according to a preset scheduling policy, to complete one round of scheduling. The scheduling policy may be determined based on the time sequence in which requests arrive, where an earlier arrival time indicates a higher priority; or the scheduling policy may be determined based on which request is ready first.
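The following minimal C sketch illustrates the queue-based scheduling just described: requests are placed into fixed-priority queues, and each round serves the oldest request from the highest-priority non-empty queue, so earlier arrival wins within a priority level. The queue count, depth, and all names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES  4   /* queue 0 has the highest priority */
#define QUEUE_DEPTH 16

typedef struct {
    uint64_t addr;
    bool     is_write;
} MemRequest;

typedef struct {
    MemRequest entries[QUEUE_DEPTH]; /* ring buffer, arrival order */
    size_t     head, count;
} RequestQueue;

static RequestQueue queues[NUM_QUEUES];

static bool enqueue(int q, MemRequest r)
{
    RequestQueue *rq = &queues[q];
    if (rq->count == QUEUE_DEPTH)
        return false; /* queue full */
    rq->entries[(rq->head + rq->count) % QUEUE_DEPTH] = r;
    rq->count++;
    return true;
}

/* One scheduling round: pop the oldest request from the
 * highest-priority non-empty queue. */
static bool schedule_next(MemRequest *out)
{
    for (int q = 0; q < NUM_QUEUES; q++) {
        if (queues[q].count > 0) {
            *out = queues[q].entries[queues[q].head];
            queues[q].head = (queues[q].head + 1) % QUEUE_DEPTH;
            queues[q].count--;
            return true;
        }
    }
    return false; /* all queues empty */
}

int main(void)
{
    enqueue(2, (MemRequest){0x1000, false});
    enqueue(0, (MemRequest){0x2000, true});
    MemRequest r;
    while (schedule_next(&r))
        ; /* 0x2000 (queue 0) is served before 0x1000 (queue 2) */
    return 0;
}
```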
It should be noted that
The foregoing describes an implementation of the memory sharing control device 200 by using the FPGA as an example. During specific implementation, the memory sharing control device 200 may alternatively be implemented by using another chip or another device that can implement a similar chip function. For example, the memory sharing control device 200 may alternatively be implemented by using an ASIC. The circuit functions of an ASIC are defined at design time, and the ASIC features high chip integration, suitability for volume production, low per-chip cost in volume, a small size, and the like. A specific hardware implementation of the memory sharing control device 200 is not limited in this embodiment of this application.
In this embodiment of this application, a processor connected to the memory sharing control device 200 may be any processor that implements a processor function.
It may be understood that
The following further describes a specific implementation of the memory sharing control device provided in this embodiment of this application.
Specifically, the control unit 301 in
The memory resources connected to the memory sharing control device 300 form a shared memory pool. The control unit 301 may perform unified addressing on the memory resources in the shared memory pool, and divide the unified physical memory address space into several address segments, where each address segment corresponds to one virtual memory device. The address space sizes of the address segments obtained through division may be the same or different. In other words, the sizes of the virtual memory devices may be the same or different.
The virtual memory device is not a device that actually exists, but a segment of memory address space in the shared memory pool that the control unit 301 is configured to identify. The segment of address space is allocated to a processing unit (which may be a processor, a core in a processor, a combination of different cores in a same processor, or a combination of cores in different processors) for memory access (for example, data read/write), and is therefore referred to as a virtual memory device. For example, each virtual memory device may correspond to a segment of memory with consecutive physical addresses. Optionally, one virtual memory device may alternatively correspond to non-consecutive physical address space.
The control unit 301 may allocate one identifier to each virtual memory device, to identify different virtual memory devices.
The control unit 301 may allocate a virtual memory device to a processing unit. To avoid complex logic and a possible traffic storm, when allocating virtual memory devices, the control unit 301 avoids allocating one virtual memory device to a plurality of processors, or to a plurality of cores in one processor. However, for some services in which different cores of the same processor, or cores of different processors, need to execute computing tasks in parallel, the memory corresponding to one virtual memory device may be allocated to a combination of cores, so that service processing efficiency during parallel computing can be improved.
A manner in which the control unit 301 allocates a virtual memory device may be to establish a correspondence between the identifier of the virtual memory device and an identifier of a processing unit. For example, the control unit 301 establishes correspondences between the virtual memory devices and different processing units based on the quantity of processing units connected to the memory sharing control device 300. Optionally, the control unit 301 may alternatively establish a correspondence between the processing units and the virtual memory devices, and a correspondence between the virtual memory devices and different memory resources, thereby establishing a correspondence between the processing units and the different memory resources.
3. Record the correspondence between the virtual memory devices and the allocated processing units.
During specific implementation, the control unit 301 may maintain an access control table (also referred to as a mapping table) to record the correspondence between the virtual memory devices and the processing units. An implementation of the access control table may be shown in Table 1.
In Table 1, Device_ID represents the identifier of a virtual memory device, Address represents the start address of the physical memory managed by or accessible through the virtual memory device, Size represents the size of the memory managed by or accessible through the virtual memory device, Access Attribute represents the access manner, specifically a read operation or a write operation, and Resource_ID represents the identifier of a processing unit.
In Table 1, one Resource_ID usually corresponds to one processing unit. Because a processing unit may be a processor, a core in a processor, a combination of a plurality of cores in a processor, or a combination of a plurality of cores in different processors, the control unit 301 may further maintain a correspondence table between Resource_ID and combinations of cores, to determine the cores or the processor corresponding to each processing unit. For example, Table 2 shows an example of the correspondence between Resource_ID and cores.
In a computer device, the cores in different processors have unified identifiers. Therefore, the core IDs in Table 2 can be used to distinguish between cores in different processors. It may be understood that Table 2 merely shows an example of the correspondence between the Resource_ID of a processing unit and the corresponding cores or processor. The manner in which the memory sharing control device 300 determines the correspondence between Resource_ID and the corresponding cores or processor is not limited in this embodiment of this application.
In another implementation, if the memory connected to the memory sharing control device 300 includes a DRAM and a PCM, then, owing to the non-persistent characteristic of the DRAM storage medium and the persistent characteristic of the PCM storage medium, the access control table maintained by the control unit 301 may further record whether each virtual memory device is a persistent virtual memory device or a non-persistent virtual memory device.
Table 3 shows an implementation of another access control table according to an embodiment of this application.
In Table 3, Persistent Attribute represents a persistent attribute of a virtual memory device, in other words, represents whether memory address space corresponding to the virtual memory device is persistent or non-persistent.
Optionally, the access control table maintained by the control unit 301 may further include other information for further memory access control. For example, the access control table may further include permission information of accessing the virtual memory device by the processing unit, where the permission information includes but is not limited to read-only access or write-only access.
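To make the access control table concrete, the following is a minimal C sketch of the entries described in Table 1 and Table 3, together with a lookup by Resource_ID. All type, field, and function names are illustrative assumptions, not the application's actual implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of one access control table entry, mirroring the columns of
 * Table 1 plus the persistent attribute of Table 3. */
typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_READ_WRITE } AccessAttr;

typedef struct {
    uint32_t   device_id;   /* identifier of the virtual memory device  */
    uint64_t   address;     /* start of the managed physical address    */
    uint64_t   size;        /* size of the managed memory               */
    AccessAttr attr;        /* permitted access manner                  */
    bool       persistent;  /* persistent (PCM) or not (DRAM)           */
    uint32_t   resource_id; /* identifier of the owning processing unit */
} AccessControlEntry;

#define TABLE_CAP 64
static AccessControlEntry table_[TABLE_CAP];
static size_t table_len;

/* Find the virtual memory device currently allocated to a processing
 * unit; returns NULL if the unit owns none. */
static AccessControlEntry *lookup_by_resource(uint32_t resource_id)
{
    for (size_t i = 0; i < table_len; i++)
        if (table_[i].resource_id == resource_id)
            return &table_[i];
    return NULL;
}

int main(void)
{
    table_[table_len++] = (AccessControlEntry){
        .device_id = 1, .address = 0x100000000ULL, .size = 1ULL << 30,
        .attr = ACCESS_READ_WRITE, .persistent = false, .resource_id = 7,
    };
    return lookup_by_resource(7) == NULL; /* 0: unit 7 owns device 1 */
}
```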
4. When a memory access request sent by a processing unit is received, determine, based on the correspondence between the virtual memory devices and the processing units recorded in the access control table, the virtual memory device corresponding to the processing unit that sends the memory access request, and access the corresponding memory based on the determined virtual memory device.
For example, a memory access request includes information such as the RESOURCE_ID of a processing unit, address information, and an access attribute. RESOURCE_ID is the ID of a combination of cores, the address information identifies the memory to be accessed, and the access attribute indicates whether the memory access request is a read request or a write request. The control unit 301 may query an access control table (for example, Table 1) based on RESOURCE_ID, to determine at least one virtual memory device corresponding to RESOURCE_ID. For example, the determined virtual memory device is the virtual memory device a shown in
It should be noted that the access control performed by the control unit 301 on a virtual memory device is, in effect, access control over the physical address space of the memory resource corresponding to that virtual memory device.
5. Dynamically adjust the correspondence between the virtual memory device and the processing unit.
The control unit 301 may dynamically adjust a virtual memory device by changing the correspondence between the processing units and the virtual memory devices in the access control table based on a preset condition (for example, different processing units have different requirements for memory resources). For example, the control unit 301 deletes the correspondence between a virtual memory device and a processing unit, in other words, releases the memory resource corresponding to the virtual memory device, and the released memory resource may then be allocated to another processing unit for memory access. Specifically, this may be implemented with reference to the manner in which the control unit 201 dynamically adjusts the correspondence to delete, modify, or add a correspondence in
In an optional implementation, adjustment of the correspondence between the processing units and the memory resources in the shared memory pool may alternatively be implemented by changing the memory resource corresponding to each virtual memory device. For example, when a service processed by a processing unit is in a dormant state and does not need to occupy much memory, a memory resource managed by the virtual memory device corresponding to that processing unit may be allocated to a virtual memory device corresponding to another processing unit, so that the same memory resource is accessed by different processing units in different time periods.
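Continuing the access-control-table sketch above, dynamically adjusting the correspondence can be as simple as rewriting the Resource_ID field of an entry; the function below is a hedged illustration of that idea, not the application's actual logic.

```c
/* Continuation of the AccessControlEntry sketch above: re-pointing a
 * virtual memory device at a different processing unit is a single
 * field rewrite in the table. Name and policy are assumptions. */
static bool reassign_device(uint32_t device_id, uint32_t new_resource_id)
{
    for (size_t i = 0; i < table_len; i++) {
        if (table_[i].device_id == device_id) {
            /* From now on, only new_resource_id may access the memory
             * behind this virtual memory device. */
            table_[i].resource_id = new_resource_id;
            return true;
        }
    }
    return false; /* no such virtual memory device */
}
```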
For example, when the memory sharing control device 300 is implemented by using the FPGA chip shown in
It should be noted that the control unit 301 may virtualize the plurality of virtual memory devices, allocate them to the processing units connected to the memory sharing control device 300, and dynamically adjust the correspondence between the virtual memory devices and the processing units, all based on a received control instruction sent by a driver in an operating system over a dedicated channel. In other words, the driver in the operating system of the computer device in which the memory sharing control device 300 is located sends, over the dedicated channel, an instruction for virtualizing the virtual memory devices, allocating them to the processing units, and dynamically adjusting the correspondence between the virtual memory devices and the processing units, and the control unit 301 implements the corresponding functions based on the received control instruction.
The memory sharing control device 300 is connected to the processor via a serial bus through an interface that supports serial-to-parallel conversion (for example, the Serdes interface), so that long-distance transmission between the memory sharing control device 300 and the processor can be implemented while the speed of memory access by the processor is ensured. Therefore, the processor can quickly access the memory resources in the shared memory pool. Because the memory resources in the shared memory pool can be allocated to different processing units in different time periods for memory access, the utilization of the memory resources is improved.
For example, the control unit 301 in the memory sharing control device 300 can dynamically adjust the correspondence between virtual memory devices and processing units. When a processing unit requires more memory space, the control unit 301 can assign to it virtual memory devices that are unoccupied, or that have been allocated to other processing units but are temporarily idle, that is, establish a correspondence between these idle virtual memory devices and the processing unit that requires more memory. In this way, existing memory resources can be effectively utilized to meet different service requirements of the processing units. This not only satisfies the processing units' requirements for memory space in different service scenarios, but also improves the utilization of the memory resources.
The cache unit 304 may be a random access memory (RAM), and is configured to cache data that a processing unit needs to access during memory access. For example, data to be read by the processing unit may be read from the shared memory pool in advance and cached in the cache unit 304 so that the processing unit can access it quickly, further improving the rate at which the processing unit reads the data. The cache unit 304 may alternatively cache data evicted by the processing unit, for example, Cacheline data evicted by the processing unit. The speed at which the processing unit accesses memory data can thus be further improved by the cache unit 304.
In an optional implementation, the cache unit 304 may include a level 1 cache and a level 2 cache. As shown in
The level 1 cache 3041 may be a cache with a small capacity (for example, at the 100 MB level), may be a nanosecond-level SRAM medium, and caches the Cacheline data evicted from the processing unit.
The level 2 cache 3042 may be a cache with a large capacity (for example, at the 1 GB level), and may be a DRAM medium. The level 2 cache 3042 may cache, at a granularity of a 4 KB page, the Cacheline data evicted from the level 1 cache and data prefetched from a memory 220 (for example, a DDR or PCM medium). The Cacheline data is data in a cache. For example, a cache in the cache unit 304 includes three parts: a valid bit, a tag, and data bits; each row includes these three types of data, and one row forms one Cacheline. When initiating a memory access request, the processing unit matches the information in the memory access request against the corresponding bits in the cache, to read Cacheline data from the cache or write data into the cache.
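The following minimal C sketch shows the three-part Cacheline structure just described and the matching step on a read. The direct-mapped layout and the 64-byte line size are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Each line holds a valid bit, a tag, and a data block, matching the
 * three-part structure described above. Direct-mapped layout and the
 * 64-byte line size are assumptions for this example. */
#define LINE_SIZE 64
#define NUM_LINES 1024

typedef struct {
    bool     valid;
    uint64_t tag;
    uint8_t  data[LINE_SIZE];
} Cacheline;

static Cacheline lines[NUM_LINES];

/* Match an address against the cache: the index selects a line, and
 * the stored tag must agree for a hit. */
static bool cache_read(uint64_t addr, uint8_t out[LINE_SIZE])
{
    uint64_t index = (addr / LINE_SIZE) % NUM_LINES;
    uint64_t tag   = addr / LINE_SIZE / NUM_LINES;
    Cacheline *l = &lines[index];
    if (l->valid && l->tag == tag) {
        memcpy(out, l->data, LINE_SIZE); /* hit */
        return true;
    }
    return false; /* miss: fetch from the shared memory pool instead */
}

int main(void)
{
    /* Install one line, then a read of the same address hits. */
    lines[(0x1000 / LINE_SIZE) % NUM_LINES] =
        (Cacheline){ .valid = true, .tag = 0x1000 / LINE_SIZE / NUM_LINES };
    uint8_t buf[LINE_SIZE];
    return cache_read(0x1000, buf) ? 0 : 1;
}
```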
For example, when the memory sharing control device 300 is implemented by using the FPGA chip shown in
Because the cache unit 304 includes both the level 1 cache 3041 and the level 2 cache 3042, the caches not only improve the data access speed of the processing unit but also increase the cache space, expand the range of memory that the processing unit can quickly access through the caches, and generally further improve the memory access rate of the processor resource pool.
The program code stored in the storage unit 305 may include at least one of a QoS engine 306, a prefetch engine 307, and a compression/decompression engine 308.
The QoS engine 306 is configured to control, based on the RESOURCE_ID in a memory access request, the storage area in the cache unit 304 (the level 1 cache 3041 or the level 2 cache 3042) for the data to be accessed by the processing unit, so that memory data accessed by different processing units has different caching capabilities in the cache unit 304. For example, a memory access request initiated by a high-priority processing unit may have exclusive cache space in the cache unit 304. In this way, the data accessed by that processing unit is guaranteed to be cached in time, so that the service processing quality of this type of processing unit is ensured.
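One simple way to realize such exclusive cache space is way partitioning, sketched below: high-priority processing units allocate into a reserved region of the cache while others share the rest. The priority values, threshold, and masks are illustrative assumptions, not the application's QoS policy.

```c
#include <stdint.h>
#include <stdio.h>

/* Partition an 8-way cache by priority so that high-priority processing
 * units allocate into reserved ways. */
typedef struct {
    uint32_t resource_id;
    int      priority; /* higher value = higher priority */
} QosPolicy;

/* Decide which ways a request may allocate into: high-priority units
 * get an exclusive region; the rest share the remaining ways. */
static unsigned allowed_way_mask(const QosPolicy *p)
{
    if (p->priority >= 2)
        return 0xF0u; /* ways 4..7, exclusive to high priority */
    return 0x0Fu;     /* ways 0..3, shared by everyone else */
}

int main(void)
{
    QosPolicy hp = { .resource_id = 1, .priority = 3 };
    QosPolicy lp = { .resource_id = 2, .priority = 0 };
    printf("high: 0x%02X low: 0x%02X\n",
           allowed_way_mask(&hp), allowed_way_mask(&lp));
    return 0;
}
```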
The prefetch engine 307 is configured to prefetch memory data based on a specific algorithm, that is, to fetch in advance the data to be read by the processing unit. Different prefetch manners affect prefetch precision and memory access efficiency. The prefetch engine 307 implements prefetching with higher precision based on the specified algorithm, to further improve the hit rate when the processing unit accesses memory data. For example, the prefetching implemented by the prefetch engine 307 includes but is not limited to prefetching Cachelines from the level 2 cache to the level 1 cache, or prefetching data from an external DRAM or PCM to the cache.
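The application does not name the prefetch algorithm; as one common possibility, the sketch below implements simple stride detection: when the same address stride is observed twice in a row, the next address is predicted and can be prefetched.

```c
#include <stdint.h>

/* Track the last address and stride per stream; a repeated stride
 * yields a prediction. The algorithm choice is an assumption. */
typedef struct {
    uint64_t last_addr;
    int64_t  last_stride;
} StrideState;

static uint64_t on_access(StrideState *s, uint64_t addr)
{
    int64_t stride = (int64_t)(addr - s->last_addr);
    uint64_t predict = 0;
    if (stride != 0 && stride == s->last_stride)
        predict = addr + (uint64_t)stride; /* stable stride: prefetch */
    s->last_stride = stride;
    s->last_addr = addr;
    return predict; /* 0 means no stable pattern yet */
}

int main(void)
{
    StrideState s = {0, 0};
    (void)on_access(&s, 0x1000);
    (void)on_access(&s, 0x1040);           /* stride 0x40 observed */
    uint64_t next = on_access(&s, 0x1080); /* stride repeats       */
    return next == 0x10C0 ? 0 : 1;         /* predicts 0x10C0      */
}
```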
The compression/decompression engine 308 is configured to compress or decompress memory access data, for example, compress, by using a compression algorithm and at a granularity of a 4 KB page, the data written by the processing unit into a memory and then write the compressed data into the memory; or, when the processing unit reads compressed data in the memory, decompress the data to be read and then send the decompressed data to the processing unit. Optionally, the compression/decompression engine 308 may be disabled; when it is disabled, no compression or decompression is performed when the processing unit accesses the data in the memory.
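As an illustration of page-granularity compression, the sketch below compresses a 4 KB page on the write path and decompresses it on the read path using zlib's compress()/uncompress(); the use of zlib is purely an assumption for the example, not an algorithm specified by this application.

```c
#include <stdio.h>
#include <string.h>
#include <zlib.h> /* build with -lz; zlib is an illustrative choice */

#define PAGE_SIZE 4096

int main(void)
{
    unsigned char page[PAGE_SIZE];
    memset(page, 'A', sizeof(page)); /* a highly compressible page */

    /* Write path: compress the 4 KB page before storing it. */
    unsigned char packed[PAGE_SIZE * 2];
    uLongf packed_len = sizeof(packed);
    if (compress(packed, &packed_len, page, PAGE_SIZE) != Z_OK)
        return 1;

    /* Read path: decompress before returning data to the unit. */
    unsigned char restored[PAGE_SIZE];
    uLongf restored_len = sizeof(restored);
    if (uncompress(restored, &restored_len, packed, packed_len) != Z_OK)
        return 1;

    printf("4096 -> %lu bytes compressed\n", (unsigned long)packed_len);
    return memcmp(page, restored, PAGE_SIZE) != 0; /* 0 on round trip */
}
```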
The QoS engine 306, the prefetch engine 307, and the compression/decompression engine 308 described above are stored in the storage unit 305 as software modules, and the control unit 301 reads the corresponding code in the storage unit to implement the corresponding functions. In an optional implementation, at least one of the QoS engine 306, the prefetch engine 307, and the compression/decompression engine 308 may alternatively be configured directly in the control unit 301 and implemented through the control logic of the control unit 301. In this way, the control unit 301 executes the related control logic to implement the related functions, without needing to read the code in the storage unit 305. For example, when the memory sharing control device 300 is implemented by using the FPGA chip shown in
It may be understood that some of the QoS engine 306, the prefetch engine 307, and the compression/decompression engine 308 may be implemented directly by the control unit 301, while the rest are stored in the storage unit 305, with the control unit 301 reading the software code in the storage unit 305 to execute the corresponding functions. For example, the QoS engine 306 and the prefetch engine 307 are implemented directly through the control logic of the control unit 301, while the compression/decompression engine 308 is software code stored in the storage unit 305, and the control unit 301 reads the software code of the compression/decompression engine 308 in the storage unit 305 to implement its function.
For example, when the memory sharing control device 300 is implemented by using the FPGA chip shown in
An example in which the memory resources connected to the memory sharing control device 300 include a DRAM and a PCM is used below to describe example implementations of memory access performed by the memory sharing control device 300.
In an optional implementation, in a horizontal architecture shown in
It should be noted that, although the cache (the level 1 cache 3041 or the level 2 cache 3042), the QoS engine 306, the prefetch engine 307, and the compression/decompression engine 308 are included in
Based on features of different architectures in
In
Optionally, the memory sharing control device 800a in
Optionally, the memory sharing control device 800a in
The memory sharing control devices 80a in
In the computer device 80a or the computer device 80b, the plurality of processors can quickly access the shared memory pool via the memory sharing control device, which improves the utilization of the memory resources in the shared memory pool. In addition, because the network adapter 830 is connected to the bus through the serial interface, and the data transmission latency between the processor and the network adapter does not increase significantly as the distance increases, the computer device 80a or the computer device 80b may extend, via the memory sharing control device and the network adapter, the memory resources accessible by the processor to those of another device connected to the computer device 80a or the computer device 80b. Therefore, the range of memory resources that can be shared by the processor is further expanded, so that the memory resources are shared in a larger range and their utilization is further improved.
It may be understood that the computer device 80a may alternatively include processors having no local memories. These processors access the shared memory pool via the memory sharing control device 800a, to implement memory access. The computer device 80b may alternatively include processors having local memories. The processors may access the local memories, or may access the memories in the shared memory pool via the memory sharing control device 800b. Optionally, when some processors of the computer device 80b have local memories, most memory access of these processors is implemented in the local memories.
In
The computer device 82a and another computer device M may have a structure similar to that of the computer device 80a. Details are not described again.
In the system 901, the processor 810a may access, via the memory sharing control device 800a, the network adapter 830a, the network 910a, the network adapter 8014a, and the memory sharing control device 8011a, the shared memory pool including the memory 8013a. In other words, the memory resources accessible by the processor 810a include the memory resources in the computer device 80a and the memory resources in the computer device 81a. In a similar manner, the processor 810a may access the memory resources of all the computer devices in the system 901. In this way, when a computer device (for example, the computer device 81a on which the processor 8012a runs) has a low service load and a large quantity of its memory 8013a is idle, but the processor 810a in the computer device 80a needs a large quantity of memory resources to execute an application such as HPC, the memory resources in the computer device 81a may be allocated to the processor 810a in the computer device 80a via the memory sharing control device 800a. In this way, the memory resources in the system 901 are effectively utilized. This not only meets the memory requirements of different computer devices for processing services, but also improves the utilization of the memory resources in the entire system, making the TCO reduction from improved memory utilization more pronounced.
It should be noted that, in the system 901 shown in
The computer device 82b and another computer device M may have a structure similar to that of the computer device 80b. Details are not described again.
In the system 902, the processor 810b may access, via the memory sharing control device 800b, the network adapter 830b, the network 910b, and the network adapter 8014b, the shared memory pool including the memory 8013b. In other words, the memory resources accessible by the processor 810b include the memory resources in the computer device 80b and the memory resources in the computer device 81b. In a similar manner, the processor 810b may access the memory resources in all the computer devices in the system 902, so that the memory resources in the system 902 are used as shared memory resources. In this way, when a computer device (for example, the computer device 81b on which the processor 8012 runs) has a low service load and a large quantity of its memory 8013b is idle, but the processor 810b in the computer device 80b needs a large quantity of memory resources to execute an application such as HPC, the memory resources in the computer device 81b may be allocated to the processor 810b in the computer device 80b via the memory sharing control device 800b. In this way, the memory resources in the system 902 are effectively utilized. This meets the memory requirements of different computer devices for processing services and improves the utilization of the memory resources in the system 902, making the TCO reduction from improved memory utilization more pronounced.
It should be noted that, in the system 902 shown in
It should be noted that, in the system 901 to the system 903, a computer device needs to transmit memory access requests over a network. Because the network adapter 830b is connected to the memory sharing control device via a serial bus through a Serdes interface, the transmission rate and bandwidth of the serial bus can sustain the data transmission rate. Therefore, although network transmission affects the data transmission rate to some extent, this manner improves the utilization of the memory resources while still taking the memory access rate of the processor into account.
The schematic logical diagram in which the computer device 80a in
The schematic logical diagram in which the system 902 shown in
That the at least two processing units 1102 are coupled to the memory sharing control device 1101 means that the at least two processing units 1102 are separately connected to the memory sharing control device 1101, and any one of the at least two processing units 1102 may be directly connected to the memory sharing control device 1101, or may be connected to the memory sharing control device 1101 via another hardware component (for example, another chip).
For specific implementations of the computer device 1100 shown in
The at least two processing units 1102 in the computer device 1100 shown in
Step 1200: The memory sharing control device receives a first memory access request sent by a first processing unit in the at least two processing units, where the processing unit is a processor, a core in a processor, or a combination of cores in a processor.
Step 1202: The memory sharing control device allocates a first memory from the memory pool to the first processing unit, where the first memory is accessible by a second processing unit in the at least two processing units in another time period.
Step 1204: The first processing unit accesses the first memory via the memory sharing control device.
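The following sketch, again building on the access-control-table sketch above, traces steps 1200 to 1204 in C; the first-fit allocation policy and the RESOURCE_NONE sentinel are illustrative assumptions, not the application's specified behavior.

```c
/* Continuation of the AccessControlEntry sketch above. RESOURCE_NONE
 * marks an unallocated entry; first-fit allocation is an assumed
 * policy. */
#define RESOURCE_NONE 0xFFFFFFFFu

static AccessControlEntry *allocate_first_memory(uint32_t resource_id)
{
    for (size_t i = 0; i < table_len; i++) {
        if (table_[i].resource_id == RESOURCE_NONE) {
            table_[i].resource_id = resource_id; /* step 1202 */
            return &table_[i];
        }
    }
    return NULL; /* no free memory in the pool right now */
}

/* Step 1200: a request arrives from a processing unit; step 1204: on
 * success, the unit accesses the memory behind the returned entry. */
static bool handle_request(uint32_t resource_id)
{
    AccessControlEntry *e = lookup_by_resource(resource_id);
    if (e == NULL)
        e = allocate_first_memory(resource_id);
    return e != NULL;
}
```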
Based on the method shown in
Specifically, the method shown in
A person of ordinary skill in the art may be aware that, with reference to the examples described in embodiments disclosed in this specification, units and method steps may be implemented by electronic hardware, computer software, or a combination thereof. To clearly describe the interchangeability between the hardware and the software, the foregoing has generally described compositions and steps of each example according to functions. Whether the functions are performed by hardware or software depends on particular applications and design constraint conditions of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of the present invention.
In the several embodiments provided in this application, the described apparatus embodiments are merely illustrative. For example, unit division is merely logical function division, and may be another division during actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, in other words, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected depending on actual requirements to achieve the objectives of the solutions of embodiments of the present invention.
The foregoing descriptions are merely specific embodiments of the present invention, but are not intended to limit the protection scope of the present invention. Any modification or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
This application is a continuation of International Application PCT/CN2022/080620, filed on Mar. 14, 2022, which claims priority to Chinese Patent Application No. 202110351637.5, filed on Mar. 31, 2021, which claims priority to Chinese Patent Application No. 202110270731.8, filed on Mar. 12, 2021. All of the aforementioned priority patent applications are hereby incorporated by reference in their entirety.
Related parent/child application data: Parent: PCT/CN2022/080620, filed Mar. 2022; Child: U.S. application Ser. No. 18/460,608.