This application relates to the field of computer technologies, and in particular, to a memory management method and apparatus, a processor, and a computing device.
With the development of multi-core processors, a quantity of cores of a single processor gradually increases, and a computing speed of the processor continuously increases accordingly. Because a memory access speed and a memory capacity seriously lag behind the computing speed of the processor, a “memory wall” problem becomes more prominent. Currently, multiple types of storage media are disposed in a computer system to improve the memory capacity. However, how the processor uses the multiple types of storage media to enable the memory access speed to maximally satisfy the computing speed of the processor is an urgent problem to be resolved.
This application provides a memory management method and apparatus, a processor, and a computing device, so that a memory access speed maximally satisfies a computing speed of the processor.
According to a first aspect, a memory management method is provided. A hybrid memory system includes multiple processors and multiple different types of memory media, a first processor is associated with at least two different types of memory media, the first processor is any one of the multiple processors, and the method is performed by the first processor and specifically includes the following steps. After obtaining a memory allocation request, the first processor determines a to-be-allocated memory resource from the multiple different types of memory media based on an allocation policy, where the allocation policy indicates to determine the memory resource based on physical attributes (for example, the physical attributes include at least one of the following: a memory capacity, an access latency, a cost, or a service life) of the multiple different types of memory media, and allocates the memory resource to a logical address based on the allocation policy.
In this way, a memory capacity of a system is expanded by configuring the multiple different types of memory media, so that the first processor can obtain as many memory resources as possible when running an application. In addition, when allocating the memory resource to the running application, the first processor fully considers the physical attributes of the multiple different types of memory media in the system, and selects the to-be-allocated memory resource from the multiple different types of memory media, to ensure that an access latency of accessing the allocated memory resource by the first processor is as low as possible, and an access speed and a memory capacity of the memory medium maximally satisfy a computing speed of the first processor.
Specifically, the determining a to-be-allocated memory resource from the multiple different types of memory media based on an allocation policy includes: determining the memory resource from the multiple different types of memory media based on a memory medium type and the physical attributes of the multiple different types of memory media accessed by the first processor in a computing device. It may be understood that the first processor first selects a type of memory medium from the multiple different types of memory media based on the memory medium type, and then determines the to-be-allocated memory resource based on a physical attribute of the type of memory medium.
For example, the multiple different types of memory media include a first memory medium and a second memory medium, and the memory media associated with the first processor include the first memory medium and the second memory medium. Because an access speed of the first memory medium is greater than an access speed of the second memory medium, an access latency of the first memory medium is less than an access latency of the second memory medium. Therefore, the first processor may first select the memory resource from first memory media associated with the multiple processors. If the first memory media associated with the multiple processors are all insufficient to satisfy a requirement of the memory allocation request, the memory resource is selected from second memory media associated with the multiple processors.
In a possible implementation, the determining the memory resource from the multiple different types of memory media based on a memory medium type and the physical attributes of the multiple different types of memory media accessed by the first processor in a computing device includes: determining the memory resource from the first memory media associated with the multiple processors based on physical attributes of the first memory media that are associated with the multiple processors and that are accessed by the first processor.
For example, the determining the memory resource from the first memory media associated with the multiple processors based on physical attributes of the first memory media that are associated with the multiple processors and that are accessed by the first processor includes: determining the first memory medium associated with the first processor as the memory resource based on that an access latency of accessing, by the first processor, the first memory medium associated with the first processor is less than an access latency of accessing, by the first processor, a first memory medium associated with a processor adjacent to the first processor; and if a remaining memory resource of the first memory medium associated with the first processor does not satisfy a memory resource requirement of the memory allocation request, determining the first memory medium associated with the processor adjacent to the first processor as the memory resource.
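The first-level fallback described above can be sketched as follows. This is a minimal illustration in Python; the function name and the free-capacity parameters are hypothetical and are not part of this application:

```python
def pick_first_level_medium(local_free, adjacent_free, requested):
    """Return which first memory medium serves the request, or None.

    local_free:    remaining capacity of the first memory medium associated
                   with the first processor
    adjacent_free: remaining capacity of the first memory medium associated
                   with the adjacent processor
    """
    if local_free >= requested:
        return "local"       # lowest access latency for the first processor
    if adjacent_free >= requested:
        return "adjacent"    # higher latency, but still a first-level medium
    return None              # fall through to the second memory medium
```

The local medium is tried first purely because its access latency for the first processor is lower; the adjacent processor's medium is a same-level fallback, not a different tier.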
In another possible implementation, the determining the memory resource from the multiple different types of memory media based on a memory medium type and the physical attributes of the multiple different types of memory media accessed by the first processor in a computing device includes: if remaining storage space of the first memory media associated with the multiple processors is less than a preset memory allocation granularity, determining the memory resource from the second memory media associated with the multiple processors based on physical attributes of the second memory media that are associated with the multiple processors and that are accessed by the first processor.
For example, the determining the memory resource from the second memory media associated with the multiple processors based on physical attributes of the second memory media that are associated with the multiple processors and that are accessed by the first processor includes: determining the second memory medium associated with the first processor as the memory resource based on that an access latency of accessing, by the first processor, the second memory medium associated with the first processor is less than an access latency of accessing, by the first processor, a second memory medium associated with a processor adjacent to the first processor; and if a remaining memory resource of the second memory medium associated with the first processor does not satisfy the memory resource requirement of the memory allocation request, determining the second memory medium associated with the processor adjacent to the first processor as the memory resource.
In another possible implementation, the allocating the memory resource to a logical address based on the allocation policy includes: allocating the memory resource corresponding to the logical address based on the preset memory allocation granularity. The preset memory allocation granularity is greater than a page size of a memory medium. Compared with allocating memory by using a 4 KB small page, this reduces a probability that a miss occurs when the first processor accesses a TLB or a page table, and reduces the large quantity of page faults generated when the first processor accesses the memory.
In another possible implementation, the method further includes: releasing the memory resource at the preset memory allocation granularity based on a release instruction. The release instruction instructs to release the memory resource allocated to the application run by the first processor.
In another possible implementation, the first processor is connected to the multiple different types of memory media through interfaces supporting memory semantics, and the interfaces include at least one interface supporting a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX) protocol, or a unified bus (UB, or Ubus).
In another possible implementation, the first memory medium is a dynamic random access memory (DRAM), the second memory medium is a storage-class-memory (SCM), and the SCM includes at least one of the following: a phase-change memory (PCM), a magnetoresistive random access memory (MRAM), a resistive random access memory (RRAM/ReRAM), a ferroelectric random access memory (FRAM), a fast NAND, or a nano-random access memory (NRAM).
In another possible implementation, the hybrid memory system is used in a scenario in which a large-capacity memory is deployed, and the scenario includes at least one of the following: big data, an in-memory database, or a cloud service.
According to a second aspect, a memory management apparatus is provided. The apparatus includes modules configured to perform the memory management method in the first aspect or any possible design of the first aspect.
According to a third aspect, a processor is provided. The processor is associated with at least two different types of memory media, and the processor is configured to perform the operation steps of the memory management method in the first aspect or any possible design of the first aspect, to allocate a memory resource to an application process.
According to a fourth aspect, a computing device is provided. The computing device includes at least one processor, a memory, and multiple different types of memory media, the memory is configured to store a set of computer instructions, and when executing the set of computer instructions, the processor performs the operation steps of the memory management method in the first aspect or any possible implementation of the first aspect, to allocate a memory resource to an application process.
According to a fifth aspect, a computer system is provided. The computer system includes a memory, at least one processor, and multiple different types of memory media, each processor is associated with at least two different types of memory media, the memory is configured to store a set of computer instructions, and when executing the set of computer instructions, the processor performs the operation steps of the memory management method in the first aspect or any possible implementation of the first aspect, to allocate a memory resource to an application process.
According to a sixth aspect, a computer-readable storage medium is provided, including computer software instructions. When the computer software instructions are run on a computing device, the computing device is enabled to perform the operation steps of the method according to the first aspect or any possible implementation of the first aspect, to allocate a memory resource to an application process.
According to a seventh aspect, a computer program product is provided. When the computer program product is run on a computer, a computing device is enabled to perform the operation steps of the method according to the first aspect or any possible implementation of the first aspect, to allocate a memory resource to an application process.
According to an eighth aspect, a chip system is provided. The chip system includes a first processor, where the first processor is associated with at least two different types of memory media and is configured to implement a function of the first processor in the method in the foregoing first aspect. In a possible design, the chip system further includes a memory, configured to store program instructions and/or data. The chip system may include a chip, or may include a chip and another discrete component.
Based on the implementations provided in the foregoing aspects, this application may be further combined to provide more implementations.
A memory is a memory device configured to store programs and various data. In general, a memory with a larger storage capacity has a slower access speed, and a memory with a smaller storage capacity has a faster access speed. The access speed is a data transmission speed at which data is written to or read from the memory, and may also be referred to as a read/write speed. To improve system performance of a computer system, the memory may be divided into different layers based on storage capacities and access speeds.
With the development of multi-core processors, a quantity of cores of a single processor gradually increases, and a computing speed of the processor continuously increases accordingly. Therefore, the processor has an increasingly high requirement for a memory access speed and a memory capacity. In a possible implementation, multiple different types of storage media are used together as a memory, to improve the memory capacity, and as many memory resources and memory bandwidths as possible are allocated to each processor core from the memory, to satisfy the requirement of the processor for the memory access speed and the memory capacity. A system that uses multiple different types of storage media as a memory may be referred to as a hybrid memory system. A storage medium used as a memory in the hybrid memory system may be referred to as a memory medium. For example, the multiple different types of memory media include a first memory medium and a second memory medium, where a storage capacity of the first memory medium is lower than a storage capacity of the second memory medium, an access speed of the first memory medium is higher than an access speed of the second memory medium, an access latency of the first memory medium is lower than an access latency of the second memory medium, and a cost of the first memory medium is higher than a cost of the second memory medium.
A storage-class memory (SCM) combines advantages of memory with features of storage; it can be simply understood as a new type of non-volatile memory medium. The SCM is non-volatile, has a very short access time and a low price per bit, and is solid-state with no moving parts. Currently, there are many SCM medium technologies, among which phase-change memory (PCM) is the most prominent and typical, and one of the earliest storage-class memory technologies to be launched, for example, Intel® Optane™ Memory developed based on 3D XPoint. In addition, the SCM further includes other types such as a magnetoresistive random access memory (MRAM), a resistive random access memory (RRAM/ReRAM), a ferroelectric random access memory (FRAM), a fast NAND, and a nano-random access memory (NRAM).
The SCM may have a storage capacity of hundreds of gigabytes (GB), and an access latency range of the SCM may be 120 ns to 400 ns. The SCM may be located at the second layer in a hierarchical architecture of the storage system shown in
The processor 210 further includes an integrated memory controller (iMC) 211 configured to implement memory management and control, and multiple processor cores. The multiple processor cores may be further divided into multiple computing clusters, and each computing cluster includes multiple processor cores. For example, as shown in
It should be noted that the hybrid memory system 200 shown in
Optionally, in addition to being integrated into the processor of the hybrid memory system 200 as shown in
Optionally, in addition to the first memory medium and the second memory medium, the hybrid memory system 200 may further include another type of memory medium, where the type of this memory medium is different from the type of the first memory medium and the type of the second memory medium. For example, at least one of memory media such as a random access memory (RAM), a static random access memory (SRAM), a synchronous dynamic random access memory (SDRAM), and a double data rate synchronous dynamic random access memory (DDR SDRAM) may further be added to the hybrid memory system 200. In this case, the hybrid memory system 200 includes multiple types of hybrid memory media. For ease of description, the following embodiments of this application are described by using an example in which the hybrid memory system 200 includes only the first memory medium and the second memory medium, the first memory medium is the DRAM, and the second memory medium is the SCM.
It should be noted that, in an initialization stage, an operating system running on each processor of the hybrid memory system may allocate different levels of memory media to the processor based on types of the memory media, and record a correspondence between the processor and the memory media, to perform a data read or write operation based on the correspondence between the processor and the different levels of memory media.
Each processor may be allocated a memory resource based on a hierarchical memory mechanism. The hierarchical memory mechanism indicates levels of the multiple different types of memory media in the hybrid memory system, and the hybrid memory system includes multiple levels. Specifically, because physical attributes of memory media produced by different manufacturers may be different, the processor may classify the memory media in a multi-level memory system into different levels based on the physical attributes of the memory media, where the physical attributes include at least one of the following: an access latency, a cost, a service life, or a memory capacity. The processor may classify the memory media in the multi-level memory system into multiple levels based on at least one of the access latency, the cost, the service life, and the memory capacity, and the multiple levels may be sorted in descending order from a first level to a second level. For example, an access speed of the DRAM is greater than an access speed of the SCM, and an access latency of the DRAM is less than an access latency of the SCM. Therefore, the DRAM may be used as a memory medium of the first level, and the SCM may be used as a memory medium of the second level, where the first level is higher than the second level.
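The level classification above can be sketched by ranking media on one physical attribute, here the access latency. The function name and the latency values are illustrative assumptions:

```python
def rank_levels(media):
    """media: list of (name, access_latency_ns) pairs.

    A lower latency yields a higher level; level 1 is the highest level.
    """
    ordered = sorted(media, key=lambda m: m[1])
    return {name: level for level, (name, _) in enumerate(ordered, start=1)}
```

With illustrative latencies of roughly 80 ns for DRAM and 250 ns for SCM, the DRAM lands at level 1 and the SCM at level 2, matching the ordering described above. Ranking on cost or service life instead would only change the sort key.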
In addition, memory media of a same type may be classified into one or more levels, for example, into two or more levels based on at least one of the physical attributes. Because costs of producing the DRAM by different manufacturers may be different, a low-cost DRAM may be used as a memory medium of a first level, and a high-cost DRAM may be used as a memory medium of a second level.
In a memory management method according to this application, a memory capacity of a system is expanded by configuring multiple different types of memory media, so that a processor can obtain as many memory resources as possible when running an application. In addition, after obtaining a memory allocation request, any processor (for example, a first processor) in the system determines a to-be-allocated memory resource from the multiple different types of memory media based on physical attributes (for example, physical attributes include at least one of the following: a memory capacity, an access latency, a cost, or a service life) of the multiple different types of memory media indicated by an allocation policy, and allocates the memory resource to a logical address based on the allocation policy, to ensure that an access latency of the first processor accessing an allocated memory resource is as low as possible, so that a memory medium access speed and a memory capacity maximally satisfy a computing speed of the processor.
It is assumed that the multiple different types of memory media include a first memory medium and a second memory medium. After obtaining the memory allocation request, the first processor determines, based on a memory medium type, that the first memory medium is a memory medium of a first level, and the second memory medium is a memory medium of a second level. Because the level of the first memory medium is higher than the level of the second memory medium, the first processor first determines the memory resource from first memory media associated with the multiple processors based on physical attributes of the first memory media that are associated with the multiple processors and that are accessed by the first processor. In some embodiments, the first processor may determine a first memory medium associated with the first processor as the memory resource; and if a remaining memory resource of the first memory medium associated with the first processor does not satisfy a memory resource requirement of the memory allocation request, determine a first memory medium associated with a processor adjacent to the first processor as the memory resource. In some other embodiments, if the first memory medium associated with the processor adjacent to the first processor has a low cost or a long service life, the first memory medium associated with the processor adjacent to the first processor may also be preferentially determined as the memory resource.
In some other embodiments, if remaining storage space of the first memory media associated with the multiple processors is less than a preset memory allocation granularity, that is, remaining storage space of all first memory media in the system is insufficient, the first processor then determines the memory resource from second memory media associated with the multiple processors based on physical attributes of the second memory media that are associated with the multiple processors and that are accessed by the first processor. For example, the first processor determines a second memory medium associated with the first processor as the memory resource; and if a remaining memory resource of the second memory medium associated with the first processor does not satisfy the memory resource requirement of the memory allocation request, determines a second memory medium associated with a processor adjacent to the first processor as the memory resource.
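Taken together, the fallback order in the two preceding paragraphs (local first medium, adjacent first medium, local second medium, adjacent second medium) can be sketched as follows; the dictionary layout and the names are assumptions for illustration:

```python
FALLBACK_ORDER = [
    ("first", "local"), ("first", "adjacent"),    # e.g. DRAM tier
    ("second", "local"), ("second", "adjacent"),  # e.g. SCM tier
]

def allocate(requested, free):
    """free maps (medium, location) -> remaining bytes.

    Returns the chosen node and deducts the allocation, or None if no
    medium can satisfy the request.
    """
    for node in FALLBACK_ORDER:
        if free.get(node, 0) >= requested:
            free[node] -= requested
            return node
    return None
```

The order encodes the policy: medium type (tier) dominates, and within a tier the locally attached medium wins because its access latency for the requesting processor is lower.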
Optionally, when allocating the memory resource based on the memory medium type and the physical attributes of the memory media accessed by the first processor, the first processor may also allocate the memory with reference to an application type. For example, because a video application has a high latency requirement, when an access latency of accessing, by the first processor, the first memory medium associated with the first processor is less than an access latency of accessing, by the first processor, a memory medium associated with a processor adjacent to the first processor, the first processor may preferentially allocate the memory resource from the first memory medium associated with the first processor. Because a text application has a low latency requirement, even if remaining storage space of the first memory medium associated with the first processor is greater than the preset memory allocation granularity, the first processor may allocate the memory resource from the memory medium associated with the processor adjacent to the first processor, so that the memory resource of the first memory medium associated with the first processor is reserved for an application having a high latency requirement, thereby improving a data processing rate of the first processor and reducing an access latency of accessing the memory.
Optionally, if the first processor is a graphics processing unit (GPU) or a neural network processing unit (NPU), because the GPU and the NPU are generally configured to train a neural network, with high calculation complexity and a large calculation amount, an SCM may be configured for the GPU and the NPU, to satisfy a requirement of the first processor for a large memory capacity when processing data.
It should be noted that the memory resource allocated by the first processor to an application program based on the memory allocation request is a memory resource of a memory medium accessible to the first processor. The accessible memory medium is a memory medium that is connected to the first processor through a physical bus or another processor or a network, and the first processor may perform a read/write operation on the accessible memory medium. In this embodiment of this application, the memory resource allocated by the first processor to the application program based on the memory allocation request may be memory space corresponding to a physical address in the memory medium, or memory space corresponding to a segment of physical addresses.
For ease of description, the following embodiments are described by using an example in which each type of memory medium is classified into one level. The hybrid memory system includes at least two levels, and the first level and the second level are used as an example for description. For example, the DRAM is used as the memory medium of the first level, and the SCM is used as the memory medium of the second level.
Step 310: The processor 210 obtains a memory allocation request.
The memory allocation request is generated when a memory in a system needs to be accessed in a process that the processor 210 runs an application process in a user mode, and the memory allocation request includes a logical address.
After the processor 210 obtains the memory allocation request, a memory management unit (MMU) in the processor 210 queries a translation lookaside buffer (TLB) based on the logical address. The translation lookaside buffer, also referred to as a page table buffer, is a high-speed storage unit located in the processor and stores some page table entries (mappings from virtual addresses to physical addresses). Because the page table is stored in main memory, the cost of querying the page table is high; the TLB, located in the processor, improves the efficiency of translation from the virtual address to the physical address. If the TLB includes a physical address of a memory medium corresponding to the logical address, it indicates an access hit, and the processor 210 may access the memory medium based on the physical address.
If the TLB does not include the physical address corresponding to the logical address, it indicates an access miss, and the processor 210 queries the page table based on the logical address. If the page table includes the physical address of the memory medium corresponding to the logical address, it indicates an access hit, and the processor 210 may access the memory medium based on the physical address.
If the page table does not include the physical address corresponding to the logical address, it indicates an access miss, and the processor 210 generates a page fault. In this case, the processor 210 selects, based on a memory medium type and an access latency of accessing a memory medium in the system by the processor 210, one memory medium from the different types of memory media, to allocate a memory resource corresponding to the logical address to the application process. It may be understood that, because an access speed of the DRAM is greater than an access speed of the SCM, and an access latency of the DRAM is less than an access latency of the SCM, the processor 210 first allocates the memory resource corresponding to the logical address from the first-level memory medium DRAM in the system, that is, the processor 210 allocates the memory resource corresponding to the logical address from the DRAM 220 or the DRAM 250 based on an access latency of accessing the DRAM by the processor 210. If both remaining storage space of the DRAM 220 and remaining storage space of the DRAM 250 are insufficient, the processor 210 allocates the memory resource corresponding to the logical address from the second-level memory medium SCM in the system, that is, the processor 210 allocates the memory resource corresponding to the logical address from the SCM 230 or the SCM 260 based on an access latency of accessing the SCM by the processor 210. In addition, for memory media of a same level, when an access latency of accessing, by the processor 210, the memory medium (the DRAM 220) associated with the processor 210 is less than an access latency of accessing, by the processor 210, the memory medium (the DRAM 250) associated with a processor (for example, the processor 240) adjacent to the processor 210, the processor 210 first allocates the memory resource corresponding to the logical address from the DRAM 220 associated with the processor 210.
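The TLB, page table, and page fault sequence described in Step 310 can be sketched with dictionaries standing in for the hardware structures. This is a simplified illustration, not the actual MMU behavior, and all names are hypothetical:

```python
def translate(logical, tlb, page_table, allocate_page):
    """Resolve a logical address, allocating a memory resource on a page fault."""
    if logical in tlb:
        return tlb[logical]                 # TLB hit
    if logical in page_table:
        tlb[logical] = page_table[logical]  # page-table hit: refill the TLB
        return tlb[logical]
    physical = allocate_page(logical)       # page fault: run the allocation policy
    page_table[logical] = physical
    tlb[logical] = physical
    return physical
```

Only the third branch invokes the allocation policy of this application; the first two branches resolve the address from cached mappings without allocating anything.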
Therefore, a memory resource of a memory medium with a high access speed is preferentially allocated, and an access latency of accessing the memory by the processor 210 is reduced. For details, refer to the following detailed descriptions of Step 320 to Step 350.
Step 320: The processor 210 allocates the memory resource corresponding to the logical address from the DRAM 220 associated with the processor 210.
For ease of understanding, each processor and/or a memory medium associated with the processor may be referred to as a non-uniform memory access (NUMA) node. The DRAM 220 associated with the processor 210 is referred to as a NUMA node 1, the DRAM 250 associated with the processor 240 is referred to as a NUMA node 2, the SCM 230 associated with the processor 210 is referred to as a NUMA node 3, and the SCM 260 associated with the processor 240 is referred to as a NUMA node 4.
An access latency of accessing, by the processor 210, the memory medium of the NUMA node 1 associated with the processor 210 is less than an access latency of accessing, by the processor 210, the memory medium of the NUMA node 2 associated with the processor 240. Therefore, the memory resource corresponding to the logical address is first allocated from the DRAM 220 associated with the processor 210.
It should be noted that, in current computer systems, memory pages are usually divided into different sizes, for example, 4 KB, 2 MB, and 1 GB. A 4 KB memory page is also referred to as a small page or a small-page memory, and a 2 MB or 1 GB memory page is referred to as a huge page or a huge-page memory. Alternatively, a memory page whose size is greater than a first threshold is referred to as a huge page or a huge-page memory, and a memory page whose size is less than or equal to a second threshold is referred to as a small page or a small-page memory. The first threshold and the second threshold may be the same or different, and during specific implementation may be configured based on a service requirement.
In a possible implementation, for a large-capacity memory scenario, to improve data processing efficiency, the huge-page memory is usually used for data processing. For example, when allocating a memory based on the memory allocation request, the processor may allocate the memory by using a huge page. Compared with allocating the memory by using a 4 KB small page, this reduces a probability that a miss occurs when the processor accesses the TLB or the page table, and reduces the large quantity of page faults generated when the processor accesses the memory.
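The TLB benefit of huge pages can be quantified with a small calculation; a sketch, with the region size chosen purely for illustration:

```python
def tlb_entries_needed(region_bytes, page_bytes):
    """Number of pages (hence TLB/page-table entries) covering a region."""
    return -(-region_bytes // page_bytes)  # ceiling division
```

For a 1 GiB region, 2 MiB huge pages need 512 entries versus 262,144 entries for 4 KiB small pages, a 512-fold reduction in the mappings competing for TLB slots, which is where the lower miss probability comes from.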
The processor 210 allocates, by using the huge page, the memory resource corresponding to the logical address from the DRAM 220 associated with the processor 210. If the memory allocation request indicates to request a memory resource corresponding to one logical address, the memory resource corresponding to the logical address may be a memory resource indicated by one physical address on the DRAM 220. For example, the memory resource indicated by one physical address may be a 2 MB memory resource. For another example, the memory resource indicated by one physical address may be a part of memory resources in a 2 MB huge page. If the memory allocation request indicates to request a memory resource corresponding to a segment of logical addresses, the memory resource corresponding to the segment of logical addresses may be a memory resource indicated by a segment of consecutive physical addresses on the DRAM 220. For example, the memory resource indicated by the segment of physical addresses may be two consecutive 2 MB huge pages, namely, a 4 MB memory resource.
If the remaining storage space of the DRAM 220 is less than a preset memory allocation granularity, Step 330 is performed. The preset memory allocation granularity is greater than a page size of the memory medium. For example, the preset memory allocation granularity is 2 MB or 1 GB.
Step 330: The processor 210 allocates the memory resource corresponding to the logical address from the DRAM 250 associated with the processor 240 adjacent to the processor 210.
The processor 210 allocates, by using the huge page, the memory resource corresponding to the logical address from the DRAM 250 associated with the processor 240. If the remaining storage space of the DRAM 250 is less than the preset memory allocation granularity, Step 340 is performed.
Step 340: Allocate the memory resource corresponding to the logical address from the SCM 230 associated with the processor 210.
An access latency of accessing, by the processor 210, the memory of the NUMA node 3 associated with the processor 210 is less than an access latency of accessing, by the processor 210, the memory medium of the NUMA node 4 associated with the processor 240 adjacent to the processor 210. Therefore, the memory resource corresponding to the logical address is first allocated from the SCM 230 associated with the processor 210. The processor 210 allocates, by using the huge page, the memory resource corresponding to the logical address from the SCM 230 associated with the processor 210. If the remaining storage space of the SCM 230 is less than the preset memory allocation granularity, Step 350 is performed.
Step 350: Allocate the memory resource corresponding to the logical address from the SCM 260 associated with the processor 240 adjacent to the processor 210.
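The fallback order of Steps 320 to 350 (local DRAM, then the adjacent processor's DRAM, then local SCM, then the adjacent processor's SCM, moving on whenever the remaining storage space falls below the preset memory allocation granularity) can be sketched as follows. The dict-based model, the function names, and the example free-space figures are illustrative assumptions, not the actual implementation.

```python
# Minimal sketch of the tiered fallback in Steps 320-350. The media list
# is ordered by access latency from the requesting processor; the first
# medium whose remaining space meets the preset allocation granularity
# serves the request.
MB = 1024 * 1024
ALLOC_GRANULARITY = 2 * MB  # preset memory allocation granularity (e.g., one 2 MB huge page)

def allocate(media, size=ALLOC_GRANULARITY):
    """Return the name of the first medium that can satisfy the request, or None."""
    for medium in media:
        if medium["free"] >= size:
            medium["free"] -= size
            return medium["name"]
    return None  # no medium can satisfy the request

# Example state: local DRAM below the granularity, adjacent DRAM exhausted,
# so the request falls through to the local SCM (Step 340).
media = [
    {"name": "DRAM 220 (local)",    "free": 1 * MB},
    {"name": "DRAM 250 (adjacent)", "free": 0},
    {"name": "SCM 230 (local)",     "free": 64 * MB},
    {"name": "SCM 260 (adjacent)",  "free": 128 * MB},
]
```

The ordering encodes the latency observation in Step 340: the SCM associated with the processor itself is preferred over any memory medium of the adjacent processor's node.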
Optionally, when allocating the memory based on the memory medium type and the access latency of accessing the memory medium by the processor 210, the processor 210 may further allocate the memory with reference to an application type. For example, because a latency requirement of a video application is high, the processor 210 may preferentially allocate a memory from the memory (the DRAM 220) of the NUMA node 1 associated with the processor 210. Because a latency requirement of a text application is low, even if the remaining storage space of the DRAM 220 associated with the processor 210 is greater than the preset memory allocation granularity, the processor 210 may allocate a memory from the DRAM 250 associated with the processor 240 adjacent to the processor 210, so that the memory resource of the DRAM 220 associated with the processor 210 is reserved for an application having a high latency requirement, thereby improving a data processing rate of the processor 210 and reducing an access latency of accessing the memory. On the premise that performance is not affected, a low-cost large-capacity SCM and a low-latency DRAM are used as a hybrid memory medium combination to store different data in a hierarchical manner, thereby reducing hardware costs of the system.
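The application-aware refinement above amounts to giving each application type its own media preference order. A hedged sketch, in which the application categories and the index ordering are assumptions for illustration only:

```python
# Media indices in latency order from the requesting processor:
# 0 = local DRAM, 1 = adjacent DRAM, 2 = local SCM, 3 = adjacent SCM.
# A latency-sensitive application (e.g., video) takes the lowest-latency
# media first; a latency-tolerant one (e.g., text) skips the local DRAM
# so it stays available for high-latency-requirement applications.
PREFERENCE = {
    "latency_sensitive": [0, 1, 2, 3],
    "latency_tolerant":  [1, 2, 3],
}

def order_for(app_type):
    """Return the media preference order for an application type."""
    # Unknown types fall back to the plain latency order.
    return PREFERENCE.get(app_type, [0, 1, 2, 3])
```

Under this policy a text application is steered to the adjacent DRAM 250 even when the local DRAM 220 still has space above the allocation granularity, mirroring the example in the paragraph above.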
In some other embodiments, the processor may release the memory resource at the preset memory allocation granularity based on a release instruction. The release instruction instructs to release a memory resource allocated to an application run by the processor. For example, for a huge page, a balloon module is added to a virtual machine. The balloon module is configured to receive an idle page notification of the application and notify a backend located in QEMU, and QEMU records, based on the huge page, the part that has currently been released. A huge page is released by releasing all 4 KB pages in the huge page. Therefore, in comparison with a solution in which a memory is divided into huge-page resources in advance before an application uses a memory resource and a processor uses statically allocated huge-page resources, in this application the processor dynamically allocates a memory resource to a running application based on physical attributes of multiple different types of memory media, thereby ensuring that a state of the system after initial memory allocation is completed is a state of optimal memory performance, and reducing an impact on application performance. In addition, the processor releases, based on the release instruction, the memory resource allocated to the running application based on the physical attributes of the multiple different types of memory media, so that the memory resource can be used by another application, thereby improving utilization of the memory resource.
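The rule "a huge page is released by releasing all 4 KB pages in the huge page" can be sketched as a simple tracking structure. The set-based bookkeeping below is an illustrative assumption; a real balloon backend would track this differently.

```python
# Sketch: a 2 MB huge page is considered fully released only once every
# constituent 4 KB small page has been reported idle.
KB = 1024
PAGES_PER_HUGE = (2 * 1024 * KB) // (4 * KB)  # 512 small pages per 2 MB huge page

def release_small_page(freed, small_page_index):
    """Record one idle 4 KB page; return True once the huge page is fully released."""
    freed.add(small_page_index)
    return len(freed) == PAGES_PER_HUGE
```

Until the last 4 KB page is reported, the huge page cannot be returned at the preset allocation granularity, which is why the backend records the released parts per huge page.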
The hybrid memory system described in this application may be applied to a scenario in which a large-capacity memory is deployed. Scenarios that need a large-capacity memory include, for example, big data (for example, Apache Spark™), an in-memory database (for example, Redis), and a cloud service (for example, a virtual machine provided by using a memory overcommitment mechanism in a cloud infrastructure).
It may be understood that, to implement the functions in the foregoing embodiments, the hybrid memory system includes a corresponding hardware structure and/or software module for performing each function. A person skilled in the art should be easily aware that, based on the units and the method steps in the examples described in embodiments disclosed in this application, this application can be implemented through hardware or a combination of hardware and computer software. Whether a function is executed through hardware or through hardware driven by computer software depends on the particular application scenario and design constraints of the technical solution.
The foregoing describes in detail the memory management method according to embodiments with reference to
As shown in
The communication module 410 is configured to communicate with another device, for example, receive a memory allocation request sent by another device. The request module 420 is configured to obtain a memory allocation request, where the memory allocation request includes a logical address. For example, the request module 420 is configured to perform Step 310 in
The storage module 450 may correspond to the foregoing method embodiment and is configured to store a correspondence between the logical address and a physical address of a memory medium, for example, a TLB or a page table.
It should be understood that the memory management apparatus 400 in this embodiment of this application may be implemented by using a graphics processing unit (GPU), a neural network processing unit (NPU), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD). The PLD may be a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof. When the memory management method shown in
The memory management apparatus 400 according to this embodiment of this application may correspondingly perform the method described in embodiments of this application, and the foregoing and other operations and/or functions of units in the memory management apparatus 400 are separately for implementing the corresponding procedures of the methods in
It should be understood that, in this embodiment of this application, the processor 510 may be a CPU, for example, a processor of an x86 architecture or a processor of an ARM architecture. The processor 510 may also be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), another programmable logic device, a discrete gate, a transistor logic device, a discrete hardware component, a system on chip (SoC), a graphics processing unit (GPU), an artificial intelligence (AI) chip, or the like. The general-purpose processor may be a microprocessor, any conventional processor, or the like.
The memory medium 520 may include a read-only memory and a random access memory, and provide instructions and data to the processor 510. The memory medium 520 may further include a non-volatile random access memory. The memory medium 520 may be a volatile memory or a non-volatile memory, or may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of illustrative rather than limitative description, many forms of RAMs may be used, for example, a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), and a direct rambus random access memory (DR RAM). Optionally, the memory medium 520 may further be a storage class memory (SCM), and the SCM includes at least one of the following: a phase-change memory (PCM), a magnetoresistive random access memory (MRAM), a resistive random access memory (RRAM), a ferroelectric random access memory (FRAM), a fast NAND, or a nano-random access memory (NRAM).
A type of the memory medium 530 is similar to a type of the memory medium 520, and may also be any one of the foregoing various memory medium types. However, in the hybrid memory system 500, the type of the memory medium 520 is different from the type of the memory medium 530.
In addition to a data bus, the bus 560 may further include a power bus, a control bus, a state signal bus, and the like. However, for the purpose of clear descriptions, various buses are all marked as the bus 560 in the figure. The bus 560 may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus, or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX) protocol, or the like. The bus 560 may be classified into an address bus, a data bus, a control bus, and the like.
It should be noted that, although one processor 510 is used as an example in the hybrid memory system 500 shown in
It should be understood that the hybrid memory system 500 according to this embodiment of this application may correspond to the memory management apparatus 400 in embodiments of this application, and may correspond to a corresponding body in the method according to embodiments of this application. In addition, the foregoing and other operations and/or functions of the modules in the hybrid memory system 500 are separately for implementing the corresponding procedures of the methods in
This application further provides a processor. The processor includes an integrated circuit, the integrated circuit is connected to multiple different types of memory media, and the integrated circuit is configured to implement functions of the operation steps in the method shown in
Because the modules in the memory management apparatus 400 provided in this application may be deployed on multiple computers in a same environment or different environments in a distributed manner, this application further provides a computer system shown in
The memory medium 601 may be a combination of at least two of the following: a read-only memory, a static storage device, a dynamic storage device, a random access memory, or a storage class memory. For example, the memory medium includes a DRAM and an SCM. The memory medium 601 may store computer instructions. When the computer instructions stored in the memory medium 601 are executed by the processor 602, the processor 602 and the communication interface 603 are configured to perform a memory management method of a software system. The memory medium may further store a data set. For example, some storage resources in the memory medium 601 are partitioned into an area configured to store a page table and a program for implementing a memory management function in this embodiment of this application.
A type of the memory medium 605 is similar to a type of the memory medium 601, and may also be any one of the foregoing various memory medium types. However, in the computer 600, the type of the memory medium 605 is different from the type of the memory medium 601.
The processor 602 may use a general-purpose CPU, an application specific integrated circuit (ASIC), a GPU, or any combination thereof. The processor 602 may include one or more chips. The processor 602 may include an AI accelerator, for example, an NPU.
The communication interface 603 uses a transceiver module such as but not limited to a transceiver, to implement communication between the computer 600 and another device or a communication network. For example, a memory allocation request may be obtained through the communication interface 603.
The bus 604 may include a channel for transmitting information between components (for example, the memory medium 601, the memory medium 605, the processor 602, and the communication interface 603) of the computer 600.
A communication channel is established between the foregoing computers 600 through the communication network. Any one or more of the request module 420, the decision module 430, and the allocation module 440 are run on each computer 600. Any computer 600 may be a computer (for example, a server) in a cloud data center, a computer in an edge data center, or a terminal computing device. Functions of a database, big data, a cloud service, or the like may be deployed on each computer 600. For example, the GPU is configured to implement a function of training a neural network.
The method steps in this embodiment may be implemented in a hardware manner, or may be implemented by executing software instructions by a processor. The software instructions may include a corresponding software module. The software module may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well-known in the art. For example, a storage medium is coupled to a processor, so that the processor can read information from the storage medium and write information into the storage medium. Certainly, the storage medium may be a component of the processor. The processor and the storage medium may be located in the ASIC. In addition, the ASIC may be located in a terminal device. Certainly, the processor and the storage medium may exist in a network device or the terminal device as discrete components.
All or a part of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used to implement the embodiments, all or a part of the embodiments may be implemented in a form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, all or some of the procedures or functions in embodiments of this application are executed. The computer may be a general-purpose computer, a dedicated computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium, or may be transmitted from a computer-readable storage medium to another computer-readable storage medium. For example, the computer programs or instructions may be transmitted from a website, a computer, a server, or a data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium, for example, a floppy disk, a hard disk, or a magnetic tape, may be an optical medium, for example, a digital video disc (DVD), or may be a semiconductor medium, for example, a solid state drive (SSD). The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any modification or replacement readily conceived by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202111162769.X | Sep 2021 | CN | national |
This application is a continuation of International Application PCT/CN2022/108537, filed on Jul. 28, 2022, which claims priority to Chinese Patent Application No. 202111162769.X, filed on Sep. 30, 2021. The disclosures of the aforementioned priority applications are hereby incorporated by reference in their entirety.
| Number | Date | Country | |
|---|---|---|---|
| Parent | PCT/CN2022/108537 | Jul 2022 | WO |
| Child | 18611674 | US |