Embodiments herein relate to a memory allocator and methods performed therein for allocating memory. Furthermore, an arrangement and methods performed therein, as well as computer programs, computer program products, and carriers, are also provided herein. In particular, embodiments herein relate to a memory allocator for allocating memory to an application on a logical server.
In traditional server architecture, a server is equipped with a fixed amount of hardware, such as processing units, memory units, input/output units, etc., connected via communication buses. The memory units provide physical memory, that is, the physical memory available to the server, having a physical memory address space. A server Operating System (OS), however, works with a virtual memory address space, hereinafter denoted “OS virtual memory”, and therefore references the physical memory by using virtual memory addresses. Virtual memory addresses are mapped to physical memory addresses by memory management hardware. OS virtual memory addresses are assigned to any memory request, e.g., by applications (“Apps”) starting their execution on the server, and the OS keeps the mapping between the application memory address space and the OS virtual memory addresses through the Memory Management Unit (MMU). The MMU is located between, or is part of, the microprocessor and the Memory Management Controller (MMC). While the MMC's primary function is the translation of OS virtual memory addresses into physical memory locations, the MMU's purpose is the translation of application virtual memory addresses into OS virtual memory addresses.
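As a rough illustration of this two-stage translation, the following sketch maps an application virtual address to an OS virtual address (the MMU step) and then to a physical address (the MMC step). The tables, page size, and function name are hypothetical simplifications for illustration only, not any real MMU/MMC interface:

```python
# Illustrative two-stage address translation. All tables and values
# here are invented simplifications; real MMU/MMC hardware operates
# on page-table structures, not Python dictionaries.

# MMU table: application virtual page -> OS virtual page
mmu_table = {0x0000: 0x4000, 0x1000: 0x5000}

# MMC table: OS virtual page -> physical page
mmc_table = {0x4000: 0x9000, 0x5000: 0xA000}

PAGE_MASK = 0xFFF  # assume 4 KiB pages; offset bits pass through unchanged

def translate(app_vaddr: int) -> int:
    """Translate an application virtual address to a physical address."""
    page, offset = app_vaddr & ~PAGE_MASK, app_vaddr & PAGE_MASK
    os_page = mmu_table[page]        # MMU step: app virtual -> OS virtual
    phys_page = mmc_table[os_page]   # MMC step: OS virtual -> physical
    return phys_page | offset

print(hex(translate(0x1234)))  # app page 0x1000 -> OS 0x5000 -> 0xa234
```

The offset within the page is preserved across both translation steps; only the page numbers are remapped.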
The OS is responsible for selecting the address range from the OS virtual memory to be allocated to each application. The task of fulfilling an allocation request from an application to the OS consists of finding an address range in the OS virtual memory that is free, i.e. unused, of sufficient size, and accessible for use by applications. At any given time, some parts of the memory are in use, while some are free and thus available for future allocations.
Independently of the actual location in the physical memory unit(s), the server's OS considers the whole virtual memory address space, i.e., the OS virtual memory, as one large block of virtual memory. As illustrated in
This means that the OS cannot differentiate whether the physical memory of the server is composed of several memory units and, if so, whether the units comprise different memory types with distinct characteristics. This has not been an issue for servers up to now; however, with the introduction of a new architecture design within data centers, namely the “disaggregated architecture”, the current concepts of physical and virtual memory will change drastically. Disaggregating a memory unit from a processing unit, e.g., a Central Processing Unit (CPU), can cause degradation in the performance of applications if it is not carefully addressed.
Having different memory pools brings the possibility of having different memory types, with distinct characteristics and distances to the CPUs, impacting the performance of the logical servers and of the applications running on top of such a system.
However, the mechanisms for selecting memory units and addresses in a legacy system have drawbacks when applied to a system having a distributed architecture, in the worst case resulting in sluggish behaviour of the servers and of the applications running thereon.
An object of embodiments herein is to provide an improved mechanism for memory allocation.
Another object of embodiments herein is to provide an improved mechanism for selection of a memory address range within an allocated memory block of a logical server for an application at initialization.
According to a first aspect, there is provided a method performed by a memory allocator (MA) for allocating memory to an application on a logical server having a memory block allocated from at least one memory pool. In one action of the method, the MA obtains performance characteristics associated with a first portion of the memory block and obtains performance characteristics associated with a second portion of the memory block. The MA further receives information associated with the application and selects one of the first portion and the second portion of the memory block for allocation of memory to the application, based on the received information and at least one of the performance characteristics associated with the first portion of the memory block and the performance characteristics associated with the second portion of the memory block.
According to a second aspect, there is provided a memory allocator (MA) for allocating memory to an application on a logical server having a memory block allocated from at least one memory pool. The MA is configured to obtain performance characteristics associated with a first portion of the memory block and obtain performance characteristics associated with a second portion of the memory block. The MA is further configured to receive information associated with the application and select one of the first portion and the second portion of the memory block for allocation of memory to the application, based on the received information and at least one of the performance characteristics associated with the first portion of the memory block and the performance characteristics associated with the second portion of the memory block.
According to a third aspect, there is provided a memory allocator (MA) for allocating memory to an application on a logical server having a memory block allocated from at least one memory pool. The memory allocator comprises a first obtaining module for obtaining performance characteristics associated with a first portion of the memory block and a second obtaining module for obtaining performance characteristics associated with a second portion of the memory block. The MA also comprises a receiving module for receiving information associated with the application and a selecting module for selecting one of the first portion and the second portion of the memory block for allocation of memory to the application, based on the received information and at least one of the performance characteristics associated with the first portion of the memory block and the performance characteristics associated with the second portion of the memory block.
According to a fourth aspect, there is provided a method for allocating memory to an application on a logical server having a memory block allocated from at least one memory pool. The method comprises receiving at an Operating System (OS) a request for memory space from an application. The OS sends information associated with the application to a Memory Allocator (MA). The MA receives the information associated with the application from the OS and selects one of a first portion and a second portion of the memory block for allocation of memory to the application, based on the information associated with the application and at least one of performance characteristics associated with the first portion of the memory block and performance characteristics associated with the second portion of the memory block.
According to a fifth aspect, there is provided an arrangement for allocating memory to an application on a logical server having a memory block allocated from at least one memory pool. The arrangement comprises an Operating System (OS) and a Memory Allocator (MA). The OS is configured to receive a request for memory space from an application. The OS is further configured to send information associated with the application to the MA. The MA of the arrangement is configured to receive the information associated with the application from the OS and select one of a first portion and a second portion of the memory block for allocation of memory to the application, based on the information associated with the application and at least one of performance characteristics associated with the first portion of the memory block and performance characteristics associated with the second portion of the memory block.
According to a sixth aspect, there is provided a computer program comprising instructions, which when executed on at least one processor, cause the processor to perform the corresponding method according to the first aspect. According to a seventh aspect, there is provided a computer program comprising instructions, which when executed on at least one processor, cause the processor to perform the corresponding method according to the fourth aspect.
According to an eighth aspect, there is provided a computer program product comprising a computer-readable medium having stored thereon a computer program of any of the sixth aspect and the seventh aspect.
According to a ninth aspect, there is provided a carrier comprising the computer program according to any of the sixth aspect and the seventh aspect. The carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.
Disclosed herein are methods to improve the memory allocation of an application when initialized on a logical server. Embodiments herein may find particular use in data centers, having a distributed hardware architecture. The methods may for instance allow the logical server to allocate memory resources optimally for applications to optimize performance of both the logical server and the applications running on the logical server. Some embodiments herein may thus avoid the logical server becoming sluggish and enable that applications execute with sufficient speed, for example.
In the following, embodiments and exemplary aspects of the present disclosure will be described in more detail with reference to the drawings, in which:
The inventive concept will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments of the inventive concept are shown. This inventive concept may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the inventive concept to those skilled in the art. Like numbers refer to like elements throughout the description. Any step or feature illustrated by dashed lines should be regarded as optional.
In the following description, explanations given with respect to one aspect of the present disclosure correspondingly apply to the other aspects.
For better understanding of the proposed technology,
NIC pools are used as the network interface for any of the components in the pools, i.e., CPUs, memory units, and storage nodes that need external communication during their execution. Storage pools contain a number of storage nodes for storing the persistent data of the users. A fast interconnect connects the multiple resources.
On top of the above described hardware resources, thus comprising a hardware layer, there may be different logical servers (called “hosts” in
New data center hardware architectures rely on the principle of hardware resource disaggregation. The hardware disaggregation principle considers CPU, memory and network resources as individual and modular components. As described above, these resources tend to be organized in a pool based way, i.e., there is a pool of CPU units, a pool of memory units, and a pool of network interfaces. In this sense, a logical server is composed of a subset of units/resources within one or more pools. Applications run on top of logical servers which are instantiated on request.
With respect to the memory pools in a disaggregated architecture, each memory pool can serve multiple logical servers, by providing dedicated memory slots from the pool to each server, and a single logical server can eventually consume memory resources from multiple memory pools.
As seen from
As exemplified by
In the following a MA and a method performed thereby are briefly described. The MA is provided for allocating memory to an application on a logical server, which may be running in a data center. To the logical server, there is allocated a memory block from at least one memory pool. The allocation of the memory block may thus be from one or more memory unit(s) comprised in one or more memory pool(s). According to the method, the MA obtains performance characteristics associated with a first portion of the memory block and obtains performance characteristics associated with a second portion of the memory block. The MA further receives information associated with the application and selects one of the first portion and the second portion of the memory block for allocation of memory to the application, based on the received information and at least one of the performance characteristics associated with the first portion of the memory block and the performance characteristics associated with the second portion of the memory block.
The method performed by the MA provides several advantages. One possible advantage is that each application can be placed in the physical memory based on the application's requirements. Another possible advantage is better usage of the memory pools. A further possible advantage is improved application performance and a speed-up of the execution time, meaning that more tasks can be executed with fewer resources and in a shorter time.
The performance characteristics can be said to be a measure of how well the portion of the memory block is performing, e.g. with respect to the connected CPU. Merely as an illustrative example, there may be one or more threshold values defined for different types of performance characteristics, wherein when a threshold value is met for a performance characteristic, the first portion of the memory block is performing satisfactorily, and when the threshold value is not met, the first portion of the memory block is not performing satisfactorily. The definition of the threshold value defines what is satisfactory, which may be a question of implementation. Merely as a non-limiting and illustrative example, the performance characteristic is delay, wherein when the threshold value is met, the delay is satisfactory, and when the threshold value is not met, the delay is too long and thus not satisfactory. One possible reason for too long a delay may be that the first portion of the memory block is located relatively far from one or more CPU resources. In another non-limiting and illustrative example, the performance characteristic is how frequently the first portion of the memory block is accessed. It may be that the memory is of a type that is adapted for frequent access, or that the first portion of the memory block is located relatively close to one or more CPU resources, wherein if the first portion of the memory block is not accessed very frequently, then the first portion of the memory block is not optimally used. Further, the memory pool(s) may comprise different types of memory, e.g. Solid-State Drive (SSD), Non-Volatile RAM (NVRAM), SDRAM, and flash types of memory, which generally provide different access times, so that data that is accessed frequently may be stored in a memory type having a shorter access time, such as SDRAM, and data that is accessed less frequently may be placed in a memory type having a longer access time, such as NVRAM.
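To make the threshold notion above concrete, the sketch below checks whether a measured value is on the satisfactory side of its threshold; note that “meeting” the threshold means different directions for different characteristics (a delay should be at or below it, an access rate at or above it). The threshold values and names are invented for illustration, the choice being an implementation question as stated above:

```python
# Illustrative threshold check for performance characteristics.
# The numeric thresholds are arbitrary example values, not values
# prescribed by the embodiments.

THRESHOLDS = {"delay_ns": 500.0, "access_rate_hz": 1e4}

def is_satisfactory(characteristic: str, measured: float) -> bool:
    """Return True if the measured value meets its threshold."""
    if characteristic == "delay_ns":
        # Lower delay is better: satisfactory at or below the threshold.
        return measured <= THRESHOLDS["delay_ns"]
    if characteristic == "access_rate_hz":
        # A frequently accessed portion is well used: satisfactory at
        # or above the threshold.
        return measured >= THRESHOLDS["access_rate_hz"]
    raise ValueError(f"unknown characteristic: {characteristic}")

# A portion located far from the CPU, measuring 900 ns of delay,
# does not meet the 500 ns threshold and is thus not satisfactory.
far_portion_ok = is_satisfactory("delay_ns", 900.0)
```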
The choice of memory may be dependent on various parameters in addition to access time, e.g. short-term storage, long-term storage, cost, writability, etc.
In some embodiments, the performance characteristics associated with the first and second portions of the memory block, which as an example are comprised in a first and a second memory unit, respectively, may be defined by one or more of (i) the access rate of the first and second memory units, respectively, (ii) the occupancy percentage of the first and second memory units, respectively, (iii) the physical distance between the first and second memory units, respectively, and a CPU resource (of the CPU pool) comprised in the logical server, (iv) characteristics of the first and second memory units, respectively, e.g. memory type, memory operation cost, and memory access delay, and (v) the connection link and traffic conditions between the first and second memory units, respectively, and the CPUs comprised in the logical server.
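The factors (i)–(v) above might be captured in a simple per-portion record, as sketched below. The field names and example values are illustrative assumptions only, not terminology taken from the embodiments:

```python
from dataclasses import dataclass

@dataclass
class PortionCharacteristics:
    """Illustrative record of factors (i)-(v) for one memory portion."""
    access_rate: float        # (i) accesses per second to the memory unit
    occupancy_pct: float      # (ii) percentage of the unit already in use
    cpu_distance_m: float     # (iii) physical distance to the CPU resource
    memory_type: str          # (iv) e.g. "SDRAM", "NVRAM", "SSD", "flash"
    access_delay_ns: float    # (iv) memory access delay
    link_load_pct: float      # (v) load on the memory-to-CPU interconnect

# Two hypothetical portions of an allocated memory block.
first = PortionCharacteristics(1e6, 40.0, 0.5, "SDRAM", 80.0, 10.0)
second = PortionCharacteristics(1e3, 75.0, 5.0, "NVRAM", 900.0, 35.0)
```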
The MA may in some embodiments obtain performance characteristics of portions of a memory block allocated to a logical server by monitoring the physical memory units of the memory block and/or other hardware associated with the logical server, e.g., CPUs, communication links between memory units and CPUs, etc. Alternatively, the MA may at least in part receive updates of current performance characteristics of portions of the memory blocks and/or information related to hardware associated with the logical server from a separate monitoring function.
In some embodiments, the MA updates memory grades, for example based on calculations, and stores the grades, e.g., in a memory grade table. The MA may thus provide dynamic sorting/grading of memory units, memory blocks, or portions thereof. The grading may then be conveniently used for obtaining performance characteristics of a portion of a memory block.
In further embodiments, the MA selects a suitable physical memory location for an application based on the memory grades. A memory grade may, e.g., comprise performance characteristics of a portion of a memory block allocated to a logical server.
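As one way to picture such grading and sorting (the grading formula and its weights below are invented purely for illustration; the embodiments do not prescribe any particular formula), a grade table could be refreshed from monitoring data and then ranked:

```python
# Hypothetical memory-grade table: each portion of an allocated memory
# block receives a grade computed from monitored characteristics.
# The toy formula favours short access delay and low occupancy.

def grade(access_delay_ns: float, occupancy_pct: float) -> float:
    """Higher grade = better performing portion (illustrative formula)."""
    return 1000.0 / access_delay_ns * (1.0 - occupancy_pct / 100.0)

grade_table = {}  # portion id -> grade

def update_grades(monitored: dict) -> None:
    """Refresh the grade table from (delay_ns, occupancy_pct) samples."""
    for portion, (delay_ns, occupancy) in monitored.items():
        grade_table[portion] = grade(delay_ns, occupancy)

update_grades({"portion-1": (80.0, 40.0), "portion-2": (900.0, 75.0)})

# Sort portions best-first, ready for the later selection step.
ranked = sorted(grade_table, key=grade_table.get, reverse=True)
```

Re-running `update_grades` on fresh monitoring data yields the dynamic sorting/grading mentioned above.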
In a particular embodiment, the first portion of the memory block is comprised in a first memory unit, and the second portion of the memory block is comprised in a second memory unit. The first memory unit and the second memory unit may be located in the same memory pool or in different memory pools. Alternatively, or additionally, the first memory unit and the second memory unit may comprise different types of memory, e.g., Solid-State Drive, SSD, Non-Volatile RAM, NVRAM, SDRAM, and flash type of memory.
In S110 the MA obtains performance characteristics associated with a first portion of the memory block and obtains in S120 performance characteristics associated with a second portion of the memory block. As described earlier, the performance characteristics may be obtained, e.g., by the MA monitoring hardware associated with the logical server, or by receiving information relating to hardware associated with the logical server.
The method further comprises the MA receiving S130 information associated with the application. Such information may for example be one or more of a priority for the application, information on delay sensitivity for the application, information relating to the frequency of memory access for the application, and a memory request of the application.
The method further comprises selecting S140 one of the first portion and the second portion of the memory block for allocation of memory to the application, based on the received information and at least one of the performance characteristics associated with the first portion of the memory block and the performance characteristics associated with the second portion of the memory block.
In one embodiment of the method, the selecting S140 of one of the first portion and the second portion of the memory block for allocation of memory to the application, is based on the received information associated with the application, the performance characteristics associated with the first portion of the memory block and the performance characteristics associated with the second portion of the memory block.
In some embodiments of the method 100 the selecting S140 comprises comparing the information associated with the application with the performance characteristics associated with the first portion and the second portion of the memory block. In this way the MA may, e.g., conclude that the first portion is more suitable for the particular requirements associated with the application. For example, the application may be sensitive to delays, whereby the first portion best matches the need of the application. In another example, the application is not delay sensitive and does not require frequent memory access, and the MA may therefore select the second portion of the memory block, which may for example have performance characteristics associated with a low grade, e.g., being located far from the CPUs and thus having a long delay, having a long access time, or being comprised in a memory unit having a low percentage of unused memory, etc.
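The comparison in this selecting step could, purely as a sketch, look like the following; the decision rule, thresholds, and field names are invented assumptions, illustrating only that application information is matched against the characteristics of both portions:

```python
# Toy selection rule for S140: a delay-sensitive or high-priority
# application is given the portion with the shorter access delay;
# otherwise the slower portion is chosen, leaving fast memory free
# for more demanding applications. All names are hypothetical.

def select_portion(app_info: dict, first: dict, second: dict) -> str:
    """Return 'first' or 'second' based on application information and
    performance characteristics of the two portions."""
    candidates = {"first": first, "second": second}
    if app_info.get("delay_sensitive") or app_info.get("priority", 0) > 5:
        # Match a demanding application to the lower-delay portion.
        return min(candidates, key=lambda k: candidates[k]["access_delay_ns"])
    # Undemanding application: take the longer-delay (lower-grade) portion.
    return max(candidates, key=lambda k: candidates[k]["access_delay_ns"])

choice = select_portion(
    {"delay_sensitive": True},
    {"access_delay_ns": 80.0},
    {"access_delay_ns": 900.0},
)
```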
In particular embodiments of the method 100 the information associated with the application comprises one or more of memory type requirements, memory volume requirements, application priority, and application delay sensitivity. Having such information, the MA may suitably match application requirement(s) to performance characteristics of a portion(s) of the memory block allocated to the logical server, enabling optimal use of available memory and/or fulfilling performance requirement of the application.
In a certain embodiment, the method 100 further comprises the MA sending S150 information relating to the selected S140 one of the first portion and the second portion of the memory block for enabling allocation of memory to the application. For example, sending S150 the information to a memory management entity.
According to this embodiment, the sending S150 may comprise initiating an update of a memory management table, such as a MMC table or a MMU table. Additionally, or alternatively, the sending S150 may comprise informing the MMC of physical memory addresses associated with the selected S140 one of the first portion and the second portion of the memory block. In this way, the process is transparent to the OS, as the OS will select an address range from its virtual addresses without querying the MA first, and the selection and mapping are done by the MA and the MMC. Hence, the OS is not affected in this embodiment. Suitably, the information associated with the application received S130 by the MA comprises information relating to a memory space in the OS virtual memory, selected by the OS in response to an application memory request. This may enable the MA to perform a virtual-to-physical memory mapping, which may further be used for performing an update of the MMC memory mapping table.
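The OS-transparent variant might be pictured as follows; the MMC interface below is entirely hypothetical (a real MMC is hardware, not a Python dictionary), and serves only to show the MA mapping the OS-selected virtual range onto the physical addresses of the selected portion:

```python
# Sketch of the OS-transparent flow: the OS has already chosen an
# OS-virtual address range; the MA picks the physical portion and
# initiates the update of the MMC mapping table (S150). All names
# and page values here are invented for illustration.

mmc_table = {}  # OS virtual page -> physical page

def ma_update_mapping(os_virtual_pages: list, selected_portion_pages: list) -> None:
    """MA informs the MMC of the physical pages of the selected portion,
    mapping them onto the OS-selected virtual pages."""
    for os_page, phys_page in zip(os_virtual_pages, selected_portion_pages):
        mmc_table[os_page] = phys_page

# OS selected virtual pages 0x4000 and 0x5000; the MA selected two
# physical pages in the first portion of the memory block.
ma_update_mapping([0x4000, 0x5000], [0x9000, 0xA000])
```

Because only the MMC table changes, the OS continues to use the virtual range it selected, unaware of which portion backs it.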
Alternatively, the sending S150 may comprise informing an OS of virtual memory addresses, such as a virtual memory address range, associated with the selected S140 one of the first portion and the second portion of the memory block. Receiving such information enables the OS to select a memory space from the OS virtual memory to which to map the application virtual memory, such as for example received in a memory request from the application.
According to this alternative, no update of the tables for virtual-to-physical memory mapping is required in the middle of the process, so it can be faster. However, the OS needs to send the information associated with the application to the MA, and receive a response, before selecting the address range in the OS virtual memory. Hence, some modification of the OS is needed.
In a particular embodiment, the MA 400 keeps a table of available memory units and allocated memory blocks, e.g. portions thereof, with their exact locations and addresses. It monitors the access rate and occupancy percentage of each memory unit, and updates the grades of memory blocks based on the monitoring data, for example memory characteristics. The memory grades are used by the MA 400 to select suitable parts of the physical memory based on the application requirements.
The MA may thus, before the selection S240, obtain performance characteristics associated with the first portion of the memory block and performance characteristics associated with the second portion of the memory block, for instance by monitoring hardware of the logical server, e.g., memory units, CPU(s), communication links, etc. As an alternative, the MA 400 obtains said performance characteristics, or at least portions thereof, from a separate function which monitors the hardware.
In one embodiment of the method 200, the OS 500 further selects S211 a memory address range from the OS virtual memory and sends S212 the information related to the selected memory address range to a MMU 600. This information may also be comprised in the information associated with the application sent S220 to the MA 400, and hence this information is received S230 by the MA 400. The MA 400 further sends S241 information relating to the selected S240 one of the first portion and the second portion of the memory block, e.g., in form of an update message related to the physical memory addresses associated with the selected S240 one of the first portion and the second portion of the memory block, to an MMC 700.
According to the known art, when an application sends a request to the OS to allocate a part of memory, the OS normally looks for a part of the memory of the same size as the application requested. This may be selected from anywhere within the virtual memory address space, as the OS has no notion of the different characteristics of the underlying physical memory units. There is also a predefined mapping of the virtual memory addresses and the physical addresses kept by the MMCs, as exemplified by
Returning to
Particularly, the at least one processor is configured to cause the MA to perform a set of operations, or actions, S110-S140, and in some embodiments also optional actions, as disclosed above. For example, the memory 420 may store the set of operations 425, and the at least one processor 410 may be configured to retrieve the set of operations 425 from the memory 420 to cause the MA 400 to perform the set of operations. The set of operations may be provided as a set of executable instructions. Thus the at least one processor 410 is thereby arranged to execute methods as herein disclosed.
The memory 420 may also comprise persistent storage 427, which, for example, can be any single one or combination of magnetic memory, optical memory, solid state memory or even remotely mounted memory.
The MA 400 may further comprise an input/output unit 430 for communications with resources, arrangements or entities of the data center. As such the input/output unit 430 may comprise one or more transmitters and receivers, comprising analogue and digital components.
The at least one processor 410 controls the general operation of the MA 400 e.g. by sending data and control signals to the input/output unit 430 and the memory 420, by receiving data and reports from the input/output unit 430, and by retrieving data and instructions from the memory 420. Other components, as well as the related functionality, of the MA 400 are omitted in order not to obscure the concepts presented herein.
In this particular example, at least some of the steps, functions, procedures, modules and/or blocks described herein are implemented in a computer program, which is loaded into the memory 420 for execution by processing circuitry including one or more processors 410. The memory 420 may comprise, such as contain or store, the computer program. The processor(s) 410 and memory 420 are interconnected to each other to enable normal software execution. An input/output unit 430 is also interconnected to the processor(s) 410 and/or the memory 420 to enable input and/or output of data and/or signals.
The term ‘processor’ should herein be interpreted in a general sense as any system or device capable of executing program code or computer program instructions to perform a particular processing, determining or computing task.
The processing circuitry does not have to be dedicated to only execute the above-described steps, functions, procedure and/or blocks, but may also execute other tasks.
The flow diagram or diagrams presented herein may be regarded as a computer flow diagram or diagrams, when performed by one or more processors. A corresponding apparatus may be defined as a group of function modules, where each step performed by the processor 410 corresponds to a function module. In this case, the function modules are implemented as a computer program running on the processor 410.
The computer program residing in memory 420 may thus be organized as appropriate function modules configured to perform, when executed by the processor 410, at least part of the steps and/or tasks.
The MA 400 may additionally comprise a sending module 490, for sending information relating to the selected one of the first portion and the second portion of the memory block for enabling allocation of memory to the application.
In general terms, each functional module 450-490 may be implemented in hardware or in software. Preferably, one or more or all functional modules 450-490 may be implemented by processing circuitry including at least one processor 410, possibly in cooperation with functional units 420 and/or 430. The processing circuitry may thus be arranged to fetch from the memory 420 instructions as provided by a functional module 450-490 and to execute these instructions, thereby performing any actions of the MA 400 as disclosed herein.
Alternatively it is possible to realize the module(s) in
The components of the arrangement according to some embodiments herein, comprising a MA 400 and a logical server OS 500, and which additionally may comprise a MMU 600 and a MMC 700, may be realized by way of software, hardware, or a combination thereof.
Particularly, the at least one processor is configured to cause the arrangement to perform a set of operations, or actions, S210-S240, and in some embodiments also optional actions, as disclosed above. For example, the memory 820 may store the set of operations 825, and the at least one processor 810 may be configured to retrieve the set of operations 825 from the memory 820 to cause the arrangement 800 to perform the set of operations. The set of operations 825 may be provided as a set of executable instructions. Thus the at least one processor 810 is thereby arranged to execute methods as herein disclosed.
The memory 820 may also comprise persistent storage 827, which, for example, can be any single one or combination of magnetic memory, optical memory, solid state memory or even remotely mounted memory.
The arrangement 800 may further comprise an input/output unit 830 for communications with resources, other arrangements or entities of a data center. As such the input/output unit may comprise one or more transmitters and receivers, comprising analogue and digital components.
The at least one processor controls the general operation of the arrangement 800 e.g. by sending data and control signals to the input/output unit and the memory, by receiving data and reports from the input/output unit, and by retrieving data and instructions from the memory.
In this particular example, at least some of the steps, functions, procedures, modules and/or blocks described herein are implemented in a computer program, which is loaded into the memory 820 for execution by processing circuitry including one or more processors 810. The memory 820 may comprise, such as contain or store, the computer program. The processor(s) 810 and memory 820 are interconnected to each other to enable normal software execution. An input/output unit 830 is also interconnected to the processor(s) 810 and/or the memory 820 to enable input and/or output of data and/or signals.
The term ‘processor’ should herein be interpreted in a general sense as any system or device capable of executing program code or computer program instructions to perform a particular processing, determining or computing task.
The processing circuitry does not have to be dedicated to only execute the above-described steps, functions, procedure and/or blocks, but may also execute other tasks.
The flow diagram or diagrams presented herein may be regarded as a computer flow diagram or diagrams, when performed by one or more processors. A corresponding apparatus may be defined as a group of function modules, where each step performed by the processor 810 corresponds to a function module. In this case, the function modules are implemented as a computer program running on the processor 810.
The computer program residing in memory 820 may thus be organized as appropriate function modules configured to perform, when executed by the processor 810, at least part of the steps and/or tasks.
In one embodiment, the arrangement 800 further comprises
According to this embodiment, the arrangement further comprises
In another embodiment of the arrangement 800, the second sending module 863 is additionally for sending from the MA, information related to the information associated with the application to a Memory Management Controller, MMC; and for sending from the MA, information relating to the selected one of the first portion and the second portion of the memory block to the OS.
Further according to this embodiment, the first receiving module 850 is additionally for receiving at the OS, the information relating to the selected portion of the memory block from the MA; and the second selecting module 853 is additionally for selecting by the OS, a memory address range from an OS virtual memory; and the first sending module 852 is additionally for sending from the OS, the information related to the selected memory address range to a Memory Management Unit, MMU.
In general terms, each functional module 850-863 may be implemented in hardware or in software. Preferably, one or more or all functional modules 850-863 may be implemented by processing circuitry including at least one processor 810, possibly in cooperation with functional units 820 and/or 830. The processing circuitry may thus be arranged to fetch from the memory 820 instructions as provided by a functional module 850-863 and to execute these instructions, thereby performing any actions of the arrangement 800 as disclosed herein.
Alternatively it is possible to realize the module(s) in
It will be appreciated that the foregoing description and the accompanying drawings represent non-limiting examples of the methods and apparatus taught herein. As such, the apparatus and techniques taught herein are not limited by the foregoing description and accompanying drawings. Instead, the embodiments herein are limited only by the following claims and their legal equivalents.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/SE2017/050694 | 6/22/2017 | WO | 00 |