Embodiments generally relate to technology for a computer runtime environment. More particularly, embodiments relate to managing a runtime computer apparatus for tiered object memory placement.
Computer-based applications such as servers, cloud infrastructure and database controllers make use of virtual machines, such as the Java Virtual Machine, to provide the ability to execute programs in a platform-independent environment. Virtual machines allow programs to be implemented on a variety of computing platforms while requiring only relatively minor code changes. Managed programming languages such as JAVA, PYTHON and JAVASCRIPT provide programmers with tools for creating flexible applications that may be executed in a virtual machine environment. A managed runtime is used in such managed languages, providing memory management, garbage collection, and hot method profiling. In a managed runtime, an application is typically rendered as a machine-independent set of instructions (byte-code), which is then interpreted during execution. A hot method profiler collects runtime information to identify which methods are hot spots. Identified hot methods may thus be compiled into native code via a Just-in-Time (JIT) compiler for faster execution. A garbage collector typically runs in an independent thread to manage and recycle the data heap.
Managing runtime allocation of objects for memory storage, particularly in a heterogeneous memory architecture using two or more tiers of memory having different performance, capacity and cost, presents challenges when attempting to optimize for higher performance or lower cost.
The various advantages of the embodiments will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
Hot object profiler 121 receives hot methods (e.g., as a list of hot methods) from hot method profiler 111 and may identify hot objects based on the hot methods. Hot object profiler 121 may also monitor objects in hot object heap 126 and/or objects in cold object heap 127 and update a hotness score for one or more monitored objects. Additional details and features of hot object profiler 121 are further described below with reference to
Object allocation redirector 122 manages the allocation of newly-created objects (i.e., objects as they are created) into hot object heap 126 and cold object heap 127. Object allocation redirector 122 may be incorporated within JIT compiler 112 (e.g., forming an enhanced JIT compiler) or may otherwise be closely associated with JIT compiler 112. Object allocation redirector 122 may employ extensions to byte-code methods 114 and native code methods 115 (e.g., see
Enhanced garbage collector 123 operates to manage and recycle objects in hot object heap 126 and cold object heap 127. Object migrator 124 operates as part of the garbage collection process to migrate objects between hot object heap 126 and cold object heap 127, based on a hotness score for each object. Enhanced garbage collector 123 may run in an execution thread independent of an execution thread for JIT compiler 112 and object allocation redirector 122. Additional details and features of enhanced garbage collector 123 (including object migrator 124) are further described below with reference to
Hot object heap 126 may store hot objects in a first memory tier, labeled as Memory Tier 1. Cold object heap 127 may store cold objects in a second memory tier, labeled as Memory Tier 2. Memory tier 1 may include memory components of a different type or quality as compared to memory components of Memory tier 2. For example, on the one hand memory tier 1 may include memory components having higher performance, higher cost and/or lower capacity, while on the other hand memory tier 2 may include memory components having lower cost, lower performance and/or higher capacity. For example, memory tier 1 may have a tight constraint on capacity (e.g., due to higher cost), while memory tier 2 may be more abundant (e.g., due to lower cost). As an illustration, memory tier 1 may include DRAM (dynamic random access memory), while memory tier 2 may include Intel® Optane™ memory. As another illustration, memory tier 1 may include eDRAM (embedded DRAM), while memory tier 2 may include traditional DRAM. Other memory tier combinations are possible. Additional details and features of hot object heap 126 and cold object heap 127 are further described below with reference to
Hot object heap 211 may store objects designated as hot objects. Cold object heap 214 may store objects designated as cold objects. Hot object heap 211 may be hot object heap 126, and cold object heap 214 may be cold object heap 127, each as described above with reference to
Object classes are organized so that each class contains its most closely related methods and their related data, and different classes have minimal data interdependency. That is, a method in one class does not access a data field in another class directly; instead, it accesses the field through well-defined interfaces/methods such as getField( ) and setField( ). If the class instance to which a hot method belongs is considered a hot object, the memory placement of hot objects (i.e., objects accessed by hot methods) is most likely to impact overall performance. Of note, second-layer object access (i.e., a reference to another object's data from a hot object) may be naturally captured when that object's getField( )/setField( ) become hot methods.
Accordingly, apportioning objects between memory tiers based on an object's hotness promotes the ability to optimize or enhance performance and/or cost for a hybrid memory architecture. For example, allocating hot objects to a higher-performing memory tier will promote improved overall performance, since those objects accessed (or likely to be accessed) frequently will have a greater impact on performance compared to objects accessed less frequently. Similarly, allocating cold objects to a lower cost (but lower-performing) memory tier will have a lesser impact on overall performance, since any performance difference for infrequently-accessed objects would not be likely to add up to a significant difference in overall performance; yet allocating cold objects to a lower cost memory tier will be more cost effective than placing those cold objects in a more costly memory tier. Overall, the tiered object memory placement techniques described herein may provide for reduced cost of memory while minimizing any degradation of performance caused through use of lower-performing memory in a memory tier.
In one or more embodiments, objects may additionally have a designation as young or old, based on the relative age of the object (e.g., the length of time since the object was created or allocated). Objects may include a parameter or field which may be marked (or flagged) to indicate whether the object is young or old. There may be many ways to handle objects flagged as young or old, respectively, including but not limited to organizing them into separate spaces or storage areas. As shown in
In one or more embodiments, objects may additionally have a designation as young or old, based on the relative age of the object (e.g., the length of time since the object was created or allocated) as discussed above. As shown in
These hot methods are used by hot object profiler 121 to efficiently identify hot objects. For each received hot method, hot object profiler 121 may identify internal objects 311, which are objects that are allocated (created) internally within the method, and external objects 312, which are objects allocated (created) externally to the method and passed to the method as arguments in the method call. For each internal object 311, hot object profiler 121 may identify the internal object type (or class) as hot (label 313). This identification may be used by object allocation redirector 122 in allocating objects newly-created (by the hot method) of that object type to the hot object heap, as described further below.
For each external object 312, hot object profiler 121 may count the invocation of each object referenced as an argument (label 314). In some embodiments, hot object profiler 121 may generate a weighted list of objects based on invocation frequency, such as, e.g.:
Hot object profiler 121 may then apply a filter (label 315) to cull out objects having lower invocation frequency (or lower weight in the weighted list), resulting in objects having higher invocation frequency (or higher weight in the weighted list) which may then be assigned a hotness score attributable to hot objects (label 316). Object filtering may be based on a threshold (e.g., a predetermined threshold) of weight (%), or may be performed dynamically based on memory tier budgets or relative memory tier availability. Hot object profiler 121 may assign a hotness score to each such hot object. As an example, the hotness score assigned to the hot objects may be a value of 2. The lower invocation frequency (or lower weight) objects culled out by filter 315 may be assigned a hotness score attributable to cold objects (label 317) by hot object profiler 121, and each such cold object may be assigned a hotness score; as an example, the hotness score assigned to the cold objects may be a value of 0. Objects identified as hot objects or cold objects by hot object profiler 121 may remain in their current memory location (current object heap), subject to migration (relocation) to the appropriate hot/cold object heap in a subsequent garbage collection cycle by enhanced garbage collector 123.
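As an illustrative sketch only (not part of the embodiment itself), the counting, filtering and scoring of labels 314-317 may be expressed as follows. The function name, the dictionary representation and the 10% weight threshold are assumptions for illustration; the example scores (a value of 2 for hot objects, 0 for cold objects) follow the description above:

```python
HOT_SCORE = 2   # example hotness score attributed to hot objects (label 316)
COLD_SCORE = 0  # example hotness score attributed to cold objects (label 317)

def classify_external_objects(invocation_counts, weight_threshold=0.10):
    """Assign hotness scores to externally-passed objects by invocation weight.

    invocation_counts maps an object identifier to its invocation count
    (label 314). Objects whose share of total invocations meets the
    threshold are scored hot; the rest are culled out as cold (label 315).
    """
    total = sum(invocation_counts.values())
    scores = {}
    for obj, count in invocation_counts.items():
        weight = count / total if total else 0.0
        scores[obj] = HOT_SCORE if weight >= weight_threshold else COLD_SCORE
    return scores
```

In practice the threshold need not be fixed; as noted above, it may instead be derived dynamically from memory tier budgets or relative tier availability.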
Hot object profiler 121 may also monitor heap objects (label 318) and update a hotness score for one or more monitored objects. Monitored heap objects may include objects in hot object heap 126 and/or objects in cold object heap 127. For each monitored heap object, hot object profiler 121 may count the invocation of each object. Hot object profiler 121 may then modify a hotness score based on counting invocation by hot methods. In some embodiments, the hotness score for objects may be increased for objects invoked by a hot method (e.g., increasing hotness score by 1), and the hotness score for objects may be decreased for objects not currently invoked by a hot method (e.g., decreasing hotness score by 1). In some embodiments, the hotness score for objects may be reset to a hot value for objects invoked by a hot method (e.g., reset hotness to a value of 2), and the hotness score for objects may be reset to a cold value for objects not currently invoked by a hot method (e.g., reset hotness to a value of 0).
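The two score-update policies just described may be sketched as follows; the function and mode names are illustrative assumptions, while the numeric values (adjust by 1, or reset to a hot value of 2 or a cold value of 0) are the examples given above:

```python
def update_hotness(score, invoked_by_hot_method, mode="delta"):
    """Update a monitored object's hotness score (label 318 sketch).

    "delta" mode increases the score by 1 when the object is invoked by a
    hot method and decreases it by 1 otherwise; "reset" mode snaps the
    score to the example hot (2) or cold (0) value.
    """
    if mode == "delta":
        return score + 1 if invoked_by_hot_method else score - 1
    return 2 if invoked_by_hot_method else 0
```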
At block 414, each internal object (e.g., of internal objects 311) for received hot method_i may be identified as a hot object. Hot object profiler 121 may assign a hotness score to each such internal object identified as a hot object (e.g., the hotness score assigned to the hot internal objects may be a value of 1).
At block 416, hot object profiler 121 may count the invocation of each external object referenced as an argument in hot method_i. In some embodiments, hot object profiler 121 may generate a weighted list of objects based on invocation frequency.
At block 418, hot object profiler 121 may determine if it has reached the end of the list of hot methods. If no, the process may return to block 414 and evaluate the next hot method in the hot method list. If yes, the process may continue with block 420.
At block 420, hot object profiler 121 may identify external objects as hot (or cold) based on invocation frequency. Hot object profiler 121 may filter out objects having lower invocation frequency (or lower weight in the weighted list), resulting in objects having higher invocation frequency (or higher weight in the weighted list) which may then be identified as hot objects. Hot object profiler 121 may assign a hotness score to each such hot object for indicating the object as hot (e.g., the hotness score assigned to the hot external objects may be a value of 2). External objects that are filtered out (i.e., those having a lower invocation frequency or lower weight) may be identified as cold objects, and hot object profiler 121 may assign a hotness score (e.g., a value of 0) to each of the objects identified as cold objects.
At block 611, a method_i to which an allocation address belongs is examined.
At block 612, a query is made to determine whether the method_i has already been compiled to native code via JIT compilation (e.g., method_i is a hot method). If yes, the process may continue with block 613; if no, the process may continue with block 614.
At block 613, native code functions produced via JIT compilation are executed. A call to cold_heap_allocator ( ) may be replaced with (i.e., redirected to) a call to hot_heap_allocator ( ), which will allocate the object to hot object heap 126.
At block 614, it is determined whether the method_i is going to be compiled to native code via JIT compilation (e.g., method_i is newly-identified as a hot method). If no, the process may continue with block 615; if yes, the process may continue with block 616.
At block 615, byte-code functions are executed via interpretation by the runtime environment of the virtual machine. A call to new_instance_cold ( ) may be replaced with (i.e., redirected to) a call to new_instance_hot ( ), which will allocate the object to hot object heap 126.
At block 616, byte-code functions are executed via interpretation by the runtime environment of the virtual machine, but the respective method is to be compiled to native code by JIT compiler 112 for execution. A call to new_instance_cold ( ) may be compiled to a native code call to hot_heap_allocator ( ), which will allocate the object to hot object heap 126.
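The decision flow of blocks 611-616 may be sketched as follows. This is a simplified illustration under the assumption that the flow applies to allocations being redirected toward the hot object heap; the boolean parameters stand in for the queries at blocks 612 and 614, and the returned strings name the allocation calls described above:

```python
def redirect_allocation(already_compiled, will_be_compiled):
    """Return the call that replaces the default cold-heap allocation."""
    if already_compiled:            # block 613: method is already native code
        return "hot_heap_allocator()"
    if will_be_compiled:            # block 616: byte-code, about to be compiled
        return "hot_heap_allocator()"
    return "new_instance_hot()"     # block 615: interpreted byte-code path
```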
In
In
In
In some embodiments, migrating objects between hot object heap 211 and cold object heap 214 may include moving a first object from the cold object heap to the hot object heap if the hotness score associated with the first object is greater than a threshold value, and may include moving a second object from the hot object heap to the cold object heap if the hotness score associated with the second object is less than or equal to the threshold value. In some embodiments, the threshold value may be determined based on a size of the first tier memory. For example, for a larger first tier memory the hot object heap can hold a greater number of hot objects, so the hotness threshold may be set lower, which would result in more hot objects in the hot object heap. As another example, for a smaller first tier memory the hot object heap can only hold a smaller number of hot objects, so the hotness threshold may be set higher, which would result in fewer hot objects in the hot object heap. In some embodiments, the threshold value may be determined based on a relative capacity of the first tier memory and the second tier memory. In some embodiments, the threshold value may be determined based on workload characteristics, such as to capture as hot objects specific object types that are used frequently.
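A single migration pass under the threshold rule above may be sketched as follows; the function name and the list/dictionary representation of the heaps and scores are illustrative assumptions:

```python
def migrate(hot_heap, cold_heap, scores, threshold):
    """One migration pass: promote cold objects whose hotness score exceeds
    the threshold, demote hot objects whose score is at or below it."""
    promote = [o for o in cold_heap if scores[o] > threshold]
    demote = [o for o in hot_heap if scores[o] <= threshold]
    new_hot = [o for o in hot_heap if o not in demote] + promote
    new_cold = [o for o in cold_heap if o not in promote] + demote
    return new_hot, new_cold
```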
At block 814, a query is made to determine whether a scanned object (k) is a hot object. If yes (i.e., the object is now hot), the process may continue with block 816; if no (i.e., the object remains cold), the process may continue with block 818.
At block 816, the hot object (k) is moved (i.e., promoted) from the cold object heap and stored in the hot object heap (such as hot object heap 211).
At block 818, enhanced garbage collector 123 may determine if it has reached the end of the objects in the cold object heap. If no, the process may return to block 814 and evaluate the next object in the cold object heap. If yes, the process may continue with the procedures as described below with reference to
At block 824, a query is made to determine whether a scanned object (k) is a hot object. If no (i.e., the object is now cold), the process may continue with block 826; if yes (i.e., the object remains hot), the process may continue with block 828.
At block 826, the cold object (k) is moved (i.e., demoted) from the hot object heap and stored in the cold object heap (such as cold object heap 214).
At block 828, enhanced garbage collector 123 may determine if it has reached the end of the objects in the hot object heap. If no, the process may return to block 824 and evaluate the next object in the hot object heap. If yes, the process may end. It will be understood that objects in the heap may be scanned as a group and then evaluated one object at a time, or each object in the heap may be scanned and evaluated one object at a time. It will be further understood that the processes outlined in method 810 and in method 820 may be combined into a single process for object promotion and demotion as part of the garbage collection process.
At block 832, a query is made to determine whether a scanned object (k) has aged. If yes (i.e., the object is now old), the process may continue with block 833; if no (i.e., the object remains young), the process may continue with block 834. In some embodiments, if the object is already marked as old the procedures regarding object age at blocks 832-833 may be skipped. In some embodiments, the procedures regarding object age at blocks 832-833 may be carried out instead as a separate process regarding monitoring object age.
At block 833, object (k) may be marked (or flagged) as an old object. For example, a parameter or field of object (k) may be changed from an indication as a young object to an indication of an old object.
At block 834, a query is made to determine whether the scanned object (k) is a hot object. If yes (i.e., the object is now hot), the process may continue with block 835; if no (i.e., the object remains cold), the process may continue with block 836.
At block 835, the hot object (k) is moved (i.e., promoted) from the cold object heap and stored in the hot object heap (such as hot object heap 211). If the object (k) is marked as an old object, it will become one of the hot old objects 213; if the object (k) is marked as a young object, it will become one of the hot young objects 212.
At block 836, enhanced garbage collector 123 may determine if it has reached the end of the objects in the cold object heap. If no, the process may return to block 832 and evaluate the next object in the cold object heap. If yes, the process may continue with the procedures described below with reference to
At block 842, a query is made to determine whether a scanned object (k) has aged. If yes (i.e., the object is now old), the process may continue with block 843; if no (i.e., the object remains young), the process may continue with block 844. In some embodiments, if the object is already marked as old the procedures regarding object age at blocks 842-843 may be skipped. In some embodiments, the procedures regarding object age at blocks 842-843 may be carried out instead as a separate process regarding monitoring object age.
At block 843, object (k) may be marked (or flagged) as an old object. For example, a parameter or field of object (k) may be changed from an indication as a young object to an indication of an old object.
At block 844, a query is made to determine whether the scanned object (k) is a hot object. If no (i.e., the object is now cold), the process may continue with block 845; if yes (i.e., the object remains hot), the process may continue with block 846.
At block 845, the cold object (k) is moved (i.e., demoted) from the hot object heap and stored in the cold object heap (such as cold object heap 214). If the object (k) is marked as an old object, it will become one of the cold old objects 216; if the object (k) is marked as a young object, it will become one of the cold young objects 215.
At block 846, enhanced garbage collector 123 may determine if it has reached the end of the objects in the hot object heap. If no, the process may return to block 842 and evaluate the next object in the hot object heap. If yes, the process may end. It will be understood that objects in the heap may be scanned as a group and then evaluated one object at a time, or each object in the heap may be scanned and evaluated one object at a time. It will be further understood that the order of considering object age and object hotness may be reversed. It will be further understood that the processes outlined in method 830 and in method 840 may be combined into a single process for object promotion and demotion as part of the garbage collection process.
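The combined age-and-hotness classification performed across methods 830 and 840 may be sketched as follows. The space labels follow the description (hot young objects 212, hot old objects 213, cold young objects 215, cold old objects 216); the function name and the numeric age/hotness thresholds are illustrative assumptions:

```python
def classify(obj_score, obj_age, hotness_threshold, old_age_threshold):
    """Return the target space for an object after a GC scan."""
    old = obj_age >= old_age_threshold   # age check, blocks 832-833 / 842-843
    hot = obj_score > hotness_threshold  # hotness check, blocks 834 / 844
    if hot:
        return "hot_old_213" if old else "hot_young_212"
    return "cold_old_216" if old else "cold_young_215"
```

As noted above, the order of the age and hotness checks may be reversed without changing the resulting classification.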
At block 912, the apparatus may assign a hotness score to an object having an object type based on an invocation count of objects referenced by a hot method.
At block 914, the apparatus may allocate a newly-created object to one of a hot object heap, said hot object heap assigned to store hot objects in a first memory tier, or a cold object heap, said cold object heap assigned to store cold objects in a second memory tier, based on the hotness score associated with the object type for the newly-created object.
At block 916, the apparatus may migrate a plurality of objects between the hot object heap and the cold object heap based on a hotness score associated with each object, said object migration operating in an execution thread independent of an execution thread for said object allocation.
At block 922, the apparatus may monitor the invocation frequency of objects in the hot object heap and the invocation frequency of objects in the cold object heap.
At block 924, the apparatus may modify the hotness score of at least one object based on the monitored invocation frequency of that object. In some embodiments, the hotness score for objects may be increased for objects invoked by a hot method (e.g., increasing hotness score by 1), and the hotness score for objects may be decreased for objects not currently invoked by a hot method (e.g., decreasing hotness score by 1). In some embodiments, the hotness score for objects may be reset to a hot value for objects invoked by a hot method (e.g., reset hotness to a value of 2), and the hotness score for objects may be reset to a cold value for objects not currently invoked by a hot method (e.g., reset hotness to a value of 0).
At block 932, the apparatus may move a first object from the cold object heap to the hot object heap if the hotness score associated with the first object is greater than a threshold value.
At block 934, the apparatus may move a second object from the hot object heap to the cold object heap if the hotness score associated with the second object is less than or equal to the threshold value.
In some embodiments, the threshold value may be determined based on a relative capacity of the first tier memory and the second tier memory. In some embodiments, allocating a newly-created object to a hot object heap may include replacing a cold object allocator function with a hot object allocator function. In some embodiments, the apparatus may additionally assign a predetermined hotness score for a second newly-created object, the second newly-created object created by the hot method, and allocate the second newly-created object to the hot object heap.
System 1010 may also include an input/output (I/O) subsystem 1016. I/O subsystem 1016 may communicate with, for example, one or more input/output (I/O) devices 1028, a network controller 1024 (e.g., wired and/or wireless NIC), and storage 1030. Storage 1030 may be comprised of any appropriate non-transitory machine- or computer-readable memory type (e.g., flash memory, DRAM, SRAM (static random access memory), solid state drive (SSD), hard disk drive (HDD), optical disk, etc.). Storage 1030 may include mass storage. In some embodiments, host processor 1012 and/or I/O subsystem 1016 may communicate with storage 1030 (all or portions thereof) via network controller 1024. In some embodiments, system 1010 may also include a graphics processor 1026.
System 1010 may include two or more memory tiers, such as a first memory tier 1032 (labeled as Memory Tier 1) and a second memory tier 1034 (labeled as Memory Tier 2). First memory tier 1032 may be comprised of a different memory type than second memory tier 1034. As illustrated in
Host processor 1012 and/or I/O subsystem 1016 may execute program instructions 1040 retrieved from system memory 1022 and/or storage 1030 to perform one or more aspects of the processes described above, including processes for hot object profiling described with reference to
Computer program code to carry out the processes described above may be written in any combination of one or more programming languages, including an object-oriented programming language such as JAVA, JAVASCRIPT, PYTHON, SMALLTALK, C++ or the like and/or conventional procedural programming languages, such as the “C” programming language or similar programming languages, and implemented as program instructions 1040. Additionally, program instructions 1040 may include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, state-setting data, configuration data for integrated circuitry, state information that personalizes electronic circuitry and/or other structural components that are native to hardware (e.g., host processor, central processing unit/CPU, microcontroller, microprocessor, etc.).
Host processor 1012 and I/O subsystem 1016 may be implemented together on a semiconductor die as a system on chip (SoC) 1020, shown encased in a solid line. SoC 1020 may therefore operate as a managed runtime computing apparatus. In some embodiments, SoC 1020 may also include one or more of system memory 1022, network controller 1024, and/or graphics processor 1026 (shown encased in dotted lines).
I/O devices 1028 may include one or more input devices, such as a touch-screen, keyboard, mouse, cursor-control device, microphone, digital camera, video recorder, camcorder, biometric scanners and/or sensors; input devices may be used to enter information and interact with system 1010 and/or with other devices. I/O devices 1028 may also include one or more output devices, such as a display (e.g., touch screen, liquid crystal display/LCD, light emitting diode/LED display, plasma panels, etc.), speakers and/or other visual or audio output devices. Input and/or output devices may be used, e.g., to provide a user interface.
Semiconductor apparatus 1110 may be constructed using any appropriate semiconductor manufacturing processes or techniques. Logic 1112 may be implemented at least partly in configurable logic or fixed-functionality hardware logic. For example, logic 1112 may include transistor channel regions that are positioned (e.g., embedded) within substrate(s) 1111. Thus, the interface between logic 1112 and substrate(s) 1111 may not be an abrupt junction. Logic 1112 may also be considered to include an epitaxial layer that is grown on an initial wafer of substrate(s) 1111.
Processor core 1200 is shown including execution logic 1250 having a set of execution units 1255-1 through 1255-N. Some embodiments may include a number of execution units dedicated to specific functions or sets of functions. Other embodiments may include only one execution unit or one execution unit that can perform a particular function. The illustrated execution logic 1250 performs the operations specified by code instructions.
After completion of execution of the operations specified by the code instructions, back end logic 1260 retires the instructions of code 1213. In one embodiment, the processor core 1200 allows out of order execution but requires in order retirement of instructions. Retirement logic 1265 may take a variety of forms as known to those of skill in the art (e.g., re-order buffers or the like). In this manner, processor core 1200 is transformed during execution of code 1213, at least in terms of the output generated by the decoder, the hardware registers and tables utilized by the register renaming logic 1225, and any registers (not shown) modified by the execution logic 1250.
Although not illustrated in
The system 1300 is illustrated as a point-to-point interconnect system, wherein the first processing element 1370 and the second processing element 1380 are coupled via a point-to-point interconnect 1350. It should be understood that any or all of the interconnects illustrated in
As shown in
Each processing element 1370, 1380 may include at least one shared cache 1896a, 1896b. The shared cache 1896a, 1896b may store data (e.g., instructions) that are utilized by one or more components of the processor, such as the cores 1374a, 1374b and 1384a, 1384b, respectively. For example, the shared cache 1896a, 1896b may locally cache data stored in a memory 1332, 1334 for faster access by components of the processor. In one or more embodiments, the shared cache 1896a, 1896b may include one or more mid-level caches, such as level 2 (L2), level 3 (L3), level 4 (L4), or other levels of cache, a last level cache (LLC), and/or combinations thereof.
While shown with only two processing elements 1370, 1380, it is to be understood that the scope of the embodiments is not so limited. In other embodiments, one or more additional processing elements may be present in a given processor. Alternatively, one or more of processing elements 1370, 1380 may be an element other than a processor, such as an accelerator or a field programmable gate array. For example, additional processing element(s) may include additional processor(s) that are the same as the first processor 1370, additional processor(s) that are heterogeneous or asymmetric to the first processor 1370, accelerators (such as, e.g., graphics accelerators or digital signal processing (DSP) units), field programmable gate arrays, or any other processing element. There can be a variety of differences between the processing elements 1370, 1380 in terms of a spectrum of metrics of merit including architectural, microarchitectural, thermal, power consumption characteristics, and the like. These differences may effectively manifest themselves as asymmetry and heterogeneity amongst the processing elements 1370, 1380. For at least one embodiment, the various processing elements 1370, 1380 may reside in the same die package.
The first processing element 1370 may further include memory controller logic (MC) 1372 and point-to-point (P-P) interfaces 1376 and 1378. Similarly, the second processing element 1380 may include a MC 1382 and P-P interfaces 1386 and 1388. As shown in
The first processing element 1370 and the second processing element 1380 may be coupled to an I/O subsystem 1390 via P-P interconnects 1376 and 1386, respectively. As shown in
In turn, I/O subsystem 1390 may be coupled to a first bus 1316 via an interface 1396. In one embodiment, the first bus 1316 may be a Peripheral Component Interconnect (PCI) bus, or a bus such as a PCI Express bus or another third generation I/O interconnect bus, although the scope of the embodiments is not so limited.
As shown in
Note that other embodiments are contemplated. For example, instead of the point-to-point architecture of
Each of the systems and methods described above, and each of the embodiments (including implementations) thereof, for managing a runtime computer apparatus for tiered object memory placement is considered to be performance-enhanced at least to the extent that it includes identifying hot objects, allocating hot objects to a hot object heap in a first memory tier and allocating cold objects to a cold object heap in a second memory tier, and migrating objects between the hot object heap and the cold object heap based on an object hotness score. Apportioning objects between memory tiers based on an object's hotness promotes the ability to optimize or enhance performance and/or cost for a hybrid memory architecture. For example, allocating hot objects to a higher-performing memory tier will promote improved overall performance, since those objects accessed (or likely to be accessed) frequently will have a greater impact on performance compared to objects accessed less frequently. Similarly, allocating cold objects to a lower cost (but lower-performing) memory tier will have a lesser impact on overall performance, since any performance difference for infrequently-accessed objects would not be likely to add up to a significant difference in overall performance; yet allocating cold objects to a lower cost memory tier will be more cost effective than placing those cold objects in a more costly memory tier. Overall, the tiered object memory placement techniques described herein may provide for reduced cost of memory while minimizing any degradation of performance caused through use of lower-performing memory in a memory tier. Indeed, the enhanced performance may translate into better responsiveness and/or smoothness of applications and an improved user experience.
Example 1 includes an enhanced computing system for managing a runtime computing environment, comprising a processor, and a memory coupled to the processor, the memory including a set of instructions which, when executed by the processor, cause the computing system to assign a hotness score to an object having an object type based on an invocation count of objects referenced by a hot method, allocate a newly-created object to one of a hot object heap, said hot object heap assigned to store hot objects in a first memory tier, or a cold object heap, said cold object heap assigned to store cold objects in a second memory tier, based on the hotness score associated with the object type for the newly-created object, and migrate a plurality of objects between the hot object heap and the cold object heap based on a hotness score associated with each object, said object migration operating in an execution thread independent of an execution thread for said object allocation.
Example 2 includes the computing system of Example 1, wherein the instructions, when executed, further cause the computing system to monitor the invocation frequency of objects in the hot object heap and the invocation frequency of objects in the cold object heap, and modify the hotness score of at least one object based on the monitored invocation frequency of that object.
Example 3 includes the computing system of Example 2, wherein to migrate a plurality of objects between the hot object heap and the cold object heap, the instructions, when executed, cause the computing system to move a first object from the cold object heap to the hot object heap if the hotness score associated with the first object is greater than a threshold value, and move a second object from the hot object heap to the cold object heap if the hotness score associated with the second object is less than or equal to the threshold value.
Example 4 includes the computing system of Example 3, wherein the threshold value is determined based on a relative capacity of the first tier memory and the second tier memory.
Example 5 includes the computing system of Example 1, wherein to allocate a newly-created object to a hot object heap, the instructions, when executed, cause the computing system to replace a cold object allocator function with a hot object allocator function.
Example 6 includes the computing system of any one of Examples 1 to 5, wherein the instructions, when executed, further cause the computing system to assign a predetermined hotness score for a second newly-created object, the second newly-created object created by the hot method, and allocate the second newly-created object to the hot object heap.
Example 7 includes a semiconductor apparatus for managing a runtime computing environment, comprising one or more substrates, and logic coupled to the one or more substrates, wherein the logic is implemented at least partly in one or more of configurable logic or fixed-functionality hardware logic, the logic coupled to the one or more substrates to assign a hotness score to an object having an object type based on an invocation count of objects referenced by a hot method, allocate a newly-created object to one of a hot object heap, said hot object heap assigned to store hot objects in a first memory tier, or a cold object heap, said cold object heap assigned to store cold objects in a second memory tier, based on the hotness score associated with the object type for the newly-created object, and migrate a plurality of objects between the hot object heap and the cold object heap based on a hotness score associated with each object, said object migration operating in an execution thread independent of an execution thread for said object allocation.
Example 8 includes the semiconductor apparatus of Example 7, wherein the logic coupled to the one or more substrates is further to monitor the invocation frequency of objects in the hot object heap and the invocation frequency of objects in the cold object heap, and modify the hotness score of at least one object based on the monitored invocation frequency of that object.
Example 9 includes the semiconductor apparatus of Example 8, wherein to migrate a plurality of objects between the hot object heap and the cold object heap, the logic coupled to the one or more substrates is to move a first object from the cold object heap to the hot object heap if the hotness score associated with the first object is greater than a threshold value, and move a second object from the hot object heap to the cold object heap if the hotness score associated with the second object is less than or equal to the threshold value.
Example 10 includes the semiconductor apparatus of Example 9, wherein the threshold value is determined based on a relative capacity of the first tier memory and the second tier memory.
Example 11 includes the semiconductor apparatus of Example 7, wherein to allocate a newly-created object to a hot object heap, the logic coupled to the one or more substrates is to replace a cold object allocator function with a hot object allocator function.
Example 12 includes the semiconductor apparatus of any one of Examples 7 to 11, wherein the logic coupled to the one or more substrates is further to assign a predetermined hotness score for a second newly-created object, the second newly-created object created by the hot method, and allocate the second newly-created object to the hot object heap.
Example 13 includes at least one non-transitory computer readable storage medium comprising a set of instructions for managing a runtime computing environment which, when executed by a computing system, cause the computing system to assign a hotness score to an object having an object type based on an invocation count of objects referenced by a hot method, allocate a newly-created object to one of a hot object heap, said hot object heap assigned to store hot objects in a first memory tier, or a cold object heap, said cold object heap assigned to store cold objects in a second memory tier, based on the hotness score associated with the object type for the newly-created object, and migrate a plurality of objects between the hot object heap and the cold object heap based on a hotness score associated with each object, said object migration operating in an execution thread independent of an execution thread for said object allocation.
Example 14 includes the at least one non-transitory computer readable storage medium of Example 13, wherein the instructions, when executed, further cause the computing system to monitor the invocation frequency of objects in the hot object heap and the invocation frequency of objects in the cold object heap, and modify the hotness score of at least one object based on the monitored invocation frequency of that object.
Example 15 includes the at least one non-transitory computer readable storage medium of Example 14, wherein to migrate a plurality of objects between the hot object heap and the cold object heap, the instructions, when executed, cause the computing system to move a first object from the cold object heap to the hot object heap if the hotness score associated with the first object is greater than a threshold value, and move a second object from the hot object heap to the cold object heap if the hotness score associated with the second object is less than or equal to the threshold value.
Example 16 includes the at least one non-transitory computer readable storage medium of Example 15, wherein the threshold value is determined based on a relative capacity of the first tier memory and the second tier memory.
Example 17 includes the at least one non-transitory computer readable storage medium of Example 13, wherein to allocate a newly-created object to a hot object heap, the instructions, when executed, cause the computing system to replace a cold object allocator function with a hot object allocator function.
Example 18 includes the at least one non-transitory computer readable storage medium of any one of Examples 13 to 17, wherein the instructions, when executed, further cause the computing system to assign a predetermined hotness score for a second newly-created object, the second newly-created object created by the hot method, and allocate the second newly-created object to the hot object heap.
Example 19 includes a method of operating a computing apparatus for managing a runtime computing environment, comprising assigning a hotness score to an object having an object type based on an invocation count of objects referenced by a hot method, allocating a newly-created object to one of a hot object heap, said hot object heap assigned to store hot objects in a first memory tier, or a cold object heap, said cold object heap assigned to store cold objects in a second memory tier, based on the hotness score associated with the object type for the newly-created object, and migrating a plurality of objects between the hot object heap and the cold object heap based on a hotness score associated with each object, said object migration operating in an execution thread independent of an execution thread for said object allocation.
Example 20 includes the method of Example 19, further comprising monitoring the invocation frequency of objects in the hot object heap and the invocation frequency of objects in the cold object heap, and modifying the hotness score of at least one object based on the monitored invocation frequency of that object.
Example 21 includes the method of Example 20, wherein migrating a plurality of objects between the hot object heap and the cold object heap comprises moving a first object from the cold object heap to the hot object heap if the hotness score associated with the first object is greater than a threshold value, and moving a second object from the hot object heap to the cold object heap if the hotness score associated with the second object is less than or equal to the threshold value.
Example 22 includes the method of Example 21, wherein the threshold value is determined based on a relative capacity of the first tier memory and the second tier memory.
Example 23 includes the method of any one of Examples 19 to 22, further comprising assigning a predetermined hotness score for a second newly-created object, the second newly-created object created by the hot method, and allocating the second newly-created object to the hot object heap.
Example 24 includes an apparatus for managing a runtime computing environment, comprising means for assigning a hotness score to an object having an object type based on an invocation count of objects referenced by a hot method, means for allocating a newly-created object to one of a hot object heap, said hot object heap assigned to store hot objects in a first memory tier, or a cold object heap, said cold object heap assigned to store cold objects in a second memory tier, based on the hotness score associated with the object type for the newly-created object, and means for migrating a plurality of objects between the hot object heap and the cold object heap based on a hotness score associated with each object, said object migration operating in an execution thread independent of an execution thread for said object allocation.
Example 25 includes the apparatus of Example 24, further comprising means for monitoring heap objects and updating a hotness score for monitored objects.
Example 26 includes the method of Example 19, wherein allocating a newly-created object to a hot object heap comprises replacing a cold object allocator function with a hot object allocator function.
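The migration behavior of Examples 3–4 (and 21–22) can likewise be sketched. The functions below are hypothetical illustrations: `capacity_threshold` shows one possible reading of deriving the threshold "based on a relative capacity of the first tier memory and the second tier memory" (keeping roughly the hot tier's share of objects above the cutoff), and `migrate` performs a single promotion/demotion pass, which a managed runtime would run in its own thread, independent of the allocation path.

```python
# Hypothetical sketch of hotness-based migration between heaps.
# The capacity-based threshold heuristic is an illustrative assumption.

def capacity_threshold(scores, hot_capacity, total_capacity):
    # Choose a threshold so that roughly the hot tier's share of capacity
    # worth of objects scores above it -- one possible reading of a
    # threshold "based on a relative capacity" of the two memory tiers.
    ranked = sorted(scores, reverse=True)
    keep = max(1, int(len(ranked) * hot_capacity / total_capacity))
    return ranked[keep - 1] if ranked else 0


def migrate(hot_heap, cold_heap, hotness, threshold):
    # Promote cold objects scoring above the threshold; demote hot objects
    # scoring at or below it. In a managed runtime this pass would run in
    # an execution thread independent of the object-allocation thread.
    promote = [o for o in cold_heap if hotness[o] > threshold]
    demote = [o for o in hot_heap if hotness[o] <= threshold]
    for o in promote:
        cold_heap.remove(o)
        hot_heap.append(o)
    for o in demote:
        hot_heap.remove(o)
        cold_heap.append(o)
```

For instance, with a threshold of 4, a cold object scoring 5 would be promoted to the hot heap while a hot object scoring 1 would be demoted to the cold heap.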
Thus, technology described herein improves the performance of managed runtime computer environments through implementing tiered object memory placement by identifying hot objects, allocating hot objects to a hot object heap in a first memory tier and allocating cold objects to a cold object heap in a second memory tier, and migrating objects between the hot object heap and the cold object heap based on an object hotness score. The technology described herein may be applicable in any number of managed runtime environments, including servers, cloud computing, browsers, and/or environments using JIT or AOT (ahead of time) compilation/processing.
Embodiments are applicable for use with all types of semiconductor integrated circuit (“IC”) chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, systems on chip (SoCs), SSD/NAND controller ASICs, and the like. In addition, in some of the drawings, signal conductor lines are represented with lines. Some may be different, to indicate more constituent signal paths, have a number label, to indicate a number of constituent signal paths, and/or have arrows at one or more ends, to indicate primary information flow direction. This, however, should not be construed in a limiting manner. Rather, such added detail may be used in connection with one or more exemplary embodiments to facilitate easier understanding of a circuit. Any represented signal lines, whether or not having additional information, may actually comprise one or more signals that may travel in multiple directions and may be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
Example sizes/models/values/ranges may have been given, although embodiments are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments. Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are highly dependent upon the platform within which the embodiment is to be implemented, i.e., such specifics should be well within purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments, it should be apparent to one skilled in the art that embodiments can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.
The term “coupled” may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections. In addition, the terms “first”, “second”, etc. may be used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
As used in this application and in the claims, a list of items joined by the term “one or more of” may mean any combination of the listed terms. For example, the phrases “one or more of A, B or C” may mean A, B, C; A and B; A and C; B and C; or A, B and C.
Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments can be implemented in a variety of forms. Therefore, while the embodiments have been described in connection with particular examples thereof, the true scope of the embodiments should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/CN2019/126847 | 12/20/2019 | WO |