One or more embodiments relate generally to the field of integrated circuit and computer system design. More particularly, one or more of the embodiments relate to a method and apparatus for hardware-based dynamic escape detection in managed run-time environments.
Managed run-time environments are the infrastructures for running applications based on new programming languages, such as Java and C-Sharp (C#). Within the context of managed run-time environments, the allocation of objects is performed from a common memory area, referred to as the “heap,” which is often a shared resource in such environments. Generally, the heap is periodically collected as part of the automatic memory management in such environments. This generally involves scanning dynamically allocated memory for unreachable objects and returning the memory occupied by such objects. As described herein, objects that are allocated can be classified as having either a local scope or a global scope.
As described herein, an object that is defined or classified as having local scope is an object that is visible to a single thread. In other words, a local object is only referenced by local pointers or linked to by other local objects of the same thread. Conversely, an object that is classified as having global scope is visible to more than one thread.
In multi-threaded managed run-time environments (MRTEs), many optimizations can be applied when working on objects that are known to be local to a single thread. Synchronization of local objects may be avoided, and local objects can be allocated in such a way as to enable local reclaiming, thus minimizing the workload of a global garbage collector in MRTEs.
A large percentage of objects are indeed local, but it is a challenge to determine for a given object if it is local or global. Conventionally, there are two approaches to determine if an object is local. First, one can perform compiler static analysis of the program and determine that from when an object is created until it is destroyed, there is no possible way for the object to become reachable from another thread. Unfortunately, static analysis can only identify a small fraction of the objects that may be provably identified as local.
A second approach for identifying local objects is to dynamically track which objects are local and which are global, and to detect when an object becomes global by detecting that a link to the object now makes it globally reachable. As described herein, the scope, or reachability, of an object refers to its visibility to threads: an object visible to a single thread has local scope and is referred to as “locally reachable,” whereas an object visible to more than one thread has global scope and is referred to as “globally reachable.”
Dynamic escape detection provides an approach for determining when a local object becomes global by detecting that a link to the object now makes it globally reachable. Conventional dynamic escape detection performs a check every time an object is updated. If the new link changes the target object from locally reachable to globally reachable, the scope of the target object changes from local to global; that is, the target object becomes a global object, or “globally reachable.” As described herein, a “write barrier” refers to performing such checks on a pointer update to determine whether dynamic escape is detected for a local object.
In most MRTEs, no effort is made to identify local objects and to optimize execution based on such knowledge, because static analysis identifies so few candidates to optimize. In addition, the overhead of dynamic escape detection offsets the benefits of optimizing for, and exploiting, knowledge of local objects.
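To make the conventional write barrier concrete, the following Python sketch models each heap object as a scope flag plus named pointer fields. The `HeapObject` class, `write_barrier`, and `make_global` are hypothetical names chosen for illustration, not part of any described apparatus. Every pointer store runs both checks: whether the updated object is global, and whether the new target is still local. When both hold, the target and every object reachable from it are marked global.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class HeapObject:
    """Hypothetical heap object: a scope flag plus named pointer fields."""
    name: str
    is_global: bool = False
    fields: dict = field(default_factory=dict)


def make_global(root):
    """Mark root and every object reachable from it as global."""
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj.is_global:
            continue  # global objects never point to local ones, so stop here
        obj.is_global = True
        stack.extend(t for t in obj.fields.values() if t is not None)


def write_barrier(source, field_name, target):
    """Store source.field_name = target, then check for dynamic escape.

    Returns True when the store made a previously local target global.
    """
    source.fields[field_name] = target
    # First check: is the updated object global?
    # Second check: is the newly linked target still local?
    if source.is_global and target is not None and not target.is_global:
        make_global(target)
        return True
    return False


shared = HeapObject("shared", is_global=True)
node = HeapObject("node")
child = HeapObject("child")
first = write_barrier(node, "next", child)     # local -> local: no escape
second = write_barrier(shared, "head", node)   # global -> local: escape
print(first, second, child.is_global)          # False True True
```

The early exit in `make_global` relies on an invariant the barrier itself maintains: a global object never points to a local object, so once a global object is reached, everything beyond it is already global. The cost discussed in the text is visible here, since every store pays for both checks even though escapes are comparatively rare.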
The various embodiments of the present invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
A method and apparatus for hardware-based dynamic escape detection in managed run-time environments are described. In one embodiment, the method includes the detection of a pointer update of a first object having a global scope. In one embodiment, the pointer update updates a link of the first object to point to a second object. In one embodiment, a single instruction is issued to assert that a scope attribute associated with the second object identifies a scope of the second object as global. The single instruction may return failure if the scope attribute that is associated with the second object identifies the scope of the second object as local. In one embodiment, failure of the single instruction may cause the single instruction to invoke a handler routine to verify that the scope of the second object is local. Verification may include the reading of an object descriptor for the second object to determine whether a scope attribute of the object descriptor indicates that the scope of the second object is local. If verified, the second object, and each object reachable from the second object, are converted into global objects.
In the following description, numerous specific details such as logic implementations, sizes and names of signals and buses, types and interrelationships of system components, and logic partitioning/integration choices are set forth in order to provide a more thorough understanding. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details. In other instances, control structures and gate level circuits have not been shown in detail to avoid obscuring the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate logic circuits without undue experimentation.
In the following description, certain terminology is used to describe features. For example, the term “logic” is representative of hardware and/or software configured to perform one or more functions. For instance, examples of “hardware” include, but are not limited or restricted to, an integrated circuit, a finite state machine or even combinatorial logic. The integrated circuit may take the form of a processor such as a microprocessor, application specific integrated circuit, a digital signal processor, a micro-controller, or the like.
As known to those skilled in the art, a virtual machine (VM) logically partitions a physical machine, such that the underlying hardware of the machine appears as one or more independently operating VMs. Although not shown, CVM 102 may include a virtual machine monitor (VMM) that creates CVM 102 and runs on platform hardware 140 to provide other software with an abstraction of one or more VMs. Accordingly, CVM 102 may function as a self-contained platform, running its own operating system (OS) and application software. As shown in
In one embodiment, MRTE 100 provides automatic memory management, type management, threads and synchronization, and dynamic loading facilities. The automatic memory management provided by MRTE 100 typically includes management of heap 106. As described herein, a heap is an area of memory reserved for the dynamic memory allocation needs of an application. The heap therefore holds data that is created at run time, usually because the size, quantity or lifetime of an object to be allocated cannot be determined at compile time. For devices with a small memory footprint, such as mobile devices including cellular telephones and personal digital assistants, managing this relatively limited memory to maximize available storage capacity is a substantial challenge.
Referring again to
Accordingly, while an MRTE may directly interpret bytecodes, this is not typically done unless memory is exceedingly limited. As shown in
As further illustrated in
As described herein, allocated objects can be classified as either (1) local, such that the object is visible to a single thread, or (2) global, such that the object is visible to more than one thread. In multi-threaded MRTE environments, for example, as shown in
A large percentage of objects are, indeed, local, but it is a challenge to determine, for a given object, whether it is local. A first approach is to perform compiler static analysis of a program to determine that, from when an object is created until it is destroyed, there is no possible way for the object to become reachable from another thread. Unfortunately, static analysis identifies only a small fraction of objects as provably local.
A second approach is to dynamically track which objects are local or global and to detect when an object becomes global by detecting that a link to the object now renders it globally reachable. This approach has been referred to in academic publications as “dynamic escape detection.” One approach to dynamic escape detection is to check, every time any pointer is updated, whether the new link changes the target object from locally reachable to globally reachable; if it does, “dynamic escape” is detected. As described herein, the checking of all pointer updates to detect dynamic escape is referred to as “write barriers.”
Referring again to
Accordingly, in the embodiments shown in
As shown in
Accordingly, as described herein, AAT introduces user-controlled attribute bits 234 that are associated with cache lines 201. Although described with reference to cache lines, the association of such metadata, or attribute bits, with blocks of memory is not limited to cache lines and, depending on the desired implementation, may extend to system memory and even to the paging of memory to disk.
Both the asynchronous notification associated with monitored line invalidations and the synchronous handling of a failed bit assertion may be provided by loading a user-specified handler. In one embodiment, referred to as the “memory line invalidate” (MLI) scenario, a user-selected handler routine is invoked when a line with a specified attribute bit set is invalidated from a thread's cache, either because another thread explicitly invalidates it or because the line is evicted from the cache. A second scenario is referred to as the “unexpected memory state” (UMS) scenario, in which a user-selected handler routine is invoked when a LOAD_AND_CHECK instruction 218 finds that an attribute does not match the instruction's expected value.
In one embodiment, AAT can be further extended to provide a set of one or more AAT range descriptors. These user-programmable range descriptors define a range of the virtual memory space, by a base and bound or an equivalent method, within which the actual attributes of the lines are overridden for LOAD_AND_CHECK instructions 218. Accordingly, in one embodiment, when a target address of a LOAD_AND_CHECK instruction 218 falls within a predefined AAT range, the expected attribute value is compared against the override attribute provided for the range instead of against the actual AAT attributes of the referenced line. In one embodiment, AAT range descriptors have no effect on detecting and reporting lines that are invalidated or evicted.
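A toy software model can illustrate the two handler scenarios. In this sketch, which is an assumption for illustration rather than a description of actual AAT hardware, attribute bits are kept per 64-byte line in a dictionary, `load_and_check` calls a UMS handler synchronously when the line's attribute differs from the expected value, and `invalidate` calls an MLI handler when a line with a set attribute leaves the cache. The class name, method names, and handler signatures are hypothetical.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes


class AATCache:
    """Toy model of per-line user attribute bits with UMS and MLI handlers."""

    def __init__(self, ums_handler, mli_handler):
        self.attrs = {}                 # line index -> attribute value (cached lines only)
        self.ums_handler = ums_handler  # synchronous: LOAD_AND_CHECK mismatch
        self.mli_handler = mli_handler  # asynchronous: marked line invalidated/evicted

    @staticmethod
    def line_of(addr):
        return addr // LINE_SIZE

    def set_attr(self, addr, value):
        """Hypothetical attribute-setting operation for the line holding addr."""
        self.attrs[self.line_of(addr)] = value

    def load_and_check(self, addr, expected):
        """Succeed if the line's attribute equals expected; otherwise run the UMS handler."""
        actual = self.attrs.get(self.line_of(addr), 0)
        if actual == expected:
            return True
        return self.ums_handler(addr, actual, expected)

    def invalidate(self, addr):
        """Invalidate or evict a line; a set attribute triggers the MLI handler."""
        line = self.line_of(addr)
        if self.attrs.pop(line, 0):
            self.mli_handler(line)


events = []


def on_ums(addr, actual, expected):
    events.append(("UMS", hex(addr)))
    return False  # report the failed assertion to the caller


def on_mli(line):
    events.append(("MLI", line))


cache = AATCache(on_ums, on_mli)
cache.set_attr(0x1000, 1)
ok = cache.load_and_check(0x1008, expected=1)      # same 64-byte line: matches
failed = cache.load_and_check(0x1008, expected=0)  # mismatch: UMS handler runs
cache.invalidate(0x1000)                            # marked line lost: MLI handler runs
print(ok, failed, events)  # True False [('UMS', '0x1008'), ('MLI', 64)]
```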
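Range override can be modeled by consulting a list of base-and-bound descriptors before the per-line attribute. The `AATRange` type and the `load_and_check` function below are hypothetical, and the 64-byte line size is an assumption; the sketch only shows that an address inside a programmed range is compared against the range's override value, so an attribute lost through eviction does not matter for that range. Consistent with the text, the descriptors play no part in invalidation or eviction reporting, which this sketch does not model.

```python
from dataclasses import dataclass

LINE_SIZE = 64  # assumed cache-line size in bytes


@dataclass(frozen=True)
class AATRange:
    """Hypothetical base-and-bound range descriptor carrying an override attribute."""
    base: int
    bound: int  # exclusive upper limit
    override: int

    def contains(self, addr):
        return self.base <= addr < self.bound


def load_and_check(addr, expected, line_attrs, ranges):
    """Compare expected against the range override if addr is in a range,
    otherwise against the attribute of addr's cache line (0 if unset)."""
    for r in ranges:
        if r.contains(addr):
            return r.override == expected
    return line_attrs.get(addr // LINE_SIZE, 0) == expected


# Assume all local objects are allocated in [0x10000, 0x20000).
local_area = [AATRange(0x10000, 0x20000, override=1)]
line_attrs = {}  # e.g. the line holding 0x10040 was evicted and lost its bit
print(load_and_check(0x10040, 1, line_attrs, local_area))  # True: override applies
print(load_and_check(0x30000, 1, line_attrs, local_area))  # False: line attribute is 0
```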
In one embodiment, whenever a global object (first object) 252 has one of its pointers 253 updated, write barrier logic 110 (
Unfortunately, replacing each pointer update (see the basic operation shown in Table 1) with the write barrier functionality (see Table 2) has a significant performance impact. In one embodiment, the first check indicated in the pseudo code of Table 2 may be avoided when it can be determined, from the context of the basic update operation, whether the first object is global; in that case, this portion of the write barrier functionality may be removed. However, even with this portion removed, the cost of adding the write barrier functionality to each basic pointer update operation is significant.
As indicated by the pseudo code of Table 2, the second check determines whether the second object is globally reachable. In one embodiment, this check involves reading an object descriptor to determine whether a scope attribute of the object descriptor identifies a scope of the second object as local. In one embodiment, such functionality can be implemented in machine code as a read instruction, followed by a compare instruction, followed by a conditional branch instruction. Although such machine code seems simple, pointers are updated so frequently that even a modest increase in the work performed by each basic pointer update operation significantly slows application execution.
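The effect of removing the first check, when the context already settles the scope of the updated object, can be sketched as three store variants. The `Obj` class and all function names are hypothetical, and which variant would be used at a given update site is assumed to follow from the compiler's own analysis. Even the specialized `store_from_global` still performs the second check on every store, which is the remaining cost the text describes.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    """Hypothetical heap object: scope flag plus named pointer fields."""
    is_global: bool = False
    fields: dict = field(default_factory=dict)


def escape(target):
    """Mark target and every object reachable from it as global."""
    stack = [target]
    while stack:
        obj = stack.pop()
        if not obj.is_global:
            obj.is_global = True
            stack.extend(t for t in obj.fields.values() if t is not None)


def store_generic(src, name, target):
    """Scope of src unknown at this site: both checks are needed."""
    src.fields[name] = target
    if src.is_global and target is not None and not target.is_global:
        escape(target)


def store_from_global(src, name, target):
    """Context proves src is global: the first check is dropped."""
    src.fields[name] = target
    if target is not None and not target.is_global:  # second check only
        escape(target)


def store_from_local(src, name, target):
    """Context proves src is local: the store cannot publish target."""
    src.fields[name] = target


root, a, b = Obj(is_global=True), Obj(), Obj()
store_from_local(a, "next", b)      # a is still thread-local: no barrier
store_from_global(root, "head", a)  # a, and b through it, escape
print(a.is_global, b.is_global)     # True True
```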
Accordingly, in one embodiment, as shown in
In one embodiment, each object may be forced to begin on a cache line of its own by making every object at least one cache line in size (typically 64 bytes). Alternatively, in one embodiment, multiple small objects may be allowed to start on the same cache line. In such an embodiment, if any object starting on a cache line is local, the reachability attribute 234 of the respective cache line 201 is marked as local.
In one embodiment, it may be assumed that, for every local object, the cache line in which the object begins has a reachability attribute 234 set to indicate local scope. It is also assumed that, based on the context in which a pointer update occurs, it can be determined whether the object being updated (first object) is local or global. Based on these assumptions, the write barrier functionality in the pseudo code shown in Table 2 is performed only if the first object is global.
In one embodiment, the pseudo code shown in Table 3 is replaced by LOAD_AND_CHECK instruction 218, which asserts that the second object, or target object, is not a local object. If the LOAD_AND_CHECK instruction fails, in one embodiment, a user-selected handler may be invoked to perform a complete check of the second object to determine whether the second object is, in fact, a local object and, if so, to convert the second object and all objects reachable from it by setting the scope of such objects to global. Accordingly, in one embodiment, AAT instructions (216 and 218) may be used to implement a single load instruction that acts as a filter, removing the additional checks otherwise necessary to implement write barrier semantics for dynamic escape detection.
In embodiments where a reachability attribute is associated with the cache lines of each thread's local cache memory, the eviction of cache lines containing local objects is handled as follows. In one embodiment, if all local objects are created in a specific address range, an AAT range feature may be provided with override attribute bit settings for that range; in such an embodiment, eviction of cache lines containing local objects is a non-issue. In an alternative embodiment, the MLI scenario may be used to detect when a line on which a local object starts is evicted.
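Both layout options can be expressed in a few lines. The sketch assumes 64-byte lines and uses hypothetical helper names: `padded_size` implements the one-object-per-line option (if every object is at least one line long, no two objects can start on the same line), and `line_reachability` computes per-line attributes when several small objects may start on the same line, marking a line local if any object starting on it is local.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes


def padded_size(size):
    """One-object-per-line option: every object is at least one line long,
    so consecutive objects always start on different lines."""
    return max(size, LINE_SIZE)


def line_reachability(objects):
    """objects: iterable of (start_address, is_local) pairs.

    Returns {line: attribute}, where the attribute is 1 if any object starting
    on that line is local and 0 otherwise (shared-line option).
    """
    attrs = {}
    for start, is_local in objects:
        line = start // LINE_SIZE
        attrs[line] = attrs.get(line, 0) | int(is_local)
    return attrs


# Two small objects share line 0; one of them is local, so the whole line is marked.
print(line_reachability([(0, False), (16, True), (64, False), (200, False)]))
# {0: 1, 1: 0, 3: 0}
```

Marking a shared line local is conservative: a global object that shares a line with a local object fails the filter and takes the slower full check, but an escape is never missed.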
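The filtered write barrier can be sketched as follows, assuming the encoding in which attribute 1 marks a line that starts a local object and an unmarked line reads as global. `FilteredHeap`, its methods, and the dictionaries standing in for object descriptors and pointer fields are hypothetical; `load_and_check` merely models the single filtering instruction, and `escape_handler` plays the role of the user-selected handler that performs the complete check and the conversion.

```python
LINE_SIZE = 64
LOCAL, GLOBAL = 1, 0  # assumed encoding: attribute 1 marks a line starting a local object


class FilteredHeap:
    def __init__(self):
        self.line_attr = {}  # line -> attribute bit (models the AAT bits)
        self.is_global = {}  # addr -> scope attribute from the object descriptor
        self.refs = {}       # addr -> addresses the object points to
        self.slow_path_calls = 0

    def allocate(self, addr, is_global=False):
        self.is_global[addr] = is_global
        self.refs[addr] = []
        if not is_global:
            self.line_attr[addr // LINE_SIZE] = LOCAL

    def load_and_check(self, addr, expected):
        """Model of LOAD_AND_CHECK on the line holding addr."""
        return self.line_attr.get(addr // LINE_SIZE, GLOBAL) == expected

    def escape_handler(self, target):
        """Slow path: read descriptors; convert target and its reachable set if local."""
        self.slow_path_calls += 1
        stack = [target]
        while stack:
            obj = stack.pop()
            if not self.is_global[obj]:  # a global object never points to a local one
                self.is_global[obj] = True
                stack.extend(self.refs[obj])

    def store_into_global(self, source, target):
        """Pointer store whose source is known from context to be global."""
        self.refs[source].append(target)
        if not self.load_and_check(target, expected=GLOBAL):  # the single filtering check
            self.escape_handler(target)


heap = FilteredHeap()
heap.allocate(0x000, is_global=True)  # root already shared between threads
heap.allocate(0x040)                  # local object A
heap.allocate(0x080)                  # local object B
heap.allocate(0x0C0, is_global=True)  # global object C
heap.refs[0x040].append(0x080)        # A -> B: local store, no barrier needed
heap.store_into_global(0x000, 0x0C0)  # filtered: C's line is not marked local
heap.store_into_global(0x000, 0x040)  # check fails: A and B escape
print(heap.slow_path_calls, heap.is_global[0x040], heap.is_global[0x080])  # 1 True True
```

In this sketch the line attribute is not cleared after a conversion, so a later store of the same object into a global object takes the slow path again, where the handler finds it already global. Clearing the bit would only be safe when no other local object starts on that line.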
In one embodiment, it is assumed that a value of one in the appropriate attribute bit position represents local scope. To implement such an embodiment, a list of all local objects is maintained, so that when a line holding a local object is evicted from the cache, all local objects can be pulled back into the cache. In one embodiment, one or more of the local objects may be promoted to global objects to reduce the number of lines being monitored.
In an alternative embodiment, a value of zero in the appropriate attribute bit position represents local scope. In this case, all objects that come into the cache are, by default, marked as local. Accordingly, in one embodiment, if the LOAD_AND_CHECK instruction fails, triggering a full check of the object descriptor, the line attribute may then be updated to indicate that the scope of the object is, in fact, global. Subsequent accesses to the target global object are then successfully filtered by the AAT attribute bit. The choice among these embodiments depends on performance trade-offs and is left as an implementation detail.
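The bookkeeping for the encoding in which a value of one marks local scope might look like the following. Here, re-marking a line's attribute stands in for pulling the line back into the cache; the class, its methods, and the 64-byte line size are assumptions for illustration. Promotion of local objects to global is omitted, because it would also require converting every object reachable from them.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes


class LocalLineMonitor:
    """Encoding where attribute 1 = local: a lost attribute would hide a local
    object, so the runtime keeps a list of local objects and re-marks them on MLI."""

    def __init__(self):
        self.line_attr = {}         # cached lines whose attribute is 1
        self.local_objects = set()  # start addresses of all local objects
        self.mli_calls = 0

    def register_local(self, addr):
        self.local_objects.add(addr)
        self.line_attr[addr // LINE_SIZE] = 1

    def evict(self, line):
        """Cache evicts a line; if its attribute was set, raise the MLI handler."""
        if self.line_attr.pop(line, 0):
            self.on_mli(line)

    def on_mli(self, line):
        """MLI handler: bring back every local object's start line and re-mark it."""
        self.mli_calls += 1
        for addr in self.local_objects:
            self.line_attr[addr // LINE_SIZE] = 1


mon = LocalLineMonitor()
mon.register_local(0x040)  # line 1
mon.register_local(0x100)  # line 4
mon.evict(7)               # unmarked line: no notification
mon.evict(1)               # marked line: MLI handler restores it
print(mon.mli_calls, sorted(mon.line_attr))  # 1 [1, 4]
```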
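By contrast, in the encoding where zero marks local scope, losing an attribute is harmless, because a refetched line defaults to local. The sketch below assumes one object per cache line (objects padded to at least a line) so that marking a line global cannot hide a local object; the class name, dictionaries, and line size are hypothetical. A failed check triggers a full descriptor check, converts the object if it is still local, and records the line as global so that later checks are filtered.

```python
LINE_SIZE = 64
LOCAL, GLOBAL = 0, 1  # assumed encoding: a line fetched into the cache starts at 0 (local)


class ZeroDefaultFilter:
    """Sketch assuming one object per cache line (objects padded to >= LINE_SIZE)."""

    def __init__(self, is_global, refs):
        self.is_global = is_global  # addr -> scope attribute from the object descriptor
        self.refs = refs            # addr -> addresses the object points to
        self.line_attr = {}         # attributes of cached lines; absent means LOCAL
        self.full_checks = 0

    def evict(self, addr):
        # Safe to forget: the line defaults to LOCAL when it is fetched again.
        self.line_attr.pop(addr // LINE_SIZE, None)

    def check_target(self, target):
        """Barrier for storing target into a global object."""
        if self.line_attr.get(target // LINE_SIZE, LOCAL) == GLOBAL:
            return  # LOAD_AND_CHECK succeeds: no further work
        self.full_checks += 1  # LOAD_AND_CHECK failed: read the descriptor
        stack = [target]
        while stack:
            obj = stack.pop()
            if not self.is_global[obj]:
                self.is_global[obj] = True  # dynamic escape: convert to global
                stack.extend(self.refs.get(obj, ()))
            self.line_attr[obj // LINE_SIZE] = GLOBAL  # later checks are filtered


is_global = {0x040: False, 0x080: True}
filt = ZeroDefaultFilter(is_global, refs={0x040: [0x080]})
filt.check_target(0x080)  # first access: default LOCAL, full check finds it global
filt.check_target(0x080)  # line now marked GLOBAL: filtered
filt.check_target(0x040)  # full check: local object escapes
filt.evict(0x040)
filt.check_target(0x040)  # refetched line defaults to LOCAL: one more full check
print(filt.full_checks, is_global[0x040])  # 3 True
```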
Representatively, the host VM model 400 includes VMM 420, which runs on top of host operating system (OS) 442, and WBL 410 to provide hardware-based dynamic escape detection. In one embodiment, VMM 420 loads RSM 412 and GC 414. In a further embodiment, MRTE 100, as shown in
Representatively, hybrid VM model 500 is comprised of service OS 542 and micro-hypervisor (basic VMM) 520, including optimized API 524. According to the hybrid VM model 500, micro-hypervisor 520 may be responsible for CPU/memory resource virtualization and domain scheduling. In one embodiment, VMM 520 loads RSM 512 and GC 514. Service OS 542 may be responsible for VM management and device virtualization/simulation. In accordance with the embodiments illustrated in
Representatively, computer system 600 may be, for example, a personal computer system. Computer system 600 may include a multicore processor (e.g., processor 660), a memory controller 664, an input/output (I/O) controller 680, and one or more BIOS (basic input/output system) memories (e.g., BIOS memory 690). In one embodiment, processor 660, memory controller 664, I/O controller 680 and BIOS memory 690 may reside on a chipset 661. As described herein, the term “chipset” is used in a manner well known to those of ordinary skill in the art to describe, collectively, the various devices coupled to processor 660 to perform desired system functionality. In an alternative embodiment, processor 660, memory controller 664, I/O controller 680 and BIOS memory 690 may reside on other types of component boards, for example, a daughter board.
The memory controller 664 controls operations between processor 660 and a memory device 670 including, for example, memory modules comprised of random access memory (RAM), dynamic RAM (DRAM), static RAM (SRAM), synchronous DRAM (SDRAM), double data rate (DDR) SDRAM (DDR-SDRAM), Rambus DRAM (RDRAM) or any device capable of supporting high-speed storage of data. The I/O controller 680 may control operations between processor 660 and one or more input/output (I/O) devices 685, for example, a keyboard and a mouse over a low pin count (LPC) bus 689. The I/O controller 680 may also control operations between processor 660 and peripheral devices, for example, a drive 686 coupled to I/O controller 680 via an integrated drive electronics (IDE) interface 687. Additional buses may also be coupled to I/O controller 680 for controlling other devices, for example, a peripheral component interconnect (PCI) link 682, a follow-on point-to-point link (e.g., PCI-X, PCI Express), or a universal serial bus (USB) 688. In one embodiment, the memory controller 664 may be integrated into processor 660 or integrated with I/O controller 680 into a single component.
In the embodiment illustrated, a drive controller 683 may be coupled to PCI link 682 and may control operations of hard disk drive 681. In one embodiment, VM 602, write barrier logic 610, run-time storage manager (RSM) 612 and garbage collector (GC) 614 may be stored on the hard disk drive 681. In this manner, the hard disk drive 681 may serve as the boot-up device including, for example, a loader program to load the various host components as well as the VM 602 to load MRTE components.
BIOS memory 690 may be coupled to I/O controller 680 via bus 684. BIOS memory 690 is a non-volatile programmable memory, for example, a flash memory that retains the contents of data stored within it even after power is no longer supplied. Alternatively, BIOS memory 690 may be another type of programmable memory device, for example, a programmable read only memory (PROM) or an erasable programmable read only memory (EPROM). Computer system 600 may also include other BIOS memories in addition to BIOS memory 690.
Accordingly, as shown in
Representatively, CPUs 760 access shared memory 770 via interconnection network 780. In one embodiment, shared memory 770 may include, but is not limited to, a double-sided memory package including memory modules comprised of random access memory (RAM), dynamic RAM (DRAM), static RAM (SRAM), synchronous DRAM (SDRAM), double data rate (DDR) SDRAM (DDR-SDRAM), Rambus DRAM (RDRAM) or any device capable of supporting high-speed storage of data.
Accordingly, in the embodiments described, hardware-based dynamic escape detection within an MRTE, for example, as shown in
Turning now to
In addition, embodiments are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement embodiments as described herein. Furthermore, it is common in the art to speak of software, in one form or another (e.g., program, procedure, process, application, etc.), as taking an action or causing a result. Such expressions are merely a shorthand way of saying that execution of the software by a computing device causes the device to perform an action or produce a result.
Referring again to
Unfortunately, determining whether a second object, or target object, has local scope can be very time-consuming, because it requires querying an object descriptor of the second object 254 to determine whether a reachability attribute, or scope attribute, of the object descriptor indicates that the second object has local scope. Accordingly, in one embodiment, a single instruction is provided, such as, for example, AAT instructions (216 and 218), as shown in
Referring again to
Referring again to
Accordingly, at process block 850, the single instruction invokes a handler routine to verify that a scope of the second object is local. Operations performed by the handler routine are further described with reference to
Referring again to
In one embodiment, dynamic escape of the second object is detected when the object descriptor identifies that the second object has local scope. Accordingly, at process block 860, the second object is converted from a local object to a global object. Conversion of the second object from a local object to a global object is described with reference to
Accordingly, in the embodiment illustrated with reference to FIG. 14, the attribute bits associated with the program objects are not persistent; they are available only as long as the start cache lines of the local objects remain within the local cache memory of the respective thread. To maintain the start cache lines of the local objects, in one embodiment, the handler routine is notified whenever a cache line is evicted whose attribute indicates that it holds the start of a program object having local scope.
Accordingly, in contrast to conventional run-time environments, MRTE 100, as shown in
It will be appreciated that, for other embodiments, a different system configuration may be used. For example, while the MRTE 600 (
Elements of embodiments may also be provided as an article of manufacture including a machine-readable medium for storing the machine-executable instructions. The machine-readable medium may include, but is not limited to, flash memory, optical disks, compact disks-read only memory (CD-ROM), digital versatile/video disks (DVD) ROM, random access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, propagation media or other type of machine-readable media suitable for storing electronic instructions. For example, embodiments described may be downloaded as a computer program which may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Therefore, it is emphasized and should be appreciated that two or more references to “an embodiment” or “one embodiment” or “an alternative embodiment” in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments.
In the above detailed description of various embodiments, reference is made to the accompanying drawings, which form a part hereof, and in which are shown by way of illustration, and not of limitation, specific embodiments in which the invention may be practiced. In the drawings, like numerals describe substantially similar components throughout the several views. The embodiments illustrated are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.
Having disclosed embodiments and the best mode, modifications and variations may be made to the disclosed embodiments while remaining within the scope of the embodiments as defined by the following claims.