Programs executing in a virtual memory system use virtual memory addresses. The virtual memory addresses are translated by a Memory Management Unit (MMU) to physical memory addresses that are used to access the physical memory. The virtual memory is typically much larger than the physical memory. For example, the virtual memory may be 4 Giga Bytes (GB) and the physical memory may only be 64 Kilo Bytes (KB). The MMU maps the 4 GB virtual memory address space to the 64 KB physical address space.
A multi-threaded application has multiple threads of execution which execute in parallel. Each thread is a sequential flow of control within the same application (program) and runs independently of the others, but at the same time. The thread runs within the context of the application and takes advantage of the resources allocated for the application and the application's environment. A thread must have its own resources within a running application; for example, it must have its own execution stack (a portion of memory) and its own copy of the processor's registers.
Initially, a thread is typically given a fixed-size execution stack (a portion of virtual memory), for example, 8 KB. This stack size is more than sufficient for most threads, and in some cases less memory than initially allocated would suffice. However, situations arise when 8 KB is not sufficient to carry out certain infrequent tasks, for instance, to run applications that allocate arrays or buffers as local variables. Additional memory is allocated when needed. Thus, the execution stack memory can grow unpredictably.
Computer programs written in the JAVA programming language, typically referred to as JAVA applications, generally require additional initial stack memory. JAVA is an object-oriented programming language developed by Sun Microsystems, Inc. As is well-known in the art, a JAVA application is a platform-independent program. In contrast to a native application that is compiled for a specific platform (a particular computer and operating system), the JAVA application can execute on any platform (hardware or software environment). The JAVA platform is a software-only platform that runs on top of other hardware-based platforms. The JAVA platform has two components: the JAVA virtual machine (JAVA VM) and the JAVA Application Programming Interface (JAVA API). JAVA source code files are compiled into an intermediate language called JAVA bytecodes (platform-independent codes). Each time the program is executed, an interpreter in the JAVA VM on the system parses and runs each JAVA bytecode instruction. The JAVA bytecodes are machine code instructions for the JAVA VM.
The JAVA VM initially allocates a small amount of virtual memory, for example, 16 KB, for the stack in each JAVA thread, and additional virtual memory is allocated to the stack when needed. Because only a small amount of virtual memory is allocated initially, the interpreter must periodically check the current status of virtual memory, that is, whether there is enough room in the stack for stack operations, for example, on every procedure call. This “steals” CPU time from application execution. Also, because the virtual memory is allocated on demand, the virtual memory allocated to the stack is not contiguous. Instead, the allocated virtual memory is a linked list of blocks, with a block added to the list to increase the size of the stack when needed. The blocks may even be allocated from different sections of virtual memory. Thus, the interpreter must also switch between the sections of virtual memory comprising the JAVA stack.
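The cost of the on-demand approach described above can be illustrated with a small model. The following is a minimal sketch, not the JAVA VM's actual implementation: the names `Chunk`, `SegmentedStack`, `push_frame` and `pop_frame` are hypothetical, and the 16 KB chunk size is taken from the example above. It shows that a capacity check runs on every simulated procedure call and that the stack ends up spread over several linked blocks.

```python
# Hypothetical model of a stack built from a linked list of fixed-size chunks.
# All names here are illustrative assumptions, not taken from the source.

CHUNK_SIZE = 16 * 1024  # 16 KB per chunk, matching the example above


class Chunk:
    """One block of stack memory; blocks need not be adjacent in memory."""

    def __init__(self, prev=None):
        self.prev = prev  # link to the previously allocated chunk
        self.used = 0     # bytes currently in use in this chunk


class SegmentedStack:
    def __init__(self):
        self.top = Chunk()
        self.chunks = 1   # number of chunks currently linked
        self.checks = 0   # number of capacity checks executed so far

    def push_frame(self, size):
        """Reserve `size` bytes for a call frame, checking capacity first."""
        if size > CHUNK_SIZE:
            raise ValueError("frame larger than a chunk")
        self.checks += 1  # this check is paid on every call
        if self.top.used + size > CHUNK_SIZE:
            self.top = Chunk(prev=self.top)  # grow by linking a new chunk
            self.chunks += 1
        self.top.used += size

    def pop_frame(self, size):
        """Release a frame; drop back to the previous chunk when one empties."""
        self.top.used -= size
        if self.top.used == 0 and self.top.prev is not None:
            self.top = self.top.prev
            self.chunks -= 1
```

In this model, one hundred calls with 1 KB frames perform one hundred capacity checks and leave the stack spread over seven chunks, which is the overhead the approach described next is intended to avoid.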
Frequent checks of the stack memory status are eliminated by allocating a substantially larger amount of virtual memory to the stack than will typically be used by the thread. The virtual memory is allocated to the thread only once. For example, instead of the 16 KB of initial virtual memory allocated to a thread for a JAVA application, 64 KB of virtual memory is allocated. However, at the time of allocation, only one page of the allocated virtual memory is mapped to a physical page in the system. Thus, no unnecessary physical memory is allocated to the stack.
Later, as the stack expands, more pages of the virtual memory in the stack are mapped to physical memory up to the limit of the allocated stack segment size. As the stack shrinks, mapped physical pages that are no longer being used can be efficiently returned to the system.
Also, the last allocated virtual memory page is designated to be an inaccessible page, so that if for some reason the thread reaches the end of the allocated virtual memory, a stack overflow condition is reported.
A computer-implemented method for allocating memory for use as stack memory is provided. A contiguous block of virtual memory is allocated for the stack memory. The size of the allocated block is substantially larger than necessary for the stack memory. A virtual page at the top of the allocated block is mapped to a first physical page of physical memory. Upon detecting an access to a next virtual page of the allocated block, the next virtual page is mapped to a second physical page of the physical memory.
The page at the bottom of the allocated block may be identified as inaccessible to allow detection of a stack overflow condition. The stack memory may be allocated for use by a thread, which may be an application thread or a kernel thread. The application thread may be for a JAVA application. In one embodiment, the allocated block of memory is 64 KB and the physical page of memory is 4 KB. The second physical page may not be contiguous with the first physical page.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.
A description of preferred embodiments of the invention follows.
In the embodiment shown, the processor core 104 has 32 address bits allowing it to address a 4 GB memory space. The 4 GB addressable memory space is commonly referred to as virtual memory space. The physical memory space (physical memory present in the system) includes the memory 112 and the cache memory 106. Typically the physical memory space is smaller than the virtual memory space. The microprocessor 102 includes a memory management unit (MMU) 108 which handles mapping of the virtual memory addresses (120, 122) generated by the processor core 104 to physical addresses (124, 126) for accessing the physical memory in the system.
The system includes primary storage such as memory 112 which may be semiconductor memory, for example, Random Access Memory (RAM) and secondary storage 116 which may be a disk drive or CD-ROM. The secondary storage is accessed through a storage controller 114.
One of the virtual memory regions 202 is allocated for use as stack memory for working threads. The stack is a region of allocated memory in which a program (application) stores status data such as procedure and function call addresses, passed parameters and sometimes local variables. As is well-known to those skilled in the art, when memory is allocated to the stack, the memory is reserved for use by the stack. The assigned virtual region 202 is logically divided into blocks 204 of the same size. Each block 204 in the virtual region 202 is available to be allocated to a thread for its stack segment (native or JAVA). Each block 204 in the assigned region 202 is subdivided into pages. In the ARM architecture, a set of 4096 (4 K) bytes aligned to a 4 K byte boundary is a standard-sized page. However, larger pages (e.g., 64 K) are also permitted. In the embodiment shown, the virtual memory space 206 is 4 GB, the physical space 208 is 64 KB, each block in the virtual region is 64 KB and the page size is 4 KB. The region includes a control block 224 which is used for storing control data structures for managing the region 202. The control block 224 will be described later in conjunction with
Prior to using a virtual memory address (address that a computer program uses to reference memory), the virtual memory location must be mapped to a physical address (hardware address). The hardware address is the address that corresponds to the hardware memory location in physical memory. The physical memory is the memory that is present in the system. The virtual memory address is mapped by translating the virtual memory address into a physical memory address.
A plurality of page tables 220 are used to map pages in the virtual memory space 206 to pages in physical memory space 208. Each page table 220 includes a plurality of page table entries 212. A “page table entry” (PTE) is a descriptor which identifies the mapped physical page and the access information associated with the physical page. In the ARM architecture, a “page table” has a set of 256 consecutive page table entries 212, with each page table entry having 32 bits. Multiple page tables can exist contiguously, or scattered, in memory. Each virtual page in the virtual address space 206 has an associated page table entry 212 in a page table 220. The MMU 108 interprets a PTE 212 that is associated with a virtual address and stored in the page tables 220 and uses the PTE 212 to translate the virtual memory address to the corresponding physical memory address.
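The translation described above can be sketched in a few lines. This is an illustrative model only, assuming the 4 KB page size and the 256-entry, 32-bit page table layout stated in the text; the function names `split_address` and `translate`, the dictionary representation of the page tables, and the access strings are hypothetical and do not reflect the ARM descriptor format.

```python
# Illustrative model of page-table translation; not the ARM descriptor format.

PAGE_SIZE = 4096        # 4 KB standard page
PTES_PER_TABLE = 256    # consecutive entries in one page table
PTE_BYTES = 4           # each entry has 32 bits

TABLE_BYTES = PTES_PER_TABLE * PTE_BYTES       # memory occupied by one table
TABLE_COVERAGE = PTES_PER_TABLE * PAGE_SIZE    # virtual memory one table maps
TABLES_FOR_4GB = (1 << 32) // TABLE_COVERAGE   # tables to cover 4 GB


def split_address(vaddr):
    """Split a virtual address into (table index, entry index, page offset)."""
    page_number = vaddr // PAGE_SIZE
    return (page_number // PTES_PER_TABLE,
            page_number % PTES_PER_TABLE,
            vaddr % PAGE_SIZE)


def translate(page_tables, vaddr):
    """Translate `vaddr` using a hypothetical nested-dict page table.

    page_tables maps table index -> {entry index: (physical base, access)}.
    """
    table_index, pte_index, offset = split_address(vaddr)
    pte = page_tables.get(table_index, {}).get(pte_index)
    if pte is None:
        raise LookupError("page fault: virtual page is not mapped")
    physical_base, access = pte
    if access == "inaccessible":
        raise PermissionError("access violation: page is marked inaccessible")
    return physical_base + offset
```

Under these assumptions one page table occupies 1 KB and maps 1 MB of virtual memory, so 4096 such tables would be needed to describe the full 4 GB space. A lookup such as `translate({1: {2: (0x3000, "rw")}}, 0x102010)` resolves to the physical page base plus the 0x10 offset.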
The control block 224 also includes a last mapped page register 252 for each block in the region. The virtual address of the last page mapped for the block 204 is stored in the last mapped page register 252 associated with the block 204. The stack operates as a Last In First Out (LIFO) memory, with the last object written to the stack being the first object read from the stack. Thus, the stack grows and shrinks dependent on the number of objects stored. A stack pointer keeps track of the last object stored in the stack. A stack pointer is a register that contains the current address of the top element of the stack.
As the stack expands, the next page in a block 204 in the region 202 allocated for the stack can be automatically mapped in response to a page fault for the block. As is well-known to those skilled in the art, a page fault occurs when software attempts to access (read or write) a virtual memory address that is not mapped to a physical memory address, that is, the unmapped page is marked “not present.” After detecting a page fault, the next page is automatically mapped by comparing the virtual address that caused the fault with the virtual address for the last mapped page stored in the last mapped page register 252 for the block 204 in the control block 224. Thus, by storing the last address of the last mapped page 252 for each block, an access to the page table 220 is avoided in order to determine whether a page in the block 204 is mapped. Also, as the stack shrinks, mapped pages that are no longer required can be easily determined by comparing the stack pointer with the address of the last page mapped for the block 204.
The first page 402 of the allocated block 204 is mapped to a physical page 222 in physical memory space 208 when the virtual memory block 204 is first allocated to a thread, so that the stack is immediately ready to be used without causing an initial page fault. One of the parameters of the allocation request is the direction of the stack growth (increasing or decreasing virtual addresses from the initial virtual address provided). In the embodiment shown, the stack virtual address increases from the initial virtual address provided and thus the first page 402 in the block 204 is mapped. Dependent on the direction of the stack growth, either the first 402 or the last page 404 of the block 204 is initially mapped. In addition, the page translation entry 400 corresponding to the last page 404 at the opposite end of the block 204 is marked as inaccessible. For example, the last page 404 can be marked as inaccessible in the access information field 302 in the PTE 212 associated with the page. Thus, although the last page 404 is marked as mapped in the PTE 400, the access is “inaccessible.”
In one embodiment, the first page of the block is the top of the block and the last page of the block is the bottom of the block. In an alternate embodiment, the last page of the block is the top of the block and the first page of the block is the bottom of the block.
At step 600, a thread is created for a JAVA application. As part of the initialization of the thread, a contiguous block of virtual memory 204 is allocated as stack memory for use by the thread. The allocated block 204 is substantially larger than necessary for the stack memory. In one embodiment, a typical thread uses 10-20 KB of memory and a 64 KB contiguous block of virtual memory 204 is allocated.
At step 602, dependent on the direction of growth of the stack, the page at the top of the stack (the first page 402 or last page 404) of the contiguous block of virtual memory 204 is mapped to a physical block of memory 222. In a system with 4 KB pages, only one 4 KB page (first or last) of the 64 KB block of virtual memory is mapped to physical memory. Thus, only 4 KB of the physical memory is used initially as stack memory by the thread, but the remaining 60 KB of the contiguous block of virtual memory 204 allocated to the thread, is available for use by the thread, if needed.
At step 604, the page (last or first) at the opposite end of the allocated block of virtual memory 204 to the mapped page, is marked as inaccessible, to allow reporting of a stack overflow condition. This inaccessible page is referred to as a guard page.
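Steps 600 through 604 can be summarized with a small model. This is a minimal sketch under the 64 KB block and 4 KB page sizes given above; the function `allocate_stack_block`, the dictionary layout, and the helper `usable_pages` are hypothetical names introduced for illustration, and no real memory is reserved or mapped.

```python
# Hypothetical model of steps 600-604; it tracks page state but maps no memory.

PAGE_SIZE = 4 * 1024      # 4 KB page, as in the embodiment described
BLOCK_SIZE = 64 * 1024    # 64 KB contiguous block of virtual memory
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE  # 16 virtual pages per block


def allocate_stack_block(base, grows_up=True):
    """Model the allocation of one stack block starting at address `base`."""
    if base % PAGE_SIZE != 0:
        raise ValueError("block base must be page aligned")
    first_page = base
    last_page = base + BLOCK_SIZE - PAGE_SIZE
    # Step 602: map only the page where the stack starts.
    start_page = first_page if grows_up else last_page
    # Step 604: the page at the opposite end becomes the guard page.
    guard_page = last_page if grows_up else first_page
    return {
        "base": base,
        "grows_up": grows_up,
        "mapped": {start_page},          # virtual pages backed by physical pages
        "guard_page": guard_page,        # marked inaccessible
        "last_mapped_page": start_page,  # value kept per block in the control block
    }


def usable_pages():
    """Pages that can ever hold stack data: every page except the guard page."""
    return PAGES_PER_BLOCK - 1
```

In this model a block contains 16 virtual pages. One is mapped at thread creation, one is reserved as the guard page, and the remaining 14 can be mapped on demand, so at most 15 pages (60 KB) ever hold stack data.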
At step 700, the memory page handler checks the virtual address that caused the fault. Typically, the virtual address that caused the fault is stored in one of the processor's registers.
The page fault exception handler checks that the page fault exception was due to the currently executing thread and its stack. If the virtual address that caused the page fault exception is within 4 KB of the virtual address that was last mapped to a physical address, based on the address of the last page mapped for the stack 252 stored in the control block 224, at step 702, the virtual address is related to the stack memory and another physical page 222 is automatically mapped to the next contiguous virtual page in the block 204. Control returns to the application from which the page fault exception was generated. Thus, the application is not disrupted and continues to execute as if the virtual page had been originally mapped to the physical page 222 through a PTE 212.
At step 704, if the virtual address is within the guard page 404, then at step 706, instead of mapping the virtual page to a physical page 222, a stack overflow condition is generated by the page fault exception handler, indicating that the stack for the thread has exceeded the 64 KB contiguous block allocated for it. As the size of the stack allocated for each thread is restricted to the initial 64 KB allocated, a stack overflow handler is called to handle this exception condition.
At step 708, if the virtual address is neither within the guard page 404 nor within 4 KB of the last mapped page, another handler is called to process the page fault exception condition.
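The decision made in steps 700 through 708 can be sketched as follows. This is an illustrative model, not the handler of the embodiment: the names `make_stack` and `handle_page_fault`, the dictionary layout, and the returned strings are hypothetical. Two assumptions are made explicit. First, "within 4 KB of the last mapped page" is interpreted as a fault in the page immediately following the last mapped page in the direction of stack growth. Second, the guard page test is performed before the mapping test so that the guard page is never mapped, even when it is adjacent to the last mapped page.

```python
# Hypothetical model of the page fault decision in steps 700-708.

PAGE_SIZE = 4 * 1024
BLOCK_SIZE = 64 * 1024


def make_stack(base, grows_up=True):
    """Build the per-block state assumed by the handler (illustrative only)."""
    first_page, last_page = base, base + BLOCK_SIZE - PAGE_SIZE
    start = first_page if grows_up else last_page
    return {
        "grows_up": grows_up,
        "mapped": {start},
        "guard_page": last_page if grows_up else first_page,
        "last_mapped_page": start,
    }


def handle_page_fault(stack, fault_addr):
    """Classify a fault and map the next stack page when appropriate."""
    # Step 700: find the page containing the faulting virtual address.
    fault_page = fault_addr - (fault_addr % PAGE_SIZE)
    step = PAGE_SIZE if stack["grows_up"] else -PAGE_SIZE
    next_page = stack["last_mapped_page"] + step

    # Steps 704/706: a fault in the guard page is a stack overflow.
    if fault_page == stack["guard_page"]:
        return "stack_overflow"

    # Step 702: a fault in the next contiguous page extends the stack.
    if fault_page == next_page:
        stack["mapped"].add(next_page)
        stack["last_mapped_page"] = next_page
        return "mapped"

    # Step 708: anything else is passed to another handler.
    return "other_handler"
```

Only the last mapped page value is consulted to classify the fault, which reflects the point made earlier that the page table need not be read to decide whether the faulting page belongs to the stack.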
As the stack shrinks, previously mapped physical pages 222 are no longer needed. Thus, these physical pages 222 can be returned to the system for use by other working threads. As the virtual address for the last mapped page for each block 204 is stored in the control block 224, it can be easily compared with the address stored in the current stack pointer. Upon detecting that the virtual address stored in the current stack pointer is less than the virtual address for the last mapped page, the last mapped page in the block 204 can be easily unmapped by modifying the associated PTE 212 in the page table.
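The comparison described above can be sketched as a loop. This is a minimal model for a stack whose addresses increase as it grows, as in the embodiment shown; the function name `release_unused_pages` and the dictionary layout are hypothetical, and keeping the first page of the block mapped at all times is an assumption of the sketch rather than a statement from the text.

```python
# Hypothetical model of returning pages as an upward-growing stack shrinks.

PAGE_SIZE = 4 * 1024


def release_unused_pages(stack, stack_pointer):
    """Unmap pages that lie entirely above the current stack pointer.

    `stack` holds "base", "mapped" (a set of page addresses) and
    "last_mapped_page". Returns the list of pages that were unmapped.
    """
    released = []
    # The first page stays mapped (an assumption of this sketch).
    while (stack["last_mapped_page"] > stack["base"]
           and stack_pointer < stack["last_mapped_page"]):
        page = stack["last_mapped_page"]
        stack["mapped"].discard(page)   # stands in for modifying the PTE
        released.append(page)
        stack["last_mapped_page"] = page - PAGE_SIZE
    return released
```

For example, with four pages mapped from 0x100000 and the stack pointer at 0x101800, the two highest pages are released and the last mapped page becomes 0x101000.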
Thus, no frequent checks of the status of the stack memory are needed. Therefore, more “CPU time” is available for other applications. Furthermore, no unnecessary physical memory is mapped and unused ranges of mapped physical memory can be efficiently returned to the system.
The invention has been described for allocating a stack for an application thread. However, the invention is not limited to application threads; it can also be used to allocate stack memory for a kernel (operating system) thread. As is well known in the art, a kernel is the core of an operating system, that is, the portion of the operating system that manages memory, files and peripheral devices and allocates system resources. An operating system is the software that controls the allocation and usage of hardware resources such as memory, disk space, and peripheral devices. Furthermore, the invention is not limited to allocation of memory to stacks; it can be used for any process or thread that requires allocation of a block of virtual memory where it is required or desired by design to gradually and unidirectionally increase the utilization of such block's addresses.
It will be apparent to those of ordinary skill in the art that methods involved in the present invention may be embodied in a computer program product that includes a computer usable medium. For example, such a computer usable medium may consist of a read only memory device, such as a CD ROM disk or conventional ROM devices, or a random access memory, such as a hard drive device or a computer diskette, having a computer readable program code stored thereon.
While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.
This application claims the benefit of U.S. Provisional Application No. 60/550,241, filed on Mar. 4, 2004. The entire teachings of the above application are incorporated herein by reference.
Number | Date | Country
---|---|---
60550241 | Mar 2004 | US