1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, systems, and products for managing computer memory in a computer with dynamic logical partitioning.
2. Description Of Related Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely complicated devices. Today's computers are much more sophisticated than early systems such as the EDVAC. Computer systems typically include a combination of hardware and software components, application programs, operating systems, processors, buses, memory, input/output devices, and so on. As advances in semiconductor processing and computer architecture push the performance of the computer higher and higher, more sophisticated computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
Today there is a tendency to develop systems that are increasingly large in terms of the number of processors, number of input/output (“I/O”) slots, and memory size. Although advances in the design of computer hardware continue to provide rapid increases in the sizes of these physical resources, some major applications and subsystems lag behind in scalability. There is a trend therefore to provide systems with partitioning, physical partitions or logical partitions, so that the underlying computer system itself provides granularity of function. Physical partitions provide granularity of partitioning that is typically relatively coarse, because the partitioning occurs at physical boundaries such as multi-chip modules (“MCMs”), backplanes, daughter boards, mother boards, or other system boards. In a logically partitioned system, the granularity of partitioning is typically much more fine-grained, such as a single CPU or even a fraction of a CPU, a small block of memory, or an I/O slot instead of an entire I/O bus. With logical partitioning, a given set of computer resources can be subdivided into many more logical partitions than physical partitions.
A logical partition (“LPAR”) is a subset of computer resources that can host an instance of an operating system (“O/S”). LPARs are implemented through special hardware registers and a trusted firmware component called a hypervisor. Together, these components build a tight architectural ‘box’ around each logical partition, confining partition operations to an exclusive set of processor, memory, and I/O resources assigned to that partition. Today, as computer systems become larger and larger, the ability to run several instances of operating systems on a given hardware system, so that each O/S instance plus its subsystems scales or performs well, supports optimum use of the hardware and translates into cost savings. Although static partitioning helps to tune overall system performance, logically partitioned systems today also may provide ‘dynamic reconfiguration’ capabilities, enabling the movement of hardware resources (processors, memory, I/O slots, and so on) to or from an LPAR, or from one LPAR to another, without requiring reboots. Dynamic reconfiguration enables an improved solution by providing the capability to dynamically move hardware resources to a needy O/S in a timely fashion to match workload demands.
Typical dynamic reconfiguration tools today, however, rely upon cooperation or coordination between a hypervisor and an operating system in an LPAR, a pattern of computer operation that has some drawbacks. In dynamic reconfiguration of memory, for example, an O/S may hold bolted or pinned page frames that the O/S will not release. Many different operating systems may run in separate LPARs at the same time on the same system. IBM's POWER™ hypervisor, for example, supports three different operating systems. One or more of the supported operating systems simply may not support the functions required for such cooperation with a hypervisor. In addition, management of memory becomes more complex in a cooperative scheme because an errant or malicious instance of an O/S not only may fail to cooperate at all, but may actually act in a manner harmful to efficient computer resource management.
Methods, systems, and products are provided for managing computer memory in a computer with dynamic logical partitioning that operate transparently with respect to operating systems in logical partitions. Exemplary methods, systems, and products are described for managing computer memory in a computer with dynamic logical partitioning that include copying by a hypervisor, from page frames in one logical memory block (“LMB”) of a logical partition (“LPAR”) to page frames outside the LMB, contents of page frames having page frame numbers in a page table for an operating system in the LPAR. Embodiments typically include storing new page frame numbers in the page table, including storing by the hypervisor, for each page frame whose contents are copied, a new page frame number that identifies the page frame to which contents are copied. In typical embodiments, copying contents of page frames and storing new page frame numbers are carried out transparently with respect to the operating system.
Typical embodiments also include creating by the hypervisor a list of all the page frames in the page table; monitoring by the hypervisor calls from the operating system to the hypervisor that add page frames to the page table while the hypervisor is copying contents of page frames and storing new page frame numbers; and adding to the list page frames added to the page table. In such embodiments, copying contents of page frames is carried out by copying contents of the page frames on the list.
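For illustration only, the list-and-monitor pattern just summarized can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the class `MigratingHypervisor`, the call name `h_enter`, and the dictionary representation of memory and of the page table are all hypothetical, and the list here holds only entries whose frames lie in the LMB being vacated.

```python
class MigratingHypervisor:
    """Toy model; memory and page_table are plain dicts (hypothetical layout)."""

    def __init__(self, memory, page_table, lmb_frames, spare_frames):
        self.memory = memory              # page frame number -> contents
        self.page_table = page_table      # virtual page number -> page frame number
        self.lmb = set(lmb_frames)        # frames of the LMB being vacated
        self.spare = list(spare_frames)   # free frames outside the LMB
        self.worklist = []                # entries still to be copied
        self.migrating = False

    def h_enter(self, vpn, pfn):
        """Hypothetical hypervisor call by which the O/S adds a page table entry."""
        self.page_table[vpn] = pfn
        # Monitoring: an entry added during migration joins the list.
        if self.migrating and pfn in self.lmb:
            self.worklist.append(vpn)

    def begin(self):
        """Create the list of entries to copy and start monitoring."""
        self.migrating = True
        self.worklist = [v for v, f in self.page_table.items() if f in self.lmb]

    def step(self):
        """Copy one listed frame; return False once the list is empty."""
        if not self.worklist:
            self.migrating = False
            return False
        vpn = self.worklist.pop(0)
        old = self.page_table.get(vpn)
        if old in self.lmb:                       # skip entries deleted or already moved
            new = self.spare.pop()
            self.memory[new] = self.memory[old]   # copy contents
            self.page_table[vpn] = new            # store the new page frame number
        return True


memory = {10: "a", 11: "b", 12: "c"}
hv = MigratingHypervisor(memory, {1: 10, 2: 11},
                         lmb_frames=[10, 11, 12], spare_frames=[50, 51, 52])
hv.begin()
hv.step()            # one frame copied
hv.h_enter(3, 12)    # the O/S maps another LMB frame mid-migration
while hv.step():
    pass
```

Because the operating system only ever reaches the page table through the hypervisor call, it never observes the intermediate state; that is the sense in which the sketch is transparent to the O/S.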
In some embodiments, memory pages of more than one size are mapped to page frames of an LMB. Such embodiments typically include vectoring memory management interrupts from the operating system to the hypervisor and switching memory management operations for the operating system from the page table for the operating system to a temporary alternative page table. In such embodiments, copying contents of page frames typically is carried out by copying contents of page frames in segments having the same size as the smallest of the pages mapped to page frames of the LMB. Copying contents of page frames in such embodiments may be carried out by deleting, from the temporary alternative page table, page frames that are also in the page table for the operating system and storing, in the page table for the operating system, the status bits of such deleted page frames.
In some embodiments, page frames of an LMB may be mapped for direct memory access (“DMA”). Copying contents of page frames in such embodiments may include blocking, by the hypervisor, DMA operations while copying contents of page frames mapped for DMA and storing, in a DMA map table for each page frame of the LMB mapped for DMA, a new page frame number that identifies the page frame to which contents are copied.
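As a hedged illustration of the DMA case, the following Python sketch blocks DMA while a frame is copied and then stores the new page frame number in the DMA map. The class `DmaMigrator` and its methods are hypothetical, and a software lock merely stands in for whatever mechanism an actual hypervisor would use to hold off I/O hardware.

```python
import threading


class DmaMigrator:
    """Toy model of moving a DMA-mapped page frame (all names are assumptions)."""

    def __init__(self, memory, dma_map):
        self.memory = memory            # page frame number -> contents
        self.dma_map = dma_map          # I/O page number -> page frame number
        self._gate = threading.Lock()   # held while DMA is blocked

    def dma_write(self, io_page, data):
        # A device write waits here whenever a copy is in progress.
        with self._gate:
            self.memory[self.dma_map[io_page]] = data

    def move_frame(self, old_pfn, new_pfn):
        with self._gate:                                   # block DMA operations
            self.memory[new_pfn] = self.memory[old_pfn]    # copy contents
            for io_page, pfn in list(self.dma_map.items()):
                if pfn == old_pfn:
                    self.dma_map[io_page] = new_pfn        # store new frame number


memory = {5: "x", 90: None}
mig = DmaMigrator(memory, {0: 5})
mig.move_frame(5, 90)
mig.dma_write(0, "y")    # lands in the new frame, not the vacated one
```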
Embodiments may include creating a segment of free contiguous memory that is both larger than an LMB and also large enough to contain a page table. Creating a segment of free contiguous memory may be accomplished by carrying out the following steps repeatedly by the hypervisor for two or more contiguous LMBs: copying by the hypervisor, from page frames in the LMBs to page frames outside the LMBs, contents of page frames of the LMBs that are in a page table for an operating system in the LPAR; storing new page frame numbers in the page table, including storing by the hypervisor, for each page frame whose contents are copied, a new page frame number that identifies the page frame to which contents are copied; and adding the LMBs to a list of free memory for the system.
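The repeated steps above may be pictured with a short Python sketch. It is offered only as an illustration under assumptions not found in the specification: LMB number n is taken to cover page frames n*F through (n+1)*F - 1 for a fixed F, memory and the page table are dictionaries, and the function name `free_contiguous_lmbs` is hypothetical.

```python
def free_contiguous_lmbs(first_lmb, count, frames_per_lmb,
                         page_table, memory, spare_frames, free_list):
    """Vacate `count` adjacent LMBs and return (first_frame, frame_count)."""
    for lmb in range(first_lmb, first_lmb + count):
        lo = lmb * frames_per_lmb
        hi = lo + frames_per_lmb
        for vpn, pfn in list(page_table.items()):
            if lo <= pfn < hi:
                new = spare_frames.pop()      # a frame outside the LMBs being freed
                memory[new] = memory[pfn]     # copy contents
                page_table[vpn] = new         # store the new page frame number
        free_list.append(lmb)                 # add the LMB to the free memory list
    return first_lmb * frames_per_lmb, count * frames_per_lmb


page_table = {100: 9, 101: 14, 102: 3}
memory = {9: "p", 14: "q", 3: "r"}
free_list = []
segment = free_contiguous_lmbs(2, 2, 4, page_table, memory, [40, 41], free_list)
```

In this example LMBs 2 and 3 cover frames 8 through 15, so the entries for frames 9 and 14 are moved while the entry for frame 3 is left alone.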
Embodiments may also include improving the affinity of an LMB to a processor. In such embodiments, copying contents of page frames of the LMB may include copying contents of page frames of the LMB to interim page frames outside the LMB, copying contents of page frames of a second LMB to the page frames of the LMB, and copying contents of the interim page frames to page frames of the second LMB. In such embodiments, storing new page frame numbers may include storing new page frame numbers that identify the page frames to which contents are copied both for contents of the LMB and for contents of the second LMB.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular descriptions of exemplary embodiments of the invention as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts of exemplary embodiments of the invention.
Exemplary methods, systems, and products for managing computer memory in a computer with dynamic logical partitioning according to embodiments of the present invention are described with reference to the accompanying drawings, beginning with
Stored in RAM (168) is an application program (158), computer program instructions for user-level data processing implementing threads of execution. Also stored in RAM (168) is a hypervisor (102), a set of computer program instructions for managing resources in LPARs improved for managing computer memory in a computer with dynamic logical partitioning according to embodiments of the present invention. Also stored in RAM (168) is an operating system (154). Operating systems useful in computers according to embodiments of the present invention include UNIX™, Linux™, Microsoft NT™, AIX™, IBM's i5/OS™, and others as will occur to those of skill in the art. Operating system (154) and application program (158) are disposed within an LPAR (450). Operating system (154), application program (158), and hypervisor (102) in the example of
The system of
Computer (152) of
The example computer of
The exemplary computer (152) of
For further explanation,
A multi-chip module or ‘MCM’ is an electronic system or subsystem with two or more bare integrated circuits (bare dies) or ‘chip-sized packages’ assembled on a substrate. In the example of
The MCMs of
Accessing memory off the MCM takes longer than accessing memory on the same MCM with the processor, because computer instructions for accessing such memory and return data from such memory must traverse more computer hardware, memory management units, bus drivers, not to mention the length of bus lands and wires which themselves are a consideration at today's computation speeds. Accessing memory off the same backplane takes even longer—for the same reasons. Memory on the same MCM with the processor accessing it therefore is said to have closer affinity than memory off the MCM, and memory on the same backplane with an accessing processor is said to have closer affinity than memory on another backplane. The computer architecture so described is for explanation, not for limitation of the computer memory. Several MCMs may be installed upon printed circuit boards, for example, with the printed circuit boards plugged into backplanes, thereby creating an additional level of affinity not illustrated in
For further explanation,
The system of
Each operating system image (154) requires a range of memory that can be accessed in real addressing mode. In this mode, no virtual address translation is performed, and addresses start at address 0. Operating systems typically use this address range for startup kernel code, fixed kernel structures, and interrupt vectors. Since multiple partitions cannot be allowed to share the same memory range at physical address 0, each LPAR must have its own real mode addressing range.
The hypervisor assigns each LPAR a unique real mode address offset and range value, and then sets these offset and range values into registers in each processor in the partition. These values map to a physical memory address range that has been exclusively assigned to that partition. When partition programs access instructions and data in real addressing mode, the hardware automatically adds the real mode offset value to each address before accessing physical memory. In this way, each logical partition programming model appears to have access to physical address 0, even though addresses are being transparently redirected to another address range. Hardware logic prevents modification of these registers by operating system code running in the partitions. Any attempt to access a real address outside the assigned range results in an addressing exception interrupt, which is handled by the operating system exception handler in the partition.
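The real mode redirection just described amounts to a range check followed by an addition. A minimal Python sketch follows, for explanation only; the function name, the exception class, and the particular offset and range values are hypothetical.

```python
class AddressingException(Exception):
    """Raised when a real mode address falls outside the assigned range."""


def real_to_physical(real_addr, rm_offset, rm_range):
    # The partition may touch only real addresses in [0, rm_range).
    if not 0 <= real_addr < rm_range:
        raise AddressingException(hex(real_addr))
    # The real mode offset is added before physical memory is accessed.
    return rm_offset + real_addr


# Two hypothetical partitions, each of which sees its own "address 0".
lpar_a = {"rm_offset": 0x1000_0000, "rm_range": 0x0100_0000}
lpar_b = {"rm_offset": 0x2000_0000, "rm_range": 0x0100_0000}
phys_a = real_to_physical(0x0, **lpar_a)
phys_b = real_to_physical(0x0, **lpar_b)
```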
Operating systems use another type of addressing, virtual addressing, to give user application threads an effective address space that exceeds the amount of physical memory installed in the system. The operating system does this by paging infrequently used programs and data from memory out to disk, and bringing them back into physical memory on demand.
When applications access instructions and data in virtual addressing mode, they are not aware that their addresses are being translated by virtual memory management using page translation tables (416). These tables, referred to generally in this specification as ‘page tables,’ reside in system memory, and each partition has its own exclusive page table administered on its behalf by the hypervisor. Processors use these tables (via calls to the hypervisor) to transparently convert a program's virtual address (424) into the physical address (422) where that page has been mapped into physical memory. If, when a thread accesses a page of memory, the page frame has been moved out of physical memory onto disk, the operating system receives a page fault.
In a non-LPAR operation, an operating system creates and maintains page table entries directly, using real mode addressing to access the tables. In a logical partitioning operation, the page translation tables are placed in reserved physical memory regions that are only accessible to the hypervisor. In other words, a partition's page table is located outside the partition's real mode address range. The register that provides a processor the physical address of its page table can only be modified by the hypervisor.
Virtual addresses are implemented as a combination of a virtual page number (424) and an offset within a virtual page. Real addresses are implemented as a combination of a page frame number (422) that identifies a page of real memory and an offset within that page. The offset for a virtual address is also the offset for the real address to which the virtual address is mapped. Page tables map virtual addresses to real addresses, but because the offsets are equal, the page tables map with only the virtual page numbers and the corresponding page frame numbers. The offsets are not included in the page tables.
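By way of a worked example, the split into page number and offset can be sketched in Python. The 4 KiB page size is an assumption (the specification does not fix one), and a dictionary stands in for the page table.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def translate(virtual_addr, page_table):
    """Map a virtual address to a real address via page numbers alone."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    pfn = page_table[vpn]               # a KeyError here stands in for a page fault
    return pfn * PAGE_SIZE + offset     # the offset carries over unchanged


page_table = {567: 666, 568: 667, 569: 668}
real_addr = translate(567 * PAGE_SIZE + 0x2A, page_table)
```

Only the page numbers appear in the table; the offset is never stored.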
When an operating system (154) needs to create a page translation mapping, it executes a call to the hypervisor (102) on a processor (156), which transfers execution to the hypervisor. The hypervisor creates the page table entry on the partition's behalf and stores it in the page table. Threads can also make hypervisor calls to modify or delete existing page table entries. Page table entries only map into specific physical memory regions, called logical memory blocks or ‘LMBs,’ which are assigned in granular segments to each LPAR. These LMBs provide the physical memory that backs up the LPAR's virtual page address spaces. An LPAR's memory, therefore, is generally made up of LMBs which may be assigned in any order from anywhere in physical memory.
I/O hardware uses direct memory access (‘DMA’) operations to move data between I/O adapters in I/O slots (407) and page frames (406) in system memory. DMA operations use an address relocation mechanism similar to page tables. I/O hardware translates addresses (425) generated by I/O devices in I/O slots into physical memory addresses. I/O hardware makes this translation with a DMA map (650), sometimes also called a translation control entry (‘TCE’) table, stored in physical memory. As with page tables, the DMA map resides in a physical address region of system memory that is inaccessible by partitions and only accessible by the hypervisor. By calling a hypervisor service, partition programs can create, modify, or delete DMA map entries for an I/O slot assigned to that partition. When the I/O hardware translates an I/O adapter DMA address into physical memory, the resulting address falls within the physical memory space assigned to that partition.
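A DMA map of this kind may be modeled, for explanation only, as a small table owned by the hypervisor. In the Python sketch below the class `DmaMap`, the service names `h_put` and `h_remove`, and the 4 KiB page size are assumptions and do not name any actual hypervisor interface.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class DmaMap:
    """Toy translation control entry table for one I/O slot."""

    def __init__(self, partition_frames):
        self._allowed = set(partition_frames)   # frames assigned to the owning partition
        self._entries = {}                      # I/O page number -> page frame number

    def h_put(self, io_page, pfn):
        """Hypothetical hypervisor service: create or modify an entry."""
        if pfn not in self._allowed:
            raise PermissionError("frame not assigned to this partition")
        self._entries[io_page] = pfn

    def h_remove(self, io_page):
        """Hypothetical hypervisor service: delete an entry."""
        self._entries.pop(io_page, None)

    def translate(self, io_addr):
        """Translate a device-generated address to a physical address."""
        io_page, offset = divmod(io_addr, PAGE_SIZE)
        return self._entries[io_page] * PAGE_SIZE + offset


dma = DmaMap(partition_frames=[666, 667, 668])
dma.h_put(0, 667)
phys = dma.translate(0x10)
```

Because every entry is vetted when it is created, any address the I/O hardware produces through the map necessarily falls inside the partition's own memory.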
For further explanation,
The method of
The effect of these memory management operations is illustrated with page tables (416, 418). Page tables (416, 418) are the same page table illustrated before (416) and after (418) memory management operations in the method of
In the method of
For further explanation,
The method of
In the example of
When a memory management interrupt occurs, the hypervisor looks up the real page table of the operating system to see whether the memory management interrupt would have occurred if the partition's real page table were in use. If so, the hypervisor gives control to the OS memory management interrupt vector. Otherwise, the page frame entry is inserted into the temporary alternative page table (if a copy operation is not in progress).
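The decision just described may be sketched in Python as follows. The sketch is illustrative only: the function name and return strings are hypothetical, the page tables are dictionaries, and because the specification does not say what happens when a copy operation is in progress, the sketch simply reports that case and leaves any retry to the caller.

```python
def handle_mm_interrupt(vpn, real_page_table, temp_page_table, copy_in_progress):
    """Decide how the hypervisor disposes of a vectored memory management interrupt."""
    if vpn not in real_page_table:
        # The interrupt would have occurred with the real page table in use.
        return "deliver_to_os"
    if copy_in_progress:
        # Assumption: nothing is inserted while frames are being copied.
        return "not_inserted"
    # Otherwise, populate the temporary alternative page table from the real one.
    temp_page_table[vpn] = real_page_table[vpn]
    return "inserted"


real_pt = {567: 666}
temp_pt = {}
first = handle_mm_interrupt(567, real_pt, temp_pt, copy_in_progress=False)
second = handle_mm_interrupt(999, real_pt, temp_pt, copy_in_progress=False)
third = handle_mm_interrupt(567, real_pt, {}, copy_in_progress=True)
```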
In the method of
For further explanation,
In the method of
DMA maps (650, 652) illustrate the effects of memory management operations according to the method of
Page tables typically are large data structures, often substantially larger than an LMB. When a system administrator tries to create a new LPAR dynamically (without a reboot) there may not be enough contiguous memory available for the page table for the new LPAR. Managing computer memory in a computer with dynamic logical partitioning according to embodiments of the present invention advantageously therefore may include creating a segment of free contiguous memory that is both larger than an LMB and also large enough to contain a page table.
For further explanation,
The method of
Often more than two contiguous LMBs must be freed to make room for a page table. The method of
As affinity of accessed memory decreases with respect to an accessing processor, overall system performance degrades. Managing computer memory in a computer with dynamic logical partitioning according to embodiments of the present invention advantageously therefore may include improving the affinity of an LMB to a processor. For further explanation,
In the example of
Page table entries for two partitions on MCMs (704, 705) respectively are illustrated in page tables (416, 418, 417, and 419). Page tables (416, 418) show page table entries for MCM (705) before (416) and after (418) affinity improvement operations respectively. Similarly, page tables (417, 419) show page table entries for MCM (704) before (417) and after (419) affinity improvement operations respectively. Page table (416) shows that virtual page numbers 567, 568, and 569, in use by threads running on processor (157) on MCM (705), are mapped to page frames 666, 667, and 668, which are physically located in LMB (402) on MCM (704) having remote affinity with respect to processor (157). Similarly, page table (417) shows that virtual page numbers 444, 445, and 446, in use by threads running on processor (156) on MCM (704), are mapped to page frames 853, 854, and 855, which are physically located in LMB (403) on MCM (705) having remote affinity with respect to processor (156). Overall processor-memory affinity and memory management efficiency could be improved, for example, if page frames mapped to virtual pages in use on the processors could be located or moved to physical memory on the same MCM with the processor. In addition, an LPAR may be implemented with processors on multiple MCMs, and such an LPAR may have multiple page tables also, for example, one for each MCM. Improving the affinity of an LMB to a processor according to embodiments of the present invention is useful also for such an LPAR with multiple page tables and processors on multiple MCMs.
The method of
Page tables (418, 419) show the effects of these affinity improvement operations. Page table (418) shows that virtual page numbers 567, 568, and 569, in use by threads running on processor (157) on MCM (705), are now mapped to page frames 853, 854, and 855, which are physically located in LMB (403) on MCM (705), now having close affinity with respect to processor (157) on the same MCM. Similarly, page table (419) shows that virtual page numbers 444, 445, and 446, in use by threads running on processor (156) on MCM (704), are now mapped to page frames 666, 667, and 668, which are physically located in LMB (402) on MCM (704) having close affinity with respect to processor (156) on the same MCM.
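The three-way copy through interim page frames can be traced with the page numbers of this example. The following Python sketch is for explanation only; the function name, the interim frame numbers 900 through 902, and the string contents are hypothetical, and memory and the page tables are modeled as dictionaries.

```python
def swap_lmb_contents(memory, lmb_a, lmb_b, interim, page_tables):
    """Exchange the contents of two LMBs and update every affected page table."""
    remap = {}
    for a, t in zip(lmb_a, interim):
        memory[t] = memory[a]          # first LMB -> interim frames
    for a, b in zip(lmb_a, lmb_b):
        memory[a] = memory[b]          # second LMB -> first LMB
        remap[b] = a
    for t, b, a in zip(interim, lmb_b, lmb_a):
        memory[b] = memory[t]          # interim frames -> second LMB
        remap[a] = b
    for table in page_tables:
        for vpn, pfn in list(table.items()):
            if pfn in remap:
                table[vpn] = remap[pfn]   # store the new page frame number


lmb_402 = [666, 667, 668]              # located on MCM (704)
lmb_403 = [853, 854, 855]              # located on MCM (705)
pt_705 = {567: 666, 568: 667, 569: 668}
pt_704 = {444: 853, 445: 854, 446: 855}
memory = {666: "v567", 667: "v568", 668: "v569",
          853: "v444", 854: "v445", 855: "v446"}
swap_lmb_contents(memory, lmb_402, lmb_403, [900, 901, 902], [pt_705, pt_704])
```

Each page table entry is remapped exactly once from its original frame number, so an entry that has just been moved to the other LMB is not moved back.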
Exemplary embodiments of the present invention are described largely in the context of a fully functional computer system for managing computer memory in a computer with dynamic logical partitioning. Readers of skill in the art will recognize, however, that the present invention also may be embodied in a computer program product disposed on signal bearing media for use with any suitable data processing system. Such signal bearing media may be transmission media or recordable media for machine-readable information, including magnetic media, optical media, or other suitable media. Examples of recordable media include magnetic disks in hard drives or diskettes, compact disks for optical drives, magnetic tape, and others as will occur to those of skill in the art. Examples of transmission media include telephone networks for voice communications and digital data communications networks such as, for example, Ethernets™ and networks that communicate with the Internet Protocol and the World Wide Web. Persons skilled in the art will immediately recognize that any computer system having suitable programming means will be capable of executing the steps of the method of the invention as embodied in a program product. Persons skilled in the art will recognize immediately that, although some of the exemplary embodiments described in this specification are oriented to software installed and executing on computer hardware, nevertheless, alternative embodiments implemented as firmware or as hardware are well within the scope of the present invention.
It will be understood from the foregoing description that modifications and changes may be made in various embodiments of the present invention without departing from its true spirit. The descriptions in this specification are for purposes of illustration only and are not to be construed in a limiting sense. The scope of the present invention is limited only by the language of the following claims.