The present invention relates to the field of computer systems and, more particularly, to non-uniform memory access computer systems.
As the value and use of information continues to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
One type of information handling system is a non-uniform memory access (NUMA) server. A NUMA server is implemented as a plurality of server “nodes” where each node includes one or more processors and system memory that is “local” to the node. The nodes are interconnected so that the system memory on one node is accessible to the processors on the other nodes. Processors are connected to their local memory by a local bus. Processors connect to remote system memories via the NUMA interconnect. The local bus is shorter and faster than the NUMA interconnect so that the access time associated with a processor access to local memory (a local access) is less than the access time associated with a processor access to remote memory (a remote access). In contrast, conventional Symmetric Multiprocessor (SMP) systems are characterized by substantially uniform access to any portion of system memory by any processor in the system.
NUMA systems are, in part, a recognition of the limited bandwidth of the local bus in an SMP system. The performance of an SMP system varies non-linearly with the number of processors. As a practical matter, the bandwidth limitations of the SMP local bus represent an insurmountable barrier to improved system performance after approximately four processors have been connected to the local bus. Many NUMA implementations use 2-processor or 4-processor SMP systems for each node with a NUMA interconnection between each pair of nodes to achieve improved system performance.
The non-uniform characteristics of NUMA servers represent both an opportunity and a challenge for NUMA server operating systems. The benefits of NUMA are best realized when the operating system is proficient at allocating tasks or threads to the node where the majority of memory access transactions will be local. NUMA performance is negatively impacted when a processor on one node is executing a thread in which remote memory access transactions are prevalent. This characteristic is embodied in a concept referred to as memory affinity. In a NUMA server, memory affinity refers to the relationship (e.g., local or remote) between portions of system memory and the server nodes.
Some NUMA implementations support, at one level, the concept of memory migration. Memory migration refers to the relocation of a portion of system memory. For example, a bank/card of memory can be hot plugged into an empty memory slot or installed as a replacement for an existing bank/card. After a new memory bank/card is installed, the server BIOS can copy or migrate the contents of any portion of memory to the new memory and reprogram the address decoders accordingly. If, however, memory is migrated to a portion of system memory that resides on a node that is different from the node on which the original memory resided, performance problems may arise due to a change in memory affinity. Threads or processes that, before the memory migration event, were executing efficiently because the majority of their memory accesses were local may execute inefficiently after the memory migration event because the majority of their memory accesses have become remote.
Therefore, a need has arisen for a NUMA-type information handling system operable to dynamically adjust its memory affinity structure following a memory migration event.
The present disclosure describes a system and method for modifying memory affinity information in response to a memory migration event.
In one aspect, an information handling system, implemented in one embodiment as a non-uniform memory access (NUMA) server, includes a first node and a second node. Each node includes one or more processors and a local system memory accessible to its processor(s) via a local bus. A NUMA interconnect between the first node and the second node enables a processor on the first node to access the system memory on the second node.
The information handling system includes affinity information. The affinity information is indicative of a proximity relationship between portions of system memory and the nodes of the NUMA server. A memory migration module copies the contents of a block of memory cells from a first portion of memory on the first node to a second portion of memory on the second node. The migration module preferably also reassigns a first range of memory addresses from the first portion to the second portion. An affinity module detects a memory migration event and responds by modifying the affinity information to indicate the second node as being local to the range of memory addresses.
In another aspect, a disclosed computer program (software) product includes instructions for detecting a memory migration event, which includes reassigning a first range of memory addresses from a first portion of memory that resides on a first node of the NUMA server to a second portion of memory on a second node of the server. The product further includes instructions for modifying the affinity information to reflect the first range of memory addresses as being located on the second node of the server.
In yet another aspect, an embodiment of a method for maintaining an affinity structure in an information handling system includes modifying an affinity table storing data indicative of a node location of a corresponding portion of system memory following a memory migration event. An operating system is notified of the memory migration event. The operating system responds by updating operating system affinity information to reflect the updated affinity table.
The present disclosure includes a number of important technical advantages. One technical advantage is the ability to maintain affinity information in a NUMA server following a memory migration event that could alter affinity information and have a potentially negative performance effect. Additional advantages will be apparent to those of skill in the art and from the FIGURES, description and claims provided herein.
A more complete and thorough understanding of the present embodiments and advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, in which like reference numbers indicate like features, and wherein:
Preferred embodiments of the invention and its advantages are best understood by reference to the drawings wherein like numbers refer to like and corresponding parts.
In one aspect, a system and method suitable for modifying or otherwise maintaining processor/memory affinity information in an information handling system are disclosed. The system may be a NUMA server system having multiple nodes including a first node and a second node. Each node includes one or more processors and local system memory that is accessible to the node processors via a shared local bus. Processors on the first node can also access memory on the second node via an inter-node interconnect referred to herein as a NUMA interconnect.
The preferred implementation of the information handling system supports memory migration, in which the contents of a block of memory cells are copied from a first portion of memory to a second portion of memory. The memory migration may also include modifying memory address decoder hardware and/or firmware to re-map a first range of physical memory addresses from a first block of memory cells (i.e., a first portion of memory) to the second block of memory cells (i.e., a second portion of memory). If the first and second portions of memory reside on different nodes, the system also modifies an affinity table to reflect the first range of memory addresses, after remapping, as residing on or being local to the second node.
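By way of illustration only, the following C sketch models the affinity table modification described above. The structure layout, table, and function names are hypothetical simplifications introduced for this sketch and are not the ACPI-defined formats (the SRAT and SLIT formats are discussed below).

    #include <stdint.h>

    /* Hypothetical affinity table: each entry records which node a range
     * of physical addresses is local to. */
    struct affinity_entry {
        uint64_t base;    /* first physical address of the range */
        uint64_t length;  /* size of the range in bytes          */
        uint32_t node;    /* node on which the range resides     */
    };

    #define MAX_RANGES 64
    static struct affinity_entry affinity_table[MAX_RANGES];
    static unsigned num_ranges;

    /* After a cross-node migration re-maps the range starting at 'base',
     * record its new home node so subsequent lookups see the change. */
    static void update_affinity(uint64_t base, uint32_t new_node)
    {
        for (unsigned i = 0; i < num_ranges; i++) {
            if (affinity_table[i].base == base) {
                affinity_table[i].node = new_node;
                return;
            }
        }
    }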
Following modification of the affinity table, the updated affinity information is used to re-populate operating system affinity information. Following re-population of the operating system affinity information, the operating system is able to allocate threads to processors in a node-efficient manner in which, for example, a thread that primarily accesses the range of memory addresses may be allocated, in the case of a new thread, or migrated, in the case of an existing thread, to a processor on the second node.
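Continuing the hypothetical sketch above, a node-efficient allocation might consult the updated table when placing a thread; run_on_node() is an assumed stand-in for an operating system scheduling primitive, not an actual interface of any particular operating system.

    struct thread;                                      /* opaque OS thread  */
    extern void run_on_node(struct thread *t, uint32_t node); /* assumed API */

    /* Return the node that is local to the given physical address. */
    static uint32_t lookup_node(uint64_t addr)
    {
        for (unsigned i = 0; i < num_ranges; i++) {
            if (addr >= affinity_table[i].base &&
                addr <  affinity_table[i].base + affinity_table[i].length)
                return affinity_table[i].node;
        }
        return 0;  /* fall back to node 0 for unknown addresses */
    }

    /* Place (or migrate) a thread near the range it primarily accesses. */
    static void place_thread(struct thread *t, uint64_t hot_addr)
    {
        run_on_node(t, lookup_node(hot_addr));
    }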
Turning now to
Referring now to
In the depicted implementation, a serial port 107 is also connected to peripheral bus 211 and provides an interface to an inter-node interconnect link 105, also referred to herein as NUMA interconnect link 105.
Returning now to
First node 102-1 as shown in
NUMA server 100 as depicted in
A chip set 124 is connected through a south bridge 120 to first IO hub 110-1. Chip set 124 includes a flash BIOS 130. Flash BIOS 130 includes persistent storage containing, among other things, system BIOS code that generates processor/memory affinity information 132. Processor/memory affinity information 132 includes, in some embodiments, a static resource affinity table 300 and a system locality information table 400 as described in greater detail below with respect to
As used throughout this specification, affinity information refers to information indicating a proximity relationship between portions of system memory and nodes in a NUMA server. In one implementation, processor/memory affinity information is formatted in compliance with the Advanced Configuration and Power Interface (ACPI) standard. ACPI is an open industry specification that establishes industry standard interfaces for operating system directed configuration and power management on laptops, desktops, and servers. ACPI is fully described in the Advanced Configuration and Power Interface Specification revision 3.0a (the ACPI specification) from the Advanced Configuration and Power Interface work group (www.ACPI.info). The ACPI specification and all previous revisions thereof are incorporated in their entirety by reference herein.
ACPI includes, among other things, a specification of the manner in which memory affinity information is formatted. ACPI defines formats for two data structures that provide processor/memory affinity information. These data structures include a Static Resource Affinity Table (SRAT) and a System Locality Information Table (SLIT).
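For illustration, the following C rendering sketches the memory affinity structure carried in the SRAT, with field order following the ACPI 3.0 specification; the C type names, packing attribute, and comments are a non-normative sketch rather than code from the specification. A comment then illustrates the idea behind the SLIT distance matrix.

    #include <stdint.h>

    /* Memory affinity structure carried in the SRAT (ACPI 3.0 layout). */
    struct srat_mem_affinity {
        uint8_t  type;              /* 1 identifies a memory affinity entry */
        uint8_t  length;            /* 40 bytes                             */
        uint32_t proximity_domain;  /* locality (node) owning the range     */
        uint16_t reserved1;
        uint32_t base_addr_lo;      /* low 32 bits of the range base        */
        uint32_t base_addr_hi;      /* high 32 bits of the range base       */
        uint32_t length_lo;         /* low 32 bits of the range length      */
        uint32_t length_hi;         /* high 32 bits of the range length     */
        uint32_t reserved2;
        uint32_t flags;             /* bit 0: enabled, bit 1: hot pluggable */
        uint64_t reserved3;
    } __attribute__((packed));

    /* The SLIT is an N x N matrix of relative distances between the N
     * localities; entry [i][j] is the cost of locality i reaching
     * locality j, normalized so that a local access is 10. A two-node
     * server might report:
     *
     *            node 0   node 1
     *   node 0     10       20
     *   node 1     20       10
     */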
Memory affinity data structure 301 as shown in
Referring now to
Some embodiments of a memory affinity information modification procedure may be implemented as a set of computer executable instructions (software). In these embodiments, the computer instructions are stored on a computer readable medium such as a system memory or a hard disk. When executed by a suitable processor, the instructions cause the computer to perform a memory affinity information modification procedure, an exemplary implementation of which is depicted in
Turning now to
Following the memory migration event in block 502, method 500 as depicted includes updating (block 504) BIOS affinity information. The depicted embodiment of method 500 recognizes a distinction between affinity information that is visible to BIOS and affinity information that is visible to the operating system. This distinction is consistent with the reality of many affinity information implementations. As described previously with respect to
Thus, method 500 as depicted includes updating (block 504) the BIOS-visible affinity information following the memory migration event. BIOS code then notifies (block 506) the operating system that a memory migration has occurred. Method 500 then further includes updating (block 508) the operating system affinity information (i.e., the affinity information that is visible to the operating system). Following the updating of the operating-system-visible affinity information, the operating system has accurate affinity information with which to allocate resources following a memory migration event.
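A minimal sketch of this flow appears below; the function names are hypothetical stand-ins for the BIOS and operating system steps of blocks 504 through 508.

    extern void update_bios_affinity_tables(void);  /* block 504 (assumed) */
    extern void notify_operating_system(void);      /* block 506 (assumed) */

    void handle_memory_migration_event(void)
    {
        update_bios_affinity_tables();  /* rebuild SRAT/SLIT for the new node */
        notify_operating_system();      /* e.g., by raising an interrupt      */
        /* Block 508 then executes within the operating system, which
         * discards its stale affinity data and reloads it from the
         * updated BIOS-visible tables. */
    }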
Turning now to
In one aspect, SMI (system management interrupt) 610 is a BIOS procedure for migrating memory and subsequently reloading memory/node affinity information. Memory migration refers to copying or otherwise moving the contents (data) of one portion of system memory to another portion and, in addition, altering the memory decoding structure so that the physical addresses associated with the data do not change. SMI 610 also includes updating affinity information after the memory migration is complete. Reloading the affinity information may include, for example, reloading SRAT 300 and SLIT 400.
As depicted in
The depicted embodiment of migration module 610 includes disabling (block 612) the first portion of memory, which is the portion of memory from which the data was migrated. The illustrated embodiment is particularly suitable for applications in which memory migration is triggered in response to detecting a “bad” portion of memory. A bad portion of memory may be a memory card or other portion of memory containing one or more correctable errors (e.g., single bit errors). Other embodiments, however, may initiate memory migration even when no memory errors have occurred to achieve other objectives including, but not limited to, for example, distributing allocated system memory more evenly across the server nodes. Thus, in some implementations, memory migration will not necessarily include disabling portions of system memory.
As part of the memory migration procedure, the depicted embodiment of SMI 610 includes reprogramming (block 613) memory decode registers. Reprogramming the memory decode registers causes a remapping of physical addresses from a first portion of memory to a second portion of memory. After the migration is complete and the memory decode registers have been reprogrammed, a physical memory address that previously accessed a memory cell location in the first portion of memory accesses a corresponding memory cell location in the second portion of memory.
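The decode register layout is chipset specific; the following sketch is purely illustrative, with a hypothetical register image and write_decode_reg() helper, and shows only the idea of re-pointing an unchanged physical address range at a new block of memory cells.

    #include <stdint.h>

    /* Hypothetical image of one memory decode range register. */
    struct decode_range {
        uint64_t phys_base;   /* physical addresses presented to software */
        uint64_t phys_limit;  /* inclusive upper bound of the range       */
        uint32_t target;      /* bank/card of memory cells backing it     */
    };

    extern void write_decode_reg(unsigned index, const struct decode_range *r);

    /* Re-map the range: same physical addresses, new memory cells. */
    void remap_decode_range(unsigned index, struct decode_range *r,
                            uint32_t new_target)
    {
        r->target = new_target;
        write_decode_reg(index, r);  /* program the chipset decoder */
    }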
Having reprogrammed the memory decode registers in block 613, the depicted embodiment of SMI 610 includes reloading (block 614) BIOS-visible affinity information including, for example, SRAT 300 and SLIT 400 and/or other suitable affinity tables. As indicated previously, SRAT 300 and SLIT 400 are located, in one implementation, in a portion of system memory reserved for or otherwise accessible only to BIOS. SRAT 300 and SLIT 400 are sometimes referred to herein as the BIOS-visible affinity information to differentiate them from operating system memory affinity information, which is preferably stored in system memory.
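Reloading the BIOS-visible tables might then amount to rewriting the proximity domain of the entry covering the migrated range, as in this sketch continuing the srat_mem_affinity rendering above (the helper is hypothetical):

    /* After a cross-node migration, name the new node in the SRAT entry
     * that covers the migrated range. */
    void reload_srat_entry(struct srat_mem_affinity *entry, uint32_t new_node)
    {
        entry->proximity_domain = new_node;  /* range is now local here */
    }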
In cases where memory migration crosses node boundaries, the BIOS-visible affinity information (e.g., SRAT 300 and SLIT 400) after migration will differ from the SRAT and SLIT preceding migration. More specifically, the SRAT and SLIT after migration will reflect the migrated portion of memory as now residing on a new node. Method 600, as described further below, includes making the modified BIOS-visible information visible to the operating system.
Following the re-loading of SRAT 300 and SLIT 400, the depicted embodiment of SMI 610 includes generating (block 615) a system control interrupt (SCI). The SCI generated in block 615 initiates procedures that expose the re-loaded BIOS-visible affinity information to the operating system. Specifically, as depicted, the SCI generated in block 615 calls the operating system SCI handler 650.
OS SCI handler 650 is invoked when SMI 610 issues an interrupt. As depicted in
Returning to OS SCI handler 650, a decision is made in block 652 whether to discard and reload the operating system affinity information. If BIOS _Lxx method 630 notified the operating system to discard and reload its memory affinity information, OS SCI handler 650 recognizes the notification, discards (block 654) its current affinity information, and reloads (block 656) the new information based on the new SRAT and SLIT values. The operating system affinity information may include tables, preferably stored in system memory, that mirror the BIOS affinity information including SRAT 300 and SLIT 400 stored in a BIOS-reserved portion of system memory. If, on the other hand, OS SCI handler 650 has not been notified by BIOS _Lxx method 630 to discard and reload the SRAT and SLIT, OS SCI handler 650 terminates without taking further action. Thus, memory migration module 610 and affinity module 620 are effective in responding to a memory migration event by updating the affinity information maintained by the operating system.
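The handler logic of blocks 652 through 656 might look like the following sketch; the flag and function names are hypothetical stand-ins for operating system internals and for the notification left by the _Lxx method.

    #include <stdbool.h>

    extern bool affinity_reload_requested(void);  /* set via the _Lxx method */
    extern void discard_os_affinity(void);        /* block 654 (assumed)     */
    extern void reload_os_affinity(void);         /* block 656: re-read the
                                                     new SRAT/SLIT (assumed) */

    void os_sci_handler(void)
    {
        if (affinity_reload_requested()) {  /* block 652 decision */
            discard_os_affinity();
            reload_os_affinity();
        }
        /* Otherwise the handler terminates without further action. */
    }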
Although the disclosed embodiments have been described in detail, it should be understood that various changes, substitutions, and alterations can be made to the embodiments without departing from their spirit and scope.