1. Field of the Invention
The present invention relates generally to communication protocols between a host computer and an input/output (I/O) adapter. More specifically, the present invention provides an implementation for virtualizing memory registration and window resources on a physical I/O adapter. In particular, the present invention provides a mechanism by which a system image, such as a general purpose operating system (e.g., Linux, Unix, or Windows) or a special purpose operating system (e.g., a Network File System server), may directly expose real memory addresses, such as the memory addresses used by a host processor or host memory controller to access memory, to a Peripheral Component Interconnect (PCI) adapter, such as a PCI, PCI-X, or PCI-E adapter, that supports memory registration or windows, such as an InfiniBand Host Channel Adapter, an iWARP Remote Direct Memory Access enabled Network Interface Controller (RNIC), a TCP/IP Offload Engine (TOE), an Ethernet Network Interface Controller (NIC), a Fibre Channel (FC) Host Bus Adapter (HBA), a parallel SCSI (pSCSI) HBA, an iSCSI adapter, an iSCSI Extensions for RDMA (iSER) adapter, or any other type of adapter that supports a memory mapped I/O interface.
2. Description of the Related Art
Virtualization is the creation of substitutes for real resources. The substitutes have the same functions and external interfaces as their real counterparts, but differ in attributes such as size, performance, and cost. These substitutes are virtual resources and their users are usually unaware of the substitute's existence. Servers have used two basic approaches to virtualize system resources: Partitioning and Hypervisors. Partitioning creates virtual servers as fractions of a physical server's resources, typically in coarse (e.g., physical) allocation units (e.g., a whole processor, along with its associated memory and I/O adapters). Hypervisors are software or firmware components that can virtualize all server resources with fine granularity (e.g., in small fractions of a single physical resource).
Servers that support virtualization presently have two options for handling I/O. The first option is to not allow a single physical I/O adapter to be shared between virtual servers. The second option is to add function into the Hypervisor, or another intermediary, that provides the isolation necessary to permit multiple operating systems to share a single physical adapter.
The first option has several problems. One significant problem is that expensive adapters cannot be shared between virtual servers. If a virtual server only needs to use a fraction of an expensive adapter, an entire adapter would be dedicated to the server. As the number of virtual servers on the physical server increases, this leads to underutilization of the adapters and more importantly to a more expensive solution, because each virtual server would need a physical adapter dedicated to it. For physical servers that support many virtual servers, another significant problem with this option is that it requires many adapter slots, with all the accompanying hardware (e.g., chips, connectors, cables) required to attach those adapters to the physical server.
Although the second option provides a mechanism for sharing adapters between virtual servers, that mechanism must be invoked and executed on every I/O transaction. The invocation and execution of the sharing mechanism by the Hypervisor or other intermediary on every I/O transaction degrades performance. It also leads to a more expensive solution, because the customer must purchase more hardware, either to make up for the cycles used to perform the sharing mechanism or, if the sharing mechanism is offloaded to an intermediary, for the intermediary hardware.
Therefore, it would be advantageous to have a mechanism that allows a system image within a multiple system image virtual server to directly expose a portion or all of its associated system memory to a shared PCI adapter without having to go through a trusted component, such as a Hypervisor, and without any additional address translation and protection hardware on the host. It would also be advantageous for the system image to expose memory to a shared adapter during an infrequently used operation, such as the assignment of memory to the System Image by the Hypervisor, or when the System Image pins its memory with help from the Hypervisor. It would also be advantageous to have the mechanism apply to Ethernet Network Interface Controllers (NICs), Fibre Channel (FC) Host Bus Adapters (HBAs), parallel SCSI (pSCSI) HBAs, InfiniBand Host Channel Adapters (HCAs), TCP/IP Offload Engines, Remote Direct Memory Access (RDMA) enabled NICs, iSCSI adapters, iSCSI Extensions for RDMA (iSER) adapters, and any other type of adapter that supports a memory mapped I/O interface.
The present invention provides a method, system, and computer program product for allowing a system image within a multiple system image virtual server to directly expose a portion, or all, of its associated system memory to a shared PCI adapter without having to go through a trusted component, such as a Hypervisor, and without any address translation and protection hardware on the host. Specifically, the present invention is directed to a mechanism for sharing conventional PCI I/O adapters, PCI-X I/O Adapters, PCI-Express I/O Adapters, and, in general, any I/O adapter that uses a memory mapped I/O interface for communications.
A mechanism is provided that allows hosts that provide address translation and protection hardware to use that hardware in conjunction with an address translation and protection table in the adapter. A mechanism is also provided that allows a host that does not provide an address translation and protection table to protect its addresses strictly by using an address translation and protection table and a range table in the adapter.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
The present invention applies to any general or special purpose host that uses a PCI family I/O adapter to directly attach storage or to attach to a network, where the network consists of endnodes, switches, routers, and the links interconnecting these components. The network links can be Fibre Channel, Ethernet, InfiniBand, Advanced Switching Interconnect, or a proprietary link that uses proprietary or standard protocols.
With reference now to the figures and in particular with reference to
Network 120 can also attach large host node 124 through port 136 which attaches to switch 140. Large host node 124 can also contain a second type of port 128, which connects to a direct attached storage subsystem, such as direct attached storage 132.
Network 120 can also attach a small integrated host node 144 which is connected to network 120 through port 148 which attaches to switch 140. Small integrated host node 144 can also contain a second type of port 152 which connects to a direct attached storage subsystem, such as direct attached storage 156.
Turning next to
In this example, small host node 202 includes two processor I/O hierarchies, such as processor I/O hierarchies 200 and 203, which are interconnected through link 201. In the illustrative example of
With reference now to
In this example, small integrated host node 302 includes two processor I/O hierarchies 300 and 303, which are interconnected through link 301. In the illustrative example, processor I/O hierarchy 300 includes processor chip 304, which is representative of one or more processors and associated caches. Processor chip 304 is connected to memory 312 through link 308. One of the links on the processor chip, such as link 330, connects to a PCI family adapter, such as PCI family adapter 345. Processor chip 304 has one or more PCI family (e.g., PCI, PCI-X, PCI-Express, or any future generation of PCI) links that are used to connect either PCI family I/O bridges or a PCI family I/O adapter, such as PCI family adapter 344 and PCI family adapter 345, through a PCI link, such as link 316, 330, and 324. PCI family adapter 345 can also be used to connect with a network, such as network 364, through link 356 via either a switch or router, such as switch or router 360. PCI family adapter 344 can be used to connect with direct attached storage 352 through link 348.
Turning now to
In this example, large host node 402 includes two processor I/O hierarchies 400 and 403 interconnected through link 401. In the illustrative example of
Processor I/O hierarchy 403 includes processor chip 405, which is representative of one or more processors and associated caches. Processor chip 405 is connected to memory 413 through link 409. One of links 415 and 418, such as link 418, on the processor chip connects to a non-PCI I/O hub, such as non-PCI I/O hub 419. The non-PCI I/O hub uses a network 492 to attach to a non-PCI I/O bridge 488. That is, non-PCI I/O bridge 488 is connected to switch or router 494 through link 490 and switch or router 494 also attaches to non-PCI I/O hub 419 through link 496. Network 492 allows the non-PCI I/O hub and non-PCI I/O bridge to be placed in different packages. Non-PCI I/O bridge 488 has one or more links that are used to connect with other non-PCI I/O bridges or a PCI family I/O adapter, such as PCI family adapter 480 and PCI family adapter 474, through a PCI link, such as link 482, 484, and 486. PCI family adapter 480 can be used to connect direct attached storage 476 through link 478. PCI family adapter 474 can also be used to connect with network 464 through link 473 via, for example, either a switch or router 472.
Turning next to
PCI bus transaction 500 shows three phases: an address phase 508; a data phase 512; and a turnaround cycle 516. Also depicted is the arbitration for next transfer 504, which can occur simultaneously with the address, data, and turnaround cycle phases. For PCI, the address contained in the address phase is used to route a bus transaction from the adapter to the host and from the host to the adapter.
PCI-X transaction 520 shows five phases: an address phase 528; an attribute phase 532; a response phase 560; a data phase 564; and a turnaround cycle 566. Also depicted is the arbitration for next transfer 524 which can occur simultaneously with the address, attribute, response, data, and turnaround cycle phases. Similar to conventional PCI, PCI-X uses the address contained in the address phase to route a bus transaction from the adapter to the host and from the host to the adapter. However, PCI-X adds the attribute phase 532 which contains three fields that define the bus transaction requestor, namely: requestor bus number 544, requestor device number 548, and requestor function number 552 (collectively referred to herein as a BDF). The bus transaction also contains miscellaneous field 536, tag field 540, and byte count field 556. Tag 540 uniquely identifies the specific bus transaction in relation to other bus transactions that are outstanding between the requestor and a responder. The byte count 556 contains a count of the number of bytes being sent.
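The three requestor fields above can be illustrated with a small sketch. Assuming the conventional PCI layout — an 8-bit bus number, a 5-bit device number, and a 3-bit function number packed into a 16-bit identifier, which is an illustration rather than text taken from this specification — hypothetical pack/unpack helpers look like:

```python
def pack_bdf(bus: int, device: int, function: int) -> int:
    """Pack requestor bus/device/function into a 16-bit identifier.
    Assumes the conventional 8/5/3-bit PCI split (illustrative)."""
    assert 0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8
    return (bus << 8) | (device << 3) | function

def unpack_bdf(bdf: int) -> tuple:
    """Recover (bus, device, function) from a packed 16-bit BDF."""
    return ((bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x7)
```

The pair round-trips, so a responder can always recover the requestor identity carried in the attribute phase.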
Turning now to
PCI-E bus transaction 600 shows six phases: frame phase 608; sequence number 612; header 664; data phase 668; cyclical redundancy check (CRC) 672; and frame phase 680. PCI-E header 664 contains a set of fields defined in the PCI-Express specification, including format 620, type 624, requestor ID 628, reserved 632, traffic class 636, address/routing 640, length 644, attribute 648, tag 652, reserved 656, byte enables 660. Specifically, the requestor identifier (ID) field 628 contains three fields that define the bus transaction requester, namely: requester bus number 684, requestor device number 688, and requestor function number 692. The PCI-E header also contains tag 652, which uniquely identifies the specific bus transaction in relation to other bus transactions that are outstanding between the requester and a responder. The length field 644 contains a count of the number of bytes being sent.
With reference now to
The functions performed at the super-privileged physical resource allocation level 700 include but are not limited to: PCI family adapter queries, creation, modification and deletion of virtual adapters, submission and retrieval of work, reset and recovery of the physical adapter, and allocation of physical resources to a virtual adapter instance. The PCI family adapter queries are used to determine, for example, the physical adapter type (e.g. Fibre Channel, Ethernet, iSCSI, parallel SCSI), the functions supported on the physical adapter, and the number of virtual adapters supported by the PCI family adapter. The LPAR manager performs the physical adapter resource management 704 functions associated with super-privileged physical resource allocation level 700. However, the LPAR manager may use a system image, for example an I/O hosting partition, to perform the physical adapter resource management 704 functions.
Note that the term system image in this document refers to an instance of an operating system. Typically multiple operating system instances run on a host server and share resources such as memory and I/O adapters.
The functions performed at the privileged virtual resource allocation level 708 include, for example, virtual adapter queries, allocation and initialization of virtual adapter resources, reset and recovery of virtual adapter resources, submission and retrieval of work through virtual adapter resources, and, for virtual adapters that support offload services, allocation and assignment of virtual adapter resources to a middleware process or thread instance. The virtual adapter queries are used to determine: the virtual adapter type (e.g. Fibre Channel, Ethernet, iSCSI, parallel SCSI) and the functions supported on the virtual adapter. A system image performs the privileged virtual adapter resource management 712 functions associated with virtual resource allocation level 708.
Finally, the functions performed at the non-privileged level 716 include, for example, query of virtual adapter resources that have been assigned to software running at the non-privileged level 716 and submission and retrieval of work through virtual adapter resources that have been assigned to software running at the non-privileged level 716. An application performs the virtual adapter access library 720 functions associated with non-privileged level 716.
With reference now to
If the processor, I/O hub, or I/O bridge 800 uses the same bus number, device number, and function number for all transaction initiators, then when a software component initiates a PCI-X or PCI-E bus transaction, such as host to adapter PCI-X or PCI-E bus transaction 812, the processor, I/O hub, or I/O bridge 800 places the processor, I/O hub, or I/O bridge's bus number in the PCI-X or PCI-E bus transaction's requestor bus number field 820, such as requestor bus number 544 field of the PCI-X transaction shown in
If the processor, I/O hub, or I/O bridge 800 uses a different bus number, device number, and function number for each transaction initiator, then the processor, I/O hub, or I/O bridge 800 assigns a bus number, device number, and function number to the transaction initiator. When a software component initiates a PCI-X or PCI-E bus transaction, such as host to adapter PCI-X or PCI-E bus transaction 812, the processor, I/O hub, or I/O bridge 800 places the software component's bus number in the PCI-X or PCI-E bus transaction's requester bus number 820 field, such as requestor bus number 544 field shown in
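The two policies above — a single BDF shared by all transaction initiators versus a BDF assigned per initiator — can be sketched as one selection function; the object model here is hypothetical:

```python
def requestor_bdf(bridge_bdf: int, per_initiator: bool,
                  assigned: dict, initiator: str) -> int:
    """Return the BDF placed in the requestor fields of an outgoing
    PCI-X or PCI-E bus transaction. With the shared policy every
    initiator uses the hub's or bridge's own BDF; with the
    per-initiator policy each software component uses the BDF
    previously assigned to it by the hub or bridge."""
    if per_initiator:
        return assigned[initiator]
    return bridge_bdf
```

A responder seeing per-initiator BDFs can therefore distinguish which software component originated each transaction, which the shared policy cannot express.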
Turning next to
Turning next to
The present invention allows a system image within a multiple system image virtual server to directly expose a portion, or all, of the system image's system memory to a shared I/O adapter without having to go through a trusted component, such as an LPAR manager or Hypervisor.
For the purpose of illustration two representative embodiments are described herein. In one representative embodiment, described in
With reference next to
A system image, such as System Image A 1196 depicted in
The host depicted in
For example, in
Using the mechanisms depicted in
A specific record in protection table 1200 is accessed using key 1204, such as a local key (L_KEY) for InfiniBand adapters, or a steering tag (STag) for iWARP adapters. Protection table 1200 comprises at least one record, where each record comprises access controls 1208, protection domain 1212, key instance 1216, window reference count 1220, Physical Address Translation (PAT) size 1224, page size 1228, First Byte Offset (FBO) 1232, virtual address 1236, length 1240, and PAT pointer 1244. PAT pointer 1244 points to physical address table 1248.
Access controls 1208 typically contains access information about a physical address table such as whether the memory referenced by the physical address table is valid or not, whether the memory can be read or written to, and if so whether local or remote access is permitted, and the type of memory, i.e. shared, non-shared or memory window.
Protection domain 1212 associates a memory area with a queue. That is, the context used to maintain the state of the queue, and the address protection table entry used to maintain the state of the memory area, must both have the same protection domain number. Key instance 1216 provides information on the current instance of the key. Window reference count 1220 provides information as to how many windows are currently referencing the memory. PAT size 1224 provides information on the size of physical address table 1248.
Page size 1228 provides information on the size of the memory page. FBO 1232 provides information on the first byte offset into the memory, which is used by iWARP or InfiniBand adapters to reference the first byte of memory that is registered using iWARP or InfiniBand (respectively) Block Mode I/O physical buffer types.
Length 1240 provides information on the length of the memory because a memory area is typically specified using a starting address and a length.
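The record layout described above can be summarized in a short sketch; the field names follow the description, while the types and the dict-based table are illustrative assumptions rather than the adapter's actual in-memory format:

```python
from dataclasses import dataclass

@dataclass
class ProtectionRecord:
    access_controls: int    # valid / read / write / local-remote / memory type
    protection_domain: int  # must match the queue's protection domain
    key_instance: int       # current instance of the key
    window_ref_count: int   # windows currently referencing the memory
    pat_size: int           # size of the physical address table
    page_size: int          # memory page size
    fbo: int                # first byte offset into the memory
    virtual_address: int    # starting address of the memory area
    length: int             # length of the memory area
    pat: list               # physical address table (the "PAT pointer")

def lookup_record(table: dict, key: int) -> ProtectionRecord:
    """Access a specific record using a key: an L_KEY for InfiniBand
    adapters or an STag for iWARP adapters."""
    return table[key]
```

Because a memory area is specified by a starting address and a length, the record carries both, together with the PAT that resolves the area to physical buffers.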
The process depicted in
The LPAR manager next translates the memory addresses, which can be either virtual or physical addresses, into real addresses and PCI bus addresses in 1310, adds an entry in the ATPT in 1312, and provides the System Image with the memory address translation in 1314. That is, for virtual addresses that were supplied by the System Image, it provides the virtual-address-to-PCI-bus-address translation; for physical addresses that were supplied by the System Image, it provides the physical-address-to-PCI-bus-address translation. After step 1314 completes, the operation ends.
In the event of an error, such as when the LPAR manager determines that the System Image does not own the memory it wants to pin in 1304 or that the ATPT does not have an entry available in 1306, then the LPAR manager in 1316 creates an error record, brings down the System Image, and the operation ends.
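The pin flow above — ownership check 1304, ATPT availability check 1306, translation 1310, table update 1312, return of the translation 1314, and error path 1316 — can be sketched with plain data structures; everything here is a hypothetical model, not the actual LPAR manager interface:

```python
def pin_memory(owned: set, atpt: dict, atpt_capacity: int,
               addrs: list, translate) -> dict:
    """Pin a list of memory addresses for a system image.
    Raises on the error path (1316); otherwise returns the
    address-to-PCI-bus-address translation (1314)."""
    if any(a not in owned for a in addrs):            # check 1304
        raise PermissionError("system image does not own this memory")
    if len(atpt) + len(addrs) > atpt_capacity:        # check 1306
        raise RuntimeError("no ATPT entry available")
    mapping = {a: translate(a) for a in addrs}        # step 1310
    atpt.update(mapping)                              # step 1312
    return mapping                                    # step 1314
```

On the error path a real implementation would also create an error record and bring down the System Image, as described above.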
The operation begins when a system image performs a register memory operation in 1402. In 1404 the adapter checks to see if the adapter's ATPT has an entry available. If an entry is available in the adapter's ATPT, then in 1406 the adapter performs a register memory operation and the operation ends. If an entry in the adapter's ATPT is not available, an error record is created in 1408. The operation then ends.
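That short flow (1402-1408) amounts to a capacity check on the adapter's ATPT; a minimal sketch, with the table modeled as a list of registered regions:

```python
def register_memory(atpt: list, atpt_capacity: int, region: tuple) -> bool:
    """Adapter-side register-memory operation: succeed only if an
    ATPT entry is available (1404/1406); otherwise record an error
    (1408) and fail."""
    if len(atpt) >= atpt_capacity:
        print("error record: no ATPT entry available")
        return False
    atpt.append(region)
    return True
```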
Typically, one or more logical memory blocks (LMB) are associated or disassociated with a system image during a configuration event. A configuration event usually occurs infrequently. In contrast, memory within an LMB is typically pinned or unpinned frequently such that it is common for memory pinning or unpinning to occur millions of times a second on a high end server.
The operation begins when a system image performs an unpin operation in 1502. The LPAR manager unpins the memory addresses referenced in the unpin operation in 1504 and the operation ends.
Typically, memory pages can be accessed through four types of addresses: Virtual Addresses, Physical Addresses, Real Addresses, and PCI Bus Addresses.
A Virtual Address is the address a user application running in a System Image uses to access memory. Typically, the memory referenced by the Virtual Address is protected so that other user applications cannot access the memory.
A Physical Address refers to the address the system image uses to access memory. A Real Address is the address a system processor or memory controller uses to access memory. A PCI Bus Address is the address an I/O adapter uses to access memory.
Typically, on a system that does not support an LPAR manager (or Hypervisor), when an I/O adapter accesses memory, the System Image translates the Virtual Address to a Physical Address, the Physical Address to a Real Address, and finally the Real Address to a PCI Bus Address.
Typically, on a system that does support an LPAR Manager (or Hypervisor), when an I/O adapter accesses memory, the System Image translates the Virtual Address to a Physical Address, and then the LPAR manager (or Hypervisor) translates the Physical Address to a Real Address and then a PCI Bus Address.
Servers that provide I/O access protection use an I/O address translation and protection mechanism to determine if an I/O adapter is associated with a PCI Bus Address. If the adapter is associated with the PCI Bus Address, then the I/O address translation and protection mechanism is used to translate the PCI Bus Address into a Real Address. Otherwise an error occurs.
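The four address types and the two checks above can be sketched as table lookups; the dict-based tables are illustrative stand-ins for the page tables and the I/O address translation and protection mechanism:

```python
def translate_for_io(va: int, page_table: dict,
                     real_map: dict, pci_map: dict) -> int:
    """Virtual -> Physical -> Real -> PCI Bus. On a system without an
    LPAR manager the System Image performs all three steps; with an
    LPAR manager the last two are performed by the LPAR manager."""
    pa = page_table[va]   # Virtual Address -> Physical Address
    ra = real_map[pa]     # Physical Address -> Real Address
    return pci_map[ra]    # Real Address -> PCI Bus Address

def io_access(pci_to_real: dict, adapter_addrs: set, pci_addr: int) -> int:
    """Server-side protection: translate a PCI Bus Address back to a
    Real Address only if the adapter is associated with it; otherwise
    an error occurs."""
    if pci_addr not in adapter_addrs:
        raise PermissionError("adapter not associated with PCI Bus Address")
    return pci_to_real[pci_addr]
```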
The remainder of this discussion,
In
Except for the range tables, which the system image is prevented from accessing by the LPAR manager (or Hypervisor), the system image may utilize real addresses in all internal adapter structures, such as, for example, protection tables, translation tables, work queues, and work queue elements. In addition, the system image may use real addresses in the page-list provided in Fast Memory Registration operations. The adapter is thus made aware of the LMB structure, as well as the association of the particular LMB with a system image.
Using the system image ID and range table, the adapter may validate whether or not a real address the system image is attempting to expose or access is actually associated with that system image. Thus, the adapter is trusted to perform memory access validations to prevent unauthorized access to the system memory. Having the adapter validate memory access is thus faster and more efficient than having an LPAR manager validate memory access.
The adapter, such as virtual adapter 1614, is responsible for access control when performing I/O operations requested by the system image. The access control may include validating that access to the real address is authorized for the given system image, and validating access is authorized based on the system image ID and information in the range tables. The adapter is also responsible for: associating a resource to one or more PCI virtual ports and to one or more virtual downstream ports; performing the memory registrations requested by a system image; and performing I/O transactions associated with a system image in accordance with illustrative embodiments of the present invention.
Like the adapter virtualization approach described in
PCI Adapter 1631 associates with a host-side system image: one set of processing queues, such as processing queue 1604; either a verb memory address translation and protection table or a set of verb memory address translation and protection table entries, such as Verb Memory translation and protection table (TPT) 1612; one downstream virtual port, such as Virtual PCI Port 1606; and one upstream Virtual Adapter (PCI) ID (VAID), such as the bus, device, function number (BDF 1626). If the adapter supports out of user space access, as would be the case for an InfiniBand Host Channel Adapter or an RDMA enabled NIC, then the I/O operation used to initiate a work request may be validated by checking that the queue pair associated with the work request has the same protection domain as the memory region referenced by the data segment.
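The verb-style validation just described — the queue pair and the memory region referenced by a data segment must share a protection domain, and the segment must fall within the registered region — can be sketched as follows (the dict shapes are hypothetical):

```python
def validate_work_request(qp: dict, tpt: dict, key: int,
                          addr: int, length: int) -> bool:
    """Check a work request's data segment against TPT-style state:
    the entry selected by the key must exist, carry the queue pair's
    protection domain, and cover the referenced byte range."""
    entry = tpt.get(key)
    if entry is None or entry["pd"] != qp["pd"]:
        return False
    return entry["va"] <= addr and addr + length <= entry["va"] + entry["len"]
```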
Verb Mem TPT 1612 is a memory translation and protection table that may be implemented in adapters capable of supporting memory registration, such as InfiniBand and iWARP-style adapters. Verb Mem TPT 1612 is used by the adapter to validate access to memory on the host. For example, when the system image wants the adapter to access a memory region of the system image, the system image passes to the adapter a PCI Bus address, a length, and a key, such as an L_Key for an InfiniBand adapter or an STag for an iWARP adapter. The key is used to access an entry in Verb Mem TPT 1612.
Verb Mem TPT 1612 controls access to memory regions on the host by using a set of variables, such as, for example, local read, local write, remote read, remote write. Verb Mem TPT 1612 also comprises a protection domain field, which is used to associate an entry in the table with a queue. As will be described further in
In this illustrative embodiment, virtual adapter 1614 is also shown to contain range table 1611. Range table 1611 is used to determine the LMB addresses that system image 1696 may use. For instance, as shown in
The LPAR manager, or an intermediary, sets the PCI Bus Addresses equal to the Real Addresses and provides the PCI Bus addresses to the system image associated with the allocated LMBs. The LPAR manager is responsible for updating the internal adapter's Logical Memory Block structure, or range table 1611, and the System Image ID field in the Verb Mem TPT 1612, which together are used for memory access validation. The system image is responsible for updating all other internal adapter structures.
A specific record in protection table 1700 is accessed using key 1704, such as a local key (L_KEY) for InfiniBand adapters, or a steering tag (STag) for iWARP adapters. Protection table 1700 comprises one or more records, where each record comprises access controls 1716, protection domain 1720, system image identifier (SI ID 1) 1724, key instance 1728, window reference count 1732, PAT size 1736, page size 1740, virtual address 1744, FBO 1748, length 1752, and PAT pointer 1756. All fields in a Protection Table record, such as protection table 1700, can be written and read by the System Image, except the System Image Identifier field, such as SI ID 1 1724. The System Image Identifier field, such as SI ID 1 1724, can only be read or written by the LPAR manager or by the PCI Adapter.
PAT pointer 1756 points to physical address table 1708, which in this example is a PCI bus address table. SI ID 1 1724 points to the Logical Memory Block (LMB) table, or range table, 1712 that is associated with a specific system image.
Access controls 1716 typically contains access information about a physical address table such as whether the memory referenced by the physical address table is valid or not, whether the memory can be read only or both read and written, and if so whether local or remote access is permitted, and the type of memory, i.e. shared, non-shared or memory window.
Protection domain 1720 associates a memory area with a queue protection domain number. Compared to previous implementations, the present invention adds a system image identifier, such as SI ID 1 1724, to each record in the protection table 1700 and uses SI ID 1 1724 to reference a range table, such as range table 1712, which is associated with SI ID 1.
Key instance 1728 provides information on the current instance of the key. Window reference count 1732 provides information as to how many windows are currently referencing the memory. PAT size 1736 provides information on the size of physical address table 1708.
Page size 1740 provides information on the size of the memory page. Virtual address 1744 provides the virtual address. FBO 1748 provides the first byte offset into the memory region.
Length 1752 provides information on the length of the memory. A memory area is typically specified using a starting address and a length.
PCI bus address table 1708 contains the addresses associated with a memory area, such as a memory region (iWARP) or memory window (InfiniBand), that can be directly accessed by the system image associated with the PCI bus address table. PCI bus address table 1708 contains one or more physical I/O buffers, and each physical I/O buffer is referenced by a PCI bus address 1758 and length 1762, or, if all physical buffers are the same size, by just PCI bus address 1758. PCI bus address 1758 typically contains a PCI bus address that the adapter will use to access system memory. In the present invention, the LPAR manager will have set the PCI bus address equal to the real address that the system memory controller can use to directly access system memory. Length 1762 contains the length of the allotted LMB, if multi-sized pages are supported.
Logical memory block (LMB) table 1712 contains one or more records, with each record comprising PCI bus address 1766 and length 1770. In the present invention, the LPAR manager sets the PCI bus address 1766 equal to the real memory address used by the system memory controller to access memory and therefore does not require any further translation at the host. Length 1770 contains the length of the LMB.
Typically, the allocation is performed when the system image is (a) initially booted or (b) reconfigured with additional resources. Typically, a trusted entity such as the Hypervisor or LPAR manager does the allocation.
The operation begins in 1802 when the trusted entity receives a request to allocate memory for the system image. In 1804, for each I/O adapter that has a range table, the trusted entity, such as an LPAR manager or Hypervisor, allocates a set of IB or iWARP style memory region or memory window entries, such as a set of Protection Table 1700 and PCI Bus Address Table 1708 records, for the System Image to use. The trusted entity, such as an LPAR manager or Hypervisor, also loads into each Protection Table 1700 record the System Image ID field, such as SI ID 11724, with the identifier of the System Image associated with the entry. The operation then ends.
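The allocation step (1802-1804) can be sketched as stamping the System Image ID into each reserved record — the one field the system image itself may not write. The adapter and record shapes here are a hypothetical model:

```python
def allocate_entries(adapters: list, si_id: int, count: int) -> None:
    """For each adapter that has a range table, reserve `count` free
    protection-table records and load the System Image ID field --
    writable only by the LPAR manager or the adapter -- with si_id."""
    for adapter in adapters:
        if not adapter["has_range_table"]:
            continue
        free = [r for r in adapter["ptab"] if r["si_id"] is None]
        for record in free[:count]:
            record["si_id"] = si_id
```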
Typically, one or more logical memory blocks (LMB) are associated or disassociated with a system image during a configuration event. A configuration event usually occurs infrequently. In contrast, memory within an LMB is typically pinned or unpinned frequently such that it is common for memory pinning or unpinning to occur millions of times a second on a high end server.
The operation begins in one of two ways. If the LPAR manager sets up range table entries when an LMB is associated with a System Image, then the operation begins when an LMB is associated with a system image in 1902. Next, a determination is made whether the system image has I/O adapters that support range tables in 1904. If the system image does not have I/O adapters that support range tables then the operation ends.
If the system image has I/O adapters that support range tables, then in 1906 the adapter range table is checked to see whether it has an entry available. If the adapter range table has an entry available, then in 1908 the LPAR manager translates the physical addresses into real addresses, which equal the PCI bus addresses. The LPAR manager in 1910 then makes an entry in the range table containing the PCI Bus Addresses and length, or the range (high and low) of PCI Bus Addresses. Finally, the LPAR manager returns the PCI bus addresses, which equal the real addresses, to the system image in 1912 and the operation ends.
If the LPAR manager sets up range table entries when a System Image requests memory to be pinned, then the operation begins when a system image performs a memory pin operation in 1920. In 1922, a check is made to ensure that the memory referenced in the memory pin operation is associated with the system image performing the memory pin. If in 1922 the memory referenced in the memory pin operation is not associated with the system image performing the memory pin then an error record is created in 1924 and the operation ends.
If in 1922 the memory referenced in the memory pin operation is associated with the system image performing the memory pin, then in 1926 the LPAR manager pins the memory addresses referenced in the memory pin operation. Next, a check is made in 1928 as to whether this is the first address of the LMB to be pinned. If in 1928 this is not the first address of the LMB to be pinned, then the operation ends successfully, because a previous pin request on an address within the LMB has already made the full LMB available to the adapter's range table for that System Image.
If in 1928 this is the first address of the LMB to be pinned, then in 1906 the adapter range table is checked to see whether it has an entry available. If the adapter range table has an entry available, then in 1908 the LPAR manager translates the physical addresses into real addresses, which equal the PCI bus addresses. The LPAR manager in 1910 then makes an entry in the range table containing the PCI Bus Addresses and length, or the range (high and low) of PCI Bus Addresses. Then, the LPAR manager returns the PCI bus addresses, which equal the real addresses, to the system image in 1912 and the operation ends.
If in 1906 the adapter's range table does not have an entry available, then an error record is created in 1924 and the operation ends.
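The pin path of steps 1920 through 1928 can be sketched as below. The data structures (a per-image set of LMB bases, a set of already-pinned LMBs, a list-based range table) and the fixed LMB size are hypothetical; the sketch captures the key property that only the first pin within an LMB consumes a range-table entry, while later pins in the same LMB succeed without touching the table.

```python
# Rough sketch of the pin path (steps 1920-1928) under assumed data
# structures. All names are illustrative, not the patent's actual API.

def pin(si_lmbs, pinned_lmbs, range_table, si_id, addr, lmb_size=0x100_0000):
    lmb = addr // lmb_size * lmb_size  # base address of the containing LMB
    # 1922: the referenced memory must belong to the pinning system image
    if lmb not in si_lmbs.get(si_id, set()):
        raise PermissionError("memory not associated with system image")  # 1924
    # 1926: pin the referenced addresses (modeled here as LMB-set membership)
    # 1928: only the first pin within the LMB consumes a range-table entry
    if lmb not in pinned_lmbs:
        slot = next((i for i, e in enumerate(range_table) if e is None), None)
        if slot is None:
            raise RuntimeError("no range-table entry available")  # 1924
        range_table[slot] = (lmb, lmb + lmb_size - 1)  # 1910
        pinned_lmbs.add(lmb)
    return addr  # 1912: real address equals PCI bus address

si_lmbs = {7: {0x0000_0000}}
pinned, table = set(), [None, None]
a = pin(si_lmbs, pinned, table, si_id=7, addr=0x10)
b = pin(si_lmbs, pinned, table, si_id=7, addr=0x20)  # same LMB, no new entry
```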
The operation begins when a System Image performs an unpin operation in 2002. Typically, the unpin operation is performed on the host server by the LPAR manager in order to destroy one or more previously registered memory ranges. The unpin may be an InfiniBand or iWARP (RDMA-enabled NIC) unpin.
The LPAR manager unpins, i.e. makes pageable, the real addresses associated with the memory in 2004. The LPAR manager then removes the associated entry for those real addresses in the adapter's range table in 2006. The operation then ends.
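A minimal sketch of the unpin path (steps 2002 through 2006) follows. Representing pinned memory as a set of (low, high) address pairs and the range table as a list are assumptions made purely for illustration.

```python
# Hypothetical sketch of steps 2002-2006: make the real addresses pageable
# again and remove the matching entry from the adapter's range table.

def unpin(range_table, pinned_ranges, pci_low, pci_high):
    # 2004: unpin, i.e. make pageable, the real addresses for this range
    pinned_ranges.discard((pci_low, pci_high))
    # 2006: remove the associated entry from the adapter's range table
    for i, entry in enumerate(range_table):
        if entry == (pci_low, pci_high):
            range_table[i] = None
            break

table = [(0x1000, 0x1FFF), (0x2000, 0x2FFF)]
pinned = {(0x1000, 0x1FFF), (0x2000, 0x2FFF)}
unpin(table, pinned, 0x1000, 0x1FFF)
```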
The operation begins when the adapter receives a request to access the system image's memory region in 2102. The adapter performs all appropriate memory and protection checks in 2104, such as IB or iWARP memory and protection checks. In 2106 the adapter looks in the Protection Table for the Range Table associated with the System Image, for example, by using the system image identifier (SI ID). In 2108, the adapter then determines whether the memory region in the access request is valid by determining whether the memory address in the access request is within the range of one of the entries in the adapter's Range Table.
If the memory address in the request is within the range of one of the entries in the adapter's Range table then the corresponding physical address is retrieved from the Physical Address table in 2110. In 2112, the requested memory is then accessed using the corresponding physical address, for example, by using the physical address as the PCI bus address.
If the memory address in the request is not within the range of one of the entries in the adapter's Range table, then an error record is created and the system image is brought down in 2114.
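The adapter-side check of steps 2102 through 2114 can be sketched as follows. Keying the protection table by SI ID and storing (low, high, physical base) triples per range are illustrative assumptions; on a miss, the sketch simply raises an error where the hardware would log an error record and bring down the system image.

```python
# Hypothetical sketch of steps 2106-2114: find the requesting system
# image's range table by SI ID, validate the address against it, and
# translate to a physical (PCI bus) address on a hit.

def access(protection_table, si_id, addr):
    ranges = protection_table.get(si_id)  # 2106: look up range table by SI ID
    if ranges is None:
        raise PermissionError("unknown system image")
    for low, high, phys_base in ranges:   # 2108: range check per entry
        if low <= addr <= high:
            # 2110-2112: retrieve physical address and access the memory
            return phys_base + (addr - low)
    # 2114: out of range -> error record; system image is brought down
    raise PermissionError("address outside registered ranges")

pt = {3: [(0x1000, 0x1FFF, 0x8000_0000)]}
phys = access(pt, si_id=3, addr=0x1234)
```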
The operation begins when an LMB is disassociated from a system image in 2202. Then, for each adapter with a range table, the LPAR manager destroys the range table entry associated with the system image in 2204 and the operation ends.
The invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk read-only memory (CD-ROM), compact disk read/write (CD-R/W) and DVD.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.