The present invention relates to a computer system that unifies plural servers and to a bus assignment method in the computer system. More specifically, it relates to bus assignment that keeps accesses to IO devices operating normally when a logical server is moved between physical servers.
As the performance of computer systems has improved, techniques that cut cost by consolidating processing distributed across plural servers into a single server have been put into practical use. Server partitioning is a known consolidation technique. It makes plural operating systems work on a single server by assigning one operating system to each of the divided partitions.
Server partitioning falls into two types. Physical partitioning assigns an operating system to the physical computer resources of a node unit that includes a processor and a memory. Logical partitioning virtualizes a physical processor and memory so that firmware called a hypervisor can generate an arbitrary number of logical partitions (logical processors) in a computer.
With physical partitioning, the operating system can use the physical computer resources exclusively, so it can attain high performance. However, increasing the number of physical partitions also requires increasing the physical computer resources, so the number of physical partitions has a physical limit.
In the logical partitioning system, each operating system (guest OS) executes on a logical processor that the hypervisor provides. Because the hypervisor maps plural logical partitions onto a physical processor, a partition can be divided into units finer than the node. A single physical processor can therefore execute plural logical partitions by switching among them in time sharing. This makes it possible to generate more logical partitions than there are physical processors and to execute them simultaneously.
Several methods are known for making more effective use of physical computer resources in a computer system that combines physical partitioning and logical partitioning. In one, logical partitions share an IO slot by switching the slot among the logical partitions in time sharing without degrading performance (JP-A No. 122640/2005). In another, a logical partition located on a certain physical partition is migrated to a different physical partition according to a user's policy (JP-A No. 244481/2006). In a third, plural servers share plural computer resources through a switch (Multi-Root I/O Virtualization and Sharing Specification Revision 1.0, 1.2.3. MR-IOV Topology pp. 21-23 (May 12, 2008)).
Logical partitioning with a hypervisor can also be implemented in two ways: with virtualization software alone, or with hardware handling part of the processing. The software-only method gives high flexibility in IO virtualization. This is because the virtualization software converts logical IO resource IDs into physical IO resource IDs, where an IO resource ID means the PCI bus number, MMIO address, IO address, and configuration address needed to access an IO device. However, the overhead of software processing lowers its performance.
The hardware-assisted method, on the other hand, converts the logical IO resource ID into the physical IO resource ID at the time of each IO access. It can therefore access the IO device transparently and performs better than the software-only method, but it is less flexible in IO virtualization.
As described above, the hardware-assisted virtualization method provides the IO device transparently to the guest OS. Its performance is therefore higher than that of the virtualization software method, but its flexibility in IO virtualization is lower. The reason is that migrating a logical partition, as described in JP-A No. 244481/2006 and elsewhere, creates a mismatch between the logical IO resource ID of the IO device that the guest OS uses and the physical IO resource ID.
For example, in a computer system that uses the hardware-assisted virtualization method, hot-plugging a new IO device causes the IO resource IDs of the IO devices to be reallocated. The IO resource ID of an IO device already in use may therefore change. In this case, the guest OS may keep using the IO resource ID from before the change in order to reach the IO device it originally used, and may access a different IO device instead. In the worst case, the computer system may fail to start at reboot because the IO resource ID of the boot device has changed.
Similarly, when logical partitions are live migrated in such a computer system, the IO configuration seen from the guest OS may differ before and after the migration. In this case, the IO resource ID that the guest OS used before the migration conflicts with the IO resource ID of another device already in use at the migration destination.
An object of the present invention is to take over an IO configuration without a mismatch between the logical IO resource ID of an IO device used by a guest OS in a virtual computer system and the physical IO resource ID. This allows IO accesses to work normally when a logical partition is migrated.
In order to solve the above-mentioned problem, the computer system according to the present invention includes: a server having an IO bridge; a switch that has a first IO bridge for connecting with the IO bridge of the server through a bus and plural second IO bridges for connecting with plural IO devices through a bus; and bus number assignment management means for assigning mutually different PCI bus numbers to the plural second IO bridges.
Moreover, the IO resource ID assignment method according to the present invention is an IO resource ID assignment method in the computer system that includes the server having the IO bridge and the switch that has the first IO bridge for connecting with the IO bridge of the server through the bus and the plural second IO bridges for connecting to the plural IO devices through the bus, and includes: a bus number assignment management step of, when assigning a PCI bus number as the IO resource ID, fixedly assigning the PCI bus number to the second IO bridges; and a step of referring to a relationship between the PCI bus numbers assigned in the bus number assignment management step and the second IO bridge and forming a virtual switch for connecting the server and the IO device together.
In a preferred example, the bus number assignment management means is configured as a bus number assignment management table for registering information for fixedly assigning the PCI bus number to the IO bridge in the switch.
Preferably, for every switch number that identifies a switch, the bus number assignment management table registers three values for each virtual bridge in the switch: an upstream PCI bus number, a downstream PCI bus number, and the downstream-most PCI bus number reachable from that IO bridge.
In a preferred example, the computer system has a PCI manager that sets up the contents of the bus number assignment management table, and a register provided in the switch that stores the table.
In particular, the computer system according to the present invention further has plural physical servers in addition to the above configuration. When a logical partition of one physical server is live migrated to another physical server, the computer system updates the switch VS (virtual switch) bridge table but not the bus number assignment management table. The logical partition migrated to the other physical server therefore takes over the PCI bus numbers it used before migration and continues to use the IO devices.
With the above configuration, the present invention realizes both improved performance and reliability through transparent access to IO devices and flexibility through live migration.
Hereafter, embodiments will be described with reference to the drawings.
This computer system is constructed with plural physical servers 101 to 103 and plural IO devices 501-1, 501-3 to 501-6, and 502-1 to 502-5 connected to plural switches 301, 302. Furthermore, a service processor (SVP) 400 is connected to the physical servers 101 to 103 and the switches 301, 302 so as to be able to communicate with them. A PCI manager 410 for managing a setup, alteration, etc. of virtual switch mechanisms 311, 312 is mounted on the SVP 400, and is connected to the switches 301, 302 so as to be able to communicate with them.
Incidentally, although the switch is in a single stage configuration in this example, it is also possible for the switch to be constructed in a multistage configuration. Moreover, the PCI manager 410 does not necessarily need to be inside the SVP 400, but can be disposed outside the SVP 400, being connected to the SVP 400 so as to be able to communicate with it.
Among the physical servers, the physical server 101 is partitioned by a hypervisor 121 into plural logical partitions (two in this example), which run guest OSs 111A and 111B, respectively. A north bridge 131 includes plural IO bridges 141A, 141B and is connected to the switches 301, 302 by buses 201A, 201B. The physical servers 102, 103 are configured similarly. Each has one guest OS 112A (113A) and a north bridge 132 (133) that is connected to the switch 301 (302) through plural IO bridges 142A, 142B (143A, 143B).
The switch 301 has: plural first IO bridges 321A-1 to 321A-3 connected to the physical servers 101 to 103 through the bus 201A etc.; second IO bridges 321B-1 to 321B-6 connected to the IO devices 501 through a bus 211; a management port 331A connected to the PCI manager 410 of the SVP 400; a control port 331B connected to a control port 332A of the switch 302; a register 341; and the virtual switch mechanism 311.
In addition, in the computer system, the IO devices 501-1 to 501-6 are connected to the second IO bridges 321B-1 to 321B-6 through slots (not illustrated) of the switch 301. The second IO bridges 321B-1 to 321B-6 correspond to the so-called slot positions.
The virtual switch mechanism 311 is set up or altered by the PCI manager 410 through the management port 331A to form a virtual switch. The register 341 stores a switch port table 600 (to be described later).
The switch 302 is not directly connected to the SVP 400, but is connected to the switch 301 through the control port 332A. Otherwise, the switch 302 has fundamentally the same configuration as the switch 301. The switches 301, 302 may differ in the number and types of IO devices connected to them.
For example, the entry for the bridge 322A-1 of the switch 302 indicates three things. Its connection is in the UP (upstream) direction, the connection destination is a host (the physical server 101), and the link destination is the bridge 141B of the physical server 101.
Next, the virtual switch formed with the virtual switch mechanism 311 etc. will be explained with reference to
In the illustrated example, three virtual switches are formed by each of the virtual switch mechanisms 311, 312. Virtual switches 351-1, 351-2, and 351-3 are formed in the switch 301; virtual switches 352-1, 352-2, and 352-3 are formed in the switch 302.
For example, the virtual switch 351-1 is constructed with the bridge 321A-1, to which the host is connected, and the bridges 321B-1 and 321B-3, to which the IO devices 501-1 and 501-3 are connected, respectively. The bridge 321B-2 is not used.
An administrator runs the PCI manager 410 by operating the SVP 400. The virtual switch is formed when the PCI manager 410 sets up the contents of a switch VS table 610 (
This switch VS table 610 registers the following fields:
- a switch number 611 identifying the target switch;
- a virtual switch number 612 indicating the number of a virtual switch formed in that switch;
- a valid field 613 indicating whether the virtual switch is valid;
- a start number 614 of the virtual bridges that form the virtual switch when it is valid (Yes);
- an entry number 615 indicating the number of virtual bridge entries.
For example, three virtual switches (VS#1 to #3) are set up in the switch 301, and all of them are valid (Yes). Their virtual bridge start numbers are VB#1, #8, and #15, and their entry numbers are 3, 3, and 2, respectively.
This switch VS bridge table 620 registers the following fields:
- a switch number 621 identifying the target switch;
- a valid field 623 indicating whether the virtual bridge with a virtual bridge number 622 in that switch is valid;
- a direction 624 indicating the direction of the virtual bridge when it is valid (Yes);
- a map field 625 indicating whether the virtual bridge is mapped to the IO bridge of a port number 626;
- the port number 626 identifying the port;
- a virtual hierarchy number 627 indicating the number of its virtual hierarchy when it is mapped.
For example, the switch 301 has 21 virtual bridges, VB#1 to #21, and all of them are in the valid (Yes) state. The 21 virtual bridges are divided into three groups of seven, one group for each of the three virtual switches (VS#1 to #3). The start virtual bridge of VS#1 is VB#1, that of VS#2 is VB#8, and that of VS#3 is VB#15. In the virtual switch VS#1, for example, VB#1 is mapped in the UP direction to port #321A-1, VB#2 is mapped in the DOWN direction to port #321B-1, and VB#4 is mapped in the DOWN direction to port #321B-3. The map fields of the other virtual bridges indicate “No”.
Therefore, the virtual switch VS#1 has three valid entries. Similarly, the virtual switch VS#2 has three valid entries, VB#8, VB#12, and VB#13, and the virtual switch VS#3 has two valid entries, VB#15 and VB#21.
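The relationship between the switch VS bridge table 620 and the switch VS table 610 can be checked with a small model. The following Python sketch rebuilds the bridge table of the switch 301 from the example and derives each virtual switch's start bridge and entry count. It assumes, as the example suggests, that virtual switch n owns the seven consecutive virtual bridges starting at 7(n-1)+1, and that the entry number counts the valid, mapped bridges. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

BRIDGES_PER_VS = 7   # 21 virtual bridges split into three groups of seven
NUM_BRIDGES = 21

@dataclass(frozen=True)
class VSBridgeEntry:
    """One row of a switch VS bridge table (simplified)."""
    switch: int
    vb: int                   # virtual bridge number
    valid: bool               # valid field
    mapped: bool              # map field
    direction: Optional[str]  # "UP" / "DOWN", if known
    port: Optional[str]       # mapped port, if known

# Mapped bridges of switch 301 from the example. Direction and port are stated
# only for VS#1; for VS#2 and VS#3 they are left as None in this sketch.
MAPPED: Dict[int, Tuple[Optional[str], Optional[str]]] = {
    1: ("UP", "321A-1"),
    2: ("DOWN", "321B-1"),
    4: ("DOWN", "321B-3"),
    8: (None, None), 12: (None, None), 13: (None, None),
    15: (None, None), 21: (None, None),
}

def build_bridge_table(switch: int = 301) -> List[VSBridgeEntry]:
    table = []
    for vb in range(1, NUM_BRIDGES + 1):
        direction, port = MAPPED.get(vb, (None, None))
        table.append(VSBridgeEntry(switch, vb, valid=True, mapped=vb in MAPPED,
                                   direction=direction, port=port))
    return table

def derive_vs_table(bridges: List[VSBridgeEntry]) -> Dict[int, Tuple[int, int]]:
    """Map VS number -> (start virtual bridge, number of valid mapped entries)."""
    vs_table: Dict[int, Tuple[int, int]] = {}
    for e in bridges:
        vs = (e.vb - 1) // BRIDGES_PER_VS + 1
        start = (vs - 1) * BRIDGES_PER_VS + 1
        _, count = vs_table.get(vs, (start, 0))
        vs_table[vs] = (start, count + int(e.valid and e.mapped))
    return vs_table

print(derive_vs_table(build_bridge_table()))
```

The derived values match the switch VS table in the example: start bridges VB#1, #8, #15 with 3, 3, and 2 entries.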
For every switch number 631 that identifies a switch, the bus number assignment management table 630 registers a primary bus number 633, a secondary bus number 634, and a subordinate bus number 635. These are the bus numbers assigned to the virtual bridge with a virtual bridge number 632 in that switch.
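In PCI terms, a bridge's primary, secondary, and subordinate bus numbers determine which bus numbers it forwards downstream. The following Python sketch models table 630 for one virtual switch under a hypothetical fixed layout. In this layout, bus 1 is the host link, bus 2 is the internal bus of the virtual switch, and the downstream bridge for slot k (port 321B-k) always owns bus k+2. This layout is an assumption chosen to agree with the example bus numbers of the second embodiment, not a rule stated in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class BusAssignment:
    """One row of a bus number assignment management table."""
    switch: int
    vb: int            # virtual bridge number
    primary: int       # bus on the upstream side of the bridge
    secondary: int     # bus immediately downstream of the bridge
    subordinate: int   # highest bus number reachable below the bridge

    def claims(self, bus: int) -> bool:
        # A PCI-to-PCI bridge forwards configuration accesses for any bus
        # number in [secondary, subordinate] to its downstream side.
        return self.secondary <= bus <= self.subordinate

# Hypothetical fixed layout for VS#1 of switch 301: VB#1 is the upstream
# bridge, and VB#(1+k) is the downstream bridge for slot k.
TABLE: List[BusAssignment] = [BusAssignment(301, 1, primary=1, secondary=2, subordinate=8)]
TABLE += [BusAssignment(301, 1 + k, primary=2, secondary=k + 2, subordinate=k + 2)
          for k in range(1, 7)]

def downstream_bridge_for(bus: int) -> Optional[BusAssignment]:
    """Return the downstream bridge (primary bus 2) whose range covers `bus`."""
    for row in TABLE:
        if row.primary == 2 and row.claims(bus):
            return row
    return None

print(downstream_bridge_for(5))
```

Because every row is fixed in advance, a given bus number always routes to the same slot, whether or not the other slots are populated.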
Next, as the operation of a first embodiment, a Hot Plug operation of an IO device will be described. It is assumed that the computer system has the initial configuration shown in
First, the problems of a Hot Plug operation when the present invention is not applied will be explained. In the initial configuration of the computer system of this embodiment, the EFI/BIOS assigns IO resource IDs in sequence to the first IO bridge 321A-1 and the second IO bridges 321B-1 and 321B-3 that are connected from the IO bridge 141A in the north bridge 131 of
In this state, an IO device 501-2 is Hot Plugged in the computer system, as shown in
By this Hot Plugging, the switch port table 600, the switch VS table 610, and the switch VS bridge table 620 are updated, as in
As a result, the bridge 321B-2 is added to the virtual switch 351-1, as shown in
In the virtual switch of
Therefore, if the guest OSs 111A, 111B in the physical server 101 use the IO resource ID from before the Hot Plugging to access the IO device 501-3, they will access the IO device 501-2 instead. Depending on how the IO device is used, the computer system may keep working with OS support even when the access fails. However, if the boot device cannot be accessed at reboot, the computer system can no longer start, which is a fatal problem.
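The failure can be reproduced with a simplified model in which bus numbers are handed out in slot order and empty slots are skipped. The numbering mirrors the sequential assignment described in the second embodiment: bus 1 for the host link, bus 2 for the virtual switch, and bus 3 for the first device. The Python sketch below is an assumption-level model, and the function name is hypothetical.

```python
from typing import Dict, Iterable

def enumerate_sequential(occupied_slots: Iterable[int], first_bus: int = 3) -> Dict[int, int]:
    """Assign bus numbers in slot order, skipping empty slots.

    Simplified model of sequential enumeration below one virtual switch:
    bus 1 is the host link and bus 2 the virtual switch's internal bus,
    so the first populated slot receives bus 3.
    """
    buses = {}
    next_bus = first_bus
    for slot in sorted(occupied_slots):
        buses[slot] = next_bus
        next_bus += 1
    return buses

before = enumerate_sequential([1, 3])     # IO devices 501-1 and 501-3
after = enumerate_sequential([1, 2, 3])   # IO device 501-2 hot plugged into slot 2

stale_bus = before[3]                     # bus the guest OS still holds for 501-3
owner_now = next(s for s, b in after.items() if b == stale_bus)
print(before, after, owner_now)
```

After re-enumeration, the bus number that the guest OS remembers for the IO device 501-3 belongs to slot 2, that is, to the newly inserted IO device 501-2.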
Now, a Hot Plug operation in the case where the present invention is applied will be explained.
First, in this embodiment, before the EFI/BIOS assigns the IO resource IDs, the PCI manager 410 of
The information of the bus number assignment management table 630 being set-up of
Next, the IO device 501-2 is Hot Plugged as shown in the computer system configuration of
With the new IO device 501-2 of
The assigned IO resource IDs were established uniquely so that they do not overlap those of other IO bridges and virtual bridges. The IO resource IDs of the second IO bridges 321B-1 and 321B-3, to which the IO devices 501-1 and 501-3 used before the Hot Plugging are connected, therefore do not change, and those IO devices remain accessible.
In this way, because this embodiment assigns IO resource IDs fixedly, Hot Plugging an IO device during system operation does not affect accesses to other IO devices, and the computer system can be configured flexibly.
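Replacing the sequential rule with a fixed per-slot rule removes the dependency on which slots are populated. In the Python sketch below, each second IO bridge 321B-k is assumed to own bus k+2 permanently. This is a hypothetical rule, chosen to agree with the second embodiment, where the buses 211-4 and 211-5 receive PCI bus numbers 6 and 7. In the actual system, the values would come from the bus number assignment management table 630 through the switch register.

```python
from typing import Dict, Iterable

NUM_SLOTS = 6  # second IO bridges 321B-1 to 321B-6

def fixed_bus(slot: int) -> int:
    """Hypothetical fixed rule: slot k always owns bus k + 2."""
    if not 1 <= slot <= NUM_SLOTS:
        raise ValueError(f"no such slot: {slot}")
    return slot + 2

def enumerate_fixed(occupied_slots: Iterable[int]) -> Dict[int, int]:
    # The bus number depends only on the slot, never on which other slots are populated.
    return {slot: fixed_bus(slot) for slot in sorted(occupied_slots)}

before = enumerate_fixed([1, 3])      # IO devices 501-1 and 501-3
after = enumerate_fixed([1, 2, 3])    # IO device 501-2 hot plugged into slot 2
unchanged = all(after[s] == b for s, b in before.items())
print(before, after, unchanged)
```

The newly inserted device takes the bus number reserved for its slot, and the devices already in use keep theirs.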
A second embodiment relates to bus assignment for IO devices during live migration of logical partitions. Like the first embodiment, the computer system in the second embodiment has the initial configuration shown in
First, a problem on the live migration of the logical partitions when the present invention is not applied will be explained.
In the initial configuration of the computer system of this embodiment, the EFI/BIOS assigns IO resource IDs in sequence to the first IO bridge 321A-1 and the second IO bridges 321B-1 and 321B-3 that are connected from the IO bridge 141A in the north bridge 131 of
Similarly, when MMIO space is assigned to the IO devices in sequence, an MMIO address 711-0 is assigned to the IO device 501-1 in an MMIO space 701 of the physical server 101 of
Further, when PCI bus numbers are assigned from the physical server 102, the bus 202A is assigned PCI bus number No. 1, the virtual switch 351-2 No. 2, the bus 211-4 No. 3, and the bus 211-5 No. 4. Similarly, when MMIO space is assigned to the IO devices in sequence, in an MMIO space 702 of the physical server 102 of
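When each server enumerates its own hierarchy sequentially, the same small bus numbers are reused for unrelated devices. The Python sketch below uses the numbers stated for the physical server 102. For the physical server 101, it assumes the analogous sequence (buses 3 and 4 for the IO devices 501-1 and 501-3), because those values are not spelled out above. The sketch shows that bus numbers 3 and 4 name different IO devices on the two servers, which is the collision a live migrated guest OS runs into.

```python
from typing import Dict, List

def sequential_buses(devices: List[str], first_bus: int = 3) -> Dict[str, int]:
    """Simplified per-server sequential assignment: bus 1 = host link,
    bus 2 = virtual switch, then one bus per device in slot order."""
    return {dev: first_bus + i for i, dev in enumerate(devices)}

server_101 = sequential_buses(["501-1", "501-3"])  # behind virtual switch 351-1 (assumed)
server_102 = sequential_buses(["501-4", "501-5"])  # behind virtual switch 351-2 (as stated)

# Bus numbers that denote different devices on the two servers.
clashes = sorted(set(server_101.values()) & set(server_102.values()))
print(server_101, server_102, clashes)
```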
In this state, as shown in the computer system of
As a result, since a guest OS 112B that is live migrated from the physical server 101 to the physical server 102 in
The second embodiment eliminates this problem. First, in this embodiment, before the EFI/BIOS assigns the IO resource IDs, the PCI manager 410 (
The information of the bus number assignment management table 630 that is set up is reflected on the EFI/BIOS side through the registers 341, 342 in the switches 301, 302 of
Next, a fixed MMIO space is assigned to each IO device using the fixedly assigned PCI bus numbers.
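One way to derive a fixed MMIO space from a fixed bus number is to reserve one equally sized window per bus number. The base address and window size in the following Python sketch are hypothetical. The text states only that the MMIO address follows the fixed PCI bus number, for example MMIO address 712-6 for PCI bus number No. 6.

```python
MMIO_BASE = 0xC000_0000     # hypothetical start of the MMIO window pool
MMIO_WINDOW = 0x0010_0000   # hypothetical 1 MiB window per PCI bus number

def fixed_mmio_window(bus: int) -> range:
    """Return the MMIO address range reserved for the device on `bus`.

    Because the bus number is fixed per slot, the window is fixed too.
    """
    start = MMIO_BASE + bus * MMIO_WINDOW
    return range(start, start + MMIO_WINDOW)

for bus in (6, 7):  # buses 211-4 and 211-5 in the second embodiment
    w = fixed_mmio_window(bus)
    print(bus, hex(w.start), hex(w.stop - 1))
```

Since the windows are indexed by bus number, two devices with different fixed bus numbers can never receive overlapping MMIO ranges, on the same server or across servers.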
When the fixed MMIO space is assigned to the IO device based on the flowchart of
Furthermore, when PCI bus numbers are assigned from the physical server 102, the bus 202A is assigned PCI bus number No. 1, the virtual switch 351-2 No. 2, the bus 211-4 No. 6, and the bus 211-5 No. 7. Similarly, when the fixed MMIO space is assigned to the IO devices in sequence, an MMIO address 712-6 (PCI bus number No. 6) is assigned to the IO device 501-4 and an MMIO address 712-7 (PCI bus number No. 7) to the IO device 501-5 in the MMIO space 702 of the physical server 102 of
Next, as shown in
Thus, since the guest OS 112B that is live migrated from the physical server 101 to the physical server 102 in
By fixedly assigning the IO resource IDs as in the embodiment described above, logical partitions can be live migrated during system operation even when the IO device is provided transparently to the guest OS. The system thereby attains both improved performance and reliability through transparent access and flexibility through live migration.
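The migration behavior summarized earlier can be sketched as follows: only the switch VS bridge table is updated, not the bus number assignment management table. The Python model below reduces the VS bridge table to a map from slot to owning virtual switch and reuses the hypothetical rule that slot k owns bus k+2. It also assumes that the migrated partition's IO devices are remapped into the virtual switch 351-2 that serves the physical server 102. All names are illustrative.

```python
from typing import Dict, List

# Fixed bus numbers per slot (bus number assignment management table).
# Hypothetical rule consistent with the example: slot k -> bus k + 2.
BUS_OF_SLOT: Dict[int, int] = {k: k + 2 for k in range(1, 7)}

# Switch VS bridge table, reduced to "which virtual switch owns which slot".
vs_of_slot: Dict[int, str] = {1: "351-1", 3: "351-1", 4: "351-2", 5: "351-2"}

def buses_seen_by(vs: str) -> Dict[int, int]:
    """Slot -> bus number for every slot currently mapped into `vs`."""
    return {s: BUS_OF_SLOT[s] for s, v in vs_of_slot.items() if v == vs}

def live_migrate(slots: List[int], dest_vs: str) -> None:
    """Remap the migrating partition's slots to the destination virtual switch.

    Only the VS bridge mapping changes; BUS_OF_SLOT is left untouched, so the
    guest OS keeps using the bus numbers it had before migration.
    """
    for s in slots:
        vs_of_slot[s] = dest_vs

before = buses_seen_by("351-1")
live_migrate([1, 3], "351-2")   # guest moves from server 101 to server 102
after = buses_seen_by("351-2")
print(before, after)
```

After migration, the IO devices 501-1 and 501-3 keep buses 3 and 5. These do not collide with buses 6 and 7, which the IO devices 501-4 and 501-5 already use at the destination.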
Number | Date | Country | Kind |
---|---|---|---|
2008-201983 | Aug 2008 | JP | national |
This application is a continuation of U.S. patent application Ser. No. 12/486,927, filed Jun. 18, 2009, now allowed, which claims priority from Japanese application JP 2008-201983 filed on Aug. 5, 2008, the contents of which are hereby incorporated by reference into this application.
Number | Name | Date | Kind |
---|---|---|---|
5018133 | Tsukakoshi et al. | May 1991 | A |
5542055 | Amini et al. | Jul 1996 | A |
5857086 | Horan et al. | Jan 1999 | A |
5859989 | Olarig et al. | Jan 1999 | A |
5933614 | Tavallaei et al. | Aug 1999 | A |
5974474 | Furner et al. | Oct 1999 | A |
6094700 | Deschepper et al. | Jul 2000 | A |
6098114 | McDonald et al. | Aug 2000 | A |
6189050 | Sakarda | Feb 2001 | B1 |
6233634 | Clark et al. | May 2001 | B1 |
6332180 | Kauffman et al. | Dec 2001 | B1 |
6397268 | Cepulis | May 2002 | B1 |
6418492 | Papa et al. | Jul 2002 | B1 |
6430626 | Witkowski et al. | Aug 2002 | B1 |
6542953 | Porterfield | Apr 2003 | B2 |
6557068 | Riley et al. | Apr 2003 | B2 |
6594722 | Willke, II et al. | Jul 2003 | B1 |
6625673 | Dickey et al. | Sep 2003 | B1 |
6636904 | Fry et al. | Oct 2003 | B2 |
6647453 | Duncan et al. | Nov 2003 | B1 |
6662242 | Holm et al. | Dec 2003 | B2 |
6665759 | Dawkins et al. | Dec 2003 | B2 |
6668299 | Kagan et al. | Dec 2003 | B1 |
6732067 | Powderly | May 2004 | B1 |
6748478 | Burke et al. | Jun 2004 | B1 |
6785892 | Miller et al. | Aug 2004 | B1 |
6820149 | Moy | Nov 2004 | B2 |
6823418 | Langendorf et al. | Nov 2004 | B2 |
6865618 | Hewitt et al. | Mar 2005 | B1 |
6985990 | Bronson et al. | Jan 2006 | B2 |
7219183 | Pettey et al. | May 2007 | B2 |
7308551 | Arndt et al. | Dec 2007 | B2 |
7334071 | Onufryk et al. | Feb 2008 | B2 |
7363404 | Boyd et al. | Apr 2008 | B2 |
7366798 | Nordstrom et al. | Apr 2008 | B2 |
7809977 | Takamoto | Oct 2010 | B2 |
7890669 | Uehara et al. | Feb 2011 | B2 |
7991839 | Freimuth et al. | Aug 2011 | B2 |
8051254 | Suzuki | Nov 2011 | B2 |
8352665 | Nakayama et al. | Jan 2013 | B2 |
20020016891 | Noel et al. | Feb 2002 | A1 |
20020052914 | Zalewski et al. | May 2002 | A1 |
20020169918 | Piatetsky et al. | Nov 2002 | A1 |
20030005207 | Langendorf et al. | Jan 2003 | A1 |
20030012204 | Czeiger et al. | Jan 2003 | A1 |
20030037199 | Solomon et al. | Feb 2003 | A1 |
20040003063 | Ashok et al. | Jan 2004 | A1 |
20040039986 | Solomon et al. | Feb 2004 | A1 |
20040103210 | Fujii et al. | May 2004 | A1 |
20040260857 | Henderson et al. | Dec 2004 | A1 |
20050097384 | Uchara et al. | May 2005 | A1 |
20050182788 | Arndt et al. | Aug 2005 | A1 |
20050268065 | Awada et al. | Dec 2005 | A1 |
20060010278 | Dennis et al. | Jan 2006 | A1 |
20060072728 | Cope et al. | Apr 2006 | A1 |
20060106967 | Brocco et al. | May 2006 | A1 |
20060123178 | Lueck et al. | Jun 2006 | A1 |
20060195715 | Herington | Aug 2006 | A1 |
20060242353 | Torudbakken et al. | Oct 2006 | A1 |
20060242442 | Armstrong et al. | Oct 2006 | A1 |
20070165596 | Boyd et al. | Jul 2007 | A1 |
20080065826 | Recio et al. | Mar 2008 | A1 |
20080117907 | Hein | May 2008 | A1 |
20080162800 | Takashige et al. | Jul 2008 | A1 |
20080256327 | Jacobs et al. | Oct 2008 | A1 |
20090003245 | Wu et al. | Jan 2009 | A1 |
20090282300 | Heyrman et al. | Nov 2009 | A1 |
20090307456 | Patwari et al. | Dec 2009 | A1 |
20100312943 | Uehara et al. | Dec 2010 | A1 |
20110004688 | Matthews et al. | Jan 2011 | A1 |
20110029710 | Matthews et al. | Feb 2011 | A1 |
Number | Date | Country |
---|---|---|
10-178626 | Jun 1998 | JP |
10178626 | Jun 1998 | JP |
2001-337909 | Dec 2001 | JP |
2002-49572 | Feb 2002 | JP |
2002-149592 | May 2002 | JP |
2004-531838 | Oct 2004 | JP |
2005-122640 | May 2005 | JP |
2005-250975 | Sep 2005 | JP |
2006-004381 | Jan 2006 | JP |
2006-178553 | Jul 2006 | JP |
2006-244481 | Sep 2006 | JP |
2008-21252 | Jan 2008 | JP |
2008-146566 | Jun 2008 | JP |
2008-171413 | Jul 2008 | JP |
2009181418 | Aug 2009 | JP |
2009294828 | Dec 2009 | JP |
2010039760 | Feb 2010 | JP |
2010079816 | Apr 2010 | JP |
2011-171951 | Sep 2011 | JP |
2011171951 | Sep 2011 | JP |
2012150623 | Aug 2012 | JP |
Entry |
---|
Haojun Luo; Hui, J.Y.; Fayoumi, A.G., “A low power and delay multi-protocol switch with IO and network virtualization,” High Performance Switching and Routing (HPSR), 2013 IEEE 14th International Conference on, pp. 35-42, Jul. 8-11, 2013. |
Suzuki, J.; Hidaka, Y.; Higuchi, J.; Baba, T.; Kami, N.; Yoshikawa, T., “Multi-root Share of Single-Root I/O Virtualization (SR-IOV) Compliant PCI Express Device,” High Performance Interconnects (HOTI), 2010 IEEE 18th Annual Symposium on, pp. 25-31, Aug. 18-20, 2010. |
Multi-Root I/O Virtualization and Sharing Specification Revision 1.0, 1.2.3. MR-IOV Topology, May 12, 2008, pp. 21-23. |
“NN76043577: Error Recovery from Controller Failures in a Virtually Addressed Mass Storage System”, Apr. 1, 1976, IBM, IBM Technical Disclosure Bulletin, vol. 18, Iss. 11, pp. 3577-3578. |
“NN76043579: Generating Logical Unit Addresses Based Upon Interruption Signals”, Apr. 1, 1976, IBM, IBM Technical Disclosure Bulletin, vol. 18, Iss. 11, pp. 3579-3580. |
“NNRD428183: Mapping Large PCI Memory Windows on a Secondary Bus to Smaller Windows on a Primary Bus”, Dec. 1, 1999, IBM, IBM Technical Disclosure Bulletin, Iss. 428, pp. 1730. |
“NN9701133: Methodology for Software Peripheral Component Interconnect Frequency Selection”, Jan. 1, 1997, IBM, IBM Technical Disclosure Bulletin, vol. 40, Iss. 1, pp. 133-134. |
Number | Date | Country |
---|---|---|
20130124775 A1 | May 2013 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 12486927 | Jun 2009 | US |
Child | 13733515 | | US |