The present invention relates generally to the data processing field, and more particularly, relates to a method, system and computer program product for implementing enhanced error handling (EEH) for a hardware I/O adapter, such as a Single Root Input/Output Virtualization (SRIOV) adapter, in a virtualized system.
Single root input/output (I/O) virtualization (SRIOV) is a PCI standard, providing an adapter technology building block for I/O virtualization within the PCI-Express (PCIe) industry. SRIOV capability is a feature of many new PCIe adapters for Fibre Channel, Ethernet, InfiniBand, and Converged Network Adapters (CNA).
The SRIOV adapter has an I/O adapter virtualization architecture that allows a single I/O adapter to be concurrently shared across many different logical partitions. The sharing is done at a physical level, so that each logical partition has access to a slice of the physical adapter. The sharing is accomplished by partitioning the adapter into many different PCI functions, and then distributing access to those functions. The adapter is presented as one or more physical functions (PFs), which are controlling functions used, for example, for both configuration and I/O, and a set of virtual functions (VFs), used for I/O and limited configuration. Each VF represents a slice of the adapter capacity that can be assigned to a logical partition independently of other VFs. Each logical partition has a device driver for each of the VFs assigned to the logical partition.
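As a rough illustration of this partitioning, the following C sketch models a PF hosting several VFs, each assignable to a logical partition independently of the others; all structure and field names here are hypothetical, not taken from any actual hypervisor or adapter interface.

```c
/* Hypothetical model of an SRIOV PF and its VF slices.
 * All names are illustrative; real hypervisor data structures differ. */
#include <stdio.h>

#define MAX_VFS_PER_PF 4

struct sriov_vf {
    int vf_id;      /* unique VF identifier on the adapter */
    int lpar_id;    /* logical partition the VF is assigned to, -1 if none */
};

struct sriov_pf {
    int pf_id;                            /* controlling physical function */
    struct sriov_vf vfs[MAX_VFS_PER_PF];  /* slices of adapter capacity    */
    int num_vfs;
};

/* Assign a VF to a logical partition independently of other VFs. */
static int assign_vf(struct sriov_pf *pf, int vf_index, int lpar_id)
{
    if (vf_index < 0 || vf_index >= pf->num_vfs)
        return -1;
    pf->vfs[vf_index].lpar_id = lpar_id;
    return 0;
}

int main(void)
{
    struct sriov_pf pf = { .pf_id = 0, .num_vfs = 2,
                           .vfs = { { 0, -1 }, { 1, -1 } } };
    assign_vf(&pf, 0, 10);  /* VF 0 -> LPAR 10 */
    assign_vf(&pf, 1, 11);  /* VF 1 -> LPAR 11 */
    for (int i = 0; i < pf.num_vfs; i++)
        printf("VF %d -> LPAR %d\n", pf.vfs[i].vf_id, pf.vfs[i].lpar_id);
    return 0;
}
```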
With a shared hardware I/O adapter, such as the SRIOV adapter, error recovery of the shared adapter must now be coordinated among many partitions. Prior solutions required coordination only within a single partition, so a new solution is required.
A need exists for an effective mechanism to enable enhanced error handling (EEH) for a shared hardware I/O adapter, such as a Single Root Input/Output Virtualization (SRIOV) adapter, in a virtualized system. It is desirable that such a mechanism enable effective and efficient error handling operations covering the multiple partitions.
Principal aspects of the present invention are to provide a method, system and computer program product for implementing enhanced error handling (EEH) for a hardware I/O adapter, such as a Single Root Input/Output Virtualization (SRIOV) adapter, in a virtualized system. Other important aspects of the present invention are to provide such method, system and computer program product substantially without negative effects and that overcome many of the disadvantages of prior art arrangements.
In brief, a method, system and computer program product are provided for implementing enhanced error handling for a hardware I/O adapter, such as a Single Root Input/Output Virtualization (SRIOV) adapter, in a virtualized system. The hardware I/O adapter is partitioned into multiple endpoints, with each Partitionable Endpoint (PE) corresponding to a function, and there is an adapter PE associated with the entire adapter. The endpoints are managed both independently for actions limited in scope to a single function, and as a group for actions with the scope of the adapter. An error or failure of the adapter PE freezes the adapter PE and propagates to the VF PEs associated with the adapter, causing the VF PEs to be frozen. An adapter driver and VF device drivers are informed of the error, and start recovery. The hypervisor locks out the VF device drivers at key points enabling adapter recovery to successfully complete.
In accordance with features of the invention, a failure of a VF PE causes a failure of just that single PE, and is handled in isolation.
In accordance with features of the invention, the VF device driver learns of the error and starts recovery, but the VF driver is blocked in the initial recovery steps, with the VF PEs remaining frozen, until the adapter driver completes the adapter recovery. The adapter driver unfreezes the adapter PE, collects error data, and starts recovery and reinitialization while the VF PEs remain frozen. The adapter driver recovers the adapter, resets the VFs, and restores the previous configuration of the adapter. The adapter driver then gives permission for the unfreeze of the VF PEs, and the VF drivers commence recovery.
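The gating just described can be pictured with a minimal C sketch, in which the hypervisor simply refuses a VF unfreeze request until the adapter driver has granted permission; the function names and the single boolean gate are illustrative assumptions, not an actual hypervisor interface.

```c
/* Minimal sketch of recovery gating: the hypervisor blocks VF unfreeze
 * requests until the adapter driver grants permission. Hypothetical names. */
#include <stdbool.h>
#include <stdio.h>

enum pe_state { PE_ACTIVE, PE_FROZEN };

static enum pe_state adapter_pe = PE_FROZEN;   /* frozen by the error   */
static enum pe_state vf_pe[2]   = { PE_FROZEN, PE_FROZEN };
static bool vf_unfreeze_permitted = false;     /* set by adapter driver */

/* Called by a VF device driver during its recovery sequence. */
static bool hyp_unfreeze_vf(int vf)
{
    if (!vf_unfreeze_permitted)
        return false;          /* blocked: adapter recovery incomplete */
    vf_pe[vf] = PE_ACTIVE;
    return true;
}

/* Called by the adapter driver once adapter recovery has completed. */
static void adapter_driver_grant_unfreeze(void)
{
    adapter_pe = PE_ACTIVE;
    vf_unfreeze_permitted = true;
}

int main(void)
{
    printf("VF 0 unfreeze before grant: %d\n", hyp_unfreeze_vf(0)); /* 0 */
    adapter_driver_grant_unfreeze();
    printf("VF 0 unfreeze after grant:  %d\n", hyp_unfreeze_vf(0)); /* 1 */
    return 0;
}
```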
In accordance with features of the invention, no coordination is required between the adapter driver and VF device drivers, or among the VF device drivers. The VF device drivers can progress independently, and complete recovery independently.
In accordance with features of the invention, multiple levels of isolation are provided. A first level of isolation includes errors scoped to a single VF; in that case, only the single PE for the single VF is frozen and recovered. A second level of isolation includes at least one error scoped to the entire adapter; in that case, all PEs are frozen and recovered, including the adapter PE and each of the VF PEs.
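A minimal sketch of the two isolation levels follows, assuming a hypothetical table of PE freeze states; the scope enumeration and freeze logic are illustrative only.

```c
/* Sketch of the two isolation levels: a VF-scoped error freezes only
 * that VF's PE, while an adapter-scoped error freezes the adapter PE
 * and every VF PE. Names and layout are illustrative only. */
#include <stdio.h>

#define NUM_VF_PES 4
#define ADAPTER_PE NUM_VF_PES   /* index of the adapter-wide PE */

enum error_scope { SCOPE_SINGLE_VF, SCOPE_ADAPTER };

static int frozen[NUM_VF_PES + 1];  /* 1 = frozen, indexed by PE */

static void freeze_for_error(enum error_scope scope, int failing_vf)
{
    if (scope == SCOPE_SINGLE_VF) {
        frozen[failing_vf] = 1;              /* first level: one VF PE   */
    } else {
        frozen[ADAPTER_PE] = 1;              /* second level: adapter PE */
        for (int vf = 0; vf < NUM_VF_PES; vf++)
            frozen[vf] = 1;                  /* ...propagates to all VFs */
    }
}

int main(void)
{
    freeze_for_error(SCOPE_SINGLE_VF, 2);
    printf("after VF-scoped error: VF2=%d VF0=%d\n", frozen[2], frozen[0]);
    freeze_for_error(SCOPE_ADAPTER, -1);
    printf("after adapter error: adapter=%d VF0=%d\n",
           frozen[ADAPTER_PE], frozen[0]);
    return 0;
}
```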
The present invention together with the above and other objects and advantages may best be understood from the following detailed description of the preferred embodiments of the invention illustrated in the drawings.
In the following detailed description of embodiments of the invention, reference is made to the accompanying drawings, which illustrate example embodiments by which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In accordance with features of the invention, a method, system and computer program product are provided for implementing enhanced error handling (EEH) for a hardware I/O adapter, such as a Single Root Input/Output Virtualization (SRIOV) adapter, in a virtualized system.
Having reference now to the drawings, in FIG. 1 there is shown an example computer system generally designated by the reference character 100 for implementing enhanced error handling for a hardware I/O adapter, such as a Single Root Input/Output Virtualization (SRIOV) adapter 102, in accordance with an embodiment of the invention. Computer system 100 includes one or more processors 104 coupled by a processor host bridge (PHB) 106 to the hardware I/O adapter 102.
Computer system 100 includes a memory 108 and one or more logical partitions (LPARs) 110 (one shown) coupled by a system bus 111 to the processor 104 and the processor host bridge 106. Each operating system (OS) 112 resides in its own LPAR 110, with each LPAR allocated a part of a physical processor 104, an entire physical processor, or multiple physical processors from the computer 100. A VF device driver 114 is provided with the logical partition (LPAR) 110. A portion of the memory 108 is allocated to each LPAR 110. Computer system 100 includes a hypervisor 116 including a configuration mechanism 118. The hypervisor 116 is a part of the system firmware and manages the allocation of resources to each operating system 112 and LPAR 110.
As shown, a hardware management console (HMC) 120 used, for example, to manage system functions including logical partition configuration and hardware virtualization, is coupled to the hypervisor 116 via a service processor 122. Computer system 100 includes a physical function (PF) manager or PF adjunct 124 provided with the hypervisor 116. The PF adjunct 124 includes an adapter driver 128 to manage physical functions of the hardware I/O adapter 102. The hypervisor 116 uses the PF adjunct 124, for example, to configure physical functions (PFs) and virtual functions (VFs) of the hardware I/O adapter 102 based on configuration information provided by a system administrator via the hardware management console 120.
As shown, the hardware I/O adapter 102 includes, for example, a first physical function 130, a second physical function 132, a first port 134, and a second port 136. The hypervisor 116 using the PF adjunct 124 configures virtual functions based on the physical functions 130, 132 and associates virtual functions with one or more of the ports 134, 136 of the hardware I/O adapter 102.
For example, a first instance 140 of a first virtual function and an Mth instance 142 of the first virtual function, where M is greater than 1, are associated with the second port 136. As shown, a first instance 144 of a second virtual function and a Pth instance 146 of the second virtual function, where P is greater than 1, are associated with the first port 134. As shown, multiple instances of an Nth virtual function, where N is greater than 2, are provided, such as a first instance 148 of the Nth virtual function associated with the first port 134 and a Qth instance 150 of the Nth virtual function, where Q is greater than 1, associated with the second port 136.
Each instance of the first virtual function 140, 142, the second virtual function 144, 146, and the Nth virtual function 148, 150 is hosted by a physical function, such as one of the first physical function 130, the second physical function 132, and another physical function (not shown).
Each instance of the first virtual function 140, 142, the second virtual function 144, 146, and the Nth virtual function 148, 150 includes a respective virtual function identifier (ID), shown as ID 152, ID 154, ID 156, ID 158, ID 160, and ID 162. Each virtual function identifier uniquely identifies a particular virtual function that is hosted by the hardware I/O adapter 102. For example, when a message (not shown) is routed to a particular virtual function, the message includes the identifier associated with the particular virtual function.
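For illustration, routing a message by its virtual function identifier can be sketched as a table lookup, using the ID-to-instance pairing suggested above; the table layout and function names are hypothetical.

```c
/* Sketch of routing by virtual function identifier: a message carries
 * the ID of its target VF, and the adapter resolves the ID to the VF
 * instance. The table contents and names are hypothetical. */
#include <stdio.h>

struct vf_entry {
    int vf_id;            /* unique identifier, e.g. ID 152..162 above */
    const char *owner;    /* description of the VF instance            */
};

static const struct vf_entry vf_table[] = {
    { 152, "first virtual function, instance 1" },
    { 154, "first virtual function, instance M" },
    { 156, "second virtual function, instance 1" },
};

static const struct vf_entry *route_message(int vf_id)
{
    int n = (int)(sizeof vf_table / sizeof vf_table[0]);
    for (int i = 0; i < n; i++)
        if (vf_table[i].vf_id == vf_id)
            return &vf_table[i];
    return NULL;  /* no VF with that identifier on this adapter */
}

int main(void)
{
    const struct vf_entry *vf = route_message(154);
    printf("message routed to: %s\n", vf ? vf->owner : "(unknown VF)");
    return 0;
}
```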
In accordance with features of the invention, the PHB 106 with multiple PE support turns a single physical adapter into multiple independent PCI endpoints. These endpoints can then be managed independently by the different partitions. Each VF 140, 142, 144, 146, 148, 150 is a unique PE. Additionally, there is a PE associated with the entire adapter 102. A failure of a VF PE causes a failure of just that single PE, and advantageously is handled in isolation. A failure of the adapter PE propagates to the VF PEs associated with the adapter, causing them to fail as well.
A Partitionable Endpoint (PE) is a separately assignable I/O unit. That is, a PE is any part of an I/O subsystem that can be assigned to a logical partition independently of any other PE. Each PE has independent domains (addressing, error, state, and the like) to provide PE level error isolation, detection, and recovery.
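As a rough data-structure sketch, a PE might carry its own addressing, DMA, and error-state domains, as below; all field names are illustrative assumptions rather than an actual platform definition.

```c
/* Illustrative sketch of a Partitionable Endpoint with its independent
 * domains (addressing, error state, and the like), which is what allows
 * PE-level error isolation. Field names are hypothetical. */
#include <stdint.h>
#include <stdio.h>

enum pe_error_state { PE_OK, PE_FROZEN_ERR, PE_RECOVERING };

struct partitionable_endpoint {
    int      pe_number;      /* separately assignable I/O unit      */
    int      owning_lpar;    /* LPAR this PE is assigned to         */
    uint64_t mmio_base;      /* independent addressing domain...    */
    uint64_t mmio_size;
    uint64_t dma_base;       /* ...and independent DMA window       */
    uint64_t dma_size;
    enum pe_error_state err; /* independent error/state domain      */
};

int main(void)
{
    struct partitionable_endpoint vf_pe = {
        .pe_number = 7, .owning_lpar = 2,
        .mmio_base = 0x100000000ull, .mmio_size = 0x10000,
        .dma_base  = 0x0,            .dma_size  = 0x40000000,
        .err = PE_OK,
    };
    /* An error in this PE flips only its own error state; other PEs'
     * domains are untouched, which is the basis of PE-level isolation. */
    vf_pe.err = PE_FROZEN_ERR;
    printf("PE %d state %d\n", vf_pe.pe_number, (int)vf_pe.err);
    return 0;
}
```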
In accordance with features of the invention, the hypervisor 116 provides base support for managing the specific PEs. The adapter driver 128 associates the PEs with specific functions and partitions, sequences error recovery across the entire adapter 102, controls when the VFs 140, 142, 144, 146, 148, 150 are allowed to commence recovery, and manages the PFs 130, 132. The VF device driver 114 handles VF error recovery, for example using the same sequence as is followed for a non-shared adapter.
Computer system 100 is shown in simplified form sufficient for understanding the present invention. The illustrated computer system 100 is not intended to imply architectural or functional limitations. The present invention can be used with various hardware implementations and systems and various other internal hardware devices.
Referring to FIG. 2, there is shown an example system generally designated by the reference character 200 for implementing enhanced error handling for a hardware I/O adapter 202 in accordance with an embodiment of the invention.
System 200 includes a hypervisor 204 or other virtualization intermediary, used to enable multiple logical partitions to access virtual functions provided by hardware that includes the hardware I/O adapter 202. For example, as shown in FIG. 2, the hypervisor 204 enables a first logical partition 206, a second logical partition 208, and an Nth logical partition 210 to access a plurality of virtual functions 212, 214, 216, 218 provided by the hardware I/O adapter 202, including a first instance of a first virtual function 212, a second instance of the first virtual function 214, an Nth instance of the first virtual function 216, and a second virtual function 218, which are configured from physical functions 220, 222 of the hardware I/O adapter 202.
The physical functions 220, 222 advantageously include PCI functions, supporting single root I/O virtualization capabilities. Each of the virtual functions 212, 214, 216, 218 is associated with one of the physical functions 220, 222 and adapted to share one or more physical resources of the hardware I/O adapter 202.
Software functions or modules, such as a physical function (PF) adjunct 224 including an adapter driver 225, are provided with the hypervisor 204 for managing the physical functions 220, 222 and the virtual functions 212, 214, 216, 218. For example, a user may specify a particular configuration and the hypervisor 204 uses the PF adjunct 224 to configure the virtual functions 212, 214, 216, 218 from the physical functions 220, 222.
For example, in operation, the hypervisor 204 with the PF adjunct 224 enables the first virtual function instances 212, 214, 216 from the first physical function 220. The hypervisor 204 with the PF adjunct 224 enables the second virtual function 218 from the second physical function 222. The virtual functions 212, 214, 216, 218 are enabled, for example, based on a user provided configuration. Each of the logical partitions 206, 208, 210 may execute an operating system (not shown) and client applications (not shown).
As shown, the client applications that execute at the logical partitions 206, 208, 210 perform virtual input/output operations and include a respective device driver to directly manage an associated virtual function. For example, a first client application executing at the first logical partition 206 may include a first client VF device driver 226, and a second client application executing at the first logical partition 206 may include a second client VF device driver 228.
As shown, the first client VF device driver 226 accesses the first instance of the first virtual function 212. The second client VF device driver 228 accesses the second virtual function 218. A third client VF device driver 230 executing at the second logical partition 208 accesses the second instance of the first virtual function 214. An Nth client VF device driver 232 executing at the Nth logical partition 210 accesses the Nth instance of the first virtual function 216. An access mechanism 234 and a configuration mechanism 236 are provided with the hypervisor 204 to associate a logical partition with an accessed virtual function. The hypervisor 204 uses the access mechanism 234 to enable logical partitions, such as LPAR 206, to access configuration space associated with one or more of the virtual functions 212, 214, 216, 218.
In accordance with features of the invention, the hardware I/O adapter is partitioned into multiple endpoints, with each Partitionable Endpoint (PE) corresponding to a function, and there is an adapter PE associated with the entire adapter. The endpoints are managed both independently for actions limited in scope to a single function, and as a group for actions with the scope of the adapter. An error or failure of the adapter PE freezes the adapter PE and propagates to the VF PEs associated with the adapter, causing the VF PEs to be frozen. An adapter driver and VF device drivers are informed of the error, and start recovery. The hypervisor locks out the VF device drivers at key points enabling adapter recovery to successfully complete.
In accordance with features of the invention, the adapter driver unfreezes its PE, collects error data, and starts recovery and reinitialization while the VF PEs remain frozen. The adapter driver recovers the adapter, resets the VFs, and restores the previous configuration of the adapter. The adapter driver gives permission for the unfreeze of the VF PEs, and the VF drivers commence recovery.
System 200 is shown in simplified form sufficient for understanding the present invention. The illustrated system 200 is not intended to imply architectural or functional limitations. The present invention can be used with various hardware implementations and systems and various other internal hardware devices.
In accordance with features of the invention, enhanced error handling (EEH) optionally is entered via the hardware detecting an error and freezing the adapter PE along with the child PEs. The same sequence can also be initiated by a number of other causes; this is just one example.
Referring to FIG. 3, there are shown example enhanced error handling operations in accordance with an embodiment of the invention, illustrating the error flow in which the adapter PE and its child VF PEs are frozen and the adapter driver and VF device drivers begin recovery.
Referring now to FIG. 4, there are shown example adapter driver error recovery operations in accordance with an embodiment of the invention.
As indicated in a block 406, the adapter driver recovers the adapter. This may involve a reset of the entire adapter, which also resets the VFs. The adapter is then reinitialized to the default state at block 406. As indicated in a block 408, the adapter driver replays the previous configuration to the adapter. This is the same configuration as before, since the VFs need to come back matching what was previously there, such as with the same PCI BAR spaces or VF BAR registers, and the like. Then the adapter driver logs the error and communicates a Log ID to the hypervisor, as indicated in a block 410. As indicated in a block 412, the adapter driver gives the hypervisor permission to unfreeze the VF PEs and resumes normal operation.
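The sequence of blocks 406 through 412 can be summarized in a short C sketch; every function below is an illustrative stub standing in for the real adapter driver and hypervisor operations, and the log ID is a stand-in value.

```c
/* Sketch of the adapter driver recovery sequence (blocks 406-412):
 * reset and reinitialize the adapter, replay the previous configuration
 * so the VFs come back with the same resources (e.g. the same BAR
 * spaces), log the error, then permit VF unfreeze. All stubs. */
#include <stdio.h>

static void reset_adapter(void)        { puts("adapter reset (VFs reset too)"); }
static void reinit_to_defaults(void)   { puts("adapter reinitialized to defaults"); }
static void replay_configuration(void) { puts("previous VF configuration replayed"); }
static int  log_error(void)            { puts("error logged"); return 42; /* stand-in log ID */ }
static void hyp_report_log_id(int id)  { printf("log ID %d sent to hypervisor\n", id); }
static void hyp_permit_vf_unfreeze(void) { puts("hypervisor may now unfreeze VF PEs"); }

static void adapter_driver_recover(void)
{
    reset_adapter();                 /* block 406 */
    reinit_to_defaults();            /* block 406 */
    replay_configuration();          /* block 408 */
    hyp_report_log_id(log_error());  /* block 410 */
    hyp_permit_vf_unfreeze();        /* block 412 */
}

int main(void) { adapter_driver_recover(); return 0; }
```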
Referring now to FIG. 5, there are shown further example enhanced error handling operations in accordance with an embodiment of the invention.
In accordance with features of the invention, enhanced error handling (EEH) optionally includes two levels of isolation, such as illustrated in the preceding figures.
In accordance with features of the invention, enhanced error handling (EEH) optionally includes additional intermediate groupings, adding levels to the hierarchy. The advantage is finer-grained recovery: the recovery action is less intrusive, and fewer VFs are impacted. For example, some errors might be scoped to a single physical port. In that case, recovery might encompass freezing the set of PEs associated with the VFs using that physical port. In another example, errors might be scoped to a single I/O protocol; for example, in a Converged Network Adapter (CNA) implementing both Network Interface Controller (NIC) and Fibre Channel over Ethernet (FCoE) protocols, an error might impact only FCoE. In that case, recovery might encompass freezing the set of PEs associated with VFs using that protocol, while not freezing PEs for VFs running different protocols.
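A minimal sketch of such scoped freezing, assuming hypothetical per-VF port and protocol attributes, might look as follows; a single-VF error (the first isolation level described earlier) would freeze just that one PE and is omitted here.

```c
/* Sketch of an intermediate-grouping hierarchy: an error's scope selects
 * the smallest PE group containing it, so fewer VFs are disturbed.
 * The scopes and selection rule are illustrative only. */
#include <stdio.h>

enum scope { SCOPE_PROTOCOL_ON_PORT, SCOPE_PORT, SCOPE_ADAPTER };

struct vf_pe { int pe; int port; int is_fcoe; };  /* 1 = FCoE, 0 = NIC */

static void freeze_matching(const struct vf_pe *v, int n,
                            enum scope s, int port, int is_fcoe)
{
    for (int i = 0; i < n; i++) {
        int hit = (s == SCOPE_ADAPTER)
               || (s == SCOPE_PORT && v[i].port == port)
               || (s == SCOPE_PROTOCOL_ON_PORT
                   && v[i].port == port && v[i].is_fcoe == is_fcoe);
        if (hit)
            printf("freeze PE %d\n", v[i].pe);
    }
}

int main(void)
{
    struct vf_pe vfs[] = { {1, 0, 1}, {2, 0, 0}, {3, 1, 1}, {4, 1, 0} };
    /* An FCoE-scoped error on port 0 freezes only PE 1; the NIC VF on
     * the same port and both VFs on port 1 keep running. */
    freeze_matching(vfs, 4, SCOPE_PROTOCOL_ON_PORT, 0, 1);
    return 0;
}
```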
In accordance with features of the invention, enhanced error handling (EEH) optionally includes multiple and potentially overlapping groups of PEs. Continuing the above examples, there might be groupings for all PEs on a single port, all FCoE PEs on that port, and all NIC PEs on that same port. Further, there might be additional groupings of all FCoE PEs across all ports of the adapter. The exact groupings used are determined by the level of isolation the adapter vendor provides for recovery. Allowing various groupings by the adapter driver allows for maximum error isolation with the minimum number of VF PEs impacted. Note that recovery by the VF driver is independent of the groupings used, as the VF driver acts on only a single VF.
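Overlapping groups can be sketched as membership bitmasks, so that one PE belongs to several groups at once and recovery freezes exactly the group the vendor's isolation supports; the group names and masks below are hypothetical.

```c
/* Sketch of overlapping PE groups as membership bitmasks: one PE can
 * belong to "port 0", "FCoE on port 0", and "all FCoE" at once.
 * Group names and masks are hypothetical. */
#include <stdio.h>

#define GRP_PORT0      (1u << 0)
#define GRP_FCOE_PORT0 (1u << 1)
#define GRP_FCOE_ALL   (1u << 2)

struct pe { int id; unsigned groups; };

static void freeze_group(const struct pe *pes, int n, unsigned group)
{
    for (int i = 0; i < n; i++)
        if (pes[i].groups & group)
            printf("freeze PE %d\n", pes[i].id);
}

int main(void)
{
    struct pe pes[] = {
        { 1, GRP_PORT0 | GRP_FCOE_PORT0 | GRP_FCOE_ALL },  /* FCoE, port 0 */
        { 2, GRP_PORT0 },                                  /* NIC, port 0  */
        { 3, GRP_FCOE_ALL },                               /* FCoE, port 1 */
    };
    freeze_group(pes, 3, GRP_FCOE_PORT0);  /* impacts only PE 1 */
    return 0;
}
```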
Referring now to FIG. 6, there is shown an example error logging flow generally designated by the reference character 600 in accordance with an embodiment of the invention.
In FIG. 6, a first step is indicated by a line labeled 1, in which the error log for the adapter error is sent to the error logging support 619 of the hypervisor 610.
In a second step, indicated by a line labeled 2, the error logging support 619 of the hypervisor 610 sends the log to the flexible service processor (FSP) 632 for inclusion in the logging flow. In a third step, indicated by a line labeled 3, the FSP 632 sends the log to the HMC 630. In a fourth step, indicated by a line labeled 4, the FSP 632 sends the log back to the hypervisor 610 for full broadcast. In a fifth step, indicated by lines labeled 5, the error logging support 619 of the hypervisor 610 sends the log to all active partitions 602. In an optional sixth step, indicated by a line labeled 6 shown in dotted line for a Resource Monitoring and Control (RMC) connection, the HMC 630 may pull platform error logs from the platform operating systems using the RMC connection, providing a redundant path for receiving platform error logs. In an optional seventh step, indicated by a line labeled 7 shown in dashed line for the Hypervisor Call (HCALL) interface, the partition 602 can retrieve the error log ID 618 from the hypervisor 610.
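The numbered flow can be sketched as a sequence of ordered calls; the function names below are illustrative stand-ins for the firmware components involved, and the optional pull paths of steps 6 and 7 are noted in comments.

```c
/* Sketch of the error-log fan-out described above, with the numbered
 * steps as ordered calls. All names are illustrative stand-ins. */
#include <stdio.h>

static void step1_hypervisor_receives(int id)
    { printf("1: error log %d reaches hypervisor error logging support\n", id); }
static void step2_send_to_fsp(int id)
    { printf("2: hypervisor sends log %d to the FSP\n", id); }
static void step3_fsp_to_hmc(int id)
    { printf("3: FSP sends log %d to the HMC\n", id); }
static void step4_fsp_back_to_hyp(int id)
    { printf("4: FSP returns log %d to the hypervisor for broadcast\n", id); }
static void step5_broadcast(int id)
    { printf("5: hypervisor sends log %d to all active partitions\n", id); }

int main(void)
{
    int log_id = 618;  /* stands in for the error log ID shown above */
    step1_hypervisor_receives(log_id);
    step2_send_to_fsp(log_id);
    step3_fsp_to_hmc(log_id);
    step4_fsp_back_to_hyp(log_id);
    step5_broadcast(log_id);
    /* Optional pulls: step 6, the HMC pulls platform error logs over
     * the RMC connection; step 7, a partition retrieves the log ID
     * from the hypervisor via an HCALL. */
    return 0;
}
```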
Referring now to FIG. 7, an article of manufacture or a computer program product 700 of an embodiment of the invention is illustrated. The computer program product 700 includes a recording medium 702 storing program means 704, 706, 708, and 710 for carrying out the methods for implementing enhanced error handling in the system of FIG. 1.
A sequence of program instructions or a logical assembly of one or more interrelated modules defined by the recorded program means 704, 706, 708, and 710 directs the computer system 700 for implementing enhanced error handling for the I/O adapter.
While the present invention has been described with reference to the details of the embodiments of the invention shown in the drawing, these details are not intended to limit the scope of the invention as claimed in the appended claims.