The present invention relates generally to the data processing field, and more particularly, relates to a method, apparatus and computer program product for implementing initial program load to configure system hardware in a computer system including speculative deconfiguration of system hardware.
Reliability, availability, and serviceability (RAS) features provided in some data processing systems enable enhanced error detection and prevention capabilities.
For example, the RS/6000® server computer system manufactured and sold by International Business Machines Corporation of Armonk, N.Y., includes a unique RAS feature called Repeat-Gard.
The Repeat-Gard feature provides the capability to deconfigure portions of hardware that are determined to be defective either via diagnostics run during initial program load (“IPL”) or during runtime. A user also has the capability of indicating hardware is defective by manual intervention. Deconfiguring portions of hardware may cause other working pieces of a system to be deconfigured as well because they cannot be used without the original part.
There are cases where so much hardware has been deconfigured that a system will not IPL. Typically, this does not occur due to the faulty part alone, but as a result of the deconfiguration of associated parts. This deconfiguration by association concept is a cascade of dependencies unknown to the software that is performing the initial deconfiguration. Conventionally, all dependencies must be known by multiple software applications and all deconfiguration actions are necessarily permanent.
A need exists for a method and mechanism for implementing initial program load including speculative deconfiguration of system hardware in a computer system.
Principal aspects of the present invention are to provide a method, apparatus and computer program product for implementing initial program load to configure system hardware in a computer system. Other important aspects of the present invention are to provide such method, apparatus and computer program product for implementing initial program load of system hardware in a computer system substantially without negative effect and that overcome many of the disadvantages of prior art arrangements.
In brief, a method, apparatus and computer program product are provided for implementing initial program load to configure system hardware in a computer system. During the initial program load of the computer system, selected hardware components are marked with a temporary state of non-functional. At least one policy check is performed based upon a system type for the computer system to determine system availability. When system availability is identified, the selected hardware components are permanently deconfigured and the initial program load of the computer system is continued.
In accordance with features of the invention, when it is determined that the system availability check fails, the selected hardware components are reconfigured and the initial program load of the computer system is continued. System availability is identified when the computer system has enough functioning hardware to run. For example, a processor, memory, a path to the memory, an input/output (I/O) bridge, and an I/O adapter may be required for system availability; the required hardware is system specific.
In accordance with features of the invention, a hardware manager in the computer system marks the selected hardware components with the temporary state of non-functional. A client IPL deconfiguration control program requests the hardware manager to mark the selected hardware components with the temporary state of non-functional based upon a failure or an action from a prior initial program load. When a selected hardware component is marked with the temporary state of non-functional, hardware components associated with the selected hardware component are marked with the temporary state of non-functional.
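The marking by association described above can be pictured as a walk over a dependency map. The following is a minimal sketch under stated assumptions, not the hardware manager's actual implementation: the `State` names, the `DEPENDENTS` map, and the component names are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class State(Enum):
    FUNCTIONAL = "functional"
    SPECULATIVE = "speculatively non-functional"
    DECONFIGURED = "permanently deconfigured"


# Hypothetical map: component -> components that cannot be used without it.
DEPENDENTS = {
    "proc0": ["mem_ctrl0"],
    "mem_ctrl0": ["dimm0", "dimm1"],
    "proc1": ["mem_ctrl1"],
    "mem_ctrl1": ["dimm2"],
}


def mark_speculative(states, component, dependents=DEPENDENTS):
    """Mark a component, and everything that depends on it, as speculative.

    Returns the set of components whose state was changed by this call.
    """
    changed = set()
    stack = [component]
    while stack:
        current = stack.pop()
        if states.get(current) is not State.FUNCTIONAL:
            continue  # unknown, already marked, or already deconfigured
        states[current] = State.SPECULATIVE
        changed.add(current)
        stack.extend(dependents.get(current, []))
    return changed


states = {
    name: State.FUNCTIONAL
    for name in ["proc0", "mem_ctrl0", "dimm0", "dimm1",
                 "proc1", "mem_ctrl1", "dimm2"]
}
marked = mark_speculative(states, "proc0")
print(sorted(marked))  # ['dimm0', 'dimm1', 'mem_ctrl0', 'proc0']
```

In this sketch the requester names only `proc0`; the cascade to the memory controller and its DIMMs is resolved from the map, so the requester does not need to know the dependencies itself.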
In accordance with features of the invention, the hardware manager performs the policy check based upon the system type for the computer system to determine system availability, responsive to marking the selected hardware components with the temporary state of non-functional. The client IPL deconfiguration control program forces the selected hardware components to a permanently deconfigured state when the hardware manager determines that the system is available. Otherwise, when the hardware manager determines that the system is unavailable to complete the initial program load of the computer system, the client IPL deconfiguration control program forces the selected hardware components back to a functional state.
The present invention together with the above and other objects and advantages may best be understood from the following detailed description of the preferred embodiments of the invention illustrated in the drawings, wherein:
Referring now to the drawings, in
Computer system 100 is shown in simplified form sufficient for understanding the present invention. The illustrated computer system 100 is not intended to imply architectural or functional limitations. The present invention can be used with various hardware implementations and systems and various other internal hardware devices; for example, multiple main processors can be used instead of a single main processor 102.
As shown in
Various commercially available computers can be used for computer system 100, for example, an IBM personal computer or an IBM server computer. CPU 102 is suitably programmed by the client IPL deconfiguration control program 134 to execute the flowchart of
In accordance with features of the preferred embodiment, methods are provided that allow software run during an IPL to determine if deconfiguring a piece of hardware will cause the system to fail during the IPL and to bring back enough hardware to allow the IPL. This enables a customer, in emergency situations, to continue to use the computer system 100 until a service call can be made and completed. In conventional arrangements, the system would not be able to complete IPL until a service action was completed or the user manually reconfigured hardware.
In accordance with features of the preferred embodiment, methods implement speculative deconfiguration of system hardware, allowing a user of computer system 100 to temporarily set the state of a piece of hardware, and of any associated hardware, to non-functional. If it is determined that there is enough good hardware to IPL the system 100, the state of the marked hardware is changed to a more permanent non-functional state; otherwise, the functional state is restored.
In a computer system, it is not enough just to reconfigure or un-guard a piece of hardware and expect the associated hardware to be reconfigured automatically, because more than one piece of hardware may affect another piece of hardware within the system.
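One way to read this point is that a component is usable only when everything it requires is usable. The sketch below illustrates that reading under assumptions: the `REQUIRES` map, the component names, and the `usable` helper are hypothetical and are not taken from the described system.

```python
# Hypothetical map: component -> every component it requires in order to work.
REQUIRES = {
    "dimm0": ["mem_ctrl0", "mem_bus0"],
    "mem_ctrl0": [],
    "mem_bus0": [],
}


def usable(component, guarded, requires=REQUIRES):
    """Return True if the component and all of its requirements are un-guarded."""
    if component in guarded:
        return False
    return all(usable(dep, guarded, requires)
               for dep in requires.get(component, []))


guarded = {"mem_ctrl0", "mem_bus0"}
print(usable("dimm0", guarded))   # False: both requirements are guarded

guarded.discard("mem_ctrl0")      # un-guard only one of the two parts
print(usable("dimm0", guarded))   # False: mem_bus0 is still guarded

guarded.discard("mem_bus0")
print(usable("dimm0", guarded))   # True
```

Un-guarding `mem_ctrl0` alone does not bring `dimm0` back, because `dimm0` is also affected by `mem_bus0`.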
In accordance with features of the preferred embodiment, methods allow the hardware manager 132 performing a Repeat-Gard function to first check if deconfiguring a hardware component may cause the system 100 to not IPL. If the Repeat-Gard program determines that this is the case and that the original failure was not a fatal problem, then the hardware component is reconfigured and marked as functional and the system 100 continues to IPL. If deconfiguring the hardware would not cause an IPL failure, the hardware manager 132 performs a Repeat-Gard function to permanently deconfigure the hardware component, while allowing the system to continue IPL. Additional details about the Repeat-Gard program can be found in the technical white paper entitled The RS/6000 Enterprise Server S Family: Reliability, Availability, Serviceability and the IBM redbook entitled IBM eServer PSeries 680 Handbook: Including the RS/6000 Model S80, which are herein incorporated by reference in their entirety.
Referring now to
For example, the client IPL deconfiguration control program 134 requests the hardware manager 132 to mark a certain hardware component or certain pieces of hardware as speculatively non-functional at block 304. As the state of hardware is marked non-functional, the associated hardware is also marked speculatively non-functional as indicated in a block 306.
Next, system availability is queried as indicated in a block 308, where the client IPL deconfiguration control program 134 asks the hardware manager 132 if there is enough hardware available to IPL the computer system 100. Hardware availability is validated based upon system type as indicated in a block 310. For example, the hardware manager 132 runs a number of policy checks based on the system type to validate hardware availability.
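The policy checks of block 310 could be expressed as a table of minimum hardware counts keyed by system type. This is a minimal sketch, assuming such a table exists: the system type names, the minimum counts, and the function names are hypothetical, and only the hardware classes (processor, memory, memory path, I/O bridge, I/O adapter) come from the example given earlier in this description.

```python
from collections import Counter

# Hypothetical policies: minimum number of functional parts per hardware class.
POLICIES = {
    "entry_server": {
        "processor": 1, "memory": 1, "memory_path": 1,
        "io_bridge": 1, "io_adapter": 1,
    },
    "partitioned_server": {
        "processor": 2, "memory": 2, "memory_path": 2,
        "io_bridge": 1, "io_adapter": 2,
    },
}


def missing_hardware(system_type, components):
    """Return the hardware classes that fall short of the policy.

    components: iterable of (hardware_class, is_functional) pairs.
    """
    policy = POLICIES[system_type]
    counts = Counter(kind for kind, functional in components if functional)
    return sorted(kind for kind, minimum in policy.items()
                  if counts[kind] < minimum)


def enough_to_ipl(system_type, components):
    """Return True when no hardware class falls short of the policy."""
    return not missing_hardware(system_type, components)


inventory = [
    ("processor", True), ("processor", False),  # one processor marked non-functional
    ("memory", True), ("memory_path", True),
    ("io_bridge", True), ("io_adapter", True),
]
print(enough_to_ipl("entry_server", inventory))        # True
print(missing_hardware("partitioned_server", inventory))
# ['io_adapter', 'memory', 'memory_path', 'processor']
```

Keeping the policy as data rather than code is one possible design choice; it keeps the system-specific requirements in one place instead of spreading them across multiple software applications.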
Then it is determined whether the hardware is sufficient to complete IPL of the computer system 100 as indicated in a decision block 312. If the hardware manager 132 indicates that there is enough hardware to IPL, the client IPL deconfiguration control program 134 will change all the speculatively deconfigured pieces of hardware to permanently deconfigured and continue the IPL as indicated in a block 314.
If the hardware manager 132 indicates that there is not enough hardware to run, the client IPL deconfiguration control program 134 will force all the speculatively deconfigured pieces functional again and continue the IPL as indicated in a block 316.
Each piece of hardware that was marked speculatively deconfigured, both directly and by association, is marked with the requested new state at blocks 314, 316.
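The decision at block 312 and the state changes at blocks 314 and 316 can be summarized as a commit-or-rollback step over every speculatively marked component. The sketch below is illustrative only: the state strings, the `resolve_speculative` function, and the `needs_a_processor` callback are hypothetical stand-ins for the hardware manager's policy check.

```python
FUNCTIONAL = "functional"
SPECULATIVE = "speculative"
DECONFIGURED = "deconfigured"


def resolve_speculative(states, enough_hardware):
    """Commit or roll back every speculatively marked component.

    states: dict mapping component name -> state string (modified in place).
    enough_hardware: hypothetical callback that receives the set of components
        still functional and returns True if the system can complete IPL.
    Returns the state that was applied to the speculative components.
    """
    still_functional = {c for c, s in states.items() if s == FUNCTIONAL}
    # Decision block 312: is the remaining hardware sufficient?
    new_state = DECONFIGURED if enough_hardware(still_functional) else FUNCTIONAL
    # Blocks 314 and 316: apply the same new state to all marked components,
    # whether they were marked directly or by association.
    for component, state in states.items():
        if state == SPECULATIVE:
            states[component] = new_state
    return new_state


def needs_a_processor(functional):
    """Toy policy: the system can IPL if at least one processor remains."""
    return any(name.startswith("proc") for name in functional)


two_proc = {"proc0": SPECULATIVE, "dimm0": SPECULATIVE,
            "proc1": FUNCTIONAL, "dimm1": FUNCTIONAL}
one_proc = {"proc0": SPECULATIVE, "dimm0": SPECULATIVE}

print(resolve_speculative(two_proc, needs_a_processor))  # deconfigured
print(resolve_speculative(one_proc, needs_a_processor))  # functional
```

In the first case a second processor remains, so the marked parts are permanently deconfigured; in the second case nothing would be left to run on, so the marked parts are forced functional again and the IPL can continue.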
Referring now to
A sequence of program instructions or a logical assembly of one or more interrelated modules defined by the recorded program means 404, 406, 408, 410 directs the computer system 100 for implementing initial program load with speculative deconfiguration of the preferred embodiment.
Embodiments of the present invention may also be delivered as part of a service engagement with a client corporation, nonprofit organization, government entity, internal organizational structure, or the like. Aspects of these embodiments may include configuring a computer system to perform, and deploying software, hardware, and web services that implement, some or all of the methods described herein. Aspects of these embodiments may also include analyzing the client's operations, creating recommendations responsive to the analysis, building systems that implement portions of the recommendations, integrating the systems into existing processes and infrastructure, metering use of the systems, allocating expenses to users of the systems, and billing for use of the systems.
While the present invention has been described with reference to the details of the embodiments of the invention shown in the drawing, these details are not intended to limit the scope of the invention as claimed in the appended claims.