The present disclosure relates generally to computer systems and, in particular, to simulating a failure in a virtualization environment.
A shared computer system often concurrently supports a number of different guest operating systems by using virtual machines. Virtual machines can be in the form of virtual machine guests, logical partitions (LPARs), or other isolation techniques.
Virtual machines (VMs) are separated into two major categories based on their use and degree of correspondence to any real machine. A system virtual machine provides a complete system platform which supports the execution of a complete operating system (OS). In contrast, a process virtual machine is designed to run a single program, which means that it supports a single process. An essential characteristic of a virtual machine is that the software running inside is limited to the resources and abstractions provided by the virtual machine; it cannot break out of its virtual world.
System virtual machines (sometimes called hardware virtual machines) allow multiplexing the underlying physical machine between different virtual machines, each running its own operating system. The software layer providing the virtualization is called a virtual machine monitor or hypervisor. A hypervisor can run on bare hardware (Type 1 or native VM) or on top of an operating system (Type 2 or hosted VM). The main advantages of system VMs are that multiple OS environments can co-exist on the same computer, in strong isolation from each other, and the virtual machine can provide an instruction set architecture (ISA) that is somewhat different from that of the real machine.
Multiple VMs, each running its own operating system (called a guest operating system), are frequently used in server consolidation, where different services that used to run on individual machines in order to avoid interference are instead run in separate VMs on the same physical machine. This use is frequently called quality-of-service isolation (QoS isolation). The desire to run multiple operating systems was the original motivation for virtual machines, as it allowed time-sharing a single computer between several single-tasking operating systems.
A shared computer system may also employ other containers executing discrete and unrelated tasks. In such a collaborative shared-physical-resource environment, testing and workloads can be disrupted in non-obvious ways during development on a shared computer system.
In some instances, two mainframes or other computers may be monitoring one another. For the case where one mainframe determines that it or the other mainframe is about to crash, system designers have attempted to develop elegant load-shifting techniques to ensure that processing is not too adversely affected if one of the mainframes goes down. Like any development, these techniques have created additional challenges.
One embodiment of the present invention is directed to a method for simulating a hardware failure in a virtualization environment. The method of this embodiment includes determining a location of an instruction pointer for a particular operating system operating in the virtualization environment; determining an address of a memory location containing an invalid instruction; and writing the address of the memory location containing the invalid instruction in the location of the instruction pointer.
Another embodiment of the present invention is directed to a method for testing a partitionable computer by simulating a hardware failure in a virtualization environment. The method of this embodiment includes partitioning the computer into one or more virtual machines including a first virtual machine; determining the operating system installed on the first virtual machine; determining a location of an instruction pointer for the operating system operating on the first virtual machine; determining an address of a memory location containing an invalid instruction for the operating system on the first virtual machine; and writing the address of the memory location containing the invalid instruction in the location of the instruction pointer.
Other systems, methods, and/or computer program products according to embodiments will be or become apparent to one with skill in the art upon review of the following drawings and detailed description. It is intended that all such additional systems, methods, and/or computer program products be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.
The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
The detailed description explains the preferred embodiments of the invention, together with advantages and features, by way of example with reference to the drawings.
One problem that has emerged due to the design of elegant failure procedures is that testing a system for an operating system or power failure without affecting other virtual machines is difficult. For instance, in the case where two mainframe computers are sharing processing and employing a load-leveler to allocate processing between the two mainframes, it is difficult to simulate a power failure, and therefore to test the ability to quickly transfer the load from one mainframe to the other, without affecting all of the other virtual machines. That is, it is difficult to test “pulling the plug” without actually pulling the plug on the mainframe and taking the entire mainframe down and, thus, disturbing the usage of all of the mainframe's users.
Exemplary embodiments of the present invention provide methods for causing a virtual computer to simulate a hardware or software failure. In one embodiment, the location of a program counter is determined. As is well known in the art, the program counter (also called the instruction pointer) is a register in a computer processor which indicates where the computer is in its instruction sequence. Depending on the details of the particular machine, the instruction pointer holds either the address of the instruction being executed, or the address of the next instruction to be executed. The instruction pointer is automatically incremented for each instruction cycle so that instructions are normally retrieved sequentially from memory. Certain instructions, such as branches and subroutine calls and returns, interrupt the sequence by placing a new value in the instruction pointer.
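The role of the instruction pointer described above can be illustrated with a minimal sketch of a toy fetch-and-execute loop. The opcode names (`NOP`, `JMP`, `HALT`), the `run` function, and the tuple-based program layout are assumptions made for illustration only and do not correspond to any real processor or to the disclosed embodiments.

```python
def run(program, max_cycles=100):
    """Execute a list of (opcode, operand) tuples on a toy machine.

    Returns the list of addresses that were fetched, in order.
    """
    instruction_pointer = 0
    trace = []
    for _ in range(max_cycles):
        if instruction_pointer >= len(program):
            break                                # ran off the end of the program
        trace.append(instruction_pointer)
        opcode, operand = program[instruction_pointer]
        instruction_pointer += 1                 # automatic increment each cycle
        if opcode == "JMP":
            instruction_pointer = operand        # a branch places a new value
        elif opcode == "HALT":
            break
        # "NOP" does nothing, so execution continues sequentially
    return trace


# Hypothetical four-instruction program: address 2 is skipped by the jump.
program = [("NOP", None), ("JMP", 3), ("NOP", None), ("HALT", None)]
trace = run(program)
print(trace)
```

In this sketch, sequential execution falls out of the automatic increment, and the jump at address 1 interrupts the sequence by overwriting the pointer.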
In most processors, the instruction pointer is incremented immediately after fetching a program instruction. This means that the target address of a branch instruction is obtained by adding the branch instruction's operand to the address of the next instruction (byte or word, depending on the computer type) after the branch instruction. The address of the next instruction to be executed is typically found in the instruction pointer. In some embodiments, an address that has an invalid instruction (also referred to herein as an invalid memory location) is determined. As is well known in the art, in some systems it may be possible to insert the invalid instruction by either user or computer intervention. Regardless, according to an embodiment of the present invention, the location of the invalid instruction is placed into the instruction pointer. Upon the next instruction cycle, the instruction pointer causes the virtual computer to try to perform the invalid instruction. In some instances the virtual computer stops operating at this point. The operation stoppage is an effective simulation of pulling the plug and may also simulate an operating system failure. In addition, operating in such a manner may effectively avoid the built-in recovery routines.
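The branch-target arithmetic described above can be worked through with a short sketch. The function name `branch_target`, the 4-byte instruction length, and the example addresses are assumptions chosen for illustration; actual instruction lengths and addressing units depend on the computer type.

```python
def branch_target(branch_address, instruction_length, operand):
    """Compute a relative branch target.

    Because the instruction pointer is incremented right after the fetch,
    the operand is added to the address of the instruction that follows
    the branch, not to the address of the branch itself.
    """
    next_instruction = branch_address + instruction_length
    return next_instruction + operand


# Assumed example: a 4-byte branch instruction located at address 0x1000.
forward_target = branch_target(0x1000, 4, 0x20)
backward_target = branch_target(0x1000, 4, -8)
print(hex(forward_target), hex(backward_target))
```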
Turning now to the drawings, it will be seen that in
Users can initiate various tasks on the host system 102 via the user systems 104, such as developing and executing system tests or running application programs. In some embodiments, the user system may be able to directly insert (via a keyboard or other input device) addresses or commands directly into the instruction pointer of a virtual machine. While only a single host system 102 is shown in
The network 106 may be any type of communications network known in the art. For example, the network 106 may be an intranet, extranet, or an internetwork, such as the Internet, or a combination thereof. The network 106 can include wireless, wired, and/or fiber optic links.
In exemplary embodiments, the host system 102 accesses and stores data in a data storage device 108. The data storage device 108 refers to any type of storage and may comprise a secondary storage element, e.g., hard disk drive, tape, or a storage subsystem that is internal or external to the host system 102. Types of data that may be stored in the data storage device 108 include, for example, log files and databases. It will be understood that the data storage device 108 shown in
In exemplary embodiments, the host system 102 executes various applications, including a hypervisor 110 and optionally multiple virtual machines 112. In some embodiments, the system may include multiple hypervisors 110. In such embodiments, one hypervisor may control another hypervisor to create a multi-tiered system. The hypervisor 110 manages access to resources of the host system 102 and serves as a virtual machine monitor to support concurrent execution of the virtual machines 112. Each virtual machine 112 can support specific guest operating systems and multiple user sessions for executing software written to target the guest operating systems. For example, one virtual machine 112 may support an instance of the Linux® operating system, while a second virtual machine 112 executes an instance of the z/OS® operating system. Other guest operating systems known in the art can also be supported by the hypervisor 110 through the virtual machines 112. In some embodiments, one of the virtual machines 112 could function as a hypervisor, thus resulting in a multi-tiered hypervisor structure. In exemplary embodiments, the hypervisor 110 manages execution control of each virtual machine 112 through a virtual machine control bus 122. Each virtual machine control bus 122 may handle an exchange of low-level control information, such as interrupts, device driver commands, device driver data, and the like. While each virtual machine control bus 122 can handle low-level information exchange, it is incapable of handling higher-level messages targeted for in-band user display.
In a block 204 an invalid memory location (or location of an invalid instruction) is determined. In some systems, such as a Linux operating system, the invalid memory location may be, for example, represented as a negative one or a null. Regardless, in a block 206, the invalid memory location is written to the instruction pointer. On the next instruction cycle, the machine stops. The stopping of the machine in this manner may simulate a power failure or an operating system failure.
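The effect of blocks 204 and 206 can be sketched on a toy virtual machine model. The class `ToyVM`, the function `simulate_failure`, and the dictionary-based memory are hypothetical names introduced for illustration; the value negative one follows the example given above, and the sketch is not a definitive implementation of any hypervisor interface.

```python
INVALID_ADDRESS = -1  # assumed invalid memory location, per the example above


class ToyVM:
    """Toy machine whose memory maps addresses to (opcode, operand) pairs."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.instruction_pointer = 0
        self.running = True

    def step(self):
        """Run one instruction cycle; return True if an instruction executed."""
        if not self.running:
            return False
        instruction = self.memory.get(self.instruction_pointer)
        if instruction is None:
            # Nothing valid to execute: the machine stops abruptly and no
            # recovery routine gets a chance to run.
            self.running = False
            return False
        opcode, operand = instruction
        if opcode == "JMP":
            self.instruction_pointer = operand
        else:
            self.instruction_pointer += 1
        return True


def simulate_failure(vm, invalid_address=INVALID_ADDRESS):
    """Block 206: write the invalid memory location to the instruction pointer."""
    vm.instruction_pointer = invalid_address


# A small endless loop stands in for a healthy guest workload.
vm = ToyVM({0: ("NOP", None), 1: ("NOP", None), 2: ("JMP", 0)})
for _ in range(5):
    vm.step()
healthy_before = vm.running

simulate_failure(vm)
vm.step()                      # the next instruction cycle
stopped_after = not vm.running
print(healthy_before, stopped_after)
```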
Advantageously, stopping the virtual machine in this manner may be done without affecting any other virtual machines operating on a host machine (such as a mainframe) being tested, assuming the virtual machine is not itself a hypervisor with its own guests. As discussed above, if the machine has been partitioned into two or more virtual machines (either in hardware, software, or a combination of both) each virtual machine may operate independently. If the virtual machines are operating independently, a failure in one virtual machine will not affect operation of another virtual machine. Thus, the present invention may allow a virtual machine to be failed for testing or other purposes without affecting other users of the host machine. For example, the virtual machine could be stopped utilizing the method disclosed herein to simulate a power failure to the host machine without affecting any other users of the host machine.
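The isolation property described above can be illustrated with a sketch in which a toy host steps several independent guests and only one of them is failed. The names `GuestVM`, `Host`, `fail_guest`, and the guest labels are hypothetical, and the model assumes that no guest is itself a hypervisor with its own guests.

```python
from dataclasses import dataclass


@dataclass
class GuestVM:
    """Toy guest with its own instruction pointer and cycle counter."""
    name: str
    memory_size: int
    instruction_pointer: int = 0
    running: bool = True
    cycles: int = 0

    def step(self):
        if not self.running:
            return
        if not 0 <= self.instruction_pointer < self.memory_size:
            self.running = False          # invalid address: this guest stops
            return
        self.cycles += 1
        self.instruction_pointer = (self.instruction_pointer + 1) % self.memory_size


class Host:
    """Toy host that gives every guest one instruction cycle per round."""

    def __init__(self, names, memory_size=8):
        self.guests = {name: GuestVM(name, memory_size) for name in names}

    def fail_guest(self, name, invalid_address=-1):
        # Only the named guest's instruction pointer is overwritten.
        self.guests[name].instruction_pointer = invalid_address

    def run(self, rounds):
        for _ in range(rounds):
            for guest in self.guests.values():
                guest.step()


host = Host(["linux-guest", "zos-guest", "test-guest"])
host.run(3)
host.fail_guest("test-guest")
host.run(3)
status = {name: (g.running, g.cycles) for name, g in host.guests.items()}
print(status)
```

In this sketch the failed guest stops accumulating cycles while the other two guests continue unaffected, which mirrors the independence of the virtual machines discussed above.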
At a block 304 the operating system of one or more of the virtual machines is determined. As discussed above, each virtual machine may run its own operating system. In some embodiments, the determination of the operating system may be made by a program running on the host machine or may be determined by a user. Regardless, at a block 306 a location of the instruction pointer for the operating system running on a particular virtual machine is determined.
At a block 308 an address or memory location of an invalid instruction is determined and this address or memory location is placed in the instruction pointer at a block 310.
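The sequence of blocks 304 through 310 can be sketched end to end on a toy model. Every name here (`partition`, `determine_os`, `locate_instruction_pointer`, `find_invalid_address`, `simulate_failure`, and the table `INVALID_ADDRESS_BY_OS`) is hypothetical. The Linux entry follows the negative-one example given earlier in this description; the z/OS entry is a placeholder standing in for a null address and is not a documented value.

```python
# Illustrative table only; see the note above about the placeholder entry.
INVALID_ADDRESS_BY_OS = {"linux": -1, "z/os": 0}


def partition(layout):
    """Partitioning step: create one toy virtual machine per entry."""
    return {
        name: {"os": os_name, "registers": {"ip": 0x1000}}
        for name, os_name in layout.items()
    }


def determine_os(vm):
    """Block 304: determine the operating system of the virtual machine."""
    return vm["os"]


def locate_instruction_pointer(vm, os_name):
    """Block 306: in this toy model the pointer is always register 'ip'."""
    return "ip"


def find_invalid_address(os_name):
    """Block 308: look up an address holding an invalid instruction."""
    return INVALID_ADDRESS_BY_OS[os_name]


def simulate_failure(machines, target):
    vm = machines[target]
    os_name = determine_os(vm)
    location = locate_instruction_pointer(vm, os_name)
    invalid_address = find_invalid_address(os_name)
    vm["registers"][location] = invalid_address      # block 310
    return os_name, invalid_address


machines = partition({"vm1": "linux", "vm2": "z/os"})
result = simulate_failure(machines, "vm1")
print(result, machines["vm1"]["registers"], machines["vm2"]["registers"])
```

Keeping the per-operating-system lookup in a table is a design choice of this sketch; the determination could equally be made by a user, as noted above.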
As described above, embodiments can be embodied in the form of computer-implemented processes and apparatuses for practicing those processes. In exemplary embodiments, the invention is embodied in computer program code executed by one or more network elements. Embodiments include computer program code containing instructions embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, universal serial bus (USB) flash drives, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. Embodiments include computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits.
While the invention has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims. Moreover, the use of the terms first, second, etc. does not denote any order or importance, but rather the terms first, second, etc. are used to distinguish one element from another. Furthermore, the use of the terms a, an, etc. does not denote a limitation of quantity, but rather denotes the presence of at least one of the referenced item.