This invention relates to the art of computer system emulation and, more particularly, to a host computer system in which the instruction set of a legacy hardware design is emulated by a software program, thereby preserving legacy systems and software. More particularly still, this invention relates to improving the reliability, availability and serviceability of a software emulator by using host system hardware and software, together with a signal handler, to detect and initiate recovery from certain software faults in the emulator.
Users of obsolete mainframe computers running a proprietary operating system may have a very large investment in proprietary application software and, further, may be comfortable with using the application software because it has been developed and improved over a period of years, even decades, to achieve a very high degree of reliability and efficiency.
As manufacturers of very fast and powerful “commodity” processors continue to improve the capabilities of their products, it has become practical to emulate the proprietary hardware and operating systems of powerful older computers on platforms built using commodity processors. The manufacturers of the older computers can thus provide new systems that emulate the older computer in software, allowing their customers to continue using their highly regarded proprietary software on state-of-the-art computer systems.
Accordingly, computer system manufacturers are developing such emulator systems for the users of their older systems, and the emulation process used by a given system manufacturer is itself subject to ongoing refinement and increases in efficiency and reliability.
Some historic computer systems now being emulated by software running on commodity processors have achieved performance that approximates, or may even exceed, that provided by the legacy hardware designs. An example of such hardware emulation is the Bull HN Information Systems (descended from the General Electric Computer Department and Honeywell Information Systems) DPS 9000 system, which is being emulated by a software package running on a Bull NovaScale system based on an Intel Itanium 2 Central Processor Unit (CPU). The 64-bit Itanium processor is used to emulate the 36-bit memory space of the DPS 9000 and the DPS 9000 instruction set used under the GCOS 8 operating system. Within the memory space of the emulator, the 36-bit word of the “target” DPS 9000 is stored right justified in the least significant 36 bits of the “host” (Itanium) 64-bit word. The upper 28 bits of the 64-bit word are typically zero for “legacy” code. Certain specific bits in those upper 28 bits are sometimes used as flags or for other temporary purposes, but in normal operation they are usually zero, and in any case older programs, in the “emulated” view of the world, always see them as non-existent. That is, only the emulation program itself uses these bits.
In the development of the emulator system, careful attention is typically devoted to duplicating the legacy hardware behavior exactly, so that legacy application programs will run without change and even without recompilation. Exact duplication of legacy operation is highly desirable because it yields exactly equivalent results during execution.
In order to achieve performance in an emulated system that at least approximates that of the legacy system hardware, or more generally in order to maximize overall performance, the code that performs the emulation must be very carefully designed and very “tightly” coded so as to minimize pipeline and instruction fetch breaks. These considerations require careful attention to the lowest-level design details of the host system hardware, that is, the hardware running the software that performs the emulation. They also require employing as much parallelization of operations as possible.
An Intel Itanium series 64-bit CPU is an excellent exemplary platform for building a software emulator of a legacy instruction set because it offers hardware resources that enable a high degree of potential parallelism in the hardware pipeline of the Itanium CPU. The Itanium CPU also provides instructions that allow for fast decision making and for guidance by the software as to the most likely path of program flow, reducing instruction fetch breaks and improving overall performance. In particular, the Itanium architecture provides instructions that allow preloading of a “branch register”, which informs the hardware of the likely new path of the instructions to be executed, with the “branch” instruction itself actually occurring later. This minimizes the CPU pipeline breaks characteristically caused by branch instructions and allows branches that are usually well predicted to be processed efficiently, without wasting cycles. The branch look-ahead hardware of the Itanium CPU, and in particular a specific mechanism for loading and then using a branch register, allows the emulation software to achieve a higher degree of overlap and, as a result, higher performance in emulated legacy system instruction processing.
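The lookahead idea can be illustrated, loosely, in C. The following is a minimal sketch and not the emulator's Itanium code. It assumes a hypothetical three-opcode machine and a table of handler functions, and it looks up the handler for the next emulated instruction before running the current one, which is the software analogue of preloading a branch register. A C compiler is not obliged to map this early lookup onto an Itanium branch register; only assembly language gives that control.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emulated machine state for the sketch. */
typedef struct {
    uint64_t acc;     /* single accumulator */
    size_t   pc;      /* index of the current emulated instruction */
    int      halted;
} cpu_t;

typedef void (*handler_t)(cpu_t *, uint64_t operand);

static void op_halt(cpu_t *c, uint64_t x) { (void)x; c->halted = 1; }
static void op_load(cpu_t *c, uint64_t x) { c->acc = x; }
static void op_add(cpu_t *c, uint64_t x)  { c->acc += x; }

/* Hypothetical opcode numbering: 0 = HALT, 1 = LOAD, 2 = ADD (assumed valid). */
static const handler_t dispatch[] = { op_halt, op_load, op_add };

typedef struct { uint8_t opcode; uint64_t operand; } insn_t;

/* Runs a program, fetching the handler for instruction pc+1 before the
 * handler for instruction pc executes. No opcode here branches, so the
 * prefetched handler is always the one actually needed. */
static uint64_t run(const insn_t *prog, size_t n) {
    cpu_t cpu = { 0, 0, 0 };
    if (n == 0)
        return cpu.acc;
    handler_t cur = dispatch[prog[0].opcode];
    while (!cpu.halted && cpu.pc < n) {
        size_t next_pc = cpu.pc + 1;
        /* Look ahead: resolve the next target early (sequential flow assumed). */
        handler_t next = next_pc < n ? dispatch[prog[next_pc].opcode] : op_halt;
        cur(&cpu, prog[cpu.pc].operand);
        cpu.pc = next_pc;
        cur = next;
    }
    return cpu.acc;
}
```

If an emulated instruction could branch, a taken branch would have to discard the looked-ahead handler and fetch a new one. That case corresponds to a mispredicted branch register on the host.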
Reference may be taken to co-pending U.S. application Ser. No. 11/174,866 entitled “Lookahead Instruction Fetch Process for Improved Emulated Instruction Performance” by Russell W. Guenthner et al., filed Jun. 6, 2005, and assigned to the same Assignee as the present application, for a more complete exposition of the advantages of selecting a host processor having the characteristics of the Intel Itanium series processors for emulating legacy software.
The development of software which provides for emulation of the legacy software instruction set on the host machine is complicated, and the requirements on performance are extreme. An approach which allows for ease of development and also provides the ultimate performance is to develop the code first in a high-level language, and then once the functionality and approach are precisely defined, to develop analogous code in assembly language. Because of the complexity it is also probable that in a final product some of the source code will be in assembly and some will be in a more easily maintained and understood higher level language such as “C” or “C++”.
Two major requirements of the emulation software are 1) to achieve precise and exact emulation of the legacy instruction set, and 2) to achieve the highest possible performance. These two requirements are sometimes conflicting.
In any software emulation of hardware there are pieces of code which are concerned with checking for error conditions and exceptions. Since performance is critical the code must be carefully crafted to avoid “wasting” unnecessary time doing all the checks that the legacy system hardware might have done in parallel with other operations. Checking in software for the many exceptions that may have been detected by the legacy hardware is time-consuming and a potentially large detriment to performance.
The emulation software runs on a machine called the host system. The host system is itself a computer system, with its own exception and fault checking mechanisms built into the host system hardware and, if used, into the host operating system. These exceptions and checks may be similar to or quite different from those of the legacy system being emulated. The emulation software must typically be written to avoid them, so that it does not fault or do things that would cause system or application program errors.
If an error is detected by the host system hardware and operating system software, there are typically two options for “handling” the error condition. The simpler option is to abort the application program. In more advanced systems, a mechanism commonly called a “signal handler” may instead be invoked through a coordinated response of the host system's operating system and the underlying hardware on which it runs. In any operating system, this code is typically quite machine dependent. The signal handler is code written by the application developer, and it is invoked on behalf of the application program when specifically selected hardware or system errors are detected. This gives the application programmer a chance to recover from or process the host-detected errors in any desired way, and it is a much improved alternative to simply aborting the program.
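On a POSIX host such as Linux, a signal handler is registered with sigaction(). The sketch below is illustrative only. The handler and function names are hypothetical, and raise() stands in for a hardware-detected error so that the example stays portable. A handler for a genuine hardware fault should not simply return, because that is undefined under POSIX and on common hosts re-executes the faulting instruction. Such handlers typically transfer control with siglongjmp() instead.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() and siginfo_t */
#include <signal.h>
#include <string.h>

/* Records which signal the handler last received. */
static volatile sig_atomic_t last_signal = 0;

/* Application-supplied handler. With SA_SIGINFO the host OS also passes a
 * siginfo_t (fault code, faulting address) and the interrupted context. */
static void emulator_signal_handler(int sig, siginfo_t *info, void *context) {
    (void)info;
    (void)context;
    last_signal = sig;
}

/* Registers the handler for one signal so that the OS invokes it instead of
 * taking the default action, which for SIGFPE and SIGSEGV terminates the
 * program. Returns 0 on success and -1 on failure, as sigaction() does. */
static int install_emulator_handler(int sig) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = emulator_signal_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

For example, after install_emulator_handler(SIGFPE), a call to raise(SIGFPE) runs the handler, and execution continues rather than the process being aborted.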
It can also be observed that not all of the legacy code being emulated from the legacy system is of the same level of criticality. For example, an application program can abort or be aborted without bringing down the entire emulated system. Certain pieces of the operating system are also much more critical than other pieces. Some programs can be aborted and restarted without problem, and many “mainframe” programs are designed to allow for this. A software approach to hardware detected errors inside the emulator is akin to hardware error detection, correction and recovery.
Accordingly, it would be advantageous to provide a solution and methodology within a computer system hardware emulation that allows the signal handler of the host system hardware and operating system to be used by the software emulation program, with the objective of improving the stability and reliability of the software emulation program itself. Certain special conditions that the software emulation code of the prior art would have had to check explicitly are instead left unchecked by the software emulation program. These conditions are detected and caught by the host system hardware and software instead, and control is then passed back to the emulation software program so that the exception is processed and recovered from in the manner of the emulated legacy system hardware.
This implementation relieves the software emulation program of certain checks and thereby increases the performance of the overall emulation. Further enhancement of these same signal handling facilities also increases the reliability of the emulation system itself, and especially the stability of the overall legacy system. These are the objects of this invention.
These further enhancements include a provision for distinguishing between emulation of hardware instructions which are part of the legacy system operating system, or “system” code versus instructions which are part of an “application” that is not part of the operating system itself. Once this distinction can be made the signal handling can be programmed such that certain signals which are detected while emulating application program code will cause the abort of only that application program while leaving the legacy operating system running. This approach increases the stability and availability of the overall emulated legacy system.
It is to these ends that the present invention is directed.
When the emulated legacy system is a large mainframe handling multiple programs simultaneously and continuously, the selection and subsequent control of the emulator is not trivial. Software emulation of a hardware system requires that the software emulation appear to act like hardware, in that it switches between and performs many tasks for many users or programs simultaneously. The same software emulation system that is used to run a user's job is also simultaneously used to emulate the processing of instructions for both the operating system and the I/O system. In a large system with multiple users, the same emulation software is used to process jobs from many users, threads, or processes. The software emulation of the “hardware” switches rapidly between the tasks to be done and, as a result, spends small slices of time processing the jobs, threads, or processes of many users.
If errors exist in the coding of the emulation software, the coding errors may affect only the results of the emulation of an application program and not the higher-level operating system or I/O system. The erroneous coding may affect only a single user and not other users. In this case, system reliability can be increased by detecting these conditions and, in response, aborting only that user's application job rather than the entire emulation software program, whose failure could bring down the entire emulation system, including the operating system and all other components. Such a failure should be avoided wherever this is possible without significant risk to system data integrity.
A simple and commonly encountered example of a check for which reliability can be increased is the hardware detection of a “divide check”. “Divide check” is a term commonly used in the computer industry meaning that an attempt has been made to have the hardware divide by zero. Dividing by zero is a programming hazard because the quotient is undefined (loosely, infinite) and cannot be represented as an ordinary result. Typically, without a signal handler, the operating system aborts any program that executes a hardware divide instruction with a divisor of zero. This is true of integer divides and, where the floating-point divide-by-zero trap is enabled, of floating-point divides as well. With a signal handler in place, however, the application program is given the opportunity to recover from such a fault and return to normal processing.
Specifically, in the software emulation of a hardware instruction set, there are two potential categories of problems that may cause a divide check. The first is when the software emulation program is in error and encounters a divide by zero that the programmer did not anticipate. The second is when the emulation software deliberately omits the check for a zero divisor and relies on the host system hardware/software divide-check signal to detect the condition.
In the first case which is a programming error, there are two further sub-possibilities. The first sub-possibility is that the error is encountered while emulating the instructions which are a part of a user's job, and the second is when the emulation is processing an emulated instruction which is part of the legacy operating system. If the error is encountered while processing instructions which are part of a user's application, there may be no need to “crash” or abort the entire software emulation system. Instead, for certain errors a choice can be made to abort only that specific user's job, and leave the emulation to continue with further processing of other jobs and the operating system itself. This will result in a more robust emulated legacy system. It is understood that certain pieces of operating system code are also less or more critical than others, and that some application programs are very important, but this can be ignored for simplicity in this explanation.
The second case is a potential error which could have been anticipated by the software emulation programmer, but a decision was made, for performance and simplicity reasons, to not anticipate or check for the error condition before using a host machine instruction which may indeed abort. In this case, a signal handler at a high level can detect the error, and then return control to the software emulation code specifically written to recover from such errors. The software emulator can then account for the event which was the hardware exception and finish that specific instruction emulation utilizing special code in the software emulator written to recover from errors in the manner of the original legacy hardware instruction. That is, in response to the signal from the host system hardware that a specific error has occurred, the software emulator can determine which legacy instruction was being emulated and respond in a manner which emulates the response that the legacy hardware system would perform in response to that special situation.
For the second case just described, that of not checking for conditions that could cause hardware aborts, the software can potentially perform better than when a check is made, because the instructions required to perform the check are not needed. The response to an error is typically not critical to performance, because such exception conditions typically occur infrequently. For conditions that do occur frequently, an engineering decision must be made as to which approach performs better, especially since the signal handler in an operating system such as Linux may take hundreds or even thousands of cycles to respond, recover and return control to the software emulation program after the error.
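The trade-off can be made concrete with a simple cost model. The symbols are introduced here for illustration and do not come from the source. Let c be the host cycles spent on an explicit zero-divisor test per emulated divide, and let S be the cycles for the signal handler's round trip. Let h be the cycles needed to emulate the legacy response once a zero divisor is found, a cost paid under either approach. Finally, let p be the fraction of emulated divides whose divisor is zero.

```latex
\begin{aligned}
\bar{C}_{\text{check}}  &= c + p\,h \\
\bar{C}_{\text{signal}} &= p\,(S + h) \\
\bar{C}_{\text{signal}} < \bar{C}_{\text{check}}
  &\iff p\,S < c \iff p < \frac{c}{S}
\end{aligned}
```

With assumed values of c = 2 and S = 2000 cycles, relying on the signal is cheaper only while fewer than one divide in a thousand (p < 0.001) has a zero divisor. The model ignores second-order effects. For example, check instructions that the Itanium can issue in otherwise idle slots would lower the effective c.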
A further complication which must be resolved in the second case is to determine if any distinction must be made to account for the anomaly that an unanticipated software coding error could cause a hardware fault identical to that which might occur naturally by encountering data which would cause a legacy instruction hardware fault. For the example of a divide check, the response would be different if the software emulation caused a divide check when it was not in the process of emulating a divide instruction, versus if it encountered a divide check while emulating a legacy instruction which actually does a divide. This distinction could be provided to the signal handler as some sort of flag such as the setting of a global variable or register, to tell the signal handler that an expected potential exception type may be encountered and then resetting that flag after the code that may cause it has been completed. Another approach would be to provide information which would allow the software emulation to have knowledge of specifically which instruction locations may detect the “anticipated” hardware errors, and process only those specifically. Hardware aborts detected from other host system program counter locations would be treated as the first case above, that is, determining if the error occurred while processing a legacy system instruction which is part of the legacy operating system code, or a “milder” response for an application program which would allow only the emulation of one program to be aborted.
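The flag approach described above can be sketched in C for a POSIX host. This is a minimal illustration under stated assumptions, not the Bull implementation. The step functions, emulated registers and outcome codes are hypothetical. The sketch also assumes a host on which an integer divide by zero traps with SIGFPE, which is true of x86-64 Linux; AArch64, by contrast, returns zero without trapping. The flag is raised just before the host divide that emulates a legacy divide and lowered just after it, so the handler can tell an anticipated divide check from an emulator coding error.

```c
#define _POSIX_C_SOURCE 200809L  /* sigaction, siginfo_t, sigsetjmp */
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical outcomes of one emulation step; the nonzero values double as
 * siglongjmp() return codes. */
enum { STEP_OK = 0, STEP_LEGACY_DIVIDE_CHECK = 1, STEP_EMULATOR_ERROR = 2 };

static sigjmp_buf step_recovery;
/* Set only while the host divides on behalf of a legacy divide instruction. */
static volatile sig_atomic_t expect_divide_check = 0;
/* Hypothetical emulated registers. */
static volatile int64_t reg_a, reg_q, operand;

static void on_sigfpe(int sig, siginfo_t *info, void *context) {
    (void)sig; (void)info; (void)context;
    /* Anticipated trap: emulate the legacy divide check. Otherwise the
     * emulator itself divided by zero, i.e. a coding error (first case). */
    siglongjmp(step_recovery, expect_divide_check ? STEP_LEGACY_DIVIDE_CHECK
                                                  : STEP_EMULATOR_ERROR);
}

static int install_sigfpe_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigfpe;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGFPE, &sa, NULL);
}

/* Emulates a legacy divide without testing the divisor first. */
static void step_legacy_divide(void) {
    expect_divide_check = 1;
    reg_q = reg_a / operand;
    expect_divide_check = 0;
}

/* Stands in for emulator code that divides by zero through a bug. */
static void step_with_bug(void) {
    reg_q = reg_a / operand;
}

/* Runs one step under protection and reports how it ended. */
static int run_step(void (*step)(void)) {
    switch (sigsetjmp(step_recovery, 1)) {  /* 1: also restore the signal mask */
    case 0:
        step();
        return STEP_OK;
    case STEP_LEGACY_DIVIDE_CHECK:
        expect_divide_check = 0;
        return STEP_LEGACY_DIVIDE_CHECK;    /* emulate the legacy fault here */
    default:
        expect_divide_check = 0;
        return STEP_EMULATOR_ERROR;         /* abort the job or the emulator */
    }
}
```

A real emulator would more likely establish the recovery point once per dispatch iteration or time slice rather than per step. The STEP_LEGACY_DIVIDE_CHECK path would then go on to emulate the legacy divide-check response.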
In the Intel Itanium 2 processor, which is the environment for the exemplary implementation of this invention, the assembly language provides access to hardware registers that allow the precise location of a hardware fault to be determined. That information is passed to the software emulation program, typically by the operating system.
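Once the host reports where a fault occurred, the second approach mentioned above (a list of host instruction locations at which an anticipated fault may arise) reduces to a range lookup. In the sketch below, the table layout and fault codes are hypothetical. On Linux, a handler installed with SA_SIGINFO receives the address of the faulting instruction in info->si_addr for SIGFPE, and it would pass (uintptr_t)info->si_addr as pc. How the table is populated, for example from labels placed around the relevant host instructions, is left open.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a host code range in which a host fault is
 * anticipated, and the legacy fault the emulator should raise for it. */
typedef struct {
    uintptr_t start;         /* first byte of the range */
    uintptr_t end;           /* one past the last byte */
    int       legacy_fault;  /* hypothetical legacy fault code */
} anticipated_range_t;

/* Returns the legacy fault code for a host fault at pc, or -1 if pc lies
 * outside every anticipated range (an unanticipated emulator error). */
static int lookup_anticipated(uintptr_t pc, const anticipated_range_t *tab, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (pc >= tab[i].start && pc < tab[i].end)
            return tab[i].legacy_fault;
    return -1;
}
```

A linear scan keeps the sketch short. For a large table, keeping the entries sorted by start address and using a binary search would bound the lookup cost.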
Further consideration as to the specifics of any fault may also be important in the decision as to whether to recover the emulation of the legacy instructions for a specific program, to abort a user application, or to abort the entire emulation process. An example of this would be in analysis of what is commonly called a “segmentation error” by a program which is an access outside the boundaries of memory that are allowed to it. A segmentation error that was attempting to “read” a location in memory outside of its boundaries might be deemed less likely to have corrupted critical system memory components than a segmentation error that signals an attempt to write or “store” into that memory location.
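The kind of decision described above can be expressed as a small policy function. The policy below is one hypothetical choice, not a rule taken from the source. Under this policy, a fault while emulating legacy operating system code stops the whole emulation. An unanticipated write fault is also treated as fatal, because a wild store suggests that earlier stores may already have corrupted emulator memory. Only a read fault in application code aborts just that user's job. The access kind is taken as an input because how it is obtained is platform specific; on x86-64 Linux, for example, it can be derived from the page-fault error code saved in the signal context.

```c
#include <stdbool.h>

/* Kind of host memory access that raised the segmentation error. */
typedef enum { ACCESS_READ, ACCESS_WRITE } access_kind_t;

typedef enum {
    RESPONSE_ABORT_USER_JOB,   /* abort only the legacy program being emulated */
    RESPONSE_ABORT_EMULATOR    /* stop the whole emulation */
} fault_response_t;

/* Hypothetical policy for an unanticipated segmentation error. */
static fault_response_t segv_response(bool emulating_system_code, access_kind_t access) {
    if (emulating_system_code)
        return RESPONSE_ABORT_EMULATOR;      /* legacy operating system code */
    return access == ACCESS_WRITE ? RESPONSE_ABORT_EMULATOR   /* stray store */
                                  : RESPONSE_ABORT_USER_JOB;  /* stray load */
}
```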
The subject matter of the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, may best be understood by reference to the following description taken in conjunction with the subjoined claims and the accompanying drawing of which:
The target operating system reference space 15 also contains suitable information about the interconnection and interoperation among the various target system elements and components, and a complete implementation in software of the target system operating system commands, which includes information on the steps the host system must take to “execute” each target system instruction in a program originally prepared to run on a physical machine using the target system operating system. It can be loosely considered that, to the extent that the target system 1 can be said to “exist” at all, it is in the target operating system reference space 15 of the host system memory 12. Thus, an emulator program running on the host system 2 can replicate all the operations of a legacy application program written for the target system operating system as if the legacy application program were running on a physical target system.
In a current state-of-the-art example chosen to illustrate the invention, a 64-bit Intel Itanium series processor is used to emulate the Bull DPS 9000 36-bit memory space and the instruction set of the DPS 9000 with its proprietary GCOS 8 operating system. Within the memory space of the emulator, the 36-bit word of the DPS 9000 is stored right justified in the least significant 36 bits of the “host” (Itanium) 64-bit word during the emulation process. The upper 28 bits of the 64-bit word are typically zero, although certain specific bits in the “upper” 28 bits of the “containing” word are sometimes used as flags or for other temporary purposes. In any case, the upper 28 bits of the containing word are always viewed as non-existent in the “emulated” view of the world. That is, only the emulation program itself uses these bits; otherwise they are left as all zeroes. Leaving the bits as all zeroes can also signal to the software emulator that it is “emulating” a 36-bit instruction, while a non-zero value would signal a 64-bit instruction.
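The word layout just described can be sketched with two masks. This is an illustrative C sketch that assumes the host word is a uint64_t. The function names are hypothetical, and the last function simply encodes the 36-bit/64-bit convention mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 0..35 hold the legacy 36-bit word; bits 36..63 are the upper 28. */
#define WORD36_MASK  ((UINT64_C(1) << 36) - 1)
#define UPPER28_MASK (~WORD36_MASK)

/* Stores a legacy word right justified, with the upper 28 bits cleared. */
static inline uint64_t word36_store(uint64_t legacy_word) {
    return legacy_word & WORD36_MASK;
}

/* The legacy ("emulated") view of a host word: the upper 28 bits, which
 * only the emulator may use, are invisible. */
static inline uint64_t word36_view(uint64_t host_word) {
    return host_word & WORD36_MASK;
}

/* All-zero upper bits: treat the word as a 36-bit instruction;
 * otherwise treat it as a 64-bit one. */
static inline bool holds_36bit_instruction(uint64_t host_word) {
    return (host_word & UPPER28_MASK) == 0;
}
```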
It is noted at this point that in actual practice the steps shown in
The subject invention can be practiced in host CPUs of any design, but it is particularly effective in those that include branch prediction registers, which assist the hardware in handling branches. It also benefits from CPUs that employ parallel execution units and have efficient parallel processing capabilities. It has been found, at the current state of the art, that the Intel Itanium series of processors is an excellent exemplary choice for practicing the invention. Accordingly, attention is directed to
The CPU 100 employs Explicitly Parallel Instruction Computing (EPIC) architecture to expose Instruction Level Parallelism (ILP) to the hardware. The CPU 100 provides a six-wide and ten-stage pipeline to efficiently realize ILP.
The function of the CPU is divided into five groups. The immediately following discussion gives a high level description of the operation of each group.
Instruction Processing: The instruction processing group contains the logic for instruction prefetch and fetch 112, branch prediction 114, decoupling buffer 116 and register stack engine/remapping 118.
Execution: The execution group 134 contains the logic for integer, floating point, multimedia, branch execution and the integer and floating point register files. More particularly, the hardware resources include four integer units/four multimedia units 102, two load/store units 104, two extended precision floating point units and two single precision floating point units 106 and three branch units 108 as well as integer registers 120, FP registers 122 and branch and Predicate registers 124. In certain versions of the Itanium 2 architecture, six of the execution units can be utilized by the CPU simultaneously with the possibility of six instructions being started in one clock cycle, and sent down the execution pipeline. Six instructions can also be completed simultaneously.
Control: The control group 110 includes the exception handler and pipeline control. The processor pipeline is organized into a ten stage core pipeline that can execute up to six instructions in parallel each clock period.
IA-32 Execution: The IA-32 execution group 126 contains hardware for handling certain IA-32 instructions, i.e., the 32-bit instructions employed in the Intel Pentium series processors and, in 16-bit form, in some of their predecessors.
Three levels of integrated cache memory minimize overall memory latency. These include an L3 cache 128 coupled to an L2 cache 130 under the direction of a bus controller 132. Acting in conjunction with sophisticated branch prediction and correction hardware, the CPU speculatively fetches instructions from the L1 instruction cache in block 112. Software-initiated prefetch probes for future misses in the instruction cache and then prefetches specified code from the L2 cache into the L1 cache. Bus controller 132 directs the information transfers among the memory components.
The foregoing will give one skilled in the art an understanding of the environment, provided by the Intel Itanium series CPU, in which the present invention may be practiced. The architecture and operation of the Intel Itanium CPU processors are described in much greater detail in the Intel publication “Intel® Itanium™ 2 Processor Hardware Developer's Manual”, which may be freely downloaded from the Intel website and which is incorporated by reference herein.
The Itanium 2 is presently preferred as the environment for practicing the present invention, but, of course, future versions of the Itanium series processors, or other processors which have the requisite features, may later be found to be still more preferred.
Referring to
Continuing in reference to
Thus, while the principles of the invention have now been made clear in an illustrative embodiment, there will be immediately obvious to those skilled in the art many modifications of structure, arrangements, proportions, the elements, materials, and components, used in the practice of the invention which are particularly adapted for specific environments and operating requirements without departing from those principles.