The present invention relates to techniques and systems for executing a program compiled for a source architecture on a machine having a different target architecture.
A compiler, such as a Cobol compiler, translates a source code program made up of one or more source code files into object code comprising one or more object files in the processor's machine language. These object files, in some cases together with additional object files, can be linked into an executable program. Such an executable program is constrained to run only on a processor of a specific architecture and instruction set. Typically, each processor architecture has an associated instruction set. Processors having different architectures support different instruction sets, with the result that an executable program made up of instructions from one instruction set will not generally execute on a processor having a different architecture and a different corresponding instruction set.
Just-in-time (JIT) compilation, or dynamic translation, is compilation that occurs at runtime, during the execution of a program. Because the compilation happens at runtime, a JIT compiler can use dynamic runtime information to perform optimizations. Such runtime information is not available to a static compiler, but a JIT compiler may use it to identify optimization opportunities, such as the in-lining of functions.
A typical compiler tokenizes and parses source code, constructs an abstract syntax tree, lowers the code into an intermediate representation, may apply one or more optimization stages, and then uses a code generator to translate the optimized intermediate representation into executable code in the instruction set of the target architecture. LLVM is an open-source compiler framework that includes a just-in-time compiler, the ORC JIT, that can be used to compile source programs to executable code at runtime. By employing a common intermediate representation, the LLVM framework facilitates the development of retargetable compilers that can use different compiler back ends. The LLVM intermediate representation, LLVM IR, operates as a common abstraction layer, enabling back-end compiler developers to implement target-specific functions and optimizations. A description of LLVM, explaining the structure and evolution of the LLVM project, can be found in chapter 11 of “The Architecture of Open Source Applications: Elegance, Evolution, and a Few Fearless Hacks,” Amy Brown and Greg Wilson, eds.
A load module refers to all or part of an executable program, typically in the context of a legacy mainframe computing environment. A compiler and linker may be used to translate legacy source code programs, such as Cobol programs or database applications, into executable load modules that can run on a System/370, System/390, or z/OS mainframe. For a variety of reasons, recompiling the original legacy source program to run on a different target architecture is often undesirable. Examples include the difficulty of accurately identifying all of the required source code; problems that may arise if the source code is compiled with different settings, or in an environment that inadvertently includes different components; the difficulty of performing functional or integration testing to ensure that the recompiled code will perform in the same manner as the original program; and the difficulty of disentangling code from numerous, and often unknown, dependencies. In other examples, the source code may no longer be available, or the compiler, or the version of the compiler, originally used with the source code may no longer be available.
One approach to retargeting applications from an architecture running one hardware instruction set to an architecture running a different hardware instruction set is emulation. In a conventional emulation system, for each instruction or set of instructions that would ordinarily execute on the first architecture, the emulation system must perform appropriate translations, thereby simulating the operation of the emulated environment. Some emulation systems host a guest operating system in an emulator, and then run programs in a software layer atop the emulated execution of the guest operating system. This emulation approach may suffer from reduced performance due to multiple layers of translation required to execute the software. Other emulation systems emulate both the hardware and operating system functions of a first architecture, to allow applications written to run on the first architecture to operate in an emulated runtime environment. While this approach can reduce the number of layers in the emulation stack, it may also suffer from poor performance due to the overhead of emulating system or library calls, or other complex system functions. In addition to emulating the instructions, an emulator may also support memory management and system data structures, to ensure interoperability with programs running in a legacy environment. Performance of the emulator can be improved by optimizing the emulator's implementation of individual legacy hardware and software instructions. However, even an optimized emulator will typically not perform as well as a similarly optimized native application.
A load module compiler that could receive as input a compiled legacy load module, such as a Cobol load module compiled for a System/390 mainframe, and generate as output an executable program that could run on a 64-bit x86 architecture while keeping external references accessible, would enable the migration of mainframe computing jobs to a non-mainframe environment without rewriting or recompiling the original Cobol source code. However, compilers that retarget executable code configured to execute in accordance with a first instruction set and environment into executable code configured to execute in accordance with a second instruction set and environment can be closely coupled to both the first and the second instruction sets.
Translating code from one computer architecture to another typically introduces changes that break interfaces between previously interoperable programs. For example, callback programs pass the addresses of routines or functions as parameters so that a receiving program can invoke a routine in the calling program. Translation through decompilation and recompilation to a different target architecture will typically change these addresses, and may change the size of address operands, disrupting the ability of the receiving program to invoke the remote routine. Parsing the received program, rather than decompiling it, may enable program translation, but the challenges of address and operand translation remain.
One previous load module compiler transcompiled load modules from executable legacy code to x86 code by translating individual functions into C-language macro calls and compiling each such function into x86 code, which was subsequently linked into an executable. Whereas an interpreter executed between 200 and 300 instructions in emulation for each legacy instruction, the code generated by the load module compiler for a given legacy instruction used far fewer instructions, resulting in considerable performance gains. Methods for transcompiling executable programs, including translating the mappings required to correctly invoke external references, are discussed in U.S. Pat. No. 10,713,024, titled Load Module Compiler, which is incorporated by reference herein in its entirety. However, the need for a priori compilation by the load module compiler limited its flexibility.
A just-in-time load module compiler increased flexibility, but required recompilation each time a load module was run, and could not be used in cases of self-modifying programs.
Some executable programs are self-modifying, either for performance reasons or for interoperability with other programs. Such self-modifying programs blur the distinction between program data and program instructions, and are difficult to parse or to compile using just-in-time compilation techniques. The possibility that a self-modifying program may be present in a set of load modules presents an obstacle to just-in-time compilation. Whether a program modifies its own instructions may not be known until runtime, which further complicates identifying code that is suitable for execution by a load module compiler. In many applications, some program code is seldom, if ever, executed, and just-in-time compiling code that is never or only rarely executed is inefficient and reduces overall system performance. It would be desirable for a load module compiler to support self-modifying application code, to avoid having to restrict its use to programs known not to be self-modifying.
In order to take advantage of the performance benefits of a load module compiler, while ensuring that programs that are unsuitable for JIT compilation can still be executed on the alternate architecture, a load module compiler can be configured to share an operating environment with an emulation system. An example of such a hybrid environment is described in U.S. Pat. No. 9,779,034, titled Protection Key Management and Prefixing in Virtual Address Space Emulation System, which is incorporated herein by reference in its entirety. A load module compiler described in that environment included a decompiler that translated each legacy instruction of a Cobol load module into a macro conforming to the rules of the C programming language. This intermediate representation of legacy instructions could then be processed by the back-end compiler of the load module compiler to produce x86 instructions. Because the compiler implements the behavior of each legacy instruction individually, opportunities for optimization across sets of instructions are missed.
In addition, where the intermediate representation of the load module compiler is proprietary, care must be taken to ensure that the state of the system when running code generated by the load module compiler is consistent with the state of the system when running the corresponding code in emulation. Moreover, a proprietary intermediate representation format limits some optimizations that the load module compiler might otherwise perform, and increases the compiler development work needed to retarget load modules to a different version of the x86 architecture, to the ARM architecture, or to another architecture, since each retargeting requires custom development of back-end compiler code that translates the proprietary intermediate representation into native instructions.
The present disclosure provides a method for constructing a library of transformation functions for translating legacy programs from a source architecture to a target architecture, the method including: providing a first library of transformation functions that each transform a statement in a legacy executable program into a representation in an intermediate representation; receiving a load module; obtaining an original legacy instruction or legacy system call from the load module in a first system architecture; obtaining a function from said legacy function library, the function being in an intermediate representation of code for implementing the legacy function; inserting said function obtained from said library for said original legacy instruction or legacy system call into an intermediate representation of a basic block; inserting labels corresponding to said function into an index associated with said basic block; and storing said intermediate representation of said basic block and said index into a second library of transformation functions, wherein each transformation function of said second library represents a basic block encoded in an intermediate representation.
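The library-construction steps above can be illustrated with a minimal sketch. It assumes a toy first library keyed by legacy opcode, with plain strings standing in for LLVM IR text; the template contents, the `@legacy_*` function names, the `L_<address>` label scheme, and the `(program id, start address)` key of the second library are all hypothetical choices for illustration, not the disclosed implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical first library: legacy opcode -> IR template lines.
# Plain strings stand in for LLVM IR text; they are not meant to be valid IR.
FIRST_LIBRARY: Dict[str, List[str]] = {
    "L": ["call void @legacy_L(ptr %cpu, i32 {r1}, i64 {ea})"],
    "A": ["call void @legacy_A(ptr %cpu, i32 {r1}, i64 {ea})"],
    "ST": ["call void @legacy_ST(ptr %cpu, i32 {r1}, i64 {ea})"],
}


@dataclass
class BasicBlockIR:
    start_address: int
    ir_lines: List[str] = field(default_factory=list)
    index: Dict[str, int] = field(default_factory=dict)  # label -> IR line number


def build_block(start_address: int,
                instructions: List[Tuple[int, str, int, int]]) -> BasicBlockIR:
    """Translate (address, opcode, r1, effective_address) tuples into one IR block."""
    block = BasicBlockIR(start_address)
    for address, opcode, r1, ea in instructions:
        template = FIRST_LIBRARY[opcode]  # KeyError: no library function for opcode
        label = f"L_{address:06X}"       # one label per legacy instruction address
        block.index[label] = len(block.ir_lines)
        block.ir_lines.append(f"{label}:")
        block.ir_lines.extend(line.format(r1=r1, ea=ea) for line in template)
    return block


# Second library: one stored IR object (with its label index) per basic block.
SECOND_LIBRARY: Dict[Tuple[str, int], BasicBlockIR] = {}


def store_block(program_id: str, block: BasicBlockIR) -> None:
    SECOND_LIBRARY[(program_id, block.start_address)] = block


# Usage: three legacy instructions become one indexed IR block.
blk = build_block(0x1000, [(0x1000, "L", 3, 0x2000),
                           (0x1004, "A", 3, 0x2004),
                           (0x1008, "ST", 3, 0x2008)])
store_block("PAYROLL", blk)
```

Keeping the label index beside the IR is what later allows a runtime to enter the block at any labeled legacy address rather than only at its start.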
The method may further include any of the following additional steps or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: further including parsing a sequence of instructions or function calls of said load module by a parser; wherein said parsing includes identifying each instruction or function call in a basic block of said load module; wherein said basic block includes a sequence of instructions beginning with an entry point and continuing to a branch instruction; wherein said basic block includes a sequence of instructions beginning with an entry point and continuing to an instruction that branches to an address not already identified within the basic block; wherein said basic block includes a sequence of instructions beginning with an entry point, and continuing to the earlier of a branch instruction or the point at which a predefined threshold number of instructions is included in the basic block; or wherein the instructions of said basic block include a sequence of instructions beginning with an entry point, and continuing until a state saving operation is detected within a CSECT of the legacy executable program.
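The basic-block boundary rules listed above (stop at a branch, optionally continue past a branch whose target is already inside the block, stop at a threshold instruction count, or stop at a state-saving operation) can be sketched as follows. The opcode sets are illustrative assumptions: treating `STM` as the state-saving operation is a hypothetical choice, and a real parser would decode the full source instruction set rather than take pre-decoded `Insn` tuples.

```python
from typing import Dict, List, NamedTuple, Optional

# Illustrative opcode classes; a real parser would decode the full instruction set.
BRANCH_OPCODES = {"B", "BC", "BCR", "BAL", "BALR"}
STATE_SAVING_OPCODES = {"STM"}  # assumption: STM marks a state-saving operation


class Insn(NamedTuple):
    address: int
    length: int                   # instruction length in bytes
    opcode: str
    target: Optional[int] = None  # branch target when statically known


def find_basic_block(insns: Dict[int, Insn], entry: int, max_insns: int = 64,
                     allow_internal_branches: bool = False) -> List[Insn]:
    """Collect instructions from entry until a branch, a state-saving op,
    or max_insns instructions, whichever comes first."""
    block: List[Insn] = []
    address = entry
    while address in insns and len(block) < max_insns:
        insn = insns[address]
        block.append(insn)
        if insn.opcode in STATE_SAVING_OPCODES:
            break  # end the block at the state-saving instruction
        if insn.opcode in BRANCH_OPCODES:
            inside = {i.address for i in block}
            # Optionally keep going past a branch whose target is already in the block;
            # an unknown (register) target is treated as leaving the block.
            if not (allow_internal_branches and insn.target in inside):
                break
        address += insn.length
    return block


# Usage: a tiny decoded program keyed by address.
prog = {i.address: i for i in [Insn(0x00, 4, "L"), Insn(0x04, 4, "BC", 0x00),
                               Insn(0x08, 4, "STM"), Insn(0x0C, 4, "A")]}
```

With the default settings the block from address 0 ends at the `BC`; allowing internal branches extends it to the `STM`, where the state-saving rule ends it.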
The present disclosure further provides a non-transitory computer readable medium configured to store instructions, the instructions when executed by one or more processors to cause the one or more processors to execute a first legacy executable program compiled for a first, source architecture on a machine having a target architecture different from the source architecture by performing steps including: providing a first library of transformation functions that each transform a statement in said legacy executable program into a representation in an intermediate representation; receiving a load module; obtaining an original legacy instruction or legacy system call from the load module in said first system architecture; obtaining a function from said legacy function library, the function being in an intermediate representation of code for implementing the legacy function; inserting said function obtained from said library for said original legacy instruction or legacy system call into an intermediate representation of a basic block; inserting labels corresponding to said function into an index associated with said basic block; and storing said intermediate representation of said basic block and said index into a second library of transformation functions, wherein each transformation function of said second library represents a basic block encoded in an intermediate representation.
The non-transitory computer readable medium may further include any of the following components, steps, or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: further including parsing a sequence of instructions or function calls of said load module by a parser, to identify each instruction or function call in a basic block; or wherein said basic block includes a sequence of instructions beginning with an entry point, and continuing to the earlier of a branch instruction whose target lies outside the basic block, or until a predefined threshold number of instructions is included in the basic block.
The present disclosure further provides a method of generating a library of intermediate representations of basic blocks of a first program, compiled for a source architecture having an instruction set that differs from the instruction set of a target architecture, for use by a load module compiler, the method including: providing a first library of legacy functions that each transform one or more instructions of said source architecture into an intermediate representation; generating by a decompiler, an indicator of the compiler type used to compile said first program according to said source architecture using metadata associated with said first program; based on said indicator, identifying by the decompiler, a set of instructions to initialize the first program; replacing said set of instructions with an intermediate representation of an initialization routine; parsing said first program by said decompiler, to identify sequences of instructions and system calls corresponding to a basic block of said first program; replacing said sequences of instructions and system calls, by in-lining functions from said first library into an object corresponding to said basic block; and storing the intermediate representation of said basic block in a second library.
The method may further include repeating the steps of parsing and replacing said sets of sequences for each basic block of the first program to be compiled by the load module compiler.
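The role of the compiler-type indicator in the decompiler can be sketched as follows, under loudly stated assumptions: the metadata field name `identification`, the marker strings `COBOL-A`/`COBOL-B`, the prologue opcode sequences, and the `@cobol_*_init` routines are all hypothetical placeholders. Real markers and initialization sequences would have to be derived from the actual legacy compilers' output.

```python
from typing import Dict, List, Optional

# Hypothetical markers assumed to appear in load-module metadata, and the
# prologue each compiler is assumed to emit. Real values would come from
# studying the actual compilers' output.
COMPILER_MARKERS: Dict[str, str] = {"COBOL-A": "cobol_a", "COBOL-B": "cobol_b"}
PROLOGUES: Dict[str, List[str]] = {
    "cobol_a": ["STM", "BALR", "L", "ST"],
    "cobol_b": ["STM", "LR", "L"],
}
INIT_ROUTINES: Dict[str, str] = {
    "cobol_a": "call void @cobol_a_init(ptr %cpu)",
    "cobol_b": "call void @cobol_b_init(ptr %cpu)",
}


def compiler_indicator(metadata: Dict[str, str]) -> Optional[str]:
    """Derive a compiler-type indicator from program metadata, if recognizable."""
    ident = metadata.get("identification", "")
    for marker, indicator in COMPILER_MARKERS.items():
        if marker in ident:
            return indicator
    return None


def decompile_block(opcodes: List[str], indicator: Optional[str],
                    library: Dict[str, str]) -> List[str]:
    """Replace a recognized initialization sequence with one IR call,
    then in-line library IR for the remaining instructions."""
    ir: List[str] = []
    prologue = PROLOGUES.get(indicator) if indicator else None
    if prologue and opcodes[:len(prologue)] == prologue:
        ir.append(INIT_ROUTINES[indicator])
        opcodes = opcodes[len(prologue):]
    ir.extend(library[op] for op in opcodes)
    return ir


# Usage with a toy first library (strings stand in for IR functions).
LIB = {op: f"call void @legacy_{op}(ptr %cpu)" for op in ["STM", "BALR", "L", "ST", "A", "LR"]}
ind = compiler_indicator({"identification": "PGM01 COBOL-A 1994"})
ir = decompile_block(["STM", "BALR", "L", "ST", "A"], ind, LIB)
```

When no indicator is found, every instruction is simply in-lined from the library, so the indicator only enables an optimization and never changes correctness.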
The present disclosure further provides a non-transitory computer readable medium configured to store instructions, the instructions when executed by one or more processors to cause the one or more processors to generate a library of intermediate representations of basic blocks of a first program compiled for a source architecture having an instruction set that differs from the instruction set of a target architecture, for use by a load module compiler, the generating of the library including: providing a first library of legacy functions that each transform one or more instructions of said source architecture into an intermediate representation; generating by a decompiler, an indicator of the compiler type used to compile said first program according to said source architecture using metadata associated with said first program; based on said indicator, identifying by the decompiler, a set of instructions to initialize the first program; replacing said set of instructions with an intermediate representation of an initialization routine; parsing said first program by said decompiler, to identify a sequence of instructions and system calls corresponding to a basic block of said first program; replacing said sequences of instructions and system calls, by in-lining functions from said first library into an object corresponding to said basic block; and storing the intermediate representation of said basic block in a second library.
The non-transitory computer readable medium may further include any of the following components, steps, or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: wherein said generating further includes repeating the steps of parsing and replacing said sets of sequences for each basic block of the first program to be compiled by the load module compiler; wherein said identifying a sequence of instructions by the decompiler includes selecting a sequence of instructions using a predefined parameter that specifies a maximum number of instructions permitted in a basic block; wherein said identifying a sequence of instructions by the decompiler includes identifying a branch to an instruction whose address lies outside the range of instructions determined to lie within the basic block; wherein said identifying a sequence of instructions by the decompiler further includes identifying a state saving operation within the specified maximum number of instructions, and ending the basic block at the state saving instruction; or wherein said identifying a state saving operation includes a memory write operation.
The present disclosure further provides a method of executing a first program compiled for a source architecture on a processor having a target architecture different from the source architecture, the method including: providing a library of legacy functions that each transform one or more instructions of said source architecture into an intermediate representation; parsing said first program to identify a sequence of instructions of said source architecture including a basic block; replacing the instructions with functions of said library, to generate an intermediate representation of said basic block; storing the intermediate representation of said basic block in a store; compiling by a back-end compiler, said intermediate representation of said basic block, into a representation of said basic block in said target architecture; storing said compiled representation of said basic block in a cache indexed by processor type; retrieving said compiled representation of said basic block from said cache; and linking said basic block in a runtime environment, said runtime environment configured for execution of instructions in accordance with said target architecture.
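The compile-and-cache steps above can be sketched with a cache keyed by processor type. In this sketch the back ends are stand-in lambdas that only tag the IR; a real system would invoke an actual code generator (for example an LLVM back end) per target. The processor-type strings, the program identifier, and the `(processor type, program id, address)` key are illustrative assumptions.

```python
from typing import Callable, Dict, Optional, Tuple

# A back end turns IR text into native code bytes.
Backend = Callable[[str], bytes]


class CompiledBlockCache:
    """Compiled basic blocks, indexed by (processor type, program id, block address)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str, int], bytes] = {}

    def store(self, processor_type: str, program_id: str, address: int, code: bytes) -> None:
        self._entries[(processor_type, program_id, address)] = code

    def retrieve(self, processor_type: str, program_id: str, address: int) -> Optional[bytes]:
        return self._entries.get((processor_type, program_id, address))


def compile_block(ir: str, program_id: str, address: int,
                  backends: Dict[str, Backend], cache: CompiledBlockCache) -> None:
    """Compile one stored IR block with every configured back end."""
    for processor_type, backend in backends.items():
        cache.store(processor_type, program_id, address, backend(ir))


# Stand-in back ends that only tag the IR; they do not generate real code.
backends: Dict[str, Backend] = {
    "x86_64": lambda ir: b"x86_64:" + ir.encode(),
    "aarch64": lambda ir: b"aarch64:" + ir.encode(),
}
cache = CompiledBlockCache()
compile_block("call void @legacy_L(ptr %cpu)", "PAYROLL", 0x1000, backends, cache)
code = cache.retrieve("x86_64", "PAYROLL", 0x1000)  # then linked into the runtime
```

Because the IR is target-neutral, supporting another processor type only means adding a back end; the stored IR blocks are reused unchanged.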
The method may further include any of the following additional steps or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: wherein said legacy functions of said library of legacy functions include functions of an interpreter, compiled into an intermediate representation; wherein said intermediate representation includes an LLVM-IR representation; wherein said legacy functions of said library of legacy functions further include one or more initialization functions; further including obtaining from metadata associated with said first program, an indication of the compiler type used to compile said first program into executable form according to said source architecture; further including using an indicator of the compiler type used to compile the first program into executable form according to said source architecture, to enable optimization by a decompiler; wherein said decompiler replaces a plurality of instructions of said next basic block in accordance with said source architecture, with an intermediate representation of an initialization routine in said intermediate representation of said basic block, based upon said indicator; wherein said decompiler replaces a plurality of instructions of said next basic block in accordance with said source architecture, with an intermediate representation of an input-output routine in said intermediate representation of said basic block, based upon said indicator; further including compiling by a second back-end compiler, said intermediate representation of said basic block, into a second representation of said basic block in said target architecture, and storing said compiled basic block in said cache, with an index entry indicating the processor type associated with said second back-end compiler; further including removing by said back-end compiler an instruction to store a value in memory and a corresponding instruction to retrieve the value from memory from code in in-lined functions of the basic block; further including removing by said back-end compiler instructions to push parameters onto the stack, and pop parameters from the stack, in functions of the basic block; further including substituting the values of constants directly into the code of the basic block; further including removing, by the back-end compiler of the load module compiler, code that generates return values that are not used by the calling function; further including using the indication of compiler type to enable the replacement of a legacy ABI call with a call to an optimized external function; or further including detecting an attempt by the basic block to write to its own program storage area, permitting the write, incrementing an indicator of writes to the program storage area by the basic block, and directing the runtime to execute a non-cached copy of the modified basic block.
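The last embodiment above (permit a self-write, count it, and stop using the cached translation) can be sketched as a small runtime hook. This is a simplified model under stated assumptions: the class and method names are hypothetical, a block's program storage is modeled as a `bytearray`, and trapping the write through memory protection and unrolling the faulting instruction are not modeled.

```python
from typing import Dict, Tuple


class SelfModificationMonitor:
    """Hypothetical runtime hook for basic blocks that write to their own code."""

    def __init__(self, compiled_cache: Dict[str, bytes]) -> None:
        self.compiled_cache = compiled_cache               # block id -> cached native code
        self.program_storage: Dict[str, bytearray] = {}    # block id -> legacy code bytes
        self.write_counts: Dict[str, int] = {}             # block id -> self-writes seen

    def load(self, block_id: str, legacy_code: bytes) -> None:
        self.program_storage[block_id] = bytearray(legacy_code)

    def handle_self_write(self, block_id: str, offset: int, data: bytes) -> None:
        storage = self.program_storage[block_id]
        if offset < 0 or offset + len(data) > len(storage):
            raise IndexError("write falls outside the block's program storage")
        storage[offset:offset + len(data)] = data          # permit the write
        self.write_counts[block_id] = self.write_counts.get(block_id, 0) + 1

    def code_to_execute(self, block_id: str) -> Tuple[str, bytes]:
        if self.write_counts.get(block_id, 0):
            # Cached translation is stale: use the modified in-memory copy
            # (to be recompiled or interpreted) instead of the cache copy.
            return "in_memory", bytes(self.program_storage[block_id])
        return "cached", self.compiled_cache[block_id]


# Usage: one block whose second byte is overwritten by the program itself.
mon = SelfModificationMonitor({"B1": b"<native B1>"})
mon.load("B1", bytes([0x58, 0x30, 0x20, 0x00]))
before = mon.code_to_execute("B1")
mon.handle_self_write("B1", 1, b"\x40")
after = mon.code_to_execute("B1")
```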
The present disclosure further provides a non-transitory computer readable medium configured to store instructions, the instructions when executed by one or more processors to cause the one or more processors to execute a first program compiled for a source architecture on a processor having a target architecture different from the source architecture by performing steps including: providing a library of legacy functions that each transform one or more instructions of said source architecture into an intermediate representation; parsing said first program to identify a sequence of instructions of said source architecture including a basic block; replacing the instructions with functions of said library, to generate an intermediate representation of said basic block; storing the intermediate representation of said basic block in a store; compiling by a back-end compiler, said intermediate representation of said basic block, into a representation of said basic block in said target architecture; storing said compiled representation of said basic block in a cache indexed by processor type; retrieving said compiled representation of said basic block from said cache; and linking said basic block in a runtime environment, said runtime environment configured for execution of instructions in accordance with said target architecture.
The non-transitory computer readable medium may further include any of the following components, steps, or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: wherein said legacy functions of said library of legacy functions include functions of an interpreter, compiled into an intermediate representation; wherein said intermediate representation includes an LLVM-IR representation; wherein said legacy functions of said library of legacy functions further include one or more initialization functions; further including obtaining from metadata associated with said first program, an indication of the compiler type used to compile said first program into executable form according to said source architecture; further including using an indicator of the compiler type used to compile the first program into executable form according to said source architecture, to enable optimization by a decompiler; wherein said decompiler replaces a plurality of instructions of said next basic block in accordance with said source architecture, with an intermediate representation of an initialization routine in said intermediate representation of said basic block, based upon said indicator; wherein said decompiler replaces a plurality of instructions of said next basic block in accordance with said source architecture, with an intermediate representation of an input-output routine in said intermediate representation of said basic block, based upon said indicator; further including detecting an attempt by the basic block to write to its own program storage area, permitting the write, incrementing an indicator of writes to the program storage area by the basic block, and directing the runtime to execute a non-cached copy of the modified basic block; further including unrolling the execution of the previous instruction of the basic block before said directing.
The present disclosure provides a method of executing a first program compiled for a source architecture on a processor having a target architecture different from the source architecture, the method including: providing a cache of compiled basic blocks, wherein each said compiled basic block is a representation of a basic block of said first program, translated from said source architecture into said target architecture; determining whether a next basic block for the execution of said program by a runtime environment having said target architecture is present in said cache, and whether said cached basic block includes a label required for execution by said runtime; upon determining that said label is present in said next basic block, linking said basic block in the runtime environment and executing the basic block; upon determining that said next basic block is not present in said cache, or that said next basic block is missing said label, initiating a process by a decompiler to identify the next basic block in said first program, and to translate said basic block into an intermediate representation; compiling the intermediate representation of said basic block into an executable for said target architecture, and storing said compiled basic block in said cache.
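The cache-first dispatch described above can be sketched as follows. The `translate` callback stands in for the decompiler plus back-end compiler, and `execute` stands in for linking and running a block and returning the next legacy address; both, along with the `(processor type, address)` cache key, are illustrative assumptions. Each compiled block is registered under every label it exposes, so a later entry at an interior label is a cache hit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class CompiledBlock:
    code: str                 # stand-in for linked native code
    labels: FrozenSet[int]    # legacy addresses usable as entry points


Cache = Dict[Tuple[str, int], CompiledBlock]
Translate = Callable[[int, str], CompiledBlock]          # decompile + back-end compile
Execute = Callable[[CompiledBlock, int], Optional[int]]  # returns next address or None


def next_block(address: int, processor_type: str, cache: Cache,
               translate: Translate) -> Tuple[CompiledBlock, bool]:
    block = cache.get((processor_type, address))
    if block is not None and address in block.labels:
        return block, True                      # hit: cached block has the label
    block = translate(address, processor_type)  # miss: decompile and compile
    for label in block.labels:                  # store, reachable via each label
        cache[(processor_type, label)] = block
    return block, False


def run(entry: int, processor_type: str, cache: Cache, translate: Translate,
        execute: Execute, max_blocks: int = 10_000) -> int:
    """Dispatch loop; returns the number of cache hits (for illustration)."""
    address: Optional[int] = entry
    hits = 0
    for _ in range(max_blocks):
        if address is None:
            break
        block, hit = next_block(address, processor_type, cache, translate)
        hits += hit
        address = execute(block, address)
    return hits


# Usage: one translated block with labels 0x100 and 0x104.
calls = []

def translate(addr: int, proc: str) -> CompiledBlock:
    calls.append(addr)
    return CompiledBlock(f"{proc}:{addr:#x}", frozenset({addr, addr + 4}))

successor = {0x100: 0x104, 0x104: None}

def execute(block: CompiledBlock, addr: int) -> Optional[int]:
    return successor[addr]

cache: Cache = {}
hits = run(0x100, "x86_64", cache, translate, execute)
```

On the first run only the entry address misses; entering at `0x104` hits because that label was indexed when the block was compiled.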
The method may further include any of the following additional steps or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: wherein said storing includes storing an object, object ID, and an indicator of said target architecture in said cache; wherein the entries of said cache are indexed by target architecture; wherein the entries of said cache are indexed by said program; wherein the entries of said cache are indexed by an identifier of a CSECT in said program; wherein the entries of said cache are indexed by an identifier of the target architecture, a hash of said program, an identifier of a CSECT in said program, and an instruction address; wherein said next basic block begins with a next instruction of said first program, and continues until a subsequent branch instruction; wherein said next basic block begins with a next instruction of said first program, and continues through subsequent instructions of said first program, until a branch to an address outside the range of addresses of said next instruction and said subsequent instructions; wherein the next basic block begins with a next instruction of said first program, and continues until a predetermined number of branch instructions have been parsed; wherein the next basic block begins with a next instruction of said first program, and continues through a predetermined number of instructions; further including detecting, by a memory management unit of said runtime, an attempt by said next basic block to write to the program storage area; further including permitting said next basic block to write to the program storage area, unrolling the execution of said next basic block, recompiling said next basic block to incorporate the modified program instruction, and storing the compiled, modified next basic block in memory associated with the runtime; further including determining that the number of modifications the basic block has made to itself is less than a predefined threshold value; further including detecting that an instruction modification flag is not set and directing the dispatch of execution flow to an interpreter; wherein said translating said basic block into an intermediate representation includes in-lining functions from a library of functions that implement instructions and system calls of the first system architecture, wherein said functions in said library are stored in an intermediate representation; wherein said determining that the next basic block is in the cache includes a lookup based on the instruction address and the processor type; wherein said lookup is further based on an identifier associated with the CSECT in which the next basic block resides; further including detecting that a flag has been set indicating that a basic block has modified itself, and linking and executing an in-memory copy of the modified basic block, rather than the cached copy, for the modified basic block; further including detecting an attempt by the basic block to write to its own program storage area, permitting the write, incrementing an indicator of writes to the program storage area by the basic block, and directing the runtime to execute a non-cached copy of the modified basic block; or further including unrolling the execution of the previous instruction of the basic block before said directing.
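The composite cache index listed above (target architecture, a hash of the program, a CSECT identifier, and an instruction address) can be sketched with a named tuple and SHA-256. The choice of SHA-256 is an assumption, as is the recompile-or-interpret policy in `on_self_modification`: the text states only that the modification count is compared against a predefined threshold, so treating the over-threshold case as an interpreter fallback is a hypothetical reading.

```python
import hashlib
from typing import NamedTuple


class CacheKey(NamedTuple):
    target_arch: str     # e.g. "x86_64"
    program_hash: str    # content hash of the load module
    csect: str           # CSECT containing the block
    address: int         # legacy instruction address of the block entry


def make_key(target_arch: str, program_image: bytes, csect: str, address: int) -> CacheKey:
    # Hashing the program image (not just its name) means a rebuilt or patched
    # load module will not, barring a hash collision, hit entries compiled
    # from an older image.
    return CacheKey(target_arch, hashlib.sha256(program_image).hexdigest(), csect, address)


def on_self_modification(modification_count: int, threshold: int) -> str:
    """Assumed policy: recompile while a block rarely modifies itself,
    otherwise fall back to the interpreter."""
    return "recompile" if modification_count < threshold else "interpret"
```

Because `CacheKey` is a tuple, it can be used directly as a dictionary key, and the field order mirrors the order in which the identifiers are listed in the text.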
The present disclosure provides a non-transitory computer readable medium configured to store instructions, the instructions when executed by one or more processors to cause the one or more processors to execute a first program compiled for a source architecture on a processor having a target architecture different from the source architecture by performing steps including:
The non-transitory computer readable medium may further include any of the following components, steps, or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: wherein said storing includes storing an object, an object ID, and an indicator of said target architecture in said cache; wherein the entries of said cache are indexed by target architecture; wherein the entries of said cache are indexed by said program; wherein the entries of said cache are indexed by an identifier of a CSECT in said program; wherein the entries of said cache are indexed by an identifier of the target architecture, a hash of said program, an identifier of a CSECT in said program, and an instruction address; wherein said next basic block begins with a next instruction of said first program and continues until a subsequent branch instruction; wherein said next basic block begins with a next instruction of said first program and continues through subsequent instructions of said first program until a branch to an address outside the range of addresses of said next instruction and said subsequent instructions; wherein the next basic block begins with a next instruction of said first program and continues until a predetermined number of branch instructions have been parsed; wherein the next basic block begins with a next instruction of said first program and continues through a predetermined number of instructions; further including detecting, by a memory management unit of said runtime, an attempt by said next basic block to write to the program storage area; further including permitting said next basic block to write to the program storage area, unrolling the execution of said next basic block, recompiling said next basic block to incorporate the modified program instruction, and storing the compiled, modified next basic block in memory associated with the runtime; further including determining that the number of modifications the basic block has made to itself is less than a predefined threshold value; further including detecting that an instruction modification flag is not set and directing the dispatch of execution flow to an interpreter; wherein said translating said basic block into an intermediate representation includes in-lining functions from a library of functions that implement instructions and system calls of the first system architecture, wherein said functions in said library are stored in an intermediate representation; wherein said determining that the next basic block is in the cache includes a lookup based on the instruction address and the processor type; wherein said lookup is further based on an identifier associated with the CSECT in which the next basic block resides; further including detecting that a flag has been set indicating that a basic block has modified itself, and linking and executing an in-memory copy of the modified basic block, rather than the cache copy, for the modified basic block; or further including detecting an attempt by the basic block to write to its own program storage area, permitting the write, incrementing an indicator of writes to the program storage area by the basic block, and directing the runtime to execute a non-cached copy of the modified basic block; further including unrolling the execution of the previous instruction of the basic block before said directing.
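The following Python sketch illustrates the cache indexing described above. It assumes the cache is a simple key-value store. `BlockKey`, `hash_program`, and `CompiledBlockCache` are hypothetical names, and SHA-256 is an assumed choice of program hash. The key combines the target architecture, a hash of the program, the CSECT identifier, and the instruction address. As a result, a changed load module or a different host architecture produces a cache miss instead of retrieving an incompatible compiled object.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BlockKey:
    """Hypothetical cache key built from the four index fields named above."""
    target_arch: str   # identifier of the target architecture, e.g. "x86_64"
    program_hash: str  # content hash of the first program (load module image)
    csect_id: str      # identifier of the CSECT in which the block resides
    address: int       # source-architecture address of the block's first instruction


def hash_program(image: bytes) -> str:
    # SHA-256 is an assumption; any stable content hash would serve.
    return hashlib.sha256(image).hexdigest()


class CompiledBlockCache:
    """In-memory stand-in for the cache of compiled basic blocks."""

    def __init__(self) -> None:
        self._entries: Dict[BlockKey, bytes] = {}

    def store(self, key: BlockKey, native_object: bytes) -> None:
        # Store the compiled object under its full four-part key.
        self._entries[key] = native_object

    def lookup(self, key: BlockKey) -> Optional[bytes]:
        # A miss means the block must be decompiled and compiled, or interpreted.
        return self._entries.get(key)
```

A cache shared across hosts could use the same key unchanged. Including the target architecture lets objects for several architectures coexist in one store.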
The present disclosure further provides a method of executing a first program compiled for a source architecture on a system with one or more processors having a target architecture different from the source architecture, the method including:
The method may further include any of the following additional steps or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: wherein said detecting an attempt to write to a memory location in a memory block containing a compiled program instruction includes detecting an attempt to write to protected storage; further including unprotecting said memory block containing a compiled program instruction; wherein the data structure is a bit map of the address space of the first program; wherein the linked basic block was copied from a cache of compiled basic blocks and placed in protected storage to designate the memory blocks containing the compiled basic blocks as part of a program storage area; further including recompiling, by the load module compiler, the basic block whose program instruction was modified; further including: after creating said data structure, modifying the memory write routine of the runtime environment to: (a) determine whether the data structure exists, and (b) upon determining that the data structure exists, initiate a routine to determine, using the data structure, whether a write is a write to a program instruction in the program storage area; or further including: determining at runtime whether to invoke memory protection handler code capable of performing said step of determining whether a data structure that stores indications of writes to the program storage area exists.
The present disclosure further provides a non-transitory computer readable medium configured to store instructions, the instructions, when executed by one or more processors, causing the one or more processors to execute a first program compiled for a source architecture on a system with one or more processors having a target architecture different from the source architecture by performing the steps including: initiating execution of the first program in a runtime environment configured to support the execution of programs compiled for execution in the source architecture, said runtime environment operating on one or more of said processors having said target architecture; after the first program has started execution, linking to the first program a basic block compiled by a load module compiler; detecting an attempt by an instruction of said basic block to write to a memory location in a memory block containing a compiled instruction of said first program; determining whether a data structure that stores indications of the addresses of instructions compiled by the load module compiler and linked to the first program exists; upon determining that the data structure does not exist, creating the data structure and populating it with indications of the locations of the instructions compiled by the load module compiler and linked to the first program; allowing execution of the write instruction to modify contents of said memory block containing a compiled instruction of said first program; determining, based on the contents of the data structure, whether the write instruction modified a program instruction; and upon determining that the write instruction modified an instruction of said first program, incrementing a counter associated with the basic block whose instruction was modified, or upon determining that the write instruction did not modify an instruction of said first program, continuing execution of the first program.
The non-transitory computer readable medium may further include any of the following components, steps, or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: wherein said detecting an attempt to write to a memory location in a memory block containing a compiled program instruction includes detecting an attempt to write to protected storage; further including unprotecting said memory block containing a compiled program instruction; wherein the data structure is a bit map of the address space of the first program; wherein the linked basic block was copied from a cache of compiled basic blocks and placed in protected storage to designate the memory blocks containing the compiled basic blocks as part of a program storage area; further including recompiling, by the load module compiler, the basic block whose program instruction was modified; further including: after creating said data structure, modifying the memory write routine of the runtime environment to: (a) determine whether the data structure exists, and (b) upon determining that the data structure exists, initiate a routine to determine, using the data structure, whether a write is a write to a program instruction in the program storage area; or further including: determining at runtime whether to invoke memory protection handler code capable of performing said step of determining whether a data structure that stores indications of writes to the program storage area exists.
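The following sketch shows the write-detection path described above. It assumes a byte-addressed emulated memory and one bit per byte in the bit map. Page-level storage protection only reveals that a protected page was written. The bit map, created lazily on the first trapped write, narrows this down to whether the written byte belongs to an instruction compiled by the load module compiler. `WriteMonitor` and `LinkedBlock` are hypothetical names. The `threshold` policy (recompile below the threshold, interpret at or above it) is an assumed reading of the threshold and interpreter-fallback embodiments, not a definitive implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LinkedBlock:
    block_id: int
    start: int  # first byte address of the block's source instructions
    end: int    # one past the last byte


class WriteMonitor:
    """Hypothetical handler invoked when a write hits protected program storage."""

    def __init__(self, memory_size: int, blocks: List[LinkedBlock], threshold: int = 3) -> None:
        self.memory = bytearray(memory_size)
        self.blocks = blocks
        self.threshold = threshold
        self.bitmap: Optional[bytearray] = None  # created on the first trapped write
        self.modification_counts: Dict[int, int] = {}

    def _build_bitmap(self) -> bytearray:
        # One bit per byte of the first program's address space; a set bit marks a
        # byte belonging to an instruction compiled by the load module compiler.
        bitmap = bytearray((len(self.memory) + 7) // 8)
        for block in self.blocks:
            for addr in range(block.start, block.end):
                bitmap[addr >> 3] |= 1 << (addr & 7)
        return bitmap

    def _is_instruction(self, addr: int) -> bool:
        assert self.bitmap is not None
        return bool(self.bitmap[addr >> 3] & (1 << (addr & 7)))

    def on_protected_write(self, addr: int, value: int) -> Tuple[str, Optional[int]]:
        if self.bitmap is None:
            self.bitmap = self._build_bitmap()
        self.memory[addr] = value  # allow the write to proceed
        if not self._is_instruction(addr):
            return ("continue", None)  # a data write that merely shared a protected page
        block = next(b for b in self.blocks if b.start <= addr < b.end)
        count = self.modification_counts.get(block.block_id, 0) + 1
        self.modification_counts[block.block_id] = count
        # Assumed policy: recompile a block that rarely modifies itself,
        # and fall back to the interpreter once it reaches the threshold.
        action = "recompile" if count < self.threshold else "interpret"
        return (action, block.block_id)
```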
The present disclosure further provides a method of executing a first program compiled for a source architecture on a processor having a target architecture different from the source architecture, the method including: providing a library of legacy functions in an intermediate representation, wherein the legacy functions implement one or more instructions, language functions, or runtime functions of said source architecture; selecting, by a decompiler, a sequence of instructions of said first program compiled for said source architecture, wherein the sequence includes a basic block; identifying sets of one or more instructions in said basic block that correspond to one or more functions of said library; replacing the identified sets of one or more instructions with their corresponding library functions to generate an intermediate representation of said basic block; storing the intermediate representation of said basic block in a store; compiling, by a back-end compiler, said intermediate representation of said basic block into a representation of said basic block compiled for said target architecture; storing said compiled representation of said basic block in a cache; retrieving said compiled representation of said basic block from said cache; and linking said compiled representation of the basic block to the first program while the first program is executing in a runtime environment, wherein the runtime environment is configured to execute instructions of the first program compiled for the source architecture on a processor of the target architecture.
The method may further include any of the following additional steps or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: wherein the intermediate representation includes an LLVM representation of the legacy functions; wherein said selecting by a decompiler is based in part upon detection of an optimization setting applied by the compiler that compiled the first program for the source architecture; wherein said selecting by a decompiler is based in part upon determining a version of the compiler that compiled the first program for the source architecture; wherein the decompiler identifies one or more of said sets by selecting a sequence including an ENC instruction preceded by one or more set up instructions; wherein the decompiler identifies the set up instructions preceding the ENC instruction based in part on determining the version of the compiler that compiled the first program for the source architecture; wherein the decompiler determines the extent of said sequence including a basic block based in part on detecting a repeated sequence of instructions whose index variable changes; wherein the decompiler determines the extent of said sequence including a basic block based in part on detecting a repeated sequence of instructions whose index variable changes, followed by a loop; further including determining, by the decompiler, that the repeated sequence of instructions whose index variable changes is a partially unrolled loop; further including creating a loop in the intermediate representation based upon the partially unrolled loop detected by the decompiler; or wherein said compiling by the back-end compiler unrolls the re-rolled loop.
The present disclosure further provides a non-transitory computer readable medium configured to store instructions, the instructions, when executed by one or more processors, causing the one or more processors to execute a first program compiled for a source architecture on a processor having a target architecture different from the source architecture by performing the steps including: providing a library of legacy functions in an intermediate representation, wherein the legacy functions implement one or more instructions, language functions, or runtime functions of said source architecture; selecting, by a decompiler, a sequence of instructions of said first program compiled for said source architecture, wherein the sequence includes a basic block; identifying sets of one or more instructions in said basic block that correspond to one or more functions of said library; replacing the identified sets of one or more instructions with their corresponding library functions to generate an intermediate representation of said basic block; storing the intermediate representation of said basic block in a store; compiling, by a back-end compiler, said intermediate representation of said basic block into a representation of said basic block compiled for said target architecture; storing said compiled representation of said basic block in a cache; retrieving said compiled representation of said basic block from said cache; and linking said compiled representation of the basic block to the first program while the first program is executing in a runtime environment, wherein the runtime environment is configured to execute instructions of the first program compiled for the source architecture on a processor of the target architecture.
The non-transitory computer readable medium may further include any of the following components, steps, or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: wherein the intermediate representation includes an LLVM representation of the legacy functions; wherein said selecting by a decompiler is based in part upon detection of an optimization setting applied by the compiler that compiled the first program for the source architecture; wherein said selecting by a decompiler is based in part upon determining a version of the compiler that compiled the first program for the source architecture; wherein the decompiler identifies one or more of said sets by selecting a sequence including an ENC instruction preceded by one or more set up instructions; wherein the decompiler identifies the set up instructions preceding the ENC instruction based in part on determining the version of the compiler that compiled the first program for the source architecture; wherein the decompiler determines the extent of said sequence including a basic block based in part on detecting a repeated sequence of instructions whose index variable changes; wherein the decompiler determines the extent of said sequence including a basic block based in part on detecting a repeated sequence of instructions whose index variable changes, followed by a loop; further including determining, by the decompiler, that the repeated sequence of instructions whose index variable changes is a partially unrolled loop; further including creating a loop in the intermediate representation based upon the partially unrolled loop detected by the decompiler; or wherein said compiling by the back-end compiler unrolls the re-rolled loop.
The present disclosure also provides a method of executing a first program compiled for a source architecture on a system with one or more processors having a target architecture different from the source architecture, the method including: initiating execution of the first program in a runtime environment configured to support the execution of programs compiled for execution in the source architecture, said runtime environment operating on one or more of said processors having said target architecture; after the first program has started execution, linking to the first program a basic block compiled by a load module compiler; detecting an attempt by an instruction of said basic block to write to a memory location in a memory block containing a compiled instruction of said first program; determining whether a data structure that stores indications of the addresses of instructions compiled by the load module compiler and linked to the first program exists; upon determining that the data structure does not exist, creating the data structure and populating it with indications of the locations of the instructions compiled by the load module compiler and linked to the first program; allowing execution of the write instruction to modify contents of said memory block containing a compiled instruction of said first program; based in part on the contents of the data structure, determining whether the write instruction modified a compiled program instruction of said first program that does not have a corresponding basic block in the cache; and upon determining that the write instruction modified an instruction of said first program that does not have a corresponding basic block in the cache, storing an indication that the first program is self-modifying and an indication of the memory location that has been modified.
The method may further include any of the following additional steps or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: further including selecting, by a decompiler, another basic block whose addresses include the address that was modified by said write instruction, and generating an intermediate representation of said another basic block; further including compiling, by a back-end compiler, said intermediate representation of said another basic block into a representation of said basic block compiled for said target architecture; or further including linking the representation of the basic block compiled for said target architecture to the first program, and modifying a data structure containing indications of the locations of the instructions compiled by the load module compiler and linked to the first program.
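The following sketch illustrates the bookkeeping for a write that modifies an instruction that has no corresponding basic block in the cache. It makes three assumptions. Address ranges are half-open `[start, end)` pairs. The instruction areas of the first program are known, for example from its CSECTs. The decompiler exposes a sorted list of basic block start addresses. `ProgramState`, `record_write`, `link_recompiled`, and `enclosing_block_start` are hypothetical names.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Set, Tuple


def _covers(ranges: List[Tuple[int, int]], addr: int) -> bool:
    return any(start <= addr < end for start, end in ranges)


@dataclass
class ProgramState:
    """Hypothetical per-program bookkeeping for the self-modification path."""
    code_ranges: List[Tuple[int, int]]    # instruction areas of the first program
    cached_ranges: List[Tuple[int, int]]  # blocks that have a cache entry and are linked
    self_modifying: bool = False          # "first program is self-modifying" indication
    modified_addresses: Set[int] = field(default_factory=set)

    def record_write(self, addr: int) -> bool:
        """Return True when the write modified a program instruction with no cached block."""
        if not _covers(self.code_ranges, addr) or _covers(self.cached_ranges, addr):
            return False  # a data write, or a cached block handled by the counter path
        self.self_modifying = True
        self.modified_addresses.add(addr)
        return True

    def link_recompiled(self, start: int, end: int) -> None:
        # After the back-end compiler emits the new block and it is linked, update the
        # location data structure so later writes see it as a compiled, linked block.
        self.cached_ranges.append((start, end))


def enclosing_block_start(leaders: List[int], addr: int) -> int:
    """Return the start of the basic block whose addresses include `addr`."""
    i = bisect.bisect_right(leaders, addr) - 1
    if i < 0:
        raise ValueError("address precedes the first known basic block")
    return leaders[i]
```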
The present disclosure further provides a non-transitory computer readable medium configured to store instructions, the instructions, when executed by one or more processors, causing the one or more processors to execute a first program compiled for a source architecture on a processor having a target architecture different from the source architecture by performing the steps including: initiating execution of the first program in a runtime environment configured to support the execution of programs compiled for execution in the source architecture, said runtime environment operating on one or more of said processors having said target architecture; after the first program has started execution, linking to the first program a basic block compiled by a load module compiler; detecting an attempt by an instruction of said basic block to write to a memory location in a memory block containing a compiled instruction of said first program; determining whether a data structure that stores indications of the addresses of instructions compiled by the load module compiler and linked to the first program exists; upon determining that the data structure does not exist, creating the data structure and populating it with indications of the locations of the instructions compiled by the load module compiler and linked to the first program; allowing execution of the write instruction to modify contents of said memory block containing a compiled instruction of said first program; based in part on the contents of the data structure, determining whether the write instruction modified a compiled program instruction of said first program that does not have a corresponding basic block in the cache; and upon determining that the write instruction modified an instruction of said first program that does not have a corresponding basic block in the cache, storing an indication that the first program is self-modifying and an indication of the memory location that has been modified.
The non-transitory computer readable medium may further include any of the following components, steps, or embodiments, which may all be combined with one another in any combinations unless clearly mutually exclusive: further including selecting, by a decompiler, another basic block whose addresses include the address that was modified by said write instruction, and generating an intermediate representation of said another basic block; further including compiling, by a back-end compiler, said intermediate representation of said another basic block into a representation of said basic block compiled for said target architecture; or further including linking the representation of the basic block compiled for said target architecture to the first program, and modifying a data structure containing indications of the locations of the instructions compiled by the load module compiler and linked to the first program.
Although the above embodiments are described in the context of executing a program compiled for a source architecture on a platform having a target architecture different from the source architecture, one of ordinary skill in the art could apply the same embodiments to executing a program compiled for a source architecture on another physical instance of a platform having the same general architecture, in order to optimize workflow across platforms. In a more specific embodiment, the platform for which the program was compiled may have a different configuration than the other physical instance of the platform.
The present disclosure provides a load module compiler that uses an intermediate representation. Some load module compilers of the present disclosure use an intermediate representation of a load module that leverages library components implemented for execution in a legacy application environment. Some load module compilers of the present disclosure may be able to generate native code that can be optimized for different target architectures, and that can support on-the-fly determinations of the desired target architecture. Some load module compilers of the present disclosure may optimize not only the performance of individual legacy instructions or system calls, but also performance across multiple legacy instructions or system calls. Additionally, some load module compilers of the present disclosure can transcompile self-modifying programs. The present disclosure also provides a system that supports the flexibility of just-in-time compilation while allowing JIT-compiled blocks of code to be reused outside of the immediate task or process. The present disclosure further provides a system that enables a load module compiler to create optimized code for replacing legacy ABI calls.
In one embodiment, an emulation environment (100) implements a legacy application environment (140) that provides a runtime environment in which legacy application programs may be executed. In one embodiment, the legacy application environment (140) includes a library of functions to support legacy instructions. The legacy application environment (140) may replace operating system calls and other functions with API calls that invoke optimized native APIs (135). The runtime environment, or a scheduler operating in the legacy application environment (140), may identify a program, or a basic block within a program, as a candidate for runtime compilation by a load module compiler. The load module compiler may include a decompiler, which identifies basic blocks for translation and invokes a library of functions to translate legacy hardware and software instructions into an intermediate representation. Preferably, the library of functions corresponds to the library of functions and native API calls used by the emulation environment. The legacy application environment may be thought of as a thin compatibility layer that allows legacy applications to make application calls. Although legacy applications are drawn as accessing native APIs (135) through the legacy application layer (140), applications may also be written to access native APIs or host OS (120) calls directly.
The execution of complex instructions, such as the execution of operating system function calls, can be slow in emulation. By identifying such instructions, and replacing them with instructions to invoke native function calls through a set of Native APIs, the performance of an emulator can be significantly enhanced. In one example, sequences of instructions that invoke operating system calls are replaced with Execute Native Call (“ENC”) instructions to invoke native APIs (135), which run natively on the host hardware architecture (100). Preferably, a preprocessor substitutes ENC instructions prior to loading the legacy application (150) into the legacy application environment (140) of the runtime environment. In one embodiment of the inventive system, a system (100) provides runtime capability to emulate a legacy application environment. Libraries of functions written in a programming language such as C enable the emulation of legacy hardware and system calls by performing corresponding operations in the host environment.
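The following hypothetical emulator handler makes the ENC mechanism concrete by dispatching an ENC instruction to a native function. Several details are assumptions chosen for illustration: the call-number registry, the `native_demo_service` function, and the use of R1 for the parameter address and R15 for the return code. The actual encoding of ENC instructions is not specified here. The Python callable stands in for a native API (135).

```python
from typing import Callable, Dict, List

# A native API receives the emulated register file and memory and returns a return code.
NativeApi = Callable[[List[int], bytearray], int]


def native_demo_service(regs: List[int], mem: bytearray) -> int:
    """Hypothetical native API: writes a 4-byte big-endian value at the address in R1."""
    addr = regs[1]
    mem[addr:addr + 4] = (12345678).to_bytes(4, "big")
    return 0


# Hypothetical registry mapping an ENC call number to its native implementation.
NATIVE_APIS: Dict[int, NativeApi] = {0x01: native_demo_service}


def execute_enc(call_number: int, regs: List[int], mem: bytearray) -> None:
    """Run the native API in place of the legacy instruction sequence the ENC replaced."""
    api = NATIVE_APIS.get(call_number)
    if api is None:
        raise NotImplementedError(f"no native API registered for ENC {call_number:#x}")
    regs[15] = api(regs, mem)  # return code in R15 (assumed linkage convention)
```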
The substitution of calls to optimized native functions for operating system calls and library function calls in a load module may improve performance in an emulation environment, but the introduction of the substituted ENC instructions can present a complication for a load module compiler. In one embodiment, the load module compiler may detect an ENC instruction, allow the instruction to execute in emulation, and then proceed with JIT compilation of subsequent instructions in the load module. In another embodiment, JIT compilation may occur before execution.
In another embodiment, a library of functions that implement legacy instructions, and a library of functions that implement ENC instructions, are provided to the JIT load module compiler. When an instruction in the load module has a corresponding function in the library, the front end of the load module compiler incorporates the library function into the program. In some cases, the COBOL compiler may emit recurring patterns of initialization instructions followed by instructions that branch to code that implements a function. Similarly, when ENC instructions (215) and (225) are inserted into a load module to replace system or library function calls, those ENC instructions may be preceded by sequences of initialization instructions. The load module compiler may recognize such sequences of initialization instructions and replace them with a library function that includes the initialization instructions. For example, when an ENC instruction is identified, the JIT compiler identifies the instructions that set up the parameters required for the execution of the corresponding system call or library function call, and replaces those instructions and the ENC instruction with the ENC library function. In the context of a legacy COBOL program, the set up instructions typically concern the population of defined parameter data structures and the placement of parameters, or pointers to parameters, in registers specified by the legacy architecture. Because this substitution is made at runtime, the parameter values and addresses are known to the system, allowing the JIT load module compiler to eliminate set up instructions. In one example, a library of C program functions corresponding to legacy instructions and to ENC instructions is provided to a load module decompiler (320) (as depicted in
Different patterns of initialization instructions may be used. In one example, a sequence of three load instructions and a branch instruction to the code implementing certain ENC instructions is inserted into the load module; the load module compiler can detect this sequence and replace it with a corresponding function from the function library. In operation, the compiler recognizes the branch to the ENC instruction and inserts the library function corresponding to the ENC instruction, which has been adapted to include the three load instructions used to initialize the ENC instruction. In another example, the load instructions and the branch to the ENC instruction are not included in the library function, and the decompiler instead includes them in the decompiled basic block. Some ENC instructions used with COBOL or PL/1 functions may be initialized with two or four load instructions rather than three. In such cases, the load module decompiler (320) can recognize the corresponding sequence of load instructions for a particular ENC instruction. Other initialization patterns may be used. The load module decompiler (320) may similarly recognize the sequence of instructions, such as a sequence of load instructions or other set up instructions, that precedes an in-lined function that had not been replaced by an ENC instruction.
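The recognition of set-up loads that precede an ENC instruction can be sketched as a peephole pass over a decoded basic block. This hypothetical sketch makes three simplifications. The branch to the ENC code and the ENC instruction are modeled together as a single `ENC` pseudo-instruction. The `SETUP_LOADS` table records how many loads (two, three, or four) precede each call number. A matched sequence is folded into a `LIBCALL` pseudo-instruction that stands in for the library function. An unmatched ENC is left in place so that it can still be executed in emulation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Insn:
    op: str              # mnemonic, e.g. "L", "LA", "ENC"
    operands: Tuple = ()


# Hypothetical table: ENC call number -> number of set-up loads emitted before it.
SETUP_LOADS: Dict[int, int] = {0x10: 3, 0x11: 2, 0x12: 4}
LOAD_OPS = {"L", "LA", "LH", "LHI"}


def fold_enc_sequences(block: List[Insn]) -> List[Insn]:
    """Replace <k set-up loads> + ENC with a single library-call pseudo-instruction."""
    out: List[Insn] = []
    for insn in block:
        if insn.op == "ENC":
            call = insn.operands[0]
            k = SETUP_LOADS.get(call, 0)
            tail = out[-k:] if k else []
            if k and len(tail) == k and all(i.op in LOAD_OPS for i in tail):
                del out[-k:]  # the loads become arguments of the library function
                out.append(Insn("LIBCALL", (call, tuple(tail))))
                continue
        out.append(insn)  # anything unmatched, including a bare ENC, is kept as-is
    return out
```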
Referring now to
As explained above, the sequence of initialization instructions that precedes a particular function emitted by the legacy COBOL compiler may vary, depending on the version of the compiler used. In one embodiment, the library functions in the legacy function library (315) may be written to selectively include differing sets of initialization instructions. Depending on the compiler version, which may be determined using CSECT metadata, the function in the function library may selectively include a corresponding set of initialization instructions. Since the substitution of functions from the legacy function library (315) into the CSECT happens at runtime, the compiler version information is available to the load module decompiler (320), allowing such compiler-specific optimization.
At runtime, the decompiler (320) of the load module compiler first identifies a basic block for just-in-time translation and execution. A basic block is typically a sequence of instructions that do not branch outside of the basic block, and it ends with a branching instruction, such as a branch to another subroutine or a return. Non-branching instructions can be used to load, store, or move data among memory and registers, and to perform computations such as addition or shifts on the data. A branch or terminator instruction is an instruction that determines where to transfer control flow, such as a return or branch instruction, which may change control flow conditionally or unconditionally. Absent an externally driven interruption or error condition, the sequence of instructions will proceed from beginning to end without interruption. Using the legacy function library (315), the decompiler (320) translates the legacy instructions into an intermediate representation (330) and an index (335).
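The basic-block boundary rules listed in the embodiments (stop at the next branch, stop after a predetermined number of branches, or stop after a predetermined number of instructions) can be sketched as follows. The sketch assumes that instructions have already been decoded into `(address, mnemonic)` pairs. `BRANCH_OPS` is an illustrative subset of s390 branch mnemonics, not a complete list, and `basic_block_extent` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple

BRANCH_OPS = {"B", "BC", "BCR", "BAL", "BALR", "BR"}  # illustrative subset only


def basic_block_extent(
    insns: Sequence[Tuple[int, str]],
    start: int,
    max_branches: int = 1,
    max_insns: Optional[int] = None,
) -> List[Tuple[int, str]]:
    """Return the instructions from `start` up to the block boundary.

    max_branches=1 gives the classic 'stop at the next branch' rule; larger values
    implement the 'predetermined number of branch instructions' variant, and
    max_insns implements the 'predetermined number of instructions' variant.
    """
    starts = [addr for addr, _ in insns]
    index = starts.index(start)  # ValueError if start is not an instruction boundary
    block: List[Tuple[int, str]] = []
    branches = 0
    for addr, op in insns[index:]:
        block.append((addr, op))
        if op in BRANCH_OPS:
            branches += 1
            if branches >= max_branches:
                break
        if max_insns is not None and len(block) >= max_insns:
            break
    return block
```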
In one embodiment, the load module decompiler (320) performs initial optimizations on the basic block. In one embodiment, the load module decompiler (320) includes a program routine to parse an overlay data structure generated by the legacy compiler that created the load module, to identify the CSECTs within the load module (310). The load module decompiler (320) may also include a program routine to parse the identification record associated with a CSECT, to identify the language and version of the compiler used to generate the corresponding CSECT within the load module. A load module (310) may include one or more CSECTs, and the different CSECTs may be stored in non-contiguous memory locations. Based on the compiler and/or version information obtained from the identification record, the load module decompiler (320) may selectively apply optimizations specific to the corresponding source language or to the compiler. As explained above, one example of the use of compiler version information may be the selective inclusion of corresponding initialization sequences in a function from the legacy function library (315). For example, an initialization sequence might use a load address instruction rather than a load halfword immediate instruction. In another example, the compiler may have emitted different instructions because different versions or sub-versions of the compiler may support different instruction sets. For example, if a new compiler version or sub-version makes use of processor instructions that were not previously available, a CSECT compiled with the newer compiler may contain those previously unavailable instructions.
In one example, the load module decompiler (320), upon detecting that an s390 COBOL compiler was used to create a CSECT, may identify a sequence of initialization instructions at the beginning of the CSECT and substitute one or more initialization functions corresponding to the initialization sequence, rather than in-lining functions corresponding to the individual instructions or system calls that make up the initialization sequence of the CSECT. To improve performance, the corresponding initialization sequence or sequences are preferably pre-compiled into optimized LLVM IR code and included in the legacy function library (315), though they may also be stored in a separate store accessible to the load module decompiler (320). The load module decompiler (320) may also omit labels for entry points. The load module decompiler (320) may also remove application binary interfaces (ABIs) by inserting replacement functions, either from the legacy function library (315) or as calls to external functions in the runtime library (360). As described above, one common application binary interface is a sequence of load instructions that load parameters, or pointers to data structures containing parameters, before a branch to the corresponding library function. Other application binary interfaces, with different sequences used to initialize a function, may be used.
In some cases, the ABI used by a particular function in the legacy function library (315) may vary, depending on the version or sub-version of the compiler used to generate the CSECT. In such cases, the library function may be configured to selectively include the corresponding set up instructions as a function of the compiler version number. In such an implementation, the load module compiler may flatten differences between different compilers, making the execution of such code transparent to the compiler version used with the code. This automatic identification and inclusion of the appropriate ABI in the code is particularly helpful where the legacy code or the details of its compilation are poorly documented.
In some situations, the version level of the compiler may be insufficient to identify important differences in the emitted code. For example, where a compiler version has been updated to fix a known problem, the emitted code of the updated compiler will differ from code emitted previously. In such instances, reference to the compiler sub-version number may be required, for example, for the decompiler to recognize which ABI may have been used to set up the parameters used with a subsequent instruction or a library call.
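The version-dependent recognition described above can be illustrated with a minimal Python sketch. It assumes a hypothetical instruction record, a hypothetical table (`ABI_PATTERNS`) keyed by compiler version and sub-version, and a hypothetical mapping (`library`) from branch targets to pre-compiled library functions; none of these names or opcode sequences are taken from an actual COBOL compiler. The point is only that the same instruction stream is matched differently depending on the compiler level recorded for the CSECT.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Insn:
    op: str                          # legacy mnemonic, e.g. "L" or "BAL"
    operands: Tuple[str, ...] = ()

# Hypothetical table: (compiler version, sub-version) -> opcode sequence the
# compiler emits to set up parameters before branching to a library routine.
ABI_PATTERNS: Dict[Tuple[str, int], Tuple[str, ...]] = {
    ("V4", 0): ("L", "L", "BAL"),
    ("V4", 1): ("LA", "L", "BAL"),   # e.g. a fix level that changed the setup
}

def match_abi_call(insns: List[Insn], start: int, version: str,
                   subversion: int) -> Optional[Tuple[str, int]]:
    """Return (branch target, sequence length) if insns[start:] begins with
    the setup sequence used by this compiler level, else None."""
    pattern = ABI_PATTERNS.get((version, subversion))
    if pattern is None:
        return None
    window = insns[start:start + len(pattern)]
    if tuple(i.op for i in window) != pattern or not window[-1].operands:
        return None
    return window[-1].operands[-1], len(pattern)

def collapse_abi_sequences(insns: List[Insn], version: str, subversion: int,
                           library: Dict[str, str]) -> List[Tuple[str, str]]:
    """Replace each recognized setup-and-branch sequence with one call to the
    pre-compiled library function; keep other instructions unchanged."""
    out: List[Tuple[str, str]] = []
    i = 0
    while i < len(insns):
        match = match_abi_call(insns, i, version, subversion)
        if match and match[0] in library:
            out.append(("call", library[match[0]]))
            i += match[1]
        else:
            out.append(("insn", insns[i].op))
            i += 1
    return out
```

With sub-version 0 the three-instruction setup collapses into a single library call; with sub-version 1 the same stream is left alone because the expected setup begins with `LA`.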
In addition to the version or sub-version information obtained from the CSECT, the load module decompiler (320) may also use information about the level of optimization applied by the compiler that generated the load module. Using the optimization level, the load module decompiler (320) may identify blocks of code that were optimized for the source machine and translate them into an intermediate code representation (330) that modifies or undoes the optimization, enabling the load module compiler (340) to apply its own optimizations suited to the target platform. For example, and as further described herein, the load module decompiler (320) may detect an unrolled loop and may opt to extend the basic block to include a larger portion of the unrolled loop, or possibly the entire loop, even though doing so expands the size of the basic block. In one embodiment, the decompiler may invoke a process to reroll a previously unrolled loop, or part of such a loop, generating an LLVM representation of a CSECT that contained an unrolled loop. By rerolling the loop, the load module compiler (340), or one of its optimization passes, can emit code that is optimized for the target platform.
The next stage of the load module compiler (340) receives the intermediate code (330) and index (335), and invokes a compiler (350) to generate executable code for the target architecture. A runtime library (360) is accessed to obtain external functions to be linked with the executable output of the load module compiler (340). The executable is then stored as an object in the cache (370) where it becomes available to the runtime environment (385).
When a call is made to a basic block that is present in the cache (370), the load module compiler must verify that the label used to call the block is present for the cached, JIT-compiled basic block. If the label is present, the load module compiler (340) invokes the in-memory linker (380) to link the compiled basic block to the in-memory executable (390). In one embodiment, the cache (370) resides on a POSIX-compliant architecture that permits shared access to the cache among multiple processors. Entries in the cache may reference the load module, CSECT, basic block, object ID, processor type, or instruction set identifier, or hashes of such values. The in-memory linker (380) retrieves the compiled basic block from the cache and links it into the in-memory executable (390) for execution in the runtime environment (385). Alternatively, the sharing of the cache (370) may be limited to a specific processor type, and a separate cache of compiled objects may be maintained for each processor type in the heterogeneous environment.
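A minimal sketch of the cache lookup described above, assuming a SHA-256 digest over a tuple of identifying fields as the cache key. The field set, separator, and class names are illustrative assumptions, and persistence and POSIX-level sharing between processes are omitted. Because the processor type and instruction set identifier are part of the key, the same basic block compiled for two targets occupies two entries, and a lookup succeeds only when the calling label is one the cached block exposes.

```python
import hashlib
from typing import Dict, Iterable, Optional, Set

def cache_key(load_module: str, csect: str, block_offset: int,
              processor_type: str, isa_id: str) -> str:
    """Hash of identifying fields; the field choice and the 0x1F separator
    are assumptions made for this sketch."""
    fields = (load_module, csect, f"{block_offset:08X}", processor_type, isa_id)
    return hashlib.sha256("\x1f".join(fields).encode("utf-8")).hexdigest()

class BlockCache:
    """In-memory stand-in for the shared cache of compiled basic blocks."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._labels: Dict[str, Set[str]] = {}

    def put(self, key: str, obj: bytes, labels: Iterable[str]) -> None:
        self._objects[key] = obj
        self._labels[key] = set(labels)

    def lookup(self, key: str, label: str) -> Optional[bytes]:
        """Return the cached object only if the calling label is exported by
        the cached block; otherwise the caller must (re)compile."""
        if label in self._labels.get(key, set()):
            return self._objects[key]
        return None
```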
In a just-in-time implementation, basic blocks are compiled as they are encountered during the execution of a load module (310) by a runtime environment (385). By persisting the life of entries in the cache (370) beyond the life of the process executing the load module, a hybrid approach enables the runtime environment (385) to access previously compiled basic blocks and just-in-time compiled basic blocks during program execution.
The bits in a register or other storage location may reference an individual memory location, such as a byte of memory. Blocks of sizes other than a byte can be used, and often are used in referencing the contents of disk storage, caches, or other types of data stores. Byte addresses have most frequently been used with microprocessors. Where the bits indicate the address of a byte in memory, the number of bits determines the extent of memory addressable by the processor. A 32-bit address can reference a maximum of 2 to the 32nd power bytes, or 4 gigabytes (4,294,967,296 bytes) of memory, whereas a 64-bit address can theoretically reference 2 to the 64th power bytes, or 16 exabytes (17,179,869,184 gigabytes) of memory, though for practical reasons a smaller maximum virtual address space is often used. Executable computer programs, such as load modules (310) that have been compiled to use 32-bit addresses, require that those addresses be translated into 64-bit addresses if they are to run on a machine that uses a 64-bit instruction set.
When the load module compiler (340) converts the intermediate code representation retrieved from IR store (345) into object code including x86 instructions for assembly into an x86 executable, the entries in the index corresponding to 32-bit addresses are inserted into the object code generated by the compiler (340), rather than the 64-bit addresses of the target architecture. The entries in the index are not given an absolute address, but are assigned an external reference, which the in-memory linker (380) may then resolve to 64-bit addresses allocated to the executing, compiled program. In one embodiment, index location zero is reserved as invalid, and the index of externally referenced addresses begins at location one. In one embodiment, the Memory Management Unit (MMU) responds to an attempt to access instructions at the lowest addresses, which have not been allocated to the user space of the program, by causing the Linux operating system to deliver a SIGSEGV signal that invokes the exception handler. The exception handler is configured to access the index of 32-bit addresses and to translate the 32-bit address into the corresponding 64-bit address used by the compiled executable program. The exception handler may also be configured to perform additional verifications, such as to support protection key management of memory addresses. An example of an exception handler and of prefixing schemes to perform such functions is described in PCT application PCT/IB2015/059646 titled “Protection Key Management and Prefixing in Virtual Address Space Application.”
In one exemplary embodiment, in which there were fewer than 16k potentially externally referenced addresses, the external references will be to addresses ranging from 0000 0000 0000 0000x to 0000 0000 0000 3FFFx. Because this range of addresses was not assigned to the program, an attempt to execute an instruction at these locations invokes the MMU and exception handler, which determine the correct address and then retry the instruction at the proper address. Other sizes may be used. Where only the lower 4k addresses were unused, the range would be from 0000x to 0FFFx. In an 8k embodiment, the range is 0000x to 1FFFx. In another example, where ARM Linux is configured with a 64k page size, accessing the bottom range of addresses from 0 to 64k may similarly invoke the exception handler. Different ranges of addresses, or ranges of addresses that begin with a base address other than zero, may also be used. In one embodiment, the load module compiler may generate pseudo-addresses and implement a branch table to translate the pseudo-addresses of the load module compiler into 64-bit addresses used by an underlying Linux platform. As further described herein, the exception handler may also be configured to detect attempts to write to addresses in the program address space, and to handle such self-modifying code.
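The following Python sketch models the index-and-fault scheme under stated assumptions: slot numbers double as low pseudo-addresses below a 16k limit, slot zero is reserved, and the in-memory linker binds each slot to a 64-bit address once the code is placed. `ExternalRefIndex`, `bind`, and `on_fault` are hypothetical names. A real handler would be a native SIGSEGV handler that rewrites the faulting context before retrying the instruction; the sketch only models the address lookup it performs.

```python
from typing import Dict, List, Optional

UNMAPPED_LIMIT = 0x4000   # 16k low pseudo-addresses left unmapped (one embodiment)

class ExternalRefIndex:
    """Index of externally referenced legacy addresses. Compiled code refers
    to slot numbers, which double as low pseudo-addresses that always fault."""

    def __init__(self, limit: int = UNMAPPED_LIMIT):
        self.limit = limit
        self._legacy: List[Optional[int]] = [None]   # slot 0 reserved as invalid
        self._host: Dict[int, int] = {}              # slot -> 64-bit address

    def add(self, legacy_addr32: int) -> int:
        """Record a 32-bit legacy address and return its slot number."""
        if len(self._legacy) >= self.limit:
            raise OverflowError("index exceeds the unmapped low-address range")
        self._legacy.append(legacy_addr32 & 0xFFFFFFFF)
        return len(self._legacy) - 1

    def bind(self, slot: int, host_addr64: int) -> None:
        """Called by the in-memory linker once the target code is placed."""
        self._host[slot] = host_addr64

    def on_fault(self, fault_addr: int) -> int:
        """Map a faulting low pseudo-address to the 64-bit address at which
        the instruction should be retried."""
        if 0 < fault_addr < self.limit and fault_addr in self._host:
            return self._host[fault_addr]
        raise MemoryError(f"unhandled fault at {fault_addr:#x}")
```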
Legacy mainframe systems, such as the System/360™, System/390™, or System/Z architectures, use storage keys to implement different levels of protected access to different portions of memory. The storage keys are typically stored in a table in which a control byte containing a storage key is associated with each 4 KB block (physical page) of memory. Such a control byte may be structured to contain a four-bit field that holds the protection key in bits 0-3, a fetch-protection bit in bit 4, a reference bit in bit 5, and a change bit in bit 6. The setting of the fetch-protection bit may indicate whether the protected status of the associated block applies to both reads (fetches) and writes (stores) to the block, or to writes only. In this example, where four bits are used to encode the protection key, there are 16 protection keys, numbered zero to fifteen. The protection key associated with a given task running on the processor is stored in the program status word (PSW) and is referred to as a storage access key. In operation, the system checks whether the storage access key in the program status word permits access to the protected memory. When the storage key does not permit access, storage protection logic interrupts the task and initiates a protection exception.
In one embodiment, to provide support for protection keys, the interrupt handler of the LINUX® (Linus Industrial, Massachusetts) system on which the runtime operates is modified to support key verification. The key verification routine compares the storage access key associated with the current task to the storage key in the associated control byte to determine whether the keys are equal. If the key verification routine determines that the keys do not match, and the access key is other than zero, then the system denies access and does not execute the instruction. If the keys match, or the access key is zero, then the operation is permitted.
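The control-byte layout and key check can be expressed compactly. This sketch assumes big-endian bit numbering within the byte (bit 0 is the most significant bit), decodes only the protection key (bits 0-3) and the fetch-protection bit (bit 4), and applies the rule that an access is allowed when the keys match or the access key is zero, with fetches checked only for fetch-protected blocks. The function names are hypothetical.

```python
from typing import Tuple

def decode_control_byte(cb: int) -> Tuple[int, bool]:
    """Split a storage-key control byte using big-endian bit numbering
    (bit 0 is the most significant bit of the byte)."""
    key = (cb >> 4) & 0xF               # bits 0-3: protection key
    fetch_protected = bool(cb & 0x08)   # bit 4: fetch-protection bit
    return key, fetch_protected

def access_permitted(psw_access_key: int, control_byte: int, is_store: bool) -> bool:
    """Allow the access when the PSW access key is zero or equals the storage
    key; fetches from blocks without fetch protection are always allowed."""
    key, fetch_protected = decode_control_byte(control_byte)
    if not is_store and not fetch_protected:
        return True
    return psw_access_key == 0 or psw_access_key == key
```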
In operation, if the x86 executable refers to an indexed address, the runtime system uses the index to identify the 64-bit address of the corresponding instruction. However, a 32-bit program that has not been recompiled may still generate a 32-bit address. When a program attempts to address the lowest portion of the address space, an exception is generated and the memory exception handler performs the necessary address translation.
The linking and execution of compiled basic blocks is depicted in
In one embodiment, the code in the legacy function library (315) includes code that implements functionality corresponding to legacy instructions for a legacy application environment (140), where the code corresponding to each function or system operation is compiled from a source language such as C into optimized LLVM-IR code. By replacing legacy instructions or system calls with optimized LLVM-IR functions, the load module decompiler (320) generates a representation of the basic block in LLVM-IR, and an index (335) of LLVM IR labels. In one embodiment, the load module decompiler (320) recognizes values that are loaded into registers and subsequently used as branch addresses or passed to external routines, and includes corresponding labels in the index (335). Preferably, the labels of the index (335) are LLVM-IR labels.
To identify the basic block, the load module decompiler (320) examines each instruction, determining whether it references a known library function or an external function. In the case of an external reference, the load module decompiler (320) inserts an external reference in the index (425) and proceeds to the next instruction. In the event a library function is detected (420), the code corresponding to that instruction is inserted (420), and the load module decompiler proceeds to the next instruction (430). If the next instruction is a branch or return, or, where code length is used to define the basic block, if the block reaches the maximum allowed size at step (440), the load module decompiler (320) inserts the return address (450) indicating the end of the basic block, and the LLVM-IR representation of the basic block (330) and its index (335) are stored in the intermediate store (345). As described below, rather than relying only on a branch or a threshold number of instructions to mark the end of the basic block, the decompiler may apply additional criteria, such as the identification of a nested set of branches or the detection of an unrolled loop, that cause it to loop back to step (420) and process further instructions. Where the system allows a set of nested loops, or allows the processing of an unrolled loop to extend the size of the basic block, the completion of the identified nested loops, or of an unrolled loop or a portion of it, may be detected at step (440).
In one embodiment, a basic block is selected by beginning with the address of the first instruction identified by the dispatcher (405), with the load module decompiler continuing to include subsequent instructions from the load module (310) until a branch is detected at step (440). The load module decompiler (320) in-lines individual functions taken from the legacy function library (315). Additional functions, such as library routines that implement mathematical operations, or other types of library functions, may also be compiled into an intermediate representation and included in the legacy function library (315). By constructing a basic block that includes code from multiple functions, the decompiler (320) enables the load module compiler (340) to perform optimizations across functions.
In some cases, a basic block may extend beyond a branch instruction. In one embodiment, a basic block is selected by beginning at a first instruction and continuing through a sequence of subsequent instructions of the load module until reaching a branch instruction whose target is not one of the earlier instructions in the same sequence. Because a substantial fraction of compute time in a typical program is spent in loops, this embodiment permits the generation of code that optimizes loop execution. In one embodiment, the definition of a basic block may be expanded to encompass nested sets of branches, to enable the use of loop optimizations by the back-end compiler. In one embodiment, a parameter may be set to define a maximum allowed length or a maximum allowed number of instructions. The logic of the load module decompiler may include code to recognize instructions or sequences of instructions that save state, and a basic block may be selected such that its length is less than the value indicated by the parameter and such that it concludes with a memory write or other instruction that preserves state. A sequence of memory write operations may also be identified as the point at which to terminate the basic block. Basic block selection logic may, in some instances, combine the examination of branches, a maximum allowed length, and the recognition of state-saving sequences of instructions.
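The selection loop of steps (420) through (450) can be sketched as follows. The instruction record, the `ENDS_BLOCK` set, the `library` mapping from mnemonics to pre-compiled IR text, and the IR strings themselves are illustrative assumptions, not real LLVM IR; returning `None` models the fallback to emulation for an instruction that cannot be decompiled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Insn:
    addr: int
    op: str
    size: int = 4
    ext: str = ""                 # external symbol referenced, if any

ENDS_BLOCK = {"BC", "BCR"}        # hypothetical branch / return mnemonics

def select_basic_block(insns: List[Insn], start: int, library: Dict[str, str],
                       max_insns: int = 64
                       ) -> Optional[Tuple[List[str], List[str], int]]:
    """Return (ir_lines, index, resume_addr), or None to ask the dispatcher
    to fall back to emulation."""
    if start >= len(insns):
        return None
    ir: List[str] = []
    index: List[str] = []
    i = start
    while i < len(insns):
        insn = insns[i]
        if insn.ext:                         # external reference (425)
            index.append(insn.ext)
            ir.append(f"call @{insn.ext}")
        elif insn.op in library:             # in-line library code (420)
            ir.append(library[insn.op])
        else:
            return None                      # cannot decompile this instruction
        i += 1                               # next instruction (430)
        if insn.op in ENDS_BLOCK or i - start >= max_insns:   # test (440)
            break
    resume = insn.addr + insn.size
    ir.append(f"ret i64 {resume}")           # return address ends the block (450)
    return ir, index, resume
```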
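A minimal sketch of the extended selection rule, in which a block continues through branches back to instructions already in the sequence (loops) and ends at the first branch whose target lies outside it, or at a length limit. Instructions are modeled, as an assumption, as `(address, branch_target)` pairs, with `None` for non-branch instructions; `select_loop_block` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Insn = Tuple[int, Optional[int]]   # (address, branch target or None)

def select_loop_block(insns: List[Insn], start: int,
                      max_insns: int = 256) -> List[int]:
    """Return the addresses of a block starting at insns[start] that includes
    backward branches into the block itself, and ends at the first branch
    whose target is not an earlier instruction of the sequence."""
    covered: List[int] = []
    seen = set()
    for addr, target in insns[start:]:
        covered.append(addr)
        seen.add(addr)
        if target is not None and target not in seen:
            break                  # branch leaves the sequence: block ends here
        if len(covered) >= max_insns:
            break                  # maximum allowed length reached
    return covered
```

In the test data, the branch at address 8 back to address 0 is a loop and stays inside the block, while the branch at 16 to 40 ends it.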
The determination of the optimal length of a basic block may also be made based in part on the optimization settings used when the original CSECT was compiled. For example, some versions of a COBOL compiler permit the use of optimization settings that unroll loops for performance reasons. In the case of a very large loop, the compiler may have taken considerations such as the cache size of the legacy machine into account in determining the number of iterations of a loop to unroll into a particular block of code. While such compiled code may have been optimized for performance on a specific legacy machine configuration, the size of the available cache memory in the target machine on which the load module compiler is running may be very different. In one embodiment, the selection of a basic block in accordance with
In another example, the decompiler may detect that a load module was compiled with optimizations for natively managed data types, or with optimizations that modify initialization sequences, such as loading data once and moving it to another register to initialize multiple fields; such settings change the code emitted by the legacy compiler. By detecting that such optimization settings were used, the load module decompiler (320) may insert suitably optimized functions from its legacy function library (315).
The decision as to how to limit the extent of a basic block is generally based on performance considerations. In some cases, rather than selecting the size of the basic block by proceeding to the next branch instruction, a basic block may be selected using a maximum permitted code length setting, or by using the labels of other routines that call into the basic block. By allowing a basic block to extend beyond a branch instruction, the load module compiler can perform optimizations that span the branch instruction. The insertion or in-lining of functions (430) into the basic block by the load module decompiler (320) may include the insertion of recursive functions.
CSECTs generally include many basic blocks. Unlike a typical compiler, which translates an entire program from source code into an executable program, the load module compiler parses an executable load module to identify a next basic block, generates an LLVM-IR representation of the basic block, and then invokes a back-end compiler to generate executable code corresponding to the basic block, which is stored in the cache and may execute in the system runtime. By operating on basic blocks rather than entire programs, the load module compiler provides the benefits of optimized just-in-time compilation spanning multiple program statements, without the loss of flexibility inherent in a design that must compile an entire program before execution may begin. In one embodiment, the load module decompiler (320) may allow the expansion of a selected basic block beyond the maximum permitted code length to accommodate an unrolled loop. In another embodiment, the load module decompiler (320) may reroll the loop, or portions of the loop, both to reduce the size of the basic block and to enable optimization by the load module compiler (340), which may unroll the loop differently depending on the target processor or the size of a cache of the target processor. In one embodiment, the size of the instruction cache may determine the desired level of optimization. In another embodiment, the size of a second-level cache, or the amount of RAM in the configured target machine or container, may be used.
In embodiments that permit the basic block to extend beyond a branch, a return, or a threshold maximum size as described above, the load module decompiler (320) may scan forward to identify conditions that favor selection of a larger basic block. For example, where nested sets of loops are permitted within a basic block, the load module decompiler (320) may determine the extent of the nested set of loops with reference to index variables or repeating branch addresses. Where an unrolled or partially unrolled loop is selected to be within a basic block, the load module decompiler (320) may scan ahead to detect repeating sets of instructions with a varying index variable, and continue to iterate through steps (420) and (430) until the end of the loop is reached at step (440) before inserting the return address (450).
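The forward scan for an unrolled loop can be illustrated with a simple pattern search. This sketch assumes each instruction is reduced to an opcode and one index-dependent displacement, and looks for two or more consecutive copies of a body whose opcodes repeat while the displacements advance by a constant, non-zero stride. The result `(body_len, copies, stride)` is what a rerolling step would need to emit a loop body with a trip count; the representation and heuristic are assumptions, not the decompiler's actual algorithm.

```python
from typing import List, Optional, Tuple

Insn = Tuple[str, int]   # (opcode, index-dependent displacement)

def find_unrolled_loop(insns: List[Insn], start: int,
                       max_body: int = 8) -> Optional[Tuple[int, int, int]]:
    """Return (body_len, copies, stride) for the largest run of >= 2 copies of
    a body whose opcodes repeat and whose displacements advance by a constant,
    non-zero stride per copy; None if no such run starts at `start`."""
    best: Optional[Tuple[int, int, int]] = None
    for body in range(1, max_body + 1):
        first = insns[start:start + body]
        if len(first) < body:
            break
        second = insns[start + body:start + 2 * body]
        if len(second) < body or [op for op, _ in second] != [op for op, _ in first]:
            continue
        strides = {b[1] - a[1] for a, b in zip(first, second)}
        if len(strides) != 1:
            continue
        stride = strides.pop()
        if stride == 0:
            continue                       # no varying index: not a loop copy
        copies = 2
        while True:                        # count further matching copies
            base = start + copies * body
            cand = insns[base:base + body]
            if len(cand) < body or any(
                    c[0] != f[0] or c[1] != f[1] + copies * stride
                    for c, f in zip(cand, first)):
                break
            copies += 1
        if best is None or body * copies > best[0] * best[1]:
            best = (body, copies, stride)
    return best
```

For example, an `L`/`ST` pair copied three times with offsets advancing by 4 yields `(2, 3, 4)`: a two-instruction body executed three times, which a rerolled representation can express as a single loop.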
In the event that the load module decompiler (320) encounters an instruction or sequence of instructions that it cannot decompile, the load module decompiler (320) may be configured to set a flag or return a parameter indicating to the dispatcher that subsequent execution of the load module (310) should fall back to the emulation in the legacy application environment (140).
A back-end compiler (350) performs the optimizing compilation (460) of the LLVM-IR representation of a basic block stored in intermediate representation store (345), to create an executable object corresponding to the basic block. The executable code may be x86 code, ARM code, or code of another target architecture. If the compilation succeeds (465), the load module compiler (340) checks whether the newly compiled block is one that has been self-modified (485), and if so, returns to execution. If the newly compiled block was not modified by the CSECT, the load module compiler (340) adds the object and its corresponding ID to the cache (370) at step (470). The in-memory linker (380) then loads the object into the in-memory executable (390) in the corresponding runtime environment (385). Because the object and ID are already in memory, the system may proceed with the in-memory copy rather than load the object from the cache. Using the in-memory copy only for execution of the self-modifying code ensures that if another program accesses the same basic block, it will not initiate execution of the block in an undetermined state. If compilation fails at step (465), the load module compiler (340) sets a flag directing the dispatcher to fall back to interpreted execution (495) for the basic block. Alternatively, the flag could be cleared, or simply not set, if the flag were defined such that a set flag indicated use of the load module compiler, rather than the interpreter.
The load module compiler (340) preferably carries out a sequence of transformation passes that analyze the code for the basic block and optimize its performance. In one embodiment, an LLVM optimizer translates the LLVM IR code received from IR store (345) into optimized LLVM IR, from which the executable code corresponding to the basic block is generated and stored in cache (370). In one embodiment, the cache (370) is shared among multiple processors but is ‘indexed’ by processor type. Sharing the cache among multiple processors allows multiple runtime environments (385) to re-use previously translated basic blocks.
The generation of executable code optimized for a specific back-end architecture is preferably performed by a back-end compiler (350) invoked by the load module compiler (340). In one embodiment, back-end compilers (350) for both the x86 and ARM environments are dynamically selected at runtime. In one embodiment, the cache (370) is further indexed by the different sets of extension instructions to the x86 or ARM architectures, and back-end compilers (350) that support different sets of extension instructions of the x86 or ARM architectures may be used to generate the corresponding code. Though the compilation of the basic block by the load module compiler (340) may be serialized, it is also possible to perform parallel compilation using multiple back-end compilers (350) to produce a set of objects in the cache (370) for use with different target architectures. Preventing multiple threads from compiling the same basic block maintains the consistency of the cache and improves system performance. In one embodiment, compilation of an individual basic block is serialized to prevent inconsistent system behavior, while parallel compilation of different basic blocks is permitted. Alternatively, parallel operations on the same basic block may be permitted where other methods of ensuring cache consistency are employed.
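Serializing compilation per basic block while allowing different blocks to compile in parallel can be sketched with one lock per cache key. In this sketch the key is `(block_id, target)`, mirroring a cache indexed by processor type, and `backend` is a hypothetical callable standing in for a back-end compiler (350); `BlockCompiler` is likewise an illustrative name.

```python
import threading
from typing import Callable, Dict, Hashable

class BlockCompiler:
    """Compile each (block_id, target) at most once. Compilations of the same
    block are serialized; different blocks may compile in parallel."""

    def __init__(self, backend: Callable[[str, str], bytes]):
        self._backend = backend                      # hypothetical back-end hook
        self._cache: Dict[Hashable, bytes] = {}
        self._locks: Dict[Hashable, threading.Lock] = {}
        self._guard = threading.Lock()               # protects the lock table

    def _lock_for(self, key: Hashable) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_compile(self, block_id: str, target: str, ir: str) -> bytes:
        key = (block_id, target)                     # cache indexed by target
        with self._lock_for(key):                    # one compiler per block
            obj = self._cache.get(key)
            if obj is None:
                obj = self._backend(ir, target)
                self._cache[key] = obj
            return obj
```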
The execution of a segment of legacy program code involves the invocation of a sequence of different functions. If a load module compiler compiles and executes each function individually, the execution of the code requires calling and returning from functions for each instruction. By incorporating the functions themselves in a library, the load module compiler can significantly improve code optimization by in-lining function calls, thereby reducing the overhead of sequential jumps to different functions.
When a function call is separated from a basic block, the runtime environment must push parameters onto the stack, pull them off the stack, and execute the function separately from the calling routine. However, where the function is in-lined, the load module compiler can avoid this overhead. In addition, when the execution of a function is separated from that of the calling routine, the function must compute and return all of its output values, even if some of those values are not used. When the function is in-lined, the load module compiler can identify code that produces an unused value or values and remove it to improve performance. Because the load module compiler (340) operates on basic blocks obtained from IR store (345) that typically include many in-lined functions, the load module compiler (340) can perform these and other optimizations.
Many legacy program instructions make use of constants, or of computed values that are known or can be known at compile time. When such constants or computed values are determined and used in a sequence of program instructions, the load module compiler can use constant propagation, substituting the values of the constants directly into the code at compile time. This technique is not available to a compiler that implements the behavior of each instruction in isolation. By enabling the load module compiler to optimize code across a calling block and a function, or across a series of function calls, constant propagation becomes available to optimize code execution within a basic block.
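Constant propagation across in-lined code can be shown on a toy three-address IR. The tuple format `(dst, op, a, b)` and the opcode names are assumptions for illustration, and `propagate_constants` is a hypothetical helper. The sketch also ignores 32-bit wraparound and condition codes, which a real translation of legacy arithmetic would have to model.

```python
import operator
from typing import Dict, List, Tuple, Union

Operand = Union[str, int, None]                # register name, constant, or unused
IRInsn = Tuple[str, str, Operand, Operand]     # (dst, op, a, b)

FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def propagate_constants(block: List[IRInsn]) -> List[IRInsn]:
    """Substitute known constants into operands and fold operations whose
    operands are all constant, within a single basic block."""
    known: Dict[str, int] = {}                 # register -> known constant
    out: List[IRInsn] = []
    for dst, op, a, b in block:
        if isinstance(a, str):
            a = known.get(a, a)
        if isinstance(b, str):
            b = known.get(b, b)
        if op == "mov" and isinstance(a, int):
            known[dst] = a
            out.append((dst, "mov", a, None))
        elif op in FOLDABLE and isinstance(a, int) and isinstance(b, int):
            known[dst] = FOLDABLE[op](a, b)    # folded at compile time
            out.append((dst, "mov", known[dst], None))
        else:
            known.pop(dst, None)               # dst no longer holds a constant
            out.append((dst, op, a, b))
    return out
```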
In another example, some code segments perform multiple loads or stores to the same memory location. Where a program may be interrupted, these operations may be necessary to ensure that the runtime environment maintains a valid state. However, interim loads and stores to memory locations can be eliminated, where the register containing the value of interest is known to the compiler. Similarly, instructions to allocate memory to store the interim values, or data structures containing these values, may be eliminated across the basic block. In this way, a sequence of loads and stores to memory may be eliminated, and the optimized code need only store the final result back to memory.
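Elimination of interim stores can be sketched as a backward scan over a block: a store is dead if the same location is stored again later with no intervening load. The sketch assumes a toy instruction format, no aliasing between distinct location names, and no interruption within the block, which is the condition under which the interim stores become unnecessary. `eliminate_interim_stores` is a hypothetical name.

```python
from typing import List, Set, Tuple

IRInsn = Tuple[str, ...]   # ("store", loc, src) | ("load", dst, loc) | other ops

def eliminate_interim_stores(block: List[IRInsn]) -> List[IRInsn]:
    """Drop stores whose location is overwritten later in the block before
    being read, so only the final store to each location survives."""
    pending: Set[str] = set()   # locations stored later with no read in between
    kept: List[IRInsn] = []
    for insn in reversed(block):
        kind = insn[0]
        if kind == "store":
            loc = insn[1]
            if loc in pending:
                continue        # overwritten before any read: dead store
            pending.add(loc)
        elif kind == "load":
            pending.discard(insn[2])   # the location is read here
        kept.append(insn)
    kept.reverse()
    return kept
```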
A feature of some compilers is the use of specific registers for known tasks. For example, in S/390 and z/OS COBOL programs, register 15 is often used to carry the contents of the so-called RETURN-CODE. A calling routine can thus make use of the RETURN-CODE of the called routine by reading register 15, without the overhead of the caller defining a parameter for the call and the callee, in turn, returning that parameter to the caller. In one embodiment, the back-end compiler (350) of a load module compiler (340) identifies the use of register 15 to communicate a return code from a called function to its caller, and removes from the executable code instructions that move the return code between memory and register 15. In one embodiment, the load module decompiler (320) sets one or more flags to enable such optimizations by the load module compiler (340), using data identified by parsing a CSECT identification record containing metadata for the CSECT.
In one embodiment, when the runtime begins execution of a CSECT that has previously been compiled using the load module compiler, some of the basic blocks will be persistently stored in the cache (370), and the load module, together with those compiled basic blocks that are in the cache, will be loaded into protected storage and linked. While it is possible that the execution of the program might not use all of the basic blocks that were compiled during a prior execution, loading and linking such blocks reduces the overhead that would be incurred if linking the previously compiled basic blocks was delayed until runtime. For example, steps 415, 480, 475, and 490 would not need to be repeated while the application is running as each previously compiled basic block is encountered, where the cached basic blocks are loaded ahead of time.
In one embodiment, the decision as to the execution architecture is fixed, and a dedicated back-end compiler for the target architecture is used. In another embodiment, the decision as to the target architecture is made at runtime, and a flag informs the load module compiler which of a set of multiple back-end compilers should be used. For example, the decision to use a different back-end compiler to translate optimized LLVM IR code to the x86 or ARM architecture could be made at runtime. The runtime decision may, for example, support deployment to different versions of the x86 or ARM architectures, which support enhanced or modified instruction sets. Other instruction set architectures, such as MIPS, PowerPC, NVIDIA, Qualcomm Hexagon, or even legacy architectures such as the S/390 or z/OS instruction architectures, may be used.
In one embodiment, back-end compilers (350) adapted to generate legacy S/390 code using different instruction extension sets may be employed to assess the performance impact of using different instructions, or the compatibility of applications with machines implementing different legacy instruction set architectures. In some cases, some of the functions needed to implement the behavior of, for example, the s390 instructions require calls to external run-time functions. In such cases, the output of the back-end compiler must be linked to the executable external run-time function. Such an application may be particularly useful where the availability of a legacy test environment, or the ability to execute a legacy test environment under a specific set of conditions, is limited. Another application is backward-compatible translation, as may be desired to migrate an application to a system whose architecture lacks support for some instructions.
Just-in-time compilers typically cannot address self-modifying code. In one embodiment, the load module compiler is equipped to accommodate self-modifying code. The compiler places the compiled executable code in a protected range of memory addresses. When an instruction seeks to write to a memory location containing instructions, a memory protection exception is thrown. In the embodiment depicted in
The inventive design of the JIT load module compiler depicted in
For example, where a particular program is bound to a legacy processor, the load module compiler may select a back-end compiler (350) to target the JIT-compiled program to the original instruction set (e.g. s390 or z/OS), or to the legacy instruction set to which the application is bound. Preferably, in such instances, the load module decompiler (320) would detect that the target environment is a legacy architecture, so that different legacy function libraries (315) can be included where necessary to accommodate the native legacy environment. In one example, the load module compiler might direct its output to execute in a runtime environment instantiated in, for example, a Z/OS Linux instance. Such an implementation may be used in a production environment, or in a test environment, such as for the verification of a new component or peripheral, or to otherwise validate the interoperability of the legacy load module with other systems.
An illustrative embodiment of modifications to the memory protection handling of the system to accommodate the handling of self-modifying code by the load module compiler is shown in
Where instructions compiled by the load module compiler attempt to write to a location in memory containing program instructions, a memory protection fault will be detected, invoking the exception handler (505). This exception may be triggered, for example, where the memory assigned to the CSECT containing the basic block in question is protected memory. The exception handler determines whether the attempted write is to a program storage area (510), which will be the case of self-modifying code. If the write is not to a program storage area, then the handler operates as it would for an ordinary protection fault (515), as might occur due to a need to access virtual memory, handling protected memory access, or for other reasons. After determining that the write is to the program storage area (510), an indicator is checked to determine whether the basic block is a read-only block (520). In one embodiment, the indicator that the cached basic block is designated a read-only block was associated with the cached basic block identified by the decompiler module (320) reading metadata associated with the load module, and placing a corresponding indicator into IR store (345). The indicator could also have been set after compilation, or stored outside of the cached basic block, in a data structure that is accessible by the runtime (380). If a read-only basic block tries to write to is program storage area, an error condition occurs (525). If the block is not designated as read-only, then the basic block is permitted to issue the write instruction to the program storage area (530), and a counter is incremented (535). Preferably, the write to the protected program storage area is only permitted where the program is writing to protected memory that has been allocated to the CSECT to which the basic block belongs. 
The modification is made to the in-memory copy of the basic block, rather than to the copy of the basic block resident in the cache (370), to ensure consistency of the cached copy. Next, an instruction modification flag is checked (540) to determine whether the code has previously been marked as reentrant code. In one embodiment, the flag is unset by default, such that a basic block retrieved from the cache (370) is presumed not to be reentrant. Alternatively, the default assumption may be that a program is reentrant. In another alternative, the flag checked at step (540) may indicate that the program has modified itself, rather than indicating reentrancy. If at step (540) the flag has already been set, then the counter is compared to a threshold setting in step (545). Just-in-time compilation of programs that make too many modifications to themselves is inefficient, because each modification invalidates the compiled block and requires it to be recompiled. If the counter value is greater than or equal to the threshold setting, then a flag is set to direct the dispatcher to use the interpreter (550), rather than to continue to JIT-compile the basic block. If the count of writes to the program storage area is less than the threshold value, then JIT compilation will proceed. In one embodiment, at step (570), the execution of the previous instruction may be unrolled. In another embodiment, the execution of the basic block continues at step (570). Whether or not the last instruction is unrolled, the JIT-compiled basic block is deleted from memory, and the corresponding bits of the bitmap, if a bitmap is used, are cleared. This allows the recompiled block to be loaded in memory, and the bitmap bits reflecting the addresses of instructions in the recompiled block to be set, before execution of the basic block proceeds.
In one embodiment, the previous execution of the basic block is unrolled at step (570). This unrolling includes deleting the JIT-compiled basic block from memory and clearing the corresponding bits of the bitmap, if a bitmap is used. At step (575), the runtime then checks whether the flag directing execution in the interpreter is set. If the flag is not set, then the dispatcher directs recompilation of the basic block from the main memory of the runtime, where the modified block resides, rather than from the cache (370), which stores the unmodified version of the basic block; the recompiled block is inserted in main memory (580). If the flag requiring execution of the basic block by the interpreter is set, then the dispatcher directs the execution flow for the basic block to the interpreter (590). The interpreter is able to proceed with execution of the next instruction because state was saved at the time of the interrupt. This lazy detection of the reentrant status of the basic block improves system performance because, in the common case, programs do not modify themselves. For such programs, the runtime avoids executing unnecessary instructions to determine whether ordinary writes to memory are writes to program instructions, and also avoids the overhead of creating and maintaining data structures to track such writes.
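Steps (540) through (590) of the unrolling embodiment can be sketched as follows, under stated assumptions. The sketch assumes the modification flag is set at step (560) when it was previously unset (step (560) is named later in the text but not described here), models unrolling only as discarding the stale compiled block, and omits the bitmap. `BlockState`, `Runtime`, and `recompile` are hypothetical names; `recompile` merely stands in for the JIT.

```python
from dataclasses import dataclass, field


@dataclass
class BlockState:
    name: str
    self_write_count: int
    modified_flag: bool = False    # instruction modification flag, step (540)
    use_interpreter: bool = False  # flag consulted by the dispatcher, step (575)


@dataclass
class Runtime:
    threshold: int
    main_memory: dict = field(default_factory=dict)  # block name -> compiled code
    cache: dict = field(default_factory=dict)        # unmodified originals (370)
    log: list = field(default_factory=list)

    def recompile(self, block):
        # Hypothetical stand-in for JIT recompilation of the modified block.
        return f"jit({block.name}, rev={block.self_write_count})"


def after_permitted_write(rt, block):
    if block.modified_flag:                            # (540)
        if block.self_write_count >= rt.threshold:     # (545)
            block.use_interpreter = True               # (550)
    else:
        block.modified_flag = True                     # (560), assumed
    # (570): discard the stale compiled block (undoing the previous
    # instruction itself is not modeled here).
    rt.main_memory.pop(block.name, None)
    if block.use_interpreter:                          # (575)
        rt.log.append(("interpret", block.name))       # (590)
    else:
        # (580): the recompiled block goes to main memory, never to the cache.
        rt.main_memory[block.name] = rt.recompile(block)
        rt.log.append(("recompiled", block.name))
```

The cache is never written in this sketch, which mirrors the requirement that the cached copy remain the unmodified version of the block.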
In another embodiment, execution of the basic block continues at step (570), without unrolling the last instruction of the basic block. In this embodiment, the dispatcher directs recompilation of the basic block from the main memory of the runtime, where the modified block resides, rather than from the cache (370), and insertion of the recompiled block in main memory (580). In this embodiment, the number of times that a basic block modifies itself may exceed the threshold if the basic block further modifies its own code before it completes execution. However, once execution of the basic block has completed, the set flag will cause the dispatcher to direct execution of the basic block to the interpreter if the block is invoked again by the CSECT.
In an alternative embodiment, the default state for a basic block could be to have a flag set to permit the execution of self-modifying code. In such a system, the attempt by the block to write to the program storage area would still cause a memory protection fault (505), but the flag would signify whether the basic block is permitted to modify itself, rather than whether the block has in fact modified itself. In this embodiment, the setting of the flag at step (560) is not required, but the flag must be cleared at step (550). A person of ordinary skill in the art would recognize that the program code could be implemented to test for an unset rather than a set condition, or to change flag settings if the count exceeded the threshold, rather than if the count were equal to the threshold. Alternatively, rather than setting a flag for use by the dispatcher, the exception handler could use a return code or other signal at step (590) to indicate to the dispatcher or the runtime environment to place the object in memory, but not in the cache.
In some applications, CSECTs may store program data, in addition to program instructions, in the program storage area. Such CSECTs may appear self-modifying in that they write to such data, rather than to instructions, located within the program storage area. However, such operations would not generally warrant recompilation and its associated cost. Because computer programs generally modify data with much higher frequency than their own instructions, the operation described above may result in unnecessary recompilation, or in redirecting such programs to the interpreter (590), even though they make few, or even no, modifications to their program code. By using a table that identifies which addresses in the program's memory space correspond to instructions, one might distinguish between CSECTs writing to their own data, which does not require recompiling the code, and CSECTs writing to their own instructions. However, creating a table to map the entire memory space of the program, and checking this table for every write, would introduce considerable overhead to the system. Using a bitmap, rather than a table, to indicate the addresses of program instructions may reduce overhead somewhat. But creating such a bitmap for every program, and checking every write by every program against the bitmap, negatively affects performance.
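A bitmap of instruction addresses, as discussed above, might look like the following minimal sketch. The one-bit-per-halfword granularity is an assumption (chosen because mainframe instructions are 2, 4, or 6 bytes long and halfword-aligned); the class and method names are hypothetical.

```python
class InstructionBitmap:
    """One bit per halfword of a program storage area; a set bit means the
    halfword holds (part of) a compiled instruction."""

    GRANULE = 2  # assumed granularity: halfword-aligned instructions

    def __init__(self, base: int, size: int):
        self.base = base
        self.size = size
        nbits = (size + self.GRANULE - 1) // self.GRANULE
        self.bits = bytearray((nbits + 7) // 8)

    def _index(self, addr: int) -> int:
        if not self.base <= addr < self.base + self.size:
            raise ValueError(f"address {addr:#x} outside program storage")
        return (addr - self.base) // self.GRANULE

    def _set(self, i: int, value: bool) -> None:
        byte, bit = divmod(i, 8)
        if value:
            self.bits[byte] |= 1 << bit
        else:
            self.bits[byte] &= ~(1 << bit) & 0xFF

    def mark(self, start: int, length: int, value: bool = True) -> None:
        """Set (or clear) the bits covering [start, start + length)."""
        first = self._index(start)
        last = self._index(start + length - 1)
        for i in range(first, last + 1):
            self._set(i, value)

    def is_instruction(self, addr: int) -> bool:
        byte, bit = divmod(self._index(addr), 8)
        return bool(self.bits[byte] >> bit & 1)
```

At this granularity a 4096-byte program storage area needs 2048 bits, or 256 bytes, which is far smaller than a per-address table; the remaining cost is the check on each write.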
In some cases, a load module (310) may be marked with metadata prohibiting the code from being reentrant or indicating that the code is read-only code. The metadata may alternatively indicate whether the code is permitted to be self-modifying. The load module (310) may include one or more CSECTs, which may not be contiguous. Where CSECTs are discontiguous with each other, each CSECT has its own corresponding memory area. An individual CSECT may be marked with its own metadata. In one embodiment, the loader detects the circumstance in which a program should not permit modifications to its own code, and places the program in protected memory to prevent modification. If a program that has been placed in protected memory nonetheless attempts to write to itself, the resulting memory protection fault will cause the memory handler to interrupt execution and return an error.
By managing the placement of compiled blocks in protected or unprotected memory, lazily creating a bitmap (or other data structure) indicating the addresses of compiled program instructions, and lazily verifying writes to the program storage area against the bitmap, the problem of redirecting CSECTs that modify internal data can be resolved, while avoiding the overhead of performing such verification for every program write operation.
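The lazy strategy can be sketched as follows, assuming the runtime can enumerate the address ranges of its compiled blocks when asked and that the check is reached only from the protection-fault path. A Python integer serves as a one-bit-per-byte bitset for brevity; `LazyInstructionMap` and its members are hypothetical names.

```python
from typing import Callable, Iterable, Optional, Tuple

Range = Tuple[int, int]  # (start address, length in bytes)


class LazyInstructionMap:
    """Builds the instruction-address bitmap only when the first write to
    protected program storage faults; programs that never fault pay nothing."""

    def __init__(self, base: int, compiled_ranges: Callable[[], Iterable[Range]]):
        self.base = base
        self._compiled_ranges = compiled_ranges
        self._bits: Optional[int] = None  # Python int used as a bitset
        self.builds = 0

    def _build(self) -> int:
        bits = 0
        for start, length in self._compiled_ranges():
            for addr in range(start, start + length):
                bits |= 1 << (addr - self.base)
        self.builds += 1
        return bits

    def write_hits_instructions(self, addr: int, length: int) -> bool:
        """Fault-path check; assumes addr lies within program storage."""
        if self._bits is None:
            self._bits = self._build()
        mask = ((1 << length) - 1) << (addr - self.base)
        return bool(self._bits & mask)
```

A write to data that shares the program storage area returns `False`, so the runtime can complete it without recompiling or redirecting the CSECT to the interpreter.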
In one embodiment, when a block that has been compiled by the load module compiler is linked and loaded, the loader is configured to designate the memory containing the block as protected memory by default. As illustrated in
An advantage of the approach depicted in
As seen in
In one embodiment, the compiled basic block is allowed to run to completion, whether or not the flag is set. In this embodiment, after step (1240), the basic block is recompiled and inserted in the main memory of the runtime (1255), rather than in the cache (370). In this embodiment, the number of times that a basic block modifies itself may exceed the threshold if the basic block further modifies its own code before it completes execution. However, once execution of the basic block has completed, the set flag will cause the dispatcher to direct execution of the basic block to the interpreter if the block is invoked again by the CSECT. Alternatively, after step (1240), the runtime may check the setting of the flag at step (1245). In this alternative embodiment, if the flag has been set, then execution is dispatched to the interpreter (1250). The interpreter is able to resume execution of the basic block because state was saved when the memory exception occurred. If at step (1245) the flag is not set, then the modified basic block is recompiled and inserted in RAM, rather than in the cache, the bitmap is updated to include settings for the recompiled basic block, and execution continues using the compiled basic block.
An advantage of the arrangement depicted in
In another embodiment, the memory locations containing the code compiled by the load module compiler are not stored in protected memory, and the determination as to whether or not a program write was to a program storage area is carried out, not by the memory protection fault handler, but instead by the write function implemented in the runtime library (315). In this embodiment, a bitmap to indicate the addresses of JIT-compiled blocks is always used by the load module compiler, and every program write is checked against the bitmap to determine whether or not there has been a write to a compiled basic block. This approach increases the overhead, because the bitmap is created even for programs that do not attempt to write to a program storage area, and because the bitmap must be checked for every program write. Although the overhead of this approach will generally be higher, with a corresponding decrease in system performance, it may be desirable to implement the system in this fashion where modifying the memory protection handler is not desired or is not possible.
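For contrast with the lazy approach, the following sketch models the eager embodiment, in which a runtime-library write routine consults the bitmap on every store and program storage is left unprotected. The memory model, the `stale` list of blocks needing recompilation, and all names are hypothetical; the `checks` counter exists only to make the per-write overhead visible.

```python
from typing import Dict, List, Tuple


class CheckedWriter:
    """Runtime-library write routine that consults the instruction bitmap on
    every store, so program storage need not be write-protected."""

    def __init__(self, base: int, size: int):
        self.base = base
        self.memory = bytearray(size)
        self.instr_bits = 0                       # maintained for every program
        self.blocks: Dict[str, Tuple[int, int]] = {}
        self.stale: List[str] = []                # blocks needing recompilation
        self.checks = 0

    def register_block(self, name: str, start: int, length: int) -> None:
        self.blocks[name] = (start, length)
        self.instr_bits |= ((1 << length) - 1) << (start - self.base)

    def write(self, addr: int, data: bytes) -> None:
        self.checks += 1                          # overhead paid on every store
        off = addr - self.base
        mask = ((1 << len(data)) - 1) << off
        if self.instr_bits & mask:
            # The store overlaps compiled instructions: find the blocks hit.
            for name, (start, length) in self.blocks.items():
                if start < addr + len(data) and addr < start + length:
                    if name not in self.stale:
                        self.stale.append(name)
        self.memory[off:off + len(data)] = data
```

Every call to `write` pays for the bitmap test, even in programs that never touch their own instructions, which is the overhead the paragraph above attributes to this approach.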
In some situations, it may be desirable to make the determination of whether to apply the approach that relies on the approach depicted in
It is possible that the instructions of a self-modifying program might modify instructions that are part of the CSECT, but have not previously been executed. If the write instruction is to a basic block that has not previously executed in this instance, but that has previously been compiled by the load module compiler, the mechanism described above with respect to
The first instruction loads a structure, CEECAA, that describes the language environment. The second instruction loads a vector that indicates functions; in this case, the number 92 indicates an offset of 92 into the structure loaded by the previous instruction. The third instruction loads the address of the function, which is at an offset of 256 in the table loaded by the previous instruction. Finally, the fourth instruction stores a return address in R14 and then branches to the function address loaded from offset 256. The fifth through seventh instructions of
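The address computation performed by these four instructions can be modeled as a chain of memory loads. This is a sketch only: it assumes 4-byte big-endian address words, takes the CEECAA address produced by the first instruction as an input, and uses a dictionary as a stand-in register file. The offsets 92 and 256 come from the description above; the function names and test addresses are hypothetical.

```python
import struct

WORD = ">I"  # assumed: 4-byte big-endian address words


def load_word(memory: bytes, addr: int) -> int:
    """Model of a load instruction: fetch one address word from memory."""
    return struct.unpack_from(WORD, memory, addr)[0]


def le_call_target(memory, ceecaa_addr, vector_offset=92, function_offset=256):
    """Follow CEECAA -> function vector -> function address."""
    vector_addr = load_word(memory, ceecaa_addr + vector_offset)  # 2nd instruction
    return load_word(memory, vector_addr + function_offset)       # 3rd instruction


def branch_and_link(regs, memory, ceecaa_addr, return_addr):
    """4th instruction: save the return address in R14, then branch."""
    regs["R14"] = return_addr
    return le_call_target(memory, ceecaa_addr)
```

Without optimization, a translation of this sequence must reproduce each of these dependent memory reads before the branch target is known.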
Absent optimization by the load module decompiler (320), these instructions would be translated into the LLVM-IR representation. An earlier load module compiler would have translated the instructions into C program instructions, as shown in
Similar to
An article of manufacture such as a disk, tape, flash drive, optical disk, CD-ROM, DVD, EPROM, EEPROM, optical card or other type of processor-readable storage medium may be used for storing electronic instructions. Computer instructions may be downloaded from a computer such as a server, to a requesting client computer or handheld device, using a communications link or network connection.
A system for storing and/or executing program instructions typically includes at least one processor coupled to memory through a system bus or other data channel or arrangement of switches, buffers, networks, and channels. The memory may include cache memory and local memory employed during execution of the program. Computers that run such instructions may be standalone computers or networked computers, in a variety of different form factors such as servers, blade servers, laptops or desktop computers, and mobile devices such as tablets or other multi-function handheld computing devices. Though the embodiments disclosed herein used Intel x86 or ARM processors, other processors may be used without effect on the invention disclosed herein. Main memory can be Random Access Memory (RAM), or other dynamic storage devices known in the art. Read only memory can be ROM, PROM, EPROM, Flash/EEPROM, or other known memory technologies. Mass storage can be used to store data or program instructions. Examples of mass storage include disks, arrays of disks, tape, and solid state drives, which may be configured in direct attached, network attached, storage area network, or other storage configurations that are known in the art. Removable storage media include tapes, hard drives, floppy disks, zip drives, flash memory and flash memory drives, optical disks and the like. Computer program instructions for performing operations of the systems described herein may be stored in one or more than one non-transitory storage medium, including any of the various types of non-transitory storage media discussed herein.
The embodiments described above include examples of input/output between the host and appliance devices, and between the legacy and appliance processors and DASD, local disks, and main memory. Additional I/O devices that may be applicable to embodiments of the invention described herein (including, but not limited to, keyboards, pointing devices, light pens, voice recognition devices, speakers, displays, printers, plotters, scanners, graphic tablets, disk drives, solid state drives, tape drives, CD-ROM drives, DVD drives, thumb drives and other memory media, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to be coupled to other data processing systems or remote printers or to storage devices through private or public networks.
Some embodiments described above describe the setting of flags in response to various conditions, such as the setting of a flag to direct the use of the interpreter for a basic block. As used herein, references to setting a flag shall be understood to include not only writing a specified value to the flag, but also to include not changing the value of the flag, where the existing value already indicates the desired setting. For example, if the default state of a flag were null or zero, a person of ordinary skill in the art would understand setting the flag to the null or zero state includes leaving the state of the flag unchanged from its default setting. Similarly, a person of ordinary skill in the art would understand that defining a flag to have one meaning when set, and another meaning when unset is equivalent to defining the flag to have the first meaning when unset and the second meaning when set, and using the corresponding opposite settings to evaluate a condition.
Many examples are provided herein. These examples may be modified without departing from the spirit of the present invention. The examples and embodiments described herein are only offered as examples, and other components, modules, or products may also be used. For example, additional architectures may be used and other types of machine instructions or operating system calls may be used. Additionally, although certain specific message types and databases are described herein, any suitable message type may be used. There are many other variations that can be included in the embodiments described herein, and all of these variations are considered a part of the invention.
This application claims the benefit of priority of U.S. Provisional Patent Application No. 63/154,333, filed Feb. 26, 2021, titled HYBRID JUST IN TIME LOAD MODULE COMPILER, which is hereby incorporated by reference herein in its entirety.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/IB2022/051686 | 2/25/2022 | WO |

Number | Date | Country
---|---|---
20240134666 A1 | Apr 2024 | US

Number | Date | Country
---|---|---
63154333 | Feb 2021 | US