Computers are designed to execute computer programs written in a format understood by the computer, such as machine code. However, programmers typically do not write computer programs in machine code. Instead, programmers typically write computer programs using human readable programming languages (e.g., Java™ (Java™ is a trademark of Sun Microsystems, Inc.), C++, C#, etc.). The resulting computer program is then compiled to generate a computer program in a format understood by the computer.
The process of compilation is performed by a compiler. Compilers typically take the source code (e.g., a computer program written in a human readable programming language) as input and generate an executable (e.g., a computer program in a format understood by the computer). Compilation is typically not a single-step process from the source code to the executable. Rather, the compilation process translates the source code into an Intermediate Representation (IR). The IR may then be translated into the executable. Depending on the complexity of the compilation, there may be multiple intermediate representations between the source code and the executable. The following diagram illustrates this point.
Source code→IR→(IR)*→Executable, where * denotes zero or more occurrences
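The chain in the diagram can be pictured as a list of translation passes applied in order. The following is a minimal sketch under stated assumptions: the class name PipelineSketch, the method compile, and the string-based "representations" are all hypothetical stand-ins, not part of any real compiler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: each pass maps one program representation to the next.
final class PipelineSketch {
    // Applies the passes in order: source code -> IR -> (IR)* -> executable.
    static String compile(String source, List<Function<String, String>> passes) {
        String current = source;
        for (Function<String, String> pass : passes) {
            current = pass.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Function<String, String>> passes = new ArrayList<>();
        passes.add(s -> "IR1(" + s + ")"); // source code -> first IR
        passes.add(s -> "IR2(" + s + ")"); // optional IR -> IR translation
        passes.add(s -> "EXE(" + s + ")"); // last IR -> executable
        System.out.println(compile("src", passes)); // prints EXE(IR2(IR1(src)))
    }
}
```

Each pass in the list is a separate translation whose correctness may need to be validated on its own.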
As discussed above, the source code (or previous IR) is translated to an IR and then subsequently translated from the IR to the executable (or subsequent IR). When writing a compiler that uses an IR(s), it is important to validate the translation from the source code (or previous IR) to the IR. Conventionally, the following method is used to validate the translation or, more specifically, the translation algorithm(s), used to obtain the IR.
Initially, the source code (or previous IR) is translated to obtain the IR. The source code (or previous IR) is associated with a known interpretation result. Said another way, the execution result of the executable corresponding to the source code is known. Returning to the method, the IR is subsequently input into an interpreter. The purpose of the interpreter is to execute the IR to obtain an interpretation result. Execution of the IR typically includes performing syntactic interpretation (e.g., β-reduction) to obtain one or more primitives (i.e., one of the basic building blocks of the IR). At this stage, the syntactic interpretation temporarily halts and meta-evaluation is invoked.
Meta-evaluation typically involves searching for the appropriate execution support for the primitive. Said another way, meta-evaluation involves determining whether the interpreter includes functionality to execute/evaluate the primitive and obtain a result. It is often the case that the interpreter does not include functionality to execute/evaluate the primitive. In such cases, the interpreter may take one of the following courses of action. The interpreter, upon determining that it does not have functionality to evaluate the primitive, may fail, thereby halting the execution of the IR by the interpreter.
Alternatively, the interpreter may obtain a second IR corresponding to the primitive. This alternative is typically taken when the primitive corresponds to a method call to a method referenced outside of the IR. Once the second IR corresponding to the primitive is obtained, the interpreter proceeds to execute the second IR to obtain a second execution result. If the interpreter successfully executes the second IR (i.e., the interpreter includes functionality to execute/evaluate each primitive encountered during execution of the second IR), then the interpreter may use the second execution result to continue executing the original IR. However, if the interpreter does not include functionality to execute/evaluate each primitive encountered in the second IR, then the interpreter may fail, or it may obtain and execute/evaluate subsequent IRs corresponding to the encountered primitives that it does not include functionality to execute/evaluate.
Once the interpreter has generated the interpretation result (assuming that it has not failed during the interpretation of the IR), the interpretation result is compared to the expected result of the executed source code. If the results are the same, then the translation to the IR from the source code has been validated. Alternatively, if the interpretation result is not the same as the expected result, then an error likely exists in the translation algorithm used to generate the IR from the source code.
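The conventional validation described above can be summarized as translate, interpret, compare. The following is a minimal sketch, assuming a hypothetical harness named TranslationCheck in which the translator and the interpreter are supplied as functions, and in which an empty Optional models an interpreter that failed before producing a result.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical harness; all names are illustrative and do not come from the text.
final class TranslationCheck {
    // Returns true only if the interpreter finishes and its result equals the expected result.
    static <S, I, R> boolean validate(S source,
                                      Function<S, I> translate,
                                      Function<I, Optional<R>> interpret,
                                      R expected) {
        I ir = translate.apply(source);           // source code (or previous IR) -> IR
        Optional<R> actual = interpret.apply(ir); // empty when the interpreter failed
        return actual.isPresent() && actual.get().equals(expected);
    }

    public static void main(String[] args) {
        // Toy stand-ins: the "IR" of "1 + 2" is the operand array {1, 2}.
        Function<String, int[]> translate = src -> new int[] { 1, 2 };
        Function<int[], Optional<Integer>> interpret = ir -> Optional.of(ir[0] + ir[1]);
        System.out.println(validate("1 + 2", translate, interpret, 3)); // prints true
    }
}
```

A false return value does not by itself distinguish a faulty translation from an interpreter that lacked functionality for some primitive, which is the limitation the remainder of the description addresses.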
In general, in one aspect, the invention relates to a computer readable medium comprising executable instructions for verifying generation of an intermediate representation (IR). The generation of the IR is verified by generating the IR from source code and interpreting the IR to obtain an interpretation result. Interpreting the IR includes encountering a method call in the IR, locating an execution unit corresponding to the method call, executing the execution unit to obtain an execution result, replacing a portion of the IR with the execution result to obtain a reduced IR, and obtaining the interpretation result from the reduced IR. Finally, the interpretation result is compared to an expected result of the source code, wherein the generation of the IR is verified if the interpretation result equals the expected result.
In general, in one aspect, the invention relates to a computer readable medium comprising executable instructions for verifying the generation of an intermediate representation (IR). Verifying the generation of the IR includes generating the IR from source code and interpreting the IR to obtain an interpretation result. Interpreting the IR includes encountering a method call in the IR, locating an execution unit corresponding to the method call, and determining whether a system executing the executable instructions includes functionality to execute the execution unit. If the system includes functionality to execute the execution unit, the execution unit is executed to obtain an execution result. If the system does not include functionality to execute the execution unit, an equivalent execution unit corresponding to the execution unit is located, wherein the system is configured to execute the equivalent execution unit, the equivalent execution unit is executed to obtain an equivalent execution result, and the execution result is generated from the equivalent execution result. A portion of the IR is then replaced with the execution result to obtain a reduced IR, the interpretation result is obtained from the reduced IR, and the interpretation result is compared to an expected result of the source code, wherein the generation of the IR is verified if the interpretation result equals the expected result.
In general, in one aspect, the invention relates to a system comprising an interpreter. The interpreter is configured to obtain an intermediate representation (IR) and interpret the IR to obtain an interpretation result. Interpreting the IR includes encountering a method call in the IR, locating an execution unit corresponding to the method call, and determining whether a system executing the executable instructions includes functionality to execute the execution unit. If the system includes functionality to execute the execution unit, the execution unit is executed to obtain an execution result. If the system does not include functionality to execute the execution unit, an equivalent execution unit corresponding to the execution unit is located, wherein the system is configured to execute the equivalent execution unit, the equivalent execution unit is executed to obtain an equivalent execution result, and the execution result is generated from the equivalent execution result. A portion of the IR is then replaced with the execution result to obtain a reduced IR, the interpretation result is obtained from the reduced IR, and the interpretation result is compared to an expected result of the source code, wherein the generation of the IR is verified if the interpretation result equals the expected result.
Other aspects of the invention will be apparent from the following description and the appended claims.
Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency. Further, the use of “ST” in the drawings is equivalent to the use of “Step” in the detailed description below.
In the following detailed description of one or more embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid obscuring the invention.
In general, embodiments of the invention relate to a method and system for extending the functionality of an interpreter to execute intermediate representations. Further, by extending the functionality of the interpreter to execute the intermediate representations, the interpreter may be used to validate the translation of the source code to the intermediate representation.
In one embodiment of the invention, the intermediate representation (IR) corresponds to a representation of a computer program that is not directly executable by the computer (i.e., it is not machine code) but is executable by an interpreter. Further, in one embodiment of the invention, the IR may be obtained via translation from source code or from another IR.
In one embodiment of the invention, source code corresponds to any computer program (or portion thereof) that is not machine code. Examples of source code include human readable computer programs (or portions thereof), byte code, and intermediate representations derived (directly or indirectly) from byte code or from human readable computer programs.
By repeating the β-reduction, the IR may be reduced from a complex set of expressions to one or more primitives. Primitives correspond to expressions in the IR (204) that cannot be simplified any further using syntactic evaluation. At this stage, the Interpreter (206) initiates meta-evaluation of the primitives.
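Repeated β-reduction can be illustrated on a small lambda-calculus term language. The following is a minimal sketch only: the classes Term, Var, Lam, App, and Beta are hypothetical and are not the IR described here, and substitution assumes that all bound variable names are unique so that no variable capture can occur.

```java
// Hypothetical term language for illustration; not the IR of the description.
abstract class Term {}

final class Var extends Term {
    final String name;
    Var(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

final class Lam extends Term {
    final String param;
    final Term body;
    Lam(String param, Term body) { this.param = param; this.body = body; }
    @Override public String toString() { return "(fn " + param + ". " + body + ")"; }
}

final class App extends Term {
    final Term fn;
    final Term arg;
    App(Term fn, Term arg) { this.fn = fn; this.arg = arg; }
    @Override public String toString() { return "(" + fn + " " + arg + ")"; }
}

final class Beta {
    // Replaces free occurrences of name in t with value.
    // Assumption: bound names are unique, so capture-avoiding renaming is omitted.
    static Term subst(Term t, String name, Term value) {
        if (t instanceof Var) {
            return ((Var) t).name.equals(name) ? value : t;
        }
        if (t instanceof Lam) {
            Lam l = (Lam) t;
            if (l.param.equals(name)) return l; // name is shadowed inside this abstraction
            return new Lam(l.param, subst(l.body, name, value));
        }
        App a = (App) t;
        return new App(subst(a.fn, name, value), subst(a.arg, name, value));
    }

    // Performs one leftmost-outermost beta step; returns null when no redex remains.
    static Term step(Term t) {
        if (t instanceof App) {
            App a = (App) t;
            if (a.fn instanceof Lam) {
                Lam l = (Lam) a.fn;
                return subst(l.body, l.param, a.arg);
            }
            Term f = step(a.fn);
            if (f != null) return new App(f, a.arg);
            Term x = step(a.arg);
            return x == null ? null : new App(a.fn, x);
        }
        if (t instanceof Lam) {
            Lam l = (Lam) t;
            Term b = step(l.body);
            return b == null ? null : new Lam(l.param, b);
        }
        return null; // a variable cannot be reduced syntactically
    }

    // Repeats beta steps until none applies; does not terminate on divergent terms.
    static Term normalize(Term t) {
        Term next;
        while ((next = step(t)) != null) {
            t = next;
        }
        return t;
    }
}
```

In this sketch, a term for which step returns null plays the role of an expression that syntactic evaluation cannot simplify further.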
In one embodiment of the invention, meta-evaluation of the primitives includes initially analyzing the built-in functions of the Interpreter (206) to determine whether the Interpreter (206) includes functionality to execute/evaluate the primitive. If the Interpreter (206) includes built-in functionality to execute/evaluate the primitive, then the Interpreter (206) proceeds to use the built-in functionality.
Alternatively, if the Interpreter (206) does not include built-in functionality to execute/evaluate the primitive, then the Interpreter (206) proceeds to determine whether the primitive corresponds to a method call (i.e., a call to a method, where the method is not defined within the IR (204) or the source code from which it was derived). If the primitive corresponds to a method call, then the Interpreter (206) proceeds to locate an execution unit corresponding to the method call.
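The two-stage decision made during meta-evaluation (built-in functionality first, then method call) might be sketched as follows. The class MetaEvalSketch, the Route enumeration, and the registration methods are assumptions made for illustration; the primitive name "add" is likewise hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the meta-evaluation decision; names are illustrative.
final class MetaEvalSketch {
    enum Route { BUILT_IN, METHOD_CALL, UNSUPPORTED }

    private final Set<String> builtIns = new HashSet<>();
    private final Set<String> externalMethods = new HashSet<>();

    void registerBuiltIn(String primitive) { builtIns.add(primitive); }
    void registerExternalMethod(String primitive) { externalMethods.add(primitive); }

    // Built-in functionality is checked first; only then is the primitive
    // tested for being a call to a method defined outside the IR.
    Route classify(String primitive) {
        if (builtIns.contains(primitive)) return Route.BUILT_IN;
        if (externalMethods.contains(primitive)) return Route.METHOD_CALL;
        return Route.UNSUPPORTED;
    }

    public static void main(String[] args) {
        MetaEvalSketch meta = new MetaEvalSketch();
        meta.registerBuiltIn("add");
        meta.registerExternalMethod("call_resolveType");
        System.out.println(meta.classify("add"));              // prints BUILT_IN
        System.out.println(meta.classify("call_resolveType")); // prints METHOD_CALL
        System.out.println(meta.classify("unknown"));          // prints UNSUPPORTED
    }
}
```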
In one embodiment of the invention, the execution unit corresponds to an executable version of the method call in the IR (204). In one embodiment of the invention, the execution unit may correspond to byte code, machine code, or any other type of executable code, which is capable of execution by any program or system other than the Interpreter (206).
Once the execution unit has been located, the Interpreter (206) invokes the method (212) in an execution environment (210). In one embodiment of the invention, the arguments to be used as input to the execution unit are obtained from the Interpreter (206).
In one embodiment of the invention, the execution environment (210) corresponds to a JVM. Alternatively, the execution environment (210) may correspond to any other system or program, except the Interpreter (206), configured to execute the execution unit. The result of executing the execution unit (i.e., the “execution result” (214)) is subsequently returned to the Interpreter (206).
In one embodiment of the invention, the execution result (214) is in a format that is understood by the Interpreter (206). Further, the execution result (214) may correspond to an expression that requires further syntactic and meta-evaluation. Once the Interpreter (206) has completed executing the IR (204), it generates an Interpretation Result (208).
In one embodiment of the invention, the Interpreter (206) uses reflection to locate and invoke the execution unit corresponding to the method call. In one embodiment of the invention, the Interpreter (206) uses the Java™ Reflection Application Programming Interface (API) to perform the reflection (Java™ is a trademark of Sun Microsystems, Inc.).
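Locating and invoking a method by name with the Java Reflection API might look like the following minimal sketch. The wrapper class ReflectiveCall and its invokeStatic method are hypothetical, and the sketch handles only public static methods; Class.forName, Class.getMethod, and Method.invoke are the standard library calls.

```java
import java.lang.reflect.Method;

// Hypothetical wrapper; only the java.lang.reflect calls are standard API.
final class ReflectiveCall {
    // Locates a public static method by class name, method name, and parameter
    // types, then invokes it with the supplied arguments.
    static Object invokeStatic(String className, String methodName,
                               Class<?>[] parameterTypes, Object[] arguments)
            throws ReflectiveOperationException {
        Class<?> owner = Class.forName(className);                   // locate the class
        Method method = owner.getMethod(methodName, parameterTypes); // locate the method
        return method.invoke(null, arguments);                       // null receiver: static call
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Object result = invokeStatic("java.lang.Integer", "parseInt",
                new Class<?>[] { String.class }, new Object[] { "345" });
        System.out.println(result); // prints 345
    }
}
```

The arguments are passed as an object array, which matches the description's statement that the inputs to the execution unit are obtained from the Interpreter.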
In one embodiment of the invention, the execution environment (210) may not include functionality to execute the execution unit. In such cases, the Interpreter (206) (or a related process) may include functionality to determine an equivalent execution unit, which the execution environment (210) can execute, that produces the same (or corresponding) output as execution of the execution unit would have produced. In such cases, the arguments for the equivalent execution unit may need to be modified prior to providing them as input to the equivalent execution unit. Similarly, the result of executing the equivalent execution unit (i.e., the equivalent execution result) may need to be modified to match the format (including argument name, type, value, etc.) of the execution result (214) that execution of the execution unit would have produced. Said another way, the equivalent execution result may need to be modified such that it appears as though it was obtained via execution of the execution unit.
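The use of an equivalent execution unit amounts to wrapping it between an argument adapter and a result adapter. The following is a minimal sketch, assuming a hypothetical class EquivalentUnit; the concrete adapters in main (trimming a string argument, widening an Integer result to a Long) are invented purely for illustration.

```java
import java.util.function.Function;

// Hypothetical wrapper: A and R are the argument and result formats of the
// original execution unit; B and S are those of the equivalent unit.
final class EquivalentUnit<A, B, R, S> {
    private final Function<A, B> adaptArguments; // original arguments -> equivalent unit's arguments
    private final Function<B, S> equivalent;     // unit the execution environment can execute
    private final Function<S, R> adaptResult;    // equivalent result -> original result format

    EquivalentUnit(Function<A, B> adaptArguments,
                   Function<B, S> equivalent,
                   Function<S, R> adaptResult) {
        this.adaptArguments = adaptArguments;
        this.equivalent = equivalent;
        this.adaptResult = adaptResult;
    }

    // Produces a result that appears as though the original unit had been executed.
    R execute(A originalArguments) {
        B adapted = adaptArguments.apply(originalArguments);
        S equivalentResult = equivalent.apply(adapted);
        return adaptResult.apply(equivalentResult);
    }

    public static void main(String[] args) {
        EquivalentUnit<String, String, Long, Integer> unit = new EquivalentUnit<>(
                s -> s.trim(),           // modify the argument before the call
                s -> Integer.parseInt(s), // the equivalent unit itself
                i -> i.longValue());      // modify the result after the call
        System.out.println(unit.execute(" 345 ")); // prints 345
    }
}
```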
If a method call requiring execution/evaluation is present, then the Interpreter proceeds to locate an execution unit (or equivalent execution unit) corresponding to the method call (ST306). Once the execution unit has been located, the Interpreter invokes the execution of the execution unit in the execution environment (ST308). The results of the execution (i.e., the execution results) are subsequently obtained (ST310). The execution results are subsequently copied into the appropriate portion of the reduced IR (ST312).
At this stage, a determination is made about whether execution of the IR is complete (i.e., does the result of ST312 correspond to the interpretation result) (ST314). If execution of the IR is complete, then the interpretation result is obtained (ST316). Alternatively, if the execution of the IR is not complete, then the method proceeds to ST302. In one embodiment of the invention, the interpretation result is then compared to the expected result to determine whether the translation to the IR from the source code is valid.
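The loop formed by ST306 through ST316 can be sketched on a toy IR in which every node is either a finished value or a pending method call. This is a sketch under stated assumptions: the class LoopSketch, its Node type, and the units registry (standing in for the execution environment) are hypothetical, and the syntactic reduction that precedes ST306 is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical toy interpreter loop; all names are illustrative.
final class LoopSketch {
    // A node is a finished value (name == null) or a pending method call.
    static final class Node {
        final String name;
        final List<Node> args;
        final Object value;
        Node(Object value) { this.name = null; this.args = new ArrayList<>(); this.value = value; }
        Node(String name, List<Node> args) { this.name = name; this.args = args; this.value = null; }
        boolean isValue() { return name == null; }
    }

    // Stand-in for the execution environment: method name -> execution unit.
    final Map<String, Function<List<Object>, Object>> units = new HashMap<>();

    Object interpret(Node ir) {
        Node current = ir;
        while (!current.isValue()) {       // ST314: is execution of the IR complete?
            current = reduceOnce(current); // not complete: go around again
        }
        return current.value;              // ST316: the interpretation result
    }

    // Executes one innermost pending call and returns the rewritten IR.
    private Node reduceOnce(Node n) {
        for (int i = 0; i < n.args.size(); i++) {
            if (!n.args.get(i).isValue()) {
                List<Node> copy = new ArrayList<>(n.args);
                copy.set(i, reduceOnce(n.args.get(i)));
                return new Node(n.name, copy);
            }
        }
        Function<List<Object>, Object> unit = units.get(n.name); // ST306: locate the execution unit
        if (unit == null) {
            throw new IllegalStateException("no execution unit for " + n.name);
        }
        List<Object> values = new ArrayList<>();
        for (Node a : n.args) {
            values.add(a.value);
        }
        Object result = unit.apply(values); // ST308 and ST310: invoke and obtain the result
        return new Node(result);            // ST312: copy the result into the reduced IR
    }

    public static void main(String[] args) {
        LoopSketch interpreter = new LoopSketch();
        interpreter.units.put("neg", a -> -((Integer) a.get(0)));
        interpreter.units.put("add", a -> ((Integer) a.get(0)) + ((Integer) a.get(1)));
        Node inner = new Node("neg", Arrays.asList(new Node(2)));
        Node root = new Node("add", Arrays.asList(inner, new Node(5)));
        System.out.println(interpreter.interpret(root)); // prints 3
    }
}
```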
The following example illustrates various aspects of the invention. The example is not intended to limit the scope of the invention. In the following example, source code (see below) is translated to IR (see below). The IR is then subsequently interpreted to determine whether the translation is correct.
The source code corresponds to a Boolean function that returns True if the argument of the function is an Integer.
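The original listing is not reproduced here. A function matching the description might look like the following sketch, which assumes the name perform_instanceof used later in the example and a hypothetical enclosing class InstanceofExample.

```java
// Hypothetical reconstruction based only on the prose description.
final class InstanceofExample {
    // Returns true if and only if the argument is an Integer.
    static boolean perform_instanceof(Object o) {
        return o instanceof Integer;
    }

    public static void main(String[] args) {
        System.out.println(perform_instanceof(Integer.valueOf(345))); // prints true
        System.out.println(perform_instanceof("345"));                // prints false
    }
}
```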
The following is the IR generated when perform_instanceof (new Integer(345)) is called. The expected result is True.
An Interpreter is then used to execute/evaluate the IR. Initially, the IR is reduced, using β-reduction, to obtain Reduced IR 1.
Reduced IR 1 is then reduced, using β-reduction, to obtain Reduced IR 2.
Reduced IR 2 is then reduced, using β-reduction, to obtain Reduced IR 3.
Reduced IR 3 is then reduced, using β-reduction, to obtain Reduced IR 4.
At this stage, the Interpreter encounters a primitive corresponding to a method call (i.e., call_resolveType). The Interpreter, using reflection, locates the execution unit (in this case, byte code) corresponding to the call_resolveType method. Once located, the execution unit is invoked with the argument “Ljava/lang/Integer;”. The execution result of executing the aforementioned execution unit is java.lang.Integer. The execution result is then pasted into the appropriate portion of Reduced IR 4 to produce Reduced IR 5.
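The behavior attributed to call_resolveType can be approximated with standard reflection. The following sketch assumes that the argument is a JVM object-type descriptor of the form "Ljava/lang/Integer;"; the class ResolveTypeSketch and its resolveType method are hypothetical stand-ins, not the actual execution unit.

```java
// Hypothetical stand-in for the execution unit behind call_resolveType.
final class ResolveTypeSketch {
    // Converts an object-type descriptor such as "Ljava/lang/Integer;" into
    // the corresponding Class by stripping 'L' and ';' and replacing '/' with '.'.
    static Class<?> resolveType(String descriptor) throws ClassNotFoundException {
        if (!descriptor.startsWith("L") || !descriptor.endsWith(";")) {
            throw new IllegalArgumentException("not an object-type descriptor: " + descriptor);
        }
        String binaryName = descriptor.substring(1, descriptor.length() - 1).replace('/', '.');
        return Class.forName(binaryName);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println(resolveType("Ljava/lang/Integer;").getName()); // prints java.lang.Integer
    }
}
```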
Reduced IR 5 is then reduced, using β-reduction, to obtain Reduced IR 6.
At this stage, the Interpreter encounters a primitive corresponding to a method call (i.e., call_instanceOf). The Interpreter, using reflection, locates the execution unit (in this case, byte code) corresponding to the call_instanceOf method. Once located, the execution unit is invoked with the arguments “java.lang.Integer, 345.” The execution result of executing the aforementioned execution unit is “true.” The execution result is then pasted into the appropriate portion of Reduced IR 6 to produce Reduced IR 7.
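A reflective instance check with the arguments given above can be sketched with the standard Class.isInstance method, itself located and invoked through reflection. The class InstanceOfSketch and the method callInstanceOf are hypothetical stand-ins for the execution unit behind call_instanceOf.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for the execution unit behind call_instanceOf.
final class InstanceOfSketch {
    // Looks up Class.isInstance reflectively and applies it to the given type and value.
    static boolean callInstanceOf(Class<?> type, Object value)
            throws ReflectiveOperationException {
        Method isInstance = Class.class.getMethod("isInstance", Object.class);
        return (Boolean) isInstance.invoke(type, value);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(callInstanceOf(Integer.class, 345));   // prints true
        System.out.println(callInstanceOf(Integer.class, "345")); // prints false
    }
}
```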
Reduced IR 7 is then reduced, using β-reduction, to obtain Reduced IR 8.
Reduced IR 8 is then reduced, using β-reduction, to obtain Reduced IR 9.
At this stage, Reduced IR 9 yields an interpretation result of True. The interpretation result of True is then compared to the expected result of the source code, namely, True. Because the expected result and the interpretation result are equal, the translation of the source code to the IR is valid.
The invention may be implemented on virtually any type of computer regardless of the platform being used.
Further, software instructions to perform embodiments of the invention may be stored on a computer readable medium such as a compact disc (CD), a diskette, a tape, a file, or any other computer readable storage device.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.