The technical field relates to components of a compiler computer program. More specifically, the field relates to an intermediate representation of exception handling constructs for compiling a program.
Generally speaking, a translator is a computer program that receives as its input a program written in one computer programming language and produces as its output a program in another programming language. Translators that receive as their input a high-level source language (e.g., C++, JAVA, etc.) and generate as their output a low-level language such as assembly language or machine language sometimes are more specifically referred to as compilers. The process of translation within a compiler program generally consists of multiple phases.
Exception handling is invoked when a flaw in the source program is detected. In existing compiler frameworks, exception handling constructs within the source program are processed separately from the main control flow of the intermediate representation. Traditionally, exception handling constructs are not explicitly represented in the control flow of the intermediate representation. In one well-known technique, regions within the source code where exception handling constructs are detected are delimited from the main control flow and thus are not subject to the same code optimization techniques as the main control flow. In yet another method, the exception handling constructs are captured within a table outside of the main control flow and the compiler back end processes them separately. Thus, there is a need for an intermediate representation for exception handling constructs that allows such constructs to be explicitly represented within the main control flow to take advantage of the same code optimizations and code generation techniques (i.e., compiler back end) as the rest of the source code.
Also, traditionally, intermediate representations have been specific to a source language. Thus, compilers have to be aware of the specific exception handling models of the source language associated with each representation. For our purposes, these exception handling models can typically be characterized by four features. The first feature determines if the exception is synchronous or asynchronous. A synchronous exception is associated with the action of the thread of control that throws and handles it. In this situation, an exception is always associated with an instruction of the thread. In other words, an exception handling action is invoked by an instruction when some condition fails. However, an asynchronous exception is injected into a thread of control other than the thread that may have thrown and handled it. In Microsoft CLR (the Common Language Runtime (CLR) is Microsoft's commercial implementation of the Common Language Infrastructure (CLI) specification; Microsoft is a trademark of Microsoft Corporation), this may be caused by aborting a thread via a system API. Such exceptions are not associated with a particular instruction. The effect is to raise an exception in the thread at some suitable point called a synchronization point.
Second, an exception may either terminate or resume the exception causing instruction. In the case of a terminating exception, the instruction is terminated and a filter, handler, or a finalization action is initiated. However, in the case of a resumption model, the offending instruction can be automatically resumed after some handling action is performed. The Structured Exception Handling (SEH) constructs in C/C++ fall into this category. This typically requires that the entire region including the exception causing instruction be guarded as if all memory accesses act like volatile accesses, thus disallowing any optimization of the memory accesses.
Third, an exception handling model may be precise or imprecise. In precise exception handling models, the relative ordering of two instructions needs to preserve the observable behavior of the memory state. This means that a reordering of instructions cannot be performed if a handler or another fragment of code will see different values of variables. Languages such as C#, Microsoft CLR and C++ require a precise mechanism. In such models, the compiler may need to preserve the order of exception instructions relative to each other and to any other instruction whose effect is visible globally. In imprecise models, the relative order of instructions with respect to exception effects is undefined and a compiler is free to reorder such instructions. In either model, the order between exception instructions and their handlers is always defined and is based on control dependencies. Some languages like Ada have an imprecise exception model.
The fourth feature of an exception handling model is how handler association is performed. In most languages, including C++, C#, and Microsoft CLR, handler association is lexical and performed statically. This means that it is statically possible to identify the start of the handler code and that this start is unique. As explained below, this attribute of statically identifying handler bodies may be used to generate the intermediate representation of the exception handling instructions. Thus, there is a need for a single framework for intermediately representing exception handling constructs that is uniform across multiple exception handling models and is capable of accounting for the various attributes of such models described above.
As described herein, a uniform intermediate representation of exception handling constructs may be used for expressing exception handling models of various languages. In one aspect, a single set of instructions related to the intermediate representation are described herein for expressing multiple different exception handling mechanisms. For example, a common set of related instructions may be used to describe the control flow from a try region to a finally region and then to outside of the finally region. In yet another aspect, control flow from a try region to a catch region may be expressed using a common set of related instructions. Furthermore, filters guarding the handler or catch region may also be expressed. Control flow from a try region to the “except” region to pass the control back to the exception causing region under certain conditions may also be expressed. Exception handling control flow related to object destructors may also be expressed using the uniform intermediate representation of the exception handling constructs.
In a further aspect, methods and systems are described herein for generating the uniform intermediate representation for expressing control flow of exception handling constructs. In one aspect, the intermediate representation may be generated by translating an intermediate language representation of the source code file. Multiple different intermediate languages may be used to generate the intermediate representation of exception handling constructs. In a further aspect, the intermediate representation of the exception handling constructs may be used by software development tools for such tasks as code generation, code optimization, analysis, etc.
Additional features and advantages will be made apparent from the following detailed description of illustrated embodiments, which proceeds with reference to accompanying drawings.
The uniform intermediate representation of the software having the exception handling constructs can explicitly express exception handling control of the software.
At 360, the uniform intermediate representation is read (e.g., by a compiler or other software development tool). For example, the uniform intermediate representation generated by the method of
At 370, a computer-executable version of the software is generated (e.g., by the compiler or other software development tool). The computer-executable version of the software implements the exception handling control flow of the software, based on the uniform intermediate representation.
Both the readers 435 and 445 may use appropriate algorithms implemented within their respective readers to parse or read their respective intermediate language code streams to express the exception handling constructs or instructions or expressions within the intermediate language code stream using a uniform framework of exception handling instructions 450 to be provided to the back end 460. Part of the rest of this document below describes various components of such a language independent exception handling instruction set. Furthermore, examples of exception handling constructs within the intermediate language are shown translated to their respective language independent intermediate representations. The document also describes algorithms and methods for parsing the intermediate language and generating the intermediate representations of exception handling constructs.
Exception causing instructions are guarded by their handlers or finally regions. When an instruction causes an exception, the control flow may pass to a handler, and sometimes the handler may be conditionally selected based on the processing of filter instructions. Control may flow to finally regions of code either as a result of an exception or directly; either way, the finally region is processed and used to implement clean-up code. Finally regions are always executed before control exits the corresponding try region. This mechanism can be used for implementing clean-up code, such as closing of file handles, sockets, locks, etc.
As described with reference to
As noted above, the intermediate representation of exception handling constructs in the intermediate language representation may be expressed at an instruction level.
The exception handling semantics may be represented by providing each instruction 505 with a handler field 510 that points to a label instruction 520, which is the start of the handler 530 for that instruction 505. If the instruction cannot throw an exception, then the handler field 510 of the instruction is set to NULL. If the instruction can throw an exception but has no handler, then the compiler may build a special handler to propagate control out of the current method.
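The handler-field scheme can be illustrated with a minimal Python sketch. The names `Instr`, `Label`, `can_throw` and `attach_default_handler` are hypothetical and are not part of the described intermediate representation; the sketch only assumes that each instruction carries an optional reference to a label, and that a throwing instruction without a handler is routed to a special handler that propagates control out of the method.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Label:
    """A label instruction marking the start of a handler (hypothetical type)."""
    name: str


@dataclass
class Instr:
    """One IR instruction; `handler` is None (NULL) when it cannot throw."""
    opcode: str
    srcs: List[str] = field(default_factory=list)
    dst: Optional[str] = None
    can_throw: bool = False
    handler: Optional[Label] = None


def attach_default_handler(body: List[Instr], unwind: Label) -> None:
    """Point every throwing instruction that lacks a handler at `unwind`."""
    for ins in body:
        if not ins.can_throw:
            ins.handler = None      # non-throwing instructions carry NULL
        elif ins.handler is None:
            ins.handler = unwind    # propagate control out of the method


catch = Label("$HANDLER1")
unwind = Label("$UNWIND")
body = [
    Instr("DIV", ["a", "b"], "t1", can_throw=True, handler=catch),
    Instr("ADD", ["t1", "c"], "t2"),
    Instr("CALL", ["f"], "t3", can_throw=True),
]
attach_default_handler(body, unwind)
```

After the call, the DIV instruction keeps its own handler, the ADD instruction has a NULL handler field, and the unguarded CALL instruction points at the special unwind label.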
A textual notation for describing the IR instruction 505 of
CC, DST=OPER1 SRC1, SRC2; $HANDLER1
The handler label, if any, appears after the semi-colon. When the instruction does not throw an exception, the handler field is set to NULL. This may be either specified by the semantics of the instruction or found to be the case as a result of optimization or program analysis. In that case, the instruction may be textually denoted as follows:
CC, DST=OPER1 SRC1, SRC2;
In cases where there is no destination operand or result for an instruction, the destination and the “=” sign in the instruction description are omitted. For example, a conditional branch instruction does not have any explicit destination operands and may be represented textually as follows:
CBRANCH SRC1, SRC1-LABEL, SRC2-LABEL;
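The textual notation can be reproduced by a small formatter. The following Python sketch is an illustration only; the function name `format_instr` and its parameters are assumptions, not part of the described system. It renders the destinations, the operator, the sources, a semicolon, and then the handler label if the handler field is not NULL.

```python
from typing import Optional, Sequence


def format_instr(opcode: str,
                 srcs: Sequence[str] = (),
                 dsts: Sequence[str] = (),
                 handler: Optional[str] = None) -> str:
    """Render one instruction as `DSTS=OPCODE SRCS; $HANDLER`."""
    text = opcode
    if srcs:
        text += " " + ", ".join(srcs)
    if dsts:
        # Destinations and "=" are omitted when there is no result.
        text = ", ".join(dsts) + "=" + text
    text += ";"
    if handler is not None:
        # A NULL handler field leaves nothing after the semicolon.
        text += " " + handler
    return text


throwing = format_instr("OPER1", ["SRC1", "SRC2"], ["CC", "DST"], "$HANDLER1")
branch = format_instr("CBRANCH", ["SRC1", "SRC1-LABEL", "SRC2-LABEL"])
```

Here `throwing` holds the guarded form with the handler label after the semicolon, and `branch` holds the conditional branch form with no destination and no handler.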
The following paragraphs describe the various exception handling related instructions of the intermediate representation by describing their operations, their inputs and outputs. Examples will illustrate how this instruction set can be used to generate an intermediate representation of exception handling constructs of various models within the same control flow as those instructions that are unrelated to exception handling.
An UNWIND instruction is used to represent control flow out of the current method when no matching handler for an exception is present. The UNWIND instruction is preceded by a label and is followed by an exit out of the method. The source operand (x) of the UNWIND operation represents the thrown exception object. This makes the data flow explicit. There can be one or more UNWIND instructions in a method. However, having just one UNWIND per method allows for savings in intermediate representation space for each method. Also, the handler field of an UNWIND instruction is usually set to NULL.
The control flow to and out of a finally region may be represented in the intermediate representation by a set of instructions that are related, e.g., FINAL, FINALLY and ENDFINALLY. The FINAL instruction in general handles the explicit transfer of control to a finally region, whereas the FINALLY instruction can accept transfer from a FINAL instruction or through an exception causing instruction with a handler. The ENDFINALLY instruction represents the control flow out of a finally region.
A FINAL instruction represents an explicit transfer of control to the start of a finally instruction. The first source operand of this instruction is the start label of the associated finally instruction, and the second operand is the continuation label where control is transferred after the finally region is executed. The handler field of a FINAL instruction is usually set to be NULL.
A FINALLY instruction has two destination operands. The first operand is the exception variable. This models the data flow of the exception object. The second operand is a label or a code operand for the continuation that is captured. When a FINALLY instruction is executed as a result of an exception the captured continuation is the label of the lexically enclosing handler, FINALLY label or UNWIND instruction. This continuation label is reflected as the handler field of the matching ENDFINALLY (see below). The handler field of a FINALLY instruction is usually set to NULL.
An ENDFINALLY instruction has two or more operands. The first operand is the exception variable. The second operand is the continuation variable, whose type is the type of a label or a code operand. It also has a case list that is used to represent possible control transfers for explicit final invocations in the program. An ENDFINALLY instruction must have its handler field set to the label of the lexically enclosing outer finally or handler (i.e., a FILTER or UNWIND instruction). If there is no exceptional control flow to the matching finally instruction, then the handler field may be NULL. Furthermore, the destination operands E and R of the FINALLY instruction are the same as the source operands E and R of the ENDFINALLY instruction. This ensures data dependence between the two instructions, which can be used by the back end components during code optimization.
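To make the relationship among FINAL, FINALLY and ENDFINALLY concrete, the following Python sketch emits a textual instruction sequence for a single try-finally region. It is a sketch under stated assumptions: the function `lower_try_finally`, the label names `$FINALIZE`, `$END` and `$UNWIND`, the `label:` prefix, the bracketed case list, and the sample CALL instructions are all hypothetical, and every instruction in the try body is assumed to be able to throw.

```python
from typing import List


def lower_try_finally(try_body: List[str],
                      finally_body: List[str],
                      outer_handler: str = "$UNWIND") -> List[str]:
    """Emit an illustrative FINAL/FINALLY/ENDFINALLY sequence as text."""
    fin, cont = "$FINALIZE", "$END"   # hypothetical label names
    ir: List[str] = []
    # Assumption: every try-body instruction may throw, so each one
    # names the finally label in its handler field.
    ir += [f"{ins}; {fin}" for ins in try_body]
    # Normal exit: explicit transfer to the finally region, then to `cont`.
    ir.append(f"FINAL {fin}, {cont};")
    # E receives the exception object, R the captured continuation.
    ir.append(f"{fin}: E, R=FINALLY;")
    ir += [f"{ins};" for ins in finally_body]
    # The case list holds the explicit continuation; the handler field
    # names the lexically enclosing handler for the exceptional path.
    ir.append(f"ENDFINALLY E, R, [{cont}]; {outer_handler}")
    ir.append(f"{cont}:")
    return ir


ir = lower_try_finally(["t1=CALL open_file", "CALL use_file t1"],
                       ["CALL close_file t1"])
```

In the resulting sequence, the same E and R appear as destinations of FINALLY and as sources of ENDFINALLY, which is what gives a back end a data dependence between the two instructions.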
Yet another set of exception handling intermediate representation instructions for representing a finalization control flow may be referred to as the FAULT and the ENDFAULT instructions. They are similar to the FINALLY and the ENDFINALLY instructions; however, unlike with a FINALLY instruction, control flow cannot be passed explicitly from a FINAL instruction to a FAULT instruction. Control branches to the FAULT instruction only through an exception causing instruction.
The ENDFAULT instruction terminates a related FAULT handler and throws the exception to a specified handler. The handler field can be NULL if all exceptional control flow to the corresponding FAULT instruction has been removed. In that case the fault handler is unreachable and can be deleted.
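The deletion of an unreachable fault handler can be sketched as a simple pass over a linear instruction list. This Python sketch is illustrative only: the `Instr` tuple, the function `drop_unreachable_faults`, and the assumption that fault regions are not nested are all hypothetical simplifications. A FAULT region is dropped when no instruction names its label in a handler field.

```python
from typing import List, NamedTuple, Optional


class Instr(NamedTuple):
    label: Optional[str]     # label preceding the instruction, if any
    opcode: str
    handler: Optional[str]   # handler field; None stands for NULL


def drop_unreachable_faults(code: List[Instr]) -> List[Instr]:
    """Delete FAULT..ENDFAULT regions that no handler field refers to."""
    referenced = {ins.handler for ins in code if ins.handler is not None}
    kept: List[Instr] = []
    skipping = False
    for ins in code:
        if ins.opcode == "FAULT" and ins.label not in referenced:
            skipping = True          # start of an unreachable region
        if not skipping:
            kept.append(ins)
        if skipping and ins.opcode == "ENDFAULT":
            skipping = False         # region ends with its ENDFAULT
    return kept


dead = [
    Instr(None, "ADD", None),
    Instr("$F", "FAULT", None),
    Instr(None, "CALL", "$UNWIND"),
    Instr(None, "ENDFAULT", "$UNWIND"),
    Instr("$UNWIND", "UNWIND", None),
]
live = [Instr(None, "DIV", "$F")] + dead[1:]

dead_result = [i.opcode for i in drop_unreachable_faults(dead)]
live_result = [i.opcode for i in drop_unreachable_faults(live)]
```

In `dead`, nothing refers to `$F`, so the whole region is removed; in `live`, the DIV instruction still names `$F` as its handler and the region is kept.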
Some intermediate languages (e.g., MSIL) implement a filter-based handler, whereby different handlers are assigned to exception causing events based on a characteristic of the exception causing event. Such control flow may be represented in an intermediate representation using instructions to catch and filter exceptions and then to specify handlers for exceptions (e.g., FILTER, ENDFILTER and TYPEFILTER). As described below, a TYPEFILTER instruction may be a short form of the FILTER and ENDFILTER instructions.
A FILTER instruction may be used to implement a general-purpose filter-based handler in MSIL. It matches any exception and simply returns the exception object in the destination operand of the instruction. The FILTER instruction is labeled, and is followed by an arbitrary sequence of instructions that may or may not use the exception variable. A FILTER instruction must eventually reach an ENDFILTER instruction without an intervening FILTER instruction. The handler field of a FILTER instruction is usually NULL.
An ENDFILTER instruction tests a Boolean operand (X) and, if it is 1, branches to the handler label; otherwise, it tries another filter or unwinds.
A TYPEFILTER instruction tests if the type of the exception object is a subtype of the type of the destination operand (which is statically known). If so, control is transferred to the first label (the handler label). Otherwise, another filter or unwind instruction label is tried. When the type filter matches, the destination operand is set to the exception object. The handler field of a TYPEFILTER instruction is usually NULL.
Note that a TYPEFILTER instruction is a short form and in fact can be represented as a combination of both FILTER and ENDFILTER operations as follows:
The FILTER instruction returns an exception object whose type is checked against e.Type; if it is of e.Type, x.cc is set to TRUE, and to FALSE otherwise. Then at ENDFILTER, the continuation is determined to be $LABEL1 or $LABEL2 depending on the value of x.cc. The same expression can be represented as a TYPEFILTER instruction as follows.
e.Type=TYPEFILTER $LABEL1, $LABEL2;
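The equivalence between TYPEFILTER and a FILTER and ENDFILTER pair can be checked with a small Python model. This is a sketch, not the described instruction set: the functions `endfilter`, `typefilter` and `typefilter_expanded` are hypothetical, Python exception classes stand in for statically known types, and `isinstance` stands in for the subtype test.

```python
from typing import Type


def endfilter(x: int, handler_label: str, next_label: str) -> str:
    """ENDFILTER: branch to the handler if x is 1, else try the next one."""
    return handler_label if x == 1 else next_label


def typefilter(exc: BaseException, static_type: Type[BaseException],
               handler_label: str, next_label: str) -> str:
    """TYPEFILTER: match when the exception is a subtype of static_type."""
    return handler_label if isinstance(exc, static_type) else next_label


def typefilter_expanded(exc: BaseException, static_type: Type[BaseException],
                        handler_label: str, next_label: str) -> str:
    """The same test written as a FILTER followed by an ENDFILTER."""
    obj = exc                                     # FILTER matches anything
    cc = 1 if isinstance(obj, static_type) else 0  # type check sets x.cc
    return endfilter(cc, handler_label, next_label)


match = typefilter(KeyError("k"), LookupError, "$LABEL1", "$LABEL2")
miss = typefilter(ValueError("v"), LookupError, "$LABEL1", "$LABEL2")
```

`KeyError` is a subtype of `LookupError`, so the first call selects the handler label, while the unrelated `ValueError` falls through to the second label; the expanded form gives the same answers.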
A MATCHANYFILTER instruction is a form of the filter based exception handling instruction.
This filter always matches any exception unconditionally, and transfers control to a valid label. This is equivalent to a FILTER-ENDFILTER pair where the first operand of the ENDFILTER is always 1 and the second label is not specified. The handler field of a MATCHANYFILTER instruction must be NULL. A MATCHANYFILTER instruction is also a short form and can be represented using FILTER and ENDFILTER instructions as shown below.
The equivalent MATCHANYFILTER instruction for the FILTER and ENDFILTER combination above is as follows:
e.Type=MATCHANYFILTER $LABEL1;
Yet another exception handling model has the control flowing from a try block to one or more handler regions based on type of exception caused and then one or more finally regions.
As shown in
For example,
However, as noted above, the exception instructions of the intermediate representation use labels to mark or identify various cohesive blocks of code, and these labels are used to build the intermediate representation along with the control flow. Thus, the IL reader uses the exception handling data tables such as the one shown in
For example,
Returning now to
Some languages such as C++ allow a programmer to declare object (class) variables that have local lifetimes within blocks or expressions where they are declared. The language semantics requires that the corresponding object destructors be called when the block scope or the expression scope is exited. These operations can be represented within the intermediate representation as one or more sets of try-finally blocks. For example,
Some languages such as C++ allow creation of expression temporary objects. These objects are created during expression evaluation and are destroyed after the expression evaluation, typically after the evaluation of the statement containing the expression. If the expression is a conditional expression, then objects created have to be destructed conditionally. For example,
Some languages such as C++ permit returning objects by value. Before returning the object, destructors are called on the locally created objects. For example, in
Some languages such as C++ permit throwing and catching objects by value, i.e., values allocated on the stack. Simple primitive values like type “int” do not pose any issues. However, throwing structs and classes allocated on the stack may require calls to constructors and destructors.
The try-finally block may be represented in the intermediate representation as a set of FINAL, FINALLY and ENDFINALLY instructions, and the following instruction may be used to represent throwing of value types which have copy constructors and destructors defined for them within the copy instruction.
This is a special form of throw that is used to throw value types which have copy constructors or destructors defined for them. It has two operands. The first operand is a pointer to the location that has the value being thrown, and the second operand is a function pointer that performs the destruction. The semantics of this is that the thrown object is destructed when a handler is found. This essentially keeps the local value type location live at runtime. This may be used to model the C++ exception semantics for value types. The handler field of a THROW instruction is usually not set to NULL.
Structured Exception Handling (SEH) extensions to languages such as C and C++ provide an exception handling construct expressed as a try-except block.
An SEHENTER instruction marks an entry to a try-except region. Its handler specifies a control dependency to the handler and the body of the guarded region.
An ENDRESUMEFILTER is similar to an ENDFILTER except that it may cause the execution of the exception causing instruction to be resumed when the source operand has a value of −1.
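The three-way decision made by ENDRESUMEFILTER can be sketched as follows. The function `endresumefilter` and its label parameters are hypothetical. The value −1 for resumption comes from the description above; the values 1 and 0 follow the ENDFILTER behavior and mirror the Windows SEH filter results EXCEPTION_EXECUTE_HANDLER and EXCEPTION_CONTINUE_SEARCH, which is an assumption about how the operand is encoded.

```python
RESUME = -1            # resume the exception causing instruction
CONTINUE_SEARCH = 0    # assumed: try another filter or unwind
EXECUTE_HANDLER = 1    # assumed: branch to the handler, as ENDFILTER does


def endresumefilter(value: int, handler_label: str,
                    next_label: str, faulting_label: str) -> str:
    """Pick the continuation for an SEH-style filter result."""
    if value == RESUME:
        return faulting_label      # re-execute the faulting instruction
    if value == EXECUTE_HANDLER:
        return handler_label       # run the except handler
    return next_label              # keep searching outward


resume_target = endresumefilter(RESUME, "$HANDLER", "$UNWIND", "$FAULTING")
handler_target = endresumefilter(EXECUTE_HANDLER, "$HANDLER", "$UNWIND", "$FAULTING")
search_target = endresumefilter(CONTINUE_SEARCH, "$HANDLER", "$UNWIND", "$FAULTING")
```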
As noted in
Generally, in post-fix notation expressions, the operands of an operation are expressed before the operator is expressed. For example, in the code for an ADD operation such as T=5+3, a reader will encounter the code for the operands 5 and 3 before encountering the code for the operator “+” (i.e., ADD). A translator for code in post-fix notation form, and more particularly one capable of translating such code in one pass, may translate the code for the operands first, and then build the translated code for the entire operation based on the code for its operands or its children nodes (also referred to as sub-expressions of an expression).
Post-fix notation language may express exception information in form of operations using selected operators, which may be processed by an IL reader in the manner of
In one such method, nodes read from intermediate language input are pushed onto the evaluation stack 3610 where they may be evaluated. The evaluation of a node read from the input may require popping some nodes from the evaluation stack or EH stack, and pushing new nodes onto the evaluation stack or EH stack. Generally, the evaluation stack 3610 may contain intermediate code related to most of the main code stream. Some nodes from the evaluation stack may then be popped off the stack to build code for other nodes as the context or containment relationship of the parent and children nodes is established. For example, returning to the simple add expression T=5+3, when it is established that 5 and 3 are operands for the “+” operation, the nodes on the evaluation stack 3610 related to the constants 5 and 3 are popped and the code for the add operation can be synthesized by composing the code of its children, namely the nodes representing 5 and 3. The translation algorithm uses and maintains the invariant that nodes on the evaluation stack have all attributes computed.
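The one-pass, stack-based translation of the T=5+3 example can be sketched in Python as follows. The token shapes, the function `translate_postfix`, the temporary names, and the ASSIGN opcode are assumptions made for illustration; only the evaluation stack is modeled, and the EH stack and DTOR code are left out.

```python
from typing import List, Tuple

# Each stack node is (name of the value, code that computes that value).
Node = Tuple[str, List[str]]


def translate_postfix(tokens: List[tuple]) -> List[str]:
    """Translate post-fix tokens in one pass using an evaluation stack."""
    stack: List[Node] = []
    emitted: List[str] = []
    temps = 0
    for tok in tokens:
        kind = tok[0]
        if kind == "CONST":
            stack.append((str(tok[1]), []))       # a leaf needs no code
        elif kind == "ADD":
            rhs_name, rhs_code = stack.pop()      # operands were pushed first
            lhs_name, lhs_code = stack.pop()
            temps += 1
            dst = f"t{temps}"
            # Compose the parent's code from the code of its children.
            code = lhs_code + rhs_code + [f"{dst}=ADD {lhs_name}, {rhs_name};"]
            stack.append((dst, code))
        elif kind == "STORE":
            name, code = stack.pop()
            emitted += code + [f"{tok[1]}=ASSIGN {name};"]
        else:
            raise ValueError(f"unknown token {tok!r}")
    return emitted


code = translate_postfix([("CONST", 5), ("CONST", 3), ("ADD",), ("STORE", "T")])
```

Every node pushed onto the stack already carries its complete code list, which mirrors the invariant that nodes on the evaluation stack have all attributes computed.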
The DTOR code data structure 3620 may be thought of as a structure for encapsulating the translated intermediate representation code sequences for all object destructors, catch blocks, and finally blocks that appear in the body of a method. The exception handling (EH) stack 3630 is not used to contain code sequences, as such, but may be thought of as a stack of continuations used to establish the relationship between the various sequences of code by building labels. The EH stack establishes the nesting relationship among try, catch and finally regions. Each region can be identified by the label associated with the region. Each node in the EH Stack has a unique ID called the state. The notion of state is used to compute the number of objects allocated in an expression evaluation. This information can be used later for such things as determining the number of destructors that need to be added to the translated code and their relationships to the rest of the code.
The data structure for each of the nodes on the evaluation stack 3610 may be as shown below.
Only the “Opcode” and the “IL Type” fields may be provided by the front end of the compiler; the rest of the fields are filled in during the translation process as the intermediate representation code is generated. The FirstInstr and the LastInstr fields allow the concatenation and pre-pending of code. The entire DTOR code data structure 3620 may be implemented similarly to the data structure of one node of the evaluation stack 3610.
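One way the FirstInstr and LastInstr fields can support cheap concatenation and pre-pending is to keep each node's code as a singly linked list with pointers to both ends. The following Python sketch assumes that layout; the `Instr` and `Node` classes, the `leaf` helper, and the sample type names are hypothetical, and only the four fields named above are modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    text: str
    next: Optional["Instr"] = None


@dataclass
class Node:
    opcode: str                           # provided by the front end
    il_type: str                          # provided by the front end
    first_instr: Optional[Instr] = None   # filled in during translation
    last_instr: Optional[Instr] = None    # filled in during translation

    def append(self, other: "Node") -> None:
        """Splice other's code after this node's code in constant time."""
        if other.first_instr is None:
            return
        if self.first_instr is None:
            self.first_instr = other.first_instr
        else:
            self.last_instr.next = other.first_instr
        self.last_instr = other.last_instr

    def prepend(self, other: "Node") -> None:
        """Splice other's code before this node's code in constant time."""
        if other.first_instr is None:
            return
        other.last_instr.next = self.first_instr
        self.first_instr = other.first_instr
        if self.last_instr is None:
            self.last_instr = other.last_instr

    def listing(self) -> List[str]:
        out, cur = [], self.first_instr
        while cur is not None:
            out.append(cur.text)
            cur = cur.next
        return out


def leaf(opcode: str, il_type: str, text: str) -> Node:
    ins = Instr(text)
    return Node(opcode, il_type, ins, ins)


node = leaf("ADD", "int32", "t1=ADD 5, 3;")
node.append(leaf("ASSIGN", "int32", "T=ASSIGN t1;"))
node.prepend(leaf("CALL", "void", "CALL ctor;"))
listing = node.listing()
```

Because the splice shares instruction objects, a node whose code has been appended or pre-pended to another should not be reused afterwards.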
The EH stack nodes may have the following data structure for representing the continuations on the exception path.
The label field points to a label instruction that can precede a Finally or TypeFilter instruction. The flag field may be used to identify the characteristics of the exception handling operations being performed. For example, it may be used to identify whether destructor code to be translated is for a temporary object such as an expression temporary or whether it is for an object that can be destructed outside of the expressions within which it is constructed.
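An EH stack of continuations with a label, a flag and a unique state per node can be sketched as follows. The classes `EHNode` and `EHStack`, the flag values `"scoped"` and `"temp"`, and the policy of numbering states in push order are assumptions for illustration; the actual node layout is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EHNode:
    label: str    # label preceding a Finally or TypeFilter instruction
    flag: str     # hypothetical values: "temp" or "scoped"
    state: int    # unique ID of this node


class EHStack:
    """Stack of continuations on the exception path."""

    def __init__(self) -> None:
        self._nodes: List[EHNode] = []
        self._next_state = 0

    def push(self, label: str, flag: str) -> EHNode:
        node = EHNode(label, flag, self._next_state)
        self._next_state += 1          # states are never reused
        self._nodes.append(node)
        return node

    def pop(self) -> EHNode:
        return self._nodes.pop()

    def current_handler(self) -> Optional[str]:
        """Label that a newly translated throwing instruction would name."""
        return self._nodes[-1].label if self._nodes else None

    def nesting(self) -> List[str]:
        """Labels from the outermost region to the innermost region."""
        return [n.label for n in self._nodes]


eh = EHStack()
outer = eh.push("$CATCH1", "scoped")
inner = eh.push("$FINALIZE1", "temp")
innermost = eh.current_handler()
order = eh.nesting()
eh.pop()
after_pop = eh.current_handler()
```

The top of the stack gives the handler label for the region currently being translated, and the stack order records how the regions nest.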
Within the context of the data structures for the nodes of the evaluation stack 3610, DTOR code 3620 and the EH stack 3630 a method for generating intermediate representation as shown in
The method of evaluation 3740 may be different for different operations as they are encountered during the process of reading the intermediate language code. The following example illustrates the method for evaluating some such operations in intermediate language code.
Having described and illustrated the principles of our invention with reference to the illustrated embodiments, it will be recognized that the illustrated embodiments can be modified in arrangement and detail without departing from such principles. Although the technologies described herein have been illustrated via examples using compilers, any of the technologies can use other software development tools (e.g., debuggers, optimizers, simulators and software analysis tools). Also, it should be understood that the programs, processes, or methods described herein are not related or limited to any particular type of computer apparatus. Various types of general purpose or specialized computer apparatus may be used with or perform operations in accordance with the teachings described herein. Actions described herein can be achieved by computer-readable media comprising computer-executable instructions for performing such actions. Elements of the illustrated embodiment shown in software may be implemented in hardware and vice versa. In view of the many possible embodiments to which the principles of our invention may be applied, it should be recognized that the detailed embodiments are illustrative only and should not be taken as limiting the scope of our invention. Rather, we claim as our invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.
This is a continuation of U.S. patent application Ser. No. 10/609,275, filed Jun. 26, 2003, the disclosure of which is hereby incorporated herein by reference.
Number | Date | Country |
---|---|---|
20070006192 A1 | Jan 2007 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 10609275 | Jun 2003 | US |
Child | 11505090 | | US |