This application is related to co-pending United States patent application entitled “Computer-Implemented Exception Handling System and Method,” filed on Aug. 1, 2002, and having Ser. No. 10/210,438.
The present invention relates generally to computer code generation and more particularly to computer code compiling.
Most computer programs require preexisting object files in order to execute their instructions. If a program's instructions must change, the underlying source code is typically modified off-line and recompiled into an object file stored on a computer hard disk. The modified program is then available for execution.
In accordance with the teachings disclosed herein, a computer-implemented system and method are provided for generating code. The system and method receive source code that includes higher order computer language statements. Machine code is generated directly into RAM from the received source code.
As an example, the application 33 may wish to run a data search. The source code 34 to handle the data search is created during run-time of the application 33. The generator module 36 generates the machine code 32 directly into RAM from the data searching source code 34. The generator module 36 makes available function pointers 40 so that the application 33 may call the data searching functions contained within the machine code 32. The application 33 can call these function pointers 40 as it would any other function pointer.
The application 64 may also use the APIs 62 to export data symbols, function symbols, and structures 70 so that they may be used during the machine code generation process. This is done when the generated code stream 72 needs to call back into the code of the application 64. To support this, the application 64 is allowed to export code symbols 70 in the form of function pointers, to the generator 68. When the generated code stream 72 needs to have access to data in the application program 64, the application 64 may export data to the code stream 72.
The generator 68 creates the machine code 72 from the source code 66 and from the exported information 70 and places the machine code stream 72 into RAM 74 as the machine code 72 is generated. It is noted that the generated machine code 72 is fully resolved and contains physical memory addresses. Any data required by the generated machine code is also generated and included in the code stream 72. The generator 68 creates function pointers 76 so that the application 64 may locate the functions in RAM 74.
Because the generated machine code 72 goes directly into memory 74 instead of to an object file, there are no “link” or “load” phases, so relocation is performed during the generator's compiling process. The specific types of relocations are dependent on the target machine, but illustrated in
As shown at reference number 102, another type of relocation is code to code relocation between generated functions. This is used for function calls or for taking the address of another function (so that it can point to the start of the function). Reference number 103 illustrates data to code relocation so that statically initialized pointers may point to the start of a function. Reference number 104 illustrates code to data relocation to obtain the address of a member of global data. Reference number 105 illustrates data to data relocation so that statically initialized pointers may point to data.
The description allows the input source code 66 to be shown alongside the machine code instruction(s) 72 to be executed. The description contains the current data values and the machine code instruction that is next to be executed. In this way, a user can analyze step-by-step the operation of the machine code 72. The debugger 120 may also access the code within the application 64 so that a user may examine the effect the application's execution has upon the machine code 72.
Continuation block A 158 indicates that processing continues on
Any errors arising during compiling are detected by decision block 162. If no errors arise, then at process block 164 the computer program accesses the functions and data contained in the machine code. Processing in this operational scenario terminates at end block 168. If an error did arise during compilation, then the error is analyzed via the debugger at process block 166.
Situations may become even more complicated as a B*tree computer program may request an unlimited variety of key-types and data-types. Coding for all the possibilities without the code generation system is quite complex and inefficient. The code generation system provides a way to generate customized code for the particular B*tree situation at hand. The following provides specific exemplary considerations that may be used in deciding how to create the optimal source code for the B*tree situation at hand:
The following illustrates use of various APIs in the code generation process. When the generated code stream needs to call back into the application code, the application code exports code symbols in the form of function pointers to the code generator by using the following code:
(Note: the tkgDefineExtern function defines external functions and uses three arguments: 1) handle to code generator, 2) name to export, 3) address of symbol.) The application would then submit code to define this function in the code stream as follows:
The code stream can now call the ‘function’. Note that the calling conventions on some platforms require that a register be set up to point to the global data area of each shared library (DLL). Because in this example ‘function’ appears in a different shared library from the rest of the generated code stream, the direct function call is converted to a function pointer call, so the data pointer can be loaded as part of the normal calling sequence of a function pointer call.
When the generated code stream needs to access data in the application program, the application can export data to the code stream as follows:
The application then submits the code to define symbol as an external data item.
The routines tkgAppendCSource() and tkgFormatCSource() allow the application to submit “C” source code to the code generator. Once all the external symbols are defined, and all the “C” source has been submitted, the application calls tkgGenerate() to cause the source code to be compiled into machine code. The application then calls tkgGetFuncAddr(), which constructs a C function pointer that can be called from the context of the application program. On some architectures, this simply returns the address of the code stream. On others, a function descriptor is created which contains the address of the code and any other information necessary to make a successful function pointer call, such as the address of the global data pointer. The function pointer obtained from tkgGetFuncAddr() can be cast to a function pointer of the appropriate type and called like any other function pointer.
(Note that the tkgFormatCSource function formats and adds source code to the compiler source buffer and, in this example, uses the following arguments: 1) handle to code generator, 2) varargs “printf-style” list of arguments. The tkgGenerate function compiles the code in the source buffer and returns a handle to manage the generated code stream; in this example it uses the following arguments: 1) handle to code generator, 2) IO (input/output) handle to write error messages to, 3) name of codestream (for debugger use). The tkgGetFuncAddr function in this example uses the following arguments: 1) handle to code stream, and 2) name of function.)
It is often desirable for generated code to reference data in structures defined by the application. Structures that the generated code stream references need to be defined in the incoming source code. To reduce the dual maintenance of keeping a structure definition in two locations (such as one in application header files, and the other in a series of tkgAppendCSource statements), a method is used to describe only the members that the code stream requires. If members are added to the application version, the offsets in the code stream version are automatically adjusted. To illustrate this, the tkgDefineStructList() API is discussed. This API allows a structure to be defined for the generated code to use. If the application developer modifies the structure, the definition sent to the code generator is automatically updated (note that in this C language example, this would be due to the use of the standard C offsetof() macro).
First, a set of helper macros are defined as:
Suppose we wish to allow the generated code to have access to members ‘a’,‘d’,‘x’, and ‘func’. A data structure may be set up and may describe a structure FOO as follows:
Then the application calls tkgDefineStructList() as follows:
While examples have been used to disclose the invention, including the best mode, and also to enable any person skilled in the art to make and use the invention, the patentable scope of the invention is defined by the claims, and may include other examples that occur to those skilled in the art. As an example of the wide scope attributable to the code generation system and method,
A wide range of computer applications may use the code generation system 60. As an illustration,
As shown in
The code generation system may be used on many different types of computer architectures and may be implemented in different ways. As an illustration, the generator module may co-exist on the same computer or device that contains the applications that generate and use the machine code. In another implementation, the generator module may reside on a first computer while the applications reside on a second computer; the generated machine code is placed directly into RAM device(s) that the applications can access. Also, the code generation software may be stored on many different types of computer readable media for purposes such as storage, delivery to a user, or execution. A flash memory device (that is in the address space of the program that generates the source code) may also be used in substitution for the RAM device.
As a further example of the wide scope of the code generation system, the code generation system may be tailored to perform run-time exception handling more efficiently. With reference to
When an exception happens, an exception signal handler 310 allows execution to resume at the recovery code contained within the exception catching functionality 322. The exception signal handler 310 accomplishes this by placing the value stored from the exception branching functionality 320 into the PC (program counter) slot of the exception context structure that was passed to the exception signal handler 310. The exception signal handler 310 then returns, and execution continues via the exception catching functionality 322.
The exception handling techniques may be used with source code of many different types of higher order languages, such as C, FORTRAN, Pascal, assembly, etc.
The ON_EXCEPTION_GOTO() statement 370 is a relatively fast operation that records the location of the recovery code located at the exception label 372. The ON_EXCEPTION_GOTO() statement 370 may be placed at any point within the generated source code 34. For example, the ON_EXCEPTION_GOTO() statement 370 may be placed in the beginning of the generated source code 34 in order to have exception handling techniques available from the execution's start. Because the exception-related statements (370 and 372) are being generated on-the-fly by the application 33 (e.g., a data mining application, a database application, a statistical application, etc.), the exception handling statements (370 and 372) may be tailored/customized for the situation at hand. The customization may include placing one or more of the exception-related statements (370 and 372) in different locations within the generated source code 34 based upon the application's current execution context. The customization may also include generating different recovery code based upon the application's current execution context. For example, the program may have attempted to compute a multiplication of two very large numbers. This would cause a floating point overflow exception. The recovery code could choose to set the result to the largest possible floating point number, and continue execution.
With the extension 350, the C machine code generator 340 can generate on-the-fly machine code 32 directly into RAM such that execution resumes at the EXCEPTION_LABEL() statement 372 no matter where in the machine code 32 the exception happens. To resume execution at the recovery code, an exception signal handler 360 places the value stored from the ON_EXCEPTION_GOTO() statement 370 into the PC (program counter) slot of the exception context structure that was passed to the exception signal handler 360. The exception signal handler 360 then returns, and execution continues at the EXCEPTION_LABEL() statement 372.
It is noted that the recovery code can perform any actions necessary or useful for handling the exception at hand. For example, the recovery code may allow the executing program to recover from the exception, activate a debugging program to analyze the exception, or terminate the program gracefully.
The run-time exception handling techniques allow for the reduction or the elimination of having to continually save the context of an executing program when trapping for exceptions. Moreover, the exception handling functionality may be extended so that the code generator generates exception handling code that allows the exception handling to be turned off as well as turned back on.
The exception handling functionality may also handle many types of exceptions, such as I/O interrupts (e.g., a person pressing Control-C while the program is executing), null pointer exceptions, overflow errors, or an index outside the bounds of an array. Other exception handling may include recovery code that allows locks to be properly released if the program is involved in a concurrency situation.
At process block 406, a code generator module generates machine code directly to RAM from the source code and allows the application to access the functions contained in the machine code. At process block 408, the generated machine code hits the ON_EXCEPTION_GOTO() statement which causes the address of the recovery code to be recorded. Within the generated machine code, the code portion that could possibly cause an exception is executed at process block 410. Processing continues on
With reference to
However, if an exception had occurred as determined by decision block 414, then the signal handler is invoked at process block 418. At process block 420, the signal handler changes the PC in the exception context to the location recorded by the ON_EXCEPTION_GOTO() statement and returns from the exception. At process block 422, the recovery code at the EXCEPTION_LABEL is executed, and execution continues at process block 416 before processing returns from the generated machine code at process block 424.

| Number | Date | Country |
|---|---|---|
| 20040025148 A1 | Feb 2004 | US |