1. Technical Field
The present disclosure relates to the field of validating computer executable instructions and, in particular, to a system and method for compilation validation.
2. Related Art
Software applications depend on the integrity of the compiler that converts source code to an executable form. A compiler is an extremely complex program and, for mission- or safety-critical applications, it may be necessary to be able to produce evidence that the compiler has produced valid output. The term “compiler” describes the tools needed to get from source code to executable code (e.g., compiler, assembler, linker, loader, etc.). Code conversion may be confirmed by a compiler validation.
Demonstrating that a compiler operates correctly for any source program processed by the compiler can be an extremely difficult task and the resulting demonstration will be fragile. Compiler validation has to be repeated after each and every change to the compiler and for each different host computer on which the compiler is run. It is also essential to demonstrate that the compiler does not silently produce any output for an incorrect source program.
The system and method may be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the disclosure. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
Other systems, methods, features and advantages will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included with this description and be protected by the claims that follow.
Compilation validation may be an alternative to compiler validation. Compilation validation may answer the question “is this particular compilation correct?” without the need to determine whether every compilation of any possible source code is correct.
Compilation validation has several advantages that may overcome some of the challenges of compiler validation. It is easier to demonstrate the correctness of a compilation than the correctness of the compiler because it is usually easier to check the result of an algorithm than the algorithm itself. Compilation validation may be unaffected by changes to the compiler, so no additional work may be needed when changes are made. Compilation validation may also be used with optimizing compilers, which are notoriously difficult to validate.
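The observation that a result is usually easier to check than the algorithm that produced it can be illustrated with an analogy from outside compilation. The following is a minimal sketch, not part of the disclosure: it checks one particular output of a sorting routine without reasoning about the routine itself. The function name `is_valid_sort_result` and the sample data are hypothetical.

```python
from collections import Counter


def is_valid_sort_result(original, result):
    """Check one particular output, not the algorithm that produced it."""
    # The result must be in non-decreasing order.
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    # The result must be a permutation of the input (same items, same counts).
    same_items = Counter(original) == Counter(result)
    return ordered and same_items


data = [3, 1, 2, 1]
good = sorted(data)   # output of some sorting routine we do not inspect
bad = [1, 2, 3]       # ordered, but one duplicate has been lost
```

The checker is a few lines long regardless of how complicated the sorting routine is, which mirrors the relationship between checking a compilation and validating a compiler.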
The compiler under test 102 is modified to produce not only the object code 104 but also an annotated version of the assembler code 106 (e.g., for a Digital Equipment Corporation (DEC) Alpha workstation) that allows a certifier 108 to produce a safety predicate (theorem) 110 for each function that will be true if, and only if, the assembler code is memory- and type-safe. A prover 112 then attempts to prove the predicate.
This technique relies on the changes introduced into the compiler under test 102 being correct. Microsoft Corporation's Verifying C Compiler (VCC) uses a variant of this technique where the programmer is required to embed the correctness requirements into the code itself.
Note that this “certifies” the compiler 204 only for that particular compilation: this must be repeated for each compilation. One advantage of this approach is that it does not try to demonstrate the compiler's accuracy for all programs, just the programs that form part of the system being developed. In the future this technique may be a viable path to compiler validation, but at present the theorem provers necessary to check the correctness are not time efficient. Verifying a theorem prover is tedious and complex, and many language features (e.g., pointers) cannot be handled.
The technique described below preserves the advantages of compilation validation, rather than compiler validation, and provides an approach that is more independent of the source language than other techniques such as those described above.
This approach expands a Trusted Computing Base (TCB) by assuming that the same compiler bug will not appear in both the compiler under test 304 and the second compiler 308. The checker 312 may be significantly simpler than the theorem prover required for the approach described above with reference to
The second compiler 308 does not need to be LLVM; it may be, for example, a variant or derivative of LLVM, a purpose-written compiler only producing intermediate code 310 or another compiler that generates intermediate code and/or certificates. In that case the second compiler 308 could itself be certified and, as it only has to run in one environment, certification would be relatively easy to obtain and maintain.
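The arrangement of a compiler under test, a second compiler, and a checker might be sketched as follows. This is an illustrative toy under stated assumptions, not the disclosed implementation: the "compilers" are hypothetical stand-ins that translate a three-token source string such as `sub 5 3`, and every function name here is invented for the example.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub}


def validate_compilation(source, compiler_under_test, second_compiler, checker):
    """Compile the same source twice and let the checker compare the outputs."""
    compilation_result = compiler_under_test(source)  # plays the role of 306
    intermediate_code = second_compiler(source)       # plays the role of 310
    return checker(compilation_result, intermediate_code)


def toy_compiler_a(source):
    # Stand-in for the compiler under test: emits flat stack code.
    op, lhs, rhs = source.split()
    return [("push", int(lhs)), ("push", int(rhs)), (op,)]


def toy_compiler_a_buggy(source):
    # Same as above, but with a deliberate bug: operands are pushed in swapped order.
    op, lhs, rhs = source.split()
    return [("push", int(rhs)), ("push", int(lhs)), (op,)]


def toy_compiler_b(source):
    # Stand-in for the second compiler: emits a nested intermediate form.
    op, lhs, rhs = source.split()
    return (op, int(lhs), int(rhs))


def toy_checker(stack_code, intermediate_code):
    # Evaluate both forms and compare the observable result.
    stack = []
    for instr in stack_code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[instr[0]](a, b))
    op, lhs, rhs = intermediate_code
    return stack == [OPS[op](lhs, rhs)]
```

In this toy, the swapped-operand bug is caught for `sub 5 3` but not for `add 5 3`, because addition is commutative. The result describes a particular compilation, not the compiler as a whole.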
To compare the intermediate code 310 and compilation results 306, the checker 312 may use any of several processes or any combination thereof. In one process based on static analysis, various static checks may be carried out to compare the two compilation outputs 306 and 310. These include, for example, checking that:
These checks may be inadequate to demonstrate compilation correctness, but, if differences are found at this level, no further analysis is required.
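A static comparison of this kind might look like the sketch below. The two checks it performs (same set of functions, same callees per function) are assumptions chosen for illustration, and the representation of each compiler output as a dictionary mapping a function name to the set of functions it calls is hypothetical.

```python
def static_differences(call_graph_a, call_graph_b):
    """Return human-readable differences between two call graphs.

    Each graph maps a function name to the set of functions it calls.
    An empty list means these static checks found nothing to report.
    """
    differences = []
    names_a, names_b = set(call_graph_a), set(call_graph_b)
    for name in sorted(names_a - names_b):
        differences.append(f"function {name} only in first output")
    for name in sorted(names_b - names_a):
        differences.append(f"function {name} only in second output")
    for name in sorted(names_a & names_b):
        if call_graph_a[name] != call_graph_b[name]:
            differences.append(f"callees of {name} differ")
    return differences


graph_a = {"main": {"f", "g"}, "f": set(), "g": set()}
graph_b = {"main": {"f"}, "f": set(), "h": set()}
report = static_differences(graph_a, graph_b)
```

A non-empty report can stop the validation early. An empty report only means that the more expensive comparisons still have to be run.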
Note that even with call graphs, the compiler outputs 306 and 310 may differ. Consider the example code segment:
Clearly doit2() will never actually be called (it would require x to be both odd and even), and it is possible that one compiler notices this and does not generate the call, while the other compiler does not notice and so generates it. Such conditions represent error conditions (dead code) and may be detected and removed before compilation validation is performed. If they are not, then the compilation validation may have the useful side-effect of detecting such code.
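The original code segment is not reproduced in this text. As a hypothetical reconstruction of the situation described, the sketch below nests a test for oddness inside a test for evenness, confirms over a range of inputs that the inner call is never reached, and shows how the call edges reported by two compilers could differ. The names `example` and `doit1` and the two edge sets are assumptions.

```python
calls = []


def doit1():
    calls.append("doit1")


def doit2():
    calls.append("doit2")


def example(x):
    # Hypothetical reconstruction: the inner condition contradicts the outer one.
    if x % 2 == 0:        # x is even
        doit1()
        if x % 2 == 1:    # x is odd, which cannot hold inside the even branch
            doit2()


for x in range(-1000, 1001):
    example(x)

# Call edges as two hypothetical compilers might report them.
edges_pruning = {("example", "doit1")}                        # removed the dead call
edges_naive = {("example", "doit1"), ("example", "doit2")}    # kept the dead call
suspect_edges = edges_naive - edges_pruning
```

An edge that appears in only one output, such as the one left in `suspect_edges`, is a candidate for dead code in the source and not necessarily a compiler fault.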
Symbolic execution (or “symbolic evaluation”) is the analysis of programs by tracking symbolic rather than actual values. Tools such as, for example, Klee (an open source symbolic virtual machine sub-project of LLVM released by the University of Illinois) may be used to carry this out on LLVM intermediate code 310 and it is also possible to carry out symbolic execution on object code 306. In another approach symbolic execution may be executed on both compiler output forms:
Symbolic execution can derive two invariants that hold at the return statement:
The second of these does not relate to an observable variable and may be ignored. However, the first does and should therefore be true in both versions of the program 306 and 310. It is possible that an invariant of this type is too strong—while one compiler produced code that satisfied it, that was not strictly necessary. In this case a determination may be made whether the full strength is required, but such cases should be rare.
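The filtering step described above, which keeps invariants over observable variables and requires them to hold in both versions, might be sketched as follows. The code segment and the two invariants of the original are not reproduced in this text, so the two program versions and the invariants below are hypothetical. The sketch also checks invariants by sampling inputs, which is a simple stand-in and not symbolic execution.

```python
def observable_invariants(invariants, observable):
    """Keep only invariants whose variables are all observable."""
    return [inv for inv in invariants if inv["vars"] <= observable]


def holds_for(program, invariant, inputs):
    """Check an invariant against the final state the program reports."""
    return all(invariant["pred"](program(x)) for x in inputs)


def object_code_version(x):
    # Hypothetical model of one compiler output; uses a temporary.
    tmp = x * x
    return {"result": tmp + 1, "tmp": tmp}


def intermediate_code_version(x):
    # Hypothetical model of the other output; no temporary is exposed.
    return {"result": x * x + 1}


invariants = [
    {"name": "result >= 1", "vars": {"result"}, "pred": lambda s: s["result"] >= 1},
    {"name": "tmp >= 0", "vars": {"tmp"}, "pred": lambda s: s["tmp"] >= 0},
]

kept = observable_invariants(invariants, observable={"result"})
inputs = range(-50, 51)
agree = all(
    holds_for(version, inv, inputs)
    for version in (object_code_version, intermediate_code_version)
    for inv in kept
)
```

The invariant on `tmp` is dropped because `tmp` exists in only one version and is not observable. Only the invariant on `result` is required to hold in both.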
Additional diversity can also be obtained by pre-processing the source program 302 with a source-code transformation tool 404 such as CIL as described by George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August, and Shubhendu S. Mukherjee. Software-controlled fault tolerance, TACO, 2(4):366-396, 2005, the entirety of which is incorporated herein by reference. This tool 404 transforms a C program into a semantically equivalent, but much simpler, program 406. This places less stress on the compiler and, given the magnitude of the transformation, even using the compiler under test 304 as the second compiler would, in principle, provide a level of confidence 314. The CIL tool 404 also emits other useful information (e.g., control and data flow graphs) that may be used to assist the checker 312.
An approach as described herein may give many of the advantages of compilation validation without the intractability of a formal proof. The system and method for compilation validation may produce a level of confidence while not necessarily producing a proof.
The processor 602 may comprise a single processor or multiple processors that may be disposed on a single chip, on multiple devices or distributed over more than one system. The processor 602 may be hardware that executes computer executable instructions or computer code embodied in the memory 604 or in other memory to perform one or more features of the system. The processor 602 may include a general purpose processor, a central processing unit (CPU), a graphics processing unit (GPU), an application specific integrated circuit (ASIC), a digital signal processor (DSP), a field programmable gate array (FPGA), a digital circuit, an analog circuit, a microcontroller, any other type of processor, or any combination thereof.
The memory 604 may comprise a device for storing and retrieving data, processor executable instructions, or any combination thereof. The memory 604 may include non-volatile and/or volatile memory, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a flash memory. The memory 604 may comprise a single device or multiple devices that may be disposed on one or more dedicated memory devices or on a processor or other similar device. Alternatively or in addition, the memory 604 may include an optical, magnetic (hard-drive) or any other form of data storage device.
The memory 604 may store computer code, such as a compiler under test 304, a second compiler 308, a checker 312, a source code transformation tool 404, and an object code transformation tool 402 as described herein. The computer code may include instructions executable with the processor 602. The computer code may be written in any computer language, such as C, C++, assembly language, channel program code, and/or any combination of computer languages. The memory 604 may store information in data structures including, for example, source code 302, object code 306, intermediate code (a.k.a. certificates) 310, correctness statements 314, transformed source code 406, and transformed object code 408.
The I/O interface 606 may be used to connect devices such as, for example, a display, a keyboard, or a pointing device, and to connect to other components of the system 600.
All of the disclosure, regardless of the particular implementation described, is exemplary in nature, rather than limiting. The system 600 may include more, fewer, or different components than illustrated in
The functions, acts or tasks illustrated in the figures or described may be executed in response to one or more sets of logic or instructions stored in or on non-transitory computer readable media. The functions, acts or tasks are independent of the particular type of instruction set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro code and the like, operating alone or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing, distributed processing, and/or any other type of processing. In one embodiment, the instructions are stored on a removable media device for reading by local or remote systems. In other embodiments, the logic or instructions are stored in a remote location for transfer through a computer network or over telephone lines. In yet other embodiments, the logic or instructions may be stored within a given computer such as, for example, a CPU.
While various embodiments of the system and method for compilation validation have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the present invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
This application claims priority from U.S. Provisional Patent Application Ser. No. 61/808,935, filed Apr. 05, 2013, the entirety of which is incorporated herein by reference.
Number | Date | Country
---|---|---
61808935 | Apr 2013 | US