The present invention relates generally to computer systems and software, and specifically to detecting bugs in software code.
Static analysis tools analyze computer software code without actually executing programs built from that code. By contrast, dynamic analysis is performed on executing programs. Static analysis is usually faster than dynamic analysis and is capable of covering all possible program states. On the other hand, static analysis tools tend to have a high rate of false positive error reports, i.e., they output warnings of many potential bugs that do not actually have any deleterious effect at run time, typically because the program never actually reaches the corresponding error states.
Various attempts have been made to reduce the false positive rate of static analysis tools or to eliminate false positives by combining static and dynamic analysis techniques. A technique of this sort is described, for example, by Artho and Biere in “Combined Static and Dynamic Analysis” (Technical Report 466, Department of Computer Science, ETH Zürich, Switzerland, 2005). The authors explain that it is often desirable to retain information from static analysis for run-time verification, or to compare the results of both techniques. For this purpose, they developed a framework, which they call “JNuke,” for analysis of Java programs, in which static and dynamic analysis share the same generic algorithm and architecture.
As another example, Csallner and Smaragdakis describe an automatic error-detection approach that combines static checking and concrete test-case generation in “Check ‘n’ Crash: Combining Static Checking and Testing,” 27th International Conference on Software Engineering (St. Louis, Mo., 2005). The authors state that their technique eliminates spurious warnings and improves the ease of comprehension of error reports.
An embodiment of the present invention provides a computer-implemented method for evaluating software code. A static analysis of the software code provides a warning indicating a respective location in the software code of a potential bug and a possible execution path leading to the potential bug. Responsively to the warning, instrumentation is added to the code at one or more locations along the execution path. When the instrumented code is executed, the instrumentation causes an output to be generated, indicating that the execution path was traversed while executing the instrumented code. The code may then be debugged responsively to the output.
Other embodiments provide apparatus and computer software products for carrying out these functions.
The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:
Fixing bugs and making other modifications to existing code often introduces new bugs. This problem of bug creation is especially acute when modifications are made to legacy code, which is often complex and not fully understood by those who are currently responsible for its maintenance. Debugging legacy code can itself be time consuming and expensive, and changes may often require authorization by external reviewers.
Although static analysis tools can be useful in identifying potential problems in modified legacy code, the high false-positive rate of these tools may complicate the task of debugging still further, by requiring programmers to work through long lists of potential bugs in the code that never actually occur during execution. In response to this problem, programmers often reduce the sensitivity of their static analysis tools (which commonly offer this sort of adjustment capability), which may consequently cause the tools to miss true bugs that fall below the sensitivity threshold. For all these reasons, it is desirable to filter out false positives and minimize the number of potential bugs that programmers must try to fix, while permitting the programmers to use high sensitivity in their static analysis.
Embodiments of the present invention use code instrumentation (i.e., special-purpose instructions that are added to software code), based on the results of static analysis, in order to determine which potential bugs actually do occur during execution. The instrumentation is added at certain points along possible execution paths that the static analysis has identified as leading to the potential bugs. When the code is then executed, the instrumentation generates an output that reveals which of these potential bugs actually do occur during normal operation of the code. Consequently, at least some of the remaining bug warnings from the static analysis may be ignored. Filtering out the false positives in this manner permits programmers to operate the static analysis tool at higher sensitivity, and thus to detect and fix more true bugs without otherwise modifying the static analysis tool in any way.
The techniques that are described hereinbelow are useful particularly in debugging legacy code, which is usually executable and often has a test suite that is representative of its use. This existing test suite may be used to exercise the code in ways that are representative of operation under actual application conditions. Alternatively, the techniques described herein may similarly be applied in debugging of new programs that have an execution environment suitable for these purposes.
Processor 22 performs a static analysis of the software code and instruments the code, as described hereinbelow, based on the results of the analysis. The processor then compiles and executes the code, possibly using a test suite that has been prepared for testing code operation. When the code traverses a path to a potential bug that was instrumented following static analysis, the instrumentation causes processor 22 to output an indication that the path was traversed, and thus to show the programmer that an actual bug exists in the program. The output may be delivered to the programmer via output device 28 and/or recorded in memory 24. Typically, the programmer responds to this indication by debugging the code. Alternatively or additionally, processor 22 may automatically suggest or implement a code correction.
Typically, processor 22 comprises a general-purpose computer, which is programmed in software to carry out the functions described herein. The software may be downloaded to the computer in electronic form, via a network, for example, or it may alternatively be provided on tangible media, such as optical, magnetic, or electronic memory. Processor 22 may comprise a single computer, as illustrated in
One tool of this sort, which has been used by the inventors in developing the present embodiment, is BEAM, which is described, for example, by Brand in “A Software Falsifier,” International Symposium on Software Reliability Engineering (San Jose, Calif., 2000). BEAM is a static analysis tool that looks for bugs in C, C++, and Java software. Like other such tools, BEAM reports problems that include bad memory accesses (uninitialized variables, dereferencing null pointers, etc.), memory leaks, and unnecessary computations, for example. It analyzes the likelihood that suspected errors are actually bugs and filters out suspected errors whose likelihood is below a certain sensitivity threshold, which may be set by the user. (As noted earlier, use of code instrumentation as described herein permits the user to set the threshold to a lower value, i.e., to increase the sensitivity and hence the number of true bugs discovered by the static analysis tool.) Alternatively, other tools with similar capabilities may be used.
Upon discovering a potential bug, BEAM issues a warning 36 reporting the type and location of the bug and identifying a possible execution path leading to the bug. Deciding feasibility of paths, however, is a computationally hard problem and cannot take into account all run-time conditions. Therefore, as noted earlier, many of warnings 36 issued by BEAM (and other static analyzers) are false positives, in the sense that normal execution of code 32 never actually traverses the paths leading to these bugs, or that the potential bug in question cannot actually occur for other reasons not known to the static analysis tool.
Operation of static analyzer 34 is illustrated below with reference to the following sample routine, written in C:
Upon analyzing this code, BEAM returns the following error type 1 (ERROR1) warning, indicating an uninitialized variable (in this case, the variable ‘c’):
—ERROR1 /*uninitialized*/ >>>ERROR1_foo—9269b7a63
“bug.c”, line 12: uninitialized ‘c’
“bug.c”, line 6: allocating ‘c’
“bug.c”, line 9: the if-condition is false
“bug.c”, line 13: getting the value of ‘c’
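By way of illustration only, each path line of such a warning may be broken into a file name, a line number, and a message before any instrumentation decision is made. The following C sketch assumes that the tool emits plain ASCII quotation marks and the line layout shown above; the names path_step, parse_path_step, and step_kind are hypothetical and are not part of BEAM.

```c
#include <stdio.h>
#include <string.h>

/* One step of the execution path listed in a static-analysis warning.
 * The structure and its field sizes are assumptions made for this sketch. */
struct path_step {
    char file[64];
    int  line;
    char message[128];
};

enum step_kind { STEP_ALLOC, STEP_USE, STEP_OTHER };

/* Parses a line of the form:  "bug.c", line 6: allocating 'c'
 * Returns 1 on success and 0 if the text does not match that layout. */
int parse_path_step(const char *text, struct path_step *out)
{
    int fields = sscanf(text, " \"%63[^\"]\", line %d: %127[^\n]",
                        out->file, &out->line, out->message);
    return fields == 3;
}

/* Classifies a step by the message prefixes seen in the sample warning. */
enum step_kind step_kind(const struct path_step *s)
{
    if (strncmp(s->message, "allocating", 10) == 0)
        return STEP_ALLOC;
    if (strncmp(s->message, "getting the value", 17) == 0)
        return STEP_USE;
    return STEP_OTHER;
}
```

In this sketch, the allocation step and the use step mark the two locations at which instrumentation would be considered for an uninitialized-variable warning.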
Processor 22 reviews warnings 36 and, where appropriate, automatically adds instrumentation 38 to code 32 along the paths indicated by the warnings. For example, when the processor encounters a warning regarding an uninitialized variable (ERROR1), the processor may execute the following logic in order to decide where and how to instrument the code:
Application of the above logic to the sample code in Table I will give the following instrumented code:
Instrumentation 38 has added a declaration of a new variable ‘copy_c’ at line 7 and assigned to it the value of the suspected uninitialized variable ‘c’ immediately after the allocation (line 6). An instruction is also added at line 12 to test the value of the suspected uninitialized variable against the new variable immediately before getting the value of the suspected uninitialized variable (line 13).
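A minimal C sketch of this snapshot-and-compare instrumentation is given below for illustration. It is not the sample routine of Table I: the routine foo_instrumented, the reporting hook report_bug, and the counter bug_report_count are hypothetical. In order to keep the sketch well defined in C, the "uninitialized" variable is modeled as a storage slot that still holds whatever value was last written to it, rather than as a truly uninitialized local variable.

```c
#include <stdio.h>

/* Hypothetical reporting hook; the counter lets a test harness see reports. */
int bug_report_count = 0;

void report_bug(const char *what)
{
    bug_report_count++;
    fprintf(stderr, "bug report: %s\n", what);
}

/* Stand-in for freshly allocated storage: it holds a stale value. */
int scratch_slot = 12345;

int foo_instrumented(int flag)
{
    int *c = &scratch_slot;  /* "allocating 'c'": contents are stale      */
    int copy_c = *c;         /* instrumentation: snapshot after allocation */

    if (flag) {
        *c = 7;              /* the only place where 'c' is initialized    */
    }

    if (*c == copy_c) {      /* instrumentation: value unchanged since allocation */
        report_bug("'c' is read without having been written");
    }
    return *c + 1;           /* "getting the value of 'c'" */
}
```

Calling foo_instrumented(1) writes ‘c’ before reading it and produces no report, whereas a subsequent call foo_instrumented(0) reads the stale value and produces one. A check of this kind may also fire if the program happens to write the same value that the storage already held, so a report is an indication to be examined rather than a proof.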
Processor 22 executes the instrumented code, possibly using an existing test suite 40 to provide a representative set of input commands and data. With respect to the sample code in Table I, if the execution traverses the path through lines 6 and 13 that was indicated by the static analysis bug warning and instrumented as shown in Table II, the added instructions at lines 7 and 12 will cause the processor to issue a bug report 42. Thus, the programmer will know that this particular warning refers to an actual bug, which should be fixed. Alternatively, if the instrumentation of this particular bug warning does not result in a bug report upon execution, the programmer will know that this warning is in all likelihood a false positive, and that the potential bug that it indicates need not be corrected. Eliminating unneeded code changes not only saves time for the programmer, but also avoids additional bugs that often appear when code is changed (particularly in legacy code).
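The filtering step described above may be sketched in C as follows, purely by way of example. The record layout, the labels, and the function names are assumptions of this sketch; the description does not specify how bug reports 42 are tallied against warnings 36.

```c
#include <stddef.h>

/* One static-analysis warning together with the number of times its
 * instrumentation fired during the test run (hypothetical layout). */
struct warning_record {
    const char *id;
    unsigned    hits;
};

/* A warning whose instrumentation fired at least once is treated as a
 * confirmed bug; one that never fired is treated as a likely false positive. */
const char *classify_warning(const struct warning_record *w)
{
    return w->hits > 0 ? "confirmed bug" : "likely false positive";
}

/* Counts the warnings that the run confirmed. */
size_t count_confirmed(const struct warning_record *ws, size_t n)
{
    size_t confirmed = 0;
    for (size_t i = 0; i < n; i++) {
        if (ws[i].hits > 0)
            confirmed++;
    }
    return confirmed;
}
```

The classification is only as good as the test suite that drives the execution: a warning that never fires may still correspond to a path that the suite does not exercise.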
Processor 22 may similarly instrument code 32 in response to warnings of other types. For example, BEAM ERROR4 warns of accessing already-deallocated memory, which may occur when the code contains multiple pointers to an address, one of which is accessed after another is freed. In this case, processor 22 may instrument the code on the given path so that when the first pointer is freed, the range of freed addresses is recorded, and a Boolean flag is set to true. When a subsequent pointer is accessed, a second instrumentation instruction checks whether the address of the pointer is within the recorded range, and whether the Boolean flag is set to true. If both conditions are met, the processor issues a bug report.
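The two instrumentation points for this type of warning may be sketched in C as follows. The sketch is illustrative only: demo_alloc and demo_free_instrumented are hypothetical stand-ins for the allocator, operating on a small static pool so that the demonstration itself never touches freed memory, and check_access and uaf_report_count are likewise assumed names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in heap: a static pool with a bump allocator. */
static unsigned char pool[64];
static size_t pool_used = 0;

/* State recorded by the first instrumentation point. */
static uintptr_t freed_begin = 0;
static uintptr_t freed_end = 0;
static bool freed_flag = false;

int uaf_report_count = 0;

void *demo_alloc(size_t n)
{
    if (n > sizeof pool - pool_used)
        return NULL;
    void *p = &pool[pool_used];
    pool_used += n;
    return p;
}

/* First instrumentation point: record the freed range and set the flag.
 * Instrumented production code would go on to release the block here. */
void demo_free_instrumented(void *p, size_t n)
{
    freed_begin = (uintptr_t)p;
    freed_end = freed_begin + n;
    freed_flag = true;
}

/* Second instrumentation point: placed just before a pointer is accessed. */
void check_access(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    if (freed_flag && addr >= freed_begin && addr < freed_end) {
        uaf_report_count++;
        fprintf(stderr, "bug report: access to deallocated memory\n");
    }
}
```

In this sketch only the most recently freed range is remembered, which matches a single warning on a single path; tracking several warnings at once would call for one such record per warning.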
As yet another example, BEAM ERROR9 warns of passing NULL, i.e., passing a non-existent address. To investigate this sort of error, processor 22 adds instrumentation just before the end of the execution path, to check the contents of the pointer in question before passing it. Possible instrumentation for other types of static analysis warnings will be apparent to those skilled in the art and is considered to be within the scope of the present invention.
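A check of this kind, inserted immediately before the pointer is passed, may be sketched in C as follows. The names check_not_null, null_report_count, and name_length are hypothetical, and the decision to skip the call after reporting is a choice made for this sketch so that the example does not crash; the description above requires only that the pointer be checked and the result reported.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

int null_report_count = 0;

/* Instrumentation: returns 1 if the pointer may be passed on,
 * or issues a bug report and returns 0 if it is NULL. */
int check_not_null(const void *p, const char *site)
{
    if (p == NULL) {
        null_report_count++;
        fprintf(stderr, "bug report: NULL passed at %s\n", site);
        return 0;
    }
    return 1;
}

/* Hypothetical routine in which static analysis warned that 'name'
 * may be NULL when it is passed to strlen(). */
size_t name_length(const char *name)
{
    if (!check_not_null(name, "name_length: call to strlen"))
        return 0;            /* sketch only: skip the call after reporting */
    return strlen(name);
}
```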
Although the above examples refer to certain types of errors in C code that are discovered by BEAM, the principles of the present invention may similarly be applied to other error types, as well as in debugging code in other languages, using a variety of static analysis tools that are known in the art. It will thus be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and subcombinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.