The field relates to dynamic program analysis, and tools therefor.
As defined by Microsoft® Computer Dictionary, Fourth Edition, Microsoft Press (1999), the heap is a portion of memory in a computer that is reserved for a program to use for the temporary storage of data structures whose existence or size cannot be determined until the program is running. To build and use such elements, programming languages such as C and Pascal include functions and procedures for requesting free memory from the heap, accessing it, and freeing it when it is no longer needed. In contrast to stack memory, heap memory blocks are not freed in reverse of the order in which they were allocated, so free blocks may be interspersed with blocks that are in use. As the program continues running, the blocks may have to be moved around so that small free blocks can be merged together into larger ones to meet the program's needs.
Microsoft® Computer Dictionary, Fourth Edition, Microsoft Press (1999) further defines garbage collection as, “a process for automatic recovery of heap memory. Blocks of memory that had been allocated but are no longer in use are freed, and blocks of memory still in use may be moved to consolidate the free memory into larger blocks. Some programming languages require the programmer to handle garbage collection. Others, such as Java, perform this task for the programmer.”
Many currently available programming language run-time environments provide a garbage collector to actively and automatically manage heap memory. Examples of such run-time environments include those for the Java programming language and the C# programming language, and Microsoft Corporation's .NET Common Language Runtime environment. The garbage collector periodically traverses the objects in heap memory to identify objects that are no longer in use, so that the memory occupied by such dead objects or “garbage” can then be reclaimed. Although garbage collectors vary in design, they generally operate by tracing the live objects, following pointers from a root object or objects of a program in the heap. Those objects still reachable by tracing pointers from the root object(s) are considered “live,” whereas any of the program's objects that can no longer be reached are dead or garbage. The garbage collector then reclaims the memory occupied by such dead objects.
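The tracing step described above can be illustrated with a minimal sketch. It is not the collector of any particular run-time environment; the function names, the string object ids, and the `references` mapping are hypothetical stand-ins for real heap objects and their pointer fields.

```python
from collections import deque


def trace_live(roots, references):
    """Return the ids of all objects reachable from the roots.

    `references` maps an object id to the ids it points to. Both the
    mapping and the ids are hypothetical stand-ins for heap objects.
    """
    live = set(roots)
    worklist = deque(roots)
    while worklist:
        obj = worklist.popleft()
        for target in references.get(obj, ()):
            if target not in live:
                live.add(target)
                worklist.append(target)
    return live


def collect(heap, roots, references):
    """Split the heap into live objects and garbage."""
    live = trace_live(roots, references)
    garbage = set(heap) - live
    return live, garbage


heap = {"a", "b", "c", "d"}
references = {"a": ["b"], "b": ["a"], "c": ["d"]}
live, garbage = collect(heap, ["a"], references)
```

In this example only "a" is a root, so "a" and "b" are live, while "c" and "d" are garbage even though "c" still points to "d".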
Modern software packages allocate and manage a vast amount of information on the heap. Object oriented languages such as Java and C# almost exclusively use the heap to represent and manipulate complex data structures. The growing importance of the heap necessitates detection and elimination of heap-based bugs. These bugs often manifest themselves in different forms, such as dangling pointers, memory leaks, and inconsistent data structures.
Unfortunately, heap-based bugs are hard to detect. The effect of these bugs is often delayed, and may be apparent only after significant damage has been done to the heap. In some cases, the effect of the bug may not be apparent at all. For instance, a dangling pointer bug does not crash the program unless the pointer in question is dereferenced, and on occasion may not cause a crash even then. Consequently, software testing is not very effective at identifying heap-based bugs. Because of the non-deterministic nature of heap-based bugs, even if the buggy statement is executed on a test run, it is not guaranteed to crash the program or produce unexpected results. Moreover, because the effect is often delayed, testing that does expose a failure frequently does not reveal the root cause of the bug.
Static analysis techniques, such as shape analysis (see, e.g., M. Sagiv, T. W. Reps, and R. Wilhelm, “Parametric Shape Analysis Via 3-Valued Logic,” ACM Trans. Prog. Lang. Syst. (TOPLAS), 24(3):217-298, May 2002), overcome these limitations. They examine all valid code paths, and can also provide soundness guarantees about the results of the analysis. Shape analysis has enjoyed success at determining the correctness of, or finding bugs in, algorithms that manipulate heap data structures. However, in spite of recent advances (such as described by B. Hackett and R. Rugina, “Region-Based Shape Analysis With Tracked Locations,” Proc. 32nd Symp. on Princ. of Prog. Lang. (POPL), January 2005; and E. Yahav and G. Ramalingam, “Verifying Safety Properties Using Separation And Heterogeneous Abstractions,” Proc. ACM SIGPLAN Conf. On Prog. Lang. Design and Impl., pages 25-34, June 2004), shape analysis algorithms are expensive, and apply only to limited classes of data structures and of properties to be checked on them. Moreover, the results of static analysis, while sound, are often overly conservative, and over-approximate the possible set of heap configurations.
On the other hand, dynamic analysis techniques have the advantage of precisely capturing the set of heap configurations that arise. Several dynamic analysis tools have been developed to detect special classes of heap-based bugs. (See, e.g., T. M. Chilimbi and M. Hauswirth, “Low-Overhead Memory Leak Detection Using Adaptive Statistical Profiling,” Proc. 11th Intl. Conf. on Arch. Support for Prog. Lang. and Op. Sys. (ASPLOS), pages 156-164, October 2004; B. Demsky and M. Rinard, “Automatic Detection And Repair Of Errors In Data Structures,” Proc. 18th ACM SIGPLAN Conf. on Object-Oriented Prog., Systems, Lang. and Appls. (OOPSLA), pages 78-95, October 2003; R. Hastings and B. Joyce, “Purify: Fast Detection Of Memory Leaks And Access Errors,” Winter USENIX Conference, pages 125-136, January 1992; and N. Nethercote and J. Seward, “Valgrind: A Program Supervision Framework,” Elec. Notes in Theor. Comp. Sci. (ENTCS), 89(2), 2003.) However, there has been relatively little research on understanding the runtime behavior of the heap, and on applying this information to bug finding.
The following description details various techniques and tools for discovering data structure invariants, which are properties or characteristics of the data structure that generally do not vary during execution of the program (such as, “Foo.x is a constant” or “Object[ ] bar only contains objects of type Baz,” etc.). These techniques and tools leverage the garbage collection process, in that the techniques and tools infer the invariants dynamically, at runtime, by analyzing the data structures on the heap as the garbage collector traverses the data structures.
In one exemplary implementation of this approach, the technique is implemented in a heap executive or garbage collector that performs the garbage collection process for a run-time environment in which a program executes. The program is run in this execution environment. As the program executes, the heap executive tracks object allocations made by the program, and records some meta data describing an allocated object based on the type of the object. This meta data represents the invariants that are to be inferred for the object. Initially, it is assumed that the object satisfies all the invariants that an object of its type could satisfy, and the meta data is initialized accordingly.
Then, whenever the garbage collection process is run, the heap executive updates the meta data of the objects on the heap. As the garbage collection process reaches each object, the heap executive checks which of the invariants are still satisfied by the object. For any invariants no longer satisfied by the object, the heap executive updates the meta data accordingly.
When an object dies (either when identified as garbage or at program termination), the heap executive reports the end state of the object's meta data. This end state reflects which invariants were satisfied across the lifetime of the object (although the heap executive alternatively can perform the invariant checking over some other interval).
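The life cycle just described (optimistic initialization at allocation, weakening at each garbage collection, reporting at object death) can be sketched as follows. This is a minimal illustration under stated assumptions, not the exemplary implementation: the `InvariantTracker` class, its method names, and the single invariant it tracks ("field is constant," as in the “Foo.x is a constant” example) are all hypothetical, and the three hook methods are called by hand where a real heap executive would call them from its allocator and collector.

```python
class InvariantTracker:
    """Toy model of per-object invariant meta data (all names hypothetical)."""

    def __init__(self):
        self.meta = {}    # id(obj) -> {"first": field values, "constant": field names}
        self.report = []  # end states of objects that have died

    def on_allocate(self, obj):
        # Optimistic start: assume every field of the object is constant.
        fields = dict(vars(obj))
        self.meta[id(obj)] = {"first": fields, "constant": set(fields)}

    def on_gc_visit(self, obj):
        # Check only the invariants that still hold; drop any that now fail.
        entry = self.meta[id(obj)]
        for name in list(entry["constant"]):
            if getattr(obj, name) != entry["first"][name]:
                entry["constant"].discard(name)

    def on_death(self, obj):
        # Report the invariants that held across the object's lifetime.
        entry = self.meta.pop(id(obj))
        self.report.append((type(obj).__name__, sorted(entry["constant"])))


class Foo:
    def __init__(self):
        self.x = 1
        self.y = 0


tracker = InvariantTracker()
foo = Foo()
tracker.on_allocate(foo)
foo.y = 5
tracker.on_gc_visit(foo)  # y differs from its first value and is dropped
foo.y = 0
tracker.on_gc_visit(foo)  # y is back to 0 but stays disqualified
tracker.on_death(foo)
```

After this run the report holds one entry stating that only `x` remained constant for `Foo`. Because the check runs only at collections, a field that changes and changes back between two collections would go unnoticed in this sketch.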
The invariants discovered through this technique could be reintroduced to the source code as static annotations (e.g., in a language like Spec#) to facilitate further code development. Also, the invariants could be learned then enforced at runtime (or through static analysis) to find bugs—those parts of the program code that violate the invariants. In one example application, the invariants discovered by the technique are introduced back into the source code of the program as static annotations. After changes in the source code from further development of the program, the heap executive checks that the objects created by the program on the heap at run-time continue to satisfy the invariants specified in these annotations.
In another particular application, this technique of dynamic invariant inference leveraging garbage collection can be applied to the heap-based bug identification using anomaly detection that is described by Trishul Chilimbi and Vinod Ganapathy, “HEAP-BASED BUG IDENTIFICATION USING ANOMALY DETECTION,” U.S. patent application Ser. No. 11/134,812, filed concurrently herewith (the disclosure of which is hereby incorporated herein by reference). More particularly, the heap executive implements a runtime tool that analyzes heap behavior during execution of a program to identify relatively stable properties (the invariants). The tool then detects the occurrence of anomalies deviating from the observed properties, which may lead to finding bugs.
Additional features and advantages of the invention will be made apparent from the following detailed description of embodiments that proceeds with reference to the accompanying drawings.
The following description is directed to techniques for dynamic invariant inference leveraging garbage collection. The techniques are described by reference to an exemplary software analysis tool implemented in a heap executive of a run-time, program-execution environment that provides garbage collection.
With reference to
The heap executive 130 provides a set of system-level services, including: a heap allocator 140 that provides allocation of heap memory for data structures to the program 110; and a garbage collector 150 that manages the allocated heap memory to reclaim memory from “dead” objects. The program 110 calls the heap allocator 140 through an application programming interface (API) to have space allocated on the heap 120 for data structures or objects that the program dynamically creates during its execution. The garbage collector 150 periodically runs a garbage collection process, which traverses the objects created by the program on the heap 120 to identify and reclaim space from any of the program's objects that are no longer reachable (i.e., “dead”). The heap allocator 140 and garbage collector 150 can employ conventionally known memory allocation and garbage collection processes.
The heap executive 130 additionally includes an invariant inference service 160 that implements the dynamic invariant inference leveraging garbage collection technique described herein. The invariant inference service 160 hooks the heap allocator 140 and garbage collector 150 services, so that the invariant inference service 160 can create and update meta data 162 about the objects created by the program 110 on the heap 120 as the program executes. The invariant inference service 160 also creates an invariant report 164 with information of the invariants it has inferred about the objects 122-125 on the heap.
With reference now to
In general, the basic operation of the invariant inference service 160 (invariant inference leveraging garbage collection process 200) is to track object allocations of the program 110 and store some meta data representing invariants of the objects based on their respective type. In the exemplary implementation, the invariant inference service optimistically assumes that the object will satisfy the invariants that an object of its type could satisfy. When the garbage collection is periodically run, the invariant inference service updates the meta data of the objects. When the garbage collection visits each object on the heap to test whether the object is reachable, the invariant inference service also checks whether the object satisfies the various invariants that it is tracking for the object. The invariant inference service updates the meta data of an object accordingly for any invariants that the object is found to no longer satisfy. When the object dies (e.g., from garbage collection or at program termination), the invariant inference service reports the final state of the invariants that it has tracked. The invariant inference service can compile a report of the heap invariants over the life of the program, possibly including a post-processing or off-line analysis of the invariant data.
The exemplary implementation of the invariant inference service performs this heap invariant inference technique by inserting hooks at several points in the normal heap management operations performed by the heap executive (i.e., memory allocation by the heap allocator API service 140 and the garbage collection operations of the garbage collector 150). More particularly, the invariant inference service inserts an invariant inference service initialization routine 230 at system startup of the heap executive 130. For tracking invariants while the program 110 runs (at stage 210), the invariant inference service inserts a hook 240 at object allocation by the heap allocator 140, and hooks 250, 260 at the points that the garbage collector performs its object reachability test on an object and processes a dead object in its periodic garbage collection passes. Then, at the program termination and system shut down stage 220, the invariant inference service inserts an invariant reporting procedure 270. These parts of the invariant inference service are described in more detail below.
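The hook placement described above can be pictured with a small sketch of a heap executive that exposes named hook points. The `HeapExecutive` class, the `HOOK_POINTS` names, and the `register` and `fire` methods are hypothetical and serve only to show the order in which the five hooks would run; they are not an API of any real run-time environment.

```python
# Hypothetical hook points, named after the stages described above.
HOOK_POINTS = ("startup", "allocate", "reach_test", "dead_object", "shutdown")


class HeapExecutive:
    """Toy heap executive that lets a service attach callbacks to its operations."""

    def __init__(self):
        self.hooks = {point: [] for point in HOOK_POINTS}

    def register(self, point, callback):
        self.hooks[point].append(callback)

    def fire(self, point, *args):
        for callback in self.hooks[point]:
            callback(*args)


events = []
executive = HeapExecutive()
for point in HOOK_POINTS:
    # Bind the point name as a default argument so each callback keeps its own.
    executive.register(point, lambda *args, p=point: events.append((p, args)))

executive.fire("startup")              # initialization routine
executive.fire("allocate", "obj1")     # allocation hook
executive.fire("reach_test", "obj1")   # reachability-test hook
executive.fire("dead_object", "obj1")  # dead-object hook
executive.fire("shutdown")             # reporting procedure
```

Keeping the hooks as registered callbacks leaves the allocator and collector logic untouched, which matches the statement that conventionally known allocation and collection processes can be used.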
In this exemplary implementation of the invariant inference service, the invariants are inferred across the lifetime of objects, although the inference could alternatively be performed over other periods.
In an alternative implementation of the invariant inference service, the invariant inference service could also defer creating and initializing the meta data for the object until the garbage collection iteration following the object's creation. This would potentially enhance efficiency by avoiding allocating meta data for short-lived objects that do not survive long enough after memory allocation to reach a garbage collection. However, the exemplary invariant inference service allocates the meta data at the memory allocation hook to also collect information as to the call site of the allocator for invariants relating to this information.
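The trade-off between the two strategies can be illustrated with a brief sketch. The `MetaStore` class, the integer addresses, and the `call_site` string are hypothetical; the point is only that deferred creation skips objects that die before their first collection, at the cost of losing the allocator call site.

```python
class MetaStore:
    """Toy meta data store; `eager` selects allocation-time creation."""

    def __init__(self, eager):
        self.eager = eager
        self.meta = {}  # hypothetical object address -> meta data

    def on_allocate(self, addr, type_name, call_site):
        if self.eager:
            # Only the allocation hook can see the allocator's call site.
            self.meta[addr] = {"type": type_name, "call_site": call_site}

    def on_gc_visit(self, addr, type_name):
        # Deferred mode: create meta data the first time a collection sees the object.
        if addr not in self.meta:
            self.meta[addr] = {"type": type_name, "call_site": None}
        return self.meta[addr]


eager, deferred = MetaStore(eager=True), MetaStore(eager=False)
for store in (eager, deferred):
    for addr in (1, 2, 3):
        store.on_allocate(addr, "Foo", call_site="make_foo")
    store.on_gc_visit(1, "Foo")  # only object 1 survives to the first collection
```

Here the eager store holds meta data for all three objects, including the two short-lived ones, while the deferred store holds a single entry whose call site is unknown.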
More specifically,
If the garbage collector 150 uses a garbage collection process that may move objects in the heap, the invariant inference service further hooks the garbage collector's procedure that moves objects. With this hook, the invariant inference service updates its mapping from the object address to its corresponding meta data. Also, the invariant inference service updates the meta data to appropriately reflect any pointer fields that are forwarded in the move, such as that the object's field pointing to one location has been forwarded to another location. Otherwise, a constant pointer could appear to be variable since it changes value.
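As a rough sketch of this bookkeeping, assume the meta data is keyed by object address and records the last observed value of each pointer field, and that the collector supplies a forwarding table from old to new addresses. The `apply_moves` function and the data layout are hypothetical.

```python
def apply_moves(meta_by_addr, forwarding):
    """Return meta data re-keyed and rewritten for a set of object moves.

    `forwarding` maps old addresses to new ones. Recorded pointer values
    are forwarded too, so an unchanged pointer still compares as constant.
    """
    moved = {}
    for addr, entry in meta_by_addr.items():
        pointers = {
            field: forwarding.get(target, target)
            for field, target in entry["pointer_fields"].items()
        }
        moved[forwarding.get(addr, addr)] = {**entry, "pointer_fields": pointers}
    return moved


meta = {
    0x1000: {"pointer_fields": {"next": 0x2000}},
    0x2000: {"pointer_fields": {}},
}
forwarding = {0x2000: 0x3000}  # the collector moved one object
meta = apply_moves(meta, forwarding)
```

Building a new mapping in one pass, instead of re-keying entries one at a time, avoids collisions when one object's new address equals another object's old address.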
The invariant inference service 160 can infer various invariants or characteristics of the data structures on the heap. In an exemplary implementation, the invariant inference service infers a set of generic invariants of the program's heap objects, as well as a set of type-specific invariants of the objects. In alternative implementations, the set of invariants can be extended to infer other invariants in addition to those inferred in this exemplary implementation, or can omit invariants inferred by the exemplary implementation. Further, the exemplary invariant inference service infers intra-object invariants, but alternatively can be extended to also infer inter-object invariants.
The set of generic invariants that the invariant inference service in the exemplary implementation checks for all program objects include the following invariants for each of the program object's reference fields:
In the exemplary implementation, the set of invariants inferred for program objects of type array t[ ] can include:
A number of invariants can be inferred for Collection types, such as:
Additionally, specific Collection types can have specific invariants, such as:
In an alternative implementation, the invariant inference service can be extended to also infer inter-object invariants in addition to the intra-object invariants listed above. In one example alternative implementation, the invariant inference service infers inter-object invariants as a post-process following program termination. For use in this post-processing, the invariant inference service tracks memory addresses of the heap objects 122-125 during program execution at garbage collection iterations, and emits or logs these memory addresses in the meta data. After program termination, the invariant inference service processes this information to reconstruct portions of the heap inferred as “constant” for a given garbage collection iteration. By then examining the object reference fields inferred as constant after program termination, the invariant inference service reconstructs the portion of the heap that has remained constant for the life of the program, and infers the inter-object invariants. This alternative implementation can then infer inter-object invariants, such as the following:
The foregoing description provides representative examples of invariants that can be discovered via the invariant inference leveraging garbage collection technique, and is not intended to be comprehensive or complete. Many other invariants of program objects on the heap that are similar to those discussed above also can be discovered using this technique.
In one example application of the above described invariant inference leveraging garbage collection process 200 (
More particularly, in one implementation 900 of the invariant inference leveraging garbage collection process 200 in a debugger, the heap executive with the invariant inference service 160 is used on a program in development to detect the introduction of bugs during the development process. The program is subjected to the invariant inference process 200 initially in a first invariant discovery run of the program. The invariant inference process is applied again in a debugging run after further edits have been made to the program. In the initial invariant discovery run, source code 905 of the program is compiled by compiler 910 into the executable program 110. The executable program is run in the run-time environment 100 (
The source code then may be edited by the programmer in further development, such as to add further features to the program or otherwise modify its operation. After these edits, the edited and annotated source code is again compiled by compiler 910 into the executable program 110 and again run in the run-time environment 100. Again, the invariant inference process produces the invariant report 164. This time, a bug detector 940 processes the invariant report 164. The bug detector compares the invariants reported in this debugging run of the program to the invariants specified by the annotations in the edited and annotated source code 935. The bug detector reports any violations (differences in the reported invariants from those specified in the annotations) as bugs in a bug report 945. The bug detector can be implemented to operate as an off-line or post-process on the invariant report resulting from an execution of the program in the run-time environment. Alternatively, the bug detector can be implemented to operate in real-time in combination with the invariant inference process 200, such as also during garbage collection passes. In this way, the bug detector can detect and report violations of the annotated invariants as the program is running.
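The off-line comparison performed by the bug detector can be sketched as a set difference between annotated and reported invariants. The `find_violations` function, the dictionary layout, and the invariant strings are hypothetical; the sketch also assumes that a type never observed in the debugging run yields no evidence and is skipped, which the description does not specify.

```python
def find_violations(annotated, reported):
    """List (type name, invariant) pairs that were annotated but no longer reported.

    Both arguments map a type name to a set of invariant descriptions.
    """
    violations = []
    for type_name, expected in sorted(annotated.items()):
        if type_name not in reported:
            continue  # assumption: no objects of this type were observed
        for invariant in sorted(expected - reported[type_name]):
            violations.append((type_name, invariant))
    return violations


annotated = {
    "Foo": {"x is constant", "bar holds only Baz"},
    "Unused": {"size is constant"},
}
reported = {"Foo": {"x is constant"}}
bug_report = find_violations(annotated, reported)
```

In this example the debugging run no longer reports that `bar` holds only `Baz` objects, so that single invariant appears in the bug report.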
In one particular application, the invariant inference leveraging garbage collection described herein can be applied to the heap-based bug identification using anomaly detection technique described by Trishul Chilimbi and Vinod Ganapathy, “HEAP-BASED BUG IDENTIFICATION USING ANOMALY DETECTION,” U.S. patent application Ser. No. 11/134,812, filed concurrently herewith (the disclosure of which is hereby incorporated herein by reference). More particularly, the invariant inference service described herein can be used to infer relatively stable properties (the invariants) of heap objects in a first execution of a program (or previous phases of execution of a long running program). Then, an anomaly detection tool (which may again be implemented using the invariant inference service) detects the occurrence of anomalies where the objects' heap behavior deviates from their previously observed invariants. The anomaly detection tool can be implemented as an off-line process that compares the invariants reported by the invariant inference service in a first execution of the program to those reported in subsequent executions. Alternatively, the anomaly detection can be implemented as a run-time tool in which the invariants tracked by the invariant inference service are compared to invariants reported in a previous execution of the program (or previous phases of execution of a long running program) to detect the occurrence of anomalies where the object deviates from the previously reported invariants.
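The run-time variant can be sketched as a check made at each garbage collection pass against a baseline taken from an earlier execution. The `detect_anomalies` generator and its data layout are hypothetical and do not reproduce the technique of the referenced application; each anomaly is reported once, at the first pass where a baseline invariant stops holding.

```python
def detect_anomalies(baseline, gc_passes):
    """Yield (pass index, type name, invariant) when a baseline invariant first fails.

    `baseline` maps a type name to the invariants seen in an earlier run.
    `gc_passes` yields one dict per collection, mapping a type name to the
    invariants still holding in the current run.
    """
    already_flagged = set()
    for index, still_holding in enumerate(gc_passes):
        for type_name, expected in sorted(baseline.items()):
            if type_name not in still_holding:
                continue  # no objects of this type seen in this pass
            for invariant in sorted(expected - still_holding[type_name]):
                key = (type_name, invariant)
                if key not in already_flagged:
                    already_flagged.add(key)
                    yield index, type_name, invariant


baseline = {"Node": {"next is constant", "value is constant"}}
passes = [
    {"Node": {"next is constant", "value is constant"}},
    {"Node": {"next is constant"}},
    {"Node": {"next is constant"}},
]
anomalies = list(detect_anomalies(baseline, passes))
```

The result contains one anomaly, raised at the second pass (index 1), when `value` stops being constant.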
In a further example application, the above described invariant inference leveraging garbage collection process 200 (
The above described exemplary software analysis tool 100 (
With reference to
A computing environment may have additional features. For example, the computing environment 1000 includes storage 1040, one or more input devices 1050, one or more output devices 1060, and one or more communication connections 1070. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment 1000. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 1000, and coordinates activities of the components of the computing environment 1000.
The storage 1040 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment 1000. The storage 1040 stores instructions for the software 1080 of the exemplary analysis tool implementing the heap invariant inference leveraging garbage collection techniques.
The input device(s) 1050 may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 1000. For audio, the input device(s) 1050 may be a sound card or similar device that accepts audio input in analog or digital form, or a CD-ROM reader that provides audio samples to the computing environment. The output device(s) 1060 may be a display, printer, speaker, CD-writer, or another device that provides output from the computing environment 1000.
The communication connection(s) 1070 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio/video or other media information, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
The analysis tool and techniques herein can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment 1000, computer-readable media include memory 1020, storage 1040, communication media, and combinations of any of the above.
The techniques herein can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.
For the sake of presentation, the detailed description uses terms like “determine,” “generate,” “adjust,” and “apply” to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.
In view of the many possible embodiments to which the principles of our invention may be applied, we claim as our invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.
Number | Date | Country
---|---|---
20060265438 A1 | Nov 2006 | US