This disclosure relates generally to the field of computer software systems and in particular to methods for the effective typestate and lifetime dependency analysis of software systems such as those written in C/C++.
As is known, object-oriented languages including Java and C++ are now extensively used to construct large-scale software and systems. As contemporary society increasingly relies on such software and systems, scalable techniques for checking their correctness, reliability and robustness become increasingly important. And while a number of scalable static analysis techniques for C and Java have been proposed, there has been comparatively little work done on the static analysis of C++ programs. Consequently, the development of such techniques would represent a welcome addition to the art.
An advance is made in the art according to an aspect of the present disclosure directed to methods that identify correctness, performance, and maintenance issues (bugs) in C++ programs using bug patterns. Advantageously, a pattern-based method according to the present disclosure using simple patterns may detect even complex bugs involving lifetimes of objects.
Viewed from one aspect, the present disclosure is directed to typestate and lifetime dependency analysis methods for identifying bugs in C++ programs. Disclosed is an abstract representation (ARC++) that models C++ objects and makes object creation/destruction, usage, lifetime and pointer operations explicit in the abstract model, thereby providing a basis for static analysis of the C++ program. Also disclosed is a lifetime dependency that tracks implied destructions between objects, providing an effective high-level abstraction for issues involving temporary objects and internal buffers, and that is subsequently used in the static analysis that supports typestate checking for the C++ program. Finally disclosed is a framework that automatically generates ARC++ representations from C++ programs and performs typestate checking to detect bugs that are specified as typestate automata over ARC++ representations.
A more complete understanding of the present disclosure may be realized by reference to the accompanying drawings in which:
The following discussion and attached Appendix merely illustrate the principles of the disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the disclosure and are included within its spirit and scope.
Furthermore, all examples and conditional language recited herein are principally intended expressly to be only for pedagogical purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently-known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the diagrams herein represent conceptual views of illustrative structures embodying the principles of the invention.
In addition, it will be appreciated by those skilled in the art that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
In the claims hereof any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements which performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. Applicant thus regards any means which can provide those functionalities as equivalent as those shown herein. Finally, and unless otherwise explicitly specified herein, the drawings are not drawn to scale.
By way of some additional background, we note that as contemporary software development has increased the need for higher levels of abstraction, software programming teams have significantly shifted toward object-oriented languages such as Java or C++. The benefits of using an object-oriented language are well known and include, among others, maintainability, encapsulation, and inheritance. Despite the use of such languages, however, it is nevertheless becoming more difficult to test and debug software due to large code bases and increasing complexity.
Whereas a large volume of work on verification has focused on C programs or Java programs, there has been comparatively little work on the verification of C++ programs. C++ has a number of distinguishing features that make it difficult and, in some cases, impossible to use the verification techniques developed for other languages such as C and Java.
More particularly, C++ is deliberately chosen for a software project due to its ability to fully interact with legacy C-based systems, including system-level, C-based, application programming interfaces (APIs). Therefore, development in C++ necessitates a mixed programming style combining features of high-level object-oriented constructs and lower-level C-based code. Moreover, the semantics of inheritance, virtual-function dispatch, and exceptions are different from other object-oriented languages such as Java. Consequently, there is a need to develop methods for the automatic verification and testing targeted at C++ programs.
According to an aspect of the present disclosure, an algorithm is disclosed to find typical correctness, performance, and maintenance issues in C++ programs using bug patterns. As used herein, bug patterns are code idioms that are likely to be errors and that describe coding practices arising from a misunderstanding of the language semantics, or from simple and common mistakes. For example, absence of a copy constructor when the associated class has pointer fields is typically a bug. Similarly, dereferencing a Standard Template Library (STL) iterator without checking that it is pointing within iterator bounds is most probably a bug. To find such bugs, our disclosure presents a framework for developers to specify bug patterns and further discloses a static analysis method to automatically detect the presence of such bug patterns in a software program.
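By way of illustration only, the iterator bug pattern noted above may be sketched as follows. The function name `first_above` is hypothetical and is not part of the disclosure; the sketch merely shows the bounds check whose absence the pattern is intended to flag.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns a pointer to the first element greater than
// `limit`, or nullptr when no such element exists.
const int* first_above(const std::vector<int>& values, int limit) {
    auto it = std::find_if(values.begin(), values.end(),
                           [limit](int v) { return v > limit; });
    // Bug pattern: dereferencing `it` here without the check below would
    // dereference end() whenever no element matches (undefined behavior).
    if (it == values.end()) {
        return nullptr;
    }
    assert(*it > limit);  // safe: `it` is known to be within bounds here
    return &*it;
}
```

A caller is expected to test the returned pointer against `nullptr` before use, which mirrors the end-of-range check on the iterator itself.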
As may be readily appreciated by those skilled in the art, one of the peculiarities of C++ semantics is related to the lifetime(s) of temporary objects. More particularly, in C++, temporary objects are often created by a compiler and cause performance and correctness issues that are hard to find and understand.
As is generally understood, temporary objects are unnamed objects created on a stack by the compiler. They are used during reference initialization and during evaluation of expressions including standard type conversions, argument passing, function returns, and evaluation of the throw expression. Performance bottlenecks can arise through the necessary creation and destruction of such temporary objects. Correctness issues can arise due to the complex lifetime semantics of temporary objects often leading to accesses of previously freed/destructed memory.
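As a minimal sketch of the lifetime semantics described above, the following example counts live objects to show that a temporary is destroyed at the end of the full expression that created it. The type `Tracer` and the functions `make_tracer`, `alive_after_temporary` and `alive_with_named_object` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical tracer type: counts how many instances are currently alive.
struct Tracer {
    static int alive;
    std::string text;
    explicit Tracer(std::string t) : text(std::move(t)) { ++alive; }
    Tracer(const Tracer& other) : text(other.text) { ++alive; }
    ~Tracer() { --alive; }
    const char* c_str() const { return text.c_str(); }
};
int Tracer::alive = 0;

Tracer make_tracer() { return Tracer("temporary"); }

// Buggy idiom: the pointer outlives the temporary it points into.
int alive_after_temporary() {
    const char* p = make_tracer().c_str();  // temporary dies at the ';'
    (void)p;               // p is dangling here and must never be dereferenced
    return Tracer::alive;  // no Tracer object is alive any more
}

// Safe idiom: a named object keeps the buffer alive while p is in use.
int alive_with_named_object() {
    Tracer named = make_tracer();
    const char* p = named.c_str();
    assert(p != nullptr);
    return Tracer::alive;  // `named` is still alive at this point
}
```

In this sketch `alive_after_temporary()` returns 0 and `alive_with_named_object()` returns 1, which makes the difference between the two idioms observable without ever touching the stale pointer.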
The use of mixed C and C++ programming (programs comprising both C and C++ code) links the lifetimes of objects in complex ways. For example, consider a class that has a method ‘foo’ that returns an internal buffer and another method ‘bar’ that possibly reallocates the same internal buffer. Incorrect interactions of ‘foo’ and ‘bar’ can result in use-after-free errors.
As we shall disclose, our pattern-based bug detection framework can advantageously detect even complex bugs involving lifetimes of objects using simple patterns. As noted above, temporary objects have an impact both on correctness and performance, and mixed C and C++ programming links object lifetimes in complex ways.
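The ‘foo’ and ‘bar’ scenario may be sketched as follows. The class `Message` and the function `safe_use` are hypothetical; the use of `malloc` and `free` is merely an assumption chosen to reflect the mixed C and C++ style discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical class mixing C-style allocation with a C++ interface.
class Message {
public:
    explicit Message(const char* s) {
        len_ = std::strlen(s);
        buf_ = static_cast<char*>(std::malloc(len_ + 1));
        assert(buf_ != nullptr);
        std::memcpy(buf_, s, len_ + 1);
    }
    Message(const Message&) = delete;
    Message& operator=(const Message&) = delete;
    ~Message() { std::free(buf_); }

    // 'foo': exposes the internal buffer.
    const char* foo() const { return buf_; }

    // 'bar': appends text by replacing the internal buffer, so every pointer
    // previously obtained from foo() becomes stale.
    void bar(const char* extra) {
        std::size_t extra_len = std::strlen(extra);
        char* grown = static_cast<char*>(std::malloc(len_ + extra_len + 1));
        assert(grown != nullptr);
        std::memcpy(grown, buf_, len_);
        std::memcpy(grown + len_, extra, extra_len + 1);
        std::free(buf_);
        buf_ = grown;
        len_ += extra_len;
    }

private:
    std::size_t len_ = 0;
    char* buf_ = nullptr;
};

// Correct interaction: re-fetch the buffer after every call to bar().
bool safe_use() {
    Message m("abc");
    const char* p = m.foo();
    bool before = std::strcmp(p, "abc") == 0;
    m.bar("def");  // p is stale from here on; reading it would be a bug
    p = m.foo();   // refresh instead of reusing the old pointer
    return before && std::strcmp(p, "abcdef") == 0;
}
```

Reading through the old pointer after the call to `bar` would be the use-after-free error described above; the sketch deliberately avoids doing so.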
Generally, the correctness issues related to object lifetimes are hidden during testing due, in part, to the fact that stale uses of object storage often occur shortly after destruction of the object. Nevertheless, in an actual deployed production environment such short-term stale uses cause hard-to-find runtime errors and memory corruption, leading to memory faults in the future. Furthermore, such memory corruption can also potentially be exploited by malicious users.
According to the present disclosure, a bug pattern is provided as a finite state machine (FSM) with a designated error state that is only reachable in the FSM for buggy code patterns. To make it easy to specify bug patterns, we annotate the given program with several high-level notions such as ObjectCreation, ObjectDestruction, etc. We refer to these abstractions or annotations as ARC++.
For the given bug pattern, we perform a call-summary-based static analysis that computes the set of reachable FSM states for each point in the program. The static analysis proceeds in several stages, described in turn below.
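A bug pattern of this kind may be sketched, under stated assumptions, as a small state machine over annotation events. Only ObjectCreation and ObjectDestruction are named in the disclosure; the `ObjectUse` event, the state names and the transition table below are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <vector>

// ObjectUse is an assumed event name; the other two appear in the text.
enum class Event { ObjectCreation, ObjectUse, ObjectDestruction };
enum class State { Unborn, Alive, Dead, Error };

// One transition of a use-after-destruction pattern.
State step(State s, Event e) {
    switch (s) {
        case State::Unborn:
            return e == Event::ObjectCreation ? State::Alive : State::Error;
        case State::Alive:
            if (e == Event::ObjectUse) return State::Alive;
            if (e == Event::ObjectDestruction) return State::Dead;
            return State::Error;  // second creation of the same object
        case State::Dead:
            return State::Error;  // any event after destruction
        case State::Error:
            return State::Error;  // the error state is absorbing
    }
    assert(false && "unhandled state");
    return State::Error;
}

// Runs the automaton over one event trace and reports whether the
// designated error state is reached.
bool reaches_error(const std::vector<Event>& trace) {
    State s = State::Unborn;
    for (Event e : trace) {
        s = step(s, e);
    }
    return s == State::Error;
}
```

A trace of creation, use and destruction stays out of the error state, whereas a use that follows destruction drives the automaton into it.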
First, we need a finite representation for the potentially infinite set of heap and stack objects during static analysis. To this end, we describe an object abstraction based on access paths in the program and a notion of object clusters. As used herein, access paths correspond to the data access expressions in a C++ program. An object cluster represents a set of concrete objects that are potentially aliased to each other.
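One possible way to group access paths into object clusters is sketched below. The use of a union-find structure and the names `ObjectClusters`, `may_alias` and `same_cluster` are assumptions made for illustration; the disclosure does not prescribe a particular data structure.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch: access paths (as strings) are merged into clusters
// whenever they may refer to the same concrete object.
class ObjectClusters {
public:
    // Records that two access paths are potentially aliased.
    void may_alias(const std::string& a, const std::string& b) {
        std::string ra = find(a);
        std::string rb = find(b);
        if (ra != rb) {
            parent_[ra] = rb;
        }
    }

    // True when both access paths fall into the same cluster.
    bool same_cluster(const std::string& a, const std::string& b) {
        return find(a) == find(b);
    }

private:
    // Returns the representative of the cluster containing `path`.
    std::string find(const std::string& path) {
        assert(!path.empty());
        auto it = parent_.find(path);
        if (it == parent_.end()) {
            parent_[path] = path;  // first sighting: singleton cluster
            return path;
        }
        if (it->second == path) {
            return path;
        }
        std::string root = find(it->second);
        parent_[path] = root;  // path compression
        return root;
    }

    std::map<std::string, std::string> parent_;
};
```

Because each cluster stands for every concrete object its paths may denote, a finite number of clusters can cover an unbounded number of run-time objects.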
In some cases, the bug pattern may involve objects of more than one type. In such cases, we have defined a dependency graph that links objects that are related by the bug pattern. That is, an edge in the dependency graph between objects o and p means that the state of one object is dependent on the state of the other. Based on this notion, we build method summaries that capture the behavior of methods and their side-effects on parameters and globals with respect to dependencies.
Subsequently, we perform a call-summary-based program analysis based on the object abstraction and dependency graph (if needed) and compute an over-approximation for the set of FSM states that are reachable at every program point. If the error state is reachable at any program point, that point is reported to the user.
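The relationship between a dependency graph and a method summary may be sketched as follows. The types `DependencyGraph` and `MethodSummary`, the treatment of edges as symmetric, and the restriction of the summary to parameter pairs are all simplifying assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;

// Hypothetical method summary: pairs of formal-parameter indices whose
// states become dependent on each other when the method executes.
struct MethodSummary {
    std::vector<std::pair<std::size_t, std::size_t>> dependent_params;
};

class DependencyGraph {
public:
    void add_edge(const std::string& o, const std::string& p) {
        edges_.insert(ordered(o, p));
    }

    bool dependent(const std::string& o, const std::string& p) const {
        return edges_.count(ordered(o, p)) != 0;
    }

    // Instantiates a callee summary at a call site by mapping each formal
    // parameter index to the actual argument passed at that position.
    void apply(const MethodSummary& summary,
               const std::vector<std::string>& actuals) {
        for (const auto& pr : summary.dependent_params) {
            assert(pr.first < actuals.size() && pr.second < actuals.size());
            add_edge(actuals[pr.first], actuals[pr.second]);
        }
    }

private:
    // Edges are stored in a canonical order so that lookups are symmetric.
    static Edge ordered(const std::string& a, const std::string& b) {
        return a < b ? Edge(a, b) : Edge(b, a);
    }

    std::set<Edge> edges_;
};
```

With such a summary, the analysis of a caller need not re-examine the body of the callee; it only replays the recorded dependencies on the actual arguments.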
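The computation of an over-approximation of reachable FSM states may be sketched as a worklist fixpoint. The sketch is intraprocedural only and omits call summaries, object abstraction and dependencies; the control-flow graph encoding, the bitmask representation of state sets and all names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

enum Event { None, Create, Use, Destroy };
// Each FSM state is one bit, so a set of states is a bitmask.
enum State : unsigned { Unborn = 1u, Alive = 2u, Dead = 4u, Error = 8u };

struct Node {
    Event event;                          // event at this program point
    std::vector<std::size_t> successors;  // control-flow successors
};

// Transition of a single FSM state under one event.
unsigned step(unsigned s, Event e) {
    if (e == None) return s;
    switch (s) {
        case Unborn: return e == Create ? Alive : Error;
        case Alive:  return e == Use ? Alive : (e == Destroy ? Dead : Error);
        default:     return Error;  // Dead or Error: any event is an error
    }
}

// Returns, for every node, the set of states reachable after that node.
std::vector<unsigned> reachable_states(const std::vector<Node>& cfg) {
    std::vector<unsigned> in(cfg.size(), 0u);
    std::vector<unsigned> out(cfg.size(), 0u);
    if (cfg.empty()) return out;
    in[0] = Unborn;  // node 0 is assumed to be the entry point
    std::deque<std::size_t> work{0};
    while (!work.empty()) {
        std::size_t n = work.front();
        work.pop_front();
        unsigned result = 0u;
        for (unsigned bit = Unborn; bit <= Error; bit <<= 1) {
            if (in[n] & bit) result |= step(bit, cfg[n].event);
        }
        out[n] = result;
        for (std::size_t succ : cfg[n].successors) {
            assert(succ < cfg.size());
            unsigned joined = in[succ] | result;  // join = set union
            if (joined != in[succ]) {
                in[succ] = joined;
                work.push_back(succ);
            }
        }
    }
    return out;
}
```

For a graph in which only one of two branches destroys the object before a common use, the union at the merge point contains both the alive and the destroyed state, so the error state is reported at the use even though one path is safe. This is the sense in which the result is an over-approximation.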
One particularly interesting aspect of the present disclosure is observed when the tracked dependency is related to the lifetime of objects. In such cases, if an operation on, modification of, or destruction of object o causes the lifetime of object p to expire, we introduce special lifetime dependency edges. Advantageously, these can be used to easily discover stale uses of objects after their lifetime has expired due to a modification of another object.
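Lifetime dependency edges may be sketched as follows. The class `LifetimeTracker`, its method names, and the choice to propagate expiry transitively are assumptions made for illustration only.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical tracker of lifetime dependency edges between abstract objects.
class LifetimeTracker {
public:
    // Adds an edge: `dependent` remains valid only while `owner` is neither
    // modified nor destroyed (e.g. a pointer into an internal buffer).
    void depends_on(const std::string& dependent, const std::string& owner) {
        assert(dependent != owner);
        edges_[owner].insert(dependent);
        expired_.erase(dependent);  // a fresh binding is valid again
    }

    // Models a modification or destruction of `owner`.
    void invalidate(const std::string& owner) {
        auto it = edges_.find(owner);
        if (it == edges_.end()) return;
        std::set<std::string> dependents = it->second;  // copy before erase
        edges_.erase(it);
        for (const auto& d : dependents) {
            expired_.insert(d);
            invalidate(d);  // assumption: expiry propagates transitively
        }
    }

    // True when `object` is used after its lifetime has expired.
    bool is_stale_use(const std::string& object) const {
        return expired_.count(object) != 0;
    }

private:
    std::map<std::string, std::set<std::string>> edges_;
    std::set<std::string> expired_;
};
```

For instance, recording that a pointer `p` depends on a string `s` and then invalidating `s` marks any later use of `p` as stale, while `s` itself remains usable.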
According to the present disclosure, an abstract interpretation is performed.
The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description and the attached Appendix, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that those skilled in the art may implement various modifications without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.
Number | Date | Country
---|---|---
61803697 | Mar 2013 | US