The present invention relates to program analysis and in particular to a technique for extracting dependencies in a program written in an object-oriented language through dynamic analysis.
If the relation between inputs and outputs that can be known from the specifications of a computer system is inherently nondeterministic, the cause-effect relation between the inputs and outputs is often undetermined. For example, the order in which incoming inputs are processed in a multi-threaded system can vary nondeterministically depending on the execution state of the threads, and therefore an output cannot be uniquely determined from an input. This is a very common circumstance in programming models that use many threads in order to increase the degree of parallelism.
When a test is performed on such a system, an assertion between an input and an output cannot be described because the output that corresponds to the input cannot be determined. Here, the term test refers in general to an act of verifying the consistency between a model and an implementation. A black-box test is a method for testing a system in which only the specifications of the system are used as knowledge in the model. An input is given to the system under test to obtain an output, and verification is conducted as to whether the input and output meet the specifications. Each test is called a test case, which consists of a set of inputs and outputs.
The assertion describes a condition that relates an input to the corresponding output. Consider a system that computes the largest integer (M) that does not exceed the n-th root of a given real number (d). If the system receives one real number and one integer (n) as inputs, and outputs one real number, then the assertion will be the following conditional expression:
((M^n <= d) && (((M+1)^n) > d))
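As a minimal sketch, the conditional expression above can be written as an executable predicate. The function name `root_assertion_holds` is hypothetical, and the sketch assumes d >= 0, n >= 1, and an integer-valued M.

```python
def root_assertion_holds(m: int, n: int, d: float) -> bool:
    """Check ((M^n <= d) && ((M+1)^n > d)) for output m and inputs n and d."""
    # m is the claimed largest integer not exceeding the n-th root of d.
    return m ** n <= d and (m + 1) ** n > d
```

For example, with n = 2 and d = 10.5, the output M = 3 satisfies the assertion because 9 <= 10.5 < 16, whereas M = 2 and M = 4 do not.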
Conventional techniques for performing a test in a case where system operations that meet specifications cannot be described deterministically include a method in which a message history is traced to verify the operations and a method in which cause-effect relations in data are traced during run time. If the method in which a message history is traced for verification is used, the input corresponding to an output may be identified from the history of messages, such as method calls between objects. A technique has been proposed that uses run-time tracing in run-time analysis of an object-oriented program to analyze cause-effect relations in parallel processing or cause-effect relations between operations. (Shreeram Salasrabudhe and Hector Munoz-Aliva, “Mining Cause-Effect Sequential Patterns from Action Traces”, [online], August 2003 (searched on Dec. 1, 2004) Internet URL: http://www.Lehigh.edu/˜sas4/html/ShereeramMunoz-ICML2003.PDF)
Another technique has been proposed that enables queries about relations between objects or about changes at a certain point of time during run time.
(Raimondas Lencevicious, Urs Holzle and Ambuj K. Singh, “Dynamic Query-Based Debugging of Object-Oriented Programs”, Journal of Automated Software Engineering, Volume 10, Number 1, pp. 39-74, Kluwer, January 2003)
If the method in which cause-effect relations in data are traced during run time is used, the input corresponding to an output may be identified from a history of accesses to physical memory data. An efficient differential debugging technique has been proposed that traces cause-effect relations in data, extracts them as a graph, and detects errors solely from the isolated cause-effect chain, although it gives no consideration to objects.
(Andreas Zeller, “Isolating Cause-Effect Chains from Computer Programs”, Proc. of ACM SIGSOFT 10th International symposium on the Foundations of Software Engineering (FSE-10), pp. 1-10, November 2002)
Another technique has been proposed for displaying, as a graph, the results of analyzing traced data dependencies.
(Thomas Zimmermann and Andreas Zeller, “Visualizing Memory Graphs”, Proc. of the Dagstuhl Seminar 01211, “Software Visualization”, Lecture Notes in Computer Science (LNCS) 2269, pp. 191-204, Springer-Verlag, May 2001)
As described above, the techniques that trace a message history or cause-effect relations in data during run time can be used to perform a test in a case where system operations that meet specifications cannot be described deterministically. However, none of these testing methods can be applied to black-box tests, in which only the specifications are used as knowledge in the model.
To verify a system by tracing a message history, the results that internal messages actually produce on the system must be known beforehand. That is, the specifications of the internal messages must be known. This information cannot be obtained if a black-box test is to be performed. Moreover, in actual systems, some operations that are not saved in message histories, such as exceptions or direct manipulations of fields, can make changes to objects. Therefore, simply tracing the message history does not provide sufficient cause-effect relations between inputs and outputs for testing.
The conventional technique disclosed in (Raimondas Lencevicious, Urs Holzle and Ambuj K. Singh, “Dynamic Query-Based Debugging of Object-Oriented Programs”, Journal of Automated Software Engineering, Volume 10, Number 1, pp. 39-74, Kluwer, January 2003) is a sort of interactive debugger that enables queries about relations between objects or about changes made to objects and, in theory, may be capable of providing the information required for cause-effect analysis over a certain period of time. However, the technique is not realistic because, in order to record the history of the entire system, execution must be stopped on a change-by-change basis to run a query about each change and log it.
If the technique that traces cause-effect relations in data during run time is applied to an object-oriented language such as Java® (Java® is a trademark of Sun Microsystems, Incorporated in the United States) that provides GC (Garbage Collection), the relation between an object, which is moved as a unit, and physical memory data must be externally provided. This is because, in a program written in an object-oriented language that provides GC, the addresses that hold values change during run time. Furthermore, if assertions can be examined in a test, it is essential for debugging to know, in the analysis after the test, why an assertion failed. However, in cause-effect relation analysis based on data, the lack of object information makes it difficult to determine the cause of a failure from the cause-effect relation obtained.
The conventional techniques disclosed in the third and fourth references cited above (Zeller; Zimmermann and Zeller) do not incorporate the metadata that constitutes an object into a cause-effect relation and therefore lack information as to which data dependency corresponds to which object. Accordingly, the data cannot be directly connected with inputs in the cause-effect relation in an object-oriented environment unless an unrealistic method that specifies a particular memory address of the input is provided to the system under test.
Analysis of the cause-effect relation between fields using cause-effect relations in data requires metadata indicating how the data constitutes objects. That is, information as to which data corresponds to which field of an object is required. Data-based cause-effect analysis that simply uses a conventional technique cannot provide information about objects unless metadata as described above is provided. Without such metadata, operations that take objects into consideration will be difficult to perform in interactive debugging.
Although it is not a technique for testing or debugging, a tainted flag uses such cause-effect analysis during run time. This technique is used in languages such as Perl. A “tainted” state is dynamically stored in a variable, and the flag propagates as the information propagates. Accordingly, a variable can be checked to see whether it may contain information that can endanger security. However, this technique can be used only for cause-effect analysis of a piece of information that indicates a “tainted” state and is not a generally applicable technique.
The present invention has been made in light of the technical problems described above. An object of the present invention is to enable analysis of a cause-effect relation (dependency) between a state of an object and a state of another object in a program written in an object-oriented language. With this, a cause-effect relation between an input and an output can be analyzed on the level of states of objects and a black-box test can be performed on a system in which the correspondence between an input and an output is not uniquely determined from the specifications of the system.
To achieve the object, the present invention is implemented as an apparatus that performs a test on a program of interest and has the configuration described below. The apparatus includes a cause-effect relation extracting section that extracts a cause-effect relation between a state of an object and a state of another object by executing a program under test step by step and obtaining in each step a history of changes made to fields of objects and information about the fields that have caused the changes; and a testing section that analyzes the cause-effect relations to a given object to be examined, on the basis of the cause-effect relations between objects extracted by the cause-effect relation extracting section, to detect a causative object, and performs a test based on a test assertion that is set between the object to be examined and the causative object.
More specifically, the cause-effect relation extracting section includes a stepwise executing section that executes a program under test step by step; and an abstract interpreting section that extracts in each step a cause-effect relation between a state of an object and a state of another object by obtaining a history of changes made to fields of objects and information about the fields of objects that have caused the changes, on the basis of the result of the execution of the program under test by the stepwise executing section. The abstract interpreting section analyzes a portion in which a conditional branch based on the state of a given object affects the state of another object to obtain a cause-effect relation of an object governed by the conditional branch. Furthermore, the abstract interpreting section generates a directed graph based on the cause-effect relations between the objects, the directed graph being a graph in which each field of each object is represented by a node and an edge directed from a field that has caused a change to the changed field is provided.
The testing section can record, prior to the execution of the test, the correspondence between each individual input in the test and the object actually generated from that input, set an assertion between an output object and the input object that has caused it, and perform the test. The cause-effect relation extracting section can perform the execution of the program under test and the extraction of cause-effect relations between objects twice, storing the extracted cause-effect relations between objects at the first performance and, at the second performance, producing a copy of the objects specified by the testing section in addition to the cause-effect relations between the objects.
The apparatus further includes a graph output section for generating and outputting a graph based on the cause-effect relations between objects extracted by the cause-effect relation extracting section, in which cause-effect relations to an object are indicated together by expressing a history of changes made to a field of the object during the lifetime of the object as a life line and by linking other objects that have caused the changes of the field to the given object by arrows to describe the cause-effect relations.
The present invention, which achieves the object, is also implemented as the following method for analyzing a program under test by a computer. The method includes a first step of, by the computer, executing the program under test step by step to obtain the result of the execution and saving the result of the execution; a second step of, by the computer, reading the saved execution result, obtaining in each step a history of changes made to fields of objects and information about the fields of objects that have caused the changes, extracting cause-effect relations between objects, and saving the cause-effect relations; and a third step of, by the computer, reading the saved cause-effect relations between the objects and generating, on the basis of the information including the cause-effect relations, a directed graph in which each field of each object is represented by a node and an edge is provided from a node that has caused a change to a changed node. The method may further include the steps of: by the computer, analyzing a cause-effect relation to a given object to be examined, on the basis of the cause-effect relations between the objects extracted at the second step, to detect a causative object; and by the computer, performing a test on the basis of a test assertion set between the object to be examined and the causative object. Moreover, the method may further include the step of, by the computer, generating and outputting a graph based on the cause-effect relations extracted at the second step, in which cause-effect relations to a given object are indicated together by expressing a history of changes made to a field of the object during the lifetime of the object as a life line and by linking other objects that have caused the changes of the field to the object by arrows to describe the cause-effect relations.
The step of outputting the graph in which the cause-effect relations to a given object are indicated together progressively displays other objects and fields that have affected the given object, starting from the given object. The method may display the graph if the program fails the test or if the program passes the test.
The present invention is also implemented as a program that controls a computer to function as the apparatus described above, or a program that causes a computer to perform a process equivalent to each step of the program analyzing method described above. The program can be delivered stored on a magnetic disk, an optical disk, a semiconductor memory, or another recording medium, or can be distributed over a network.
According to the present invention configured as described above, information is collected while executing a program under analysis step by step to obtain a history of changes made to fields of each object and information about the causes of the changes, thereby obtaining a cause-effect relation between a state of an object and a state of another object. By considering these cause-effect relations, the cause-effect relations between inputs and outputs can be analyzed at the level of object states in a test of a program written in an object-oriented language. Because the relations between inputs and outputs can be known even on a system in which they cannot be uniquely determined from its specifications, a black-box test can be performed even on such a system.
The present invention traces operations of systems during execution of a program to analyze cause-effect relations of changes of objects. In particular, a directed graph indicating a cause-effect relation as to which object's state generated or changed an object's state (the directed graph is hereinafter referred to as the dependency graph) is generated. Then, the system (program) of interest is tested by analyzing the dependency graph to obtain the input used for an output and examining a condition between the input and the output. In this way, a black-box test can be conducted on the system (program) without altering the system (program). Moreover, an edited display version of the dependency graph is presented to a user to allow the user to efficiently analyze as to which object's state generates the state of an object of interest (for example an incorrect object) or the state of an object that appears in an assertion, and to debug the system.
It should be noted that the hardware configuration of the computer implementing the present embodiment shown in
The cause-effect relation extracting section 10 of the testing apparatus 100 shown in
The cause-effect relation extracting section 10 includes a stepwise executing section 11 and an abstract interpreting section 12 as shown in
A cause-effect relation between the state of an object and the state of another object may be established by a conditional branch. An example of such a code is shown in
How the cause-effect relation extracting section 10 generates a dependency graph will be detailed below. The concept of a slot will be defined first. A slot is an area (storage location) represented by a local variable or a stack variable (both of which are thread-context-dependent), or by a field of an object (a class field or an instance field). Using slots, a dependency can be defined as follows:
The left side of this definition indicates the depended-on slots and the right side represents the depending slot. That is, the expression consists of n depended-on slots and one depending slot. This means that values are set from the n slots on the left side into the slot on the right side. An object can be identified by a unique object ID provided by a debugging API or a unique ID assigned by the abstract interpreting section 12 itself in generating the object.
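The notion of a dependency between slots described above can be modeled, as one possible sketch, with two small record types. The type names `Slot` and `Dependency`, the field names, and the string format of the IDs are all hypothetical; they are not taken from the described implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Slot:
    kind: str   # "field", "local", or "stack"
    owner: str  # object ID for a field; a frame or thread label otherwise
    name: str   # field name, local-variable index, or stack position


@dataclass(frozen=True)
class Dependency:
    depended_on: Tuple[Slot, ...]  # the n slots whose values were used
    depending: Slot                # the one slot into which the value was set


# Example: the field "total" of object obj#2 was set from two fields of obj#1.
dep = Dependency(
    depended_on=(Slot("field", "obj#1", "a"), Slot("field", "obj#1", "b")),
    depending=Slot("field", "obj#2", "total"),
)
```

Making both records immutable lets a slot serve as a dictionary key when the dependency list is later turned into a graph.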
In the cause-effect relation extracting section 10, the stepwise executing section 11 executes a program under test stepwise and the abstract interpreting section 12 receives events from the stepwise executing section 11 to obtain information as to which instruction in the program under test was executed. Then, the abstract interpreting section 12 traces the depended-on slots of stack variables according to the information. It also traces the depended-on slots of local variables and fields of objects. Once the depended-on slots of a particular slot have been set in another slot, a cause-effect relation is established between the two slots.
The abstract interpreting section 12 obtains information about cause-effect relations between slots in this way, generates a list of history of changes of slots (dependency list), and stores the list in the main memory 103 or a cache memory of the CPU 101 shown in
Referring to
The abstract interpreting section 12 further generates a graph (dependency graph) representing dependencies between slots. FIG. 6 shows a dependency graph generated from the dependency list shown in
Then, the abstract interpreting section 12 extracts the nodes corresponding to the depended-on slots from the dictionary. If the abstract interpreting section 12 cannot find a node corresponding to a depended-on slot, it creates a new node for that depended-on slot. This situation occurs in the case where only certain classes are traced or only a finite number of the most recent pieces of trace information are held in order to reduce memory usage or execution time. The label of that depended-on slot in the dependency list is stored in the new node (step 704).
The abstract interpreting section 12 then provides edges directed from the depended-on nodes to the depending node to link them (step 705). Then, it registers the node corresponding to the depending slot in the dictionary (step 706).
Then, the abstract interpreting section 12 returns to step 702 and searches the dependency list for additional dependencies to be dealt with. If there are additional dependencies, the abstract interpreting section 12 repeats the operations from step 703 to step 706. After processing all dependencies contained in the dependency list, the abstract interpreting section 12 removes the nodes representing stack variables or local variables from the dependency graph generated through the process. The dependencies originally linked through those nodes are replaced with direct edges, and then the process ends (step 707).
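The graph-generation loop described above can be sketched as follows. This is an illustrative reading and not the described implementation: the tuple layout of a slot, the function name `build_dependency_graph`, and the assumption that step 703 creates the node for the depending slot are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Slot = Tuple[str, str, str]           # (kind, owner, name); kind: "field", "local", "stack"
Dependency = Tuple[List[Slot], Slot]  # (depended-on slots, depending slot)


def build_dependency_graph(dependency_list: List[Dependency]):
    """Return (slot_of, edges): slot_of[i] is the slot of node i, and
    edges[i] is the set of field nodes that field node i depends on."""
    slot_of: List[Slot] = []      # node id -> slot label
    preds: List[Set[int]] = []    # node id -> ids of depended-on nodes
    latest: Dict[Slot, int] = {}  # the dictionary: slot -> its newest node

    def new_node(slot: Slot) -> int:
        slot_of.append(slot)
        preds.append(set())
        return len(slot_of) - 1

    for depended_on, depending in dependency_list:  # loop back to step 702
        node = new_node(depending)                  # assumed step 703
        for slot in depended_on:
            if slot not in latest:                  # untraced slot (step 704)
                latest[slot] = new_node(slot)
            preds[node].add(latest[slot])           # edge to depending node (step 705)
        latest[depending] = node                    # register in dictionary (step 706)

    def field_preds(node: int) -> Set[int]:
        # Step 707: look through stack/local nodes and keep only field nodes.
        found: Set[int] = set()
        for p in preds[node]:
            if slot_of[p][0] == "field":
                found.add(p)
            else:
                found |= field_preds(p)
        return found

    edges = {i: field_preds(i) for i, s in enumerate(slot_of) if s[0] == "field"}
    return slot_of, edges
```

Because a depending node is registered in the dictionary only after its edges are added, a statement that reads and writes the same field links the new node to the previous node of that field rather than to itself.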
Instead of storing the history of changes made to slots, a copy of objects or fields may be made and stored to provide the information required for generating a dependency graph. By storing a history of changes to slots or a copy of objects (or fields), the state of the objects at any point in time can be reproduced. The states of objects at all points in time may not always be required. In some cases, checking the state of objects during debugging or testing may be enough. For example, the state of an immutable object does not change, and therefore checking the last state of the object is enough. However, objects that are no longer referred to are usually deleted by garbage collection (GC), and therefore even objects in their last state may not be obtainable during the testing or debugging described later. The state of such objects that exist only temporarily during execution can be known at any time by recording them in an appropriate order or by protecting objects of interest from GC by using the ObjectReference.disableCollection() function of a debugging API (for example JDI (Java® Debugger Interface), which is a Java® debugging API). Furthermore, saving a copy of the objects relating to a change when recording a slot change history enables the state of the objects to be known in detail during debugging. Moreover, objects that exist only temporarily can be made available during testing or debugging by protecting the objects from GC as described above, even if a change history is not recorded or the objects are not copied.
Storing the states of all temporary objects or the history of all changes to slots requires a large amount of memory, which may be impractical. To solve the problem, cause-effect relations between states of objects may be extracted twice (the program under test is traced twice). In that case, only the IDs of objects and the cause-effect relations are recorded during the first trace, without storing the states of temporary objects or a history of changes to slots. The record of the cause-effect relations obtained during the first trace is used to solve the inverse problem for an object of interest (an upstream search is performed in the cause-effect relation graph for the objects that affect the state of a selected object, starting from the selected object). An object of interest may be an output in a test or an object to be debugged, for example. During the second trace, only the states of the temporary objects or the history of changes to the slots that were found by the search are stored. The execution pattern of the second trace must be the same as that of the first trace. Therefore, it is preferable to use an execution environment, such as deJaVu or ConTest, in which a program can be re-executed.
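The upstream search used between the two traces can be sketched as a backward breadth-first traversal. The function name `upstream`, the `"objectID.field"` node labels, and the sample graph are hypothetical and serve only to illustrate the idea.

```python
from collections import deque
from typing import Dict, Set


def upstream(edges: Dict[str, Set[str]], start: str) -> Set[str]:
    """Return every node that directly or indirectly affects `start`.

    edges[x] holds the nodes that x depends on (its causes)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for cause in edges.get(node, ()):
            if cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return seen


# Cause-effect relations recorded during a first trace (illustrative data).
first_trace = {
    "out.value": {"tmp.sum"},
    "tmp.sum": {"in1.v", "in2.v"},
    "log.count": {"in1.v"},
}
causes = upstream(first_trace, "out.value")
# Only the objects found by the search need their state stored in the second trace.
objects_to_record = {label.split(".")[0] for label in causes}
```

In this example the object `log` is never reached from `out.value`, so its state would not have to be stored during the second trace.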
A method for tracing data dependencies in the exemplary implementation will be described below with respect to an example in which a program under test written in Java® is analyzed and JDI is used as the debugging API that provides the functions of the stepwise executing section. In procedural object-oriented languages, including Java®, that have an exception handling mechanism, data dependencies are traced from the instructions described below.
The Java® Virtual Machine (VM) instruction set is defined in the Java® Virtual Machine Specification. The Java® Virtual Machine Specification defines, for each instruction, the meaning of the operation, the increase or decrease in the level of the operand stack, and the side effects. However, there are ambiguities in the types of fields to be defined or used and in the types of the parameters and return value of method calls. Their actual types are determined only by referring to the corresponding entries in the ConstantPool of the class that implements the methods. The Java® virtual machine has an architecture in which one slot consists of 32 bits and 64-bit long and double type operands use two contiguous slots. Accordingly, if the type of an operand is not known for the definition and the use of a stack variable, stack-level adjustments cannot be made properly.
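To illustrate why operand types matter for stack-level adjustment, the sketch below counts the slots occupied by the parameters of a method from its JVM method descriptor, counting long (J) and double (D) as two slots each. The function name `slot_count` is hypothetical, and the sketch assumes that a well-formed descriptor string has already been obtained by some other means; it does not count the receiver of an instance method.

```python
def slot_count(descriptor: str) -> int:
    """Count the slots used by the parameters in a JVM method descriptor,
    for example "(IJ[DLjava/lang/String;)V"."""
    params = descriptor[1:descriptor.index(")")]
    slots = 0
    i = 0
    while i < len(params):
        c = params[i]
        if c == "[":                      # array reference: one slot
            while params[i] == "[":
                i += 1
            if params[i] == "L":          # array of objects: skip the class name
                i = params.index(";", i)
            slots += 1
        elif c == "L":                    # object reference: one slot
            i = params.index(";", i)
            slots += 1
        elif c in "JD":                   # long or double: two contiguous slots
            slots += 2
        else:                             # any other primitive type: one slot
            slots += 1
        i += 1
    return slots
```

For the descriptor `(IJ[DLjava/lang/String;)V`, the parameters are an int, a long, an array of double, and a String reference, which occupy 1 + 2 + 1 + 1 = 5 slots.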
The present exemplary implementation uses the Java® Debugger Interface (JDI) as the debugging API. The JDI is a Java® API that is used by a debugger to control the execution of a program in a Java® virtual machine or to access its internal state, and it provides basic functions for implementing a debugger, such as stepwise execution and breakpoints. In the present exemplary implementation, the JDI is used to execute a system under test stepwise, and the instruction string obtained is traced by focusing only on data dependencies. To trace data dependencies of stack variables, the size of each operand must be known in order to make stack-level adjustments. However, the JDI does not provide means for accessing the ConstantPools of classes. For simplicity, specialized events that provide more information may be used in addition to step events for field definitions and uses, method calls, and returns, which contain ambiguities. In practice, the ambiguity of the type of stack variables can be eliminated in most cases by performing static analysis of methods, because Java®-based systems ensure the uniqueness of the type of a stack variable at any reachable position in a method.
Event tracing by the abstract interpreting section 12 is performed on the entire system and on each of the instructions that trace the data dependencies given above (intra-frame, inter-frame, intra-thread (object) data dependencies), as described below.
On the Entire System
The abstract interpreting section 12 sets a ClassPrepareEventRequest for all classes (filter settable) in the system under test during the startup of the system under test. Then, the abstract interpreting section 12 sets ExceptionEventRequest, MethodEntryEventRequest, and MethodExitEventRequest for all threads (filter settable) and all classes (filter settable). It also sets a ThreadStartEventRequest for all threads (filter settable) in the system under test.
When a class is loaded, the abstract interpreting section 12 sets ModificationWatchpointEventRequest and AccessWatchpointEventRequest for all fields (filter settable) of the class in processing ClassPrepareEvent.
When a thread is started, the abstract interpreting section 12 creates a corresponding thread for an abstract interpreting machine in handling the ThreadStartEvent. It also sets a StepEventRequest for the methods of all classes (filter settable) executed by the thread.
Intra-Frame Data Dependency
For both the definition and use of stack variables and the definition and use of local variables, the abstract interpreting section 12 records, in the corresponding slots, the union of the data dependencies of the variables referenced by an instruction as the data dependency of the variables defined by that instruction.
Inter-Frame Data Dependency
When MethodEntryEvent is processed, the abstract interpreting section 12 creates a new frame in a corresponding thread of the abstract interpreting machine for actual parameters in the method call. Then it copies the data dependency of local variables corresponding to the actual parameters of the method from the stack top of the previous frame.
When MethodExitEvent is processed, the abstract interpreting section 12 records in the current frame the data dependency at the stack top as the data dependency of the return value from the method. When the StepEvent of the next instruction in the same thread is processed, the abstract interpreting section 12 pops one frame, pops as many entries from the stack as the number of actual parameters of the returned method, and pushes the data dependency of the return value of the returned method as the data dependency at the stack top.
For exception occurrence and catch, the abstract interpreting section 12 records the data dependency at the stack top in the current frame as the data dependency of the exception object. When StepEvent of the next instruction in the same thread is processed, the abstract interpreting section 12 pops the same number of frames as the difference between the frame in which the exception has occurred and the current frame, pops all entries in the stack, and pushes the data dependency of the exception object as the data dependency at the stack top.
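The frame handling described above for method entry, method exit, and exceptions can be sketched with a small per-thread model. The class and method names are hypothetical, the driver is assumed to call the `on_next_step_*` methods itself when the next step event arrives, and the receiver of an instance method is assumed to be counted among the actual parameters.

```python
from typing import FrozenSet, List, Optional

Deps = FrozenSet[str]  # set of depended-on slot labels (hypothetical format)


class Frame:
    def __init__(self, n_params: int, local_deps: List[Deps]):
        self.n_params = n_params
        self.locals = local_deps              # dependencies of local variables
        self.stack: List[Deps] = []           # dependencies of stack variables
        self.pending: Optional[Deps] = None   # return value or exception object


class AbstractThread:
    def __init__(self) -> None:
        self.frames: List[Frame] = [Frame(0, [])]

    def on_method_entry(self, n_params: int) -> None:
        # Copy the dependencies of the actual parameters from the caller's stack top.
        caller = self.frames[-1]
        args = caller.stack[len(caller.stack) - n_params:]
        self.frames.append(Frame(n_params, list(args)))

    def on_method_exit(self) -> None:
        # Record the stack-top dependency as that of the return value (if any).
        frame = self.frames[-1]
        frame.pending = frame.stack[-1] if frame.stack else None

    def on_next_step_after_exit(self) -> None:
        callee = self.frames.pop()
        caller = self.frames[-1]
        del caller.stack[len(caller.stack) - callee.n_params:]  # pop the arguments
        if callee.pending is not None:
            caller.stack.append(callee.pending)

    def on_exception(self) -> None:
        # Record the stack-top dependency as that of the exception object.
        frame = self.frames[-1]
        frame.pending = frame.stack[-1]

    def on_next_step_after_throw(self, handler_index: int) -> None:
        exc = self.frames[-1].pending
        del self.frames[handler_index + 1:]   # unwind to the catching frame
        handler = self.frames[-1]
        handler.stack.clear()                 # pop all entries in the stack
        handler.stack.append(exc)
```

A real implementation would need to decide from the event stream which of the two `on_next_step_*` paths applies; here that decision is left to the caller to keep the sketch short.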
Intra-Thread (Object) Data Dependency
For field variable definition and use, the abstract interpreting section 12 pops the data dependency at the stack top and sets it as the data dependency of the field variable of the object when ModificationWatchpointEvent is processed. When AccessWatchpointEvent is processed, the abstract interpreting section 12 pushes the data dependency of the field variable of the object as the data dependency at the stack top.
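The handling of field modification and access events can be sketched as follows. The class name `FieldTracker` and the label format are hypothetical. The sketch also assumes one particular reading of the text: on an access, the field slot itself becomes the depended-on slot of the new stack entry.

```python
from typing import Dict, FrozenSet, List, Tuple

Deps = FrozenSet[str]        # set of depended-on slot labels
FieldKey = Tuple[str, str]   # (object ID, field name)


class FieldTracker:
    def __init__(self) -> None:
        self.stack: List[Deps] = []                # dependencies of stack variables
        self.field_deps: Dict[FieldKey, Deps] = {}
        self.dependency_list: List[Tuple[List[str], str]] = []

    def on_access(self, obj_id: str, field: str) -> None:
        # A field value was read: the new stack top depends on that field.
        self.stack.append(frozenset({f"{obj_id}.{field}"}))

    def on_modification(self, obj_id: str, field: str) -> None:
        # A field value was written: it inherits the dependency at the stack top.
        deps = self.stack.pop()
        self.field_deps[(obj_id, field)] = deps
        self.dependency_list.append((sorted(deps), f"{obj_id}.{field}"))
```

For example, after two accesses to `obj#1.a` and `obj#1.b`, an arithmetic instruction would replace the two stack entries with their union, and a subsequent modification of `obj#2.total` would record that this field depends on both of the fields that were read.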
The implementation described above is one example of an implementation for a program under test written in Java®. In practice, various other implementations are possible according to the operating environment of the system in which the present embodiment is implemented, the language used (Java® or another object-oriented language), and the like.
In this way, a dependency graph is generated in which cause-effect relations between states of objects are reflected. The dependency graph generated is stored in a storage device such as the main memory 103 shown in
Data dependencies between objects that are not involved in the computation of outputs are not necessarily required for testing and debugging. Therefore, the instructions that operate on those objects can be excluded from stepwise execution. This prevents the stepwise executing section 11 from generating unnecessary events and eliminates the cost for the abstract interpreting section 12 to receive and ignore the events. Threads in a system under test that are known not to affect the execution of the system under test can also be excluded from stepwise execution. Examples of such threads include system threads such as a finalizer and a reference handler, and threads of the debugging API itself in a program executed on a Java® virtual machine.
According to the present embodiment, a dependency graph obtained as described above can be used to analyze cause-effect relations between inputs and outputs during testing a program under test or investigating the cause of failure during debugging. The testing section 20 may be implemented by the CPU 101 and the main memory 103 shown in
During testing, an input object can be changed or deleted. To cope with this, the method of tracing temporary objects described earlier can be used. However, if performing a test is the sole purpose, a more efficient method can be used. That is, before performing the test, the correspondence between each input in the test and the object generated from that input may be established. To examine the assertion, the input object (which may no longer be alive or may have been changed) that corresponds to an output object is obtained through cause-effect relation analysis using the dependency graph. Then the input in the test is obtained from the input object, and the assertion between the input and the output can be examined.
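The test procedure described above can be sketched as a single function. The name `check_output`, its parameters, and the sample data are hypothetical, and the cause-effect analysis itself is abstracted as a callable `causes_of` that returns the IDs of the objects affecting a given object.

```python
from typing import Callable, Dict, Iterable


def check_output(output_id: str,
                 output_value: object,
                 input_of_object: Dict[str, object],
                 causes_of: Callable[[str], Iterable[str]],
                 assertion: Callable[[object, object], bool]) -> bool:
    """Evaluate the assertion against every test input whose object caused the output."""
    # input_of_object was recorded before the test: object ID -> original test input.
    inputs = [input_of_object[obj] for obj in causes_of(output_id)
              if obj in input_of_object]
    # This sketch treats an output with no identifiable input as a failure.
    return bool(inputs) and all(assertion(i, output_value) for i in inputs)


# Illustrative data: two inputs were given, and obj#20 was caused by obj#11.
recorded_inputs = {"obj#10": 4, "obj#11": 9}
causes = {"obj#20": ["obj#15", "obj#11"]}


def is_square_root(given: object, produced: object) -> bool:
    return produced * produced == given
```

Because the original input value is looked up from the recorded correspondence, the assertion can still be examined even if the input object has since been changed or collected.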
The graph output section 30 may be implemented by the CPU 101, the main memory 103, and the video card 104 shown in
FIGS. 8 to 11 show exemplary display graphs. The graphs in the examples shown in FIGS. 8 to 11 are graphical representations resembling a UML sequence diagram. In the examples in FIGS. 8 to 11, an output from a test performed by the testing section 20 is examined to see whether it is correct or not. First, searching for a cause-effect relation, starting at the Output object, shows that there is a relation in which an Average object outputted an average to the Output object (See
Searching for a cause-effect relation, starting from the Result objects, shows that the two Result objects were copied from a ResultSet object (See
If a program under test contains a loop, the loop section can at first be displayed compactly as a mark indicating that there is a loop in that section, in order to prevent the display graph from becoming complicated. When an action such as a mouse click is performed on the loop section, the loop section can be expanded and displayed.
Then the graph output section 30 extracts an unprocessed node associated with the object ID of an object to be analyzed from the dependency graph (step 1303). If there is an unprocessed node, the graph output section 30 refers to the dictionary to retrieve an array with the object ID and adds the node extracted at step 1303 to the retrieved array (steps 1304 and 1305). The node added to the array is marked as a processed node (step 1306).
The operations from step 1303 to step 1306 are repeated until no unprocessed node associated with the object ID of an object to be analyzed is left (step 1304: NO). The graph output section 30 then sorts the nodes stored in each of the arrays in the dictionary in the order of the numbers contained in the dependency list (step 1307).
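Steps 1303 to 1307 can be sketched as grouping the nodes of the dependency graph by object ID and then ordering each group. The node layout, the function name `group_by_object`, and the use of a plain dictionary of lists are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[int, str, str]  # (number in the dependency list, object ID, field name)


def group_by_object(nodes: List[Node], object_ids: List[str]) -> Dict[str, List[Node]]:
    """Collect the nodes of each analyzed object into one array, in trace order."""
    arrays = defaultdict(list)          # the dictionary of arrays, keyed by object ID
    wanted = set(object_ids)
    for node in nodes:                  # steps 1303 to 1306
        if node[1] in wanted:
            arrays[node[1]].append(node)
    for array in arrays.values():       # step 1307
        array.sort(key=lambda n: n[0])
    return dict(arrays)
```

Each resulting array then corresponds to one life line of the display graph, with the changes to the object listed in the order in which they occurred.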
Thus, a display graph in which the cause-effect relations of the same objects are displayed together is generated. The graph output section 30 generates the image data of the display graph through the use of the video card 104 and displays it on the display device. The graph output section 30 may generate and display the display graph if the program fails the test performed by the testing section 20 or if the program passes the test.
It is important to note that while aspects of the present invention have been described in the context of a fully functioning computer system, those of ordinary skill in the art will appreciate that processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and in a variety of other forms, and that the present invention applies equally regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media, such as a floppy disk, a hard disk drive, a RAM, CD-ROMs, and DVD-ROMs, and transmission-type media, such as digital and analog communications links and wired or wireless communications links using transmission forms such as, for example, radio frequency and light wave transmissions. The computer readable media may take the form of coded formats that are decoded for actual use in a particular data processing system.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Description of Symbols
Number | Date | Country | Kind |
---|---|---|---|
378560 | Dec 2004 | JP | national |