The present invention relates to the analysis of a computer program to illustrate the inner workings of the program. More particularly, the present invention relates to methods, systems, and computer program products for summarizing operational behavior of a computer program to produce output that is concise and easily understood.
Computer programs often comprise thousands of lines of code and hundreds of subroutines. As a program is developed, it becomes increasingly difficult to comprehend. As a result, software engineers use diagrams that help to explain the components that make up a program and how those components interact.
In object-oriented software design, there are two basic logical views of a system: a static view and a dynamic view. A static view answers the question “What?” For example, “What are the classes that are used by the program?” In contrast, a dynamic view answers the question “How?” For example, “How does the system work?” In object-oriented design, an example of a static view of a system is a class diagram. A class diagram illustrates the relationships between a set of classes. An example of a dynamic view is a sequence diagram. A sequence diagram illustrates how different objects (instances of classes) interact under a given scenario.
Software engineers use specialized design tools to aid in the documentation of the design of the systems they develop. There are many mature software design tools on the market. These tools vary in sophistication. Some tools are mere drawing tools that allow a user to graphically illustrate the design of a system using standard symbols that are part of the design methodology specification. Other design tools are more sophisticated because they allow the model and the source code to stay synchronized, a process known as round-trip engineering.
The current trend in software design tools is toward automating the often tedious work of drawing design diagrams and achieving close synchronization with the software source code as changes are made to the system. The market is saturated with software design tools that focus on generating the static view of a system, while not providing any assistance in generating the dynamic view. This is largely because it is quite simple to reverse engineer the static design of a system by analyzing the source code. Generating a dynamic view of the system is much more difficult because it requires analyzing a running program. Some products analyze the source code in an attempt to describe the program flow of a system. The main problem with this approach is that it is impossible to predict the exact behavior of a system without actually running the program. This is because the path of execution depends largely on conditions that are not known until a program is executed.
Tools that are used to analyze the behavior of a system are referred to as profilers. The main purpose of a profiler is to analyze a running program in order to isolate bottlenecks and to improve overall performance. Some diagramming tools utilize profilers in order to build sequence diagrams or call trees. These tools provide some insight into the behavior of a system, but they usually produce output that would consume reams of paper if printed. Typical output includes a listing of the results of execution of each statement, similar to output produced manually by inserting I/O statements to print variable values after each program statement. The reason for such voluminous output is that profilers lack the ability to summarize the execution flow in such a way that can be easily understood by a human.
Some profilers produce sequence diagrams. A problem with current profiling tools that produce sequence diagrams is that the generated documentation lacks critical detail about the execution flow. Two very important aspects of execution flow are execution looping and conditional execution. Current profiling products do not depict looping or conditional execution. However, most programs are composed mainly of execution loops and conditional execution statements. As a result, it is not possible to truly understand why a program is behaving a certain way using current profiling tool technology. A solution is required that can summarize program execution into a sequence diagram that illustrates the nature of the program flow. This can be accomplished by tracking looping and conditional execution and annotating the resulting sequence diagram. However, such tracking and annotating have only been performed manually by software engineers through analysis of thousands of lines of source code and program output.
In light of the problems with current program analysis tools, there exists a long-felt need for improved methods, systems, and computer program products for summarizing operational behavior of a computer program.
The present invention includes methods, systems, and computer program products for summarizing the execution flow of a computer program. According to one method, a computer program is executed in a mode that allows control over execution of the program. Execution of the program is paused at locations corresponding to instructions in the computer program. For each location, contents of a call stack containing function calls made by the program that have not yet returned are recorded. For each function in the call stack, conditions under which the function was called are recorded. The conditions may include the sequence of functions that resulted in the current function being called and whether the function is executing in a loop. In implementation, the contents of the call stack may be recorded in a data structure, referred to herein as a shadow stack. A new shadow stack instance may be created for each breakpoint location. A summarized call tree may be used to store relationships between calls for each instance of the shadow stack. Computer program output may be presented to the user in a summarized format, such as a sequence diagram. Post-processing of intermediate or machine code corresponding to the computer program may be performed to add notation to the sequence diagram that indicates guard conditions for loops and conditionally executed blocks of code.
Although the methods and systems described herein may produce a sequence diagram for display to the user, the present invention is not limited to producing sequence diagrams. Alternative diagrams that may be produced by the methods and systems described herein to illustrate operational behavior of a computer program include behavioral views of the system, such as collaboration diagrams, state diagrams, and object interaction diagrams.
One aspect of the invention may include analyzing the execution flow of a computer program by monitoring function/method calls made by a program. The analysis may include detecting when the program is in an execution loop or when a program has conditionally executed a block of code. This information is used to summarize the behavior of a program by expressing function call sequences with loop and conditional execution notation. One problem that is solved by the present invention is how to identify where the execution loop actually exists. One exemplary implementation described herein determines the origin of an execution loop by combining the use of the shadow stack, a local loop counter, and the summarized call tree.
In one exemplary implementation, the present invention includes a sequence analysis engine (SAE). The SAE utilizes debugger services in order to examine and record the execution flow of a computer program. By using debugger services, it is meant that the SAE uses services provided by a debugger application programming interface (API), such as the WINDOWS® debugger API. While most debuggers are used by computer programmers to isolate and fix bugs in computer programs, one exemplary implementation of the present invention includes an approach that automates the use of debugger services to control, examine, and record the execution of a computer program. The SAE utilizes common debugger services to inspect the state of an executing program, to set and clear debug breakpoints, to single step into/over computer instructions, to control the execution of threads and processes, and to access debug symbols associated with the target program.
An alternative approach to using debugger services involves using profiler services to track method entry and exit events. The disadvantage of this approach is that every function call incurs overhead because profilers track all method calls in an application. Using debugger services is more efficient because debug breakpoints are used to focus analysis only on those classes/methods that are of interest to the user.
In an implementation that utilizes profiler services, the SAE may use profiler call back functions for each method call in a program. Profilers typically require special instructions to be placed at the beginning of each method. These instructions are designed to interrupt the execution of the target program and allow a monitoring service to take control. In one exemplary implementation of the present invention, the SAE may be called by the profiler for each method call in a program being monitored. The SAE may then record the function call and inspect the target application memory, call stack, and registers to determine whether the function call is being conditionally executed or whether the function call is part of an execution loop.
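The callback-per-call arrangement described above can be sketched in Python, where the standard `sys.setprofile` hook stands in for the profiler call back functions. This is only an illustration under stated assumptions: the names `record_calls`, `entry_point`, `method_a`, and `method_b` are hypothetical, and the sketch records only the caller and callee of each call rather than inspecting memory or registers.

```python
import sys


def record_calls(target, *args):
    """Run target(*args) and return a list of (event, caller, callee) tuples."""
    events = []

    def on_event(frame, event, arg):
        # Record only entry to and exit from Python-level functions.
        if event not in ("call", "return"):
            return
        callee = frame.f_code.co_name
        caller = frame.f_back.f_code.co_name if frame.f_back else None
        events.append((event, caller, callee))

    sys.setprofile(on_event)      # the hook now fires for every function call
    try:
        target(*args)
    finally:
        sys.setprofile(None)      # always remove the hook again
    return events


# Hypothetical monitored program.
def method_b():
    return 1


def method_a():
    return method_b() + method_b()


def entry_point():
    return method_a()


if __name__ == "__main__":
    for event, caller, callee in record_calls(entry_point):
        print(event, caller, "->", callee)
```

Because the hook fires for every call, the monitoring code runs even for functions that are of no interest, which is the overhead noted above.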
Although using profiler services is one possible method for analyzing the operational behavior of a computer program, using profiler services is less efficient than using debugger services because using profiler services does not enable a user to selectively enable and disable monitoring of user-specified functions. A profiler requires that each function be analyzed. Requiring that each function be analyzed increases unnecessary processing in analyzing a computer program.
Another advantage to using debugger services over profiler services is that debugger services allow the sequence analysis engine to dynamically explore new interactions between functions using the single step debugger service combined with breakpoint service. The ability to dynamically explore new interactions between functions is referred to herein as auto-discovery mode. Auto-discovery mode is not possible using profiler services because profiler services do not allow single stepping services or breakpoint services.
A GUI application may be provided that allows a user to configure inputs for the SAE, to control the operation of the SAE (start, stop, pause), and to view and manipulate the outputs from the SAE. The SAE interacts with the debugger to control the target application and examine/record its execution using input provided via the GUI. In an alternate implementation, the GUI may be omitted. In such an implementation, the SAE may be executed by the user from a command prompt. With this approach, the user may be responsible for manually editing the SAE input file. The SAE may write the analysis results to a text file that can be viewed by the user.
The methods and systems for analyzing operational behavior of a computer program described herein may be used in a variety of software development and testing scenarios. One such scenario is referred to as extreme programming. Rapid application development methodologies, such as extreme programming, advocate developing computer programs by skipping formal analysis and design and jumping immediately into programming. This approach encourages programmers to constantly refactor their programs until the desired solution is obtained. Unfortunately, this approach leaves very few design artifacts for other software developers who later must maintain or add features to the program. A software tool, as described herein, automatically generates concise, easy-to-understand sequence diagrams that explain how certain aspects of the program function. This tool fits well into the extreme programming paradigm because it allows the software engineers to focus on developing the program while automatically producing up-to-date documentation for communicating the design of the system.
Another application of the methods and systems described herein is behavioral model verification. Large software firms, especially those producing software for government agencies, are required to follow formal design methodologies. During the detailed design phase of software development, these firms produce design documentation for both the static and dynamic aspects of the system. After the design has been reviewed and approved, the actual software development begins. Often, the software development is performed by an entirely different group of people. As a result, there is often a disconnect between the designer's intent and the actual implementation. The formal software development process requires model verification after the software implementation is complete. At this stage, source code reviews are held and a determination is made as to whether the program has been implemented according to the design. Reviewing source code can be useful in verifying the static design of a system; however, it does not address the dynamic aspects of the design. A tool, as described herein, may be used to perform model verification of the behavioral aspects of the design.
Yet another application of the methods and systems described herein is automatic generation of computer program behavioral documentation for use in maintaining legacy software. The software life-cycle of a computer program typically includes the following: requirements, analysis, design, implementation, testing, installation, operation, maintenance, and retirement. Quite often, software engineers involved in the early design of a system are not the same individuals who are responsible for maintaining the system. In fact, in many software projects, by the time a product reaches the maintenance phase of the software life-cycle, the original designers of the software are no longer available, having changed either projects or, in some cases, jobs. This presents a problem when a software maintenance engineer is attempting to fix a bug in a system that has outdated documentation. The maintenance engineer must spend countless hours poring over source code, debugging the application, or asking others for helpful insight. As described above, tools already exist that generate up-to-date documentation on the static design of a system. Unfortunately, static design documentation alone often cannot provide the maintenance engineer with enough information to isolate and fix program bugs. A tool that can automatically generate documentation that describes the behavior of a system would substantially reduce the cost of maintenance in this scenario. The maintenance engineer could run the tool under different conditions to generate behavioral diagrams that can be compared and used to isolate the problem. The methods and systems described herein provide such a tool.
The methods and systems described herein may include an approach for generating a dynamic view of a computer program that is concise and easily understood by the user. Such a tool has utility in rapid application development, formal software development, and in reducing the cost of maintaining legacy software systems.
Accordingly, it is an object of the invention to provide methods, systems, and computer program products for summarizing the operational behavior of a computer program.
It is another object of the invention to provide methods, systems, and computer program products for identifying loops and conditional execution in computer programs and displaying the loops and conditional execution to a user in a summarized format.
Some of the objects of the invention having been stated hereinabove, other objects will become evident as the description proceeds when taken in connection with the accompanying drawings as best described hereinbelow.
Preferred embodiments of the invention will now be explained with reference to the accompanying drawings of which:
The following terms and definitions are used in explaining details of embodiments of the invention:
API—stands for application programming interface, a defined set of functions and conventions through which software applications request services from an operating system or from other software.
breakpoint—a location in a program at which the debugger pauses execution of the program. Breakpoints aid in the testing and debugging of programs.
C++—an object-oriented programming language based on the C language.
call or function call—an expression that moves the path of execution from the current function to a specified function and evaluates to the return value provided by the called function.
call stack—the list of procedures and functions currently active in a program.
call tree—a data structure used to record a computer program's function call sequence. Also, a display that documents function usage hierarchy.
class—one of the key concepts in object-oriented programming, a class is the most general kind of user-defined type, defining both the state information used by objects of the class (data members) and their behavior (member functions). Classes may be related to one another via inheritance relationships, where base classes define portions of the interface and/or implementation of derived classes.
class diagram—a diagram that shows a collection of declarative (static) model elements, such as classes, types, and their contents and relationships.
collaboration diagram—a diagram that shows object interactions organized around the objects and their links to each other. Unlike a sequence diagram, a collaboration diagram shows the relationships among the objects. Sequence diagrams and collaboration diagrams express similar information, but show it in different ways.
conditional statement—in a programming language, a statement (for example, the if statement) that evaluates one or more variables or conditions and uses the result to choose one of several possible paths through the subsequent code.
debug symbols—information used by a debugger to link machine instructions to higher level source code and to display variable names and type information.
debugger—a software tool that is used to detect the source of program or script errors by performing step-by-step execution of application code and viewing the content of code variables.
disassemble—to transform machine code into assembly language.
function—a specialized group of statements used to encapsulate general or program-specific tasks.
guard condition—a condition that must be satisfied in order to proceed executing a block of code.
GUI—an acronym for graphical user interface. This term refers to a software front-end meant to provide an attractive and easy-to-use interface between a computer user and application.
intermediate language—in computer programming, a target language into which all or part of a single statement or a source program in a source language is translated before it is further translated or interpreted.
loop—a set of instructions executed repeatedly as long as some condition is met.
machine language—a binary language (using only 0s and 1s); the only programming language the computer understands. All programs written in higher-level languages must be translated into machine language before they can be executed.
method—an operation defined for an object, implemented as a procedure or function in a programming language.
object—in object-oriented design or programming, a concrete realization of a class that consists of data and the operations associated with that data.
object interaction diagram—a diagram that shows the dynamic message-passing relationship between objects, including which object owns the data being passed and which object owns the service being called.
process—a process is a single executable module that runs concurrently with other executable modules.
sequence diagram—a diagram that shows object interactions arranged in time sequence. In particular, it shows the objects participating in the interaction and the sequence of messages exchanged. Unlike a collaboration diagram, a sequence diagram includes time sequences but does not include object relationships.
shadow stack—a data structure that reflects the current contents of the program's call stack.
single step—a debugger command that allows an application program to execute one line of the program, which can either be a single assembly language instruction or a single high level language instruction. There are typically two distinct single step commands—one that will single step “into” subroutine calls and one that will step “over” them.
source code—the readable form of code created by a programmer in a high-level programming language. Source code is converted to machine-language object code by a compiler or interpreter.
state diagram—a model of the states of an object and the events that cause the object to change from one state to another.
thread—the basic unit of program execution. A process can have several threads running concurrently. Each thread can be performing a different job, such as waiting for events or performing a time-consuming task that the program does not need to complete before the program continues. Generally, when a thread has finished performing its task, the thread is suspended or destroyed.
UML—stands for unified modeling language. UML is a standard notation and modeling technique for analyzing real-world objects, developing systems, and designing software modules using an object-oriented approach. UML has been adopted as a standard by the Object Management Group (OMG), a group that creates standard architectures for object technology.
The present invention may include a system that can automatically generate a diagram that illustrates the operational behavior of a software program. In one exemplary implementation, a system for summarizing the operational behavior of a computer program may automatically generate a sequence diagram including notation for illustrating conditional execution and looping. Alternate implementations may include generation of other behavioral views of the system, including collaboration diagrams, state diagrams, and object interaction diagrams.
Before explaining details of how conditional execution and looping can be automatically identified, sequence diagrams will first be explained. A sequence diagram describes the interaction between a set of objects for a given scenario.
The notation used in
While the operation of someMethod can be understood by examining the source code, it would be difficult to truly understand the program flow using the limited notation from
The notation in
In
Everything below dashed line 420 and within ALT interaction frame 425 occurs when guard condition 430 evaluates to false. In the illustrated example, guard condition 430 is false, guard condition 455 is true, and doX is called with a value of false, as indicated by reference numeral 460.
The sequence diagram in
Most modern computer programs are written in a high level language, referred to as source code language. The syntax of these high level languages provides the ability to make function calls, to conditionally execute a block of code, and to execute the same block of code in a loop until some condition is met. The present invention may include analyzing the execution flow by monitoring function/method calls made by a program. In addition, the present invention may include intelligent monitoring that can detect when the program is in an execution loop or when a program has conditionally executed a block of code. This information may be used to summarize the behavior of a program by expressing the function call sequences with loop and conditional execution notation, as illustrated in
A function call allows a program to temporarily branch to another location to execute a series of statements and then to return to the point of origin and continue execution. Functions are blocks of code (computer statements) that can be called in order to perform a specific task. In object-oriented programming, objects have actions, referred to as methods, which can be invoked by other objects. A method invocation is analogous to calling a function in a non-object-oriented computer language. Most computer programs are composed of hundreds, if not thousands, of objects, each object having many methods. In object-oriented languages, methods are used to express the actions that can be performed by an object. Software engineers who are designing object-oriented programs utilize sequence diagrams to document the interaction between objects and the behavior of a program. In contrast to sequence diagrams, class diagrams are used to document the static design of an object-oriented system. The present invention may include the ability to monitor the interaction between the objects that make up a computer program. Although one exemplary implementation described herein focuses on analyzing object-oriented programs, the methods and systems described herein for analyzing operational behavior of a computer program can also be used to monitor the interaction of modules and associated functions of a non-object-oriented application, such as a procedure-oriented application.
In one implementation of the present invention, interactions between methods or functions in a computer program may be recorded into a data structure referred to herein as a summarized call tree, as illustrated in
In
As stated above, one important aspect of the present invention is the ability to automatically identify conditionally executed blocks of code, referred to as conditional statements, and to add conditional execution notation to sequence diagrams. Returning to
The inclusion of conditional execution notation in a sequence diagram helps to explain the conditions under which a method or a block of methods is called. A sequence analysis program, such as a sequence analysis engine according to an embodiment of the present invention, preferably captures the conditional execution flow so that generated sequence diagrams can provide conditional execution notation.
In order to detect conditional execution, the SAE may record method calls between selected classes based on filter criteria established by the user. Recording a method call involves recording both the callee and the caller. The caller contains information about the origin of a call. The callee contains information about the destination of a call. Each call that is monitored may be placed in a summarized call tree which may be later used to create visual representations of the execution flow.
In one exemplary implementation, the SAE performs post analysis of the calls contained in the summarized call tree. The main purpose of this post analysis is to identify calls made from blocks of statements contained by a conditional statement. This is accomplished by analyzing the machine/intermediate language code for each caller contained in the call tree and locating low level computer instructions that were generated from high level language conditional statements. These low level instructions are referred to as conditional branching instructions because they allow the processor to jump over blocks of instructions based on some condition. Example conditional branch instructions that the SAE may analyze in detecting conditional execution include machine instructions such as jz, jne, and jge (jump if zero, jump if not equal, and jump if greater than or equal). These instructions require a location to jump to if the condition is met.
In one exemplary implementation of the sequence analysis engine, debugger symbols can be used to locate the line(s) of source code that correspond to the analyzed branch instructions. This information can then be presented on a sequence diagram as depicted in
Forward conditional branching is indicative of conditionally executed code blocks. If the disassembled instruction is a forward conditional branch, then the starting and ending offsets of the conditionally executed block of code are stored for later use (step 850). If the current instruction is not a forward conditional branch, control proceeds to step 860 where it is determined whether the current instruction is a looping instruction. If the current instruction is a branch that specifies a target offset less than the offset of the branch instruction itself, then a loop has been detected. In this case, the starting and ending offsets of the loop are stored for later use (step 870).
After checking for the existence of forward conditional branches and backward conditional branches or loops, the SAE advances to the location of code where the next instruction is located (steps 880 and 890). If the method contains additional instructions then the process described above is repeated starting at step 830. If all of the intermediate language instructions have been analyzed, the process of checking for loops and conditional execution ends in step 895.
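The scan described above can be sketched as follows. This is a minimal illustration, not the engine's actual implementation: the `Instruction` tuple, the `find_scopes` function, and the sample `LISTING` are hypothetical, the instructions are assumed to have been disassembled already, and it is assumed that a conditional block spans from the branch offset to its target while a loop spans from the backward target to the branch.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instruction(NamedTuple):
    offset: int              # position of the instruction within the method
    mnemonic: str            # e.g. "jz", "jne", "jge", "jmp", "call"
    target: Optional[int]    # branch target offset, or None if not a branch


# Conditional branches named in the text; "jmp" is added as an unconditional branch.
CONDITIONAL_BRANCHES = {"jz", "jne", "jge"}
BRANCHES = CONDITIONAL_BRANCHES | {"jmp"}

Scope = Tuple[int, int]      # (starting offset, ending offset)


def find_scopes(instructions: List[Instruction]) -> Tuple[List[Scope], List[Scope]]:
    """Return (conditional scopes, loop scopes) found in one method."""
    conditionals: List[Scope] = []
    loops: List[Scope] = []
    for ins in instructions:
        if ins.mnemonic not in BRANCHES or ins.target is None:
            continue                                   # not a branch: advance
        if ins.mnemonic in CONDITIONAL_BRANCHES and ins.target > ins.offset:
            # Forward conditional branch: the block it may skip is conditional.
            conditionals.append((ins.offset, ins.target))
        elif ins.target < ins.offset:
            # Branch to an earlier offset: a loop has been detected.
            loops.append((ins.target, ins.offset))
    return conditionals, loops


# Hypothetical disassembly of one method.
LISTING = [
    Instruction(0, "cmp", None),
    Instruction(2, "jz", 10),      # skips the call at offset 4 when the test is zero
    Instruction(4, "call", None),
    Instruction(10, "call", None),
    Instruction(14, "cmp", None),
    Instruction(16, "jne", 10),    # jumps back to offset 10
    Instruction(18, "ret", None),
]

if __name__ == "__main__":
    print(find_scopes(LISTING))
```

For the sample listing, the scan reports one conditional scope, (2, 10), and one loop scope, (10, 16).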
The conditional and loop statement scopes that are recorded in this phase are later used when rendering diagrams, such as sequence diagrams, that illustrate the execution flow of the program. For example, in the case of a sequence diagram, if a block of call statements is contained within the starting and ending offsets of a conditional statement, an interaction frame can be drawn around the method calls. Returning to
Most computer programs contain blocks of statements that are executed repeatedly in an execution loop. The following pseudo-code to read each line from a file and print it to the computer console illustrates the value of execution loops in computer programs:
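A minimal Python sketch of such a loop is shown here, standing in for the pseudo-code listing. The function name `print_lines` and the use of an in-memory stream in place of a file on disk are assumptions made for illustration; `readline` plays the role of the ReadLine call.

```python
import io


def print_lines(stream):
    """Print every line of stream to the console and return the line count."""
    count = 0
    line = stream.readline()
    while line != "":                 # readline() returns "" at end of file
        print(line.rstrip("\n"))
        count += 1
        line = stream.readline()      # read the next line before re-testing
    return count


if __name__ == "__main__":
    print_lines(io.StringIO("first line\nsecond line\n"))
```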
Without a looping construct, this program would be difficult, if not impossible, to write. Since the programmer may not know ahead of time how many lines of text are contained in the file, it would be impossible to know how many ReadLine function calls to make in order to read all lines of text contained in the file. The “while” loop provides a concise mechanism for expressing which statements are to be executed in the loop and the condition that must be met in order to continue looping over the statements.
Sequence diagrams provide notation for expressing method calls that are made inside of an execution loop. In
The present invention may include a method for summarizing the execution flow of a computer program by identifying method calls that were made in the context of an execution loop. This enables the generation of sequence diagrams such as in
As described above, a summarized call tree contains nodes that represent unique execution paths made by a computer program. If the same path is encountered more than once, a loop counter is incremented at the point in the tree where the path was repeated. For example, if the call sequence:
entryPoint→methodA→methodB
occurs, a sequence analysis engine may store this information in a summarized call tree as illustrated in
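A summarized call tree with a per-node loop counter can be sketched as below. The class names `CallNode` and `SummarizedCallTree` and the method `record_path` are hypothetical, and the sketch assumes each monitored call is reported as the full path from the entry point to the callee.

```python
class CallNode:
    """One unique position in an execution path."""

    def __init__(self, name):
        self.name = name
        self.count = 0        # loop counter: times this exact path was seen
        self.children = {}    # callee name -> CallNode

    def child(self, name):
        # Reuse the existing node when the same path is encountered again.
        if name not in self.children:
            self.children[name] = CallNode(name)
        return self.children[name]


class SummarizedCallTree:
    def __init__(self):
        self.root = CallNode("<root>")

    def record_path(self, path):
        """Record one call, given as the list of names from entry point to callee."""
        node = self.root
        for name in path:
            node = node.child(name)
        node.count += 1       # increment where the path was repeated
        return node


def build_example():
    tree = SummarizedCallTree()
    tree.record_path(["entryPoint"])
    tree.record_path(["entryPoint", "methodA"])
    for _ in range(10):       # the same path is encountered ten times
        tree.record_path(["entryPoint", "methodA", "methodB"])
    return tree


if __name__ == "__main__":
    node = build_example().root.children["entryPoint"]
    while node:
        print(node.name, node.count)
        node = next(iter(node.children.values()), None)
```

Ten occurrences of the sequence produce a single branch of three nodes, with a count of 10 on the methodB node, rather than ten separate branches.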
As described above,
One problem that must be overcome is to identify where the execution loop actually exists. It is not enough to know that the sequence entryPoint→methodA→methodB was encountered 10 times because entryPoint could be calling methodA in a loop, methodA could be calling methodB in a loop, or both entryPoint and methodA could contain execution loops. The present invention includes a mechanism to overcome this problem. In one exemplary implementation, the present invention may utilize a data structure, referred to herein as a shadow stack, to record the number of times a method call was made by the caller while the caller, or calling function, was on the call stack, i.e., before the calling function returns. The shadow stack may contain a snapshot of the executing program's call stack at a specific time. The SAE maintains the shadow stack by actively monitoring method calls and method returns. When a new method call is monitored, the information about the call is pushed onto the top of the shadow stack. When the call returns to the caller, the corresponding entry is popped from the shadow stack. The call information is stored in an object referred to herein as a call object. The call object may be stored or encapsulated in another object, referred to herein as a StackFrame object. The StackFrame object may be stored on the shadow stack.
In one exemplary implementation, the SAE places a reference to the call object into the summarized call tree, described above, when the call object is created and placed in a frame of the shadow stack. The summarized call tree maintains a history of each method call made by the program. The shadow stack maintains a history of each call that is currently represented on the program's call stack.
As new calls occur, corresponding call objects are encapsulated by a StackFrame object which is in turn pushed onto shadow stack 900. When a call returns to the originating method, the corresponding StackFrame object is popped from shadow stack 900 and then discarded. The main distinction between the roles of summarized call tree 1000 and shadow stack 900 is that summarized call tree 1000 maintains a historic collection of all calls that have been monitored by the system, whereas the shadow stack 900 references only those calls that are “active” on the monitored program's call stack.
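The division of roles between the two structures may be sketched as follows. This is a simplified illustration, not the described implementation: the Monitor class and its on_call and on_return hooks are hypothetical stand-ins for the SAE's monitoring of calls and returns, and the reference numerals 900 and 1000 are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Call:
    """Call object: lives in the summarized call tree for the whole run."""
    method: str
    children: List["Call"] = field(default_factory=list)


@dataclass
class StackFrame:
    """Shadow stack entry that wraps a reference to a call object."""
    call: Call


class Monitor:
    """Hypothetical stand-in for the SAE's call/return monitoring."""

    def __init__(self, entry: str):
        self.tree_root = Call(entry)
        self.shadow_stack: List[StackFrame] = [StackFrame(self.tree_root)]

    def on_call(self, method: str) -> None:
        parent = self.shadow_stack[-1].call
        call = next((c for c in parent.children if c.method == method), None)
        if call is None:
            call = Call(method)            # first time this path is seen
            parent.children.append(call)   # the tree keeps it permanently
        self.shadow_stack.append(StackFrame(call))

    def on_return(self) -> None:
        self.shadow_stack.pop()  # frame is discarded; the tree node remains


monitor = Monitor("entryPoint")
monitor.on_call("methodA")
monitor.on_call("methodB")
active_during_b = [frame.call.method for frame in monitor.shadow_stack]
monitor.on_return()  # methodB returns
monitor.on_return()  # methodA returns
active_after = [frame.call.method for frame in monitor.shadow_stack]
```

Once both calls have returned, the shadow stack holds only the entry frame, while the tree still records the entryPoint→methodA→methodB path.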
In
After the function corresponding to stack frame object 1010 returns, the next function called within the function that corresponds to stack frame object 1020 will be added to the branch of summarized call tree 1000 after call object 1050. For example, the next call may be indicated by call object 1095. Thus, by maintaining a shadow stack that contains objects corresponding to functions that have not returned and a summarized call tree that represents a history of functions that have been called and context between the function calls, the present invention allows automatic generation of summarized computer program information.
Using a shadow stack allows the SAE to determine where the execution loop is located. As described above, it is not enough to simply count the number of occurrences of a particular call. In order to accurately depict the flow of execution, the SAE preferably determines where the loop (or loops) occurred that resulted in multiple occurrences of the call. Maintaining a shadow stack allows the SAE to keep track of the local loop count for each call that is active on the monitored program's call stack.
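A worked sketch may help show why a local loop count locates the loop when a total count cannot. The sketch assumes, for illustration only, that the local loop count of a call is incremented each time the call is made during one activation of its caller, and that the counts of a call's children are reset to zero whenever that call is activated again. The names LoopLocator, in_loop, and run are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Call:
    method: str
    total: int = 0             # occurrences over the whole run
    local_loop_count: int = 0  # occurrences within the caller's current activation
    in_loop: bool = False      # True once the caller was seen looping on this call
    children: Dict[str, "Call"] = field(default_factory=dict)


class LoopLocator:
    def __init__(self, entry: str):
        self.root = Call(entry)
        self.stack: List[Call] = [self.root]  # simplified shadow stack

    def on_call(self, method: str) -> None:
        parent = self.stack[-1]
        call = parent.children.setdefault(method, Call(method))
        call.total += 1
        call.local_loop_count += 1
        if call.local_loop_count > 1:
            call.in_loop = True  # the loop is located in the parent
        for child in call.children.values():
            child.local_loop_count = 0  # new activation: restart child counts
        self.stack.append(call)

    def on_return(self) -> None:
        self.stack.pop()


def run(outer: int, inner: int) -> Tuple[int, bool, bool]:
    """entryPoint calls methodA `outer` times; methodA calls methodB `inner` times."""
    locator = LoopLocator("entryPoint")
    for _ in range(outer):
        locator.on_call("methodA")
        for _ in range(inner):
            locator.on_call("methodB")
            locator.on_return()
        locator.on_return()
    a = locator.root.children["methodA"]
    b = a.children["methodB"]
    return b.total, a.in_loop, b.in_loop


loop_in_entry = run(10, 1)     # entryPoint loops over methodA
loop_in_method_a = run(1, 10)  # methodA loops over methodB
loop_in_both = run(2, 5)       # both contain loops
```

All three scenarios yield the same total of ten methodB calls, but the in_loop flags differ, which is the distinction the total count alone cannot make.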
As described above, the present invention may utilize debugger services to set breakpoints in the monitored program. Breakpoints are special instructions that, when executed, cause a debug exception to occur. Debuggers intercept these exceptions and use them to gain control of the currently executing program. Having suspended execution of the program, the debugger can inspect the content of the program, including the call stack, threads, local variables, computer registers, and the contents of addressable memory. The SAE uses debugger breakpoints to gain control of a program at predetermined locations, referred to herein as sequence points. A sequence point is established by setting a function breakpoint on the first instruction of a method. The SAE utilizes sequence points to focus analysis on interactions between select classes and/or methods. Although one exemplary implementation of the invention utilizes debug breakpoints to gain control of a program, alternative approaches could also be used. For example, the SAE may overwrite instructions in the target application with special instructions (such as a kernel mode function call) that would result in the SAE gaining control of the application. The overwritten instructions would be restored once the SAE has finished processing the exception.
When a sequence point breakpoint occurs, the SAE updates the shadow stack and the summarized call tree with information from the monitored program's call stack. After the shadow stack has been updated, the SAE utilizes the local loop count attribute of the stack frame to detect the origin of execution loops that involve methods that are active on the monitored program's call stack. The final step taken by the SAE when processing a sequence point breakpoint is to scan the current method for call instructions, if it has not already been scanned, and to set breakpoints on the scanned call instructions. These breakpoints are referred to herein as call points.
In step 1125, the sequence analysis engine detects loops. This step may be performed using the shadow stack, the summarized call tree, and the local loop counter for each call currently on the monitored program's call stack. Detecting loops also includes recording the origin of each loop using the summarized call tree, as described above. In step 1130, the sequence analysis engine determines whether it is within a maximum call depth from the nearest sequence point. If the analysis is within the maximum call depth, the sequence analysis engine may continue exploring interactions between function or method calls by setting call breakpoints on the call statements in the current method. If the sequence analysis engine is not within the maximum call depth, execution of the program should resume. The maximum call depth may be programmable by the user according to the desired depth of analysis. Accordingly, if it is determined that the maximum call depth has not been exceeded, control proceeds to step 1135 where the SAE begins the process of scanning the current method for call instructions by determining whether the current method has already been scanned. If the method has not been scanned, control proceeds to step 1140 where the method is scanned for calls. In step 1145, call breakpoints are set at each detected call statement so that program execution can be halted at each call statement and relationships between function calls can be determined. Breakpoints that are automatically set by the SAE at calls within a method being analyzed are referred to herein as call points. Control then proceeds to step 1150 where program execution is resumed.
Returning to step 1115, if a breakpoint is determined not to be a sequence point, control proceeds to step 1155 where it is determined whether the breakpoint corresponds to a return point. A return point is a breakpoint set at the instruction in a function that causes the function to return. If the instruction is determined to be a return point, control proceeds to step 1160 where it is determined whether the current thread of execution is a valid thread. If the current thread of execution is a valid thread, control proceeds to step 1163 where call points or call breakpoints in the current method are disabled. In step 1165 the shadow stack frame containing the current function that is returning is removed or popped from the shadow stack. Execution of the program then resumes at step 1150.
Returning to step 1155, if the current breakpoint is determined not to be a return point, control proceeds to step 1170 where it is determined whether the current breakpoint is a call point. As described above, a call point may correspond to a function that is desired to be stepped into in order to analyze relationships between calls. Call points may be automatically set by the SAE, as indicated by step 1145. If the breakpoint is determined to be a call point, control proceeds to step 1175 where the debugger step-in service is used to step into the function corresponding to the call point. Once the step-in has been performed, control returns to step 1150 where program execution is resumed.
Returning to step 1100, if program execution is halted due to a breakpoint event and the breakpoint event is a step-in complete event, control proceeds to step 1180 where analysis of the stepped-into function begins. In the stepped-into function, the SAE first determines in step 1185 whether a sequence point is present in the function. If a sequence point is present, control proceeds to step 1150 where execution of the program is resumed. The program will be suspended when the sequence point breakpoint is encountered. If the stepped-into function does not include a sequence point, control proceeds to step 1120 where the shadow stack and the summarized call tree are updated. Steps 1125-1145 may be repeated to detect loops and automatically set call points within the stepped-into function, provided that the maximum call depth has not been exceeded. Thus, using these steps, interactions between multiple layers of function calls may be automatically analyzed. Once analysis of the stepped-into function is complete, execution of the program is resumed.
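The event handling walked through above can be summarized as a small dispatch routine. The following is a hedged sketch rather than the described implementation: the Event enumeration, the handle_event function, and the action strings are hypothetical, the action strings merely stand in for debugger operations, and treating a depth equal to the maximum as "within" the maximum call depth is an assumption.

```python
from enum import Enum, auto
from typing import List


class Event(Enum):
    SEQUENCE_POINT = auto()
    RETURN_POINT = auto()
    CALL_POINT = auto()
    STEP_IN_COMPLETE = auto()


def handle_event(event: Event, *, depth: int, max_depth: int,
                 already_scanned: bool = False, valid_thread: bool = True,
                 has_sequence_point: bool = False) -> List[str]:
    """Return the ordered actions taken for one breakpoint event."""
    actions: List[str] = []
    if event is Event.RETURN_POINT:
        if valid_thread:
            actions += ["disable_call_points", "pop_shadow_stack"]
    elif event is Event.CALL_POINT:
        actions.append("step_in")
    elif event is Event.SEQUENCE_POINT or (
            event is Event.STEP_IN_COMPLETE and not has_sequence_point):
        actions += ["update_shadow_stack_and_tree", "detect_loops"]
        # Assumption: depth == max_depth still counts as within the limit.
        if depth <= max_depth and not already_scanned:
            actions += ["scan_method_for_calls", "set_call_points"]
    # A stepped-into function that has its own sequence point falls through:
    # the program is simply resumed and halts again at that sequence point.
    actions.append("resume")
    return actions


at_sequence_point = handle_event(Event.SEQUENCE_POINT, depth=0, max_depth=3)
too_deep = handle_event(Event.STEP_IN_COMPLETE, depth=4, max_depth=3)
at_call_point = handle_event(Event.CALL_POINT, depth=1, max_depth=3)
```

Every branch ends by resuming the program, which mirrors how each path in the description returns to the step where execution is resumed.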
Maintaining a shadow stack and a summarized call tree is an important step in summarizing the execution flow of a computer program. Therefore, it is appropriate to further discuss the process of building and maintaining the shadow stack and the summarized call tree. As illustrated by step 1120 in
Once the program stack and the shadow stack are synched, control proceeds to step 1220 where the SAE walks forward one position in the program stack. In step 1225, the SAE determines whether the current position is the top of the program stack. If the current position is the top of the program stack, there is no need to update the shadow stack because the program stack does not include any further entries that are not already in the shadow stack. Accordingly, control proceeds to step 1230 where the process of updating the shadow stack completes.
If, however, the current position is not the top of the program stack, there are entries in the call stack that have not been placed in the shadow stack. Accordingly, control proceeds to step 1235 where a call object is created based on the current program stack entry. The call object is used to perform a lookup in the summarized call tree. In step 1245, the SAE determines whether a match for the current call is found in the summarized call tree. If a match is found, control proceeds to step 1250 where the SAE sets the local loop count to zero for all child call objects matching the call object. In step 1255, the call object is pushed onto the shadow stack. In step 1260, the SAE sets a breakpoint at the return address of the function call. In step 1265, the SAE walks forward one position in the call stack. Control then returns to step 1225.
Returning to step 1245, if the call is not found in the summarized call tree, control proceeds to step 1270 where the call object is added to the summarized call tree. In step 1275, the call object is added as a child of the call object at the top of the shadow stack. Control then returns to step 1255 where the call object is pushed onto the shadow stack, step 1260 where a breakpoint is set at the return of the current call, and step 1265 where the next entry in the call stack is accessed.
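The update procedure described above may be sketched as a single function that brings the shadow stack in line with a snapshot of the program stack. Several details are assumptions made for illustration: stacks are modeled as lists of method names ordered from bottom to top, synchronization is done by comparing names position by position, a sentinel root node anchors the tree, and a return breakpoint is represented by appending the method name to a list. None of these names come from the described implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Call:
    method: str
    local_loop_count: int = 0
    children: Dict[str, "Call"] = field(default_factory=dict)


def update_shadow_stack(program_stack: List[str], shadow_stack: List[Call],
                        tree_root: Call, return_breakpoints: List[str]) -> None:
    # Synchronize: find how many bottom entries the two stacks share.
    synced = 0
    while (synced < len(shadow_stack) and synced < len(program_stack)
           and shadow_stack[synced].method == program_stack[synced]):
        synced += 1
    del shadow_stack[synced:]  # drop shadow frames that are no longer active

    # Walk forward over program stack entries not yet on the shadow stack.
    for method in program_stack[synced:]:
        parent = shadow_stack[-1] if shadow_stack else tree_root
        call = parent.children.get(method)       # lookup in the tree
        if call is not None:
            for child in call.children.values():
                child.local_loop_count = 0       # match found: reset children
        else:
            call = Call(method)                  # no match: add to the tree
            parent.children[method] = call       # as a child of the stack top
        shadow_stack.append(call)                # push onto the shadow stack
        return_breakpoints.append(method)        # stands in for a return point


root = Call("<root>")  # sentinel node, an assumption of this sketch
shadow: List[Call] = []
breakpoints: List[str] = []
update_shadow_stack(["entryPoint", "methodA"], shadow, root, breakpoints)
update_shadow_stack(["entryPoint", "methodA", "methodB"], shadow, root, breakpoints)
shadow_methods = [call.method for call in shadow]
```

The second call pushes only methodB, because the two lower entries were already synchronized by the first call.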
Detecting the origin of execution loops is crucial to accurately summarize the behavior of a computer program. As illustrated by step 1125 in
The present invention also utilizes breakpoints to establish two additional types of breakpoint locations. These breakpoint locations are referred to as return points and call points, as described above with respect to
As described above with respect to
One of the most important aspects of summarizing the operational behavior of a computer program is detecting when a function is called from within an execution loop, and detecting the node in the summarized call tree corresponding to the sequence of functions in which the execution loop occurred. This sequence of functions is referred to herein as the origin of the execution loop.
Returning to step 1310, if the current instruction is not at the bottom of the shadow stack, in step 1330, the loop counter in the call object associated with the function is incremented. In step 1340, it is determined whether the loop counter associated with the current call is greater than one. If the loop counter is greater than one, the SAE sets the boolean variable call.local loop in the call object to true, indicating that the current call is being made from within an execution loop. If, in step 1340, the loop counter is not greater than one, control proceeds to step 1360 where the next stack frame in the shadow stack is analyzed.
In one implementation, the present invention includes a system that can summarize the execution flow of a computer program.
In operation, GUI application 1400 may allow a user to configure inputs for SAE 1410, control the operation of SAE 1410 (start, stop, pause), and view and manipulate the outputs from SAE 1410. SAE 1410 interacts with debugger 1420 to control target application 1430 and examine/record its execution. The services of debugger 1420 are accessed by SAE 1410 by utilizing debugger API 1415. It should be noted that debugger APIs are often provided by debugger frameworks in order to offer customized debugging capabilities.
SAE 1410 may utilize the services of debugger 1420 in order to examine and record the execution flow of a computer program. While most debugger applications are used by computer programmers to isolate and fix bugs in computer programs, it is possible to automate the use of debugger services to control, examine, and record the execution of a computer program. It is this automated use that allows SAE 1410 to step into methods, scan for call statements, and set new breakpoints at the call statements as described above with respect to
In performing automated analysis of target application 1430, SAE 1410 may utilize services that are commonly offered by debuggers.
SAE 1410 may utilize debugger services to inspect the program state, including the program call stack, local and global variables, and the state of computer registers. Of particular interest is the data that is stored on the call stack. The call stack is composed of stack frames. Each stack frame relates to a method that is currently being executed by the program. Information in the stack frame includes local variables and the return address of the next statement to execute once the method being called has returned. SAE 1410 may walk the call stack and examine the contents of each stack frame. SAE 1410 stores the return address stored in the stack frame when it detects a new call. This return address can be used to identify the memory address of the call statement.
Another debugger service that is utilized extensively by SAE 1410 is the breakpoint service 1510. SAE 1410 utilizes the breakpoint service to create, enable, and disable breakpoints. A breakpoint is a special computer instruction that halts the current program and gives control to debugger 1420. In most cases debugger 1420 actually replaces a specified computer instruction with a special instruction that causes a breakpoint exception to occur. When a breakpoint exception occurs, debugger 1420 catches the exception and suspends execution of the application. When a debug breakpoint is encountered most debuggers allow the user to visually inspect the state of the suspended program, including the registers, memory, local/global variables, and the call stack. As described above, SAE 1410 uses debugger breakpoints to gain control of a program at predetermined locations referred to herein as sequence points. A sequence point is established by setting a function breakpoint on the first instruction of a method. SAE 1410 utilizes sequence points to focus analysis on interactions between select classes and/or methods. SAE 1410 is notified by the debugger service whenever a debug breakpoint is encountered. After analyzing the current call stack and recording the execution flow, SAE 1410 uses the debugger services to resume execution of the suspended program.
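The instruction replacement described above can be illustrated on a simulated block of memory. This is only a sketch: the BreakpointTable class is hypothetical, memory is modeled as a Python bytearray rather than a real process image, and the use of the x86 INT3 opcode (0xCC) is an assumption, since the description does not name a processor architecture.

```python
from typing import Dict

BREAKPOINT_OPCODE = 0xCC  # x86 INT3; assumed here, no architecture is specified


class BreakpointTable:
    """Hypothetical bookkeeping for enabling and disabling breakpoints."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.saved: Dict[int, int] = {}  # address -> original instruction byte

    def enable(self, address: int) -> None:
        if address not in self.saved:
            self.saved[address] = self.memory[address]  # remember the original
            self.memory[address] = BREAKPOINT_OPCODE    # patch in the trap

    def disable(self, address: int) -> None:
        original = self.saved.pop(address, None)
        if original is not None:
            self.memory[address] = original             # restore the original


# Simulated method body: push ebp; mov ebp, esp; ret (x86 encoding).
code = bytearray([0x55, 0x89, 0xE5, 0xC3])
table = BreakpointTable(code)

table.enable(0)           # sequence point on the first instruction
patched = bytes(code)
table.disable(0)
restored = bytes(code)
```

Saving the original byte before patching is what allows the instruction to be restored once the breakpoint is no longer needed.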
SAE 1410 may utilize single stepping services to explore interactions for select classes/methods. Debugger 1420 provides services that allow SAE 1410 to control the execution of a single instruction or of a range of instructions. The Step-In service allows SAE 1410 to step into a function that is referenced by a call instruction. As described above, SAE 1410 utilizes the Step-In service to analyze a function that is referenced by a call point, see
SAE 1410 may utilize the Suspend service to temporarily pause a program's thread in order to inspect program state and set required breakpoints. SAE 1410 may utilize the Resume service to resume a suspended thread when analysis is complete.
In addition to using debugger services to control program execution, SAE 1410 may also utilize debugger services for reading program symbols. These symbols allow SAE 1410 to identify the memory locations of functions specified by the user and to associate low-level machine/intermediate instructions with higher-level source code statements. This mapping allows SAE 1410 to annotate the sequence diagrams with fragments of source code. The ability to display source code in generated sequence diagrams greatly enhances the diagram's ability to summarize the execution flow of a computer program.
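One way such a mapping could be consulted is sketched below. The line table, its addresses, and the source_for function are invented for illustration; a real debugger would supply this information through its symbol-reading service, whose interface is not described here. The sketch assumes the table is sorted by start address and that each entry covers the addresses up to the next entry.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical line table: (start address, source line number, source text).
LINE_TABLE: List[Tuple[int, int, str]] = [
    (0x1000, 10, "def methodA(items):"),
    (0x1008, 11, "    for item in items:"),
    (0x1014, 12, "        methodB(item)"),
]
STARTS = [entry[0] for entry in LINE_TABLE]


def source_for(address: int) -> Optional[Tuple[int, str]]:
    """Map an instruction address to the source line that contains it."""
    index = bisect.bisect_right(STARTS, address) - 1
    if index < 0:
        return None  # address lies before the first known line
    _, line, text = LINE_TABLE[index]
    return line, text.strip()


call_site = source_for(0x1016)   # an address inside the methodB call
loop_header = source_for(0x1008)
unknown = source_for(0x0FFF)
```

A fragment returned this way could then be attached to the corresponding call in a generated diagram.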
In order to analyze the target application, in step 1730, SAE 1410 sets entry breakpoints according to the user configuration. In step 1735, SAE 1410 analyzes the target application using the steps described above with regard to
Thus, the present invention includes methods, systems, and computer program products for summarizing the operational behavior of a computer program. The method may include setting execution breakpoints at functions of interest in computer program code. The computer program code is then executed to analyze the operational behavior of the computer program. During execution of the computer program, conditional execution and looping of each function of interest are tracked. A summary of the conditional execution and looping is produced and displayed to the user.
It will be understood that various details of the invention may be changed without departing from the scope of the invention. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation—the invention being defined by the claims.