The present application contains at least one drawing executed in color. Copies of this patent application with color drawings will be provided by the Office upon request and payment of the necessary fee.
1. Field of the Invention
The present invention relates to software tools for assisting software developers and users in the task of monitoring and analyzing the execution of computer programs, such as during the troubleshooting process.
2. Description of the Related Art
The software industry faces a challenge in fighting malfunctions (colloquially known as “bugs”) that occur in the released versions of commercial software. Such malfunctions cause serious problems for both customers and vendors. It is often difficult to identify the particular situation, or situations, that lead to the malfunction, and this adds to the difficulty of finding the root cause of the malfunction in the application code or coding environment. The problem is especially difficult if it appears only at a remote customer site and is not reproducible in the development and testing environments. Remote software troubleshooting based on run-time execution tracing of applications in production environments is often used to identify and diagnose such malfunctions. This approach provides insight into the running application and allows gathering of execution traces that record function calls, variables, source lines, and other important information. However, analyzing program execution traces and understanding the root cause of a program error is a tedious and time-consuming task: the execution trace log can contain thousands of function calls and other entries corresponding to events that happened just before the malfunction.
The present invention solves these and other problems associated with software troubleshooting (debugging) by analyzing execution traces of software programs. The analysis reduces and classifies the data that is presented to the user (typically a software analyst tasked with locating the cause of the malfunction). Rather than analyzing the raw execution trace log, which contains a relatively large list of low-level events, the user working with the tracing tool can view a relatively smaller processed log that groups the low-level events into higher-level events attributed to known classes or regions of normal functioning or to different anomalies. In one embodiment, the analysis program can send alerts on known classes of problems or anomalies.
Analysis of the execution trace allows the user to detect the application execution patterns. Each pattern represents a sequence of operations performed by the application. Patterns are associated with the situation classes. Thus it is possible to identify patterns that represent a relatively broad class of normal execution of the application. Other classes include patterns for typical operations such as file opening, file saving, site browsing, mail sending, etc. For diagnostic purposes, the classes include patterns associated with certain malfunctions. In one embodiment, the system includes a learning mode and a recognition mode. In the learning mode the system accumulates patterns belonging to different classes and stores them in a pattern database. In the recognition mode, the system matches the trace against the pattern database and assigns trace regions to specific classes of execution, such as, for example, normal execution classes, abnormal execution classes, execution classes related to specific problems, execution related to user activities, etc.
In one embodiment, the learning mode includes an automatic learning sub-mode. In one embodiment, the learning mode includes a user-guided learning sub-mode. In the learning mode, the system accumulates patterns belonging to different execution classes and stores them in a database. Automatic learning is often used for accumulating patterns belonging to a generic class corresponding to normal execution. User-guided learning involves a user who selects certain log regions and attributes them to a particular execution class, such as normal execution, execution classes related to specific problems, execution related to user activities, etc.
In the recognition mode, the system matches the trace against the pattern database and assigns trace regions to specific execution classes. Of special interest is the abnormal execution class. A trace region (e.g., a group of events in the execution trace log) is attributed to the abnormal execution class if it contains a relatively high density of unknown patterns. This class usually marks situations related to software malfunctions or to insufficient training data in the pattern database. If the user decides that an abnormality region appeared because it reflects a new execution path that was not encountered during learning, the user can assign it to the normal class using the user-guided learning mode.
A software system which embodies the various features of the invention will now be described with reference to the following drawings.
In the drawings, like reference numbers are used to indicate like or functionally similar elements. In addition, the first digit or digits of each reference number generally indicate the figure number in which the referenced item first appears.
Each pattern represents a sequence of operations performed by the traced application under certain conditions. A pattern can be represented as an ordered sequence of operations $P = (O_1, O_2, \ldots, O_N)$.
A stream of operations is separated into overlapping sub-sequences of constant length N. For example, consider learning the following stream of eight operations in a trace log:
For the case of pattern length N equal to 6, the following patterns will be extracted:
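The window extraction described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: operations are plain strings, `extract_patterns` is a hypothetical helper name, and the stream-opening, closing, and padding pseudo-operations mentioned elsewhere in this description are not modeled, so the output need not match the patterns of the original example exactly. A stream shorter than N yields one shortened pattern, mirroring the shortened patterns described for user-guided learning.

```python
from typing import List, Sequence, Tuple


def extract_patterns(stream: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Split a stream into overlapping sub-sequences (sliding windows) of length n.

    Hypothetical helper: boundary pseudo-operations (opening, closing, padding)
    are not inserted. A non-empty stream shorter than n yields a single
    shortened pattern containing the whole stream.
    """
    if n < 1:
        raise ValueError("pattern length must be positive")
    if len(stream) <= n:
        return [tuple(stream)] if stream else []
    # Window k covers stream[k], ..., stream[k + n - 1].
    return [tuple(stream[k:k + n]) for k in range(len(stream) - n + 1)]


ops = [f"O{k}" for k in range(1, 9)]  # O1 ... O8
for pattern in extract_patterns(ops, 6):
    print(pattern)
```

With eight operations O1 through O8 and N = 6, the windows start at O1, O2, and O3, giving three patterns.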
Meaningful sequences of operations are selected for analysis. Many modern applications are complex: they are multithreaded and, in many cases, executed in a distributed multi-user environment. A simple time ordering of system calls is too low-level a representation, because it is influenced by many random or irrelevant factors. The sequences in the patterns stored in the pattern database preferably reflect the internal logic of the program. To identify the patterns, the sequential trace log is pre-processed and separated into logical streams reflecting certain semantic or logical sets of operations. For example, operations can be separated into streams according to: operations belonging to a particular thread; operations performed by a particular user of the application; operations belonging to particular transactions; operations belonging to a particular user action; operations belonging to a particular function body; etc. Note that, with regard to operations belonging to particular transactions, there can be numerous definitions of a transaction depending on the semantics of the application class.
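Separating the log into logical streams amounts to grouping a time-ordered list of events by a key while preserving the order inside each group. The sketch below assumes a hypothetical `TraceEvent` record with `thread_id`, `user`, and `name` fields; a real trace would carry whatever attributes the tracer records, and transaction or user-action keys would need application-specific logic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List


@dataclass(frozen=True)
class TraceEvent:
    thread_id: int  # hypothetical field: thread that executed the operation
    user: str       # hypothetical field: application user on whose behalf it ran
    name: str       # hypothetical field: operation name, e.g. the called function


def split_streams(log: List[TraceEvent],
                  key: Callable[[TraceEvent], Hashable]) -> Dict[Hashable, List[str]]:
    """Group a time-ordered trace log into logical streams, one per key value.

    Events keep their original relative order within each stream.
    """
    streams: Dict[Hashable, List[str]] = defaultdict(list)
    for event in log:
        streams[key(event)].append(event.name)
    return dict(streams)


log = [
    TraceEvent(1, "alice", "open"),
    TraceEvent(2, "bob", "read"),
    TraceEvent(1, "alice", "write"),
    TraceEvent(2, "bob", "close"),
]
by_thread = split_streams(log, lambda e: e.thread_id)
by_user = split_streams(log, lambda e: e.user)
```

Here `by_thread` holds two streams, `open, write` for thread 1 and `read, close` for thread 2; any other grouping only needs a different key function.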
In one embodiment, recorded operations have a tree-like structure of nested function calls. In one embodiment, two options are provided for linearization of nested calls into a linear sequence of events:
The user can generate shortened patterns that contain M < N operations. Shortened patterns are generated in the user-guided learning mode if a region selected for learning contains fewer than N operations. Each new combination of N or fewer successive operations is saved in the pattern database in association with the identifier of the situation class: 0 for normality; 1 or above for user-defined special situations. A pattern can be saved with multiple class identifiers.
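A pattern database that associates each pattern with one or more situation class identifiers can be sketched as an in-memory mapping. The `PatternDatabase` class and its `learn` and `classes_of` methods are hypothetical names; the description stores patterns in a per-module pattern file, which this sketch does not model. `learn` returns the number of new (pattern, class) associations it created.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple

NORMAL = 0  # class 0 = normality; 1 and above = user-defined special situations

Pattern = Tuple[str, ...]


class PatternDatabase:
    """Hypothetical in-memory stand-in for a pattern database."""

    def __init__(self, max_len: int = 6) -> None:
        self.max_len = max_len
        self.classes: Dict[Pattern, Set[int]] = defaultdict(set)

    def learn(self, region: Sequence[str], class_id: int = NORMAL) -> int:
        """Store every window of max_len successive operations under class_id.

        A region shorter than max_len is stored as one shortened pattern.
        Returns the number of (pattern, class) associations that were new.
        """
        n = self.max_len
        if len(region) < n:
            windows = [tuple(region)] if region else []
        else:
            windows = [tuple(region[k:k + n]) for k in range(len(region) - n + 1)]
        added = 0
        for window in windows:
            if class_id not in self.classes[window]:
                self.classes[window].add(class_id)  # a pattern may carry several IDs
                added += 1
        return added

    def classes_of(self, pattern: Pattern) -> Set[int]:
        """Class IDs stored for a pattern (empty set if the pattern is unknown)."""
        return set(self.classes.get(pattern, set()))


db = PatternDatabase(max_len=3)
db.learn(["a", "b", "c", "d"])             # automatic learning: normality
db.learn(["b", "c", "d"], class_id=1)      # user-guided: special situation 1
print(db.classes_of(("b", "c", "d")))      # the pattern now carries IDs 0 and 1
```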
Each operation $O_i$ is characterized by a number of characteristics. These can include, for example, one or more of: an invocation point, a function entry point, a return code, an operation type, function arguments, etc. The invocation point can include, for example, the invoking module name, the invocation offset, etc. The function entry point can include, for example, the called module name, the entry offset, etc. The return code can include, for example, a normal return code, an abnormal return code, etc. In one embodiment, additional knowledge about the function is used to determine whether the return code value corresponds to a normal situation or to an error condition. The operation type can include, for example, a function call, creating a thread, exiting a thread, a pseudo-operation (e.g., stream opening, closing, padding, etc.), etc.
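One way to hold these characteristics is an immutable, hashable record, so that tuples of operations can serve directly as pattern keys. The `Operation` and `OpType` names and the exact field set are assumptions for illustration. Which characteristics participate in pattern identity is a design choice: including return codes or arguments makes patterns more specific and therefore more likely to be reported as unknown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class OpType(Enum):
    CALL = "call"
    THREAD_CREATE = "thread_create"
    THREAD_EXIT = "thread_exit"
    PSEUDO = "pseudo"  # e.g. stream opening, closing, padding


@dataclass(frozen=True)  # frozen => hashable, usable inside pattern tuples
class Operation:
    op_type: OpType
    invoking_module: str              # invocation point: module that made the call
    invocation_offset: int            # invocation point: offset within that module
    entry_module: str                 # function entry point: module of the function
    entry_offset: int                 # function entry point: entry offset
    return_code: Optional[int] = None  # None if no return value was recorded
    failed: Optional[bool] = None      # set from per-function knowledge, if any
    args: Tuple[str, ...] = ()         # recorded function arguments


a = Operation(OpType.CALL, "app.exe", 0x1A2B, "kernel32.dll", 0x100)
b = Operation(OpType.CALL, "app.exe", 0x1A2B, "kernel32.dll", 0x100)
pattern_set = {(a, b)}  # a two-operation pattern used as a set member
```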
In one embodiment, deterministic and/or statistical procedures are used for the classification process. A deterministic procedure is typically used in cases where specific patterns are characteristic of a particular situation. In most cases, however, the presence of a single situation pattern, or the absence of any learned pattern, is not enough for reliable classification. For example, a pattern can belong to several classes, or the statistics gathered during learning can be limited, so that not all patterns of normal execution situations were encountered. Moreover, a region can contain patterns of several classes.
In order to increase the reliability of the classification, a statistical procedure is used in some embodiments. In one embodiment, the statistical procedure is based on selecting regions of consecutive stream operations and calculating the densities of pattern matches for all database-known execution situation classes. The density value for a situation class in a region of stream-adjacent operations is defined as the ratio of the number of pattern matches for that class to the total number of operations in the region.
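Written as a formula (the symbol $m_n^i$ for the number of class-$i$ pattern matches is introduced here for illustration), the density of class $i$ in a region $R_n$ of $n$ adjacent operations is:

```latex
D_n^i = \frac{m_n^i}{n}, \qquad 0 \le m_n^i \le n \;\Longrightarrow\; 0 \le D_n^i \le 1
```

For example, a region of 10 operations containing 8 pattern matches for class 0 has $D_{10}^0 = 8/10 = 0.8$.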
In one embodiment, the classification process is as follows. Select a region $R_n$ of $n$ adjacent stream elements, and let $D_n^i$, with $0 \le D_n^i \le 1$, be its density for a specific execution class $i$. Regions of different lengths 1, 2, etc. covering the same target stream element can be chosen: $R_1, R_2, \ldots$. For efficiency reasons, the upper limit on the region length is set to the locality frame size F (30 by default in some embodiments). In other embodiments, the locality frame size F can be set smaller or greater than 30. For a stream element, there are thus F density values for the chosen situation class $i$: $D_1^i, D_2^i, \ldots, D_F^i$. Let $R^i$ be the region that delivers the maximal density, $D^i = \max_l \{D_l^i\}$. If there are multiple candidates, the longest region is chosen.
The database-known situation classes are enumerated to obtain the best-density regions $R^0, R^1, \ldots$ for classes 0, 1, ... that cover the target stream element. The target element is attributed to the execution class that delivers the maximal density, $D = \max_i \{D^i\}$. If multiple classes deliver the maximal density $D$, the class with the minimal ID is chosen; thus, the normal execution class takes precedence over classes of special situations. If, however, $D < T$, where $T$ is a density threshold (0.8 by default in some embodiments), the target element is assigned to an unknown situation (abnormality), with conditional density $D^{-1} = 1 - D$.
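The selection rules of the two preceding paragraphs can be sketched in Python. Assumptions not taken from the description: each class is given as a 0/1 list marking the stream elements at which a pattern of that class matched; the candidate regions for an element are the F regions that end at it (one region per length, matching the F density values described above); and `UNKNOWN = -1` is a hypothetical ID for the abnormality class. Ties are broken as stated: the longest region within a class, then the lowest class ID across classes.

```python
from typing import Dict, List, Tuple

UNKNOWN = -1  # hypothetical ID for the "unknown situation (abnormality)" class


def best_region(hits: List[int], t: int, frame: int) -> Tuple[float, int]:
    """Max density over regions of length 1..frame that end at element t.

    hits[k] is 1 if a pattern of the class matched at element k, else 0.
    Returns (density, region length); ties go to the longest region.
    """
    best_d, best_len, count = -1.0, 0, 0
    for length in range(1, min(frame, t + 1) + 1):
        count += hits[t - length + 1]  # extend the region one element backwards
        d = count / length
        if d >= best_d:  # ">=" lets a longer region win a tie
            best_d, best_len = d, length
    return best_d, best_len


def classify(hits_by_class: Dict[int, List[int]], t: int,
             frame: int = 30, threshold: float = 0.8) -> Tuple[int, float, int]:
    """Return (class ID, density, region length) for stream element t."""
    best_cls, best_d, best_len = UNKNOWN, -1.0, 0
    for cls in sorted(hits_by_class):  # ascending IDs, so class 0 comes first
        d, length = best_region(hits_by_class[cls], t, frame)
        if d > best_d:  # strict ">" keeps the lower class ID on ties
            best_cls, best_d, best_len = cls, d, length
    if best_d < threshold:
        return UNKNOWN, 1.0 - best_d, best_len  # conditional density 1 - D
    return best_cls, best_d, best_len


hits_by_class = {0: [1, 1, 1, 1, 0, 0, 0, 0], 1: [0, 0, 0, 0, 1, 1, 1, 1]}
print(classify(hits_by_class, 3))
print(classify(hits_by_class, 7))
```

In this example, element 3 is attributed to class 0 and element 7 to class 1, each with density 1.0 over a region of four elements.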
To reduce overhead, an implementation of the algorithm can determine the situation class of maximal density for each successive stream operation while looking back as little as possible over operation sequences of different lengths.
For the current stream element, the algorithm processes F (locality frame) regions that end with the current element. As a result, the situation class C and the region length L delivering the maximal density D are deduced and the class C is assigned to the current element. The class C can also be optionally assigned to up to L−1 preceding elements replacing the class previously ascribed to them. Such replacement spreads back until an element is found whose class was assigned with a density value greater than D.
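An online variant that appends one operation at a time and spreads its decision back can be sketched as below. It uses the same assumed 0/1 match representation and hypothetical `UNKNOWN = -1` ID inside a self-contained class, `OnlineSituationClassifier` (a hypothetical name). Storing `1 - D` as the comparison density for unknown elements is likewise an assumption, since the description does not say how unknown assignments compete during back-spreading.

```python
from typing import List, Tuple

UNKNOWN = -1  # hypothetical ID for the unknown situation (abnormality)


class OnlineSituationClassifier:
    """Hypothetical online classifier: call append() once per new stream operation."""

    def __init__(self, num_classes: int, frame: int = 30, threshold: float = 0.8):
        self.frame, self.threshold = frame, threshold
        self.hits: List[List[int]] = [[] for _ in range(num_classes)]
        self.labels: List[int] = []       # class assigned to each element
        self.densities: List[float] = []  # density each class was assigned with

    def _best(self, t: int) -> Tuple[int, float, int]:
        """Best (class, density, length) over the F regions ending at element t."""
        best = (UNKNOWN, -1.0, 0)
        for cls, hits in enumerate(self.hits):  # ascending IDs; class 0 first
            count, d_max, l_max = 0, -1.0, 0
            for length in range(1, min(self.frame, t + 1) + 1):
                count += hits[t - length + 1]
                if count / length >= d_max:  # ties -> longer region
                    d_max, l_max = count / length, length
            if d_max > best[1]:  # ties -> lower class ID
                best = (cls, d_max, l_max)
        cls, d, length = best
        if d < self.threshold:
            return UNKNOWN, 1.0 - d, length
        return cls, d, length

    def append(self, hit_flags: List[int]) -> int:
        """hit_flags[c] is 1 if a class-c pattern matched at the new operation."""
        for cls, flag in enumerate(hit_flags):
            self.hits[cls].append(flag)
        t = len(self.labels)
        cls, d, length = self._best(t)
        self.labels.append(cls)
        self.densities.append(d)
        # Spread the class back over up to length - 1 preceding elements,
        # stopping at the first one assigned with a greater density.
        for k in range(t - 1, t - length, -1):
            if self.densities[k] > d:
                break
            self.labels[k], self.densities[k] = cls, d
        return cls


clf = OnlineSituationClassifier(num_classes=2)
for flags in ([1, 0], [1, 0], [1, 1], [0, 1]):
    clf.append(flags)
print(clf.labels)
```

After the first three appends all elements are labeled class 0. The fourth append finds a class-1 region of length 2 with density 1.0, which relabels element 2 as class 1 because its earlier density (also 1.0) is not greater than the new one.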
The backward-processing orientation of the algorithm can be used to support online analysis of an operation stream while new operations are being appended to it.
During the learning mode, attempts to add new patterns are made as operations are processed. The situation class (the execution class) is a parameter of the learning process. In the automatic learning mode, the default situation class is the broad class of “Normal Execution”. In the user-guided learning mode, situation classes are defined by the users.
A set of Graphical User Interface objects (described in connection with
In one embodiment, background color-coding is used to provide certain information to the user. Operations related to an application under learning have a light-blue background 506a, 506b. Operations related to an application in recognition mode have a more complicated coloring scheme that depends on the situation class: the situation corresponding to normality is shown on a white background (as shown, for example, in lines 508a and 508b); an unknown situation (abnormality) is shown on a pink background (as shown, for example, in lines 510a, 510b, and 510c); and situations corresponding to a specific class are shown on a light-yellow background (as shown, for example, in lines 512a and 512b). The colors used in these examples are illustrative and not limiting; other color schemes or designation schemes can be used.
In contrast to an ordinary trace event view (as shown, for example, in FIG. 3A of U.S. Pat. No. 6,202,199), nodes of a given level in the trace event tree of
In one embodiment, unknown situation operations (pink background) are not supplied with labeling and are presented in expanded form in order to attract the user's attention.
Operations that are considered failed are text-colored in red in the comment column (no failed operations are shown in
In one embodiment, by default, the parent operation and labeling items are initially collapsed. An exception is made for parent items that head, directly or indirectly, pink-background operations; these parent items are initially expanded in order to attract the user's attention to unknown situations. View items in
The right side of the window in
The user controls the program by using context and conventional GUI controls such as menus, toolbar buttons, dialog boxes, and the like. Control actions can be conveniently described in terms of functional groups. Group 1 provides controls for setting options and showing the pattern base state: the settings control opens the ‘Options’ dialog window, and the show-state control opens the ‘Pattern base’ dialog window. Group 2 provides controls for filtering by thread and clearing the thread filter. When filtering by thread, the situation view is filtered by the thread that the selected operations belong to. When the thread filter is cleared, the situation view returns to the unfiltered state. Group 3 provides tree-handling controls: select zero-level nodes, select node one level up, select nodes one level down, etc. The “Select zero-level nodes” control selects all zero-level nodes and makes the first zero-level node visible. The “Select node one level up” control moves the selection and visibility to the parent node. The “Select nodes one level down” control expands the nodes under the selection and moves the selection to their child nodes.
Group 4 provides learning control operations such as: stop learning; learn selection; forget selection; undo last learning; properties; etc. Stop learning opens the ‘Stop learning’ dialog window. Learn selection opens the ‘Do learning’ dialog window. Forget selection opens the ‘Undo learning’ dialog window. Undo last learning returns the pattern base to its state before the last user-guided learning. Group 5 provides a control for showing the properties window.
In one embodiment, a coloring scheme is applied to show the current status of the pattern file related to a specific module. Grey indicates that the file is disabled: the situation classifier neither appends new patterns to it nor assigns any situation classes to operations. Red indicates that the file is inconsistent and can be fixed only by deletion; such a file is also disabled. Blue indicates that the file is under learning. The state ‘under learning’ is the initial state for new pattern files; in this state, the situation classifier supplies all new patterns with the normality class identifier and accumulates them in the pattern file. When learning is stopped, either automatically by applying the threshold for the learning saturation ratio or manually, the pattern file goes to the state ‘under situation detection’. Black indicates that the file is under situation detection. User-guided learning can be performed only in this state.
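The pattern file states and the transition triggered by stopping learning can be modeled as a small state machine. The class and method names are hypothetical, and the sketch assumes that only a file under situation detection assigns situation classes; how a file becomes disabled or inconsistent is not modeled.

```python
from enum import Enum


class PatternFileState(Enum):
    DISABLED = "grey"          # neither appends patterns nor assigns classes
    INCONSISTENT = "red"       # also disabled; can only be fixed by deletion
    UNDER_LEARNING = "blue"    # initial state; new patterns stored as normality
    UNDER_DETECTION = "black"  # classifies; only user-guided learning allowed


class PatternFile:
    """Hypothetical per-module pattern file with its display state."""

    def __init__(self) -> None:
        self.state = PatternFileState.UNDER_LEARNING  # initial state for new files

    def stop_learning(self) -> None:
        """Triggered manually or when the saturation threshold is reached."""
        if self.state is not PatternFileState.UNDER_LEARNING:
            raise RuntimeError(f"cannot stop learning in state {self.state.name}")
        self.state = PatternFileState.UNDER_DETECTION

    def can_auto_learn(self) -> bool:
        return self.state is PatternFileState.UNDER_LEARNING

    def can_user_guided_learn(self) -> bool:
        return self.state is PatternFileState.UNDER_DETECTION

    def can_classify(self) -> bool:
        return self.state is PatternFileState.UNDER_DETECTION


f = PatternFile()
f.stop_learning()
print(f.state.value)  # color shown for the file
```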
The window menu items shown in
As a result of user-guided learning, the situation classifier re-scans a subset of selected operations that relate to the selected pattern file and appends new patterns to the file. Users are free to assign these patterns either to normality or to a special situation class (e.g., using a ‘Learn as a specific abnormality’ checkbox 1006). If a new special situation class is created, the user may be given the option to input the legend text for it.
According to the ‘Learn top level sequence only’ checkbox 1008, scanning can be performed either for all operations or for the top-level operations only. Scanning only the top-level stream may be desirable because lower-level operations frequently do not really identify the situation. After successful user-guided learning, the situation view is refreshed.
In one embodiment, the formatting string for naming pattern files can be specified by the user. The macro %s inserts the application module name (%s.db by default). The formatting string for naming backup pattern files can also be specified using the %s macro (%s.bak by default). A pattern base generation code can also be specified. For example, a value of 2 indicates that patterns are tracked at call stack level 0; a value of 3 (the default) indicates that patterns are tracked for every call stack level independently; and a value of 4 indicates that patterns are tracked by walking along call trees. A maximum pattern length of 2 or more can be specified (6 by default). In one embodiment, these values are set when a new pattern base is created.
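The three generation codes can be illustrated on a small call tree. This sketch is an interpretation, not the described implementation: code 3 is read as one stream per call-stack depth (another reading would be one stream per parent function body), and code 4 as a pre-order walk, which lists calls in the order they were made. The `Call` type and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Call:
    name: str
    children: List["Call"] = field(default_factory=list)  # nested calls, in order


def level0_stream(roots: List[Call]) -> List[str]:
    """Code 2: only the calls at call-stack level 0."""
    return [call.name for call in roots]


def per_level_streams(roots: List[Call]) -> Dict[int, List[str]]:
    """Code 3 (one interpretation): a separate stream for each call-stack depth."""
    streams: Dict[int, List[str]] = {}

    def visit(call: Call, depth: int) -> None:
        streams.setdefault(depth, []).append(call.name)
        for child in call.children:
            visit(child, depth + 1)

    for root in roots:
        visit(root, 0)
    return streams


def call_tree_walk(roots: List[Call]) -> List[str]:
    """Code 4: walk along the call trees (pre-order, i.e. call order)."""
    out: List[str] = []

    def visit(call: Call) -> None:
        out.append(call.name)
        for child in call.children:
            visit(child)

    for root in roots:
        visit(root)
    return out


roots = [Call("main_a", [Call("open", [Call("read")]), Call("close")]), Call("main_b")]
print(call_tree_walk(roots))
```

Because the depth-first visit reaches calls at any given depth in the order they started, each per-depth stream in `per_level_streams` stays in time order.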
In addition, users can customize the default values for locality frame size, abnormality ratio threshold, automatic learning saturation ratio threshold, etc. For example, in one embodiment, the default locality frame size can be specified as 2 or more (30 by default). In one embodiment, the default abnormality ratio threshold can be specified in the range [0, 1] (0.2 by default). In one embodiment, the default automatic learning saturation ratio threshold can be specified in the range [0, 1] (0 by default, i.e. no automatic stop). Default values for other parameters can also be set by users in other embodiments.
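The user-adjustable options, with their documented defaults and ranges, can be collected in one validated settings record. `ClassifierSettings` and its field names are hypothetical; only the defaults and allowed ranges come from the description in this paragraph and the preceding one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifierSettings:
    """Hypothetical settings record mirroring the documented options."""
    pattern_file_format: str = "%s.db"       # %s is replaced by the module name
    backup_file_format: str = "%s.bak"
    generation_code: int = 3                 # 2: level 0, 3: every level, 4: call trees
    max_pattern_length: int = 6              # 2 or more
    locality_frame_size: int = 30            # 2 or more
    abnormality_ratio_threshold: float = 0.2  # in [0, 1]
    saturation_ratio_threshold: float = 0.0   # in [0, 1]; 0 means no automatic stop

    def __post_init__(self) -> None:
        for fmt in (self.pattern_file_format, self.backup_file_format):
            if fmt.count("%s") != 1 or fmt.count("%") != 1:
                raise ValueError(f"format must contain exactly one %s: {fmt!r}")
        if self.generation_code not in (2, 3, 4):
            raise ValueError("generation code must be 2, 3, or 4")
        if self.max_pattern_length < 2 or self.locality_frame_size < 2:
            raise ValueError("pattern length and locality frame size must be >= 2")
        for ratio in (self.abnormality_ratio_threshold, self.saturation_ratio_threshold):
            if not 0.0 <= ratio <= 1.0:
                raise ValueError("ratio thresholds must lie in [0, 1]")

    def pattern_file_name(self, module: str) -> str:
        return self.pattern_file_format % module

    def backup_file_name(self, module: str) -> str:
        return self.backup_file_format % module


settings = ClassifierSettings()
print(settings.pattern_file_name("notepad"))
```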
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrated embodiments and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof; furthermore, various omissions, substitutions, and changes may be made without departing from the spirit of the invention. The foregoing description of the embodiments is therefore to be considered in all respects as illustrative and not restrictive, with the scope of the invention being delineated by the appended claims and their equivalents.
The present application claims priority benefit of U.S. Provisional Patent Application No. 60/458,322, filed Mar. 27, 2003, titled “SYSTEM AND METHOD FOR TROUBLESHOOTING RUNTIME SOFTWARE PROBLEMS USING APPLICATION LEARNING,” the disclosure of which is incorporated herein by reference in its entirety.
| Number | Name | Date | Kind |
|---|---|---|---|
| 4503495 | Boudreau | Mar 1985 | A |
| 4511960 | Boudreau | Apr 1985 | A |
| 4598364 | Gum et al. | Jul 1986 | A |
| 4782461 | Mick et al. | Nov 1988 | A |
| 4879646 | Iwasaki et al. | Nov 1989 | A |
| 5021949 | Morten et al. | Jun 1991 | A |
| 5121489 | Andrews | Jun 1992 | A |
| 5193180 | Hastings | Mar 1993 | A |
| 5265254 | Blasiak et al. | Nov 1993 | A |
| 5297274 | Jackson | Mar 1994 | A |
| 5335344 | Hastings | Aug 1994 | A |
| 5347649 | Alderson | Sep 1994 | A |
| 5386522 | Evans | Jan 1995 | A |
| 5386565 | Tanaka et al. | Jan 1995 | A |
| 5394544 | Motoyama et al. | Feb 1995 | A |
| 5408650 | Arsenault | Apr 1995 | A |
| 5410685 | Banda et al. | Apr 1995 | A |
| 5421009 | Platt | May 1995 | A |
| 5446876 | Levine et al. | Aug 1995 | A |
| 5450586 | Kuzara et al. | Sep 1995 | A |
| 5465258 | Adams | Nov 1995 | A |
| 5481740 | Kodosky | Jan 1996 | A |
| 5483468 | Chen et al. | Jan 1996 | A |
| 5513317 | Borchardt et al. | Apr 1996 | A |
| 5526485 | Brodsky | Jun 1996 | A |
| 5533192 | Hawley et al. | Jul 1996 | A |
| 5551037 | Fowler et al. | Aug 1996 | A |
| 5574897 | Hermsmeier et al. | Nov 1996 | A |
| 5581697 | Gramlich et al. | Dec 1996 | A |
| 5590354 | Klapproth et al. | Dec 1996 | A |
| 5612898 | Huckins | Mar 1997 | A |
| 5615331 | Toorians et al. | Mar 1997 | A |
| 5632032 | Ault et al. | May 1997 | A |
| 5642478 | Chen et al. | Jun 1997 | A |
| 5657438 | Wygodny et al. | Aug 1997 | A |
| 5732210 | Buzbee | Mar 1998 | A |
| 5740355 | Watanabe et al. | Apr 1998 | A |
| 5745748 | Ahmad et al. | Apr 1998 | A |
| 5771385 | Harper | Jun 1998 | A |
| 5781720 | Parker et al. | Jul 1998 | A |
| 5848274 | Hamby et al. | Dec 1998 | A |
| 5867643 | Sutton | Feb 1999 | A |
| 5870606 | Lindsey | Feb 1999 | A |
| 5896535 | Ronstrom | Apr 1999 | A |
| 5903718 | Marik | May 1999 | A |
| 5928369 | Keyser et al. | Jul 1999 | A |
| 5938778 | John, Jr. et al. | Aug 1999 | A |
| 5940618 | Blandy et al. | Aug 1999 | A |
| 5960198 | Roediger et al. | Sep 1999 | A |
| 5983366 | King | Nov 1999 | A |
| 6003143 | Kim et al. | Dec 1999 | A |
| 6026433 | D'Arlach et al. | Feb 2000 | A |
| 6026438 | Piazza et al. | Feb 2000 | A |
| 6047124 | Marsland | Apr 2000 | A |
| 6065043 | Domenikos et al. | May 2000 | A |
| 6108330 | Bhatia et al. | Aug 2000 | A |
| 6202199 | Wygodny et al. | Mar 2001 | B1 |
| 6219826 | De Pauw et al. | Apr 2001 | B1 |
| 6237138 | Hameluck et al. | May 2001 | B1 |
| 6263456 | Boxall et al. | Jul 2001 | B1 |
| 6282701 | Wygodny et al. | Aug 2001 | B1 |
| 6321375 | Blandy | Nov 2001 | B1 |
| 6360331 | Vert et al. | Mar 2002 | B2 |
| 6374369 | O'Donnell | Apr 2002 | B1 |
| 6415394 | Fruehling et al. | Jul 2002 | B1 |
| 6467052 | Kaler et al. | Oct 2002 | B1 |
| 6490696 | Wood et al. | Dec 2002 | B1 |
| 6507805 | Gordon et al. | Jan 2003 | B1 |
| 6557011 | Sevitsky et al. | Apr 2003 | B1 |
| 6634001 | Anderson et al. | Oct 2003 | B2 |
| 6865508 | Ueki et al. | Mar 2005 | B2 |
| 7058928 | Wygodny et al. | Jun 2006 | B2 |
| 7089536 | Ueki et al. | Aug 2006 | B2 |
| 7114150 | Dimpsey et al. | Sep 2006 | B2 |
| 7386839 | Golender et al. | Jun 2008 | B1 |
| 20020087949 | Golender et al. | Jul 2002 | A1 |
| 20030005414 | Elliott et al. | Jan 2003 | A1 |
| 20030088854 | Wygodny et al. | May 2003 | A1 |
| 20040060043 | Frysinger et al. | Mar 2004 | A1 |
| 20060150162 | Mongkolsmai et al. | Jul 2006 | A1 |
| 20060242627 | Wygodny et al. | Oct 2006 | A1 |
| 20080244534 | Golender et al. | Oct 2008 | A1 |
| Number | Date | Country |
|---|---|---|
| WO 96-05556 | Feb 1996 | WO |
| Number | Date | Country |
|---|---|---|
| 60458322 | Mar 2003 | US |