Tracing is an approach for logging the state of computer applications at different points during their execution. Tracing is normally implemented by inserting statements in the application code that output status/state messages (“traces”) as the statements are encountered during execution. These statements are deliberately placed to generate traces corresponding to activities of interest performed by specific sections of the code. The generated trace messages can be collected and stored during the execution of the application to form a trace log.
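As a minimal illustration of the tracing approach described above, the following Python sketch places trace statements at the entry and exit of a section of code and collects the messages into an in-memory trace log. The names `trace`, `trace_log`, and `transfer` are hypothetical and chosen only for this illustration.

```python
import time

trace_log = []  # in-memory trace log; a real system would usually write to a file


def trace(message):
    """Record one trace message together with the time it was emitted."""
    trace_log.append((time.time(), message))


def transfer(amount):
    trace(f"transfer: start amount={amount}")  # trace at entry
    result = amount * 2                        # stand-in for the real work
    trace(f"transfer: done result={result}")   # trace at exit
    return result


transfer(21)
messages = [text for _, text in trace_log]
```

After the call, `messages` holds the two traces in the order in which the statements were encountered.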
Programmers often use tracing and trace logs to diagnose problems or errors that arise during the execution of a computer application. When such a problem or error is encountered, trace logs are analyzed to correlate trace messages with the application code to determine the sequence, origin, and effects of different events in the systems and how they impact each other. This process allows analysis and diagnosis of unexpected behavior or programming errors that cause problems in the application code.
In a parallel or distributed environment, there are potentially a number of distributed network nodes, with each node running a number of distinct execution entities such as threads, tasks or processes (hereinafter referred to as “threads”). In many modern computer applications, these threads perform complex interactions with each other, even across the network to threads on other nodes. Often, each of the distributed nodes maintains a separate log file to store traces for its respective threads. Each distributed node may also maintain multiple trace logs corresponding to separate threads on that node.
Diagnosing problems using multiple trace logs often involves a manual process of repeatedly inspecting different sets of the trace logs in various orders to map the sequence and execution of events in the application code. This manual process attempts to correlate events in the system(s) with the application code to construct likely execution scenarios that identify root causes of actual or potential execution problems. Even in a modestly distributed system of a few nodes, this manual process is a highly complex task, limited by the capacity of a human mind to comprehend and concurrently analyze many event scenarios across multiple threads on multiple nodes. Therefore, analyzing traces to diagnose applications in parallel and/or distributed systems is often a time-consuming and difficult exercise in which human limitations can render the diagnosis process unsuccessful. In many cases, the complexity of manual trace analysis causes the programmer to overlook or misdiagnose the real significance of events captured in the trace logs. With the increasing proliferation of more powerful computer systems capable of greater execution loads across more nodes, the scope of this problem can only increase.
The present invention is directed to a method and mechanism for improved diagnosis of computer systems and applications using tracing. According to an aspect of one embodiment of the invention, trace messages are materialized using a markup language syntax. Hyperlinks can be placed in the trace messages to facilitate navigation between sets of related traces. Specific traces or portions of traces can be emphasized using markup language tools to highlight text. Another aspect of an embodiment of the invention pertains to a method and mechanism for generating trace messages in a markup language syntax. Further aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims.
The accompanying drawings are included to provide a further understanding of the invention and, together with the Detailed Description, serve to explain the principles of the invention.
The present invention is disclosed in an embodiment as a method and mechanism for implementing tracing and trace logs. The disclosed embodiment of the invention is directed to trace logs for distributed and parallel systems. However, the principles presented here are equally applicable to trace log(s) in other system architecture configurations, including single node configurations, and thus the scope of the invention is not to be limited to the exact embodiment shown herein.
An aspect of one embodiment of the present invention is directed to traces comprising markup language syntax. A markup language is a collected set of syntax definitions that describes the structure and format of a document page. A widely used markup language is the Standard Generalized Markup Language (“SGML”). A common application of SGML is the HyperText Markup Language (“HTML”), which is a specific variant of SGML used for the World Wide Web. The Extensible Markup Language (“XML”) is another variant of SGML. For explanatory purposes only, the invention is described using HTML-compliant markup language syntax. However, it is noted that the present invention is not limited to any specific markup language syntax, but is configurable to work with many markup languages.
Analysis of traces is greatly facilitated, pursuant to an embodiment, by using traces implemented with markup language syntax. To illustrate this aspect of the invention, consider a simple communications operation that is performed between two network nodes.
When analyzing trace logs for communications operations that send messages between network nodes, it is common for sets of related traces to appear in multiple trace logs across the network. For example, a “send” operation trace in a first trace log at a first node often has a counterpart “receive” operation trace located in a second trace log at a second node. Thus in the example of
Suppose it is desired to analyze or diagnose this communications operation between Node 1 and Node 2. When a programmer analyzes the set of traces corresponding to that communications operation, it is likely that the programmer must review both the send and receive traces. In this example, the send and receive traces for the communications operation are spread across multiple trace logs on multiple nodes, and the traces of interest may be buried among hundreds or thousands of irrelevant traces that correspond to applications/operations of no immediate interest. Even in this very simple example, analysis of the trace logs could involve a complex and time-consuming task just to identify the traces of interest. That difficult task is compounded by the additional burden of manually jumping between the different trace logs to follow the chain of traces across the multiple network nodes. In the real world, this analysis/diagnosis task could become far more difficult because of messaging operations that involve many more threads across many more network nodes.
To address this problem, one embodiment of the present invention materializes trace messages using a markup language syntax. By implementing trace messages using markup language syntax, navigational intelligence can be embedded into the trace messages using “hyperlinks.” A hyperlink is an element in an electronic document or object that links to another place in the same document/object or to an entirely different document/object. As noted above, when a programmer analyzes the set of traces corresponding to a communications operation, it is likely that the programmer must review both the send and receive traces. For this reason, it is useful to link related communications traces at the senders and receivers of inter-nodal messages. Thus, a send trace is hyperlinked to its counterpart receive trace. The hyperlinks can be defined in both the forward and reverse directions. A chain of linked traces can be established whereby each trace is hyperlinked in sequential order to both its predecessor and successor trace. All traces relating to a common operation or activity can therefore be linked together via a chain of hyperlinks extending from a first trace through all other related traces.
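The chain of forward and reverse hyperlinks described above can be sketched as follows. This is an illustrative Python sketch only, not the disclosed implementation: the function name `link_chain`, the file names `node1.html` and `node2.html`, the anchor identifiers, and the `[prev]`/`[next]` link labels are all assumptions made for the example.

```python
from html import escape


def link_chain(traces):
    """Render related traces with hyperlinks to their neighbours.

    traces: list of (log_file, anchor_id, text) tuples in sequential order.
    Returns one HTML fragment per trace.
    """
    fragments = []
    for i, (log_file, anchor_id, text) in enumerate(traces):
        # The id attribute makes this trace addressable as log_file#anchor_id.
        parts = [f'<span id="{anchor_id}">{escape(text)}</span>']
        if i > 0:  # reverse link to the predecessor trace
            prev_file, prev_id, _ = traces[i - 1]
            parts.append(f'<a href="{prev_file}#{prev_id}">[prev]</a>')
        if i + 1 < len(traces):  # forward link to the successor trace
            next_file, next_id, _ = traces[i + 1]
            parts.append(f'<a href="{next_file}#{next_id}">[next]</a>')
        fragments.append(" ".join(parts))
    return fragments


chain = link_chain([
    ("node1.html", "t17", "send msg 7 to node 2"),
    ("node2.html", "t42", "receive msg 7 from node 1"),
])
```

Here the send trace carries a forward link into the second node's log, and the receive trace carries a reverse link back to the first node's log, so a viewer can follow the operation in either direction.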
Once the trace messages have been materialized into a markup language syntax, any browser or viewer capable of interpreting the chosen markup language may be used to navigate the trace log(s). The traces for any activity of interest can be navigated by identifying one of the activity's traces and traversing the chain of hyperlinks extending from that trace, without requiring any manual searching for related traces. Since both forward and reverse hyperlinks can be embedded into the trace log, the traces for an activity of interest can be traversed in either the forward or the reverse direction.
According to an embodiment of the invention, trace messages from multiple trace logs can be collected into a single trace log, rather than multiple materialized trace logs 310 and 312 as shown in
The present invention also provides a method and mechanism for emphasizing specific traces or portions of traces in a trace log using markup language syntax. According to this aspect of the invention, traces or trace portions of particular interest include markup language elements that provide visual emphasis when viewed in a suitable browser 316. The visual emphasis may encompass any type of visual cue that differentiates one portion of text from another portion of text, such as bolding certain areas of text, using different colors, using different fonts or font sizes, underlining, etc.
This aspect of the invention is useful if it is desired to emphasize traces or portions of traces corresponding to a unique characteristic. For example, consider if it is desired to analyze or diagnose all operations performed against a specific system resource. This would involve identifying all traces that relate to that system resource. A search is performed against the trace logs to identify all traces corresponding to that system resource. During the conversion process, those identified traces would undergo a conversion to include additional markup language elements to visually separate those trace messages from other trace messages. When viewed with browser 316, this “emphasized pattern” would readily highlight all traces corresponding to the system resource of interest. Moreover, hyperlinks can be embedded to permit sequential navigation through all the traces in the emphasized pattern of traces.
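The emphasized pattern described above can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the disclosed converter: the function name `emphasize`, the choice of the HTML `<b>` element as the visual cue, and the sample trace texts are hypothetical.

```python
import re
from html import escape


def emphasize(traces, pattern):
    """Return one HTML line per trace; traces matching `pattern` are bolded."""
    regex = re.compile(pattern)
    lines = []
    for text in traces:
        safe = escape(text)  # keep characters such as < and & from being read as markup
        if regex.search(text):
            lines.append(f"<b>{safe}</b><br>")
        else:
            lines.append(f"{safe}<br>")
    return lines


html_lines = emphasize(
    ["lock acquired on resource A", "heartbeat ok", "lock released on resource A"],
    r"resource A",
)
```

Any other visual cue (color, font, underlining) could be substituted for the bold element without changing the structure of the sketch.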
Thus, the invention includes a search or filter mechanism to search for particular patterns in the trace logs. Conversion instructions are sent from a user interface to identify specific patterns that should be searched and will be either emphasized or filtered out. In an embodiment, the browser 316 includes an interface for a user to input a string or regular expression to be used for the filter/search procedure. The markup language converter 314 uses the results of the filter/search procedure to determine which trace messages require markup language conversion and what types of conversions are necessary to emphasize patterns.
An embodiment of the invention for converting traces into a markup language format utilizes fixed format trace strings. The process for extracting information from a trace in this approach is driven by knowledge of the position and existence of specific data items in a trace string. For example, unique identifiers for events, operations or other types of data objects are embedded at a recognized location(s) in the trace strings. Extracting these unique identifiers permits efficient correlation between related traces.
For example, 1-to-1 communications involving send and receive pairs of traces are identified at this stage. Information is stored to identify these related traces as candidates for embedded hyperlinks when markup language conversions of the traces are generated. It is noted that other types of communications relationships, including 1-to-many and many-to-many relationships, are also identified at this stage. As an example, a broadcast message is a message that is broadcast from a single node to possibly many nodes. This relationship is also identified at step 406 and intermediate data is stored to distinguish these traces as candidates for hyperlinks when markup language versions of the traces are materialized.
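One possible way to identify such relationships is sketched below in Python. The record layout (dictionaries with `node`, `op`, and `msg_id` keys) and the function names are assumptions made for illustration; the sketch simply groups traces by a shared identifier and lists every sender/receiver pairing as a hyperlink candidate, which covers both the 1-to-1 and the broadcast (1-to-many) cases.

```python
from collections import defaultdict


def correlate(records):
    """Group parsed traces by their shared message identifier.

    records: iterable of dicts with hypothetical keys 'node',
    'op' ('send' or 'receive') and 'msg_id'.
    Returns {msg_id: (send_records, receive_records)}.
    """
    groups = defaultdict(lambda: ([], []))
    for rec in records:
        sends, receives = groups[rec["msg_id"]]
        (sends if rec["op"] == "send" else receives).append(rec)
    return dict(groups)


def link_candidates(groups):
    """Return (sender_node, receiver_node) pairs that should be hyperlinked."""
    pairs = []
    for sends, receives in groups.values():
        for s in sends:
            for r in receives:
                pairs.append((s["node"], r["node"]))
    return pairs


records = [
    {"node": 0, "op": "send", "msg_id": "m1"},     # 1-to-1 message
    {"node": 3, "op": "receive", "msg_id": "m1"},
    {"node": 1, "op": "send", "msg_id": "b1"},     # broadcast message
    {"node": 0, "op": "receive", "msg_id": "b1"},
    {"node": 2, "op": "receive", "msg_id": "b1"},
]
pairs = link_candidates(correlate(records))
```

The resulting pairs serve as the intermediate data from which hyperlinks can later be generated.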
A filter or search condition may be established for the traces (408). A user desiring to view a particular emphasized pattern may establish such a filter/search condition. If a filter-out condition has been established (410), then a search is performed and any traces matching the filter condition are filtered from the group of traces to be converted into a markup language format (412). Information extracted from the trace string during the parse procedure is used to determine if the trace string should be filtered. For example, a filter may be set to exclude all traces corresponding to a system resource “A”. If a trace message corresponding to this filter condition is encountered, then the trace message will be discarded from the conversion process and will not be viewed by the browser 316. If the trace message does not correspond to the filter condition, then the conversion process proceeds for that trace message.
If a search condition has been established for a desired emphasized pattern (414), then a search is performed and any traces matching the search condition are identified as candidates for additional markup language elements to include differentiating visual cues for conversion into a markup language format (416). Information extracted from the trace string during the parse procedure is again used to determine if the trace string should be emphasized.
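The filter-out and search steps described above can be sketched together in Python as follows. The function name `select_traces`, its parameters, and the sample trace texts are hypothetical, and regular expressions are used here only as one plausible form of filter/search condition.

```python
import re


def select_traces(traces, filter_out=None, search=None):
    """Apply an optional filter-out pattern, then flag search matches.

    Returns a list of (trace, emphasized) tuples for the surviving traces.
    """
    drop = re.compile(filter_out) if filter_out else None
    find = re.compile(search) if search else None
    selected = []
    for text in traces:
        if drop and drop.search(text):
            continue  # filtered traces are excluded from the conversion
        emphasized = bool(find and find.search(text))
        selected.append((text, emphasized))
    return selected


traces = ["open resource A", "open resource B", "[TXN]res lock B"]
selected = select_traces(traces, filter_out=r"resource A", search=r"\[TXN\]res")
```

Traces flagged with `True` are the candidates that would receive additional markup language elements when the traces are materialized.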
The traces are thereafter materialized in a markup language format (418). In particular, traces that are part of navigable patterns are materialized to include hyperlinks. Traces that are part of emphasized patterns are materialized to include markup language elements to provide additional visual cues. The materialized traces in markup language format can be viewed using any suitable browser compatible with the particular markup language used for the conversion.
The following represents an example of a generic template that can be used for a fixed trace string:
Generic Template: <Header><keyword><arg0><arg1><arg2><arg3> . . . <argn>
In this generic template, <Header> represents the portion of the trace string containing required data items used for the conversion process. <keyword> represents one or more keyword “hints” that provide additional information regarding the format/type of arguments that follow. <arg0> through <argn> represent additional arguments to be generated with the trace string.
As a more specific example of a fixed format trace string, consider the following trace string which is generated for a database operation during a deadlock detection (“DD”) search by a distributed lock manager (“DLM”):
7C839FEF:00000010 5 4 10435 51 DLM-DD start:ddTS[0.1][TXN]res[0x1][0x1],[TX], node 0
In this example trace string, the <header> portion includes the following information:
7C839FEF:00000010 5 4 10435 51
This is a fixed format header record representing the following items of information:
<timestamp><sid><pid><event><opcode>
where,
In the example trace string, “DLM-DD” represents a <keyword> that provides a “hint” regarding the type of operation performed and the type/format of the arguments that follow.
<arg0> is represented by the string “start:”. The “start:” value identifies a particular operation or stage of an operation that is performed. Other examples of types of information that may be included in <arg0> for a deadlock detection operation are: “send:—”, “receive:—”, “found:—”, “confirm:—”, “drop (victim done):—”
<arg1> is represented by the “ddTS[x.y]” string where x and y are integers which maintain the deadlock search count. The “ddTS[0.1]” value in the example string is a token number that allows related traces (e.g., send-receive pairs) to be identified across multiple nodes.
<arg2> is represented by the “[TXN]res|[PROC]res” string, which specifies whether the resource is a process (PROC) owned resource or a transaction (TXN) owned resource.
<arg3> has the value “node 0” to indicate a particular node related to the trace message.
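Because the trace string has a fixed format, its fields can be extracted by position. The following Python sketch parses the example trace string into the header fields, keyword, and arguments described above. It is modelled only on this single example: the function name `parse_trace`, the regular expressions, and the decision to keep header fields as strings are assumptions, and the hexadecimal values are written as `0x1`.

```python
import re

TRACE = ("7C839FEF:00000010 5 4 10435 51 DLM-DD "
         "start:ddTS[0.1][TXN]res[0x1][0x1],[TX], node 0")

# arg0 (up to the colon), then ddTS[x.y], then the owner type of the resource.
ARGS = re.compile(
    r"(?P<arg0>[^:]+:)ddTS\[(?P<x>\d+)\.(?P<y>\d+)\]\[(?P<owner>TXN|PROC)\]res"
)


def parse_trace(line):
    """Split a fixed-format trace string into named fields."""
    # Five header fields, then the keyword, then everything else as arguments.
    timestamp, sid, pid, event, opcode, keyword, rest = line.split(None, 6)
    fields = {
        "timestamp": timestamp, "sid": sid, "pid": pid,
        "event": event, "opcode": opcode, "keyword": keyword,
    }
    match = ARGS.match(rest)
    if match:
        fields["arg0"] = match.group("arg0")
        fields["ddTS"] = (int(match.group("x")), int(match.group("y")))
        fields["owner"] = match.group("owner")
    node = re.search(r"node (\d+)\s*$", rest)
    if node:
        fields["node"] = int(node.group(1))
    return fields


fields = parse_trace(TRACE)
```

The extracted `ddTS` token and `opcode` are the kinds of identifiers that can then be used to correlate related traces across nodes.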
The following represents a series of example trace messages generated for a deadlock detection operation by a distributed lock manager:
Traces from Node 0
In the example traces above, the send operation (trace 2 from node 0) and receive operation (trace 1 from node 3) pair form a direct linking pattern where opcode “52” is used to create the link identification. Thus, conversion into a markup language format would result in a hyperlink between these two traces. The following is an example of the conversion of the send operation trace in an HTML-based markup language format:
The following is another example of converting the receive operation trace into an HTML-based markup language format:
In the examples above, traces for a particular deadlock detection operation can also be found with keyword DLM-DD as the primary key and arg1 (ddTS[x.y]) as the secondary key.
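The keyword-plus-token lookup described above can be sketched as a grouping step. In the Python sketch below, only the first sample line is the example trace string given earlier (with hexadecimal values written as `0x1`); the other three lines are made up for illustration, as are the function name `group_by_search` and the regular expression.

```python
import re
from collections import defaultdict

DDTS = re.compile(r"ddTS\[\d+\.\d+\]")


def group_by_search(lines, keyword="DLM-DD"):
    """Group trace strings by (keyword, ddTS token): primary and secondary key."""
    groups = defaultdict(list)
    for line in lines:
        if keyword not in line.split():
            continue  # primary key: the keyword must appear as its own field
        token = DDTS.search(line)
        if token:
            groups[(keyword, token.group(0))].append(line)  # secondary key
    return dict(groups)


sample = [
    "7C839FEF:00000010 5 4 10435 51 DLM-DD start:ddTS[0.1][TXN]res[0x1][0x1],[TX], node 0",
    "7C839FF0:00000011 5 4 10435 52 DLM-DD send:ddTS[0.1][TXN]res[0x1][0x1],[TX], node 3",  # made up
    "7C839FF1:00000012 5 4 10435 51 DLM-DD start:ddTS[0.2][PROC]res[0x2][0x1],[TX], node 0",  # made up
    "7C839FF2:00000013 5 4 10436 60 OTHER-KW noop",  # made up, unrelated keyword
]
searches = group_by_search(sample)
```

Each resulting group holds the traces of one deadlock detection search, which makes them natural candidates for a chain of hyperlinks.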
In the above example, one possible emphasized pattern is the set of all traces for transaction-based resources, identified by a search for the string [TXN]res. If it is desired to emphasize this pattern, the following is an example of a converted markup language format for these traces:
The following is another example of a converted trace having this emphasized pattern:
Referring to
In an embodiment, the host computer 522 operates in conjunction with a data storage system 531, wherein the data storage system 531 contains a database 532 that is readily accessible by the host computer 522. Note that a multiple tier architecture can be employed to connect user stations 524 to a database 532, utilizing for example, a middle application tier (not shown). In alternative embodiments, the database 532 may be resident on the host computer, stored, e.g., in the host computer's ROM, PROM, EPROM, or any other memory chip, and/or its hard disk. In yet alternative embodiments, the database 532 may be read by the host computer 522 from one or more floppy disks, flexible disks, magnetic tapes, any other magnetic medium, CD-ROMs, any other optical medium, punchcards, papertape, or any other physical medium with patterns of holes, or any other medium from which a computer can read. In an alternative embodiment, the host computer 522 can access two or more databases 532, stored in a variety of mediums, as previously discussed.
Referring to
A processing unit may be coupled via the bus 606 to a display device 611, such as, but not limited to, a cathode ray tube (CRT), for displaying information to a user. An input device 612, including alphanumeric and other keys, is coupled to the bus 606 for communicating information and command selections to the processor(s) 607. Another type of user input device may include a cursor control 613, such as, but not limited to, a mouse, a trackball, a fingerpad, or cursor direction keys, for communicating direction information and command selections to the processor(s) 607 and for controlling cursor movement on the display 611.
According to one embodiment of the invention, the individual processing units perform specific operations by their respective processor(s) 607 executing one or more sequences of one or more instructions contained in the main memory 608. Such instructions may be read into the main memory 608 from another computer-usable medium, such as the ROM 609 or the storage device 610. Execution of the sequences of instructions contained in the main memory 608 causes the processor(s) 607 to perform the processes described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and/or software.
The term “computer-usable medium,” as used herein, refers to any medium that provides information or is usable by the processor(s) 607. Such a medium may take many forms, including, but not limited to, non-volatile and volatile media. Non-volatile media, i.e., media that can retain information in the absence of power, includes the ROM 609. Volatile media, i.e., media that cannot retain information in the absence of power, includes the main memory 608. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise the bus 606. Transmission media can also take the form of carrier waves; i.e., electromagnetic waves that can be modulated, as in frequency, amplitude or phase, to transmit information signals. Additionally, transmission media can take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.
Common forms of computer-usable media include, for example: a floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, RAM, ROM, PROM (i.e., programmable read only memory), EPROM (i.e., erasable programmable read only memory), including FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a processor 607 can retrieve information. Various forms of computer-usable media may be involved in providing one or more sequences of one or more instructions to the processor(s) 607 for execution. The instructions received by the main memory 608 may optionally be stored on the storage device 610, either before or after their execution by the processor(s) 607.
Each processing unit may also include a communication interface 614 coupled to the bus 606. The communication interface 614 provides two-way communication between the respective user stations 524 and the host computer 522. The communication interface 614 of a respective processing unit transmits and receives electrical, electromagnetic or optical signals that include data streams representing various types of information, including instructions, messages and data. A communication link 615 links a respective user station 524 and a host computer 522. The communication link 615 may be a LAN 526, in which case the communication interface 614 may be a LAN card. Alternatively, the communication link 615 may be a PSTN 528, in which case the communication interface 614 may be an integrated services digital network (ISDN) card or a modem. Also, as a further alternative, the communication link 615 may be a wireless network 530. A processing unit may transmit and receive messages, data, and instructions, including program, i.e., application, code, through its respective communication link 615 and communication interface 614. Received program code may be executed by the respective processor(s) 607 as it is received, and/or stored in the storage device 610, or other associated non-volatile media, for later execution.
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. For example, the reader is to understand that the specific ordering and combination of process actions shown in the process flow diagrams described herein is merely illustrative, and the invention can be performed using different or additional process actions, or a different combination or ordering of process actions. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense.