This invention relates to the field of computer applications, and in particular to distributed applications that include communications among processors on a network.
With advances in networking technology, distributed applications continue to grow in popularity, and in complexity. In a typical distributed application, a client device may initiate the application, and the application may execute a request for data services at a remote server, and this remote server may in turn request data or other processing from other remote servers. The executed processes at the servers may be specific components of the application residing at the servers, or they may be components provided by the servers and accessed by the application.
Users of an application are generally sensitive to performance and reliability issues associated with the application, and in a competitive market, will generally avoid slow or unreliable applications. Application developers are also sensitive to these issues, to assure that their developed product remains competitive. In like manner, service providers are also sensitive to these issues, to assure that their provided service is not the cause of performance and reliability problems that may affect their customers.
Tools are available for assessing network traffic performance, as are tools for assessing processing performance. The ACE system from OPNET Technologies, Inc., of Bethesda, MD, for example, captures data transmissions associated with an application across a network, and presents the information as a data exchange chart, or as a Gantt chart, that illustrates the time spent communicating the application messages between nodes on the network, as well as the time spent at each node. The OPENVIEW GLANCEPLUS system from Hewlett-Packard, on the other hand, captures processing system performance, including such parameters as CPU processing time, disk transfer rates, cache page faults, and so on.
As applications and networks increase in complexity, the distinction between the traditionally separate tasks of network analysis and processing analysis is becoming less clear. As processing performance becomes more and more dependent upon the amount and type of traffic arriving at a processing server, or the capacity of the network to accept traffic from the server, the processing system manager needs to address how network activity affects the system's performance. And, as network performance becomes more and more dependent upon the amount and type of delays occurring at the processing server, or the capacity of the server to accept traffic from the network, the network system manager needs to address how the processing system's performance affects the network's performance.
It would be advantageous to provide an integrated view of network traffic and processor system performance. By providing a synchronized display of traffic events and process events at one or more nodes of a network, the analysis of network and/or process performance can be performed in the context of an interrelated and interdependent set of traffic and process events.
This advantage and others are achieved by a method and system that include a first capture system that captures communication events, and a second capture system that captures processing events related to the application. A visualization system analyzes the data captured by each of the capture systems, synchronizes the data to a common time base, and presents an integrated display of these communication and processing events.
The invention is explained in further detail, and by way of example, with reference to the accompanying drawings wherein:
Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions. The drawings are included for illustrative purposes and are not intended to limit the scope of the invention.
In the following description, for purposes of explanation rather than limitation, specific details are set forth such as the particular architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the concepts of the invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments, which depart from these specific details. In like manner, the text of this description is directed to the example embodiments as illustrated in the Figures, and is not intended to limit the claimed invention beyond the limits expressly included in the claims. For purposes of simplicity and clarity, detailed descriptions of well-known devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
The invention is presented in the context of assessing network and process performance at a server that is accessed by a client and performs tasks on behalf of the client. In the context of this invention, a network event is defined as a communication of a message from a source node to a destination node, and network performance is defined as a metric related to communications between pairs of source and destination nodes. Such a metric includes, for example, the volume of traffic communicated between the pair of nodes during a given period of time, the delay associated with the communication of messages between the pair, and so on. A processing event, on the other hand, is any event at the processing system that sends or receives the messages, independent of the particular pair of nodes associated with each message. Process performance is defined as a metric associated with the processing events, independent of the particular application that sends or receives the messages, and includes, for example, the percentage of time that the processing CPU is being used, the amount of data transferred to and from storage, the efficiency of cache processing, and so on. One of ordinary skill in the art will recognize that the principles of this invention are applicable to the analysis of performance at any device or system that processes and communicates data over time.
In accordance with this invention, an analysis system 150 is configured to receive information from a process monitor 122 and traffic monitor 124 that are coupled to, or integral to, the server 120. These monitors 122, 124 are configured to capture performance metrics associated with the processing of data at the server and the communication of data to and from the server, respectively.
In the above referenced parent application to this application, “CAPTURE, ANALYSIS, AND VISUALIZATION OF CONCURRENT SYSTEM AND NETWORK BEHAVIOR OF AN APPLICATION”, Ser. No. 11/505,176, filed 16 Aug. 2006 for Baron et al., techniques are disclosed for tracing traffic and process events related to a target application by instrumenting the application processes on the processing server to effect the capture of relevant process events. This invention is premised on the observation that network and process performance often needs to be analyzed and assessed independent of a particular application; or, if an application is being assessed, the analysis often needs to also address how the application is being affected by the overall network and process performance in the context of multiple concurrent applications.
The process monitor 122 preferably includes one or more conventional process monitoring tools, such as the aforementioned OPENVIEW GLANCEPLUS tool from HP, or others, such as PANORAMA from OPNET and the various PERFMON (Performance Monitor) embodiments available from Microsoft, HP, Sun Systems, and so on. The process monitor captures information that is specific to the server 120, such as CPU and disk utilization percentages, average disk queue length, cache hit ratio, numbers of blocked and unblocked users, and a variety of measures per unit time, such as numbers of system calls or interrupts per second, logical and physical disk reads and writes per second, page faults and cache faults per second, and so on. The process monitor 122 also captures information related to the network interface to the server, such as bytes or packets sent or received per second, output queue length, and so on.
The traffic monitor 124, on the other hand, captures information specific to the messages communicated over the network, including an identification of source and destination nodes for each message, the size of each message, the message data, and so on. As noted above, tools such as OPNET's ACE system are particularly well suited for controlling and managing the capture of network performance metrics.
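By way of illustration only, the two kinds of captured data described above might be represented as in the following minimal Python sketch. The record types TrafficEvent and ProcessSample, their field names, and the helper bytes_between are hypothetical and are not taken from any of the tools named herein; the helper computes one of the network performance metrics mentioned above, the volume of traffic between a pair of nodes during a given period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficEvent:
    """One captured message (hypothetical record layout)."""
    time: float        # capture time in the traffic monitor's time base, seconds
    source: str        # source node identifier
    destination: str   # destination node identifier
    size_bytes: int    # message size


@dataclass(frozen=True)
class ProcessSample:
    """One sampled server metric (hypothetical record layout)."""
    time: float        # sample time in the process monitor's time base, seconds
    metric: str        # for example "cpu_utilization"
    value: float


def bytes_between(events, source, destination, start, end):
    """Sum the sizes of messages sent from source to destination in [start, end)."""
    return sum(
        e.size_bytes
        for e in events
        if e.source == source
        and e.destination == destination
        and start <= e.time < end
    )
```

Note that a traffic record is keyed by a source and destination pair, whereas a process sample carries no node pair at all, which mirrors the distinction drawn above between network events and processing events.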
In a preferred embodiment of this invention, the analysis system 150 is configured to facilitate control of each of the process monitor 122 and traffic monitor 124. This control may be automated or manual, or a combination of both. In the automated mode, events at one of the monitors 122, 124 trigger the start of data collection at the other monitor. For example, detection of a message at the traffic monitor 124 to or from a particular source or destination node may trigger the start of data collection at the process monitor 122; or, a significant change to one of the monitored processing parameters at the process monitor 122 may trigger the start of data collection at the traffic monitor 124. Such triggering is particularly advantageous when Quality of Service (QoS) guarantees are in place in the network. Any violation of predetermined thresholds triggers the start of data collection at both monitors 122, 124 so that appropriate analysis can be performed. Alternatively, or additionally, the analysis system may initiate the data collection on a periodic basis, or based on events at other nodes in the network, and so on. In the manual mode, the user may be provided the option of viewing the activity at one of the monitors, and initiating data collection at the other on demand. Optionally, the monitors 122 and 124 may be configured to continuously monitor the process and traffic events, and the analysis system is configured to control storing of the data for subsequent viewing and analysis, automatically or manually.
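The automated triggering described above can be sketched as follows, purely as an illustrative example. The Monitor class is a hypothetical stand-in for the control interface of a process or traffic monitor, and the function names, the watched-node set, and the threshold table are all assumptions made for this sketch rather than features of any particular monitoring tool.

```python
class Monitor:
    """Hypothetical stand-in for a controllable process or traffic monitor."""

    def __init__(self, name):
        self.name = name
        self.collecting = False

    def start_collection(self):
        self.collecting = True


def on_traffic_event(source, destination, watched_nodes, process_monitor):
    """Start process data collection when a message involves a watched node."""
    if source in watched_nodes or destination in watched_nodes:
        process_monitor.start_collection()
        return True
    return False


def on_process_sample(metric, value, thresholds, monitors):
    """Start collection at every given monitor when a metric exceeds its threshold."""
    limit = thresholds.get(metric)
    if limit is not None and value > limit:
        for monitor in monitors:
            monitor.start_collection()
        return True
    return False
```

In this sketch, a threshold violation starts collection at every monitor passed in, which corresponds to the case in which both monitors 122, 124 begin collecting so that the violation can be analyzed.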
The inclination of the arrows in
If, as is typical, the network time NT does not correspond to the processor time PT, some correspondence between the clocks must be established before the data can be properly displayed. In a straightforward embodiment, the user is provided the option of manually inputting an appropriate offset that establishes when the process data starts relative to the traffic data. Alternatively, the user may input a known difference between the network time NT and the process time PT. Each of these options is preferably provided via a graphical user interface. In an interactive embodiment, the user is provided the option of ‘grabbing’ one or more of the graphs or charts of
Typically, each of the manual synchronization options described above requires that the user have a reasonable understanding of the collected data and/or the process and network monitors 122, 124. In more automated embodiments, the network and process data can be synchronized to a common time base with little or no user interaction.
Any of a variety of techniques may be used to facilitate the automated determination of a common time base. For example, prior to capturing data, the analysis system 150 could instruct the process monitor 122 and traffic monitor 124 to synchronize their clocks. This can be accomplished by synchronizing one with the other, or by synchronizing each with a global time base, such as those regulated by the Global Positioning System (GPS) or the Network Time Protocol (NTP), or with an arbitrary time base. If the clocks cannot be synchronized prior to capturing data, each of the process monitor 122 and traffic monitor 124 could supply, with its corresponding capture data, an offset that identifies the difference between its clock and that of a global clock. Alternatively, if both monitors 122, 124 provide an option to read their current time, the system 150 need merely request the current time from each and adjust the reported times accordingly. As noted above, however, the network time NT used for data exchange charts is often a normalized time based on a post-process analysis of the reported times, and the time at the monitor 124 would not necessarily correspond to this determined network time NT. If the process that determines the normalized network time NT can be configured to set the time base to a select one of the tiers and adjust the other tier time bases accordingly, and if it is known that the traffic monitor 124 at the server 120 uses the server's CPU time, the system 150 need only direct the process to use the time base of the traffic monitor 124 as the normalized network time NT.
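As a minimal sketch of the option in which the current time is requested from each monitor, the following Python functions derive an offset from two clock readings and apply it to the reported process times. The function names are hypothetical, and the sketch assumes that the two readings are taken at effectively the same instant, so that any delay in obtaining the readings is ignored.

```python
def clock_offset(process_now, traffic_now):
    """Offset, in seconds, to add to process times to express them in traffic time.

    Assumes both readings were taken at effectively the same instant.
    """
    return traffic_now - process_now


def to_common_time_base(process_times, offset):
    """Shift process timestamps onto the traffic monitor's time base."""
    return [t + offset for t in process_times]
```

For example, if the process monitor reports a current time of 100.0 seconds when the traffic monitor reports 103.5 seconds, each process timestamp is advanced by 3.5 seconds before display. The same adjustment applies when the user manually enters a known difference between the two clocks.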
If the above manual or automated determinations of a common time base between the process time PT and network time NT cannot be used, other synchronizing techniques, such as pattern matching, may be used. That is, patterns in the graphs of the process events of
In a preferred embodiment, a combination of user interaction and automated synchronization is used. For example, the time base adjustments can be based on prior time shift determinations. That is, if at some prior time it had been determined that the process time PT and network time NT at a particular server differed by a given amount, this amount may be used as an initial offset for providing a common time base, subject to subsequent user adjustment. These and other techniques for determining or approximating a correspondence between the process and network time bases will be evident to one of skill in the art in view of this disclosure.
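One possible form of the pattern matching mentioned above, seeded with a previously determined shift, is sketched below. This disclosure does not prescribe a particular matching technique; the sketch substitutes a simple mean-centered cross-correlation search, and it assumes that the process data and the traffic data have each been reduced to a series of values sampled at the same fixed interval. The function name best_lag and its parameters are hypothetical.

```python
def best_lag(process_series, traffic_series, initial_lag=0, search=5):
    """Return the lag, in samples, at which the two series match best.

    A lag of k compares process_series[i] with traffic_series[i + k].
    Only lags within `search` samples of `initial_lag` are examined.
    """
    p_mean = sum(process_series) / len(process_series)
    t_mean = sum(traffic_series) / len(traffic_series)

    def score(lag):
        # Mean product of the mean-centered values over the overlapping samples.
        total = 0.0
        count = 0
        for i, p in enumerate(process_series):
            j = i + lag
            if 0 <= j < len(traffic_series):
                total += (p - p_mean) * (traffic_series[j] - t_mean)
                count += 1
        return total / count if count else float("-inf")

    candidates = range(initial_lag - search, initial_lag + search + 1)
    return max(candidates, key=score)
```

Multiplying the returned lag by the sampling interval gives a candidate time offset, which can then be presented to the user for adjustment. Restricting the search to the neighborhood of a prior determination keeps the search short and reduces the chance of locking onto an unrelated pattern.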
In a typical embodiment of a processor monitor, such as HP GLANCEPLUS, all of the processing-related data is maintained in a single file for a given processing system, or in a single database. The dialog box of
In a preferred embodiment, the individual time-based metrics can be overlaid upon each other, using different colors for different metrics. The user is provided the option 342 of having the system overlay all of the selected metrics on a single graph, or having the system overlay sets of similar metrics on each of a plurality of graphs. That is, for example, all of the metrics related to the CPU may be placed on one graph, all of the metrics related to disk-transfers can be placed on another, and so on. Preferably, default similarity sets are provided, and the user is provided the option of defining other sets, selecting the colors to be used within the sets, and so on. Alternatively, each selected metric could be placed on a separate graph.
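The three layout options described above can be illustrated by the following sketch, which returns, for a list of selected metrics, the list of graphs to draw and the metrics overlaid on each. The metric names, the contents of DEFAULT_SETS, and the mode names "single", "grouped", and "separate" are hypothetical and are chosen only for this example.

```python
# Hypothetical default similarity sets; a user could supply other sets.
DEFAULT_SETS = {
    "CPU": ["cpu_utilization", "system_calls_per_sec", "interrupts_per_sec"],
    "Disk": ["disk_utilization", "disk_queue_length", "disk_reads_per_sec"],
}


def plan_graphs(selected, mode, sets=DEFAULT_SETS):
    """Return a list of graphs, each a list of the metrics overlaid on it."""
    if mode == "single":
        # All selected metrics overlaid on one graph.
        return [list(selected)]
    if mode == "separate":
        # One graph per selected metric.
        return [[m] for m in selected]
    if mode == "grouped":
        # One graph per similarity set; ungrouped metrics get their own graph.
        graphs = []
        remaining = list(selected)
        for members in sets.values():
            group = [m for m in remaining if m in members]
            if group:
                graphs.append(group)
                remaining = [m for m in remaining if m not in group]
        graphs.extend([m] for m in remaining)
        return graphs
    raise ValueError(f"unknown mode: {mode}")
```

In the grouped mode of this sketch, a selected metric that belongs to no similarity set is placed on a graph of its own, which is one reasonable way to handle metrics outside the default sets.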
The process performance metrics are preferably displayed as timing diagrams, while the network traffic data is preferably displayed using data exchange charts; as noted above, traffic performance metrics may also be displayed as timing diagrams in display region 360, with the selected process performance metrics. In the example of
Other interface screens are also provided, including an interface to control the process and traffic monitoring tools, an interface to control the timing offset between the process and traffic time bases, and so on.
The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are thus within its spirit and scope. For example, although the process monitor 122 and traffic monitor 124 are conventionally different devices or modules, one of skill in the art will recognize in view of the principles presented herein, that a single monitor that incorporates the functions of monitors 122 and 124 can be provided. In like manner, an integrated analysis system could be provided that includes the analysis component 150 and one or both of the monitors 122, 124. These and other system configuration and optimization features will be evident to one of ordinary skill in the art in view of this disclosure, and are included within the scope of the following claims.
In interpreting these claims, it should be understood that:
This application is a continuation-in-part of U.S. patent application Ser. No. 11/505,176, filed 16 Aug. 2006, the contents of which are incorporated by reference herein, and claims the benefit of U.S. Provisional Patent Applications 60/709,762, filed 19 Aug. 2005, and 60/750,665, filed 15 Dec. 2005.
Number | Date | Country
---|---|---
60709762 | Aug 2005 | US
60750665 | Dec 2005 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 11505176 | Aug 2006 | US
Child | 11639864 | Dec 2006 | US