Application performance management relates to technologies and systems for monitoring and managing the performance of applications. For example, application performance management is commonly used to monitor and manage transactions that an application running on a server performs for a client.
Today, many applications can be accessed over a network, such as the Internet or intranet. For example, due to the ubiquity of web browsers on most client devices, web applications have become particularly popular. Web applications typically employ a browser-supported infrastructure, such as Java or a .NET framework. However, the performance of these types of applications is difficult to monitor and manage because of the complexity of the software and hardware and numerous components that may be involved.
A transaction typically comprises a sequence of method calls in a program that represent a complete set of operations necessary to perform a self-contained unit of work, such as a web request or a database query. Transactions can be traced to monitor and manage their performance. For example, a trace can be performed in an application server to obtain detailed information about the execution of an application within that server.
Unfortunately, the tracing of a transaction through a typical network system is difficult. For example, even when several network-connected interoperating components of a multi-tier application are all instrumented, the known application monitoring systems and methods are unable to correlate transaction call sequences from those components that are causally related. These call sequences are difficult to correlate because the components that produce them run independently on the client and the server. In addition, the common use of network address translation makes tracing these communications difficult.
In some known systems, for hypertext transport protocol (HTTP) based requests, it is possible to insert a unique custom header on the client side into an outgoing HTTP request message, and to intercept this custom header on the server side. If each side of the transaction (i.e., the client and server) is tracing calls, the custom header associated with the HTTP request can be recorded to the trace files on each side of each request and response. The associated calls can later be correlated based on this custom information.
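By way of non-limiting illustration, the custom-header approach described above may be sketched in Python as follows. The header name `X-Trace-Correlation-Id`, the function names, and the use of plain dictionaries and lists to stand in for HTTP headers and trace files are all assumptions made for this sketch and are not taken from any particular known system.

```python
import uuid

# Hypothetical header name chosen for this sketch only.
TRACE_HEADER = "X-Trace-Correlation-Id"


def tag_outgoing(headers, client_trace):
    """Client side: add a unique ID to the outgoing headers and record it."""
    correlation_id = uuid.uuid4().hex
    tagged = dict(headers)          # leave the caller's headers untouched
    tagged[TRACE_HEADER] = correlation_id
    client_trace.append(correlation_id)
    return tagged


def record_incoming(headers, server_trace):
    """Server side: read the ID, if present, and record it in the server trace."""
    correlation_id = headers.get(TRACE_HEADER)
    if correlation_id is not None:
        server_trace.append(correlation_id)
    return correlation_id


def correlate(client_trace, server_trace):
    """Return the IDs that were recorded on both sides, in client order."""
    seen = set(server_trace)
    return [cid for cid in client_trace if cid in seen]
```

A request that arrives without the header is simply not recorded on the server side, which mirrors the limitation that this technique depends on both sides participating.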
Unfortunately, for cross-tier communications that are not encapsulated as HTTP requests and responses, it is generally not possible to insert such additional context into the messages.
Furthermore, in a traditional transaction trace for web applications, Java or .NET instrumentation components are running (on the application server, the client, etc.) and write records of all of the method calls of a transaction to a transaction trace file. Such tracing must be predominantly initiated manually or triggered by a program condition, and for only a limited period of time. It is necessary to limit trace duration and detail in the conventional systems because the act of tracing is relatively expensive and could negatively affect system performance and disk space of the server, the client, etc.
Accordingly, this also means that in many circumstances the execution of an application within a system cannot be diagnosed or monitored regardless of whether the communications are HTTP based or not.
The invention is explained in further detail, and by way of example, with reference to the accompanying drawings wherein:
Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions. The drawings are included for illustrative purposes and are not intended to limit the scope of the invention.
The embodiments relate to monitoring and managing applications, such as web applications running via the hardware and software in a network infrastructure. In particular, the embodiments provide a framework for tracing as many transactions as possible in real-time and correlating the traces across multiple tiers and components. In one embodiment, whenever possible, the application performance management systems and methods will attempt to trace every call in every transaction, while maintaining a low overhead and minimizing the impact on system performance. In one embodiment, a throughput manager manages the tradeoff between performance and completeness of detail harvested by the tracing.
In addition, one embodiment provides methods and systems for correlating the trace information across multiple tiers and components supporting an application. For example, in one embodiment, the system gathers the various transaction trace files. Various network-connected interoperating components of a multi-tier application may all be instrumented and producing transaction traces. The system correlates transaction call sequences from those components that are causally related even if the transaction spans across multiple tiers of the application. In particular, the system can associate a client socket with the corresponding server socket in pairs of trace files and associate the data transmitted with a specific socket send call with data received by a corresponding socket receive call.
For example, a sequence of transaction method calls on a client process may lead to a socket-based data transmission of a request message from the client to a server. The server, having received the data transmission, may analyze the data and make a subsequent sequence of method calls to fulfill the request. Then, the server may in turn send its response data transmission to the waiting client, which continues its activity. Because the send method of the client which transmits the request message is causally related to the receive method which receives the request, the system can associate the call sequences on both the client and server based on data transmissions that the sequences have in common.
In the following description, for purposes of explanation rather than limitation, specific details are set forth such as the particular architecture, interfaces, techniques, etc., in order to provide an understanding of the concepts of the invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments, which depart from these specific details.
Certain embodiments of the inventions will now be described. These embodiments are presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions. For example, for purposes of simplicity and clarity, detailed descriptions of well-known components, such as circuits, are omitted so as not to obscure the description of the present invention with unnecessary detail. To illustrate some of the embodiments, reference will now be made to the figures.
Clients 102 refer to any device requesting and accessing services of applications provided by system 100. Clients 102 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, application software, etc. For example, clients 102 may be implemented on a personal computer, a laptop computer, a tablet computer, a smart phone, and the like. Such devices are known to those skilled in the art and may be employed in one embodiment.
The clients 102 may access various applications based on client software running or installed on the clients 102. The clients 102 may execute a thick client, a thin client, or a hybrid client. For example, the clients 102 may access applications via a thin client, such as a browser application like Internet Explorer, Firefox, etc. Programming for these thin clients may include, for example, JavaScript/AJAX, JSP, ASP, PHP, Flash, Silverlight, and others. Such browsers and programming code are known to those skilled in the art.
Alternatively, the clients 102 may execute a thick client, such as a stand-alone application, installed on the clients 102. Programming for thick clients may be based on the .NET framework, Java, Visual Studio, etc.
Web server 104 provides content for the applications of system 100 over a network, such as network 124. Web server 104 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, etc. to deliver application content. For example, web server 104 may deliver content via HTML pages and employ various IP protocols, such as HTTP.
Application servers 106 provide a hardware and software environment on which the applications of system 100 may execute. In one embodiment, application servers 106 may be implemented as a Java Application Server, a Windows Server implementing a .NET framework, LINUX, UNIX, WebSphere, etc. running on known hardware platforms. Application servers 106 may be implemented on the same hardware platform as the web server 104, or as shown in
In one embodiment, application servers 106 may provide various applications, such as mail, word processors, spreadsheets, point-of-sale, multimedia, etc. Application servers 106 may perform various transactions related to requests by the clients 102. In addition, application servers 106 may interface with the database server 108 and database 110 on behalf of clients 102, implement business logic for the applications, and perform other functions known to those skilled in the art.
Database server 108 provides database services and access to database 110 for transactions and queries requested by clients 102. Database server 108 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, etc. For example, database server 108 may be implemented based on Oracle, DB2, Ingres, SQL Server, MySQL, etc. software running on the server 108.
Database 110 represents the storage infrastructure for data and information requested by clients 102. Database 110 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, etc. For example, database 110 may be implemented as a relational database based on known database management systems, such as SQL, MySQL, etc. Database 110 may also comprise other types of databases, such as, object oriented databases, XML databases, and so forth.
Application performance management system 112 represents the hardware and software used for monitoring and managing the applications provided by system 100. As shown, application performance management system 112 may comprise a collector 114, a monitoring server 116, a monitoring database 118, a monitoring client 120, and agents 122. These components will now be further described.
Collector 114 collects application performance information from the components of system 100. For example, collector 114 may receive information from clients 102, web server 104, application servers 106, database server 108, and network 124. The application performance information may comprise a variety of information, such as trace files, system logs, etc. Collector 114 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, etc. For example, collector 114 may be implemented as software running on a general-purpose server. Alternatively, collector 114 may be implemented as an appliance or virtual machine running on a server.
Monitoring server 116 hosts the application performance management system. Monitoring server 116 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, etc. Monitoring server 116 may be implemented as software running on a general-purpose server. Alternatively, monitoring server 116 may be implemented as an appliance or virtual machine running on a server.
Monitoring database 118 provides a storage infrastructure for storing the application performance information processed by the monitoring server 116. Monitoring database 118 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, etc.
Monitoring client 120 serves as an interface for accessing monitoring server 116. For example, monitoring client 120 may be implemented as a personal computer running an application or web browser accessing the monitoring server 116.
Agents 122 serve as instrumentation for the application performance management system. As shown, the agents 122 may be distributed and running on the various components of system 100. Agents 122 may be implemented as software running on the components or may be a hardware device coupled to the component. For example, agents 122 may implement monitoring instrumentation for Java and .NET framework applications. In one embodiment, the agents 122 implement, among other things, tracing of method calls for various transactions. In particular, in one embodiment, agents 122 may interface with known tracing configurations provided by Java and the .NET framework to enable tracing and to modulate the level of detail of the tracing.
In one embodiment, the agents 122 may implement or include a throughput manager to allow for continuous tracing of the node or entity being monitored, such as clients 102 or application server 106. As noted, conventional tracing on a server, such as application server 106, must be initiated manually or triggered by a program condition and for only a limited period of time. Conventionally, it is considered necessary to limit trace duration and detail because the act of tracing is relatively expensive and could negatively impact performance and disk space of the application server 106.
In contrast, the embodiments permit continuous, rather than requiring intermittent, tracing of an entity. The continuous tracing may be performed for various durations. In addition, in the embodiments, the continuous tracing may be temporarily suspended. However, in one embodiment, the throughput manager in agents 122 may continue to run and re-initiate tracing when system performance allows. For example, in one embodiment, the agents 122 automatically modulate the level of detail written to meet a set of throughput goals set by the user. In one embodiment, the user, for example via monitoring client 120, may set a target data rate, such as in kilobytes per second, and a maximum amount of disk space to be used by agents 122.
Based on the measured data rate, the agents 122 may then adjust the level of transaction method call detail written to a transaction trace file to ensure these targets are met. If the current data rate is low enough, the agents 122 allow every detail of each method call to be written, including information tags known as properties. A property is a pair of strings comprising a name and a value. The name of a property derives from a set of strings that identify characteristics, such as method arguments, environment settings at the time of a call, etc., to be associated with each specific method call of a transaction. For example, properties such as SQL statements, database URLs, HTTP methods, etc. may be traced in the embodiments. If, however, the data rate of trace data written by agents 122 becomes excessive, the agents 122 will omit some property details, or even some method call events themselves, from the transaction trace file.
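By way of non-limiting illustration, one possible way for a throughput manager to modulate the level of detail may be sketched in Python as follows. The four detail levels, the threshold of twice the target rate, the one-in-ten sampling ratio, and all names are assumptions made for this sketch; the description above does not specify them.

```python
from enum import Enum


class Detail(Enum):
    FULL = "full"                    # method calls with all properties
    NO_PROPERTIES = "no_properties"  # method calls only
    SAMPLED_CALLS = "sampled_calls"  # only some method calls, no properties
    SUSPENDED = "suspended"          # tracing temporarily suspended


def choose_detail(measured_kbps, target_kbps, disk_used_kb, disk_limit_kb):
    """Pick a detail level from the measured rate and the user-set targets."""
    if disk_used_kb >= disk_limit_kb:
        return Detail.SUSPENDED
    if measured_kbps <= target_kbps:
        return Detail.FULL
    if measured_kbps <= 2 * target_kbps:   # assumed threshold
        return Detail.NO_PROPERTIES
    return Detail.SAMPLED_CALLS


def filter_event(event, detail, sequence_number, keep_every=10):
    """Return the record to write to the trace file, or None to omit it."""
    if detail is Detail.SUSPENDED:
        return None
    if detail is Detail.SAMPLED_CALLS and sequence_number % keep_every != 0:
        return None
    if detail is Detail.FULL:
        return event
    return {"method": event["method"]}     # drop the property details
```

Because `choose_detail` can be re-evaluated on every measurement interval, full detail resumes as soon as the measured rate falls back under the target.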
As noted, for cross-tier communications that are not encapsulated as HTTP protocol requests and responses, correlating annotations can be difficult, since it is generally not possible to insert additional context onto messages sent over network 124 by way of arbitrary TCP socket requests. Accordingly, in one embodiment, the application monitoring system may exploit the tracing information produced by agents 122, harvest this information, and then correlate communications for transactions even if the transaction spans multiple tiers.
In particular, within each individual communicating process running on a component of the system 100 (e.g., the client 102 and/or the application servers 106), the tracing of the embodiments maintains awareness of when socket connections start and complete. For example, the agents 122 can track the local and remote ip:port pairs associated with each socket object. The local ip:port pairs uniquely identify one aspect of a particular socket during the course of a conversation.
Because these pairs are not unique indefinitely (since ports can be re-used), in one embodiment, the agents 122 assign an identifier (“ID”) to each socket at the start of a conversation. The socket ID may then be made globally unique by combining it with other information, such as a process ID, node ID, and start time of the process. In one embodiment, the agents 122 send this conversation data to the collector 114 and monitoring server 116 by read and write calls on streams associated with these sockets.
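By way of non-limiting illustration, combining a per-process socket ID with a node ID, process ID, and process start time may be sketched in Python as follows. The class names, field names, and the use of a simple counter for the per-process ID are assumptions made for this sketch.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalSocketId:
    node_id: str          # identifies the machine or component
    process_id: int       # operating-system process ID
    process_start: float  # start time of the process, e.g. seconds since epoch
    socket_id: int        # unique only within the process lifetime


class SocketIdAllocator:
    """Hands out socket IDs for one process and qualifies them globally."""

    def __init__(self, node_id, process_id, process_start):
        self._node_id = node_id
        self._process_id = process_id
        self._process_start = process_start
        self._counter = itertools.count(1)

    def open_socket(self):
        return GlobalSocketId(
            self._node_id, self._process_id, self._process_start, next(self._counter)
        )
```

Including the process start time keeps two IDs distinct even when the operating system later re-uses the same process ID on the same node.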
The monitoring server 116 can thus identify which sockets correspond to the reads and writes on the stream. In particular, the monitoring server 116 can identify the two sockets that form the end points of a particular conversation for a transaction between two processes on different tiers and different components (such as clients 102 and application servers 106) based on the ip:port pairs.
In addition, the agents 122 may insert identifiable markers based on the content of the data being transferred to help the monitoring server 116 match the sockets on each side of a data conversation, for example, across network 124. In one embodiment, the agents 122 use checksum values emitted at selected offsets into the conversation stream, such as offsets of 100, 1000, 5000 bytes into the stream and at the end of the stream. In one embodiment, the agents 122 employ Jenkins checksums. However, any form of checksum or other type of marker may be employed.
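By way of non-limiting illustration, emitting a running checksum at selected stream offsets may be sketched in Python as follows. The sketch assumes the Jenkins one-at-a-time hash as the checksum, which is only one of several hash functions that bear the Jenkins name; the description above does not state which variant is used. The class and method names are likewise assumptions.

```python
BOUNDARIES = (100, 1000, 5000)  # example offsets taken from the description
MASK = 0xFFFFFFFF               # keep arithmetic within 32 bits


def _mix(state, byte):
    """Per-byte mixing step of the Jenkins one-at-a-time hash."""
    state = (state + byte) & MASK
    state = (state + (state << 10)) & MASK
    return state ^ (state >> 6)


def _finalize(state):
    """Final avalanche step; does not modify the running state."""
    state = (state + (state << 3)) & MASK
    state ^= state >> 11
    return (state + (state << 15)) & MASK


class StreamChecksummer:
    """Tracks one direction of one socket stream."""

    def __init__(self, boundaries=BOUNDARIES):
        self._pending = sorted(boundaries)
        self._state = 0
        self.offset = 0

    def update(self, chunk):
        """Consume the bytes of one read or write call.

        Returns a list of (offset-in-stream, checksum-value) pairs, one for
        each boundary crossed by this call.
        """
        markers = []
        for byte in chunk:
            self._state = _mix(self._state, byte)
            self.offset += 1
            if self._pending and self.offset == self._pending[0]:
                markers.append((self._pending.pop(0), _finalize(self._state)))
        return markers

    def close(self):
        """Marker for the end of the stream."""
        return (self.offset, _finalize(self._state))
```

Because each checksum covers all bytes from the start of the stream to the boundary, the sender and the receiver produce the same marker even when the same bytes are split across a different number of calls on each side.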
The monitoring server 116 can thus identify the correspondence between a client transaction, which sends a message to a server, and the corresponding sub-transaction on the server, which performs the processing associated with this message. Each side may have a number of socket calls associated with the transfer of the client-to-server message and also of the server-to-client response message. The embodiments also account for when the same socket pair is re-used for multiple transactions without reconnecting.
For example, at one or more of clients 102, when a socket connection is opened, the agent 122 assigns a connection ID, which is unique within the client process for its lifetime. In one embodiment, this ID is emitted with each socket call. On each socket call when a checksum boundary is crossed, the checksum is emitted with a value as a pair, such as <offset-in-stream, checksum-value>. The bytes sent/received are also emitted on each socket call. The monitoring server 116 can then calculate the absolute offset within the stream for a particular connection ID by summing these quantities. Alternatively, the absolute offset could be emitted by agents 122 on each call.
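By way of non-limiting illustration, deriving the absolute stream offset for each socket call by summing the byte counts emitted for a connection ID may be sketched in Python as follows. The function name and the tuple layout of the input records are assumptions made for this sketch.

```python
def absolute_offsets(calls):
    """Compute the stream range covered by each socket call.

    `calls` is an iterable of (connection_id, byte_count) pairs in trace
    order. The result lists (connection_id, start_offset, end_offset) for
    each call, where end_offset is exclusive.
    """
    totals = {}   # running byte total per connection ID
    ranges = []
    for connection_id, byte_count in calls:
        start = totals.get(connection_id, 0)
        end = start + byte_count
        totals[connection_id] = end
        ranges.append((connection_id, start, end))
    return ranges
```

For instance, two calls of 60 and 90 bytes on the same connection cover the ranges 0 to 60 and 60 to 150, so a checksum boundary at offset 100 falls within the second call.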
In addition, for each socket opened, the agents 122 assign a unique socket identifier, SOCKET_ID, property value for the life of the process. Then in one embodiment, for each successful socket read and write, the agents 122 emit the SOCKET_ID, a local address value (LOCAL_ADDR), a remote address value (REMOTE_ADDR), and other information, such as amount of data received like BYTES_RECEIVED or BYTES_SENT properties. Other values associated with the call may be emitted as well by the agents 122. If a read or write call crosses a boundary, the agents 122 write the boundary value and the running checksum of all bytes from the start of socket operation to the boundary with a property, such as a property labeled SOCKETBUFHASH. An exemplary process flow is also explained with reference to
Network 124 serves as a communications infrastructure for the system 100. Network 124 may comprise various known network elements, such as routers, firewalls, hubs, switches, etc. In one embodiment, network 124 may support various communications protocols, such as TCP/IP. Network 124 may refer to any scale of network, such as a local area network, a metropolitan area network, a wide area network, the Internet, etc.
In phase 200, the monitoring server 116 identifies the markers, such as the checksums written by a client program, such as agents 122, at client 102, associated with the connection ID for the client socket calls in the transaction. In one embodiment, the monitoring server 116 employs an index whose key may be <node, process, connection ID> and whose value is the associated set of checksums for that connection ID. The monitoring server 116 also maintains an index keyed on checksums, where the stored value is <node, process, connection ID, time the checksum was created>. If a lookup by the monitoring server 116 produces multiple matches, the monitoring server 116 can disambiguate these matches based on time.
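By way of non-limiting illustration, an index keyed on checksums with time-based disambiguation may be sketched in Python as follows. Keying on the pair of stream offset and checksum value, and resolving multiple matches by choosing the entry closest in time, are assumptions made for this sketch; the description above states only that time may be used to disambiguate.

```python
from collections import defaultdict


class ChecksumIndex:
    """Maps a checksum marker to the connections that emitted it."""

    def __init__(self):
        self._by_marker = defaultdict(list)

    def add(self, offset, checksum, node, process, connection_id, created_at):
        entry = (node, process, connection_id, created_at)
        self._by_marker[(offset, checksum)].append(entry)

    def find_peer(self, offset, checksum, node, process, connection_id, created_at):
        """Find the other end of a conversation for the given endpoint.

        Returns the entry from a different endpoint whose creation time is
        closest to `created_at`, or None when no other endpoint matches.
        """
        own = (node, process, connection_id)
        candidates = [
            entry
            for entry in self._by_marker.get((offset, checksum), [])
            if entry[:3] != own
        ]
        if not candidates:
            return None
        return min(candidates, key=lambda entry: abs(entry[3] - created_at))
```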
In phase 202, based on the <node, process, connection ID>, the monitoring server 116 then identifies the correct socket calls associated with that connection. In one embodiment, the monitoring server 116 maintains an index that is keyed on <node, process, connection ID>, where the stored values are the data associated with each socket call for the connection, <tracefile+transaction, offset-within-stream>. The monitoring server 116 performs a lookup to find all socket calls associated with the connection concerned, based on the corresponding range of stream offsets.
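By way of non-limiting illustration, an index keyed on <node, process, connection ID> that is queried by a range of stream offsets may be sketched in Python as follows. Storing each socket call as a half-open range of offsets, and the class and method names, are assumptions made for this sketch.

```python
from collections import defaultdict


class SocketCallIndex:
    """Maps a connection to the socket calls recorded for it."""

    def __init__(self):
        self._calls = defaultdict(list)

    def add(self, node, process, connection_id, transaction, start, end):
        """Record that `transaction` moved stream bytes [start, end)."""
        key = (node, process, connection_id)
        self._calls[key].append((transaction, start, end))

    def calls_in_range(self, node, process, connection_id, lo, hi):
        """Return the transactions whose calls overlap stream bytes [lo, hi)."""
        key = (node, process, connection_id)
        return [
            transaction
            for transaction, start, end in self._calls.get(key, [])
            if start < hi and end > lo
        ]
```

Querying by offset range rather than by connection alone is what allows several transactions that re-use the same socket pair to be told apart.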
As noted above, the agents 122 have assigned a unique SOCKET_ID property value for the life of the process and, for each successful socket read and write, have emitted the SOCKET_ID, LOCAL_ADDR, REMOTE_ADDR, and BYTES_RECEIVED or BYTES_SENT properties and values associated with the call. In addition, if a read or write call crosses a boundary, the agents 122 have written the boundary value and the running checksum of all bytes from the start of socket operation to the boundary with a SOCKETBUFHASH property.
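By way of non-limiting illustration, the properties emitted for one socket call may be sketched in Python as follows. The property names are those given above; the function name, the `direction` argument, and the `offset:checksum` text encoding of the SOCKETBUFHASH value are assumptions made for this sketch, since the description does not specify how the boundary value and checksum are serialized.

```python
def socket_call_properties(socket_id, local_addr, remote_addr,
                           direction, byte_count, markers=()):
    """Build the name/value string pairs for one successful read or write.

    `direction` is "send" or "receive". `markers` holds the
    (boundary offset, running checksum) pairs crossed by this call.
    """
    properties = {
        "SOCKET_ID": str(socket_id),
        "LOCAL_ADDR": local_addr,
        "REMOTE_ADDR": remote_addr,
    }
    name = "BYTES_SENT" if direction == "send" else "BYTES_RECEIVED"
    properties[name] = str(byte_count)
    if markers:
        # Assumed encoding: "offset:checksum" pairs joined by semicolons.
        properties["SOCKETBUFHASH"] = ";".join(
            f"{offset}:{checksum:08x}" for offset, checksum in markers
        )
    return properties
```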
In phase 204, the monitoring server 116 processes the trace output of the agents 122 and extracts the method call sequences comprising each identifiable transaction in the trace data and tags each transaction with the properties associated with each method call of the transaction. In the case of transactions involving socket input/output (“I/O”), these properties may include those generated by agents 122 as described above.
Call sequences are stored in the monitoring database 118 by monitoring server 116 and are indexed by their various property values, as well as timestamps. The monitoring database 118 may also be made available to monitoring client 120.
In phase 206, at monitoring client 120, a user may dynamically filter transactions by selected criteria based on property values. In some embodiments, when displaying a transaction to the monitoring client 120, the monitoring server 116 may examine transactions for possible relationships using the matching algorithm described above. If relationships are inferred by heuristic matching of HTTP custom header or socket checksum properties, then the related transactions can be correlated together by the monitoring server 116 and displayed as if they were a single transaction whose method call sequence spans the tiers between the related transactions.
The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are thus within its spirit and scope. Other system configuration and optimization features will be evident to one of ordinary skill in the art in view of this disclosure, and are included within the scope of the following claims.
The features and attributes of the specific embodiments disclosed above may be combined in different ways to form additional embodiments, all of which fall within the scope of the present disclosure. Although the present disclosure provides certain embodiments and applications, other embodiments that are apparent to those of ordinary skill in the art, including embodiments that do not provide all of the features and advantages set forth herein, are also within the scope of this disclosure. Accordingly, the scope of the present disclosure is intended to be defined only by reference to the appended claims.
This application claims the benefit of priority of U.S. Provisional Application No. 61/439,662, filed Feb. 4, 2011, entitled “Correlating Input an Output Requests between Client and Server Components in a Multi-Tier Application,” which is incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
20120246287 A1 | Sep 2012 | US |
Number | Date | Country | |
---|---|---|---|
61439662 | Feb 2011 | US |