1. Field of the Invention
This invention relates to methods and systems for monitoring the performance of an application and at least one storage device for storing code which performs the method. The invention has particular utility in the field of performance analysis of Java and .NET applications that invoke remote methods in a different virtual machine. This includes remote methods on the same physical computer as well as remote methods on a different physical computer. It also includes a sequence of virtual (and possibly remote) machines where machine A calls machine B which calls machine C.
2. Background Art
Modern Web applications typically invoke remote methods (or transactions) on a back-end Java or .NET virtual machine that is different from the Web application's virtual machine. This back-end virtual machine can be running an Enterprise Java Beans (EJB) server, or any generic Java application. Since the servers are on different virtual machines, there is typically no way to tie the performance of a unique Web transaction (pertaining to one specific request by a user) to the performance of the related unique back-end transaction.
The Open Group Application Response Measurement (ARM) has been developed to do something similar, but it has no facility to actually tie the two unique transactions together. Furthermore, it is up to the individual programmer to change the production application code to take advantage of the ARM API as described in U.S. Pat. No. 6,144,961. More information on ARM can be found at http://en.wikipedia.org/wiki/Application_Response_Measurement.
Published U.S. Patent Application 2007/0143323 to Vanrenen et al. discloses the correlation of data relating to execution flows running on different processes or threads at a computer system. The execution flows may represent sequences of software components that are invoked or other computer system resources that are consumed. A first execution flow fulfills a first request by transmitting a second request which initiates a second execution flow, such as at another computer system. The second request includes metadata which identifies a context of the first request, such as a URL, and an agent which monitors the first execution flow that initiated the second request. A manager receives information regarding the first execution flow from the first agent, and information regarding the second execution flow, along with the metadata, from a second agent, for correlating the first and second execution flows. The received information may include execution flow shape data.
As described by Vanrenen et al., an execution flow can be traced to identify each component that is invoked as well as obtain performance data such as the execution time of each component. An execution flow refers generally to the sequence of steps taken when a computer program executes. Tracing refers to obtaining a detailed record, or trace, of the steps a computer program executes. One type of trace is a stack trace. Traces can be used as an aid in debugging. However, information cannot be obtained and analyzed from every execution flow without maintaining an excessive amount of overhead data and thereby impacting the very application which is being monitored. One way to address this problem is by sampling so that information is obtained regarding every nth execution flow. This approach is problematic because it omits a significant amount of data and, if a particular execution flow instance is not selected for sampling, all information about it is lost. Thus, if a particular component is executing unusually slowly, for instance, but only on an irregular basis, this information may not be captured.
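The sampling limitation described above can be illustrated with a minimal sketch. Python is used here purely for illustration; the response times, the position of the slow flow, and the sampling interval are all hypothetical values, not figures from Vanrenen et al.

```python
# Hypothetical response times (ms) for 1,000 execution flows.
# Exactly one flow is unusually slow, and it occurs at an irregular position.
response_times_ms = [20] * 1000
SLOW_INDEX = 137              # assumed position of the single slow flow
response_times_ms[SLOW_INDEX] = 5000

SAMPLE_EVERY_N = 10           # assumed sampling interval (every nth flow)

# Keep only every nth execution flow, as a sampling monitor would.
sampled = response_times_ms[::SAMPLE_EVERY_N]

# The slow flow is captured only if it happens to fall on a sampled position.
slow_flow_captured = max(sampled) == 5000
print(len(sampled), slow_flow_captured)
```

With these assumed values the slow flow falls between two sampled positions, so no record of it survives.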
As further described by Vanrenen et al., another approach, aggregation, involves combining information from all execution flows into a small enough data set that can be reported. For example, assume there are one thousand requests to an application server. For each execution flow, performance data such as the response time can be determined. Information such as the slowest, fastest, median and mean response times can then be determined for the aggregated execution flows. However, aggregating more detailed information about the execution flows is more problematic since the details of the execution flows can differ in various ways. Vanrenen et al. deal with aggregating information between related execution flows, such as at different computer systems.
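A minimal sketch of the aggregation approach follows, using hypothetical response times for one thousand requests. It only illustrates the kind of summary described above (slowest, fastest, median and mean); it is not the method of Vanrenen et al.

```python
import statistics

# Hypothetical response times (ms): 999 ordinary requests and one slow request.
response_times_ms = [20] * 999 + [5000]

# Aggregate all execution flows into a small, reportable data set.
summary = {
    "fastest": min(response_times_ms),
    "slowest": max(response_times_ms),
    "median": statistics.median(response_times_ms),
    "mean": statistics.mean(response_times_ms),
}
print(summary)
```

The summary reveals that a slow request occurred, but it no longer identifies which request it was or which component in that request's execution flow was responsible.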
An object of the present invention is to provide an improved method and system for monitoring the performance of an application and at least one storage device for storing code which performs the method and which do not require the user to make any modifications to their program. Automated tracking and reporting of program execution across multiple virtual machines is provided.
In addition, the sequence of local and remote methods may be displayed in a single, hierarchical display that allows for the easy understanding and resolution of application performance problems.
In carrying out the above object and other objects of the present invention, a method of monitoring the performance of an application running in an environment in which a first thread is processed on a first virtual machine in response to an invocation process and a second thread is processed on a second virtual machine in response to a request to invoke from the first thread is provided. The method includes automatically generating first and second sets of thread instance data. The first set of thread instance data is based on the processing of the first thread and the second set of thread instance data is based on the processing of the second thread. The method further includes correlating the first and second sets of thread instance data to tie the invocation and performance of the processing of the first thread to the performance of the processing of the second thread. The invocation process is followed across the threads of execution of multiple virtual machines.
Each of the threads may have a stack. The first set of instance data may represent the location of the stack of the first thread and a representation of the current thread context executing on the first virtual machine and the second set of thread instance data may represent the location of the stack of the second thread and a representation of thread context of the second virtual machine. The step of correlating may correlate the thread and stack locations on both machines.
The method may further include transmitting data from the first virtual machine to the second virtual machine. The transmitted data may include the first set of thread instance data.
The method may further include the step of transmitting the first and second sets of thread instance data to a nucleus server. The nucleus server may perform the step of correlating.
The application may be a real application.
The environment may be a production environment.
The method may be computer-implemented.
The environment may be a distributed computer environment.
Further in carrying out the above object and other objects of the present invention, an apparatus for monitoring the performance of the application running in an environment in which a first thread is processed on a first virtual machine in response to an invocation process and a second thread is processed on a second virtual machine in response to a request to invoke from the first thread is provided. The apparatus includes at least one storage device and at least one processor in communication with the at least one storage device. The at least one processor performs a method which includes generating first and second sets of thread instance data. The first set of thread instance data is based on the processing of the first thread and the second set of thread instance data is based on the processing of the second thread. The method performed by the processor further includes correlating the first and second sets of thread instance data to tie the invocation and performance of the processing of the first thread to the performance of the processing of the second thread. The invocation process is followed across the threads of execution of multiple virtual machines.
Still further in carrying out the above object and other objects of the present invention, at least one processor-readable storage medium having processor-readable code embodied thereon for programming at least one processor to perform a method for monitoring the performance of an application running in an environment in which a first thread is processed on a first virtual machine in response to an invocation process and a second thread is processed on a second virtual machine in response to a request to invoke from the first thread is provided. The method includes generating first and second sets of thread instance data. The first set of thread instance data is based on the processing of the first thread and the second set of thread instance data is based on the processing of the second thread. The method further includes correlating the first and second sets of thread instance data to tie the invocation and performance of the processing of the first thread to the performance of the processing of the second thread. The invocation process is followed across the threads of execution of multiple virtual machines.
The above object and other objects, features, and advantages of the present invention are readily apparent from the following detailed description of the best mode for carrying out the invention when taken in connection with the accompanying drawings.
Each virtual machine in a distributed computer environment is made up of threads of execution. These threads are independent of each other while executing, but can be started, stopped, and called by other threads. Distributed computing allows for the threads of one virtual machine to invoke threads on another virtual machine. These are referred to as “remote procedure calls” or “remote process calls.”
Some operating systems and platforms utilize a technique called “thread pooling.” This technique creates a pre-defined number of threads, and reuses them for various executions that the system requires.
In at least one embodiment of the invention, a technique is provided to identify each unique usage of a thread within a thread pool. A request identifier is assigned and incremented for each unique usage of the thread. The combination of the thread identifier and request identifier is used to uniquely identify a “transaction” or what will subsequently be referred to as a “thread instance.”
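The pairing of a thread identifier with a per-thread request identifier can be sketched as follows. This is an illustrative sketch only, written in Python rather than Java or .NET; the function names `next_thread_instance` and `handle_request` and the pool size are assumptions, not part of the described embodiment.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Per-thread storage for the request counter (each pooled thread has its own).
_local = threading.local()


def next_thread_instance():
    """Return (thread_id, request_id) for the current usage of this pooled thread."""
    # Increment the request identifier for each new usage of the thread.
    _local.request_id = getattr(_local, "request_id", 0) + 1
    return (threading.get_ident(), _local.request_id)


def handle_request(_request):
    # A real handler would do work here; the sketch only tags the usage.
    return next_thread_instance()


# A pool with a pre-defined number of threads that are reused across requests.
with ThreadPoolExecutor(max_workers=2) as pool:
    instances = list(pool.map(handle_request, range(6)))

print(instances)
```

Although at most two threads serve the six requests, every (thread identifier, request identifier) pair is distinct, so each usage of a pooled thread can be told apart.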
Each thread comprises program code that is executing. Each thread has a call stack, which represents the currently executing piece of code and is commonly referred to simply as “the stack.”
Referring now to
In at least one embodiment of the present invention, a unique correlation identifier for the first machine's remote invocation of the second machine's thread is provided. This identifier represents the exact location in the first machine's stack, and the exact representation of the current thread context executing on that machine. This identifier is sent to the second machine, automatically appended to the first machine's operating system or platform level request to invoke the remote thread on the second machine. This is the data appended onto the wire. There is no user intervention required. The identifier and its definition are also sent to the nucleus server at this time. When the second machine's thread starts, it sends its exact stack location to the nucleus server. When the second machine's transaction finishes, it sends that same identifier (passed from the first machine on the wire) back to the nucleus server, along with its exact thread context, allowing the nucleus server to directly correlate the two exact stack and thread locations on both machines.
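The message sequence described above can be sketched with an in-memory stand-in for the nucleus server. The class name `NucleusServer`, its method names, the use of a random UUID as the identifier, and the stack and context values are all hypothetical; the sketch shows only the order of the three reports and the resulting correlation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class NucleusServer:
    """Hypothetical stand-in for the nucleus server; all names are assumptions."""
    definitions: dict = field(default_factory=dict)    # correlation id -> caller data
    remote_starts: dict = field(default_factory=dict)  # callee context -> stack location
    correlated: list = field(default_factory=list)     # completed caller/callee links

    def register_call(self, corr_id, caller_stack, caller_context):
        # First machine: send the identifier and its definition at call time.
        self.definitions[corr_id] = (caller_stack, caller_context)

    def remote_started(self, callee_context, callee_stack):
        # Second machine: report the stack location when the remote thread starts.
        self.remote_starts[callee_context] = callee_stack

    def remote_finished(self, corr_id, callee_context):
        # Second machine: return the identifier received on the wire with its context.
        caller_stack, caller_context = self.definitions[corr_id]
        self.correlated.append({
            "correlation_id": corr_id,
            "caller_stack": caller_stack,
            "caller_context": caller_context,
            "callee_stack": self.remote_starts[callee_context],
            "callee_context": callee_context,
        })


nucleus = NucleusServer()
corr_id = uuid.uuid4().hex                       # assumed identifier format
caller = ("vm1", 11, 3)                          # (machine, thread id, request id)
callee = ("vm2", 27, 8)

nucleus.register_call(corr_id, "WebServlet.doGet > OrderService.place", caller)
nucleus.remote_started(callee, "OrderServiceImpl.place")
nucleus.remote_finished(corr_id, callee)

link = nucleus.correlated[0]
print(link["caller_context"], "->", link["callee_context"])
```

Because the second machine returns the same identifier it received, the server can join the two reports without consulting any user-facing data.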
As noted above, at least one embodiment of the present invention focuses on the specific correlation of the individual thread instances. In other words, data is used specific to an individual transaction rather than data about the type of transaction which is then aggregated. Advantageously, this technique can be used to follow a specific instance of a request across multiple virtual machines' threads of execution. This allows for the diagnosis of a problem that may only happen once while thousands of similar requests with the same user-facing data have been made. This would be impossible to recognize with the aggregation technique employed by the prior art. This technique has solved the overhead issue mentioned in the prior art.
The technique of at least one embodiment of the present invention applies at the lower level of machine threads and call stacks, allowing for the specific instance correlation mentioned above.
Furthermore, unlike the prior art, the at least one embodiment does not require a Web browser or any user-facing data to accomplish the correlation. Advantageously, the present technique can be used in any environment that utilizes virtual machines, be it Web, command-line, or any other type of invocation process that starts the first virtual machine threads.
Unlike the prior art, the at least one embodiment does not require any user intervention. The correlation is done automatically. The present technique can be used in a production environment where user intervention is not allowed, or closely controlled. This allows the users to monitor the real application, rather than a debug or test version of it.
The at least one embodiment does not require any debug clients. Advantageously, this technique can be used in a production environment where debug clients are not allowed. Again, this allows the users to monitor the real application, rather than a debug or test version of it.
As previously noted, instance data is sent from one specific execution of a thread to another specific execution of a thread. This instance data specifically ties the two thread instances together, rather than correlating two generic flows. Specific instance data is received and correlated for individual threads, not an aggregated set of data related to an execution flow. This allows for the direct correlation of the first virtual machine's thread's performance to the second virtual machine's thread's performance. It is to be understood, however, that one embodiment of the invention may be utilized to correlate specific thread instances for inter-process communication within one virtual machine.
When used in a Java or .NET environment, at least one embodiment of the invention can instrument (change on the fly) the underlying Java and .NET system code. This allows one to alter the information that is transmitted across the network of
The calling machine puts the additional data on the wire with the program's original request to be sent to a program running on the second virtual machine (which may be on a separate computer). Instrumented code on the remote virtual machine pulls this additional data off and uses it to correlate the two transactions. Subsequently, the remote machine could invoke a method on another virtual machine, and the process would be exactly the same for the calls from it to this third machine.
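The following sketch illustrates the idea of appending data to the original request and removing it on the remote side, including a second hop to a third machine. It is a simplification under stated assumptions: the embodiment instruments system code automatically, whereas this sketch uses explicit helper functions, a JSON envelope, a field name `x-correlation`, and identifier strings that are all hypothetical.

```python
import json

CORRELATION_KEY = "x-correlation"   # hypothetical field name for the additional data


def attach(request_bytes, correlation_id):
    """Calling side: put the additional data on the wire with the original request."""
    envelope = {CORRELATION_KEY: correlation_id,
                "payload": request_bytes.decode("utf-8")}
    return json.dumps(envelope).encode("utf-8")


def detach(wire_bytes):
    """Remote side: pull the additional data off and recover the original request."""
    envelope = json.loads(wire_bytes.decode("utf-8"))
    return envelope[CORRELATION_KEY], envelope["payload"].encode("utf-8")


observed = []   # (machine, incoming correlation id) pairs that would be reported


def machine_c(wire_bytes):
    corr_id, request = detach(wire_bytes)
    observed.append(("C", corr_id))
    return request.upper()           # stand-in for the real remote method


def machine_b(wire_bytes):
    corr_id, request = detach(wire_bytes)
    observed.append(("B", corr_id))
    # B in turn calls C; the process is the same, with B's own identifier.
    return machine_c(attach(request, "B:thread-4:req-9"))


result = machine_b(attach(b"getOrder 42", "A:thread-1:req-7"))
print(result, observed)
```

Each hop sees the program's original request unchanged, while the identifier received at each machine names the calling thread instance one level up the chain.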
Once the additional data is captured at the remote virtual machine, it is sent to a common database of performance data so that it can be correlated with other local and remote transactions. A view or screenshot on the performance console of
In a Web environment, by using at least one embodiment of the invention, the owners (developers, DBAs, operators, etc.) of a website can now track a user's transaction across multiple layers of their entire virtual machine infrastructure. This allows them to pinpoint performance bottlenecks in areas other than just the Web server, and ultimately enhances the overall performance of their website. This will lead to increased customer satisfaction.
While embodiments of the invention have been illustrated and described, it is not intended that these embodiments illustrate and describe all possible forms of the invention. Rather, the words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention.