SYSTEM AND METHOD FOR CONDITIONAL CALL PATH MONITORING IN A DISTRIBUTED TRANSACTIONAL MIDDLEWARE ENVIRONMENT

Information

  • Patent Application
  • Publication Number
    20190089803
  • Date Filed
    May 09, 2018
  • Date Published
    March 21, 2019
Abstract
In accordance with an embodiment, described herein is a system and method for conditional call path monitoring in a distributed transactional middleware environment. A cache can be provided in local memory, for use by an agent in the reporting and aggregation of call path metrics. When the agent collects such metrics, it does not report them immediately to a system and application monitor (SAM) manager (e.g., Tuxedo System and Application Monitor, TSAM), but instead stores them in the cache, indexed by correlation ID (identifier). When a predefined condition is met at a participating node, that node propagates a corresponding correlation ID to other participating nodes, via the SAM manager. The other participating nodes can then search for the correlation ID in the cache, and report to the SAM manager metrics of call paths which meet the condition.
Description
COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.


FIELD OF INVENTION

Embodiments of the invention are generally related to computing environments, including transactional middleware environments, and are particularly related to conditional call path monitoring in a distributed transactional middleware environment.


BACKGROUND

In a distributed transactional middleware or other computing environment, the execution of a transaction often involves multiple processing nodes, such that the execution path (call path) of the transaction can be complicated, requiring computationally expensive reporting and aggregation of call path metrics. These are some examples of the types of environments in which embodiments of the invention can be used.


SUMMARY

In accordance with an embodiment, described herein is a system and method for conditional call path monitoring in a distributed transactional middleware environment. A cache can be provided in local memory, for use by an agent in the reporting and aggregation of call path metrics. When the agent collects such metrics, it does not report them immediately to a system and application monitor (SAM) manager (e.g., Tuxedo System and Application Monitor, TSAM), but instead stores them in the cache, indexed by correlation ID (identifier). When a predefined condition is met at a participating node, that node propagates a corresponding correlation ID to other participating nodes, via the SAM manager. The other participating nodes can then search for the correlation ID in the cache, and report to the SAM manager metrics of call paths which meet the condition.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates a distributed transactional middleware environment which enables conditional call path monitoring, in accordance with an embodiment.



FIG. 2 illustrates a process for conditional call path monitoring, in accordance with an embodiment.





DETAILED DESCRIPTION

As described above, in a distributed transactional middleware environment, the execution of a transaction often involves multiple processing nodes, such that the execution path (call path) of the transaction can be complicated, requiring computationally expensive reporting and aggregation of call path metrics.


One approach to analyzing the execution of a complicated transaction is to deploy an agent on each node, to collect and report transaction execution details to a centralized manager, which then aggregates the information reported by participating agents, and generates an overall call path.


However, in many situations a system administrator may be interested only in abnormal call paths, and such expensive call path reporting and calculation may not be required while the system is running normally.


In accordance with an embodiment, described herein is a system and method for conditional call path monitoring in a distributed transactional middleware environment. A cache can be provided in local memory, for use by an agent in the reporting and aggregation of call path metrics. When the agent collects such metrics, it does not report them immediately to a system and application monitor (SAM) manager (e.g., Tuxedo System and Application Monitor, TSAM), but instead stores them in the cache, indexed by correlation ID (identifier). When a predefined condition is met at a participating node, that node propagates a corresponding correlation ID to other participating nodes, via the SAM manager. The other participating nodes can then search for the correlation ID in the cache, and report to the SAM manager metrics of call paths which meet the condition.


Transactional Middleware Environment

Transactional middleware environments, for example computing environments such as Oracle Tuxedo environments, are widely used by enterprises to develop and use mission-critical applications, including acting as an infrastructure layer in distributed computing environments.


In accordance with an embodiment, a transactional middleware environment can use a system and application monitor (SAM, examples of which include the Tuxedo System and Application Monitor, TSAM, TSAM Plus), or a similar component or functionality, to provide monitoring and reporting for the system and applications, including, for example, the monitoring of real-time performance bottlenecks and business data fluctuations.


In accordance with an embodiment, a SAM agent enables collection of application performance metrics, such as, for example, call path, transactions, services, and system servers.


In accordance with an embodiment, a SAM manager provides an interface, such as a graphical user interface, that correlates and aggregates performance metrics collected from one or more domains, and displays this information in real time.



FIG. 1 illustrates a distributed transactional middleware environment which enables conditional call path monitoring, in accordance with an embodiment.


As illustrated in FIG. 1, in accordance with an embodiment, a system can comprise one or more computers 100, including at least one microprocessor 101, and a distributed transactional middleware environment 102 provided thereon.


In accordance with an embodiment, a SAM manager can be provided as a web container 104 that enables a SAM console 106 and a SAM data server 108, and that uses a database 110 for data storage.


In accordance with an embodiment, each of a plurality of processing nodes (e.g., Tuxedo nodes) A 120 and B 130, can include a local monitor server (LMS) 122, 132, and one or more application (e.g., Tuxedo) processes 124, 134, each of which is associated with a SAM agent 125, 135, having a SAM framework 126, 136, and SAM plug-in 127, 137.


In accordance with an embodiment, a cache 138, 139, which can be implemented, for example, as a ring buffer, can be used to store conditional call path metrics data.


System and Application Monitor (SAM) Agent

In accordance with an embodiment, the SAM agent handles back-end logic, works in conjunction with the SAM manager, and includes the following sub-components:


SAM framework: a data collection engine operating as a layer between, e.g., a Tuxedo infrastructure and other SAM components, and responsible for run time metrics collection, alert evaluation, and monitoring policy enforcement.


SAM plug-in: an extensible mechanism invoked by the SAM framework. The SAM agent provides plug-ins that send data to an LMS, which then forwards it to the SAM manager.


Local monitor server (LMS): a server process, e.g., a Tuxedo system server, to which SAM plug-ins send data, and which then passes the data to the SAM manager using, e.g., the HTTP protocol.


System and Application Monitor (SAM) Manager

In accordance with an embodiment, the SAM manager includes the following components:


SAM data server: the data server is responsible for accepting data from an LMS and storing it to a database; accepting requests from a presentation layer; and communicating with the LMS for configuration instructions.


SAM console: a presentation layer which can be accessed via a compatible Web browser.


In accordance with an embodiment, the SAM manager runs in a Java application server, for example a WebLogic Server instance, and can use a relational database to store information such as, for example: performance metrics collected by the SAM agent; process (e.g., Tuxedo) component information; user account information; and alerts and events.


Call Path Monitoring

In a transactional middleware environment, a client program or application often calls a service to carry out some business logic, and the service implementation is completely transparent to the caller. This transparency benefits development, deployment, and system administration. From a monitoring perspective, however, it can make it difficult for an administrator to determine what happens behind the scenes; call path monitoring addresses this problem.


In accordance with an embodiment, an application call triggers a set of service invocations. The involved services constitute a tree (call path tree), which defines factors such as:


The types of services involved in performing the initial service request.


The service invocation depth (the depth of the call path tree).


The service invocation sequence; for example, client A calls SVC1, and SVC1 calls SVC2 and SVC3.


Call transportation: an edge of the call path tree represents how information is sent from caller to service provider, for example, through an IPC queue or a BRIDGE connection.


Call path metrics: the metrics that are available during message propagation, such as message size, execution status, transaction, and CPU consumption.
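To make these factors concrete, the following minimal Python sketch models the call path tree from the example above (client A calls SVC1, and SVC1 calls SVC2 and SVC3) and derives its invocation depth and invocation sequence. The `CallNode` class, its fields, and the transport labels are hypothetical illustrations; they are not part of any Tuxedo or SAM API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CallNode:
    """One service invocation in a call path tree (hypothetical model)."""
    name: str
    transport: Optional[str] = None          # edge from the caller, e.g. "IPC"
    children: List["CallNode"] = field(default_factory=list)

    def call(self, name: str, transport: str) -> "CallNode":
        """Record that this node invokes another service; return the callee."""
        child = CallNode(name, transport)
        self.children.append(child)
        return child


def depth(node: CallNode) -> int:
    """Service invocation depth: number of levels below this node."""
    if not node.children:
        return 0
    return 1 + max(depth(c) for c in node.children)


def sequence(node: CallNode) -> List[str]:
    """Invocation sequence as 'caller -> callee (transport)', depth first."""
    edges: List[str] = []
    for c in node.children:
        edges.append(f"{node.name} -> {c.name} ({c.transport})")
        edges.extend(sequence(c))
    return edges


# Client A calls SVC1; SVC1 calls SVC2 (local IPC) and SVC3 (via BRIDGE).
client = CallNode("client A")
svc1 = client.call("SVC1", "IPC")
svc1.call("SVC2", "IPC")
svc1.call("SVC3", "BRIDGE")

print(depth(client))  # 2
for edge in sequence(client):
    print(edge)
```

Here the depth is counted from the monitoring initiator, so the initiator itself is level 0; other depth conventions are equally possible.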


In accordance with an embodiment, a monitoring initiator is a process that initiates tracking of a call path tree. The process can be, for example, a Tuxedo client, an application server, a client proxy server (WSH/JSH), a Tuxedo domain gateway server, or a web services proxy server. Typically, call path monitoring begins when the monitoring initiator invokes tpcall, tpacall, or tpconnect.


Service Monitoring

In accordance with an embodiment, service monitoring can be used, e.g., to determine a Tuxedo service execution status.


System Server Monitoring

In accordance with an embodiment, examples of system servers include: BRIDGE, which connects multiple Tuxedo machines within a Tuxedo domain; GWTDOMAIN, which connects one Tuxedo domain with others; and GWWS, a web services gateway.


Transaction Monitoring

In accordance with an embodiment, transaction monitoring can be used, e.g., to track XA calls triggered in a transaction.


Policy Monitoring

In accordance with an embodiment, policy monitoring settings can be used to collect exactly the metrics needed, with minimal impact on application performance.


Performance Metrics

In accordance with an embodiment, a correlation ID is a unique identifier that represents a call path tree. It can be generated by a monitoring initiator plug-in, and uses the following format:


DOMAINID:MASTERHOSTNAME:IPCKEY LMID PROCESSNAME PID TID COUNTER TIMESTAMP

An example of a correlation ID is shown below, in which the monitored call is started by the program “bankclient”, with process ID 8089 and thread ID 1, on machine “SITE1” in Tuxedo domain “TUXDOM1”. The master is “bjsol18” and the IPCKEY in TUXCONFIG is “72854”.


TUXDOM1:bjsol18:72854 SITE1 bankclient 8089 1 99 1259309025
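A correlation ID in this format can be split back into its fields with simple whitespace and colon splitting. The sketch below assumes that no field contains spaces and that the first, colon-separated part contains exactly two colons; the `CorrelationId` type and `parse_corr_id` function are hypothetical and are not part of a SAM plug-in API.

```python
from typing import NamedTuple


class CorrelationId(NamedTuple):
    """Fields of DOMAINID:MASTERHOSTNAME:IPCKEY LMID PROCESSNAME PID TID COUNTER TIMESTAMP."""
    domain_id: str
    master_host: str
    ipc_key: int
    lmid: str
    process_name: str
    pid: int
    tid: int
    counter: int
    timestamp: int

    def __str__(self) -> str:
        # Rebuild the textual form in the documented field order.
        return (f"{self.domain_id}:{self.master_host}:{self.ipc_key} "
                f"{self.lmid} {self.process_name} {self.pid} {self.tid} "
                f"{self.counter} {self.timestamp}")


def parse_corr_id(text: str) -> CorrelationId:
    """Split a correlation ID string into its named fields."""
    head, lmid, process_name, pid, tid, counter, timestamp = text.split()
    domain_id, master_host, ipc_key = head.split(":")
    return CorrelationId(domain_id, master_host, int(ipc_key), lmid,
                         process_name, int(pid), int(tid), int(counter),
                         int(timestamp))


cid = parse_corr_id("TUXDOM1:bjsol18:72854 SITE1 bankclient 8089 1 99 1259309025")
print(cid.process_name, cid.pid, cid.tid)  # bankclient 8089 1
print(cid)                                 # round-trips to the original string
```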


In accordance with an embodiment, examples of metrics include:

  • Service Name: the name of a Tuxedo service.
  • Location: a set of metrics that identify the process that sends out the performance metrics, for example, information about domain, machine, group and process name.
  • IPC Queue Length: the number of messages in an IPC queue.
  • IPC Queue ID: a Tuxedo identifier of an IPC queue.
  • Execution Time: the time used in a Tuxedo service or XA call execution in milliseconds.
  • Wait Time: the time a message spends in the transportation stage.
  • CPU Time: the CPU time consumed by the service request processing.
  • Message Size: the Tuxedo message size.
  • Execution Status: a tpreturn service return code, defined by the Tuxedo ATMI interface.
  • Call Flags: the flags passed to tpcall/tpacall in the Tuxedo ATMI interface.
  • Call Type: tpcall, tpacall, or tpforward.
  • Elapse Time: the time elapsed while a call is monitored.
  • GTRID: a Tuxedo global transaction ID.
  • Pending Message Number: the number of messages that have been delivered to the Tuxedo network layer and are waiting to be sent.
  • Message Throughput: the total number and volume of messages accumulated in, e.g., system server monitoring intervals.
  • Waiting Reply Message Number: the number of requests in GWTDOMAIN awaiting a reply from the remote domain.
  • XA Code: the XA call return code in transaction monitoring.
  • XA Name: the XA call name.
  • GWWS Metrics: a set of metrics used to measure GWWS throughput, including: Inbound Message Throughput; Inbound Message Processing Time; Outbound Message Throughput; Outbound Message Processing Time.


Conditional Call Path Monitoring

As described above, from a system or application monitoring perspective, it may be difficult for an administrator to determine what happens behind the scenes, which can be addressed using call path monitoring.


In accordance with an embodiment, described herein is a system and method for conditional call path monitoring in a distributed transactional middleware environment. A cache can be provided in local memory, for use by an agent in the reporting and aggregation of call path metrics. When the agent collects such metrics, it does not report them immediately to a SAM manager, but instead stores them in the cache, indexed by correlation ID (identifier). When a predefined condition is met at a participating node, that node propagates a corresponding correlation ID to other participating nodes, via the SAM manager. The other participating nodes can then search for the correlation ID in the cache, and report to the SAM manager metrics of call paths which meet the condition.


Plug-In Support for Conditional Call Path Monitoring

In accordance with an embodiment, a shared memory ring buffer can be provided as a data structure to store conditional call path segments. The SAM plug-in can put the conditional call path segments into this ring buffer.


In accordance with an embodiment, a field ConditionFilter (the definition of the condition) can be used, for example by adding it to a Tuxedo typed container module (META_TCM) for conditional call path metrics. The condition filter can be propagated along with the call path metrics, so that each participant in the call path can use it to evaluate the condition. The ConditionFilter field can be removed from META_TCM when the condition is met.
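The description does not define the syntax of a ConditionFilter. As a sketch only, assume a filter that compares a single metric against a threshold, and model META_TCM as a plain dictionary: the filter travels with the metrics, a participant evaluates it at a hook point, and it is removed once the condition is met. The `ConditionFilter` Python class, `evaluate_at_hook`, and the metric name `execution_time_ms` are all hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical comparison operators a condition filter might support.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
        "<=": operator.le, "==": operator.eq, "!=": operator.ne}


@dataclass(frozen=True)
class ConditionFilter:
    """Hypothetical condition: compare one metric against a threshold."""
    metric: str
    op: str
    threshold: float

    def matches(self, metrics: Dict[str, float]) -> bool:
        value = metrics.get(self.metric)
        return value is not None and _OPS[self.op](value, self.threshold)


def evaluate_at_hook(meta_tcm: Dict[str, Any], metrics: Dict[str, float]) -> bool:
    """Evaluate the propagated filter; drop it from the container once met."""
    cond = meta_tcm.get("ConditionFilter")
    if cond is None:
        return False              # no filter attached (or already removed)
    if cond.matches(metrics):
        del meta_tcm["ConditionFilter"]
        return True
    return False


tcm = {"ConditionFilter": ConditionFilter("execution_time_ms", ">", 500)}
print(evaluate_at_hook(tcm, {"execution_time_ms": 120}))  # False
print(evaluate_at_hook(tcm, {"execution_time_ms": 800}))  # True
print("ConditionFilter" in tcm)                           # False
```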


In accordance with an embodiment, a condition can be evaluated at different hook points in the SAM plug-in.


In accordance with an embodiment, the conditional call path segment is reported to the conditional call path ring buffer regardless of whether the call path condition is met. If the condition is met, additional information is added to the call path segment, for example:
  • matched flag: indicates that this is a condition-met call path segment;
  • prevN/nextN: indicate that the previous/next N call paths should also be reported;
  • cross-machine flag: added if the metrics are reported by BRIDGE or GWTDOMAIN.
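The reporting rule can be sketched as follows. A plain Python list stands in for the shared memory ring buffer, and the `Segment` fields and the server name `bankapp` are a hypothetical layout. One assumption deserves mention: because the LMS process of FIG. 2 checks the cross-machine flag on segments that did not themselves meet the condition, this sketch sets that flag for every segment reported by BRIDGE or GWTDOMAIN, not only for condition-met segments.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CROSS_MACHINE_SERVERS = {"BRIDGE", "GWTDOMAIN"}


@dataclass
class Segment:
    """A conditional call path segment (hypothetical layout)."""
    corr_id: str
    reporter: str                                 # process that produced the metrics
    metrics: Dict[str, float] = field(default_factory=dict)
    matched: bool = False                         # condition met for this segment
    prev_n: int = 0                               # also report previous N call paths
    next_n: int = 0                               # also report next N call paths
    cross_machine: bool = False                   # reported by BRIDGE or GWTDOMAIN


def report_segment(ring: List[Segment], seg: Segment, condition_met: bool,
                   prev_n: int = 0, next_n: int = 0) -> None:
    """Report every segment; add the matched fields only when the condition is met."""
    seg.cross_machine = seg.reporter in CROSS_MACHINE_SERVERS
    if condition_met:
        seg.matched = True
        seg.prev_n, seg.next_n = prev_n, next_n
    ring.append(seg)


ring: List[Segment] = []
report_segment(ring, Segment("c1", "BRIDGE"), condition_met=False)
report_segment(ring, Segment("c2", "bankapp"), condition_met=True, prev_n=2, next_n=1)
```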


LMS Support for Conditional Call Path Monitoring

In accordance with an embodiment, the local monitor server (LMS) attaches to a ring buffer (Ring buffer) for the conditional call path metrics, and reads the call path metrics from it periodically.


In accordance with an embodiment, a conditional call path index table (Index table) facilitates searching the ring buffer for the call path segments of a given correlation ID. An element of the index table includes the following fields: corr_id: the correlation ID of the call path segment; address: the address of the segment in the ring buffer. Both corr_id and address are indexed and sorted. The index table can be provided as an internal data structure within the local monitor server.
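A minimal sketch of such an index table keeps two sorted lists, one ordered by corr_id and one by address, and uses binary search (Python's standard `bisect` module) for lookups and purging. It assumes that an address is a monotonically increasing logical slot number, so the entries to purge are exactly those below the oldest live slot of the ring buffer, and that the "previous/next N" correlation IDs are the neighbors of a corr_id in sorted string order. The `IndexTable` class and its `purge` helper are hypothetical; `cpi_put` and `cpi_get` borrow the interface names used in this description.

```python
import bisect
from typing import List, Optional, Tuple


class IndexTable:
    """Sketch of the conditional call path index table (hypothetical)."""

    def __init__(self) -> None:
        self.by_corr: List[Tuple[str, int]] = []  # sorted by (corr_id, address)
        self.by_addr: List[Tuple[int, str]] = []  # sorted by (address, corr_id)

    def cpi_put(self, corr_id: str, address: int,
                purge_below: Optional[int] = None) -> None:
        """Insert a (corr_id, address) pair, optionally purging stale entries."""
        if purge_below is not None:
            self.purge(purge_below)
        bisect.insort(self.by_corr, (corr_id, address))
        bisect.insort(self.by_addr, (address, corr_id))

    def purge(self, oldest_live: int) -> None:
        """Drop entries whose ring buffer slots have been overwritten."""
        cut = bisect.bisect_left(self.by_addr, (oldest_live, ""))
        stale = set(self.by_addr[:cut])
        del self.by_addr[:cut]
        self.by_corr = [(c, a) for c, a in self.by_corr if (a, c) not in stale]

    def cpi_get(self, corr_id: str, prev_n: int = 0, next_n: int = 0) -> List[int]:
        """Addresses for corr_id plus its prev_n/next_n neighboring corr_ids."""
        ids = sorted({c for c, _ in self.by_corr} | {corr_id})
        pos = ids.index(corr_id)
        out: List[int] = []
        for cid in ids[max(0, pos - prev_n): pos + next_n + 1]:
            lo = bisect.bisect_left(self.by_corr, (cid, -1))
            hi = bisect.bisect_right(self.by_corr, (cid, float("inf")))
            out.extend(a for _, a in self.by_corr[lo:hi])
        return sorted(out)


table = IndexTable()
for addr, cid in enumerate(["c1", "c2", "c1", "c3", "c4"]):
    table.cpi_put(cid, addr)
print(table.cpi_get("c1"))                      # [0, 2]
print(table.cpi_get("c2", prev_n=1, next_n=1))  # [0, 1, 2, 3]
table.cpi_put("c5", 5, purge_below=2)           # slots 0 and 1 were overwritten
print(table.cpi_get("c1"))                      # [2]
```

Sorting correlation IDs as strings is only a stand-in for whatever ordering the real index uses; for example, counters "99" and "100" would not sort numerically.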


In accordance with an embodiment, a matched correlation ID set (Matched Corr_id set) can be maintained for call paths that meet the condition. This data structure stores the correlation IDs of the call paths which met the condition, and each element includes the following fields: corr_id: the correlation ID which met the condition; prevN/nextN: the number of previous/next call paths which should be reported as well; is_propagated: whether this corr_id was propagated from another node via the SAM manager; time_stamp: the time when the element was added. The set can be provided as an internal data structure within the local monitor server.
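A sketch of the matched corr_id set follows. It stores one element per corr_id, keeps the first element added for a given corr_id, and treats an element as expired once it is older than a configurable time-to-live. The keying, the keep-first rule, and the `ttl_seconds` parameter are assumptions; the description states only that elements carry a time stamp and that expired elements are removed.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MatchedElement:
    corr_id: str
    prev_n: int = 0                 # previous N call paths to report as well
    next_n: int = 0                 # next N call paths to report as well
    is_propagated: bool = False     # received from another node via the SAM manager
    time_stamp: float = field(default_factory=time.monotonic)


class MatchedCorrIdSet:
    """Matched corr_id set with a hypothetical time-to-live for expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self.elements: Dict[str, MatchedElement] = {}

    def add(self, elem: MatchedElement) -> None:
        """Add an element; an existing entry for the same corr_id is kept."""
        self.elements.setdefault(elem.corr_id, elem)

    def purge_expired(self, now: Optional[float] = None) -> List[str]:
        """Remove elements older than the TTL and return their corr_ids."""
        now = time.monotonic() if now is None else now
        expired = [c for c, e in self.elements.items()
                   if now - e.time_stamp > self.ttl]
        for c in expired:
            del self.elements[c]
        return expired


matched = MatchedCorrIdSet(ttl_seconds=10.0)
matched.add(MatchedElement("c1", prev_n=1, time_stamp=0.0))
matched.add(MatchedElement("c2", is_propagated=True, time_stamp=8.0))
print(matched.purge_expired(now=15.0))  # ['c1']
```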


In accordance with an embodiment, a list of call path segments to be reported to the SAM manager (Output Segment list) can be provided as an internal data structure within the local monitor server.


Conditional Call Path Monitoring Interface

In accordance with an embodiment, to enable conditional call path monitoring, one or more interfaces can be provided, for example:


retrieveData1: get N segments from the read pointer of the ring buffer.


retrieveData2: get a segment at an address from the ring buffer.


cpi_put: put a (corr_id, segment_address) pair into the index table, optionally purging invalid elements from the table.


cpi_get: get a list of addresses for all the segments of a corr_id from the index table, optionally including the segments for the previous/next N correlation IDs.


outputSeg: put a segment into the Output segment list.
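The two retrieval interfaces can be illustrated with an in-process stand-in for the shared memory ring buffer. This sketch assumes that an address is a logical sequence number whose physical slot is the address modulo the capacity, so a segment at an older address may have been overwritten; `retrieveData2` detects this by comparing the stored address. The class, its `put` and `oldest_live` helpers, and the capacity are hypothetical; only the `retrieveData1`/`retrieveData2` names come from the interface list above.

```python
from typing import Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class RingBuffer(Generic[T]):
    """In-process stand-in for the shared memory ring buffer (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots: List[Optional[Tuple[int, T]]] = [None] * capacity
        self.write_seq = 0   # next address the writer (SAM plug-in) uses
        self.read_seq = 0    # next address the reader (LMS) consumes

    def put(self, item: T) -> int:
        """Write an item, overwriting the oldest slot when full."""
        addr = self.write_seq
        self.slots[addr % self.capacity] = (addr, item)
        self.write_seq += 1
        return addr

    def oldest_live(self) -> int:
        return max(0, self.write_seq - self.capacity)

    def retrieveData1(self, n: int) -> List[Tuple[int, T]]:
        """Get up to n (address, segment) pairs starting at the read pointer."""
        self.read_seq = max(self.read_seq, self.oldest_live())  # skip overwritten
        end = min(self.read_seq + n, self.write_seq)
        batch = [self.slots[a % self.capacity] for a in range(self.read_seq, end)]
        self.read_seq = end
        return [entry for entry in batch if entry is not None]

    def retrieveData2(self, addr: int) -> Optional[T]:
        """Get the segment at addr, or None if that slot was overwritten."""
        entry = self.slots[addr % self.capacity]
        if entry is None or entry[0] != addr:
            return None
        return entry[1]


ring: RingBuffer[str] = RingBuffer(capacity=3)
for seg in ["a", "b", "c", "d"]:   # "d" overwrites "a"
    ring.put(seg)
print(ring.retrieveData1(10))      # [(1, 'b'), (2, 'c'), (3, 'd')]
print(ring.retrieveData2(0))       # None
print(ring.retrieveData2(2))       # c
```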



FIG. 2 illustrates a process for conditional call path monitoring, in accordance with an embodiment.


As illustrated in FIG. 2, in accordance with an embodiment, a thread is added to the local monitor server to process the conditional call path, which includes, at step 150, periodically:


Call retrieveData1 to get N call path segments from the ring buffer.


At step 152, the process continues, in accordance with an embodiment:


If no segment is returned:


Sleep X ms.


At step 154, the process continues, in accordance with an embodiment:


Else for each segment:


If the matched flag is not set for the segment:


Call cpi_put to put the (corr_id, address) pair into the index table, with a purge range.


Else:

    • Call outputSeg to put the segment into the output segment list.
    • Append the (corr_id, prevN, nextN) element to the matched corr_id set.
    • Call cpi_get to get the addresses of all related segments from the index table (with prevN and nextN).
    • For each address:
      • Call retrieveData2 to get the segment at the address from the ring buffer.
      • If the corr_id is still valid (consider prevN/nextN as well):
        • If the corr_id is one of the prevN/nextN correlation IDs and the segment has the cross-machine flag:
          • Add matched flag to the segment.
        • Call outputSeg to put the segment into the output segment list.


At step 156, the process continues, in accordance with an embodiment:


For each element in the matched corr_id set:


If expired:

    • Remove the element from the set.


Else:

    • Call cpi_get to get the addresses of all related segments from the index table (with prevN and nextN).
    • For each address:
      • Call retrieveData2 to get the segment at the address from the ring buffer.
      • If the corr_id is still valid (consider prevN/nextN as well):
        • If the element is not propagated, the corr_id is one of the prevN/nextN correlation IDs, and the segment has the cross-machine flag:
          • Add matched flag to the segment.
        • Call outputSeg to put the segment into the output segment list.


The above is provided by way of example to illustrate an embodiment of a process for conditional call path monitoring. In accordance with other embodiments, other types of steps can be used.
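Under several simplifying assumptions, the process of FIG. 2 can be sketched as a single class whose `run_once` method performs one period of steps 150 to 156. In this sketch the ring buffer is a Python list in which an overwritten slot holds `None`, so no purge range is needed; the index table is a dictionary from corr_id to addresses; the "previous/next N" correlation IDs are the neighbors of a corr_id in sorted order; a matched element expires after `ttl` seconds; and the thread remembers which addresses it has already output so that the periodic rescan in step 156 does not report a segment twice (the description does not say how repeats are handled). The sketch outputs every still-valid related segment and flags only prevN/nextN segments that carry the cross-machine flag. All class, method, and field names are hypothetical; comments mark where cpi_put, cpi_get, retrieveData1, retrieveData2, and outputSeg would be called.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Segment:
    corr_id: str
    matched: bool = False          # condition met (or flagged for propagation)
    prev_n: int = 0
    next_n: int = 0
    cross_machine: bool = False    # reported by BRIDGE or GWTDOMAIN


@dataclass
class Matched:
    prev_n: int
    next_n: int
    is_propagated: bool
    time_stamp: float


class ConditionalCallPathThread:
    """One-period sketch of the FIG. 2 process (hypothetical names)."""

    def __init__(self, ring: List[Optional[Segment]], batch: int = 16,
                 ttl: float = 60.0) -> None:
        self.ring = ring                          # address == list position
        self.read_ptr = 0
        self.batch = batch
        self.ttl = ttl
        self.index: Dict[str, List[int]] = {}     # index table: corr_id -> addresses
        self.matched: Dict[str, Matched] = {}     # matched corr_id set
        self.output: List[Segment] = []           # output segment list
        self.reported: Set[int] = set()

    def _retrieve_batch(self) -> List[Tuple[int, Segment]]:   # retrieveData1
        end = min(self.read_ptr + self.batch, len(self.ring))
        found: List[Tuple[int, Segment]] = []
        for addr in range(self.read_ptr, end):
            seg = self.ring[addr]
            if seg is not None:
                found.append((addr, seg))
        self.read_ptr = end
        return found

    def _output(self, addr: int, seg: Segment) -> None:       # outputSeg
        self.reported.add(addr)
        self.output.append(seg)

    def _report_related(self, corr_id: str, m: Matched) -> None:
        ids = sorted(set(self.index) | {corr_id})
        i = ids.index(corr_id)
        for cid in ids[max(0, i - m.prev_n): i + m.next_n + 1]:   # cpi_get
            for addr in self.index.get(cid, []):
                seg = self.ring[addr]                             # retrieveData2
                if seg is None or addr in self.reported:
                    continue              # overwritten, or already output
                if not m.is_propagated and cid != corr_id and seg.cross_machine:
                    seg.matched = True    # lets the SAM manager propagate cid
                self._output(addr, seg)

    def run_once(self, now: float) -> bool:
        """Run steps 150-156 once; False means step 152 would sleep."""
        segments = self._retrieve_batch()                          # step 150
        for addr, seg in segments:                                 # step 154
            if not seg.matched:
                self.index.setdefault(seg.corr_id, []).append(addr)  # cpi_put
            else:
                self._output(addr, seg)
                m = Matched(seg.prev_n, seg.next_n, False, now)
                self.matched.setdefault(seg.corr_id, m)
                self._report_related(seg.corr_id, m)
        for cid, m in list(self.matched.items()):                  # step 156
            if now - m.time_stamp > self.ttl:
                del self.matched[cid]
            else:
                self._report_related(cid, m)
        return bool(segments)

    def add_propagated(self, corr_id: str, prev_n: int, next_n: int,
                       now: float) -> None:
        """Record a (corr_id, prevN, nextN) propagated by the SAM manager."""
        self.matched.setdefault(corr_id, Matched(prev_n, next_n, True, now))


ring: List[Optional[Segment]] = [Segment("c1"), Segment("c2", cross_machine=True),
                                 Segment("c3"), Segment("c3", matched=True, prev_n=1)]
worker = ConditionalCallPathThread(ring)
worker.run_once(now=0.0)
print([s.corr_id for s in worker.output])  # ['c3', 'c2', 'c3']
```

A caller would invoke `run_once` in a loop and sleep for a short interval whenever it returns `False`, which corresponds to step 152.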


In accordance with an embodiment, the system gets call path segments from the output segment list and reports them to the SAM manager. The SAM manager then propagates the cross-machine condition-met correlation ID (corr_id, prevN, nextN) to all local monitor servers. On receiving the propagated (corr_id, prevN, nextN), a local monitor server adds it to its matched corr_id set.


In accordance with an embodiment, the condition-met call path segment includes various fields (matched flag, prevN, nextN, cross-machine flag). When the SAM manager receives a cross-machine condition-met call path segment, it propagates the correlation ID (corr_id, prevN, nextN) to all local monitor servers that are connected to the current SAM manager data server.
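The propagation rule can be sketched as follows: when the data server receives a segment that has both the matched flag and the cross-machine flag, it forwards (corr_id, prevN, nextN) to every connected local monitor server, which records it in its matched corr_id set as a propagated element. The classes are hypothetical, the transport between LMS and data server (e.g., HTTP) is reduced to direct method calls, and the rule that an existing entry is not overwritten is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Segment:
    corr_id: str
    matched: bool = False
    prev_n: int = 0
    next_n: int = 0
    cross_machine: bool = False


@dataclass
class LocalMonitorServer:
    name: str
    # Matched corr_id set: corr_id -> (prevN, nextN, is_propagated)
    matched: Dict[str, Tuple[int, int, bool]] = field(default_factory=dict)

    def on_propagated(self, corr_id: str, prev_n: int, next_n: int) -> None:
        # Keep an existing (e.g. locally matched) entry rather than overwrite it.
        self.matched.setdefault(corr_id, (prev_n, next_n, True))


@dataclass
class SamDataServer:
    lms_list: List[LocalMonitorServer] = field(default_factory=list)
    stored: List[Segment] = field(default_factory=list)

    def receive(self, segments: List[Segment]) -> None:
        for seg in segments:
            self.stored.append(seg)                   # persist for the console
            if seg.matched and seg.cross_machine:     # cross-machine condition met
                for lms in self.lms_list:
                    lms.on_propagated(seg.corr_id, seg.prev_n, seg.next_n)


lms_a, lms_b = LocalMonitorServer("A"), LocalMonitorServer("B")
server = SamDataServer([lms_a, lms_b])
server.receive([Segment("c1"),
                Segment("c2", matched=True, prev_n=1, next_n=2, cross_machine=True)])
print(lms_b.matched)  # {'c2': (1, 2, True)}
```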


The present invention may be conveniently implemented using one or more conventional general purpose or specialized digital computers, computing devices, machines, or microprocessors, including one or more processors, memory and/or computer readable storage media programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.


In some embodiments, the present invention includes a computer program product which is a non-transitory storage medium or computer readable medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.


The foregoing description of embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. The modifications and variations include any relevant combination of the disclosed features. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated.

Claims
  • 1. A system for conditional call path monitoring in a distributed transactional middleware environment, comprising: one or more computers including at least one microprocessor and a distributed transactional middleware environment provided thereon; and a system and application monitor manager that enables conditional call path monitoring.
  • 2. The system of claim 1, wherein the distributed transactional middleware environment includes a plurality of nodes, a plurality of agents associated with the plurality of nodes, and a cache for use in at least one of reporting or aggregation of call path metrics, including wherein an agent collects the call path metrics, and stores the call path metrics in the cache, indexed by correlation IDs, and whereupon a predefined condition being met at a participating node, the participating node propagates a corresponding correlation ID to other participating nodes, via a system and application monitor manager; and wherein the other participating nodes search for the correlation ID in the cache, and report, to the system and application monitor manager, metrics of call paths which meet the condition.
  • 3. The system of claim 2, wherein the distributed transactional middleware environment is a Tuxedo environment.
  • 4. The system of claim 3, wherein the system and application monitor manager is one of a Tuxedo System and Application Monitor (TSAM) component, or Tuxedo System and Application Monitor Plus (TSAM Plus) component.
  • 5. A method for conditional call path monitoring in a distributed transactional middleware environment, comprising: providing, at one or more computers including at least one microprocessor, a distributed transactional middleware environment; and providing a system and application monitor manager that enables conditional call path monitoring.
  • 6. The method of claim 5, wherein the distributed transactional middleware environment includes a plurality of nodes, a plurality of agents associated with the plurality of nodes, and a cache for use in at least one of reporting or aggregation of call path metrics, including wherein an agent collects the call path metrics, and stores the call path metrics in the cache, indexed by correlation IDs, and whereupon a predefined condition being met at a participating node, the participating node propagates a corresponding correlation ID to other participating nodes, via a system and application monitor manager; and wherein the other participating nodes search for the correlation ID in the cache, and report, to the system and application monitor manager, metrics of call paths which meet the condition.
  • 7. The method of claim 6, wherein the distributed transactional middleware environment is a Tuxedo environment.
  • 8. The method of claim 7, wherein the system and application monitor manager is one of a Tuxedo System and Application Monitor (TSAM) component, or Tuxedo System and Application Monitor Plus (TSAM Plus) component.
  • 9. A non-transitory computer readable storage medium, including instructions stored thereon which when read and executed by one or more computers cause the one or more computers to perform a method comprising: providing a distributed transactional middleware environment; and providing a system and application monitor manager that enables conditional call path monitoring.
  • 10. The non-transitory computer readable storage medium of claim 9, wherein the distributed transactional middleware environment includes a plurality of nodes, a plurality of agents associated with the plurality of nodes, and a cache for use in at least one of reporting or aggregation of call path metrics, including wherein an agent collects the call path metrics, and stores the call path metrics in the cache, indexed by correlation IDs, and whereupon a predefined condition being met at a participating node, the participating node propagates a corresponding correlation ID to other participating nodes, via a system and application monitor manager; and wherein the other participating nodes search for the correlation ID in the cache, and report, to the system and application monitor manager, metrics of call paths which meet the condition.
  • 11. The non-transitory computer readable storage medium of claim 10, wherein the distributed transactional middleware environment is a Tuxedo environment.
  • 12. The non-transitory computer readable storage medium of claim 11, wherein the system and application monitor manager is one of a Tuxedo System and Application Monitor (TSAM) component, or Tuxedo System and Application Monitor Plus (TSAM Plus) component.
  • 13. The system of claim 1, wherein the corresponding correlation ID is a unique identifier that represents a call path tree, and that is generated by a monitoring initiator plug-in.
  • 14. The system of claim 1, wherein the call path metrics include one or more of a service name, a process that sends the call path metrics, an IPC queue length, an execution time, a wait time, a CPU time, a message size, and an execution status.
  • 15. The system of claim 1, wherein the predefined condition is propagated along with the call path metrics, for use by one or more of the other participating nodes to evaluate the predefined condition.
  • 16. The method of claim 5, wherein the corresponding correlation ID is a unique identifier that represents a call path tree, and that is generated by a monitoring initiator plug-in.
  • 17. The method of claim 5, wherein the call path metrics include one or more of a service name, a process that sends the call path metrics, an IPC queue length, an execution time, a wait time, a CPU time, a message size, and an execution status.
  • 18. The method of claim 5, wherein the predefined condition is propagated along with the call path metrics, for use by one or more of the other participating nodes to evaluate the predefined condition.
  • 19. The non-transitory computer readable storage medium of claim 10, wherein the corresponding correlation ID is a unique identifier that represents a call path tree, and that is generated by a monitoring initiator plug-in.
  • 20. The non-transitory computer readable storage medium of claim 10, wherein the call path metrics include one or more of a service name, a process that sends the call path metrics, an IPC queue length, an execution time, a wait time, a CPU time, a message size, and an execution status.
CLAIM OF PRIORITY

This application claims the benefit of priority to International Application titled “SYSTEM AND METHOD FOR CONDITIONAL CALL PATH MONITORING IN A DISTRIBUTED TRANSACTIONAL MIDDLEWARE ENVIRONMENT”, International Application No. PCT/CN2017/071091, filed Jan. 13, 2017, which application is herein incorporated by reference in its entirety.

Continuations (1)
  • Parent: PCT/CN2017/071091 (Jan 2017, US)
  • Child: 15975557 (US)