The instant disclosure relates to resource management systems and methods.
Resource management systems (RMSs) monitor events and transactions that occur in computer resources of an enterprise and take actions to improve the performance and accountability of the enterprise. The computer resources typically include different types of servers, databases, telecommunications equipment, and other devices that perform particular functions in the enterprise. The computer resources are typically located in a data center. The servers that are found in a typical enterprise data center vary in type and include web servers, application servers, email servers, proxy servers, domain name system (DNS) servers, and other types of servers. By monitoring transactions as they occur in the enterprise, a RMS can determine whether resources are operating properly and efficiently, and if not, take actions to allocate or re-purpose resources in a way that increases the efficiency and productivity of the enterprise, and/or that enables a recovery to be made in the event that a resource failure has occurred.
A typical RMS monitors transactions being performed by computer resources of the enterprise to obtain measurements relating to their performance. These measurements are commonly referred to as metrics. A typical RMS includes a resource management server that runs a resource management software program that is designed to obtain and analyze particular metrics. The metrics that are monitored and acted upon by a RMS can typically be varied by making changes to the resource management software program. System-level metrics that are typically monitored include central processing unit (CPU) utilization, random access memory (RAM) usage, disk input/output (I/O) performance, and network I/O performance. Application-level metrics that are typically monitored include response time metrics, Structured Query Language (SQL) calls metrics, and Enterprise JavaBeans (EJB) calls metrics.
An example of the manner in which the CPU usage metric is monitored and acted upon by a typical RMS is as follows. For this example, it will be assumed that the enterprise includes a farm of application servers that perform operations associated with accounts payable tasks and a farm of application servers that perform operations associated with accounts receivable tasks. The RMS monitors transactions being performed on these servers and determines that the loads on the CPUs of the accounts payable servers are relatively low and that the loads on the CPU of the accounts receivable servers are relatively high. The relatively high CPU loads on the accounts receivable servers may result in the accounts receivable tasks being performed relatively slowly. The relatively low CPU loads on the accounts payable servers indicate that the accounts payable servers are being under-utilized. In this scenario, a typical RMS will determine that the loads on the CPUs of the accounts receivable servers are too high and that the accounts payable servers are being under-utilized. In response to this determination, the RMS will re-allocate the processing loads among the servers by re-purposing one or more of the accounts payable servers to be used in performing some share of the accounts receivable tasks.
An example of the manner in which an application-level metric is monitored and acted upon by a typical RMS is as follows. For this example, it is assumed that the enterprise is an E-commerce enterprise in which goods or services are sold and funds are transferred digitally online over a public network such as the Internet or over some private network to which users can obtain access. The checkout process is controlled by an application server that executes a software program that performs tasks associated with the checkout process. A different application server executes a software program that performs a verification process if, during the checkout process, the checkout application server detects that the user has entered a discount code. The user places items in an online shopping cart and attempts to check out by clicking on a submit button. The website, however, appears not to be responding. Consequently, the user becomes frustrated and decides to purchase the items on a different website. At a later point in time, the RMS traces the transaction and finds that the delay was caused by the verification process taking a very long time to verify the discount code. After further analysis, the RMS determines that a table that is used by the verification software program is missing an index, and that the missing index caused a delay in the verification process. The RMS then causes the index to be inserted into the table to prevent delays in the future.
RMSs generally may be classified as being one of two types, namely, (1) response-time RMSs or (2) call-analysis RMSs. In response-time RMSs, the only metrics that are monitored and analyzed are timing metrics. One timing metric that is often used measures the amount of time that passes between an instant in time when the user clicks a submit button on his or her web browser and an instant in time when the corresponding web server receives the submission. Another timing metric that is often used measures the amount of time that passes between an instant in time when the corresponding web server receives the submission and an instant in time when the corresponding application server receives the submission. Another timing metric that is often used measures the amount of time that passes from an instant in time when the corresponding application server receives the submission to an instant in time when the corresponding database server receives the submission. In other words, response-time RMSs monitor metrics relating to the timing of hops from one server to the next when servicing a transaction. However, response-time RMSs do not provide information relating to the underlying methods that are performed when servicing a transaction. Rather, the underlying methods are essentially “black boxes” in that the details associated with the performance of the methods are not provided.
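The hop-to-hop timing metrics described above amount to differences between consecutive arrival timestamps. The following is a minimal Java sketch of that arithmetic. The class name, the hop names, and the use of millisecond timestamps are illustrative assumptions and are not taken from any particular RMS.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: computes the latency of each hop from arrival timestamps.
final class HopTimingSketch {

    // hops[i] is the name of the i-th tier; arrivalMillis[i] is when the request reached it.
    // Returns latencies keyed "from->to", in hop order.
    static Map<String, Long> hopLatencies(String[] hops, long[] arrivalMillis) {
        if (hops.length != arrivalMillis.length) {
            throw new IllegalArgumentException("one timestamp is required per hop");
        }
        Map<String, Long> latencies = new LinkedHashMap<>();
        for (int i = 1; i < hops.length; i++) {
            long elapsed = arrivalMillis[i] - arrivalMillis[i - 1];
            latencies.put(hops[i - 1] + "->" + hops[i], elapsed);
        }
        return latencies;
    }
}
```

Such a computation reports how long each hop took, but, consistent with the description above, it reveals nothing about which methods ran inside each tier.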
In call-analysis RMSs, the metrics that are monitored and analyzed relate to measurements associated with the performance of methods that have been called during a transaction. These call metrics provide information about each method that has been called and about which method triggered any other method during the transaction. These types of RMSs are not used to monitor and manage resources in real-time, but are used to debug enterprise resources offline (i.e., in non-real-time). The reason for this is that monitoring call metrics in real-time will typically slow down the transaction, which degrades the experience for the user. Consequently, it is seen as impractical to implement call-analysis RMSs that monitor and analyze call metrics in real-time.
A Java enterprise resource management (JERM) system and method are provided. In accordance with an embodiment, the JERM system comprises at least a first server located on a client side of a network. The first server includes one or more processing devices configured to run at least a first application computer software program, at least a first metrics gatherer computer software program, at least a first metric serializer and socket generator computer software program, and at least a first JERM agent computer software program. The first metrics gatherer program monitors and gathers at least a first metric relating to one or more transactions performed by the first application program at run-time while the first application program is running. The first metric serializer and socket generator program performs a serialization algorithm that converts the gathered first metric into a first serial byte stream and generates a first communications socket. The first server includes at least a first input/output (I/O) communications port configured to implement a client-side end of the first communications socket for outputting the first serial byte stream from the first server. The first serial byte stream is communicated over the first communications socket between the I/O communications port of the first server and an I/O communications port of a JERM management server located on a server side of the network.
The JERM method, in accordance with an embodiment, comprises configuring at least a first server on a client side of a network to run at least a first application computer software program, at least a first metrics gatherer computer software program, at least a first metric serializer and socket generator computer software program, and at least a first JERM agent computer software program. The method further includes running the first application program on the first server to cause at least a first transaction to be performed. While the first application program is running, the first metrics gatherer program runs to monitor and gather at least a first metric relating to at least the first transaction. The method further includes running the first metric serializer and socket generator program to perform a serialization algorithm that converts the gathered first metric into a first serial byte stream and generates a first communications socket over which the serial byte stream is communicated to a server side of the network.
In accordance with another embodiment, a computer-readable medium (CRM) is provided that has instructions stored thereon for performing a JERM method on a client side of a network. The instructions that are stored on the CRM comprise at least a first application computer code portion, at least a first metrics gatherer computer code portion, at least a first metric serializer and socket generator computer code portion, and at least a first JERM agent computer code portion. Running the first application computer code portion on a first server located on the client side of the network causes at least a first transaction to be performed. While the first application computer code portion is running on the first server, the first metrics gatherer computer code portion runs on the first server to monitor and gather at least a first metric relating to at least the first transaction performed by the first server. Running the first metric serializer and socket generator computer code portion on the first server causes a serialization algorithm and a socket generation algorithm to be performed. The serialization algorithm converts the gathered first metric into a first serial byte stream, and the socket generation algorithm causes a first communications socket to be generated over which the serial byte stream is communicated to a server side of the network. When the first JERM agent computer code portion runs on the first server, it detects whether one or more commands have been received by the first JERM agent computer code portion from the server side of the network. If the first JERM agent computer code portion detects that one or more commands have been received from the server side, the first JERM agent computer code portion causes the first server to perform one or more actions consistent with the one or more commands received from the server side of the network. The production server may be the first server or some other server located on the client side of the network.
These and other features and advantages will become apparent from the following description, drawings and claims.
In accordance with an embodiment, a Java enterprise resource management (JERM) system is provided that combines attributes of response-time RMSs and call-analysis RMSs to allow both timing metrics and call metrics to be monitored in real-time, and which can cause appropriate actions to be taken in real-time. The JERM system provides a level of granularity with respect to the monitoring of methods that are triggered during a transaction that is equivalent to or better than that which is currently provided in the aforementioned known call-analysis RMSs. In addition, the JERM system also provides information associated with the timing of hops that occur between servers, and between and within applications, during a transaction. Because all of this information is obtained in real-time, the JERM system is able to respond in real-time, or near real-time, to cause resources to be allocated or re-allocated in a way that provides improved efficiency and productivity, and in a manner that enables the enterprise to quickly recover from resource failures. In addition, the JERM system is a scalable solution that can be widely implemented with relative ease and that can be varied with relative ease in order to meet a wide variety of implementation needs. The following description of the drawings describes illustrative embodiments of the JERM system and method.
The application program 2 that is run by the Production Server 1 may be virtually any Java Enterprise Edition (Java EE) program that performs one or more methods associated with a transaction, or all methods associated with a transaction. During run-time while the application program 2 is being executed, the metrics gathering program 10 monitors the execution of the application program 2 and gathers certain metrics. The metrics that are gathered depend on the manner in which metrics gathering program 10 is configured. A user interface (UI) 90 is capable of accessing the production server 1 to modify the configuration of the metrics gathering program 10 in order to add, modify or remove metrics. Typical system-level metrics that may be gathered include CPU utilization, RAM usage, disk I/O performance, and network I/O performance. Typical application-level metrics that may be gathered include response time metrics, SQL call metrics, and EJB call metrics. It should be noted, however, that the disclosed system and method are not limited with respect to the type or number of metrics that may be gathered by the metrics gathering program 10.
In the illustrated embodiment, metrics that are gathered by the metrics gathering program 10 are provided to the metrics serializer and socket generator (MSSG) software program 20. The MSSG program 20 serializes each metric into a serial byte stream and generates a communications socket that will be used to communicate the serial byte stream to the JERM Management Server 40 located on the server side 120 of the JERM system 100. The serial byte stream is then transmitted over the socket 80 to the JERM Management Server 40. The socket 80 is typically a Transmission Control Protocol/Internet Protocol (“TCP/IP”) socket that provides a bidirectional communications link between an I/O port of the Production Server 1 and an I/O port of the JERM Management Server 40.
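The serialization and socket steps performed by the MSSG program 20 can be sketched in Java as follows. This is only an illustration under stated assumptions: the class names GatheredMetric and MetricSenderSketch, the metric fields, and the choice of standard Java object serialization are hypothetical and are not specified by the description above, which does not name a particular serialization algorithm.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.Socket;

// Hypothetical metric type; the field names are illustrative assumptions.
final class GatheredMetric implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final double value;
    final long timestampMillis;

    GatheredMetric(String name, double value, long timestampMillis) {
        this.name = name;
        this.value = value;
        this.timestampMillis = timestampMillis;
    }
}

final class MetricSenderSketch {

    // Converts one gathered metric into a serial byte stream
    // (assumption: standard Java object serialization is used).
    static byte[] serialize(GatheredMetric metric) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(metric);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Opens a TCP/IP socket to the management server and writes the byte stream.
    // The host and port are supplied by the caller; none are given in the description.
    static void send(String host, int port, byte[] payload) {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write(payload);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a new socket is opened for each payload to keep the example short; a long-lived bidirectional socket, as described above, would instead be kept open and reused.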
In the illustrated embodiment, the JERM Management Server 40 runs various computer software programs, including, but not limited to, a metrics deserializer computer software program 50, a rules manager computer software program 60, and an actions manager computer software program 70. The metrics deserializer program 50 receives the serial byte stream communicated via the socket 80 and performs a deserialization algorithm that deserializes the serial byte stream to produce a deserialized metric. The deserialized metric comprises parallel bits or bytes of data that represent the metric gathered on the client side 110 by the metrics gathering program 10. The deserialized metric is then received by the rules manager program 60. The rules manager program 60 analyzes the deserialized metric and determines whether a rule exists that is to be applied to the deserialized metric. If a determination is made by the rules manager program 60 that such a rule exists, the rules manager program 60 applies the rule to the deserialized metric and makes a decision based on the application of the rule. The rules manager program 60 then sends the decision to the actions manager program 70. The actions manager program 70 analyzes the decision and decides if one or more actions are to be taken. If so, the actions manager program 70 causes one or more actions to be taken by sending a command to the Production Server 1 on the client side 110, or to some other server (not shown) on the client side 110. As stated above, there may be multiple instances of the Production Server 1 on the client side 110, so the action that is taken may be directed at a different server (not shown) on the client side 110.
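The server-side flow described above (deserialize, apply a rule, decide on an action) can be sketched in Java as follows. The class names, the threshold values, and the command strings are hypothetical assumptions chosen for illustration; the description above does not define a rule format or a command format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Optional;

// Hypothetical metric type carrying a CPU load as a fraction between 0 and 1.
final class CpuLoadMetric implements Serializable {
    private static final long serialVersionUID = 1L;
    final String serverId;
    final double cpuLoad;

    CpuLoadMetric(String serverId, double cpuLoad) {
        this.serverId = serverId;
        this.cpuLoad = cpuLoad;
    }
}

// Possible outcomes of applying the (assumed) CPU load rule.
enum RuleDecision { LOAD_TOO_HIGH, LOAD_TOO_LOW, WITHIN_LIMITS }

final class ManagementPipelineSketch {
    // Assumed threshold limits; real limits would be defined by the applicable rule.
    static final double HIGH_THRESHOLD = 0.85;
    static final double LOW_THRESHOLD = 0.20;

    // Helper that stands in for the client side so the sketch is self-contained.
    static byte[] serialize(CpuLoadMetric metric) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(metric);
        } catch (IOException e) {
            throw new IllegalStateException("could not serialize metric", e);
        }
        return buffer.toByteArray();
    }

    // Role of the metrics deserializer: byte stream back to a metric object.
    static CpuLoadMetric deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (CpuLoadMetric) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not deserialize metric", e);
        }
    }

    // Role of the rules manager: apply a threshold rule and produce a decision.
    static RuleDecision applyRule(CpuLoadMetric metric) {
        if (metric.cpuLoad > HIGH_THRESHOLD) return RuleDecision.LOAD_TOO_HIGH;
        if (metric.cpuLoad < LOW_THRESHOLD) return RuleDecision.LOAD_TOO_LOW;
        return RuleDecision.WITHIN_LIMITS;
    }

    // Role of the actions manager: turn a decision into a command, if any is needed.
    static Optional<String> decideCommand(CpuLoadMetric metric, RuleDecision decision) {
        switch (decision) {
            case LOAD_TOO_HIGH: return Optional.of("SCALE_OUT " + metric.serverId);
            case LOAD_TOO_LOW:  return Optional.of("SCALE_IN " + metric.serverId);
            default:            return Optional.empty();
        }
    }
}
```

An empty result from decideCommand corresponds to the case in which the actions manager decides that no action is to be taken.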
In accordance with an embodiment, each Production Server 1 on the client side 110 runs the JERM agent software program 30. For ease of illustration, only a single Production Server 1 is shown in
An example of an action that scales out another physical instance is an action that causes another Production Server 1 to be brought online or to be re-purposed. By way of example, without limitation, in the scenario given above in which the processing loads on the CPUs of the accounts receivable servers are too high, the rules manager program 60 may process the respective CPU load metrics for the respective accounts receivable servers, which correspond to Production Servers 1, and decide that the CPU loads are above a threshold limit defined by the associated rule. The rules manager program 60 will then send this decision to the actions manager program 70. The actions manager program 70 will then send commands to one or more JERM agent programs 30 running on one or more accounts payable servers, which also correspond to Production Servers 1, instructing the JERM agent programs 30 to cause their respective servers to process a portion of the accounts receivable processing loads. The actions manager program 70 also sends commands to one or more JERM agent programs 30 of one or more of the accounts receivable servers instructing those agents 30 to cause their respective accounts receivable servers to offload a portion of their respective accounts receivable processing loads to the accounts payable servers.
An example where the action taken by the actions manager program 70 is the scaling out of one or more virtual instances is as follows. Assuming that the application program 2 running on the Production Server 1 is a particular application program, such as the checkout application program described above, the actions manager program 70 may send a command to the JERM agent program 30 that instructs the JERM agent program 30 to cause the Production Server 1 to invoke another instance of the checkout application program so that there are now two instances of the checkout application program running on the Production Server 1.
In the same way that the actions manager program 70 scales out additional physical and virtual instances, the actions manager program 70 can reduce the number and types of physical and virtual instances that are scaled out at any given time. For example, if the rules manager program 60 determines that the CPU loads on a farm of accounts payable servers are low (i.e., below a threshold limit), indicating that the servers are being under-utilized, the actions manager program 70 may cause the processing loads on one or more of the accounts payable Production Servers 1 of the farm to be offloaded onto one or more of the other accounts payable Production Servers 1 of the farm to enable the Production Servers 1 from which the loads have been offloaded to be turned off or re-purposed. Likewise, the number of virtual instances that are running can be reduced based on decisions that are made by the rules manager program 60. For example, if the Production Server 1 is running multiple Java virtual machines (JVMs), the actions manager 70 may reduce the number of JVMs that are running on the Production Server 1. The specific embodiments described above are intended to be exemplary, and the disclosed system and method should not be interpreted as being limited to these embodiments or the descriptions thereof.
The application program 240 may be any program that performs one or more methods associated with a transaction, or that performs all methods associated with a transaction. During run-time while the application program 240 is being executed, the metrics gathering program 250 monitors the execution of the application program 240 and gathers certain metrics. The metrics that are gathered depend on the manner in which the metrics gathering program 250 is configured. In accordance with this embodiment, the metrics gathering program 250 gathers metrics by aspecting JBoss interceptors. JBoss is an application server program for use with Java EE and EJBs. An EJB is an architecture for creating program components written in the Java programming language that run on the server in a client/server model. An interceptor, as that term is used herein, is a programming construct that is inserted between a method and an invoker of the method, i.e., between the caller and the callee. The metrics gathering program 250 injects, or aspects, JBoss interceptors into the application program 240. The JBoss interceptors are configured such that, when the application program 240 runs at run-time, timing metrics and call metrics are gathered by the interceptors. This feature enables the metrics to be collected in real-time without significantly affecting the performance of the application program 240.
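The interceptor idea described above, a construct placed between the caller and the callee that records timing and call information, can be illustrated with a plain Java dynamic proxy. This sketch deliberately does not use the JBoss interceptor API; java.lang.reflect.Proxy is swapped in as a stand-in so the example runs without an application server. The CheckoutService interface and the CallRecord and TimingInterceptor classes are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical business interface standing in for part of an application program.
interface CheckoutService {
    String checkout(String cartId);
}

// One gathered call metric: which method ran and how long it took.
final class CallRecord {
    final String method;
    final long elapsedNanos;

    CallRecord(String method, long elapsedNanos) {
        this.method = method;
        this.elapsedNanos = elapsedNanos;
    }
}

// Sits between the caller and the callee and records a metric for every call.
final class TimingInterceptor implements InvocationHandler {
    private final Object target;
    private final List<CallRecord> records = Collections.synchronizedList(new ArrayList<>());

    TimingInterceptor(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        try {
            return method.invoke(target, args); // forward the call to the real callee
        } catch (InvocationTargetException e) {
            throw e.getCause();                 // rethrow the callee's own exception
        } finally {
            records.add(new CallRecord(method.getName(), System.nanoTime() - start));
        }
    }

    // Returns a snapshot of the metrics gathered so far.
    List<CallRecord> records() {
        synchronized (records) {
            return new ArrayList<>(records);
        }
    }

    // Creates a proxy that implements iface and routes every call through the interceptor.
    static <T> T wrap(Class<T> iface, TimingInterceptor interceptor) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, interceptor);
        return iface.cast(proxy);
    }
}
```

Because the caller only sees the CheckoutService interface, the wrapped object can be substituted for the original without changing the application code, which is the property the description attributes to interceptors.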
A UI 410, which is typically a graphical UI (GUI), enables a user to interact with the metrics gatherer program 250 to add, modify or remove metrics so that the user can easily change the types of metrics that are being monitored and gathered. Typical system-level metrics that may be gathered include CPU utilization, RAM usage, disk I/O performance, and network I/O performance. Typical application-level metrics that may be gathered include response time metrics, SQL call metrics, and EJB call metrics. It should be noted, however, that the disclosed system and method are not limited with respect to the type or number of metrics that may be gathered by the metrics gathering program 250.
The client MBean program 260 receives the metrics gathered by the JBoss interceptors of the metrics gathering program 250 and performs a serialization algorithm that converts the metrics into a serial byte stream. An MBean is an object in the Java programming language that is used to manage applications, services or devices, depending on the class of the MBean that is used. The client MBean program 260 also sets up an Internet socket 280 for the purpose of communicating the serial byte stream from the client side 210 to the server side 220. The metrics are typically sent from the client side 210 to the server side 220 at the end of a transaction that is performed by the application program 240.
The server side 220 includes a JERM Management Server 310, which is configured to run a server MBean computer software program 320, a JERM rules manager computer software program 330, and a JERM actions manager computer software program 370. The server MBean program 320 communicates with the client MBean program 260 via the socket 280 to receive the serial byte stream. The server MBean program 320 performs a deserialization algorithm that deserializes the serial byte stream to convert the byte stream into parallel bits or bytes of data representing the metrics. The JERM rules manager program 330 analyzes the deserialized metric and determines whether a rule exists that is to be applied to the deserialized metric. If a determination is made by the rules manager program 330 that such a rule exists, the rules manager program 330 applies the rule to the deserialized metric and makes a decision based on the application of the rule. The rules manager program 330 then sends the decision to a JERM rules manager proxy computer software program 360, which formats the decision into a web service request and sends the web service request to the JERM actions manager program 370.
The JERM actions manager program 370 is typically implemented as a web service that is requested by the JERM rules manager proxy program 360. The JERM actions manager program 370 includes an actions decider computer program 380 and an instance manager program 390. The actions decider program 380 analyzes the request and decides if one or more actions are to be taken. If so, the actions decider program 380 sends instructions to the instance manager program 390 indicating one or more actions that need to be taken. In some embodiments, the instance manager program 390 has knowledge of all of the physical and virtual instances that are currently running on the client side 210, and therefore can make the ultimate decision on the type and number of physical and/or virtual instances that are to be scaled out and/or scaled in on the client side 210. Based on the decision that is made by the instance manager program 390, the JERM actions manager program 370 sends instructions via one or more of the communications links 330 to one or more corresponding JERM agent programs 270 of one or more of the Production Servers 230 on the client side 210.
Each Production Server 230 on the client side 210 runs a JERM agent program 270. For ease of illustration, only a single Production Server 230 is shown in
The UI 410 also connects to the JERM rules manager program 330 and to the JERM actions manager program 370. In accordance with this embodiment, the JERM rules manager program 330 is actually a combination of multiple programs that operate in conjunction with one another to perform various tasks. One of these programs is a rules builder program 350. A user interacts via the UI 410 with the rules builder program 350 to cause rules to be added, modified or removed from a rules database, which is typically part of the rules builder program 350, but may be external to the rules builder program 350. This feature allows a user to easily modify the rules that are applied by the JBoss rules applier program 340.
The connection between the UI 410 and the JERM actions manager program 370 enables a user to add, modify or remove the types of actions that the JERM actions manager 370 will cause to be taken. This feature facilitates the scalability of the JERM system 200. Over time, changes will typically be made to the client side 210. For example, additional resources (e.g., servers, application programs and/or devices) may be added to the client side 210 as the enterprise grows. Also, new resources may be substituted for older resources, for example, as resources wear out or better performing resources become available. Through interaction between the UI 410 and the JERM actions manager program 370, changes can be made to the instance manager program 390 to reflect changes that are made to the client side 210. By way of example, without limitation, the instance manager program 390 typically will maintain one or more lists of (1) the total resources by type, network address and purpose that are employed on the client side 210, (2) the types, purposes and addresses of resources that are available at any given time, and (3) the types, purposes and addresses of resources that are in use at any given time. As resource changes are made on the client side 210, a user can update the lists maintained by the instance manager program 390 to reflect these changes.
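The lists that the instance manager program 390 is described as maintaining can be pictured as a small registry. The Java sketch below is an assumption-laden illustration: the ManagedResource and InstanceRegistrySketch names, the string-typed fields, and the single in-use flag are hypothetical and are not prescribed by the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical record of one client-side resource.
final class ManagedResource {
    final String type;     // e.g. "application-server"
    final String address;  // network address
    final String purpose;  // e.g. "accounts-payable"
    boolean inUse;

    ManagedResource(String type, String address, String purpose, boolean inUse) {
        this.type = type;
        this.address = address;
        this.purpose = purpose;
        this.inUse = inUse;
    }
}

// Keeps the total list and derives the "available" and "in use" views from it.
final class InstanceRegistrySketch {
    private final List<ManagedResource> resources = new ArrayList<>();

    // Called when a resource is added on the client side.
    void add(ManagedResource resource) {
        resources.add(resource);
    }

    // Called when a resource is retired; returns true if an entry was removed.
    boolean remove(String address) {
        return resources.removeIf(r -> r.address.equals(address));
    }

    // List (1): all resources employed on the client side.
    List<ManagedResource> all() {
        return new ArrayList<>(resources);
    }

    // List (2): resources that are currently available.
    List<ManagedResource> available() {
        return resources.stream().filter(r -> !r.inUse).collect(Collectors.toList());
    }

    // List (3): resources that are currently in use.
    List<ManagedResource> inUse() {
        return resources.stream().filter(r -> r.inUse).collect(Collectors.toList());
    }
}
```

Deriving the available and in-use views from one underlying list is a design choice of this sketch; it avoids the three lists drifting out of step when a user updates them.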
Without limitation, some of the important features that enable the JERM system 200 to provide improved performance over known RMSs of the type described above include: (1) the use of interceptors by the metrics gatherer program 250 to gather metrics without affecting the performance of a transaction while it is being performed by the application program 240; (2) the use of the client MBean program 260 to convert the metrics into serial byte streams and send the serial byte streams over a TCP/IP socket 280 to the server side 220; and (3) the use of the server MBean program 320 to deserialize the byte streams received over the socket 280. These features enable the JERM rules manager program 330 to quickly apply rules to the metrics as they are gathered in real-time and enable the JERM actions manager 370 to take actions in real-time, or near real-time, to allocate and/or re-purpose resources on the client side 210.
Another feature of some embodiments is that the metrics gatherer program 250 can be easily modified by a user, e.g., via the UI 410. Such modifications enable the user to update and/or change the types of metrics that are being monitored by the metrics gatherer program 250. This feature provides great flexibility with respect to the manner in which resources are monitored, which, in turn, provides great flexibility in deciding actions that need to be taken to improve performance on the client side 210 and taking those actions.
Another feature present in some embodiments is that certain functionality on the client side 210 and on the server side 220 is implemented with a client-side work chain and with a server-side work chain, respectively. For example, in one embodiment, the client-side work chain comprises only the functionality that performs the serialization and socket generation programs that are wrapped in the client MBean 260. In one embodiment, the server-side work chain comprises the functionality for performing the socket communication and deserialization algorithms that are wrapped in the server MBean 320, and the functionality for performing the algorithms of the rules manager program 330. These work chains operate like assembly lines, and parts of the work chains can be removed or altered to change the behavior of the JERM system 200 without affecting the behavior of the application program 240. Essentially, the work chains are configured in XML, and therefore, changes can be made to the work chains in XML, which tends to be an easier task than modifying programs written in other types of languages which are tightly coupled.
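The assembly-line behavior of a work chain can be sketched in Java as an ordered list of stages, each of which receives the output of the previous one. This is not the XML configuration itself, and the WorkChainSketch class, its String payload type, and the stage bodies are hypothetical simplifications used only to show that a stage can be added without touching the other stages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical work chain: stages run in the order in which they were appended.
final class WorkChainSketch {
    private final List<UnaryOperator<String>> stages = new ArrayList<>();

    // Appends a stage to the end of the chain and returns the chain for fluent use.
    WorkChainSketch append(UnaryOperator<String> stage) {
        stages.add(stage);
        return this;
    }

    // Passes the payload through every stage in order.
    String process(String payload) {
        String current = payload;
        for (UnaryOperator<String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }
}

final class WorkChainDemo {
    public static void main(String[] args) {
        List<String> auditLog = new ArrayList<>();

        // Placeholder stages; real stages would serialize and transmit metrics.
        WorkChainSketch chain = new WorkChainSketch()
                .append(p -> "serialized(" + p + ")")
                .append(p -> { auditLog.add("saw " + p); return p; }) // added audit stage
                .append(p -> "sent(" + p + ")");

        System.out.println(chain.process("metric"));
        System.out.println(auditLog);
    }
}
```

The audit stage passes its input through unchanged, so inserting or removing it alters what is logged but not what the remaining stages produce.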
For example, the following XML code corresponds to the client-side work chain in accordance with the embodiment referred to above in which the client-side work chain only comprises the functionality corresponding to the serialization and socket generation programs that are wrapped in the client MBean 260.
The client-side work chain can be easily modified to include an audit algorithm that logs information to a remote log identifying any processes that have interacted with the data being processed through the work chain. Such a modification may be made by adding the following audit <queue> fragment to the XML code listed above:
Consequently, in accordance with this example, the XML code for the entire client-side work chain would look as follows:
Likewise, the rules builder program 350 can be easily modified by a user, e.g., using the UI 410. This enables a user to easily make changes to the JERM rules manager program 330. Additionally, the entire behavior of the JERM Management Server 310 can be modified by simply modifying XML code of the server-side work chain. Such ability enhances flexibility, ease of use, and scalability.
For example, an archiver computer software program (not shown) could be added to the JERM management server 310 to perform archiving tasks, i.e., logging of metrics data. To accomplish this, a <queue> fragment similar to the audit <queue> fragment that was added above to the client-side work chain is added to the server-side work chain at a location in the work chain following the rules manager code represented by block 330 in
The combination of all of these features makes the JERM system 200 a superior RMS over known RMSs in that the JERM system 200 has improved scalability, improved flexibility, improved response time, improved metrics monitoring granularity, and improved action taking ability over what is possible with known RMSs. As indicated above, the JERM system 200 is capable of monitoring, gathering, and acting upon both timing metrics and call metrics, which, as described above, is generally not possible with existing RMSs. As described above, existing RMSs tend to only monitor, gather, and act upon either timing metrics or call metrics. In addition, existing RMSs that monitor, gather, and act upon call metrics generally do not operate in real-time because doing so would adversely affect the performance of the application program that is performing a given transaction. By contrast, not only is the JERM system 200 capable of monitoring, gathering, and acting upon timing metrics and call metrics, but it is capable of doing so in real-time, or near real-time.
As indicated above with reference to
As described above with reference to
It should be noted that the disclosed system and method have been described with reference to illustrative embodiments to demonstrate principles and concepts, and features that may be advantageous in some embodiments. The disclosed system and method are not intended to be limited to these embodiments, as will be understood by persons of ordinary skill in the art in view of the description provided herein. A variety of modifications can be made to the embodiments described herein, and all such modifications are within the scope of the instant disclosure, as will be understood by persons of ordinary skill in the art.