Benefit is claimed under 35 U.S.C. 119(a)-(d) to Foreign Application Serial No. 202241021896 filed in India entitled “PROCESS TREE-BASED PROCESS MONITORING IN ENDPOINTS”, on Apr. 12, 2022, by VMware, Inc., which is herein incorporated in its entirety by reference for all purposes.
The present disclosure relates to computing environments, and more particularly to methods, techniques, and systems for monitoring processes in an endpoint based on a process tree.
In application/operating system (OS) monitoring environments, a management node that runs a monitoring tool may communicate with multiple endpoints to monitor the endpoints. For example, an endpoint may be implemented in a physical computing environment, a virtual computing environment, or a cloud computing environment. Further, the endpoints may execute different applications via virtual machines (VMs), physical host computing systems, containers, and the like. In such environments, the management node may communicate with the endpoints to collect performance data/metrics (e.g., application metrics, operating system metrics, and the like) from the underlying operating system and/or services on the endpoints for storage and performance analysis (e.g., to detect and diagnose issues).
The drawings described herein are for illustration purposes and are not intended to limit the scope of the present subject matter in any way.
Examples described herein may provide an enhanced computer-based and/or network-based method, technique, and system to monitor processes in an endpoint based on a process tree in a computing environment. The computing environment may be a physical computing environment (e.g., an on-premises enterprise computing environment or a physical data center), a virtual computing environment (e.g., a cloud computing environment, a virtualized environment, and the like), or a hybrid of both.
The virtual computing environment may be a pool or collection of cloud infrastructure resources designed for enterprise needs. The resources may be a processor (e.g., central processing unit (CPU)), memory (e.g., random-access memory (RAM)), storage (e.g., disk space), and networking (e.g., bandwidth). Further, the virtual computing environment may be a virtual representation of the physical data center, complete with servers, storage clusters, and networking components, all of which may reside in a virtual space being hosted by one or more physical data centers. The virtual computing environment may include multiple physical computers executing different workloads (e.g., virtual machines, containers, and the like). Such workloads may execute different types of applications.
The paragraphs [0011] to [0016] are an overview of endpoint performance monitoring in computing environments, existing methods to monitor performance of the endpoint, and drawbacks associated with the existing methods. Performance monitoring of endpoints (e.g., physical host computing systems, virtual machines, software defined data centers (SDDCs), containers, and/or the like) has become increasingly important because performance monitoring may aid in troubleshooting (e.g., to rectify abnormalities or shortcomings, if any) the endpoints, provide better health of data centers, analyse the cost, capacity, and/or the like. An example performance monitoring tool or application or platform may be VMware® vRealize Operations (vROps), VMware® Wavefront™, Grafana, and the like.
Further, the endpoints may include monitoring agents (e.g., Telegraf™, Collectd, Micrometer, and the like) to collect the performance metrics from the respective endpoints and provide, via a network, the collected performance metrics to a remote collector (e.g., an application remote collector (ARC) or a cloud proxy). Furthermore, the remote collector may receive the performance metrics from the monitoring agents and transmit the performance metrics to the monitoring tool for metric analysis. A remote collector may refer to a service/program that is installed in an additional cluster node (e.g., a virtual machine). The remote collector may allow the monitoring tool (e.g., vROps Manager) to gather objects into the remote collector's inventory for monitoring purposes. The remote collectors collect the data from the endpoints and then forward the data to the management node that executes the monitoring tool. For example, remote collectors may be deployed at remote location sites while the monitoring tool may be deployed at a primary location.
Furthermore, the monitoring tool may receive the performance metrics, analyse the received performance metrics, and display the analysis in the form of dashboards, for instance. The displayed analysis may facilitate visualizing the performance metrics and diagnosing a root cause of issues, if any. In some computing environments, the monitoring tools (e.g., vROps) may be deployed and run in an on-premises platform to collect data from the endpoints via the remote collectors. The term “on-premises” may refer to a software and hardware infrastructural setup (e.g., associated with the monitoring tools) deployed and running from within the confines of an organization/enterprise.
In other computing environments, the monitoring tools (e.g., vROps) may be deployed and run in cloud platforms (e.g., Software as a service (SaaS) platforms) to collect data from the endpoints via cloud proxies. SaaS is a software distribution model in which a cloud provider hosts applications and makes the applications available to end users over the Internet. In an example on-premises platform, an application remote collector (ARC) or a cloud proxy is a type of remote collector that the monitoring tool (e.g., vROps) uses to collect metrics of applications running in endpoints (e.g., virtual machines) using monitoring agents. In an example SaaS platform, the cloud proxy is a type of remote collector that the monitoring tool (e.g., vROps) uses to collect metrics of applications running in endpoints (e.g., virtual machines) using monitoring agents.
In such scenarios, the remote collector (e.g., the ARC or the cloud proxy) with the help of a custom monitoring agent (e.g., Telegraf agent) supports specific applications to be monitored. Application monitoring with the help of the custom monitoring agent may ensure that the applications maintain levels of performance needed to support business outcomes. For example, the custom monitoring agent may obtain performance metrics for certain enterprise applications (e.g., curated applications supported by the monitoring agent) running on endpoints. However, the custom monitoring agent may not support other non-curated applications running on the endpoints. Thus, the customer may have to switch to an open-source monitoring agent for monitoring the non-curated applications.
Further, the custom monitoring agent may not support monitoring custom Linux processes (e.g., instances of executing a program or command on Linux® operating system), custom Windows services associated with Microsoft Windows® operating system, or both, that run on the endpoint. Furthermore, the custom monitoring agent may not be able to provide insight into parent and child processes associated with the applications. The information associated with the parent and child processes may facilitate debugging of memory, central processing unit (CPU), and/or availability issues of the applications. Moreover, even though an existing monitoring approach generates an alert when an issue occurs, a user may have to manually check which process of the application is causing the issue.
Examples described herein may provide an endpoint to enable a monitoring agent to monitor processes in the endpoint based on a process tree. The endpoint may include a monitoring agent and a process tree generation unit in communication with the monitoring agent. During operation, the process tree generation unit may receive a command to monitor an input process running in the endpoint. Further, the process tree generation unit may download a process tree creation script from a remote collector upon receiving the command. Furthermore, the process tree generation unit may execute the process tree creation script to generate a configuration file for the monitoring agent. The configuration file may include a process tree indicating a hierarchical relationship of the input process with other processes. Further, the process tree generation unit may enable the monitoring agent to monitor the processes in the process tree based on the configuration file.
Examples described herein generate the process tree/process map to monitor the parent and child processes within a boundary which affects the performance of the input process. Further, examples described herein enable granular monitoring of any application process and/or service, and provide a performance peek into any running process/service in the endpoint.
In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of the present techniques. It will be apparent, however, to one skilled in the art that the present apparatus, devices, and systems may be practiced without these specific details. Reference in the specification to “an example” or similar language means that a particular feature, structure, or characteristic described is included in at least that one example, but not necessarily in other examples.
System Overview and Examples of Operation
As shown in
Further, system 100 includes a management node 114 to manage the data center. For example, management node 114 may execute centralized management services that may be interconnected to manage the resources centrally in the virtualized computing environment. An example centralized management service may be a part of vCenter Server™ and vSphere® program products, which are commercially available from VMware. In an example, management node 114 may be communicatively connected to the data center via a network to manage the data center. An example network can be a managed Internet protocol (IP) network administered by a service provider. For example, the network may be implemented using wireless protocols and technologies, such as Wi-Fi, WiMAX, and the like. In other examples, the network can also be a packet-switched network such as a local area network, wide area network, metropolitan area network, Internet network, or other similar type of network environment. In yet other examples, the network may be a fixed wireless network, a wireless local area network (LAN), a wireless wide area network (WAN), a personal area network (PAN), a virtual private network (VPN), intranet or other suitable network system and includes equipment for receiving and transmitting signals.
As shown in
As shown in
Considering an example of an on-premises platform, second endpoint 106 may host monitoring agent 108 (e.g., Telegraf agent) for metric collection, a service discovery agent (e.g., a unified collection proxy (UCP) minion agent) for service discovery, and a configuration manager (e.g., a Salt minion) for control actions. The Telegraf agent and the UCP minion agent of the data plane may publish metrics to the EMQTT message broker running in the ARC. Further, the Salt minion of the control plane may communicate with a Salt master running in the ARC. Further, control commands such as updating the agents, starting/stopping the agents, and the like may be carried out via the Salt minions upon the request of the Salt master.
In an example SaaS platform, second endpoint 106 may host monitoring agent 108 (e.g., Telegraf Agent) for metric collection, a UCP minion agent for service discovery, and a Salt minion for control actions. Further, the Telegraf agent and the UCP minion of the data plane may publish metrics to the Apache HTTPD web server running in the cloud proxy. Furthermore, the Salt minion of the control plane may communicate with the Salt master running in the cloud proxy.
Further, second endpoint 106 includes process tree generation unit 110 in communication with monitoring agent 108. During operation, process tree generation unit 110 may receive a command to monitor an input process running in second endpoint 106. In an example, process tree generation unit 110 may receive the command to monitor an operating system process (e.g., a Linux process), an operating system service (e.g., Windows service), or both running in the second endpoint. In another example, process tree generation unit 110 may receive the command to monitor the process or service associated with an application running in the second endpoint. Upon receiving the command, process tree generation unit 110 may download a process tree creation script 112 from remote collector 104 or from an external server.
Furthermore, process tree generation unit 110 may execute process tree creation script 112 to generate a configuration file for monitoring agent 108. For example, the configuration file may store configuration information related to a type of diagnostic information that needs to be collected using a plugin. In this example, the configuration file may include a process tree indicating a hierarchical relationship of the input process with other processes. In an example, process tree generation unit 110 may execute process tree creation script 112 to iteratively retrieve a plurality of process identifiers associated with a plurality of processes having the hierarchical relationship with the input process until a stop condition is met. The stop condition may include determining a child process having no children, determining a parent process as a parent of all operating system processes (e.g., if the parent process is an initialization process in a Linux operating system having a process identifier 1), determining a parent process as an operating system service execution process (e.g., if the parent process is a services.exe process in Windows), identifying a parent or child process crossing a boundary/range beyond which the parent or child process is unlikely to affect a performance of the input process (e.g., a docker container daemon process, a Kubernetes pod, or the like), or any combination thereof. Further, the configuration file for the monitoring agent may be generated based on the retrieved process identifiers.
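As a minimal sketch of the iterative retrieval described above, the following Python example walks a process table upward and downward from the input process until a stop condition is met. The `Proc` record, the `ROOT_NAMES` and `BOUNDARY_NAMES` sets, the in-memory `TABLE`, and the signed level convention (negative for ancestors, positive for descendants) are assumptions made for illustration and are not taken from process tree creation script 112.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    pid: int
    ppid: int
    name: str


# Hypothetical process names used to recognise stop conditions.
ROOT_NAMES = {"init", "systemd", "services.exe"}   # parent of all processes/services
BOUNDARY_NAMES = {"dockerd", "containerd-shim"}    # container boundary


def is_stop(proc):
    """True if the walk must not include this parent process."""
    return proc.pid == 1 or proc.name in ROOT_NAMES or proc.name in BOUNDARY_NAMES


def build_process_tree(table, input_pid):
    """Return {pid: level}; the input process is level 0."""
    levels = {input_pid: 0}

    # Walk upward through the parents until a stop condition is met.
    pid = input_pid
    while True:
        parent = table.get(table[pid].ppid)
        if parent is None or parent.pid in levels or is_stop(parent):
            break
        levels[parent.pid] = levels[pid] - 1
        pid = parent.pid

    # Walk downward; a branch ends at a process that has no children.
    children = {}
    for proc in table.values():
        children.setdefault(proc.ppid, []).append(proc.pid)
    frontier = [input_pid]
    while frontier:
        pid = frontier.pop()
        for child in children.get(pid, []):
            if child in levels or table[child].name in BOUNDARY_NAMES:
                continue
            levels[child] = levels[pid] + 1
            frontier.append(child)
    return levels


# Illustrative process table: a shell that started an nginx master with two workers.
TABLE = {p.pid: p for p in [
    Proc(1, 0, "systemd"),
    Proc(100, 1, "dockerd"),
    Proc(200, 1, "bash"),
    Proc(300, 200, "nginx"),
    Proc(301, 300, "nginx"),
    Proc(302, 300, "nginx"),
]}

tree_levels = build_process_tree(TABLE, 300)
```

In this sketch the upward walk stops at process identifier 1, so only the shell is recorded as an ancestor of the input process, and the downward walk ends at the two worker processes because they have no children.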
In an example, process tree generation unit 110 generates the configuration file including the process tree to monitor parent and child processes in the boundary which affects a performance of the input process. The boundary can be user-defined. In another example, the boundary can be automatically determined depending on whether the process to be monitored is running in a docker container, a Kubernetes pod, or any virtual environment. Further, process tree generation unit 110 may enable monitoring agent 108 to monitor the processes in the process tree based on the configuration file. An example configuration file is depicted in
In an example, monitoring agent 108 may collect performance metrics associated with the processes in the process tree. The performance metrics may include a hierarchical level of each process in the process tree. Further, the performance metrics associated with each process in the process tree may include a process identifier, a parent process identifier, a hierarchical level, a search pattern to group each process in the process tree to the input process, or any combination thereof. Further, monitoring agent 108 may transmit the collected performance metrics and information associated with the processes in the process tree to the remote collector.
Furthermore, remote collector 104 may receive the performance metrics associated with the processes in the process tree. Further, remote collector 104 may construct a relationship tree indicating a hierarchical relationship of the input process with other processes using the hierarchical level of each process. An example relationship tree is depicted in
Examples described herein generate the configuration file for monitoring agent 108, which may facilitate monitoring any native applications running in second endpoint 106 (e.g., by providing a feature to monitor any Linux process and Windows service). For example, for monitoring a Linux process, a user can provide the process's bin name, a pid file path, or a command line regex. Further, for monitoring a Windows service, the user can provide a service name from a services.msc utility. Further, examples described herein enable monitoring of non-curated applications by monitoring component processes and services of the non-curated applications, which may provide an insight into the non-curated applications' performance. Thus, examples described herein facilitate obtaining “availability”, “CPU usage”, and “memory usage” metrics for any Linux process and Windows service running on second endpoint 106.
In some examples, the functionalities described in
In the example configuration file of
In the example shown in
At 402, a command to monitor an input process running in an endpoint may be received. For example, a user may initiate a plugin activation of processes or services by providing an input such as a process bin name, command line regex/pid file path, or service name. In an example, for monitoring a Linux process, a procstat plugin may be used in a Telegraf agent. For monitoring a Windows service, a combination of Telegraf's procstat and Windows services plugins may be used.
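The mapping from the user input to a plugin option may be sketched as follows. The option names `exe`, `pid_file`, and `pattern` belong to Telegraf's procstat input plugin and `service_names` belongs to its win_services input plugin; the `SELECTORS` table, the input kind labels, and the `resolve_selector` helper are hypothetical names introduced only for this example.

```python
# Hypothetical input kinds mapped to (Telegraf plugin, plugin option).
SELECTORS = {
    "bin_name": ("inputs.procstat", "exe"),
    "pid_file": ("inputs.procstat", "pid_file"),
    "cmdline_regex": ("inputs.procstat", "pattern"),
    "service_name": ("inputs.win_services", "service_names"),
}


def resolve_selector(kind, value):
    """Translate one user input into the plugin option that selects the process."""
    if kind not in SELECTORS:
        raise ValueError(f"unsupported input kind: {kind}")
    plugin, option = SELECTORS[kind]
    # win_services expects a list of service names; procstat options take one string.
    payload = [value] if option == "service_names" else value
    return {"plugin": plugin, "option": option, "value": payload}


linux_selector = resolve_selector("bin_name", "nginx")
windows_selector = resolve_selector("service_name", "Spooler")
```

Keeping the mapping in a single table is a design choice of the sketch: supporting another input kind would only require a new table entry.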
At 404, a process tree creation script may be downloaded upon receiving the command. At 406, the process tree creation script may be executed to generate a configuration file for a monitoring agent running in the endpoint. In an example, the configuration file includes a process tree indicating a hierarchical relationship of the input process with other processes. In some examples, the configuration file may be generated in addition to a default configuration file of the monitoring agent. Further, the configuration file may be stored in a default configuration folder of the monitoring agent.
In an example, the hierarchical relationship of the processes in the process tree of the configuration file may be validated. In response to detecting a change in a process of the process tree, the process tree creation script may be invoked to regenerate the configuration file with a modified process tree reflecting the change. Further, the monitoring agent may be enabled to monitor the processes in the process tree upon validating the hierarchical relationship.
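One way to detect such a change is to compare two snapshots of the process tree, each mapping a process identifier to its parent process identifier. The following sketch assumes that representation; `diff_process_tree` and `needs_regeneration` are hypothetical helper names and do not describe the actual validation logic.

```python
def diff_process_tree(previous, current):
    """Compare two {pid: ppid} snapshots of the monitored process tree."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "reparented": sorted(pid for pid in shared if previous[pid] != current[pid]),
    }


def needs_regeneration(previous, current):
    """True if any process was added, removed, or moved to another parent."""
    return any(diff_process_tree(previous, current).values())


# Illustrative snapshots: worker 302 exited and worker 303 was started.
before = {300: 200, 301: 300, 302: 300}
after = {300: 200, 301: 300, 303: 300}
changes = diff_process_tree(before, after)
```

When `needs_regeneration` returns true, the process tree creation script would be invoked again so that the configuration file reflects the modified process tree.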
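For example, the validation script may be executed as part of the UCP minion at every metric collection interval to recreate the process tree when a change in a process of the process tree is detected. In an example, the process tree creation script may iteratively obtain the process identifiers of the specified Linux and/or Windows processes until a stop condition is met, thereby obtaining the process identifiers of the parent and all child processes of the input process. The stop condition for the process tree may include at least one of: a child process having no children; a parent process that is the parent of all operating system processes (e.g., the initialization process in a Linux operating system having process identifier 1); a parent process that is an operating system service execution process (e.g., the services.exe process in Windows); or a parent or child process that crosses a boundary beyond which the process is unlikely to affect the performance of the input process (e.g., a docker container daemon process, a Kubernetes pod, or the like).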
Further, in the case of a Linux endpoint, for each of the parent and child processes, additional blocks of Telegraf plugin entries (e.g., inputs.procstat plugins) may be appended to the new Telegraf configuration file created at the default location of the custom Telegraf configuration directory (e.g., /opt/vmware/ucp/ucp-telegraf/etc/telegraf/telegraf.d). For Windows endpoints, the same function may be achieved by using a combination of Windows services plugins (e.g., inputs.win_services) and Telegraf plugins (e.g., inputs.win_perf_counters).
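For a Linux endpoint, appending one plugin entry per process may be sketched as below. The `[[inputs.procstat]]` table, its `pattern` and `pid_tag` options, and the per-plugin `tags` sub-table are standard Telegraf configuration; the tag names `input_process` and `tree_level`, the `render_procstat_blocks` helper, and the shape of the `PROCESSES` records are assumptions made for this example only.

```python
import json


def render_procstat_blocks(processes, input_name):
    """Render one [[inputs.procstat]] entry per process in the process tree."""
    blocks = []
    for proc in processes:
        blocks.append("\n".join([
            "[[inputs.procstat]]",
            # json.dumps yields a quoted, escaped string usable as a TOML basic string.
            f"  pattern = {json.dumps(proc['pattern'])}",
            "  pid_tag = true",
            "  [inputs.procstat.tags]",
            # Hypothetical tags that group each process under the input process.
            f"    input_process = {json.dumps(input_name)}",
            f"    tree_level = {json.dumps(str(proc['level']))}",
        ]))
    return "\n\n".join(blocks) + "\n"


# Illustrative records for an input process and one child process.
PROCESSES = [
    {"pattern": "nginx: master process", "level": 0},
    {"pattern": "nginx: worker process", "level": 1},
]

config_text = render_procstat_blocks(PROCESSES, "nginx")
```

The rendered text could then be written as a separate file in the custom Telegraf configuration directory, leaving the default configuration file of the monitoring agent unchanged.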
At 408, the monitoring agent may be enabled to monitor the processes in the process tree based on the configuration file. In an example, enabling the monitoring agent to monitor the processes in the process tree may include collecting performance metrics associated with the processes in the process tree and transmitting the collected performance metrics and information associated with the processes in the process tree to the remote collector. The performance metrics associated with each process in the process tree may include a process identifier, a parent process identifier, a hierarchical level, a search pattern to group each process in the process tree to the input process, or any combination thereof. The remote collector may construct a relationship tree indicating a hierarchical relationship of the input process with other processes using the hierarchical level of each process. Further, the remote collector may send the relationship tree and the performance metrics associated with each process in the process tree to a monitoring application for performance analysis of the input process.
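The construction of the relationship tree at the remote collector may be sketched as follows, assuming each received record carries the process identifier, the parent process identifier, and the hierarchical level. The record layout, the nested dictionary used for tree nodes, and the `build_relationship_tree` helper are hypothetical and chosen only to illustrate the idea.

```python
def build_relationship_tree(records):
    """Nest flat per-process records into a tree rooted at the topmost process."""
    nodes = {
        r["pid"]: {"pid": r["pid"], "level": r["level"], "children": []}
        for r in records
    }
    root = None
    # Visit records from the lowest hierarchical level so parents precede children.
    for r in sorted(records, key=lambda r: (r["level"], r["pid"])):
        parent = nodes.get(r["ppid"])
        if parent is not None:
            parent["children"].append(nodes[r["pid"]])
        elif root is None:
            # The parent is outside the monitored boundary: this is the tree root.
            root = nodes[r["pid"]]
    return root


# Illustrative records as they might arrive from the monitoring agent.
RECORDS = [
    {"pid": 301, "ppid": 300, "level": 1},
    {"pid": 300, "ppid": 200, "level": 0},
    {"pid": 302, "ppid": 300, "level": 1},
    {"pid": 200, "ppid": 1, "level": -1},
]

relationship_tree = build_relationship_tree(RECORDS)
```

The sketch keeps a single root and assumes the records describe one connected process tree; a record whose parent lies outside the monitored boundary, other than the first such record, would be left out of the result.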
Computer-readable storage medium 504 may store instructions 506, 508, 510, and 512. Instructions 506 may be executed by processor 502 to receive a command to monitor an input process running in an endpoint. Instructions 508 may be executed by processor 502 to download a process tree creation script upon receiving the command.
Instructions 510 may be executed by processor 502 to execute the process tree creation script to generate a configuration file for a monitoring agent running in the endpoint. The configuration file may include a process tree indicating a hierarchical relationship of the input process with other processes. In an example, instructions 510 to execute the process tree creation script to generate the configuration file include instructions to generate the configuration file including the process tree to monitor parent and child processes in a boundary which affects a performance of the input process. In other examples, instructions 510 to execute the process tree creation script to generate the configuration file may include instructions to iteratively retrieve a plurality of process identifiers associated with a plurality of processes having the hierarchical relationship with the input process until a stop condition is met.
Instructions 512 may be executed by processor 502 to enable the monitoring agent to monitor the processes in the process tree based on the configuration file. Further, computer-readable storage medium 504 may store instructions to collect performance metrics associated with the processes in the process tree and transmit the collected performance metrics and information associated with the processes in the process tree to a remote collector. For example, the information associated with the processes may include a process identifier, a parent process identifier, a hierarchical level, a search pattern to group each process in the process tree to the input process, or any combination thereof.
Some or all of the system components and/or data structures may also be stored as contents (e.g., as executable or other machine-readable software instructions or structured data) on a non-transitory computer-readable medium (e.g., as a hard disk; a computer memory; a computer network or cellular wireless network or other data transmission medium; or a portable media article to be read by an appropriate drive or via an appropriate connection, such as a DVD or flash memory device) so as to enable or configure the computer-readable medium and/or one or more host computing systems or devices to execute or otherwise use or provide the contents to perform at least some of the described techniques.
It may be noted that the above-described examples of the present solution are for the purpose of illustration only. Although the solution has been described in conjunction with a specific embodiment thereof, numerous modifications may be possible without materially departing from the teachings and advantages of the subject matter described herein. Other substitutions, modifications and changes may be made without departing from the spirit of the present solution. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.
The terms “include,” “have,” and variations thereof, as used herein, have the same meaning as the term “comprise” or appropriate variation thereof. Furthermore, the term “based on”, as used herein, means “based at least in part on.” Thus, a feature that is described as based on some stimulus can be based on the stimulus or a combination of stimuli including the stimulus.
The present description has been shown and described with reference to the foregoing examples. It is understood, however, that other forms, details, and examples can be made without departing from the spirit and scope of the present subject matter that is defined in the following claims.
Number | Date | Country | Kind |
---|---|---|---|
202241021896 | Apr 2022 | IN | national |