Benefit is claimed under 35 U.S.C. 119(a)-(d) to Foreign Application Serial No. 202241042074 filed in India entitled “METHODS AND APPARATUS TO IMPROVE MANAGEMENT OPERATIONS OF A CLOUD COMPUTING ENVIRONMENT”, on Jul. 22, 2022, by VMware, Inc., which is herein incorporated in its entirety by reference for all purposes.
This disclosure relates generally to cloud computing environments and, more particularly, to methods and apparatus to improve management of a cloud computing environment.
Computing environments often include many virtual and physical computing resources. For example, software-defined data centers (SDDCs) are data center facilities in which many or all elements of a computing infrastructure (e.g., networking, storage, CPU, etc.) are virtualized and delivered as a service. The computing environments often include management resources for facilitating management of the computing environments and the computing resources included in the computing environments. Some of these management resources include the capability to automatically monitor computing resources and generate alerts when compute issues are identified. Additionally or alternatively, the management resources may be configured to provide recommendations for responding to generated alerts. In such examples, the management resources may identify computing resources experiencing issues and/or malfunctions and may identify methods or approaches for remediating the issues. Recommendations may provide an end user(s) (e.g., an administrator of the computing environment) with a list of instructions or a series of steps that the end user(s) can manually perform on a computing resource(s) to resolve the issue(s). Although the management resources may provide recommendations, the end user(s) is responsible for implementing suggested changes and/or performing suggested methods to resolve the compute issues.
The figures are not to scale. Instead, the thickness of the layers or regions may be enlarged in the drawings. As used herein, connection references (e.g., attached, coupled, connected, and joined) may include intermediate members between the elements referenced by the connection reference and/or relative movement between those elements unless otherwise indicated. As such, connection references do not necessarily infer that two elements are directly connected and/or in fixed relation to each other. As used herein, stating that any part is in “contact” with another part is defined to mean that there is no intermediate part between the two parts.
Unless specifically stated otherwise, descriptors such as “first,” “second,” “third,” etc., are used herein without imputing or otherwise indicating any meaning of priority, physical order, arrangement in a list, and/or ordering in any way, but are merely used as labels and/or arbitrary names to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for identifying those elements distinctly that might, for example, otherwise share a same name. As used herein, “approximately” and “about” refer to dimensions that may not be exact due to manufacturing tolerances and/or other real world imperfections. As used herein “substantially real time” refers to occurrence in a near instantaneous manner recognizing there may be real world delays for computing time, transmission, etc. Thus, unless otherwise specified, “substantially real time” refers to real time+/−1 second. As used herein, the phrase “in communication,” including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events. As used herein, “processor circuitry” is defined to include (i) one or more special purpose electrical circuits structured to perform specific operation(s) and including one or more semiconductor-based logic devices (e.g., electrical hardware implemented by one or more transistors), and/or (ii) one or more general purpose semiconductor-based electrical circuits programmed with instructions to perform specific operations and including one or more semiconductor-based logic devices (e.g., electrical hardware implemented by one or more transistors). Examples of processor circuitry include programmed microprocessors, Field Programmable Gate Arrays (FPGAs) that may instantiate instructions, Central Processor Units (CPUs), Graphics Processor Units (GPUs), Digital Signal Processors (DSPs), XPUs, or microcontrollers and integrated circuits such as Application Specific Integrated Circuits (ASICs). For example, an XPU may be implemented by a heterogeneous computing system including multiple types of processor circuitry (e.g., one or more FPGAs, one or more CPUs, one or more GPUs, one or more DSPs, etc., and/or a combination thereof) and application programming interface(s) (API(s)) that may assign computing task(s) to whichever one(s) of the multiple types of the processing circuitry is/are best suited to execute the computing task(s).
Virtual computing services enable one or more assets to be hosted within a computing environment. As disclosed herein, an asset is a computing resource (physical or virtual) that may host a wide variety of different applications such as, for example, an email server, a database server, a file server, a web server, etc. Example assets include physical hosts (e.g., non-virtual computing resources such as servers, processors, computers, etc.), virtual machines, containers that run on top of a host operating system without the need for a hypervisor or separate operating system, hypervisor kernel network interface modules, etc. In some examples, an asset may be referred to as a compute node, an end-point, a data computer end-node or as an addressable node.
Virtual machines operate with their own guest operating system on a host using resources of the host virtualized by virtualization software (e.g., a hypervisor, virtual machine monitor, etc.). Numerous virtual machines can run on a single computer or processor system in a logically separated environment (e.g., separated from one another). A virtual machine can execute instances of applications and/or programs separate from application and/or program instances executed by other virtual machines on the same computer.
Management applications (e.g., cloud management such as vRealize® Automation Cloud Assembly) provide administrators visibility into the condition of assets in a computing environment (e.g., a data center). Administrators can inspect the assets, see the organizational relationships of a virtual application, filter log files, overlay events versus time, manage the lifecycle of the assets in the computing environment, troubleshoot during mission critical issues, etc. In some examples, an application may install one or more plugins (sometimes referred to herein as “agents”) at the asset to perform monitoring operations. For example, a first management application may install a first monitoring agent at an asset to track an inventory of physical resources and logical resources in a computing environment, a second management application may install a second monitoring agent at the asset to provide real-time log management of events, analytics, etc., and a third management application may install a third monitoring agent to provide operational views of trends, thresholds and/or analytics of the asset, etc.
In some systems (e.g., such as vRealize® Automation), a user and/or administrator may set up and/or create a cloud account (e.g., a Google® cloud platform (GCP) account, a network security virtualization platform (NSX) account, a VMware® cloud foundation (VCF) account, a vSphere® account, etc.) to connect a cloud provider and/or a private cloud so that the management applications can collect data from regions of datacenters. Additionally, cloud accounts allow a user and/or administrator to deploy and/or provision cloud templates to the regions. A cloud template is a file that defines a set of resources. The cloud template may utilize tools to create server builds that can become standards for cloud applications. A user and/or administrator can create cloud accounts for projects in which other users (e.g., team members) work. The management applications periodically perform checks on the cloud accounts to verify that the accounts are healthy (e.g., the credentials are valid, the connectivity is acceptable, the account is accessible, etc.).
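By way of a non-limiting illustration only, the following Python sketch outlines a periodic health check over configured cloud accounts of the kind described above (credential validity, connectivity, and accessibility). The account structure, the stand-in check helpers, and the five-minute interval are assumptions made for the example and do not represent an actual management-application interface.

import time
from dataclasses import dataclass

@dataclass
class CloudAccount:
    name: str          # e.g., a vSphere or GCP cloud account
    endpoint: str      # provider or private-cloud endpoint
    credentials: dict  # provider-specific credentials

# Stand-in checks; a real implementation would call the provider's own APIs.
def credentials_valid(account: CloudAccount) -> bool:
    return bool(account.credentials)

def connectivity_acceptable(account: CloudAccount) -> bool:
    return bool(account.endpoint)

def account_accessible(account: CloudAccount) -> bool:
    return credentials_valid(account) and connectivity_acceptable(account)

def periodic_health_checks(accounts, interval_seconds=300):
    """Periodically verify that each configured cloud account is healthy."""
    while True:
        for account in accounts:
            healthy = (credentials_valid(account)
                       and connectivity_acceptable(account)
                       and account_accessible(account))
            print(f"{account.name}: {'healthy' if healthy else 'unhealthy'}")
        time.sleep(interval_seconds)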
For efficient operation between a management application and a monitoring agent at an asset, the system hosting the management application and the asset need to be connected and stay connected (until a user decides that the monitoring agent is no longer needed). In some examples, when an issue with the connectivity between the system and the asset occurs, there is no way for a user (e.g., a system administrator, an end user, etc.) to know until attempting to access (e.g., obtain information from) the asset. In such an example, the management application will not collect data and, thus, will not enable troubleshooting during mission critical issues, correction of software issues that arise during execution, or life cycle management capabilities for the applications running on the assets.
Examples disclosed herein provide users (e.g., system administrators, end users, etc.) with access to a connectivity status between a management application and one or more monitoring agents. For example, examples disclosed herein include circuitry that monitors the connectivity between the system hosting the management application and respective assets hosting the monitoring agents. Examples disclosed herein provide users with an ability to rectify the connection in an example where a connection has been terminated. For example, examples disclosed herein include rectification circuitry that identifies how the connection was terminated and uses that information to reestablish the connection between the management application and respective asset(s).
The example cloud proxy 102 of
The example cloud management circuitry 104 of
The example cloud management circuitry 104 of
A user and/or administrator may set up and/or create a cloud account (e.g., a Google® cloud platform (GCP) account, a network security virtualization platform (NSX) account, a VMware® cloud foundation (VCF) account, a vSphere® account, etc.) to connect a cloud provider and/or a private cloud so that the cloud management circuitry 104 of
The example configuration circuitry 112 of
The example resource platform(s) 106 of
The example compute nodes 114a-d are computing resources that may execute operations within the example computing environment 100. The example compute nodes 114a-d are illustrated as virtual computing resources managed by the example manager 116 (e.g., a hypervisor) executing within the example host 118 (e.g., an operating system) on the example physical resources 120. The example compute nodes 114a-d may, alternatively, be any combination of physical and virtual computing resources. For example, the compute nodes 114a-d may be any combination of virtual machines, containers, and physical computing resources.
Virtual machines operate with their own guest operating system on a host (e.g., the example host 118) using resources of the host virtualized by virtualization software (e.g., a hypervisor, virtual machine monitor, etc.) (e.g., the example manager 116). Numerous virtual machines can run on a single computer or processor system in a logically separated environment (e.g., separated from one another). A virtual machine can execute instances of applications and/or programs separate from application and/or program instances executed by other virtual machines on the same computer.
In some examples, containers are virtual constructs that run on top of a host operating system (e.g., the example compute nodes 114a-d executing within the example host 118) without the need for a hypervisor or a separate guest operating system. Containers can provide multiple execution environments within an operating system. Like virtual machines, containers also logically separate their contents (e.g., applications and/or programs) from one another, and numerous containers can run on a single computer or processor system. In some examples, utilizing containers, a host operating system uses namespaces to isolate containers from each other to provide operating-system level segregation of applications that operate within each of the different containers. For example, the container segregation may be managed by a container manager (e.g., the example manager 116) that executes with the operating system (e.g., the example compute node 114a-d executing on the example host 118). This segregation can be viewed as a form of virtualization that isolates different groups of applications that operate in different containers. In some examples, such containers are more lightweight than virtual machines. In some examples, a container OS may execute as a guest OS in a virtual machine. The example compute nodes 114a-d may host a wide variety of different applications such as, for example, an email server, a database server, a file server, a web server, etc. In the example of
The example manager(s) 116 of
The example host(s) 118 of
The example physical resource(s) 120 of
The example network 108 of
The example client interface(s) 110 of
In
There are many components (e.g., compute node(s) 114a-d, manager(s) 116, host(s) 118, physical resource(s) 120, keys, configuration circuitry 112, etc.) involved in executing the application monitoring service that communicate over the example network 108. Any issues with any of the components would disrupt the connectivity between the primary agent 122 and secondary agent(s) 124a-d and, thus, would disrupt jobs, tasks, activities, etc., planned by the user and/or administrator. Conventionally, the cloud management circuitry 104 has not provided users with an option to show the status of the connectivity. However, in examples disclosed herein, the cloud management circuitry 104 implements methods and apparatus to not only provide the status of the connectivity between the primary agent 122 and the secondary agent(s) 124a-d, but also an option to rectify the connection when a connectivity issue is identified.
In the example data flow diagram 200, the example cloud management circuitry 104 executes a first step 202 that triggers an agent install. For example, the cloud management circuitry 104 may receive an instruction via the client interface(s) 110 of
In the example data flow diagram 200, the example cloud proxy 102 executes a second step 204 that installs the secondary agent with input plugins to collect operating system metrics. As used herein, the secondary agent is an application monitoring agent installed on a compute node (e.g., compute node(s) 114a-d) that is controlled and/or receives instructions from a primary agent. To execute the second step 204, the cloud proxy 102 downloads the bootstrap bundle and provides the certificates and keys to the compute node(s) 114a-d to install the secondary agent.
In the example data flow diagram 200, the example cloud proxy 102 executes a third step 206 that installs the primary agent (e.g., primary agent 122 of
In the data flow diagram 200, the example compute node(s) 114a-d execute a fourth step 208 that runs (e.g., executes, starts, etc.) a test of the monitoring service to find a number of metrics per collection cycle. For example, the compute node(s) 114a-d may trigger the secondary agent, in response to an installation request from the cloud proxy 102, to collect metrics corresponding to applications running at the compute node(s) 114a-d. In some examples, this test assists the configuration circuitry 112 to configure the secondary agent. For example, the secondary agent is initially not informed of what metrics are to be collected and how many metrics are to be collected. Therefore, the compute node(s) 114a-d execute the test to configure buffer(s) and/or memory at the configuration circuitry 112 and/or at the cloud management circuitry 104 to store a particular size (e.g., bytes) of metrics. As used herein, metrics may include CPU metrics (e.g., idle measurement, busy measurement, processing measurement, etc.), memory metrics (e.g., total bytes, percentage of memory used, percentage of unused memory available for processes, etc.), disk and partition metrics (e.g., average input/output (IO) utilization, writes per second, etc.), load metrics (e.g., CPU load, presented as an average over the last 1 minute, 5 minutes, etc.), and/or network metrics (e.g., volume of data received by all monitored network devices, number of packets received, number of outgoing packets, etc.). Any other type of available metrics may be collected by the secondary agent and provided to the cloud management circuitry 104.
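By way of illustration only, the following Python sketch collects one cycle of the operating-system metric categories listed above using the psutil library. The metric names and the one-second sampling interval are assumptions for the example and do not represent the monitoring agent's actual input plugins.

import psutil

def collect_metrics() -> dict:
    cpu = psutil.cpu_times_percent(interval=1)   # CPU idle/busy measurements
    mem = psutil.virtual_memory()                # memory totals and usage
    disk = psutil.disk_io_counters()             # disk and partition counters
    net = psutil.net_io_counters()               # network bytes and packets
    load1, load5, _ = psutil.getloadavg()        # CPU load averages (1 and 5 minutes)
    return {
        "cpu.idle": cpu.idle,
        "cpu.busy": 100.0 - cpu.idle,
        "mem.total_bytes": mem.total,
        "mem.used_percent": mem.percent,
        "disk.write_count": disk.write_count,
        "load.avg_1m": load1,
        "load.avg_5m": load5,
        "net.bytes_received": net.bytes_recv,
        "net.packets_received": net.packets_recv,
        "net.packets_sent": net.packets_sent,
    }

if __name__ == "__main__":
    print(collect_metrics())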
In the example data flow diagram 200, the example compute node(s) 114a-d executes a fifth step 210 that updates a metric buffer limit value based on the test run of the monitoring service. For example, the secondary agent, hosted by the compute node(s) 114a-d, identifies a number of metrics to be stored in a buffer of the compute node(s) 114a-d and updates the metric buffer limit value to reflect the identified number. In some examples, the metric buffer limit value is to be used to configure the secondary agent.
In the example data flow diagram 200, the example compute node(s) 114a-d execute a sixth step 212 to restart the monitoring service. For example, the secondary agent restarts the monitoring service after the metric buffer limit value is identified. In some examples, the secondary agent restarts the monitoring service because the configuration of the secondary agent changed during the fifth step 210. For example, an initial configuration of the secondary agent may have defined a metric buffer limit value as some pre-determined value, not representative of the actual amount of metrics that are to be collected. Therefore, an updated configuration of the secondary agent requires the restarting of the monitoring service to properly collect metrics from the compute node(s) 114a-d.
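For illustration, the test run, buffer-limit update, and restart of steps 208 through 212 could be sketched as follows. The configuration file layout, the metric_buffer_limit key, the headroom factor, and the restart hook are assumptions made for the example and not the agent's actual configuration format.

import json

def size_metric_buffer(test_metrics: dict, headroom: float = 1.5) -> int:
    """Choose a buffer limit slightly above the metric count observed in the test run."""
    return int(len(test_metrics) * headroom)

def update_agent_config(config_path: str, buffer_limit: int) -> None:
    with open(config_path) as f:
        config = json.load(f)
    config["metric_buffer_limit"] = buffer_limit   # replaces the pre-determined default value
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)

def reconfigure_and_restart(test_metrics: dict, config_path: str, restart_service) -> None:
    # Update the configuration based on the test run, then restart the
    # monitoring service so the new buffer limit takes effect (step 212).
    update_agent_config(config_path, size_metric_buffer(test_metrics))
    restart_service()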
In the example data flow diagram 200, the example compute node(s) 114a-d executes a seventh step 214 that collects service discovery metrics and provides them to the cloud proxy 102. For example, the secondary agent collects metrics from the compute node(s) 114a-d in response to restarting the monitoring service. The secondary agent provides the metrics to the primary agent.
In the example data flow diagram 200, the example cloud proxy 102 executes an eighth step 216 that provides a list of applications discovered at the compute node(s) 114a-d to the cloud management circuitry 104. For example, the primary agent, hosted by the configuration circuitry 112, utilizes the metrics obtained from the secondary agent to determine what applications are running at the compute node(s) 114a-d. As such, the primary agent provides the list of applications to the cloud management circuitry 104 for displaying at the client interface(s) 110.
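For illustration, the following sketch shows one way a list of discovered applications could be derived from service-discovery metrics, as in the eighth step 216. The "app.<name>.<metric>" naming scheme and the sample metric values are assumptions made for the example.

def discovered_applications(service_discovery_metrics: dict) -> list[str]:
    """Derive application names from metric keys of the form app.<name>.<metric>."""
    apps = set()
    for metric_name in service_discovery_metrics:
        parts = metric_name.split(".")
        if len(parts) >= 2 and parts[0] == "app":
            apps.add(parts[1])
    return sorted(apps)

# Example: metrics reported by the secondary agent for a database and a web server.
metrics = {"app.mysql.connections": 12, "app.nginx.requests": 340, "cpu.idle": 93.0}
print(discovered_applications(metrics))   # ['mysql', 'nginx']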
In some examples, after installation and an initial starting of the application monitoring service, the cloud management circuitry 104 can be controlled by a user and/or an administrator to perform a number of different operations, jobs, tasks, etc.
The example cloud management circuitry 104 of
The example interface 302 of
The example installation circuitry 304 installs secondary agents 124a-d at the compute nodes 114a-d and connects the secondary agent(s) to the primary agent 122 installed on the example configuration circuitry 112 of
In some examples, the installation circuitry 304 includes means for installing agents. For example, the means for installing may be implemented by installation circuitry 304. In some examples, the installation circuitry 304 may be instantiated by processor circuitry such as the example processor circuitry 812 of
The example user interface update circuitry 306 of
In some examples, the user interface update circuitry 306 includes means for updating user interface(s) and/or means for instructing user interface(s) to display connectivity statuses. For example, the means for updating may be implemented by user interface update circuitry 306. In some examples, the user interface update circuitry 306 may be instantiated by processor circuitry such as the example processor circuitry 812 of
The example connectivity determination circuitry 308 of
In some examples, the connectivity determination circuitry 308 includes means for identifying a connectivity issue and/or means for determining connectivity statuses. For example, the means for identifying may be implemented by connectivity determination circuitry 308. In some examples, the connectivity determination circuitry 308 may be instantiated by processor circuitry such as the example processor circuitry 812 of
The example rectification circuitry 310 of
In some examples, the rectification circuitry 310 implements a two-part process to rectify and/or resolve a connectivity issue. The first part of the two-part process includes rectifying the primary agent 122. For example, the rectification circuitry 310 ensures the primary agent 122 is operational (e.g., up and running) at the configuration circuitry 112. In some examples, if there is an issue with the primary agent 122, the rectification circuitry 310 restarts and/or reconfigures the primary agent 122. The example rectification circuitry 310 verifies the operating state of the primary agent 122 before proceeding to the second part of the two-part process. The second part of the two-part process includes rectifying the secondary agent(s) 124a-d. For example, the rectification circuitry 310 ensures that the secondary agent(s) 124a-d is operational (e.g., up and running) at the compute node(s) 114a-d. In some examples, the rectification circuitry 310 reconfigures the authentication between the secondary agent(s) 124a-d and the primary agent 122. For example, the rectification circuitry 310 may reconfigure the cryptographic keys (e.g., the secondary private key, the secondary public key, and the primary public key), uninstall the secondary agent(s) 124a-d, reinstall the secondary agent(s) 124a-d, and utilize the reconfigured keys to restart the operation of the secondary agent(s) 124a-d. In some examples, upon restart, the secondary agent(s) 124a-d reconnect to the primary agent 122. In some examples, the rectification circuitry 310 reconfigures the cryptographic keys because the keys are corrupted. In some examples, the rectification circuitry 310 identifies which keys are corrupted. For example, the rectification circuitry 310 determines whether the primary keys are corrupted and/or whether the secondary keys are corrupted. In some examples, the rectification circuitry 310 could determine that a file including the primary keys is unreadable (e.g., not accessible). In some examples, the rectification circuitry 310 could determine that a file including the secondary keys is unreadable. In some examples, the rectification circuitry 310 reconfigures only the primary public key in response to the primary public key being corrupted. In some examples, the rectification circuitry 310 reconfigures only the secondary public key and the secondary private key in response to the secondary keys being corrupted.
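By way of illustration, the following Python sketch shows one way the corrupted-key determination described above could be performed, treating a key file that is unreadable (e.g., not accessible) or empty as corrupted. The file-path arguments and the pairing of the secondary keys follow the description above; the readability test itself is an assumption for the example.

from pathlib import Path

def key_is_corrupted(key_path: str) -> bool:
    try:
        data = Path(key_path).read_bytes()
        return len(data) == 0
    except OSError:
        return True   # an unreadable (not accessible) key file counts as corrupted

def keys_to_reconfigure(primary_public: str, secondary_public: str, secondary_private: str) -> set:
    to_fix = set()
    if key_is_corrupted(primary_public):
        # Reconfigure only the primary public key when it alone is corrupted.
        to_fix.add("primary_public")
    if key_is_corrupted(secondary_public) or key_is_corrupted(secondary_private):
        # The secondary public and private keys are reconfigured as a pair.
        to_fix.update({"secondary_public", "secondary_private"})
    return to_fix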
In some examples, the rectification circuitry 310 requests credentials (e.g., user credentials) prior to proceeding with the two-part rectification process. In some examples, if the rectification circuitry 310 identifies that more than one secondary agent 124a-d has a connectivity issue, the rectification circuitry 310 determines whether each of the identified secondary agent(s) 124a-d use the same user credentials. For example, each of the secondary agent(s) 124a-d may be installed on compute node(s) 114a-d having the same cloud account and, thus, the same credentials. In some examples, the rectification circuitry 310 simultaneously rectifies each of the identified secondary agent(s) 124a-d in response to each of the identified secondary agent(s) 124a-d having the same user credentials. In some examples, a user and/or administrator can select all of the compute node(s) 114a-d hosting secondary agent(s) 124a-d that have connectivity issues and request that the rectification circuitry 310 resolve the connection at the same time.
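For illustration, the following Python sketch groups disconnected secondary agents by shared user credentials so that agents installed under the same cloud account can be rectified together, as described above. The record fields and the batch-rectification callable are assumptions made for the example.

from collections import defaultdict

def group_by_credentials(disconnected_agents: list[dict]) -> dict:
    """Map a credential identifier to the compute nodes that share it."""
    groups = defaultdict(list)
    for agent in disconnected_agents:
        groups[agent["credential_id"]].append(agent["compute_node"])
    return groups

def rectify_in_batches(disconnected_agents: list[dict], rectify_nodes) -> None:
    # One rectification pass per shared credential set covers every
    # selected compute node that uses those credentials.
    for credential_id, nodes in group_by_credentials(disconnected_agents).items():
        rectify_nodes(nodes, credential_id)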
In some examples, the rectification circuitry 310 notifies the user interface update circuitry 306 when a connection between the primary agent 122 and the identified secondary agent(s) 124a-d has been reestablished (e.g., rectified). In some examples, the user interface update circuitry 306 instructs the client interface(s) 110 to display the verified connection status of the identified secondary agent(s) 124a-d. An example user interface (e.g., client interface 110) is shown and described in further detail below in connection with
In some examples, the rectification circuitry 310 includes means for rectifying a connectivity issue, means for resolving a connectivity issue, and/or means for reestablishing a connection between a primary agent and secondary agent. For example, the means for rectifying may be implemented by rectification circuitry 310. In some examples, the rectification circuitry 310 may be instantiated by processor circuitry such as the example processor circuitry 812 of
The example datastore 312 of
The example first command 402 instructs the first compute node 114a to check the connectivity status of the first secondary agent 124a. In some examples, the connectivity determination circuitry 308 instructs the configuration circuitry 112 to execute the first command 402. The example first response 404 illustrates that the first secondary agent 124a has a stable connection by displaying the value “TRUE.” For example, the first response 404 indicates that no issues exist with the first secondary agent 124a.
The example second command 406 instructs the fourth compute node 114d to check the connectivity status of the fourth secondary agent 124d. The example second response 408 illustrates that the fourth secondary agent 124d has a connectivity issue by displaying the text “SECONDARY AGENT DID NOT RETURN.” For example, the connectivity determination circuitry 308 did not receive a valid response from the fourth compute node 114d. As such, there is a connectivity issue between the fourth secondary agent 124d and the primary agent 122.
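For illustration, the check-command responses shown above could be interpreted as follows: a "TRUE" reply indicates a stable connection, while "SECONDARY AGENT DID NOT RETURN" (or no reply at all) is treated as a connectivity issue. The disconnected status label mirrors the user interface text described below; the parsing itself is an assumption for the example.

from typing import Optional

def connectivity_status(response: Optional[str]) -> str:
    """Map a raw check-command response to a displayable connectivity status."""
    if response is not None and response.strip().upper() == "TRUE":
        return "AGENT CONNECTED"
    return "AGENT DISCONNECTED"

print(connectivity_status("TRUE"))                            # AGENT CONNECTED
print(connectivity_status("SECONDARY AGENT DID NOT RETURN"))  # AGENT DISCONNECTED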
The example first column 412 depicts compute node names. For example, each compute node (e.g., compute node(s) 114a-d) is given and/or provided with a name during instantiation. In some examples, the connectivity determination circuitry 308 of
The example second column 414 depicts agent connectivity statuses. The example second column 414 enables a user and/or an administrator to view the connectivity status of the secondary agent running on the respective compute node and take an action based on the status indicated in the example second column 414. In some examples, the user interface update circuitry 306 provides the second user interface 410 with the status information and instructs the second user interface 410 to update the second column 414 based on the status information.
The example action option 416 is a “RECTIFY” option that instructs the example rectification circuitry 310 to rectify the connection of a selected compute node. For example, a first virtual machine (VM) 418 has a secondary agent that is disconnected, depicted in the second column 414. In the example second user interface 410, a user and/or administrator has selected the VM 418 and interacted with the action option 416 “RECTIFY.” The example rectification circuitry 310 obtains this instruction, along with the name of the VM 418, and executes the two-part process to reestablish the connectivity between the secondary agent and the primary agent.
While an example manner of implementing the cloud management circuitry 104 of
Flowcharts representative of example machine readable instructions, which may be executed to configure processor circuitry to implement the cloud management circuitry 104 of
The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data or a data structure (e.g., as portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers) located at the same or different locations of a network or collection of networks (e.g., in the cloud, in edge devices, etc.). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc., in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and/or stored on separate computing devices, wherein the parts when decrypted, decompressed, and/or combined form a set of machine executable instructions that implement one or more operations that may together form a program such as that described herein.
In another example, the machine readable instructions may be stored in a state in which they may be read by processor circuitry, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc., in order to execute the machine readable instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, machine readable media, as used herein, may include machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.
The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.
As mentioned above, the example operations of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc., may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, or (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B.
As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” object, as used herein, refers to one or more of that object. The terms “a” (or “an”), “one or more”, and “at least one” are used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., the same entity or object. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.
The example connectivity determination circuitry 308 determines whether a connectivity issue was found (block 504). For example, the connectivity determination circuitry 308 reads (e.g., analyzes, processes, etc.) the responses from the one or more secondary agents, provided by the primary agent, to determine whether a connection between one or more secondary agents and the primary agent has been terminated, failed, etc.
In some examples, when the connectivity determination circuitry 308 determines that no connectivity issue has been found (e.g., block 504 returns a value NO), control returns to block 502. For example, no further analysis is required if all secondary agents are fully connected to the primary agent. In some examples, the connectivity determination circuitry 308 notifies the user interface update circuitry 306 (
In some examples, when the connectivity determination circuitry 308 determines that a connectivity issue has been found (e.g., block 504 returns a value YES), the example connectivity determination circuitry 308 identifies the disconnected secondary agent (block 506). For example, the connectivity determination circuitry 308 identifies a name of the compute node(s) 114a-d, hosting the secondary agent(s) 124a-d that has been disconnected from the primary agent 122.
The example user interface update circuitry 306 updates a connectivity status of the secondary agent (block 508). For example, the user interface update circuitry 306 is notified, by the connectivity determination circuitry 308, that a particular secondary agent has been disconnected from the primary agent. In some examples, the user interface update circuitry 306 obtains an instruction from the client interface(s) 110 to rectify the failed connection. In some examples, the user interface update circuitry 306 instructs the client interface(s) 110 to update the second column 414 of the second user interface 410 to indicate which secondary agent has been disconnected. For example, the second column 414 is to display “AGENT DISCONNECTED” next to and/or associated with the identified secondary agent in response to receiving an instruction from the user interface update circuitry 306.
The example rectification circuitry 310 (
In some examples, when the rectification circuitry 310 receives a request to rectify the connection (e.g., block 510 returns a value YES), the rectification circuitry 310 rectifies the connection (block 512). For example, the rectification circuitry 310 executes a two-part process, described below in connection with
The example connectivity determination circuitry 308 determines whether there is another secondary agent with a connectivity issue (block 514). For example, in response to the rectification circuitry 310 rectifying the connection between the identified secondary agent and the primary agent, the connectivity determination circuitry 308 can move on to identify other connectivity issues.
In some examples, when the rectification circuitry 310 does not receive a request to rectify the connection (e.g., block 510 returns a value NO), the connectivity determination circuitry 308 determines whether there is another secondary agent with a connectivity issue (block 514). For example, a user and/or administrator may not utilize the secondary agent that has a failed connection and, thus, may not take an action to rectify it. In such an example, the connectivity determination circuitry 308 continues to determine whether there are issues with other secondary agents.
The example operations 500 end when the connectivity determination circuitry 308 determines that there are no connectivity issues with the secondary agents. In some examples, the operations 500 restart when the connectivity determination circuitry 308 triggers an execution of the background thread.
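For illustration, the operations just described (blocks 502 through 514) can be consolidated into a single monitoring loop. The following Python sketch uses stand-in callables for the circuitry described above; the polling interval and the callable names are assumptions made for the example, and the disconnected status label follows the description of block 508.

import time

def monitor_connectivity(check_agents, update_status, rectify_requested, rectify,
                         poll_seconds: int = 60) -> None:
    while True:                                          # background-thread trigger (block 502)
        for agent, connected in check_agents():          # connectivity issue found? (block 504)
            if connected:
                continue
            # The disconnected agent is identified (block 506) and its status updated (block 508).
            update_status(agent, "AGENT DISCONNECTED")
            if rectify_requested(agent):                 # request to rectify received? (block 510)
                rectify(agent)                           # two-part rectification process (block 512)
        time.sleep(poll_seconds)                         # wait for the next background run (block 514 and repeat)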
The example rectification circuitry 310 determines whether the verification failed (block 604). For example, the rectification circuitry 310 determines whether any issues were identified with the primary agent and/or the configuration circuitry 112. In some examples, an issue with the primary agent 122 and/or the configuration circuitry 112 is identified when all of the secondary agents 124a-d associated with the primary agent 122 are not connected to the primary agent 122. In some examples, an issue with the primary agent 122 and/or the configuration circuitry 112 is identified when a child service (e.g., a program executed by the primary agent 122) of the primary agent 122 is not in an operational state.
In some examples, if the rectification circuitry 310 determines that the verification has failed, the installation circuitry 304 (
The example rectification circuitry 310 verifies the primary agent 122 (block 608). For example, after the installation circuitry 304 reconfigures the primary agent 122, the rectification circuitry 310 determines whether the primary agent 122 is operational. In some examples, the rectification circuitry 310 determines the primary agent 122 is operational by sending a test command to the primary agent 122.
The example rectification circuitry 310 determines whether the verification was successful (block 610). For example, the rectification circuitry 310 determines whether the primary agent 122 returned a valid or invalid response to the test command. Additionally and/or alternatively, the rectification circuitry 310 can utilize any methods, algorithms, processes, to verify the state of the primary agent 122.
In some examples, if the rectification circuitry 310 determines that the verification was not successful (e.g., block 610 returns a value NO), control returns to block 606. For example, the rectification circuitry 310 attempts to reconfigure the primary agent 122 until the rectification circuitry 310 determines a successful state of the primary agent 122. In some examples, the rectification circuitry 310 instructs the installation circuitry 304 to utilize different steps, processes, etc., to ensure a successful reconfiguration of the primary agent 122.
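For illustration, the first part of the two-part process (blocks 602 through 610) can be sketched as a verify-and-reconfigure loop. The helper callables stand in for the rectification circuitry 310 and the installation circuitry 304, and the retry cap is an assumption for the example; as described above, the circuitry otherwise continues reconfiguring until verification succeeds.

def rectify_primary_agent(send_test_command, reconfigure_primary, max_attempts: int = 5) -> bool:
    for _ in range(max_attempts):
        if send_test_command():        # did the primary agent return a valid response? (blocks 602-604, 608-610)
            return True                # verification successful; proceed to the second part
        reconfigure_primary()          # block 606: restart and/or reconfigure the primary agent
    return False                       # still failing after the assumed retry budget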
In some examples, if the rectification circuitry 310 determines that the verification was successful (e.g., block 610 returns a value YES), control turns to the second part of the two-part process, in
The example rectification circuitry 310 determines whether the verification failed (block 704). For example, the rectification circuitry 310 determines whether the identified secondary agent(s) 124a-d has provided a valid or invalid response to the test command.
In some examples, when the rectification circuitry 310 determines that the verification did not fail (e.g., block 704 returns a value NO), control returns to block 514 of
In some examples, when the rectification circuitry 310 determines that the verification failed (e.g., block 704 returns a value YES), the rectification circuitry 310 instructs the installation circuitry 304 to copy a first key and a second key from the cloud proxy 102 (
The example installation circuitry 304 copies a third key from the example cloud proxy 102 to the example host(s) 118 (block 708). For example, the installation circuitry 304 copies the primary public key and provides the primary public key to the host(s) 118 hosting the compute node(s) 114a-d.
The example installation circuitry 304 uninstalls the identified secondary agent(s) 124a-d to set up an environment reconfiguration (block 710). For example, the installation circuitry 304 restarts the secondary agent(s) 124a-d by uninstalling the secondary agent(s) 124a-d. In some examples, the rectification circuitry 310 uninstalls the identified secondary agent(s) 124a-d. In some examples, the environment reconfiguration is equivalent to an environment shown in
The example installation circuitry 304 reinstalls the example secondary agent(s) 124a-d (block 712). For example, the installation circuitry 304 instructs the manager(s) 116 to reinstall the secondary agent(s) 124a-d on the compute node(s) 114a-d.
The example installation circuitry 304 utilizes the first key, the second key, and the third key to reconnect the example secondary agent(s) 124a-d to the primary agent 122 (block 714). For example, the installation circuitry 304 may instruct the host(s) 118 to provide the compute node(s) 114a-d with the primary public key, the secondary public key, and the secondary private key to authorize and/or authenticate the connection between the secondary agent(s) 124a-d and the primary agent 122. In some examples, the rectification circuitry 310 instructs the host(s) 118 to provide the compute node(s) 114a-d with the first, second, and third key to establish connectivity between the secondary agent(s) 124a-d and the primary agent 122.
The example rectification circuitry 310 determines whether connectivity was established between the example primary agent 122 and the example secondary agent(s) 124a-d (block 716). For example, the rectification circuitry 310 instructs the primary agent 122 to send a test command (e.g., execute the background thread) to the secondary agent(s) 124a-d. The example rectification circuitry 310 waits for a response from the primary agent 122 to determine whether the secondary agent(s) 124a-d has been successfully reinstalled and/or reconfigured. In some examples, the rectification circuitry 310 instructs the connectivity determination circuitry 308 to verify the connection between the primary agent 122 and the secondary agent(s) 124a-d.
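For illustration, the second part of the two-part process (blocks 702 through 716) could be sketched as follows. The step functions are stand-ins for the installation circuitry 304 and the rectification circuitry 310, and the key names follow the description above (secondary private key, secondary public key, and primary public key).

def rectify_secondary_agent(node, ops) -> bool:
    if ops.send_test_command(node):                       # blocks 702-704: secondary agent already responds
        return True
    keys = {
        "secondary_private": ops.copy_key_from_proxy("secondary_private", node),  # block 706
        "secondary_public": ops.copy_key_from_proxy("secondary_public", node),    # block 706
        "primary_public": ops.copy_key_from_proxy("primary_public", node),        # block 708
    }
    ops.uninstall_secondary_agent(node)                   # block 710: set up the environment reconfiguration
    ops.reinstall_secondary_agent(node)                   # block 712
    ops.reconnect_to_primary(node, keys)                  # block 714: authenticate with all three keys
    return ops.connectivity_established(node)             # block 716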
In some examples, when the rectification circuitry 310 determines that connectivity is established (e.g., block 716 returns a value YES), control returns to block 514 of
The processor platform 800 of the illustrated example includes processor circuitry 812. The processor circuitry 812 of the illustrated example is hardware. For example, the processor circuitry 812 can be implemented by one or more integrated circuits, logic circuits, FPGAs, microprocessors, CPUs, GPUs, DSPs, and/or microcontrollers from any desired family or manufacturer. The processor circuitry 812 may be implemented by one or more semiconductor based (e.g., silicon based) devices. In this example, the processor circuitry 812 implements the example installation circuitry 304, the example user interface update circuitry 306, the example connectivity determination circuitry 308, and the example rectification circuitry 310.
The processor circuitry 812 of the illustrated example includes a local memory 813 (e.g., a cache, registers, etc.). The processor circuitry 812 of the illustrated example is in communication with a main memory including a volatile memory 814 and a non-volatile memory 816 by a bus 818. The volatile memory 814 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®), and/or any other type of RAM device. The non-volatile memory 816 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 814, 816 of the illustrated example is controlled by a memory controller 817.
The processor platform 800 of the illustrated example also includes interface circuitry 820. The interface circuitry 820 may be implemented by hardware in accordance with any type of interface standard, such as an Ethernet interface, a universal serial bus (USB) interface, a Bluetooth® interface, a near field communication (NFC) interface, a Peripheral Component Interconnect (PCI) interface, and/or a Peripheral Component Interconnect Express (PCIe) interface. In this example, the interface circuitry 820 implements the example interface 302.
In the illustrated example, one or more input devices 822 are connected to the interface circuitry 820. The input device(s) 822 permit(s) a user to enter data and/or commands into the processor circuitry 812. The input device(s) 822 can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, an isopoint device, and/or a voice recognition system.
One or more output devices 824 are also connected to the interface circuitry 820 of the illustrated example. The output device(s) 824 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube (CRT) display, an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer, and/or speaker. The interface circuitry 820 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip, and/or graphics processor circuitry such as a GPU.
The interface circuitry 820 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) by a network 826. The communication can be by, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, an optical connection, etc.
The processor platform 800 of the illustrated example also includes one or more mass storage devices 828 to store software and/or data. Examples of such mass storage devices 828 include magnetic storage devices, optical storage devices, floppy disk drives, HDDs, CDs, Blu-ray disk drives, redundant array of independent disks (RAID) systems, solid state storage devices such as flash memory devices and/or SSDs, and DVD drives. In this example, the mass storage devices 828 implement the example datastore 312.
The machine readable instructions 832, which may be implemented by the machine readable instructions of
The cores 902 may communicate by a first example bus 904. In some examples, the first bus 904 may be implemented by a communication bus to effectuate communication associated with one(s) of the cores 902. For example, the first bus 904 may be implemented by at least one of an Inter-Integrated Circuit (I2C) bus, a Serial Peripheral Interface (SPI) bus, a PCI bus, or a PCIe bus. Additionally or alternatively, the first bus 904 may be implemented by any other type of computing or electrical bus. The cores 902 may obtain data, instructions, and/or signals from one or more external devices by example interface circuitry 906. The cores 902 may output data, instructions, and/or signals to the one or more external devices by the interface circuitry 906. Although the cores 902 of this example include example local memory 920 (e.g., Level 1 (L1) cache that may be split into an L1 data cache and an L1 instruction cache), the microprocessor 900 also includes example shared memory 910 that may be shared by the cores (e.g., Level 2 (L2 cache)) for high-speed access to data and/or instructions. Data and/or instructions may be transferred (e.g., shared) by writing to and/or reading from the shared memory 910. The local memory 920 of each of the cores 902 and the shared memory 910 may be part of a hierarchy of storage devices including multiple levels of cache memory and the main memory (e.g., the main memory 814, 816 of
Each core 902 may be referred to as a CPU, DSP, GPU, etc., or any other type of hardware circuitry. Each core 902 includes control unit circuitry 914, arithmetic and logic (AL) circuitry (sometimes referred to as an ALU) 916, a plurality of registers 918, the local memory 920, and a second example bus 922. Other structures may be present. For example, each core 902 may include vector unit circuitry, single instruction multiple data (SIMD) unit circuitry, load/store unit (LSU) circuitry, branch/jump unit circuitry, floating-point unit (FPU) circuitry, etc. The control unit circuitry 914 includes semiconductor-based circuits structured to control (e.g., coordinate) data movement within the corresponding core 902. The AL circuitry 916 includes semiconductor-based circuits structured to perform one or more mathematic and/or logic operations on the data within the corresponding core 902. The AL circuitry 916 of some examples performs integer based operations. In other examples, the AL circuitry 916 also performs floating point operations. In yet other examples, the AL circuitry 916 may include first AL circuitry that performs integer based operations and second AL circuitry that performs floating point operations. In some examples, the AL circuitry 916 may be referred to as an Arithmetic Logic Unit (ALU). The registers 918 are semiconductor-based structures to store data and/or instructions such as results of one or more of the operations performed by the AL circuitry 916 of the corresponding core 902. For example, the registers 918 may include vector register(s), SIMD register(s), general purpose register(s), flag register(s), segment register(s), machine specific register(s), instruction pointer register(s), control register(s), debug register(s), memory management register(s), machine check register(s), etc. The registers 918 may be arranged in a bank as shown in
Each core 902 and/or, more generally, the microprocessor 900 may include additional and/or alternate structures to those shown and described above. For example, one or more clock circuits, one or more power supplies, one or more power gates, one or more cache home agents (CHAs), one or more converged/common mesh stops (CMSs), one or more shifters (e.g., barrel shifter(s)) and/or other circuitry may be present. The microprocessor 900 is a semiconductor device fabricated to include many transistors interconnected to implement the structures described above in one or more integrated circuits (ICs) contained in one or more packages. The processor circuitry may include and/or cooperate with one or more accelerators. In some examples, accelerators are implemented by logic circuitry to perform certain tasks more quickly and/or efficiently than can be done by a general purpose processor. Examples of accelerators include ASICs and FPGAs such as those discussed herein. A GPU or other programmable device can also be an accelerator. Accelerators may be on-board the processor circuitry, in the same chip package as the processor circuitry and/or in one or more separate packages from the processor circuitry.
More specifically, in contrast to the microprocessor 900 of
In the example of
The configurable interconnections 1010 of the illustrated example are conductive pathways, traces, vias, or the like that may include electrically controllable switches (e.g., transistors) whose state can be changed by programming (e.g., using an HDL instruction language) to activate or deactivate one or more connections between one or more of the logic gate circuitry 1008 to program desired logic circuits.
The storage circuitry 1012 of the illustrated example is structured to store result(s) of the one or more of the operations performed by corresponding logic gates. The storage circuitry 1012 may be implemented by registers or the like. In the illustrated example, the storage circuitry 1012 is distributed amongst the logic gate circuitry 1008 to facilitate access and increase execution speed.
The example FPGA circuitry 1000 of
Although
In some examples, the processor circuitry 812 of
A block diagram illustrating an example software distribution platform 1105 to distribute software such as the example machine readable instructions 832 of
From the foregoing, it will be appreciated that example systems, methods, apparatus, and articles of manufacture have been disclosed that improve an operation of a cloud computing environment by updating and monitoring connections between compute nodes and management nodes. Disclosed systems, methods, apparatus, and articles of manufacture improve the efficiency of using a computing device by reducing latency of a cloud computing environment that continuously attempts to execute a command without the ability to do so due to connectivity issues. Disclosed systems, methods, apparatus, and articles of manufacture are accordingly directed to one or more improvement(s) in the operation of a machine such as a computer or other electronic and/or mechanical device.
Example methods, apparatus, systems, and articles of manufacture to improve management operations of a cloud computing environment are disclosed herein. Further examples and combinations thereof include the following:
Example 1 includes an apparatus comprising at least one memory, machine readable instructions, and processor circuitry to at least one of instantiate or execute the machine readable instructions to determine a connectivity status between a first agent operating on a proxy server and a second agent operating on a compute node, the first agent and the second agent executing an application monitoring service, in response to determining that the connectivity status is indicative of a failed connection between the first agent and second agent update the connectivity status of the second agent, and obtain an instruction to rectify the failed connection, and resolve that failed connection between the first agent and the second agent.
Example 2 includes the apparatus of example 1, wherein the first agent is a primary agent that requests metric data from the second agent.
Example 3 includes the apparatus of example 1, wherein the second agent is a secondary agent that runs on the compute node and collects metric data from the compute node in response to instructions from the first agent.
Example 4 includes the apparatus of example 1, wherein the processor circuitry is to periodically execute a background thread to determine the connectivity status between the first agent and the second agent.
Example 5 includes the apparatus of example 1, wherein the processor circuitry is to verify an operating state of the first agent to resolve the failed connection between the first agent and the second agent.
Example 6 includes the apparatus of example 5, wherein the processor circuitry is to reconfigure the first agent in response to determining an unsuccessful operating state of the first agent.
Example 7 includes the apparatus of example 1, wherein the processor circuitry is to verify an operating state of the second agent, in response to an unsuccessful operating state of the second agent provide a first key, a second key, and a third key to the second agent, the first and second cryptographic keys corresponding to the second agent and the third key a cryptographic key corresponding to the first agent, uninstall the second agent, reinstall the second agent, and instruct the second agent to reconnect to the first agent utilizing the first key, second key, and third key, the first, second, and third keys to authenticate a communication between the first agent and second agent.
Example 8 includes a non-transitory machine readable storage medium comprising instructions that, when executed, cause processor circuitry to at least determine a connectivity status between a first agent operating on a proxy server and a second agent operating on a compute node, the first agent and the second agent executing an application monitoring service, in response to determining that the connectivity status is indicative of a failed connection between the first agent and second agent update the connectivity status of the second agent, and obtain an instruction to rectify the failed connection, and resolve that failed connection between the first agent and the second agent.
Example 9 includes the non-transitory machine readable storage medium of example 8, wherein the first agent is a primary agent that requests metric data from the second agent.
Example 10 includes the non-transitory machine readable storage medium of example 8, wherein the second agent is a secondary agent that runs on the compute node and collects metric data from the compute node in response to instructions from the first agent.
Example 11 includes the non-transitory machine readable storage medium of example 8, wherein the instructions, when executed, cause processor circuitry to at least periodically execute a background thread to determine the connectivity status between the first agent and the second agent.
Example 12 includes the non-transitory machine readable storage medium of example 8, wherein the instructions, when executed, cause processor circuitry to at least verify an operating state of the first agent to resolve the failed connection between the first agent and the second agent.
Example 13 includes the non-transitory machine readable storage medium of example 12, wherein the instructions, when executed, cause processor circuitry to at least reconfigure the first agent in response to determining an unsuccessful operating state of the first agent.
Example 14 includes the non-transitory machine readable storage medium of example 8, wherein the instructions, when executed, cause processor circuitry to verify an operating state of the second agent, in response to an unsuccessful operating state of the second agent provide a first key, a second key, and a third key to the second agent, the first and second cryptographic keys corresponding to the second agent and the third key a cryptographic key corresponding to the first agent, uninstall the second agent, reinstall the second agent, and instruct the second agent to reconnect to the first agent utilizing the first key, second key, and third key, the first, second, and third keys to authenticate a communication between the first agent and second agent.
Example 15 includes a method comprising determining a connectivity status between a first agent operating on a proxy server and a second agent operating on a compute node, the first agent and the second agent executing an application monitoring service, in response to determining that the connectivity status is indicative of a failed connection between the first agent and second agent updating the connectivity status of the second agent, and obtaining an instruction to rectify the failed connection, and resolving that failed connection between the first agent and the second agent.
Example 16 includes the method of example 15, wherein the first agent is a primary agent that requests metric data from the second agent.
Example 17 includes the method of example 15, wherein the second agent is a secondary agent that runs on the compute node and collects metric data from the compute node in response to instructions from the first agent.
Example 18 includes the method of example 15, further including periodically executing a background thread to determine the connectivity status between the first agent and the second agent.
Example 19 includes the method of example 15, further including verifying an operating state of the first agent to resolve the failed connection between the first agent and the second agent.
Example 20 includes the method of example 19, further including reconfiguring the first agent in response to determining an unsuccessful operating state of the first agent.
Example 21 includes the method of example 15, further including verifying an operating state of the second agent, in response to an unsuccessful operating state of the second agent providing a first key, a second key, and a third key to the second agent, the first and second cryptographic keys corresponding to the second agent and the third key a cryptographic key corresponding to the first agent, uninstalling the second agent, reinstalling the second agent, and instructing the second agent to reconnect to the first agent utilizing the first key, second key, and third key, the first, second, and third keys to authenticate a communication between the first agent and second agent.
The following claims are hereby incorporated into this Detailed Description by this reference. Although certain example systems, methods, apparatus, and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all systems, methods, apparatus, and articles of manufacture fairly falling within the scope of the claims of this patent.