This disclosure relates generally to clusters and, more particularly, to methods and apparatus to generate and manage logical workload domain clusters in a computing environment.
A software-defined data center (SDDC) is a data center implemented by software in which hardware is virtualized and provided to users as services. SDDCs allow for dynamically configuring and deploying applications and resources per customer requests and per customer-defined specifications and performance requirements.
In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts. The figures are not necessarily to scale.
Unless specifically stated otherwise, descriptors such as “first,” “second,” “third,” etc., are used herein without imputing or otherwise indicating any meaning of priority, physical order, arrangement in a list, and/or ordering in any way, but are merely used as labels and/or arbitrary names to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for identifying those elements distinctly within the context of the discussion (e.g., within a claim) in which the elements might, for example, otherwise share a same name.
As used herein, the phrase “in communication,” including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events.
As used herein, “programmable circuitry” is defined to include (i) one or more special purpose electrical circuits (e.g., an application specific integrated circuit (ASIC)) structured to perform specific operation(s) and including one or more semiconductor-based logic devices (e.g., electrical hardware implemented by one or more transistors), and/or (ii) one or more general purpose semiconductor-based electrical circuits programmable with instructions to perform specific function(s) and/or operation(s) and including one or more semiconductor-based logic devices (e.g., electrical hardware implemented by one or more transistors). Examples of programmable circuitry include programmable microprocessors such as Central Processor Units (CPUs) that may execute first instructions to perform one or more operations and/or functions, Field Programmable Gate Arrays (FPGAs) that may be programmed with second instructions to cause configuration and/or structuring of the FPGAs to instantiate one or more operations and/or functions corresponding to the first instructions, Graphics Processor Units (GPUs) that may execute first instructions to perform one or more operations and/or functions, Digital Signal Processors (DSPs) that may execute first instructions to perform one or more operations and/or functions, XPUs, Network Processing Units (NPUs), one or more microcontrollers that may execute first instructions to perform one or more operations and/or functions, and/or integrated circuits such as Application Specific Integrated Circuits (ASICs). For example, an XPU may be implemented by a heterogeneous computing system including multiple types of programmable circuitry (e.g., one or more FPGAs, one or more CPUs, one or more GPUs, one or more NPUs, one or more DSPs, etc., and/or any combination(s) thereof) and orchestration technology (e.g., application programming interface(s) (API(s))) that may assign computing task(s) to whichever one(s) of the multiple types of programmable circuitry is/are suited and available to perform the computing task(s).
As used herein, integrated circuit/circuitry is defined as one or more semiconductor packages containing one or more circuit elements such as transistors, capacitors, inductors, resistors, current paths, diodes, etc. For example, an integrated circuit may be implemented as one or more of an ASIC, an FPGA, a chip, a microchip, programmable circuitry, a semiconductor substrate coupling multiple circuit elements, a system on chip (SoC), etc.
In computing environments, clusters of computing devices can be deployed to provide redundancy and distribute compute resources across multiple physical devices. In some implementations, multiple host computing systems can be deployed, and each host computing system can provide a physical platform for virtual machines, containers, or other virtualized endpoints. The hosts may further provide additional compute resources, including virtual networking operations such as routing, encapsulation, or other similar networking operations to support the communications of the virtual endpoints. In some examples, an organization may deploy multiple physical clusters, and each of the clusters may include a plurality of hosts. The clusters may be deployed in a single datacenter or can be deployed across multiple datacenters, edge deployments (stores, workplaces, and the like), and geographic locations.
An SDDC environment typically requires configuration of compute resources, network resources, storage resources, and security protocols. An SDDC executes workload domains in accordance with resource configurations (e.g., configurations of clusters and corresponding hosts) corresponding to these workload domains. As used herein, a workload domain is a policy-based resource container with specific availability and/or performance attributes that combines virtual compute resources, virtual storage resources, and/or virtual network resources into a useable execution environment. In examples disclosed herein, a workload domain is deployed in a virtualization environment and used to execute deployed applications.
In some examples, an SDDC environment includes, manages, and deploys a plurality of workload domains. In such examples, at least some, if not all, of the plurality of workload domains are homogeneous. As used herein, homogeneous workload domains are workload domains that share the same hardware and/or virtual resources (e.g., servers, memory, etc.). For example, when a first workload domain is created for an SDDC to deploy and manage, the first workload domain is assigned to a set of one or more servers (e.g., physical and/or virtual servers) that are managed by the SDDC. When a second workload domain is created for the SDDC to deploy and manage, the second workload domain is also assigned to the set of one or more servers (e.g., physical and/or virtual). In this example, the first and second workload domains may have separate functions (e.g., execute different applications), but they utilize the same compute resources. As such, the first and second workload domains are homogeneous workload domains. In some examples, the first and second workload domains may operate in conjunction with each other. The operation of the individual workload domains is determined by a user creating a virtual environment and, thus, may function in any way desired and known by the user. In some examples, when there are 10, 20, 50, 100, or any number of workload domains deployed by the SDDC, managing them becomes cumbersome. For example, updating policies of the SDDC, updating firmware versions, etc., may take a significant amount of time to perform, because every individual workload domain will require the updates.
Additionally, there is an even greater number of clusters and hosts managed by the SDDC when a large number of workload domains are deployed by the SDDC. For example, a workload domain includes one or more clusters, and a cluster includes one or more hosts. Therefore, for a given workload domain, there may be tens to hundreds of hosts. While previous solutions to manage, as one unit, such a large number of resources included grouping workload domains into logical workload domains (LWDs), such solutions did not cover managing, as one unit, the clusters and associated hosts included in the workload domains. For example, a user and/or operator of a workload domain could apply firmware updates, security policy updates, certificate management, etc., at a LWD level, but not at a cluster and/or host level. Therefore, users and/or operators could not apply configuration management, desired state management, lifecycle updates, etc., to hosts in a uniform manner. For example, when hosts were to be updated, a user and/or operator had to apply the update manually to one host at a time.
As used herein, “moving hosts,” “assigning hosts,” and/or “distributing hosts” refers to a process of reallocating hosts. In virtual computing, hosts can be reallocated to different clusters through a process known as live migration or live relocation. Live migration allows for the movement of virtual machines (VMs) from one physical host to another without disruption to running (e.g., operating) applications. As used herein, a “LWD” is a logical grouping of two or more workload domains based on a certain criterion. As used herein, a “cluster” is a cluster of two or more hosts (e.g., computing devices, computing systems, compute resources, etc.) deployed to provide redundancy, distribute compute resources across multiple physical devices, and/or provide physical platforms for virtual machines, containers, and/or other virtual endpoints.
Examples disclosed herein define a logical workload domain cluster (LWDC) as a logical grouping of two or more workload domains and corresponding clusters based on a criterion. In some examples, the LWDC provides a user and/or operator the ability to manage hosts and/or clusters at a LWD level. In some examples, the LWDC provides a user and/or operator the ability to apply a desired state configuration across clusters to have a similar set of configurations on a cluster level. In some examples, such a desired state configuration normalizes a standard across workload domains. In some examples, the LWDC facilitates an automatic resource optimization of hosts operating under and/or in a LWD. For example, examples disclosed herein identify hosts from one workload domain that are available (e.g., not in use, have a threshold amount of available storage, have a threshold amount of available processing capabilities, etc.) and assign (e.g., move, reallocate, etc.) them to a different workload domain of the same LWD, when the different workload domain has a critical situation due to the shortage and/or overuse of its resources (e.g., a resource crunch).
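For illustration only (not part of the disclosed examples), the following minimal Python sketch models the hierarchy these definitions imply; all class and field names are hypothetical:

```python
# A minimal sketch of the host/cluster/workload-domain/LWDC hierarchy.
# Names and fields are illustrative, not taken from the disclosure.
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    cpu_usage_pct: float = 0.0          # current CPU utilization, 0-100

@dataclass
class Cluster:
    name: str
    hosts: list = field(default_factory=list)   # two or more Host objects

@dataclass
class WorkloadDomain:
    name: str
    clusters: list = field(default_factory=list)

@dataclass
class LogicalWorkloadDomainCluster:
    """Logical grouping of workload domains and their corresponding clusters."""
    criterion: str                      # e.g., "same application", "same security policy"
    domains: list = field(default_factory=list)

# Example: group two domains that run parts of the same application.
vi_1 = WorkloadDomain("VI-1", [Cluster("VI-1-CLUSTER-1"), Cluster("VI-1-CLUSTER-2")])
vi_2 = WorkloadDomain("VI-2", [Cluster("VI-2-CLUSTER-1"), Cluster("VI-2-CLUSTER-2")])
lwdc = LogicalWorkloadDomainCluster(criterion="same application", domains=[vi_1, vi_2])
```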
As used herein, a “desired state configuration” refers to how a user and/or operator wants to set up the host and/or cluster. For example, a desired state configuration of a host may be a specific profile created by a user and/or operator. In some examples, a profile and/or a desired state configuration is created utilizing a template (e.g., a configuration template). In some examples, such templates are pre-defined by a provider (e.g., VMware Inc.) or a user/operator. For example, when a user and/or operator wishes to create a virtual infrastructure (VI) type workload domain, the user and/or operator can retrieve a template that can be used to configure hosts to operate specifically for a VI. In such an example, the template may be created by a provider, by the user/operator at a previous time, etc.
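As an illustration of what such a reference configuration template might contain, the following Python sketch shows a hypothetical VI-type template and a merge of user-supplied settings over its defaults; every key and value shown is an assumption, not a field defined by this disclosure:

```python
# A hypothetical reference configuration template for a VI-type workload
# domain. All keys and values are illustrative assumptions.
VI_TEMPLATE = {
    "domain_type": "VI",
    "lwd_id": "LWD-116a",              # LWD to which new domains are added
    "cluster_image": "esxi-8.0u2",     # hypothetical cluster image identifier
    "security_policy": "baseline-v3",
    "host_config_protocol": "DHCP",
    "min_hosts_per_cluster": 2,
}

def apply_template(template: dict, overrides: dict) -> dict:
    """Merge user-supplied settings over the template defaults."""
    config = dict(template)
    config.update(overrides)
    return config

domain_config = apply_template(VI_TEMPLATE, {"domain_name": "VI-3", "username": "admin"})
```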
The example LWD system 100 is a system that operates on top of or outside of one or more virtual server racks. The example LWD system 100 is a high-level management system that facilitates the creation of LWDs 116 and that facilitates the management of both the LWDs 116 and clusters 118. For example, the LWD system 100 includes components that configure resources, facilitate updates, configure security protocols, etc. For example, the LWD system 100 configures, deploys, and/or upgrades logical workload domains 116. In addition, the example LWD system 100 configures, deploys, upgrades, moves, etc. the example clusters 118. Further, the example LWD system 100 may be implemented by a physical server, a virtual server, and/or a combination thereof.
The example LWD system 100 includes the example LWD management controller 102 to configure and/or deploy LWDs 116. The example LWD management controller 102 includes at least one read/write connection that may be connected to a network to receive API calls. For example, the LWD management controller 102 communicates with an SDDC manager, controlled by a user, to create LWDs 116, remove LWDs 116, etc. In the illustrated example, the LWD management controller 102 is to retrieve reference configuration templates from the datastore 106 and/or configure the LWDs 116 based on settings of the retrieved reference configuration templates. The example LWD management controller 102 selects a reference configuration template based on instructions from API calls. The example LWD management controller 102 may select a reference configuration template based on a type (e.g., a banking type, a web server type, a media streaming type, etc.) of the application to be deployed in the workload domain and/or based on the LWD 116 to which the workload domain is to belong, which may be determined based on user input. In some examples, the LWD management controller 102 is instantiated by programmable circuitry executing LWD management controller instructions and/or configured to perform operations such as those represented by the flowchart of
In some examples, the LWD system 100 includes means for configuring and deploying LWDs. For example, the means for configuring and deploying LWDs may be implemented by LWD management controller circuitry such as the LWD management controller 102. In some examples, the LWD management controller 102 may be instantiated by programmable circuitry such as the example programmable circuitry 812 of
The example LWD system 100 includes the example LWD operator controller 104 to apply policies and/or to update deployed LWDs 116. For example, the LWD operator controller 104 facilitates applying security policies, managing upgrades, performing backup and restore operations, and applying compliance updates at the LWD level. The example LWD operator controller 104 can simultaneously and/or concurrently orchestrate a service for all workload domains within a LWD 116 and, thus, decrease an amount of time spent on orchestrating the service to individual workload domains. As used herein, orchestrating is defined as the creation, management, manipulation, and/or decommissioning of cloud resources (e.g., computing, storage, and/or networking resources) in order to realize customer computing requests (e.g., processing requests, hosting requests, etc.), while conforming to operational objectives of cloud service providers. Orchestrating a service includes managing, manipulating, and/or decommissioning cloud resources corresponding to one or more logical workload domains (e.g., the cloud resources making up the logical workload domains) in order to instantiate (e.g., realize) the service.
In some examples, the LWD system 100 includes means for applying policies to LWDs. For example, the means for applying policies to LWDs may be implemented by LWD operator controller circuitry such as the LWD operator controller 104. In some examples, the LWD operator controller 104 may be instantiated by programmable circuitry such as the example programmable circuitry 812 of
The example LWD system 100 includes the example datastore 106 which includes and/or stores reference configuration templates (workload domain configuration templates) and/or cluster assignments. Reference configuration templates provide configuration settings for the workload domains. As used herein, a configuration template is a data file that stores general configuration settings for workload domains. In examples disclosed herein, the configuration templates are used by the LWD management controller 102 to initially configure the workload domains and LWDs. Multiple configuration templates with different settings may be provided for different workload domains such as, for example, a workload domain for using a banking application, a workload domain for using a streaming service application, etc. In some examples, the reference configuration templates include metadata indicative of ones of logical workload domains 116a, 116b to which the reference configuration templates correspond. For example, a first reference configuration template may be a replica of a workload domain in a first LWD 116a and a second reference configuration template may be a replica of a workload domain in a second LWD 116b. The cluster assignments stored in the example datastore 106 are indicative of which clusters 118 are associated with which LWD 116. In some examples, the LWD management controller 102 generates the cluster assignments.
The example datastore 106 of this example may be implemented by a volatile memory (e.g., a Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM), etc.) and/or a non-volatile memory (e.g., flash memory). The example datastore 106 may additionally or alternatively be implemented by one or more double data rate (DDR) memories, such as DDR, DDR2, DDR3, DDR4, mobile DDR (mDDR), etc. The example datastore 106 may additionally or alternatively be implemented by one or more mass storage devices such as hard disk drive(s), compact disk (CD) drive(s), digital versatile disk (DVD) drive(s), solid-state disk drive(s), etc. While in the illustrated example the datastore 106 is illustrated as a single datastore, the datastore 106 may be implemented by any number and/or type(s) of datastores. Furthermore, the data stored in the example datastore 106 may be in any data format such as, for example, binary data, comma delimited data, tab delimited data, structured query language (SQL) structures, etc.
The example LWD system 100 includes the example LWD cluster management controller 108 to monitor resource utilization across clusters 118 and manage (e.g., reallocate, move, change, extend, etc.) the clusters 118 based on feedback from resource utilization monitoring. For example, the LWD cluster management controller 108 performs cluster movement between workload domains in a LWD 116 and/or host movement between clusters 118 in a LWD 116 based on resource utilization. An example of cluster movement and host movement is described in further detail below in connection with the clusters 118 and in connection with
The example LWD system 100 includes the example LWD cluster operator controller 110 to orchestrate operational processes on clusters 118 at a logical domain cluster level. For example, the LWD cluster operator controller 110 triggers upgrades across hosts in a LWD 116, applies desired state configurations on clusters in a LWD 116, triggers certificate management across hosts in a LWD 116 and/or across selected clusters in a LWD 116, applies security and password policies across clusters in a LWD 116, etc. In some examples, the LWD cluster operator controller 110 improves the efficiency of orchestrating the day-to-day infrastructure operations of one or more workload domains by issuing one operation instruction for all the clusters as opposed to an operation instruction for each cluster. The example LWD cluster operator controller 110 is described in further detail below in connection with
The example LWD system 100 includes the example operations management controller 112 to perform individual operations on individual clusters when a cluster does not belong to a LWD. For example, some workload domains may not be configured to be included in a group of workload domains (e.g., logical workload domains) and, thus, the corresponding clusters are not configured to be included in a group of clusters (e.g., logical workload domain clusters). As such, the example operations management controller 112 is provided to apply operations to the clusters not included in a LWD. In some examples, the LWD cluster operator controller 110 may receive a request from an SDDC manager to perform an operation on cluster n, and further determine that the cluster n is not included in any LWD. In such examples, the LWD cluster operator controller 110 initiates the operations management controller 112 to perform the operation on the cluster n.
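For illustration, a hedged Python sketch of this routing logic follows; the function names and datastore shape are placeholders, not APIs of the disclosed system:

```python
# A sketch of the routing logic described above: if the target cluster is not
# part of any LWD, fall back to per-cluster handling. All names are placeholders.
def find_lwd_for_cluster(cluster_name: str, lwd_assignments: dict):
    """lwd_assignments maps LWD id -> list of cluster names (datastore contents)."""
    for lwd_id, clusters in lwd_assignments.items():
        if cluster_name in clusters:
            return lwd_id
    return None

def handle_operation(cluster_name: str, operation: str, lwd_assignments: dict) -> str:
    lwd_id = find_lwd_for_cluster(cluster_name, lwd_assignments)
    if lwd_id is None:
        # Cluster n is not in any LWD: apply the operation individually.
        return f"operations_management_controller: {operation} on {cluster_name}"
    # Otherwise orchestrate the operation across the whole LWD.
    return f"lwd_cluster_operator: {operation} across {lwd_id}"

assignments = {"LWD-116a": ["VI-1-CLUSTER-1", "VI-1-CLUSTER-2"]}
print(handle_operation("VI-9-CLUSTER-1", "upgrade", assignments))
```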
The example LWD system 100 includes the example lifecycle management controller 114 to perform individual upgrades on individual clusters when a cluster does not belong to a LWD. The example lifecycle management controller 114 obtains requests and/or instructions from the LWD cluster operator controller 110 to apply upgrades to clusters. In some examples, the LWD cluster operator controller 110 may receive a request from an SDDC manager to upgrade a cluster n, and further determine that the cluster n is not included in any LWD. In such examples, the LWD cluster operator controller 110 initiates the lifecycle management controller 114 to upgrade the cluster n.
The example LWD system 100 includes the example LWDs 116, which are one or more logical groupings of a number of workload domains and corresponding clusters of resources (e.g., hosts) grouped based on certain criteria. In some examples, the criteria/criterion for grouping is based on applications that are to run on the workload domains' clusters. For example, workload domains utilized for a banking application may be grouped together, workload domains used for a streaming service application may be grouped together, etc. In some examples, the criteria/criterion for grouping is based on user choices. For example, if a user wants particular workload domains and/or clusters to be handled simultaneously and/or concurrently (e.g., managed at one time), the user can select which workload domains to group together.
The example LWD system 100 includes the example clusters 118, each of which represents a cluster of hosts assigned to a workload domain. For example, a first workload domain (VI-1) includes a first cluster (VI-1-CLUSTER-1) and a second cluster (VI-1-CLUSTER-2). In this example, the first cluster and the second cluster are assigned to the first workload domain and, thus, the clusters of hosts perform tasks, operations, etc., for the first workload domain. The hosts represent physical computers, which can each include memory and at least one processing system to provide the operations disclosed herein. A cluster may utilize any number of hosts. In some examples, the clusters 118 can be deployed by an organization to provide resources for virtual endpoints, including virtual machines, containers, and the like. In some examples, the hosts can virtualize the physical resources or components to virtual or logical resource representations and provide access to the virtual or logical resource representation of the physical resources to the virtual machines or other virtualized endpoints. The resources can include processing resources, memory resources, networking resources, and the like. The organization may deploy one or more clusters 118 within the same datacenter or across multiple datacenters in different geographic locations. These locations can be remote or moveable, such as retail locations, cruise ships, oil rigs, or other similar deployments of hosts for virtual machines.
In
In the illustrated example of
In some examples, another deployment of a group of workload domains is configured to follow the same security policy for all of its workload domains. In some examples, when a user configures workload domains to follow the same security policy, the LWD management controller 102 can identify this criterion and group the workload domains into a LWD. For example, the second LWD 116b includes workload domain 3 (VI-3) and workload domain 4 (VI-4), which are both configured to follow the same security policy. Therefore, the second LWD 116b is created based on the security policy criterion. By creating the second LWD 116b based on the security policy, the example LWD operator controller 104 is enabled to configure updates and security policies in one place instead of configuring two workload domains separately. In some examples, the LWD cluster operator controller 110 is enabled to configure updates and security policies in one place instead of configuring four clusters separately (e.g., clusters VI-3-CLUSTER-1, VI-3-CLUSTER-2, VI-4-CLUSTER-1, VI-4-CLUSTER-2).
In some examples, as described above, the LWD cluster management controller 108 performs cluster movement between workload domains in a LWD 116 and/or host movement between clusters 118 in a LWD 116 based on resource utilization. In such an example, the LWD cluster management controller 108 can move the second cluster (VI-1-CLUSTER-2) from the first workload domain (VI-1) to the second workload domain (VI-2), due to their logical grouping (e.g., being grouped together in LWD 116a). In this example, the second cluster comes under the control of the second workload domain (VI-2), along with the third cluster (VI-2-CLUSTER-1) and the fourth cluster (VI-2-CLUSTER-2). In some examples, LWD cluster management controller 108 moves the second cluster from workload domain 1 to workload domain 2 because workload domain 2 has a resource crunch (e.g., a critical situation due to the shortage and/or overuse of the workload domain resources). In some examples, the LWD cluster management controller 108 can move a host from the first cluster (VI-1-CLUSTER-1) to the second cluster (VI-1-CLUSTER-2). For example, if the second cluster has a resource crunch, and the first cluster has an available host, the LWD cluster management controller 108 can configure the available host to operate for the second cluster. The host and cluster movement is described in further detail below in connection with
In some examples, the LWD management controller 102 is configured to group the workload domain based on application criteria. As such, the example workload domain 3 (VI-3) and the example workload domain 4 (VI-4) execute parts of the same application. The example LWD management controller 102 may analyze configuration settings of the workload domains to determine that workload domain 3 (VI-3) and workload domain 4 (VI-4) execute parts of the same application. In some examples, the configuration settings include a job title of the workload domain. For example, both the workload domain 3 (VI-3) and workload domain 4 (VI-4) include information indicating their job title is JOB 1. The example LWD management controller 102 creates the second LWD 116b to be consumed as a resource for JOB 1. For example, the LWD management controller 102 encloses the set of workload domains (e.g., workload domain 3 (VI-3) and workload domain 4 (VI-4)) in the second LWD 116b as a set of workload domains that can be managed together as a single entity.
The example LWD cluster management controller 108 includes an example cluster management control interface 202, an example host management control interface 204, an example resource monitor 206, an example resource optimization controller 208, an example host movement controller 210, and an example scheduling controller 212. In some examples, the resource monitor 206 is instantiated by programmable circuitry executing resource monitor instructions and/or configured to perform operations such as those represented by the flowchart of
In the illustrated example of
In some examples, the LWD cluster management controller 108 includes means for obtaining requests. For example, the means for obtaining requests may be implemented by cluster management control interface circuitry such as the cluster management control interface 202. In some examples, the cluster management control interface 202 may be instantiated by programmable circuitry such as the example programmable circuitry 812 of
In the illustrated example of
In some examples, the LWD cluster management controller 108 includes means for obtaining second requests. For example, the means for obtaining second requests may be implemented by host management control interface circuitry such as the host management control interface 204. In some examples, the host management control interface 204 may be instantiated by programmable circuitry such as the example programmable circuitry 812 of
In the illustrated example of
In some examples, the resource monitor 206 performs a resource usage analysis on the collected resource usage data to determine whether a cluster is in a low resource state due to the shortage and/or overuse of its resources. As used herein, a low resource state of a cluster refers to a cluster that includes hosts operating at a high capacity, such as using most (or all) of their storage, CPU cycles, bandwidth, etc. In some examples, the resource monitor 206 is provided with pre-defined threshold values for each low resource state metric (e.g., CPU usage, memory utilization, disk input/output (I/O), network traffic, and storage capacity). These pre-defined threshold values, referred to herein as crunch thresholds, represent the upper limits or abnormal conditions beyond which a resource crunch might occur. In examples disclosed herein, a resource crunch occurs when resources of a system (e.g., a workload domain, a cluster, etc.) become overloaded such that they are “crunched” under the resource-intensive use of their tasks. For example, high CPU usage, low free memory, or excessive disk latency could indicate a resource crunch. In another example, a crunch threshold could be assigned to a cluster. A crunch threshold value may be set by a system administrator or an intelligent computer process to represent when a cluster is overloaded. For example, a system administrator can set a crunch threshold value to be lower for a first cluster relative to a second cluster, because the system administrator may desire the first cluster to use fewer resources than the second cluster. In one example, the first cluster (VI-1-CLUSTER-1) may have a crunch threshold of 60%, indicating that a 60% resource usage or higher by the first cluster is abnormal or an upper limit, while the second cluster (VI-1-CLUSTER-2) may have a crunch threshold of 90%, indicating that a 90% resource usage or higher by the second cluster is abnormal or an upper limit. The example resource monitor 206 utilizes the crunch threshold(s) to determine whether a cluster has a resource crunch. For example, the resource monitor 206 compares the resource usage data (e.g., denoted by a percentage) to the crunch threshold of a cluster, such as the 60% crunch threshold of the first cluster (VI-1-CLUSTER-1), to determine whether that cluster is in a critical situation. In some examples, each cluster has a different crunch threshold based on the type of application it is configured to execute, based on the type of workload domain it is deployed in, based on the type of hardware it abstracts and/or operates on, etc. Additionally or alternatively, each cluster has the same crunch threshold. In any event, the example resource monitor 206 is provided the crunch threshold for the clusters 118 in a LWD 116 and uses it to identify resource crunches in one or more clusters.
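For illustration, the following Python sketch shows the crunch-threshold comparison just described, using the 60%/90% example thresholds; the metric names and data shapes are assumptions:

```python
# A minimal sketch of the crunch-threshold comparison described above.
# Threshold values and metric names are illustrative assumptions.
CRUNCH_THRESHOLDS = {            # per-cluster upper limits (percent)
    "VI-1-CLUSTER-1": 60.0,
    "VI-1-CLUSTER-2": 90.0,
}

def has_resource_crunch(cluster: str, usage: dict) -> bool:
    """Return True if any monitored metric meets or exceeds the cluster's
    crunch threshold (i.e., the cluster is in a low resource state)."""
    threshold = CRUNCH_THRESHOLDS[cluster]
    return any(value >= threshold for value in usage.values())

usage_data = {"cpu_pct": 72.0, "memory_pct": 41.0, "disk_io_pct": 18.0}
print(has_resource_crunch("VI-1-CLUSTER-1", usage_data))  # True: 72% >= 60%
print(has_resource_crunch("VI-1-CLUSTER-2", usage_data))  # False: all < 90%
```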
In some examples, the LWD cluster management controller 108 includes means for obtaining resource usage data. For example, the means for obtaining resource usage data may be implemented by resource monitor circuitry such as the resource monitor 206. In some examples, the resource monitor 206 may be instantiated by programmable circuitry such as the example programmable circuitry 812 of
In the illustrated example of
In some examples, the resource optimization controller 208 triggers host movement. For example, the resource optimization controller 208 sends an instruction and/or request to the host movement controller 210, where the instruction includes information relating to which cluster and host is to be reallocated (e.g., the source cluster) and to which cluster the host is to be reallocated (e.g., the destination cluster). In some examples, the resource optimization controller 208 obtains feedback from the host movement controller 210. For example, the host movement controller 210 notifies the resource optimization controller 208 when reallocation of a host is complete. In some examples, upon a notification from the host movement controller 210, the resource optimization controller 208 “recalculates” the resource usage of both the source cluster and the destination cluster. As used herein, “recalculating” resource usage includes collecting resource usage data from the source cluster and destination cluster, after a host has been reallocated, and further comparing the resource usage data to the crunch threshold to determine if there is still a critical situation at the destination cluster. For example, if the resource optimization controller 208 was notified that a host of VI-1-CLUSTER-2, having a resource usage of 20% or less, was reallocated to VI-1-CLUSTER-1, the resource optimization controller 208 would collect resource usage data from both VI-1-CLUSTER-1 and VI-1-CLUSTER-2 and determine whether either of those clusters is in a low resource state. Additionally or alternatively, the resource optimization controller 208 requests the resource monitor 206 to determine whether VI-1-CLUSTER-1 or VI-1-CLUSTER-2 is in a low resource state. In some examples, if the resource optimization controller 208 determines that a cluster (source or destination), after reallocation of a host, is still in (or newly in) a low resource state, the process for identifying an available host is repeated and host reallocation reoccurs.
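The following simplified Python sketch, offered as an assumption-laden illustration rather than the disclosed implementation, walks through this reallocate-then-recalculate flow: find an available host (at or below a hypothetical 20% availability threshold), move it to the crunched cluster, and recompute usage for both clusters:

```python
# A simplified sketch of the host-reallocation flow. Thresholds, the use of
# average host usage as the cluster metric, and data shapes are assumptions.
AVAILABILITY_THRESHOLD = 20.0   # host usage (%) at or below which it may move
CRUNCH_THRESHOLD = 60.0         # cluster usage (%) at or above which it is crunched

def cluster_usage(hosts: dict) -> float:
    """Average per-host usage as a proxy for cluster-level resource usage."""
    return sum(hosts.values()) / len(hosts) if hosts else 0.0

def find_available_host(hosts: dict):
    for name, usage in hosts.items():
        if usage <= AVAILABILITY_THRESHOLD:
            return name
    return None

def rebalance(clusters: dict) -> None:
    """clusters maps cluster name -> {host name: usage %}. Mutates in place."""
    for dest, dest_hosts in clusters.items():
        if cluster_usage(dest_hosts) < CRUNCH_THRESHOLD:
            continue                      # destination has no resource crunch
        for src, src_hosts in clusters.items():
            if src == dest or len(src_hosts) <= 1:
                continue                  # keep at least one host in the source
            host = find_available_host(src_hosts)
            if host is not None:
                dest_hosts[host] = src_hosts.pop(host)   # reallocate the host
                # "Recalculate" usage for source and destination after the move.
                print(f"{src}: {cluster_usage(src_hosts):.0f}%, "
                      f"{dest}: {cluster_usage(dest_hosts):.0f}%")
                break

rebalance({
    "VI-1-CLUSTER-1": {"host-a": 85.0, "host-b": 78.0},   # crunched cluster
    "VI-1-CLUSTER-2": {"host-c": 15.0, "host-d": 35.0},   # host-c is available
})
```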
In some examples, the resource optimization controller 208 optimizes clusters across workload domains. For example, the resource optimization controller 208 can cause a reallocation of clusters within a LWD. In some examples, the resource optimization controller 208 and/or the resource monitor 206 identifies that the first and second clusters (VI-1-CLUSTER-1, VI-1-CLUSTER-2) in workload domain 1 (VI-1) are both in a low resource state. In such an example, the resource optimization controller 208 determines that the third cluster (VI-2-CLUSTER-1) in workload domain 2 (VI-2) does not exceed its availability threshold. Therefore, the example resource optimization controller 208 can cause a reallocation of the third cluster to the workload domain 1 (e.g., VI-2-CLUSTER-1 becomes VI-1-CLUSTER-3). In some examples, the resource optimization controller 208 causes the host movement controller 210 to execute the reallocation of a cluster from one workload domain to another.
In some examples, the LWD cluster management controller 108 includes means for optimizing resource usage across clusters. For example, the means for optimizing resource usage across clusters may be implemented by resource optimization controller circuitry such as the resource optimization controller 208. In some examples, the resource optimization controller 208 may be instantiated by programmable circuitry such as the example programmable circuitry 812 of
The example LWD cluster management controller 108 includes the example host movement controller 210 to reallocate a host from a source cluster to a destination cluster. The example host movement controller 210 obtains instructions from the example resource optimization controller 208. In some examples, the instructions include information corresponding to the source cluster, the source host, and the destination cluster. To reallocate a source host to a destination cluster, the example host movement controller 210 may reconfigure the destination cluster with information of a source host. For example, specific cluster properties are to be updated to include the moving host (e.g., the source host). In some examples, the host movement controller 210 notifies the resource optimization controller 208 when reconfiguration and, thus, reallocation of a host, is complete.
In some examples, the host movement controller 210 is configured to generate a stretch cluster. As used herein, a stretch cluster refers to a cluster having hosts in two or more availability zones. An availability zone is a collection of infrastructure components that run on a physically distinct, independent infrastructure and is physically separate from a different availability zone. In some examples, a first availability zone is physically separate from a second availability zone when the collection of infrastructure components (e.g., servers, data centers, physical hardware components, etc.) of the first availability zone are located in a different location than the infrastructure components of the second availability zone. For example, first infrastructure components are located at a warehouse and second infrastructure components are located at an office building down the street from the warehouse. In some examples, the host movement controller 210 generates a stretch cluster by moving hosts from a first availability zone to a second availability zone. Availability zones belonging to the same region are configured to have an equal number of hosts to ensure failover in case any of the availability zones goes down (e.g., fails). As used herein, a region is a distinct location where one or more clusters and, thus, workload domains, are located. In some examples, the host movement controller 210 can generate stretch clusters if a LWD is configured based on a particular region. For example, the LWD management controller 102 logically groups together workload domains based on their location. For example, if a first workload domain, a second workload domain, and a third workload domain are deployed in the same geographic region (e.g., same city, town, county, etc.), then the LWD management controller 102 can logically group the first, second, and third workload domains based on the location criterion. In such an example, the host movement controller 210 can create availability zones.
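As an illustrative sketch of the equal-host-count constraint for availability zones in the same region, the following Python snippet rebalances hypothetical zones until their host counts are as equal as the total allows; zone names and data are assumptions:

```python
# A sketch of balancing host counts across availability zones in one region,
# a simplification of stretch-cluster generation. All names are illustrative.
def balance_zones(zone_hosts: dict) -> None:
    """zone_hosts maps availability zone -> list of host names. Moves hosts
    from larger zones to smaller ones to support failover between zones."""
    while True:
        largest = max(zone_hosts, key=lambda z: len(zone_hosts[z]))
        smallest = min(zone_hosts, key=lambda z: len(zone_hosts[z]))
        if len(zone_hosts[largest]) - len(zone_hosts[smallest]) <= 1:
            break   # as equal as the total host count allows
        zone_hosts[smallest].append(zone_hosts[largest].pop())

zones = {"AZ-1": ["h1", "h2", "h3", "h4"], "AZ-2": ["h5", "h6"]}
balance_zones(zones)
print(zones)   # {'AZ-1': ['h1', 'h2', 'h3'], 'AZ-2': ['h5', 'h6', 'h4']}
```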
In some examples, the LWD cluster management controller 108 includes means for reallocating a host from a source cluster to a destination cluster. For example, the means for reallocating a host from a source cluster to a destination cluster may be implemented by host movement controller circuitry such as the host movement controller 210. In some examples, the host movement controller 210 may be instantiated by programmable circuitry such as the example programmable circuitry 812 of
The example LWD cluster management controller 108 includes the example scheduling controller 212 to schedule host movement when the LWD cluster management controller 108 operates in an automatic optimization mode. In some examples, the scheduling controller 212 is inactive when the automatic optimization mode is not enabled. The automatic optimization mode is a mode of the LWD cluster management controller 108 that is configured by an organization, user, operator, and/or administrator. The automatic optimization mode causes the resource optimization controller 208 to periodically (e.g., once an hour, once a day, once a week, etc.) execute the optimization process on the clusters grouped together in a LWD. As used herein, the optimization process is the process described above in connection with the resource monitor 206 and the resource optimization controller 208 and described in further detail below in connection with
In some examples, the scheduling controller 212 initiates the optimization process on a periodic basis. For example, the scheduling controller 212 may be configured with a resource check time interval (e.g., 24 hours). In such an example, the scheduling controller 212 waits for the resource check time interval to lapse and then notifies the resource monitor 206 to check for LWD clusters having a critical situation. Once the trigger (e.g., the notification to the resource monitor 206) is sent, the resource check time interval resets. In this manner, the example scheduling controller 212 initiates an automatic optimization of resources across all LWD clusters, eliminating a requirement for the user to constantly select an option to optimize the resources.
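A minimal Python sketch of such a periodic trigger follows; the interval, callback, and use of a timer thread are assumptions for illustration, not the disclosed mechanism:

```python
# A sketch of the scheduling controller's periodic trigger when automatic
# optimization mode is enabled. Interval and callback are assumptions.
import threading

RESOURCE_CHECK_INTERVAL_S = 24 * 60 * 60   # e.g., 24 hours

def schedule_optimization(check_for_crunch, enabled: bool,
                          interval_s: float = RESOURCE_CHECK_INTERVAL_S):
    """Wait for the resource check interval to lapse, notify the resource
    monitor via the callback, then reset the interval by rescheduling."""
    if not enabled:
        return None   # scheduling controller is inactive outside auto mode
    def fire():
        check_for_crunch()   # trigger the optimization pass
        schedule_optimization(check_for_crunch, enabled, interval_s)  # reset
    timer = threading.Timer(interval_s, fire)
    timer.daemon = True
    timer.start()
    return timer

# Usage (with a short interval for demonstration):
# schedule_optimization(lambda: print("checking for resource crunch"),
#                       enabled=True, interval_s=5)
```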
In some examples, the LWD cluster management controller 108 includes means for scheduling host movement when the LWD cluster management controller 108 operates in an automatic optimization mode. For example, the means for scheduling host movement when the LWD cluster management controller 108 operates in an automatic optimization mode may be implemented by scheduling controller circuitry such as the scheduling controller 212. In some examples, the scheduling controller 212 may be instantiated by programmable circuitry such as the example programmable circuitry 812 of
While an example manner of implementing the LWD cluster management controller 108 of
The example LWD cluster operator controller 110 is in connection with an example LWD 302. In the illustrated example, the LWD cluster operator controller 110 applies policies and upgrades to the example LWD 302. The example LWD 302 includes example cluster 1 (CLUSTER-1), example CLUSTER-2, example CLUSTER-3, example CLUSTER-4, example CLUSTER-5, and example CLUSTER-6. In the illustrated example of
The example LWD cluster operator controller 110 includes an example upgrade interface 304, an example desired state configuration interface 306, an example certificate management interface 308, an example password management interface 310, an example cluster resolver controller 312, an example datastore 314, an example message bus 316, and an example cluster orchestrator controller 318. In some examples, the cluster resolver controller 312 is instantiated by programmable circuitry executing the cluster resolver controller instructions and/or configured to perform operations such as those represented by the flowchart of
In the illustrated example of
In the illustrated example of
In the illustrated example of
In the illustrated example of
In the illustrated example of
In the illustrated example, the cluster resolver controller 312 submits a query to the datastore 314 based on the identifying information. For example, the cluster resolver controller 312 is to submit a query to the datastore 314, utilizing the identifying information (e.g., the cluster identifiers), for a number of clusters in the LWD 302, for names of the clusters in the LWD 302, for current versions of the clusters in the LWD 302, for current security policies of the clusters in the LWD 302, etc. Based on the query, the cluster resolver controller 312 is to identify the logical workload domain clusters as a target logical workload domain cluster to perform the service of the request. In the illustrated example, the cluster resolver controller 312 provides instructions to the cluster orchestrator controller 318, via the cluster operations message bus 316, indicative of the service to perform and on which and/or for which clusters the service is to be performed.
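For illustration, the following Python sketch approximates this resolve step: query hypothetical cluster assignments for the clusters of a LWD and build per-cluster instructions for the requested service; the datastore layout and field names are assumptions:

```python
# A sketch of resolving the target clusters of a LWD from stored assignments.
# The datastore layout and field names are illustrative assumptions.
DATASTORE = {
    "cluster_assignments": {
        "LWD-302": ["CLUSTER-1", "CLUSTER-2", "CLUSTER-3",
                    "CLUSTER-4", "CLUSTER-5", "CLUSTER-6"],
    },
    "cluster_info": {
        "CLUSTER-1": {"version": "7.0", "security_policy": "baseline-v2"},
        "CLUSTER-2": {"version": "7.0", "security_policy": "baseline-v2"},
        # remaining clusters elided for brevity
    },
}

def resolve_targets(lwd_id: str, service: str) -> list:
    """Identify the clusters of the LWD as targets for the requested service."""
    clusters = DATASTORE["cluster_assignments"].get(lwd_id, [])
    return [{"cluster": c, "service": service,
             "current": DATASTORE["cluster_info"].get(c, {})} for c in clusters]

# The resulting instructions would then be published to the cluster
# orchestrator via the message bus.
instructions = resolve_targets("LWD-302", "upgrade")
```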
In some examples, the LWD cluster operator controller 110 includes means for resolving a LWD cluster to which an instruction and/or request is to apply. For example, the means for resolving a LWD cluster to which an instruction and/or request is to apply may be implemented by cluster resolver controller circuitry such as the cluster resolver controller 312. In some examples, the cluster resolver controller 312 may be instantiated by programmable circuitry such as the example programmable circuitry 812 of
In the illustrated example of
In the illustrated example of
In some examples, the LWD cluster operator controller 110 includes means for communicating and/or publishing instructions and/or operations. For example, the means for communicating and/or publishing instructions and/or operations may be implemented by the cluster operations message bus 316. In some examples, the cluster operations message bus 316 may be instantiated by programmable circuitry such as the example programmable circuitry 812 of
In the illustrated example of
In some examples, the cluster orchestrator controller 318 performs the operations (as described above) in fewer clock cycles than operations for clusters not included in a LWD. In some examples, the operation at each of the clusters (CLUSTER-1, CLUSTER-2, CLUSTER-3, CLUSTER-4, CLUSTER-5, CLUSTER-6) occurs simultaneously and/or concurrently and, thus, the operations are complete before individually applied operations on clusters not in a LWD would be. In some examples, the cluster orchestrator controller 318 generates reports indicative of results of the operations applied to the clusters in the LWD 302. For example, the cluster orchestrator controller 318 may generate reports indicative that an upgrade was successful or unsuccessful, that a desired state configuration was successfully or unsuccessfully applied, etc. In some examples, the cluster orchestrator controller 318 generates reports indicative of resource usage after the operation was applied (e.g., CPU usage, memory capacity, etc.). The example cluster orchestrator controller 318 may be implemented by processor circuitry and/or controller circuitry.
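The following Python sketch illustrates, under stated assumptions, issuing one operation across all clusters of a LWD concurrently and collecting a per-cluster report; the operation body is a stand-in for real upgrade or configuration work:

```python
# A sketch of orchestrating one operation across all clusters of a LWD
# concurrently, then producing a per-cluster result report.
from concurrent.futures import ThreadPoolExecutor

def apply_operation(cluster: str, operation: str) -> dict:
    # Placeholder for the real work (upgrade, desired state, certificates, ...).
    return {"cluster": cluster, "operation": operation, "status": "successful"}

def orchestrate(lwd_clusters: list, operation: str) -> list:
    """Issue one operation instruction for all clusters instead of one
    instruction per cluster, running the per-cluster work concurrently."""
    with ThreadPoolExecutor(max_workers=len(lwd_clusters)) as pool:
        futures = [pool.submit(apply_operation, c, operation) for c in lwd_clusters]
        return [f.result() for f in futures]   # report of per-cluster results

report = orchestrate(["CLUSTER-1", "CLUSTER-2", "CLUSTER-3",
                      "CLUSTER-4", "CLUSTER-5", "CLUSTER-6"], "upgrade")
```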
In some examples, the LWD cluster operator controller 110 includes means for orchestrating operations and/or services. For example, the means for orchestrating operations and/or services may be implemented by cluster orchestrator controller circuitry such as the cluster orchestrator controller 318. In some examples, the cluster orchestrator controller 318 may be instantiated by programmable circuitry such as the example programmable circuitry 812 of
While an example manner of implementing the LWD cluster operator controller 110 of
Flowchart(s) representative of example machine readable instructions, which may be executed by programmable circuitry to implement and/or instantiate the LWD system 100 of
The program(s) may be embodied in instructions (e.g., software and/or firmware) stored on one or more non-transitory computer readable and/or machine readable storage medium such as cache memory, a magnetic-storage device or disk (e.g., a floppy disk, a Hard Disk Drive (HDD), etc.), an optical-storage device or disk (e.g., a Blu-ray disk, a Compact Disk (CD), a Digital Versatile Disk (DVD), etc.), a Redundant Array of Independent Disks (RAID), a register, ROM, a solid-state drive (SSD), SSD memory, non-volatile memory (e.g., electrically erasable programmable read-only memory (EEPROM), flash memory, etc.), volatile memory (e.g., Random Access Memory (RAM) of any type, etc.), and/or any other storage device or storage disk. The instructions of the non-transitory computer readable and/or machine readable medium may program and/or be executed by programmable circuitry located in one or more hardware devices, but the entirety of the program(s) and/or parts thereof could alternatively be executed and/or instantiated by one or more hardware devices other than the programmable circuitry and/or embodied in dedicated hardware. The machine readable instructions may be distributed across multiple hardware devices and/or executed by two or more hardware devices (e.g., a server and a client hardware device). For example, the client hardware device may be implemented by an endpoint client hardware device (e.g., a hardware device associated with a human and/or machine user) or an intermediate client hardware device gateway (e.g., a radio access network (RAN)) that may facilitate communication between a server and an endpoint client hardware device. Similarly, the non-transitory computer readable storage medium may include one or more mediums. Further, although the example programs are described with reference to the flowchart illustrated in
The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data (e.g., computer-readable data, machine-readable data, one or more bits (e.g., one or more computer-readable bits, one or more machine-readable bits, etc.), a bitstream (e.g., a computer-readable bitstream, a machine-readable bitstream, etc.), etc.) or a data structure (e.g., as portion(s) of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices, disks and/or computing devices (e.g., servers) located at the same or different locations of a network or collection of networks (e.g., in the cloud, in edge devices, etc.). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc., in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and/or stored on separate computing devices, wherein the parts when decrypted, decompressed, and/or combined form a set of computer-executable and/or machine executable instructions that implement one or more functions and/or operations that may together form a program such as that described herein.
In another example, the machine readable instructions may be stored in a state in which they may be read by programmable circuitry, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc., in order to execute the machine-readable instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding programs can be executed in whole or in part. Thus, machine readable, computer readable and/or machine readable media, as used herein, may include instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s).
The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.
As mentioned above, the example operations of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc., may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, or (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities, etc., the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities, etc., the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B.
As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” object, as used herein, refers to one or more of that object. The terms “a” (or “an”), “one or more”, and “at least one” are used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements, or actions may be implemented by, e.g., the same entity or object. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.
The example LWD system 100 requests input information for one of the two or more workload domains (block 404). For example, the LWD management controller 102 requires additional information to configure a workload domain. In some examples, a request to create a LWD includes some information about the desired workload domain, such as a job of the workload domain (e.g., the intended use of the workload domain), the creator of the workload domain, a cluster name, management and host components, etc. The input information is to assist the LWD management controller 102 in configuring the workload domain prior to deployment. In some examples, the input information is to be utilized to create a template for workload domains in the LWD.
The example LWD system 100 generates a first workload domain based on the input information (block 406). For example, the LWD management controller 102 configures and deploys the first workload domain with information specific to the input information. In some examples, the input information includes a domain name, an organization name, a cluster name, a cluster image, a username, a password, and/or a host configuration protocol (e.g., a dynamic host configuration protocol (DHCP) that provides and assigns an IP address to a host). In some examples, the input information includes an indication to add the first workload domain to the requested LWD. Therefore, the first workload domain is configured to be included in the requested LWD and deployed as a workload domain grouped within the requested LWD.
The example LWD system 100 allocates cluster(s) of host(s) to the first workload domain (block 408). For example, the LWD management controller 102 allocates one or more clusters (VI-1-CLUSTER-1, VI-1-CLUSTER-2) that the first workload domain is to execute on and/or consume resources from, the one or more clusters including one or more host(s). In some examples, the LWD management controller 102 is to select one or more available hosts to be included in the cluster(s) (VI-1-CLUSTER-1, VI-1-CLUSTER-2) for the first workload domain based on the host configuration protocol. Additionally and/or alternatively, the example LWD management controller 102 is to select one or more available hosts based on the available resources of the host, a priority of the job of the first workload domain (e.g., a high priority job that should have a lot of resources, a low priority job that has the bandwidth to share resources, etc.), etc.
The example LWD system 100 extracts information from the first workload domain, the information corresponding to a workload domain configuration (block 410). For example, the LWD management controller 102 copies information from the configuration of the first workload domain. In some examples, the copied information includes a domain name, an organization name, a cluster name, a cluster image, a username, a password, and/or a LWD identifier (e.g., an identifier of the requested LWD), etc.
The example LWD system 100 generates a reference configuration template for subsequent workload domains based on the extracted information (block 412). For example, the LWD management controller 102 generates a file of pre-configured information that is typically repeated in other workload domains. In some examples, the pre-configured information is associated with the LWD. For example, one reference configuration template is utilized for a first LWD and a different reference configuration template with different pre-configured information is utilized for a second LWD.
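Blocks 410 and 412 can be pictured as copying the fields that repeat across workload domains of the same LWD into a reference template, leaving host-specific settings to be filled in later. This sketch assumes a dictionary-based configuration; the field names are hypothetical.

```python
# Fields that are typically repeated across workload domains in the same LWD.
TEMPLATE_FIELDS = ("domain_name", "organization_name", "cluster_name",
                   "cluster_image", "username", "password", "lwd_id")

def extract_reference_template(first_domain_config: dict) -> dict:
    """Copy the repeatable fields; host-specific settings are filled in later."""
    return {k: first_domain_config[k] for k in TEMPLATE_FIELDS
            if k in first_domain_config}

first_config = {
    "domain_name": "wld-1", "organization_name": "org-a",
    "cluster_name": "VI-1-CLUSTER-1", "cluster_image": "esxi-image-v1",
    "username": "admin", "password": "example-password",
    "lwd_id": "lwd-100", "host_ip": "10.0.0.5",  # host-specific; not templated
}
template = extract_reference_template(first_config)
assert "host_ip" not in template
```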
The example LWD system 100 stores the reference configuration template (block 414). For example, the LWD management controller 102 stores the reference configuration template at the datastore 106.
The example LWD system 100 determines whether a request is obtained to generate a second workload domain (block 416). For example, the LWD management controller 102 waits to obtain a request to deploy a second workload domain. In some examples, a request to create the second workload domain is in a queue in response to receiving the request to create the logical workload domain. For example, a user may send a request for the generation of two workload domains. In such examples, the LWD management controller 102 retrieves the request from an interface and/or any type of memory (e.g., a cache).
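As a hypothetical illustration of this queueing behavior, a request for a LWD of two workload domains could enqueue one creation request per domain, with block 416 simply checking whether a request remains.

```python
from collections import deque

# Hypothetical request queue populated when the LWD request is received.
pending = deque()
lwd_request = {"lwd_id": "lwd-100", "num_domains": 2}
for i in range(lwd_request["num_domains"]):
    pending.append({"lwd_id": lwd_request["lwd_id"], "domain_index": i + 1})

pending.popleft()                                # first domain handled (blocks 404-414)
second = pending.popleft() if pending else None  # block 416: request obtained?
print(second is not None)                        # True -> block 416 returns YES
```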
In some examples, when the LWD system 100 determines a second request has been obtained (e.g., block 416 returns a value YES), the LWD system 100 identifies a logical workload domain in which to create the second workload domain (block 418). For example, the LWD management controller 102 determines if the request to generate the second workload domain is a request to include the second workload domain in the LWD. In some examples, the request may not indicate that a workload domain is to be included in a LWD and, thus, a logical workload domain is not identified. In some examples, when the LWD system 100 determines a second request has not been obtained (e.g., block 416 returns a value NO), the operations 400 end.
The example LWD system 100 determines whether a LWD has been identified (block 420). For example, the LWD management controller 102 determines whether a description of the second workload domain matches a description of the LWD. Additionally and/or alternatively, the example LWD management controller 102 determines whether specific data (e.g., metadata) is included in the request corresponding to the LWD. In some examples, if the LWD system 100 does not identify a LWD (e.g., block 420 returns a value NO), control returns to block 404.
In some examples, if the LWD system 100 does identify a LWD (e.g., block 420 returns a value YES), the example LWD system 100 determines whether an identified LWD corresponds to the first workload domain (block 422). For example, the LWD management controller 102 scans the request to identify data and/or information indicative of a LWD. In some examples, if the LWD system 100 determines that the identified LWD does not correspond to the first workload domain (e.g., block 422 returns a value NO), control returns to block 404.
In some examples, if the LWD system 100 determines that the identified LWD does correspond to the first workload domain (e.g., block 422 returns a value YES), the example LWD system 100 invokes the reference configuration template (block 424). For example, the LWD management controller 102 obtains the reference configuration template that includes pre-configured information for workload domains in the LWD.
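One way to picture blocks 420 through 424, assuming the request carries metadata naming a LWD; the metadata layout is hypothetical.

```python
def matches_lwd(request: dict, first_domain_lwd_id: str) -> bool:
    """Hypothetical check for blocks 420-422: does the request carry
    metadata naming a LWD, and is it the LWD of the first workload domain?"""
    requested = request.get("metadata", {}).get("lwd_id")
    return requested is not None and requested == first_domain_lwd_id

request = {"metadata": {"lwd_id": "lwd-100"}, "job": "web-tier"}
print(matches_lwd(request, "lwd-100"))  # True -> invoke the reference template (block 424)
```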
The example LWD system 100 requests a host configuration protocol (block 426). For example, the LWD management controller 102 requests input information corresponding to which cluster(s) of host(s) and/or with which IP address the second workload domain is to be associated.
The example LWD system 100 adds the host configuration protocol to the reference template to generate the second workload domain (block 428). For example, the LWD management controller 102 populates the reference configuration template with the specific host configuration protocol. In some examples, by populating the reference configuration template with the host configuration protocol, the reference configuration template becomes a unique workload domain that is ready for deployment. For example, the reference configuration template becomes the second workload domain.
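Continuing the dictionary-based sketch, blocks 426 and 428 amount to overlaying the host-specific configuration onto the pre-configured template; the merged result stands in for the second workload domain. Field names remain hypothetical.

```python
def generate_second_domain(template: dict, host_config: dict) -> dict:
    """Sketch of block 428: the pre-configured template plus the
    host-specific configuration yields a deployable domain configuration."""
    domain = dict(template)      # start from the pre-configured fields
    domain.update(host_config)   # add the unique host configuration
    return domain

template = {"organization_name": "org-a", "cluster_image": "esxi-image-v1",
            "username": "admin", "password": "example-password", "lwd_id": "lwd-100"}
second_domain = generate_second_domain(
    template, {"host_config_protocol": "DHCP", "cluster_name": "VI-1-CLUSTER-2"})
print(second_domain["lwd_id"], second_domain["cluster_name"])  # lwd-100 VI-1-CLUSTER-2
```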
The example LWD system 100 deploys the second workload domain at the LWD (block 430). For example, the LWD management controller 102 deploys the second workload domain and associates it with the example LWD. In the illustrated example, the first workload domain and the second workload domain can be consumed as a single resource.
The example LWD system 100 determines whether another workload domain is to be generated (block 432). For example, the LWD management controller 102 may obtain more requests to configure and deploy workload domains and/or may reference a queue of requests to configure and/or deploy workload domains. If the example LWD system 100 determines another workload domain is to be generated (e.g., block 432 returns a value YES), control returns to block 418.
If the example LWD system 100 determines another workload domain is not to be generated (e.g., block 432 returns a value NO), the example operations 400 end.
The example LWD system 100 identifies two or more clusters in the LWD (block 504). For example, the LWD cluster operator controller 110 identifies the two or more clusters, included in the LWD, on which a requested service is to be executed.
The example LWD system 100 is to generate a message indicative of the two or more clusters and the service to be executed (block 506). For example, the cluster operations message bus 316 is to configure and publish a message, such as instructions, a request, etc., based on communication from the cluster resolver controller 312. For example, the cluster resolver controller 312 is to notify the cluster operations message bus 316 of the two or more identified clusters and of the service to be executed on those clusters.
The example LWD system 100 is to obtain the message (block 508). For example, the LWD cluster operator controller 110 and/or the cluster orchestrator controller 318 obtains the message from the cluster operations message bus 316.
The example LWD system 100 simultaneously and/or concurrently orchestrates the service on each of the two or more clusters (block 510). For example, the LWD cluster operator controller 110 and/or the cluster orchestrator controller 318 is to apply the service, identified in the message from the bus 316, to all of the clusters also identified in the message from the bus 316. For example, the LWD cluster operator controller 110 and/or the cluster orchestrator controller 318 applies the service by identifying the service for each cluster in the LWD and configuring the service for the clusters. The example cluster orchestrator controller 318 configures the service by generating instructions that include relevant data for the particular cluster. For example, when a desired state configuration is to be applied to the clusters, the cluster orchestrator controller 318 includes relevant information relating to how each cluster should be configured, including a version of the cluster, a storage size and/or storage interface of the cluster, tools to be added to the clusters, etc. In some examples, when passwords are to be updated, the cluster orchestrator controller 318 includes, in the instructions, relevant information about which previous passwords were used by which clusters and the new password to be applied to the clusters. The example cluster orchestrator controller 318 conveys the instructions to the clusters.
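The fan-out of block 510 can be pictured with a thread pool: the same service, configured per cluster, is applied to every cluster named in the message at roughly the same time. The message shape and the body of apply_service below are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def apply_service(cluster: str, service: dict) -> dict:
    """Hypothetical per-cluster application of a service (e.g., a desired
    state configuration or a password update), returning an outcome."""
    # A real implementation would push cluster-specific instructions
    # (version, storage size, tools, previous/new passwords, etc.) here.
    return {"cluster": cluster, "service": service["name"], "status": "success"}

message = {  # as published on the cluster operations message bus (block 506)
    "service": {"name": "desired-state-config", "version": "2.1"},
    "clusters": ["VI-1-CLUSTER-1", "VI-1-CLUSTER-2"],
}

with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda c: apply_service(c, message["service"]),
                            message["clusters"]))
print(results)  # results feed the report of block 512
```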
The example LWD system 100 generates a report including results of the service (block 512). For example, the cluster orchestrator controller 318 generates reports indicative of the outcome of the uniform configuration (e.g., a desired state configuration) as successful or unsuccessful. In some examples, the cluster orchestrator controller 318 generates reports indicative of resource usage after the operation was applied (e.g., CPU usage, memory capacity, etc.).
The example operations 500 end after the LWD system 100 generates a report. In some examples, the operations 500 may be repeated when the example LWD system 100 obtains a new request to perform a service.
The example LWD system 100 identifies a LWD having two or more clusters (block 604). For example, the LWD cluster management controller 108 and/or the resource monitor controller 206 identifies a LWD that includes two or more clusters whose resource usage is to be analyzed.
The example LWD system 100 selects a cluster (block 608). For example, the LWD cluster management controller 108 and/or the resource monitor controller 206 identifies a cluster, in the LWD, to analyze. In some examples, the resource monitor controller 206 selects a first cluster in the LWD based on an identifier of the first cluster. For example, the resource monitor controller 206 iterates through the clusters in the LWD and, thus, iterates numerically and/or alphabetically based on the naming convention (e.g., identifier) of the clusters. For example, if the clusters have numerical identifiers (cluster-1, cluster-2, cluster-3, etc.), then the example resource monitor controller 206 iterates numerically through the clusters (e.g., selects cluster-1 to analyze first, selects cluster-2 to analyze second, etc.). In some examples, if the clusters have alphabetical identifiers (e.g., cluster-A, cluster-B, cluster-C, etc.), then the example resource monitor controller 206 iterates alphabetically through the clusters (e.g., selects cluster-A to analyze first, selects cluster-B to analyze second, etc.). Alternatively, the example resource monitor controller 206 can utilize any method to select a cluster to analyze in the LWD.
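A small sketch of one possible selection order follows, assuming identifiers that end in a number are iterated numerically and the rest alphabetically; as noted above, any other selection method would do.

```python
import re

def iteration_order(cluster_ids):
    """Hypothetical ordering for block 608: iterate numerically when the
    identifiers end in numbers, otherwise alphabetically."""
    def key(cid):
        m = re.search(r"(\d+)$", cid)
        return (0, int(m.group(1))) if m else (1, cid)
    return sorted(cluster_ids, key=key)

print(iteration_order(["cluster-3", "cluster-1", "cluster-2"]))
# ['cluster-1', 'cluster-2', 'cluster-3']
print(iteration_order(["cluster-B", "cluster-A"]))
# ['cluster-A', 'cluster-B']
```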
The example LWD system 100 checks resource usage of the selected cluster (block 610). For example, the LWD cluster management controller 108 and/or the resource monitor controller 206 obtains data from the hosts in the clusters, the data providing insight into resource utilization, performance metrics, and the overall health of a virtual infrastructure. In some examples, the resource monitor controller 206 obtains or determines a percentage value related to the resource usage of the selected cluster.
The example LWD system 100 determines whether resource usage satisfies a crunch threshold for the selected cluster (block 612). For example, the LWD cluster management controller 108 and/or the resource monitor controller 206 compares the resource usage (e.g., denoted as a percentage) to a pre-defined threshold value that represents an upper limit of the resource usage in the selected cluster. In some examples, when a resource usage satisfies the crunch threshold, the resource monitor controller 206 determines that the selected cluster is in a critical condition. In some examples, when the resource usage does not satisfy the crunch threshold, the resource monitor controller 206 determines that the selected cluster is not in a critical condition.
In some examples, when the LWD system 100 determines that the resource usage satisfies a crunch threshold for the selected cluster (e.g., block 612 returns a value YES), the LWD system 100 identifies a source cluster in the LWD having highest availability (block 616). For example, the LWD cluster management controller 108 and/or resource optimization controller 208 is provided with pre-defined availability threshold values indicative of a resource availability of a cluster. In some examples, the resource optimization controller 208 utilizes the availability threshold to identify a source cluster that has available hosts that can be used to reduce a crunch of the selected cluster in the critical situation.
The example LWD system 100 reallocates a host from the source cluster to the selected destination cluster (block 618). In some examples, when the resource monitor controller 206 determines that the selected cluster is in a critical situation, the selected cluster becomes a selected destination cluster. Therefore, the example LWD cluster management controller 108 and/or the example resource optimization controller 208 instructs the example host movement controller 210 to reallocate a host from the source cluster to the selected destination cluster. The example host movement controller 210 reconfigures a host from the source cluster with information of the selected destination cluster in order to reallocate that host to the selected destination cluster.
The example LWD system 100 recalculates the resource usage of the destination cluster and the source cluster (block 620). For example, the LWD cluster management controller 108 and/or the resource optimization controller 208 collects resource usage data from the source cluster and selected destination cluster after the clusters have been reconfigured.
In some examples, after the host has been reallocated, control returns to block 610 where the LWD cluster management controller 108 and/or the resource monitor controller 206 compares the resource usage data to the crunch threshold to determine if 1) there is still a critical situation at the destination cluster or 2) there is a new critical situation at the source cluster due to the reallocation of one of its hosts.
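Blocks 610 through 620 amount to a small rebalancing loop: when a cluster's usage crosses the crunch threshold, a host is borrowed from the cluster with the most headroom and usage is re-measured. The data model, the 90% threshold, and the assumption that each host contributes an equal share of a cluster's capacity are all hypothetical.

```python
CRUNCH_THRESHOLD = 90.0  # hypothetical upper limit on usage, in percent

def usage_pct(cluster: dict) -> float:
    return 100.0 * cluster["used"] / cluster["capacity"]

def rebalance(clusters: dict) -> None:
    """Sketch of blocks 610-620: move one host's worth of capacity from the
    highest-availability cluster to any cluster in a critical condition."""
    for name, dest in clusters.items():
        while usage_pct(dest) > CRUNCH_THRESHOLD:  # crunch check (block 612)
            # Source cluster with the highest availability (block 616).
            source = max((c for n, c in clusters.items() if n != name),
                         key=lambda c: c["capacity"] - c["used"], default=None)
            if source is None or source["hosts"] <= 1:
                break  # nothing safe to move
            per_host = source["capacity"] / source["hosts"]
            source["hosts"] -= 1; source["capacity"] -= per_host  # block 618
            dest["hosts"] += 1; dest["capacity"] += per_host
            # Usage is recalculated on the next loop iteration (block 620).

clusters = {
    "cluster-1": {"hosts": 2, "capacity": 200.0, "used": 195.0},  # critical
    "cluster-2": {"hosts": 4, "capacity": 400.0, "used": 100.0},
}
rebalance(clusters)
print({n: round(usage_pct(c), 1) for n, c in clusters.items()})
# {'cluster-1': 65.0, 'cluster-2': 33.3}
```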
In some examples, when the LWD system 100 determines that the resource usage does not satisfy a crunch threshold for the selected cluster (e.g., block 612 returns a value NO), the LWD system 100 determines whether there is another available cluster (block 614). For example, the LWD cluster management controller 108 and/or the resource monitor controller 206 identifies the next cluster in the LWD to analyze. In some examples, the resource monitor controller 206 selects the next cluster in the LWD based on an iteration through the clusters' identifiers.
In some examples, if the resource monitor controller 206 determines that there is another available cluster (e.g., block 614 returns a value YES), control returns to block 608. For example, the resource monitor controller 206 selects the next cluster to analyze.
In some examples, if the resource monitor controller 206 determines that there is not another available cluster (e.g., block 614 returns a value NO), the LWD cluster management controller 108 determines whether the system is in an automatic optimization mode (block 622). For example, the LWD cluster management controller 108 identifies whether the scheduling controller 212 is active or inactive. In some examples, if the LWD cluster management controller 108 determines that the system is in automatic optimization mode (e.g., block 622 returns a value YES), then the example LWD cluster management controller 108 and/or the example scheduling controller 212 waits for a predetermined time period (block 624). For example, the scheduling controller 212 is provided with a time period to periodically check the resource usage of the clusters. In some examples, the scheduling controller 212 is implemented by a counter, set with the predetermined time period and triggered when all clusters in the LWD have been analyzed. In some examples, the scheduling controller 212 triggers the resource monitor controller 206 when the time period ends. For example, control returns to block 604 when the time period ends.
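The scheduling behavior of blocks 622 and 624 can be sketched as a simple timed loop; the period, the bounded round count (added so the example terminates), and the callback are hypothetical.

```python
import time

def optimize_periodically(run_optimization, period_s: float, automatic: bool,
                          max_rounds: int = 3) -> None:
    """Sketch of blocks 622-624: in automatic mode, re-run the resource
    check after a predetermined time period; otherwise run once and stop.
    max_rounds bounds the demo; a real controller runs until disabled."""
    rounds = 0
    while True:
        run_optimization()    # blocks 604-620
        rounds += 1
        if not automatic or rounds >= max_rounds:
            break
        time.sleep(period_s)  # predetermined wait (block 624)

optimize_periodically(lambda: print("checking resource usage..."),
                      period_s=0.1, automatic=True)
```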
In some examples, when the LWD cluster management controller 108 determines that the system is not in an automatic optimization mode (e.g., block 622 returns a value NO), the example operations 600 end. For example, the scheduling controller 212 is inactive when an automatic optimization mode is not enabled and, thus, the LWD cluster management controller 108 does not repeat the operations 600 until another request to optimize resources has been obtained (e.g., block 602a).
The programmable circuitry platform 800 of the illustrated example includes programmable circuitry 812. The programmable circuitry 812 of the illustrated example is hardware. For example, the programmable circuitry 812 can be implemented by one or more integrated circuits, logic circuits, FPGAs, microprocessors, CPUs, GPUs, DSPs, and/or microcontrollers from any desired family or manufacturer. The programmable circuitry 812 may be implemented by one or more semiconductor based (e.g., silicon based) devices. In this example, the programmable circuitry 812 implements the example LWD management controller 102, the example LWD operator controller 104, the example LWD cluster management controller 108, the example resource monitor controller 206, the example resource optimization controller 208, the example host movement controller 210, the example scheduling controller 212, the example cluster resolver controller 312, the example cluster operations message bus 316, the example cluster orchestrator controller 318, and/or, more generally, the example LWD cluster operator controller 110, the example operations management controller 112, and the example lifecycle management controller 114.
The programmable circuitry 812 of the illustrated example includes a local memory 813 (e.g., a cache, registers, etc.). The programmable circuitry 812 of the illustrated example is in communication with main memory 814, 816, which includes a volatile memory 814 and a non-volatile memory 816, by a bus 818. The volatile memory 814 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®), and/or any other type of RAM device. The non-volatile memory 816 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 814, 816 of the illustrated example is controlled by a memory controller 817. In some examples, the memory controller 817 may be implemented by one or more integrated circuits, logic circuits, microcontrollers from any desired family or manufacturer, or any other type of circuitry to manage the flow of data going to and from the main memory 814, 816.
The programmable circuitry platform 800 of the illustrated example also includes interface circuitry 820. The interface circuitry 820 may be implemented by hardware in accordance with any type of interface standard, such as an Ethernet interface, a universal serial bus (USB) interface, a Bluetooth® interface, a near field communication (NFC) interface, a Peripheral Component Interconnect (PCI) interface, and/or a Peripheral Component Interconnect Express (PCIe) interface. In the illustrated example, the interface circuitry 820 implements the example cluster management control interface 202, the example host management control interface 204, the example upgrade interface 304, the example desired state configuration interface 306, the example certification management interface 308, and the example password management interface 310.
In the illustrated example, one or more input devices 822 are connected to the interface circuitry 820. The input device(s) 822 permit(s) a user (e.g., a human user, a machine user, etc.) to enter data and/or commands into the programmable circuitry 812. The input device(s) 822 can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a trackpad, a trackball, an isopoint device, and/or a voice recognition system.
One or more output devices 824 are also connected to the interface circuitry 820 of the illustrated example. The output device(s) 824 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube (CRT) display, an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer, and/or a speaker. The interface circuitry 820 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip, and/or graphics processor circuitry such as a GPU.
The interface circuitry 820 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) by a network 826. The communication can be by, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a beyond-line-of-sight wireless system, a line-of-sight wireless system, a cellular telephone system, an optical connection, etc.
The programmable circuitry platform 800 of the illustrated example also includes one or more mass storage discs or devices 828 to store firmware, software, and/or data. Examples of such mass storage discs or devices 828 include magnetic storage devices (e.g., floppy disk drives, HDDs, etc.), optical storage devices (e.g., Blu-ray disks, CDs, DVDs, etc.), RAID systems, and/or solid-state storage discs or devices such as flash memory devices and/or SSDs. In the illustrated example, the mass storage discs or devices 828 implement the datastore 106.
The machine readable instructions 832 may be implemented by the machine readable instructions represented by the example operations 400, 500, and/or 600 described above.
The cores 902 may communicate by a first example bus 904. In some examples, the first bus 904 may be implemented by a communication bus to effectuate communication associated with one(s) of the cores 902. For example, the first bus 904 may be implemented by at least one of an Inter-Integrated Circuit (I2C) bus, a Serial Peripheral Interface (SPI) bus, a PCI bus, or a PCIe bus. Additionally or alternatively, the first bus 904 may be implemented by any other type of computing or electrical bus. The cores 902 may obtain data, instructions, and/or signals from one or more external devices by example interface circuitry 906. The cores 902 may output data, instructions, and/or signals to the one or more external devices by the interface circuitry 906. Although the cores 902 of this example include example local memory 920 (e.g., Level 1 (L1) cache that may be split into an L1 data cache and an L1 instruction cache), the microprocessor 900 also includes example shared memory 910 that may be shared by the cores (e.g., Level 2 (L2) cache) for high-speed access to data and/or instructions. Data and/or instructions may be transferred (e.g., shared) by writing to and/or reading from the shared memory 910. The local memory 920 of each of the cores 902 and the shared memory 910 may be part of a hierarchy of storage devices including multiple levels of cache memory and the main memory (e.g., the main memory 814, 816 described above).
Each core 902 may be referred to as a CPU, DSP, GPU, etc., or any other type of hardware circuitry. Each core 902 includes control unit circuitry 914, arithmetic and logic (AL) circuitry (sometimes referred to as an ALU) 916, a plurality of registers 918, the local memory 920, and a second example bus 922. Other structures may be present. For example, each core 902 may include vector unit circuitry, single instruction multiple data (SIMD) unit circuitry, load/store unit (LSU) circuitry, branch/jump unit circuitry, floating-point unit (FPU) circuitry, etc. The control unit circuitry 914 includes semiconductor-based circuits structured to control (e.g., coordinate) data movement within the corresponding core 902. The AL circuitry 916 includes semiconductor-based circuits structured to perform one or more mathematic and/or logic operations on the data within the corresponding core 902. The AL circuitry 916 of some examples performs integer-based operations. In other examples, the AL circuitry 916 also performs floating-point operations. In yet other examples, the AL circuitry 916 may include first AL circuitry that performs integer-based operations and second AL circuitry that performs floating-point operations. In some examples, the AL circuitry 916 may be referred to as an Arithmetic Logic Unit (ALU).
The registers 918 are semiconductor-based structures to store data and/or instructions such as results of one or more of the operations performed by the AL circuitry 916 of the corresponding core 902. For example, the registers 918 may include vector register(s), SIMD register(s), general-purpose register(s), flag register(s), segment register(s), machine-specific register(s), instruction pointer register(s), control register(s), debug register(s), memory management register(s), machine check register(s), etc. The registers 918 may be arranged in a bank.
Each core 902 and/or, more generally, the microprocessor 900 may include additional and/or alternate structures to those shown and described above. For example, one or more clock circuits, one or more power supplies, one or more power gates, one or more cache home agents (CHAs), one or more converged/common mesh stops (CMSs), one or more shifters (e.g., barrel shifter(s)) and/or other circuitry may be present. The microprocessor 900 is a semiconductor device fabricated to include many transistors interconnected to implement the structures described above in one or more integrated circuits (ICs) contained in one or more packages.
The microprocessor 900 may include and/or cooperate with one or more accelerators (e.g., acceleration circuitry, hardware accelerators, etc.). In some examples, accelerators are implemented by logic circuitry to perform certain tasks more quickly and/or efficiently than can be done by a general-purpose processor. Examples of accelerators include ASICs and FPGAs such as those discussed herein. A GPU, DSP and/or other programmable device can also be an accelerator. Accelerators may be on-board the microprocessor 900, in the same chip package as the microprocessor 900 and/or in one or more separate packages from the microprocessor 900.
In some examples, a binary file used to configure the FPGA circuitry 1000 is compiled, generated, transformed, and/or otherwise output from a uniform software platform utilized to program FPGAs. For example, the uniform software platform may translate first instructions (e.g., code or a program) that correspond to one or more operations/functions in a high-level language (e.g., C, C++, Python, etc.) into second instructions that correspond to the one or more operations/functions in an HDL. In some such examples, the binary file is compiled, generated, and/or otherwise output from the uniform software platform based on the second instructions.
The FPGA circuitry 1000 also includes an array of example logic gate circuitry 1008, a plurality of example configurable interconnections 1010, and example storage circuitry 1012. The logic gate circuitry 1008 and the configurable interconnections 1010 are configurable to instantiate one or more operations/functions that may correspond to at least some of the machine readable instructions described above.
The configurable interconnections 1010 of the illustrated example are conductive pathways, traces, vias, or the like that may include electrically controllable switches (e.g., transistors) whose state can be changed by programming (e.g., using an HDL instruction language) to activate or deactivate one or more connections between one or more of the logic gate circuitry 1008 to program desired logic circuits.
The storage circuitry 1012 of the illustrated example is structured to store result(s) of the one or more of the operations performed by corresponding logic gates. The storage circuitry 1012 may be implemented by registers or the like. In the illustrated example, the storage circuitry 1012 is distributed amongst the logic gate circuitry 1008 to facilitate access and increase execution speed.
A block diagram illustrates an example software distribution platform 1105 to distribute software such as the example machine readable instructions 832.
From the foregoing, it will be appreciated that example systems, apparatus, articles of manufacture, and methods have been disclosed that apply resource optimization and services at a cluster level to clusters in a logical workload domain. For example, examples disclosed herein enable an organization to apply upgrades, security policies, desired state configurations, and resource usage optimization at a lower level of granularity in a logical workload domain, relative to the level of granularity previously provided. Disclosed systems, apparatus, articles of manufacture, and methods improve the efficiency of using a computing device by reducing the computation cycles required to apply services to clusters operating in a workload domain, and by regularly checking for critical conditions in the computing device and acting to eliminate the critical conditions to improve execution of an application or workload. Disclosed systems, apparatus, articles of manufacture, and methods are accordingly directed to one or more improvement(s) in the operation of a machine such as a computer or other electronic and/or mechanical device.
The following claims are hereby incorporated into this Detailed Description by this reference. Although certain example systems, apparatus, articles of manufacture, and methods have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all systems, apparatus, articles of manufacture, and methods fairly falling within the scope of the claims of this patent.