A data center is a facility that houses servers, data storage devices, and/or other associated components such as backup power supplies, redundant data communications connections, environmental controls such as air conditioning and/or fire suppression, and/or various security systems. A data center may be maintained by an information technology (IT) service provider. An enterprise may utilize data storage and/or data processing services from the provider in order to run applications that handle the enterprise's core business and operational data. The applications may be proprietary and used exclusively by the enterprise or made available through a network for anyone to access and use.
Virtual computing instances (VCIs), such as virtual machines and containers, have been introduced to lower data center capital investment in facilities and operational expenses and reduce energy consumption. A VCI is a software implementation of a computer that executes application software analogously to a physical computer. VCIs have the advantage of not being bound to physical resources, which allows VCIs to be moved around and scaled to meet changing demands of an enterprise without affecting the use of the enterprise's applications. In a software-defined data center, storage resources may be allocated to VCIs in various ways, such as through network attached storage (NAS), a storage area network (SAN) such as Fibre Channel and/or Internet small computer system interface (iSCSI), a virtual SAN, and/or raw device mappings, among others.
Virtualization and cloud computing enterprises (e.g., VMware) provide a suite of solutions to optimize multi-cloud environments for cost, performance, security, compliance, and other criteria. They can play a pivotal role in achieving consistency in operations by providing a centralized gateway to the multi-cloud. The power of the software platform lies in automating every manageable step, which is key to delivering efficient and more intelligent management across all clouds. Enabling automation in each operation reduces manual overhead and facilitates a self-service user experience for multi-cloud consumers. Many times, automation tasks need to be performed at a specific time of day or on a recurring schedule, which places an added burden on consumers. Embodiments of the present disclosure can extend automation capabilities with scheduling. For instance, embodiments herein can promote scheduling as a first-class citizen in any automation with a fully managed scheduler service. With such a built-in capability at the platform level, it becomes possible to schedule cross-service, cross-cloud automation. This provides a cost-efficiency edge over individual public or private cloud scheduling services. In some instances, one or more embodiments of the present disclosure may be referred to as a “cloud scheduler.”
When considering DevOps operations or automation, typically the first thing that comes to mind is “cron.” Cron usually comes with management and maintenance overhead in terms of computing, personnel, etc., especially when application tasks are to be scheduled reliably at scale. Embodiments herein can hide cron overheads by providing a scheduling solution based on key characteristics of distributed systems, making the scheduling of application tasks more convenient and centralized.
The scheduler service in accordance with embodiments herein can support multitenancy and work with various platforms (e.g., VMware Common Service Platform (CSP)). Other products or applications can onboard quickly onto the scheduler service within the scope of these platforms. The service can provide an interface to schedule future invocations of virtually any task, defined as an HTTP endpoint within an on-premises cloud or an external public cloud, either once or on a recurring pattern by leveraging cron expressions.
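By way of illustration only, the following sketch (in Go) shows how a recurring pattern expressed as a standard cron expression could be evaluated to determine upcoming invocation times. The use of the third-party robfig/cron parser is an assumption made for the example and is not required by embodiments herein.

    package main

    import (
        "fmt"
        "time"

        "github.com/robfig/cron/v3" // assumed third-party cron-expression parser
    )

    func main() {
        // "*/15 * * * *" describes a recurring pattern of every 15 minutes.
        spec, err := cron.ParseStandard("*/15 * * * *")
        if err != nil {
            panic(err)
        }
        // Compute the next three invocation times starting from "now".
        next := time.Now()
        for i := 0; i < 3; i++ {
            next = spec.Next(next)
            fmt.Println("next invocation:", next)
        }
    }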
Embodiments herein can invoke tasks reliably at least once, with limited retries to mitigate failures, within a few seconds of the actual scheduled time. The scheduler service supports dynamic horizontal scaling to accommodate millions of task invocations in a multi-tenant environment. It can expose a unified REST API and UI that any consumer, either a user or another cloud service, can leverage to define its scheduling needs and access retrospective details of task invocations within the tenant boundaries. For instance, a tenant user could schedule execution of pipelines in a release automation solution (e.g., vRealize Code Stream) or periodically reclaim outdated virtual machine snapshots in a management platform (e.g., vRealize Operations), or the CSP could schedule onboarding or offboarding of a customer tenant on a cloud.
Previous approaches introduce scheduling scoped within a product boundary. The efforts to develop such features are redundant and time-consuming. Hence, scheduling capabilities appear slowly, across multiple product releases, and in a very limited scope. Until then, customers are left to use other providers for scheduling or to open a feature request and wait for future releases. Embodiments of the present disclosure enable scheduling much faster for any automation in a product, giving a better user experience. Embodiments herein also provide a system architecture that scales reliably for scheduling tens of thousands of automations across multiple tenants for many products.
As referred to herein, a virtual computing instance (VCI) covers a range of computing functionality. VCIs may include non-virtualized physical hosts, virtual machines (VMs), and/or containers. A VM refers generally to an isolated end user space instance, which can be executed within a virtualized environment. Other technologies aside from hardware virtualization that can provide isolated end user space instances may also be referred to as VCIs. The term “VCI” covers these examples and combinations of different types of VCIs, among others. VMs, in some embodiments, operate with their own guest operating systems on a host using resources of the host virtualized by virtualization software (e.g., a hypervisor, virtual machine monitor, etc.).
Multiple VCIs can be configured to be in communication with each other in an SDDC. In such a system, information can be propagated from a client (e.g., an end user) to at least one of the VCIs in the system, between VCIs in the system, and/or between at least one of the VCIs in the system and a server. SDDCs are dynamic in nature. For example, VCIs and/or various application services may be created, used, moved, or destroyed within the SDDC. When VCIs are created, various processes and/or services start running and consuming resources. As used herein, “resources” are physical or virtual components that have a finite availability within a computer or SDDC. For example, resources include processing resources, memory resources, electrical power, and/or input/output resources.
While the specification refers generally to VCIs, the examples given could be any type of data compute node, including physical hosts, VCIs, non-VCI containers, and hypervisor kernel network interface modules. Embodiments of the present disclosure can include combinations of different types of data compute nodes.
The host 102 can incorporate a hypervisor 104 that can execute a number of virtual computing instances 106-1, 106-2, . . . , 106-N (referred to generally herein as “VCIs 106”). The VCIs can be provisioned with processing resources 108 and/or memory resources 110 and can communicate via the network interface 112. The processing resources 108 and the memory resources 110 provisioned to the VCIs can be local and/or remote to the host 102. For example, in a software defined data center, the VCIs 106 can be provisioned with resources that are generally available to the software defined data center and not tied to any particular hardware device. By way of example, the memory resources 110 can include volatile and/or non-volatile memory available to the VCIs 106. The VCIs 106 can be moved to different hosts (not specifically illustrated), such that a different hypervisor manages the VCIs 106. The host 102 can be in communication with a cloud scheduler 114. An example of the cloud scheduler 114 is illustrated and described in more detail below. In some embodiments, the cloud scheduler 114 can be a server, such as a web server.
As shown in
An example schedule configuration request to invoke an HTTP endpoint can be:
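By way of a non-limiting illustration, such a request could resemble the following; the endpoint path, field names, and values are hypothetical and do not represent a definitive API contract:

    POST /scheduler/api/schedules
    {
      "name": "nightly-snapshot-reclaim",
      "cronExpression": "0 2 * * *",
      "target": {
        "url": "https://example.internal/api/snapshots/reclaim",
        "method": "POST"
      },
      "retryStrategy": { "maxRetries": 3, "backoffSeconds": 30 }
    }

A similar request identifying the schedule could be issued (e.g., with the DELETE method) to cancel the scheduled automation task, as described below.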
The consumer can authorize with service credentials or organization-level user tokens when configuring schedules. One can also use the provided REST API 220 to cancel the scheduled automation task. The service also provides historical target invocations via the same API 220.
A partition 221 is a hash-key that is assigned to a schedule in the service. One partition 221 can be associated with a limited number of schedules (e.g., 5000 schedules per partition). A default partition is created per application or service onboarded onto the scheduler. Partitions 221 can be scaled up or down based on the demand for a given consumer. The advantage of partitioning is that an overwhelming number of overdue target invocations can be queued in parallel instead of sequentially, which greatly reduces the latency of target invocations when there are many (e.g., thousands of) pending targets at a given time.
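A minimal sketch, assuming a fixed number of partitions per onboarded service and a hypothetical partitionFor helper, of how a schedule identifier could be mapped onto a partition hash-key is:

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // partitionFor maps a schedule identifier onto one of numPartitions
    // hash-keys so that schedules spread evenly across partitions.
    func partitionFor(scheduleID string, numPartitions int) int {
        h := fnv.New32a()
        h.Write([]byte(scheduleID))
        return int(h.Sum32()) % numPartitions
    }

    func main() {
        // A hypothetical consumer scaled up to four partitions.
        fmt.Println(partitionFor("tenant-a/pipeline-42", 4))
    }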
A schedule manager 226 can be responsible for reading ready schedules from the configuration store 222 and/or cache store 224 and pushing them into a streaming data bus 228-1. The schedule manager 226 can run two tickers. The first ticker fetches from the configuration store 222 the schedules with a target invocation expected within a minute from “now” and pushes them into the cache store 224. The second ticker (which runs every second) reads any overdue schedules from the cache store 224 and pushes them into the streaming message queue/data bus 228-1. The manager 226 claims a partition in its tickers so that only one manager 226 is working on a single partition at a given time. Any service node can claim the partition. This makes the scheduler service highly available and promotes concurrency by handling schedules within the isolation established by the partitions.
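An illustrative sketch of the two tickers is shown below; the ConfigStore, CacheStore, and DataBus interfaces are hypothetical stand-ins for the configuration store 222, the cache store 224, and the streaming data bus 228-1, and the method names are assumptions made for the example:

    package manager

    import "time"

    type Schedule struct {
        ID             string
        Partition      int
        NextInvocation time.Time
    }

    type ConfigStore interface {
        DueWithin(partition int, window time.Duration) ([]Schedule, error)
    }

    type CacheStore interface {
        Put(s Schedule) error
        Overdue(partition int, now time.Time) ([]Schedule, error)
    }

    type DataBus interface {
        Publish(partition int, s Schedule) error
    }

    // Run starts both tickers for a partition this manager has claimed.
    func Run(partition int, cfg ConfigStore, cache CacheStore, bus DataBus, stop <-chan struct{}) {
        fetch := time.NewTicker(time.Minute) // first ticker: look one minute ahead
        drain := time.NewTicker(time.Second) // second ticker: push overdue schedules
        defer fetch.Stop()
        defer drain.Stop()
        for {
            select {
            case <-fetch.C:
                if due, err := cfg.DueWithin(partition, time.Minute); err == nil {
                    for _, s := range due {
                        cache.Put(s)
                    }
                }
            case <-drain.C:
                if overdue, err := cache.Overdue(partition, time.Now()); err == nil {
                    for _, s := range overdue {
                        bus.Publish(partition, s)
                    }
                }
            case <-stop:
                return
            }
        }
    }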
The cache store 224 temporarily holds schedules whose targets are due now or overdue. The cache store 224 is optimized for faster reads to support accuracy of invocations when the service deals with massive numbers of schedules. The cache store 224 is a distributed and highly available component.
The streaming data bus 228-1 is a publish/subscribe (“pub/sub”) component that enables real-time delivery of the overdue schedules to dispatchers 230. To promote scalability and reduce latency of the target invocations, the bus 228-1 is based on streaming fundamentals (e.g., push to consumers) and supports multiple concurrent subscriptions such that consumer groups can be formed for a single stream. The bus 228-1 can provide a gateway to multiple data streams. There is a default stream (or queue) per application partition.
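One non-limiting way to render this component as an interface (in Go, with illustrative names only) is:

    package bus

    // StreamBus is a hypothetical pub/sub abstraction: each application
    // partition has a default stream, and multiple subscribers in the same
    // consumer group share a stream so that each overdue schedule is
    // delivered to only one of them.
    type StreamBus interface {
        // Publish pushes an overdue schedule onto the stream for a partition.
        Publish(stream string, payload []byte) error
        // Subscribe joins a consumer group on a stream; messages are pushed
        // to the handler rather than polled.
        Subscribe(stream, consumerGroup string, handler func(payload []byte)) error
    }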
The dispatcher 230 receives overdue schedules from the streaming data bus 228 and invokes the associated targets. It is a stateless, lightweight component created to handle an individual partition. Several dispatchers 230 across nodes can form a cluster to handle the same partition. Each overdue target invocation is delivered to at most one dispatcher 230. Dispatchers 230 also auto-scale up or down depending on the load. After a target invocation is finished, the dispatcher 230 pushes the result of the invocation into another streaming data bus 228-2 to update the configuration store 222 and record the invocation for reporting purposes. The dispatcher 230 retries on transient failures of task invocations according to the strategy defined in the schedule configuration; eventually, it will fail the invocation once the retry limit is exceeded.
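A minimal sketch of how a dispatcher 230 could invoke a target over HTTP and retry transient failures up to a configured limit follows; the RetryPolicy fields are assumptions standing in for the retry strategy defined in the schedule configuration:

    package dispatcher

    import (
        "fmt"
        "net/http"
        "time"
    )

    // RetryPolicy mirrors a hypothetical retry strategy carried in the
    // schedule configuration.
    type RetryPolicy struct {
        MaxRetries int
        Backoff    time.Duration
    }

    // Invoke calls the target URL and retries transient failures; it fails
    // the invocation once the retry limit is exceeded.
    func Invoke(targetURL string, p RetryPolicy) error {
        var lastErr error
        for attempt := 0; attempt <= p.MaxRetries; attempt++ {
            resp, err := http.Post(targetURL, "application/json", nil)
            if err == nil && resp.StatusCode < 500 {
                resp.Body.Close()
                return nil // target invoked at least once
            }
            if err == nil {
                resp.Body.Close()
                lastErr = fmt.Errorf("target returned status %d", resp.StatusCode)
            } else {
                lastErr = err
            }
            time.Sleep(p.Backoff)
        }
        return fmt.Errorf("invocation failed after %d retries: %w", p.MaxRetries, lastErr)
    }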
In operation, a consumer/service's tenant user creates a schedule for a target using the REST API 220. The schedule is associated with an available hash-key (i.e., partition) 221. The next-invocation-time is evaluated and updated in the configuration store (PostgreSQL). The first schedule manager 226 ticker reads the schedule from the store 222 to determine if it is due to be invoked within the next minute. The schedule manager 226 pushes the schedule into the cache store 224 (Redis Cache). Another schedule manager 226 ticker (which runs every second) reads the schedule from the cache store 224 and pushes it into the streaming data bus 228-1 as soon as it becomes overdue. The dispatcher 230 consumes the schedule and invokes the target associated with it. Once the invocation is done, the dispatcher 230 pushes a completion event back to the streaming data bus 228-2. The schedule update manager 232 consumes the completion event and updates the configuration store 222 with the invocation results and the next-invocation-time for the associated target.
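The last step of this flow can be illustrated with a short sketch; the CompletionEvent fields, the ConfigStore method, and the NextFunc callback are hypothetical names chosen for the example:

    package updater

    import "time"

    // CompletionEvent represents a result pushed by a dispatcher onto the
    // second streaming data bus 228-2 after a target invocation.
    type CompletionEvent struct {
        ScheduleID string
        Succeeded  bool
        FinishedAt time.Time
    }

    // ConfigStore stands in for the configuration store 222.
    type ConfigStore interface {
        RecordInvocation(scheduleID string, succeeded bool, next time.Time) error
    }

    // NextFunc evaluates the schedule's cron expression to find the next
    // invocation time after t (see the earlier cron sketch).
    type NextFunc func(scheduleID string, t time.Time) time.Time

    // OnCompletion records the invocation result and the next-invocation-time.
    func OnCompletion(ev CompletionEvent, cfg ConfigStore, next NextFunc) error {
        return cfg.RecordInvocation(ev.ScheduleID, ev.Succeeded, next(ev.ScheduleID, ev.FinishedAt))
    }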
Any service (e.g., a VMware service) can onboard itself onto this managed scheduler service by authenticating with CSP using the provided onboarding REST APIs and providing a valid authentication token when configuring/managing schedule configurations. A default partition is assigned to the service whenever a configuration is created, so that any new schedule can be configured for the service right away.
The number of engines can include a combination of hardware and program instructions that is configured to perform a number of functions described herein. The program instructions (e.g., software, firmware, etc.) can be stored in a memory resource (e.g., machine-readable medium) as well as hard-wired program (e.g., logic). Hard-wired program instructions (e.g., logic) can be considered as both program instructions and hardware.
In some embodiments, the API engine 340 can include a combination of hardware and program instructions that is configured to receive a schedule associated with an automation task to be performed in a virtualized environment via a REST API, wherein the task is associated with a target. In some embodiments, the partition engine 342 can include a combination of hardware and program instructions that is configured to associate the schedule with a partition. In some embodiments, the store engine 344 can include a combination of hardware and program instructions that is configured to store the schedule in a cache store responsive to determining that the schedule is to be invoked within a threshold time period. In some embodiments, the invocation engine 346 can include a combination of hardware and program instructions that is configured to receive the schedule from the cache store and invoke the target responsive to the schedule becoming overdue.
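One non-limiting way to render these engines in code (in Go) is sketched below; the type and method names are illustrative only and do not limit the engines described above:

    package engines

    import "time"

    type Schedule struct {
        ID     string
        Target string // HTTP endpoint associated with the automation task
    }

    // APIEngine receives a schedule for an automation task via the REST API.
    type APIEngine interface {
        ReceiveSchedule(payload []byte) (Schedule, error)
    }

    // PartitionEngine associates the schedule with a partition.
    type PartitionEngine interface {
        Associate(s Schedule) (partition int, err error)
    }

    // StoreEngine stores the schedule in the cache store when the schedule
    // is to be invoked within a threshold time period.
    type StoreEngine interface {
        CacheIfDueWithin(s Schedule, threshold time.Duration) error
    }

    // InvocationEngine receives the schedule from the cache store and
    // invokes the target responsive to the schedule becoming overdue.
    type InvocationEngine interface {
        InvokeWhenOverdue(s Schedule) error
    }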
The program instructions (e.g., machine-readable instructions (MRI)) can include instructions stored on the machine-readable medium (MRM) to implement a particular function (e.g., an action such as processing streams of change events). The set of MRI can be executable by one or more of the processing resources 408. The memory resources 410 can be coupled to the machine 448 in a wired and/or wireless manner. For example, the memory resources 410 can be an internal memory, a portable memory, a portable disk, and/or a memory associated with another resource, e.g., enabling MRI to be transferred and/or executed across a network such as the Internet. As used herein, a “module” can include program instructions and/or hardware, but at least includes program instructions.
Memory resources 410 can be non-transitory and can include volatile and/or non-volatile memory. Volatile memory can include memory that depends upon power to store information, such as various types of dynamic random access memory (DRAM) among others. Non-volatile memory can include memory that does not depend upon power to store information. Examples of non-volatile memory can include solid state media such as flash memory, electrically erasable programmable read-only memory (EEPROM), phase change random access memory (PCRAM), magnetic memory, optical memory, and/or a solid state drive (SSD), etc., as well as other types of machine-readable media.
The processing resources 408 can be coupled to the memory resources 410 via a communication path 450. The communication path 450 can be local or remote to the machine 448. Examples of a local communication path 450 can include an electronic bus internal to a machine, where the memory resources 410 are in communication with the processing resources 408 via the electronic bus. Examples of such electronic buses can include Industry Standard Architecture (ISA), Peripheral Component Interconnect (PCI), Advanced Technology Attachment (ATA), Small Computer System Interface (SCSI), Universal Serial Bus (USB), among other types of electronic buses and variants thereof. The communication path 450 can be such that the memory resources 410 are remote from the processing resources 408, such as in a network connection between the memory resources 410 and the processing resources 408. That is, the communication path 450 can be a network connection. Examples of such a network connection can include a local area network (LAN), wide area network (WAN), personal area network (PAN), and the Internet, among others.
As shown in
One or more of the number of modules 440, 442, 444, 446 can include program instructions and/or a combination of hardware and program instructions that, when executed by a processing resource 408, can function as a corresponding engine as described with respect to
For example, the machine 448 can include an API module 440, which can include instructions to receive a schedule associated with an automation task to be performed in a virtualized environment via a REST API, wherein the task is associated with a target. For example, the machine 448 can include a partition module 442, which can include instructions to associate the schedule with a partition. For example, the machine 448 can include a store module 444, which can include instructions to store the schedule in a cache store responsive to determining that the schedule is to be invoked within a threshold time period. For example, the machine 448 can include an invocation module 446, which can include instructions to receive the schedule from the cache store and invoke the target responsive to the schedule becoming overdue.
The present disclosure is not limited to particular devices or methods, which may vary. The terminology used herein is for the purpose of describing particular embodiments, and is not intended to be limiting. As used herein, the singular forms “a”, “an”, and “the” include singular and plural referents unless the content clearly dictates otherwise. Furthermore, the words “can” and “may” are used throughout this application in a permissive sense (i.e., having the potential to, being able to), not in a mandatory sense (i.e., must). The term “include,” and derivations thereof, mean “including, but not limited to.”
The figures herein follow a numbering convention in which the first digit or digits correspond to the drawing figure number and the remaining digits identify an element or component in the drawing. Similar elements or components between different figures may be identified by the use of similar digits. For example, 108 may reference element “08” in
Although specific embodiments have been described above, these embodiments are not intended to limit the scope of the present disclosure, even where only a single embodiment is described with respect to a particular feature. Examples of features provided in the disclosure are intended to be illustrative rather than restrictive unless stated otherwise. The above description is intended to cover such alternatives, modifications, and equivalents as would be apparent to a person skilled in the art having the benefit of this disclosure.
The scope of the present disclosure includes any feature or combination of features disclosed herein (either explicitly or implicitly), or any generalization thereof, whether or not it mitigates any or all of the problems addressed herein. Various advantages of the present disclosure have been described herein, but embodiments may provide some, all, or none of such advantages, or may provide other advantages.
In the foregoing Detailed Description, some features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the disclosed embodiments of the present disclosure have to use more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.