An enterprise may utilize applications or services executing in a cloud computing environment. For example, a business might utilize streaming applications that execute at a data center to process purchase orders, human resources tasks, payroll functions, etc. These applications may be executed within Virtual Machines (“VMs”) in the cloud that are deployed and managed by an orchestration layer or scheduler. Kubernetes is an open source orchestration system that implements a control plane operator to ensure that workloads are executing properly (e.g., a reconciler may change a workload from an actual state to a desired state). Consider, for example, a database system such as PostgreSQL. The operator for deploying the database is tasked with performing a multi-tenanted deployment. Although the database itself may be deployed, for example, into its own namespace with proper security privileges from a network and storage point of view, the control plane still remains a vulnerable attack point because it deploys and/or reconciles across all tenants of the multi-tenanted database. If the operator itself is compromised via a cyber-attack, every tenant is exposed to the resulting security risk.
A goal of an orchestration system and/or operator should be to allow reconciler logic to run safely in a multi-tenant way. Today, the reconciler may be part of a single Go language binary (although other programming languages might also be used instead). As a result, when multi-tenanted control plane operations execute, the control plane itself does not follow the isolated-execution model it applies to the tenant workloads. It would be desirable to provide reconciler sandboxes for operators in a cloud-based computing environment in a secure, automatic, and accurate manner.
Methods and systems may be associated with a cloud computing environment. A computer processor of an orchestration layer platform may deploy and manage multi-tenant workloads (e.g., each being associated with a VM) in the cloud-based computing environment. A Kubernetes control plane operator associated with the multi-tenant workloads may detect a trigger event (e.g., an actual VM state not matching a desired VM state) that results in a reconciliation request for a particular tenant workload. Responsive to the reconciliation request, serverless tenant execution code, representing reconciler logic compiled into a WebAssembly (“WASM”) module, may be spun up in a WASM sandbox to perform reconciliation for the particular tenant workload.
Some embodiments comprise: means for deploying and managing, by a computer processor of an orchestration layer platform, multi-tenant workloads in the cloud-based computing environment; means for detecting, by a Kubernetes control plane operator associated with the multi-tenant workloads, a trigger event that results in a reconciliation request for a particular tenant workload; and, responsive to the reconciliation request, means for spinning up serverless tenant execution code, representing reconciler logic compiled into a WASM module, in a WASM sandbox to perform reconciliation for the particular tenant workload.
Some technical advantages of some embodiments disclosed herein are improved systems and methods to provide reconciler sandboxes for operators in a cloud-based computing environment in a secure, automatic, and accurate manner.
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of embodiments. However, it will be understood by those of ordinary skill in the art that the embodiments may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the embodiments.
One or more specific embodiments of the present invention will be described below. In an effort to provide a concise description of these embodiments, all features of an actual implementation may not be described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
Some embodiments described herein provide reconciler sandboxes for operators in a cloud-based computing environment in a secure, automatic, and accurate manner. In particular, some embodiments combine serverless execution with multi-tenanted execution of the control loop itself. This means that each tenant request is handled by its own reconciler sandbox. The reconciler sandbox may provide the following features:
One way to achieve this would be to launch each tenant reconciler (when an event or watch is triggered) into a separate Docker container. This would entail setting up the container and its related requirements, such as network and filesystem configuration. Another way would be to compile the reconciler logic into WASM modules. Such an approach may provide substantial flexibility.
As used herein, devices, including those associated with the system 100 and any other device described herein, may exchange information via any communication network which may be one or more of a Local Area Network (“LAN”), a Metropolitan Area Network (“MAN”), a Wide Area Network (“WAN”), a proprietary network, a Public Switched Telephone Network (“PSTN”), a Wireless Application Protocol (“WAP”) network, a Bluetooth network, a wireless LAN network, and/or an Internet Protocol (“IP”) network such as the Internet, an intranet, or an extranet. Note that any devices described herein may communicate via one or more such communication networks.
The system 100 may store information into and/or retrieve information from various data stores, which may be locally stored or reside remote from the orchestration layer platform 110 and/or Kubernetes control plane operator 120. Although a single orchestration layer platform 110 and Kubernetes control plane operator 120 are shown in
A user may access the system 100 via a remote device (e.g., a Personal Computer (“PC”), tablet, or smartphone) to view information about and/or manage operational information in accordance with any of the embodiments described herein. In some cases, an interactive graphical user interface display may let an operator or administrator define and/or adjust certain parameters (e.g., to implement various rules and policies) and/or provide or receive automatically generated recommendations or results from the system 100.
At S210, a computer processor of an orchestration layer platform may deploy and/or manage multi-tenant workloads in a cloud-based computing environment. The workloads may be deployed, for example, within VMs that are each assigned an amount of resources (e.g., memory size, Central Processing Unit (“CPU”) utilization, disk space, etc.).
At S220, a Kubernetes control plane operator associated with the multi-tenant workloads may detect a trigger event that results in a reconciliation request for a particular tenant workload. According to some embodiments, the trigger event represents an actual VM state not matching a desired VM state.
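The trigger detection at S220 can be illustrated with a minimal Go sketch. It assumes a hypothetical, simplified `VMState` that captures the resources assigned at S210 (memory, CPU, disk) plus a running flag, and a hypothetical `detectTriggers` function that emits one reconciliation request per tenant whose actual state is missing or does not match its desired state. It is not a Kubernetes client-go or controller-runtime API.

```go
package main

import (
	"fmt"
	"sort"
)

// VMState is a hypothetical, simplified view of a tenant workload's VM:
// the resources assigned to it and whether it is running.
type VMState struct {
	MemoryMiB int
	CPUMilli  int
	DiskGiB   int
	Running   bool
}

// ReconcileRequest asks the operator to reconcile one tenant's workload.
type ReconcileRequest struct {
	Tenant string
}

// detectTriggers emits a reconciliation request for every tenant whose actual
// VM state is missing or differs from its desired VM state.
func detectTriggers(desired, actual map[string]VMState) []ReconcileRequest {
	var reqs []ReconcileRequest
	for tenant, want := range desired {
		if got, ok := actual[tenant]; !ok || got != want {
			reqs = append(reqs, ReconcileRequest{Tenant: tenant})
		}
	}
	// Map iteration order is random in Go, so sort for deterministic output.
	sort.Slice(reqs, func(i, j int) bool { return reqs[i].Tenant < reqs[j].Tenant })
	return reqs
}

func main() {
	desired := map[string]VMState{
		"tenant-a": {MemoryMiB: 2048, CPUMilli: 1000, DiskGiB: 20, Running: true},
		"tenant-b": {MemoryMiB: 4096, CPUMilli: 2000, DiskGiB: 40, Running: true},
		"tenant-c": {MemoryMiB: 1024, CPUMilli: 500, DiskGiB: 10, Running: true},
	}
	actual := map[string]VMState{
		"tenant-a": {MemoryMiB: 2048, CPUMilli: 1000, DiskGiB: 20, Running: true},  // matches
		"tenant-b": {MemoryMiB: 4096, CPUMilli: 2000, DiskGiB: 40, Running: false}, // VM stopped
		// tenant-c has no VM yet
	}
	for _, r := range detectTriggers(desired, actual) {
		fmt.Println("reconcile:", r.Tenant)
	}
}
```

With the sample data, tenant-b (stopped VM) and tenant-c (no VM yet) produce reconciliation requests, while tenant-a, whose actual state already matches, does not.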
The components of the control plane 310 make decisions, such as scheduling decisions, and detect and respond to system 300 events. An Application Programming Interface (“API”) server 320 in the control plane 310 exposes the Kubernetes API and acts as its front end. The control plane also includes persistence storage 330, such as etcd, which provides a consistent and highly available key-value store used as the backing store for cluster data. A scheduler 340 may watch for new pods that have no assigned node and select a node for them to run on. A controller manager 350 may run controller processes, such as the node controller, replication controller, endpoints controller, etc. In some cases, a cloud controller manager 360 may embed cloud-specific control logic to link the cluster with a cloud provider API 370. Node components 380 may include a Kubelet 382 agent that makes sure containers are running in a pod and a Kube proxy 384 that maintains network rules on nodes (e.g., rules that allow network sessions inside or outside of the cluster to reach pods).
Referring again to
As will now be described, the WASM sandbox may provide memory isolation from other tenants, maintain code flow integrity, execute a tenant control plane, and inherit security features by default.
Cloud computing may demand scalable computing in terms of resource and energy utilization. In some cases, resource utilization and on-demand provisioning may involve massive data distributions. In serverless computing, developers may write functions that execute only when demand arrives, keeping resources free at other times. However, existing serverless computing techniques may have limitations such as cold-start latency, a complex architecture for stateful services, difficulty implementing multi-tenancy, etc. Some embodiments described herein utilize a WASM based runtime that may address some of these problems in current serverless architectures. The WASM based runtime may feature resource isolation in terms of CPU, memory, and/or files (and thus offer multi-tenancy within function execution). Further, some embodiments may place and execute serverless functions based on data locality and/or gravity (which may help improve execution latency and reduce network usage compared with existing random function placement strategies).
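Placement by data locality, as opposed to random placement, might be sketched as follows, assuming the platform knows how many bytes of each input a node already stores locally. The `Node` type and `placeByLocality` function are hypothetical; a production scheduler would also weigh CPU and memory availability, which this sketch ignores.

```go
package main

import "fmt"

// Node is a hypothetical worker node with the sizes (in bytes) of the data
// objects it holds locally.
type Node struct {
	Name      string
	LocalData map[string]int64
}

// placeByLocality picks the node that already stores the most bytes of the
// data a function needs, so less data must cross the network. Ties go to the
// earliest node in the slice. It returns -1 if nodes is empty.
func placeByLocality(nodes []Node, inputs []string) int {
	best, bestBytes := -1, int64(-1)
	for i, n := range nodes {
		var local int64
		for _, key := range inputs {
			local += n.LocalData[key] // missing keys (or a nil map) contribute 0
		}
		if local > bestBytes {
			best, bestBytes = i, local
		}
	}
	return best
}

func main() {
	nodes := []Node{
		{Name: "node-1", LocalData: map[string]int64{"orders": 10 << 20}},
		{Name: "node-2", LocalData: map[string]int64{"orders": 10 << 20, "payroll": 50 << 20}},
		{Name: "node-3", LocalData: map[string]int64{}},
	}
	i := placeByLocality(nodes, []string{"orders", "payroll"})
	fmt.Println("place function on", nodes[i].Name) // node-2 holds both inputs
}
```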
Traditionally, WASM runtimes were executed within the browser process. Note, however, that a WASM runtime can also be executed standalone, outside of a browser, if it is accompanied by interfaces that facilitate system calls. According to some embodiments, the WASM runtime executes as a separate process that runs a given WASM function using a thread. The WASM runtime process can be easily provisioned within a VM or container (or can even run on bare metal, directly on a host OS).
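The thread-per-function model of a standalone runtime process might look roughly like the following Go sketch. Goroutines are multiplexed onto OS threads by the Go scheduler, so this only approximates one thread per WASM function, and the `Function` type is a hypothetical stand-in for a call into a real WASM engine.

```go
package main

import (
	"fmt"
	"sync"
)

// Function stands in for an instantiated WASM function; a real runtime would
// call into a WASM engine here rather than a Go closure.
type Function func(input string) string

// runAll executes each function on its own goroutine inside one runtime
// process and returns the results in input order. It assumes
// len(inputs) == len(fns).
func runAll(fns []Function, inputs []string) []string {
	results := make([]string, len(fns))
	var wg sync.WaitGroup
	for i := range fns {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = fns[i](inputs[i]) // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	reconcile := func(tenant string) string { return "reconciled:" + tenant }
	out := runAll([]Function{reconcile, reconcile}, []string{"tenant-a", "tenant-b"})
	fmt.Println(out) // [reconciled:tenant-a reconciled:tenant-b]
}
```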
Compiling reconciler logic into WASM modules may provide several benefits, such as:
The WASM module may be invoked per request to reconcile a respective resource. The WASM module may be launched into its own sandbox and consume resources (memory/CPU) only while it executes. Because control plane operations are infrequent, this saves system resources (and also secures execution against attacks on the operator itself).
Note that one process, such as Wasmtime or WASM Secure Capabilities Connector (“WaSCC”), may run continuously across tenants. The “serverless” part may be the tenant execution code, which spins up whenever reconciliation is needed for a tenant. This may launch a WASM module (a relatively fast process) within the above-mentioned WASM runtime (e.g., within a few milliseconds).
This also means that everything is launched within the same process (e.g., Wasmtime or WaSCC), which from a resource point of view is equivalent to running a single operator for multiple tenants. This is still better than implementing the serverless component as a full process (e.g., launching a Docker container via a sidecar, which requires more network and filesystem setup per container). Note that each tenant control plane is executed within the WASM sandbox and inherits its security features by default.
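The split between one always-running process and serverless, per-request tenant code is sketched below under the assumption that compiling a module is the expensive step and instantiating it is cheap. `Runtime`, `CompiledModule`, and `Reconcile` are hypothetical names and do not reflect the actual Wasmtime or WaSCC APIs.

```go
package main

import (
	"fmt"
	"sync"
)

// CompiledModule stands in for a reconciler WASM module that the shared,
// always-running host process has already compiled (hypothetical type).
type CompiledModule struct {
	name       string
	initMemory int // bytes of linear memory each new instance starts with
}

// Runtime is the long-lived process shared across tenants. It caches
// compiled modules so that each reconciliation pays only for instantiation.
type Runtime struct {
	mu       sync.Mutex
	compiled map[string]*CompiledModule
	compiles int // number of times compilation actually ran
}

func NewRuntime() *Runtime {
	return &Runtime{compiled: make(map[string]*CompiledModule)}
}

// module returns the cached compiled module, compiling it on first use.
func (r *Runtime) module(name string) *CompiledModule {
	r.mu.Lock()
	defer r.mu.Unlock()
	if m, ok := r.compiled[name]; ok {
		return m
	}
	r.compiles++ // stand-in for the expensive validate-and-compile step
	m := &CompiledModule{name: name, initMemory: 64 << 10}
	r.compiled[name] = m
	return m
}

// Reconcile is the "serverless" part: it spins up a fresh instance with its
// own memory for one tenant request and drops it when the call returns.
func (r *Runtime) Reconcile(moduleName, tenant string) string {
	m := r.module(moduleName)
	memory := make([]byte, m.initMemory) // fresh, unshared instance memory
	n := copy(memory, tenant)            // placeholder for the reconciler logic
	return fmt.Sprintf("%s reconciled by %s (%d bytes used)", tenant, m.name, n)
}

func main() {
	rt := NewRuntime()
	for _, t := range []string{"tenant-a", "tenant-b", "tenant-a"} {
		fmt.Println(rt.Reconcile("db-reconciler", t))
	}
	fmt.Println("compilations:", rt.compiles) // compilations: 1
}
```

Caching the compiled module in the shared process is one plausible way to keep per-request start-up short, while still giving every request fresh memory that no other tenant can see.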
Note that the embodiments described herein may be implemented using any number of different hardware configurations. For example,
The processor 810 also communicates with a storage device 830. The storage device 830 can be implemented as a single database or the different components of the storage device 830 can be distributed using multiple databases (that is, different deployment information storage options are possible). The storage device 830 may comprise any appropriate information storage device, including combinations of magnetic storage devices (e.g., a hard disk drive), optical storage devices, mobile telephones, and/or semiconductor memory devices. The storage device 830 stores a program 812 and/or orchestrator platform 814 for controlling the processor 810. The processor 810 performs instructions of the programs 812, 814, and thereby operates in accordance with any of the embodiments described herein. For example, the processor 810 may deploy and manage multi-tenant workloads in the cloud-based computing environment. The processor may also facilitate detection, by a Kubernetes control plane operator associated with the multi-tenant workloads, of a trigger event that results in a reconciliation request for a particular tenant workload. Responsive to the reconciliation request, the system may spin up serverless tenant execution code, representing reconciler logic compiled into a WASM module, in a WASM sandbox to perform reconciliation for the particular tenant workload.
The programs 812, 814 may be stored in a compressed, uncompiled and/or encrypted format. The programs 812, 814 may furthermore include other program elements, such as an operating system, clipboard application, a database management system, and/or device drivers used by the processor 810 to interface with peripheral devices.
As used herein, information may be “received” by or “transmitted” to, for example: (i) the platform 800 from another device; or (ii) a software application or module within the platform 800 from another software application, module, or any other source.
In some embodiments (such as the one shown in
Referring to
The WASM runtime process identifier 902 might be a unique alphanumeric label or link that is associated with a particular WASM runtime process being executed on a VM or container. The sandbox identifier 904 might identify a WASM sandbox associated with the runtime (e.g., and as shown in
Thus, embodiments may provide reconciler sandboxes for operators in a cloud-based computing environment in a secure, automatic, and accurate manner. Moreover, multiple tenants may operate in separate sandboxes (with access to different memories), improving the security of the system. For a Kubernetes-based cloud platform that uses multiple operators (in either a public or a private cloud setup), embodiments may save resources by allowing serverless execution of reconcilers while securing the execution of each tenant's reconcile requests. These benefits may help an enterprise save money as the number of operators increases and also provide indirect benefits associated with a more secure execution environment.
The following illustrates various additional embodiments of the invention. These do not constitute a definition of all possible embodiments, and those skilled in the art will understand that the present invention is applicable to many other embodiments. Further, although the following embodiments are briefly described for clarity, those skilled in the art will understand how to make any changes, if necessary, to the above-described apparatus and methods to accommodate these and other embodiments and applications.
Although specific hardware and data configurations have been described herein, note that any number of other configurations may be provided in accordance with some embodiments of the present invention (e.g., some of the information associated with the databases described herein may be combined or stored in external systems). Moreover, although some embodiments are focused on particular types of applications and services, any of the embodiments described herein could be applied to other types of applications and services. In addition, the displays shown herein are provided only as examples, and any other type of user interface could be implemented. For example,
According to some embodiments, the WASM based execution runtime offers a sandboxed execution environment. The WASM runtime may, for example, create a contiguous memory heap for each sandbox, and no pointer from inside the sandbox can access outside memory. To allow instructions executing inside the sandbox to make system calls, pointers are detected during WASM compilation so that offsets can be passed to interfaces that enable system interactions (e.g., the WebAssembly System Interface (“WASI”)). In order to prevent access from outside the WASM sandbox into sandbox heap memory, some embodiments rely on a security enclave such as the Software Guard Extensions (“SGX”) architecture available from INTEL®. Any process running in user space might be compromised by an attacker with root access. As a result, the WASM runtime process itself could be compromised (which could allow data to leak from the WASM heaps or sandboxes). According to some embodiments, a runtime may use the SGX instruction set with native Rust features to create enclaves. The WASM heaps are then protected by using SGX instructions and executing the WASM modules in the enclaves, where security is enforced by hardware. Such an arrangement may provide protection when WASM functions are executed outside of a browser.
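The bounds-checked, offset-based memory access described above can be sketched in Go. `LinearMemory`, `ErrOutOfBounds`, and `hostLog` are hypothetical; `hostLog` plays the role of a system-interface call (in the spirit of WASI) that receives an offset and length from the guest and resolves them only against that sandbox's own heap. The SGX enclave protection is hardware-specific and is not modeled here.

```go
package main

import (
	"errors"
	"fmt"
)

// LinearMemory models one sandbox's contiguous heap. Guest code refers to
// data by offsets into this slice, never by host pointers.
type LinearMemory struct {
	data []byte
}

var ErrOutOfBounds = errors.New("out-of-bounds memory access")

// Read returns a copy of the n bytes at off after checking that the whole
// range lies inside the sandbox's heap, so guest offsets cannot reach other memory.
func (m *LinearMemory) Read(off, n uint32) ([]byte, error) {
	end := uint64(off) + uint64(n) // widen to avoid uint32 overflow
	if end > uint64(len(m.data)) {
		return nil, ErrOutOfBounds
	}
	out := make([]byte, n)
	copy(out, m.data[int(off):int(end)])
	return out, nil
}

// Write copies b into the heap at off, with the same bounds check.
func (m *LinearMemory) Write(off uint32, b []byte) error {
	end := uint64(off) + uint64(len(b))
	if end > uint64(len(m.data)) {
		return ErrOutOfBounds
	}
	copy(m.data[int(off):int(end)], b)
	return nil
}

// hostLog is a hypothetical host-side interface function: the guest passes
// an (offset, length) pair, and the host resolves it against this sandbox's
// own memory only.
func hostLog(mem *LinearMemory, off, n uint32) (string, error) {
	b, err := mem.Read(off, n)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	tenantA := &LinearMemory{data: make([]byte, 64)}
	_ = tenantA.Write(8, []byte("reconcile ok"))
	msg, err := hostLog(tenantA, 8, 12)
	fmt.Println(msg, err) // reconcile ok <nil>

	_, err = hostLog(tenantA, 60, 16) // would run past the 64-byte heap
	fmt.Println(err)                  // out-of-bounds memory access
}
```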
Further, with a threaded model (where each thread executes a WASM function), CPU isolation may be achieved by setting a timer on the thread and executing a handler that removes the WASM module when the time expires. In some embodiments, a proposed runtime may achieve filesystem isolation by separating disks and mounting a disk for each runtime process. Further, using the principles of capability-based security, the runtime may assign file descriptors (“FDs”) to WASM functions in a controlled manner.
Additional security features of the WASM runtime might include, according to some embodiments:
The present invention has been described in terms of several embodiments solely for the purpose of illustration. Persons skilled in the art will recognize from this description that the invention is not limited to the embodiments described, but may be practiced with modifications and alterations limited only by the spirit and scope of the appended claims.