Benefit is claimed under 35 U.S.C. 119(a)-(d) to Foreign Application Serial No. 202341001801 filed in India entitled “PER SERVICE VERSION ROLL BACK IN DISTRIBUTED COMPUTING SYSTEMS”, on Jan. 09, 2023, by VMware, Inc., which is herein incorporated in its entirety by reference for all purposes.
The present disclosure relates to computing environments, and more particularly to methods, techniques, and systems to perform per service version rollback operation in distributed computing systems.
Virtual computing instances (VCIs), such as virtual machines, virtual workloads, data compute nodes, clusters, and containers, among others, have been introduced to lower data center capital investment in facilities and operational expenses and reduce energy consumption. A VCI is a software implementation of a computer that executes application software analogously to a physical computer. A distributed computing system can include multiple virtual components running on multiple VCIs, which can be associated with a plurality of data centers. Further, such a distributed computing system may be upgraded to include new features, performance enhancement, and the like.
The drawings described herein are for illustrative purposes and are not intended to limit the scope of the present subject matter in any way.
Examples described herein may provide an enhanced computer-based and/or network-based method, technique, and system to perform per service version roll back in distributed computing systems. The paragraphs [0014] to [0021] present an overview of the distributed computing systems, existing methods to upgrade the distributed computing systems, and drawbacks associated with the existing methods.
A distributed computing system (also referred to as a distributed software system) may refer to a construct which involves various infrastructure parties that act together to enable a business service. For example, the distributed computing system includes components that are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. In some examples, multiple virtual computing instances (VCIs) can be configured to be in communication with each other in the distributed computing system (e.g., a software defined data network). The term “VCI” covers a range of computing functionality. For example, VCIs may include virtual machines (VMs), and/or containers. Containers can run on a host operating system without a hypervisor or separate operating system, such as a container that runs within Linux. A container can be provided by a virtual machine that includes a container virtualization layer (e.g., Docker). The virtual machine refers to an isolated user space instance, which can be executed within a virtualized environment. Other technologies aside from hardware virtualization can provide isolated user space instances, also referred to as VCIs. The term “VCI” covers these examples and combinations of different types of VCIs, among others.
The virtual machines operate with their own guest operating systems on a host using resources of the host virtualized by virtualization software (e.g., a hypervisor, virtual machine monitor, and the like.). The tenant (i.e., the owner of the virtual machine) can choose which applications to operate on top of the guest operating system. Some containers, on the other hand, are constructs that run on top of a host operating system without the need for a hypervisor or separate guest operating system. The host operating system can use name spaces to isolate the containers from each other and therefore can provide operating-system level segregation of the different groups of applications that operate within different containers.
In such examples, the distributed computing system may include multiple services (e.g., cloud-based services) running on respective isolated virtual computing instances in a network virtualization platform. The network virtualization platform may include a management plane, a data plane, and/or a control plane. The data plane refers to functions and processes that forward packets/frames from one interface to another. The control plane refers to functions and processes that determine which path to use (such as LDP, Routing protocols, etc.). The management plane refers to functions used to control and monitor devices via a graphical user interface.
In some examples, the management plane may be built by a network virtualization and security platform that enables a virtual cloud network, a software-defined approach to networking that extends across data centers, clouds, and application frameworks. An example network virtualization and security platform may include NSX-T™, a software-defined networking and security product offered by VMware®. The data plane may include an NSX® Virtual Switch™, which is based on a vSphere distributed switch (VDS) with additional components to enable services. The NSX® Virtual Switch™ refers to software that operates in server hypervisors to form a software abstraction layer between servers and the physical network. The VDS provides a centralized interface to configure, monitor, and administer virtual machine access switching for the entire data center. Further, the control plane may run in a NSX controller cluster. An NSX controller is an advanced distributed state management system that provides control plane functions for logical switching and routing functions. The control plane is the central control point for all logical switches within a network and maintains information about all hosts, logical switches (VXLANs), and distributed logical routers.
For example, the NSX may be developed as a scale out architecture including components such as a management plane cluster (MP), a control plane cluster (CCP), an Edge and a data plane cluster (DP). The DP may include hypervisor nodes of different virtualization technologies such as an enterprise-class, type-1 hypervisor (ESX), a Kernel-based virtual machine (KVM), and the like. The CCP may include a cluster of controller nodes responsible for configuring networking flows in the hypervisors. The MP may be a cluster of management nodes which provides a management interface to an end user. Thus, the NSX works by implementing three separate but integrated planes, i.e., management, control, and data planes. The three planes may be implemented as a set of processes, modules, and agents residing on three types of nodes such as a NSX manager, a controller, Edges, and a transport node.
Further, to remain competitive in the market, such distributed computing systems may be upgraded to include new features, government compliances, reporting, analytics, and the like. For example, the benefits of upgrading a distributed computing system may include, but are not limited to, increased productivity, improved communication, improved efficiency, better security, enhancements, extra support, reduced cost, compatibility, reduced outages, better customer engagement, and business growth.
In some examples, the distributed computing system may be upgraded using an upgrade coordinator, which is a module responsible for upgrading all the components in an automated and ordered manner with an upgrade order (e.g., Edge ->DP ->CCP and MP), for instance. The NSX provides numerous services such as a virtual private network (VPN), a load balancer (L3/L2), a domain name system (DNS), a dynamic host configuration protocol (DHCP), a transport layer security (TLS) decryption, an intrusion detection and prevention system (IDPS), an anti-malware, and the like. Further, the data path of these services may be provided through the NSX Edge, and the management of these services may be provided by the NSX manager. Furthermore, users of the distributed computing system can use the NSX manager's representational state transfer application programming interfaces (REST APIs) to configure these services on the Edge.
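As a non-limiting illustration, the ordered upgrade performed by an upgrade coordinator may be sketched as follows. The names `UPGRADE_ORDER` and `run_ordered_upgrade` are hypothetical and are not part of any NSX interface. Treating the CCP and the MP as two sequential steps, and stopping at the first failure, are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Sequence

# Example upgrade order from the text (Edge -> DP -> CCP and MP).
# Treating CCP and MP as two sequential steps is an assumption of this sketch.
UPGRADE_ORDER: Sequence[str] = ("Edge", "DP", "CCP", "MP")


def run_ordered_upgrade(
    upgrade_fns: Dict[str, Callable[[], bool]],
    order: Sequence[str] = UPGRADE_ORDER,
) -> List[str]:
    """Upgrade each component group in order and stop at the first failure.

    Each callable returns True on success. The return value lists the
    groups that were upgraded successfully, in the order they completed.
    """
    completed: List[str] = []
    for group in order:
        if not upgrade_fns[group]():
            break  # groups later in the order are not attempted
        completed.append(group)
    return completed
```

In this sketch, a failure in the DP step would leave only the Edge group reported as upgraded, which makes the partially upgraded state explicit to the caller.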
When all the NSX components (e.g., Edges, hosts, NSX manager, and the like) are upgraded successfully, the distributed computing system may be declared as being upgraded from version 1 (V1) to version 2 (V2). However, after the upgrade of the distributed computing system, consider that an issue is detected in the management plane for one of the services (e.g., the VPN), where a user is not able to perform configure, modify, or delete operations in the data center. In this case, the user's production environment may be impacted. In some existing methods, the user may be required to report the issue and remains blocked until the issue is resolved. In some other existing methods, the user may have to wait for a hot patch to fix the issue, which can take a significant amount of time. In some other existing methods, the user may have to wait for a next release, which can again take a significant amount of time (e.g., a quarter or a half year time frame). Moreover, the release (version V2) may have to be pulled off from the public site to prevent users from downloading the upgrade bits so that other users may not experience the same issue.
Examples described herein may provide a management node to perform per service version rollback operations in distributed computing systems. An example distributed computing system may include a first service executed by a first isolated virtual computing instance and a second service executed by a second isolated virtual computing instance. For example, the first service can be implemented as part of a management plane and the second service can be implemented as part of a data plane or a control plane. In an example, the management node may receive a request to upgrade the distributed computing system. Further, the management node may upgrade the distributed computing system including the first service and the second service to a second version while retaining the first version of the first service and the second service. Upon upgrading the distributed computing system, the management node may detect an issue associated with the second version of the first service. In this example, the management node may perform a rollback operation to roll back the first service to the first version while retaining the second version of the second service.
Thus, examples described herein may enable version roll back of the first service running in the management plane from the second version to the first version while retaining the second version of remaining services of the distributed computing system running in the management plane, control plane, and/or data plane. By facilitating the roll back of individual services of the distributed computing system, the customer's production environment may not be impacted. Further, examples described herein may enable the data path (e.g., services of the data plane) to continue to work without any downtime. Furthermore, examples described herein may avoid roll back of the entire distributed computing system (which may be costly and consume a significant amount of time), and also avoid pulling off the release from the public site. In this case, users can still download the upgrade package with known exceptions. With the examples described herein, an issue in one service may not impact operations of remaining services in the distributed computing system. In this example, upon performing the version roll back of the service having the issue, users can use new features of the remaining services with upgraded versions.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present techniques. However, the example apparatuses, devices, and systems, may be practiced without these specific details. Reference in the specification to “an example” or similar language means that a particular feature, structure, or characteristic described may be included in at least that one example but may not be in other examples.
Example distributed computing system 102 includes multiple services (e.g., a first service, a second service, and a third service) running on respective isolated virtual computing instances (e.g., a first isolated virtual computing instance 108, a second isolated virtual computing instance 110, and a third isolated virtual computing instance 116) in a network virtualization platform. The network virtualization platform may include a management plane and at least one of a data plane and a control plane. The terms “management plane”, “data plane”, and “control plane” refer to functional descriptions of elements of distributed computing system 102 that perform specialized functions. The management plane may be responsible for defining the intent of the user to add, edit, or delete any configuration or functionality. The management plane may provide an interface to an end user to process user queries, process user configurations, and perform operational tasks for data and control planes. In other examples, the management plane may be configured to, for example, perform tasks related to input validation, user management, policy management, and background task tracking. In some examples, the management plane provides a single application programming interface (API) entry point to distributed computing system 102.
The data plane is coupled with the management plane. The data plane may physically handle the intent/task/operations supplied or provided by the management plane. For example, the data plane is configured to index data during data ingestion and store the indexed data. In some examples, the data plane is configured to ingest data received from the management plane and query the stored data. The data plane may include a collection of data plane containers.
In some examples, the data plane is responsible for handling the data packets and applying actions to them, based on rules that are programmed into lookup tables. Furthermore, the control plane may be tasked with calculating and programming actions for the data plane. In some examples, the control plane is also responsible for distributing the same data across all the data plane components.
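As a non-limiting illustration, the relationship between rules programmed by the control plane and actions applied by the data plane may be sketched as follows. The class names, the string-valued actions, and the default "drop" action for unknown destinations are assumptions introduced only for this sketch.

```python
from typing import Dict, List


class DataPlaneNode:
    """Holds a lookup table that maps a destination to an action."""

    def __init__(self) -> None:
        self.lookup_table: Dict[str, str] = {}

    def handle_packet(self, destination: str) -> str:
        # Falling back to "drop" for unknown destinations is an assumption.
        return self.lookup_table.get(destination, "drop")


class ControlPlane:
    """Programs the same rule into every data plane node it manages."""

    def __init__(self, nodes: List[DataPlaneNode]) -> None:
        self.nodes = nodes

    def program_rule(self, destination: str, action: str) -> None:
        # Distribute the same data across all data plane components.
        for node in self.nodes:
            node.lookup_table[destination] = action
```

After `ControlPlane([a, b]).program_rule("10.0.0.5", "forward:port1")`, both hypothetical nodes `a` and `b` return the same action for that destination.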
As shown in
As shown in
For example, the network can be a managed Internet protocol (IP) network administered by a service provider. The example network may be implemented using wireless protocols and technologies, such as Wi-Fi, WiMAX, and the like. In other examples, the network can also be a packet-switched network such as a local area network, wide area network, metropolitan area network, Internet network, or other similar type of network environment. In yet other examples, the network may be a fixed wireless network, a wireless local area network (LAN), a wireless wide area network (WAN), a personal area network (PAN), a virtual private network (VPN), intranet or other suitable network system and includes equipment for receiving and transmitting signals.
Further, distributed computing system 102 may be communicatively connected to management node 120. Management node 120 may refer to a computing device or computer program (i.e., executing on a computing device) that provides service to first compute node 104 and second compute node 106. Further, management node 120 includes a processor 122 and a memory 124 coupled to processor 122. Processor 122 may refer to, for example, a central processing unit (CPU), a semiconductor-based microprocessor, a digital signal processor (DSP) such as a digital image processing unit, or other hardware devices or processing elements suitable to retrieve and execute instructions stored in a storage medium, or suitable combinations thereof. Processor 122 may, for example, include single or multiple cores on a chip, multiple cores across multiple chips, multiple cores across multiple devices, or suitable combinations thereof. Processor 122 may be functional to fetch, decode, and execute instructions as described herein. Example memory 124 includes a version rollback module 126.
During operation, version rollback module 126 may receive a request to upgrade distributed computing system 102. Further, version rollback module 126 may upgrade distributed computing system 102 including first service 112A and second service 114A to a second version (i.e., a second version of first service 112B and a second version of second service 114B) while retaining the first version of first service 112A and second service 114A.
Upon upgrading distributed computing system 102, version rollback module 126 may detect an issue associated with the second version of first service 112B. Further, version rollback module 126 may perform a rollback operation to roll back the first service from the second version (i.e., 112B) to the first version (i.e., 112A) while retaining the second version of second service 114B.
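As a non-limiting illustration, upgrading all services while retaining the first version, and then rolling back a single service, may be sketched as follows. The class `VersionRollbackSketch` and its methods are hypothetical and do not represent the implementation of version rollback module 126. Keeping retained versions in a per-service list is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceState:
    active: str  # version currently serving requests
    retained: List[str] = field(default_factory=list)  # earlier versions kept


class VersionRollbackSketch:
    def __init__(self, services: Dict[str, str]) -> None:
        self.services = {
            name: ServiceState(active=version)
            for name, version in services.items()
        }

    def upgrade_all(self, new_version: str) -> None:
        # Upgrade every service, but keep the prior version available.
        for state in self.services.values():
            state.retained.append(state.active)
            state.active = new_version

    def roll_back(self, name: str) -> str:
        # Roll back one service only; other services are left untouched.
        state = self.services[name]
        if not state.retained:
            raise ValueError(f"no retained version for {name}")
        state.active = state.retained.pop()
        return state.active

    def active_version(self, name: str) -> str:
        return self.services[name].active
```

In this sketch, calling `upgrade_all("V2")` followed by `roll_back("first_service")` leaves the first service at V1 while the second service remains at V2.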
In an example, during the upgrade of distributed computing system 102, version rollback module 126 may detect an error or failure associated with the upgrade of first service 112A. In response to detecting the error or failure associated with the upgrade of first service 112A, version rollback module 126 may perform the rollback operation to roll back first service to the first version (e.g., 112A) while retaining the second version of second service 114B (i.e., remaining services of the management plane).
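As a non-limiting illustration, rolling back a service whose upgrade fails while the remaining services proceed to the second version may be sketched as follows. The function `upgrade_with_auto_rollback` is hypothetical. Signaling a failed upgrade by raising an exception is an assumption of this sketch.

```python
from typing import Callable, Dict, List


def upgrade_with_auto_rollback(
    versions: Dict[str, str],
    upgrade_fns: Dict[str, Callable[[], None]],
    new_version: str,
) -> List[str]:
    """Upgrade each service; restore the prior version of any that fails.

    `versions` maps a service name to its active version and is updated
    in place. Returns the names of the services that were rolled back.
    """
    rolled_back: List[str] = []
    for name, upgrade in upgrade_fns.items():
        previous = versions[name]
        try:
            upgrade()
            versions[name] = new_version
        except Exception:
            # Broad catch is deliberate here: any upgrade error triggers
            # a roll back of this one service only.
            versions[name] = previous
            rolled_back.append(name)
    return rolled_back
```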
In another example, upon completing the upgrade of distributed computing system 102, version rollback module 126 may provide a user-selectable option to roll back the first service in the user interface of the management plane. Upon a user selection of the user-selectable option, version rollback module 126 may perform the rollback operation to roll back the first service from the second version (i.e., 112B) to the first version (i.e., 112A) while retaining the second version of the second service (i.e., 114B). In yet another example, upon completing the upgrade of distributed computing system 102, version rollback module 126 may provide a user-selectable option in a user interface of the management plane to enable a user to navigate through a roll back process of first service 112A. An example of such user interface is depicted in
In an example, upon rolling back the first service to the first version (i.e., 112A) and when the second version (i.e., 112B) does not include a new feature, version rollback module 126 may display a notification on a user interface of the first service indicating that the first service is rolled back to the first version. In another example, upon rolling back the first service to the first version (i.e., 112A) and when the second version (i.e., 112B) includes a new feature, version rollback module 126 may display a notification on a user interface of the first service indicating an unavailability of the new feature.
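As a non-limiting illustration, selecting between the two notifications may be sketched as follows. The function `rollback_notification` is hypothetical. The first message follows the example message quoted in this description, while the wording of the message about unavailable features is an assumption.

```python
from typing import Sequence


def rollback_notification(
    service: str, first_version: str, new_features: Sequence[str]
) -> str:
    """Return the message to display after a service is rolled back."""
    base = f"{service} management service is rolled back to {first_version}"
    if new_features:
        # Hypothetical wording for the case where the second version
        # introduced features that are no longer available.
        return f"{base}; unavailable features: {', '.join(new_features)}"
    return base
```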
In some examples, version rollback module 126 may determine a dependent service of distributed computing system 102 that has a dependency relationship with the second version of the first service (i.e., 112B). Further, version rollback module 126 may determine a functionality of the dependent service that is likely to be affected. Furthermore, version rollback module 126 may generate a notification on the user interface to indicate the functionality of the dependent service that is likely to be affected.
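As a non-limiting illustration, determining dependent services and the functionality that may be affected by a roll back may be sketched as follows. The dependency map structure, the function names, and the notification wording are assumptions introduced only for this sketch.

```python
from typing import Dict, List

# Hypothetical dependency map:
# dependent service -> {service it depends on: [functionality relying on it]}
DependencyMap = Dict[str, Dict[str, List[str]]]


def affected_functionality(
    rolled_back: str, deps: DependencyMap
) -> Dict[str, List[str]]:
    """Map each dependent service to its functionality that may be affected."""
    return {
        dependent: needs[rolled_back]
        for dependent, needs in deps.items()
        if rolled_back in needs
    }


def dependency_notifications(rolled_back: str, deps: DependencyMap) -> List[str]:
    """Build one user-facing message per dependent service."""
    return [
        f"{dependent}: {', '.join(funcs)} may be affected by the roll back of {rolled_back}"
        for dependent, funcs in affected_functionality(rolled_back, deps).items()
    ]
```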
As shown in
In some examples, the functionalities described in
At 202, a request to upgrade a distributed computing system may be received. In an example, the distributed computing system may include multiple services running on respective isolated virtual computing instances in a network virtualization platform. An example network virtualization platform may include a management plane and at least one of a data plane and a control plane. In an example, a first service may include a first component running in the management plane and a second component running in the data plane and/or the control plane.
At 204, the services of the distributed computing system may be upgraded from a first version to a second version while retaining the first version associated with the services. At 206, upon upgrading the distributed computing system, an issue associated with the second version of a first service of the services may be detected in the management plane. In an example, detecting the issue associated with the second version of the first service may include detecting an error or failure associated with the upgrade of the first service during the upgrading of the services of the distributed computing system. In another example, detecting the issue associated with the second version of the first service may include detecting that a feature associated with the second version of the first service is not working after the upgrade of the services of the distributed computing system. An example feature refers to a unit of functionality of the first service that satisfies a requirement, represents a design decision, and provides a potential configuration option.
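As a non-limiting illustration, the two ways of detecting an issue described above (a failure during the upgrade, or a feature that is not working after the upgrade) may be sketched as follows. The names `IssueKind` and `detect_issue`, and the representation of feature checks as a mapping of feature name to a Boolean result, are assumptions of this sketch.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class IssueKind(Enum):
    UPGRADE_FAILURE = "upgrade_failure"
    FEATURE_NOT_WORKING = "feature_not_working"


def detect_issue(
    upgrade_succeeded: bool, feature_checks: Dict[str, bool]
) -> Optional[Tuple[IssueKind, str]]:
    """Return the first detected issue for a service, or None if healthy."""
    if not upgrade_succeeded:
        return (IssueKind.UPGRADE_FAILURE, "upgrade")
    for feature, working in feature_checks.items():
        if not working:
            return (IssueKind.FEATURE_NOT_WORKING, feature)
    return None
```

For instance, `detect_issue(True, {"configure": True, "delete": False})` reports that the hypothetical "delete" feature is not working, which could then be used to enable the rollback operation at 208.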
At 208, upon detecting an issue associated with the second version of the first service, a rollback operation may be enabled to perform a roll back of the first service in the management plane to the first version while retaining the second version of remaining services. For example, performing the rollback operation may include performing the rollback operation to roll back the first component running in the management plane to the first version while retaining the second version of the second component in the data plane and/or the control plane.
In an example, enabling to perform the rollback operation may include providing, via a user interface of the management plane, a user-selectable option to roll back the first service. Further, in response to a selection of the user-selectable option, the rollback operation may be performed to roll back the first service from the second version to the first version while retaining the second version of the remaining services.
In other examples, example method 200 may include determining a second service that has a dependency relationship with the second version of the first service. Upon determining the second service, a functionality of the second service that is likely to be affected may be determined. Further, a notification may be generated on a user interface to indicate the functionality of the second service that is likely to be affected. In this example, an option may be provided on the user interface to roll back the second service from the second version to the first version.
For example, the management plane may be built by VMware NSX® manager, which provides an NSX manager view 306 and is the management component offered by VMware for NSX-T data centers. NSX may be a network virtualization and security platform that enables a virtual cloud network and a software-defined approach to networking. The NSX® manager may provide a user interface 314 (e.g., a graphical user interface) and REST API for the creation, configuration, and monitoring of NSX-T components such as logical switches, logical routers, and firewalls in the data center. The NSX® manager may be installed as a virtual appliance on any physical host computing system in a VMware vCenter® server environment, which is a server management platform. The NSX-T components may refer to services of distributed computing system 300.
Further, the data plane in NSX-T may include physical host computing systems and edge nodes. The data plane may provide an edge view 308 to configure NSX-T components on hosts and to deploy edge nodes to set up logical routing. The NSX-T components of the data plane may refer to services of the distributed computing system. In the example shown in
Consider an initial version of code as V1 before the upgrade as indicated in
VPN service is rolled back to the first version. Consider a scenario where there is no new feature in the upgraded version V2 of the VPN service. When the second version V2 includes existing feature enhancements or optimization without any new features, all the components of the user interface, the management plane, and the data path may work together. In this case, upon roll back of the service (e.g., the VPN), message 502 may be displayed on user interface 314 indicating “VPN management service is rolled back to V1”. Further, other services may work as expected without being impacted. Furthermore, the user can wait until the issue is resolved without any impact on the customer's production environment.
Further, examples described in
Computer-readable storage medium 604 may store instructions 606, 608, 610, and 612. Instructions 606 may be executed by processor 602 to receive a request to upgrade the distributed computing system. In an example, the distributed computing system may include a microservices architecture. A microservice architecture is a method of developing software applications as a suite of independently deployable, small, modular services in which each microservice runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. These microservices can be deployed, modified, and then redeployed independently without compromising the integrity of an application. In a microservice architecture, when developing an application program, the application program is divided into a plurality of small services, and the application program is constructed by merging the divided small services. That is, in the microservice architecture, one application service is created by specializing each service for each module and connecting and integrating the respective modules using interfaces.
For example, the distributed computing system may include a first service (e.g., a VPN) having a first component running in a management plane and a second component running in a data plane. The first component may run in a first isolated virtual computing instance of a first compute node in the management plane such that the first component executes in isolation from other services running on the first compute node. Further, the second component may run in a second isolated virtual computing instance of a second compute node in the data plane such that the second component executes in isolation from other services running in the second compute node. In an example, the data plane may be backward compatible with the management plane such that operations of the first version of the first component in the management plane can be compatible with operations of the second version of the second component in the data plane.
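As a non-limiting illustration, rolling back only the management plane component of a service, subject to the data plane being backward compatible, may be sketched as follows. The class `PlaneSplitService`, the function `roll_back_management_component`, and the representation of compatibility as a set of (management version, data version) pairs are assumptions introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass
class PlaneSplitService:
    name: str
    management_version: str  # version of the component in the management plane
    data_version: str        # version of the component in the data plane


def roll_back_management_component(
    service: PlaneSplitService,
    target_version: str,
    compatible_pairs: Set[Tuple[str, str]],
) -> bool:
    """Roll back the management plane component if compatibility allows.

    `compatible_pairs` holds (management_version, data_version) pairs that
    are known to interoperate. The data plane component is never changed.
    """
    if (target_version, service.data_version) not in compatible_pairs:
        return False
    service.management_version = target_version
    return True
```

With the pair ("V1", "V2") listed as compatible, a hypothetical VPN service at V2 in both planes can have its management plane component rolled back to V1 while its data plane component stays at V2.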
Instructions 608 may be executed by processor 602 to upgrade the distributed computing system from a first version to a second version while retaining the first version. Upon upgrading the distributed computing system, instructions 610 may be executed by processor 602 to receive a request to roll back the first component running in the management plane from the second version to the first version. In an example, instructions 610 to receive the request to roll back the first component may include instructions to provide, via a user interface of the management plane, a user-selectable option to roll back the first component and receive the request to roll back the first component via the user-selectable option.
In response to receiving the request, instructions 612 may be executed by processor 602 to perform a rollback operation to roll back the first component to the first version while retaining the second version of the second component running in the data plane. Further, computer-readable storage medium 604 may store instructions to determine a second service of the distributed computing system that has a dependency relationship with the second version of the first service. Furthermore, computer-readable storage medium 604 may store instructions to determine a functionality of the second service that is likely to be affected. Further, computer-readable storage medium 604 may store instructions to generate a notification on a user interface. For example, the notification may indicate the functionality of the second service that is likely to be affected.
The above-described examples are for the purpose of illustration. Although the above examples have been described in conjunction with example implementations thereof, numerous modifications may be possible without materially departing from the teachings of the subject matter described herein. Other substitutions, modifications, and changes may be made without departing from the spirit of the subject matter. Also, the features disclosed in this specification (including any accompanying claims, abstract, and drawings), and any method or process so disclosed, may be combined in any combination, except combinations where some of such features are mutually exclusive.
The terms “include,” “have,” and variations thereof, as used herein, have the same meaning as the term “comprise” or appropriate variation thereof. Furthermore, the term “based on”, as used herein, means “based at least in part on.” Thus, a feature that is described as based on some stimulus can be based on the stimulus or a combination of stimuli including the stimulus. In addition, the terms “first” and “second” are used to identify individual elements and are not meant to designate an order or number of those elements.
The present description has been shown and described with reference to the foregoing examples. It is understood, however, that other forms, details, and examples can be made without departing from the spirit and scope of the present subject matter that is defined in the following claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202341001801 | Jan 2023 | IN | national |