In general, modern computing clusters may operate inefficiently along several dimensions. For example, modern computing clusters may operate using a single master node or load balancer, which creates a single point of failure. Additionally, to satisfy a high availability protocol, modern computing clusters may pair each active node with a passive node so that the passive node may take over a task for the active node if the active node fails. This creates a high level of redundancy and inefficiency. Furthermore, modern computing clusters may limit a particular node to serving as a backup for just one other particular node within the cluster, which again results in underutilized computing resources, as discussed further below. Accordingly, the instant application identifies a need for improved systems and methods for performing computing cluster node switchover.
As will be described in greater detail below, the instant disclosure describes various systems and methods for performing computing cluster node switchover. In one example, a computer-implemented method for performing computing cluster node switchover may include (i) detecting an indication to switch an assignment of a transaction task away from a first network node in a computing cluster, (ii) executing, in response to detecting the indication, by each network node in a set of multiple network nodes within the computing cluster, a switchover algorithm to select a second network node, within the set of multiple network nodes, to receive the assignment of the transaction task, (iii) switching over the assignment of the transaction task from the first network node to the second network node based on a result of executing the switchover algorithm by each network node in the set of multiple network nodes, and (iv) performing, by the second network node, at least part of a remainder of the transaction task in response to switching over the assignment of the transaction task from the first network node to the second network node.
In some embodiments, each network node in the set of multiple network nodes within the computing cluster executes the switchover algorithm such that the computing cluster omits a static master node in a manner that prevents a single point of failure. In some examples, the switchover algorithm executed by each network node in the set of multiple network nodes selects the second network node based at least in part on a proximity of the second network node to a client device that is requesting the transaction task.
In some examples, the assignment of the transaction task is switched over from the first network node to the second network node prior to the transaction task being completed. In further examples, the transaction task may include an ecommerce transaction task according to which a user account performs a financial transaction using a web service. Additionally, in some examples, a number of transaction tasks that the switchover algorithm assigns from the first network node to the second network node is based at least in part on a current capacity that is available at the second network node.
In some embodiments, the switchover algorithm, as executed by each network node within the set of multiple network nodes, (i) determines that switching over the assignment of the transaction task from the first network node to the second network node exhausts a computing capacity that is defined for the second network node and (ii) switches over an assignment of a second transaction task from the first network node to a third network node to prevent the second network node from being overloaded. In some examples, the switchover algorithm, as executed by each network node within the set of multiple network nodes, repeats a process of assigning a portion of the remaining transaction tasks from the first network node to a new respective network node until all of the remaining transaction tasks have been assigned.
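By way of illustration only, the capacity-aware reassignment described above may be sketched as follows. This is a minimal sketch under stated assumptions rather than the disclosed algorithm itself: the function name `reassign_tasks`, the task and node identifiers, and the representation of capacity as a count of free task slots are all hypothetical. Nodes and tasks are visited in sorted order so that every node running the function on the same inputs computes the same assignment.

```python
from typing import Dict, List


def reassign_tasks(tasks: List[str], capacity: Dict[str, int]) -> Dict[str, str]:
    """Move the tasks of a departing node onto surviving nodes.

    `capacity` maps each surviving node to its number of free task slots
    (a hypothetical capacity model). A node is filled until its capacity is
    exhausted, and the next node then receives the following tasks.
    """
    remaining = dict(capacity)      # copy so the caller's mapping is not mutated
    nodes = sorted(remaining)       # fixed order gives identical results on every node
    assignment: Dict[str, str] = {}
    index = 0
    for task in sorted(tasks):
        # Skip nodes whose defined capacity is already exhausted.
        while index < len(nodes) and remaining[nodes[index]] <= 0:
            index += 1
        if index == len(nodes):
            raise RuntimeError("no surviving node has spare capacity")
        target = nodes[index]
        assignment[task] = target
        remaining[target] -= 1
    return assignment
```

For instance, with three tasks and free capacities of one slot at `node-b` and two slots at `node-c`, the first task would go to `node-b` and the remaining two to `node-c`.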
In some examples, the computing cluster operates in a manner such that each network node within the computing cluster executes in an active mode rather than each network node that is executing in an active mode being paired with a respective network node that is executing in a passive mode. In further examples, each network node within the computing cluster operates as a candidate backup network node for every other network node within the computing cluster rather than operating as a candidate backup network node for just a single network node.
In some examples, the set of multiple network nodes includes each network node within the computing cluster. In further examples, the computing cluster operates as a homogeneous full mesh. Additionally, in some examples, the homogeneous full mesh is self-healing by executing the switchover algorithm.
In some embodiments, the indication to switch the assignment of the transaction task away from the first network node in the computing cluster includes an indication that the first network node has failed. In some examples, the indication to switch the assignment of the transaction task away from the first network node in the computing cluster includes an indication that a client device that is requesting the transaction task has switched from one geolocation to another geolocation. In further examples, switching over the assignment of the transaction task from the first network node to the second network node further includes transmitting a security policy that is specific to a user account requesting the transaction task to the second network node. In some embodiments, the security policy is based at least in part on a geolocation of the second network node. In some examples, context information relating to the transaction task is accessible to each network node within the set of multiple network nodes. Furthermore, in these examples, each network node within the set of multiple network nodes executes the same switchover algorithm based on the same context information.
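As a hedged illustration of how identical context information may allow every node to reach the same selection without a master node, consider the following sketch. The `NodeInfo` fields, the use of a numeric distance as the proximity metric, and the function name `select_successor` are assumptions introduced for this example only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class NodeInfo:
    name: str
    distance_to_client: float  # hypothetical proximity metric, e.g. round-trip time
    free_slots: int            # hypothetical measure of currently available capacity
    healthy: bool = True


def select_successor(context: Dict[str, NodeInfo], first_node: str) -> Optional[str]:
    """Pick the node that should receive a task switched away from `first_node`.

    The function is deterministic: any node that evaluates it against the same
    shared context information obtains the same answer.
    """
    candidates = [
        node for node in context.values()
        if node.healthy and node.name != first_node and node.free_slots > 0
    ]
    if not candidates:
        return None
    # Prefer the node closest to the client; break ties by name for determinism.
    return min(candidates, key=lambda n: (n.distance_to_client, n.name)).name
```

Because the selection depends only on the shared context and uses a deterministic tie-break, no coordinator is needed to arbitrate between nodes in this sketch.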
In one embodiment, a system for implementing the above-described method may include (i) a detection module, stored in memory, that detects an indication to switch an assignment of a transaction task away from a first network node in a computing cluster, (ii) an execution module, stored in memory, that executes, in response to detecting the indication, as part of each network node in a set of multiple network nodes within the computing cluster, a switchover algorithm to select a second network node, within the set of multiple network nodes, to receive the assignment of the transaction task, (iii) a switching module, stored in memory, that switches over the assignment of the transaction task from the first network node to the second network node based on a result of executing the switchover algorithm by each network node in the set of multiple network nodes, (iv) a performance module, stored in memory, that performs, as part of the second network node, at least part of a remainder of the transaction task in response to switching over the assignment of the transaction task from the first network node to the second network node, and (v) at least one physical processor configured to execute the detection module, the execution module, the switching module, and the performance module.
In some examples, the above-described method may be encoded as computer-readable instructions on a non-transitory computer-readable medium. For example, a computer-readable medium may include one or more computer-executable instructions that, when executed by at least one processor of a computing device, may cause the computing device to (i) detect an indication to switch an assignment of a transaction task away from a first network node in a computing cluster, (ii) execute, in response to detecting the indication, by each network node in a set of multiple network nodes within the computing cluster, a switchover algorithm to select a second network node, within the set of multiple network nodes, to receive the assignment of the transaction task, (iii) switch over the assignment of the transaction task from the first network node to the second network node based on a result of executing the switchover algorithm by each network node in the set of multiple network nodes, and (iv) perform, by the second network node, at least part of a remainder of the transaction task in response to switching over the assignment of the transaction task from the first network node to the second network node.
Features from any of the above-mentioned embodiments may be used in combination with one another in accordance with the general principles described herein. These and other embodiments, features, and advantages will be more fully understood upon reading the following detailed description in conjunction with the accompanying drawings and claims.
The accompanying drawings illustrate a number of example embodiments and are a part of the specification. Together with the following description, these drawings demonstrate and explain various principles of the instant disclosure.
Throughout the drawings, identical reference characters and descriptions indicate similar, but not necessarily identical, elements. While the example embodiments described herein are susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, the example embodiments described herein are not intended to be limited to the particular forms disclosed. Rather, the instant disclosure covers all modifications, equivalents, and alternatives falling within the scope of the appended claims.
The present disclosure is generally directed to systems and methods for performing computing cluster node switchover. As described further below, the disclosed subject matter may enable high availability of microservices, as well as high availability of the underlying data on which the microservices operate. The high availability of these microservices may be defined in terms of a high availability protocol and/or availability threshold. In some examples, the underlying data on which the microservices operate may include transaction data, including data describing a transaction that is not yet completed, as discussed further below. The disclosed subject matter may also improve upon related systems by reducing a number of network channels for synchronizing microservice data stores, which may further reduce complexity and streamline installation and maintenance.
The following will provide, with reference to
In certain embodiments, one or more of modules 102 in
As illustrated in
As illustrated in
Example system 100 in
For example, and as will be described in greater detail below, establishment module 104 may establish, at first network node 206, an instance of a first microservice 240 for an application and an instance of a distinct second microservice 250. Establishment module 104 may also establish, at a distinct second network node 208, which may parallel first network node 206 in structure and/or configuration, an additional instance of first microservice 240 and an additional instance of distinct second microservice 250. Furthermore, establishment module 104 may establish a single network channel 260 for synchronizing, between first network node 206 and distinct second network node 208, first data store 120 for first microservice 240 and second data store 122 for distinct second microservice 250. Synchronization module 106 may synchronize, across single network channel 260 between first network node 206 and distinct second network node 208, first data store 120 for first microservice 240 and second data store 122 for distinct second microservice 250, such that the synchronizing is adjusted based on an analysis of context information for both first microservice 240 and distinct second microservice 250, rather than synchronizing first data store 120 and second data store 122 independently across two separate network channels.
Computing device 202 generally represents any type or form of computing device capable of reading computer-executable instructions. One illustrative example of computing device 202 may include a server or modular computing host within a cloud computing cluster. In general, computing device 202 may correspond to any computing device configured to manage one or more members of a cloud computing cluster in accordance with method 300, as discussed further below.
Network node 206 generally represents any type or form of computing device that is capable of executing a microservice for a cloud computing service, in accordance with method 300, and as discussed further below. Additional examples of network node 206 include, without limitation, security servers, application servers, web servers, storage servers, and/or database servers configured to run certain software applications and/or provide various security, web, storage, and/or database services. Although illustrated as a single entity in
Network 204 generally represents any medium or architecture capable of facilitating communication or data transfer. In one example, network 204 may facilitate communication between computing device 202 and network node 206. In this example, network 204 may facilitate communication or data transfer using wireless and/or wired connections. Examples of network 204 include, without limitation, an intranet, a Wide Area Network (WAN), a Local Area Network (LAN), a Personal Area Network (PAN), the Internet, Power Line Communications (PLC), a cellular network (e.g., a Global System for Mobile Communications (GSM) network), portions of one or more of the same, variations or combinations of one or more of the same, and/or any other suitable network.
As illustrated in
As used herein, the term “microservice” generally refers to a modular component within an application that performs a specific task in accordance with a service-oriented software architecture. In some examples, the application may correspond to an online web service, such as an online retailer or banking service. Illustrative examples of microservices may include performing credit card transactions, executing search queries, and/or executing user login and/or authentication procedures. Moreover, as used herein, the term “distinct second microservice” generally refers to any microservice that is distinct, or different, from the first mentioned microservice (e.g., first microservice 240 as distinct from microservice 250). In alternative embodiments, the distinct second microservice referenced throughout method 300 may be replaced by the same microservice, such that steps 302 and 304 refer to four separate instances of the same microservice. Furthermore, both first microservice 240 and microservice 250 may execute as part of the same application (e.g., both execute as part of AMAZON.COM), or instead execute as part of two separate and distinct applications (e.g., one may execute as part of AMAZON.COM and another may execute as part of USBANK.COM).
In general, the online web service may use a cloud computing platform or cluster to service user requests. The cloud computing platform may include a multitude of different nodes, hosts, and/or servers. Illustrative examples of these nodes include network node 206 and network node 208.
Establishment module 104 may establish the first microservice and the distinct second microservice in a variety of ways. In general, establishment module 104 may establish a microservice as part of executing the corresponding application. In some examples, establishment module 104 may establish a microservice in response to a user interaction, such as a user request, query, and/or command (or other user input). For example, a user may request to login to an online web service, such as an online banking service. In response, establishment module 104 may establish (e.g., execute) a microservice that corresponds to the user login and/or authentication procedures. Similarly, a user may attempt to purchase an item using an online retailer, such as AMAZON. In response, establishment module 104 may establish the microservice that handles credit card transactions for purchasing items accordingly. In additional or alternative examples, establishment module 104 may establish a microservice whenever execution of a corresponding application reaches a point where the microservice is called or requested.
At step 304, one or more of the systems described herein may establish, at a distinct second network node, an additional instance of the first microservice and an additional instance of the distinct second microservice. For example, establishment module 104 may, as part of computing device 202 in
Establishment module 104 may perform step 304 in a variety of ways. In general, establishment module 104 may perform step 304 in any manner that parallels the performance of step 302, as further described above. In some examples, establishment module 104 may perform step 302 and step 304 in parallel or simultaneously. In other examples, establishment module 104 may perform step 302 and step 304 in serial such that establishment module 104 performs step 302 before performing step 304, or vice versa. In general, establishment module 104 may establish, at the distinct second network node, an additional instance of the first microservice and an additional instance of the distinct second microservice as part of a cloud computing cluster that relies on redundancy, and a multitude of cluster nodes, in order to process a high volume of microservice requests or user interactions. For example, if the instance of first microservice 240 at first network node 206 is occupied with processing a user interaction, then an additional user interaction may be processed at the additional instance of first microservice 240 at distinct second network node 208. In this manner, a large number of cluster nodes may enable the overall system, such as system 200, to accommodate a large number of microservice requests or user interactions.
At step 306, one or more of the systems described herein may establish a single network channel for synchronizing, between the first network node and the distinct second network node, a first data store for the first microservice and a second data store for the distinct second microservice. For example, establishment module 104 may, as part of computing device 202 in
Establishment module 104 may establish the single network channel in a variety of ways. In general, establishment module 104 may establish the single network channel by establishing a single network connection. The single network connection may include a TRANSMISSION CONTROL PROTOCOL, STREAM CONTROL TRANSMISSION PROTOCOL, INTERNET PROTOCOL, and/or USER DATAGRAM PROTOCOL network connection. The single network connection may begin with a network handshake that establishes the network connection. The single network connection may enable two separate network nodes, such as first network node 206 and distinct second network node 208, to communicate by including data within the payload section of network packets transmitted across the network connection between two corresponding network addresses. More specifically, the single network channel enables one microservice data store to replicate from one network node, such as first network node 206, to another network node, such as distinct second network node 208, by including data of the microservice data store within the payload section of the network packets. In general, a cloud computing cluster that includes first network node 206 and distinct second network node 208 may automatically, and dynamically, propagate any changes to any microservice data store to all instances of the corresponding microservice within the cloud computing cluster, thereby satisfying a high availability protocol.
As discussed further below, by establishing the single network channel, establishment module 104 may enable a synchronization engine to manage, or adjust, the synchronization of one microservice data store based on an analysis or consideration of one or more factors (e.g., context information) relating to an additional microservice data store. In contrast, related systems may synchronize microservice data stores using separate network channels, in a manner such that the synchronization of one microservice data store is independent of the synchronization of the other microservice data store. In other words, the synchronization of one microservice data store is configured without any consideration of the synchronization of the other microservice data store. For example,
Returning to the block diagram of set 400 of the network nodes, the seven network channels shown in this block diagram may replicate corresponding microservice data stores independently of each other and without an analysis or consideration of context information for other microservices. For example, the instance of microservice 460 at network node 462 may replicate data store 424 to the instance of microservice 460 at network node 470 across network channel 450, but do so in a manner that is independent of, and does not analyze, factor, or consider, any context information for the replication of the other microservice data stores at network node 462 and/or network node 470. More specifically, the instance of microservice 460 at network node 462 may replicate data store 424 to the instance of microservice 460 at network node 470 without any consideration of the context information for microservice 250 shown in table 432 and/or the context information for microservice 240 shown in table 430.
At step 308, one or more of the systems described herein may synchronize, across the single network channel between the first network node and the distinct second network node, the first data store for the first microservice and the second data store for the distinct second microservice, such that the synchronizing is adjusted based on an analysis of context information for both the first microservice and the distinct second microservice, rather than synchronizing the first data store and the second data store independently across two separate network channels. For example, synchronization module 106 may, as part of computing device 202 in
As used herein, the phrase “synchronize the first data store for the first microservice and the second data store of the distinct second microservice” generally refers to ensuring that two separate network nodes both have the same content or data for the first data store, and also that both have the same content or data for the second data store (although, of course, the data for the first data store and the data for the second data store may be different). In general, synchronizing the first data store for the first microservice and the second data store for the distinct second microservice may be performed by synchronization module 106 replicating each respective data store from one network node to another network node.
Synchronization module 106 may synchronize the first data store for the first microservice and synchronize the second data store for the distinct second microservice in a variety of ways. In one embodiment, synchronization module 106 synchronizing the first data store for the first microservice and the second data store for the distinct second microservice reduces a number of network channels in comparison to synchronizing the first data store and the second data store independently across two separate network channels.
In additional or alternative embodiments, a synchronization engine, as part of synchronization module 106, schedules a transmission of at least one network packet based on the analysis of the context information for both the first microservice and the distinct second microservice. Furthermore, in these examples, the network packet optionally includes within a payload data from both the first microservice and the distinct second microservice. Returning to
Moreover, as further shown in
Intelligent scheduling network packet 510 also optionally includes high availability protocol information that specifies metadata, or configuration data, for ensuring high availability of underlying microservice data store data. In contrast to some related systems, synchronization engine 512 may use only a single high availability protocol, across a single network channel, which reduces complexity of the overall system. The related systems may involve a multitude of distinct microservices each attempting to use its own high availability protocol, which increases complexity and introduces problems with scaling the overall system.
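The scheduling of a shared packet based on context information for both microservices may be sketched, purely for illustration, as follows. The `PendingUpdate` fields (a numeric priority, a size in bytes, and an arrival sequence number) and the fixed byte budget are hypothetical stand-ins for whatever context information a synchronization engine might analyze.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PendingUpdate:
    microservice: str  # which microservice data store the update belongs to
    priority: int      # hypothetical context value; lower means more urgent
    size: int          # payload size in bytes
    seq: int           # arrival order, used to keep ordering stable


def build_packet(pending: List[PendingUpdate], budget: int) -> List[PendingUpdate]:
    """Choose the updates to place in the next packet on the single channel.

    Updates from all microservices compete for the same byte budget, so the
    urgency of one microservice's data can delay or advance another's.
    """
    chosen: List[PendingUpdate] = []
    used = 0
    for update in sorted(pending, key=lambda u: (u.priority, u.seq)):
        if used + update.size <= budget:
            chosen.append(update)
            used += update.size
    return chosen
```

With separate channels, each data store would be scheduled in isolation; in this sketch a single ordering across both microservices determines what is sent first.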
In some examples, synchronization module 106 may synchronize the first data store for the first microservice and the second data store for the distinct second microservice in a manner that enables high availability of intra-transaction data for a cloud service according to a high availability protocol. As used herein, the term “intra-transaction data” refers to data that specifies a state, or configuration, of a user interaction with an application, such as an online web application (e.g., an online retailer such as AMAZON and/or an online banking service) prior to the user interaction with a corresponding microservice completing. For example, intra-transaction data may define the state of a user being in the process of purchasing an item from an online retailer prior to the user actually completing the purchase. Similarly, intra-transaction data may define the state of the user being in the process of transferring money from one online banking account to another online banking account, but prior to the user actually completing the transfer. In these examples, the user may have already indicated one or more inputs to the online web service that navigated through one or more states of a corresponding microservice. Accordingly, the user may value preserving this state information without the online web service asking the user to repeat any steps or inputs. For example, if one node of a cloud computing cluster fails for some reason, the user would still desire to preserve the intra-transaction data that defines where the user was in the process of interacting with a microservice prior to the microservice actually completing the requested functionality.
Similarly, if a user switches geographic locations (or an aspect of the user's network connection switches geographic locations), then the user would still desire to preserve the intra-transaction data, even though a node within a different geographic location is now handling the execution of the microservice (e.g., because the new node is closer to the new geographic location associated with the user's network connection). Accordingly, high availability of intra-transaction data prevents the undesirable loss of this state information by automatically and dynamically propagating intra-transaction data to all network nodes within the cluster that include an instance of the corresponding microservice.
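The preservation of intra-transaction data across a node failure may be illustrated with the following simplified, in-memory sketch. The checkout step names, the `Cluster` and `Node` classes, and the synchronous propagation loop are assumptions made for this example; an actual cluster would propagate state over a network rather than through a shared Python object.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical states of a purchase microservice.
CHECKOUT_STEPS = ["cart", "shipping", "payment", "confirm"]


@dataclass
class Node:
    name: str
    store: Dict[str, dict] = field(default_factory=dict)  # local replica of transaction state


class Cluster:
    def __init__(self, names: List[str]) -> None:
        self.nodes = {name: Node(name) for name in names}

    def record(self, txn_id: str, step: str, inputs: dict) -> None:
        """Propagate the latest intra-transaction state to every node's replica."""
        for node in self.nodes.values():
            node.store[txn_id] = {"step": step, "inputs": dict(inputs)}

    def resume(self, node_name: str, txn_id: str) -> str:
        """Return the next step, as seen by a node taking over the transaction."""
        state = self.nodes[node_name].store[txn_id]
        index = CHECKOUT_STEPS.index(state["step"])
        return CHECKOUT_STEPS[min(index + 1, len(CHECKOUT_STEPS) - 1)]
```

In this sketch, if the node that handled the "shipping" step is removed, any other node can resume the same transaction at "payment" with the previously entered inputs still available in its local replica.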
In some examples, synchronization module 106 may perform step 308 as part of a third-party cloud computing service provided by a security vendor that ensures a level of security to protect users of the application. For example, an enterprise organization, such as AMAZON or U.S. BANK, may desire to host, and execute, its online web services using a cloud computing service provider. Along these lines, the enterprise organization may select a cloud computing service that is provided by a security vendor, such as SYMANTEC. The security vendor may ensure that its cloud computing service is configured to provide a level of security to protect users of online web services that execute on the cloud computing service. The security vendor may also optionally apply a security policy to a user (e.g., a user of the online web services). The security policy may be mobile in the sense that the cloud computing service provider will apply the security policy to the user, and the user's interactions with the cloud computing system, wherever the user may go (i.e., the security policy follows the user regardless of geographic and/or network location).
In contrast,
As further shown in workflow diagram 604, in some embodiments all nodes belonging to the cloud computing cluster are homogeneous such that the cloud computing cluster functions as a full mesh network without a static master node. For example, microservice high availability cluster 610 does not include a load balancer, a master node, and/or any other leadership node that is configured in a manner to be privileged above, or to manage, a remaining set of nodes.
Instead, all of the nodes belonging to the cloud computing cluster are homogeneous and may, for example, all execute the same algorithm. The algorithm may decide how a particular network node should respond to the failure of another network node. The algorithm may also determine which network node should take over an active transaction that was being processed by the network node that failed, and/or optionally further determine when this takeover process should occur. In other words, all of the nodes belonging to the cloud computing cluster may be equally prepared, and suited, to perform a load-balancing, management, leader election, and/or leadership function for the overall cloud computing cluster. Moreover, all of the nodes belonging to the cloud computing cluster may be able to perform these functions due to the high availability of microservice data store information (e.g., transaction and/or microservice state information) at each of the nodes, which is shared between all of the microservice instances in real time, as further discussed above. Accordingly, and in contrast to the situation described for workflow diagram 602, if one node within microservice high availability cluster 610 fails, then another node may readily, seamlessly, dynamically, and/or automatically take on the role previously performed by the failing node. Furthermore, in one embodiment, the full mesh network of the cloud computing cluster is self-healing. Additionally, any arbitrary node within the cloud computing cluster is readily capable of receiving microservice request 614, processing the request itself, and/or forwarding the request to another instance of the microservice within the cloud computing cluster (e.g., in accordance with the same algorithm that is executed locally at each cloud computing cluster node).
In general, any permutation of the following features of the disclosed subject matter is independent of method 300 and may be performed in addition to, or as an alternative to, one or more, or all, of the steps of method 300. First, the cloud computing cluster may detect a failure in at least one node. Second, the nodes of the cloud computing cluster may be homogeneous. For example, the nodes of the cloud computing cluster may locally execute the same algorithm. Third, the algorithm may optionally determine which node should take over the execution of a microservice that was executed previously by the node that experienced failure. Fourth, any given one of the homogeneous nodes of the cloud computing cluster may be able to execute the algorithm, and determine which node will perform the takeover (and when to perform the takeover), due to every one of the nodes possessing state information about the state of every microservice being executed within the cloud computing cluster. The algorithm may also optionally determine which node will take over a leadership position and/or the algorithm may conduct a leadership election. Fifth, the cloud computing cluster may propagate the state information among all the nodes of the cloud computing cluster in an automatic, real-time, and/or dynamic manner, thereby creating a full mesh. Moreover, the full mesh may be self-healing, as further discussed above.
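The features enumerated above may be sketched together as follows, under several assumptions that are not part of the disclosure: failure is detected by a heartbeat timeout, ownership of each microservice task is recorded in a replicated table, and the successor is chosen by rendezvous (highest-random-weight) hashing, which is substituted here simply because it lets every node reach the same answer locally. All function and parameter names are hypothetical.

```python
import hashlib
from typing import Dict, List


def failed_nodes(last_heartbeat: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Nodes whose most recent heartbeat is older than the timeout."""
    return sorted(n for n, seen in last_heartbeat.items() if now - seen > timeout)


def successor(task_id: str, live_nodes: List[str]) -> str:
    """Rendezvous hashing: the live node with the highest hash for this task wins."""
    def score(node: str) -> str:
        return hashlib.sha256(f"{node}:{task_id}".encode("utf-8")).hexdigest()
    return max(live_nodes, key=score)


def takeovers_for(me: str, owners: Dict[str, str], last_heartbeat: Dict[str, float],
                  now: float, timeout: float) -> List[str]:
    """Tasks that node `me` should take over, computed locally from shared state."""
    dead = set(failed_nodes(last_heartbeat, now, timeout))
    live = sorted(n for n in last_heartbeat if n not in dead)
    if not live:
        return []
    return sorted(
        task for task, owner in owners.items()
        if owner in dead and successor(task, live) == me
    )
```

Since each node evaluates the same function over the same replicated state, each orphaned task is claimed by exactly one surviving node in this sketch, without a leader assigning the work.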
Additionally, in any one or more of the embodiments discussed above, a node of the cloud computing cluster may be considered to be “active” or “passive”, with respect of one or more requests of a particular microservice. An active node of the cloud computing cluster may be actively processing microservice requests. In contrast, a passive node for those same microservice requests may be lying in a dormant state until switching to an active mode for those specific microservice requests. In some examples, only one node of the entire cluster may be active at a single time for a specific microservice request.
Furthermore, in one embodiment, all nodes belonging to the cloud computing cluster dynamically synchronize the first data store for the microservice and the second data store for the distinct second microservice such that each node belonging to the cloud computing cluster (e.g., each node that includes an instance of the first microservice and an instance of the distinct second microservice) maintains access to the first data store for the microservice and the second data store for the distinct second microservice. In the example of
The above discussion provides an overview of the disclosed subject matter with reference to method 300 of
In general, the disclosed subject matter may improve upon related systems that provide cloud computing services. These related solutions may include high availability solutions, such as PACEMAKER. Solutions such as PACEMAKER use an underlying messaging layer (COROSYNC or HEARTBEAT) to synchronize member nodes within a high availability cluster. Unfortunately, the messaging capabilities of these solutions are limited to cluster management and are not available to upper layer applications. Therefore, in order to synchronize service data among microservice instances, an additional implementation of intra-microservice messaging may be used.
As another example of a related system, KAFKA is a popular tool to implement messaging among microservice instances. Although KAFKA also provides a high availability solution, this solution is realized by replicating KAFKA brokers and the solution does not involve a full mesh among all microservice instances, as further discussed above.
In contrast, the disclosed subject matter may include a fully-distributed fault-tolerant real-time cloud service information exchange framework. With this framework, cloud computing nodes may form an interconnected full mesh network to provide high availability for a cloud service with a synchronized service context.
The disclosed subject matter may improve upon related systems in one or more of the following ways. First, the disclosed subject matter may be distributed and/or fault-tolerant. Most conventional approaches to building a cloud computing cluster involve either a master node or a cluster head. In the event of the master node or cluster head becoming unavailable, cloud services may be interrupted until a backup master node or a new cluster head is restored. In contrast, the disclosed subject matter may operate in a fully distributed manner. All computing nodes may be homogeneous in terms of their role in the cloud computing cluster. Because all nodes construct a full mesh, removal of any node will not affect the operation of the cloud service.
Second, the disclosed subject matter may provide a context-aware cloud service information exchange. With this framework, cloud services synchronize their states and operations with all microservice instances in real time. In this way, when a microservice instance becomes unavailable, other instances can continue to serve. Although the microservice may have been switched from one computing node to another, from the cloud user's perspective, there is no service interruption. The exchange mechanism may be context-aware based on service type. Each service type may have different operational specifications or requested configurations, such as: deterministic delay specifications, encoding/compression specifications, and/or a priority categorization while exchanging service-related data. The disclosed subject matter, including synchronization module 106, may be aware of this contextual information and operate proactively to adapt to the operating environment to ensure that the specifications within the context information are satisfied.
Third, the disclosed subject matter may enable fully automated recovery of microservice information. In some embodiments, microservice information may be stored persistently in a decentralized manner. Furthermore, synchronization module 106 may redistribute the microservice information (e.g., database data) among the full mesh of peers when a cluster node recovers. Node recovery may happen either when the system reboots, or when a temporary network outage is detected and resolved. In some examples, this recovery of service information may be fully automated.
The disclosed subject matter is optionally designed for generic cloud services through a set of well-defined application programming interfaces, including one or more of the following. First, in terms of a service context definition, cloud services may define their own service context to be synchronized among peer nodes. Second, in terms of a service operational event definition, cloud services may optionally define what type of service-related data will be exchanged when an operation event occurs. The cloud services may define the event type as well as service-related data that will be broadcast to all other peers in the cloud computing cluster. Third, in terms of a service operational event delay requirement, when broadcasting service operational events to peers, the disclosed subject matter (e.g., synchronization module 106) may check if a deterministic delay specification can be satisfied for the corresponding service-related data type. The disclosed subject matter will optionally filter out stale service-related data either at a broadcaster side or at a receiver side. Fourth, in terms of an operational service multiplier, cloud services may optionally define what microservice should receive and process underlying data store data. A service context may also be updated in response to a new operational event.
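As one illustrative, non-limiting sketch of the operational event delay requirement described above, the following Python fragment filters out stale service-related data against a per-event-type delay budget. The names OperationalEvent, MAX_DELAY_S, is_stale, and filter_fresh, as well as the event types and budget values, are hypothetical and are not drawn from the disclosure; the sketch further assumes that the broadcaster and the receiver share a sufficiently synchronized clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationalEvent:
    """Hypothetical service operational event broadcast to peer nodes."""
    event_type: str
    payload: dict
    created_at: float  # seconds, taken from an assumed shared clock


# Hypothetical deterministic delay budgets, in seconds, per event type.
MAX_DELAY_S = {"session_update": 0.5, "billing_update": 5.0}


def is_stale(event: OperationalEvent, now: float) -> bool:
    """Return True when the event is older than the budget for its type."""
    budget = MAX_DELAY_S.get(event.event_type)
    if budget is None:
        # No delay specification is defined for this type, so keep the event.
        return False
    return (now - event.created_at) > budget


def filter_fresh(events, now: float):
    """Drop stale events; usable at the broadcaster side or the receiver side."""
    return [event for event in events if not is_stale(event, now)]
```

The same filter_fresh function could be called before sending (broadcaster side) or after receipt (receiver side), which mirrors the two filtering locations mentioned above.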
In one example, detection module 108 may detect an indication to switch an assignment of transaction task 702 away from a first network node (e.g., computing device 202(2)) in cluster 704. Furthermore, execution module 110 may execute, in response to detecting the indication, as part of each network node in a set 706 of multiple network nodes within cluster 704, a switchover algorithm to select a second network node, such as computing device 202(1), within the set of multiple network nodes, to receive the assignment of transaction task 702. For example, each one of computing device 202(1), computing device 202(3), computing device 202(4), and computing device 202(5) may be included within set 706. Accordingly, each one of these computing devices may contain an instance of execution module 110 that executes the switchover algorithm (e.g., computing device 202(2) may be unable to execute the switchover algorithm if it has failed). Furthermore, switching module 112 may switch over the assignment of transaction task 702 from the first network node (e.g., computing device 202(2)) to the second network node (e.g., computing device 202(1)) based on a result of executing the switchover algorithm by each network node in set 706 of multiple network nodes. Additionally, performance module 114 may perform, as part of computing device 202(1), at least part of a remainder of transaction task 702 in response to switching over the assignment of transaction task 702 from computing device 202(2) to computing device 202(1).
As illustrated in
Detection module 108 may detect the indication to switch the assignment of the transaction task in a variety of ways. In some examples, detection module 108 may detect an indication that the first network node has failed. For example, computing device 202(2) in cluster 704 may fail and detection module 108 within computing device 202(1) may detect this fact. Additionally, or alternatively, detection module 108 may detect an indication that a client device that is requesting the transaction task has switched from one geolocation to another geolocation. For example, a client device may request the transaction task while the client device is located within one geolocation, such as China. Subsequently, a user operating the client device may travel to another geolocation, such as Japan. Detection module 108 may detect that the geolocation of this client device has switched. Accordingly, switching module 112 may decide to switch over the assignment of the transaction task from one network node that is more proximate to China to another network node that is more proximate to Japan, thereby improving the efficiency of communication between the client device and the computing cluster, as discussed further below in connection with step 806.
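The two kinds of indication described above may be sketched as follows. This is a minimal illustration only: the heartbeat timeout used to infer node failure is an assumption (the disclosure does not specify how failure is detected), and the names Indication and detect_indication and their parameters are hypothetical.

```python
from enum import Enum
from typing import Optional


class Indication(Enum):
    NODE_FAILED = "node_failed"
    CLIENT_MOVED = "client_moved"


def detect_indication(
    last_heartbeat: float,
    now: float,
    heartbeat_timeout: float,
    serving_region: str,
    client_region: str,
) -> Optional[Indication]:
    """Return an indication to switch the task away, or None if there is none.

    Assumption: a node is treated as failed when no heartbeat has been
    observed for longer than heartbeat_timeout seconds.
    """
    if now - last_heartbeat > heartbeat_timeout:
        return Indication.NODE_FAILED
    if client_region != serving_region:
        # The client device has moved to another geolocation.
        return Indication.CLIENT_MOVED
    return None
```

Failure is checked first in this sketch because a failed node cannot continue serving the client regardless of the client's geolocation.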
Returning to
Execution module 110 may execute the switchover algorithm in a variety of ways. In general, multiple different network nodes, such as computing device 202(1), computing device 202(3), computing device 202(4), and/or computing device 202(5) may each contain a respective instance of execution module 110. Accordingly, each of these instances of execution module 110 may execute the switchover algorithm. In some examples, the different instances of execution module 110 may execute the switchover algorithm in parallel. In general, executing the switchover algorithm at each one of multiple different network nodes eliminates the need for these network nodes to communicate with each other to transmit a decision on which network node will receive the transaction task. In other words, because each instance of execution module 110 executes the same algorithm and predictably produces the same result, there is no further reason to communicate that result between different network nodes. Executing the switchover algorithm at each one of the multiple different network nodes also eliminates a single point of failure that would otherwise be represented by a load balancer or static master node. Accordingly, execution module 110 may execute the switchover algorithm such that the computing cluster omits a static master node in a manner that prevents a single point of failure within the computing cluster.
In some examples, context information relating to the transaction task is accessible to each network node within the set of multiple network nodes. The context information may define a current state of a user account's interaction with a corresponding web service, as further discussed below. The context information may preserve this current state, thereby preventing the user account from needing to repeat one or more input commands upon switching over the assignment of the transaction task from the first network node to the second network node. In some examples, the context information may be accessible to each network node because the network nodes continuously propagate the context information throughout the entire set of multiple network nodes, as the context information is updated, thereby keeping all of the multiple network nodes informed about the current state of the transaction task. Additionally, or alternatively, the context information may include information about a status of each network node within the set of multiple different network nodes, such as information indicating a current availability and/or capacity at each of these network nodes. The context information may also indicate a delay associated with communicating with one or more of the network nodes. In these examples, each network node within the set of multiple network nodes may execute the same switchover algorithm based on the same context information, thereby predictably generating the same output (e.g., because the same input applied to the same algorithm predictably results in the same output). Ensuring that each network node generates the same output eliminates a potential need for these network nodes to otherwise communicate with each other about the result of executing the switchover algorithm, as further discussed above.
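The property that identical context information applied to the same algorithm yields the same output at every node may be illustrated with the following sketch. The selection rule (most free capacity, then lowest delay, then node identifier as a final tie-breaker), the dictionary layout of SHARED_CONTEXT, and all numeric values are assumptions chosen for illustration, not the algorithm of the disclosure.

```python
def select_successor(node_status, failed_node):
    """Deterministically pick the node that receives the transaction task.

    node_status maps a node identifier to a dict with the hypothetical keys
    "available", "free_capacity", and "delay_ms". Because the ordering key
    ends with the node identifier, ties are always broken the same way.
    """
    candidates = [
        (-status["free_capacity"], status["delay_ms"], node_id)
        for node_id, status in node_status.items()
        if node_id != failed_node
        and status["available"]
        and status["free_capacity"] > 0
    ]
    return min(candidates)[2] if candidates else None


# Hypothetical context information, assumed to be replicated to every node.
SHARED_CONTEXT = {
    "202(1)": {"available": True, "free_capacity": 20, "delay_ms": 5},
    "202(2)": {"available": False, "free_capacity": 0, "delay_ms": 0},
    "202(3)": {"available": True, "free_capacity": 20, "delay_ms": 9},
    "202(4)": {"available": True, "free_capacity": 10, "delay_ms": 3},
    "202(5)": {"available": True, "free_capacity": 0, "delay_ms": 1},
}

# Each surviving node runs the same function on the same input.
results = {
    node_id: select_successor(SHARED_CONTEXT, failed_node="202(2)")
    for node_id in ("202(1)", "202(3)", "202(4)", "202(5)")
}
```

In this sketch every entry of results names the same successor, so no node needs to transmit its decision to any other node. The guarantee holds only to the extent that each node actually holds the same context information at the time it executes the function.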
In some examples, set 706 of multiple network nodes may include each network node within the cluster 704 (i.e., include each of computing device 202(1), computing device 202(3), computing device 202(4), and computing device 202(5), as well as computing device 202(2) in this example). Furthermore, the computing cluster may optionally operate as a homogeneous full mesh. Moreover, the homogeneous full mesh may be self-healing such that the full mesh can heal itself by executing the switchover algorithm, as discussed further below. Alternatively, set 706 may optionally omit one or more network nodes within cluster 704.
In general, the computing cluster may operate in a manner such that each network node within the computing cluster executes in an active mode rather than each network node that is executing in an active mode being paired with a respective network node that is executing in a passive mode. Returning to
Moreover, in these examples, each network node within the computing cluster optionally operates as a candidate backup network node for every other network node within the computing cluster rather than operating as a candidate backup network node for just a single network node. Accordingly, in contrast to the technique of workflow diagram 602, each one of computing device 202(1), computing device 202(3), computing device 202(4), and/or computing device 202(5) may operate as a candidate backup network node for the others within set 706.
Returning to
Switching module 112 may switch over the assignment of the transaction task in a variety of ways. In some examples, the switchover algorithm executed by each network node in the set of multiple network nodes selects the second network node based at least in part on a proximity of the second network node to a client device that is requesting the transaction task. For example, the switchover algorithm may select a second network node that is located in Japan if a client device that is requesting the transaction task is also located in Japan, as further discussed above.
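Proximity-based selection of this kind may be sketched as follows. The sketch assumes that proximity is measured as great-circle distance computed with the haversine formula, which is one possible measure and is not specified by the disclosure; the function names, node identifiers, and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_node(client_location, node_locations):
    """Return the identifier of the node closest to the client device.

    Iterating over sorted identifiers makes tie-breaking deterministic, so
    every node that evaluates this function reaches the same answer.
    """
    return min(
        sorted(node_locations),
        key=lambda node_id: haversine_km(client_location, node_locations[node_id]),
    )


# Hypothetical node placements (approximate coordinates of Beijing and Tokyo).
NODE_LOCATIONS = {
    "node-beijing": (39.90, 116.40),
    "node-tokyo": (35.68, 139.69),
}
```

With these placements, a client device located near Osaka would be mapped to the node in Tokyo, while a client device located near Shanghai would be mapped to the node in Beijing.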
In some examples, switching module 112 may switch the assignment of the transaction task from the first network node to the second network node prior to the transaction task being completed. For example, the transaction task may correspond to an ecommerce transaction task according to which a user account performs a financial transaction using a web service. Switching module 112 may switch the assignment of the transaction task from the first network node to the second network node prior to the user completing the financial transaction using the web service. For example, the user account may be in the middle of navigating through the process of purchasing an airline ticket online or moving funds between different bank accounts. Prior to completing such a financial transaction, switching module 112 may switch the assignment of the transaction task from the first network node to the second network node. In these examples, switching module 112 may store state information, or context data, that preserves a current state of the user account's interaction with the web service, thereby enabling the user account to restore the same state after the transaction task has been switched from the first network node to the second network node, without requiring the user account to repeat any previously entered input.
In some examples, a number of transaction tasks that the switchover algorithm assigns from the first network node to the second network node is based at least in part on a current capacity that is available at the second network node. For example, computing device 202(1) may already be processing a number of transaction tasks prior to switching module 112 switching over transaction task 702 to computing device 202(1). Computing device 202(1) may have a defined or established capacity of processing 100 different transaction tasks (e.g., as one arbitrary example for illustrative purposes). Accordingly, in this specific example, if computing device 202(1) was already processing 80 transaction tasks then computing device 202(1) would be able to take on an additional 20 transaction tasks. Consequently, if computing device 202(2) was processing 30 separate transaction tasks prior to failing, then computing device 202(1) could only take on 20 of the transaction tasks, and a remainder of the transaction tasks would need to be switched over to one or more of the remaining network nodes within the computing cluster.
In more general terms, the switchover algorithm may determine that switching over the assignment of the transaction task from the first network node to the second network node exhausts a computing capacity that is defined for the second network node. Accordingly, the switchover algorithm may further switch over an assignment of a second transaction task from the first network node to a third network node to prevent the second network node from being overloaded. Furthermore, the switchover algorithm may also repeat a process of assigning a portion of the remaining transaction tasks from the first network node to a new respective network node until all of the remaining transaction tasks have been assigned. In the specific illustrative example discussed above, 10 transaction tasks remain after 20 of them have been assigned to computing device 202(1). Accordingly, five more transaction tasks could be assigned to computing device 202(3) and five more transaction tasks could be assigned to computing device 202(4). In general, the switchover algorithm may assign one or more transaction tasks to one or more network nodes based on different factors, including a computing capacity available at the receiving network node, as well as a proximity of the receiving network node to the client device requesting the corresponding transaction task.
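The capacity-limited distribution discussed above may be sketched as a simple greedy loop. The function name distribute_tasks is hypothetical, the greedy fill order (sorted node identifier) is an assumption, and the free capacities of five tasks each for computing devices 202(3) and 202(4) are assumed solely so that the sketch reproduces the illustrative 20/5/5 split; the disclosure does not state those capacities.

```python
def distribute_tasks(task_ids, free_capacity):
    """Assign the failed node's tasks without exceeding any node's free capacity.

    Returns (assignment, unassigned), where assignment maps a node identifier
    to the list of tasks it receives. Nodes are visited in sorted order so
    that every node computing the distribution obtains the same result.
    """
    assignment = {}
    remaining = list(task_ids)
    for node_id in sorted(free_capacity):
        if not remaining:
            break
        take = min(free_capacity[node_id], len(remaining))
        if take > 0:
            assignment[node_id] = remaining[:take]
            remaining = remaining[take:]
    return assignment, remaining


# 30 tasks were being processed by computing device 202(2) before it failed.
tasks = [f"task-{i}" for i in range(30)]
# Free capacity = defined capacity minus current load, e.g. 100 - 80 = 20.
free = {"202(1)": 20, "202(3)": 5, "202(4)": 5, "202(5)": 0}
assignment, unassigned = distribute_tasks(tasks, free)
```

Here computing device 202(1) receives 20 tasks, and computing devices 202(3) and 202(4) receive five each. A fuller sketch could also weigh proximity to the requesting client device when ordering the candidate nodes.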
In additional examples, switching module 112 may switch over the assignment of the transaction task from the first network node to the second network node at least in part by transmitting a security policy that is specific to a user account requesting the transaction task to the second network node. For example, switching module 112 may transmit to computing device 202(1) a security policy that is specific to a user account attempting to perform a financial transaction through a web service, as part of the process of switching over the assignment of the transaction task from computing device 202(2) to computing device 202(1). Furthermore, in these examples, the security policy may optionally be based at least in part on a geolocation of the second network node. For example, a security policy applied to a network node processing the transaction task may be different when the network node is located in one geolocation, such as China, rather than a different geolocation, such as Japan.
Returning to
Performance module 114 may perform at least part of the remainder of transaction task 702 in a variety of ways. In general, performance module 114 may simply complete transaction task 702. In other examples, performance module 114 may only perform a part of the remainder of transaction task 702. In these examples, the assignment of the transaction task may be switched over to a third network node to complete the remainder of the transaction task. In general, performance module 114 may complete the remainder of the transaction task by executing a computing task requested by the client device, as discussed further above.
Performance module 114 may also perform at least part of the remainder of transaction task 702 without receiving any notification or communication informing computing device 202(1) of a decision to switch over the transaction task to computing device 202(1). In other words, switching module 112 may have, as part of computing device 202(1), already decided that computing device 202(1) will receive the assignment of the transaction task, and because computing device 202(1) assigned the transaction task to itself there is no need for another network node to inform computing device 202(1) to take over the transaction task. Similarly, the other network nodes within the set of multiple network nodes that also decided to switch over the transaction task from computing device 202(2) to computing device 202(1) need not transmit a notification message to computing device 202(1), and may simply take no further action regarding the switchover of the transaction task while trusting that computing device 202(1) obtained the same result from executing the switchover algorithm and automatically took over the transaction task itself.
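The notification-free takeover described above may be sketched as follows. To keep the example self-contained, the sketch substitutes rendezvous (highest-random-weight) hashing for the switchover algorithm; this is a stand-in technique chosen for illustration and is not stated to be the algorithm of the disclosure. The function names and the task identifier are hypothetical.

```python
import hashlib


def rendezvous_winner(node_ids, task_id):
    """Pick one node per task using rendezvous (highest-random-weight) hashing.

    The result depends only on the set of node identifiers and the task
    identifier, not on the order in which the nodes are listed.
    """
    def weight(node_id):
        digest = hashlib.sha256(f"{node_id}|{task_id}".encode("utf-8")).hexdigest()
        return (digest, node_id)

    return max(node_ids, key=weight)


def local_action(local_node_id, surviving_nodes, task_id):
    """Decide locally, without any message exchange, what this node should do."""
    if rendezvous_winner(surviving_nodes, task_id) == local_node_id:
        return "take_over"  # this node assigns the task to itself
    return "no_action"  # trust that the selected node reached the same result


survivors = ["202(1)", "202(3)", "202(4)", "202(5)"]
actions = {node: local_action(node, survivors, "task-702") for node in survivors}
```

Exactly one entry of actions is "take_over", and every other node takes no action, provided that all nodes agree on the set of surviving nodes.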
Computing system 910 broadly represents any single or multi-processor computing device or system capable of executing computer-readable instructions. Examples of computing system 910 include, without limitation, workstations, laptops, client-side terminals, servers, distributed computing systems, handheld devices, or any other computing system or device. In its most basic configuration, computing system 910 may include at least one processor 914 and a system memory 916.
Processor 914 generally represents any type or form of physical processing unit (e.g., a hardware-implemented central processing unit) capable of processing data or interpreting and executing instructions. In certain embodiments, processor 914 may receive instructions from a software application or module. These instructions may cause processor 914 to perform the functions of one or more of the example embodiments described and/or illustrated herein.
System memory 916 generally represents any type or form of volatile or non-volatile storage device or medium capable of storing data and/or other computer-readable instructions. Examples of system memory 916 include, without limitation, Random Access Memory (RAM), Read Only Memory (ROM), flash memory, or any other suitable memory device. Although not required, in certain embodiments computing system 910 may include both a volatile memory unit (such as, for example, system memory 916) and a non-volatile storage device (such as, for example, primary storage device 932, as described in detail below). In one example, one or more of modules 102 from
In some examples, system memory 916 may store and/or load an operating system 940 for execution by processor 914. In one example, operating system 940 may include and/or represent software that manages computer hardware and software resources and/or provides common services to computer programs and/or applications on computing system 910. Examples of operating system 940 include, without limitation, LINUX, JUNOS, MICROSOFT WINDOWS, WINDOWS MOBILE, MAC OS, APPLE'S IOS, UNIX, GOOGLE CHROME OS, GOOGLE'S ANDROID, SOLARIS, variations of one or more of the same, and/or any other suitable operating system.
In certain embodiments, example computing system 910 may also include one or more components or elements in addition to processor 914 and system memory 916. For example, as illustrated in
Memory controller 918 generally represents any type or form of device capable of handling memory or data or controlling communication between one or more components of computing system 910. For example, in certain embodiments memory controller 918 may control communication between processor 914, system memory 916, and I/O controller 920 via communication infrastructure 912.
I/O controller 920 generally represents any type or form of module capable of coordinating and/or controlling the input and output functions of a computing device. For example, in certain embodiments I/O controller 920 may control or facilitate transfer of data between one or more elements of computing system 910, such as processor 914, system memory 916, communication interface 922, display adapter 926, input interface 930, and storage interface 934.
As illustrated in
As illustrated in
Additionally or alternatively, example computing system 910 may include additional I/O devices. For example, example computing system 910 may include I/O device 936. In this example, I/O device 936 may include and/or represent a user interface that facilitates human interaction with computing system 910. Examples of I/O device 936 include, without limitation, a computer mouse, a keyboard, a monitor, a printer, a modem, a camera, a scanner, a microphone, a touchscreen device, variations or combinations of one or more of the same, and/or any other I/O device.
Communication interface 922 broadly represents any type or form of communication device or adapter capable of facilitating communication between example computing system 910 and one or more additional devices. For example, in certain embodiments communication interface 922 may facilitate communication between computing system 910 and a private or public network including additional computing systems. Examples of communication interface 922 include, without limitation, a wired network interface (such as a network interface card), a wireless network interface (such as a wireless network interface card), a modem, and any other suitable interface. In at least one embodiment, communication interface 922 may provide a direct connection to a remote server via a direct link to a network, such as the Internet. Communication interface 922 may also indirectly provide such a connection through, for example, a local area network (such as an Ethernet network), a personal area network, a telephone or cable network, a cellular telephone connection, a satellite data connection, or any other suitable connection.
In certain embodiments, communication interface 922 may also represent a host adapter configured to facilitate communication between computing system 910 and one or more additional network or storage devices via an external bus or communications channel. Examples of host adapters include, without limitation, Small Computer System Interface (SCSI) host adapters, Universal Serial Bus (USB) host adapters, Institute of Electrical and Electronics Engineers (IEEE) 1394 host adapters, Advanced Technology Attachment (ATA), Parallel ATA (PATA), Serial ATA (SATA), and External SATA (eSATA) host adapters, Fibre Channel interface adapters, Ethernet adapters, or the like. Communication interface 922 may also allow computing system 910 to engage in distributed or remote computing. For example, communication interface 922 may receive instructions from a remote device or send instructions to a remote device for execution.
In some examples, system memory 916 may store and/or load a network communication program 938 for execution by processor 914. In one example, network communication program 938 may include and/or represent software that enables computing system 910 to establish a network connection 942 with another computing system (not illustrated in
Although not illustrated in this way in
As illustrated in
In certain embodiments, storage devices 932 and 933 may be configured to read from and/or write to a removable storage unit configured to store computer software, data, or other computer-readable information. Examples of suitable removable storage units include, without limitation, a floppy disk, a magnetic tape, an optical disk, a flash memory device, or the like. Storage devices 932 and 933 may also include other similar structures or devices for allowing computer software, data, or other computer-readable instructions to be loaded into computing system 910. For example, storage devices 932 and 933 may be configured to read and write software, data, or other computer-readable information. Storage devices 932 and 933 may also be a part of computing system 910 or may be a separate device accessed through other interface systems.
Many other devices or subsystems may be connected to computing system 910. Conversely, all of the components and devices illustrated in
The computer-readable medium containing the computer program may be loaded into computing system 910. All or a portion of the computer program stored on the computer-readable medium may then be stored in system memory 916 and/or various portions of storage devices 932 and 933. When executed by processor 914, a computer program loaded into computing system 910 may cause processor 914 to perform and/or be a means for performing the functions of one or more of the example embodiments described and/or illustrated herein. Additionally or alternatively, one or more of the example embodiments described and/or illustrated herein may be implemented in firmware and/or hardware. For example, computing system 910 may be configured as an Application Specific Integrated Circuit (ASIC) adapted to implement one or more of the example embodiments disclosed herein.
Client systems 1010, 1020, and 1030 generally represent any type or form of computing device or system, such as example computing system 910 in
As illustrated in
Servers 1040 and 1045 may also be connected to a Storage Area Network (SAN) fabric 1080. SAN fabric 1080 generally represents any type or form of computer network or architecture capable of facilitating communication between a plurality of storage devices. SAN fabric 1080 may facilitate communication between servers 1040 and 1045 and a plurality of storage devices 1090(1)-(N) and/or an intelligent storage array 1095. SAN fabric 1080 may also facilitate, via network 1050 and servers 1040 and 1045, communication between client systems 1010, 1020, and 1030 and storage devices 1090(1)-(N) and/or intelligent storage array 1095 in such a manner that devices 1090(1)-(N) and array 1095 appear as locally attached devices to client systems 1010, 1020, and 1030. As with storage devices 1060(1)-(N) and storage devices 1070(1)-(N), storage devices 1090(1)-(N) and intelligent storage array 1095 generally represent any type or form of storage device or medium capable of storing data and/or other computer-readable instructions.
In certain embodiments, and with reference to example computing system 910 of
In at least one embodiment, all or a portion of one or more of the example embodiments disclosed herein may be encoded as a computer program and loaded onto and executed by server 1040, server 1045, storage devices 1060(1)-(N), storage devices 1070(1)-(N), storage devices 1090(1)-(N), intelligent storage array 1095, or any combination thereof. All or a portion of one or more of the example embodiments disclosed herein may also be encoded as a computer program, stored in server 1040, run by server 1045, and distributed to client systems 1010, 1020, and 1030 over network 1050.
As detailed above, computing system 910 and/or one or more components of network architecture 1000 may perform and/or be a means for performing, either alone or in combination with other elements, one or more steps of an example method for performing computing cluster node switchover.
While the foregoing disclosure sets forth various embodiments using specific block diagrams, flowcharts, and examples, each block diagram component, flowchart step, operation, and/or component described and/or illustrated herein may be implemented, individually and/or collectively, using a wide range of hardware, software, or firmware (or any combination thereof) configurations. In addition, any disclosure of components contained within other components should be considered example in nature since many other architectures can be implemented to achieve the same functionality.
In some examples, all or a portion of example system 100 in
In various embodiments, all or a portion of example system 100 in
According to various embodiments, all or a portion of example system 100 in
In some examples, all or a portion of example system 100 in
In addition, all or a portion of example system 100 in
In some embodiments, all or a portion of example system 100 in
According to some examples, all or a portion of example system 100 in
The process parameters and sequence of steps described and/or illustrated herein are given by way of example only and can be varied as desired. For example, while the steps illustrated and/or described herein may be shown or discussed in a particular order, these steps do not necessarily need to be performed in the order illustrated or discussed. The various example methods described and/or illustrated herein may also omit one or more of the steps described or illustrated herein or include additional steps in addition to those disclosed.
While various embodiments have been described and/or illustrated herein in the context of fully functional computing systems, one or more of these example embodiments may be distributed as a program product in a variety of forms, regardless of the particular type of computer-readable media used to actually carry out the distribution. The embodiments disclosed herein may also be implemented using software modules that perform certain tasks. These software modules may include script, batch, or other executable files that may be stored on a computer-readable storage medium or in a computing system. In some embodiments, these software modules may configure a computing system to perform one or more of the example embodiments disclosed herein.
In addition, one or more of the modules described herein may transform data, physical devices, and/or representations of physical devices from one form to another. For example, one or more of the modules recited herein may transform the content of a physical memory device by synchronizing a data store for a microservice between two separate network nodes, as further discussed above. Additionally or alternatively, one or more of the modules recited herein may transform a processor, volatile memory, non-volatile memory, and/or any other portion of a physical computing device from one form to another by executing on the computing device, storing data on the computing device, and/or otherwise interacting with the computing device.
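As a minimal sketch of the data store synchronization mentioned above, the following example reconciles two in-memory stores held by separate network nodes. It assumes, purely for illustration, that each entry carries a version counter and that the higher version wins; the `Entry` type and the `synchronize` function are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    value: str
    version: int  # assumed monotonically increasing per key


def synchronize(store_a, store_b):
    """Bring both stores to the same content, keeping the higher version per key.

    On a version tie, the entry from store_a is kept (an arbitrary choice
    made for this sketch).
    """
    for key in set(store_a) | set(store_b):
        a = store_a.get(key)
        b = store_b.get(key)
        if a is None or (b is not None and b.version > a.version):
            newest = b
        else:
            newest = a
        store_a[key] = newest
        store_b[key] = newest


if __name__ == "__main__":
    node_one = {"order-7": Entry("pending", 1), "order-8": Entry("paid", 3)}
    node_two = {"order-7": Entry("shipped", 2)}
    synchronize(node_one, node_two)
    print(node_one == node_two)
```

After the call, both dictionaries hold the same entries, which is the sense in which the content of a physical memory device is transformed in this example.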
The preceding description has been provided to enable others skilled in the art to best utilize various aspects of the example embodiments disclosed herein. This example description is not intended to be exhaustive or to be limited to any precise form disclosed. Many modifications and variations are possible without departing from the spirit and scope of the instant disclosure. The embodiments disclosed herein should be considered in all respects illustrative and not restrictive. Reference should be made to the appended claims and their equivalents in determining the scope of the instant disclosure.
Unless otherwise noted, the terms “connected to” and “coupled to” (and their derivatives), as used in the specification and claims, are to be construed as permitting both direct and indirect (i.e., via other elements or components) connection. In addition, the terms “a” or “an,” as used in the specification and claims, are to be construed as meaning “at least one of.” Finally, for ease of use, the terms “including” and “having” (and their derivatives), as used in the specification and claims, are interchangeable with and have the same meaning as the word “comprising.”
This application is a continuation-in-part that claims the benefit of U.S. patent application Ser. No. 15/885,762, “Systems and Methods for Synchronizing Microservice Data Stores,” filed Jan. 31, 2018, which is incorporated by reference herein in its entirety.
Relation | Number | Date | Country
---|---|---|---
Parent | 15885762 | Jan 2018 | US
Child | 15928770 | | US