In various industries, such as providing on-line services and sales, it may be important to provide a computing platform that is both robust and fault tolerant such that no single failure causes a shutdown of the entire computing platform. To address this issue, a “cluster” of interrelated computing devices may be used in a way such that various tasks handled by a first computing device may be assigned to a second computing device should the first computing device fail for some reason.
Various examples of this disclosure will be described in detail with reference to the following figures, wherein like numerals reference like elements.
The methods and systems disclosed below may be described generally, as well as described in terms of specific examples. For instances where references are made to detailed examples, it is noted that any of the underlying principles described are not to be limited to a single example but may be expanded for use with any of the other methods and systems described herein as will be understood by one of ordinary skill in the art unless otherwise specifically stated.
For the purposes of this disclosure, the following definitions apply.
The term “process” refers to a set of instructions usable on one or more machines, such as a computer, that performs a useful activity.
The term “activity” refers to any task/endeavor that may be found useful and/or desirable and that may be performed, in whole or in part, by a process. An activity may thus include, for example, email delivery, medical diagnoses, fraud detection, gaming, providing an on-line sales platform, and so on.
The term “workload” refers to a single process or any number of related processes designed to perform one or more activities, or a portion of a single activity. By way of example, a first workload may consist of a number of back-end processes (e.g., inventory control) used to support an on-line sales platform. Similarly, a second workload may consist of a number of processes used to support a Graphical User Interface (GUI) that displays various forms of data while receiving user instructions.
Workloads may be considered “related” (or have a “relationship”) if they have some form of functional relationship(s) with one another. By way of example, the second workload discussed above may be related (i.e., have a “relationship”) to the first workload should the second workload provide an online GUI for an activity provided by the first workload.
The term “manager” refers to some form of software-based process capable of being run by a computing device that performs a number of specific or otherwise-identified management function(s), such as the organization and/or coordination of resources to perform a number of activities. Alternatively, a “manager” may refer to a computing device that incorporates such a software-based process.
The term “computing node” (or “node”) as used herein refers to some form of computing system capable of supporting a number of workloads. In various examples the term “computing node” includes, but is not limited to, computers, computer-based servers, a network of computer-based servers, devices augmented with specialized math and/or graphics processors, and so on.
The term “migration” refers to a number of processes (e.g., a workload) being moved from one physical device, e.g., a computing node, to another device, typically (but not necessarily) with little or no disruption in service.
The term “cluster of computing nodes” (or “computing cluster”) refers to a number of communicatively-coupled computing nodes that are capable of performing separate workloads. By way of example, two separate workloads may be distributed among any one or two computing nodes in a computing cluster having three or more separate computing nodes.
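By way of a non-limiting illustration, the definitions above may be sketched as a simple data model. The following Python fragment is merely one possible editorial rendering; the class and field names (Workload, ComputeNode, Cluster, related_to, demand, capacity) are hypothetical and do not appear in this disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class Workload:
    """A set of related processes performing one or more activities."""
    name: str
    processes: list[str]                                  # e.g., ["inventory-control"]
    demand: float = 1.0                                   # relative computing usage
    related_to: list[str] = field(default_factory=list)  # names of related workloads

@dataclass
class ComputeNode:
    """A computing system capable of supporting a number of workloads."""
    name: str
    capacity: float                                       # relative computing capability
    healthy: bool = True

@dataclass
class Cluster:
    """Communicatively-coupled nodes capable of performing separate workloads."""
    nodes: list[ComputeNode]

# Example: a back-end workload and a related GUI workload in a three-node cluster.
backend = Workload("backend", ["inventory-control"], demand=2.0)
gui = Workload("gui", ["sales-gui"], demand=1.0, related_to=["backend"])
cluster = Cluster([ComputeNode("n1", 4.0), ComputeNode("n2", 2.0), ComputeNode("n3", 1.0)])
```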
The term “failover” refers to the act of migrating (i.e., assigning and placing) a workload from a first computing node to a redundant or backup computing node in response to some form of failure of the first computing node. Unless otherwise stated, a “failover” is to be considered an automatic response to a computing node failure. This is in contrast to a “switchover,” which requires some form of human interaction to migrate a workload from one computing node to a second computing node. This is also in contrast to “automated with manual approval,” which refers to a failover reconfiguration that runs automatically once a user acknowledges the underlying failover and approves of the resulting reconfiguration.
It is to be appreciated that failover may be automated by the use of some form of “watchdog” and/or “heartbeat” system that connects multiple systems. In various examples, as long as a particular computing node provides a regular “pulse,” “heartbeat,” or other appropriate signal to some form of systems management device, the hardware of the computing node may be considered healthy. Further, for the purposes of this disclosure, monitoring (“watchdog”) processes may be used to monitor the various processes that constitute a particular workload. By way of example, it may be the responsibility of a short interrupt-based routine to periodically monitor three separate processes constituting a workload to determine whether or not each of such workload processes has malfunctioned.
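By way of a non-limiting illustration, such a heartbeat check might be realized as follows, assuming each node reports a timestamped pulse and that a node whose last pulse is older than a chosen timeout is treated as failed; the names HEARTBEAT_TIMEOUT, record_pulse, and is_healthy are hypothetical.

```python
import time

HEARTBEAT_TIMEOUT = 5.0   # seconds; a hypothetical tuning parameter

# Maps a node name to the arrival time of its most recent heartbeat signal.
last_pulse: dict[str, float] = {}

def record_pulse(node_name: str) -> None:
    """Called whenever a node's heartbeat pulse arrives."""
    last_pulse[node_name] = time.monotonic()

def is_healthy(node_name: str) -> bool:
    """A node is considered healthy while its pulses keep arriving on time."""
    last = last_pulse.get(node_name)
    return last is not None and (time.monotonic() - last) < HEARTBEAT_TIMEOUT
```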
The term “failback” generally refers to a restoration of workload to a particular computing node once a previous failure of the computing node is resolved. However, for the purposes of this disclosure, the term “failback” may refer to any reorganization of workloads in a computing cluster when one or more computing nodes that previously failed become again available to process workloads.
This disclosure proposes workload assignment and placement approaches related to cluster-based solutions that take the nature of various relationships among workloads into consideration. An “assignment manager,” for example, may provide and/or apply a number of heuristic rules that allow for an improved and/or optimized placement of related workloads among a cluster of computing nodes. A “placement manager” is an entity associated with an assignment manager that is responsible for the placement of workloads according to assignments dictated by the assignment manager while maintaining the defined relationships among workloads. Thus, the disclosed methods and systems can efficiently handle node failures without compromising the availability of workloads and the relationships defined between them. The methods and systems also manage the balancing of workloads among computing nodes such that computing resources are appropriately, if not optimally, distributed among available computing nodes.
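By way of a non-limiting illustration, the division of labor between the two managers might be sketched as follows. The interfaces shown are editorial assumptions rather than the disclosed implementation, and the round-robin heuristic is merely a placeholder for the rules discussed below.

```python
class AssignmentManager:
    """Decides which node each workload should run on, per heuristic rules/scores."""

    def assign(self, workloads, nodes) -> dict:
        available = sorted((n for n in nodes if n.healthy),
                           key=lambda n: n.capacity, reverse=True)
        # Placeholder heuristic: hand out workloads round-robin over the most
        # capable available nodes.
        return {w.name: available[i % len(available)].name
                for i, w in enumerate(workloads)}

class PlacementManager:
    """Carries out an assignment while preserving defined workload relationships."""

    def place(self, plan: dict) -> None:
        for workload_name, node_name in plan.items():
            print(f"placing {workload_name} on {node_name}")  # stand-in for migration
```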
An appropriate example balancing rule may be a rule assigning the most computationally-intensive workload to the most computationally-capable computing node available at any given time, which has the effect of assigning workloads based on relative computing usage. As another example balancing rule, it may be useful to structure a rule that assigns sufficient computing resources to a first workload to support the instantaneous computing requisites of a related second workload while, at the same time, assigning sufficient computing resources to the second workload to support the instantaneous computing requisites of the first workload. However, it is to be appreciated that this “balancing” of computing resources may be based upon a number of criteria beyond instantaneous processing issues. For instance, “balancing” may include the balancing of average workload processing over a given time period, balancing performed to address subjective criteria (e.g., limiting on-line delay when interfacing with human users), or balancing according to any other useful or desirable criteria.
Still another example balancing rule may include a rule assigning different workloads to different nodes when possible so as to assure some distributed processing, as opposed to assigning multiple workloads to a single computing node.
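By way of a non-limiting illustration, the first balancing rule above (most computationally-intensive workload to most computationally-capable node) combined with the distributed-processing rule might be sketched as follows, reusing the hypothetical demand and capacity fields from the earlier sketch.

```python
def assign_by_intensity(workloads, nodes) -> dict:
    """Pair the most demanding workload with the most capable healthy node,
    spreading workloads over distinct nodes where possible.
    (Assumes no more workloads than healthy nodes.)"""
    by_demand = sorted(workloads, key=lambda w: w.demand, reverse=True)
    by_capacity = sorted((n for n in nodes if n.healthy),
                         key=lambda n: n.capacity, reverse=True)
    return {w.name: n.name for w, n in zip(by_demand, by_capacity)}
```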
While workload balancing may be accomplished using a set of heuristic rules, workload balancing may also be accomplished using some form of scoring system related to individual workload and overall system performance. Different workload balancing rules may affect how workload placement is evaluated, i.e., “scored.” By way of example, when applying an objective of assigning the most computationally-intensive workloads to the most computationally-capable available computing nodes, producing and using a “score” for each possible combination of workload assignments becomes highly manageable. For instance, applying this rule in a scenario where two workloads of different computational bandwidth are to be distributed in a cluster of four computing nodes having different processing capability, there may be no more than six separate scores to consider, assuming that the two workloads are not to be run on a single node.
The idea of balancing, however, need not be limited to applying any single rule or objective at one time. For instance, using a weighted parametric equation based on a number of desirable objectives and/or rules, it may be possible to “score” an individual assignment of a workload to a given computing node as well as “score” an entire set of workload assignments to any set of computing nodes. By way of example, if there are two related workloads that may be processed by any one or two computing nodes of a cluster of four computing nodes, then at most a total of sixteen (16) scores need be addressed. The number of scores to be considered naturally decreases upon the failure of one or more individual nodes. For instance, using the example immediately above, if two of the four computing nodes fail, then at most four (4) scores need to be considered when making workload assignments.
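The counting in the example above can be checked directly: with two workloads each assignable to any of four nodes, there are 4 × 4 = 16 candidate plans, and 2 × 2 = 4 once two nodes fail. By way of a non-limiting illustration, a weighted parametric score over the candidates might be sketched as follows; the two objectives and the weight values are hypothetical.

```python
from itertools import product

def score_plan(plan, workloads, nodes, weights=(0.7, 0.3)) -> float:
    """A hypothetical weighted parametric score for one complete assignment."""
    node_by_name = {n.name: n for n in nodes}
    # Objective 1: total capacity headroom left on the chosen nodes.
    headroom = sum(node_by_name[plan[w.name]].capacity - w.demand for w in workloads)
    # Objective 2: degree of distribution (number of distinct nodes used).
    spread = len(set(plan.values()))
    return weights[0] * headroom + weights[1] * spread

def best_plan(workloads, nodes, weights=(0.7, 0.3)) -> dict:
    """Enumerate and score every candidate plan (16 for two workloads, four nodes)."""
    healthy = [n for n in nodes if n.healthy]
    names = [w.name for w in workloads]
    candidates = [dict(zip(names, combo))
                  for combo in product([n.name for n in healthy], repeat=len(names))]
    return max(candidates, key=lambda p: score_plan(p, workloads, healthy, weights))
```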
Further, scoring may be based on past experience using different workload placements while objectively and/or subjectively evaluating (i.e., “scoring”) overall system performance for each workload placement combination.
Still further, it may be useful to consider completely different scoring criteria based on different events. For example, a first set of scoring criteria may be beneficial to consider upon initial workload placement, a second set of scoring criteria may be considered for failover events, and a third set of scoring criteria may be considered for failback events.
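By way of a non-limiting illustration, such event-specific criteria might be expressed as separate weight sets for the scoring sketch above; the particular values are illustrative only. A plan for a given event would then be scored as, for example, best_plan(workloads, nodes, weights_for("failover")).

```python
# Hypothetical weight sets for the score_plan() sketch above.
SCORING_WEIGHTS = {
    "initial_placement": (0.7, 0.3),   # favor leaving capacity headroom
    "failover":          (0.4, 0.6),   # favor spreading the surviving workloads
    "failback":          (0.5, 0.5),   # rebalance once failed nodes return
}

def weights_for(event: str) -> tuple:
    return SCORING_WEIGHTS[event]
```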
In view of the different approaches there may be to scoring workload assignments, it is to be appreciated that the form of a “score” may vary widely. In various examples a “score” may be represented by a numeric value. In other examples a “score” may be represented by a rule giving preference to one arrangement of workloads over one or more other arrangements of workloads. However, in still other examples the notion of a “score” may take any form so long as such a form can provide some indication of workload arrangement preference.
Turning now to the drawings, an example workload managing device 110 is communicatively coupled to a computing cluster 120 containing a number of computing nodes {122-1, . . . 122-N}.
While the example workload managing device 110 is depicted as a separate device in the drawings, it is to be appreciated that other arrangements, such as incorporating the workload managing device 110 into one or more of the computing nodes {122-1, . . . 122-N}, may also be used.
In operation, the workload managing device 110 may manage the individual placement of individual workloads among the computing nodes {122-1, . . . 122-N} based on any number of criteria, including criteria that take the relationships between different workloads into account and criteria that do not. In the case of failover and failback events, the workload managing device 110 may manage the assignment and placement (i.e., the migration) of any number of workloads between the computing nodes {122-1, . . . 122-N} in an effort to maintain some activity performed by the computing cluster 120 while attempting to balance workload processing among the available computing nodes.
In operation, as the process(es) constituting the workload 210 perform some form of activity, the monitoring (e.g., “watchdog”) process 220 may perform any number of monitoring services to assure that the workload 210 is functioning within some range of expectations. Similarly, the monitoring hardware 230 may perform any number of hardware-based checks to determine whether or not there has been some form of failure that affects the performance of the computing device. During the monitoring of the workload 210, the monitoring process 220 and the monitoring hardware 230 may send out any number of signals, such as regular “heartbeat” pulses, that may inform some form of managing device, such as the workload managing device 110 discussed above, that the workload 210 and its underlying hardware are functioning as expected.
Although the example workload managing device 300 is depicted as a single, integrated device, it is to be appreciated that its various components 310-390 may be arranged in any number of other ways.
Still further, in other examples, one or more of the various components 310-390 can take the form of separate servers coupled together via one or more networks. Additionally, it should be appreciated that each of components 310-390 advantageously can be realized using multiple computing devices employed in a cooperative fashion. For example, by employing two or more separate computing devices, e.g., servers, to provide separate processing and data-handling needs, processing bottlenecks can be reduced/eliminated, and the overall computing time may be significantly reduced.
It also should be appreciated that some processing, typically implemented in software/firmware routines residing in program memory 320, alternatively may be implemented using dedicated processing logic. Still further, some processing may be performed by software/firmware processes residing in separate memories in separate servers/computers being executed by different controllers.
In operation, the example workload managing device 300 can first perform a number of setup operations, including transferring an operating system and a number of appropriate programs from the program storage device 350 to the program memory 320. In the present example, such programs include an assignment manager 352 and a placement manager 354, which are discussed in greater detail below.
In addition, setup operations may include transferring computing node information 342 and workload information 344 from the database storage device 340 to the data memory 330. In various examples, “computing node information” refers to information that describes the computing capabilities of each individual computing node managed by the workload managing device 300. Similarly, “workload information” refers to information that describes aspects of individual workloads (e.g., peak and average computing bandwidth used) as well as any number of relationships between workloads. For instance, the workload information 344 may describe the relationship between the back-end and GUI workloads discussed above.
Subsequent operations of the example workload managing device 300 are discussed below with respect to the methods 400A-400D.
The method 400A starts in operation 410 where computing node information and workload information are acquired in some manner, e.g., received or derived. As discussed above, computing node information may include information relating to any performance aspect (e.g., instructions per second), structural aspect (e.g., inclusion of a graphics processor or input/output capacity), or any other characteristic of a computing device. As is also discussed above, workload information may include information about individual workloads (e.g., computing bandwidth usage or special hardware requirements) as well as information about the relationships between workloads (e.g., processing disruption caused by latency of a particular workload).
In operation 412, a number of individual scores and/or a set of one or more heuristic rules may be determined to address each possible combination/arrangement of different workloads among different computing nodes. As discussed above, the various scores may address at least one functional relationship between two workloads. Similarly, at least one heuristic rule of the set of heuristic rules may address at least one functional relationship between two workloads. As is also discussed above, such individual scores and/or heuristic rules may be based upon the computing node information and workload information of operation 410, as well as upon any number of user-provided qualifiers that may be useful or desirable (e.g., reduced on-line latency).
In operation 414, a device and/or a software process, such as the assignment manager 352 discussed above, may make an initial assignment and placement of the various workloads among the available computing nodes based upon the scores and/or heuristic rules of operation 412.
In operation 420, a determination is made as to whether a manual assignment is requested by a user (e.g., a systems or network administrator). In the present example, if a manual assignment has been requested, the method 400A jumps to “B” (i.e., the method 400B discussed below); otherwise, the method 400A continues to operation 422.
In operation 422, a determination is made as to whether a computing node currently being used by one or more workloads has failed, resulting in a failover event. If such a computing node failure has occurred, the method 400A jumps to “C” (i.e., the method 400C discussed below); otherwise, the method 400A continues to operation 424.
In operation 424, a determination is made as to whether a previously failed or otherwise unavailable computing node has become available, providing a possible failback event. If such a computing node becomes available, the method 400A jumps to “D” (i.e., the method 400D discussed below); otherwise, the method 400A continues to operation 426.
In operation 426, individual workloads continue to run on their previously-assigned computing nodes without reassignment, and the method 400A jumps back to operation 420, allowing operations 420-426 to be repeated as long as may be useful or desired.
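By way of a non-limiting illustration, operations 420-426 behave like an event-dispatch loop. The following sketch is schematic only; the state keys and function names are hypothetical, and a real system would query watchdog/heartbeat state.

```python
def manual_assignment_requested(state) -> bool:
    return state.get("manual_request", False)

def node_failure_detected(state) -> bool:
    return state.get("failed_node") is not None

def failed_node_recovered(state) -> bool:
    return state.get("recovered_node") is not None

def method_400a_step(state) -> str:
    """One pass through operations 420-426."""
    if manual_assignment_requested(state):
        return "400B"      # operation 420: jump to manual assignment
    if node_failure_detected(state):
        return "400C"      # operation 422: jump to failover handling
    if failed_node_recovered(state):
        return "400D"      # operation 424: jump to failback handling
    return "continue"      # operation 426: workloads run as previously assigned
```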
The method 400B starts in operation 430 where a workload is manually reassigned to a specific computing node of a computing cluster. Next, in operation 432, a determination is made as to whether there has been a node placement failure and for some reason the manually assigned workload cannot be placed in the desired computing node. If a node placement failure has occurred, then the method 400B jumps to operation 434; otherwise, the method 400B continues to operation 436.
In operation 434, a message is sent to inform a user that the requested assignment could not be fulfilled, and the method 400B continues to “A” (i.e., the method 400A discussed above).
In operation 436, according to the present example, any automatic assignment that changes/countermands the manual assignment is disabled except in a case where the manually-assigned computing node fails. However, as discussed above, it is possible for operation 436 to vary in other examples to the point where a manual assignment may be ignored in any subsequent failover or failback event. The method 400B then continues to “A” (i.e., the method 400A discussed above).
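By way of a non-limiting illustration, the flow of operations 430-436 might be sketched as follows; pinned_workloads and the place callable are editorial assumptions, not disclosed elements.

```python
pinned_workloads: set[str] = set()

def method_400b(workload_name: str, node_name: str, place) -> None:
    """Manual reassignment (operations 430-436); `place` is assumed to return
    True on a successful placement."""
    if not place(workload_name, node_name):                       # operation 432
        print(f"could not place {workload_name} on {node_name}")  # operation 434
        return
    # Operation 436: pin the workload so that automatic reassignment cannot
    # countermand the manual choice unless the chosen node itself fails.
    pinned_workloads.add(workload_name)
```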
The method 400C starts in operation 440 where one or more workloads are reassigned to specific computing node(s) of a computing cluster using a device, such as the assignment manager 352 discussed above, in response to a failover event.
In operation 442, an attempt is made by a device, such as the placement manager 354 discussed above, to place each reassigned workload onto its newly-assigned computing node. In operation 444, a determination is made as to whether a node placement failure has occurred. If a node placement failure has occurred, the method 400C continues to operation 446; otherwise, the method 400C returns to “A” (i.e., the method 400A discussed above).
In operation 446, another workload assignment is made similar to the workload assignment of operation 440 but taking node placement failure into account by marking any computing node subject to placement failure as unavailable. The method 400C then jumps back to operation 442 where operations 442-446 may be repeated until all workloads have been successfully reassigned and placed.
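By way of a non-limiting illustration, the retry loop of operations 440-446 might be sketched as follows; assign and place are assumed callables standing in for the assignment manager 352 and placement manager 354.

```python
def method_400c(workloads, nodes, assign, place) -> dict:
    """Failover reassignment (operations 440-446): reassign, attempt placement,
    and mark any node subject to a placement failure as unavailable."""
    unavailable: set[str] = set()
    while True:
        usable = [n for n in nodes if n.healthy and n.name not in unavailable]
        plan = assign(workloads, usable)                                   # op. 440
        failed = [node for w, node in plan.items() if not place(w, node)]  # op. 442
        if not failed:
            return plan               # every workload reassigned and placed
        unavailable.update(failed)                                         # op. 446
```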
The method 400D starts in operation 450 where one or more workloads are reassigned to specific computing node(s) of a computing cluster using a device, such as the assignment manager 352 discussed above, in response to a failback event.
In operation 452, an attempt is made by a device, such as the placement manager 354 discussed above, to place each reassigned workload onto its newly-assigned computing node. In operation 454, a determination is made as to whether a node placement failure has occurred. If a node placement failure has occurred, the method 400D continues to operation 456; otherwise, the method 400D returns to “A” (i.e., the method 400A discussed above).
In operation 456, another workload assignment is made similar to the workload assignment of operation 450 but taking node placement failure into account so as to mark any computing node subject to placement failure as unavailable. The method 400D then jumps back to operation 452 where operations 452-456 may be repeated until all workloads have been successfully reassigned and placed.
Given the similarity of operations 450-456 to operations 440-446 discussed above, a further detailed description of the method 400D is omitted for the sake of brevity.
In operation, the hardware processor 510 accesses the executable instructions stored on the machine-readable storage medium 520 so as to cause the hardware processor 510 to execute the executable instructions stored thereon. As the executable instructions are executed, the hardware processor 510 may perform any number of the operations described above with respect to the methods 400A-400D.
In various examples the above-described systems and/or methods may be implemented with any of a variety of circuitry. In those examples where any particular device or method is implemented using a programmable device, such as a computer-based system or programmable logic, it should be appreciated that the above-described systems and methods can be implemented using any of various known or later developed programming or scripting languages, such as “SQL,” “C,” “C++,” “FORTRAN,” “Pascal,” “Python,” “VHDL” and the like.
Accordingly, various storage media, such as magnetic computer disks, optical disks, electronic memories or any other form of non-transient computer-readable storage memory, can be prepared that can contain information and instructions that can direct a device, such as a computer, to implement the above-described systems and/or methods. Such storage devices can be referred to as “computer program products” for practical purposes. Once an appropriate device has access to the information and programs contained on the storage media/computer program product, the storage media can provide the information and programs to the device, thus enabling the device to perform the above-described systems and/or methods. Unless otherwise expressly stated, “storage medium” is not an electromagnetic wave per se.
For example, if a computer disk containing appropriate materials, such as a source file, an object file, an executable file or the like, were provided to a computer, the computer could receive the information, appropriately configure itself and perform the functions of the various systems and methods outlined in the diagrams and flowcharts above to implement the various functions. That is, the computer could receive various portions of information from the disk relating to different elements of the above-described systems and/or methods, implement the individual systems and/or methods, and coordinate the functions of the individual systems and/or methods.
While the methods and systems above are described in conjunction with specific examples, it is evident that many alternatives, modifications, and variations will be apparent to those skilled in the art. Accordingly, the examples above as set forth herein are intended to be illustrative, not limiting. There are changes that may be made without departing from the scope of the present disclosure.