A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention is generally related to computer cluster environments, and data and resource management in such environments, and is particularly related to a system and method for using cluster level quorum to prevent a split brain scenario in a data grid cluster.
Modern computing systems, particularly those employed by larger organizations and enterprises, continue to increase in size and complexity. Particularly, in areas such as Internet applications, there is an expectation that millions of users should be able to simultaneously access an application, which effectively leads to an exponential increase in the amount of content generated and consumed by users, and in the transactions involving that content. Such activity also results in a corresponding increase in the number of transaction calls to databases and metadata stores, which have a limited capacity to accommodate that demand.
In order to meet these requirements, a distributed data management and cache service can be run in the application tier so as to run in-process with the application itself, e.g., as part of an application server cluster. However, a loss of connectivity can occur rather frequently in the application server cluster, which can result in a split-brain scenario. There is a need to maintain the functionality of the distributed data management and cache service when such an event happens. This is the general area that embodiments of the invention are intended to address.
In accordance with an embodiment, a system and method are described for use with a data grid cluster, which use cluster quorum to prevent a split brain scenario. The data grid cluster includes a plurality of cluster nodes, each of which runs a cluster service. Each cluster service collects and maintains statistics regarding communication flow between its cluster node and the other cluster nodes in the data grid cluster. The statistics are used to determine a status associated with other cluster nodes in the data grid cluster whenever a disconnect event happens. The data grid cluster is associated with a quorum policy, which is defined in a cache configuration file, and which specifies a time period that a cluster node will wait before making a decision on whether or not to evict one or more cluster nodes from the data grid cluster.
In accordance with an embodiment, as referred to herein a “data grid cluster”, or “data grid”, is a system comprising a plurality of computer servers which work together to manage information and related operations, such as computations, within a distributed or clustered environment. The data grid cluster can be used to manage application objects and data that are shared across the servers. Preferably, a data grid cluster should have low response time, high throughput, predictable scalability, continuous availability and information reliability. As a result of these capabilities, data grid clusters are well suited for use in computationally intensive, stateful middle-tier applications. Some examples of data grid clusters, e.g., the Oracle Coherence data grid cluster, can store the information in-memory to achieve higher performance, and can employ redundancy in keeping copies of that information synchronized across multiple servers, thus ensuring resiliency of the system and the availability of the data in the event of server failure. For example, Coherence provides replicated and distributed (partitioned) data management and caching services on top of a reliable, highly scalable peer-to-peer clustering protocol, with no single points of failure, and can automatically and transparently fail over and redistribute its clustered data management services whenever a server becomes inoperative or disconnected from the network.
Data Grid Cluster Services
In accordance with an embodiment, the functionality of a data grid cluster is based on using different cluster services. The cluster services can include root cluster services, partitioned cache services, and proxy services. Within the data grid cluster, each cluster node can participate in a number of cluster services, both in terms of providing and consuming the cluster services. Each cluster service has a service name that uniquely identifies the service within the data grid cluster, and a service type, which defines what the cluster service can do. Other than the root cluster service running on each cluster node in the data grid cluster, there may be multiple named instances of each service type. The services can be either configured by the user, or provided by the data grid cluster as a default set of services.
In accordance with various embodiments, servers that store data within the data grid cluster can support a set of quorum features. The quorum features can be used to make decisions on physical resource planning (e.g., servers, RAM, etc.), and to determine how the data grid cluster behaves in the absence of such physical resources. As referred to herein, a quorum refers to the minimum number of service members in a cluster that is required before a particular service action is allowed or disallowed. By way of illustration, during deployment, the physical resources of the data grid cluster can be selected according to a plan that is based on the amount of data and requests that will be processed by the grid. For example, a data grid cluster can have 10 servers with a total of 10 gigabytes of random access memory (RAM) for handling the grid computing. However, in the event that a subset of those servers and/or RAM fails, it may be important to implement a system to manage how the data grid cluster will behave in their absence. In accordance with an embodiment, the quorum feature enables the data grid cluster to manage the cluster processing in the event of losing some of those resources.
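The quorum definition above can be illustrated with a minimal sketch. The `QuorumRule` class, the `is_action_allowed` function, the action name "write", and the threshold of 3 are all hypothetical names and values chosen for illustration; they are not taken from the disclosure or from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumRule:
    """Hypothetical rule: an action needs at least `min_members` live members."""
    action: str
    min_members: int


def is_action_allowed(rule: QuorumRule, live_members: int) -> bool:
    # The action is allowed only while the live member count meets the quorum.
    return live_members >= rule.min_members


# Assumed example values: a "write" action that requires three live members.
write_rule = QuorumRule(action="write", min_members=3)
```

With this rule, a cluster that drops from five live members to two would stop allowing the hypothetical "write" action until enough members return.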
In accordance with an embodiment, the quorum features can enable the configuring of the data grid cluster at the cluster level. The system can use the cluster quorum policy to specify a time period for which the data grid cluster defers its decision on whether or not to evict one or more cluster nodes in question after a disconnection happens. Such a cluster quorum can prevent a split brain scenario in a data grid cluster with a plurality of cluster nodes when a disconnection event happens.
Cluster Quorum
In accordance with one embodiment, a cluster quorum can enable management of cluster/machine network membership. For example, the quorum can be used to control the ability of a machine to join and become a member of the cluster, or to get evicted from the cluster. In accordance with an embodiment, quorum policies can also control what happens when members connect to the cluster, and also when members leave the cluster.
A root cluster service is automatically started when a cluster node joins a cluster, and typically there is exactly one root cluster service running on each cluster node. The root cluster service keeps track of the membership and services in the cluster. For example, the root cluster service is responsible for detecting other cluster nodes, monitoring the failure or death of other cluster nodes, and can be responsible for registering the availability of other services in the cluster. In one embodiment, a cluster node is considered a suspect cluster member when it has not responded to network communications, and is in imminent danger of being disconnected from the cluster.
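As a rough illustration of how a root cluster service might flag suspect members, the following sketch records when each node was last heard from and reports nodes that have been silent longer than a threshold. The `MembershipTracker` class, its method names, and the `suspect_after` threshold are assumptions made for this example; the disclosure does not specify how silence is measured.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MembershipTracker:
    """Hypothetical tracker of the last time each node was heard from."""
    suspect_after: float  # assumed threshold, in seconds of silence
    last_heard: Dict[str, float] = field(default_factory=dict)

    def record_packet(self, node_id: str, now: float) -> None:
        # Any received communication refreshes the node's timestamp.
        self.last_heard[node_id] = now

    def suspects(self, now: float) -> Set[str]:
        # A node is suspect once it has been silent longer than the threshold.
        return {node for node, seen in self.last_heard.items()
                if now - seen > self.suspect_after}
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to exercise.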
As shown in
One exemplary cluster quorum defines a timeout survivor quorum threshold that can be configured in an operational override file using the <timeout-survivor-quorum> element and optionally the role attribute. This element can be used within a <cluster-quorum-policy> element. Listing 1 illustrates configuring the timeout survivor quorum threshold to ensure that five cluster members with the server role are always kept in the cluster while removing suspect members, in accordance with an embodiment.
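The following sketch shows one way the timeout survivor quorum described above could be read and applied. The element names come from the text; the surrounding XML fragment, the role value "Server", and the helper functions `load_survivor_quorum` and `may_remove_suspects` are assumptions for illustration and are not a reproduction of Listing 1.

```python
import xml.etree.ElementTree as ET

# Assumed fragment built from the element names in the text; not Listing 1.
POLICY_XML = """
<cluster-quorum-policy>
  <timeout-survivor-quorum role="Server">5</timeout-survivor-quorum>
</cluster-quorum-policy>
"""


def load_survivor_quorum(xml_text):
    """Return (threshold, role) from a cluster-quorum-policy fragment."""
    root = ET.fromstring(xml_text.strip())
    element = root.find("timeout-survivor-quorum")
    return int(element.text), element.get("role")


def may_remove_suspects(members_by_role, suspects, threshold, role=None):
    """True if removing `suspects` still leaves `threshold` members with `role`."""
    survivors = [member for member, member_role in members_by_role.items()
                 if member not in suspects
                 and (role is None or member_role == role)]
    return len(survivors) >= threshold
```

Under this sketch, with seven members in the assumed "Server" role, two suspects could be removed but three could not, since only four members with that role would remain.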
Preventing “Split Brain” Scenario
In accordance with an embodiment, a data grid cluster can have a large number of interconnected cluster nodes. The system can take into consideration that disconnection events can happen routinely within the data grid cluster, and are not necessarily rare and abnormal events. For example, an intermittent network outage can cause a large number of cluster members to be removed from the cluster.
In accordance with one embodiment, the system can determine a status associated with each cluster node in the data grid cluster when the disconnect event happens, based on the statistics 321-326 maintained on each cluster node. There are generally different types of cluster nodes in the data grid cluster: a first set of nodes that are definitely dead; a second set of nodes that are definitely alive; and a third set of nodes that are in question, i.e., for which no deterministic answer can currently be given.
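The three-way classification above can be sketched as follows. The specific inputs are assumptions: the disclosure says only that the status is derived from communication statistics, so the `confirmed_dead` flag, the `seconds_since_last_packet` measure, and the `alive_within` threshold are hypothetical stand-ins for whatever statistics an implementation keeps.

```python
from enum import Enum


class NodeStatus(Enum):
    ALIVE = "alive"
    DEAD = "dead"
    IN_QUESTION = "in question"


def classify(seconds_since_last_packet, confirmed_dead, alive_within=2.0):
    """Classify a node from hypothetical communication statistics."""
    if confirmed_dead:
        # Assumed: some definitive signal that the node's process is gone.
        return NodeStatus.DEAD
    if seconds_since_last_packet <= alive_within:
        # Recent traffic is treated as proof that the node is alive.
        return NodeStatus.ALIVE
    # Silent but not confirmed dead: no deterministic answer yet.
    return NodeStatus.IN_QUESTION
```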
In the example illustrated in
In accordance with one embodiment, a split brain scenario can happen in a data grid cluster when the data grid cluster makes a quick decision to evict the cluster nodes that are in question after a disconnection event strikes.
In the example as shown in
In accordance with various embodiments, the system can use cluster quorum strategies to prevent the split brain scenario in a data grid. One exemplary quorum strategy allows the data grid cluster to wait before making a decision on whether or not to evict one or more nodes in the data grid cluster, based on the assumption that temporary disconnection events can be resolved within a short time period. For example, an unintentionally unplugged power cable for a network switch can be plugged back in as soon as it is detected. In accordance with various embodiments, the cluster quorum policy can specify the time period for how long the cluster will defer making a decision on whether or not to evict one or more cluster nodes in the data grid cluster.
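A minimal sketch of the deferral strategy, assuming the policy supplies a single waiting period in seconds: the function name `eviction_decision`, its parameters, and the three string outcomes are hypothetical and chosen only to make the wait-then-decide behavior concrete.

```python
def eviction_decision(disconnected_at, now, defer_seconds, reconnected):
    """Decide what to do with a node in question (hypothetical outcomes)."""
    if reconnected:
        # The disconnection resolved itself, so no eviction is needed.
        return "keep"
    if now - disconnected_at < defer_seconds:
        # Still inside the deferral window set by the quorum policy.
        return "wait"
    # The window elapsed without reconnection; hand off to the eviction policy.
    return "evict"
```

Deferring the decision in this way means a short-lived outage produces only "wait" and then "keep" outcomes, rather than an immediate eviction on both sides of the disconnect.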
In accordance with various embodiments, in order to prevent the split brain scenario, a cluster quorum policy can specify that human intervention from an administrator is required when there are fewer than a minimum number of nodes alive in the cluster, or when there are more than a maximum number of nodes in question in the cluster.
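The human-intervention condition can be expressed as a simple check. The function name and parameter names below are hypothetical; the two thresholds correspond to the minimum number of live nodes and the maximum number of nodes in question mentioned in the text.

```python
def needs_administrator(alive, in_question, min_alive, max_in_question):
    """True when the policy calls for an administrator to decide."""
    too_few_alive = alive < min_alive
    too_many_in_question = in_question > max_in_question
    return too_few_alive or too_many_in_question
```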
In accordance with one embodiment, referring to the illustration of
In accordance with one embodiment, there is still a possibility that the disconnection event cannot be resolved during the time period specified in the cluster quorum policy. The data grid cluster can then evict the part of the cluster that was disconnected, based on pre-configured or user supplied policies. Also, the data grid cluster can make a decision to evict part of the cluster if there are conflicts among the cluster nodes that are reconnected.
Enabling Custom Action Policies in the Cluster Quorum
In accordance with an embodiment, custom action policies can be used instead of the default quorum policies in order to incorporate user logic to support different cluster services in the data grid cluster system. The custom policies specified in user applications can incorporate arbitrary external states to provide fine-grained, resource-driven control of the services, since the user applications are in the best position to manage these external states.
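One way to picture a custom action policy is as a pluggable object that the cluster consults before allowing a service action. The sketch below is an assumption-laden illustration: the `ActionPolicy` protocol, the `is_allowed` signature, and both policy classes are hypothetical names for this example and are not presented as the interface of any actual product.

```python
from typing import Callable, Protocol


class ActionPolicy(Protocol):
    """Hypothetical interface consulted before a service action runs."""
    def is_allowed(self, service_name: str, action: str,
                   member_count: int) -> bool: ...


class MinimumMembersPolicy:
    """Stand-in for a default quorum policy based on member count only."""
    def __init__(self, min_members: int) -> None:
        self.min_members = min_members

    def is_allowed(self, service_name: str, action: str,
                   member_count: int) -> bool:
        return member_count >= self.min_members


class ExternalStatePolicy:
    """Custom policy that also consults application-managed external state."""
    def __init__(self, external_ok: Callable[[], bool],
                 fallback: ActionPolicy) -> None:
        self.external_ok = external_ok
        self.fallback = fallback

    def is_allowed(self, service_name: str, action: str,
                   member_count: int) -> bool:
        # Both the external state and the underlying quorum must permit it.
        return self.external_ok() and self.fallback.is_allowed(
            service_name, action, member_count)
```

Passing the external check as a callable lets the user application own the external state, which matches the point that the application is best placed to manage it.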
In the example shown in
Throughout the various contexts described in this disclosure, the embodiments of the invention further encompass computer apparatus, computing systems and machine-readable media configured to carry out the foregoing systems and methods. In addition to an embodiment consisting of specifically designed integrated circuits or other electronics, the present invention may be conveniently implemented using a conventional general purpose or a specialized digital computer or microprocessor programmed according to the teachings of the present disclosure, as will be apparent to those skilled in the computer art.
Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art. The invention may also be implemented by the preparation of application specific integrated circuits or by interconnecting an appropriate network of conventional component circuits, as will be readily apparent to those skilled in the art.
The various embodiments include a computer program product which is a storage medium (media) having instructions stored thereon/in which can be used to program a general purpose or specialized computing processor(s)/device(s) to perform any of the features presented herein. The storage medium can include, but is not limited to, one or more of the following: any type of physical media including floppy disks, optical discs, DVDs, CD-ROMs, microdrives, magneto-optical disks, holographic storage, ROMs, RAMs, PRAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs); paper or paper-based media; and any type of media or device suitable for storing instructions and/or information. The computer program product can be transmitted in whole or in parts and over one or more public and/or private networks wherein the transmission includes instructions which can be used by one or more processors to perform any of the features presented herein. The transmission may include a plurality of separate transmissions. In accordance with certain embodiments, however, the computer storage medium containing the instructions is non-transitory (i.e., not in the process of being transmitted) but rather is persisted on a physical device.
The foregoing description of the preferred embodiments of the present invention has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. Embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the relevant art to understand the invention. It is intended that the scope of the invention be defined by the following claims and their equivalents.
This application claims the benefit of priority to U.S. Provisional Patent Application No. 61/437,546, titled “QUORUM IN A DISTRIBUTED DATA GRID”, filed Jan. 28, 2011, which application is herein incorporated by reference.
Number | Date | Country |
---|---|---|
20120197822 A1 | Aug 2012 | US |
Number | Date | Country |
---|---|---|
61437546 | Jan 2011 | US |