SYSTEM AND METHOD FOR PROVIDING PARTITION PERSISTENT STATE CONSISTENCY IN A DISTRIBUTED DATA GRID

Description

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

FIELD OF INVENTION

The present invention is generally related to computer systems, and is particularly related to a distributed data grid.

BACKGROUND

Modern computing systems, particularly those employed by larger organizations and enterprises, continue to increase in size and complexity. Particularly, in areas such as Internet applications, there is an expectation that millions of users should be able to simultaneously access that application, which effectively leads to an exponential increase in the amount of content generated and consumed by users, and transactions involving that content. Such activity also results in a corresponding increase in the number of transaction calls to databases and metadata stores, which have a limited capacity to accommodate that demand.

This is the general area that embodiments of the invention are intended to address.

SUMMARY

Described herein are systems and methods that can provide partition persistent state consistency in a distributed data grid. The distributed data grid can provide a plurality of copies of a partition on a plurality of cluster nodes in the distributed data grid, wherein the plurality of cluster nodes includes a primary owner node and one or more backup nodes for the partition. The primary owner node can propagate one or more modifications of the partition from the primary owner node to the one or more backup nodes. The distributed data grid can ensure consistency among the plurality of copies of the partition on the plurality of cluster nodes in the distributed data grid.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is an illustration of a data grid cluster in accordance with various embodiments of the invention.

FIG. 2 shows an illustration of providing partition persistent state consistency in a distributed data grid in accordance with an embodiment of the invention.

FIG. 3 shows an illustration of supporting partition persistent state consistency in a distributed data grid when adding a new backup node in accordance with an embodiment of the invention.

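FIG. 4 shows an illustration of supporting partition persistent state consistency in a distributed data grid when replacing a primary owner cluster node in accordance with an embodiment of the invention.

FIG. 5 illustrates an exemplary flow chart for providing partition persistent state consistency in a distributed data grid in accordance with an embodiment of the invention.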

DETAILED DESCRIPTION

Described herein is a system and method that can provide partition persistent state consistency in a distributed data grid.

In accordance with an embodiment, as referred to herein a “distributed data grid”, “data grid cluster”, or “data grid”, is a system comprising a plurality of computer servers which work together to manage information and related operations, such as computations, within a distributed or clustered environment. The data grid cluster can be used to manage application objects and data that are shared across the servers. Preferably, a data grid cluster should have low response time, high throughput, predictable scalability, continuous availability and information reliability. As a result of these capabilities, data grid clusters are well suited for use in computationally intensive, stateful middle-tier applications. Some examples of data grid clusters, e.g., the Oracle Coherence data grid cluster, can store the information in-memory to achieve higher performance, and can employ redundancy in keeping copies of that information synchronized across multiple servers, thus ensuring resiliency of the system and the availability of the data in the event of server failure. For example, Coherence provides replicated and distributed (partitioned) data management and caching services on top of a reliable, highly scalable peer-to-peer clustering protocol.

An in-memory data grid can provide the data storage and management capabilities by distributing data over a number of servers working together. The data grid can be middleware that runs in the same tier as an application server or within an application server. It can provide management and processing of data and can also push the processing to where the data is located in the grid. In addition, the in-memory data grid can eliminate single points of failure by automatically and transparently failing over and redistributing its clustered data management services when a server becomes inoperative or is disconnected from the network. When a new server is added, or when a failed server is restarted, it can automatically join the cluster and services can be failed back over to it, transparently redistributing the cluster load. The data grid can also include network-level fault tolerance features and transparent soft re-start capability.

In accordance with an embodiment, the functionality of a data grid cluster is based on using different cluster services. The cluster services can include root cluster services, partitioned cache services, and proxy services. Within the data grid cluster, each cluster node can participate in a number of cluster services, both in terms of providing and consuming the cluster services. Each cluster service has a service name that uniquely identifies the service within the data grid cluster, and a service type, which defines what the cluster service can do. Other than the root cluster service running on each cluster node in the data grid cluster, there may be multiple named instances of each service type. The services can be either configured by the user, or provided by the data grid cluster as a default set of services.

FIG. 1 is an illustration of a data grid cluster in accordance with various embodiments of the invention. As shown in FIG. 1, a data grid cluster 100, e.g. an Oracle Coherence data grid, includes a plurality of cluster nodes 101-106 having various cluster services 111-116 running thereon. Additionally, a cache configuration file 110 can be used to configure the data grid cluster 100.

Partition Persistent State Consistency

In accordance with an embodiment of the invention, partition persistent state consistency can be supported in the distributed data grid. The partition persistent state consistency can be beneficial in enabling various protocols in the distributed data grid, such as the partition backup protocol, the partition transfer protocol, and the partition ownership change protocol.

FIG. 2 shows an illustration of providing partition persistent state consistency in a distributed data grid in accordance with an embodiment of the invention. As shown in FIG. 2, a distributed data grid 201 can comprise a plurality of cluster nodes, e.g. the cluster nodes A-C 211-213, that maintain different partitions. Each partition in the distributed data grid 201 can hold various software objects in a middleware environment 200. Furthermore, each partition can be stored as multiple copies in the distributed data grid 201, e.g. the partition copies A-C 221-223 of a partition on the different cluster nodes A-C 211-213.

In accordance with an embodiment of the invention, one cluster node in the distributed data grid 201 can be the primary owner node of the partition, while the other cluster nodes serve as backup nodes for the partition. The primary owner node A 211 can manage the state of the partition, such as controlling the partition version and propagating one or more modifications and/or updates, e.g. the modifications I-II 231-232, to the various backup nodes B-C 212-213. Furthermore, the partition copies A-C 221-223 can be at different versions, depending on whether or not a given modification, e.g. modification I 231 or II 232, has already been applied to a particular copy of the partition.
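A minimal Java sketch of the primary owner's role follows, assuming that the partition version is a simple counter incremented once per modification and that each backup message carries the version its modification produces. The class and method names (`PrimaryOwnerSketch`, `BackupMessage`, `modify`) are hypothetical and are not the Oracle Coherence API; the network is replaced by an in-memory outbox per backup node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a primary owner that stamps each partition
// modification with a consecutive version number and queues one
// backup message per backup node. Not the Coherence API.
class PrimaryOwnerSketch {
    // A backup message carries the partition version its modification produces.
    record BackupMessage(int partitionId, long version, String modification) {}

    private final int partitionId;
    private long partitionVersion = 0; // version of the primary's own copy
    // In-memory stand-in for the connections to each backup node.
    private final Map<String, List<BackupMessage>> outboxes = new LinkedHashMap<>();

    PrimaryOwnerSketch(int partitionId, List<String> backupNodes) {
        this.partitionId = partitionId;
        for (String node : backupNodes) {
            outboxes.put(node, new ArrayList<>());
        }
    }

    // Records a client modification by advancing the partition version,
    // then queues the same versioned message for every backup node.
    long modify(String modification) {
        partitionVersion++; // only the primary decides the order of modifications
        BackupMessage msg = new BackupMessage(partitionId, partitionVersion, modification);
        for (List<BackupMessage> outbox : outboxes.values()) {
            outbox.add(msg);
        }
        return partitionVersion;
    }

    long version() { return partitionVersion; }

    List<BackupMessage> outbox(String node) { return outboxes.get(node); }
}
```

Because only the primary owner assigns versions in this sketch, each backup can recover the intended order from the version numbers alone, whatever route a message takes.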

In this example, the partition copy A 221 can be maintained on the cluster node A 211, which is the primary owner node of the partition. The cluster node B 212, which maintains a partition copy B 222, and the cluster node C 213, which maintains a partition copy C 223, are both the backup nodes. A client 202 can interact with the cluster node A 211, such as performing one or more data grid operations that update or modify the partition copy A 221.

The distributed data grid 201 can ensure consistency among the different copies of the partition 221-223 stored on the plurality of cluster nodes A-C 211-213 by ensuring that the various modifications I-II 231-232 are applied to each copy of the partition A-C 221-223 in the same order. Furthermore, the distributed data grid 201 can propagate the different modifications from the primary owner node A 211 to the backup nodes B-C 212-213, e.g. using backup messages. As shown in FIG. 2, the primary owner node A 211 can initiate one or more modifications, e.g. modifications I 231 and II 232, based on one or more messages received from a client 202.

In accordance with an embodiment of the invention, the distributed data grid 201 is based on a peer-to-peer architecture. The underlying message transport layer can guarantee in-order delivery of messages over the connection between each pair of directly connected cluster nodes in the distributed data grid. For example, the backup messages containing modification I 231 and modification II 232 are delivered in order over the connection between the cluster node A 211 and the cluster node B 212.

On the other hand, under the peer-to-peer architecture, messages sent from a source cluster node to a destination cluster node within the distributed data grid 201 can be delivered via different routes. As shown in FIG. 2, the cluster node A 211 can deliver a backup message to the cluster node C 213 either directly or via the cluster node B 212. Furthermore, the delivery routes may become more complex and less predictable as more cluster nodes are involved.

Thus, backup messages containing various modifications can be delivered out of order within the distributed data grid 201. For example, the cluster node C 213 can receive a backup message containing modification I 231 after receiving another backup message containing modification II 232, even though the message transport protocol ensures that the modification I 231 arrives at the cluster node B 212 before the modification II 232.
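The following hypothetical simulation illustrates why per-connection ordering is not enough. Modification I is sent first but relayed through cluster node B, while modification II is sent one time unit later on the direct link to cluster node C. The per-hop latencies and the class name `RouteReorderSketch` are invented for illustration; because the two messages travel over different links, no single in-order connection can keep them in sequence.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical simulation: arrival order at node C when two modifications
// take different routes. Latencies are arbitrary illustrative values.
class RouteReorderSketch {
    static final Map<String, Integer> HOP_LATENCY = Map.of("A->B", 2, "B->C", 2, "A->C", 1);

    record Delivery(String modification, int arrivalTime) {}

    // Arrival time at the destination = send time + sum of hop latencies on the route.
    static int arrival(int sendTime, List<String> route) {
        int t = sendTime;
        for (String hop : route) {
            t += HOP_LATENCY.get(hop);
        }
        return t;
    }

    // Returns the modifications in the order node C receives them.
    static List<String> orderSeenAtC() {
        List<Delivery> deliveries = new ArrayList<>();
        // I is sent first (t = 0) but relayed through B.
        deliveries.add(new Delivery("I", arrival(0, List.of("A->B", "B->C"))));
        // II is sent later (t = 1) on the direct link A->C.
        deliveries.add(new Delivery("II", arrival(1, List.of("A->C"))));
        deliveries.sort(Comparator.comparingInt(Delivery::arrivalTime));
        List<String> order = new ArrayList<>();
        for (Delivery d : deliveries) {
            order.add(d.modification());
        }
        return order;
    }
}
```

With these latencies, modification I arrives at time 4 and modification II at time 2, so node C sees II first even though every individual link preserved order.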

As shown in FIG. 2, after the cluster node C 213 receives the modification II 232, the cluster node C 213 can determine whether it has already applied the modification I 231 to the partition copy C 223. The cluster node C 213 can defer applying the modification II 232 on the partition copy C 223 until the modification I 231 is received and applied on the partition copy C 223.

In accordance with an embodiment of the invention, partition persistent state information, such as a partition version number, can be assigned to each partition copy A-C 221-223 to ensure consistency among the different partition copies A-C 221-223 in the distributed data grid 201. In the above example, the cluster node C 213 can check the partition version number of the partition copy C 223 to determine whether the partition copy C 223 has already been updated with modification I 231.
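A minimal sketch of the backup-side check, assuming each backup message carries the partition version produced by its modification and that versions are consecutive integers starting at 1. The class `BackupCopySketch` and its methods are hypothetical; a list of strings stands in for the partition contents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical backup-side sketch: modifications are applied strictly in
// partition-version order. Early arrivals are deferred until the gap is
// filled; versions at or below the current partition version are duplicates.
class BackupCopySketch {
    private long partitionVersion = 0;                              // last version applied to this copy
    private final TreeMap<Long, String> deferred = new TreeMap<>(); // early arrivals, keyed by version
    private final List<String> applied = new ArrayList<>();         // stand-in for partition contents

    // Called when a backup message (version, modification) arrives on any route.
    void onBackupMessage(long version, String modification) {
        if (version <= partitionVersion) {
            return; // already applied: ignore the duplicate
        }
        deferred.put(version, modification);
        // Apply every deferred modification that is now next in sequence.
        while (!deferred.isEmpty() && deferred.firstKey() == partitionVersion + 1) {
            applied.add(deferred.pollFirstEntry().getValue());
            partitionVersion++;
        }
    }

    long partitionVersion() { return partitionVersion; }

    List<String> applied() { return applied; }

    int deferredCount() { return deferred.size(); }
}
```

Keying the deferred modifications by version in a sorted map means that, after each arrival, the node only has to inspect the smallest deferred version to decide whether the gap has been closed.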

Furthermore, the distributed data grid 201 allows the primary owner node A 211 to resend one or more modifications to a backup node after receiving a special message from the backup node. For example, this special message can be either an empty message or a message containing the latest modification received at the backup node.
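The resend exchange might be sketched as follows. This assumes the primary retains a history of modifications and, as one interpretation of the empty special message, treats it as meaning that the backup holds none of them. `ResendSketch`, `ResendRequest`, and `onResendRequest` are hypothetical names, not part of any product API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the resend request. A backup sends a "special
// message" that is either empty or carries the latest modification it has
// received; the primary resends every retained modification newer than that.
class ResendSketch {
    record Modification(long version, String change) {}

    // An empty Optional models the empty special message.
    record ResendRequest(Optional<Modification> latestReceived) {}

    static final class Primary {
        // Retained modifications in version order (never trimmed in this sketch).
        private final List<Modification> history = new ArrayList<>();

        Modification modify(String change) {
            Modification m = new Modification(history.size() + 1L, change);
            history.add(m);
            return m;
        }

        List<Modification> onResendRequest(ResendRequest request) {
            // Assumption: an empty message means the backup holds nothing yet.
            long after = request.latestReceived().map(Modification::version).orElse(0L);
            List<Modification> resend = new ArrayList<>();
            for (Modification m : history) {
                if (m.version() > after) {
                    resend.add(m);
                }
            }
            return resend;
        }
    }
}
```

A production system would need to bound the retained history, for example by discarding versions that every backup has already applied; the sketch keeps everything for brevity.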

FIG. 3 shows an illustration of supporting partition persistent state consistency in a distributed data grid when adding a new backup node in accordance with an embodiment of the invention. As shown in FIG. 3, a partition can be stored initially in a plurality of cluster nodes in a distributed data grid 301, e.g. cluster nodes A-B 311-312. The cluster node A 311, which maintains partition copy A 321, is the primary owner node and the cluster node B 312, which maintains partition copy B 322, is a backup node. The primary owner node A 311 can propagate a series of modifications, e.g. modifications 1−N 331, to the backup node B 312 after receiving one or more messages from a client 302 in a middleware environment 300.

Then, a cluster node C 313 in the distributed data grid 301, which maintains an additional partition copy C 323 for the partition, can be added and become a new backup node. In order to quickly configure the newly added backup node C 313, the primary owner node A 311 can send a batch of modifications, e.g. the modifications 1−N 331, directly to the newly added backup node C 313 to update the partition copy C 323 that it maintains.
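On the primary side, one possible way to make the batch safe is to take the snapshot of modifications 1 to N and register the new backup under the same lock, so that every later modification is sent to the new backup as an individual backup message. This sketch assumes that approach; `AddBackupSketch` and its methods are hypothetical, and the route a later message takes (directly or through cluster node B) is abstracted into a per-node outbox.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical primary-side sketch for adding a backup. The batch snapshot
// and the registration of the new backup happen under one lock, so each
// modification is either in the batch (versions 1..N) or is queued for the
// new backup individually (N+1, ...), never missing from both.
class AddBackupSketch {
    record Modification(long version, String change) {}

    private final List<Modification> log = new ArrayList<>(); // retained modifications
    private final Map<String, List<Modification>> outboxes = new LinkedHashMap<>();

    synchronized Modification modify(String change) {
        Modification m = new Modification(log.size() + 1L, change);
        log.add(m);
        for (List<Modification> outbox : outboxes.values()) {
            outbox.add(m); // individual backup message for each registered backup
        }
        return m;
    }

    // Registers the new backup and returns the batch to ship to it directly.
    synchronized List<Modification> addBackup(String node) {
        List<Modification> batch = new ArrayList<>(log);
        outboxes.put(node, new ArrayList<>());
        return batch;
    }

    synchronized List<Modification> outbox(String node) { return outboxes.get(node); }
}
```

If the new backup were registered only after the snapshot, a modification committed in between would reach it by neither path; registering first and snapshotting later would at worst produce duplicates, which a version check on the backup can discard.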

In the meantime, the cluster node B 312 can send the cluster node C 313 a backup message containing a new modification N+1 332, which the cluster node B 312 received from the primary owner node A 311. In such a scenario, the batch of modifications 1−N 331 may not always arrive at the newly added backup node C 313 before the new modification N+1 332.

In order to ensure consistency, after receiving the new modification N+1 332, the cluster node C 313 can check whether it has received and applied the batch of modifications 1−N 331, e.g. by checking the partition version number associated with the partition copy C 323. Once the batch has been applied, the cluster node C 313 can apply the new modification N+1 332.

FIG. 4 shows an illustration of supporting partition persistent state consistency in a distributed data grid when replacing a primary owner cluster node in accordance with an embodiment of the invention. As shown in FIG. 4, a distributed data grid 401 can store a partition initially in cluster nodes A-C 411-413, with the cluster node A 411 maintaining a partition copy A 421 being the primary owner node and the cluster nodes B-C 412-413 maintaining partition copies B-C 422-423 being the backup nodes. The primary owner cluster node A 411 can propagate a backup message containing modification I 431 to the backup nodes B-C 412-413, after receiving one or more messages from a client 402 in a middleware environment 400.

Within the distributed data grid 401, the primary owner node A 411 may die at any time, in which case it can be replaced by a new primary owner cluster node, e.g. cluster node B 412. The client 402 may reconnect to the new primary owner cluster node B 412 via a new connection 440. With or without a request from the client 402, the new primary owner cluster node B 412 may resend at least one earlier modification, e.g. modification I 431, to the backup node C 413, which maintains a partition copy C 423, in addition to sending a new modification II 432.

In the example shown in FIG. 4, the cluster node C 413 may receive the same modification, e.g. the modification I 431, multiple times. In order to ensure consistency, after receiving backup messages from the cluster node B 412, the cluster node C 413 can check whether it has already received and applied the modification I 431, e.g. by checking the partition version number associated with the partition copy C 423. The cluster node C 413 can then update the partition copy C 423 accordingly, for example by discarding the duplicate modification I 431 and applying the new modification II 432.
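A minimal failover sketch, assuming that the promoted backup (cluster node B) continues version numbering from its own copy and, for simplicity, resends its entire retained log before the new modification. `FailoverSketch` and `Copy.apply` are hypothetical names. Because `apply` accepts only the next consecutive version, the resent modification I is rejected at cluster node C as already applied, while modification II is accepted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical failover sketch: a promoted backup resends what it holds,
// and a remaining backup uses partition versions to skip duplicates.
class FailoverSketch {
    record Modification(long version, String change) {}

    // State of one copy of the partition.
    static final class Copy {
        long version = 0;
        final List<Modification> log = new ArrayList<>(); // modifications applied, in order

        // Accepts only the next consecutive version. Returns false for
        // duplicates and, in this simplified sketch, for early arrivals too.
        boolean apply(Modification m) {
            if (m.version() != version + 1) {
                return false;
            }
            log.add(m);
            version = m.version();
            return true;
        }
    }

    // The promoted primary resends its retained log, then issues a new
    // modification numbered after its own partition version.
    static List<Modification> promoteAndResend(Copy newPrimary, String newChange) {
        List<Modification> out = new ArrayList<>(newPrimary.log);
        Modification next = new Modification(newPrimary.version + 1, newChange);
        newPrimary.apply(next);
        out.add(next);
        return out;
    }
}
```

Resending the whole log keeps the sketch short; a real implementation would more likely resend only the modifications a backup might have missed.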

FIG. 5 illustrates an exemplary flow chart for providing partition persistent state consistency in a distributed data grid in accordance with an embodiment of the invention. As shown in FIG. 5, at step 501, the distributed data grid provides a plurality of cluster nodes that store a plurality of copies of a partition, wherein the plurality of cluster nodes includes a primary owner node and one or more backup nodes for the partition. Then, at step 502, the distributed data grid can propagate one or more modifications of the partition from the primary owner node to the one or more backup nodes. Furthermore, at step 503, the distributed data grid can ensure consistency among the plurality of copies of the partition on the plurality of cluster nodes in the distributed data grid.

The present invention may be conveniently implemented using one or more conventional general purpose or specialized digital computer, computing device, machine, or microprocessor, including one or more processors, memory and/or computer readable storage media programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.

In some embodiments, the present invention includes a computer program product which is a storage medium or computer readable medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.

The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.

Claims

1. A method for providing partition persistent state consistency in a distributed data grid, comprising: providing a plurality of copies of a partition on a plurality of cluster nodes in the distributed data grid, wherein the plurality of cluster nodes includes a primary owner node and one or more backup nodes for the partition; propagating one or more modifications of the partition from the primary owner node to the one or more backup nodes; and ensuring consistency among the plurality of copies of the partition on the plurality of cluster nodes in the distributed data grid.
2. The method according to claim 1, further comprising: associating a partition version number with each copy of the partition in the distributed data grid.
3. The method according to claim 1, further comprising: guaranteeing an orderly delivery of one or more messages via a connection between each pair of cluster nodes in the distributed data grid.
4. The method according to claim 1, further comprising: receiving one or more messages at the primary owner node, wherein a first said message contains a first modification of the partition and a second message contains a second modification of the partition.
5. The method according to claim 4, further comprising: determining whether the first modification has already been applied to a copy of the partition on at least one backup node before applying the second modification to the copy of the partition on the at least one backup node.
6. The method according to claim 4, further comprising: deferring applying the second modification on the copy of the partition on the at least one backup node until the first modification is received and applied.
7. The method according to claim 1, further comprising: sending a batch of modifications to a newly added backup node in order to update a copy of the partition on the newly added backup node.
8. The method according to claim 1, further comprising: resending at least one said modification to the one or more backup nodes via a new primary node when the old primary owner node is dead.
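9. The method according to claim 1, further comprising: allowing the primary owner node to resend one or more modifications to a backup node after receiving a special message from the backup node.
10. The method according to claim 1, further comprising: supporting in the distributed data grid at least one of: a partition ownership change protocol, a partition backup protocol, and a partition transfer protocol.
11. A system for providing partition persistent state consistency in a distributed data grid, comprising: one or more microprocessors; and the distributed data grid, running on the one or more microprocessors, which operates to perform the steps of: providing a plurality of copies of a partition on a plurality of cluster nodes in the distributed data grid, wherein the plurality of cluster nodes includes a primary owner node and one or more backup nodes for the partition; propagating one or more modifications of the partition from the primary owner node to the one or more backup nodes; and ensuring consistency among the plurality of copies of the partition on the plurality of cluster nodes in the distributed data grid.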
11. A system for providing partition persistent state consistency in a distributed data grid, comprising: one or more microprocessors;the distributed data grid, running on the one or more microprocessors, operates to perform the steps of providing a plurality of copies of a partition on a plurality of cluster nodes in the distributed data grid, wherein the plurality of cluster nodes includes a primary owner node and one or more backup nodes for the partition;propagating one or more modifications of the partition from the primary owner node to the one or more backup nodes; andensuring consistency among the plurality copies of the partition on the plurality of cluster nodes in the distributed data grid.
12. The system according to claim 11, wherein: each copy of the partition in the distributed data grid is associated with a partition version number.
13. The system according to claim 11, wherein: the distributed data grid guarantees an orderly delivery of one or more messages via a connection between each pair of cluster nodes in the distributed data grid.
14. The system according to claim 11, wherein: one or more messages are received at the primary owner node, wherein a first said message contains a first modification of the partition and a second message contains a second modification of the partition.
15. The system according to claim 14, wherein: the distributed data grid determines whether the first modification has already been applied to a copy of the partition on at least one backup node before applying the second modification to the copy of the partition on the at least one backup node.
16. The system according to claim 14, wherein: the distributed data grid defers applying the second modification on the copy of the partition on the at least one backup node until the first modification is received and applied.
17. The system according to claim 11, wherein: the distributed data grid sends a batch of modifications to a newly added backup node in order to update a copy of the partition on the newly added backup node.
18. The system according to claim 11, wherein: the distributed data grid resends at least one said modification to the one or more backup nodes via a new primary node when the old primary owner node is dead.
19. The system according to claim 11, wherein: the distributed data grid allows the primary owner node to resend one or more modifications to a backup node after receiving a special message from the backup node.
20. A non-transitory machine readable storage medium having instructions stored thereon that when executed cause a system to perform the steps of: providing a plurality of copies of a partition on a plurality of cluster nodes in a distributed data grid, wherein the plurality of cluster nodes includes a primary owner node and one or more backup nodes for the partition; propagating one or more modifications of the partition from the primary owner node to the one or more backup nodes; and ensuring consistency among the plurality of copies of the partition on the plurality of cluster nodes in the distributed data grid.

CLAIM OF PRIORITY

This application claims priority on U.S. Provisional Patent Application No. 61/714,100, entitled “SYSTEM AND METHOD FOR SUPPORTING A DISTRIBUTED DATA GRID IN A MIDDLEWARE ENVIRONMENT,” by inventors Robert H. Lee, Gene Gleyzer, Charlie Helin, Mark Falco, Ballav Bihani and Jason Howes, filed Oct. 15, 2012, which application is herein incorporated by reference. The current application hereby incorporates by reference the material in the following patent applications: U.S. patent application No. ______, titled “SYSTEM AND METHOD FOR PROVIDING SUPPORTING GUARANTEED MULTI-POINT DELIVERY IN A DISTRIBUTED DATA GRID”, inventors Robert H. Lee and Gene Gleyzer, filed ______ (Attorney Docket No.: ORACL-05358US0). U.S. patent application No. ______, titled “SYSTEM AND METHOD FOR PROVIDING TRANSIENT PARTITION CONSISTENCY IN A DISTRIBUTED DATA GRID”, inventors Robert H. Lee and Gene Gleyzer, filed ______ (Attorney Docket No.: ORACL-05359US1). U.S. patent application No. ______, titled “SYSTEM AND METHOD FOR SUPPORTING ASYNCHRONOUS MESSAGE PROCESSING IN A DISTRIBUTED DATA GRID”, inventor Gene Gleyzer, filed ______ (Attorney Docket No.: ORACL-05360US0). U.S. patent application No. ______, titled “SYSTEM AND METHOD FOR SUPPORTING OUT-OF-ORDER MESSAGE PROCESSING IN A DISTRIBUTED DATA GRID”, inventors Mark Falco and Gene Gleyzer, filed ______ (Attorney Docket No.: ORACL-05364US0).

Provisional Applications (1)

	Number	Date	Country
	61714100	Oct 2012	US
