A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
This patent application is related to the following U.S. patent application, which is incorporated by reference herein in its entirety:
U.S. patent application Ser. No. 13/359,396, entitled “EVENT DISTRIBUTION PATTERN FOR USE WITH A DISTRIBUTED DATA GRID”, by Brian Oliver et al., filed on Jan. 26, 2012.
The current invention relates to data storage in distributed computing environments and in particular to distributing updates and data replication between computer clusters.
In recent years, the amount of information utilized by various organizations, businesses and consumers has exploded to reach enormous amounts. From enterprise resource planning (ERP) to customer resource management (CRM) and other systems, more and more parts of an organization are becoming optimized, thereby producing vast amounts of data relevant to the organization. All of this information needs to be collected, stored, managed, archived, searched and accessed in an efficient, scalable and reliable manner.
Historically, most enterprises have utilized large databases to store the majority of their data and used random access memory (RAM) to locally cache a subset of that data that is most frequently accessed. This has been done mainly to conserve costs since RAM has traditionally been faster but more expensive than disk-based storage. Over time, RAM has been continuously growing in storage capacity and declining in cost. However, these improvements have not kept up with the rapid rate of increase in data being used by enterprises and their numerous applications. In addition, because CPU advancements have generally outpaced memory speed improvements, it is expected that memory latency will become a bottleneck in computing performance.
Organizations today need to predictably scale mission-critical applications to provide fast and reliable access to frequently used data. It is desirable that data be pushed closer to the application for faster access and greater resource utilization. Additionally, continuous data availability and transactional integrity are needed even in the event of a server failure.
An in-memory data grid can provide the data storage and management capabilities by distributing data over a number of servers working together. The data grid can be middleware that runs in the same tier as an application server or within an application server. It can provide management and processing of data and can also push the processing to where the data is located in the grid. In addition, the in-memory data grid can eliminate single points of failure by automatically and transparently failing over and redistributing its clustered data management services when a server becomes inoperative or is disconnected from the network. When a new server is added, or when a failed server is restarted, it can automatically join the cluster and services can be failed back over to it, transparently redistributing the cluster load. The data grid can also include network-level fault tolerance features and transparent soft re-start capability.
In accordance with various embodiments of the invention, a set of push replication techniques are described for use in an in-memory data grid. When applications on a cluster perform insert, update or delete operations in the cache, the push replication provider can asynchronously push updates of those data entries from the source cluster to a remote destination cluster over a wide area network (WAN). The push replication provider includes a pluggable internal transport to send the updates to the destination cluster. This pluggable transport can be switched to employ a different communication service and store/forward semantics. The embodiments further include a publishing transformer that can apply filters and chain multiple filters on a stream of updates from the source cluster to the destination cluster. A batch publisher can be used to receive batches of multiple updates and replicate those batches to the destination cluster. XML-based configuration can be provided to configure the push replication techniques on the cluster. The described push replication techniques can be applied in a number of cluster topologies, including active/passive, active/active, multi-site active/passive, multi-site active/active and centralized replication arrangements.
In accordance with various embodiments, a set of push replication techniques are described for computers that store data in an in-memory data grid. In accordance with an embodiment, the data grid is a system composed of multiple servers that work together to manage information and related operations—such as computations—in a distributed environment. An in-memory data grid then is a data grid that stores the information in memory to achieve higher performance and uses redundancy by keeping copies of that information synchronized across multiple servers to ensure resiliency of the system and the availability of the data in the event of server failure. The data grid is used as a data management system for application objects that are shared across multiple servers and that require low response time, high throughput, predictable scalability, continuous availability and information reliability. As a result of these capabilities, the data grid is ideally suited for use in computationally intensive, stateful middle-tier applications. The data management is targeted to run in the application tier, and is often run in-process with the application itself, for example in the application server cluster. In accordance with an embodiment, the data grid software is middleware that reliably manages data objects in memory across a plurality of servers and also brokers the supply and demand of data between applications and data sources. In addition, the data grid can push the processing of requests closer to the data residing in the grid. Rather than pulling the necessary information to the server that will be executing the process, the data grid can push the processing of the request to the server that is storing the information locally. This can greatly reduce latency and improve data access speeds for applications.
In accordance with an embodiment, push replication is a framework for synchronizing information between data grid clusters over a network which can include wide area networks (WANs) or metropolitan area networks (MANs). Push replication operates by “pushing” updates occurring in a source cluster to a destination cluster in a transparent and asynchronous manner. An example of using push replication could be implemented between multiple auction sites in New York and London. In this example, push replication could be used to push bids between both sites to keep both sites synchronized, that is, so that both sites hold the same data. In accordance with an embodiment, push replication solves several problems including but not limited to disaster recovery (providing a back-up at a remote site), offloading read-only queries (providing one cluster dedicated to read/write operations with the read-only operations offloaded to another cluster), and providing local access to global data. Additional examples and use cases for push replication include Active/Passive (read/write and read-only) sites, the Hub/Spoke model of Active/Passive sites, and Active/Active sites, as will be described in further detail later in this document. A conflict resolution feature resolves conflicts in information when simultaneous updates occur in different active clusters on the same relative information.
In accordance with an embodiment, push replication also includes: 1) declarative configuration that is transparent to applications; 2) an event distribution service that is pluggable (a developer can plug a custom or 3rd party event distribution service into the push replication provider); 3) configurability to push updates to other services (e.g. file systems); and 4) published entries that can be easily filtered and coalesced using a custom filter class written by the application or by using declarative conditioning expressions embedded in the declarative XML (e.g. price &lt;100).
As illustrated, from the point of view of the application, the push replication is performed transparently, without the application having to be aware of it. The application 104 can simply perform its put, get and remove operations against the data grid cache 106 and the synchronization (push replication) is performed in the background. In particular, in the background, the publishing cache store 108, the push replication provider 110 and the publishing service 112 are aware of the put/get/remove operations and perform the push replication accordingly. These components work together to gather updates in the source cluster 100 and transmit those updates to the destination cluster 102. Applications 118 in the destination cluster then simply see the updates happen automatically.
In accordance with an embodiment, the publishing cache store 108 detects that an update has occurred in the data grid cache 106 and wraps that update with a set of information to create an entry operation. In accordance with an embodiment, the entry operation includes the data entry, the type of operation that was performed on that entry (e.g. insert, update, delete), and any additional metadata that can be used to perform replication, such as conflict resolution on the data, origin of the operation and the like. Once the publishing cache store has created the entry operation, it instructs the underlying push replication provider 110 to publish the operation.
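The entry-operation wrapper described above can be sketched as follows. This is a minimal illustration, not the framework's actual API: the class name, method names and metadata keys are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of an entry operation: the data entry, the type of
// operation performed on it, and metadata usable for conflict resolution.
class EntryOperation {
    enum Type { INSERT, UPDATE, DELETE }

    private final String key;
    private final Object value;                  // null for DELETE
    private final Type type;
    private final Map<String, Object> metadata;  // e.g. origin cluster, timestamp

    EntryOperation(String key, Object value, Type type, Map<String, Object> metadata) {
        this.key = key;
        this.value = value;
        this.type = type;
        this.metadata = new HashMap<>(metadata);
    }

    String getKey() { return key; }
    Object getValue() { return value; }
    Type getType() { return type; }
    Object getMetadata(String name) { return metadata.get(name); }

    // What the publishing cache store might build when it detects an insert:
    static EntryOperation fromInsert(String key, Object value, String originCluster) {
        Map<String, Object> meta = new HashMap<>();
        meta.put("origin", originCluster);
        meta.put("timestamp", System.currentTimeMillis());
        return new EntryOperation(key, value, Type.INSERT, meta);
    }
}
```

Once constructed, such an operation would be handed to the push replication provider for publishing.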
In accordance with an embodiment, the push replication provider 110 places the operation on a topic, which the publishing service 112 is registered to listen to. The push replication provider 110 can utilize a number of messaging schemes, such as the Java Messaging Service (JMS) or a custom messaging protocol, to communicate the updates to the destination cluster. In accordance with an embodiment, the push replication provider 110 is pluggable such that a user can select which messaging scheme to use and plug it into the push replication framework. The push replication provider 110 is thus responsible for placing the entry operation that needs to be replicated onto any internal transport mechanism that is required to provide the store and forward semantics. The store and forward semantics can ensure that each entry operation is kept in a queue such that, in the event that the connection between the clusters is lost, the entry operation will remain on the queue and will eventually be replicated once the connection is restored.
In accordance with an embodiment, the push replication provider 110 is an interface that includes method invocations including but not limited to registerPublisher( ), establishPublishingInfrastructure( ) and publish( ). In accordance with an embodiment, the publisher can be a batch publisher that can publish batches of updates (entry operations) at a time.
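The provider interface and its store-and-forward behavior can be sketched as below. The three method names come from the text; everything else, including the in-memory implementation standing in for a real messaging transport such as JMS, is illustrative.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Sketch of the pluggable provider interface named in the text.
interface PushReplicationProvider {
    void establishPublishingInfrastructure();
    void registerPublisher(String topic);
    void publish(String topic, Object entryOperation);
}

// Store-and-forward in miniature: operations queue up per topic (store)
// until a publishing service drains them (forward), so an update survives
// a lost inter-cluster connection and is replicated once the link returns.
class InMemoryProvider implements PushReplicationProvider {
    private final Map<String, Queue<Object>> topics = new HashMap<>();

    public void establishPublishingInfrastructure() { /* no-op in memory */ }

    public void registerPublisher(String topic) {
        topics.putIfAbsent(topic, new ArrayDeque<>());
    }

    public void publish(String topic, Object op) {
        topics.get(topic).add(op);
    }

    Object drain(String topic) {
        return topics.get(topic).poll();  // FIFO: operations replay in order
    }
}
```

A real provider would replace the in-memory queue with a durable transport, but the contract, meaning that publish order is preserved and nothing is dropped while the link is down, is the same.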
The publishing service 112 can be a live processing thread residing on the active cluster 100 which listens for updates and replicates them to the local cache publisher 114 in the destination cluster 102. In accordance with an embodiment, there are one or more running instances of the publishing service for each destination. If the publishing service fails, it can be automatically restarted on another node in the active cluster, such that the application is fault tolerant.
The local cache publisher 114 reads the entry operations received from the active cluster, performs conflict resolution on these entry operations and writes them to the local data grid cache 116 on the passive cluster 102. The local data grid cache 116 is available for access to any local application 118 deployed on the destination cluster.
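The conflict-resolution step performed by the local cache publisher can be sketched as follows. The text does not specify a resolution policy, so this example assumes a simple last-writer-wins policy based on a timestamp carried in the entry operation's metadata; the class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative local cache publisher: applies incoming entry operations to
// the local cache, resolving conflicts by comparing timestamps
// (last-writer-wins is one possible policy, assumed for this sketch).
class LocalCachePublisher {
    private static class Versioned {
        final Object value;
        final long timestamp;
        Versioned(Object value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Versioned> localCache = new HashMap<>();

    void apply(String key, Object value, long timestamp) {
        Versioned current = localCache.get(key);
        if (current == null || timestamp >= current.timestamp) {
            localCache.put(key, new Versioned(value, timestamp)); // incoming wins
        }
        // else: the local entry is newer, so the stale update is discarded
    }

    Object get(String key) {
        Versioned v = localCache.get(key);
        return v == null ? null : v.value;
    }
}
```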
In accordance with an embodiment, the push replication framework can further include a publishing transformer. The publishing transformer can be used to apply a set of filters to the stream of data being replicated out to another cluster. For example, if only some of the entries in the cache should be replicated out, the publishing transformer can filter out those entries that do not fulfill the criteria for replication. In another example, one could use the publishing transformer to strip the entries of data and only replicate the fact that the entries arrived in the cache. Therefore, any updates made to an entry would be published (replicated) in the same order unless the publishing transformer is used to mutate the entry operations prior to publishing.
In accordance with an embodiment, the publishing transformer can include a coalescing publishing transformer, a filtering transformer and a chaining transformer. The coalescing transformer filters out all updates for an entry except for the last update in the batch on that data entry. Therefore, rather than consuming network bandwidth to send multiple operations on the same update, the coalescing transformer only sends the last update operation. The filtering transformer can apply filters to filter out certain entries that do not fulfill the replication requirements. The chaining transformer can chain multiple transformers or multiple filters. For example, the chaining transformer can be used to chain the coalescing transformer with the filtering transformer in order to filter out certain entries in a batch, as well as transmit only the last updates for those entries that matched the filter.
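The three transformer kinds described above can be modeled as functions over a batch of key/value updates. This is a minimal sketch: the class and method names are assumptions, and a batch is simplified to a list of string-keyed integer updates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Illustrative publishing transformers over a batch of (key, value) updates.
class PublishingTransformers {
    // Coalescing: keep only the last update per key, so one final operation
    // is sent rather than every intermediate one.
    static UnaryOperator<List<Map.Entry<String, Integer>>> coalescing() {
        return batch -> {
            Map<String, Integer> last = new LinkedHashMap<>();
            batch.forEach(e -> last.put(e.getKey(), e.getValue()));
            return new ArrayList<>(last.entrySet());
        };
    }

    // Filtering: drop entries that do not meet the replication criteria.
    static UnaryOperator<List<Map.Entry<String, Integer>>> filtering(
            Predicate<Map.Entry<String, Integer>> keep) {
        return batch -> batch.stream().filter(keep).collect(Collectors.toList());
    }

    // Chaining: run one transformer's output through the next.
    static UnaryOperator<List<Map.Entry<String, Integer>>> chain(
            UnaryOperator<List<Map.Entry<String, Integer>>> first,
            UnaryOperator<List<Map.Entry<String, Integer>>> second) {
        return batch -> second.apply(first.apply(batch));
    }
}
```

For instance, chaining `filtering(e -> e.getValue() < 100)` with `coalescing()` replicates only the final value of each entry whose price is under 100, mirroring the combined example in the text.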
In accordance with an embodiment, the push replication can be declaratively configurable within the data grid. Developers can use extensible markup language (XML) declarations within the data grid configuration files to configure how the push replication works. A push replication namespace is added to the configuration files for the data grid which can be used to configure the functionality of push replication. For example, a user can specify in the configuration which publishers should be used by push replication, which cache should be replicated to which destination and the like.
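A configuration of this kind might look like the fragment below. This is a hypothetical sketch only: the namespace URI, element names and cluster names are all invented for illustration and do not reflect the actual configuration schema of any particular release.

```xml
<!-- Hypothetical sketch: namespace and element names are illustrative. -->
<cache-config xmlns:push="class://example.PushReplicationNamespace">
  <caches-scheme-mapping>
    <cache-mapping>
      <cache-name>auction-bids</cache-name>
      <scheme-name>distributed-scheme</scheme-name>
      <!-- Replicate updates to this cache to a named remote cluster -->
      <push:publisher>
        <push:destination-cache>auction-bids</push:destination-cache>
        <push:remote-cluster>london</push:remote-cluster>
      </push:publisher>
    </cache-mapping>
  </caches-scheme-mapping>
</cache-config>
```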
In addition to the topologies illustrated above, there can be a composition of several topologies between separate clients. For example, while the cluster at the New York client can be an active/passive deployment and the London client can implement its local cluster as an active/passive deployment, the topology between the New York and London clusters can be an active/active topology. Similarly, the two clients may deploy their local clusters as hub/spoke deployments, while the topology between the clusters may be active/active. Various other combinations of such topologies are possible within the scope of the embodiments described herein.
Throughout the various contexts described in this disclosure, the embodiments of the invention further encompass computer apparatus, computing systems and machine-readable media configured to carry out the foregoing systems and methods. In addition to an embodiment consisting of specifically designed integrated circuits or other electronics, the present invention may be conveniently implemented using a conventional general purpose or a specialized digital computer or microprocessor programmed according to the teachings of the present disclosure, as will be apparent to those skilled in the computer art.
Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art. The invention may also be implemented by the preparation of application specific integrated circuits or by interconnecting an appropriate network of conventional component circuits, as will be readily apparent to those skilled in the art.
The various embodiments include a computer program product which is a storage medium (media) having instructions stored thereon/in which can be used to program a general purpose or specialized computing processor(s)/device(s) to perform any of the features presented herein. The storage medium can include, but is not limited to, one or more of the following: any type of physical media including floppy disks, optical discs, DVDs, CD-ROMs, microdrives, magneto-optical disks, holographic storage, ROMs, RAMs, PRAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs); paper or paper-based media; and any type of media or device suitable for storing instructions and/or information. The computer program product can be transmitted in whole or in part over one or more public and/or private networks wherein the transmission includes instructions which can be used by one or more processors to perform any of the features presented herein. The transmission may include a plurality of separate transmissions. In accordance with certain embodiments, however, the computer storage medium containing the instructions is non-transitory (i.e. not in the process of being transmitted) but rather is persisted on a physical device.
The foregoing description of the preferred embodiments of the present invention has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. Embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the relevant art to understand the invention. It is intended that the scope of the invention be defined by the following claims and their equivalents.
The present application claims the benefit of U.S. Provisional Patent Application No. 61/437,550, entitled “PUSH REPLICATION IN A DISTRIBUTED DATA GRID,” by Bob Hanckel et al., filed on Jan. 28, 2011, which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
20120197840 A1 | Aug 2012 | US