This application relates generally to data replication for disaster recovery. More particularly, this application relates to techniques for data replication monitoring with streamed data updates.
A replication coordinator 112 operates between the first database 100 and the second database 106. The replication coordinator 112 collects Change Data Capture (CDC) events for the original transactions for the first database 102 and generates write commands to form the replicated transactions 110. Similarly, the replication coordinator 112 collects CDC events for the original transactions for the second database 108 and generates write commands to form the replicated transactions 104.
Prior art systems of the type shown in
In view of the foregoing, there is a need for a non-intrusive real-time verification mechanism that continuously and incrementally monitors replication solutions to ensure full replication of data between databases.
A method implemented in a computer network includes identifying a transactional change data capture event at a transactional database. A transaction event stream is created with metadata characterizing the transactional change data capture event. A replication change data capture event is identified at a replication database corresponding to the transactional database. A replication event stream with metadata characterizing the replication change data capture event is created. The transaction event stream and the replication event stream are evaluated to selectively identify a replication performance failure within a specified time threshold of the replication performance failure.
The invention is more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which:
Like reference numerals refer to corresponding parts throughout the several views of the drawings.
The disclosed technology uses a secondary change data capture (CDC) mechanism to capture transactions being committed on a source database that can be seen by the data replication solution and moved to the target. It also captures transactions that the data replication solution is applying to the target.
It then checks that every transaction committed on the source is also committed on the target, matching the two and reporting the lag between them. If any transaction is not committed on the target within a configurable time period or specified time threshold, that transaction is logged and an administrator is alerted. This solution works for both unidirectional replication and bidirectional active/active replication.
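The check described above can be illustrated with a minimal sketch. It assumes each captured commit carries a transaction identifier shared by source and target, the affected table, and a commit timestamp. The names `CommitEvent` and `check_replication` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitEvent:
    txn_id: str       # hypothetical identifier shared by source and target
    table: str        # table affected by the transaction
    commit_ts: float  # commit time, in seconds


def check_replication(source_events, target_events, threshold_s, now):
    """Return (lags, missing) for source commits compared with target commits.

    lags maps each matched transaction id to its replication lag in seconds.
    missing lists source commits with no target commit after threshold_s.
    """
    target_by_id = {e.txn_id: e for e in target_events}
    lags = {}
    missing = []
    for src in source_events:
        tgt = target_by_id.get(src.txn_id)
        if tgt is not None:
            lags[src.txn_id] = tgt.commit_ts - src.commit_ts
        elif now - src.commit_ts > threshold_s:
            missing.append(src)
        # Otherwise the transaction is still inside its allowed window.
    return lags, missing


source = [
    CommitEvent("t1", "orders", 100.0),
    CommitEvent("t2", "orders", 101.0),
    CommitEvent("t3", "customers", 109.0),
]
target = [CommitEvent("t1", "orders", 100.4)]

lags, missing = check_replication(source, target, threshold_s=5.0, now=110.0)
for txn in missing:
    print(f"ALERT: transaction {txn.txn_id} on table {txn.table} was not replicated in time")
```

In this example, `t1` is matched and its lag is recorded, `t2` is reported because it has exceeded the threshold, and `t3` is still within its window and is neither matched nor reported.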
Advantageously, the disclosed solution provides immediate notification if a transaction is missed. This allows database administrators to fully synchronize the database before operations continue and major problems occur. The solution also delivers root-cause information by indicating exactly which transactions are missing and which tables are affected. The solution enables an enterprise to rely fully on its secondary systems when a primary system fails.
The CDC module 208 collects database transactions and passes them to a first stream processor 220. More particularly, the stream processor 220 generates or accesses metadata regarding a database transaction. The metadata may be of the type discussed above in connection with the tracking tables. The stream processor 220 emits a first database original transaction event stream 222 and a second database replicated transaction event stream 224.
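As a rough illustration of how a stream processor might turn captured transactions into two metadata streams, the sketch below reads hypothetical CDC records and separates original transactions from replicated ones. The record layout, the `applied_by_replicator` flag, and the names `TxnMetadata` and `split_streams` are assumptions made for this example; the description does not specify how the two kinds of transaction are distinguished.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxnMetadata:
    database: str     # database where the CDC record was captured
    txn_id: str       # hypothetical transaction identifier
    table: str        # affected table
    operation: str    # e.g. "INSERT", "UPDATE", "DELETE"
    commit_ts: float  # commit time, in seconds


def split_streams(cdc_records, database):
    """Split CDC records into (original, replicated) metadata streams.

    Only metadata is carried forward; row contents are not copied.
    """
    original, replicated = [], []
    for rec in cdc_records:
        meta = TxnMetadata(
            database=database,
            txn_id=rec["txn_id"],
            table=rec["table"],
            operation=rec["op"],
            commit_ts=rec["commit_ts"],
        )
        # Assumption: the capture layer marks writes applied by the replicator.
        if rec.get("applied_by_replicator", False):
            replicated.append(meta)
        else:
            original.append(meta)
    return original, replicated


records = [
    {"txn_id": "a1", "table": "orders", "op": "INSERT", "commit_ts": 10.0},
    {"txn_id": "b7", "table": "orders", "op": "UPDATE", "commit_ts": 10.2,
     "applied_by_replicator": True},
]
original_stream, replicated_stream = split_streams(records, database="first")
```

Here `original_stream` holds the metadata for the locally committed transaction `a1`, and `replicated_stream` holds the metadata for `b7`, which was applied by the replication solution.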
The same or another stream processor 240 applies rules to corresponding event streams. That is, the stream processor 240 applies rules to the first database original transaction event stream 222 and the first database replicated transaction event stream 230 to selectively identify a replication performance failure as an event report 242. As demonstrated below, a replication performance failure is reported within a specified time threshold.
Similarly, the second database original transaction event stream 228 is evaluated in connection with the second database replicated transaction event stream 224 to selectively identify a replication performance failure.
Observe that the stream processor 240 is operative on metadata “in flight”. The stream processor 240 is an in-memory processor that applies rules and reports events. It is effectively a continuous query engine. A code example is provided below.
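For illustration only, and not as the implementation referenced in this description, an in-memory processor of this kind might be sketched as follows. The class `ReplicationMonitor` and its method names are hypothetical. The sketch assumes that an original event arrives before its replicated counterpart and that both share a transaction identifier.

```python
class ReplicationMonitor:
    """Matches original and replicated transaction events held in memory."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s
        # txn_id -> (table, commit_ts) for originals still awaiting a match
        self.pending = {}

    def on_original(self, txn_id, table, commit_ts):
        """Record an original transaction that should later be replicated."""
        self.pending[txn_id] = (table, commit_ts)

    def on_replicated(self, txn_id, commit_ts):
        """Match a replicated transaction; return a lag report or None."""
        entry = self.pending.pop(txn_id, None)
        if entry is None:
            # Unknown or already reported transactions are ignored here.
            return None
        _table, original_ts = entry
        return ("matched", txn_id, commit_ts - original_ts)

    def on_tick(self, now):
        """Report and drop originals unmatched for longer than the threshold."""
        overdue = [
            (txn_id, table)
            for txn_id, (table, ts) in self.pending.items()
            if now - ts > self.threshold_s
        ]
        for txn_id, _table in overdue:
            del self.pending[txn_id]
        return [("missing", txn_id, table) for txn_id, table in overdue]


monitor = ReplicationMonitor(threshold_s=5.0)
monitor.on_original("t1", "orders", 100.0)
monitor.on_original("t2", "orders", 101.0)
report = monitor.on_replicated("t1", 100.5)
overdue = monitor.on_tick(now=107.0)
```

Only unmatched metadata is held in memory, so state grows with the number of transactions in flight rather than with database size. For bidirectional replication, one such monitor could be run per direction.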
The configuration of
The second machine 304 also includes standard components, such as a central processing unit, input/output devices 332, a bus 334 and a network interface circuit 336. A memory 340 is connected to bus 334. The memory 340 stores a second database 342. The second database 342 may include the elements shown in the second database 210 of
A third machine 348 also includes standard components, such as a central processing unit 350, input/output devices 352, a bus 354 and a network interface circuit 356. A memory 360 is also connected to the bus 354. The memory 360 stores a stream processor 362. Stream processor 362 refers to one or more of stream processors 220, 226 and 240 of
A fourth machine 368 also includes standard components, such as a central processing unit 370, input/output devices 372, a bus 374 and a network interface circuit 376. A memory 378 is connected to the bus 374. The memory 378 stores a replication coordinator 380. The replication coordinator 380 operates as discussed in connection with
The configuration of
The metadata is evaluated 404. Based upon the evaluation, a decision is made to determine whether the original and replication events are within transactional tolerances 406. If so (406—Yes), then control returns to block 400. If not (406—No), a transactional event stream is generated 408 and then control returns to block 400.
An embodiment of the present invention relates to a computer storage product with a non-transitory computer readable storage medium having computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable media include, but are not limited to: magnetic media, optical media, magneto-optical media and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits (“ASICs”), programmable logic devices (“PLDs”) and ROM and RAM devices. Examples of computer code include machine code, such as produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment of the invention may be implemented using JAVA®, C++, or other object-oriented programming language and development tools. Another embodiment of the invention may be implemented in hardwired circuitry in place of, or in combination with, machine-executable software instructions.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications; they thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the following claims and their equivalents define the scope of the invention.
The following code provides an example of an implementation of the disclosed invention. The code includes explanatory documentation.
This application claims priority to U.S. Provisional Patent Application No. 62/222,712 filed Sep. 23, 2015, the contents of which are incorporated herein by reference.