1. Field of the Invention
The present invention relates to a transaction processing system and method that provide the capability to use one two-phase commit engine over multiple protocols and products, while being able to vary the log presumptions according to the requirements of each protocol and product.
2. Description of the Related Art
Many commercial transaction-processing systems rely on the two-phase commit protocol to preserve ACID properties in a unit of work that spans multiple resource managers. The two-phase commit protocol requires a transaction manager, or coordinator, and the resource managers to exchange a series of messages that are defined by a voting process. This voting process is performed in two phases. In the first phase, resource managers are asked to checkpoint their view of the work scoped by the transaction and indicate whether they were successful. In the second phase, the coordinator communicates whether or not the resource managers should make permanent and visible the changes that were checkpointed.
For this process to provide ACID guarantees in the face of a failure, the coordinator must log information to a non-volatile store (typically disk) that can be used to recover from a coordinator failure. This adds further overhead to the process. To improve performance, presumptions may be made about the protocol to minimize the number of log writes that must be made. These are well-understood optimizations and include “presumed abort” and “presumed commit” assumptions, under which the absence of certain log messages indicates that the resource manager may presume that the transaction was aborted or committed, respectively. The full protocol without presumptions is sometimes said to “presume nothing”.
Different transaction models that include reifications of the protocol require, for interoperability, that the coordinator maintain fixed recovery semantics. The conventional solution is for a coordinator to include presumptions explicitly in the two-phase commit algorithm. This is inflexible, especially given the increasing requirements for interoperability in various software architectures, the most directly relevant being transaction processing protocols. It is desirable to be able to use the same two-phase commit coordinator in transaction managers that support different variations (based on presumptions) of the atomic transaction model. A need arises for a technique by which the log presumptions may be varied.
The present invention provides a novel way to achieve variability of presumptions. This provides the capability to use one two-phase commit engine over multiple variations of the atomic transaction protocol and products, while being able to vary the log presumptions according to the requirements of each protocol variant and product.
In one embodiment of the present invention, a transaction processing system comprises a coordinator operable to receive a commit or abort request for a transaction and to perform a two-phase commit protocol on behalf of the transaction, and a transaction logging unit operable to write to a log information pertaining to a transaction for use upon recovery of the transaction from a failure, wherein the logging presumption is selectable from among a plurality of logging presumptions. The log presumption mechanism may comprise a strategy pattern operable to select a log presumption from among a plurality of log presumptions. The plurality of log presumptions may comprise a log presumption for recovery protocols which make no presumption about a state of the transaction, a log presumption for recovery protocols which presume that the transaction aborted if no information to the contrary exists, and a log presumption for recovery protocols which presume that the transaction committed if no information to the contrary exists. The recovery protocols mentioned may include both the transaction processing system's own recovery manager and the recovery processes initiated by the resources which participate in the transaction processing system's transactions.
If the selected log presumption is for recovery protocols which make no presumption about a state of the transaction, the transaction logging unit may be operable to write to a log information relating to the transaction when the transaction is created, to write to a log information relating to the transaction after a state change of the two-phase protocol, and to purge the information relating to the transaction from the log when the transaction is completed. If the selected log presumption is for recovery protocols that presume that the transaction aborted if no information to the contrary exists, the transaction logging unit may be operable to write to a log information relating to the transaction only when the coordinator has begun a committing phase of the two-phase commit protocol.
Variations in presumptions of the two phase commit protocol are introduced into the two phase commit algorithm without modifications to the code that implements the algorithm itself. This is achieved by decoupling the specifics of logging behind a Strategy Pattern implementation. A Strategy Pattern is a specific software design technique used to define a family of algorithms, encapsulate each one, and make them interchangeable. This invention induces logging behavior at each state change in the transaction by signaling the presumption strategy implementation. The specific details of the presumption are wholly encapsulated by the strategy implementation: the algorithm for the two phase commit protocol is completely agnostic about the logging strategy that is executed on behalf of the coordinator.
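The decoupling described above may be illustrated by the following minimal sketch. It is not the implementation of the invention: the interface name `PresumptionStrategy`, the method `on_state_change`, the `Coordinator` class, the state names, and the in-memory dictionary standing in for the non-volatile log are all assumptions made for illustration. Only the logging rules themselves (log at creation, after each state change, and purge at completion for the presume-nothing case; log only once the committing phase has begun for the presumed-abort case) are taken from the description.

```python
from abc import ABC, abstractmethod


class PresumptionStrategy(ABC):
    """Hypothetical interface signaled by the coordinator on every state change."""

    def __init__(self):
        # Stand-in for the non-volatile log: transaction id -> last logged state.
        self.log = {}

    @abstractmethod
    def on_state_change(self, tx_id, state):
        """Decide what, if anything, to write to the log for this state change."""


class PresumedNothingStrategy(PresumptionStrategy):
    def on_state_change(self, tx_id, state):
        if state == "completed":
            self.log.pop(tx_id, None)  # purge when the transaction completes
        else:
            self.log[tx_id] = state    # log at creation and after each state change


class PresumedAbortStrategy(PresumptionStrategy):
    def on_state_change(self, tx_id, state):
        # Write only once the coordinator has begun the committing phase.
        # Purging is not described for this presumption, so it is omitted here.
        if state == "committing":
            self.log[tx_id] = state


class Coordinator:
    """Walks a transaction through its states; knows nothing about logging rules."""

    STATES = ("created", "preparing", "committing", "completed")

    def __init__(self, strategy):
        self.strategy = strategy

    def run(self, tx_id, stop_after=None):
        # stop_after simulates a failure immediately after the named state.
        for state in self.STATES:
            self.strategy.on_state_change(tx_id, state)
            if state == stop_after:
                break
```

In this sketch, `Coordinator(PresumedAbortStrategy()).run("t1", stop_after="preparing")` leaves the log empty, whereas the same call with `PresumedNothingStrategy` leaves a record for `"t1"`. The `Coordinator` code is identical in both cases; only the strategy object differs.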
Dynamic selection of the strategy based on the capabilities of transaction participants can be used to create self-optimizing transaction coordinators. In this system, the transaction manager examines registered resource managers and automatically creates the most efficient presumption strategy for the two phase commit algorithm that the participants can tolerate without jeopardizing ACID guarantees for the unit of work.
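One possible way to perform such a selection is sketched below. The description does not specify how participant capabilities are represented or how strategies are ranked, so the representation (each resource manager declares the set of presumptions it tolerates), the function name `select_presumption`, and the cheapest-first ordering in `PREFERENCE` are all assumptions made for illustration.

```python
# Assumed ranking, cheapest first. That "presumed_nothing" is the most expensive
# follows from the description (presumptions reduce log writes); the relative
# order of the other two is an assumption of this sketch.
PREFERENCE = ("presumed_abort", "presumed_commit", "presumed_nothing")


def select_presumption(resource_managers):
    """Pick the cheapest presumption that every registered participant tolerates.

    resource_managers: dict mapping a resource manager name to the set of
    presumption names it can tolerate (hypothetical representation).
    """
    common = set(PREFERENCE)
    for supported in resource_managers.values():
        common &= set(supported)
    for name in PREFERENCE:
        if name in common:
            return name
    raise ValueError("no presumption is tolerated by every participant")
```

For example, if one participant tolerates `{"presumed_abort", "presumed_nothing"}` and another tolerates only `{"presumed_nothing"}`, the function returns `"presumed_nothing"`, since that is the only presumption common to both.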
a is an exemplary data flow diagram of a two-phase commit protocol.
b is an exemplary data flow diagram of a two-phase commit protocol.
An exemplary system architecture 100, in which the present invention may be implemented, is shown in
Backend servers 106 include a plurality of servers, such as backend business application 120 and database management systems 122 and 124. Database management systems (DBMSs) are software that enables storing, modifying, and extracting information from a database. There are many different types of DBMSs, ranging from small systems that run on personal computers to huge systems that run on mainframes. Examples of database applications include:
computerized library systems
automated teller machines
flight reservation systems
computerized parts inventory systems
From a technical standpoint, DBMSs can differ widely. The terms relational, network, flat, and hierarchical all refer to the way a DBMS organizes information internally. The internal organization can affect how quickly and flexibly you can extract information.
An exemplary data flow diagram of a two-phase commit protocol 200 is shown in
A resource manager is a term used to describe the role of system components that manage the operation of resources, such as DBMSs. A resource is a term used to describe an item that is managed by a resource manager, such as a database managed by a DBMS. The terms “resource manager” and “resource” are used to broaden the description of the system components that are used in the two-phase commit protocol because, when a transaction commits, all of the shared resources it accesses need to get involved in the commitment activity, not just databases. Nondatabase resources include recoverable scratch pad areas, queues, and other communications systems.
The two-phase commit protocol makes the following assumptions about each transaction T:
Transaction T accesses resources from time to time. If it experiences a serious error at any time, such as a deadlock or illegal operation, it issues an abort operation. If it terminates normally without any errors, it issues a commit. In response to the commit, the system runs the two-phase commit protocol.
Each resource manager can commit or abort its part of T, that is, permanently install or undo T's operations that involve this resource manager. Thus, each resource manager typically has a transactional recovery system.
One and only one program issues the commit operation on T. That is, one program decides when to start committing T by running the two-phase commit protocol, and no other program will later start running the protocol on T independently. In some cases, a second attempt to run two-phase commit while the first attempt is still running will cause the protocol to break, that is, will cause it to commit at one resource manager and abort at another. The protocol can be programmed to cope with concurrent attempts to run two-phase commit, but we assume it does not happen.
Transaction T has terminated executing at all resource managers before issuing the commit operation. In general, this can be hard to arrange. If the transaction does all of its communications using RPC, then it can ensure T has finished processing at all resource managers by waiting for all of those calls to return, provided that each resource manager finishes all of the work it was asked to do before returning from the call. If T uses other communications paradigms, such as peer-to-peer, then it has to ensure by some other means that T terminated. For example, the well-known LU6.2 protocol carefully dovetails two-phase commit with the transaction termination protocol. This assumption allows us to avoid dealing with the complexity of transaction termination here.
Every system and resource manager fails by stopping. That is, the protocol does not make mistakes when its system or a resource manager malfunctions. It either does exactly what the protocol says it should do, or it stops running. In general, it is possible for a failure to cause the protocol to do something that is inconsistent with the specification, such as sending bogus messages; such failures fall outside this assumption.
A participant P is said to be prepared if all of transaction T's after-images at P are in stable storage. It is essential that T does not commit at any participant until all participants are prepared. The reason is the force-at-commit rule, which says not to commit a transaction until the after-images of all of its updates are in stable storage. To see what goes wrong if you break the rule, suppose one participant, P1, commits T before another participant, P2, is prepared. If P2 subsequently fails, before it is prepared and after P1 commits, then T will not be atomic. T has already committed at P1, and it cannot commit at P2 because P2 may have lost some of T's updates when it failed. On the other hand, if P2 is prepared before P1 commits, then it is still possible for T to be atomic after P2 fails. When P2 recovers, it still has T's updates in stable storage (because it was prepared before it failed). After it recovers and finds out that T committed, it too can finish committing T.
Ensuring that all participants are prepared before any of them commits is the essence of two-phase commit. Phase 1 is when all participants become prepared. Phase 2 is when they commit. No participant enters phase 2 until all participants have completed phase 1, that is, until all participants are prepared.
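The rule that no participant enters phase 2 until every participant has completed phase 1 may be illustrated by the following simplified sketch. It is in-process and omits messaging, logging, and failure handling; the `Participant` class, its `can_prepare` flag, and the function `two_phase_commit` are hypothetical names introduced only for illustration.

```python
class Participant:
    """Hypothetical participant; 'prepared' stands for after-images in stable storage."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self):
        # Phase 1: become prepared and vote yes, or vote no.
        if self.can_prepare:
            self.state = "prepared"
        return self.can_prepare

    def commit(self):
        # Guard the invariant: a participant may commit only after preparing.
        if self.state != "prepared":
            raise RuntimeError(f"{self.name} asked to commit before being prepared")
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: every participant is asked to prepare before anyone commits.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if all voted yes; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

With two participants that both prepare successfully, the function returns `"committed"` and both end in the `committed` state. If either is constructed with `can_prepare=False`, the function returns `"aborted"` and neither participant commits.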
The protocol proceeds as follows:
The literature on object-oriented systems was influenced in dramatic ways by the publication of Design Patterns by Gamma et al. in the mid-90s. This book cataloged a number of reusable implementation techniques for object-based systems. Typically, patterns promote flexibility and abstraction. One of the patterns documented by Gamma et al. was the Strategy Pattern.
The Strategy Pattern provides the capability to define a family of algorithms, encapsulate each one, and make them interchangeable. Strategy lets the algorithm vary independently from clients that use it. The present invention employs the strategy pattern to create statically or dynamically pluggable presumptions about the recovery semantics of the two phase commit algorithm. The algorithm for the protocol is completely decoupled from the log presumptions. This allows the coordinator to be used in different transaction models, which means that multiple products may leverage the same coordination infrastructure without any changes to the core two phase commit engine.
An example of the strategy pattern 300 for pluggable presumptions about the recovery semantics of the two phase commit algorithm is shown in
When the PresumedNothingStrategy object 306 is active as the log presumption, the status of an incomplete or recoverable transaction, or of a resource that is part of such a transaction, is undeterminable without some information about the transaction being present in the logs. In other words, the lack of information about the transaction in the logs is treated as though the transaction had not ever existed or had completed successfully. The scenario where such information necessary for recovery is missing is considered a catastrophic failure, as no consistent recourse can be guaranteed. The processing performed in this presumption is shown in
Returning now to
When the PresumedCommitStrategy object 310 is active as the log presumption, an algorithm is used that interprets a lack of knowledge about a transaction or a lack of a log of the transaction as meaning that the transaction has committed successfully. This presumption is rarely, if ever, used in commercial practice.
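The way each presumption interprets a missing log record at recovery time may be summarized by the following sketch. The `Outcome` enumeration, the string keys naming the presumptions, and the function `recover_outcome` are assumptions introduced for illustration; the mapping itself follows the description (no presumption: undeterminable; presumed abort: aborted; presumed commit: committed).

```python
from enum import Enum


class Outcome(Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"
    UNKNOWN = "unknown"  # status cannot be determined from the log


# What the absence of a log record is taken to mean under each presumption.
PRESUMED_WHEN_MISSING = {
    "presumed_nothing": Outcome.UNKNOWN,
    "presumed_abort": Outcome.ABORTED,
    "presumed_commit": Outcome.COMMITTED,
}


def recover_outcome(presumption, log, tx_id):
    """Return the logged outcome if one exists, else the presumed outcome.

    log: dict mapping transaction id -> Outcome (stand-in for the real log).
    """
    if tx_id in log:
        return log[tx_id]
    return PRESUMED_WHEN_MISSING[presumption]
```

For instance, `recover_outcome("presumed_abort", {}, "t1")` yields `Outcome.ABORTED`, while the same call with `"presumed_nothing"` yields `Outcome.UNKNOWN`, reflecting that recovery cannot proceed without logged information in that case.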
An exemplary block diagram of a database management system 500, in which one or more database management systems may be implemented, is shown in
Input/output circuitry 504 provides the capability to input data to, or output data from, database system 500. For example, input/output circuitry may include input devices, such as keyboards, mice, touchpads, trackballs, scanners, etc., output devices, such as video adapters, monitors, printers, etc., and input/output devices, such as modems, etc. Network adapter 506 interfaces database system 500 with Internet/intranet 510. Internet/intranet 510 may include one or more standard local area networks (LANs) or wide area networks (WANs), such as Ethernet, Token Ring, the Internet, or a private or proprietary LAN/WAN.
Memory 508 stores program instructions that are executed by, and data that are used and processed by, CPU 502 to perform the functions of system 500. Memory 508 may include electronic memory devices, such as random-access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), flash memory, etc., and electro-mechanical memory, such as magnetic disk drives, tape drives, optical disk drives, etc., which may use an integrated drive electronics (IDE) interface, or a variation or enhancement thereof, such as enhanced IDE (EIDE) or ultra direct memory access (UDMA), or a small computer system interface (SCSI) based interface, or a variation or enhancement thereof, such as fast-SCSI, wide-SCSI, fast and wide-SCSI, etc., or a fiber channel-arbitrated loop (FC-AL) interface.
In the example shown in
DBMS routines 514 provide the functionality of the DBMS in which the present invention is implemented, including low-level database management functions, such as those that perform accesses to the database and store or retrieve data in the database. Such functions are often termed queries and are performed by using a database query language, such as Structured Query Language (SQL). SQL is a standardized query language for requesting information from a database. DBMS routines 514 include presumption routines 524, which implement the recovery presumption mechanism. Database kernel 516 provides overall DBMS functionality. Operating system 518 provides overall system functionality.
As shown in
It is important to note that while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and a variety of forms and that the present invention applies equally regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media, such as a floppy disc, a hard disk drive, RAM, and CD-ROMs, as well as transmission-type media, such as digital and analog communications links.
Although specific embodiments of the present invention have been described, it will be understood by those of skill in the art that there are other embodiments that are equivalent to the described embodiments. Accordingly, it is to be understood that the invention is not to be limited by the specific illustrated embodiments, but only by the scope of the appended claims.
Number | Date | Country | |
---|---|---|---|
20060174224 A1 | Aug 2006 | US |