PERFORMANCE OF ROW TABLE TO COLUMNAR TABLE REPLICATION

Information

  • Patent Application
  • 20240004902
  • Publication Number
    20240004902
  • Date Filed
    June 29, 2022
    2 years ago
  • Date Published
    January 04, 2024
    a year ago
  • CPC
    • G06F16/283
    • G06F16/2282
    • G06F16/2379
    • G06F16/2462
  • International Classifications
    • G06F16/28
    • G06F16/22
    • G06F16/23
    • G06F16/2458
Abstract
A method, computer program product, and computer system are provided. Statistics for each columnar table are extracted from an online analytical processing (OLAP) database catalog. A transaction table map is created from transaction log records of an online transaction processing (OLTP) database. The transaction table map includes a counter of log records in a transaction and a timestamp indicating a longevity of the transaction. Based on the counter or the longevity exceeding a predefined threshold, the transaction is transformed, and sent to the OLAP database where it is replayed.
Description
BACKGROUND

The present invention relates to computer systems, and more specifically to improved performance in replicating a row table to a columnar table.


In a row oriented relational database, data is organized in the tables by rows. Each row can be considered a record. This organization is optimized for high concurrency transactional applications (OLTP). In a column oriented database, the data is organized by field, with data associated with a field being kept together. This organization is optimized for fast retrieval of columns of data, as in analytical applications.


However, a business enterprise needs the same data in both row and columnar organization, although not necessarily simultaneously. In order to keep the data in the two database systems concurrent, the business enterprise replicates the row tables from the OLTP database to the columnar tables in the OLAP database warehouse in real time. However, the replay of the row data at the target database tends to be a performance bottleneck, especially during peak hours when transaction activity at the source database is high.


It would therefore be advantageous to provide improved row table to columnar table replication performance.


SUMMARY

A method is provided. Statistics for each columnar table are extracted from an online analytical processing (OLAP) database catalog. A transaction table map is created from transaction log records of an online transaction processing (OLTP) database. The transaction table map includes a counter of log records in a transaction and a timestamp indicating a longevity of the transaction. Based on the counter or the longevity exceeding a predefined threshold, the transaction is transformed, and sent to the OLAP database where it is replayed.


Embodiments are further directed to computer systems and computer program products having substantially the same features as the above-described computer-implemented method.





BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The subject matter that is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:



FIG. 1 illustrates the operating environment of a replication apply system, according to an embodiment of the present invention;



FIG. 2 illustrates a flow chart of replication apply processing from the source database view, in accordance with one or more aspects of the present invention;



FIG. 3 illustrates a transaction that is transformed for streaming;



FIG. 4 illustrates a flow chart of replication apply processing from the target database view, in accordance with one or more aspects of the present invention;



FIG. 5 illustrates applying the transformed transaction; and



FIG. 6 illustrates as-is transactions at the OLAP system.





DETAILED DESCRIPTION

Beginning now with, FIG. 1 an illustration of the operating environment of the replication capture program is presented. The environment 100 may include source OLTP system (OLTP system) 102 and a target OLAP system (OLAP system) 122 interconnected via a communication network 120.


The communication network 120 may include various types of communication networks, such as a wide area network (WAN), local area network (LAN), a telecommunication network, a wireless network, a public switched network and/or a satellite network. The communication network 120 may include connections, such as wire, wireless communication links, or fiber optic cables. It may be appreciated that FIG. 1 provides only an illustration of one implementation and does not imply any limitations with regard to the environments in which different embodiments may be implemented. Many modifications to the depicted environments may be made based on design and implementation requirements.


The OLTP system 102 includes at least one processor 104 and at least one data storage device 106. The OLTP system 102 also includes a memory 101. Data from the data storage device 106, including log records from the database logs 110 and rows from the row oriented database 108 can be loaded into the memory 101 for processing by the replication capture program 112 via the one or more processors 104. Additionally, the memory 101 can include log records from transactions that have not yet completed.


The replication capture program 112 includes a message formatter 114, a message evaluator 116, and a message converter 118. The message formatter 114 sequentially processes each log record in the memory 101, and formats the data in preparation for transmission to the OLAP system 122. The message evaluator 116 uses statistics from the OLAP system 122 to determine whether to transfer the data in a stream or in a sequential load. The message formatter 114 processes each log record in the database logs 110 of the row oriented database 108 and maps each transaction. The message converter 118 creates the table load SQL statements. When the data is ready for replaying at the OLAP system 122, the data is transmitted over the communication network 120. In some embodiments, the replication capture program 112 and the replication apply program 132 are integrated into the software of the row oriented database 108 DBMS and the software of the column oriented database 128 DBMS. In some embodiments, the replication capture program 112 and the replication apply program 132 are provided separately from the respective DBMS software, but are customized according to their respective features.


The OLAP system 122 similarly includes at least one processor 124, and at least one data storage device 126. The OLAP system 122 also includes a system memory 121. Data from the OLTP system 102 is received by the replication apply program 132, and is subsequently formatted by the apply formatter 134 prior to being replayed at the columnar oriented database 128. As it is applied to the columnar oriented database 128, the data is written to the database logs 134 at the target.



FIG. 2 illustrates a flow chart of replication apply processing from the source database view.


At startup, the message evaluator 116 of the replication capture program 112 receives statistic information for each columnar table from the OLAP system 122. The statistics include optimizer information, such as most efficient retrieval paths, e.g., sequential access or access by index, for various SQL statements. These statistics are stored in the DBMS catalog and can be retrieved using SQL statements. The OLAP system 122 can provide the statistics in several ways. For example, the OLTP 102 can poll the OLAP system 122 at predetermined intervals. Similarly, the OLAP 122 can periodically push the statistics to the OLTP 102 at predetermined intervals, or when the statistics are updated. At 205 the message formatter 114 traverses the log records in the memory 101 of the OLTP system 102. Log records in memory are typically those which belong to an open transaction, i.e., a transaction that has not yet completed, either through a commit or a rollback.


At 210, for each log record found, the message formatter 114 locates the operation field in the log record for an indication that the log record is either an insert operation, an update operation, or a delete operation (IUD) (215). If the log record is not an IUD operation, and is not a commit log record (220), the message formatter 114 proceeds to examine the next record for the same criteria.


If, at 215, the log record is an IUD record, then the message formatter 114 adds the log record to the table map corresponding to the transaction (225). The table map records the log records for each transaction, including their timestamps, transaction identifiers, and operation, to determine which transactions can be converted for transmission to the OLAP system 122. The message formatter 114 uses the timestamps to track the transaction longevity, i.e., the elapsed time since the start of the transaction. The table map also includes a table row counter to track how many log records are part of the current transaction.


At 245, the message formatter 114 identifies a transaction as qualified or as-is. If the table row counter or the transaction longevity exceeds a predefined threshold without the transaction having issued a commit or rollback, it is a long transaction (qualified). At 250 the replication capture program 112 invokes the message converter 118 to compose external table load SQL statements with their data portion for the current running transaction. This action is taken to prevent latency during replaying the transaction records at the OLAP 122 from causing performance degradation at the OLTP system 102. At 255, the transaction is sent to the OLAP 122.


Returning now to 220, if the current log record is a commit record, then the replication capture program 112 invokes the message evaluator 116 (230). The message evaluator 116 maintains the statistics extracted from the OLAP 122. For each columnar table, the message evaluator 116 uses the statistics to compare the overhead of the row operations to an external table load (235).


If at 235, the transaction can be transformed into a stream, then at 250 the replication capture program 112 invokes the message converter 118 to compose external table load SQL statements with their data portion for the current running transaction. At 255, the transaction is sent to the OLAP 122. However, if at 235 the transaction does not qualify for transformation, then the transaction is not qualified and is sent as-is to the OLAP system 122.



FIG. 3 illustrates an example of log records from the database log of the OLTP system 102 (310), and a transaction transformed for streaming to the OLAP system 122 (320). Although FIG. 3 shows the example of “Tx1” it should be understood that log records associated with several different transactions can be interleaved in the database log. The message formatter 116 separates the log records according to transaction identifier and timestamp within transaction identifier, and builds a separate transaction table map for each transaction.


In 310, the transaction, “Tx1” includes a series of individual insert statements, along with the associate table name “Tab1,” columns into which data will be inserted, the values of the data to be inserted, and a timestamp associated with the record creation. Finally, the transaction ends with a commit operation.


In 320, the message converter 118 transformed the transaction “Tx1” into an external table load streaming transaction, “Tx1′,” that will be transmitted to the OLAP system 122. “Tx1′” now comprises one insert statement that includes each data value from “Tx1” delimited by a comma “,”. As in 310, the transaction “Tx1′” ends with a commit operation.


Although not shown in an example, update operations are transformed into a series of delete and insert operations.


For as-is transactions, the 320 transformation does not occur. Instead, the individual transaction records, shown in 310, are transmitted.



FIG. 4 illustrates a flow chart of replication apply processing from the target database view.


At 405, the apply formatter 134 at the OLAP system 122 receives the transaction records from the OLTP system 102.


At 410, for each of the received transaction records, if the record is transformed (415), the apply formatter 134 creates a named pipe for each external load statement in each streaming transaction. The named pipe is uniquely identified, for example, by a combination of transaction identifier from the transaction and table identifier from the DBMS catalog (425). At 445 the apply formatter 134 parses the received streamed transaction. The delimited data (325) is fed to the named pipe created in the OLAP system 122. The named pipe replaces the place holders in the insert statement. The apply formatter 134 then executes the insert statement. At completion, the named pipe is removed.


Returning now to 415, for as-is transactions, the individual transaction records are micro batched in memory and executed on a table basis (420).



FIG. 5 illustrates applying the transformed transaction by the replication apply program 132. The received transformed transaction is shown as 510, and includes the transaction identifier, “Tx1′”, the insert statement indicating the use of an external load, and the individual fields which are substituted into the insert statement. The named pipe is shown as fifo.Tx1.Tab1 (520). The actual insert statement is shown as 525, with the OLAP system 122 shown as 530.



FIG. 6 illustrates as-is transactions at the OLAP system 122. Transaction records for each table are indicated by 610. Each of the transactions, “Txa”, “Txb”, and “Txc” includes insert statements for different tables. For example, “Txa” includes insert statements for “Taba”, “Tabb”, and “Tabc”. The apply formatter 134 sorts each insert statement by table, referred to as micro batching, resulting in one transaction, “Txa′” (620). This may tend to reduce I/O operations and overhead at the DBMS.


The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, element components, and/or groups thereof.


The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.


As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.


Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.


A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.


A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.


Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire line, optical fiber cable, RF, etc., or any suitable combination of the foregoing.


Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).


Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.


These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.


The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.


The flow diagrams depicted herein are just one example. There may be many variations to this diagram or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.


While the preferred embodiment to the invention had been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be constructed to maintain the proper protection for the invention first described.

Claims
  • 1. A method, comprising: extracting statistics for each columnar table from an online analytical processing (OLAP) database catalog;building a transaction table map from transaction log records of an online transaction processing (OLTP) database, wherein the transaction table map includes a counter of log records in a transaction and a timestamp indicating a longevity of the transaction;based on the counter or the longevity exceeding a predefined threshold, transforming the transaction;sending the transformed transaction to the OLAP database; andreplaying the transformed transaction at the OLAP.
  • 2. The method of claim 1, wherein the statistics identify a transaction that is a candidate for transformation.
  • 3. The method of claim 1, wherein the transforming the transaction further comprises: creating an external table load statement, comprising an insert operation, an identifier of a data delimiter, a placeholder naming an external table; andfollowing the external table load statement, inserting a plurality of data delimited data values, and a commit statement.
  • 4. The method of claim 1, further comprising: receiving the transformed transaction at the OLAP;creating a temporary external table file, into which is loaded the plurality of data delimited data values;replacing the placeholder with a name of the temporary external table file;creating a uniquely named pipe; andexecuting the external table load statement.
  • 5. The method of claim 1, wherein a transaction not exceeding the counter or the longevity is an as-is transaction.
  • 6. The method of claim 1, wherein the log records of an as-is transaction are sent directly to the OLAP database without transformation.
  • 7. The method of claim 1, further comprising: sorting, at the OLAP database, the log records of an as-is transaction are by table and timestamp within table; andreplaying the log records at the OLAP database.
  • 8. A computer program product, the computer program product comprising a non-transitory tangible storage device having program code embodied therewith, the program code executable by a processor of a computer to perform a method, the method comprising: extracting statistics for each columnar table from an online analytical processing (OLAP) database catalog;building a transaction table map from transaction log records of an online transaction processing (OLTP) database, wherein the transaction table map includes a counter of log records in a transaction and a timestamp indicating a longevity of the transaction;based on the counter or the longevity exceeding a predefined threshold, transforming the transaction;sending the transformed transaction to the OLAP database; andreplaying the transformed transaction at the OLAP.
  • 9. The computer program product of claim 8, wherein the statistics identify a transaction that is a candidate for transformation.
  • 10. The computer program product of claim 8, wherein the transforming the transaction further comprises: creating an external table load statement, comprising an insert operation, an identifier of a data delimiter, a placeholder naming an external table; andfollowing the external table load statement, inserting a plurality of data delimited data values, and a commit statement.
  • 11. The computer program product of claim 8, further comprising: receiving the transformed transaction at the OLAP;creating a temporary external table file, into which is loaded the plurality of data delimited data values;replacing the placeholder with a name of the temporary external table file;creating a uniquely named pipe; andexecuting the external table load statement.
  • 12. The computer program product of claim 8, wherein a transaction not exceeding the counter or the longevity is considered as-is.
  • 13. The computer program product of claim 8, wherein the log records of an as-is transaction are sent directly to the OLAP database without transformation.
  • 14. The computer program product of claim 8, further comprising: sorting, at the OLAP database, the log records of an as-is transaction are by table and timestamp within table; andreplaying the log records at the OLAP database.
  • 15. The computer program product of claim 8, wherein the enhanced compiler executable code output is based on granularity of inner computation unit requirements of individual threads, and wherein the enhanced scheduler dispatches based on granularity of the thread.
  • 16. A computer system, comprising: one or more processors;a memory coupled to at least one of the processors;a set of computer program instructions stored in the memory and executed by at least one of the processors in order to perform actions of: extracting statistics for each columnar table from an online analytical processing (OLAP) database catalog;building a transaction table map from transaction log records of an online transaction processing (OLTP) database, wherein the transaction table map includes a counter of log records in a transaction and a timestamp indicating a longevity of the transaction;based on the counter or the longevity exceeding a predefined threshold, transforming the transaction;sending the transformed transaction to the OLAP database; andreplaying the transformed transaction at the OLAP.
  • 17. The computer system of claim 16, wherein the statistics identify a transaction that is a candidate for transformation.
  • 18. The computer system of claim 16, wherein the transforming the transaction further comprises: creating an external table load statement, comprising an insert operation, an identifier of a data delimiter, a placeholder naming an external table; andfollowing the external table load statement, inserting a plurality of data delimited data values, and a commit statement.
  • 19. The computer system of claim 16, further comprising: receiving the transformed transaction at the OLAP;creating a temporary external table file, into which is loaded the plurality of data delimited data values;replacing the placeholder with a name of the temporary external table file;creating a uniquely named pipe; andexecuting the external table load statement.
  • 20. The computer system of claim 16, wherein: a transaction not exceeding the counter or the longevity is an as-is transaction;the log records of an as-is transaction are sent directly to the OLAP database without transformation, sorted by table and timestamp within the transaction, and replayed at the OLAP database.