1. Field
Embodiments of the invention relate to reorganizing data with update activity.
2. Description of the Related Art
A Relational DataBase Management System (RDBMS) uses relational techniques for storing and retrieving data in a relational database. Relational databases are computerized information storage and retrieval systems. Relational databases are organized into tables that consist of rows and columns of data. The rows may be called “tuples”, “records” or “rows”. A database typically has many tables, and each table typically has multiple records and multiple columns.
RDBMS software may use a Structured Query Language (SQL) interface. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
Customer business and application environments emphasize the requirement for continuous data availability. However, data may need to be reorganized for performance reasons, because of metadata changes, or to reclaim physical space. As a result, data reorganization utilities provide the capability to reorganize data while maintaining near-full update activity against the data, and this capability may also be referred to as “online REORG”. An update activity may be described as an insert activity that inserts a data object into a database, a delete activity that deletes a data object from the database, or an update activity that modifies a data object in the database. A data object may be described as some element of the database (e.g., a row or a large object (“LOB”)). In particular, when an original data set is to be reorganized, data from the original data set is copied to form a shadow data set (i.e., a copy of the original data set) so that the shadow data set may be reorganized while the original data set is being accessed. During this copy operation, other changes may have been received, and information on these changes is stored in an update log. The update log may be described as storing information on update activity for a database. The original data set may include all of the changes in the update log, but some of these updates may have been missed when the original data set was copied to the shadow data set. The update log is therefore scanned, and updates to the shadow data set are generated using the data stored in the update log. For a short period, updates to the data in the original data set being reorganized are denied to allow a log read process to complete reading the update log and updating the shadow data set, and then all access is denied as the shadow and original data sets are switched. Once the switch is done, the reorganization is completed.
There are several drawbacks to this solution. For example, this solution is complex and relies on regenerating data updates from an update log. Also, this solution requires logging of the actual data, so the solution may not work with structures for which logging of data is disabled, such as for Large Object (LOB) table spaces when no logging has been specified. Also, the solution requires a mapping table to map data entries in an original data set structure for the original data set to data entries in a shadow data set structure for the shadow data set.
Thus, there is a need in the art for an improved solution for reorganizing data with update activity that may be used for situations in which logging is enabled and situations in which logging is disabled.
Provided are a method, computer program product, and system for reorganizing data. Data is retrieved from an original data set and inserted into a shadow data set. A log record is read from an update log, wherein the log record includes a unique key identifying a data object and an indication of an activity associated with that data object. The activity associated with the data object is performed by determining whether the unique key is found in a shadow index for the shadow data set.
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
In the following description, reference is made to the accompanying drawings which form a part hereof and which illustrate several embodiments of the invention. It is understood that other embodiments may be utilized and structural and operational changes may be made without departing from the scope of the invention.
The server computer 120 includes system memory 122, which may be implemented in volatile and/or non-volatile devices. System memory 122 stores a data store manager 130 (e.g., a Relational DataBase Management System (RDBMS)) and one or more server applications 140. The data store manager 130 includes a data reorganizer 132 and may include one or more other components 134. These computer programs that are stored in system memory 122 are executed by a processor (e.g., a Central Processing Unit (CPU)) (not shown). The server computer 120 provides the client computer 100 with access to data in a data store 170.
In alternative embodiments, the computer programs may be implemented as hardware, software, firmware or a combination of any of these.
The client computer 100 and server computer 120 may comprise any computing device known in the art, such as a server, mainframe, workstation, personal computer, hand held computer, laptop, telephony device, network appliance, etc.
The network 190 may comprise any type of network, such as, for example, a Storage Area Network (SAN), a Local Area Network (LAN), Wide Area Network (WAN), the Internet, an Intranet, etc.
The data store 170 may comprise an array of storage devices, such as Direct Access Storage Devices (DASDs), Just a Bunch of Disks (JBOD), Redundant Array of Independent Disks (RAID), virtualization device, etc.
Embodiments provide the ability to reorganize data, including structures (e.g., relational tables), for which no actual data is logged during insert, update or delete activity, while providing data availability.
Thus, embodiments provide a unique key for each data object, and the key is logged when update activity occurs against the data entry. Unlike conventional solutions, with embodiments, there is no requirement for the logging of the data of the data object.
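By way of illustration only, a key-only log record of the kind described above might be sketched as follows. The names `Activity`, `LogRecord`, and `log_activity`, and the key value `lob-0001`, are hypothetical and are not taken from any embodiment; the sketch merely shows that a record can identify a data object and an activity without carrying the data of the data object.

```python
from dataclasses import dataclass
from enum import Enum


class Activity(Enum):
    """Kinds of update activity that may appear in the update log."""
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical key-only log record: it names the data object and the
    activity, but deliberately has no field for the object's data."""
    key: str
    activity: Activity


def log_activity(update_log: list, key: str, activity: Activity) -> None:
    """Append a key-only record; the contents of the data object are not logged."""
    update_log.append(LogRecord(key, activity))


# Example: two activities against the same (hypothetical) data object.
update_log = []
log_activity(update_log, "lob-0001", Activity.INSERT)
log_activity(update_log, "lob-0001", Activity.DELETE)
```

Because the record holds only a key and an activity, such a log may be written even for structures for which logging of data is disabled.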
With embodiments, existing data is retrieved (or “extracted”) from the original data set and inserted into a shadow data set. The update log is then read. Each log record encountered for insert activity or update activity results in the corresponding data entry being retrieved from the original data set and inserted into the shadow data set, with the old entry in the shadow data set being discarded, if necessary. Each log record encountered for delete activity results in the corresponding data entry being deleted from the shadow data set.
Embodiments retrieve the necessary data from the original data set as the update log is processed. If an entry cannot be found in the original data set, then it is ignored because the assumption is that it has been subsequently deleted. In certain embodiments, activities found on the update log may or may not already be reflected in the shadow data set. However, at the end of the data reorganization, the original data set and shadow data set are consistent.
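The processing described above may be sketched, purely as an illustration and under simplifying assumptions, as follows. The data sets are modeled as in-memory dictionaries keyed by the unique key, the update log is a list of (key, activity) pairs, and the function names `copy_to_shadow` and `apply_update_log` are hypothetical. The sketch omits the actual reorganization of the shadow data set, concurrency control, and the final switch.

```python
def copy_to_shadow(original: dict) -> dict:
    """Extract existing data from the original data set into a shadow data set."""
    return dict(original)


def apply_update_log(original: dict, shadow: dict, update_log: list) -> dict:
    """Replay key-only log records against the shadow data set.

    The data itself is always fetched from the original data set, never
    from the log.
    """
    for key, activity in update_log:
        if activity in ("insert", "update"):
            if key not in original:
                # Not found in the original: ignored, on the assumption
                # that the entry was subsequently deleted.
                continue
            # Replaces (discards) any old entry in the shadow data set.
            shadow[key] = original[key]
        elif activity == "delete":
            shadow.pop(key, None)  # tolerate entries the shadow never had
        else:
            raise ValueError(f"unknown activity: {activity!r}")
    return shadow


# Example run: changes arrive after the initial copy has been taken.
original = {"a": "A1", "b": "B1"}
shadow = copy_to_shadow(original)

update_log = []
original["a"] = "A2"; update_log.append(("a", "update"))
original["c"] = "C1"; update_log.append(("c", "insert"))
del original["b"];    update_log.append(("b", "delete"))
original["d"] = "D1"; update_log.append(("d", "insert"))
del original["d"];    update_log.append(("d", "delete"))

apply_update_log(original, shadow, update_log)
```

In this run the insert of `d` is skipped because `d` is no longer present in the original data set when its log record is processed, and the later delete record for `d` finds nothing to remove; the two data sets nevertheless end up with the same contents.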
Embodiments rely on update log records uniquely identifying each data entry being updated, inserted or deleted, but do not require the data itself to be present in the update log. For example, embodiments provide a REORG SHRLEVEL CHANGE solution for large objects (LOBs) in a DB2® for z/OS® system (available from International Business Machines Corporation). The indication of CHANGE in the REORG SHRLEVEL CHANGE solution indicates that the reorganization described herein is to be used.
In block 508, the data reorganizer 132 determines whether all log records have been selected. Each log record includes a unique key identifying a data object and an indication of an activity associated with that data object. In certain embodiments, each log record does not include data associated with the data object. If so, processing continues to block 532 (
In block 514, the data reorganizer 132 determines whether a unique key for the log record is found in a shadow index (i.e., an index to the shadow data set). If so, processing continues to block 516, otherwise, processing continues to block 520. In block 516, the data reorganizer 132 retrieves the data object from the original data set. In block 518, the data reorganizer 132 inserts the data object into the shadow data set. From block 518 (
In certain embodiments, such as those in which a data object is a LOB, when the data object is to be updated, the data object is deleted and then a modified data object is inserted.
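As an illustrative sketch only, an update performed as a delete followed by an insert might produce two key-only log records. The function name `update_lob`, the dictionary model of the data set, and the assumption that the modified data object reuses the same unique key are all hypothetical and are not stated by the embodiments.

```python
def update_lob(data_set: dict, update_log: list, key: str, new_value: bytes) -> None:
    """Model an update as a delete of the old object plus an insert of the
    modified object, logging only the key and the activity each time."""
    # Step 1: delete the existing data object.
    del data_set[key]
    update_log.append((key, "delete"))
    # Step 2: insert the modified data object (assumed here to reuse the key).
    data_set[key] = new_value
    update_log.append((key, "insert"))


original = {"lob-7": b"old contents"}
update_log = []
update_lob(original, update_log, "lob-7", b"new contents")
```

Under these assumptions, a later pass over the update log would first remove the stale entry from the shadow data set and then retrieve the modified data object from the original data set.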
Returning to
Returning to
Returning to
DB2 and z/OS are registered trademarks or common law marks of International Business Machines Corporation in the United States and/or other countries.
The described operations may be implemented as a method, computer program product or apparatus using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof.
Each of the embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. The embodiments may be implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
Furthermore, the embodiments may take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium may be any apparatus that may contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The described operations may be implemented as code maintained in a computer-usable or computer readable medium, where a processor may read and execute the code from the computer readable medium. The medium may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a rigid magnetic disk, an optical disk, magnetic storage medium (e.g., hard disk drives, floppy disks, tape, etc.), volatile and non-volatile memory devices (e.g., a random access memory (RAM), DRAMs, SRAMs, a read-only memory (ROM), PROMs, EEPROMs, Flash Memory, firmware, programmable logic, etc.). Current examples of optical disks include compact disk read-only memory (CD-ROM), compact disk read/write (CD-R/W) and DVD.
The code implementing the described operations may further be implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.). Still further, the code implementing the described operations may be implemented in “transmission signals”, where transmission signals may propagate through space or through a transmission medium, such as an optical fiber, copper wire, etc. The transmission signals in which the code or logic is encoded may further comprise a wireless signal, satellite transmission, radio waves, infrared signals, Bluetooth, etc. The transmission signals in which the code or logic is encoded are capable of being transmitted by a transmitting station and received by a receiving station, where the code or logic encoded in the transmission signal may be decoded and stored in hardware or a computer readable medium at the receiving and transmitting stations or devices.
A computer program product may comprise computer useable or computer readable media, hardware logic, and/or transmission signals in which code may be implemented. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope of the embodiments, and that the computer program product may comprise any suitable information bearing medium known in the art.
The term logic may include, by way of example, software, hardware, firmware, and/or combinations of software and hardware.
Certain implementations may be directed to a method for deploying computing infrastructure by a person or automated processing integrating computer-readable code into a computing system, wherein the code in combination with the computing system is enabled to perform the operations of the described implementations.
The logic of
The illustrated logic of
Input/output or I/O devices 812, 814 (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers 810.
Network adapters 808 may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters 808.
The system architecture 800 may be coupled to storage 816 (e.g., a non-volatile storage area, such as magnetic disk drives, optical disk drives, a tape drive, etc.). The storage 816 may comprise an internal storage device or an attached or network accessible storage. Computer programs 806 in storage 816 may be loaded into the memory elements 804 and executed by a processor 802 in a manner known in the art.
The system architecture 800 may include fewer components than illustrated, additional components not illustrated herein, or some combination of the components illustrated and additional components. The system architecture 800 may comprise any computing device known in the art, such as a mainframe, server, personal computer, workstation, laptop, handheld computer, telephony device, network appliance, virtualization device, storage controller, etc.
The foregoing description of embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the embodiments of the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the embodiments of the invention be limited not by this detailed description, but rather by the claims appended hereto. The above specification, examples and data provide a complete description of the manufacture and use of the composition of the embodiments of the invention. Since many embodiments of the invention may be made without departing from the spirit and scope of the embodiments of the invention, the embodiments of the invention reside in the claims hereinafter appended or any subsequently-filed claims, and their equivalents.