The present disclosure relates in general to the field of data storage, and more specifically, to dynamically changing the architecture of a dataset while allowing concurrent user access to data in the dataset.
Mass storage devices (MSDs) are used to store large quantities of data. A wide variety of entities utilize MSDs to enable continuous or near-continuous access to the data. Retailers, government agencies and services, educational institutions, transportation services, and health care organizations are among a few entities that may provide ‘always on’ access to their data by customers, employees, students, or other authorized users.
A database is one example of a data structure used to store large quantities of data as an organized collection of information. Typically, databases have a logical structure such that a user accessing the data in the database sees logical data columns arranged in logical data rows. A Database Administrator (DBA) typically uses current technology to architect a database for a given entity. While the initial architecture may provide resources and expansion capabilities, technology advances may render the initial architecture comparatively inefficient and expensive. To exploit new data storage technology, however, a change in the architecture is often needed. For some entities, reconstructing the architecture and migrating old datasets to the newly constructed datasets requires significant downtime in which the database is ‘off-line’ and unavailable to users. In many scenarios, this downtime may not be acceptable.
According to one aspect of the present disclosure, a first migration of data rows in a source dataset in a source storage device to a target dataset in a target storage device is initiated. A block size defined for the target dataset can be different than a block size defined for the source dataset. Buffers in memory are available to handle both the source and target block size. During the first migration, a user request for access to a first data row in the source dataset can be received. A determination can be made that the first data row was migrated to a first target block in the target dataset. The first target block can be loaded from the target dataset into a first buffer in memory. A response to the user request can be made using the first data row in the first target block loaded into the first buffer.
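For illustration only, the following simplified sketch models the flow summarized above. All names and values here (the block sizes, the toy block and row structures, and the respond_to_read function) are hypothetical and are not drawn from any embodiment described herein; the sketch merely shows buffers being kept for both block sizes and a read being answered from whichever dataset currently holds the requested row.

```python
# Hypothetical sketch: buffers for two block sizes during a migration (toy model).
SOURCE_BLOCK_SIZE = 4 * 1024     # assumed block size defined for the source dataset
TARGET_BLOCK_SIZE = 28 * 1024    # assumed block size defined for the target dataset

# Toy "datasets": block number -> {row_key: row bytes}. Row 1 has already migrated.
source_blocks = {0: {1: b"row-1", 2: b"row-2"}}
target_blocks = {0: {1: b"row-1"}}
migrated_keys = {1}

# One in-memory buffer per block size, holding the most recently loaded block.
buffers = {SOURCE_BLOCK_SIZE: None, TARGET_BLOCK_SIZE: None}

def respond_to_read(row_key):
    """Load the owning block into the appropriately sized buffer and return the row."""
    if row_key in migrated_keys:
        buffers[TARGET_BLOCK_SIZE] = target_blocks[0]   # load target block into buffer
        return buffers[TARGET_BLOCK_SIZE][row_key]
    buffers[SOURCE_BLOCK_SIZE] = source_blocks[0]       # load source block into buffer
    return buffers[SOURCE_BLOCK_SIZE][row_key]

print(respond_to_read(1))   # served from the target block (already migrated)
print(respond_to_read(2))   # served from the source block (not yet migrated)
```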
Like reference numbers and designations in the various drawings indicate like elements.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in an implementation combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” “component,” “manager,” “gateway,” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
Any combination of one or more computer readable media may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM or Flash memory), an electrically erasable programmable read-only memory (EEPROM), an appropriate optical fiber with a repeater, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python or the like, conventional procedural programming languages, such as the “C” programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, assembly language, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made through an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS). Generally, any combination of one or more user computers and/or one or more remote computers may be utilized for executing the program code.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable instruction execution apparatus, create a mechanism for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that, when executed, can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions when stored in the computer readable medium produce an article of manufacture including instructions that, when executed, cause a computer to implement the function/act specified in the flowchart and/or block diagram block or blocks. The computer program instructions may also be loaded onto a computer, other programmable instruction execution apparatus, or other devices to cause a series of operations to be performed on the computer, other programmable apparatuses or other devices to produce a computer implemented process such that the instructions, which execute on the computer or other programmable apparatus, provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Referring now to
In at least one embodiment, network server 130 is configured to dynamically change the architecture of an existing dataset of a storage device (e.g., 140A-140C) while allowing concurrent user access (e.g., retrieving, reading, modifying, adding, deleting, etc.) of data in that dataset. The architecture of an existing (source) dataset can be changed by allocating a new (target) dataset on a separate storage device (e.g., 150) that offers a desired architecture configuration, and then migrating the data from the source dataset to the newly allocated target dataset.
For purposes of illustrating certain example techniques of communication system 100 for dynamically changing the architecture of a dataset while allowing concurrent user access to data of the dataset, it is important to understand the activities that may be occurring in a network environment that includes data storage devices configured with data structures capable of hosting large quantities of data and providing online user access to the data. The following foundational information may be viewed as a basis from which the present disclosure may be properly explained.
Data structures are used by storage devices (e.g., MSDs, DASDs) to store massive amounts of data across virtually every sector of society including, but not limited to, social media, business, retail, health, education, and government. A database is one type of data structure and generally refers to an organized collection of data. Although the concepts presented herein are applicable to any type of data structures used in storage devices, most of the world's data is stored in a data structure commonly referred to as a database. Therefore, although the discussion herein may reference databases for ease of illustration, it should be understood that the concepts are also applicable to other types of data structures.
Databases can have a logical structure that an end user can view online, such as logical data columns arranged in logical data rows. These logical data rows and columns are organized in a logical data table. A database can contain any number of data tables, and a data table can be stored in a dataset of a storage device. A dataset is the physical storage of a storage device and is typically a long string of data bytes. Data rows and logical data columns are configured in data tables to enable data to be retrieved and presented in a user-friendly format.
Generally, large database environments are created using the dataset architecture that exists at the time of creation. As time passes, new architectures may be developed that offer more efficiency, speed, and storage than the old architectures. In order to change a block size and/or a device type used for a database, a Database Administrator (DBA) (or other authorized individual) performs several actions. First, user processing to the database data tables is stopped. Second, the database is closed to all processing (e.g., user accesses, utility processes, etc.). Third, the database datasets are backed up to an external device (e.g., data is copied to a tape or other storage device). Fourth, old datasets may be deleted. Fifth, the datasets are reallocated with the new architecture (e.g., new block sizes, new device types, etc.). Finally, the reallocated datasets are initialized and loaded with data from the backup. This process can take hours, days, or even weeks depending on the size of the datasets. During that time, users, utility processes, and batch processes are all prevented from accessing the data.
In past decades, entities seeking to convert their old database architectures to new database architectures typically had certain windows of opportunity when their databases would go offline (e.g., for periodic maintenance, etc.) and would be inaccessible to users. As the interconnected world has evolved, however, many applications no longer have a scheduled offline period. Rather, many consumers and other users expect 24/7 access to online data needed to conduct business, purchase goods, manage finances, access services (e.g., transportation, etc.), etc. Although datasets architected for older direct access storage device (DASD) models may need to be updated to current DASD architectures to exploit improved features of the newer architecture, often the user data in the datasets of the old architecture cannot be taken offline.
In one example, consumers may expect 24-hour access to a retailer's online application so that goods (e.g., shoes, clothing, electronics, cosmetics, etc.) can be purchased whenever the consumer desires. In another example, some interconnected systems around the world require availability to certain types of data across time-zones. For example, a country's customs/border control branch may require an online vetting application to be available at all times to allow transportation services (e.g., airlines, railroads, water transport, etc.) to receive clearance for travelers into the country.
In one specific example, an entity may have datasets defined as an older DASD architecture (e.g., IBM 3380) that are in use and being emulated to run on current DASD technology. Due to the emulation, the datasets provide limited capabilities and reduced performance. For example, an IBM 3380 DASD, which was first available in the 1980s, is a device type characterized by a design specification of 47,476 bytes per track and 15 tracks per cylinder. The average seek time was 16 milliseconds. A newer DASD architecture such as the IBM 3390 is a device type characterized by a design specification of 56,664 bytes per track, 15 tracks per cylinder, and an average seek time of 9.5 milliseconds. Although many IBM 3380 devices have been replaced by modern DASD devices, during the conversion to the new hardware the dataset definitions were often left unchanged to ensure compatibility with existing database processing. The new DASD devices may be capable of emulating the older mainframe architectures (e.g., IBM 3380), and so the amount and format of data is often defined using the specifications of the older architecture. Thus, the capacity and capabilities of the new DASD hardware are limited due to the emulation of the older DASD architecture.
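As a rough, hedged comparison based only on the device figures cited above, the following short calculation illustrates how much raw cylinder capacity goes unused when a 3390-class device is constrained to emulate 3380 geometry (the variable names are illustrative only):

```python
# Back-of-the-envelope comparison using the per-track and per-cylinder figures above.
bytes_per_track_3380, tracks_per_cyl_3380 = 47_476, 15
bytes_per_track_3390, tracks_per_cyl_3390 = 56_664, 15

cyl_3380 = bytes_per_track_3380 * tracks_per_cyl_3380   # 712,140 bytes per cylinder
cyl_3390 = bytes_per_track_3390 * tracks_per_cyl_3390   # 849,960 bytes per cylinder

print(f"3380 cylinder capacity: {cyl_3380:,} bytes")
print(f"3390 cylinder capacity: {cyl_3390:,} bytes")
print(f"Capacity gain per cylinder: {cyl_3390 / cyl_3380 - 1:.1%}")   # roughly 19%
```

Under 3380 emulation, that additional per-cylinder capacity, along with the faster seek times, is effectively left on the table.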
In another specific example, an entity may have datasets defined with older architectures that were defined for use in earlier processing complexes, where data transfer rates of the DASD architecture were slower and users needed to limit block sizes to get the best input-output (I/O) throughput. Data, such as logical data rows, is stored in physical data blocks. These physical data blocks can range in size depending on the platform and the DASD hardware. For example, on the mainframe, block sizes can be up to 32K bytes and are defined per user application. On older hardware devices, the time to transfer a 16K block of data was typically greater than the time to transfer a 4K block of data. Consequently, smaller block sizes (e.g., 4K bytes) were often chosen when defining datasets for user applications using the older architecture. Also, each transferred data block was stored in memory (also referred to herein as a “data buffer,” “buffer,” or “buffer memory”), and in older mainframe systems the amount of memory was often limited. Database administrators (DBAs) needed to limit how much memory was used to store the retrieved data blocks. For most database applications, having four small (4K) blocks in buffer memory provided better performance than one large 16K block in buffer memory.
Over time, significant changes have occurred both to DASD storage devices and to the memory available in systems. Data transfer rates have grown exponentially, allowing much larger block sizes to perform at the same speed while providing more data per I/O operation. These changes have greatly reduced the concern over data transfer rates. Moreover, the move to 64-bit addressing has significantly increased the memory available for storing data in buffer memory. Thus, many database applications currently running on an old architecture with a 4K block size are likely to experience enhanced performance by increasing dataset block sizes to exploit more recent architecture (e.g., IBM 3390 DASD) implementations, such as 16K or 28K block sizes.
In yet another example, there can be significant wasted DASD space and buffer memory when a poor block size is selected and implemented. This may occur when a DBA (or other individual who designs the database) does not have an adequate understanding of database buffering concepts. In a database, database blocks are stored in memory in data buffers. Data buffers are allocated in pools, and a data buffer pool is chosen depending on the size of the data block that is retrieved. Accordingly, a buffer pool should be chosen that is as close to the data block size as possible without the data block size exceeding the buffer pool size. In some scenarios, however, the non-database practice of defining the dataset block size as a multiple of the data row size is used. Defining block sizes as 10- or 20-multiples of the data row size in this way can yield significant wasted DASD space and buffer memory, as illustrated in the sketch below.
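The sketch below is a hypothetical illustration of that waste; the buffer pool sizes, row sizes, and multiples are assumed values chosen for the example and do not reflect any particular installation.

```python
# Hypothetical illustration: a block size chosen as a multiple of the data row size
# rarely lines up with fixed buffer pool sizes, wasting memory in every buffer.
BUFFER_POOL_SIZES = [4096, 8192, 12288, 16384, 20480, 24576, 28672, 32760]  # assumed pools

def buffer_waste(row_size, multiple):
    """Return (block size, smallest buffer pool that fits it, bytes wasted per buffer)."""
    block_size = row_size * multiple
    pool = min(s for s in BUFFER_POOL_SIZES if s >= block_size)
    return block_size, pool, pool - block_size

# Assumed row sizes and multiples, purely for illustration.
for row_size, multiple in [(1_250, 10), (900, 20), (2_100, 10)]:
    block, pool, waste = buffer_waste(row_size, multiple)
    print(f"row {row_size:>5} x{multiple:>2}: block {block:>6} -> buffer {pool:>6}, "
          f"{waste:,} bytes wasted per buffer")
```

Because the waste recurs in every allocated buffer (and in every stored block on DASD), even a few kilobytes of slack per block adds up quickly across a large database.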
Thus, several scenarios can result in old dataset architectures remaining in use even though newer technology offers more efficiency, space, and processing speed. Consequently, in some cases, entities continue to run databases on decades-old technology, incurring higher costs, wasted resources, and unnecessary time spent waiting for processing to complete, because they cannot afford for the data to be inaccessible to the end users.
A communication system, such as communication system 100 for dynamically changing the architecture of an existing dataset, as outlined herein, can resolve these issues (and others).
More specifically, a DBA (or automatic process) can allocate a target dataset and define the preferred architecture, such as block size and device type. For example, an existing dataset defined on an IBM 3380 with a 4K block size may be re-architected to a target dataset defined on an IBM 3390 with a 28K block size. Once the target dataset is allocated, the architecture change process can be triggered when desired. In one embodiment, the architecture change process may first establish that the target dataset is sufficiently sized and suitably architected to hold the data tables being migrated from the source dataset. The architecture change process can establish an input-output (I/O) gateway around the source and target datasets to maintain consistency of reference for all data rows that are migrated from the source dataset to the target dataset. The I/O gateway begins migrating logical data rows from a data block in the source dataset to a data block in the target dataset. The data rows are migrated independently of data blocks, as the new architecture may change the number of data rows per data block. In at least one embodiment, the data rows are migrated in native sequence from the source dataset to the target dataset. Transactional logging may be provided for all data rows to enable a fully restartable and recoverable process in the event of an unintentional processing failure (e.g., power outage, processor failure, system failure, and other abnormal terminations, etc.).
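For illustration only, the following sketch outlines the sequence just described at a very high level. The helper names (change_architecture, migration_log) and the use of simple dictionaries in place of datasets are assumptions made for the example; this is illustrative pseudologic, not the gateway implementation itself.

```python
# Hypothetical, simplified sequence: verify target capacity, then migrate rows in
# native (key) sequence with per-row logging for restartability.
migration_log = []                     # stands in for the transactional log

def change_architecture(source_rows, target_capacity_rows):
    """Migrate rows (keyed by their native-sequence key) into a new 'target' dict."""
    # 1. Establish that the target is sized to hold the data tables being migrated.
    if target_capacity_rows < len(source_rows):
        raise RuntimeError("target dataset not sized to hold the source data tables")
    # 2. Migrate row by row in native (key) sequence, logging each row so the
    #    process is restartable after an unintentional processing failure.
    target_rows = {}
    for key in sorted(source_rows):
        target_rows[key] = source_rows[key]
        migration_log.append(("migrated", key))
    return target_rows

source = {3: "row-c", 1: "row-a", 2: "row-b"}      # rows possibly out of native order
print(change_architecture(source, target_capacity_rows=10))
print(migration_log)
```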
One or more embodiments manage concurrent access to data in the datasets as data rows are migrated from the source dataset to the target dataset. End user processing is performed by logical data row and does not require a data row to be housed in a particular dataset. Thus, the I/O gateway manages access to the data rows by end users, where a particular data row may be accessed from either the source dataset or the target dataset depending upon whether it has been migrated at the time of the user request. The I/O gateway can also manage data row accesses by other database utility processes by ensuring that the data row migration is integrated with those utility processes. For example, a utility process that attempts to run concurrently with the I/O gateway may be blocked until a particular data row migration is complete. For at least some utility processes, however, the utility process is automatically integrated with the I/O gateway, which manages accesses to the source and target datasets by the utility process and allows it to complete successfully. In some cases, where a requested utility process is blocked because it conflicts with the migration process, an alternative utility process may be provided that performs the utility function integrated with the I/O gateway.
In one or more embodiments, the architecture change process can be completed by renaming the target dataset to the original name of the source dataset. The source dataset may be deleted or renamed. It should also be noted that multiple datasets can be re-architected at the same time. An I/O gateway can be created for each dataset being re-architected.
Embodiments of an architecture change process can offer several advantages. For example, embodiments described herein enable DBAs to quickly migrate data tables from one architecture to another, different architecture with enhanced capabilities and features. Moving to a different architecture can improve performance, remove restrictive requirements of older architectures (e.g., older DASD architectures), and reduce the costs of maintaining the environment. The particular embodiments described herein for dynamically changing the architecture of a dataset enable a DBA to implement critical, business-required architecture changes without interrupting the business. Thus, users may continue to access needed data from a dataset being re-architected without any downtime.
Turning to
Generally, communication system 100 can be implemented in any type or topology of networks. Within the context of the disclosure, networks such as networks 110 and 115 represent a series of points or nodes of interconnected communication paths for receiving and transmitting packets of information that propagate through communication system 100. These networks offer communicative interfaces between sources, destinations, and intermediate nodes, and may include any local area network (LAN), virtual local area network (VLAN), wide area network (WAN) such as the Internet, wireless local area network (WLAN), metropolitan area network (MAN), Intranet, Extranet, virtual private network (VPN), and/or any other appropriate architecture or system that facilitates communications in a network environment, or any suitable combination thereof. Additionally, radio signal communications over a cellular network may be provided in communication system 100. Suitable interfaces and infrastructure may be provided to enable communication with the cellular network.
In general, “servers,” “clients,” “computing devices,” “storage devices,” “network elements,” “database systems,” “network servers,” “user devices,” “user terminals,” “systems,” etc. (e.g., 120, 130, 140A-140C, 150, 160, etc.) in example communication system 100, can include electronic computing devices operable to receive, transmit, process, store, or manage data and information associated with communication system 100. As used in this document, the term “computer,” “processor,” “processor device,” “processing device,” or “I/O controller” is intended to encompass any suitable processing device. For example, elements shown as single devices within communication system 100 may be implemented using a plurality of computing devices and processors, such as server pools including multiple server computers. Further, any, all, or some of the computing devices may be adapted to execute any operating system, including IBM zOS, Linux, UNIX, Microsoft Windows, Apple OS, Apple iOS, Google Android, Windows Server, etc., as well as virtual machines adapted to virtualize execution of a particular operating system, including customized and proprietary operating systems.
Further, servers, clients, computing devices, storage devices, network elements, database systems, network servers, user devices, user terminals, systems, etc. (e.g., 120, 130, 140A-140C, 150, 160, etc.) can each include one or more processors, computer-readable memory, and one or more interfaces, among other features and hardware. Servers can include any suitable software component, manager, controller, or module, or computing device(s) capable of hosting and/or serving software applications and services, including distributed, enterprise, or cloud-based software applications, data, and services. For instance, in some implementations, a network server 130, storage devices 140A-140C and 150, or other subsystem of communication system 100 can be at least partially (or wholly) cloud-implemented, web-based, or distributed to remotely host, serve, or otherwise manage data, software services and applications interfacing, coordinating with, dependent on, or used by other services, devices, and users (e.g., via network user terminal, other user terminals, etc.) in communication system 100. In some instances, a server, system, subsystem, or computing device can be implemented as some combination of devices that can be hosted on a common computing system, server, server pool, or cloud computing environment and share computing resources, including shared memory, processors, and interfaces.
While
Network server 230 may include a database management system (DBMS) 231, which creates and manages databases, including providing batch utilities, tools, and programs. A database manager 232 can create a database processing region (also referred to as a multi-user facility (MUF)) where user processing and most utility processes flow. During an architecture change process, database manager 232 can create an input/output (I/O) gateway 234. In at least one embodiment, I/O gateway 234 may be created temporarily in software and removed from DBMS 231 once the architecture is changed. I/O gateway 234, when executed, can create a background process 236, which migrates data rows from a source dataset (e.g., 242) to a target dataset (e.g., 252), while I/O gateway 234 handles concurrent user processing to access the data rows being migrated. I/O gateway 234 can also create a log file 233 to store information related to each data row migration. Thus, log file 233 can provide information that enables restartability and recoverability if the architecture change process experiences a failure (e.g., power outage, system failure, etc.). Log file 233 may be implemented internal or external to DBMS 231, based on particular implementations and needs. In
Network server 230 may also include hardware including, but not limited to, an I/O controller 235, a processor 237, and a memory element 239. The I/O controller 235 may facilitate communication to both source storage devices (e.g., 240) and target storage devices (e.g., 250), or in other implementations, multiple I/O controllers may be used. In some implementations, a user interface 270 may also be coupled to network server 230. User interface 270 could be any suitable hardware (e.g., display screen, input devices such as a keyboard, mouse, trackball, touch, etc.) and corresponding software to enable an authorized user to communicate directly with network server 230. For example, in some scenarios, a DBA may configure target datasets and initiate the architecture change process using user interface 270.
At any given time, memory element 239 may contain data blocks 238-1 through 238-X, which are loaded into memory based on user access requests received for data rows contained in those blocks. In at least one embodiment, memory element 239 may contain buffer memory and data blocks 238-1 through 238-X may be loaded into buffers in the memory. Multiple users may access, via user terminals, data rows in data blocks that are loaded into memory element 239. Database manager 232 can also be configured to manage concurrency control for users accessing data rows simultaneously, so that adverse effects are prevented if multiple users try to modify resources other users are actively using.
Source storage device 240 and target storage device 250 are representative of different types of physical storage devices capable of storing data in data structures (e.g., databases) that enable multiple users, processes, and utilities to access and, in some cases, modify the stored data. Each storage device 240 and 250 includes a respective dataset 242 and 252, which is the physical storage of data in the storage device. Prior to an architecture change process, source dataset 242 may store data in data blocks 245-1 through 245-N. In at least some embodiments, during the architecture change process, a control block 247 may be added to unused space in source dataset 242 to hold information related to the data migration. Target dataset 252 may be allocated with defined blocks, such as data blocks 255-1 through 255-M, prior to an architecture change process being initiated for source dataset 242. During the architecture change process, a control block 257 may be added to unused space in target dataset 252 to hold information related to the data migration. The background migration process can cause data blocks 255-1 through 255-M to be filled with data rows from source dataset 242.
In at least one scenario, source dataset 242 may be defined with a different architecture than target dataset 252. For example, source dataset 242 may be defined on a less preferred architecture, such as an older data storage device using a small block size (e.g., IBM 3380 with a 4K block size). Target dataset 252 may be defined on a different architecture (e.g., a preferred architecture). In one example, target dataset 252 may be defined on newer technology that enables a larger block size to be utilized (e.g., IBM 3390 with 18K or 28K block size). Consequently, when the migration of source dataset 242 to target dataset 252 is complete, the number of data blocks (M) in target dataset 252 may be different than the number of data blocks (N) in source dataset 242 if their block sizes are different. For example, if source dataset 242 is defined on an IBM 3380 with a 4K block size and target dataset 252 is defined on an IBM 3390 with a 28K block size, then target dataset 252 will likely have fewer blocks than source dataset 242 (i.e., M<N).
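The following hypothetical arithmetic (assumed row counts and row sizes) illustrates why the target dataset typically ends up with fewer, larger blocks (M&lt;N) when the block size grows from 4K to 28K:

```python
# Hypothetical block-count comparison; ROW_SIZE and TOTAL_ROWS are assumed values.
import math

ROW_SIZE = 400                          # assumed average data row size in bytes
TOTAL_ROWS = 100_000                    # assumed number of rows in the data table
SOURCE_BLOCK, TARGET_BLOCK = 4 * 1024, 28 * 1024

def blocks_needed(block_size):
    rows_per_block = block_size // ROW_SIZE        # whole rows that fit in one block
    return math.ceil(TOTAL_ROWS / rows_per_block)  # blocks needed to hold all rows

n_source = blocks_needed(SOURCE_BLOCK)  # N, the number of 4K blocks in the source
m_target = blocks_needed(TARGET_BLOCK)  # M, the number of 28K blocks in the target
print(f"N (4K source blocks)  = {n_source:,}")
print(f"M (28K target blocks) = {m_target:,}")
```

A calculation along these lines can also serve as part of the pre-processing check that the allocated target dataset is sufficiently sized for the data being migrated.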
Turning to
With reference to
Also in this example scenario, datasets 342A-342C are shown with different architectures. Dataset 342A is defined on a first mass storage device type (MSD-1) with a block size of 4K bytes. Dataset 342B is defined on a second mass storage device type (MSD-2) with a block size of 4K bytes. Dataset 342C is defined on another MSD-1 with a block size of 8K bytes.
Data processing region 337 receives flows of user requests from users via network user terminals 320 and from database administrator(s) via DBA user terminal 360. Data processing region 337 can also receive database access requests from utility and other non-end user processes. In operation, multiple users (e.g., tens, hundreds, thousands, etc.) can access the database concurrently via network user terminals 320.
At 302a, a user requests, via a network user terminal 320, access to a customer data row in customer data table 312A. Data processing region 337 receives the user request. At 302b, data processing region 337 determines the location of a data block that contains the requested data row. In this example, data processing region 337 determines the location of the data block, which is in dataset 342A of storage device 340A.
At 302c, data processing region 337 retrieves into memory 339 the identified data block from the appropriate dataset holding the customer data table. The data block is retrieved into memory as block 338-1, with requested data row 335. In one example, block 338-1 may be stored in buffer memory of memory 339. At 302d, the requested data row 335 is extracted and returned to the network user terminal that submitted the user request at 302a.
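For illustration only, the following minimal sketch models the read path at 302a-302d with toy structures; ROWS_PER_BLOCK, dataset_342A, and memory_339 as used here are simplified stand-ins for the dataset and memory, not the actual data processing region.

```python
# Simplified, hypothetical model of the normal (non-migration) read path.
ROWS_PER_BLOCK = 10   # assumed for illustration

# Toy dataset: block number -> list of rows; row n lives in block n // ROWS_PER_BLOCK.
dataset_342A = {b: [f"cust-row-{b * ROWS_PER_BLOCK + i}" for i in range(ROWS_PER_BLOCK)]
                for b in range(3)}
memory_339 = {}       # blocks currently loaded into memory, keyed by block number

def read_customer_row(row_number):
    block_no = row_number // ROWS_PER_BLOCK          # 302b: locate the owning data block
    if block_no not in memory_339:                   # 302c: retrieve the block into memory
        memory_339[block_no] = dataset_342A[block_no]
    block = memory_339[block_no]
    return block[row_number % ROWS_PER_BLOCK]        # 302d: extract and return the row

print(read_customer_row(17))    # loads block 1 into memory and returns "cust-row-17"
```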
User accesses to other data tables (e.g., 312B, 312C) may occur at least partially concurrently with the user access of customer data table 312A. In addition, other user accesses to customer data table 312A may also occur at least partially concurrently with the user access shown and described in
In
A database pre-processing utility application may also be executed to prepare the target dataset for data migration from the source dataset. For example, pre-processing may include verifying the presence of source storage device 340A, target storage device 350, source dataset 342A, target dataset 352, the readiness of target dataset 352 for the data migration, etc. A utility application or the DBA may also ensure that enough buffer memory is available in memory 339 for the new target dataset 352.
In
In response to the command to start processing, database manager 332 can create an input/output (I/O) gateway 334 in memory to isolate processing for source dataset 342A while it is being re-architected. The I/O gateway 334 may be a dynamically generated, temporary process that runs in a separate processing region to handle the data migration of the source dataset to the target dataset and the concurrent user requests (and utility process requests) for access to data in source dataset 342A during the data migration. Database manager 332 forwards user requests and utility process requests for access to source dataset 342A to I/O gateway 334. The location of a requested data row in dataset 342A at any given time during the architecture change process depends on whether the data row has been migrated. I/O gateway 334 keeps track of where each data row is located during the migration and handles user requests (and utility process requests) accordingly.
Once the I/O gateway is created, as shown in
Once the datasets are open and connections are established, as shown in
In at least one embodiment, background process 336 migrates data rows sequentially, rather than as a block. In one example, background process 336 migrates the data rows in native sequence. Native sequence is intended to mean a preferred order for the data rows. Often, the preferred order is selected (e.g., by a DBA or designer of the database) based on the most likely processing sequence of the data rows. For example, if requests are typically made in a particular order, then the performance of the database may be increased if data is stored in the dataset in the same order as the most common user requests and/or batch utility requests. It should be noted that, when migrating in native sequence, data rows may be selected across multiple blocks of storage in source dataset 342A. For example, the first 4K block may contain the first data row to migrate, the second 4K block may contain the second data row to migrate, the fifth 4K block may contain the third data row to migrate, and so on. In other embodiments, background process 336 may simply migrate the data rows based on their current order in source dataset 342A or in any other desired order based on particular implementations and needs.
As shown in
In many scenarios, it is desirable to perform the migration as quickly as possible. Therefore, in at least one embodiment, as background process 336 performs the data migration, any available processing power may be used to migrate the data. However, some processing power is also allocated to end user requests for data in source dataset 342A. The user requests are directed through I/O gateway 334 so that the users can access any desired data row from source dataset 342A during the architecture change process of source dataset 342A.
In at least one embodiment, upon receiving a request to run database backup utility application 380, database manager 332 may send a response to DBA user terminal 360 denying the request and offering to run an alternative backup utility application within I/O gateway 334 during the data migration. If the DBA agrees to the alternative backup application, database manager 332 can instruct I/O gateway 334 to run the alternative database backup utility application. The alternative database backup utility application is integrated with the I/O gateway 334 such that data rows are provided to the integrated application from the I/O gateway, which has access to both datasets 342A and 352. Thus, the I/O gateway controls and coordinates the backup process with the data migration so that an accurate backup can be performed. The integrated application can store the data rows received from the I/O gateway in another data storage device, such as dataset backup 383. Database manager 332 may provide status messages related to the alternative backup utility process.
After the migration is complete, I/O gateway 334 can be disconnected from source dataset 342A, as shown in
Turning to
At 404, a target dataset is allocated on the target storage device, such as target dataset 252 on target storage device 250, and the selected architecture is defined for the target dataset.
At 406, pre-processing tasks may be performed before the architecture change process begins. For example, pre-processing tasks may include verifying the presence of the target storage device and target dataset, initializing the target dataset to the appropriate database internal format, verifying the presence of the source storage device and source dataset, and the overall readiness of the source and target datasets for the migration.
In
At 412, the database manager can output start messages to indicate the architecture change process has been initiated. Messages may be sent to a display and/or a log file of messages during the architecture change process. The display may be, for example, a display device of a DBA user terminal or any other display device configured to receive messages from database manager 232.
At 414, database manager 232 can build or create an input/output (I/O) gateway, such as I/O gateway 234 to run in a separate processing region. I/O gateway can open source dataset 242 and target dataset 252 and establish connections to the datasets.
I/O gateway 234 is created to re-architect the source dataset, but not other datasets. Thus, I/O gateway 234 handles only user requests and possibly utility application requests for data rows stored in the gateway's associated source dataset. In at least one embodiment, I/O gateway is temporary and is removed when the architecture change process completes. In other embodiments, I/O gateway 234 may be stopped, stored, and retrieved for later use as an I/O gateway for another source dataset.
When I/O gateway 234 establishes connections to source dataset 242 and target dataset 252, database manager 232 can output a status message at 416 indicating that the I/O gateway is ready, and the architecture change process can begin.
At 418, database manager 232 can provide user requests for data in source dataset 242 to I/O gateway 234 and can receive and appropriately forward responses to those requests from the I/O gateway 234, until the architecture change process is complete. An example of this processing is discussed in further detail with reference to
At 420, once the architecture change process is complete, the database manager 232 can remove the I/O gateway, establish a connection to the target dataset including opening the target dataset, and return to normal processing. Normal processing includes receiving and responding to user requests for data rows in the target dataset by accessing the target dataset, locating the appropriate data rows, and loading the appropriate blocks on the target dataset into memory. Normal processing also includes allowing utility processes that request access to the target dataset to run. At 422, database manager 232 can output a status message indicating that the architecture change process is complete.
At 501, I/O gateway 234 opens source dataset 242 and target dataset 252. I/O gateway 234 also establishes connections to the source and target datasets.
At 502, I/O gateway 234 can initiate a background process to migrate data rows from source dataset 242 to target dataset 252.
At 504, unused space is identified in both the source dataset 242 and the target dataset 252. A control block can be built on both the identified unused space in the source dataset and the identified unused space in the target dataset. The control blocks can be used to store a last migrated key during the migration of data rows from the source dataset to the target dataset. In one embodiment, each row has a unique key value, and the migration of the data rows is performed sequentially based on the unique key values. In one example, the key values can correspond to the physical order in which the data rows are stored in the source dataset.
In another embodiment, the key values can correspond to the native sequence of the data rows. Over time, data rows in a dataset may become out-of-native-sequence due to modifications to the data rows (e.g., insertions, deletions). In order to migrate the data rows of source dataset 242 in native sequence, the rows may be selected for migration based on each row's native key value. Thus, the migration can effectively reorder the data rows into a native key sequence in target dataset 252.
At 506, the first block in which data rows are to be stored in target dataset 252 is identified. At 508, the first data row to migrate from the source dataset is selected. The data row may be selected based on the last migrated key. Because no data rows have been migrated yet, the value of the last migrated key may be null or zero in some examples. Therefore, in this example, the first data row could be selected based on its associated key value being the lowest key value in a sequence of all the key values associated with data rows in source dataset 242. As previously noted, the key values may be based on any desired order of the data rows depending on particular needs and implementations. For example, the key values may be based on a native sequence of the data rows or a stored sequence of the data rows.
At 510, the selected data row is migrated from source dataset 242 to the identified block in target dataset 252. At 512, the key value associated with the migrated data row is stored in the control blocks in both the source dataset and the target dataset as the last migrated key value. The last migrated key value stored in the control blocks provides a reference to enable identification of which data rows have been migrated at any given time during the migration. For example, the last migrated key value stored in the control blocks can indicate that the data row associated with the last migrated key value, and any other data rows associated with key values that are less than the last migrated key value, have been successfully migrated.
At 518, a message indicating the status of migration may be produced. Status messages may include the number of rows successfully migrated in one example. These messages may not be produced after every data row migration, but rather, may be produced periodically (e.g., 10,000 data rows migrated, 20,000 data rows migrated, etc.). In one embodiment, this message or information can be provided to database manager 232, which can then output the message to an appropriate display or log file of status messages.
At 520, in
At 524, a determination is made as to whether there are more data rows in source dataset 242 to be migrated. If there are more data rows to be migrated, then at 526, a determination is made as to whether the identified block in target dataset 252 is filled. If the identified target data block is filled, then at 528, a next block in the target dataset is identified to store additional data rows from the source dataset.
If the next block in the target dataset is identified at 528, or if the currently-identified block in the target dataset is determined not to be filled at 526, then the flow loops back to 508, where the next data row is selected to migrate from source dataset 242 to target dataset 252. The last migrated key value is retrieved from the control block of the source dataset or the target dataset. In this case, the last migrated key value from the control block is the key value associated with the first selected data row. The next data row to select is identified by determining the next sequential key value, after the last migrated key value, of a data row in the source dataset.
Flow then continues this loop as previously described until eventually, at 524, it is determined that the source dataset contains no more rows to be migrated. I/O gateway 234 may disconnect from source dataset 242 but retain its connection with target dataset 252. At 530, a message is produced indicating the status of the migrated data rows. In at least one embodiment, information indicating the total number of data rows that have been migrated may be provided to database manager 232. Database manager 232 may then output the status message to the appropriate display and/or log file of status messages.
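A minimal sketch of this migration loop is shown below, assuming rows keyed by integers and a fixed number of rows per target block; the control block here is a simple dictionary holding the last migrated key, which is the restartability anchor described at 512. The names and structures are hypothetical simplifications of the flow at 506-528.

```python
# Hypothetical sketch of the background migration loop (506-528).
def migrate(source_rows, rows_per_target_block):
    control_block = {"last_migrated_key": None}         # kept in both datasets in practice
    target_blocks, current_block = [], []
    for key in sorted(source_rows):                      # 508: next row after the last key
        current_block.append((key, source_rows[key]))    # 510: migrate the selected row
        control_block["last_migrated_key"] = key         # 512: record the last migrated key
        if len(current_block) == rows_per_target_block:  # 526: is the target block filled?
            target_blocks.append(current_block)          # 528: start a new target block
            current_block = []
    if current_block:                                    # 524: no more rows to migrate
        target_blocks.append(current_block)
    return target_blocks, control_block

blocks, ctl = migrate({k: f"row-{k}" for k in range(1, 8)}, rows_per_target_block=3)
print(len(blocks), ctl["last_migrated_key"])    # 3 target blocks, last migrated key 7
```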
Operations at 532-542 are related to enabling the database manager to resume normal operations with target dataset 252 replacing source dataset 242 in the database environment. In some cases, one or more operations at 532-542 may be performed by I/O gateway 234, database manager 232, background process 236, and/or other background processes initiated for these activities.
At 532, the original file name of source dataset 242 is released by either deleting or renaming the source dataset. At 534, a message may be produced indicating the status of the source dataset (e.g., deleted or renamed). In at least one embodiment, information indicating the status of the source dataset may be provided to database manager 232. Database manager 232 may then output the status message to the appropriate display and/or log file of status messages.
At 536, target dataset 252 is renamed to the original file name of the source dataset. At 538, a message may be produced indicating the status of the target dataset (e.g., renamed to original name of source dataset). In at least one embodiment, information indicating the status of the target dataset may be provided to database manager 232. Database manager 232 may then output the status message to the appropriate display and/or log file of status messages.
At 540, the log file of data row migrations may be deleted by I/O gateway 234. In other embodiments, the log file of data row migrations may be deleted after the I/O gateway has stopped running (e.g., by database manager 232), or may be saved for any desired length of time.
At 542, I/O gateway 234 is disconnected from the target dataset and the I/O gateway stops handling user requests or utility process requests. As indicated in
At 602, a user request for access to a data row in a dataset is received. At 604, a determination can be made as to whether the dataset is associated with an I/O gateway. A dataset is associated with an I/O gateway if the dataset is being re-architected by the I/O gateway.
If the requested dataset is not associated with an I/O gateway, then at 606, the user request is processed normally. For example, the user request may be handled through a data processing region created by database manager 232, as shown in
If the requested dataset is associated with an I/O gateway, then at 608, the database manager identifies the I/O gateway that is associated with the dataset. At 610, database manager 232 provides the user request to the identified I/O gateway. Thus, database manager 232 receives user requests and funnels them to the appropriate I/O gateway (if any) to allow the I/O gateway to manage user requests during the migration of data from source dataset 242 to target dataset 252. This process may continue as long as at least one I/O gateway is still running in the database environment.
At 702, I/O gateway 234 receives a user request for access to a data row in source dataset 242 during the migration of its data rows to target dataset 252. At 704, a determination is made as to whether the requested data row is currently selected to be migrated. In some possibly rare scenarios, a user request for access to a data row may happen simultaneously with a background migration process (e.g., 236) selecting the same data row for migration. In this scenario, the user request may be briefly halted until the migration of the requested data row is complete. Accordingly, if the requested data row is currently selected for migrating, then at 706, I/O gateway 234 temporarily blocks the user request. At 708, a determination may be made that the data row migration is complete. At 710, once the data row migration is complete, the user request is processed by the I/O gateway.
At 712, a determination is made as to whether the requested data row has been migrated to the target dataset. In one example, the last migrated key value and the key value of the requested data row can be used to determine whether the requested data row has already been migrated. The last migrated key value can be obtained from a control block of either the source dataset or the target dataset. In one example implementation, if the key value of the requested data row is less than or equal to the last migrated key value, then the requested data row has already been migrated. Conversely, if the key value of the requested data row is greater than the last migrated key value, then the requested data row has not been migrated.
If the requested data row has not been migrated to the target dataset, then at 714, a determination is made as to whether the requested data row is currently in a buffer in memory. The requested data row may be in a buffer in memory with its source block if the data row was previously requested by a user request. The source block is the block of data in the source dataset that contains the data row. For example, if the dataset architecture of the source dataset is defined as 4K byte blocks, then a 4K byte block of data containing the requested data row may be stored in buffer memory if access to the data row was previously requested by a user.
In at least one embodiment, a source block flag (or any other suitable indicator) may be set for each block of the source dataset that is loaded into memory. In this example, at 714, the determination of whether the requested data row is already in memory can be made by determining whether a source block flag is set for the source block that contains the requested data row. If the source block flag is set, then the source block is in memory and therefore, the requested data row is in memory.
If the requested data row is not already loaded in buffer memory, then at 716, a block of data that contains the requested data row is located in the source dataset, retrieved by I/O gateway 234, and loaded into a particular area of memory used by I/O gateway. In addition, a source block flag associated with the source block may be set to indicate that the particular source block has been loaded into memory in response to a user request.
Once the source block containing the requested data row is loaded into memory, or if the source block containing the requested data row was already loaded in memory, at 718, the requested data row from the source block in memory is provided to a user terminal associated with the user request for access to the data row.
With reference again to 712, if the requested data row has already been migrated to target dataset 252, then flow passes to 720 of
In at least one embodiment, a target block flag (or any other suitable indicator) may be set for each block of the target dataset that is loaded into memory. In this example, at 720, the determination of whether the requested data row is already in memory can be made by determining whether a target block flag is set for the target block that contains the requested data row. If the target block flag is set, then the target block is in memory and therefore, the requested data row is in memory.
Even if the requested data row has not been previously requested, the requested data row may be loaded in memory if the target block containing the requested data row is “active.” A target block is “active” if the target block is currently receiving and storing data rows being migrated. If a target block containing a requested data row is active, then the target block may not be filled to capacity and may still have additional space to receive data rows migrating from the source dataset. For example, the active target block may be partially filled (e.g., 20 data rows of 40 possible data rows are stored in the target block). If the I/O gateway receives a user request for access to a data row that has already been migrated to the target dataset and stored in this active target block, which is still in memory, then the user request is processed using this active target block in buffer memory that is already in place.
If the target data block that contains the requested data row is not currently loaded in buffer memory, as determined at 720, then at 724, the target data block containing the requested data row can be located and retrieved from target dataset 252 and loaded into buffer memory. In addition, a target block flag may be set to indicate that the particular target data block has been loaded into memory in response to a user request.
Once the target block that contains the requested data row is loaded in buffer memory, then flow can proceed to 718 in
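For illustration only, the following sketch approximates the routing decisions at 712-724: the last migrated key determines which dataset serves the requested row, and a per-block entry records which blocks are already loaded into buffer memory (standing in for the source and target block flags). The GatewayState class and its methods are hypothetical simplifications, not the actual I/O gateway.

```python
# Hypothetical sketch of gateway read routing using the last migrated key.
class GatewayState:
    def __init__(self, source_blocks, target_blocks, last_migrated_key):
        self.source_blocks = source_blocks       # block_no -> {key: row}
        self.target_blocks = target_blocks
        self.last_migrated_key = last_migrated_key
        self.loaded = {}                         # ("src"|"tgt", block_no) -> block in memory

    def _load(self, side, blocks, block_no):
        if (side, block_no) not in self.loaded:            # 714/720: block flag not set
            self.loaded[(side, block_no)] = blocks[block_no]   # 716/724: load into buffer
        return self.loaded[(side, block_no)]

    def read_row(self, key, src_block_no, tgt_block_no):
        if key <= self.last_migrated_key:                  # 712: row already migrated
            return self._load("tgt", self.target_blocks, tgt_block_no)[key]
        return self._load("src", self.source_blocks, src_block_no)[key]

gw = GatewayState({0: {5: "old-5"}}, {0: {2: "new-2"}}, last_migrated_key=3)
print(gw.read_row(2, src_block_no=0, tgt_block_no=0))   # migrated: served from target block
print(gw.read_row(5, src_block_no=0, tgt_block_no=0))   # not migrated: served from source
```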
At 802, I/O gateway 234 receives a user request to modify a data row in source dataset 242. At 804, a determination is made as to whether the requested data row is currently selected to be migrated. In some possibly rare scenarios, a user request to modify a data row may happen simultaneously with the background migration process (e.g., 236) selecting the same data row for migration. In this scenario, the user request may be temporarily blocked until the requested data row has been migrated. Accordingly, if the requested data row is currently selected for migrating, then at 806, I/O gateway 234 temporarily blocks the user request. At 808, a determination may be made that the data row migration is complete. At 810, once the data row migration is complete, the user request is processed by the I/O gateway.
At 812, a determination is made as to whether the requested data row has been migrated to the target dataset. In one example, the last migrated key value and the key value of the requested data row can be used to determine whether the requested data row has already been migrated. The last migrated key value can be obtained from a control block of either the source dataset or the target dataset. In one example implementation, if the key value of the requested data row is less than or equal to the last migrated key value, then the requested data row has already been migrated. Conversely, if the key value of the requested data row is greater than the last migrated key value, then the requested data row has not been migrated.
If the requested data row has not been migrated from the source dataset to the target dataset, then at 814, the data row is modified in the source dataset based on user access to a source block in memory. In this scenario, the modification can be made based on the same block size in memory and in storage, because the block size of the source block loaded in memory (e.g., old block size 4K) is the same as the block size defined for the source dataset in the source storage device (e.g., old block size 4K). Modifications of data can include changing the content of the data row, deleting the data row, compressing or decompressing the data row, encrypting the data row, etc.
If the requested data row has already been migrated to the target dataset, as determined at 812, then at 816, the data row contained in a target block loaded in memory (e.g., new block size 27K) is updated based on the new block size. If the data row has been migrated, the data row is modified in the target dataset even if the user requested the modification while accessing the data row via a source block of the source dataset that is loaded in memory.
The internal processing of a user modification request for a data row, whether the data row resides in a block of the source block size or the target block size, is completely transparent to the user. The database manager, in concert with the I/O gateway, manages all aspects of data block size management and makes the process transparent to the end user.
At 902, I/O gateway 234 receives a user request to add a new data row in source dataset 242. At 904, a determination is made as to whether the migration process has been started. If it has not started, then at 906 the data row is added to the source dataset following normal processing procedures.
If the migration process has begun, then at 908, the I/O gateway 234 directs the addition of the new row to the target dataset 252. The I/O gateway 234 finds space for the new data row in the current active target block in memory.
At 910, the new data row is added to the located space in the current active target block in memory. The addition of the new data row by the I/O gateway 234 is synchronized with the migration activity, which allows data row additions to proceed concurrently with the migration.
At 912, the migration control block is updated (e.g., with a key value associated with the newly added data row) so that any future access requests for this new data row will be directed to the target dataset 252.
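One possible shape of this add path, continuing the illustrative state dictionary from the sketches above, is shown below. The names migration_started, migration_lock (assumed to be a threading.Lock), active_block, migration_control, and added_keys are likewise assumptions for this sketch.

```python
def add_row(row_key, row_value, state):
    """Direct a new data row to the source or target dataset, as appropriate."""
    if not state["migration_started"]:
        # 906: migration has not begun; follow normal processing on the source.
        state["source_rows"][row_key] = row_value
        return
    # 908/910: synchronize with the background migration and place the new row
    # in the current active target block in memory.
    with state["migration_lock"]:
        state["active_block"]["rows"][row_key] = row_value
        # 912: record the new key in the migration control block so future
        # requests for this row are directed to the target dataset.
        state["migration_control"]["added_keys"].add(row_key)
```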
At 1002, a database manager 232 may receive a utility process request that requires access to data of a source dataset (e.g., 242) being re-architected. At 1004, database manager 232 determines whether the requested dataset is currently in an architecture change process. If the requested dataset is not being re-architected, then at 1006, the utility process may be allowed to proceed.
If the requested dataset is currently in an architecture change process, then at 1008, a determination is made as to whether the utility process conflicts with the migration. If the utility process is determined not to conflict with the migration, then at 1010, the utility process is allowed to run, and any of its accesses to data rows are handled by the I/O gateway, providing full integration with the data row migration.
If the utility process is determined to conflict with the migration, then at 1012, a determination is made as to whether an alternative utility process is available and authorized to run. Determining whether an alternative utility process is authorized to run can include, but is not limited to, requesting authorization from an authorized user (e.g., a DBA) or determining whether running the alternative utility process has been pre-authorized.
If an alternative utility process is not available or is determined to not be authorized to run, then at 1014, the database manager may block the utility process until the architecture change process is complete.
If an alternative utility process is available and authorized to run during an architecture change process, then at 1016, the database manager can issue a command for I/O gateway 234 to run the alternative utility process.
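Purely as an illustration of the decision points at 1004 through 1016, the following sketch expresses the flow in Python. The object attributes (in_architecture_change, conflicts_with_migration, preauthorized, blocked_utilities) and helper names (run_integrated, dba_approves) are assumptions invented for this example.

```python
def handle_utility_request(utility, dataset, gateway, dba_approves):
    """Decide how a utility request is handled during an architecture change."""
    if not dataset.in_architecture_change:
        return utility.run()                      # 1006: proceed normally
    if not utility.conflicts_with_migration:
        return gateway.run_integrated(utility)    # 1010: gateway handles row access
    alternative = utility.alternative             # 1012: is an alternative available?
    if alternative is not None and (
        alternative.preauthorized or dba_approves(alternative)
    ):
        return gateway.run(alternative)           # 1016: gateway runs the alternative
    dataset.blocked_utilities.append(utility)     # 1014: block until change completes
```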
An alternative utility process can be configured to allow the I/O gateway to integrate the alternative utility process with the background migration process. In one example, the alternative utility process issues its data access requests to the I/O gateway. The I/O gateway receives the utility process requests and, for each request, may use a process similar to the flows previously described herein for data access requests.
In another example, the I/O gateway may allow the alternative utility process to access data sequentially, as it is migrated to the target dataset. For example, if an alternative backup utility is run by the I/O gateway, then the I/O gateway may establish a connection to a backup storage device, and then provide the alternative backup utility with access to data rows after they are successfully migrated to the target dataset.
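A sketch of that sequential integration is shown below, assuming a generator-style migrated_rows() interface on the gateway and a simple backup device object with connect, write, and disconnect methods; all of these names are invented for this example.

```python
def run_alternative_backup(gateway, backup_device):
    """Back up data rows in migration order, only after each row is migrated."""
    backup_device.connect()                    # establish the backup connection
    try:
        for row in gateway.migrated_rows():    # yields rows as they are migrated
            backup_device.write(row)
    finally:
        backup_device.disconnect()
```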
In flowchart 1100, at 1102, the I/O gateway receives a command to pause the architecture change process. In at least some embodiments, the database manager sends this command to the I/O gateway after receiving a command to pause the process from an authorized user or authorized process. In one example scenario, a command to pause the process may be received in order to allow an emergency action to proceed (e.g., stopping and restarting the system). In another example scenario, a DBA may pause the migration process to lessen the load on the database region while another critical process (e.g., billing) completes.
At 1104, the I/O gateway pauses the architecture change process. For example, the I/O gateway stops migrating data rows. The I/O gateway may still process user data requests using the data rows in the source and target datasets. At this point, the DBA (or other system manager) may decide to take the system down and perform the action that triggered the need to pause the architecture change process.
At 1106, once a determination is made to resume system processing (e.g., the unscheduled maintenance is complete), a command is received to restart the architecture change process. For example, the database manager may send the command to restart the architecture change process based on the completion of the system event (e.g., maintenance utility completes) or based on a command from the authorized user or process to restart the architecture change process.
At 1108, the I/O gateway identifies a location in the source dataset where data migration is to resume. In one embodiment, the I/O gateway may retrieve the last migrated key value from the control block of the target dataset and/or the source dataset. The last migrated key value indicates the last data row in a sequence of all data rows in the source dataset that was successfully migrated. The I/O gateway may then select the next data row from the source dataset based on the next key value in the sequence after the last migrated key value. The I/O gateway may resume migration using this selected data row.
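A minimal sketch of the resume step at 1108, assuming the control block and datasets are represented as dictionaries and migrate_row performs the per-row copy (all names here are illustrative):

```python
def resume_migration(source_dataset, target_dataset, migrate_row):
    """Resume a paused migration from the row after the last migrated key."""
    control = target_dataset["control_block"]
    last_key = control["last_migrated_key"]      # last successfully migrated row
    # Select the next data rows in key sequence after the last migrated key value.
    for key in sorted(k for k in source_dataset["rows"] if k > last_key):
        migrate_row(key, source_dataset, target_dataset)
        control["last_migrated_key"] = key       # record progress as each row moves
```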
The flowcharts and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various aspects of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed sequentially, substantially concurrently, or in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of any means or step plus function elements in the claims below are intended to include any disclosed structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present disclosure has been presented for purposes of illustration and description but is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the disclosure. The aspects of the disclosure herein were chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure with various modifications as are suited to the particular use contemplated.