1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a system and method for correlated analysis of data recovery readiness for data assets.
2. Description of Related Art
Within many enterprises, data assets may be comprised of different types of data depending upon how the data assets are used and the importance of the data assets to the enterprises. Some examples of data assets are applications, databases, servers, files, file systems, and the like. Similarly, there are many tools for managing these data assets and the data on these data assets. For instance, one known enterprise data protection product provides many different data protection techniques to protect data assets within an enterprise, corporation, remote office, or the like. Some of these data protection techniques are: progressive incremental backup, image backup, snapshot (orchestration and management), and continuous data protection. However, other known data protection management solutions do not provide a mechanism to plan for and orchestrate recovery operations based upon recovery objectives for an organization. Another data asset management tool is a storage resource management (SRM) product. The SRM product helps to monitor and manage storage resources within an enterprise including capacity planning and reporting, provisioning, replication management, and such.
However, the combination of data protection products and data asset management tools does not offer a holistic or integrated view with the focus on:
That is, known data protection and storage resource monitoring systems are missing an integrated approach that offers analysis and assessment of “recovery readiness” for data assets. This existing deficit within the storage protection and management tools significantly increases the complexity of protecting and managing data. More importantly, known data protection and storage resource monitoring systems leave the determination about the readiness and efficacy of recovery to a manual process if readiness and efficacy of recovery are being considered at all.
The illustrative embodiments provide a system and method for correlated analysis of data recovery readiness for data assets. An assessment of an estimated time of recovery for the data asset is performed based on the amount of data to be recovered, network activity, and previous backup, recovery, and/or data movement operations. A status check is performed of resources, such as backup servers, replication utilities, networks, and the like, that would be required for the recovery operation. Then, an analysis is performed of any existing disaster recovery plans, recovery service level agreements, recovery time objectives (RTO), and/or recovery point objectives (RPO) to determine what kinds of recovery are expected or desired for the data asset.
The illustrative embodiments provide for determining the recovery readiness of a data asset. The illustrative embodiments identify a set of metrics for a current recovery operation performed for the data asset. The illustrative embodiments identify a current recovery objective for the data asset. The illustrative embodiments apply the current recovery operation to the data asset using the set of metrics. The illustrative embodiments determine if the current recovery operation meets the recovery objective for the data asset. The illustrative embodiments present an error indicating the failure to meet the current recovery objective for the data asset using the current recovery operation in response to a failure of the current recovery operation to meet the recovery objective.
In the illustrative embodiments, a set of recovery policies may be identified that are available to apply to the data asset. In the illustrative embodiments, a determination may be made as to whether a different recovery policy from the set of recovery policies may meet the recovery objective for the data asset, in response to a failure of the current recovery operation to meet the recovery objective. In the illustrative embodiments, the different recovery policy may be implemented to meet the recovery objective for the data asset in response to determining the different recovery policy that meets the recovery objective for the data asset. In the illustrative embodiments, an error may be presented in response to an inability to determine a different recovery policy that meets the recovery objective for the data asset, the error indicating that no such recovery policy could be determined.
In the illustrative embodiments, a set of recovery policies may be identified that are available for the data asset. In the illustrative embodiments, a determination may be made as to whether a different recovery policy from the set of recovery policies may meet the recovery objective for the data asset in response to a failure of the current recovery operation to meet the recovery objective. In the illustrative embodiments, a user may be prompted for an indication to implement the different recovery policy that meets the recovery objective for the data asset in response to determining the different recovery policy that meets the recovery objective for the data asset. In the illustrative embodiments, the different recovery policy may be implemented to meet the recovery objective for the data asset in response to the indication to implement the different recovery policy that meets the recovery objective for the data asset.
In the illustrative embodiments, the determination if the current recovery operation meets the recovery objective for the data asset may be performed at the time of data protection. In the illustrative embodiments, performing the determination at the time of data protection may allow a corrective data protection operation to be applied prior to a recovery scenario.
In the illustrative embodiments, a first indication may be presented indicating that the recovery objective for the data asset is met in response to the current recovery operation meeting the recovery objective. In the illustrative embodiments, data protection resources may be identified that are required to recover the data asset. In the illustrative embodiments, a status of the data protection resources may be checked in order to determine if they are available and able to perform a data asset recovery. In the illustrative embodiments, a second indication may be presented indicating compromised recovery status for the data asset in response to the status indicating that the data protection resources are unable to recover the data asset.
In other illustrative embodiments, a computer program product comprising a computer useable medium having a computer readable program is provided. The computer readable program, when executed on a computing device, causes the computing device to perform various ones, and combinations of, the operations outlined above with regard to the method illustrative embodiment.
In yet another illustrative embodiment, a system is provided. The system may comprise a processor and a memory coupled to the processor. The memory may comprise instructions which, when executed by the processor, cause the processor to perform various ones, and combinations of, the operations outlined above with regard to the method illustrative embodiment.
These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the exemplary embodiments of the present invention.
The invention, as well as a preferred mode of use and further objectives and advantages thereof, will best be understood by reference to the following detailed description of illustrative embodiments when read in conjunction with the accompanying drawings, wherein:
The illustrative embodiments provide mechanisms for assessing data recovery readiness for data assets. As such, the mechanisms of the illustrative embodiments are especially well suited for implementation within a distributed data processing environment and within, or in association with, data processing devices, such as servers, client devices, and the like. In order to provide a context for the description of the mechanisms of the illustrative embodiments,
With reference now to the figures,
In the depicted example, server 104 and server 106 are connected to network 102 along with storage unit 108. In addition, clients 110, 112, and 114 are also connected to network 102. These clients 110, 112, and 114 may be, for example, personal computers, network computers, or the like. In the depicted example, server 104 provides data, such as boot files, operating system images, and applications to the clients 110, 112, and 114. Clients 110, 112, and 114 are clients to server 104 in the depicted example. Distributed data processing system 100 may include additional servers, clients, and other devices not shown.
In the depicted example, distributed data processing system 100 is the Internet with network 102 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, governmental, educational and other computer systems that route data and messages. Of course, distributed data processing system 100 may also be implemented to include a number of different types of networks, such as for example, an intranet, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or the like. As stated above,
If distributed data processing system 100 is implemented as a storage area network (SAN), the SAN may be implemented as a Fibre Channel (FC) compliant SAN. Fibre Channel is a scalable data transfer interface technology that maps several common transport protocols, including Internet Protocol (IP) and Small Computer System Interface (SCSI), allowing it to merge high-speed I/O and networking functionality in a single connectivity technology. Fibre Channel is a set of open standards defined by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO). Detailed information regarding the various Fibre Channel standards is available from ANSI Accredited Standards Committee (ASC) X3T11 (www.t11.org), which is primarily responsible for the Fibre Channel project. These standards are collectively referred to in this specification as the Fibre Channel standard or the Fibre Channel specification. Fibre Channel operates over both copper and fiber optic cabling at distances of up to 10 kilometers and supports multiple inter-operable topologies including point-to-point, arbitrated-loop, and switching (and combinations thereof).
With reference now to
In the depicted example, data processing system 200 employs a hub architecture including north bridge and memory controller hub (NB/MCH) 202 and south bridge and input/output (I/O) controller hub (SB/ICH) 204. Processing unit 206, main memory 208, and graphics processor 210 are connected to NB/MCH 202. Graphics processor 210 may be connected to NB/MCH 202 through an accelerated graphics port (AGP).
In the depicted example, local area network (LAN) adapter 212 connects to SB/ICH 204. Audio adapter 216, keyboard and mouse adapter 220, modem 222, read only memory (ROM) 224, hard disk drive (HDD) 226, CD-ROM drive 230, universal serial bus (USB) ports and other communication ports 232, and PCI/PCIe devices 234 connect to SB/ICH 204 through bus 238 and bus 240. PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. PCI uses a card bus controller, while PCIe does not. ROM 224 may be, for example, a flash binary input/output system (BIOS).
HDD 226 and CD-ROM drive 230 connect to SB/ICH 204 through bus 240. HDD 226 and CD-ROM drive 230 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface. Super I/O (SIO) device 236 may be connected to SB/ICH 204.
An operating system runs on processing unit 206. The operating system coordinates and provides control of various components within the data processing system 200 in
As a server, data processing system 200 may be, for example, an IBM® eServer™ System p™ computer system, running the Advanced Interactive Executive (AIX®) operating system or the LINUX® operating system (eServer, System p and AIX are trademarks of International Business Machines Corporation in the United States, other countries, or both while LINUX is a trademark of Linus Torvalds in the United States, other countries, or both). Data processing system 200 may be a symmetric multiprocessor (SMP) system including a plurality of processors in processing unit 206. Alternatively, a single processor system may be employed.
Instructions for the operating system, the object-oriented programming system, and applications or programs are located on storage devices, such as HDD 226, and may be loaded into main memory 208 for execution by processing unit 206. The processes for illustrative embodiments of the present invention may be performed by processing unit 206 using computer usable program code, which may be located in a memory, such as, for example, main memory 208, ROM 224, or in one or more peripheral devices 226 and 230, for example.
A bus system, such as bus 238 or bus 240 as shown in
Those of ordinary skill in the art will appreciate that the hardware in
Moreover, the data processing system 200 may take the form of any of a number of different data processing systems including client computing devices, server computing devices, a tablet computer, laptop computer, telephone or other communication device, a personal digital assistant (PDA), or the like. In some illustrative examples, data processing system 200 may be a portable computing device which is configured with flash memory to provide non-volatile memory for storing operating system files and/or user-generated data, for example. Essentially, data processing system 200 may be any known or later developed data processing system without architectural limitation.
The illustrative embodiments provide a new level of correlated analysis of data recovery readiness for data assets. While known systems provide for reporting, monitoring, and analysis of backup operations, those known systems do not perform these functions for all of the different types of backup and recovery options that are available for the data asset. One illustrative embodiment of the present invention provides for the discovery, monitoring, and reporting of different types of backup and recovery options, such as traditional backup, block level continuous data protection, file level continuous data protection, application replication, and the like, that are available for the data asset, which may be a file, file system, database table, application, server, or the like. This illustrative embodiment provides an assessment of an estimated time of recovery for the data asset based on the amount of data to be recovered, network activity, and/or previous backup, recovery, and/or data movement operations. This illustrative embodiment provides a status check of resources, such as backup servers, replication utilities, networks, and the like, that would be required for the recovery operation. This illustrative embodiment further provides for an analysis of any existing disaster recovery plans, recovery service level agreements, recovery time objectives (RTO), and/or recovery point objectives (RPO) to determine what kinds of recovery are expected or desired for the data asset.
Another illustrative embodiment provides mechanisms for generating a recovery health index for a data asset. The recovery health index uses a recovery health status indicator to represent the overall recoverability of data assets. This recovery health index presents a summary view of the correlated analysis of data recovery readiness for the data asset. Yet another illustrative embodiment provides mechanisms for generating a recovery orchestration by evaluating policies at the time a backup is performed. For example, at the time a backup is performed, the backup application may make a determination of the likelihood that it will be able to meet the recovery objectives for the data being backed up. If the backup application determines that it is not possible to meet those objectives with the currently configured backup method, the application may alert an administrator that it may not be possible to meet the recovery objectives for this data. Optionally, the backup application may automatically implement a backup strategy for the data asset that is better able to meet the required recovery objectives.
Data protection application 306 may store and retrieve data from protected data assets 318, which are a subset of data assets 302 that require monitoring for recovery readiness, using known network based protocols and transmissions of data to and from data protection server 310 or data protection agent 312. Data protection application 306 may further manage the data from protected data assets 318 by storing or retrieving the data to or from data protection resources 304. Alternatively, data protection application 306 may store or retrieve the backed-up data of protected data assets 318 directly from data protection resources 304. Through a coordinated use of data protection resources 304 with data protection application 306 and while using advanced storage capabilities, such as storage area networks (SANs) and Fibre Channel (FC) protocols, data protection agent 312 may have direct access to data protection resources 304. Therefore, data protection application 306 performs a data protection operation by reading data from protected data assets 318 and transferring the data to a targeted one of data protection resources 304, or reading the data from data protection resources 304 and writing the data to protected data assets 318.
Data recovery readiness correlation engine 308 may include data protection management module 314 and recovery analysis and orchestration (RAO) module 316. Data protection management module 314 may manage data protection application 306, which is executed on data protection server 310 and/or data protection agent 312. Data protection management module 314 may evaluate the data from protected data assets 318 and data assets 302 in terms of where the data is located and the importance of the data to the user. Similarly, data protection management module 314 may be responsible for evaluating and determining what data protection methodologies are being deployed to protect the data on data assets 302, specifically the data in protected data assets 318. Data protection management module 314 may keep track of the physical storage resources provisioned and in use in data protection resources 304, protected data assets 318 backed-up on data protection resources 304, data transfer rates as data is moved between protected data assets 318 and data protection resources 304, data placement correlation to know where data is kept on data assets 302 or data protection resources 304, and/or the like.
For example, while a primary copy of the data resides in protected data assets 318, a backup or replica copy of the data may exist in data protection resources 304. The replica may have been created by data protection application 306 running on data protection server 310 and/or data protection agent 312. Data protection application 306 may have created the backup or replica copy of the data on data protection resources 304 using a hardware level disk snapshot, a software copy command, or the like.
Both data protection management module 314 and data protection application 306 may use one or more databases to store various types of information regarding the data in protected data assets 318 that is being backed-up or recovered. Data protection application 306 may track metadata that describes the data in protected data assets 318, the location of the data on data protection resources 304, or the like. Similarly, data protection management module 314 may also track data assets 302 in terms of the type of data, the location of the data in data assets 302, protected data assets 318, data transfer times, other descriptive information about that data, and/or the like. Data protection management module 314 and data protection application 306 may use various discovery technologies, such as agents and standards based discovery application programming interfaces (APIs), to collect the information about the data protection resources 304 and data assets 302.
RAO module 316 may interact with the data protection management module 314, data protection server 310, data assets 302, and/or data protection resources 304 in order to discover or retrieve a variety of information to determine the data recovery readiness for data assets 302 that require protection, such as:
Thus, RAO module 316 may interrogate data protection management module 314 and/or data protection application 306 running on data protection server 310 to determine the type of data protection and methodology, referred to herein as metrics, being applied to each of protected data assets 318. RAO module 316 may also collect metric information on how the existing backup operation performs, such as time required for current operation to complete, size of data asset to be protected, number of channels available to protect the data asset, throughput rates achieved for the backup operation, resources used to perform the backup, and the like.
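As an illustration of how collected metrics of this kind could feed an estimated time of recovery, the following is a minimal sketch. It assumes that the mean throughput of previous backup operations is representative of restore throughput. The names BackupRun and estimate_recovery_minutes, and the units chosen, are hypothetical and do not appear in the embodiments.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class BackupRun:
    """Metrics recorded for one previous backup operation (hypothetical record)."""
    size_gb: float       # size of the data asset that was protected
    duration_min: float  # time the operation required to complete


def estimate_recovery_minutes(asset_size_gb: float, history: list[BackupRun]) -> float:
    """Estimate restore time from the mean throughput of previous backups.

    Assumption: restore throughput is roughly equal to backup throughput.
    """
    if not history:
        raise ValueError("no previous operations to base an estimate on")
    # Throughput in GB per minute for each previous run, then averaged.
    throughput = mean(run.size_gb / run.duration_min for run in history)
    return asset_size_gb / throughput


history = [BackupRun(size_gb=100, duration_min=50),
           BackupRun(size_gb=120, duration_min=40)]
estimate = estimate_recovery_minutes(250, history)
```

A fuller estimate might also weight recent runs more heavily or account for current network activity; this sketch omits both.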
In order to perform a correlated analysis of data recovery readiness, RAO module 316 may solicit a set of recovery policies from policy manager 320 associated with data protection management module 314 and/or data protection application 306 running on data protection server 310. Policy manager 320 may include attributes, such as the scope of the policy, the recovery objective of the policy, or the like. The scope of the policy refers to the data assets covered by the policy, such as a file, directory, drive, server, or application. The recovery objective of the policy refers to the type of recovery operation to be enforced, such as recovery point, recovery time, or the like.
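The policy attributes described above can be pictured as a small record type. The sketch below is one possible representation only. The class and field names, and the units assumed for the target value, are illustrative assumptions and are not taken from policy manager 320.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """Data assets a policy may cover, per the scopes named in the text."""
    FILE = "file"
    DIRECTORY = "directory"
    DRIVE = "drive"
    SERVER = "server"
    APPLICATION = "application"


class ObjectiveType(Enum):
    """Type of recovery objective to be enforced."""
    RECOVERY_TIME = "RTO"
    RECOVERY_POINT = "RPO"


@dataclass(frozen=True)
class RecoveryPolicy:
    name: str
    scope: Scope
    objective: ObjectiveType
    target: float  # assumed units: minutes for RTO, snapshots per day for RPO


def policies_for(scope, policies):
    """Return the policies whose scope matches the given scope."""
    return [p for p in policies if p.scope is scope]


policies = [
    RecoveryPolicy("db-rto", Scope.APPLICATION, ObjectiveType.RECOVERY_TIME, 120),
    RecoveryPolicy("home-rpo", Scope.DIRECTORY, ObjectiveType.RECOVERY_POINT, 12),
]
```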
Once RAO module 316 has identified the recovery policies and the metrics, RAO module 316 may then apply the recovery operation associated with each data asset in protected data assets 318 using the metrics that were collected for that data asset. The metrics are applied in such a way as to establish a recovery health index (RHI) for each of protected data assets 318. The RHI may be derived by using the information for the backup(s) of protected data assets 318 as being a representative indication of the restore capacity and throughput that could be achieved during recovery of protected data assets 318. Thus, if protected data assets 318 are managed using a recovery time objective (RTO), then RAO module 316 may compare the available recent backup information, such as transfer times, amount of data, backup location, or the like, to the recovery time objective desired for each of protected data assets 318. If the RTO is for recovery in 120 minutes or less, RAO module 316 uses the time it takes to perform the backup for each of protected data assets 318 divided by the RTO to establish the RHI for each of protected data assets 318. For example, if the time it takes to perform the backup for a given data asset is 130 minutes, then the RHI would be 130/120, or approximately 1.08. RAO module 316 may then use an algorithm to alert or take action for each of protected data assets 318 with an RHI greater than a predetermined value, such as 1.0. RAO module 316 identifies each of protected data assets 318 where the RHI is equal to or less than the predetermined value as being adequately provided for recovery based on the existing approach.
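The RTO-based calculation just described reduces to a ratio and a threshold comparison. The following minimal sketch restates it in code. The function names are hypothetical, and the default threshold of 1.0 is the example predetermined value from the text.

```python
def recovery_health_index(backup_minutes: float, rto_minutes: float) -> float:
    """RHI for an RTO-managed asset: backup time divided by the RTO."""
    if rto_minutes <= 0:
        raise ValueError("RTO must be positive")
    return backup_minutes / rto_minutes


def needs_attention(rhi: float, threshold: float = 1.0) -> bool:
    """True when the RHI exceeds the predetermined value (alert or take action)."""
    return rhi > threshold


rhi = recovery_health_index(130, 120)  # the worked example from the text
flagged = needs_attention(rhi)
```

With a 130 minute backup and a 120 minute RTO, the index rounds to 1.08 and the asset is flagged. A backup that takes exactly 120 minutes yields an index of 1.0 and is not flagged.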
RAO module 316 may also use the RHI algorithm to determine if a recovery point objective (RPO) may be met. For example, if one of protected data assets 318 has an RPO of a snapshot or image 12 times per day, RAO module 316 considers the type of snapshot technology being used and a frequency that the snapshot is being captured. The determination made by RAO module 316 depends upon the reporting capabilities of data protection management module 314, the capabilities of data protection application 306, and/or monitoring of protected data assets 318.
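The text does not give a formula for the RPO case. One way to keep the same convention, in which an index greater than the predetermined value signals a shortfall, is to divide the required snapshot frequency by the observed frequency. The sketch below assumes that ratio. It is an illustrative assumption, not the algorithm of the embodiments, and the function name is hypothetical.

```python
def rpo_health_index(required_per_day: int, actual_per_day: int) -> float:
    """Assumed RPO analogue of the RHI: required frequency / observed frequency.

    A value greater than 1.0 means snapshots are captured less often than
    the recovery point objective requires.
    """
    if actual_per_day <= 0:
        # No snapshots are being captured, so the objective cannot be met.
        return float("inf")
    return required_per_day / actual_per_day


meets = rpo_health_index(12, 24)      # snapshots twice as often as required
falls_short = rpo_health_index(12, 8)  # snapshots less often than required
```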
The following pseudo-code illustrates how this RHI algorithm would work for the various ones of protected data assets 318 being considered:
If scope is a single FILE,
If scope is a DIRECTORY,
If scope is a DRIVE, such as a file space,
If scope is a SERVER,
If scope is an APPLICATION, such as a database application,
For those ones of protected data assets 318 with an RHI equal to or less than the predetermined value, RAO module 316 may identify those data assets as needing no change. For those ones of protected data assets 318 with an RHI greater than the predetermined value, RAO module 316 may identify those data assets as not being able to meet the required recovery needs and additional actions may be needed.
Once RAO module 316 identifies the particular ones of protected data assets 318 for which recovery system 300 is able to meet the required recovery needs, RAO module 316 may then evaluate the devices, snapshots, or other resources needed to recover those data assets. RAO module 316 may evaluate the resources used for the recovery compared to those available for the recovery. That is, RAO module 316 analyzes the data protection resources of data protection resources 304 that are used to protect the particular data asset being evaluated and the remaining ones of data protection resources 304 that are available for data protection. If RAO module 316 determines that the required data protection resources are not readily available, then RAO module 316 may update the RHI to reflect that recovery may not be possible and would list the resources that are needed which may not be available. Similarly, RAO module 316 may make recommendations about alternate data protection approaches that could be used to satisfy the recovery policies for protected data assets 318.
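The resource evaluation described above can be sketched as a set difference between the resources a recovery would need and those currently available. The RecoveryStatus record and the resource names used here are hypothetical. How the RHI itself is updated is not specified in the text, so this sketch only attaches a recoverability flag and the list of missing resources.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryStatus:
    """RHI for an asset plus the outcome of the resource check (hypothetical)."""
    rhi: float
    recoverable: bool = True
    missing: list = field(default_factory=list)


def check_resources(status: RecoveryStatus, required, available) -> RecoveryStatus:
    """Flag the asset if any resource required for recovery is unavailable."""
    missing = sorted(set(required) - set(available))
    if missing:
        status.recoverable = False
        status.missing = missing  # resources that are needed but unavailable
    return status


status = check_resources(
    RecoveryStatus(rhi=0.8),
    required=["backup-server-1", "tape-library-2"],
    available=["backup-server-1"],
)
```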
Once RAO module 316 identifies the particular ones of protected data assets 318 for which recovery system 300 is not able to meet the required recovery needs, RAO module 316 may determine if a policy could be implemented that would enable recovery system 300 to meet the required recovery needs. RAO module 316 may analyze each of protected data assets 318 for which recovery system 300 is not able to meet the required recovery needs, may analyze the data protection application 306 and interfaces available for effecting a change, and may analyze the data protection resources 304 that are available.
If RAO module 316 determines that a recovery policy could be implemented that would meet the required recovery needs of protected data assets 318, then RAO module 316 may automatically implement the determined recovery policy based upon user preferences. Otherwise, RAO module 316 may alert an administrator of the determined recovery policy, so that the administrator may implement the recovery policy. If RAO module 316 fails to determine a recovery policy that could be implemented that would meet the required recovery needs of protected data assets 318, RAO module 316 may alert an administrator that a recovery policy to meet the required recovery needs of protected data assets 318 could not be determined.
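The three outcomes described above (automatic implementation, an alert naming the determined policy, or an alert that no policy could be determined) can be sketched as follows. The sketch assumes that a projected RHI can be computed for each candidate policy. The function names, message strings, and policy names are hypothetical.

```python
from typing import Callable, Optional, Sequence


def choose_policy(candidates: Sequence[str],
                  projected_rhi: Callable[[str], float],
                  threshold: float = 1.0) -> Optional[str]:
    """Return the first candidate whose projected RHI meets the threshold."""
    for policy in candidates:
        if projected_rhi(policy) <= threshold:
            return policy
    return None


def act_on_shortfall(candidates: Sequence[str],
                     projected_rhi: Callable[[str], float],
                     auto_implement: bool) -> str:
    """Decide what to do for an asset whose current operation falls short."""
    policy = choose_policy(candidates, projected_rhi)
    if policy is None:
        return "ALERT: no recovery policy meets the required recovery needs"
    if auto_implement:  # user preference permits automatic implementation
        return f"IMPLEMENTED: {policy}"
    return f"ALERT: administrator may implement {policy}"


projections = {"nightly-full": 1.4, "hourly-snapshot": 0.6}
outcome = act_on_shortfall(list(projections), projections.get, auto_implement=True)
```

Returning the first acceptable candidate is a simplification; an implementation could instead rank candidates by cost or by projected RHI.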
Accordingly, blocks of the flowchart illustration support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the flowchart illustration, and combinations of blocks in the flowchart illustration, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or by combinations of special purpose hardware and computer instructions.
Furthermore, the flowchart is provided to demonstrate the operations performed within the illustrative embodiments. The flowchart is not meant to state or imply limitations with regard to the specific operations or, more particularly, the order of the operations. The operations of the flowchart may be modified to suit a particular implementation without departing from the spirit and scope of the present invention.
Thus, the illustrative embodiments provide for a new level of correlated analysis of data recovery readiness. In the illustrative embodiments, an assessment is made of a current recovery policy applied to the data asset. If the current recovery policy does not meet a required recovery parameter, the illustrative embodiments provide an analysis of existing recovery operations that will meet the required recovery parameter. If an existing recovery operation exists that will meet the required recovery parameter, the illustrative embodiments will use the identified recovery operation for the data asset.
With regard to
If at step 408 the applied recovery operation meets the required objective for the protected data asset, then the RAO module identifies the protected data asset as needing no change for the applied recovery operation (step 410). The RAO module then determines if there is another protected data asset to analyze (step 412). If at step 412 there is an additional protected data asset to analyze, then the operation returns to step 402. If at step 412 there is not an additional protected data asset to analyze, then the operation ends.
If at step 408 the applied recovery operation fails to meet the required objective for the protected data asset, then the RAO module determines if a recovery policy could be implemented that would meet the required recovery needs for the protected data asset (step 414). If at step 414 the RAO module determines a recovery policy that could be implemented that would meet the required recovery needs of the protected data asset, then the RAO module may implement the determined recovery policy automatically based upon user preferences (step 416), with the operation continuing to step 412 thereafter. If at step 414 the RAO module fails to determine a recovery policy that could be implemented that would meet the required recovery needs of the protected data asset, then the RAO module sends an error that the applied recovery operation does not meet recovery objectives and another recovery policy could not be determined (step 418), with the operation continuing to step 412 thereafter.
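The control flow of steps 408 through 418 can be summarized in a short loop. This is a sketch of the described flow only. The callables meets_objective and find_policy stand in for the analysis performed by the RAO module and are hypothetical, as are the result strings.

```python
def analyze_assets(assets, meets_objective, find_policy):
    """Walk each protected data asset through steps 408 to 418."""
    results = {}
    for asset in assets:  # step 412: continue while assets remain to analyze
        if meets_objective(asset):                      # step 408
            results[asset] = "no change"                # step 410
            continue
        policy = find_policy(asset)                     # step 414
        if policy is not None:
            results[asset] = f"implement {policy}"      # step 416
        else:
            results[asset] = "error: objective not met, no policy found"  # step 418
    return results


results = analyze_assets(
    ["db", "home", "archive"],
    meets_objective=lambda a: a == "db",
    find_policy=lambda a: "hourly-snapshot" if a == "home" else None,
)
```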
Thus, the illustrative embodiments provide for a new level of correlated analysis of data recovery readiness. In the illustrative embodiments, an assessment is made of a current recovery policy applied to the data asset to obtain an estimated time of recovery for the data asset based on the amount of data to be recovered and previous backup, recovery, and/or data movement operations. If the current recovery policy does not meet a required recovery parameter, the illustrative embodiments provide a status check of resources and provide for an analysis of any existing recovery operations in order to meet the required recovery parameter. The illustrative embodiments provide for a recovery health index for a data asset that is used for a correlated analysis of data recovery readiness for the data asset. By evaluating policies at the time a backup is performed, the illustrative embodiments are able to determine if a particular recovery operation for a data asset will be able to meet the recovery objectives for the data asset being backed up. If the backup application determines that it is not possible to meet those objectives with the currently configured backup method, the application may alert an administrator that it may not be possible to meet the recovery objectives for this data. Optionally, the backup application may automatically implement a backup strategy for the data asset that is better able to meet the required recovery objectives.
It should be appreciated that the illustrative embodiments may take the form of a specialized hardware embodiment, a software embodiment that is executed on a computer system having general processing hardware, or an embodiment containing both specialized hardware and software elements that are executed on a computer system having general processing hardware. In one exemplary embodiment, the mechanisms of the illustrative embodiments are implemented in a software product, which may include but is not limited to firmware, resident software, microcode, etc.
Furthermore, the illustrative embodiments may take the form of a computer program product accessible from a computer-usable or computer-recordable medium providing program code recorded thereon for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-recordable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The medium may be an electronic, magnetic, optical, electromagnetic, or semiconductor system, apparatus, or device. Examples of a computer-recordable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include compact disk-read-only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.