Independent parallel on demand recovery of data replicas in a storage system

Information

  • Patent Grant
  • Patent Number
    10,235,092
  • Date Filed
    Thursday, December 15, 2016
  • Date Issued
    Tuesday, March 19, 2019
Abstract
Described embodiments provide devices, systems and methods for operating a storage system. An object store located at the replication site stores data objects associated with data stored in storage of the production site. The replication site may generate a plurality of points in time (PITs) from the data objects and identify a PIT from the plurality of PITs.
Description
BACKGROUND

A storage system may include a production site having a plurality of storage devices (e.g., storage arrays) to provide data storage. The storage system may include data protection systems that back up production site data by replicating production site data on a backup storage system. The backup storage system may be situated in the same physical location as the production storage system, or in a physically remote (e.g., virtual or cloud) location. The production site data may be replicated on a periodic basis and/or may be replicated as changes are made to the production site data.


SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.


One aspect provides a storage system that includes a production site and a replication site. An object store located at the replication site stores data objects associated with data stored in storage, e.g., logical units or virtual disks, of the production site. The replication site generates a plurality of points in time (PITs) from the data objects and identifies a PIT from the plurality of PITs.


Another aspect provides a method for operating a storage system that includes a production site and a cloud replication site. The method includes generating an object store located at the cloud replication site. The object store stores data objects associated with data stored in storage, e.g., logical units or virtual disks, of the production site. A plurality of points in time (PITs) are generated from the data objects to identify a PIT from the plurality of PITs.


Another aspect provides a computer program product including a non-transitory computer readable storage medium having computer program code encoded thereon that when executed on a processor of a computer causes the computer to operate a storage system. The storage system includes a production site and a cloud replication site. The computer program product includes computer program code for generating an object store located at the cloud replication site. The object store stores data objects associated with data stored in storage, e.g., logical units or virtual disks, of the production site. The computer program product also includes computer program code for generating a plurality of points in time (PITs) from the data objects and identifying a PIT from the plurality of PITs.





BRIEF DESCRIPTION OF THE DRAWING FIGURES

Objects, aspects, features, and advantages of embodiments disclosed herein will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which like reference numerals identify similar or identical elements. Reference numerals that are introduced in the specification in association with a drawing figure may be repeated in one or more subsequent figures without additional description in the specification in order to provide context for other features. For clarity, not every element may be labeled in every figure. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments, principles, and concepts. The drawings are not meant to limit the scope of the claims included herewith.



FIG. 1A is a block diagram of a data protection system, in accordance with a first illustrative embodiment;



FIG. 1B is a block diagram of a data protection system, in accordance with a second illustrative embodiment;



FIG. 2A is a block diagram of an object store of the data protection system of FIG. 1B, in accordance with an illustrative embodiment;



FIG. 2B is a block diagram showing an illustrative relationship between objects in the object store of FIG. 2A, in accordance with a second illustrative embodiment;



FIG. 2C is a block diagram showing recovery helper virtual machines in relation to the object store of FIG. 2A, in accordance with an illustrative embodiment;



FIG. 2D is a block diagram showing an illustrative relationship between objects in the object store of FIG. 2A to generate points in time, in accordance with an illustrative embodiment;



FIG. 3 is a flow diagram of a process to perform data recovery by the data protection system of FIGS. 1A and 1B, in accordance with an illustrative embodiment;



FIG. 4 is a flow diagram of a process to rebuild a point in time (PIT) by the data protection system of FIGS. 1A and 1B, in accordance with an illustrative embodiment;



FIG. 5 is a flow diagram of a process to select a PIT by the data protection system of FIGS. 1A and 1B, in accordance with an illustrative embodiment; and



FIG. 6 is a block diagram of an example of a hardware device that may perform at least a portion of the processes in FIGS. 4-5.





DETAILED DESCRIPTION

Before describing embodiments of the concepts, structures, and techniques sought to be protected herein, some terms are explained. In some embodiments, the term “I/O request” or simply “I/O” may be used to refer to an input or output request. In some embodiments, an I/O request may refer to a data read or data write request. In some embodiments, the term “storage system” may encompass physical computing systems, cloud or virtual computing systems, or a combination thereof. In some embodiments, the term “storage device” may refer to any non-volatile memory (NVM) device, including hard disk drives (HDDs), solid state drives (SSDs), flash devices (e.g., NAND flash devices), and similar devices that may be accessed locally and/or remotely (e.g., via a storage area network (SAN)). In some embodiments, the term “storage device” may also refer to a storage array including multiple storage devices.


Referring to the illustrative embodiment shown in FIG. 1A, data protection system 100 may include two sites, production site 102 and replication site 122. Production site 102 may generally be a facility where one or more hosts run data processing applications that write data to a storage system and read data from the storage system. Replication site 122 may generally be a facility where replicated production site data is stored. In such embodiments, production site 102 may back up (e.g., replicate) production data at replication site 122. In certain embodiments, production site 102 and replication site 122 may be remote from one another. For example, as shown in FIG. 1B, replication site 122′ may be implemented as one or more “virtual” or “cloud” replication sites located remotely from production site 102′ and in communication via a network link (e.g., the Internet, etc.).


Replication site 122 may replicate production site data and enable rollback of data of production site 102 to an earlier point in time (PIT). Rollback may be used in the event of data corruption or a disaster, or alternatively to view or access data from an earlier point in time.


As shown in FIGS. 1A and 1B, production site 102 may include a host (FIG. 1A) or virtual machine (VM) (FIG. 1B) 104, splitter 106, storage (or storage array) 110, and a data protection appliance (DPA) 108. In some embodiments, host 104 may write to a logical unit in storage 110. In embodiments such as shown in FIG. 1B, VM 104′ may write to virtual disk(s) 112′ in a virtual machine file system (VMFS) 110′. Replication site 122 may include DPA 126 and storage 130. In some embodiments, host 104 may include one or more devices (or “nodes”) that may be designated an “initiator,” a “target”, or both, coupled by communication links appropriate for data transfer, such as an InfiniBand (IB) link or Fibre Channel (FC) link, and/or a network, such as an Ethernet or Internet (e.g., TCP/IP) network that may employ, for example, the iSCSI protocol.


Storage 110 and storage 130 may include storage devices for storing data, such as disks or arrays of disks. Storage 110 may provide (e.g., expose) one or more logical units (LUs) 112 to which production commands are issued, while storage 130 may provide (e.g., expose) one or more logical units (LUs) 132 to which replication commands are issued. As described herein, an LU is a logical entity provided by a storage system for accessing data stored therein. In some embodiments, a logical unit may be a physical logical unit or a virtual logical unit, and may be identified by a unique logical unit number (LUN).


In some embodiments, DPA 108 and DPA 126 may perform various data protection services, such as data replication of storage system 100, and journaling of I/O requests issued by device 104. DPA 108 and DPA 126 may also enable rollback of production data in storage 110 to an earlier point-in-time (PIT) from replica data stored in storage 130, and enable processing of rolled back data at the target site. In some embodiments, each DPA 108 and DPA 126 may be a physical device, a virtual device, or may be a combination of a virtual and physical device.


In some embodiments, DPA 108 may receive commands (e.g., SCSI commands) issued by device 104 to LUs 112. For example, splitter 106 may intercept commands from device 104, and provide the commands to storage 110 and also to DPA 108. Splitter 106 may act on an intercepted SCSI command issued to a logical unit in one of the following ways: send the SCSI command to its intended LU; redirect the SCSI command to another LU; split the SCSI command by sending it first to DPA 108 and, after DPA 108 returns an acknowledgement, send the SCSI command to its intended LU; fail a SCSI command by returning an error return code; or delay a SCSI command by not returning an acknowledgement to the respective host. In some embodiments, splitter 106 may handle different SCSI commands differently, according to the type of the command. For example, in some embodiments, a SCSI command inquiring about the size of a certain LU may be sent directly to that LU, whereas a SCSI write command may be split and sent to DPA 108.
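
For illustration only, the per-command behavior of the splitter described above can be sketched as a small dispatch routine. The class and method names (`Splitter`, `handle`, `Action`) and the `command.is_write` attribute are hypothetical placeholders, not elements of the patent.

```python
from enum import Enum, auto

class Action(Enum):
    SEND = auto()      # pass the command straight to its intended LU
    REDIRECT = auto()  # send the command to another LU
    SPLIT = auto()     # send to the DPA first, then to the intended LU
    FAIL = auto()      # return an error code to the host
    DELAY = auto()     # withhold the acknowledgement

class Splitter:
    """Illustrative splitter: inquiry-type commands go straight to the LU,
    write commands are split (DPA first, then the intended LU)."""

    def __init__(self, dpa, storage):
        self.dpa = dpa
        self.storage = storage

    def handle(self, command):
        if command.is_write:
            # Split: the DPA must acknowledge before the write reaches its LU.
            ack = self.dpa.replicate(command)
            if not ack:
                return Action.FAIL
            self.storage.submit(command)
            return Action.SPLIT
        # Inquiries (e.g., about LU size) go directly to the addressed LU.
        self.storage.submit(command)
        return Action.SEND
```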


In certain embodiments, splitter 106 and DPA 126 may be drivers located in respective host devices of production site 102 and replication site 122. Alternatively, in some embodiments, a protection agent may be located in a Fibre Channel switch, or in any other device situated in a data path between host/VM 104 and storage 110. In a virtualized environment, the protection agent may run at the hypervisor layer or in a virtual machine providing a virtualization layer. For example, in such embodiments, a hypervisor may consume LUs and may generate a distributed file system, such as a Virtual Machine File System (VMFS), on the logical units that generates files in the file system and exposes the files as LUs to the virtual machines (each virtual machine disk is seen as a SCSI device by virtual hosts). In another embodiment, a hypervisor may consume a network-based file system and expose files in the Network File System (NFS) as SCSI devices to virtual hosts.


In some embodiments, production DPA 108 may send its write transactions to replication DPA 126 using a variety of modes of transmission, such as continuous replication or snapshot replication. For example, in continuous replication, production DPA 108 may send each write transaction to storage 110 and also send each write transaction to replication DPA 126 to be replicated on storage 130. In snapshot replication, production DPA 108 may receive several I/O requests and combine them into an aggregate “snapshot” or “batch” of write activity performed to storage 110 in the multiple I/O requests, and may send the snapshot to replication DPA 126 for journaling and incorporation in target storage system 120. In such embodiments, a snapshot replica may be a differential representation of a volume. For example, the snapshot may include pointers to the original volume, and may point to log volumes for locations of the original volume that store data changed by one or more I/O requests. In some embodiments, snapshots may be combined into a snapshot array, which may represent different images over a time period (e.g., for multiple PITs).
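
The two transmission modes can be contrasted with a brief sketch. The `send_to_replica` callable and the batch-size threshold are assumptions made for illustration; they are not an API defined by the patent.

```python
class ProductionDpa:
    """Illustrative contrast of continuous vs. snapshot replication modes."""

    def __init__(self, send_to_replica, mode="continuous", batch_size=128):
        self.send_to_replica = send_to_replica  # ships writes to the replication DPA
        self.mode = mode
        self.batch_size = batch_size            # illustrative threshold only
        self.batch = []

    def on_write(self, write):
        if self.mode == "continuous":
            # Continuous replication: every write is sent as it occurs.
            self.send_to_replica([write])
        else:
            # Snapshot replication: writes are combined into a batch ("snapshot")
            # and shipped together for journaling at the replication site.
            self.batch.append(write)
            if len(self.batch) >= self.batch_size:
                self.send_to_replica(self.batch)
                self.batch = []
```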


As shown in FIG. 2A, in some embodiments, a copy of a LUN or LU may be stored in an object store (e.g., object store 130′ of FIG. 1B) of replication site 122′. Object store 130′ may be implemented as shown in FIG. 2A, and may include a set of objects that may represent data of the LUN. For example, as shown in FIG. 2A, object store 200 may include one or more disk objects 202, one or more change objects 204 and one or more metadata objects 206. Disk objects 202 may include data stored in the LUN. As will be described, change objects may represent changes to data of the LUN over time. Metadata objects 206 may describe how a set of objects representing data of the LUN correspond to or may be used to create the LUN. In some embodiments, object store 200 may be in a cloud replication site. Replication may include sending objects representing changes to one or more LUNs on production site 102 to the replication site.
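
One way to picture the three object types is as simple records keyed by LUN. The following Python dataclasses are a minimal sketch of one possible layout; the field names and types are assumptions for illustration, not the patent's definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class DiskObject:
    """Full data for a region of the replicated LUN at a base point in time."""
    lun_id: str
    offset: int          # starting offset within the LUN
    data: bytes

@dataclass
class ChangeObject:
    """Accumulated writes (I/O) applied to the LUN after the base image."""
    lun_id: str
    writes: List[Tuple[int, bytes]]  # (offset, data) pairs, in arrival order

@dataclass
class MetadataObject:
    """Describes which disk/change objects make up the LUN at a given time."""
    lun_id: str
    timestamp: float
    disk_object_keys: List[str]
    change_object_keys: List[str]

@dataclass
class ObjectStore:
    disk_objects: Dict[str, DiskObject] = field(default_factory=dict)
    change_objects: Dict[str, ChangeObject] = field(default_factory=dict)
    metadata_objects: Dict[str, MetadataObject] = field(default_factory=dict)
```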


As described in regard to FIGS. 1A and 1B, input/output (I/O) requests sent to a LUN or LU (e.g., 112) on a production site (e.g., 102) may be intercepted by a splitter (e.g., 106). The splitter may send a copy of the I/O to a DPA (e.g., DPA 108). The DPA may accumulate multiple I/O into an object (e.g., disk objects 202). A change object (e.g., change objects 204) may refer to an object with accumulated I/O where each I/O may change data in the disk objects. The DPA may accumulate I/O until a certain size is reached, and may then send disk object(s) and change objects representing the accumulated I/O to a cloud replication site (e.g., 122) that may include an object store (e.g., 200). In some embodiments, DPA 108 may track metadata about changes corresponding to accumulated I/O in an object as metadata objects 206. DPA 108 may send metadata objects to a cloud or an object store when the metadata object reaches a certain size. In some embodiments, DPA 108 may package the disk objects, change objects and metadata objects into an object to send to a cloud replication site.
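
A minimal sketch of the accumulate-then-ship behavior described above, assuming a hypothetical `upload` callable for the cloud object store and an arbitrary flush threshold (the patent does not specify either):

```python
class DpaAccumulator:
    """Accumulates intercepted writes and ships them as a change object
    (plus a metadata object) once a size threshold is reached."""

    def __init__(self, upload, flush_bytes=4 * 1024 * 1024):
        self.upload = upload            # callable that sends an object to the cloud
        self.flush_bytes = flush_bytes  # illustrative threshold, not from the patent
        self.pending_writes = []
        self.pending_size = 0

    def on_io(self, lun_id, offset, data):
        self.pending_writes.append((lun_id, offset, data))
        self.pending_size += len(data)
        if self.pending_size >= self.flush_bytes:
            self.flush()

    def flush(self):
        if not self.pending_writes:
            return
        change_object = {"writes": self.pending_writes}
        metadata_object = {
            "luns": sorted({lun for lun, _, _ in self.pending_writes}),
            "count": len(self.pending_writes),
        }
        # Ship both objects to the replication-site object store.
        self.upload("change", change_object)
        self.upload("metadata", metadata_object)
        self.pending_writes, self.pending_size = [], 0
```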


In described embodiments, as I/O occurs to the production site LUN, object store 200 may receive a set of change objects corresponding to the changes written to the LUN. In these embodiments, the object store may receive a set of metadata objects describing the changes to the LUN in the objects. Thus, the set of change objects and the set of metadata objects may be used as a journal. In such embodiments, the metadata objects and one or more portions of the change objects may be used to create new disk objects to move the copy of the LUN to a different point in time. For example, by keeping the original sets of metadata objects and data objects, it may be possible to access the original LUN as well as any point in time (PIT). By reading the metadata objects describing the set of change objects, multiple PITs may be created on the cloud replication site. In some embodiments, objects and metadata may be maintained to provide a protection window of storage system 100. For example, a protection window may correspond to a time period during which changes to a LUN are tracked. Objects and metadata objects that correspond to a PIT outside of a protection window may be deleted.
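
Pruning objects that fall outside the protection window can be sketched as a simple filter over metadata timestamps. The dictionary layouts below are illustrative assumptions, not the patent's object format.

```python
import time

def prune_outside_protection_window(metadata_objects, change_objects,
                                    window_seconds, now=None):
    """Remove metadata objects (and the change objects they reference) whose
    point in time falls outside the protection window.

    metadata_objects: key -> {"timestamp": float, "change_object_keys": [...]}
    change_objects:   key -> change-object payload
    Both layouts are assumptions for this sketch.
    """
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    expired = [k for k, m in metadata_objects.items() if m["timestamp"] < cutoff]
    for key in expired:
        meta = metadata_objects.pop(key)
        for change_key in meta.get("change_object_keys", []):
            change_objects.pop(change_key, None)
    return expired
```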


For example, referring to FIG. 2B, the relationship between the data objects of object store 200′ and the LUNs of storage 110″ is shown. As shown in FIG. 2B, at a first time, t1, image 1 of storage 110″ may be generated. Image 1 includes a plurality of disk objects, shown as disk objects 1-N, each disk object corresponding to an associated LUN of storage 110″, shown as LUNs 1-N. One or more metadata objects 206′ may include information about each of the disk objects 1-N of image 1. As changes are made to the data stored in LUNs 1-N of storage 110″ (e.g., in response to write requests), one or more change objects 204′ may be stored in object store 200′. The change objects may include data or pointers to data that was modified from the data stored in disk objects 1-N of image 1. One or more metadata objects 206′ may also include information about associations between change objects 204′ and the disk objects 1-N of image 1 (e.g., if a given change object represents a change to a given disk object). As used in FIG. 2B, N represents a given number of objects, and may be an integer greater than or equal to 1.


Referring to FIG. 2C, an illustrative block diagram is shown of cloud replication site 122″ having one or more recovery helper virtual machines (VMs), shown as recovery helper VMs 1-N. As described, one or more VMs may be generated to concurrently generate and test a plurality of PITs from the data stored in object store 200″. As used herein, concurrently may refer to two or more things happening at the same time for some period of time, e.g., at least some overlap in time.


Although shown in FIG. 2B as only including a single image (image 1), in general, object store 200′ may include a plurality of images, with each image including full disk data (e.g., disk objects) of storage 110″. For example, as shown in FIG. 2D, the images may be taken at specific time intervals, and between the image time intervals, one or more change objects 204′ may allow recovery to various points in time between the image time intervals. For example, image 1 may be generated at time t0, and include full data (e.g., disk objects) of storage 110″. Each change object 204′ may correspond to changes that occurred at a specific time, shown as change objects 1-M, occurring at times t1-tM. By modifying an earlier image (e.g., image 1) with changes indicated in the one or more change objects, recovery can be made to a point in time between image intervals, shown as PIT for time tM. Metadata objects 206″ may indicate the relationship between disk objects 1-N and change objects 1-M, which allow a new full disk image (e.g., new disk objects 1-N) to be generated to a given PIT. The PIT may then be copied to an associated volume, shown as volume 1, to be operated upon by an associated virtual machine, shown as recovery helper VM 1.


In some embodiments, one or more virtual machines may be used in the cloud or in the object store to process disk objects, change objects, and metadata objects describing the change objects to create a new PIT for a LUN. For example, the virtual machines may create new metadata objects to describe the LUN at a future point in time, where the new metadata objects may reference some of the original disk objects corresponding to the LUN and new objects that replace one or more of the original objects corresponding to the LUN.


In some embodiments, the virtual machines may be created (e.g., brought up) and run periodically to create new PITs for an object store or cloud containing changes to a copy of a LUN. If the virtual machines operate only periodically, compute power in the cloud (e.g., at the replication site) may not be needed other than when the virtual machines are running. The virtual machines may process a set of change objects and a set of metadata objects to create a journal to enable a PIT to be rolled forward or backward in time. In some embodiments, the journal may be represented by a set of objects.


As described herein, a set of change objects that contain change data, and a set of metadata objects describing the change data may enable recovering or recreating the production site LUN from the replica data if a failure occurs on the production site.


Described embodiments may perform cloud recovery to multiple points in time (PITs). As described herein, storage system 100 may generate multiple snapshots of the replica copy of the LUN, each snapshot corresponding to a different PIT. Each snapshot may include one or more objects (e.g., as shown in FIG. 2A) associated with LUNs of the snapshot. The objects may be used by replication site 122 to create a journal. By applying a portion of the journal (e.g., change objects) to disk data objects, multiple PITs may be recovered.


Storage system 100 may perform a recovery operation by initiating one or more virtual machines in the cloud (e.g., at the replication site) in parallel to apply journal data (e.g., change objects) to the volume data (e.g., disk objects). Thus, described embodiments may independently and simultaneously recover multiple PITs, allowing a PIT to be quickly identified and selected for recovery (for example, the PIT with the most recent data before the production site disaster). When the PIT is selected, any redundant snapshots may be deleted to free additional storage space (e.g., in storage 130 of replication site 122).


Referring to FIG. 3, a flow diagram of illustrative data recovery process 300 is shown. At block 302, data recovery process 300 begins. For example, data recovery process 300 may be started manually by a user of data protection system 100. Alternatively, data protection system 100 may automatically perform data recovery process 300, for example when production site 102 has experienced a data loss or disaster, or to periodically identify image data for a given PIT (e.g., identify a PIT). At block 304, one or more recovery helper virtual machines (VMs) are generated, for example at replication site 122. As described herein, replication site 122 may be a cloud site. At block 306, each of the VMs generated at block 304 rebuilds a corresponding PIT from data objects stored at replication site 122 (such as in object store 200 of FIG. 2A). Block 306 is described in greater detail in regard to FIG. 4.


At block 308, a PIT is identified from the PITs rebuilt at block 306. Block 308 is described in greater detail in regard to FIG. 5. At block 310, the identified PIT is retained at replication site 122. At block 312, data protection system 100 may optionally use the retained PIT to recover the storage system to the PIT. For example, the retained PIT may be employed to allow recovery of production site 102 to the PIT, or to allow a new cloud site to be generated based upon the PIT. At block 314, data protection system 100 may optionally remove or delete objects for non-identified PITs. For example, as shown in FIG. 2A, one or more objects from object store 200, such as disk objects 202, change objects 204, and/or metadata objects 206, may be deleted if the objects are not needed for the identified/selected PIT, thus reducing the amount of data required to be stored at replication site 122. At block 316, the VMs generated at block 304 may be shut down, for example to reduce processor and memory consumption at replication site 122. At block 318, process 300 completes.
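
The overall flow of process 300 (generate helper VMs, rebuild PITs in parallel, identify one, clean up) can be summarized in a short orchestration sketch. Every callable used here (`spawn_recovery_vm`, `rebuild_pit`, `test_pit`, `delete_objects_for`, `shutdown_vm`) is a hypothetical stand-in for the operations described above, not an API from the patent.

```python
from concurrent.futures import ThreadPoolExecutor

def recover(candidate_times, spawn_recovery_vm, rebuild_pit, test_pit,
            delete_objects_for, shutdown_vm):
    """Sketch of data recovery process 300 (FIG. 3)."""
    # Block 304: one recovery helper VM per candidate point in time.
    vms = {t: spawn_recovery_vm(t) for t in candidate_times}

    # Block 306: each VM rebuilds its PIT; rebuilds run concurrently.
    with ThreadPoolExecutor(max_workers=max(len(vms), 1)) as pool:
        pits = dict(zip(vms, pool.map(lambda t: rebuild_pit(vms[t], t), vms)))

    # Block 308: test the PITs and identify the most recent one that passes
    # (candidate_times are assumed to be comparable timestamps).
    passing = [t for t, pit in pits.items() if test_pit(pit)]
    identified = max(passing) if passing else None

    # Blocks 310-314: retain the identified PIT, drop objects for the others.
    for t in pits:
        if t != identified:
            delete_objects_for(pits[t])

    # Block 316: shut the helper VMs down to free compute at the replication site.
    for vm in vms.values():
        shutdown_vm(vm)
    return identified
```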


Referring to FIG. 4, an illustrative process for block 306 of FIG. 3 is shown as PIT rebuilding process 306′. PIT rebuilding process 306′ may be performed multiple times concurrently to rebuild multiple PITs. In some embodiments, each VM generated at block 304 may perform process 306′ to rebuild an associated PIT. At block 402, process 306′ starts. At block 404, the associated VM identifies one or more disk objects (e.g., disk objects 202 of FIG. 2A) that are associated with the PIT. For example, the VM may identify associated disk objects based upon a LUN associated with the PIT. At block 406, the associated VM identifies one or more metadata objects (e.g., metadata objects 206 of FIG. 2A) that are associated with the PIT. For example, the VM may identify associated metadata objects based upon the LUN associated with the PIT and the desired time associated with the PIT. At block 408, the associated VM identifies one or more change objects that are associated with the PIT. For example, the VM may identify associated change objects based upon the metadata objects identified at block 406 since the identified metadata objects may identify or otherwise reference any change objects associated with the PIT. In some embodiments, such as described in regard to FIG. 2D, change objects may be identified that include changes between a most recent full image of the LUN and a time associated with the PIT.


At block 410, the VM generates one or more new disk objects and, at block 412, applies data changes to generate data for the PIT, which is stored in the new disk objects. In other words, at block 412, the VM modifies data of the disk objects identified at block 404 with data from the change objects identified at block 408, and stores the modified data in the new disk objects generated at block 410. At block 414, the new disk objects are copied to a volume that is associated with the VM. At block 416, process 306′ completes.
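
A compact sketch of rebuilding process 306′: find the metadata, disk, and change objects for the PIT, apply the changes to fresh copies of the disk data, and hand the result back for copying to the VM's volume. The object layouts assumed in the docstring are illustrative, not the patent's format.

```python
def rebuild_pit(object_store, lun_id, pit_time):
    """Sketch of PIT rebuilding process 306' (FIG. 4) for one LUN.

    `object_store` is assumed to expose three dicts:
      disk_objects:     key -> bytes/bytearray of base image data
      metadata_objects: key -> {"lun_id", "timestamp", "disk_keys", "change_keys"}
      change_objects:   key -> list of (disk_key, offset, data) writes
    """
    # Blocks 404-408: metadata for this LUN up to the desired time identifies
    # the disk objects and change objects involved in the PIT.
    metas = [m for m in object_store.metadata_objects.values()
             if m["lun_id"] == lun_id and m["timestamp"] <= pit_time]
    disk_keys = {k for m in metas for k in m["disk_keys"]}
    change_keys = [k for m in sorted(metas, key=lambda m: m["timestamp"])
                   for k in m["change_keys"]]

    # Block 410: new disk objects start as copies of the identified base image.
    new_disks = {k: bytearray(object_store.disk_objects[k]) for k in disk_keys}

    # Block 412: apply the accumulated writes, oldest first, to the new objects.
    for key in change_keys:
        for disk_key, offset, data in object_store.change_objects[key]:
            new_disks[disk_key][offset:offset + len(data)] = data

    # Block 414: the caller copies `new_disks` to the volume attached to the VM.
    return new_disks
```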


Referring to FIG. 5, an illustrative process for block 308 of FIG. 3 is shown as PIT identifying process 308′. PIT identifying process 308′ may be performed multiple times concurrently to quickly test a plurality of PITs and identify at least one PIT. In some embodiments, each VM generated at block 304 may perform process 308′ to test an associated PIT. At block 502, process 308′ starts. At block 504, each generated PIT (e.g., each PIT generated at block 306 of FIG. 3) is tested. In some embodiments, one or more various tests may be performed on each PIT. In some embodiments, the type of test may be based upon a type of disaster that occurred at production site 102. For example, if production site 102 experienced a virus, a virus scan may be performed on each PIT to determine a most recent PIT that does not have the virus. As a further example, if production site 102 experienced a data corruption (e.g., one or more LUNs or other data structures or tables becoming invalid or otherwise corrupt), a test may be performed on each PIT to determine a most recent PIT that has valid, uncorrupted data. Other tests may also be possible, such as searching for the availability of specific files, application-specific tests, database tests, or other tests. In some embodiments, a different PIT may be identified based upon characteristics of the PIT. For example, it may be desired to provide a given PIT to a given user of data protection system 100, while other users may receive other PITs. As another example, a given PIT may be selected as the PIT for a given type of use of the PIT. For example, a first PIT may be selected as the PIT for operating a cloud VM, while another PIT may be selected as the PIT for rolling back production site 102, while yet another PIT may be selected as the PIT for testing purposes.


At block 506, if the tests performed at block 504 are acceptable (e.g., the tests have passed), and at block 508 the PIT is the most recent PIT (e.g., the most recent PIT in relation to a desired time of interest, such as the time of a disaster or data loss at production site 102), then at block 512, the PIT is selected. If, at block 506, a test did not pass, or at block 508, a PIT is not the most recent PIT before the time of interest, then at block 510, the PIT is not identified as the preferred PIT. Once a PIT is identified at block 512, process 308′ completes at block 514.
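
Process 308′ reduces to running a disaster-appropriate test against each rebuilt PIT and keeping the most recent PIT that passes before the time of interest. The helper functions below are placeholders invented for this sketch; they are not part of the patent.

```python
def identify_pit(pits, test, time_of_interest):
    """Sketch of PIT identifying process 308' (FIG. 5).

    `pits` maps a PIT timestamp to the rebuilt data for that point in time
    (e.g., a dict of disk-object byte strings); `test` is a callable such as
    a virus scan or consistency check that returns True when the PIT passes.
    """
    candidates = []
    for pit_time, pit_data in pits.items():
        if pit_time > time_of_interest:   # ignore PITs after the disaster
            continue
        if test(pit_data):                # block 506: did the test pass?
            candidates.append(pit_time)
    # Blocks 508/512: the most recent passing PIT before the time of interest.
    return max(candidates) if candidates else None

# Example placeholder tests (assumptions, for illustration only):
def no_known_virus(pit_data):
    signature = b"EICAR"                  # stand-in for a real scanner
    return all(signature not in blob for blob in pit_data.values())

def data_is_consistent(pit_data):
    return all(len(blob) > 0 for blob in pit_data.values())
```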


Thus, as shown in FIGS. 2A-D, 3, 4 and 5, a given point in time can be generated based upon data objects. Given embodiments may generate a new VM with an associated empty volume in the cloud (e.g., at cloud replication site 122′). Data may be copied from one or more objects (e.g., one or more disk objects and zero or more change objects) into the volume associated with the VM. The VM is run and various tests may be performed on the VM and/or the data copied into the volume, for example to identify a PIT. Thus, as described herein, some embodiments may test multiple PITs in parallel at a cloud site to identify one or more PITs, for example depending on at least one of a type of use and an identified user of the PIT.


In some described embodiments, production site 102 and/or replication site 122 of FIGS. 1A and 1B may each correspond to one computer, a plurality of computers, or a network of distributed computers. For example, in some embodiments, production site 102 and/or replication site 122 may be implemented as one or more computers such as shown in FIG. 6. As shown in FIG. 6, computer 600 may include processor 602, volatile memory 604 (e.g., RAM), non-volatile memory 606 (e.g., one or more hard disk drives (HDDs), one or more solid state drives (SSDs) such as a flash drive, one or more hybrid magnetic and solid state drives, and/or one or more virtual storage volumes, such as a cloud storage, or a combination of physical storage volumes and virtual storage volumes), graphical user interface (GUI) 608 (e.g., a touchscreen, a display, and so forth) and input/output (I/O) device 620 (e.g., a mouse, a keyboard, etc.). Non-volatile memory 606 stores computer instructions 612, an operating system 616 and data 618 such that, for example, the computer instructions 612 are executed by the processor 602 out of volatile memory 604 to perform at least a portion of the processes described herein. Program code may be applied to data entered using an input device of GUI 608 or received from I/O device 620.


The processes described herein are not limited to use with the hardware and software of FIG. 6 and may find applicability in any computing or processing environment and with any type of machine or set of machines that may be capable of running a computer program. The processes described herein may be implemented in hardware, software, or a combination of the two.


The processes described herein are not limited to the specific embodiments described. For example, the processes are not limited to the specific processing order shown and described herein. Rather, any of the blocks of the processes may be re-ordered, combined or removed, performed in parallel or in serial, as necessary, to achieve the results set forth herein.


Processor 602 may be implemented by one or more programmable processors executing one or more computer programs to perform the functions of the system. As used herein, the term “processor” describes an electronic circuit that performs a function, an operation, or a sequence of operations. The function, operation, or sequence of operations may be hard coded into the electronic circuit or soft coded by way of instructions held in a memory device. A “processor” may perform the function, operation, or sequence of operations using digital values or using analog signals. In some embodiments, the “processor” can be embodied in one or more application specific integrated circuits (ASICs). In some embodiments, the “processor” may be embodied in one or more microprocessors with associated program memory. In some embodiments, the “processor” may be embodied in one or more discrete electronic circuits. The “processor” may be analog, digital or mixed-signal. In some embodiments, the “processor” may be one or more physical processors or one or more “virtual” (e.g., remotely located or “cloud”) processors.


Various functions of circuit elements may also be implemented as processing blocks in a software program. Such software may be employed in, for example, one or more digital signal processors, microcontrollers, or general purpose computers. Described embodiments may be implemented in hardware, a combination of hardware and software, software, or software in execution by one or more physical or virtual processors.


Some embodiments may be implemented in the form of methods and apparatuses for practicing those methods. Described embodiments may also be implemented in the form of program code, for example, stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium or carrier, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation. A non-transitory machine-readable medium may include but is not limited to tangible media, such as magnetic recording media including hard drives, floppy diskettes, and magnetic tape media, optical recording media including compact discs (CDs) and digital versatile discs (DVDs), solid state memory such as flash memory, hybrid magnetic and solid state memory, non-volatile memory, volatile memory, and so forth, but does not include a transitory signal per se. When embodied in a non-transitory machine-readable medium and the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the method.


When implemented on one or more processing devices, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits. Such processing devices may include, for example, a general-purpose microprocessor, a digital signal processor (DSP), a reduced instruction set computer (RISC), a complex instruction set computer (CISC), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic array (PLA), a microcontroller, an embedded controller, a multi-core processor, and/or others, including combinations of one or more of the above. Described embodiments may also be implemented in the form of a bitstream or other sequence of signal values electrically or optically transmitted through a medium, stored magnetic-field variations in a magnetic recording medium, etc., generated using a method and/or an apparatus as recited in the claims.


Various elements, which are described in the context of a single embodiment, may also be provided separately or in any suitable subcombination. It will be further understood that various changes in the details, materials, and arrangements of the parts that have been described and illustrated herein may be made by those skilled in the art without departing from the scope of the following claims.

Claims
  • 1. A storage system comprising: a replication site configured to communicate with a production site; an object store located at the replication site, the object store storing data objects associated with data stored in storage of the production site; the replication site configured to: generate a plurality of points in time (PITs) from the data objects; and generate a plurality of virtual machines; copy each of the generated plurality of PITs to a volume associated with one of the virtual machines; test the plurality of PITs, wherein at least some of the plurality of PITs are tested concurrently; and select a PIT from the plurality of PITs based on an outcome of the test.
  • 2. The system of claim 1, wherein the data objects comprise (i) a set of disk objects associated with data stored in a copy of a logical unit or virtual disk at a point in time of the production site, (ii) a set of change objects associated with one or more input/output (I/O) operations on the production site, and (iii) a set of metadata objects associated with the set of change objects.
  • 3. The system of claim 2, wherein the replication site is configured to generate the plurality of PITs by identifying, based at least in part upon the set of metadata objects: (i) one or more disk objects associated with the PIT, and (ii) one or more change objects associated with the PIT.
  • 4. The system of claim 3, wherein, to generate each PIT, the replication site is configured to: generate one or more new disk objects; generate new data by applying the identified one or more change objects to the identified one or more disk objects; and store the new data to the one or more new disk objects.
  • 5. The system of claim 1, wherein to test each PIT, the replication site is configured to identify a most recent valid PIT in relation to a time of interest.
  • 6. The system of claim 1, wherein to test each PIT, the replication site is configured to, at least one of, validate data of the PIT and scan data of the PIT for malicious software code.
  • 7. The system of claim 1, wherein the replication site is configured to provide the identified PIT to at least one of the production site and a cloud-based virtual machine.
  • 8. The system of claim 1, wherein the replication site is further configured to delete data objects associated with non-identified ones of the plurality of PITs.
  • 9. The system of claim 1, wherein the replication site is configured to identify two or more PITs from the plurality of PITs, and provide each identified PIT to an associated identified user of the storage system.
  • 10. The system of claim 1, wherein the PIT is selected concurrently with generating the plurality of PITs.
  • 11. A method for operating a storage system comprising: generating an object store located at a cloud replication site, the object store storing data objects associated with data stored in storage of a production site; generating a plurality of points in time (PITs) from the data objects; generating a plurality of virtual machines; copying each of the generated plurality of PITs to a volume associated with one of the virtual machines; testing the plurality of PITs, at least some of the plurality of PITs being tested concurrently; and identifying a PIT from the plurality of PITs based on an outcome of the testing.
  • 12. The method of claim 11, wherein the data objects comprise (i) a set of disk objects associated with data stored in a copy of a logical unit or virtual disk at a point in time of the production site, (ii) a set of change objects associated with one or more input/output (I/O) operations on the production site, and (iii) a set of metadata objects associated with the set of change objects, the method further comprising generating the plurality of PITs by identifying, based at least in part upon the set of metadata objects: (i) one or more disk objects associated with the PIT, and (ii) one or more change objects associated with the PIT.
  • 13. The method of claim 12, wherein generating each PIT comprises: generating one or more new disk objects; generating new data by applying the identified one or more change objects to the identified one or more disk objects; and storing the new data to the one or more new disk objects.
  • 14. The method of claim 11, wherein testing any of the PITs comprises at least one of: validating data of the PIT, scanning data of the PIT for malicious software code, and identifying a most recent valid PIT in relation to a time of interest.
  • 15. The method of claim 11, further comprising: providing the identified PIT to at least one of: the production site and a cloud-based virtual machine; and deleting data objects associated with non-identified ones of the plurality of PITs.
  • 16. The method of claim 11, further comprising identifying two or more PITs from the plurality of PITs, and providing each identified PIT to an associated identified user of the storage system.
  • 17. A computer program product including a non-transitory computer readable storage medium having computer program code encoded thereon that when executed on a processor of a computer causes the computer to operate a storage system, the computer program product comprising: computer program code for generating an object store located at a cloud replication site, the object store storing data objects associated with data stored in storage of a production site; computer program code for generating a plurality of points in time (PITs) from the data objects; and computer program code for generating a plurality of virtual machines; computer program code for copying each of the generated plurality of PITs to a volume associated with one of the virtual machines; computer program code for testing each PIT, at least some of the PITs being tested concurrently; and computer program code for identifying a PIT from the plurality of PITs based on an outcome of the testing.
  • 18. The computer program product of claim 17, wherein the data objects comprise (i) a set of disk objects associated with data stored in a copy of a logical unit or virtual disk at a point in time of the production site, (ii) a set of change objects associated with one or more input/output (I/O) operations on the production site, and (iii) a set of metadata objects associated with the set of change objects, the computer program product further comprising computer program code for generating the plurality of PITs by identifying, based at least in part upon the set of metadata objects: (i) one or more disk objects associated with the PIT, and (ii) one or more change objects associated with the PIT.
  • 19. The computer program product of claim 18, further comprising: computer program code for generating one or more new disk objects; computer program code for generating new data by applying the identified one or more change objects to the identified one or more disk objects; computer program code for storing the new data to the one or more new disk objects; wherein testing each PIT comprises at least one of: validating data of the PIT, scanning data of the PIT for malicious software code, and identifying a most recent valid PIT in relation to a time of interest.