Computer data storage systems typically need to protect stored data to permit recovery of the data in the event of a disaster, and may employ various data protection approaches for this purpose. One such approach is data backup, where backup copies (discrete static images) of a data storage volume are saved periodically, e.g., weekly, daily, or hourly, to enable recovery of backed up data following a crash. While traditional data backup may permit the data to be recovered to a particular point in time at which a backup copy was made, a disadvantage of traditional backup is that it does not permit recovery of any intermediate changes to the data that were made between the backup copy and the crash or, for that matter, between backup copies, and does not enable recovery and replication of the system to any desired point in time. In some enterprise storage systems, such as those used for transactional processing, banking or military applications, any loss of data can be disastrous, and it is frequently desirable to replicate the state of the filesystem as it existed at any particular point in time. Accordingly, a common practice in the industry is to use indexing on a filesystem to afford a view of the general filesystem hierarchy. In backup systems, filesystem indexing may be performed for periodic snapshot images so that a user may inspect the filesystem at the different snapshots without full recovery of a volume or a virtual machine (VM), which is time consuming and expensive.
Continuous data protection (CDP), also known as continuous backup, is an approach that backs up computer data by automatically saving a copy of every change made to that data at a block level. This typically requires asynchronously copying data changes to a second location, which imposes additional resource requirements and overhead for additional disk write operations, but it avoids the need for scheduled backups. CDP systems create numerous point-in-time images (“PiTs”) and information about data changes, so theoretically CDP allows restoration of data to any incremental point in time at which data changes occurred. However, both traditional backup and continuous backup systems operate at the block level; and neither is designed to provide a list of specific data or filesystem changes that facilitate “any point-in-time” recovery and replication either of lost or corrupted data of interest or of the filesystem.
Dell EMC RecoverPoint technology is a journal-based product offering of the assignee of this invention that provides continuous data protection for storage arrays running on a dedicated RecoverPoint Appliance (RPA). The RecoverPoint technology enables protection of data at local and remote locations, and it provides bi-directional replication and any point-in-time recovery of data. RecoverPoint facilitates restoration of the system, not just particular data.
A disadvantage to a user of an any point-in-time replication system is that such systems create a large number of PiT images (snapshots), and users do not have convenient visibility into the contents of these PiT images because of the lack of an index to identify either the filesystem structure or an appropriate PiT image that contains data of interest at any desired point in time (“any PiT”). To determine whether a PiT image contains a particular data change or filesystem state of interest, the PiT image must be mounted and inspected, which is very time consuming. This makes it difficult to locate a particular desired PiT that includes data of interest. Moreover, traditional filesystem indexing of PiT images is impractical because it takes too long. Creating a filesystem index may take several seconds or several minutes to complete, whereas a PiT image may be created every few input/outputs (I/Os), and there may be hundreds or thousands of I/Os issued every second. Thus, it is impractical to create an index for every filesystem change. Periodic indexing of PiTs, on the other hand, is too coarse and inaccurate to enable either a filesystem structure or a particular data file of interest to be identified and replicated at an arbitrary point in time, and is additionally impractical to implement. Thus, existing indexing systems are not sufficient for any PiT replication and recovery of either a filesystem or a data file at any point in time.
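By way of illustration only, the mismatch between indexing time and PiT creation rate can be worked through numerically. The figures used below (1,000 I/Os per second, one PiT every 5 I/Os, 10 seconds per full index) are assumptions chosen from within the ranges mentioned above, not measurements, and the function name is hypothetical.

```python
def indexing_seconds_per_second(ios_per_sec: float,
                                ios_per_pit: float,
                                index_secs: float) -> float:
    """Seconds of indexing work generated by one second of production I/O,
    if every PiT image were given its own full filesystem index."""
    pits_per_sec = ios_per_sec / ios_per_pit
    return pits_per_sec * index_secs


# Assumed figures: 1,000 I/Os per second, a PiT every 5 I/Os,
# and 10 seconds to build one full index.
backlog = indexing_seconds_per_second(1000, 5, 10)
print(backlog)
```

Under these assumed figures, each second of operation would generate 2,000 seconds of indexing work, so a per-PiT full index could never keep pace with the change rate.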
There is a need to address these and other issues associated with point-in-time replication by providing systems and methods that afford effective and easy visibility into and searching of PiTs created during the operation of a CDP system to enable PiTs containing changes of interest to be quickly located to permit replication of the filesystem structure and data recovery at any point in time. The invention affords a system and method that address these and other issues associated with RecoverPoint CDP systems and the like, and that avoid the foregoing problems.
The invention is particularly well adapted for use with RecoverPoint continuous data protection (CDP) systems of Dell EMC, the assignee of this invention, for any point-in-time recovery, and will be described in that context. However, it will be appreciated from the description that follows that the invention may also be used effectively with other types of replication/recovery systems.
Dell EMC RecoverPoint technology, a journal-based offering of the assignee of this invention, provides continuous data protection for any point-in-time recovery. RecoverPoint CDP employs a RecoverPoint Appliance (RPA) that tracks all changes to data at the input/output (I/O) or block level, and journals these changes as a sequence of consecutive events. In contrast to backup systems, which store only static, periodic and discrete copies of data, with RecoverPoint CDP every I/O event that changes data, such as a data write to a file, is tracked and stored in a journal as a different PiT snapshot of the data drive. This allows restoration of data to any I/O or PiT. If a data block or a data file is corrupted or lost, the journal allows rolling the data back to a previous point in time to view the data state of the data drive as it existed prior to the loss or corruption, and enables recovery and replication of the data locally as well as remotely at a recovery site. RecoverPoint also enables rolling forward from a selected PiT to view subsequent data changes from that selected PiT. While journaling replication systems such as RecoverPoint capture several million points in time each day, they do not afford convenient and quick visibility into the structure of a filesystem or the contents of PiT images, so locating a particular change or a particular data file at any desired point in time, for example, can be challenging and time consuming.
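By way of illustration only, the general idea of rolling a volume backward and forward through a block-level journal may be sketched as follows. This is a minimal sketch and not the RecoverPoint journal format: the class and field names are hypothetical, the volume is assumed to be a fixed-length list of blocks, and each journal entry is assumed to keep both the prior and the new contents of the block it changed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockWrite:
    block: int   # block address on the (assumed fixed-size) volume
    old: bytes   # contents before the write, used to roll back
    new: bytes   # contents after the write, used to roll forward


class BlockJournal:
    """Hypothetical journal in which every write is its own point in time."""

    def __init__(self, volume):
        self.volume = list(volume)
        self.entries = []
        self.position = 0  # number of journal entries currently applied

    def write(self, block, data):
        if self.position != len(self.entries):
            raise RuntimeError("roll forward to the latest PiT before writing")
        self.entries.append(BlockWrite(block, self.volume[block], data))
        self.volume[block] = data
        self.position += 1

    def roll_to(self, pit):
        """Move the volume to the state after `pit` writes (0 = initial state)."""
        if not 0 <= pit <= len(self.entries):
            raise ValueError("no such point in time")
        while self.position > pit:          # roll backward: undo newest first
            entry = self.entries[self.position - 1]
            self.volume[entry.block] = entry.old
            self.position -= 1
        while self.position < pit:          # roll forward: redo oldest first
            entry = self.entries[self.position]
            self.volume[entry.block] = entry.new
            self.position += 1
        return list(self.volume)


journal = BlockJournal([b"a", b"b"])
journal.write(0, b"x")
journal.write(1, b"y")
journal.write(0, b"z")
print(journal.roll_to(1))  # state after the first write only
print(journal.roll_to(3))  # rolled forward again to the latest write
```

The sketch shows why such a journal can reach any point in time, and also why it offers no visibility into filesystem contents: its entries describe blocks, not files or directories.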
This invention is related to and may use some of the same methods and systems disclosed in commonly-owned co-pending application Ser. No. 16/558,606 of the same inventors, filed Sep. 3, 2019, the disclosure of which is incorporated by reference herein.
As will be described below in detail, the invention provides a system and method which capture an event stream of consecutive filesystem changes occurring to a filesystem and corresponding PiT snapshot images of the filesystem data state at the time of occurrence of each change event, and use these filesystem events and snapshots and a previously saved full index of the filesystem to create a full index and recreate the filesystem structure as it existed at a previous point in time. The PiT snapshots corresponding to the filesystem events can be used to recover and replicate desired data as it existed at the time of occurrence of a filesystem event. The filesystem event stream of the invention may include metadata comprising a timestamp and a description of each filesystem event to enable creation of an index that describes the filesystem structure at the time of occurrence, and a PiT snapshot detailing the data state (content) of the filesystem. The system and method of the invention save this information as an event stream of metadata in a journal. The metadata descriptions of filesystem events comprise descriptive text strings which describe the filesystem level changes to the system, and afford comprehensible understanding, insight and cues into associated data and system structural changes. The index and metadata afford convenient visibility into and easy searching of the journal of filesystem events to locate a data change involving the filename of a file or other data of interest. Once located, the structure of the filesystem may be replicated and associated PiT snapshots and metadata may be used to recover and replicate the desired file or data. As will be described, the filesystem event stream represents changes to the structure and content of the filesystem at different points in time, and allows replication of the filesystem to a desired PiT rolling either forward or backward in time from the full index.
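By way of illustration only, recreating a filesystem index at a desired point in time from a saved full index and an event stream may be sketched as follows. This is a simplified sketch under stated assumptions, not the disclosed implementation: the index is modeled as a flat set of paths, only create, remove and move events are handled, a move is assumed to affect a single path rather than a directory subtree, and all names (`FsEvent`, `index_at`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FsEvent:
    ts: float                    # timestamp of the filesystem event
    op: str                      # "create", "remove" or "move"
    path: str
    dest: Optional[str] = None   # destination path, for "move" only


def _apply(index, ev):
    if ev.op == "create":
        index.add(ev.path)
    elif ev.op == "remove":
        index.discard(ev.path)
    elif ev.op == "move":
        index.discard(ev.path)
        index.add(ev.dest)
    else:
        raise ValueError(f"unknown operation: {ev.op}")


def _invert(ev):
    """Return the event that undoes `ev`, for rolling backward in time."""
    if ev.op == "create":
        return FsEvent(ev.ts, "remove", ev.path)
    if ev.op == "remove":
        return FsEvent(ev.ts, "create", ev.path)
    if ev.op == "move":
        return FsEvent(ev.ts, "move", ev.dest, ev.path)
    raise ValueError(f"unknown operation: {ev.op}")


def index_at(full_index, index_ts, events, target_ts):
    """Derive the set of paths at `target_ts` from a full index taken at `index_ts`."""
    index = set(full_index)
    ordered = sorted(events, key=lambda e: e.ts)
    if target_ts >= index_ts:
        for ev in ordered:                      # roll forward
            if index_ts < ev.ts <= target_ts:
                _apply(index, ev)
    else:
        for ev in reversed(ordered):            # roll backward
            if target_ts < ev.ts <= index_ts:
                _apply(index, _invert(ev))
    return index


events = [
    FsEvent(1, "create", "/a.txt"),
    FsEvent(2, "move", "/a.txt", "/b.txt"),
    FsEvent(4, "remove", "/b.txt"),
    FsEvent(5, "create", "/c.txt"),
]
full_index = {"/b.txt"}                      # assumed full index saved at time 3
print(index_at(full_index, 3, events, 5))    # rolled forward
print(index_at(full_index, 3, events, 1))    # rolled backward
```

Because only the events between the full index and the target time are replayed, the cost depends on the number of intervening events rather than on the size of the filesystem.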
As used herein, and as well understood by those skilled in the art, the term filesystem (FS) refers to an organization and data structure that controls how pieces or groups of data are stored and retrieved. A filesystem keeps track of where data is located in a storage device. It refers to the logic and structure used to manage groups of data (objects) such as files or directories. Without a filesystem, information placed in a storage medium would be one large body of data with no way to tell where one piece of information ends and the next begins. The term “file” refers to a group or piece of data, i.e., “data file”, in a filesystem, and is typically accessed by a filename and a path to a directory (or folder) in the filesystem where it is located. There are differences between filesystem events and file or block level events. The term “filesystem event” as understood by those skilled in the art and used herein refers to operations at the filesystem level that change the structure and organization of a filesystem, such as, for example, the following: Create File; Remove File; Move File; Create Directory; Remove Directory; Open File for Write/Modify; and Close File. File or block level operations, on the other hand, are those that act on a file itself, such as, for example, the following: Read File; Write File; Copy File; Delete File; Move File; etc. The term “metadata” as used herein with reference to a file refers to bookkeeping and descriptive information about the file, such as, for instance, the length of the data contained in the file, e.g., the number of blocks or the byte count, a timestamp indicating the date and time the file was created or modified, the file device type, the file's user or group ID, its access permissions, changes to the file, and other file attributes such as whether a file is read-only, an executable, etc.
In relation to a filesystem event, “metadata” refers to descriptive information about a change to the filesystem, such as the object changed, the type of change, the time of the change, etc.
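By way of illustration only, a metadata record for a filesystem event, carrying the object changed, the type of change and the time of the change, may be sketched as follows. The record layout, the field names and the `pit_id` reference to an associated PiT snapshot are assumptions made for this sketch; the event types are those listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class ChangeType(Enum):
    CREATE_FILE = "Create File"
    REMOVE_FILE = "Remove File"
    MOVE_FILE = "Move File"
    CREATE_DIRECTORY = "Create Directory"
    REMOVE_DIRECTORY = "Remove Directory"
    OPEN_FILE_FOR_WRITE = "Open File for Write/Modify"
    CLOSE_FILE = "Close File"


@dataclass(frozen=True)
class FsEventMetadata:
    timestamp: datetime      # time of the change
    change: ChangeType       # type of change
    obj: str                 # object changed (path of the file or directory)
    pit_id: int              # hypothetical identifier of the associated PiT snapshot

    def describe(self) -> str:
        """Render the event as a human-readable, searchable text string."""
        return (f"{self.timestamp.isoformat()} {self.change.value}: "
                f"{self.obj} (PiT {self.pit_id})")


event = FsEventMetadata(
    timestamp=datetime(2019, 9, 3, 12, 0, tzinfo=timezone.utc),
    change=ChangeType.REMOVE_FILE,
    obj="/home/user/report.txt",
    pit_id=42,
)
print(event.describe())
```

Keeping the description as plain text alongside the structured fields allows a journal of such records to be searched by filename or by type of change without mounting any PiT image.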
Referring to
The standard local production site 10 may comprise a production processor 14 such as a virtual machine (VM), as from VMware, for example, having an associated filesystem (FS) and operating system (OS) 16. The production site processor 14 may also have associated physical media (not shown) storing computer executable instructions for controlling the processor to perform operations as described herein. The production site processor may further have an associated I/O data splitter 18 that is adapted to split off block I/O changes being made to data in a storage device 19, and to provide the changes to a local cluster of one or more RecoverPoint Appliances (RPAs) 20.
Each RPA of the local cluster may comprise a special purpose appliance that includes a processor and associated memory storing executable instructions for controlling the processor and that manages virtual machines and virtual volumes (not shown). The RPA 20 of the local site 10 may be connected, as via a fibre channel (FC) or Ethernet (EN) TCP/IP connection 22, to a remote cluster of RPAs 24 at the recovery site 12. The RPAs 20 and 24 may be substantially the same, and they may manage similar SANs. The RPAs 24 at the recovery site may also be connected to a journal 26 into which information is stored about I/O block level changes and, as will be described, ongoing filesystem changes at the local site 10, which information is transferred by RPA 20 over network 22 to RPA 24 for storage in journal 26. This information about changes may comprise timestamps, other metadata and PiT images. The journal is a source of information about all changes to the data and the filesystem from a predetermined point in time that, as will be described, enables recovery, reconstruction and replication of the filesystem structure and data state (content) at any desired point in time.
As may be appreciated, the recovery site 12 may be geographically remote from the local production site 10 or, alternatively, may be co-located with the local production site in the same data center, for instance. Moreover, the recovery site may be adapted to receive information streams from multiple production sites, and to recover and replicate filesystems, files or data of interest in different locations.
The system of
As shown in
Referring to
The initial full index of
Referring to
Continuing with
The invention enables the journal to be quickly and easily searched to identify and select an appropriate PiT for recovering and replicating a lost or corrupted object or a data state at a particular time. The invention may also be used advantageously to afford a history of the filesystem changes, as for analysis or diagnosis of problems, in addition to aiding in the discovery of an appropriate PiT image for locating an object of interest. A PiT image taken just before a file was deleted or modified, or just after the file was closed, for example, is a good choice for restoring the file. Similarly, a PiT before a directory was created, removed or renamed may be selected as appropriate to restore and replicate data or a previous data state of the directory. The system and method of the invention make it possible to replicate and restore a prior data state of a filesystem easily at a desired time.
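By way of illustration only, searching a journal of filesystem events for a suitable PiT may be sketched as follows. The sketch assumes that each journal entry carries a timestamp, an event type, a path and the identifier of the PiT snapshot captured at the time of the event; the names `Bookmark`, `pit_before_last` and `pit_at_last` are hypothetical and are not taken from any product interface.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Bookmark:
    ts: float      # time of the filesystem event
    op: str        # e.g. "Create File", "Close File", "Remove File"
    path: str      # object the event refers to
    pit_id: int    # assumed id of the PiT captured at the time of the event


def _last_match(ordered, path, op) -> Optional[int]:
    for i in range(len(ordered) - 1, -1, -1):
        if ordered[i].path == path and ordered[i].op == op:
            return i
    return None


def pit_before_last(journal: Sequence[Bookmark], path: str, op: str) -> Optional[int]:
    """PiT of the journal entry immediately preceding the latest matching event."""
    ordered = sorted(journal, key=lambda b: b.ts)
    i = _last_match(ordered, path, op)
    if i is None or i == 0:
        return None
    return ordered[i - 1].pit_id


def pit_at_last(journal: Sequence[Bookmark], path: str, op: str) -> Optional[int]:
    """PiT captured at the latest matching event, e.g. just after a file was closed."""
    ordered = sorted(journal, key=lambda b: b.ts)
    i = _last_match(ordered, path, op)
    return None if i is None else ordered[i].pit_id


journal = [
    Bookmark(1, "Create File", "/a", 101),
    Bookmark(2, "Close File", "/a", 102),
    Bookmark(3, "Create File", "/b", 103),
    Bookmark(4, "Remove File", "/a", 104),
]
print(pit_before_last(journal, "/a", "Remove File"))  # PiT just before the deletion
print(pit_at_last(journal, "/a", "Close File"))       # PiT just after the last close
```

In this sketch the search touches only the event metadata, so candidate PiTs can be identified without mounting any PiT image; only the selected PiT then needs to be mounted to restore the file.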
As will be appreciated from the foregoing, by providing an FS event splitter to detect FS events and creating a sequence of bookmarks and metadata that describe the events, the invention affords insight and visibility into the contents of PiTs that enable desired data to be quickly and easily located, retrieved and replicated, thereby greatly enhancing the usability of an any-PiT continuous data protection system.
While the foregoing has been with reference to particular embodiments of the invention, it may be appreciated that these are merely representative and that changes to these embodiments may be made without departing from the principles of the invention, the scope of which is defined by the appended claims.