Several of the disclosed embodiments relate to data storage, and more particularly, to backing up and restoring data to and from a cloud data storage system that stores data in a format different from that of a primary storage system.
A storage server operates on behalf of one or more clients to store and manage shared files. A client can request the storage server to back up data stored in a primary data storage system (“storage system”) of the data storage server (“storage server”) to one or more secondary storage systems. Many storage systems include applications that provide tools for administrators to schedule and create database backups, and to restore data from these backups in the event of data loss. Some traditional storage systems use secondary storage systems that typically use the same storage mechanism (e.g., a file system) as the primary storage system. However, such storage mechanisms do not provide the flexibility to use other, heterogeneous secondary storage systems, e.g., third-party storage services such as a cloud storage service, because these secondary storage systems often use a storage mechanism different from that of the primary storage system for storing the data.
Some traditional storage systems use heterogeneous secondary storage systems for backing up data. However, current techniques for backing up data to heterogeneous secondary storage systems are inefficient. The current techniques do not provide optimal storage utilization at the secondary storage system, do not support deduplication, or consume significant computing resources, e.g., network bandwidth and processing time, in converting data from one format to another for backing up and restoring data. Accordingly, traditional network storage systems do not allow data to be backed up to and recovered from heterogeneous storage systems efficiently.
Technology is disclosed for backing up data to and restoring data from a storage service that stores data in a format different from that of a primary storage system (“the technology”). Various embodiments of the technology provide methods for mapping the data from a storage format of the primary storage system, e.g., a block-based storage format, to a storage format of a destination storage system, e.g., an object-based storage format, while maintaining storage efficiency. In some embodiments, a replication stream is generated to back up a point-in-time image (“PTI”; sometimes referred to as a “snapshot”) of the primary storage system, e.g., a read-only copy of a file system of the primary storage system. The replication stream can have data of multiple files (e.g., as a data stream), metadata of the files (e.g., as a metadata stream), and a reference map (e.g., as a reference stream) that identifies, e.g., for each of the files, a portion of the data belonging to the file. The replication stream is sent to a cloud data parking parser that backs up the PTI to the destination storage system. The cloud data parking parser identifies the data, metadata and the reference map from the replication stream and generates one or more storage objects in object-based format for each of the data, the metadata and the reference map. The one or more storage objects are then sent to the destination storage system, where they are stored in an object container.
In some embodiments, the primary storage system can be a block-based file storage system that manages data as blocks. An example of such a storage system is the Network File System (NFS) file servers provided by NetApp of Sunnyvale, Calif. In some embodiments, the block-based primary storage system organizes files using inodes. An inode is a data structure that has metadata of the file and locations of the data blocks (also referred to as “data extents”) that store the file data. The inode has an associated inode identifier (ID) that uniquely identifies the file. A data extent also has an associated data extent ID that uniquely identifies the data extent. Each of the data extents in the inode is identified using a file block number (FBN). The files are accessed by referring to the inodes of the files. The files can be stored in a multi-level hierarchy, e.g., in a directory within a directory.
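For illustration only, the following Python sketch models an inode as described above, with file metadata and a mapping of FBNs to data extent IDs; the class and field names are assumptions and do not correspond to any particular storage operating system.

```python
# A minimal sketch (not the disclosed implementation) of a block-based
# file system inode: metadata plus a mapping of file block numbers (FBNs)
# to data extent IDs. All names here are illustrative.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Inode:
    inode_id: int                                              # uniquely identifies the file
    metadata: Dict[str, str] = field(default_factory=dict)     # owner, timestamps, etc.
    fbn_to_extent: Dict[int, int] = field(default_factory=dict)  # FBN -> data extent ID

# Example: a file whose FBN 0 and FBN 1 live in data extents 100 and 101.
file_inode = Inode(inode_id=1, metadata={"owner": "alice"},
                   fbn_to_extent={0: 100, 1: 101})
```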
In some embodiments, the destination storage system can be an object-based storage system, e.g., a cloud storage service. Examples of such cloud storage services include S3 from Amazon of Seattle, Wash., and Microsoft Azure from Microsoft of Redmond, Wash. In some embodiments, the object-based destination storage system can have a flat file system that stores the data objects in a same hierarchy. For example, the data objects are stored in an object container, and the object container may not store another object container in it. All the data objects for a particular object container can be stored in the object container in the same hierarchy.
To back up a PTI from the block-based storage system to the object-based storage system, a replication stream is generated that includes (a) a data stream containing data extents (and their corresponding data extent IDs) representing data of the files at the primary storage system, (b) a reference stream having a reference map that maps the FBNs of the inode of a corresponding file to the data extents having the data of the corresponding file, and (c) a metadata stream that has metadata of the inode of the corresponding file. The replication stream is then sent to the cloud data parking parser, which generates one or more data storage objects that have the data extents, one or more reference map storage objects that have the reference maps, and one or more inode storage objects that have the metadata of the inodes. The data storage objects, reference map storage objects and inode storage objects corresponding to the PTI of the primary storage system are sent to the destination storage system for storing.
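The following sketch shows one possible way the three streams of a replication stream could be mapped to the three kinds of storage objects described above; the stream layout and object classes are illustrative assumptions, not the actual parser.

```python
# Hypothetical mapping of a replication stream's three streams to
# object-based format: data, reference-map, and inode storage objects.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DataStorageObject:
    extents: Dict[int, bytes]          # data extent ID -> raw extent data

@dataclass
class ReferenceMapStorageObject:
    inode_id: int
    fbn_to_extent: Dict[int, int]      # FBN -> data extent ID

@dataclass
class InodeStorageObject:
    inode_id: int
    metadata: Dict[str, str]

def build_storage_objects(data_stream: Dict[int, bytes],
                          reference_stream: Dict[int, Dict[int, int]],
                          metadata_stream: Dict[int, Dict[str, str]]) -> List:
    """Map each stream of the replication stream to storage objects."""
    data_obj = DataStorageObject(extents=dict(data_stream))
    ref_objs = [ReferenceMapStorageObject(i, dict(m)) for i, m in reference_stream.items()]
    inode_objs = [InodeStorageObject(i, dict(md)) for i, md in metadata_stream.items()]
    return [data_obj, *ref_objs, *inode_objs]
```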
Various embodiments of the technology provide methods for recovering data from the cloud storage service to restore the primary storage system. In some embodiments, the primary storage system can be restored to a particular PTI maintained at the destination storage system. The destination storage system can include multiple PTIs of the primary storage system which are generated sequentially over a period of time. A common PTI that is available on both the primary storage system and the destination storage system is identified. The primary storage system is then restored to the common PTI. A difference between the common PTI and the particular PTI is determined. In some embodiments, finding the difference can include identifying a state of the primary storage system, e.g., a set of files and the data of the set of files that correspond to the particular PTI, and identifying changes made to the state starting from the particular PTI up to the common PTI.
One or more replication jobs are generated for obtaining the difference from the destination storage system and applying the difference to the common PTI on the primary storage system to restore it to the particular PTI. The jobs can include a deleting job for deleting from the common PTI the files and/or their corresponding data, e.g., inodes and/or data extents, that were added to the primary storage system after the particular PTI was generated. The jobs can include an inserting job for inserting into the common PTI the files and/or their corresponding data, e.g., inodes and/or data extents, that were deleted at the primary storage system after the particular PTI was generated. The jobs can include an updating job for updating the files, e.g., reference maps of the inodes, that were modified after the particular PTI was generated.
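A minimal sketch of the three job types follows, applied to a toy in-memory view of the file system (inode ID mapped to an FBN-to-extent map); the job classes and their fields are illustrative assumptions only.

```python
# Illustrative delete/insert/update jobs for restoring a common PTI
# to a particular PTI. The data model is a simplification.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DeleteJob:
    inode_ids: List[int]               # inodes added after the particular PTI

@dataclass
class InsertJob:
    inodes: Dict[int, Dict[int, int]]  # inode ID -> FBN->extent map to re-create

@dataclass
class UpdateJob:
    reference_maps: Dict[int, Dict[int, int]]  # inode ID -> corrected FBN->extent map

def apply_jobs(file_system: Dict[int, Dict[int, int]], jobs) -> None:
    """Apply the jobs to a toy in-memory file-system state."""
    for job in jobs:
        if isinstance(job, DeleteJob):
            for inode_id in job.inode_ids:
                file_system.pop(inode_id, None)
        elif isinstance(job, InsertJob):
            file_system.update(job.inodes)
        elif isinstance(job, UpdateJob):
            for inode_id, ref_map in job.reference_maps.items():
                file_system[inode_id] = dict(ref_map)
```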
In some embodiments, the primary storage system 110 can be a block-based storage system which manages data as blocks. An example of a storage server 105 that stores data in such a format is the Network File System (NFS) file servers commercialized by NetApp of Sunnyvale, Calif., that use various storage operating systems, including the NetApp® Data ONTAP™ storage operating system. However, any appropriate storage server can be enhanced for use in accordance with the embodiments of the technology described herein. A file system of the storage server describes the data stored in the primary storage system 110 using inodes. An inode is a data structure that has metadata of the file, and the file data or locations of the data extents that have the file data. The files are accessed by referring to the inodes of the files.
The storage server 105 can include a PTI manager component 145 that can generate a PTI of the file system of the storage server 105. A PTI is a read-only copy of an entire file system at a given instant when the PTI is created. The PTI includes the data stored in the primary storage system 110. In some embodiments, the PTI includes the data extents and metadata of the data, e.g., inodes to which the data extents belong, and metadata of the inodes. A newly created PTI refers to exactly the same data extents as an “active file system” (AFS) does. Therefore, it can be created in a short period of time and does not consume any additional disk space. The AFS is a file system to which data can be both written and read, or, more generally, an active store that responds to both read and write operations. Only as data extents in the active file system are modified and written to new locations on the primary storage system 110 does the PTI begin to consume extra space. In some embodiments, the PTIs can be generated sequentially at regular intervals. Each of the sequential PTIs includes only the changes, e.g., additions, deletions or modifications to the files, from the previous PTI. A base PTI can be a PTI that has a full copy of the data, and not just the changes from the previous PTI, stored at the primary storage system 110. The PTIs can be backed up to the destination storage system 115.
In some embodiments, the destination storage system 115 can be an object-based storage system, e.g., a cloud data storage service (“cloud storage service”). Accordingly, the PTI data generated by the PTI manager 145 has to be converted into storage objects.
A replication module 150 generates a replication stream to replicate the PTI to the destination storage system 115. The replication stream can include the data of multiple files, e.g., as data extents, metadata of the files, e.g., inodes, and a reference map that identifies for each of the files the data extents storing the data of the file. However, contents of the replication stream may not be stored as is in the destination storage system 115 because the contents are in a format that is different from what the destination storage system 115 expects. Accordingly, the contents of the replication stream may have to be converted or translated or mapped to a format, e.g., to storage objects that can be stored at the destination storage system 115. The replication stream is sent to a cloud data manager 155 that parses the content of the replication stream, generates the storage objects corresponding to the content, and backs up the storage objects for the PTI to the destination storage system 115. In some embodiments, the cloud data manager 155 can be implemented in a separate server, e.g., a server different from that of the storage server 105.
In some embodiments, parsing the replication stream includes extracting the data, the metadata of the files, and the reference map from the replication stream. After the extraction, the cloud data manager 155 generates one or more storage objects for the data (referred to as “data storage objects”), one or more storage objects for the metadata (referred to as “inode storage objects”), and one or more storage objects for the reference map (referred to as “reference map storage objects”). The one or more storage objects are then sent to the destination storage system 115.
In some embodiments, the object-based destination storage system 115 can have a flat file system that stores the storage objects in a same hierarchy. For example, all the storage objects of a particular PTI “SSi,” e.g., data storage objects 130, inode storage objects 135, and reference-map storage objects 140, are stored in an object container 125 in the same hierarchy. The object container 125 may not include another object container within it. Further, the PTIs can be organized in the destination storage system in various ways. For example, every PTI can be stored in a corresponding object container. In another example, there can be one object container per volume of the primary storage system 110 for which the PTI is generated. All the PTIs generated for a particular volume may be stored in the object container corresponding to the particular volume.
Referring back to the cloud data manager 155, the cloud data manager 155 can be implemented within the storage server 105 or in one or more separate servers. The destination storage system 115 provides various application programming interfaces (APIs) for generating the storage objects in a format specific to the destination storage system 115, and for transmitting the storage objects to the destination storage system 115. The cloud data manager 155 generates the storage objects and transmits them to the destination storage system 115 using the provided APIs.
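As a hedged example, a cloud data parking adapter could transmit a storage object with a plain HTTP PUT, as sketched below; the endpoint URL, container name, object naming, and authentication scheme are placeholders, and a real deployment would use the destination storage system's own APIs or SDK.

```python
# A hypothetical sketch of transmitting one storage object over HTTP.
import requests  # assumes the third-party 'requests' package is available

def put_storage_object(endpoint: str, container: str, object_name: str,
                       payload: bytes, token: str) -> None:
    """PUT a single storage object into an object container."""
    url = f"{endpoint}/{container}/{object_name}"
    resp = requests.put(url, data=payload,
                        headers={"Authorization": f"Bearer {token}",
                                 "Content-Type": "application/octet-stream"})
    resp.raise_for_status()   # surface transmission failures to the caller

# Example (hypothetical endpoint and container):
# put_storage_object("https://objects.example.com", "pti-ss1",
#                    "inode-410-refmap", b'{"0": 100, "1": 101}', token="...")
```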
The storage server 205 can be a block-based storage server, e.g., NFS file servers provided by NetApp of Sunnyvale, Calif., that uses various storage operating systems, including the NetApp® Data ONTAP™ storage operating system. The storage server 205 receives data from a client 275 and stores the data, e.g., as blocks, in the primary storage system 210. The storage server 205 is coupled to the primary storage system 210 and to the client 275 through a network. The network may be, for example, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a wireless network, a global area network (GAN) such as the Internet, a Fibre Channel fabric, or the like, or a combination of any such types of networks. The client 275 can be, for example, a conventional personal computer (PC), server-class computer, workstation, or the like.
The primary storage system 210 can be, for example, conventional magnetic disks, optical disks such as CD-ROM or DVD-based storage, magneto-optical (MO) storage, or any other type of non-volatile storage devices suitable for storing large quantities of data. The storage devices can further be organized as a Redundant Array of Inexpensive Disks/Devices (RAID), whereby the storage server 205 accesses the primary storage system 210 using RAID protocols.
It will be appreciated that some embodiments may be implemented with solid-state memories, including flash storage devices constituting a storage array (e.g., disks). For example, a storage server (e.g., storage server 205) may be operative with non-volatile, solid-state NAND flash devices, which are block-oriented devices having good (random) read performance, i.e., read operations to flash devices are substantially faster than write operations. Data stored on a flash device is accessed (e.g., via read and write operations) in units of pages, which in the present embodiment are 4 KB in size, although other page sizes (e.g., 2 KB) may also be used.
The storage server 205 includes a file system layout that writes the data into the primary storage system 210 as blocks. An example of such a file system layout is a write anywhere file system (“WAF”) layout. The WAF layout is block based (e.g., 4 KB blocks that have no fragments), uses inodes to describe the files stored in the primary storage system 210, and includes directories that are simply specially formatted files. The WAF layout uses files to store metadata that describes the layout of the file system. WAF layout metadata files include an inode file.
For a file having a size that is greater than 64 bytes and less than or equal to 64 KB, a single level of indirection is used to refer to the data blocks. For example, the data block 325 can be used as a block 330 that stores the locations of the actual data blocks that have the file data. The block 330 has multiple block number entries, e.g., 16 block number entries of 4 bytes each, each of which can have a reference to a data block 335 that has the data. The data block 335 can be of a specified size, e.g., 4 KB.
For a file having a size that is greater than 64 KB and is less than 64 MB, two levels of indirection can be used. For example, each of the block number entries of block 340 references a single-indirect data block 345. In turn, each 4 KB single-indirect data block 345 comprises 1024 pointers that reference 4 KB data blocks 350. Similarly, for a file having a size that is greater than 64 MB additional levels of indirection can be used. Accordingly, a file in the primary storage system can be represented using an inode. The inode includes the data of the file or has references to the data extents that have the data of the file. Each of the data blocks within the inode is identified using an inode FBN. Each of the data blocks has a data extent ID that uniquely identifies the data block. Further, the inode has an associated inode ID that uniquely identifies the file.
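As a quick check of the limits implied by this layout (16 four-byte entries referenced from the inode, 1024 pointers per 4 KB indirect block), the following arithmetic reproduces the 64 KB and 64 MB boundaries mentioned above.

```python
# Worked arithmetic for the indirection levels described above.
BLOCK_SIZE = 4 * 1024          # 4 KB data blocks
INODE_ENTRIES = 16             # block number entries held in the inode's block
POINTERS_PER_BLOCK = 1024      # 4 KB indirect block / 4-byte pointers

single_indirect_max = INODE_ENTRIES * BLOCK_SIZE                       # 16 * 4 KB
double_indirect_max = INODE_ENTRIES * POINTERS_PER_BLOCK * BLOCK_SIZE  # 16 * 1024 * 4 KB

print(single_indirect_max // 1024, "KB")            # -> 64 KB
print(double_indirect_max // (1024 * 1024), "MB")   # -> 64 MB
```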
The data extent also has an associated ID that uniquely identifies the data extent. In some embodiments, the data extent ID is a volume block number (VBN) in a volume 220 of an aggregate 225 of the primary storage system 210. The aggregate 225 is a group of one or more physical storage devices of the primary storage system 210, such as a RAID group 230. The aggregate 225 is logically divided into one or more volumes, e.g., volume 220. The volume 220 is a logical collection of space within an aggregate. The aggregate 225 has its own physical volume block number (PVBN) space and maintains metadata, such as block allocation “bitmap” structures, within that PVBN space. Each volume also has its own VBN space and maintains metadata, such as block allocation bitmap structures, within that VBN space.
When a PTI of the file system of the storage server 205 is generated, the inodes of the files in the primary storage system 210 and the data extents having the data of the files are copied to the PTI. The PTI can then be replicated to the destination storage system 215. As described with reference to
The LRSE protocol 235 is intended for use as a protocol to replicate data between two hosts while preserving storage efficiency. The LRSE protocol 235 preserves storage efficiency over the wire, e.g., during transmission, as well as on the storage devices at the destination storage system, by naming the replicated data. The LRSE protocol 235 allows the sender, e.g., the primary storage system 210, to send the named data once and refer to it (by name) multiple times in the future. In the LRSE protocol 235, the sender, e.g., the primary storage system 210, identifies and sends new/changed data extents along with their names (without a file context). The sender also identifies new/changed files and describes the changed contents in the files using the names.
In the block diagram 400, the first file is represented using inode 410. The inode 410 includes the data extents, e.g., data extent ID “100” and data extent ID “101” that have the data of the first file as FBN “0” and FBN “1” of the inode, respectively. The FBN identifies the data extents within the inode. Similarly, the second file is represented using inode 415 and the data extent, e.g., data extent ID “101,” that has the data of the second file is included as FBN “0” of the inode 415. In some embodiments, the storage server 205 stores the data in a de-duplicated format. That is, the files having a portion of data that is identical between the files share the data extent having the identical data. Accordingly, the inode 415 shares the data extent “101” with inode 410. In some embodiments, the identical data can be stored in different data extents, e.g., different data extents for each of the files. In some embodiments, the data extent ID can be a VBN of the volume 220 at the primary storage system 210.
The replication stream for the above base PTI 405 can include a reference stream 425 having reference maps 430 and 435, and a data stream 440 having named data extents 445 and 450. The reference map 430 of the inode 410 includes a mapping of FBNs of the inode 410 to data extent IDs, e.g., “100” and “101,” of the data extents that have the data of the file which the inode 410 represents. Similarly, the reference map 435 includes a mapping of FBNs of the inode 415 to the data extent ID, e.g., “101,” of the data extent that has the data for the file which the inode 415 represents.
The replication stream can also include a data stream 440 having data extents that have the data of the files represented by inodes 410 and 415. The data stream 440 includes the data extents and their corresponding IDs (“names”), and they are hence referred to as “named data extents.” In some embodiments, the named data extents 445 and 450 may be generated separately, e.g., one named data extent for every data extent. In some embodiments, the named data extents 445 and 450 may be generated as a combined named data extent 455. The replication stream can also include metadata of inodes 410 and 415 (not illustrated).
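For the two-file example above, the replication stream could be modeled as below; the dictionary layout is an illustrative assumption, but it shows how the shared (deduplicated) data extent “101” is sent once as a named extent even though both reference maps refer to it.

```python
# Illustrative model of the replication stream for the base PTI example.
reference_stream = {
    410: {0: 100, 1: 101},   # inode 410: FBN 0 -> extent 100, FBN 1 -> extent 101
    415: {0: 101},           # inode 415: FBN 0 -> extent 101 (shared with inode 410)
}

# Named data extents: each extent travels once with its name (extent ID),
# even though extent 101 is referenced by both inodes.
data_stream = {
    100: b"data of extent 100",
    101: b"data of extent 101",
}

metadata_stream = {410: {"size_blocks": "2"}, 415: {"size_blocks": "1"}}

replication_stream = {
    "reference_stream": reference_stream,
    "data_stream": data_stream,
    "metadata_stream": metadata_stream,
}
```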
The replication stream can be transmitted to the destination storage system 215 to store the base PTI 405. However, the contents of the replication stream may have to be converted or translated or mapped to storage objects, which is the format of data expected by the destination storage system 215. The replication stream is sent to a cloud data manager 240 for converting the contents of the replication stream to the storage objects and transmitting them to the destination storage system 215. A cloud data parking parser 245 in the cloud data manager 240 parses the replication stream to identify the reference maps 430 and 435, the named data extent 455, and the metadata of inodes 410 and 415. After identifying the contents, the cloud data parking parser 245 generates one or more storage objects for the contents of the replication stream, as illustrated in
The cloud data parking parser 505 creates storage objects of various types representing the content of the replication stream. For example, the cloud data parking parser 505 can create a data storage object 255 corresponding to data extents, a reference map storage object 260 corresponding to a reference map, and an inode storage object 265 corresponding to the metadata of an inode. In
The cloud data parking parser 505 creates reference map storage objects 575 and 580 corresponding to the reference maps 525 and 530. The cloud data parking parser 505 also creates inode storage objects 565 and 570 corresponding to the metadata 515 and 520 of the inodes 410 and 415. The inode storage object can include metadata of an inode, e.g., created date and time, modified date and time, owner, number of file blocks in the inode (e.g., size of the file to which the inode corresponds), etc. The storage objects may be stored in an object container 550 at the destination storage system 215.
Referring back to
In some embodiments, the incremental PTIs can be backed up using the system 200 of
The replication stream transmits the differences to the cloud data parking parser 245. The cloud data parking parser 245 generates the following storage objects: (a) a data storage object 615 corresponding to data extents “102” and “103,” (b) an inode storage object 620 corresponding to inode 610, (c) inode storage objects 625 and 630 corresponding to inodes 410 and 415 because the metadata of these inodes, e.g., access time, has changed, (d) a reference map object 635 mapping FBN “1” of inode 410 to data extent ID “102,” (e) a reference map object 640 mapping an FBN “0” of inode 415 to “−1”, indicating that data in data extent “102” is to be deallocated, (f) a reference map object 645 mapping FBN “1” of inode 410 to data extent ID “103,” and (g) a reference map object 650 mapping FBN “0” of inode 610 to data extent ID “103.” These storage objects are then transmitted to the destination storage system 215, where they are stored in an object container corresponding to the PTI 605, e.g., object container 655.
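The following sketch illustrates one way an inode's reference-map changes between two PTIs could be expressed, using “−1” to mark an FBN whose data extent reference was removed, similar in spirit to reference map object 640 above; the encoding and the function shown are assumptions for illustration.

```python
# Illustrative diff of an inode's reference map between two PTIs.
from typing import Dict

def diff_reference_map(old: Dict[int, int], new: Dict[int, int]) -> Dict[int, int]:
    """Return one entry per changed FBN; -1 marks a removed extent reference."""
    changes: Dict[int, int] = {}
    for fbn, extent in new.items():
        if old.get(fbn) != extent:
            changes[fbn] = extent              # added or remapped FBN
    for fbn in old:
        if fbn not in new:
            changes[fbn] = -1                  # FBN no longer maps to any extent
    return changes

# An inode that dropped its only mapping (FBN 0 -> extent 101) in the new PTI:
print(diff_reference_map({0: 101}, {}))        # -> {0: -1}
```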
In the example of
The AFS, which is a current state of the primary storage system 705, is as illustrated in AFS 715. The AFS 715 indicates the primary storage system 705 has four files, which are represented by corresponding inodes, e.g., inode “1,” inode “2,” inode “3,” and inode “4.” In some embodiments, the numbers “1”-“4” associated with the inodes are inode IDs. The inode “1” includes two data extents “100” and “103,” that is, the data of file represented by inode “1” is contained in the data extents “100” and “103.” Similarly, the inode “2” includes data extents “103” and “104,” the inode “3” includes data extents “101” and “103,” and the inode “4” includes data extent “105.”
In some embodiments, to restore the primary storage system 705 to a particular PTI, the primary storage system 705 may be first restored to a PTI that is common between the primary storage system 705 and the destination storage system 710. After restoring to the common PTI, a difference between the common PTI and the particular PTI is obtained from the destination storage system 710. The difference is applied to the common PTI at the primary storage system 705 which then restores the primary storage system 705 to the particular PTI.
In some embodiments, obtaining the difference includes identifying a state of the primary storage system 705 at the particular PTI. The state can be identified by traversing all the PTIs from the base PTI to the particular PTI and determining the inodes and their data extents stored at the primary storage system 705 at the time the particular PTI is generated. Then, the state of the primary storage system 705 at the common PTI is determined by traversing all the SDs starting from an SD following the particular PTI to the common PTI in the destination storage system 710. The change in state, or the difference, is determined as (a) inodes that are added to and/or deleted from the primary storage system 705 after a PTI corresponding to the first SD 730 is generated, (b) data extents that are added to and/or deleted from the primary storage system 705 after the PTI corresponding to the first SD 730 is generated, and (c) changes made to the reference maps of the inodes.
After the difference is computed, replication jobs are generated to apply the difference to the common PTI on the primary storage system 705, thereby restoring the primary storage system to the particular PTI. The replication jobs can perform one or more of: (a) deleting inodes and/or data extents that are added to the primary storage system 705 after a PTI corresponding to the first SD 730 is generated, (b) adding inodes and/or data extents that are deleted from the primary storage system 705 after a PTI corresponding to the first SD 730 is generated, which can require fetching data corresponding to the added data extents from the destination storage system 710, and (c) reverting the changes made to the reference maps of the inodes after a PTI corresponding to the first SD 730 is generated.
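A hedged sketch of computing the difference between the state at the common PTI and the state at the particular (target) PTI, expressed as the delete/insert/update categories described above; the toy state representation (inode ID mapped to an FBN-to-extent map) is an assumption for illustration.

```python
# Illustrative difference computation between two file-system states.
from typing import Dict

State = Dict[int, Dict[int, int]]   # inode ID -> (FBN -> data extent ID)

def compute_restore_difference(common: State, target: State):
    to_delete = [i for i in common if i not in target]                 # added after target PTI
    to_insert = {i: m for i, m in target.items() if i not in common}   # deleted after target PTI
    to_update = {i: m for i, m in target.items()
                 if i in common and common[i] != m}                    # changed reference maps
    return to_delete, to_insert, to_update

# Toy example echoing the states 732/733 discussion below:
common_state = {1: {0: 100, 1: 102}, 2: {0: 104, 1: 103}, 3: {0: 101}}
target_state = {1: {0: 100, 1: 102}, 2: {0: 101}}
print(compute_restore_difference(common_state, target_state))
# -> ([3], {}, {2: {0: 101}})
```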
In some embodiments, by restoring the primary storage system 705 to the common PTI before restoring to the particular PTI, the amount of data that has to be obtained from the destination storage system 710 is minimized. This can result in reduced consumption of resources, e.g., network bandwidth, time etc.
The following paragraphs describe restoring the primary storage system 705 to the first SD 730. The primary storage system 705 is restored from the AFS 715 to the common PTI, e.g., the fourth PTI 720, which corresponds to the fourth SD 745. Restoring to the common PTI includes identifying the difference in data between the AFS 715 and the fourth PTI 720. The difference between the two is that the AFS 715 has a new inode “4” and data extent “105” of inode “4” that are not present in the fourth PTI 720. Accordingly, the inode “4” and its data extent “105” are deleted from the AFS 715 to restore the primary storage system 705 to the fourth PTI 720.
The state 732 of the primary storage system 705 at the first SD 730 is determined by traversing all the SDs from the base PTI 725 to the first SD 730 and identifying the inodes and their data extents stored at the time the first SD 730 is generated. The state 732 includes two inodes, “inode 1” and “inode 2”, wherein “inode 1” includes data extents “100” and “102” and “inode 2” includes data extent “101.”
A state 733 of the primary storage system 705 at the fourth SD 745 is determined by traversing all the SDs from the second SD 735 to the fourth SD 745 and identifying (a) a set of inodes and/or data extents added to and/or deleted from the primary storage system 705 after the first SD 730 is generated, and (b) reference maps of the inodes that have changed. The state 733 indicates that (a) inode “3” is added, (b) reference map of inode “2” has changed, e.g., mapping of FBN “0” of inode “2” has changed from data extent “101” to “104” (e.g., due to change in data content of file to which inode “2” corresponds), and (c) inode “2” has a new block, FBN “1,” mapped to data extent “103.”
After the state 733 at the fourth SD 745 is determined, the difference 734 between the state 732 and the state 733 is computed and a replication job is generated to apply the difference 734 to the primary storage system 705. The replication job, when executed at the primary storage system 705, applies the difference 734 to the fourth PTI 720 by deleting the inode “3,” changing the reference map of inode “2,” e.g., changing the mapping of FBN “0” of inode “2” to data extent “101,” updating the data extent “101” to include data “B,” and removing the mapping of FBN “1” of inode “2” from data extent “103.” Also, because none of the inodes refer to data in data extents “103” and “104,” the data in those blocks is deleted. Thus, the primary storage system 705 is restored to the first PTI 750.
In some embodiments, the primary storage system 705 can also recover a file or a group of files from a particular PTI at the destination storage system 710. To restore a file to a version of a particular PTI, a cloud data manager, e.g., the cloud data manager 240 of
Further, the cloud data manager 240 also determines from the metadata that the inode “1” contains two file blocks. So the cloud data manager 240 continues to traverse earlier PTIs one by one until it finds a PTI that has information regarding the remaining data of inode “1.” Consequently, the cloud data manager 240 arrives at the base PTI 725 from where it obtains the data “A” of FBN “0” stored at data extent “100.” After obtaining the data of the entire file, the cloud data manager 240 sends the data of the file corresponding to the inode “1” and the reference map mapping the data extents containing the data of the file to the file blocks of the inode to the primary storage system 705. In some embodiments, the cloud data manager 240 can transmit the data and the reference maps to the primary storage system 705 using a replication module, e.g., replication module 150 of
In some embodiments, the PTIs stored at the destination storage system 710 can also be restored to a storage system other than the storage system (e.g., primary storage system 705) from which the data is backed up to the destination storage system 710.
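To illustrate the file-level recovery described above, the sketch below walks the PTIs from newest to oldest, collecting the data for each FBN of the requested inode until all of its file blocks are found; the PTI record format used here is a simplification and an assumption.

```python
# Illustrative file-level recovery across PTIs at the destination.
from typing import Dict, List

def recover_file(ptis: List[Dict], inode_id: int, num_blocks: int) -> Dict[int, bytes]:
    """ptis is ordered oldest (base) to newest; search newest-first."""
    recovered: Dict[int, bytes] = {}
    for pti in reversed(ptis):
        ref_map = pti.get("reference_maps", {}).get(inode_id, {})
        for fbn, extent_id in ref_map.items():
            if fbn not in recovered and extent_id in pti.get("extents", {}):
                recovered[fbn] = pti["extents"][extent_id]
        if len(recovered) == num_blocks:       # all file blocks of the inode located
            break
    return recovered

# Toy example: data "A" (FBN 0) lives in the base PTI, data "D" (FBN 1) in a later PTI.
base_pti = {"reference_maps": {1: {0: 100}}, "extents": {100: b"A"}}
later_pti = {"reference_maps": {1: {1: 103}}, "extents": {103: b"D"}}
print(recover_file([base_pti, later_pti], inode_id=1, num_blocks=2))
# -> {1: b'D', 0: b'A'}
```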
In some embodiments, one or more of the PTIs at the destination storage system 710 can be compacted. In some embodiments, when multiple PTIs are backed up to the destination storage system 710, after a period, some of the PTIs may not be accessed as often as the others, that is, some of the PTIs become cold PTIs. It may be economical to archive the cold PTIs to storage systems that are more cost-optimized, e.g., have a lower $/GB cost, compared to the destination storage system 710. Compaction of a set of PTIs can include archiving the set of PTIs from the destination storage system 710 to another storage system and merging the set of PTIs into a single PTI. The set of PTIs can be merged into one PTI based on various known techniques. In some embodiments, the compaction process can be performed by the cloud data manager 240.
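A minimal sketch of merging a sequence of incremental PTIs into a single compacted view: later reference-map entries win, and inodes recorded as deleted are dropped. The record format, including the "deleted_inodes" field, is an assumption for illustration.

```python
# Illustrative compaction of a set of sequential PTIs into one PTI.
from typing import Dict, List

def compact(ptis: List[Dict]) -> Dict:
    """ptis ordered oldest to newest; returns a single compacted PTI."""
    reference_maps: Dict[int, Dict[int, int]] = {}
    extents: Dict[int, bytes] = {}
    for pti in ptis:
        extents.update(pti.get("extents", {}))
        for inode_id, ref_map in pti.get("reference_maps", {}).items():
            reference_maps.setdefault(inode_id, {}).update(ref_map)   # latest mapping wins
        for inode_id in pti.get("deleted_inodes", []):
            reference_maps.pop(inode_id, None)   # drop inodes deleted in later PTIs
    return {"reference_maps": reference_maps, "extents": extents}
```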
The following describes an example of a compaction process. Consider that the destination storage system 710 has the following PTIs:
So if the cloud data manager 240 compacts the PTIs from base PTI to SD4, the PTIs from base PTI to SD4 are moved to another storage system and the destination storage system 710 is updated to have a compacted view or state of the SD5 as the compacted base PTI.
The compacted view of base PTI to SD4 is as follows:
The Compacted View(Base-SD4) represents a complete state of the destination storage system 710 at the fourth incremental PTI. Note that the Compacted View(Base-SD4) does not contain inodes “2” and “3” since they are deleted. In some embodiments, the compaction of a set of PTIs can be a union of all the PTIs in the set of PTIs. However, various other techniques can be used to compact the PTIs in other ways.
After the PTIs, base PTI to SD4, are compacted, the PTI SD5 can be compacted with the Compacted View(Base-SD4) to generate a compacted base PTI as follows:
The Compacted Base(SD5) represents a complete state of the destination storage system 710 at PTI SD5. The destination storage system 710 stores the Compacted Base(SD5) as the base PTI. To restore a file at the primary storage system 705 to a version corresponding to the fifth incremental PTI SD5 or later PTIs, e.g., SD6 to SDn, the cloud data manager 240 can use the Compacted Base(SD5) or the later PTIs accordingly. However, to restore a file to a version corresponding to PTIs earlier than SD5, the cloud data manager 240 may have to fetch the PTIs from the archive storage system.
In some embodiments, if the destination storage system 710 did not store the Compacted Base(SD5), and instead stored the fifth incremental PTI SD5 as is after the compaction process, then the cloud data manager 240 may have to fetch the earlier PTIs, e.g., base PTI to SD4, from the archive storage system to determine the state of the Compacted Base(SD5), e.g., the state of inode “1”. Fetching the PTIs from the archive storage system and then determining the state can be resource consuming and, therefore, can affect the performance of the storage server 205. Accordingly, storing the compacted view of the fifth incremental PTI SD5 can eliminate the need to fetch the earlier PTIs from the archive storage system to determine the state of the destination storage system 710 at PTI SD5.
At block 815, the replication module 150 associated with the storage server 105 generates a replication stream containing the data to be replicated to the destination storage system from the primary storage system. In some embodiments, the replication module 150 generates the replication stream using a replication protocol, e.g., LRSE protocol. The replication stream can include (a) a first metadata of the data identifying multiple files, e.g., inodes, (b) data, e.g., data extents that contain the data of the files, and (c) a second metadata of the data identifying multiple files to which portions of the data belong, e.g., reference maps that contain a mapping of FBNs of an inode to data extents that contain the data of the file to which the inode corresponds.
At block 820, the replication module 150 sends the replication stream to the cloud data manager 155 to map the data extents, the inodes, and the reference maps to multiple storage objects for storage in the destination storage system. In some embodiments, the cloud data manager 155 can be implemented on the storage server 105. In some embodiments, the cloud data manager 155 can be implemented separate from the storage server 105 and on one or more server computers that can communicate with the storage server 105.
At block 825, the cloud data parking parser 245 parses the replication stream to identify the data extents, the inodes and the reference maps from the stream. The cloud data parking parser 245 can use the LRSE protocol to identify the content of the replication stream. The cloud data parking parser 245 maps the data extents, the inodes and the reference maps to the storage objects. The mapping can include generating a first type of the storage objects containing the data, e.g., data extents, the second type of storage objects containing the reference maps, and a third type of the storage objects containing the metadata of the files, e.g., inodes.
At block 830, the cloud data parking adapter 250 transmits the storage objects to the destination storage system over a communication network. In some embodiments, the storage objects can be transmitted using HTTP. In some embodiments, the cloud data parking adapter 250 uses the APIs of the destination storage system to transmit the storage objects to the destination storage system.
At block 835, the destination storage system 215 receives the storage objects and stores them in an object container. In some embodiments, the storage objects are stored in the same hierarchy level within the object container. In some embodiments, the storage objects can correspond to a PTI of the data at the primary storage system. The destination storage system can have various object containers, each of them corresponding to a particular PTI. The storage objects of the particular PTI can be stored in the object container corresponding to the particular PTI. After storing the storage objects, the process 800 returns at block 840.
At block 915, the PTI manager 145 determines that a new file is created at the primary storage system after a previous PTI is backed up to the destination storage system. The PTI manager 145 identifies the new file. In some embodiments, the PTI manager 145 can be implemented using one or more tools, e.g., SnapDiff, SnapVault of NetApp.
At block 920, the PTI manager 145 determines that the new file includes data of which a first portion is identical to at least a portion of data stored in the storage objects stored at the destination storage system, and a second portion is different from the data stored in the storage objects.
At block 925, the replication module 150 generates a replication stream containing the changes made to the data at the primary storage system since the last PTI was backed up, e.g., the second portion of the data. In some embodiments, the replication stream can include (a) a first metadata of the data identifying the new file, e.g., the new inode, (b) the second portion of the data, e.g., new data extents that contain the second portion of the data of the new file, and (c) a second metadata of the data, e.g., a reference map that contains a mapping of the data extents that contain the first portion and the second portion of the data to the FBNs of the new inode. In some embodiments, the replication stream excludes the first portion of the data content that is identical to the data stored in the storage objects at the destination storage system. In some embodiments, the replication stream also excludes any other data at the primary storage system which is previously backed up to the destination storage system.
At block 930, the replication module 150 sends the replication stream to the cloud data manager 155 to map or translate the data extents, the new inode, and the reference map to multiple storage objects of the destination storage system.
At block 935, the cloud data parking parser 245 parses the replication stream to identify the new data extents, the new inode and the reference map from the replication stream. In some embodiments, the cloud data parking parser 245 uses the LRSE protocol to identify the content of the replication stream.
At block 940, the cloud data parking parser 245 generates a data storage object including a set of data extents containing the second portion of the data and data extent IDs of the set of data extents.
At block 945, the cloud data parking parser 245 generates an inode storage object containing the metadata of the new inode.
At block 950, the cloud data parking parser 245 generates a reference-map storage object containing a mapping of the new inode to the set of data extents.
At block 955, the cloud data parking adapter 250 transmits the data storage object, the reference-map storage object, and the inode storage object to the destination storage system.
At block 960, the destination storage system 215 stores the data storage object, the reference map storage object and the inode storage object as one or more files in an object container corresponding to the PTI, and the process 900 returns at block 965.
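As an illustrative recap of process 900, the sketch below builds the three storage objects for a new file whose first portion duplicates data already backed up; only the new data extents are placed in the data storage object, while the reference map still names the shared extents. The inode and extent IDs used here are hypothetical.

```python
# Illustrative incremental backup of a new file with a deduplicated first portion.
from typing import Dict, Set

def build_incremental_objects(new_inode_id: int,
                              ref_map: Dict[int, int],          # FBN -> extent ID
                              extent_data: Dict[int, bytes],    # extent ID -> bytes
                              already_backed_up: Set[int]):
    """Return (data, inode, reference-map) storage objects for the new file."""
    new_extents = {eid: extent_data[eid] for eid in ref_map.values()
                   if eid not in already_backed_up}
    data_object = {"extents": new_extents}                       # second portion only
    inode_object = {"inode_id": new_inode_id, "metadata": {"blocks": len(ref_map)}}
    refmap_object = {"inode_id": new_inode_id, "fbn_to_extent": dict(ref_map)}
    return data_object, inode_object, refmap_object

# FBN 0 reuses extent 101 (already at the destination); FBN 1 adds new extent 106.
objs = build_incremental_objects(610, {0: 101, 1: 106}, {101: b"B", 106: b"Z"},
                                 already_backed_up={100, 101})
print(objs[0])   # -> {'extents': {106: b'Z'}}
```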
The process 1000 begins at block 1005, and at block 1010, the storage server 105 receives a request to restore the primary storage system to a particular PTI maintained at the destination storage system. In some embodiments, the multiple PTIs stored at the destination storage system are copies of PTIs generated at the primary storage system sequentially over a period of time. Each of the PTIs can be a copy of a file system of the primary storage system at the time the PTI is generated.
At block 1015, the PTI manager 145 determines a current state of the primary storage system. In some embodiments, determining the current state includes identifying the AFS of the primary storage system, e.g., multiple files and the data of the files stored at the primary storage system currently.
At block 1017, the PTI manager 145 and/or the cloud data manager 155 determines a PTI that is common between the primary storage system and the destination storage system. In some embodiments, while the destination storage system includes copies of all the PTIs generated at the primary storage system, the primary storage system itself may not store all the PTIs. The primary storage system may store some or none of the PTIs.
At block 1019, the PTI manager 145 restores the AFS of the primary storage system to the common PTI. In some embodiments, restoring the AFS to the common PTI includes reverting any changes made to the data and the file system of the primary storage system from the time the common PTI was generated.
At block 1020, the PTI manager 145 and/or the cloud data manager 155 determines a state of the primary storage system, e.g., of a file system of the primary storage system, at the time the particular PTI was generated. In some embodiments, determining the state at the particular PTI includes searching the storage objects from a base PTI to the particular PTI at the destination storage system to identify a set of files, e.g., inodes, and the data of the set of files, e.g., data extents, that correspond to the file system of the primary storage system at the time the particular PTI is generated. In some embodiments, the copies of PTIs stored at the destination storage system can be incremental PTIs (also referred to as a “PTI difference”). The incremental PTI includes a difference of the data between the corresponding PTI and a previous PTI. One of the PTIs, e.g., a base PTI which is the first of the sequence of PTIs, contains a full copy of the file system of the primary storage system.
At block 1025, the PTI manager 145 and/or the cloud data manager 155 determines a state of the primary storage system at the time the common PTI is generated. In some embodiments, the state at the common PTI is determined by searching the storage objects at the destination storage system from a PTI following the particular PTI to the common PTI to identify the inodes, data extents, and the reference maps of the inodes at the time the common PTI is generated.
At block 1030, the PTI manager 145 and/or the cloud data manager 155 determines a difference between the state at the particular PTI and the state at the common PTI. In some embodiments, determining the difference includes identifying the inodes and/or data extents added and/or deleted, and any updates made to the reference maps, e.g., to FBNs of the inodes, from the particular PTI up until the common PTI.
At block 1035, the replication module 150 generates a replication job to obtain the difference from the destination storage system. In some embodiments, generating the replication job includes generating a deleting job for deleting from the current state the inodes and/or data extents that are added at the primary storage system after the particular PTI was generated, as illustrated in block 1036. In some embodiments, generating the replication job also includes generating an inserting job for inserting into the current state the inodes and/or data extents that are deleted from the primary storage system after the particular PTI was generated, as illustrated in block 1037. In some embodiments, generating the replication job also includes generating an updating job to update the reference maps of the inodes to the reference maps of the inodes at the time the particular PTI is generated, as illustrated in block 1038.
At block 1040, the replication module 150 executes the replication job to apply the difference on the current state of primary storage system to restore the primary storage system to the particular PTI. The process 1000 returns at block 1045.
The memory 1110 and storage devices 1120 are computer-readable storage media that may store instructions that implement at least portions of the described technology. In addition, the data structures and message structures may be stored or transmitted via a data transmission medium, such as a signal on a communications link. Various communications links may be used, such as the Internet, a local area network, a wide area network, or a point-to-point dial-up connection. Thus, computer-readable media can include computer-readable storage media (e.g., “non-transitory” media) and computer-readable transmission media.
The instructions stored in memory 1110 can be implemented as software and/or firmware to program the processor(s) 1105 to carry out actions described above. In some embodiments, such software or firmware may be initially provided to the computing system 1100 by downloading it from a remote system through the computing system 1100 (e.g., via network adapter 1130).
The technology introduced herein can be implemented by, for example, programmable circuitry (e.g., one or more microprocessors) programmed with software and/or firmware, or entirely in special-purpose hardwired (non-programmable) circuitry, or in a combination of such forms. Special-purpose hardwired circuitry may be in the form of, for example, one or more ASICs, PLDs, FPGAs, etc.
The above description and drawings are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding of the disclosure. However, in some instances, well-known details are not described in order to avoid obscuring the description. Further, various modifications may be made without deviating from the scope of the embodiments. Accordingly, the embodiments are not limited except as by the appended claims.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
The terms used in this specification generally have their ordinary meanings in the art, within the context of the disclosure, and in the specific context where each term is used. Some terms that are used to describe the disclosure are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner regarding the description of the disclosure. For convenience, some terms may be highlighted, for example using italics and/or quotation marks. The use of highlighting has no influence on the scope and meaning of a term; the scope and meaning of a term is the same, in the same context, whether or not it is highlighted. It will be appreciated that the same thing can be said in more than one way. One will recognize that “memory” is one form of a “storage” and that the terms may on occasion be used interchangeably.
Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein, nor is any special significance to be placed upon whether or not a term is elaborated or discussed herein. Synonyms for some terms are provided. A recital of one or more synonyms does not exclude the use of other synonyms. The use of examples anywhere in this specification including examples of any term discussed herein is illustrative only, and is not intended to further limit the scope and meaning of the disclosure or of any exemplified term. Likewise, the disclosure is not limited to various embodiments given in this specification.
Those skilled in the art will appreciate that the logic illustrated in each of the flow diagrams discussed above may be altered in various ways. For example, the order of the logic may be rearranged, substeps may be performed in parallel, illustrated logic may be omitted, other logic may be included, etc.
Without intent to further limit the scope of the disclosure, examples of instruments, apparatus, methods and their related results according to the embodiments of the present disclosure are given below. Note that titles or subtitles may be used in the examples for the convenience of a reader, which in no way should limit the scope of the disclosure. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. In the case of conflict, the present document, including definitions, will control.