Data syncing in a distributed system

Information

  • Patent Grant
  • Patent Number
    11,386,120
  • Date Filed
    Monday, April 20, 2020
  • Date Issued
    Tuesday, July 12, 2022
Abstract
Disclosed are systems, computer-readable mediums, and methods for receiving a start replication message to replicate a source volume to a replicated volume. A source system forwards I/O requests to the replica server. A data structure associated with the replicated volume is initialized. A write request is received from the source system. The write data is written to the replicated volume and the data structure is updated. Source metadata associated with the source volume is received. The source metadata is compared with prior metadata associated with a prior point-in-time image of the source volume to determine blocks of data that have changed since the prior point-in-time image of the source volume. A first block is determined to not be retrieved based upon the data structure. A second block is determined to be retrieved based upon the data structure. The second block is received and written to the replicated volume.
Description
BACKGROUND

The following description is provided to assist the understanding of the reader. None of the information provided is admitted to be prior art.


In data storage architectures, a client's data may be stored in a volume. A unit of data, for example a file (or object), is comprised of one or more storage units (e.g. bytes) and can be stored and retrieved from a storage medium such as disk or RAM in a variety of fashions. For example, disk drives in storage systems are divided into logical blocks that are addressed using logical block addresses (LBAs). As another example, an entire file can be stored in a contiguous range of addresses on the storage medium and be accessed given the offset and length of the file. Most modern file systems store files by dividing them into blocks or extents of a fixed size, storing each block in a contiguous section of the storage medium, and then maintaining a list or tree of the blocks that correspond to each file. Some storage systems, such as write-anywhere file layout (WAFL), logical volume manager (LVM), or new technology file system (NTFS), allow multiple objects to refer to the same blocks, typically through a tree structure, to allow for efficient storage of previous versions or “snapshots” of the file system. In some cases, data for a single file or object may be distributed between multiple storage devices, either by a mechanism like RAID which combines several smaller storage media into one larger virtual device, or through a distributed storage system such as Lustre, General Parallel File System, or GlusterFS.


At some point, it is desirable to back up data of the storage system. Traditional backup methods typically utilize backup software that operates independently of the data storage system and manages the backup process. Backup methods exist to back up only the differences since the last full backup (e.g., a differential backup) or to back up only the changes since the last backup (e.g., an incremental backup). However, due to the inefficiency of backup software, many administrators are shifting away from traditional backup processes and moving towards data replication methods. With replication comes the issue of replicating a mistake, for example, a wrongly deleted file. High bandwidth is required for both replication and backup solutions, and neither method is particularly well suited to scale efficiently for long term archiving.





BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other features of the present disclosure will become more fully apparent from the following description and appended claims, taken in conjunction with the accompanying drawings.



FIG. 1 depicts a simplified system for a storage system in accordance with an illustrative implementation.



FIG. 2A depicts a hash tree in accordance with an illustrative implementation.



FIG. 2B depicts the hash tree illustrated in FIG. 2A, with updated node hashes, in accordance with an illustrative implementation.



FIG. 2C depicts the hash tree illustrated in FIG. 2A, with newly added leaves, in accordance with an illustrative implementation.



FIG. 3 shows a flow diagram of an incremental block level backup procedure in accordance with an illustrative implementation.



FIG. 4 depicts a distributed storage system in accordance with an illustrative implementation.



FIG. 5 shows a flow diagram for replicating data in accordance with an illustrative implementation.





DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
Overview

In general, one innovative aspect of the subject matter described below can be embodied in methods for receiving a start replication message from a source system to replicate data of a source volume to a replicated volume on a replica server. The replicated volume comprises a copy of data of the source volume. The source system forwards input/output (I/O) requests to the replica server after the start replication message is sent. A data structure associated with units of data of the replicated volume is initialized. A write request is received from the source system that includes write data associated with a unit of data of the replicated volume. The source system wrote the write data to the source volume based upon the write request. The write data is written to the replicated volume. The data structure is updated to indicate the write data has been written after the receipt of the start replication message. Source metadata associated with the source volume is received. The metadata includes an ordered list of block identifiers for data blocks of the source volume. Each block identifier is used to access a data block. The source metadata is compared with prior metadata associated with a prior point-in-time image of the source volume to determine blocks of data that have changed since the prior point-in-time image of the source volume. A first block of the blocks of data is determined to not be retrieved based upon the data structure. A second block of the blocks of data is determined to be retrieved based upon the data structure. The second block is received from the source system and written to the replicated volume. Other embodiments of this aspect include corresponding systems, apparatuses, and computer-readable media, configured to perform the actions of the method.


The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, implementations, and features described above, further aspects, implementations, and features will become apparent by reference to the following drawings and the detailed description.


DETAILED DESCRIPTION

Described herein are techniques for an incremental block level backup system. In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of various implementations. Particular implementations as defined by the claims may include some or all of the features in these examples alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.


Storage System



FIG. 1 depicts a simplified system for incremental block level backup of a storage system 100 in accordance with an illustrative implementation. System 100 includes a client layer 102, a metadata layer 104, a block server layer 106, storage 116, and storage 120.


In general, client layer 102 includes one or more clients 108a-108n. Clients 108 include client processes that may exist on one or more physical machines. When the term “client” is used in the disclosure, the action being performed may be performed by a client process. A client process is responsible for storing, retrieving, and deleting data in system 100. A client process may address pieces of data depending on the nature of the storage system and the format of the data stored. For example, the client process may reference data using a client address. The client address may take different forms. For example, in a storage system that uses file storage, client 108 may reference a particular volume or partition, and a file name. With object storage, the client address may be a unique object name. For block storage, the client address may be a volume or partition, and a block address. Clients 108 communicate with metadata layer 104 using different protocols, such as small computer system interface (SCSI), Internet small computer system interface (ISCSI), fibre channel (FC), common Internet file system (CIFS), network file system (NFS), hypertext transfer protocol (HTTP), hypertext transfer protocol secure (HTTPS), web-based distributed authoring and versioning (WebDAV), or a custom protocol.


Metadata layer 104 includes one or more metadata servers 110a-110n. Performance managers 114 may be located on metadata servers 110a-110n. Block server layer 106 includes one or more block servers 112a-112n. Block servers 112a-112n are coupled to storage 116, which stores volume data for clients 108. Each client 108 may be associated with a volume. In one implementation, only one client 108 accesses data in a volume; in other implementations, multiple clients 108 may access data in a single volume.


Storage 116 can include multiple solid state drives (SSDs). In one implementation, storage 116 can be a cluster of individual drives coupled together via a network. When the term “cluster” is used, it will be recognized that cluster may represent a storage system that includes multiple disks that may not be networked together. In one implementation, storage 116 uses solid state memory to store persistent data. SSDs use microchips that store data in non-volatile memory chips and contain no moving parts. One consequence of this is that SSDs allow random access to data in different drives in an optimized manner as compared to drives with spinning disks. Read or write requests to non-sequential portions of SSDs can be performed in a comparable amount of time as compared to sequential read or write requests. In contrast, if spinning disks were used, random read/writes would not be efficient since inserting a read/write head at various random locations to read data results in slower data access than if the data is read from sequential locations. Accordingly, using electromechanical disk storage can require that a client's volume of data be concentrated in a small relatively sequential portion of the cluster to avoid slower data access to non-sequential data. Using SSDs removes this limitation.


In various implementations, non-sequentially storing data in storage 116 is based upon breaking data up into one or more storage units, e.g., data blocks. A data block, therefore, is the raw data for a volume and may be the smallest addressable unit of data. The metadata layer 104 or the client layer 102 can break data into data blocks. The data blocks can then be stored on multiple block servers 112. Data blocks can be of a fixed size, can be initially a fixed size but compressed, or can be of a variable size. Data blocks can also be segmented based on the contextual content of the block. For example, data of a particular type may have a larger data block size compared to other types of data. Maintaining segmentation of the blocks on a write (and corresponding re-assembly on a read) may occur in client layer 102 and/or metadata layer 104. Also, compression may occur in client layer 102, metadata layer 104, and/or block server layer 106.


In addition to storing data non-sequentially, data blocks can be stored to achieve substantially even distribution across the storage system. In various examples, even distribution can be based upon a unique block identifier. A block identifier can be an identifier that is determined based on the content of the data block, such as by a hash of the content. The block identifier is unique to that block of data. For example, blocks with the same content have the same block identifier, but blocks with different content have different block identifiers. To achieve even distribution, the values of possible unique identifiers can have a uniform distribution. Accordingly, storing data blocks based upon the unique identifier, or a portion of the unique identifier, results in the data being stored substantially evenly across drives in the cluster.
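
As an illustrative, non-limiting sketch of one way such content-based identifiers could be derived and used for placement, the following Python fragment hashes a block's content and maps a prefix of the identifier onto a drive index. The choice of SHA-256, the drive count, and the helper names are assumptions made here for illustration and are not mandated by this disclosure.

    import hashlib

    NUM_DRIVES = 16  # illustrative cluster size (an assumption)

    def block_identifier(block_data: bytes) -> str:
        # Content-addressed identifier: identical content yields the same
        # identifier, differing content yields a different identifier.
        return hashlib.sha256(block_data).hexdigest()

    def drive_for_block(block_id: str, num_drives: int = NUM_DRIVES) -> int:
        # Because hash output is approximately uniformly distributed, mapping
        # a prefix of the identifier onto the drive count spreads data blocks
        # substantially evenly across the drives in the cluster.
        return int(block_id[:8], 16) % num_drives

For example, drive_for_block(block_identifier(b"example block data")) selects one of the sixteen hypothetical drives, and blocks with identical content always land on the same drive.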


Because client data, e.g., a volume associated with the client, is spread evenly across all of the drives in the cluster, every drive in the cluster is involved in the read and write paths of each volume. This configuration balances the data and load across all of the drives. This arrangement also removes hot spots within the cluster, which can occur when a client's data is stored sequentially on any volume.


In addition, having data spread evenly across drives in the cluster allows a consistent total aggregate performance of a cluster to be defined and achieved. This aggregation can be achieved since data for each client is spread evenly through the drives. Accordingly, a client's I/O will involve all the drives in the cluster. Since all clients have their data spread substantially evenly through all the drives in the storage system, the performance of the system can be described in aggregate as a single number, e.g., the sum of performance of all the drives in the storage system.


Block servers 112 and slice servers maintain a mapping between a block identifier and the location of the data block in a storage medium of block server 112. A volume includes these unique and uniformly random identifiers, and so a volume's data is also evenly distributed throughout the cluster.


Metadata layer 104 stores metadata that maps between client layer 102 and block server layer 106. For example, metadata servers 110 map between the client addressing used by clients 108 (e.g., file names, object names, block numbers, etc.) and block layer addressing (e.g., block identifiers) used in block server layer 106. Clients 108 may perform access based on client addresses. However, as described above, block servers 112 store data based upon identifiers and do not store data based on client addresses. Accordingly, a client can access data using a client address which is eventually translated into the corresponding unique identifiers that reference the client's data in storage 116.


Although the parts of system 100 are shown as being logically separate, entities may be combined in different fashions. For example, the functions of any of the layers may be combined into a single process or single machine (e.g., a computing device) and multiple functions or all functions may exist on one machine or across multiple machines. Also, when operating across multiple machines, the machines may communicate using a network interface, such as a local area network (LAN) or a wide area network (WAN). In one implementation, one or more metadata servers 110 may be combined with one or more block servers 112 or backup servers 118 in a single machine. Entities in system 100 may be virtualized entities. For example, multiple virtual block servers 112 may be included on a machine. Entities may also be included in a cluster, where computing resources of the cluster are virtualized such that the computing resources appear as a single entity.


Block Level Incremental Backup


One or more backup servers 118a-118n can interface with the metadata layer 104. Backup servers 118 can interface directly with block servers 112. Backup servers 118a-118n are coupled to storage 120, which stores backups of volume data for clients 108. Storage 120 can include multiple hard disk drives (HDDs), solid state drives (SSDs), hybrid drives, or other storage drives. In one implementation, storage 120 can be a cluster of individual drives coupled together via a network. Backup servers 118 can store backup copies of the data blocks of storage 116 according to any number of formats in storage 120, and translation from the format of the data blocks of storage 116 may occur. Data may be transferred to and from backup servers 118 using different protocols, such as small computer system interface (SCSI), Internet small computer system interface (ISCSI), fibre channel (FC), common Internet file system (CIFS), network file system (NFS), hypertext transfer protocol (HTTP), hypertext transfer protocol secure (HTTPS), web-based distributed authoring and versioning (WebDAV), or a custom protocol. Compression and data de-duplication may occur in backup servers 118a-118n.


As discussed above, the servers of metadata layer 104 store and maintain metadata that maps between client layer 102 and block server layer 106, where the metadata maps between the client addressing used by clients 108 (e.g., file names, volume, object names, block numbers, etc.) and block layer addressing (e.g., block identifiers) used in block server layer 106. In one embodiment, the metadata includes a list of block identifiers that identifies blocks in a volume. The list may be structured as an ordered list corresponding to a list of blocks. The list may also be structured as the leaves of a hash tree. The block identifiers of the metadata are the same block identifiers as used throughout system 100 as described above. The block identifiers may be hexadecimal numbers, but other representations may be used. Additional metadata may also be included, such as inode numbers, directory pointers, modification dates, file size, client addresses, list details, etc. The block identifiers uniquely identify the data of a block and are a hash based on the content of the data block. Backup servers 118 are generally configured to create backups of block level data of a volume that is stored in storage 116 of block server layer 106. Backup servers 118 may create backups of all of the volume data of block server layer 106 or backup servers 118 may create backups of one or more particular volumes (e.g., a volume of a client 108). Backups may be full backups of all data, or they may be incremental backups (e.g., data that has changed since a previous backup).


During an initial backup operation, a backup server 118 retrieves a copy of metadata from metadata server 110 for a client volume. The metadata includes a list of block identifiers associated with data blocks of the volume. In an implementation, the metadata includes an ordered list structure of block identifiers. In another implementation, the ordered list is structured as the leaves of a hash tree (e.g., a Merkle tree, etc.) and the metadata includes the hash tree. The metadata is used by backup server 118 to retrieve a copy of all of the data blocks of the client volume in order to create an initial backup of the data blocks. The data blocks are retrieved from storage 116 by sending a request for the data to a metadata server 110. The requested data is based on the data block identifiers. A request may include a list of the block identifiers of blocks desired to be backed up. In one implementation, backup server 118 may calculate the LBAs of blocks desired to be backed up. For example, because each block identifier can represent a known amount of data (e.g., a 4 k block, etc.), an LBA of a block can be calculated based on the location of the block identifier in the ordered list of block identifiers associated with the volume. For example, the position of a block identifier in the ordered list can be used along with the block size to determine the LBA of the data block. As described below, the tree structure can also be used to determine the data blocks that have changed after a previous backup. In this example, the number of leaf nodes to the left of a changed leaf node can be used to calculate the LBA of the data block. In implementations where LBAs are calculated, a request from backup server 118 may include a list of LBAs of blocks to be backed up. The metadata server 110 routes the request to a block server 112, which provides the requested data to metadata server 110. Metadata server 110 then routes the requested data to the backup server 118. This arrangement allows the servers of metadata layer 104 to facilitate data transmission between block server layer 106 and the backup servers 118. In another implementation, backup servers 118 may be configured to communicate directly with servers of block server layer 106. Upon retrieval of the requested data, the backup server 118 stores the data in storage 120. The data may be stored in storage 120 according to any of the methods discussed herein. Backup server 118 may create and maintain statistics and snapshot data corresponding to a particular backup operation. The snapshot data may be used later during a data restoration operation, or during a future backup operation. Backup server 118 can also store a copy of the metadata used during a particular backup operation. In another embodiment, the metadata is not stored on the backup server 118. Rather, the metadata is stored on another storage device, for example, one or more metadata servers, one or more block servers, or one or more devices remote from the backup system. As a result of the initial backup operation, a complete backup of the data of a client volume is created and stored in storage 120.
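
As a non-limiting illustration of the LBA calculation described above, the short Python sketch below derives an address from the position of a block identifier in the ordered list. The 4 KiB block size and the 512-byte addressing unit are assumptions of the example only; the disclosure does not prescribe particular sizes.

    BLOCK_SIZE = 4096   # illustrative data-block size (e.g., a 4 KiB block)
    SECTOR_SIZE = 512   # illustrative LBA addressing unit (an assumption)

    def lba_for_list_position(position: int) -> int:
        # The byte offset of a block is its position in the ordered list of
        # block identifiers multiplied by the block size; dividing by the
        # addressing unit yields a logical block address for the request.
        return (position * BLOCK_SIZE) // SECTOR_SIZE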


During an incremental backup operation, a backup server 118 retrieves the current metadata from metadata server 110 for a client volume. The backup server 118 can then compare the current metadata from metadata server 110 with a version of stored metadata on backup server 118 (e.g., the version of metadata stored during the most recent backup operation, or the initial version of the metadata stored during the initial backup, etc.). In an implementation where the metadata includes an ordered list of block identifiers, the backup server 118 can compare the block identifiers of the two versions of metadata node-by-node. For example, the current list node corresponding to a first block of data is compared to the stored list node corresponding to the first block of data, and each node of the ordered list is traversed and compared. Since the block identifiers are hashes based on the content of a corresponding data block, a difference in hash values for corresponding nodes indicates that the data of the block has been changed/updated since the prior backup. As the block identifiers are integral to storage system 100 and maintained as described herein, the block identifiers can be compared in their native format and immediately used without the need to compute the hash values. In an implementation where the metadata includes a hash tree and the ordered list of block identifiers is structured as the leaves of the hash tree, additional performance gains may be realized. Such a hash tree is generally a tree data structure in which every non-leaf node includes the hash of its children nodes. This structure is particularly useful because it allows efficient determination of which data blocks have been updated since a prior backup, without the need to compare every node of the list of block identifiers. The determination of changed data blocks by using a hash tree will be discussed in further detail below with reference to FIGS. 2A-B. Upon determination of which particular blocks of data have changed since the previous backup, backup server 118 can retrieve the updated blocks of data from storage 116 by sending a request for the changed data block to the metadata server 110. As discussed above, the metadata server 110 can facilitate the transfer of data from the block server layer 106. Upon retrieval of the requested changed data blocks, the backup server 118 stores the data in storage 120. The backup server 118 also stores the current metadata from metadata server 110 used in the incremental backup operation. As a result of the incremental backup operation, only the data of a volume that has changed since a previous backup operation is backed up again. This provides a number of advantages, including increasing the efficiency of the data backup procedure, and decreasing the overall amount of data being transferred during the backup procedure. Further, any number of incremental backup operations may be performed, during which the current metadata from metadata server 110 may be compared to previously stored metadata on backup server 118 (e.g., the stored metadata from a prior backup operation).
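
The node-by-node comparison of two ordered lists of block identifiers can be pictured with the following hypothetical Python sketch, which simply reports the positions whose identifiers differ; the function and variable names are illustrative only and are not defined by this disclosure.

    def changed_positions(stored_ids, current_ids):
        # Walk the stored and current lists in lockstep; because each block
        # identifier is a hash of the block's content, any mismatch at a
        # position means that block changed since the prior backup.
        return [i for i, (old, new) in enumerate(zip(stored_ids, current_ids))
                if old != new]

    # Example: only the block at position 1 has changed.
    # changed_positions(["a1", "b2", "c3"], ["a1", "f7", "c3"])  ->  [1]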


Backup servers 118 may also provide an application programming interface (API) in order to allow clients 108 or traditional data backup software to interface with the backup systems described herein. For example, the API may allow backup servers 118 to send statistics related to backed up data and backup operations to and from clients 108 or traditional backup software. As another example, the API may allow backup servers 118 to receive a request to initiate a backup operation. The API can also allow for backup operations to be scheduled as desired by clients 108 or as controlled by data backup software. Other API functionality is also envisioned.


Referring to FIG. 2A, a hash tree 200a is shown in accordance with an illustrative implementation. The hash tree 200a may be a hash tree that is provided by a metadata server 110 to a backup server 118 in an initial or incremental backup operation as discussed above. Although depicted as a binary hash tree, hash tree 200a (and hash trees described herein) may have any number of child nodes/branches. Hash tree 200a represents the data of a particular volume, and can be provided along with additional metadata describing details related to the tree structure. For example, the metadata may include statistics regarding node counts, leaf-node counts, tree-depth, indexes to sub-trees, etc. Backup server 118 may store the additional metadata for future use. Hash tree 200a includes leaves 202a-d, internal nodes 204a-b, and root node 206. Leaves 202a-d store block identifiers B1-B4, respectively. In an implementation, leaves 202a-d may be structured as an ordered list that is indexed by its parent nodes, which in this example are internal nodes 204. Block identifiers B1-B4 are identifiers as described herein (e.g., a hash of the corresponding data block's content), and each uniquely identifies a particular data block of the volume. Hash tree 200a further includes non-leaf internal nodes 204a-b and non-leaf root node 206. The value stored by each non-leaf node is the hash of the values of that node's children. For example, hash H1 is the hash of block identifiers B1 and B2, hash H2 is the hash of block identifiers B3 and B4, and hash H3 is the hash of hashes H1 and H2. During an initial backup operation, backup server 118 can walk the tree, or traverse the ordered list of leaves 202a-d, to determine that the data blocks corresponding to block identifiers B1-B4 should be retrieved to be backed up. A copy of hash tree 200a (and any accompanying metadata) is stored by backup server 118 when a backup operation is performed.
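
For purposes of illustration only, a binary hash tree of the kind depicted in FIG. 2A could be built from an ordered list of block identifiers along the lines of the following Python sketch. The use of SHA-256 and the handling of an odd trailing node are assumptions of the example, not requirements of the implementations described herein.

    import hashlib

    def parent_hash(left: str, right: str) -> str:
        # Each non-leaf value is a hash over the values of its children,
        # e.g., H1 = hash(B1, B2) and H3 = hash(H1, H2) in FIG. 2A.
        return hashlib.sha256((left + right).encode()).hexdigest()

    def build_levels(leaf_ids):
        # Returns the tree as a list of levels: leaves first, root last.
        levels = [list(leaf_ids)]
        while len(levels[-1]) > 1:
            prev = levels[-1]
            # Pair children left to right; duplicate a trailing odd node.
            pairs = [(prev[i], prev[i + 1] if i + 1 < len(prev) else prev[i])
                     for i in range(0, len(prev), 2)]
            levels.append([parent_hash(a, b) for a, b in pairs])
        return levels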


Referring to FIG. 2B, the hash tree 200a of FIG. 2A is shown at a later time instance, as hash tree 200b. For example, hash tree 200a may have been provided by metadata server 110 during an initial backup operation and stored by the backup server 118, and hash tree 200b may have been provided by metadata server 110 during a subsequent incremental backup operation. Both hash trees 200a-b represent the data stored on a particular volume. As depicted, the block identifier B3 of leaf node 202c has changed to become block identifier B3′ at some time since the previous backup. For example, new or updated data may have been written to the block referenced by block identifier B3. Because of the structure of the hash tree, the change of block identifier from B3 to B3′ causes updates in hashes to propagate upward through the parent node to the root node. Specifically, hash H2 is recalculated to become H2′, and hash H3 is recalculated to become H3′. During a backup operation, backup server 118 may walk the hash tree 200b, and compare the nodes of hash tree 200b to corresponding nodes of hash tree 200a. A difference between corresponding non-leaf node hashes indicates that a block identifier (and therefore block data) below that non-leaf node has changed. If the hashes of corresponding non-leaf nodes are equal, this indicates that the block identifiers below that non-leaf node have not changed (and therefore corresponding block data has also not changed). Thus, the subtree of nodes below an unchanged non-leaf node can be skipped from further processing. In this manner, a performance increase may be realized as the entire hash tree does not need to be traversed in every backup operation. As an example with reference to FIG. 2B, backup server 118 may compare hash tree 200b to hash tree 200a as follows, with an illustrative sketch of such a traversal provided after the numbered steps (although analysis performed by backup server 118 is not limited to the following operations or order of operations):

    • 1. Node 206 is analyzed to determine that hash H3′ is different from its previous value of H3, and therefore hash trees 200a-b need to be further analyzed.
    • 2. Node 204a is analyzed to determine that hash H1 has not changed, and the subtree of node 204a (leaf nodes 202a-b) may be skipped from further analysis.
    • 3. Node 204b is analyzed to determine that hash H2′ is different from its previous value of H2, therefore the subtree of node 204b (leaf nodes 202c-d) must be analyzed.
    • 4. Leaf node 202c is analyzed to determine that block identifier B3′ is different from its previous value of B3. Thus, the data block corresponding to block identifier B3′ needs to be backed up by backup server 118, since its data has changed since the previous backup operation.
    • 5. Leaf node 202d is analyzed to determine that block identifier B4 has not changed, and traversal of hash trees 200a-b is complete.
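
The following non-limiting Python sketch mirrors the numbered comparison above: it descends the two trees from the root, prunes subtrees whose hashes match, and reports the leaf positions whose block identifiers differ. It assumes trees in the levels form produced by the build_levels sketch above and that both trees have the same number of leaves; these are assumptions of the example only.

    def changed_leaf_positions(old_levels, new_levels):
        # Compare two trees produced by build_levels(); only subtrees whose
        # parent hashes differ are visited, so unchanged subtrees (such as
        # the one under node 204a in FIG. 2B) are skipped entirely.
        changed = []

        def walk(level, index):
            if old_levels[level][index] == new_levels[level][index]:
                return                       # identical subtree: prune
            if level == 0:
                changed.append(index)        # a leaf block identifier changed
                return
            for child in (2 * index, 2 * index + 1):
                if child < len(new_levels[level - 1]):
                    walk(level - 1, child)

        walk(len(new_levels) - 1, 0)         # start the comparison at the root
        return changed

Applied to the FIG. 2B example, this traversal would report only the position of leaf 202c, whose identifier changed from B3 to B3′.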


After performing the above sample analysis, backup server 118 may proceed to retrieve the data based on the block identifier(s) that indicate data has changed, and has not yet been backed up. In this example, backup server 118 may send a request to a metadata server 110 for the data block identified by block identifier B3′. Upon receipt of the data block, backup server 118 stores the data block as a backup, and stores hash tree 200b (along with any accompanying metadata) for use in future backup and/or restoration operations.


In one implementation using trees, backup server 118 may retrieve the metadata from a metadata server 110 by requesting only child nodes whose parent node has changed. For example, starting with the root, if the root node has changed the children of the root node can then be requested. These nodes can then be compared to corresponding nodes in the previously stored tree to determine if those have changed. Children of any node that has changed can then be retrieved. This process can be repeated until leaf nodes are retrieved. For example, with reference to FIGS. 2A-B hash tree 200b may be the current metadata from metadata server 110, and hash tree 200a may be stored metadata from a previous backup operation. Backup server 118 may first retrieve root node 206 and analyze it to determine that hash H3′ is different from its previous value of H3. In response, backup server 118 may then request nodes 204a-b from interior node level 204. Node 204a is analyzed to determine that hash H1 has not changed, and leaf nodes 202a-b may be skipped from further requests/analysis. Node 204b is analyzed to determine that hash H2′ is different from its previous value of H2, and thus backup server 118 may proceed to request appropriate nodes of leaf level 202 (leaves 202c-d). Analysis may then continue as described above to determine that block identifier B3′ is different from its previous value of B3 and that the data block corresponding to block identifier B3′ needs to be backed up. This implementation may allow for performance increases by minimizing data that is transmitted between backup server 118 and metadata server 110 during the retrieval of metadata.


At some point, it may be desirable for clients 108 or an administrator of system 100 to increase the volume size assigned to a client 108 by adding more data blocks of storage space. In this situation, with backup server 118 implementations configured to utilize metadata of an ordered list of block identifiers, any newly added block identifiers (corresponding to the new data blocks) may be appended to the end of the ordered list. Thus, during a backup operation, if a backup server 118 receives metadata of an ordered list that has more elements than that of metadata from a prior backup operation, backup server 118 can determine the newly added data blocks that must be backed up based on the additional list elements. The backup operation may proceed as described above with respect to the remaining elements.
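
As a brief, hypothetical illustration of handling a grown volume under the ordered-list form of metadata, the positions of the newly appended identifiers can be derived from the difference in list lengths; the function name is an assumption of the example.

    def newly_added_positions(prior_ids, current_ids):
        # Identifiers appended beyond the length of the prior list correspond
        # to the data blocks added when the volume was grown; all of them are
        # backed up, while the overlapping prefix is compared as before.
        return list(range(len(prior_ids), len(current_ids)))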



FIG. 2C depicts the result of an increased volume size for implementations configured to utilize metadata of a hash tree. Hash tree 200c is based on hash tree 200a (which is included as a subtree and is denoted by a dashed box). Leaves 202e-f have been newly added to the hash tree and include block identifiers B5-B6, which correspond to the newly added data blocks of the increased volume size. As a result of the volume increase, hash tree 200a is restructured such that root node 206 becomes internal node 206a, and a new root node 208 is created. Further, internal nodes 206b and 204c are added to maintain the tree structure. Hashes H4-H6 are calculated based on the respective child values as described above. After such a restructuring of a hash tree, a backup operation may proceed as described above. However, backup server 118 can determine the newly added data blocks that must be backed up based on a new root node or additional leaves. Also, an implementation may make use of additional metadata that includes the indexes of the root nodes of previously stored trees. In this manner, backup server 118 may access the indexes to locate and compare the root node of a prior tree with the corresponding internal node of the current tree (e.g., root node 206 can be compared to internal node 206a). If the comparison indicates that the hashes have not changed, then backup server 118 may skip analyzing the subtree of the internal node, and a performance gain may be realized.


At some point, it may be desirable for clients 108 or an administrator of system 100 to reduce the volume size assigned to a client 108 by removing data blocks of storage space. In this situation, with backup server 118 implementations configured to utilize metadata of an ordered list of block identifiers, any removed block identifiers (corresponding to removed data blocks) may be removed from the end of the ordered list. Thus, during a backup operation, if a backup server 118 receives metadata of an ordered list that has fewer elements than that of metadata from a prior backup operation, backup server 118 can determine the backed up data blocks that may be removed based on the additional list elements in the stored list from the prior backup. The backup operation may proceed as described above with respect to the remaining elements. With backup server 118 implementations configured to utilize metadata of a hash tree including leaves that are a list of block identifiers, the backup server 118 may compare the trees (e.g., depth of the trees, leaf node count, etc.) to determine that there has been a change in volume size. In another implementation, the size of the volume can be part of the metadata received by the backup servers, and this metadata can be compared to a previously received volume size to determine that a change in volume has occurred. The backup server may then determine the position of the current tree within the stored hash tree. After locating the position of the current root node, the leaf nodes (and corresponding parent nodes) that are not within the subtree of the current root node can be ignored. Once the corresponding root nodes have been determined, the backup operation may then proceed as described above with respect to the remaining nodes.



FIG. 3 shows a simplified flow diagram of an incremental block level backup procedure 300, in accordance with an embodiment. Additional, fewer, or different operations of the procedure 300 may be performed, depending on the particular embodiment. The procedure 300 can be implemented on a computing device. In one implementation, the procedure 300 is encoded on a computer-readable medium that contains instructions that, when executed by a computing device, cause the computing device to perform operations of the procedure 300. According to different embodiments, at least a portion of the various types of functions, operations, actions, and/or other features provided by the incremental block level backup procedure may be implemented at one or more nodes and/or volumes of the storage system.

In an operation 302, metadata for a particular volume is retrieved (e.g., from a metadata server). For example, a backup server may initiate a backup operation and retrieve initial metadata as described above. In an alternative embodiment, the backup server may be responding to a request to initiate a backup operation. For example, a client or backup software may submit a request via an API to perform a backup at a certain time. Alternatively, the backup server may be performing a backup according to a schedule (e.g., nightly backups, weekly backups, client-specified backups, etc.).

In an operation 304, the initial backup of the data blocks of the volume is created. The metadata provides the block identifiers corresponding to the volume. The metadata may include an ordered list of block identifiers, a hash tree based on block identifiers, and other related data. The block identifiers are used to retrieve the corresponding data blocks to be backed up. For example, the backup server may analyze the metadata in order to request the transmission of and retrieve particular data blocks to be backed up. The request may be sent to the metadata server, which can facilitate the transmission of data from a block server. In an alternative embodiment, the backup server may retrieve the data blocks directly from the block server. The initial backup is a backup of all of the data of the volume as specified by the metadata. In an operation 306, the metadata used for the initial backup is stored for future use.

In an operation 308, an incremental backup of the volume is initiated by retrieving the current metadata. For example, sometime after the creation of the initial backup, the backup server may retrieve updated metadata, which has been maintained by the metadata server to be current with the data blocks of the volume. As another example, metadata may be retrieved from a remote storage device. In an operation 310, the current metadata is compared to other metadata (e.g., the metadata from the immediately preceding backup operation, the metadata from the initial backup operation, the metadata from a remote device, etc.). For example, the backup server may analyze the metadata to determine changes in block identifiers as discussed above. Based on any changed block identifiers found during the analysis, in an operation 312, an incremental backup is created. For example, based on the identifiers of the changed data blocks, the backup server may retrieve only the changed data blocks to be backed up. The backup server may store received data blocks as described herein. In an operation 314, the metadata used for the incremental backup is stored for future use. The backup server may also generate additional metadata related to the backup procedure, including statistics related to the amount of data backed up, the elapsed time of the backup process, etc. This process may repeat any number of times to create any number of incremental backups, as indicated by operation 316.
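
For illustration only, the operations of procedure 300 might be strung together along the lines of the Python sketch below. The metadata_server and backup_store objects and their methods are hypothetical placeholders, and changed_positions refers to the earlier list-comparison sketch; none of these names are defined by this disclosure.

    def run_backup_cycle(metadata_server, backup_store, prior_block_ids=None):
        # Operations 302-314 in outline: fetch metadata, decide which blocks
        # to copy, retrieve and store them, then keep the metadata for later.
        current_ids = metadata_server.get_block_ids()             # 302 / 308
        if prior_block_ids is None:
            needed = range(len(current_ids))                      # 304: full backup
        else:
            needed = changed_positions(prior_block_ids, current_ids)   # 310
        for position in needed:                                   # 304 / 312
            data = metadata_server.get_block(current_ids[position])
            backup_store.put(current_ids[position], data)
        backup_store.save_metadata(current_ids)                   # 306 / 314
        return current_ids   # becomes prior_block_ids on the next cycle (316)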


In another embodiment, the retrieval of the metadata and the comparison of the metadata to other metadata is performed by a device other than the backup server (e.g., by one or more devices of the storage system). For example, a storage device remote from the backup server may access metadata on the storage device, or may retrieve the metadata from another device, for example, from the metadata server. The storage device may analyze the metadata to determine changes in block identifiers as discussed above. Based on any changed block identifiers found during the analysis, an incremental backup can be created by transferring data to the backup server. For example, based on the identifiers of the changed data blocks, the storage device may transfer only the changed data blocks to the backup server to be backed up. The backup server may store received data blocks as described herein. The metadata used for the incremental backup can be stored by the storage device or can be transferred to another device (e.g., the metadata server) to be stored for future use.


Data Syncing in a Distributed System


In various embodiments, data can be synced/replicated to another location. For example, data from a source system can be copied to a replica server. Data can be replicated locally, to another volume in its cluster, to another cluster, to a remote storage device, etc. Data that can be replicated includes, but is not limited to, block server data, metadata server data, etc. Replicated data is a representation of the data on the source system at a particular point in time. To reduce impact on the source system during replication, the replication process does not stop incoming I/O operations. To allow I/O operations to continue during a replication, writes that occur during the replication must be properly handled to avoid mismatches in data between the live data and the corresponding replicated data.



FIG. 4 depicts a distributed storage system 400 in accordance with an illustrative implementation. The storage system 400 stores live client data and may be configured as discussed above regarding system 100 (e.g., including client layer 102, metadata layer 104, block server layer 106, and storage). The storage system 400 can also include one or more replica servers 418a-418n. Replica servers 418a-418n can interface with the metadata and/or block servers of the storage system 400 in order to maintain synchronized (replicated) copies of data stored by the storage system 400. Replica servers 418a-418n are coupled to storage 420, which may store backups of volume data (e.g., backups of block level data of a client volume), synchronized data of client volume, snapshots of a client volume, and associated metadata. Storage 420 may include multiple hard disk drives (HDDs), solid state drives (SSDs), hybrid drives, or other storage drives. In one implementation, storage 420 can be a cluster of individual drives coupled together via a network. Replica servers 418 can store backup copies of the data blocks of storage system 400 according to any number of formats in storage 420, and translation from the format of the data blocks may occur.


In one embodiment, a replica server 418 maintains a live synchronized copy of data blocks of a client volume (e.g., a mirror copy of the client volume). To maintain synchronization, requests to write data that are provided by a client to storage system 400 may also be transmitted to the replica server 418. In this manner, data written to storage system 400 can be synchronized and stored on replica server 418 in real-time or semi real-time. Synchronization of volume data on replica server 418 includes synchronizing the metadata of storage system 400 that identifies blocks in a client volume. As discussed above, metadata servers of the storage system store metadata that includes a list of block identifiers that identifies blocks in a volume. The block identifiers may be hexadecimal numbers, and other representations may be used. Additional metadata may also be included (e.g., inode numbers, directory pointers, modification dates, file size, client addresses, list details, etc.). The block identifiers uniquely identify the data of a block and are a hash based on the content of the data block. In an embodiment, the metadata includes an ordered list structure of block identifiers. In another embodiment, the ordered list is structured as the leaves of a hash tree (e.g., a Merkle tree, etc.) and the metadata includes the hash tree. In an implementation utilizing a tree, when a write request is received and data is written to a block of a volume, values of the leaves (and inner nodes) of the tree change to correspond to the changes of the block. Thus, replica server 418 can maintain a live synchronization tree that is updated to parallel the tree maintained by a metadata server of storage system 400 for a particular client volume.



FIG. 5 shows a flow diagram for replicating data in accordance with an illustrative implementation. Replication begins with a replica server receiving a start replication message from a source system (502). Upon receipt of the start replication message, the replica server initializes a data structure that will be used to track writes that occur during the replication process (504). In one embodiment, the data structure is a bit field where each bit represents a single unit of information, e.g., a block, a sub-block, etc. Each bit in the bit field represents whether a particular unit has been written to after the start of the replication process. In this embodiment, the bit field will be initialized to 0. At some point after sending the start replication message, the source system sends over replication data to the replica server. Similar to the block level backup embodiments, Merkle trees can be used to minimize the amount of data that is required to be transferred between the source system and the replica server.
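
One possible realization of such a bit field, offered purely as an illustrative sketch, is the following Python class; the class name and byte-array layout are assumptions of the example rather than features required by this disclosure.

    class WriteTracker:
        # One bit per unit (a block or a sub-block); every bit starts at 0
        # when the start replication message is received (step 504).
        def __init__(self, unit_count: int):
            self.bits = bytearray((unit_count + 7) // 8)

        def mark_written(self, unit: int) -> None:
            # Set the unit's bit to 1 when a forwarded write lands (step 554).
            self.bits[unit // 8] |= 1 << (unit % 8)

        def was_written(self, unit: int) -> bool:
            # Check whether the unit was written after replication started.
            return bool(self.bits[unit // 8] & (1 << (unit % 8)))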


While the replication data is being sent to the replica server, data writes can be received at the source system. For example, a user may be writing new data to a file or metadata related to a user volume could be updated. The source system will handle the writes and while the replication process is active will also send the writes to the replica server. For example, the replica server can receive an I/O request to write a block of data (550). Upon receipt, the replica server can write the block of data (552) and will also update the bit associated with the block in the bit field to 1 (554). After the bit is set, the data write on the replica server is complete.


As part of the replication process, the replica server determines which blocks of data are needed from the source system (506). For example, a Merkle tree comparison as described above can be used to determine blocks of data that have changed since a previous point-in-time image. One or more of the changed blocks of data, however, may have been changed again since the start of the replication process. In that case, the data will have already been sent to the replica server and requesting this data again is unnecessary. Before requesting a block of data from the source system, the bit field can be checked to determine if the block has already been received (508). If the block has not been updated, then the block of data is requested from the source system (510). The block is received (512) and written to storage. If the block has been updated, then no request for that block of data needs to be sent to the source system. This continues until there are no longer any data blocks that are needed from the source system. Once no further data blocks are needed, the volume has been replicated. The replication system can send a message to the source system indicating that replication is complete. Upon receipt, the source system can stop forwarding I/O to the replication system.
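
The decision at steps 506-512 can be illustrated, again in hypothetical form, by the sketch below, which reuses the changed_positions and WriteTracker sketches from earlier; the source and replica_volume objects and their methods are placeholders assumed for the example.

    def pull_changed_blocks(source, replica_volume, tracker,
                            prior_ids, current_ids):
        # For each block the metadata comparison reports as changed (506),
        # skip it if a forwarded write already delivered newer data (508);
        # otherwise request it from the source (510) and write it (512).
        for position in changed_positions(prior_ids, current_ids):
            if tracker.was_written(position):
                continue                     # already current on the replica
            data = source.read_block(current_ids[position])
            replica_volume.write_block(position, data)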


In one embodiment, a block is the smallest amount of data that is written to storage in a single write operation. A block, however, can be divided into smaller sub-blocks, such that each unit of a block can be written to separately. As an example, a block can be 4 kilobytes in size and broken down into sixteen 256 byte sub-blocks. In this embodiment, the data structure corresponds to the sub-blocks and not the blocks. While replication is being done, a write to a sub-block can be received. The write command can include the data for the entire block or just the sub-block of data. The write can update a cache that is associated with the sub-block or could write the sub-block to storage. When only a sub-block is received in the write request, the block that contains the sub-block is retrieved and the sub-block is updated appropriately. Later during replication, the Merkle tree comparison can be used to determine that the block with the updated sub-block needs to be retrieved from the source system. For example, another sub-block may have been updated since the previous replication. The entire block can be retrieved. The corresponding block on the replica server is retrieved and updated. To update the corresponding block on the replica server, the data structure is used to update each sub-block from the block retrieved from the source system. For sub-blocks where the data structure indicates that the sub-block has been updated during the replication process, the sub-block is not updated since it already has the latest data. If the data structure indicates that a sub-block has not been updated, that sub-block is updated with the corresponding sub-block received from the source system. To reduce unnecessary data transfers, before the replica server requests a block, the replica server can determine if all the sub-blocks of a block have been updated during the replication process. In this case, the replica server has already replicated this block and there is no need to request that block of data from the source system.
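
Following the 4-kilobyte-block, sixteen-sub-block example above, the merge of a retrieved block with sub-blocks already written on the replica could look like the following sketch. The constants and the tracker argument (the WriteTracker sketch above, tracking sub-block units) are assumptions of the illustration.

    SUB_BLOCKS_PER_BLOCK = 16
    SUB_BLOCK_SIZE = 256     # sixteen 256 byte sub-blocks per 4 kilobyte block

    def merge_block(block_index, source_block, replica_block, tracker):
        # Rebuild the replica's copy of the block one sub-block at a time:
        # sub-blocks whose bits are set were written during replication and
        # are kept; the rest are taken from the block retrieved from the
        # source system.
        merged = bytearray(replica_block)
        for sub in range(SUB_BLOCKS_PER_BLOCK):
            unit = block_index * SUB_BLOCKS_PER_BLOCK + sub
            if not tracker.was_written(unit):
                start = sub * SUB_BLOCK_SIZE
                merged[start:start + SUB_BLOCK_SIZE] = \
                    source_block[start:start + SUB_BLOCK_SIZE]
        return bytes(merged)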


As described above, replica servers 418a-418n can be configured to create point-in-time images of components of the data of storage system 400. In one embodiment, each point-in-time image includes corresponding metadata (e.g., a hash tree) that identifies the blocks of the point-in-time image. The hash tree of a point-in-time image is based on the block identifiers of the data stored for the point-in-time image. A replica server 418 may create one or more point-in-time images of a component of the data of storage system 400, and each point-in-time image may be created according to a defined schedule, or on demand (e.g., in response to a client demand, or as demanded by an administrator of storage system 400, etc.). The source system may also create various copies/replicas of a volume locally. For example, every day a replica of a volume can be scheduled. A remote replication system may only replicate a subset of the replicas that are local to the source system. For example, a remote replication system can request a single local copy every week rather than each of the daily local replicas. In another embodiment, the remote replication system can make a replica of the current live volume and ignore any other local replicas of the volume.


In the instance that a replica server 418 goes offline (e.g., due to a failure, being manually taken offline, or otherwise), the replica server 418 may be brought back online and resume synchronizing volume data with storage system 400. However, due to the period of time that the replica server 418 was offline, the data of replica server 418 may be out of sync with the volume data of storage system 400. Accordingly, replica server 418 may retrieve the data that is needed from storage system 400 to re-synchronize with the live volume data of storage system 400. In one embodiment, replica server 418 may implement one or more techniques of the block level incremental backup process to synchronize the volume data. For example, replica server 418 can retrieve the metadata for a live volume (e.g., a tree corresponding to the live volume as maintained by a metadata server). Replica server 418 may then analyze versions of metadata (e.g., comparing the out-of-date synchronization tree of replica server 418 and the retrieved live volume tree). Based on this analysis, replica server 418 can determine changed data blocks of the volume and which blocks need to be retrieved from storage system 400 to synchronize the volume data. The replica server 418 may request any changed data blocks from storage system 400 and the retrieved blocks may be stored. As replica server 418 is synchronizing its volume data, write requests may still be received and the point-in-time image can still be created. In the instance that a new point-in-time image is being created and the volume data of replica server 418 is not fully synchronized with the live volume data of storage system 400, a data block may not yet be available in the data of replica server 418 to be stored in the new point-in-time image. For example, referring to the new point-in-time image creation process discussed above, the comparison of the metadata of the new tree with the metadata of the live tree may indicate that a block identifier (and therefore block data) has changed. However, the changed block may not yet be synchronized in the volume data of replica server 418. In this scenario, replica server 418 may retrieve the changed block data directly from the storage system 400 (as opposed to pointing to or retrieving the changed block data from the synchronized volume data of replica server 418 as discussed above).


After replication of a volume has completed, the replication can be verified. In one embodiment, this is done by the source system sending to the replica system one or more Merkle tree nodes. The replica system can then compare the received Merkle tree nodes with the corresponding Merkle tree nodes of the replicated copy of the source volume. If any corresponding nodes do not match, the data was not properly replicated between the source system and the replica system. In this embodiment, the Merkle tree on the replica side is updated as blocks of data are written to cached data structures and/or storage. Accordingly, the Merkle tree is being updated on the replica system in a similar way as the Merkle tree was updated on the source side. In one embodiment, the top level node of the Merkle tree is compared. In other embodiments, the top two, three, etc., layers of the Merkle tree are compared. For this comparison to work properly, the source side and the replica side must be in sync in regard to any data that is to be written. For example, if data is written on the source side, the replica side must also handle that write prior to the verification step. In one embodiment, this is accomplished through messaging between the source and replica systems. Once the replication is complete, the replica server can send a message requesting verification data. The source system can pause handling write requests until the verification data, e.g., the Merkle tree nodes, are sent to the replica side. The replica side receiving the verification data handles any queued write requests prior to comparing the received verification data with local data. Once verification is done, the replica system can send a message and the I/O can continue. In another embodiment, the replica side can queue any received I/O requests from the source side. This allows the source side to begin handling I/O as soon as the verification data has been sent to the replica system. Once the verification is done, the replica system can handle any queued I/O requests. Verification can be done at any point during the replication process. The only requirement is that the source and replica side be in sync in regard to handling write requests. For example, after a certain number of blocks have been replicated or after a predetermined amount of time has passed, the replica server can request verification data from the source system.
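
The verification step can be pictured with the following minimal sketch, which compares the top level (or top few levels) of the two Merkle trees in the levels form used by the earlier sketches. It presumes that both sides have quiesced or queued writes as described above; the function name and depth parameter are illustrative assumptions.

    def verify_replica(source_levels, replica_levels, depth: int = 1):
        # Compare the top 'depth' levels of the trees; index -1 is the root
        # level. Any mismatch means some block was not replicated correctly.
        for d in range(1, depth + 1):
            if source_levels[-d] != replica_levels[-d]:
                return False
        return True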


Replicating data between different systems can impact the performance of both systems. Quality of service can be implemented on both the source system and the replica system to ensure adequate service is provided based upon quality of service provisions. Embodiments of quality of service provisions that can be used in replication are described in U.S. application Ser. No. 13/856,958, which is incorporated by reference in its entirety. The quality of service allocated for I/O for a particular volume can be different on the source system compared to the replica system. For example, the replica system may have allocated 1,000 input/output operations per second (IOPS), while the source system has allocated 5,000 IOPS for a particular volume. In this situation, the source system could overload the replica system's ability to handle the IOPS associated with replicating the volume from the source system to the replica system. Once the IOPS threshold has been reached on the replica system, the handling of I/O can be paused. A timer can be used to monitor how long I/O has been paused. If the timer exceeds some threshold, the replication of the source volume can be stopped and reported.


To reduce the likelihood that replications are stopped, volumes that are to be replicated can be sorted based upon quality of service (QoS) parameters associated with the volumes. In one embodiment, sorting is done on the sum of the QoS parameters from the source system and the replica system. This sum can represent the relative importance of a volume, with volumes having higher QoS parameters being more important than volumes with lower QoS parameters. In another embodiment, the ratio of the replica QoS parameter to the source QoS parameter is used to sort the volumes. A higher ratio indicates that the replication of that volume is more likely to finish successfully. Volumes whose ratios fall below a threshold amount can be flagged as volumes whose replication may not finish successfully due to QoS provisions. For example, if the ratio is less than one, the source side's QoS provisions could force the replica side to throttle I/O to the point that the replica side terminates the replication as described above. In another embodiment, the volumes can be sorted based upon the replica system's QoS parameter only. This allows a volume to be given high replication priority by increasing the QoS provisions of the volume on the replica server, without having to modify the source side's QoS provisions. Accordingly, a replication of a volume can be assured to complete successfully based upon a high QoS parameter on the replica side. In another embodiment, the volumes can be sorted based upon the source system's QoS parameter only. Once the volumes have been sorted, replication can begin in an ordered fashion based upon the sorting. Warnings can be generated for any volume that falls below some threshold, e.g., its ratio or its sum is below a threshold. The warnings can provide information regarding the replication and the QoS parameters, such that the QoS parameters can be modified to avoid future warnings.
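The sorting strategies described above can be illustrated with a short Python sketch that orders volumes by the sum of their QoS parameters, by the ratio of the replica-side parameter to the source-side parameter, or by the replica-side parameter alone, and flags volumes whose ratio falls below a threshold. The data structure and function names are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass

@dataclass
class VolumeQos:
    name: str
    source_iops: int     # QoS provision on the source system (assumed units)
    replica_iops: int    # QoS provision on the replica system (assumed units)

def order_for_replication(volumes, key="ratio", warn_threshold=1.0):
    """Order volumes for replication and flag those likely to be throttled.

    key: "sum"     -> source_iops + replica_iops (relative importance)
         "ratio"   -> replica_iops / source_iops (likelihood of finishing)
         "replica" -> replica_iops only
    """
    def score(v):
        if key == "sum":
            return v.source_iops + v.replica_iops
        if key == "ratio":
            return v.replica_iops / v.source_iops
        return v.replica_iops

    ordered = sorted(volumes, key=score, reverse=True)   # highest priority replicates first
    warnings = [v.name for v in ordered
                if key == "ratio" and score(v) < warn_threshold]
    return ordered, warnings

vols = [VolumeQos("vol-a", 5000, 1000), VolumeQos("vol-b", 1000, 3000)]
ordered, warnings = order_for_replication(vols, key="ratio")
# vol-b replicates first; vol-a is flagged because 1000/5000 is below the threshold
```

A comparable warning test on the sum could be added for the "sum" ordering; the example shows only the ratio-based flag described above.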


One or more flow diagrams have been used herein. The use of flow diagrams is not meant to be limiting with respect to the order of operations performed. The herein-described subject matter sometimes illustrates different components contained within, or connected with, different other components. It is to be understood that such depicted architectures are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable” to each other to achieve the desired functionality. Specific examples of operably couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.


With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.


It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to inventions containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should typically be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should typically be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, typically means at least two recitations, or two or more recitations). Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”


The foregoing description of illustrative implementations has been presented for purposes of illustration and of description. It is not intended to be exhaustive or limiting with respect to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the disclosed implementations. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.

Claims
  • 1. A system comprising: a processor; and a non-transitory computer-readable medium having instructions stored thereon that when executed by the processor cause the system to: maintain a data structure indicative of a subset of data blocks that have been written to a replicated volume of the system during synchronization of the data blocks from a source volume of the system to the replicated volume; compare first metadata with second metadata to determine changes in content of the data blocks of the source volume between a first point-in-time and a second point-in-time, wherein the first metadata is a first hash tree having block identifiers indicating the content of the data blocks of the source volume at the first point-in-time and the second metadata is a second hash tree having the block identifiers indicating the content of the data blocks of the source volume at the second point-in-time, wherein the first point-in-time is prior to the second point-in-time; and for each data block of the source volume determined to have changed based upon comparison of the first and second hash trees, update the replicated volume with the data block when the data structure indicates the data block has not been written to the replicated volume during the synchronization.
  • 2. The system of claim 1, wherein each of the block identifiers comprises a hash of the content of a corresponding data block that uniquely identifies the corresponding data block of the volume.
  • 3. The system of claim 1, wherein comparison of the first metadata and the second metadata comprises: analyzing whether a block identifier of the first hash tree is different from the block identifier of the second hash tree; and responsive to a result of the analyzing being affirmative, updating the replicated volume with the data block corresponding to the block identifier of the first hash tree.
  • 4. The system of claim 1, wherein each of the first hash tree and the second hash tree further comprises: a plurality of leaf nodes (leaves) configured to store the block identifiers; and a non-leaf node coupled to two or more of the leaves, the non-leaf node storing a hash value of the block identifiers of the two or more leaves.
  • 5. The system of claim 4, wherein comparison of the first metadata and the second metadata comprises: analyzing whether the hash value of the non-leaf node of the first hash tree is different from the hash value of the non-leaf node of the second hash tree; responsive to a result of the analyzing being affirmative, evaluating whether a block identifier of a leaf node of the two or more leaves of the first hash tree is different from the block identifier of the corresponding leaf node of the two or more leaves of the second hash tree; and responsive to a result of the evaluating being affirmative, updating the replicated volume with the data block corresponding to the block identifier of the leaf node of the two or more leaves of the first hash tree.
  • 6. The system of claim 1, wherein each of the first hash tree and the second hash tree further comprises: a plurality of leaf nodes (leaves) configured to store the block identifiers; two or more internal nodes, each internal node coupled to two or more of the leaves and configured to store a first hash value of the block identifiers of the two or more leaves; and a root node coupled to the two or more internal nodes and configured to store a second hash value of the first hash values of the two or more internal nodes.
  • 7. The system of claim 6, wherein comparison of the first metadata and the second metadata comprises: analyzing whether the second hash value of the root node of the first hash tree is different from the second hash value of the root node of the second hash tree; responsive to a result of the analyzing being affirmative, evaluating whether the first hash value of each non-leaf node of the first hash tree is different from the first hash value of each non-leaf node of the second hash tree; responsive to a result of the evaluating being affirmative, determining whether a block identifier of a leaf coupled to the non-leaf node of the first hash tree is different from the block identifier of the corresponding leaf coupled to the corresponding non-leaf node of the second hash tree; and responsive to a result of the determining being affirmative, updating the replicated volume with the data block corresponding to the block identifier of the first hash tree.
  • 8. The system of claim 1, wherein the instructions further cause the system to: initiate the data structure to track the data blocks written to the replicated volume; and update the data structure to indicate the data blocks written to the replicated volume.
  • 9. The system of claim 1, wherein the data blocks of the replicated volume are randomly and evenly distributed across a cluster containing the replicated volume.
  • 10. A method comprising: maintaining a data structure indicative of a subset of data blocks that have been written to a replicated volume of a storage system during synchronization of the data blocks from a source volume of the storage system to the replicated volume; comparing first metadata with second metadata to determine changes in content of the data blocks of the source volume between a first point-in-time and a second point-in-time, wherein the first metadata is a first hash tree having block identifiers indicating the content of the data blocks of the source volume at the first point-in-time and the second metadata is a second hash tree having the block identifiers indicating the content of the data blocks of the source volume at the second point-in-time, wherein the first point-in-time is prior to the second point-in-time; and for each data block of the source volume determined to have changed based upon comparison of the first and second hash trees, updating the replicated volume with the data block when the data structure indicates the data block has not been written to the replicated volume during the synchronization.
  • 11. The method of claim 10, wherein each of the block identifiers comprises a hash of the content of a corresponding data block that uniquely identifies the corresponding data block of the volume.
  • 12. The method of claim 10, wherein said comparing further comprises: analyzing whether a block identifier of the first hash tree is different from the block identifier of the second hash tree; and responsive to a result of the analyzing being affirmative, updating the replicated volume with the data block corresponding to the block identifier of the first hash tree.
  • 13. The method of claim 10, wherein each of the first hash tree and the second hash tree further comprises: a plurality of leaf nodes (leaves) configured to store the block identifiers; and a non-leaf node coupled to two or more of the leaves, the non-leaf node storing a hash value of the block identifiers of the two or more leaves.
  • 14. The method of claim 13, wherein said comparing further comprises: analyzing whether the hash value of the non-leaf node of the first hash tree is different from the hash value of the non-leaf node of the second hash tree; responsive to a result of the analyzing being affirmative, evaluating whether a block identifier of a leaf node of the two or more leaves of the first hash tree is different from the block identifier of the corresponding leaf node of the two or more leaves of the second hash tree; and responsive to a result of the evaluating being affirmative, updating the replicated volume with the data block corresponding to the block identifier of the first hash tree.
  • 15. The method of claim 10, wherein each of the first hash tree and the second hash tree further comprises: a plurality of leaf nodes (leaves) configured to store the block identifiers; two or more internal nodes, each internal node coupled to two or more of the leaves and configured to store a first hash value of the block identifiers of the two or more leaves; and a root node coupled to the two or more internal nodes and configured to store a second hash value of the first hash values of the two or more internal nodes.
  • 16. The method of claim 15, wherein said comparing further comprises: analyzing whether the second hash value of the root node of the first hash tree is different from the second hash value of the root node of the second hash tree; responsive to a result of the analyzing being affirmative, evaluating whether the first hash value of each non-leaf node of the first hash tree is different from the first hash value of each non-leaf node of the second hash tree; responsive to a result of the evaluating being affirmative, determining whether a block identifier of a leaf coupled to the non-leaf node of the first hash tree is different from the block identifier of the corresponding leaf coupled to the corresponding non-leaf node of the second hash tree; and responsive to a result of the determining being affirmative, updating the replicated volume with the data block corresponding to the block identifier of the first hash tree.
  • 17. The method of claim 10, further comprising: initiating the data structure to track the data blocks written to the replicated volume; and updating the data structure to indicate the data block written to the replicated volume.
  • 18. The method of claim 10, wherein the data blocks of the replicated volume are randomly and evenly distributed across a cluster containing the replicated volume.
  • 19. A non-transitory computer-readable medium embodying a set of instructions, which when executed by a processor of a storage system, causes the storage system to: maintain a data structure indicative of a subset of data blocks that have been written to a replicated volume of the storage system during synchronization of the data blocks from a source volume of the storage system to the replicated volume; compare first metadata with second metadata to determine changes in content of the data blocks of the source volume between a first point-in-time and a second point-in-time, wherein the first metadata is a first hash tree having block identifiers indicating the content of the data blocks of the source volume at the first point-in-time and the second metadata is a second hash tree having the block identifiers indicating the content of the data blocks of the source volume at the second point-in-time, wherein the first point-in-time is prior to the second point-in-time; and for each data block of the source volume determined to have changed based upon comparison of the first and second hash trees, update the replicated volume with the data block when the data structure indicates the data block has not been written to the replicated volume during the synchronization.
  • 20. The non-transitory computer-readable medium of claim 19, wherein each of the block identifiers comprises a hash of the content of a corresponding data block that uniquely identifies the corresponding data block of the volume.
  • 21. The system of claim 1, wherein the first hash tree and the second hash tree comprise Merkle trees.
  • 22. The system of claim 1, wherein the data structure comprises a bit field.
CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 14/684,929, filed Apr. 13, 2015, now U.S. Pat. No. 10,628,443, which is a continuation of U.S. patent application Ser. No. 14/186,847, filed Feb. 21, 2014, now abandoned, the entire contents of which are incorporated herein by reference in their entirety.

Related Publications (1)
US 2020/0250201 A1, published Aug. 2020 (US)

Continuations (2)
Parent: U.S. application Ser. No. 14/684,929, filed Apr. 2015 (US); Child: U.S. application Ser. No. 16/853,660
Parent: U.S. application Ser. No. 14/186,847, filed Feb. 2014 (US); Child: U.S. application Ser. No. 14/684,929