Computer data is vital to today's organizations, and a significant part of their protection against disasters is focused on data protection. As the cost of solid-state memory has decreased, organizations may be able to afford systems that store and process terabytes of data.
Conventional data protection systems may include tape backup drives for storing organizational production site data on a periodic basis. Another conventional data protection system uses data replication, by generating a copy of production site data of an organization on a secondary backup storage system, and updating the backup with changes. The backup storage system may be situated in the same physical location as the production storage system, or in a physically remote location. Data replication systems generally operate either at the application level, at the file system level, or at the data block level.
Most modern storage arrays provide snapshot capabilities. These snapshots allow a user to save or freeze an image of a volume or set of volumes at some point in time and to restore this image when needed.
In one aspect, a method includes generating a protection file system in a deduplication storage array, generating a snapshot of a production volume in the deduplication storage array including hashes of data in the snapshot, generating a first file hierarchy for the hashes of the data in the snapshot in the protection file system, and adding a retention indicator to each hash in the first file hierarchy.
In another aspect, an apparatus includes electronic hardware circuitry configured to generate a protection file system in a deduplication storage array, generate a snapshot of a production volume in the deduplication storage array including hashes of data in the snapshot, generate a first file hierarchy for the hashes of the data in the snapshot in the protection file system, and add a retention indicator to each hash in the first file hierarchy.
In a further aspect, an article includes a non-transitory computer-readable medium that stores computer-executable instructions. The instructions cause a machine to generate a protection file system in a deduplication storage array, generate a snapshot of a production volume in the deduplication storage array including hashes of data in the snapshot, generate a first file hierarchy for the hashes of the data in the snapshot in the protection file system, and add a retention indicator to each hash in the first file hierarchy.
Described herein are techniques to save data to a protection storage as a file and, when the data is no longer needed, the data may be moved to a retention storage. In some examples, the data that are saved as files may include hashes of data in a production volume and/or hashes of one or more hashes. In some examples, the retention storage may be less expensive to purchase than the protection storage.
While the description herein describes taking snapshots of a volume, the techniques described herein may be applied to multiple volumes such as, for example, taking a snapshot of a logical unit that includes one or more volumes.
Referring to
The host 102 may include an application 110. The storage array 104 may include a storage processing system (SPS) 120, a production volume 122, snapshots of the production volume (e.g., a snapshot of production volume 132), a hash reference count and retention indicator information 136, and a protection logical unit 142. In some examples, the data generated by using the application 110 may be stored on the production volume 122.
The protection logical unit 142 may include a protection file system 144. The protection file system 144 may save hashes of data describing snapshots of logical units of the storage array 104 such as, for example, snapshot 132 as files. For example, the file system 144 may save a file of the first snapshot 152 as one or more files in a file hierarchy. In another example, the protection file system 144 may save changes to the production volume 122 since the first snapshot 152 was taken in change files 162. In other embodiments, if the snapshot is generated as a set of files as described in
In one example, storage array 104 may save each block of data as a hash. In one particular example, the blocks of data are 8 KB in size. In one particular example, a hash includes a Secure Hash Algorithm 1 (SHA-1) hash. In one example, the storage array 104 may be a deduplicated storage array, so that the data in the storage array may be kept in separate mapping levels. Accordingly, in such examples, in a first level, each volume may include a set of pointers from each address to the hash value of the data at that address (e.g., in an address-to-hash (A2H) mapping 192), which may be kept in a compact format, for example. Further, in such examples, a second level of mapping includes, for example, a map from each hash to the physical location (e.g., in a hash-to-physical (H2P) mapping 194) where the data matching the hash value is stored.
In some examples, A2H mappings 192 and H2P mappings 194 may each be maintained using one or more tables. It will be appreciated that, in certain embodiments, combinations of the A2H and H2P tables may provide multiple levels of indirection between the logical (or “I/O”) address used to access data and the physical address where that data is stored. Among other advantages, this may allow storage system 100 freedom to move data within the storage 104.
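The two levels of indirection described above can be illustrated with a minimal sketch. This is not the patented implementation; the class and variable names are illustrative assumptions, and the "physical location" is simulated by holding the block data directly in the H2P map:

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # 8 KB blocks, as in the example above

class DedupArray:
    def __init__(self):
        self.h2p = {}      # second level: hash -> physical location (here, the data itself)
        self.volumes = {}  # first level: volume name -> A2H map (address -> hash)

    def write_block(self, volume, address, data):
        # First level: the volume maps the logical address to a hash.
        digest = hashlib.sha1(data).hexdigest()
        self.volumes.setdefault(volume, {})[address] = digest
        # Second level: the hash maps to one physical copy of the data,
        # so duplicate data is stored only once.
        self.h2p.setdefault(digest, data)
        return digest

    def read_block(self, volume, address):
        # Reads traverse both levels: address -> hash -> physical data.
        digest = self.volumes[volume][address]
        return self.h2p[digest]

array = DedupArray()
h1 = array.write_block("prod", 0, b"hello")
h2 = array.write_block("prod", 1, b"hello")  # same data yields the same hash
assert h1 == h2 and len(array.h2p) == 1      # one physical copy, two A2H pointers
```

Because data is addressed only through the hash, moving the physical copy requires updating just the H2P entry, which is the freedom of movement noted above.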
In some examples, for each hash there is also a hash reference count which counts the number of references to the data to which the hash points. If the exact same data is used again later on the storage array 104, then the hash is not saved again but another pointer is added to point to the hash. In some examples, the system 100 periodically takes a snapshot of the production volume 122 to form, for example, the snapshot 132.
In one example, the hash reference count and retention indicator information 136 includes reference counts for each hash value, and has a pointer from each hash value to its physical location. In one example, each hash reference count value represents a number of entities (e.g., journals, tables) that rely on and use the hash value. In one particular example, a hash reference count of ‘0’ means no entities in the storage array are using the hash value and the data to which the hash points may be erased. In a preferred embodiment, the hash count is incremented for each entity that uses the hash. In some examples, the system 100 may be configured so that a hash reference counter counts up or counts down for each new entity that depends on the hash value and the hash reference count value may start at any value. For example, the counter may track a number of entities that depend on the hash value as new entities are added to and deleted from the storage array 104.
In one example, the hash reference count and retention indicator information 136 also includes a retention indicator for each hash value. In one example, the retention indicator may indicate that the hash may be part of a retention policy (e.g., designated for retention) and should not be erased from the system 100. In one particular example, the retention indicator is a retention bit, where a retention bit of ‘1’ indicates that the hash should be retained and a retention bit of ‘0’ indicates that it should not. In one example, even though a reference count is zero for a hash, if the retention indicator indicates that the hash should be retained, the hash is moved to the retention storage 172.
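The per-hash bookkeeping described above can be sketched as follows. This is an illustrative model, not the patented structure; the names `HashRecord`, `add_reference`, and `drop_reference` are assumptions introduced for the example:

```python
class HashRecord:
    def __init__(self):
        self.ref_count = 0   # number of entities using this hash value
        self.retain = False  # retention bit: True means keep in retention storage

table = {}  # hash value -> HashRecord (cf. information 136)

def add_reference(hash_value):
    # Each new entity that depends on the hash increments the count.
    rec = table.setdefault(hash_value, HashRecord())
    rec.ref_count += 1

def drop_reference(hash_value):
    rec = table[hash_value]
    rec.ref_count -= 1
    if rec.ref_count == 0:
        # Count of zero: the data may be erased unless retention is indicated.
        return "retain" if rec.retain else "erase"
    return "in_use"

add_reference("abc")
add_reference("abc")
table["abc"].retain = True
assert drop_reference("abc") == "in_use"
assert drop_reference("abc") == "retain"  # zero references, but marked for retention
```

The counter direction and starting value are arbitrary, as the text notes; what matters is distinguishing "still referenced" from "unreferenced but retained" from "unreferenced and erasable".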
In one particular example, the reference count and retention indicator information 136 may be included with the H2P mapping.
The retention storage 172 may include a retention volume 176, which may store the data designated for retention.
In one example, the storage array 104 may include a flash storage array. In other examples, the storage array 104 may include a deduplication device. In other examples, the storage array 104 may be part of a device used for scalable data storage and retrieval using content addressing. In one example, the storage array 104 may include one or more of the features of a device for scalable data storage and retrieval using content addressing described in U.S. Pat. No. 9,104,326, issued Aug. 11, 2015, entitled “SCALABLE BLOCK DATA STORAGE USING CONTENT ADDRESSING,” which is assigned to the same assignee as this patent application and is incorporated herein by reference in its entirety. In other examples, the storage array 104 may include a flash storage array used in EMC® XTREMIO®.
Referring to
Process 200 may generate a snapshot of the volume (202). For example, a snapshot of the production volume 122 may be taken and saved as the first snapshot 132.
Process 200 may read the hashes of the snapshot (210) and may write a file content hash of the snapshot to a protection file system (218). In one example, the file content hash may be a hash of composite hashes, which may be hashes of other composite hashes and so forth to a last level of hashes, which are hashes of the underlying data blocks. In one particular example, a file content hash may be generated similar to a file content hash 302 described in
Process 200 may mark hashes for retention (220). In one example, a volume is designated for retention. In one particular example, a file content hash and each of the hashes that form the file content hash may be marked with a retention indicator indicating that the data from which the hash is derived should be retained, for example, in the retention storage 172 when the hash is no longer used in the storage array 104 (e.g., the reference count is zero). Once a first copy of a snapshot is generated in the protection file system, the process 200 may repeat itself: a new snapshot of the production volume is generated, and then process 200 may read hashes that have changed since the last snapshot (224) and may write a new file content hash of each snapshot to the protection file system (228). In one example, the production volume 122 may change from the first snapshot 132 because writes are added to the production volume after the first snapshot 132 was generated. In one example, not all of the hashes that are used to generate the file content hash will have changed from the first snapshot, but only those hashes affected by the change in data. The hashes that have changed since the last snapshot may be saved as files (e.g., change files 162). In one particular example, a new file content hash may be generated similar to a file content hash 402 described in
Process 200 may mark new hashes for retention (232). In one particular example, the new file content hash and each of the hashes that form the new file content hash that have changed may be marked with a retention indicator indicating that the data from which the hash is derived should be retained, for example, in the retention storage 172 when the hash is no longer used by the A2H mappings 192 in the storage array 104 (e.g., the reference count is zero). In one embodiment, no data is moved while the hash files are generated, as the data is already in the hash table of the array.
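The roll-up of block hashes into composite hashes and a single file content hash (blocks 210-218 above) can be sketched as follows. The fanout of four hashes per composite file and the function names are illustrative assumptions, not taken from the patent:

```python
import hashlib

def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def build_file_content_hash(block_hashes, fanout=4):
    """Hash groups of hashes into composite hashes, level by level,
    until a single file content hash remains (cf. file content hash 302)."""
    level = list(block_hashes)
    while len(level) > 1:
        # Each composite hash covers up to `fanout` hashes from the level below.
        level = [sha1("".join(level[i:i + fanout]).encode())
                 for i in range(0, len(level), fanout)]
    return level[0]

# Eight data blocks -> eight block hashes -> two composites -> one root.
blocks = [b"chunk-%d" % i for i in range(8)]
block_hashes = [sha1(b) for b in blocks]
root = build_file_content_hash(block_hashes)
```

The root is deterministic: writing the same snapshot's hashes again yields the same file content hash, which is what lets unchanged snapshots deduplicate against each other.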
The hash, H11, may be a hash of a composite of hashes that include at least H1, H2, H3 and H4 included in a composite file 310. The hash, H12, may be a hash of a composite of hashes that include at least H5, H6 and H7 included in a composite file 316. The hash, H13, may be a hash of a composite of hashes that include at least H8 included in a composite file 318.
The hash, H1, may be a hash of at least a data block 320. The hash, H2, may be a hash of at least a data block 322. The hash, H3, may be a hash of at least a data block 323. The hash, H4, may be a hash of at least a data block 328. The hash, H5, may be a hash of at least a data block 330. The hash, H6, may be a hash of at least a data block 332. The hash, H7, may be a hash of at least a data block 334. The hash, H8, may be a hash of at least a data block 338.
In this example, the data blocks 320, 322, 323, 328, 330, 332, 334, 338 together contain the following data: “This is test phrase that will be stored on the Server in multiple chunks. Each chunk is stored as separate hashes and put in a hierarchy.”
In the example in
With the new data blocks 412, 416 the corresponding dependent hashes are changed. In particular, the hash, H3, of
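The change propagation described above — a modified data block changes only its own hash, the composite hashes above it, and the file content hash — can be demonstrated with a small sketch (all names and the fanout of four are illustrative assumptions):

```python
import hashlib

def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def hash_levels(blocks, fanout=4):
    """Return every level of the hierarchy: block hashes, composite
    hashes, and finally the single file content hash."""
    levels = [[sha1(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sha1("".join(prev[i:i + fanout]).encode())
                       for i in range(0, len(prev), fanout)])
    return levels

old = [b"block-%d" % i for i in range(8)]
new = list(old)
new[2] = b"CHANGED"  # overwrite one data block, as with blocks 412, 416

old_levels, new_levels = hash_levels(old), hash_levels(new)
changed_leaves = [i for i, (a, b) in enumerate(zip(old_levels[0], new_levels[0]))
                  if a != b]
assert changed_leaves == [2]                   # only one block hash changed
assert old_levels[1][0] != new_levels[1][0]    # its composite hash changed
assert old_levels[1][1] == new_levels[1][1]    # the other composite is reused
assert old_levels[-1][0] != new_levels[-1][0]  # the file content hash changed
```

This is why, as the text notes, only the hashes affected by the changed data need to be written to the change files for a later snapshot.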
Referring to
Process 500 may determine whether the storage array (e.g., storage array 104) is using a hash (502). For example, process 500 determines whether the hash is being used in an A2H mapping 192 and points to some physical data on the storage. For example, the process 500 may determine whether a reference count for the hash is zero or not. In one particular example, for each hash, a reference count may be read from the hash reference count and retention indicator table 136.
If the storage array is no longer using the hash (e.g., a reference count is zero), process 500 may determine if retention is indicated (508). In one example, a retention indicator may be assigned to the hash and the retention indicator may indicate whether the hash should be retained or not in retention storage 172. In one particular example, a retention indicator is assigned to the hash in the hash reference count and retention indicator table 136.
If retention is not indicated, process 500 may delete the hash (512). For example, if the retention indicator indicates that the hash should not be retained, the hash is erased from the system 100 and the data is not saved in the retention storage 172. In one example, once the hash is deleted, the data associated with the hash may also be deleted from the physical storage.
If retention is indicated, process 500 may move data to retention storage (518). For example, the retention indicator indicates that the data should be saved and the data is saved in the retention storage 172. In one example, the data is moved to the retention volume 176.
Process 500 changes pointers to the data (522). For example, pointers to the hash in an H2P mapping may be erased and changed to a pointer from the hash files in the protection file system 144 to the location of the hash data in the retention storage 172. Once the data is moved to the retention storage, it may be erased from the primary storage.
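The decision flow of process 500 can be sketched in a few lines. This is an illustrative model only; the dictionaries standing in for primary storage, retention storage, and table 136 are assumptions for the example:

```python
primary = {}    # hash -> data on primary storage (cf. the H2P mapping)
retention = {}  # hash -> data on the retention volume 176
meta = {}       # hash -> {"refs": int, "retain": bool} (cf. table 136)

def sweep(hash_value):
    if meta[hash_value]["refs"] > 0:
        return "in_use"                  # (502) still referenced by the array
    data = primary.pop(hash_value)       # no longer used: leaves primary storage
    if meta[hash_value]["retain"]:       # (508) is retention indicated?
        retention[hash_value] = data     # (518) move data to retention storage
        return "moved"                   # (522) pointers now target retention
    return "deleted"                     # (512) erase the hash and its data

primary["h1"] = b"keep me"
meta["h1"] = {"refs": 0, "retain": True}
primary["h2"] = b"drop me"
meta["h2"] = {"refs": 0, "retain": False}

assert sweep("h1") == "moved" and "h1" in retention
assert sweep("h2") == "deleted" and "h2" not in retention
```

Note that the data itself moves only at this sweep step; marking hashes for retention during process 200 touches only metadata.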
In some embodiments, a volume may be recovered from the retention storage 172. For example, a new volume is generated in the storage array 104. For every location for which a hash exists, an A2H mapping 192 may be updated to point to the hash, and other locations are read from the retention storage 172 and written to the storage array 104. In some examples, the hash value is copied to the A2H mapping 192 and a copy of the actual data from the retention storage 172 is copied to the new volume.
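The full-recovery flow above can be sketched as follows; the function name and the dictionaries standing in for the protection file system, the array's hashes, and retention storage are hypothetical:

```python
def recover_volume(protection_fs, h2p, retention):
    """Rebuild a volume: offsets whose hash is still on the array only
    need an A2H entry; other offsets are copied back from retention."""
    a2h = {}       # new volume's A2H mapping (offset -> hash)
    restored = {}  # data copied back to the array for hashes no longer present
    for offset, hash_value in protection_fs.items():
        a2h[offset] = hash_value
        if hash_value not in h2p:
            # Hash not on the array: read the data from retention storage.
            restored[hash_value] = retention[hash_value]
    return a2h, restored

protection_fs = {0: "h-a", 8192: "h-b"}   # offset -> hash (file system 144)
h2p = {"h-a": b"still on array"}
retention = {"h-b": b"from retention"}

a2h, restored = recover_volume(protection_fs, h2p, retention)
assert a2h == {0: "h-a", 8192: "h-b"}
assert restored == {"h-b": b"from retention"}
```

Deduplication makes the common case cheap: any block whose hash survives on the array is recovered by a metadata update alone.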
In other embodiments, a volume may be recovered using a virtual mode (e.g., virtual access). In one example, a new volume is generated on the storage array 104 and the new volume is empty. In a preferred embodiment, the empty volume is initially an empty device. Further, in certain embodiments, writes that are received for the new volume are written to the new volume. Likewise, in certain embodiments, a read command to read data from the new volume reads the data from the new volume if the data exists in the new volume. However, in certain embodiments, if the data does not exist in the new volume, the file system 144 is checked using the offset of the read for the location of the hash. Accordingly, in certain embodiments, if the hash is in the storage array 104, the H2P mappings 194 are used to locate the hash. Therefore, in certain embodiments, if the hash is in the retention storage 172, the hash is retrieved from the retention storage 172.
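The virtual-access read path above amounts to a cascade of lookups: the new volume first, then the protection file system for the hash at that offset, then the array's H2P mappings, then retention storage. A minimal sketch, with all names and the dictionary stand-ins assumed for illustration:

```python
def virtual_read(offset, new_volume, protection_fs, h2p, retention):
    if offset in new_volume:           # data already written to the new volume
        return new_volume[offset]
    hash_value = protection_fs[offset] # look up the hash by the read's offset
    if hash_value in h2p:              # hash still on the storage array
        return h2p[hash_value]
    return retention[hash_value]       # otherwise fetch from retention storage

new_volume = {}                        # initially an empty device
protection_fs = {0: "h-a", 8192: "h-b"}  # offset -> hash (file system 144)
h2p = {"h-a": b"on-array data"}
retention = {"h-b": b"retained data"}

assert virtual_read(0, new_volume, protection_fs, h2p, retention) == b"on-array data"
assert virtual_read(8192, new_volume, protection_fs, h2p, retention) == b"retained data"
```

This lets the recovered volume serve I/O immediately, before any data has been copied back, at the cost of extra lookups on first reads.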
Referring to
The processes described herein (e.g., processes 200 and 500) are not limited to use with the hardware and software of
The system may be implemented, at least in part, via a computer program product (e.g., in a non-transitory machine-readable storage medium such as, for example, a non-transitory computer-readable medium) for execution by, or to control the operation of, data processing apparatus (e.g., a programmable processor, a computer, or multiple computers). Each such program may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, the programs may be implemented in assembly or machine language. The language may be a compiled or an interpreted language, and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may be deployed to be executed on one computer or on multiple computers at one site, or distributed across multiple sites and interconnected by a communication network. A computer program may be stored on a non-transitory machine-readable medium that is readable by a general or special purpose programmable computer for configuring and operating the computer when the non-transitory machine-readable medium is read by the computer to perform the processes described herein. For example, the processes described herein may also be implemented as a non-transitory machine-readable storage medium, configured with a computer program, where, upon execution, instructions in the computer program cause the computer to operate in accordance with the processes. A non-transitory machine-readable medium may include but is not limited to a hard drive, compact disc, flash memory, non-volatile memory, volatile memory, magnetic diskette, and so forth, but does not include a transitory signal per se.
The processes described herein are not limited to the specific examples described. For example, the processes 200 and 500 are not limited to the specific processing order of
The processing blocks (for example, in the processes 200 and 500) associated with implementing the system may be performed by one or more programmable processors executing one or more computer programs to perform the functions of the system. All or part of the system may be implemented as special-purpose logic circuitry (e.g., an FPGA (field-programmable gate array) and/or an ASIC (application-specific integrated circuit)). All or part of the system may be implemented using electronic hardware circuitry that includes electronic devices such as, for example, at least one of a processor, a memory, a programmable logic device, or a logic gate.
Elements of different embodiments described herein may be combined to form other embodiments not specifically set forth above. Various elements, which are described in the context of a single embodiment, may also be provided separately or in any suitable subcombination. Other embodiments not specifically described herein are also within the scope of the following claims.