Computing devices generate, use, and store data. The data may be, for example, images, documents, webpages, or meta-data associated with any of the files. The data may be stored locally on a persistent storage of a computing device and/or may be stored remotely on a persistent storage of another computing device.
In one aspect, a data management device in accordance with one or more embodiments of the invention includes a persistent storage and a processor. The persistent storage includes an object storage that stores segments. The processor generates a collision free hash function based on the segments, generates a hash vector using the collision free hash function, deduplicates a portion of the segments associated with to-be-migrated files using the hash vector, and migrates the to-be-migrated files using the deduplicated portion of the segments to a remote storage.
In one aspect, a method of operating a data management device in accordance with one or more embodiments of the invention includes generating, by the data management device, a collision free hash function based on segments stored in an object storage; generating, by the data management device, a hash vector using the collision free hash function; deduplicating, by the data management device, a portion of the segments associated with to-be-migrated files stored in the object storage using the hash vector; and migrating, by the data management device, the to-be-migrated files using the deduplicated portion of the segments to a remote storage.
In one aspect, a non-transitory computer readable medium in accordance with one or more embodiments of the invention includes computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for operating a data management device, the method including generating, by the data management device, a collision free hash function based on segments stored in an object storage; generating, by the data management device, a hash vector using the collision free hash function; deduplicating, by the data management device, a portion of the segments associated with to-be-migrated files stored in the object storage using the hash vector; and migrating, by the data management device, the to-be-migrated files using the deduplicated portion of the segments to a remote storage.
Certain embodiments of the invention will be described with reference to the accompanying drawings. However, the accompanying drawings illustrate only certain aspects or implementations of the invention by way of example and are not meant to limit the scope of the claims.
Specific embodiments will now be described with reference to the accompanying figures. In the following description, numerous details are set forth as examples of the invention. It will be understood by those skilled in the art that one or more embodiments of the present invention may be practiced without these specific details and that numerous variations or modifications may be possible without departing from the scope of the invention. Certain details known to those of ordinary skill in the art are omitted to avoid obscuring the description.
In the following description of the figures, any component described with regard to a figure, in various embodiments of the invention, may be equivalent to one or more like-named components described with regard to any other figure. For brevity, descriptions of these components will not be repeated with regard to each figure. Thus, each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components. Additionally, in accordance with various embodiments of the invention, any description of the components of a figure is to be interpreted as an optional embodiment, which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure.
In general, embodiments of the invention relate to systems, devices, and methods for migrating data between storages. More specifically, the systems, devices, and methods may reduce the amount of data transmitted from a first storage to a second storage when migrating data from the first storage to the second storage. Additionally, the amount of storage space used to store the data on the second storage may be reduced when compared to the amount of storage space used to store the data on the first storage.
In one or more embodiments of the invention, a data management device may deduplicate data stored in a persistent storage before transmitting the data for storage to a remote storage. The persistent storage may be organized as an object storage. The data management device may deduplicate the data by identifying duplicate data segments, deleting the duplicate data segments, and only transmitting the segments to the remote storage that were not deleted. Removing the duplicate data segments may reduce the quantity of storage required to store the data in the remote storage when compared to the quantity of storage space required to store the data in the data management device.
In one or more embodiments of the invention, the deduplication may be performed as part of a file migration process. For example, when files are migrated from a data storage device offering a high tier of service to a remote storage that offers a different tier of service than the high tier of service, the data may be deduplicated before being transmitted to the remote storage device. The different tier of service may be, for example, a lower tier of service that is less costly or a higher tier of service that is more costly.
The clients (100) may be computing devices. The computing devices may be, for example, mobile phones, tablet computers, laptop computers, desktop computers, servers, or cloud resources that aggregate the computing capacity of multiple computing devices and present themselves as single logical computing devices. The computing devices may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions described in this application. The clients (100) may be other types of computing devices without departing from the invention. The clients (100) may be operably linked to the data management device (110) via a network.
The remote storage (190) may be a computing device. The computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, or a cloud resource that aggregates the computing capacity of multiple computing devices and presents itself as a single logical computing device. The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions described in this application. The remote storage (190) may be other types of computing devices without departing from the invention.
The remote storage (190) may be operably linked to the data management device (110) via a network. For additional details regarding the remote storage (190), See
The data management device (110) may be a computing device. The computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, or a cloud resource that aggregates the computing capacity of multiple computing devices and presents itself as a single logical computing device. The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions described in this application and illustrated in at least
The data management device (110) may include a persistent storage (120), a memory (160), and a data migration optimizer (180). Each component of the data management device (110) is discussed below.
The data management device (110) may include a persistent storage (120). The persistent storage (120) may include physical storage devices. The physical storage devices may be, for example, hard disk drives, solid state drives, tape drives, or any other type of persistent storage media. The persistent storage (120) may include any number and/or combination of physical storage devices.
The persistent storage (120) may include a local object storage (130) for storing data from the clients (100). As used herein, an object storage is a data storage architecture that manages data as objects. Each object may include a number of bytes for storing data in the object. In one or more embodiments of the invention, the object storage does not include a file system. Rather, a local namespace (125) may be used to organize the data stored in the object storage. For additional details regarding the object storage (130), see
The local object storage (130) may be a partially deduplicated storage. As used herein, a partially deduplicated storage refers to a storage that attempts to reduce the amount of storage space required to store data by not storing multiple copies of the same files or bit patterns located near the storage location of the data within the object storage when the data is first stored in the object storage. A partially deduplicated storage attempts to balance the input-output (IO) limits of the physical devices on which the object storage is stored by only comparing the to-be-stored data to a portion of all of the data stored in the object storage.
To partially deduplicate data, the to-be-stored data may be broken down into segments. The segments may correspond to portions of the to-be-stored data. Fingerprints that identify each segment of the to-be-stored data may be generated. The generated fingerprints may be compared to a portion of pre-existing fingerprints associated with a portion of the data already stored in the object storage. Any segments of the to-be-stored data that do not match a fingerprint of the portion of the data already stored in the object storage may be stored in the object storage; the other segments are not stored in the object storage. A file recipe to generate the now-stored data may be generated and stored so that the now-stored data may be retrieved from the object storage. The recipe may include information that enables retrieval, from the object storage, of both (i) the segments of the to-be-stored data that were stored in the object storage and (ii) the segments of the data already stored in the object storage whose fingerprints matched the fingerprints of segments of the to-be-stored data. For additional details regarding file recipes, See
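By way of illustration only, the partial deduplication described above may be sketched as follows. The sketch assumes fixed-size segments, SHA-256 as the fingerprint function, and a small least-recently-used table as the "portion" of pre-existing fingerprints that is consulted; the names `PartialDedupStore`, `SEGMENT_SIZE`, and `cache_size` are hypothetical and do not appear in the description.

```python
import hashlib
from collections import OrderedDict

SEGMENT_SIZE = 4  # hypothetical fixed segment size, kept tiny for illustration


def fingerprint(segment: bytes) -> str:
    # SHA-256 is an assumed stand-in for the fingerprint function.
    return hashlib.sha256(segment).hexdigest()


class PartialDedupStore:
    """Stores segments, comparing new data against only a bounded set of fingerprints."""

    def __init__(self, cache_size: int):
        self.segments = []           # append-only list of stored segments
        self.recent = OrderedDict()  # fingerprint -> index into self.segments (bounded)
        self.cache_size = cache_size

    def store(self, data: bytes) -> list:
        """Store data and return a recipe: the ordered indices of its segments."""
        recipe = []
        for start in range(0, len(data), SEGMENT_SIZE):
            segment = data[start:start + SEGMENT_SIZE]
            fp = fingerprint(segment)
            if fp in self.recent:
                # Fingerprint matched within the consulted portion: reuse the segment.
                self.recent.move_to_end(fp)
                recipe.append(self.recent[fp])
            else:
                # No match in the consulted portion: store the segment.
                self.segments.append(segment)
                index = len(self.segments) - 1
                self.recent[fp] = index
                if len(self.recent) > self.cache_size:
                    self.recent.popitem(last=False)  # forget the oldest fingerprint
                recipe.append(index)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the stored data from its recipe."""
        return b"".join(self.segments[i] for i in recipe)
```

Because only a bounded set of fingerprints is compared in this sketch, a segment whose fingerprint has been evicted from the table may be stored a second time, which is why the resulting storage is only partially deduplicated.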
As used herein, a fingerprint may be a bit sequence that virtually uniquely identifies a segment. As used herein, virtually uniquely means that the probability of collision between each fingerprint of two segments that include different data is negligible, compared to the probability of other unavoidable causes of fatal errors. In one or more embodiments of the invention, the probability is 10^-20 or less. In one or more embodiments of the invention, the unavoidable fatal error may be caused by a force of nature such as, for example, a tornado. In other words, the fingerprint of any two segments that specify different data will virtually always be different.
The persistent storage (120) may include the local namespace (125). The local namespace (125) may be a data structure stored on physical storage devices of the persistent storage (120) that organizes the data storage resources of the physical storage devices.
In one or more embodiments of the invention, the local namespace (125) may associate a file with a file recipe stored in the object storage. The local namespace (125) may include information required to obtain the file recipe from the object storage. The information may specify an identifier of the object in which the file recipe is stored. The information may also specify a relative position within the object where the file recipe is stored. The file recipe may be used to generate the file using segments stored in the object storage. For additional details regarding file recipes, See
The data management device (110) may include a memory (160). The memory (160) may store a hash vector (165). The hash vector (165) may be a data structure including a number of bits corresponding to the number of unique segments stored in the local object storage (130). Each bit of the hash vector (165) may correspond to a fingerprint of a segment stored in the local object storage (130). Fingerprints of multiple segments may be mapped to the same bit of the hash vector (165). Each of the fingerprints of the multiple segments that map to the same bit of the hash vector (165) may be identical, i.e., each of the segments corresponding to the fingerprints that each map to the same bit of the hash vector may be identical.
In one or more embodiments of the invention, the hash vector may be generated by a collision free hash function based on all of the fingerprints stored in the object storage. The collision free hash function may be a perfect hash function.
The memory (160) may store migration file identifiers (170). The migration file identifiers (170) may be a data structure that specifies one or more files stored in the local object storage (130) that are to be migrated to the remote storage (190). The migration file identifiers (170) may be generated using the method shown in
The memory (160) may store a buffer (175). The buffer (175) may be a data structure that includes deduplicated segments of the to-be-migrated files of the object storage and/or fingerprints of the deduplicated segments. The buffer may be used to migrate the to-be-migrated files to the remote storage.
The data management device (110) may include a data migration optimizer (180). The data migration optimizer (180) may deduplicate the to-be-migrated files of the local object storage (130) as part of the file migration process. Deduplicating the to-be-migrated files of the local object storage (130) before migration may reduce the amount of data required to be transmitted to the remote storage as part of the migration process and may reduce the amount of storage space of the remote storage required to store the to-be-migrated files.
In one or more embodiments of the invention, the data migration optimizer (180) may be a physical device. The physical device may include circuitry. The physical device may be, for example, a field-programmable gate array, application specific integrated circuit, programmable processor, microcontroller, digital signal processor, or other hardware processor. The physical device may be adapted to provide the functionality described above and to perform the methods shown in
In one or more embodiments of the invention, the data migration optimizer (180) may be implemented as computer instructions, e.g., computer code, stored on a persistent storage that when executed by a processor of the data management device (110) cause the data management device (110) to provide the functionality described above and perform the methods shown in
As discussed above, the local object storage (130) may store data. The local object storage (130) may store additional information.
The segments (135) may be data structures including portions of data. The segments (135) may be used to reconstruct files stored in the local object storage (130). A segment may be a portion of multiple files, i.e., two different files may include the same segment. For example, two versions of the same word document with minimal differences may both include nearly the same number of segments. The majority of the segments of both word documents may be identical. Only a single copy of any duplicate segments may be stored in the object storage.
The segment fingerprints (140) may be fingerprints corresponding to each of the segments (135). Each segment fingerprint (140A, 140N) may be generated automatically when a corresponding segment (135A, 135N) is stored in the object storage. Each segment fingerprint (140A, 140N) may uniquely identify a corresponding bit sequence. Thus, each segment having the same segment fingerprint is the same bit sequence.
The file recipes (145) may be data structures that enable a number of segments that may be used to reconstruct a file to be retrieved from the object storage. As described above, when a file, e.g., data, is stored in the object storage, it is broken down into segments and deduplicated. Thus, not all of the segments of each file are stored in the object storage. Rather, only segments of the file that are not already present in the object storage are stored in the object storage when the file is stored in the object storage. For additional details regarding the file recipes (145), See
The file meta-data (150) may be data structures that specify meta-data associated with each file stored in the object storage. For additional details regarding the file meta-data (150), See
Returning to the file recipes (145),
In one or more embodiments of the invention, the file recipe A (145A) includes segment identifiers (IDs) (146) that specify the identifiers of each segment used to reconstruct the file. The file ID (147) identifies the file.
In one or more embodiments of the invention, the file recipe A (145A) includes only a single segment ID and a file ID. The single segment ID may enable a segment to be retrieved from the object storage. The single segment includes a top level of a tree data structure rather than a portion of a file. The tree may be a segment tree stored in the object storage. Portions of the tree may specify nodes of the tree while other portions of the tree may include segments. The top level of the tree includes information that enables the lower levels of the tree to be obtained from the object storage. Traversing the segment tree using the information included in the tree may enable the segments used to regenerate the file to be obtained.
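As a non-limiting sketch, traversal of such a segment tree may look like the following. The layout is an assumption made only for illustration: the object storage is modeled as a dictionary keyed by segment ID, interior nodes carry a hypothetical `children` list, leaf entries carry a hypothetical `payload`, and the recipe field name `top_segment_id` is likewise invented.

```python
def collect_segments(object_storage: dict, segment_id: str) -> list:
    """Depth-first walk from a segment ID, returning leaf payloads in file order."""
    entry = object_storage[segment_id]
    if entry["kind"] == "data":
        # Leaf: an actual segment of file data.
        return [entry["payload"]]
    # Interior node: it only tells us where the lower levels are stored.
    payloads = []
    for child_id in entry["children"]:
        payloads.extend(collect_segments(object_storage, child_id))
    return payloads


def rebuild_file(object_storage: dict, recipe: dict) -> bytes:
    """Regenerate a file from a recipe holding only a file ID and one top-level segment ID."""
    return b"".join(collect_segments(object_storage, recipe["top_segment_id"]))
```

For example, a recipe such as `{"file_id": "f", "top_segment_id": "root"}` would be sufficient to recover every segment reachable from the entry stored under `"root"`.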
While two embodiments of the file recipes used herein have been described above, the file recipes may have other structures without departing from the invention. Embodiments of the file recipe include any data structure that enables segments of a file to be retrieved from the object storage.
In one or more embodiments of the invention, the file recipe A (145A) may include other information that may be used to obtain the segments. For example, information that identifies an object of the object storage that includes a segment may be included. Additionally, the other information may also specify where within an object the segment is located. In other embodiments of the invention, each object may be self-describing, i.e., specifies the contents of the object and the location of the contents within the object. The file recipe may only specify the objects including each respective segment specified by the file recipe.
The file ID (147) may correspond to one or more namespace entries that relate file names or other identification information provided by clients with the name of the file stored in the object storage. When a client requests data stored in the object storage, the data management device may match the file name or other ID provided by the client to a namespace entry. The namespace entry may specify the file ID (147) and, thus, enable the data management device to obtain the file recipe corresponding to the stored file. The data management device may then obtain the file by reconstructing it using the segments specified by the file recipe.
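The lookup chain described above (client-provided name, namespace entry, file ID, file recipe, segments) may be sketched as follows. The three dictionaries and the field name `segment_ids` are hypothetical simplifications chosen for illustration, not structures defined by the description.

```python
def read_file(client_name: str, namespace: dict, recipes: dict, segments: dict) -> bytes:
    """Resolve a client-provided name to a file and reconstruct it from its segments."""
    file_id = namespace[client_name]   # namespace entry specifies the file ID
    recipe = recipes[file_id]          # file ID leads to the file recipe
    # The recipe lists segment IDs in file order; a segment ID may repeat.
    return b"".join(segments[segment_id] for segment_id in recipe["segment_ids"])
```

A segment that appears more than once in a file is listed more than once in the recipe but is stored only once in this model.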
As discussed with respect to
The remote storage (190) may be programmed to cooperate with the data management device (110) to migrate files from the data management device (110) to the remote storage (190).
As discussed above, when a file is sent to the data management device for storage, the data management device may divide the file into segments.
In Step 300, files are selected for migration to a remote storage. The selected files may be identified using the method shown in
In Step 310, a collision free hash function for all fingerprints of a local object storage storing the selected files is generated. The collision free hash function may be generated using the method shown in
In Step 320, a hash vector is generated using the collision free hash function. The hash vector may be generated using the method shown in
In Step 330, the selected files are deduplicated using the hash vector. The object storage may be deduplicated using the method shown in
In Step 340, the deduplicated files are migrated to the remote storage.
In one or more embodiments of the invention, the deduplicated files may be migrated to the remote storage by sending fingerprints of the deduplicated files to the remote storage. The remote storage may then compare the fingerprints to existing fingerprints stored in a remote object storage of the remote storage. The remote storage may then notify the data management device of any fingerprints that are not present in the remote storage. The data management device may then send, to the remote storage, copies of the segments corresponding to the fingerprints that are not already stored in the remote storage.
The method may end following Step 340.
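The fingerprint exchange of Step 340 may be illustrated by the following sketch. It assumes that the data management device is the party that sends the missing segments, models the remote storage as an in-memory object, and uses SHA-256 as a stand-in fingerprint; `RemoteStorage`, `missing`, `receive`, and `migrate` are hypothetical names.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    # SHA-256 is an assumed stand-in for the fingerprint function.
    return hashlib.sha256(segment).hexdigest()


class RemoteStorage:
    """In-memory model of a remote object storage keyed by fingerprint."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment

    def missing(self, fingerprints: list) -> list:
        # Report which of the offered fingerprints are not yet stored remotely.
        return [fp for fp in fingerprints if fp not in self.segments]

    def receive(self, segments_by_fingerprint: dict) -> None:
        self.segments.update(segments_by_fingerprint)


def migrate(unique_segments: list, remote: RemoteStorage) -> int:
    """Send fingerprints first, then only the segments the remote lacks.

    Returns the number of segment bytes actually transmitted.
    """
    by_fp = {fingerprint(segment): segment for segment in unique_segments}
    needed = remote.missing(list(by_fp))
    remote.receive({fp: by_fp[fp] for fp in needed})
    return sum(len(by_fp[fp]) for fp in needed)
```

In this model, a segment already present on the remote storage costs only the transmission of its fingerprint, not of the segment itself.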
In Step 301, an unprocessed file stored in the local object storage is selected.
In one or more embodiments of the invention, the file is selected randomly. In other words, any unprocessed file is selected. All of the files stored in the object storage may be unprocessed at the start of the method shown in
In Step 302, the selected unprocessed file is matched to criteria.
In one or more embodiments of the invention, the criteria may include: (i) whether a retention lock specified by meta-data associated with the file is set to indicate that the file should be retained, (ii) whether a date of storage specified by meta-data associated with the file is earlier than a predetermined date, and/or (iii) whether a date of last access specified by the meta-data associated with the file is earlier than a predetermined date. The criteria may include other criteria without departing from the invention. The criteria may only include one of (i)-(iii) without departing from the invention.
In one or more embodiments of the invention, the predetermined date is six months before the current date, i.e., the date at the time the method illustrated in
In Step 303, it is determined whether the selected unprocessed file matches the criteria. If the selected unprocessed file matches the criteria, the method proceeds to Step 304. If the selected unprocessed file does not match the criteria, the method proceeds to Step 305.
In Step 304, the file identifier of the selected unprocessed file is added to the migration file identifiers.
In Step 305, the selected unprocessed file is marked as processed.
In Step 306, it is determined whether all of the files stored in the object storage have been processed. If all of the files have been processed, the method may end following Step 306. If not all of the files have been processed, the method may proceed to Step 301.
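Steps 301-306 may be sketched as a single pass over file meta-data. How criteria (i)-(iii) combine is not fixed by the description; the sketch assumes, purely for illustration, that a file is selected when it is not retention locked and both its storage date and its last access date precede the predetermined date. `FileMeta`, its fields, and the 182-day approximation of six months are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FileMeta:
    file_id: str
    retention_lock: bool
    stored_on: date
    last_access: date


def matches_criteria(meta: FileMeta, cutoff: date) -> bool:
    # Assumed combination of criteria (i)-(iii); other combinations are possible.
    return (not meta.retention_lock
            and meta.stored_on < cutoff
            and meta.last_access < cutoff)


def select_files_for_migration(files: list, today: date) -> list:
    cutoff = today - timedelta(days=182)  # roughly six months before the current date
    migration_file_identifiers = []
    for meta in files:  # each file is visited, and thereby processed, exactly once
        if matches_criteria(meta, cutoff):
            migration_file_identifiers.append(meta.file_id)
    return migration_file_identifiers
```

Iterating over the collection once takes the place of explicitly marking each file as processed in this sketch.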
In Step 311, the fingerprints stored in the local object storage are walked.
In Step 312, a collision free hash function is generated based on the walk.
In one or more embodiments of the invention, the collision free hash function may be a perfect hash function. The perfect hash function may map each of the unique fingerprints walked in Step 311 to different bits of a bit vector. Multiple fingerprints walked in Step 311 that have the same bit sequence map to the same bit of the bit vector.
The method may end following Step 312.
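One simple way to picture Steps 311-312 is shown below. A dictionary built during the walk is used as a stand-in for a perfect hash function: it assigns each unique fingerprint its own bit position, and repeated fingerprints map to the same position. This is an assumption made for readability only; a practical perfect hash construction would be far more compact than a lookup table, and `build_collision_free_hash` is a hypothetical name.

```python
def build_collision_free_hash(fingerprints):
    """Walk the fingerprints and return (hash_function, number_of_bit_positions)."""
    position = {}
    for fp in fingerprints:
        if fp not in position:
            # First time this fingerprint is seen: give it the next free bit position.
            position[fp] = len(position)

    def collision_free_hash(fp):
        # Defined only for fingerprints that were part of the walk.
        return position[fp]

    return collision_free_hash, len(position)
```

The returned count is the number of unique fingerprints walked, which is also the number of bits a corresponding bit vector would need.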
In Step 321, the namespace of the local object storage is enumerated.
In Step 322, a hash vector is generated using the collision free hash function and all of the fingerprints in the local object storage.
In one or more embodiments of the invention, the hash vector is a perfect hash live vector generated using a perfect hash function.
The method may end following Step 322.
In Step 331, all of the segments associated with the to-be-migrated files are identified. The segments may be identified using the file recipe for each to-be-migrated file. The to-be-migrated files may be those selected using the method shown in
While not illustrated in
In Step 332, an unprocessed segment of the identified segments is selected.
In Step 333, a fingerprint of the selected unprocessed segment is matched to a bit of the hash vector. The fingerprint may be matched to the hash vector using the collision free hash function.
In Step 334, it is determined whether the matched bit indicates that the selected unprocessed segment is unique. In other words, it is determined whether a previously processed fingerprint matched to the matched bit before the fingerprint of the selected unprocessed segment matched to the bit of the hash vector. If the matched bit indicates that the unprocessed segment is unique, the method proceeds to Step 335. If the matched bit does not indicate that the unprocessed segment is unique, the method proceeds to Step 336.
In one or more embodiments of the invention, the value of the matched bit may indicate whether the unprocessed segment is unique. When the hash vector is generated, each bit may be set to a predetermined value. If a bit is not the predetermined value when a fingerprint is matched to the bit, the segment associated with the matched fingerprint may be considered to not be unique.
In Step 335, the selected unprocessed segment is marked as unique and the bit of the hash vector to which the fingerprint associated with the unprocessed segment was matched is flipped.
In one or more embodiments of the invention, the selected unprocessed segment may be marked as unique by adding a copy of the selected unprocessed segment to a buffer. In one or more embodiments of the invention, the selected unprocessed segment in the object storage may be marked by adding the segment to a list, or other data structure, that specifies unique segments.
In Step 336, the selected unprocessed segment is marked as processed.
In Step 337, it is determined whether all of the segments have been processed. If all of the segments have been processed, the method may end. If not all of the segments have been processed, the method proceeds to Step 332.
The method may end following Step 337.
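The deduplication loop described above may be sketched as follows, under stated assumptions: a dictionary stands in for the collision free hash function, the hash vector is modeled with one byte per bit, the predetermined value is 0, and unique segments are marked by copying them to a buffer. SHA-256 and the name `deduplicate_for_migration` are illustrative choices rather than details taken from the description.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    # SHA-256 is an assumed stand-in for the fingerprint function.
    return hashlib.sha256(segment).hexdigest()


def deduplicate_for_migration(all_fingerprints: list, migrating_segments: list) -> list:
    """Return the unique segments among those belonging to the to-be-migrated files."""
    # Stand-in collision free hash: one distinct position per unique fingerprint.
    position = {}
    for fp in all_fingerprints:
        position.setdefault(fp, len(position))

    # Hash vector with every bit at the predetermined value 0 (one byte per bit here).
    hash_vector = bytearray(len(position))

    buffer = []
    for segment in migrating_segments:
        bit = position[fingerprint(segment)]  # match the fingerprint to a bit
        if hash_vector[bit] == 0:
            # Bit still holds the predetermined value: first occurrence, so unique.
            hash_vector[bit] = 1              # flip the bit
            buffer.append(segment)            # mark as unique by buffering a copy
        # Otherwise an identical segment was already processed; nothing is buffered.
    return buffer
```

Each segment is examined once and each check is a single lookup into the vector, so the cost of the pass grows linearly with the number of segments of the to-be-migrated files.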
One or more embodiments of the invention may be implemented using instructions executed by one or more processors in the data storage device. Further, such instructions may correspond to computer readable instructions that are stored on one or more non-transitory computer readable mediums.
One or more embodiments of the invention may enable one or more of the following: (i) improve the performance of a network by reducing the amount of bandwidth used to migrate data, (ii) improve the storage capacity of a remote storage by reducing the amount of storage required to store migrated data, and (iii) reduce the computational burden required to migrate files by only sending fingerprints associated with unique segments for evaluation by the remote storage to determine whether the unique segment is already present in an object storage of the remote storage.
While the invention has been described above with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.