Computing devices generate, use, and store data. The data may be, for example, images, documents, webpages, or meta-data associated with any of the files. The data may be stored locally on a persistent storage of a computing device and/or may be stored remotely on a persistent storage of another computing device.
In one aspect, a data storage device in accordance with one or more embodiments of the invention includes a cache for a data storage and a processor. The data storage includes an object storage. The processor obtains cache hardware heuristics data for a first time period; makes a first determination that the cache hardware heuristics data for the first time period does not meet a goal associated with the first time period; and populates the cache using a reduced size index cache in response to the first determination during a second time period.
In one aspect, a method of operating a data storage device in accordance with one or more embodiments of the invention includes obtaining, by the data storage device, cache hardware heuristics data for a first time period. The cache hardware heuristics data is associated with a cache for an object storage. The method also includes making, by the data storage device, a first determination that the cache hardware heuristics data for the first time period does not meet a goal associated with the first time period and populating, by the data storage device, the cache using a reduced size index cache during a second time period in response to the first determination.
In one aspect, a non-transitory computer readable medium in accordance with one or more embodiments of the invention includes computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for operating a data storage device, the method includes obtaining, by the data storage device, cache hardware heuristics data for a first time period. The cache hardware heuristics data is associated with a cache for an object storage. The method also includes making, by the data storage device, a first determination that the cache hardware heuristics data for the first time period does not meet a goal associated with the first time period and populating, by the data storage device, the cache using a reduced size index cache during a second time period in response to the first determination.
Certain embodiments of the invention will be described with reference to the accompanying drawings. However, the accompanying drawings illustrate only certain aspects or implementations of the invention by way of example and are not meant to limit the scope of the claims.
Specific embodiments will now be described with reference to the accompanying figures. In the following description, numerous details are set forth as examples of the invention. It will be understood by those skilled in the art that one or more embodiments of the present invention may be practiced without these specific details and that numerous variations or modifications may be possible without departing from the scope of the invention. Certain details known to those of ordinary skill in the art are omitted to avoid obscuring the description.
In the following description of the figures, any component described with regard to a figure, in various embodiments of the invention, may be equivalent to one or more like-named components described with regard to any other figure. For brevity, descriptions of these components will not be repeated with regard to each figure. Thus, each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components. Additionally, in accordance with various embodiments of the invention, any description of the components of a figure is to be interpreted as an optional embodiment, which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure.
In general, embodiments of the invention relate to systems, devices, and methods for storing data. More specifically, the systems, devices, and methods may reduce the amount of storage required to store data.
In one or more embodiments of the invention, a data storage device may deduplicate data before storing the data in a data storage. The data storage device may deduplicate the data against data already stored in the data storage before storing the deduplicated data in the data storage.
For example, when multiple versions of a large text document having only minimal differences between each of the versions are stored in the data storage, storing each version will require approximately the same amount of storage space if not deduplicated. In contrast, when the multiple versions of the large text document are deduplicated before storage, only the first version of the multiple versions stored will require a substantial amount of storage. Segments that are unique to each version of the large text document will be retained in the storage, while duplicate segments included in subsequently stored versions of the large text document will not be stored.
To deduplicate data, a file of the data may be broken down into segments. Fingerprints of the segments of the file may be generated. As used herein, a fingerprint may be a bit sequence that virtually uniquely identifies a segment. As used herein, virtually uniquely means that the probability of collision between the fingerprints of two segments that include different data is negligible compared to the probability of other unavoidable causes of fatal errors. In one or more embodiments of the invention, the probability is 10^-20 or less. In one or more embodiments of the invention, the unavoidable fatal error may be caused by a force of nature such as, for example, a tornado. In other words, the fingerprints of any two segments that specify different data will virtually always be different.
In one or more embodiments of the invention, the fingerprints of the segments are generated using Rabin's fingerprinting algorithm. In one or more embodiments of the invention, the fingerprints of the segments are generated using a cryptographic hash function. The cryptographic hash function may be, for example, a message digest (MD) algorithm or a secure hash algorithm (SHA). The MD algorithm may be MD5. The SHA may be SHA-0, SHA-1, SHA-2, or SHA-3. Other fingerprinting algorithms may be used without departing from the invention.
To determine whether any of the segments of the file are duplicates of segments already stored in the data storage, the fingerprints of the segments of the file may be compared to the fingerprints of segments already stored in the data storage. Any segments of the file having fingerprints that match fingerprints of segments already stored in the data storage may be marked as duplicate and not stored in the data storage. Not storing the duplicate segments in the data storage may reduce the quantity of storage required to store the file when compared to the quantity of storage space required to store the file without deduplicating the segments of the files.
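By way of a non-limiting illustration, the fingerprint comparison described above may be sketched as follows. The fixed segment size, the use of SHA-256 (a member of the SHA-2 family), and the names `split_into_segments`, `fingerprint`, and `store_deduplicated` are assumptions made only for this sketch and are not specified by the description.

```python
import hashlib

SEGMENT_SIZE = 4  # hypothetical fixed segment size in bytes, for illustration only


def split_into_segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Break data into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(segment: bytes) -> str:
    """Fingerprint a segment with SHA-256 (assumed choice of hash)."""
    return hashlib.sha256(segment).hexdigest()


def store_deduplicated(data: bytes, stored: dict[str, bytes]) -> list[str]:
    """Store only segments whose fingerprints are not already present.

    Returns the ordered fingerprints needed to rebuild the data.
    """
    needed = []
    for segment in split_into_segments(data):
        fp = fingerprint(segment)
        if fp not in stored:  # a matching fingerprint marks a duplicate
            stored[fp] = segment
        needed.append(fp)
    return needed


stored_segments: dict[str, bytes] = {}
v1 = store_deduplicated(b"aaaabbbbcccc", stored_segments)
count_after_v1 = len(stored_segments)
v2 = store_deduplicated(b"aaaabbbbdddd", stored_segments)
count_after_v2 = len(stored_segments)
```

In this sketch the second version shares two of its three segments with the first, so storing it adds only one new segment.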
In one or more embodiments of the invention, the data storage device may include a cache that mirrors all of the fingerprints, or a portion thereof, in the data storage. The cache may be hosted by one or more physical storage devices that are higher performance than the physical storage devices hosting the data storage. In one or more embodiments of the invention, the cache may be hosted by solid state drives and the data storage may be hosted by one or more hard disk drives.
In one or more embodiments of the invention, the data storage device may update the cache based on changes to the data stored in the data storage. The data storage device may control the rate and content of the updates to the cache to meet one or more cache hardware heuristics. The cache hardware heuristics may specify, for example, a goal of writing a predetermined amount of data, or less, to the cache. The goal may be based on a limitation of the physical storage devices hosting the cache. For example, some types of solid state drives have a limited number of write cycles before the drive, or a portion thereof, becomes inoperable. Controlling the rate and content of the updates to the cache to meet the one or more cache hardware heuristics goals may extend the life of the physical storage devices hosting the cache to a predetermined goal.
The clients (110) may be computing devices. The computing devices may be, for example, mobile phones, tablet computers, laptop computers, desktop computers, or servers. The computing devices may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions described in this application. The clients (110) may be other types of computing devices without departing from the invention. The clients (110) may be operably connected to the data storage device (100) via a network.
The clients (110) may store data in the data storage device (100). The data may be of any type or quantity. The clients (110) may store the data in the data storage device (100) by sending data storage requests to the data storage device (100) via an operable connection. The data storage request may specify one or more names that identify the data to-be-stored by the data storage device (100) and include the data. The names that identify the data to-be-stored may be later used by the clients (110) to retrieve the data from the data storage device (100) by sending data access requests including the identifiers included in the data storage request that caused the data to be stored in the data storage device (100).
The data storage device (100) may be a computing device. The computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a server, or a cloud resource. As used herein, a cloud resource means a logical computing resource that utilizes the physical computing resources of multiple computing devices, e.g., a cloud service. The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions described in this application and illustrated in at least
The data storage device (100) may store data sent to the data storage device (100) from the clients (110) and provide data stored in the data storage device (100) to the clients (110). The data storage device (100) may include a data storage (120) that stores the data from the clients, a cache (130), a data deduplicator (140), and a cache manager (150). Each component of the data storage device (100) is discussed below.
The data storage device (100) may include a data storage (120). The data storage (120) may be hosted by a persistent storage that includes physical storage devices. The physical storage devices may be, for example, hard disk drives, solid state drives, hybrid disk drives, tape drives that support random access, or any other type of persistent storage media. The data storage (120) may include any number and/or combination of physical storage devices.
The data storage (120) may include an object storage (121) for storing data from the clients (110). As used herein, an object storage is a data storage architecture that manages data as objects. Each object may include a number of bytes for storing data in the object. In one or more embodiments of the invention, the object storage does not include a file system. Rather, a namespace (not shown) may be used to organize the data stored in the object storage. The namespace may associate names of files stored in the object storage with identifiers of segments of files stored in the object storage. The namespace may be stored in the data storage. For additional details regarding the object storage (121), see
The object storage (121) may be a partially deduplicated storage. As used herein, a partially deduplicated storage refers to a storage that attempts to reduce the required amount of storage space to store data by not storing multiple copies of the same files or bit patterns. A partially deduplicated storage attempts to balance the input-output (IO) limits of the physical devices on which the object storage is stored by only comparing the to-be-stored data to a portion of all of the data stored in the object storage.
To partially deduplicate data, the to-be-stored data may be broken down into segments. The segments may correspond to portions of the to-be-stored data. Fingerprints that identify each segment of the to-be-stored data may be generated. The generated fingerprints may be compared to the fingerprints of a portion of the segments stored in the object storage. In other words, the fingerprints of the to-be-stored data may only be deduplicated against the fingerprints of a portion of the segments in the object storage and are not deduplicated against the fingerprints of all of the segments in the object storage. Any segments of the to-be-stored data that do not match a fingerprint of the portion of the segments stored in the object storage may be stored in the object storage; the other segments may not be stored in the object storage. A recipe to generate the now-stored data may be generated and stored in the data storage so that the now-stored data may be retrieved from the object storage. The recipe may enable all of the segments required to generate the now-stored data to be retrieved from the object storage. Retrieving the aforementioned segments may enable the file to be regenerated. The retrieved segments may include segments that were generated when segmenting the data and segments that were generated when segmenting other data that was stored in the object storage prior to storing the now-stored segments.
In one or more embodiments of the invention, the namespace may be a data structure stored on physical storage devices of the data storage (120) that organizes the data storage resources of the physical storage devices. In one or more embodiments of the invention, the namespace may associate a file with a file recipe stored in the object storage. The file recipe may be used to generate the file using segments stored in the object storage.
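As a hedged sketch of the partial deduplication described above, the following example compares incoming segments against the fingerprints of only a portion of the stored segments and builds a recipe of segment identifiers. The function names, the integer segment identifiers, and the use of SHA-256 are hypothetical choices made for illustration.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    """Fingerprint a segment with SHA-256 (assumed choice of hash)."""
    return hashlib.sha256(segment).hexdigest()


def partially_deduplicate(segments, known_portion, next_segment_id):
    """Deduplicate segments against only a portion of the stored fingerprints.

    known_portion maps fingerprint -> segment ID for a portion of the
    segments already in the object storage.
    Returns (recipe, to_store): the ordered segment IDs that regenerate the
    data, and the new segments (by ID) that must be written.
    """
    known = dict(known_portion)  # local copy; the caller's mapping is untouched
    recipe, to_store = [], {}
    for segment in segments:
        fp = fingerprint(segment)
        if fp not in known:  # no match within the compared portion
            known[fp] = next_segment_id
            to_store[next_segment_id] = segment
            next_segment_id += 1
        recipe.append(known[fp])
    return recipe, to_store


# Segments b"A" (ID 0) and b"B" (ID 1) are assumed to be stored already, but
# only the fingerprint of b"B" is in the portion that is compared against.
portion = {fingerprint(b"B"): 1}
recipe, to_store = partially_deduplicate(
    [b"A", b"B", b"C"], portion, next_segment_id=2
)
```

Because the fingerprint of `b"A"` is outside the compared portion, the sketch stores `b"A"` a second time, which is the trade-off that distinguishes partial deduplication from complete deduplication.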
The data storage device (100) may include an index (122). The index may be a data structure that includes fingerprints of each segment stored in the object storage and associates each of the fingerprints with an identifier of a segment from which the respective fingerprint was generated. For additional details regarding the index (122), See
The data storage device (100) may include segment identifiers (ID) to object mappings (123). The mappings may associate an ID of a segment with an object of the object storage that includes the segment identified by the segment ID. The aforementioned mappings may be used to retrieve segments from the object storage.
More specifically, when a data access request is received, it may include a file name. The file name may be used to query the namespace to identify a file recipe. The file recipe may be used to identify the identifiers of segments required to generate the file identified by the file name. The segment ID to object mappings may enable the objects of the object storage that include the segments identified by the segment IDs of the file recipe to be identified. As will be discussed below, each object of the object storage may be self-describing and, thereby, enable the segments to be retrieved from the objects once the objects that include the segments are identified. For additional details regarding the segment identifiers ID to object mappings (123), See
As discussed above, the data storage device (100) may include a cache (130). The cache (130) may be hosted by a persistent storage that includes physical storage devices. The physical storage devices may be, for example, hard disk drives, solid state drives, hybrid disk drives, or any other type of persistent storage media. The physical storage devices of the cache (130) may have better performance characteristics than the physical storage devices of the data storage (120). For example, the physical storage devices of the cache may support higher IO rates than the physical storage devices of the data storage. In one or more embodiments of the invention, the physical storage devices hosting the cache may be a number of solid state drives and the physical storage hosting the data storage may be hard disk drives. The cache (130) may include any number and/or combination of physical storage devices.
The cache (130) may include an index cache (131). The index cache (131) may be a cache for the fingerprints of the index. More specifically, the index cache (131) may be a data structure that includes a portion of the fingerprints of the index (122). When deduplicating data, the data storage device may first attempt to retrieve fingerprints from the index cache (131). If the fingerprints are not in the cache, the data storage device may retrieve the fingerprints from the index (122).
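The retrieval path described above (file name, namespace, file recipe, segment ID to object mappings, objects) may be illustrated with the following minimal sketch. All dictionary names, keys, and contents are hypothetical stand-ins for the namespace, file recipes, mappings, and object storage.

```python
# Hypothetical in-memory stand-ins for the structures named in the description.
namespace = {"report.txt": "recipe-1"}             # file name -> file recipe
file_recipes = {"recipe-1": [10, 11, 10]}          # recipe -> ordered segment IDs
segment_to_object = {10: "obj-A", 11: "obj-B"}     # segment ID -> object ID
object_storage = {                                 # object ID -> segments by ID
    "obj-A": {10: b"hello "},
    "obj-B": {11: b"world "},
}


def read_file(name: str) -> bytes:
    """Regenerate a file from its recipe using the segment-to-object mappings."""
    recipe_id = namespace[name]                    # query the namespace
    parts = []
    for segment_id in file_recipes[recipe_id]:     # segment IDs from the recipe
        object_id = segment_to_object[segment_id]  # locate the holding object
        parts.append(object_storage[object_id][segment_id])
    return b"".join(parts)


contents = read_file("report.txt")
```

The recipe in this sketch references segment 10 twice, so a single stored copy of that segment serves both positions in the regenerated file.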
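A minimal sketch of the lookup order described above follows: the index cache is consulted first and the index only on a cache miss. Modeling the index cache as a set of fingerprints and the index as a fingerprint-to-segment-ID mapping is an assumption of this sketch, as is the function name `find_fingerprint`.

```python
def find_fingerprint(fp: str, index_cache: set[str], index: dict[str, int]) -> str:
    """Report where a fingerprint was found, checking the index cache first."""
    if fp in index_cache:  # fast path: served by the higher performance devices
        return "cache"
    if fp in index:        # fallback: served by the data storage
        return "index"
    return "absent"        # the segment is not known to the object storage


index = {"fp-x": 1, "fp-y": 2}   # hypothetical fingerprints and segment IDs
index_cache = {"fp-x"}           # the cache mirrors only a portion of the index

results = [find_fingerprint(fp, index_cache, index) for fp in ("fp-x", "fp-y", "fp-z")]
```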
In one or more embodiments of the invention, the index cache (131) mirrors all of the fingerprints of the index (122) when the cache hardware heuristics (132) are meeting a predetermined goal. When the cache hardware heuristics (132) are not meeting a predetermined goal, the index cache (131) only mirrors a portion of the fingerprints in the index. As will be discussed in further detail below, reducing the number of fingerprints stored in the index cache may reduce the amount of data written to the physical storage devices hosting the cache (130) and, thereby, may help to meet a predetermined goal of the cache hardware heuristics (132). For additional details regarding the index cache (131), See
The cache (130) may also include cache hardware heuristics (132). The cache hardware heuristics (132) may include data regarding the usage of the physical storage devices hosting the cache (130). The cache hardware heuristics (132) may also include a goal for the usage of the physical storage devices hosting the cache (130).
While the cache hardware heuristics (132) are illustrated as being stored in the cache (130), the cache hardware heuristics (132) may be stored in the data storage (120), in memory (not shown) of the data storage device (100), and/or on a storage of another computing device operably connected to the data storage device (100) without departing from the invention. For additional details regarding the cache hardware heuristics (132), See
The data storage device (100) may include a data deduplicator (140). The data deduplicator (140) may partially deduplicate segments of files before the segments are stored in the object storage (121). As discussed above, the segments may be partially deduplicated by comparing fingerprints of the segments of the to-be-stored file to a portion of the fingerprints stored in the index cache (131) and/or the index (122). In other words, the data deduplicator (140) may generate partially deduplicated segments, i.e., segments that have been deduplicated against a portion of the data stored in the object storage. Thus, the partially deduplicated segments still may include segments that are duplicates of segments stored in the object storage (121).
In one or more embodiments of the invention, the data deduplicator (140) may be a physical device. The physical device may include circuitry. The physical device may be, for example, a field-programmable gate array, application specific integrated circuit, programmable processor, microcontroller, digital signal processor, or other hardware processor. The physical device may be adapted to provide the functionality described throughout this application.
In one or more embodiments of the invention, the data deduplicator (140) may be implemented as computer instructions, e.g., computer code, stored on a persistent storage that when executed by a processor of the data storage device (100) cause the data storage device (100) to provide the functionality described throughout this application.
When deduplicating segments, the data deduplicator (140) compares the fingerprints of segments of to-be-stored files to the fingerprints of segments in the object storage (121). To improve the rate of the deduplication, the index cache (131) may be used to provide the fingerprints of the segments in the object storage (121) rather than the index (122).
The data storage device (100) may include a cache manager (141) that manages the contents of the index cache (131). More specifically, the cache manager (141) may reduce the number of fingerprints stored in the index cache to meet a predetermined cache hardware heuristics goal. The cache manager (141) may reduce the number of fingerprints in the index cache (131) by: (i) completely deduplicating the fingerprints of the partially deduplicated segments against all of the fingerprints of the index cache (131) and (ii) by not storing/removing fingerprints of segments of files that were stored in the object storage before a predetermined date in/from the index cache (131).
In one or more embodiments of the invention, the cache manager (141) may be a physical device. The physical device may include circuitry. The physical device may be, for example, a field-programmable gate array, application specific integrated circuit, programmable processor, microcontroller, digital signal processor, or other hardware processor. The physical device may be adapted to provide the functionality described throughout this application and the methods illustrated in
In one or more embodiments of the invention, the cache manager (141) may be implemented as computer instructions, e.g., computer code, stored on a persistent storage that when executed by a processor of the data storage device (100) cause the data storage device (100) to provide the functionality described throughout this application and the methods illustrated in
As discussed above, the index (122) and index cache (131) may be used to supply fingerprints to the data deduplicator (140) when segments of files are being deduplicated.
The index (122) and index cache (131) may include fingerprints of segments stored in the object storage (121,
The segments region description (162) may specify, for example, the start point of the segments region (163A) from the start of object A (160), the length of each segment (163B, 163C), and/or the end point of the segments region (163A). The segments region description (162) may include other/different data that enables the object to be self-describing without departing from the invention.
The meta-data of segments (161) may include, for example, the fingerprint of each segment and/or the size of each segment in the segments region (163A). The meta-data of segments (161) may include other/different data without departing from the invention.
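To illustrate how a segments region description may make an object self-describing, the following sketch locates a segment inside an object using only the start point of the segments region and the length of each segment. The class name, field names, and byte layout are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass


@dataclass
class SegmentsRegionDescription:
    start: int                  # offset of the segments region from the object start
    segment_lengths: list[int]  # length of each segment, in storage order


def read_segment(object_bytes: bytes, desc: SegmentsRegionDescription, n: int) -> bytes:
    """Return the n-th segment (0-based) using only the region description."""
    offset = desc.start + sum(desc.segment_lengths[:n])
    return object_bytes[offset:offset + desc.segment_lengths[n]]


header = b"HDR:"  # hypothetical placeholder for data preceding the segments region
object_a = header + b"alpha" + b"be" + b"gamma"
description = SegmentsRegionDescription(start=len(header), segment_lengths=[5, 2, 5])

second_segment = read_segment(object_a, description, 1)
```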
Returning to
Returning to
The write rates over time (170) may specify historical usage data of the physical storage devices hosting the cache. More specifically, the historical usage data may specify the quantity of data written to the physical storage devices hosting the cache on a daily basis. For example, the write rates over time (170) may specify that on a first day 100 gigabytes of data was written, on a second day 150 gigabytes of data was written, on a third day 120 gigabytes of data was written, etc.
The goal write rate (171) may specify a write rate goal. The goal may be an average write rate over a predetermined period of time. In one or more embodiments of the invention, the goal may be to write the total storage capacity of physical storage devices hosting the cache three times per day. In one or more embodiments of the invention, the goal may be to limit the average amount of data written to the physical storage devices hosting the cache based on a write limitation of the physical storage devices hosting the cache. The write limitation may be the average number of times the cells of the physical storage devices hosting the cache may be written before the cells stop working. The goal may be to ensure that the cells do not stop working before a predetermined time. The predetermined time may be, for example, three years, four years, five years, etc.
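The two kinds of goal described above may be sketched numerically as follows. The daily figures of 100, 150, and 120 gigabytes come from the example above; the 50 gigabyte capacity, the 3000 rated write cycles, the uniform-wear model, and all function names are assumptions introduced only for this illustration.

```python
def average_write_rate(daily_writes_gb: list[float]) -> float:
    """Average quantity of data written per day over the recorded days."""
    return sum(daily_writes_gb) / len(daily_writes_gb)


def goal_from_capacity(capacity_gb: float, writes_per_day: float = 3) -> float:
    """Goal expressed as writing the total capacity a number of times per day."""
    return capacity_gb * writes_per_day


def goal_from_endurance(capacity_gb: float, rated_write_cycles: int,
                        lifetime_years: float) -> float:
    """Goal derived from a write limitation, assuming wear is spread evenly.

    Total writable data (capacity x cycles) is divided over the target lifetime.
    """
    return capacity_gb * rated_write_cycles / (lifetime_years * 365)


write_rates_over_time = [100, 150, 120]               # GB per day, from the example
observed = average_write_rate(write_rates_over_time)  # 370 / 3 GB per day

capacity_goal = goal_from_capacity(50)                # hypothetical 50 GB cache
endurance_goal = goal_from_endurance(50, 3000, 5)     # hypothetical drive rating

meets_capacity_goal = observed <= capacity_goal
meets_endurance_goal = observed <= endurance_goal
```

Under these assumed figures the observed average satisfies the capacity-based goal but exceeds the endurance-based goal, which shows that the outcome of the comparison depends on how the goal is defined.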
As discussed above, when a file is sent to the data storage device for storage, the data management device may divide the file into segments.
In Step 300, cache hardware heuristics are obtained.
In Step 305, it is determined whether the cache hardware heuristics meet a goal. More specifically, it may be determined whether the write rate over time of the physical devices hosting the cache meets a goal write rate. The goal write rate may be, for example, a quantity of data equal to three times the total quantity of storage space of the cache per day. If the cache hardware heuristics meet the goal, the method proceeds to Step 310. If the cache hardware heuristics do not meet the goal, the method proceeds to Step 315.
In Step 310, the cache is populated using a full size index cache. The cache may be populated using a full size index cache using the method shown in
The method may end following Step 310.
Returning to Step 315, the cache is populated using a reduced size index cache. The cache may be populated using the reduced size index cache using the method shown in
The method may end following Step 315.
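The decision made in Steps 300 through 315 may be sketched as follows. Treating the goal as met when the average daily write quantity does not exceed the goal write rate is an interpretation adopted for this sketch, and the function name and return values are hypothetical.

```python
def choose_index_cache_mode(daily_writes_gb: list[float],
                            goal_gb_per_day: float) -> str:
    """Select how the cache is populated during the next time period."""
    if not daily_writes_gb:
        raise ValueError("at least one day of write data is required")
    # Step 300: obtain the cache hardware heuristics (here, an average rate).
    observed = sum(daily_writes_gb) / len(daily_writes_gb)
    # Step 305: compare the observed rate to the goal (assumed: lower is better).
    if observed <= goal_gb_per_day:
        return "full"     # Step 310: populate using a full size index cache
    return "reduced"      # Step 315: populate using a reduced size index cache


mode_within_goal = choose_index_cache_mode([100, 150, 120], goal_gb_per_day=150)
mode_over_goal = choose_index_cache_mode([200, 220], goal_gb_per_day=150)
```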
The method shown in
In Step 400, partially deduplicated fingerprints associated with segments of a file are obtained. The partially deduplicated fingerprints may be obtained from the data deduplicator (140).
In Step 405, a fingerprint and segment identifier associated with each fingerprint of the partially deduplicated fingerprints is stored in the index of the data storage.
In Step 410, each fingerprint of the partially deduplicated fingerprints is stored in the index cache of the cache.
The method may end following Step 410.
In Step 500, partially deduplicated fingerprints associated with segments of a file are obtained. The partially deduplicated fingerprints may be obtained from the data deduplicator (140).
In Step 505, a cache size reduction analysis is performed. The cache size reduction analysis may identify a portion of the partially deduplicated fingerprints for storage in the index cache of the cache. The cache size reduction analysis may be performed using the method shown in
In Step 510, a fingerprint and segment identifier associated with each fingerprint of the partially deduplicated fingerprints is stored in the index of the data storage.
In Step 515, a portion of the fingerprints of the partially deduplicated fingerprints is stored in the index cache of the cache. The portion of the fingerprints is based on the cache size reduction analysis. The cache size reduction analysis may select fingerprints of the partially deduplicated fingerprints that will not be stored in the index cache and thereby reduce the size of the index cache. Reducing the size of the index cache may reduce the amount of data written to the physical devices hosting the cache and thereby enable the data storage device to meet a goal.
The method may end following Step 515.
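The difference between populating a full size index cache (Steps 400 through 410) and a reduced size index cache (Steps 500 through 515) may be sketched as follows. The data structures, the function names, and in particular the placeholder analysis `keep_first_only` are hypothetical; the placeholder exists only to show where a cache size reduction analysis would be supplied.

```python
def populate_full(fingerprints, index, index_cache):
    """Every fingerprint is written to both the index and the index cache."""
    for fp, segment_id in fingerprints:
        index[fp] = segment_id   # Step 405
        index_cache.add(fp)      # Step 410


def populate_reduced(fingerprints, index, index_cache, analysis):
    """Every fingerprint is written to the index, only a portion to the cache."""
    selected = analysis(fingerprints, index_cache)  # Step 505
    for fp, segment_id in fingerprints:
        index[fp] = segment_id                      # Step 510
    index_cache.update(selected)                    # Step 515


def keep_first_only(fingerprints, index_cache):
    """Placeholder analysis (not from the description): keep one fingerprint."""
    return {fp for fp, _ in fingerprints[:1]}


batch = [("fp-a", 1), ("fp-b", 2), ("fp-c", 3)]  # (fingerprint, segment ID)

full_index, full_cache = {}, set()
populate_full(batch, full_index, full_cache)

reduced_index, reduced_cache = {}, set()
populate_reduced(batch, reduced_index, reduced_cache, keep_first_only)
```

In both cases the index receives every fingerprint; only the number of writes to the index cache differs.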
In Step 600, an unprocessed fingerprint of the partially deduplicated fingerprints is selected. At the start of the method shown in
In Step 605, it is determined whether the selected unprocessed fingerprint matches any fingerprint stored in the index cache. If the selected unprocessed fingerprint matches any fingerprint stored in the index cache, the method proceeds to Step 615. If the selected unprocessed fingerprint does not match any fingerprint stored in the index cache, the method proceeds to Step 610.
In Step 610, the selected unprocessed partially deduplicated fingerprint is marked as to-be-stored in the index cache.
In Step 615, the selected unprocessed fingerprint is marked as processed.
In Step 620, it is determined whether all of the partially deduplicated fingerprints have been processed. If all of the partially deduplicated fingerprints have been processed, the method may end following Step 620. If all of the partially deduplicated fingerprints have not been processed, the method may proceed to Step 600.
The method may end following Step 620.
As discussed with respect to Step 610, partially deduplicated fingerprints may be marked as to-be-stored as part of the step. The portion of the partially deduplicated fingerprints in Step 515 may be the partially deduplicated fingerprints marked as to be stored in Step 610.
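Steps 600 through 620 may be sketched as the following loop, which marks as to-be-stored only those partially deduplicated fingerprints that match nothing already in the index cache. The queue-based bookkeeping and the name `select_unmatched` are assumptions of this sketch.

```python
from collections import deque


def select_unmatched(partially_deduplicated, index_cache):
    """Return the fingerprints to be stored in the index cache."""
    unprocessed = deque(partially_deduplicated)  # all start as unprocessed
    to_be_stored = []
    while unprocessed:                     # Step 620: any fingerprints left?
        fp = unprocessed.popleft()         # Step 600: select an unprocessed one
        if fp not in index_cache:          # Step 605: compare to the index cache
            to_be_stored.append(fp)        # Step 610: mark as to-be-stored
        # Step 615: removal from the queue marks the fingerprint as processed
    return to_be_stored


cached = {"fp-b"}  # hypothetical fingerprint already present in the index cache
marked = select_unmatched(["fp-a", "fp-b", "fp-c"], cached)
```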
In Step 700, an unprocessed fingerprint of the partially deduplicated fingerprints is selected. At the start of the method shown in
In one or more embodiments of the invention, the partially deduplicated fingerprints may include all of the fingerprints included in the index cache at the start of method shown in
In Step 705, a storage age of the segment associated with the selected unprocessed partially deduplicated fingerprint is determined.
In one or more embodiments of the invention, the storage age may be determined based on an identifier of an object in which the segment is stored. In one or more embodiments of the invention, identifiers of objects may be numerical values that monotonically increase as each object is stored in the object storage. Thus, objects that are stored at earlier points in time may have lower object IDs while objects that are stored at later points in time may have higher object IDs.
In Step 710, it is determined whether the storage age of the segment is less than a predetermined storage age.
As discussed above, the storage age may be the ID of the object in which the segment is stored. In one or more embodiments of the invention, the predetermined storage age is an ID of an object of the object storage. The ID of the object may be an ID that results in a predetermined percentage of the objects of the object storage having an ID that is less than the ID of the object. In one or more embodiments of the invention, the predetermined percentage may be 10%. In one or more embodiments of the invention, the predetermined percentage may be between 5% and 20%.
For example, if the object storage contains five objects having IDs of 1-5, respectively, a predetermined percentage of 20% may be selected. Based on the predetermined percentage, any segments stored in the first object, i.e., ID 1, will be determined as having a storage age greater than the predetermined storage age.
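The selection of the predetermined storage age in the example above may be sketched as follows. The function name and the integer rounding rule are assumptions; the sketch simply returns the object ID below which the predetermined percentage of object IDs falls.

```python
def threshold_object_id(object_ids: list[int], percent: int) -> int:
    """Return the object ID that has `percent` of the object IDs below it."""
    if not object_ids:
        raise ValueError("the object storage contains no objects")
    ordered = sorted(object_ids)
    older_count = len(ordered) * percent // 100  # number of objects treated as older
    if older_count >= len(ordered):
        return ordered[-1] + 1                   # every object is treated as older
    return ordered[older_count]


# Five objects with IDs 1-5 and a predetermined percentage of 20%.
threshold = threshold_object_id([1, 2, 3, 4, 5], 20)
older_objects = [i for i in [1, 2, 3, 4, 5] if i < threshold]
```

With these inputs only the object having ID 1 falls below the threshold, which matches the example above.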
If the storage age of the segment is less than the predetermined storage age, the method proceeds to Step 715. If the storage age of the segment is not less than the predetermined storage age, the method proceeds to Step 720.
In Step 715, the selected unprocessed partially deduplicated fingerprint is marked as to-be-stored in the index cache.
In Step 720, the selected unprocessed partially deduplicated fingerprint is marked as processed.
In Step 725, it is determined whether all of the partially deduplicated fingerprints have been processed. If all of the partially deduplicated fingerprints have been processed, the method may end following Step 725. If all of the partially deduplicated fingerprints have not been processed, the method may proceed to Step 700.
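Steps 700 through 725 may be sketched as follows. Because object IDs increase monotonically, this sketch assumes that a segment whose object ID is at or above a threshold ID has a storage age less than the predetermined storage age; that interpretation, the mapping `segment_object_id`, and the function name are assumptions made for illustration.

```python
def select_recent(fingerprints, segment_object_id, threshold_id):
    """Return the fingerprints whose segments are newer than the threshold."""
    to_be_stored = []
    for fp in fingerprints:                # Steps 700 and 725: visit each once
        object_id = segment_object_id[fp]  # Step 705: storage age via object ID
        if object_id >= threshold_id:      # Step 710: newer than the threshold?
            to_be_stored.append(fp)        # Step 715: mark as to-be-stored
        # Step 720: moving to the next fingerprint marks this one as processed
    return to_be_stored


# Hypothetical mapping from fingerprint to the ID of the object holding its segment.
segment_object_id = {"fp-old": 1, "fp-mid": 3, "fp-new": 5}
kept = select_recent(["fp-old", "fp-mid", "fp-new"], segment_object_id, threshold_id=2)
```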
While the methods illustrated in
In one or more embodiments of the invention, the methods illustrated in
For example, when the data write rate to the physical storage devices hosting the cache differs from the goal by between 1% and 20%, the data storage device may only perform the method shown in
To further clarify embodiments of the invention,
The data storage device has a goal (800) that specifies the average amount of data to be written to the physical storage devices hosting the cache. During the first time period, indicated by the portion of the plot to the left of time T1, the data storage device is turned on and starts storing data. As part of the data storage process, the data is deduplicated.
To facilitate deduplication, the cache manager continuously updates the index cache as indicated by the rising cache write rate. In other words, as the cache is populated throughout the period, the cache manager writes all of the fingerprints of the deduplicated segments of the data to the index cache.
During the second period of time, indicated by the portion of the plot between T1 and T2, the cache manager continues to update the cache as shown in
At time T2, the cache manager evaluates whether to continue to populate the cache using a full size index cache or a reduced size index cache. To do so, the cache manager calculates the average rate at which data is written to the physical storage devices hosting the cache for a given window (810). In this case, the window is selected to be two time periods, e.g., the area along the horizontal axis to the left of time T1.
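The evaluation performed by the cache manager can be sketched as a windowed average compared against the goal. This is a minimal sketch under stated assumptions: the function names, the representation of the write history as one number per time period, and the interpretation that the goal is not met when the average exceeds it are all hypothetical and are not taken from the figures.

```python
def average_write_rate(written_per_period, window):
    """Average amount of data written per period over the last `window` periods."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = written_per_period[-window:]
    if not recent:
        raise ValueError("no write history available")
    return sum(recent) / len(recent)


def choose_index_cache(written_per_period, goal, window=2):
    """Return 'reduced' if the windowed average misses the goal, else 'full'.

    Assumption: the goal is missed when the average write rate is above it.
    """
    average = average_write_rate(written_per_period, window)
    return "reduced" if average > goal else "full"


# Hypothetical history: amounts written in the first and second time periods.
print(choose_index_cache([10, 30], goal=15))  # prints: reduced
print(choose_index_cache([10, 10], goal=15))  # prints: full
```

A longer window smooths out short bursts of writes, at the cost of reacting more slowly when the write rate drifts away from the goal.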
As seen from row 2 of
Moving to
As seen from row 3 of
Moving to
As seen from row 4 of
Moving to
As seen from row 5 of
Moving to
As seen from row 6 of
Moving to
As seen from row 7 of
Moving to
As seen from row 8 of
Moving to
As seen from row 9 of
Moving to
As seen from row 10 of
The Example ends following time T10.
One or more embodiments of the invention may be implemented using instructions executed by one or more processors in the data storage device. Further, such instructions may correspond to computer readable instructions that are stored on one or more non-transitory computer readable mediums.
One or more embodiments of the invention may enable one or more of the following: i) improve the operational lifetime of physical storage devices hosting a cache, ii) reduce the quantity of data stored in an index cache without causing cache misses due to the reduced amount of data stored in the index cache, and iii) reduce the computational cost of performing deduplication of file segments by reducing the number of fingerprints against which the segments are deduplicated.
While the invention has been described above with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.