Deduplication selection and optimization

Description

TECHNICAL FIELD

The present disclosure is generally related to deduplication, and more particularly, to selecting a deduplication process based on a difference between performance metrics.

BACKGROUND

Data deduplication is a process to eliminate or remove redundant data to improve the utilization of storage resources. For example, during the deduplication process, blocks of data may be processed and stored. When a subsequent block of data is received, the subsequent block of data may be compared with the previously stored block of data. If the subsequent block of data matches with the previously stored block of data, then the subsequent block of data may not be stored in the storage resource. Instead, a pointer to the previously stored block of data may replace the contents of the subsequent block of data.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is illustrated by way of example, and not by way of limitation, and can be more fully understood with reference to the following detailed description when considered in connection with the figures as described below.

FIG. 1 illustrates an example environment to select a deduplication process based on a difference between performance metrics in accordance with some embodiments of the present disclosure.

FIG. 2 illustrates an example method to select a deduplication process in accordance with some embodiments of the present disclosure.

FIG. 3 is an example environment with a storage system that is associated with a local memory and a storage resource for storing data blocks and hash values of a deduplication map in accordance with some embodiments.

FIG. 4 is an example method to select a first deduplication process based on retrieving data blocks or a second deduplication process based on retrieving hash values in accordance with some embodiments.

FIG. 5A is an example method to determine a first performance metric for a first deduplication process in accordance with some embodiments of the present disclosure.

FIG. 5B is an example method to determine a second performance metric for a second deduplication process in accordance with some embodiments of the present disclosure.

FIG. 6 is a block diagram of an example computer system operating in accordance with the disclosure described herein.

DETAILED DESCRIPTION

Aspects of the present disclosure relate to selecting a deduplication process based on a difference between performance metrics. For example, data blocks may be analyzed by a deduplication process to determine whether a duplicate or copy of the data block is currently stored at a storage system. The deduplication process may use a hash function that generates a hash value based on the data block. The generated hash value may be compared with hash values of a deduplication map that identifies currently stored data blocks at the storage system. If the generated hash value matches with any of the hash values in the deduplication map, then the data block may be considered to be a copy or duplicate of another data block that is currently stored at the storage system. Alternatively, the deduplication process may directly compare the received data block with another data block that is currently stored at the storage system. Thus, the deduplication process may be based on comparing a generated hash value with a hash value retrieved from a deduplication map or based on comparing a received data block with a retrieved data block that has been previously stored at the storage system.

The storage system may use either deduplication process to determine whether copies of received data blocks are currently stored at the storage system. For example, a series (i.e., stream) of data blocks may be received to be stored at the storage system. A first hash value may be generated for one of the data blocks of the series of data blocks and the generated hash value may be compared with hash values in a deduplication map. If the first hash value matches with another hash value in the deduplication map, then the corresponding received data block of the series of data blocks may be considered a duplicate of another data block that is currently stored at the storage system.

Subsequently, a deduplication process may be used to determine whether the other data blocks of the series of data blocks are also duplicates of currently stored data blocks at the storage system. For example, the first deduplication process may be used to generate hash values for the other data blocks of the series and retrieve stored hash values associated with the other data blocks and currently stored in the deduplication map. The generated hash values may be compared with the retrieved hash values to determine whether the other data blocks of the series are duplicates of currently stored data blocks. Alternatively, the second deduplication process may be used to retrieve other stored data blocks that are associated with the stored data block and then compare the other received data blocks of the series with the other stored data blocks that have been retrieved to determine whether the other data blocks of the series are duplicates of the currently stored data blocks.
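By way of a non-limiting illustration, the two alternatives described above might be sketched as follows in Python. The function names, the use of SHA-256 as the hash function, and the positional pairing of each received data block with one retrieved hash value or stored data block are assumptions made only for this sketch and are not specified by the disclosure.

```python
import hashlib


def dedup_by_hash(received_blocks, retrieved_hashes):
    """Hash-based alternative: hash each received block and compare the
    result with the corresponding retrieved hash value.

    Pairing by position is an assumption of this sketch.
    """
    return [
        hashlib.sha256(block).digest() == stored_hash
        for block, stored_hash in zip(received_blocks, retrieved_hashes)
    ]


def dedup_by_block(received_blocks, retrieved_blocks):
    """Block-based alternative: compare each received block directly with
    the corresponding retrieved stored block."""
    return [
        block == stored
        for block, stored in zip(received_blocks, retrieved_blocks)
    ]
```

Both functions return one Boolean per received data block, so either result may be used to decide which of the remaining data blocks of the series are duplicates.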

The first deduplication process and the second deduplication process may retrieve, respectively, the stored hash values and the other stored data blocks by retrieving the stored hash values and the stored data blocks from cache memory (i.e., a local memory) and a storage resource (i.e., a backing storage) at the storage system. For example, a subset of the hash values or data blocks may be retrieved from the cache memory and another subset may be retrieved from the storage resource. The retrieving of the hash values or data blocks from the cache memory may take less time than the retrieving of the hash values or data blocks from the storage resource. Thus, depending on the number of hash values to be retrieved that are stored at the cache memory as opposed to the storage resource and the number of data blocks to be retrieved that are stored at the cache memory as opposed to the storage resource, the performance of the first deduplication process and the second deduplication process may vary. For example, at certain times, the first deduplication process may be more efficient and take less time than the second deduplication process, and vice versa at other times. Thus, if a particular deduplication process is always used by the storage system, a less efficient and more time-consuming deduplication process may be used while a more efficient and less time-consuming deduplication process is available to the storage system.

Aspects of the present disclosure address the above and other deficiencies by determining or calculating performance metrics for the deduplication processes. For example, a first performance metric may be determined for the first deduplication process and a second performance metric may be determined for the second deduplication process. As described in further detail below, the performance metrics may be based on whether the respective data blocks or hash values are stored in the cache memory or the storage resource, the size of the data blocks that are to be retrieved, the number of data blocks that are to be retrieved, performance characteristics of the storage system, etc. If the first performance metric of the first deduplication process does not exceed the second performance metric of the second deduplication process (e.g., the first and second performance metrics predict less time for performing the first deduplication process as opposed to the second deduplication process), then the first deduplication process may be used to determine whether other data blocks of the series of data blocks received at the storage system are duplicates of currently stored data blocks at the storage system. Otherwise, if the second performance metric predicts that the second deduplication process may take less time to perform than the first deduplication process, then the second deduplication process may be used to determine whether the other data blocks of the series of data blocks are duplicates of currently stored data blocks.

Thus, the present disclosure may improve the performance of a storage system by determining performance metrics for performing operations of deduplication processes. For example, the deduplication process that may perform a deduplication operation for data blocks faster than another deduplication process may be selected for use by the storage system when appropriate as based on the performance metrics.

FIG. 1 illustrates an example environment 100 to select a deduplication process based on a difference between performance metrics. In general, the environment 100 may include a storage server 120 that includes a deduplication selector component 125 that receives a stream or series of data blocks 110 for storing in a storage resource 130.

The deduplication process may be an inline data deduplication process where a data block is received and then analyzed before being stored in the storage resource 130. For example, the data deduplication process may determine whether copies of the data blocks 110 that are received are currently stored in the storage resource 130 (e.g., a solid-state non-volatile memory such as flash memory) before storing the received data blocks 110 in the storage resource 130. Thus, the inline data deduplication process may be performed as a stream of data blocks 110 is received to be stored in the storage resource 130.

In general, the deduplication process may receive a data block (e.g., of the series of data blocks) and perform a hash function with the data block to generate a hash value. The hash function may transform the data block of an arbitrary size to data of a fixed size corresponding to the hash value. The deduplication process may store the hash value for comparison with a subsequent data block. For example, when the subsequent data block is received, the hash function may be performed on the subsequent data block to generate a corresponding hash value based on the contents of the subsequent data block. If the corresponding hash value of the subsequent data block matches the previously stored hash value, then the contents of the subsequent data block may be a copy of the contents of the previously received data block. Instead of storing the contents of the subsequent data block, a pointer to the previously received data block with the matching hash value may be used to replace the contents of the subsequent data block.
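As a non-limiting illustration, the hash-based flow described above might be sketched as follows in Python. The class name InlineDedupStore, the use of SHA-256 as the hash function, and the in-memory list standing in for the storage resource are assumptions made only for this sketch and are not specified by the disclosure.

```python
import hashlib


class InlineDedupStore:
    """Toy inline deduplication store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = []      # stand-in for the storage resource
        self.dedup_map = {}   # hash value -> location of the stored block
        self.pointers = []    # one location per received block, in order

    def write(self, block: bytes) -> bool:
        """Handle one received block; return True if it was a duplicate."""
        # The hash function maps a block of arbitrary size to a fixed-size value.
        digest = hashlib.sha256(block).digest()
        location = self.dedup_map.get(digest)
        if location is not None:
            # Duplicate: keep only a pointer to the previously stored block.
            self.pointers.append(location)
            return True
        # Not a duplicate: store the contents and remember the hash value.
        self.blocks.append(block)
        location = len(self.blocks) - 1
        self.dedup_map[digest] = location
        self.pointers.append(location)
        return False
```

In this sketch, writing the same contents twice stores the contents once and records two pointers to the same location.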

As shown in FIG. 1, the deduplication selector component 125 may receive a series of data blocks 110. In some embodiments, the series of data blocks 110 may be a stream of data blocks that are to be stored at storage resources that are managed by a storage system (e.g., a flash storage array system or solid-state storage array) that includes the deduplication selector component 125. The deduplication selector component 125 may determine performance metrics for a first and second deduplication process and may select one of the deduplication processes to use with the received stream of data blocks 110 to determine whether the data blocks of the stream of data blocks 110 are duplicates of other data blocks currently stored at the storage resource 130. The deduplication process may be referred to as an inline deduplication process as the data blocks 110 are analyzed to determine whether a copy or duplicate is currently stored at the storage system before storing the data blocks 110.

As described in further detail with regard to FIG. 2, the deduplication selector component 125 may calculate a first performance metric for a first deduplication process and a second performance metric for a second deduplication process and compare the first and second performance metrics to determine which deduplication process to select to use for determining whether a copy of the series of data blocks 110 is currently stored in the storage resource 130. The deduplication selector component 125 may be implemented by a computer system or storage controller of a flash storage array system. In some embodiments, the deduplication selector component 125 may be implemented by processing logic that may include hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, integrated circuit, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof.

The storage resource 130 may correspond to non-disk storage media that is managed by or coupled with the deduplication selector component 125. For example, the storage resource 130 may be one or more solid-state drives (SSDs), flash memory based storage, any type of solid-state non-volatile memory, or any other type of non-mechanical storage device. In some embodiments, the storage resource 130 may be a storage device that includes a flash memory.

FIG. 2 illustrates an example method 200 to select a deduplication process. In general, the method 200 may be performed by processing logic that may include hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, integrated circuit, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In some embodiments, the deduplication selector component 125 of FIG. 1 may perform the method 200.

As shown in FIG. 2, the method 200 may begin with the processing logic receiving a series of data blocks (block 210). For example, a group of data blocks may be received as part of a write operation provided to a storage system to store the group of data blocks at a storage resource that is managed by the storage system. Thus, multiple data blocks may be received and subjected to a deduplication process before any of the multiple data blocks are stored at the storage resource. The processing logic may further generate a hash value for a particular data block of the series of data blocks (block 220). For example, a hash function may be used to generate a hash value based on at least one of the data blocks of the series of data blocks. Thus, a subset of the received data blocks, or one data block of multiple data blocks, may be used to generate a first hash value. The processing logic may further determine that the particular data block is a duplicate of a stored data block at a location based on the generated hash value (block 230). For example, the generated hash value may be compared with hash values of a deduplication map that stores hash values for data blocks that are currently stored at the storage resource of the storage system. Further details with regard to the deduplication map are described in conjunction with FIG. 3.

The processing logic may determine a first performance metric associated with retrieving one or more data blocks associated with the particular data block (block 240). The first performance metric may be based on retrieving data blocks that are proximate (e.g., logically proximate in a logical space or physically proximate at the storage resource) to the particular data block at the location at the storage resource of the storage system that is identified by the generated hash value. For example, the data blocks may be in a particular range of data blocks that includes the particular data block or is around the particular data block. The retrieving of the data blocks may be based on retrieving the data blocks from a cache memory of the storage system and/or from a storage resource of the storage system. For example, the data blocks may be retrieved from the cache memory instead of the storage resource when the respective data blocks are currently stored at the cache memory. Thus, the first performance metric may be based on whether data blocks that are to be retrieved are currently stored at the cache memory or the storage resource. The first performance metric may indicate a better performance (e.g., less time to perform a first deduplication process) when more data blocks to be retrieved are currently stored at the cache memory. Further details with regard to determining a performance metric are described in conjunction with FIGS. 5A and 5B.

Furthermore, the processing logic may determine a second performance metric associated with retrieving one or more hash values associated with the series of data blocks (block 250). The second performance metric may be based on retrieving hash values that are stored in a deduplication map and generating hash values for the other data blocks of the series of data blocks. Furthermore, the second performance metric may similarly indicate a better performance (e.g., less time to perform a second deduplication process) when more hash values of the deduplication map that are to be retrieved are currently stored at the cache memory instead of the storage resource.

Referring to FIG. 2, the processing logic may subsequently perform, based on the first and second performance metrics, either a first deduplication process based on retrieving the one or more data blocks or a second deduplication process based on retrieving the one or more hash values (block 260). For example, the first deduplication process may be selected for the other data blocks of the series of data blocks when the first performance metric indicates that the first deduplication process may be more efficient (e.g., take less time to perform) than the second deduplication process. The first performance metric may indicate a first amount of time to perform the first deduplication process and the second performance metric may indicate a second amount of time to perform the second deduplication process. The deduplication process that is associated with a lesser amount of time to perform the respective deduplication process may be selected to be performed.

FIG. 3 illustrates an example environment 300 with a storage system that is associated with a local memory and a storage resource for storing data blocks and hash values of a deduplication map. In general, the environment 300 may include a storage system 310 with the deduplication selector component 125 of FIG. 1.

As shown in FIG. 3, the storage system 310 may include a cache memory or local memory 320 (e.g., DRAM) and a backing storage or storage resource 330 (e.g., a solid-state non-volatile memory such as flash memory). The storage system 310 may be a solid-state storage array system such as, but not limited to, a flash storage array system. The local memory 320 may store a portion of hash values in an index table 322 and a portion of data 321 and the storage resource 330 may store a deduplication map 331 and data 332. In general, the storage system 310 may receive data blocks 305 and store the data blocks in the storage resource 330 after the deduplication selector component 125 selects a deduplication process to be performed on the received data blocks. The deduplication process may be selected based on hash values that are stored at the portion of hash values in an index table 322 at the local memory 320 and the deduplication map 331 at the storage resource 330 as well as the data blocks at the portion of data 321 stored at the local memory 320 and the data blocks at the data 332 stored at the storage resource 330. The portion of hash values in the index table 322 may be a subset or a portion of the hash values recorded in the deduplication map 331. For example, the portion of hash values in the index table 322 may be hash values that have been recently generated by the storage system 310 for prior received data blocks and are currently stored in the local memory (i.e., the cache) of the storage system 310. Furthermore, the portion of data 321 may correspond to a subset or portion of data blocks that are stored at the data 332. For example, the portion of data 321 may be data blocks that have been recently received by the storage system for storage at the storage resource 330.

In operation, a series of data blocks 305 may be received by the deduplication selector component 125 of the storage system 310 to be stored at the storage resource 330. In response to receiving the data blocks 305, a deduplication process may be selected by the deduplication selector component 125 as described in conjunction with FIG. 2 and FIGS. 4-5B. The selected deduplication process may be used to determine whether a copy of a data block from the data blocks 305 has already been stored at the storage resource 330. For example, the data block may be compared with another data block retrieved from the local memory 320 or the storage resource 330. Alternatively, a hash value may be generated for the data block, a hash value may be retrieved from the deduplication map 331 or the local memory 320, and the generated hash value may be compared with the retrieved hash value. If the comparison of data blocks results in the received data block matching the retrieved data block or if the generated hash value matches the retrieved hash value, then the data block may be considered a duplicate of the data block stored at the storage resource 330. Instead of storing the contents of the received data block, the data block may be stored at the storage resource 330 by creating a pointer to a physical location of the copy of the data block at the storage resource 330. Otherwise, if the data block is not considered a duplicate of another data block stored at the storage resource 330, then the data block may be stored in the storage resource 330 and an entry of the deduplication map 331 may be modified to register the data block by including a hash value of the data block and the physical location in the storage resource 330 where the data block has been stored.

Although aspects of the present disclosure relate to inline data deduplication, the disclosure herein may be applied to post-processing data deduplication that may be used to analyze data blocks currently stored on the storage resource 330. For example, the post-processing deduplication may analyze each data block that is currently stored on the storage resource 330 to determine whether the corresponding data block is a copy or duplicate of another data block currently stored on the storage resource 330.

Thus, data blocks and hash values may be stored in a local memory and a storage resource. A deduplication process may be selected based on a distribution of the data blocks and hash values in the local memory and the storage resource.
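As one possible illustration of such a distribution, the following Python sketch partitions the items that a deduplication process would need to retrieve into those currently held in the local memory and those that would have to be read from the storage resource. The function name split_by_residency and the dictionary standing in for the local memory are hypothetical and are used only for this sketch.

```python
def split_by_residency(wanted_keys, local_memory):
    """Partition the keys to retrieve by where they currently reside.

    wanted_keys:  identifiers (e.g., block locations or hash-map keys) that a
                  deduplication process would need to retrieve.
    local_memory: mapping of the identifiers currently cached (hypothetical
                  stand-in for the local memory).
    Returns (in_local_memory, in_storage_resource), preserving input order.
    """
    in_local_memory = [key for key in wanted_keys if key in local_memory]
    in_storage_resource = [key for key in wanted_keys if key not in local_memory]
    return in_local_memory, in_storage_resource
```

The length of the second list corresponds to the number of items that would be retrieved from the storage resource rather than the local memory, which is one of the inputs to the performance metrics described in conjunction with FIGS. 5A and 5B.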

FIG. 4 is an example method 400 to select a first deduplication process based on retrieving data blocks or a second deduplication process based on retrieving hash values. In general, the method 400 may be performed by processing logic that may include hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, integrated circuit, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In some embodiments, the deduplication selector component 125 of FIG. 1 may perform the method 400.

As shown in FIG. 4, the method 400 may begin with the processing logic identifying a location of a data block stored at a storage resource that is a duplicate of a data block of a series of data blocks (block 410). For example, the series of data blocks may be received to be stored at the storage resource and at least one of the data blocks may be determined to be a duplicate of another data block currently stored at the storage resource. The location may be determined by using a hash value of the data block as previously described. The processing logic may further calculate a first performance metric for a first set of operations for retrieving data blocks and comparing the retrieved data blocks with the other data blocks of the series of data blocks (block 420). The first set of operations may correspond to the first deduplication process. The retrieved data blocks may correspond to data blocks that are proximate to the data block that is a duplicate. In some embodiments, the retrieved data blocks may correspond to data blocks that are proximate in logical space to the data block that is a duplicate. In the same or alternative embodiments, the retrieved data blocks may correspond to data blocks that are physically proximate in the storage resource to the data block that is a duplicate. For example, the data blocks may be retrieved from the storage resource or may be retrieved from a cache memory. In some embodiments, when the data blocks are stored at both the cache memory and the storage resource, the data blocks may be retrieved from the cache memory instead of the storage resource. The processing logic may further calculate a second performance metric for a second set of operations for generating hash values for the other data blocks of the series of data blocks, retrieving additional hash values, and comparing the generated hash values with the retrieved hash values (block 430). The second set of operations may correspond to the second deduplication process. The retrieved hash values may correspond to hash values for the data blocks that are proximate to the data block that is a duplicate. The hash values may be retrieved from the cache memory when the respective hash values are present in the local memory and the hash values may be retrieved from the storage resource when the respective hash values are not present in the local memory.

Referring to FIG. 4, the processing logic may further determine whether the first performance metric indicates less time than the second performance metric (block 440). If the first performance metric indicates that the first set of operations may be performed in less time than the second set of operations, then the processing logic may select the first deduplication process corresponding to the first set of operations to be performed with the other data blocks of the series of data blocks (block 450). For example, the first deduplication process may be used to determine whether the other data blocks are duplicates of data blocks stored at the storage resource. Otherwise, if the second performance metric indicates that the second set of operations may be performed in less time than the first set of operations, then the processing logic may select the second deduplication process corresponding to the second set of operations to be performed with the other data blocks of the series of data blocks (block 460). For example, the second deduplication process may be used to determine whether the other data blocks are duplicates of data blocks currently stored at the storage system.
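The comparison at block 440 and the selections at blocks 450 and 460 might be sketched as follows in Python, purely as a non-limiting illustration. The names DedupProcess and select_process are hypothetical, both metrics are assumed to be estimated durations expressed in the same unit, and resolving a tie in favor of the first deduplication process is an assumption of this sketch.

```python
from enum import Enum


class DedupProcess(Enum):
    BLOCK_COMPARE = "first"   # retrieve stored data blocks and compare them
    HASH_COMPARE = "second"   # generate and retrieve hash values, then compare


def select_process(first_metric: float, second_metric: float) -> DedupProcess:
    """Select the process whose metric predicts less time (block 440).

    Ties are resolved in favor of the first process; that tie-break is an
    assumption of this sketch.
    """
    if first_metric <= second_metric:
        return DedupProcess.BLOCK_COMPARE   # block 450
    return DedupProcess.HASH_COMPARE        # block 460
```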

FIG. 5A is an example method 500 to determine a first performance metric for a first deduplication process. In general, the method 500 may be performed by processing logic that may include hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, integrated circuit, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In some embodiments, the deduplication selector component 125 of FIG. 1 may perform the method 500.

As shown in FIG. 5A, the method 500 may begin with the processing logic identifying a first deduplication process based on retrieving data blocks (block 505). The first deduplication process may be a deduplication process that is available for a storage system that has received a plurality of data blocks to be stored at a storage resource. For example, the first deduplication process may determine whether the received data blocks are duplicates of currently stored data blocks by comparing the received data blocks with the currently stored data blocks after retrieval. The processing logic may subsequently determine the number of the data blocks that are to be retrieved that are stored at a storage resource and not a cache memory (block 510). The processing logic may further identify a size of the data blocks that are to be retrieved (block 515). For example, the size of the data blocks that are to be retrieved from the storage resource may be identified. The processing logic may further identify a number of read accesses to retrieve the data blocks that are to be retrieved from the storage resource (block 520). For example, the data blocks may be stored at various locations within the storage resource and a number of read accesses of the storage resource to retrieve each of the data blocks that are to be retrieved from the storage resource may be identified. Furthermore, the processing logic may determine a hardware cost associated with the read accesses of the storage resource (block 525). The cost may be associated with a network bandwidth and a processing usage to retrieve the data blocks from the storage resource.

Subsequently, the processing logic may determine the first performance metric for the first deduplication process based on the number of data blocks that are to be retrieved from the storage resource, the size of the data blocks, the number of read accesses, and the hardware cost (block 530). In some embodiments, the first performance metric may indicate a longer amount of time to perform the first deduplication process when the number of data blocks to be retrieved from the storage resource increases, when the size of the data blocks increases, when the number of read accesses increases, and when the hardware cost corresponds to an increase in network bandwidth and processing usage.
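The disclosure does not prescribe a particular formula for the first performance metric. As a non-limiting sketch only, the following Python example assumes a simple additive model in which each read access has a fixed latency, data is transferred at a fixed bandwidth, and processing cost grows with the number of bytes retrieved. The names StorageProfile and first_metric, and every parameter of the model, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageProfile:
    """Hypothetical performance characteristics of a storage system."""
    read_latency_s: float         # assumed fixed time per read access
    bandwidth_bytes_per_s: float  # assumed transfer rate from the storage resource
    cpu_s_per_byte: float         # assumed processing time per retrieved byte


def first_metric(num_blocks_from_storage: int, block_size: int,
                 num_reads: int, profile: StorageProfile) -> float:
    """Estimated seconds to retrieve data blocks from the storage resource.

    The additive model is an assumption of this sketch; blocks already in
    the cache memory are treated as free and are not counted.
    """
    total_bytes = num_blocks_from_storage * block_size
    access_time = num_reads * profile.read_latency_s
    transfer_time = total_bytes / profile.bandwidth_bytes_per_s
    processing_time = total_bytes * profile.cpu_s_per_byte
    return access_time + transfer_time + processing_time
```

Under this assumed model the estimate grows whenever the number of data blocks read from the storage resource, the block size, or the number of read accesses grows, which matches the qualitative behavior described above.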

FIG. 5B is an example method 550 to determine a second performance metric for a second deduplication process. In general, the method 550 may be performed by processing logic that may include hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, integrated circuit, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In some embodiments, the deduplication selector component 125 of FIG. 1 may perform the method 550.

As shown in FIG. 5B, the method 550 may begin with the processing logic identifying a second deduplication process based on retrieving hash values (block 555). The processing logic may subsequently identify a number of hash values that are to be retrieved from a deduplication map at a storage resource and not a cache memory (block 560). The processing logic may further determine a number of read accesses to retrieve the hash values from the deduplication map at the storage resource, a cost of the read accesses, and a hardware cost associated with the read accesses (block 565). Furthermore, the processing logic may determine an amount of time to generate hash values for received data blocks (block 570). For example, an amount of time to generate the hash values for received data blocks by a hash function may be received.

Subsequently, the processing logic may determine the second performance metric for the second deduplication process based on the number of hash values that are to be retrieved from the deduplication map at the storage resource, the number of read accesses, the hardware cost, and the amount of time to generate the hash values (block 575). In some embodiments, the second performance metric may indicate a longer amount of time to perform the second deduplication process when the number of hash values to be retrieved from the storage resource increases, when the number of read accesses increases, when the hardware cost corresponds to an increase in network bandwidth and processing usage, and when the amount of time to generate the hash values increases.
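As with the first performance metric, no particular formula is prescribed for the second performance metric. The following Python sketch assumes, for illustration only, an additive model that sums the read access time, the transfer time for the hash values read from the deduplication map at the storage resource, and the time to generate hash values for the received data blocks. The function name second_metric and all of its parameters are hypothetical.

```python
def second_metric(num_hashes_from_storage: int, num_reads: int,
                  num_blocks_to_hash: int, *,
                  read_latency_s: float,
                  hash_entry_bytes: int,
                  bandwidth_bytes_per_s: float,
                  hash_time_per_block_s: float) -> float:
    """Estimated seconds for the hash-based process (illustrative model).

    Hash values already in the cache memory are treated as free and are not
    counted; the additive form is an assumption of this sketch.
    """
    access_time = num_reads * read_latency_s
    transfer_time = (num_hashes_from_storage * hash_entry_bytes
                     / bandwidth_bytes_per_s)
    hashing_time = num_blocks_to_hash * hash_time_per_block_s
    return access_time + transfer_time + hashing_time
```

Under this assumed model the estimate increases with the number of hash values read from the storage resource, the number of read accesses, and the time needed to hash each received data block.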

FIG. 6 depicts an example computer system 600 which can perform any one or more of the methods described herein. The computer system may be connected (e.g., networked) to other computer systems in a LAN, an intranet, an extranet, or the Internet. The computer system may operate in the capacity of a server in a client-server network environment. The computer system may be a personal computer (PC), a set-top box (STB), a server, a network router, switch or bridge, a storage system, or any device capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that device. Further, while only a single computer system is illustrated, the term “computer” shall also be taken to include any collection of computers that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methods discussed herein.

The exemplary computer system 600 includes a processing device 602, a main memory 604 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM)), a solid-state non-volatile memory 606 (e.g., flash memory, 3D crosspoint (XPoint) memory, magnetoresistive random-access memory (MRAM), or any other such storage media that does not use a physical disk), and a data storage device 616, which communicate with each other via a bus 608.

Processing device 602 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, the processing device 602 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processing device 602 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. The processing device 602 is configured to execute the deduplication selector component 125 of FIG. 1 for performing the operations and steps discussed herein. The computer system 600 may further include a network interface device 622. The data storage device 616 may include a computer-readable medium 624 on which is stored the deduplication selector component 125 embodying any one or more of the methodologies or functions described herein. The deduplication selector component 125 may also reside, completely or at least partially, within the main memory 604 and/or within the processing device 602 during execution thereof by the computer system 600, the main memory 604 and the processing device 602 also constituting computer-readable media. The deduplication selector component 125 may further be transmitted or received over a network via the network interface device 622.

While the computer-readable storage medium 624 is shown in the illustrative examples to be a single medium, the term “computer-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable storage medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure. The term “computer-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.

Although the operations of the methods herein are shown and described in a particular order, the order of the operations of each method may be altered so that certain operations may be performed in an inverse order or so that certain operations may be performed, at least in part, concurrently with other operations. In certain implementations, instructions or sub-operations of distinct operations may be performed in an intermittent and/or alternating manner.

It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other implementations will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

In the above description, numerous details are set forth. It will be apparent, however, to one skilled in the art, that the present disclosure may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present disclosure.

Some portions of the detailed descriptions above are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “receiving,” “determining,” “performing,” “using,” “registering,” “recording,” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear as set forth in the description below. In addition, the present disclosure is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.

The present disclosure may be provided as a computer program product, or software, that may include a machine-readable medium having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure. A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable (e.g., computer-readable) medium includes a machine (e.g., a computer) readable storage medium (e.g., read only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory devices, etc.).

The words “example” or “exemplary” are used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the words “example” or “exemplary” is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X includes A or B” is intended to mean any of the natural inclusive permutations. That is, if X includes A; X includes B; or X includes both A and B, then “X includes A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form. Moreover, use of the term “an embodiment” or “one embodiment” or “an implementation” or “one implementation” throughout is not intended to mean the same embodiment or implementation unless described as such. Furthermore, the terms “first,” “second,” “third,” “fourth,” etc. as used herein are meant as labels to distinguish among different elements and may not necessarily have an ordinal meaning according to their numerical designation.

Claims

1. A storage system comprising: a plurality of storage devices comprising flash memory; and a processing device, operatively coupled to the plurality of storage devices, configured to: generate a hash value for a portion of data blocks to be stored at a particular storage device of the plurality of storage devices; determine that the hash value matches a corresponding hash value of a data block currently stored at the particular storage device; select one of a first deduplication process or a second deduplication process to be performed by the processing device based on one or more performance metrics, the one or more performance metrics comprise a type of storage medium the other data blocks are retrieved from, wherein the selected one of the first deduplication process or the second deduplication process determines whether remaining portions of the data blocks to be stored at the particular storage device match other data blocks currently stored at the particular storage device, and wherein when performing the second deduplication process, the processing device is further configured to: retrieve the other data blocks currently stored at the particular storage device that are associated with the data block currently stored at the particular storage device; and determine whether one or more of the remaining portions of the data blocks to be stored at the particular storage device match the other data blocks; and perform, by the processing device, the selected one of the first deduplication process or the second deduplication process.
2. The storage system of claim 1, wherein to perform the first deduplication process, the processing device is further configured to: generate hash values for one or more remaining portions of the data blocks to be stored at the particular storage device; and determine whether one or more of the hash values for the one or more remaining portions of the data blocks match one or more corresponding hash values of the other data blocks currently stored at the particular storage device.
3. The storage system of claim 1, wherein the other data blocks are physically proximate to the data block currently stored at the particular storage device.
4. The storage system of claim 1, wherein the one or more performance metrics comprise corresponding amounts of overhead to complete the first deduplication process and the second deduplication process.
5. The storage system of claim 1, wherein the processing device offloads management of the flash memory from a controller of the storage device.
6. A method comprising: generating a hash value for a portion of data blocks to be stored at a storage device comprising flash memory; determining that the hash value matches a corresponding hash value of a data block currently stored at the storage device; selecting, by a processing device, one of a first deduplication process or a second deduplication process to be performed by the processing device based on one or more performance metrics, the one or more performance metrics comprise a type of storage medium the other data blocks are retrieved from, wherein the selected one of the first deduplication process or the second deduplication process determines whether remaining portions of the data blocks to be stored at the storage device match other data blocks currently stored at the storage device, and wherein when performing the second deduplication process, the processing device is further configured to: retrieving the other data blocks currently stored at the storage device that are associated with the data block currently stored at the storage device; and determining whether one or more of the remaining portions of the data blocks to be stored at the storage device match the other data blocks; and performing, by the processing device, the selected one of the first deduplication process or the second deduplication process.
7. The method of claim 6, wherein performing the first deduplication process comprises: generating hash values for one or more remaining portions of the data blocks to be stored at the storage device; and determining whether one or more of the hash values for the one or more remaining portions of the data blocks match one or more corresponding hash values of the other data blocks currently stored at the storage device.
8. The method of claim 6, wherein the other data blocks are physically proximate to the data block currently stored at the storage device.
9. The method of claim 6, wherein the one or more performance metrics comprise corresponding amounts of overhead to complete the first deduplication process and the second deduplication process.
10. The method of claim 6, wherein the processing device is external to the storage device.
11. A non-transitory computer-readable storage medium storing instructions which, when executed by a processing device, cause the processing device to: generate a hash value for a portion of data blocks to be stored at a storage device comprising flash memory; determine that the hash value matches a corresponding hash value of a data block currently stored at the storage device; select one of a first deduplication process or a second deduplication process to be performed by the processing device based on one or more performance metrics, the one or more performance metrics comprise a type of storage medium the other data blocks are retrieved from, wherein the selected one of the first deduplication process or the second deduplication process determines whether remaining portions of the data blocks to be stored at the storage device match other data blocks currently stored at the storage device; retrieve the other data blocks currently stored at the storage device that are associated with the data block currently stored at the storage device; and determine whether one or more of the remaining portions of the data blocks to be stored at the storage device match the other data blocks; and perform, by the processing device, the selected one of the first deduplication process or the second deduplication process.
12. The non-transitory computer-readable storage medium of claim 11, wherein to perform the first deduplication process, the processing device is further configured to: generate hash values for one or more remaining portions of the data blocks to be stored at the storage device; and determine whether one or more of the hash values for the one or more remaining portions of the data blocks match one or more corresponding hash values of the other data blocks currently stored at the storage device.
13. The non-transitory computer-readable storage medium of claim 11, wherein the other data blocks are physically proximate to the data block currently stored at the storage device.
14. The non-transitory computer-readable storage medium of claim 11, wherein the one or more performance metrics comprise corresponding amounts of overhead to complete the first deduplication process and the second deduplication process.
15. The non-transitory computer-readable storage medium of claim 11, wherein the processing device offloads management of the flash memory from a controller of the storage device.

CROSS REFERENCE TO RELATED PATENTS

This is a continuation application for patent entitled to a filing date and claiming the benefit of earlier-filed U.S. Pat. No. 11,704,036, issued Jul. 18, 2023, which is a continuation of U.S. Pat. No. 10,133,503, filed Nov. 20, 2018, which is a non-provisional application of U.S. Provisional Application No. 62/330,728, filed May 2, 2016, each of which is herein incorporated by reference in its entirety.

Foreign Referenced Citations (6)

Number	Date	Country
2164006	Mar 2010	EP
2256621	Dec 2010	EP
0213033	Feb 2002	WO
2008103569	Aug 2008	WO
2008157081	Dec 2008	WO
2013032825	Mar 2013	WO

Non-Patent Literature Citations (24)

Entry
Hwang et al., “RAID-x: A New Distributed Disk Array for I/O-Centric Cluster Computing”, Proceedings of The Ninth International Symposium On High-performance Distributed Computing, Aug. 2000, pp. 279-286, The Ninth International Symposium on High-Performance Distributed Computing, IEEE Computer Society, Los Alamitos, CA.
International Search Report and Written Opinion, PCT/US2015/018169, May 15, 2015, 10 pages.
International Search Report and Written Opinion, PCT/US2015/034291, Sep. 30, 2015, 3 pages.
International Search Report and Written Opinion, PCT/US2015/034302, Sep. 11, 2015, 10 pages.
International Search Report and Written Opinion, PCT/US2015/039135, Sep. 18, 2015, 8 pages.
International Search Report and Written Opinion, PCT/US2015/039136, Sep. 23, 2015, 7 pages.
International Search Report and Written Opinion, PCT/US2015/039137, Oct. 1, 2015, 8 pages.
International Search Report and Written Opinion, PCT/US2015/039142, Sep. 24, 2015, 3 pages.
International Search Report and Written Opinion, PCT/US2015/044370, Dec. 15, 2015, 3 pages.
International Search Report and Written Opinion, PCT/US2016/014356, Jun. 28, 2016, 3 pages.
International Search Report and Written Opinion, PCT/US2016/014357, Jun. 29, 2016, 3 pages.
International Search Report and Written Opinion, PCT/US2016/014361, May 30, 2016, 3 pages.
International Search Report and Written Opinion, PCT/US2016/014604, May 19, 2016, 3 pages.
International Search Report and Written Opinion, PCT/US2016/016504, Jul. 6, 2016, 7 pages.
International Search Report and Written Opinion, PCT/US2016/023485, Jul. 21, 2016, 13 pages.
International Search Report and Written Opinion, PCT/US2016/024391, Jul. 12, 2016, 11 pages.
International Search Report and Written Opinion, PCT/US2016/026529, Jul. 19, 2016, 9 pages.
International Search Report and Written Opinion, PCT/US2016/031039, Aug. 18, 2016, 7 pages.
International Search Report and Written Opinion, PCT/US2016/033306, Aug. 19, 2016, 11 pages.
International Search Report and Written Opinion, PCT/US2016/047808, Nov. 25, 2016, 14 pages.
Kim et al., “Data Access Frequency based Data Replication Method using Erasure Codes in Cloud Storage System”, Journal of the Institute of Electronics and Information Engineers, Feb. 2014, vol. 51, No. 2, 7 pages.
Schmid, “RAID Scaling Charts, Part 3: 4-128 KB Stripes Compared”, Tom's Hardware, Nov. 27, 2007, URL: http://www.tomshardware.com/reviews/RAID-SCALING-CHARTS.1735-4.html, 24 pages.
Stalzer, “FlashBlades: System Architecture and Applications”, Proceedings of the 2nd Workshop on Architectures and Systems for Big Data, Jun. 2012, pp. 10-14, Association for Computing Machinery, New York, NY.
Storer et al., “Pergamum: Replacing Tape with Energy Efficient, Reliable, Disk-Based Archival Storage”, FAST'08: Proceedings of the 6th USENIX Conference on File and Storage Technologies, Article No. 1, Feb. 2008, pp. 1-16, USENIX Association, Berkeley, CA.

Related Publications (1)

	Number	Date	Country
	20230359381 A1	Nov 2023	US

Provisional Applications (1)

	Number	Date	Country
	62330728	May 2016	US

Continuations (2)

	Number	Date	Country
Parent	16194119	Nov 2018	US
Child	18353264		US
Parent	15333903	Oct 2016	US
Child	16194119		US

Abstract