The present disclosure concerns flash storage devices and, more particularly, offline deduplication processes for flash storage devices.
Solid-state storage devices (SSDs) may use flash memory as a non-volatile storage medium. A deduplication or dedupe process allows for more efficient use of space. In a deduplication process, duplicate data entries are removed. Rather than storing multiple copies of the same data at multiple physical addresses on the storage device, only one copy of the data is stored at a physical address, with references to that one copy replacing the other copies. Deduplication may be performed inline, as a write command is received from a host. Before writing the data, the data is compared against data already stored on the storage device. If a match is found, a reference to that match is used, rather than writing the data to a new physical address. However, this inline dedupe may add latency to write operations.
According to aspects of the subject technology, a method for managing a flash storage system is provided. The method includes reading a plurality of flash data units from flash memory into a buffer, wherein each of the plurality of flash data units includes one or more host data units, and determining an identifier for each of the host data units read into the buffer. The method includes selecting a set of unique identifiers from the determined identifiers based on a number of host data units that share the respective unique identifiers. For each unique identifier in the set of unique identifiers, the method includes designating a first host data unit sharing the unique identifier as a master data unit, wherein a logical address of the first host data unit is mapped to a first physical address in the flash memory in a lookup table, remapping, in the lookup table, respective logical addresses of one or more second host data units sharing the unique identifier from respective second physical addresses in the flash memory to the first physical address in the flash memory, and invalidating data stored at the respective second physical addresses in the flash memory.
According to other aspects of the subject technology, a flash storage system is provided. The flash storage system includes a plurality of flash memory devices, a memory comprising a buffer, and a controller. The controller is configured to read a plurality of flash data units from the plurality of flash memory devices into the buffer, wherein each of the plurality of flash data units includes one or more host data units, determine an identifier for each of the host data units read into the buffer, and select a set of unique identifiers from the determined identifiers based on a number of host data units that share the respective unique identifiers. For each unique identifier in the set of unique identifiers, the controller is configured to designate a first host data unit sharing the unique identifier as a master data unit, wherein a logical address of the first host data unit is mapped to a first physical address in the flash memory device in a lookup table, remap, in the lookup table, respective logical addresses of one or more second host data units sharing the unique identifier from respective second physical addresses in the flash memory device to the first physical address in the flash memory device, and invalidate data stored at the respective second physical addresses in the flash memory device.
According to other aspects of the subject technology, a non-transitory machine-readable medium comprises instructions stored therein, which when executed by a machine, cause the machine to perform operations. The operations include reading a plurality of flash data units from flash memory into a buffer, wherein each of the plurality of flash data units includes one or more host data units, determining an identifier for each of the host data units read into the buffer, and selecting a set of unique identifiers from the determined identifiers based on a number of host data units that share the respective unique identifiers. For each unique identifier in the set of unique identifiers, the operations include designating a first host data unit sharing the unique identifier as a master data unit, wherein a logical address of the first host data unit is mapped to a first physical address in the flash memory in a lookup table, remapping, in the lookup table, respective logical addresses of one or more second host data units sharing the unique identifier from respective second physical addresses in the flash memory to the first physical address in the flash memory, and invalidating data stored at the respective second physical addresses in the flash memory.
It is understood that other configurations of the subject technology will become readily apparent to those skilled in the art from the following detailed description, wherein various configurations of the subject technology are shown and described by way of illustration. As will be realized, the subject technology is capable of other and different configurations and its several details are capable of modification in various other respects, all without departing from the scope of the subject technology. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
The detailed description set forth below is intended as a description of various configurations of the subject technology and is not intended to represent the only configurations in which the subject technology may be practiced. The appended drawings are incorporated herein and constitute a part of the detailed description. The detailed description includes specific details for the purpose of providing a thorough understanding of the subject technology. However, the subject technology may be practiced without these specific details. In some instances, structures and components are shown in block diagram form in order to avoid obscuring the concepts of the subject technology.
An SSD may include one or more flash memory devices, each of which comprises an array of flash memory cells. The flash memory cells may be organized into physical blocks, with each physical block comprising a number of pages. Data is written to flash memory in write units of pages, where each page has the capacity to store a predetermined number of host data units or sectors. Host data files may be written sequentially to flash memory, at the next available location. However, data is erased from flash memory in erase units of physical blocks. The SSD may perform maintenance operations, which may help manage data storage/utilization and lifespan of the flash memory devices.
In a deduplication or dedupe process, storage space is more efficiently utilized by eliminating duplicate data units. During inline dedupe, when a write command is received from the host, the host data unit to be written is compared against host data units stored in the storage device. If a match is found, the target logical address of the write command is mapped to the physical address of the matching host data unit. If a match is not found, the host data unit is written to an available physical address, and the target logical address is mapped to the written physical address. However, the dedupe process may add delay to the completion of the write command. Applying deduplication during maintenance operations (offline deduplication), such as garbage collection (GC), may avoid the write latency during host write commands.
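The inline write path described above can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the names `inline_write`, `lookup`, `digest_index`, and `flash` are hypothetical, flash is modeled as a Python list indexed by physical address, and the use of a SHA-256 digest index to find candidate matches is an assumption (the text only states that the incoming host data unit is compared against stored host data units).

```python
import hashlib

def inline_write(lba, data, lookup, digest_index, flash):
    """Handle one host write with inline dedupe; return the PBA mapped to lba.

    lookup:       dict mapping logical address -> physical address
    digest_index: dict mapping digest -> physical address (assumed helper index)
    flash:        list standing in for flash, indexed by physical address
    """
    digest = hashlib.sha256(data).digest()
    pba = digest_index.get(digest)
    # Confirm the candidate byte-for-byte before treating it as a match.
    if pba is not None and flash[pba] == data:
        lookup[lba] = pba          # match found: map to the existing copy
        return pba
    pba = len(flash)               # no match: write to the next available address
    flash.append(data)
    digest_index[digest] = pba
    lookup[lba] = pba
    return pba
```

The hash and comparison sit on the critical path of every host write, which is the latency that offline deduplication is intended to avoid.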
The interface 115 provides physical and electrical connections between the host 150 and the flash storage system 110. The interface 115 is configured to facilitate communication of data, commands, and/or control signals between the host 150 and the flash storage system 110 via the physical and electrical connections. The connection and the communications with the interface 115 may be based on a standard interface such as Universal Serial Bus (USB), Small Computer System Interface (SCSI), Serial Advanced Technology Attachment (SATA), etc. Alternatively, the connection and/or communications may be based on a proprietary interface. Those skilled in the art will recognize that the subject technology is not limited to any particular type of interface.
The controller 120 manages the flow of data between the host 150 and the flash memory devices 130. The controller 120 is configured to receive commands and data from the host 150 via the interface 115. For example, the controller 120 may receive data and a write command from the host 150 to write the data in the flash memory devices 130. The controller 120 is further configured to send data to the host 150 via the interface 115. For example, the controller 120 may read data from the flash memory devices 130 and send the data to the host 150 in response to a read command. The controller 120 is further configured to manage data stored in the flash memory devices 130 and the memory 125 based on internal control algorithms or other types of commands that may be received from the host 150. For example, the controller 120 is configured to perform GC and other maintenance operations. Those skilled in the art will be familiar with other conventional operations performed by a controller in a flash storage device, which will not be described in detail herein.
The controller 120 may be implemented with a general purpose processor, micro-controller, digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or any combination thereof designed and configured to perform the operations and functions described herein. In certain implementations, the controller 120 may include the processor 122, which may be a specialized processor for a specific operation, such as calculating a Secure Hash Algorithm (SHA). The controller 120 may perform the operations and functions described herein by executing one or more sequences of instructions stored on a machine/computer readable medium. The machine/computer readable medium may be the flash memory devices 130, the memory 125, or other types of media from which the controller 120 can read instructions or code. For example, flash storage system 110 may include a read only memory (ROM), such as an EPROM or EEPROM, encoded with firmware/software comprising one or more sequences of instructions read and executed by the controller 120 during the operation of the flash storage system 110.
The flash memory devices 130 may each be a single flash memory chip or may represent groups of multiple flash memory chips. The flash memory devices 130 may be organized among multiple channels through which data is read from and written to the flash memory devices 130 by the controller 120, or coupled to a single channel. The flash memory devices 130 may be implemented using NAND flash.
The flash memory devices 130 comprise multiple memory cells divided into storage blocks. These storage blocks may be referred to as data blocks or memory blocks and are addressable by the controller 120 using a physical block address. Each of the storage blocks is further divided into multiple data segments or pages addressable by the controller 120 using a physical page address or offset from a physical block address of the storage block containing the referenced page. The pages may store sectors or other host data units. The storage blocks represent the units of data that are erased within the flash memory devices 130 in a single erase operation. The physical pages represent the units of data that are read from or written to the flash memory devices 130 in a single read or write operation.
The subject technology is not limited to any particular capacity of flash memory device. For example, storage blocks may each comprise 32, 64, 128, or 512 pages. Additionally, pages may each comprise 512 bytes, 2 KB, 4 KB, or 32 KB. The sectors may each comprise 4 KB, or other sizes such that sectors may be the same size as a page, or there may be multiple sectors per page.
Returning to
The host 150 may be a computing device, such as a computer/server, a smartphone, or any other electronic device that reads data from and writes data to the flash storage system 110. The host 150 may have an operating system or other software that issues read and write commands to the flash storage system 110. The flash storage system 110 may be integrated with the host 150 or may be external to the host 150. The flash storage system 110 may be wirelessly connected to the host 150, or may be physically connected to the host 150.
The controller 120 is configured to perform maintenance operations on the flash memory devices 130. For example, the controller 120 may determine that GC should be performed when the number of available blocks falls below a threshold. The controller 120 may also keep track of the Program/Erase (P/E) cycles of each block for wear leveling purposes.
Once GC is triggered, the block 240A, for example, may be selected based on the amount of invalid data units contained in the block 240A, an error count associated with the block 240A, or other parameters such as P/E cycles. For instance, even if the block 240B could be selected for GC—having a significant amount of invalid data—the block 240A may be selected instead based on the respective P/E cycles of the blocks 240A and 240B.
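One possible selection policy consistent with the example above is sketched below. The policy itself is an assumption: the disclosure lists invalid data, error count, and P/E cycles as possible parameters but does not fix how they are combined. Here, blocks with enough invalid data are candidates, the candidate with the fewest P/E cycles is chosen, and error count is omitted for brevity. `BlockStats`, `select_gc_block`, and `min_invalid` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class BlockStats:
    name: str
    invalid_units: int   # number of invalid host data units in the block
    pe_cycles: int       # program/erase cycles endured so far

def select_gc_block(blocks, min_invalid=1):
    """Pick a GC victim: fewest P/E cycles among blocks with enough invalid data.

    Ties on P/E cycles are broken in favor of the block with more invalid data.
    Returns None when no block qualifies.
    """
    candidates = [b for b in blocks if b.invalid_units >= min_invalid]
    if not candidates:
        return None
    return min(candidates, key=lambda b: (b.pe_cycles, -b.invalid_units))
```

Under this policy, a block with more invalid data can still lose to a block with fewer P/E cycles, matching the example in which the block 240A is preferred over the block 240B.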
Once the block 240A is selected for GC, the valid data from the block 240A is copied to an empty block, such as the block 240D, and the block 240A is erased. However, the controller 120 may be configured to perform deduplication before copying the valid data from the block 240A to the block 240D.
With a block, such as the block 240A, selected for a maintenance operation, such as GC, the deduplication process may start by reading a page of the selected block, such as the page 245A1. At step 310, a plurality of flash data units are read from flash memory into a buffer, wherein each of the plurality of flash data units includes one or more host data units. For instance, the page 245A1's data may be read into the buffer 127. For better deduplication performance, more data may be read into the buffer, as analyzing more data increases a likelihood of finding duplicate data. Accordingly, the remaining pages of the band associated with the read page, for instance the band 251, may also be read into the buffer. Because the other pages of the band are on different channels, the pages may be read in parallel, such that reading the entire band may not add significant read overhead. In certain implementations, the maintenance operation may select a certain number of flash data units, such as half of a band or stripe, such that not all flash memory devices may be analyzed.
At step 320, an identifier for each of the host data units read into the buffer is determined. For instance, a hash algorithm, such as a SHA, may be applied to each host data unit to create respective digests. Although a SHA is discussed herein, other types of identifiers may be utilized rather than a SHA. The identifiers are determined such that the same data creates the same identifier, in order to identify duplicate data entries. The identifiers may be determined by the controller 120. In certain implementations, a small hardware processor, such as the processor 122, may be used to calculate the SHA. The processor 122 may be specifically configured to calculate the SHA, in order to increase performance and allow the controller 120 to perform other operations in parallel.
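Step 320 can be illustrated with a short sketch, assuming SHA-256 as the hash and a 4 KB host data unit; both are assumptions chosen for the example, and `determine_identifiers` is a hypothetical name. The buffer is modeled as a list of byte strings, one per flash data unit (page).

```python
import hashlib

def determine_identifiers(buffer, unit_size=4096):
    """Return (page_index, unit_index, digest) for every host data unit.

    buffer:    list of bytes objects, one per flash data unit read in step 310
    unit_size: assumed host data unit (sector) size in bytes
    """
    identifiers = []
    for page_index, page in enumerate(buffer):
        for offset in range(0, len(page), unit_size):
            unit = page[offset:offset + unit_size]
            digest = hashlib.sha256(unit).digest()
            identifiers.append((page_index, offset // unit_size, digest))
    return identifiers
```

Identical host data units yield identical digests, which is the property the later steps rely on to group duplicates.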
At step 330, a set of unique identifiers is selected from the determined identifiers based on a number of host data units that share the respective unique identifiers. The selection may be based on analyzing a data structure, such as a B tree, containing the identifiers determined in step 320. For instance, the controller 120 may construct a B tree in the memory 125 as each identifier is determined in step 320.
The new key is added to the B tree based on the value of the digest. The digests may be compared, for example, by alphanumeric value, such that a first digest may be considered greater than, less than, or equal to a second digest. Starting at the root node (node 410A in
Once the identifiers of all the host data units are entered into the B tree, the B tree may be complete. The set of unique identifiers may be selected based on the count values stored with each key. For example, the identifiers corresponding to the largest counts may be selected. A threshold number of identifiers, such as 16, may be selected. Alternatively, a threshold percentage of the total number of host data units examined, such as 15%, may be used, such that the identifiers having the largest counts are selected until their aggregate count reaches the threshold percentage without exceeding it.
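The two count-based selection rules can be sketched as follows. Note the substitution: a `collections.Counter` stands in for the B tree, since only the per-identifier counts matter for selection. The function names and the reading of "reach without exceeding" (stop at the first identifier whose count would push the aggregate past the threshold) are assumptions.

```python
from collections import Counter

def count_identifiers(digests):
    """Tally how many host data units share each identifier."""
    return Counter(digests)

def select_top(counts, max_ids=16):
    """Select up to max_ids identifiers with the largest counts."""
    return [digest for digest, _ in counts.most_common(max_ids)]

def select_by_percentage(counts, total_units, percent=15):
    """Select largest-count identifiers while their aggregate count stays
    at or below percent% of the host data units examined."""
    selected, running = [], 0
    for digest, count in counts.most_common():
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if (running + count) * 100 > total_units * percent:
            break
        selected.append(digest)
        running += count
    return selected
```

With 20 host data units where one identifier occurs five times and another three times, a 40% threshold (eight units) admits both, while a 30% threshold (six units) admits only the first.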
Alternatively, only identifiers having counts greater than a lower threshold count may be selected, to prevent selection of identifiers whose counts are too small to provide a deduplication benefit. For instance, the 16 largest counts may include a count of two, which may not provide significant deduplication. Identifiers having counts greater than an upper threshold count may be excluded as well. Although deduplicating a large count would provide increased storage efficiency, data management may become burdensome. If the count is large enough, then multiple read commands may require reading from the same physical address within a short time period, which may create a bottleneck and cause read delay. For instance, if every host data unit contained the same data, then all read commands would require access to one physical address.
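The lower and upper count thresholds can be expressed as a simple filter. The default threshold values below are hypothetical and chosen only for illustration; `filter_by_thresholds` is likewise a hypothetical name.

```python
def filter_by_thresholds(counts, lower=2, upper=1000):
    """Keep identifiers whose count is greater than lower and not greater than upper.

    counts: dict mapping identifier -> number of host data units sharing it
    lower:  counts at or below this give too little benefit (assumed value)
    upper:  counts above this risk a read bottleneck (assumed value)
    """
    return {d: c for d, c in counts.items() if lower < c <= upper}
```

For example, with counts of 2, 5, and 5000, only the identifier with a count of 5 survives the default thresholds.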
For each unique identifier in the set of unique identifiers, steps 340, 350, and 360 are performed. At step 340, a first host data unit sharing the unique identifier is designated as a master data unit, wherein a logical address of the first host data unit is mapped to a first physical address in the flash memory in a lookup table.
At step 350, respective logical addresses of one or more second host data units sharing the unique identifier are remapped from respective second physical addresses in the flash memory to the first physical address in the flash memory. In
At step 360, data stored at the respective second physical addresses in the flash memory is invalidated. Because the LBAs for the deduped entries point to the master data unit's PBA, the PBAs previously pointed to may be later freed. The maintenance operation, such as the GC operation in which the flash data unit was originally selected, may continue. Because deduped PBAs are now marked invalid, there may be less valid data to copy for the GC operation than there was before the deduplication. In other words, more invalid host data units may be reclaimed by GC.
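Steps 340 through 360 for a single unique identifier can be sketched as below. The table layouts are assumptions made for the example: the lookup table is a dict from LBA to PBA, the backtrace table is a dict from a master PBA to the list of LBAs mapped to it (including the master's own LBA), and validity is a dict from PBA to a boolean. Choosing the first listed LBA as the master is also an assumption.

```python
def dedupe_identifier(lbas, lookup, backtrace, valid):
    """Deduplicate the host data units at `lbas`, which share one identifier.

    lookup:    dict LBA -> PBA (lookup table)
    backtrace: dict master PBA -> list of LBAs mapped to it (assumed layout)
    valid:     dict PBA -> bool (True while the data at that PBA is valid)
    Returns the master PBA.
    """
    # Step 340: designate the first host data unit as the master data unit.
    master_lba, *duplicates = lbas
    master_pba = lookup[master_lba]
    backtrace.setdefault(master_pba, []).append(master_lba)

    for lba in duplicates:
        old_pba = lookup[lba]
        if old_pba == master_pba:
            continue                       # already points at the master
        lookup[lba] = master_pba           # step 350: remap to the master PBA
        backtrace[master_pba].append(lba)
        valid[old_pba] = False             # step 360: invalidate the old copy
    return master_pba
```

After the call, the invalidated PBAs hold no valid data, so a subsequent GC pass has fewer host data units to copy.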
In addition, the B tree may be deleted from memory after the deduplication is complete. Because the dedupe operation is not inline, the B tree does not need to be maintained for new write commands, and the memory may be freed for other operations. If a deduped band is later selected for dedupe, the B tree may be recreated based on the backtrace table and/or the dedupe flagged entries in the lookup table.
As data in the flash storage system is modified, such as being updated or erased, deduped LBAs may require updating.
At step 620, a reverse mapping of the physical address of the master data unit to the logical address of the target host data is removed from the backtrace table. When the target host data unit is not a master data unit, the corresponding PBA is unaffected, and the respective entries in the lookup table and the backtrace table are updated. For instance,
At step 622, a host data unit of the new physical address is designated as the master data unit. For instance, PBA 40 may be designated as the new master data unit.
At step 632, respective logical addresses of the second host data units are remapped to the new physical address in the lookup table. For example, in
At step 642, reverse mappings of the physical address of the target host data unit are updated, in the backtrace table, to reverse mappings of the new physical address. In the backtrace table, the old master PBA is updated to the new master PBA. For example, in
At step 652, the data stored at the physical address of the target host data unit is invalidated. The old master PBA is marked as invalid to complete the modification. The invalidated PBA may later be erased and freed in a GC operation. In certain implementations, the processes of flowcharts 600 and 602 may be combined in order to modify a specific data unit. For instance, if a particular deduped LBA and PBA were to be modified, the processes of flowcharts 600 and 602 may be utilized to update the lookup and backtrace tables.
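The table updates for the two cases can be sketched as follows, under stated assumptions. The lookup table is modeled as a dict from LBA to PBA, the backtrace table as a dict from master PBA to the list of LBAs mapped to it, and validity as a dict from PBA to a boolean. The function names are hypothetical, and `move_master` assumes the master data has already been written to the new physical address before the tables are updated.

```python
def update_deduped_lba(lba, new_pba, lookup, backtrace):
    """Non-master case: point a deduped LBA at newly written data and remove
    its reverse mapping from the backtrace table (step 620)."""
    master_pba = lookup[lba]
    lookup[lba] = new_pba
    backtrace[master_pba].remove(lba)

def move_master(old_pba, new_pba, lookup, backtrace, valid):
    """Master case: make new_pba the master (step 622), remap the remaining
    LBAs (step 632), move the reverse mappings (step 642), and invalidate
    the old master PBA (step 652)."""
    lbas = backtrace.pop(old_pba)
    for lba in lbas:
        lookup[lba] = new_pba
    backtrace[new_pba] = lbas
    valid[new_pba] = True
    valid[old_pba] = False
```

Keeping the reverse mappings in the backtrace table means the remap touches only the LBAs that actually reference the master, rather than scanning the entire lookup table.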
Deduplication for the flash storage system may be enabled by a user, for example, through a user interface accessible through the host. The user interface may be implemented as an application or as a webpage accessible through a browser on the host. The user may also select certain options or parameters for deduplication of the flash storage system. For instance, a percentage of dedupe capability may be selected such that a threshold percentage of the total available space of the flash storage system may be deduped. Once the threshold percentage has been deduped, the flash storage system may not further dedupe data, unless the percentage falls below the threshold percentage. In addition, certain data patterns may be stored normally without deduplication. For example, the user may designate specific data patterns that would not be selected for deduplication. Such patterns may include common patterns (such as all zeros), which may create significant read latency or management burdens. Deduplication may be disabled by the user, which may require formatting the flash storage system.
Deduplication may be prioritized for cold bands of data. The flash storage system may keep track of hot data bands, for example based on the frequency of LBAs' data changing. Performing dedupe on cold data minimizes bookkeeping, such as updating the lookup table and backtrace table. Because cold data may be updated less frequently, master data units may be updated less frequently, which may reduce write amplification. For example, when GC selects a band, deduplication may not be performed if the band is considered hot.
Deduplication may increase read amplification, as additional reads may be needed to read data for deduped entries. However, because fewer writes are required and less storage space is utilized, the frequency of GC or other maintenance operations may be reduced, and the overall lifetime of the flash storage system may be increased. In addition, because the deduplication is not performed during host write operations, the deduplication does not add latency to host write operations. Because copying the valid data requires writing data, reducing the amount of data written may increase the lifespan of the corresponding flash memory device.
Offline deduplication may be easier to interrupt than inline deduplication. For example, with offline dedupe, data is already stored on the flash storage device, and the mappings are updated. If a new read command is received during offline deduplication, the data may be read, because the data is still stored as previously written, or the mappings may have already been updated through dedupe. If a new write command is received during offline deduplication, the data is generally written to a new physical address, rather than the physical addresses being examined for dedupe. During offline dedupe, the GC commands may have lower priority than host write commands, providing greater performance benefits to the host. In addition, deduplication may optimize storage utilization as more free space may be available.
Those of skill in the art would appreciate that the various illustrative blocks, modules, elements, components, methods, and algorithms described herein may be implemented as electronic hardware, computer software, or combinations of both. To illustrate this interchangeability of hardware and software, various illustrative blocks, modules, elements, components, methods, and algorithms have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application. Various components and blocks may be arranged differently (e.g., arranged in a different order, or partitioned in a different way) all without departing from the scope of the subject technology.
It is understood that the specific order or hierarchy of steps in the processes disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged. Some of the steps may be performed simultaneously. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more. Pronouns in the masculine (e.g., his) include the feminine and neuter gender (e.g., her and its) and vice versa. Headings and subheadings, if any, are used for convenience only and do not limit the invention.
A phrase such as an “aspect” does not imply that such aspect is essential to the subject technology or that such aspect applies to all configurations of the subject technology. A disclosure relating to an aspect may apply to all configurations, or one or more configurations. A phrase such as an aspect may refer to one or more aspects and vice versa. A phrase such as a “configuration” does not imply that such configuration is essential to the subject technology or that such configuration applies to all configurations of the subject technology. A disclosure relating to a configuration may apply to all configurations, or one or more configurations. A phrase such as a configuration may refer to one or more configurations and vice versa.
The word “exemplary” is used herein to mean “serving as an example or illustration.” Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs.
All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. No claim element is to be construed under the provisions of 35 U.S.C. §112, sixth paragraph, unless the element is expressly recited using the phrase “means for” or, in the case of a method claim, the element is recited using the phrase “step for.” Furthermore, to the extent that the term “include,” “have,” or the like is used in the description or the claims, such term is intended to be inclusive in a manner similar to the term “comprise” as “comprise” is interpreted when employed as a transitional word in a claim.