Deallocated block determination

Information

  • Patent Grant
  • 11995318
  • Patent Number
    11,995,318
  • Date Filed
    Thursday, April 6, 2023
  • Date Issued
    Tuesday, May 28, 2024
Abstract
A first data block on a storage device including a data structure of deallocated data blocks on the storage device and a corresponding program erase count value for each of the deallocated data blocks is identified. A determination as to whether a second data block from the data structure of deallocated data blocks remains deallocated after being added to the data structure of deallocated data blocks based on the program erase count value is made. The data is stored at the second data block upon determining that the second data block remains deallocated after being added to the data structure of deallocated data blocks.
Description
BACKGROUND

As computer memory storage and data bandwidth increase, so do the amount and complexity of data that businesses manage daily. Large-scale distributed storage systems, such as data centers, typically run many business operations. A data center, which may also be referred to as a server room, is a centralized repository, either physical or virtual, for the storage, management, and dissemination of data pertaining to one or more businesses. A distributed storage system may be coupled to client computers interconnected by one or more networks. If any portion of the distributed storage system has poor performance, company operations may be impaired. A distributed storage system therefore maintains high standards for data availability and high-performance functionality.





BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings.



FIG. 1 is a block diagram illustrating a storage system in which embodiments of the present disclosure may be implemented.



FIG. 2 is a block diagram illustrating a memory manager in a storage controller, according to an embodiment.



FIG. 3 is a flow diagram illustrating a method for performing a metadata scan to populate a data structure, according to an embodiment.



FIG. 4 is a flow diagram illustrating a method for identifying a deallocated data block on a storage device, according to an embodiment.



FIG. 5 is a flow diagram illustrating a method for identifying leading edge data on a storage device, according to an embodiment.



FIG. 6 is a block diagram illustrating an exemplary computer system, according to an embodiment.





DETAILED DESCRIPTION

Embodiments are described for efficient flash management for multiple controllers. In one embodiment, a memory manager module running on a storage controller utilizes physical block addressing rather than logical addressing to manage the data stored on the underlying storage devices in a connected storage array. The memory manager may abide by certain principles, including having no preference for particular physical addresses, such that data does not have a “primary” location, “secondary” location, etc., but rather is just scattered randomly about the drive. Another principle of operation is to avoid writing a “trickle” of tiny metadata updates because, as the drive's state evolves, it may be best to use only the metadata persisted into flash blocks, along with periodically persisted data managed by the memory manager.


In one embodiment, the memory manager described herein achieves these principles by meeting at least three individual objectives. First, the memory manager allows a fast start of the primary storage controller by quickly locating recently-written flash blocks of one or more types of data. Second, the memory manager provides a list of flash blocks that are deallocated and ready for client writes. Third, the memory manager respects the needs of modern flash devices by delaying the erase of flash blocks until just before those blocks are needed for writing.


To accomplish the above objectives, the memory manager works in connection with firmware in the underlying storage devices. In one embodiment, the storage device firmware implements a metadata tracking scheme that stores certain values along with any data payload. These values may include a program/erase count for each data block that indicates a number of cycles during which the block has been written and erased, and a block type value, which may identify a storage client that owns the data in that block. In addition, the storage device firmware maintains a table or other data structure containing data for each data block and allows the memory manager to access the table. For each data block, the table also stores the program/erase count, the block type value, and a block status indicator (erased, written, unreadable, bad, etc.). On power-up of the storage device, the firmware may scan the data blocks to recover the embedded metadata and populate the data structure with the metadata recovered from those blocks.
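The per-block table described above can be pictured with a minimal sketch. The class names, field names, and sample values below are assumptions made for illustration only; the disclosure does not specify a concrete layout for the table.

```python
from dataclasses import dataclass
from enum import Enum


class BlockStatus(Enum):
    # Status values named in the description; the enum itself is hypothetical.
    ERASED = "erased"
    WRITTEN = "written"
    UNREADABLE = "unreadable"
    BAD = "bad"


@dataclass
class BlockMetadata:
    pe_count: int         # program/erase cycles performed on the block
    block_type: int       # may identify the storage client that owns the data
    status: BlockStatus   # block status indicator


# Hypothetical table contents, keyed by physical block number.
table = {
    0: BlockMetadata(pe_count=12, block_type=1, status=BlockStatus.WRITTEN),
    1: BlockMetadata(pe_count=7, block_type=0, status=BlockStatus.ERASED),
    2: BlockMetadata(pe_count=3000, block_type=2, status=BlockStatus.BAD),
}


def blocks_of_type(table, block_type):
    """Return the block numbers whose block type value matches."""
    return sorted(n for n, m in table.items() if m.block_type == block_type)
```

A memory manager with access to such a table could, for example, call `blocks_of_type(table, 1)` to list the blocks owned by one client without reading any block payload.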


The memory manager manages blocks in concert with a data structure called the “frontier set.” The frontier set is a data structure that is written to flash that declares the state of a storage device in a way that allows future readers to determine not only what was true at that point in time, but also to recover the effect of operations that occurred after the frontier set was written to flash. The frontier set, in its most primitive form, is simply a list of block numbers and their corresponding program/erase counts. This is a declaration that at the moment the frontier set was created, block X was deallocated when its program/erase count was Y. The memory manager can use the frontier set to extrapolate the state of a block beyond the moment the frontier set was created and written. If block X was deallocated when its program/erase count was Y, the memory manager can make at least two logical conclusions. First, a future reader that finds block X still at program/erase count Y can conclude that block must still be deallocated. Second, a future reader that finds block X at some program/erase count Z>Y can conclude that some client must have written new data to that block, after this frontier set was created or updated.
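The two conclusions above can be expressed as a small comparison. This is a sketch only: the function name, the returned labels, and the sample frontier set are hypothetical, and the handling of a count lower than the recorded one is an assumption (the description covers only the equal and greater cases).

```python
def block_state(frontier_pe_count, current_pe_count):
    """Infer a block's state from the count recorded in the frontier set (Y)
    and the count currently observed for the block (Z)."""
    if current_pe_count == frontier_pe_count:
        # Z == Y: nothing has been written since the frontier set was created.
        return "deallocated"
    if current_pe_count > frontier_pe_count:
        # Z > Y: some client wrote new data after the frontier set was created.
        return "rewritten"
    # Z < Y is not discussed in the description; treated here as an error.
    raise ValueError("program/erase count is lower than the recorded value")


# Hypothetical frontier set: block number -> count at which it was deallocated.
frontier_set = {17: 5, 42: 9}
```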


As explained herein, these conclusions allow the memory manager to achieve the objectives described above. As long as the memory manager writes a new frontier set periodically (and sufficient blocks remain available), the controller can discover known deallocated blocks that are ready for new writes. In addition, erases can be delayed until just before the blocks are rewritten because block deallocation will be “eventually consistent.” This means that a deallocated block may not be seen as deallocated by all possible future primary controllers, because deallocated blocks may revert to the allocated state until the next frontier set is persisted. A deallocated block is available for reuse, but until the moment that its new owner actually writes to it (which implies that the embedded metadata will include a new program/erase count and block type value), the block may revert to its previous owner.


Furthermore, a fast start of the primary controller is achieved by locating the leading edge of newly written data from a particular client. To assist a major client storing on the order of 100,000 to 1,000,000 blocks or more, the memory manager can define two classes of data, or more precisely, two states that a data block can be in. A “boot” block is one that contains new data, and a “standalone” block is one that contains cold data (i.e., data that has been around and untouched for a certain period of time). Boot blocks can be quickly and efficiently enumerated by the memory manager to the client after a crash or power loss. When the client no longer requires a block to be enumerated as a boot block, it will indicate this to the memory manager (a process referred to as “graduation”). The block will then become a standalone block at the memory manager's discretion.
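The boot and standalone states, and the graduation step between them, might be tracked as in the following sketch. The class and method names are hypothetical. The sketch also assumes that a graduated block keeps being enumerated as a boot block until the memory manager chooses to convert it, which is one possible reading of "at the memory manager's discretion".

```python
class BootTracker:
    """Tracks which blocks are enumerated as boot blocks for one client."""

    def __init__(self):
        self._boot = set()        # blocks holding new data
        self._graduated = set()   # boot blocks the client no longer needs enumerated
        self._standalone = set()  # blocks holding cold data

    def record_write(self, block):
        # Newly written data starts out in the boot state.
        self._graduated.discard(block)
        self._standalone.discard(block)
        self._boot.add(block)

    def graduate(self, block):
        # Client indicates it no longer requires this block as a boot block.
        if block in self._boot:
            self._graduated.add(block)

    def demote_graduated(self):
        # Memory manager converts graduated blocks when it chooses to.
        self._boot -= self._graduated
        self._standalone |= self._graduated
        self._graduated.clear()

    def boot_blocks(self):
        return sorted(self._boot)

    def standalone_blocks(self):
        return sorted(self._standalone)
```

After a crash, a client would only need the result of `boot_blocks()` to find its most recent writes, rather than walking every block it owns.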



FIG. 1 is a block diagram illustrating a storage system 100 in which embodiments of the present disclosure may be implemented. Storage system 100 may include storage controllers 110, 150 and storage array 130, which is representative of any number of data storage arrays or storage device groups. As shown, storage array 130 includes storage devices 135A-n, which are representative of any number and type of storage devices (e.g., solid-state drives (SSDs)). Storage controller 110 may be coupled directly to initiator device 125 and storage controller 110 may be coupled remotely over network 120 to initiator device 115. In one embodiment, storage controller 150 is coupled remotely over network 120 to initiator device 115. Initiator devices 115 and 125 are representative of any number of clients which may utilize storage controllers 110 and 150 for storing and accessing data in storage system 100. It is noted that some systems may include only a single client or initiator device, connected directly or remotely, to storage controllers 110 and 150.


In one embodiment, controller 110 is designated as the “primary” controller, which performs most or all of the I/O operations on the array 130. If, however, a software crash, hardware fault or other error occurs, the “secondary” controller 150 may be promoted to serve as the primary controller and take over all responsibilities for servicing the array 130. In one embodiment, storage controllers 110 and 150 are identical and any description of controller 110 herein may be equally attributed to storage controller 150.


Storage controller 110 may include software and/or hardware configured to provide access to storage devices 135A-n. Although storage controller 110 is shown as being separate from storage array 130, in some embodiments, storage controller 110 may be located within storage array 130. Storage controller 110 may include or be coupled to a base operating system (OS), a volume manager, and additional control logic, such as memory manager 140, for implementing the various techniques disclosed herein. In one embodiment, the OS is designed with flash storage in mind, so while it can use conventional SSDs to store data, it does not depend on a 512 byte random overwrite capability. Even on conventional SSDs, storage controller 110 can achieve better performance by writing and discarding data in large chunks. This style of I/O is sometimes called “flash friendly I/O.” This also makes it a much easier task to convert the OS to use the physical addressing of storage devices, as compared to conventional filesystems.


In one embodiment, the logic of memory manager 140 is contained within an object which manages one of devices 135A-n. Thus, there may be a separate memory manager object for each device 135A-n in storage array 130. As new devices are connected to controller 110, new memory manager objects may be created. These objects may be similarly discarded when a corresponding device is disconnected from storage controller 110. Clients wishing to communicate with memory manager 140, such as one of initiator applications 112, 122, the operating system running on storage controller 110 or another client application running on storage controller 110, may do so via a memory manager application programming interface (API) published by memory manager 140. In one embodiment, multiple clients can access the same memory manager object concurrently. In one embodiment, storage controller 150 includes one or more separate instances of memory manager 152.


Storage controller 110 may include and/or execute on any number of processing devices and may include and/or execute on a single host computing device or be spread across multiple host computing devices, depending on the embodiment. In some embodiments, storage controller 110 may generally include or execute on one or more file servers and/or block servers. Storage controller 110 may use any of various techniques for replicating data across devices 135A-n to prevent loss of data due to the failure of a device or the failure of storage locations within a device. Storage controller 110 may also utilize any of various deduplication techniques for reducing the amount of data stored in devices 135A-n by deduplicating common data.


In one embodiment, storage controller 110 may utilize logical volumes and mediums to track client data that is stored in storage array 130. A medium is defined as a logical grouping of data, and each medium has an identifier with which to identify the logical grouping of data. A volume is a single accessible storage area with a single file system, typically, though not necessarily, resident on a single partition of a storage device. The volumes may be logical organizations of data physically located on one or more of storage device 135A-n in storage array 130. Storage controller 110 may maintain a volume to medium mapping table to map each volume to a single medium, and this medium is referred to as the volume's anchor medium. A given request received by storage controller 110 may indicate at least a volume and block address or file name, and storage controller 110 may determine an anchor medium targeted by the given request from the volume to medium mapping table.


In one embodiment, storage controller 110 includes memory manager 140. Memory manager 140 can perform various operations to identify deallocated data blocks available for writing and to identify leading edge data that was most recently written by a particular client. In one embodiment, memory manager 140 can receive a request to write data to a storage device 135A and can determine a first data block on storage device 135A comprising a list of deallocated data blocks. That list may include a block number of each deallocated data block and an access operation count value (e.g., program/erase count value) at which each deallocated data block was deallocated. Memory manager 140 can then identify a second data block from the list of deallocated data blocks and write the requested data to that second data block. To identify the leading edge data, memory manager 140 may access a data structure stored in memory on storage device 135A, where the data structure stores block metadata for each data block on storage device 135A. Memory manager 140 may determine, from the data structure, a first data block on storage device 135A comprising a list of deallocated data blocks on the storage device and compare a first access operation count value associated with each of the deallocated data blocks from the data structure to a second access operation count value associated with each of the deallocated data blocks from the list of deallocated data blocks. Memory manager 140 may label a second data block on the list as comprising new data responsive to the first access operation count value associated with the second data block from the data structure not matching the second access operation count value associated with the second data block from the list of deallocated data blocks.


In various embodiments, multiple mapping tables may be maintained by storage controller 110. These mapping tables may include a medium mapping table and a volume to medium mapping table. These tables may be utilized to record and maintain the mappings between mediums and underlying mediums and the mappings between volumes and mediums. Storage controller 110 may also include an address translation table with a plurality of entries, wherein each entry holds a virtual-to-physical mapping for a corresponding data component. This mapping table may be used to map logical read/write requests from each of the initiator devices 115 and 125 to physical locations in storage devices 135A-n. A “physical” pointer value may be read from the mappings associated with a given medium during a lookup operation corresponding to a received read/write request. The term “mappings” is defined as the one or more entries of the address translation mapping table which convert a given medium ID and block number into a physical pointer value. This physical pointer value may then be used to locate a physical location within the storage devices 135A-n. The physical pointer value may be used to access another mapping table within a given storage device of the storage devices 135A-n. Consequently, one or more levels of indirection may exist between the physical pointer value and a target storage location.
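The lookup path from a request to a physical pointer value can be sketched with two dictionaries. All identifiers and sample values here are hypothetical, and the sketch does not model the further levels of indirection that may exist inside a storage device.

```python
# Hypothetical volume to medium mapping table: volume -> anchor medium ID.
volume_to_medium = {"vol-a": 101, "vol-b": 205}

# Hypothetical address translation table:
# (medium ID, block number) -> physical pointer value.
address_translation = {
    (101, 0): 0x4000,
    (101, 1): 0x9C00,
    (205, 0): 0x1200,
}


def resolve(volume, block_number):
    """Map a (volume, block number) request to a physical pointer value."""
    medium_id = volume_to_medium[volume]   # anchor medium targeted by the request
    return address_translation[(medium_id, block_number)]
```

For instance, `resolve("vol-a", 1)` first finds anchor medium 101 and then reads the entry for block 1 of that medium.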


In alternative embodiments, the number and type of client computers, initiator devices, storage controllers, networks, storage arrays, and data storage devices is not limited to those shown in FIG. 1. At various times one or more clients may operate offline. In addition, during operation, individual client computer connection types may change as users connect, disconnect, and reconnect to storage system 100. Further, the systems and methods described herein may be applied to directly attached storage systems or network attached storage systems and may include a host operating system configured to perform one or more aspects of the described methods. Numerous such alternatives are possible and are contemplated.


Network 120 may utilize a variety of techniques including wireless connections, direct local area network (LAN) connections, wide area network (WAN) connections such as the Internet, a router, storage area network, Ethernet, and others. Network 120 may comprise one or more LANs that may also be wireless. Network 120 may further include remote direct memory access (RDMA) hardware and/or software, transmission control protocol/internet protocol (TCP/IP) hardware and/or software, routers, repeaters, switches, grids, and/or others. Protocols such as Fibre Channel, Fibre Channel over Ethernet (FCoE), iSCSI, and so forth may be used in network 120. The network 120 may interface with a set of communications protocols used for the Internet such as the Transmission Control Protocol (TCP) and the Internet Protocol (IP), or TCP/IP. In one embodiment, network 120 represents a storage area network (SAN) which provides access to consolidated, block level data storage. The SAN may be used to enhance the storage devices accessible to initiator devices so that the devices 135A-n appear to the initiator devices 115 and 125 as locally attached storage.


Initiator devices 115 and 125 are representative of any number of stationary or mobile computers such as desktop personal computers (PCs), servers, server farms, workstations, laptops, handheld computers, personal digital assistants (PDAs), smart phones, and so forth. Generally speaking, initiator devices 115 and 125 include one or more processing devices, each comprising one or more processor cores. Each processor core includes circuitry for executing instructions according to a predefined general-purpose instruction set. For example, the x86 instruction set architecture may be selected. Alternatively, the ARM®, Alpha®, PowerPC®, SPARC®, or any other general-purpose instruction set architecture may be selected. The processor cores may access cache memory subsystems for data and computer program instructions. The cache subsystems may be coupled to a memory hierarchy comprising random access memory (RAM) and a storage device.


In one embodiment, initiator device 115 includes initiator application 112 and initiator device 125 includes initiator application 122. Initiator applications 112 and 122 may be any computer application programs designed to utilize the data on devices 135A-n in storage array 130 to implement or provide various functionalities. Initiator applications 112 and 122 may issue requests to read or write data from certain logical volumes within storage system 100. Those requests can be serviced by memory manager 140 of storage controller 110, as described in detail herein.



FIG. 2 is a block diagram illustrating memory manager 140 in a storage controller 110, according to an embodiment. In one embodiment, memory manager 140 includes client interface 242, data structure interface 244, data block interface 246 and comparison logic 248. This arrangement of modules may be a logical separation, and in other embodiments, these modules, interfaces or other components can be combined together or separated into further components. In one embodiment, storage device 135A is connected to memory manager 140 and includes firmware 252, memory 235 storing data structure 254, and data blocks 256. In one embodiment, storage device 135A may be external to storage controller 110 as part of storage array 130 and may be connected to storage controller 110 over a network or other connection. In other embodiments, storage controller 110 may include different and/or additional components which are not shown to simplify the description. Storage device 135A may include one or more mass storage devices which can include, for example, flash memory or solid-state drives (SSDs). Memory 235 may include, for example, random-access memory (RAM); dynamic random-access memory (DRAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; or any other type of storage medium. In one embodiment, storage device 135A includes volatile memory 235, such as DRAM, and non-volatile data blocks 256, such as flash blocks or other persistent data blocks.


In one embodiment, client interface 242 manages communication with client devices or applications in storage system 100, such as initiator devices 115 or 125, or applications within storage controller 110. Client interface 242 can receive I/O requests to access data blocks 256 on storage device 135A from an initiator application 112 or 122 over network 120. In one embodiment, the I/O request includes a request to write new data to storage device 135A. After the write is performed, client interface 242 may provide a notification to initiator device 115 or 125 over network 120 indicating that the write was successfully performed.


In one embodiment, data structure interface 244 interacts with data structure 254 in memory 235 on storage device 135A. In response to client interface 242 receiving a write request, for example, data structure interface 244 may access data structure 254 (e.g., a dynamic table) comprising block metadata for each of data blocks 256 on storage device 135A. The block metadata may include an indication of a block type of each data block 256 and an access operation count value for each data block 256. In one embodiment, the access operation count value is a total number of program/erase cycles that have been performed on the block. Using the block type indicator, data structure interface 244 may determine a first data block on storage device 135A which stores a list of deallocated data blocks on storage device 135A. This list may include a block number of each deallocated data block and an access operation count value at which each deallocated data block was deallocated.


In one embodiment, data block interface 246 interacts with data blocks 256 of storage device 135A as part of any data access operations being performed. For example, once data structure interface 244 determines the block storing the list of deallocated data blocks, data block interface 246 may identify a second block of those deallocated blocks from the list, and read an access operation count value associated with the second block from the list. If memory manager 140 ultimately determines that the second block was in fact deallocated, data block interface 246 may perform the requested write operation by overwriting the old data in the second data block with new data. If the allegedly deallocated data block was not actually deallocated (or has since been reallocated), data block interface 246 can remove the second data block from the list stored in the first data block. When memory manager 140 is attempting to locate leading edge data, data block interface 246 can determine whether a particular block was previously labeled as comprising new data. In addition, once memory manager 140 identifies the leading edge data, data block interface 246 can label the data blocks as comprising either new or old data, as appropriate.


In one embodiment, comparison logic 248 performs various calculations and comparisons as part of the operations performed by memory manager 140. For example, to verify that a block appearing on the list of deallocated data blocks is in fact deallocated or to determine whether a block is storing new or old data, comparison logic 248 may compare a first access operation count value associated with the data block from data structure 254 to a second access operation count value associated with the data block from the list of deallocated data blocks stored in one of data blocks 256 (identified by the block type value). If comparison logic 248 determines that the count values match, this indicates that the block has not been reallocated since it was added to the list of deallocated blocks and, thus, can either be labeled as storing old data or can safely be overwritten without losing any critical data. If the count values don't match, however, this indicates that another client has written data to that block and it should be removed from the list of deallocated blocks and/or labeled as comprising new data.
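The comparison described above can be sketched as a pass over the list of deallocated blocks. The function name and the dictionary representation of both the list and the data structure are assumptions made for illustration.

```python
def verify_deallocated(frontier, table_pe_counts):
    """Split listed blocks into those still deallocated and those reallocated.

    frontier:        block number -> count recorded when the block was deallocated
    table_pe_counts: block number -> count currently held in the data structure
    """
    still_deallocated, reallocated = [], []
    for block, recorded in frontier.items():
        if table_pe_counts[block] == recorded:
            # Counts match: safe to overwrite, or label as old data.
            still_deallocated.append(block)
        else:
            # Counts differ: a client wrote to the block after it was listed.
            reallocated.append(block)
    return still_deallocated, reallocated
```

Blocks in the second list would be removed from the list of deallocated blocks and/or labeled as comprising new data, depending on which operation the memory manager is performing.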



FIG. 3 is a flow diagram illustrating a method for performing a metadata scan to populate a data structure, according to an embodiment. The method 300 may be performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device to perform hardware simulation), firmware, or a combination thereof. On power-up of the storage device 135A, the method 300 may scan the data blocks 256 to recover the embedded metadata and populate the data structure 254 with the metadata recovered from those blocks. In one embodiment, method 300 may be performed by the firmware 252 of storage device 135A, as shown in FIG. 2.


Referring to FIG. 3, at block 310, method 300 detects a restart of storage device 135A. After an event, such as a sudden power loss or software crash, storage device 135A may be automatically restarted. Firmware 252 can detect the restart and initiate any number of start-up procedures. Storage system 100 is designed to be extremely reliable even in the face of hardware or software failures, including sudden power losses or software crashes. Storage devices 135A-n may be designed with failures in mind, specifically to account for sudden crashes or power loss at any point during the process, and to assure that restarting after such an event is no different than any other restart operation.


At block 320, method 300 scans a plurality of data blocks 256 on storage device 135A to identify block metadata for each of the plurality of data blocks 256. In one embodiment, firmware 252 may scan each of the data blocks 256 to recover the embedded metadata. This metadata may include, for example, an indication of a block type of each of the data blocks 256 and an access operation count value for each of the data blocks 256. In one embodiment, this metadata may be stored in a header section of each individual block 256, so that it can be obtained quickly and efficiently. In one embodiment, the scan is reasonably fast (e.g., taking less than 10 seconds to scan one million or more blocks).


At block 330, method 300 stores the block metadata in data structure 254 stored in memory 235 on storage device 135A, wherein the block metadata is accessible by storage controller 110 coupled to storage device 135A. In one embodiment, data structure 254 comprises a plurality of entries, each of the entries corresponding to a different one of the data blocks 256 on storage device 135A. In one embodiment, data structure 254 maintains an indication of a block type and an access operation count value for each data block. One example of a block type that may be stored in metadata is a “bootstrap data” block. This block type stores data that is needed to restart the system after a power loss or other failure. Because the data blocks 256 may have limited reusability (e.g., approximately three thousand program/erase cycles) and the bootstrap data is accessed regularly, this data cannot always be stored in the same place. Because the bootstrap data can be located easily during the metadata scan, it can reside anywhere on storage device 135A.
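The scan at blocks 320 and 330 can be sketched as reading a small header from each block and recording it in a table. The header layout (a 16-bit block type followed by a 32-bit program/erase count), the `BOOTSTRAP` type value, and all function names are assumptions; the description states only that the metadata may be stored in a header section of each block.

```python
import struct

# Hypothetical header layout: little-endian uint16 block type, uint32 P/E count.
HEADER = struct.Struct("<HI")
BOOTSTRAP = 1  # hypothetical block type value for "bootstrap data" blocks


def make_block(block_type, pe_count, payload=b""):
    """Build a simulated block: header followed by the data payload."""
    return HEADER.pack(block_type, pe_count) + payload


def scan(blocks):
    """Recover the embedded metadata from every block and populate a table."""
    table = {}
    for number, raw in enumerate(blocks):
        block_type, pe_count = HEADER.unpack_from(raw)  # reads the header only
        table[number] = {"block_type": block_type, "pe_count": pe_count}
    return table


def find_blocks(table, block_type):
    """Locate all blocks of a given type, wherever they sit on the device."""
    return [n for n, meta in table.items() if meta["block_type"] == block_type]


blocks = [make_block(0, 5), make_block(BOOTSTRAP, 12, b"boot"), make_block(2, 3)]
table = scan(blocks)
```

Because only headers are read, the cost of the scan grows with the number of blocks rather than with the amount of stored data, which is consistent with the fast scan described above.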


Data storage on flash follows a simple-looking cycle: erase, program; erase, program. Once written or programmed, data is nonvolatile and may be read millions of times. Flash structures have a complex hierarchy (i.e., packages, dies, planes, blocks, pages, and bits). The memory manager 140 described herein operates primarily on the block level. This is because the block is the most common unit of erase and reuse. If a particular flash chip uses 16 MB blocks, then data will be written to flash in 16 MB chunks, and will be discarded in 16 MB chunks. One physical detail of NAND flash that becomes important is that the physical blocks and pages have some “extra” storage beyond the expected powers of two. Thus, a physical 16 KB block may actually contain 19 KB of physical bits. While most of these “extra” bits are consumed by error correction codes, there may be some room left over to store metadata about the block or its contents. Storage system 100 makes use of some of these bits to store metadata alongside any data stored by the controller.


There are a number of management tasks performed by any system that stores data on NAND flash chips. Normally these functions are all performed in SSD firmware and are concealed from the host computer. Flash chips have a limited lifespan, measured in program/erase (PE) cycles. Management software must spread data around such that flash blocks wear more or less evenly across the drive, or premature drive failure may result due to block failures. This may be referred to as “wear leveling.” Flash is an imperfect medium and blocks may fail spontaneously. Thus, in one embodiment, management software must maintain a bad block list over the lifetime of the drive. In addition, most SSDs support a storage interface that is backwards compatible with hard disks from the last 20+ years, allowing a contiguous range of logical 512 byte sectors that can be overwritten randomly an arbitrary number of times. Firmware supports this interface via a complicated abstraction layer that maps logical addresses to physical flash locations dynamically to provide logical address mapping. In one embodiment, wear leveling and bad block handling are not performed in firmware, but rather are handled within memory manager 140 of storage controller 110. Thus, the logical address feature may be discarded as only physical flash addresses are used.
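One simple way a host-side manager could combine wear leveling with a bad block list is to prefer the least-worn usable block. The policy below is an illustrative assumption, not a policy stated in this disclosure, and the function name and arguments are hypothetical.

```python
def pick_block(candidates, pe_counts, bad_blocks):
    """Choose the candidate with the lowest program/erase count,
    skipping any block on the bad block list.

    Returns None when no usable candidate remains.
    """
    usable = [b for b in candidates if b not in bad_blocks]
    if not usable:
        return None
    # Ties are broken by block number so the choice is deterministic.
    return min(usable, key=lambda b: (pe_counts[b], b))
```

For example, with candidates 4, 7 and 9, counts of 120, 15 and 15, and block 9 on the bad block list, the sketch selects block 7.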



FIG. 4 is a flow diagram illustrating a method for identifying a deallocated data block on a storage device, according to an embodiment. The method 400 may be performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device to perform hardware simulation), firmware, or a combination thereof. The method 400 can allow a storage controller to identify deallocated data blocks to allow those blocks to be overwritten without the risk of losing any critical data. In one embodiment, method 400 may be performed by memory manager 140, as shown in FIGS. 1 and 2.


Referring to FIG. 4, at block 410, method 400 receives a request to write data to storage device 135A (e.g., an SSD). In one embodiment, client interface 242 receives the request from an initiator application 112 or 122 over network 120, or from another client application on storage controller 110. The write request may not specify a particular data block on storage device 135A, so it may be up to memory manager 140 to identify a deallocated block. Since data blocks 256 may not be erased until right before they are written, even a block currently storing data may be “deallocated” and, thus, available for writing.


At block 420, method 400 accesses a dynamic table (e.g., data structure 254) stored in memory 235 on storage device 135A, the dynamic table comprising block metadata for each data block 256 on storage device 135A. In one embodiment, data structure interface 244 may access data structure 254 comprising block metadata for each of data blocks 256 on storage device 135A. The block metadata may include an indication of a block type of each data block and an access operation count value for each data block.


At block 430, method 400 determines a first data block on storage device 135A comprising a list of deallocated data blocks on the storage device 135A. In one embodiment, data structure interface 244 determines the first data block from the dynamic table based on the indication of the block type of the first data block. Using the block type indicator stored in the dynamic table, data structure interface 244 may determine a first data block on storage device 135A which stores a list of deallocated data blocks on storage device 135A. This list may include a block number of each deallocated data block and an access operation count value at which each deallocated data block was deallocated. In one embodiment, this list may be referred to as the “frontier set” and the block where it is stored may be given a special block type. In one embodiment, the frontier set is identified and read once when storage device 135A is started up (or restarted), and the list of deallocated blocks and their corresponding operation count values are stored in memory 235.


At block 440, method 400 identifies a second data block from the list of deallocated data blocks on the storage device 135A. In one embodiment, data block interface 246 accesses the first block on storage device 135A identified at block 430 from the dynamic table. Data block interface 246 may identify a second block of those deallocated blocks from the list, and read an access operation count value associated with the second block from the list. In one embodiment, the second data block may be identified from the list of deallocated blocks stored in memory 235 some period of time after the frontier set is initially identified at block 430.


At block 450, method 400 compares a first access operation count value associated with the second data block from the dynamic table to a second access operation count value associated with the second data block from the list of deallocated data blocks. In one embodiment, comparison logic 248 compares the first access operation count value to the second access operation count value associated with the data block. If comparison logic 248 determines that the count values match, this indicates that the block has not been reallocated since it was added to the list of deallocated blocks. Thus, at block 460, method 400 writes the requested data to the second data block. In one embodiment, data block interface 246 overwrites the old data stored in the second data block with the newly requested data received at block 410. If the count values do not match, however, this indicates that another client has since written new data to that block. Thus, at block 470, data block interface 246 removes the second data block from the list of deallocated data blocks.
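The comparison performed at blocks 450 through 470 may be illustrated with a short sketch. This is only an illustration under stated assumptions and not the disclosed implementation: the names `allocate_block`, `drive_table`, and `frontier` are hypothetical, the dynamic table is modeled as a mapping from block number to its current access operation count, and the list of deallocated blocks is modeled as a mapping from block number to the count recorded when the block was deallocated.

```python
from typing import Dict, Optional


def allocate_block(drive_table: Dict[int, int],
                   frontier: Dict[int, int]) -> Optional[int]:
    """Return a block number that is still deallocated, or None.

    drive_table: block number -> current count from the dynamic table.
    frontier:    block number -> count recorded at deallocation.
    Entries whose counts no longer match are removed from `frontier`.
    """
    # sorted() copies the keys, so deleting entries in the loop is safe.
    for block in sorted(frontier):
        recorded = frontier[block]
        if drive_table.get(block) == recorded:
            # Counts match: the block has not been rewritten since it
            # was listed, so it may be overwritten (block 460).
            return block
        # Counts differ: another client has written the block since it
        # was listed, so drop it from the list (block 470).
        del frontier[block]
    return None
```

In this sketch the caller is assumed to perform the write itself. Because a write raises the count in the dynamic table, a later call no longer sees a match for that block and removes it from the list.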



FIG. 5 is a flow diagram illustrating a method for identifying leading edge data on a storage device, according to an embodiment. The method 500 may be performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device to perform hardware simulation), firmware, or a combination thereof. The method 500 can allow a storage controller to identify leading edge data that was most recently written to storage device 135A by a particular storage client. In one embodiment, method 500 may be performed by memory manager 140, as shown in FIGS. 1 and 2.


Referring to FIG. 5, at block 510, method 500 accesses data structure 254 in memory 235 on storage device 135A, the data structure 254 comprising block metadata for each data block 256 on storage device 135A. In one embodiment, data structure interface 244 may access data structure 254 comprising block metadata for each of data blocks 256 on storage device 135A. The block metadata may include an indication of a block type of each data block and an access operation count value for each data block.


At block 520, method 500 determines, from data structure 254, a first data block on storage device 135A comprising a list of deallocated data blocks on storage device 135A. In one embodiment, data structure interface 244 determines the first data block from the data structure 254 based on the indication of the block type of the first data block. Using the block type indicator stored in the data structure 254, data structure interface 244 may determine a first data block on storage device 135A which stores a list of deallocated data blocks on storage device 135A. This list may include a block number of each deallocated data block, an access operation count value at which each deallocated data block was deallocated, and a state value indicating whether the block is known to be in use. The state value may be indicated as “boot” or “future,” where a boot block is known to be in use (because it includes new data) and a future block was, at the time data structure 254 was populated, not in use (because it includes old data). In one embodiment, this list may be referred to as the “frontier set” and the block where it is stored may be given a special block type. The frontier set represents a snapshot in time and permits a future primary controller to correctly recover the state of the drive (and all the clients' data storage). A newly started primary controller can examine this frontier set and compare the program/erase counts to those on the drive itself. “Future” blocks will become boot blocks if the program/erase count indicates new writes have occurred. “Boot” blocks will generally stay boot blocks, barring unusual events, such as flash errors, etc.


At block 530, method 500 determines whether a data block on the list of deallocated data blocks was previously labeled as comprising old data. In one embodiment, data block interface 246 reads the data from the first data block where the frontier set itself is stored, locates an entry in the frontier set corresponding to a second data block and reads the state value for that entry. As described above, data blocks on storage device 135A comprising old data may be labeled as “future” blocks in the frontier set and data blocks comprising new data may be labeled as “boot” blocks in the frontier set.


If the block was previously labeled as comprising new data (i.e., labeled as “boot” blocks), at block 540, method 500 maintains the previous new data label. Thus, boot blocks stay labeled boot blocks, regardless of whether they have new writes.


If the block was previously labeled as comprising old data, at block 550, method 500 compares a first access operation count value associated with the deallocated data block from the data structure 254 to a second access operation count value associated with the deallocated data block from the list of deallocated data blocks. In one embodiment, comparison logic 248 compares the first access operation count value to the second access operation count value associated with the data block. If comparison logic 248 determines that the first access operation count value associated with the second data block from the data structure does not match the second access operation count value associated with the second data block from the list of deallocated data blocks, this indicates that the block has been written with new data since it was added to the list. Thus, at block 540, method 500 labels the second data block on the list as comprising new data. In one embodiment, data block interface 246 changes the state value in the entry of the frontier set corresponding to the data block to “boot.” If the first access operation count value associated with the second data block from the data structure matches the second access operation count value associated with the second data block from the list of deallocated data blocks, however, this indicates that no new data has been written to that block. Thus, at block 560, method 500 labels the second data block on the list as comprising old data. In one embodiment, data block interface 246 maintains the state value in the entry of the frontier set corresponding to the data block as “future.”
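The relabeling of blocks 530 through 560 may be sketched as follows. The sketch is illustrative only and rests on assumptions: `FrontierEntry`, `relabel`, and `drive_table` are hypothetical names, and data structure 254 is modeled as a mapping from block number to its current program/erase count.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FrontierEntry:
    block: int
    pe_count: int  # count recorded when the frontier set was written
    state: str     # "boot" (new data) or "future" (old data)


def relabel(entries: List[FrontierEntry],
            drive_table: Dict[int, int]) -> None:
    """Update each entry's state in place from the current counts."""
    for entry in entries:
        if entry.state == "boot":
            # Block 540: a boot block keeps its label.
            continue
        if drive_table[entry.block] != entry.pe_count:
            # Counts differ: the block was written after the snapshot,
            # so a "future" block becomes a "boot" block.
            entry.state = "boot"
        # Otherwise the counts match and the entry stays "future"
        # (block 560).
```

A newly started primary controller could run such a pass once over the frontier set it reads from the drive, after which the entries still marked “future” are the candidates for new allocations.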


In one embodiment, standalone blocks are not tracked by the frontier set. Rather, they can be defined as “all the blocks not listed in the frontier set” (or marked bad). These blocks can be enumerated simply by searching the drive tables for blocks with the correct block type that are not covered by the current frontier set. It should also be noted that all newly allocated blocks may come from the list of blocks written in the frontier set. Otherwise, the newly written data would not be detected as a boot block by a future primary controller. Furthermore, because boot blocks are listed in the frontier set, there can be a policy that clients with small storage needs never use standalone blocks. This means that any blocks that contain data with that client's block type, but are not listed in the frontier set, have been deallocated, saving clients the bother of sifting through old data and deallocating those blocks.
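The enumeration of standalone blocks described above may be sketched as a simple filter over the drive tables. This is an illustration under assumptions: `standalone_blocks` is a hypothetical name, the drive table is modeled as a mapping from block number to block type, and the frontier set and bad block list are modeled as sets of block numbers.

```python
from typing import Dict, List, Set


def standalone_blocks(drive_table: Dict[int, str],
                      frontier_blocks: Set[int],
                      bad_blocks: Set[int],
                      block_type: str) -> List[int]:
    """List blocks of `block_type` that the frontier set does not cover.

    Blocks on the bad block list are skipped as well.
    """
    return sorted(
        block
        for block, kind in drive_table.items()
        if kind == block_type
        and block not in frontier_blocks
        and block not in bad_blocks
    )
```

For a client whose policy is never to use standalone blocks, the blocks returned for that client's block type would be the ones treated as already deallocated.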


Periodically, a new frontier set may be constructed and written to flash. This may be desirable if most of the deallocated blocks in the current frontier set have been exhausted. Writing a new frontier set can make additional blocks ready for allocation. In addition, if a particular client has written and then deallocated many blocks, a new frontier set may be created. In one embodiment, deallocations are not made permanent until a new frontier set is written. Thus, completing that write will limit the number of blocks a future primary will discover. This can affect failover time, so it may be advantageous to limit some clients' block usage. Failover includes the transfer of the “primary” designation from one controller to another, which includes the transfer of the functionality of memory manager 140. Furthermore, when some blocks with attractive (i.e., low) program/erase counts have become deallocated, for best wear leveling behavior, the system tries to use the blocks with the lowest program/erase count first. Persisting a new frontier set can make those more attractive blocks available.
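The preference for blocks with the lowest program/erase count may be sketched as follows. The sketch is only one possible selection rule: `pick_lowest_wear` is a hypothetical name, the deallocated blocks are modeled as a mapping from block number to program/erase count, and breaking ties by block number is an assumption made here for determinism.

```python
import heapq
from typing import Dict, List


def pick_lowest_wear(deallocated: Dict[int, int], n: int) -> List[int]:
    """Return up to `n` block numbers, least-worn first.

    deallocated: block number -> program/erase count.
    Ties on the count are broken by the lower block number.
    """
    # Pair each block as (count, block) so the count is compared first.
    pairs = ((pe, block) for block, pe in deallocated.items())
    return [block for _, block in heapq.nsmallest(n, pairs)]
```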


A new frontier set may be constructed by selecting some new future blocks from the list of deallocated blocks and adding the list of known boot blocks. Blocks which were formerly boot blocks in the previous frontier set but have since been graduated may not be recorded in the new frontier set. While persisting a new frontier set, the system need not halt all client allocations and writes. The new frontier set is constructed to contain at least a small number of blocks that are also part of the previous frontier set (and are marked as “future” blocks). In this way the system can continue allocating blocks from this overlapping set, knowing that if it crashes or loses power at any time, memory manager 140 can safely start up using either the new set (assuming the new frontier set write completed) or the old set (if the write did not complete). Each new frontier set may be written to a flash block using a “frontier set” block type value. This assists in quickly locating the frontier set.
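The construction of a new frontier set with an overlapping group of “future” blocks may be sketched as follows. The sketch is illustrative and makes several assumptions: `build_frontier`, `Entry`, `graduated`, `n_new`, and `overlap` are hypothetical names, each frontier set is modeled as a mapping from block number to a (program/erase count, state) pair, and the overlap is chosen by lowest block number purely for simplicity.

```python
from typing import Dict, Set, Tuple

Entry = Tuple[int, str]  # (recorded program/erase count, "boot" or "future")


def build_frontier(prev: Dict[int, Entry],
                   deallocated: Dict[int, int],
                   graduated: Set[int],
                   n_new: int,
                   overlap: int) -> Dict[int, Entry]:
    """Build a new frontier set from the previous one."""
    new: Dict[int, Entry] = {}
    # Carry over known boot blocks, skipping those that have graduated.
    for block, (pe, state) in prev.items():
        if state == "boot" and block not in graduated:
            new[block] = (pe, "boot")
    # Keep some "future" blocks from the previous set so allocation can
    # continue from them whichever set survives a crash or power loss.
    carried = [b for b in sorted(prev) if prev[b][1] == "future"][:overlap]
    for block in carried:
        new[block] = prev[block]
    # Add fresh "future" blocks, lowest program/erase count first.
    fresh = sorted((pe, b) for b, pe in deallocated.items() if b not in new)
    for pe, block in fresh[:n_new]:
        new[block] = (pe, "future")
    return new
```

While the new set is being written, allocations in this sketch would be restricted to the carried-over blocks, since those appear as “future” entries in both the old and the new set.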


Because this design does not persist a new frontier set during every state change, there may be an accumulation of changes that have only taken effect inside the controller memory. These changes can be lost if a failover or power loss occurs. It is possible to lose block deallocations and graduations across these events; however, this does not affect the correctness of the controller application, as it is straightforward to simply repeat these actions on the new primary controller. It is also possible to lose a block allocation if the client did not write to the block in question before the failover or restart. In one embodiment, this problem is alleviated because whatever task intended to write something will be repeated on the new primary controller and may allocate a new block without being aware that a previous primary controller was attempting a similar operation.


In one embodiment, the storage device firmware 252 provides a list of blocks containing data of a particular block type to memory manager 140. This reduces the work required to locate the frontier set, and allows faster primary controller startup. When writing the frontier set, memory manager 140 also takes the unusual step of physically erasing old copies of frontier sets. This allows faster startup time by ensuring that controller 110 does not have to sift through many old frontier sets looking for the current one. To ensure correct behavior across a surprise block failure, frontier sets may be written to at least two physical locations, preferably spread across different failure domains (e.g. flash dies) to minimize the chance that both fail simultaneously. In one embodiment, the encryption capabilities of storage device firmware 252 are customized such that the data structures needed during startup (e.g. the frontier set) can be written unencrypted, and most client data can be written encrypted.
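The placement of frontier set copies across failure domains may be sketched as follows. This is a sketch under assumptions only: `pick_replica_blocks` is a hypothetical name, each candidate block is modeled as a (block number, die number) pair, and falling back to the same die when no other die is available reflects the word “preferably” above rather than any stated rule.

```python
from typing import List, Tuple


def pick_replica_blocks(candidates: List[Tuple[int, int]],
                        copies: int = 2) -> List[int]:
    """Choose `copies` blocks, preferring one block per flash die.

    candidates: (block number, die number) pairs in preference order.
    Raises ValueError if there are fewer candidates than copies.
    """
    chosen: List[int] = []
    used_dies = set()
    # First pass: take at most one block from each die.
    for block, die in candidates:
        if len(chosen) == copies:
            return chosen
        if die not in used_dies:
            chosen.append(block)
            used_dies.add(die)
    # Second pass: if too few distinct dies exist, reuse a die.
    for block, _ in candidates:
        if len(chosen) == copies:
            return chosen
        if block not in chosen:
            chosen.append(block)
    if len(chosen) < copies:
        raise ValueError("not enough candidate blocks for the copies")
    return chosen
```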



FIG. 6 illustrates a diagrammatic representation of a machine in the exemplary form of a computer system 600 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative embodiments, the machine may be connected (e.g., networked) to other machines in a local area network (LAN), an intranet, an extranet, or the Internet. The machine may operate in the capacity of a server or a client machine in a client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein. In one embodiment, computer system 600 may be representative of a server, such as storage controller 110 running memory manager 140 or of a client, such as initiator devices 115 or 125.


The exemplary computer system 600 includes a processing device 602, a main memory 604 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM), etc.), a static memory 606 (e.g., flash memory, static random access memory (SRAM), etc.), and a data storage device 618, which communicate with each other via a bus 630. Data storage device 618 may be one example of any of the storage devices 135A-n in FIGS. 1 and 2. Any of the signals provided over various buses described herein may be time multiplexed with other signals and provided over one or more common buses. Additionally, the interconnection between circuit components or blocks may be shown as buses or as single signal lines. Each of the buses may alternatively be one or more single signal lines and each of the single signal lines may alternatively be buses.


Processing device 602 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, the processing device may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computer (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 602 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. The processing device 602 is configured to execute processing logic 626, which may be one example of memory manager 140 shown in FIGS. 1 and 2, or of initiator application 112 or 122, for performing the operations and steps discussed herein.


The data storage device 618 may include a machine-readable storage medium 628, on which is stored one or more sets of instructions 622 (e.g., software) embodying any one or more of the methodologies or functions described herein, including instructions to cause the processing device 602 to execute memory manager 140 or initiator application 112 or 122. The instructions 622 may also reside, completely or at least partially, within the main memory 604 and/or within the processing device 602 during execution thereof by the computer system 600; the main memory 604 and the processing device 602 also constituting machine-readable storage media. The instructions 622 may further be transmitted or received over a network 620 via the network interface device 608.


The machine-readable storage medium 628 may also be used to store instructions to perform a method for efficient flash management for multiple controllers, as described herein. While the machine-readable storage medium 628 is shown in an exemplary embodiment to be a single medium, the term “machine-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. A machine-readable medium includes any mechanism for storing information in a form (e.g., software, processing application) readable by a machine (e.g., a computer). The machine-readable medium may include, but is not limited to, magnetic storage medium (e.g., floppy diskette); optical storage medium (e.g., CD-ROM); magneto-optical storage medium; read-only memory (ROM); random-access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; or another type of medium suitable for storing electronic instructions.


The preceding description sets forth numerous specific details such as examples of specific systems, components, methods, and so forth, in order to provide a good understanding of several embodiments of the present disclosure. It will be apparent to one skilled in the art, however, that at least some embodiments of the present disclosure may be practiced without these specific details. In other instances, well-known components or methods are not described in detail or are presented in simple block diagram format in order to avoid unnecessarily obscuring the present disclosure. Thus, the specific details set forth are merely exemplary. Particular embodiments may vary from these exemplary details and still be contemplated to be within the scope of the present disclosure.


In situations in which the systems discussed herein collect personal information about users, or may make use of personal information, the users may be provided with an opportunity to control whether programs or features collect user information (e.g., information about a user's social network, social actions or activities, profession, a user's preferences, or a user's current location), or to control whether and/or how to receive content from the media server that may be more relevant to the user. In addition, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be treated so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined. Thus, the user may have control over how information is collected about the user and used by the web server or media server.


Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. In addition, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.”


Although the operations of the methods herein are shown and described in a particular order, the order of the operations of each method may be altered so that certain operations may be performed in an inverse order or so that certain operations may be performed, at least in part, concurrently with other operations. In another embodiment, instructions or sub-operations of distinct operations may be performed in an intermittent and/or alternating manner.

Claims
  • 1. A system comprising: a plurality of storage devices; and a storage controller coupled to the plurality of storage devices, the storage controller comprising a processing device, the processing device configured to: identify a data structure containing metadata associated with a data block of the plurality of storage devices; identify a list of deallocated data blocks for the plurality of storage devices; compare an access operation count value for the data block associated with the data structure with an access operation count value for the data block associated with the list of deallocated blocks; and determine whether the data block is available for writing additional data based on comparison of access operation count values.
  • 2. The system of claim 1, wherein identification of the data structure containing metadata is responsive to a restart of the system.
  • 3. The system of claim 1, wherein the data structure containing metadata is a dynamic table comprising block metadata for each data block on the storage device.
  • 4. The system of claim 3, wherein the block metadata comprises an indication of a block type of each data block on the storage device and a corresponding program erase count value for each data block on the storage device.
  • 5. The system of claim 4, wherein the data block is identified from the dynamic table based on the indication of the block type of the data block.
  • 6. The system of claim 1, wherein the data structure containing metadata is a frontier set.
  • 7. The system of claim 1, wherein the processing device is configured to remove the data block from the list of deallocated data blocks in response to the access operation counts not matching.
  • 8. A method comprising: identifying a data structure containing metadata associated with a data block of the plurality of storage devices; identifying a list of deallocated data blocks for the plurality of storage devices; comparing an access operation count value for the data block associated with the data structure with an access operation count value for the data block associated with the list of deallocated blocks; and determining whether the data block is available for writing additional data based on comparison of access operation count values.
  • 9. The method of claim 8, wherein identification of the data structure containing metadata is responsive to a restart of the system.
  • 10. The method of claim 8, wherein the data structure containing metadata is a dynamic table comprising block metadata for each data block on the storage device.
  • 11. The method of claim 10, wherein the block metadata comprises an indication of a block type of each data block on the storage device and a corresponding program erase count value for each data block on the storage device.
  • 12. The method of claim 11, wherein the data block is identified from the dynamic table based on the indication of the block type of the data block.
  • 13. The method of claim 8, wherein the data structure containing metadata is a frontier set.
  • 14. The method of claim 8, wherein the processing device is configured to remove the data block from the list of deallocated data blocks in response to the access operation counts not matching.
  • 15. A non-transitory computer readable storage medium comprising instructions which, when executed by a processing device, cause the processing device to: identify a data structure containing metadata associated with a data block of the plurality of storage devices; identify a list of deallocated data blocks for the plurality of storage devices; compare an access operation count value for the data block associated with the data structure with an access operation count value for the data block associated with the list of deallocated blocks; and determine whether the data block is available for writing additional data based on comparison of access operation count values.
  • 16. The non-transitory computer readable storage medium of claim 15, wherein identification of the data structure containing metadata is responsive to a restart of the system.
  • 17. The non-transitory computer readable storage medium of claim 15, wherein the data structure containing metadata is a dynamic table comprising block metadata for each data block on the storage device.
  • 18. The non-transitory computer readable storage medium of claim 17, wherein the block metadata comprises an indication of a block type of each data block on the storage device and a corresponding program erase count value for each data block on the storage device.
  • 19. The non-transitory computer readable storage medium of claim 18, wherein the data block is identified from the dynamic table based on the indication of the block type of the data block.
  • 20. The non-transitory computer readable storage medium of claim 15, wherein the data structure containing metadata is a frontier set.
CROSS REFERENCE TO RELATED APPLICATIONS

This is a continuation application for patent entitled to a filing date and claiming the benefit of earlier-filed U.S. patent application Ser. No. 17/401,436, filed Aug. 13, 2021, which is a continuation of U.S. Pat. No. 11,119,657, issued Sep. 14, 2021, which is a continuation of U.S. Pat. No. 10,481,798, issued Nov. 19, 2019, each of which is hereby incorporated by reference in its entirety.

US Referenced Citations (484)
Number Name Date Kind
5390327 Lubbers et al. Feb 1995 A
5450581 Bergen et al. Sep 1995 A
5479653 Jones Dec 1995 A
5488731 Mendelsohn Jan 1996 A
5504858 Ellis et al. Apr 1996 A
5564113 Bergen et al. Oct 1996 A
5574882 Menon et al. Nov 1996 A
5649093 Hanko et al. Jul 1997 A
5883909 DeKoning et al. Mar 1999 A
6000010 Legg Dec 1999 A
6260156 Garvin et al. Jul 2001 B1
6269453 Krantz Jul 2001 B1
6275898 DeKoning Aug 2001 B1
6453428 Stephenson Sep 2002 B1
6523087 Busser Feb 2003 B2
6535417 Tsuda et al. Mar 2003 B2
6643748 Wieland Nov 2003 B1
6725392 Frey et al. Apr 2004 B1
6763455 Hall Jul 2004 B2
6836816 Kendall Dec 2004 B2
6985995 Holland et al. Jan 2006 B2
7032125 Holt et al. Apr 2006 B2
7047358 Lee et al. May 2006 B2
7051155 Talagala et al. May 2006 B2
7055058 Lee et al. May 2006 B2
7065617 Wang Jun 2006 B2
7069383 Yamamoto et al. Jun 2006 B2
7076606 Orsley Jul 2006 B2
7107480 Moshayedi et al. Sep 2006 B1
7159150 Kenchammana-Hosekote et al. Jan 2007 B2
7162575 Dalal et al. Jan 2007 B2
7164608 Lee Jan 2007 B2
7188270 Nanda et al. Mar 2007 B1
7334156 Land et al. Feb 2008 B2
7370220 Nguyen et al. May 2008 B1
7386666 Beauchamp et al. Jun 2008 B1
7398285 Kisley Jul 2008 B2
7424498 Patterson Sep 2008 B1
7424592 Karr et al. Sep 2008 B1
7444532 Masuyama et al. Oct 2008 B2
7480658 Heinla et al. Jan 2009 B2
7484056 Madnani et al. Jan 2009 B2
7484057 Madnani et al. Jan 2009 B1
7484059 Ofer et al. Jan 2009 B1
7536506 Ashmore et al. May 2009 B2
7558859 Kasiolas et al. Jul 2009 B2
7565446 Talagala et al. Jul 2009 B2
7613947 Coatney et al. Nov 2009 B1
7634617 Misra Dec 2009 B2
7634618 Misra Dec 2009 B2
7681104 Sim-Tang et al. Mar 2010 B1
7681105 Sim-Tang et al. Mar 2010 B1
7681109 Litsyn et al. Mar 2010 B2
7730257 Franklin Jun 2010 B2
7730258 Smith et al. Jun 2010 B1
7730274 Usgaonkar Jun 2010 B1
7743276 Jacobson et al. Jun 2010 B2
7752489 Deenadhayalan et al. Jul 2010 B2
7757038 Kitahara Jul 2010 B2
7757059 Ofer et al. Jul 2010 B1
7778960 Chatterjee et al. Aug 2010 B1
7783955 Murin Aug 2010 B2
7814272 Barrall et al. Oct 2010 B2
7814273 Barrall Oct 2010 B2
7818531 Barrall Oct 2010 B2
7827351 Suetsugu et al. Nov 2010 B2
7827439 Mathew et al. Nov 2010 B2
7831768 Ananthamurthy et al. Nov 2010 B2
7856583 Smith Dec 2010 B1
7870105 Arakawa et al. Jan 2011 B2
7873878 Belluomini et al. Jan 2011 B2
7885938 Greene et al. Feb 2011 B1
7886111 Klemm et al. Feb 2011 B2
7908448 Chatterjee et al. Mar 2011 B1
7916538 Jeon et al. Mar 2011 B2
7921268 Jakob Apr 2011 B2
7930499 Duchesne Apr 2011 B2
7941697 Mathew et al. May 2011 B2
7958303 Shuster Jun 2011 B2
7971129 Watson et al. Jun 2011 B2
7975115 Wayda et al. Jul 2011 B2
7984016 Kisley Jul 2011 B2
7991822 Bish et al. Aug 2011 B2
8006126 Deenadhayalan et al. Aug 2011 B2
8010485 Chatterjee et al. Aug 2011 B1
8010829 Chatterjee et al. Aug 2011 B1
8020047 Courtney Sep 2011 B2
8046548 Chatterjee et al. Oct 2011 B1
8051361 Sim-Tang et al. Nov 2011 B2
8051362 Li et al. Nov 2011 B2
8074038 Lionetti et al. Dec 2011 B2
8082393 Galloway et al. Dec 2011 B2
8086603 Nasre et al. Dec 2011 B2
8086634 Mimatsu Dec 2011 B2
8086911 Taylor Dec 2011 B1
8090837 Shin et al. Jan 2012 B2
8108502 Tabbara et al. Jan 2012 B2
8117388 Jernigan, IV Feb 2012 B2
8117521 Parker et al. Feb 2012 B2
8140821 Raizen et al. Mar 2012 B1
8145838 Miller et al. Mar 2012 B1
8145840 Koul et al. Mar 2012 B2
8175012 Chu et al. May 2012 B2
8176360 Frost et al. May 2012 B2
8176405 Hafner et al. May 2012 B2
8180855 Aiello et al. May 2012 B2
8200922 McKean et al. Jun 2012 B2
8209469 Carpenter et al. Jun 2012 B2
8225006 Karamcheti Jul 2012 B1
8239618 Kotzur et al. Aug 2012 B2
8244999 Chatterjee et al. Aug 2012 B1
8261016 Goel Sep 2012 B1
8271455 Kesselman Sep 2012 B2
8285686 Kesselman Oct 2012 B2
8305811 Jeon Nov 2012 B2
8315999 Chatley et al. Nov 2012 B2
8327080 Der Dec 2012 B1
8335769 Kesselman Dec 2012 B2
8341118 Drobychev et al. Dec 2012 B2
8351290 Huang et al. Jan 2013 B1
8364920 Parkison et al. Jan 2013 B1
8365041 Olbrich et al. Jan 2013 B2
8375146 Sinclair Feb 2013 B2
8397016 Talagala et al. Mar 2013 B2
8402152 Duran Mar 2013 B2
8412880 Leibowitz et al. Apr 2013 B2
8423739 Ash et al. Apr 2013 B2
8429436 Fillingim et al. Apr 2013 B2
8452928 Ofer et al. May 2013 B1
8473698 Lionetti et al. Jun 2013 B2
8473778 Simitci et al. Jun 2013 B2
8473815 Chung et al. Jun 2013 B2
8479037 Chatterjee et al. Jul 2013 B1
8484414 Sugimoto et al. Jul 2013 B2
8498967 Chatterjee et al. Jul 2013 B1
8504797 Mimatsu Aug 2013 B2
8522073 Cohen Aug 2013 B2
8533408 Madnani et al. Sep 2013 B1
8533527 Daikokuya et al. Sep 2013 B2
8539177 Madnani et al. Sep 2013 B1
8544029 Bakke et al. Sep 2013 B2
8549224 Zeryck et al. Oct 2013 B1
8583861 Ofer et al. Nov 2013 B1
8589625 Colgrove et al. Nov 2013 B2
8595455 Chatterjee et al. Nov 2013 B2
8615599 Takefman et al. Dec 2013 B1
8627136 Shankar et al. Jan 2014 B2
8627138 Clark et al. Jan 2014 B1
8639669 Douglis et al. Jan 2014 B1
8639863 Kanapathippillai et al. Jan 2014 B1
8640000 Cypher Jan 2014 B1
8650343 Kanapathippillai et al. Feb 2014 B1
8660131 Vermunt et al. Feb 2014 B2
8661218 Piszczek et al. Feb 2014 B1
8671072 Shah et al. Mar 2014 B1
8689042 Kanapathippillai et al. Apr 2014 B1
8700875 Barron et al. Apr 2014 B1
8706694 Chatterjee et al. Apr 2014 B2
8706914 Duchesneau Apr 2014 B2
8706932 Kanapathippillai et al. Apr 2014 B1
8712963 Douglis et al. Apr 2014 B1
8713405 Healey, Jr. et al. Apr 2014 B2
8719621 Karmarkar May 2014 B1
8725730 Keeton et al. May 2014 B2
8751859 Becker-Szendy et al. Jun 2014 B2
8756387 Frost et al. Jun 2014 B2
8762793 Grube et al. Jun 2014 B2
8769232 Suryabudi et al. Jul 2014 B2
8775858 Gower et al. Jul 2014 B2
8775868 Colgrove et al. Jul 2014 B2
8788913 Xin et al. Jul 2014 B1
8793447 Usgaonkar et al. Jul 2014 B2
8799746 Baker et al. Aug 2014 B2
8819311 Liao Aug 2014 B2
8819383 Jobanputra et al. Aug 2014 B1
8822155 Sukumar et al. Sep 2014 B2
8824261 Miller et al. Sep 2014 B1
8832528 Thatcher et al. Sep 2014 B2
8838541 Camble et al. Sep 2014 B2
8838892 Li Sep 2014 B2
8843700 Salessi et al. Sep 2014 B1
8850108 Hayes et al. Sep 2014 B1
8850288 Lazier et al. Sep 2014 B1
8856593 Eckhardt et al. Oct 2014 B2
8856619 Cypher Oct 2014 B1
8862617 Kesselman Oct 2014 B2
8862847 Feng et al. Oct 2014 B2
8862928 Xavier et al. Oct 2014 B2
8868825 Hayes et al. Oct 2014 B1
8874836 Hayes et al. Oct 2014 B1
8880793 Nagineni Nov 2014 B2
8880825 Lionetti et al. Nov 2014 B2
8886778 Nedved et al. Nov 2014 B2
8898383 Yamamoto et al. Nov 2014 B2
8898388 Kimmel Nov 2014 B1
8904231 Coatney et al. Dec 2014 B2
8918478 Ozzie et al. Dec 2014 B2
8930307 Colgrove et al. Jan 2015 B2
8930633 Amit et al. Jan 2015 B2
8943357 Atzmony Jan 2015 B2
8949502 McKnight et al. Feb 2015 B2
8959110 Smith et al. Feb 2015 B2
8959388 Kuang et al. Feb 2015 B1
8972478 Storer et al. Mar 2015 B1
8972779 Lee et al. Mar 2015 B2
8977597 Ganesh et al. Mar 2015 B2
8996828 Kalos et al. Mar 2015 B2
9003144 Hayes et al. Apr 2015 B1
9009724 Gold et al. Apr 2015 B2
9021053 Bernbo et al. Apr 2015 B2
9021215 Meir et al. Apr 2015 B2
9025393 Wu et al. May 2015 B2
9043372 Makkar et al. May 2015 B2
9047214 Northcott Jun 2015 B1
9053808 Sprouse et al. Jun 2015 B2
9058155 Cepulis et al. Jun 2015 B2
9063895 Madnani et al. Jun 2015 B1
9063896 Madnani et al. Jun 2015 B1
9098211 Madnani et al. Aug 2015 B1
9110898 Chamness et al. Aug 2015 B1
9110964 Shilane et al. Aug 2015 B1
9116819 Cope et al. Aug 2015 B2
9117536 Yoon et al. Aug 2015 B2
9122401 Zaltsman et al. Sep 2015 B2
9123422 Yu et al. Sep 2015 B2
9124300 Sharon et al. Sep 2015 B2
9134908 Horn et al. Sep 2015 B2
9153337 Sutardja Oct 2015 B2
9158472 Kesselman et al. Oct 2015 B2
9159422 Lee et al. Oct 2015 B1
9164891 Karamcheti et al. Oct 2015 B2
9183136 Kawamura et al. Nov 2015 B2
9189650 Jaye et al. Nov 2015 B2
9201733 Verma et al. Dec 2015 B2
9207876 Shu et al. Dec 2015 B2
9229656 Contreras et al. Jan 2016 B1
9229810 He et al. Jan 2016 B2
9235475 Shilane et al. Jan 2016 B1
9244626 Shah et al. Jan 2016 B2
9250999 Barroso Feb 2016 B1
9251066 Colgrove et al. Feb 2016 B2
9268648 Barash et al. Feb 2016 B1
9268806 Kesselman Feb 2016 B1
9280678 Redberg Mar 2016 B2
9286002 Karamcheti et al. Mar 2016 B1
9292214 Kalos et al. Mar 2016 B2
9298760 Li et al. Mar 2016 B1
9304908 Karamcheti et al. Apr 2016 B1
9311969 Sharon et al. Apr 2016 B2
9311970 Sharon et al. Apr 2016 B2
9323663 Karamcheti et al. Apr 2016 B2
9323667 Bennett Apr 2016 B2
9323681 Apostolides et al. Apr 2016 B2
9335942 Kumar et al. May 2016 B2
9348538 Mallaiah et al. May 2016 B2
9355022 Ravimohan et al. May 2016 B2
9384082 Lee et al. Jul 2016 B1
9384252 Akirav et al. Jul 2016 B2
9389958 Sundaram et al. Jul 2016 B2
9390019 Patterson et al. Jul 2016 B2
9395922 Nishikido et al. Jul 2016 B2
9396202 Drobychev et al. Jul 2016 B1
9400828 Kesselman et al. Jul 2016 B2
9405478 Koseki et al. Aug 2016 B2
9411685 Lee Aug 2016 B2
9417960 Cai et al. Aug 2016 B2
9417963 He et al. Aug 2016 B2
9430250 Hamid et al. Aug 2016 B2
9430542 Akirav et al. Aug 2016 B2
9432541 Ishida Aug 2016 B2
9454434 Sundaram et al. Sep 2016 B2
9471579 Natanzon Oct 2016 B1
9477554 Hayes et al. Oct 2016 B2
9477632 Du Oct 2016 B2
9501398 George et al. Nov 2016 B2
9525737 Friedman Dec 2016 B2
9529542 Friedman et al. Dec 2016 B2
9535631 Fu et al. Jan 2017 B2
9552248 Miller et al. Jan 2017 B2
9552291 Munetoh et al. Jan 2017 B2
9552299 Stalzer Jan 2017 B2
9563517 Natanzon et al. Feb 2017 B1
9588698 Karamcheti et al. Mar 2017 B1
9588712 Kalos et al. Mar 2017 B2
9594652 Sathiamoorthy et al. Mar 2017 B1
9600193 Ahrens et al. Mar 2017 B2
9619321 Haratsch et al. Apr 2017 B1
9619430 Kannan et al. Apr 2017 B2
9645754 Li et al. May 2017 B2
9667720 Bent et al. May 2017 B1
9710535 Aizman et al. Jul 2017 B2
9733840 Karamcheti et al. Aug 2017 B2
9734225 Akirav et al. Aug 2017 B2
9740403 Storer et al. Aug 2017 B2
9740700 Chopra et al. Aug 2017 B1
9740762 Horowitz et al. Aug 2017 B2
9747319 Bestler et al. Aug 2017 B2
9747320 Kesselman Aug 2017 B2
9767130 Bestler et al. Sep 2017 B2
9781227 Friedman et al. Oct 2017 B2
9785498 Misra et al. Oct 2017 B2
9798486 Singh Oct 2017 B1
9804925 Carmi et al. Oct 2017 B1
9811285 Karamcheti et al. Nov 2017 B1
9811546 Bent et al. Nov 2017 B1
9818478 Chung Nov 2017 B2
9829066 Thomas et al. Nov 2017 B2
9836245 Hayes et al. Dec 2017 B2
9891854 Munetoh et al. Feb 2018 B2
9891860 Delgado et al. Feb 2018 B1
9892005 Kedem et al. Feb 2018 B2
9892186 Akirav et al. Feb 2018 B2
9904589 Donlan et al. Feb 2018 B1
9904717 Anglin et al. Feb 2018 B2
9910748 Pan Mar 2018 B2
9910904 Anglin et al. Mar 2018 B2
9934237 Shilane et al. Apr 2018 B1
9940065 Kalos et al. Apr 2018 B2
9946604 Glass Apr 2018 B1
9952809 Shah Apr 2018 B2
9959167 Donlan et al. May 2018 B1
9965539 D'Halluin et al. May 2018 B2
9998539 Brock et al. Jun 2018 B1
10007457 Hayes et al. Jun 2018 B2
10013177 Liu et al. Jul 2018 B2
10013311 Sundaram et al. Jul 2018 B2
10019314 Yang et al. Jul 2018 B2
10019317 Usvyatsky et al. Jul 2018 B2
10031703 Natanzon et al. Jul 2018 B1
10061512 Lin Aug 2018 B2
10073626 Karamcheti et al. Sep 2018 B2
10082985 Hayes et al. Sep 2018 B2
10089012 Chen et al. Oct 2018 B1
10089174 Yang Oct 2018 B2
10089176 Donlan et al. Oct 2018 B1
10108819 Donlan et al. Oct 2018 B1
10146787 Bashyam et al. Dec 2018 B2
10152268 Chakraborty et al. Dec 2018 B1
10157098 Yang et al. Dec 2018 B2
10162704 Kirschner et al. Dec 2018 B1
10180875 Klein Jan 2019 B2
10185730 Bestler et al. Jan 2019 B2
10235065 Miller et al. Mar 2019 B1
10324639 Seo Jun 2019 B2
10481798 Doshi et al. Nov 2019 B2
10567406 Astigarraga et al. Feb 2020 B2
10846137 Vallala et al. Nov 2020 B2
10877683 Wu et al. Dec 2020 B2
11076509 Alissa et al. Jul 2021 B2
11106810 Natanzon et al. Aug 2021 B2
11119657 Doshi et al. Sep 2021 B2
11194707 Stalzer Dec 2021 B2
20020144059 Kendall Oct 2002 A1
20030105984 Masuyama et al. Jun 2003 A1
20030110205 Johnson Jun 2003 A1
20040161086 Buntin et al. Aug 2004 A1
20050001652 Malik et al. Jan 2005 A1
20050076228 Davis et al. Apr 2005 A1
20050235132 Karr et al. Oct 2005 A1
20050278460 Shin et al. Dec 2005 A1
20050283649 Turner et al. Dec 2005 A1
20060015683 Ashmore et al. Jan 2006 A1
20060114930 Lucas et al. Jun 2006 A1
20060174157 Barrall et al. Aug 2006 A1
20060248294 Nedved et al. Nov 2006 A1
20070079068 Draggon Apr 2007 A1
20070214194 Reuter Sep 2007 A1
20070214314 Reuter Sep 2007 A1
20070234016 Davis et al. Oct 2007 A1
20070268905 Baker et al. Nov 2007 A1
20080080709 Michtchenko et al. Apr 2008 A1
20080107274 Worthy May 2008 A1
20080155191 Anderson et al. Jun 2008 A1
20080256141 Wayda et al. Oct 2008 A1
20080295118 Liao Nov 2008 A1
20090077208 Nguyen et al. Mar 2009 A1
20090138654 Sutardja May 2009 A1
20090216910 Duchesneau Aug 2009 A1
20090216920 Lauterbach et al. Aug 2009 A1
20100017444 Chatterjee et al. Jan 2010 A1
20100042636 Lu Feb 2010 A1
20100094806 Apostolides et al. Apr 2010 A1
20100115070 Missimilly May 2010 A1
20100125695 Wu et al. May 2010 A1
20100162076 Sim-Tang et al. Jun 2010 A1
20100169707 Mathew et al. Jul 2010 A1
20100174576 Naylor Jul 2010 A1
20100223423 Sinclair Sep 2010 A1
20100268908 Ouyang et al. Oct 2010 A1
20100306500 Mimatsu Dec 2010 A1
20110035540 Fitzgerald et al. Feb 2011 A1
20110040925 Frost et al. Feb 2011 A1
20110060927 Fillingim et al. Mar 2011 A1
20110119462 Leach et al. May 2011 A1
20110219170 Frost et al. Sep 2011 A1
20110238625 Hamaguchi et al. Sep 2011 A1
20110264843 Haines et al. Oct 2011 A1
20110302369 Goto et al. Dec 2011 A1
20120011398 Eckhardt et al. Jan 2012 A1
20120079318 Colgrove et al. Mar 2012 A1
20120089567 Takahashi et al. Apr 2012 A1
20120110249 Jeong et al. May 2012 A1
20120131253 McKnight et al. May 2012 A1
20120158923 Mohamed et al. Jun 2012 A1
20120191900 Kunimatsu et al. Jul 2012 A1
20120198152 Terry et al. Aug 2012 A1
20120198261 Brown et al. Aug 2012 A1
20120209943 Jung Aug 2012 A1
20120226934 Rao Sep 2012 A1
20120246435 Meir et al. Sep 2012 A1
20120260055 Murase Oct 2012 A1
20120311557 Resch Dec 2012 A1
20130022201 Glew et al. Jan 2013 A1
20130036314 Glew et al. Feb 2013 A1
20130042056 Shats et al. Feb 2013 A1
20130060884 Bernbo et al. Mar 2013 A1
20130067188 Mehra et al. Mar 2013 A1
20130073894 Xavier et al. Mar 2013 A1
20130124776 Hallak et al. May 2013 A1
20130132800 Healey, Jr. et al. May 2013 A1
20130151653 Sawicki et al. Jun 2013 A1
20130151771 Tsukahara et al. Jun 2013 A1
20130173853 Ungureanu et al. Jul 2013 A1
20130238554 Yucel et al. Sep 2013 A1
20130339314 Carpentier et al. Dec 2013 A1
20130339635 Amit et al. Dec 2013 A1
20130339818 Baker et al. Dec 2013 A1
20140040535 Lee et al. Feb 2014 A1
20140040702 He et al. Feb 2014 A1
20140047263 Coatney et al. Feb 2014 A1
20140047269 Kim Feb 2014 A1
20140063721 Herman et al. Mar 2014 A1
20140064048 Cohen et al. Mar 2014 A1
20140068224 Fan et al. Mar 2014 A1
20140075252 Luo et al. Mar 2014 A1
20140122510 Namkoong et al. May 2014 A1
20140136880 Shankar et al. May 2014 A1
20140181402 White Jun 2014 A1
20140220561 Sukumar et al. Aug 2014 A1
20140237164 Le et al. Aug 2014 A1
20140279936 Bernbo et al. Sep 2014 A1
20140280025 Eidson et al. Sep 2014 A1
20140289588 Nagadomi et al. Sep 2014 A1
20140330785 Isherwood et al. Nov 2014 A1
20140372838 Lou et al. Dec 2014 A1
20140380125 Calder et al. Dec 2014 A1
20140380126 Yekhanin et al. Dec 2014 A1
20150032720 James Jan 2015 A1
20150039645 Lewis Feb 2015 A1
20150039849 Lewis Feb 2015 A1
20150089283 Kermarrec et al. Mar 2015 A1
20150100746 Rychlik et al. Apr 2015 A1
20150134824 Mickens et al. May 2015 A1
20150153800 Lucas et al. Jun 2015 A1
20150154418 Redberg Jun 2015 A1
20150180714 Chunn et al. Jun 2015 A1
20150280959 Vincent Oct 2015 A1
20160026397 Nishikido et al. Jan 2016 A1
20160182542 Staniford Jun 2016 A1
20160191508 Bestler et al. Jun 2016 A1
20160246537 Kim Aug 2016 A1
20160248631 Duchesneau Aug 2016 A1
20160378612 Hipsh et al. Dec 2016 A1
20170091236 Hayes et al. Mar 2017 A1
20170103092 Hu et al. Apr 2017 A1
20170103094 Hu et al. Apr 2017 A1
20170103098 Hu et al. Apr 2017 A1
20170103116 Hu et al. Apr 2017 A1
20170177236 Haratsch et al. Jun 2017 A1
20170262202 Seo Sep 2017 A1
20180039442 Shadrin et al. Feb 2018 A1
20180054454 Astigarraga et al. Feb 2018 A1
20180081958 Akirav et al. Mar 2018 A1
20180101441 Hyun et al. Apr 2018 A1
20180101587 Anglin et al. Apr 2018 A1
20180101588 Anglin et al. Apr 2018 A1
20180217756 Liu et al. Aug 2018 A1
20180307560 Vishnumolakala et al. Oct 2018 A1
20180321874 Li et al. Nov 2018 A1
20190036703 Bestler Jan 2019 A1
20190220315 Vallala et al. Jul 2019 A1
20200034560 Natanzon et al. Jan 2020 A1
20200326871 Wu et al. Oct 2020 A1
20210360833 Alissa et al. Nov 2021 A1
Foreign Referenced Citations (6)
Number Date Country
2164006 Mar 2010 EP
2256621 Dec 2010 EP
0213033 Feb 2002 WO
2008103569 Aug 2008 WO
2008157081 Dec 2008 WO
2013032825 Mar 2013 WO
Non-Patent Literature Citations (24)
Entry
Hwang et al., “RAID-x: A New Distributed Disk Array for I/O-Centric Cluster Computing”, Proceedings of The Ninth International Symposium on High-performance Distributed Computing, Aug. 2000, pp. 279-286, The Ninth International Symposium on High-Performance Distributed Computing, IEEE Computer Society, Los Alamitos, CA.
International Search Report and Written Opinion, PCT/US2015/018169, dated May 15, 2015, 10 pages.
International Search Report and Written Opinion, PCT/US2015/034291, dated Sep. 30, 2015, 3 pages.
International Search Report and Written Opinion, PCT/US2015/034302, dated Sep. 11, 2015, 10 pages.
International Search Report and Written Opinion, PCT/US2015/039135, dated Sep. 18, 2015, 8 pages.
International Search Report and Written Opinion, PCT/US2015/039136, dated Sep. 23, 2015, 7 pages.
International Search Report and Written Opinion, PCT/US2015/039137, dated Oct. 1, 2015, 8 pages.
International Search Report and Written Opinion, PCT/US2015/039142, dated Sep. 24, 2015, 3 pages.
International Search Report and Written Opinion, PCT/US2015/044370, dated Dec. 15, 2015, 3 pages.
International Search Report and Written Opinion, PCT/US2016/014356, dated Jun. 28, 2016, 3 pages.
International Search Report and Written Opinion, PCT/US2016/014357, dated Jun. 29, 2016, 3 pages.
International Search Report and Written Opinion, PCT/US2016/014361, dated May 30, 2016, 3 pages.
International Search Report and Written Opinion, PCT/US2016/014604, dated May 19, 2016, 3 pages.
International Search Report and Written Opinion, PCT/US2016/016504, dated Jul. 6, 2016, 7 pages.
International Search Report and Written Opinion, PCT/US2016/023485, dated Jul. 21, 2016, 13 pages.
International Search Report and Written Opinion, PCT/US2016/024391, dated Jul. 12, 2016, 11 pages.
International Search Report and Written Opinion, PCT/US2016/026529, dated Jul. 19, 2016, 9 pages.
International Search Report and Written Opinion, PCT/US2016/031039, dated Aug. 18, 2016, 7 pages.
International Search Report and Written Opinion, PCT/US2016/033306, dated Aug. 19, 2016, 11 pages.
International Search Report and Written Opinion, PCT/US2016/047808, dated Nov. 25, 2016, 14 pages.
Kim et al., “Data Access Frequency based Data Replication Method using Erasure Codes in Cloud Storage System”, Journal of the Institute of Electronics and Information Engineers, Feb. 2014, vol. 51, No. 2, 7 pages.
Schmid, “RAID Scaling Charts, Part 3: 4-128 KB Stripes Compared”, Tom's Hardware, Nov. 27, 2007, URL: http://www.tomshardware.com/reviews/RAID-SCALING-CHARTS.1735-4.html, 24 pages.
Stalzer, “FlashBlades: System Architecture and Applications”, Proceedings of the 2nd Workshop on Architectures and Systems for Big Data, Jun. 2012, pp. 10-14, Association for Computing Machinery, New York, NY.
Storer et al., “Pergamum: Replacing Tape with Energy Efficient, Reliable, Disk-Based Archival Storage”, FAST'08: Proceedings of the 6th USENIX Conference on File and Storage Technologies, Article No. 1, Feb. 2008, pp. 1-16, USENIX Association, Berkeley, CA.
Related Publications (1)
Number Date Country
20230244382 A1 Aug 2023 US
Continuations (3)
Number Date Country
Parent 17401436 Aug 2021 US
Child 18296880 US
Parent 16655792 Oct 2019 US
Child 17401436 US
Parent 15337151 Oct 2016 US
Child 16655792 US