Low-overhead index for a flash cache

Description

FIELD OF THE INVENTION

Embodiments of the invention relate to systems and methods for managing memory in a computing environment. More particularly, embodiments of the invention relate to systems and methods for implementing an index for a memory device such as a flash cache.

BACKGROUND

In order to improve the performance of computing systems, caches are often implemented. A computing system can involve a single cache or tiered cache levels. Further, the cache can be large. For example, a computing system may use a flash cache to cache data. An index may be used to track the data stored in the flash cache. The index may associate a location of the data with an identifier of the data. When data is accessed (e.g., read or written), the index is consulted using a lookup operation. Because a flash cache can be large, the index may also be large. Unfortunately, maintaining a large index consumes a significant portion of memory. Systems and methods are needed to implement a low-overhead index for a cache such as a flash cache.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which at least some aspects of this disclosure can be obtained, a more particular description will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only example embodiments of the invention and are not therefore to be considered to be limiting of its scope, embodiments of the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:

FIG. 1 illustrates an example of a computing environment that includes an index used to access content stored in a cache;

FIG. 2 illustrates an example of the index used to access the cache and illustrates that entries in the index include a short identifier and at least one other metadata or field;

FIG. 3 illustrates an example of a method for accessing data stored in the cache using the index; and

FIG. 4 illustrates an example of a method for writing data to the cache using the index.

DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS

Embodiments of the invention relate to cache indexing and more specifically to cache indexing using an index that includes shortened or partial identifiers (for example, using the first 4 bytes of a 20-byte SHA-1 hash). By way of example, a SHA-1 hash is an example of an identifier and a portion of the hash (short hash) is an example of a short identifier. Keys, cryptographic hashes, fingerprints, and the like are examples of identifiers, and short identifiers can be obtained therefrom. In addition to the partial or short identifiers, the index may include additional information, such as segment size or segment type. The additional information can be used to increase the probability that a matched partial identifier in the index corresponds with the data segment that has actually been requested. The additional information can also ensure that the most likely data segment is identified when more than one partial identifier matches the corresponding part of the identifier of a requested data segment. In addition, an asymmetric cache response time is provided and cache misses are very fast. Further, a cache hit or match is identified as probable based on the short identifier, and the cache hit is then confirmed by reading the segment and validating that the correct identifier is stored with the segment.
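By way of illustration only, the derivation of a short identifier from a full identifier can be sketched as follows. The function names and the choice of the leading 4 bytes are assumptions made for this sketch; other portions of the identifier could be used instead.

```python
import hashlib

SHORT_ID_BYTES = 4  # assumption: keep the first 4 bytes of the 20-byte SHA-1 digest


def full_identifier(segment: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest of a data segment."""
    return hashlib.sha1(segment).digest()


def short_identifier(full_id: bytes, length: int = SHORT_ID_BYTES) -> bytes:
    """Return the leading `length` bytes of a full identifier."""
    if not 0 < length <= len(full_id):
        raise ValueError("length must be between 1 and len(full_id)")
    return full_id[:length]


segment = b"example data segment"
fid = full_identifier(segment)
sid = short_identifier(fid)
print(len(fid), len(sid))  # 20 4
```

The short identifier is always a prefix of the full identifier in this sketch, so a lookup can compute it from the identifier supplied in a request without touching the cache.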

Embodiments of the invention may be implemented in de-duplicated storage systems and other storage systems. An identifier such as a hash (e.g., SHA-1) can be used to uniquely identify the content of a data segment. These hashes are examples of identifiers that are used to identify segments or data segments. Identifiers can be stored in an index (e.g., a hash table) and when a new data segment arrives to be included in the storage system, the identifier is calculated and looked up in the hash table. If a previous entry exists in the hash table, indicating that the data segment is in the storage system, then the new segment does not need to be stored in the storage system because it has been identified as a duplicate of a previous data segment. The index is also consulted to read back data associated with an identifier.

In order to have a very high probability that identifiers are unique, such that only the exact same data segment would produce the same identifier, the identifiers need to be rather large in size (by way of example on the order of 20 bytes). But large identifiers have a drawback in terms of the memory or storage required to store them.

When indexing a flash cache (or other storage), the index to the data segments stored in the cache may be stored in a faster memory for performance reasons. However, large identifier sizes limit how many data segments can be referenced in the memory. Embodiments of the invention relate to an index that includes only a portion of the identifier (a short identifier). Using a short identifier allows more data segments to be referenced using less memory. However, short identifiers may result in collisions when using the index. A collision occurs, by way of example, when two or more distinct data segments have the same short identifier and are thus incorrectly determined to be a match. Collisions are acceptable as long as they can be detected and handled.

The index may include other information or metadata. The metadata may include, by way of example, a data segment location in the flash cache, segment size, segment type, path, or the like or combination thereof, and may be used to resolve collisions. This information allows a collision to be disambiguated. For example, a read request typically knows the size of the segment to be read. In a system where segment sizes are different (e.g., based on content defined patterns used to anchor the segment boundaries), the size of the data segment in combination with the short identifier can disambiguate a collision. When writing data to a flash cache, the short identifier and size (or other characteristic) can be used in combination to determine if the data segment being written is already in the cache. Even though there is still a possibility of a collision when using the short identifier in combination with other information, the possibility of collision is smaller. As another example, a storage system may use a long string to identify a file with its path, and a partial identifier would be a hash of the string shortened to a specified number of bytes. The short hash may collide, while the long string is unique. A secondary metadata may be stored in the index such as the file size or owner's ID number, which can be used to reduce the chance of collisions.
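A minimal sketch of disambiguating a collision with segment size is given below. The entry layout, the `matching_entries` helper, and the sample values are hypothetical and serve only to illustrate the idea; an actual index may be organized differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class IndexEntry:
    short_id: bytes  # portion of the full identifier
    size: int        # segment size in bytes (secondary metadata)
    location: int    # hypothetical slot of the segment in the flash cache


def matching_entries(
    index: Dict[bytes, List[IndexEntry]], short_id: bytes, size: int
) -> List[IndexEntry]:
    """Return entries whose short identifier AND size match the request."""
    return [e for e in index.get(short_id, []) if e.size == size]


# Two distinct segments that happen to share a made-up short identifier.
sid = bytes.fromhex("a1b2c3d4")
index = {sid: [IndexEntry(sid, 4096, 0), IndexEntry(sid, 8192, 7)]}

print(matching_entries(index, sid, 8192))
```

Here the short identifier alone matches two entries, while the combination of short identifier and size narrows the result to one. If both segments also had the same size, the collision would remain and would have to be handled by another field or by verifying the full identifier.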

When performing a write to the cache, it is not imperative to insert the data segment into the cache. If an insertion operation cannot be disambiguated at the index, it is not necessary to perform the insertion. Alternatively, a data segment can be inserted as a new data segment and the previous entry can be marked for deletion.

Generally, the index is used during various operations such as read and write operations. In one example, a client may issue a lookup request using an identifier such as a fingerprint, key, or the like. The index is checked using a portion of the identifier and using at least one secondary metadata. If a combination of the short identifier and the other secondary metadata is not present in the index, a miss is reported (report false) and returned to the client. If the combination of the short identifier and other secondary metadata exists in the index, an asynchronous read of the data from the cache may be issued and a provisional true result is returned to the client. Thus, true may be provisionally reported even if the data segment has not been read from the cache. This response indicates that there will be a call back from the asynchronous read and that the result of the read may be either true or false. A return of true from the asynchronous read would indicate the data is cached and being returned. A return of false from the asynchronous read would indicate that upon reading the data it was determined to be a collision case where the combination of short identifier and other secondary metadata also match a different data segment.

When the cache returns with the data segment and the full identifier, the full identifier can be compared with the identifier included in the initial request to determine whether the identifier returned from the cache matches the identifier included in the lookup. If the match is false, a false report is returned to the client and if the match is true, a true report is reported to the client along with the requested data.
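The provisional result and the later confirmation can be sketched as shown below. This is an illustration under stated assumptions, not a definitive implementation: the dictionaries stand in for the index and the flash cache, a thread pool stands in for the asynchronous read, and the names `insert`, `lookup`, and `read_and_verify` are hypothetical.

```python
import hashlib
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Optional, Tuple

SID_LEN = 4  # assumption: 4-byte short identifiers

# Hypothetical stand-ins: the index maps (short id, size) -> location and
# the flash cache maps location -> (full identifier, data).
index: Dict[Tuple[bytes, int], int] = {}
flash: Dict[int, Tuple[bytes, bytes]] = {}
executor = ThreadPoolExecutor(max_workers=1)


def insert(data: bytes, location: int) -> bytes:
    """Store a segment and index it by short identifier and size."""
    fid = hashlib.sha1(data).digest()
    index[(fid[:SID_LEN], len(data))] = location
    flash[location] = (fid, data)
    return fid


def read_and_verify(location: int, requested_id: bytes) -> Optional[bytes]:
    """Read the segment and confirm the full identifier stored with it."""
    stored_id, data = flash[location]
    return data if stored_id == requested_id else None


def lookup(requested_id: bytes, size: int) -> Tuple[bool, Optional[Future]]:
    """Return (False, None) on a miss, else (True, future) as a provisional hit."""
    location = index.get((requested_id[:SID_LEN], size))
    if location is None:
        return False, None  # certain miss, no flash access
    return True, executor.submit(read_and_verify, location, requested_id)


fid = insert(b"hello", location=0)

provisional, future = lookup(fid, 5)
confirmed = future.result()            # callback result: the cached data

# A different identifier sharing the same first 4 bytes: provisional hit,
# but the full-identifier comparison reports a collision (None).
forged = fid[:SID_LEN] + bytes(16)
provisional_forged, forged_future = lookup(forged, 5)
forged_result = forged_future.result()

miss, no_future = lookup(bytes(20), 5)
```

In this sketch the future returned with the provisional true plays the role of the call back: it resolves to the data when the full identifiers match and to `None` when the match was a collision.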

By sizing the short identifier appropriately, a short identifier match indicates, with high probability, that the data segment is located in the cache. Embodiments of the invention can identify a miss quickly and with certainty when the short identifier is not found in the index. Embodiments of the invention are described in the context of a cache such as a flash cache and an index stored in memory. However, one of skill in the art can appreciate that embodiments of the invention may be applied to other storage configurations. The data segments, for example, may be stored in hard disk drives, which are much slower than a flash cache. The data segments may also be stored in the cloud or other remote storage. Embodiments of the invention can reduce the number of times needed to access the flash cache or other storage device. When a cost is incurred for each access, embodiments of the invention can reduce costs.
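As a rough guide to sizing, and assuming (an assumption not stated above) that identifiers are uniformly distributed, the probability that a lookup for a segment that is not cached matches at least one of n index entries on a b-bit short identifier can be estimated as:

```latex
P(\text{false match}) \;=\; 1 - \left(1 - 2^{-b}\right)^{n} \;\approx\; \frac{n}{2^{b}} \qquad (n \ll 2^{b})
```

For example, with b = 32 (a 4-byte short identifier) and n = 2^20 entries, the estimate is about 2^-12, or roughly 2.4 x 10^-4. Comparing secondary metadata such as segment size lowers this further, by an amount that depends on how the metadata values are distributed.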

FIG. 1 illustrates an example of a computing system 100. The computing system 100 may be implemented as an integrated device or may include multiple devices that are connected together using a network. The computing system 100 may be configured to perform a special purpose. For example, the computing system 100 may be configured to perform data protection operations. Example data protection operations include, but are not limited to, backing up data from one or more clients, restoring data to one or more clients, implementing a low-overhead index, de-duplicating data backed up in the computing system, indexing data stored in the computing system, optimizing the data stored in the computing system, reading a cache, writing to a cache, or the like or combination thereof.

The computing system 100 may include a processor 102 (or multiple processors), a memory 104, a flash cache 108 (or other suitable memory type), and storage 110. The memory 104 and the flash cache 108 may both be configured as a cache. The memory 104, for example, may be DRAM or the like. The memory 104 is typically faster and smaller than the flash cache 108. The flash cache 108 is typically smaller and faster than the storage 110.

The memory 104, flash cache 108, and storage 110 are arranged to improve performance of the computing system 100. Over time, by way of example, data that is requested more frequently tends to reside in the flash cache 108.

An index 106 is maintained in the memory 104. The index 106 includes multiple entries and each entry corresponds to data or a data segment stored in the flash cache 108. In one example, the index 106 may be implemented as a table such as a hash table. The hash in an entry of the index is an identifier of data corresponding to the entry. In one example, the index 106 may not store the complete identifier. The index 106 may include short identifiers of data stored in the flash cache 108. Each entry in the index 106 may also store other information or metadata such as a segment size and segment type of the data associated with the identifier, or the like or other combination thereof.

In one example, the entries in the index 106 only include partial identifiers. This allows the index 106 to reference more data in the flash cache 108 while using less of the memory 104. As previously indicated, partial or short identifiers are not necessarily unique and there is a risk of a collision. A collision, for example, occurs when more than one entry in the cache exists for a given request. More specifically, some of the partial identifiers in the index 106 may be the same. Embodiments of the invention augment the index with additional information so that collisions can be avoided, the requested data can be accurately identified, and ambiguities or collisions in the index can be resolved.

In one embodiment, collisions in the index 106 can be resolved by establishing multiple points for comparison. In addition to comparing the partial or short identifier associated with the data, a comparison may also be performed using the other metadata stored in the index 106. Segment size, type, or the like can also be evaluated in the context of the request. By allowing multiple points of comparison, false positives can be reduced and the appropriate entry in the cache 108 can be identified if present.

When data in the computing system is requested, a request is generated for the cache. For example, a client may issue a read request or lookup using an identifier. The read request may include certain information about the requested data. The read request may include the identifier of the requested data, a segment size, a segment type, or the like or other combination thereof. The index 106 may be searched based on the identifier to determine if any of the partial identifiers in the index 106 are a match for the identifier of the requested data. If there is a match, at least one of the other metadata in the index may be compared with the information in the request. If a combination of the identifier and other metadata is not present in the index, the data does not exist in the cache and a false result is returned to the client. If the combination is present in the index, a read is issued and a provisional true result is returned to the client even if the read has not completed. When the cache returns the data and the full identifier, the full identifier is compared with the identifier included in the original request. A match is reported as true and a mismatch results in a false result.

When writing to the cache, a lookup is performed in the index based on the partial or short identifier. The other metadata may also be compared. If a match does not exist in the index, then the data is inserted into the cache and the index is updated. If a match of the short identifier and secondary metadata is present, the write may be disregarded. Alternatively, the write (e.g., of a data segment) may be inserted into the cache, the index is updated to include an entry that points to the new data segment in the cache, and the previous entry is marked for deletion.

FIG. 2 illustrates a relationship between an index in memory and a flash cache. FIG. 2 illustrates that the memory 104 includes the index 106. The index 106 includes multiple entries such as the entry 202 and the entry 204. Each of the entries is associated with a data segment in the flash cache 108. In this example, the entry 202 points to or identifies the location of the data segment 214, and the entry 204 points to or identifies the location of the data segment 216. The segment 214 may be included in a container 218 and the segment 216 may be included in a container 220. Each of the containers may store multiple data segments. The fingerprints of the segments may be stored in a container header.

More specifically, each entry in the index 106 includes at least one metadata 206 and a SID (short identifier) 208. In this example, the SID 208 includes part of the full identifier of the data segment 214. For example, if a normal or full identifier is 20 bytes, the partial identifier 208 includes fewer than 20 bytes. The partial identifier 208 may include the most significant bits, the least significant bits, or other combination of bits. By using a partial or short identifier in the index 106, the index 106 can store information for more segments in the flash cache 108. If the SID is 10 bytes, then a memory or allocated portion of the memory can store approximately twice as many references.
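The capacity effect of shortening the identifier can be checked with simple arithmetic. The sketch below is illustrative only; the 1 GiB budget and the 8 bytes of per-entry metadata are hypothetical values, and real entries carry additional overhead that is ignored here.

```python
def entries_that_fit(memory_bytes: int, id_bytes: int, metadata_bytes: int = 0) -> int:
    """Number of fixed-size index entries that fit in a memory budget."""
    return memory_bytes // (id_bytes + metadata_bytes)


budget = 1 << 30  # hypothetical 1 GiB allocated to the index

# Identifier only: halving the identifier roughly doubles the entry count.
full_only = entries_that_fit(budget, 20)
short_only = entries_that_fit(budget, 10)

# With 8 bytes of metadata per entry (an assumed figure), the gain is smaller.
full_meta = entries_that_fit(budget, 20, 8)
short_meta = entries_that_fit(budget, 10, 8)

print(round(short_only / full_only, 2), round(short_meta / full_meta, 2))
```

The factor of approximately two applies when the identifier dominates the entry; once per-entry metadata is counted, the ratio is (20 + m) / (10 + m) for m bytes of metadata, which is below two.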

The metadata 206 and 210 may include one or more of a location of the segment, a size of the data segment, a type of the data segment, or other metadata about the segment 214.

FIG. 3 is an example of a method for reading the flash cache using the index. The method 300 typically begins by performing a lookup operation in the index in box 302. The lookup operation may be performed in response to a request from a client of the computing system. The request that is the basis of the lookup operation may include an identifier of the data segment being requested, written, or accessed. The request may also include a segment type, a segment size, or other information related to the requested data segment. The portion of the identifier corresponding to the short identifiers stored in the index is used in the lookup operation.

If the short identifier is not found in the index, a miss is returned immediately in box 304 and the method may end after returning the miss. If a hit is found, the secondary metadata may be evaluated or checked in box 306. If the lookup operation identifies a single match, it may be possible to issue a read request. However, the likelihood of reading the correct data segment from the cache can be improved by evaluating other metadata in the index. Further, the lookup operation may identify more than one match in the index. In other words, more than one short identifier may match the portion of the identifier used to search or access the index.

In box 306, whether a single match is identified or whether multiple matches are identified, secondary metadata in the index may also be compared with information included in the request if necessary. By evaluating the secondary metadata, the number of matches can be reduced, thus reducing collisions, and the likelihood of requesting the correct data segment from the cache is improved. When the short identifier and at least one of the other metadata match with the information included in the lookup request, a combination is found in box 308 and a read request is issued to the flash cache. A provisional true result may be returned to the client even if the data has not been read from the flash cache. When the secondary metadata does not match, a miss is returned in box 312.

When a combination is found in box 308, the flash cache is read and the flash cache returns the data segment in response to the read request. The flash cache may also return information associated with the data segment such as the full identifier. In box 310, the identifier returned from the flash cache in response to the read request is compared with the identifier in the original lookup request. If these identifiers match, then a hit is achieved and the data segment is returned. If these identifiers do not match, then a miss is returned to the client. If the data segment is not present in the flash cache, it may be retrieved from storage 110 for example.
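The flow of FIG. 3 can be sketched as follows, with comments keyed to the box numbers. This is a simplified, synchronous illustration: the dictionaries standing in for the index and the flash cache are hypothetical, size is the only secondary metadata used, and reading each remaining candidate in turn is an assumption of the sketch rather than a requirement of the method.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SID_LEN = 4  # assumption: 4-byte short identifiers


@dataclass(frozen=True)
class Entry:
    size: int      # secondary metadata
    location: int  # hypothetical slot number in the flash cache


def read_segment(
    index: Dict[bytes, List[Entry]],
    flash: Dict[int, Tuple[bytes, bytes]],  # location -> (full identifier, data)
    requested_id: bytes,
    size: int,
) -> Optional[bytes]:
    # Box 302: look up the short-identifier portion of the requested identifier.
    candidates = index.get(requested_id[:SID_LEN])
    if not candidates:
        return None  # Box 304: miss, no flash access needed.

    # Box 306: narrow the candidates using secondary metadata (here, size).
    matches = [e for e in candidates if e.size == size]
    if not matches:
        return None  # Box 312: metadata mismatch, still no flash access.

    # Boxes 308 and 310: read each candidate and confirm the full identifier.
    for entry in matches:
        stored_id, data = flash[entry.location]
        if stored_id == requested_id:
            return data  # confirmed hit
    return None  # collision: the combination matched a different segment


data = b"cached segment"
fid = hashlib.sha1(data).digest()
index = {fid[:SID_LEN]: [Entry(len(data), 0)]}
flash = {0: (fid, data)}

hit = read_segment(index, flash, fid, len(data))
wrong_size = read_segment(index, flash, fid, len(data) + 1)
collision = read_segment(index, flash, fid[:SID_LEN] + bytes(16), len(data))
```

Only the confirmed-hit and collision paths touch the flash cache in this sketch; both miss paths return from the in-memory index alone, which is the source of the asymmetric response time.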

FIG. 4 illustrates an example of a method for writing to a flash cache. The method 400 initially performs a lookup operation in the index in box 402. The lookup operation may be performed in response to a write request from a client. The write request may include an identifier of the data being written, a segment type, a segment size, or the like. The lookup operation is performed by identifying any short identifiers in the index that match a corresponding portion of the identifier included in the write request.

In box 404, the lookup operation may check the secondary metadata in the index associated with matching short identifiers. By checking the secondary metadata, the most correct entry in the index can be identified. If a combination is found (i.e., when the short identifier and at least one of the secondary metadata in the index match those in the request from the client) in box 406, the data segment is processed in box 410.

Processing the data segment when a combination is found can include disregarding the current data segment and leaving the existing data segment in the cache. Alternatively, the data segment can be inserted into the cache and the previous segment already in the cache can be marked for deletion. This option may be performed if the locality for the new entry in the cache is preferred. For example, the locality can be improved in terms of erasures performed in the flash cache or in terms of future sequential reads on the storage (e.g., a hard disk drive). If the combination is not found in box 406, the data may be inserted into the cache in box 408.
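A minimal sketch of the write path of FIG. 4 is shown below. The class name, the `replace_existing` flag, the returned status strings, and the list standing in for the flash cache are all hypothetical; the sketch only illustrates the two processing options described above.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional

SID_LEN = 4  # assumption: 4-byte short identifiers


@dataclass
class Entry:
    short_id: bytes
    size: int
    location: int
    deleted: bool = False  # True once the entry is marked for deletion


class FlashCacheIndex:
    def __init__(self) -> None:
        self.entries: Dict[bytes, List[Entry]] = {}
        self.flash: List[bytes] = []  # hypothetical append-only flash stand-in

    def _find(self, sid: bytes, size: int) -> Optional[Entry]:
        # Boxes 402-406: match the short identifier, then the secondary metadata.
        for entry in self.entries.get(sid, []):
            if not entry.deleted and entry.size == size:
                return entry
        return None

    def write(self, data: bytes, replace_existing: bool = False) -> str:
        sid = hashlib.sha1(data).digest()[:SID_LEN]
        existing = self._find(sid, len(data))
        if existing is not None and not replace_existing:
            return "disregarded"      # Box 410: keep the existing segment.
        if existing is not None:
            existing.deleted = True   # Box 410: mark the previous entry.
        # Box 408 (or the replace option of box 410): insert and index the data.
        self.flash.append(data)
        location = len(self.flash) - 1
        self.entries.setdefault(sid, []).append(Entry(sid, len(data), location))
        return "replaced" if existing is not None else "inserted"


cache = FlashCacheIndex()
first = cache.write(b"abc")
second = cache.write(b"abc")
third = cache.write(b"abc", replace_existing=True)
print(first, second, third)  # inserted disregarded replaced
```

Note that the decision is made from the index alone, so a colliding but different segment of the same size would also be disregarded under the default option; this is acceptable because inserting into a cache is not imperative.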

The embodiments disclosed herein may include the use of a special purpose or general-purpose computer including various computer hardware or software modules, as discussed in greater detail below. A computer may include a processor and computer storage media carrying instructions that, when executed by the processor and/or caused to be executed by the processor, perform any one or more of the methods disclosed herein.

As indicated above, embodiments within the scope of the present invention also include computer storage media, which are physical media for carrying or having computer-executable instructions or data structures stored thereon. Such computer storage media can be any available physical media that can be accessed by a general purpose or special purpose computer.

By way of example, and not limitation, such computer storage media can comprise hardware such as solid state disk (SSD), RAM, ROM, EEPROM, CD-ROM, flash memory, phase-change memory (“PCM”), or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage devices which can be used to store program code in the form of computer-executable instructions or data structures, which can be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention. Combinations of the above should also be included within the scope of computer storage media. Such media are also examples of non-transitory storage media, and non-transitory storage media also embraces cloud-based storage systems and structures, although the scope of the invention is not limited to these examples of non-transitory storage media.

Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts disclosed herein are disclosed as example forms of implementing the claims.

As used herein, the term ‘module’ or ‘component’ can refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system, for example, as separate threads. While the system and methods described herein can be implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated. In the present disclosure, a ‘computing entity’ may be any computing system as previously defined herein, or any module or combination of modules running on a computing system.

In at least some instances, a hardware processor is provided that is operable to carry out executable instructions for performing a method or process, such as the methods and processes disclosed herein. The hardware processor may or may not comprise an element of other hardware, such as the computing devices and systems disclosed herein.

In terms of computing environments, embodiments of the invention can be performed in client-server environments, whether network or local environments, or in any other suitable environment. Suitable operating environments for at least some embodiments of the invention include cloud computing environments where one or more of a client, server, or target virtual machine may reside and operate in a cloud environment.

The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A method for managing data segments stored in a cache, the method comprising: storing an index in a memory, wherein the index includes entries corresponding to data segments stored in the cache, wherein each entry in the index includes a short identifier and metadata associated with a data segment; detecting a collision when multiple short identifiers in the index satisfy a query to the index, wherein the query is associated with an operation on a specific data segment in the cache and wherein the query is associated with information about the specific data segment; resolving the collision based on the metadata associated with the multiple short identifiers that satisfy the query by identifying, from among entries in the index associated with the multiple short identifiers, a specific entry that includes a combination of a short identifier that satisfies the query and metadata that matches the information about the specific data segment; and performing the operation on the specific data segment in the cache identified in the specific entry.
2. The method of claim 1, wherein the operation is inserting a first data segment into the cache, the method further comprising determining a short identifier for the first data segment in preparation for inserting the first data segment into the cache, the first data segment associated with a metadata; and performing a lookup operation for the first data segment in the index using the short identifier.
3. The method of claim 2, further comprising inserting the first data segment into the cache when the short identifier or the combination is not present in the index.
4. The method of claim 3, further comprising processing the first data segment when the collision is detected, wherein resolving the collision includes disregarding the data segment and leaving an existing data segment in the cache.
5. The method of claim 4, wherein processing the first data segment includes improving locality in the cache by inserting the first data segment into the cache and marking the existing data segment in the cache for deletion.
6. The method of claim 1, further comprising, after detecting the collision, comparing a full identifier of the data segment with a full identifier associated with the combinations found in the index.
7. The method of claim 1, further comprising, when the collision is detected, identifying a most correct combination from among combinations in the index associated with the multiple identifiers, wherein the most correct combination is based on a best match between the metadata associated with each short identifier and the information about the specific data segment.
8. The method of claim 7, wherein identifying the most correct combination includes comparing, for each entry associated with the multiple short identifiers, the metadata of each entry with the information about the specific data segment.
9. The method of claim 8, wherein the metadata associated with each short identifier in the index includes at least one of a location of the data segment in the cache, a segment size, a segment type, a segment path, or combination thereof.
10. A non-transitory computer readable medium comprising computer executable instructions for performing the method of claim 1.
11. A method for managing data segments stored in a cache, the method comprising: storing an index in a memory, wherein the index includes entries corresponding to data segments stored in the cache, wherein each entry in the index includes a combination of a short identifier and metadata associated with a data segment; determining whether a combination for a data segment is present in the index, wherein a combination is present at least when a short identifier for a data segment associated with an operation in the cache matches a short identifier in the index; detecting a collision when multiple short identifiers in the index satisfy a query to the index related to the data segment, wherein the query is associated with the operation in the cache on a specific data segment and wherein the query is associated with information about the specific data segment; resolving the collision, when the multiple short identifiers satisfy the query, based on the metadata associated with the multiple identifiers in the index and the information associated with the specific data segment to identify a specific entry in the index that is associated with the specific data segment associated with the query; and performing the operation on the specific data segment.
12. The method of claim 11, wherein the operation includes inserting a first data segment into the cache or reading the first data segment from the cache, the method comprising performing a lookup operation for the first data segment in an index using the short identifier.
13. The method of claim 11, further comprising processing the data segment when the collision is detected, wherein resolving the collision includes disregarding the specific data segment associated with the query and leaving an existing data segment in the cache.
14. The method of claim 11, wherein resolving the collision further includes comparing a full identifier of the specific data segment with a full identifier associated with the specific entry identified from the entries associated with the multiple identifiers.
15. The method of claim 11, further comprising, when the collision is detected, identifying a most correct combination in the index.
16. The method of claim 15, further comprising identifying the most correct combination from among combinations in the index associated with the multiple identifiers, wherein the most correct combination is based on a best match between the metadata associated with each short identifier and the information about the specific data segment.
17. The method of claim 16, wherein the metadata associated with each short identifier in the index includes at least one of a location of the data segment in the cache, a segment size, a segment type, a segment path, or combination thereof.
18. The method of claim 11, further comprising resolving the collision by failing the operation when the collision cannot be resolved at the index.
19. The method of claim 11, further comprising resolving the collision by inserting the data segment as a new data segment in the cache and marking a previous entry in the index for deletion.
20. The method of claim 11, wherein the short identifier comprises a portion of a full identifier.

Related Publications (1)

	Number	Date	Country
	20190340128 A1	Nov 2019	US

Continuations (2)

	Number	Date	Country
Parent	16103499	Aug 2018	US
Child	16511256		US
Parent	15196163	Jun 2016	US
Child	16103499		US
