In computers, storage devices (such as memory devices) are used to store various data involved in the execution of software or to perform other tasks, such as communications tasks, management tasks, and so forth. Data structures, such as tables, stored in storage devices often have fixed sizes. Examples of fixed-size data structures include lookup tables used in cache memory subsystems, lookup tables used for database applications, and so forth.
With a fixed-size data structure, an algorithm conventionally has to be provided to explicitly select a data item in the data structure to remove (to eject the data item) so that space is freed up to enable addition of a new data item to the data structure. An example of such an algorithm is a least recently used (LRU) replacement algorithm. However, having to provide an algorithm to explicitly eject (remove) a data item from a data structure adds complexity to a system.
Some embodiments of the invention are described, by way of example, with respect to the following figures:
In accordance with some embodiments, a technique is provided to enable storage of data items into a fixed-size data structure without having to provide an explicit eject mechanism for selecting data items in the data structure to remove such that new data can be inserted into the data structure. A technique according to some embodiments gradually degrades data items stored in the data structure by probabilistically overwriting different portions of existing data items in the data structure as new data items are inserted into the data structure. As more data items are inserted into the data structure, the degradation of earlier data items (data items written into the data structure at an earlier time) is increased until, at some point, the earlier data items are considered lost and cannot be retrieved. A data item that has been degraded to a point that it is no longer retrievable is considered to have “exited” the data structure, even though no explicit eject mechanism has been provided to remove this data item.
A “data item” (or interchangeably, “data value”) refers generally to a unit of data that can be stored into a data structure. In accordance with some embodiments, the data item or data value is first encoded, and it is the encoded version of the data item or data value that is stored in the data structure. Thus, storing a data item or data value in a data structure can refer to storing an encoded version of the data item or data value in the data structure. A “data structure” refers to some predefined arrangement that provides entries for accepting data.
More specifically, it is desired to store p key-data value pairs (Ki,Vi), i=1 to n (where n≧1), in r arrays (named A1, . . . , Ar) each of size q×1, where p≧1, r≧2, and q≧2.
Each key Ki is formed of s bits, and each data value Vi is formed of t bits, where s≧1, t≧1. The r arrays, each of size q×1, are depicted as A1, A2, . . . Ar in
As depicted in
As further depicted in
In accordance with some embodiments, a “time arrow” feature is associated with the data structure 102. The time arrow feature of the data structure 102 causes key-data value pairs inserted later to be more likely to be retrievable than those inserted earlier. Key-data value pairs inserted into the data structure 102 are degraded with time (or more precisely, with the number of insertions that occur after a certain key-data value pair has been inserted). Degradation of a particular key-data value pair is caused by portion(s) representing the particular data value being overwritten by subsequent insertions of key-data value pairs. The time arrow feature offers a deterministic and probabilistic behavior on the retrievability of a particular key-data value pair as a function of the number of pairs that are inserted after the key-data value pair.
To enable the retrieval of a correct version of data values from the data structure 102 even though portions representing the data values may have been overwritten, error correction codes can also be provided when storing data values into the data structure 102. A data value that has been more recently inserted into the data structure 102 is more likely to be retrievable, since a more recently inserted data value would be less likely to have portions of the data value overwritten. Stated differently, data values inserted earlier are more likely to have collisions on them (where a collision can result in one or more portions being overwritten), resulting in higher likelihood of irretrievability. The more insertions that follow a particular data value, the greater the chance that the portions used to represent the data value are overwritten, and the higher the chance of it becoming irretrievable. On the other hand, items inserted recently are less likely to have any collisions, resulting in more certain retrievability. By using a multidimensional array representation of the data structure 102, the representations of the key-data value pairs are made more robust, and the lifetime before an inserted key-data value pair is degraded is quite high.
An error correction code can be used to recover the original data value (produce a correct version of the original data value), assuming that some portion(s) of the data value has been overwritten. Adding error correction codes when storing data values into the data structure 102 enhances retrievability (or stated another way, enhances the lifetime for which data values remain retrievable). Furthermore, the error correction codes provide a threshold below which overwritten portions do not render a key-data value pair irretrievable. However, data values that have been inserted earlier are less likely to be retrievable, since more portions of such data values are likely to have been overwritten. More data portions being overwritten reduces the likelihood that an error correction code can be used to recover the data value.
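The threshold behavior can be made concrete under a simplifying assumption that is not stated in the description: each hash function is uniform over the q rows, and the r hash functions are independent of one another. The symbol e (the number of overwritten segments the code can tolerate) and the shorthand σ_n are introduced here for illustration only. Under these assumptions, the probability that a pair remains within the correction threshold after n later insertions can be sketched as:

```latex
\sigma_n = \left(1 - \frac{1}{q}\right)^{n}
\quad\text{(one segment is not overwritten by any of } n \text{ later insertions)},
\qquad
\Pr[\text{at most } e \text{ of } r \text{ segments overwritten}]
  = \sum_{k=0}^{e} \binom{r}{k}\,(1-\sigma_n)^{k}\,\sigma_n^{\,r-k}.
```

This is a conservative estimate of retrievability, since an overwrite that happens to write an identical segment value does no damage.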
In some embodiments, the data structure 102 is a hash table. A hash table refers to a table having entries pointed to (or addressed) by hash values, which are produced by hashing a key (by applying a hash function on the key). In one example, the hash table is a hash table version of a Bloom filter.
In some examples, the data structure 102 can be used as a lookup table, such as a lookup table used in a cache memory system or an index table used in a database management system. Lookup tables or index tables are accessed to allow quicker access of a larger data structure, such as a cache or database table.
As further depicted in
The Store software module 106 stores data (e.g., key-data value pairs) into the data structure 102. The Retrieve software module 108 is able to retrieve data from the data structure 102 for output (112), where the output (112) can be presented in a display device, communicated over a network to a remote computer, or input to another software module (not shown).
A process performed by the Store software module 106 is depicted in
An input key-data value pair (Ki, Vi) is received (at 202) to be inserted into the data structure 102. To insert the input key-data value pair into the data structure 102, r independent hash functions hj, j=1, . . . , r, are defined, each of which converts key Ki to hj(Ki). In other words, the r hash functions are each applied (at 204) independently to the key to produce r hash values hj(Ki), j=1 to r. These hash values are q-valued, namely they take values from {1, . . . , q}. The hash values between 1 and q are used to point to the q rows of the data structure 102; in other words, a hash value of 1 points to row 1, a hash value of q points to row q, and so forth. In a different embodiment, instead of using hash functions applied on the key to produce hash values, different functions can be used instead, where such functions are applied to the key to produce pointer values that point to entries of the data structure 102.
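One way to obtain r q-valued hash functions is to salt a single cryptographic hash with the column index. The following is a minimal sketch under that assumption; the description does not prescribe a particular hash, and the name `make_hash_functions` and the choice of BLAKE2b are illustrative only.

```python
import hashlib


def make_hash_functions(r: int, q: int):
    """Return r functions, each mapping a bytes key to a row index in 1..q.

    Assumption: salting BLAKE2b with the column index j is treated as giving
    r independent hash functions h_1, ..., h_r.
    """
    def make(j: int):
        salt = j.to_bytes(8, "big")  # distinct salt per column (BLAKE2b allows up to 16 bytes)

        def h(key: bytes) -> int:
            digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
            # Reduce the 64-bit digest to the range 1..q used in the description.
            return int.from_bytes(digest, "big") % q + 1

        return h

    return [make(j) for j in range(1, r + 1)]


if __name__ == "__main__":
    hashes = make_hash_functions(r=4, q=10)
    print([h(b"example-key") for h in hashes])
```

The modulo reduction introduces a slight bias when q does not divide 2^64; for table sizes of practical interest this bias is negligible.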
Each data value Vi is encoded (at 206) to produce r segments. In one embodiment, the encoding uses a (r, t, d) error correction code C such that encoding of Vi produces encoded value Ci=[ci1, . . . , cir]. Note that the error control code is represented by calligraphed C, while the encoded version of the data value is represented by Ci. The (r, t, d) error correcting code refers to the type of code used to perform the encoding; for example, a (6, 4, d) code refers to an error correction code that converts a 4-byte input into a 6-byte output, where the two extra bytes are code values for performing error correction (note that d is based on the r and t values and represents how much error the error correction code can tolerate before error correction can no longer be performed). The encoded value Ci has r segments Ci=[ci1, ci2, . . . , cir], where each segment cij (j=1 to r) is stored in a corresponding column of the data structure 102. The r segments ci1 to cir together represent the data value Vi.
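The description does not name a specific code. As one concrete instance of an (r, t, d) code, the binary Hamming (7, 4, 3) code turns a 4-bit data value into r=7 one-bit segments and corrects any single altered segment. The sketch below is illustrative only; the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits into 7 one-bit segments (codeword positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(segments):
    """Correct at most one flipped segment and return the 4 data bits."""
    code = list(segments)
    syndrome = 0
    for position, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= position  # XOR of the positions holding a 1
    if syndrome:
        code[syndrome - 1] ^= 1  # a nonzero syndrome names the flipped position
    return [code[2], code[4], code[5], code[6]]


if __name__ == "__main__":
    segments = hamming74_encode([1, 0, 1, 1])
    segments[4] ^= 1  # simulate one segment overwritten by a later insertion
    print(hamming74_decode(segments))
```

With minimum distance 3, two or more altered segments are miscorrected silently rather than flagged, so a code with larger d (or an added checksum) would be needed to produce a reliable error indication.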
The r segments are stored (at 208) in corresponding columns of the data structure 102; in other words, segment ci1 is stored in column 1 (or array A1 in
For example, in
The process of
Next, the key Ki is hashed (at 304) r times using the same r hash functions used by the Store software module 106. Segments corresponding to the r hash values are retrieved (at 306) from the corresponding entries of the data structure 102. For example, segment ci1 is retrieved from row h1(Ki) in column 1 (array A1), segment ci2 is retrieved from row h2(Ki) in column 2 (array A2), and so forth. From the retrieved segments, the possibly corrupted version (code) C•i of the encoded value Ci=[ci1, . . . , cir] is constructed (at 308). This possibly corrupted version (code) C•i is represented as C•i=[c•i1, . . . , c•ir]. The version C•i=[c•i1, . . . , c•ir] is decoded (at 310) using standard decoding algorithms for the error correction code C. If the code C is chosen to be cyclic, then additional structure can be used to allow the decoding to be performed more efficiently.
If the original data value Vi is recoverable based on the decoding, then the result that is output (at 312) is data value Vi, which was decoded successfully from C•i. However, if corruption of Ci prevents recovery of Vi using the error correction code, then the result output (at 312) is an error indication to indicate that the data value Vi is irretrievable (indicating that Vi is no longer in the data structure 102).
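The store and retrieve processes can be sketched end to end as follows. To keep the example short, a repetition code with majority-vote decoding is swapped in for the (r, t, d) code (every segment is a full copy of the value), and salted BLAKE2b stands in for the r hash functions. The class name `DegradingTable` and its methods are hypothetical, and rows are indexed from 0 rather than 1.

```python
import hashlib
from collections import Counter


class DegradingTable:
    """r column arrays of q rows each, with no explicit eject mechanism."""

    def __init__(self, q: int, r: int):
        self.q, self.r = q, r
        self.columns = [[None] * q for _ in range(r)]

    def _row(self, j: int, key: bytes) -> int:
        # h_j(K): salted hash reduced to a 0-based row index.
        salt = j.to_bytes(8, "big")
        digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "big") % self.q

    def store(self, key: bytes, value) -> None:
        segments = [value] * self.r  # repetition code: every segment is a copy
        for j, segment in enumerate(segments):
            # Overwrite whatever is there; this is what degrades older pairs.
            self.columns[j][self._row(j, key)] = segment

    def retrieve(self, key: bytes):
        segments = [self.columns[j][self._row(j, key)] for j in range(self.r)]
        value, votes = Counter(segments).most_common(1)[0]
        if value is not None and votes > self.r // 2:
            return value  # a strict majority of segments agree
        return None  # error indication: the value is irretrievable


if __name__ == "__main__":
    table = DegradingTable(q=64, r=5)
    table.store(b"first", "v0")
    for i in range(200):
        table.store(f"key-{i}".encode(), f"value-{i}")
    print(table.retrieve(b"key-199"), table.retrieve(b"first"))
```

Because a repetition code does not bind a value to its key, a strict majority of rows overwritten by the same other value would be returned as a wrong result rather than as an error; a stronger code would make this much less likely.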
In the foregoing, it was assumed that all key-data value pairs stored in the data structure 102 have equal importance, and therefore the algorithm discussed above causes each key-data value pair to have equal expected life in the data structure 102. In some embodiments, it may be desirable to assign higher importance to some key-data value pairs, with such key-data value pairs of higher importance assigned a greater expected life.
The expected life of a key-data value pair (K, V) is the value of n (n insertions after (K,V) has been inserted) for which the probability of correct retrieval of (K, V) falls to below 0.5. It may be desirable that the expected life be longer for more important key-value pairs.
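The expected life defined above can be estimated numerically under assumptions that the description does not state: uniform, independent hash functions, and a code that tolerates up to e overwritten segments out of r. The parameter e and both function names are illustrative.

```python
from math import comb


def retrieval_probability(n: int, q: int, r: int, e: int) -> float:
    """Probability that at most e of r segments are overwritten after n insertions."""
    survive = (1.0 - 1.0 / q) ** n  # one segment escapes all n later insertions
    lost = 1.0 - survive
    return sum(comb(r, k) * lost**k * survive ** (r - k) for k in range(e + 1))


def expected_life(q: int, r: int, e: int) -> int:
    """Smallest n for which the retrieval probability falls below 0.5."""
    if not 0 <= e < r:
        raise ValueError("e must satisfy 0 <= e < r, or the probability never drops")
    n = 0
    while retrieval_probability(n, q, r, e) >= 0.5:
        n += 1
    return n


if __name__ == "__main__":
    for tolerated in range(3):
        print(tolerated, expected_life(q=1000, r=6, e=tolerated))
```

Raising e (a stronger code) or q (a larger table) lengthens the expected life in this model, which matches the qualitative behavior described for the error correction threshold.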
To counteract the degradation of a data value stored in the data structure 102 due to subsequent data value insertions, the concept of diversity can be employed in some embodiments. Applying diversity causes multiple copies (rather than just one copy) of a data value to be stored in the data structure 102. Diversity is denoted by l, where l>1 indicates the number of copies of the higher importance key-data value pairs to be stored in the data structure 102.
Multiple families Hi, i=1 . . . l of independent hash functions (one family per data value copy) are provided for each higher importance key-data value pair. With this embodiment, the corresponding key K is hashed with multiple families of hash functions (instead of with just one set of hash functions as described above). In other words, the key K is hashed r times with a first family of hash functions; the key K is hashed r times with a second family of hash functions, and so forth. The number of families used depends on how valuable the key-data value pair is. Thus, l can be set differently for different key-data value pairs.
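A minimal sketch of hashing one key with l families of r hash functions follows. It assumes that a family index and a column index, fed to BLAKE2b as separate parameters, yield sufficiently independent functions; the description does not prescribe how the families are constructed, and the function names are hypothetical.

```python
import hashlib


def family_rows(key: bytes, family: int, r: int, q: int):
    """Rows h_1(K), ..., h_r(K) under one hash family, each in 1..q."""
    rows = []
    for j in range(1, r + 1):
        digest = hashlib.blake2b(
            key,
            digest_size=8,
            salt=j.to_bytes(8, "big"),         # column index selects the function
            person=family.to_bytes(8, "big"),  # family index selects the family
        ).digest()
        rows.append(int.from_bytes(digest, "big") % q + 1)
    return rows


def diversity_rows(key: bytes, l: int, r: int, q: int):
    """One r-tuple of rows per family; each tuple places one copy of the value."""
    return [family_rows(key, m, r, q) for m in range(1, l + 1)]


if __name__ == "__main__":
    for rows in diversity_rows(b"important-key", l=3, r=4, q=1000):
        print(rows)
```

A more important pair is simply given a larger l, so that more copies must be degraded before the value is lost.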
For each such family, a corresponding group of r segments representing the respective data value V is stored into the data structure 102 in the manner described above. The storing algorithm that takes into account multiple families of hash functions is a straightforward extension of the storing algorithm discussed in connection with
The following describes the algorithm used for retrieving a key-data value pair that is associated with multiple families of hash functions. It is desired to retrieve a pair (Ki, Vi) after several insertions have occurred after (Ki, Vi) was inserted into the data structure 102. Also, diversity l was used for storing (Ki, Vi). As depicted in
For each family of hash functions, a respective set of r segments is retrieved (at 506) from the data structure 102, and this set is used to construct a corresponding version C′i,m (m=1 to l) of the encoded value Ci. In other words, l (l≧2) versions (possibly corrupt) of the encoded value Ci are created.
The multiple versions C′i,m (m=1 to l) of the encoded value of Vi are decoded (at 508). If it is determined (at 510) that even one of these versions produces a valid output, then the process is complete and the result is output (at 512), where the result is Vi. But if none of the versions C′i,m (m=1 to l) can be successfully decoded to provide Vi, then a list decoding algorithm is performed (at 514). A list decoding algorithm outputs a list of possible data values for a given input, in this case C′i,m. For l inputs (C′i,m (m=1 to l)), l lists are created, with each list containing possible data values. If the corrupted versions C′i,m (m=1 to l) are list decodable, then the output Vi would be at the intersection of the l lists. In other words, a common element of the l lists is identified, where the common element is output Vi.
In an alternative embodiment, instead of applying l families of hash functions to the entire input data value Vi, a different form of diversity can be performed, as discussed below. In this alternative diversity embodiment, an input data value Vi is divided into u parts [vi1, vi2, . . . , viu], u≧2, where each part has length t/u. Each of the smaller parts vi1, vi2, . . . can be encoded by an [r, t/u, d′] error correction code, where d′>d because the dimension of the code has decreased.
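For the retrieval with l versions of the encoded value (decode each version first, then fall back to intersecting list-decoder outputs), the control flow can be sketched as follows. The decoders themselves depend on the chosen code and are passed in as callables; `decode` and `list_decode` are hypothetical placeholders, and returning an error when the intersection is empty or ambiguous is an assumption of this sketch.

```python
def retrieve_with_diversity(versions, decode, list_decode):
    """Recover a value from l possibly corrupted versions of its encoding.

    decode(version) returns the value, or None when decoding fails.
    list_decode(version) returns an iterable of candidate values.
    """
    # First pass: a single cleanly decodable version is enough.
    for version in versions:
        value = decode(version)
        if value is not None:
            return value

    # Fallback: the true value should lie in every candidate list.
    common = None
    for version in versions:
        candidates = set(list_decode(version))
        common = candidates if common is None else common & candidates

    if common is not None and len(common) == 1:
        return next(iter(common))
    return None  # error indication: no unique common candidate


if __name__ == "__main__":
    # Toy stand-ins: each "version" is already its own candidate list.
    toy_versions = [[1, 2, 3], [2, 3, 4], [3, 5]]
    print(retrieve_with_diversity(toy_versions, lambda v: None, lambda v: v))
```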
Effectively, each of the u parts vi1, vi2, . . . , viu of input data value Vi is encoded to produce r segments for each part viz, z=1 to u. The r segments for each part viz, z=1 to u, are then stored into the data structure 102 using the store algorithm described in connection with
In storing the r segments for each part viz, a corresponding family of hash functions is used to produce r hash values from the key Ki. In other words, u families of hash functions are defined, with each family corresponding to part viz. The same key Ki is hashed using the u hash families. Each such hash yields a q-ary r-tuple (a tuple of r hash values that vary between 1 and q). The q-ary r-tuple is used to store the r segments corresponding to the respective input part viz,
The above process is depicted in
Instructions of software described above (including the Store module 106 and Retrieve module 108 of
Data and instructions (of the software) are stored in respective storage devices, which are implemented as one or more computer-readable or computer-usable storage media. The storage media include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; and optical media such as compact disks (CDs) or digital video disks (DVDs). Note that the instructions of the software discussed above can be provided on one computer-readable or computer-usable storage medium, or alternatively, can be provided on multiple computer-readable or computer-usable storage media distributed in a large system having possibly plural nodes. Such computer-readable or computer-usable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture can refer to any manufactured single component or multiple components.
In the foregoing description, numerous details are set forth to provide an understanding of the present invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these details. While the invention has been disclosed with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover such modifications and variations as fall within the true spirit and scope of the invention.
This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Application Ser. No. 61/033,811, entitled “MANAGING STORAGE OF DATA IN A DATA STRUCTURE,” filed Mar. 5, 2008.
Number | Date | Country | |
---|---|---|---|
20100082562 A1 | Apr 2010 | US |
Number | Date | Country | |
---|---|---|---|
61033811 | Mar 2008 | US |