Embodiments of the present invention generally relate to disk write and query operations. More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods directed to processes for striking a balance between write operation efficiency and query lookup times by reducing index write-amplification in disk operations.
Hash tables are commonly employed to keep track of data and associated operations performed in connection with that data. In general, a conventional hash table may be implemented as an on-disk and/or in-memory dictionary structure that includes key-value pairs that map hash values, or keys, to respective data strings, or values. A key-value pair may be referred to herein simply as a ‘hash.’ Thus, as data is written or deleted, the hash table contents are updated accordingly. The hash table also supports query operations concerning the data.
In an effort to improve disk and/or memory IO performance, hash table operations, such as insertions for example, may be batched together in a memory buffer and then merged into the hash table when the buffer is full. In this way, multiple insertions are performed in a single IO, rather than being performed as individual IO processes. The latter approach may be considerably less efficient than the single IO approach in terms of time and processing resources used.
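By way of illustration, and not limitation, the batching of insertions described above may be sketched as follows. All class and attribute names here are illustrative only and are not drawn from this disclosure; the `on_disk` dictionary merely stands in for an on-disk table.

```python
class BatchedHashTable:
    def __init__(self, buffer_capacity):
        self.buffer = {}                # in-memory staging area for insertions
        self.capacity = buffer_capacity
        self.on_disk = {}               # stands in for the on-disk hash table
        self.io_count = 0               # counts merge IOs, not individual keys

    def insert(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        # A single merge IO applies every buffered insertion at once,
        # rather than one IO per insertion.
        self.on_disk.update(self.buffer)
        self.buffer.clear()
        self.io_count += 1

table = BatchedHashTable(buffer_capacity=4)
for i in range(8):
    table.insert(f"k{i}", i)
# 8 insertions are applied in only 2 merge IOs
```

As the counter shows, eight logical insertions incur only two merge operations, which is the efficiency gain the batching approach is intended to capture.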
Although this batch approach to IOs has proved beneficial in certain circumstances, merging batched insertions, for example, into sorted key-value pairs already on disk has given rise to various problems, one of which is write amplification. In general, write amplification refers to the notion that, in conventional approaches, considerably more data must be written to the hash table than simply the insertions contained in the batch. More specifically, for each insertion written to the hash table, it is necessary to also copy key-value pairs already in the hash table. That is, the actual amount of information physically written is some multiple of the logical data intended to be written. As explained in the '584 application, this multiple may be as much as 100 in some cases.
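The write amplification multiple referred to above can be expressed as the ratio of data physically written to data logically inserted. The following hypothetical arithmetic, with illustrative block counts, shows how a conventional merge that rewrites an entire level to absorb a small batch produces a large multiple.

```python
def write_amplification(physical_blocks_written, logical_blocks_inserted):
    # Ratio of what was actually written to what was intended to be written.
    return physical_blocks_written / logical_blocks_inserted

# Merging 1 new block into a level holding 100 existing blocks
# requires rewriting all 101 blocks in a conventional sorted merge.
wa = write_amplification(physical_blocks_written=101,
                         logical_blocks_inserted=1)
# wa == 101.0
```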
Various approaches have been devised to attempt to address the write amplification problem, one example of which is a log structured merge (LSM) table configuration, in which an additional on-disk buffer is provided between the in-memory buffer and the final on-disk hash table. While the LSM approach may reduce write amplification in some instances, there remains a need for improved insertion performance and hash table configurations.
In order to describe the manner in which at least some of the advantages and features of the invention can be obtained, a more particular description of embodiments of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, embodiments of the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings.
Embodiments of the present invention generally relate to disk write and query operations, among others. More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods directed to processes that, when implemented, may help to strike a balance between write operation efficiency and query lookup times by, for example, reducing index write-amplification in disk operations. Such embodiments may be especially well suited for use in connection with environments in which the key-value pairs are unsorted. As well, example embodiments may provide for particularly efficient hash table query operations.
In more detail, example embodiments of the invention implement a bundle of arrays (BOA) data structure configuration that includes a size-tiered LSM that operates in conjunction with a routing filter. The data structure comprises a BOA hash table, which may be referred to herein simply as a BOA, that appends new data to data currently in a level, so as to significantly reduce, or eliminate, write amplification. To illustrate, insertions may be batched together in blocks of size B and stored at an uppermost level, or Level 1, of the BOA. Each group of blocks defines an array.
When there is a defined number λ of full arrays at Level 1, that is, when Level 1 is full, those arrays can then be merged down into the next lower level, or Level 2, to form a new array at Level 2. Level 2, like Level 1, may include one or more arrays. In this example then, the arrays being merged down from Level 1 constitute new data, relative to the data already residing at Level 2.
As the foregoing example illustrates, each level of the data structure includes, or will at some point include, a bundle of arrays, and when a level fills, the arrays of that level are merged together into a new array at the next lower level. In this way, each level comprises a bundle of arrays, where each array contains the sorted data of a prior merge. Note that these arrays may also be referred to as sorted runs and, as such, each level can be considered as comprising a collection of sorted runs. This approach can be implemented for any number 'n' of levels in a data structure. Moreover, because this approach employs a multiway merge, rather than a conventional series of successive two-way merges, embodiments within the scope of the invention may be relatively more efficient both in terms of IOs and internal memory operations. The improved IO efficiency provided by example embodiments of the invention may provide for a significant reduction in write amplification.
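The levelling behavior described above may be sketched as follows. This is a simplified, in-memory sketch under stated assumptions: λ is fixed at 2, each level holds a list of sorted runs, and the `BOA` class and `add_run` method are illustrative names rather than elements of this disclosure.

```python
import heapq

LAMBDA = 2  # illustrative: runs permitted per level before a merge-down

class BOA:
    def __init__(self):
        self.levels = [[]]  # levels[i] is a bundle (list) of sorted runs

    def add_run(self, run, level=0):
        if level == len(self.levels):
            self.levels.append([])  # grow a new, lower level on demand
        self.levels[level].append(sorted(run))
        if len(self.levels[level]) >= LAMBDA:
            # Level is full: multiway-merge all of its runs into one
            # new run appended at the next lower level.
            merged = list(heapq.merge(*self.levels[level]))
            self.levels[level] = []
            self.add_run(merged, level + 1)

boa = BOA()
for run in ([3, 1], [4, 2], [8, 6], [7, 5]):
    boa.add_run(run)
# Level 1 fills twice and merges down; the two resulting Level 2 runs
# then merge down again, leaving a single sorted run at Level 3.
```

Note that a merge-down only appends a new run to the level below; the runs already resident at that level are never rewritten, which is the source of the write-amplification reduction described above.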
As noted above, embodiments of the invention may also employ a routing filter for enabling fast and efficient queries of data structures such as a BOA hash table. In general, the routing filter eliminates the need to search for data in multiple different sorted arrays. Instead, the routing filter exploits the probabilistic nature of hashes to suggest possible arrays in response to a query. Because there is a low probability that the routing filter will return a false positive, but no probability of false negatives, the routing filter may contribute significantly to the speed and efficiency of queries directed to the BOA hash table by intercepting the query and suggesting possible array locations for the key corresponding to queried data.
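The routing behavior described above, namely false positives possible but false negatives impossible, can be illustrated with a simplified per-array filter. This sketch is a stand-in using a single hash bit per key, not the filter construction of this disclosure; all names are illustrative.

```python
FILTER_BITS = 64  # illustrative filter width

def fingerprint(key):
    # One bit position derived from the key's hash.
    return 1 << (hash(key) % FILTER_BITS)

class RoutingFilter:
    def __init__(self):
        self.masks = []  # one bitmask per array

    def track(self, array_id, keys):
        while len(self.masks) <= array_id:
            self.masks.append(0)
        for k in keys:
            self.masks[array_id] |= fingerprint(k)

    def candidate_arrays(self, key):
        # Every array that truly holds the key is always suggested
        # (no false negatives); unrelated arrays are suggested only
        # on a hash collision (a false positive).
        f = fingerprint(key)
        return [i for i, m in enumerate(self.masks) if m & f]

rf = RoutingFilter()
rf.track(0, ["apple", "pear"])
rf.track(1, ["plum"])
# Array 1 is guaranteed to appear among the candidates for "plum".
```

Because only the suggested arrays need be searched, a query touches far fewer runs than an exhaustive scan of every array at every level.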
Advantageously then, embodiments of the invention may provide various benefits and improvements relative to conventional hardware, systems and methods. To illustrate, embodiments of the invention may improve the operation of a computing system, or element of a computing system, by significantly reducing the insertion performance penalty imposed by write amplification. As well, embodiments of the invention may provide for more focused, and relatively faster, hash table query performance, such as through the use of a routing filter. As a final example, embodiments of the invention may improve on known data structures and processes that require sorted data for effective operation. In particular, embodiments of the invention are well suited for use in connection with unsorted data and, as such, may reduce or eliminate the need to sort the hash table data. That is, like an internal memory hash table, a BOA hash table does not maintain the key-value pairs in sorted order, but rather in sorted hash order. However, using a BOA hash table is fast enough that it is possible to eliminate some situations where data would otherwise need to be sorted in order to do point-wise operations quickly.
As explained herein, advantages such as those noted can be achieved with various embodiments of the invention which include, but are not limited to, data structures that implement particular arrangements, configurations, and handling of data on computer readable media, such as disks for example. Such data structures may be defined, and reside, on disks and/or on other computer readable media. Thus, as contemplated by this disclosure, some embodiments of a data structure are implemented in a physical form.
Finally, aspects of various example embodiments are disclosed in “Optimal Hashing In External Memory,” by Alex Conway, Martin Farach-Colton, and Philip Shilane, which is attached hereto as Appendix A, and incorporated herein in its entirety, and made part of this disclosure, by this reference.
A. Aspects of Example Operating Environments
The following is a discussion of aspects of example operating environments for various embodiments of the invention. This discussion is not intended to limit the scope of the invention, or the applicability of the embodiments, in any way.
In general, example embodiments may be employed in connection with any computing environment in which data storage and protection processes are performed. Such processes may include, but are not limited to, data write processes, data read processes, queries, insertions, deletes, data backup processes, data restore processes, and data deduplication processes. Any of these processes may be performed with respect to storage, disks, or memory, including with respect to hash tables residing on-disk, and in-memory.
Some particular example embodiments may be employed in connection with appliances that may communicate with a cloud datacenter, although that is not required. Yet other embodiments may be implemented in whole or in part in a cloud datacenter or other cloud computing environment. Such a cloud datacenter can implement backup, archive, restore, and/or disaster recovery, functions. Deduplication, when performed, may take place at a client, a backup server, an appliance that communicates with a client and cloud datacenter or other entity, and/or in a cloud computing environment.
Any of the devices, including clients, servers, appliances, and hosts, in the operating environment can take the form of software, physical machines, or virtual machines (VM), or any combination of these, though no particular device implementation or configuration is required for any embodiment. Similarly, data protection system components such as databases, storage servers, storage volumes, storage disks, replication services, backup servers, restore servers, backup clients, and restore clients, for example, can likewise take the form of software, physical machines or virtual machines (VM), though no particular component implementation is required for any embodiment. Where VMs are employed, a hypervisor or other virtual machine monitor (VMM) can be employed to create and control the VMs.
As used herein, the term ‘data’ is intended to be broad in scope. Thus, that term embraces, by way of example and not limitation, data segments such as may be produced by data stream segmentation processes, data chunks, data blocks, atomic data, emails, objects of any type, files, contacts, directories, sub-directories, volumes, and any group of one or more of the foregoing.
Example embodiments of the invention are applicable to any system capable of storing and handling various types of objects, in analog, digital, or other form. Although terms such as document, file, block, or object may be used by way of example, the principles of the disclosure are not limited to any particular form of representing and storing data or other information. Rather, such principles are equally applicable to any object capable of representing information.
B. Aspects of Example BOA Hash Tables
With particular attention now to
Turning now to
As shown in
In the illustrative example of
With reference now to
In the disclosed arrangement, an in-memory buffer 306 is provided that stores a single array 307. As such, in
Turning now to
C. Write Amplification and BOA Hash Tables
With the foregoing discussion of aspects of example BOA hash tables in view, attention is directed now to some further functional and operational aspects of such BOA hash tables. As noted elsewhere herein, the disclosed BOA hash tables may be advantageous at least insofar as they reduce the write amplification penalty by implementing a more streamlined approach to the processing of insertions. In particular, a key-value pair is grouped with other key-value pairs until an array is filled, and then the array is merged with, that is, appended to, a bundle of arrays at the next lower layer. In this way, the rest of the data at the next lower layer need not be accessed. A G-way merge implemented by embodiments of the invention contrasts, for example, with a two-way merge of two sorted arrays where one of the sorted arrays is G times larger than the other. Multiway merges such as the G-way merge are typically more efficient than a series of successive two-way merges, both in terms of IOs and in terms of internal memory operations. This additional IO efficiency provides the principal reduction of the write amplification penalty.
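The contrast between a single G-way merge and a series of successive two-way merges can be made concrete by counting element moves, a rough proxy for the IO and internal-memory work discussed above. The function names and move-counting convention below are illustrative only.

```python
import heapq

def gway_merge(runs):
    # One multiway merge: each element is written exactly once.
    moves = 0
    out = []
    for x in heapq.merge(*runs):
        out.append(x)
        moves += 1
    return out, moves

def successive_two_way(runs):
    # Fold the runs in one at a time: elements merged early are
    # rewritten on every subsequent pass.
    moves = 0
    acc = runs[0]
    for run in runs[1:]:
        merged = []
        i = j = 0
        while i < len(acc) or j < len(run):
            if j >= len(run) or (i < len(acc) and acc[i] <= run[j]):
                merged.append(acc[i]); i += 1
            else:
                merged.append(run[j]); j += 1
            moves += 1
        acc = merged
    return acc, moves

runs = [[1, 5], [2, 6], [3, 7], [4, 8]]
# The 4-way merge moves each of the 8 elements once (8 moves), while
# the successive two-way merges move 18 elements in total.
```

Even at this toy scale the two-way approach performs more than twice the work, and the gap widens as the number of runs G grows.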
More particularly, an approach that does not employ a BOA hash table such as the disclosed BOA hash table examples would require a much less efficient process in which the entire lower layer would be read in, the insertion made into the data of the lower layer, and then the modified lower layer, including the insertion, written to disk. Thus, embodiments of the invention may, among other things, significantly reduce the extent to which write amplification is experienced in connection with insertion processes, and the other disclosed processes involving hash values.
D. Aspects of Example Routing Filters
While the disclosed BOA hash tables and associated processes, including G-way merges, are effective in reducing a write amplification penalty, it may be possible in some circumstances to enhance the performance of such BOA hash tables where memory and query costs are concerned. In at least some embodiments, such performance enhancements may be achieved with the use of one or more routing filters in conjunction with a multilayered BOA hash table. Following is a discussion of aspects of routing filters, one example of which is generally denoted at 400 in
In general, embodiments of a routing filter serve to eliminate the need to search all of the λ runs on a given level (see, e.g.,
Turning now to
E. Example Host and Server Configurations
With reference briefly now to
In the example of
F. Aspects of Example Methods for IOs
With attention now to
The process 600 can begin when a client requests an insertion 602. The insertion 602 may be implicated, for example, by a write process initiated by the client. While not specifically indicated in
Regardless of whether the insertion 604 occurs in connection with an in-memory buffer, or a layer of a BOA hash table, the process 600 may continue to iterate as long as a determination is made 606 that the array is not full. Once the array is determined 606 to be full, then the array is merged into the next lower level 608, which may be any level of a BOA hash table. This may result in the emptying, at least temporarily, of the in-memory buffer, or the layer from which the array was merged, as applicable. More specifically, the full array may be appended to a group of one or more arrays in the next lower level. After a merge has been performed, whether commencing from an in-memory buffer or a layer of the BOA hash table, the BOA hash table is then updated 610 to reflect the merge. It will be appreciated that processes 604-610 can be repeated as many times as needed, without limit. Correspondingly, an associated BOA hash table may have any number 'n' of successive layers, where each layer is larger than the layers above it.
Turning now to
While not necessarily part of a query process, the hashes stored in the BOA hash table may be tracked 702 by the routing filter. As such, the routing filter is responsive to queries, such as a query 704 initiated by a client. In response to the query 704, the routing filter may identify 706 possible locations, in one or more arrays of the BOA hash table, for the hash implicated by the query. The location information may then be returned 708 to the client or other entity that initiated the query. After receipt of the location information 710, the client can then access the hashes 712 and thus the query is satisfied.
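The query steps above, that is, tracking 702, identifying candidate locations 706, and retrieving the value 708-712, can be sketched end to end. In this simplified sketch, the store is a flat list of sorted arrays standing in for BOA levels, the per-array filter is a set of hash residues, and all names are illustrative rather than drawn from this disclosure.

```python
import bisect

FILTER_BITS = 64  # illustrative filter width

class QueryableStore:
    def __init__(self):
        self.arrays = []   # each array: sorted list of (key, value) pairs
        self.filters = []  # one simple filter (set of hash residues) per array

    def add_array(self, pairs):
        pairs = sorted(pairs)
        self.arrays.append(pairs)
        # Step 702: the routing filter tracks the hashes stored in each array.
        self.filters.append({hash(k) % FILTER_BITS for k, _ in pairs})

    def query(self, key):
        # Step 706: identify only those arrays whose filter admits the key.
        h = hash(key) % FILTER_BITS
        candidates = [i for i, f in enumerate(self.filters) if h in f]
        # Steps 708-712: search the suggested arrays, newest first.
        for i in reversed(candidates):
            keys = [k for k, _ in self.arrays[i]]
            j = bisect.bisect_left(keys, key)
            if j < len(keys) and keys[j] == key:
                return self.arrays[i][j][1]
        return None  # a false positive is possible, a false negative is not

store = QueryableStore()
store.add_array([("a", 1), ("c", 3)])
store.add_array([("b", 2)])
```

A query thus performs a binary search only within the arrays the filter suggests, rather than searching every array at every level.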
G. Example Computing Devices and Associated Media
The embodiments disclosed herein may include the use of a special purpose or general-purpose computer including various computer hardware or software modules, as discussed in greater detail below. A computer may include a processor and computer storage media carrying instructions that, when executed by the processor and/or caused to be executed by the processor, perform any one or more of the methods disclosed herein.
As indicated above, embodiments within the scope of the present invention also include computer storage media, which are physical media for carrying or having computer-executable instructions or data structures stored thereon. Such computer storage media can be any available physical media that can be accessed by a general purpose or special purpose computer.
By way of example, and not limitation, such computer storage media can comprise hardware storage such as solid state disk/device (SSD), RAM, ROM, EEPROM, CD-ROM, flash memory, phase-change memory (“PCM”), or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage devices which can be used to store program code in the form of computer-executable instructions or data structures, which can be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention. Combinations of the above should also be included within the scope of computer storage media. Such media are also examples of non-transitory storage media, and non-transitory storage media also embraces cloud-based storage systems and structures, although the scope of the invention is not limited to these examples of non-transitory storage media.
Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts disclosed herein are disclosed as example forms of implementing the claims.
As used herein, the term ‘module’ or ‘component’ can refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system, for example, as separate threads. While the system and methods described herein can be implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated. In the present disclosure, a ‘computing entity’ may be any computing system as previously defined herein, or any module or combination of modules running on a computing system.
In at least some instances, a hardware processor is provided that is operable to carry out executable instructions for performing a method or process, such as the methods and processes disclosed herein. The hardware processor may or may not comprise an element of other hardware, such as the computing devices and systems disclosed herein.
In terms of computing environments, embodiments of the invention can be performed in client-server environments, whether network or local environments, or in any other suitable environment. Suitable operating environments for at least some embodiments of the invention include cloud computing environments where one or more of a client, server, or other machine may reside and operate in a cloud environment.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
This application hereby claims priority to U.S. Provisional Patent Application Ser. No. 62/571,584, entitled METHOD TO REDUCE INDEX WRITE-AMPLIFICATION, and filed Oct. 12, 2017 (the "'584 application"). The aforementioned application is incorporated herein in its entirety by this reference.