Aggregation is one of the most basic and important query operators in relational databases. It is widely used in data warehousing and decision-making applications, where the data to be processed is usually quite large. Aggregation methods have been extensively studied and deployed for row oriented databases, or row stores, such as IBM® DB2® and Oracle®. IBM and DB2 are registered trademarks or trademarks of International Business Machines Corporation in the United States and/or other countries. Oracle is a registered trademark of Oracle Corporation.
Recently, column oriented databases, or column stores, have been emerging as a viable alternative to the conventional row oriented database structure. In a column store, database content is stored by column instead of by row. Each database column is stored separately, with attribute values of the same column stored contiguously. Accordingly, it is possible to efficiently access a relational database column by column instead of through the more conventional row by row access methods. However, current technology performs queries, such as aggregation, on column oriented databases utilizing query methods originally developed for row oriented databases. The row oriented database query methods are not designed to take advantage of the specific characteristics of column oriented databases.
The subject matter described herein generally relates to database aggregation. In particular, certain subject matter presented herein provides query methods for column oriented databases. For example, systems and associated methods are described that provide techniques utilizing the bitmap index and data array of column oriented databases to process aggregation queries.
In summary, one aspect provides a method comprising: accessing at least one database storing data in at least one data array; configuring a bitmap index to map the data; receiving at least one database query; and processing the at least one database query by determining positions of query attributes in the at least one bitmap index and locating values corresponding to the positions in the at least one data array.
The foregoing is a summary and thus may contain simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. For a better understanding of the embodiments, together with other and further features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying drawings. The scope of the invention will be pointed out in the appended claims.
It will be readily understood that the components of the embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations in addition to the described example embodiments. Thus, the following more detailed description of the example embodiments, as represented in the figures, is not intended to limit the scope of the claims, but is merely representative of certain example embodiments.
Reference throughout this specification to an “embodiment” or “embodiment(s)” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of “embodiment” or “embodiment(s)” in various places throughout this specification are not necessarily all referring to the same embodiment.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments. One skilled in the relevant art will recognize, however, that aspects can be practiced without one or more of the specific details, or with other methods, components, materials, et cetera. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid prolixity.
Analysis and knowledge discovery of operational data has been used for many purposes, including providing business insight into customer behaviors, sales trends, and the benefits of sales promotions. Aggregation is a primary operation supported in relational databases for ascertaining such information.
With the information explosion of recent years, the size of data being managed by databases involved in real applications is usually at the terabyte scale, but is increasingly at the petabyte scale. For example, there could be billions of sales records in a sales database, and possibly many more if legacy sales data are also included. The processing of aggregation queries over such large databases and data warehouses is computationally intensive and time-consuming. Although computer hardware has achieved significant advancement in the past decade, increased data storage has outstripped advancements in CPU speed and available memory. Accordingly, more efficient and scalable query processing methods are needed to support the explosive growth in data application needs.
Existing aggregation query processing techniques are mostly sort-based or hash-based, and are designed only for row oriented databases. The newly emerging column oriented databases have adopted these row oriented database aggregation processing techniques. However, all of these query processes are designed and optimized based on the architecture of row oriented databases, where tuples are stored consecutively on disk. With the emergence of the column architecture, opportunities exist to process aggregation queries in ways that are specific to, and more efficient for, column oriented databases.
In a column oriented database, for each relation, the data of each column of the relation is stored contiguously on disk. As such, for a specific value in a column, the positions of occurrences of this value in the column may be easily computed without much I/O. In addition, since the column data is stored contiguously, operations that need random access on the data of a column may be efficiently conducted. Embodiments leverage these two characteristics of column stores to develop more efficient aggregation processing techniques for these types of databases.
Notable database management systems that store data via columns include MonetDB, C-Store, and Vertica®. Vertica is a registered trademark of Vertica Systems, Inc. These column oriented databases support relational databases and data warehouses, including the execution of SQL queries. C-Store additionally supports hybrid structures of both row oriented and column oriented storages, as well as overlapping columns to speed up query processing. Much research has been conducted regarding column oriented databases, especially comparing them to conventional row oriented databases. Examples include demonstrating that column oriented databases are in general more efficient than row oriented databases, such as when answering queries that do not access many attributes. Other research indicates that column oriented databases are well suited for handling vertically partitioned Resource Description Framework (RDF) data, achieving an order of magnitude improvement in efficiency compared with row oriented databases. Nonetheless, in reference to aggregation queries, current column stores still appear to be adopting aggregation processes, such as hash-based aggregation processes, designed and optimized for row stores.
In conventional hash-based aggregation, a hash table is used to store the aggregation values for all the groups. When the number of groups is large, inserts/updates to the hash table generally become slow. Embodiments provide for the use of several smaller hash tables instead of one large hash table for processing aggregation queries. The use of multiple smaller hash tables according to embodiments improves the performance and efficiency of aggregation queries. In addition, embodiments provide that data are partitioned by bitmap indexes and, accordingly, no hash function is required to partition the data. Rather, hash functions are only used within each partition.
An experiment serves as a non-limiting example of using multiple smaller hash tables instead of one larger hash table. The hash table implementation used in this experiment is the map class in the C++ STL. For a table with 10 million tuples, a single big hash table used to record the occurrences of the groups took 35.22 seconds to complete. However, if 100,000 tuples were processed using a separate hash table each time, the process took 26.99 seconds. The time was further reduced to 15.54 seconds when 10,000 tuples were processed in each hash table, and to 12.03, 9.94 and 7.87 seconds when processing 1,000 tuples, 100 tuples and 10 tuples, respectively. Accordingly, embodiments provide methods and systems for replacing a big hash table with several smaller hash tables.
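A minimal sketch of this chunked counting appears below, using std::map as in the experiment's STL implementation. Unlike the experiment, which only recorded partial counts per chunk, this sketch also merges the per-chunk maps into a final result; the function name, chunk scheme, and merge step are illustrative, not taken from the original implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Count group occurrences using one small map per fixed-size chunk of
// tuples, then merge the per-chunk maps into a final result.
std::map<std::string, long> chunkedCount(const std::vector<std::string>& tuples,
                                         std::size_t chunkSize) {
    std::map<std::string, long> result;
    std::size_t i = 0;
    while (i < tuples.size()) {
        std::map<std::string, long> local;  // small per-chunk table
        std::size_t end = std::min(tuples.size(), i + chunkSize);
        for (; i < end; ++i)
            ++local[tuples[i]];
        for (const auto& kv : local)        // merge partial counts
            result[kv.first] += kv.second;
    }
    return result;
}
```

For example, processing five tuples in chunks of two uses three small maps whose entries are combined at the end.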
In column oriented databases, the occurrences of a specific value in a column may be computed relatively easily. In addition, random accesses on column data are efficient if the data are kept in memory. Using the occurrence information of the value and a data array, all the groups involving a specific value can be computed using a hash table. Since this hash table only contains entries for groups involving that specific value, it will be much smaller than a hash table covering all groups. By conducting a similar process for every value ai in column A, the aggregation result over column A and the other columns may be efficiently computed.
Another non-limiting example involves a database with table SALES storing a set of transaction records using at least the two columns PRODUCT and STATE. In this non-limiting example, the query “SELECT PRODUCT, STATE, COUNT(*) FROM SALES GROUP BY PRODUCT, STATE” is used to obtain product data. Generating the count information of a product “Humidifier” requires finding the occurrences of “Humidifier” in PRODUCT. For each occurrence of “Humidifier,” using the data array of the STATE column, the state where a particular humidifier is sold may be ascertained. The count of all groups involving “Humidifier” is maintained in a hash table. After all occurrences of “Humidifier” are examined, the aggregation result is recorded in the hash table. The hash table for keeping the groups involving “Humidifier” will be much smaller, because all the groups are related to “Humidifier.” As there may be a large number of products, the number of groups involving “Humidifier” is small compared with the total number of different groups in the SALES table.
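The per-value step of this example can be sketched as follows, assuming the occurrence positions of “Humidifier” in PRODUCT have already been computed. The function name is illustrative, and std::map stands in for any hash table implementation.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// For one value of the grouping column (e.g. "Humidifier" in PRODUCT),
// probe the data array of the second column (e.g. STATE) at each
// occurrence position. The resulting small map holds only the groups
// involving that one value.
std::map<std::string, long> countGroupsForValue(
        const std::vector<std::size_t>& positions,    // occurrences of the value
        const std::vector<std::string>& dataArray) {  // second grouping column
    std::map<std::string, long> counts;
    for (std::size_t pos : positions)
        ++counts[dataArray[pos]];
    return counts;
}
```

Repeating this for each distinct product yields the full GROUP BY result, one small hash table at a time.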
The occurrence information of a value needs to be stored in memory in a manner that provides fast access; as such, a compact representation is required. Embodiments provide efficient and compact position information by using a bitmap index to encode the occurrence positions of a value, compressed with an efficient scheme such as Word-Aligned Hybrid (WAH) or Byte-aligned Bitmap Code (BBC).
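As a sketch under the assumption that the positions are first collected into a plain bit vector, the encoding step might look like the following; a scheme such as WAH or BBC would then compress this vector. The helper name is illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode the occurrence positions of one value as an uncompressed bit
// vector, one bit per row. Compression (WAH, BBC, etc.) would be
// applied on top of this representation.
std::vector<std::uint8_t> positionsToBitmap(
        const std::vector<std::size_t>& positions, std::size_t numRows) {
    std::vector<std::uint8_t> bits((numRows + 7) / 8, 0);
    for (std::size_t p : positions)
        bits[p / 8] |= static_cast<std::uint8_t>(1u << (p % 8));
    return bits;
}
```

Positions 0 and 2 of an eight-row column produce the single byte 0b00000101.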
The random access of data arrays may be very efficient if the memory allocated for query processing is large enough to hold the data array of the column. However, this is usually not the case in large applications, where even the data array of one single column may not fit in memory. As such, efficient access of column data is needed when only a limited size of memory is available for query processing. Embodiments provide that when the memory size is not large enough, a divide and conquer technique is used wherein the data array and the bitmap index of the columns is partitioned into one or more smaller partitions, such that the memory allocated is enough for processing aggregation on the smaller partitions. Embodiments further provide that after the aggregations on all partitions are completed, the final result may be generated by merging the results from all partitions.
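The merge phase described above can be sketched as a simple combination of per-partition results; `mergePartials` is an illustrative name, and each map stands for the aggregation result produced on one partition.

```cpp
#include <map>
#include <string>
#include <vector>

// Merge the per-partition aggregation results (here, COUNT per group)
// into the final result by summing counts for matching group keys.
std::map<std::string, long> mergePartials(
        const std::vector<std::map<std::string, long>>& partials) {
    std::map<std::string, long> result;
    for (const auto& part : partials)
        for (const auto& kv : part)
            result[kv.first] += kv.second;
    return result;
}
```

A group that appears in several partitions simply has its partial counts added together.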
In practice, the number of aggregation attributes will normally be more than two. Accordingly, embodiments generalize the aggregation process such that aggregation queries with multiple aggregation attributes may be efficiently computed. Accordingly, embodiments provide that one of the aggregation attributes with an available bitmap index may be selected and the remaining columns treated as one single column, then the method for two aggregation attributes is applied.
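One way to treat the remaining grouping columns as a single column is to build a composite key from their cell values at each row. The helper and the '|' separator are assumptions for illustration; a real implementation would need a separator that cannot occur in the data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Concatenate the cell values of the remaining grouping columns at one
// row into a single composite key, so they can be grouped as if they
// were one column.
std::string compositeKey(const std::vector<std::vector<std::string>>& columns,
                         std::size_t row) {
    std::string key;
    for (const auto& col : columns) {
        if (!key.empty()) key += '|';  // assumed separator
        key += col[row];
    }
    return key;
}
```

The two-attribute method is then applied to the selected indexed attribute and this composite column.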
Embodiments provide for aggregation methods designed for column oriented databases. In addition, embodiments provide for hash-based aggregation methods. Embodiments provide for the use of bitmap indexes with hash-based aggregation. In addition, if necessary, embodiments may use partitioned hash tables. Further embodiments provide for aggregation methods that leverage the bitmap index and the data array in column oriented databases such that aggregation computations are partitioned into a set of smaller problems, which are better able to achieve increased performance, efficiency, and scalability.
Typically, aggregation is supported in the following two forms: scalar aggregates and aggregate functions. In general, scalar aggregates compute a single value from a single input relation, such as SUM, MAX, MIN, and AVERAGE. Aggregate functions compute a set of values from the input relation, producing a relation as the result. To compute the result of an aggregate function, grouping is needed for processing aggregate queries. Grouping is very similar to duplicate removal in a database, and may be implemented in a very similar way; as such, the two are often used interchangeably.
In row oriented databases, aggregation requires that all data be consumed before the production of output. According to existing technology, two major types of processes for aggregation exist: one is based on sorting, and the other is based on hashing. Some research has demonstrated that the performance of hash-based aggregation is generally better than that of sort-based aggregation.
Bitmap indexes are known to be efficient, especially for read-mostly or append-only data, and are commonly used in data warehousing applications and column oriented databases. As an example, Model 204® was the first commercial product making extensive use of a bitmap index. Model 204® is a registered trademark of Computer Corporation of America. Early bitmap indexes were used to implement inverted files. In data warehouse applications, bitmap indexes were shown to perform better than tree-based schemes, such as variants of the B-tree or R-tree. Compressed bitmap indexes are widely used in column oriented databases, such as C-Store, which contributes to its performance gain over row oriented databases. Various compression schemes for bitmap indexes have been developed. The development of bitmap compression methods and encoding strategies has further broadened the applicability of bitmap indexes. For example, bitmap indexes may be applied to all types of attributes and are very efficient for Online Analytical Processing (OLAP) and warehouse query processing.
A bitmap for an attribute may be viewed as a set of bit vectors, one vector for each distinct value of the attribute, in which the bit at a given row position is set if that row contains the value.
Referring to
As an uncompressed bitmap is generally much larger than the original data, compression is typically used for attributes other than the primary key to reduce storage size and improve performance. In addition, with proper compression, bitmaps perform well for a column with cardinality up to 55% of the number of rows, that is, with up to 55% of the rows having distinct values in this column.
Various compression methods for bitmaps have been proposed. For example, Word-Aligned Hybrid (WAH) and the earlier Byte-aligned Bitmap Code (BBC) are two important compression schemes that may be used for any column and for query processing without decompression. WAH can be as much as ten times faster than BBC, while occupying no more than 50% disk space. Another example is run length compression, which may be used for sorted columns. Embodiments may utilize any applicable bitmap compression scheme. Certain embodiments and non-limiting examples described herein utilize WAH compression methods.
WAH organizes the bits in a vector by words. A word can be either a literal word or a fill word, distinguished by the highest bit: 0 for literal words and 1 for fill words. Using
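The word layout just described can be sketched for 32-bit words as follows. The placement of the fill-value bit and the run-length field follows the published WAH scheme and is an assumption here, not detail taken from the text above.

```cpp
#include <cstdint>

// The highest bit distinguishes literal words (0) from fill words (1).
bool isFillWord(std::uint32_t w) { return (w >> 31) != 0; }

// Number of row positions a single WAH word covers: a literal word
// carries 31 data bits; a fill word's low 30 bits count how many
// 31-bit groups the run spans (the second-highest bit is the fill
// value, 0 or 1, and is not needed for the length).
std::uint32_t bitsCovered(std::uint32_t w) {
    if (!isFillWord(w))
        return 31;
    return (w & 0x3FFFFFFFu) * 31;
}
```

For example, the fill word 0x80000002 represents two consecutive 31-bit groups of zeros, covering 62 row positions.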
In a column oriented database, each column of a relation is stored contiguously on disk. Each contiguously stored column of data is referred to as a data array. An offset index may be built on top of the column, recording the value of the cell at each position. Given a position/offset (e.g., the row number), the value of the cell may be easily and quickly retrieved using the data array.
For large hash tables, the insert/update of the hash table is costly due to collisions, and performance degrades as the number of entries in the hash table increases. Embodiments provide that selecting a specific value of one column allows aggregation results to be split into smaller partitions. According to embodiments, if each of the partitions is computed using a hash table, the summation of the processing time of the set of smaller hash tables will be much smaller than the processing time using a single large hash table. Embodiments provide for any implementation of hash tables that provides such results, including, but not limited to, the map class in the C++ STL, sparse hash, and dense hash implementations. A non-limiting example of a sparse hash implementation is the implementation from the Google® Sparse Hash project. Google is a trademark or registered trademark of Google, Inc. Computing aggregation using a large hash table is much more time consuming than computing an equivalent aggregation with a set of smaller hash tables according to embodiments. Accordingly, embodiments achieve better aggregation processing by using the bitmap index and data array. In addition, embodiments process aggregation queries when memory is not large enough to hold all data to be aggregated and when there are more than two aggregation attributes.
Referring to
Referring to
Another non-limiting example involves processing columns A and B, wherein the bitmap index size of
Embodiments provide that minimizing the number of partitions leverages the memory given as efficiently as possible and also avoids overhead resulting from multi-way merge. In addition, embodiments provide for determining memory requirements. A non-limiting example of determining memory requirements provides for summing the size of the data array, the size of the bitmap index, and the size of memory that will be occupied by the hash map, and dividing the total memory by a given memory upper bound. Embodiments provide that the results may be rounded up and used as the number of partitions.
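The rounding computation described in this example can be sketched directly; the three size inputs are assumed to be expressed in the same unit (e.g., bytes) as the memory bound, and the function name is illustrative.

```cpp
#include <cstddef>

// Number of partitions needed: total memory requirement (data array +
// bitmap index + hash map footprint) divided by the given memory upper
// bound, rounded up via ceiling division.
std::size_t numPartitions(std::size_t dataArrayBytes,
                          std::size_t bitmapBytes,
                          std::size_t hashMapBytes,
                          std::size_t memoryBound) {
    std::size_t total = dataArrayBytes + bitmapBytes + hashMapBytes;
    return (total + memoryBound - 1) / memoryBound;
}
```

For instance, a total requirement of 1000 units under a 400-unit bound yields three partitions.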
Embodiments provide processes for aggregation of multiple attributes. According to a first multiple aggregation embodiment, among
A second multiple aggregation embodiment aggregates multiple attributes by processing two grouping attributes at a time. A non-limiting example involves three grouping attributes
The first multiple aggregation embodiment results in more partitions, which results in a higher cost in the merge phase. However, the second multiple aggregation embodiment produces more intermediate results because the intermediate bitmaps need to be stored in memory (writing the intermediate bitmap to disk and reading it back for processing the next grouping attribute is obviously more costly since more I/O is needed). In addition, the number of required partitions is increased and more hash operations are conducted when processing two attributes at a time. Experiments indicate that the first multiple aggregation embodiment demonstrates increased performance when compared with the second multiple aggregation embodiment, especially when the number of grouping attributes is high. Accordingly, the construction of intermediate bitmaps and the use of more hash operations appears to be more costly compared with the costs saved when using fewer partitions.
Another non-limiting example involves a hash-based implementation that reads tuples from a file and uses an in-memory hash table to record the count of each aggregation group seen. If a memory size bound is given, whenever the memory is not enough to keep all groups seen in memory, all entries and their count values will be written in sorted order (e.g., in alphabetical order of the group value) to temporary files. After all tuples are scanned, all intermediate results will be merged to generate the final result.
Experiments were conducted evaluating the performance of methods disclosed herein. A first experiment used two hash implementations on a relation to demonstrate that a set of smaller hash tables may be more efficient than a larger hash table. A second experiment was conducted to evaluate the effectiveness of certain embodiments, which included implementing state-of-the-art hash-based aggregation processing using hash table. In these experiments, all of the systems and processes were tested in terms of several factors, including data size, number of aggregation attributes, and the amount of memory allocated for query processing. The experiments indicated at least two results: (1) using a set of smaller hash tables for aggregation is faster than using a large hash table; and (2) aggregation processing according to embodiments outperforms the state-of-the-art aggregation processing method.
The experiments were conducted on a machine with a 3.6 GHz Intel® Pentium® IV dual core processor, 2.0 GB of main memory, and a 7200 rpm SATA hard drive, running Ubuntu® 9.10 with kernel 2.6.31-19. Intel and Pentium are registered trademarks of the Intel Corporation. Ubuntu is a trademark or registered trademark of Canonical Limited. Both the hash-based aggregation method and the bitmap-based method were implemented in the C++ programming language. However, other hardware and software components and configurations are equally applicable, and embodiments are not limited to those components and configurations specified in these experiments.
In these experiments, a hash-based process termed “globalHash” was used that assumed the memory size was large enough to hold all of the aggregation groups and their counts in an in-memory hash table. A second experimental hash-based process, referred to as “localHash,” used only small hash tables and did not produce final aggregation results, but rather provided only partial results. Given a number denoting the maximum number of tuples (e.g., 10,000) a hash table may process, localHash read tuples from the file and stored the count of each group in a hash table. Whenever the number of tuples scanned reached the given maximum, the hash table was dropped and a new hash table was used when the next tuple was scanned. Embodiments provide that by replacing a global hash function with a number of local hash functions using the bitmap index partition, a significant improvement in hashing performance, and therefore aggregation performance, may be achieved.
In most real applications, the memory available for query processing is quite limited. Accordingly, experiments were conducted comparing the performance of the hash-based method and the bitmap-based method according to embodiments in situations where a memory upper limit for query processing was specified. Experiments tested both methods with a synthetic data set, generated according to Zipfian distribution, and a realistic data set. Experimental results showed that bitmap-based aggregation methods according to embodiments are memory-efficient and outperform the hash-based aggregation method in most cases. In the experiments described herein, each data set was tested by varying different parameters, including the number of tuples in the relation, the memory upper limit for query processing, the number of distinct values in a column, and the number of aggregation attributes.
Referring to
In addition,
Referring to
Further experiments were performed using realistic data containing one relation with nine attributes, wherein the number of distinct values was not controlled. Referring to
The experimental results using the realistic data demonstrate that assumptions made during evaluation of the synthetic data set (i.e., Zipfian) match the characteristics of the realistic data. Accordingly, evaluation using a synthetic data set is representative of a realistic data set. In addition, the bitmap-based and hash-based methods demonstrated similar behavior on both data sets, further indicating that bitmap-based methods according to embodiments provide better performance than aggregation methods according to present technology.
Embodiments provide an efficient and scalable aggregation process. Aggregation processes according to embodiments provide a performance advantage compared to aggregation processes according to current technology, especially when the number of aggregation attributes is relatively small. Embodiments provide for the use of several smaller hash tables instead of a large hash table for processing aggregation queries. The use of multiple smaller hash tables according to embodiments improves the performance and efficiency of aggregation queries. In addition, embodiments provide for aggregation methods that leverage the bitmap index and the data array such that aggregation computations are partitioned into a set of smaller problems, which are better able to achieve increased performance, efficiency, and scalability. Embodiments described herein focus on column oriented databases as a prominent example, but are not so limited, as embodiments may operate on any type of database capable of taking advantage of aggregation processes as described in this disclosure.
Referring to
Components of computer 1510 may include, but are not limited to, processing units 1520, a system memory 1530, and a system bus 1522 that couples various system components including the system memory 1530 to the processing unit 1520. Computer 1510 may include or have access to a variety of computer readable media. The system memory 1530 may include computer readable storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) and/or random access memory (RAM). By way of example, and not limitation, system memory 1530 may also include an operating system, application programs, other program modules, and program data.
A user can interface with (for example, enter commands and information) the computer 1510 through input devices 1540. A monitor or other type of device can also be connected to the system bus 1522 via an interface, such as an output interface 1550. In addition to a monitor, computers may also include other peripheral output devices. The computer 1510 may operate in a networked or distributed environment using logical connections to one or more other remote computers or databases, such as a column oriented database. The logical connections may include a network, such as a local area network (LAN) or a wide area network (WAN), but may also include other networks/buses.
It should be noted as well that certain embodiments may be implemented as a system, method or computer program product. Accordingly, aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, et cetera) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied therewith.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, et cetera, or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java™, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer (device), partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatuses (systems) and computer program products according to example embodiments. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
This disclosure has been presented for purposes of illustration and description but is not intended to be exhaustive or limiting. Many modifications and variations will be apparent to those of ordinary skill in the art. The example embodiments were chosen and described in order to explain principles and practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.
Although illustrated example embodiments have been described herein with reference to the accompanying drawings, it is to be understood that embodiments are not limited to those precise example embodiments, and that various other changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the disclosure.
The application is a continuation of U.S. patent application Ser. No. 13/116,387, entitled SYSTEMS AND METHODS FOR QUERYING COLUMN ORIENTED DATABASES, filed on May 26, 2011, which is incorporated by reference in its entirety.
Parent case: U.S. application Ser. No. 13/116,387, filed May 2011 (US). Child case: U.S. application Ser. No. 13/595,681 (US).