In parallel processing, many computational steps (operations) may be performed simultaneously, as opposed to serial processing, in which operations may be performed sequentially. Serial processing is relatively straightforward, and may be accomplished without a great deal of technological sophistication.
Parallel processing, on the other hand, may achieve improved processing and input/output speeds (in comparison to serial processing) by exploiting resources, such as multiple microprocessors and disks. In parallel processing, data and resources may be coordinated in parallel operation to effectively exploit the capabilities of a parallel processing system.
Traditional database management systems (DBMSs), such as centralized and client-server database systems, typically employ serial processing. As such, traditional DBMSs may not be sophisticated enough to take advantage of the efficiencies of parallel processing.
A parallel DBMS, on the other hand, may be a performance-oriented system that attempts to achieve efficiencies over traditional DBMSs through parallel processing. Efficiencies may be achieved in activities such as loading data, building indexes of database tables, and evaluating and executing database queries. While the data in the databases of a parallel DBMS may be stored in a distributed fashion, the distribution is typically configured to achieve performance efficiencies.
Certain exemplary embodiments are described in the following detailed description and with reference to the drawings.
Additionally, the functional blocks and devices of the system 100 are but one example of functional blocks and devices that may be implemented in an exemplary embodiment of the present invention. Those of ordinary skill in the art would readily be able to define specific functional blocks based on design considerations for a particular electronic device.
The system 100 may include a database server 102 and one or more client computers 104, in communication over a network 130.
In an exemplary embodiment of the invention, the system 100 may include multiple database servers 102. In such an exemplary embodiment, the system 100 may include a shared nothing architecture.
A shared nothing architecture may comprise a distributed computing architecture in which each node is independent and self-sufficient. In a shared nothing architecture, there is no single point of contention across the system. Shared nothing may be contrasted with systems that keep a large amount of centrally-stored state information, whether in a database, an application server, or any other similar single point of contention. In such systems, resources, such as memory, may be shared. As a result, some processes may be slowed, or even stopped, while waiting for a separate process to release the shared resource.
The database server 102 may also be connected through the bus 113 to a network interface card (NIC) 126. The NIC 126 may connect the database server 102 to the network 130. The network 130 may be a local area network (LAN), a wide area network (WAN), or another network configuration, such as the Internet. The network 130 may include routers, switches, modems, or any other kind of interface device used for interconnection.
Through the network 130, several client computers 104 may connect to the database server 102. The client computers 104 may be structured similarly to the database server 102, with the exception of the storage of a database management system (DBMS) 124. In an exemplary embodiment, the client computers 104 may be used to submit queries to the database server 102 for execution by the DBMS 124.
The database server 102 may have other units operatively coupled to the processor 112 through the bus 113. These units may include tangible, machine-readable storage media, such as storage devices 122.
The storage devices 122 may include media for the long-term storage of operating software and data, such as hard drives. The storage devices 122 may also include other types of tangible, machine-readable media, such as read-only memory (ROM), random access memory (RAM), and cache memory. The storage devices 122 may include the software used in exemplary embodiments of the present techniques.
The storage devices 122 may include the DBMS 124, a defaults table 129, and a query 128. The DBMS 124 may be a set of computer programs that controls the creation, maintenance, and use of databases by an organization and its end users. In an exemplary embodiment of the invention, the DBMS 124 may be a parallel DBMS. The DBMS 124 is described in greater detail below.
The query 128 may be any of a certain class of common database queries. For example, the query 128 may specify that multiple operations are performed on multiple columns of a database table, view, or join. The operations may include multiple grouping operations with aggregation functions, such as maximum, minimum, average, count, sum and the like. In an exemplary embodiment of the invention, the query 128 may specify that each operation is performed on a different column, or set of columns, with each operating on a different grouping of data. For example, the query 128 may include the following structured query language (SQL) statement:
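SQL STATEMENT 1 may take a form such as the following, consistent with the description below (the table name TABLE and the column names A, B, and C are those referred to in that description):

SQL STATEMENT 1:
SELECT COUNT(DISTINCT A), COUNT(DISTINCT B), AVG(DISTINCT C)
FROM TABLE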
As shown, the SQL STATEMENT 1 specifies multiple distinct aggregate functions: COUNT DISTINCT on column A, COUNT DISTINCT on column B, and AVG DISTINCT on column C. The first COUNT DISTINCT function may return the number of distinct non-NULL values in column A, with the data to be grouped on column A. As shown, each distinct aggregate function operates on a different grouping of the data.
The second COUNT DISTINCT function may return the number of distinct non-NULL values in column B, with the data to be grouped on column B. The AVG DISTINCT function may return the average of the distinct non-NULL values in column C, with the data to be grouped on column C.
In an exemplary embodiment of the invention, any particular aggregate function may be performed on different data groupings. SQL STATEMENT 2 below illustrates one such example:
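SQL STATEMENT 2 may take a form such as the following, consistent with the description below; the NULL placeholder columns are an assumption made here so that the two group-by output streams are union-compatible:

SQL STATEMENT 2:
SELECT A, NULL, MIN(C) FROM TABLE GROUP BY A
UNION ALL
SELECT NULL, B, MIN(C) FROM TABLE GROUP BY B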
As shown, the SQL STATEMENT 2 includes a MIN function on column C, groupings on columns A and B independently, and a UNION ALL operation that combines the two group-by output streams. The same aggregation function, MIN(C), may be used in both groupings.
The query plans 134 may each specify alternate ways to perform the query 128. The execution engine 136 may execute one of the query plans 134 against the databases 140 to produce a query result.
The optimizer 137 may generate the query plans 134 from the query 128 itself. The optimizer 137 may also select an optimal query plan 134 for the execution engine 136 to execute. The selection may be based on estimated processing costs of the generated query plans 134.
The query plan 134 may include one or more operators. The operators of the query plan 134 may have predefined behaviors that accomplish the work of the query. In particular, operators may act on inputs, referred to as input streams, and produce outputs, referred to as output streams.
Put simply, the transpose operator may expand a single input row into multiple rows such that each column value in the input row is used to create a new row. In this example, the input stream 220 includes a single row of data with column values, “(23, 43, 35).” The transpose operator 210 may convert the single row of 3 columns to 3 rows, one for each column value of the input stream 220. As shown, the output stream 230 contains 3 rows, one for each column value of the input stream 220, “23,” “43,” and “35.”
Within the query plan 134, the output stream for one operator may become the input stream for a subsequent operator. For example, to implement multiple groupings, the output stream 230 may be used as an input stream to a group-by operator. The group-by operator may perform the work of the grouping and aggregation function, taking advantage of the new format of the transposed data in the output stream 230.
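As an illustrative sketch in plain SQL, rather than a depiction of the physical operators, a transpose of columns A, B, and C followed by a grouping on the introduced column identifier may be written as follows (assuming columns A, B, and C have compatible types):

SELECT col_id, COUNT(DISTINCT col_value)
FROM (SELECT 1 AS col_id, A AS col_value FROM TABLE
      UNION ALL SELECT 2, B FROM TABLE
      UNION ALL SELECT 3, C FROM TABLE) AS transposed
GROUP BY col_id

The names col_id, col_value, and transposed are illustrative only; the inner query plays the role of the transpose operator, and the outer GROUP BY plays the role of the group-by operator.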
However, as shown, the size of the output stream 230 may be significantly greater than the size of the input stream 220. As such, subsequent operations may become unwieldy in the face of large input streams with large numbers of groupings. Parallel processing, however, may be used to handle such large data streams.
In a parallel DBMS, the query plans 134 may also include an exchange operator. The exchange operator may partition an input stream of data into multiple streams. As such, subsequent operations may be performed, each on one portion of the partitioned data.
For example, the exchange operator may be used in a query plan 134 for the following SQL:
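For purposes of illustration, and consistent with the SUM result described below, SQL STATEMENT 3 may be assumed to have a form such as:

SQL STATEMENT 3:
SELECT SUM(DISTINCT A)
FROM TABLE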
If TABLE includes 2 million rows, the exchange operation may distribute the 2 million rows among multiple streams feeding subsequent operations. The transpose and group-by operations may then be performed in parallel, computing the sums for each partition of rows. The disparate output streams of the parallel operations may then be input to another operator that computes the sum of the sums computed in parallel, providing a SUM result for SQL STATEMENT 3.
While the exchange operator may be efficient, executing the query plan 134 may become unwieldy when there are a large number of rows in the input and the query plan 134 specifies many groupings. For each grouping, the transpose operator may produce an additional row for each row of the input. Additionally, each row produced may contain a column for each grouping. In other words, the size of the data may be increased by a factor of the number of groupings. As a result, very large data sets with multiple groupings may overwhelm the group-by operator, even with the use of the exchange operator. This may incur a significant cost in I/O, particularly for large tables.
In an exemplary embodiment of the invention, the query plans 134 may include a new hybrid exchange operator as part of the implementations of multiple groupings. The hybrid exchange operator may partition the data in a single pass. Additionally, the hybrid exchange operator may be used in place of a transpose operator for multiple groupings, or in place of the exchange and transpose operators in the case of a parallel evaluation of multiple groupings. This may help avoid the expansion in size between the input stream and output stream.
Inclusion of the hybrid exchange operator in the query plan 134 may also be supplemented with a new combine operator that combines outputs of the parallel operations. As such, the combine operator may represent the functionality of a JOIN, UNION, or similar SQL clause.
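As a rough logical sketch, rather than a depiction of the physical operators, the combined result produced by the hybrid exchange and combine operators for SQL STATEMENT 1 corresponds to evaluating each aggregate over its own column stream and joining the single-row results:

SELECT t1.a_cnt, t2.b_cnt, t3.c_avg
FROM (SELECT COUNT(DISTINCT A) AS a_cnt FROM TABLE) t1
CROSS JOIN (SELECT COUNT(DISTINCT B) AS b_cnt FROM TABLE) t2
CROSS JOIN (SELECT AVG(DISTINCT C) AS c_avg FROM TABLE) t3

The alias names a_cnt, b_cnt, c_avg, t1, t2, and t3 are illustrative only; in this sketch, the join of the three single-row results plays the role of the combine operator.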
For example, instead of a partitioning function that partitions data by rows for each aggregate function, the hybrid exchange operator may partition the data according to the groupings (for example, the columns specified in each distinct aggregate function or the columns specified in the grouping sets). As such, if each input row includes n columns, C1, C2, ..., Cn, the hybrid exchange operator may produce n partitions, each containing one column.
In an exemplary embodiment of the invention, a number of partitioning functions may be applied to each row of a table. Each partitioning function may be applied to the same input row, but may extract a distinct column, or combination of columns. In this manner, the operators for each grouping may receive a part of each row. The parts may not be disjoint, depending on the semantics of the query 128.
It should be noted that splitting the input streams by columns may result in functional parallelism, as each output stream of the hybrid exchange operator may be input to different operators, each of which may perform different groupings and different aggregate functions. In an exemplary embodiment of the invention, the different operators may perform the same aggregate functions, but on different columns.
SQL STATEMENT 1 includes 3 distinct aggregate functions, one for each of columns A, B, and C. As such, the hybrid exchange operator 315A may apply a partitioning function (PF) for each of the aggregate functions. PF1 may partition the values for column A from each row of TABLE. Similarly, PF2 and PF3 may partition the values for columns B and C, respectively, from each row of TABLE.
As shown, each output stream 320A may contain the values of only one column. Each output stream 320A may then be input to operators for performing the grouping and aggregate functions.
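Expressed as simple projections, and again as an illustrative sketch rather than a depiction of the operator itself, the three output streams 320A correspond to:

SELECT A FROM TABLE -- stream produced by PF1, feeding COUNT DISTINCT on column A
SELECT B FROM TABLE -- stream produced by PF2, feeding COUNT DISTINCT on column B
SELECT C FROM TABLE -- stream produced by PF3, feeding AVG DISTINCT on column C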
SQL STATEMENT 2 includes the aggregate function MIN(C). As such, PF4 may partition the values for both columns A and C, while PF5 may partition the values for both columns B and C. In an exemplary embodiment of the invention, the partitioning functions for a particular hybrid exchange operator may produce output streams that overlap. As such, the values for column C may be included in both output streams of the hybrid exchange operator 315B.
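Similarly, the two overlapping output streams of the hybrid exchange operator 315B may be sketched as the projections:

SELECT A, C FROM TABLE -- stream produced by PF4, feeding the grouping on column A with MIN(C)
SELECT B, C FROM TABLE -- stream produced by PF5, feeding the grouping on column B with MIN(C)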
It should be noted that the hybrid exchange operator may provide an improvement in efficiency for calculating statistics, such as those included in histograms. In an exemplary embodiment of the invention, histograms could be generated by queries 128 that specify aggregation functions for multiple columns with multiple groupings within a single query 128. By optimizing such queries 128 using the hybrid exchange operator, statistics that would normally be computed with several passes over the data may be computed in a single pass.
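For example, a hypothetical statistics query of this kind might compute per-column value frequencies (histogram buckets) for columns A and B of TABLE in a single statement (assuming the two columns have compatible types):

SELECT 'A' AS col_name, A AS col_value, COUNT(*) AS frequency FROM TABLE GROUP BY A
UNION ALL
SELECT 'B', B, COUNT(*) FROM TABLE GROUP BY B

The names col_name, col_value, and frequency are illustrative only; with the hybrid exchange operator, both groupings may be evaluated in the single pass described above.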
The method begins at block 402, where the DBMS 124 optimizes the query 128, thereby generating a query plan 134 for execution by the execution engine 136. The optimizer 137 may include any legal operators in the query plan 134, including the new hybrid exchange and combine operators.
At block 404, the DBMS 124 partitions data of the table based on the operations. As stated previously, the operations may include aggregation functions, such as MIN, MAX, COUNT, SUM, AVG and the like. At block 406, the DBMS 124 performs the operations on the partitioned data in parallel.
Moreover, the tangible, machine-readable medium 500 may be included in the storage devices 122 described above.
A region 506 of the tangible, machine-readable medium 500 stores machine-readable instructions that, when executed by the processor 502, receive a query that specifies a plurality of aggregate functions on a corresponding plurality of columns within a database table. A region 508 of the tangible, machine-readable medium 500 stores machine-readable instructions that, when executed by the processor 502, optimize the query. The query may specify multiple operations on corresponding multiple column sets of a table.
A region 510 of the tangible, machine-readable medium 500 stores machine-readable instructions that, when executed by the processor 502, partition data of the table based on the operations. A region 512 of the tangible, machine-readable medium 500 stores machine-readable instructions that, when executed by the processor 502, perform the operations on the partitioned data in parallel, whereby the operations are performed within a single pass of the data.