Structured Query language (“SQL”) is widely used by software professionals to query relational databases. One powerful feature of SQL is the nested query. Nested queries are sub-queries embedded within a larger query such that the output thereof becomes input for the larger query. The need for such queries typically arises when sought after data depends on multiple tables within the database.
Database management systems (“DBMS”) are typically equipped with processing subsystems known as optimizers. Such optimizers aim to provide a runtime engine of the DBMS with the most efficient plan for executing a received query. By way of example, the following query may be received by a DBMS:
The query shown above includes two nested queries within parenthesis: (select x from bar where bar.y=foo.b) and (select y from bar where bar.x=foo.a). Since execution of each nested query would require multiple evaluations of every row in table foo, an optimizer may devise a plan to filter out rows that fail the first nested query such that the second nested query only evaluates the satisfactory rows. Therefore, if only one thousand of one million rows stored in table foo satisfy the first nested query, whose predicate is “bar.y=foo.b,” the predicate of the second nested query, which is “bar.x=foo.a,” would only evaluate the one thousand rows. This is more efficient than evaluating one million rows multiple times. Optimizers may devise different plans for different types of queries.
As noted above, optimizers of a query processing subsystem may generate a plan to execute nested queries efficiently in order to minimize the database workload. However, certain optimizations such as the plan described above cannot be used for all types of queries. Certain types of queries may lead to erroneous results if the aforementioned optimization plan is applied. For example, the following query may be received by a DBMS:
The two nested queries above are joined with a disjunctive “OR” expression. Optimization techniques may cause the first nested query to filter out rows of data that could have satisfied the second nested query. Therefore, every row of table foo must be evaluated twice to obtain accurate results. Such extra processing may hinder database performance when a table contains millions of rows therein. Database performance may be further diminished if such queries are executed frequently.
In view of the foregoing, various examples disclosed herein provide a system, method, and non-transitory computer readable medium to optimize nested queries. In one aspect, at least one field may be associated with each row of data evaluated by at least one nested query. In a further aspect, a value may be assigned to the at least one field so as to indicate whether each row of data satisfies the at least one nested query. In yet a further aspect, the at least one field associated with each row of data evaluated by the at least one nested query may be evaluated to determine satisfaction thereof.
The techniques disclosed herein optimize any type of query expression while providing accurate results thereto. As such, users are provided with greater flexibility when coding complex queries without diminishing database performance. The aspects, features and advantages of the application will be appreciated when considered with reference to the following description of examples and accompanying figures. The following description does not limit the application; rather, the scope of the application is defined by the appended claims and equivalents.
Computer 102 may communicate with other computers via a network, which may be a local area network (“LAN”), wide area network (“WAN”), the Internet, etc. Computer 102 may use various protocols to communicate with other computers over the network. Such protocols may include virtual private networks, local Ethernet networks, private networks using communication protocols proprietary to one or more companies, cellular and wireless networks, instant messaging, HTTP and SMTP, and various combinations of the foregoing.
Computer 102 may be equipped with a processor 110 and memory 112. Memory 112 may store DBMS instructions 118 which may be retrieved and executed by processor 110. Furthermore, memory 112 may contain a database 120, which may be retrieved, manipulated, or stored by processor 110. In one example, memory 112 may be a random access memory (“RAM”) device. Alternatively, memory 112 may comprise other types of devices, such as memory provided on floppy disk drives, tapes, and hard disk drives, or other storage devices that may be directly or indirectly coupled to computer 102. The memory may also include any combination of one or more of the foregoing and/or other devices as well. The processor 110 may be any number of well known processors, such as processors from Intel® Corporation. In another example, processor 110 may be a dedicated controller for executing operations, such as an application specific integrated circuit (“ASIC”).
Although
Although the architecture of database 120 is not limited to any particular database structure or product, the data thereof may be stored in computer registers, in a relational database as tables having a plurality of different columns and records, XML documents or flat files. The data stored in database 120 may comprise any information sufficient to identify the relevant data, such as numbers, descriptive text, proprietary codes, references to data stored in other areas of the same memory or different memories (including other network locations) or information that is used by a function to calculate the relevant data.
Computer 102 may be configured as a database server. In this regard, computer 102 may be capable of communicating data with a client computer such that computer 102 uses a network to transmit information for presentation to a user of a remote computer. Accordingly, computer 102 may be used to obtain information from database 120 for display via, for example, a web browser executing on a remote. computer. Computer 102 may also comprise a plurality of computers, such as a load balancing network, that exchange information with different computers of a network for the purpose of receiving, processing, and transmitting data to multiple client computers. In this instance, the client computers will typically still be at different nodes of the network than any of the computers comprising computer 102.
DBMS instructions 118 may comprise any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the processor(s). In that regard, the terms “instructions,” “steps” and “programs” may be used interchangeably herein. The instructions may be stored in any computer language or format, such as in object code or modules of source code. Furthermore, it is understood that DBMS instructions 118 may be implemented in the form of hardware, software, or a combination of hardware and software and that the examples herein are merely illustrative. DBMS instructions 118 may configure processor 110 to generate database query responses, to update the database, to provide database usage statistics, or to serve any other database related function.
DBMS instructions 118 may configure processor 110 to implement the query optimization techniques disclosed herein. Such instructions may be encoded in nested query optimizer 114. DBMS instructions 118 may be realized in any non-transitory computer-readable media for use by or in connection with an instruction execution system such as a computer 102, an ASIC or other system that can fetch or obtain the logic from non-transitory computer-readable media and execute the instructions contained therein. “Non-transitory computer-readable media” can be any media that can contain, store, or maintain programs and data for use by or in connection with the instruction execution system. Non-transitory computer readable media may comprise any one of many physical media such as, for example, electronic, magnetic, optical, electromagnetic, or semiconductor media. More specific examples of suitable non-transitory computer-readable media include, but are not limited to, a portable magnetic computer diskette such as floppy diskettes or hard drives, a read-only memory (“ROM”), an erasable programmable read-only memory, or a portable compact disc.
One working example of a system and method to optimize nested queries is shown in
As shown in block 202 of
The query above requests rows in table foo that satisfy a first nested query “a IN (select x from bar)” and a second nested query “a IN (select p from pio).” Referring now to
Referring back to
In “semi join” module 310 of
Referring back to
In addition to nested queries that are disjunctively joined, other types of complex nested queries may be enhanced using indicator fields. A complex nested query may include conjunctively joined nested queries, such as (select max(x) from bar) +(select max(a) from foo). A complex nested query may also include a nested query whose output serves as input to another expression, such as (foo.a in (select x from bar) IS TRUE. The foregoing is a non-exhaustive list of complex query types that may be enhanced using the techniques described above.
Advantageously, the above-described system, method, and non-transitory computer readable medium enhance the performance of any nested query expression. Instead of filtering out data that fail a nested query, all data is preserved with an indication of whether each row thereof satisfied the nested query. In this regard, all the preserved data may be cached such that the database tables are accessed only once. Although the disclosure herein has been described with reference to particular examples, it is to be understood that these examples are merely illustrative of the principles of the disclosure. It is therefore to be understood that numerous modifications may be made to the examples and that other arrangements may be devised without departing from the spirit and scope of the disclosure as defined by the appended claims. Furthermore, while particular processes are shown in a specific order in the appended drawings, such processes are not limited to any particular order unless such order is expressly set forth herein. Rather, processes may be performed in a different order or concurrently, and steps may be added or omitted.