The present invention relates to database systems, and in particular, to compiling and computing database statements.
To interact with a database server, a database statement is issued to a database server to cause the database server to perform operations on data stored in the database. For the database server to process the commands, the database statements must conform to a database language supported by the database server. One database language supported by many database servers is known as the Structured Query Language (SQL). SQL, as the term is used herein, refers to forms that conform to ANSI standards and/or proprietary standards (e.g. SQL supported by Oracle™ database servers).
A database statement that conforms to a database language is referred to herein as a query. The term query encompasses database statements that specify and/or declare data manipulation language (“DML”) operations, including, without limitation, select, insert, update, and delete.
SQL queries may contain a table function, which when executed, returns a collection of elements (e.g. objects). Such queries are referred to herein as table function queries. The following query QE illustrates an example of a table function query.
The table function Persons_tf (1) is contained within a TABLE clause. During execution of QE by a database server, the function Persons_tf (1) is computed to return a collection of objects. To compute the TABLE clause, the collection of elements is converted into a set of rows, each row corresponding to an element.
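The conversion of a returned collection into rows can be illustrated with a minimal Python sketch. It is not the database server's implementation: the `Person` type, its attributes `name` and `age`, the sample data, and the meaning given to the argument `1` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, astuple


@dataclass
class Person:
    """Hypothetical element type; every element has the same attributes."""
    name: str
    age: int


def persons_tf(criteria):
    """Hypothetical stand-in for a table function: returns a collection of objects."""
    people = [Person("Ann", 31), Person("Bob", 17)]
    # Assumption for this sketch only: criteria 1 means "adults only".
    if criteria == 1:
        return [p for p in people if p.age >= 18]
    return people


def table_clause(collection):
    """Convert a collection of elements into rows, one row per element."""
    return [astuple(element) for element in collection]


rows = table_clause(persons_tf(1))
```

Each tuple in `rows` plays the role of one row produced by the TABLE clause, with one column per attribute of the element.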
The elements in the collection returned by a table function each have the same attributes or fields. The elements may be rows or tuples or objects of an object type. An object type is a set of attributes and associated routines and functions that operate on the state of the object, e.g. attributes. The routines or functions of the object type are referred to herein as object methods.
Compiling an SQL statement, as the term is used herein, refers to the process of determining and optimizing operations or steps, resources, and/or data structures that are required to evaluate the SQL statement. A compiler that compiles an SQL statement forms an execution plan that specifies steps for computing the SQL statement. An execution plan may comprise a separate set of steps for computing a table function which includes invoking an implementation of the table function to return results of the table function.
The present application describes novel ways of compiling and computing table function statements.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
Table functions can be used to encapsulate the logic of retrieving data and returning the data in a relational format, e.g. returning a collection of elements that are rows or objects. During execution of the table function statement, the table function internally executes another query and returns the results. According to an embodiment, the original query is rewritten by replacing the table function with the query it intends to execute internally. Thus, when a query compiler optimizes the rewritten query, it is able to optimize in a way that is cognizant of the entire set of operations needed for both the outer original query and the internally executed query that would otherwise be executed by the table function.
According to an embodiment, table function queries are rewritten by query compilers of database servers. Generally, a server, such as a database server, is a combination of integrated software components and an allocation of computational resources, such as memory, a node, and processes on the node for executing the integrated software components, where the combination of the software and computational resources is dedicated to providing a particular type of function on behalf of clients of the server. A database server governs and facilitates access to a particular database, processing requests by clients to access the database.
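As a rough illustration of this rewrite, the following Python sketch replaces a TABLE clause with the query the table function would otherwise run internally, then executes the rewritten query. SQLite is used only as a convenient stand-in engine. The `persons` table, its columns, the sample data, the internal query text, and the regular-expression matching are assumptions made for this sketch; a query compiler would operate on its internal query representation rather than on text.

```python
import re
import sqlite3

# Hypothetical: the query each table function would execute internally.
INTERNAL_QUERIES = {
    "persons_tf": "SELECT name, age FROM persons WHERE age >= 18",
}

# Matches TABLE(<function>(<args>)); adequate for this sketch only.
TABLE_CALL = re.compile(r"TABLE\s*\(\s*(\w+)\s*\([^)]*\)\s*\)", re.IGNORECASE)


def rewrite(query):
    """Replace each TABLE clause with an inline view over the internal query."""
    def inline(match):
        internal = INTERNAL_QUERIES.get(match.group(1).lower())
        # Leave the TABLE clause untouched when no internal query is known.
        return f"({internal})" if internal else match.group(0)
    return TABLE_CALL.sub(inline, query)


original = "SELECT p.name FROM TABLE(Persons_tf(1)) p WHERE p.name LIKE 'A%'"
rewritten = rewrite(original)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?)",
    [("Ann", 31), ("Al", 12), ("Bob", 40)],
)
result = conn.execute(rewritten).fetchall()
```

Because the rewritten text exposes the `persons` table and both predicates in a single statement, an optimizer can consider them together instead of treating the table function as an opaque source of rows.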
A database comprises data and metadata that is stored on a persistent memory mechanism, such as a set of hard disks. Such data and metadata may be stored in a database logically, for example, according to relational and/or object-relational database constructs. Database metadata defines database objects, such as tables, object tables, views, or complex types, such as object types, and, importantly, table functions. SQL data definition language (“DDL”) instructions are issued to a database server to create or configure database objects.
Generally, data is stored in a database in one or more data containers, each container contains records, and the data within each record is organized into one or more fields. In relational database systems, the data containers are typically referred to as tables, the records are referred to as rows, and the fields are referred to as columns. In object oriented databases, the data containers are typically referred to as object types or classes, the records are referred to as objects, and the fields are referred to as attributes. Other database architectures may use other terminology. Systems that implement the present invention are not limited to any particular type of data container or database architecture. However, for the purpose of explanation, the examples and the terminology used herein shall be that typically associated with relational or object-relational databases. Thus, the terms “table”, “row” and “column” shall be used herein to refer respectively to the data container, record, and field.
A query compiler receives a query and generates an internal query representation of the query. Typically, the internal query representation is a set of interlinked data structures that represent various components and structures of a query statement. The internal representation is typically generated in memory for evaluation, manipulation, and transformation by a query compiler.
A query compiler may generate one or more different candidate execution plans for a query, which are evaluated by the query compiler to determine which should be used to compute the query. Execution plan operations include, for example, a table scan, an index scan, hash-join, sort-merge join, nested-loop join, and filter.
A query compiler may optimize a query by transforming the query. In general, transforming a query involves rewriting a query into another query that should produce the same result and that can potentially be executed more efficiently, i.e. one for which a potentially more efficient and less costly execution plan can be generated. The query as transformed is referred to herein as the transformed query. The query is rewritten by manipulating a copy of the query representation to form a transformed query representation representing a transformed query.
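The idea of transforming a copy of the query representation, rather than the original, can be sketched in Python as follows. The classes `TableRef`, `TableFunctionRef`, `InlineView`, and `QueryBlock` are hypothetical simplifications; an actual internal representation would be considerably richer.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class TableRef:
    """Hypothetical node: a reference to a stored table."""
    name: str


@dataclass
class TableFunctionRef:
    """Hypothetical node: a table function appearing as a row source."""
    function: str
    args: tuple


@dataclass
class InlineView:
    """Hypothetical node: a subquery used as a row source."""
    sql: str


@dataclass
class QueryBlock:
    """Hypothetical, highly simplified internal query representation."""
    select: list
    sources: list
    where: list = field(default_factory=list)


def transform(query, replacements):
    """Return a transformed copy; the original representation is left untouched."""
    transformed = copy.deepcopy(query)
    transformed.sources = [
        replacements.get(src.function, src)
        if isinstance(src, TableFunctionRef) else src
        for src in transformed.sources
    ]
    return transformed


query = QueryBlock(select=["name"], sources=[TableFunctionRef("persons_tf", (1,))])
view = InlineView("SELECT name, age FROM persons")
transformed_query = transform(query, {"persons_tf": view})
```

Working on a copy lets the compiler compare the original and the transformed query and discard the transformation if it does not lead to a cheaper plan.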
According to an embodiment, a table function is a user-defined function that is registered within a database server. Registering the user-defined function enables the database server to recognize and handle the user-defined function, like natively supported functions, when the user-defined function is presented in queries. Natively supported functions are those defined by an SQL standard (e.g. sum, max).
Registering a user-defined function refers to a database system receiving as input the definition of a user-defined function and configuring itself (e.g. generating metadata) to handle the user-defined functions when the functions appear in queries compiled by the database system. The definition includes the name of the function, arguments and return type of the function, implementations (e.g. code, routines, function) to execute and compute the function. The implementation may have to conform to a format, which may depend on the kind of user-defined function being registered. The implementation may include multiple routines and functions. For example, the implementation may include a separate implementation function for initialization, iteration, and termination.
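Registration with separate routines for initialization, iteration, and termination can be illustrated with Python's standard `sqlite3` module, used here only as an analogy and not as the database server described herein. The aggregate `longest`, the `persons` table, and the sample data are hypothetical.

```python
import sqlite3


class Longest:
    """Hypothetical user-defined aggregate: the longest string seen."""

    def __init__(self):
        # Initialization routine: set up state before any row is processed.
        self.best = ""

    def step(self, value):
        # Iteration routine: called once per input row.
        if value is not None and len(value) > len(self.best):
            self.best = value

    def finalize(self):
        # Termination routine: produce the final result.
        return self.best


conn = sqlite3.connect(":memory:")
# Registering: name, number of arguments, and the implementation.
conn.create_aggregate("longest", 1, Longest)

conn.execute("CREATE TABLE persons (name TEXT)")
conn.executemany("INSERT INTO persons VALUES (?)", [("Ann",), ("Roberta",), ("Bob",)])
longest_name = conn.execute("SELECT longest(name) FROM persons").fetchone()[0]
```

Once registered, `longest` can be used in a query in the same way as a natively supported function such as `max`.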
For a table function, the implementation may include a function that returns a replacement query to replace the table function in a rewrite of a query. Such a function is referred to herein as a replacement function. According to an embodiment, a query compiler calls the function to retrieve a replacement query that may be used to replace the table function. The replacement query may be a text string or internal query representation used by a query compiler. An embodiment is not limited to a particular form of a replacement query.
According to an embodiment, database metadata associates a replacement function with a table function. Several ways of associating the replacement function with a table function are described. The present invention is however not limited to any particular way of associating a replacement function with a table function.
Before a user-defined function may be registered, other database objects may have to be defined. For example, if a user-defined function returns a user-defined data type, the data type must first be defined by, for example, submitting DDL statements to a database server.
According to the embodiment, a table function may be implemented using the following steps.
1. An object type of objects of a collection that is returned by the table function is defined by issuing the following DDL statements.
2. A table type for an object table is defined. This data type is an object collection of the object type Person_t. The table function is defined to return this collection data type.
3. An object type that defines a replacement function as an object method of an object type is created. The following DDL statement may be issued to a database server to define such an object type Impl_t and replacement function ODCITableRewrite( ).
ODCITableRewrite( ) returns a replacement query as a text string returned via the argument sql_str. According to an embodiment, the argument list of a replacement function includes the argument list of the table function. In ODCITableRewrite ( ), the table function argument list is after the first three arguments, i.e. the argument criteria number.
4. An implementation for the replacement function ODCITableRewrite ( ) is provided to the database server by issuing the following DDL statement.
The implementation is coded in PL/SQL, a language supported by Oracle™ database servers.
5. The table function is defined and associated with the replacement function by associating the table function with the object type defining the replacement function, using the following DDL statement.
The clause using Impl_t identifies to the database server the object type that contains an implementation for the replacement function for the table function Persons_tf.
At step 105, a query compiler determines whether query QS contains a table function, which query QS does.
At step 110, in response to determining that query QS contains a table function, the query compiler determines whether the table function is associated with a replacement function. How the determination is made depends on how a replacement function is associated with a table function. In the current illustration, the query compiler determines that an object type associated with the table function Persons_tf (1) has an object method named ODCITableRewrite. This determination is made by examining database metadata defining the function Persons_tf ( ) and the associated object type definition and implementation for Impl_t.
At step 115, in response to determining the replacement function ODCITableRewrite ( ) is associated with the table function, the replacement function is invoked.
At step 120, the query compiler replaces, in effect, the table function and the TABLE clause with the replacement query. In an embodiment, a replacement query may not be returned. In this case, replacing the table function is foregone.
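Steps 105 through 120 can be summarized in a short Python sketch. It operates on query text for brevity, whereas a query compiler would work on its internal representation. The dictionary `REPLACEMENT_FUNCTIONS` stands in for database metadata, and `persons_rewrite`, its behavior, and the replacement query text are hypothetical.

```python
import re

# Matches TABLE(<function>(<args>)); adequate for this sketch only.
TABLE_CALL = re.compile(r"TABLE\s*\(\s*(\w+)\s*\(([^)]*)\)\s*\)", re.IGNORECASE)


def persons_rewrite(criteria):
    """Hypothetical replacement function; returns None when it declines to rewrite."""
    if criteria == "1":
        return "SELECT name, age FROM persons"
    return None


# Hypothetical stand-in for metadata associating table functions
# with replacement functions.
REPLACEMENT_FUNCTIONS = {"persons_tf": persons_rewrite}


def compile_query(query):
    # Step 105: does the query contain a table function?
    match = TABLE_CALL.search(query)
    if match is None:
        return query
    # Step 110: is the table function associated with a replacement function?
    replacement_fn = REPLACEMENT_FUNCTIONS.get(match.group(1).lower())
    if replacement_fn is None:
        return query
    # Step 115: invoke the replacement function with the table function's arguments.
    args = [a.strip() for a in match.group(2).split(",") if a.strip()]
    replacement = replacement_fn(*args)
    if replacement is None:
        # No replacement query was returned, so the replacement is foregone.
        return query
    # Step 120: replace the TABLE clause and table function with the replacement query.
    return query[:match.start()] + f"({replacement})" + query[match.end():]


compiled = compile_query("SELECT * FROM TABLE(Persons_tf(1)) p")
```

When any of the checks fails, the query is returned unchanged and the table function would be computed in the ordinary way.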
The replacement query returned by the replacement function may not always be the same, even though the implementation of the replacement function does not change. The implementation may have logic to return different replacement queries under different conditions. The value of the replacement query may depend on, for example, the value of an argument of a replacement function. In fact, different replacement queries may be returned at different times for the same argument values.
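A minimal Python sketch of such a replacement function follows. The `settings` dictionary is a hypothetical stand-in for any state that may change between invocations, and the table names, column names, and the meaning of the argument are assumptions for illustration only.

```python
# Hypothetical state that may change over time without any change
# to the implementation of the replacement function.
settings = {"source_table": "persons"}


def replacement_query(criteria):
    """Hypothetical replacement function whose result depends on its
    argument and on the current state."""
    query = f"SELECT name, age FROM {settings['source_table']}"
    # Assumption for this sketch only: criteria 1 means "adults only".
    if criteria == 1:
        query += " WHERE age >= 18"
    return query


first = replacement_query(1)
settings["source_table"] = "persons_archive"
second = replacement_query(1)
```

Here `first` and `second` differ even though both calls pass the same argument value, because the state consulted by the function changed between the calls.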
In an embodiment, certain table functions may be defined as rewrite-only functions. Any time a rewrite-only function occurs in a query, the query is rewritten to replace the table function with a replacement query. According to an embodiment, a rewrite-only function is created using a DDL statement that not only specifies to create the function but also specifies an implementation for the body of the function that generates a replacement query. The following DDL statement defines Persons_tf as a rewrite-only function.
In response to receiving the DDL statement, the database server generates metadata defining Persons_tf as a rewrite-only table function. The rewrite-only function body implementation in effect serves as the replacement function. When a query compiler compiles a table function query that contains a rewrite-only function, the query compiler determines that the database metadata defines the table function as a rewrite-only function and invokes the implementation defined for the table function.
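How metadata might mark a table function as rewrite-only can be sketched in Python as follows. `CATALOG`, `TableFunctionMeta`, `create_rewrite_only_function`, and `resolve_source` are hypothetical names, and the body shown is an invented example rather than the body given in the DDL statement.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TableFunctionMeta:
    """Hypothetical metadata entry for a table function."""
    name: str
    rewrite_only: bool
    body: Callable[..., str]  # for a rewrite-only function, generates the replacement query


# Hypothetical stand-in for database metadata.
CATALOG = {}


def create_rewrite_only_function(name, body):
    """Plays the role of the DDL statement: records the function as rewrite-only."""
    CATALOG[name.lower()] = TableFunctionMeta(name, True, body)


def resolve_source(function_name, *args):
    """What a compiler might do on meeting the table function in a query."""
    meta = CATALOG[function_name.lower()]
    if meta.rewrite_only:
        # The function body in effect serves as the replacement function.
        return f"({meta.body(*args)})"
    raise NotImplementedError("ordinary table functions are outside this sketch")


create_rewrite_only_function(
    "Persons_tf",
    lambda criteria: "SELECT name, age FROM persons"
    + (" WHERE age >= 18" if criteria == 1 else ""),
)
source = resolve_source("Persons_tf", 1)
```

In this sketch no separate replacement function has to be looked up: the rewrite-only flag alone tells the compiler to invoke the function body and inline its result.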
Finally, in an embodiment it may not be necessary to include a table function in a TABLE clause. The table function may be included in the FROM clause as another source of tuples, similar to a label name for a table or view.
Computer system 200 may be coupled via bus 202 to a display 212, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 214, including alphanumeric and other keys, is coupled to bus 202 for communicating information and command selections to processor 204. Another type of user input device is cursor control 216, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 204 and for controlling cursor movement on display 212. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
The invention is related to the use of computer system 200 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 200 in response to processor 204 executing one or more sequences of one or more instructions contained in main memory 206. Such instructions may be read into main memory 206 from another machine-readable medium, such as storage device 210. Execution of the sequences of instructions contained in main memory 206 causes processor 204 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
The term “machine-readable medium” as used herein refers to any medium that participates in providing data that causes a machine to operate in a specific fashion. In an embodiment implemented using computer system 200, various machine-readable media are involved, for example, in providing instructions to processor 204 for execution. Such a medium may take many forms, including but not limited to storage media and transmission media. Storage media includes both non-volatile media and volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 210. Volatile media includes dynamic memory, such as main memory 206. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 202. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications. All such media must be tangible to enable the instructions carried by the media to be detected by a physical mechanism that reads the instructions into a machine.
Common forms of machine-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
Various forms of machine-readable media may be involved in carrying one or more sequences of one or more instructions to processor 204 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 200 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 202. Bus 202 carries the data to main memory 206, from which processor 204 retrieves and executes the instructions. The instructions received by main memory 206 may optionally be stored on storage device 210 either before or after execution by processor 204.
Computer system 200 also includes a communication interface 218 coupled to bus 202. Communication interface 218 provides a two-way data communication coupling to a network link 220 that is connected to a local network 222. For example, communication interface 218 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 218 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 218 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 220 typically provides data communication through one or more networks to other data devices. For example, network link 220 may provide a connection through local network 222 to a host computer 224 or to data equipment operated by an Internet Service Provider (ISP) 226. ISP 226 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 228. Local network 222 and Internet 228 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 220 and through communication interface 218, which carry the digital data to and from computer system 200, are exemplary forms of carrier waves transporting the information.
Computer system 200 can send messages and receive data, including program code, through the network(s), network link 220 and communication interface 218. In the Internet example, a server 230 might transmit a requested code for an application program through Internet 228, ISP 226, local network 222 and communication interface 218.
The received code may be executed by processor 204 as it is received, and/or stored in storage device 210, or other non-volatile storage for later execution. In this manner, computer system 200 may obtain application code in the form of a carrier wave.
In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.