This invention relates to a method of modifying a structured query language (SQL) statement in order to reduce the number of joins executed by a database to which the SQL statement is presented.
Joins are a well known database construct used for selecting data from more than one table. Typically, the rows from these tables are paired using a specified join condition and the resulting rows are returned by the database. However, they are fairly complicated for most users of a database to understand and hence, it is common practice to hide joins from users. This may be accomplished in several ways, for example using a database view or a Discoverer® complex folder, thereby presenting a user with data that is already joined. Such a view may also hide complexity in the base tables used to form the view. For example, the base tables may have a large number of columns and the view may then only expose those columns more frequently referred to.
The problem arises when a user presents an SQL statement to the database which retrieves data from the view and also from one of the columns of one of the base tables that the view does not expose. Such an SQL statement must join the view to the base table in order to retrieve data from the column. The processing of this additional join will slow down the execution of the SQL statement considerably.
In accordance with one aspect of the present invention, there is provided a method of modifying a Structured Query Language (SQL) statement in order to reduce the number of joins executed by a database, the method comprising:
In accordance with a second aspect of the invention, there is provided a database system comprising a store in which at least one table is stored, and a processor adapted to:
The invention exploits the fact that a join from a view to one of its base tables is superfluous, being effectively a join from the base table back to itself, and by removing the join from the SQL statement improves the efficiency of execution of the statement.
In a first embodiment, the one or more predetermined criteria are that the join condition of the retrieved join is an equality between unique keys having no null values in the table. Typically, the unique keys having no null values are primary keys.
In another embodiment, the one or more predetermined criteria are that the join condition is a simple equality between any column in the two instances of the table and that the join is mergeable.
In this case, the join is typically considered to be mergeable if a flag associated with the join is set by a user.
In a third aspect of the invention, there is provided a method for retrieving desired data from a database system, the method comprising:
In accordance with a fourth aspect of the invention, there is provided a computer program comprising computer program code means adapted to perform the steps of either of the first or third aspects of the invention when said program is used on a computer.
In accordance with a fifth aspect of the invention, there is provided a computer program product comprising program code means stored on a computer readable medium for performing the method according to either of the first or third aspects of the invention when said program product is run on a computer.
An embodiment of the invention will now be described with reference to the accompanying drawings, in which:
In fact, for the convenience of the users, this may be done by creating a database view in order to hide the join. For example, the view may be defined as:
The desired data may then be retrieved from this view using the following SQL statement:
The database column EMP.DEPTNO is a foreign key referring to the database column DEPT.DEPTNO which is a primary key of the DEPT table.
In this example, the column LOC in the DEPT table is not exposed by the EMPDEPT view. Therefore, if a user wishes to retrieve the names of every employee along with their respective department name and geographical location using the EMPDEPT view then they would need to join the DEPT base table to the view with an SQL statement as shown below:
When presented to the database, the definition of the EMPDEPT view will be expanded so that the SQL statement actually executed by the database becomes:
It can be seen from the above SQL statement that there is a join from the DEPT base table back to itself since the “WHERE” clause requires that “D1.DEPTNO=D2.DEPTNO”. Both D1 and D2 are aliases for the DEPT table and therefore represent instances of the same table.
Before this superfluous join can be removed, it is advantageous to ascertain whether the two instances of DEPT can be merged into one without affecting the result set. The database performs this by examining the properties of the DEPT.DEPTNO column. Since this column is a primary key, the database knows that the values are unique and do not contain nulls. This means that every row retrieved from DEPT for the D1 alias is exactly the same row retrieved from DEPT for the D2 alias. Hence, the two aliases can be merged into one without affecting the result set.
Alternatively, if the software for modifying the SQL statement in order to reduce the number of joins is part of a tool such as Discoverer® which operates as a user interface to the database and which cannot access the primary key information, then the software ascertains whether the two DEPT aliases can be merged by confirming that the join condition is a simple equality between a column in the two instances of the table and that the join is mergable. The join is considered to be mergeable if a user settable metadata property is set. This metadata property acts as a flag and indicates that the associated join is a suitable candidate for table merging.
Aside from removing the join, it is necessary in both cases to change all references in the SQL statement from one alias to the other. In other words, all references to D1 must be changed to D2 or vice versa. For example, if the references to the D1 alias are changed to references to the D2 alias then the SQL statement becomes:
In actual fact, in this example, since there are only two aliases to table DEPT, both D1 and D2 can be changed to refer to DEPT. The resulting SQL statement becomes:
Clearly, the invention does not just find utility when a view is joined to one of its base tables. It may be used whenever a join exists which links a table to itself and where the references to the base table in the SQL statement can be merged into one without affecting the result set.
The invention therefore provides a method whereby users can make free use of joins without performance concerns since any superfluous joins from a table back to itself, for example from a view to one of its base tables, will be removed thereby ensuring the efficiency of execution of the SQL statement.
The database tables shown in
It is important to note that while the present invention has been described in a context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and a variety of forms and that the present invention applies equally regardless of a particular type of signal bearing media actually used to carry out distribution. Examples of computer readable media include recordable-type media such as floppy disks, a hard disk drive, RAM and CD-ROMs as well as transmission-type media such as digital and analogue communications links.
Number | Date | Country | Kind |
---|---|---|---|
0323216.2 | Oct 2003 | GB | national |