The present invention relates to query systems and methods, and more specifically, to optimization systems and methods for database queries.
Resource Description Framework (RDF) is a data representation standard of the Internet. RDF is typically stored in RDF graphs and is often subjected to queries. RDF query languages can be used to write expressions that are evaluated against one or more RDF graphs in order to produce, for example, a narrowed set of statements, resources, or object values, or to perform comparisons and operations on such items. In addition, RDF queries can be used by knowledge management applications as a basis for inference actions.
Although several query languages for RDF graphs have emerged, RDF graphs are typically queried using the SPARQL Protocol and RDF Query Language (SPARQL), which is modeled loosely after Structured Query Language (SQL). SPARQL can be used to express complex queries across diverse data sources (e.g., stored natively as RDF or viewed as RDF via middleware). As a relatively new query language, SPARQL does not benefit from many years of optimization research as do other query languages (e.g., SQL). Such disadvantages can hinder the adoption of SPARQL and thus of RDF itself.
According to one embodiment of the present invention, a method of processing a query is provided. The method includes performing on a processor: receiving a database query that includes a plurality of predicates that associate a subject with an object, where at least one of the predicates is a variable predicate; generating at least one new database query by selectively replacing the at least one variable predicate in the database query with a non-variable predicate; and performing the at least one new database query on a database to obtain a query result.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with the advantages and the features, refer to the description and to the drawings.
The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
Turning now to the drawings in greater detail, it will be seen that in
The computer 101 is shown to include a processor 102, memory 104 coupled to a memory controller 106, one or more input and/or output (I/O) devices 108, 110 (or peripherals) that are communicatively coupled via a local input/output controller 112, and a display controller 114 coupled to a display 116. In an exemplary embodiment, a conventional keyboard 122 and mouse 124 can be coupled to the input/output controller 112. In an exemplary embodiment, the computing system 100 can further include a network interface 118 for coupling to a network 120. The network 120 transmits and receives data between the computer 101 and external systems.
In various embodiments, the memory 104 stores instructions that can be performed by the processor 102. The instructions stored in memory 104 may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions. In the example of
When the computer 101 is in operation, the processor 102 is configured to execute the instructions stored within the memory 104, to communicate data to and from the memory 104, and to generally control operations of the computer 101 pursuant to the instructions. The processor 102 can be any custom made or commercially available processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the computer 101, a semiconductor based microprocessor (in the form of a microchip or chip set), a macroprocessor, or generally any device for executing instructions.
The processor 102 executes the instructions of an optimized query system (OQS) 128 of the present disclosure. In various embodiments, the optimized query system 128 of the present disclosure is stored in the memory 104 (as shown), is run from a portable storage device (e.g., CD-ROM, Diskette, FlashDrive, etc.) (not shown), and/or is run from a remote location, such as from a central server (not shown).
Generally speaking, the optimized query system 128 optimizes semantic database queries. As can be appreciated, the optimized database query may be an improved database query and may not necessarily be limited to the optimal database query. The semantic database queries can be provided in a graph query language such as, for example, SPARQL (SPARQL Protocol and RDF Query Language), RDQL (RDF Data Query Language), RQL (RDF Query Language), or other graph query language. The optimized queries can then be performed on data stored in, for example, the memory 104 or other data storage medium to return a result. The data can be stored in, for example, an RDF format. Technical effects and benefits of an optimized query include more efficient query results as well as faster query response times.
With reference now to
For exemplary purposes, the disclosure will be discussed in the context of the example query of
Turning now to
The predicate replacement module 150, the triple pattern evaluation module 152, and the empty query removal module 154 receive as input a query 140 in a graph query language. In various embodiments, each module 150-156 can operate as a stand-alone optimization module and can selectively perform one or more optimization techniques on the query 140. In various other embodiments, as shown in
In various embodiments, the predicate replacement module 150 receives as input the query 140. The query 140 can be generated, for example, by a query system as described in the U.S. patent application filed contemporaneously herewith entitled, “Enforcing Query Policies Over Resource Description Framework Data,” which is incorporated herein by reference in its entirety. The predicate replacement module 150 identifies and replaces any wildcard predicates (i.e., variable predicates) in the query 140 with actual predicates. In the example query 130 of
In various embodiments, in order to perform the replacement, the predicate replacement module 150 collects or receives (flow not shown) predicate-association statistics. These statistics can be computed by counting co-occurrence frequencies of predicates in the actual data or by estimating these co-occurrence frequencies from past query evaluations. For each wildcard predicate, the predicate replacement module 150 identifies a set of nearest joinable non-variable predicates in the query 140, and determines the intersection of the predicates that are joinable with the members of that set. The wildcard predicate of the query 140 is then replaced with the non-variable predicates in this intersection. For each substitution, the predicate replacement module 150 generates a new query. The predicate replacement module 150 thereby generates a set of new queries (query set 158) for further optimization or for querying (flow not shown).
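The replacement step described above can be illustrated with a minimal Python sketch. It is not the implementation of the predicate replacement module 150. The sketch assumes that triple patterns are (subject, predicate, object) tuples, that variables begin with "?", that "nearest joinable" means sharing a subject or object variable with the wildcard pattern, and that the predicate-association statistics have already been reduced to a hypothetical JOINABLE table. All names and sample predicates are illustrative.

```python
from itertools import product

# Hypothetical predicate-association statistics: for each predicate, the set
# of predicates it has been observed to co-occur (join) with in the data.
JOINABLE = {
    "worksFor": {"name", "locatedIn", "email"},
    "locatedIn": {"worksFor", "name"},
}

def is_variable(term):
    return term.startswith("?")

def shares_variable(a, b):
    """Assumption: two patterns join if they share a subject/object variable."""
    vars_a = {t for t in (a[0], a[2]) if is_variable(t)}
    vars_b = {t for t in (b[0], b[2]) if is_variable(t)}
    return bool(vars_a & vars_b)

def candidates(query, index):
    """Intersect the joinable sets of the neighbouring non-variable predicates."""
    pattern = query[index]
    neighbours = [p[1] for i, p in enumerate(query)
                  if i != index and not is_variable(p[1])
                  and shares_variable(pattern, p)]
    if not neighbours:
        return set()
    result = set(JOINABLE.get(neighbours[0], set()))
    for pred in neighbours[1:]:
        result &= JOINABLE.get(pred, set())
    return result

def replace_wildcards(query):
    """Generate one new query per combination of wildcard substitutions."""
    wild = [i for i, p in enumerate(query) if is_variable(p[1])]
    options = [sorted(candidates(query, i)) for i in wild]
    new_queries = []
    for combo in product(*options):
        rewritten = list(query)
        for i, pred in zip(wild, combo):
            s, _, o = rewritten[i]
            rewritten[i] = (s, pred, o)
        new_queries.append(rewritten)
    return new_queries

query = [("?x", "worksFor", "?y"), ("?y", "?p", "?z"), ("?y", "locatedIn", "?w")]
for q in replace_wildcards(query):
    print(q)
```

In this sample, the wildcard ?p joins with both worksFor and locatedIn, so only predicates present in both joinable sets are substituted. A wildcard with no candidates yields no rewritten queries in this sketch; whether the original query should be kept in that case is left open here.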
The triple pattern evaluation module 152 receives as input the set of new queries 158. The triple pattern evaluation module 152 identifies and removes redundant triple patterns from the set of new queries 158. For example, assume that a triple pattern is used twice in the set 158, once for predicate p1 and once for its joinable predicate p2, with variable mappings ‘Φ1’ and ‘Φ2,’ respectively. The triple pattern evaluation module 152 considers the variable mappings between the query and the triple patterns and constructs a new mapping ‘Φ merge’ that merges the two input mappings.
In various embodiments, the variables and constants appearing in the new query are treated as constants for the purpose of this merging (therefore only fresh variables are treated as variables for the purposes of the merging). This ensures that views are not merged merely because they are copies of each other, but are merged only when their predicates are joined in the same way as in the query itself. Each time view copies are merged, any variable mappings that have been applied to the views are accounted for, due to their relationship with other views corresponding to the other predicates. If Φmerge is equal to Ø, then the two copies of the view V cannot be merged.
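One possible reading of this merging rule is sketched below in Python. The sketch assumes that each mapping is a dictionary from the variables of a view to terms of the query, that fresh variables are supplied as a separate set, and that None stands in for the empty result Ø. The function name merge_mappings and the sample terms are hypothetical.

```python
def merge_mappings(phi1, phi2, fresh):
    """Merge two variable mappings of the same view.

    Query variables and constants are treated as constants; only terms in
    `fresh` may be bound. Returns None (standing in for the empty set) when
    the two mappings disagree on a non-fresh term.
    """
    binding = {}  # substitutions chosen for fresh variables

    def resolve(term):
        while term in binding:
            term = binding[term]
        return term

    merged = {}
    for var in sorted(phi1.keys() | phi2.keys()):
        if var not in phi1 or var not in phi2:
            merged[var] = phi1.get(var, phi2.get(var))
            continue
        a, b = resolve(phi1[var]), resolve(phi2[var])
        if a == b:
            merged[var] = a
        elif a in fresh:
            binding[a] = b      # bind the fresh variable to the other term
            merged[var] = b
        elif b in fresh:
            binding[b] = a
            merged[var] = a
        else:
            return None         # two distinct "constants": no merge possible
    return {var: resolve(term) for var, term in merged.items()}

# The copies agree on ?x, and the fresh variable ?f1 can take the value ?y.
print(merge_mappings({"s": "?x", "o": "?f1"}, {"s": "?x", "o": "?y"}, {"?f1"}))
# The copies map "s" to different query variables, so they cannot be merged.
print(merge_mappings({"s": "?x"}, {"s": "?y"}, set()))
```

Treating query variables as constants is what prevents two copies from being merged when they are attached to different parts of the query.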
The triple pattern evaluation module 152 can additionally or alternatively re-order the sequence of the triple patterns in the queries of the set 158. For example, the triple pattern evaluation module can re-order the sequence based on a selectivity estimation of the triple patterns. The selectivity can be estimated by keeping statistics of past pattern evaluations, or by maintaining statistics for the actual stored data. The triple pattern evaluation module 152 then generates an optimized query set 160 for further optimization or for querying (flow not shown).
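The re-ordering can be illustrated with a short Python sketch. It assumes that the maintained statistics are per-predicate triple counts and that a bound subject or object reduces the estimate by a fixed factor of ten. Both the counts and the factor are illustrative assumptions, not values taken from this disclosure.

```python
def estimated_cardinality(pattern, predicate_counts, default=1_000_000):
    """Crude selectivity estimate for a (subject, predicate, object) pattern."""
    s, p, o = pattern
    count = predicate_counts.get(p, default)
    for term in (s, o):
        if not term.startswith("?"):
            # Assumption: each bound position narrows the result tenfold.
            count = max(1, count // 10)
    return count

def reorder(patterns, predicate_counts):
    """Place the most selective (smallest estimated) patterns first."""
    return sorted(patterns,
                  key=lambda t: estimated_cardinality(t, predicate_counts))

# Hypothetical statistics kept for the stored data.
counts = {"type": 10000, "knows": 2000, "email": 50}
patterns = [("?x", "type", "Person"), ("?x", "knows", "?y"), ("?y", "email", "?e")]
print(reorder(patterns, counts))
```

Evaluating the most selective pattern first keeps intermediate results small, which is the usual motivation for this kind of ordering.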
The empty query removal module 154 receives as input the optimized query set 160. The empty query removal module 154 removes any empty sub-queries from the optimized query set 160. For example, a value set for each distinct variable involved in the triple patterns is determined, and a synopsis for each value set is then constructed. Given these synopses, for the previous example, the size of the intersection of the value sets A(?y2) and A(?y3) is estimated. If the intersection size is estimated to be above some preset threshold with a reasonable probability, the triple patterns can be considered joinable. Otherwise, an ASK query can be issued to verify whether the joined triple pattern is actually empty. If the ASK query confirms that it is empty, the rewritings that contain the joined triple patterns p1(?y1, ?y2) and p2(?y3, ?y4) are removed. The empty query removal module 154 then generates an optimized query set 162 for querying.
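A minimal Python sketch of this check follows. It assumes a fixed-size bitmap as the synopsis, which is only one of many possible synopsis structures, and it uses an exact set intersection as a stand-in for issuing the ASK query against the data store. In SPARQL, an ASK query returns true when the pattern has at least one solution, so a false answer indicates an empty join. All function names and the threshold value are hypothetical.

```python
import hashlib

def synopsis(values, bits=256):
    """Bitmap synopsis: each value sets one bit chosen by a stable hash."""
    mask = 0
    for value in values:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % bits)
    return mask

def estimated_overlap(syn_a, syn_b):
    """Count the bits set in both synopses as a rough intersection size."""
    return bin(syn_a & syn_b).count("1")

def is_join_empty(values_a, values_b, threshold=1, ask=None):
    """Return True when the join of the two value sets is judged empty."""
    overlap = estimated_overlap(synopsis(values_a), synopsis(values_b))
    if overlap > threshold:
        return False  # estimate is above the threshold: treat as joinable
    if ask is None:
        # Stand-in for an ASK query: True when at least one value is shared.
        ask = lambda: bool(set(values_a) & set(values_b))
    return not ask()

print(is_join_empty({"a", "b", "c"}, {"b", "c", "d"}))
print(is_join_empty({"a"}, {"z"}))
```

Because hash collisions can inflate the estimate, a bitmap of this kind can only over-count the overlap. A rewriting might therefore be kept unnecessarily, but a non-empty join is never removed by this sketch.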
The query management module 156 receives as input the optimized query set 162. The query management module 156 performs a query of base data 164 stored in a base data datastore 166 using the optimized query set 162 (or, alternatively, the optimized query set 160 or the query set 158). As can be appreciated, the base data datastore 166 can be implemented as a part of or separate from the optimized query system 128. The query management module 156 generates query results 168 from the querying. The query results 168 can be presented to the user via, for example, a user interface in a textual or graphical format.
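For illustration only, the following Python sketch evaluates a query set against a small in-memory list of triples. It assumes that the base data is a list of (subject, predicate, object) tuples and that the results of the individual queries are combined as a duplicate-free union. The disclosure does not state how results are combined, so that choice, like the names and the sample data, is an assumption.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None      # constant does not match
    return extended

def evaluate(query, data):
    """Join the triple patterns of one query with a nested-loop evaluation."""
    solutions = [{}]
    for pattern in query:
        solutions = [ext for binding in solutions for triple in data
                     if (ext := match(pattern, triple, binding)) is not None]
    return solutions

def run_query_set(queries, data):
    """Assumption: results of all queries are combined without duplicates."""
    results = []
    for query in queries:
        for solution in evaluate(query, data):
            if solution not in results:
                results.append(solution)
    return results

# Hypothetical base data and query set.
data = [("alice", "worksFor", "acme"),
        ("acme", "name", "Acme Corp"),
        ("acme", "locatedIn", "nyc")]
queries = [[("?x", "worksFor", "?y"), ("?y", "name", "?z")]]
print(run_query_set(queries, data))
```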
Turning now to
With particular reference to
With particular reference to
As can be appreciated, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Further, as will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The flow diagrams depicted herein are just one example. There may be many variations to this diagram or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.
While the preferred embodiment of the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.
This invention was made with U.S. Government support under Contract No. W911NF-09-2-0053 awarded by the U.S. Army. The U.S. Government has certain rights in the invention.