The present invention relates to semantic web queries.
Resource Description Framework (RDF) is the de-facto standard for graph representation and the primary vehicle for data exchange over the Internet or World Wide Web. RDF is flexible and uses simple primitives for data representation, e.g., nodes and edges. In addition, RDF facilitates the integration of heterogeneous sources on the Web. The query language of choice for RDF is SPARQL. SPARQL queries are often complex, containing a large number of triples and several layers of nesting. Optimization of SPARQL queries involves defining the order and methods with which to access the triples and building a hierarchical plan tree for query evaluation based on cost. A number of works have already studied how to efficiently evaluate semantic web (SPARQL) queries. Typical existing approaches perform bottom-up SPARQL query optimization, i.e., individual triples or conjunctive patterns in the SPARQL query are independently optimized, and then each optimizer attempts to piece together and order these individual plans into one global plan. These approaches are similar to typical relational database optimizers in that they rely on statistics to assign costs to query plans, and are in contrast to less effective approaches whose SPARQL query optimization heuristics ignore statistics.
Simple SPARQL queries resemble Structured Query Language (SQL) conjunctive queries, and, therefore, one expects existing techniques to be sufficient. However, a simple overview of real and benchmark SPARQL queries shows that SPARQL queries encountered in practice are far from simple. To a large extent due to the nature of RDF, these SPARQL queries are often arbitrarily complex, e.g., with deep nestings, and often quite big, e.g., one exemplary SPARQL query involves a union of 100 queries. To make matters worse, typical operators in SPARQL often correspond to more exotic operators in the relational world that are less commonly considered by optimizers. For example, the common OPTIONAL operator in SPARQL corresponds to left-outer joins. All these observations lead to the conclusion that there is potential for novel optimization techniques in this space.
Although attempts have been made to provide query optimization both in SPARQL and beyond, important challenges remain for SPARQL query optimization and for the translation of SPARQL queries to equivalent SQL queries over a relational database or store. Typical approaches perform bottom-up SPARQL query optimization, i.e., individual triples or conjunctive SPARQL patterns are independently optimized, and then the optimizer orders and merges these individual plans into one global plan. These approaches are similar to typical relational optimizers in that they rely on statistics to assign costs to query plans. While these approaches are adequate for simple SPARQL queries, they are not as effective for more complicated, but still common, SPARQL queries. Such queries often have deep, nested sub-queries whose inter-relationships are lost when optimizations are limited to the scope of a single triple or individual conjunctive patterns.
Exemplary embodiments of systems and methods in accordance with the present invention are directed to a hybrid two-step approach to query optimization. As a first step, a specialized structure, called a data flow, is constructed that captures the inherent inter-relationships due to the sharing of common variables or constants across different query components. These inter-relationships often span the boundaries of simple conjuncts (or disjuncts) and are often across the different levels of nesting of a query, i.e., they are not visible to existing bottom-up optimizers. As a second step, the data flow and cost estimates are used to decide both the order with which to optimize the different query components and the plans that are going to be considered.
While the hybrid optimizer searches for optimal plans, this search is qualified by the fact that SPARQL queries are ultimately converted to SQL. That is, the plans are created such that when they are implemented in SQL they are amenable to optimizations by the relational query engine and can be efficiently evaluated in the underlying relational store. Therefore, SPARQL acts as a declarative query language that is optimized, while SQL becomes a procedural implementation language. This dependence on SQL essentially transforms the problem from a purely query optimization problem into a combined query optimization and translation problem. The translation part is particularly complex since there are many equivalent SQL queries that implement the same SPARQL query plan.
The hybrid optimization and the efficient SPARQL-to-SQL translation are generalizable and can be applied in any SPARQL query evaluation system. The hybrid optimizer can be used for SPARQL query optimization independent of the selected RDF storage, i.e., with or without a relational back-end. The efficient translation of SPARQL to SQL can be generalized and used for any relational storage configuration of RDF. The combined effects of these two independent contributions drive the performance of the present invention.
Systems and methods in accordance with the present invention are directed to a hybrid SPARQL query optimization technique that is generic and independent of the choice of representing RDF data in relational schema or otherwise. Therefore, the query optimization techniques can be used by any other optimizer in this space. Optimizations that are representation-agnostic are separated from those that are representation-specific. This modularity provides significant advantages and the ability to fine-tune the storage and query optimization layers independently of each other.
Exemplary embodiments in accordance with the present invention achieve substantial performance gains by independently optimizing SPARQL and the SPARQL-to-SQL translation. The hybrid SPARQL query optimization technique is generic and independent of how the RDF data are represented, e.g., relational schema or otherwise. In fact, these techniques can be applied directly to query optimization for native RDF stores. The query translation techniques are then tuned to the schema representation described herein. Referring initially to
Regarding the three inputs, the SPARQL query Q conforms to the SPARQL 1.0 standard. Therefore, each query is composed of a set of hierarchically nested graph patterns 𝒫, with each graph pattern P∈𝒫 being, in its most simple form, a set of triple patterns. The statistics S over the underlying RDF dataset are defined using types and precision with regard to specific implementations. Examples of collected statistics include, but are not limited to, the total number of triples, average number of triples per subject, average number of triples per object, and the top-k URIs or literals in terms of number of triples in which they appear. The access methods ℳ provide alternative ways to evaluate a triple pattern t for some pattern P∈𝒫. The methods are system-specific and dependent on existing indexes. For example, for a system having only subject and object indexes, i.e., no predicate indexes, the methods would be access-by-subject (acs), access-by-object (aco) or a full scan (sc).
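By way of illustration, the statistics S and the access methods described above can be sketched as follows. This is a minimal sketch; the class and field names are hypothetical and not part of the described system:

```python
from dataclasses import dataclass, field
from enum import Enum

class AccessMethod(Enum):
    """Ways to evaluate a single triple pattern; system-specific."""
    ACS = "access-by-subject"
    ACO = "access-by-object"
    SC = "full-scan"

@dataclass
class RdfStatistics:
    """Statistics S collected over the underlying RDF dataset."""
    total_triples: int
    avg_triples_per_subject: float
    avg_triples_per_object: float
    # top-k URIs/literals, mapped to the number of triples they appear in
    top_k: dict = field(default_factory=dict)
```

The set of available methods would shrink or grow with the indexes present in a given store; for a store with predicate indexes, an access-by-predicate member would be added.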
Referring to
As was shown in
The DFB starts by building a parse tree for the input query. Referring to
In one embodiment, cost is determined based on a triple method cost (TMC). The triple method cost is a function TMC(t, m, S)→c, with c≧0, where t is a given triple, m is an access method, and S are the statistics for the RDF dataset. A cost c is assigned to evaluating t using m with respect to the statistics S. The mapping function varies with the degree to which S is defined. Therefore, the cost estimation depends on the statistics S. In the example query of
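A minimal sketch of such a TMC function follows. The cost model here (full scan costed at the total triple count, bound lookups at the average fan-out, with top-k constants costed at their recorded frequency) is one plausible instantiation, not the specific mapping used by the described system:

```python
def triple_method_cost(triple, method, stats):
    """Map (t, m, S) to a non-negative cost c.

    triple: (subject, predicate, object); variables start with '?'.
    stats:  dict with keys as in the statistics S described above.
    """
    subject, _, obj = triple
    if method == "sc":
        # a full scan touches every triple in the dataset
        return float(stats["total_triples"])
    if method == "acs" and not subject.startswith("?"):
        # bound-subject lookup: expect the average subject fan-out,
        # unless the constant is a known "hot" top-k value
        return stats["top_k"].get(subject, stats["avg_triples_per_subject"])
    if method == "aco" and not obj.startswith("?"):
        return stats["top_k"].get(obj, stats["avg_triples_per_object"])
    # the method cannot be applied to this triple as written
    return float("inf")
```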
Regarding the building of the data flow graph, the data flow graph models how the current set of bindings for variables can be used to access other triples. In modeling this flow, the semantics of AND, OR and OPTIONAL patterns are respected. A set of helper functions is first introduced and used to define the graph. The symbol ↑ refers to parents in the query tree structure; for a triple or a pattern, this is the immediately enclosing pattern. The symbol * denotes transitive closure. The first helper function is Produced Variables, a function P(t, m)→Vprod that maps a triple and an access method pair to the set of variables that are bound after the lookup, where t is a triple, m is an access method, and Vprod is the set of variables in the triple produced by the lookup. In the example query, for the pair (t4, aco), P(t4, aco)→{y}, because the lookup uses Software as an object, and the only variable that gets bound as a result of the lookup is y.
Required Variables is a function R(t, m)→Vreq that maps a triple and an access method pair to the set of variables that are required to be bound for the lookup, where t is a triple, m is an access method, and Vreq is the set of variables required for the lookup. In the example query, R(t5, aco)→{y}. That is, if the aco access method is used to evaluate t5, then variable y is required to be bound by some prior triple lookup.
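The two helper functions can be sketched as below. The triple t5 used in the test is hypothetical (a triple with ?y in object position, consistent with the example behavior described above); the exact triples of the example query are not reproduced here:

```python
def variables(triple):
    """All variables appearing in a triple; variables start with '?'."""
    return {term for term in triple if term.startswith("?")}

def required_variables(triple, method):
    """R(t, m): variables that must already be bound for the lookup."""
    subject, _, obj = triple
    if method == "acs" and subject.startswith("?"):
        return {subject}
    if method == "aco" and obj.startswith("?"):
        return {obj}
    return set()  # constant-keyed lookup or full scan requires nothing

def produced_variables(triple, method):
    """P(t, m): variables bound as a result of the lookup."""
    return variables(triple) - required_variables(triple, method)
```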
Least Common Ancestor, LCA(p, p′), is the first common ancestor of patterns p and p′. More formally, it is defined as follows: LCA(p, p′)=x such that x∈↑*(p) ∧ x∈↑*(p′) ∧ ∄y: y∈↑*(p) ∧ y∈↑*(p′) ∧ x∈↑*(y). As an example, in
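The LCA computation over the parse tree can be sketched as follows, assuming the tree is represented by a hypothetical child-to-parent map. Walking the ancestor chain of p nearest-first and returning the first node that is also an ancestor of p′ yields the least common ancestor:

```python
def ancestors(node, parent):
    """The chain of enclosing patterns of `node`, nearest first (↑*)."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def lca(p, q, parent):
    """First common ancestor of patterns p and q in the parse tree."""
    q_anc = set(ancestors(q, parent))
    for x in ancestors(p, parent):
        if x in q_anc:
            return x
    return None  # p and q are not in the same tree
```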
↑↑(p, p′)≡{x|x∈↑*(p) ∧ x∉↑*(LCA(p, p′))}
For instance, for the query shown in
For OR connected patterns, ∪(t, t′) denotes that two triples are related in an OR pattern, i.e., their least common ancestor is an OR pattern: ∪(t, t′)≡LCA(t, t′) is OR. In the example, t2 and t3 are ∪. For OPTIONAL connected patterns, ∩̂(t, t′) denotes that one triple is optional with respect to another, i.e., there is an OPTIONAL pattern guarding t′ with respect to t:
∩̂(t, t′)≡∃p: p∈↑↑(t′, t) ∧ p is OPTIONAL
In the example, t6 and t7 are ∩̂, because t7 is guarded by an OPTIONAL in relation to t6.
The data flow graph is a graph G=&lt;V, E&gt;, where V=(T×M)∪{root}, with T the set of triples, M the set of access methods, and root a special node added to the graph. A directed edge (t, m)→(t′, m′) exists in E when the following conditions hold: R(t′, m′)⊆P(t, m) ∧ ¬(∪(t, t′) ∨ ∩̂(t′, t)). In addition, a directed edge from root to a node (t, m) exists if R(t, m)=∅.
In the example, a directed edge root→(t4, aco) exists in the data flow graph (in
The data flow graph is weighted, and the weight of each edge between two nodes is determined by a function W((t, m), (t′, m′), S)→w. The weight w is derived from the costs of the two nodes, i.e., TMC(t, m, S) and TMC(t′, m′, S). A simple implementation of this function, for example, could apply the cost of the target node to the edge. In the example, for instance, w for the edge root→(t4, aco) is 2, whereas for the edge root→(t4, asc) it is 5.
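The graph construction can be sketched as follows, assuming the produced/required sets and TMC costs have already been computed and the OR/OPTIONAL exclusion is supplied as a predicate. Per the simple weighting described above, each edge carries the cost of its target node; all names here are illustrative:

```python
from itertools import product

def build_data_flow_graph(triples, methods, produced, required, cost, excluded):
    """Build the weighted data flow graph G = <V, E>.

    produced/required: dicts (t, m) -> set of variables.
    cost:     dict (t, m) -> TMC estimate.
    excluded: excluded(t, t2) is True when bindings may not flow
              from t to t2 (OR-related, or OPTIONAL-guarded).
    """
    nodes = list(product(triples, methods))
    edges = {}
    for (t, m), (t2, m2) in product(nodes, nodes):
        if t == t2:
            continue
        req = required[(t2, m2)]
        # bindings of (t, m) must cover what (t2, m2) needs
        if req and req <= produced[(t, m)] and not excluded(t, t2):
            edges[((t, m), (t2, m2))] = cost[(t2, m2)]
    for (t, m) in nodes:
        if not required[(t, m)]:
            # nodes needing no prior bindings hang off the root
            edges[("root", (t, m))] = cost[(t, m)]
    return nodes, edges
```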
Given a weighted data flow graph, the optimal order, in terms of minimizing the cost, for accessing all the triples in the query is given by the minimal weighted tree that covers all the triples in Q; computing this tree is NP-hard. Since the query can contain a large number of triples, a greedy algorithm is used to determine the execution tree. Let T denote the execution tree that is being computed and let τ refer to the set of triples corresponding to nodes already in the tree, τ≡{ti|∃mi: (ti, mi)∈T}. The objective is to add a node that adds a new triple to the tree while adding the cheapest possible edge. Formally, a node (t′, m′) is chosen such that:
On the first iteration, T0=root and τ0=∅. Ti+1 is computed by applying the step defined above, and the triple of the chosen node is added to τi+1. In the example, root→(t4, aco) is the cheapest edge, so T1 contains (t4, aco), and τ1={t4}. Then (t2, aco) is added to T2, and so on. The iterations stop at Tn, where n is the number of triples in Q.
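The greedy iteration above can be sketched as a Prim-style loop over the weighted edges. This is an illustrative approximation of the described step, operating on the edge dictionary produced for the data flow graph; it returns the order in which (triple, method) nodes are added:

```python
def greedy_execution_tree(triples, edges):
    """Greedily approximate the minimum-weight tree covering all triples.

    edges: dict (src, dst) -> weight, where src is "root" or a
    (triple, method) pair and dst is a (triple, method) pair.
    Each step picks the cheapest edge whose target covers a new triple.
    """
    tree_nodes = {"root"}
    covered = set()   # tau: triples already in the tree
    order = []
    while covered != set(triples):
        best = None
        for (src, dst), w in edges.items():
            if src in tree_nodes and dst[0] not in covered:
                if best is None or w < best[1]:
                    best = (dst, w)
        if best is None:
            raise ValueError("data flow graph does not reach all triples")
        dst, _ = best
        tree_nodes.add(dst)
        covered.add(dst[0])
        order.append(dst)
    return order
```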
Both the data flow graph and the optimal flow tree largely ignore the query structure, i.e., the organization of triples into patterns, and the operators between the (triple) patterns. Yet, they provide useful information as to how to construct an actual plan for the input query, the focus of this section and output of the QPB module.
In more detail, the main algorithm ExecTree of the module appears below. The algorithm is recursive and takes as input the optimal flow tree F computed by DFB, and (the parse tree of) a pattern P, which initially is the main pattern that includes the whole query.
In the running example, for the query 202 in
There are four main types of patterns in SPARQL, namely, SIMPLE, AND, UNION (a.k.a. OR), and OPTIONAL patterns, and the algorithm handles each one independently, as illustrated through the running example. Initially, both the execution tree T and the set L are empty (line 1). Since the top-level node in
The first recursion terminates by returning (T1, L1)=(∅, {(t1, acs)}). The second sub-pattern in
The last call to AndTree builds the tree 500 illustrated in
Using the optimal flow tree as a guide enables weaving the evaluation of different patterns, while the structure-based processing guarantees that the associativity of operations in the query is respected. So, the optimizer can generate plans like the one in
The SPARQL to SQL translator takes as input the execution tree generated from the QPB module and performs two operations. First, it transforms the execution tree into an equivalent query plan that exploits the entity-oriented storage of, for example, R2DF. Second, it uses the query plan to create the SQL query which is executed by the database.
In order to build the query plan, the execution tree provides an access method and an execution order for each triple but assumes that each triple node is evaluated independently of the other nodes. However, one of the advantages of the entity-oriented storage is that a single access to, say, the DPH relation might retrieve a row that can be used to evaluate multiple triple patterns (star-queries). To this end, starting from the execution tree, the translator builds a query plan where triples with the same subject (or the same object) are merged in the same plan node. A merged plan node indicates to the SQL builder that the contained triples form a star-query and can be evaluated with a single SQL select. Merging of nodes is always advantageous, with one exception: when the star-query involves entities with spills. The presence of such entities would require self-joins of the DPH (RPH) relations in the resulting SQL statement. Self-joins are expensive, and therefore the following strategy is used to avoid them. When the star-queries involve entities with spills, the evaluation of the star-query is cascaded by issuing multiple SQL statements, each evaluating a subset of the star-query while at the same time filtering entities from the subsets of the star-query that have been previously evaluated. The multiple SQL statements are such that no SQL statement accesses predicates stored into different spill rows. The question remains how to determine whether spills affect a star-query. In accordance with exemplary embodiments of the methods and systems of the present invention, this is straightforward. With only a tiny fraction of predicates involved in spills, e.g., due to coloring, the optimizer consults an in-memory structure of predicates involved in spills to determine during merging whether any of the star-query predicates participate in spills.
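The merging step can be sketched as follows. This sketch handles only the structural side (same subject, same access method, no spills), checking a spill set as described above; the node layout and key names are hypothetical, and the semantic (AND/OR/OPTIONAL) constraints are omitted here:

```python
def merge_star_queries(nodes, spill_predicates):
    """Group triple nodes sharing a subject and access method into one
    star-query node, keeping spill-affected triples separate.

    nodes: list of dicts like {"id": "t2", "subject": "?x",
           "predicate": "name", "method": "acs"}.
    spill_predicates: in-memory set of predicates stored in spill rows.
    """
    groups = {}
    for node in nodes:
        if node["predicate"] in spill_predicates:
            # spills would force self-joins on DPH/RPH: keep separate
            groups[("solo", node["id"])] = [node]
            continue
        groups.setdefault((node["subject"], node["method"]), []).append(node)
    return list(groups.values())
```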
During the merging process, both the structural and semantic constraints are respected. The structural constraints are imposed by the entity-oriented representation of data. To satisfy the structural constraints, candidate nodes for merging need to refer to the same entity, have the same access method, and not involve spills. As an example, in
Semantic constraints for merging are imposed by the control structure of the SPARQL query, i.e., the AND, UNION, OPTIONAL patterns. This restricts the merging of triples to constructs for which equivalent SQL statements accessing the relational tables can be provided. Triples in conjunctive and disjunctive patterns can be safely merged because the equivalent SQL semantics are well understood. Therefore, with a single access, the system can check whether the row includes the non-optional predicates in the conjunction. Similarly, it is possible to check the existence of any of the predicates mentioned in the disjunction. More formally, to satisfy the semantic constraints of SPARQL, candidate nodes for merging need to be ANDMergeable, ORMergeable or OPTMergeable.
For ANDMergeable nodes, two nodes are ANDMergeable iff their least common ancestor and all intermediate ancestors are AND nodes: ANDMergeable(t, t′)≡∀x: x∈(↑↑(t, LCA(t, t′))∪↑↑(t′, LCA(t, t′)))⇒x is AND. For ORMergeable nodes, two nodes are ORMergeable iff their least common ancestor and all intermediate ancestors are OR nodes: ORMergeable(t, t′)≡∀x: x∈(↑↑(t, LCA(t, t′))∪↑↑(t′, LCA(t, t′)))⇒x is OR. Going back to the execution tree in
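A sketch of these mergeability checks follows, assuming a hypothetical child-to-parent map for the parse tree and a map from pattern node to its operator. Per the textual definition, the test requires the LCA and every intermediate ancestor on both paths to carry the given operator:

```python
def _chain(node, parent):
    """Ancestors of `node`, nearest first (↑*)."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out

def _between(node, stop, parent):
    """Ancestors of `node` strictly below `stop` (↑↑)."""
    out = []
    while node in parent and parent[node] != stop:
        node = parent[node]
        out.append(node)
    return out

def mergeable(t, t2, parent, kind, op_of):
    """ANDMergeable (kind='AND') / ORMergeable (kind='OR') test."""
    t2_anc = set(_chain(t2, parent))
    x = next(a for a in _chain(t, parent) if a in t2_anc)  # LCA
    ancestors = _between(t, x, parent) + _between(t2, x, parent) + [x]
    return all(op_of[p] == kind for p in ancestors)
```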
Given the input execution tree, pairs of nodes that satisfy both the structural and semantic constraints introduced above are identified and merged. So, given as input the execution tree in
SQL generation is the final step of query translation. The query plan tree plays an important role in this process, and each node in the query plan tree, be it a triple, merge or control node, contains the necessary information to guide the SQL generation. For the generation, the SQL builder performs a post order traversal of the query plan tree and produces the equivalent SQL query for each node. The whole process is assisted by the use of SQL code templates.
In more detail, the base case of SQL translation considers a node that corresponds to a single triple or a merge.
The operator nodes in the query plan are used to guide the connection of instantiated templates like the one in
As illustrated above, several Common Table Expressions (CTEs) are used for each plan node. For example, t4 is evaluated first and accesses RPH using the Software constant. Since industry is a multivalued predicate, the RS table is also accessed. The remaining predicates in this example are single-valued, and the access to the secondary table is avoided. The ORMergeable node t23 is evaluated next using the RPH table, where the object is bound to the values of y produced by the first triple. The WHERE clause enforces the semantics that at least one of the predicates is present. The CTE projects the values corresponding to the present predicates and null values for those that are missing. The next CTE just flips these values, creating a new result record for each present predicate. The plan continues with triple t5 and is completed with the OPTMergeable node t67. Here, no constraint is imposed for the optional predicate, making its presence on the record optional. In case the predicate is present, the corresponding value is projected; otherwise, null is projected. In this example, each predicate is assigned to a single column. When predicates are assigned to multiple columns, the position of the value is determined with CASE statements as seen in the SQL sample.
To examine the effectiveness of the query optimization, experiments were conducted using both a 1M triple microbenchmark and queries from other datasets. As an example, for the microbenchmark, two constant values O1 and O2 were considered with relative frequency in the data of 0.75 and 0.01, respectively. Then, the simple query 802 shown in
In
The performance of an R2DF schema, using IBM DB2 as the relational back-end, was compared to that of Virtuoso 6.1.5 OpenSource Edition, Apache Jena 2.7.3 (TDB), OpenRDF Sesame 2.6.8, and RDF-3X 0.3.5. R2DF, Virtuoso and RDF-3X were run in a client-server mode on the same machine, and all other systems were run in process mode. For both Jena and Virtuoso, all recommended optimizations were enabled. Jena had the BGP optimizer enabled. For Virtuoso, all recommended indexes were built. For R2DF, only indexes on the entry columns of the DPH and RPH relations were added (no indexes on the predi and vali columns).
Experiments were conducted with 4 different benchmarks: LUBM, SP2Bench, DBpedia, and a private benchmark PRBench. The LUBM and SP2Bench benchmarks were scaled up to 100 million triples each, and their associated published query workloads were used. The DBpedia 3.7 benchmark has 333 million triples. The private benchmark included data from a tool integration application, and it contained 60 million triples about various software artifacts generated by different tools, e.g., bug reports, requirements, etc. For all systems, queries were evaluated in a warm cache scenario. For each dataset, benchmark queries were randomly mixed to create a run, and each run was issued 8 times to the 5 stores. The first run was discarded, and the average result for each query over the 7 consecutive runs was reported. For each query, its running time was measured excluding the time taken to stream the results back to the API, in order to minimize variations caused by the various APIs available. As shown in Table 1, the evaluated queries were classified into four categories. Queries whose SPARQL a system failed to parse correctly were reported as unsupported. The remaining supported queries were further classified as either complete, timeout, or error. The results from each system were counted, and when a system provided the correct number of answers, the query was classified as completed. If the system returned the wrong number of results, this was classified as an error. Finally, a timeout of 10 minutes was used to trap queries that do not terminate within a reasonable amount of time. In the table, the average time taken (in seconds) to evaluate complete and timeout queries is reported. For queries that timeout, their running time was set to 10 minutes. The time of queries that return the wrong number of results is not reported.
This is the most comprehensive evaluation of RDF systems to date. Unlike previous works, this is the first study that evaluates 5 systems using a total of 78 queries, over a total of 600 million triples. The experiments were conducted on 5 identical virtual machines (one per system), each equivalent to a 4-core, 2.6 GHz Intel Xeon system with 32 GB of memory running 64-bit Linux. No system was memory limited, meaning each could consume all of its 32 GB. None of the systems came close to this memory limit in any experiment.
The LUBM benchmark requires OWL DL inference, which is not supported across all tested systems. Without inference, most benchmark queries return empty result sets. To address this issue, the existing queries were expanded, and a set of equivalent queries that implement inference and do not require this feature from the evaluated system was created. As an example, if the LUBM ontology states that GraduateStudent ⊑ Student, and the query asks for ?x rdf:type Student, the query was expanded into ?x rdf:type Student UNION ?x rdf:type GraduateStudent. This set of expansions was performed, and the same expanded query was issued to all systems. From the 14 original queries in the benchmark, only 12 (denoted as LQ1 to LQ10, LQ13 and LQ14) are included here, because 2 queries involved ontological axioms that cannot be expanded.
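The expansion described above can be sketched as a transitive-closure walk over subclass axioms; each rdf:type class is replaced by the union of the class and all of its (transitive) subclasses. This is an illustrative sketch, not the actual rewriting tool used in the evaluation:

```python
def expand_type_query(query_types, subclass_of):
    """For each class C queried via `?x rdf:type C`, return the set of
    classes to UNION over: C plus all transitive subclasses of C.

    subclass_of: list of (subclass, superclass) axioms from the ontology.
    """
    expanded = []
    for cls in query_types:
        closure = {cls}
        frontier = [cls]
        while frontier:
            current = frontier.pop()
            for sub, sup in subclass_of:
                if sup == current and sub not in closure:
                    closure.add(sub)
                    frontier.append(sub)
        expanded.append(sorted(closure))
    return expanded
```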
SP2Bench is an extract of DBLP data with corresponding SPARQL queries (denoted as SQ1 to SQ17). This benchmark was used as is, with no modifications. Prior reports on this benchmark were conducted with at most 5 million triples. It was scaled here to 100 million triples, and some queries (by design) had rather large result sets. SQ4 in particular created a cross product of the entire dataset, which meant that all systems timed out on this query.
The DBpedia SPARQL benchmark is a set of query templates derived from actual query logs against the public DBpedia SPARQL endpoint. These templates were used with the DBpedia 3.7 dataset, and 20 queries (denoted as DQ1 to DQ20) were obtained that had non-empty result sets. Since the templates were derived for an earlier DBpedia version, not all of them yielded non-empty result sets.
The private benchmark reflects data from a tool integration scenario where specific information about the same software artifacts is generated by different tools, and RDF data provides an integrated view on these artifacts across tools. This is a quad dataset where triples are organized into over 1 million 'graphs'. This caused problems for some systems which do not support quads, e.g., RDF-3X and Sesame. Twenty-nine SPARQL queries (denoted as PQ1 to PQ29) were used, with some being fairly complex queries, e.g., a SPARQL union of 100 conjunctive queries.
Table 1 shows that R2DF is the only system that evaluates correctly and efficiently 77 out of the 78 tested queries. As mentioned, SQ4 was the only query on which the system timed out, as did all the other systems. If SQ4 is excluded, it is clear from the table that each of the remaining systems had queries returning an incorrect number of results, or queries that timed out without returning any results. The advantage of R2DF is not emphasized in terms of SPARQL support, since this is mostly a function of system maturity and continued development.
Given Table 1, it is hard to make direct system comparisons. Still, when the R2DF system is compared with systems that can evaluate approximately the same queries, i.e., Virtuoso and Jena, then R2DF is in the worst case slightly faster, and in the best case as much as an order of magnitude faster than the other two systems. So, for LUBM, R2DF is significantly faster than Virtuoso (2×) and Jena (4×). For SP2Bench, R2DF is on average about 50% faster than Virtuoso, although Virtuoso has a better geometric mean (not shown due to space constraints), which reflects Virtuoso being much better on short-running queries. For DBpedia, R2DF and Virtuoso have comparable performance, and for PRBench, R2DF is about 5.5× better than Jena. Jena is actually the only system that supports the same queries as R2DF, and across all datasets R2DF is in the worst case 60% faster, and in the best case as much as two orders of magnitude faster. A comparison between R2DF and RDF-3X is also possible, but only on the LUBM dataset, where both systems support a similar number of queries. The two systems are fairly close in performance and out-perform the remaining three systems. When compared between themselves across 11 queries (RDF-3X did not run one query), R2DF is faster than RDF-3X in 3 queries, namely in LQ8, LQ13 and LQ14 (246 ms, 14 ms and 4.6 secs versus 573 ms, 36 ms and 9.5 secs, respectively), while RDF-3X clearly has an advantage in 3 other queries, namely in LQ2, LQ6 and LQ10 (722 ms, 12 secs and 1.57 secs versus 20 secs, 33 secs and 3.42 secs, respectively). For the remaining 5 queries, the two systems have almost identical performance, with RDF-3X being faster than R2DF by approximately 3 ms for each query.
For a more detailed per-query comparison,
The situation is similar in the PRBench case.
Methods and systems in accordance with exemplary embodiments of the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software and microcode. In addition, exemplary methods and systems can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer, logical processing unit or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. Suitable computer-usable or computer readable mediums include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems (or apparatuses or devices) or propagation mediums. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
Suitable data processing systems for storing and/or executing program code include, but are not limited to, at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements include local memory employed during actual execution of the program code, bulk storage, and cache memories, which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. Input/output or I/O devices, including but not limited to keyboards, displays and pointing devices, can be coupled to the system either directly or through intervening I/O controllers. Exemplary embodiments of the methods and systems in accordance with the present invention also include network adapters coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Suitable currently available types of network adapters include, but are not limited to, modems, cable modems, DSL modems, Ethernet cards and combinations thereof.
In one embodiment, the present invention is directed to a machine-readable or computer-readable medium containing a machine-executable or computer-executable code that when read by a machine or computer causes the machine or computer to perform a method for optimizing semantic web queries in accordance with exemplary embodiments of the present invention and to the computer-executable code itself. The machine-readable or computer-readable code can be any type of code or language capable of being read and executed by the machine or computer and can be expressed in any suitable language or syntax known and available in the art including machine languages, assembler languages, higher level languages, object oriented languages and scripting languages. The computer-executable code can be stored on any suitable storage medium or database, including databases disposed within, in communication with and accessible by computer networks utilized by systems in accordance with the present invention and can be executed on any suitable hardware platform as are known and available in the art including the control systems used to control the presentations of the present invention.
While it is apparent that the illustrative embodiments of the invention disclosed herein fulfill the objectives of the present invention, it is appreciated that numerous modifications and other embodiments may be devised by those skilled in the art. Additionally, feature(s) and/or element(s) from any embodiment may be used singly or in combination with other embodiment(s) and steps or elements from methods in accordance with the present invention can be executed or performed in any suitable order. Therefore, it will be understood that the appended claims are intended to cover all such modifications and embodiments, which would come within the spirit and scope of the present invention.
Number | Name | Date | Kind |
---|---|---|---|
5909678 | Bergman et al. | Jun 1999 | A |
7765176 | Simmons et al. | Jul 2010 | B2 |
7899861 | Feblowitz et al. | Mar 2011 | B2 |
8032525 | Bowers et al. | Oct 2011 | B2 |
8117233 | Liu et al. | Feb 2012 | B2 |
8229775 | Adler et al. | Jul 2012 | B2 |
8260768 | Wang et al. | Sep 2012 | B2 |
8429179 | Mirhaji | Apr 2013 | B1 |
20070033279 | Battat et al. | Feb 2007 | A1 |
20080040308 | Ranganathan et al. | Feb 2008 | A1 |
20080256549 | Liu et al. | Oct 2008 | A1 |
20100030723 | Au | Feb 2010 | A1 |
20100250577 | Cao et al. | Sep 2010 | A1 |
20120047124 | Duan et al. | Feb 2012 | A1 |
20120246153 | Pehle | Sep 2012 | A1 |
20140214857 | Srinivasan et al. | Jul 2014 | A1 |
Number | Date | Country |
---|---|---|
2012135851 | Oct 2012 | WO |
Entry |
---|
Kim et al., From SPARQL to MapReduce: The Journey Using a Nested TripleGroup Algebra, 2011, Department of Computer Science, North Carolina State University, Raleigh, NC, 4 pages. |
SPARQL Query Processing with Conventional Relational Database Systems, 2005, 235-244. |
Querying Distributed RDF Data Sources with SPARQL, 2005, 524-538. |
“4store—scalable RDF storage,” http://4store.org/. |
Abadi et al., “Scalable semantic web data management using vertical partitioning,” in VLDB, 2007, pp. 411-422. |
Bednarek et al., “Using Methods of Parallel Semi-structured Data Processing for Semantic Web”, 2009 Third International Conference on Advances in Semantic Processing. |
Chen et al., “Mapping XML to a Wide Sparse Table,” in ICDE, 2012, pp. 630-641. |
DBpedia dataset, “http://dbpedia.org.” |
Duan et al., “Apples and oranges: a comparison of RDF benchmarks and real RDF datasets,” in SIGMOD, 2011. |
Guo et al., “LUBM: a benchmark for OWL knowledge base systems,” Journal of Web Semantics, vol. 3, No. 2-3, pp. 158-182, 2005. |
Hartig et al., “The SPARQL Query Graph Model for Query Optimization,” 2007, pp. 564-578. |
Huang et al., “Scalable SPARQL Querying of Large RDF Graphs,” PVLDB, vol. 4, No. 11, pp. 1123-1134, 2011. |
Langegger et al. “A Semantic Web Middleware for Virtual Data Integration on the Web”, The Semantic Web Research and Application (2008) 493-507. |
Le et al., “Scalable Multi-Query Optimization for SPARQL”, Data Engineering (ICDE), 2012 IEEE 28th International Conference on IEEE, 2012. |
Letelier et al., “Static Analysis and Optimization of Semantic Web Queries”, PODS 12, May 21-23, 2012, Scottsdale, Arizona USA. |
Maduko et al., “Estimating the cardinality of rdf graph patterns,” in WWW, 2007, pp. 1233-1234. |
Matano et al., “A Path-based Relational RDF Database”, in Proceedings of the 16th Australasian Database Conference (ADC '05), vol. 39, pp. 95-103, 2005. |
Morsey et al., “DBpedia SPARQL Benchmark—Performance Assessment with Real Queries on Real Data,” in ISWC 2011, 2011. |
Neumann et al., “The RDF-3X engine for scalable management of RDF data,” The VLDB Journal, vol. 19, No. 1, pp. 91-113, Feb. 2010. |
Schmidt et al., “SP2Bench: A SPARQL Performance Benchmark,” CoRR, vol. abs/0806.4627, 2008. |
Son, et al., “Performance Evaluation of Storage—Independent Model for SPARQL-to-SQL Translation Algorithms”, NTMS Feb. 2001. |
SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/. |
Stocker et al., “SPARQL basic graph pattern optimization using selectivity estimation”, in WWW 2008, pp. 595-604. |
Tsialiamanis et al., “Heuristics-based query optimisation for SPARQL,” in EDBT, 2012, pp. 324-335. |
Udrea et al., “GRIN: A Graph-based RDF Index,” in AAAI, 2007, pp. 1465-1470. |
Virtuoso Open-Source Edition, “http://virtuoso.openlinksw.com/wiki/main/main/.” |
Weiss et al., “Hexastore: sextuple indexing for semantic web data management,” PVLDB, vol. 1, No. 1, pp. 1008-1019, 2008. |
Wilkinson et al., “Efficient RDF Storage and Retrieval in Jena2”, in Semantic Web and Databases Workshop, 2003, pp. 131-150. |
Number | Date | Country | |
---|---|---|---|
20140304251 A1 | Oct 2014 | US |