COMPUTING TRANSITIVE CLOSURES

Information

  • Patent Application
  • 20170061306
  • Publication Number
    20170061306
  • Date Filed
    August 31, 2015
  • Date Published
    March 02, 2017
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing transitive closures of relations. One of the methods includes initializing the transitive closure F of an initial iteration with the tuples in another relation f. New first tuples and new second tuples are iteratively computed until no new first tuples are generated, including: generating new first tuples on each iteration by matching destination elements of tuples in the auxiliary relation of a previous iteration with source elements of tuples in the auxiliary relation of the previous iteration, generating second tuples on each iteration by matching destination elements of the new first tuples with source elements of tuples in F of the previous iteration, and adding the new first tuples and the new second tuples to F of a current iteration.
Description
BACKGROUND

This specification relates to data processing, in particular, to calculating the transitive closure of a relation.


A relation is a set of tuples (t1, . . . , tn), each tuple having n ≥ 1 data elements ti. Each element ti is a corresponding value, which may represent a value of a corresponding attribute. The attribute will generally have an attribute name. The correspondence between attribute and value is determined by the position of the value in the tuple, i.e., each attribute has a corresponding position. Relations are commonly thought of as, represented as, and referred to as tables in which each row is a tuple and each column is an attribute.


A binary relation is a relation whose tuples have two data elements each, i.e., they are pairs. Given a binary relation f, and a tuple (a,b) in that relation, the elements a and b are related by f. For a given tuple (a,b), a is referred to as the source element, and b is referred to as the destination element.


Two elements a and b are transitively related by f when a sequence of tuples in f can be found such that the first tuple in the sequence contains a as a source element, the last tuple in the sequence contains b as a destination element, and every tuple in the sequence has a destination element that matches a source element of a subsequent tuple in the sequence (except for the last tuple in the sequence, which has no subsequent tuple). From a binary relation f, another relation F, referred to as the transitive closure of f, can be generated containing all pairs of elements that are in f or are transitively related by f.


Tuples in a relation f can represent edges between nodes in a directed graph. FIG. 1 illustrates an example directed graph. The graph includes nodes v1 through v7 and directed edges that connect the nodes. The edges in the graph can be represented by a relation f having the following tuples:

    • (v1,v2)
    • (v2,v3)
    • (v3,v4)
    • (v4,v5)
    • (v1,v6)
    • (v6,v7)
    • (v7,v6)
    • (v7,v5)


In this example, the source element of each tuple represents a source node of an edge, and the destination element in the tuple represents the destination node of the edge.


In the present example, v1 and v5 are transitively related because there is a sequence of tuples in f such that the first tuple has v1 as its source element, the last tuple has v5 as its destination element, and every tuple in the sequence has a destination element that matches a source element of a subsequent tuple in the sequence, e.g., the sequence (v1,v2), (v2,v3), (v3,v4), (v4,v5). Nodes v7 and v4 are not transitively related because there is no such sequence of tuples in f. Thus (v1,v5) is a member of F, and (v7,v4) is not a member of F. In the graph context, two nodes s and d are transitively related if there are edges in the graph such that d is reachable from s. Nodes v1 and v5 are transitively related because node v5 is reachable from node v1. Furthermore, v7 and v4 are not transitively related because node v4 is not reachable from node v7.
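As an illustrative sketch only, the reachability test described above can be written in Python. The helper name transitively_related and the use of Python sets of pairs to hold the relation are assumptions made for this example and are not part of the specification.

```python
# Edges of the FIG. 1 graph, as (source, destination) tuples of the relation f.
f = {
    ("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"),
    ("v1", "v6"), ("v6", "v7"), ("v7", "v6"), ("v7", "v5"),
}


def transitively_related(relation, s, d):
    """Return True if d is reachable from s using one or more tuples."""
    # Hypothetical helper: a plain depth-first search over the tuples.
    frontier = [dst for (src, dst) in relation if src == s]
    seen = set()
    while frontier:
        node = frontier.pop()
        if node == d:
            return True
        if node in seen:
            continue  # already explored; this also stops the v6/v7 loop
        seen.add(node)
        frontier.extend(dst for (src, dst) in relation if src == node)
    return False


print(transitively_related(f, "v1", "v5"))  # True
print(transitively_related(f, "v7", "v4"))  # False
```

The set of already explored nodes is what keeps the search finite even though the loop between v6 and v7 admits arbitrarily long sequences of tuples.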


In a sequence of tuples representing transitively related elements in f, e.g., (v1,v2), (v2,v3), (v3,v4), (v4,v5), the sequence can be described as n chained uses of f, or equivalently, n chained uses of a function f(a,b) that operates on the relation f and returns true if (a,b) is in f, where n represents the number of tuples in the sequence. Thus, when the tuples represent edges in a graph, n chained uses of f represents paths in the graph having a length of n steps.


In this specification, a sequence of tuples having a length n, or n chained uses of a relation f, may be referred to as a path having a length n even when the relation f does not represent a graph. In other words, use of the term path does not necessarily require a graph. Rather, a path existing from s to d merely indicates that s and d are transitively related.


Computing transitive closures can be expensive in both processing time and storage space required. This is due in part to the multiple paths being explored that result in duplicate tuples. For example, a computer system computing the transitive closure of f will explore two paths from v1 to v5. The first path goes through v2, and the second path goes through v6. Exploring both of these paths can result in generating (v1,v5) twice. Furthermore, the loop between v6 and v7 means that there are infinitely many paths between v1 and v5, e.g., (v1,v6), (v6,v7), (v7,v6), (v6,v7), (v7,v5) as well as (v1,v6), (v6,v7), (v7,v6), (v6,v7), (v7,v6), (v6,v7), (v7,v6), (v6,v7), (v7,v5).


Furthermore, a system can also end up generating many duplicate tuples due to exploring even single paths in multiple ways. For example, a system can generate the transitive relation (v1,v5) by joining multiple different subpaths, including joining (v1,v2), (v2,v5); (v1,v3), (v3,v5); and (v1,v4), (v4,v5).


The time required to compute the transitive closure of a relation depends significantly on the longest shortest path in f. The transitive closure F will include tuples representing all possible paths between elements in the relation f. Therefore, a computer system will need to explore at least the shortest path between each pair of transitively related elements in f. The longest of these paths, i.e. the longest shortest path, is therefore a limiting factor in how quickly the transitive closure can be computed.
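To make the notion of the longest shortest path concrete, the following Python sketch computes it by running a breadth-first search from every source element. The function name longest_shortest_path is hypothetical, and the relation used is the FIG. 1 relation listed above; this is one possible way to compute the quantity, not a required one.

```python
from collections import deque


def longest_shortest_path(relation):
    """Length of the longest shortest path over all transitively related pairs."""
    successors = {}
    for src, dst in relation:
        successors.setdefault(src, set()).add(dst)

    longest = 0
    for start in successors:
        # Breadth-first search; a distance counts the tuples used, so it starts at 1.
        distance = {}
        queue = deque((dst, 1) for dst in successors[start])
        while queue:
            node, steps = queue.popleft()
            if node in distance:
                continue  # a path at least as short was already found
            distance[node] = steps
            for nxt in successors.get(node, ()):
                queue.append((nxt, steps + 1))
        longest = max(longest, max(distance.values()))
    return longest


fig1 = {
    ("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"),
    ("v1", "v6"), ("v6", "v7"), ("v7", "v6"), ("v7", "v5"),
}
print(longest_shortest_path(fig1))  # 3
```

For FIG. 1 the result is 3, because the shortest path from v1 to v5 runs through v6 and v7 rather than through v2, v3, and v4.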


Transitive closures can be computed by a computer system using predicates. In some programming languages, a predicate is a function defined by one or more statements that maps one or more input parameters to true or false. A predicate operates on an associated relation and returns true or false depending on whether a tuple defined by the input parameters occurs in the associated relation. In other words, f(a,b) is true if the tuple (a,b) is in the relation f and false otherwise. For brevity the name of the predicate, e.g., f, may refer to the predicate function itself or to the associated relation, the meaning of which will be clear from the context.


Thus, a predicate F(s,d) can operate on an associated relation that is the transitive closure of another relation associated with a predicate f(s,d). Thus, the predicate F(s,d) returns true if a tuple (s,d) exists in the associated relation for F and returns false otherwise.
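The relationship between a predicate and its associated relation can be sketched in Python as a membership test. The factory name make_predicate is an assumption introduced for this example only.

```python
def make_predicate(relation):
    """Build a predicate that is true exactly for the tuples in `relation`."""
    tuples = frozenset(relation)

    def predicate(*elements):
        # The input parameters define a tuple; test whether it occurs.
        return elements in tuples

    return predicate


f = make_predicate({("v1", "v2"), ("v2", "v3")})
print(f("v1", "v2"))  # True
print(f("v1", "v3"))  # False
```

A predicate F for a transitive closure could be built the same way once the tuples of the closure have been computed.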


The general manner in which evaluation engines compute associated relations from recursively defined statements will now be described. Recursive statements are statements that reference their own output. An example query language that supports recursive statements is Datalog. The following example statement can be written in Datalog:

    • f(i) :- i=1; (i=2, f(1)); (i=3, f(2))


This statement in Datalog recursively defines a predicate f having input parameter i, which can be expressed as f(i). The :- operator defines the predicate f(i) to have the body “i=1; (i=2, f(1)); (i=3, f(2))”. A semicolon represents disjunction, i.e., logical “or,” and a comma represents conjunction, i.e., logical “and.” For clarity, logical “and” will occasionally be spelled out as an explicit “and” operator.


The semantics of the predicate f(i) in Datalog is that the body of f(i) is evaluated to compute the associated relation for f(i). The relation is the smallest set of values i such that the body of f(i) is satisfied, i.e., evaluates to true. Then, when a value for i is provided as input to the predicate f(i), the predicate evaluates to true if the value occurs in the associated relation for f(i) and false otherwise. For example, f(1) evaluates to “true” because the term “i=1” in the body defines the associated relation to include the tuple (1). Therefore, because the associated relation includes (1), f(1) evaluates to “true.” Evaluation of predicates is typically performed by an evaluation engine for the query language implemented by software installed on one or more computers.


The relation over which f(i) is evaluated may be specified within the body of the predicate. In this example, the body of f(i) defines a relation having a set of singleton tuples, e.g., {1, 2, 3}. However, the relation over which f(i) is evaluated may alternatively be specified by another predicate or may be explicitly defined. For example, the relation may be defined by a table in a database.


Evaluating a recursive predicate amounts to computing the least fixed point of the predicate. The least fixed point is a relation having a set of tuples that is a subset of all other fixed points of the predicate. Evaluation engines that evaluate recursive predicates can use a number of different procedures for finding the least fixed point.


Some methods for finding the least fixed point of a recursive predicate recast the predicate into a number of nonrecursive evaluation predicates. The evaluation predicates are then evaluated in sequence until a least fixed point is reached. In general, recasting a recursive predicate into a number of nonrecursive evaluation predicates may be referred to as “flattening” the recursion.


An evaluation engine for a particular query language can recast a recursive predicate as follows. A first nonrecursive predicate is defined as false. In addition to false, a sequence of subsequent nonrecursive predicates are defined according to the body of the recursive predicate. In doing so, the evaluation engine replaces each recursive term with a reference to a previous nonrecursive predicate. Logically, the number of nonrecursive predicates that can be generated is unbounded. However, the evaluation engine will halt evaluation when the least fixed point is reached.


The evaluation engine then evaluates the nonrecursive predicates in order and adds resulting tuples to the associated relation for the predicate. The evaluation engine stops when a nonrecursive predicate is reached whose evaluation adds no additional tuples to the relation. The final result is the associated relation for the recursively defined predicate.


Using this procedure, evaluating each successive predicate regenerates all of the results that have already been generated. Thus, this approach is sometimes referred to as “naive evaluation.”


For simplicity, predicates in this specification will generally be represented logically and may not necessarily have the form of a language construct of any particular query language. However, the implementation of the illustrated logical predicates by an evaluation engine is normally straightforward for query languages that support recursive predicates.


Thus, to illustrate naive evaluation, an evaluation engine can recast the predicate above into the following nonrecursive evaluation predicates.

    • f0(i) :- false
    • f1(i) :- i=1; (i=2, f0(1)); (i=3, f0(2))
    • f2(i) :- i=1; (i=2, f1(1)); (i=3, f1(2))
    • f3(i) :- i=1; (i=2, f2(1)); (i=3, f2(2))
    • . . .


Or, for brevity, the evaluation predicates may be represented as:

    • f0(i) :- false
    • fn+1(i) :- i=1; (i=2, fn(1)); (i=3, fn(2))


At first glance, this notation may look like a recursive definition, but it is not. This is because the subscripts of the predicates denote different nonrecursive predicates occurring in the potentially unbounded sequence of predicates. In other words, the predicate fn+1 is not recursive because it references fn, but not itself. The evaluation engine then evaluates the nonrecursive predicates in order to find the least fixed point.
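The sequence of nonrecursive evaluation predicates for f(i) can be sketched in Python as a loop in which each pass evaluates the body against the relation produced by the previous pass. The names body and naive_least_fixed_point, and the finite range of candidate values, are assumptions made for illustration.

```python
def body(i, previous):
    """Body of f(i), with each recursive call replaced by a lookup in `previous`."""
    return i == 1 or (i == 2 and 1 in previous) or (i == 3 and 2 in previous)


def naive_least_fixed_point(candidates):
    relation = set()  # f0 is defined as false, so its relation is empty
    iterations = 0
    while True:
        iterations += 1
        current = {i for i in candidates if body(i, relation)}
        if current == relation:
            # No additional tuples: the least fixed point has been reached.
            return relation, iterations
        relation = current


# `range(10)` is an assumed finite universe of values to test.
relation, iterations = naive_least_fixed_point(range(10))
print(sorted(relation))  # [1, 2, 3]
print(iterations)        # 4
```

The fourth pass adds nothing to the relation, which is how the loop detects that evaluation can halt.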


An evaluation engine can use naive evaluation to compute a relation representing the transitive closure of f.



FIG. 2 illustrates another example graph. The edges of the graph in FIG. 2 can be represented by a relation f having the following tuples:

    • (v1,v2)
    • (v2,v3)
    • (v3,v4)
    • (v4,v5)
    • (v5,v6)
    • (v6,v7)
    • (v7,v8)


Evaluating the following recursive predicate will compute the transitive closure of f:

    • F(s,d) :- f(s,d); exists(a: f(s,a), F(a,d))


The term “exists(a: f(s,a), F(a, d))” has an existential quantifier. A term having an existential quantifier may be referred to as an existential term. This existential term asserts that there is a data element a such that (s,a) is a tuple in f and that (a,d) is a tuple in the transitive closure F. Intuitively, this definition asserts that if (a,d) is in F and (s,a) is in f, then (s,d) is also in F.


This notation can also be thought of as generating a new tuple (s,d) by taking a first tuple (a,d) in F, where (a,d) represents a reachable path in the graph, and extending the reachable path one more step with (s,a) in f. Or equivalently, this notation can be thought of as extending a one-step path represented by (s,a) in f with another path having one or more steps, represented by (a,d) in F.


An evaluation engine can use naive evaluation to compute the transitive closure F(s,d). To do so, the evaluation engine can generate the following non-recursive evaluation predicates to flatten the recursive definition of F(s,d):

    • F0(s,d) :- false
    • Fn+1(s,d) :- f(s,d); exists(a: f(s,a), Fn(a,d))


Naive evaluation proceeds as illustrated in TABLE 1.












TABLE 1

Predicate: F1(s,d)
Previous relation: { }
Current relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8)}
Comments: The relation of F0(s,d) is empty. Thus, F1(s,d) evaluates to: f(s,d); exists(a: f(s,a), F0(a,d)), or: (s,d) is in f OR there exists an a such that (s,a) is in f and (a,d) is in { }. Thus, only the tuples of f are generated.

Predicate: F2(s,d)
Previous relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8)}
Current relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8)}
Comments: F2(s,d) evaluates to: f(s,d); exists(a: f(s,a), F1(a,d)), or: (s,d) is in f OR there exists an a such that (s,a) is in f and (a,d) is in F1. At this point, F1 includes f itself, so the tuples produced are the tuples in f as well as the tuples in f extended by one step in the graph. Thus, the tuples produced represent up to two steps in the graph. Thus, {(v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8)} are newly generated.

Predicate: F3(s,d)
Previous relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8)}
Current relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8)}
Comments: F3(s,d) evaluates to: f(s,d); exists(a: f(s,a), F2(a,d)), or: (s,d) is in f OR there exists an a such that (s,a) is in f and (a,d) is in F2. The tuples produced are those in f and those reachable from the destinations in F2, which represented up to two steps in the graph. Thus, the new tuples are those that represent three steps in the graph. Thus, {(v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8)} are newly generated.

Predicate: F4(s,d)
Previous relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8)}
Current relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8), (v1, v5), (v2, v6), (v3, v7), (v4, v8)}
Comments: F4(s,d) evaluates to: f(s,d); exists(a: f(s,a), F3(a,d)), or: (s,d) is in f OR there exists an a such that (s,a) is in f and (a,d) is in F3. The tuples produced are those in f and those reachable from the destinations in F3, which represented up to three steps in the graph. Thus, the new tuples are those that represent four steps in the graph. Thus, {(v1, v5), (v2, v6), (v3, v7), (v4, v8)} are newly generated.

Predicate: F5(s,d)
Previous relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8), (v1, v5), (v2, v6), (v3, v7), (v4, v8)}
Current relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8), (v1, v5), (v2, v6), (v3, v7), (v4, v8), (v1, v6), (v2, v7), (v3, v8)}
Comments: F5(s,d) evaluates to: f(s,d); exists(a: f(s,a), F4(a,d)), or: (s,d) is in f OR there exists an a such that (s,a) is in f and (a,d) is in F4. The tuples produced are those in f and those reachable from the destinations in F4, which represented up to four steps in the graph. Thus, the new tuples are those that represent five steps in the graph. Thus, {(v1, v6), (v2, v7), (v3, v8)} are newly generated.

Predicate: F6(s,d)
Previous relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8), (v1, v5), (v2, v6), (v3, v7), (v4, v8), (v1, v6), (v2, v7), (v3, v8)}
Current relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8), (v1, v5), (v2, v6), (v3, v7), (v4, v8), (v1, v6), (v2, v7), (v3, v8), (v1, v7), (v2, v8)}
Comments: F6(s,d) evaluates to: f(s,d); exists(a: f(s,a), F5(a,d)), or: (s,d) is in f OR there exists an a such that (s,a) is in f and (a,d) is in F5. The tuples produced are those in f and those reachable from the destinations in F5, which represented up to five steps in the graph. Thus, the new tuples are those that represent six steps in the graph. Thus, {(v1, v7), (v2, v8)} are newly generated.

Predicate: F7(s,d)
Previous relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8), (v1, v5), (v2, v6), (v3, v7), (v4, v8), (v1, v6), (v2, v7), (v3, v8), (v1, v7), (v2, v8)}
Current relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8), (v1, v5), (v2, v6), (v3, v7), (v4, v8), (v1, v6), (v2, v7), (v3, v8), (v1, v7), (v2, v8), (v1, v8)}
Comments: F7(s,d) evaluates to: f(s,d); exists(a: f(s,a), F6(a,d)), or: (s,d) is in f OR there exists an a such that (s,a) is in f and (a,d) is in F6. The tuples produced are those in f and those reachable from the destinations in F6, which represented up to six steps in the graph. Thus, the new tuples are those that represent seven steps in the graph. Thus, {(v1, v8)} is newly generated.

Predicate: F8(s,d)
Previous relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8), (v1, v5), (v2, v6), (v3, v7), (v4, v8), (v1, v6), (v2, v7), (v3, v8), (v1, v7), (v2, v8), (v1, v8)}
Current relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8), (v1, v5), (v2, v6), (v3, v7), (v4, v8), (v1, v6), (v2, v7), (v3, v8), (v1, v7), (v2, v8), (v1, v8)}
Comments: F8(s,d) evaluates to: f(s,d); exists(a: f(s,a), F7(a,d)), or: (s,d) is in f OR there exists an a such that (s,a) is in f and (a,d) is in F7. F7 contains all tuples representing up to seven steps in the graph. The tuples produced are those in f and those reachable from the destinations in F7. Because seven steps is the longest path in the graph, there are no additional tuples generated.









When F8 is evaluated, no additional tuples are added to the relation. Therefore, the evaluation engine can determine that the least fixed point has been reached. The tuples in F7 thus represent the transitive closure of f.


The evaluation engine required 8 iterations to compute the transitive closure. In general, when using this strategy an evaluation engine needs to compute lsp+1 iterations, where lsp represents the length of the longest shortest path in f.
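The iteration count can be checked with a short Python sketch of naive evaluation over the FIG. 2 relation. The function name naive_transitive_closure and the set-based join are assumptions for illustration; an actual evaluation engine may be organized quite differently.

```python
# FIG. 2 edges (v1,v2) through (v7,v8).
f = {(f"v{i}", f"v{i + 1}") for i in range(1, 8)}


def naive_transitive_closure(f):
    previous = set()  # relation of F0, which is defined as false
    iterations = 0
    while True:
        iterations += 1
        # F_{n+1}(s,d) :- f(s,d); exists(a: f(s,a), F_n(a,d))
        current = set(f) | {
            (s, d) for (s, a) in f for (a2, d) in previous if a == a2
        }
        if current == previous:
            # The iteration added no tuples, so evaluation halts.
            return current, iterations
        previous = current


closure, iterations = naive_transitive_closure(f)
print(len(closure))  # 28
print(iterations)    # 8
```

Every pass recomputes the full relation from scratch, which is the regeneration of previously generated tuples that naive evaluation is known for.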


Using naive evaluation also generated many duplicate tuples. In particular, evaluation of the recursive definition of F required regenerating, on every single iteration, all the tuples that had already been generated. Thus, even for mildly complicated data sets, naive evaluation is very expensive.


Another prior art procedure for evaluating recursive predicates is referred to as “semi-naive evaluation.” When using semi-naive evaluation, an evaluation engine flattens the recursion of the predicate in a different way than naive evaluation. In particular, the evaluation engine defines a delta predicate whose associated relation is defined to include only the new tuples found on each iteration. The least fixed point is found when an iteration is reached in which the delta predicate's associated relation is empty.


Evaluating the previous definition of F with semi-naive evaluation would avoid some of the unnecessary generation of duplicate tuples, but it would still require the same number of iterations (lsp+1) to find the transitive closure.
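A Python sketch of semi-naive evaluation for the previous definition follows. The delta rule used here, δn+1(s,d) :- (f(s,d); exists(a: f(s,a), δn(a,d))), not Fn(s,d), is an assumed flattening written for this example, and the function name semi_naive_linear is hypothetical.

```python
def semi_naive_linear(f):
    closure = set()  # relation of F0
    delta = set()    # relation of delta_0
    iterations = 0
    while True:
        iterations += 1
        # Join f only with the tuples that were new on the previous iteration.
        derived = set(f) | {
            (s, d) for (s, a) in f for (a2, d) in delta if a == a2
        }
        delta = derived - closure  # keep only tuples not found before
        if not delta:
            return closure, iterations  # first empty delta: halt
        closure |= delta


fig2 = {(f"v{i}", f"v{i + 1}") for i in range(1, 8)}
closure, iterations = semi_naive_linear(fig2)
print(len(closure))  # 28
print(iterations)    # 8
```

Each pass joins f with far fewer tuples than naive evaluation does, but each pass still extends paths by only one step, so the FIG. 2 relation again takes 8 iterations.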


An alternative definition for computing the transitive closure of f is given by the following recursively defined predicate:

    • F(s,d) :- f(s,d); exists(a: F(s,a), F(a,d))


When using either naive or semi-naive evaluation, this definition results in fewer iterations than the definition illustrated in TABLE 1.


Intuitively, this definition asserts that if (a,d) is in F and (s,a) is in F, then (s,d) is also in F. In other words, this notation can be thought of as generating a new tuple (s,d) by extending a path represented by a tuple (s,a) in F by a path represented by another tuple (a,d) that is also in F.


To illustrate using semi-naive evaluation to find the transitive closure, an evaluation engine can generate the following evaluation predicates:

    • δ0(s,d) :- false
    • F0(s,d) :- false
    • δn+1(s,d) :- (f(s,d); exists(a: Fn(s,a), δn(a,d)); exists(a: δn(s,a), Fn(a,d))), not Fn(s,d)
    • Fn+1(s,d) :- Fn(s,d); δn+1(s,d)


As mentioned above, semi-naive evaluation uses an evaluation predicate that is referred to as a delta predicate. A system can generate the delta predicate by replacing recursive calls in the original predicate with nonrecursive calls to the previous delta predicate; where a single disjunct contains multiple recursive calls, as in this example, the disjunct is duplicated once for each recursive call, as shown above. The system then generates a conjunction of the result with a negation of the predicate from the previous iteration. Thus, the delta predicate is defined to include only new tuples found in a particular iteration of the evaluation. The term “not Fn(s,d)” at the end of the definition for δn+1(s,d) indicates that previously found tuples do not satisfy the delta predicate for δn+1(s,d).


Evaluation of the transitive closure using semi-naive evaluation is illustrated in TABLE 2. An evaluation engine need not compare a previous relation to a current relation as was done for naive evaluation. Rather, the evaluation engine can halt when the first empty delta predicate is found.











TABLE 2

Predicate: δ0(s,d)
Relation: { }
Comments: Empty by definition.

Predicate: F0(s,d)
Relation: { }
Comments: Empty by definition.

Predicate: δ1(s,d)
Relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8)}
Comments: δ1(s,d) evaluates to: (f(s,d); exists(a: F0(s,a), δ0(a,d)); exists(a: δ0(s,a), F0(a,d))), not F0(s,d). Because there are no tuples in F0 or δ0, only the tuples in f are generated.

Predicate: F1(s,d)
Relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8)}
Comments: F1(s,d) evaluates to: F0(s,d); δ1(s,d), or: (s,d) is in { } OR (s,d) is in {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8)}. Thus, the tuples in f are generated.

Predicate: δ2(s,d)
Relation: {(v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8)}
Comments: δ2(s,d) evaluates to: (f(s,d); exists(a: F1(s,a), δ1(a,d)); exists(a: δ1(s,a), F1(a,d))), not F1(s,d). The tuples produced are those that are reachable from a source in F1 or a source in δ1. Since F1 and δ1 are both equal to f, the tuples generated are those that represent two steps in the graph.

Predicate: F2(s,d)
Relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8)}
Comments: F2(s,d) evaluates to: F1(s,d); δ2(s,d), or: (s,d) is in {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8)} OR (s,d) is in {(v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8)}.

Predicate: δ3(s,d)
Relation: {(v1, v4), (v1, v5), (v2, v5), (v2, v6), (v3, v6), (v3, v7), (v4, v7), (v4, v8), (v5, v8)}
Comments: δ3(s,d) evaluates to: (f(s,d); exists(a: F2(s,a), δ2(a,d)); exists(a: δ2(s,a), F2(a,d))), not F2(s,d). The tuples produced are those that form a transitive relation from a first tuple in F2 and a second tuple in δ2 or from a first tuple in δ2 and a second tuple in F2. F2 includes tuples that represent one step and two steps in the graph, and δ2 includes tuples that represent two steps, so the new tuples produced by combining those tuples represent 3 and 4 steps in the graph. Thus, the following tuples are generated: {(v1, v4), (v1, v5), (v2, v5), (v2, v6), (v3, v6), (v3, v7), (v4, v7), (v4, v8), (v5, v8)}.

Predicate: F3(s,d)
Relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v1, v5), (v2, v5), (v2, v6), (v3, v6), (v3, v7), (v4, v7), (v4, v8), (v5, v8)}
Comments: F3(s,d) evaluates to: F2(s,d); δ3(s,d), or: (s,d) is in {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8)} OR (s,d) is in {(v1, v4), (v1, v5), (v2, v5), (v2, v6), (v3, v6), (v3, v7), (v4, v7), (v4, v8), (v5, v8)}.

Predicate: δ4(s,d)
Relation: {(v1, v6), (v1, v7), (v1, v8), (v2, v7), (v2, v8), (v3, v8)}
Comments: δ4(s,d) evaluates to: (f(s,d); exists(a: F3(s,a), δ3(a,d)); exists(a: δ3(s,a), F3(a,d))), not F3(s,d). The tuples produced are those that form a transitive relation from a first tuple in F3 and a second tuple in δ3 or from a first tuple in δ3 and a second tuple in F3. F3 includes tuples that represent one, two, three, or four steps in the graph, and δ3 includes tuples that represent three or four steps, so the new tuples produced by transitively combining those tuples represent up to 8 steps in the graph. Thus, the following tuples are generated: {(v1, v6), (v1, v7), (v1, v8), (v2, v7), (v2, v8), (v3, v8)}. The tuples formed are those in δ3 that can be extended by those in F3.

Predicate: F4(s,d)
Relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v1, v5), (v2, v5), (v2, v6), (v3, v6), (v3, v7), (v4, v7), (v4, v8), (v5, v8), (v1, v6), (v1, v7), (v1, v8), (v2, v7), (v2, v8), (v3, v8)}
Comments: F4(s,d) evaluates to: F3(s,d); δ4(s,d), or: (s,d) is in {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v1, v5), (v2, v5), (v2, v6), (v3, v6), (v3, v7), (v4, v7), (v4, v8), (v5, v8)} OR (s,d) is in {(v1, v6), (v1, v7), (v1, v8), (v2, v7), (v2, v8), (v3, v8)}.









Using this alternate definition of the transitive closure required only three iterations. In general, an evaluation engine using this technique to compute the transitive closure needs to compute O(log2(lsp)) iterations. Therefore, it is generally more efficient, in terms of iterations required, than naive evaluation, the previous method.
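The evaluation predicates for the alternative definition can be sketched in Python as follows. The helper compose, the function name semi_naive_nonlinear, and the decision to record the size of each non-empty delta are assumptions made for illustration.

```python
def compose(left, right):
    """All (s, d) such that (s, a) is in `left` and (a, d) is in `right`."""
    by_source = {}
    for a, d in right:
        by_source.setdefault(a, set()).add(d)
    return {(s, d) for (s, a) in left for d in by_source.get(a, ())}


def semi_naive_nonlinear(f):
    closure, delta = set(), set()  # relations of F0 and delta_0
    delta_sizes = []
    while True:
        # delta_{n+1} :- (f; exists(a: F_n(s,a), delta_n(a,d));
        #                    exists(a: delta_n(s,a), F_n(a,d))), not F_n
        derived = set(f) | compose(closure, delta) | compose(delta, closure)
        delta = derived - closure
        if not delta:
            return closure, delta_sizes  # first empty delta: halt
        delta_sizes.append(len(delta))
        closure = closure | delta


fig2 = {(f"v{i}", f"v{i + 1}") for i in range(1, 8)}
closure, delta_sizes = semi_naive_nonlinear(fig2)
print(len(closure))  # 28
print(delta_sizes)   # [7, 6, 9, 6]
```

The recorded sizes correspond to the relations of δ1 through δ4 in TABLE 2, and the growth in path length covered by each pass reflects the doubling behavior behind the logarithmic bound.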


However, this strategy still generates many duplicate tuples. Consider the tuples generated for s=v1 during the iteration that evaluates δ4 from F3 and δ3. The tuples generated during this iteration are illustrated in TABLE 3:











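TABLE 3

Disjunct used: exists(a: F3(s,a), δ3(a,d))

    • Tuple generated: (v1, v5); generated from: (v1, v2), (v2, v5)
    • Tuple generated: (v1, v6); generated from: (v1, v2), (v2, v6)
    • Tuple generated: (v1, v6); generated from: (v1, v3), (v3, v6)
    • Tuple generated: (v1, v7); generated from: (v1, v3), (v3, v7)
    • Tuple generated: (v1, v7); generated from: (v1, v4), (v4, v7)
    • Tuple generated: (v1, v8); generated from: (v1, v4), (v4, v8)
    • Tuple generated: (v1, v8); generated from: (v1, v5), (v5, v8)

Disjunct used: exists(a: δ3(s,a), F3(a,d))

    • Tuple generated: (v1, v5); generated from: (v1, v4), (v4, v5)
    • Tuple generated: (v1, v6); generated from: (v1, v4), (v4, v6)
    • Tuple generated: (v1, v7); generated from: (v1, v4), (v4, v7)
    • Tuple generated: (v1, v8); generated from: (v1, v4), (v4, v8)
    • Tuple generated: (v1, v6); generated from: (v1, v5), (v5, v6)
    • Tuple generated: (v1, v7); generated from: (v1, v5), (v5, v7)
    • Tuple generated: (v1, v8); generated from: (v1, v5), (v5, v8)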









On this iteration for v1, only four distinct tuples were generated: (v1,v5), (v1,v6), (v1,v7), and (v1,v8), of which (v1,v5) was already in F3 and therefore is not new. However, the tuple (v1,v5) was generated twice, (v1,v6) was generated four times, (v1,v7) was generated four times, and (v1,v8) was generated four times.


In general, this approach generates a quadratic number of duplicate tuples, but only a linear number of new tuples, on each iteration.
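The amount of duplication can be tallied with a short Python sketch that evaluates both existential disjuncts for s = v1 and counts how often each tuple is derived. The helper chain_pairs, which rebuilds the relations of F3 and δ3 for the FIG. 2 chain from their path lengths, is a hypothetical convenience for this example.

```python
from collections import Counter


def chain_pairs(min_steps, max_steps, nodes=8):
    """Tuples (vi, vj) of the FIG. 2 chain whose path length is in the given range."""
    return {
        (f"v{i}", f"v{j}")
        for i in range(1, nodes + 1)
        for j in range(i + min_steps, min(i + max_steps, nodes) + 1)
    }


F3 = chain_pairs(1, 4)      # relation of F3: one to four steps
delta3 = chain_pairs(3, 4)  # relation of delta_3: three or four steps

derivations = Counter()
for left, right in ((F3, delta3), (delta3, F3)):  # the two existential disjuncts
    for (s, a) in left:
        if s != "v1":
            continue  # only the derivations that start at v1 are tallied
        for (a2, d) in right:
            if a == a2:
                derivations[(s, d)] += 1

for pair, count in sorted(derivations.items()):
    print(pair, count)
```

Each pair of joined tuples counts as one derivation, so a tuple's count is the number of entries it would occupy in a listing such as TABLE 3.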


SUMMARY

This specification describes technologies relating to computing the transitive closure of a relation more efficiently in terms of iterations required and duplicate tuples produced.


The subject matter described in this specification can be implemented in particular embodiments so as to realize one or more of the following advantages. The transitive closure of a relation can be computed more quickly and using fewer computational resources. The transitive closure can be computed in O(log2(lsp)) time while also reducing or eliminating duplicate tuples that are generated.


The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates an example directed graph.



FIG. 2 illustrates another example graph.



FIG. 3 is a flow chart of an example process for computing the transitive closure of a relation.





Like reference numbers and designations in the various drawings indicate like elements.


DETAILED DESCRIPTION

This specification describes improved ways of computing the transitive closure of a relation. This specification also describes mechanisms for evaluating a relation representing a transitive closure.


An example application for an evaluation engine that can compute a transitive closure of a relation occurs in static analysis systems. Static analysis refers to techniques for analyzing computer software source code without executing the source code as a software program. A static analysis system can use a query language to determine a variety of attributes about source code in a code base.


In static analysis systems, it is useful to represent control flows of a computer program. For example, a software developer might want to know if a function A defined in a code base ever calls another function B, either directly or indirectly. Similarly, a software developer also might want to know if a variable is always initialized before it is used, or whether a control flow exists in the software in which the variable is used before it is initialized.


In very large code bases, determining an answer to these questions using manual inspection or simple text searching may be difficult or impossible. In addition, actually running the software may not yield a correct answer to the question if the testing procedures omit control flows that nevertheless exist in the software.


A static analysis system can solve this problem by processing source code in a code base to generate a relation f representing relationships between source code elements. For example, the relation can include tuples (p,c) where p represents a calling function and c represents a called function. In large code bases, such a relation may include millions of such tuples.


The static analysis system can then compute the transitive closure of f. After doing so, in order to determine whether a control flow exists in the software in which the function B is called by the function A, the system merely needs to determine whether or not (A,B) exists in the transitive closure of f.
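As a rough illustration of that lookup, the sketch below builds a closure for a tiny call relation and tests membership. The function names in `calls`, the helper `naive_closure`, and `may_call` are all hypothetical, and the closure is computed with a plain fixpoint loop as a stand-in rather than with the process of FIG. 3.

```python
def naive_closure(edges):
    """Plain fixpoint closure, used only as a stand-in to obtain F."""
    closure = set(edges)
    while True:
        # Extend every known pair by one more tuple of the original relation.
        new = {(a, d) for (a, b) in closure for (c, d) in edges if b == c} - closure
        if not new:
            return closure
        closure |= new


# Hypothetical call relation: (p, c) means function p calls function c.
calls = {("main", "parse"), ("parse", "lex"), ("main", "report")}
F = naive_closure(calls)


def may_call(caller, callee):
    """True if caller reaches callee directly or indirectly."""
    return (caller, callee) in F


print(may_call("main", "lex"))  # indirect call through parse
print(may_call("lex", "main"))  # no such control flow in this relation
```

Once F is available, each question about direct or indirect calls is a single set-membership test.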


Transitive closures are useful in many other applications and industries. For example, some digital authentication mechanisms, e.g., Pretty Good Privacy (PGP), allow a first user to digitally sign a public key of a second user. Doing so signifies that the first user trusts the second user. If a relation contains information about which users trust which other users, a transitive closure of the relation can represent chains of trust among the users. Thus, if the second user trusts a third user, the first user can or should be expected to also trust the third user. Therefore, the transitive closure can reveal such chains of trust among the users.



FIG. 3 is a flow chart of an example process for computing the transitive closure of a relation. In general, the system will iteratively add tuples to a relation Fn for each iteration n. The process will be described as being performed by an appropriately programmed system of one or more computers.


The system receives a request to compute the transitive closure of a relation (310). As described above, computing the transitive closure of a relation f generates a new relation F. The system can receive a request from a user who specifies the relation f. The system can also receive a request to automatically generate transitive closures of relations of source code elements as part of a static analysis process.


The system adds the tuples in f to both F1 and ψ1 (320). Adding the tuples in f to F1 initializes the transitive closure relation.


While computing the transitive closure, the system will also make use of an auxiliary relation ψn+1 on each iteration. In general, the relation ψn+1 includes tuples representing shortest paths having 2^n steps on each iteration n. Thus, the length of the paths represented by tuples in ψn+1 doubles on each iteration. So to calculate ψ1, the system chooses the shortest paths having 2^(1−1) = 1 step, and simply adds the tuples of f to ψ1.


The system generates new first tuples in ψn+1 by matching destination elements of tuples in ψn with source elements of tuples in ψn, and excluding any tuples in Fn (330). In other words, the system determines which tuples in ψn have a destination element that matches a source element of another tuple in ψn.


The new first tuples for iteration n will be tuples representing 2^n chained uses of f. If the tuples represent edges in a graph, conceptually this step can be thought of as extending paths in the graph represented by tuples in ψn with paths also represented by the tuples in ψn. Because the tuples in ψn represent paths having length 2^(n−1), the new first tuples in ψn+1 will represent a doubling of those path lengths, resulting in paths having a length 2^n.


The system determines whether new first tuples have been produced on this iteration (340). If not, the system provides an indication that Fn is the transitive closure of f (branch to 350).


If new first tuples have been produced, the system generates new second tuples by matching destination elements of the new first tuples in ψn+1 with source elements of the tuples in Fn (branch to 360). In other words, the system uses the newly generated first tuples of ψn+1 to generate new second tuples.


To do so, the system determines which of the new first tuples in ψn+1 have a destination element that matches a source element of a tuple in Fn. Note that, at this point, Fn has not been updated to include the new first tuples.


If the tuples represent edges in a graph, this step can be thought of as extending paths in the graph represented by the new first tuples in ψn+1 with paths represented by the tuples in Fn. Doing so generates new second tuples that represent paths having a length greater than 2^n steps and less than 2^(n+1) steps.


The system adds the tuples in Fn, the new first tuples, and the new second tuples to Fn+1 (370). In other words, the system defines a new relation Fn+1. In practice, the system need not generate an entirely new relation. Rather, the system can merely add the new first tuples and the new second tuples to the relation of Fn and designate the resulting relation as Fn+1.
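Steps 320 through 370 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular evaluation engine: relations are held as in-memory sets of pairs, and the names `join` and `transitive_closure` are hypothetical.

```python
from collections import defaultdict


def join(left, right):
    """Return every (s, d) with (s, a) in left and (a, d) in right."""
    by_source = defaultdict(set)
    for a, d in right:
        by_source[a].add(d)
    return {(s, d) for s, a in left for d in by_source.get(a, ())}


def transitive_closure(f):
    F = set(f)    # step 320: F1 starts as f
    psi = set(f)  # step 320: psi1 starts as f
    while True:
        first = join(psi, psi) - F  # step 330: new first tuples, excluding Fn
        if not first:               # step 340: nothing new was produced
            return F                # step 350: Fn is the transitive closure
        second = join(first, F)     # step 360: F is still Fn at this point
        F = F | first | second      # step 370: this union is Fn+1
        psi = first                 # the new first tuples become psi(n+1)


# A chain of seven tuples (v1,v2) ... (v7,v8).
chain = {(f"v{i}", f"v{i + 1}") for i in range(1, 8)}
closure = transitive_closure(chain)
print(len(closure))  # 28 tuples for this chain
```

Indexing the right-hand relation by source element keeps each join proportional to the number of matches rather than to the product of the two relation sizes.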


In the graph example, the sets of tuples that make up Fn+1 respectively represent (1) paths having length less than 2^n for Fn, (2) paths having length of exactly 2^n for ψn+1, and (3) paths having length greater than 2^n but less than 2^(n+1) for the tuples generated by extending ψn+1 by the tuples in Fn. Combined together, these three sets represent all tuples of all elements separated by shortest paths of length less than 2^(n+1).
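The length accounting for the three sets can be written out explicitly. In the block below, the symbol \ell is introduced here only as shorthand for the number of steps in a path; the last two lines instantiate the bounds for n = 2.

```latex
\begin{align*}
F_n &:\quad 1 \le \ell \le 2^{n} - 1 \\
\psi_{n+1} &:\quad \ell = 2^{n} \\
\psi_{n+1}\ \text{extended by}\ F_n &:\quad 2^{n} + 1 \le \ell \le 2^{n} + (2^{n} - 1) = 2^{n+1} - 1 \\
F_{n+1} &:\quad 1 \le \ell \le 2^{n+1} - 1 \\[4pt]
n = 2 &:\quad F_2 \text{ covers } 1 \le \ell \le 3,\quad \psi_3 \text{ covers } \ell = 4, \\
      &\phantom{:}\quad \text{the extension covers } 5 \le \ell \le 7,\quad F_3 \text{ covers } 1 \le \ell \le 7
\end{align*}
```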


This approach reduces or eliminates duplicate tuples for the following reasons. The relation for ψn+1 has, on iteration n, tuples representing shortest paths of length 2^n. Therefore, a tuple (a,c) in ψn+1 represents a shortest path formed by joining a tuple (a,b) in Fn with a tuple (b,c) in Fn. The tuple (a,c) necessarily represents the shortest path from a to c because if a shorter path existed, (a,c) would already be in Fn, and therefore would not be in ψn+1, since elements of Fn are explicitly excluded from ψn+1.


Then, the system extends the paths in ψn+1 with all shortest paths having length less than 2^n, resulting in all shortest paths having a length between 2^n and 2^(n+1). Notably, the system does not need to explore or extend any paths having a length less than 2^n because all of those paths were previously explored on a previous iteration. In other words, once a tuple (s,d) is added to Fn, the path that generated (s,d) is not explored again.


Therefore, the system only generates tuples representing shortest paths, which eliminates much, if not most, of the duplicate generation when computing the transitive closure. In this example, it eliminates duplicates entirely.


In general, duplicates are only generated due to branching or loops in the tuples of f. In FIG. 1, for example, (v1,v5) would be generated twice due to the path through v2, v3, and v4 and the path through v6 and v7. And (v1,v6) would be generated twice due to (v1,v6) in f and the path (v1,v6), (v6,v7), (v7,v6).
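To make the effect of branching concrete, the sketch below counts how many times tuples are generated versus how many distinct tuples are actually added. It is an illustration under assumptions: the `chain` and `diamond` relations are hypothetical inputs (they are not the graphs of FIG. 1 or FIG. 2), and "generated" is taken to mean one count per join match.

```python
from collections import defaultdict


def matches(left, right):
    """List one (s, d) per join match, so repeated derivations stay visible."""
    by_source = defaultdict(list)
    for a, d in right:
        by_source[a].append(d)
    return [(s, d) for s, a in left for d in by_source.get(a, ())]


def count_generation(f):
    """Return (generated, added, duplicates) over the whole run."""
    F, psi = set(f), set(f)
    generated = added = 0
    while True:
        # New first tuples: psi joined with itself, excluding tuples in F.
        first_hits = [t for t in matches(psi, psi) if t not in F]
        first = set(first_hits)
        if not first:
            return generated, added, generated - added
        # New second tuples: first tuples extended by the not-yet-updated F.
        second_hits = matches(first, F)
        F_next = F | first | set(second_hits)
        generated += len(first_hits) + len(second_hits)
        added += len(F_next) - len(F)
        F, psi = F_next, first


chain = {(i, i + 1) for i in range(1, 8)}
diamond = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")}
print(count_generation(chain))    # (21, 21, 0): no duplicates on a chain
print(count_generation(diamond))  # (5, 4, 1): (a, d) is derived via b and via c
```

The single duplicate in the second run comes from the two branches that rejoin at d, which matches the observation that duplicates arise only from branching or loops.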


The proportion of duplicates that are still generated depends on the type of data represented by the relation. For example, computer software tends to have long sequences of sequentially evaluated expressions and statements, with relatively few branches and loops. Therefore, where the data represents the control flow of a computer program, as is the case for static analysis, almost all of the duplication will be eliminated.


To efficiently calculate the transitive closure of f, an evaluation engine can perform the example process by evaluating the following predicates for n=1, 2, etc., until ψn+1 is empty, at which point Fn will be the transitive closure of f:

    • ψ1(s,d) :- f(s,d)
    • F1(s,d) :- f(s,d)
    • ψn+1(s,d) :- exists(a: ψn(s,a), ψn(a,d)), not Fn(s,d)
    • Fn+1(s,d) :- Fn(s,d);
                  ψn+1(s,d);
                  exists(a: ψn+1(s,a), Fn(a,d))


Computing the ψn+1(s,d) predicate represents computing the new first tuples on each iteration. The existential term “exists(a: ψn+1(s,a), Fn(a,d))” represents computing the new second tuples on each iteration.
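The two recursive predicates translate almost term by term into set comprehensions. The following sketch assumes relations are Python sets of pairs; `psi_next` and `F_next` are hypothetical names, and the input is the seven-tuple relation f listed for ψ1 in TABLE 4.

```python
def psi_next(psi_n, F_n):
    # psi(n+1)(s,d) :- exists(a: psi(n)(s,a), psi(n)(a,d)), not F(n)(s,d)
    return {(s, d)
            for (s, a) in psi_n
            for (b, d) in psi_n
            if a == b and (s, d) not in F_n}


def F_next(F_n, psi_n1):
    # F(n+1)(s,d) :- F(n)(s,d); psi(n+1)(s,d); exists(a: psi(n+1)(s,a), F(n)(a,d))
    extended = {(s, d) for (s, a) in psi_n1 for (b, d) in F_n if a == b}
    return F_n | psi_n1 | extended


f = {(f"v{i}", f"v{i + 1}") for i in range(1, 8)}
psi1, F1 = set(f), set(f)   # the two base predicates
psi2 = psi_next(psi1, F1)
F2 = F_next(F1, psi2)
psi3 = psi_next(psi2, F2)
F3 = F_next(F2, psi3)
psi4 = psi_next(psi3, F3)

print(sorted(psi3))  # [('v1', 'v5'), ('v2', 'v6'), ('v3', 'v7'), ('v4', 'v8')]
print(len(F2), len(F3), psi4 == set())  # 18 28 True
```

The nested comprehensions favor readability over speed; an engine would normally evaluate the existential terms with an indexed join.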


Evaluation of the transitive closure using these predicates is illustrated below in TABLE 4. This example uses the example relation f that represented edges in the graph shown in FIG. 2. Notably, using these predicates results in the transitive closure being generated in log2(lsp) time with no duplicate tuples being generated at all.











TABLE 4

| Predicate | Relation | Comments |
| --- | --- | --- |
| ψ1(s,d) | {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8)} | ψ1(s,d) evaluates to: f(s,d) |
| F1(s,d) | {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8)} | F1(s,d) evaluates to: f(s,d) |
| ψ2(s,d) | {(v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8)} | ψ2(s,d) evaluates to: exists(a: ψ1(s,a), ψ1(a,d)), not F1(s,d). The tuples in ψ2 are tuples representing paths in ψ1 extended by paths in ψ1. The tuples in ψ2 represent shortest paths that are 2^1 steps long, which in this case is 2 steps in the graph. |
| F2(s,d) | {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8)} | F2(s,d) evaluates to: F1(s,d); ψ2(s,d); exists(a: ψ2(s,a), F1(a,d)). The tuples produced are those in F1, those in ψ2, or those in ψ2 that can be extended by a tuple in F1. We have F1: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8)}; ψ2: {(v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8)}. For the last term, because F1 includes only the tuples in f, ψ2 is only going to be extended by single steps in the graph. Since ψ2 included tuples representing two steps in the graph, the last term will include tuples representing three steps in the graph, or: {(v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8)}. In other words, F1 includes tuples representing one step in the graph, ψ2 includes tuples representing two steps in the graph, and the last term generates tuples representing three steps in the graph. Thus, F2 will now have tuples representing one, two, and three steps in the graph. |
| ψ3(s,d) | {(v1, v5), (v2, v6), (v3, v7), (v4, v8)} | ψ3(s,d) evaluates to: exists(a: ψ2(s,a), ψ2(a,d)), not F2(s,d). The tuples in ψ3 are tuples representing paths in ψ2 extended by paths in ψ2. The tuples in ψ3 represent paths that are 2^2 steps long, which in this case is 4 steps in the graph. |
| F3(s,d) | {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8), (v1, v5), (v2, v6), (v3, v7), (v4, v8), (v1, v6), (v1, v7), (v1, v8), (v2, v7), (v2, v8), (v3, v8)} | F3(s,d) evaluates to: F2(s,d); ψ3(s,d); exists(a: ψ3(s,a), F2(a,d)). The tuples produced are those in F2, those in ψ3, or those in ψ3 that can be extended by a tuple in F2. As described above, F2 included all tuples representing one, two, and three steps in the graph, and ψ3 included tuples that represent four steps in the graph. Thus, the four-step tuples in ψ3 are going to be extended by all other tuples that represent one, two, and three steps in the graph, which will result in all tuples that represent five, six, and seven steps in the graph: {(v1, v6), (v1, v7), (v1, v8), (v2, v7), (v2, v8), (v3, v8)} |
| ψ4(s,d) | { } | ψ4(s,d) evaluates to: exists(a: ψ3(s,a), ψ3(a,d)), not F3(s,d). The tuples in ψ4 are tuples representing paths in ψ3 extended by paths in ψ3, or tuples representing paths having length 2^3 = 8 steps. However, since there are no such paths of 8 steps in the graph, ψ4(s,d) is empty. |









After the system determines that ψ4(s,d) is empty, the system can end processing. This is because if ψ4(s,d) is empty, there are no new first tuples to be extended and thus, no second tuples either. Therefore, no new tuples are generated on this iteration.


Using this evaluation process, the system can compute the transitive closure in ⌊log2(lsp)⌋+1 iterations without generating any duplicate tuples.
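The iteration count can be checked against TABLE 4. The short derivation below reads the bound as the floor of log2(lsp) plus one, which is an interpretation rather than a statement taken from the text, and uses n_stop as a hypothetical name for the iteration on which the auxiliary relation first comes out empty. Since ψn+1 holds the shortest paths of exactly 2^n steps, it is empty for the first n with 2^n greater than lsp.

```latex
\begin{align*}
n_{\text{stop}} &= \min\{\, n \ge 1 : 2^{n} > \mathit{lsp} \,\}
                 = \lfloor \log_2(\mathit{lsp}) \rfloor + 1 \\
\mathit{lsp} = 7 &\;\Longrightarrow\; \log_2 7 \approx 2.807,\qquad
                 n_{\text{stop}} = 2 + 1 = 3
\end{align*}
```

For the relation of TABLE 4, whose longest shortest path runs from v1 to v8 in 7 steps, this gives three iterations, which agrees with ψ4 being the first empty auxiliary relation.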


The examples above assume that the tuples of the relation f are ordered. Thus, in the graph context, the relation f represents edges in a directed graph. If, however, the relation f has unordered tuples, e.g., for an undirected graph, the system can use the same procedure described above on a new relation g(s,d) that is the disjunction of f(s,d) and f(d,s). In other words, the relation g includes all tuples in f, as well as new tuples generated by reversing the source and destination elements of the tuples in f.
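Constructing g is a one-line set operation. In the sketch below, `symmetrize` and `undirected_edges` are hypothetical names, and the relation is assumed to be a Python set of pairs.

```python
def symmetrize(f):
    """Build g, where g(s, d) holds if f(s, d) or f(d, s) holds."""
    return set(f) | {(d, s) for (s, d) in f}


# Hypothetical undirected graph with edges a-b and b-c, each stored once.
undirected_edges = {("a", "b"), ("b", "c")}
g = symmetrize(undirected_edges)
print(sorted(g))  # [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```

The resulting g can then be passed to the same closure procedure in place of f.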


Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.


The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be or further include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.


A computer program, which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.


The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).


Computers suitable for the execution of a computer program can be based, by way of example, on general or special purpose microprocessors or both, or on any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.


Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.


Control of the various systems described in this specification, or portions of them, can be implemented in a computer program product that includes instructions that are stored on one or more non-transitory machine-readable storage media, and that are executable on one or more processing devices. The systems described in this specification, or portions of them, can be implemented as an apparatus, method, or electronic system that may include one or more processing devices and memory to store executable instructions to perform the operations described in this specification.


To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser.


In addition to the embodiments of the attached claims and the embodiments described above, the following embodiments are also innovative:


Embodiment 1 is a method comprising:


receiving a request to compute a transitive closure F of a relation f, wherein the relation f includes tuples that each relate a source element s to a destination element d;


initializing F of an initial iteration with the tuples in f;


initializing an auxiliary relation of the initial iteration with the tuples in f;


iteratively computing new first tuples and new second tuples on each iteration until no new first tuples are generated, including:

    • generating new first tuples on each iteration by matching destination elements of tuples in the auxiliary relation of the previous iteration with source elements of tuples in the auxiliary relation of the previous iteration,
    • generating second tuples on each iteration by matching destination elements of the new first tuples with source elements of tuples in F of the previous iteration, and
    • adding the new first tuples and the new second tuples to F of the current iteration; and


providing an indication that the tuples in F of the current iteration represent the transitive closure of f.


Embodiment 2 is the method of embodiment 1, further comprising:


generating evaluation predicates that include:

    • a first predicate that when evaluated generates the new first tuples,
    • a second predicate that when evaluated generates the new second tuples, and
    • an Fn+1 predicate that when evaluated generates a relation having tuples in Fn, tuples generated by the first predicate, and tuples generated by the second predicate,


wherein iteratively computing new first tuples and new second tuples comprises iteratively evaluating the evaluation predicates.


Embodiment 3 is the method of embodiment 2, wherein:


the first predicate is defined by:

    • ψ1(s,d) :- f(s,d)
    • ψn+1(s,d) :- exists(a: ψn(s,a), ψn(a,d)), not Fn(s,d),


and the second predicate is defined by:

    • F1(s,d) :- f(s,d)
    • Fn+1(s,d) :- Fn(s,d); ψn+1(s,d); exists(a: ψn+1(s,a), Fn(a,d)).


Embodiment 4 is the method of any one of embodiments 1-3, further comprising:


adding the new first tuples to a new auxiliary relation for a current iteration.


Embodiment 5 is the method of any one of embodiments 1-4, wherein the new first tuples for an iteration n includes tuples representing 2^n chained uses of f.


Embodiment 6 is the method of any one of embodiments 1-5, wherein the second tuples for an iteration n includes tuples representing between 2^n and 2^(n+1) chained uses of f.


Embodiment 7 is the method of any one of embodiments 1-6, further comprising computing the transitive relation of f while generating duplicate tuples only due to branches and loops.


Embodiment 8 is the method of any one of embodiments 1-7, further comprising computing the transitive relation of f in O(log2(lsp)) iterations, wherein lsp represents a length of the longest shortest path of tuples in F.


Embodiment 9 is a system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform the method of any one of embodiments 1 to 8.


Embodiment 10 is a computer storage medium encoded with a computer program, the program comprising instructions that are operable, when executed by data processing apparatus, to cause the data processing apparatus to perform the method of any one of embodiments 1 to 8.


While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.


Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.


Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.

Claims
  • 1. A computer-implemented method comprising: receiving a request to compute a transitive closure F of a relation f, wherein the relation f includes tuples that each relate a source element s to a destination element d; initializing F of an initial iteration with the tuples in f; initializing an auxiliary relation of the initial iteration with the tuples in f; iteratively computing new first tuples and new second tuples on each iteration until no new first tuples are generated, including: generating new first tuples on each iteration by matching destination elements of tuples in the auxiliary relation of a previous iteration with source elements of tuples in the auxiliary relation of the previous iteration, generating second tuples on each iteration by matching destination elements of the new first tuples with source elements of tuples in F of the previous iteration, and adding the new first tuples and the new second tuples to F of a current iteration; and providing an indication that the tuples in F of the current iteration represent the transitive closure of f.
  • 2. The method of claim 1, further comprising: generating evaluation predicates that include: a first predicate that when evaluated generates the new first tuples, a second predicate that when evaluated generates the new second tuples, and an Fn+1 predicate that when evaluated generates a relation having tuples in Fn, tuples generated by the first predicate, and tuples generated by the second predicate, wherein iteratively computing new first tuples and new second tuples comprises iteratively evaluating the evaluation predicates.
  • 3. The method of claim 2, wherein: the first predicate is defined by:
      ψ1(s,d) :- f(s,d)
      ψn+1(s,d) :- exists(a: ψn(s,a), ψn(a,d)), not Fn(s,d),
    and the second predicate is defined by:
      F1(s,d) :- f(s,d)
      Fn+1(s,d) :- Fn(s,d); ψn+1(s,d); exists(a: ψn+1(s,a), Fn(a,d)).
  • 4. The method of claim 1, further comprising: adding the new first tuples to a new auxiliary relation for a current iteration.
  • 5. The method of claim 1, wherein the new first tuples for an iteration n includes tuples representing 2^n chained uses of f.
  • 6. The method of claim 1, wherein the second tuples for an iteration n includes tuples representing between 2^n and 2^(n+1) chained uses of f.
  • 7. The method of claim 1, further comprising computing the transitive relation of f while generating duplicate tuples only due to branches and loops.
  • 8. The method of claim 1, further comprising computing the transitive relation of f in O(log2(lsp)) iterations, wherein lsp represents a length of a longest shortest path of tuples in F.
  • 9. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving a request to compute a transitive closure F of a relation f, wherein the relation f includes tuples that each relate a source element s to a destination element d; initializing F of an initial iteration with the tuples in f; initializing an auxiliary relation of the initial iteration with the tuples in f; iteratively computing new first tuples and new second tuples on each iteration until no new first tuples are generated, including: generating new first tuples on each iteration by matching destination elements of tuples in the auxiliary relation of a previous iteration with source elements of tuples in the auxiliary relation of the previous iteration, generating second tuples on each iteration by matching destination elements of the new first tuples with source elements of tuples in F of the previous iteration, and adding the new first tuples and the new second tuples to F of a current iteration; and providing an indication that the tuples in F of the current iteration represent the transitive closure of f.
  • 10. The system of claim 9, wherein the operations further comprise: generating evaluation predicates that include: a first predicate that when evaluated generates the new first tuples, a second predicate that when evaluated generates the new second tuples, and an Fn+1 predicate that when evaluated generates a relation having tuples in Fn, tuples generated by the first predicate, and tuples generated by the second predicate, wherein iteratively computing new first tuples and new second tuples comprises iteratively evaluating the evaluation predicates.
  • 11. The system of claim 10, wherein: the first predicate is defined by:
      ψ1(s,d) :- f(s,d)
      ψn+1(s,d) :- exists(a: ψn(s,a), ψn(a,d)), not Fn(s,d),
    and the second predicate is defined by:
      F1(s,d) :- f(s,d)
      Fn+1(s,d) :- Fn(s,d); ψn+1(s,d); exists(a: ψn+1(s,a), Fn(a,d)).
  • 12. The system of claim 9, wherein the operations further comprise: adding the new first tuples to a new auxiliary relation for a current iteration.
  • 13. The system of claim 9, wherein the new first tuples for an iteration n includes tuples representing 2^n chained uses of f.
  • 14. The system of claim 9, wherein the second tuples for an iteration n includes tuples representing between 2^n and 2^(n+1) chained uses of f.
  • 15. The system of claim 9, wherein the operations further comprise computing the transitive relation of f while generating duplicate tuples only due to branches and loops.
  • 16. The system of claim 9, wherein the operations further comprise computing the transitive relation of f in O(log2(lsp)) iterations, wherein lsp represents a length of a longest shortest path of tuples in F.
  • 17. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving a request to compute a transitive closure F of a relation f, wherein the relation f includes tuples that each relate a source element s to a destination element d; initializing F of an initial iteration with the tuples in f; initializing an auxiliary relation of the initial iteration with the tuples in f; iteratively computing new first tuples and new second tuples on each iteration until no new first tuples are generated, including: generating new first tuples on each iteration by matching destination elements of tuples in the auxiliary relation of a previous iteration with source elements of tuples in the auxiliary relation of the previous iteration, generating second tuples on each iteration by matching destination elements of the new first tuples with source elements of tuples in F of the previous iteration, and adding the new first tuples and the new second tuples to F of a current iteration; and providing an indication that the tuples in F of the current iteration represent the transitive closure of f.
  • 18. The computer program product of claim 17, wherein the operations further comprise: generating evaluation predicates that include: a first predicate that when evaluated generates the new first tuples, a second predicate that when evaluated generates the new second tuples, and an Fn+1 predicate that when evaluated generates a relation having tuples in Fn, tuples generated by the first predicate, and tuples generated by the second predicate, wherein iteratively computing new first tuples and new second tuples comprises iteratively evaluating the evaluation predicates.
  • 19. The computer program product of claim 18, wherein: the first predicate is defined by:
      ψ1(s,d) :- f(s,d)
      ψn+1(s,d) :- exists(a: ψn(s,a), ψn(a,d)), not Fn(s,d),
    and the second predicate is defined by:
      F1(s,d) :- f(s,d)
      Fn+1(s,d) :- Fn(s,d); ψn+1(s,d); exists(a: ψn+1(s,a), Fn(a,d)).
  • 20. The computer program product of claim 17, wherein the operations further comprise: adding the new first tuples to a new auxiliary relation for a current iteration.
  • 21. The computer program product of claim 17, wherein the new first tuples for an iteration n includes tuples representing 2^n chained uses of f.
  • 22. The computer program product of claim 17, wherein the second tuples for an iteration n includes tuples representing between 2^n and 2^(n+1) chained uses of f.
  • 23. The computer program product of claim 17, wherein the operations further comprise computing the transitive relation of f while generating duplicate tuples only due to branches and loops.
  • 24. The computer program product of claim 17, wherein the operations further comprise computing the transitive relation of f in O(log2(lsp)) iterations, wherein lsp represents a length of a longest shortest path of tuples in F.