The invention relates to computer software and modeling, and more specifically to a computer implemented method of persisting directed acyclic graphs.
A common construct in computer science is the directed acyclic graph (DAG). This construct has a set of nodes, one of which is a root, and directed edges between some pairs of nodes, such that every node is reachable from the root and such that there are no cycles of edges. By way of a non-limiting example of where a DAG arises, the folders in a typical file system without shortcuts form a tree, but after adding shortcuts between folders the structure typically becomes a DAG, since there is often more than one path from the root to a folder.
A particularly important DAG is the one formed by the classes in an object-oriented modeling or programming language, for example UML static class diagrams and/or C++, once an inheritance relationship, including multiple inheritance, is added between the classes, provided that a top level class such as ‘object’ is provided in the language. An example of this type of DAG is shown in
A known challenge for most storage mechanisms, such as relational databases which represent data in tabular format, is that they are not able to directly represent DAGs. It is possible to represent DAGs using one table to list all the nodes and another table to list all the edges, where each edge is represented as a pair of nodes. In a non-limiting example, a DAG is represented using two columns in an edge table, with one column holding the node the edge leads from and the other column holding the node the edge leads to. However, this representation is very inefficient for answering certain standard queries such as: “find all the descendants of a node n”. Such a query cannot be expressed as a single database query using this representation and cannot be answered in reasonable time.
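The cost of the edge-pair representation can be illustrated with a minimal sketch. The function name `descendants_from_edges` and the in-memory list standing in for the edge table are assumptions made purely for illustration. Each visited node triggers another scan of the edge table, so the number of lookups grows with the size of the sub-graph below the starting node.

```python
from collections import deque

# Hypothetical edge table: each row is a (from_node, to_node) pair.
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("A", "E")]


def descendants_from_edges(edges, start):
    """Collect all descendants of `start` by repeated edge-table scans."""
    found = set()
    queue = deque([start])
    lookups = 0
    while queue:
        node = queue.popleft()
        lookups += 1  # one scan of the edge table per visited node
        for src, dst in edges:
            if src == node and dst not in found:
                found.add(dst)
                queue.append(dst)
    return found, lookups
```

Calling `descendants_from_edges(EDGES, "A")` returns the descendant set together with the number of scans that were needed, which is what a stored path table is intended to avoid.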
Most prior art algorithms focus on finding, or at least approximating, a transitive closure of the DAG, i.e. a list of all pairs of nodes which are connected through one or more edges. In particular, many algorithms focus on finding the entire transitive closure, as required. However, if a DAG is changing reasonably often, then computing a transitive closure each time the DAG changes is extremely inefficient.
One particular prior art reference, “Maintaining Transitive Closure of Graphs in SQL”, by Ghozhu Dong et al., published 1999 in the International Journal of Information Technology, the entire contents of which is incorporated herein by reference, and in particular section 2 thereof, Transitive Closure of Acyclic Graphs, goes further and presents a way of persisting in memory a representation of the transitive closure. Whenever the DAG is updated, the transitive closure is also updated, thereby saving the need to compute it from scratch after every change. The transitive closure may be used to rapidly answer queries like “is node F a descendant of node A” or “find all descendants of node X”. However, there is no efficient algorithm for updating the transitive closure when an edge is removed from the DAG, thus making this solution inappropriate for very large DAGs, or for DAGs whose edges are deleted frequently.
Accordingly, it is a principal object of the present invention to provide a computer implemented method of representing DAGs in memory. Instead of persisting just the nodes and edges of the DAGs, or just the nodes, edges and transitive closure of the DAGs, the present invention persists a full enumeration of all paths in the DAG and updates it whenever the DAG updates. Storing a representation of all paths requires more memory than storing a representation of the transitive closure, and significantly more memory than just storing the nodes and edges. Advantageously, however, storing the representation of all paths makes certain important queries significantly faster. Additionally, the algorithms for updating the path table when an edge is deleted are more efficient than the algorithms for updating a transitive closure table of the prior art.
Additional features and advantages of the invention will become apparent from the following drawings and description.
For a better understanding of the invention and to show how the same may be carried into effect, reference will now be made, purely by way of example, to the accompanying drawings in which like numerals designate corresponding elements or sections throughout.
With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice. In the accompanying drawings:
The present embodiments enable a computer implemented method of representing DAGs in memory, comprising persisting a full path table in memory, and updating the path table whenever the DAG updates.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.
User input device 50 is illustrated as a keyboard, however this is not meant to be limiting in any way. The use of any or all of a pointing device, a voice input, or a touch screen is equally applicable and is specifically included. Memory 70 is illustrated as being internal to computing device 20, however this is not meant to be limiting in any way. All or parts of memory 70 may be provided external to computing device 20, such as on a network server, the Internet, or a removable computer readable medium, without exceeding the scope of the invention.
Computing platform 10 has been described as having a monitor and a user input device 50 associated therewith, however this is not meant to be limiting in any way. In one embodiment computing platform 10 is constituted of a server, comprising a web server or other application programming interface for processing requests received from a connected network.
Memory 70 of computing device 20 is further operative to store the computer implemented method according to the principle of the invention in computer readable format for execution by computing device 20.
The invention addresses a situation where DAG 71 is represented in memory 70 as a list of nodes and list of edges, each edge being an ordered pair of nodes, which changes from time to time. In one non-limiting embodiment DAG 71 represents folders in a file system and in another non-limiting embodiment DAG 71 represents classes in a dynamically changing object-oriented data model.
According to an embodiment of the invention, there is additionally stored in memory 70 a data representation of all paths in the DAG, which is stored within nodes table 72; edges table 73; path table 74; path detail table 75; and optional transitive closure table 76.
In particular, referring to
In particular,
In one embodiment, the method according to a principle of the invention captures every single path from any node to any node in DAG 71. Every path is an ordered list of edges. For efficiency, the method according to the principle of the invention also directly points at both the starting point and the end point of the path, even though those may be calculated from the first and last edges in the path. In one embodiment (not shown), the “empty path” from each node to itself is also stored in path table 74, without any corresponding path details in path detail table 75.
With the existence of path table 74, certain important queries can be performed much more directly and efficiently than would otherwise be possible, even using a single query of a database query language such as SQL.
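One possible in-memory shape for the path table and the path detail table is sketched below. The class and field names (`Path`, `PathStore`, `start`, `end`, `details`, `next_id`) are illustrative assumptions and are not taken from the specification; only the idea of storing each path as an ordered list of edges together with explicit start and end nodes, and of storing empty paths without detail rows, comes from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (from_node, to_node)


@dataclass
class Path:
    path_id: int
    start: str  # stored directly, although derivable from the first edge
    end: str    # stored directly, although derivable from the last edge


@dataclass
class PathStore:
    paths: Dict[int, Path] = field(default_factory=dict)          # path table
    details: Dict[int, List[Edge]] = field(default_factory=dict)  # path detail table
    next_id: int = 1

    def _new_path(self, start: str, end: str) -> Path:
        path = Path(self.next_id, start, end)
        self.paths[path.path_id] = path
        self.next_id += 1
        return path

    def add_path(self, edges: List[Edge]) -> Path:
        """Store a non-empty path given as an ordered list of edges."""
        if not edges:
            raise ValueError("use add_empty_path for a path with no edges")
        for (_, prev_to), (next_from, _) in zip(edges, edges[1:]):
            if prev_to != next_from:
                raise ValueError("consecutive edges must share a node")
        path = self._new_path(edges[0][0], edges[-1][1])
        self.details[path.path_id] = list(edges)
        return path

    def add_empty_path(self, node: str) -> Path:
        """Store the empty path at a node; it has no path detail rows."""
        return self._new_path(node, node)
```

Keeping `start` and `end` on the path record duplicates information held in the detail rows, but it lets ancestor and descendant checks run against the path table alone.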
1. Is B a descendant of A?
Whether A is an ancestor of B is tested directly by querying whether one or more paths exist in path table 74 whose start is A and whose end is B.
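The check above can be sketched as a single SQL query. This is a minimal illustration using Python's built-in `sqlite3` module; the table name `path` and the column names `start` and `finish` are assumptions (the column is called `finish` here only because `END` is an SQL keyword), and the rows are the start and finish nodes of the paths of a small DAG with edges AB, AC, BD, CD and AE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE path (id INTEGER PRIMARY KEY, start TEXT, finish TEXT)"
)
# Start and finish of each stored path; A reaches D by two different paths.
conn.executemany(
    "INSERT INTO path (start, finish) VALUES (?, ?)",
    [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
     ("A", "E"), ("A", "D"), ("A", "D")],
)


def is_descendant(conn, ancestor, node):
    """Return True if at least one stored path leads from ancestor to node."""
    row = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM path WHERE start = ? AND finish = ?)",
        (ancestor, node),
    ).fetchone()
    return bool(row[0])
```

With an index on `(start, finish)`, a hypothetical tuning choice not discussed in the text, the lookup does not depend on the depth of the DAG.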
In another embodiment, transitive closure table 76 is also implemented. Transitive closure table 76 is always precisely equal to path table 74, with the exception that all duplications of paths which have the same start and finish have been removed. Each pair in transitive closure table 76 preferably includes a count of how many paths correspond to that pair, so that the pair is removed when the count reaches 0. Thus, when a path is deleted, the corresponding transitive closure entry is updated or deleted by using the count, without the need to query to see if there are other paths with the same start and finish.
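The count-based maintenance described above can be sketched as follows. The function names `build_closure` and `remove_paths` are hypothetical, and the sketch assumes that removing an edge means deleting every stored path that contains that edge; paths are represented as tuples of `(from, to)` edges.

```python
from collections import Counter

# The seven non-empty paths of a DAG with edges AB, AC, BD, CD, AE.
PATHS = [
    (("A", "B"),), (("A", "C"),), (("B", "D"),), (("C", "D"),), (("A", "E"),),
    (("A", "B"), ("B", "D")),
    (("A", "C"), ("C", "D")),
]


def build_closure(paths):
    """Count how many paths share each (start, finish) pair."""
    return Counter((p[0][0], p[-1][1]) for p in paths)


def remove_paths(paths, closure, edge):
    """Drop every path using `edge`; decrement counts, deleting pairs at 0."""
    kept = []
    for p in paths:
        if edge in p:
            key = (p[0][0], p[-1][1])
            closure[key] -= 1
            if closure[key] == 0:
                del closure[key]  # no remaining path connects this pair
        else:
            kept.append(p)
    return kept
```

Because each closure entry carries its own count, deleting a path never requires a second query for other paths with the same start and finish.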
It will be appreciated that a query “find all descendants of A” or “find all ancestors of A” can also be achieved with a direct query of the path table, although duplicate results should then be removed, or by querying the transitive closure table.
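The descendant and ancestor queries mentioned above can be sketched against the path table directly, with `SELECT DISTINCT` removing the duplicate results. As before, the `sqlite3` setup, the table name `path` and the column names `start` and `finish` are assumptions for illustration only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE path (start TEXT, finish TEXT)")
# Start and finish of each path of a DAG with edges AB, AC, BD, CD, AE.
db.executemany(
    "INSERT INTO path VALUES (?, ?)",
    [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
     ("A", "E"), ("A", "D"), ("A", "D")],
)


def descendants(db, node):
    """All nodes reachable from `node`, with duplicates removed."""
    rows = db.execute(
        "SELECT DISTINCT finish FROM path WHERE start = ? ORDER BY finish",
        (node,),
    )
    return [r[0] for r in rows]


def ancestors(db, node):
    """All nodes from which `node` is reachable, with duplicates removed."""
    rows = db.execute(
        "SELECT DISTINCT start FROM path WHERE finish = ? ORDER BY start",
        (node,),
    )
    return [r[0] for r in rows]
```

Querying a transitive closure table instead would make the `DISTINCT` unnecessary, since that table already holds one row per start and finish pair.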
In stage 1110, for each combination of a path from set (A) followed by a path from set (B), of the path sets A and B identified in stage 1110, a new path with nodes C1 . . . Cn D1 . . . Dn is added to path table 74, and its details are added to path detail table 75 (i.e. the edges C1-C2, C2-C3, . . . Cn-D1, D1-D2 . . . ). The set of combinations of paths from A and paths from B is sometimes known in mathematics as the cross product, or Cartesian product. Optionally, in stage 1120, transitive closure table 76 is updated by adding, for each new path C1 . . . Cn D1 . . . Dn, a transitive closure C1-Dn if it does not exist, or optionally incrementing its count if it already exists. The below pseudo code implements adding an edge to a DAG and the corresponding path table 74 and path detail table 75.
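The following is a minimal Python sketch of the edge-addition stages, not a reproduction of the original pseudo code. The class name `PathIndex` and its attributes are hypothetical. The sketch assumes that set (A) is every stored path ending at the source node of the new edge plus the empty path at that node, that set (B) is every stored path starting at the target node plus the empty path there, and that empty paths are handled implicitly rather than stored. The cycle guard and the handling of duplicate edges are likewise assumptions added for safety.

```python
from collections import Counter
from itertools import product


class PathIndex:
    def __init__(self):
        self.edges = set()        # edge table
        self.paths = []           # each path is a tuple of (from, to) edges
        self.closure = Counter()  # (start, finish) -> number of paths

    def add_edge(self, u, v):
        """Add edge u->v and every new path that runs through it."""
        if u == v or self.closure[(v, u)] > 0:
            raise ValueError("edge would create a cycle")
        if (u, v) in self.edges:
            return []  # assumption: a duplicate edge is ignored
        # Set (A): paths ending at u, plus the empty path at u.
        set_a = [()] + [p for p in self.paths if p[-1][1] == u]
        # Set (B): paths starting at v, plus the empty path at v.
        set_b = [()] + [p for p in self.paths if p[0][0] == v]
        # Cross product: every A-path, then the new edge, then every B-path.
        new_paths = [a + ((u, v),) + b for a, b in product(set_a, set_b)]
        self.edges.add((u, v))
        for p in new_paths:
            self.paths.append(p)
            self.closure[(p[0][0], p[-1][1])] += 1  # create or increment
        return new_paths
```

The number of new paths is the size of set (A) multiplied by the size of set (B), which is the time and storage cost of the insertion.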
Those skilled in the art of relational database and/or programming will be able to code the data structures, database schemas and specific transactions and queries reasonably easily using the above guidelines.
By storing both DAG 71 as well as the path table 74, and path detail table 75, a simple and relatively quick computer implemented algorithm for checking ancestors and descendants using the path table, and/or using a transitive closure table derived from it by eliminating duplicates, is provided. Additionally,
The price for this efficiency is the extra storage required for the path table and also the time taken by transactions for adding and removing edges.
By way of an example, which corresponds to the diagrams of
The paths are the following seven sequences of edges: AB; AC; BD; CD; AE; AB BD; AC CD
The transitive closure consists of the start and finish nodes of the paths in the path table, with duplications removed. In this case:
AB (count 1); AC (count 1); BD (count 1); CD (count 1); AE (count 1); AD (count 2).
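The example can be checked with a short sketch that enumerates every non-empty path from the five edges and then counts paths per start and finish pair. The helper names `enumerate_paths` and `label` are hypothetical, and the enumeration from scratch is shown only to verify the example; it is not the incremental update method of the invention.

```python
from collections import Counter, defaultdict

EXAMPLE_EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("A", "E")]


def enumerate_paths(edges):
    """Return every non-empty path of the DAG as a tuple of edges."""
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
    paths = []

    def extend(path):
        paths.append(tuple(path))
        last = path[-1][1]
        for nxt in successors[last]:
            path.append((last, nxt))
            extend(path)
            path.pop()

    for edge in edges:
        extend([edge])  # each path is found once, from its first edge
    return paths


def label(path):
    """Render a path in the notation of the example, e.g. 'AB BD'."""
    return " ".join(u + v for u, v in path)


all_paths = enumerate_paths(EXAMPLE_EDGES)
closure_counts = Counter((p[0][0], p[-1][1]) for p in all_paths)
```

Here `all_paths` holds the seven paths listed above, and `closure_counts` gives a count of 2 for the pair A, D and a count of 1 for every other pair.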
To remove the edge BD, as described in relation to
To add edge EC, as described in relation to
Empty path−EC−Empty path=EC;
AE−EC−Empty path=AE EC;
Empty path−EC−CD=EC CD;
AE−EC−CD=AE EC CD.
It will be appreciated that adding or removing a node does not require any updates to the edge table 73, path table 74, path detail table 75 or optional transitive closure table 76.
The corresponding counts must be updated in transitive closure table 76 for the start and finish of every new path, e.g. increment the relationships EC, AC, ED and AD in the transitive closure, or create them if non-existent.
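The addition of edge EC to the example can be traced with a small sketch that works directly on the path notation used above. It assumes single-character node names, so that the first and last characters of a path string are its start and finish; the function names `paths_for_new_edge` and `closure_pairs` are hypothetical.

```python
from itertools import product

# Existing paths of the example, written as space-separated edges.
EXISTING = ["AB", "AC", "BD", "CD", "AE", "AB BD", "AC CD"]


def paths_for_new_edge(existing, edge):
    """Combine paths ending at the edge's source with paths starting at its target."""
    u, v = edge[0], edge[1]
    before = [""] + [p for p in existing if p[-1] == u]  # "" is the empty path
    after = [""] + [p for p in existing if p[0] == v]
    return [
        " ".join(part for part in (a, edge, b) if part)
        for a, b in product(before, after)
    ]


def closure_pairs(paths):
    """Start and finish of each path, i.e. the closure entries to increment."""
    return [p[0] + p[-1] for p in paths]
```

For edge EC the sketch yields four new paths, one per combination of the two paths ending at E (the empty path and AE) with the two paths starting at C (the empty path and CD).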
Thus, the present embodiments enable a computer implemented method of representing DAGs in memory, comprising persisting a full path table in memory, and updating the path table whenever the DAG updates.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as are commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods are described herein.
All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the patent specification, including definitions, will prevail. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
The terms “include”, “comprise” and “have” and their conjugates as used herein mean “including but not necessarily limited to”.
It will be appreciated by persons skilled in the art that the present invention is not limited to what has been particularly shown and described hereinabove. Rather the scope of the present invention is defined by the appended claims and includes both combinations and sub-combinations of the various features described hereinabove as well as variations and modifications thereof, which would occur to persons skilled in the art upon reading the foregoing description.
This application claims priority from U.S. Provisional Patent Application Ser. No. 60/893,968 filed Mar. 9, 2007, entitled “Virtual Hosted Operating System”, the entire contents of which is incorporated herein by reference.
Number | Date | Country
---|---|---
60893968 | Mar 2007 | US