Graph databases represent entities as vertices and relationships between entities as edges which connect two vertices.
Examples will now be described, by way of non-limiting example, with reference to the accompanying drawings.
Resolving a query on a graph database is achieved using the raw data items of the domain of the database. The querying process involves traversing vertices and edges in the graph database and inspecting the properties of those vertices and edges. Properties of edges and vertices determine how the graph is traversed and which items are included in the result set of a given query.
An example graph database comprises a plurality of vertices, each of which represents the same type of entity (in this example, an employee). Each vertex may have associated properties, where a property is an item of information relating to the entity represented by that vertex. A property may comprise a value of an attribute of the entity. For example, an entity Ann in the graph database has a gender attribute with the value female, so the vertex representing Ann may have a “female” property. Friendship relationships between the employees are represented by edges. In this example Ann is friends with John and Sue, John is friends with Ann and Rick, Rick is friends with John and Dave, Dave is friends with Rick, and Sue is friends with Ann. Consequently, the graph database includes four edges: an edge connecting the Ann vertex and the John vertex, an edge connecting the Ann vertex and the Sue vertex, an edge connecting the John vertex and the Rick vertex, and an edge connecting the Rick vertex and the Dave vertex.
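A minimal sketch of the example employee graph, assuming a plain in-memory Python representation (the names `vertices`, `friendships`, `edges` and `neighbours` are illustrative and do not belong to any particular graph engine). Because friendship is symmetric, each relationship is stored once as an unordered pair, which makes it easy to see that the friendships listed above produce exactly four edges. Only Ann's gender is given in the text, so no other properties are filled in.

```python
from collections import defaultdict

# First-level vertices: one per employee, each with a dictionary of properties.
# Only Ann's gender is stated in the example; other properties are left empty.
vertices = {
    "Ann": {"gender": "female"},
    "John": {},
    "Rick": {},
    "Dave": {},
    "Sue": {},
}

# Friendship is symmetric, so each edge is stored once as an unordered pair.
friendships = [("Ann", "John"), ("Ann", "Sue"), ("John", "Rick"), ("Rick", "Dave")]
edges = {frozenset(pair) for pair in friendships}

# Adjacency view used when traversing the graph from a vertex.
neighbours = defaultdict(set)
for a, b in friendships:
    neighbours[a].add(b)
    neighbours[b].add(a)
```

Listing Rick's neighbours gives John and Dave only, matching the friendships described.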
The process of querying a graph database, such as the example graph database described above, can be performed by a graph engine. A graph engine comprises a processing module to run computational processes against the dataset held in a graph database.
Many graph engines store the results of at least the most recently run queries as result sets in a cache which is completely separate from the graph database. Result sets which are not cached are discarded, and cached result sets are deleted once they have been held for a certain amount of time.
Extracting results from a cache, e.g. for input to a subsequent query, may involve inspecting all of the cached elements, and is therefore computationally intensive.
Furthermore, result sets held in the cache are not updated when changes occur to entities in the graph database, so those result sets may no longer be valid by the time they are re-used to resolve a subsequent query. Determining which cached result sets will be affected by any given change to an entity in the graph database is difficult because no links are maintained between cached result sets, or between raw data items and specific result sets. Also, any given entity may be included several times in the cache (since it may belong to several result sets), so keeping track of the “belonging” relationships between entities and query results can involve performing full scans of the cache.
A technical challenge may exist with a cache of result sets, as cached result sets cannot themselves be queried using the graph engine. This means that a user cannot easily perform operations such as determining relationships between result sets, or refining a result set. Instead such operations are performed outside of the graph engine, as post-processing operations effected by a different processing module.
Examples disclosed herein provide technical solutions to these technical challenges. An example apparatus 20, e.g. for representing a result set of a query on a graph database by a sub-graph of the graph database, is illustrated in
The instructions encoded by the machine-readable storage medium 30 comprise instructions which, when executed by a processor, cause the processor to: responsive to a generation of a result set for a query on the graph database, add a second-level vertex to the graph database; and add a second-level edge (or multiple second-level edges) to the graph database. The second-level vertex represents the result set of the query and each second-level edge connects the second-level vertex to a first-level vertex. In some examples the non-transitory machine-readable storage medium 30 comprises the storage 22 of the apparatus 20 shown in
In a first block, 401, the graph database is queried to generate a result set, e.g. by submitting a query formulated in a query language to a graph engine of the graph database. Any suitable query language can be used to formulate the query.
In a second block, 402, responsive to the generation of the result set, a second-level vertex and a second-level edge (or multiple second-level edges) are added to the graph database, e.g. by a graph engine of the graph database. In some examples the instructions, e.g. the instructions encoded by the non-transitory machine-readable storage medium 30, or the instructions comprised in the instruction set of the apparatus 20, when executed by a processor cause the processor to add the second-level vertices and the second-level edges to the graph database. The added second-level vertex represents the result set of the query. Each second-level edge connects the added second-level vertex to a first-level vertex. In some examples a naming scheme is used to identify second-level vertices in the graph database. In some such examples each second-level vertex is associated with a name which comprises a hash encoding of the query parameters and operators of the query which generated the result set represented by the named second-level vertex.
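Block 402 and the hash-based naming scheme can be sketched as follows. This is a minimal illustration, assuming a toy in-memory `Graph` class and a hypothetical `"contains"` edge label; the choice of SHA-256 over a JSON encoding with sorted keys is an assumption, made so that the same parameters and operators always map to the same vertex name. Operator order is preserved, so reordered operators yield a different name.

```python
import hashlib
import json


class Graph:
    """Toy in-memory property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.vertices = {}  # vertex id -> properties
        self.edges = set()  # (source id, target id, label)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, dst, label):
        self.edges.add((src, dst, label))


def result_set_name(params, operators):
    """Assumed naming scheme: hash of a canonical encoding of the query."""
    canonical = json.dumps({"params": params, "operators": operators}, sort_keys=True)
    return "rs-" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def add_result_set(graph, params, operators, members):
    """Add one second-level vertex plus one second-level edge per member."""
    name = result_set_name(params, operators)
    graph.add_vertex(name, level=2, params=params, operators=operators)
    for member in members:
        graph.add_edge(name, member, "contains")
    return name


# Usage: first-level vertices for the employees, then one query result.
g = Graph()
for person in ["Ann", "John", "Rick", "Dave", "Sue"]:
    g.add_vertex(person, level=1)
name = add_result_set(g, {"gender": "male"}, ["filter"], ["John", "Rick"])
```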
As can be seen from
Thus, in the examples, the result set of a query is added to the graph database itself rather than being stored in a separate cache. This enables previous result sets to be easily re-used by a graph engine as inputs to further queries.
Blocks 601 and 602 are performed as described above in relation to blocks 401 and 402 of
In block 603, the graph database is queried again (i.e. a further query is submitted to the graph engine of the graph database), leading to the generation of a further result set. In some examples the further query is formulated using the same query language as the first query. Any suitable query language can be used to formulate the further query. In some examples block 603 is performed in the same manner as block 601.
In block 604, responsive to the generation of the further result set, a further second-level vertex and a further second-level edge (or multiple further second-level edges) are added to the graph database, e.g. by the graph engine. The added further second-level vertex represents the result set of the further query. Each further second-level edge connects the added further second-level vertex to a first-level vertex. In some examples block 604 is performed in the same manner as block 602.
Then, in block 605, a third-level edge (or multiple third-level edges) is added to the graph database. Each third-level edge connects the added further second-level vertex to a second-level vertex already present in the graph database.
Two result sets are generated by the further query: a Male result set which comprises John and Rick, and a Female result set which comprises Ann. Two further second-level vertices 71 have been added to the sub-graph 50 to create an expanded sub-graph 70. The further second-level vertices 71 represent the Male result set and the Female result set. As with the second-level vertex 51 representing the previous query, each further second-level vertex 71 is connected by further second-level edges 72 to the first-level vertices representing the entities comprised in the result set which that further second-level vertex represents. The further second-level edges 72 represent containment relationships. In some examples the further second-level edges 72 represent bi-directional containment relationships.
The further second-level vertices 71 are also linked to the second-level vertex 51 by a parenthood relationship. This is represented in the sub-graph 70 by means of a third-level edge 73 connecting each further second-level vertex 71 to the second-level vertex 51. In some examples the instructions, e.g. the instructions encoded by the non-transitory machine-readable storage medium 30, the graph engine of the graph database, or the instructions comprised in the instruction set of the apparatus 20, when executed by a processor cause the processor to connect the further second-level vertices to the second-level vertex by means of the third-level edges. The third-level edges 73 are shown by dotted lines in
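The refinement step (blocks 603 to 605) can be sketched on a toy adjacency store. This is a minimal illustration, assuming hypothetical vertex ids `rs51`, `Male` and `Female` and hypothetical edge labels `"contains"` (second-level edges) and `"parent"` (third-level edges). The membership of vertex 51 is assumed to be Ann, John and Rick, inferred from the Male/Female split described above. The point shown is that the further query reads its input from the existing second-level vertex instead of re-querying the whole graph.

```python
from collections import defaultdict

# edges[label][source] -> set of target vertex ids (toy adjacency store).
edges = defaultdict(lambda: defaultdict(set))
# Gender properties of the first-level vertices involved in this example.
gender = {"Ann": "female", "John": "male", "Rick": "male"}


def add_result_vertex(name, members, parent=None):
    """Add a second-level vertex as containment edges to its members and,
    for a refinement, a third-level 'parent' edge to the earlier result vertex."""
    for member in members:
        edges["contains"][name].add(member)  # second-level edge
    if parent is not None:
        edges["parent"][name].add(parent)  # third-level edge


# Second-level vertex 51 (membership assumed from the Male/Female split).
add_result_vertex("rs51", ["Ann", "John", "Rick"])

# The further query reuses rs51 as its input rather than scanning all vertices.
for value, name in (("male", "Male"), ("female", "Female")):
    members = [m for m in edges["contains"]["rs51"] if gender[m] == value]
    add_result_vertex(name, members, parent="rs51")
```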
In some examples the process represented by blocks 603-605 of
It is expected that in many situations users will explore a graph database in similar manners. For example, users from a particular geographical region may often apply a filter so that they see results from that region and do not see results from other regions. In such situations it will often be possible to reuse query results already represented by second-level vertices in the graph database. Thus, in the examples, resolving a query does not involve recreating previously computed result sets, nor does it involve performing an O(N) comparison against all of the results in a cache (which contains N elements) to see whether a given result is held in that cache. Instead, in the examples, a graph engine checks whether a prior computation exists that may be used as an input to a newly received query by analysing the expanded graph. Analysing the expanded graph is significantly less computationally intensive than recomputing previous result sets and/or searching a cache of previous result sets.
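One way such a reuse check could be made cheap is sketched below, assuming the hash-based naming of second-level vertices described for block 402. A repeated query then becomes a lookup of a vertex name in a hash index rather than a scan of N cached results. The `vertices` dictionary stands in for the graph's vertex index, members are stored as a property instead of as edges for brevity, and the `find_or_compute` helper and the `"region"` filter values are hypothetical.

```python
import hashlib
import json

# Stand-in for the graph's vertex index: vertex id -> properties.
vertices = {}


def query_key(params, operators):
    """Hash of a canonical encoding of the query (assumed naming scheme)."""
    canonical = json.dumps({"params": params, "operators": operators}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_or_compute(params, operators, compute_members):
    """Reuse an existing second-level vertex for this query if there is one,
    otherwise run the query and add a new second-level vertex.
    Returns (vertex id, True if a new vertex was created)."""
    name = "rs-" + query_key(params, operators)
    if name in vertices:  # average O(1) hash lookup, no scan of prior results
        return name, False
    # Members kept as a property here for brevity; the graph would use edges.
    vertices[name] = {"level": 2, "members": set(compute_members())}
    return name, True


# Two users applying the same (hypothetical) regional filter.
first, created_first = find_or_compute({"region": "EMEA"}, ["filter"], lambda: ["Ann", "Sue"])
second, created_second = find_or_compute({"region": "EMEA"}, ["filter"], lambda: ["never run"])
```

The second call returns the existing vertex without invoking its compute function.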
A further effect of adding query result sets to a graph database in the form of second-level vertices and second-level edges, as is done by the examples, is that the process of updating stored result sets to account for a change to an entity represented in the graph database is simplified as compared to prior art cache-updating processes.
Blocks 801 and 802 are performed as described above in relation to blocks 401 and 402 of
In block 803 a change in an entity represented by a first-level vertex is detected, e.g. by the graph engine. In some examples the instructions, e.g. the instructions encoded by the non-transitory machine-readable storage medium 30, or the instructions comprised in the instruction set of the apparatus 20, when executed by a processor cause the processor to detect the change. In some examples the change comprises the addition of the entity to the graph database (and therefore the addition to the graph of a first-level vertex representing the entity). In some examples the change comprises the removal of the entity from the graph database (and therefore the deletion from the graph of a first-level vertex representing the entity). In some examples the change comprises a change in the value of an attribute of the entity (and therefore a change in the value of a property of a first-level vertex representing the entity).
In some examples detecting a change in an entity comprises the graph engine detecting that new information has been added to the graph database. In some such examples the new information comprises information about changes, additions, and/or deletions which have occurred in respect of entities in the graph database. In some examples detecting a change in an entity comprises the graph engine performing a full scan of the graph database and comparing the results to the results of a previously performed scan. In a particular example, the graph engine comprises a data ingestion component which is responsible for creating and updating vertices in the graph, using attributes of the entities represented by each given vertex. The ingestion component is to compare the current attributes of an entity with the properties of the corresponding vertex in the graph, and to detect a change if at least one attribute is found to be different. In a similar manner the ingestion component may detect that a vertex no longer corresponds to an entity, or that a new entity has been created which does not have a corresponding vertex in the graph.
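The ingestion component's comparison can be sketched as a diff between source entities and first-level vertex properties. This is a minimal illustration under assumed data shapes (dictionaries keyed by entity id). The optional `tracked` argument is a hypothetical way of restricting which attributes count as a change; the `dept` attribute and the entity `Eve` are invented test data.

```python
def detect_changes(entities, vertices, tracked=None):
    """Compare source entities with first-level vertex properties.

    entities: entity id -> current attribute dict (source data)
    vertices: entity id -> property dict (first-level vertices in the graph)
    tracked:  optional set of attribute names that count as a change
    Returns (added, removed, changed) as sets of entity ids.
    """
    added = entities.keys() - vertices.keys()  # entity without a vertex
    removed = vertices.keys() - entities.keys()  # vertex without an entity

    def view(props):
        # Restrict the comparison to tracked attributes when a rule set is given.
        if tracked is None:
            return props
        return {k: v for k, v in props.items() if k in tracked}

    # Changed if at least one compared attribute differs.
    changed = {
        eid
        for eid in entities.keys() & vertices.keys()
        if view(entities[eid]) != view(vertices[eid])
    }
    return added, removed, changed


entities = {"Ann": {"gender": "female", "dept": "Design"},
            "John": {"gender": "male"},
            "Eve": {"gender": "female"}}
vertices = {"Ann": {"gender": "female", "dept": "Sales"},
            "John": {"gender": "male"},
            "Dave": {"gender": "male"}}
added, removed, changed = detect_changes(entities, vertices)
```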
In some examples, the graph engine includes rules to define a first set of attributes which are deemed to cause a change to an entity (for the purposes of the method of
Responsive to a change in an entity represented by a first-level vertex, a change indication is associated with the first-level vertex which represents the changed entity (block 804). In some examples the instructions, e.g. the instructions encoded by the non-transitory machine-readable storage medium 30, a graph engine, or the instructions comprised in the instruction set of the apparatus 20, when executed by a processor cause the processor to associate the change indication with the first-level vertex (or other vertex). A change indication is also associated with each second-level vertex connected to the first-level vertex representing the changed entity, and with each second-level edge connected to the first-level vertex representing the changed entity (block 805). Then, in block 806, a change indication is associated with each second-level vertex connected to a second-level vertex with which a change indication was associated in block 805. In some examples (i.e. examples in which at least one pair of second-level vertices which have had change indications associated with them are connected by a third-level edge) a further block 807 is performed, in which a change indication is also associated with each third-level edge which connects two second-level vertices which have each had a change indication associated with them in block 805 or block 806. In some examples the change indications comprise flags.
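The propagation of change indications (blocks 804 to 807) can be sketched with flags held in sets. This is a minimal illustration, assuming second-level edges are `(result_vertex, member)` pairs and third-level edges are `(child, parent)` pairs. The ids `Q1`, `M` and `F` are hypothetical stand-ins for an "all employees" result and its male/female refinements. Block 806 is applied as a single step, as described, rather than transitively.

```python
def mark_dirty(changed, contains, third_level):
    """Propagate change indications (flags) from one changed first-level vertex.

    changed:     id of the first-level vertex whose entity changed
    contains:    set of (second_level_id, first_level_id) second-level edges
    third_level: set of (child_id, parent_id) third-level edges
    Returns (flagged vertex ids, flagged edges).
    """
    flagged_vertices = {changed}  # block 804
    flagged_edges = {e for e in contains if e[1] == changed}  # block 805, edges
    direct = {rs for rs, _ in flagged_edges}  # block 805, vertices
    flagged_vertices |= direct
    # Block 806: second-level vertices joined by a third-level edge to one flagged in 805.
    flagged_vertices |= {p for c, p in third_level if c in direct}
    flagged_vertices |= {c for c, p in third_level if p in direct}
    # Block 807: third-level edges whose two second-level endpoints are both flagged.
    flagged_edges |= {(c, p) for c, p in third_level
                      if c in flagged_vertices and p in flagged_vertices}
    return flagged_vertices, flagged_edges


contains = {("Q1", "Ann"), ("Q1", "John"), ("Q1", "Rick"),
            ("M", "John"), ("M", "Rick"), ("F", "Ann")}
third_level = {("M", "Q1"), ("F", "Q1")}
vertices, edges = mark_dirty("Ann", contains, third_level)
```

Here `M` is flagged only through block 806, because it shares a third-level edge with `Q1`; none of its own second-level edges are flagged.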
Blocks 1001 to 1004 are performed as described above in relation to blocks 601 to 604 of
If it is determined in block 1005 that a first-level vertex connected to the further second-level vertex has an associated change indication, then every second-level edge which is connected to that first-level vertex and which itself has an associated change indication is recalculated (e.g. by the graph engine). In some examples (i.e. examples in which the graph database comprises at least one third-level edge) a further block 1007 is performed. In block 1007, if it has been determined in block 1005 that a first-level vertex connected to the further second-level vertex has an associated change indication, then every third-level edge which has an associated change indication, and which is connected to a second-level vertex that is itself connected to that first-level vertex, is recalculated.
Thus, in the example of
In the case that the new second-level vertex is connected to the dirty Ann first-level vertex, this triggers the graph engine to recalculate the entire “dirty part” of the graph which relates to the change to the Ann entity. A graph may contain several independent “dirty parts”, resulting from changes to multiple different entities. However, whilst dirty parts propagating from entities comprised in the result set of a newly received query are recalculated, other dirty parts are not recalculated until a query is received whose result set includes an entity in the given dirty part.
In the case that the new second-level vertex is not connected to the dirty Ann first-level vertex (i.e. it is connected to “clean” first-level vertices which do not have associated change indications, which in this example is any of the first-level vertices apart from Ann, and is not connected to any “dirty” first-level vertices), no recalculation is performed.
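The lazy recalculation described above can be sketched as a function run when a new second-level vertex is added. This is a minimal illustration, assuming flagged edges are stored as `(source, target, level)` tuples, with level 2 for second-level edges and level 3 for third-level edges. `recalc_edge` is a hypothetical callback standing in for whatever recomputation the graph engine performs. Clearing the flags afterwards is an assumption; the text does not say when change indications are removed.

```python
def on_new_result_vertex(members, contains, dirty_vertices, dirty_edges, recalc_edge):
    """Recalculate only the dirty part of the graph touched by a new result set.

    members:        first-level vertex ids contained in the new second-level vertex
    contains:       set of (second_level_id, first_level_id) second-level edges
    dirty_vertices: ids carrying a change indication (updated in place)
    dirty_edges:    flagged edges as (source, target, level) tuples (updated in place)
    recalc_edge:    hypothetical callback that recomputes one flagged edge
    Returns the set of edges that were recalculated.
    """
    touched = set(members) & dirty_vertices
    if not touched:
        return set()  # only clean first-level vertices: no recalculation
    # Flagged second-level edges attached to a dirty member.
    redo = {e for e in dirty_edges if e[2] == 2 and e[1] in touched}
    # Flagged third-level edges at second-level vertices linked to a dirty member.
    linked = {rs for rs, m in contains if m in touched}
    redo |= {e for e in dirty_edges if e[2] == 3 and (e[0] in linked or e[1] in linked)}
    for edge in sorted(redo):
        recalc_edge(edge)
    # Assumed housekeeping: clear the indications that have now been dealt with.
    dirty_edges -= redo
    dirty_vertices -= touched
    return redo


contains = {("Q1", "Ann"), ("Q1", "John"), ("F", "Ann"), ("M", "John")}
dirty_vertices = {"Ann", "Q1", "F", "M"}
dirty_edges = {("Q1", "Ann", 2), ("F", "Ann", 2), ("M", "Q1", 3), ("F", "Q1", 3)}
calls = []
clean = on_new_result_vertex(["John", "Rick"], contains, dirty_vertices, dirty_edges, calls.append)
dirty = on_new_result_vertex(["Ann"], contains, dirty_vertices, dirty_edges, calls.append)
```

The first call touches only clean first-level vertices and recalculates nothing; the second reaches the dirty Ann vertex and recalculates its whole dirty part.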
The process of
A further effect of adding query result sets to a graph database in the form of second-level vertices and second-level edges, as is done by the examples, is that relationships between result sets can be easily identified by navigating across the graph. In the examples, determining whether two result sets are related involves navigating from a first second-level vertex to a second second-level vertex via the underlying graph of first-level vertices.
Blocks 1101 and 1102 are performed as described above in relation to blocks 401 and 402 of
The sub-graph 1210 comprises four second-level vertices, representing the result sets of a first query (Query 1), a refinement of that query (M and F), and a further query (Query 3). The result set Query 1 comprises all employees, the result set M comprises all male employees, the result set F comprises all female employees, and the result set Query 3 comprises all departments. If a user wishes to determine whether a relationship exists between M and Query 3 (i.e. whether the Design department contains any male employees), this determination can be made by determining whether a path exists between the M vertex and the Query 3 vertex. In practice, this may comprise determining whether a path exists between the M vertex and a first-level vertex to which the Query 3 vertex is connected.
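The path check can be sketched as a breadth-first search that starts from the members of one result vertex and looks for a given first-level vertex connected to the other result vertex. This is a minimal illustration. The `works_in` and `friend` labels and the department assignments are invented, since the figure's data is not reproduced in the text. Restricting traversal to chosen labels and treating edges as directed are assumptions; without the restriction, unrelated paths (for example via friendships) could make two result sets appear related.

```python
from collections import deque


def has_path(source_rs, target, contains, edges, labels):
    """Breadth-first search from the members of result vertex `source_rs` to the
    first-level vertex `target`, following only first-level edges (treated as
    directed) whose label is in `labels`."""
    adjacency = {}
    for src, dst, label in edges:
        if label in labels:
            adjacency.setdefault(src, set()).add(dst)
    start = {m for rs, m in contains if rs == source_rs}
    queue, seen = deque(start), set(start)
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            return True
        for nxt in adjacency.get(vertex, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical data: M, F and Query3 result vertices and their members.
contains = {("M", "John"), ("M", "Rick"), ("F", "Ann"),
            ("Query3", "Design"), ("Query3", "Sales")}
edges = [("Ann", "Design", "works_in"), ("John", "Sales", "works_in"),
         ("Rick", "Design", "works_in"), ("Ann", "John", "friend")]
```

With only `works_in` edges allowed, `has_path("M", "Design", ...)` holds because Rick works in Design, while `has_path("F", "Sales", ...)` does not.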
It can be seen from
Examples in the present disclosure can be provided as methods, systems or machine readable instructions. Such machine readable instructions may be included on a computer readable storage medium (including but not limited to disc storage, CD-ROM, optical storage, etc.) having computer readable program code therein or thereon.
The present disclosure is described with reference to flow charts and/or block diagrams of the method, devices and systems according to examples of the present disclosure. Although the flow diagrams described above show a specific order of execution, the order of execution may differ from that which is depicted. Blocks described in relation to one flow chart may be combined with those of another flow chart. It shall be understood that each flow and/or block in the flow charts and/or block diagrams, as well as combinations of the flows and/or diagrams in the flow charts and/or block diagrams can be realized by machine readable instructions.
The machine readable instructions may, for example, be executed by a general purpose computer, a special purpose computer, an embedded processor or processors of other programmable data processing devices to realize the functions described in the description and diagrams. In particular, a processor or processing apparatus may execute the machine readable instructions. Thus functional modules or engines of the apparatus and devices may be implemented by a processor executing machine readable instructions stored in a memory, or a processor operating in accordance with instructions embedded in logic circuitry. The term ‘processor’ is to be interpreted broadly to include a CPU, processing unit, ASIC, or programmable gate array etc. The methods and functional modules may all be performed by a single processor or divided amongst several processors.
Such machine readable instructions may also be stored in a computer readable storage that can guide the computer or other programmable data processing devices to operate in a specific mode.
Such machine readable instructions may also be loaded onto a computer or other programmable data processing devices, so that the computer or other programmable data processing devices perform a series of operation steps to produce computer-implemented processing; the instructions executed on the computer or other programmable devices thus provide steps for realizing the functions specified by flow(s) in the flow charts and/or block(s) in the block diagrams.
While the method, apparatus and related aspects have been described with reference to certain examples, various modifications, changes, omissions, and substitutions can be made without departing from the spirit of the present disclosure. It is intended, therefore, that the method, apparatus and related aspects be limited only by the scope of the following claims and their equivalents. It should be noted that the above-mentioned examples illustrate rather than limit what is described herein, and that those skilled in the art will be able to design many alternative implementations without departing from the scope of the appended claims.
The word “comprising” does not exclude the presence of elements other than those listed in a claim, “a” or “an” does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims.
The features of any dependent claim may be combined with the features of any of the independent claims or other dependent claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2015/065514 | 7/7/2015 | WO | 00 |