Computing systems, devices, and electronic components may access, store, process, or communicate with a database or databases. A database may store data or information in various formats, models, structures, or systems, such as in a relational database system or a graph database structure. Users or processes may access or query the databases to retrieve data in a database, or to update or manipulate data in a database.
The following detailed description references the drawings, wherein:
Various examples described below provide for managing a graph database. In an example, a graph database system includes a graph processor engine to receive a graph database update from an application, a graph navigation query engine to access a real-time graph and process the graph database update on the real-time graph, and a synchronization engine to extract changes from the real-time graph and process the changes to a derived graph view and to a historical graph. Examples for managing a graph database also include receiving a graph query, determining a graph query type, and in the event that the graph query type is a navigational short query type, accessing a real-time graph on a graph navigation query engine and processing the navigational short query, and in the event that the graph query type is an analytical long query type, accessing a historical graph on a graph analytic query engine and processing the analytical long query.
As the amount of information stored on computing devices has continued to expand, companies, organizations, and information technology departments have adopted new technologies to accommodate the increased size and complexity of data sets, often referred to as big data. Traditional data processing or database storage systems and techniques such as relational databases or relational database management systems (“RDBMS”), which rely on a relational model and/or a rigid schema, may not be ideal for scaling to big data sets. Similarly, such databases may not be ideal or optimized for handling certain data, such as associative data sets.
Organizations may employ a graph database to collect, store, query, and/or analyze all or a subset of the organization's data, and in particular large data sets. A graph database may be employed within an organization alone, in combination with other graph databases, or in combination with relational databases or other types of databases.
A graph database may process different types of queries or requests, such as navigational queries, including navigational computations and reachability queries, or analytical queries, including analytical computations and iterative processing. A navigational query may, in an example, access and update a small portion of a graph to return a real-time response, while an analytical query may access a large fraction of the graph. Graph databases may be specialized, tailored, or “tuned” for a particular type of workload, query, or algorithm, such as for navigational queries, analytical queries, or other query types. In such examples, a graph database tuned for navigational queries may comprise internal data structures designed for high throughput and access to a small portion of a graph, and may not perform well with analytical queries. Conversely, graph databases tuned for analytical queries may assume an immutable graph, which enables the use of data structures to index and compress the graph so that large portions of the graph can be processed quickly, but which minimizes the computational resources available to process navigational queries.
Accordingly, graph databases or graph database systems may struggle to perform in a mixed workload environment, e.g., a workload comprising both navigational and analytical queries submitted concurrently to a graph database. Organizations may also need to run and maintain two or more systems to support such an environment including real-time graphs, historical graphs (e.g., graphs that reflect the graph at a previous point in time), and/or derived graphs (or “views”, e.g., graphs used to support an application-specific purpose, such as customer segmentation or fraud detection based on another graph) for particular applications.
In the example of
The graph database 106 may reside in a data center, cloud service, or virtualized server infrastructure (hereinafter “data center”), which may refer to a collection of servers and other computing devices that may be on-site, off-site, private, public, co-located, or located across a geographic area or areas. A data center may comprise or communicate with computing devices such as servers, blade enclosures, workstations, desktop computers, laptops or notebook computers, point of sale devices, tablet computers, mobile phones, smart devices, or any other processing device or equipment including a processing resource. In examples described herein, a processing resource may include, for example, one processor or multiple processors included in a single computing device or distributed across multiple computing devices.
In the example of
Graph database 106 may receive queries or updates from applications 102, which may be applications, processes, tools, scripts, or other engines for purposes of communicating with graph database 106. The queries received from application 102 may be navigational or “short” queries that access a small portion of a graph stored on graph database 106 using requests such as nearest neighbor, shortest path, or other requests that access only a few vertices and/or edges of a graph. The queries received from application 102 may also be analytical or “long” queries that access a large portion of a graph stored on graph database 106 using requests such as a page rank or connected component. In some examples, navigational queries may be executed against a real-time, active, current, or “live” graph, while analytical queries may be executed against a historical graph.
Graph database 106 may comprise or communicate with an engine or engines for executing or processing queries. In an example, an engine may be tuned or adapted to a specific type of query. For example, graph navigation query engine 108 may be tuned for executing navigational or short queries, as discussed above, while graph analytic query engine 112 may be tuned for executing analytical or long queries, as discussed above. In such examples, e.g., in examples of mixed concurrent workloads where graph database 106 may receive queries of varying types, graph database 106 may include an engine for determining to which of the query engines a query is to be submitted. In such examples, graph database 106 may include or be coupled to a federation engine or layer to present a hybrid system as a single, unified interface to the applications 102.
Graph database 106 may also comprise a synchronization engine 110 to synchronize the graphs of graph navigation query engine 108, which may access or comprise a real-time graph or graphs, with graph analytic query engine 112, which may access or comprise a historical graph or graphs. Synchronization may occur in batch, periodically, and/or may be transactionally consistent.
Synchronization engine 110 may also enable application-specific views 104 by updating views following an update to an underlying or base graph, such as a view of a particular customer segmentation or other subset or view of data. Application-specific views or models 104 may be derived by analytic queries over the historical graph. These views may be sub-graphs or may be some alternative data structure derived from the graph (e.g., a key-value store). An application may create such a view for more efficient processing of application requests rather than querying the graph database. These views may be, effectively, cached data. As such, they may be informed of updates to the underlying graph by synchronization engine 110 or the entire view may be periodically refreshed by again querying the analytic graph.
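As a non-limiting illustration, an application-specific view held as a key-value store may be sketched as follows. The class name `SegmentView`, the methods `refresh` and `on_change`, and the operation names are hypothetical and do not appear in the examples above; the sketch assumes a view that counts customers per segment and that may either be rebuilt in full or be informed of individual changes.

```python
from collections import Counter


class SegmentView:
    """Hypothetical derived view: number of customer vertices per segment."""

    def __init__(self):
        self.counts = Counter()

    def refresh(self, vertices):
        """Rebuild the whole view from a {vertex_id: segment} mapping,
        e.g., the result of an analytic query over the historical graph."""
        self.counts = Counter(vertices.values())

    def on_change(self, op, segment):
        """Apply one change pushed to the view after an update to the base graph."""
        if op == "add_vertex":
            self.counts[segment] += 1
        elif op == "remove_vertex":
            self.counts[segment] -= 1
            if self.counts[segment] <= 0:
                del self.counts[segment]  # drop empty segments from the view
        else:
            raise ValueError(f"unknown operation: {op}")


view = SegmentView()
view.refresh({"c1": "retail", "c2": "retail", "c3": "enterprise"})
view.on_change("add_vertex", "enterprise")
view.on_change("remove_vertex", "retail")
```

In this sketch the application reads `view.counts` directly rather than querying the graph, which corresponds to treating the view as cached data.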
Graph database environment 100 may also include external connectors 114, which may be connectors to external systems, processes, or databases, such as a connector to a relational database, legacy system, or other system for ingesting data or exporting data. For example, a relational database may be updated with changes to a graph database via an external connector 114.
In the example of
In block 200, an update is received from, e.g., application 102, which may be an application, process, tool, script, or other engine for purposes of communicating with graph database 106. In the example of
In block 204, a real-time graph is accessed via an engine tuned or configured for a navigational query, e.g., graph navigation query engine 108.
In block 206, the update query is processed on the real-time graph. For example, a graph edge may be inserted, a node may be deleted, or another operation or operations may be performed.
In block 208, changes applied to the real-time graph are extracted. For example, synchronization engine 110 may determine which changes were applied to the real-time graph since the last synchronization.
In block 210, the extracted changes are updated onto a derived graph. In an example, a synchronization engine, e.g., synchronization engine 110, may update a derived graph based on the updates extracted from the real-time graph in block 208. The derived graph may be updated in batch, periodically, and/or may be transactionally consistent. In some examples, the derived graph is used as the basis for application-specific views, e.g., views 104.
In block 212, the extracted changes are updated onto a historical graph. In an example, a synchronization engine, e.g., synchronization engine 110, may update a historical graph via an engine, e.g., graph analytic query engine 112, based on the updates extracted from the real-time graph in block 208.
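The update and synchronization flow described above may be sketched, under stated assumptions, as follows. The classes `Graph` and `SyncEngine`, the change-tuple format, and the use of an in-memory change log are hypothetical choices made for illustration; the examples above do not prescribe how changes are extracted.

```python
class Graph:
    """Minimal adjacency-set graph, a stand-in for any graph store."""

    def __init__(self):
        self.adj = {}

    def apply(self, change):
        op, u, v = change
        if op == "add_edge":
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set())
        elif op == "remove_edge":
            self.adj.get(u, set()).discard(v)
        else:
            raise ValueError(f"unknown operation: {op}")


class SyncEngine:
    """Hypothetical synchronization step between three graphs."""

    def __init__(self, realtime, historical, derived):
        self.realtime = realtime
        self.historical = historical
        self.derived = derived
        self.log = []    # changes applied to the real-time graph, in order
        self.synced = 0  # index of the first change not yet propagated

    def update(self, change):
        """Process an update on the real-time graph and record it."""
        self.realtime.apply(change)
        self.log.append(change)

    def synchronize(self):
        """Extract changes since the last synchronization (block 208) and
        apply them to the derived (block 210) and historical (block 212) graphs."""
        pending = self.log[self.synced:]
        for change in pending:
            self.derived.apply(change)
            self.historical.apply(change)
        self.synced = len(self.log)
        return len(pending)
```

Calling `synchronize()` on a schedule would correspond to batch or periodic synchronization; the historical graph lags the real-time graph until the next call.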
In some examples, the flow of
In the event that an analytical query executed against a historical graph requires the most recent data, such data may be retrieved on-demand from the real-time or active graph. In one example, analytical query engine 112 may communicate with graph database 106 to request a batch update from graph navigation query engine 108 via synchronization engine 110.
In block 302, a query is received from, e.g., application 102, which may be an application, process, tool, script, or other engine for purposes of communicating with graph database 106. In the example of
In block 304, a determination is made as to whether the query is a navigational-type query or an analytical-type query. Such a determination may be made, for example, by way of simulating execution of the query, as discussed below in more detail with respect to
In block 306, if a determination is made that the query is a navigational query, a real-time graph is accessed via an engine tuned or configured for a navigational query, e.g., graph navigation query engine 108. In block 308, the navigational query is processed, e.g., a short query is processed, against the real-time graph.
In block 310, if a determination is made that the query is an analytical query, a historical graph is accessed via an engine tuned or configured for an analytical query, e.g., graph analytic query engine 112. In block 312, the analytical query is processed, e.g., a long query (or “mining query”) is processed, against the historical graph.
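The routing of blocks 304 through 312 may be illustrated by the following minimal sketch. The names `QueryType` and `Router` are hypothetical, and the two engines and the classifier are modeled as plain callables supplied by the caller; this is one possible arrangement and not the only one.

```python
from enum import Enum


class QueryType(Enum):
    NAVIGATIONAL = "navigational"  # short query, real-time graph
    ANALYTICAL = "analytical"      # long query, historical graph


class Router:
    """Hypothetical dispatcher presenting two engines behind one entry point."""

    def __init__(self, classify, navigation_engine, analytic_engine):
        self.classify = classify                    # query -> QueryType
        self.navigation_engine = navigation_engine  # runs on the real-time graph
        self.analytic_engine = analytic_engine      # runs on the historical graph

    def submit(self, query):
        # Block 304: determine the query type, then route accordingly.
        if self.classify(query) is QueryType.NAVIGATIONAL:
            return self.navigation_engine(query)    # blocks 306 and 308
        return self.analytic_engine(query)          # blocks 310 and 312
```

Keeping the classifier as a separate callable allows the determination of block 304 to be replaced, e.g., by a simulation-based estimate, without changing the routing itself.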
In block 402, the process of determining a graph query type is commenced. Block 402 may be, in some examples, an extension of block 304 of
In block 404, execution of the query is simulated. Simulation of the query executing may indicate or estimate the proportion of graph nodes accessed by the query, which may indicate whether a query is a navigational query or an analytical query.
In block 406, a threshold is fetched. The threshold may indicate, in some examples, a number of nodes or edges in a graph. If the threshold is exceeded, a query may be, or may be likely to be, an analytical query that is likely to access a large number of nodes or edges in a graph. If the threshold is not exceeded, the query may be, or may be likely to be, a navigational query.
In block 408, a determination is made as to whether the threshold is exceeded. The determination may be a calculation as to whether the number or proportion of nodes is less than or greater than the threshold.
In block 410, if the threshold is exceeded, the query may be classified as an analytical or long query. In such examples, the query may be sent to a graph analytic query engine.
In block 412, if the threshold is not exceeded, the query may be classified as a navigational or short query. In such examples, the query may be sent to a graph navigation query engine.
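One way the determination of blocks 404 through 412 might be realized is sketched below. The sketch assumes, purely for illustration, that a query can be described by a start node and a maximum hop count, and that simulation consists of a bounded breadth-first traversal that stops once the threshold is exceeded; the function names `simulate_touched` and `classify` are hypothetical.

```python
from collections import deque


def simulate_touched(adj, start, max_hops, limit):
    """Estimate how many nodes a hop-bounded traversal from `start` touches.

    Returns early once more than `limit` nodes have been seen, so the
    simulation itself stays cheap for queries that would be analytical.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop bound
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                if len(seen) > limit:
                    return len(seen)
                frontier.append((nxt, depth + 1))
    return len(seen)


def classify(adj, start, max_hops, threshold):
    """Blocks 406 to 412: compare the estimate against a node-count threshold."""
    touched = simulate_touched(adj, start, max_hops, threshold)
    return "analytical" if touched > threshold else "navigational"


# A ten-edge chain 0 -> 1 -> ... -> 10 used as a toy graph.
chain = {i: [i + 1] for i in range(10)}
short = classify(chain, 0, 2, 5)  # touches nodes 0, 1, 2
long_ = classify(chain, 0, 9, 5)  # exceeds the threshold of 5 nodes
```

In this sketch a query whose estimate equals the threshold is treated as navigational, consistent with classifying a query as analytical only when the threshold is exceeded.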
The computing system 500 of
As used herein, a “machine-readable storage medium” may be any electronic, magnetic, optical, or other physical storage apparatus to contain or store information such as executable instructions, data, and the like. For example, any machine-readable storage medium described herein may be any of Random Access Memory (RAM), volatile memory, non-volatile memory, flash memory, a hard drive, a solid state drive, any type of storage disc or optical disc, and the like, or a combination thereof. Further, any machine-readable storage medium described herein may be non-transitory.
System 500 may also include persistent storage and/or memory. In some examples, persistent storage may be implemented by at least one non-volatile machine-readable storage medium, as described herein, and may be memory utilized by system 500. In some examples, a memory may temporarily store data portions while performing processing operations on them, such as for managing a graph database.
In examples described herein, a machine-readable storage medium or media is part of an article or article of manufacture. An article or article of manufacture may refer to any manufactured single component or multiple components. The storage medium may be located either in the computing device executing the machine-readable instructions, or remote from but accessible to the computing device (e.g., via a computer network) for execution.
In some examples, instructions 510 may be part of an installation package that, when installed, may be executed by processing resource 502 to implement the functionalities described herein in relation to instructions 510. In such examples, storage medium 504 may be a portable medium or flash drive, or a memory maintained by a server from which the installation package can be downloaded and installed. In other examples, instructions 510 may be part of an application, applications, or component(s) already installed on a computing device including a processing resource, e.g., a computing device running any of the components of graph database environment 100 of
System 500 may also include a power source 506 and a network interface device 508, as described above, which may receive data such as data 512-514, e.g., via direct connection or a network, and/or which may communicate with an engine such as engines 516 and 518.
The engine comprising instructions in or on the memory or machine-readable storage of system 500 may comprise an engine 510, which may comprise the methods of
In an example, instructions 510 may send the query to a graph analytic query engine in the event that the number of graph elements is greater than the threshold, or may send the query to a graph navigation query engine in the event that the number of graph elements is less than the threshold.
Although the instructions of
All of the features disclosed in this specification, including any accompanying claims, abstract and drawings, and/or all of the elements of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or elements are mutually exclusive.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2015/048562 | 9/4/2015 | WO | 00 |