The present invention relates to the field of electronic database management, in particular, to applying topological graph changes to graph data and generating extensions for the modified graphs.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Graph processing is an important tool for data analytics. Database management systems (DBMS) increasingly allow users to define property graphs from database objects, e.g., relational tables, and to query them using graph pattern matching queries. Graph querying and pattern matching enable interactive exploration of graphs, similar to how SQL (Structured Query Language) interacts with databases.
However, graph queries are a very challenging workload for a DBMS because their execution is focused on edges, i.e., the connections in the data. Executing graph queries may therefore generate immense amounts of intermediate results, and queries can quickly explode in terms of memory usage. Additionally, graph queries exhibit very irregular access patterns with limited memory locality, as the query patterns and the connections in the data dictate the accesses.
One way that relational DBMSs implement graph queries is by translating the graph query into an SQL join query and processing it with their existing SQL engine. This approach is suboptimal. SQL engines do not leverage the graph structure when doing neighbor traversals. Furthermore, most SQL engines do not take advantage of sharing computation across paths of the graph belonging to multiple instantiations of a given path pattern.
Another approach, given that fast-access memory (also sometimes referred to as “main memory”) is becoming cheaper and larger, is for the DBMS to generate graph format data from relational tables stored on disk storage and to replicate the data in the graph format in fast-access memory. Such caching allows the graph data to be accessed faster due to the graph format and due to the faster memory. Accordingly, graph queries may be performed in a speedier fashion.
Unfortunately, this technique is subpar due to the lag that occurs between replications. Specifically, at any given point in time, some changes made at one of the replicas will not yet have been applied to the other replica. Consequently, the lag inherent in the replication mechanism may result in unpredictable artifacts and, possibly, incorrect results.
Furthermore, if the burden of updating the graph is placed on each query that updates data, e.g., a data manipulation language statement (DML instruction or DML), then the update to the graph(s) may negatively impact the performance of DMLs. Each process executing a DML instruction may need to reconstruct the graph to properly update it based on the DML instruction. Such a negative impact on DML executions would negatively affect the operation of the DBMS as a whole, slowing the DBMS considerably.
Also, each transaction generally needs to see its own changes, even before those changes have been committed. However, database changes are not typically replicated until the changes have been committed. Thus, a transaction may be limited to using the replica at which the transaction's uncommitted changes were made, even though the format of the data at the other replica may be more efficient for some operations.
In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
The approaches described herein include detecting and recording updates to a graph data structure, referred to herein as a “graph.” The graph may be generated from persistently stored data in a database of a DBMS and stored in fast-access (main) memory (thus also referred to herein as an “in-memory graph”).
Accordingly, to ensure that the in-memory graph reflects the up-to-date data, the in-memory graph has to be updated when any update occurs to the database objects that causes a topological graph change. “Topological graph change” or “graph change” refers herein to changes in the underlying data of the graph that affect the existence of vertices and/or edges of the graph. To keep a graph up-to-date and transactionally consistent, the approaches herein describe detecting graph change transactions and applying the graph changes with minimal effect on the execution of the DML instruction that caused the graph change in the first place.
Furthermore, the approaches enable high concurrency and speedy application of graph changes. For example, the set of processes that performs graph (read) operations may also perform the application of the graph changes, thereby speeding up the application and lessening the burden on the DML instructions to apply graph changes. The techniques further describe various lightweight time-stamped records that store graph changes per transaction and arrange the changes in a queue for concurrent application.
Additionally, since rebuilding the graph is a heavy operation, requiring a large number of input/output operations as well as taking up memory and processing time, the techniques describe generating a graph extension to an already generated base graph in response to a graph operation. Thereby, the base graph is infrequently regenerated, while a graph extension (which is generally a fraction of the base graph's size because the extension includes only new changes) is generated in response to a graph operation request based on the graph changes necessary for an accurate graph operation. Such a graph extension is particular to the requested timestamp of the graph operation request and may be temporarily stored in the private memory of the process.
In an implementation, a base graph (index) data structure is generated at a particular point in time. The point in time is associated with the graph to indicate the timeliness of the data in the graph. The timestamp may be represented by a logical timestamp of the DBMS (e.g., a System Change Number (SCN)). The graph includes data on source and destination vertices as well as the edges that connect the vertices. Each such element may have a corresponding value and/or identifier.
In an implementation, the elements of the graph originate from a corresponding database object(s) in which the elements are stored as items of the object. For example, the value of each source vertex of a graph may be persistently stored in a table of a database as rows of the table. The destination vertices of a graph may be similarly persistently stored in the same or different table in the same or different database. The edges of the graph may be stored as foreign keys associating the corresponding source and destination pair vertices.
Vertex identifiers (e.g., the elements stored in the dstArray of the CSRE representation described below) have two uses. First, a vertex identifier i makes it possible to hop to the neighbors of a vertex by jumping to srcArray[i]. Second, a vertex identifier makes it possible to retrieve the vertex properties of a vertex, as a given vertex identifier is mapped to a unique row in a vertex table. On the other hand, edge identifiers (e.g., the elements stored in edgeArray) are not used for topology traversal (e.g., to hop to neighbors); they are only used to retrieve edge properties.
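As a non-limiting illustration, the following Python sketch shows how such arrays support the two uses described above; the array contents are hypothetical, and only the array names mirror those used herein.

    # Hypothetical CSR-style arrays: srcArray[i] holds the offset into dstArray and
    # edgeArray at which the edges of source vertex i begin; srcArray[i + 1] holds
    # the offset at which they end.
    srcArray = [0, 2, 3, 3]       # three source vertices (0, 1, 2), three edges total
    dstArray = [1, 2, 2]          # destination vertex identifiers, grouped by source
    edgeArray = [100, 101, 102]   # edge identifiers, aligned with dstArray

    def neighbors(src_id):
        # Yield (destination vertex identifier, edge identifier) pairs of a source.
        for k in range(srcArray[src_id], srcArray[src_id + 1]):
            yield dstArray[k], edgeArray[k]

    # A returned destination identifier can be used to hop again (neighbors(dst))
    # or to fetch the vertex's properties from its unique row in a vertex table;
    # the edge identifier is used only to fetch edge properties.
    for dst, edge in neighbors(0):
        print(dst, edge)          # prints "1 100" and "2 101"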
A CSRE representation is built from DBMS tables, using a graph definition DDL statement that lists information, such as the tables involved in the graph, the PK-FK relationships that connect them, or the columns that are exposed as properties, in an implementation. CSREs are built in main memory and associated with the graph and its creation SCN, in such an implementation.
In an implementation, each element in the graph has an identifier that uniquely identifies the element in the graph. For example, source vertices may be uniquely identified, so no two source vertices have the same unique identifier; similarly, no two destination vertices have the same unique identifier, and similarly, no two edges have the same identifier.
In an implementation, such unique identification is based on the primary keys of the corresponding tables that persistently store the corresponding elements. For example, an image identifier (IMGID) is a unique numeric identifier that represents an element (a vertex or an edge) in a graph element table. There is a one-to-one mapping between an IMGID and a primary key value, and this mapping is maintained through the use of a dictionary (e.g., Global Dictionary (GD)). The insertion of new keys in the graph element table triggers insertions into the GD and the creation of new IMGIDs. IMGID values may be assigned sequentially, starting from 1. Primary keys that are deleted from a graph element table are never removed from the GD. If too many deletions occur, an IMGID re-compaction operation may be triggered: all entries are dropped from the GD, and new entries are created for primary keys that are present in the table only.
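A minimal Python sketch of such a key-to-IMGID dictionary follows, for illustration only; the class and method names are hypothetical and merely mirror the behavior described above (sequential assignment starting from 1, no removal on delete, and re-compaction).

    class GlobalDictionary:
        # Hypothetical per-table dictionary mapping primary key values to IMGIDs.
        def __init__(self):
            self.key_to_imgid = {}
            self.next_imgid = 1          # IMGIDs are assigned sequentially from 1

        def imgid_for(self, primary_key):
            # Return the IMGID of a key, minting a new IMGID on first insertion.
            if primary_key not in self.key_to_imgid:
                self.key_to_imgid[primary_key] = self.next_imgid
                self.next_imgid += 1
            return self.key_to_imgid[primary_key]

        def recompact(self, live_keys):
            # Deleted keys are never removed; when too many deletions accumulate,
            # all entries are dropped and only keys still present are re-inserted.
            self.key_to_imgid.clear()
            self.next_imgid = 1
            for key in live_keys:
                self.imgid_for(key)

    gd = GlobalDictionary()
    print(gd.imgid_for("PK-1001"), gd.imgid_for("PK-1002"))   # prints "1 2"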
DML with a Graph Change
A received query may contain a DML instruction that may affect a maintained graph (change to a vertex and/or an edge). To determine whether the DML instruction affects a graph without impacting the performance of the DML execution, a DML process determines whether the change to object(s) by the DML may cause a graph change without actually performing the graph change itself, in an implementation. In particular, the process determines whether such a change is a topological change to a graph (referred to herein as “graph change”): an insert/delete operation on a source vertex or an insert/delete operation on an edge. An example of non-topological change would be an update of an edge or vertex value.
A DML statement may be compiled once but executed multiple times, for example, a statement that references the same sequence of operations but with different operand values. Accordingly, while the statement is compiled once, the statement is executed multiple times. A non-limiting example of such a DML statement is one with a parameterized value, e.g., “UPDATE MY_TABLE SET c1=:1”.
In such an implementation, rather than performing the topological-change determination for every execution of the DML, the existence of a graph change is determined only once. At compile time, the DML compile process determines whether or not the execution of the DML causes a graph change. Subsequent executions of the DML re-use this determination.
Once the DML process determines that the affected object on the DBMS has corresponding topological graph element(s), the DML process generates an indication of a graph change for the DML and stores the association of the indication with the DML instruction. With the execution of the DML instruction, this indication is propagated to the database change record(s) generated by the execution of the DML instruction. Process(es), other than those executing the DML instruction, then apply the graph change based on the database change records.
For example, the object-to-graph mapping data (e.g., a mapping table) may map unique identifiers of database objects of graph elements to the CSRE arrays of the graph elements. The same database object (e.g., a relational table of the DBMS) may be involved in different CSREs and may have different roles, e.g., be a source or destination vertex table, an edge table, or both an edge and a vertex table.
If the database object being modified by the DML instruction is found in the mapping data at step 330, then the DML process proceeds to step 340. Otherwise, the DML process proceeds to complete the DML instruction.
Continuing with the mapping table example, the DML compiler process may look up the object identifier of the table targeted by the DML. If no mapping to CSREs is found, no topological change is caused by the DML instruction, and the DML process proceeds to complete the DML instruction at step 380. Otherwise, a list of corresponding CSRE graphs may be returned at step 330.
At step 340, if the DML instruction inserts an item into or deletes an item from the object, such as an INSERT/DELETE operation on a table, then the DML instruction adds or removes a vertex (or vertices). Therefore, the DML process proceeds to step 370 to indicate the DML instruction as a graph change operation.
Otherwise, if, at step 340, the DML instruction is determined to be an update to existing item(s), then the DML process proceeds to step 350. At step 360, the DML process determines whether the named set(s) of items of the object(s) (e.g., columns) referenced by the object-to-graph mapping data match the named sets referenced by the DML instruction. If there is a match, then the DML instruction updates a vertex or an edge. Additionally, the DML process may determine whether the update to the edge is to the edge itself or to the value(s) associated with the edge. If it is determined that the update is to any of the vertices or to the edge itself, the process proceeds to step 370 to indicate the DML instruction as a graph change operation.
Continuing with the in-memory mapping table example, to determine whether any of the columns targeted by the DML update impacts the topology of any of the CSRE graphs, the property graph is locked in shared mode, and a boolean flag tracking a topological graph change is initialized to FALSE. The CSREs from the list returned by the mapping table are searched for CSREs used by the property graph. For each such CSRE, the role of the updated table is determined. If it is a vertex table, the columns used to define the primary key of that vertex table are searched for in the set of columns targeted by the DML update. If at least one is in the DML set of columns, the DML process may stop the scan, as a graph change is caused by the DML, and the impacted flag is set to TRUE. If the graph element table is an edge table, then, in addition to the columns used to define the edge table primary key, the columns used to define a foreign key to the source and destination vertex tables are searched for in the set of columns targeted by the DML update. If at least one is in the DML set, the DML process may stop, and the impacted graph flag is set to TRUE.
Otherwise, the DML process proceeds to step 380 to complete the execution of the DML instruction.
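The column check of the in-memory mapping table example above may be sketched in Python as follows, purely for illustration; the dictionary keys (“role”, “pk_columns”, “fk_columns”) are assumed names rather than an actual DBMS interface.

    # Hypothetical sketch: does a DML update of the given columns cause a
    # topological graph change for any CSRE in which the updated table takes part?
    def dml_causes_graph_change(updated_columns, csre_roles):
        updated = set(updated_columns)
        for role in csre_roles:
            key_columns = set(role["pk_columns"])
            if role["role"] == "edge":
                # For an edge table, the foreign keys to the source and destination
                # vertex tables are part of the topology as well.
                key_columns |= set(role["fk_columns"])
            if updated & key_columns:
                return True    # at least one key column is updated; stop scanning
        return False           # only non-key (non-topological) columns are updated

    print(dml_causes_graph_change(
        {"SALARY"}, [{"role": "vertex", "pk_columns": {"EMP_ID"}}]))              # False
    print(dml_causes_graph_change(
        {"MGR_ID"},
        [{"role": "edge", "pk_columns": {"EMP_ID"}, "fk_columns": {"MGR_ID"}}]))  # True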
Any indication of graph change associated with the DML instruction(s) is propagated to the database change record(s). Stated differently, when the DML instruction with the indication of graph change is completed, any change to the database recorded in the database change records is marked with the indication. Such database changes in the records are committed as part of a transaction. A transaction may include a number of DML instructions, and thus, a transaction log may include multiple database change records for a transaction and identify each database change record that causes a graph change. On the other hand, a single DML instruction execution may generate multiple database change records, one for each modified item of an object.
Additionally, the indication of the graph change of a database change record may include an indication of a type of graph change. For an insert of an item that is associated with a graph change indication, a graph element insert indication is associated with the database change record. For a delete of an item, a graph element delete indication is associated with the database change record, and for an update of an item, a graph element update indication is associated with the database change record. If such an update was for an item that was part of an object that is used to generate a graph but is not a topological graph change, an indication of an update of a non-key element is associated with the database change record.
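For illustration, the indication types described above may be represented as a small enumeration; the following Python sketch is hypothetical, and its pairing of the indications with the abbreviated operation codes that appear in Table 1 below (ins, del, upd, updnk) is an assumption.

    from enum import Enum

    class GraphChangeIndication(Enum):
        # Hypothetical encoding of the per-record indications described above.
        ELEMENT_INSERT = "ins"     # insert of an item mapped to a graph element
        ELEMENT_DELETE = "del"     # delete of an item mapped to a graph element
        ELEMENT_UPDATE = "upd"     # update of a key item (topological change)
        NON_KEY_UPDATE = "updnk"   # update of a non-key element (not topological)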
In an implementation, the database change records in the transaction log are arranged by transactions and are further arranged by the database object (or an item thereof) that each database change record is modifying. The timestamp associated with the execution of the transaction is associated with the database change records of the transaction.
One example of a transaction log is a journal of database change records. The journal keeps track of changes made by transaction(s) to row values of columns of tables of database(s) and replicates those changes in in-memory data structures (MF data). Journals and in-memory data structures are described in detail in “Mirroring, In Memory, Data From Disk To Improve Query Performance,” U.S. patent application Ser. No. 14/337,179, filed on Jul. 21, 2014, referred to herein as the “Mirroring Data Application,” the entire contents of which are incorporated herein by this reference.
In an implementation, a journal for each transaction is extended to store database change records with a graph change indication as described above. Before the transactions are committed, the database changes made by a transaction are stored in the journal for that transaction, along with the determined graph change indication.
Each journal entry may contain all information required to determine (a) what data items are in the entry, (b) what version of those data items the entry reflects, and (c) whether the data items affect the graph and the type of graph change, if any. In one implementation, each journal entry includes:
Another example of a transaction log is a redo log, which stores database change records describing the objects changed and associated timestamps of the action.
In an implementation, a graph transactional change record is generated from a transaction log. The process determines the graph change for each database object or the item thereof, especially if multiple changes have occurred to the same database object or the item thereof during a transaction (multiple database change records are stored for the same database object in a transaction log).
In some implementations, the transaction log is already arranged by the modified items (or objects thereof) of a transaction. Accordingly, the range of addresses for those items (or the corresponding objects) may itself indicate whether any graph change is caused by the modification. For that reason, at step 410, the process may determine, based on the object-to-graph mapping data, whether the range of addresses/object of the item change(s) in the transactional change record causes any graph change. If the range of addresses maps to an object that has a corresponding graph data structure mapped in the object-to-graph mapping data, then the process proceeds to step 417. Otherwise, at step 415, the range is skipped, and the next range of addresses is analyzed at step 410.
At step 417, multiple database change records may exist for a modified item associated with a graph change indication. Accordingly, if so determined at step 417, then, at step 420, the first and last database change records for the modified item and/or database object may be selected. Based on the first and last database change records, and regardless of any intermediate database change records for the same item/object, the resulting graph change is determined at step 430.
At step 430, the process identifies the pattern of operations on the object item and, based on the pattern of the “first opc” and “last opc” columns of Table 1 for the respective database change records, determines the graph change operation as recited in the “topological operation” column.
[Table 1: for each combination of the first operation code (“first opc”) and the last operation code (“last opc”) recorded for an item — the operation codes being ins, del, upd, and updnk — the table recites the resulting “topological operation.” The body of the table is partially illegible in the original filing.]
If the topological operation is “noopg” (a no-op with respect to the graph, where “g” stands for a graph change operation), then, at step 440, the process proceeds to step 470, and if the next (set of) database change records exists in the transaction log, the process proceeds to step 410 to process the next (set of) database change records for the transaction.
Otherwise, at step 450, the process generates an entry for the graph transactional change record that corresponds to the determined graph operation. At step 460, the process appends the generated entry to the graph transactional change record. At step 470, the process repeats the same steps for the remaining database change records in the transaction until a graph transactional change record is generated that contains all the graph changes for the transaction.
Thereby, a graph transactional change record for a transaction is generated by appending entries for each determined graph operation type on the changed database object/item of the database change record(s). The transaction is also identified in the graph transactional change record along with the timestamp at which the database change(s) of the record have been applied (but may not be committed yet).
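A minimal Python sketch of this folding step follows, for illustration only; the NET_OPERATION table stands in for Table 1 (whose concrete contents are not reproduced here), and the default of “noopg” for unlisted combinations is an assumption.

    # Hypothetical sketch: fold the database change records of one transaction into
    # one net topological operation per modified item, using only the first and the
    # last record observed for that item.
    NET_OPERATION = {}   # {(first_opc, last_opc): topological operation}, per Table 1

    def graph_transactional_change_entries(change_records):
        # change_records: iterable of (item_id, opc) in execution order.
        first_last = {}
        for item_id, opc in change_records:
            if item_id not in first_last:
                first_last[item_id] = [opc, opc]   # first and last start out equal
            else:
                first_last[item_id][1] = opc       # keep overwriting the last opc
        entries = []
        for item_id, (first_opc, last_opc) in first_last.items():
            net = NET_OPERATION.get((first_opc, last_opc), "noopg")
            if net != "noopg":                     # "noopg": no graph change to record
                entries.append((item_id, net))
        return entries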
In the example of
Accordingly, graph transactional change records track changes based on their impact on the graph topology at the transaction granularity, in an implementation. The same object may appear several times in the record, but the sets of modified items (i.e., the slot numbers) for the object should be disjoint, and the operations should be different, in such an implementation. This means that a block of a table may appear at most three times in a single graph transactional change record (i.e., the number of different topological operations).
Graph change(s) are scheduled to be applied to the graph(s) using a graph change queue. The graph change queue maintains a queue of the generated transactional change record(s). In an implementation, a set of transaction commit processes schedule graph change(s) of the transaction by inserting the transactional change record of the transaction into the graph change queue. A transaction commit process may have been spawned to prepare and/or commit the transaction referenced in the transactional change record.
In addition to maintaining the order for the application of the transactional change records, the graph change queue maintains information about whether the database change(s) of a transactional change record have been committed and when such a commit occurred. For example, in an implementation, for each graph transactional change record in the queue, the transaction identifier of the transaction and, additionally or alternatively, the timestamp of the queue insertion are stored. Using the transaction identifier, the DBMS may determine whether the graph change(s) referenced in the graph transactional change record have been committed and at which timestamp. The timestamp of the insertion of the graph transactional change record into the queue may indicate the staleness of the application of the graph transactional change record.
Continuing with
The entry may duplicate information from the graph transactional change record, such as the transaction identifier, to accelerate the determination of whether or not to apply the graph transactional change record to the corresponding graph(s). A process may scan queue 550 and, based on the transaction identifiers in the entries, quickly determine whether the corresponding transactions have been committed. Other implementations of a graph change queue entry may include any range of information, from the full graph transactional change record to simply a pointer to one.
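The queue entry described above may be sketched as follows; the Python field names are hypothetical, chosen to mirror the information the entry is described as carrying.

    from collections import deque
    from dataclasses import dataclass

    @dataclass
    class GraphChangeQueueEntry:
        # Hypothetical sketch of an entry in the graph change queue.
        transaction_id: int
        pre_commit_scn: int       # logical timestamp taken when the entry is queued
        change_record: object     # the graph transactional change record, or merely
                                  # a pointer/reference to it

    graph_change_queue = deque()  # ordered by insertion (pre-commit) time

    def schedule_graph_changes(transaction_id, pre_commit_scn, change_record):
        # Called by a transaction commit process while preparing/committing.
        graph_change_queue.append(
            GraphChangeQueueEntry(transaction_id, pre_commit_scn, change_record))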
In an implementation, the set of DML processes and/or the set of commit processes are not the processes that perform the application of the graph changes themselves. To lessen the burden on DML compilation/execution, and so that DML compilation/execution times are not affected by graph changes, the DML processes only associate the indication of a graph change with the corresponding database object change(s), in an implementation. Similarly, in such an implementation, the transaction commit processes prepare and schedule the graph changes when the transaction is being prepared and/or committed. A different set of processes, asynchronous with the DML processing and the committing thereof, performs the graph change application.
In an implementation, the processes that are spawned to service graph operation(s) on graph(s) are in the set of processes that traverse the transactional change record(s) and apply the graph change(s). In such an implementation, when a query with a graph operation request is received, the process servicing the graph operation request determines whether any graph change is to be applied to accurately service the request. The determination is made based on the graph(s) referenced by the graph operation and the timestamp as of which the graph operation is requested (or, if none is specified in the request, the current timestamp). The processes traverse the graph changes in the queue and identify those graph change(s) that are for the referenced graph(s) in the operation as of the timestamp specified by the graph operation.
In an implementation, the set of processes that is spawned to apply graph changes by graph operations includes an allocating set of processes and a drain set of processes. The allocating set of processes is spawned by graph operation(s) to allocate memory for the graph change application in parallel and concurrent to the drain set of processes. The drain set of processes is spawned by the graph operations to apply the graph changes to generate a graph change log for the graph changes in the allocated memory.
An allocating process determines if an entry in the graph queue qualifies for the request based on the request's timestamp and the referenced graph(s). The allocating process may query the object-to-graph mapping with the object identifier in the graph transactional change record of the entry and retrieve the graph identifier(s) of the affected graph(s). For the affected graph, the allocating process allocates memory in the corresponding graph change log based on the number of graph change operations for the graph.
In an implementation, the set of allocating processes generates draining tasks for the set of draining processes to drain based on the received graph operations and the queue. Entries in the change queue are associated with timestamps obtained at pre-commit (during the queuing), which may be (slightly) older than the actual commit timestamp. Accordingly, an allocating process may compare the requested timestamp in the graph operation with the pre-commit timestamp recorded in the queue entry and, additionally, may retrieve the commit SCN for an accurate comparison. The determination of whether an entry qualifies for draining is made according to the Pscn (pre-commit logical timestamp SCN) associated with the queue entry, Cscn (logical timestamp SCN of the transaction commit) and Rscn (requested graph operation logical timestamp SCN).
Accordingly, the allocating process identifies the entry for draining if the requested graph operation timestamp is later than or equal to the commit timestamp of the transaction with the graph changes. Otherwise, the entry is skipped. The subsequent entries are then inspected, until the last entry of the queue is reached, to determine whether their commit timestamps are earlier than or the same as the requested timestamp and, thus, have to be allocated. The process may determine to skip an entry without inquiring about the transactional change record's commit timestamp if the pre-commit timestamp of the transaction is later than the requested timestamp, because the commit timestamp for such an entry has to be at least as late as the pre-commit timestamp.
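The qualification test described above can be sketched in Python as follows; lookup_commit_scn is a hypothetical callback assumed to return the commit SCN of a transaction, or None if the transaction has not committed.

    # Hypothetical sketch of deciding whether a queue entry qualifies for draining,
    # given its pre-commit SCN (pscn) and the requested graph operation SCN (rscn).
    def entry_qualifies_for_drain(pscn, rscn, transaction_id, lookup_commit_scn):
        if pscn > rscn:
            # The commit SCN is at least as late as the pre-commit SCN, so the entry
            # cannot qualify; skip it without looking up the commit timestamp.
            return False
        cscn = lookup_commit_scn(transaction_id)
        if cscn is None:
            return False           # uncommitted changes are never drained
        return cscn <= rscn        # drain only changes committed as of the request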
In an implementation, for draining transactional change records into the graph change log(s) of the impacted graph(s), an allocation process computes the memory space needed for the new records in the graph change log(s). The object section(s) of a transaction graph change record, such as 520 and 530 of the transactional change record 500 of
In an implementation, a graph change log record of the estimated size is created in the graph change logs. The record is set to a state that indicates that the record is not yet ready to be used in the graph operation (e.g., a “not ready” state). The record will be in the ready state once properly populated by a draining process.
In an implementation, the set of allocation processes generates drain task(s) for each of the graph change records allocated. The commit timestamp of the corresponding transaction may be the key for the corresponding task in the list of drain tasks. As soon as a drain task is added to the list, a concurrent draining process may start processing the drain task.
The allocation process may have reserved records in the corresponding change logs for the drain tasks to populate. Continuing with
A draining process may, concurrently with the allocating process, traverse the queue of entries and apply the graph change(s) in a graph transaction change record to populate allocated record(s) in the graph change log(s) of the affected graph(s). In an implementation, when multiple draining processes are spawned for the graph operation targeting the same graph, the multiple processes may concurrently drain entries of the queue based on associated timestamps of the entries. In other implementations, the draining process(es) may also perform the allocation, and the allocating and draining steps may be executed in series rather than concurrently.
In an implementation, the process that executes a graph operation traverses the graph change log of the referenced graph for records that qualify for the requested timestamp of the query. If the identified record in the traversal is not in a ready state, the graph process may drain the corresponding drain task identified by the record in the list of drain tasks, thereby executing as a process in the set of draining processes. The draining task is thereby assigned to the draining process by, as an example, associating the draining task with the process identifier. Executing a draining task includes converting the associated graph transactional change record of database operation(s) to record(s) of the graph change log of the graph. In particular, the process maps the affected items of the database objects (e.g., primary keys thereof) to IMGID(s) of the corresponding graph vertices. If a graph transactional change record includes an insert operation on a vertex object of a graph, the process generates a new IMGID for the inserted item. With the identified IMGID, the draining process populates the record, storing the IMGID and the corresponding operation on the vertex or the edge (INSERT, DELETE, UPDATE) in the graph change record. Once populated, the record(s) are set to a ready state.
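As an illustration of the draining step just described, the following Python sketch populates an allocated graph change log record from one object section of a graph transactional change record; imgid_for stands in for a Global Dictionary lookup (with a new IMGID assumed to be minted for inserted keys), and the record layout is hypothetical.

    def drain_into_record(change_entries, imgid_for, allocated_record):
        # change_entries: iterable of (primary_key, operation), where operation is
        # 'INSERT', 'DELETE', or 'UPDATE' on a vertex or an edge of one graph.
        for primary_key, operation in change_entries:
            imgid = imgid_for(primary_key)       # key -> IMGID mapping
            allocated_record["entries"].append((imgid, operation))
        allocated_record["ready"] = True         # record may now serve graph operations

    record = {"entries": [], "ready": False}     # reserved by the allocating process
    drain_into_record([("PK-7", "INSERT"), ("PK-3", "DELETE")],
                      imgid_for={"PK-7": 7, "PK-3": 3}.get,
                      allocated_record=record)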
For example, the received query references CSR 2, whose corresponding CSRE graph change log is graph change log 602 of
Using graph transactional change record 500 associated with drain task 611, the draining process determines the graph changes to be applied. The draining process updates all graph change logs impacted by the graph transactional change record, which, in this example, includes record 652 of Graph Change Log 601 and record 661 of Graph Change Log 602, as depicted in
In an implementation, the draining processes generate topological graph operations based on the information in the graph transactional change record and store the generated graph operations in the graph change log records of all affected graphs. A mapping of the item(s) (e.g., row addresses) of the object(s) referenced in a graph transactional change record to the corresponding IMGID(s) is generated using the global dictionaries. The draining process encodes the keys of the impacted rows of the impacted vertex and edge tables to IMGIDs. These mappings are performed while minimizing block fetches from the tables and global dictionary lookups. For example, a block is fetched at most twice (once for row values at commit, once for values before the first change by the committed transaction), and the global dictionary lookups for IMGIDs are done in bulk for all row identifiers.
In an implementation, for each object in the object section of the transactional graph change log record, the corresponding set of impacted graphs is determined. The draining process determines the role of the object in an impacted graph (source or destination vertex table, or edge table). The draining process maintains a pointer to the next free section of the to-be-populated graph change log record. The next section is a vertex section if the role of the object in the impacted graph is a source or destination vertex, or an edge section if the role is an edge.
In an implementation, the draining process retrieves the IMGIDs for each item of the object (e.g., arrays of key values for each row of a table referenced in the graph transactional change record) by querying the global dictionaries that map the key values to the IMGIDs. The draining process stores into the appropriate section of the graph change log the retrieved IMGID(s) and the corresponding operation type. For example, for an edge update operation, the draining process retrieves the IMGIDs for the source, destination, and edge from the corresponding primary and foreign keys described in the object table(s) for the row. The draining process then modifies the edge section of the graph change log record with the source and destination vertex IMGIDs, the edge IMGID, and the updated value for the edge. The draining process also indicates in the graph change log record that the operation type is UPDATE. Additionally, the draining process may record an IMGID of zero value for an edge IMGID if the edge is being deleted.
Additionally or alternatively, the DBMS may spawn background drain processes to drain the task queue. When no query containing a graph operation related to a draining task is received, the drain task may continue to persist (and, if such a query is never received, the draining task may persist indefinitely). For that reason, and as an alternative implementation, the DBMS may spawn a set of background drain processes that select draining tasks to drain solely based on the respective timestamps meeting a staleness criterion.
For example, if the draining task 610 does not qualify for any graph operation and, at SCN 3, has not yet been drained, a background process may be spawned at any time later than SCN 3 to drain all the draining tasks that have a timestamp earlier than or the same as SCN 3. Drain task 610 qualifies for the background task draining, and therefore, the process identifier of the background process is stored in draining task 610 (not depicted in
An inflight-extended memory graph is a data structure that augments the base memory graph data structure to incorporate the graph changes captured in the graph change log of the graph. The in-memory graph extension indicates the modified vertices and/or edges of the graph. Additionally, the in-memory graph extension includes additions to the graph, such as information about a new vertex/edge or updated values to an existing vertex/edge as specified in the corresponding graph change log record(s).
The in-memory graph extension may contain multiple data structures based on the types of graph operation(s) in the change log. For example, for vertex/edge insertion, the data representation of the new edges and vertices in the graph extension is the same as the arrangement used in the base memory graph data structure, with additional data structure(s) for mapping to the base graph data structure. In an implementation, an in-memory graph extension is implemented as a partial CSRE, which includes a dstArray, srcArray, and/or edgeArray, each storing the respective updated information from the graph change log of the graph. Additional array(s) may map the indices of the extended arrays to the IMGIDs of the base graph.
In an implementation, the in-memory graph extension is generated on request, based on the graph operation qualifying one or more records of the graph change log of the graph, for accurate traversal of the graph. A graph operation may indicate a requested timestamp for the results. If none is indicated, the current timestamp is presumed. If a target graph of the graph operation has no associated graph change log, or has no records in the graph change log as of the requested timestamp, then only the in-memory base graph is used for traversal and generation of the result.
Otherwise, the graph operation is performed by traversing the records of the graph change log of target graph(s) that are earlier than or the same as the requested timestamp. Accordingly, the graph extension is generated for the graph changes as of the requested timestamp.
The process spawned by the graph operation may perform the traversal of the graph change log records and generate the in-memory graph extension(s). The results are generated from the traversal of the base memory graph and the graph extension(s). In an implementation, the graph extension(s) are generated in the private memory of the process and are temporary, thereby deallocated after the results have been generated.
In an implementation, an in-memory graph extension of a base graph includes a set of indicators that reference the vertex(ices) and/or edge(s) that are deleted in the qualifying graph change log records. The reference(s) are generated by processing the qualified delete record(s) in the graph change log. The process for a graph operation scans the records that qualify for the graph operation. When a delete operation record of the graph change log is identified as qualifying for the graph operation, the process generates a reference to the vertex or edge to be deleted. The reference, indicating the vertex or the edge in the base memory graph, is stored by the process in the in-memory graph extension of the base graph.
Graph Extension 842 is an extension of the CSRE of Base Graph 800 reflecting the deleted entries of the qualified graph change log record(s). The process generates Invalid Sources 851 by identifying delete entries in the graph change log that match the graph operation. For each deleted source vertex entry identified by the process, the process generates a reference in Invalid Sources 851 to the source vertex in the srcArray 810. For example, the graph change log may contain a record that qualified for the graph operation and contains a delete entry for the source vertex with IMGID 45. In an implementation in which Invalid Sources 851 is a bit vector, the bit corresponding to the location of IMGID 45 is set.
Similarly, in an implementation, the process generates elements for Invalid Edges 852 that correspond to the entry(ies) in the record(s) of the graph change log that qualify for the graph operation and that contain a deletion of an edge. The entry may contain the IMGID of a destination vertex to be deleted. The process inserts the reference(s) to the deleted edges (in dstArray 820 and edgeArray 830) into Invalid Edges 852.
For example, a qualified graph change log record entry may contain a deletion of the destination vertex with the IMGID of 142. dstArray 820 contains two elements with the destination vertex of 142, corresponding to indices 2 and 5 in Edge Index 831. Accordingly, the process adds elements in Invalid Edges 852 that reference dstArray 820 and edgeArray 830 at indices 2 and 5 of Edge Index 831. In an implementation in which Invalid Edges 852 is a bit vector, the bits at locations 2 and 5 of the bit vector are set.
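A Python sketch of building such deletion indicators follows, for illustration only; boolean lists stand in for the bit vectors, and edge_indices_for_destination is a hypothetical helper that returns the Edge Index positions at which a destination IMGID appears.

    def build_invalid_indicators(qualified_delete_entries, num_sources, num_edges,
                                 edge_indices_for_destination):
        # Positions in the lists follow the base graph's Source Index and Edge Index.
        invalid_sources = [False] * num_sources
        invalid_edges = [False] * num_edges
        for kind, imgid in qualified_delete_entries:
            if kind == "source_vertex":
                invalid_sources[imgid] = True        # skip this source when traversing
            elif kind == "destination_vertex":
                for edge_index in edge_indices_for_destination(imgid):
                    invalid_edges[edge_index] = True # skip these edges when traversing
        return invalid_sources, invalid_edges

    # Mirrors the example above: source vertex 45 deleted, destination vertex 142
    # deleted (appearing at Edge Index positions 2 and 5).
    inv_src, inv_edges = build_invalid_indicators(
        [("source_vertex", 45), ("destination_vertex", 142)],
        num_sources=64, num_edges=8,
        edge_indices_for_destination=lambda imgid: [2, 5] if imgid == 142 else [])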
In an implementation, for modifying graph change log entry(ies), the in-memory graph extension of a base graph includes a set of indicators that reference the vertex(ices) and/or edge(s) that are modified. The process spawned by the graph operation generates the references by scanning the graph change log of the target graph to identify the records, and the entries thereof, that qualify for the graph operation and contain a modifying operation.
When a modifying operation (e.g., update or insert) entry of the qualifying graph change log record is identified, a graph operation process generates a reference to a vertex or an edge to be modified. If an existing vertex/edge is updated, the reference to the vertex/edge in the base memory graph is generated and stored by the process in the in-memory graph extension of the base graph. Additionally, the changed information for the vertex/edge is stored in the in-memory graph extension in a similar arrangement as the base graph. For example, the modified vertex/edge may be stored as an additional vertex or edge of CSRE.
In an implementation, the process generates Modified Sources 853 by identifying entries of record(s) in the graph change log that match the graph operation for an update of an existing source vertex with an additional edge. For each such identified source vertex entry, the process generates a reference in Modified Sources 853 to the source vertex in srcArray 810. For example, the graph change log may contain an entry that qualified for the graph operation and contains an update entry for the source vertex with IMGID 44, adding two additional edges to destination vertices with IMGIDs 142 and 144. In an implementation in which Modified Sources 853 is a bit vector, the bit corresponding to the location of IMGID 44 in srcArray may be set to indicate the modified source vertex.
To map an additional edge of the graph to the corresponding source vertex in the base graph, the CSRE extension maintains an additional array aligned with the other arrays of the CSRE extension, in an implementation. Since the index of the CSRE extension's source array does not represent the IMGIDs of the vertices included, the CSRE extension may additionally include an array that is aligned by index with the CSRE extension arrays and stores the IMGID of each source vertex included in the CSRE extension.
Continuing with
The new edges are added to Changed dstArray 870 and Changed edgeArray 880 and associated with the modified source vertex in Changed srcArray 860. Changed IMGIDs 862 contains the IMGIDs of the source vertices included in the CSRE extension of Graph Extension 844.
Since the index of the source vertex array in Changed srcArray 860, Changed Source Index 861, is different from the IMGIDs in Base Graph 800's Source Index 811, the process may additionally generate a vertex mapping data structure and include the mapping between the IMGIDs of Source Index 811 and the index of Changed Source Index 861 for the modified source vertex. Such a mapping data structure is Vertex Mapping 855, which maps a source vertex in Base Graph 800 to the same source vertex in CSRE Extension 857.
To illustrate an example of inserting new edges into CSRE Extension 857 for an existing source vertex, the process identifies a modified source vertex with IMGID 44 in the graph change log record. The process adds an entry with value 44 to Changed IMGIDs 862 and entries to Changed srcArray 860 to indicate the indices of the respective added edges in Edge Index 881. The process adds entries to Changed dstArray 870 for the new destination vertices, having IMGIDs 142 and 144, and for their corresponding edges, having IMGIDs 108 and 110, in Changed edgeArray 880. To complete the addition of these two new edges to the modified source vertex, the next entry in Changed srcArray 860, at Changed Source Index 861 value 1, is incremented by 2. Additionally, for mapping the vertex IMGID to Changed srcArray 860, Vertex Mapping 855 is augmented with an entry mapping IMGID 44 of Base Graph 800 to value 0 of the Changed Source Index of CSRE Extension 857.
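For illustration only, the following Python snapshot reproduces the CSRE extension state of this example (source vertex IMGID 44 gaining edges to destination IMGIDs 142 and 144); the variable names approximate the figure elements and are otherwise hypothetical.

    changed_imgids    = [44]          # IMGIDs of source vertices present in the extension
    changed_srcArray  = [0, 2]        # offsets into the changed destination/edge arrays
    changed_dstArray  = [142, 144]    # new destination vertex IMGIDs
    changed_edgeArray = [108, 110]    # new edge IMGIDs
    vertex_mapping    = {44: 0}       # base-graph IMGID -> Changed Source Index

    def extension_edges(base_imgid):
        # Yield the additional (destination IMGID, edge IMGID) pairs recorded in the
        # extension for a source vertex that also exists in the base graph.
        if base_imgid in vertex_mapping:
            i = vertex_mapping[base_imgid]
            for k in range(changed_srcArray[i], changed_srcArray[i + 1]):
                yield changed_dstArray[k], changed_edgeArray[k]

    print(list(extension_edges(44)))  # prints "[(142, 108), (144, 110)]"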
Similarly, when inserting a new source vertex, an insert entry of a qualified change log record is identified by a graph operation process. The process inserts the new source vertex of the entry into the in-memory graph extension in the same arrangement as in the base in-memory graph, in an implementation. The entry in the graph change log may contain an insertion of a source vertex alone or with edge(s). If the entry includes a new edge, the edge is also inserted into the in-memory graph extension.
In an implementation, the graph operation process generates CSRE array entries in the in-memory graph extension to represent the source vertex with the edge(s), if any. The IMGIDs for the new vertex(ices) (or new edge(s), if present) may already have been generated and stored in the entry of the qualified graph change log record. Alternatively, the process may generate such IMGIDs using techniques described herein.
Continuing with
If no edge is present in the insertion entry of the graph change log, then the process inserts only the new source vertex IMGID into the CSRE of the in-memory graph extension without any modification of the destination or edge representations. For example, for the new source vertex with IMGID 48 at index 3 of Changed Source Index 861, index 4 of Changed srcArray 860 reflects no increment from the previous index, indicating no edge. Accordingly, no destination vertex or edge insertion is performed into Changed dstArray 870 and Changed edgeArray 880.
A graph operation process from the set of graph operation processes spawned to service the graph operation request may perform the traversal of the modified graph for the graph operation. The traversal is performed by combining the traversal of the base in-memory graph and the in-memory graph extension generated by the same set of graph operation processes.
When traversing the base memory graph, the process refers to the deleted source vertex and/or edge representations to determine whether to skip the respective source vertex or edge. In an implementation in which such representations are bit vectors, the index corresponding to a set bit in the bit vector is skipped in the respective source or edge arrays when traversing the graph.
Similarly, when traversing the base memory graph, the process refers to the modified source representations in the extended memory graph to determine whether the scanned source vertex has been modified to include additional edge(s). If so, using the source vertex mapping data structure, the process identifies the corresponding source vertex in the graph extension and traverses the additional edge(s) as identified in the graph extension.
To determine whether there has been any insertion into the base graph, the process scans the graph extension and determines whether the graph extension contains vertex(ices) stored in addition to those that have been modified in the base graph. If any exist, the process scans the additional vertex(ices) and their corresponding edges, if any.
Graph Extension 840 in
For example, the process traversing Base Graph 800 scans the source vertex with IMGID 44 of Source Index 811 of srcArray 810. In Modified Sources 853 of Graph Extension 840, this source vertex is indicated as modified. Accordingly, in addition to traversing Base Graph 800's single edge with IMGID 107 in edgeArray 830 to the destination vertex with IMGID 143 in dstArray 820, the process also scans Vertex Mapping 855 of Graph Extension 840 to determine the additional edges for the referenced source vertex. Vertex Mapping 855 indicates that the source vertex with IMGID 44 corresponds to Changed Source Index 0. At Changed Source Index 0 of CSRE Extension 857, Changed IMGIDs 862 indeed contains IMGID 44, and Changed srcArray 860 has value 0 at that index and value 2 at the next index of Changed srcArray 860. Thus, two additional edges are traversed by the process in CSRE Extension 857: an edge to destination IMGID 142 in Changed dstArray 870 with edge IMGID 108 in Changed edgeArray 880, and an edge to destination IMGID 144 in Changed dstArray 870 with edge IMGID 110 in Changed edgeArray 880. Thereby, the process traverses three edges in total, one in Base Graph 800 and two in Graph Extension 840, for the graph operation.
When the process scans the next source vertex, at index (IMGID) 45 of srcArray 810, the process identifies that the source vertex is indicated in Invalid Sources 851 and is therefore skipped as deleted.
When the process scans the next source vertex, at index (IMGID) 46, the process determines that the source vertex is not referenced in Invalid Sources 851 or Modified Sources 853 of Graph Extension 840. Therefore, the process continues traversing the source vertex by scanning the corresponding entries of srcArray 810 at indices 46 and 47 of Source Index 811 to determine that the source vertex has three edges to destination vertices in dstArray 820, with the corresponding edges in edgeArray 830.
The process identifies that one of the edge indices of the determined edges is referenced by Invalid Edges 852. Because the edge (with IMGID 106) from the source vertex with IMGID 46 to the destination vertex with IMGID 142 has an Edge Index 831 value of 5, which is referenced in Invalid Edges 852, the process skips that edge in the graph traversal.
The process then proceeds to CSRE Extension 857 to determine whether any new source vertices, and edges thereof (if any), have been inserted from the graph change log record(s). The process traverses CSRE Extension 857, which additionally includes the Changed IMGIDs 862 array of IMGIDs for new or modified source vertex(ices). The process determines that new source vertices have been added based on the presence of IMGIDs and entries that are not referenced in the Modified Sources 853 array.
Accordingly, the process determines that source vertices with IMGIDs 40, 46, and 48 have been inserted into the graph because their entries are present in the Changed IMGIDs 862 and/or Changed srcArray 860 lists. The process traverses the edges based on the values of Changed srcArray 860 at the current and next entries corresponding to each new source vertex. For example, the value at index 1 of Changed srcArray 860, corresponding to IMGID 40, is 2, and the value at the next index is 3; thus there is a single edge, with IMGID 111 in Changed edgeArray 880, to a destination vertex with IMGID 148 in Changed dstArray 870, at index 2 (the entry value for the source vertex in Changed srcArray 860) of Edge Index 881.
Accordingly, the process traverses a complete graph generated based on Base Graph 800 and Graph Extension 840 that reflects all the committed changes at the requested timestamp of the received graph query.
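The combined traversal walked through above may be summarized, purely as an illustrative sketch, by the following Python function; the dictionary-based layout of the base graph and its extension is assumed for brevity and does not reflect an actual in-memory format.

    def traverse(base, ext):
        # Returns (source IMGID, destination IMGID, edge IMGID) triples visible as of
        # the requested timestamp, combining the base graph and its extension.
        edges = []
        # 1. Base graph: skip deleted sources/edges; follow modified sources into the
        #    extension for their additional edges.
        for src in range(len(base["srcArray"]) - 1):
            if src in ext["invalid_sources"]:
                continue                                       # deleted source vertex
            for k in range(base["srcArray"][src], base["srcArray"][src + 1]):
                if k in ext["invalid_edges"]:
                    continue                                   # deleted edge
                edges.append((src, base["dstArray"][k], base["edgeArray"][k]))
            if src in ext["modified_sources"]:
                i = ext["vertex_mapping"][src]
                for k in range(ext["changed_srcArray"][i], ext["changed_srcArray"][i + 1]):
                    edges.append((src, ext["changed_dstArray"][k], ext["changed_edgeArray"][k]))
        # 2. Extension: source vertices not flagged as modified are newly inserted.
        for i, imgid in enumerate(ext["changed_imgids"]):
            if imgid in ext["modified_sources"]:
                continue                                       # already handled above
            for k in range(ext["changed_srcArray"][i], ext["changed_srcArray"][i + 1]):
                edges.append((imgid, ext["changed_dstArray"][k], ext["changed_edgeArray"][k]))
        return edges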
A database management system (DBMS) manages a database. A DBMS may comprise one or more database servers. A database comprises database data and a database dictionary that are stored on a persistent memory mechanism, such as a set of hard disks. Database data may be organized into database objects and stored in one or more data containers. Each container contains records. The data within each record is organized into one or more fields. In relational DBMSs, the data containers are referred to as tables, the records are referred to as rows, and the fields are referred to as columns. In object-oriented databases, the data containers are referred to as object classes, the records are referred to as objects, and the fields are referred to as attributes. Other database architectures may use other terminology to refer to database objects.
In implementations, the databases may be structured as key-value stores (e.g., NoSQL or JSON) where different database objects may represent different data structures. Key-values and associated objects can be referenced, for example, by utilizing look-up tables such as hash tables.
Users interact with a database server of a DBMS by submitting to the database server commands that cause the database server to perform operations on data stored in a database. A user may be one or more applications running on a client computer that interact with a database server. Multiple users may also be referred to herein collectively as a user.
As used herein, “query” refers to a database command and may be in the form of a database statement that conforms to a database language. In one implementation, a database language for expressing the query is the Structured Query Language (SQL). There are many different versions of SQL; some versions are standard and some proprietary, and there are a variety of extensions. Data definition language (“DDL”) commands are issued to a database server to create or configure database schema, including database containers, such as tables, views, or complex data types. SQL/XML is a common extension of SQL used when manipulating XML data in an object-relational database. Although the implementations of the invention are described herein using the term “SQL,” the invention is not limited to just this particular database query language and may be used in conjunction with other database query languages and constructs.
A client may issue a series of requests, such as requests for execution of queries, to a database server by establishing a database session, referred to herein as “session.” A session comprises a particular connection established for a client to a database server, such as a database instance, through which the client may issue a series of requests. The database server may maintain session state data about the session. The session state data reflects the current state of the session and may contain the identity of the user for which the session is established, services used by the user, instances of object types, language and character set data, statistics about resource usage for the session, temporary variable values generated by processes executing software within the session, and storage for cursors and variables and other information. The session state data may also contain execution plan parameters configured for the session.
Database services are associated with sessions maintained by a DBMS with clients. Services can be defined in a data dictionary using data definition language (DDL) statements. A client request to establish a session may specify a service. Such a request is referred to herein as a request for the service. Services may also be assigned in other ways, for example, based on user authentication with a DBMS. The DBMS directs requests for a service to a database server that has been assigned to running that service. The one or more computing nodes hosting the database server are referred to as running or hosting the service. A service is assigned, at run-time, to a node in order to have the node host the service. A service may also be associated with service-level agreements, which are used to assign a number of nodes to services and allocate resources within nodes for those services. A DBMS may migrate or move a service from one database server to another database server that may run on a different one or more computing nodes. The DBMS may do so by assigning the service to be run on the other database server. The DBMS may also redirect requests for the service to the other database server after the assignment. In an implementation, after successfully migrating the service to the other database server, the DBMS may halt the service running in the original database server.
A multi-node database management system is made up of interconnected nodes that share access to the same database. Typically, the nodes are interconnected via a network and share access, in varying degrees, to shared storage, e.g., shared access to a set of disk drives and data blocks stored thereon. The nodes in a multi-node database system may be in the form of a group of computers (e.g., workstations, personal computers) that are interconnected via a network. Alternately, the nodes may be the nodes of a grid, which is composed of nodes in the form of server blades interconnected with other server blades on a rack.
Each node in a multi-node database system hosts a database server. A server, such as a database server, is a combination of integrated software components and an allocation of computational resources, such as memory, a node, and processes on the node for executing the integrated software components on a processor, the combination of the software and computational resources being dedicated to performing a particular function on behalf of one or more clients.
Resources from multiple nodes in a multi-node database system may be allocated to running a particular database server's software. Each combination of the software and allocation of resources from a node is a server that is referred to herein as a “server instance” or “instance.” A database server may comprise multiple database instances, some or all of which are running on separate computers, including separate server blades.
Software system 900 is provided for directing the operation of computing system 1000. Software system 900, which may be stored in system memory (RAM) 1006 and on fixed storage (e.g., hard disk or flash memory) 1010, includes a kernel or operating system (OS) 910.
The OS 910 manages low-level aspects of computer operation, including managing execution of processes, memory allocation, file input and output (I/O), and device I/O. One or more application programs represented as 902A, 902B, 902C . . . 902N, may be “loaded” (e.g., transferred from fixed storage 1010 into memory 1006) for execution by the system 900. The applications or other software intended for use on computer system 1000 may also be stored as a set of downloadable computer-executable instructions, for example, for downloading and installation from an Internet location (e.g., a Web server, an app store, or another online service).
Software system 900 includes a graphical user interface (GUI) 915, for receiving user commands and data in a graphical (e.g., “point-and-click” or “touch gesture”) fashion. These inputs, in turn, may be acted upon by the system 900 in accordance with instructions from operating system 910 and/or application(s) 902. The GUI 915 also serves to display the results of operation from the OS 910 and application(s) 902, whereupon the user may supply additional inputs or terminate the session (e.g., log off).
OS 910 can execute directly on the bare hardware 920 (e.g., processor(s) 1004) of computer system 1000. Alternatively, a hypervisor or virtual machine monitor (VMM) 930 may be interposed between the bare hardware 920 and the OS 910. In this configuration, VMM 930 acts as a software “cushion” or virtualization layer between the OS 910 and the bare hardware 920 of the computer system 1000.
VMM 930 instantiates and runs one or more virtual machine instances (“guest machines”). Each guest machine comprises a “guest” operating system, such as OS 910, and one or more applications, such as application(s) 902, designed to execute on the guest operating system. The VMM 930 presents the guest operating systems with a virtual operating platform and manages the execution of the guest operating systems.
In some instances, the VMM 930 may allow a guest operating system to run as if it is running on the bare hardware 920 of computer system 1000 directly. In these instances, the same version of the guest operating system configured to execute on the bare hardware 920 directly may also execute on VMM 930 without modification or reconfiguration. In other words, VMM 930 may provide full hardware and CPU virtualization to a guest operating system in some instances.
In other instances, a guest operating system may be specially designed or configured to execute on VMM 930 for efficiency. In these instances, the guest operating system is “aware” that it executes on a virtual machine monitor. In other words, VMM 930 may provide para-virtualization to a guest operating system in some instances.
A computer system process comprises an allotment of hardware processor time, and an allotment of memory (physical and/or virtual), the allotment of memory being for storing instructions executed by the hardware processor, for storing data generated by the hardware processor executing the instructions, and/or for storing the hardware processor state (e.g. content of registers) between allotments of the hardware processor time when the computer system process is not running. Computer system processes run under the control of an operating system and may run under the control of other programs being executed on the computer system.
Multiple threads may run within a process. Each thread also comprises an allotment of hardware processing time but shares access to the memory allotted to the process. That memory is used to store the content of the hardware processor (e.g., register contents) between the thread's allotments when the thread is not running. The term thread may also be used to refer to a computer system process in which multiple threads are not running.
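As an illustrative sketch, assuming only the Python standard library, the following example shows multiple threads running within a single process and sharing access to the memory allotted to that process.

```python
# Several threads within one process append to the same list, which resides
# in the memory allotted to the process and is therefore shared by all threads.
import threading

shared_results = []                     # lives in the process's memory
lock = threading.Lock()                 # coordinates access to the shared memory

def worker(worker_id: int) -> None:
    with lock:
        shared_results.append(f"worker-{worker_id} ran")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_results)                   # four entries, all in one shared list
```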
The term “cloud computing” is generally used herein to describe a computing model which enables on-demand access to a shared pool of computing resources, such as computer networks, servers, software applications, and services, and which allows for rapid provisioning and release of resources with minimal management effort or service provider interaction.
A cloud computing environment (sometimes referred to as a cloud environment, or a cloud) can be implemented in a variety of different ways to best suit different requirements. For example, in a public cloud environment, the underlying computing infrastructure is owned by an organization that makes its cloud services available to other organizations or to the general public. In contrast, a private cloud environment is generally intended solely for use by or within a single organization. A community cloud is intended to be shared by several organizations within a community; while a hybrid cloud comprises two or more types of cloud (e.g., private, community, or public) that are bound together by data and application portability.
Generally, a cloud computing model enables some of the responsibilities that previously may have been provided by an organization's own information technology department to instead be delivered as service layers within a cloud environment, for use by consumers (either within or external to the organization, according to the cloud's public/private nature). Depending on the particular implementation, the precise definition of components or features provided by or within each cloud service layer can vary, but common examples include: Software as a Service (SaaS), in which consumers use software applications that are running upon a cloud infrastructure, while a SaaS provider manages or controls the underlying cloud infrastructure and applications; Platform as a Service (PaaS), in which consumers can use software programming languages and development tools supported by a PaaS provider to develop, deploy, and otherwise control their own applications, while the PaaS provider manages or controls other aspects of the cloud environment (i.e., everything below the run-time execution environment); Infrastructure as a Service (IaaS), in which consumers can deploy and run arbitrary software applications, and/or provision processing, storage, networks, and other fundamental computing resources, while an IaaS provider manages or controls the underlying physical cloud infrastructure (i.e., everything below the operating system layer); and Database as a Service (DBaaS), in which consumers use a database server or Database Management System that is running upon a cloud infrastructure, while a DBaaS provider manages or controls the underlying cloud infrastructure, applications, and servers, including one or more database servers. In a cloud computing environment, there is no insight into the application or the application data. For a planned operation that requires disconnection, the techniques discussed herein make it possible to release and later rebalance sessions with no disruption to applications.
The above-described basic computer hardware and software and cloud computing environment are presented for the purpose of illustrating the basic underlying computer components that may be employed for implementing the example implementation(s). The example implementation(s), however, are not necessarily limited to any particular computing environment or computing device configuration. Instead, the example implementation(s) may be implemented in any type of system architecture or processing environment that one skilled in the art, in light of this disclosure, would understand as capable of supporting the features and functions of the example implementation(s) presented herein.
According to one implementation, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general-purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
For example, the techniques may be implemented on a computer system 1000 that includes a bus 1002 or other communication mechanism for communicating information, and a hardware processor 1004 coupled with bus 1002 for processing information.
Computer system 1000 also includes a main memory 1006, such as a random access memory (RAM) or another dynamic storage device, coupled to bus 1002 for storing information and instructions to be executed by processor 1004. Main memory 1006 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 1004. Such instructions, when stored in non-transitory storage media accessible to processor 1004, render computer system 1000 into a special-purpose machine that is customized to perform the operations specified in the instructions.
Computer system 1000 further includes a read-only memory (ROM) 1008 or other static storage device coupled to bus 1002 for storing static information and instructions for processor 1004. A storage device 1010, such as a magnetic disk or optical disk, is provided and coupled to bus 1002 for storing information and instructions.
Computer system 1000 may be coupled via bus 1002 to a display 1012, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 1014, including alphanumeric and other keys, is coupled to bus 1002 for communicating information and command selections to processor 1004. Another type of user input device is cursor control 1016, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 1004 and for controlling cursor movement on display 1012. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
Computer system 1000 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 1000 to be a special-purpose machine. According to one implementation, the techniques herein are performed by computer system 1000 in response to processor 1004 executing one or more sequences of one or more instructions contained in main memory 1006. Such instructions may be read into main memory 1006 from another storage medium, such as storage device 1010. Execution of the sequences of instructions contained in main memory 1006 causes processor 1004 to perform the process steps described herein. In alternative implementations, hard-wired circuitry may be used in place of or in combination with software instructions.
The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 1010. Volatile media includes dynamic memory, such as main memory 1006. Common forms of storage media include, for example, a floppy disk, a flexible disk, a hard disk, a solid-state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.
Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire, and fiber optics, including the wires that comprise bus 1002. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 1004 for execution. For example, the instructions may initially be carried on a magnetic disk or solid-state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 1000 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal, and appropriate circuitry can place the data on bus 1002. Bus 1002 carries the data to main memory 1006, from which processor 1004 retrieves and executes the instructions. The instructions received by main memory 1006 may optionally be stored on storage device 1010 either before or after execution by processor 1004.
Computer system 1000 also includes a communication interface 1018 coupled to bus 1002. Communication interface 1018 provides a two-way data communication coupling to a network link 1020 that is connected to a local network 1022. For example, communication interface 1018 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 1018 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 1018 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.
Network link 1020 typically provides data communication through one or more networks to other data devices. For example, network link 1020 may provide a connection through local network 1022 to a host computer 1024 or to data equipment operated by an Internet Service Provider (ISP) 1026. ISP 1026, in turn, provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 1028. Local network 1022 and Internet 1028 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 1020 and through communication interface 1018, which carry the digital data to and from computer system 1000, are example forms of transmission media.
Computer system 1000 can send messages and receive data, including program code, through the network(s), network link 1020 and communication interface 1018. In the Internet example, a server 1030 might transmit a requested code for an application program through Internet 1028, ISP 1026, local network 1022 and communication interface 1018.
The received code may be executed by processor 1004 as it is received, and/or stored in storage device 1010 or other non-volatile storage for later execution.
A computing node is a combination of one or more hardware processors that each share access to a byte-addressable memory. Each hardware processor is electronically coupled to registers on the same chip of the hardware processor and is capable of executing an instruction that references a memory address in the addressable memory and that causes the hardware processor to load data at that memory address into any of the registers. In addition, a hardware processor may have access to its separate exclusive memory that is not accessible to other processors. The one or more hardware processors may be running under the control of the same operating system.
A hardware processor may comprise multiple core processors on the same chip, each core processor (“core”) being capable of separately executing a machine code instruction within the same clock cycles as another of the multiple cores. Each core processor may be electronically coupled to a scratchpad memory that cannot be accessed by any other core processor of the multiple core processors.
A cluster comprises computing nodes that each communicate with each other via a network. Each node in a cluster may be coupled to a network card or a network-integrated circuit on the same board of the computing node. Network communication between any two nodes occurs via the network card or network integrated circuit on one of the nodes and a network card or network integrated circuit of another of the nodes. The network may be configured to support remote direct memory access.
In the foregoing specification, implementations of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what is intended by the applicants to be the scope of the invention, is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.
This application is related to: U.S. application Ser. No. 17/162,527, “Efficient Identification of Vertices and Edges For Graph Indexes in an RDBMS,” now U.S. Pat. No. 11,500,868, issued Nov. 15, 2022; U.S. application Ser. No. 16/147,367, “Efficient Graph Generation,” filed Sep. 28, 2018, now U.S. Pat. No. 11,556,500, issued Jan. 17, 2023; and U.S. application Ser. No. 18/385,602, “Modified Graph Extension,” filed Oct. 31, 2023. The entire contents of each of these related applications are hereby incorporated by reference as if fully set forth herein.