This disclosure relates generally to updating streaming graphs and, more particularly, to edge batch reordering for streaming graph analytics.
Streaming graph analytics represent an important, emerging class of workloads. Streaming graph analytics involve operating on a graph as it evolves over time. Example applications of streaming graph analytics include computer network search engines, social network analysis, consumer recommendation systems, consumer fraud detection systems, etc. For example, in the context of a search engine, a streaming graph can be used to represent the connectivity between web sites accessible via a computer network, such as the Internet. In such an example, the streaming graph includes vertices to represent the different web sites, edges to represent the links between the web sites, and values of the vertices to represent the number of other web sites that link to corresponding ones of the web sites. The search engine can utilize streaming graph analytics to continuously update the streaming graph to rank the web sites represented in the graph based on an ever evolving number of web sites from which they are accessible.
The figures are not to scale. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts, elements, etc. Connection references (e.g., attached, coupled, connected, and joined) are to be construed broadly and may include intermediate members between a collection of elements and relative movement between elements unless otherwise indicated. As such, connection references do not necessarily infer that two elements are directly connected and in fixed relation to each other.
Descriptors “first,” “second,” “third,” etc., are used herein when identifying multiple elements or components which may be referred to separately. Unless otherwise specified or understood based on their context of use, such descriptors are not intended to impute any meaning of priority, physical order or arrangement in a list, or ordering in time but are merely used as labels for referring to multiple elements or components separately for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for ease of referencing multiple elements or components.
Example methods, apparatus, systems and articles of manufacture (e.g., physical storage media) to implement edge batch reordering for streaming graph analytics are disclosed herein. An example streaming graph analytics system disclosed herein includes an example edge reorderer to provide reordered batches of edges to update a streaming graph. Example edge reorderers disclosed herein include an example edge clusterer to reorder, based on vertices of the streaming graph, a first batch of input edges to determine a first reordered batch of input edges. For example, the edge clusterer may cluster the first batch of input edges into respective groups associated with corresponding ones of the graph vertices. Example edge reorderers disclosed herein also include an example graph update analyzer to compute a first performance metric associated with a first update operation performed on the streaming graph with the first reordered batch of input edges. The example graph update analyzer is also to determine, based on at least the first performance metric, whether to reorder a second batch of input edges to be processed by a second update operation to be performed on the streaming graph.
In some disclosed examples, the graph update analyzer is further to compute a second performance metric associated with a third update operation performed on the streaming graph with a third batch of input edges, with the third batch of input edges not being reordered prior to the third update operation. In such examples, the graph update analyzer is to determine whether to reorder the second batch of input edges based on the first performance metric and the second performance metric. In some such examples, the third update operation is to occur before the first update operation, the first update operation is to occur before the second update operation, and the graph update analyzer is to select the third batch of input edges based on a sample frequency. Additionally or alternatively, in some such examples, the first performance metric is a duration of the first update operation, the second performance metric is a duration of the third update operation, and the graph update analyzer is to (i) determine that the second batch of input edges is to be reordered when the second performance metric is larger than the first performance metric, and (ii) determine that the second batch of input edges is not to be reordered when the first performance metric is larger than the second performance metric.
In some disclosed examples, to compute the first performance metric, the graph update analyzer is to determine a first number of edges in the first reordered batch of input edges associated with ones of the vertices that source no more than a threshold number of edges in the first reordered batch of input edges. In some such examples, the graph update analyzer is to determine a second number of vertices that source more than the threshold number of edges in the first reordered batch of input edges. In some such examples, the graph update analyzer is to compute a difference between the first number of edges and a total number of edges in the first reordered batch of input edges, and further compute the first performance metric to be a ratio of the difference and the second number of vertices. In some such examples, the threshold number is a first threshold number, and the graph update analyzer is to (i) determine that the second batch of input edges is to be reordered when the first performance metric is larger than a second threshold value, and (ii) determine that the second batch of input edges is not to be reordered when the first performance metric is smaller than the second threshold value.
These and other example methods, apparatus, systems and articles of manufacture (e.g., physical storage media) to implement edge batch reordering for streaming graph analytics are disclosed in further detail below.
As noted above, streaming graph analytics represent an important emerging class of workloads, which exhibit distinct characteristics from traditional static graph processing. Streaming graph analytics involves operating on a graph as it evolves over time. In streaming graph processing, graph updates can contribute to a substantial portion (e.g., 40% in some examples) of the overall graph processing latency. Two contributors to bottlenecks in the update phase of streaming graph processing include i) poor data reuse from on-chip caches, and ii) heavy contention between different threads trying to perform edge updates for a single vertex. By reordering the edges in an incoming edge batch as disclosed in further detail below, it is possible to achieve higher cache locality and lower thread contention, which can reduce graph update latency.
Other streaming graph processing techniques have relied on different types of data structures to optimize the update performance in streaming graph analytics. Such data structures can enable faster insertion/deletion of edges to/from the graph compared to a conventional compressed sparse row (CSR) implementation prevalent in static graph processing. While such other streaming graph processing techniques may enable faster ingestion of incoming edge streams compared to the traditional CSR approach, they do not focus on the problems of poor cache locality and inter-thread contention. As a result, edge batch reordering, as disclosed herein, can improve performance of such other streaming graph processing techniques. For example, edge batch reordering for streaming graph analytics, as disclosed herein, can be applied to any graph data structure and streaming graph update technique to improve update performance by leveraging locality-aware and thread-contention-aware reordering of the edges in the incoming edge batch.
As disclosed in further detail below, batch reordering for streaming graph analytics involves reordering the edges in incoming edge batches by clustering edges belonging to the same vertex of the streaming graph. Clustering increases the opportunity to reuse more on-chip data by exploiting temporal locality when updating the edges of the same vertex. Moreover, clustering creates opportunity for more efficient workload distribution among threads. For example, one thread can be assigned to update several successive edges in the batch belonging to the same vertex, thereby reducing thread contentions when updating edges for the same vertex. As illustrated in further detail below, edge batch reordering can substantially improve the performance of graph updates in streaming graph analytics utilized in different application scenarios, such as computer network search engines, social network analysis, consumer recommendation systems, consumer fraud detection systems, etc. For example, graph update latency for two example imbalanced graph datasets is reduced by approximately a factor of two (2) in examples illustrated below.
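The clustering step described above can be sketched as follows. The function name and the representation of an edge as a (source, destination) pair are illustrative assumptions for this sketch, not details taken from the disclosure itself.

```python
from collections import defaultdict

def reorder_batch(edge_batch):
    """Cluster the edges of an incoming batch by source vertex so that
    edges updating the same vertex become adjacent in the output batch.
    Each edge is an illustrative (source, destination) pair."""
    groups = defaultdict(list)
    for src, dst in edge_batch:
        groups[src].append((src, dst))
    # Concatenate the per-vertex groups into one reordered batch.
    return [edge for group in groups.values() for edge in group]

batch = [(2, 5), (0, 1), (2, 7), (0, 3), (1, 4)]
print(reorder_batch(batch))
# [(2, 5), (2, 7), (0, 1), (0, 3), (1, 4)]
```

After reordering, both edges sourced by vertex 2 are adjacent, as are both edges sourced by vertex 0, which is what enables the temporal locality and per-vertex thread assignment described above.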
Turning to the figures, a block diagram of an example streaming graph analytics methodology 100 in which edge batch reordering can be employed in accordance with teachings of this disclosure is illustrated in
A block diagram of an example system 200 including an example edge reorderer 205 to perform edge batch reordering for streaming graph analytics in accordance with teachings of this disclosure is illustrated in
The example graph updater 215 implements the example update operation 110 and the example compute operation 115 to be performed with a collected batch of edges 105 on the streaming graph stored in the graph data structure 120 to determine the updated vertex values 125, as described above. The example edge reorderer 205 is not limited to a particular type of update operation 110, a particular type of compute operation 115 or a particular type of graph data structure 120. Rather, the edge reorderer 205 can be used with any type of update operation 110 and/or any type of compute operation 115 implemented by the graph updater 215. Examples of update operations 110 that can be implemented by the graph updater 215 include, but are not limited to, operations that insert the edges of a collected batch of edges 105 into the graph data structure 120 based on (i) the source vertices of the edges, (ii) the destination vertices of the edges, etc., operations that change the weights of pre-existing edges in a weighted graph, etc. Examples of compute operations 115 that can be implemented by the graph updater 215 include, but are not limited to, PageRank, Breadth First Search (BFS), Connected Components, Shortest Path, etc. Thus, the graph updater 215 is an example of means for determining updated vertex values of a streaming graph based on a batch of input edges.
In the illustrated example, the streaming graph analytics system 200 provides the updated vertex values 125 computed for the streaming graph to one or more example applications 225. As such, the graph updater 215 can be structured to compute (e.g., with the compute operation 115) any number(s) and/or types of vertex values 125 for each vertex (or some subset of one or more vertices) of the streaming graph stored in the graph data structure 120, with the number(s) and/or types of vertex values 125 appropriate for (e.g., tailored to) the application(s) 225. For example, if the application(s) 225 include a search engine and the vertices of the streaming graph correspond to web sites, the updated vertex values 125 can be a popularity ranking, relevancy ranking, etc., computed for respective ones of the vertices based on a collected batch of edges 105. As another example, if the application(s) 225 include a social media recommendation engine and the vertices of the streaming graph correspond to users of a social media service, the updated vertex values 125 can be a popularity ranking, a follower ranking, etc., computed for respective ones of the vertices based on a collected batch of edges 105. As yet another example, if the application(s) 225 include a fraud detection engine and the vertices of the streaming graph correspond to consumers and merchants, the updated vertex values 125 can be a fraudulent transaction probability, a malicious entity probability, etc., computed for respective ones of the vertices based on a collected batch of edges 105.
The edge reorderer 205 of the illustrated example reorders a collected batch of edges 105 prior to the batch of edges 105 being applied to or, in other words, processed by the graph updater 215, as disclosed in further detail below. In some examples, the edge reorderer 205 also determines, based on one or more criteria, whether reordering is to be performed on a given collected batch of edges 105 to be applied to or, in other words, processed by the graph updater 215, as disclosed in further detail below. An example implementation of the edge reorderer 205 of
The example edge reorderer 205 of
The thread scheduler 310 of the illustrated example assigns (or, in other words, schedules) one or more execution threads to implement the processing of the graph updater 215 (e.g., to implement the update operation 110 and the compute operation 115) on corresponding groups of one or more edges in the output batch of edges 325 (which may be reordered or not based on one or more criteria, as described in further detail below). For example, the thread scheduler 310 may assign edges of the output batch of edges 325 successively to respective edge groups each containing up to a threshold number of edges (e.g., 4 edges, 400 edges, 4000 edges, etc.), and further assign the respective edge groups to corresponding execution threads. In such an example, the corresponding execution threads each implement the processing of the graph updater 215 described above to update the streaming graph based on the respective group of edges assigned to that execution thread. In some examples, the threshold number of edges to be assigned to a group is pre-initialized, specified as an input parameter, adaptable, etc., or any combination thereof (e.g., pre-configured as an initial threshold, which can be overridden based on an input parameter and/or adaptable over time). In some examples, the thread scheduler 310 may assign groups of one or more edges to corresponding execution threads such that each thread is responsible for performing the graph update processing on some or all of the output batch of edges 325 corresponding to a respective vertex or group of vertices.
For example, the thread scheduler 310 may assign all edges of the output batch of edges 325 that are associated with a first one of the streaming graph vertices to a first execution thread, may assign all edges of the output batch of edges 325 that are associated with a second one of the streaming graph vertices to a second execution thread, may assign all edges of the output batch of edges 325 that are associated with third and fourth ones of the streaming graph vertices to a third execution thread, etc. As such, the thread scheduler 310 is an example of means for assigning groups of edges to threads to perform edge update and compute operations for streaming graph analytics. Example batch reordering operations performed by the edge clusterer 305 and the thread scheduler 310 and, more generally, the example edge reorderer 205 of
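The group-based thread assignment described above can be sketched as follows, with hypothetical function names and a plain dictionary of adjacency lists standing in for the graph data structure 120.

```python
from concurrent.futures import ThreadPoolExecutor

def make_edge_groups(batch, max_group_size=4):
    """Split a batch into successive groups of at most max_group_size
    edges; each group is handed to one execution thread."""
    return [batch[i:i + max_group_size]
            for i in range(0, len(batch), max_group_size)]

def update_group(graph, group):
    """Per-thread update operation: insert each edge of the group into a
    per-vertex adjacency list (a stand-in for the graph data structure)."""
    for src, dst in group:
        graph.setdefault(src, []).append(dst)

graph = {}
# A batch already clustered by source vertex, so each vertex's edges
# fall within a single group in this toy example.
batch = [(0, 1), (0, 3), (2, 5), (2, 7), (1, 4)]
with ThreadPoolExecutor(max_workers=2) as pool:
    for group in make_edge_groups(batch, max_group_size=2):
        pool.submit(update_group, graph, group)
print(graph)
```

This toy relies on each vertex's edges landing in a single group (and on CPython's interpreter lock for the shared dictionary); a production implementation would need the explicit synchronization whose cost the disclosed reordering is designed to reduce.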
In some examples, the graph update operation 110 occupies 40% of the edge batch processing latency, on average. In some examples, the bottlenecks in the graph update operation 110 include (i) poor on-chip data reuse, and (ii) thread contentions due to multiple threads trying to update the edges of the same graph vertex or graph node. (As used herein, in the context of a streaming graph, the terms vertex and node are equivalent and can be used interchangeably.) To illustrate these bottlenecks,
First, because the edges of an input batch 105 are not organized in any specific order, the edges corresponding to a given source vertex may not be clustered together. Hence, each execution thread 415 is unable to achieve temporal locality in on-chip data reuse for the edges of the same source vertex. For example, in
Second, due to the random order of edges originating from the same vertex in an update batch, it is possible that when executed in parallel (e.g., such as with OpenMP®), multiple threads are assigned the task of updating the edges for the same vertex. For example, in
The example of
The example of
In some examples, the edge reorderer 205 of
In a second example enhancement, the thread scheduler 310 of the edge reorderer 205 is structured to implement vertex-oriented work distribution, in addition to edge batch reordering as disclosed above, to further reduce thread contentions.
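A minimal sketch of such vertex-oriented work distribution, assuming the batch has already been clustered by source vertex (the function name and edge representation are illustrative):

```python
from itertools import groupby
from operator import itemgetter

def vertex_oriented_groups(reordered_batch):
    """Partition a vertex-clustered batch at source-vertex boundaries so
    that a single thread owns all of a given vertex's edges in the batch,
    avoiding cross-thread contention on that vertex's adjacency data."""
    return [list(group)
            for _, group in groupby(reordered_batch, key=itemgetter(0))]

reordered = [(0, 1), (0, 3), (2, 5), (2, 7), (1, 4)]
print(vertex_oriented_groups(reordered))
# [[(0, 1), (0, 3)], [(2, 5), (2, 7)], [(1, 4)]]
```

Because the batch is clustered, a boundary-based partition like this yields one group per vertex, and each group can then be dispatched to its own execution thread.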
Returning to
Accordingly, the graph update analyzer 315 enables the edge reorderer 205 to implement runtime adaptive batch reordering. A runtime approach can be beneficial because, for streaming graphs, it may not be possible to have knowledge of the entire graph in advance. As such, utilizing offline techniques to decide beforehand whether to perform edge batch reordering may prove to be inadequate. Thus, in some examples, at runtime, the graph update analyzer 315 monitors (e.g., periodically, based on one or more events, etc.) incoming batches of edges and adaptively decides whether to reorder the batches. A goal of such an adaptive technique is to mitigate the performance degradation for types of batches which do not benefit from reordering. At the same time, such an adaptive technique can maintain the high performance achieved from reordering for other types of batches.
In some examples, the graph update analyzer 315 implements one or more types of runtime adaptive batch reordering techniques, or any combination thereof. In a first example implementation, the graph update analyzer 315 implements sample-based runtime adaptive batch reordering. In such examples, at a given sampling frequency (e.g., every nth batch), which may be pre-initialized, specified by an input value, adaptable, etc., or any combination thereof, the graph update analyzer 315 configures the edge clusterer 305 of the edge reorderer 205 to not reorder the nth input batch of edges, thereby causing the input edge batch n to be updated without reordering. However, the graph update analyzer 315 configures the edge clusterer 305 of the edge reorderer 205 to reorder the input edge batch (n+1), which causes the (n+1)th batch to be updated with reordering. At the end of updating batch (n+1), the graph update analyzer 315 compares respective runtime performance metrics (e.g., overall update processing time) for the graph updater 215 to perform the respective update operations 110 on the nth and (n+1)th edge batches. Based on the comparison, the graph update analyzer 315 decides whether to configure the edge clusterer 305 of the edge reorderer 205 to reorder or not reorder the next n batches, after which another runtime sampling operation is performed, as described above. Of course, in some examples, the order of not performing reordering on the nth edge batch and performing reordering on the (n+1)th edge batch can be reversed such that the nth edge batch is reordered and the (n+1)th edge batch is not reordered.
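The sample-based technique described above can be sketched as a simple control loop. The timing calls, parameter names, and policy update below are illustrative assumptions, not the disclosed implementation.

```python
import time

def sample_based_update(batches, update_fn, reorder_fn, n=8):
    """Every n batches, time one update without reordering (batch i) and
    one with reordering (batch i + 1), then keep the faster policy for
    the batches until the next sampling point. update_fn(batch) performs
    the graph update; reorder_fn(batch) returns a clustered copy."""
    reorder_enabled = True
    i = 0
    while i < len(batches):
        if i % n == 0 and i + 1 < len(batches):
            start = time.perf_counter()
            update_fn(batches[i])                  # sample without reordering
            plain = time.perf_counter() - start
            start = time.perf_counter()
            update_fn(reorder_fn(batches[i + 1]))  # sample with reordering
            clustered = time.perf_counter() - start
            reorder_enabled = clustered < plain    # adopt the faster policy
            i += 2
        else:
            update_fn(reorder_fn(batches[i]) if reorder_enabled
                      else batches[i])
            i += 1
```

As the text notes, the order of the two sampled batches could equally be reversed; the loop structure is unchanged.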
In a second example implementation, the graph update analyzer 315 implements heuristics-based runtime adaptive batch reordering. In such examples, at a given monitoring frequency (e.g., every nth batch), which may be pre-initialized, specified by an input value, adaptable, etc., or any combination thereof, the graph update analyzer 315 examines the numbers of edges of the input batch sourced by different ones of the vertices of the streaming graph. In some examples, the graph update analyzer 315 configures the edge clusterer 305 of the edge reorderer 205 to reorder the nth input batch of edges, thereby causing the input edge batch n to be updated with reordering, whereas in other examples, the graph update analyzer 315 configures the edge clusterer 305 of the edge reorderer 205 to not reorder the nth input batch of edges, thereby causing the input edge batch n to be updated without reordering. Based on the nth input edge batch (which may be reordered or not reordered), the graph update analyzer 315 computes a performance metric (e.g., Orderk clusterable average degree) according to Equation 1, which is:
Orderk clusterable average degree=(batch size - y)/x Equation 1
where y is the number of edges sourced by low-degree nodes (e.g., nodes having a degree not more than a first threshold, k) and x is the number of high-degree nodes (e.g., nodes having a degree more than the first threshold, k).
According to Equation 1, the graph update analyzer 315 subtracts the edges of low-degree nodes (e.g., nodes having a degree not more than the first threshold, k, as shown in Equation 1, which may be pre-initialized, specified by an input value, adaptable, etc., or any combination thereof) from the batch size to determine a number of remaining edges, and divides the number of remaining edges by the number of high-degree nodes (e.g., nodes having a degree more than the first threshold, k, as shown in Equation 1) to determine the performance metric. The performance metric measures the average clusterable degree of the high-degree nodes (e.g., nodes having a degree more than the first threshold, k, as shown in Equation 1) of the streaming graph at the nth edge batch update. In this second example implementation, the graph update analyzer 315 decides, based on comparing the performance metric to a second threshold according to Equation 2, whether to configure the edge clusterer 305 of the edge reorderer 205 to reorder or not reorder the next n edge batches, after which another runtime monitoring operation is performed. Equation 2 is
Orderk clusterable average degree>threshold Equation 2
The threshold of Equation 2 may be empirically determined and/or pre-initialized, specified by an input value, adaptable, etc., or any combination thereof. Also, in some examples, the threshold of Equation 2 is different from the threshold (k) of Equation 1.
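Equations 1 and 2 can be sketched directly as follows; the function names and the (source, destination) edge representation are illustrative assumptions.

```python
from collections import Counter

def clusterable_average_degree(batch, k):
    """Orderk clusterable average degree of Equation 1: subtract the
    edges of low-degree vertices (batch degree <= k) from the batch size
    and divide by the number of high-degree vertices (batch degree > k)."""
    degree = Counter(src for src, _dst in batch)
    y = sum(d for d in degree.values() if d <= k)  # edges of low-degree nodes
    x = sum(1 for d in degree.values() if d > k)   # count of high-degree nodes
    if x == 0:
        return 0.0  # no high-degree vertices; reordering is unlikely to help
    return (len(batch) - y) / x

def should_reorder(batch, k, threshold):
    """Equation 2: enable reordering when the metric exceeds the threshold."""
    return clusterable_average_degree(batch, k) > threshold

# Vertex 0 sources three edges; vertices 1 and 2 source one edge each.
batch = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)]
print(clusterable_average_degree(batch, k=1))  # (5 - 2) / 1 = 3.0
```

The zero-division guard when x is zero is an added assumption for the sketch: a batch with no high-degree vertices offers little clustering opportunity, so the metric is treated as falling below any positive threshold.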
In some examples, to determine the values of x and y from Equation 1, the graph update analyzer 315 utilizes atomic increment (e.g., fetch and add) operations to count the edges in a sorted batch of edges, such as when the counting of Equation 1 is performed on an input edge batch with one thread for which the reordering is performed by another thread. In some examples, to determine the values of x and y from Equation 1, the graph update analyzer 315 utilizes bookkeeping with a combination of a concurrent hash table and a concurrent set that are updated as edges arrive, such as when the counting of Equation 1 is performed on a batch for which reordering is disabled.
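The arrival-time bookkeeping described above can be sketched with a lock standing in for the atomic increments or the concurrent hash table and set named in the text; this is an illustrative assumption, not the disclosed data structures.

```python
import threading
from collections import Counter

class BatchDegreeBookkeeper:
    """Track per-vertex batch degrees as edges arrive so that x and y of
    Equation 1 can be read off without a separate counting pass. The lock
    is a simple stand-in for atomic fetch-and-add operations or a
    concurrent hash table."""

    def __init__(self):
        self._lock = threading.Lock()
        self._degree = Counter()  # source vertex -> edges seen in this batch

    def record_edge(self, src):
        with self._lock:
            self._degree[src] += 1

    def x_and_y(self, k):
        """Return (x, y): the high-degree node count and the number of
        edges sourced by low-degree nodes, per Equation 1."""
        with self._lock:
            y = sum(d for d in self._degree.values() if d <= k)
            x = sum(1 for d in self._degree.values() if d > k)
        return x, y
```

With this bookkeeping in place, the Equation 1 metric for the current batch is available as soon as the last edge of the batch has been recorded.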
In some examples, the graph update analyzer 315 can be configured (e.g., during initialization, at run-time, etc.) to perform sample-based runtime adaptive batch reordering or heuristics-based runtime adaptive batch reordering based on a cost-benefit analysis. For example, the respective costs of sample-based runtime adaptive batch reordering vs. heuristics-based runtime adaptive batch reordering can correspond to the estimated processing overhead expected to be incurred by the respective adaptive techniques when monitoring a given input edge batch (e.g., the nth input edge batch described above). The respective benefits of sample-based runtime adaptive batch reordering vs. heuristics-based runtime adaptive batch reordering can correspond to an estimate of how often the respective techniques are expected to correctly select whether edge batch reordering should be enabled or disabled (e.g., based on the characteristics of the streaming graph being updated, the characteristics of the input edges, etc.).
Thus, in some examples, the graph update analyzer 315 of
However, in examples in which the graph update analyzer 315 implements heuristics-based runtime adaptive batch reordering, the graph update analyzer 315 may compute the first performance metric (e.g., Orderk clusterable average degree of Equation 1) by (i) determining a first number of edges (e.g., y of Equation 1) in the first reordered batch of input edges associated with ones of the vertices that source no more than a threshold number of edges (e.g., k of Equation 1) in the first reordered batch of input edges, and (ii) determining a second number of vertices (e.g., x of Equation 1) that source more than the threshold number of edges (e.g., k of Equation 1) in the first reordered batch of input edges. In some such examples, the graph update analyzer 315 then computes a difference between the first number of edges (e.g., y of Equation 1) and a total number of edges in the first reordered batch of input edges, and computes the first performance metric to be a ratio of that difference to the second number of vertices (e.g., x of Equation 1). In some such examples, the graph update analyzer 315 is further to determine that the second batch of input edges is to be reordered when the first performance metric is larger than a second threshold value, and to determine that the second batch of input edges is not to be reordered when the first performance metric is smaller than the second threshold value.
Thus, the graph update analyzer 315 is an example of means for determining whether to reorder a batch of input edges to be processed by an update operation to be performed on a streaming graph.
While an example manner of implementing the streaming graph analytics system 200 is illustrated in
Flowcharts representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the example streaming graph analytics system 200 are shown in
The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data (e.g., portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc. in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and stored on separate computing devices, wherein the parts when decrypted, decompressed, and combined form a set of executable instructions that implement a program such as that described herein.
In another example, the machine readable instructions may be stored in a state in which they may be read by a computer, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc. in order to execute the instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, the disclosed machine readable instructions and/or corresponding program(s) are intended to encompass such machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.
The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.
As mentioned above, the example processes of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. 
Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” entity, as used herein, refers to one or more of that entity. The terms “a” (or “an”), “one or more”, and “at least one” can be used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., a single unit or processor. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.
An example program 1100 that may be executed to implement the example streaming graph analytics system 200 of
Next, at block 1120, the example graph updater 215 performs an update operation 110, as described above, on the reordered batch of edges 325 if at block 1110 the graph update analyzer 315 determined reordering was to be performed, or on the unreordered batch of input edges 105 if at block 1110 the graph update analyzer 315 determined reordering was not to be performed. At block 1125, the graph updater 215 performs a compute operation 115, as described above, on the updated streaming graph to determine updated vertex values to be output to one or more applications 225. At block 1130, the graph update analyzer 315 performs runtime adaptive batch reordering, as described above, to determine whether a subsequent batch of collected input edges is to be reordered before being used to update the streaming graph. Two example programs that may be executed to implement the processing at block 1130 are illustrated in
A first example program 1200 that may be executed to implement the example graph update analyzer 315 of
A second example program 1300 that may be executed to implement the example graph update analyzer 315 of
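The two adaptive decision strategies can be sketched as follows, assuming hypothetical function names. The first mirrors the duration-comparison approach (reorder when a sampled unreordered update took longer than a reordered one, per Example 4); the second mirrors the batch-statistics approach of Examples 5 through 7, where the metric is the ratio of high-degree edges to high-degree source vertices.

```python
from collections import Counter


def decide_by_duration(t_unreordered, t_reordered):
    """Duration-based decision: reorder subsequent batches only if the
    sampled unreordered update operation took longer than the reordered
    update operation."""
    return t_unreordered > t_reordered


def decide_by_metric(batch, degree_threshold, metric_threshold):
    """Batch-statistics decision: count edges sourced by vertices that
    source no more than degree_threshold edges of the batch (low-degree
    edges) and the number of vertices sourcing more than that threshold
    (high-degree vertices). The metric is the ratio of the remaining
    (high-degree) edges to the number of high-degree vertices."""
    deg = Counter(src for src, _ in batch)
    low_deg_edges = sum(c for c in deg.values() if c <= degree_threshold)
    hot_vertices = sum(1 for c in deg.values() if c > degree_threshold)
    if hot_vertices == 0:
        return False  # no hot vertices, little benefit from clustering
    metric = (len(batch) - low_deg_edges) / hot_vertices
    return metric > metric_threshold
```

Intuitively, a large metric means the batch concentrates many edges on few source vertices, so clustering those edges improves cache reuse enough to justify the reordering cost.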
The processor platform 1400 of the illustrated example includes a processor 1412. The processor 1412 of the illustrated example is hardware. For example, the processor 1412 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor 1412 may be a semiconductor based (e.g., silicon based) device. In this example, the processor 1412 implements the example edge reorderer 205, the example edge collector 210, the example graph updater 215, the example graph data structure 120, the example edge clusterer 305, the example thread scheduler 310 and/or the example graph update analyzer 315.
The processor 1412 of the illustrated example includes a local memory 1413 (e.g., a cache). The processor 1412 of the illustrated example is in communication with a main memory including a volatile memory 1414 and a non-volatile memory 1416 via a link 1418. The link 1418 may be implemented by a bus, one or more point-to-point connections, etc., or a combination thereof. The volatile memory 1414 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 1416 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1414, 1416 is controlled by a memory controller.
The processor platform 1400 of the illustrated example also includes an interface circuit 1420. The interface circuit 1420 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
In the illustrated example, one or more input devices 1422 are connected to the interface circuit 1420. The input device(s) 1422 permit(s) a user to enter data and/or commands into the processor 1412. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, a trackbar (such as an isopoint), a voice recognition system and/or any other human-machine interface. Also, many systems, such as the processor platform 1400, can allow the user to control the computer system and provide data to the computer using physical gestures, such as, but not limited to, hand or body movements, facial expressions, and face recognition.
One or more output devices 1424 are also connected to the interface circuit 1420 of the illustrated example. The output devices 1424 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube (CRT) display, an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker(s). The interface circuit 1420 of the illustrated example thus typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 1420 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 1426. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
The processor platform 1400 of the illustrated example also includes one or more mass storage devices 1428 for storing software and/or data. Examples of such mass storage devices 1428 include floppy disk drives, hard disk drives, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives. In some examples, the mass storage device 1428 may implement the example graph data structure 120. Additionally or alternatively, in some examples the volatile memory 1414 may implement the example graph data structure 120.
The machine executable instructions 1432 corresponding to the instructions of
From the foregoing, it will be appreciated that example methods, apparatus and articles of manufacture have been disclosed that implement edge batch reordering for streaming graph analytics. The disclosed methods, apparatus and articles of manufacture can improve the efficiency of using a computing device to implement streaming graph analytics by clustering edges belonging to the same vertex of the streaming graph, thereby providing temporal locality, which can improve data reuse in on-chip caches, as described above. The disclosed methods, apparatus and articles of manufacture can also improve the efficiency of using a computing device to implement streaming graph analytics by achieving an efficient workload distribution among threads, thereby reducing contention between different threads attempting to perform edge updates, as described above. The disclosed methods, apparatus and articles of manufacture are accordingly directed to one or more improvement(s) in the functioning of a computer.
The foregoing disclosure provides example solutions to implement edge batch reordering for streaming graph analytics. The following further examples, which include subject matter such as an apparatus to implement edge batch reordering for streaming graph analytics, a non-transitory computer readable medium including instructions that, when executed, cause at least one processor to implement edge batch reordering for streaming graph analytics, and a method to implement edge batch reordering for streaming graph analytics, are disclosed herein. The disclosed examples can be implemented individually and/or in one or more combinations.
Example 1 is an apparatus to provide reordered batches of edges to update a streaming graph. The apparatus of example 1 includes an edge clusterer to reorder, based on vertices of the streaming graph, a first batch of input edges to determine a first reordered batch of input edges. The apparatus of example 1 also includes a graph update analyzer to: (i) compute a first performance metric associated with a first update operation performed on the streaming graph with the first reordered batch of input edges; and (ii) determine, based on at least the first performance metric, whether to reorder a second batch of input edges to be processed by a second update operation to be performed on the streaming graph.
Example 2 includes the subject matter of example 1, wherein the graph update analyzer is to: (i) compute a second performance metric associated with a third update operation performed on the streaming graph with a third batch of input edges, the third batch of input edges not reordered prior to the third update operation; and (ii) determine whether to reorder the second batch of input edges based on the first performance metric and the second performance metric.
Example 3 includes the subject matter of example 2, wherein the third update operation is to occur before the first update operation, the first update operation is to occur before the second update operation, and the graph update analyzer is to select the third batch of input edges based on a sample frequency.
Example 4 includes the subject matter of example 2 or example 3, wherein the first performance metric is a duration of the first update operation, the second performance metric is a duration of the third update operation, and the graph update analyzer is to: (i) determine that the second batch of input edges is to be reordered when the second performance metric is larger than the first performance metric; and (ii) determine that the second batch of input edges is not to be reordered when the first performance metric is larger than the second performance metric.
Example 5 includes the subject matter of example 1, wherein to compute the first performance metric, the graph update analyzer is to: (i) determine a first number of edges in the first reordered batch of input edges associated with ones of the vertices that source no more than a threshold number of edges in the first reordered batch of input edges; and (ii) determine a second number of vertices that source more than the threshold number of edges in the first reordered batch of input edges.
Example 6 includes the subject matter of example 5, wherein the graph update analyzer is to: (i) compute a difference between the first number of edges and a total number of edges in the first reordered batch of input edges; and (ii) compute the first performance metric to be a ratio of the difference and the second number of vertices.
Example 7 includes the subject matter of example 5 or example 6, wherein the threshold number is a first threshold number, and the graph update analyzer is to: (i) determine that the second batch of input edges is to be reordered when the first performance metric is larger than a second threshold value; and (ii) determine that the second batch of input edges is not to be reordered when the first performance metric is smaller than the second threshold value.
Example 8 includes the subject matter of any one of examples 1 to 7, wherein to reorder the first batch of input edges, the edge clusterer is to cluster the first batch of input edges into respective groups associated with corresponding ones of the vertices.
Example 9 includes the subject matter of any one of examples 1 to 8, wherein the first reordered batch of input edges includes a reordered edge batch for in-neighbors of the vertices of the streaming graph and a reordered edge batch for out-neighbors of the vertices of the streaming graph, and the edge clusterer is to (i) store the reordered edge batch for in-neighbors in a first queue, and (ii) store the reordered edge batch for out-neighbors in a second queue.
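The clustering and queuing arrangement of Examples 8 and 9 can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: `reorder_batch` is a hypothetical name, and edges are modeled as (source, destination) pairs. One reordered batch is clustered by destination vertex (for in-neighbor updates) and one by source vertex (for out-neighbor updates), each placed in its own queue.

```python
from collections import defaultdict, deque


def reorder_batch(batch):
    """Cluster a batch of (src, dst) edges two ways: by destination
    vertex for in-neighbor updates, and by source vertex for
    out-neighbor updates. Each reordered batch goes into its own
    queue, as in Example 9."""
    by_src = defaultdict(list)
    by_dst = defaultdict(list)
    for src, dst in batch:
        by_src[src].append((src, dst))
        by_dst[dst].append((src, dst))
    # Flatten the per-vertex groups; edges for the same vertex are now
    # adjacent, giving the temporal locality described above.
    out_queue = deque(e for edges in by_src.values() for e in edges)
    in_queue = deque(e for edges in by_dst.values() for e in edges)
    return in_queue, out_queue
```

Keeping the two reordered batches in separate queues lets in-neighbor and out-neighbor updates be dispatched to threads independently, which supports the workload-distribution benefit noted above.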
Example 10 is a non-transitory computer readable medium including computer readable instructions that, when executed, cause a processor to at least: (i) reorder, based on vertices of a streaming graph, a first batch of input edges to determine a first reordered batch of input edges; (ii) compute a first performance metric associated with a first update operation performed on the streaming graph with the first reordered batch of input edges; and (iii) determine, based on at least the first performance metric, whether to reorder a second batch of input edges to be processed by a second update operation to be performed on the streaming graph.
Example 11 includes the subject matter of example 10, wherein the computer readable instructions, when executed, cause the processor to: (i) compute a second performance metric associated with a third update operation performed on the streaming graph with a third batch of input edges, the third batch of input edges not reordered prior to the third update operation; and (ii) determine whether to reorder the second batch of input edges based on the first performance metric and the second performance metric.
Example 12 includes the subject matter of example 11, wherein the third update operation is to occur before the first update operation, the first update operation is to occur before the second update operation, and the computer readable instructions, when executed, cause the processor to select the third batch of input edges based on a sample frequency.
Example 13 includes the subject matter of example 11 or example 12, wherein the first performance metric is a duration of the first update operation, the second performance metric is a duration of the third update operation, and the computer readable instructions, when executed, cause the processor to: (i) determine that the second batch of input edges is to be reordered when the second performance metric is larger than the first performance metric; and (ii) determine that the second batch of input edges is not to be reordered when the first performance metric is larger than the second performance metric.
Example 14 includes the subject matter of example 10, wherein to compute the first performance metric, the computer readable instructions, when executed, cause the processor to: (i) determine a first number of edges in the first reordered batch of input edges associated with ones of the vertices that source no more than a threshold number of edges in the first reordered batch of input edges; and (ii) determine a second number of vertices that source more than the threshold number of edges in the first reordered batch of input edges.
Example 15 includes the subject matter of example 14, wherein the computer readable instructions, when executed, cause the processor to: (i) compute a difference between the first number of edges and a total number of edges in the first reordered batch of input edges; and (ii) compute the first performance metric to be a ratio of the difference and the second number of vertices.
Example 16 includes the subject matter of example 14 or example 15, wherein the threshold number is a first threshold number, and the computer readable instructions, when executed, cause the processor to: (i) determine that the second batch of input edges is to be reordered when the first performance metric is larger than a second threshold value; and (ii) determine that the second batch of input edges is not to be reordered when the first performance metric is smaller than the second threshold value.
Example 17 includes the subject matter of any one of examples 10 to 16, wherein to reorder the first batch of input edges, the computer readable instructions, when executed, cause the processor to cluster the first batch of input edges into respective groups associated with corresponding ones of the vertices.
Example 18 includes the subject matter of any one of examples 10 to 16, wherein the first reordered batch of input edges includes a reordered edge batch for in-neighbors of the vertices of the streaming graph and a reordered edge batch for out-neighbors of the vertices of the streaming graph, and the computer readable instructions, when executed, cause the processor to (i) store the reordered edge batch for in-neighbors in a first queue, and (ii) store the reordered edge batch for out-neighbors in a second queue.
Example 19 is a method to provide reordered batches of edges to update a streaming graph. The method of example 19 includes reordering, by executing an instruction with a processor and based on vertices of a streaming graph, a first batch of input edges to determine a first reordered batch of input edges. The method of example 19 also includes computing, by executing an instruction with the processor, a first performance metric associated with a first update operation performed on the streaming graph with the first reordered batch of input edges. The method of example 19 further includes determining, based on at least the first performance metric and by executing an instruction with the processor, whether to reorder a second batch of input edges to be processed by a second update operation to be performed on the streaming graph.
Example 20 includes the subject matter of example 19, and further includes: (i) computing a second performance metric associated with a third update operation performed on the streaming graph with a third batch of input edges, the third batch of input edges not reordered prior to the third update operation; and (ii) determining whether to reorder the second batch of input edges based on the first performance metric and the second performance metric.
Example 21 includes the subject matter of example 20, wherein the third update operation is to occur before the first update operation, the first update operation is to occur before the second update operation, and further including selecting the third batch of input edges based on a sample frequency.
Example 22 includes the subject matter of example 20 or example 21, wherein the first performance metric is a duration of the first update operation, the second performance metric is a duration of the third update operation, and further including: (i) determining that the second batch of input edges is to be reordered when the second performance metric is larger than the first performance metric; and (ii) determining that the second batch of input edges is not to be reordered when the first performance metric is larger than the second performance metric.
Example 23 includes the subject matter of example 19, wherein computing the first performance metric includes: (i) determining a first number of edges in the first reordered batch of input edges associated with ones of the vertices that source no more than a threshold number of edges in the first reordered batch of input edges; and (ii) determining a second number of vertices that source more than the threshold number of edges in the first reordered batch of input edges.
Example 24 includes the subject matter of example 23, and further includes: (i) computing a difference between the first number of edges and a total number of edges in the first reordered batch of input edges; and (ii) computing the first performance metric to be a ratio of the difference and the second number of vertices.
Example 25 includes the subject matter of example 23 or example 24, wherein the threshold number is a first threshold number, and further including: (i) determining that the second batch of input edges is to be reordered when the first performance metric is larger than a second threshold value; and (ii) determining that the second batch of input edges is not to be reordered when the first performance metric is smaller than the second threshold value.
Example 26 includes the subject matter of any one of examples 19 to 25, wherein the reordering of the first batch of input edges includes clustering the first batch of input edges into respective groups associated with corresponding ones of the vertices.
Example 27 includes the subject matter of any one of examples 19 to 26, wherein the first reordered batch of input edges includes a reordered edge batch for in-neighbors of the vertices of the streaming graph and a reordered edge batch for out-neighbors of the vertices of the streaming graph, and further including (i) storing the reordered edge batch for in-neighbors in a first queue, and (ii) storing the reordered edge batch for out-neighbors in a second queue.
Example 28 is a system to provide reordered batches of edges to update a streaming graph. The system of example 28 includes means for reordering, based on vertices of the streaming graph, a first batch of input edges to determine a first reordered batch of input edges. The system of example 28 also includes means for determining whether to reorder a second batch of input edges to be processed by a second update operation to be performed on the streaming graph. In example 28, the means for determining is to (i) compute a first performance metric associated with a first update operation performed on the streaming graph with the first reordered batch of input edges, and (ii) determine whether to reorder the second batch of input edges based on the first performance metric.
Example 29 includes the subject matter of example 28, wherein the means for determining is to: (i) compute a second performance metric associated with a third update operation performed on the streaming graph with a third batch of input edges, the third batch of input edges not reordered prior to the third update operation; and (ii) determine whether to reorder the second batch of input edges based on the first performance metric and the second performance metric.
Example 30 includes the subject matter of example 29, wherein the third update operation is to occur before the first update operation, the first update operation is to occur before the second update operation, and the means for determining is to select the third batch of input edges based on a sample frequency.
Example 31 includes the subject matter of example 29 or example 30, wherein the first performance metric is a duration of the first update operation, the second performance metric is a duration of the third update operation, and the means for determining is to: (i) determine that the second batch of input edges is to be reordered when the second performance metric is larger than the first performance metric; and (ii) determine that the second batch of input edges is not to be reordered when the first performance metric is larger than the second performance metric.
Example 32 includes the subject matter of example 28, wherein to compute the first performance metric, the means for determining is to: (i) determine a first number of edges in the first reordered batch of input edges associated with ones of the vertices that source no more than a threshold number of edges in the first reordered batch of input edges; and (ii) determine a second number of vertices that source more than the threshold number of edges in the first reordered batch of input edges.
Example 33 includes the subject matter of example 32, wherein the means for determining is to: (i) compute a difference between the first number of edges and a total number of edges in the first reordered batch of input edges; and (ii) compute the first performance metric to be a ratio of the difference and the second number of vertices.
Example 34 includes the subject matter of example 32 or example 33, wherein the threshold number is a first threshold number, and the means for determining is to: (i) determine that the second batch of input edges is to be reordered when the first performance metric is larger than a second threshold value; and (ii) determine that the second batch of input edges is not to be reordered when the first performance metric is smaller than the second threshold value.
Example 35 includes the subject matter of any one of examples 28 to 34, wherein to reorder the first batch of input edges, the means for reordering is to cluster the first batch of input edges into respective groups associated with corresponding ones of the vertices.
Example 36 includes the subject matter of any one of examples 28 to 35, wherein the first reordered batch of input edges includes a reordered edge batch for in-neighbors of the vertices of the streaming graph and a reordered edge batch for out-neighbors of the vertices of the streaming graph, and the means for reordering is to (i) store the reordered edge batch for in-neighbors in a first queue, and (ii) store the reordered edge batch for out-neighbors in a second queue.
Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.