The technical field of this invention is lossless compression for graph traversal.
Graph traversal is a core operation in many graph processing applications. It involves visiting each node in the graph at most once, in a particular order. The traversal is typically performed in distinct steps, and nodes at a particular level may be discovered in any order. Breadth-First Search (BFS) and Depth-First Search (DFS) are common examples of graph traversal.
Applications of graph traversal include finding all reachable nodes (for garbage collection), finding the best reachable node (single-player game search), finding the best path through a graph (for routing and map directions), and topologically sorting a graph.
When graph traversal is performed on a distributed processing system, each device must communicate its local bitmap of node statuses to all other devices. A method of lossless compression is shown that reduces the amount of data that needs to be communicated between processing nodes.
These and other aspects of this invention are illustrated in the drawings, in which:
The breadth-first search is shown in
To prevent a given node from being visited more than once during graph traversal in a distributed processing system, the status of each node should be available to all of its neighbors. This status is either visited or unvisited, so a single bit suffices to represent it. Hence, the whole graph is represented by a binary bitmap in which the number of bits equals the number of nodes in the graph.
In distributed systems, individual devices process disjoint subsets of nodes. Each device holds a local bitmap with the status of its local nodes. These local bitmaps must be communicated to the other devices during graph traversal to avoid redundant traversal of graph nodes. This invention describes a compression method for the raw bitmap that significantly reduces the bitmap size and makes communication more efficient.
Each node in the graph is represented by a single bit that indicates whether the node is visited or unvisited. The bits of the graph bitmap are grouped into words, where the word size is chosen as 8, 16, 32, or 64 bits to simplify software implementation. The bits of a given word in the bitmap may represent successive or interleaved local nodes (based on the node ID). The interleaving is chosen to maximize the similarity between nodes within a word, i.e., nodes in the same word are likely to have the same status during graph traversal.
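The two word layouts mentioned above can be illustrated with a minimal sketch. It assumes node statuses are given as a Python list of booleans indexed by node ID, bit 0 is the least significant bit of each word, and the last word is zero-padded; the function names are hypothetical. The interleaved variant follows the column interleaving given as an example in step (301): node IDs are laid out row by row in as many rows as there are bits in a word, and each column becomes one word.

```python
from typing import List

def _check_word_bits(word_bits: int) -> None:
    if word_bits not in (8, 16, 32, 64):
        raise ValueError("word size must be 8, 16, 32, or 64 bits")

def pack_successive(status: List[bool], word_bits: int = 8) -> List[int]:
    """Successive layout: node i goes to word i // word_bits, bit i % word_bits."""
    _check_word_bits(word_bits)
    words = [0] * ((len(status) + word_bits - 1) // word_bits)
    for node, visited in enumerate(status):
        if visited:
            words[node // word_bits] |= 1 << (node % word_bits)
    return words

def pack_column_interleaved(status: List[bool], word_bits: int = 8) -> List[int]:
    """Column interleaving: word_bits rows filled in node-ID order.

    With cols = ceil(n / word_bits), node i sits in row i // cols and
    column i % cols, so word c holds nodes c, c + cols, c + 2 * cols, ...
    """
    _check_word_bits(word_bits)
    cols = (len(status) + word_bits - 1) // word_bits
    words = [0] * cols
    for node, visited in enumerate(status):
        row, col = divmod(node, cols)
        if visited:
            words[col] |= 1 << row
    return words
```

For 16 nodes where only the even-numbered nodes are visited, the successive layout yields two mixed words (`[85, 85]`), while column interleaving with 8-bit words yields one all-ones and one all-zeros word (`[255, 0]`). Whether interleaving helps in practice depends on how node IDs correlate with discovery order.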
The bitmap compression procedure shown in
In the first few steps of the breadth-first search (BFS), most of the nodes are unvisited, so the graph bitmap is dominated by all-zeros words. Conversely, in the final steps the graph bitmap is dominated by all-ones words. At a given BFS step, only the nodes that were discovered in the previous step (i.e., the frontier nodes) are relevant. Therefore, either the whole graph bitmap or only the bitmap of the frontier nodes may be transmitted, whichever provides more compression.
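One way to make the "whichever provides more compression" choice concrete is sketched below. It assumes the frontier is obtained by XORing the current and previous bitmaps and coded with an all-zeros default, while the whole bitmap is coded with an all-ones default, and it estimates cost before Huffman coding as one auxiliary bit per word plus every transmitted word. The names `raw_cost_bits` and `choose_mode` are hypothetical, and the receiver is assumed to be told which mode was chosen.

```python
from typing import List, Tuple

def raw_cost_bits(words: List[int], default: int, word_bits: int) -> int:
    """Size before Huffman coding: one auxiliary bit per word plus
    word_bits for every word that differs from the default value."""
    kept = sum(1 for w in words if w != default)
    return len(words) + kept * word_bits

def choose_mode(current: List[int], previous: List[int],
                word_bits: int = 8) -> Tuple[str, List[int], int]:
    """Return (mode, words to encode, default value) for the cheaper option."""
    all_ones = (1 << word_bits) - 1
    # Frontier: bits that flipped from unvisited to visited since the last step.
    frontier = [c ^ p for c, p in zip(current, previous)]
    frontier_cost = raw_cost_bits(frontier, 0, word_bits)
    whole_cost = raw_cost_bits(current, all_ones, word_bits)
    if frontier_cost <= whole_cost:
        return "frontier", frontier, 0
    return "whole", current, all_ones
```

Early in the search the frontier words are mostly zero and the frontier option wins; near the end most words of the whole bitmap are all ones and the whole-bitmap option wins.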
To enable lossless compression, an auxiliary bitmap 202 is generated that provides side information about the graph bitmap. Each bit in the auxiliary bitmap represents one word in the graph bitmap. A zero bit in the auxiliary bitmap means that the corresponding word in the graph bitmap is not transmitted; during decompression, that word is set to the default value, λ. This default value is either an all-zeros word or an all-ones word, depending on the BFS step. A one bit in the auxiliary bitmap means that the corresponding word in the graph bitmap is transmitted. An example of this coding procedure, with λ=0, is shown in
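The auxiliary-bitmap coding described above can be sketched as an encoder and matching decoder. This is a minimal illustration assuming the auxiliary bitmap is held as a list of 0/1 integers and words as Python integers; `encode_words` and `decode_words` are hypothetical names.

```python
from typing import List, Tuple

def encode_words(words: List[int], default: int) -> Tuple[List[int], List[int]]:
    """Return (auxiliary bitmap, transmitted words) for default value lambda."""
    aux: List[int] = []
    kept: List[int] = []
    for w in words:
        if w == default:
            aux.append(0)      # word equals lambda: not transmitted
        else:
            aux.append(1)      # word differs from lambda: transmitted
            kept.append(w)
    return aux, kept

def decode_words(aux: List[int], kept: List[int], default: int) -> List[int]:
    """Rebuild the graph bitmap: zero bits become lambda, one bits consume
    the transmitted words in order."""
    it = iter(kept)
    return [next(it) if bit else default for bit in aux]
```

With λ=0, the words `[0, 0, 0x12, 0, 0xFF]` produce the auxiliary bitmap `[0, 0, 1, 0, 1]` and only two transmitted words, and decoding restores the original list exactly.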
To further improve the coding efficiency, the auxiliary bitmap is encoded using standard Huffman coding, which gives more emphasis to the sparser patterns: since these patterns occur most often, they receive the shortest codewords.
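A minimal sketch of this Huffman step follows. It assumes the auxiliary bitmap is split into fixed-width symbols (4 bits here, zero-padded at the end) before coding; the symbol width and the names `aux_symbols`, `huffman_code`, and `huffman_encode` are illustrative assumptions, not details given in the description. How the decoder learns the code table is left open.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, List, Tuple

def aux_symbols(aux_bits: List[int], symbol_bits: int = 4) -> List[int]:
    """Group the auxiliary bitmap into fixed-width symbols (zero-padded)."""
    symbols = []
    for i in range(0, len(aux_bits), symbol_bits):
        chunk = aux_bits[i:i + symbol_bits]
        chunk = chunk + [0] * (symbol_bits - len(chunk))
        value = 0
        for bit in chunk:
            value = (value << 1) | bit
        symbols.append(value)
    return symbols

def huffman_code(symbols: List[int]) -> Dict[int, str]:
    """Standard Huffman code: the most frequent symbols get the shortest codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

def huffman_encode(aux_bits: List[int], symbol_bits: int = 4) -> Tuple[str, Dict[int, str]]:
    symbols = aux_symbols(aux_bits, symbol_bits)
    code = huffman_code(symbols)
    return "".join(code[s] for s in symbols), code
```

For a 32-bit auxiliary bitmap with a single one bit, seven of the eight symbols are the all-zeros pattern, so the whole bitmap codes into 8 bits instead of 32.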
The coding efficiency can be further improved by marking all isolated nodes (i.e., nodes with no neighbors) as visited when the graph bitmap is initialized. The status of these nodes never changes during the traversal, so this assignment is safe, and it improves compression performance at later BFS steps, where the default value λ is an all-ones word.
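The initialization above can be sketched in a few lines. This assumes an undirected graph stored as adjacency lists (so a node is isolated exactly when its list is empty) and that the BFS source is also marked visited at the start; `initial_status` is a hypothetical name.

```python
from typing import List, Sequence

def initial_status(adjacency: Sequence[Sequence[int]], source: int) -> List[bool]:
    """Isolated nodes start as visited; so does the BFS source.

    Assumes undirected adjacency lists, where an empty list means the
    node has no neighbors.
    """
    status = [len(neighbors) == 0 for neighbors in adjacency]
    status[source] = True
    return status
```

For a directed graph stored as out-edge lists, an empty list would not by itself mean the node is isolated, so the check would have to consider incoming edges as well.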
The compression algorithm is shown in
(301) The graph bitmap is constructed by combining the statuses of the graph nodes into words. The nodes may be in their original order, or they may be interleaved. Interleaving is used to maximize the similarity between the statuses of nodes within a word. For example, column interleaving may be used: the graph nodes are arranged in rows (according to their ID), where the total number of rows equals the width of the graph bitmap word. Each word in the graph bitmap then corresponds to a single column.
(302) Mark all the isolated nodes in the graph as visited in the graph bitmap.
(303) In the first stages of the breadth-first search, only the active nodes in the graph frontier are encoded. These nodes are found by XORing the current graph bitmap with the graph bitmap at the previous BFS step. In these first stages, the default value of compression, λ, is set to zero.
(304) In the later stages, the graph bitmap itself is used for compression, and the default value of compression, λ, is set to an all-ones word.
(305) An auxiliary bitmap is constructed, where each bit in the auxiliary bitmap represents a word in the graph bitmap.
(306) If a word in the graph bitmap equals the default value, λ, set the corresponding bit in the auxiliary bitmap to zero, and do not include this word in the compressed graph bitmap.
(307) Otherwise, if a word in the graph bitmap does not equal the default value, λ, set the corresponding bit in the auxiliary bitmap to one, and include this word in the compressed graph bitmap.
(308) After constructing the entire auxiliary bitmap, Huffman encoding is used for additional lossless compression of the auxiliary bitmap.
(309) Both the encoded auxiliary bitmap and the compressed graph bitmap are included as the output of the compression algorithm.
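Steps (303) to (309) can be combined into one compressor, sketched below together with a matching decompressor that checks the round trip is lossless. Several details are assumptions not fixed by the description: the graph bitmap is already packed into words (steps 301 and 302), the caller decides when to switch from early to later stages via the hypothetical `early_stage` flag, the auxiliary bitmap is Huffman-coded in 4-bit symbols, the code table travels with the output, and the receiver already holds the previous bitmap so it can undo the XOR. All function and field names are illustrative.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, List

def huffman_code(symbols: List[int]) -> Dict[int, str]:
    """Standard Huffman code: frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

def compress(current: List[int], previous: List[int], early_stage: bool,
             word_bits: int = 8, symbol_bits: int = 4) -> dict:
    # Steps 303/304: frontier with lambda = 0 early, whole bitmap with all-ones later.
    if early_stage:
        words = [c ^ p for c, p in zip(current, previous)]
        default = 0
    else:
        words = list(current)
        default = (1 << word_bits) - 1
    # Steps 305-307: auxiliary bitmap plus the words that differ from lambda.
    aux = [0 if w == default else 1 for w in words]
    kept = [w for w in words if w != default]
    # Step 308: Huffman-code the auxiliary bitmap in fixed-width symbols.
    padded = aux + [0] * (-len(aux) % symbol_bits)
    symbols = [int("".join(map(str, padded[i:i + symbol_bits])), 2)
               for i in range(0, len(padded), symbol_bits)]
    code = huffman_code(symbols) if symbols else {}
    # Step 309: output both parts, plus the metadata this sketch's decoder needs.
    return {"early_stage": early_stage, "n_words": len(words), "code": code,
            "aux": "".join(code[s] for s in symbols), "words": kept}

def decompress(packet: dict, previous: List[int], word_bits: int = 8,
               symbol_bits: int = 4) -> List[int]:
    table = {c: s for s, c in packet["code"].items()}
    aux: List[int] = []
    buf = ""
    for bit in packet["aux"]:
        buf += bit
        if buf in table:  # Huffman codes are prefix-free
            aux.extend(int(b) for b in format(table[buf], f"0{symbol_bits}b"))
            buf = ""
    aux = aux[:packet["n_words"]]  # drop symbol padding
    default = 0 if packet["early_stage"] else (1 << word_bits) - 1
    kept = iter(packet["words"])
    words = [next(kept) if bit else default for bit in aux]
    if packet["early_stage"]:
        return [w ^ p for w, p in zip(words, previous)]  # undo the frontier XOR
    return words
```

In an early step where a single word changes, only that frontier word is transmitted next to a few bits of coded auxiliary bitmap; in a late step where every word is all ones, no graph words are transmitted at all.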
Number | Date | Country |
---|---|---|
20170346503 A1 | Nov 2017 | US |