This application is based on and claims priority under 35 U.S.C. § 119 to Korean Patent Application Nos. 10-2022-0114371 and 10-2022-0162047, respectively filed on Sep. 8, 2022 and Nov. 28, 2022, in the Korean Intellectual Property Office, the disclosures of which are incorporated by reference herein in their entirety.
Embodiments of the disclosure relate to a method and apparatus for obtaining a triangle in a graph, and more particularly, to a method and apparatus for identifying vertices that are connected by edges in a graph and form a triangle.
Various types of data may be represented as graphs consisting of vertices and edges. For example, data in various industries such as web pages, social networking services (SNSs), communications, finance, and bio/healthcare are related to each other, and the relationships among these data may be represented as a graph consisting of vertices and edges. Likewise, the relative importance of web pages may be measured by representing the web pages as vertices and the hyperlinks between them as edges.
Also, after various Internet data are represented as a graph, the vertices forming triangles in the graph may be used to find friends on an SNS, detect fake accounts, find web spam, and find communities. However, as the amount of data rapidly increases, the size of the graphs representing such data also increases. As a graph grows, it becomes difficult to load the entire graph data into memory at once and process it. For example, a graph including 1 trillion edges is difficult to load into memory at once, and identifying the vertices that form triangles in such a graph requires considerable resources and time.
Embodiments of the disclosure provide a method and apparatus for rapidly identifying vertices forming a triangle in a graph.
Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments.
According to an embodiment of the disclosure, a method used by a computing device including one or more memories, one or more processors, and one or more input/output devices to obtain a triangle in a graph including vertices and edges includes receiving an adjacency matrix in which two vertices connected to each edge are expressed in rows and columns, dividing the adjacency matrix into a plurality of blocks, for the plurality of blocks existing in the adjacency matrix, searching for a search area including a plurality of blocks located at (I,K), (I,J), and (J,K) satisfying I>=J>=K (where I, J, and K are block indexes), and identifying three vertices forming a triangle based on edge information existing in the search area.
According to an embodiment of the disclosure, a computing device includes a memory into which an adjacency matrix in which two vertices connected to an edge are expressed in rows and columns is loaded, and a processor configured to identify a combination of vertices constituting a triangle for graph data stored in the memory by using a method of calculating a triangle in a graph, wherein the method of calculating a triangle in a graph includes dividing the adjacency matrix into a plurality of blocks, for the plurality of blocks existing in the adjacency matrix, searching for a plurality of blocks located at (I,K), (I,J), and (J,K) satisfying I>=J>=K (where I, J, and K are block indexes) as a search area, and identifying three vertices forming a triangle based on edge information existing in the search area.
The above and other aspects, features, and advantages of certain embodiments will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. In this regard, the present embodiments may have different forms and should not be construed as being limited to the descriptions set forth herein.
Accordingly, the embodiments are merely described below, by referring to the figures, to explain aspects of the present description. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list.
A method and apparatus for obtaining a triangle in a graph according to an embodiment of the disclosure will be described in detail with reference to the accompanying drawings.
Referring to
Referring to
Identification information expressed in numbers or characters may be assigned to each vertex 210 in the graph 200. For example, numbers that sequentially increase from a predefined number (e.g., 1) may be assigned to the vertices 210 of the graph 200. Vertex identification information may also take any of various other forms, such as a mixture of characters and numbers. However, the following description assumes that sequentially increasing numbers are assigned as identification information to the vertices 210 of the graph 200.
The graph contains triangles 230, 240, and 250, each including three vertices and three edges. For example, the triangle 230 includes an edge between vertex 0 and vertex 1, an edge between vertex 1 and vertex 3, and an edge between vertex 0 and vertex 3. There are also the triangle 240, formed by vertices 0, 3, and 6 and the edges between them, and the triangle 250, formed by vertices 2, 4, and 6 and the edges between them. Triangles in a graph may be used in various fields such as finding friends on SNS, detecting fake accounts, finding web spam, and finding communities. A specific method of obtaining a triangle in a graph will be described below with reference to
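To make the definition concrete, the following Python sketch lists triangles by brute force. The edge list is hypothetical: it contains only the edges of the triangles 230, 240, and 250 described above, not necessarily every edge of the graph 200, and the cubic-time search serves only as a reference point.

```python
from itertools import combinations

# Hypothetical edge list: only the edges of triangles 230, 240, and 250.
EDGES = [(0, 1), (1, 3), (0, 3), (0, 6), (3, 6), (2, 4), (4, 6), (2, 6)]


def brute_force_triangles(edges):
    """Return every vertex triple (a < b < c) whose three pairs are all edges."""
    edge_set = {frozenset(e) for e in edges}
    vertices = sorted({v for e in edges for v in e})
    triangles = []
    for a, b, c in combinations(vertices, 3):
        pairs = {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))}
        if pairs <= edge_set:  # all three edges are present
            triangles.append((a, b, c))
    return triangles


print(brute_force_triangles(EDGES))  # [(0, 1, 3), (0, 3, 6), (2, 4, 6)]
```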
Referring to
The identification information of each vertex and the index mapped to it are shown on the horizontal and vertical axes of the adjacency matrix. In an embodiment, the identification information and the index may be the same as or different from each other. In the present embodiment, the index is a value that sequentially increases from 0. For example, when the graph includes nine vertices, indexes 0 to 8 may be respectively assigned to the nine vertices. An edge of the graph may be expressed by the indexes of its two vertices. The present embodiment is merely an example, and the index assigned to each row and column of the adjacency matrix may be set to any of various values other than a value that increases from 0. Hereinafter, the position of each element in the adjacency matrix is indicated by its indexes. For example, because the index of the first row of a matrix is 0 and the index of the second column is 1, (first row, second column) is expressed as (0, 1).
The computing device 100 may perform the process of obtaining triangles in the graph based on the entire adjacency matrix. In another embodiment, because the upper triangle area and the lower triangle area of the adjacency matrix for an undirected graph are symmetrical with respect to a main diagonal 310, the computing device 100 may perform the process of obtaining triangles for only one of the two triangle areas. Hereinafter, a process of obtaining triangles in a graph based on an adjacency matrix 350 that includes only the lower triangle area with respect to the main diagonal will be described.
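A minimal sketch of mapping vertex identification information to matrix indexes, assuming the identifiers are mutually comparable (for example, all strings) and that indexes are assigned in sorted identifier order starting from 0. The function name `index_vertices` is hypothetical.

```python
def index_vertices(edges):
    """Map each vertex identifier to an index 0..n-1 and rewrite the edges with indexes."""
    identifiers = sorted({v for edge in edges for v in edge})
    index_of = {vid: idx for idx, vid in enumerate(identifiers)}
    indexed_edges = [(index_of[u], index_of[v]) for u, v in edges]
    return index_of, indexed_edges


index_of, indexed = index_vertices([("bob", "alice"), ("carol", "alice")])
print(index_of)  # {'alice': 0, 'bob': 1, 'carol': 2}
print(indexed)   # [(1, 0), (2, 0)]
```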
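The following sketch builds the lower triangle area of the adjacency matrix for an undirected graph by storing each edge (u, v) once, at row max(u, v) and column min(u, v). A dense list of lists is used only for readability (an illustrative assumption); a graph with a trillion edges would need a sparse, block-wise representation. The edge list is the same hypothetical toy list of triangle edges.

```python
def lower_triangle_matrix(num_vertices, edges):
    """Build a 0/1 matrix holding only the entries below the main diagonal."""
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        if u == v:
            continue  # a self-loop cannot be part of a triangle
        row, col = max(u, v), min(u, v)  # keep the edge once, with row > col
        matrix[row][col] = 1
    return matrix


EDGES = [(0, 1), (1, 3), (0, 3), (0, 6), (3, 6), (2, 4), (4, 6), (2, 6)]
m = lower_triangle_matrix(7, EDGES)
print(m[1][0], m[0][1])  # 1 0
```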
Referring to
The triangles 400, 410, and 420 in the adjacency matrix 400, which correspond to the triangles 230, 240, and 250, have the shape of right triangles. In other words, the three edges of each triangle are located at positions (i,k), (i,j), and (j,k) in the adjacency matrix 400, where i>j>k. Here, i, j, and k are the indexes of the three vertices.
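The i>j>k condition can be checked directly on the lower triangle area, as in the sketch below (dense matrix and hypothetical toy edge list assumed, as before). Because each triangle has exactly one vertex ordering with i>j>k, every triangle is reported exactly once.

```python
def lower_triangle_matrix(num_vertices, edges):
    """0/1 matrix with each undirected edge stored once at (max, min)."""
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        if u != v:
            matrix[max(u, v)][min(u, v)] = 1
    return matrix


def triangles_from_lower_matrix(matrix):
    """List (i, j, k) with i > j > k and edges at (i, k), (i, j), and (j, k)."""
    found = []
    for i in range(len(matrix)):
        for j in range(i):                # j < i
            if not matrix[i][j]:          # need edge (i, j)
                continue
            for k in range(j):            # k < j
                if matrix[i][k] and matrix[j][k]:  # need edges (i, k) and (j, k)
                    found.append((i, j, k))
    return found


EDGES = [(0, 1), (1, 3), (0, 3), (0, 6), (3, 6), (2, 4), (4, 6), (2, 6)]
print(triangles_from_lower_matrix(lower_triangle_matrix(7, EDGES)))
# [(3, 1, 0), (6, 3, 0), (6, 4, 2)]
```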
When three vertices of the triangles 230, 240, and 250 in the graph of
Accordingly, the computing device may identify three vertices (i,j,k) forming a triangle in a graph by identifying the positions (i,k), (i,j), and (j,k) of three edges satisfying i>j>k in an adjacency matrix. However, for a graph including 1 trillion or more edges, the adjacency matrix 400 is so large that it is difficult to load into memory at once, and checking every combination of edge positions over the entire adjacency matrix 400 takes a long time. To solve these problems, the disclosure proposes a method of obtaining triangles in a graph by dividing the adjacency matrix 400 into blocks, which will be described with reference to
Referring to
The computing device 100 divides the adjacency matrix into a plurality of blocks (S510). For example, the computing device 100 may recursively divide the adjacency matrix until the size of each block is equal to or less than a size that can be loaded into a memory (e.g., a main memory or a GPU memory). An example in which an adjacency matrix is divided into a plurality of blocks is illustrated in
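One way to choose the block size for S510 is sketched below under a hypothetical cost model in which a dense block with side s occupies s*s*bytes_per_entry bytes; the cost model and function names are assumptions, and a sparse block format would be sized differently. The block side is halved recursively until one block fits the memory budget.

```python
def block_side_for_budget(side, budget_bytes, bytes_per_entry=1):
    """Recursively halve the block side until one dense block fits in budget_bytes."""
    if side <= 1 or side * side * bytes_per_entry <= budget_bytes:
        return side
    return block_side_for_budget((side + 1) // 2, budget_bytes, bytes_per_entry)


def blocks_per_side(num_vertices, side):
    """Number of block rows (and block columns) needed to cover the matrix."""
    return -(-num_vertices // side)  # ceiling division


side = block_side_for_budget(9, budget_bytes=9)
print(side, blocks_per_side(9, side))  # 3 3
```

With a 9-vertex matrix and a 9-byte budget, the side shrinks 9 → 5 → 3, giving a 3×3 grid of blocks.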
Referring back to
Accordingly, for the plurality of blocks, the computing device 100 searches for a search area including blocks located at (I,K), (I,J), and (J,K) satisfying I>=J>=K (where I, J, and K are block indexes) (S520). A plurality of search areas, each a different combination of blocks satisfying this condition, may be obtained. Alternatively, the computing device 100 may search for a search area including blocks located at (I,K), (I,J), and (J,K) satisfying I<=J<=K. In this case, the computing device 100 may identify three vertices (i,j,k) forming a triangle in the graph by identifying the positions (i,k), (i,j), and (j,k) of three edges satisfying i<j<k in the search area. However, for convenience of explanation, the following description assumes that blocks located at (I,K), (I,J), and (J,K) satisfying I>=J>=K are searched.
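Enumerating the search areas of S520 amounts to iterating over block index triples with I>=J>=K. In the sketch below (function name hypothetical), each area is returned as the positions of the blocks at (I,K), (I,J), and (J,K); for n block rows there are n(n+1)(n+2)/6 such triples.

```python
def search_areas(num_block_rows):
    """Return ((I,K), (I,J), (J,K)) for every block index triple with I >= J >= K."""
    areas = []
    for i in range(num_block_rows):
        for j in range(i + 1):        # J <= I
            for k in range(j + 1):    # K <= J
                areas.append(((i, k), (i, j), (j, k)))
    return areas


areas = search_areas(3)
print(len(areas))  # 10
```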
In an embodiment, when the graph is an undirected graph, the computing device 100 may identify a search area by searching for blocks existing in an upper triangle area or a lower triangle area with respect to a main diagonal of the adjacency matrix. An example of a method of searching for blocks existing in a lower triangle area is illustrated in
The computing device 100 identifies three vertices forming a triangle based on edge information existing in the search area (S530). Instead of using the entire graph data, the computing device 100 may perform the process of identifying triangles based only on the up to three blocks corresponding to the search area. There may be many search areas, each a different combination of up to three blocks, and the computing device 100 may perform the process of obtaining triangles for each of these search areas.
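The sketch below illustrates why processing search areas independently is sufficient. It groups lower-triangle edges by block, counts triangles inside each search area using only that area's (up to three) blocks, and sums the counts, as in S540. A triangle (i,j,k) with i>j>k falls into exactly one block triple (i//s, j//s, k//s), so no triangle is counted twice. The helper names and the uniform block side s are assumptions for illustration.

```python
from collections import defaultdict


def partition(edges, side):
    """Group edges, stored once as (row > col), by block index (row // side, col // side)."""
    blocks = defaultdict(set)
    for u, v in edges:
        if u != v:
            r, c = max(u, v), min(u, v)
            blocks[(r // side, c // side)].add((r, c))
    return blocks


def count_in_search_area(blocks, bi, bj, bk):
    """Count triangles: (i,k) in pivot (I,K), (i,j) in horizontal (I,J), (j,k) in vertical (J,K)."""
    pivot = blocks.get((bi, bk), set())
    horizontal = blocks.get((bi, bj), set())
    vertical = blocks.get((bj, bk), set())
    if not (pivot and horizontal and vertical):
        return 0  # an empty block rules out every triangle in this area
    pivot_cols = defaultdict(list)  # row i -> columns k of pivot edges (i, k)
    for i, k in pivot:
        pivot_cols[i].append(k)
    return sum(
        1
        for i, j in horizontal
        for k in pivot_cols.get(i, ())
        if (j, k) in vertical
    )


def count_triangles_blockwise(edges, num_vertices, side):
    """Sum the per-search-area counts over all block triples with I >= J >= K."""
    blocks = partition(edges, side)
    nb = -(-num_vertices // side)  # block rows per side (ceiling division)
    return sum(
        count_in_search_area(blocks, bi, bj, bk)
        for bi in range(nb)
        for bj in range(bi + 1)
        for bk in range(bj + 1)
    )


EDGES = [(0, 1), (1, 3), (0, 3), (0, 6), (3, 6), (2, 4), (4, 6), (2, 6)]
print(count_triangles_blockwise(EDGES, 7, side=3))  # 3
```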
In another embodiment, the computing device 100 may identify and provide the number of triangles in the graph (S540). An example of identifying the number of triangles by using a plurality of threads is illustrated in
Referring to
In an embodiment, the computing device 100 may divide the adjacency matrix 600 so that a size of a final block generated by recursive division is equal to or less than a predefined size. For example, when the computing device 100 processes graph data by using a GPU, the computing device 100 may divide the graph data so that a size of a block is equal to or less than the size of a GPU memory.
In the present embodiment, an example of dividing the adjacency matrix 600 into 3×3 blocks is illustrated. Each block may be identified by block index information. In the matrix of blocks, indexes 0 to 2 may be assigned to the rows and indexes 0 to 2 may be assigned to the columns. In this case, the positions of the blocks may be expressed as (0,0), (1,0), (2,0), (0,1) . . . , and (2,2). The range and start value of the block indexes may be modified in various ways according to embodiments.
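Assuming a uniform block side s (the blocks of a given embodiment may be sized differently), an element at (row, col) of the adjacency matrix falls into block (row // s, col // s) at offset (row % s, col % s) inside that block, as this small sketch with a hypothetical `locate` helper shows.

```python
def locate(row, col, side):
    """Return the block index and the in-block offset of matrix position (row, col)."""
    return (row // side, col // side), (row % side, col % side)


print(locate(7, 1, 3))  # ((2, 0), (1, 1))
```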
Referring to
The computing device 100 searches for three blocks 710, 720, and 730 satisfying block indexes I>=J>=K. Positions of the three blocks are (I,K), (I,J), and (J,K). Hereinafter, a block located at (I,K) is referred to as a pivot block 710, a block located at (I,J) is referred to as a horizontal block 720, and a block located at (J,K) is referred to as a vertical block 730.
Referring to
Referring to
Referring to
Referring to
When the pivot block 710 is selected, the computing device 100 searches for the horizontal block 720 while moving rightward from (2,0) that is the position of the pivot block by using the method of
When the search for the horizontal block 720 is completed, the computing device 100 searches for the vertical block 730 by using the method of
The computing device 100 may determine a combination of the pivot block 710, the horizontal block 720, and the vertical block 730 as a search area 1100. Because three horizontal blocks 720 and three vertical blocks 730 are found in the present embodiment, a total of nine different combinations of (pivot block 710, horizontal block 720, vertical block 730) are possible. However, because the pivot block 710, the horizontal block 720, and the vertical block 730 must be located at (I,K), (I,J), and (J,K) satisfying the condition I>=J>=K, only three of these combinations satisfy the condition.
For example, when the pivot block 710 is at (2,0) and the horizontal block 720 is at (2,0), the only position of the vertical block 730 satisfying the above condition is (0,0). In other words, once the positions of the pivot block 710 and the horizontal block 720 are determined, the position of the vertical block 730 satisfying the condition is also determined. Accordingly, the process of obtaining triangles needs to be performed for only three combinations of blocks instead of all nine, which reduces the amount of calculation.
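The counting argument for the pivot block at (2,0) can be reproduced as follows: pairing the three horizontal blocks (2,0), (2,1), (2,2) with the three vertical blocks (0,0), (1,0), (2,0) yields nine pairs, and only pairs in which the horizontal block's column index J equals the vertical block's row index J form a valid search area. The function name is hypothetical.

```python
def pairs_for_pivot(pivot):
    """Return (all horizontal/vertical pairs, pairs forming a valid search area)."""
    bi, bk = pivot
    js = range(bk, bi + 1)                      # K <= J <= I
    horizontals = [(bi, j) for j in js]         # blocks (I, J) in the pivot's block row
    verticals = [(j, bk) for j in js]           # blocks (J, K) in the pivot's block column
    all_pairs = [(h, v) for h in horizontals for v in verticals]
    valid = [(h, v) for h, v in all_pairs if h[1] == v[0]]  # same J on both blocks
    return all_pairs, valid


all_pairs, valid = pairs_for_pivot((2, 0))
print(len(all_pairs), len(valid))  # 9 3
print(valid)  # [((2, 0), (0, 0)), ((2, 1), (1, 0)), ((2, 2), (2, 0))]
```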
In this way, the computing device 100 may identify various combinations of the horizontal block 720 and the vertical block 730 for each pivot block 710 searched in
Referring to
In an embodiment, the calculation for obtaining triangles in a graph may be performed by using one or more GPUs. A GPU may include a plurality of streaming multiprocessors (SMs), and each SM may include a plurality of cores. Each core may execute one or more threads. In the present embodiment, a case in which graph data is processed by three calculation blocks, each including two threads, will be described. The calculation blocks may be implemented as a plurality of GPUs, or as a plurality of cores or SMs in a GPU.
Referring to
Referring to
Referring to
The edges constituting a triangle are located at (i,k), (i,j), and (j,k) in the adjacency matrix, where i>j>k must be satisfied. Here, (i,k) is the position of an edge in the pivot block, (i,j) is the position of an edge in the horizontal block, and (j,k) is the position of an edge in the vertical block.
In
The first thread and the second thread determine whether the vertical block contains an edge that forms a triangle. For example, the first and second threads identify the positions of the edges in the first row of the vertical block (i.e., the fifth row of the adjacency matrix). Through the first and second threads, the computing device 100 may determine that the first row of the vertical block contains edges at (4,0), (4,1), (4,2), and (4,3).
Through the first thread and the second thread, the computing device 100 determines, for each edge in the vertical block, whether the pivot block has an edge in the same column. In an embodiment, the computing device 100 may use the bitmap 1400 to determine whether an edge of the pivot block and an edge of the vertical block are in the same column. Because there is an edge (12(=i),0(=k)) of the pivot block in the same column as the edge (4(=j),0(=k)) of the vertical block, the first thread identifies the three vertices (i,j,k)=(12,4,0) constituting a triangle and increases the number of triangles by 1. Also, because there is an edge (12,1) of the pivot block in the same column as the edge (4,1) of the vertical block, the second thread may identify the three vertices (12,4,1) constituting a triangle and may increase the number of triangles by 1.
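The bitmap test on a pivot row and the division of work between two threads can be simulated sequentially as below. The pivot row i=12 is assumed to have edges in columns 0, 1, and 3, which is what the triangles listed in this example imply. Dealing the vertical-row edges to threads in round-robin order is also an assumption, chosen because it reproduces the thread assignments described in the text. The function name is hypothetical, thread 0 and thread 1 stand for the first and second threads, and a Python integer stands in for the bitmap 1400.

```python
def triangles_for_rows(i, j, pivot_cols, vertical_cols, num_threads=2):
    """Return {thread: [(i, j, k), ...]} for each k with pivot edge (i, k) and vertical edge (j, k).

    Assumes the caller only passes j for which the horizontal edge (i, j) exists.
    """
    bitmap = 0
    for k in pivot_cols:
        bitmap |= 1 << k  # bit k set <=> pivot edge (i, k) exists
    found = {t: [] for t in range(num_threads)}
    for idx, k in enumerate(vertical_cols):
        thread = idx % num_threads  # round-robin: edge idx goes to thread idx % num_threads
        if (bitmap >> k) & 1:       # column k also holds a pivot edge
            found[thread].append((i, j, k))
    return found


pivot_row_12 = [0, 1, 3]
print(triangles_for_rows(12, 4, pivot_row_12, [0, 1, 2, 3]))
# {0: [(12, 4, 0)], 1: [(12, 4, 1), (12, 4, 3)]}
print(triangles_for_rows(12, 5, pivot_row_12, [0, 2, 3]))
# {0: [(12, 5, 0), (12, 5, 3)], 1: []}
```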
Referring back to
In this way, through the first calculation block, the computing device 100 may find the vertices (12,4,0), (12,4,1), and (12,4,3) of the triangles involving the first row (i=12) of the pivot block and the first column (j=4) of the horizontal block.
Referring to
For example, the computing device 100 identifies the edges (5,0), (5,2), and (5,3) located in the second row of the vertical block by using the first and second threads. Using the two threads, the computing device 100 determines whether the pivot block has an edge in the same column as each of these edges. Because the pivot block has edges in the same columns as the edges (5,0) and (5,3) of the vertical block, the first thread identifies the vertex triples (12,5,0) and (12,5,3) forming triangles and increases the number of triangles by 2.
Referring to
As a result, a total number of triangles obtained by searching the first row (i=12) of the pivot block, four columns (j=4,5,6,7) of the horizontal block, and the corresponding vertical block is 4.
The second calculation block and the third calculation block perform the process of obtaining triangles on the second row (i=13) and the third row (i=14) of the pivot block, respectively, in the same manner. When the first calculation block completes the process for the first row of the pivot block, it performs the same process on the fourth row (i=15) of the pivot block.
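The order in which the three calculation blocks take the pivot rows matches a simple strided schedule, in which calculation block b handles every third row starting from the b-th. This static striding is an assumption (a dynamic work queue would produce the same assignment when every row takes equal time), and the function name is hypothetical; the sketch only reproduces the order described above.

```python
def assign_pivot_rows(pivot_rows, num_calc_blocks):
    """Give calculation block b the rows at positions b, b + n, b + 2n, ... (n = num_calc_blocks)."""
    schedule = {b: [] for b in range(num_calc_blocks)}
    for position, row in enumerate(pivot_rows):
        schedule[position % num_calc_blocks].append(row)
    return schedule


print(assign_pivot_rows(range(12, 16), 3))  # {0: [12, 15], 1: [13], 2: [14]}
```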
When a process of obtaining a triangle in a search area of the pivot block, the horizontal block, and the vertical block is completed, the computing device 100 performs a process of obtaining a triangle for a next search area again. For example, when a process of obtaining a triangle for three blocks (2,0), (2,1), and (1,0) is completed in the example of
Referring to
The processor identifies and stores a plurality of search areas for obtaining a triangle in a graph, by using the method described with reference to
The processor reads the pivot block, the horizontal block, and the vertical block corresponding to the search area from the auxiliary storage device and loads the pivot block, the horizontal block, and the vertical block into the memory. For example, referring to
When calculation processes for obtaining graph triangles are performed in parallel by using a plurality of GPU streams, the processor may load a plurality of search areas into a plurality of GPU memories. In the present embodiment, the blocks with block indexes (2,2,0), corresponding to a first search area, and the blocks with block indexes (2,5,10), corresponding to a second search area, are loaded into respective GPU memories.
The processor loads blocks corresponding to a first search area into a first GPU memory through a first GPU stream, and loads blocks corresponding to a second search area into a second GPU memory through a second GPU stream. One or more GPUs perform calculation processes for obtaining a graph triangle in parallel through a plurality of threads on blocks of search areas loaded into the first GPU memory and the second GPU memory.
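The host-side flow (load the blocks of each search area, then process several areas concurrently) is sketched below with a thread pool standing in for GPU streams. This is an illustration only: a dictionary plays the role of the auxiliary storage device, copying a set plays the role of a transfer into GPU memory, Python threads do not run this CPU-bound work truly in parallel because of the global interpreter lock, and all names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_storage(edges, side):
    """Stand-in for the auxiliary storage: block index -> set of (row > col) edges."""
    storage = defaultdict(set)
    for u, v in edges:
        if u != v:
            r, c = max(u, v), min(u, v)
            storage[(r // side, c // side)].add((r, c))
    return storage


def process_area(storage, area):
    """'Load' the pivot, horizontal, and vertical blocks of one area and count its triangles."""
    pivot, horizontal, vertical = (set(storage.get(pos, ())) for pos in area)
    pivot_cols = defaultdict(list)
    for i, k in pivot:
        pivot_cols[i].append(k)
    return sum(1 for i, j in horizontal for k in pivot_cols.get(i, ()) if (j, k) in vertical)


def count_with_streams(storage, areas, num_streams=2):
    """Dispatch search areas to num_streams workers and add up their counts."""
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        return sum(pool.map(lambda area: process_area(storage, area), areas))


EDGES = [(0, 1), (1, 3), (0, 3), (0, 6), (3, 6), (2, 4), (4, 6), (2, 6)]
storage = build_storage(EDGES, side=3)
areas = [
    ((bi, bk), (bi, bj), (bj, bk))
    for bi in range(3)
    for bj in range(bi + 1)
    for bk in range(bj + 1)
]
print(count_with_streams(storage, areas))  # 3
```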
Referring to
The disclosure may also be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes any data storage device that may store data which may be thereafter read by a computer system. Examples of the computer-readable recording medium include a read-only memory (ROM), a random-access memory (RAM), a compact disk (CD)-ROM, a magnetic tape, a floppy disk, and an optical data storage device. The computer-readable recording medium may also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributive manner.
According to an embodiment of the disclosure, the time required to identify triangles in a graph may be reduced. In another embodiment, distributed processing may be performed by using one or more graphics processing units (GPUs). Also, when obtaining triangles, a single machine may process graph data including 1 trillion edges.
It should be understood that embodiments described herein should be considered in a descriptive sense only and not for purposes of limitation. Descriptions of features or aspects within each embodiment should typically be considered as available for other similar features or aspects in other embodiments. While one or more embodiments have been described with reference to the figures, it will be understood by one of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2022-0114371 | Sep 2022 | KR | national |
10-2022-0162047 | Nov 2022 | KR | national |