This application claims priority to China Application Serial Number 202111304147.6, filed on Nov. 5, 2021, which is incorporated by reference in its entirety.
The present application relates to a neural network, particularly to a graph neural network processing method and an associated machine and system.
In a distributed environment, multiple graph neural network (GNN) machines work together to train the graph neural network. For any node, most of the attributes of adjacent nodes are stored in different graph neural network machines. Therefore, when each node performs sampling of adjacent nodes, the data needs to be transmitted back and forth between the machines, and the transmission time becomes the bottleneck in the distributed computing, causing the overall training and reasoning time to increase significantly. Therefore, how to reduce the amount of data transmission between machines in a distributed environment has become one of the urgent problems in this field.
One of the purposes of the present application is to provide a graph neural network processing method and a related machine and system to address the above-mentioned issues.
One embodiment of the present disclosure discloses a graph neural network processing method, for use in a master, which works jointly with a first worker and a second worker in a distributed environment to train the graph neural network, wherein the first worker and the second worker respectively store information of some of the nodes of the graph neural network, the graph neural network processing method includes: receiving a first request sent from the first worker and a second request sent from the second worker, wherein the first worker sends the first request to the master to obtain at least an attribute of a first requested node, and the second worker sends the second request to the master to obtain at least an attribute of a second requested node; determining whether the first requested node and the second requested node are the same node and generating a determination result accordingly; and selectively performing broadcast or unicast to the first worker and the second worker, at least based on the determination result.
One embodiment of the present disclosure discloses a graph neural network processing method, for use in a first worker, which works jointly with a master and a second worker in a distributed environment to train the graph neural network, wherein the master, the first worker and the second worker respectively store information of some of the nodes of the graph neural network, the graph neural network processing method includes: sending a first request to the master to obtain an attribute of a first node, wherein the first worker stores an attribute of a second node; receiving a broadcast content sent from the master; and subtracting the attribute of the second node from the broadcast content to obtain the attribute of the first node.
One embodiment of the present disclosure discloses a graph neural network machine, for use as a master, which works jointly with a first worker and a second worker in a distributed environment to train the graph neural network, wherein the master, the first worker and the second worker respectively store information of some of the nodes of the graph neural network, the graph neural network machine includes: an input storage device, configured to store a first request sent from the first worker and a second request sent from the second worker, wherein the first worker sends the first request to the master to obtain at least an attribute of a first requested node, and the second worker sends the second request to the master to obtain at least an attribute of a second requested node; and a controller, coupled to the input storage device, wherein the controller is configured to determine whether the first requested node and the second requested node are the same node, and generate a determination result accordingly; and the controller selectively performs broadcast or unicast to the first worker and the second worker at least based on the determination result.
One embodiment of the present disclosure discloses a graph neural network machine, for use as a first worker, which works jointly with a master and a second worker in a distributed environment to train the graph neural network, wherein the master, the first worker and the second worker respectively store information of some of the nodes of the graph neural network, and the graph neural network machine includes: an input storage device, configured to store a broadcast content sent from the master, wherein the first worker sends a first request to the master to obtain at least an attribute of a first node, and the first worker stores an attribute of a second node; and a controller, coupled to the input storage device, wherein the controller subtracts the attribute of the second node from the broadcast content to obtain the attribute of the first node.
One embodiment of the present disclosure discloses a graph neural network system, including: a first machine, including the foregoing graph neural network machine; and a second machine, including the foregoing graph neural network machine.
The graph neural network processing method and the related machine and system provided by the present application can reduce the amount of data transmitted between the machines when training the graph neural network in a distributed environment, thereby reducing the overall training time.
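To make the pairing scheme summarized above more concrete, the following is a minimal sketch of the master-side combination and the worker-side recovery. It is only an illustration, not part of the claimed embodiments; it assumes node attributes are simple NumPy vectors, assumes the master holds both attributes, and all variable names are made up for the example.

```python
# Minimal sketch of the pairing scheme: the master adds the two requested
# attributes and broadcasts the sum once; each worker recovers the attribute
# it asked for by subtracting the attribute it already stores locally.
import numpy as np

attr_first_node = np.array([1.0, 2.0, 3.0])   # requested by the first worker; already stored by the second worker
attr_second_node = np.array([4.0, 5.0, 6.0])  # requested by the second worker; already stored by the first worker

# Master: one broadcast instead of two unicasts.
broadcast_content = attr_first_node + attr_second_node

# First worker: it stores the second node, so it can recover the first node.
recovered_first = broadcast_content - attr_second_node
# Second worker: it stores the first node, so it can recover the second node.
recovered_second = broadcast_content - attr_first_node

assert np.allclose(recovered_first, attr_first_node)
assert np.allclose(recovered_second, attr_second_node)
```

Whenever such a pair can be formed, a single broadcast thus replaces two unicasts.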
The following disclosure provides many different embodiments or examples for implementing different features of the present disclosure. Specific examples of components and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. For example, the formation of a first feature over or on a second feature in the description that follows may include embodiments in which the first and second features are formed in direct contact and may also include embodiments in which additional features may be formed between the first and second features, such that the first and second features may not be in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various embodiments. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
Further, spatially relative terms, such as “beneath”, “below”, “lower”, “above”, “upper,” and the like, may be used herein for ease of description to discuss one element or feature's relationship to another element(s) or feature(s) as illustrated in the drawings. These spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the drawings. The apparatus may be otherwise oriented (e.g., rotated by 90 degrees or at other orientations), and the spatially relative descriptors used herein may likewise be interpreted accordingly.
Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors necessarily resulting from the standard deviation found in the respective testing measurements. Also, as used herein, the term “the same” generally means within 10%, 5%, 1%, or 0.5% of a given value or range. Alternatively, the term “the same” means within an acceptable standard error of the mean when considered by one of ordinary skill in the art. As could be appreciated, other than in the operating/working examples, or unless otherwise expressly specified, all of the numerical ranges, amounts, values, and percentages (such as those for quantities of materials, duration of times, temperatures, operating conditions, portions of amounts, and the like) disclosed herein should be understood as modified in all instances by the term “the same.” Accordingly, unless indicated to the contrary, the numerical parameters set forth in the present disclosure and attached claims are approximations that can vary as desired. At the very least, each numerical parameter should at least be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Here, ranges can be expressed herein as from one endpoint to another endpoint or between two endpoints. All ranges disclosed herein are inclusive of the endpoints, unless specified otherwise.
The graph neural network has a plurality of nodes, and the machine 1, the machine 2 and the machine 3 respectively store (e.g., in a cache) some of the nodes. When the machine 1, the machine 2 and the machine 3 train the graph neural network, they have to exchange information of the nodes (e.g., attributes of the nodes) with each other from time to time. This scenario takes place quite frequently when, for example, a certain node needs to sample the neighboring nodes. Take
The graph neural network processing method 200 includes the Step 202 to the Step 206. In the Step 202, the machine 3 serving as the master receives a first request sent from the machine 1 serving as the worker and a second request sent from the machine 2 serving as the worker, wherein the machine 1 sends the first request to the machine 3 to obtain at least an attribute of the first requested node, and the machine 2 sends the second request to the machine 3 to obtain at least an attribute of the second requested node.
For example, in the embodiment of
According to the existing method, the machine 3 sends the attributes of the node {circle around (1)}, the node {circle around (2)} and the node {circle around (4)} to the machine 1 through 3 unicasts to complete the request of the machine 1, and the machine 3 sends the attributes of the node {circle around (1)}, the node {circle around (3)} and the node {circle around (5)} to the machine 2 through 3 unicasts to complete the request of the machine 2. However, in the Step 204 of the graph neural network processing method 200, the machine 3 serving as the master determines whether the first requested node and the second requested node are the same node, and generates a determination result accordingly. Then, in the Step 206, the machine 3 serving as the master selectively performs broadcast or unicast to the machine 1 as well as the machine 2 based at least on the determination result.
In other words, embodiments of the present disclosure may take the known information into account to determine whether some of the original 6 unicasts can be combined and compressed into one broadcast, so as to reduce the overall data transmission time. Hereinafter, several implementation details of the graph neural network processing method 200 will be explained in more detail through the schematic diagrams of the embodiment of the operation of the machine 3 serving as the master shown in
Specifically, the requested nodes are categorized into three types according to the present disclosure.
The first one of the three types is a repetitive request type, wherein a node being of the repetitive request type means that the node is requested by two workers at the same time. From the scenario of
The second one of the three types is a pending pairing type. From the scenario of
The node of the pending pairing type in the request from the machine 1 can be paired with the node of the pending pairing type in the request from the machine 2, and the computed result is then sent to both the machine 1 and the machine 2 by means of a broadcast. For example, the machine 3 can sum up the attribute of the node {circle around (2)} and the attribute of the node {circle around (3)} and then send the sum to both the machine 1 and the machine 2 by means of a broadcast. The machine 1 can simply subtract the attribute of the node {circle around (3)} (originally stored in the machine 1) from the received result to obtain the attribute of the node {circle around (2)}. In certain embodiments, operations other than addition, such as multiplication, etc., may also be used. It should be noted that if there are cases where the nodes of the pending pairing type cannot be fully paired, such as in the scenario in
The last of the three types is a remaining type; that is, all the nodes that do not belong to either the repetitive request type or the pending pairing type are of the remaining type. The machine 3 sends the attributes of the nodes of the remaining type to the requesting party by means of unicast.
In sum, the machine 3 may selectively perform broadcast or unicast to the machine 1 and the machine 2 at least based on requests of the machine 1 and the machine 2.
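As a rough, non-authoritative sketch of this categorization, the following Python function sorts the two request lists of the example into the three types and derives the resulting broadcast/unicast plan. The function name, the assumption that the master knows which nodes each worker stores locally, and the particular `stored_on_worker1`/`stored_on_worker2` sets are illustrative only.

```python
def plan_transmissions(req_worker1, req_worker2, stored_on_worker1, stored_on_worker2):
    req1, req2 = set(req_worker1), set(req_worker2)

    # Repetitive request type: requested by both workers -> served by one broadcast.
    repetitive = req1 & req2

    # Pending pairing type: a node requested by one worker can be paired with a
    # node requested by the other worker, provided each requester already stores
    # the partner node, so one broadcast of the combined attributes serves both.
    pairable1 = sorted(n for n in req1 - repetitive if n in stored_on_worker2)
    pairable2 = sorted(n for n in req2 - repetitive if n in stored_on_worker1)
    pairs = list(zip(pairable1, pairable2))   # unmatched leftovers fall back to unicast below
    paired = {n for pair in pairs for n in pair}

    # Remaining type: everything else is sent by plain unicast.
    unicasts = [("machine 1", n) for n in sorted(req1 - repetitive - paired)] + \
               [("machine 2", n) for n in sorted(req2 - repetitive - paired)]

    broadcasts = [(n,) for n in sorted(repetitive)] + pairs
    return broadcasts, unicasts

broadcasts, unicasts = plan_transmissions(
    req_worker1={1, 2, 4}, req_worker2={1, 3, 5},
    stored_on_worker1={3}, stored_on_worker2={2})
print(broadcasts)   # [(1,), (2, 3)]  -> two broadcasts
print(unicasts)     # [('machine 1', 4), ('machine 2', 5)]  -> two unicasts instead of six
```

For the example request sets, the plan amounts to two broadcasts and two unicasts, which matches the transmission schedule described in the embodiment below.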
As shown in
Next, the controller of the machine 3 controls the first input storage device 302 and the second input storage device 304 to respectively output the numbers, as shown in
Next, reference is made to
After the number of the node {circle around (2)} is cleared from the comparator 306, only the number of the node {circle around (3)} remains in the comparator 306, so as shown in
In
After the number of the node {circle around (3)} is cleared from the comparator 306, only the number of the node {circle around (4)} remains in the comparator 306, so as shown in
Next, as shown in
As mentioned above, the attributes of the nodes in the first output storage device 308 and the second output storage device 312, or the computed results of the attributes of multiple nodes, will be broadcast to the machine 1 and the machine 2; the attributes of the nodes in the third output storage device 314 are sent to the machine 1 or the machine 2 in a unicast manner. In this embodiment, the transmissions are carried out in a time-sharing manner: the attribute of the node {circle around (1)} is sent to the machine 1 and the machine 2 through the first broadcast; the attribute of the node {circle around (2)} is added to the attribute of the node {circle around (3)}, and the result of the addition is sent to the machine 1 and the machine 2 through the second broadcast; the attribute of the node {circle around (4)} is sent to the machine 1 through the first unicast; and the attribute of the node {circle around (5)} is sent to the machine 2 through the second unicast. That is, the original six unicasts are replaced by two broadcasts and two unicasts, which reduces the data transmission time.
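For illustration, the resulting schedule of this embodiment can be written out as follows; the attribute values and the assumption that the machine 3 holds all five attributes are made up purely for this sketch.

```python
# Example payload construction at the master (values are illustrative only).
attrs = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0, 5: 50.0}   # attributes assumed to be held by the machine 3

messages = []
messages.append(("broadcast", attrs[1]))              # first broadcast: node 1, repetitive request type
messages.append(("broadcast", attrs[2] + attrs[3]))   # second broadcast: sum of nodes 2 and 3, pending pairing type
messages.append(("unicast", "machine 1", attrs[4]))   # first unicast: node 4, remaining type
messages.append(("unicast", "machine 2", attrs[5]))   # second unicast: node 5, remaining type
# Four transmissions in total, replacing the six unicasts of the baseline.
```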
The main scenario shown in the graph neural network processing method 400 is that the first node requested by the machine 1 and the second node requested by the machine 2 are both of the pending pairing type (that is, the machine 1 stores the attribute of the second node, and the machine 2 stores the attribute of the first node), and the machine 3 pairs the first node and the second node with each other, sums up the attribute of the first node and the attribute of the second node, and sends the result to the machine 1 and the machine 2 in a broadcast manner.
The graph neural network processing method 400 includes the Step 402 to the Step 406. In the Step 402, the machine 1 serving as the worker sends a first request to the machine 3 serving as the master to obtain the attribute of the first node; at the same time, the machine 2 serving as the worker sends a second request to the machine 3 to obtain the attribute of the second node, and the machine 1 stores the attribute of the second node. Then, in the Step 404, the machine 1 receives the broadcast content sent from the machine 3, wherein the broadcast content includes a summation result of the attribute of the first node and the attribute of the second node. Consequently, in the Step 406, the machine 1 subtracts the attribute of the second node from the broadcast content to obtain the attribute of the first node.
Hereinafter, several implementation details of the graph neural network processing method 400 will be explained in more detail through the schematic diagrams of the embodiment of the operation of the machine 1 serving as a worker shown in
As shown in
Therefore, the controller in the machine 1 can directly output the attributes of the nodes in the first input storage device 102 and the third input storage device 106 to the storage device 112 for use. However, for the computation result of the attributes of the nodes in the second input storage device 104, a reverse computation is required to decode the attribute of the node that is actually requested. As shown in
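A simple worker-side sketch of this reverse computation, using made-up scalar values consistent with the example above, might look like the following; it is an illustration of the idea rather than the exact implementation.

```python
# Worker-side decoding sketch for the machine 1 (values are illustrative only).
# Attributes received directly (repetitive and remaining types) are used as-is;
# the paired broadcast must be reverse-computed with locally stored data.
broadcast_node1 = 10.0        # first broadcast: attribute of node 1, used directly
broadcast_sum_2_3 = 50.0      # second broadcast: attribute of node 2 + attribute of node 3
unicast_node4 = 40.0          # first unicast: attribute of node 4, used directly

locally_stored_node3 = 30.0   # the machine 1 already stores the attribute of node 3

attr_node2 = broadcast_sum_2_3 - locally_stored_node3   # reverse computation -> 20.0
assert attr_node2 == 20.0

# If the master combined the paired attributes by multiplication instead of
# addition, the worker would divide by its locally stored attribute instead.
```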
The machine 2, similar to the machine 1, also serves as a worker, and hence the embodiment of the operation of the machine 2 is similar to that of the machine 1; the details are omitted for the sake of brevity. Further, since the machine 1, the machine 2 and the machine 3 all have the opportunity to serve as a master or a worker, each of the machine 1, the machine 2 and the machine 3 can include both the components required for operating as a master and the components required for operating as a worker, so that it can operate as either a master or a worker.
The graph neural network machine, graph neural network system, and graph neural network processing method according to the present disclosure can reduce the overall training time without cutting any edge of the graph neural network, and hence would not affect the training effect.
The foregoing description briefly sets forth the features of certain embodiments of the present application so that persons having ordinary skill in the art more fully understand the various aspects of the disclosure of the present application. It will be apparent to those having ordinary skill in the art that they can easily use the disclosure of the present application as a basis for designing or modifying other processes and structures to achieve the same purposes and/or benefits as the embodiments herein. It should be understood by those having ordinary skill in the art that these equivalent implementations still fall within the spirit and scope of the disclosure of the present application and that they may be subject to various variations, substitutions, and alterations without departing from the spirit and scope of the present disclosure.
Number           Date       Country   Kind
202111304147.6   Nov. 2021  CN        national