The present invention relates to an assignment device that assigns weights in a neural network to chips of an operation device that executes the operations of the neural network using a plurality of chips.
Patent literatures 1 and 2 describe circuits, etc. that perform parallel processing.
In addition, non-patent literature 1 describes a device that processes one frame and the next frame in a video with different circuits.
Non-patent literature 2 describes a device that performs the processing of the first through nth layer of a neural network, and the processing of the (n+1)th and subsequent layers with different circuits.
In addition, grouped convolution is described in non-patent literature 3.
Non-patent literature 4 describes a technique to set a weight in a neural network to zero.
Non-patent literature 5 describes a technique to reduce a weight in a neural network.
In recent years, operations of a neural network have become increasingly large-scale. This makes it difficult to perform high-speed operations when operations of a neural network are performed on a single chip.
On the other hand, it is possible to perform neural network operations on multiple chips. In such a case, if the amount of data communication between chips increases, it becomes difficult to perform high-speed operations.
Therefore, it is an object of the present invention to provide an assignment device, an assignment method, and an assignment program that can define edges between neighboring layers so that the amount of data communication between chips can be suppressed, and can assign weights to the chips of an operation device that performs neural network operations using a plurality of chips.
An assignment device according to the present invention comprises: a learning unit which learns a weight for each edge connecting a channel in a first layer, which is a layer in a neural network, and a channel in a 0th layer, which is the layer preceding the first layer; a determination unit which, using a learning result of the weight for each edge, divides the channels in the 0th layer and the channels in the first layer, respectively, into groups whose number is equal to the number of chips included in an operation device executing an operation of the neural network, determines the association among the groups of channels in the 0th layer, the groups of channels in the first layer, and the chips included in the operation device, as well as the edges to be removed, and removes the edges to be removed; and a weight assignment unit which stores the weight for each edge connecting a channel in the 0th layer and a channel in the first layer in a weight storage unit in the chip corresponding to that edge.
An assignment method according to the present invention is executed by a computer, and comprises: executing a learning process for learning a weight for each edge connecting a channel in a first layer, which is a layer in a neural network, and a channel in a 0th layer, which is the layer preceding the first layer; executing a determination process for dividing, using a learning result of the weight for each edge, the channels in the 0th layer and the channels in the first layer, respectively, into groups whose number is equal to the number of chips included in an operation device executing an operation of the neural network, determining the association among the groups of channels in the 0th layer, the groups of channels in the first layer, and the chips included in the operation device, as well as the edges to be removed, and removing the edges to be removed; and executing a weight assignment process for storing the weight for each edge connecting a channel in the 0th layer and a channel in the first layer in a weight storage unit in the chip corresponding to that edge.
An assignment program according to the present invention causes a computer to execute: a learning process for learning a weight for each edge connecting a channel in a first layer, which is a layer in a neural network, and a channel in a 0th layer, which is the layer preceding the first layer; a determination process for dividing, using a learning result of the weight for each edge, the channels in the 0th layer and the channels in the first layer, respectively, into groups whose number is equal to the number of chips included in an operation device executing an operation of the neural network, determining the association among the groups of channels in the 0th layer, the groups of channels in the first layer, and the chips included in the operation device, as well as the edges to be removed, and removing the edges to be removed; and a weight assignment process for storing the weight for each edge connecting a channel in the 0th layer and a channel in the first layer in a weight storage unit in the chip corresponding to that edge.
According to the invention, the edges between neighboring layers can be defined so that the amount of data communication between chips can be suppressed, and weights can be assigned to the chips of the operation device that performs the neural network operations by a plurality of chips.
Before explaining the example embodiments of the present invention, an operation of a neural network is explained. In the operation of a neural network, when the values in a layer are calculated, the values calculated in the previous layer are used. Such calculation of values is performed sequentially for each layer. In the following explanation, attention is focused on the layer whose values are to be calculated and on the previous layer. The layer whose values are to be calculated is called the L1 layer, and the layer before the L1 layer, in which the values have already been calculated, is called the L0 layer.
Each layer contains a plurality of channels. The L0 and L1 layers each contain a plurality of channels.
In the example shown in
The individual circles in
The set of values for each channel is referred to as the feature value group.
In the example shown in
In order to calculate the sets of feature values in the L1 layer, weights are determined by learning for the connections between the channels in the L1 layer and the channels in the L0 layer. A connection between channels for which a weight is determined is called an edge. In the example shown in
Each feature value group of the L1 layer is calculated by the weights and the feature value group of the L0 layer.
The feature value group C11 corresponding to the channel CH1 of the L1 layer is calculated using the feature value group C01, the weight W11, the feature value group C02, and the weight W21 (refer to
Similarly, the feature value group C12 corresponding to the channel CH2 of the L1 layer is calculated using the feature value group C01, the weight W12, the feature value group C02, and the weight W22 (refer to
Similarly, the feature value group C13 corresponding to the channel CH3 of the L1 layer is calculated using the feature value group C01, the weight W13, the feature value group C02, and the weight W23 (refer to
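The calculation described above can be sketched in Python as follows. As a simplification, the weights are treated here as scalars and the feature value groups as flat lists of numbers; in an actual convolutional network each weight would be a filter and the combination a convolution. The concrete values of the weights and feature values are assumptions for illustration.

```python
# Sketch: an L1-layer feature value group is calculated from the L0-layer
# feature value groups and the learned weights of the edges connecting them.
def compute_l1_group(l0_groups, weights):
    """Combine L0 feature value groups using per-edge weights.

    l0_groups: list of feature value groups (one per L0 channel)
    weights:   list of edge weights, one per L0 channel
    """
    size = len(l0_groups[0])
    result = [0.0] * size
    for group, w in zip(l0_groups, weights):
        for i, v in enumerate(group):
            result[i] += w * v
    return result

# Feature value groups C01 and C02 of the L0 layer (illustrative values).
c01 = [1.0, 2.0]
c02 = [3.0, 4.0]

# C11 is calculated using C01, the weight W11, C02, and the weight W21.
c11 = compute_l1_group([c01, c02], [0.5, 0.25])  # W11=0.5, W21=0.25 (assumed)
print(c11)  # [1.25, 2.0]
```

The feature value groups C12 and C13 would be calculated in the same way, substituting the weights W12, W22 and W13, W23 respectively.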
The chip 10 comprises a weight storage unit 11, an operation circuit 12, and a communication circuit 13.
Similarly, the chip 20 comprises a weight storage unit 21, an operation circuit 22, and a communication circuit 23.
The weight storage units 11, 21 are realized by memories in the respective chips. The operation circuits 12, 22 are realized by processors in the respective chips. The communication circuits 13, 23 are realized by communication interfaces for inter-chip communication.
Here, the case of calculating the feature value group of the L1 layer from the feature value group of the L0 layer will be used as an example. The operation method between the other layers may be the same as the operation method for calculating the feature value group of the L1 layer from the feature value group of the L0 layer.
The operation circuits 12, 22 calculate the feature value group of the L1 layer from the feature value group of the L0 layer.
It is assumed that the channels of the L0 layer and the channels of the L1 layer are each divided into groups whose number is equal to the number of chips (2 in this example) included in the operation device 1. The number of channels belonging to one group may be 0 or 1.
In addition, the groups of channels in the L0 layer, the groups of channels in the L1 layer, and the chips are associated. In this example, it is assumed that group A of the L0 layer, group A of the L1 layer, and the chip 10 are associated, and that group B of the L0 layer, group B of the L1 layer, and the chip 20 are associated.
It is assumed that the weight storage unit 11 in the chip 10 stores the weights W11, W12, W21, and W22 of the edges connecting the channels CH1 and CH2 belonging to the group A of the L1 layer corresponding to the chip 10 and each channel of the L0 layer. Similarly, the weight storage unit 21 in the chip 20 stores the weights W13 and W23 of the edges connecting the channel CH3 belonging to the group B of the L1 layer corresponding to the chip 20 and each channel of the L0 layer.
The operation circuit 12 in the chip 10 calculates the feature value groups C11 and C12 of channels CH1 and CH2 belonging to group A of the L1 layer corresponding to the chip 10. The operation circuit 22 in the chip 20 calculates the feature value group C13 of channel CH3 belonging to group B of the L1 layer corresponding to the chip 20. However, in this example, data communication is required between the chips 10, 20.
The operation circuit 12 in the chip 10 calculates the feature value group C11 using the feature value group C01, weight W11, feature value group C02, and weight W21 (refer to
The operation circuit 12 in the chip 10 calculates the feature value group C12 using the feature value group C01, the weight W12, the feature value group C02, and the weight W22 (refer to
The operation circuit 22 in the chip 20 calculates the feature value group C13 using the feature value group C01, the weight W13, the feature value group C02, and the weight W23 (refer to
As shown in
Each example embodiment of the present invention describes an assignment device that defines the edge between the L0 layer and the L1 layer so that the amount of data communication between chips can be suppressed, and also assigns weights to each chip in the operation device 1. As mentioned above, for the sake of simplicity of explanation, the case where the operation device 1 comprises two chips 10, 20 is used as an example, but the operation device 1 may comprise three or more chips.
In the following explanation, it is assumed that the multiple channels in the L0 and L1 layers are represented as illustrated in
Then, based on the channels in each of the L0 and L1 layers in the initial state, and each edge between the L0 and L1 layers in the initial state, the assignment device of the present example embodiment determines the weight of each edge, grouping of channels in the L0 layer, grouping of channels in the L1 layer, the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips included in the operation device 1, and edges to be removed. The assignment device of the present example embodiment also removes the edges to be removed.
The learning unit 31 learns the weight of each edge that connects each channel in the L0 layer to each channel in the L1 layer. As mentioned above, in the example shown in
The method by which the learning unit 31 learns the weight of each edge may be any known method and is not limited. The learning unit 31 may learn the weights of the edges so that the weights of some edges (for example, a predetermined percentage of the number of edges) are 0 or as close to 0 as possible.
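One common way to realize such learning is to add an L1 (lasso) penalty, whose soft-thresholding update pulls small weights to exactly 0. The following is a minimal sketch under that assumption; the function name, learning rate, and penalty strength are illustrative, and, as stated above, the actual learning method is not limited to this.

```python
# Sketch: one gradient-descent step with an L1 penalty (ISTA-style
# soft-thresholding) that drives small edge weights toward exactly 0.
def l1_gradient_step(weights, grads, lr=0.1, l1=0.05):
    new_weights = []
    shrink = lr * l1  # amount by which each weight is pulled toward 0
    for w, g in zip(weights, grads):
        w = w - lr * g  # ordinary gradient step on the task loss
        # soft-thresholding: shrink toward 0; weights within `shrink` of 0
        # are clamped to exactly 0
        if w > shrink:
            w -= shrink
        elif w < -shrink:
            w += shrink
        else:
            w = 0.0
        new_weights.append(w)
    return new_weights

# With zero task gradients, only the shrinkage acts: the weight 0.004 is
# close enough to 0 to be clamped, while the larger weights merely shrink.
print(l1_gradient_step([0.004, 0.5, -0.3], [0.0, 0.0, 0.0]))
```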
Using the results of learning the weights of each edge, the determination unit 32 divides the channels of the L0 layer and the channels of the L1 layer into groups whose number is equal to the number of the chips 10, 20 (2 in this example) included in the operation device 1 (refer to
The following is a more specific description of the determination unit 32.
The candidate generation unit 34 included in the determination unit 32 generates a plurality of candidates for the combination of the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips, and edges to be removed. The number of channels belonging to one group may be 0 or 1.
However, the candidate generation unit 34 ensures that the number of groups in each of the L0 and L1 layers in each candidate is the same as the number of chips included in the operation device 1.
When associating the groups of channels in the L0 layer, the groups of channels in the L1 layer, and the chips, the association is determined so that one group of channels in the L0 layer is not associated with multiple groups of channels in the L1 layer or with multiple chips. The same is true for the groups of channels in the L1 layer and for the chips. This is also the case in the second example embodiment described below.
There are more than one way to define each of “grouping of the channels in the L0 layer”, “grouping of the channels in the L1 layer”, “the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips” and “edges to be removed”.
The candidate generation unit 34 may exhaustively generate candidates for the combination of the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips, and edges to be removed.
Alternatively, the candidate generation unit 34 may generate a plurality of candidates for combination under predetermined condition.
For example, the candidate generation unit 34 may identify a predetermined number of edges whose weights are closest to 0, and generate a plurality of candidates for the combination of the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips, and edges to be removed, under the condition that the identified predetermined number of edges are defined as the edges to be removed.
For example, the candidate generation unit 34 may identify one edge whose weight is closest to 0, and generate a plurality of candidates for the combination of the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips, and edges to be removed, under the condition that the identified edge is defined as the edge to be removed.
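A straightforward way to enumerate grouping candidates exhaustively is to assign each channel of each layer to one of the groups in every possible way. The sketch below illustrates this under assumed data structures (a grouping is represented as a channel-to-group mapping); the choices of edges to remove and the chip association are omitted for brevity.

```python
from itertools import product

# Sketch: exhaustively generate candidates for the grouping of the channels
# in the L0 layer and the grouping of the channels in the L1 layer, where the
# number of groups equals the number of chips.
def generate_grouping_candidates(l0_channels, l1_channels, num_chips):
    """Yield (l0_grouping, l1_grouping) pairs; a grouping maps channel -> group."""
    for l0_assign in product(range(num_chips), repeat=len(l0_channels)):
        for l1_assign in product(range(num_chips), repeat=len(l1_channels)):
            yield (dict(zip(l0_channels, l0_assign)),
                   dict(zip(l1_channels, l1_assign)))

# Two L0 channels and three L1 channels divided for two chips, as in the
# example: 2**2 * 2**3 = 32 grouping candidates.
candidates = list(generate_grouping_candidates(
    ["CH1", "CH2"], ["CH1", "CH2", "CH3"], 2))
print(len(candidates))  # 32
```

Note that this enumeration includes candidates in which a group contains no channels, consistent with the statement above that the number of channels belonging to one group may be 0.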
The simulation execution unit 35 included in the determination unit 32 executes the simulation of the operation of the neural network in the operation device 1 for each candidate of the combination generated by the candidate generation unit 34. The simulation of the operation of the neural network is the simulation of the operation of sequentially calculating the feature value groups of the channels in each layer from the input layer to the output layer of the neural network and deriving the result in the output layer. Here, the candidate generation unit 34 focuses on the L0 layer and the L1 layer, and generates candidates for the combination of the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips, and edges to be removed. The state of the neural network before the L0 layer and the state of the neural network after the L1 layer can be fixed by the simulation execution unit 35. In this way, by fixing the states of the neural networks other than those defined as candidates, it is possible to sequentially calculate the feature value groups of the channels in each layer from the input layer to the output layer and derive the results in the output layer.
The test data storage unit 37 is a storage device that stores a plurality of pairs of data input in the above simulation (hereinafter referred to as test data) and the correct answer data of the neural network operation corresponding to the test data. For example, suppose that the neural network operation outputs the estimation result of an object in an image. In this case, the pairs of the image and the data indicating the actual objects in the image can be used as the pairs of the test data and the correct answer data. In the following, the case where the result of the neural network operation is the estimated result of the objects in the image will be used as an example.
The simulation execution unit 35 sequentially selects the candidates one by one. Then, for the selected candidate, the simulation execution unit 35 sequentially calculates the feature value groups of the channels in each layer from the input layer to the output layer, using the individual test data (images) as input data, and derives the estimation results of the objects in the images. Then, the simulation execution unit 35 compares the estimation results with the correct answer data corresponding to the input data, and calculates the ratio (i.e., the correct answer rate) of the number of correct estimation results (results obtained by the simulation) to the number of pairs of test data and correct answer data.
For each selected candidate, the simulation execution unit 35 measures the number of test data (images) processed per second (in this example, Frame Per Second (FPS)) in the simulation, while sequentially calculating the feature value groups of the channels in each layer from the input layer to the output layer using the individual test data (images) as input data and deriving the estimation results of the objects in the images.
The simulation execution unit 35 then calculates the sum of the correct answer rate and the FPS for each selected candidate.
The correct answer rate is an index that indicates the accuracy of the operation for the selected candidate. The larger the value of the correct answer rate, the better the accuracy of the operation. The FPS is an index of the speed of the operation for the selected candidate. Therefore, it can be said that the sum of the correct answer rate and the FPS is an index that represents both the accuracy and the speed of the operation of the selected candidate. In other words, the greater the sum of the correct answer rate and the FPS, the better the overall accuracy of the operation and the faster the operation.
In addition, the fact that the amount of data communication between chips is small is one of the factors that makes the operation faster. Therefore, it can be said that if the sum of the correct answer rate and the FPS is large, the amount of data communication between chips tends to be small.
An index other than the “sum of the correct answer rate and the FPS” may be used as an index that represents both the accuracy of the operation and the speed of the operation. In the following description, the case where the simulation execution unit 35 calculates the sum of the correct answer rate and the FPS as an index that represents both the accuracy of the operation and the speed of the operation will be used as an example.
The combination determination unit 36 included in the determination unit 32 determines the combination that corresponds to the candidate with the largest sum of the correct answer rate and FPS as the combination of the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips, and edges to be removed. As a result, the combination of the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips, and edges to be removed is determined.
In addition, the combination determination unit 36 removes the edges to be removed included in that combination from edges between the L0 and L1 layers.
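The scoring by the simulation execution unit 35 and the selection by the combination determination unit 36 can be sketched as follows. The candidate records and their accuracy and FPS values are assumptions for illustration.

```python
# Sketch: each candidate is scored by the sum of its correct answer rate and
# its FPS, and the candidate with the largest sum is selected.
def select_best_candidate(results):
    """results: list of dicts with 'candidate', 'accuracy' (0..1) and 'fps'."""
    return max(results, key=lambda r: r["accuracy"] + r["fps"])

results = [
    {"candidate": "A", "accuracy": 0.90, "fps": 30.0},  # sum 30.90
    {"candidate": "B", "accuracy": 0.85, "fps": 45.0},  # sum 45.85
    {"candidate": "C", "accuracy": 0.95, "fps": 20.0},  # sum 20.95
]
print(select_best_candidate(results)["candidate"])  # B
```

As the text notes, an index other than this raw sum may be used; any score that reflects both the accuracy and the speed of the operation could be substituted for the `key` function.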
The weight assignment unit 33 stores the weights of the edges each connecting the channels of the L0 layer and the channels of the L1 layer in the weight storage unit in the chip corresponding to the edge, based on the combination determined by the combination determination unit 36. In other words, the weight assignment unit 33 causes the weight storage unit in the chip corresponding to the edge to store the weights of the edges that remain unremoved by the combination determination unit 36.
The following is an example of an operation in which the weight assignment unit 33 stores the weight of an edge in the weight storage unit in the chip corresponding to the edge. When the weight assignment unit 33 stores the weight of one edge in the weight storage unit, for example, it stores the weight in the weight storage unit in the chip corresponding to the group to which the L1-layer channel, of the two channels connected by that edge, belongs. For example, suppose that the edge connecting the channel CH1 of the L0 layer and the channel CH1 of the L1 layer shown in
However, the operation of storing the edge weights in the weight storage unit in the chip corresponding to the edge is not limited to the above example; other operations may be used.
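As one illustration of the example operation above, the following sketch stores each remaining edge weight under the chip associated with the group of the edge's L1-layer channel. The dictionaries and names are assumptions for illustration; the values match the example in which the weights W11 and W21 are stored in the chip 10 and the weights W22 and W23 in the chip 20.

```python
# Sketch: store the weight of each remaining edge in the weight storage unit
# of the chip corresponding to the group to which the edge's L1-layer channel
# belongs.
def assign_weights(edges, l1_grouping, group_to_chip):
    """edges: {(l0_ch, l1_ch): weight}; returns {chip: {(l0_ch, l1_ch): weight}}."""
    storage = {}
    for (l0_ch, l1_ch), w in edges.items():
        chip = group_to_chip[l1_grouping[l1_ch]]
        storage.setdefault(chip, {})[(l0_ch, l1_ch)] = w
    return storage

# Edges remaining after removal (W12 and W13 removed, as in the example).
edges = {("CH1", "CH1"): "W11", ("CH2", "CH1"): "W21",
         ("CH2", "CH2"): "W22", ("CH2", "CH3"): "W23"}
l1_grouping = {"CH1": "A", "CH2": "B", "CH3": "B"}   # groups of L1 channels
group_to_chip = {"A": "chip10", "B": "chip20"}       # group-chip association
print(assign_weights(edges, l1_grouping, group_to_chip))
```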
The weight assignment unit 33 comprises an interface (not shown in
The weight assignment unit 33 is realized by, for example, a CPU (Central Processing Unit) of a computer that operates according to an assignment program, and the interface of the computer (more specifically, the interface with the respective chips 10, 20 of the operation device 1; hereinafter referred to as the chip interface). For example, the CPU may read the assignment program from a program recording medium such as a program storage device of the computer, and operate as the weight assignment unit 33 using the chip interface according to the assignment program.
The determination unit 32 including the candidate generation unit 34, the simulation execution unit 35, and the combination determination unit 36, and the learning unit 31 are realized by, for example, the CPU of the computer that operates according to the assignment program. For example, the CPU may read the assignment program from the program recording medium as described above, and operate as the determination unit 32 including the candidate generation unit 34, the simulation execution unit 35, and the combination determination unit 36, and the learning unit 31 according to the assignment program.
The test data storage unit 37 is realized by, for example, a storage device included in the computer.
Next, the processing progress will be explained.
As mentioned above, it is assumed that the multiple channels in the L0 and L1 layers are represented as illustrated in
First, the learning unit 31 learns the weight of each edge connecting each channel of the L0 layer and each channel of the L1 layer (Step S1). As a result of Step S1, the weights W11, W12, W13, W21, W22, W23 (refer to
Next, the candidate generation unit 34 generates a plurality of candidates for the combination of the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips, and edges to be removed (Step S2).
In step S2, the candidate generation unit 34 may identify a predetermined number of edges whose weights are closest to 0, and generate a plurality of candidates under the condition that the identified predetermined number of edges are defined as the edges to be removed.
In step S2, the candidate generation unit 34 may identify one edge whose weight is closest to 0, and generate a plurality of candidates, under the condition that the identified edge is defined as the edge to be removed.
In step S2, the candidate generation unit 34 may generate a plurality of candidates exhaustively.
Following Step S2, the simulation execution unit 35 determines whether, among the candidates generated in Step S2, there are any candidates that have not yet been selected in Step S4 (Step S3). If there is a candidate that has not yet been selected in Step S4 (Yes in Step S3), the process moves to Step S4. When moving from Step S2 to Step S3, none of the candidates have been selected yet, so the process moves to Step S4.
In step S4, the simulation execution unit 35 selects one unselected candidate among the candidates generated in step S2.
Following step S4, the simulation execution unit 35 performs a simulation of the operation of the neural network in the operation device 1 for the selected candidate, using the individual test data stored in the test data storage unit 37. Further, the simulation execution unit 35 calculates the sum of the correct answer rate of the operation result in the simulation and the FPS in the simulation (step S5).
After step S5, the process from step S3 onward is repeated.
In step S3, if the simulation execution unit 35 determines that there are no unselected candidates (No in step S3), it moves to step S6 (refer to
In step S6, the combination determination unit 36 determines the combination that corresponds to the candidate with the largest sum of the correct answer rate and FPS as the combination of the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips, and edges to be removed. In addition, the combination determination unit 36 removes the edges to be removed that are included in the combination.
As a result of step S6, the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips are determined, and the edges to be removed are removed.
As a result of step S6, the above state is assumed to be defined.
After step S6, the weight assignment unit 33 stores the weights of the edges that remain unremoved, based on the combination determined in step S6, in the weight storage unit in the chip corresponding to each edge (step S7).
When storing the weight of one edge in the weight storage unit, for example, the weight assignment unit 33 stores the weight of the edge in the weight storage unit in the chip corresponding to the group to which the L1 layer channel belongs among the L0 layer channel and L1 layer channel connected by that edge. For example, in this example, the weight assignment unit 33 stores the weights W11 and W21 in the weight storage unit 11 in the chip 10 corresponding to the group A to which the channel CH1 of the L1 layer belongs. The weight assignment unit 33 stores the weight W22 in the weight storage unit 21 in the chip 20 corresponding to the group B to which the channel CH2 of the L1 layer belongs. The weight assignment unit 33 stores the weight W23 in the weight storage unit 21 in the chip 20 corresponding to the group B to which the channel CH3 of the L1 layer belongs.
Next, the operation in which the operation device 1, which has stored the weights as described above, calculates the feature value groups of the L1 layer from the feature value groups of the L0 layer is explained. It is assumed that the states of the neural network before the L0 layer and after the L1 layer are also defined.
The operation circuit 12 (refer to
The operation circuit 12 calculates the feature value group C11 corresponding to the channel CH1 of the L1 layer using the feature value group C01, the weight W11, the feature value group C02, and the weight W21 (refer to
The operation circuit 22 calculates the feature value group C12 corresponding to the channel CH2 of the L1 layer using the feature value group C02 and the weight W22 (refer to
Similarly, the operation circuit 22 calculates the feature value group C13 corresponding to the channel CH3 of the L1 layer using the feature value group C02 and the weight W23 (refer to
The operation circuits 12, 22 sequentially calculate the feature value groups for each layer after the L1 layer.
As described above, the operation device 1 may perform data communication between chips in order to calculate some feature value groups of the L1 layer (the feature value group C11 in the above example). However, data communication is not required for calculating every one of the feature value groups of the L1 layer. Therefore, the operation speed in the operation device 1 can be accelerated.
In other words, in the present example embodiment, the candidate generation unit 34 generates a plurality of candidates for combinations. Then, the simulation execution unit 35 executes a simulation of the operation of the neural network in the operation device 1 for each candidate, and calculates the sum of the correct answer rate and the FPS (an index representing both the accuracy of the operation and the speed of the operation). Then, the combination determination unit 36 determines the combination corresponding to the candidate with the largest sum of the correct answer rate and the FPS, and removes the edges to be removed included in the combination. Then, the weight assignment unit 33 stores the weights of the edges that remain unremoved based on the combination in the weight storage unit in the chip corresponding to the edge. Thus, according to the present example embodiment, the edges between neighboring layers can be defined so that the amount of data communication between chips can be suppressed, and weights can be assigned to the chips of the operation device that executes the operation of the neural network by a plurality of chips.
In the present example embodiment, the learning unit 31 may re-learn the weights of the edges that remain unremoved after step S6.
For each pair of neighboring layers, the assignment device 30 may determine the combination of the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips, and edges to be removed, and remove the edges to be removed.
The candidate generation unit 34 may generate a plurality of candidates for the combination of the grouping of the channels in each layer, the association of the groups of the channels in each layer and the chips, and edges to be removed, for the layers from the input layer to the output layer as a whole. Then, the simulation execution unit 35 may perform a simulation of the operation for each candidate and calculate the sum of the correct answer rate and the FPS. Then, the combination determination unit 36 may determine the combination corresponding to the candidate with the largest sum of the correct answer rate and the FPS, and remove the edges to be removed included in that combination.
In the second example embodiment, it is also assumed that the multiple channels in the L0 and L1 layers are represented as illustrated in
The learning unit 41 learns the weight of each edge that connects each channel in the L0 layer to each channel in the L1 layer. At this time, the learning unit 41 learns the weights so that the weights of a predetermined percentage of the edges are 0 or as close to 0 as possible. However, the weights learned in this way do not necessarily become such values. For example, even if the weight of an edge is learned so as to be 0 or as close to 0 as possible, the result may be that the weight of that edge becomes a value such as "5".
In the example shown in
The learning unit 41 may learn the weight of each edge so that every weight becomes 0 or as close to 0 as possible. However, in this learning, not all the edge weights will become 0 or close to 0.
The determination unit 42 compares the weight of each edge obtained by learning with a predetermined threshold value, and removes the edges whose weights are equal to or less than the threshold value. This threshold value distinguishes weights with values of 0 or close to 0 from other weights, and is defined as a value relatively close to 0. In this example, the weights W13 and W21 are equal to or less than the threshold value, while the other weights W11, W12, W22, and W23 are larger than it. Therefore, the determination unit 42 removes the edge connecting the channel CH1 of the L0 layer and the channel CH3 of the L1 layer, and the edge connecting the channel CH2 of the L0 layer and the channel CH1 of the L1 layer (refer to
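As an illustration, the thresholding step can be sketched as follows, using the weight labels from the example above. The concrete weight values and the threshold 0.1 are assumptions chosen purely for illustration.

```python
# Illustrative sketch of the pruning step: weights at or below a small
# threshold are treated as "0 or close to 0" and their edges are removed.

THRESHOLD = 0.1  # arbitrary small value standing in for the threshold

def prune_edges(weights, threshold=THRESHOLD):
    """weights: {edge_name: weight}; returns only the surviving edges."""
    return {e: w for e, w in weights.items() if abs(w) > threshold}

# Assumed weight values; W13 and W21 fall at or below the threshold.
weights = {"W11": 0.8, "W12": 0.6, "W13": 0.02,
           "W21": 0.05, "W22": 0.9, "W23": 0.7}
kept = prune_edges(weights)  # edges for W13 and W21 are removed
```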
The determination unit 42 divides the channels of the L0 layer and the channels of the L1 layer into groups whose number is equal to the number of the chips 10, 20 (2 in this example) included in the operation device 1 (refer to
However, the determination unit 42 determines the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, and the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips so that a condition that the channels, connected by the removed edges, in the L0 layer and in the L1 layer, belong to non-corresponding groups of channels in the L0 layer and in the L1 layer respectively, is satisfied. The “non-corresponding groups of channels in the L0 layer and in the L1 layer” can also be expressed as “groups of channels in the L0 layer and in the L1 layer not corresponding to the same chip”.
In the above example, the determination unit 42 removes the edge connecting the channel CH1 of the L0 layer and the channel CH3 of the L1 layer, and the edge connecting the channel CH2 of the L0 layer and the channel CH1 of the L1 layer. Thus, in this case, the determination unit 42 determines the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, and the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips so that the condition that the channel CH1 of the L0 layer and the channel CH3 of the L1 layer belong to non-corresponding groups and the channel CH2 of the L0 layer and the channel CH1 of the L1 layer belong to non-corresponding groups is satisfied.
An example of the grouping and association that satisfy the above condition is shown in
There may be more than one pattern of grouping and association that satisfies the above condition. For example, in the example shown in
Also, for example, when the number of removed edges is large, there may be no pattern of grouping and association that completely satisfies the condition that the channels in the L0 layer and in the L1 layer connected by the removed edges belong to non-corresponding groups. In such a case, the determination unit 42 gives priority to determining the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, and the association of those groups and the chips, and allows the above condition not to be completely satisfied.
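One straightforward (hypothetical) way to realize this determination is an exhaustive search: try every assignment of channels to chips, count how many removed edges end up with both of their channels assigned to the same chip, and keep the assignment with the fewest such violations, which is zero whenever the condition can be fully satisfied. The sketch below assumes 2 chips and the channels and removed edges of the running example; none of the function names come from the text.

```python
# Brute-force sketch of the grouping/association search. Assigning each
# channel to a chip is equivalent to grouping the channels and then
# associating each group with a chip.
from itertools import product

def find_grouping(l0, l1, removed_edges, n_chips=2):
    """removed_edges: iterable of (l0_channel, l1_channel) pairs."""
    best, best_violations = None, None
    for a0 in product(range(n_chips), repeat=len(l0)):
        for a1 in product(range(n_chips), repeat=len(l1)):
            # Require every chip to receive at least one channel per layer.
            if set(a0) != set(range(n_chips)) or set(a1) != set(range(n_chips)):
                continue
            chip_of0 = dict(zip(l0, a0))
            chip_of1 = dict(zip(l1, a1))
            # A violation: a removed edge whose endpoints share a chip.
            violations = sum(chip_of0[u] == chip_of1[v]
                             for u, v in removed_edges)
            if best is None or violations < best_violations:
                best, best_violations = (chip_of0, chip_of1), violations
    return best, best_violations

# Removed edges from the example: (L0 CH1, L1 CH3) and (L0 CH2, L1 CH1).
(g0, g1), violations = find_grouping(
    ["CH1", "CH2"], ["CH1", "CH2", "CH3"],
    [("CH1", "CH3"), ("CH2", "CH1")])
```

Because the search keeps the assignment with the fewest violations rather than demanding zero, it also covers the case described above in which the condition cannot be completely satisfied.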
The weight assignment unit 43 stores the weights of the edges that connect the channel of the L0 layer and the channel of the L1 layer (more specifically, the edges that remain unremoved) in the weight storage unit in the chip corresponding to each edge.
The operation of storing the weight of an edge in the weight storage unit in the chip corresponding to the edge may be the same as the operation described in the first example embodiment. That is, when the weight assignment unit 43 stores the weight of one edge in a weight storage unit, it stores, for example, the weight in the weight storage unit in the chip corresponding to the group to which the L1-layer channel of the two channels connected by that edge belongs. For example, in the example shown in
However, the operation of storing the weight of an edge in the weight storage unit in the chip corresponding to the edge is not limited to the above example; other operations may be used.
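The assignment rule described above (store each surviving weight on the chip corresponding to the group of the edge's L1-layer channel) can be sketched as follows. The association of L1-layer channels to chips and the weight values are assumptions for illustration.

```python
# Sketch of one possible weight-assignment rule: the weight of each
# surviving edge goes to the chip associated with the group of the
# edge's L1-layer channel. Names and values are illustrative only.

def assign_weights(edges, chip_of_l1_channel, n_chips=2):
    """edges: {(l0_ch, l1_ch): weight}; returns per-chip weight stores."""
    stores = {chip: {} for chip in range(n_chips)}
    for (l0_ch, l1_ch), w in edges.items():
        stores[chip_of_l1_channel[l1_ch]][(l0_ch, l1_ch)] = w
    return stores

# Surviving edges of the running example (removed edges excluded).
edges = {("CH1", "CH1"): 0.8, ("CH1", "CH2"): 0.6,
         ("CH2", "CH2"): 0.9, ("CH2", "CH3"): 0.7}
# Assumed association: L1 channels CH1, CH2 -> chip 0; CH3 -> chip 1.
stores = assign_weights(edges, {"CH1": 0, "CH2": 0, "CH3": 1})
```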
The weight assignment unit 43 comprises an interface (chip interface, not shown in
The weight assignment unit 43 is realized by, for example, a CPU of a computer that operates according to an assignment program and a chip interface of the computer. For example, the CPU may read the assignment program from a program recording medium such as a program storage device of the computer, and operate as the weight assignment unit 43 using the chip interface according to the assignment program.
The learning unit 41 and the determination unit 42 are realized by, for example, the CPU of the computer that operates according to the assignment program. For example, the CPU may read the assignment program from the program recording medium as described above, and operate as the learning unit 41 and the determination unit 42 according to the assignment program.
Next, the flow of processing will be explained.
As mentioned above, it is assumed that the multiple channels in the L0 and L1 layers are represented as illustrated in
First, the learning unit 41 learns the weights of the edges connecting each channel of the L0 layer and each channel of the L1 layer so that the weights of a predetermined percentage of the number of those edges become 0 or as close to 0 as possible (step S11).
Next, the determination unit 42 removes the edges whose weights learned in step S11 are equal to or less than the threshold value (step S12). This threshold value distinguishes weights with values of 0 or close to 0 from other weights, and is predetermined as a value relatively close to 0. Therefore, in step S12, edges whose weights are 0 or close to 0 are removed.
However, for edges whose weights are learned to become 0 or as close to 0 as possible, such weight values are not always obtained as a result of learning. Therefore, even edges whose weights are learned to become 0 or as close to 0 as possible in step S11 are not necessarily removed in step S12.
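As a toy illustration of driving a selected subset of weights to 0 or as close to 0 as possible, the sketch below applies an L1 (soft-thresholding) step to only the penalized weights during gradient descent on a small least-squares problem. The model, data, learning rate, and penalty strength are all assumptions; the text does not prescribe a particular learning method.

```python
# Minimal, hypothetical sketch: an L1 penalty is applied only to a
# selected subset of weights, via a proximal (soft-thresholding) step
# after each pass over the data.

def soft_threshold(w, lam):
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def train(data, n_weights, penalized, lr=0.05, lam=0.02, epochs=500):
    """data: list of (x_vector, y); penalized: indices pushed toward 0."""
    w = [0.0] * n_weights
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        for i in penalized:  # pull only the penalized weights toward 0
            w[i] = soft_threshold(w[i], lr * lam)
    return w

# Toy data: y depends only on x[0] (y = 2 * x[0]); weight index 1 is
# penalized and should end up at or near 0 after training.
data = [((1.0, 0.3), 2.0), ((2.0, -0.1), 4.0), ((-1.0, 0.5), -2.0)]
w = train(data, n_weights=2, penalized=[1])
```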
Following step S12, the determination unit 42 determines the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, and the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips so that the condition that the channels in the L0 layer and in the L1 layer connected by the edges removed in step S12 belong to non-corresponding groups is satisfied (step S13).
In step S13, the determination unit 42 divides the channels of the L0 layer and the channels of the L1 layer into groups whose number is equal to the number of the chips 10, 20 (2 in this example) included in the operation device 1 (refer to
When there are multiple patterns of grouping and association that satisfy the above condition, the determination unit 42 may determine any one of them.
The result of step S13 is represented, for example, as illustrated in
After step S13, the weight assignment unit 43 stores the weights of the edges that remain unremoved in the weight storage unit in the chip corresponding to each edge (step S14).
When storing the weight of one edge in a weight storage unit, the weight assignment unit 43 stores, for example, the weight in the weight storage unit in the chip corresponding to the group to which the L1-layer channel of the two channels connected by that edge belongs. For example, in the example shown in
Next, the operation for calculating the feature value groups of the L1 layer from the feature value groups of the L0 layer, performed by the operation device 1 which has stored the weights as described above, is explained. It is assumed that the states of the neural network before the L0 layer and after the L1 layer are also defined.
The operation circuit 12 (refer to
The operation circuit 12 calculates the feature value group C11 corresponding to the channel CH1 of the L1 layer using the feature value group C01 and the weight W11 (refer to
The operation circuit 12 calculates the feature value group C12 corresponding to the channel CH2 of the L1 layer using the feature value group C01, the weight W12, the feature value group C02, and the weight W22 (refer to
The operation circuit 12 then calculates the feature value group C12 by using the feature value group C01, weight W12, feature value group C02, and weight W22 as described above.
The operation circuit 22 calculates the feature value group C13 corresponding to the channel CH3 of the L1 layer using the feature value group C02 and the weight W23 (refer to
The operation circuits 12, 22 sequentially calculate the feature value groups for each layer after the L1 layer.
As described above, the operation device 1 may perform data communication between chips in order to calculate some feature value groups of the L1 layer (the feature value group C12 in the above example). However, data communication is not required every time each feature value group of the L1 layer is calculated. Therefore, the operation speed of the operation device 1 can be increased.
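The operation flow above can be sketched with scalar stand-ins for the feature value groups: chip 10 computes C11 and C12, chip 20 computes C13, and the only inter-chip communication is the transfer of C02 needed for C12. All numeric values are illustrative assumptions.

```python
# Toy sketch of the inter-chip operation: chip 10 holds C01 and the
# weights W11, W12, W22; chip 20 holds C02 and W23. Only C02 must be
# communicated (from chip 20 to chip 10) so chip 10 can compute C12.

W11, W12, W22, W23 = 0.8, 0.6, 0.9, 0.7   # illustrative weight values
C01, C02 = 1.0, 2.0                        # L0-layer feature values

transfers = []  # records every inter-chip transfer that occurs

def receive_from_chip20():
    transfers.append("C02")  # the single required communication
    return C02

# Chip 10 (operation circuit 12):
C11 = C01 * W11
C12 = C01 * W12 + receive_from_chip20() * W22

# Chip 20 (operation circuit 22):
C13 = C02 * W23
```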
In other words, in the present example embodiment, the learning unit 41 learns the weights of the edges so that the weights of a predetermined percentage of the number of edges become 0 or as close to 0 as possible. Then, the determination unit 42 removes the edges whose weights are equal to or less than the threshold value. In addition, the determination unit 42 determines the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, and the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips so that the condition that the channels in the L0 layer and in the L1 layer connected by the removed edges belong to non-corresponding groups is satisfied. Because the grouping and association are determined to satisfy this condition after the edges are removed, the number of remaining edges connecting channels that belong to non-corresponding groups is reduced. Therefore, according to this example embodiment, the edges between neighboring layers can be defined so that the amount of data communication between chips can be suppressed, and weights can be assigned to the chips of the operation device that executes the neural network operation by a plurality of chips.
In the present example embodiment, the learning unit 41 may re-learn the weights of the edges that remain unremoved after step S12.
For each pair of neighboring layers, the assignment device 40 may remove some edges between the L0 and L1 layers, and determine the grouping of the channels in the L0 layer, the grouping of the channels in the L1 layer, and the association of the groups of the channels in the L0 layer and the groups of the channels in the L1 layer and the chips, using the method described in the second example embodiment.
Channel shuffling may be applied to the first and second example embodiments.
The assignment devices 30, 40 of each example embodiment of the present invention are realized by the computer 1000. The operations of the assignment devices 30, 40 are stored in the auxiliary memory 1003 in the form of an assignment program. The CPU 1001 reads the assignment program from the auxiliary memory 1003, deploys it to the main memory 1002, and executes the processes described in each of the above example embodiments in accordance with the assignment program.
The auxiliary memory 1003 is an example of a non-transitory tangible medium. Other examples of non-transitory tangible media are a magnetic disk, a magneto-optical disk, a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), a semiconductor memory, and the like, which are connected through the interface 1004. When the program is delivered to the computer 1000 through a communication line, the computer 1000 that receives the delivery may deploy the program into the main memory 1002 and execute the processing described in each of the above example embodiments according to the program.
Some or all of the components of the assignment device may be realized by general-purpose or dedicated circuitry, processors, or a combination of these. They may be configured by a single chip or by multiple chips connected through a bus. Some or all of the components may be realized by a combination of the above-mentioned circuits, etc. and a program.
If some or all of the components of the assignment device are realized by multiple information processing devices, circuits, etc., the multiple information processing devices, circuits, etc. may be centrally located or distributed. For example, the information processing devices, circuits, etc. may be realized as a client-and-server system, cloud computing system, etc., each of which is connected through a communication network.
The learning unit 71 (for example, learning unit 31, 41) learns a weight for each edge connecting a channel in a first layer (for example, L1 layer) that is a layer in a neural network, and a channel in a 0th layer (for example, L0 layer) which is a previous layer to the first layer.
The determination unit 72 (for example, determination units 32, 42) divides channels in the 0th layer and channels in the first layer into groups whose number is equal to the number of chips (for example, chips 10, 20) that are included in an operation device (for example, operation device 1) executing an operation of the neural network using a learning result of the weight for each edge, respectively, determines association of the groups of the channels in the 0th layer and the groups of the channels in the first layer and the chips included in the operation device, and edges to be removed, and removes the edges to be removed.
The weight assignment unit 73 (for example, the weight assignment units 33, 43) stores the weights for the edges each connecting the channel in the 0th layer and the channel in the first layer to a weight storage unit in the chip corresponding to the edge.
According to such a configuration, the edges between neighboring layers can be defined so that the amount of data communication between chips can be suppressed, and weights can be assigned to the chips of the operation device that performs the neural network operations by a plurality of chips.
The aforementioned example embodiments of the present invention can be described as supplementary notes mentioned below, but are not limited to the following supplementary notes.
(Supplementary note 1) An assignment device comprising:
(Supplementary note 2) The assignment device according to supplementary note 1,
(Supplementary note 3) The assignment device according to supplementary note 2, wherein
(Supplementary note 4) The assignment device according to supplementary note 2, wherein
(Supplementary note 5) The assignment device according to supplementary note 1,
(Supplementary note 6) An assignment method, executed by a computer, comprising:
(Supplementary note 7) The assignment method, implemented by the computer, according to supplementary note 6,
(Supplementary note 8) The assignment method, implemented by the computer, according to supplementary note 6,
(Supplementary note 9) An assignment program causing a computer to execute:
(Supplementary note 10) The assignment program according to supplementary note 9, causing the computer to execute
(Supplementary note 11) The assignment program according to supplementary note 9,
While the present invention has been described with reference to the example embodiments, the present invention is not limited to the aforementioned example embodiments. Various changes understandable to those skilled in the art within the scope of the present invention can be made to the structures and details of the present invention.
The present invention is suitably applied to an assignment device that assigns weights in a neural network to chips of an operation device that executes operations of a neural network by a plurality of chips.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2019/018430 | 5/8/2019 | WO | 00