The present invention relates to an operation device comprising a plurality of chips, and an operation allocation method of allocating operations to the plurality of chips.
Patent literatures 1 and 2 describe circuits, etc. that perform parallel processing.
In addition, non-patent literature 1 describes a device that processes one frame and the next frame in a video with different circuits.
Non-patent literature 2 describes a device that performs the processing of the first through nth layer of a neural network, and the processing of the (n+1)th and subsequent layers with different circuits.
In addition, grouped convolution is described in non-patent literature 3.
Non-patent literature 4 describes a technique to set a weight in a neural network to zero.
Non-patent literature 5 describes a technique to reduce a weight in a neural network.
In recent years, operations of a neural network have become increasingly large-scale. This makes it difficult to perform high-speed operations when operations of a neural network are performed on a single chip.
On the other hand, it is possible to perform neural network operations on multiple chips. In such a case, if the amount of data communication between chips increases, it becomes difficult to perform high-speed operations.
Therefore, it is an object of the present invention to provide an operation device and an operation allocation method that can reduce the amount of data communication between chips while performing neural network operations on multiple chips.
An operation device according to the present invention includes a plurality of chips, wherein each chip comprises weight storage means for storing weights for each edge determined by learning under the condition that channels in a first layer that is a layer in a neural network and channels in a 0th layer that is a previous layer to the first layer are each divided into groups whose number is equal to the number of the chips, the groups of the channels in the first layer, the groups of the channels in the 0th layer, and the chips are associated with one another, an edge is set between the channels belonging to corresponding groups, and an edge is set between the channels belonging to non-corresponding groups under a restriction, wherein the weight storage means in each chip stores the weights determined for the edges between the channels belonging to the corresponding groups associated with the chip including the weight storage means, and wherein each chip further includes operation means for calculating a set of values for the channel that belongs to the group in the first layer corresponding to the group in the 0th layer, based on the weights stored in the weight storage means in the chip and a set of values for the channel that belongs to the group in the 0th layer corresponding to the chip.
An operation device according to the present invention includes a plurality of chips, wherein each chip comprises weight storage means for storing weights for each edge determined by learning under the condition that channels in a first layer that is a layer in a neural network and channels in a 0th layer that is a previous layer to the first layer are each divided into groups whose number is equal to the number of the chips, the groups of the channels in the first layer, the groups of the channels in the 0th layer, and the chips are associated with one another, an edge is set between each channel in the first layer and each channel in the 0th layer, and the weight between the channels that belong to non-corresponding groups is learned so that the weight becomes 0 or as close to 0 as possible, wherein the weight storage means in each chip stores a first weight determined for the edges between the channels belonging to the corresponding groups associated with the chip including the weight storage means, and a second weight determined for the edge between the channel, belonging to the group in the first layer, corresponding to the chip and the channel, belonging to the group in the 0th layer, not corresponding to the chip, the second weight being equal to or more than a predetermined threshold, and wherein each chip further includes operation means for calculating a set of values for the channel that belongs to the group in the first layer corresponding to the group in the 0th layer, based on the first weight and a set of values for the channel that belongs to the group in the 0th layer corresponding to the chip, and, when calculating the set of values for the channel that belongs to the group corresponding to the chip in the first layer, if there is a channel that belongs to a group not corresponding to the chip and for which an edge with the second weight is set to a channel belonging to the group corresponding to the chip, obtaining the set of values for that channel from the other chip corresponding to that group, and calculating the set of values for the channel that belongs to the group corresponding to the chip in the first layer using the obtained set of values and the second weight.
An operation allocation method according to the present invention is a method for allocating operations to a plurality of chips included in an operation device, including determining weights for each edge by learning under the condition that channels in a first layer that is a layer in a neural network and channels in a 0th layer that is a previous layer to the first layer are each divided into groups whose number is equal to the number of the chips, the groups of the channels in the first layer, the groups of the channels in the 0th layer, and the chips are associated with one another, an edge is set between the channels belonging to corresponding groups, and an edge is set between the channels belonging to non-corresponding groups under a restriction, and allocating to each chip the weights determined for the edges between the channels belonging to the corresponding groups associated with the chip, wherein a set of values for the channel that belongs to the group in the first layer corresponding to the group in the 0th layer is calculated by each chip, based on the weights allocated to the chip and a set of values for the channel that belongs to the group in the 0th layer corresponding to the chip.
An operation allocation method according to the present invention is a method for allocating operations to a plurality of chips included in an operation device, including determining weights for each edge by learning under the condition that channels in a first layer that is a layer in a neural network and channels in a 0th layer that is a previous layer to the first layer are each divided into groups whose number is equal to the number of the chips, the groups of the channels in the first layer, the groups of the channels in the 0th layer, and the chips are associated with one another, an edge is set between each channel in the first layer and each channel in the 0th layer, and the weight between the channels that belong to non-corresponding groups is learned so that the weight becomes 0 or as close to 0 as possible, removing each edge whose weight is less than a predetermined threshold, and allocating to each chip a first weight determined for the edges between the channels belonging to the corresponding groups associated with the chip, and a second weight determined for the edge between the channel, belonging to the group in the first layer, corresponding to the chip and the channel, belonging to the group in the 0th layer, not corresponding to the chip, the second weight being equal to or more than the predetermined threshold, wherein, in each chip, a set of values for the channel that belongs to the group in the first layer corresponding to the group in the 0th layer is calculated based on the first weight allocated to the chip and a set of values for the channel that belongs to the group in the 0th layer corresponding to the chip, and, when calculating the set of values for the channel that belongs to the group corresponding to the chip in the first layer, if there is a channel that belongs to a group not corresponding to the chip and for which an edge with the second weight is set to a channel belonging to the group corresponding to the chip, the set of values for that channel is obtained from the other chip corresponding to that group, and the set of values for the channel that belongs to the group corresponding to the chip in the first layer is calculated using the obtained set of values and the second weight.
According to this invention, it is possible to reduce the amount of data communication between chips while performing neural network operations on multiple chips.
Before explaining the example embodiments of the present invention, an operation of a neural network is explained. In the operation of a neural network, when calculating the values in a layer, the values calculated in the previous layer are used. Such calculation of values is performed sequentially for each layer. In the following explanation, attention is focused on the layer whose values are to be calculated and on the previous layer. The layer whose values are to be calculated is called the L1 layer, and the layer before the L1 layer, whose values have already been calculated, is called the L0 layer.
Each layer contains a plurality of channels; accordingly, each of the L0 and L1 layers contains a plurality of channels.
In the example shown in the drawings, the L0 layer contains the channels CH1 and CH2, and the L1 layer contains the channels CH1, CH2, and CH3. The individual circles in the drawings represent the values calculated for each channel.
The set of values for each channel is referred to as the feature value group.
In the example shown in the drawings, the feature value groups C01 and C02 correspond to the channels CH1 and CH2 of the L0 layer, respectively, and the feature value groups C11, C12, and C13 correspond to the channels CH1, CH2, and CH3 of the L1 layer, respectively.
In order to calculate the sets of feature values in the L1 layer, weights are determined by learning for the connections between the channels in the L1 layer and the channels in the L0 layer.
The connection between channels for which a weight is determined is called an edge. In the example shown in the drawings, an edge is set between each channel of the L0 layer and each channel of the L1 layer, and the weights W11, W21, W12, W22, W13, and W23 are determined for the respective edges.
Each feature value group of the L1 layer is calculated from the feature value groups of the L0 layer and the weights.
The feature value group C11 corresponding to the channel CH1 of the L1 layer is calculated using the feature value group C01, the weight W11, the feature value group C02, and the weight W21.
Similarly, the feature value group C12 corresponding to the channel CH2 of the L1 layer is calculated using the feature value group C01, the weight W12, the feature value group C02, and the weight W22.
Similarly, the feature value group C13 corresponding to the channel CH3 of the L1 layer is calculated using the feature value group C01, the weight W13, the feature value group C02, and the weight W23.
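As an illustration of this fully connected channel structure, the following sketch computes each feature value group of the L1 layer from both feature value groups of the L0 layer. It is a minimal sketch: treating each feature value group as a vector of N values, using one scalar weight per edge, and applying a ReLU activation are all illustrative assumptions not fixed by this description (in a convolutional network, each weight would be a kernel).

```python
import numpy as np

N = 4                              # illustrative size of a feature value group
rng = np.random.default_rng(0)

# Feature value groups of the L0 layer (channels CH1 and CH2).
c01 = rng.standard_normal(N)
c02 = rng.standard_normal(N)

# One illustrative scalar weight per edge (2 x 3 = 6 edges in total).
w11, w21 = 0.5, -0.3   # edges into CH1 of the L1 layer
w12, w22 = 0.8, 0.1    # edges into CH2 of the L1 layer
w13, w23 = -0.2, 0.9   # edges into CH3 of the L1 layer

def relu(x):
    return np.maximum(x, 0.0)

# Every feature value group of the L1 layer uses both L0 groups.
c11 = relu(c01 * w11 + c02 * w21)
c12 = relu(c01 * w12 + c02 * w22)
c13 = relu(c01 * w13 + c02 * w23)
```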
Hereinafter, example embodiments of the present invention are described with reference to the drawings.
In each of the aforementioned L0 and L1 layers, the channels are divided into the same number of groups. This number of groups equals the number of chips included in the operation device of the present invention. That is, in each of the L0 and L1 layers, the channels are divided into as many groups as there are chips. The number of chips is an integer greater than or equal to two. For the sake of simplicity, the case where the number of chips is two is used as an example.
The number of groups of channels in the L0 layer and the number of groups of channels in the L1 layer are the same. In addition, the number of groups of channels in each layer is the same as the number of chips. Therefore, the groups of channels in the L0 layer and the groups of channels in the L1 layer can be mapped one-to-one. In this example, it is assumed that the groups A of the two layers are mapped to each other and the groups B of the two layers are mapped to each other. It is also assumed that one of the two chips is associated with the groups A and the other with the groups B.
When the channels are divided into the same number of groups in each of the L0 and L1 layers, edges are set between the channels belonging to the corresponding groups. In this example, since the groups A correspond to each other, an edge is set between CH1 of the L0 layer and CH1 of the L1 layer, and between CH1 of the L0 layer and CH2 of the L1 layer. Similarly, since the groups B correspond to each other, an edge is set between the channel CH2 of the L0 layer and the channel CH3 of the L1 layer.
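This grouping, the association with the chips, and the resulting intra-group edges can be written down directly. A minimal sketch follows; the chip names are hypothetical placeholders:

```python
# Channel grouping in each layer (two groups for two chips).
l0_groups = {"A": ["CH1"], "B": ["CH2"]}
l1_groups = {"A": ["CH1", "CH2"], "B": ["CH3"]}

# One-to-one association of groups with chips ("chip10"/"chip20"
# are hypothetical names for the two chips).
chip_for_group = {"A": "chip10", "B": "chip20"}

# Edges between channels belonging to corresponding groups.
edges = [("L0.CH1", "L1.CH1"),   # group A to group A (weight W11)
         ("L0.CH1", "L1.CH2"),   # group A to group A (weight W12)
         ("L0.CH2", "L1.CH3")]   # group B to group B (weight W23)
```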
In this example embodiment, there is a restriction on setting edges between channels that belong to non-corresponding groups. One example of the restriction is that no edge is set between channels that belong to non-corresponding groups. Another example is the restriction that edges are set only for some pairs of channels that belong to non-corresponding groups.
The feature value group C11 corresponding to the channel CH1 of the L1 layer is calculated using the feature value group C01 and the weight W11. Similarly, the feature value group C12 corresponding to the channel CH2 of the L1 layer is calculated using the feature value group C01 and the weight W12.
The feature value group C13 corresponding to the channel CH3 of the L1 layer is calculated using the feature value group C02 and the weight W23.
In this case, each chip can calculate the feature value groups of the L1 layer using only the feature value group held in that chip, so no data communication between the chips is required.
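Under this restriction, the computation decomposes per chip, as the following sketch illustrates. Treating each feature value group as a vector, using one scalar weight per edge, and applying a ReLU activation are illustrative assumptions carried over from the earlier sketch:

```python
import numpy as np

N = 4
rng = np.random.default_rng(0)
c01 = rng.standard_normal(N)   # held by the chip corresponding to group A
c02 = rng.standard_normal(N)   # held by the chip corresponding to group B
w11, w12, w23 = 0.5, 0.8, 0.9  # only intra-group edges exist

def relu(x):
    return np.maximum(x, 0.0)

# The group-A chip computes its L1 channels from its own data only.
c11 = relu(c01 * w11)
c12 = relu(c01 * w12)

# The group-B chip computes its L1 channel from its own data only.
c13 = relu(c02 * w23)
# No feature value crossed the chip boundary: zero inter-chip traffic.
```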
Next, an example will be shown in which the restriction that edges are set only for some pairs of channels belonging to non-corresponding groups is adopted.
In the example shown in the drawings, among the pairs of channels belonging to non-corresponding groups, an edge is set only between the channel CH1 of the L0 layer and the channel CH3 of the L1 layer, and the weight W13 is determined for this edge.
In this case, only the feature value group C01 needs to be transmitted from the chip corresponding to the group A to the chip corresponding to the group B, so the amount of data communication between the chips is smaller than when edges are set for all pairs of channels belonging to non-corresponding groups.
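The saving can be counted directly. Assuming each feature value group holds N values, the following sketch compares the inter-chip traffic of full cross-group connectivity with that of the single cross-group edge W13:

```python
N = 4  # illustrative number of values per feature value group

# Full cross-group connectivity: the group-A chip would need C02 and the
# group-B chip would need C01, so both groups cross the chip boundary.
full_cross_traffic = 2 * N

# Restricted connectivity: only the edge with weight W13 crosses groups,
# so only C01 is sent from the group-A chip to the group-B chip.
restricted_traffic = N

print(full_cross_traffic, restricted_traffic)  # 8 4
```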
The edge weights may be determined in the same way for each connection between adjacent layers. This is also the case in the second example embodiment described below.
In the following explanation, the case of calculating the feature value group of the L1 layer from the feature value group of the L0 layer will be used as an example. It is preferable that the calculation method regarding the connection between other layers is the same as the calculation method for calculating the feature value group of the L1 layer from the feature value group of the L0 layer. However, the calculation method regarding the connection between other layers may be different from the calculation method for calculating the feature value group of the L1 layer from the feature value group of the L0 layer. In the present invention, it is sufficient that the calculation method for calculating the feature value group of the L1 layer from the feature value group of the L0 layer is applied between at least one group of adjacent layers in the neural network.
The chip 10 comprises a weight storage unit 11, an operation circuit 12 and a communication circuit 13.
Similarly, the chip 20 comprises a weight storage unit 21, an operation circuit 22 and a communication circuit 23.
The weight storage units 11, 21 are each realized by a memory in the corresponding chip. The operation circuits 12, 22 are each realized by a processor in the corresponding chip. The communication circuits 13, 23 are realized by communication interfaces for inter-chip communication.
The weight storage unit 11 and the weight storage unit 21 store the weights determined for each edge by learning.
Here, the learning of the weights stored in the weight storage units 11, 21 in the respective chips 10, 20 will be explained.
Before learning the weights, the channels in the L0 layer and the channels in the L1 layer are divided into the same number of groups as the number of chips. Further, the groups of channels in the L0 layer and the groups of channels in the L1 layer are associated with the chips without omission and without overlap. The grouping of the channels and the association of the groups of channels in the L0 layer and the groups of channels in the L1 layer to the chips may be performed, for example, by an operator or by the operation device 1 or other devices.
In this example, it is assumed that the channels in the L0 layer are divided into the group A and the group B, and the channels in the L1 layer are also divided into the group A and the group B, as illustrated in the drawings.
In addition, an edge is set between channels that belong to the corresponding groups. In other words, it is determined that an edge is set between the channels that belong to the corresponding groups.
Furthermore, the setting of edges between channels belonging to non-corresponding groups is performed under a certain restriction. In this example, it is assumed that this restriction is the restriction that no edge is set between channels that belong to non-corresponding groups. Therefore, it is determined that no edges are set between channels that belong to non-corresponding groups. The setting of edges between channels belonging to non-corresponding groups may be performed, for example, by the operator or by the operation device 1 or other devices, as in the above case.
After the grouping of the channels, the association of the groups of channels in the L0 layer, the groups of channels in the L1 layer, and the chips, the edges between the channels belonging to the corresponding groups, and the edges between the channels belonging to the non-corresponding groups have been determined, weights are determined by learning for each edge set between the L0 layer and the L1 layer according to these conditions.
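One way to realize learning under these conditions is to fix the connectivity with a binary mask so that nonexistent edges never receive weight updates. The sketch below makes illustrative assumptions not fixed by this description: a linear model, a mean-squared-error objective, and plain gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rows: L0 channels CH1, CH2; columns: L1 channels CH1, CH2, CH3.
# A 1 marks an edge between channels of corresponding groups.
mask = np.array([[1.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
W = rng.standard_normal((2, 3)) * mask

x = rng.standard_normal((100, 2))             # toy L0 values
y = x @ (rng.standard_normal((2, 3)) * mask)  # toy training targets

lr = 0.1
for _ in range(200):
    grad = x.T @ (x @ W - y) / len(x)  # gradient of the squared error
    W -= lr * grad * mask              # masked edges stay exactly zero
```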
The determined weights are then allocated to the weight storage units 11, 21 in respective chips, and the weight storage units 11, 21 store the allocated weights.
The weight storage unit 11 in the chip 10 is allocated the weights determined for the edges between the channels belonging to the corresponding groups associated with the chip 10 (in this example, the groups A), that is, the weights W11 and W12.
The weight storage unit 21 in the chip 20 is allocated the weights determined for the edges between the channels belonging to the corresponding groups associated with the chip 20 (in this example, the groups B), that is, the weight W23.
The entities that perform the process of learning the weights and the process of allocating the weights to the chips are, for example, the operation circuits 12, 22 in the chips 10, 20. In this case, the operation circuits 12, 22 in each chip 10, 20 can be referred to as learning means. Alternatively, a device external to the operation device 1 (for example, a computer) may be the entity that performs the process of learning the weights and the process of allocating the weights to the chips. In this case, the external device is referred to as the learning means.
The operation circuits 12, 22 in each chip 10, 20 calculate a set of values of each layer of the neural network based on a set of values of the previous layer and the weights. An example of the values input to the input layer is the individual pixel values of an image. The operation circuit 12 calculates the feature value group C01 corresponding to the channel CH1 of the L0 layer as a set of values of the L0 layer. The operation circuit 22 calculates the feature value group C02 corresponding to the channel CH2 of the L0 layer as a set of values of the L0 layer.
Then, the operation circuit 12 in the chip 10 calculates the feature value group C11 corresponding to the channel CH1 of the L1 layer using the feature value group C01 and the weight W11. Similarly, the operation circuit 12 calculates the feature value group C12 corresponding to the channel CH2 of the L1 layer using the feature value group C01 and the weight W12. When calculating the feature value groups C11 and C12, the data held by the chip 20 is not used. Therefore, no data communication between the chip 10 and the chip 20 is required when the operation circuit 12 calculates the feature value groups C11 and C12.
The operation circuit 22 in the chip 20 calculates the feature value group C13 corresponding to the channel CH3 of the L1 layer using the feature value group C02 and the weight W23. When calculating the feature value group C13, the data held by the chip 10 is not used. Therefore, data communication between the chip 10 and the chip 20 is not necessary even when the operation circuit 22 calculates the feature value group C13.
The operation circuits 12, 22 sequentially calculate a set of values for each layer after the L1 layer.
In the above example, the restriction on setting edges between channels belonging to non-corresponding groups is that no edges are set between such channels. In the following explanation, the case where the restriction is that edges are set only for some pairs of channels belonging to non-corresponding groups will be explained as an example.
In this example, it is assumed that it is determined to set an edge only on a pair of CH1 of the L0 layer and CH3 of the L1 layer among the pairs of channels belonging to non-corresponding groups. This setting may be made by the operator, for example, or by the operation device 1 or other devices, as in the above case.
The other matters to be determined before learning are the same as in the above case. After each matter is determined, weights are determined by learning for each edge between the L0 layer and the L1 layer according to these conditions. The determined weights are then allocated to the weight storage units 11, 21 of the respective chips, and the weight storage units 11, 21 store the allocated weights. Since the entities that perform the process of learning the weights and the process of allocating the weights to the chips have already been explained, explanations are omitted here.
The weight storage unit 11 in the chip 10 stores the weights W11 and W12 in the same way as described above. The weight storage unit 21 in the chip 20 stores the weight W23 in the same way as described above.
Further, in this example, the weight storage unit 21 in the chip 20 is allocated the weight W13, which is determined for the edge between the channel CH1 of the L0 layer and the channel CH3 of the L1 layer.
As in the above case, the operation circuits 12, 22 of respective chips 10, 20 calculate a set of values of each layer of the neural network based on the set of values of the previous layer and the weights. The operation circuit 12 calculates the feature value group C01 corresponding to the channel CH1 of the L0 layer as the set of values of the L0 layer. The operation circuit 22 calculates the feature value group C02 corresponding to the channel CH2 of the L0 layer as the set of values of the L0 layer.
The operation circuit 12 in the chip 10 calculates the feature value group C11 corresponding to the channel CH1 of the L1 layer and the feature value group C12 corresponding to the channel CH2 of the L1 layer. This process is similar to the process described above, and no data communication between the chip 10 and the chip 20 is required when the operation circuit 12 calculates the feature value groups C11 and C12.
The operation circuit 22 in the chip 20 calculates the feature value group C13 corresponding to the channel CH3 of the L1 layer using the feature value group C01, the weight W13, the feature value group C02, and the weight W23. The channel CH3 of the L1 layer belongs to the group B. When calculating the feature value group C13, the feature value group C01 corresponding to the channel CH1, which belongs to the group A of the L0 layer that does not correspond to the group B of the L1 layer, is used. The chip corresponding to the group A of the L0 layer is the chip 10, and the feature value group C01 is held in the chip 10. Therefore, the operation circuit 22 in the chip 20 obtains the feature value group C01 held in the chip 10. For example, the operation circuit 22 requests the feature value group C01 from the chip 10 through the communication circuit 23. When the operation circuit 12 in the chip 10 receives the request through the communication circuit 13, it transmits the feature value group C01 to the chip 20 through the communication circuit 13. The operation circuit 22 receives the feature value group C01 through the communication circuit 23.
After obtaining the feature value group C01, the operation circuit 22 calculates the feature value group C13 using the feature value group C01, the weight W13, the feature value group C02, and the weight W23. In this way, when calculating the feature value group C13, the feature value group C01 is transmitted and received between the chip 10 and the chip 20. However, the amount of data communication is less than in the case where edges are set for all pairs of channels belonging to non-corresponding groups. Therefore, the amount of data communication between chips can be reduced in this example as well.
The operation circuits 12, 22 sequentially calculate a set of values for each layer after the L1 layer.
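The division of storage, computation, and communication described above can be summarized in a toy model. The Chip class and its send method below are hypothetical stand-ins for the weight storage units, operation circuits, and communication circuits; the scalar weights and ReLU activation remain illustrative:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class Chip:
    """Toy chip: weight storage, held values, inter-chip transfer."""
    def __init__(self, weights):
        self.weights = weights  # stands in for the weight storage unit
        self.values = {}        # feature value groups held on this chip

    def send(self, key, other):  # stands in for the communication circuits
        other.values[key] = self.values[key]

chip10 = Chip({"W11": 0.5, "W12": 0.8})
chip20 = Chip({"W23": 0.9, "W13": -0.2})
chip10.values["C01"] = np.ones(4)
chip20.values["C02"] = np.full(4, 2.0)

# Chip 10 computes group A of the L1 layer without any communication.
chip10.values["C11"] = relu(chip10.values["C01"] * chip10.weights["W11"])
chip10.values["C12"] = relu(chip10.values["C01"] * chip10.weights["W12"])

# C13 needs C01, so chip 10 transmits it; this is the only transfer.
chip10.send("C01", chip20)
chip20.values["C13"] = relu(chip20.values["C01"] * chip20.weights["W13"]
                            + chip20.values["C02"] * chip20.weights["W23"])
```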
The above example shows a case where the weight W13 is allocated to the weight storage unit 21 in the chip 20. However, the weight W13 may instead be allocated to the weight storage unit 11 in the chip 10, and the weight storage unit 11 may store the weight W13. In this case, the operation circuit 12 in the chip 10 may calculate a partial result for the feature value group C13 using the feature value group C01 and the weight W13, and the operation circuit 22 in the chip 20 may obtain that calculation result from the chip 10. Then, the operation circuit 22 may calculate the feature value group C13 using the calculation result, the feature value group C02, and the weight W23.
In the above example, if the absolute value of the weight corresponding to an edge defined for a pair of channels belonging to non-corresponding groups is less than or equal to a predetermined threshold, the edge is considered not to exist, and the weight need not be allocated. For example, if the absolute value of W13 is less than or equal to the threshold, the edge between CH1 of the L0 layer and CH3 of the L1 layer is considered not to exist, and W13 need not be allocated to any chip. In this case, the operation circuit 22 in the chip 20 may calculate the feature value group C13 using only the feature value group C02 and the weight W23. Accordingly, the amount of data communication between chips can be further reduced; in this case, it becomes zero.
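A sketch of this pruning rule, with illustrative weight and threshold values:

```python
# Weights of edges between channels of non-corresponding groups.
cross_weights = {"W13": -0.2}   # illustrative learned value
threshold = 0.25                # illustrative threshold

# Keep only cross-group edges whose weight magnitude exceeds the threshold.
kept = {name: w for name, w in cross_weights.items() if abs(w) > threshold}

print(kept)  # {} -- the edge is treated as nonexistent, so no feature
             # value group needs to cross the chip boundary at all.
```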
First, the weights of the respective edges defined between the L0 layer and the L1 layer are learned (Step S1). Since the matters to be determined before learning have already been explained, they are not explained here. In Step S1, the weights of the respective edges are learned based on the determined matters.
Next, weights corresponding to the chip are allocated to each chip 10, 20 (Step S2). The weight storage units 11, 21 in the chips 10, 20 store the allocated weights.
Then, when the data (for example, an image) that will be the input layer is input, the operation circuits 12, 22 in respective chips 10, 20 calculate a set of values for each layer, sequentially (Step S3). The process of calculating the feature value group of the L1 layer from the feature value group of the L0 layer has already been explained, so explanations are omitted here.
According to this example embodiment, the setting of edges between channels belonging to non-corresponding groups is performed under a predetermined restriction. Then, the weights of the respective edges are learned so as to satisfy the edge settings so defined. The chip 10 is associated with the groups A of the L0 and L1 layers, and the chip 20 is associated with the groups B of the L0 and L1 layers. Then, the weights corresponding to each chip are allocated to that chip.
Therefore, the amount of data communication between the chip 10 and the chip 20 can be reduced when each chip 10, 20 calculates the feature value group of the L1 layer using the feature value group of the L0 layer. Further, since the amount of data communication between the chips 10, 20 can be reduced, it is also possible to achieve higher speed in the calculation of the neural network.
In the second example embodiment of the present invention, the channels are divided into the same number of groups in each of the L0 and L1 layers. This number of groups is the number of chips included in the operation device of the present invention. That is, in each of the L0 and L1 layers, the channels are divided into as many groups as there are chips. Furthermore, the groups of channels in the L0 layer and the groups of channels in the L1 layer are associated with the chips. This point is the same as in the first example embodiment. For the sake of simplicity, the case where the number of chips is two is used as an example in this example embodiment as well. The configuration of the operation device of the second example embodiment is the same as that of the first example embodiment.
In the second example embodiment, an edge is set between each channel in the L1 layer and each channel in the L0 layer. In this state, the weight of each edge is determined by learning. In other words, weights of the respective edges are determined by learning under the condition that, in each of the L0 and L1 layers, the channels are divided into the same number of groups as the number of chips, the groups of channels in the L0 layer, the groups of channels in the L1 layer, and the chips are associated, and an edge is set between each channel in the L1 layer and each channel in the L0 layer.
The learning means (which may be the operation circuits 12, 22 in each chip 10, 20, or a device external to the operation device, for example, a computer) learns the weight of each edge.
There is no particular condition for learning the weights of the edges (shown by solid lines in the drawings) set between the channels belonging to the corresponding groups. On the other hand, the weights of the edges set between the channels belonging to the non-corresponding groups are learned so as to become 0 or as close to 0 as possible.
Hereinafter, the weight determined for an edge set between channels belonging to the corresponding groups is referred to as the first weight; in the example shown in the drawings, the weights W11, W12, and W23 correspond to the first weights. The weight that is determined for an edge set between channels belonging to non-corresponding groups and that is equal to or more than a predetermined threshold is referred to as the second weight.
The learning means removes an edge set between channels belonging to the non-corresponding groups when the weight determined for the edge is less than a predetermined threshold. In this example, for the sake of simplicity, it is assumed that the weights W21 and W22 are less than the threshold and that the two edges for which the weights W21 and W22 were determined have been removed. Depending on the learning result, the weights W21 and W22 may not be less than the threshold, but the operation of the second example embodiment remains the same. If the weights W21, W22, and W13 were all less than the threshold, all edges set between channels belonging to non-corresponding groups would be removed.
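One common way to drive selected weights toward zero during learning is an L1 penalty applied only to the cross-group edges, followed by thresholding. Non-patent literatures 4 and 5 describe techniques of this general kind, but the specific penalty and training loop below are illustrative assumptions, not the method prescribed by this description:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rows: L0 channels CH1, CH2; columns: L1 channels CH1, CH2, CH3.
# A 1 marks an edge between channels of NON-corresponding groups.
cross = np.array([[0.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])
W = rng.standard_normal((2, 3))

x = rng.standard_normal((100, 2))
y = x @ (rng.standard_normal((2, 3)) * (1.0 - cross))  # toy targets

lr, lam = 0.1, 0.5
for _ in range(500):
    grad = x.T @ (x @ W - y) / len(x)
    grad += lam * np.sign(W) * cross  # L1 penalty on cross-group edges only
    W -= lr * grad

# Remove cross-group edges whose learned weight stayed below the threshold.
threshold = 0.1
W[(np.abs(W) < threshold) & (cross == 1.0)] = 0.0
```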
The learning means stores each first weight in the weight storage unit in the chip corresponding to the group to which the channels connected by the edge belong. For example, since the groups A to which the channels connected by the edge for which the weight W11 is determined belong correspond to the chip 10, the learning means stores the weight W11 in the weight storage unit 11 in the chip 10. Similarly, the learning means stores the weight W12 in the weight storage unit 11 in the chip 10, and the weight W23 in the weight storage unit 21 in the chip 20.
The learning means stores the second weight in the weight storage unit in the chip corresponding to the group to which the L1-layer channel among the channels connected by the edge belongs. In this example, the weight W13 is equal to or more than the threshold and corresponds to the second weight. The weight W13 is determined for the edge connecting the channel CH1 of the L0 layer and the channel CH3 of the L1 layer. Since the channel CH3 of the L1 layer belongs to the group B corresponding to the chip 20, the learning means stores the weight W13 in the weight storage unit 21 in the chip 20.
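The allocation rule for both kinds of weights can thus be stated compactly: a first weight goes to the chip of the group shared by its endpoints, and a second weight goes to the chip associated with the group of its L1-side channel. A sketch with hypothetical chip names:

```python
# First weights: both endpoints share a group, which decides the chip.
first_weights = {"W11": "A", "W12": "A", "W23": "B"}   # weight -> group

# Second weights: the group of the L1-side channel decides the chip
# (W13 connects L0.CH1 in group A to L1.CH3 in group B).
second_weights = {"W13": "B"}

chip_for_group = {"A": "chip10", "B": "chip20"}
allocation = {w: chip_for_group[g]
              for w, g in {**first_weights, **second_weights}.items()}
print(allocation)
# {'W11': 'chip10', 'W12': 'chip10', 'W23': 'chip20', 'W13': 'chip20'}
```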
Next, the operation in which the operation device 1 calculates the feature value groups of the L1 layer from the feature value groups of the L0 layer is explained.
The operation device 1 executes an operation to calculate the feature value groups C11 and C12 on the chip 10 corresponding to group A, and an operation to calculate the feature value group C13 on the chip 20 corresponding to group B.
The operation circuits 12, 22 in the respective chips 10, 20 calculate the set of values of each layer of the neural network based on the set of values of the previous layer and the weights. An example of the values input to the input layer is the individual pixel values of an image. The operation circuit 12 calculates the feature value group C01 corresponding to the channel CH1 of the L0 layer as a set of values of the L0 layer. The operation circuit 22 calculates the feature value group C02 corresponding to the channel CH2 of the L0 layer as a set of values of the L0 layer.
Then, the operation circuit 12 in the chip 10 calculates the feature value group C11 corresponding to the channel CH1 of the L1 layer using the feature value group C01 and the weight W11, and the feature value group C12 corresponding to the channel CH2 of the L1 layer using the feature value group C01 and the weight W12. No data communication between the chip 10 and the chip 20 is required for these calculations.
The operation circuit 22 in the chip 20 calculates the feature value group C13 corresponding to the channel CH3 of the L1 layer using the feature value group C01, the weight W13, the feature value group C02, and the weight W23. Here, the feature value group C01 is held in the chip 10. Therefore, the operation circuit 22 in the chip 20 obtains the feature value group C01 held by the operation circuit 12 in the chip 10. For example, the operation circuit 22 requests the feature value group C01 from the chip 10 through the communication circuit 23. When the operation circuit 12 in the chip 10 receives the request through the communication circuit 13, it transmits the feature value group C01 to the chip 20 through the communication circuit 13. The operation circuit 22 receives the feature value group C01 through the communication circuit 23. After obtaining the feature value group C01, the operation circuit 22 calculates the feature value group C13 using the feature value group C01, the weight W13, the feature value group C02, and the weight W23.
Thus, when calculating the feature value group C13, the feature value group C01 is transmitted and received between the chip 10 and the chip 20. However, in this example embodiment, the weights of the edges set between the channels belonging to the non-corresponding groups are learned so as to be 0 or as close to 0 as possible, and those edges whose determined weights are less than the threshold are removed. Therefore, in the second example embodiment, the operation of the neural network can be executed while the amount of data communication between chips is reduced.
The operation circuits 12, 22 sequentially calculate a set of values for each layer after the L1 layer.
Next, an overview of the present invention will be explained.
Each chip 70 comprises weight storage means 71 (for example, the weight storage units 11, 21) for storing weights for each edge determined by learning under the condition that channels in a first layer (for example, the L1 layer) that is a layer in a neural network and channels in a 0th layer (for example, the L0 layer) that is a previous layer to the first layer are each divided into groups whose number is equal to the number of the chips, the groups of the channels in the first layer, the groups of the channels in the 0th layer, and the chips are associated with one another, an edge is set between the channels belonging to corresponding groups, and an edge is set between the channels belonging to non-corresponding groups under a restriction.
The weight storage means 71 in each chip 70 stores the weights determined for the edges between the channels belonging to the corresponding groups associated with the chip including the weight storage means.
In addition, each chip comprises operation means 72 (for example, the operation circuits 12, 22) for calculating a set of values for the channel that belongs to the group in the first layer corresponding to the group in the 0th layer, based on the weights stored in the weight storage means in the chip and a set of values for the channel that belongs to the group in the 0th layer corresponding to the chip.
With such a configuration, the amount of data communication between chips can be reduced while the neural network operations are performed on multiple chips.
The weight storage means in each chip may store the weight for each edge determined under the condition that edges between channels belonging to non-corresponding groups are set only for some pairs of such channels, and, when calculating the set of values for the channel that belongs to the group corresponding to the chip in the first layer, if there is a channel that belongs to a non-corresponding group and for which an edge to a channel belonging to the corresponding group is set, the operation means in each chip may obtain the set of values for that channel from the other chip corresponding to that group and calculate the set of values for the channel that belongs to the group in the first layer using the obtained set of values.
The weight storage means in each chip may store the weight for each edge determined under the condition that the edge is not set between the channels that belong to non-corresponding groups.
The operation means in each chip may determine the weight by learning.
While the present invention has been described with reference to the example embodiments, the present invention is not limited to the aforementioned example embodiments.
Various changes understandable to those skilled in the art within the scope of the present invention can be made to the structures and details of the present invention.
The present invention is suitably applied to an operation device that performs neural network operations.
Filing Document | Filing Date | Country | Kind |
---|---|---|---
PCT/JP2019/018429 | 5/8/2019 | WO | 00 |