The present invention relates to a grouped convolution processing definition changing device, a grouped convolution processing definition changing method, and a grouped convolution processing definition changing program.
A convolutional neural network (CNN) is a feed-forward neural network with a structure in which convolutional layers and pooling layers alternate. Hereafter, convolutional neural networks are also referred to simply as CNNs.
C1 and C2 shown in
Note that an image is an example of input data; data input to the CNN may be data other than images.
P1 and P2 shown in
F shown in
The following is a specific explanation of convolution computation in a CNN.
The input image shown in
For simplicity of explanation, consider, as the input X that is the object of the convolution computation, an image with a vertical size of 1, a horizontal size of 1, and a number of channels Cin, marked with the grid pattern shown in
In other words, in the example of the convolution computation shown in
In the convolution computation shown in
The convolution computation shown in
The CNNs shown in
CNNs that use grouped convolution as a method for the above convolution computation are increasingly common. For example, a description of grouped convolution is provided in Non Patent Literature (NPL) 1.
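The computation of grouped convolution described above can be illustrated with the following sketch. The sizes Cin, Cout, and G are hypothetical example values, and, as in the explanation above, a 1-pixel input reduces each convolution to a matrix product.

```python
import numpy as np

# Illustrative sketch of grouped convolution on a 1x1-pixel input,
# so each group's convolution reduces to a matrix-vector product.
# Cin, Cout, and G are hypothetical example values.
Cin, Cout, G = 8, 8, 4
rng = np.random.default_rng(0)

x = rng.standard_normal(Cin)  # input X with Cin channels
# One (Cout/G) x (Cin/G) weight matrix per group.
weights = [rng.standard_normal((Cout // G, Cin // G)) for _ in range(G)]

# Divide the input into G pieces in the channel direction, execute a
# convolution computation for each piece with its own weight, then combine.
groups = np.split(x, G)
y = np.concatenate([W @ g for W, g in zip(weights, groups)])
print(y.shape)  # (Cout,) -- same output channel count as normal convolution
```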
The computation of grouped convolution has advantages over the computation of ordinary convolution, such as a smaller number of arithmetic operations and higher accuracy. However, when AI (Artificial Intelligence) chips, which are semiconductor integrated circuits specialized for AI processing, execute the computation of grouped convolution, the computation speed may decrease.
Since the use of both grouped convolution and AI chips is expected to increase, there is a need for a method that does not slow down the computation speed of grouped convolution even when AI chips execute the computation of grouped convolution. NPL 1 does not describe a method that does not slow down the computation speed of grouped convolution.
Therefore, it is an object of the present invention to provide a grouped convolution processing definition changing device, a grouped convolution processing definition changing method, and a grouped convolution processing definition changing program that can increase the computation speed of grouped convolution.
A grouped convolution processing definition changing device according to the present invention includes a changing means which changes, for a learned convolutional neural network in which grouped convolution is defined in which input data consisting of first to N-th channels (N is an integer greater than or equal to 2) arranged in order is divided into G (G is an integer greater than or equal to 2) pieces in a channel direction, and convolution computations are executed for data consisting of {(i−1)×N/G+1} to (i×N/G) channels (i=1 to G) divided using an i-th weight matrix over i=1 to i=G respectively, the number into which the input data is divided from G to g (where g is a divisor of G excluding G), and a generation means which generates a new j-th weight matrix to be used in the convolution computation of data consisting of {(j−1)×N/g+1} to (j×N/g) channels (j=1 to g) divided after the divided number is changed, over j=1 to j=g respectively, wherein the generation means generates the new j-th weight matrix by placing the {(j−1)×G/g+1} to (j×G/g) weight matrices from the upper left to the lower right of the new j-th weight matrix in the order of the {(j−1)×G/g+1} to (j×G/g) weight matrices on the diagonal line, and setting the values of all components except for the components at the locations where the weight matrices are placed to 0.
A grouped convolution processing definition changing method according to the present invention includes changing, for a learned convolutional neural network in which grouped convolution is defined in which input data consisting of first to N-th channels (N is an integer greater than or equal to 2) arranged in order is divided into G (G is an integer greater than or equal to 2) pieces in a channel direction, and convolution computations are executed for data consisting of {(i−1)×N/G+1} to (i×N/G) channels (i=1 to G) divided using an i-th weight matrix over i=1 to i=G respectively, the number into which the input data is divided from G to g (where g is a divisor of G excluding G), generating a new j-th weight matrix to be used in the convolution computation of data consisting of {(j−1)×N/g+1} to (j×N/g) channels (j=1 to g) divided after the divided number is changed, over j=1 to j=g respectively, and generating the new j-th weight matrix by placing the {(j−1)×G/g+1} to (j×G/g) weight matrices from the upper left to the lower right of the new j-th weight matrix in the order of the {(j−1)×G/g+1} to (j×G/g) weight matrices on the diagonal line, and setting the values of all components except for the components at the locations where the weight matrices are placed to 0.
A grouped convolution processing definition changing program according to the present invention causes a computer to execute a changing process of changing, for a learned convolutional neural network in which grouped convolution is defined in which input data consisting of first to N-th channels (N is an integer greater than or equal to 2) arranged in order is divided into G (G is an integer greater than or equal to 2) pieces in a channel direction, and convolution computations are executed for data consisting of {(i−1)×N/G+1} to (i×N/G) channels (i=1 to G) divided using an i-th weight matrix over i=1 to i=G respectively, the number into which the input data is divided from G to g (where g is a divisor of G excluding G), and a generation process of generating a new j-th weight matrix to be used in the convolution computation of data consisting of {(j−1)×N/g+1} to (j×N/g) channels (j=1 to g) divided after the divided number is changed, over j=1 to j=g respectively, wherein the grouped convolution processing definition changing program causes the computer, in the generation process, to generate the new j-th weight matrix by placing the {(j−1)×G/g+1} to (j×G/g) weight matrices from the upper left to the lower right of the new j-th weight matrix in the order of the {(j−1)×G/g+1} to (j×G/g) weight matrices on the diagonal line, and setting the values of all components except for the components at the locations where the weight matrices are placed to 0.
According to this invention, it is possible to increase the computation speed of grouped convolution.
First, the computation of grouped convolution in the CNN is explained in detail.
The example of the computation of grouped convolution shown in
In the example of the computation of grouped convolution shown in
Note that input X, which is an image, is an example of input data; data input to the CNN may be data other than images.
Thus, as shown in
As shown in
Finally, the AI chip places the obtained images in the same positions as the quad-divided input X used in the computation. After placing each of the four obtained images, the AI chip combines each image.
By combining, the AI chip obtains an output Y that is an image with the number of channels Cout, which is equivalent to the result of the computation of the normal convolution. Note that the output Y obtained by the computation of grouped convolution shown in
The amount of the convolution computation is proportional to the size of the weights. For example, the amount of the convolution computation shown in
In general, if the input X is divided into G (G is an integer greater than or equal to 2) groups and the computation of grouped convolution is executed, the amount of the computation of grouped convolution shown in
According to the above theory, the computational speed of grouped convolution is expected to be G times faster than that of normal convolution, since the amount of computation is 1/G. However, many AI chips are not suitable for grouped convolution because they are optimized for normal convolution.
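The 1/G relation described above can be confirmed with a short calculation. The sizes used below are hypothetical example values for a k×k convolution.

```python
# Parameter (and hence multiply) counts: normal convolution uses one
# Cout x Cin x k x k weight, while grouped convolution uses G weights
# of size (Cout/G) x (Cin/G) x k x k. Sizes are hypothetical examples.
Cin, Cout, k, G = 64, 64, 3, 4
normal = Cout * Cin * k * k
grouped = G * (Cout // G) * (Cin // G) * k * k
print(grouped / normal)  # 0.25, i.e. 1/G of the normal computation
```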
For AI chips that are not suitable for grouped convolution, the computational process of grouped convolution may be implemented, for example, as a computational process of multiple convolutions. An AI chip on which such a process is implemented would then be affected by the overhead of calling the convolution computation G times when executing the computation of grouped convolution. If an AI chip is affected by this overhead G times, the computational speed of grouped convolution would decrease.
In addition, in the computation of grouped convolution, the number of channels of images to be computed at one time is small (e.g., (Cin/4) shown in
For example, if an image with 8 channels is input to an AI chip that can process up to 256 processes in parallel, the AI chip can only process a maximum of 8 processes in parallel. In other words, the smaller parallelism of processing is one of the factors that reduce the computational speed of grouped convolution.
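The parallelism limit in the example above amounts to a simple ratio; the sketch below restates it with the figures given in the text (8 channels against a chip capable of 256 parallel processes).

```python
# Hypothetical illustration of the parallelism limit described above:
# a chip that can process up to 256 processes in parallel but receives
# only 8 channels at a time can use only 8 of those parallel lanes.
max_parallel = 256
channels_per_group = 8
utilization = channels_per_group / max_parallel
print(utilization)  # 0.03125 -- most of the chip's parallelism is idle
```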
The following describes, with reference to the drawings, an example embodiment of the present invention that can increase the computation speed of grouped convolution and thereby address the issue discussed above.
The grouped convolution processing definition changing device 100 shown in
The pre-change CNN model storage unit 200 stores the learned CNN models described above, including the weights Wa to Wd shown in
The post-change CNN model storage unit 300 stores the learned CNN models stored in the pre-change CNN model storage unit 200 whose definitions have been changed by the grouped convolution processing definition changing device 100.
The AI chip 400 is communicatively connected to the post-change CNN model storage unit 300. The AI chip 400 is a chip that executes the convolution computation using the learned CNN model stored in the post-change CNN model storage unit 300.
As shown in
The acquisition unit 110 acquires the learned CNN model from the pre-change CNN model storage unit 200, including the weights Wa to Wd shown in
In order to solve the above-mentioned issues, the grouped convolution processing definition changing device 100 of this example embodiment is characterized by re-combining each group in the computation of grouped convolution. The following describes how the grouped convolution processing definition changing device 100 solves the issues.
In the example of the convolution computation shown in
In the convolution computation shown in
As shown in
As shown in
The storage unit 140 of the grouped convolution processing definition changing device 100 stores the learned CNN models whose definitions have been changed by the definition changing unit 120 in the post-change CNN model storage unit 300, including the weight W1 generated by the weight changing unit 130.
Therefore, in the convolution computation shown in
In the convolution computation shown in
In the computation of grouped convolution shown in
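The replacement described above, in which a single weight W1 built from the weights Wa to Wd reproduces the 4-group grouped convolution in one ordinary convolution, can be sketched as follows. The sizes and values are hypothetical, and the 1-pixel input reduces each convolution to a matrix product.

```python
import numpy as np

# Sketch of the g = 1 case: the four group weights Wa to Wd are placed
# from the upper left to the lower right of a single weight W1 on the
# diagonal line, and one ordinary convolution with W1 reproduces the
# 4-group grouped convolution. Sizes are hypothetical examples.
Cin, Cout, G = 8, 8, 4
rng = np.random.default_rng(1)
x = rng.standard_normal(Cin)
Wa, Wb, Wc, Wd = (rng.standard_normal((Cout // G, Cin // G)) for _ in range(G))

# Grouped convolution: G separate computations, then combine.
y_grouped = np.concatenate(
    [W @ p for W, p in zip((Wa, Wb, Wc, Wd), np.split(x, G))])

# W1: Wa to Wd on the diagonal, all other components set to 0.
W1 = np.zeros((Cout, Cin))
for i, W in enumerate((Wa, Wb, Wc, Wd)):
    W1[i * Cout // G:(i + 1) * Cout // G,
       i * Cin // G:(i + 1) * Cin // G] = W

# One ordinary convolution with W1 gives the same output Y.
assert np.allclose(W1 @ x, y_grouped)
```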
The definition changing unit 120 of this example embodiment changes the number of groups defined in the learned CNN model to a divisor of that number of groups. In the example shown in
As shown in
The weight changing unit 130 places the weights Wc to Wd shown in
As shown in
The storage unit 140 of the grouped convolution processing definition changing device 100 stores the learned CNN models whose definitions have been changed by the definition changing unit 120 in the post-change CNN model storage unit 300, including the weights W2 to W3 generated by the weight changing unit 130.
Thus, in the convolution computation shown in
The AI chip 400 executes the above computation for the weight W3 as well. In other words, the AI chip 400 divides the input X into two pieces in the channel direction, and executes the convolution computation for the image consisting of the {(j−1)×Cin/2+1} to the (j×Cin/2) channels (j=1 to 2) divided using the new j-th weight matrix over j=1 to j=2, respectively. The new first weight matrix to the new second weight matrix corresponds to the weights W2 to W3, respectively.
As a result of each computation, the AI chip 400 obtains two images with the number of channels (Cout/2). The AI chip 400 then places the obtained images in the same positions as the bi-divided input X used in the computation.
After placing each of the two obtained images, the AI chip 400 combines each image. By combining, the AI chip 400 obtains the output Y.
In other words, the weight changing unit 130 places the weights Wa and Wb from the upper left to the lower right of the weight W2 corresponding to the new first weight matrix, in the order of the weights Wa to Wb on the diagonal line. Next, the weight changing unit 130 generates the weight W2 by setting the values of all components except for the components at the locations where the weights are placed to 0.
In the grouped convolution shown in
The weight changing unit 130 places the weights Wc and Wd from the upper left to the lower right of the weight W3 corresponding to the new second weight matrix in the order of the weights Wc to Wd on the diagonal line. Next, the weight changing unit 130 generates the weight W3 by setting the values of all components except for the components at the locations where the weights are placed to 0.
In the grouped convolution shown in
In the computation of grouped convolution shown in
In addition, in the computation of grouped convolution shown in
As described above, the grouped convolution processing definition changing device 100 of this example embodiment handles a learned convolutional neural network in which grouped convolution is defined in which input data consisting of first to N-th channels (N is an integer greater than or equal to 2) arranged in order is divided into G (G is an integer greater than or equal to 2) pieces in the channel direction. In the grouped convolution, the convolution computations are executed for the data consisting of the {(i−1)×N/G+1} to (i×N/G) channels (i=1 to G) divided using the i-th weight matrix over i=1 to i=G, respectively.
The definition changing unit 120 changes the number into which the input data is divided from G to g (where g is a divisor of G excluding G). When the number is changed from G to g, the weight changing unit 130 generates the new j-th weight matrix to be used in the convolution computation of the data consisting of {(j−1)×N/g+1} to (j×N/g) channels (j=1 to g) divided after the divided number is changed, over j=1 to j=g, respectively.
Specifically, the weight changing unit 130 generates a new j-th weight matrix by placing the {(j−1)×G/g+1} to (j×G/g) weight matrices from the upper left to the lower right of the new j-th weight matrix in the order of the {(j−1)×G/g+1} to (j×G/g) weight matrices on the diagonal line, and setting the values of all the components except for the components at the locations where the weight matrices are placed to 0.
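The generation described above can be sketched for an arbitrary change from G to g groups. The function name and all sizes below are illustrative assumptions, and the equivalence check again uses a 1-pixel input so that each convolution reduces to a matrix product.

```python
import numpy as np

# Sketch of the weight generation described above: for a change from
# G groups to g groups (g a divisor of G), the new j-th weight matrix
# is built by placing G/g of the original weight matrices on its
# diagonal, in order, with all other components set to 0.
# The function name and sizes are hypothetical.
def generate_new_weights(weights, g):
    """weights: list of G matrices, each of shape (Cout/G, Cin/G)."""
    G = len(weights)
    r = G // g                      # original weights per new weight
    co, ci = weights[0].shape
    new_weights = []
    for j in range(g):
        Wj = np.zeros((r * co, r * ci))
        # Place the (j*r+1)-th to ((j+1)*r)-th weights on the diagonal.
        for k, W in enumerate(weights[j * r:(j + 1) * r]):
            Wj[k * co:(k + 1) * co, k * ci:(k + 1) * ci] = W
        new_weights.append(Wj)
    return new_weights

# Equivalence check on a 1x1-pixel input (convolution = matmul):
Cin, Cout, G, g = 16, 16, 4, 2
rng = np.random.default_rng(2)
x = rng.standard_normal(Cin)
ws = [rng.standard_normal((Cout // G, Cin // G)) for _ in range(G)]
y_G = np.concatenate([W @ p for W, p in zip(ws, np.split(x, G))])
new = generate_new_weights(ws, g)
y_g = np.concatenate([W @ p for W, p in zip(new, np.split(x, g))])
assert np.allclose(y_G, y_g)  # same output Y before and after the change
```

Note that each new weight is G/g times larger in each direction than the originals, which is the computation increase traded for fewer convolution calls and higher parallelism.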
The optimal value of g depends on the AI chip 400. Therefore, it is preferable that the grouped convolution processing definition changing device 100 generate new weight matrices for all the divisors of G except G, that the performance of the AI chip 400 be measured with each set of weight matrices, and that the optimal value of g be determined based on the results of each measurement.
The definition changing unit 120 may also change the divided number of the input data from G to 1. The definition changing unit 120 may also determine the optimal value of g for each of the multiple convolutional layers that compose the CNN.
An operation on the grouped convolution processing definition changing device 100 of this exemplary embodiment is described below with reference to
First, the acquisition unit 110 of the grouped convolution processing definition changing device 100 acquires the learned CNN model from the pre-change CNN model storage unit 200 (step S101).
Next, the definition changing unit 120 changes the number of groups G defined in the acquired learned CNN model to a new number of groups g (step S102). Note that g is the divisor of G excluding G.
Next, the weight changing unit 130 generates new weights from the weights used in the acquired learned CNN model based on the changed g (step S103).
The method of generating weights by the weight changing unit 130 is as described above. For example, if the number of groups G is changed to the number of groups g, the weight changing unit 130 generates g weights in total. The vertical and horizontal sizes of the weights generated by the weight changing unit 130 are G/g times the vertical and horizontal sizes of the weights acquired.
Next, the storage unit 140 stores the learned CNN model whose definition has been changed by the definition changing unit 120 in the post-change CNN model storage unit 300, including the weights generated by the weight changing unit 130 (step S104). After storing, the grouped convolution processing definition changing device 100 terminates the grouped convolution processing definition changing processing.
The grouped convolution processing definition changing device 100 of this example embodiment has the definition changing unit 120 and the weight changing unit 130 that replace the grouped convolution processing with a processing suitable for the AI chip 400, thereby speeding up the computation of grouped convolution.
Specifically, the grouped convolution processing is replaced by a processing with a high degree of parallelism, which increases the amount of computation but is affected by overhead less often and at which the AI chip 400 excels. Therefore, when the grouped convolution processing definition changing device 100 of this example embodiment is used, the computation of grouped convolution is accelerated despite the increase in the amount of computation.
A specific example of a hardware configuration of the grouped convolution processing definition changing device 100 according to this example embodiment will be described below.
The grouped convolution processing definition changing device 100 shown in
The grouped convolution processing definition changing device 100 is realized by software, with the CPU 11 shown in
Specifically, each function is realized by software as the CPU 11 loads the program stored in the auxiliary storage unit 14 into the main storage unit 12 and executes it to control the operation of the grouped convolution processing definition changing device 100.
The grouped convolution processing definition changing device 100 shown in
The main storage unit 12 is used as a work area for data and a temporary save area for data. The main storage unit 12 is, for example, RAM (Random Access Memory).
The communication unit 13 has a function of inputting and outputting data to and from peripheral devices through a wired network or a wireless network (information communication network).
The auxiliary storage unit 14 is a non-transitory tangible medium. Examples of non-transitory tangible media include a magnetic disk, a magneto-optical disk, a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), and a semiconductor memory.
The input unit 15 has a function of inputting data and processing instructions. The input unit 15 is, for example, an input device such as a keyboard or a mouse.
The output unit 16 has a function of outputting data. The output unit 16 is, for example, a display device such as a liquid crystal display device, or a printing device such as a printer.
As shown in
The auxiliary storage unit 14 stores programs for realizing the acquisition unit 110, the definition changing unit 120, the weight changing unit 130, and the storage unit 140 in the grouped convolution processing definition changing device 100 of this example embodiment.
The grouped convolution processing definition changing device 100 may be implemented with a circuit that contains hardware components inside such as an LSI (Large Scale Integration) that realize the functions shown in
The grouped convolution processing definition changing device 100 may be realized by hardware that does not include computer functions using elements such as a CPU. For example, some or all of the components may be realized by a general-purpose circuit (circuitry) or a dedicated circuit, a processor, or a combination of these. They may be configured by a single chip (for example, the LSI described above) or by multiple chips connected via a bus. Some or all of the components may be realized by a combination of the above-mentioned circuit, etc. and a program.
In the case where some or all of the components are realized by a plurality of information processing devices, circuits, or the like, the plurality of information processing devices, circuits, or the like may be centrally located or distributed. For example, the information processing devices, circuits, etc. may be realized as a client-server system, a cloud computing system, etc., each of which is connected via a communication network.
Next, an overview of the present invention will be described.
With such a configuration, the grouped convolution processing definition changing device can increase the computation speed of grouped convolution.
The changing means 21 may change the number into which the input data is divided from G to 1.
With such a configuration, the grouped convolution processing definition changing device can minimize the impact of overhead.
The changing means 21 may change the number into which the input data is divided for each of the multiple convolutional layers that compose the learned convolutional neural network.
With such a configuration, the grouped convolution processing definition changing device can change the definition of grouped convolution for each convolutional layer.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/018168 | 4/28/2020 | WO |