The present invention relates to compression of a neural network.
In recent years, there has been an accelerating movement toward realizing high-level automatic driving with highly accurate object recognition and behavior prediction by implementing a deep neural network (DNN) on an in-vehicle electronic control unit (ECU).
As illustrated in
Examples of a DNN for automatic driving include point cloud artificial intelligence (AI), which uses a light detection and ranging (LiDAR) point cloud to realize object recognition, segmentation, and the like.
PTL 1 discloses a method for reducing (compressing) computation of a neural network represented by a DNN.
In order to apply point cloud AI to automatic driving, it is necessary to improve its performance by using high-resolution LiDAR. Implementing point cloud AI for automatic driving therefore requires an enormous amount of computation compared with the computing performance of an in-vehicle processor. For example, computation of at least 65 giga operations (GOPs) is required to realize object recognition in automatic driving using PointPillars, which is a type of point cloud AI. Meanwhile, the computing performance of a processor that can be mounted on an in-vehicle ECU is approximately several tens of tera operations per second (TOPS), which is insufficient for executing an entire automatic driving system including PointPillars in real time.
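As a back-of-envelope illustration of this gap, the per-inference operation count can be converted into a sustained throughput requirement. The 10 Hz frame rate below is an assumed typical LiDAR rate, not a figure from this description:

```python
# Back-of-envelope conversion of a per-inference operation count into a
# sustained throughput requirement. The 10 Hz frame rate is an assumption.
GOP = 1e9   # giga operations
TOP = 1e12  # tera operations

ops_per_inference = 65 * GOP      # >= 65 GOPs per PointPillars inference
frames_per_second = 10            # assumed LiDAR frame rate

required_tops = ops_per_inference * frames_per_second / TOP
print(required_tops)              # 0.65 TOPS for PointPillars alone
# The full automatic driving stack runs many such workloads concurrently,
# and practical processor utilization is well below the peak TOPS rating,
# which is why several tens of TOPS is still insufficient in real time.
```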
Therefore, there is a need for a computation reduction (compression) algorithm capable of both reducing the computation of the point cloud AI for automatic driving and maintaining high inference accuracy.
In the technique described in PTL 1, in order to reduce the computation of a neural network represented by a DNN, compression of the DNN is realized by a multi-iteration compression method.
For a training data set 301 and a precompression DNN model 302, for example, an initial compression condition is determined by selecting from preset compression conditions (S303). It is assumed that the precompression DNN model 302 has been trained with the training data set 301.
Next, the precompression DNN model 302 is compressed using a compression position in the DNN and a compression rate determined by the initial compression condition (S304).
Next, the compressed DNN model is retrained with the training data set 301 (S305), and it is determined whether to end the compression by evaluating an error in the output of the DNN model before and after compression (S306). When the compression is completed, the compressed DNN model is recorded as a compressed DNN model 307.
When the compression is not completed, optimal conditions of the compression position and the compression rate are searched for (S308), and the compression is performed again (S304).
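The whole-model iteration of S303 to S308 can be sketched as follows. The list-of-weights model, the magnitude-pruning compression, and the maximum-absolute-change error measure are toy stand-ins for illustration, not the actual technique of PTL 1:

```python
# Toy sketch of the multi-iteration compression loop (S303-S308).
# A "model" is a list of weights, "compression" zeroes small weights,
# and the error is the largest absolute change in any weight.

def compress(model, rate):
    return [w if abs(w) >= rate else 0.0 for w in model]  # S304

def retrain(model):
    return model  # S305: retraining omitted in this toy example

def output_error(before, after):
    return max(abs(b - a) for b, a in zip(before, after))  # S306 metric

def multi_iteration_compress(model, conditions, threshold):
    compressed = model
    for rate in conditions:                    # S303 / S308: condition search
        compressed = retrain(compress(model, rate))
        if output_error(model, compressed) < threshold:  # S306: end decision
            break                              # S307: record compressed model
    return compressed

weights = [0.5, 0.01, 0.3, 0.02]
result = multi_iteration_compress(weights, conditions=[0.4, 0.1, 0.05],
                                  threshold=0.05)
print(result)  # [0.5, 0.0, 0.3, 0.0]
```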
However, in a DNN with a complicated structure such as point cloud AI, the optimal compression algorithm differs for each part (subgraph) constituting the DNN. Therefore, to realize high-accuracy compression that achieves both computation reduction and maintenance of inference accuracy, it is necessary to select and apply an optimal compression algorithm in subgraph units.
As such, the above configuration has a problem in that the compression algorithm to be applied cannot be optimized in subgraph units of the neural network.
A preferred aspect of the present invention is an information processing device that selects an algorithm for compressing a neural network. The information processing device includes a subgraph dividing section which divides the neural network into subgraphs and an optimizing section which outputs a compression configuration in which one compression technique selected from a plurality thereof is associated with each of the subgraphs.
Another preferred aspect of the present invention is a neural network compression method including a first step of dividing a neural network into subgraphs and a second step of performing tentative compression by associating one compression technique selected from a plurality thereof with each of the subgraphs.
Optimization of the compression algorithm to be applied can be realized in subgraph units of the neural network.
The following describes embodiments with reference to the accompanying drawings. However, the present invention is not to be construed as being limited to the description of the embodiments indicated in the following. Those skilled in the relevant art should easily be able to understand that the specific configuration can be changed within a scope not departing from the spirit or gist of the present invention.
In the configurations of the embodiments described below, the same reference numerals are commonly used for the same parts or parts having similar functions in different drawings, and redundant description may be omitted.
In a case where there is a plurality of elements having the same or similar functions, the same reference numerals may be appended with different subscripts for description. However, in a case where it is not necessary to distinguish between a plurality of elements, description may be omitted.
Notations such as “first”, “second”, and “third” in the present specification and the like are appended to identify constituent elements, and do not necessarily limit the number, order, or contents thereof. In addition, a number for identifying a constituent element is used for each context, and a number used in one context does not necessarily indicate the same configuration in another context. Furthermore, a constituent element identified by a certain number is not prevented from also functioning as a constituent element identified by another number.
Aspects such as position, size, shape, and range of the respective components illustrated in the drawings and the like may not represent actual aspects such as position, size, shape, and range thereof in order to facilitate understanding of the invention. Therefore, the present invention is not necessarily limited to aspects such as the position, size, shape, and range disclosed in the drawings and the like.
The publications, patents, and patent applications cited in the present specification constitute a part of the description of the present specification as-is.
Constituent elements expressed in the singular in the present specification are intended to include the plural unless the context clearly dictates otherwise.
In the compression algorithm optimization (S409), a precompression DNN model 402 is first divided into a plurality of subgraphs, and an optimal compression algorithm is searched for with high efficiency in subgraph units.
For a training data set 401 and the precompression DNN model 402, for example, an initial compression condition is determined for each subgraph by selecting from preset compression conditions (S403). It is assumed that the precompression DNN model 402 has been trained with the training data set 401.
Next, the precompression DNN model 402 is compressed in each subgraph using a compression position in the DNN and a compression rate determined by the initial compression condition (S404).
Next, the compressed DNN model is retrained with the training data set 401 (S405), and it is determined whether to end the compression by performing such actions as evaluating an error in the output of the DNN model before and after compression (S406). When the compression is completed, the compressed DNN model is recorded as a compressed DNN model 407.
Here, since detection accuracy decreases due to compression, retraining is performed to recover the detection accuracy. When retraining is not performed, there is a possibility that the detection rate of the compressed AI is greatly reduced. In retraining, basically, the same data set as that used for training of the precompression DNN model may be used, but the data set can be flexibly changed according to an application scene or the like.
When compression is not ended, optimal conditions of the compression position and the compression rate are searched for (S408), and compression is performed again (S404).
Next, the compression algorithm is applied to each of the divided subgraphs, and the absolute value of the error in the calculation result of each subgraph before and after compression is output as the magnitude of a perturbation. The compression algorithm is selected from, for example, a plurality of previously prepared algorithms, and different compression algorithms can be selected for each subgraph.
The perturbation calculation results are merged and comprehensively evaluated. Next, using an arbitrary optimization algorithm, a combination of compression algorithms to be applied to each subgraph is changed so that the magnitude of the evaluated perturbation becomes small. At this time, in a case where the magnitude of the perturbation does not converge to less than a certain threshold even though the combination of compression algorithms is changed an arbitrary number of times, the subgraph division position of the precompression DNN model 402 is changed, and combination optimization of the compression algorithms is performed again.
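One minimal way to realize this combination search is an exhaustive evaluation over per-subgraph algorithm choices. The subgraph names, algorithm names, and perturbation values below are a hypothetical lookup table; in the actual flow the perturbations would be measured from calculation results before and after compression:

```python
from itertools import product

# Exhaustive search for the per-subgraph algorithm combination that
# minimizes the merged perturbation. PERTURBATION is a hypothetical table.
PERTURBATION = {
    ("subgraph1", "voxel_point_reduction"): 0.1,
    ("subgraph1", "pruning"): 0.5,
    ("subgraph2", "voxel_point_reduction"): 0.4,
    ("subgraph2", "pruning"): 0.2,
}

def optimize_combination(subgraphs, algorithms):
    best_combo, best_total = None, float("inf")
    for combo in product(algorithms, repeat=len(subgraphs)):
        # Merge per-subgraph perturbations into one overall evaluation
        total = sum(PERTURBATION[(sg, alg)] for sg, alg in zip(subgraphs, combo))
        if total < best_total:
            best_combo, best_total = combo, total
    return best_combo, best_total

combo, total = optimize_combination(["subgraph1", "subgraph2"],
                                    ["voxel_point_reduction", "pruning"])
print(combo)  # ('voxel_point_reduction', 'pruning')
```

If no combination brings the merged perturbation below the threshold after a set number of trials, the division position itself is changed and the search is rerun, as described above.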
In the present embodiment, a compression technique 1a (reduction in the number of point clouds per voxel) is applied to a subgraph 1 of the DNN model of the precompression point cloud AI 601, and a compression technique 2a (pruning) is applied to a subgraph 2. As a result, an optimally compressed point cloud AI 602 is obtained.
Note that an arrow 603 indicates a limit value of the detection rate and the processing time obtained by the compression technique 1a for the subgraph 1, and an arrow 604 indicates a limit value of the detection rate and the processing time obtained by the compression technique 2a for the subgraph 2. An oblique dotted line 605 indicates these boundaries. As described above, it can be understood that there is a compression technique suitable for each subgraph.
The optimally compressed point cloud AI 602 of the present embodiment can reduce inference time by 50% while suppressing the decrease in detection rate due to the compression of PointPillars. Note that the present embodiment can be applied not only to point cloud AI but also to other AI such as an image processing convolutional neural network (CNN).
Next, by inputting the compression configuration (B003), the data set (B004), and n subgraphs to a compressing section (B007), a compressed DNN (B005) is generated and stored in the memory. Finally, the compressed DNN (B005) and the LiDAR point cloud are input to an inferring section (B008) to obtain an inference result. According to the present embodiment, it is possible to dynamically adjust the DNN compression technique according to a change in a traveling scene or the like.
The system configuration of
In
The inferring section (B008) is a device for executing processing of AI, and is commercially available as a module capable of implementing a neural network. Known examples of the module include Versal (registered trademark), Xavier (registered trademark), and NCS2 (trade name).
The above-described server configuration may be configured as a single device, or may be configured by connecting any part of the input device, the output device, the processing device, or the storage device to another computer via a network.
In the present embodiment, a function equivalent to a function configured by software can also be realized by hardware such as a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC).
Hereinafter, the configuration and the processing flow of the compression algorithm optimizing section (B006) in
The flow of
S000: The precompression DNN, a subgraph perturbation output from a perturbation calculating section 2 (C006), a maximum number of subgraphs (the upper limit of the number of subgraph divisions), and a graph division signal output from an optimizing section (C007) are input to a subgraph dividing section (C003), and the precompression DNN is divided into n subgraphs. The maximum number of subgraphs may be preset or may be specified by the user each time.
S001: The n subgraphs, the compression table (B002), and a compression algorithm for each subgraph output by the optimizing section (C007) are input to a tentative compressing section (C004), and the compression algorithm corresponding to each subgraph is applied to generate tentatively compressed subgraphs. As illustrated in
S002: The precompression DNN, the data set, and the tentatively compressed subgraphs are input to a perturbation calculating section 1 (C005) and the perturbation calculating section 2 (C006), and perturbations of calculation results before and after compression are generated as an NN perturbation and a subgraph perturbation, respectively. Here, the subgraph perturbation is recorded as a log file (C008) in the memory.
S003: By inputting the NN perturbation to the optimizing section (C007), the combination of compression algorithms is corrected so that the value of the NN perturbation becomes small. In the search for the combination, for example, combinations of algorithms marked as highly compatible in the compression table (B002) are comprehensively tried, and the loop processing of the compression algorithm optimization is performed.
S004: In a case where the NN perturbation is equal to or larger than an arbitrary threshold value for a predetermined number k of consecutive trials of the optimizing section (C007), the graph division signal is enabled and the processing returns to S000. Otherwise, the processing proceeds to S005.
S005: The end of the optimization is determined according to a predetermined end condition. The end condition includes items such as the NN perturbation satisfying a predetermined condition, or a limit time or a limit number of processing iterations being exceeded. When the optimization of the compression algorithms is to be continued, the processing returns to S001. Otherwise, the processing is completed.
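The control flow of S000 to S005 can be condensed into the following sketch. Every callable here is a hypothetical stand-in for the sections C003 to C007, and modeling the perturbation as decreasing with the number of subgraphs is an assumption made only so the toy loop terminates:

```python
# Condensed sketch of the S000-S005 loop. In reality the NN perturbation
# comes from comparing inference results before and after compression.

def divide(dnn, n):                 # stand-in for the subgraph dividing section (C003)
    return n                        # represent the division only by its count

def tentative_compress(subgraphs, combo):   # stand-in for C004
    return (subgraphs, combo)

def nn_perturbation(compressed):    # stand-in for C005
    n, _ = compressed
    return 0.5 / n                  # assumed: finer division -> smaller perturbation

def optimize(dnn, max_subgraphs, k, threshold, max_trials=100):
    n, combo, consecutive_bad = 1, 0, 0
    subgraphs = divide(dnn, n)                              # S000
    compressed = None
    for _ in range(max_trials):
        compressed = tentative_compress(subgraphs, combo)   # S001
        if nn_perturbation(compressed) < threshold:         # S002 + S005
            return compressed
        combo += 1                                          # S003: next combination
        consecutive_bad += 1
        if consecutive_bad >= k and n < max_subgraphs:      # S004: division signal
            n, consecutive_bad = n + 1, 0
            subgraphs = divide(dnn, n)                      # back to S000
    return compressed

result = optimize(dnn="precompression_dnn", max_subgraphs=8, k=2, threshold=0.2)
print(result[0])  # 3 subgraphs suffice in this toy model
```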
Next, the subgraph dividing section (C003) will be described.
The subgraph dividing section (C003) is normally in a standby state and is activated when the graph division signal has been enabled (S004). Since the subgraph perturbation and the graph division signal are not supplied immediately after the start of the compression processing of the AI, the AI is not divided (i.e., the number of subgraphs n = 1), and the same compression technique is applied to the entire AI (S001). The optimal compression technique to be applied is selected by the tentative compressing section (C004) with reference to the compression table (B002). The perturbation calculating section 1 (C005) calculates a perturbation value of the inference result caused by compression for the entire AI (S002). At this time, since the subgraph is identical to the precompression DNN, the perturbation calculating section 2 (C006) outputs the same perturbation value as the perturbation calculating section 1 (C005). The combination of compression algorithms is corrected (S003), and when the end condition is satisfied, it is not necessary to divide the AI, so the compression processing ends (S005). In a case where the NN perturbation is equal to or greater than the threshold value for a predetermined number of consecutive times even when the combination of compression algorithms is corrected, the graph division signal is enabled, and the processing proceeds to S012 (S000) (S004).
S012: The precompression DNN (graph 1101 in
S013: The graph stored in the division target node is divided into m subgraphs, which are stored in m child nodes under the division target node. In the example of
S014: The tentative compressing section (C004) applies a compression technique for each subgraph after the division (S001), and the perturbation calculating section 1 (C005) calculates a perturbation value of an inference result caused by compression for the entire AI (S002). The compression algorithms are corrected (S003). In a case where the NN perturbation is greater than the threshold value due to compression, the graph division signal is enabled and the processing proceeds to S015 (S004). In a case where the NN perturbation is smaller than the threshold value, it is determined that the effect of the graph division has been obtained, and the processing ends (S005).
S015: For each of the subgraphs stored in the m child nodes, the perturbation calculating section 2 (C006) acquires a value of the subgraph perturbation (S002). In the example of
S016: Among the m child nodes, a child node with the largest absolute value of the subgraph perturbation is set as a new division target node. In the example of
S017: When the number of leaves of the generated m-branch tree is equal to or less than the maximum number of subgraphs, the processing returns to S013. Otherwise, the processing is ended. In the example of
S013: The graph of the division target node (the subgraph 1102 in this example) is divided into m = 2, and the divided subgraphs 1104 and 1105 are stored in the m children of the division target node. In this state, the subgraphs stored in the 2m − 1 leaf nodes of the m-branch tree (m − 1 leaf nodes of depth 1 plus m leaf nodes of depth 2) are the divided subgraphs. In this example, the leaf node of depth 1 is the subgraph 1103, and the leaf nodes of depth 2 are the subgraphs 1104 and 1105. The number of these leaf nodes (3 in the example of
S014: The tentative compressing section (C004) applies a compression technique to each of the 2m − 1 subgraphs, and the perturbation calculating section 1 (C005) calculates a perturbation value of the inference result caused by compression for the entire AI. The compression algorithms are corrected (S003). In a case where the value of the NN perturbation due to compression is greater than the threshold value, the graph division signal is enabled and the division is repeated (S004). The next division target is the graph having the largest subgraph perturbation among the subgraphs 1103, 1104, and 1105.
The end condition of the loop is a case where a division number n of the subgraphs (the number of leaf nodes of the m-branch tree) exceeds a maximum value (S017).
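The division loop of S012 to S017 amounts to repeatedly splitting the leaf with the largest subgraph perturbation in an m-branch tree. In the sketch below, a subgraph is modeled as an interval of layer indices and its perturbation as the interval width; both are hypothetical simplifications of the actual sections:

```python
# Sketch of the m-branch-tree division (S012-S017). A subgraph is an
# interval of layer indices; its perturbation is modeled as the interval
# width, so the "worst" subgraph is simply the widest one.

def split(interval, m):
    a, b = interval
    step = (b - a) / m
    return [(a + i * step, a + (i + 1) * step) for i in range(m)]  # S013

def perturbation(interval):            # stand-in for C006 (S015)
    a, b = interval
    return b - a

def divide_by_tree(root, m, max_subgraphs):
    leaves = [root]                    # S012: root node holds the whole DNN
    while len(leaves) + (m - 1) <= max_subgraphs:   # S017: leaf-count limit
        target = max(leaves, key=perturbation)      # S016: worst leaf
        leaves.remove(target)
        leaves.extend(split(target, m))             # S013: split into m children
    return leaves

leaves = divide_by_tree((0.0, 8.0), m=2, max_subgraphs=3)
print(sorted(leaves))  # [(0.0, 2.0), (2.0, 4.0), (4.0, 8.0)]
```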
Next, the perturbation calculating section 1 (C005) will be described. The perturbation calculating section 1 (C005) calculates the NN perturbation of the entire network model. The NN perturbation is used to search for an optimal combination of compression techniques.
Next, the perturbation calculating section 2 (C006) will be described. The perturbation calculating section 2 (C006) calculates a subgraph perturbation for each subgraph. The subgraph perturbation is used for determining necessity of subgraph division of the network.
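As one concrete reading of these two sections, both measures can be computed from outputs before and after compression. The mean of elementwise absolute errors used below is an assumption; the description specifies only an absolute error of calculation results:

```python
# Sketch of the two perturbation measures. Merging elementwise absolute
# errors by their mean is one possible choice, not mandated by the text.

def nn_perturbation(y_before, y_after):
    """C005: perturbation of the final inference result (entire network)."""
    diffs = [abs(b - a) for b, a in zip(y_before, y_after)]
    return sum(diffs) / len(diffs)

def subgraph_perturbations(acts_before, acts_after):
    """C006: one perturbation value per subgraph's intermediate output."""
    return [nn_perturbation(b, a) for b, a in zip(acts_before, acts_after)]

print(nn_perturbation([1.0, 2.0], [1.5, 2.0]))             # 0.25
print(subgraph_perturbations([[1.0, 2.0]], [[1.0, 3.0]]))  # [0.5]
```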
Priority application: 2020-188356, Nov 2020, JP (national).
Filing document: PCT/JP2021/034898, filed 9/22/2021 (WO).