The present disclosure relates to a convolution operation device and, in particular, to a convolution operation device and method that can scale the convolution input values.
Deep learning is an important technology for developing artificial intelligence (AI). In recent years, the convolutional neural network (CNN) has been developed and applied to identification tasks in the deep learning field. Compared with other deep learning architectures, especially in pattern classification fields such as image and voice identification, the convolutional neural network can directly process the original images or data without complex preprocessing. Thus, it has become more popular and provides better identification results.
However, the convolution operation usually consumes considerable computing resources. In convolutional neural network applications, especially for convolution operations on values containing fractional parts, truncation errors or ceiling errors may occur after the calculations of multiple convolution layers. Therefore, it is desirable to provide a convolution operation device that can reduce the truncation error or ceiling error.
In view of the foregoing, an objective of the present disclosure is to provide a convolution operation device and method that can reduce the truncation error or ceiling error.
A convolution operation device includes a convolution operation module, a memory, a scale control module, and a scaling unit. The convolution operation module outputs a plurality of convolution operation results containing fractional parts. The memory is coupled to the convolution operation module for receiving and storing the convolution operation results containing the fractional parts, and outputting a plurality of convolution operation input values containing fractional parts. The scale control module is coupled to the convolution operation module and generates a scaling signal according to a total scale of the convolution operation results containing the fractional parts. The scaling unit is coupled to the memory, the scale control module, and the convolution operation module. The scaling unit adjusts a scale of the convolution operation input values containing the fractional parts according to the scaling signal, and outputs the adjusted convolution operation input values containing the fractional parts to the convolution operation module.
In one embodiment, the convolution operation results containing the fractional parts are operation results of an (N−1)th layer of a convolution neural network, and the convolution operation input values containing the fractional parts are operation inputs of an Nth layer of the convolution neural network. Herein, N is a natural number greater than 1.
In one embodiment, the convolution operation results containing the fractional parts of a final layer of the convolution neural network stored in the memory are directly outputted without processing a reverse scaling.
In one embodiment, the convolution operation results containing the fractional parts of a final layer of the convolution neural network stored in the memory are processed with a reverse scaling and then outputted.
In one embodiment, the scale control module includes a detector and an estimator. The detector is coupled to the convolution operation module for detecting the total scale of the convolution operation results containing the fractional parts. The estimator is coupled to the detector for receiving at least a convolution operation coefficient and estimating a possible convolution operation scale according to the total scale of the convolution operation results containing the fractional parts and the convolution operation coefficient so as to generate the scaling signal according to the possible convolution operation scale.
In one embodiment, when the possible convolution operation scale is relatively small, the scaling signal controls the scaling unit to scale up the convolution operation input values containing the fractional parts.
In one embodiment, when the possible convolution operation scale is relatively large, the scaling signal controls the scaling unit to scale down the convolution operation input values containing the fractional parts.
In one embodiment, the detector includes a counting unit, a first integration unit, an averaging unit, a squaring unit, a second integration unit and a variation unit. The counting unit accumulates amounts of the convolution operation results containing the fractional parts for outputting a total amount. The first integration unit accumulates values of the convolution operation results containing the fractional parts for outputting a total value. The averaging unit is coupled to the counting unit and the first integration unit and divides the total value by the total amount to generate an average value. The squaring unit squares the values of the convolution operation results containing the fractional parts for outputting a plurality of squared values. The second integration unit is coupled to the squaring unit and accumulates the squared values to generate a total squared value. The variation unit is coupled to the counting unit and the second integration unit and divides the total squared value by the total amount to generate a variation value. The average value and the variation value represent the total scale of the convolution operation results containing the fractional parts.
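The statistics gathered by the detector above can be sketched in a few lines. The following is a minimal illustration, not the claimed hardware: the function name and the pure-software loop are hypothetical stand-ins for the counting, integration, averaging, squaring and variation units described in the embodiment.

```python
def detect_total_scale(results):
    """Accumulate the count, sum, and sum of squares of the convolution
    operation results, then derive the average value and variation value
    that together represent the total scale."""
    total_amount = 0          # counting unit
    total_value = 0.0         # first integration unit
    total_squared = 0.0       # second integration unit (fed by the squaring unit)
    for r in results:
        total_amount += 1
        total_value += r
        total_squared += r * r
    average = total_value / total_amount        # averaging unit
    variation = total_squared / total_amount    # variation unit
    return average, variation

avg, var = detect_total_scale([0.5, -0.25, 0.75, 0.25])
```

Note that the "variation value" here is the mean of the squared values, exactly as the embodiment defines it (total squared value divided by total amount), not the variance in the statistical sense.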
In one embodiment, the estimator estimates the possible convolution operation scale according to a Gaussian distribution.
In one embodiment, the convolution operation device is a chip, and the memory is a cache or a register inside the chip.
A scaling method of convolution inputs of a convolution neural network includes: outputting a plurality of convolution operation results containing fractional parts from a convolution operation module; generating a scaling signal according to a total scale of the convolution operation results containing the fractional parts; outputting a plurality of convolution operation input values containing fractional parts from a memory; adjusting a scale of the convolution operation input values containing the fractional parts according to the scaling signal; and outputting the adjusted convolution operation input values containing the fractional parts to the convolution operation module.
In one embodiment, the convolution operation results containing the fractional parts are operation results of an (N−1)th layer of a convolution neural network, and the convolution operation input values containing the fractional parts are operation inputs of an Nth layer of the convolution neural network. Herein, N is a natural number greater than 1.
In one embodiment, the convolution operation results containing the fractional parts of a final layer of the convolution neural network stored in the memory are directly outputted without processing a reverse scaling.
In one embodiment, the convolution operation results containing the fractional parts of a final layer of the convolution neural network stored in the memory are processed with a reverse scaling and then outputted.
In one embodiment, the step of generating the scaling signal includes: detecting the total scale of the convolution operation results containing the fractional parts; estimating a possible convolution operation scale according to the total scale of the convolution operation results containing the fractional parts and a convolution operation coefficient; and generating the scaling signal according to the possible convolution operation scale.
In one embodiment, when the possible convolution operation scale is relatively small, the scaling signal controls the scaling unit to scale up the convolution operation input values containing the fractional parts.
In one embodiment, when the possible convolution operation scale is relatively large, the scaling signal controls the scaling unit to scale down the convolution operation input values containing the fractional parts.
In one embodiment, the step of detecting the total scale includes: accumulating amounts of the convolution operation results containing the fractional parts for outputting a total amount; accumulating values of the convolution operation results containing the fractional parts for outputting a total value; dividing the total value by the total amount to generate an average value; squaring the values of the convolution operation results containing the fractional parts for outputting a plurality of squared values; accumulating the squared values to generate a total squared value; and dividing the total squared value by the total amount to generate a variation value. The average value and the variation value represent the total scale of the convolution operation results containing the fractional parts.
In one embodiment, the estimating step estimates the possible convolution operation scale according to a Gaussian distribution.
As mentioned above, the convolution operation device and the scaling method of the convolution inputs of the convolution neural network of this disclosure can adjust the convolution operation input values containing fractional parts according to the total scale of the convolution operation results containing fractional parts. Accordingly, during the convolution operation, the numerical data are not always kept in a fixed point format. In this disclosure, the possible range of the subsequent or next convolution operation results is estimated, followed by dynamically scaling the convolution operation input values up or down and adjusting the position of the decimal point of the convolution operation input values. This configuration can prevent the truncation error or ceiling error in the convolution operation.
The disclosure will become more fully understood from the detailed description and accompanying drawings, which are given for illustration only, and thus are not limitative of the present disclosure, and wherein:
The present disclosure will be apparent from the following detailed description, which proceeds with reference to the accompanying drawings, wherein the same references relate to the same elements.
The memory 4 stores the convolution operation input values MO (for the following convolution operations) and the convolution operation results CO. The convolution operation result CO can be an intermediate result or a final result. The input values or results can be, for example, image data, video data, audio data, statistics data, or the data of any layer of the convolutional neural network. The image data may contain pixel data. The video data may contain pixel data, motion vectors of the frames of the video, or the audio data of the video. The data of any layer of the convolutional neural network are usually 2D array data. The image data are usually 2D array pixel data. In addition, the memory 4 may include multiple layers of storage structures for individually storing the data to be processed and the processed data. In other words, the memory 4 can function as a cache of the convolution operation device.
The convolution operation input values MO for the following convolution operations can be stored in other places, such as another memory or an external memory outside the convolution operation device. For example, the external memory or the other memory can optionally be a DRAM (dynamic random access memory) or another kind of memory. When the convolution operation device performs the convolution operation, these data can be totally or partially loaded into the memory 4 from the external memory or the other memory, and then the convolution operation module 3 can access these data from the memory 4 for performing the following convolution operation.
The convolution operation module 3 includes one or more convolution units. Each convolution unit executes a convolution operation based on a filter and a plurality of current convolution operation input values CI for generating convolution operation results CO. The generated convolution operation results CO can be outputted to and stored in the memory 4. One convolution unit can execute an m×m convolution operation. In more detail, the convolution operation input values CI include m values, and the filter F includes m filter coefficients. Each convolution operation input value CI is multiplied by one corresponding filter coefficient, and the products are added together to obtain the convolution operation result of the convolution unit.
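The multiply-and-accumulate performed by one convolution unit can be sketched as follows. This is a minimal software illustration of the described operation; the function name and the example filter are hypothetical.

```python
def convolve(inputs, coefficients):
    """One convolution unit: multiply each input value CI by its
    corresponding filter coefficient and add the products to obtain
    a single convolution operation result CO."""
    assert len(inputs) == len(coefficients)
    return sum(ci * f for ci, f in zip(inputs, coefficients))

# A 3x3 window flattened to m = 9 values; this example filter
# simply picks out the centre value of the window.
result = convolve([1, 2, 3, 4, 5, 6, 7, 8, 9],
                  [0, 0, 0, 0, 1, 0, 0, 0, 0])
```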
In the application of convolution neural network, the convolution operation results CO are stored in the memory 4. Accordingly, when the convolution operation module 3 performs the convolution operation for next convolution layer, the data can be rapidly retrieved from the memory 4 as the inputs of the convolution operation. The filter F includes a plurality of filter coefficients, and the convolution operation module 3 can directly retrieve the filter coefficients from external memory by direct memory access (DMA).
In general, each of the convolution operation input values, filter coefficients and the convolution operation results is a numeric containing a fractional part. As shown in
Each of the convolution operation input values, the filter coefficients and the convolution operation results includes an integer part and a fractional part, and the widths of these data are the same. Thus, the multiplication in the convolution operation can easily generate a truncation error or a ceiling error. In order to prevent these errors, the numerical data are not always kept in a fixed point format during the convolution operation. In this disclosure, the data format of the convolution operation input values is dynamically adjusted (e.g. by scaling up or down). Accordingly, the width of the convolution operation input values is kept the same, but the position of the decimal point of the convolution operation input values is shifted right or left. In other words, in each convolution operation input value, the bits of the integer part and the fractional part can be dynamically adjusted, thereby reducing the computation error while still keeping the same bit width of the convolution operation results.
The total scale of the convolution operation results CO containing the fractional parts can be represented by the average value and standard deviation thereof. For example, if the convolution operation results CO containing the fractional parts include m values, the average value and standard deviation of these m values are obtained to represent the total scale. Assuming the m values are modelled as a Gaussian distribution, the average value and standard deviation can represent the distribution status of these m values. The estimator can estimate the possible convolution operation scale based on the Gaussian distribution. Since the convolution operation results of the previous layer are the inputs of the current layer, the range of the convolution operation results of the current layer can be estimated based on the already-known convolution operation results of the previous layer. Accordingly, it is possible to make the effective bit width of the convolution operation results of the current layer the same as, or close to, the width of the filter coefficients or the width of the convolution operation input values. For example, when the width of the filter coefficients or the width of the convolution operation input values is 16 bits, the effective bit width of the convolution operation results of the current layer can be or approach 16 bits.
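One possible way to turn the detected statistics into a scaling decision is sketched below. This is an illustrative assumption, not the claimed estimator: the 3-sigma bound, the power-of-two shift, and the function name are all hypothetical choices consistent with the Gaussian model described above.

```python
import math

def estimate_shift(average, variation, frac_bits=8, total_bits=16):
    """Estimate a power-of-two scaling for the next layer's inputs.
    The magnitude of the results is bounded with a 3-sigma rule from
    the previous layer's statistics (variation is the mean of squares,
    so std**2 = variation - average**2); the returned shift moves the
    decimal point so the estimated range fills the integer bits."""
    std = math.sqrt(max(variation - average * average, 0.0))
    bound = abs(average) + 3.0 * std       # estimated magnitude of results
    int_bits = total_bits - frac_bits
    # integer bits actually needed for the bound, plus one sign bit
    needed = max(1, math.ceil(math.log2(bound + 1))) + 1
    return int_bits - needed  # positive: scale up; negative: scale down

shift = estimate_shift(0.0, 1.0)  # zero-mean, unit-variance results
```

A positive return value means the results occupy fewer integer bits than available, so the inputs can be scaled up to gain fractional precision; a negative value means they must be scaled down to avoid ceiling errors.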
The convolution operation device of
In general, the CNN adopts a fixed point format to represent the filter coefficients and the intermediate results. In other words, the inputs and outputs of all operation layers adopt the same fixed point format. The fixed point format includes an integer part and a fractional part. The integer part has j bits, and the fractional part has k bits. For example, 16-bit fixed point data usually have an 8-bit integer part and an 8-bit fractional part, and the leftmost bit of the integer part may be a sign bit.
However, the bit width of the convolution operation results is greater than the bit width of the filter coefficients or the input values. In order to keep the bit widths of the convolution operation result and the filter coefficient or input value the same, a part of the convolution operation result must be truncated. Taking 16-bit data as an example, the convolution operation result is generally a 32-bit output, including a 16-bit integer part and a 16-bit fractional part. To keep the total width at 16 bits, the 16-bit fractional part needs to be truncated to 8 bits, which results in a truncation error, and the 16-bit integer part needs to be ceiled to 8 bits, which results in a ceiling error. In this embodiment, the scale of the convolution input values containing fractional parts can be dynamically adjusted, thereby reducing the above computation errors.
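The truncation error described above can be demonstrated with a small fixed-point sketch. The representation below (a raw integer holding a 16-bit fraction, i.e. a Q16.16 value) and the function name are illustrative assumptions, not the claimed circuit.

```python
def truncate_fraction(raw_q16_16, frac_bits_out=8):
    """Drop the low fractional bits of a Q16.16 fixed-point value so the
    fraction fits in frac_bits_out bits; the dropped bits are the
    truncation error described above."""
    return raw_q16_16 >> (16 - frac_bits_out)

# 1 + 2**-8 + 2**-16 in Q16.16: the 2**-16 contribution survives in
# 16 fractional bits but is lost when the fraction is cut to 8 bits.
raw = (1 << 16) + (1 << 8) + 1
kept = truncate_fraction(raw)   # now a value with an 8-bit fraction
```

Here `kept / 2**8` no longer equals `raw / 2**16`: the 2**-16 term has been truncated away, which is exactly the error the dynamic scaling in this embodiment aims to reduce.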
Since the convolution neural network usually includes more than one layer, the dynamic scaling procedure between different layers can further minimize the truncation error and the ceiling error after the operations of multiple layers.
In addition, the convolution operation results containing the fractional parts of a final layer of the convolution neural network stored in the memory 4 can be processed with a reverse scaling and then outputted. For example, the convolution operation results are outputted to a controller or a device outside the convolution operation device. In one embodiment, the scaling process can cause a shift of the decimal point in the convolution output results. The reverse scaling step eliminates the accumulated shift of the decimal point after multiple layers of convolution operations, so that the decimal point of the convolution operation results containing the fractional parts of the final layer can be shifted to a proper position in the current scale. For example, if the accumulated shift after multiple layers of convolution operations is 6 bits to the right, the value of the convolution operation result containing the fractional part of the final layer should be shifted 6 bits to the left. This operation is suitable for applications that focus on the values of the convolution operation results.
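One possible reading of the reverse scaling step is sketched below, under the assumption that an n-bit rightward drift of the decimal point is undone by shifting the value n bits to the left (multiplying by 2**n); the function name and sign convention are hypothetical.

```python
def reverse_scale(value, accumulated_right_shift):
    """Undo the accumulated rightward shift of the decimal point on a
    final-layer result by shifting the value the same number of bits
    leftwardly, i.e. multiplying by 2**accumulated_right_shift."""
    return value * (1 << accumulated_right_shift)

restored = reverse_scale(5, 6)   # undo a 6-bit rightward drift
```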
Alternatively, the convolution operation results containing the fractional parts of a final layer of the convolution neural network stored in the memory 4 can be directly outputted without processing a reverse scaling. For example, the convolution operation results are outputted to a controller or a device outside the convolution operation device. In this embodiment, regardless of the size of the accumulated shift of the decimal point, the step of shifting the decimal point back is not needed. This operation is suitable for applications that focus on the ratios of the convolution operation results rather than their absolute values.
As shown in
In this embodiment, the convolution operation results Rst containing the fractional parts of a final layer of the convolution neural network stored in the memory 4 can be outputted to a controller 5. In this case, the convolution operation results Rst containing the fractional parts are directly outputted without processing a reverse scaling. Alternatively, the convolution operation results Rst containing the fractional parts can be processed with a reverse scaling and then outputted. For example, the estimator 12 generates a scaling result according to the scaling signal S of each layer, and outputs the scaling result SR to the controller 5. Then, the controller 5 reads the convolution operation results Rst containing the fractional parts and determines whether to perform the reverse scaling or not. The scaling result SR can be a sum of all the scaling signals S. Alternatively, the estimator 12 may generate one scaling result SR upon generating each scaling signal S. The scaling results SR can convey information about the scaling size of each layer to the controller 5. Besides, the controller 5 can output a control signal SC to request the estimator 12 to generate the scaling signal S and scaling result SR in either one of the above modes.
In the above embodiments, the convolution operation device can be a chip, and the memory can be a cache or register inside the chip. The memory can be an SRAM (static random-access memory). The scale control module 1, the scaling unit 2 and the convolution operation module 3 can be logic circuits inside the chip.
In summary, the convolution operation device and the scaling method of the convolution inputs of the convolution neural network of this disclosure can adjust the convolution operation input values containing fractional parts according to the total scale of the convolution operation results containing fractional parts. Accordingly, during the convolution operation, the numerical data are not always kept in a fixed point format. In this disclosure, the possible range of the subsequent or next convolution operation results is estimated, followed by dynamically scaling the convolution operation input values up or down and adjusting the position of the decimal point of the convolution operation input values. This configuration can prevent the truncation error or ceiling error in the convolution operation.
Although the disclosure has been described with reference to specific embodiments, this description is not meant to be construed in a limiting sense. Various modifications of the disclosed embodiments, as well as alternative embodiments, will be apparent to persons skilled in the art. It is, therefore, contemplated that the appended claims will cover all modifications that fall within the true scope of the disclosure.