This application claims the priority benefit of Taiwan application serial no. 108121308, filed on Jun. 19, 2019. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The disclosure relates to a neural network technique, and more particularly, to a hardware architecture and a processing method thereof for an activation function in a neural network.
The neural network is an important subject in artificial intelligence (AI) and makes decisions by simulating the operation of human brain cells. It is noted that there are many neurons in human brain cells, and these neurons are connected to each other through synapses. Each neuron may receive signals through the synapses and transmit the transformed signals to other neurons. The transformation capability of each neuron is different, and human beings form the capability of thinking and making decisions through the aforementioned signal transmission and transformation. The neural network achieves the corresponding capability based on the above operation.
In view of the above, the disclosure provides a hardware architecture and a processing method thereof for an activation function in a neural network, in which a piecewise linear function is used to approximate the activation function to simplify the calculation, the sizes of the input ranges are limited, and the bias of each piece of the linear function is adjusted, to achieve a better balance between accuracy and complexity.
An embodiment of the disclosure provides a hardware architecture for an activation function in a neural network. The hardware architecture includes a storage device, a parameter determining circuit, and a multiplier-accumulator, but the disclosure is not limited thereto. The storage device is configured to record a look-up table. The look-up table is a corresponding relation among multiple input ranges and multiple linear functions, the look-up table stores slopes and biases of the linear functions, a difference between an initial value and an end value of each of the input ranges is an exponentiation of base-2, and the linear functions form a piecewise linear function to approximate the activation function for the neural network. The parameter determining circuit is coupled to the storage device and uses at least one bit value in an input value of the activation function as an index to query the look-up table, to determine the corresponding linear function. The index is an initial value of one of the input ranges. The multiplier-accumulator is coupled to the parameter determining circuit and calculates an output value of the determined linear function by feeding a part of the bit values of the input value into the determined linear function.
On the other hand, an embodiment of the disclosure provides a processing method for an activation function in a neural network. The processing method includes the following steps, but the disclosure is not limited thereto. A look-up table is provided. The look-up table is a corresponding relation among multiple input ranges and multiple linear functions, the look-up table stores slopes and biases of the linear functions, a difference between an initial value and an end value of the input range of each of the linear functions is an exponentiation of base-2, and the linear functions form a piecewise linear function to approximate the activation function for the neural network. At least one bit value in an input value of the activation function is used as an index to query the look-up table, to determine the corresponding linear function. The index is an initial value of one of the input ranges. An output value of the determined linear function is calculated by feeding a part of the bit values of the input value into the determined linear function.
Based on the above, the hardware architecture and the processing method thereof for an activation function in a neural network are depicted in the embodiments of the disclosure. The piecewise linear function is used to approximate the activation function, the range size of each piece of range is limited, and the bias of each linear function is adjusted. Therefore, it is not required to perform multi-range comparison (i.e., a large number of comparators may be omitted), and the hardware operation efficiency can be improved. In addition, by modifying the bias of the linear function, the number of input bits of the multiplier-accumulator can be reduced, and the objectives of low costs and low power consumption can be achieved.
To make the aforementioned more comprehensible, several embodiments accompanied with drawings are described in detail as follows.
The storage device 110 may be a fixed or movable random access memory (RAM), read-only memory (ROM), flash memory, register, combinational circuit, or a combination of the above devices. In the present embodiment of the disclosure, the storage device 110 records a look-up table. The look-up table records a corresponding relation among the input ranges and the linear functions approximating the activation function. The look-up table stores the slopes and biases of multiple linear functions, and the details thereof will be described in the subsequent embodiments.
The parameter determining circuit 130 is coupled to the storage device 110. The parameter determining circuit 130 may be a specific functional unit, a logic circuit, a microcontroller, or a processor of various types.
The multiplier-accumulator 150 is coupled to the storage device 110 and the parameter determining circuit 130. The multiplier-accumulator 150 may be a specific circuit capable of multiplication and addition operations, or may be a circuit or processor composed of one or more multipliers and adders.
To facilitate the understanding of the operation process of the present embodiment of the disclosure, several embodiments will be provided below to detail the operation process of the hardware architecture 100 in the present embodiment of the disclosure. Hereinafter, the method of the present embodiment of the disclosure will be described with reference to the devices or circuits in the hardware architecture 100. The processes of the method may be adjusted according to the implementation condition, and the disclosure is not limited thereto.
where x0 to x3 are the initial values or end values of the input ranges, x0 is 0, x1 is 1, x2 is 2, and x3 is 3. In addition, w0 to w2 are respectively the slopes of the linear functions of the input ranges, and b0 to b2 are respectively the biases (also referred to as the y-intercepts, i.e., the y-coordinates at which the graphs of these functions intersect the y-axis, the vertical axis in the figure). The value of each slope is the ratio of the difference between the results obtained by substituting the end value and the initial value of the input range into the activation function ƒ(x) to the difference between the end value and the initial value:
wi = (ƒ(xi+1) − ƒ(xi)) / (xi+1 − xi) (2)
where i is 0, 1, or 2. In addition, the value of each bias is the difference between the result obtained by substituting the initial value of the input range into the activation function ƒ(x) and the product of the initial value and the corresponding slope:
bi = ƒ(xi) − wi*xi (3)
It is noted that the initial value of the input range is the same as the end value of the adjacent input range. The term “adjacent” here may also mean “closest”.
It is noted that the circuit design for implementing the piecewise linear function in the related art requires multiple comparators to sequentially compare the input value with the input ranges in order to determine the input range in which the input value is located. As the number of input ranges increases, more linear-function pieces are used to approximate the activation function, and a higher accuracy is obtained. However, the number of comparators also needs to be increased correspondingly, which increases the complexity of the hardware architecture. In addition, to avoid loss of accuracy, a multiplier-accumulator with a greater number of input bits is generally used, which similarly increases the hardware cost or even affects the operation efficiency and increases the power consumption. Although decreasing the number of input ranges or using a multiplier-accumulator with a small number of input bits can mitigate the aforementioned issues, doing so results in loss of accuracy. Therefore, how to strike a balance between the two objectives of high accuracy and low complexity is one of the subjects requiring effort in the related fields.
The present embodiment of the disclosure provides a new linear function segmentation method, which limits the difference between the initial value and the end value of each input range to an exponentiation of base-2. For example, if the initial value is 0 and the end value is 0.5, then the difference between the two is 2^−1; if the initial value is 1 and the end value is 3, then the difference between the two is 2^1. When the input value is represented in the binary system, by using only one or more bit values in the input value as the index, the input range in which the input value is located may be determined without comparing any input ranges.
In addition, the look-up table provided in the present embodiment of the disclosure is a corresponding relation among multiple input ranges and multiple linear functions, and one input range corresponds to a specific linear function. For example, the input range is 0≤x<1 and corresponds to w0*x+b0, i.e., the linear function corresponding to one piece in the piecewise linear function. It is noted that the aforementioned index can correspond to the initial value of the input range. Since the range size (i.e., the difference between the initial value and the end value) of the input range is limited to the exponentiation of base-2, and the input value is represented in the binary system, the input range to which the input value belongs can be directly obtained from the bit values of the input value. Moreover, the index is also used to access the slope and bias of the linear function in the look-up table.
In an embodiment, the index includes the first N bit values in the input value, where N is a positive integer greater than or equal to 1, and the index corresponds to the initial value of the input range of one of the linear functions. Taking
It is noted that “first” as in the aforementioned “first N” refers to the values of the N highest-order bits in the input value represented in the binary system. In addition, depending on different design variations of the input ranges, in other embodiments, the parameter determining circuit 130 may select the bit values of specific bits from the input value. For example, if the input ranges are 0≤x<0.25, 0.25≤x<0.75, and 0.75≤x<2.75, then the parameter determining circuit 130 selects the 1st bit before the decimal point and the 1st and 2nd bits after the decimal point from the input value.
Other variations of the input ranges are also possible.
The differences between the initial value and the end value in each of the input ranges are respectively 2^−1, 2^−1, 2^−1, 2^−1, and 2^0 (i.e., all exponentiations of base-2). v0 to v4 are respectively the slopes of the linear functions of the input ranges of the pieces, and c0 to c4 are respectively the biases of the linear functions of the input ranges of the pieces. The piecewise linear function ƒ2(x) intersects with the activation function ƒ(x) at the initial values and the end values of each of the input ranges.
In the present embodiment, the first N bit values in the input value may be used as the index. For example, if the input value is 0001.1010_1100_0011₂, then the bit values of the first 5 bits may be obtained as 0001.1₂ (i.e., 1.5 in the decimal system), which corresponds to the input range of 1.5≤x<2 in
Next, the multiplier-accumulator 150 calculates an output value of the activation function by feeding a part of the bit values of the input value into the determined linear function (step S330). Specifically, step S310 may determine the linear function and the weight (i.e., the slope) and bias therein. The parameter determining circuit 130 may input the input value, the weight, and the bias to the multiplier-accumulator 150, and the multiplier-accumulator 150 calculates a product of the input value and the weight and uses a sum of the product and the bias as the output value. Referring to the basic operating architecture of
It is noted that, to avoid an excessively low accuracy in the approximated output, a multiplier-accumulator with a high number of input bits is adopted in the related art, which however increases the hardware cost. In an embodiment of the disclosure, the parameter determining circuit 130 uses the result of subtracting the initial value of the input range corresponding to the index from the input value as a new input value, and the multiplier-accumulator 150 feeds the new input value into the determined linear function. Specifically, taking the linear function of ƒ1(x)=w1*x+b1 of
In other words, the difference between the input value and the initial value of the input range to which the input value belongs may be used as the new input value, and the bias is the output value (this value may be recorded in the look-up table in advance) obtained by feeding the initial value of that input range into the corresponding piece of the linear function ƒ1( ) or the activation function ƒ( ). Thereby, the number of input bits of the multiplier-accumulator 150 can be reduced.
In an embodiment, since the parameter determining circuit 130 only needs to use the difference between the input value and the initial value of the input range to which the input value belongs as the new input value, if the initial value of the input range is associated with the first few bit values in the input value, then the parameter determining circuit 130 may use the first N bit values in the input value as the index (where N is a positive integer greater than or equal to 1 and the index corresponds to the initial value of one of the input ranges) and use the last M bit values in the input value as the new input value. The sum of M and N is the total number of bits of the input value. Therefore, it is not required to adopt a multiplier-accumulator with a number of input bits equal to the total number of bits of the input value as the multiplier-accumulator 150, and the hardware architecture 100 of the present embodiment of the disclosure may adopt a multiplier-accumulator with a number of input bits smaller than the total number of bits of the input value.
Taking
It is noted that, according to different design requirements, the number of the linear functions used to approximate the activation function is associated with the maximum error between the output value and the output value obtained by feeding the input value into the activation function. To reduce the maximum error (i.e., to improve the accuracy of the approximation), one means is to increase the number of the input ranges (corresponding to the number of the linear functions), but doing so increases the complexity. To strike a balance between accuracy (or maximum error) and complexity, the number of the input ranges is crucial and even affects the number of input bits required for the multiplier.
In addition, the activation functions tanh and sigmoid have a mutually convertible characteristic (sigmoid(x)=tanh(x/2)/2+0.5 . . . (6)), and the parameter determining circuit 130 may also obtain the output value of the sigmoid function by using a piecewise linear function which approximates tanh.
Taking
It is noted that the input ranges used in the piecewise linear functions ƒ1(x) and ƒ2(x) and the contents of the linear functions in the foregoing embodiment have only been described as an example. In other embodiments, the contents thereof may be changed, and the present embodiment of the disclosure is not limited thereto.
In summary of the above, the hardware architecture and the processing method thereof for an activation function in a neural network are depicted in the embodiments of the disclosure. The input ranges of the piecewise linear function which approximates the activation function are limited, so that the range size of the input ranges is associated with the input value represented in the binary system (in the embodiments of the disclosure, the range size is limited to the exponentiation of base-2). Therefore, it is not required to perform multi-range comparison; instead, the corresponding linear function can be obtained by directly using a part of the bit values of the input value as the index. In addition, the embodiments of the disclosure change the bias of each piece of the linear function and redefine the input value of the linear function to thereby reduce the number of input bits of the multiplier-accumulator and further achieve the objectives of low costs and low power consumption.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed embodiments without departing from the scope or spirit of the disclosure. In view of the foregoing, it is intended that the disclosure covers modifications and variations provided that they fall within the scope of the following claims and their equivalents.