This application claims priority under 35 U.S.C. § 119 (a) to Korean Patent Application No. 10-2023-0180602, filed on Dec. 13, 2023, with the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
The present disclosure relates to an apparatus and method for accelerating activation function, and more particularly, to an apparatus and method for accelerating activation function based on PIM.
Artificial neural networks typically consist of multiple layers, each comprising multiple kernels that receive input or output from the previous layer and perform neural network operations.
As shown in the accompanying drawing, each kernel calculates a weighted sum of the received plurality of values u1 to ud and a plurality of weights w1 to wd determined by learning, inputs the result of the weighted sum to the activation function f, and outputs the result of the activation operation of the activation function for the result of the weighted sum as a kernel output y. Here, the activation function f is a function that is specified in advance when configuring the artificial neural network.
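For illustration only, the kernel operation described above can be sketched in plain Python; the sigmoid used as the activation function f, and all names, are illustrative assumptions rather than part of the disclosure:

```python
import math

def kernel_output(u, w, f):
    """Weighted sum of input values u and learned weights w, passed through activation f."""
    s = sum(ui * wi for ui, wi in zip(u, w))  # weighted sum of u1..ud and w1..wd
    return f(s)                               # activation operation on the weighted sum

# Illustrative activation: a sigmoid (any activation function f could be used).
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
y = kernel_output([1.0, 2.0], [0.5, -0.25], sigmoid)  # s = 0.5 - 0.5 = 0.0, so y = 0.5
```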
Conventional artificial neural networks primarily used simple activation functions, such as ReLU (Rectified Linear Unit). However, recent artificial neural networks such as the Transformer model or the DCNN (Deep Convolutional Neural Network) model increasingly employ activation functions with high complexity. In addition, as the complexity of the activation function increases, the computational cost of the artificial neural network increases.
An object of the present disclosure is to provide an apparatus and method for accelerating activation function, which can reduce computational cost and accelerate operations by using approximate activation functions approximated by intervals according to input values for operations of high-complexity activation functions.
Another object of the present disclosure is to provide an apparatus and method for accelerating activation function, which can easily determine intervals based on input values for selecting approximate activation functions.
Still another object of the present disclosure is to provide an apparatus and method for accelerating activation function, which can obtain multiple approximate activation functions for each of multiple input values in a memory and accelerate the computations by performing parallel processing.
According to one embodiment of the present disclosure, an apparatus for accelerating activation function is provided, which comprises: a memory; and a controller, wherein the application range of the activation function is divided into multiple intervals, and the controller obtains an interval indicator indicating an interval in which an input value is included by using multiple interval detection operators to select one approximate activation function among multiple approximate activation functions approximated in each interval, obtains an approximate activation function set for the interval designated by the interval indicator, and inputs the input value into the obtained approximate activation function to obtain an activation operation result.
The interval detection operator may be configured as an operator that inverts some bits of an input value and performs a logical AND operation according to the bit length and bit value of a bit pattern that values included in each interval have in common, when the application range of the activation function is divided into multiple intervals by multiple boundary points expressed in floating point numbers.
The interval indicator may be a one-hot segment vector composed of the 1-bit interval detection values obtained by operating the input value with each of the multiple interval detection operators for each interval.
The multiple approximate activation functions may be obtained by: searching for a point where the error between the activation function and an approximate function, in which the activation function is approximated by linear interpolation in the intervals divided so far in the application range of the activation function, is maximum; setting the point whose absolute value is a power of 2 among the points closest to the searched point as a boundary point and adding it to a boundary point list; if the set boundary point is already included in the boundary point list, setting the midpoint between the two power-of-2 points closest to the searched point as a boundary point and adding it to the boundary point list; if both the selected boundary point and the midpoint are already included in the boundary point list, setting the midpoint between the point in the boundary point list closest to the searched point and the searched point as a boundary point and adding it to the boundary point list; and linearly interpolating the activation function in each interval divided by the multiple boundary points.
The apparatus for accelerating activation function may be implemented with PIM (Processing-in-memory).
The controller may obtain the interval indicator for each of the multiple input values by performing an interval detection operation in parallel for each of the multiple input values stored in the memory using the multiple interval detection operators.
The controller may obtain coefficients and biases of approximate activation functions selected by each of the multiple interval indicators according to multiple input values by combining logical operations of bit values for each bit position of the multiple interval indicators stored in the memory.
The controller may obtain the bit values for each bit position of the coefficient and the bias by performing NOR operations on the bit values for each bit position specified in the interval indicators.
The apparatus for accelerating activation function may be implemented with a nonvolatile memory-based PIM, and when performing a NOR operation on three or more bits, the controller may perform a reduced-NOR operation that first performs and stores a NOR operation on two bits, and then overlaps and records NOR operations on the bit values of the remaining bits.
The input values may be obtained by calculating a weighted sum of weights and at least one value applied to multiple kernels provided in the artificial neural network, respectively.
According to another embodiment of the present disclosure, a method for accelerating activation function is provided, which is performed by a controller in an apparatus including a memory and the controller, comprising the steps of: obtaining an interval indicator indicating an interval in which an input value is included by using multiple interval detection operators to select one approximate activation function among multiple approximate activation functions approximated in each interval where the application range of the activation function is divided into multiple intervals; obtaining an approximate activation function set for the interval designated by the interval indicator; and inputting the input value into the obtained approximate activation function to obtain an activation operation result.
The apparatus and method for accelerating activation function of the present disclosure utilize an approximate activation function approximated by an interval according to an input value for the operation of a high-complexity activation function, and can easily determine an interval according to an input value for selecting an approximate activation function. In addition, by obtaining multiple approximate activation functions for each of multiple input values, the operation can be accelerated by performing activation operations simultaneously in parallel.
Hereinafter, specific embodiments according to the embodiments of the present disclosure will be described with reference to the drawings. The following detailed description is provided to assist in a comprehensive understanding of the methods, devices and/or systems described herein. However, this is only an example and the present invention is not limited thereto.
In describing the embodiments of the present disclosure, when it is determined that detailed descriptions of known technology related to the present disclosure may unnecessarily obscure the gist of the embodiments, the detailed descriptions thereof will be omitted. The terms used below are defined in consideration of functions in the present disclosure, but may be changed depending on the customary practice or the intention of a user or operator. Thus, the definitions should be determined based on the overall content of the present specification. The terms used herein are only for describing the embodiments, and should not be construed as limitative. Unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. It should be understood that the terms “comprises,” “comprising,” “includes,” and “including,” when used herein, specify the presence of stated features, numerals, steps, operations, elements, or combinations thereof, but do not preclude the presence or addition of one or more other features, numerals, steps, operations, elements, or combinations thereof. In addition, terms such as “…unit”, “…er/or”, “module” and “block” described in the specification mean a unit for processing at least one function or operation, which may be implemented by hardware, software, or a combination of hardware and software.
Referring to the accompanying drawing, the apparatus for accelerating activation function according to an embodiment of the present disclosure may include an interval information acquisition module 10, an input interval determination module 20, an approximate function acquisition module 30, and an activation operation module 40.
The interval information acquisition module 10 acquires interval information for each of multiple intervals to which multiple approximate activation functions that approximate the activation function f are applied.
In the present disclosure, the apparatus for accelerating activation function reduces computational cost by using multiple approximate activation functions f1′, f2′, f3′ and f4′ in which a complicated activation function f in an artificial neural network is divided into intervals and approximated by a simple linear function, as shown in the accompanying drawing.
However, since the activation function is divided into intervals and approximated, the approximate activation functions f1′, f2′, f3′ and f4′ are different for each interval, so the apparatus for accelerating activation function must perform the activation operation by selecting the approximate activation function of the interval corresponding to the input value x. Accordingly, the interval information acquisition module 10 acquires interval information for each interval determined when the activation function f is approximated by multiple approximate activation functions.
Here, the activation function f may be approximated based on the PWL (piecewise linear) approximation technique, as shown in the accompanying drawing.
In this way, when dividing the range to which the activation function f is applied into multiple intervals and approximating it, the multiple intervals may be divided to have uniform sizes; however, as in the illustrated example, the intervals may also be divided to have non-uniform sizes to improve approximation accuracy.
Here, the apparatus for accelerating activation function uses multiple approximate activation functions f1′, f2′, f3′ and f4′ obtained by approximating the activation function f based on the PWL approximation technique, which is a non-uniform interpolation technique, to improve accuracy. In addition, in the process of approximating the activation function f based on the PWL approximation technique, the range over which the activation function f is applied may be divided into multiple non-uniform intervals according to the Variable Breakpoint Generation (hereinafter, VBG) algorithm. In the VBG algorithm, the number of boundary points used to divide the intervals, i.e. the number of intervals, is used as a constraint.
In the VBG algorithm, the boundary points are generated sequentially and repeatedly up to a specified number within a specified range. Specifically, the VBG algorithm extracts an approximate function f′ in which the activation function f is approximated by linear interpolation in the currently divided intervals. Then, the maximum error point e, indicating the input value x where the error between the activation function f and the approximate function f′ is maximized, is searched first. Then, considering the floating point representation, the value that is closest to the searched maximum error point e and has the minimum number of bits in the part that must be expressed as a mantissa in the floating point representation is generated as a boundary point (breakpoint) that divides the interval. At this time, the boundary point may be set so that the number of mantissa bits is the minimum at a level that can be distinguished from the other previously set boundary points.
Specifically, among the points closest to the maximum error point, the point whose absolute value is a power of 2 may be set as a boundary point first and added to the boundary point list. However, if that point is already included in the boundary point list as a boundary point, the two power-of-2 points closest to the maximum error point may be selected, and the midpoint between the two selected points may be set as the boundary point. In addition, if the midpoint between the two power-of-2 points is also already included in the boundary point list, the point closest to the maximum error point may be selected from the boundary point list, and the midpoint between the maximum error point and the selected point may be added to the boundary point list.
Afterwards, the process of extracting the approximate function f′ again from the intervals divided according to the set boundary points, searching for the maximum error point e with the maximum error value, and generating a boundary point is repeated. The VBG algorithm repeatedly generates boundary points until the number of generated boundary points reaches the specified number, thereby dividing the range to which the activation function f is applied into multiple intervals.
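The boundary-point generation loop described above can be sketched as follows. This is a simplified illustration: the power-of-2 snapping and midpoint fallback rules of the VBG algorithm are abbreviated, and all names, as well as the use of math.exp as a sample activation function, are assumptions:

```python
import math

def pwl_interp(xs, ys, x):
    """Linear interpolation of f over the current sorted boundary points xs."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside the approximated range")

def vbg(f, lo, hi, n_interior, grid=512):
    """Generate n_interior boundary points between lo and hi (simplified VBG sketch)."""
    bps = [lo, hi]
    while len(bps) - 2 < n_interior:
        ys = [f(p) for p in bps]
        # 1) search the maximum-error point e on a dense grid of candidate inputs
        e = max((lo + (hi - lo) * k / grid for k in range(1, grid)),
                key=lambda x: abs(f(x) - pwl_interp(bps, ys, x)))
        # 2) snap e toward a short-mantissa value (abbreviated rule:
        #    nearest power of two, falling back to e itself)
        p = 2.0 ** round(math.log2(e)) if e > 0 else e
        if p in bps or not (lo < p < hi):
            p = e
        bps = sorted(set(bps + [p]))
    return bps

bps = vbg(math.exp, 1.0, 4.0, 3)  # 3 interior boundary points over [1, 4]
```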
Hereinafter, the operation of the VBG algorithm will be specifically described with reference to the illustrated example.
In addition, here, as an example, it is assumed that the activation function uses a floating point representation in the bf16 (brain floating point 16) format. Bf16 is a 16-bit floating point format developed for artificial neural networks, consisting of 1 sign bit, 8 exponent bits, and 7 mantissa bits. Bf16 has lower precision than the existing 32-bit floating point format, but it requires less memory capacity and supports fast operations, and it has a wider dynamic range due to the extended number of exponent bits compared to the existing 16-bit floating point format, making it a floating point representation format suitable for artificial neural networks.
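The bf16 layout described above can be verified in plain Python by truncating a float32 bit pattern to its upper 16 bits, which is the standard way of forming a bf16 value; the helper name is illustrative:

```python
import struct

def bf16(x):
    """bf16 encoding of x (round-toward-zero): upper 16 bits of the float32 pattern."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0] >> 16
    s = bits >> 15                 # 1 sign bit
    e = (bits >> 7) & 0xFF         # 8 exponent bits
    m = bits & 0x7F                # 7 mantissa bits
    return f"{s:01b}_{e:08b}_{m:07b}"

print(bf16(2.0))   # 0_10000000_0000000
print(bf16(1.5))   # 0_01111111_1000000
print(bf16(1.25))  # 0_01111111_0100000
```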
Referring to (a) of the illustrated example, the entire range (1 to 4) to which the activation function f is applied is first set, and the points 1 and 4 at both ends of the range are set as the initial boundary points p1 and p3.
Then, the maximum error point e is searched for, which represents the input value x that maximizes the error between the activation function f and the approximate activation function linearly interpolated between the initial boundary points p1 and p3. In (a), it is shown that the error between the approximate activation function and the activation function f is maximized when the input value x is 2.75, and therefore the maximum error point is e=2.75. However, in the VBG algorithm, considering the floating point representation format (bf16 in this case), the point that minimizes the use of the mantissa part expressed in 7 bits in bf16 is generated as the boundary point. In other words, the boundary point is generated so that the 7 bits corresponding to the mantissa part are utilized as little as possible.
Accordingly, as in (b), the value 2, obtained by omitting the mantissa part from the maximum error point e, is generated as a boundary point p2, and the bf16 expression for the boundary point p2 is 2=“0_10000000_0000000”. The boundary point p2 generated here is set to 2 because it remains distinguishable from the initial boundary points p1 and p3 even when the part corresponding to 0.75, which would have to be expressed as a mantissa in the maximum error point e of 2.75, is omitted.
In (b), since all 7 bits of the mantissa part are not utilized at the generated boundary point p2 (k=7), the non-expression mantissa bit information q7 is also indicated.
Then, since the entire range (1 to 4) is divided into two intervals by the boundary point p2 generated in (b), the maximum error point e, which represents the input value x that maximizes the error between the activation function f and the approximate activation function linearly interpolated in each of the two divided intervals, is searched for again. In (b), the maximum error point is shown as e=1.25. When the maximum error point e is 1.25, the closest point that does not utilize the 7 bits corresponding to the mantissa part is 1. However, 1 is already set as the boundary point p1, and the other closest such point, 2, is also already set as a boundary point. Therefore, the closest point that utilizes as few mantissa bits as possible is the midpoint 1.5, and thus 1.5 (0_01111111_1000000) is generated as a new boundary point p2, as shown in (c). At this time, since the number of mantissa bits that are not utilized at the generated boundary point p2 is 6 (k=6), the non-expression mantissa bit information q6 is also indicated.
After that, in (c), the new maximum error point e is e=1.125, and the point closest to e that utilizes the fewest bits of the 7-bit mantissa part is 1.25. Therefore, as shown in (d), 1.25 (0_01111111_0100000) is generated as a new boundary point p2, and since the number of unused mantissa bits is 5 (k=5), the non-expression mantissa bit information q5 is also indicated.
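The number k of unused (non-expressed) mantissa bits at each boundary point, which determines the indicated qk information, can be checked as follows; this is a sketch, and the function name is an assumption:

```python
import struct

def trailing_zero_mantissa_bits(x):
    """k = number of low mantissa bits that are zero in the bf16 encoding of x."""
    m = (struct.unpack(">I", struct.pack(">f", x))[0] >> 16) & 0x7F  # 7 mantissa bits
    k = 0
    while k < 7 and not (m >> k) & 1:
        k += 1
    return k

print(trailing_zero_mantissa_bits(2.0))   # 7 (mantissa 0000000 -> q7)
print(trailing_zero_mantissa_bits(1.5))   # 6 (mantissa 1000000 -> q6)
print(trailing_zero_mantissa_bits(1.25))  # 5 (mantissa 0100000 -> q5)
```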
That is, in the illustrated example, the five boundary points p1 (=1), p2 (=1.25), p3 (=1.5), p4 (=2) and p5 (=4) are finally obtained, and the range to which the activation function f is applied is divided into the four intervals S1 to S4 by the obtained boundary points.
Accordingly, the interval information acquisition module 10 acquires interval information for the set intervals S1 to S4 in the process of approximating the activation function f with the multiple approximate activation functions f1′ to f4′. Here, the interval information for S1 to S4 may be acquired based on the boundary points p1 to p5 expressed in the bf16 format, and each interval has a unique floating point bit pattern.
In addition, the interval information acquisition module 10 may acquire the approximate activation functions f1′ to f4′ in which the activation function f in each interval is approximated by linear interpolation.
As the activation function f is approximated by multiple interval-wise approximate activation functions f1′ to f4′, the apparatus for accelerating activation function must determine an interval corresponding to the input value x among the multiple divided intervals, acquire an approximate activation function approximated in the determined interval, and perform an operation.
Accordingly, the input interval determination module 20 sets multiple interval detection operators to detect intervals according to input values x based on common bit patterns of values included in each interval, among the multiple intervals divided according to boundary points p1 to p5 acquired as interval information, and determines an interval corresponding to the input value x using the set multiple interval detection operators to obtain an interval indicator.
When multiple boundary points are generated based on the VBG algorithm and the range to which the activation function f is applied is divided into multiple intervals S1 to S4, the floating point representations of the boundary points pn that delimit each interval differ from each other by only a few bits. Therefore, a common bit pattern can be easily extracted from each interval.
Referring to the illustrated example, interval detection operators may be set for each of the multiple intervals S1 to S4 based on the common bit pattern of the values included in each interval.
The interval detection operators can be extracted as a unique common bit pattern from the floating point expressions that constitute each interval Sn, and can be set as an operator that performs an inversion (NOT) operation on the bit values of the input value x and a logical AND operation on the entire common bit pattern according to the bit value of the bit pattern. At this time, the inversion (NOT) operation may be performed on the bit whose bit value of the common bit pattern is 0, and the logical AND operation may be performed on the entire common bit pattern including the inverted bit.
The interval detection operators may consist of an inversion operator (NOT) and a logical AND operator (AND), which are set according to the bit length and bit value required for a unique floating point representation of each interval.
Specifically, to set up the interval detection operators, first, the number of bits of a common bit pattern required for a unique floating point representation in the first interval S1 divided with the first boundary point p1 and the second boundary point p2 is checked, and the number of bits for the interval detection operation is specified.
Referring to the illustrated example, in the first interval S1, the number of bits of the common bit pattern required for a unique floating point representation is 11, so the number of bits for the interval detection operation may be set to 11 bits, and the 11-bit common bit pattern of the first interval S1 may be extracted as “0_01111111_00”.
In the second interval S2, since the number of bits required for the second boundary point p2 is greater than that for the third boundary point p3, the number of bits for the interval detection operation may be set to 11 bits, and the 11-bit common bit pattern of the second interval S2 may be extracted as “0_01111111_01”. In addition, in the third interval S3, it may be set to 10 bits, and the 10-bit common bit pattern of the third interval S3 may be extracted as “0_01111111_1”, and in the fourth interval S4, it may be set to 9 bits, and the 9-bit common bit pattern of the fourth interval S4 may be extracted as “0_10000000”.
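The common bit patterns listed above can be recomputed by comparing the bf16 codes of the two ends of each interval; the following sketch assumes each interval is a half-open range of positive bf16 values, and the helper names are illustrative:

```python
import struct

def bf16_bits(x):
    """bf16 code of x: upper 16 bits of the float32 bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16

def common_pattern(lo, hi):
    """(bit length, pattern) shared by every bf16 value in the interval [lo, hi)."""
    a, b = bf16_bits(lo), bf16_bits(hi) - 1   # first and last bf16 code in the interval
    n = 0
    while n < 16 and (a >> (15 - n)) == (b >> (15 - n)):
        n += 1                                 # count equal leading bits
    return n, format(a >> (16 - n), "0%db" % n)

print(common_pattern(1.0, 1.25))   # (11, '00111111100') i.e. 0_01111111_00
print(common_pattern(1.25, 1.5))   # (11, '00111111101') i.e. 0_01111111_01
print(common_pattern(1.5, 2.0))    # (10, '0011111111')  i.e. 0_01111111_1
print(common_pattern(2.0, 4.0))    # (9,  '010000000')   i.e. 0_10000000
```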
The input value x belonging to a specific interval means that the input value x has the same bit pattern as the common bit pattern of the interval. Therefore, the interval detection operator may be composed of a logical AND operation on the input value x. However, since the logical AND operation outputs 1 only when all input values are 1, each interval detection operator may be configured so that the inversion (NOT) operation is performed first on the bit positions that have a value of 0 in the common bit pattern of the interval before the logical AND operation.
As an example, the interval detection operator for the first interval S1 checks the bits (s0, e7, m6, m5) whose bit values are 0 among the 11 bits (0_01111111_00) that form the common bit pattern of the first interval S1, and specifies an inversion operator (NOT) that inverts the bit values at the corresponding bit positions in the input value x. In addition, the interval detection operator for the second interval S2 also checks the bits (s0, e7, m6) whose bit value is 0 among the 11-bit common bit pattern (0_01111111_01), and specifies the inversion operator (NOT). In addition, in the interval detection operator for the third interval S3, the inversion operator (NOT) is specified for the bits (s0, e7) whose bit value is 0 among the 10-bit common bit pattern (0_01111111_1). In addition, in the interval detection operator for the fourth interval S4, the inversion operator (NOT) is specified for the bits (s0, e6˜e0) whose bit value is 0 among the 9-bit common bit pattern (0_10000000).
Once the number of bits of the interval detection operator for each interval Sn and the bit positions to which the inversion operator is to be applied are determined, an interval detection operator for each interval Sn to determine the interval to which the input value x belongs may be configured as shown on the right side of the illustrated example.
Once multiple interval detection operators are configured for each interval Sn, the input interval determination module 20 may perform an interval detection operation on the input value x with each of the multiple interval detection operators, as shown in the illustrated example.
That is, among the 11 bits consisting of the sign bit s0, the exponent bits e7, . . . , e0, and the two mantissa bits m6 and m5 of the input value x in the bf16 format, the four bit values at the bit positions corresponding to (s0, e7, m6, m5) may be inverted (NOT), and then a logical AND operation (AND) may be performed on the entire 11 bits to obtain the interval detection value o1 for the first interval S1. Similarly, among the 11 bits consisting of the sign bit s0, the exponent bits e7, . . . , e0, and the two mantissa bits m6 and m5 of the input value x, the three bit values at the bit positions corresponding to (s0, e7, m6) may be inverted (NOT) and then a logical AND operation may be performed on the entire 11 bits to obtain the interval detection value o2 for the second interval S2; and in the 10 bits s0, e7, . . . , e0, m6, the bit values at the bit positions corresponding to (s0, e7) are inverted (NOT) and then a logical AND operation is performed to obtain the interval detection value o3 for the third interval S3. In addition, by performing an inversion operation (NOT) on the 8 bit values at the bit positions corresponding to (s0, e6, . . . , e0) in the 9 bits s0, e7, . . . , e0 and then performing a logical AND operation (AND), the interval detection value o4 for the fourth interval S4 can be obtained.
When the input value x is operated with the interval operator for each interval, among the multiple interval detection values o1, o2, o3 and o4 calculated for each interval Sn, only the interval detection value for the interval to which the input value x belongs has a bit value of 1, and the rest all have bit values of 0. Therefore, the multiple calculated interval detection values o1, o2, o3 and o4 form an interval indicator O, which can also be called a one-hot segment vector that expresses the intervals in which the input value x is included in multiple intervals Sn by encoding them in a one-hot encoding manner.
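The interval detection operation described above, i.e., inverting the zero-valued bit positions of each common bit pattern and ANDing all examined bits, can be emulated as follows for the four example intervals; the names and structure are illustrative:

```python
import struct

def bf16_bits(x):
    """bf16 code of x: upper 16 bits of the float32 bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16

# (bit length n, common bit pattern) for the example intervals S1..S4
PATTERNS = [
    (11, 0b00111111100),  # S1 = [1, 1.25):   0_01111111_00
    (11, 0b00111111101),  # S2 = [1.25, 1.5): 0_01111111_01
    (10, 0b0011111111),   # S3 = [1.5, 2):    0_01111111_1
    (9,  0b010000000),    # S4 = [2, 4):      0_10000000
]

def interval_indicator(x):
    """One-hot segment vector: NOT the zero-bits of each pattern, then AND all n bits."""
    bits = bf16_bits(x)
    o = []
    for n, pat in PATTERNS:
        top = bits >> (16 - n)                     # the n bits examined by this operator
        flipped = top ^ (~pat & ((1 << n) - 1))    # NOT at positions where pattern bit is 0
        o.append(1 if flipped == (1 << n) - 1 else 0)  # AND over all n bits
    return o

print(interval_indicator(1.3))   # [0, 1, 0, 0] -> x lies in S2
print(interval_indicator(2.75))  # [0, 0, 0, 1] -> x lies in S4
```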
As described above, the input interval determination module 20 sets multiple interval detection operators that perform bitwise operations on the input value x to determine the interval in which the input value x is included, and performs interval detection operations on the input value x with the set multiple operators to obtain a one-hot encoded interval indicator O. Accordingly, the input interval determination module 20 may also be called a BSE (Bitwise Segment Encoder).
Here, for the convenience of understanding, it is explained that the interval information acquisition module 10 acquires multiple boundary points having floating point representations as interval information, the input interval determination module 20 sets multiple interval operators based on the multiple boundary points acquired as interval information, and acquires interval indicators O using the set interval operators. However, in the trained artificial neural network, interval information for approximating the activation function with multiple approximate activation functions may be set in advance, and when the interval information is set, multiple interval operators can also be determined in advance according to the set interval information.
Therefore, the input interval determination module 20 may be configured to perform interval operations on the input value x by directly obtaining preset interval operators without setting multiple interval operators. In this case, the interval information acquisition module 10 can be omitted.
Meanwhile, the apparatus for accelerating activation function of the present disclosure can be implemented using PIM (Processing-in-memory) in which a plurality of input values x1, . . . , x5 are stored. In this case, the input interval determination module 20 may be configured to simultaneously perform a multi-input interval detection operation in parallel by performing a bit inversion and a logical AND operation for multiple inputs on the multiple input values x1, . . . , x5 stored in a memory array under the control of a memory controller (not shown), as in the illustrated example.
The approximate function acquisition module 30 acquires an approximate activation function of a specified interval according to an interval indicator acquired from the input interval determination module 20.
As described above, when the interval including the input value x is determined by the interval indicator O, the approximate activation function in which the activation function f is approximated by a linear function in the determined interval must be acquired. As shown in the illustrated example, the approximate function acquisition module 30 may acquire the approximate activation function fn′ of the interval designated by the interval indicator O.
However, as described above, when the apparatus for accelerating activation function is implemented using PIM, the approximate function fn′ corresponding to each of the multiple interval indicators Oi acquired for the multiple stored input values xi (where i is a natural number) must be identified and obtained. Accordingly, as the number of input values xi increases, the approximate function fn′ must be repeatedly acquired through the interval indicators, which may lead to increased computational cost.
Meanwhile, since the approximate activation function approximated by linear interpolation in each interval has the form ax+b for the input value x, once the coefficient a and bias b of the approximate activation function fn′ are obtained, it can be considered that the approximate activation function has been obtained.
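Once the coefficient a and bias b are selected, the activation operation reduces to evaluating a*x + b. A minimal sketch follows, in which the coefficients and biases are illustrative placeholders rather than values from the disclosure:

```python
def approx_activation(x, indicator, coeffs, biases):
    """Select (a, b) with the one-hot interval indicator and evaluate a*x + b."""
    i = indicator.index(1)          # position of the single 1-bit in the indicator
    return coeffs[i] * x + biases[i]

# Illustrative coefficients and biases for four intervals:
a = [0.1, 0.3, 0.7, 1.0]
b = [0.0, -0.2, -0.8, -1.4]
y = approx_activation(2.75, [0, 0, 0, 1], a, b)  # 1.0 * 2.75 - 1.4, approximately 1.35
```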
Accordingly, when the apparatus for accelerating activation function is implemented using PIM, the approximate function acquisition module 30 may be configured to simultaneously acquire coefficients a and biases b of multiple approximate activation functions fn′ in parallel as a bit combination of the multiple interval indicators Oi acquired for the multiple input values xi, as shown in the illustrated example.
In the illustrated example, a case is shown in which each interval indicator Oi is composed of 16 interval detection values o1 to o16.
The approximate function acquisition module 30 implemented with PIM may logically combine the bits of each of the interval detection values o1 to o16 from the multiple interval indicators Oi stored in the memory under the control of a memory controller (not shown), and the values in the column direction in the illustrated example represent the interval indicators Oi acquired for the respective input values xi.
In the illustrated example, NOR operations are performed on the bit values at the bit positions of the interval indicator Oi designated for each bit of the coefficient ai, so that each bit of the coefficient ai is obtained.
Similarly, a case is illustrated where a NOR operation is performed on the bit values of the 4th bit o4, the 10th bit o10, and the 13th bit o13 of the interval indicator Oi to obtain the first bit of the bias bi, and a NOR operation is performed on the bit values of the 12th bit o12 and the 15th bit o15 of the interval indicator Oi to obtain the second bit of the bias bi.
At this time, the bits on of the interval indicator Oi for which an NOR operation must be performed to obtain each coefficient ai and bias bi may be specified by the memory controller using a lookup table, etc.
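The NOR-based decoding can be emulated in software: because the interval indicator is one-hot, a NOR over a designated set of indicator bits yields 1 exactly when the active interval is not among the designated ones. The designated bit sets below are illustrative assumptions, not the disclosed lookup table:

```python
def nor(bits):
    """NOR: 1 only if every input bit is 0."""
    return 0 if any(bits) else 1

def decode_bit(indicator, designated):
    """One output bit of a coefficient or bias: NOR over designated indicator bits."""
    return nor(indicator[i] for i in designated)

# One-hot indicator for 16 intervals, with the 4th interval detection value o4 active:
O = [0] * 16
O[3] = 1

# Illustrative designation mirroring the text: a bias bit NORing o4, o10, o13
print(decode_bit(O, [3, 9, 12]))   # 0, because the active interval is designated
print(decode_bit(O, [11, 14]))     # 1, because the active interval is not designated
```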
As illustrated in the drawing, the coefficients ai and biases bi of the approximate activation functions fn′ obtained through the NOR operations may be stored in the memory.
The approximate function acquisition module 30 receives and decodes an interval indicator encoded as a one-hot segment vector to acquire coefficients ai and biases bi of an approximate activation function fn′, and therefore may be referred to as a segment coefficient decoder (SCD).
The activation operation module 40 may perform activation operations using the multiple input values xi and the coefficients ai and biases bi of the approximate activation functions fn′ obtained for each of the multiple input values xi, and obtain the activation operation results as kernel outputs y.
As described above, the apparatus for accelerating activation function may be implemented with a digital PIM to reduce the computational cost associated with the increase in complexity of the activation function.
As shown in the illustrated example, the apparatus for accelerating activation function implemented with a digital PIM may store multiple input values xi in a memory array, and obtain multiple interval indicators Oi for the stored multiple input values xi in parallel under the control of a memory controller 50.
In addition, the activation operation module 40 may also obtain coefficients ai and biases bi of multiple approximate functions fn′ for multiple input values xi by decoding multiple stored interval indicators Oi in parallel under the control of the memory controller 50, and the obtained coefficients ai and biases bi of multiple approximate functions fn′ may also be stored in the memory array.
Accordingly, the activation operation module 40 may perform an activation function operation using multiple input values xi stored in the memory array and coefficients ai and biases bi of multiple approximation functions fn′, and may store the kernel output y, which is the result of performing the activation function operation, back in the memory array.
In this way, since parallel processing for multiple inputs is possible based on the PIM operation, the throughput of the activation function operation can be greatly improved, and since the coefficients ai and biases bi are stored in the memory array, there is an advantage in that hardware for implementing a separate LUT is not required. In particular, since the coefficients ai and biases bi of the linearly approximated approximate activation function fn′ are stored in the memory array and parallel operations are performed, the computational cost can be greatly reduced while maintaining high accuracy compared to when the activation function f is approximated based on a polynomial.
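The parallel processing of multiple inputs can be mimicked with vectorized operations, with NumPy arrays standing in for the memory array of the PIM; the boundary points follow the earlier example, and the coefficients and biases are illustrative:

```python
import numpy as np

def parallel_activation(x, A, B, boundaries):
    """Apply interval-wise linear approximations a*x + b to a whole batch at once."""
    idx = np.searchsorted(boundaries, x, side="right") - 1  # interval index per input
    return A[idx] * x + B[idx]

boundaries = np.array([1.0, 1.25, 1.5, 2.0])   # p1..p4 of the example (p5 = 4 is the open end)
A = np.array([0.1, 0.3, 0.7, 1.0])             # illustrative coefficients per interval
B = np.array([0.0, -0.2, -0.8, -1.4])          # illustrative biases per interval
x = np.array([1.1, 1.3, 1.75, 2.75])
out = parallel_activation(x, A, B, boundaries)  # all four inputs processed together
```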
In addition, when the apparatus for accelerating activation function is implemented with a non-volatile memory-based digital PIM, the computational cost can be further reduced by applying a reduced-NOR operation.
As shown in
However, when the apparatus for accelerating activation function is implemented with a nonvolatile memory-based digital PIM such as a memristor, FeRAM (Ferroelectric RAM), MRAM (Magnetic RAM), or PRAM (Phase Change RAM), as shown in
However, this is a measure to further improve the operation performance of the apparatus for accelerating activation function of the present disclosure, and the apparatus for accelerating activation function does not have to be implemented with a non-volatile memory-based digital PIM.
In the illustrated embodiment, the respective components may have functions and capabilities in addition to those described above, and additional components other than those described above may be included. In addition, in an embodiment, each component may be implemented using one or more physically separated devices, or by one or more processors or a combination of one or more processors and software, and specific operations may not be clearly distinguished by component, unlike the illustrated example.
In addition, the apparatus for accelerating activation function shown in
In addition, the apparatus for accelerating activation function may be installed in a computing device or server equipped with hardware elements, in the form of software, hardware, or a combination thereof. The computing device or server may refer to various devices including all or some of: a communication device, such as a communication modem, for communicating with various devices over wired/wireless communication networks; a memory which stores data for executing programs; and a microprocessor which executes programs to perform operations and commands.
Referring to
of a weighted sum of the weights w1˜wd for multiple values u1˜ud applied to the kernels provided in the artificial neural network, as described in
Once the input value x is obtained, multiple interval detection operators are obtained (72) to determine the interval in which the input value x is included, in order to select one of the multiple approximate activation functions fn′ that linearly approximate, interval by interval, the activation function f set for each kernel. Here, the multiple interval detection operators may be composed of operators that invert some bits of the input value x and perform logical AND operations on them, according to the bit length and bit values used in the floating-point representation of the boundary points pn that divide the intervals, as shown in
Once the multiple interval detection operators are obtained according to multiple divided intervals, an interval detection operation is performed on the input value x using the obtained interval detection operators to obtain an interval indicator O (73). Here, the interval indicator is composed of the result of performing an interval detection operation on the input value x using multiple interval detection operators, and may be obtained in the form of a one-hot segment vector.
Once the interval indicator O is obtained, the interval that includes the input value x is determined among the multiple intervals divided according to the obtained interval indicator O, and an approximate activation function that approximates the activation function in the determined interval is obtained (74). Since the approximate activation function for each interval is set in advance, when the interval that includes the input value x is determined, the approximate activation function can be easily obtained.
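Steps (72) to (74) above can be sketched in software as follows. This is a hedged analogue, not the disclosed hardware: where the hardware operators invert and AND selected bits of x's floating-point representation against the bit patterns of the boundary points pn, the sketch performs the equivalent ordering test by mapping IEEE-754 bit patterns to monotonically ordered integers. The boundary values are illustrative only:

```python
import struct

def float_bits_ordered(x):
    """Map a float's IEEE-754 bits to an integer whose order matches <."""
    (bits,) = struct.unpack('<Q', struct.pack('<d', x))
    # Negative floats: flip all bits; positive floats: set the sign bit.
    # This makes unsigned integer comparison agree with float comparison.
    return bits ^ 0xFFFFFFFFFFFFFFFF if bits >> 63 else bits | (1 << 63)

def interval_indicator(x, boundaries):
    """Return a one-hot segment vector O over len(boundaries)+1 intervals."""
    key = float_bits_ordered(x)
    keys = [float_bits_ordered(p) for p in boundaries]
    # The number of boundary points p_n at or below x gives the segment index.
    seg = sum(1 for k in keys if k <= key)
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[seg] = 1
    return one_hot

O = interval_indicator(0.5, [-1.0, 0.0, 1.0])  # one-hot over 4 segments
```

Because the indicator O is one-hot, looking up the pre-set coefficient ai and bias bi for the detected interval reduces to selecting the entry at the position of the single set bit, which is what makes the subsequent decoding step inexpensive.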
Then, by performing an activation function operation on the input value x using the obtained approximate activation function, the output y of the kernel is obtained (75).
Meanwhile, the method for accelerating activation function shown in
In addition, rather than individually searching for and obtaining the approximate activation function for each of the multiple interval indicators Oi stored in the memory array, the coefficients ai and biases bi that constitute the multiple approximate activation functions may be obtained as a combination of bitwise NORs of the multiple interval indicators Oi and stored in the memory array, thereby further reducing the computational cost.
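One way such a NOR-based selection can work, offered here only as a hedged sketch under assumed 4-bit coefficient codes (the actual encodings and gate arrangement are not taken from the disclosure): since the interval indicator is one-hot, each bit j of the selected coefficient is the OR of the indicator bits On whose segment coefficient has bit j set, and OR itself can be built from two NOR operations:

```python
def nor(*bits):
    # NOR gate: 1 only when every input is 0.
    return 0 if any(bits) else 1

def select_bits_via_nor(one_hot, coeff_bits, width):
    """Select the active segment's coefficient code using only NOR gates."""
    out = []
    for j in range(width):
        # Indicator bits of the segments whose coefficient has bit j set.
        contributing = [one_hot[n] for n in range(len(one_hot))
                        if (coeff_bits[n] >> j) & 1]
        n1 = nor(*contributing)   # NOR of the contributing indicator bits
        out.append(nor(n1))       # NOR again to recover the OR
    return sum(b << j for j, b in enumerate(out))

coeff_bits = [0b0001, 0b0011, 0b1001, 0b1111]        # hypothetical codes a_n
a = select_bits_via_nor([0, 0, 1, 0], coeff_bits, 4)  # selects segment 2's code
```

Expressing the decode purely in NOR operations matters because NOR is the native in-memory logic primitive of many nonvolatile digital PIM arrays, so coefficient selection can occur where the indicators are stored rather than in separate lookup-table hardware.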
Additionally, when the apparatus for accelerating activation function is implemented as a nonvolatile memory-based digital PIM, the reduced-NOR operation shown in (b) of
In
In the illustrated embodiment, the respective components may have functions and capabilities in addition to those described below, and additional components other than those described below may be included. The illustrated computing environment 90 may include a computing device 91 to perform the method for accelerating activation function illustrated in
The computing device 91 includes at least one processor 92, a computer readable storage medium 93 and a communication bus 95. The processor 92 may cause the computing device 91 to operate according to the above-mentioned exemplary embodiment. For example, the processor 92 may execute one or more programs 94 stored in the computer readable storage medium 93. The one or more programs 94 may include one or more computer executable instructions, and the computer executable instructions may be configured, when executed by the processor 92, to cause the computing device 91 to perform operations in accordance with the exemplary embodiment.
The communication bus 95 interconnects various other components of the computing device 91, including the processor 92 and the computer readable storage medium 93.
The computing device 91 may also include one or more input/output interfaces 96 and one or more communication interfaces 97 that provide interfaces for one or more input/output devices 98. The input/output interfaces 96 and the communication interfaces 97 are connected to the communication bus 95. The input/output devices 98 may be connected to other components of the computing device 91 through the input/output interface 96. Exemplary input/output devices 98 may include input devices such as a pointing device (such as a mouse or trackpad), keyboard, touch input device (such as a touchpad or touchscreen), voice or sound input device, sensor devices of various types and/or photography devices, and/or output devices such as a display device, printer, speaker and/or network card. An exemplary input/output device 98 may be included inside the computing device 91 as one component constituting the computing device 91, or may be connected to the computing device 91 as a separate device distinct from the computing device 91.
The present disclosure has been described in detail through a representative embodiment, but those of ordinary skill in the art to which the present disclosure pertains will appreciate that various modifications and other equivalent embodiments are possible therefrom. Therefore, the true technical protection scope of the present disclosure should be determined by the technical spirit set forth in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2023-0180602 | Dec 2023 | KR | national |