The present disclosure relates to the technical field of neural networks, and more particularly to an apparatus of implementing activation logic for a neural network and a method thereof.
The most common way to implement activation logic in a neural network is to use look-up tables. However, performing look-ups in such tables consumes a large number of bits of memory in the neural network of an electronic apparatus. In particular, even if the same look-up table values are shared among the neurons in the neural network, a number of multi-level multiplexers at the outputs of the activation logic are needed to operate on the look-up table values in parallel, thereby increasing the burden on hardware components of the electronic apparatus. Therefore, there is a need to reduce the memory bit size of the look-up tables to solve the above-mentioned problems.
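The cost described above can be made concrete with a short sketch (the function name and the example bit width are assumptions for illustration, not part of the disclosure): a direct look-up table that maps every n-bit input to an n-bit activation output needs 2^n entries of n bits each.

```python
def direct_lut_bits(n: int) -> int:
    """A direct activation look-up table needs 2**n entries of n bits each."""
    return (2 ** n) * n

# For an 8-bit activation, a single table already costs 2048 bits, and that
# cost is paid per table; sharing one table across neurons still requires
# wide multiplexers to serve the neurons in parallel.
print(direct_lut_bits(8))  # 2048
```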
The disclosure provides an apparatus of implementing activation logic for a neural network and a method thereof, such that the memory size of the neural network is reduced. The apparatus is configured to reduce the memory size of the address translated look-up tables by mapping input values and output values of the neural network in a multi-stage mode.
Based on the above objective, the present disclosure sets forth an apparatus of implementing activation logic for a neural network, the apparatus comprising: an input unit comprising n bits and n input data values that are stored in the n bits correspondingly, wherein n is defined as a sum of n1 and n2, and n, n1, and n2 are positive integers; a first address translated look-up table comprising 2^n1 first entries that map to 2^n1 bit addresses based on the n bits of the input unit, wherein each of the first entries comprises (n1−1) bits and (n1−1) first preset values stored in the (n1−1) bits correspondingly, and n1 input data values of the n bits of the input unit are mapped to the (n1−1) first preset values that are stored in one of the 2^n1 first entries of the first address translated look-up table; an intermediate storage unit coupled to the input unit and the first address translated look-up table, wherein the intermediate storage unit comprises (n−1) bits by combining the (n1−1) bits of the first address translated look-up table with n2 bits of the input unit, and comprises (n−1) intermediate data values by combining the (n1−1) first preset values of the first address translated look-up table with n2 input data values of the n bits of the input unit; a second address translated look-up table comprising 2^(n−1) second entries that map to the 2^(n−1) bit addresses based on the (n−1) bits of the intermediate storage unit, wherein each of the 2^(n−1) second entries comprises (n2+1) bits and (n2+1) second preset values stored in the (n2+1) bits correspondingly, and the (n−1) intermediate data values of the (n−1) bits of the intermediate storage unit are mapped to the (n2+1) second preset values stored in one of the 2^(n−1) second entries of the second address translated look-up table; and an output unit coupled to the first address translated look-up table and the second address translated look-up table, the output unit combining the (n1−1) bits of the first address translated look-up table with the (n2+1) bits of the second address translated look-up table for outputting n output data values by combining the (n1−1) first preset values and the (n2+1) second preset values.
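The data path above can be sketched behaviorally as follows. This is a minimal illustration under assumptions: the function name and the table contents are hypothetical, and the tables are presumed to be pre-filled so that the concatenated bits equal the desired activation output.

```python
def two_stage_lookup(x: int, n1: int, n2: int, lut1: list, lut2: list) -> int:
    """Map an n-bit input x (n = n1 + n2) through two address translated
    look-up tables and return the n-bit output."""
    upper = x >> n2               # n1 upper bits select a first-table entry
    lower = x & ((1 << n2) - 1)   # n2 lower bits pass through unchanged
    first = lut1[upper]           # (n1 - 1)-bit first preset value
    # Intermediate value: (n1 - 1) + n2 = n - 1 bits.
    intermediate = (first << n2) | lower
    second = lut2[intermediate]   # (n2 + 1)-bit second preset value
    # Output: (n1 - 1) upper bits from table 1, (n2 + 1) lower bits from table 2,
    # for (n1 - 1) + (n2 + 1) = n output bits in total.
    return (first << (n2 + 1)) | second
```

With n1 = n2 = 2, the first table holds 4 one-bit entries and the second holds 8 three-bit entries; any concrete activation curve would be encoded by the values chosen for those entries.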
In an embodiment, the apparatus of implementing activation logic for the neural network further comprises a first decoder coupled to the input unit and the first address translated look-up table, the first decoder decoding n1 bits of the n bits to generate 2^n1 (two to the power of n1) bit addresses.
In an embodiment, the apparatus of implementing activation logic for the neural network further comprises a second decoder coupled to the intermediate storage unit and the second address translated look-up table, the second decoder decoding the (n−1) bits of the intermediate storage unit to generate 2^(n−1) (two to the power of (n−1)) bit addresses.
In an embodiment, in the input unit, n1 bits of the n bits are defined as upper bits of n1 input data values, and n2 bits are defined as lower bits of n2 input data values.
In an embodiment, the (n2+1) bits of each of the 2^(n−1) second entries of the second address translated look-up table comprise a bit that is defined as an indicator of a saturation point of the combination of the (n1−1) first preset values and the (n2+1) second preset values.
In an embodiment, in the output unit, the (n1−1) bits of the first address translated look-up table are disposed between the bit of the saturation point and the (n2+1) bits.
In an embodiment, the (n1−1) bits of the intermediate storage unit are defined as upper bits derived from the n1 input data values, and the n2 bits of the intermediate storage unit are defined as lower bits of the n2 input data values of the input unit.
In an embodiment, a memory size sum of a product of 2^n1 and (n1−1) bits of the first address translated look-up table and a product of 2^(n−1) and (n2+1) bits of the second address translated look-up table is less than a memory size of a product of 2^n and n bits.
In an embodiment, the product of 2^n1 and (n1−1) bits of the first address translated look-up table is less than the product of 2^(n−1) and (n2+1) bits of the second address translated look-up table.
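As a worked check of this memory comparison, take the illustrative split n = 8 with n1 = 4 and n2 = 4 (the particular values are an assumption for the example, not taken from the disclosure):

```python
n1, n2 = 4, 4
n = n1 + n2

single_table = (2 ** n) * n               # direct table: 256 entries x 8 bits
first_table  = (2 ** n1) * (n1 - 1)       # first table:   16 entries x 3 bits
second_table = (2 ** (n - 1)) * (n2 + 1)  # second table: 128 entries x 5 bits

print(single_table)                # 2048 bits
print(first_table + second_table)  # 48 + 640 = 688 bits

# Both relations stated in the embodiments hold for this split.
assert first_table < second_table
assert first_table + second_table < single_table
```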
In an embodiment, the n bits of the input unit further comprise a signed bit.
In an embodiment, when the signed bit is a negative signed bit, the n input data values of the n bits are represented as 2's complement.
In an embodiment, an apparatus of implementing activation logic for a neural network comprises: an input unit comprising n bits and n input data values that are stored in the n bits correspondingly, wherein n is defined as a sum of n1 and n2, and n, n1, and n2 are positive integers; a first address translated look-up table comprising 2^n1 first entries that map to 2^n1 bit addresses based on the n bits of the input unit, wherein each of the first entries comprises (n1−1) bits and (n1−1) first preset values stored in the (n1−1) bits correspondingly, and n1 input data values of the n bits of the input unit are mapped to the (n1−1) first preset values that are stored in one of the 2^n1 first entries of the first address translated look-up table; an intermediate storage unit coupled to the input unit and the first address translated look-up table, wherein the intermediate storage unit comprises (n−1) bits by combining the (n1−1) bits of the first address translated look-up table with n2 bits of the input unit, and comprises (n−1) intermediate data values by combining the (n1−1) first preset values of the first address translated look-up table with n2 input data values of the n bits of the input unit; and a second address translated look-up table comprising 2^(n−1) second entries that map to the 2^(n−1) bit addresses based on the (n−1) bits of the intermediate storage unit, wherein each of the 2^(n−1) second entries comprises (n2+1) bits and (n2+1) second preset values stored in the (n2+1) bits correspondingly, and the (n−1) intermediate data values of the (n−1) bits of the intermediate storage unit are mapped to the (n2+1) second preset values stored in one of the 2^(n−1) second entries of the second address translated look-up table, wherein the second address translated look-up table outputs n output data values that combine the (n1−1) first preset values and the (n2+1) second preset values.
Based on the above objective, the present disclosure sets forth a method of implementing activation logic for a neural network, the method comprising: inputting, by an input unit, n input data values to n bits, wherein the input data values are stored in the n bits correspondingly, n is defined as a sum of n1 and n2, and n, n1, and n2 are positive integers;
mapping, by a first address translated look-up table, 2^n1 first entries to 2^n1 bit addresses based on the n bits of the input unit, wherein each of the first entries comprises (n1−1) bits and (n1−1) first preset values stored in the (n1−1) bits correspondingly, and n1 input data values of the n bits of the input unit are mapped to the (n1−1) first preset values that are stored in one of the 2^n1 first entries of the first address translated look-up table;
combining, by an intermediate storage unit, the (n1−1) bits of the first address translated look-up table with n2 bits of the input unit;
combining, by the intermediate storage unit, the (n1−1) first preset values of the first address translated look-up table with n2 input data values of the n bits of the input unit;
mapping, by a second address translated look-up table, 2^(n−1) second entries to the 2^(n−1) bit addresses based on the (n−1) bits of the intermediate storage unit, wherein each of the 2^(n−1) second entries comprises (n2+1) bits and (n2+1) second preset values stored in the (n2+1) bits correspondingly;
mapping, by the second address translated look-up table, the (n−1) intermediate data values of the (n−1) bits of the intermediate storage unit to the (n2+1) second preset values stored in one of the 2^(n−1) second entries of the second address translated look-up table; and
combining, by an output unit, the (n1−1) bits of the first address translated look-up table with the (n2+1) bits of the second address translated look-up table for outputting n output data values by combining the (n1−1) first preset values and the (n2+1) second preset values.
The disclosure provides an apparatus of implementing activation logic for a neural network and a method thereof, such that the memory size of the neural network is reduced. The apparatus is configured to reduce the memory size of the address translated look-up tables by mapping input values and output values of the neural network in a multi-stage mode, such that the input data values and output data values are mapped while decreasing the memory size of the look-up tables when performing the activation logic of the neural network.
The following embodiments refer to the accompanying figures for exemplifying specific implementable embodiments of the present disclosure in a suitable computing environment. It should be noted that the described exemplary embodiments are configured to describe and help understand the present disclosure, but the present disclosure is not limited thereto. Directional terms, such as an upper side, a lower side, a front side, a back side, a left side, a right side, an inner side, an outer side, and a lateral side, mentioned in the present disclosure are only for reference. Therefore, the directional terms are used for describing and understanding rather than limiting the present disclosure. In the figures, units having similar structures are denoted by the same reference numerals.
As shown in
In
As shown in
In
In an embodiment, the first decoder 102 is coupled to the input unit 100 and the first address translated look-up table 104 and is configured to decode n1 bits of the n bits to generate 2^n1 (two to the power of n1) bit addresses. For example, if n1 is 4, the 2^n1 bit addresses are in a range from 0 to 15, corresponding to 16 first entries. In an embodiment, the second decoder 108 is coupled to the intermediate storage unit 106 and the second address translated look-up table 110 and is configured to decode the (n−1) bits of the intermediate storage unit 106 to generate 2^(n−1) (two to the power of (n−1)) bit addresses.
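A conventional binary-to-one-hot address decoder matches this description; the sketch below is an assumption about the decoder's behavior, with illustrative names, rather than a circuit taken from the disclosure.

```python
def decode(value: int, width: int) -> list:
    """Assert exactly one of 2**width address lines for a width-bit value."""
    return [1 if addr == value else 0 for addr in range(2 ** width)]

# Decoding the 4-bit value 5 raises address line 5 of the 16 lines,
# matching the n1 = 4 example with addresses 0 to 15.
lines = decode(5, 4)
print(lines.index(1), len(lines))  # 5 16
```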
In an embodiment, n1 bits of the n bits are defined as upper bits of n1 input data values of the input unit 100, and n2 bits are defined as lower bits of n2 input data values of the input unit 100.
In an embodiment, the (n2+1) bits of each of the 2^(n−1) second entries of the second address translated look-up table 110 include a bit that is defined as an indicator of a saturation point of the combination of the (n1−1) first preset values and the (n2+1) second preset values. In an embodiment, the (n1−1) bits of the first address translated look-up table are disposed between the bit of the saturation point and the (n2+1) bits in the output unit 112.
In an embodiment, the (n1−1) bits of the intermediate storage unit 106 are defined as upper bits derived from the n1 input data values, and the n2 bits of the intermediate storage unit 106 are defined as lower bits of the n2 input data values of the input unit 100.
In an embodiment, a memory size sum of a product of 2^n1 and (n1−1) bits of the first address translated look-up table 104 and a product of 2^(n−1) and (n2+1) bits of the second address translated look-up table 110 is less than a memory size of a product of 2^n and n bits.
In an embodiment, the n bits of the input unit 100 further include a signed bit. When the signed bit is a negative signed bit, the n input data values of the n bits are represented as 2's complement.
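The signed interpretation can be illustrated as follows; the helper name is hypothetical and the example assumes a standard two's-complement reading of the n-bit field.

```python
def from_twos_complement(bits: int, n: int) -> int:
    """Interpret an n-bit pattern as a signed two's-complement integer."""
    if bits & (1 << (n - 1)):      # signed bit set: negative value
        return bits - (1 << n)
    return bits

# With n = 8, the pattern 0b11111011 has its signed bit set and
# represents -5; the same magnitude with a clear signed bit is +5.
print(from_twos_complement(0b11111011, 8))  # -5
print(from_twos_complement(0b00000101, 8))  # 5
```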
In embodiments of the present disclosure, an apparatus of implementing activation logic for a neural network is configured to reduce the memory size of address translated look-up tables by mapping input data values and output data values of the neural network in a multi-stage mode (e.g., a first stage mode and a second stage mode). For example, the first address translated look-up table 104 is defined as a first stage mode, and the second address translated look-up table 110, which is coupled to the first address translated look-up table 104 via the intermediate storage unit 106 and the second decoder 108, is defined as a second stage mode, such that the input data values and output data values are mapped while decreasing the memory size of the look-up tables when performing the activation logic of the neural network.
In
In
At a step S500, inputting, by an input unit, n input data values to n bits is performed, wherein the input data values are stored in the n bits correspondingly, n is defined as a sum of n1 and n2, and n, n1, and n2 are positive integers. In one embodiment, decoding, by a first decoder, n1 bits of the n bits to generate 2^n1 (two to the power of n1) bit addresses is performed.
At a step S502, mapping, by a first address translated look-up table, 2^n1 first entries to 2^n1 bit addresses based on the n bits of the input unit is performed, wherein each of the first entries comprises (n1−1) bits and (n1−1) first preset values stored in the (n1−1) bits correspondingly, and n1 input data values of the n bits of the input unit are mapped to the (n1−1) first preset values that are stored in one of the 2^n1 first entries of the first address translated look-up table.
At a step S504, combining, by an intermediate storage unit, the (n1−1) bits of the first address translated look-up table with n2 bits of the input unit is performed.
At a step S506, combining, by the intermediate storage unit, the (n1−1) first preset values of the first address translated look-up table with n2 input data values of the n bits of the input unit is performed.
At a step S508, mapping, by a second address translated look-up table, 2^(n−1) second entries to the 2^(n−1) bit addresses based on the (n−1) bits of the intermediate storage unit is performed, wherein each of the 2^(n−1) second entries comprises (n2+1) bits and (n2+1) second preset values stored in the (n2+1) bits correspondingly. In one embodiment, decoding, by a second decoder, the (n−1) bits of the intermediate storage unit to generate 2^(n−1) (two to the power of (n−1)) bit addresses is performed.
At a step S510, mapping, by the second address translated look-up table, the (n−1) intermediate data values of the (n−1) bits of the intermediate storage unit to the (n2+1) second preset values stored in one of the 2^(n−1) second entries of the second address translated look-up table is performed.
At a step S512, combining, by an output unit, the (n1−1) bits of the first address translated look-up table with the (n2+1) bits of the second address translated look-up table for outputting n output data values by combining the (n1−1) first preset values and the (n2+1) second preset values is performed.
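Steps S500 to S512 can be traced end to end in a short sketch; the function name and the placeholder table contents are assumptions chosen only so that every bit width matches the description.

```python
def activation_steps(x: int, n1: int, n2: int, lut1: list, lut2: list) -> int:
    n = n1 + n2
    # S500: n input data values held in n bits.
    assert 0 <= x < 2 ** n
    # S502: the n1 upper bits address one of 2**n1 first entries.
    first = lut1[x >> n2]
    # S504/S506: combine the (n1 - 1) first preset bits with the n2 lower bits
    # to form the (n - 1)-bit intermediate value.
    intermediate = (first << n2) | (x & ((1 << n2) - 1))
    # S508/S510: the (n - 1) intermediate bits address one of 2**(n - 1)
    # second entries holding (n2 + 1) bits each.
    second = lut2[intermediate]
    # S512: concatenate (n1 - 1) bits with (n2 + 1) bits into the n-bit output.
    return (first << (n2 + 1)) | second

# With n1 = n2 = 2: 4 one-bit first entries and 8 three-bit second entries.
out = activation_steps(13, 2, 2, [0, 0, 1, 1], list(range(8)))
print(out)  # 13
```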
In one embodiment, in the input unit, n1 bits of the n bits are defined as upper bits of n1 input data values, and n2 bits are defined as lower bits of n2 input data values.
In one embodiment, the (n2+1) bits of each of the 2^(n−1) second entries of the second address translated look-up table comprise a bit that is defined as an indicator of a saturation point of the combination of the (n1−1) first preset values and the (n2+1) second preset values.
In one embodiment, in the output unit, the (n1−1) bits of the first address translated look-up table are disposed between the bit of the saturation point and the (n2+1) bits.
In one embodiment, the (n1−1) bits of the intermediate storage unit are defined as upper bits derived from the n1 input data values, and the n2 bits of the intermediate storage unit are defined as lower bits of the n2 input data values of the input unit.
In one embodiment, a memory size sum of a product of 2^n1 and (n1−1) bits of the first address translated look-up table and a product of 2^(n−1) and (n2+1) bits of the second address translated look-up table is less than a memory size of a product of 2^n and n bits.
In one embodiment, the product of 2^n1 and (n1−1) bits of the first address translated look-up table is less than the product of 2^(n−1) and (n2+1) bits of the second address translated look-up table.
In one embodiment, the n bits of the input unit further comprise a signed bit.
In one embodiment, when the signed bit is a negative signed bit, the n input data values of the n bits are represented as 2's complement.
In one embodiment, an apparatus of implementing activation logic for a neural network includes a processor and a memory, wherein the memory is configured to store executable program instructions, and the processor is configured to execute the executable program instructions to perform the above-mentioned steps S500 to S512.
In the description of the present disclosure, reference to the terms "one embodiment", "certain embodiments", "exemplary embodiments", "some embodiments", "examples", "specific examples", or "some examples" and the like is intended to indicate that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In the present disclosure, the schematic expressions of these terms do not necessarily refer to the same embodiment or example. Moreover, the described specific features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples of the present disclosure. The actions of the method disclosed by the embodiments of the present disclosure can be executed directly by a hardware decoding processor, or by combinations of hardware and software codes in a decoding processor. The software codes can be stored in a storage medium selected from the group consisting of a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, and registers. The processor reads information (e.g., instructions) in the memory and completes the above-mentioned actions of the method in combination with its hardware.
According to the above-mentioned descriptions, the disclosure provides an apparatus of implementing activation logic for a neural network and a method thereof, such that the memory size of the neural network is reduced. The apparatus is configured to reduce the memory size of the address translated look-up tables by mapping input values and output values of the neural network in a multi-stage mode, such that the input data values and output data values are mapped while decreasing the memory size of the look-up tables when performing the activation logic of the neural network.
As is understood by a person skilled in the art, the foregoing preferred embodiments of the present disclosure are illustrative rather than limiting of the present disclosure. It is intended that the present disclosure cover various modifications and similar arrangements included within the spirit and scope of the present disclosure, the scope of which should be accorded the broadest interpretation so as to encompass all such modifications and similar structures.
This application is a National Phase of PCT Patent Application No. PCT/CN2019/087299 having International filing date of May 16, 2019, which claims the benefit of priority to U.S. Provisional Application No. 62/756,095, filed Nov. 6, 2018. The contents of the above applications are all incorporated by reference as if fully set forth herein in their entirety.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/CN2019/087299 | 5/16/2019 | WO |

Publishing Document | Publishing Date | Country | Kind
---|---|---|---
WO2020/093676 | 5/14/2020 | WO | A

Number | Name | Date | Kind
---|---|---|---
7685355 | Bond | Mar 2010 | B2
7752417 | Manczak et al. | Jul 2010 | B2
8074050 | Fan | Dec 2011 | B2
20100318761 | Moyer et al. | Dec 2010 | A1
20120030203 | Chiang | Feb 2012 | A1
20140067739 | Hombs | Mar 2014 | A1
20170344492 | Bolbenes et al. | Nov 2017 | A1

Number | Date | Country
---|---|---
101661438 | Mar 2010 | CN
104333435 | Feb 2015 | CN
2601073 | May 2022 | GB

Entry
---
Li et al., "For Activating a Function of Depth Neural Network", published on Jul. 12, 2019, Document ID: CN-110009092-A, pp. 25 (Year: 2019).

Number | Date | Country
---|---|---
20220004850 A1 | Jan 2022 | US

Number | Date | Country
---|---|---
62756095 | Nov 2018 | US