The present disclosure relates to an electronic device for performing neural network computation, and more particularly, to an electronic device for performing a neural network convolution operation based on a Winograd transform and an operating method of the electronic device.
With the development of neural network technology, research has been actively conducted on a technique of extracting valid information from input data by performing neural network convolution operations in various types of systems. Recently, there have been discussions regarding techniques that enable an electronic device to directly perform a convolution operation in a neural network by using edge computing technology. As a convolution operation occupies a significant part of computations required in a neural network model, a neural network model for performing a convolution operation in an electronic device and extracting information needs to be lightweight, and an electronic device capable of efficiently processing the convolution operation in the lightweight neural network model is required.
Various embodiments are to provide an electronic device and operation method thereof for transforming pieces of data to a Winograd domain and performing convolution operations using different types of low-precision multiply-accumulate (MAC) units according to statistical characteristics of the transformed pieces of data.
According to an aspect of the present disclosure, an electronic device for performing a convolution operation may be provided. The electronic device may include an input feature map transformer configured to transform an input feature map (IFM) to a Winograd domain, a weight kernel transformer configured to transform a weight kernel to the Winograd domain, a transformation data processor configured to map a plurality of feature value groups, created by grouping feature values in a plurality of channels of the transformed input feature map, and a plurality of weight value groups, created by grouping weight values in a plurality of channels of the transformed weight kernel, to a plurality of types of multiply-accumulate (MAC) units included in a computation unit, the plurality of types of MAC units being configured to perform a MAC operation between a weight value group of the plurality of weight value groups and a feature value group of the plurality of feature value groups, a computation data processor configured to collect MAC operation results from the computation unit, and an inverse transformer configured to perform an inverse Winograd transform on a result output from the computation data processor according to the collected MAC operation results to thereby generate an output feature map (OFM) useable for performing a convolution operation.
The computation unit may include at least one type of MAC unit from among a first type MAC unit, a second type MAC unit, and a third type MAC unit included in the plurality of types of MAC units, and the transformation data processor may map the plurality of feature value groups and the plurality of weight value groups to the at least one type of MAC unit from among the first type MAC unit, the second type MAC unit, and the third type MAC unit included in the plurality of types of MAC units.
The transformation data processor may map, based on a pre-generated mapping table, the plurality of feature value groups and the plurality of weight value groups to the at least one type of MAC unit from among the first type MAC unit, the second type MAC unit, and the third type MAC unit included in the plurality of types of MAC units.
The pre-generated mapping table may be generated to map the plurality of feature value groups and the plurality of weight value groups to corresponding types of MAC units to perform MAC operations, based on a frequency distribution calculated for each of the plurality of feature value groups and a frequency distribution calculated for each of the plurality of weight value groups.
The pre-generated mapping table may be generated to map the plurality of feature value groups and the plurality of weight value groups to corresponding types of MAC units to perform MAC operations, based on at least one of an input feature map sensitivity matrix representing statistical characteristics of the transformed input feature map, a weight kernel sensitivity matrix representing statistical characteristics of the transformed weight kernel, and an output feature map sensitivity matrix representing statistical characteristics of the output feature map.
The first type MAC unit may include a plurality of multiplier units, and an accumulator configured to accumulate and add outputs respectively from the plurality of multiplier units, and each of the plurality of multiplier units may include a first shifter configured to receive a first fixed-point number and perform a right shift operation, a second shifter configured to receive a second fixed-point number and perform a right shift operation, a multiplier configured to receive an output of the first shifter and an output of the second shifter and perform a multiplication operation between the output of the first shifter and the output of the second shifter, and a restoration shifter configured to receive an output of the multiplier and restore a bit length by performing a left shift operation on the output of the multiplier.
The first shifter included in each of the plurality of multiplier units of the first type MAC unit may, when a bit length of the first fixed-point number exceeds a preset first bit length, reduce the bit length of the first fixed-point number by shifting the first fixed-point number to the right by a bit length in excess of the preset first bit length, the second shifter included in each of the plurality of multiplier units of the first type MAC unit may, when a bit length of the second fixed-point number exceeds a preset second bit length, reduce the bit length of the second fixed-point number by shifting the second fixed-point number to the right by a bit length in excess of the preset second bit length, and the restoration shifter included in each of the plurality of multiplier units of the first type MAC unit may restore the bit length of the first fixed-point number by shifting the output of the multiplier to the left by a sum of the bit length of the first fixed-point number by which the first shifter shifts the first fixed-point number to the right and the bit length of the second fixed-point number by which the second shifter shifts the second fixed-point number to the right.
The first fixed-point number input to the first shifter of the first type MAC unit and the second fixed-point number input to the second shifter of the first type MAC unit may be input by exchanging, based on a value of the first fixed-point number, the first bit length, a value of the second fixed-point number, and the second bit length, the value of the first fixed-point number and the value of the second fixed-point number.
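As a minimal sketch of the first type MAC unit described above, the following Python function models each multiplier unit as two excess-based right shifts, a multiplication, and a left shift by the sum of the two shift amounts, followed by accumulation. The names `excess_shift` and `first_type_mac`, the preset bit lengths, and the sample values are assumptions for illustration only and do not appear in the disclosure; plain non-negative integers stand in for fixed-point numbers, and the input exchange described above is not modeled.

```python
def excess_shift(value, preset_bits):
    """Shift right only by the number of bits in excess of preset_bits."""
    excess = max(0, value.bit_length() - preset_bits)
    return value >> excess, excess


def first_type_mac(first_values, second_values, first_bits, second_bits):
    """Sketch of a first type MAC unit: restoration happens per multiplier."""
    acc = 0
    for a, b in zip(first_values, second_values):
        a_short, s1 = excess_shift(a, first_bits)    # first shifter
        b_short, s2 = excess_shift(b, second_bits)   # second shifter
        product = a_short * b_short                  # low-precision multiplier
        acc += product << (s1 + s2)                  # restoration shifter
    return acc


# Illustrative values only (not taken from the disclosure).
approx = first_type_mac([182, 13, 300], [97, 260, 5], 6, 6)
exact = sum(a * b for a, b in zip([182, 13, 300], [97, 260, 5]))
print(approx, exact)  # 22088 22534
```

The difference between the two printed values is the truncation error introduced by discarding the low-order bits before multiplication.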
The second type MAC unit may include a plurality of multiplier units, an accumulator configured to accumulate and add outputs respectively from the plurality of multiplier units, and a second restoration shifter configured to receive an output of the accumulator and restore a bit length by performing a left shift operation on the output of the accumulator, and each of the plurality of multiplier units may include a first shifter configured to receive a first fixed-point number and perform a right shift operation, a second shifter configured to receive a second fixed-point number and perform a right shift operation, a multiplier configured to receive an output of the first shifter and an output of the second shifter and perform a multiplication operation between the output of the first shifter and the output of the second shifter, and a first restoration shifter configured to receive an output of the multiplier and increase a bit length by performing a left shift operation on the output of the multiplier.
The first shifter included in each of the plurality of multiplier units of the second type MAC unit may reduce a bit length of the first fixed-point number by shifting the first fixed-point number to the right by a preset first bit length, the second shifter included in each of the plurality of multiplier units of the second type MAC unit may, when a bit length of the second fixed-point number exceeds a preset second bit length, reduce the bit length of the second fixed-point number by shifting the second fixed-point number to the right by a bit length in excess of the preset second bit length, the first restoration shifter included in each of the plurality of multiplier units of the second type MAC unit may increase the bit length by shifting the output of the multiplier to the left by the number of bits by which the second shifter shifts the second fixed-point number to the right, and the second restoration shifter of the second type MAC unit may restore the bit length by shifting the output of the accumulator of the second type MAC unit to the left by the preset first bit length.
The preset first bit length may be determined for each of the plurality of weight value groups, based on a maximum value and a minimum value of weight values included in each of the plurality of weight value groups.
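The second type MAC unit described above can be sketched in Python as follows. The weight-side shift is the same fixed amount for every element, so it is restored once after accumulation, while the feature-side shift depends on each value and is restored per multiplier. The function name `second_type_mac`, the parameter names, and the sample values are assumptions for illustration, and non-negative integers stand in for fixed-point numbers.

```python
def second_type_mac(weights, features, weight_shift, feature_bits):
    """Sketch of a second type MAC unit.

    weight_shift: preset first bit length (fixed right shift for every weight).
    feature_bits: preset second bit length (features are shifted only by the excess).
    """
    acc = 0
    for w, f in zip(weights, features):
        w_short = w >> weight_shift                      # first shifter (fixed)
        excess = max(0, f.bit_length() - feature_bits)
        f_short = f >> excess                            # second shifter (excess-based)
        acc += (w_short * f_short) << excess             # per-multiplier restoration
    return acc << weight_shift                           # restoration after the accumulator


# Illustrative values only (not taken from the disclosure).
print(second_type_mac([182, 13, 300], [97, 260, 5], 2, 6))  # 21852
```

Because the weight-side shift is common to all products, moving its restoration behind the accumulator replaces one shifter per multiplier with a single shifter per MAC unit.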
The third type MAC unit may include a plurality of multiplier units, an accumulator configured to accumulate and add outputs respectively from the plurality of multiplier units, and a restoration shifter configured to receive an output of the accumulator and restore a bit length by performing a left shift operation, and each of the plurality of multiplier units may include a first shifter configured to receive a first fixed-point number and perform a right shift operation, a second shifter configured to receive a second fixed-point number and perform a right shift operation, and a multiplier configured to receive an output of the first shifter and an output of the second shifter and perform a multiplication operation between the output of the first shifter and the output of the second shifter.
The first shifter included in each of the plurality of multiplier units of the third type MAC unit may reduce a bit length of the first fixed-point number by shifting the first fixed-point number to the right by a preset first bit length, the second shifter included in each of the plurality of multiplier units of the third type MAC unit may reduce a bit length of the second fixed-point number by shifting the second fixed-point number to the right by a preset second bit length, and the restoration shifter of the third type MAC unit may restore the bit length by shifting the output of the accumulator to the left by a sum of the first bit length and the second bit length.
The preset first bit length may be determined for each of the plurality of weight value groups, based on a maximum value and a minimum value of weight values included in each of the plurality of weight value groups, and the preset second bit length may be determined for each of the plurality of feature value groups, based on a maximum value and a minimum value of feature values included in each of the plurality of feature value groups.
The preset first bit length may be determined for each of the plurality of weight value groups, based on a preset maximum threshold and a preset minimum threshold of weight values included in each of the plurality of weight value groups, and the preset second bit length may be determined for each of the plurality of feature value groups, based on a maximum value and a minimum value of feature values included in each of the plurality of feature value groups.
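For comparison, the third type MAC unit described above can be sketched as below: both operands are shifted right by preset amounts, the shortened products are accumulated, and a single left shift by the sum of the two preset amounts restores the bit length. The name `third_type_mac` and the sample shift amounts are assumptions for illustration; how the preset bit lengths are derived from the maximum and minimum values of a group is not modeled here.

```python
def third_type_mac(weights, features, weight_shift, feature_shift):
    """Sketch of a third type MAC unit: one restoration shift after accumulation."""
    acc = 0
    for w, f in zip(weights, features):
        # First and second shifters both apply fixed, preset right shifts.
        acc += (w >> weight_shift) * (f >> feature_shift)
    # Restoration shifter: left shift by the sum of the two preset bit lengths.
    return acc << (weight_shift + feature_shift)


# Illustrative values only (not taken from the disclosure).
print(third_type_mac([182, 13, 300], [97, 260, 5], 2, 3))  # 20352
```

In this sketch small operands such as the feature value 5 are shifted to zero, which illustrates why the preset bit lengths are tied to the value range of each group.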
According to an aspect of the present disclosure, an operation method of an electronic device for performing a convolution operation may be provided. The operation method may include transforming an input feature map (IFM) to a Winograd domain, transforming a weight kernel to the Winograd domain, creating a plurality of feature value groups by grouping feature values at the same coordinates in a plurality of channels of the transformed input feature map, creating a plurality of weight value groups by grouping weight values at the same coordinates in a plurality of channels of the transformed weight kernel, mapping the plurality of feature value groups and the plurality of weight value groups to a plurality of types of MAC units included in the electronic device, outputting a MAC operation value by performing a MAC operation between each of the plurality of feature value groups and a corresponding one of the plurality of weight value groups, generating a transformed output feature map by collecting the output MAC operation values, and performing an inverse Winograd transform on the generated transformed output feature map to thereby generate an output feature map (OFM) useable for performing a convolution operation.
According to an aspect of the present disclosure, a non-transitory computer-readable recording medium having recorded thereon a program to execute a method, performed by an electronic device, of performing a convolution operation may be provided. The method stored in the computer-readable recording medium may include transforming an input feature map (IFM) to a Winograd domain, transforming a weight kernel to the Winograd domain, creating a plurality of feature value groups by grouping feature values at the same coordinates in a plurality of channels of the transformed input feature map, creating a plurality of weight value groups by grouping weight values at the same coordinates in a plurality of channels of the transformed weight kernel, mapping the plurality of feature value groups and the plurality of weight value groups to a plurality of types of MAC units included in the electronic device, outputting a MAC operation value by performing a MAC operation between each of the plurality of feature value groups and a corresponding one of the plurality of weight value groups, generating a transformed output feature map by collecting the output MAC operation values, and performing an inverse Winograd transform on the generated transformed output feature map to thereby generate an output feature map (OFM) useable for performing a convolution operation.
The above and other aspects, features, and advantages of specific embodiments of the present disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings.
Terms used in the present specification will now be briefly described and then an embodiment of the present disclosure will be described in detail.
The terms used in the present specification may be general terms currently widely used in the art based on functions described in the present disclosure, but may be changed according to an intention of a technician engaged in the art, precedent cases, advent of new technologies, etc. Furthermore, some particular terms may be arbitrarily selected by the applicant, and in this case, the meaning of the selected terms will be described in detail in the detailed description of the disclosure. Thus, the terms used herein should be defined not by simple appellations thereof but based on the meaning of the terms together with the overall description of the present disclosure.
Throughout the specification, when a part “includes” or “comprises” an element, unless there is a particular description contrary thereto, it is understood that the part may further include other elements, not excluding the other elements. Furthermore, terms such as “portion,” “module,” etc. used herein indicate a unit for processing at least one function or operation and may be implemented as hardware or software or a combination of hardware and software.
Embodiments of the present disclosure will be described more fully hereinafter with reference to the accompanying drawings so that they may be easily implemented by one of ordinary skill in the art. However, the present disclosure may be implemented in different forms and should not be construed as being limited to the embodiments set forth herein. In addition, parts not related to descriptions of the present disclosure are omitted to clearly explain the present disclosure in the drawings, and like reference numerals denote like elements throughout.
Methods by which a shifter shifts input values to the right according to the disclosed embodiments may include two types of shifting methods.
A first shifting method refers to a shifting method for, when a bit length of input data exceeds a preset bit length, reducing the bit length by shifting the input data to the right by a bit length in excess of the preset bit length.
A second shifting method refers to a shifting method for, regardless of a bit length of input data, reducing the bit length by shifting the input data to the right by a preset bit length.
Data having a bit length reduced by shifting the data to the right using the first shifting method or the second shifting method may be processed into data having a bit length restored by shifting, via a restoration shifter, the data to the left by the number of bits by which the original bit length is reduced.
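The two shifting methods and the restoration step can be illustrated with a short Python sketch. The function names `shift_excess`, `shift_fixed`, and `restore`, as well as the sample values, are assumptions for illustration; non-negative integers stand in for fixed-point data, and each shifter returns the shift amount so that the restoration shifter knows how far to shift back.

```python
def shift_excess(value, preset_bits):
    """First shifting method: shift right only by the bits in excess of preset_bits."""
    excess = max(0, value.bit_length() - preset_bits)
    return value >> excess, excess


def shift_fixed(value, preset_shift):
    """Second shifting method: always shift right by preset_shift bits."""
    return value >> preset_shift, preset_shift


def restore(value, shift):
    """Restoration shifter: shift left by the number of bits that were removed."""
    return value << shift


short, shift = shift_excess(182, 6)   # 182 needs 8 bits, so 2 bits are dropped
print(short, restore(short, shift))   # 45 180

short, shift = shift_fixed(182, 3)    # 3 bits are dropped regardless of length
print(short, restore(short, shift))   # 22 176

print(shift_excess(13, 6))            # (13, 0): already fits, no shift applied
```

The restored values have the original bit length but not the discarded low-order bits, so 182 comes back as 180 or 176 in this example.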
Referring to
According to an embodiment, the electronic device 100 may include at least an input/output (I/O) interface 110, a main processor 112, a neural processor 114, and a memory 116.
In an embodiment, the I/O interface 110, the main processor 112, the neural processor 114, and the memory 116 included in the electronic device 100 may be implemented as a single semiconductor chip, e.g., as a system on chip (SoC). However, the present disclosure is not limited thereto, and components of the electronic device 100 may be composed of a plurality of semiconductor chips.
The I/O interface 110 may receive input data via user input or from the outside, and may output a result of data processing by the electronic device 100. For example, the I/O interface 110 may include a camera, a display, a touch screen panel, a keyboard, a plurality of sensors, etc. The plurality of sensors may include, for example, an image sensor, a light detection and ranging (LiDAR) sensor, an ultrasonic sensor, an infrared sensor, etc., but are not limited thereto. The I/O interface 110 may receive data (e.g., image data) from the outside of the electronic device 100, store the received data in the memory 116, or provide the received data to the main processor 112 or the neural processor 114. In addition, the I/O interface 110 may further include a communication interface for transmitting data to and receiving data from the outside.
The main processor 112 may perform all operations of the electronic device 100. The main processor 112 may execute one or more instructions of a program stored in a memory or a plurality of neural network models. According to an embodiment, the main processor 112 may be, but is not limited to, a central processing unit (CPU), and may include an application processor (AP), a graphics processing unit (GPU), etc.
The neural processor 114 may be a dedicated artificial intelligence (AI) processor or the like designed as a hardware structure specialized for processing a neural network model. The neural processor 114 may generate a neural network model, train the neural network model, or perform computation based on received input data by using the neural network model and generate output data. For example, the neural network model may include, but is not limited to, various types of neural network models, such as a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent DNN (BRDNN), or deep Q-networks (DQN). A neural network model 102 may be downloaded to the electronic device 100 from the outside and stored in the memory 116 of the electronic device. Furthermore, the neural network model 102 stored in the memory 116 may be updated.
The memory 116 may store various pieces of data, programs, or applications for operating and controlling the electronic device 100. A program stored in the memory 116 may include one or more instructions. A program (one or more instructions) or application stored in the memory 116 may be executed by the main processor 112. According to an embodiment, the memory 116 may include at least one type of storage medium among a flash memory-type memory, a hard disk-type memory, a multimedia card micro-type memory, a card-type memory (e.g., an SD card or an XD memory), random access memory (RAM), static RAM (SRAM), read-only memory (ROM), electrically erasable programmable ROM (EEPROM), PROM, a magnetic memory, a magnetic disk, and an optical disk.
The electronic device 100 may execute the neural network model 102 stored in the memory 116 and perform a convolution operation in the neural network model 102 by using the neural processor 114. In this case, the neural network model 102 may include a plurality of layers. The electronic device 100 may receive input data 101 and perform convolution operations in the plurality of layers included in the neural network model 102 to thereby obtain output data 103.
The neural processor 114 may include a convolution operator capable of performing convolution operations respectively performed in a plurality of layers included in the neural network model 102. In this case, the convolution operator may be configured to include, for example, a computational circuit including a plurality of multiplier-accumulators. In each layer 105 of the plurality of layers included in the neural network model 102, the neural processor 114 may receive data output from a previous layer, perform a convolution operation thereon, and output a result to a next layer. However, the present disclosure is not limited thereto, and convolution operations performed by the neural processor 114 may also be performed by another hardware device (e.g., a CPU, a GPU, or an application-specific integrated circuit (ASIC)) capable of performing convolution operations.
The electronic device 100 according to the disclosed embodiment may be a type of edge device capable of performing edge computing. However, the present disclosure is not limited thereto, and the electronic device may include hardware capable of performing convolution operations to execute a neural network model and process data.
Referring to
In this case, the output feature map OFM 270 generated as a result of the convolution operation may have a size of m×m. The weight kernel WK 210 may include a plurality of channels and have a size of r×r. The input feature map IFM 220 may include a plurality of channels and have a size of (m+r−1)×(m+r−1). The input feature map IFM 220 and the weight kernel WK 210 have the same number of channels.
As the Winograd transform is performed on each of the input feature map IFM 220 and the weight kernel WK 210, an input feature map 240 transformed to the Winograd domain and a weight kernel 230 transformed to the Winograd domain may be generated.
In addition, as a result of performing element-wise multiplications ⊙ between the transformed input feature map 240 and the transformed weight kernel 230, a plurality of output matrices 250 having a matrix structure in which the element-wise multiplications ⊙ have been performed for each channel may be obtained, and the plurality of output matrices 250 may be added together to obtain a transformed output feature map 260.
Furthermore, an output feature map OFM 270 may be obtained by performing an inverse Winograd transform on the transformed output feature map 260.
For convenience of description, the Winograd transform convolution is described in detail below by taking as an example an embodiment in which the weight kernel WK 210 has a size of 3×3, the output feature map OFM 270 has a size of 2×2, and the plurality of channels are 3 channels.
Referring back to
When element-wise multiplications ⊙ are performed on the transformed input feature map 240 and the transformed weight kernel 230 in the Winograd domain, the output matrices 250 having a 4×4 matrix structure for the three channels may be output, and operation results may be added element-wise to obtain the transformed output feature map 260 having a 4×4 matrix structure. In addition, the output feature map 270 having a 2×2 matrix structure may be obtained by performing an inverse Winograd transform on the transformed output feature map 260.
In the convolution operation based on the Winograd transform according to the example, which includes performing element-wise multiplications in the Winograd domain and performing an inverse Winograd transform to obtain an output feature map, the output feature map may be obtained with a reduced number of multiplication operations compared to when performing a convolution operation in a spatial domain while achieving the same result as in the convolution operation in the spatial domain.
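The equivalence stated above can be checked numerically with a small, dependency-free Python sketch for a 3×3 kernel, a 2×2 output, and three channels. The transform matrices `BT`, `G`, and `AT` below are the commonly used F(2×2, 3×3) Winograd matrices, which are an assumption here since the disclosure's own matrices are not reproduced in this passage; the function names and the test data are likewise illustrative. Per channel, the Winograd path uses 16 element-wise multiplications, whereas the direct computation uses 36.

```python
from fractions import Fraction as F

# Commonly used F(2x2, 3x3) transform matrices (assumed, see lead-in).
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[F(1), F(0), F(0)],
     [F(1, 2), F(1, 2), F(1, 2)],
     [F(1, 2), F(-1, 2), F(1, 2)],
     [F(0), F(0), F(1)]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def winograd_conv(ifm, wk):
    """ifm: channels of 4x4 tiles, wk: channels of 3x3 kernels -> 2x2 output."""
    acc = [[F(0)] * 4 for _ in range(4)]
    for d, g in zip(ifm, wk):
        v = matmul(matmul(BT, d), transpose(BT))   # transformed input feature map
        u = matmul(matmul(G, g), transpose(G))     # transformed weight kernel
        for i in range(4):
            for j in range(4):
                acc[i][j] += u[i][j] * v[i][j]     # element-wise product, summed over channels
    return matmul(matmul(AT, acc), transpose(AT))  # inverse Winograd transform


def direct_conv(ifm, wk):
    """Reference sliding-window computation in the spatial domain."""
    out = [[0] * 2 for _ in range(2)]
    for d, g in zip(ifm, wk):
        for i in range(2):
            for j in range(2):
                out[i][j] += sum(d[i + p][j + q] * g[p][q]
                                 for p in range(3) for q in range(3))
    return out


# Deterministic illustrative data for three channels.
ifm = [[[(c + 1) * (4 * i + j) - 5 for j in range(4)] for i in range(4)] for c in range(3)]
wk = [[[c - p + 2 * q for q in range(3)] for p in range(3)] for c in range(3)]
print(winograd_conv(ifm, wk) == direct_conv(ifm, wk))  # True
```

Exact rational arithmetic is used so that the comparison is not affected by rounding; a hardware implementation would instead operate on fixed-point values.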
The electronic device according to the disclosed embodiment may be designed with a structure suitable for performing the above-described Winograd-based convolution operation. In detail, the Winograd-based convolution operation may be performed by transforming pieces of data to the Winograd domain, grouping pieces of data transformed to the Winograd domain, and performing multiply-accumulate (MAC) operations by mapping the grouped pieces of data to a plurality of types of MAC units.
Referring to
According to an embodiment, the weight transformer 310 may receive weight kernels comprising a plurality of channels, and perform a Winograd transform operation thereon to generate transformed weight kernels, each having a matrix structure and including a plurality of channels.
For example, referring to
Referring back to
For example, referring to
The transformation data processor 330 may group weight values from a plurality of channels of the transformed weight kernels. For example, a plurality of weight values at the same matrix coordinates in a plurality of channels of the transformed weight kernels may be grouped so as to be reorganized into a plurality of weight value groups. Also, the transformation data processor 330 may group feature values from a plurality of channels of the transformed input feature maps. For example, feature values at the same matrix coordinates in a plurality of channels of transformed feature maps may be grouped so as to be reorganized into a plurality of feature value groups.
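The grouping step described above can be sketched as follows, assuming the transformed data is held as a list of per-channel matrices. The function name `group_by_coordinate` and the dictionary layout (matrix coordinate mapped to a list of per-channel values) are assumptions for illustration; the same function serves for both weight value groups and feature value groups.

```python
def group_by_coordinate(channels):
    """Group the values found at the same matrix coordinates across all channels.

    channels: list of equally sized 2-D lists, one per channel.
    Returns a dict mapping (row, col) to the list of values from every channel.
    """
    rows, cols = len(channels[0]), len(channels[0][0])
    return {(i, j): [ch[i][j] for ch in channels]
            for i in range(rows) for j in range(cols)}


# Three channels of a 2x2 matrix (illustrative values only).
transformed = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
groups = group_by_coordinate(transformed)
print(groups[(0, 1)])  # [2, 6, 10]
```

For 4×4 transformed matrices this yields 16 groups, each containing one value per channel.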
For example, referring to
According to an embodiment, by creating a plurality of weight value groups and a plurality of feature value groups, the transformation data processor 330 may output the plurality of weight value groups and the plurality of feature value groups to the statistical characteristics analyzer 380 in order to obtain statistical characteristics of values included in each of the groups. In addition, when element-wise MAC operations are performed on the created weight value groups and feature value groups, the transformation data processor 330 may assign each weight value group and each feature value group to a corresponding type of MAC unit based on a MAC unit mapping table, so that a MAC operation may be performed in a different way for each weight value group and each feature value group.
According to an embodiment, the transformation data processor 330 may output the plurality of weight value groups and the plurality of feature value groups to the statistical characteristics analyzer 380. The statistical characteristics analyzer 380 may analyze statistical characteristics of weight values included in each of the plurality of weight value groups and statistical characteristics of feature values included in each of the plurality of feature value groups. A method, performed by the statistical characteristics analyzer 380, of analyzing statistical characteristics of the weight value groups and the feature value groups is described in detail below in the description with respect to
According to an embodiment, the transformation data processor 330 may output the plurality of weight value groups and the plurality of feature value groups to the computation unit 340. In this case, the transformation data processor 330 may output the plurality of weight value groups and the plurality of feature value groups by mapping, based on a mapping table 370, the plurality of weight value groups and the plurality of feature value groups to MAC units included in the computation unit 340. In this case, the mapping table 370 may be pre-generated. Also, according to an example, the transformation data processor 330 may consist of a plurality of circuit elements that receive pieces of data from the weight transformer 310 and the input feature map transformer 320 and output processing results to the computation unit 340.
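One way to picture the mapping step is a lookup table keyed by group coordinate, as in the sketch below. The table contents, the type labels such as `"type1"`, and the function name `dispatch` are all hypothetical; the disclosure states only that a pre-generated mapping table is used, not how it is stored.

```python
from collections import defaultdict


def dispatch(weight_groups, feature_groups, mapping_table):
    """Queue each (weight group, feature group) pair for its assigned MAC type.

    All three arguments are dicts keyed by matrix coordinate (row, col).
    """
    queues = defaultdict(list)
    for coord, w_group in weight_groups.items():
        mac_type = mapping_table[coord]  # look up the MAC unit type for this group
        queues[mac_type].append((coord, w_group, feature_groups[coord]))
    return dict(queues)


# Illustrative data only (not taken from the disclosure).
weight_groups = {(0, 0): [1, 2, 3], (0, 1): [4, 5, 6]}
feature_groups = {(0, 0): [7, 8, 9], (0, 1): [1, 1, 1]}
mapping_table = {(0, 0): "type1", (0, 1): "type3"}

queues = dispatch(weight_groups, feature_groups, mapping_table)
print(queues["type3"])  # [((0, 1), [4, 5, 6], [1, 1, 1])]
```

Keeping the coordinate alongside each pair lets later stages place the resulting partial sum back at its original position.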
Referring back to
The computation unit 340 may include a plurality of circuit elements for performing computations. In detail, according to the disclosed embodiment, the computation unit 340 may include a plurality of types of MAC units having different structures.
According to the disclosed embodiment, a weight value and a feature value transformed to the Winograd domain may have greater bit lengths than a weight value and a feature value not transformed to the Winograd domain. Therefore, in order to reduce the area of hardware performing computations, a bit length of a transformed feature value or a transformed weight value may need to be reduced by using various types of MAC units. In this case, elements in an input weight value group and elements in an input feature value group may be truncated to a reduced bit length when input to a multiplier and restored to the original bit length after undergoing a multiplication operation.
Each of a plurality of MAC units included in the computation unit 340 may be used to perform a MAC operation by multiplying and adding elements of a weight value group and elements of a feature value group.
For example, referring to
According to an embodiment, an N-th type MAC unit 342 (e.g., the first type MAC unit) included in the computation unit 340 may receive a weight value group and a feature value group as an input and then multiply an element in the weight value group by a corresponding element in the feature value group for each channel. The computation unit 340 may cumulatively add results of performing element-wise multiplication operations channel by channel to thereby output a partial sum that is a result of a MAC operation between the weight value group and the feature value group.
Referring back to
The computation data processor 350 may generate a transformed output feature map by using MAC partial sums calculated by the plurality of MAC units. For example, the computation data processor 350 may collect partial sums that are results of performing MAC operations between weight value groups and corresponding feature value groups, and reorganize the resulting values to have the same coordinates as before the weight value groups and the feature value groups are mapped to the computation unit 340. Therefore, the computation data processor 350 may generate a transformed output feature map having the same matrix structure as before the transformed input feature maps are input to the computation unit 340. According to an example, the computation data processor 350 may include a plurality of circuit elements that generate the transformed output feature map.
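The collection step can be sketched as below: each group pair produces one partial sum, and the partial sums are written back to the coordinates the groups came from. The names `mac` and `collect_partial_sums` and the sample data are assumptions for illustration, and an exact multiply-accumulate is used in place of the low-precision MAC units to keep the sketch short.

```python
def mac(weight_group, feature_group):
    """Exact multiply-accumulate over one weight value group and one feature value group."""
    return sum(w * f for w, f in zip(weight_group, feature_group))


def collect_partial_sums(partial_sums, rows, cols):
    """Place each partial sum back at the coordinate of the groups it came from."""
    out = [[0] * cols for _ in range(rows)]
    for (i, j), value in partial_sums.items():
        out[i][j] = value
    return out


# Illustrative 2x2 example with three channels per group.
weight_groups = {(0, 0): [1, 2, 3], (0, 1): [4, 5, 6], (1, 0): [0, 1, 0], (1, 1): [2, 2, 2]}
feature_groups = {(0, 0): [7, 8, 9], (0, 1): [1, 1, 1], (1, 0): [5, 6, 7], (1, 1): [1, 2, 3]}

partial_sums = {c: mac(weight_groups[c], feature_groups[c]) for c in weight_groups}
print(collect_partial_sums(partial_sums, 2, 2))  # [[50, 15], [6, 12]]
```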
According to an embodiment, the inverse transformer 360 may be connected to the computation data processor 350 and receive the transformed output feature map generated by the computation data processor 350.
For example, referring to
In describing
Referring to
The transformation data processor 330 may map, based on the mapping table 370, the created plurality of weight value groups and the created plurality of feature value groups to a plurality of types of MAC units included in the computation unit 340.
According to an embodiment, each of the plurality of types of MAC units included in the computation unit 340 may receive a weight value group and a feature value group as an input and output a result of performing a MAC operation thereon.
For example, an N-th type MAC unit 430, which is one of a plurality of MAC units, may receive, as an input, a weight value group 432 among the plurality of weight value groups and a feature value group 434 corresponding to the weight value group 432 and perform a MAC operation therebetween to output a partial sum 436.
According to an embodiment, the computation data processor 350 may collect partial sums output by performing MAC operations between weight value groups and corresponding feature value groups, and reorganize the collected partial sums to generate a transformed output feature map 440.
In describing
According to an embodiment, the pre-generated mapping table 370 may be obtained by the statistical characteristics analyzer 380 based on a transformed weight kernel, a transformed input feature map, and a transformed output feature map, which are obtained via transformation to the Winograd domain and have a matrix structure.
Referring to
In detail, for a matrix BTIB representing the transformed input feature map 240, a matrix GWGT representing the transformed weight kernel 230, and a matrix ATXA representing the output feature map 270, a plurality of coefficient matrices CBi,j, CGi,j, and CAi,j may be respectively calculated using predefined computation processes. A method of calculating the plurality of coefficient matrices CBi,j, CGi,j, and CAi,j is described in detail in the description with reference to
In operation S520, an input feature map sensitivity matrix SB, a weight kernel sensitivity matrix SG, and an output feature map sensitivity matrix SA may be respectively calculated based on the input feature map coefficient matrices CBi,j, the weight kernel coefficient matrices CGi,j, and the output feature map coefficient matrices CAi,j. A method of calculating the plurality of sensitivity matrices SB, SG, and SA based on the plurality of coefficient matrices CBi,j, CGi,j, and CAi,j is described in detail in the description with reference to
In operation S530, the statistical characteristics analyzer 380 according to an embodiment may generate, based on at least one of the input feature map sensitivity matrix, the weight kernel sensitivity matrix, and the output feature map sensitivity matrix, a mapping table that maps MAC unit types to correspond to sensitivity map elements, and provide the mapping table to the electronic device 300.
Referring to
[BTIB]i,j=ΣΣ([CBi,j]⊙[I]) [Equation 1]
In Equation 1, BTIB denotes a transformed input feature map matrix, CBi,j denotes an input feature map coefficient matrix, and I denotes an input feature map matrix.
A matrix element [BTIB]i,j at coordinates (i,j) in the transformed input feature map matrix BTIB may be obtained based on the input feature map coefficient matrix CBi,j and the input feature map matrix I. Specifically, [BTIB]i,j may be obtained by performing an element-wise multiplication operation ⊙ between an input feature map coefficient matrix CBi,j corresponding to the coordinates (i, j) and the input feature map matrix I and then adding together all matrix elements of the resulting matrix. For example, in the transformed input feature map matrix V(=BTIB), a matrix element V1,1 at coordinates (1,1) may be obtained by performing an element-wise multiplication ⊙ between an input feature map coefficient matrix CB1,1 corresponding to the coordinates (1,1) and the input feature map matrix I and then adding all matrix elements of the resulting matrix.
In addition, because matrix elements in the input feature map coefficient matrix CBi,j that satisfies Equation 1 are elements that affect the matrix element [BTIB]i,j at coordinates (i, j) in the transformed input feature map matrix BTIB, an input feature map sensitivity matrix SB may be defined as in Equation 2 based on the input feature map coefficient matrix CBi,j.
[SB]i,j=ΣmΣn|[CBi,j]m,n| [Equation 2]
In Equation 2, SB denotes an input feature map sensitivity matrix, and CBi,j denotes an input feature map coefficient matrix.
The matrix element [SB]i,j at coordinates (i, j) in the input feature map sensitivity matrix SB may be obtained by adding together absolute values of all matrix elements of the input feature map coefficient matrix CBi,j corresponding to coordinates (i, j).
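As a non-limiting illustration, Equations 1 and 2 may be checked numerically with the following Python sketch. The matrix `B_T` is the input transform commonly used for a Winograd F(2×2, 3×3) convolution and is an assumption here; the disclosure may use a different transform, so the resulting sensitivity values need not match those of the figures. The helper names are hypothetical, and indices are 0-based in the code.

```python
# Assumed input transform B^T for F(2x2, 3x3); not taken from the disclosure.
B_T = [
    [1, 0, -1, 0],
    [0, 1, 1, 0],
    [0, -1, 1, 0],
    [0, 1, 0, -1],
]


def coefficient_matrix(b_t, i, j):
    """Coefficient matrix for output coordinates (i, j): entry (m, n) is B^T[i][m] * B^T[j][n]."""
    size = len(b_t[0])
    return [[b_t[i][m] * b_t[j][n] for n in range(size)] for m in range(size)]


def transformed_element(coeff, ifm):
    """Equation 1: sum of all entries of the element-wise product of coeff and ifm."""
    return sum(c * x for c_row, x_row in zip(coeff, ifm) for c, x in zip(c_row, x_row))


def sensitivity_element(coeff):
    """Equation 2: sum of the absolute values of all entries of coeff."""
    return sum(abs(c) for row in coeff for c in row)


def matmul(a, b):
    """Plain matrix product, used only to cross-check Equation 1."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


ifm = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
direct = matmul(matmul(B_T, ifm), transpose(B_T))  # B^T · I · B computed directly
via_coefficients = [[transformed_element(coefficient_matrix(B_T, i, j), ifm)
                     for j in range(4)] for i in range(4)]
print(direct == via_coefficients)
```

The comparison confirms that summing the element-wise product with each coefficient matrix reproduces the corresponding element of the directly computed transform, which is the relation Equation 1 expresses.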
Referring to
[GWGT]i,j=ΣΣ([CGi,j]⊙[W]) [Equation 3]
In Equation 3, GWGT denotes a transformed weight kernel matrix, CGi,j denotes a weight kernel coefficient matrix, and W denotes a weight kernel matrix.
A matrix element [GWGT]i,j at coordinates (i,j) in the transformed weight kernel matrix GWGT may be obtained based on the weight kernel coefficient matrix CGi,j and the weight kernel matrix W. Specifically, [GWGT]i,j may be obtained by performing an element-wise multiplication operation ⊙ between a weight kernel coefficient matrix CGi,j corresponding to the coordinates (i, j) and the weight kernel matrix W and then adding together all matrix elements of the resulting matrix. For example, in the transformed weight kernel matrix U(=GWGT), a matrix element U1,1 at coordinates (1,1) may be obtained by performing an element-wise multiplication ⊙ between a weight kernel coefficient matrix CG1,1 corresponding to the coordinates (1,1) and the weight kernel matrix W and then adding all matrix elements of the resulting matrix.
In addition, because matrix elements in the weight kernel coefficient matrix CGi,j that satisfies Equation 3 are elements that affect the matrix element [GWGT]i,j at coordinates (i, j) in the transformed weight kernel matrix GWGT, a weight kernel sensitivity matrix SG may be defined as in Equation 4 based on the weight kernel coefficient matrix CGi,j.
[SG]i,j=ΣmΣn|[CGi,j]m,n| [Equation 4]
In Equation 4, SG denotes a weight kernel sensitivity matrix, and CGi,j denotes a weight kernel coefficient matrix.
The matrix element [SG]i,j at coordinates (i, j) in the weight kernel sensitivity matrix SG may be obtained by adding together absolute values of all matrix elements of the weight kernel coefficient matrix CGi,j corresponding to coordinates (i, j).
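As a non-limiting illustration, the weight kernel sensitivity matrix of Equation 4 may be computed as in the following Python sketch. The matrix `G` is the weight transform commonly used for a Winograd F(2×2, 3×3) convolution and is assumed here for illustration; the function name is hypothetical, and indices are 0-based in the code.

```python
# Assumed weight transform G (4x3) for F(2x2, 3x3); not taken from the disclosure.
G = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5],
    [0.0, 0.0, 1.0],
]


def weight_sensitivity_matrix(g):
    """Equation 4: S_G[i][j] is the sum of |C^G_{i,j}[m][n]| over all (m, n)."""
    rows, cols = len(g), len(g[0])
    sensitivity = [[0.0] * rows for _ in range(rows)]
    for i in range(rows):
        for j in range(rows):
            # Entry (m, n) of the coefficient matrix C^G_{i,j} is G[i][m] * G[j][n].
            sensitivity[i][j] = sum(abs(g[i][m] * g[j][n])
                                    for m in range(cols) for n in range(cols))
    return sensitivity


S_G = weight_sensitivity_matrix(G)
for row in S_G:
    print(row)
```

Under this assumed `G`, the four corner elements come out smallest, the four central elements largest, and the remaining edge elements in between.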
Referring to
[ATXA]i,j=ΣΣ([CAi,j]⊙[X]) [Equation 5]
In Equation 5, ATXA denotes an output feature map matrix, CAi,j denotes an output feature map coefficient matrix, and X denotes a transformed output feature map matrix.
A matrix element [ATXA]i,j at coordinates (i, j) in the output feature map matrix ATXA may be obtained based on the output feature map coefficient matrix CAi,j and the transformed output feature map matrix X. Specifically, [ATXA]i,j may be obtained by performing an element-wise multiplication operation ⊙ between an output feature map coefficient matrix CAi,j corresponding to the coordinates (i, j) and the transformed output feature map matrix X and then adding together all matrix elements of the resulting matrix. For example, in the output feature map matrix Y(=ATXA), a matrix element Y1,1 at coordinates (1,1) may be obtained by performing an element-wise multiplication ⊙ between an output feature map coefficient matrix CA1,1 corresponding to the coordinates (1,1) and the transformed output feature map matrix X and then adding all matrix elements of the resulting matrix.
In addition, because matrix elements in the output feature map coefficient matrix CAi,j that satisfies Equation 5 are elements that affect the matrix element [ATXA]i,j at coordinates (i, j) in the output feature map matrix ATXA, an output feature map sensitivity matrix may be defined as in Equation 6 based on the output feature map coefficient matrix CAi,j.
[SA]i,j=(ΣmΣn|[CAm,n]|)i,j [Equation 6]
In Equation 6, SA denotes an output feature map sensitivity matrix, and CAm,n denotes an output feature map coefficient matrix.
The matrix element [SA]i,j at coordinates (i, j) in the output feature map sensitivity matrix SA may be a matrix element at coordinates (i, j) in a matrix obtained by adding together absolute values of all output feature map coefficient matrices.
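As a non-limiting illustration, Equation 6 differs from Equations 2 and 4 in that the absolute values of all output feature map coefficient matrices are first added element by element, and the element at coordinates (i, j) of that sum is then read out. The following Python sketch shows this under the assumption that `A_T` is the inverse transform commonly used for a Winograd F(2×2, 3×3) convolution; the function name is hypothetical, and indices are 0-based in the code.

```python
# Assumed inverse transform A^T (2x4) for F(2x2, 3x3); not taken from the disclosure.
A_T = [
    [1, 1, 1, 0],
    [0, 1, -1, -1],
]


def output_sensitivity_matrix(a_t):
    """Equation 6: element-wise sum of |C^A_{m,n}| over all coefficient matrices."""
    rows, cols = len(a_t), len(a_t[0])
    sensitivity = [[0] * cols for _ in range(cols)]
    for m in range(rows):
        for n in range(rows):
            # Entry (i, j) of the coefficient matrix C^A_{m,n} is A^T[m][i] * A^T[n][j].
            for i in range(cols):
                for j in range(cols):
                    sensitivity[i][j] += abs(a_t[m][i] * a_t[n][j])
    return sensitivity


S_A = output_sensitivity_matrix(A_T)
for row in S_A:
    print(row)
```

The result has the same shape as the transformed output feature map X, so each element indicates how strongly the corresponding element of X influences the output feature map as a whole.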
A plurality of types of MAC units mapped based on the mapping table 370 may perform MAC operations in different modes for each MAC unit type. In this case, because the plurality of types of MAC units each receive an input with reduced precision by reducing a bit length of an input value in different ways and restore the precision of the input to output a MAC result, the degree to which an output value is reconstructed is different for each MAC unit type.
Therefore, the statistical characteristics analyzer 380 may generate, based on at least one of the weight kernel sensitivity matrix, the input feature map sensitivity matrix, and the output feature map sensitivity matrix, a mapping table so that a MAC unit type with lower precision loss is mapped to correspond to a sensitivity matrix element requiring higher precision.
In detail, referring to an input feature map sensitivity matrix 510, a matrix element at coordinates (2, 2) in the input feature map sensitivity matrix 510 may have a highest value. In this case, the statistical characteristics analyzer 380 may map, as a MAC unit type corresponding to the matrix element at coordinates (2,2), a MAC unit type with lower precision loss than MAC unit types corresponding to the remaining matrix elements at the other coordinates.
Also, referring to a weight kernel sensitivity matrix 520, matrix elements at coordinates (1,1), (1,4), (4,1), and (4,4) in the weight kernel sensitivity matrix 520 may have a lowest value, matrix elements at coordinates (1,2), (1,3), (2,1), (2,4), (3,1), (3,4), (4,2), and (4,3) may have an intermediate value, and matrix elements at coordinates (2,2), (2,3), (3,2), and (3,3) may have a highest value. In this case, the statistical characteristics analyzer 380 may map a MAC unit type with lower precision loss to correspond to a matrix element having a higher value.
In addition, because the same description as for the weight kernel sensitivity matrix 520 is applied to an output feature map sensitivity matrix 530, a description thereof will be omitted herein.
According to an embodiment, the statistical characteristics analyzer 380 may generate a mapping matrix 550 that maps a MAC unit type for performing a MAC operation to a weight value group and a feature value group, and create the mapping table 370 based on the mapping matrix 550.
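As a non-limiting illustration, one simple way to derive a mapping matrix from a sensitivity matrix is to rank the distinct sensitivity values and assign MAC unit types in order of increasing precision loss. The ranking rule, the function name `build_mapping_matrix`, and the type labels are assumptions made for illustration; the disclosure does not limit how the mapping matrix 550 is generated.

```python
def build_mapping_matrix(sensitivity, mac_types_by_loss):
    """Assign a MAC unit type to every element of a sensitivity matrix.

    mac_types_by_loss lists type labels from lowest to highest precision loss
    (an assumed ordering supplied by the caller).
    """
    # Distinct sensitivity values, highest first: rank 0 needs the most precision.
    levels = sorted({value for row in sensitivity for value in row}, reverse=True)
    last = len(mac_types_by_loss) - 1

    def pick(value):
        rank = levels.index(value)
        return mac_types_by_loss[min(rank, last)]  # reuse the last type if levels outnumber types

    return [[pick(value) for value in row] for row in sensitivity]


# Example sensitivity pattern: corners lowest, centre highest.
sensitivity = [
    [1.0, 1.5, 1.5, 1.0],
    [1.5, 2.25, 2.25, 1.5],
    [1.5, 2.25, 2.25, 1.5],
    [1.0, 1.5, 1.5, 1.0],
]
mapping = build_mapping_matrix(sensitivity, ["low_loss", "mid_loss", "high_loss"])
for row in mapping:
    print(row)
```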
Referring to
According to an embodiment, the transformation data processor 330 may output, based on the mapping table 370 generated through the above-described processes, a plurality of weight value groups and a plurality of feature value groups to a plurality of types of MAC units mapped thereto.
In order for the electronic device 300 according to an embodiment to perform computation using a trained neural network model, an input value may need to be converted from floating-point precision to fixed-point precision via quantization. The electronic device 300 may receive an input value having a bit length reduced via quantization to perform computation thereon.
A method of quantizing input values for the electronic device 300 may be a quantization method 610 based on maximum-minimum values. For example, a maximum-minimum based quantization method may be used to convert input values to values having a smaller bit length (e.g., 8 bits) based on maximum and minimum values of values included in a weight kernel and values included in an input feature map. In this case, the input values may be quantized to values between −128 and 127 centered on a zero point, or quantized to values between 0 and 255 with respect to the zero point.
Also, a method of quantizing input values for the electronic device 300 may be a quantization method 620 based on threshold values. For example, a threshold-based quantization method may involve determining a maximum threshold value and a minimum threshold value among values included in a weight kernel and values included in an input feature map, and converting input values to values having a smaller bit length (e.g., 8 bits) based on the maximum threshold value and the minimum threshold value. In this case, the input values may be quantized to values between −128 and 127 centered on the zero point, or quantized to values between 0 and 255 with respect to the zero point.
Furthermore, a method of quantizing input values for the electronic device 300 may be determined based on characteristics of the input values. For example, when an input value is a value representing a weight kernel, weight values are fixed as the neural network model is defined, and thus, maximum and minimum values of the weight values may be already clear. In this case, the quantization method may be the quantization method 610 based on maximum-minimum values. In another example, when an input value is a value representing an input feature map, maximum and minimum values of feature values may not be clearly fixed because arbitrary data may be input. In this case, the quantization method may be the quantization method 620 based on threshold values. Although the method of quantizing an input value and converting it to fixed-point precision has been described through the above-described embodiments, the quantization method is not limited thereto, and an input value input to the electronic device 300 may be quantized using various quantization schemes.
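As a non-limiting illustration, the two quantization methods may be sketched in Python as follows. The sketch assumes a simple affine mapping of a real range onto 8-bit integers; the function names and the rounding rule are assumptions for illustration, and actual quantization schemes may differ.

```python
def quantize(values, low, high, signed=False, bits=8):
    """Map values in [low, high] onto 0..2^bits-1, or onto the signed range if requested."""
    if high <= low:
        raise ValueError("high must be greater than low")
    levels = 2 ** bits - 1
    offset = 2 ** (bits - 1) if signed else 0
    quantized = []
    for value in values:
        clipped = min(max(value, low), high)  # values outside the range saturate
        quantized.append(round((clipped - low) * levels / (high - low)) - offset)
    return quantized


def quantize_min_max(values, signed=False):
    """Maximum-minimum based method: the range is taken from the data itself."""
    return quantize(values, min(values), max(values), signed)


def quantize_threshold(values, threshold_min, threshold_max, signed=False):
    """Threshold based method: the range is fixed in advance and outliers are clipped."""
    return quantize(values, threshold_min, threshold_max, signed)


weights = [-2.0, 0.5, 2.0]            # fixed once the model is defined
features = [-10.0, 0.2, 10.0]         # arbitrary input data with outliers
print(quantize_min_max(weights, signed=True))
print(quantize_threshold(features, -1.0, 1.0))
```

In this sketch, the weight values use the range of the data, while the feature values are clipped to preset thresholds so that outliers in arbitrary input data do not stretch the quantization range.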
According to an embodiment, the computation unit 340 may receive a fixed-point number via conversion using quantization and perform a shift operation and a MAC operation thereon.
In the description with respect to
Referring to
According to an embodiment, the multiplier unit 710 may include a cross shift unit 720, a multiplier 730, and a restoration shifter 740, and the cross shift unit 720 may include a first shifter 722 and a second shifter 724.
According to an embodiment, the first shifter 722 included in the cross shift unit 720 may receive a first fixed-point number representing a weight value to perform a right shift operation by using the first shifting method.
In detail, when a bit length of the input first fixed-point number does not exceed a preset first bit length, the first shifter 722 may bypass the first fixed-point number. Also, when the bit length of the input first fixed-point number exceeds the preset first bit length, the first shifter 722 may reduce the bit length by shifting the first fixed-point number to the right by a bit length NW in excess of the first bit length. In this case, the first bit length may correspond to an input bit length of a first input fed to the multiplier.
According to an embodiment, the second shifter 724 included in the cross shift unit 720 may receive a second fixed-point number representing a feature value to perform a right shift operation by using the first shifting method.
In detail, when a bit length of the input second fixed-point number does not exceed a preset second bit length, the second shifter 724 may bypass the second fixed-point number. Also, when the bit length of the input second fixed-point number exceeds the preset second bit length, the second shifter 724 may reduce the bit length by shifting the second fixed-point number to the right by a bit length ND in excess of the second bit length. In this case, the second bit length may correspond to an input bit length of a second input fed to the multiplier.
According to an embodiment, the cross shift unit 720 may exchange the value of the first fixed-point number and the value of the second fixed-point number, based on at least one of a value of the first fixed-point number, the first bit length, a value of the second fixed-point number, and the second bit length, so that the value of the first fixed-point number and the value of the second fixed-point number are input in a crossed manner. In this case, the first fixed-point number may be input to the second shifter 724 to perform a right shift operation, and the second fixed-point number may be input to the first shifter 722 to perform a right shift operation. A method, performed by the cross shift unit 720, of exchanging the value of the first fixed-point number and the value of the second fixed-point number is described in more detail with reference to
According to an embodiment, the multiplier 730 may receive a value of the first bit length output from the first shifter 722 and a value of the second bit length output from the second shifter 724 to perform a multiplication operation between the values.
According to an embodiment, the restoration shifter 740 may receive a multiplication operation result output from the multiplier 730 to perform a left shift operation for restoring a bit length. In detail, the restoration shifter 740 may restore a bit length by shifting the multiplication operation result output from the multiplier 730 to the left by a sum of the bit length NW by which the first shifter 722 shifts right and the bit length ND by which the second shifter 724 shifts right. Also, NW+ND, which is a bit length by which the restoration shifter 740 shifts the multiplication operation result to the left, may be received from the cross shift unit 720.
According to an embodiment, the accumulator 750 may accumulate and add values output from the plurality of multiplier units included in the first type MAC unit. The accumulator 750 may include an adder and a register.
According to an embodiment, the cross shift unit 720 may allow a value of a first fixed-point number and a value of a second fixed-point number to be exchanged and input, based on the value of the first fixed-point number, a first bit length, the value of the second fixed-point number, and a second bit length.
Hereinafter, referring to an example table 705 for convenience of description, operations of the multiplier unit 710 are described by considering an example in which a bit length of a weight value (the first fixed-point number) is 20 bits, a bit length of a feature value (the second fixed-point number) is 12 bits, the first bit length that is a bit length of a first input value fed to the multiplier 730 is 16 bits, and the second bit length that is a bit length of a second input value fed to the multiplier 730 is 10 bits.
In operation S710, according to an embodiment, the multiplier unit 710 may receive the first fixed-point number representing the weight value and the second fixed-point number representing the feature value. For example, the multiplier unit 710 may receive the 20-bit first fixed-point number representing the weight value and the 12-bit second fixed-point number representing the feature value.
In operation S720, according to an embodiment, the cross shift unit 720 may exchange the value of the first fixed-point number and the value of the second fixed-point number, so that the first fixed-point number and the second fixed-point number are respectively input in a crossed manner to the second shifter 724 and the first shifter 722.
For example, when they are not input in a crossed manner, the 20-bit first fixed-point number may be input to the first shifter 722 and shifted to the right by 4 bits, and the 12-bit second fixed-point number may be input to the second shifter 724 and shifted to the right by 2 bits. However, if the value of the 12-bit second fixed-point number is greater than 2^10, precision loss may occur when the second fixed-point number is input to the second shifter 724 and shifted to the right by 2 bits. In this case, if the value of the 20-bit first fixed-point number is less than 2^12 and the value of the 12-bit second fixed-point number is greater than 2^10, the cross shift unit 720 may allow the first fixed-point number to be input to the second shifter 724 and shifted to the right by 4 bits and the second fixed-point number to be input to the first shifter 722 and shifted to the right by 2 bits, thereby preventing precision loss with respect to the second fixed-point number. However, the present disclosure is not limited thereto, and instead of exchanging the values of the first fixed-point number and the second fixed-point number, the cross shift unit 720 may allow the value of the first fixed-point number and the value of the second fixed-point number to be respectively input to the first shifter 722 and the second shifter 724.
In operation S730, according to an embodiment, the first shifter 722 may shift the first fixed-point number to the right based on the first bit length.
For example, when the first fixed-point number has a bit length of 20 bits and the first bit length indicating a bit length input to the multiplier is 16 bits, the first shifter 722 may shift the first fixed-point number right by 4 bits.
In operation S740, according to an embodiment, the second shifter 724 may shift the second fixed-point number to the right based on the second bit length.
For example, when the second fixed-point number has a bit length of 12 bits and the second bit length indicating another bit length input to the multiplier is 10 bits, the second shifter 724 may shift the second fixed-point number right by 2 bits. Although operations S730 and S740 are shown sequentially in
In operation S750, according to an embodiment, the multiplier 730 may receive an output value obtained via shifting by the first shifter 722 and an output value obtained via shifting by the second shifter 724 and perform a multiplication operation between the output values.
For example, the multiplier 730 may receive a value obtained by the first shifter 722 shifting the first fixed-point number by 4 bits and a value obtained by the second shifter 724 shifting the second fixed-point number by 2 bits and perform a multiplication operation between the values.
In operation S760, according to an embodiment, the restoration shifter 740 may receive a multiplication operation result from the multiplier 730 and perform a left shift operation thereon to restore the bit length. Specifically, the restoration shifter 740 may shift the multiplication operation result from the multiplier 730 to the left by a sum of the bit length by which the first shifter 722 shifts the first fixed-point number to the right and the bit length by which the second shifter 724 shifts the second fixed-point number to the right.
For example, when the first shifter 722 shifts the first fixed-point number to the right by 4 bits and the second shifter 724 shifts the second fixed-point number to the right by 2 bits, the restoration shifter 740 may shift the multiplication operation result output from the multiplier 730 to the left by 6 bits.
According to an embodiment, the multiplier unit 710 may generate lower bit-width values by shifting input values to the right, perform a multiplication operation on the lower bit-width values, and restore a bit length by shifting a multiplication operation result back to the left. Accordingly, the hardware area of the multiplier unit may be reduced by performing computation on lower bit-widths, while minimizing precision loss that occurs during the computation.
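As a non-limiting illustration, the shift, multiply, and restore flow of operations S730 to S760 may be sketched in Python as follows, without the exchange performed by the cross shift unit 720. The sketch assumes non-negative integers and takes the bit length of a value to be its number of significant bits; the function names are hypothetical.

```python
def excess_shift(value, target_bits):
    """First shifting method: bypass if the value fits, else shift right by the excess bits."""
    excess = max(value.bit_length() - target_bits, 0)
    return value >> excess, excess


def multiplier_unit(weight, feature, weight_bits=16, feature_bits=10):
    """Reduce both operands, multiply the reduced values, then shift back left."""
    reduced_weight, n_w = excess_shift(weight, weight_bits)      # first shifter
    reduced_feature, n_d = excess_shift(feature, feature_bits)   # second shifter
    product = reduced_weight * reduced_feature                   # narrow multiplier
    return product << (n_w + n_d)                                # restoration shifter


weight = 0xABCDE   # 20 significant bits, so it is shifted right by 4
feature = 0xFFF    # 12 significant bits, so it is shifted right by 2
approximate = multiplier_unit(weight, feature)
exact = weight * feature
print(approximate, exact, (exact - approximate) / exact)
```

The restored product differs from the exact product only by the contribution of the low-order bits discarded by the right shifts, which is the precision loss the embodiment seeks to keep small.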
Referring to
For example, when the transformed feature map and the transformed weight kernel each consist of N channels, one weight value group may include N weight values, and one feature value group may include N feature values. In this case, the first type MAC unit may perform a MAC operation by respectively multiplying weight values (weight value A to weight value N) by feature values (feature value A to feature value N) and adding together their multiplication results.
The first type MAC unit may output a partial sum by performing a MAC operation between the weight value group and the feature value group, i.e., by respectively obtaining multiplication operation results (multiplication operation result A to multiplication operation result N) from the plurality of multiplier units and accumulating and adding the plurality of multiplication operation results using the accumulator 750.
In the description with respect to
Referring to
According to an embodiment, the multiplier unit 810 may include a first shifter 820, a second shifter 830, a multiplier 840, and a first restoration shifter 850.
According to an embodiment, the first shifter 820 may receive a first fixed-point number representing a weight value to perform a right shift operation by using the second shifting method.
In detail, the first shifter 820 may reduce a bit length of the received first fixed-point number by shifting the first fixed-point number to the right by a first bit length NW, which is a fixed-shift bit length, regardless of the bit length of the first fixed-point number. In this case, the first bit length may be a value determined by a statistical characteristics analyzer 805 based on a result of analyzing statistical characteristics of weight values off-line. A method, performed by the statistical characteristics analyzer 805, of determining the first bit length that is the fixed-shift bit length is described in detail below with reference to
According to an embodiment, the second shifter 830 may receive a second fixed-point number representing a feature value to perform a right shift operation by using the first shifting method.
When a bit length of the received second fixed-point number does not exceed a preset second bit length, the second shifter 830 may bypass the second fixed-point number. Also, when the bit length of the received second fixed-point number exceeds the preset second bit length, the second shifter 830 may reduce the bit length by shifting the second fixed-point number to the right by a bit length ND in excess of the second bit length.
According to an embodiment, the multiplier 840 may receive a value output from the first shifter 820 (a value obtained by shifting the first fixed-point number to the right by the first bit length) and a value of the second bit length output from the second shifter 830 to perform a multiplication operation between the values.
According to an embodiment, the first restoration shifter 850 may receive a multiplication operation result output from the multiplier 840 to perform a left shift operation for restoring a bit length.
In detail, the first restoration shifter 850 may restore a bit length by shifting the multiplication operation result output from the multiplier 840 to the left by the bit length ND by which the second shifter 830 performs the right shift. Also, the bit length ND by which the first restoration shifter 850 shifts the multiplication operation result to the left may be received from the second shifter 830.
According to an embodiment, the accumulator 860 may accumulate and add values output from the plurality of multiplier units included in the second type MAC unit. The accumulator 860 may include an adder and a register.
According to an embodiment, the second restoration shifter 870 may receive an operation result output from the accumulator 860 and perform a left shift operation for restoring a bit length.
Specifically, the second restoration shifter 870 may restore the bit length by shifting the operation result output from the accumulator 860 to the left by the bit length NW by which the first shifter 820 shifts right. Also, the bit length NW by which the second restoration shifter 870 shifts the operation result to the left may be received from the statistical characteristics analyzer 805.
In another embodiment, when a neural network model is defined, weight values may be fixed so that maximum and minimum values thereof may be clear, and accordingly, an operation of shifting right by the first bit length may be processed in advance. In this case, the multiplier unit 810 may not include the first shifter. That is, a value having a bit length reduced by shifting a weight value right by the first bit length via offline processing may be input to the multiplier 840.
Referring to
For example, when the transformed feature map and the transformed weight kernel each consist of N channels, one weight value group may include N weight values, and one feature value group may include N feature values. In this case, the second type MAC unit 800 may perform a MAC operation by respectively multiplying weight values (weight value A to weight value N) by feature values (feature value A to feature value N) and adding together their multiplication results.
The second type MAC unit 800 may output a partial sum by performing a MAC operation between the weight value group and the feature value group, i.e., by respectively obtaining multiplication operation results (multiplication operation result A to multiplication operation result N) from the plurality of multiplier units, accumulating and adding the plurality of multiplication operation results using the accumulator 860, and restoring, by using the second restoration shifter 870, bit lengths respectively reduced by the plurality of multiplier units shifting right by the fixed-shift bit length (the first bit length).
The plurality of multiplier units 810 may reduce bit lengths by shifting weight values respectively input thereto by using the second shifting method, and the second restoration shifter 870 may be used to restore the reduced bit lengths to the original ones. In other words, the plurality of multiplier units 810 may collectively shift the weight values to the right by the first bit length that is a bit length determined by the statistical characteristics analyzer 805. Therefore, the plurality of multiplier units 810 may share a value of the first bit length to allow the second restoration shifter 870 to collectively restore reduced bit lengths during restoration.
In addition, the plurality of multiplier units 810 may reduce bit lengths by shifting feature values respectively input thereto by using the first shifting method, and each of the plurality of multiplier units 810 may use its first restoration shifter 850 to restore the reduced bit length to the original one. However, the present disclosure is not limited thereto, and the plurality of multiplier units 810 may perform a MAC operation by shifting weight values using the first shifting method while shifting feature values using the second shifting method.
According to an embodiment, the plurality of multiplier units 810 may share a single second restoration shifter 870 to restore bit lengths of pieces of data reduced by the plurality of multiplier units shifting the pieces of data by using the second shifting method, thereby reducing the number of shifters used in the MAC unit and thus reducing the hardware area covered by the multiplier units 810 and the second type MAC unit 800 including the multiplier units 810.
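As a non-limiting illustration, the second type MAC unit may be sketched in Python as follows: each weight value is shifted right by a shared fixed-shift bit length, each feature value is shifted by its own excess bit length and restored inside the multiplier unit, and a single shared left shift is applied after accumulation. The sketch assumes non-negative integers and takes the bit length of a value to be its number of significant bits; the function names and sample values are hypothetical.

```python
def excess_shift(value, target_bits):
    """First shifting method: bypass if the value fits, else shift right by the excess bits."""
    excess = max(value.bit_length() - target_bits, 0)
    return value >> excess, excess


def second_type_mac(weights, features, n_w, feature_bits=10):
    """MAC with a fixed weight shift n_w restored once, after accumulation."""
    accumulated = 0
    for weight, feature in zip(weights, features):
        reduced_weight = weight >> n_w                               # second shifting method
        reduced_feature, n_d = excess_shift(feature, feature_bits)   # first shifting method
        product = reduced_weight * reduced_feature
        accumulated += product << n_d       # per-multiplier (first) restoration shifter
    return accumulated << n_w               # shared (second) restoration shifter


weights = [0x100, 0x230]
features = [3, 0xFFF]
print(second_type_mac(weights, features, n_w=4))
print(sum(w * f for w, f in zip(weights, features)))  # exact MAC result for comparison
```

Because every weight value is shifted by the same bit length, the left shift by that bit length can be factored out of the sum and applied once, which is why a single second restoration shifter suffices for all multiplier units.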
In the description with respect to
Referring to
According to an embodiment, the multiplier unit 910 may include a first shifter 920, a second shifter 930, and a multiplier 940.
According to an embodiment, the first shifter 920 may receive a first fixed-point number representing a weight value to perform a right shift operation by using the second shifting method.
In detail, the first shifter 920 may reduce a bit length of the received first fixed-point number by shifting the first fixed-point number to the right by a first bit length NW, which is a fixed-shift bit length, regardless of the bit length of the first fixed-point number.
In this case, the first bit length may be a value determined by a statistical characteristics analyzer 905 based on a result of analyzing statistical characteristics of weight values off-line. A method, performed by the statistical characteristics analyzer 905, of determining the first bit length that is the fixed-shift bit length is described in detail below with reference to
According to an embodiment, the second shifter 930 may receive a second fixed-point number representing a feature value to perform a right shift operation by using the second shifting method.
In detail, the second shifter 930 may reduce a bit length of the received second fixed-point number by shifting the second fixed-point number to the right by a second bit length ND, which is a fixed-shift bit length, regardless of the bit length of the second fixed-point number. In this case, the second bit length may be a value determined by the statistical characteristics analyzer 905 based on a result of analyzing statistical characteristics of feature values off-line. A method, performed by the statistical characteristics analyzer 905, of determining the second bit length that is the fixed-shift bit length is described in detail below with reference to
According to an embodiment, the multiplier 940 may receive a value output from the first shifter 920 (a value obtained by shifting the first fixed-point number to the right by the first bit length) and a value output from the second shifter 930 (a value obtained by shifting the second fixed-point number to the right by the second bit length) to perform a multiplication operation between the values.
According to an embodiment, the accumulator 950 may accumulate and add values output from the plurality of multiplier units included in the third type MAC unit. The accumulator 950 may include an adder and a register.
According to an embodiment, the restoration shifter 960 may receive an operation result output from the accumulator 950 and perform a left shift operation for restoring a bit length.
In detail, the restoration shifter 960 may restore a bit length by shifting the accumulated operation result output from the accumulator 950 to the left by a sum (NW+ND) of the bit length NW by which the first shifter 920 shifts right and the bit length ND by which the second shifter 930 shifts right. Also, NW+ND, which is the bit length by which the restoration shifter 960 shifts the accumulated operation result to the left, may be received from the statistical characteristics analyzer 905.
In another embodiment, when a neural network model is defined, weight values may be fixed so that maximum and minimum values thereof may be clearly designated, and accordingly, an operation of shifting right by the first bit length may be processed in advance. In this case, the multiplier unit 910 may not include the first shifter. That is, a value having a bit length reduced by shifting a weight value right by the first bit length via off-line processing may be input to the multiplier 940.
Referring to
For example, when the transformed feature map and the transformed weight kernel each consist of N channels, one weight value group may include N weight values and one feature value group may include N feature values. In this case, the third type MAC unit 900 may perform a MAC operation by respectively multiplying weight values (weight value A to weight value N) by feature values (feature value A to feature value N) and adding together their multiplication results.
The third type MAC unit 900 may output a partial sum by performing a MAC operation between the weight value group and the feature value group, i.e., by respectively obtaining multiplication operation results (multiplication operation result A to multiplication operation result N) from the plurality of multiplier units, accumulating and adding the plurality of multiplication operation results using the accumulator 950, and restoring, by using the restoration shifter 960, bit lengths respectively reduced by the plurality of multiplier units shifting right by the fixed-shift bit lengths (the first and second bit lengths).
According to an embodiment, the plurality of multiplier units 910 may reduce bit lengths by shifting transformed weight values and feature values respectively input thereto by using the second shifting method, and use the restoration shifter 960 to restore the reduced bit lengths to the original ones. In other words, the plurality of multiplier units 910 may collectively shift the weight values to the right by the first bit length that is a bit length determined by the statistical characteristics analyzer 905 off-line. In addition, the plurality of multiplier units 910 may collectively shift the feature values to the right by the second bit length that is a bit length determined by the statistical characteristics analyzer 905 off-line. Thus, the plurality of multiplier units 910 may share a value of the first bit length and a value of the second bit length to allow the restoration shifter 960 to collectively restore reduced bit lengths during restoration.
According to an embodiment, the plurality of multiplier units 910 may share a single restoration shifter 960 to restore bit lengths of pieces of data reduced by the plurality of multiplier units shifting the pieces of data using the second shifting method, thereby reducing the number of shifters used in the MAC unit and thus reducing the hardware area covered by the multiplier units 910 and the third type MAC unit 900 including the multiplier units 910.
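The data path of the third type MAC unit 900 described above may be illustrated by the following minimal sketch. It is not the disclosed hardware: the function name `fixed_shift_mac` and the parameters `n_w` and `n_d` (standing for NW and ND) are hypothetical, Python integers stand in for fixed-point numbers, and Python's `>>` is assumed to model an arithmetic right shift. Because every multiplier unit shifts by the same NW and ND, each product is scaled by the same factor, so a single left shift by NW+ND after accumulation restores the scale.

```python
def fixed_shift_mac(weights, features, n_w, n_d):
    """Sketch of a fixed-shift MAC with one shared restoration shift.

    n_w and n_d are hypothetical names for the fixed-shift bit lengths
    NW and ND supplied off-line by the statistical characteristics analyzer.
    """
    acc = 0
    for w, d in zip(weights, features):
        w_short = w >> n_w        # first shifter: fixed right shift by NW
        d_short = d >> n_d        # second shifter: fixed right shift by ND
        acc += w_short * d_short  # multiplier output added by the accumulator
    # Shared restoration shifter: one left shift by NW + ND for the whole sum.
    return acc << (n_w + n_d)


weights = [48, -32, 80, 16]
features = [256, 512, -128, 64]
approx = fixed_shift_mac(weights, features, n_w=4, n_d=6)
exact = sum(w * d for w, d in zip(weights, features))
print(approx, exact)  # -13312 -13312 (no bits are lost for these inputs)

# When low-order bits are non-zero, the shifts discard them.
print(fixed_shift_mac([50], [260], n_w=4, n_d=6))  # 12288, versus exact 13000
```

The second call shows the precision that is traded for a narrower multiplier: the discarded low-order bits are not recovered by the restoration shift.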
Unlike the above, an input feature map and a weight kernel according to an embodiment may be pieces of data that have not been transformed to the Winograd domain. In this case, bit lengths of a feature value included in the input feature map and a weight value included in the weight kernel may be less than bit lengths of a feature value transformed to the Winograd domain and a weight value transformed to the Winograd domain. Therefore, an operation of reducing the bit lengths of the feature value or weight value may not be needed.
In the description with respect to
Referring to
According to an embodiment, the multiplier unit 1010 may include a first shifter 1020, a multiplier 1030, and a restoration shifter 1040.
According to an embodiment, the first shifter 1020 may receive a first fixed-point number representing a weight value to perform a right shift operation by using the first shifting method.
When a bit length of the received first fixed-point number does not exceed a preset first bit length, the first shifter 1020 may bypass the first fixed-point number. When the bit length of the received first fixed-point number exceeds the preset first bit length, the first shifter 1020 may reduce the bit length by shifting the first fixed-point number to the right by a bit length NW, which is the number of bits in excess of the first bit length. In this case, the first bit length may be a value determined by a statistical characteristics analyzer 1005 based on a result of analyzing statistical characteristics of weight values off-line. The statistical characteristics analyzer 1005 may determine the first bit length, which is the bit length of the value input to the multiplier 1030.
According to an embodiment, the multiplier 1030 may receive a value output from the first shifter 1020 and a non-shifted feature value to perform a multiplication operation therebetween.
According to an embodiment, the restoration shifter 1040 may receive a multiplication operation result output from the multiplier 1030 to perform a left shift operation for restoring a bit length.
According to an embodiment, the accumulator 1050 may accumulate and add values output from the plurality of multiplier units included in the fourth type MAC unit. The accumulator 1050 may include an adder and a register.
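The fourth type MAC unit described above may be sketched as follows, under stated assumptions. The names `conditional_shift`, `per_multiplier_restore_mac`, and `max_weight_bits` are hypothetical, Python integers stand in for fixed-point numbers, and the "bit length" of a value is assumed here to be the number of magnitude bits excluding the sign, which the description does not specify. Unlike the shared restoration shifter of the third type MAC unit, the shift amount NW may differ per weight, so each product is restored before it reaches the accumulator.

```python
def conditional_shift(value, max_bits):
    """First shifting method (sketch): bypass values that fit in max_bits,
    otherwise shift right by the number of excess bits.

    Returns the shortened value and the shift amount that was applied.
    """
    excess = max(0, abs(value).bit_length() - max_bits)
    return value >> excess, excess


def per_multiplier_restore_mac(weights, features, max_weight_bits):
    """MAC in which each multiplier unit restores its own product."""
    acc = 0
    for w, d in zip(weights, features):
        w_short, n_w = conditional_shift(w, max_weight_bits)  # first shifter
        product = w_short * d   # multiplier: the feature value is not shifted
        acc += product << n_w   # per-unit restoration shifter, then accumulate
    return acc


print(conditional_shift(5, 4))    # (5, 0): bypassed
print(conditional_shift(200, 4))  # (12, 4): shifted right by 4 excess bits
print(per_multiplier_restore_mac([5, 200, -77], [3, 2, 1], 4))  # 319, exact 338
```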
In
Referring to
According to an embodiment, the statistical characteristics analyzer 1100 may analyze statistical characteristics of weight kernels 1130 transformed to the Winograd domain and input feature maps 1140 transformed to the Winograd domain.
In operation S1110, the statistical characteristics analyzer 1100 may analyze maximum and minimum values for each of a plurality of feature value groups and each of a plurality of weight value groups.
Specifically, the statistical characteristics analyzer 1100 may calculate a frequency distribution for each of the plurality of feature value groups. For example, the statistical characteristics analyzer 1100 may create a histogram by calculating a frequency distribution for a first feature value group consisting of feature values at coordinates (1, 1) in matrices representing a plurality of transformed input feature maps 1140. The statistical characteristics analyzer 1100 may generate transformed input feature map statistical characteristics 1150 by calculating a frequency distribution of feature values and creating a histogram for each of the first to sixteenth feature value groups in the plurality of transformed input feature maps 1140.
In the same way, the statistical characteristics analyzer 1100 may generate transformed weight kernel statistical characteristics 1160 by calculating a frequency distribution and creating a histogram for each of the plurality of weight value groups.
According to an embodiment, the statistical characteristics analyzer 1100 may obtain, based on created histograms, maximum and minimum values for each of the plurality of feature value groups and each of the plurality of weight value groups.
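The grouping and analysis of operation S1110 may be sketched as follows. This is an illustration only: the function names `group_by_position` and `analyze_groups` are hypothetical, the tiles are 2x2 instead of the 4x4 matrices that yield sixteen groups, coordinates are zero-based, and a simple count of exact values is assumed as the frequency distribution.

```python
from collections import Counter


def group_by_position(tiles):
    """Collect the values found at the same (row, col) across all tiles."""
    rows, cols = len(tiles[0]), len(tiles[0][0])
    return {
        (r, c): [tile[r][c] for tile in tiles]
        for r in range(rows)
        for c in range(cols)
    }


def analyze_groups(tiles):
    """Build a histogram per group and read its minimum and maximum values."""
    stats = {}
    for position, values in group_by_position(tiles).items():
        histogram = Counter(values)  # frequency distribution of the group
        stats[position] = {
            "histogram": histogram,
            "min_val": min(histogram),  # smallest value observed in the group
            "max_val": max(histogram),  # largest value observed in the group
        }
    return stats


# Three hypothetical transformed tiles (2x2 for brevity).
tiles = [
    [[1, -4], [7, 0]],
    [[3, -4], [-2, 5]],
    [[1, 9], [7, -6]],
]
stats = analyze_groups(tiles)
print(stats[(0, 0)]["min_val"], stats[(0, 0)]["max_val"])  # 1 3
print(stats[(0, 1)]["histogram"][-4])                      # 2
```

The same routine applies to transformed weight kernels, since both are grouped by position in the transformed matrix.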
In operation S1120, the statistical characteristics analyzer 1100 may calculate fixed-shift bit lengths Nshift for each of the plurality of feature value groups and each of the plurality of weight value groups. For example, the fixed-shift bit length Nshift may be calculated by using Equation 7.
Here, max_val denotes a maximum value of weight values included in a weight value group, and min_val denotes a minimum value of the weight values in the weight value group. Alternatively, max_val denotes a maximum value of feature values included in a feature value group, and min_val denotes a minimum value of the feature values included in the feature value group.
Because statistical characteristics of each of the plurality of feature value groups and statistical characteristics of each of the plurality of weight value groups according to an embodiment are all different, fixed-shift bit lengths calculated for each of the feature value groups and each of the weight value groups may all be different.
In addition, when a neural network model is defined, weight values may be fixed and thus maximum and minimum values thereof may be already clear, while maximum and minimum values for an input feature map may not be clearly fixed because the input feature map is arbitrary data. In this case, a data set for calibration may be analyzed offline, threshold values corresponding to maximum and minimum values may be determined, and statistical characteristics of the feature values may be analyzed based on the determined threshold values.
A fixed-shift bit length calculated by the statistical characteristics analyzer 1100 according to an embodiment may be provided to the second type MAC unit as described with reference to
In addition, the fixed-shift bit length calculated by the statistical characteristics analyzer 1100 may be provided to the third type MAC unit as described with reference to
While the embodiments have been described with reference to limited examples and figures, it will be understood by those of ordinary skill in the art that various modifications and changes in form and details may be made from the above descriptions. For example, adequate effects may be achieved even when the aforementioned components such as computer systems or modules are coupled or combined in different forms and modes than those described above or are replaced or supplemented by other components or their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2020-0147081 | Nov 2020 | KR | national |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/KR2021/015706 | Nov 2021 | US |
Child | 18142170 | | US |