This application relates to digital circuit technologies, and in particular, to a multiplier.
Convolutional neural networks (CNNs) are widely used in image and speech recognition. Both training and inference of CNNs require hardware to perform a large number of multiplication operations, usually on operands in different data formats. For example, a current mainstream processor or accelerator for implementing neural network computation may support 4-bit integers (integer 4, INT4), 8-bit integers (integer 8, INT8), or 16-bit floating point numbers (floating point 16, FP16).
Embodiments of this application provide a multiplier, to simultaneously implement a plurality of low bit width multiplication operations.
According to a first aspect, an embodiment of this application provides a multiplier, including a multiplicator input end for receiving multiplicator data, a multiplicand input end for receiving multiplicand data, a mask circuit for masking processing, and a multiplication operation circuit. A sum of bit widths of a first multiplicator and a second multiplicator included in the multiplicator data is less than a bit width of the multiplicator data, that is, less than a bit width of the multiplicator input end. Similarly, a sum of bit widths of a first multiplicand and a second multiplicand included in the multiplicand data is also less than a bit width of the multiplicand input end. The mask circuit is configured to respectively mask the first multiplicator and mask the second multiplicator in the multiplicator data to obtain a first mask result and a second mask result. The multiplication operation circuit is configured to respectively multiply the first mask result and the second mask result by the first multiplicand and the second multiplicand to obtain two multiplication results.
The mask circuit in the multiplier may mask a plurality of low bit width multiplicators respectively to calculate partial products corresponding to different multiplicators. Therefore, the multiplier can be adapted to multiplication operations of a plurality of low bit width multiplicators and a plurality of low bit width multiplicands in different data formats, thereby resolving a problem of a hardware resource waste caused because a single multiplier can process a multiplication operation of only one data format. Using the multiplier to implement multiplication operations of different data formats can reduce a hardware area occupied by the multiplier and reduce power consumption and overheads.
In a possible implementation, the multiplication operation circuit includes a Booth encoder. The Booth encoder may be a Booth encoder based on Radix-4, Radix-8, or another mode. Using the Booth encoder to implement a multiplication operation can reduce a hardware area of the multiplier and reduce power consumption.
In a possible implementation, the multiplication operation circuit further includes a partial product calculation circuit configured to perform partial product calculation based on encoding results generated by the Booth encoder, and an accumulator configured to accumulate a plurality of partial products generated by the partial product calculation circuit. The multiplicand is encoded, partial products of encoding results and a mask result corresponding to the multiplicand are calculated, and finally the obtained partial products are accumulated to implement a multiplication operation, thereby further saving hardware resources.
In a possible implementation, the Booth encoder includes a plurality of sub-encoders, configured to perform Booth encoding on the first multiplicand to obtain a first encoding result, and perform Booth encoding on the second multiplicand to obtain a second encoding result. There may be one or more encoding results. The partial product calculation circuit is specifically configured to calculate a first partial product of the first encoding result and the first mask result, and calculate a second partial product of the second encoding result and the second mask result. A quantity of partial products is the same as a quantity of encoding results. The accumulator is specifically configured to perform accumulation on the first partial product to obtain a result of multiplying the first multiplicator and the first multiplicand, and perform accumulation on the second partial product to obtain a result of multiplying the second multiplicator and the second multiplicand.
In a possible implementation, the multiplier further includes an adder, configured to add the result, obtained by the accumulator, of multiplying the first multiplicator and the first multiplicand and the result, obtained by the accumulator, of multiplying the second multiplicator and the second multiplicand. The adder can add results of all low bit width multiplication operations to implement a convolution calculation function.
In a possible implementation, the multiplier further includes a shifter, configured to shift the multiplication operation results obtained by the accumulator. A final calculation result may be obtained by shifting the multiplication operation results.
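The data path described so far (masking, Booth encoding, partial product calculation, accumulation, shifting, and adding) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the circuit of the embodiments: the 16-bit input width, the 4-bit operands placed at bits 0 to 3 and 8 to 11, the treatment of multiplicands as signed and multiplicators as unsigned, and the names `booth_digits`, `dual_multiply`, and `cut_positions` are all hypothetical choices made for illustration.

```python
def booth_digits(word, width, cut_positions=frozenset()):
    """Radix-4 Booth digits of `word`, least significant group first.

    `cut_positions` (an assumption of this sketch) lists even bit positions
    whose sub-encoder has its least significant input forced to 0.
    """
    digits = []
    for j in range(width // 2):
        low = 2 * j
        if low == 0 or low in cut_positions:
            y_prev = 0                        # extended bit, or forced 0
        else:
            y_prev = (word >> (low - 1)) & 1
        y_cur = (word >> low) & 1
        y_next = (word >> (low + 1)) & 1
        digits.append(y_cur + y_prev - 2 * y_next)
    return digits


def dual_multiply(multiplicator_data, multiplicand_data):
    """Return (product 1, product 2, sum) for two packed 4-bit operand pairs."""
    lo, hi, op_width = 0, 8, 4                # assumed operand positions
    field = (1 << op_width) - 1

    # Mask circuit: each mask result keeps exactly one multiplicator.
    first_mask = multiplicator_data & (field << lo)
    second_mask = multiplicator_data & (field << hi)

    # Encode the whole multiplicand word once; cut just above each operand.
    digits = booth_digits(multiplicand_data, 16,
                          cut_positions={lo + op_width, hi + op_width})

    # Partial products of each operand are accumulated separately.
    acc1 = sum(digits[j] * (first_mask << (2 * j)) for j in (0, 1))
    acc2 = sum(digits[j] * (second_mask << (2 * j)) for j in (4, 5))

    # Shifter: the second product sits 2 * hi bits above its true weight.
    p1 = acc1 >> (2 * lo)
    p2 = acc2 >> (2 * hi)
    return p1, p2, p1 + p2                    # adder combines both products


# Multiplicators 6 and 11 (unsigned); multiplicands 0b1010 (-6) and 0b0011 (3).
multiplicator_data = (11 << 8) | 6
multiplicand_data = (0b0011 << 8) | 0b1010
p1, p2, total = dual_multiply(multiplicator_data, multiplicand_data)
print(p1, p2, total)  # -36 33 -3
```

In this model the idle bits encode to 0, so the partial products they control vanish and are simply omitted from the two accumulations.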
In a possible implementation, the data in the multiplicand input end includes the first multiplicand located at a less significant bit of the multiplicand input end, the second multiplicand located at a more significant bit, one extended bit of 0 inserted at an end of a least significant bit of the first multiplicand, and every other bit of the multiplicand input end, other than the first multiplicand, the second multiplicand, and the extended bit, set to 0. The multiplicator data includes the first multiplicator located at a less significant bit of the multiplicator input end, the second multiplicator located at a more significant bit, and every other bit of the multiplicator input end, other than the first multiplicator and the second multiplicator, set to 0. A position of the first multiplicator in the multiplicator input end is the same as a position of the first multiplicand in the multiplicand input end, and a position of the second multiplicator in the multiplicator input end is the same as a position of the second multiplicand in the multiplicand input end. The other bits are set to 0 so that their encoding results do not affect subsequent partial product calculation and partial product accumulation. The two multiplicators and the two multiplicands are respectively located at the same positions, so that when the accumulator accumulates partial products, no additional shift operation is required to align the partial products, thereby saving hardware resources.
In a possible implementation, the first multiplicand and the second multiplicand in the multiplicand input end are separated by at least one bit of 0, the first multiplicator and the second multiplicator in the multiplicator input end are separated by at least one bit of 0, and the multiplier further includes a selector, configured to output data of a most significant bit of the first multiplicand to a most significant bit of a corresponding first sub-encoder, and output data 0 to a least significant bit of a sub-encoder that is adjacent to the first sub-encoder and that encodes an idle bit. The idle bit is a bit set to 0 between the first multiplicand and the second multiplicand. When multiplicands and multiplicators are stored in the multiplicand input end and the multiplicator input end in the foregoing manner, the selector may implement allocation of valid data and 0, so that a sub-encoder correctly encodes data in the multiplicand input end.
In a possible implementation, a most significant bit of the first multiplicand in the multiplicand input end is adjacent to a least significant bit of the second multiplicand, a most significant bit of the first multiplicator in the multiplicator input end is adjacent to a least significant bit of the second multiplicator, and the multiplier further includes a selector, configured to output data of the most significant bit of the first multiplicand to a most significant bit of a corresponding first sub-encoder, and output data 0 to a least significant bit of a second sub-encoder. The second sub-encoder is a sub-encoder that is adjacent to the first sub-encoder and that encodes the second multiplicand. When multiplicands and multiplicators are stored in the multiplicand input end and the multiplicator input end in the foregoing manner, the selector may implement allocation of valid data and 0, so that a sub-encoder correctly encodes data in the multiplicand input end.
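The effect of the selector on adjacent operands can be illustrated in software. The sketch below is one reading of the behavior, under assumptions that are not from the embodiments: radix-4 sub-encoders, two signed 4-bit multiplicands packed into an 8-bit input end with the second operand starting at bit 4, and hypothetical names `booth_digits`, `cut_positions`, and `to_signed`. Forcing the least significant input of the second sub-encoder group to 0 keeps the most significant bit of the first multiplicand from leaking into the encoding of the second multiplicand.

```python
def to_signed(value, width):
    """Interpret a `width`-bit field as a two's complement number."""
    return value - (1 << width) if (value >> (width - 1)) & 1 else value


def booth_digits(word, width, cut_positions=frozenset()):
    """Radix-4 Booth digits of `word`, least significant group first.

    For every even bit position in `cut_positions`, the sub-encoder that
    starts there receives 0 on its least significant input (selector model).
    """
    digits = []
    for j in range(width // 2):
        low = 2 * j
        if low == 0 or low in cut_positions:
            y_prev = 0
        else:
            y_prev = (word >> (low - 1)) & 1
        y_cur = (word >> low) & 1
        y_next = (word >> (low + 1)) & 1
        digits.append(y_cur + y_prev - 2 * y_next)
    return digits


b1, b2 = 0b1010, 0b0011           # -6 and 3 as signed 4-bit values
word = (b2 << 4) | b1             # operands adjacent, no idle bit between them

plain = booth_digits(word, 8)                        # no selector
cut = booth_digits(word, 8, cut_positions={4})       # selector forces 0 at bit 4

print(plain)  # [-2, -1, 0, 1]  : encodes the 8-bit word 58 as a whole
print(cut)    # [-2, -1, -1, 1] : encodes -6 and 3 independently
```

With the cut in place, the first two digits decode to the signed first multiplicand and the last two digits decode to the signed second multiplicand.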
In a possible implementation, the partial product calculation circuit includes a plurality of first partial product sub-circuits, a plurality of second partial product sub-circuits, and a plurality of third partial product sub-circuits. The plurality of first partial product sub-circuits are configured to respectively calculate a plurality of first partial products based on the first mask result by using a plurality of encoding results of the first multiplicand as control signals. The plurality of second partial product sub-circuits are configured to respectively calculate a plurality of second partial products based on the second mask result by using a plurality of encoding results of the second multiplicand as control signals. The plurality of third partial product sub-circuits are configured to respectively calculate a plurality of third partial products based on data in the multiplicator input end by using idle bits as control signals. The idle bits are bits set to 0 in the multiplicand input end. The accumulator is specifically configured to accumulate the plurality of first partial products, the plurality of second partial products, and the plurality of third partial products.
In a possible implementation, the multiplier further includes a switch. The switch is configured to: when in an on state, activate the mask circuit, the shifter, and the adder; and when in an off state, disable the mask circuit, the shifter, and the adder, in other words, the mask circuit, the shifter, and the adder directly transmit received data. The switch controls the mask circuit, the shifter, and the adder, so that the multiplier can switch between two modes of a plurality of multiplication operations and one multiplication operation, thereby further enhancing a capability of the multiplier to process multiplication operations.
In a possible implementation, the multiplier further includes a switch. The switch is configured to: when in an on state, activate the mask circuit; and when in an off state, disable the mask circuit, in other words, the mask circuit directly transmits received data. The switch controls the mask circuit, so that the multiplier can switch between two modes of a plurality of multiplication operations and one multiplication operation, thereby further enhancing a capability of the multiplier to process multiplication operations.
In a possible implementation, the mask circuit includes two AND gates, configured to respectively mask the first multiplicator and mask the second multiplicator in the multiplicator data to output the two mask results. Using two AND gates to implement a function of the mask circuit can further simplify a circuit structure of the multiplier, save hardware resources, and reduce power consumption.
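As an illustration of the masking step, the two AND gates can be modeled as bitwise AND operations with constant masks. This is a minimal sketch: the function name `mask_multiplicators`, its parameters, and the example layout (4-bit multiplicators at bits 0 to 3 and 8 to 11) are assumptions made for illustration and are not taken from the embodiments.

```python
def mask_multiplicators(data, lo_pos, lo_width, hi_pos, hi_width):
    """Return (first mask result, second mask result) for packed multiplicator data.

    The first mask result keeps only the first (less significant) multiplicator,
    that is, the second multiplicator is masked out, and vice versa.
    """
    lo_mask = ((1 << lo_width) - 1) << lo_pos
    hi_mask = ((1 << hi_width) - 1) << hi_pos
    first_mask_result = data & lo_mask     # AND gate 1
    second_mask_result = data & hi_mask    # AND gate 2
    return first_mask_result, second_mask_result


# Hypothetical layout: first multiplicator 0b0110, second multiplicator 0b1011.
packed = (0b1011 << 8) | 0b0110
first, second = mask_multiplicators(packed, 0, 4, 8, 4)
print(bin(first), bin(second))  # 0b110 0b101100000000
```

Each mask result keeps its multiplicator at its original bit position, which is consistent with the statement that no additional alignment shift is needed during accumulation.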
In a possible implementation, a 1st sub-encoder of the plurality of sub-encoders in the encoder is configured to perform Booth encoding on data of an extended bit and an LSB to a (k−2)th bit of the multiplicand input end, and an ith sub-encoder is configured to perform Booth encoding on data of an (i×(k−1)+1)th bit to an ((i+1)×(k−1)+1)th bit of the multiplicand input end, where k is a bit width of each sub-encoder, k≥2 and k is an integer, and i≥2 and i is an integer.
In a possible implementation, the multiplier further includes a selector, configured to output data of the most significant bit of the first multiplicand to a most significant bit of an xth sub-encoder, and output 0 to a least significant bit of an (x+1)th sub-encoder. The xth sub-encoder is a sub-encoder that encodes the most significant bit MSB2 to an (MSB2−k+1)th bit of the first multiplicand.
In a possible implementation, the multiplier further includes a selector, configured to output 0 to a most significant bit of an xth sub-encoder, and output data of the least significant bit of the second multiplicand to a least significant bit of an (x+1)th sub-encoder. The xth sub-encoder is a sub-encoder that encodes the most significant bit MSB1 to an (MSB1−k+1)th bit of the first multiplicand, and the (x+1)th sub-encoder is a sub-encoder that encodes the least significant bit LSB2 to an (LSB2−k+1)th bit of the second multiplicand.
According to a second aspect, an embodiment of this application provides a multiplication calculation method, applied to a multiplier. The multiplier includes a multiplicator input end and a multiplicand input end, and the multiplication calculation method includes: receiving multiplicator data, where the multiplicator data includes a first multiplicator and a second multiplicator, and a sum of a bit width of the first multiplicator and a bit width of the second multiplicator is less than a bit width of the multiplicator data; masking the second multiplicator in the multiplicator data to obtain a first mask result, and masking the first multiplicator in the multiplicator data to obtain a second mask result; receiving a first multiplicand and a second multiplicand, where a sum of a bit width of the first multiplicand and a bit width of the second multiplicand is less than a bit width of the multiplicand input end; and performing a multiplication operation on the first mask result and the first multiplicand to obtain a result of multiplying the first multiplicator and the first multiplicand, and performing a multiplication operation on the second mask result and the second multiplicand to obtain a result of multiplying the second multiplicator and the second multiplicand.
The mask circuit in the multiplier may mask a plurality of low bit width multiplicators respectively to calculate partial products corresponding to different multiplicators. Therefore, the multiplier can be adapted to multiplication operations of a plurality of low bit width multiplicators and a plurality of low bit width multiplicands in different data formats, thereby resolving a problem of a hardware resource waste caused because a single multiplier can process a multiplication operation of only one data format. Using the multiplier to implement multiplication operations of different data formats can reduce a hardware area occupied by the multiplier and reduce power consumption and overheads.
In a possible implementation, the step of performing a multiplication operation on the first mask result and the first multiplicand to obtain a result of multiplying the first multiplicator and the first multiplicand, and performing a multiplication operation on the second mask result and the second multiplicand to obtain a result of multiplying the second multiplicator and the second multiplicand includes: performing Booth encoding on the first multiplicand and the second multiplicand. Using a Booth encoder to implement a multiplication operation can reduce a hardware area of the multiplier and reduce power consumption.
In a possible implementation, the step of performing a multiplication operation on the first mask result and the first multiplicand to obtain a result of multiplying the first multiplicator and the first multiplicand, and performing a multiplication operation on the second mask result and the second multiplicand to obtain a result of multiplying the second multiplicator and the second multiplicand further includes: performing partial product calculation based on encoding results generated by the Booth encoding, to obtain a plurality of partial products; and accumulating the plurality of partial products. The multiplicand is encoded, partial products of encoding results and a mask result corresponding to the multiplicand are calculated, and finally the obtained partial products are accumulated to implement a multiplication operation, thereby further saving hardware resources.
In a possible implementation, the step of performing Booth encoding on the first multiplicand and the second multiplicand includes: performing, by using a plurality of sub-encoders, Booth encoding on the first multiplicand to obtain at least one first encoding result, and Booth encoding on the second multiplicand to obtain at least one second encoding result. The step of performing partial product calculation based on encoding results generated by the Booth encoding, to obtain a plurality of partial products includes: calculating at least one first partial product of the at least one first encoding result and the first mask result, and calculating at least one second partial product of the at least one second encoding result and the second mask result. The step of accumulating the plurality of partial products includes: performing accumulation on the at least one first partial product to obtain the result of multiplying the first multiplicator and the first multiplicand, and performing accumulation on the at least one second partial product to obtain the result of multiplying the second multiplicator and the second multiplicand.
In a possible implementation, the multiplication calculation method further includes: adding the result, obtained by the accumulator, of multiplying the first multiplicator and the first multiplicand and the result, obtained by the accumulator, of multiplying the second multiplicator and the second multiplicand. Adding results of all low bit width multiplication operations can implement a convolution calculation function.
In a possible implementation, the multiplication calculation method further includes: shifting the result, obtained by the accumulator, of multiplying the first multiplicator and the first multiplicand and the result, obtained by the accumulator, of multiplying the second multiplicator and the second multiplicand.
In a possible implementation, data in the multiplicand input end includes the first multiplicand located at a less significant bit of the multiplicand input end, the second multiplicand located at a more significant bit of the multiplicand input end, one extended bit of 0 inserted at an end of a least significant bit of the first multiplicand, and every other bit of the multiplicand input end, other than the first multiplicand, the second multiplicand, and the extended bit, set to 0. The multiplicator data includes the first multiplicator located at a less significant bit of the multiplicator input end, the second multiplicator located at a more significant bit of the multiplicator input end, and every other bit of the multiplicator input end, other than the first multiplicator and the second multiplicator, set to 0. A position of the first multiplicator in the multiplicator input end is the same as a position of the first multiplicand in the multiplicand input end, and a position of the second multiplicator in the multiplicator input end is the same as a position of the second multiplicand in the multiplicand input end. The other bits are set to 0 so that their encoding results do not affect subsequent partial product calculation and partial product accumulation. The two multiplicators and the two multiplicands are respectively located at the same positions, so that when the accumulator accumulates partial products, no additional shift operation is required to align the partial products, thereby saving hardware resources.
In a possible implementation, the first multiplicand and the second multiplicand are separated by at least one bit of 0, the first multiplicator and the second multiplicator are separated by at least one bit of 0, and the multiplication calculation method further includes: outputting data of a most significant bit of the first multiplicand to a most significant bit of a corresponding first sub-encoder, and outputting data 0 to a least significant bit of a sub-encoder that is adjacent to the first sub-encoder and that encodes an idle bit. The idle bit is a bit set to 0 between the first multiplicand and the second multiplicand. When multiplicands and multiplicators are stored in the multiplicand input end and the multiplicator input end in the foregoing manner, allocation of valid data and 0 may be implemented, so that a sub-encoder correctly encodes data in the multiplicand input end.
In a possible implementation, a most significant bit of the first multiplicand is adjacent to a least significant bit of the second multiplicand, a most significant bit of the first multiplicator is adjacent to a least significant bit of the second multiplicator, and the multiplication calculation method further includes: outputting data of the most significant bit of the first multiplicand to a most significant bit of a corresponding first sub-encoder, and outputting data 0 to a least significant bit of a second sub-encoder. The second sub-encoder is a sub-encoder that is adjacent to the first sub-encoder and that encodes the second multiplicand. When multiplicands and multiplicators are stored in the multiplicand input end and the multiplicator input end in the foregoing manner, allocation of valid data and 0 may be implemented, so that a sub-encoder correctly encodes data in the multiplicand input end.
In a possible implementation, the multiplication calculation method further includes: using a switch, where when the switch is in an on state, masking processing is performed; and when the switch is in an off state, masking processing is not performed. With control of the switch, the multiplier can switch between two modes of a plurality of multiplication operations and one multiplication operation, thereby further enhancing a capability of the multiplier to process multiplication operations.
In a possible implementation, the step of masking the second multiplicator in the multiplicator data to obtain a first mask result, and masking the first multiplicator in the multiplicator data to obtain a second mask result includes: using two AND gates, to respectively mask the first multiplicator and mask the second multiplicator in the multiplicator data to output the two mask results. Using two AND gates to implement a mask function can further simplify a circuit structure of the multiplier, save hardware resources, and reduce power consumption.
According to a third aspect, an embodiment of this application provides a data processing system, including: an encoder, configured to encode a first multiplicand and a second multiplicand to obtain a plurality of encoding results, where a sum of bit widths of the first multiplicand and the second multiplicand is less than a bit width of a multiplicand input end of the encoder; and a plurality of multipliers. Each multiplier includes: a mask circuit, configured to respectively mask a first multiplicator and a second multiplicator to obtain two mask results, where a sum of bit widths of the first multiplicator and the second multiplicator is less than a bit width of a multiplicator input end of each multiplier; a partial product calculation circuit, configured to respectively calculate, by using the plurality of encoding results as control signals, a plurality of partial products based on two mask results respectively corresponding to the plurality of encoding results; and an accumulator, configured to accumulate the plurality of partial products to obtain an accumulation result.
Because the plurality of multipliers may share the encoding results of the encoder, the encoder may not be used inside the multipliers to perform repeated encoding, thereby simplifying hardware design inside the multipliers, helping reduce hardware complexity, and helping improve processing efficiency of the multipliers because processing steps are simplified.
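The benefit of sharing encoding results can be sketched as follows: the multiplicand is encoded once, and each multiplier reuses the same digits with its own multiplicator. The sketch assumes radix-4 Booth encoding and, for brevity, a single non-negative multiplicand rather than two packed ones; the names `booth_radix4_digits` and `multiply_from_digits` are hypothetical.

```python
def booth_radix4_digits(word, width):
    """Radix-4 Booth digits of a `width`-bit word, least significant group first."""
    extended = word << 1                      # extended bit 0 below the LSB
    digits = []
    for i in range(0, width, 2):
        y_prev = (extended >> i) & 1
        y_cur = (extended >> (i + 1)) & 1
        y_next = (extended >> (i + 2)) & 1
        digits.append(y_cur + y_prev - 2 * y_next)
    return digits


def multiply_from_digits(digits, multiplicator):
    """Accumulate partial products selected by already-computed Booth digits."""
    return sum(d * (multiplicator << (2 * j)) for j, d in enumerate(digits))


# The encoder runs once; width 6 leaves the top bits 0 so the value stays non-negative.
shared_digits = booth_radix4_digits(0b0101, 6)

# Each "multiplier" only computes and accumulates partial products.
products = [multiply_from_digits(shared_digits, x) for x in (3, 7, 12)]
print(shared_digits, products)  # [1, 1, 0] [15, 35, 60]
```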
In a possible implementation, each multiplier further includes the multiplicator input end, configured to receive the first multiplicator and the second multiplicator. The mask circuit includes two masks, configured to respectively mask the first multiplicator and mask the second multiplicator in data of the multiplicator input end by using AND gates, to respectively output a first mask result and a second mask result.
In a possible implementation, the encoder includes the multiplicand input end, configured to receive the first multiplicand and the second multiplicand. The encoder includes a plurality of sub-encoders, configured to perform Booth encoding on data in the multiplicand input end.
In a possible implementation, the partial product calculation circuit includes a plurality of first partial product sub-circuits and a plurality of second partial product sub-circuits. The plurality of first partial product sub-circuits are configured to respectively calculate a plurality of first partial products based on the first mask result by using a plurality of encoding results of the first multiplicand as control signals. The plurality of second partial product sub-circuits are configured to respectively calculate a plurality of second partial products based on the second mask result by using a plurality of encoding results of the second multiplicand as control signals.
In a possible implementation, the multiplicand input end is further configured to: store the first multiplicand at a less significant bit of the multiplicand input end, store the second multiplicand at a more significant bit of the multiplicand input end, insert one extended bit of 0 at an end of a least significant bit of the first multiplicand, and set an idle bit to 0. The idle bit is another bit other than the first multiplicand, the second multiplicand, and the extended bit in the multiplicand input end. The partial product calculation circuit further includes a plurality of third partial product sub-circuits, configured to respectively calculate a plurality of third partial products based on data in the multiplicator input end by using idle bits as control signals.
In a possible implementation, a 1st sub-encoder of the plurality of sub-encoders is configured to perform Booth encoding on data of an extended bit and an LSB to a (k−2)th bit of the multiplicand input end, and an ith sub-encoder is configured to perform Booth encoding on data of an (i×(k−1)+1)th bit to an ((i+1)×(k−1)+1)th bit of the multiplicand input end, where k is a bit width of each sub-encoder, k≥2 and k is an integer, and i≥1 and i is an integer.
In a possible implementation, the accumulator is configured to accumulate the plurality of first partial products, the plurality of second partial products, and the plurality of third partial products.
In a possible implementation, the multiplicand input end is further configured to: store the first multiplicand at a less significant bit of the multiplicand input end, store the second multiplicand at a more significant bit of the multiplicand input end, and separate the first multiplicand and the second multiplicand by at least one bit of 0. The multiplicator input end is further configured to: store the first multiplicator at a less significant bit of the multiplicator input end, store the second multiplicator at a more significant bit of the multiplicator input end, and separate the first multiplicator and the second multiplicator by at least one bit of 0. Positions where the first multiplicator and the second multiplicator are stored in the multiplicator input end are respectively the same as positions where the first multiplicand and the second multiplicand are stored in the multiplicand input end.
In a possible implementation, the multiplier further includes a selector. The selector is configured to output data of a most significant bit of the first multiplicand to a most significant bit of an xth sub-encoder, and output 0 to a least significant bit of an (x+1)th sub-encoder. The xth sub-encoder is a sub-encoder that encodes the most significant bit MSB2 to an (MSB2−k+1)th bit of the first multiplicand.
In a possible implementation, the multiplier further includes an adder. The adder is configured to add a result of accumulating the plurality of first partial products and a result of accumulating the plurality of second partial products.
In a possible implementation, the multiplicand input end is further configured to: store the first multiplicand at a less significant bit of the multiplicand input end, store the second multiplicand at a more significant bit of the multiplicand input end, and make a least significant bit of the second multiplicand adjacent to a most significant bit of the first multiplicand. The multiplicator input end is further configured to: store the first multiplicator at a more significant bit of the multiplicator input end, store the second multiplicator at a less significant bit of the multiplicator input end, and make a least significant bit of the first multiplicator adjacent to a most significant bit of the second multiplicator. A position where the first multiplicator is stored in the multiplicator input end is the same as a position where the second multiplicand is stored in the multiplicand input end, and a position where the second multiplicator is stored in the multiplicator input end is the same as a position where the first multiplicand is stored in the multiplicand input end.
In a possible implementation, the multiplier further includes a selector. The selector is configured to output 0 to a most significant bit of an xth sub-encoder, and output data of the least significant bit of the second multiplicand to a least significant bit of an (x+1)th sub-encoder. The xth sub-encoder is a sub-encoder that encodes the most significant bit MSB1 to an (MSB1−k+1)th bit of the first multiplicand, and the (x+1)th sub-encoder is a sub-encoder that encodes the least significant bit LSB2 to an (LSB2−k+1)th bit of the second multiplicand.
In a possible implementation, the multiplier further includes a shifter. The shifter is configured to shift a result of accumulating the plurality of first partial products and a result of accumulating the plurality of second partial products.
In a possible implementation, the plurality of first partial product sub-circuits are specifically configured to respectively multiply the plurality of encoding results of the first multiplicand by the first mask result, to calculate the plurality of first partial products. The plurality of second partial product sub-circuits are specifically configured to respectively multiply the plurality of encoding results of the second multiplicand by the second mask result, to calculate the plurality of second partial products. The plurality of third partial product sub-circuits are specifically configured to respectively multiply a plurality of encoding results of the idle bits by data received by the multiplicator input end, to calculate the plurality of third partial products.
In a possible implementation, the multiplier further includes a switch. The switch is configured to: when in an on state, activate the mask circuit, the selector, the shifter, or the adder.
In a possible implementation, the first multiplicand and the second multiplicand are convolution kernel data, and the first multiplicator and the second multiplicator are feature layer data. Alternatively, the first multiplicand and the second multiplicand are feature layer data, and the first multiplicator and the second multiplicator are convolution kernel data.
In a possible implementation, the plurality of multipliers each further include a plurality of storage units. One storage unit in storage units of every two multipliers is configured to receive the plurality of encoding results, and the other storage unit is configured to read the plurality of encoding results.
According to a fourth aspect, an embodiment of this application provides a multiplication processing system. The multiplication processing system reads a configuration file from a memory coupled to the multiplication processing system, so that the multiplication processing system may be configured as the multiplier according to any possible implementation of the first aspect, or configured as the data processing system according to any possible implementation of the third aspect.
According to a fifth aspect, an embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program. When the computer program is executed by a processor, the method according to any possible implementation of the second aspect is implemented.
According to a sixth aspect, an embodiment of this application provides a computer program. The computer program includes instructions. When the computer program is executed by a computer, the computer is enabled to perform the method according to any possible implementation of the second aspect.
The following clearly describes technical solutions in embodiments of this application with reference to accompanying drawings in embodiments of this application.
The multiplier may be a multiplier 400 shown in
A manner in which the sub-encoder encodes every three bits of data may be Booth encoding. The Booth encoding may follow a rule in Table 1:
where i is an integer, $y_{i+1}y_{i}y_{i-1}$ represents three consecutive bits of data in the multiplicand, and X is used to represent a multiplicator. In the rule of the foregoing table, the encoding result may alternatively be expressed as:
$(y_{i} + y_{i-1} - 2y_{i+1})X$
For example, if a multiplicand is 010010111011 in binary representation, an extended multiplicand is 0100101110110. The sub-encoder 0 encodes three less significant bits, that is, 110, of the extended multiplicand based on the rule of the foregoing table, and an obtained encoding result is −X. The sub-encoder 1 encodes 101 based on the rule of the foregoing table, and an obtained encoding result is −X. The sub-encoder 2 encodes 111 based on the rule of the foregoing table, and an obtained encoding result is 0. The sub-encoder 3 encodes 101 based on the rule of the foregoing table, and an obtained encoding result is −X. The sub-encoder 4 encodes 001 based on the rule of the foregoing table, and an obtained encoding result is X. The sub-encoder 5 encodes 010 based on the rule of the foregoing table, and an obtained encoding result is X. The encoding process is shown in Table 2.
The partial product calculation circuit 420 includes a partial product sub-circuit 0, a partial product sub-circuit 1, a partial product sub-circuit 2, a partial product sub-circuit 3, a partial product sub-circuit 4, and a partial product sub-circuit 5 shown in
The accumulator 430 receives a plurality of partial products generated by the plurality of partial product sub-circuits, and accumulates the partial products based on weights of bits, corresponding to each partial product, of the multiplicand. For example, the partial products generated by the partial product sub-circuit 0 to the partial product sub-circuit 5 are −X, −X, 0, −X, X, and X, respectively. Because the multiplicand is encoded at the interval of two bits, corresponding weights of every two adjacent partial product sub-circuits are in a fourfold relationship. That is, the partial product generated by the partial product sub-circuit 0 is accumulated as $2^{0}(-X)$, the partial product generated by the partial product sub-circuit 1 is accumulated as $2^{2}(-X)$, the partial product generated by the partial product sub-circuit 2 is accumulated as $2^{4}(0)$, the partial product generated by the partial product sub-circuit 3 is accumulated as $2^{6}(-X)$, the partial product generated by the partial product sub-circuit 4 is accumulated as $2^{8}(X)$, and the partial product generated by the partial product sub-circuit 5 is accumulated as $2^{10}(X)$. A final accumulation result is a final result obtained by multiplying the multiplicator and the multiplicand. The accumulation process may be expressed as:
$2^{2}(2^{2}(2^{2}(2^{2}(2^{2}X + X) - X) + 0) - X) - X = 1211X$
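The encoding and accumulation described above can be sketched in software as follows. This is only an illustrative model, not the circuit itself; the function names `booth_radix4_digits` and `accumulate` are hypothetical and are introduced here for illustration. The sketch assumes a radix-4 encoding in which each sub-encoder produces the value $y_{i} + y_{i-1} - 2y_{i+1}$, and it reproduces the example multiplicand 010010111011.

```python
def booth_radix4_digits(multiplicand: int, width: int) -> list[int]:
    """Return radix-4 Booth digits, least significant digit first.

    `multiplicand` is read as a `width`-bit pattern (`width` must be even).
    Each digit equals y_i + y_(i-1) - 2*y_(i+1) for one 3-bit window.
    """
    # Append one extension bit of 0 below the least significant bit.
    extended = (multiplicand & ((1 << width) - 1)) << 1
    digits = []
    for j in range(width // 2):
        window = (extended >> (2 * j)) & 0b111  # bits y_(i+1) y_i y_(i-1)
        y_prev = window & 1
        y_cur = (window >> 1) & 1
        y_next = (window >> 2) & 1
        digits.append(y_cur + y_prev - 2 * y_next)
    return digits


def accumulate(digits: list[int], x: int) -> int:
    """Sum the partial products digit * x, weighting digit j by 4**j."""
    return sum((d * x) << (2 * j) for j, d in enumerate(digits))


digits = booth_radix4_digits(0b010010111011, 12)
print(digits)                 # [-1, -1, 0, -1, 1, 1]
print(accumulate(digits, 1))  # 1211
```

The printed digits correspond to the encoding results −X, −X, 0, −X, X, and X of the sub-encoder 0 to the sub-encoder 5, and the accumulated value with X = 1 equals the decimal value of the multiplicand.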
The multiplier 400 shown in
In a case in which a sum of bit widths of a plurality of multiplicators participating in a multiplication operation is less than the bit width of the multiplicator input end 550 of the multiplier 500, and a sum of bit widths of a plurality of multiplicands participating in the multiplication operation is less than the bit width of the multiplicand input end 560 of the multiplier 500, the multiplier 500 may be configured to simultaneously perform a plurality of groups of low bit width multiplication operations. In embodiments provided in this application, an example in which the multiplier 500 processes two groups of low bit width multiplication operations is used to describe a specific structure and function of the multiplier 500. For example, in this embodiment of this application, a first multiplicand b0 is multiplied by a first multiplicator a0, and a second multiplicand b1 is multiplied by a second multiplicator a1. However, it is easy to understand that the multiplier 500 provided in this embodiment of this application may also implement more than two groups of low bit width multiplication operations. The first multiplicand b0, the first multiplicator a0, the second multiplicand b1, and the second multiplicator a1 may be data in different formats, for example, INT4, INT8, FP16, or another format. It should be noted that a low bit width and a high bit width in this application are relative concepts. For example, when the bit width of the multiplicator input end or the multiplicand input end of the multiplier is twice the bit width of the multiplicator or the multiplicand, the bit width of the multiplier is the high bit width, and the bit width of the multiplicator or the multiplicand is the low bit width. In addition, a multiplicator and a multiplicand in this application are concepts relative to each other.
For example, convolution kernel data may be input to the multiplier 500 as a multiplicator, and feature layer data may be input to the multiplier 500 as a multiplicand. Alternatively, convolution kernel data may be input to the multiplier 500 as a multiplicand, and feature layer data may be input to the multiplier 500 as a multiplicator.
The mask circuit 540 in the multiplier 500 is configured to separately mask the multiplicator data, to separately obtain two mask results. Specifically, the mask circuit 540 masks the second multiplicator a1 in the multiplicator data, to obtain a first mask result that indicates the first multiplicator a0. Similarly, the mask circuit 540 masks the first multiplicator a0 in the multiplicator data, to obtain a second mask result that indicates the second multiplicator a1. For example, if the multiplicator data is 110100000101 of 12 bits, where 0101 of four less significant bits is the first multiplicator, and 1101 of four more significant bits is the second multiplicator, the first mask result is 000000000101, and the second mask result is 110100000000.
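The masking in the foregoing 12-bit example may be modeled, purely as an illustration, by two bitwise AND operations. The constant names `KEEP_A0` and `KEEP_A1` and the function name `mask_multiplicator` are hypothetical, and the sketch assumes that the first multiplicator occupies the four less significant bits and the second multiplicator occupies the four more significant bits.

```python
KEEP_A0 = 0x00F  # keeps bits 0..3, so the second multiplicator is masked
KEEP_A1 = 0xF00  # keeps bits 8..11, so the first multiplicator is masked


def mask_multiplicator(data: int) -> tuple[int, int]:
    """Return (first mask result, second mask result) for 12-bit data."""
    return data & KEEP_A0, data & KEEP_A1


first, second = mask_multiplicator(0b110100000101)
print(f"{first:012b}")   # 000000000101
print(f"{second:012b}")  # 110100000000
```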
The multiplication operation circuit 502 is configured to perform a multiplication operation on the first mask result and the first multiplicand b0 to obtain a product of the first mask result and the first multiplicand b0; and perform a multiplication operation on the second mask result and the second multiplicand b1 to obtain a product of the second mask result and the second multiplicand b1.
The mask circuit 540 in the multiplier 500 may mask a plurality of low bit width multiplicators respectively to calculate partial products corresponding to different multiplicators. Therefore, the multiplier 500 can be adapted to multiplication operations of a plurality of low bit width multiplicators and a plurality of low bit width multiplicands in different data formats, thereby resolving a problem of a hardware resource waste caused because a single multiplier can process a multiplication operation of only one data format. Using the multiplier 500 to implement multiplication operations of different data formats can reduce a hardware area occupied by the multiplier and reduce power consumption and overheads.
Specifically, the encoder 510 in the multiplier 500 is configured to encode the first multiplicand b0 and the second multiplicand b1, to separately obtain a first encoding result and a second encoding result, where each of the first encoding result and the second encoding result may include a plurality of encoding results. An encoding manner used by the encoder 510 may be Booth encoding, in other words, the encoder 510 may be a Booth encoder. In an implementation, the encoder 510 may encode the first multiplicand b0 and the second multiplicand b1 in a radix-4 manner: encoding every three bits of data at an interval of two bits according to the rule described in Table 1. In another implementation, the encoder 510 may encode the first multiplicand b0 and the second multiplicand b1 in a radix-8 manner: encoding every four bits of data at an interval of three bits according to a preset rule. The encoder 510 may also use another manner, for example, a radix-16 manner. For ease of description, a radix-4 manner is used as an example to describe a working principle of the encoder 510 in this application.
The partial product calculation circuit 520 is configured to calculate a partial product based on the encoding result. Specifically, the partial product calculation circuit 520 is configured to: calculate, by using the first encoding result and the second encoding result obtained by the encoder 510 as control signals, a plurality of partial products based on two mask results corresponding to the first encoding result and the second encoding result, in other words, calculate a first partial product of the first encoding result and the first mask result and calculate a second partial product of the second encoding result and the second mask result. The partial product calculation circuit 520 may determine a relationship between a corresponding encoding result and a corresponding partial product based on the control signal. For example, when the control signal indicates that the encoding result is −X, an output partial product is −1 times the corresponding mask result.
The accumulator 530 is configured to accumulate the plurality of partial products (for example, the first partial product and the second partial product) obtained by the partial product calculation circuit 520, to obtain an accumulation result. Because the partial product calculation circuit 520 separately performs partial product calculation on the plurality of encoding results to obtain the plurality of partial products, the accumulator 530 accumulates the plurality of partial products to obtain a final result.
In an implementation, the multiplicator input end 550 may include a storage circuit, for example, a register or a register group, configured to store the multiplicator data including the first multiplicator a0 and the second multiplicator a1. Correspondingly, the multiplicand input end 560 is configured to receive the first multiplicand b0 and the second multiplicand b1. In an implementation, the multiplicand input end 560 may include a storage circuit, for example, a register or a register group, configured to store the first multiplicand b0 and the second multiplicand b1.
In an implementation, positions where the first multiplicator a0 and the second multiplicator a1 are stored in the multiplicator input end 550 are the same as positions where the first multiplicand b0 and the second multiplicand b1 are stored in the multiplicand input end 560. For example, the first multiplicator a0 and the second multiplicator a1 are respectively stored in an LSB to the 3rd bit and 4th to 7th bits of the storage circuit of the multiplicator input end 550. The first multiplicand b0 and the second multiplicand b1 are respectively stored in an LSB to the 3rd bit and 4th to 7th bits of the storage circuit of the multiplicand input end 560. In another implementation, a position where the first multiplicator a0 is stored in the multiplicator input end 550 is the same as a position where the second multiplicand b1 is stored in the multiplicand input end 560, and a position where the second multiplicator a1 is stored in the multiplicator input end 550 is the same as a position where the first multiplicand b0 is stored in the multiplicand input end 560. For example, the first multiplicator a0 and the second multiplicator a1 are respectively stored in an LSB to the 3rd bit and 4th to 7th bits of the storage circuit of the multiplicator input end 550. The first multiplicand b0 and the second multiplicand b1 are respectively stored in 4th to 7th bits and an LSB to the 3rd bit of the storage circuit of the multiplicand input end 560. The LSB, the 1st bit, and the like all refer to data in bits before the extension.
The mask circuit 540 may include two masks, configured to respectively mask the first multiplicator a0 and mask the second multiplicator a1 in the multiplicator data of the multiplicator input end 550 by using AND gates, to respectively output the first mask result and the second mask result. In an implementation, the mask circuit 540 may include more than two masks. It is easy to understand that a quantity of masks in the mask circuit 540 is the same as a quantity of multiplicators in the multiplicator data.
The encoder 510 may include a plurality of sub-encoders, configured to perform Booth encoding on the data in the multiplicand input end 560. The encoding process includes Booth encoding on the first multiplicand b0 and the second multiplicand b1, and Booth encoding on another bit in the storage circuit in the multiplicand input end 560.
In an implementation, the multiplicand input end 560 is further configured to: store the first multiplicand b0 in lower bits of the storage circuit of the multiplicand input end 560, and store the second multiplicand b1 in higher bits; insert one extended bit of 0 at an end of a least significant bit of the first multiplicand b0; and set other idle bits than the first multiplicand b0, the second multiplicand b1, and the foregoing extended bit to 0.
For example, a bit width of the storage circuit of the multiplicand input end 560 is 12 bits, where the first multiplicand b0 is stored in an LSB to the 3rd bit and the second multiplicand b1 is stored in the 8th to 11th bits. The multiplicand input end 560 is further configured to insert one extension bit of 0 at the end of the least significant bit of the first multiplicand b0, and set idle bits, namely, the 4th to 7th, to 0.
Similarly, the multiplicator input end 550 is further configured to store the first multiplicator a0 in low bits of the storage circuit of the multiplicator input end 550, and store the second multiplicator a1 in high bits; and set other idle bits than the first multiplicator a0 and the second multiplicator a1 to 0.
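The storage layout of the foregoing 12-bit example may be illustrated by the following sketch. The function names `pack_operands` and `extend_multiplicand` are hypothetical. The sketch assumes 4-bit operands, with the first operand in the LSB to the 3rd bit, the second operand in the 8th to 11th bits, and the idle 4th to 7th bits set to 0.

```python
def pack_operands(first: int, second: int) -> int:
    """Place two 4-bit values in a 12-bit register.

    `first` goes to bits 0..3, `second` to bits 8..11, and the idle
    bits 4..7 stay at 0.
    """
    if not (0 <= first < 16 and 0 <= second < 16):
        raise ValueError("operands must fit in four bits")
    return first | (second << 8)


def extend_multiplicand(register: int) -> int:
    """Insert one extension bit of 0 below the least significant bit."""
    return register << 1


multiplicand = pack_operands(0b0101, 0b1101)
print(f"{multiplicand:012b}")                       # 110100000101
print(f"{extend_multiplicand(multiplicand):013b}")  # 1101000001010
```

The same `pack_operands` layout applies to the multiplicator input end, which does not use the extension bit.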
In an implementation, the multiplicand input end 560 is further configured to store the first multiplicand b0 in low bits of the storage circuit of the multiplicand input end 560, and store the second multiplicand b1 in high bits of the multiplicand input end 560. In addition, the first multiplicand b0 and the second multiplicand b1 are separated by at least one bit of 0.
A first sub-encoder in the plurality of sub-encoders in the encoder 510 is configured to perform Booth encoding on data from an extended bit to the (k−2)th bit of the multiplicand input end 560. The ith sub-encoder is configured to perform Booth encoding on data from the (i×(k−1)+1)th bit to the ((i+1)×(k−1)+1)th bit of the multiplicand input end 560, where k is a bit width of each sub-encoder, k≥2 and is an integer, and i≥1 and is an integer.
For example, the first sub-encoder performs Booth encoding on the extended bit, the LSB, and the 1st bit of the multiplicand input end 560. The second sub-encoder performs Booth encoding on the 1st to 3rd bits of the multiplicand input end 560. The third sub-encoder performs Booth encoding on the 3rd to 5th bits of the multiplicand input end 560, and so on.
When the bit width of the storage circuit of the multiplicand input end 560 is 12 bits, the first multiplicand b0 is stored in the LSB to the 3rd bit, and the second multiplicand b1 is stored in the 8th to 11th bits, encoding results of the first multiplicand b0 are encoding results of the first sub-encoder and the second sub-encoder, and encoding results of the second multiplicand b1 are encoding results of the 5th sub-encoder and the 6th sub-encoder.
When the encoder 510 performs encoding in a radix-4 manner, k=3, that is, a bit width of each sub-encoder is three bits.
When the encoder 510 performs encoding in a radix-8 manner, k=4, that is, a bit width of each sub-encoder is four bits.
When the encoder 510 performs encoding in a radix-n manner, $k = \log_{2} n + 1$, that is, a bit width of each sub-encoder is $\log_{2} n + 1$ bits.
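The relationship between the radix and the bit width of a sub-encoder may be checked with the following sketch. It assumes that k equals the base-2 logarithm of n plus 1, which agrees with k=3 for radix-4 and k=4 for radix-8 as stated above. The function name `sub_encoder_width` is hypothetical.

```python
def sub_encoder_width(radix: int) -> int:
    """Bit width k of one sub-encoder for radix-n Booth encoding.

    Assumes k = log2(n) + 1 and that n is a power of two, n >= 2.
    """
    if radix < 2 or radix & (radix - 1):
        raise ValueError("radix must be a power of two, at least 2")
    # For a power of two, bit_length() equals log2(radix) + 1.
    return radix.bit_length()


for n in (4, 8, 16):
    print(n, sub_encoder_width(n))  # prints "4 3", then "8 4", then "16 5"
```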
The partial product calculation circuit 520 includes a plurality of partial product sub-circuits, configured to respectively calculate, by using the plurality of encoding results obtained by a plurality of sub-encoders in the encoder 510 as control signals, a plurality of partial products based on two mask results respectively corresponding to the plurality of encoding results.
Specifically, the partial product calculation circuit 520 may include a plurality of first partial product sub-circuits and a plurality of second partial product sub-circuits. The first partial product sub-circuit is configured to respectively calculate a plurality of first partial products based on the first mask result by using a plurality of encoding results of the first multiplicand b0 as control signals.
The second partial product sub-circuit is configured to respectively calculate a plurality of second partial products based on the second mask result by using a plurality of encoding results of the second multiplicand b1 as control signals.
In an implementation, the partial product calculation circuit 520 further includes a plurality of third partial product sub-circuits, configured to respectively calculate a plurality of third partial products based on data in the multiplicator input end by using idle bits as control signals.
For example, for the first partial product sub-circuit or the second partial product sub-circuit, if a received encoding result is −2X, the control signal indicates that a relationship between a partial product corresponding to the encoding result and the mask result is −2 times. Therefore, a calculated partial product is the product of −2 and the mask result.
For the third partial product sub-circuit, if an encoding result received by the third partial product sub-circuit is X, the control signal indicates that a relationship between a partial product corresponding to the encoding result and data stored in the multiplicator input end 550 is 1 times. Therefore, a calculated partial product is the product of 1 and the data stored in the multiplicator input end 550.
The accumulator 530 is configured to separately accumulate the plurality of first partial products, the plurality of second partial products, and the third partial product.
In an implementation, the multiplier 600 further includes a selector 670, an adder 680, and a switch 690. The selector 670 is configured to output data of a most significant bit of the first multiplicand b0 to a most significant bit of a corresponding sub-encoder, and output data 0 to a least significant bit of a sub-encoder that is adjacent to the sub-encoder and that encodes an idle bit. The idle bit is a bit set to 0 between the first multiplicand b0 and the second multiplicand b1. Specifically, the selector 670 is configured to output the data of the most significant bit of the first multiplicand b0 to a most significant bit of an xth sub-encoder, and output 0 to a least significant bit of an (x+1)th sub-encoder. The xth sub-encoder is a sub-encoder that encodes the most significant bit MSB1 to an (MSB1−k+1)th bit of the first multiplicand b0. The adder 680 is configured to add a result of accumulating the plurality of first partial products (that is, a result of multiplying the first multiplicator and the first multiplicand) and a result of accumulating the plurality of second partial products (that is, a result of multiplying the second multiplicator and the second multiplicand), to obtain a multiplication and accumulation result a0×b0+a1×b1.
A working principle of the multiplier 600 is described by using a more specific multiplier 600 shown in
Using
The encoder 510 includes a sub-encoder 0 (that is, the first sub-encoder; the remaining sub-encoders are numbered in sequence), a sub-encoder 1, a sub-encoder 2, a sub-encoder 3, a sub-encoder 4, and a sub-encoder 5. Specifically, the sub-encoder 0 encodes the extended bit of 0 inserted at the end of the first multiplicand b0, an LSB, and a 1st bit; the sub-encoder 1 encodes the 1st bit, a 2nd bit, and a 3rd bit; the sub-encoder 2 encodes the 3rd bit, a 4th bit, and a 5th bit; and so on, as shown in
The multiplicator input end 550 receives and stores the first multiplicator a0 and the second multiplicator a1 in the same manner, that is, an LSB to a 3rd bit of the multiplicator input end 550 receive and store the first multiplicator a0, and 8th to 11th bits receive and store the second multiplicator a1. It should be noted that in the multiplicand input end shown in
The mask circuit 540 further includes a first mask 342 and a second mask 344. The first mask 342 is configured to zero a less significant bit part of the data stored in the multiplicator input end 550, that is, retain the second multiplicator a1 and zero the first multiplicator a0. Specifically, the first mask 342 may be an AND gate for implementing AND 0xF00 logic, so that a bitwise AND operation is performed on the 12-bit data stored in the multiplicator input end 550 and 0xF00, to retain data of the 8th to 11th bits, and zero data of the LSB to a 7th bit. Correspondingly, the second mask 344 is configured to zero a more significant bit part of the data stored in the multiplicator input end 550, that is, retain the first multiplicator a0 and zero the second multiplicator a1. Specifically, the second mask 344 may be an AND gate for implementing AND 0x00F logic, so that a bitwise AND operation is performed on the 12-bit data stored in the multiplicator input end 550 and 0x00F, to retain data of the LSB to the 3rd bit, and zero data of 4th to 11th bits.
The partial product calculation circuit 520 includes a partial product sub-circuit 0, a partial product sub-circuit 1, a partial product sub-circuit 2, a partial product sub-circuit 3, a partial product sub-circuit 4, and a partial product sub-circuit 5 shown in
The accumulator 530 is configured to receive and accumulate a plurality of partial products generated by the partial product sub-circuits in the partial product calculation circuit 520. For example, the partial product generated by the partial product sub-circuit 0 is pp0, the partial product generated by the partial product sub-circuit 1 is pp1, the partial product generated by the partial product sub-circuit 2 is pp2, the partial product generated by the partial product sub-circuit 3 is pp3, the partial product generated by the partial product sub-circuit 4 is pp4, and the partial product generated by the partial product sub-circuit 5 is pp5. Because the multiplicand is encoded at an interval of two bits, corresponding weights of every two adjacent partial product sub-circuits are in a fourfold relationship. That is, the partial product generated by the partial product sub-circuit 0 is accumulated as $2^{0}(\mathrm{pp0})$, the partial product generated by the partial product sub-circuit 1 is accumulated as $2^{2}(\mathrm{pp1})$, the partial product generated by the partial product sub-circuit 2 is accumulated as $2^{4}(\mathrm{pp2})$, the partial product generated by the partial product sub-circuit 3 is accumulated as $2^{6}(\mathrm{pp3})$, the partial product generated by the partial product sub-circuit 4 is accumulated as $2^{8}(\mathrm{pp4})$, and the partial product generated by the partial product sub-circuit 5 is accumulated as $2^{10}(\mathrm{pp5})$. A final accumulation result is a final result obtained by multiplying the multiplicator and the multiplicand. The accumulation process may be expressed as:
$2^{2}(2^{2}(2^{2}(2^{2}(2^{2}\,\mathrm{pp5} + \mathrm{pp4}) + \mathrm{pp3}) + \mathrm{pp2}) + \mathrm{pp1}) + \mathrm{pp0}$
The accumulation result obtained in the accumulation process has a bit width of 23 bits. 16th to 22nd bits of the accumulation result store a result of a1×b1, and an LSB to a 6th bit store a result of a0×b0.
The adder 680 is configured to receive the accumulation result generated by the accumulator, and add data in the 16th to 22nd bits and the LSB to the 6th bit in the accumulation result, to obtain a result a0×b0+a1×b1, and output the result.
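The data path described above, from masking through Booth encoding and accumulation to the final addition, may be modeled by the following sketch. It is a simplified software illustration rather than a description of the circuit. The function name `dual_multiply_add` and the constant names are hypothetical. The sketch assumes 4-bit operands in a 12-bit layout (LSB to the 3rd bit, and 8th to 11th bits), treats the multiplicators as unsigned, and restricts the multiplicands to the range 0 to 7 so that sign handling is outside its scope.

```python
KEEP_A0 = 0x00F  # mask result used with the first multiplicand
KEEP_A1 = 0xF00  # mask result used with the second multiplicand


def dual_multiply_add(a0: int, a1: int, b0: int, b1: int) -> tuple[int, int, int]:
    """Return (a0*b0, a1*b1, a0*b0 + a1*b1) from one packed radix-4 pass."""
    # Assumptions of this sketch: unsigned 4-bit multiplicators, and
    # multiplicands whose most significant (sign) bit is 0.
    assert 0 <= a0 < 16 and 0 <= a1 < 16
    assert 0 <= b0 < 8 and 0 <= b1 < 8
    multiplicator = a0 | (a1 << 8)
    extended = (b0 | (b1 << 8)) << 1  # 12-bit multiplicand plus extension bit

    acc = 0
    for j in range(6):  # sub-encoder 0 to sub-encoder 5
        window = (extended >> (2 * j)) & 0b111
        if j == 2:
            # Selector: feed 0 instead of the MSB of b0 to sub-encoder 2.
            # With b0 < 8 that bit is already 0, so this is a no-op here.
            window &= 0b110
        digit = ((window >> 1) & 1) + (window & 1) - 2 * (window >> 2)
        if j < 2:
            operand = multiplicator & KEEP_A0  # first partial products
        elif j >= 4:
            operand = multiplicator & KEEP_A1  # second partial products
        else:
            operand = multiplicator            # idle bits, digit is 0
        acc += (digit * operand) << (2 * j)

    low = acc & 0x7F            # LSB to 6th bit hold a0*b0
    high = (acc >> 16) & 0x7F   # 16th to 22nd bits hold a1*b1
    return low, high, low + high


print(dual_multiply_add(5, 13, 3, 6))  # (15, 78, 93)
```

In this model the two products do not overlap in the accumulation result because the second mask result and the second multiplicand are each offset by eight bits, which places a1×b1 sixteen bits above a0×b0.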
In an implementation, the multiplier 600 further includes a switch 690, configured to activate the mask circuit 540, the selector 670, and the adder 680 based on a mode control signal. When activated, the mask circuit 540, the selector 670, and the adder 680 respectively perform functions described above. When the mask circuit 540, the selector 670, and the adder 680 are not activated (disabled), the mask circuit 540 is configured to directly transmit a received multiplicator, the selector 670 is configured to output the most significant bit of the first multiplicand b0 to the most significant bit of the sub-encoder 1 and the least significant bit of the sub-encoder 2, and the adder 680 is configured to directly transmit a received accumulation result. Specifically, when the mode control signal received by the switch 690 is a low bit width multiplication mode, the mask circuit 540, the selector 670, and the adder 680 are activated to respectively implement the functions shown in
The selector 680 in the multiplier 800 is configured to output data of the most significant bit of the first multiplicand b0 to a corresponding sub-encoder, and output data 0 to a sub-encoder that is adjacent to the sub-encoder and that encodes the second multiplicand b1. Specifically, the selector 680 is configured to output 0 to a most significant bit of an xth sub-encoder, and output data of the least significant bit of the second multiplicand b1 to a least significant bit of the (x+1)th sub-encoder. The xth sub-encoder is a sub-encoder that encodes the most significant bit MSB1 to an (MSB1−k+1)th bit of the first multiplicand b0, and the (x+1)th sub-encoder is a sub-encoder that encodes the least significant bit LSB2 to an (LSB2−k+1)th bit of the second multiplicand b1. In addition, the multiplier 800 further includes a shifter 882, configured to shift the result of accumulating the plurality of first partial products (that is, the result of multiplying the first multiplicator and the first multiplicand) and the result of accumulating the plurality of second partial products (that is, the result of multiplying the second multiplicator and the second multiplicand). Specifically, a bit quantity of the shift is a bit width of the multiplicator or the multiplicand.
A working principle of the multiplier 800 is described by using another more specific multiplier 800 shown in
An embodiment of this application further provides a data processing system, including an encoder and at least one multiplier that shares the encoder. A data processing system 1000 shown in
In a possible implementation, the data processing system 1000 may further include a plurality of multipliers, and the plurality of multipliers each may include the mask circuit 540, the partial product calculation circuit 520, the accumulator 530, the multiplicator input end 550, the multiplicand input end 560, the switch 690, the selector 670, and the adder 680 or the shifter 882 that are provided in the embodiments of this application.
S1110. Receive multiplicator data, where the multiplicator data includes a first multiplicator and a second multiplicator, and a sum of a bit width of the first multiplicator and a bit width of the second multiplicator is less than a bit width of the multiplicator data.
S1120. Mask the second multiplicator in the multiplicator data to obtain a first mask result, and mask the first multiplicator in the multiplicator data to obtain a second mask result.
S1130. Receive a first multiplicand and a second multiplicand, where a sum of a bit width of the first multiplicand and a bit width of the second multiplicand is less than a bit width of the multiplicand input end.
S1140. Perform a multiplication operation on the first mask result and the first multiplicand to obtain a result of multiplying the first multiplicator and the first multiplicand, and perform a multiplication operation on the second mask result and the second multiplicand to obtain a result of multiplying the second multiplicator and the second multiplicand.
In an implementation, the step of performing a multiplication operation on the first mask result and the first multiplicand to obtain a result of multiplying the first multiplicator and the first multiplicand, and performing a multiplication operation on the second mask result and the second multiplicand to obtain a result of multiplying the second multiplicator and the second multiplicand includes: performing Booth encoding on the first multiplicand and the second multiplicand.
In an implementation, the step of performing a multiplication operation on the first mask result and the first multiplicand to obtain a result of multiplying the first multiplicator and the first multiplicand, and performing a multiplication operation on the second mask result and the second multiplicand to obtain a result of multiplying the second multiplicator and the second multiplicand further includes: performing partial product calculation based on encoding results generated by the Booth encoding, to obtain a plurality of partial products; and accumulating the plurality of partial products.
In an implementation, the step of performing Booth encoding on the first multiplicand and the second multiplicand includes: performing, by using a plurality of sub-encoders, Booth encoding on the first multiplicand to obtain at least one first encoding result, and Booth encoding on the second multiplicand to obtain at least one second encoding result. The step of performing partial product calculation based on encoding results generated by the Booth encoding, to obtain a plurality of partial products includes: calculating at least one first partial product of the at least one first encoding result and the first mask result, and calculating at least one second partial product of the at least one second encoding result and the second mask result. The step of accumulating the plurality of partial products includes: performing accumulation on the at least one first partial product to obtain the result of multiplying the first multiplicator and the first multiplicand, and performing accumulation on the at least one second partial product to obtain the result of multiplying the second multiplicator and the second multiplicand.
In an implementation, the multiplication calculation method further includes: adding the result, obtained by the accumulator, of multiplying the first multiplicator and the first multiplicand and the result, obtained by the accumulator, of multiplying the second multiplicator and the second multiplicand.
In an implementation, the multiplication calculation method further includes: shifting the result, obtained by the accumulator, of multiplying the first multiplicator and the first multiplicand and the result, obtained by the accumulator, of multiplying the second multiplicator and the second multiplicand.
In an implementation, data in the multiplicand input end includes the first multiplicand located at a less significant bit of the multiplicand input end, the second multiplicand located at a more significant bit of the multiplicand input end, one extended bit of 0 inserted at an end of a least significant bit of the first multiplicand, and another bit set to 0 other than the first multiplicand, the second multiplicand, and the extended bit in the multiplicand input end. The multiplicator data includes: the first multiplicator located at a less significant bit of the multiplicator input end, the second multiplicator located at a more significant bit of the multiplicator input end, and another bit set to 0 other than the first multiplicator and the second multiplicator in the multiplicator input end. A position of the first multiplicator in the multiplicator input end is the same as a position of the first multiplicand in the multiplicand input end, and a position of the second multiplicator in the multiplicator input end is the same as a position of the second multiplicand in the multiplicand input end.
In an implementation, the first multiplicand and the second multiplicand are separated by at least one bit of 0, the first multiplicator and the second multiplicator are separated by at least one bit of 0, and the multiplication calculation method further includes: outputting data of a most significant bit of the first multiplicand to a most significant bit of a corresponding first sub-encoder, and outputting data 0 to a least significant bit of a sub-encoder that is adjacent to the first sub-encoder and that encodes an idle bit. The idle bit is a bit set to 0 between the first multiplicand and the second multiplicand.
In an implementation, a most significant bit of the first multiplicand is adjacent to a least significant bit of the second multiplicand, a most significant bit of the first multiplicator is adjacent to a least significant bit of the second multiplicator, and the multiplication calculation method further includes: outputting data of the most significant bit of the first multiplicand to a most significant bit of a corresponding first sub-encoder, and outputting data 0 to a least significant bit of a second sub-encoder. The second sub-encoder is a sub-encoder that is adjacent to the first sub-encoder and that encodes the second multiplicand.
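The boundary handling in the two preceding implementations can be sketched in Python. In a radix-4 Booth encoder, each sub-encoder normally takes the most significant bit of the group below it as its own lowest input. The sketch forces that input to 0 wherever a new field or an idle region begins, so the sign bit of one operand does not leak into the next group. Radix-4 encoding, even field widths, and all names below are assumptions made for illustration.

```python
def booth_digits_packed(data: int, port_width: int, region_starts: set[int]) -> list[int]:
    """Radix-4 Booth digits of a packed word with isolated fields.

    `region_starts` holds the (even) bit positions where a new operand or an
    idle region begins; the sub-encoder starting there receives 0 as its
    lowest input instead of the bit just below it.
    """
    digits = []
    for i in range(port_width // 2):
        low_pos = 2 * i - 1
        if low_pos < 0 or 2 * i in region_starts:
            b_lo = 0                          # extended bit or forced boundary 0
        else:
            b_lo = (data >> low_pos) & 1
        b_mid = (data >> (2 * i)) & 1
        b_hi = (data >> (2 * i + 1)) & 1      # operand MSB reaches the top input
        digits.append(-2 * b_hi + b_mid + b_lo)
    return digits


def digits_value(digits: list[int]) -> int:
    """Signed value represented by a list of radix-4 digits."""
    return sum(d * 4 ** k for k, d in enumerate(digits))


# Separated layout: fields at bits 0-3 and 8-11 of a 16-bit word.
separated = booth_digits_packed((-3 & 0xF) | ((-6 & 0xF) << 8), 16, {4, 8, 12})
# Adjacent layout: fields at bits 0-3 and 4-7 of a 12-bit word.
adjacent = booth_digits_packed((-3 & 0xF) | ((5 & 0xF) << 4), 12, {4, 8})
```

In the separated case, digits 0 and 1 decode to -3, digits 2 and 3 (the idle bits) are both 0, and digits 4 and 5 decode to -6. In the adjacent case, digits 0 and 1 decode to -3 and digits 2 and 3 decode to 5. Without the forced 0, the sign bit of the first multiplicand would make the neighboring sub-encoder emit a spurious nonzero digit.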
In an implementation, the multiplication calculation method further includes: controlling the masking processing by using a switch, where masking processing is performed when the switch is in an on state, and masking processing is not performed when the switch is in an off state.
In an implementation, the step of masking the second multiplicator in the multiplicator data to obtain a first mask result, and masking the first multiplicator in the multiplicator data to obtain a second mask result includes: using two AND gates, to respectively mask the first multiplicator and mask the second multiplicator in the multiplicator data to output the two mask results.
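The switch-controlled masking with two AND gates can be sketched in Python, where each bitwise AND stands in for one gate array. The field width, the field offset, and the behavior with the switch off (the multiplicator data passes through unchanged, as it might for a single full-width multiplication) are assumptions, and the function name is hypothetical.

```python
def mask_multiplicators(data: int, field_width: int, second_offset: int,
                        switch_on: bool = True) -> tuple[int, int]:
    """Return (first mask result, second mask result) for packed multiplicator data."""
    if not switch_on:
        # Assumption: with masking off, both paths see the unmodified data.
        return data, data
    first_field = (1 << field_width) - 1          # ones over the first multiplicator
    second_field = first_field << second_offset   # ones over the second multiplicator
    first_mask_result = data & first_field        # second multiplicator masked out
    second_mask_result = data & second_field      # first multiplicator masked out
    return first_mask_result, second_mask_result


masked = mask_multiplicators(0x0C03, field_width=4, second_offset=8)
bypassed = mask_multiplicators(0x0C03, field_width=4, second_offset=8, switch_on=False)
```

Here `masked` is `(0x0003, 0x0C00)`: each mask result keeps exactly one multiplicator in its original bit position, so partial products formed from it involve only that multiplicator.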
In an implementation, the IP core 1220 may be implemented by using a DSP or a CPU, for example, by using a soft core. In an implementation, the IP core 1220 may alternatively be implemented by using a hard core. In another implementation, the IP core 1220 may be implemented by running a firm core on a DSP/CPU. For example, the multiplication processing system 1200 reads a configuration file (firm core) from a computer-readable storage medium, where the configuration file is used to configure the multiplication processing system 1200, so that the multiplication processing system 1200 can be configured as any multiplier or data processing system provided in the embodiments of this application, or configured to implement any method provided in the embodiments of this application. The configuration file is a functionally verified circuit structure encoding file.
In an implementation, an embodiment of this application provides a computer-readable storage medium that stores a computer program. When the computer program is executed by a processor, any method provided in the embodiments of this application is implemented.
In an implementation, an embodiment of this application provides a computer program. The computer program includes instructions. When the computer program is executed by a computer, the computer is enabled to perform any method provided in the embodiments of this application.
The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
This application is a continuation of International Application No. PCT/CN2019/106902, filed on Sep. 20, 2019, the disclosure of which is hereby incorporated by reference in its entirety.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/CN2019/106902 | Sep 2019 | US
Child | 17698068 | | US