This application claims priority under 35 U.S.C. § 119 to Chinese Patent Application No. 202210832162.6, filed on Jul. 15, 2022, entitled as “HIGH-PERFORMANCE MULTIPLIER-ACCUMULATOR, MULTIPLICATION-ACCUMULATION METHOD, AND ELECTRONIC DEVICE” and Chinese Patent Application No. 202210830964.3, filed on Jul. 15, 2022, entitled as “ASYMMETRIC MULTIPLIER-ACCUMULATOR, MULTIPLICATION-ACCUMULATION METHOD, AND ELECTRONIC DEVICE”, both of which are incorporated herein by reference in their entireties.
The present disclosure relates to the field of chip technologies, and in particular, to a multiplication-accumulation system, a multiplication-accumulation method, and an electronic device.
In a logical operation unit of a microprocessor, a floating-point number multiplication-accumulation operation is generally realized by using a multiplier-accumulator. In general, a design solution of the multiplier-accumulators in the logical operation unit is to arrange n single-precision multiplier-accumulators and 2n half-precision multiplier-accumulators. When the logical operation unit performs single-precision floating-point number multiplication-accumulation operations, the n single-precision multiplier-accumulators simultaneously operate to obtain n single-precision multiplication-accumulation results. When the logical operation unit performs half-precision floating-point number multiplication-accumulation operations, the 2n half-precision multiplier-accumulators simultaneously operate to obtain 2n half-precision multiplication-accumulation results.
However, when the logical operation unit performs single-precision floating-point number multiplication-accumulation operations, the 2n half-precision multiplier-accumulators are idle, and when it performs half-precision floating-point number multiplication-accumulation operations, the n single-precision multiplier-accumulators are idle. This leads to a low utilization rate of the multiplier-accumulators, and providing such a large number of multiplier-accumulators increases the hardware overhead.
In view of the above technical problems, there is a need to provide a high-performance multiplier-accumulator, a multiplication-accumulation method, and an electronic device that can improve the utilization of the multiplier-accumulators.
In a first aspect, the present disclosure provides a high-performance multiplier-accumulator, the high-performance multiplier-accumulator including: N single-precision multiplication-accumulation units, each of the single-precision multiplication-accumulation units including: two half-precision multiplier-accumulators;
In a second aspect, the present disclosure further provides a multiplication-accumulation method, the method including:
In a third aspect, the present disclosure further provides an asymmetric multiplier-accumulator, the asymmetric multiplier-accumulator including: N multiplication-accumulation units, each of the multiplication-accumulation units including: a single-precision multiplier-accumulator and a half-precision multiplier-accumulator;
In a fourth aspect, the present disclosure further provides a multiplication-accumulation method, the method including:
In a fifth aspect, the present disclosure further provides an electronic device. The electronic device includes a memory and a processor. The memory stores a computer program, and the processor implements the method provided in the second aspect or the method provided in the fourth aspect when executing the computer program.
In a sixth aspect, the present disclosure further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and the method provided in the second aspect or the method provided in the fourth aspect is implemented when the computer program is executed by a processor.
In a seventh aspect, the present disclosure further provides a computer program product. The computer program product includes a computer program, and the method provided in the second aspect or the method provided in the fourth aspect is implemented when the computer program is executed by a processor.
Two design ideas are provided in the above multiplier-accumulators, multiplication-accumulation methods, electronic device, computer-readable storage medium, and computer program product. In the first design idea, 2n half-precision multiplier-accumulators are still arranged in the logical operation unit, and they are grouped in pairs to form a total of n groups. When single-precision floating-point number multiplication-accumulation operations are performed, the half-precision multiplier-accumulators also participate in the operations instead of being idle, which improves their utilization. Moreover, this solution eliminates the n single-precision multiplier-accumulators and thus reduces the hardware overhead. In the second design idea, n half-precision multiplier-accumulators and n single-precision multiplier-accumulators are arranged in the logical operation unit, and one half-precision multiplier-accumulator and one single-precision multiplier-accumulator form a group, giving a total of n groups. When half-precision floating-point number multiplication-accumulation operations are performed, the single-precision multiplier-accumulators also participate in the operations instead of being idle, which improves their utilization. Moreover, this solution saves n half-precision multiplier-accumulators and reduces the hardware overhead.
In order to make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that specific embodiments described herein are only intended to explain the present disclosure, and are not intended to limit the present disclosure.
For ease of understanding, terms involved in the embodiments of the present disclosure are explained as follows.
Single-precision floating-point number: The IEEE 754 binary floating-point arithmetic standard (Institute of Electrical and Electronics Engineers 754) stipulates that a single-precision floating-point number consists of 32-bit binary data. A data format of the single-precision floating-point number is shown in
S denotes a sign bit, S=0 means that a value represented by the single-precision floating-point number is positive, and S=1 means that the value represented by the single-precision floating-point number is negative.
Exponent denotes an exponent part, which is 8-bit binary data.
Mantissa denotes the fraction part after the binary point, which is 23-bit binary data.
Normal means that the exponent is neither all 1s nor all 0s, and the leading 1 before the binary point is implicit (omitted).
Denormal means that the exponent is all 0s, the mantissa is not all 0s, and the leading 0 before the binary point is implicit (omitted).
Exponent bias: A bias of a normal single-precision floating-point number is 0x7F, and a bias of a denormal single-precision floating-point number is 0x7E.
A value represented by the normal single-precision floating-point number is:
$$\mathrm{data} = (-1)^{S} \times 2^{\mathrm{exponent} - \mathrm{0x7F}} \times (1.\mathrm{mantissa})$$
A value represented by the denormal single-precision floating-point number is:
$$\mathrm{data} = (-1)^{S} \times 2^{\mathrm{exponent} - \mathrm{0x7E}} \times (0.\mathrm{mantissa})$$
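The two single-precision formulas can be checked with a short Python sketch. The function names `decode_fp32` and `fp32_bits` are hypothetical; the standard `struct` module is used only to obtain reference bit patterns. Patterns whose exponent is all 1s (infinity/NaN), which the formulas above do not cover, are rejected.

```python
import struct

def decode_fp32(bits: int) -> float:
    """Evaluate a 32-bit pattern with the normal/denormal formulas above."""
    s = (bits >> 31) & 0x1            # sign bit S
    exponent = (bits >> 23) & 0xFF    # 8-bit exponent field
    mantissa = bits & 0x7FFFFF        # 23-bit fraction after the binary point
    if exponent == 0xFF:
        raise ValueError("all-1s exponent (infinity/NaN) is not covered by the formulas")
    if exponent == 0:
        # denormal (or zero): 0.mantissa with bias 0x7E
        return (-1) ** s * 2.0 ** (exponent - 0x7E) * (mantissa / 2**23)
    # normal: implicit leading 1 (1.mantissa) with bias 0x7F
    return (-1) ** s * 2.0 ** (exponent - 0x7F) * (1 + mantissa / 2**23)

def fp32_bits(x: float) -> int:
    """Bit pattern of x rounded to single precision, used as a reference."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

print(hex(fp32_bits(-6.5)))                    # 0xc0d00000
print(decode_fp32(0xC0D00000))                 # -6.5
print(decode_fp32(0x00000001) == 2.0 ** -149)  # True: smallest denormal
```

The denormal bias 0x7E makes an all-0s exponent field scale by 2^-126, the same scale as the smallest normal number, so the two ranges join without a gap.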
Half-precision floating-point number: A half-precision floating-point number is formed by 16-bit binary data. A data format of the half-precision floating-point number is shown in
S denotes a sign bit, S=0 means that a value represented by the half-precision floating-point number is positive, and S=1 means that the value represented by the half-precision floating-point number is negative.
Exponent denotes an exponent part, which is 5-bit binary data.
Mantissa denotes the fraction part after the binary point, which is 10-bit binary data.
Normal means that the exponent is neither all 1s nor all 0s, and the leading 1 before the binary point is implicit (omitted).
Denormal means that the exponent is all 0s, the mantissa is not all 0s, and the leading 0 before the binary point is implicit (omitted).
Exponent bias: A bias of a normal half-precision floating-point number is 0xF, and a bias of a denormal half-precision floating-point number is 0xE.
A value represented by the normal half-precision floating-point number is:
$$\mathrm{data} = (-1)^{S} \times 2^{\mathrm{exponent} - \mathrm{0xF}} \times (1.\mathrm{mantissa})$$
A value represented by the denormal half-precision floating-point number is:
$$\mathrm{data} = (-1)^{S} \times 2^{\mathrm{exponent} - \mathrm{0xE}} \times (0.\mathrm{mantissa})$$
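Because a half-precision number has only 2^16 bit patterns, the half-precision formulas (biases 0xF and 0xE) can be checked exhaustively. In the following Python sketch, `decode_fp16` and `fp16_reference` are hypothetical names; the reference decoder uses the `struct` module's `'e'` format character, which implements IEEE 754 binary16.

```python
import struct

def decode_fp16(bits: int) -> float:
    """Evaluate a 16-bit pattern with the half-precision formulas above."""
    s = (bits >> 15) & 0x1          # sign bit S
    exponent = (bits >> 10) & 0x1F  # 5-bit exponent field
    mantissa = bits & 0x3FF         # 10-bit fraction after the binary point
    if exponent == 0x1F:
        raise ValueError("all-1s exponent (infinity/NaN) is not covered by the formulas")
    if exponent == 0:
        # denormal (or zero): 0.mantissa with bias 0xE
        return (-1) ** s * 2.0 ** (exponent - 0xE) * (mantissa / 2**10)
    # normal: implicit leading 1 (1.mantissa) with bias 0xF
    return (-1) ** s * 2.0 ** (exponent - 0xF) * (1 + mantissa / 2**10)

def fp16_reference(bits: int) -> float:
    """Decode the same pattern with struct's IEEE 754 binary16 ('e') format."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]

# Compare every pattern whose exponent field is not all 1s.
mismatches = [b for b in range(1 << 16)
              if ((b >> 10) & 0x1F) != 0x1F and decode_fp16(b) != fp16_reference(b)]
print(len(mismatches))  # 0
```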
In a logical operation unit of a microprocessor, a multiplication-accumulation operation for floating-point numbers is generally realized by using a multiplier-accumulator. According to some embodiments, references may be made to
In view of the above problem, two other design solutions are presented in some embodiments of the present disclosure. One design solution is, as shown in
As shown in
It is to be noted that the high-performance multiplier-accumulator shown in
The high-performance multiplier-accumulator shown in
Firstly, the high-performance multiplier-accumulator shown in
According to an embodiment, the high-performance multiplier-accumulator includes: N single-precision multiplication-accumulation units. Each single-precision multiplication-accumulation unit includes: two half-precision multiplier-accumulators. When the high-performance multiplier-accumulator performs a single-precision floating-point number multiplication-accumulation operation, the two half-precision multiplier-accumulators in each single-precision multiplication-accumulation unit are configured to be combined to perform a multiplication-accumulation operation on to-be-processed single-precision floating-point numbers to obtain a corresponding single-precision multiplication-accumulation result, and a total of N multiplication-accumulation results are obtained. When the high-performance multiplier-accumulator performs half-precision floating-point number multiplication-accumulation operations, each half-precision multiplier-accumulator is configured to perform the multiplication-accumulation operations on to-be-processed half-precision floating-point numbers to obtain a corresponding half-precision multiplication-accumulation result, and a total of 2N multiplication-accumulation results are obtained.
The two half-precision multiplier-accumulators include a first half-precision multiplier-accumulator and a second half-precision multiplier-accumulator, and the to-be-processed single-precision floating-point numbers include a first single-precision multiplier, a second single-precision multiplier, and a single-precision addend.
When the high-performance multiplier-accumulator performs the single-precision floating-point number multiplication-accumulation operation, the first half-precision multiplier-accumulator is specifically configured to perform a first-part multiplication to obtain a first multiplication result, and transmit the first multiplication result to the second half-precision multiplier-accumulator. The second half-precision multiplier-accumulator is specifically configured to perform a second-part multiplication to obtain a second multiplication result. The first-part multiplication and the second-part multiplication are obtained by dividing, according to a preset rule, the multiplication of a decimal of the first single-precision multiplier by a decimal of the second single-precision multiplier. The second half-precision multiplier-accumulator is further configured to determine a decimal of a multiplication result according to the first multiplication result and the second multiplication result. The single-precision multiplication-accumulation result is determined according to the decimal of the multiplication result, an exponent of the first single-precision multiplier, an exponent of the second single-precision multiplier, a sign of the first single-precision multiplier, a sign of the second single-precision multiplier, a decimal of the single-precision addend, an exponent of the single-precision addend, and a sign of the single-precision addend.
The second half-precision multiplier-accumulator includes: an exponent addition module, a decimal addition module, and a determination module. The exponent addition module is configured to determine an exponent of the multiplication result according to the exponent of the first single-precision multiplier and the exponent of the second single-precision multiplier. The decimal addition module is configured to determine a sign of the multiplication result according to the sign of the first single-precision multiplier and the sign of the second single-precision multiplier. The determination module is configured to determine a decimal of a multiplication-accumulation result, a sign of the multiplication-accumulation result, and an exponent of the multiplication-accumulation result according to the exponent of the multiplication result, the exponent of the single-precision addend, the sign of the multiplication result, the sign of the single-precision addend, the decimal of the multiplication result, and the decimal of the single-precision addend; and determine the single-precision multiplication-accumulation result according to the decimal of the multiplication-accumulation result, the sign of the multiplication-accumulation result, and the exponent of the multiplication-accumulation result.
The determination module includes: a first exponent subtraction module, a first shift operation module, a first addition module, and a first multiplication-accumulation result exponent determination module. The first exponent subtraction module is configured to determine an absolute value of an exponent difference according to the exponent of the multiplication result and the exponent of the single-precision addend. The first shift operation module is configured to perform a shift operation on the decimal of the multiplication result or the decimal of the single-precision addend according to the exponent of the multiplication result, the exponent of the single-precision addend, and the absolute value of the exponent difference, to obtain a decimal of a multiplication result after the shift operation and a decimal of a single-precision addend after the shift operation. The first addition module is configured to determine the decimal of the multiplication-accumulation result and the sign of the multiplication-accumulation result according to the sign of the multiplication result, the sign of the single-precision addend, the decimal of the multiplication result after the shift operation, and the decimal of the single-precision addend after the shift operation. The first multiplication-accumulation result exponent determination module is configured to determine the exponent of the multiplication-accumulation result according to the exponent of the multiplication result and the exponent of the single-precision addend.
The exponent addition module is specifically configured to:
op01.exp=op0.exp+op1.exp−bias
The decimal addition module is specifically configured to:
The first shift operation module is specifically configured to:
The first addition module is specifically configured to:
The first multiplication-accumulation result exponent determination module is specifically configured to:
In an embodiment, the to-be-processed half-precision floating-point numbers include: a first half-precision multiplier, a second half-precision multiplier, and a half-precision addend.
When the high-performance multiplier-accumulator performs a half-precision floating-point number multiplication-accumulation operation, the half-precision multiplier-accumulator is specifically configured to determine a decimal of a multiplication result according to a decimal of the first half-precision multiplier and a decimal of the second half-precision multiplier; determine an exponent of the multiplication result according to an exponent of the first half-precision multiplier and an exponent of the second half-precision multiplier; determine a sign of the multiplication result according to a sign of the first half-precision multiplier and a sign of the second half-precision multiplier; determine a decimal of a multiplication-accumulation result, a sign of the multiplication-accumulation result, and an exponent of the multiplication-accumulation result according to the decimal of the multiplication result, the exponent of the multiplication result, the sign of the multiplication result, the exponent of the half-precision addend, the sign of the half-precision addend, and the decimal of the half-precision addend; and determine the half-precision multiplication-accumulation result according to the decimal of the multiplication-accumulation result, the sign of the multiplication-accumulation result, and the exponent of the multiplication-accumulation result.
The half-precision multiplier-accumulator includes: a second exponent subtraction module, a second shift operation module, a second addition module, and a second multiplication-accumulation result exponent determination module. The second exponent subtraction module is configured to determine an absolute value of an exponent difference according to the exponent of the multiplication result and the exponent of the half-precision addend. The second shift operation module is configured to perform a shift operation on the decimal of the multiplication result or the decimal of the half-precision addend according to the exponent of the multiplication result, the exponent of the half-precision addend, and the absolute value of the exponent difference, to obtain a decimal of a multiplication result after the shift operation and a decimal of a half-precision addend after the shift operation. The second addition module is configured to determine the decimal of the multiplication-accumulation result and the sign of the multiplication-accumulation result according to the sign of the multiplication result, the sign of the half-precision addend, the decimal of the multiplication result after the shift operation, and the decimal of the half-precision addend after the shift operation. The second multiplication-accumulation result exponent determination module is configured to determine the exponent of the multiplication-accumulation result according to the exponent of the multiplication result and the exponent of the half-precision addend.
In an embodiment, as shown in
In S602, when the high-performance multiplier-accumulator performs single-precision floating-point number multiplication-accumulation operations, the two half-precision multiplier-accumulators in each single-precision multiplication-accumulation unit are configured to be combined to perform a multiplication-accumulation operation on to-be-processed single-precision floating-point numbers, and obtain a corresponding single-precision multiplication-accumulation result, and a total of N multiplication-accumulation results are obtained.
For ease of description, for each single-precision multiplication-accumulation unit, the two half-precision multiplier-accumulators included in the unit are referred to as a first half-precision multiplier-accumulator and a second half-precision multiplier-accumulator respectively. As its name suggests, the multiplication-accumulation operation includes both multiplication and addition. Therefore, the to-be-processed single-precision floating-point numbers include two single-precision multipliers and one single-precision addend. For ease of description, the two single-precision multipliers are referred to as a first single-precision multiplier and a second single-precision multiplier respectively.
An expression of the multiplication-accumulation operation is:
dst=op0*op1+op2
Variables used in a multiplication-accumulation process are described below.
Optionally, when the multiplication-accumulation operation is performed on the to-be-processed single-precision floating-point numbers, op0.mant*op1.mant may be divided into two parts based on op0.mant and op1.mant according to a preset rule. For ease of description, the two partial multiplications obtained by the division are referred to as a first-part multiplication and a second-part multiplication.
For example, since
op0.mant[23:0]*op1.mant[23:0]=op0.mant[23:0]*op1.mant[11:0]+(op0.mant[23:0]*op1.mant[23:12])*2^12
op0.mant[23:0]*op1.mant[23:0] may be divided into two parts of multiplication. The first-part multiplication is op0.mant[23:0]*op1.mant[11:0], and the second-part multiplication is op0.mant[23:0]*op1.mant[23:12]. Alternatively, the first-part multiplication is op0.mant[23:0]*op1.mant[23:12], and the second-part multiplication is op0.mant[23:0]*op1.mant[11:0].
It is to be noted that the above manner of dividing the multiplication is merely an example; the embodiments of the present disclosure are not limited thereto, and other division manners are also feasible. For example, op0.mant[23:0]*op1.mant[23:0] may be divided into op0.mant[23:0]*op1.mant[12:0] and op0.mant[23:0]*op1.mant[23:13], with the latter weighted by 2^13.
Optionally, the first half-precision multiplier-accumulator may perform the first-part multiplication to obtain a first multiplication result and transmit the first multiplication result to the second half-precision multiplier-accumulator. The second half-precision multiplier-accumulator may perform the second-part multiplication to obtain a second multiplication result, and then determine op01.mant according to the first multiplication result and the second multiplication result. Specifically, the second half-precision multiplier-accumulator may add the two multiplication results, each aligned to the bit weight of its part of op1.mant (in the division above, the partial product for op1.mant[23:12] is weighted by 2^12), to obtain op01.mant. In the embodiments of the present disclosure, when a single-precision floating-point number multiplication-accumulation operation is performed, the multiplication is divided into two parts, each of the two half-precision multiplier-accumulators performs one part, and the partial results are then added. In this way, the two half-precision multiplier-accumulators can jointly perform the single-precision floating-point number multiplication-accumulation operation, which improves the utilization rate of the half-precision multiplier-accumulators and reduces the hardware overhead because no additional single-precision multiplier-accumulator is required.
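The division identity can be checked with a small Python sketch. `split_mant_product` is a hypothetical name, and the sketch models only the arithmetic, not the widths of the hardware multiplier arrays. The key point it shows is that the partial product of the upper bits of op1.mant must be weighted by 2^split (2^12 or 2^13 for the two divisions above) before the two parts are added.

```python
import random

def split_mant_product(a: int, b: int, split: int = 12) -> int:
    """Form a * b from two partial products, splitting b at bit `split`."""
    low = b & ((1 << split) - 1)      # e.g. op1.mant[11:0]
    high = b >> split                 # e.g. op1.mant[23:12]
    first = a * low                   # first-part multiplication
    second = a * high                 # second-part multiplication
    return first + (second << split)  # weight the upper part by 2**split, then add

rng = random.Random(0)
for _ in range(1000):
    # 24-bit single-precision mantissas with the implicit leading 1 set
    a = rng.randrange(1 << 23, 1 << 24)
    b = rng.randrange(1 << 23, 1 << 24)
    assert split_mant_product(a, b, 12) == a * b
    assert split_mant_product(a, b, 13) == a * b
print("ok")
```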
Optionally, after the two half-precision multiplier-accumulators operate together to obtain op01.mant, the second half-precision multiplier-accumulator may determine the corresponding single-precision multiplication-accumulation result according to op01.mant, op0.exp, op1.exp, op0.S, op1.S, op2.mant, op2.exp, and op2.S. Specifically, the second half-precision multiplier-accumulator may determine op01.exp according to op0.exp and op1.exp; determine op01.S according to op0.S and op1.S; determine dst.mant, dst.S, and dst.exp according to op01.exp, op2.exp, op01.S, op2.S, op01.mant, and op2.mant; and then normalize dst.mant, dst.S, and dst.exp to a data format of the single-precision floating-point number, so as to obtain the single-precision multiplication-accumulation result.
Optionally, two operands of the addition operation are op01 and op2. In order to make exponents of these two numbers the same, decimals are required to be shifted. The second half-precision multiplier-accumulator may determine an absolute value of an exponent difference according to op01.exp and op2.exp; and perform a shift operation on op01.mant or op2.mant according to op01.exp, op2.exp, and the absolute value of the exponent difference, to obtain op01.mant after the shift operation and op2.mant after the shift operation. The second half-precision multiplier-accumulator may determine dst.mant and dst.S according to op01.S, op2.S, op01.mant after the shift operation, and op2.mant after the shift operation; and determine dst.exp according to op01.exp and op2.exp.
Structures of the first half-precision multiplier-accumulator and the second half-precision multiplier-accumulator are described below.
According to a possible implementation, the first half-precision multiplier-accumulator and the second half-precision multiplier-accumulator may be designed according to structures shown in
It is to be noted that, as shown by a dashed box in
A process of the single-precision floating-point number multiplication-accumulation operation is described in detail below with reference to the structures shown in
In S801, the decimal multiplication module 10 performs the first-part multiplication to obtain a first multiplication result and transmits it to the decimal addition module 27. The decimal multiplication module 20 performs the second-part multiplication to obtain a second multiplication result and transmits it to the decimal addition module 27. The decimal addition module 27 adds the first multiplication result and the second multiplication result, aligned according to their bit weights, to obtain op01.mant.
Specifically, the division manner of the multiplication may be obtained with reference to the above. Details are not described herein again in the embodiments of the present disclosure.
In S802, the exponent addition module 21 calculates op01.exp by using the following formula:
op01.exp=op0.exp+op1.exp−bias
where bias denotes a bias of a normal single-precision floating-point number, that is, bias=0x7F.
In S803, the decimal addition module 27 performs an XOR operation on op0.S and op1.S to obtain op01.S.
Optionally, op0.S and op1.S may be inputted to the decimal addition module 27, so that the decimal addition module 27 performs the XOR operation.
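S802 and S803 amount to biased-exponent addition and an XOR of the sign bits. The Python sketch below (function names hypothetical) checks them on 2.0 × 3.0 = 6.0: both exponent fields are 0x80, and the product 6.0 = 1.5 × 2^2 has exponent field 0x81. Each input field carries one copy of the bias, so one copy is subtracted. The sketch leaves out the case in which the product of the mantissas reaches 2 or more (for example 1.5 × 1.5), which the later normalization has to absorb by incrementing the exponent.

```python
BIAS_FP32 = 0x7F  # bias of a normal single-precision number (S802)

def product_exponent(op0_exp: int, op1_exp: int, bias: int = BIAS_FP32) -> int:
    """op01.exp = op0.exp + op1.exp - bias."""
    return op0_exp + op1_exp - bias

def product_sign(op0_s: int, op1_s: int) -> int:
    """op01.S = op0.S XOR op1.S (S803)."""
    return op0_s ^ op1_s

print(hex(product_exponent(0x80, 0x80)))        # 0x81
print(product_sign(1, 0), product_sign(1, 1))   # 1 0
```

The `bias` parameter also covers the half-precision path by passing 0xF.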
In S804, the exponent subtraction module 22 calculates an absolute value of an exponent difference of op01.exp and op2.exp.
Optionally, after calculating the absolute value of the exponent difference, the exponent subtraction module 22 may transmit op01.exp, op2.exp, and the absolute value to the shift operation module 23.
In S805, the shift operation module 23 compares op01.exp and op2.exp, and performs a shift operation according to a comparison result.
Specifically, if op01.exp is greater than or equal to op2.exp, op2.mant is right-shifted by a number of bits equal to the absolute value; if op01.exp is less than op2.exp, op01.mant is right-shifted by that number of bits. This yields op01.mant and op2.mant after the shift operation.
That is, if op01.exp>=op2.exp, op2.mant is right-shifted by |op01.exp-op2.exp|.
If op01.exp<op2.exp, op01.mant is right-shifted by |op01.exp-op2.exp|.
Optionally, two selectors a and b may be provided between the decimal addition module 27 and the shift operation module 23. If the shift operation module 23 determines op01.exp>=op2.exp, op2.mant is selected from the selector b for a shift operation. In this case, the selector a transmits op01.mant to the addition module 24. If the shift operation module 23 determines op01.exp<op2.exp, op01.mant is selected from the selector b for a shift operation. In this case, the selector a transmits op2.mant to the addition module 24.
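The alignment in S804 and S805, together with the choice of dst.exp made later in S807, can be sketched in Python as follows. `align` is a hypothetical name; the sketch assumes both mantissas are integers with the same number of fraction bits, and bits shifted out are simply dropped (no guard, round, or sticky bits are modeled).

```python
def align(op01_exp: int, op01_mant: int, op2_exp: int, op2_mant: int):
    """Return (op01.mant, op2.mant) after the shift, plus dst.exp."""
    diff = abs(op01_exp - op2_exp)  # exponent subtraction module (S804)
    if op01_exp >= op2_exp:
        # selector b routes op2.mant to the shifter; selector a passes op01.mant on
        return op01_mant, op2_mant >> diff, op01_exp
    # selector b routes op01.mant to the shifter; selector a passes op2.mant on
    return op01_mant >> diff, op2_mant, op2_exp

print(align(5, 0b1000, 3, 0b1000))  # (8, 2, 5)
print(align(3, 0b1000, 5, 0b1100))  # (2, 12, 5)
```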
In S806, the addition module 24 compares op01.S and op2.S, and performs summation according to a comparison result.
If op01.S and op2.S are different, op2.mant after the shift operation in S805 is inverted and incremented by one (that is, its two's complement is taken), and the result is summed with op01.mant after the shift operation in S805; the sum is taken as dst.mant. Moreover, dst.S is equal to op01.S.
If op01.S and op2.S are the same, op2.mant and op01.mant after the shift operation in S805 are directly summed, and a sum result is taken as dst.mant. Moreover, dst.S is equal to op01.S.
Optionally, the decimal addition module 27 may transmit op01.S to the addition module 24 through the shift operation module 23, and may input op2.S to the addition module 24, so that the addition module 24 can judge whether op01.S and op2.S are the same. Alternatively, op0.S, op1.S, and op2.S are all directly inputted to the addition module 24, so that the addition module 24 performs the XOR operation to obtain op01.S and performs the above judgment according to op01.S and op2.S.
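The summation in S806 can be sketched in Python with a fixed-width adder. `add_aligned` and its `width` parameter are hypothetical. The sketch assumes that, when the signs differ, the aligned op01.mant is at least as large as the aligned op2.mant; that is the case in which taking dst.S = op01.S gives the correct sign, and handling the opposite case is outside this sketch.

```python
def add_aligned(op01_s: int, op01_mant: int, op2_s: int, op2_mant: int, width: int = 64):
    """Return (dst.mant, dst.S) from the aligned mantissas (S806).

    Assumes op01_mant >= op2_mant whenever the signs differ.
    """
    mask = (1 << width) - 1  # hypothetical adder width
    if op01_s != op2_s:
        neg_op2 = (~op2_mant + 1) & mask          # invert, then add one
        dst_mant = (op01_mant + neg_op2) & mask   # wraps to op01_mant - op2_mant
    else:
        dst_mant = op01_mant + op2_mant           # same signs: plain sum
    return dst_mant, op01_s

print(add_aligned(0, 10, 1, 3))  # (7, 0)
print(add_aligned(1, 10, 1, 3))  # (13, 1)
```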
In S807, the multiplication-accumulation result exponent determination module 26 determines dst.exp according to op01.exp and op2.exp.
Specifically, if op01.exp>=op2.exp, dst.exp=op01.exp.
If op01.exp<op2.exp, dst.exp=op2.exp.
Optionally, after dst.exp is obtained, dst.exp may be normalized to obtain an 8-bit binary exponent of the normalized single-precision floating-point number.
In S808, the normalization module 25 normalizes dst.mant obtained in S806 to obtain a 23-bit binary mantissa of the normalized single-precision floating-point number.
Optionally, the addition module 24 may transmit dst.S to the normalization module 25. In this case, the normalization module 25 also outputs a sign bit in addition to the 23-bit binary mantissa. The sign bit, the 8-bit binary exponent obtained in S807, and the 23-bit binary mantissa obtained in S808 form the single-precision multiplication-accumulation result.
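Putting S801 to S808 together, the following Python model computes dst = op0*op1 + op2 for single-precision inputs, forming op01.mant from two partial products split at op1.mant bit 12. All function names are hypothetical, and this is a behavioral sketch under simplifying assumptions rather than the disclosed circuit: all inputs and the result must be normal numbers, normalization truncates instead of rounding, overflow and underflow are not handled, and when the signs differ |op01| is assumed to be at least |op2| so that dst.S = op01.S holds. Python integers stand in for the fixed-width datapaths.

```python
import struct

def fields(x: float):
    """Split x (rounded to single precision) into S, exponent, 24-bit mantissa."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    m = (bits & 0x7FFFFF) | (1 << 23)  # normal inputs only: restore the implicit 1
    return s, e, m

def pack(s: int, e: int, frac: int) -> float:
    bits = (s << 31) | (e << 23) | frac
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def fp32_mac(op0: float, op1: float, op2: float) -> float:
    """dst = op0 * op1 + op2 built from two partial products."""
    s0, e0, m0 = fields(op0)
    s1, e1, m1 = fields(op1)
    s2, e2, m2 = fields(op2)
    # S801: first unit multiplies by op1.mant[11:0], second by op1.mant[23:12]
    first = m0 * (m1 & 0xFFF)
    second = m0 * (m1 >> 12)
    op01_mant = first + (second << 12)  # 46 fraction bits
    op01_exp = e0 + e1 - 0x7F           # S802
    op01_s = s0 ^ s1                    # S803
    op2_mant = m2 << 23                 # rescale op2.mant to 46 fraction bits
    diff = abs(op01_exp - e2)           # S804
    if op01_exp >= e2:                  # S805, with dst.exp chosen as in S807
        op2_mant >>= diff
        dst_exp = op01_exp
    else:
        op01_mant >>= diff
        dst_exp = e2
    if op01_s != s2:                    # S806, assuming |op01| >= |op2|
        dst_mant = op01_mant - op2_mant
    else:
        dst_mant = op01_mant + op2_mant
    if dst_mant == 0:
        return 0.0
    # Normalization: a leading 1 at bit 46 means 1.x, so adjust dst.exp by the
    # offset and keep the 23 bits after the leading 1 (truncation, no rounding).
    lead = dst_mant.bit_length() - 1
    dst_exp += lead - 46
    if lead >= 23:
        frac = (dst_mant >> (lead - 23)) & 0x7FFFFF
    else:
        frac = (dst_mant << (23 - lead)) & 0x7FFFFF
    return pack(op01_s, dst_exp, frac)

print(fp32_mac(1.5, 2.0, 0.25))  # 3.25
print(fp32_mac(3.0, 2.0, -0.5))  # 5.5
```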
In the above multiplication-accumulation method, the multiplication is divided into two parts, and each of the two half-precision multiplier-accumulators performs one part respectively, so that the two half-precision multiplier-accumulators are combined to complete the single-precision floating-point number multiplication-accumulation operation, which improves utilization of the half-precision multiplier-accumulators, requires no additional single-precision multiplier-accumulators, and reduces the hardware overhead.
In S604, when the high-performance multiplier-accumulator performs half-precision floating-point number multiplication-accumulation operations, each half-precision multiplier-accumulator performs the multiplication-accumulation operation on to-be-processed half-precision floating-point numbers to obtain a corresponding half-precision multiplication-accumulation result, and a total of 2N multiplication-accumulation results are obtained.
The to-be-processed half-precision floating-point numbers include two half-precision multipliers and one half-precision addend. For ease of description, the two half-precision multipliers are called a first half-precision multiplier and a second half-precision multiplier respectively.
An expression of the multiplication-accumulation operations is:
dst=op0*op1+op2
where op0 denotes the first half-precision multiplier, op1 denotes the second half-precision multiplier, op2 denotes the half-precision addend, and dst denotes a multiplication-accumulation result.
It is to be noted that, different from the single-precision floating-point number multiplication-accumulation process, since a single half-precision multiplier-accumulator supports the half-precision floating-point number multiplication-accumulation operation, there is no need to divide the multiplication.
Optionally, the half-precision multiplier-accumulator may determine op01.mant according to op0.mant and op1.mant; determine op01.exp according to op0.exp and op1.exp; determine op01.S according to op0.S and op1.S; determine dst.mant, dst.exp, and dst.S according to op01.mant, op01.exp, op01.S, op2.mant, op2.exp, and op2.S; and then normalize dst.mant, dst.exp, and dst.S to a data format of the half-precision floating-point number, so as to obtain the half-precision multiplication-accumulation result.
Similarly, two operands of the addition are op01 and op2. In order to make exponents of these two numbers the same, decimals are required to be shifted. The half-precision multiplier-accumulator may determine an absolute value of an exponent difference according to op01.exp and op2.exp; perform a shift operation on op01.mant or op2.mant according to op01.exp, op2.exp, and the absolute value of the exponent difference, to obtain op01.mant after the shift operation and op2.mant after the shift operation; determine dst.mant and dst.S according to op01.S, op2.S, op01.mant after the shift operation, and op2.mant after the shift operation; and determine dst.exp according to op01.exp and op2.exp.
A process of the half-precision floating-point number multiplication-accumulation operations is described in detail below with reference to the structures shown in
In S1001, the decimal multiplication module 20 calculates a product of op0.mant and op1.mant.
Specifically, op0.mant and op1.mant are both 11-bit binary data, and the op01.mant obtained through calculation is 22-bit binary data. The decimal multiplication module 10 is connected to the decimal addition module 27, and during half-precision operations each half-precision multiplier-accumulator, including the first half-precision multiplier-accumulator, performs its own half-precision floating-point number multiplication-accumulation operation. To prevent the data transmitted from the decimal multiplication module 10 from affecting the result, the decimal addition module 27 may zero the data transmitted from the decimal multiplication module 10. In this way, the output of the decimal addition module 27 is the calculation result of the decimal multiplication module 20.
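The bit widths quoted in the text follow from multiplying two normalized significands: if both lie in [2^(w-1), 2^w), their product lies in [2^(2w-2), 2^(2w)), so it always fits in 2w bits. The short check below, using the hypothetical helper `significand_product`, illustrates this for the 11-bit half-precision decimals (22-bit product) and the 24-bit single-precision decimals (48-bit product).

```python
def significand_product(a_mant: int, b_mant: int, width: int) -> int:
    """Multiply two normalized width-bit significands (top bit set)."""
    if a_mant >> (width - 1) != 1 or b_mant >> (width - 1) != 1:
        raise ValueError("inputs must be normalized width-bit integers")
    product = a_mant * b_mant
    # The product has either 2*width - 1 or 2*width significant bits.
    assert product.bit_length() in (2 * width - 1, 2 * width)
    return product


print(significand_product(2047, 2047, 11).bit_length())                     # 22
print(significand_product(1 << 23, 1 << 23, 24).bit_length())               # 47
print(significand_product((1 << 24) - 1, (1 << 24) - 1, 24).bit_length())   # 48
```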
In S1002, the exponent addition module 21 calculates op01.exp.
In S1003, the decimal addition module 27 performs an XOR operation on op0.S and op1.S to obtain op01.S.
In S1004, the exponent subtraction module 22 calculates an absolute value of an exponent difference of op01.exp and op2.exp.
In S1005, the shift operation module 23 compares op01.exp and op2.exp, and performs a shift operation according to a comparison result.
In S1006, the addition module 24 compares op01.S and op2.S, and performs summation according to a comparison result to obtain dst.mant.
In S1007, the multiplication-accumulation result exponent determination module 26 determines dst.exp according to op01.exp and op2.exp.
After dst.exp is obtained, dst.exp may be normalized to obtain a 5-bit binary exponent of the normalized half-precision floating-point number.
In S1008, the normalization module 25 normalizes dst.mant obtained in S1006 to obtain a 10-bit binary mantissa of the normalized half-precision floating-point number.
Implementation processes of S1001 to S1008 are similar to those of S801 to S808. Details are not described herein again in the embodiments of the present disclosure. The sign bit, the 5-bit binary exponent obtained in S1007, and the 10-bit binary mantissa obtained in S1008 form the half-precision multiplication-accumulation result.
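Normalization, as used in S1007 and S1008, can be sketched as locating the leading 1 of dst.mant, shifting it into the hidden-bit position, and adjusting the exponent by the same amount. This Python sketch assumes a positive magnitude (the sign is carried separately), an unbiased exponent, and truncation of the extra low bits; the rounding behaviour of the normalization module 25 is not specified in the text, and the name `normalize` is hypothetical.

```python
def normalize(dst_mant: int, frac_bits: int, dst_exp: int, out_frac_bits: int = 10):
    """Normalize dst_mant / 2**frac_bits * 2**dst_exp to 1.f x 2**exp.

    Returns (exp, stored_fraction), where stored_fraction holds
    out_frac_bits bits with the hidden leading 1 removed.
    """
    if dst_mant <= 0:
        raise ValueError("sketch expects a positive magnitude")
    lead = dst_mant.bit_length() - 1       # position of the leading 1
    exp = dst_exp + (lead - frac_bits)     # compensate for the shift
    if lead >= out_frac_bits:
        mant = dst_mant >> (lead - out_frac_bits)  # drop (truncate) low bits
    else:
        mant = dst_mant << (out_frac_bits - lead)
    return exp, mant & ((1 << out_frac_bits) - 1)


# 1.5 x 1.5 with 11-bit decimals: 22-bit product with 20 fraction bits
print(normalize(1536 * 1536, 20, 0))  # (1, 128), i.e. 1.125 x 2^1 = 2.25
```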
According to the multiplication-accumulation method in the embodiments of the present disclosure, when the high-performance multiplier-accumulator performs single-precision floating-point number multiplication-accumulation operations, the two half-precision multiplier-accumulators in each single-precision multiplication-accumulation unit are combined to perform the multiplication-accumulation operation on to-be-processed single-precision floating-point numbers to obtain a corresponding single-precision multiplication-accumulation result, and a total of N single-precision multiplication-accumulation results are obtained. When the high-performance multiplier-accumulator performs half-precision floating-point number multiplication-accumulation operations, each half-precision multiplier-accumulator performs the multiplication-accumulation operation on to-be-processed half-precision floating-point numbers to obtain a corresponding half-precision multiplication-accumulation result, and a total of 2N half-precision multiplication-accumulation results are obtained. According to the solution in the embodiments of the present disclosure, when the single-precision floating-point number multiplication-accumulation operations are performed, the half-precision multiplier-accumulators also participate in the operations and are not idle, improving utilization of the half-precision multiplier-accumulators. Moreover, compared with the design solution shown in
The asymmetric multiplier-accumulator shown in
In an embodiment, the asymmetric multiplier-accumulator includes: N multiplication-accumulation units. Each multiplication-accumulation unit includes: a single-precision multiplier-accumulator and a half-precision multiplier-accumulator. When the asymmetric multiplier-accumulator performs half-precision floating-point number multiplication-accumulation operations, the single-precision multiplier-accumulator and the half-precision multiplier-accumulator respectively perform multiplication-accumulation operations on to-be-processed half-precision floating-point numbers to obtain corresponding half-precision multiplication-accumulation results, and a total of 2N half-precision multiplication-accumulation results are obtained. When the asymmetric multiplier-accumulator performs single-precision floating-point number multiplication-accumulation operations, the single-precision multiplier-accumulator performs the multiplication-accumulation operation on to-be-processed single-precision floating-point numbers to obtain a corresponding single-precision multiplication-accumulation result, and a total of N single-precision multiplication-accumulation results are obtained.
The to-be-processed half-precision floating-point numbers include: a first half-precision multiplier, a second half-precision multiplier, and a half-precision addend. The single-precision multiplier-accumulator includes: a first conversion unit, a determination module, and a second conversion unit. The first conversion unit is configured to convert the first half-precision multiplier, the second half-precision multiplier, and the half-precision addend into single precision, to obtain a first single-precision multiplier, a second single-precision multiplier, and a single-precision addend. The determination module is configured to determine the single-precision multiplication-accumulation result according to a decimal of the first single-precision multiplier, an exponent of the first single-precision multiplier, a sign of the first single-precision multiplier, a decimal of the second single-precision multiplier, an exponent of the second single-precision multiplier, a sign of the second single-precision multiplier, a decimal of the single-precision addend, an exponent of the single-precision addend, and a sign of the single-precision addend. The second conversion unit is configured to convert the single-precision multiplication-accumulation result to obtain the half-precision multiplication-accumulation result.
The determination module includes: a decimal multiplication module, an exponent addition module, an addition module, and a determination unit. The decimal multiplication module is configured to determine a decimal of a multiplication result according to the decimal of the first single-precision multiplier and the decimal of the second single-precision multiplier. The exponent addition module is configured to determine an exponent of the multiplication result according to the exponent of the first single-precision multiplier and the exponent of the second single-precision multiplier. The addition module is configured to determine a sign of the multiplication result according to the sign of the first single-precision multiplier and the sign of the second single-precision multiplier. The determination unit is configured to determine a decimal of a multiplication-accumulation result, a sign of the multiplication-accumulation result, and an exponent of the multiplication-accumulation result according to the decimal of the multiplication result, the exponent of the multiplication result, the sign of the multiplication result, the exponent of the single-precision addend, the sign of the single-precision addend, and the decimal of the single-precision addend; and determine the single-precision multiplication-accumulation result according to the decimal of the multiplication-accumulation result, the sign of the multiplication-accumulation result, and the exponent of the multiplication-accumulation result.
The determination unit includes: a first exponent subtraction module, a first shift operation module, a first addition module, and a first multiplication-accumulation result exponent determination module. The first exponent subtraction module is configured to determine an exponent difference according to the exponent of the multiplication result and the exponent of the single-precision addend. The first shift operation module is configured to perform a shift operation on the decimal of the multiplication result or the decimal of the single-precision addend according to the exponent of the multiplication result, the exponent of the single-precision addend, and the exponent difference, to obtain a decimal of a multiplication result after the shift operation and a decimal of a single-precision addend after the shift operation. The first addition module is configured to determine the decimal of the multiplication-accumulation result and the sign of the multiplication-accumulation result according to the sign of the multiplication result, the sign of the single-precision addend, the decimal of the multiplication result after the shift operation, and the decimal of the single-precision addend after the shift operation. The first multiplication-accumulation result exponent determination module is configured to determine the exponent of the multiplication-accumulation result according to the exponent of the multiplication result and the exponent of the single-precision addend.
Specifically, the exponent addition module is configured to determine the exponent of the multiplication result by using the following formula:
op01.exp=op0.exp+op1.exp−bias
where op01.exp denotes the exponent of the multiplication result, op0.exp denotes the exponent of the first single-precision multiplier, op1.exp denotes the exponent of the second single-precision multiplier, and bias denotes an exponent bias.
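Each stored exponent already contains the bias once, so adding two biased exponents counts it twice; subtracting bias once restores a correctly biased product exponent. A minimal sketch, assuming the standard IEEE 754 biases (127 for single precision, 15 for half precision); the name `product_exponent` is hypothetical, and any +1 correction when the decimal product reaches 2 or more is left to normalization.

```python
SINGLE_BIAS = 127  # IEEE 754 binary32 exponent bias
HALF_BIAS = 15     # IEEE 754 binary16 exponent bias


def product_exponent(op0_exp: int, op1_exp: int, bias: int = SINGLE_BIAS) -> int:
    """op01.exp = op0.exp + op1.exp - bias (all exponents biased)."""
    return op0_exp + op1_exp - bias


# 2.0 (biased exp 128) x 4.0 (biased exp 129) = 8.0 (biased exp 130)
print(product_exponent(128, 129))  # 130
```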
Specifically, the addition module is configured to:
Specifically, the first shift operation module is configured to:
Specifically, the first addition module is configured to: if the sign of the multiplication result is different from the sign of the single-precision addend, invert the decimal of the single-precision addend after the shift operation and add one to it, to obtain a decimal of a single-precision addend after the inversion and addition-by-one, sum the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the inversion and addition-by-one, and take the sum as the decimal of the multiplication-accumulation result; and if the sign of the multiplication result is the same as the sign of the single-precision addend, sum the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation, and take the sum as the decimal of the multiplication-accumulation result.
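The invert-and-add-one rule is the two's-complement negation of the addend, so a single adder handles both effective addition and effective subtraction. The sketch below models it on a fixed-width datapath; the width, the handling of a negative sum (taking the sign of the addend and negating the sum back to a magnitude), and the name `add_decimals` are assumptions, since the text only describes the summation itself.

```python
def add_decimals(prod_sign: int, prod_mant: int,
                 addend_sign: int, addend_mant: int, width: int):
    """Sum aligned decimals on a width-bit two's-complement datapath.

    Both magnitudes are assumed to be below 2**(width - 1).
    Returns (sign, magnitude) of the multiplication-accumulation decimal.
    """
    mask = (1 << width) - 1
    if prod_sign == addend_sign:
        # Same signs: plain addition, the common sign is kept.
        return prod_sign, (prod_mant + addend_mant) & mask
    negated = (~addend_mant + 1) & mask      # invert and add one
    total = (prod_mant + negated) & mask
    if total >> (width - 1):                 # negative: addend was larger
        return addend_sign, (~total + 1) & mask
    return prod_sign, total


print(add_decimals(0, 24, 1, 16, 8))  # (0, 8)
print(add_decimals(0, 16, 1, 24, 8))  # (1, 8)
```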
Specifically, the first multiplication-accumulation result exponent determination module is configured to:
In an embodiment, the to-be-processed single-precision floating-point numbers include: a first single-precision multiplier, a second single-precision multiplier, and a single-precision addend. The single-precision multiplier-accumulator is specifically configured to determine a decimal of a multiplication result according to a decimal of the first single-precision multiplier and a decimal of the second single-precision multiplier; determine an exponent of the multiplication result according to an exponent of the first single-precision multiplier and an exponent of the second single-precision multiplier; determine a sign of the multiplication result according to a sign of the first single-precision multiplier and a sign of the second single-precision multiplier; determine a decimal of a multiplication-accumulation result, a sign of the multiplication-accumulation result, and an exponent of the multiplication-accumulation result according to the decimal of the multiplication result, the exponent of the multiplication result, the sign of the multiplication result, the exponent of the single-precision addend, the sign of the single-precision addend, and the decimal of the single-precision addend; and determine the single-precision multiplication-accumulation result according to the decimal of the multiplication-accumulation result, the sign of the multiplication-accumulation result, and the exponent of the multiplication-accumulation result.
The single-precision multiplier-accumulator includes: a second exponent subtraction module, a second shift operation module, a second addition module, and a second multiplication-accumulation result exponent determination module. The second exponent subtraction module is configured to determine an exponent difference according to the exponent of the multiplication result and the exponent of the single-precision addend. The second shift operation module is configured to perform a shift operation on the decimal of the multiplication result or the decimal of the single-precision addend according to the exponent of the multiplication result, the exponent of the single-precision addend, and the exponent difference, to obtain a decimal of a multiplication result after the shift operation and a decimal of a single-precision addend after the shift operation. The second addition module is configured to determine the decimal of the multiplication-accumulation result and the sign of the multiplication-accumulation result according to the sign of the multiplication result, the sign of the single-precision addend, the decimal of the multiplication result after the shift operation, and the decimal of the single-precision addend after the shift operation. The second multiplication-accumulation result exponent determination module is configured to determine the exponent of the multiplication-accumulation result according to the exponent of the multiplication result and the exponent of the single-precision addend.
In an embodiment, as shown in
In S1102, when the asymmetric multiplier-accumulator performs half-precision floating-point number multiplication-accumulation operations, the single-precision multiplier-accumulator and the half-precision multiplier-accumulator perform the multiplication-accumulation operations on to-be-processed half-precision floating-point numbers respectively to obtain corresponding half-precision multiplication-accumulation results, and a total of 2N half-precision multiplication-accumulation results are obtained.
It is to be noted that a process of the half-precision multiplier-accumulator performing the half-precision floating-point number multiplication-accumulation operations may be obtained with reference to the prior art. Details are not described herein again in the embodiments of the present disclosure. The following focuses on a process of the single-precision multiplier-accumulator performing the half-precision floating-point number multiplication-accumulation operations.
The to-be-processed half-precision floating-point numbers include: two half-precision multipliers and one half-precision addend. For ease of description, the two half-precision multipliers are called a first half-precision multiplier and a second half-precision multiplier respectively.
Optionally, the single-precision multiplier-accumulator may convert the first half-precision multiplier, the second half-precision multiplier, and the half-precision addend into single precision, to obtain a first single-precision multiplier, a second single-precision multiplier, and a single-precision addend, and then perform the multiplication-accumulation operation on the first single-precision multiplier, the second single-precision multiplier, and the single-precision addend.
An expression of the multiplication-accumulation operations is:
dst=op0*op1+op2
where op0 denotes the first single-precision multiplier after the conversion, op1 denotes the second single-precision multiplier after the conversion, op2 denotes the single-precision addend after the conversion, and dst denotes a single-precision multiplication-accumulation result.
Optionally, after the above conversion, the single-precision multiplier-accumulator may determine the single-precision multiplication-accumulation result according to op0.mant, op1.mant, op2.mant, op0.exp, op1.exp, op2.exp, op0.S, op1.S, and op2.S; and convert the single-precision multiplication-accumulation result to obtain the half-precision multiplication-accumulation result. In this way, the single-precision multiplier-accumulator can also perform the half-precision floating-point number multiplication-accumulation operations, improving utilization of the single-precision multiplier-accumulator. According to the solution in the embodiments of the present disclosure, n fewer half-precision multiplier-accumulators need to be provided, reducing the hardware overhead.
Optionally, a process of converting the half-precision floating-point number into a single-precision floating-point number may be obtained with reference to the prior art. Details are not described herein again in the embodiments of the present disclosure.
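For normal numbers, one widely used widening rule is exact: keep the sign, re-bias the exponent from 15 to 127, and append 13 zero bits to the 10-bit fraction. The Python sketch below shows that rule, assuming IEEE 754 encodings and using the standard `struct` format codes `e` (binary16) and `f` (binary32) only to move between floats and bit patterns; zero, subnormals, infinities, and NaNs are out of scope, and the function names are hypothetical.

```python
import struct


def half_bits_to_single_bits(h: int) -> int:
    """Widen a normal binary16 bit pattern to binary32 (normals only)."""
    s = (h >> 15) & 0x1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp in (0, 0x1F):
        raise ValueError("zero, subnormal, inf and NaN are not handled here")
    exp32 = exp - 15 + 127        # re-bias: 5-bit -> 8-bit exponent
    frac32 = frac << 13           # 10 -> 23 fraction bits, no information lost
    return (s << 31) | (exp32 << 23) | frac32


def half_to_single(x: float) -> float:
    """Round-trip helper: encode x as binary16, widen, decode as binary32."""
    h = struct.unpack("<H", struct.pack("<e", x))[0]
    f = half_bits_to_single_bits(h)
    return struct.unpack("<f", struct.pack("<I", f))[0]


print(half_to_single(-3.140625))  # -3.140625
```

Because the widening is exact, the single-precision multiplier-accumulator operates on exactly the same values as the original half-precision operands.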
Optionally, the single-precision multiplier-accumulator may determine op01.mant according to op0.mant and op1.mant; determine op01.exp according to op0.exp and op1.exp; determine op01.S according to op0.S and op1.S; determine dst.mant, dst.S, and dst.exp according to op01.mant, op01.exp, op01.S, op2.exp, op2.S, and op2.mant; and then normalize dst.mant, dst.S, and dst.exp to a data format of the single-precision floating-point number, so as to obtain the single-precision multiplication-accumulation result.
Similarly, the two operands of the addition are op01 and op2. To make the exponents of these two numbers equal, the decimals need to be shifted. The single-precision multiplier-accumulator may determine an absolute value of an exponent difference according to op01.exp and op2.exp; perform a shift operation on op01.mant or op2.mant according to op01.exp, op2.exp, and the absolute value of the exponent difference, to obtain op01.mant after the shift operation and op2.mant after the shift operation; determine dst.mant and dst.S according to op01.S, op2.S, op01.mant after the shift operation, and op2.mant after the shift operation; and determine dst.exp according to op01.exp and op2.exp.
Structures of the single-precision multiplier-accumulator and the half-precision multiplier-accumulator are described below.
In a possible implementation, the single-precision multiplier-accumulator and the half-precision multiplier-accumulator may be designed to structures shown in
A process of the single-precision multiplier-accumulator performing the half-precision floating-point number multiplication-accumulation operations is described in detail below with reference to the structures shown in
In S1301, the first conversion unit 47 converts the first half-precision multiplier, the second half-precision multiplier, and the half-precision addend into single precision to obtain op0, op1, and op2.
Optionally, a process of converting the half-precision floating-point number into a single-precision floating-point number may be obtained with reference to the prior art. Details are not described herein again in the embodiments of the present disclosure. The first conversion unit 47, after obtaining op0, op1, and op2 by conversion, transmits op0.mant and op1.mant to the decimal multiplication module 40.
In S1302, the decimal multiplication module 40 calculates a product op01.mant of op0.mant and op1.mant.
Specifically, op0.mant and op1.mant are both 24-bit binary data, and the op01.mant obtained through calculation is 48-bit binary data.
In S1303, the exponent addition module 41 calculates op01.exp.
In S1304, the addition module 44 performs an XOR operation on op0.S and op1.S to obtain op01.S.
Optionally, op0.S, op1.S, and op2.S may be inputted to the addition module 44, so that the addition module 44 can perform the XOR operation to obtain op01.S.
In S1305, the exponent subtraction module 42 calculates an absolute value of an exponent difference of op01.exp and op2.exp.
In S1306, the shift operation module 43 compares op01.exp and op2.exp, and performs a shift operation according to a comparison result.
In S1307, the addition module 44 compares op01.S and op2.S, and performs summation according to a comparison result to obtain dst.mant.
In S1308, the multiplication-accumulation result exponent determination module 46 determines dst.exp according to op01.exp and op2.exp.
After dst.exp is obtained, dst.exp may be normalized to obtain an 8-bit binary exponent of the normalized single-precision floating-point number.
In S1309, the normalization module 45 normalizes dst.mant obtained in S1307 to obtain 23-bit binary mantissa of the normalized single-precision floating-point number.
In S1310, the second conversion unit converts the 23-bit binary mantissa into a 10-bit binary mantissa, and converts the 8-bit binary exponent into a 5-bit binary exponent.
Optionally, the addition module 44 may transmit dst.S to the normalization module 45. In this case, the normalization module 45 also outputs a sign bit in addition to the 23-bit binary mantissa. The sign bit, the 5-bit binary exponent obtained in S1310, and the 10-bit binary mantissa obtained in S1310 form the half-precision multiplication-accumulation result.
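The second conversion unit can be sketched as the reverse re-biasing: subtract 127 and add 15 to the exponent, and keep the top 10 of the 23 fraction bits. The text does not state a rounding mode, so this sketch assumes truncation, and it raises an error when the result falls outside the half-precision normal range instead of modelling overflow or subnormal outputs. All names are hypothetical.

```python
import struct


def single_bits_to_half_bits(f: int) -> int:
    """Narrow a binary32 bit pattern to binary16 by truncation (sketch)."""
    s = (f >> 31) & 0x1
    exp = (f >> 23) & 0xFF                 # 8-bit biased exponent
    frac = f & 0x7FFFFF                    # 23-bit stored fraction
    exp16 = exp - 127 + 15                 # re-bias to a 5-bit exponent
    if not 1 <= exp16 <= 30:
        raise ValueError("overflow/underflow handling is out of scope")
    frac16 = frac >> 13                    # keep the top 10 fraction bits
    return (s << 15) | (exp16 << 10) | frac16


def f32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f16_bits(x: float) -> int:
    return struct.unpack("<H", struct.pack("<e", x))[0]


print(single_bits_to_half_bits(f32_bits(3.25)) == f16_bits(3.25))   # True
print(hex(single_bits_to_half_bits(f32_bits(1 + 2**-10 + 2**-11))))  # 0x3c01
```

The last line illustrates the truncation assumption: 1 + 2^-10 + 2^-11 becomes 1 + 2^-10 (0x3c01) instead of being rounded up.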
In the above multiplication-accumulation method, the half-precision floating-point number is converted into single precision. In this way, the single-precision multiplier-accumulator can also perform the half-precision floating-point number multiplication-accumulation operations, improving utilization of the single-precision multiplier-accumulator. According to the solution in the embodiments of the present disclosure, n fewer half-precision multiplier-accumulators need to be provided, reducing the hardware overhead.
It is to be noted that, compared with the process shown in
In S1104, when the asymmetric multiplier-accumulator performs single-precision floating-point number multiplication-accumulation operations, the single-precision multiplier-accumulator performs the multiplication-accumulation operation on to-be-processed single-precision floating-point numbers to obtain a corresponding single-precision multiplication-accumulation result, and a total of N single-precision multiplication-accumulation results are obtained.
The to-be-processed single-precision floating-point numbers include two single-precision multipliers and one single-precision addend. For ease of description, the two single-precision multipliers are called a first single-precision multiplier and a second single-precision multiplier respectively.
An expression of the multiplication-accumulation operations is:
dst=op0*op1+op2
where op0 denotes the first single-precision multiplier, op1 denotes the second single-precision multiplier, op2 denotes the single-precision addend, and dst denotes a multiplication-accumulation result.
It is to be noted that, different from the half-precision floating-point number multiplication-accumulation process, since the single-precision multiplier-accumulator supports the single-precision floating-point number multiplication-accumulation operations, conversion is not required when the single-precision multiplier-accumulator performs the single-precision floating-point number multiplication-accumulation operations.
Optionally, the single-precision multiplier-accumulator may determine op01.mant according to op0.mant and op1.mant; determine op01.exp according to op0.exp and op1.exp; determine op01.S according to op0.S and op1.S; determine dst.mant, dst.exp, and dst.S according to op01.mant, op01.exp, op01.S, op2.mant, op2.exp, and op2.S; and then normalize dst.mant, dst.exp, and dst.S to a data format of the single-precision floating-point number, so as to obtain the single-precision multiplication-accumulation result.
Similarly, the two operands of the addition are op01 and op2. To make the exponents of these two numbers equal, the decimals need to be shifted. The single-precision multiplier-accumulator may determine an absolute value of an exponent difference according to op01.exp and op2.exp; perform a shift operation on op01.mant or op2.mant according to op01.exp, op2.exp, and the absolute value of the exponent difference, to obtain op01.mant after the shift operation and op2.mant after the shift operation; determine dst.mant and dst.S according to op01.S, op2.S, op01.mant after the shift operation, and op2.mant after the shift operation; and determine dst.exp according to op01.exp and op2.exp.
A process of the single-precision multiplier-accumulator performing the single-precision floating-point number multiplication-accumulation operations is described in detail below with reference to the structures shown in
In S1501, the decimal multiplication module 40 calculates a product of op0.mant and op1.mant.
Specifically, since conversion is not required when the single-precision floating-point number multiplication-accumulation operations are performed, the first conversion unit 47 and the second conversion unit 48 are inactive and only pass data through transparently. op0.mant and op1.mant are both 24-bit binary data, and op01.mant obtained through calculation is 48-bit binary data.
In S1502, the exponent addition module 41 calculates op01.exp.
In S1503, the addition module 44 performs an XOR operation on op0.S and op1.S to obtain op01.S.
In S1504, the exponent subtraction module 42 calculates an absolute value of an exponent difference of op01.exp and op2.exp.
In S1505, the shift operation module 43 compares op01.exp and op2.exp, and performs a shift operation according to a comparison result.
In S1506, the addition module 44 compares op01.S and op2.S, and performs summation according to a comparison result to obtain dst.mant.
In S1507, the multiplication-accumulation result exponent determination module 46 determines dst.exp according to op01.exp and op2.exp.
After dst.exp is obtained, dst.exp may be normalized to obtain an 8-bit binary exponent of the normalized single-precision floating-point number.
In S1508, the normalization module 45 normalizes dst.mant obtained in S1506 to obtain a 23-bit binary mantissa of the normalized single-precision floating-point number.
Implementation processes of S1501 to S1508 are similar to those of S1302 to S1309. Details are not described herein again in the embodiments of the present disclosure. The sign bit, the 8-bit binary exponent obtained in S1507, and the 23-bit binary mantissa obtained in S1508 form the single-precision multiplication-accumulation result.
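The S1501 to S1508 flow can be tied together in a behavioural Python sketch. It assumes IEEE 754 binary32 inputs that are normal numbers, uses plain truncation instead of a rounding mode (the text does not specify one), and ignores special values; S1506 is modelled with Python integer subtraction, which yields the same magnitude as the invert-and-add-one adder. The names `unpack` and `fma_sketch` are hypothetical, and the code models the arithmetic of the modules rather than their hardware structure.

```python
import struct

BIAS, FRAC = 127, 23  # IEEE 754 binary32 exponent bias and fraction width


def unpack(x: float):
    """Return (S, biased exp, 24-bit decimal with hidden bit) of a normal float32."""
    b = struct.unpack("<I", struct.pack("<f", x))[0]
    s, e, f = b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF
    if e in (0, 0xFF):
        raise ValueError("sketch handles normal inputs only")
    return s, e, (1 << FRAC) | f


def fma_sketch(a: float, b: float, c: float) -> float:
    """Behavioural model of dst = op0 * op1 + op2 (truncating, normals only)."""
    s0, e0, m0 = unpack(a)
    s1, e1, m1 = unpack(b)
    s2, e2, m2 = unpack(c)
    m01 = m0 * m1               # S1501: 48-bit product, 46 fraction bits
    e01 = e0 + e1 - BIAS        # S1502: op01.exp
    s01 = s0 ^ s1               # S1503: op01.S
    m2 <<= FRAC                 # give op2.mant the same 46 fraction bits
    diff = abs(e01 - e2)        # S1504: |exponent difference|
    if e01 >= e2:               # S1505: shift the smaller operand right
        e, m2 = e01, m2 >> diff
    else:
        e, m01 = e2, m01 >> diff
    if s01 == s2:               # S1506: signed summation
        s, m = s01, m01 + m2
    elif m01 >= m2:
        s, m = s01, m01 - m2
    else:
        s, m = s2, m2 - m01
    if m == 0:
        return 0.0
    lead = m.bit_length() - 1   # S1507/S1508: normalize exponent and decimal
    e += lead - 2 * FRAC
    m = m >> (lead - FRAC) if lead >= FRAC else m << (FRAC - lead)
    if not 1 <= e <= 0xFE:
        raise OverflowError("result outside the float32 normal range")
    bits = (s << 31) | (e << 23) | (m & 0x7FFFFF)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(fma_sketch(1.5, 2.0, 0.25))   # 3.25
print(fma_sketch(3.0, -2.0, 1.0))   # -5.0
```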
According to the multiplication-accumulation method in the embodiments of the present disclosure, when the asymmetric multiplier-accumulator performs half-precision floating-point number multiplication-accumulation operations, each single-precision multiplier-accumulator and each half-precision multiplier-accumulator perform multiplication-accumulation operations on to-be-processed half-precision floating-point numbers to obtain corresponding half-precision multiplication-accumulation results, and a total of 2N half-precision multiplication-accumulation results are obtained. When the asymmetric multiplier-accumulator performs single-precision floating-point number multiplication-accumulation operations, each single-precision multiplier-accumulator performs the multiplication-accumulation operation on to-be-processed single-precision floating-point numbers to obtain a corresponding single-precision multiplication-accumulation result, and a total of N single-precision multiplication-accumulation results are obtained. According to the solution in the embodiments of the present disclosure, when the half-precision floating-point number multiplication-accumulation operations are performed, the single-precision multiplier-accumulators also participate in the operations and are not idle, improving utilization of the single-precision multiplier-accumulators. Moreover, compared with the design solution shown in
It should be understood that, although the steps in the flowcharts involved in the embodiments as described above are displayed in sequence as indicated by the arrows, the steps are not necessarily performed in the order indicated by the arrows. Unless otherwise clearly specified herein, the steps are not subject to a strict order and may be performed in other orders. In addition, at least some steps in the flowcharts involved in the embodiments as described above may include a plurality of sub-steps or a plurality of stages, and such sub-steps or stages are not necessarily performed at the same moment, and may be performed at different moments. These sub-steps or stages are not necessarily performed in sequence, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps. The steps performed by the modules in the above apparatus embodiment may be obtained with reference to the description in the method embodiment.
The modules in the foregoing multiplication-accumulation apparatus may be implemented entirely or partially by software, hardware, or a combination thereof. The above modules may be built in or independent of a processor of a computer device in a hardware form, or may be stored in a memory of the computer device in a software form, so that the processor invokes and performs operations corresponding to the above modules.
In an embodiment, a computer device is further provided, including a memory and a processor. The memory stores a computer program. The processor implements the steps in the above method embodiments when executing the computer program.
The technical features in the above embodiments may be arbitrarily combined. For concise description, not all possible combinations of the technical features in the above embodiments are described. However, all the combinations of the technical features are to be considered as falling within the scope described in this specification provided that they do not conflict with each other.
The above embodiments only describe several implementations of the present disclosure, and their description is specific and detailed, but cannot therefore be understood as a limitation on the patent scope of the present disclosure. It should be noted that those of ordinary skill in the art may further make variations and improvements without departing from the conception of the present disclosure, and these all fall within the protection scope of the present disclosure. Therefore, the patent protection scope of the present disclosure should be subject to the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
202210830964.3 | Jul 2022 | CN | national |
202210832162.6 | Jul 2022 | CN | national |