Embodiments of this disclosure relate to the computer field, further to an application of an artificial intelligence (AI) technology in the computer field, and in particular, to a floating-point number calculation circuit and a floating-point number calculation method.
AI is a theory, a method, a technology, or an application system that simulates, extends, and expands human intelligence by using a digital computer or a machine controlled by a digital computer, to perceive an environment, obtain knowledge, and achieve an optimal result based on the knowledge. In other words, AI is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. AI research focuses on design principles and implementation methods of various intelligent machines, so that the machines have perception, inference, and decision-making functions. Research in the field of AI includes robotics, natural language processing, computer vision, decision-making and inference, man-machine interaction, recommendation and search, AI basic theories, and the like.
Currently, a convolutional neural network (CNN) is widely used in a plurality of types of image processing applications. In such applications, when floating-point 16 (FP16) data is used to perform network training on a model, network training is not converged or a convergence speed is low due to low precision of the FP16 data. Therefore, higher-precision floating-point 32 (FP32) data is required to ensure network training effect. In addition, in a supercomputing application, higher-precision floating-point 64 (FP64) data is required for numerical calculation.
In an existing data calculation solution, a large bit-width multiplier is usually used to calculate data. For example, a multiplier for calculating FP64 data is usually reused to calculate the FP64 data and FP32 data. In an existing calculation solution, a 54-bit multiplier is designed to directly support calculation of a mantissa of the FP64 data. When the multiplier is used to calculate the FP32 data, the 54-bit multiplier is logically divided into two 27-bit parts to support calculation of mantissa parts of two pairs of FP32 data. As for processing of an exponential (exp) part, an exp processing unit of the FP64 part is directly copied to process the extra exp part of the FP32 data. However, in terms of an area ratio, area overheads of an FP64 multiplier are approximately equal to those of four FP32 multipliers. When the FP64 multiplier is reused to calculate the FP32 data, the FP64 multiplier achieves only twice the calculation performance of the FP32 multiplier, and the FP64 multiplier also has large timing overheads and high hardware design costs. Therefore, when the large bit-width multiplier is used to calculate the data, timing overheads, hardware design costs, and the like are unsatisfactory.
Embodiments of this disclosure provide a floating-point number calculation circuit and a floating-point number calculation method. The floating-point number calculation circuit can split a large bit-width floating-point number into small bit-width floating-point numbers, so that a small bit-width multiplier is used to calculate the large bit-width floating-point number. The floating-point number calculation circuit has small timing overheads and low hardware design costs. Therefore, calculation performance of the multiplier is appropriately used.
A first aspect of embodiments of this disclosure provides a floating-point number calculation circuit. The floating-point number calculation circuit includes a memory controller, a splitting circuit, a storage circuit, an exponential processing circuit, and a calculation circuit. An input terminal of the splitting circuit is electrically connected to an output terminal of the memory controller, and an output terminal of the splitting circuit is electrically connected to an input terminal of the storage circuit. An input terminal of the exponential processing circuit is electrically connected to a first output terminal of the storage circuit, and an output terminal of the exponential processing circuit is electrically connected to a first input terminal of the calculation circuit. A second input terminal of the calculation circuit is electrically connected to a second output terminal of the storage circuit. The memory controller is configured to obtain a first floating-point number and a second floating-point number. The splitting circuit is configured to split a mantissa part of the first floating-point number and a mantissa part of the second floating-point number, and obtain a first number of shifted bits of each mantissa part obtained after splitting. The storage circuit is configured to store each mantissa part obtained after splitting, an exponential part corresponding to each mantissa part obtained after splitting, and the first number of shifted bits of each mantissa part obtained after splitting. 
The exponential processing circuit is configured to: add an exponential part of the first floating-point number and an exponential part of the second floating-point number to obtain a first operation result, add the first number of shifted bits of each mantissa part obtained after splitting and the exponential part corresponding to each mantissa part obtained after splitting to obtain a plurality of second operation results, and obtain, based on the plurality of second operation results, a second number of shifted bits of each mantissa part obtained after splitting. The calculation circuit is configured to calculate a product of the mantissa part of the first floating-point number and the mantissa part of the second floating-point number based on each mantissa part obtained after splitting and the second number of shifted bits of each mantissa part obtained after splitting.
An embodiment of this disclosure provides a floating-point number calculation circuit. A splitting circuit included in the floating-point number calculation circuit splits a mantissa part of a first floating-point number and a mantissa part of a second floating-point number. An exponential processing circuit obtains a second number of shifted bits of each mantissa part obtained after splitting. A calculation circuit calculates a product of the mantissa part of the first floating-point number and the mantissa part of the second floating-point number based on each mantissa part obtained after splitting and the second number of shifted bits of each mantissa part obtained after splitting. The floating-point number calculation circuit can split a large bit-width floating-point number into small bit-width floating-point numbers, so that a small bit-width multiplier is used to calculate the large bit-width floating-point number. The floating-point number calculation circuit provided in this disclosure has small timing overheads and low hardware design costs. Therefore, calculation performance of the multiplier is appropriately used.
In a possible implementation of the first aspect, the splitting circuit is configured to split the mantissa part of the first floating-point number into a first high-order mantissa and a first low-order mantissa, and split the mantissa part of the second floating-point number into a second high-order mantissa and a second low-order mantissa. The first number of shifted bits indicates a shift difference between a most significant bit of each high-order mantissa and a most significant bit of each low-order mantissa.
In this possible implementation, according to the floating-point number calculation circuit provided in this disclosure, the large bit-width mantissa part of the first floating-point number can be split into the first high-order mantissa and the first low-order mantissa with a small bit width, and the large bit-width mantissa part of the second floating-point number can be split into the second high-order mantissa and the second low-order mantissa with a small bit width, so that a small bit-width multiplier is used to calculate the product of the mantissa parts obtained after splitting. This reduces hardware design costs, and calculation performance of the multiplier is appropriately used.
In a possible implementation of the first aspect, the first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, the second high-order mantissa includes a third mantissa, and the second low-order mantissa includes a fourth mantissa.
In this possible implementation, a specific splitting manner for a mantissa part of a floating-point number is provided. After a mantissa part of an FP32 floating-point number is split in this splitting manner, an FP16 multiplier can be used for calculation. Similarly, after a mantissa part of an FP64 floating-point number is split in this splitting manner, an FP32 multiplier can be used for calculation. After a mantissa part of a floating-point 128 (FP128) floating-point number is split in this splitting manner, an FP64 multiplier can be used for calculation. In this splitting manner, a small bit-width multiplier can be used to calculate a product of large bit-width mantissa parts.
In a possible implementation of the first aspect, the first high-order mantissa includes a first mantissa. The first low-order mantissa includes a second mantissa, a third mantissa, a fourth mantissa, and a fifth mantissa. The second high-order mantissa includes a sixth mantissa. The second low-order mantissa includes a seventh mantissa, an eighth mantissa, a ninth mantissa, and a tenth mantissa.
In this possible implementation, a specific splitting manner for a mantissa part of a floating-point number is provided. After a mantissa part of an FP64 floating-point number is split in this splitting manner, an FP16 multiplier can be used for calculation. Similarly, after a mantissa part of an FP128 floating-point number is split in this splitting manner, an FP32 multiplier can be used for calculation. In this splitting manner, a small bit-width multiplier can be used to calculate a product of large bit-width mantissa parts.
In a possible implementation of the first aspect, the exponential processing circuit includes a first adder, a selection circuit, and a second adder. An input terminal of the first adder is electrically connected to the first output terminal of the storage circuit, and an output terminal of the first adder is electrically connected to a first input terminal of the second adder. A second input terminal of the second adder is electrically connected to an output terminal of the selection circuit, and an output terminal of the second adder is electrically connected to the first input terminal of the calculation circuit. The first adder is configured to add the first number of shifted bits of each mantissa part obtained after splitting and the exponential part corresponding to each mantissa part obtained after splitting, to obtain the plurality of second operation results. The selection circuit is configured to select a largest value in the plurality of second operation results. The second adder is configured to subtract each second operation result from the largest value in the plurality of second operation results, to obtain the second number of shifted bits of each mantissa part obtained after splitting.
This possible implementation provides a specific implementation form of hardware, thereby improving implementation of this solution.
In a possible implementation of the first aspect, the calculation circuit includes a multiplier, a shift register, and a third adder. An input terminal of the multiplier is electrically connected to the second output terminal of the storage circuit, and an output terminal of the multiplier is electrically connected to a first input terminal of the shift register. A second input terminal of the shift register is electrically connected to the output terminal of the second adder. An output terminal of the shift register is electrically connected to an input terminal of the third adder. The multiplier is configured to respectively multiply all mantissa parts that are obtained after splitting and that include the first high-order mantissa and the first low-order mantissa by all mantissa parts that are obtained after splitting and that include the second high-order mantissa and the second low-order mantissa, to obtain a plurality of pieces of multiplication data. The shift register is configured to perform shift processing on the plurality of pieces of multiplication data based on the second number of shifted bits of each mantissa part obtained after splitting. The third adder is configured to perform an addition operation on a plurality of pieces of multiplication data obtained after shift processing, to obtain the product of the mantissa part of the first floating-point number and the mantissa part of the second floating-point number.
This possible implementation provides a specific implementation form of hardware, thereby improving implementation of this solution.
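Purely as an illustrative aid (not the claimed hardware), the multiply, shift, and add behavior of the calculation circuit can be sketched in software. The 12-bit split width, the variable names, and the use of arbitrary-precision integers are assumptions of this example:

```python
def split_multiply(m1: int, m2: int) -> int:
    """Return m1 * m2 for 24-bit mantissas using only 12x12-bit products,
    mirroring the multiplier / shift register / adder structure."""
    hi1, lo1 = m1 >> 12, m1 & 0xFFF     # split m1 into high/low parts
    hi2, lo2 = m2 >> 12, m2 & 0xFFF     # split m2 into high/low parts
    # Four small bit-width products (the multiplier stage)
    p_hh = hi1 * hi2
    p_hl = hi1 * lo2
    p_lh = lo1 * hi2
    p_ll = lo1 * lo2
    # Shift each partial product by its number of shifted bits, then add
    # (the shift register and third adder stages)
    return (p_hh << 24) + (p_hl << 12) + (p_lh << 12) + p_ll

m1 = 0b100000000000000000000001
m2 = 0b110000000000000000000101
assert split_multiply(m1, m2) == m1 * m2
```

The identity `(hi1*2^12 + lo1) * (hi2*2^12 + lo2) = hh*2^24 + (hl + lh)*2^12 + ll` is what lets the four narrow products reconstruct the full-width product exactly.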
A second aspect of embodiments of this disclosure provides a floating-point number calculation method. The method includes: obtaining a first floating-point number and a second floating-point number; splitting a mantissa part of the first floating-point number and a mantissa part of the second floating-point number, and obtaining a first number of shifted bits of each mantissa part obtained after splitting; storing each mantissa part obtained after splitting, an exponential part corresponding to each mantissa part obtained after splitting, and the first number of shifted bits of each mantissa part obtained after splitting; adding an exponential part of the first floating-point number and an exponential part of the second floating-point number to obtain a first operation result, adding the first number of shifted bits of each mantissa part obtained after splitting and the exponential part corresponding to each mantissa part obtained after splitting to obtain a plurality of second operation results, and obtaining, based on the plurality of second operation results, a second number of shifted bits of each mantissa part obtained after splitting; and calculating a product of the mantissa part of the first floating-point number and the mantissa part of the second floating-point number based on each mantissa part obtained after splitting and the second number of shifted bits of each mantissa part obtained after splitting.
In this embodiment of this disclosure, the mantissa part of the first floating-point number and the mantissa part of the second floating-point number are split to obtain the second number of shifted bits of each mantissa part obtained after splitting. Then, the product of the mantissa part of the first floating-point number and the mantissa part of the second floating-point number is calculated based on each mantissa part obtained after splitting and the second number of shifted bits of each mantissa part obtained after splitting. In the method, a large bit-width floating-point number can be split into small bit-width floating-point numbers, so that a small bit-width multiplier is used to calculate the large bit-width floating-point number. According to the floating-point number calculation method provided in this disclosure, a calculation apparatus has small timing overheads and low hardware design costs, and calculation performance of a multiplier included in the calculation apparatus is appropriately used.
In a possible implementation of the second aspect, the splitting a mantissa part of the first floating-point number and a mantissa part of the second floating-point number includes: splitting the mantissa part of the first floating-point number into a first high-order mantissa and a first low-order mantissa, and splitting the mantissa part of the second floating-point number into a second high-order mantissa and a second low-order mantissa. The first number of shifted bits indicates a shift difference between a most significant bit of each high-order mantissa and a most significant bit of each low-order mantissa.
In this possible implementation, according to the floating-point number calculation method provided in this disclosure, the large bit-width mantissa part of the first floating-point number can be split into the first high-order mantissa and the first low-order mantissa with a small bit width, the large bit-width mantissa part of the second floating-point number can be split into the second high-order mantissa and the second low-order mantissa with a small bit width, so that a small bit-width multiplier is used to calculate the product of the mantissa parts obtained after splitting. This reduces hardware design costs, and calculation performance of the multiplier is appropriately used.
In a possible implementation of the second aspect, the first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, the second high-order mantissa includes a third mantissa, and the second low-order mantissa includes a fourth mantissa.
In this possible implementation, a specific splitting manner for a mantissa part of a floating-point number is provided. After a mantissa part of an FP32 floating-point number is split in this splitting manner, an FP16 multiplier can be used for calculation. Similarly, after a mantissa part of an FP64 floating-point number is split in this splitting manner, an FP32 multiplier can be used for calculation. After a mantissa part of an FP128 floating-point number is split in this splitting manner, an FP64 multiplier can be used for calculation. In this splitting manner, a small bit-width multiplier can be used to calculate a product of large bit-width mantissa parts.
In a possible implementation of the second aspect, the first high-order mantissa includes a first mantissa. The first low-order mantissa includes a second mantissa, a third mantissa, a fourth mantissa, and a fifth mantissa. The second high-order mantissa includes a sixth mantissa. The second low-order mantissa includes a seventh mantissa, an eighth mantissa, a ninth mantissa, and a tenth mantissa.
In this possible implementation, a specific splitting manner for a mantissa part of a floating-point number is provided. After a mantissa part of an FP64 floating-point number is split in this splitting manner, an FP16 multiplier can be used for calculation. Similarly, after a mantissa part of an FP128 floating-point number is split in this splitting manner, an FP32 multiplier can be used for calculation. In this splitting manner, a small bit-width multiplier can be used to calculate a product of large bit-width mantissa parts.
A third aspect of embodiments of this disclosure provides a calculation apparatus. The calculation apparatus includes a control circuit and a floating-point number calculation circuit. The floating-point number calculation circuit calculates data under control of the control circuit. The floating-point number calculation circuit is the floating-point number calculation circuit described in any one of the first aspect or the possible implementations of the first aspect.
To make objectives, technical solutions, and advantages of this disclosure clearer, the following describes embodiments of this disclosure with reference to accompanying drawings. It is clear that the described embodiments are merely some rather than all of the embodiments of this disclosure. A person of ordinary skill in the art may learn that, as a new application scenario emerges, the technical solutions provided in embodiments of this disclosure are also applicable to a similar technical problem.
In this specification, claims, and accompanying drawings of this disclosure, the terms “first”, “second”, and the like are intended to distinguish between similar objects but do not necessarily indicate a specific order or sequence. It should be understood that data termed in such a way is interchangeable in proper circumstances, so that embodiments described herein can be implemented in other orders than the order illustrated or described herein. In addition, the terms “include”, “contain” and any other variants mean to cover the non-exclusive inclusion, for example, a process, method, system, product, or device that includes a list of steps or modules is not necessarily limited to those steps or modules, but may include other steps or modules not expressly listed or inherent to such a process, method, product, or device. Names or numbers of steps in this disclosure do not mean that the steps in the method procedure need to be performed in a time/logical sequence indicated by the names or numbers. An execution sequence of the steps in the procedure that have been named or numbered can be changed based on a technical objective to be achieved, provided that same or similar technical effect can be achieved.
AI is a theory, a method, a technology, or an application system that simulates, extends, and expands human intelligence by using a digital computer or a machine controlled by a digital computer, to perceive an environment, obtain knowledge, and achieve an optimal result based on the knowledge. In other words, AI is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. AI research focuses on design principles and implementation methods of various intelligent machines, so that the machines have perception, inference, and decision-making functions. Research in the field of AI includes robotics, natural language processing, computer vision, decision-making and inference, man-machine interaction, recommendation and search, AI basic theories, and the like.
The CNN has wide application prospects in fields such as image processing and speech recognition. As shown in
Currently, the CNN is widely used in a plurality of types of image processing applications. In the image processing application, when FP16 data is used to perform network training on a model, network training is not converged or a convergence speed is low due to low precision of the FP16 data. Therefore, higher-precision FP32 data is required to ensure network training effect. In addition, in some applications, higher-precision FP64 data and FP128 data are required for model training.
It should be noted that, in addition to being used in the field of AI, the floating-point number calculation circuit in the present disclosure may be further used in the field of data signal processing, for example, an image processing system, a radar system, and a communication system. This circuit and method can optimize performance of digital signal processing (DSP) or other digital devices. For example, the circuit is used in a digital device in an existing communication system, for example, a Long-Term Evolution (LTE) system, a Universal Mobile Telecommunications System (UMTS), and a Global System for Mobile Communications (GSM).
In an existing data calculation solution, a large bit-width multiplier is usually used to calculate data. For example, a multiplier for calculating FP64 data is usually reused to calculate the FP64 data and FP32 data. In some calculation solutions, a 54-bit multiplier is designed to directly support calculation of a mantissa of the FP64 data. When the multiplier is used to calculate the FP32 data, the 54-bit multiplier is logically divided into two 27-bit parts to support calculation of mantissa parts of two pairs of FP32 data. However, in terms of an area ratio, area overheads of an FP64 multiplier are approximately equal to those of four FP32 multipliers. In other technologies, when the FP64 multiplier is reused to calculate the FP32 data, the FP64 multiplier achieves only twice the calculation performance of the FP32 multiplier, and the FP64 multiplier also has large timing overheads and high hardware design costs. Therefore, when the large bit-width multiplier is used to calculate the data, timing overheads, hardware design costs, and the like are unsatisfactory.
For the foregoing problems in the existing data calculation solution, embodiments of this disclosure provide a floating-point number calculation circuit. A splitting circuit included in the floating-point number calculation circuit splits a mantissa part of a first floating-point number and a mantissa part of a second floating-point number, and obtains a first number of shifted bits of each mantissa part obtained after splitting. An exponential processing circuit adds the first number of shifted bits of each mantissa part obtained after splitting and an exponential part corresponding to each mantissa part obtained after splitting, to obtain a plurality of second operation results, and obtains, based on the plurality of second operation results, a second number of shifted bits of each mantissa part obtained after splitting. A calculation circuit calculates a product of the mantissa part of the first floating-point number and the mantissa part of the second floating-point number based on each mantissa part obtained after splitting and the second number of shifted bits of each mantissa part obtained after splitting. The floating-point number calculation circuit can split a large bit-width floating-point number into small bit-width floating-point numbers, so that a small bit-width multiplier is used to calculate the large bit-width floating-point number. Therefore, calculation performance of the multiplier is appropriately used, timing overheads are small, and hardware design costs are low.
The following clearly describes the technical solutions in this disclosure with reference to the accompanying drawings in this disclosure. It is clear that the described embodiments are merely some rather than all of the embodiments of this disclosure. The following several specific embodiments may be combined with each other, and same or similar content is not repeatedly described in different embodiments. It should be further noted that lengths, widths, and heights (or thicknesses) of various components shown in embodiments of this disclosure are merely examples for description, and are not intended to limit the storage unit in this disclosure.
Currently, there are four common formats of floating-point numbers: FP16, FP32, FP64, and FP128. Each floating-point number includes three parts: a sign bit (sign), an exponent bit (exp), and a mantissa bit (mantissa). An actual value of a floating-point number is equal to sign * 2^exp * mantissa.
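Purely as an illustration of this formula (not part of the disclosed circuit), the following sketch decodes a 32-bit pattern using the common FP32 layout of 1 sign bit, 8 exponent bits with bias 127, and 23 mantissa bits; the function name and the restriction to normal numbers are assumptions of the example:

```python
def fp32_value(bits: int) -> float:
    """Evaluate sign * 2^exp * mantissa for a normal FP32 bit pattern."""
    sign = -1.0 if (bits >> 31) & 1 else 1.0    # 1 sign bit
    exp = ((bits >> 23) & 0xFF) - 127           # 8 exponent bits, bias 127
    mantissa = 1.0 + (bits & 0x7FFFFF) / 2**23  # implicit leading 1 + 23 bits
    return sign * 2.0**exp * mantissa

# 0x40490FDB encodes an approximation of pi in FP32
assert abs(fp32_value(0x40490FDB) - 3.1415927) < 1e-6
```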
As shown in
When a floating-point number A*B is calculated, a calculation process of an exponential part is A_exp + B_exp, and a calculation process of a mantissa part is A_mantissa * B_mantissa. Then, the newly obtained exp and mantissa are used to generate a new floating-point number according to a format in a standard.
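The multiplication rule above can be illustrated with a short sketch; here `frexp`/`ldexp` stand in for the hardware's mantissa/exponent decomposition and renormalization, and the function name is hypothetical:

```python
from math import frexp, ldexp

def float_mul(a: float, b: float) -> float:
    """Multiply by adding exponents and multiplying mantissas, mirroring
    A_exp + B_exp and A_mantissa * B_mantissa."""
    a_mant, a_exp = frexp(a)         # a == a_mant * 2**a_exp
    b_mant, b_exp = frexp(b)
    new_exp = a_exp + b_exp          # exponential part: addition
    new_mant = a_mant * b_mant       # mantissa part: multiplication
    return ldexp(new_mant, new_exp)  # reassemble the new floating-point number

assert float_mul(6.0, 7.0) == 42.0
```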
When a floating-point number A+B is calculated, a larger one between A_exp and B_exp is first determined. It is assumed that A_exp is greater than B_exp by n. When the mantissas are added, B_mantissa needs to be first shifted rightwards by n bits, and then B_mantissa obtained after shifting is added to A_mantissa to obtain a new mantissa. Then, a new floating-point number is generated according to a standard. When a plurality of floating-point numbers are added together, a maximum exp is first obtained, the mantissas are correspondingly shifted based on differences between the maximum exp and the exps of all the floating-point numbers, and then the mantissas obtained after shifting are added.
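The alignment-and-add rule can be sketched on hypothetical (integer mantissa, exponent) pairs, where each operand's value is mantissa * 2^exp; the function names and the truncating right shift are assumptions of the example:

```python
def float_add(a_mant: int, a_exp: int, b_mant: int, b_exp: int):
    """Add two numbers: shift the mantissa of the smaller-exponent operand
    rightwards by the exponent difference n, then add the mantissas."""
    if a_exp >= b_exp:
        return a_mant + (b_mant >> (a_exp - b_exp)), a_exp
    return (a_mant >> (b_exp - a_exp)) + b_mant, b_exp

def multi_add(terms):
    """Add several (mantissa, exp) pairs: obtain the maximum exp, shift each
    mantissa by its difference from the maximum, then sum the shifted mantissas."""
    max_exp = max(e for _, e in terms)
    return sum(m >> (max_exp - e) for m, e in terms), max_exp

# 3*2**4 + 8*2**1 == 48 + 16 == 64 == 4*2**4 (bits below 2**4 are dropped)
assert float_add(3, 4, 8, 1) == (4, 4)
assert multi_add([(3, 4), (8, 1), (1, 4)]) == (5, 4)
```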
Refer to
In this embodiment of this disclosure, an input terminal of the splitting circuit 102 is electrically connected to an output terminal of the memory controller 101, and an output terminal of the splitting circuit 102 is electrically connected to an input terminal of the storage circuit 103. An input terminal of the exponential processing circuit 104 is electrically connected to a first output terminal of the storage circuit 103, and an output terminal of the exponential processing circuit 104 is electrically connected to a first input terminal of the calculation circuit 105. A second input terminal of the calculation circuit 105 is electrically connected to a second output terminal of the storage circuit 103.
In this embodiment of this disclosure, a memory stores a first floating-point number and a second floating-point number, and the memory controller 101 is configured to obtain the first floating-point number and the second floating-point number. Optionally, the memory may be a double data rate (DDR) memory, or may be another memory. This is not specifically limited herein. The memory controller may be a DDR controller, or may be a memory controller of another type. This is not specifically limited herein.
In this embodiment of this disclosure, the splitting circuit 102 is configured to split a mantissa part of the first floating-point number and a mantissa part of the second floating-point number, and obtain a first number of shifted bits of each mantissa part obtained after splitting. The storage circuit 103 is configured to store each mantissa part obtained after splitting, an exponential part corresponding to each mantissa part obtained after splitting, and the first number of shifted bits of each mantissa part obtained after splitting.
For example, if the first floating-point number is an FP32 floating-point number, it is assumed that the mantissa part of the first floating-point number is 100000000000000000000001. The splitting circuit 102 may split the mantissa part of the first floating-point number into a part A whose length is 12 bits and a part B whose length is 12 bits. The part A is 100000000000, and the part B is 000000000001. If the part A is used as a reference, the part B obtained after splitting needs to be shifted rightwards by 12 bits, and then a result obtained after shifting is added to the part A to obtain the mantissa part of the first floating-point number. Therefore, the first number of shifted bits that is of the part B obtained after splitting and that is obtained by the splitting circuit 102 indicates to shift rightwards by 12 bits.
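The splitting example above can be checked with a few lines. The part-A/part-B names follow the text; in this integer form, shifting part B rightwards by 12 bits relative to part A is equivalent to weighting part A by 2^12 before adding part B back in:

```python
mantissa = 0b100000000000000000000001  # the 24-bit mantissa from the example

part_a = mantissa >> 12                # high 12 bits: 100000000000
part_b = mantissa & 0xFFF              # low 12 bits:  000000000001
shift_b = 12                           # first number of shifted bits of part B

# Recombining the two parts reproduces the original mantissa exactly
assert (part_a << shift_b) + part_b == mantissa
assert part_a == 0b100000000000 and part_b == 0b000000000001
```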
The foregoing splitting manner is merely used as an example for description. Optionally, the first floating-point number may be an FP32 floating-point number. Alternatively, the first floating-point number may be an FP64 floating-point number. Alternatively, the first floating-point number may be an FP128 floating-point number. This is not specifically limited herein. Optionally, when the mantissa part of the first floating-point number is split, the mantissa part may be split into two parts, or may be split into a plurality of parts. This is not specifically limited herein. All mantissa parts obtained after splitting may have a same number of bits, or the mantissa parts obtained after splitting may have different numbers of bits. This is not specifically limited herein.
In this embodiment of this disclosure, a data type of the second floating-point number is similar to a data type of the first floating-point number, and a splitting manner for the mantissa part of the second floating-point number is similar to a splitting manner for the mantissa part of the first floating-point number. Details are not described herein again.
In this embodiment of this disclosure, the exponential processing circuit 104 is configured to add an exponential part of the first floating-point number and an exponential part of the second floating-point number to obtain a first operation result. The first operation result is an operation result of an exponential part obtained when the first floating-point number and the second floating-point number are multiplied. The exponential processing circuit 104 is further configured to: add the first number of shifted bits of each mantissa part obtained after splitting and an exponential part corresponding to each mantissa part obtained after splitting, to obtain a plurality of second operation results, and obtain, based on the plurality of second operation results, a second number of shifted bits of each mantissa part obtained after splitting. The calculation circuit 105 is configured to calculate a product of the mantissa part of the first floating-point number and the mantissa part of the second floating-point number based on each mantissa part obtained after splitting and the second number of shifted bits of each mantissa part obtained after splitting.
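As an illustration only, the select-the-largest-and-subtract rule described for the exponential processing circuit can be sketched as follows; the function and variable names are assumptions of the example, not terms from the disclosure:

```python
def second_shifts(first_shifts, exps):
    """For each split mantissa part, add its first number of shifted bits to
    its exponent (the second operation results), select the largest result,
    and subtract each result from it (the second number of shifted bits)."""
    second_results = [s + e for s, e in zip(first_shifts, exps)]
    largest = max(second_results)                 # selection circuit
    return [largest - r for r in second_results]  # second adder, per part

# Two split parts sharing exponent 5, with first shifts 0 (high part) and
# 12 (low part): second operation results are 5 and 17, so the parts get
# second numbers of shifted bits of 12 and 0 respectively.
assert second_shifts([0, 12], [5, 5]) == [12, 0]
```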
Refer to
In this disclosure, two specific splitting manners for the first high-order mantissa and the first low-order mantissa are provided, and are described in detail in the following embodiment.
Manner 1: The first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, the second high-order mantissa includes a third mantissa, and the second low-order mantissa includes a fourth mantissa.
For example, if the first floating-point number is an FP32 floating-point number, it is assumed that the mantissa part of the first floating-point number is 100000000011000000000001. The splitting circuit 102 may split the mantissa part of the first floating-point number into the first mantissa whose length is 11 bits and the second mantissa whose length is 13 bits. The first mantissa is 10000000001, and the second mantissa is 1000000000001.
In this embodiment, the first mantissa belongs to the first high-order mantissa, and the second mantissa belongs to the first low-order mantissa. The first number of shifted bits indicates the shift difference between the most significant bit of each high-order mantissa and the most significant bit of each low-order mantissa. To be specific, a number of shifted bits of the first mantissa is 0, and the first number of shifted bits of the second mantissa is a shift difference of 11 bits between a first bit of the second mantissa and a first bit of the first mantissa. Therefore, the first number of shifted bits of the second mantissa indicates to shift rightwards by 11 bits.
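The Manner 1 split of the 24-bit FP32 mantissa in the example above can be sketched as follows (the helper name is illustrative, not from the disclosure; the mantissa is modelled as a bit string):

```python
# Hypothetical sketch of Manner 1: split a 24-bit FP32 mantissa into an
# 11-bit high-order part and a 13-bit low-order part, each paired with
# its first number of shifted bits (0 for the high part, 11 for the low
# part, i.e. the bit length of the high part).

def split_fp32_mantissa(bits24: str):
    assert len(bits24) == 24
    high, low = bits24[:11], bits24[11:]
    return [(high, 0), (low, 11)]

parts = split_fp32_mantissa("100000000011000000000001")
print(parts)  # [('10000000001', 0), ('1000000000001', 11)]
```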
In this embodiment, a splitting manner for the second high-order mantissa is similar to that of the first high-order mantissa, and a splitting manner for the second low-order mantissa is similar to that for the first low-order mantissa. Details are not described herein again.
Manner 2: The first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, a third mantissa, a fourth mantissa, and a fifth mantissa, the second high-order mantissa includes a sixth mantissa, and the second low-order mantissa includes a seventh mantissa, an eighth mantissa, a ninth mantissa, and a tenth mantissa.
For example, assume that the first floating-point number is an FP64 floating-point number. The splitting circuit 102 may split the mantissa part of the first floating-point number into the first mantissa 10001 whose length is 5 bits, the second mantissa 100000000001 whose length is 12 bits, the third mantissa 100000000011 whose length is 12 bits, the fourth mantissa 100000000111 whose length is 12 bits, and the fifth mantissa 100000001111 whose length is 12 bits.
In this embodiment, the first mantissa belongs to the first high-order mantissa, and the second mantissa, the third mantissa, the fourth mantissa, and the fifth mantissa belong to the first low-order mantissa. The first number of shifted bits indicates a shift difference between a most significant bit of each high-order mantissa and a most significant bit of each low-order mantissa. To be specific, the number of shifted bits of the first mantissa is 0, and the first number of shifted bits of the second mantissa is a shift difference of five bits between a first bit of the second mantissa and a first bit of the first mantissa, which is equal to the number of bits of the first mantissa. Therefore, the first number of shifted bits of the second mantissa indicates to shift rightwards by five bits. The first number of shifted bits of the third mantissa is a shift difference of 17 bits between a first bit of the third mantissa and the first bit of the first mantissa, which is equal to the sum of the numbers of bits of the first mantissa and the second mantissa. Therefore, the first number of shifted bits of the third mantissa indicates to shift rightwards by 17 bits. The first number of shifted bits of the fourth mantissa is a shift difference of 29 bits between a first bit of the fourth mantissa and the first bit of the first mantissa, which is equal to the sum of the numbers of bits of the first mantissa, the second mantissa, and the third mantissa. Therefore, the first number of shifted bits of the fourth mantissa indicates to shift rightwards by 29 bits. The first number of shifted bits of the fifth mantissa is a shift difference of 41 bits between a first bit of the fifth mantissa and the first bit of the first mantissa, which is equal to the sum of the numbers of bits of the first mantissa, the second mantissa, the third mantissa, and the fourth mantissa.
Therefore, the first number of shifted bits of the fifth mantissa indicates to shift rightwards by 41 bits.
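The Manner 2 split of the 53-bit FP64 mantissa, with its first shifts of 0, 5, 17, 29, and 41 bits, can be sketched as follows (the helper name and bit-string representation are illustrative assumptions):

```python
# Hypothetical sketch of Manner 2: split a 53-bit FP64 mantissa into a
# 5-bit first mantissa and four 12-bit mantissas. The first shift of
# each part equals its bit offset from the most significant bit.

def split_fp64_mantissa(bits53: str):
    assert len(bits53) == 53
    widths = [5, 12, 12, 12, 12]
    parts, pos = [], 0
    for w in widths:
        parts.append((bits53[pos:pos + w], pos))  # pos is the first shift
        pos += w
    return parts

m = ("10001" + "100000000001" + "100000000011"
     + "100000000111" + "100000001111")
for part, shift in split_fp64_mantissa(m):
    print(shift, part)   # shifts printed in order: 0, 5, 17, 29, 41
```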
In this embodiment, the mantissa part of the first floating-point number may alternatively be split in another manner. For example, the length of the first mantissa is 9 bits, and the lengths of the second mantissa, the third mantissa, the fourth mantissa, and the fifth mantissa are all 11 bits. This is not specifically limited herein.
In this embodiment, a splitting manner for the second high-order mantissa is similar to that of the first high-order mantissa, and a splitting manner for the second low-order mantissa is similar to that for the first low-order mantissa. Details are not described herein again.
In this embodiment of this disclosure, in addition to the splitting manners provided in Manner 1 and Manner 2, the floating-point number calculation circuit may further use another splitting manner when calculating a product of floating-point numbers. This is not specifically limited herein.
Refer to
In this embodiment of this disclosure, an input terminal of the first adder is electrically connected to a first output terminal of a storage circuit, and an output terminal of the first adder is electrically connected to a first input terminal of the second adder. A second input terminal of the second adder is electrically connected to an output terminal of the selection circuit, and an output terminal of the second adder is electrically connected to a first input terminal of a calculation circuit.
In this embodiment of this disclosure, the first adder is configured to add a first number of shifted bits of each mantissa part obtained after splitting and an exponential part corresponding to each mantissa part obtained after splitting, to obtain a plurality of second operation results. The selection circuit is configured to select a largest value in the plurality of second operation results. The second adder is configured to subtract each second operation result from the largest value in the plurality of second operation results, to obtain a second number of shifted bits of each mantissa part obtained after splitting.
Optionally, the calculation circuit may include a multiplier, a shift register, and a third adder.
In this embodiment of this disclosure, an input terminal of the multiplier is electrically connected to a second output terminal of the storage circuit, and an output terminal of the multiplier is electrically connected to a first input terminal of the shift register. A second input terminal of the shift register is electrically connected to an output terminal of the second adder. An output terminal of the shift register is electrically connected to an input terminal of the third adder.
In this embodiment of this disclosure, the multiplier is configured to respectively multiply all mantissa parts that are obtained after splitting and that include the first high-order mantissa and the first low-order mantissa by all mantissa parts that are obtained after splitting and that include the second high-order mantissa and the second low-order mantissa, to obtain a plurality of pieces of multiplication data. The shift register is configured to perform shift processing on the plurality of pieces of multiplication data based on the second number of shifted bits of each mantissa part obtained after splitting. The third adder is configured to perform an addition operation on the plurality of pieces of multiplication data obtained after shift processing, to obtain a product of a mantissa part of a first floating-point number and a mantissa part of a second floating-point number.
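The multiply, shift, and add data path described above preserves the exact mantissa product, because each partial product is realigned by its combined shift before accumulation. A minimal Python sketch of this identity, with mantissa parts modelled as integers carrying their bit weights (helper names are illustrative, not from the disclosure):

```python
# Hypothetical sketch: split two mantissas, multiply every pair of parts
# (the multipliers), align each partial product by its combined weight
# (the shift register), and accumulate (the third adder).

def split(bits: str, widths):
    """Split a mantissa bit string into parts; each part carries its
    weight, i.e. its left shift from the least significant bit."""
    parts, pos = [], 0
    for w in widths:
        parts.append((int(bits[pos:pos + w], 2), len(bits) - pos - w))
        pos += w
    return parts

def mantissa_product(parts_a, parts_b):
    """Sum of all pairwise partial products, each shifted into place."""
    return sum((va * vb) << (wa + wb)
               for va, wa in parts_a for vb, wb in parts_b)

a = "100000000011000000000001"   # 24-bit FP32 mantissa from the example
b = "110000000000000000000011"   # arbitrary 24-bit mantissa for checking
pa = split(a, [11, 13])
pb = split(b, [11, 13])
assert mantissa_product(pa, pb) == int(a, 2) * int(b, 2)
print("shift-and-add product matches the direct product")
```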
Refer to
As shown in
The exponential part corresponding to AMSB is A_EXP, and the exponential part corresponding to BLSB is B_EXP. The number of shifted bits of AMSB obtained by the splitting circuit is 0, and the number of shifted bits of BLSB is –12. For ease of calculation, the number of shifted bits –12 can be split into –6 and –6, and the exponential parts are respectively denoted as A_EXP–6 and B_EXP–6. The EXP offset (the first adder) adds A_EXP–6 and B_EXP–6 to obtain A_EXP+B_EXP–12. A_EXP+B_EXP–12 is the second operation result corresponding to AMSB*BLSB. The second operation result may indicate the operation result of the exponential part obtained when AMSB*BLSB is calculated.
The exponential part corresponding to ALSB is A_EXP, and the exponential part corresponding to BMSB is B_EXP. The number of shifted bits of ALSB obtained by the splitting circuit is –12, and the number of shifted bits of BMSB is 0. For ease of calculation, the number of shifted bits –12 can be split into –6 and –6, and the exponential parts are respectively denoted as A_EXP–6 and B_EXP–6. The EXP offset (the first adder) adds A_EXP–6 and B_EXP–6 to obtain A_EXP+B_EXP–12. A_EXP+B_EXP–12 is the second operation result corresponding to ALSB*BMSB. The second operation result may indicate the operation result of the exponential part obtained when ALSB*BMSB is calculated.
The exponential part corresponding to ALSB is A_EXP, and the exponential part corresponding to BLSB is B_EXP. The numbers of shifted bits of ALSB and BLSB obtained by the splitting circuit are both –12. The EXP offset (the first adder) adds A_EXP–12 and B_EXP–12 to obtain A_EXP+B_EXP–24. A_EXP+B_EXP–24 is the second operation result corresponding to ALSB*BLSB. The second operation result may indicate the operation result of the exponential part obtained when ALSB*BLSB is calculated.
After the plurality of second operation results are obtained through calculation, the selection circuit obtains MAX EXP (the largest value in the plurality of second operation results), and then inputs MAX EXP to each delta (the second adder). Each delta subtracts each second operation result from MAX EXP, to obtain the second number of shifted bits of each mantissa part obtained after splitting.
Each 13-bit Mul unit (the multiplier) separately calculates AMSB*BMSB, AMSB*BLSB, ALSB*BMSB, and ALSB*BLSB to obtain a plurality of pieces of multiplication data. The shift unit (the shift register) shifts each piece of input multiplication data after receiving each second number of shifted bits sent by each delta. The adder (the third adder) adds the plurality of pieces of multiplication data obtained after shifting, to obtain the product of the mantissa part of the first floating-point number and the mantissa part of the second floating-point number.
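The FP32 data path above can be traced numerically in a short Python sketch. The exponent values and the second mantissa are illustrative assumptions; the delta values 0, 12, 12, and 24 match the second operation results A_EXP+B_EXP, A_EXP+B_EXP–12, A_EXP+B_EXP–12, and A_EXP+B_EXP–24 derived above:

```python
# Hypothetical walk-through of the FP32 example: split A and B into
# 12-bit high/low halves, form the second operation results, take
# MAX EXP (the selection circuit), compute each delta, then rebuild the
# mantissa product from the four shifted partial products.

A_EXP, B_EXP = 10, 7                      # illustrative exponential parts
a = 0b100000000011000000000001            # 24-bit mantissa of A
b = 0b110000000000000000000011            # illustrative 24-bit mantissa of B
AMSB, ALSB = a >> 12, a & 0xFFF
BMSB, BLSB = b >> 12, b & 0xFFF

# EXP offset (first adder): second operation result per partial product.
results = {
    "AMSB*BMSB": A_EXP + B_EXP,
    "AMSB*BLSB": A_EXP + B_EXP - 12,
    "ALSB*BMSB": A_EXP + B_EXP - 12,
    "ALSB*BLSB": A_EXP + B_EXP - 24,
}
max_exp = max(results.values())                        # selection circuit
deltas = {k: max_exp - v for k, v in results.items()}  # each delta

# Mul units, then alignment, then the adder: a part with delta d sits
# d bits to the right of the most significant partial product, i.e. it
# is left-shifted (24 - d) bits from the least significant bit.
product = sum(p << (24 - d) for p, d in [
    (AMSB * BMSB, deltas["AMSB*BMSB"]),
    (AMSB * BLSB, deltas["AMSB*BLSB"]),
    (ALSB * BMSB, deltas["ALSB*BMSB"]),
    (ALSB * BLSB, deltas["ALSB*BLSB"]),
])
assert product == a * b
print(deltas)  # {'AMSB*BMSB': 0, 'AMSB*BLSB': 12, 'ALSB*BMSB': 12, 'ALSB*BLSB': 24}
```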
In this embodiment, optionally, the number of shifted bits –12 may alternatively be split in another manner, and may be split into –3 and –9, –4 and –8, or a plurality of other split manners, provided that a sum of numbers of shifted bits of two parts obtained after splitting is –12. This is not specifically limited herein. Similarly, the number of shifted bits –24 may alternatively be split in different manners. This is not specifically limited herein.
In this embodiment of this disclosure, refer to
Example 2: If both the first floating-point number A and the second floating-point number B are FP64 floating-point numbers, when the FP64 floating-point numbers are calculated, the mantissa part of the first floating-point number is split into five parts: a0, a1, a2, a3, and a4. The mantissa part of the second floating-point number is split into five parts: b0, b1, b2, b3, and b4. a1, a2, a3, a4, b1, b2, b3, and b4 are all 12 bits, and a0 and b0 are 5 bits. A multiplication of the mantissa part of the first floating-point number A and the mantissa part of the second floating-point number B may be represented as a formula 2.
A process in which the exponential processing circuit and the calculation circuit calculate the product of the mantissa part of the first floating-point number and the mantissa part of the second floating-point number is similar to that in the embodiment shown in Example 1. Details are not described herein again.
In this embodiment, because a length of the mantissa part of the FP64 floating-point number is 53 bits, a total length of mantissa parts obtained after calculation of A_mantissa*B_mantissa is 106 bits. To directly calculate mantissa parts of a pair of FP64 floating-point numbers in one calculation module, the adder (the third adder) needs to be extended to support calculation of 106-bit data. However, both area costs and timing costs of the extended adder are extremely high. Therefore, mantissas of the pair of FP64 floating-point numbers can be split into two parts for multiplication.
Refer to
The floating-point number calculation circuit provided in this embodiment of this disclosure may be used in a CNN. A specific application process is described in detail in the following embodiment.
It is assumed that both a first floating-point number A and a second floating-point number B are FP32 floating-point numbers, and the first floating-point number A is data in a feature map.
Step 1: Refer to
Step 2: Refer to
Step 3: Refer to
Step 4: Refer to
Step 5: Repeat step 4 until all data is calculated, to generate a result.
The floating-point number calculation circuit and the floating-point number calculation method provided in embodiments of this disclosure are described in detail above. The principle and implementations of this disclosure are described herein through specific examples. The foregoing embodiments are merely intended to help understand the method and core idea of this disclosure. In addition, a person of ordinary skill in the art may make variations and modifications to this disclosure in terms of the specific implementations and application scopes based on the ideas of this disclosure. Therefore, the content of this specification shall not be construed as a limitation to this disclosure.
This is a continuation of International Patent Application No. PCT/CN2020/125676 filed on Oct. 31, 2020, the disclosure of which is hereby incorporated by reference in its entirety.
Parent application: PCT/CN2020/125676, Oct 2020, WO.
Child application: 18309269, US.