This application claims the benefit of People's Republic of China application Serial No. 202011474919.6, filed Dec. 14, 2020, the subject matter of which is incorporated herein by reference.
The invention relates in general to a mixed-precision artificial intelligence (AI) processor and an operating method thereof.
A processor performing AI calculations normally adopts one of Int8, BF16 and FP32 as its data format. In terms of calculation precision, FP32 is the highest, BF16 is the second, and Int8 is the lowest. In terms of calculation speed (also referred to as computing power), Int8 is the highest, BF16 is the second, and FP32 is the lowest. Consequently, an AI processor that uses only one data format has difficulty meeting both the requirement of calculation precision and the requirement of calculation speed.
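The precision ordering above can be illustrated numerically. The following Python sketch is only an illustration and not part of the described processor. It rounds values to FP32 and BF16 and applies a symmetric Int8 quantization. The BF16 rounding mode (round-to-nearest-even on the FP32 bit pattern) and the Int8 scale (largest magnitude divided by 127) are assumptions chosen for the example.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest BF16 value: keep the upper 16 bits of the FP32
    pattern with round-to-nearest-even (NaN inputs are not handled here)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias, ties go to even
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def int8_roundtrip(values: list[float]) -> list[float]:
    """Symmetric Int8 quantization (scale = max |v| / 127), then dequantization."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return [qi * scale for qi in q]


values = [0.1, -0.73, 2.5]
for v, i8 in zip(values, int8_roundtrip(values)):
    print(f"{v:>6}: FP32 err {abs(to_fp32(v) - v):.1e}, "
          f"BF16 err {abs(to_bf16(v) - v):.1e}, Int8 err {abs(i8 - v):.1e}")
```

For 0.1 and -0.73 the error grows from FP32 to BF16 to Int8. The Int8 error also depends on the range that the scale has to cover, which here is set by the largest value, 2.5.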
According to one embodiment of the present invention, a mixed-precision artificial intelligence (AI) processor is provided. The AI processor includes a first calculation module, a second calculation module and a control module. The first calculation module is configured to perform calculation based on data in a first format. The second calculation module is configured to perform calculation based on data in a second format different from the first format. The control module is coupled to the first calculation module and the second calculation module, and is configured to switch the AI processor to a first mode, a second mode or a third mode according to a calculation strategy, so that calculation is performed based on an input data to obtain a calculation result. The calculation strategy specifies whether the format used in each of several calculations is the first format or the second format. In the first mode, the control module enables the first calculation module to perform calculation based on the input data; in the second mode, the control module enables the second calculation module to perform calculation based on the input data; in the third mode, for each of the calculations, the control module enables, according to the calculation strategy, the first calculation module or the second calculation module to perform calculation based on the input data or data derived from the input data.
According to another embodiment of the present invention, an operating method applicable to a mixed-precision AI processor is provided. The operating method includes the following steps: An input data is received. The AI processor is switched to a first mode, a second mode or a third mode by a control module of the AI processor according to a calculation strategy. The calculation strategy specifies whether the format used in each of several calculations is a first format or a second format. In the first mode, the control module enables a first calculation module to perform calculation in the first format based on the input data; in the second mode, the control module enables a second calculation module to perform calculation in the second format based on the input data; and in the third mode, for each of the calculations, the control module enables, according to the calculation strategy, the first calculation module or the second calculation module to perform calculation based on the input data or data derived from the input data.
The above and other aspects of the invention will become better understood with regard to the following detailed description of the preferred but non-limiting embodiment(s). The following description is made with reference to the accompanying drawings.
The principles of the structures and operations of the present invention are disclosed below with accompanying drawings.
Referring to
The first calculation module 102 and the second calculation module 104 can be realized by two mutually independent circuits. For example, the first calculation module 102 can be realized by a first circuit, and the second calculation module 104 can be realized by a second circuit, wherein the first circuit and the second circuit can each include an adder, a multiplier, and a comparator configured to perform various logic operations. In an embodiment, the first circuit and the second circuit are mutually independent and are integrated on an integrated circuit chip through integrated circuit layout.
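As a software analogy only, the two mutually independent circuits can be modeled as two functions that compute the same product sum in different formats. In this sketch the first format is assumed to be Int8 (integer multiply-accumulate, then a single rescale) and the second format is assumed to be BF16 (inputs rounded to BF16, running sum rounded to FP32 after each step). The function names are hypothetical and do not describe the actual circuits.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to BF16 (round-to-nearest-even on the FP32 bit pattern; NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def first_module_dot(a: list[float], b: list[float]) -> float:
    """Hypothetical Int8 module: quantize, accumulate integers, rescale once."""
    sa = max(abs(v) for v in a) / 127 or 1.0  # per-vector scales (assumption)
    sb = max(abs(v) for v in b) / 127 or 1.0
    qa = [max(-128, min(127, round(v / sa))) for v in a]
    qb = [max(-128, min(127, round(v / sb))) for v in b]
    acc = sum(x * y for x, y in zip(qa, qb))  # would be an int32 accumulator in hardware
    return acc * sa * sb


def second_module_dot(a: list[float], b: list[float]) -> float:
    """Hypothetical BF16 module: BF16 inputs, running sum rounded to FP32."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + to_bf16(x) * to_bf16(y))
    return acc


a, b = [0.5, -1.25, 2.0], [1.5, 0.75, -0.25]
print(first_module_dot(a, b), second_module_dot(a, b))  # exact product sum is -0.6875
```

Because the two functions share no state, either can be used for a given calculation without affecting the other, which mirrors the independence of the first and second circuits.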
The control module 106 can be realized by hardware, firmware, software, or a combination thereof. For example, the control module 106 can be realized by a combination of a third circuit and a decision program. The decision program determines the calculation strategy according to the input data and, according to the calculation strategy, determines whether to select the first mode, the second mode or the third mode to perform calculation based on the input data. The third circuit is configured to instruct and/or select the circuit configuration of the first calculation module 102 and/or the second calculation module 104 according to the mode to be switched to. The calculation strategy is determined based on the requirement of calculation speed, the requirement of calculation precision, the requirement of bandwidth, the power consumption of the data, and/or a predetermined order. Specifically, in each round of decision process, the AI system needs to perform a series of “calculations”. Each “calculation” as defined in the present specification refers to a fundamental mathematical calculation such as addition, subtraction, multiplication or division; a composite calculation such as a convolution (product sum) formed of several fundamental mathematical calculations; or the calculation of a channel, a layer or even a whole network in a complicated machine learning architecture. Take object recognition of a picture performed by the AI system as an example. The AI system performs several rounds of filter processing on the picture to remove the background and sharpen the picture. Mathematically, each filter processing can be an addition calculation, a multiplication calculation or a convolution calculation.
That is, in each round of decision process during object recognition, the AI system performs a series of mathematical calculations (such as addition, multiplication, and convolution) on the input data (such as a picture). The control module 106 determines whether to use the first format or the second format for each calculation in the current series of calculations according to the requirement of calculation speed, the requirement of calculation precision, the requirement of bandwidth, and the power consumption of the data, thereby formulating the calculation strategy. For example, if the first format fits the entire current series of calculations, the control module 106 switches the AI processor 10 to the first mode; if the second format fits the entire current series of calculations, the control module 106 switches the AI processor 10 to the second mode; if the first format fits one part of the current series of calculations and the second format fits another part, the control module 106 switches the AI processor 10 to the third mode. That is, the calculation strategy specifies the format corresponding to each calculation in the current series of calculations. For example, suppose the current series of calculations includes a first calculation and a second calculation, and the control module 106 determines to use the first format for the first calculation and the second format for the second calculation. The calculation strategy is then [first calculation: first format; second calculation: second format], and the control module 106 switches the AI processor 10 to the third mode. When the first calculation is performed, the control module 106 instructs/selects the first calculation module 102 to perform it; when the second calculation is performed, the control module 106 instructs/selects the second calculation module 104 to perform it.
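The formulation of the calculation strategy and the resulting mode selection can be sketched as follows. This is a hypothetical sketch. It assumes the first format is Int8 and the second format is BF16 (the pairing used in the experiments described later), and it reduces the requirements of speed, precision, bandwidth and power consumption to a single illustrative `needs_high_precision` flag per calculation. An actual decision program may weigh those requirements differently.

```python
from dataclasses import dataclass
from enum import Enum


class Fmt(Enum):
    INT8 = "Int8"   # assumed first format: faster, lower precision
    BF16 = "BF16"   # assumed second format: slower, higher precision


class Mode(Enum):
    FIRST = 1   # every calculation uses the first format
    SECOND = 2  # every calculation uses the second format
    THIRD = 3   # the calculations mix the two formats


@dataclass
class Calculation:
    name: str
    needs_high_precision: bool  # hypothetical stand-in for the requirements


def make_strategy(calcs: list[Calculation]) -> dict[str, Fmt]:
    """Map each calculation of the current round to a data format."""
    return {c.name: Fmt.BF16 if c.needs_high_precision else Fmt.INT8 for c in calcs}


def select_mode(strategy: dict[str, Fmt]) -> Mode:
    """Choose the first, second or third mode from the calculation strategy."""
    used = set(strategy.values())
    if not used:
        raise ValueError("the strategy must cover at least one calculation")
    if used == {Fmt.INT8}:
        return Mode.FIRST
    if used == {Fmt.BF16}:
        return Mode.SECOND
    return Mode.THIRD


# The example in the text: first calculation in the first format and
# second calculation in the second format, so the third mode is selected.
strategy = make_strategy([Calculation("first", False), Calculation("second", True)])
print(select_mode(strategy))
```

Separating strategy formulation from mode selection keeps the mode a pure function of the per-calculation formats, which matches the description that the mode follows from the calculation strategy.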
Referring to
In step S201, an input data is provided to the AI processor.
In step S203, the AI processor is switched to a first mode, a second mode or a third mode by a control module of the AI processor according to a calculation strategy. The calculation strategy includes determining whether the format corresponding to each of the calculations to be performed in one round of decision process is the first format or the second format. In the first mode, only the first format is used for calculation; in the second mode, only the second format is used for calculation; in the third mode, a combination of the first format and the second format is used for calculation. In the first mode, step S205 is performed; in the second mode, step S207 is performed; in the third mode, step S209 is performed.
In step S205, the first calculation module is enabled by the control module. In an embodiment, the control module further disables the second calculation module.
In step S206, the calculations in the current round of decision process are performed by the first calculation module according to the input data. In an embodiment, if the format of the input data is not the first format, the first calculation module converts the input data to the first format.
In step S207, the second calculation module is enabled by the control module. In an embodiment, the control module further disables the first calculation module.
In step S208, the calculations in the current round of decision process are performed by the second calculation module according to the input data. In an embodiment, if the format of the input data is not the second format, the second calculation module converts the input data to the second format.
In step S209, for each calculation in the current round of decision process, one of the first calculation module and the second calculation module is enabled by the control module according to the calculation strategy.
In step S210, each calculation in the current round of decision process is performed by the enabled one of the first calculation module and the second calculation module according to the input data or data derived from the input data.
The above steps apply to each calculation that the AI system needs to perform in a decision process. That is, the calculations that the AI system needs to perform in a decision process are either all performed by the first calculation module 102, all performed by the second calculation module 104, or divided so that a part of them is performed by the first calculation module 102 and the remaining part by the second calculation module 104.
According to the above method, when a calculation requires high precision, the AI processor 10 can select the calculation module with the high-precision data format to perform it; for other calculations that do not require high precision, the AI processor 10 can select the calculation module with the low-precision data format. Thus, the calculation speed of the AI processor can be effectively increased while the requirement of calculation precision is still met.
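The flow of steps S201 to S210 can be modeled in a few lines. This is a hypothetical sketch: `CalcModule` and `operate` are illustrative names, data formats are tracked as strings instead of converting actual values, the two formats are assumed to be Int8 and BF16, and each calculation is assumed to consume the result of the previous one (that is, data derived from the input data).

```python
from dataclasses import dataclass, field


@dataclass
class CalcModule:
    fmt: str                                       # format this module computes in
    enabled: bool = False
    performed: list = field(default_factory=list)  # (calculation, converted?) log

    def perform(self, calc: str, data_fmt: str) -> str:
        if not self.enabled:
            raise RuntimeError(f"{self.fmt} module is disabled")
        # S206/S208: convert the incoming data if it is not in this module's format.
        converted = data_fmt != self.fmt
        self.performed.append((calc, converted))
        return self.fmt  # the result is produced in this module's format


def operate(strategy: dict, input_fmt: str, first: CalcModule, second: CalcModule) -> int:
    """Run one round of decision process and return the selected mode (1, 2 or 3)."""
    used = set(strategy.values())
    if not used or not used <= {first.fmt, second.fmt}:
        raise ValueError("strategy must map every calculation to a known format")
    # S203: select the mode from the calculation strategy.
    mode = 1 if used == {first.fmt} else 2 if used == {second.fmt} else 3
    data_fmt = input_fmt  # S201: the input data as received
    for calc, fmt in strategy.items():  # calculations in strategy order
        # S205/S207/S209: enable the module for this calculation, disable the other.
        active, idle = (first, second) if fmt == first.fmt else (second, first)
        active.enabled, idle.enabled = True, False
        # S206/S208/S210: perform the calculation on the input or derived data.
        data_fmt = active.perform(calc, data_fmt)
    return mode


first, second = CalcModule("Int8"), CalcModule("BF16")
mode = operate({"conv1": "Int8", "conv2": "Int8", "fc": "BF16"}, "FP32", first, second)
print(mode, first.performed, second.performed)
# 3 [('conv1', True), ('conv2', False)] [('fc', True)]
```

In the first and second modes the loop simply keeps enabling the same module, which is equivalent to enabling it once in step S205 or S207 and disabling the other module.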
Referring to
Referring to
The operating method of
In step S401, an input data is provided to the AI processor.
In step S403, the AI processor is switched to a first mode, a second mode or a third mode by a control module of the AI processor according to a calculation strategy, wherein the calculation strategy includes determining whether the corresponding format of each of the calculations that need to be performed in one round of decision process is the first format or the second format. In the first mode, only the first format is used for calculation; in the second mode, only the second format is used for calculation; in the third mode, a combination of the first format and the second format is used for calculation.
In the first mode, step S405 is performed; in the second mode, step S407 is performed; in the third mode, step S409 is performed.
In step S405, the integrated calculation module is allocated to the first configuration by the control module.
In step S406, the calculations in the current round of decision process are performed by the integrated calculation module according to the input data. In an embodiment, if the format of the input data is not the first format, the integrated calculation module converts the input data to the first format.
In step S407, the integrated calculation module is allocated to the second configuration by the control module.
In step S408, the calculations in the current round of decision process are performed by the integrated calculation module according to the input data. In an embodiment, if the format of the input data is not the second format, the integrated calculation module converts the input data to the second format.
In step S409, for each calculation in the current round of decision process, the integrated calculation module is allocated to one of the first configuration and the second configuration by the control module according to the calculation strategy.
In step S410, each calculation in the current round of decision process is performed by the integrated calculation module, in the configuration allocated to it, according to the input data or data derived from the input data.
More specifically, in step S409, the data format of the input data is converted to match the data format used in the selected one of the first mode and the second mode, the integrated calculation module is switched to the selected one of the first mode and the second mode by the control module, and calculations are performed by the integrated calculation module according to the input data to obtain a calculation result.
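The per-calculation allocation of a single integrated calculation module can be sketched as follows. This is a hypothetical model, not the described circuit: `IntegratedModule`, `run_third_mode` and the mapping of the first and second configurations to Int8 and BF16 are assumptions. The sketch records which configuration each calculation used, which format conversion (if any) its input needed, and how many times the module was reconfigured.

```python
class IntegratedModule:
    """Hypothetical model of one calculation module with two configurations."""

    # Assumed formats of the first and second configurations.
    CONFIG_FORMATS = {"first": "Int8", "second": "BF16"}

    def __init__(self) -> None:
        self.config = None          # not yet allocated
        self.reconfigurations = 0   # number of configuration changes

    def allocate(self, config: str) -> None:
        """Allocate the module to a configuration (steps S405, S407, S409)."""
        if config not in self.CONFIG_FORMATS:
            raise ValueError(f"unknown configuration: {config!r}")
        if config != self.config:
            self.config = config
            self.reconfigurations += 1

    def compute(self, calc: str, data_fmt: str):
        """Perform one calculation, noting the format conversion it needs."""
        if self.config is None:
            raise RuntimeError("module has not been allocated")
        target = self.CONFIG_FORMATS[self.config]
        conversion = None if data_fmt == target else (data_fmt, target)
        return calc, self.config, conversion


def run_third_mode(strategy, input_fmt):
    """Steps S409/S410: allocate per calculation, then compute."""
    module = IntegratedModule()
    trace, data_fmt = [], input_fmt
    for calc, config in strategy:
        module.allocate(config)
        trace.append(module.compute(calc, data_fmt))
        data_fmt = IntegratedModule.CONFIG_FORMATS[config]  # result feeds the next step
    return module, trace


module, trace = run_third_mode(
    [("c1", "first"), ("c2", "first"), ("c3", "second"), ("c4", "first")], "FP32")
print(module.reconfigurations, trace)
```

With one module the cost of mixing formats shows up as reconfigurations, so the order in which calculations are issued matters.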
In an embodiment, since the AI system may use different types of data, such as pictures and formats, in each round of decision process, some of the calculations in each round of decision process are mutually independent.
Therefore, the control module 106/306 can schedule the calculations using the first format together and schedule the calculations using the second format together. Thus, the number of data format conversions can be reduced and the calculation speed of the AI processor can be increased. Also, in the AI system adopting the AI processor 10 of
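The effect of scheduling same-format calculations together can be shown with a simple count. This sketch assumes the listed calculations are mutually independent, so their order may be changed freely, and counts one format conversion whenever two consecutive calculations use different formats. Both function names are hypothetical.

```python
def count_conversions(order: list[tuple[str, str]]) -> int:
    """Number of adjacent calculation pairs whose formats differ."""
    return sum(1 for (_, fa), (_, fb) in zip(order, order[1:]) if fa != fb)


def schedule_by_format(calcs: list[tuple[str, str]], first_fmt: str) -> list[tuple[str, str]]:
    """Stable grouping: all first-format calculations, then all the others.

    Valid only if the calculations are mutually independent (assumption)."""
    return ([c for c in calcs if c[1] == first_fmt] +
            [c for c in calcs if c[1] != first_fmt])


calcs = [("a", "Int8"), ("b", "BF16"), ("c", "Int8"), ("d", "BF16")]
grouped = schedule_by_format(calcs, "Int8")
print(count_conversions(calcs), count_conversions(grouped))
```

In the interleaved order every step switches format (3 conversions), whereas after grouping only one switch remains. Calculations that depend on each other would have to keep their relative order, which is why the grouping is limited to mutually independent calculations.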
In an experiment, the same data group and the same series of calculations are used to test several AI systems using the same Yolo_v3_416 version but adopting different AI processors. In terms of the precision (accuracy) of the calculation result, the precision of the uni-precision AI processor using data format FP32 is set as the reference level, that is, 100%; the precision of the uni-precision AI processor using data format Int8 is 90%, the precision of the uni-precision AI processor using data format BF16 is 100%, and the precision of the mixed-precision AI processor using data formats Int8 and BF16 is 99%. In terms of efficiency (calculation speed), the efficiency of the uni-precision AI processor using data format Int8 is set as the reference level, that is, 100%; the efficiency of the uni-precision AI processor using data format BF16 is 26%, and the efficiency of the mixed-precision AI processor using data formats Int8 and BF16 is 96%. The above experimental data shows that, in comparison to the uni-precision AI processor using data format BF16, the mixed-precision AI processor using data formats Int8 and BF16 is slightly lower in terms of accuracy of the calculation result (decreased to 99% from 100%), but its calculation speed is greatly increased (increased to 96% from 26%). In another experiment, the same data group and the same series of calculations are used to test several AI systems using the same mobilenet_v1_0.25 version but adopting different AI processors.
In terms of precision (accuracy) of the calculation result, the precision of the uni-precision AI processor using data format FP32 is set as the reference level, that is, 100%; the precision of the uni-precision AI processor using data format Int8 is 85.8%, the precision of the uni-precision AI processor using data format BF16 is 97.6%, and the precision of the mixed-precision AI processor using data formats Int8 and BF16 is 96.1%. In terms of efficiency (calculation speed), the efficiency of the uni-precision AI processor using data format Int8 is set as the reference level, that is, 100%; the efficiency of the uni-precision AI processor using data format BF16 is 50%, and the efficiency of the mixed-precision AI processor using data formats Int8 and BF16 is 69%. In this experiment, the calculation amount performed using data format BF16 amounts to 15% of the calculation amount of the mixed-precision AI processor using data formats Int8 and BF16. The above experimental data shows that, in comparison to the uni-precision AI processor using data format BF16, the mixed-precision AI processor using data formats Int8 and BF16 is slightly lower in terms of accuracy of the calculation result (decreased to 96.1% from 97.6%), but its calculation speed is greatly increased (increased to 69% from 50%).
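Since both efficiency figures are stated relative to the same Int8 reference, their ratio gives the speed of the mixed-precision AI processor relative to the uni-precision BF16 AI processor. The short calculation below uses only the numbers reported above; the dictionary layout and the function name are illustrative.

```python
# Reported figures: model -> processor -> (accuracy %, efficiency %).
# Accuracy is relative to FP32 = 100%, efficiency relative to Int8 = 100%.
REPORTED = {
    "Yolo_v3_416": {"BF16": (100.0, 26.0), "Int8+BF16": (99.0, 96.0)},
    "mobilenet_v1_0.25": {"BF16": (97.6, 50.0), "Int8+BF16": (96.1, 69.0)},
}


def compare(model: str) -> dict[str, float]:
    """Speed ratio and accuracy drop of mixed precision versus uni-precision BF16."""
    acc_bf16, eff_bf16 = REPORTED[model]["BF16"]
    acc_mix, eff_mix = REPORTED[model]["Int8+BF16"]
    return {"speedup_vs_bf16": eff_mix / eff_bf16,
            "accuracy_drop_pts": acc_bf16 - acc_mix}


for m in REPORTED:
    r = compare(m)
    print(f"{m}: {r['speedup_vs_bf16']:.2f}x the speed of BF16, "
          f"{r['accuracy_drop_pts']:.1f} points lower accuracy")
```

This gives about 3.69 times the BF16 speed for a 1.0-point accuracy drop on Yolo_v3_416, and 1.38 times for a 1.5-point drop on mobilenet_v1_0.25.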
To summarize, the mixed-precision AI processor of the present invention can select the most suitable one of three modes (the pure integer mode, the pure floating-point mode, and the mixed integer/floating-point mode) to perform calculations according to the actual requirements of efficiency and precision. In comparison to a uni-precision AI processor, the mixed-precision AI processor of the present invention is more flexible and better fits actual needs.
While the invention has been described by way of example and in terms of the preferred embodiment(s), it is to be understood that the invention is not limited thereto. On the contrary, it is intended to cover various modifications and similar arrangements and procedures, and the scope of the appended claims therefore should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements and procedures.
Number | Date | Country | Kind |
---|---|---|---|
202011474919.6 | Dec 2020 | CN | national |