This application is based upon and claims the benefit of priority from Japanese patent application No. 2009-086006, filed on Mar. 31, 2009, the disclosure of which is incorporated herein in its entirety by reference.
The present invention relates to a vector multiplication processing device and to a method and a program therefor and, more particularly, to a technique for supporting a plurality of data formats with a single multiplication circuit.
To speed up the calculation of multiplication results, a vector multiplication processing device capable of supporting a plurality of data formats with a single multiplication circuit is equipped with a dedicated hardware circuit for overflow foresight processing in a fixed point data format or for sticky bit foresight processing in a floating point data format.
For example, Patent Literature 1 discloses a floating point multiplier equipped with a sticky bit foresight circuit for the floating point data format, which performs high-speed arithmetic by generating the sticky bit in parallel with the multiplication of the mantissa parts of the floating point data.
Patent Literature 2 discloses a technique for an array multiplier formed of a partial product array including a plurality of array elements, in which the number of array elements used to calculate the product of the operands is reduced by shifting an operand that is smaller than the corresponding size of the partial product array toward the most significant element of the array or toward a column.
Patent Literature 1: Japanese Patent Laying-Open No. 2000-259394.
Patent Literature 2: Japanese Patent Laying-Open No. 2008-533617.
In the technique disclosed in Patent Literature 1, because the foregoing processing concerns the output of the multiplication circuit, when such a speed-up circuit is mounted there exists a region of the partial product generation circuit in the multiplication circuit whose results are ultimately not referred to, even though arithmetic operations are executed there. In a vector multiplier, vector elements are processed successively by pipelining, so this circuitry operates for every element, which is one factor in increased power consumption.
On the other hand, although the technique disclosed in Patent Literature 2 avoids the above-described problem, shifting the multiplicand, the multiplier, or both leaves array elements unused, and the shift itself requires additional circuit elements and an additional processing load.
An object of the present invention is to provide a vector multiplication processing device, and a method and a program therefor, which, when a speed-up circuit is mounted, reduce power consumption without requiring an operand shift, by directly suppressing the operation of the region of the partial product generation circuit in the multiplication circuit whose results are ultimately not referred to.
According to a first exemplary aspect of the invention, a vector multiplication processing device which calculates a product of a first operand and a second operand input based on a multiplication instruction includes an overflow foresight circuit for a fixed point data format, a sticky bit foresight circuit for a floating point data format, and a multiplication circuit. The multiplication circuit includes a partial product generation circuit, which uses the overflow foresight circuit and the sticky bit foresight circuit to generate partial products of the input first and second operands, and a partial product control circuit, which, according to the multiplication instruction and the data format, suppresses operation of the partial product generation circuit in a specific region whose partial products are ultimately not referred to.
According to a second exemplary aspect of the invention, a vector multiplication processing method is provided for use in a vector multiplication processing device including a multiplication circuit which calculates a product of a first operand and a second operand input based on a multiplication instruction. In the method, the multiplication circuit performs a partial product generation step of generating partial products of the input first and second operands by using an overflow foresight circuit for a fixed point data format and a sticky bit foresight circuit for a floating point data format, and a circuit operation suppression step of suppressing, according to the multiplication instruction and the data format, circuit operation in a specific region whose partial products are ultimately not referred to.
According to a third exemplary aspect of the invention, a vector multiplication processing program is executed on a computer for a vector multiplication processing device that comprises at least an overflow foresight circuit for a fixed point data format and a sticky bit foresight circuit for a floating point data format and that calculates a product of a first operand and a second operand input based on a multiplication instruction. The program includes partial product generation processing of generating partial products of the input first and second operands by using the overflow foresight circuit and the sticky bit foresight circuit, and circuit operation suppression processing of suppressing, according to the multiplication instruction and the data format, circuit operation in a specific region whose partial products are ultimately not referred to.
The present invention makes it possible to provide a vector multiplication processing device, and a method and a program therefor, which, when a speed-up circuit is mounted, reduce power consumption without requiring an operand shift, by directly suppressing operation in the region of the partial product generation circuit in the multiplication circuit whose results are ultimately not referred to.
The reason is that the partial product control circuit suppresses, according to the multiplication instruction and the data format, circuit operation in a specific region of the partial product generation circuit whose output is ultimately not referred to.
Next, exemplary embodiments of the present invention will be described in detail with reference to the drawings.
With reference to
The vector register 1 is connected to the preprocessing circuit 3 and the fixed point overflow foresight circuit 5 and stores a first operand (OP). The vector register 2 is connected to the preprocessing circuit 3 and the fixed point overflow foresight circuit 5 and stores a second operand. The preprocessing circuit 3 is connected to the vector registers 1 and 2, the multiplication circuit 4, the sticky bit foresight circuit 6 and the exponent part adder 9, and divides an operand supplied from the vector register 1 or the vector register 2 into an exponent part and a mantissa part according to a multiplication instruction and a data format.
The multiplication circuit 4 is connected to the preprocessing circuit 3, the floating point adder 7 and the fixed point adder 8 and multiplies mantissa parts which are outputs of the preprocessing circuit 3 to output a multiplication result to the floating point adder 7 and the fixed point adder 8.
The fixed point overflow foresight circuit 5 is connected to the vector register 1, the vector register 2 and the selection circuit 13 and, taking the first operand and the second operand as inputs, foresees whether the fixed point multiplication result will overflow. The sticky bit foresight circuit 6 is connected to the preprocessing circuit 3 and the normalization rounding circuit 11 and, taking the first operand mantissa part and the second operand mantissa part as inputs, foresees the sticky bit used in rounding the floating point multiplication result.
The floating point adder 7 is connected to the multiplication circuit 4, the zero counter 10 and the normalization rounding circuit 11, adds the two outputs of the multiplication circuit 4, and outputs the result to the zero counter 10 and the normalization rounding circuit 11. The fixed point adder 8 is connected to the multiplication circuit 4 and the selection circuit 13, adds the two outputs of the multiplication circuit 4, and outputs the significant digits of the addition result to the selection circuit 13. The output of the fixed point adder 8 is the fixed point multiplication result.
The exponent part adder 9 is connected to the preprocessing circuit 3 and the exponent part correction circuit 12; it determines the sign of the product and adds the exponent parts output by the preprocessing circuit 3, and outputs the sign and the exponent addition result to the exponent part correction circuit 12. The zero counter 10 is connected to the floating point adder 7, the normalization rounding circuit 11 and the exponent part correction circuit 12; taking the output of the floating point adder 7 as its input, it counts the number of consecutive 0 bits from the most significant bit (MSB) and outputs the count to the normalization rounding circuit 11 and the exponent part correction circuit 12.
The normalization rounding circuit 11 is connected to the sticky bit foresight circuit 6, the floating point adder 7, the zero counter 10 and the selection circuit 13; it normalizes the output of the floating point adder 7 by shifting it according to the output of the zero counter 10, performs rounding using the output of the sticky bit foresight circuit 6, and outputs the result to the selection circuit 13. The output of the normalization rounding circuit 11 is the mantissa part of the floating point multiplication result. The exponent part correction circuit 12 is connected to the exponent part adder 9, the zero counter 10 and the selection circuit 13 and, according to the output of the zero counter 10, corrects the exponent addition result contained in the output of the exponent part adder 9. The output of the exponent part correction circuit 12 is the exponent part of the floating point multiplication result.
The selection circuit 13 is connected to the fixed point overflow foresight circuit 5, the fixed point adder 8, the normalization rounding circuit 11 and the exponent part correction circuit 12. When the multiplication instruction indicates floating point multiplication, it links the sign and exponent part output by the exponent part correction circuit 12 with the mantissa part output by the normalization rounding circuit 11 and outputs the floating point multiplication result. When the multiplication instruction indicates fixed point multiplication, it outputs the output of the fixed point adder 8 as the fixed point arithmetic result; if, at this time, the output of the fixed point overflow foresight circuit 5 indicates overflow, it outputs a value in a predetermined format (the maximum number, etc.) as the result of the fixed point multiplication.
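The path through the zero counter 10, the normalization rounding circuit 11 and the exponent part correction circuit 12 can be illustrated with a minimal Python sketch for the double-precision case. It assumes normal operands with 53-bit significands (hidden bit included), lets their exact 106-bit product stand in for the output of the floating point adder 7, and assumes round-to-nearest-even. The sticky bit is computed directly from the product here, whereas in the device it would be supplied by the sticky bit foresight circuit 6. The names `count_leading_zeros` and `normalize_and_round` are hypothetical.

```python
FRAC_BITS = 52              # stored fraction bits of an IEEE 754 double
SIG_BITS = FRAC_BITS + 1    # significand width including the hidden bit
PROD_BITS = 2 * SIG_BITS    # width of the exact significand product (106)


def count_leading_zeros(x: int, width: int) -> int:
    """Zero counter: number of 0 bits from the MSB of a width-bit value."""
    return width - x.bit_length()


def normalize_and_round(a_sig: int, b_sig: int, exp_sum: int) -> tuple[int, int]:
    """Return (rounded 53-bit significand, corrected unbiased exponent) for the
    product of two normal significands whose unbiased exponents sum to exp_sum."""
    prod = a_sig * b_sig                         # stands in for floating point adder 7
    lz = count_leading_zeros(prod, PROD_BITS)    # 0 or 1 for normal inputs
    norm = prod << lz                            # normalize: MSB moved to bit 105
    drop = PROD_BITS - SIG_BITS                  # 53 low bits are shifted out
    kept = norm >> drop
    guard = (norm >> (drop - 1)) & 1             # first discarded bit
    # Sticky bit: OR of every bit below the guard bit (computed directly here,
    # supplied by sticky bit foresight circuit 6 in the device).
    sticky = int(norm & ((1 << (drop - 1)) - 1) != 0)
    if guard and (sticky or (kept & 1)):         # round to nearest, ties to even
        kept += 1
    exponent = exp_sum + 1 - lz                  # exponent part correction
    if kept >> SIG_BITS:                         # rounding carried into bit 53
        kept >>= 1
        exponent += 1
    return kept, exponent
```

For example, with both significands equal to 1.5 (`3 << 51`) and an exponent sum of 0, the sketch returns the significand of 1.125 (`9 << 49`) with exponent 1, which is 2.25.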
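The selection logic described above can be sketched in Python as follows. This is a hypothetical illustration: it assumes the double-precision floating point layout (1 sign bit, 11 exponent bits, 52 fraction bits) and takes an all-ones unsigned value as the "predetermined format" returned on overflow, which the text specifies only as "the maximum number, etc."

```python
def select_output(instruction: str, fixed_sum: int, overflow_foreseen: bool,
                  sign: int, exponent_field: int, fraction: int,
                  width: int = 64) -> int:
    """Hypothetical model of selection circuit 13 (names are illustrative)."""
    if instruction == "float":
        # Link the sign, the corrected exponent and the rounded mantissa (double layout).
        return (sign << 63) | (exponent_field << 52) | fraction
    if overflow_foreseen:
        # Predetermined value on overflow; an all-ones maximum is assumed here.
        return (1 << width) - 1
    # Otherwise pass through the significant bits from fixed point adder 8.
    return fixed_sum & ((1 << width) - 1)
```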
With reference to
The partial product control circuit 42 is connected to the partial product generation circuit 41, receives the multiplication instruction and the data format as inputs, generates control signals (off1, off2, off3, off4), and outputs them to the partial product generation circuit 41. The partial product generation circuit 41 is connected to the preprocessing circuit 3, the partial product control circuit 42, the decoder 43 and the partial product adder 44; it receives the mantissa part of the second operand and generates partial products of the second operand mantissa part based on the decoding signal sent from the decoder 43 and the off signals output from the partial product control circuit 42.
The partial product adder 44 is connected to the partial product generation circuit 41, the floating point adder 7 and the fixed point adder 8; it adds the n partial products output by the partial product generation circuit 41 until two partial products remain, and outputs those final two partial products to the floating point adder 7 and the fixed point adder 8.
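One common way to add n partial products until two remain is a tree of 3:2 carry-save adders; the text does not name the adder structure, so this Python sketch is an assumption. Each carry-save step replaces three rows with a sum row and a carry row of the same total, so the two rows returned always add up to the full product, and the final carry-propagate addition is left to the floating point adder 7 or the fixed point adder 8. Function names are hypothetical.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor applied bitwise to whole rows: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def reduce_to_two(partials: list[int]) -> tuple[int, int]:
    """Compress partial product rows with carry-save adders until two rows remain."""
    rows = list(partials)
    while len(rows) > 2:
        next_rows = []
        full_groups = len(rows) // 3 * 3
        for i in range(0, full_groups, 3):
            next_rows.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[full_groups:])  # 0 to 2 leftover rows pass through
        rows = next_rows
    rows += [0] * (2 - len(rows))  # pad if fewer than two rows were given
    return rows[0], rows[1]
```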
Next, operation of the vector multiplication processing device 20 according to the present exemplary embodiment will be detailed with reference to
The vector multiplication processing device 20 according to the present exemplary embodiment executes both floating point multiplication and fixed point multiplication of vector data with the same hardware, according to the multiplication instruction and the data format. The following describes, as an example, a vector multiplication processing device that supports a total of four control patterns (see
First, the operation performed for fixed point multiplication will be described with reference to the schematic diagrams of the multiplication array 41 shown in
Assume that the multiplication instruction sent to the above-described preprocessing circuit 3, multiplication circuit 4 and selection circuit 13 designates "fixed point multiplication" and that the data format designates "64 bits" or "32 bits". In this case, according to the multiplication instruction and the data format, the preprocessing circuit 3 outputs "0" as the exponent part to the exponent part adder 9, since the operation is fixed point multiplication, and, for 64-bit fixed point multiplication, outputs all the bits of the first and second operands as the mantissa parts as shown, for example, in
Using the input 64-bit first operand mantissa part as the multiplier and the second operand mantissa part as the multiplicand, the multiplication circuit 4 arranges the partial products, obtained by multiplying the multiplicand by each bit of the multiplier, in n stages (a multiplication array) in the manner of binary long multiplication, as shown in
In the vector multiplication processing device 20 according to the present exemplary embodiment, the fixed point overflow foresight circuit 5 foresees, from the first and second operands, whether the fixed point multiplication result will overflow, and outputs the result to the selection circuit 13. Therefore, the region indicated by the dotted line in
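The multiplication array described above can be modelled in a few lines of Python. This sketch follows the plain one-row-per-multiplier-bit arrangement described in the text; the decoder 43 may apply a recoding such as Booth encoding, which is not modelled here. The function name is hypothetical.

```python
def partial_products(multiplicand: int, multiplier: int, n: int = 64) -> list[int]:
    """Row i = multiplicand ANDed with multiplier bit i, shifted left by i
    (binary long multiplication); the rows sum to multiplicand * multiplier."""
    return [(multiplicand if (multiplier >> i) & 1 else 0) << i for i in range(n)]
```

Row i covers product columns i through i + n - 1, so the rows for two 64-bit operands together span 128 columns.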
For foreseeing overflow in fixed point multiplication, a known method counts the number of consecutive "0" bits from the MSB of each input and determines that overflow occurs when the total does not exceed a fixed number.
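The leading-zero rule above can be made concrete with a short Python sketch, assuming unsigned operands of w bits (a signed format would count leading sign bits instead). If the leading-zero counts of two non-zero operands total z, the product lies between 2^(2w - z - 2) and 2^(2w - z), so it certainly overflows w bits when z <= w - 2 and certainly fits when z >= w. The text does not say how the boundary case z = w - 1 is resolved, so the sketch reports it as undecided. Function names are hypothetical.

```python
def leading_zeros(x: int, width: int) -> int:
    """Number of 0 bits from the MSB of a width-bit unsigned value."""
    return width - x.bit_length()


def foresee_overflow(a: int, b: int, width: int = 64) -> str:
    """Classify whether the unsigned product a * b overflows width bits,
    using only the operands' leading-zero counts (no multiplication)."""
    total = leading_zeros(a, width) + leading_zeros(b, width)
    if total >= width:
        return "no_overflow"  # a * b < 2**(2*width - total) <= 2**width; covers zero operands
    if total <= width - 2:
        return "overflow"     # a * b >= 2**(2*width - total - 2) >= 2**width
    return "undecided"        # total == width - 1: the product's top bit decides
```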
In the structure of the multiplication circuit 4 shown in
In
With reference to
Returning to
Next, the operation performed for floating point multiplication will be described with reference to the schematic diagrams of the multiplication arrays shown in
According to the multiplication instruction and the data format, for example in the case of double-precision floating point multiplication as shown in
In the case of double-precision floating point multiplication, the hidden bit "1" at the top of the mantissa in the IEEE floating point data format, the 52-bit mantissa part (M) of each of the first and second operands, and 11 bits of "0" together form a 64-bit mantissa part, as shown in
Using the input 64-bit first operand mantissa part as the multiplier and the second operand mantissa part as the multiplicand, the multiplication circuit 4 arranges the partial products, obtained by multiplying the multiplicand by each bit of the multiplier, in n stages in the manner of binary long multiplication, as shown in
In the vector multiplication processing device 20 according to the present exemplary embodiment, the sticky bit foresight circuit 6 foresees the sticky bit from the first and second operands and outputs it to the normalization rounding circuit 11; therefore, the region indicated by the dotted lines in
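A Python sketch of forming the 64-bit double-precision mantissa from an IEEE 754 bit pattern follows. The text states only that the hidden bit "1", the 52-bit mantissa part and 11 bits of "0" make up the mantissa; placing the 11 zeros at the low end is an assumption, as is restricting the sketch to normal numbers (for zero and subnormal values the hidden bit would be "0"). The function name is hypothetical.

```python
import struct


def double_mantissa64(x: float) -> tuple[int, int, int]:
    """Split a double into (sign, biased exponent field, 64-bit mantissa)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent_field = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)          # 52-bit mantissa part M
    significand = (1 << 52) | fraction         # hidden bit "1" on top (normal numbers)
    return sign, exponent_field, significand << 11  # assumed: 11 zero bits at the low end
```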
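A widely used basis for sticky bit foresight is that the number of trailing zeros of a product equals the sum of the operands' trailing-zero counts, so whether the low k bits of the product are all zero can be decided before multiplying. The text does not state whether the sticky bit foresight circuit 6 uses exactly this rule; the Python sketch below is an illustration, and k (the number of product bits below the rounding position) depends on the format. Function names are hypothetical.

```python
def trailing_zeros(x: int) -> int:
    """Count trailing 0 bits of a positive integer."""
    return (x & -x).bit_length() - 1


def foresee_sticky(a_sig: int, b_sig: int, k: int) -> int:
    """Return 1 if any of the low k bits of a_sig * b_sig is 1, without multiplying."""
    if a_sig == 0 or b_sig == 0:
        return 0  # zero product: every bit is 0
    return int(trailing_zeros(a_sig) + trailing_zeros(b_sig) < k)
```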
Returning to
For double-precision floating point multiplication, the off3 signal is generated; for single-precision floating point multiplication, the off4 signal is generated. Each off signal is active low, that is, it is "0" when asserted. When to one bit of the partial product generation circuit in
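The control scheme can be sketched in Python as follows. Only the off3 (double precision) and off4 (single precision) assignments are stated in the text; mapping off1 and off2 to 64-bit and 32-bit fixed point multiplication is an assumption, and the region each signal covers (`row_mask`) is a hypothetical parameter because the text defines the regions by reference to figures. Because each off signal is active low, an asserted signal ("0") forces the gated partial product bits to 0, so those bits stop toggling regardless of the operands.

```python
# Assumed mapping from (instruction, data width) to the off signal to assert.
# Only the two floating point entries are given in the text.
OFF_SIGNAL_FOR = {
    ("fixed", 64): "off1",  # assumption
    ("fixed", 32): "off2",  # assumption
    ("float", 64): "off3",  # double precision (stated)
    ("float", 32): "off4",  # single precision (stated)
}


def control_signals(instruction: str, width: int) -> dict[str, int]:
    """Partial product control: drive the selected off signal to 0 (active low)."""
    active = OFF_SIGNAL_FOR[(instruction, width)]
    return {name: 0 if name == active else 1 for name in ("off1", "off2", "off3", "off4")}


def gated_row(multiplicand: int, multiplier_bit: int, off: int, row_mask: int) -> int:
    """One partial product row; bits inside the hypothetical row_mask are ANDed with off."""
    row = multiplicand if multiplier_bit else 0
    if off == 0:
        row &= ~row_mask  # asserted: suppressed bits held at 0
    return row
```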
In
At this time, the shift count output by the zero counter 10 is also supplied to the exponent part correction circuit 12, which corrects the exponent part to obtain the sign and exponent part of the floating point multiplication result. The selection circuit 13 combines the output of the exponent part correction circuit 12 with the output of the normalization rounding circuit 11 and outputs the combination as the floating point multiplication result.
A first effect of the present invention is reduced power consumption in a vector multiplication processing device that supports a plurality of data formats with a single multiplication circuit.
The reason is that, by controlling the operation of the partial product generation circuit in the multiplication circuit based on the multiplication instruction and the data format, the operation of the region whose output from the partial product generation circuit is ultimately not referred to is suppressed.
Next, the vector multiplication processing device 20 according to a second exemplary embodiment of the present invention will be described with reference to a structural diagram of the vector multiplication processing device 20 shown in
The vector multiplication processing device 20 according to the present exemplary embodiment shown in
In IEEE floating point arithmetic, the result of an operation on an invalid operand is output as a non-numeric value (NaN), so no result of the multiplication circuit 4 is referred to in that case. Accordingly, when the output of the non-numeric value detection circuit 14 indicates a non-numeric value under a floating point multiplication instruction, the partial product control circuit 42 supplies the off signal to all regions of the partial product generation circuit 41, which stops the operation of the entire circuit following the partial product generation circuit 41 and further reduces power consumption.
In the vector multiplication processing device 20 according to the present exemplary embodiment, a non-numeric value in the IEEE floating point data format is detected, and when one is detected, the partial product control circuit 42 supplies the off signal to all regions of the partial product generation circuit 41, stopping the operation of the entire circuit following the partial product generation circuit 41 and thereby further reducing power consumption in this case.
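A Python sketch of the non-numeric value check feeding the partial product control circuit 42 follows. It assumes double-precision operands given as raw 64-bit patterns and checks only for NaN operands (exponent field all ones, fraction non-zero); the function names and the dictionary of off signals are hypothetical.

```python
def is_nan64(bits: int) -> bool:
    """IEEE 754 double: NaN when the exponent field is all ones and the fraction is non-zero."""
    exponent_field = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent_field == 0x7FF and fraction != 0


def off_signals_with_nan_check(instruction: str, op1_bits: int, op2_bits: int,
                               normal: dict[str, int]) -> dict[str, int]:
    """If a floating point operand is NaN, assert (drive to 0) every off signal."""
    if instruction == "float" and (is_nan64(op1_bits) or is_nan64(op2_bits)):
        return {name: 0 for name in normal}  # suppress all regions of circuit 41
    return normal
```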
The functions that the multiplication circuit 4 of the vector multiplication processing device 20 shown in each of
Although the present invention has been described with respect to preferred exemplary embodiments and modes of implementation, the present invention is not necessarily limited to them and can be modified in various ways without departing from the scope of its technical idea.
Number | Date | Country | Kind |
---|---|---|---|
2009-086006 | Mar 2009 | JP | national |