The present disclosure generally relates to a floating point multiplier circuit in a processor, and more particularly to a floating point multiplier circuit using a merged compressor flop circuit.
Modern processors, such as central processing units (“CPUs”) and graphics processing units (“GPUs”), are generally capable of performing floating point multiplication. The term floating point refers to the fact that the radix point (decimal point, or, more commonly in computers, binary point) can “float”; that is, it can be placed anywhere relative to the significant digits of the number. Floating point calculations typically take at least three clock cycles for the processor to perform. Furthermore, the processor requires a large number of circuit elements to perform the floating point calculation, which can take up a large amount of space on the processor and can consume a large amount of power.
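To make the "floating" radix point concrete, the following minimal Python sketch splits a number into a significand and a power-of-two exponent and multiplies two numbers by multiplying significands and adding exponents. The helper name `multiply_via_parts` is hypothetical and is not part of the disclosure. The sketch only illustrates the arithmetic, not the circuit: in hardware, the significand product is typically formed by summing partial products, which is where compressor circuits are commonly used.

```python
import math

def multiply_via_parts(x, y):
    """Multiply two floats by handling significand and exponent separately.

    math.frexp(v) returns (m, e) with v == m * 2**e and 0.5 <= |m| < 1
    (or m == 0 when v == 0), so the binary point "floats" with e.
    """
    mx, ex = math.frexp(x)
    my, ey = math.frexp(y)
    # Significands multiply, exponents add; ldexp reassembles m * 2**e.
    return math.ldexp(mx * my, ex + ey)

# 6.0 = 0.75 * 2**3 and 0.375 = 0.75 * 2**-1
print(math.frexp(6.0), math.frexp(0.375))
print(multiply_via_parts(6.0, 0.375))
```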
In order to improve the performance of a floating point calculation in a processor, as well as to reduce the area required by the floating point multiplier and reduce the amount of power consumed thereby, a merged compressor flip-flop circuit is used.
A merged compressor flip-flop circuit is provided. The merged compressor flip-flop circuit includes a compressor circuit having a front-end and a back-end, the front-end configured to receive four input bits and to output a first carry-bit to a back-end of a second compressor circuit, the front-end further configured to output intermediate sum signals to the back-end of the compressor circuit, the back-end configured to receive the intermediate sum signals from the front-end and further configured to receive a second carry-bit from a front-end of a third compressor circuit, the back-end further configured to output a sum-bit and a third carry-bit based upon the intermediate sum signals and the second carry-bit, and a flip-flop circuit configured to receive the sum-bit and third carry-bit and to store the sum-bit and third carry-bit, wherein the back-end of the compressor circuit directly drives the sum-bit and third carry-bit into the flip-flop circuit.
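The front-end/back-end split described above can be sketched behaviorally in Python. This is a minimal model of a conventional 4:2 compressor, not the disclosed transistor-level circuit. The function names are hypothetical, the choice of inputs A, B and C for the majority function is an assumption, the intermediate signals are modeled as the four-input XOR plus a pass-through of D, and all signals are modeled in true (non-inverted) polarity.

```python
def front_end(a, b, c, d):
    """Front-end: four input bits in, lateral carry and intermediate signals out."""
    # Assumption: the first carry-bit is the majority of three of the inputs.
    first_carry = (a & b) | (a & c) | (b & c)
    # Four-input XOR, (A xor B) xor (C xor D).
    xabcd = (a ^ b) ^ (c ^ d)
    # D is forwarded so the back-end can select it as the carry.
    return first_carry, xabcd, d

def back_end(xabcd, d, second_carry):
    """Back-end: intermediate signals plus a neighbor's carry in, sum and carry out."""
    sum_bit = xabcd ^ second_carry
    # When the XOR is 1 the incoming carry propagates; otherwise D decides.
    third_carry = second_carry if xabcd else d
    return sum_bit, third_carry

def compress(a, b, c, d, second_carry):
    """One compressor: returns (sum_bit, third_carry, first_carry)."""
    first_carry, xabcd, d_pass = front_end(a, b, c, d)
    sum_bit, third_carry = back_end(xabcd, d_pass, second_carry)
    return sum_bit, third_carry, first_carry

print(compress(1, 1, 1, 1, 1))
```

In this model the five incoming bits always satisfy A + B + C + D + carry-in = sum + 2 × (third carry + first carry). Note that the first carry-bit depends only on the four input bits, so it does not wait on the carry arriving from the neighboring compressor.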
A processor including a floating point multiplier circuit is provided. The processor includes a plurality of merged compressor latch circuits. Each of the merged compressor latch circuits includes a compressor circuit comprising a front-end and a back-end, the front-end configured to receive four input bits and to output a first carry-bit to a back-end of a compressor circuit in a second merged compressor latch circuit, the front-end further configured to output intermediate sum signals to the back-end of the compressor circuit, the back-end configured to receive the intermediate sum signals from the front-end and further configured to receive a second carry-bit from a front-end of a compressor circuit of a third merged compressor latch circuit, the back-end further configured to output a sum-bit and a third carry-bit based upon the intermediate sum signals and the second carry-bit, and a latch circuit configured to receive the sum-bit and third carry-bit and to store the sum-bit and third carry-bit, wherein the back-end of the compressor circuit directly drives the sum-bit and third carry-bit into the latch circuit.
A computer-readable medium having computer-executable instructions or data stored thereon that, when executed, facilitate fabrication of a semiconductor device is provided. The semiconductor device includes a compressor circuit having a front-end and a back-end, the front-end configured to receive four input bits and to output a first carry-bit to a back-end of a second compressor circuit, the front-end further configured to output intermediate sum signals to the back-end of the compressor circuit, the back-end configured to receive the intermediate sum signals from the front-end and further configured to receive a second carry-bit from a front-end of a third compressor circuit, the back-end further configured to output a sum-bit and a third carry-bit based upon the intermediate sum signals and the second carry-bit, and a latch circuit configured to receive the sum-bit and third carry-bit and to store the sum-bit and third carry-bit, wherein the back-end of the compressor circuit directly drives the sum-bit and third carry-bit into the latch circuit.
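As an illustration of how a plurality of such compressors can cooperate, the following Python sketch chains one behavioral 4:2 compressor per bit column to reduce four operands to a sum word and a carry word. It assumes that each compressor's first carry-bit feeds the next more significant column, as in a conventional 4:2 compressor row; the names `compressor` and `reduce_four` are hypothetical, and the latch stage is omitted.

```python
def compressor(a, b, c, d, cin):
    """Behavioral 4:2 compressor: returns (sum_bit, carry, lateral_carry_out)."""
    cout = (a & b) | (a & c) | (b & c)   # lateral carry, independent of cin
    x = a ^ b ^ c ^ d                    # four-input XOR
    return x ^ cin, (cin if x else d), cout

def reduce_four(w, x, y, z, width):
    """Reduce four unsigned `width`-bit operands to (sum_word, carry_word).

    Column i receives its lateral carry from column i - 1. One extra
    column with all-zero inputs absorbs the last lateral carry.
    """
    sum_word = 0
    carry_word = 0
    cin = 0
    for i in range(width + 1):
        bits = [(v >> i) & 1 for v in (w, x, y, z)]
        s, carry, cout = compressor(*bits, cin)
        sum_word |= s << i
        carry_word |= carry << (i + 1)   # carry has twice the weight of s
        cin = cout
    return sum_word, carry_word

s, c = reduce_four(13, 7, 9, 15, width=4)
print(s, c, s + c)
```

The two output words always add up to the sum of the four operands, so a single conventional adder can finish the addition after the reduction.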
The present embodiments will hereinafter be described in conjunction with the following figures.
The following detailed description of embodiments is merely exemplary in nature and is not intended to limit the embodiments or the application and uses of the embodiments. Furthermore, there is no intention to be bound by any theory presented in the preceding background or the following detailed description.
The compressor circuit 110 receives four single-bit inputs A-D and outputs a signal XABCD. The signal XABCD is the output of the equation: (A⊕B)⊕(C⊕D), where “⊕” symbolizes an exclusive OR (“XOR”) operation. Any combination of logic gates may be used to generate the signal XABCD. The compressor circuit 110 also outputs an inverse carry-bit
The signals XABCD,
While the circuit 100 receives four input bits A-D and outputs a sum-bit and carry-bit like a traditional 4:2 compressor circuit, the circuit 100 also outputs a carry-bit
Another advantage of the embodiment illustrated in
While the embodiments described herein suggest using a flip-flop to hold the output of the circuit 100, other latch circuitry may be used. For example, a transparent latch may be used.
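The difference between the two storage options can be sketched behaviorally. The Python model below is a minimal illustration under assumed clock polarities (latch transparent while the clock is high; flip-flop built from a master stage that is transparent while the clock is low and a slave stage that is transparent while it is high). The class names are hypothetical and the polarities are not taken from the disclosure.

```python
class TransparentLatch:
    """Level-sensitive latch: output follows the input while clk is high."""
    def __init__(self):
        self.q = 0

    def step(self, clk, d):
        if clk:
            self.q = d
        return self.q

class MasterSlaveFlipFlop:
    """Two latches in series with opposite clock polarity (assumed)."""
    def __init__(self):
        self.master = 0
        self.q = 0

    def step(self, clk, d):
        if clk:
            self.q = self.master   # slave transparent, master holds
        else:
            self.master = d        # master transparent, slave holds
        return self.q

# (clk, d) samples; d changes while clk is still high at the third sample.
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
latch, flop = TransparentLatch(), MasterSlaveFlipFlop()
print([latch.step(clk, d) for clk, d in stimulus])
print([flop.step(clk, d) for clk, d in stimulus])
```

In this model the latch output changes when the data changes during the high phase of the clock, whereas the flip-flop output only reflects the value captured before the clock rose.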
Returning to
Returning to
The inverse output of the majority gate 240 is used as the inverse carry-bit
The compressor circuit 110 also includes an inverter 250 which inverts the input not received by the majority gate 240. As seen in
The back-end 124 of flip-flop 120 illustrated in
As discussed above, the signals
The master flop circuit 720 and slave flop circuit 730 illustrated in
The latching element 820 illustrated in
Physical embodiments of the subject matter described herein can be realized using existing semiconductor fabrication techniques and computer-implemented design tools. For example, hardware description language code, netlists, or the like may be utilized to generate layout data files, such as Graphic Database System data files (e.g., GDSII files), associated with various logic gates, standard cells and/or other circuitry suitable for performing the tasks, functions, or operations described herein. Such layout data files can be used to generate layout designs for the masks utilized by a fabrication facility, such as a foundry or semiconductor fabrication plant (or fab), to actually manufacture the devices, apparatus, and systems described above (e.g., by forming, placing and routing between the logic gates, standard cells and/or other circuitry configured to perform the tasks, functions, or operations described herein). In practice, the layout data files used in this context can be stored on, encoded on, or otherwise embodied by any suitable non-transitory computer readable medium as computer-executable instructions or data stored thereon that, when executed by a computer, processor, or the like, facilitate fabrication of the apparatus, systems, devices and/or circuitry described herein.
While at least one exemplary embodiment has been presented in the foregoing detailed description, it should be appreciated that a vast number of variations exist. It should also be appreciated that the exemplary embodiment or exemplary embodiments are only examples, and are not intended to limit the scope, applicability, or configuration of the embodiments in any way. Rather, the foregoing detailed description will provide those skilled in the art with a convenient road map for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements described in an exemplary embodiment without departing from the scope of the embodiments as set forth in the appended claims.