High performance, low power incrementer for dynamic circuits

Information

  • Patent Grant
  • 6279024
  • Patent Number
    6,279,024
  • Date Filed
    Thursday, January 4, 1996
  • Date Issued
    Tuesday, August 21, 2001
Abstract
A dynamic incrementer, implemented in the Self Resetting Complementary Metal Oxide Semiconductor (SRCMOS) circuit family, which internally performs single rail calculations and which generates the dual rail result using a strobing technique. The carry-lookahead function is implemented with an OR tree using the complement input signals, resulting in a very fast and economical incrementer.
Description




FIELD OF THE INVENTION




The invention is a dynamic incrementer, implemented in the Self Resetting Complementary Metal Oxide Semiconductor (SRCMOS) circuit family, which internally performs single rail calculations and which generates the dual rail result using a strobing technique. The carry-lookahead function is implemented with an OR tree using the complement input signals, resulting in a very fast and economical incrementer.




BACKGROUND OF THE INVENTION




Circuits which perform addition by 1, known as incrementers, are widely used in microprocessors due to the sequential nature of instruction generation and execution. Implementation in dynamic logic offers considerable speed advantages. However, adders and incrementers use both true and complement signals. In dynamic logic schemes, if both true and complement (“dual rail”) signals are required, they usually have to be generated in parallel from the preceding latch, thereby consuming twice the area of and dissipating more power than single-rail logic. Therefore an optimized incrementer can provide a reduction in area and in power dissipated across an entire microprocessor chip.




In an incrementer, as in an adder, the critical path consists of the calculation of the carry signals. These are usually calculated by the use of an AND tree, which can be 64 high in state of the art 64-bit microprocessors. This limits the achievable speed.




SUMMARY OF THE INVENTION




The invention is comprised of an incrementer architecture based on a single rail, negative logic OR tree for the carry look-ahead function. Such an OR function is faster, dissipates less power, and occupies considerably less area than a corresponding AND function.




The dual rail sum is calculated using a strobed XOR function. This strobing technique eliminates the duplication associated with calculating both true and complement signals from the start.




This incrementer can be constructed using all types of dynamic logic, whether the reset signal is generated locally, as in Self Resetting CMOS (SRCMOS) logic, or distributed by a clock, as, e.g., in Domino logic (see Weste and Eshraghian, “Principles of CMOS VLSI Design: A Systems Perspective”, Addison-Wesley, Reading, Mass., 1988).




The above architecture allows this incrementer to be used in high speed circuits with low latency and fast cycle time.











BRIEF DESCRIPTION OF THE DRAWINGS





FIG. 1 is a schematic diagram of the major components of the incrementer.

FIG. 2 is a schematic diagram of the carry look-ahead OR tree of FIG. 1.

FIG. 3 shows a self-resetting 4-wide OR gate as used in FIG. 2.

FIG. 4 shows an input latch of FIG. 1, used to convert dual rail pulsed input signals to a static signal.

FIG. 5 shows the self resetting strobed sum circuit.

FIG. 6 shows waveforms corresponding to the circuits of FIGS. 4 and 5.

FIG. 7 shows the strobe generator circuit which generates a strobe signal that matches the timing of the OR tree.

FIG. 8 shows the configurations of reset chain 1 and reset chain 2 of FIG. 1.











DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS





FIG. 1 shows an overview of the major building blocks comprising the preferred implementation of the present invention. Each block will be described in detail below.




A preferred embodiment is a 64-bit incrementer. However, reduction of the present scheme to fewer bits or extension to more bits is straightforward.




The present invention can be implemented in any dynamic logic family. The embodiment shown here is in SRCMOS logic, as described in commonly assigned and copending U.S. application Ser. No. 08/463,146, filed Jun. 5, 1995, now U.S. Pat. No. 5,633,820, by Chappell et al., and complies with the SRCMOS test modes described in commonly assigned and copending U.S. patent application Ser. No. 08/583,300, filed Dec. 6, 1995, now U.S. Pat. No. 5,748,012, by Chappell et al. (“Chappell”).




The core of the present invention is the carry look-ahead circuit. First, the familiar logic functions for the sum signals $S_i$ and carry signals $C_i$ are given for an n-bit adder (see Weste and Eshraghian, “Principles of CMOS VLSI Design: A Systems Perspective”, Addison-Wesley, Reading, Mass., 1988):

$$S_i = A_i \oplus B_i \oplus C_i$$

$$C_{i+1} = A_i B_i + (A_i + B_i)\,C_i, \qquad i = 0 \ldots n-1 \qquad (1)$$






For an incrementer, since $B_i = 0$ ($i = 0 \ldots n-1$), this simplifies to:

$$S_i = A_i \oplus C_i$$

$$C_0 = 1$$

$$C_{i+1} = A_i C_i, \qquad i = 0 \ldots n-1 \qquad (2)$$
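
As a quick check on equations (1) and (2), the following Python sketch ripples the full-adder recurrence with $B_i = 0$ and $C_0 = 1$ and compares it with the simplified incrementer recurrence. The function names and the LSB-first bit lists are illustrative assumptions, not part of the patent, and the loop models only the logic, not the circuit timing.

```python
def to_bits(value, n):
    """Return the n least significant bits of value, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits: rebuild an integer from LSB-first bits."""
    return sum(b << i for i, b in enumerate(bits))


def adder_sum(a_bits, b_bits, c0):
    """Equation (1): S_i = A_i ^ B_i ^ C_i, C_{i+1} = A_i B_i + (A_i + B_i) C_i."""
    c, s = c0, []
    for a, b in zip(a_bits, b_bits):
        s.append(a ^ b ^ c)
        c = (a & b) | ((a | b) & c)
    return s, c


def incrementer_sum(a_bits):
    """Equation (2): S_i = A_i ^ C_i with C_0 = 1 and C_{i+1} = A_i C_i."""
    c, s = 1, []
    for a in a_bits:
        s.append(a ^ c)
        c = a & c
    return s, c
```

Calling `incrementer_sum(to_bits(v, n))` and passing the sum bits to `from_bits` should give `(v + 1) % 2**n` for any n-bit value `v`.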






The last equation implies an n-high AND tree for the most significant carry bit $C_{n-1}$. In dynamic logic, however, an OR function can be implemented faster and using less area than an equivalently wide AND function, and thus it is advantageous to calculate the complemented carry signals:

$$S_i = A_i \oplus C_i = A_i \overline{C_i} + \overline{A_i}\, C_i$$

$$\overline{S_i} = \overline{A_i \oplus C_i} = A_i C_i + \overline{A_i}\,\overline{C_i} \qquad (3)$$

$$\overline{C_0} = 0$$

$$\overline{C_{i+1}} = \overline{A_i} + \overline{C_i}, \qquad i = 0 \ldots n-1$$
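
The equivalence between the AND recurrence of equation (2) and the complemented OR recurrence is De Morgan's law applied bit by bit. A minimal Python sketch, with hypothetical function names chosen for illustration, makes this concrete:

```python
def carries_and(a_bits):
    """True carries from equation (2): C_0 = 1, C_{i+1} = A_i C_i."""
    c = [1]
    for a in a_bits:
        c.append(a & c[-1])
    return c


def carries_or_complement(a_bits):
    """Complemented carries: ~C_0 = 0, ~C_{i+1} = ~A_i + ~C_i."""
    nc = [0]
    for a in a_bits:
        nc.append((1 - a) | nc[-1])
    return nc
```

Inverting every entry returned by `carries_or_complement` should reproduce the list returned by `carries_and` for the same LSB-first input bits.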






In FIG. 2, the OR tree circuit that implements the last equations for $\overline{C_i}$ ($i = 0 \ldots n$) is schematically shown for a 64-bit incrementer. At the bottom, the input signals $\overline{A_i}$ ($i = 0 \ldots 63$) are indicated by their index i. The $\overline{C_0}$ input is shown tied to ground. At the top of the figure, the output signals $\overline{C_i}$ ($i = 0 \ldots 63$) and $\overline{C_{out}} = \overline{C_{64}}$ are indicated by their index i.




The circuit of FIG. 2 implements a 4-bit merge carry look-ahead scheme. Except for a single 5-wide OR gate, the OR gates are maximally 4 wide, and they are arranged in a balanced tree. Buffers have been inserted into the tree to balance delay and to provide for the necessary drive of the signals with larger fan-out. Using the configuration of FIG. 2, no carry signal takes more than 3 gate delays to be calculated.
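
The 3-gate-delay figure can be illustrated with a small Python model. This is only a sketch under stated assumptions: it uses a generic radix-4 parallel-prefix arrangement rather than the exact topology of FIG. 2, it ignores the buffers and the single 5-wide gate, and the names `or4`, `prefix_or_radix4` and `complemented_carries` are hypothetical.

```python
def or4(*inputs):
    """Model of an OR gate limited to four inputs."""
    assert 1 <= len(inputs) <= 4, "gate wider than 4 inputs"
    return int(any(inputs))


def prefix_or_radix4(bits):
    """Return (prefix, depth): prefix[i] = OR of bits[0..i], depth = gate levels used.

    Assumed structure (not taken from FIG. 2): at each level, position i ORs
    together up to four results from the previous level, spaced `span` apart,
    so the covered window grows by a factor of 4 per level.
    """
    n = len(bits)
    level, span, depth = list(bits), 1, 0
    while span < n:
        level = [
            or4(*(level[i - k * span] for k in range(4) if i - k * span >= 0))
            for i in range(n)
        ]
        span *= 4
        depth += 1
    return level, depth


def complemented_carries(a_bits):
    """Return ([~C_0 .. ~C_n], depth) for LSB-first input bits A_i."""
    not_a = [1 - a for a in a_bits]
    prefix, depth = prefix_or_radix4(not_a)
    return [0] + prefix, depth  # ~C_0 is tied to ground
```

For 64 input bits this model needs three levels of 4-wide gates, which is consistent with the delay stated above; a real layout would trade some of this redundancy for lower fan-out and area.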




A 4-wide OR element is shown in FIG. 3, as implemented in SRCMOS logic.




In equation (3) above, the logic functions for a dual rail sum circuit, generating signals $S_i$ and $\overline{S_i}$, were expanded, showing that the sum circuit requires the presence of both the true signals $C_i$ and $A_i$ and the complement signals $\overline{C_i}$ and $\overline{A_i}$. In SRCMOS logic, signals are represented by voltage pulses on a net. To evaluate the sum logic correctly, the pulses representing the above signals have to overlap in time. This is accomplished in the following manner.




The true and complement input pulses $A_i$ and $\overline{A_i}$ are captured in input latches, as given in FIG. 4, which act as pulse to static converters. In a given machine cycle, an (active high) pulse only appears on one of the two inputs, which then sets the latch, comprised of back to back inverters I1 and I2, to have either output node $\overline{AS_i}$ low, following a pulse on input node $A_i$, or to have $\overline{AS_i}$ high, following a pulse on input node $\overline{A_i}$. The output $\overline{AS_i}$ is therefore a static representation of the dual rail pulsed input signals.
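
The pulse to static conversion can be modeled behaviorally in a few lines of Python. This is a sketch only: the class name `InputLatch` and its method are hypothetical, the model treats each machine cycle as a single call, and it assumes the latch output is the static complement of the pulsed input, as stated in claim 1.

```python
class InputLatch:
    """Behavioral model of one input latch: holds the complement of the last pulsed bit."""

    def __init__(self):
        self.as_bar = None  # static complement output; unknown until the first pulse

    def pulse(self, a, a_bar):
        """Apply one machine cycle of dual rail input; exactly one rail pulses."""
        if a + a_bar != 1:
            raise ValueError("exactly one of A_i and its complement pulses per cycle")
        # A pulse on A_i drives the output low; a pulse on the complement drives it high.
        self.as_bar = 0 if a else 1
        return self.as_bar
```

The stored attribute stands in for the back to back inverters: it keeps its value after the input pulse has ended, so later stages can sample it when the strobe arrives.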




The static $\overline{AS_i}$ signal from FIG. 4 is now fed into the sum XOR circuit of FIG. 5, and inverted to yield static signal $AS_i$. Both $AS_i$ and $\overline{AS_i}$ are then combined (AND-ed) with a strobe pulse, to generate either a true or a complement pulse, $AT_i$ or $\overline{AT_i}$, respectively. By use of the strobe, these last pulses are timed to coincide with (or be slightly delayed with respect to) the pulsed $\overline{C_i}$ signal resulting from the OR tree of FIG. 2. The AND-ing of $AT_i$ or $\overline{AT_i}$ with $C_i$ and $\overline{C_i}$ constitutes the appropriate XOR or XNOR function to calculate the output sum signals $S_i$ and $\overline{S_i}$.
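
The two-stage strobed sum can be sketched at the logic level in Python. The function name `strobed_sum` and its argument names are hypothetical, pulses are reduced to 0/1 levels within one cycle, and $C_i$ is modeled simply as the inverse of the $\overline{C_i}$ level, which is an assumption about the circuit of FIG. 5 rather than a description of it.

```python
def strobed_sum(as_bar, c_bar, strobe):
    """Return (S_i, ~S_i) levels for one bit during one cycle.

    as_bar: static latch output (complement of A_i)
    c_bar:  1 if the complemented carry ~C_i fired in this cycle
    strobe: 1 while the strobe pulse is active
    """
    as_true = 1 - as_bar          # static AS_i, inverted inside the sum circuit
    at = as_true & strobe         # stage 1: true pulse AT_i
    at_bar = as_bar & strobe      # stage 1: complement pulse ~AT_i
    c = 1 - c_bar                 # assumption: C_i modeled as the inverse of ~C_i
    s = (at & c_bar) | (at_bar & c)        # stage 2, XOR:  A_i ~C_i + ~A_i C_i
    s_bar = (at & c) | (at_bar & c_bar)    # stage 2, XNOR: A_i C_i + ~A_i ~C_i
    return s, s_bar
```

With the strobe active, exactly one of the two outputs is high, so the dual rail result is produced from single rail carry logic; with the strobe inactive, neither output fires.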




Waveforms are given in FIG. 6 for each possible combination of $A_i$, $\overline{A_i}$, $C_i$ and $\overline{C_i}$, as depicted in 4 successive cycles separated by the vertical dividing lines, and annotated with the sum logic term activated during each cycle.




In the 1st cycle, annotated with $S_i = A_i \overline{C_i}$, an input pulse on net $A_i$ results in $AS_i$ going high, so that the strobe triggers a pulse on $AT_i$. If the OR tree resulted in $\overline{C_i}$ firing, coincident with the strobe, then $C_i$ is low during the pulse $AT_i$, which therefore triggers, through transistor Q14 in FIG. 5, a pulse on output net $S_i$. In the next cycle, annotated with $S_i = \overline{A_i}\, C_i$, a similar sequence of events is depicted for an input pulse on net $\overline{A_i}$. This results in a pulse $\overline{AT_i}$ at the time of the strobe. Since $\overline{C_i}$ did not fire (i.e., stays low), the $\overline{AT_i}$ pulse activates a pulldown conduction path through transistor Q13, resulting again in an output pulse $S_i$.




The rest of the cycles of FIG. 6 are analogous to those described above.




In FIG. 5, note that ground interrupt device Q1 allows reset signal r7 to start the reset (trailing edge) of $AT_i$ or $\overline{AT_i}$ before the trailing edge of the strobe. This feature allows pulse width control of the sum circuit independent of the pulse width in the carry tree.




The calculation of the sum in two stages in FIG. 5 allows the final nFET AND stacks in the XOR and XNOR sub-circuits to be only two high, rather than 4 high ($AS_i$, $C_i$, strobe and ground interrupt). This optimizes the speed of the critical path.




For correct operation of the described circuit, the timing of the strobe signal is critical. As shown in FIG. 1 and FIG. 7, the strobe signal is generated by an OR function from the true and complement input of the least significant bit (LSB): strobe $= A_0 + \overline{A_0}$. The strobe is then propagated to track the critical path in terms of time delay of each stage. To ensure that the tracking has minimal dependence on process variations, the strobe propagation circuit mimics the carry tree by employing a series of 4-wide OR gates with unused inputs tied to ground, as shown in FIG. 7.
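
The delay-matching idea can be illustrated with a toy Python model in which every 4-wide OR gate costs one unit of delay. The stage count of 3 is taken from the carry tree depth stated earlier; whether the initial OR of the LSB rails counts as one of those stages is not stated in this section, so the sketch keeps the stage count as a parameter. All names are hypothetical.

```python
OR_TREE_DEPTH = 3  # assumption: matches "no more than 3 gate delays" for the carry tree


def or4(a, b=0, c=0, d=0):
    """4-wide OR gate; inputs left at their default are tied to ground."""
    return a | b | c | d


def strobe_chain(a0, a0_bar, stages=OR_TREE_DEPTH):
    """Return (strobe_level, gate_delays) after the replica OR chain."""
    strobe = a0 | a0_bar  # strobe = A_0 + ~A_0: fires whenever the LSB input pulses
    delays = 0
    for _ in range(stages):
        strobe = or4(strobe)  # same gate type as the carry tree, unused inputs grounded
        delays += 1
    return strobe, delays
```

Because one rail of the LSB pulses in every cycle, the strobe fires in every cycle, and because it passes through the same gate type as the carries, its delay should track theirs across process variations.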




According to the SRCMOS circuit methodology, the unipolar switching circuits described above in FIGS. 3 and 5 are reset using a locally derived reset signal, as opposed to a reset (precharge) by a global clock, as in Domino logic. For better margin control as well as low circuit cycle time, two reset chains are used, as shown in FIG. 1 and as detailed in FIG. 8. The first reset chain, generating reset pulses r1, r2, r3, r4, r5 and r6, services the OR gate tree and is triggered by the rising edge of the strobe signal. Since this chain resets the OR tree, it will also reset the strobe signal to standby low.




The second reset chain applies to the sum circuits of FIG. 5, generating reset pulses r7, r8, r9 and r10. This chain is triggered by a very wide OR of all the sum circuit outputs $S_i$ and $\overline{S_i}$ ($i = 0 \ldots n-1$) of FIG. 5. Whereas each of the nFETs Q0a through Q63b in FIG. 8 may not be strong enough to pull down the “titrating OR” node S_OR, during the course of the evaluation of the sum circuits half of the nFETs will eventually switch on, pulling down the S_OR node in the process and triggering the reset chain.
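
The statement that half of the nFETs eventually switch on follows from the dual rail outputs: for every bit, exactly one of $S_i$ and $\overline{S_i}$ fires. The Python sketch below counts the conducting pulldown devices, assuming one nFET per output rail (2n in total); the function names are hypothetical and the model is purely logical, with no notion of device strength.

```python
def sum_outputs(a_bits):
    """Return [(S_i, ~S_i)] for an increment, using C_0 = 1 and C_{i+1} = A_i C_i."""
    c, outs = 1, []
    for a in a_bits:
        s = a ^ c
        outs.append((s, 1 - s))  # dual rail: exactly one of the pair is high
        c = a & c
    return outs


def conducting_nfets(a_bits):
    """Count S_OR pulldown nFETs switched on once all sum outputs have fired."""
    return sum(s + s_bar for s, s_bar in sum_outputs(a_bits))
```

For any n-bit input the count is n, independent of the data, which is what lets the combined pulldown strength on S_OR serve as a signal that all sum circuits have evaluated.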




The pulse width of nodes r7 through r10 is governed by the feedback loop starting from node r9a. The S_OR node itself is reset using the feedback loop starting from node r9.




The polarities of the various pulsed signals (active high or low) are schematically indicated in FIG. 8. Odd numbered reset pulses are active low (applied to pFETs), whereas even numbered reset pulses are active high and applied to nFETs.




Breaking the reset chain into two parts allows for easy output pulse width control, as indicated above. The reset chains can easily be altered by changing device sizes as well as adding additional links. This way, margins between reset pulses can be tailored and pulse widths can be controlled.




The reset chains comprise the necessary logic to force or to inhibit the reset signals, as required by the test modes for SRCMOS described in copending Chappell. The state of the global signals Reset, Evaluate and Static_Evaluate in the functional operation modes and various test modes is given in the following table (where L=low voltage (ground) and H=high voltage (vdd)):


















                      Global signal
Mode                  Reset   Evaluate   Static_Evaluate
Functional            L       L          L
Reset                 H       L          L
Evaluate (leakage)    L       H          L
Static_Evaluate       L       H          H / switching















In particular, the forced reset mode (Reset) or inhibited reset mode (Evaluate) are indicated by global signals Reset and Evaluate, respectively, and their locally buffered (and possibly inverted) versions RS, RS_ and EV_, as shown in FIG. 8.




Furthermore, all unipolar switching nodes in the SRCMOS circuits described in FIGS. 3 and 5 have been equipped with small leakage pFETs, activated in Static Evaluate test mode by an active low signal $\overline{SE}$, which is a locally inverted and buffered representation of global signal Static_Evaluate, again as described in copending Chappell. Thus the present circuit fully complies with the SRCMOS test modes described therein.



Claims
  • 1. An incrementing circuit comprising: an input latch for receiving a pulsed input data and outputting a static complement of the pulsed input data, the pulsed input data representing a number to be incremented; a carry-lookahead circuit, coupled to receive said static complement of the pulsed input data, said carry-lookahead circuit for generating a carry signal from the number to be incremented; and a summing circuit coupled to receive the carry signals from the carry-lookahead circuit and the pulsed input data representing the number to be incremented, said summing circuit for summing said carry signals and said pulsed input data and producing a pulsed output representing a sum.
  • 2. The circuit of claim 1, further comprising a strobe circuit for generating a triggering output to trigger said summing circuit to add the carry signals and the pulsed input data.
  • 3. The circuit of claim 1, wherein the carry lookahead circuit is an OR tree.
  • 4. The circuit of claim 3, wherein the OR tree evaluates the carry signals using negative logic.
  • 5. The circuit of claim 1, wherein the OR tree is implemented using dynamic logic.
  • 6. The circuit of claim 5, wherein the dynamic logic is self-resetting, and the reset signal is triggered locally.
  • 7. The circuit of claim 5, wherein a reset provided to the OR tree is globally generated.
  • 8. The system of claim 1, wherein the summing circuit is implemented using dynamic logic.
  • 9. A method for incrementing a number represented by a pulsed electrical signal, comprising steps of: converting the pulsed electrical signal representing the number into a static signal; using a complement of the pulsed electrical signal to determine carries required for incrementing the number; generating a pulsed data representation of the carries; and summing the static signal and the pulsed data representation of the carries to form a pulsed representation of the incremented number.
US Referenced Citations (4)
Number Name Date Kind
3989940 Kihara Nov 1976
4417315 Russell Nov 1983
5345110 Renfro et al. Sep 1994
5384724 Jagini Jan 1995
Non-Patent Literature Citations (1)
Entry
“FET DRAM Look-Ahead Address Incrementer” IBM, Tech. Discl. Bul., vol. 28 No. 1 Jun. 1985, pp. 71-73.