1. Technical Field
The present invention relates in general to integrated circuits.
2. Description of the Related Art
In many conventional integrated circuits, circuit operation is timed utilizing a clock signal, which synchronizes the flow of data signals through the circuit. A key design consideration for such clocked circuits is the timing requirements of the data signal(s) with reference to the clock signal, including the setup and hold times for the data signal(s). Setup time refers to the required relative arrival times of the clock and data signals, that is, how long before a clock pulse the data signal must arrive and be stable. Hold time refers to the time following a clock pulse during which the data signal must remain stable in order to guarantee that the data passed to the next circuit stage is correct. If circuit timing requirements are not met, for example, if a data signal fails to meet the required setup time, the circuit may output incorrect data, possibly cascading to cause a larger system error or failure.
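The setup and hold requirements described above can be sketched as a simple check. This is a minimal illustration rather than part of the disclosed circuit; the function name, its parameters, and the use of integer picosecond timestamps are assumptions made for the example.

```python
def meets_setup_and_hold(data_stable_from, data_stable_until,
                         clock_edge, setup_time, hold_time):
    """Return True if the data signal satisfies both timing requirements.

    All arguments are hypothetical integer timestamps or durations in
    picoseconds. The data signal is assumed to be stable over the interval
    [data_stable_from, data_stable_until].
    """
    # Setup: the data must be stable at least setup_time before the clock edge.
    setup_ok = clock_edge - data_stable_from >= setup_time
    # Hold: the data must remain stable at least hold_time after the clock edge.
    hold_ok = data_stable_until - clock_edge >= hold_time
    return setup_ok and hold_ok


# Data stable from 100 ps to 400 ps, clock edge at 300 ps.
on_time = meets_setup_and_hold(100, 400, 300, setup_time=150, hold_time=50)
# Same clock edge, but the data arrives 100 ps later and misses setup.
late = meets_setup_and_hold(200, 400, 300, setup_time=150, hold_time=50)
```

In the second call the data is stable only 100 ps before the clock edge, which is less than the assumed 150 ps setup time, so the check fails.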
Because integrated circuits embodying the same circuit design in practice experience a range of timing behaviors due to a number of conditions, like temperature, voltage reference variations, fabrication process variations, etc., the timing analysis phase of the circuit design process typically includes so-called “corner” analysis in order to qualify an integrated circuit design across a wide range of conditions. In performing corner analysis, the operative assumption is that if a design works under each extreme condition, then assuming monotonic behavior, the design is also qualified for all intermediate conditions.
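As a rough sketch of the corner analysis just described, the extremes of each condition can be enumerated and a timing check applied to every combination. The parameter names, the extreme labels, and the caller-supplied `check` function are hypothetical stand-ins; an actual flow would run a timing simulation for each corner.

```python
from itertools import product

# Hypothetical conditions, each represented only by its two extremes.
CORNERS = {
    "process": ("slow", "fast"),
    "voltage": ("low", "high"),
    "temperature": ("cold", "hot"),
}


def enumerate_corners(parameters):
    """Return one dict per combination of extreme values."""
    names = list(parameters)
    value_sets = [parameters[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_sets)]


def passes_all_corners(parameters, check):
    """Qualify a design only if `check` succeeds at every corner.

    `check` is an assumed callable that takes a corner dict and returns
    True when the design meets its timing requirements at that corner.
    """
    return all(check(corner) for corner in enumerate_corners(parameters))


all_corners = enumerate_corners(CORNERS)  # 2 * 2 * 2 = 8 combinations
```

Under the monotonic-behavior assumption stated above, a design that passes at all eight extremes is treated as qualified for the intermediate conditions as well.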
To enable a circuit design to pass corner analysis, timing requirements are often relaxed by the addition of excess timing margin to the circuit timing, thus enabling the timing requirements to be met across a wide range of conditions. As will be appreciated, the introduction of excess timing margin in a circuit design, while ensuring correct circuit operation, will eventually cause the circuit to fail its performance requirements.
In view of the foregoing, the present invention appreciates that it would be desirable to enable an integrated circuit to meet its setup time through improvements in the circuit design itself rather than the mere addition of timing margin to the design.
In one embodiment, an integrated circuit includes a circuit output, a data input that receives a data signal, and a clock input that receives a clock signal. The integrated circuit further includes first and second logic gates. The first logic gate has a first input coupled to the clock input, a second input coupled to the data input, and an output. The second logic gate has a first input coupled to the data input, a second input coupled to the output of the first logic gate, and an output coupled to the circuit output. Setup time of the data signal relative to the clock signal at the second logic gate is improved by reciprocal gating of the data and clock signals.
In another embodiment, a memory circuit includes a plurality of memory cells that generate a plurality of matchline signals and a plurality of wordline driver circuits each coupled to receive a respective one of the matchline signals. Each of the plurality of wordline driver circuits includes a circuit output, a data input that receives the respective matchline signal, a clock input that receives a clock signal, and first and second logic gates. The first logic gate has a first input coupled to the clock input, a second input coupled to the data input, and an output. The second logic gate has a first input coupled to the data input, a second input coupled to the output of the first logic gate, and an output coupled to the circuit output. Setup time of the matchline signal relative to the clock signal at the second logic gate is improved by reciprocal gating of the data and clock signals.
In still another embodiment, a processor includes a cache memory that employs real addresses, a plurality of execution units for executing instructions, an instruction sequencing unit that fetches instructions from the cache memory for execution by the execution units, and an effective-to-real address translation table that translates effective addresses to real addresses to permit access to the cache memory. The effective-to-real address translation table includes a plurality of content addressable memory cells for storing effective addresses, a plurality of wordline driver circuits, and a random access memory. The plurality of content addressable memory cells generates a plurality of matchline signals in response to an input effective address. The plurality of wordline driver circuits are each coupled to receive a respective one of the matchline signals. Each wordline driver circuit includes a circuit output, a data input that receives the matchline signal, a clock input that receives a clock signal, and first and second logic gates. The first logic gate has a first input coupled to the clock input, a second input coupled to the data input, and an output. The second logic gate has a first input coupled to the data input, a second input coupled to the output of the first logic gate, and an output coupled to the circuit output. Setup time of the matchline signal relative to the clock signal at the second logic gate is improved by reciprocal gating of the data and clock signals. The random access memory has a plurality of entries corresponding in number to the plurality of content addressable memory cells. Each entry of the plurality of entries includes a wordline that is asserted to read out a real address stored in that entry, and each of the wordlines is coupled to the circuit output of a respective one of the plurality of wordline driver circuits.
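The translation path in this embodiment can be illustrated with a small behavioral model. It is a sketch under simplifying assumptions, not the disclosed hardware: the function name and list-based storage are hypothetical, whole addresses are compared rather than page numbers, at most one entry is assumed to match, and the clock strobe is reduced to a single boolean.

```python
def erat_lookup(cam_entries, ram_entries, effective_address, strobe_active=True):
    """Behavioral model of a CAM/RAM effective-to-real address lookup.

    cam_entries: stored effective addresses, one per entry (hypothetical data).
    ram_entries: real addresses, one per CAM entry, in the same order.
    Returns the real address of the matching entry, or None on a miss.
    """
    # Each CAM entry compares its stored address with the input and
    # drives its matchline accordingly.
    matchlines = [stored == effective_address for stored in cam_entries]
    # Each wordline driver asserts its wordline only when its matchline
    # indicates a hit and the clock strobe is active.
    wordlines = [match and strobe_active for match in matchlines]
    # An asserted wordline reads out the real address in that RAM entry.
    for wordline, real_address in zip(wordlines, ram_entries):
        if wordline:
            return real_address
    return None


cam = [0x10, 0x20, 0x30]
ram = [0xA000, 0xB000, 0xC000]
hit = erat_lookup(cam, ram, 0x20)
miss = erat_lookup(cam, ram, 0x40)
```

The model captures only the logical relationship between matchlines, wordlines, and RAM entries; it says nothing about the setup time behavior that the wordline driver circuits are designed to improve.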
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.
The novel features believed characteristic of the invention are set forth in the appended claims. However, the invention, as well as a preferred mode of use, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
With reference now to
As illustrated in
Instructions are fetched from instruction cache 110 and ordered for processing by instruction sequencing unit (ISU) 114, which includes effective-to-real address translation (ERAT) table 116 for translating effective instruction fetch addresses generated by ISU 114 into the real addresses employed by instruction cache 110 and system memory 102. ISU 114 dispatches instructions according to instruction type. That is, fixed-point, load-store and floating-point instructions are dispatched to fixed-point unit (FXU) 120, load-store unit (LSU) 124, and floating-point unit (FPU) 130, respectively. As further depicted in
Each of execution units 120, 124 and 130 is preferably implemented as an execution pipeline having a number of pipeline stages. During execution within one of execution units 120, 124 and 130, an instruction receives operands, if any, from one or more architected and/or rename registers within a register file (i.e., general purpose registers (GPRs) 122 or floating-point registers (FPRs) 128) coupled to the execution unit. After an execution unit finishes execution of an instruction, the execution unit notifies ISU 114, which schedules completion of instructions in program order.
Referring now to
Still referring to
With reference to
As depicted, WLD circuit 218 has as a data input mglob 300, which receives the global matchline signal provided by the CAM entry 216 containing WLD circuit 218. Mglob 300 is biased to a logic high value (‘1’) and asserted to a logic low value (‘0’) to signal a CAM miss. WLD circuit 218 also has a clock input camgat 302, which receives a one-shot logic low pulse that acts as the clock or “strobe” for WLD circuit 218, and one output, namely, wordline 220. As noted above, wordline 220 is asserted to a logic high value to access a real address in RAM 204.
WLD circuit 218 further includes a reset input 304, which receives a one-shot logic high pulse to reset (precharge) mglob 300 to a logic high value following evaluation in response to the strobe of camgat 302. Reset input 304 is coupled via two input inverters 310 and 312 comprising transistors T11, T12 and T13, T14, respectively, to the gate of precharge transistor T5. When reset input 304 transitions from logic high to logic low, precharge transistor T5 is turned on to restore mglob 300 to a logic high state. The logic high state is retained by a keeper circuit comprising a first keeper inverter 314 formed by transistors T1 and T2 and a second keeper inverter 316 formed by transistors T3 and T4. The input of first keeper inverter 314 is connected to mglob 300, and the output of first keeper inverter 314 is connected to the input of second keeper inverter 316. The output of second keeper inverter 316 is connected to mglob 300.
Camgat 302 is connected via a first clock inverter 320 comprising transistors T16 and T17 to an input of an AND gate 322 comprising transistors T19 and T20. AND gate 322 has a first input A at the gate of transistor T19, which is connected to data input mglob 300, a second input B at the gate of transistor T20, which is connected to the output of clock inverter 320, and an output C that is biased to a logic high value by transistor T18. A pull-up transistor T21 further has its gate connected to mglob 300 and a leg connected to output C to pull output C to a logic high value when mglob 300 has a logic low value. Thus, the logic state of output C is dependent upon, that is, gated by, the global matchline signal received at mglob 300.
Output C of AND gate 322 is further connected to a second clock inverter 324 comprising transistors T22 and T23. The output of second clock inverter 324 is connected to input D of a two-input NAND gate 330 comprising transistors T6, T7, T8 and T15. The second input E of NAND gate 330 is connected to mglob 300. Output F of NAND gate 330 is coupled via inverter 332 comprising transistors T9 and T10 to wordline 220, the output of WLD circuit 218. By coupling the output of inverter 324 to input D of NAND gate 330, the logic value of output F of NAND gate 330, and thus the value of wordline 220, is timing dependent upon (i.e., gated by) the clock signal provided at camgat 302. Wordline 220 will only be asserted to a logic high state if mglob 300 has a logic high state and camgat 302 has a logic low state.
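The steady-state logic of the signal path described above can be checked with a short boolean model. This is a sketch only: it ignores all timing, slew, precharge and keeper behavior, the function name is hypothetical, and it assumes that node C is pulled low only when inputs A and B are both high (consistent with output C being biased high and with the stated condition for asserting wordline 220).

```python
def wld_wordline(mglob: bool, camgat: bool) -> bool:
    """Static boolean model of wordline 220 as a function of the two inputs."""
    b = not camgat           # first clock inverter 320 drives input B
    # Gate 322: node C is biased high and, by assumption, pulled low
    # only when input A (mglob) and input B are both high.
    c = not (mglob and b)
    d = not c                # second clock inverter 324 drives input D
    f = not (d and mglob)    # NAND gate 330 with inputs D and E (mglob)
    return not f             # inverter 332 drives wordline 220


# Enumerate all four input combinations as (mglob, camgat, wordline).
truth_table = [
    (mglob, camgat, wld_wordline(mglob, camgat))
    for mglob in (False, True)
    for camgat in (False, True)
]
```

In this model the wordline is high for exactly one input combination, mglob high and camgat low, and node D stays low whenever mglob is low, which corresponds to the data gating of the clock path.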
Still referring to
In conventional circuit designs, it is typical for a clock signal, such as camgat 302, to be pulsed or “fired” regardless of the state of the corresponding data signal (e.g., the global matchline signal received at mglob 300). In such prior art circuit designs, observance of the setup time is a critical design factor addressed through rigorous corner analysis. In WLD circuit 218, however, input node D of NAND gate 330 will not pulse in response to a pulse received at camgat 302 if mglob 300 has a logic low state by virtue of the data gating imposed by AND gate 322. Consequently, setup time of WLD circuit 218 is relaxed.
As depicted in
WLD circuit 218 suppresses glitches on wordline 220 more effectively than conventional circuit designs because the clock signal received at input D of NAND gate 330 is a function of (i.e., gated by) the global matchline signal received at mglob 300, leading to an improved setup time or strobe margin. In addition, WLD circuit 218 is less sensitive to fabrication process variations than conventional circuit designs. For example, in the StrongP/WeakN process corner, the global matchline signal received at mglob 300 falls even more slowly than in the nominal case. However, the slow slew rate also delays the arrival of the clock signal at input D of NAND gate 330 and decreases its glitch amplitude.
It should be noted that the setup time improvement afforded by the reciprocal clock-data gating of the present invention is actually more pronounced in corners exhibiting the greatest disparity in the slew rates of the clock signal derived from the strobe received at camgat 302 and the global matchline signal received at mglob 300. For example, in the depicted embodiment in which the global matchline signal received at mglob 300 is subject to significant RC loading on the global matchline by the CAM entry 216, the slow slew rate exhibited by the global matchline signal is substantially independent of device characteristics. Thus, if the integrated circuit performs at greater than nominal speed (e.g., in the best case corner), the slew of the global matchline signal is relatively unchanged. Once the voltage at mglob 300 drops below the voltage rail, the data gating provided by AND gate 322 tends to retard the clock signal more than it otherwise would. The window of opportunity in which clock gating is effective consists of the time that the global matchline signal is in transition between its high and low states. Thus, if the duration of this transition stays about the same for both fast and nominal device speeds, the gating has a relatively larger window in which to work relative to device speed, resulting in a better relative setup time than in a slower corner.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.
The present application is related to U.S. patent application Ser. No. ______, (Docket Number AUS920060757US1), which is filed concurrently herewith and incorporated herein by reference in its entirety.