This invention relates to a memory storage element utilized in a digital circuit, and particularly to a design structure for a double edge-triggered scannable pulsed flip-flop that can be utilized in high frequency and/or low power digital circuits.
Microprocessors, microcontrollers, and Application-Specific Integrated Circuits (ASICs), as well as other digital components, face an increasing demand for both high frequency clocking rates and low power dissipation. The demand for high frequency clocking rates derives from the need for data operations to occur at a high data-throughput rate. Meanwhile, the drive for new desk, counter, console, and rack applications demands power efficiency and miniaturization.
The total power consumption in a digital system is composed of several components and can be represented by the following equation:
$$P_{\text{machine}} = P_{\text{clock}} + P_{\text{registers}} + P_{\text{logic}} + P_{\text{leakage}} + P_{\text{misc}}$$
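where $P_{\text{machine}}$ is the total machine power and $P_{\text{clock}}$, $P_{\text{registers}}$, $P_{\text{logic}}$, $P_{\text{leakage}}$, and $P_{\text{misc}}$ are the power dissipated by the clock distribution system, the registers, the logic, leakage, and miscellaneous sources, respectively.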
In many digital components, a high data-throughput rate is achieved using a pipelined datapath and pipeline registers. The pipeline registers require a clock signal to synchronize the data operations; this clock signal is distributed to all registers through a centralized distribution system. The centralized clock distribution system dissipates significant power due to the switching capacitance of the network.
Typically, the power dissipated in the clock distribution system comprises approximately 25% of the overall machine power. The power dissipated is directly proportional to the frequency of the clock signal and is described by the following equation:
$$P_{\text{clock}} = C \cdot V^{2} \cdot F$$
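where $C$ is the switched capacitance of the clock distribution network, $V$ is the supply voltage, and $F$ is the clock frequency.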
As the clock frequency increases in response to a higher demand for speed and data-throughput, the power dissipated in the clock distribution system also increases.
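The linear dependence of clock power on frequency can be checked numerically. The sketch below simply evaluates $P_{\text{clock}} = C \cdot V^{2} \cdot F$; the function name `clock_power` and the capacitance and voltage values are hypothetical, chosen only for illustration and not taken from this specification.

```python
def clock_power(c_farads: float, v_volts: float, f_hertz: float) -> float:
    """Dynamic clock-network power P_clock = C * V^2 * F, in watts."""
    return c_farads * v_volts ** 2 * f_hertz

# Hypothetical values for illustration only (not from the source):
C_CLOCK = 1e-9   # 1 nF of switched clock-network capacitance
VDD = 1.0        # 1.0 V supply voltage

p_2ghz = clock_power(C_CLOCK, VDD, 2e9)
p_4ghz = clock_power(C_CLOCK, VDD, 4e9)
print(f"P_clock at 2 GHz: {p_2ghz:.2f} W")   # P_clock at 2 GHz: 2.00 W
print(f"P_clock at 4 GHz: {p_4ghz:.2f} W")   # P_clock at 4 GHz: 4.00 W
print(f"ratio: {p_4ghz / p_2ghz:.1f}")       # ratio: 2.0
```

With $C$ and $V$ held fixed, doubling $F$ doubles the clock power; conversely, halving the clock frequency halves it.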
The register power can be approximated as about 50% of the overall machine power on a typical machine. (For a deeply pipelined machine, the register power may comprise upwards of 70% of the overall machine power.) Thus, the clock distribution system and the pipeline registers together constitute approximately 75% or more of the overall machine power.
As microarchitectures have become more complex and use deeper pipelines, it has become more difficult to test the combinational logic in each pipeline stage thoroughly. One way to ease the burden of testing different pipeline stages is to use pipeline registers and flip-flops that have a scan feature. The scan feature allows a set of test vectors to be inserted into a pipeline register in the middle of the pipeline and allows the result vectors to be scanned out of a pipeline register without having to wait for data to propagate through the entire pipeline. A major disadvantage of scannable registers and flip-flops is that they consume more chip area than non-scannable flip-flops; the additional scan circuitry adds switched capacitance, so the power consumed by the pipeline registers increases.
Therefore, there is a need for a scannable digital storage element that reduces power dissipation in the clock distribution system and in the pipeline registers while maintaining a high data-throughput rate.
The foregoing drawbacks and deficiencies of the prior art are overcome or alleviated by a design structure embodied in a machine-readable medium used in a design process, the design structure including a circuit for data storage. The circuit includes a double edge clock generation circuit for generating a pulse clock signal having first and second clock pulses for each clock cycle of a system clock. The circuit also includes a scan clock generation circuit for generating first and second scan clock signals. The circuit further includes a scannable pulse flip-flop circuit having a data input and a data output that are connected with an internal storage node. The scannable pulse flip-flop circuit further has a scan input and a scan output that are connected with the internal storage node. The scannable pulse flip-flop circuit is receptive to the pulse clock signal and the scan clock signals, and is configured to be operable in a function mode of operation and a scan mode of operation. In the function mode of operation, the first and second scan clock signals are held at a logic level to allow data to pass from the data input to the internal storage node at the first clock pulse and from the internal storage node to the data output at the second clock pulse. In the scan mode of operation, the pulse clock signal is held at a logic level to allow data to pass from the scan input to the internal storage node at a pulse of the first scan clock signal and from the internal storage node to the scan output at a pulse of the second scan clock signal.
The subject matter, which is regarded as the invention, is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
Turning now to the drawings in greater detail, it will be seen that in
The double edge clock generation circuit 10 includes a clock pulse generation circuit 12 with an inverting clock delay circuit 14 for generating positive- and negative-edge-triggered pulses, and a scan-function-controlling AOI (AND-OR-Invert) circuit 16. A negative-edge-triggered flip-flop 17 accepts the scan gating control: its input (scan gate enabled, “sg_n”) generates its output (scan gate, “sg”). The clock pulse generation circuit 12 creates clock control signals on both edges of a master clock as shown in
In the functional mode, the scan gate (“sg”) is held low (i.e., scan de-asserted). With the scan-gate control de-asserted, the inverting clock delay circuit 14 holds the logical inverse of the previous state of the master clock (“clk”) for a predetermined amount of time. When the clock transitions, the delayed clock is compared with the new clock state in a NAND gate 20 and a NOR gate 22. If “clk” rises, the inverted, delayed clock remains high for the short delay period, so the output of the NAND gate 20 pulses low until the delayed clock catches up with the rising edge (i.e., until the inverted, delayed clock goes low). This signal ultimately generates the rising-edge-triggered pulse (of “pclk”) in the timing diagram of
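The edge-detection scheme can be sketched as a discrete-time behavioral model. This is an illustrative assumption, not the circuit netlist: the text spells out only the rising-edge path through NAND gate 20, so the falling-edge path through NOR gate 22 is modeled symmetrically, and the way the two gate outputs are merged into “pclk” and forced low by “sg” (standing in for AOI circuit 16) is assumed. The function `double_edge_pulses` and its parameters are hypothetical.

```python
def double_edge_pulses(clk, delay, sg=None):
    """Hypothetical discrete-time model of a double-edge pulse generator.

    clk   : list of 0/1 samples of the master clock
    delay : width of the inverting delay, in samples
    sg    : optional list of 0/1 scan-gate samples; 1 forces pclk low
    """
    n = len(clk)
    sg = sg if sg is not None else [0] * n
    pclk = []
    for t in range(n):
        # Inverting delay: inverse of clk `delay` samples ago
        # (clk is assumed stable at clk[0] before t = 0).
        past = clk[t - delay] if t >= delay else clk[0]
        dclk_n = 1 - past
        nand_out = 1 - (clk[t] & dclk_n)   # low for `delay` samples after a rise
        nor_out = 1 - (clk[t] | dclk_n)    # high for `delay` samples after a fall
        pulse = (1 - nand_out) | nor_out   # assumed merge of both edge pulses
        pclk.append(pulse & (1 - sg[t]))   # scan gate high forces pclk low
    return pclk

clk = [0] * 4 + [1] * 8 + [0] * 8 + [1] * 4
pclk = double_edge_pulses(clk, delay=2)
print([t for t, p in enumerate(pclk) if p])                  # [4, 5, 12, 13, 20, 21]
print(sum(double_edge_pulses(clk, 2, sg=[1] * len(clk))))    # 0
```

One pulse, as wide as the inverting delay, appears after every clock edge, so the flip-flops see two clock events per master-clock cycle; holding “sg” high suppresses all of them.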
In the scan mode, the scan gate (“sg”) is high (i.e., scan asserted), and its set-up and hold times are referenced to the negative edge of the system clock. In scan operation, the “sg” signal disables the standard “pclk” operation: the rising edge of “sg” is set up to the falling edge of “clk”, which allows the “pclk” pulses to be disabled. The “pclk” signal is then forced low via the AOI circuit 16 in the clock control path and cannot trigger. When the scan operation is complete, the falling edge of “sg” is set up to the falling edge of “clk” to re-enable the “pclk” pulses. The scan operation (“sg”) is discussed in more detail hereinafter. The clock signal “pclk” is routed to a register made up of the scannable pulsed flip-flop circuit 18, a single bit of which is shown in
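Because every “sg” transition is referenced to a falling edge of “clk”, a simple set-up/hold check can flag transitions that land too close to a falling edge. The sketch below is a generic timing check, not a tool described in this specification; the function name, the 2 GHz clock, and the set-up and hold values are hypothetical.

```python
from bisect import bisect_left

def sg_timing_violations(sg_edges, clk_falls, t_setup, t_hold):
    """Return the "sg" transition times that violate set-up or hold.

    sg_edges  : times at which "sg" rises or falls
    clk_falls : times of the falling edges of "clk"
    A transition must occur at least t_setup before the next falling
    edge and at least t_hold after the previous one.
    """
    falls = sorted(clk_falls)
    bad = []
    for t in sg_edges:
        i = bisect_left(falls, t)                  # first falling edge at or after t
        nxt = falls[i] if i < len(falls) else None
        prev = falls[i - 1] if i > 0 else None
        setup_ok = nxt is None or nxt - t >= t_setup
        hold_ok = prev is None or t - prev >= t_hold
        if not (setup_ok and hold_ok):
            bad.append(t)
    return bad

# Hypothetical 2 GHz clock: 500 ps period, falling edges at 250, 750, 1250, 1750 ps.
falls = [250 + 500 * k for k in range(4)]
print(sg_timing_violations([600, 1240, 1760], falls, t_setup=50, t_hold=30))
# [1240, 1760]: 1240 misses set-up to 1250, 1760 misses hold after 1750
```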
Turning now to
In the scan mode, the signal “sclk2” connects the stored data at the internal storage node to a slave scan latch 30 and, subsequently, to a scan output node “so” which is the scan output of the pulsed flip-flop 18. Signal “sclk1” connects a scan input node “si” to a master scan latch 29 and, subsequently, to the internal storage node “int”. The scan clock generation circuit 26 (
Turning now to
Turning now to
In the functional mode operation, the data is launched twice every master clock (“nclk”) cycle by virtue of the “pclk” pulsing twice per cycle. Logic 36 between subsequent logical rows 34 of flops evaluates as the data is pumped through the system 32. At any time, the operation may be halted by a scan operation, described below, and the resulting evaluated data read and/or new preset data applied to the overall system for testing.
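The double-pumped pipeline can be illustrated with a purely behavioral sketch in which each register row captures, on every “pclk” pulse, the output of the logic fed by the previous row. The function `run_pipeline`, the three-row depth, and the increment standing in for logic 36 are hypothetical.

```python
def run_pipeline(inputs, n_rows, logic, initial=0):
    """Hypothetical behavioral model of register rows separated by logic.

    On every pclk pulse, row 0 captures a new input and row i+1 captures
    logic(row i). Returns the row contents after each pulse.
    """
    rows = [initial] * n_rows
    history = []
    for x in inputs:                      # one iteration per pclk pulse
        rows = [x] + [logic(r) for r in rows[:-1]]
        history.append(list(rows))
    return history

PULSES_PER_CYCLE = 2                      # pclk fires on both master-clock edges
inputs = [10, 20, 30, 40]                 # four pulses = two master-clock cycles
hist = run_pipeline(inputs, n_rows=3, logic=lambda v: v + 1)
print(hist[-1])                                       # [40, 31, 22]
print(len(inputs) // PULSES_PER_CYCLE, "master-clock cycles")
```

Four data launches complete in two master-clock cycles, which is the sense in which the same data rate is reached at half the clock frequency.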
In the scan mode operation, when scan gate (“sg”) is driven high (set-up to fall of “clk”, as seen in
An important feature of the scan mode operation is that scan begins with the “sclk2” assertions to preserve the stored evaluation data. If the operation instead began with “sclk1”, the functional data would be destroyed by the incoming scan-in data, losing the machine state to be read.
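The importance of starting with “sclk2” can be seen in a small chain model. The sketch assumes each bit's scan output “so” feeds the next bit's scan input “si” and that the slave scan latches hold unknown contents (modeled as `None`) before the first “sclk2” pulse; the functions `scan_shift` and `scan_read` are hypothetical.

```python
def scan_shift(state, scan_in, order=("sclk2", "sclk1")):
    """Shift a hypothetical chain of scannable bits by one position.

    state   : internal storage node ("int") values, bit 0 nearest the chain input
    scan_in : value presented on the chain's scan input
    Returns (new int values, value on the last bit's scan output).
    """
    intn = list(state)
    slave = [None] * len(state)          # slave scan latches: unknown before sclk2
    for pulse in order:
        if pulse == "sclk2":             # int -> slave scan latch -> so
            slave = list(intn)
        elif pulse == "sclk1":           # si -> master scan latch -> int
            intn = [scan_in] + slave[:-1]   # each si is driven by the previous so
        else:
            raise ValueError(f"unknown scan clock: {pulse}")
    return intn, slave[-1]

def scan_read(state, fill=0):
    """Unload the chain with sclk2-then-sclk1 shifts; last bit comes out first."""
    out = []
    for _ in range(len(state)):
        state, so = scan_shift(state, fill)
        out.append(so)
    return out

state = [1, 0, 1, 1]
print(scan_read(state))                                # [1, 1, 0, 1]
print(scan_shift(state, 0, order=("sclk1", "sclk2")))  # ([0, None, None, None], None)
```

Starting with “sclk2” copies every “int” value into its slave latch before anything is overwritten, so the full machine state shifts out; starting with “sclk1” overwrites “int” with whatever the slave latches held, and the evaluated data is lost.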
During the scan read function, an equivalent scan write may also be performed. Each register master node (“int”) is set to the desired state. The output of each register bit immediately receives the scan information through inverter 28. The logic connected to the output of the register 34 evaluates the new data, ultimately driving it to the data (“d”) input of the next register. The evaluated data is not transferred to that register's master storage node “int” because “pclk” remains low. At some point, “sg” is de-asserted and the “pclk” operation resumes. The first “pclk” pulse immediately loads the functional data that was waiting at the normal data port and passes it to the register bit output. This operation does not test the speed of the cycle's logic, because the input was already waiting for the “pclk” pulse. The next “pclk” pulse stores the data of the next cycle operation in a similar fashion; however, this data is forced through the logical pipe and sampled at speed, allowing the cycle's operation to be tested at frequency. Each stage that first received waiting data therefore also receives new at-speed data, so frequency testing can be performed on every stage in the same way. At some point, “sg” is asserted again and the scan data can be read using the aforementioned method.
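Whether a capture is at speed depends on how long its data had to propagate. In the sketch below, the first value captured after “sg” is de-asserted was launched by the scan write, while every later value was launched by the preceding “pclk” pulse. The functions `capture_windows` and `at_speed_flags` and all timing values are hypothetical; a 250 ps pulse spacing assumes a 2 GHz clock with a 50% duty cycle.

```python
def capture_windows(t_scan_write, pulse_times):
    """Propagation window (launch -> capture) for each pclk pulse after "sg" drops.

    The first value was launched by the scan write; each later value was
    launched by the preceding pclk pulse.
    """
    windows = []
    launch = t_scan_write
    for t in pulse_times:
        windows.append(t - launch)
        launch = t
    return windows

def at_speed_flags(windows, pulse_interval):
    """A capture is at speed only if its window equals one pulse interval."""
    return [w == pulse_interval for w in windows]

# Hypothetical timing in ps: scan write at t = 0, pclk resumes at 10 000 ps
# and then pulses every 250 ps.
PULSE_INTERVAL = 250
pulses = [10_000 + PULSE_INTERVAL * k for k in range(4)]
w = capture_windows(0, pulses)
print(w)                                   # [10000, 250, 250, 250]
print(at_speed_flags(w, PULSE_INTERVAL))   # [False, True, True, True]
```

Only the first capture has an arbitrarily long window; every subsequent capture has exactly one pulse interval, which is what makes frequency testing possible.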
The scan mode operation differs somewhat from traditional scan functions in that the data output of the flip-flops is evaluated immediately rather than waiting for a slave/launch clock. Debug and testability, however, are preserved.
A sample register consisting of 32 bits was simulated for power consumption using the IBM “CPAM” power estimation tool. This was compared to a standard 32-bit STI-style master-slave flip-flop register of equivalent drive strength. The system clocks were adjusted for the two registers: the present system was clocked at 2 GHz because of its double-pulse nature, and the standard design was clocked at 4 GHz to provide an equivalent data rate. Output loads were removed to prevent the load from damping the power data. The scan operation was disabled in both cases. Assuming a 20% data activity factor, the resulting power of the registers was: Pulsed Invention: 0.401 mW for the 32-bit register; Standard Master-Slave: 1.002 mW for the 32-bit register. This represents a savings of approximately 60%, which was rounded down to 50% for comparison purposes to account for wire modeling. Because the registers account for about 50% of the overall machine power, a 50% register saving corresponds to about 25% of the machine power. Additionally, the clock power, approximately 25% of the overall machine power, is directly proportional to frequency according to P = C·V²·F. Since the clock in the new system runs at one-half the frequency of a traditional system, an additional power savings of (50%)·(25%), or 12.5%, is obtained. Therefore, the estimated power savings of the system is the sum of the register and clock savings, or ~37.5%.
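The savings estimate can be reproduced step by step. The sketch uses only the figures quoted above (0.401 mW and 1.002 mW, the 50% register and 25% clock shares of machine power, and the halved clock frequency); the variable names are illustrative.

```python
# Figures quoted in the text (32-bit register, 20% data activity factor).
P_PULSED_MW = 0.401
P_MASTER_SLAVE_MW = 1.002

register_saving = 1 - P_PULSED_MW / P_MASTER_SLAVE_MW   # about 0.60
register_saving_used = 0.50    # rounded down, as in the text
REGISTER_SHARE = 0.50          # registers: ~50% of machine power
CLOCK_SHARE = 0.25             # clock network: ~25% of machine power
clock_saving = 0.50            # clock frequency halved, so P_clock halves

machine_saving = (register_saving_used * REGISTER_SHARE
                  + clock_saving * CLOCK_SHARE)
print(f"register saving: {register_saving:.1%}")   # register saving: 60.0%
print(f"machine saving:  {machine_saving:.1%}")    # machine saving:  37.5%
```

The two terms are 25% (register) and 12.5% (clock) of total machine power; their sum gives the ~37.5% estimate.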
Design process 820 includes using a variety of inputs; for example, inputs from library elements 835 which may house a set of commonly used elements, circuits, and devices, including models, layouts, and symbolic representations, for a given manufacturing technology (e.g., different technology nodes, 32 nm, 45 nm, 90 nm, etc.), design specifications 840, characterization data 850, verification data 860, design rules 870, and test data files 880, which may include test patterns and other testing information. Design process 820 further includes, for example, standard circuit design processes such as timing analysis, verification tools, design rule checkers, place and route tools, etc. One of ordinary skill in the art of integrated circuit design can appreciate the extent of possible electronic design automation tools and applications used in design process 820 without deviating from the scope and spirit of the invention. The design structure of the invention embodiments is not limited to any specific design flow.
Design process 820 preferably translates embodiments of the invention as shown in
The capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof.
As one example, one or more aspects of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention. The article of manufacture can be included as a part of a computer system or sold separately.
Additionally, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided.
There may be many variations to the steps (or operations) described herein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.
While the preferred embodiment of the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.
This non-provisional U.S. patent application is a Continuation-In-Part of pending U.S. patent application Ser. No. 11/531,310, which was filed Sep. 13, 2006, and is assigned to the present assignee.
| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | 11531310 | Sep 2006 | US |
| Child | 12123107 | | US |