The invention is related to the field of electronic circuit design.
Electronic design automation tools help a user to create and assess a design of an electronic system in terms of its functionality, performance, and cost. For example, a user can simulate an electronic circuit before it is built to determine if it will perform as intended. If it performs below its desired level, the user can modify the design to improve system performance.
One factor that can affect a circuit's performance is its power consumption characteristics. For example, as integrated circuit (IC) process technology continues to scale down, the size and power consumption of integrated circuits continue to increase significantly. The instantaneous power and current demands of modern integrated circuits can be so large that the on-chip power grid that distributes power to these circuits is taxed by excessive voltage and power losses. For example, the power drawn by a given node at a given instant may overwhelm the power supply. These instantaneous demands can adversely affect the performance of the IC to the point of failure. Conventional design tools fail to account for dynamic variations in the power grid of the IC.
A method to analyze and correct dynamic power grid variations in an IC includes performing a dynamic power grid analysis of the circuit, identifying an excessive dynamic power grid voltage fluctuation from the analysis, and modifying the circuit to reduce the excessive dynamic power grid fluctuation.
FIGS. 4(a) and 4(b) show an example of tracing and simulating an active clock net using channel-connected components.
To remedy the problem of dynamic power grid variations in an integrated circuit (IC), decoupling capacitors are inserted between the on-chip power and ground grids of the IC. A decoupling capacitor acts as a local repository of charge that replenishes the charge drawn by a transistor or logic gate when it transitions from one logic state to another. To prevent large instantaneous currents from being drawn from the off-chip power supply and traversing the on-chip power grid, adequate amounts of decoupling capacitors are added throughout the power grid. One aim of adding decoupling capacitance is to minimize instantaneous voltage drop in the power grid to a small percentage (such as, for example, less than 10%) of the supply voltage rail. In order to reduce or minimize dynamic power grid variations, a dynamic power grid analysis is performed in the presence of decoupling capacitance to both determine an adequate amount of decoupling capacitance and to determine the locations to insert decoupling capacitors in order to reduce instantaneous power grid voltage fluctuation to a tolerable level.
Dynamic Power Grid Analysis and Correction Method
Transient Instance Current Calculation
An aim of calculating the transient instance current, 120, is to determine the maximum instantaneous current drawn systematically from the power supplies when the logic circuit instances switch state. In a synchronous (clocked) design, a sizeable current dissipation occurs in a clock cycle when the clock net(s) are activated and the latch outputs switch state. Therefore, the method determines the circuit conditions that will result in the maximum number of latch outputs switching when the clock is asserted between two adjacent clock periods. To determine these circuit conditions, the Boolean logic for the circuit, as well as an optimization criterion, are formulated. Assuming a circuit with N nets comprising many combinational logic gates, each with a fanin of F, the Boolean logic may be represented as follows:
$x_i^P = f(x_{i_1}^P, \ldots, x_{i_F}^P)$ (a)
$x_i^N = f(x_{i_1}^N, \ldots, x_{i_F}^N)$ (b)
where $x_i^P$ and $x_i^N$ refer to the Boolean variable of net $i$ in the previous and next clock cycles respectively, $i_1, \ldots, i_F \in N$, and each variable is a logic function $f$ of its fanin. The sequential logic transition between the two clock cycles is represented as:
$Q_j^N = D_j^P$ (c)
where j=1, . . . , S and S is the total number of latches in the circuit, with D and Q being the input and output of the latch respectively, as illustrated in
These three equations constrain the operation of the circuit in two adjacent clock cycles. In addition, an optimization criterion is specified that will result in the maximum number of simultaneously switching nets while satisfying the logic operation equations. The optimization criterion is written as:
where ci represents the optimization cost. To ensure that the maximum number of latch outputs switch state, ci is set to 1 for all latch output nets and 0 for all other nets. The XOR condition enforces switching of logic level from the previous cycle to the next cycle on net i. The hazards are ignored in the above optimization criterion, because when a circuit simulation is run based on the generated 2-cycle setup (as described below), any hazard in the circuit and the resulting current consumption will be captured implicitly.
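As a minimal sketch of this formulation, the following exhaustive search maximizes the criterion as it is described in the text, i.e. $\sum_i c_i (x_i^P \oplus x_i^N)$ with $c_i = 1$ on latch outputs, over a toy circuit. The two-latch circuit, the function names `next_state` and `max_switching`, and the use of brute force in place of a satisfiability solver are all assumptions made for illustration.

```python
from itertools import product

def next_state(q, pi):
    """Hypothetical toy circuit: two latches (q0, q1) and one primary input.

    D0 = NOT q0        (latch 0 toggles every cycle)
    D1 = q0 AND pi     (latch 1 captures q0 gated by the primary input)
    Equation (c): each latch output in the next cycle equals its D input
    in the previous cycle.
    """
    q0, q1 = q
    return (int(not q0), int(q0 and pi))

def max_switching(num_latches=2):
    """Enumerate previous-cycle latch states and primary inputs, and return
    the first assignment that maximizes the number of switching latch outputs."""
    best = None
    for bits in product((0, 1), repeat=num_latches + 1):
        q_prev, pi = bits[:num_latches], bits[num_latches]
        q_next = next_state(q_prev, pi)
        # Cost c_i = 1 for latch outputs; XOR marks a net that switches.
        cost = sum(a ^ b for a, b in zip(q_prev, q_next))
        if best is None or cost > best[0]:
            best = (cost, q_prev, pi, q_next)
    return best

cost, q_prev, pi, q_next = max_switching()
print(cost, q_prev, pi, q_next)
```

Exhaustive enumeration grows exponentially with the number of latches and inputs, so it is only useful for checking a formulation on very small circuits.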
The above equations describe a weighted max-satisfiability problem. Recent algorithmic advances in Boolean satisfiability, together with highly efficient solver implementations, enable the application of max-satisfiability to industrial-sized circuits [1]. In terms of implementation, each logic equation for the entire circuit is translated into individual clauses, which are fed to a SAT solver, such as Chaff [2], to check for satisfiability. The solved max-satisfiability problem ensures that, for each latch that can switch simultaneously, its data input and its output state in the current cycle are of opposite logic states. For example, as shown in
Next, starting at the primary clock inputs, each active clock net is traced by first partitioning the clock net into channel-connected components (CCCs) and then sensitizing each CCC to propagate a switching input forward. The sensitization is performed using a binary decision diagram (BDD) to detect the possible paths from the power and ground rails of the CCC to its output. Solving the BDD sets the side inputs of the CCC to appropriate values to ensure propagation of the switching input to the CCC output. An example of sensitizing a CCC to propagate a switching input forward is illustrated in
Once the tracing is completed, a simulation is performed on the traced CCCs, one by one, to determine the instantaneous current drawn when the switching input is propagated through each of them. The switching waveform at the output of one CCC is used as the input stimulus to a subsequent CCC in the trace. The waveforms at the input pins of the traced clock nets are assumed known (and provided by the user). Note that a simulation ordering is enforced to ensure that, in the case of a CCC with multiple switching inputs (as is the case for parallel clock drivers), the switching input waveforms of a CCC are determined before simulation of that CCC is performed. An example of the simulation methodology is illustrated in
In addition to simulating the clock net, each switching latch is also simulated with the traced waveform applied to the clock pin, and a fixed state (as determined by the max satisfiability solution) is applied to the data input to ensure that the output changes state. The latch simulation can be performed in multiple ways. For example, if the total number of latches is small, each latch can be simulated as a collection of CCCs during the clock net simulation. Alternatively, to save computational time, simulation of each unique latch cell can be done in a pre-characterization step whereby the latch is simulated under varying switching input slew and output load conditions. The power supply current waveforms drawn by the latch are stored and reused during clock net tracing and simulation.
In simulating the clock net(s) distribution and the subsequently activated latches, the simulation time (the interval of time over which simulation is performed) is relatively short, being roughly equal to the clock skew from the primary clock input to the furthest latch from it. Since the simulation stops after each latch changes state, the switching latch output does not propagate through the combinational circuitry connected to it. However, even if switching the combinational circuitry does not affect the peak instantaneous current dissipation, it may still be helpful to capture the power supply current characteristics over the entire clock cycle. This would require an additional simulation for each cone of combinational logic between latches.
Capturing these characteristics may be accomplished by partitioning each cone of combinational logic into CCCs, applying the latch output switching waveforms as input stimuli to them, and propagating these switching waveforms through the CCCs. To simulate the combinational logic circuitry, note that no extra sensitization is necessary since the max satisfiability solution would have performed this task. However, simulation ordering is once again applied to ensure that the inputs to a CCC are known prior to propagating a switching input through a CCC.
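The simulation ordering described above can be sketched as a topological sort over the CCC connectivity graph. This is a minimal illustration rather than the method's actual implementation; the CCC names and the `drivers` mapping are hypothetical, and the sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

def simulation_order(drivers):
    """drivers maps each CCC name to the set of CCCs that drive its inputs.
    Returns an order in which every CCC appears after all of its drivers."""
    return list(TopologicalSorter(drivers).static_order())

# Hypothetical clock tree: a root buffer drives two parallel drivers,
# which jointly drive one leaf CCC (the multiple-switching-input case).
drivers = {
    "root": set(),
    "drv_a": {"root"},
    "drv_b": {"root"},
    "leaf": {"drv_a", "drv_b"},
}
order = simulation_order(drivers)
print(order)
```

With this ordering, the output waveforms of `drv_a` and `drv_b` are both available before `leaf` is simulated.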
As a byproduct of calculating the instantaneous power currents as described above, a transistor-level netlist of the traced circuitry (clock net, latches and combinational circuitry) is written out in a suitable syntax for a circuit simulator (such as, for example, SPICE or Spectre). Also, voltage sources are written for the switching stimuli connected to the primary inputs of the clock net(s) and for the static logic values applied at all other primary inputs, as dictated by the results of the max-satisfiability solution. Furthermore, initial states are applied to the latch outputs using appropriate initial condition statements for the circuit simulator. As a validation step, a user can simulate the generated transistor-level netlist with a circuit simulator of choice, and validate the accuracy of the instantaneous power supply currents calculated as described above.
Power Grid Response Calculation
The transient instance currents are injected into the power grid network, 130, comprising resistive and capacitive parasitics as shown in
To solve the network of
where $G_v$ is an $N \times N$ conductance matrix of the power grid, $G_g$ is an $M \times M$ conductance matrix of the ground grid, $v_v$ and $v_g$ are the vectors of $N$ power grid node voltages and $M$ ground grid node voltages, $v_{vc}$ and $v_{gc}$ are vectors of the $P$ power grid and ground grid node voltages that are coupled by capacitors, and $C_{vg}$ is a $P \times P$ diagonal matrix representing these capacitors. $E_v$ and $E_g$ are matrices of size $N \times P$ and $M \times P$ respectively, in which each column has a unit entry in the row corresponding to a capacitor connection and zeros elsewhere. The voltage sources are not explicitly part of equations (1) or (2), since their contribution to the power grid response can be added to the power grid voltages $v_v$ as a direct current (DC) term. At a time point of interest, $t+\Delta t$, the following equation is obtained from (1).
Note that in (3), the solution for the power grid node voltages is obtained by using the ground grid node voltage derivatives from the previous time point. This is equivalent to delaying the effect of the ground grid voltage by one time point. Similarly, the following equation is obtained from (2).
To solve (3) and (4), a first order implicit integration algorithm is employed to yield the following equations:
The solutions to (5) and (6) are computationally dominated by the calculation of four matrix inversions: $G_v^{-1}$, $G_g^{-1}$, $(G_v + E_v C_{vg} E_v^T/\Delta t)^{-1}$ and $(G_g + E_g C_{vg} E_g^T/\Delta t)^{-1}$. Since $G_v$ and $G_g$ are the same matrices used in the calculation of a static IR drop analysis, which may be run prior to a dynamic analysis, the inverse matrices $G_v^{-1}$ and $G_g^{-1}$ can be readily available to solve (5) and (6). Furthermore, for a given right-hand side $b$, it is possible to calculate the solutions $(G_v + E_v C_{vg} E_v^T/\Delta t)^{-1} b$ and $(G_g + E_g C_{vg} E_g^T/\Delta t)^{-1} b$ by applying $P$ rank-one updates to $G_v^{-1} b$ and $G_g^{-1} b$ respectively, since $C_{vg}$ is a diagonal matrix of rank $P$ ($P < N$; $N = \mathrm{rank}(G)$). This is accomplished by repeated use of the Sherman-Morrison formula on the matrix equation:
where $e_i$ is a vector of size $N$ with a unit entry at row $i$ and zeros elsewhere. A rank-one update of the matrix $G_v$ gives the system:
$(G_v + \alpha e_i e_i^T) x = b$ (8)
which is solved by solving the two auxiliary problems
$G_v y = b$ (9)
$G_v z = \alpha e_i$ (10)
for the vectors $y$ and $z$. The solution is given in terms of these as:
$x = y - \dfrac{e_i^T y}{1 + e_i^T z}\, z$ (11)
Note that the dot products $e_i^T y$ and $e_i^T z$ are equivalent to picking out the $i$th entry of vectors $y$ and $z$ respectively, and hence equation (11) uses $N$ floating point multiplications and $N$ floating point subtractions. For $P$ rank-one updates, as used in the solution of equation (7), one would need to solve equation (10) $P$ times and equation (11) $(P^2+P)/2$ times. However, if the time step employed in the integration algorithm is fixed, the $P$ solutions to (10) and $(P^2-P)/2$ solutions to (11) can be computed once a priori and stored as $P$ vectors of size $N$. Then, at every time step, the solution to (7) can be computed using $P$ solutions of (11), requiring a total of $N \cdot P$ floating point operations.
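A single rank-one update can be sketched as follows. From (8) to (10), the Sherman-Morrison formula gives $x = y - z\,(e_i^T y)/(1 + e_i^T z)$, which the sketch checks against a direct solve of the updated matrix. The small dense matrix, the helper names `solve` and `rank_one_solve`, and the numeric values are assumptions for illustration; a practical flow would presumably reuse a sparse factorization of $G_v$ instead of the dense elimination shown here.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def rank_one_solve(G, b, i, alpha):
    """Solve (G + alpha * e_i e_i^T) x = b using only solves with G."""
    y = solve(G, b)                                    # equation (9)
    alpha_e_i = [alpha if r == i else 0.0 for r in range(len(b))]
    z = solve(G, alpha_e_i)                            # equation (10)
    # e_i^T y and e_i^T z simply pick out entry i of y and z.
    scale = y[i] / (1.0 + z[i])
    return [yr - scale * zr for yr, zr in zip(y, z)]

# Hypothetical 3-node conductance matrix and right-hand side.
G = [[3.0, -1.0, 0.0], [-1.0, 3.0, -1.0], [0.0, -1.0, 3.0]]
b = [1.0, 0.0, 2.0]
i, alpha = 1, 0.5   # assumed: alpha plays the role of C / dt at node 1

x_update = rank_one_solve(G, b, i, alpha)

# Reference: solve the explicitly updated matrix directly.
G_direct = [row[:] for row in G]
G_direct[i][i] += alpha
x_direct = solve(G_direct, b)
print(x_update, x_direct)
```

When the time step is fixed, the vector `z` depends only on the matrix and on `alpha`, so it can be computed once and reused at every time step, which is the saving the text describes.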
Therefore, at every time step the solution to equation (5) can be obtained using two forward-back substitutions with the matrix inverse $G_v^{-1}$, requiring at most (N*N+1)/2 floating point operations (which may be much less owing to sparsity), and $P$ rank-one updates requiring $N \cdot P$ floating point operations. The solution to equation (6) is obtained likewise. Once the power and ground grid voltages are determined, the currents that flow into the pads (voltage sources), $i_p$, $i_{p2}$, $i_g$, $i_{g2}$, are determined by applying Kirchhoff's current law (KCL) at the corresponding pad nodes. This vector of short circuit currents, $i_{pg}^{eq}$, is used in constructing an admittance macromodel of the power and ground grid network as described below.
Incorporating Effects of Off-Chip Parasitics
On an integrated circuit, the power and ground grids are connected through package and other off-chip parasitics to ideal board-level power supplies. Therefore, to predict on-chip dynamic power grid variations, the power and ground grid equations coupled with the package network equations are solved. To facilitate this solution, the power and ground pad nodes (connected to voltage sources) are defined as ports, and a macromodel of the coupled power and ground grid about the ports is constructed as follows:
where $i_{pg}$ and $v_{pg}$ are the port current and voltage vectors of size $p$ ($p$ is the number of pads), and $E$ is an $n \times p$ matrix ($n = N + M$) in which each column has a unit entry in the row corresponding to a port and zeros elsewhere. Furthermore, the matrices $G$ and $C$ of size $n \times n$ are defined as follows:
where the node voltage vector v is arranged as:
and the matrices $E_v$ and $E_g$ are the same matrices as defined in the previous section.
The current vector $i_{pg}^{eq}$ is the short circuit current flowing out of the pads as described in the previous section. In matrix form, (12) and (13) are written as follows:
To create a reduced order macromodel of (16), an orthonormal projection matrix $V_q$ is determined using the block Arnoldi process [3]. The block Arnoldi process recursively produces an orthonormal basis for the Krylov subspace generated by the matrix $G^{-1}C$ and a starting block of vectors given by $G^{-1}E$. After $k$ iterations, the block Arnoldi process produces an $n \times q$ matrix, where $q = k \times p$ ($p$ is the number of ports) and $q < n$:
$V_q = [V_0\ V_1\ \ldots\ V_{k-1}]$ (17)
whose columns form a basis for the Krylov subspace generated by $G^{-1}C$ and $G^{-1}E$, i.e.,
$\mathrm{colsp}(V_q) = \mathrm{Kr}(G^{-1}C, G^{-1}E, q)$ (18)
Furthermore, its columns can be shown to be orthonormal, i.e.,
$V_q^T V_q = I$ (19)
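The orthonormality property can be illustrated with a minimal sketch for the single-port case ($p = 1$), where the block Arnoldi process reduces to the ordinary Arnoldi iteration. The diagonal $G$, the tridiagonal $C$, and the names `arnoldi_basis` and `apply_A` are assumptions chosen so that multiplication by $G^{-1}C$ needs no linear solver. The sign in the recurrence is omitted because it does not change the subspace that is spanned.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def arnoldi_basis(apply_A, start, k):
    """Return k orthonormal vectors spanning span{start, A start, ..., A^(k-1) start}.

    Uses modified Gram-Schmidt; assumes the Krylov subspace has full
    dimension k (this sketch has no breakdown handling).
    """
    norm = math.sqrt(dot(start, start))
    basis = [[a / norm for a in start]]
    for _ in range(k - 1):
        w = apply_A(basis[-1])                 # next Krylov direction
        for v in basis:                        # remove components along earlier vectors
            h = dot(v, w)
            w = [wi - h * vi for wi, vi in zip(w, v)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

# Hypothetical 4-node example with diagonal G, so G^-1 C is easy to apply.
g_diag = [1.0, 2.0, 4.0, 5.0]
C = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]

def apply_A(v):
    Cv = [dot(row, v) for row in C]
    return [cv / g for cv, g in zip(Cv, g_diag)]

port = [1.0, 0.0, 0.0, 0.0]                      # e_i for a port at node 0
start = [p / g for p, g in zip(port, g_diag)]    # G^-1 E for a single port
V = arnoldi_basis(apply_A, start, 3)
gram = [[dot(u, v) for v in V] for u in V]       # should be close to the identity
print(gram)
```

The Gram matrix printed at the end is the numerical counterpart of (19): its diagonal entries are close to one and its off-diagonal entries are close to zero.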
The computational complexity of the block Arnoldi process is dominated by k solutions of the equation:
$V_j = -G^{-1} C V_{j-1}$ (20)
where $V_j$ is the $j$th block of columns and $V_{j-1}$ the previous block of columns of the matrix in (17). To calculate (20) quickly we exploit the block diagonal structure of the $G$ and $C$ matrices as follows:
which leads to:
which in turn is calculated efficiently by exploiting the fact that the inverse matrices $G_v^{-1}$ and $G_g^{-1}$ are readily available, that $C_{vg}$ is a diagonal matrix, and that application of the matrices $E_v$, $E_g$, $E_v^T$ and $E_g^T$ amounts to selecting the appropriate rows or columns of the subsequent matrix.
Once the projection matrix $V_q$ is determined, it is augmented with an identity matrix of size $p$ such that the resulting matrix of size $(n+p) \times (q+p)$ projects the $n+p$ equations of (16) to a system of $q+p$ equations. The projection matrix is applied to (16) as follows:
Next, defining,
provides a (q+p)×(q+p) matrix of equations as follows:
In the time domain, the corresponding equations are:
which is an order-$q$ reduced-order model of (12) and (13). The order-$q$ approximation guarantees that the first $k = q/p$ block moments of the reduced-order model and the original circuit model are the same.
At time t+Δt, applying a first order integration algorithm to (26), (27) results in the following equation describing the reduced order macromodel:
For the off-chip parasitics, applying a time domain integration to the equations governing them results in:
$-i_{pg}(t+\Delta t) = i_{off}(t+\Delta t) + Y_{off}\, v_{pg}(t+\Delta t)$ (30)
where $i_{off}$ is the off-chip short circuit current vector and $Y_{off}$ the off-chip admittance matrix at the given time point. Solving (29) and (30) simultaneously results in the solution for $v_{pg}(t+\Delta t)$ and $\hat{v}(t+\Delta t)$.
The updates to power and ground grid voltages due to off-chip parasitics can then be calculated as:
$v(t+\Delta t) = V_q \cdot \hat{v}(t+\Delta t)$ (31)
By linear superposition, the final power and ground voltage response is obtained as the sum of the solutions of (5), (6) and (31).
Incorporating Effects of Lossy Capacitors
Sometimes, a series resistance is added to a decoupling capacitor to better control the time constant with which it responds to switching current demand. Moreover, while modeling the decoupling effect of non-switching circuits, it is important to model the holding resistance of a non-switching circuit in series with the decoupling capacitor, as is illustrated in
To solve the network of
where $G_{vg}$ is a diagonal $P \times P$ matrix containing the loss resistors. For each matrix solution $(G_v')^{-1} b$, the corresponding matrix equation is defined as follows.
Next, multiplying the second row of (32) by $E_v$ and adding to the first row results in:
which is then solved as two equations.
$v_N = G_v^{-1}(b_N + E_v b_p)$ (35)
$v_p = G_{vg}^{-1}(b_p + G_{vg} E_v^T v_N)$ (36)
Note that (35) reuses the factorization of the original power grid matrix (i.e., excluding the loss resistors) and (36) requires a factorization of the diagonal $P \times P$ matrix $G_{vg}$. Using (35) and (36), the effect of loss resistors can be included in the dynamic power grid analysis with minimal computational overhead.
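The two-step solve of (35) and (36) can be sketched for a single lossy branch ($P = 1$). The augmented block system shown in the comments is an assumption: it is one system consistent with (35) and (36), not necessarily the exact form used in the source. The diagonal $G_v$, the function name `solve_with_loss`, and all numeric values are likewise hypothetical.

```python
def solve_with_loss(g_diag, cap_node, g_loss, b_nodes, b_internal):
    """Two-step solve for one lossy capacitor branch.

    g_diag     : diagonal of a (hypothetical) diagonal G_v, so that applying
                 G_v^-1 is an elementwise division
    cap_node   : grid node index the branch attaches to (unit entry of E_v)
    g_loss     : loss conductance (the single entry of G_vg)
    b_nodes    : right-hand side b_N for the grid nodes
    b_internal : right-hand side b_p for the internal branch node
    """
    # Equation (35): v_N = G_v^-1 (b_N + E_v b_p)
    rhs = list(b_nodes)
    rhs[cap_node] += b_internal
    v_nodes = [r / g for r, g in zip(rhs, g_diag)]
    # Equation (36): v_p = G_vg^-1 (b_p + G_vg E_v^T v_N)
    v_internal = (b_internal + g_loss * v_nodes[cap_node]) / g_loss
    return v_nodes, v_internal

g_diag, cap_node, g_loss = [2.0, 4.0], 1, 10.0
b_nodes, b_internal = [1.0, 3.0], 0.5
v_nodes, v_internal = solve_with_loss(g_diag, cap_node, g_loss, b_nodes, b_internal)

# Residuals of the assumed augmented system
#   [G_v + E_v G_vg E_v^T   -E_v G_vg] [v_N]   [b_N]
#   [-G_vg E_v^T             G_vg    ] [v_p] = [b_p]
r0 = g_diag[0] * v_nodes[0] - b_nodes[0]
r1 = (g_diag[1] + g_loss) * v_nodes[1] - g_loss * v_internal - b_nodes[1]
r2 = -g_loss * v_nodes[1] + g_loss * v_internal - b_internal
print(v_nodes, v_internal, (r0, r1, r2))
```

The residuals vanish, which shows that the only solve involving the grid uses the original $G_v$, while the loss conductance enters through a scalar division.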
Top Instance Contributor Calculation
For each power/ground grid voltage node whose voltage response exceeds the preset tolerance level (“failed node”), it may be helpful to determine the top instance contributors to the failed node voltage response, 160. To reduce or minimize the voltage drop at the failed node, decoupling capacitance is added to the top contributing instances in order to provide local repositories of charge precisely where they are beneficial, 170. In order to compute the contributions, 160, the following equation is formulated:
$z_i = (G + sC)^{-1} e_i$ (32)
where $i$ is the row number corresponding to the failed node in the vector of power and ground grid nodes, and $z_i$ represents the impedance from the failed node to all the power and ground grid nodes. By reciprocity,
$z_{ij} = e_j^T (G + sC)^{-1} e_i = e_i^T (G + sC)^{-1} e_j = z_{ji}$ (33)
where $z_{ij}$ is the impedance from node $i$ to node $j$.
Next, defining node $i$ as a port and employing $k$ iterations of the block Arnoldi process recursively produces an orthonormal basis $V_k$ for the Krylov subspace generated by the matrix $G^{-1}C$ and a starting vector $G^{-1}e_i$. Note that the projection matrix $V_k$ has $k$ columns ($k \ll n$), owing to the fact that only a single port is defined, and that it is determined efficiently by exploiting the block diagonal structure of the $G$ and $C$ matrices. Applying the projection matrix to (32) results in:
$z_i = V_k (\hat{G} + s\hat{C})^{-1} (V_k^T \cdot e_i)$ (34)
$\hat{G} = V_k^T G V_k, \quad \hat{C} = V_k^T C V_k$ (35)
where $\hat{G}$ and $\hat{C}$ are $k \times k$ matrices. The contributions of all the instance currents on the voltage response at node $i$ can be expressed as:
In the time domain this is calculated as:
where $v_{ij} = z_{ij} i_j$ is the contribution of the $j$th instance current $i_j$ at node $i$.
Note that it is possible to diagonalize the matrices $\hat{G}$ and $\hat{C}$ by employing a second projection on these matrices using the eigenvectors of $\hat{G}^{-1}\hat{C}$ as the projection matrix. Thus, at every time step, the solution to (37) and (38) requires just $4k$ floating point operations.
Adding Decoupling Capacitors
Once the contribution waveform $v_{ij}$ at node $i$ from each instance $j$ is computed, the contributors are sorted by the peak magnitude of their contribution. Starting from the biggest instance contributor, decoupling capacitors are added, 170, between the power and ground rails of each of the top instance contributors until the magnitude of the voltage response at the failed node is below the predefined tolerance.
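The sorting and selection step can be sketched as follows. The waveform samples and instance names are hypothetical, and the selection loop makes a simplifying assumption that is not taken from the text: once a decoupling capacitor is added at an instance, its contribution is treated as fully removed. In practice the response would be recomputed with the added capacitance.

```python
def rank_contributors(waveforms):
    """waveforms maps an instance name to its sampled contribution v_ij(t)
    at the failed node. Returns names sorted by peak magnitude, largest first."""
    peak = {name: max(abs(v) for v in samples) for name, samples in waveforms.items()}
    return sorted(peak, key=peak.get, reverse=True)

def select_until_fixed(waveforms, tolerance):
    """Pick top contributors until the peak of the summed remaining
    contributions falls below tolerance.

    Assumption (not from the text): a decoupled instance's contribution is
    treated as fully removed, which is an optimistic simplification.
    """
    remaining = dict(waveforms)
    chosen = []
    for name in rank_contributors(waveforms):
        total = [sum(col) for col in zip(*remaining.values())]
        if not total or max(abs(v) for v in total) < tolerance:
            break
        chosen.append(name)
        del remaining[name]
    return chosen

# Hypothetical contribution waveforms in volts at three time points.
waveforms = {
    "u1": [0.00, 0.05, 0.02],
    "u2": [0.01, 0.12, 0.03],
    "u3": [0.00, 0.02, 0.01],
}
ranking = rank_contributors(waveforms)
chosen = select_until_fixed(waveforms, tolerance=0.10)
print(ranking, chosen)
```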
For example,
The size of the decoupling capacitance for the jth instance contributor added during 750 is determined as:
where the charge $Q_d$ dissipated over one clock period $T$, determined by integrating the instance current $i_j(t)$, is used to determine the size of the capacitor required to supply it. Once the decoupling capacitors have been added for a failed node, and its response is deemed fixed, the newly added decoupling capacitors are included in the global capacitance matrix so that they are included in the analysis required to fix the next failed node.
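A minimal sketch of the charge integration follows. The sizing rule in `decap_size`, which divides $Q_d$ by an allowed voltage drop, is an assumption: the exact sizing expression is not reproduced in this text. The sampled current pulse and the 0.1 V budget are hypothetical values.

```python
def dissipated_charge(times, currents):
    """Trapezoidal integral of the sampled instance current i_j(t)."""
    return sum(0.5 * (currents[k] + currents[k + 1]) * (times[k + 1] - times[k])
               for k in range(len(times) - 1))

def decap_size(times, currents, allowed_drop):
    """Assumed sizing rule C = Q_d / allowed_drop (hypothetical; the exact
    expression used in the source is not shown)."""
    return dissipated_charge(times, currents) / allowed_drop

# Triangular 1 ns current pulse peaking at 2 mA within one clock period.
times = [0.0, 0.5e-9, 1.0e-9]
currents = [0.0, 2.0e-3, 0.0]
q_d = dissipated_charge(times, currents)
c_decap = decap_size(times, currents, allowed_drop=0.1)   # 0.1 V budget
print(q_d, c_decap)
```

For this pulse the integrated charge is 1 pC, so the assumed rule yields a capacitance of about 10 pF.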
According to one embodiment of the invention, computer system 800 performs specific operations by processor 804 executing one or more sequences of one or more instructions contained in system memory 806. Such instructions may be read into system memory 806 from another computer readable medium, such as static storage device 808 or disk drive 810. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention.
The term “computer readable medium” as used herein refers to any medium that participates in providing instructions to processor 804 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as disk drive 810. Volatile media includes dynamic memory, such as system memory 806. Transmission media includes coaxial cables, copper wire, and fiber optics, including wires that comprise bus 802. Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.
Common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, carrier wave, or any other medium from which a computer can read.
In an embodiment of the invention, execution of the sequences of instructions to practice the invention is performed by a single computer system 800. According to other embodiments of the invention, two or more computer systems 800 coupled by communication link 820 (e.g., LAN, PSTN, or wireless network) may perform the sequence of instructions required to practice the invention in coordination with one another.
Computer system 800 may transmit and receive messages, data, and instructions, including program, i.e., application code, through communication link 820 and communication interface 812. Received program code may be executed by processor 804 as it is received, and/or stored in disk drive 810, or other non-volatile storage for later execution.
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. For example, the above-described process flows are described with reference to a particular ordering of process actions. However, the ordering of at least some of the described process actions may be changed without affecting the scope or operation of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense.
This application claims the benefit of U.S. Provisional Application No. 60/474,711, filed May 29, 2003, which is hereby incorporated by reference in its entirety.