The present invention relates to the field of electronic design automation tools. In particular, the present invention relates to methods and systems for simulating a circuit.
An integrated circuit is a network of circuit elements such as resistors, capacitors, inductors, mutual inductors, transmission lines, diodes, bipolar junction transistors (BJT), junction field effect transistors (JFET), metal-oxide-semiconductor field effect transistors (MOSFET), metal-semiconductor field effect transistors (MESFET), thin-film transistors (TFT), etc.
The development of complicated integrated circuits often requires powerful numerical simulation programs. For example, circuit simulation is an essential part of the design flow of integrated circuits, helping circuit designers to verify the functionality and performance of their designs without going through expensive fabrication processes. As semiconductor processing technology migrates to nanometer dimensions, new simulation methodologies are needed to solve the new problems intrinsic to circuit design with nanometer features. Modern integrated circuits continually challenge circuit simulation algorithms and implementations in the development of new technology generations. The semiconductor industry requires EDA software with the ability to analyze nanometer effects like coupling noise, ground bounce, transmission line wave propagation, dynamic leakage current, supply voltage drop, and nonlinear device and circuit behavior, which are all related to dynamic current. Thus, detailed circuit simulation and transistor-level simulation have become among the most effective ways to investigate and resolve issues with nanometer designs.
Examples of electronic circuit simulators include the Simulation Program with Integrated Circuit Emphasis (SPICE) developed at the University of California, Berkeley (UC Berkeley), and various enhanced versions or derivatives of SPICE. SPICE and its derivatives or enhanced versions will be referred to hereafter as SPICE circuit simulators, or SPICE.
SPICE-like simulations may provide fairly accurate predictions of how corresponding circuits will behave when actually built. The predictions are preferably made not only for individual sub-circuits but also for whole systems (e.g., whole integrated circuits) so that system-wide problems relating to noise and the like may be uncovered and dealt with. In a general process flow of a SPICE-like simulation, an analog integrated circuit under simulation is often represented in the form of a netlist description. A netlist is a circuit description of the analog circuit to be simulated written in a SPICE-like language. SPICE netlists are pure structural languages with simulation control statements. Other languages, such as Verilog-A™, have the capability to include behavioral constructs. The structural netlist of SPICE, together with a predefined set of circuit components of the analog integrated circuit, may be represented in the form of a matrix in accordance with certain circuit modeling methodologies (which is not a concern of the present invention). The number of non-linear differential equations ranges from 1 to n, with a corresponding number of input vectors to be operated on by the linearized equations. The set of input vectors is shown as {I1, I2, . . . , In}. Next, the linear matrix is solved with the set of input vectors to generate a set of solution vectors {V1, V2, . . . , Vn}. The computation is repeated until the set of solutions converges. The set of solutions may then be displayed in the form of waveforms, measurements, or checks on a computer screen for engineers to inspect the simulation results.
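By way of illustration only, the following Python sketch outlines the iterative solve loop described above; the `stamp_matrix` and `stamp_sources` helpers are hypothetical placeholders for the circuit-modeling step and are not part of the present disclosure.

```python
import numpy as np

def simulate_operating_point(stamp_matrix, stamp_sources, v_guess, tol=1e-9, max_iter=100):
    """Illustrative inner loop of a SPICE-like simulator.

    stamp_matrix(v)  -> linearized circuit matrix at operating point v (hypothetical helper)
    stamp_sources(v) -> corresponding input (source) vector at v (hypothetical helper)
    """
    v = np.asarray(v_guess, dtype=float)
    for _ in range(max_iter):
        G = stamp_matrix(v)                   # linearize non-linear devices at v
        I = stamp_sources(v)                  # equivalent input vector
        v_new = np.linalg.solve(G, I)         # solution vector for this pass
        if np.max(np.abs(v_new - v)) < tol:   # repeat until the solutions converge
            return v_new
        v = v_new
    raise RuntimeError("solution did not converge")
```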
However, SPICE-like simulation of a whole system becomes more difficult and problematic as the industry continues its relentless trek of scaling down to smaller and smaller device geometries and of cramming more interconnected components into the system. An example of such down scaling is the recent shift from micron-sized channels toward deep submicron-sized transistor channel lengths. Because of the smaller device geometries, circuit designers are able to cram exponentially larger numbers of circuit components (e.g., transistors, diodes, capacitors) into a given integrated circuit (IC), thereby increasing the matrix size to a complexity that may not be solved in a desired time frame.
SPICE models a circuit in a node/element fashion, i.e., the circuit is regarded as a collection of various circuit elements connected at nodes. At the heart of SPICE is the so-called Nodal Analysis, which is accomplished by formulating nodal equations (or circuit equations) in matrix format to represent the circuit and by solving these nodal equations. The circuit elements are modeled by device models, which produce model results that are represented in the circuit equations as matrices.
A device model for modeling a circuit element, such as the SPICE model for modeling MOSFET devices, developed by UC Berkeley, typically includes model equations and a set of model parameters that mathematically represent characteristics of the circuit element under various bias conditions. For example, a circuit element with n terminals can be modeled by the following current-voltage relations:
$I_i = f_i(V_1, \ldots, V_n, t)$ for $i = 1, \ldots, n$,
where $I_i$ represents the current entering terminal $i$; $V_j$ ($j = 1, \ldots, n$) represents the voltage or terminal bias across terminal $j$ and a reference terminal, such as the ground; and $t$ represents the time. Kirchhoff's Current Law implies that the current entering terminal $n$ is given by

$I_n(V_1, \ldots, V_n, t) = -\sum_{i=1}^{n-1} I_i(V_1, \ldots, V_n, t).$
A conductance matrix of the circuit element is defined by:

$G_{ij}(V_1, \ldots, V_n, t) = \dfrac{\partial I_i}{\partial V_j}$ for $i, j = 1, \ldots, n.$
To model the circuit element under alternating current (AC) operations, the device model also considers the relationship between node charges and the terminal biases:
$Q_i = q_i(V_1, \ldots, V_n, t)$ for $i = 1, \ldots, n$,
where $Q_i$ represents the node charge at terminal $i$. Thus, the capacitance matrix of the $n$-terminal circuit element is defined by

$C_{ij}(V_1, \ldots, V_n, t) = \dfrac{\partial Q_i}{\partial V_j}$ for $i, j = 1, \ldots, n.$
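As a purely illustrative aid, the sketch below approximates these conductance and capacitance matrices by finite differences; the model functions `f(V, t)` and `q(V, t)`, which return the terminal currents and node charges, are hypothetical stand-ins for a real device model.

```python
import numpy as np

def conductance_and_capacitance(f, q, V, t, dv=1e-6):
    """Finite-difference estimate of G_ij = dI_i/dV_j and C_ij = dQ_i/dV_j.

    f(V, t) -> terminal currents (length-n array); q(V, t) -> node charges.
    Both model functions are assumed placeholders, not part of the disclosure.
    """
    V = np.asarray(V, dtype=float)
    n = len(V)
    G = np.zeros((n, n))
    C = np.zeros((n, n))
    I0, Q0 = f(V, t), q(V, t)
    for j in range(n):
        Vp = V.copy()
        Vp[j] += dv                        # perturb terminal bias V_j
        G[:, j] = (f(Vp, t) - I0) / dv     # column j of the conductance matrix
        C[:, j] = (q(Vp, t) - Q0) / dv     # column j of the capacitance matrix
    return G, C
```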
The SPICE method considers a circuit as a non-divided object. A circuit may be represented as a large numerically discrete nonlinear matrix for analyzing instant current. The matrix dimension is of the same order as the number of nodes in the circuit. For transient analysis, this giant nonlinear system needs to be solved hundreds of thousands of times, thus restricting the capacity and performance of the SPICE method. The SPICE method in general can simulate a circuit of up to about 50,000 nodes. Therefore, it is not practical to use the SPICE method for full-chip design. It is widely used in cell design, library building, and accuracy verification.
With some accuracy lost, the Fast SPICE method developed in the early 1990s provides capacity and speed about two orders of magnitude greater than the SPICE method. The performance gain was made by employing simplified models, circuit partition methods, and event-driven algorithms, and by taking advantage of circuit latency.
However, the assumptions made by the Fast SPICE method about circuit latency become questionable for nanometer designs because some subcircuits that are functionally latent may nevertheless be electrically active due to voltage variation on the Vdd and Gnd busses or to small crosstalk coupling signals. Also, the event-driven algorithm used by the Fast SPICE method is generally insufficient to handle analog signal propagation. Fast SPICE's capacity is limited to a circuit size considerably less than ten million transistors, and it is therefore inadequate for full-chip simulation of large circuits. Furthermore, the simulation time increases drastically with the presence of many bipolar junction transistors (BJTs), inductors, diodes, or a substantial number of cross-coupling capacitors.
Thus, management and optimization of timing, power, and reliability become challenging tasks in nanometer designs because the conventional timing, power, and reliability analysis methods are insufficient to handle new features and new semiconductor processing technologies. Some effects like variability in production, circuit complexity, and significant parasitic effects need to be considered in a new light. Therefore, there is a need for a method and system that address the issues of the conventional simulation systems described above.
Methods and systems for simulating a circuit are disclosed. In one embodiment, the method for simulating a circuit includes representing a circuit using a matrix, where the matrix represents a set of linear equations to be solved, identifying a delta matrix, which is a subset of the matrix that changed states from a previous time step to a current time step, computing an update of the delta matrix using a matrix decomposition approach, generating a current state of the matrix using a previous state of the matrix and the update of the delta matrix, and storing the current state of the matrix in a memory device.
In another embodiment, a system for simulating a circuit includes a graphics processing unit (GPU) having one or more multiprocessors and each multiprocessor includes a plurality of processors and a shared memory configured to be used by the plurality of processors, a graphical-user-interface for viewing representations of the matrix on a display, and a global memory for storing information related to the matrix. The system further includes logic for representing a circuit using a matrix, where the matrix represents a set of linear equations to be solved, logic for identifying a delta matrix which is a subset of the matrix that changed states from a previous time step to a current time step, logic for computing an update of the delta matrix using a matrix decomposition approach, logic for generating a current state of the matrix using a previous state of the matrix and the update of the delta matrix, and logic for storing the current state of the matrix in a memory device.
In yet another embodiment, a method for simulating a circuit includes receiving a description of the circuit in a netlist, computing a circuit topology from the netlist, creating a universal device (UD) tree for representing the circuit topology, where the UD tree includes a hierarchical arrangement of UDs in multiple levels, calculating a time step for simulation, simulating the UD tree in accordance with the time step, and storing simulation results in a memory device.
The aforementioned features and advantages of the invention, as well as additional features and advantages thereof, will be more clearly understandable after reading detailed descriptions of embodiments of the invention in conjunction with the following drawings.
Methods and systems are provided for simulating a circuit. The following descriptions are presented to enable any person skilled in the art to make and use the invention. Descriptions of specific embodiments and applications are provided only as examples. Various modifications and combinations of the examples described herein will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other examples and applications without departing from the spirit and scope of the invention. Thus, the present invention is not intended to be limited to the examples described and shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
Some portions of the detailed description that follows are presented in terms of flowcharts, logic blocks, and other symbolic representations of operations on information that can be performed on a computer system. A procedure, computer-executed step, logic block, process, etc., is here conceived to be a self-consistent sequence of one or more steps or instructions leading to a desired result. The steps are those utilizing physical manipulations of physical quantities. These quantities can take the form of electrical, magnetic, or radio signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. These signals may be referred to at times as bits, values, elements, symbols, characters, terms, numbers, or the like. Each step may be performed by hardware, software, firmware, or combinations thereof.
The memory device 104 may include high-speed random-access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices. The memory device may also include mass storage that is located remotely from the GPUs/CPUs. The memory device preferably stores:
The databases, the application programs, and the program for simulating a circuit may include executable procedures, sub-modules, tables, and other data structures. In other embodiments, additional or different modules and data structures may be used, and some of the modules and/or data structures listed above may not be used.
It is a challenge in circuit simulation to handle large matrix sizes when a computer's physical memory is limited. The bottleneck is how to solve the discrete linear system. This situation becomes worse for nanometer designs because of the intrinsic properties of time-domain analysis. Parallel LU decomposition is a method used in large and complex circuit simulation. In linear algebra, the LU decomposition is a matrix decomposition which writes a matrix as the product of a lower triangular matrix and an upper triangular matrix. The product sometimes includes a permutation matrix as well. This decomposition is used in numerical analysis to solve systems of linear equations or to calculate the determinant.
In general, LU decomposition consumes more computation time than the triangular solves that follow it. There are two methods commonly used for LU decomposition, namely the Left-Looking method and the Right-Looking method. For the Left-Looking method, LU decomposition is performed on a column-by-column basis. For example, when column j is processed, only data in columns to the left of column j (columns i <= j) are needed. The following pseudocode shows a typical algorithm for implementing the Left-Looking method.
for column j=1 to n do
end for;
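A minimal dense Python sketch of this column-by-column scheme is given below; it omits the pivoting that a production implementation would require.

```python
import numpy as np

def left_looking_lu(A):
    """Dense left-looking LU factorization (no pivoting; illustrative sketch only)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for j in range(n):                          # process one column at a time
        # Only the columns of L to the left of (or at) column j are read here.
        y = np.linalg.solve(L[:j, :j], A[:j, j]) if j > 0 else np.empty(0)
        U[:j, j] = y
        U[j, j] = A[j, j] - L[j, :j] @ y        # diagonal entry of column j
        if j + 1 < n:                           # strictly lower part of column j
            L[j + 1:, j] = (A[j + 1:, j] - L[j + 1:, :j] @ y) / U[j, j]
    return L, U
```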
For the Right-Looking method, columns are processed from left to right, with each elimination step updating the remaining submatrix to the right of the current column (thus the name Right-Looking). The following pseudocode shows a typical algorithm for implementing the Right-Looking method.
Loop k from 1 to N:
endLoop
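A corresponding dense Python sketch of the Right-Looking scheme (again without pivoting, for illustration only) might read:

```python
import numpy as np

def right_looking_lu(A):
    """Dense right-looking LU factorization (no pivoting; illustrative sketch only)."""
    A = np.array(A, dtype=float)            # working copy, factored in place
    n = A.shape[0]
    for k in range(n):                      # eliminate one pivot column per pass
        A[k + 1:, k] /= A[k, k]             # multipliers: column k of L
        # Rank-one update of the trailing submatrix to the right of column k.
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
    L = np.tril(A, -1) + np.eye(n)          # unit lower-triangular factor
    U = np.triu(A)                          # upper-triangular factor
    return L, U
```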
Embodiments of the present invention employ a divide-and-conquer methodology that utilizes a universal device (UD) in performing LU decomposition of a matrix in parallel. In general, a UD is a physical domain that communicates with other physical domains through its ports.
As shown in
Note that by the way the UD tree is formed, UDs in a certain level of the UD tree are independent of each other, and UDs in different hierarchical branches of the UD tree are independent of each other. As a result, processing of UDs in each level as well as processing of UDs in different hierarchical branches of the UD tree may be conducted in parallel during circuit simulation.
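As a rough illustration of this level-by-level parallelism (and not a limitation of the invention), the sketch below assumes the UD tree has already been flattened into per-level lists and that a hypothetical `process_ud` worker handles each UD:

```python
from concurrent.futures import ThreadPoolExecutor

def process_ud_tree_by_level(levels, process_ud):
    """Process a UD tree bottom-up, handling UDs within a level in parallel.

    levels:     list of lists; levels[0] holds the leaf UDs and the last entry
                holds the top-level UD (assumed layout for this sketch)
    process_ud: hypothetical per-UD worker (e.g., forms a UD's port-level
                contribution and hands it to the UD's parent)
    """
    with ThreadPoolExecutor() as pool:
        for level in levels:                # bottom-up over the hierarchy
            # UDs in the same level are independent, so they may run in parallel.
            list(pool.map(process_ud, level))
```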
where D represents ports of child UDs A1 310, A2 312, and A3 314, and it may be written in stamping format as shown in equation 2:
Through numerical transformation, equation 1 may be re-written as equation 3.
where
is the Schur complement, which may be rewritten as:
Note that $D_{A_i} - C_{A_i} A_i^{-1} B_{A_i}$ is the Schur complement of the $A_i$'s, $T_i$ and $S_i$ are stamping operators, and
is a parent UD's stamping matrix.
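By way of numerical illustration only (the block names and port indexing below are assumptions for this sketch, not the stamping operators $T_i$ and $S_i$ of the disclosure), the following forms the Schur complement of a child UD's block system and accumulates it into the parent UD's matrix:

```python
import numpy as np

def child_schur_complement(A, B, C, D):
    """Schur complement of the block system [[A, B], [C, D]] with respect to A.

    A couples the child UD's internal nodes, D its port nodes, and B, C the
    coupling between them; the result is what the parent UD sees at the ports.
    """
    Ainv_B = np.linalg.solve(A, B)          # A^{-1} B without forming A^{-1}
    return D - C @ Ainv_B                   # D - C A^{-1} B

# Hypothetical usage: the parent accumulates each child's Schur complement
# into its own stamping matrix at the child's port indices (port_idx assumed).
# parent_matrix[np.ix_(port_idx, port_idx)] += child_schur_complement(A, B, C, D)
```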
Note that since a newly formed parent UD is considered as a new UD, its stamping matrix is the Schur complement. If the newly formed parent UD is a child of another UD, the procedure described in
In solving UDs in the tree, multiple UDs may be assigned to corresponding multiple blocks of threads to be executed by multiple processors in parallel. The UD tree created in
According to embodiments of the present invention, the UD static management method processes UDs in a predefined sequence. This predefined sequence does not change during the simulation. The UD dynamic management method processes UDs in a sequence based on dynamic events occurring during the simulation. The UD horizontal management method processes UDs according to the level of the UDs in the tree. In this approach, leaves in the bottom level are processed first, followed by the UDs in the second level. The process continues until the top-level UD (TopUD) is processed. The UD vertical management method processes UDs with a depth-first approach, in which a parent UD is processed after its children UDs have been processed. This approach takes advantage of the hierarchical memory structure in tackling memory-intensive computations.
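For illustration, the vertical (depth-first) management method may be sketched as a post-order traversal; the `children` attribute and the `process_ud` worker below are assumed placeholders rather than elements of the disclosure:

```python
def process_ud_depth_first(ud, process_ud):
    """Vertical (depth-first) UD management: children first, then the parent."""
    for child in getattr(ud, "children", []):   # recurse into each child subtree
        process_ud_depth_first(child, process_ud)
    process_ud(ud)                              # parent processed after its children
```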
In processing UDs, the method solves or partially solves a system of linear equations for leaf UDs or parent UDs. For example, consider the case where a matrix A with dimension n×n is modified by an unsymmetric rank-one matrix as
$\bar{A} = A + \alpha u v^T$ (Eq. 5)
in which α is a parameter, and u and v are vectors of dimension n. Assuming that A has been decomposed as $A = LU$, the goal is to determine factors such that
$\bar{A} = \bar{L}\bar{U}$ (Eq. 6)
Through a transformation, equation 5 can be rewritten as:
$\bar{A} = A + \alpha u v^T = L(I + \alpha p q^T)U$, where $Lp = u$ and $q^T U = v^T$ (Eq. 7)
If the factorization is formed as
$I + \alpha p q^T = \tilde{L}\tilde{U}$ (Eq. 8)
then the modified Cholesky factors take the form $\bar{L} = L\tilde{L}$ and $\bar{U} = \tilde{U}U$.
This is because the product of two lower-triangular matrices is a lower-triangular matrix. Here, $I + \alpha p q^T$ is a structured matrix whose factors may be calculated through O(n²) operations. Thus, the total operation count to factor equation 5 through the update is O(n²), while the total operation count to factor equation 5 through the direct method is O(n³). The manner in which the factorization of equation 8 is performed may increase processing efficiency to O(n²) instead of O(n³) by structured matrix multiplication.
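A minimal Python sketch of this rank-one update, following the identity of Eq. 7, is shown below. For clarity it factors the inner matrix $I + \alpha p q^T$ densely rather than with the structured O(n²) scheme referred to above, and it assumes L is unit lower-triangular and that no pivoting is required.

```python
import numpy as np
from scipy.linalg import solve_triangular

def _lu_nopivot(M):
    """Plain dense LU without pivoting (adequate for this illustrative sketch)."""
    M = np.array(M, dtype=float)
    n = M.shape[0]
    for k in range(n - 1):
        M[k + 1:, k] /= M[k, k]
        M[k + 1:, k + 1:] -= np.outer(M[k + 1:, k], M[k, k + 1:])
    return np.tril(M, -1) + np.eye(n), np.triu(M)

def rank_one_lu_update(L, U, alpha, u, v):
    """Update the factors of A = L U to factors of A + alpha * u v^T (Eq. 5).

    Uses Eq. 7: A + alpha*u*v^T = L (I + alpha*p*q^T) U with L p = u and
    q^T U = v^T.  L is assumed unit lower-triangular.
    """
    p = solve_triangular(L, u, lower=True, unit_diagonal=True)     # L p = u
    q = solve_triangular(U.T, v, lower=True)                       # q^T U = v^T
    Lt, Ut = _lu_nopivot(np.eye(len(u)) + alpha * np.outer(p, q))  # Eq. 8
    # Products of triangular matrices are triangular, so these are valid factors.
    return L @ Lt, Ut @ U
```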
Note that conventional circuit simulation methods need to calculate LU for every time step or Newton iteration. According to embodiments of the present invention, an approach for computing LU through updates based on the LU factors of the previous time step is described in the following section.
$A = A_0 + \Delta A$
where the reference matrix $A_0$ is the simulation matrix from the previous time step, and ΔA 704 represents the part of the matrix that has changed its state between the previous and current time steps. From the intrinsic properties of the circuit, ΔA 704 is typically highly sparse and of low rank.
According to embodiments of the present invention, there are two ways to form ΔA. The first way is to use mathematical computations as shown below.
$\Delta A = A - A_0$
The second way is to use physical measurements. In this approach, the active region is defined using the device models. Then ΔA can be assembled from the stamping procedure described below.
First, the stamping procedure forms ΔA. Note that the time required for forming ΔA is shorter than the time required for forming a full matrix, because in the former case only active nodes in the circuit are considered. The method to choose active nodes can be based on 1) rate of voltage change, 2) rate of voltage change versus time, 3) rate of current change, 4) Newton convergence criteria, 5) time-step convergence criteria, and 6) estimated current of the active nodes.
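As an illustrative sketch of such a selection (using only the voltage-based criteria above, with arbitrary example thresholds), active nodes might be chosen as follows:

```python
import numpy as np

def select_active_nodes(v_prev, v_curr, dt, dv_tol=1e-6, slope_tol=1e-3):
    """Pick 'active' nodes whose solution is still changing between time steps.

    The thresholds here are illustrative; a real simulator would also apply the
    current-based and convergence-based criteria listed above.
    """
    dv = np.abs(np.asarray(v_curr) - np.asarray(v_prev))   # voltage change per node
    active = (dv > dv_tol) | (dv / dt > slope_tol)         # change, or rate of change
    return np.nonzero(active)[0]                           # indices of active nodes
```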
Next, suppose ΔA has rank r. ΔA may be decomposed as a summation of rank-one matrices:

$\Delta A = \sum_{i=1}^{r} \alpha_i u_i v_i^T.$
Note that methods for decomposing ΔA include, but are not limited to, singular value decomposition (SVD) and primary elementary matrix computation. Using SVD decomposition as an example, ΔA may be expressed as:
$\Delta A = U \Sigma V^T$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix.
Then, the ΔA expression may be written as:

$\Delta A = \sum_{i=1}^{r} \sigma_i u_i v_i^T,$

where $\sigma_i$ are the non-zero singular values and $u_i$ and $v_i$ are the corresponding columns of $U$ and $V$.
In other embodiments, ΔA may also be formed by mathematical transformation. For example, ΔA can be expressed as:
where $a_i$, $i = 1, \ldots, r$ (with $r$ the rank of ΔA), and $e_i$ is a vector that has 1 in the $i$th entry while all other entries are 0.
Next, the stamping procedure updates LU. Instead of decomposing matrix A, the LU factors may be computed as follows:
Here, $L_iU_i$ is obtained by recursively calling the rank-one update method described above.
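Combining the pieces above, a sketch of the overall update (reusing the illustrative `rank_one_lu_update` given earlier, and assuming the SVD-based decomposition of ΔA) might look as follows:

```python
import numpy as np

def update_lu_with_delta(L, U, delta_A, tol=1e-12):
    """Update the LU factors of A0 to factors of A = A0 + delta_A.

    delta_A is decomposed by SVD into rank-one terms sigma_i * u_i * v_i^T, and
    the rank-one update sketched earlier (rank_one_lu_update) is applied once
    per retained term.  Truncation at `tol` exploits the low rank of delta_A.
    """
    Us, sigmas, Vt = np.linalg.svd(delta_A)              # delta_A = U Sigma V^T
    for sigma, u_i, v_i in zip(sigmas, Us.T, Vt):
        if sigma < tol:                                  # skip negligible terms
            break
        L, U = rank_one_lu_update(L, U, sigma, u_i, v_i)
    return L, U
```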
In addition to circuit simulation, methods described above may be applied to other fields such as analyses involving time domain and non-linear systems, including but not limited to, fluid dynamics, aerospace, chemical processing, structure analysis, graphics rendering, MEMS, seismic, biotech, and electromagnetic field analyses.
It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units or processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processors or controllers. Hence, references to specific functional units are to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization.
The invention can be implemented in any suitable form, including hardware, software, firmware, or any combination of these. The invention may optionally be implemented partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally, and logically implemented in any suitable way. Indeed, the functionality may be implemented in a single unit, in a plurality of units, or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.
One skilled in the relevant art will recognize that many possible modifications and combinations of the disclosed embodiments may be used, while still employing the same basic underlying mechanisms and methodologies. The foregoing description, for purposes of explanation, has been written with references to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described to explain the principles of the invention and their practical applications, and to enable others skilled in the art to best utilize the invention and various embodiments with various modifications as suited to the particular use contemplated.
This application claims the priority and benefit of provisional applications 60/977,972, “Divide Conquer Project Introduction,” filed on Oct. 5, 2007; 60/977,976, “Schur and UD Flow,” filed on Oct. 5, 2007; and 60/977,981, “RAPT Flow,” filed on Oct. 5, 2007, which are incorporated herein in their entirety by reference.
Number | Date | Country
---|---|---
60977972 | Oct 2007 | US
60977976 | Oct 2007 | US
60977981 | Oct 2007 | US