The present invention relates to computer-aided analysis of non-linear systems and, more specifically, to computer-aided harmonic balance analysis.
Harmonic balance analysis (HB) is a popular method for simulating RF and microwave circuits. For circuits where the nonlinearities are moderate and the signal transitions are not very sharp, HB analysis is preferred over time-domain techniques because HB analysis gives unmatched performance and accuracy. HB analysis is particularly useful in RF and microwave applications because these applications rely heavily on measured frequency domain data such as s-parameters of linear networks. Unlike time-domain techniques, HB analysis can be used for direct simulation in the frequency domain without requiring expensive convolutions or fitting.
The drawback of traditional HB implementations is that the underlying HB Jacobian is often large and not very sparse. This makes the HB Jacobian very inefficient to factor and solve directly. Conventional HB implementations separate the linear and nonlinear portions of the circuit and formulate the problem so as to achieve balance between the harmonics of the waveforms of the two portions (hence the name harmonic balance). Since there is no coupling between harmonics in the linear portion, the linear portion can be arbitrarily large. Thus, the limitation of this approach is the size and nonlinearity of the portion containing the nonlinear elements.
One traditional approach to addressing these limitations uses Krylov subspace methods. These iterative methods do not require the HB Jacobian matrix to be explicitly formed, let alone factored. Instead Krylov subspace methods require that the matrix-vector product is formed at each iteration. Matrix-vector products with the HB Jacobian can be formed efficiently using a series of Fast Fourier Transforms (FFTs) and sparse matrix vector multiplications with circuit matrices. Iterative methods such as Krylov subspace methods tend to be increasingly useful with increasing circuit sizes. Therefore, conventional simulators restrict the use of a direct solution method to small circuits, and use Krylov methods for more complex circuits.
However, the effectiveness of Krylov subspace methods to the harmonic balance problem critically depends on selecting appropriate preconditioners. Otherwise, the number of iterations becomes prohibitively large. In addition, as the complexity and nonlinearity of circuits increases, Krylov subspace based HB engines require more advanced preconditioners and still require a large number of iterations to reliably solve the system of linear equations. In many cases, the conventional methods are unable to solve the system at all. Therefore, a robust and efficient harmonic balance solution is still needed.
A simulation system, computer-readable storage medium, and computer-implemented method factors a Jacobian matrix into a lower triangular component L and an upper triangular component U. A simulation system receives a Jacobian matrix having n rows of blocks and n columns of blocks. Each block Aij of the Jacobian matrix comprises a sub-matrix of numerical entries. The simulation system generates a test matrix Z having n rows and n columns of representative numerical entries. The representative numerical entries zij in the test matrix Z are each determined as a function of a corresponding block Aij in the Jacobian matrix. The simulation system performs an LU factorization on the test matrix Z to derive a lower triangular component LZ and an upper triangular component UZ. For each non-zero entry in LZ the simulation system determines a corresponding block in the lower triangular component L of the Jacobian matrix. For each non-zero entry in UZ determining, the simulation system determines a corresponding block in the upper triangular component U of the Jacobian matrix. The simulation system then outputs the lower and upper triangular factors L and U.
In a first embodiment, the described direct factorization technique is used to simulate a non-linear circuit. A simulation system receives a Jacobian matrix for a harmonic balance equation describing the non-linear circuit with at least one unknown state variable. Using a test matrix representative of the structure of the Jacobian matrix, the simulation system determine the locations of zero and non-zero entries in the lower triangular factor L and the upper triangular factor U of the Jacobian matrix. The simulation system then factors the Jacobian matrix into its lower triangular factor L and upper triangular factor U by solving only for entries in the L and U matrices determined to be the non-zero entries. The simulation system bypasses calculations on the entries determined to be the zero entries, thereby providing a computationally efficient solution. The simulation system then solves the harmonic balance equation for the at least one unknown state variable using the lower triangular factor L and the upper triangular factor U in an LU decomposition method.
In a second embodiment, the described direct factorization technique is applied to a Fourier envelope analysis problem. In this embodiment, the simulation system receives parameters of a multi-rate matrix equation representative of the Fourier envelope analysis. The simulation system generates the Jacobian matrix for the multi-rate matrix. The multi-rate matrix equation is then solved using the direct solve technique described above.
In a third embodiment, the direct factorization technique is used in combination with preconditioners as part of an iterative solver.
Advantageously, the claimed invention provides substantially improved robustness and computational efficiency over the conventional techniques.
The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings. Like reference numerals are used for like elements in the accompanying drawings.
The figures depict embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
A embodiment of the present invention is now described with reference to the figures where like reference numbers indicate identical or functionally similar elements. Also in the figures, the left most digit of each reference number corresponds to the figure in which the reference number is first used.
Overview
A, system, computer-readable storage medium, and method is disclosed for directly solving non-linear systems that have the HB Jacobian as the coefficient matrix. Unlike conventional techniques, a highly specialized direct solver exploits the unique structure of the HB Jacobian. Compared to the best preconditioned Krylov subspace methods, the direct solver of the present invention gives at least competitive performance on systems with mild nonlinearities, and far superior performance and robustness for systems with moderate to strong nonlinearities. Furthermore, the specialized direct solver gives far superior performance compared to a generic sparse direct solver.
In a first embodiment, the direct solver is configured to simulate non-linear circuits for RF or microwave applications. In a second embodiment, the direct solver is applied to Fourier envelope applications. Here, the direct solver lends itself very naturally to Jacobian bypass and therefore gives additional performance advantages over conventional simulation techniques. In a third embodiment, the direct solver is used together with new preconditioners for iterative methods. These preconditioners provide a trade-off between performance and memory, and in some cases they provide both speed and memory improvements over conventional techniques.
System Architecture
The processor 106 processes data signals and may comprise various computing architectures such as a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. Although only a single processor 106 is shown in
The input device 102 is any device configured to provide input to the simulation system 100. For example, the input device 102 may be used to input a circuit netlist 101 to the system 100 using external user devices such as a keyboard, a mouse, etc. Alternatively, the input device 102 can import a netlist 101 from an external network or database. The output device 108 represents any device equipped to output processed data to, for example, an external storage device, a network, or a display. For example, the output device 108 may output the simulation results 109 from the simulation system 100.
The memory 104 comprises a computer-readable storage medium that stores computer-executable instructions and computer-readable data. The instructions may comprise code for performing any and/or all of the techniques described herein. The memory 104 furthermore stores data inputted via input device 102, data to be outputted by output device 108, and any intermediate data used to carry out the process steps of the simulation engine 114 described herein. Memory 104 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, Flash RAM (non-volatile storage), combinations of the above, or some other memory device known in the art. In operation, the processor 106 loads the computer-executable instructions and/or data from the memory 104 to carry out the process steps described herein.
In one embodiment, the memory 104 stores a computer simulation engine 114 and model equations database 112. The model equations database 112 stores analytical models modeling the behavior of a variety of circuit components (e.g., capacitor, inductors, transistors, diodes, etc.) that are typically included in electronic circuits, for use in generating equations representing the circuit to be simulated. The circuit simulation engine 114 stores computer-executable program instructions that when executed by the processor 106, cause the simulation system 100 to receive the circuit netlist 101 describing the nodes and connections of the electronic circuit to be simulated, generate equations representing the circuit based on the netlist 101 and the model equations stored in database 112, and execute a variety of algorithms for solving the equations. For example, in one embodiment, the simulation engine 114 provides computer-executable program instructions for performing harmonic balance analysis in order to determine at least one initially unknown voltage or current in the circuit described by the netlist 101.
The simulation system 100 may include more or less components than those shown in
Overview of Harmonic Balance Analysis
where x(t)εn represents the state variables in the circuit, n is the circuit size, y represents the matrix-valued impulse response function of frequency-domain linear elements (such as s parameters), q: n→n represents the nonlinear charge and flux storage in the circuit, f: n→n represents the memoryless nonlinearities and b: →Rn represents the time-dependent excitations in the circuit which are assumed to be periodic with period T.
The simulations system 100 derives 204 a harmonic balance equation from the system of equations in (1). Since the circuit is non-autonomous, the circuit steady state response x(t) will also be periodic with period T and its functions q(x) and f(x) are also T-periodic. Therefore all these waveforms can be expanded in Fourier series as follows:
Substituting the Fourier series in (2), Eq. (1) can be written in frequency-domain as:
where Y(f)=∫−∞∞y(t)exp(−j2πft)dt is the Fourier transform of y(t) and Yi=Y(if0). Since exp(j2πif0t) are orthogonal, it follows that:
YiXi+j2πif0Qi+Fi+Bi=0 (4)
∀i. The infinite summations can be truncated to a finite number of harmonics k. Collocating the above equation at iε[−k,k] yields:
Y−kX−k+j2π(−k)f0Q−k+F−kB−k=0
Y0X0+j2π0f0Q0+F0+B0=0
YkXk+j2πkf0Qk+Fk+Bk=0 (5)
In matrix form, the harmonic balance equation is given by:
Fhb=yx+j2πf0ΩQ+F+B=0 (6)
where, Ω=diag ([−kI . . . 0 . . . kI]) and
Q=[Q−k, . . . ,Q0, . . . ,Qk]T (7)
B, F and X are similarly defined, and
y=diag[Y−k . . . 0 . . . Yk] (8)
The harmonic balance equation in (6) represents a system of m(2k+1) nonlinear equations in m(2k+1) unknowns X.
The harmonic balance equation in (6) is solved 206 using Newton's method. Newton's method is a iterative algorithm that finds successively better approximations to the roots of an equation. As applied to the harmonic balance equation Fhb, an initial guess X0 is made for the vector of unknowns X. According to Newton's method, a better approximation X1 is computed by solving the following matrix equation:
Jhb(Ξ0)(Ξ1−Ξ0)=−Fhb(Ξ0) (9)
where Jhb is the harmonic balance Jacobian given by:
Generalizing to n+1 iterations, the following equation is iteratively solved to yield successively better approximations for X:
Jhb(Ξn)(Ξn+1−Ξn)=−Fhb(Ξn) (11)
At each iteration, Eq. (11) can be efficiently solved using an LU decomposition method. An embodiment of a process for efficiently factoring the HB Jacobian Jhb using a direct solver is described in further detail below with respect to
Representative Example Using Simple Rectifier Circuit
The diode current through diode D is given by:
where IS, υT and τ are constants based on the diode characteristics. Kirchoff's current law (KCL) states that the net current flowing into a node is equal to the net current flowing out of the node. Applying KCL at node V2, the following equation can be derived:
The linear structure 302 receives the input voltage VS and outputs a voltage VS′ as a function of VS according to the following equation:
−Vs′+∫−∞th(t−s)Vs(s)ds=0 (14)
The voltage source VS is a sinusoid of frequency f. Therefore, the following equation can also be derived:
Vs−sin(2πft)=0 (15)
The equations in (13)-(15) can be re-written in the generalized matrix form of
The most natural way of describing y is in terms of its Fourier transform:
H(f)=∫−∞∞h(t)exp(−j2πft)dt (17)
H(f) is typically measured using standard instruments and may comprise tabulated values of frequency f. Since the circuit is non-oscillatory, in steady-state, all the waveforms V2, VS, VS′, f, q will be periodic with frequency f and therefore can be written in Fourier series similarly to Eq. (2) above as:
The harmonic balance equation and harmonic balance Jacobian can then be derived following Eqs. (3)-(11) described above.
While the circuit in
Direct Solution of HB Jacobian
As previously described, the solution to the harmonic balance equation in (6) involves performing an LU factorization on the HB Jacobian Jhb. Beginning with Eq. (10) above, it can be easily shown that:
Jhb=y+j2πf0ΩΓCΓ−1+ΓGΓ−1 (19)
where
where
G is also similarly defined. Γ represents time to frequency translation and consists of a series of permutations and Fourier transforms. The Jacobian Jhb is large and not very sparse. Therefore a generic dense or sparse solver will not be able to solve this system very efficiently. Note that:
where Ci is the ith Fourier coefficient of C(t). I.e., the matrix block-circulant. The structure of Jhb is similar to (21) except that it is not block circulant as ΓCΓ−1.
A test matrix Z is formed 404 which has the same sparsity structure as the transient Jacobian, but includes a representative scalar entry in place of each (2k+1)×(2k+1) block Aij. Thus, the test matrix Z has n rows and n columns of scalar entries. For example, in the circuit of
zij=∥Aij∥ (22)
The test matrix Z should be constructed so that it mimics the properties of Jhb. For instance, if there is an entry in the transient Jacobian at (i, j)th location where only C matrix entry is present, then the corresponding block in JHB will be j2πf0ΩCi,j which is singular. This entry should not be a choice of pivot for the test matrix otherwise the matrix factor of Jhb will fail. This can be avoided by introducing numerical zeroes in such locations which will prevent these entries to be used as pivots when factoring the test matrix Z in the following steps.
The test matrix Z is factored 406 into its lower triangular component LZ and upper triangular component UZ such that Z=LZUZ. This factorization can be efficiently performed by a modern sparse LU solver. The solver, along with factoring the matrix will determine a row and column permutation and a row and column scaling scheme which will minimize the number of fills in factoring the test matrix. Moreover, the LU solver will provide the locations of the zeroes and non-zeros in the L and U matrices of the HB Jacobian.
Given LZ and UZ, LU factorization can be efficiently performed on Jhb to derive the factors L and U of the HB Jacobian. Using the information from the test matrix factorization (i.e. the locations of the zeroes and non-zeroes), the simulation system 100 can solve only for entries in the L and U matrices determined to be the non-zero entries, and bypass the entries determined to be the zero entries. Beginning with a first column j=1, row permutation and row and column scaling are performed 408 on the jth column of matrix Jhb. Beginning with the first row i=1, if there is a structural nonzero entry at the (i, j)th entry in UZ, then the (i,j)th entry Uij of the upper triangular factor of the HB Jacobian is computed 410 as follows:
Similarly, if there is a structural nonzero at the (i, j)th entry of LZ, then the (I,j)th entry Lij of the lower triangular factor of the HB Jacobian is computed 412 as follows:
The simulation system 100 then determines 414 if the current row i is less than the total number of rows, and if so, the computations in steps 410 and 412 repeat 420 for each row i of col j. Otherwise, the simulation system 100 determines 416 if the current column j is less than the total number of columns, and if so, steps 408, 410, 412, 414 are repeated 422 for each column j. Once all rows and columns have been processed, the final matrix factors L, U are outputted 418.
The operations in steps 410, 412 can be performed very efficiently using aggressively optimized implementations for the target CPU architecture. For example, in one embodiment a “BLAS” implementation is used as described in Blackford, et al., “An Updated Set of Linear Algebra Subprograms (BLAS).” ACM Transactions on Mathematical Software, 28:135-151, 2002, the content of which is incorporated by reference herein. In another embodiment, a LAPACK implementation is used as described in Anderson, et al., “LAPACK User's Guide,” 3rd ed., Society for Industrial and Applied Mathematics, Philadelphia, Pa., 1999, the content of which is incorporated by reference herein. Furthermore, these BLAS and LAPACK routines are also available in parallel form and multi-threaded versions of these routines can be used to further improve performance as the performance of these basic routines scales almost linearly with the number of processors.
Complexity Analysis
In practice, for improved computation efficiency, matrix Jhb need not be constructed explicitly and only the factors L and U are directly computed. During the factorization, the block matrices of Aij can be constructed on the fly and each block is only needed once. Therefore the storage requirement of the direct solver is:
(nnzl+nnzu)(2k+1)2 (25)
where nnzu and nnzl are the number of non-zeroes in U and L respectively. For comparison, in traditional preconditioned Krylov subspace methods, the storage requirement is 2nnt where nt is the number of time points. The computation complexity of the direct solver is
O((nnzlα
where αl, αu>1 but are close to 1. The complexity of a conventional Krylov subspace based method is somewhat less straight-forward to express. If a Krylov subspace method converges in 1 iterations, then the complexity of HB matrix vector multiply is:
lO((nnzc+nnzg)(2k+1)log(2k+1)) (27)
where nnzc and nnzg are the number of non-zeroes in C and G respectively. For an Arnoldi based Krylov subspace method, the complexity of running l iterations is O(l2n(2k+1)). The complexity of applying the preconditioner per iteration is typically O(nnzα(2k+1)).
If the number of Krylov subspace iterations l required to achieve convergence is greater than 2k+1, the direct method can be competitive or significantly superior in performance compared to Krylov subspace methods. For problems with moderate to strong nonlinearities, the number of Krylov subspace iterations can be very large even though the number of harmonics requested is moderate. In these situations, the direct method substantially outperforms the conventional Krylov subspace methods. For many applications, the direct solver method furthermore provides superior CPU time and improved robustness over conventional methods. Additional details of the performance results are described in more detail in Mehrotra, A. and Somani, A., “A Robust and Efficient Harmonic Balance (HB) Using Direct Solution of HB Jacobian,” Design Automation Conference, 2009, the content of which is incorporated by reference herein.
Direct Solve for Fourier Envelope Analysis
In another embodiment, the direct solve method of
The resulting partial differential equation can be efficiently solved and the circuit response can be derived from the multi-rate solution. A multi-rate partial differential equation (PDE) variation of (1) is given by:
where the convolution term has been omitted. If x(t1, t2) is the solution of (28), then it can be shown that x(t, t) is the solution of the original differential algebraic equation (DAE). Fourier envelope assumes that the signals are periodic in t2 and therefore can be expanded in Fourier series as described above in Eq. (2). After following steps substantially similar to Eqs. (3)-(5) above, the following matrix equation is derived:
The above DAE is discretized using an appropriate linear multistep method (such as Backward Euler) and integrated in t1:
Eq. (30) can then be solved using Newton's method. The Jacobian for (30) is of the form:
Note that the Fourier envelope Jacobian in (31) is substantially similar to the HB Jacobian except for an additional term
Therefore the direct factorization technique described above with respect to
In one embodiment, a Jacobian bypass technique is used. In this method, the Jacobian matrix is not factored (bypassed) and the previous factors are used to perform triangle solves with L and U. As the number of bypassed Jacobians increases, the total number of Newton iterations increases, but the time taken per Newton iteration reduces because of time savings due to Jacobian bypass.
Jacobian bypass can result in 2-3 times performance improvement over non-bypassed implementation of the direct solver. Performance results are discussed in more detail in Mehrotra, A. and Somani, A., “A Robust and Efficient Harmonic Balance (HB) Using Direct Solution of HB Jacobian,” Design Automation Conference, 2009, the content of which is incorporated by reference herein.
Preconditioners Based on Direct Solve
In one embodiment, the HB Jacobian Jhb can be segmented into four parts of approximately equal number of harmonics, as shown in
In an embodiment of the present invention, the direct solve technique of
is formed by multiplying JHB with [x1, . . . , xi−1, 0, . . . ]T and taking the result of the ith block row. The inclusion or non-inclusion of the Aij blocks applies only to the preconditioner matrix, and the accuracy of the solution of HB Jacobian is unaffected in both the cases.
The advantage of using direct solve of A11 and A22 is that these matrices need to be factored only once. In each subsequent Krylov subspace iteration, the process only needs to perform triangle solves with L and U factors of these matrices. This gives significant speed advantage over the preconditioner where A11 and A22 are also solved using Krylov subspace methods.
Complexity Analysis for Preconditioner-Based Direct Solve
Assuming that the matrix is divided into l segments of approximately equal number of harmonics, the storage requirement of each segment is reduced by the factor l2 but l segments are stored. Hence the overall memory requirement reduces by a factor of l. Similarly the matrix factor time is reduced by l2 l triangle solves are performed at each Krylov iteration. If m iterations are required for Krylov subspace method to converge to the solution, then the total solve time is:
The number of matrix vector multiplies with the HB Jacobian is m−1. Therefore the overall cost is:
This provides a direct trade-off between memory and speed. As l is increased, the memory consumption goes down but m increases which may increase the HB solve time depending how steep the increase in m is. Performance results are discussed in more detail in Mehrotra, A. and Somani, A., “A Robust and Efficient Harmonic Balance (HB) Using Direct Solution of HB Jacobian,” Design Automation Conference, 2009, the contents of which is incorporated by reference herein.
Although the present invention has been described above with respect to several embodiments, various modifications can be made within the scope of the present invention. For example, although examples are provided in the context of RF circuits, the described method for efficiently performing harmonic balance is applicable to other areas such as biological systems, chemical systems, dynamic systems, or other systems. In these applications, the system is modeled using an equation of the form of Eq. (1) above. Instead of the variables representing circuit components, they instead represent components of the relevant biological, chemical, or dynamic system. Such an equation can then be solved using the harmonic balance described herein to solve for the unknown variables in the relevant system. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
Number | Name | Date | Kind |
---|---|---|---|
7577547 | Subramaniam | Aug 2009 | B2 |
20090292520 | Nwankpa et al. | Nov 2009 | A1 |
20100082724 | Diyankov et al. | Apr 2010 | A1 |